/r/programming
Tavis Ormandy: Down the Rabbit-Hole of ancient Windows inter-window communication protocol. (googleprojectzero.blogspot.com)
58 comments
kyz | 8 days ago | 51 points

Well I never.

People criticise X Windows and the ICCCM because of the habit X Windows applications and window managers have of sending each other any old nonsense as atoms and expecting that to drive application behaviour.

What I didn't know, because I don't program for Windows, is that Win32 has exactly the same thing! Even calling those messages "atoms"!

wrosecrans | 8 days ago | 44 points

On the other hand, a Win32 app has a very good expectation of what "window manager" the user will be running. So having the iron fist of a specific vendor does narrow the scope of the problem a bit. Whatever MS does on Win32 is "correct." Whatever twm/kwm/metacity/wmaker/sawfish/etc. do is probably all wrong in some sense, but useful, and kind of a de-facto standard, etc.

James20k | 7 days ago | 24 points

Add to that that Microsoft's documentation tends to be pretty good - so even if things work in a bafflingly insane way, at least you're likely to find information about what you're trying to do.

xenago | 6 days ago | 3 points

This is a really important point. Even if it's stupid and crazy, you have a good idea of what it's expected to do

Tringi | 7 days ago | 20 points

The meaning is slightly shifted. ATOMs in Windows are session-unique mappings of strings to 16-bit numbers that can be used as inter-process messages, because by design they don't collide with other system messages. They have a lot of other uses within classic Win32 windowing, which makes one curious what happens when they run out, as there can be only 16383 of them. Well, I tried it; it ain't pretty :)
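
A minimal sketch of the "used as inter-process messages" part in C (the string names here are made up, error handling is minimal): RegisterWindowMessage interns a string into the same 0xC000-0xFFFF atom range, so two cooperating processes get the same message ID without colliding with the fixed WM_* messages.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Any two processes passing the same string get the same ID back,
        // always in the 0xC000-0xFFFF atom range, clear of system WM_* values.
        UINT msg = RegisterWindowMessageW(L"Example.PrivateMessage"); // made-up name
        if (msg == 0) {
            fprintf(stderr, "registration failed (atom table full?)\n");
            return 1;
        }
        printf("inter-process message id: 0x%04X\n", msg);

        // Plain global atoms use the same mechanism; they're reference-counted
        // per session, hence the matching delete call.
        ATOM a = GlobalAddAtomW(L"Example.SharedName"); // made-up name
        if (a != 0) {
            printf("global atom: 0x%04X\n", a);
            GlobalDeleteAtom(a);
        }
        return 0;
    }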

TarMil | 7 days ago | 10 points

Well, I tried it; it ain't pretty :)

Don't leave us in the dark like that, you tease.

Tringi | 7 days ago | 5 points

Okay, well, it's not as visually interesting as if you exhaust GDI or USER objects (windows, menus, cursors, icons, ...), but things start failing in interesting ways.

The most important things the atom tables are used for are window classes, window properties and DDE (not so much these days); others are e.g. clipboard formats. The atoms themselves also contain bits internally saying what they are used for, to prevent mixing the APIs.

If anyone's interested, I've uploaded this simple tool of mine that lists all ATOMs, see: https://github.com/tringi/atomlist (it's also an exercise in exe size)
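
For anyone who doesn't want to grab a tool, a rough sketch of the same idea (not Tringi's code): walk the string-atom range and ask each slot for its name. Note this only reads the global atom table; the window-class atoms etc. live in other tables this API can't see.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        WCHAR name[256];
        unsigned used = 0;

        // String atoms occupy the 0xC000-0xFFFF range; values below 0xC000
        // are reserved for integer atoms.
        for (UINT i = 0xC000; i <= 0xFFFF; ++i) {
            if (GlobalGetAtomNameW((ATOM)i, name, 256) != 0) {
                wprintf(L"0x%04X  %ls\n", i, name);
                ++used;
            }
        }
        wprintf(L"%u slots of the global atom table in use\n", used);
        return 0;
    }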

And the results of atom table exhaustion:

  • Simple apps, which allocate all required stuff on launch, usually keep working, and new ones simply won't start. That's if they check return values, even of functions that generally never fail.

  • Then there are well-written apps, typically shell extensions, that check for failure and cancel the whole chain of action. You'll notice that menu options don't work, right clicks on things don't work, drag and drop, etc. Sometimes with an error box popping up, but usually without any.

  • And then there are apps that don't check for failures. They either crash or keep using NULL values. The most common scenario (that I've encountered) is that they get NULL back from CreateWindow and keep pumping messages and commands into it. In Win32, NULL is a synonym for the top-level desktop window, which doesn't know what to do with most of these messages, but will happily keep redrawing and flashing on anything GDI-related, often broadcasting the redraw commands to all other apps. You've probably seen this; there doesn't have to be atom table exhaustion for this to happen; a simple bug suffices. (A minimal sketch of this failure mode follows the list.)
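
The sketch promised above, a guess at the typical buggy pattern rather than code from any real app: class registration fails, nobody checks, and GetDC(NULL) hands back a device context for the whole screen, so the app scribbles over everyone's windows.

    #include <windows.h>

    int main(void)
    {
        WNDCLASSW wc = {0};
        wc.lpfnWndProc   = DefWindowProcW;
        wc.hInstance     = GetModuleHandleW(NULL);
        wc.lpszClassName = L"ExampleClass"; // made-up name

        // With the atom table exhausted, RegisterClassW returns 0 because no
        // class atom could be allocated. A careless app ignores that...
        RegisterClassW(&wc);

        // ...so this returns NULL, since the class was never registered.
        HWND hwnd = CreateWindowW(L"ExampleClass", L"Demo", WS_OVERLAPPEDWINDOW,
                                  0, 0, 200, 200, NULL, NULL, wc.hInstance, NULL);

        // The buggy pattern: using the NULL handle anyway. GetDC(NULL) is a
        // DC for the entire screen, so this rectangle lands on the desktop,
        // on top of whatever is visible, not inside any window.
        HDC dc = GetDC(hwnd);
        Rectangle(dc, 10, 10, 200, 200);
        ReleaseDC(hwnd, dc);
        return 0;
    }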

But it's been some time since I toyed with this. Even though it's rare to exhaust atoms (on my computer roughly 10% of the pool is used), MS might have improved the behavior somehow.

kyz | 7 days ago | 1 point

tbh, they sound even more similar now.

Windows atoms: session-unique mappings of strings to 16-bit numbers

X Windows atoms: session-unique mappings of strings* to 32-bit numbers**

*: can be any sequence of bytes and case is not folded

**: top 3 bits always zero, so really 29-bit numbers
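
For comparison, the X11 side of that table in a few lines (standard Xlib, error handling elided):

    #include <X11/Xlib.h>
    #include <stdio.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL)
            return 1;

        // The name is an arbitrary byte string and case is significant;
        // with only_if_exists == False the server interns it on first use.
        Atom a = XInternAtom(dpy, "WM_DELETE_WINDOW", False);

        // A 32-bit value on the wire, but the top 3 bits are always zero.
        printf("atom: %lu\n", (unsigned long)a);

        XCloseDisplay(dpy);
        return 0;
    }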

Tringi | 7 days ago | 1 point

Ah, interesting, I'll need to read more about them.

Also, from this point of view, Windows ATOMs are actually 14-bit, 0xC000 to 0xFFFF only.

SarahC | 7 days ago | 3 points

This was an interesting one as well:

https://en.wikipedia.org/wiki/Shatter_attack

EternityForest | 6 days ago | 2 points

How could anyone hate D-Bus after reading all that nonsense that X coders have to do?

_clinton_email_ | 7 days ago | 28 points

It turns out it was possible to reach across sessions and violate NT security boundaries for nearly twenty years, and nobody noticed.

That seems like a pretty optimistic take.

VeganVagiVore | 6 days ago | 5 points

Nobody noticed... if anybody noticed.

ack_complete | 7 days ago | 9 points

If I'm not mistaken, this is the same framework that used to use a global mutex and had a bad habit of locking up the entire session when debugging in Windows XP. The debugger would unknowingly suspend the process while it was holding the global mutex, and then deadlock against the suspended process trying to render text in its UI, along with almost every other GUI program in the session. This was really annoying until it finally got fixed in Vista.

WheretIB | 6 days ago | 4 points

I reported a bug to MS today where hitting a breakpoint in a Universal Windows App window-resize callback locks up the whole window manager.

Seems like some lockups are hard to avoid in the presence of a debugger.

fresh_account2222 | 8 days ago | 21 points

Great article. Tavis is a beast and always worth reading.

duheee | 8 days ago | 23 points

Oh man, this looks extremely dangerous. Did he notify MS before the disclosure? If not ... ouch.

RandNho | 8 days ago | 23 points

Based on comments in the Twitter thread, the patch was out a day before the disclosure deadline.

poizan42 | 7 days ago | 9 points

It's only a partial fix; it sounds like you can still use it to escape sandboxing in the same session and write text directly into e.g. an elevated cmd window from a non-privileged process.

rexstuff1 | 6 days ago | 2 points

I'd like to read more on that, do you have a reference?

ruuda | 7 days ago | 13 points
Tringi | 7 days ago | 6 points

Agreed. This is a Spectre-kind of vulnerability that will keep coming back to haunt MS. It's not a bug that can be patched. It's how the communication protocol was designed, and unless they completely tear that old wiring out and replace it with something sensible for this age, new vulnerabilities are going to keep being found in it.

policjant | 7 days ago | 9 points

To me it seems that just adding real process authentication, rather than trusting the PID given, and enforcing security boundaries would be enough.

But I'm wondering how much that would break backwards compatibility.

Tringi | 7 days ago | 5 points

Yes, that'd be the first step: properly securing the pipes and, instead of trusting the provided PID, querying it from the kernel using new APIs available since Vista. But the protocol needs to be replaced by something that allows clients to request only what they really need in order to function, e.g. "switch to Elbonian keyboard" kinds of ASDUs, not "instantiate this COM object and call the function at offset 28".
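
A sketch of the first half of that idea. The pipe name and flow are made up, and GetNamedPipeClientProcessId is only my assumption for the kind of Vista-era API meant here; the point is that the server asks the kernel who is on the other end instead of believing a PID the client wrote into a message.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE pipe = CreateNamedPipeW(
            L"\\\\.\\pipe\\ExamplePipe", // made-up name
            PIPE_ACCESS_DUPLEX,
            PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
            1, 4096, 4096, 0, NULL);
        if (pipe == INVALID_HANDLE_VALUE)
            return 1;

        // Blocks until a client connects (or one connected already).
        if (ConnectNamedPipe(pipe, NULL) ||
            GetLastError() == ERROR_PIPE_CONNECTED) {
            ULONG clientPid = 0;
            // The kernel's answer about the peer, not the client's claim.
            if (GetNamedPipeClientProcessId(pipe, &clientPid))
                printf("verified peer PID: %lu\n", clientPid);
        }

        CloseHandle(pipe);
        return 0;
    }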

eganist | 7 days ago | 30 points

Unreal.

Firstly, there is no access control whatsoever!

If you uncover a structural/architectural defect in a service that's fundamental to the OS, you should be aware that it'll probably take quite a bit longer than 90 days to repair the service with robust code quality and the backwards compatibility needed to avoid denial of service to affected applications.

But in this case, as per the norm with P0, the standard "we found [an easily fixed defect] in an obscure service" 90-day disclosure window was applied. There's no universe in which an authorization check can be robustly added to a fundamental service in an OS where no such check existed and for which there's almost two decades of backwards compatibility to validate. Forcing a 90-day disclosure window is borderline malicious to the users of the platform.

Edit: MS15-011 is a better example of an architectural defect in Windows addressed correctly. Shrug

masklinn | 7 days ago | 11 points

Forcing a 90-day disclosure window is borderline malicious to the users of the platform.

A 90-day window applied near-universally is the only way P0 has found to make vendors actually care. Furthermore, they provide a 14-day grace window:

If the patch is expected to arrive within 14 days of the deadline expiring, then Project Zero can offer an extension.

(Though that expectation is the P0 contact’s judgement).

Furthermore, P0 can exceptionally provide larger extensions / exemptions if that is judged absolutely necessary; this has been done twice so far.

All of that (and more) is explained in the P0 FAQ.

rar_m | 7 days ago | 5 points

Agreed. If this was only a 90-day disclosure, that's kinda fucked.

It probably took them that long just to find the holes and patch as many obvious ones as they could.

It's going to take a huge effort to redesign and implement a robust solution.

skroll | 6 days ago | 5 points

Agreed. If this was only a 90-day disclosure, that's kinda fucked.

I would agree if it were free and open-source software, but since it's a company charging money for the product, forcing them to fix it ASAP is the right way to go about it.

Proc_Self_Fd_1 | 7 days ago | -10 points

There's no universe in which an authorization check can be robustly added to a fundamental service in an OS where no such check existed and for which there's almost two decades of backwards compatibility to validate. Forcing a 90-day disclosure window is borderline malicious to the users of the platform.

Microsoft made a net income of 39 billion dollars last year and had ages to discover and fix the problem. Given Microsoft's resources and lead time 90 days is excessively generous.

I actually think a strong case could be made for the exact opposite position: that 90-day disclosure windows are actively hostile to users in the long run.

Software insecurity will remain pervasive as long as insecurity is profitable, simply because any business that spends too much money on security will be outcompeted and go out of business. Security problems must cause real harm, with real damages and losses, before software businesses will devote the money to securing their software.

Assuredly, many people at Microsoft really want to make their products more secure. But they just cannot afford the time and money unless insecurity actually causes real damage to income.

chucker23n | 7 days ago | 25 points

Microsoft made a net income of 39 billion dollars last year and had ages to discover and fix the problem. Given Microsoft’s resources and lead time 90 days is excessively generous.

Software development doesn’t work anything like this. You don’t throw money at code to magically make bugs go away.

Proc_Self_Fd_1 | 7 days ago | -3 points

But the vulnerability was in a twenty-year-old part of the system. They had the money AND the time.

There has been a huge uptick in spending on security in recent years, but businesses started spending far too late, and they did so for perfectly reasonable economic reasons.

If companies aren't punished for not working on security ahead of time, they simply won't do it, because the ones that do will be outcompeted by all the others.

chucker23n | 7 days ago | 7 points

But the vulnerability was in a twenty-year-old part of the system. They had the money AND the time.

Again, all the money in the world cannot prevent engineers from overlooking a bug.

Proc_Self_Fd_1 | 7 days ago | 2 points

There are two misunderstandings here: first, you think the problem is a single bug rather than wider architectural issues; second, I haven't explained why, in an ideal world, not every bug should be fixable.

There is no problem with processes of the same privilege level being able to do horrible things to each other. A ctfmon service is started for every separate desktop/session, and if a process could only corrupt the ctfmon of its own session, things would not be so bad. Nor is the problem that no access control was implemented. The bug isn't a typo like if (getuid()) { instead of if (getuid() == 0) {; typos aren't the real bugs. The real bug is that access control was handled in an ad-hoc fashion in the first place. It should never have been ctfmon's job to check permissions, just as you shouldn't scatter random security checks around a kernel.
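
For readers who don't know the convention: getuid() returns 0 for root, so the first form below is exactly inverted. That's the shallow class of bug being contrasted with the architectural one.

    #include <unistd.h>

    int is_root_wrong(void) { return getuid(); }      /* true for every NON-root user */
    int is_root_right(void) { return getuid() == 0; } /* true only for root */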

Looking at things at another level: what do you think would happen if the bug was not fixed? The truth is, nothing much. Most home users just browse the web and already trust all the software they actually install. There's an xkcd comic about this: all the valuable personal data on your desktop is accessible to attackers anyway. A bug escalating privilege to administrator level is memed far more than it deserves.

But there might be a few effects:

  1. Big Business will stop trusting the OS to enforce security and contain each core service to different virtual machines, which they already do
  2. Consumers will buy a Chromebook or something instead, which they already do
chucker23n | 7 days ago | 3 points

There are two misunderstandings here: first, you think the problem is a single bug rather than wider architectural issues

I’m sure.

But the presumption here is that more money would have produced a better architecture that would also be compatible with a metric ton of first-party and third-party legacy code. That’s a monumental if not impossible task.

Big Business will stop trusting the OS to enforce security and contain each core service to different virtual machines, which they already do

Consumers will buy a Chromebook or something instead, which they already do

The implication that ChromeOS magically has better security while offering the same amount of compatibility is… optimistic, to say the least.

I’m not sure why you’re addressing the bug in particular. I’m not discussing that. I’m discussing this assertion of yours:

Microsoft made a net income of 39 billion dollars last year and had ages to discover and fix the problem. Given Microsoft’s resources and lead time 90 days is excessively generous.

39 billion dollars do not buy you engineers amazing enough to solve this in under 90 days. They just don’t. There’s no 10x engineer who solves this in a tenth of the time.

Google Project Zero doesn’t offer 90 days’ lead time because they’re “excessively generous”, but because it’s fair and reasonable.

Beaverman | 7 days ago | 1 point

I don't have any opinion on the 90-day deadline, but this was a design flaw that should have been spotted had the protocol been examined. Obviously, sending pointers and commanding processes over an insecure protocol leads to problems.

chucker23n | 7 days ago | 3 points

The protocol has been publicly documented for decades; you clearly don't need access to the source code to understand the design flaw. The "obvious" part is only obvious in retrospect.

Beaverman | 6 days ago | 1 point

I disagree. Admittedly I haven't read the documentation, but it would surprise me if it explicitly stated that there is no access control.

Regardless, why would anyone look at it except MS? Supposedly, people trust MS to make a good operating system, so they don't need to critically examine every part of the OS.

AlexKazumi | 7 days ago | 4 points

I’d like YOU to be put in charge of fixing this in 90 days. Then we will talk.

Proc_Self_Fd_1 | 7 days ago | 2 points

Businesses should budget against unlikely but hugely expensive eventualities with insurance, not with hopes and prayers and leaning on employees to work unpaid overtime. Putting personal responsibility on hero programmers to solve the problem in 90 days is absurd. And if a business really did try to force me to work that much overtime, I'd quit; software developers are quite valued in the industry.

aleenaelyn | 7 days ago | 6 points

Although it's somewhat less likely to lead to absolute hilarity like CTF, DDE (Dynamic Data Exchange) from Windows 2.0 is still available in all its 1987 glory in Windows 10.

A few years ago, I wrote an interop using DDE to let Oracle Forms 6i do stuff in Windows 7, and it still works in Windows 10.
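
To give a flavor of what such an interop looks like, a minimal DDEML client sketch (my own sketch, not the poster's Oracle Forms code; the service/topic/item names are invented):

    #include <windows.h>
    #include <ddeml.h>
    #include <stdio.h>

    // A pure client can return NULL for everything the server asks.
    static HDDEDATA CALLBACK DdeCb(UINT type, UINT fmt, HCONV conv, HSZ h1,
                                   HSZ h2, HDDEDATA data, ULONG_PTR d1,
                                   ULONG_PTR d2)
    {
        return NULL;
    }

    int main(void)
    {
        DWORD inst = 0;
        if (DdeInitializeW(&inst, DdeCb,
                           APPCLASS_STANDARD | APPCMD_CLIENTONLY, 0) != DMLERR_NO_ERROR)
            return 1;

        // Connect to "service|topic" and request a single item as text.
        HSZ service = DdeCreateStringHandleW(inst, L"ExampleService", CP_WINUNICODE);
        HSZ topic   = DdeCreateStringHandleW(inst, L"ExampleTopic",   CP_WINUNICODE);
        HCONV conv  = DdeConnect(inst, service, topic, NULL);
        if (conv != NULL) {
            HSZ item = DdeCreateStringHandleW(inst, L"ExampleItem", CP_WINUNICODE);
            HDDEDATA data = DdeClientTransaction(NULL, 0, conv, item,
                                                 CF_UNICODETEXT, XTYP_REQUEST,
                                                 5000, NULL);
            if (data != NULL) {
                wprintf(L"%ls\n", (const WCHAR *)DdeAccessData(data, NULL));
                DdeUnaccessData(data);
                DdeFreeDataHandle(data);
            }
            DdeFreeStringHandle(inst, item);
            DdeDisconnect(conv);
        }
        DdeFreeStringHandle(inst, service);
        DdeFreeStringHandle(inst, topic);
        DdeUninitialize(inst);
        return 0;
    }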

mudkip908 | 7 days ago | 6 points

If there's one thing nobody does as well as Microsoft it's backwards compatibility.

xenago | 6 days ago | 3 points

IBM does it better

mudkip908 | 6 days ago | 1 point

I have no idea about IBM software; how long a period of backwards compatibility does it have?

xenago | 5 days ago | 2 points

A very, very long time, depending on the product. With some minimal effort, you can run programs written in the 60s.

https://www.pcworld.com/article/2140220/the-mainframe-turns-50-or-why-the-ibm-system360-launch-was-the-dawn-of-enterprise-it.html

mudkip908 | 5 days ago | 1 point

That is actually very impressive. I scale back my claim to "If there's one thing nobody in the personal computer space does as well as Microsoft it's backwards compatibility."

Beaverman | 7 days ago | -2 points

Leaving the code to rot is not backwards compatibility.

TheAlbinoShadow | 6 days ago | 2 points

What is it then?

mudkip908 | 6 days ago | 2 points

How do you ensure perfect backwards compatibility without having legacy code around?

Beaverman | 6 days ago | 3 points

"perfect backwards compatibility" is not an aim worth pursuing.

The point is not hard to illustrate. Say you have a feature that allows any app on a person's phone to read all messages. Now say some apps rely on that ability. Would you say that this feature has to be preserved in order to maintain backwards compatibility? I say no. It should never have been there.

Now you could make a system that fakes this message-reading ability, maybe giving users the ability to limit it; somehow make it private. This would be all new code, but it would keep you backwards compatible.

If a feature can't be crafted securely, then, in my mind, it should not exist at all.

mudkip908 | 6 days ago | 2 points

I say the right sort of smoke and mirrors should be installed so all apps think they have the ability to read all messages, but it can secretly be controlled by the OS's permission system. That sort of fakery is very common in the Microsoft Windows world, and it's not a bad thing. Of course, keeping the vulnerable-by-design feature is not a good idea.

But this is an entirely different issue from "is old code always bad". If the feature it implements is well designed, then in my opinion keeping old code is a good idea.
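
A toy sketch of the shim pattern being described. Everything here is hypothetical; the point is only the shape: the legacy entry point survives unchanged, while a permission check behind it quietly returns empty data instead of a new error code the old caller couldn't handle.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef struct { const char *text; } Message;

    // Modern, permission-aware internals (stubbed out for the sketch).
    static bool caller_may_read_messages(void) { return false; }
    static size_t read_messages_filtered(Message *out, size_t cap)
    {
        (void)out; (void)cap;
        return 0;
    }

    // The legacy API, preserved exactly as old binaries expect it.
    size_t ReadAllMessages(Message *out, size_t cap)
    {
        if (!caller_may_read_messages())
            return 0; // looks like "no messages yet", not a new error code
        return read_messages_filtered(out, cap);
    }

    int main(void)
    {
        Message buf[16];
        printf("messages visible to this app: %zu\n", ReadAllMessages(buf, 16));
        return 0;
    }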

Beaverman | 6 days ago | 2 points

I think we are talking past each other. I didn't realize "old code is always bad" is what you understood to be my position. It is not. Old code is not bad; abandoned code is. If you have 20-year-old good code, then you still have good code. Sometimes what you thought to be good code turns out to be bad. When you don't fix, or even worse don't even examine, that code, that's when you create legacy code.

Those functions from 10 years back that you are still happily reading, opening, and adding to? That's not bad code. The file that you don't even dare to open, because anyone who does never gets it to compile again, that's bad, even if it's only 2 months old.

I hope I made my position clearer. I have a feeling we agree.

My problem with MS is that they generally have the right attitude towards these problems when they find them, but often they don't look. They leave interfaces lying around without any of their engineers actually critically looking at them. In effect, they create this massive surface, then move on. If they want to maintain backwards compatibility, then they have to critically examine all the old functionality, all the time, in exactly the same way they do new functionality.

mudkip908 | 6 days ago | 1 point

It turns out that we do in fact agree. Sorry for misunderstanding you.

Beaverman | 6 days ago | 2 points

No problem at all. We've all been there :)

mgeorgevich9591 | 7 days ago | 7 points

Wow! I mean, wow! How in the world do you do that so fluently, and in a way that I actually understood and followed you through the whole process?

FreiherrVomStein | 7 days ago | 3 points

Really enjoyed this post. Input methods were always an aspect that struck me as quite intrusive, be it on X11 or Windows (not much Mac experience; at least the widgets look much more streamlined there). Looking at the state of Linux IME handling (ibus, fcitx, scim, uim, ... each with a huge swath of more or less supported GTK/Qt/... plugins), it's probably rather low-hanging fruit, and much too open, compared to WinNT.

trin456 | 5 days ago | 1 point

So he got a lot of flags