This blog post is the result of research I did back in July of 2016 but was not able to publish until now. In June of 2016 Theori published a blog post on an Internet Explorer vulnerability which was patched in MS16-063. The exploit they wrote targeted Internet Explorer 11 on Windows 7, and as their own blog post indicates, it will not work on Windows 10 due to the mitigation technology called Control Flow Guard. This blog post describes how I ported the exploit to Windows 10 and bypassed CFG. In fact I found another method as well, which will be covered in an upcoming blog post.
Understanding the Enemy – Control Flow Guard
Control Flow Guard (CFG) is a mitigation implemented by Microsoft in Windows 8.1 Update 3 and Windows 10 which attempts to protect indirect calls at the assembly level. Trend Micro has published a good analysis of how CFG is implemented on Windows 10. Several bypasses have already been published for CFG, but most of them have targeted the CFG implementation algorithms themselves, while I wanted to look at weaknesses in the functionality. As Theori wrote in their blog post, the exploit technique from Windows 7 will not work due to the presence of CFG. Let us look closer at why, and try to find a way around it.
The exploit code from the Theori github works on Internet Explorer on Windows 10 up until the overwritten virtual function table is called. So, we are left with the question of how to leverage the arbitrary read/write primitive to bypass CFG. According to the research by Trend Micro, CFG is invoked by the function LdrpValidateUserCallTarget, which validates whether a function is a valid target for an indirect call. It looks like this:
The pointer loaded into EDX is the base pointer of the validation bitmap, which in this case is:
The address of the function being validated is then loaded into ECX. If kernel32!VirtualProtectStub is taken as an example, the address in this case is:
The address is then shifted right by 8 bits and used to load the DWORD which holds the validation bit for that address, in this case:
The function address is then shifted right by 3 bits and a bit test is performed. This essentially takes the shifted address modulo 0x20, which gives the bit to check in the DWORD from the validation bitmap, so in this case:
So the relevant bit is at offset 0x14 in:
Which means that it is valid, so VirtualProtect is a valid call target. However, this does not really solve the problem, since the arguments for it must be supplied by the attacker as well. Normally this is done in a ROP chain, but any bytes not stemming from the beginning of a function are not valid. So, the solution is to find a function which may be called, where the arguments can be controlled and where the functionality of the function gives the attacker an advantage. This requires us to look closer at the exploit.
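To make the bitmap lookup described above concrete, here is a minimal Python sketch of the same arithmetic. It only models the steps covered in this post (shift by 8 to pick the DWORD, shift by 3 and take the low five bits to pick the bit) for a 16-byte aligned address. The bitmap base, the target address and the helper names are hypothetical values chosen for illustration, and the bitmap is simulated with a dictionary rather than read from a real process.

```python
def is_valid_call_target(read_dword, bitmap_base, address):
    """Emulate the bitmap lookup for a 16-byte aligned address."""
    # Each DWORD in the bitmap covers 0x100 bytes of address space.
    dword = read_dword(bitmap_base + (address >> 8) * 4)
    # Shifting right by 3 and keeping the low five bits selects bit 0-31.
    bit = (address >> 3) & 0x1F
    return bool((dword >> bit) & 1)


# Hypothetical values, for illustration only.
BITMAP_BASE = 0x01000000
TARGET = 0x77001BA0

# Simulated bitmap: only the bit belonging to TARGET is set.
fake_bitmap = {BITMAP_BASE + (TARGET >> 8) * 4: 1 << 0x14}


def read(addr):
    """Stand-in for the read primitive; unmapped entries read as zero."""
    return fake_bitmap.get(addr, 0)


print(hex((TARGET >> 3) & 0x1F))
print(is_valid_call_target(read, BITMAP_BASE, TARGET))
print(is_valid_call_target(read, BITMAP_BASE, TARGET + 0x10))
```

The last call shows why a ROP gadget in the middle of a function fails the check: its address maps to a bit that was never set.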
Exploit on Windows 10
In the exploit supplied by Theori, code execution is achieved by overwriting the virtual function table of the TypedArray with a stack pivot gadget. Since this is no longer possible, it is worth looking into the functions available to a TypedArray. While doing this, the following two functions seem interesting:
The API I located which could be used is RtlCaptureContext, which is present in kernel32.dll, kernelbase.dll and ntdll.dll. The API takes one argument, which is a pointer to a CONTEXT structure, as shown on MSDN:
A CONTEXT structure holds a dump of all the registers including ESP. Furthermore, the input value is just a pointer to a buffer which can hold the data. Looking at the layout of a TypedArray object the following appears:
The first DWORD is the vtable pointer, which can be overwritten to point at a fake vtable holding the address of the RtlCaptureContext API at offset 0x7C, while the DWORD at offset 0x20 is the pointer to the actual data of the TypedArray where the size is user controlled:
Since it is also possible to leak the address of this buffer, it can serve as the parameter for RtlCaptureContext. To accomplish this, a fake vtable now has to be created with a pointer to ntdll!RtlCaptureContext at offset 0x7C. That means leaking the address of RtlCaptureContext, which in turn means leaking the address of ntdll.dll. The default route of performing this would be to use the address of the vtable which is a pointer into jscript9.dll:
From this pointer, iterate back 0x1000 bytes at a time looking for the MZ header, then go through the import table looking for a pointer into kernelbase.dll. Do the same for that pointer to gain the base address of kernelbase.dll, then look through its import table for a pointer into ntdll.dll, again get the base address, and finally look up the exported functions to find RtlCaptureContext. While this method is perfectly valid, it does have a drawback: if EMET is installed on the system it will trigger a crash, since code coming from jscript9.dll, which our read/write primitive does, is not allowed to read data from the PE header or to go through the export table. To get around that I used a different technique. Remember that every indirect call protected by CFG calls ntdll!LdrpValidateUserCallTarget, and since jscript9.dll is protected by CFG, any function with an indirect call contains a pointer directly into ntdll.dll. One such function is at offset 0x10 in the vtable:
Using the read primitive, the pointer to ntdll.dll may then be found through the following function:
Going from a pointer into ntdll.dll to the address of RtlCaptureContext without looking at the export tables may be accomplished by using the read primitive to search for a signature or hash. RtlCaptureContext looks like this:
The first 0x30 bytes always stay the same and are fairly distinctive, so they may be used as a collision-free hash when added together as seen below:
The function takes a pointer into ntdll.dll as its argument.
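The idea of locating a function by summing its first 0x30 bytes can be sketched in Python against a simulated memory image. This is an illustration under assumptions, not the code from the proof of concept: summing the bytes as twelve little-endian DWORDs, scanning in 0x10-byte steps, and the names window_sum and find_by_sum are all my own choices here, and the signature bytes are placeholders rather than the real RtlCaptureContext prologue.

```python
import struct

WINDOW = 0x30  # number of signature bytes, as in the text


def window_sum(memory, offset):
    """Add the twelve little-endian DWORDs starting at offset, modulo 2**32."""
    dwords = struct.unpack_from("<12I", memory, offset)
    return sum(dwords) & 0xFFFFFFFF


def find_by_sum(memory, start, expected, step=0x10):
    """Scan forward from start; return the first offset whose sum matches."""
    for offset in range(start, len(memory) - WINDOW + 1, step):
        if window_sum(memory, offset) == expected:
            return offset
    return None


# Simulated module image with placeholder bytes at a hypothetical offset.
signature = bytes(range(0x50, 0x50 + WINDOW))
image = bytearray(0x400)
image[0x230:0x230 + WINDOW] = signature

expected = window_sum(signature, 0)
print(hex(find_by_sum(bytes(image), 0, expected)))
```

A plain sum is not collision free in general, so the expected value has to be checked against the specific ntdll.dll build being targeted.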
Putting all of this together gives:
From here, offset 0x200 of the buffer contains the results from RtlCaptureContext. Viewing it shows:
From the above it is clear that stack pointers have been leaked. It is now a matter of finding an address to overwrite which will give execution control. Looking at the top of the stack shows:
Which is the current function return address. This address is placed at an offset of 0x40 bytes from the leaked pointer found at offset 0x9C in the RtlCaptureContext output. With a bit of luck this offset will be the same for other simple functions, so it should be possible to invoke the write primitive and make it overwrite its own return address, thus bypassing CFG.
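The offset arithmetic above can be written out as a short Python sketch over a simulated buffer. The three constants come from the text; the function name, the example stack pointer and the direction of the 0x40 adjustment (addition is assumed here) are assumptions made for illustration, so the sign should be confirmed in a debugger before relying on it.

```python
import struct

CONTEXT_OFFSET = 0x200  # where the RtlCaptureContext output starts in the buffer
LEAK_OFFSET = 0x9C      # offset of the leaked stack pointer in that output
RETURN_DELTA = 0x40     # distance from the leaked pointer to the return address


def return_address_slot(buffer):
    """Compute the stack address holding the return address to overwrite."""
    (leaked,) = struct.unpack_from("<I", buffer, CONTEXT_OFFSET + LEAK_OFFSET)
    # Assumption: the slot lies above the leaked pointer, hence the addition.
    return (leaked + RETURN_DELTA) & 0xFFFFFFFF


# Simulated TypedArray buffer with a hypothetical leaked stack pointer.
buffer = bytearray(0x400)
struct.pack_into("<I", buffer, CONTEXT_OFFSET + LEAK_OFFSET, 0x0532F000)

print(hex(return_address_slot(buffer)))
```

In the real exploit the resulting address would be handed to the write primitive; here it is only printed.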
The addition to the exploit is shown below:
When run, this does show EIP control:
Furthermore, the writes to offset 0x40 and 0x44 are now placed at the top of the stack, which allows for creating a stack pivot and then a ROP chain. One way could be to use a POP EAX gadget followed by an XCHG EAX, ESP gadget.
Microsoft has stated that CFG bypasses which corrupt return addresses on the stack are a known design limitation and hence not eligible for fixes or any kind of bug bounty, as shown here:
With that said, Microsoft has done two things to mitigate this technique. First, in the upcoming version of Windows 10, Return Flow Guard will be implemented, which is seen as a way to stop stack corruptions from giving execution control. The other is the introduction of sensitive APIs in the Anniversary edition release of Windows 10. It only protects Microsoft Edge, so it would not help in this case, but it does block the RtlCaptureContext API on Microsoft Edge.
If you made it this far, thanks for reading. The proof of concept code can be found on: https://github.com/MortenSchenk/RtlCaptureContext-CFG-Bypass