This blog post is part two of Bypassing Control Flow Guard in Windows 10. It is also the result of research I did back in July 2016 but did not have the opportunity to publish until now. The same Internet Explorer vulnerability is used, with the same original proof of concept by Theori. This post presents another method of bypassing CFG, which still works in Internet Explorer but not in Microsoft Edge, due to suppressed APIs. It is assumed that the reader has read the previous blog post, so the details of CFG are not explained, and I jump right to the point of having an arbitrary read/write primitive.
Looking for Another CFG Bypass
In the last blog post I leaked the registers, including a stack pointer, which allowed me to overwrite the return address of the write primitive. Another generic approach to bypassing CFG is to start a ROP chain whose first gadget comes from a DLL compiled without CFG. This works because the CFG validation bitmap for a module compiled without CFG marks every address in it as valid. The problem, however, is that Microsoft has compiled all the modules loaded by Internet Explorer and Microsoft Edge with CFG. If a plugin or third-party application has a module loaded in the process and that module is compiled without CFG, it becomes an attack vector. I wanted to find a native way to do it without relying on any third-party modules. This raises the question: are all native modules in C:\Windows\System32 compiled with CFG? The answer is no, not by a long shot. To find the native modules that are not compiled with CFG, I wrote the following Python script:
This script finds all DLL files in C:\Windows\System32 and checks their DLL characteristics, which contain a flag (IMAGE_DLLCHARACTERISTICS_GUARD_CF, 0x4000) indicating whether the DLL was compiled with CFG.
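The same check can be sketched with nothing but the standard PE layout. This is a minimal sketch and not the author's original script. It reads each DLL's header, follows e_lfanew to the optional header, and tests the IMAGE_DLLCHARACTERISTICS_GUARD_CF bit in DllCharacteristics. The function names are hypothetical.

```python
import os
import struct
import sys

IMAGE_DLLCHARACTERISTICS_GUARD_CF = 0x4000
# DllCharacteristics sits at the same offset (70) in PE32 and PE32+ optional headers.
DLLCHARACTERISTICS_OFFSET = 70


def is_cfg_compiled(image: bytes) -> bool:
    """Return True if the PE header in `image` has the CFG flag set in DllCharacteristics."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte COFF file header.
    optional_header = e_lfanew + 4 + 20
    (characteristics,) = struct.unpack_from(
        "<H", image, optional_header + DLLCHARACTERISTICS_OFFSET)
    return bool(characteristics & IMAGE_DLLCHARACTERISTICS_GUARD_CF)


def non_cfg_dlls(directory: str) -> list[str]:
    """List the DLL file names in `directory` whose header lacks the CFG flag."""
    found = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".dll"):
            continue
        try:
            with open(os.path.join(directory, name), "rb") as handle:
                header = handle.read(4096)  # PE headers normally fit in the first page
            if not is_cfg_compiled(header):
                found.append(name)
        except (OSError, ValueError, struct.error):
            continue  # unreadable file or malformed/oversized header
    return found


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else r"C:\Windows\System32"
    if os.path.isdir(target):
        names = non_cfg_dlls(target)
        print("\n".join(names))
        print(f"{len(names)} DLLs without CFG")
```

Reading only the first 4096 bytes keeps the scan fast. A file whose header lies further in is skipped rather than misreported.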
Running it on my PoC machine with Windows 10 1511 gave 145 unique DLLs that are not compiled with CFG. Which one to use depends on the ROP gadgets available in it. I found that quite a few of these DLLs do not contain any useful gadgets, so the ones with a large code size are the most interesting. The one I ended up using was mfc40.dll:
Loading a DLL
Now that we have found a native DLL that might contain useful gadgets and is not compiled with CFG, we face the problem that this DLL is not loaded in the browser process by default. We can simply load it into the process using the kernelbase!LoadLibraryA API, but first let us verify that CFG allows this call. Below is the verification bitmap:
From that bitmap we follow the algorithm of ntdll!LdrpValidateUserCallTarget:
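Here is a simplified model of the bit lookup for readers who skipped part one. It assumes the commonly described 32-bit Windows 10 behaviour: each 32-bit bitmap entry covers 256 bytes of address space, and the bit index comes from bits 3 to 7 of the target. A target that is not 16-byte aligned is checked against the odd bit of its pair. A sparse Python dict stands in for the bitmap in process memory, and the function names are hypothetical.

```python
BITS_PER_ENTRY = 32


def cfg_bit_position(target: int) -> tuple[int, int]:
    """Return (entry_index, bit_index) for a 32-bit call target."""
    entry_index = target >> 8                    # one 32-bit entry per 256 bytes
    bit_index = (target >> 3) % BITS_PER_ENTRY   # bt with a register offset works mod 32
    if target & 0xF:                             # not 16-byte aligned: use the odd bit
        bit_index |= 1
    return entry_index, bit_index


def is_valid_call_target(bitmap: dict[int, int], target: int) -> bool:
    """Check a target against a sparse bitmap model (entry index -> 32-bit entry)."""
    entry_index, bit_index = cfg_bit_position(target)
    return bool((bitmap.get(entry_index, 0) >> bit_index) & 1)
```

In this model, an all-ones entry makes every target in the corresponding 256 bytes valid. That is the effect described above for modules compiled without CFG.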
Following the algorithm, we can see that the bit at offset 4 is 1, hence LoadLibraryA is a valid indirect call target. The next step is locating the API. This can be done by first leaking a pointer into kernelbase.dll and then locating the function inside that DLL. The first part is done by locating the function Segment::Initialize in jscript9, since it uses kernel32!VirtualAllocStub, which in turn uses kernelbase!VirtualAlloc. I find the function by scanning through jscript9 from the address of the vtable and calculating a hash at each position, using the read primitive. The algorithm is shown below:
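The following is a minimal sketch of this kind of scan. It follows the description in this post: the hash is assumed to be the plain sum of five consecutive little-endian DWORDs, and the window slides one byte at a time. A local byte buffer replaces reads through the exploit primitive. The function names are hypothetical, and the real target hash values are not reproduced here.

```python
import struct

WINDOW_DWORDS = 5


def window_hash(memory: bytes, offset: int, count: int = WINDOW_DWORDS) -> int:
    """Sum `count` little-endian DWORDs starting at `offset`."""
    return sum(struct.unpack_from(f"<{count}I", memory, offset))


def find_by_hash(memory: bytes, start: int, target: int,
                 count: int = WINDOW_DWORDS) -> int:
    """Slide forward one byte at a time; return the first offset whose hash matches, else -1."""
    last = len(memory) - 4 * count
    for offset in range(start, last + 1):
        if window_hash(memory, offset, count) == target:
            return offset
    return -1
```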
The hash is computed by adding five consecutive DWORDs, moving one byte forward each time until the expected hash is found. Despite its simplicity, this hash function turns out to be quite collision-free in practice. The dereferenced call to kernel32!VirtualAlloc is then at offset 0x37 in the Segment::Initialize function, as shown below:
This pointer can be read out to get:
This in turn contains the dereferenced jump to kernelbase!VirtualAlloc at offset 0x6:
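Both steps amount to decoding an x86 indirect branch and then reading the pointer slot it references. Here is a minimal sketch, assuming the call is encoded as FF 15 imm32 (call dword ptr [imm32]) and the jump as FF 25 imm32 (jmp dword ptr [imm32]). It also assumes that the quoted offsets point at the opcode bytes. The byte strings in the tests are made up, and the helper name is hypothetical.

```python
import struct

# x86 indirect branches through an absolute memory operand: <opcode> <modrm> <imm32>
INDIRECT_BRANCHES = {b"\xff\x15": "call", b"\xff\x25": "jmp"}


def indirect_slot(code: bytes, offset: int) -> tuple[str, int]:
    """Decode call/jmp dword ptr [imm32] at `offset`; return (mnemonic, slot address)."""
    opcode = bytes(code[offset:offset + 2])
    if opcode not in INDIRECT_BRANCHES:
        raise ValueError(f"no indirect call/jmp at offset {offset:#x}")
    (slot,) = struct.unpack_from("<I", code, offset + 2)
    return INDIRECT_BRANCHES[opcode], slot
```

Reading the DWORD stored at the returned slot address through the read primitive then yields the actual branch target. This is one hop per step in the chain from Segment::Initialize to kernel32 to kernelbase.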
Now that we have a pointer into kernelbase.dll, we can locate the address of LoadLibraryA using another hash value, as seen below:
The API is then called by overwriting offset 0x7C in the vtable, the HasItem entry, with the address of LoadLibraryA. HasItem is chosen because it takes a single argument, which fits LoadLibraryA. A pointer to the name of the DLL has to be supplied as that argument. The string C:\Windows\System32\mfc40.dll must therefore be written into the TypedArray buffer, as shown below:
The implementation is shown below:
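Writing the path through a 32-bit write primitive means splitting it into little-endian DWORDs, including the terminating NUL byte that LoadLibraryA expects. Here is a minimal sketch of that packing step. It is not the original implementation, and the helper name is hypothetical.

```python
import struct


def pack_c_string(text: str) -> list[int]:
    """Split a NUL-terminated ASCII string into little-endian DWORDs, zero-padded."""
    raw = text.encode("ascii") + b"\x00"
    raw += b"\x00" * (-len(raw) % 4)  # pad to a whole number of DWORDs
    return [value for (value,) in struct.iter_unpack("<I", raw)]


if __name__ == "__main__":
    for i, dword in enumerate(pack_c_string(r"C:\Windows\System32\mfc40.dll")):
        print(f"+{4 * i:#04x}: {dword:#010x}")
```

The 29-character path plus its terminator occupies 30 bytes, which pads to eight DWORDs.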
Now we can load the DLL into the process:
Running it shows that the module is now loaded into the process:
Interacting With a DLL
We have now made sure that a DLL compiled without CFG is loaded into the process space. This allows us to use it for stack pivot gadgets and thus bypass CFG. However, we need to know the address of mfc40.dll in the process space. Unfortunately, the HasItem API only returns a Boolean. So even though LoadLibraryA returns the load address, it is filtered out before control returns to the attacker. We have to find another way to leak the address of the module. One way to accomplish this is to leak the address of the PEB, since it holds a pointer to a linked list of all modules loaded into the process.
So the problem becomes finding the PEB instead of the module. I chose to find it using the IsBadCodePtr API, which has the following syntax according to MSDN:
Given a memory address, the API returns a Boolean indicating whether the calling process has read access to it. This lets us use the read primitive without fear of causing an exception. The idea is then to search through the process address space and, for each allocated memory page, test whether it is one of the TEBs, and from that find the PEB. On Windows 8.1 the TEBs were always located between 0x7F000000 and 0x7FFE0000. On Windows 10, however, their location is much more randomized: according to my testing, they may now lie anywhere between 0x100000 and 0x4000000. That is 0x3F00000 bytes of address space. Since we only want to know whether the memory is allocated, it is enough to test once per 0x1000-byte page, which means 0x3F00 tests. First we need to find the address of IsBadCodePtr in kernel32, which looks like this:
This means first leaking a pointer into kernel32.dll and then using a hash to search for the function. The pointer into kernel32.dll can be found with the same method used for kernelbase.dll. Since we already go through kernel32!VirtualAllocStub on the way to kernelbase!VirtualAlloc, we can simply stop one step earlier and return that pointer. The code for finding IsBadCodePtr looks like this:
Now that we are able to call the API, the question is what to look for in a valid memory page. Looking at the PEB and TEBs in memory, we get the following dump:
From this it is clear that the TEBs are always close to each other in memory, and quite a few of them occupy consecutive memory pages. Furthermore, looking at the structure of the TEB:
The DWORD at offset 0x18 contains the address of the TEB itself, and the DWORD at offset 0x30 contains the address of the PEB, which is the same for all TEBs. To sum up, when we find a valid memory page we check for the following properties:
- Offset 0x18 must contain the base address of the page.
- Two consecutive allocations must both be valid.
- Offset 0x30 of both allocations must contain the same value.
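The three checks can be expressed as a short scan. This is a simulation rather than the original JavaScript. A dict of mapped pages stands in for the process address space, and read_dword() stands in for the read primitive. is_readable() stands in for IsBadCodePtr, which actually returns nonzero for unreadable memory. All class and function names are hypothetical. The sketch also assumes that "two consecutive allocations" means two adjacent 0x1000-byte pages.

```python
import struct

PAGE_SIZE = 0x1000
TEB_SELF = 0x18  # NT_TIB.Self: the TEB's own address (32-bit TEB)
TEB_PEB = 0x30   # ProcessEnvironmentBlock (32-bit TEB)


class SimulatedProcess:
    """Hypothetical model of the exploit's view of memory: mapped pages only."""

    def __init__(self, pages: dict[int, bytes]):
        self.pages = pages  # page base -> PAGE_SIZE bytes of content

    def is_readable(self, address: int) -> bool:
        # Plays the role of IsBadCodePtr, with the result inverted.
        return (address - address % PAGE_SIZE) in self.pages

    def read_dword(self, address: int) -> int:
        base = address - address % PAGE_SIZE
        return struct.unpack_from("<I", self.pages[base], address - base)[0]


def find_peb(proc: SimulatedProcess, start: int = 0x100000, end: int = 0x4000000):
    """Return the PEB address from the first pair of pages that look like TEBs."""
    for page in range(start, end, PAGE_SIZE):
        following = page + PAGE_SIZE
        if not (proc.is_readable(page) and proc.is_readable(following)):
            continue
        if proc.read_dword(page + TEB_SELF) != page:
            continue
        peb = proc.read_dword(page + TEB_PEB)
        if proc.read_dword(following + TEB_PEB) == peb:
            return peb
    return None
```

Checking the self-pointer first rejects most unrelated readable pages before the second page is even read.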
The search over the whole candidate range may then be implemented as follows:
When running the code, the debugger catches the following exception:
The reason is that IsBadCodePtr registers a structured exception handler and then tries to read the memory. If the memory is not readable, the handler catches the resulting exception, which gives the answer. For us, however, this makes debugging harder, since each of the up to 0x3F00 calls that probes unmapped memory will pause the debugger at an exception. To verify that the code works, it is therefore run without a debugger, with an alert showing the address of the PEB once it is found, which looks like this:
To confirm the result, the debugger is then attached to the process and the address of the PEB is looked up:
This shows that the algorithm works. Note that IsBadCodePtr is the reason this method does not work in Microsoft Edge, since it is blocked there by the suppressed API mitigation. However, any other method of leaking the PEB would re-enable this technique.
Finding the DLL
To locate the DLL base address, we take advantage of the PEB_LDR_DATA structure, a pointer to which is stored at offset 0xC in the PEB:
This structure looks like this:
This means that at offsets 0xC, 0x14 and 0x1C we have three linked lists (InLoadOrderModuleList, InMemoryOrderModuleList and InInitializationOrderModuleList), each containing the DLLs currently loaded in the process. We are going to use the one at 0x1C. Below is shown how to find the DLL name from the PEB:
The procedure relies on the layout of each entry in this list: offset 0x18 contains a pointer to the name of the DLL, and offset 0x8 contains the base address of the module. The first DWORD is the pointer to the next element. It is then a matter of iterating through the list until mfc40.dll is found, which is performed by the code below:
Here the function getName reads the Unicode name and converts it to ASCII so that the names can be compared. When run, we get:
This shows that we have found the DLL, and we can now use it for ROP gadgets.
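The list walk can be sketched against a flat byte buffer that stands in for process memory and replaces the exploit's read primitive. The sketch uses the offsets given above:

- Ldr at PEB+0xC, with the list head at +0x1C
- the module base at entry+0x8
- the name pointer at entry+0x18
- the forward link as the first DWORD

read_dword(), read_wide_string() and find_module_base() are hypothetical names.

```python
import struct

PEB_LDR = 0x0C            # PEB.Ldr (pointer to PEB_LDR_DATA)
LDR_INIT_ORDER = 0x1C     # PEB_LDR_DATA.InInitializationOrderModuleList
ENTRY_DLL_BASE = 0x08     # module base, relative to the list links
ENTRY_NAME_BUFFER = 0x18  # pointer to the UTF-16 DLL name, relative to the list links


def read_dword(memory: bytes, address: int) -> int:
    return struct.unpack_from("<I", memory, address)[0]


def read_wide_string(memory: bytes, address: int) -> str:
    """Read a NUL-terminated UTF-16LE string (the role getName plays in the post)."""
    end = address
    while end + 1 < len(memory) and memory[end:end + 2] != b"\x00\x00":
        end += 2
    return memory[address:end].decode("utf-16-le")


def find_module_base(memory: bytes, peb: int, module: str):
    """Walk the initialization-order list until `module` is found; return its base."""
    head = read_dword(memory, peb + PEB_LDR) + LDR_INIT_ORDER
    entry = read_dword(memory, head)  # first DWORD of the head is the forward link
    while entry != head:
        name = read_wide_string(memory, read_dword(memory, entry + ENTRY_NAME_BUFFER))
        if name.rsplit("\\", 1)[-1].lower() == module.lower():
            return read_dword(memory, entry + ENTRY_DLL_BASE)
        entry = read_dword(memory, entry)  # follow the forward link
    return None
```

Comparing only the final path component avoids matching a module whose name merely ends in mfc40.dll.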
The final piece is to find a ROP gadget that allows us to begin executing the ROP chain and in turn bypass CFG. Finding the right gadget can be tricky and depends on how EIP control is obtained. I ended up using the subarray call, which Theori also used in the original PoC. In assembly it looks like this:
The subarray function is at offset 0x188 in the vtable and takes one or two parameters; if only one is specified, the other takes a default value. In the following screenshot I overwrote the subarray entry with the address of IsBadCodePtr from before and put a breakpoint on it:
From this we can see that both EBX and ECX contain a pointer to the object with the overwritten length. Furthermore, the two supplied arguments are at ESP+4 and ESP+8. While hunting for ways to take advantage of this, I came across the following gadget:
Since we control the second DWORD on the stack, we can place the address of a stack pivot gadget there. CFG only validates indirect call targets, not addresses on the stack, so we may look for a stack pivot gadget in any loaded DLL. The following gadget in kernelbase.dll can serve as our stack pivot:
Since ECX also points to the object, we control the content at offset 0xD8. If we place the address of our ROP chain there, it will allow us to call VirtualProtect. First, we need to find the addresses of these gadgets dynamically. This is done by searching through the DLL, this time not for a hash but for the bytes the gadget consists of:
When both gadgets have been found, we insert them at the correct offsets from the start of the object, as seen below:
When run we get:
Here we can see that the CFG bypass gadget did indeed bypass CFG and that the stack pivot gadget was placed on the stack. Furthermore, when we step through the instruction, we do end up with a stack pivot:
The ROP gadget buffer is empty since no gadgets have been placed there yet.
It is clear that non-CFG modules are a threat and that a single non-CFG DLL in the process space provides a free CFG bypass. Microsoft did suppress the IsBadCodePtr API in Microsoft Edge, so this method does not work there; it does, however, still work in Internet Explorer. While Microsoft will most likely recompile more and more DLLs with CFG, third-party vendors will lag behind. What about browser plugins, or security software such as antivirus products that inject a DLL into the process to monitor it: are those compiled with CFG?
The proof of concept code can be found at: https://github.com/MortenSchenk/LoadLibrary-CFG-Bypass