Dissecting ELF files
Introduction
You ask me whether your verses are good. And it is me you are asking. You have previously asked others for their opinion. You have sent these verses to journals. You compare them with other poems, and you are worried when certain editors reject your attempts. Since you have allowed me to give you some advice, I will ask you to stop all of that. Your gaze is turned outward, and that above all is what you should no longer do. No one can advise you or help you, no one. There is only one way: go into yourself, search for the reason that compels you to write; examine whether that reason spreads its roots into the deepest depths of your heart; answer frankly the question of whether you would be condemned to die if you were denied writing. Above all, ask yourself, in the quietest hour of your night: is it necessary that I write? Dig within yourself in search of a deep answer. And if it should be affirmative, if you are entitled to answer this grave question with a strong and simple "I cannot do otherwise", then build your life according to this necessity.
Rainer Maria Rilke
The Executable and Linkable Format (ELF) [1]
is the common binary format on many Unix-like systems.
This article is a hands-on primer on 64-bit ELF. Instead of jumping between readelf,
objdump, and documentation, you can explore the same structures in one place:
header, program headers, section headers, symbols, and relocations.
Have fun.
ELF header
The format defines structures that differ between 32-bit and 64-bit architectures, which is noticeable in the naming convention (e.g. Elf64_Ehdr). The ELF header (Ehdr) is the principal structure and sits at the start of the file.
The architecture (e_machine) we're focusing on in this article is EM_X86_64, and the file
types (e_type) that interest us are ET_EXEC (executables) and ET_REL
(relocatable objects). Executables provide an entry point (e_entry): the address
of the code that runs first.
You might expect it to point to main(), but that'd be wrong. In fact, gcc
links against "startup files" that define a function called _start. It hands
control to __libc_start_main, which performs initialization and finally calls
main with the famous arguments int argc and char** argv.
(You thought they came from the sky?)
_start
\
__libc_start_main_impl
\
exit(main(argc, argv))
Let's override e_entry with a function called fn. We'll omit the
startup objects with -nostartfiles and tell the linker which
entry point to use with -e.
(The function must not return like a normal C function: there is no
caller to return to, so you should terminate the process explicitly via exit().)
#include <stdlib.h>
void fn()
{
// Perform main
// logic before
// terminating.
exit(EXIT_SUCCESS);
}
gcc -nostartfiles -Wl,-efn entry.c
This primary header also describes where to find the program and section header tables in the file. For the program header table, the file offset is stored in e_phoff; the table contains e_phnum entries, each e_phentsize bytes long. The section header table follows the same pattern using e_shoff, e_shnum and e_shentsize.
The field e_shstrndx stores the index of the section header that points to the section-name string table (.shstrtab), which is used to resolve section names.
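To make the fields above concrete, here is a minimal sketch that validates a 64-bit ELF header held in memory and prints where the two tables live. The helper names check_ehdr and print_tables are hypothetical (they are not part of libc), and the sketch assumes a Linux system that provides <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 and fills *out if buf starts with a
 * plausible 64-bit ELF header, -1 otherwise. */
int check_ehdr(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof(Elf64_Ehdr))
        return -1;
    memcpy(out, buf, sizeof(*out));          /* avoids unaligned access */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                           /* magic is not "\x7fELF" */
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                           /* not a 64-bit file */
    return 0;
}

/* Hypothetical helper: prints the entry point and the location of the
 * program and section header tables. */
void print_tables(const Elf64_Ehdr *eh)
{
    printf("entry point:     0x%llx\n", (unsigned long long)eh->e_entry);
    printf("program headers: %u entries of %u bytes at offset %llu\n",
           (unsigned)eh->e_phnum, (unsigned)eh->e_phentsize,
           (unsigned long long)eh->e_phoff);
    printf("section headers: %u entries of %u bytes at offset %llu\n",
           (unsigned)eh->e_shnum, (unsigned)eh->e_shentsize,
           (unsigned long long)eh->e_shoff);
    printf("section names:   section index %u\n", (unsigned)eh->e_shstrndx);
}
```

A caller would typically read the first sizeof(Elf64_Ehdr) bytes of the file into a buffer, pass it to check_ehdr, and then call print_tables on the result.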
Mental model
To understand the difference between program headers and section headers,
it helps to look at the bigger picture. Program headers are required to run
a binary and are used at load time and runtime; section headers are optional for
execution and serve the linker and debugger, providing symbols and debug information at build time.
If you debug a binary you compiled yourself, it often still contains symbol
information. In that case, when you ask gdb to disassemble main,
it can locate it by name because the binary includes a symbol table (named .symtab).
These symbols make debugging much easier, but they also make it easier for others to
understand and reverse-engineer your program. For production releases, you may
want to reduce the amount of symbol and debug information therein. One common
approach is to remove the full symbol table using strip,
producing a "stripped" executable.
strip main -o main_stripped
You can see that readelf still prints section names because the section-header
string table (.shstrtab) is still present. And while the regular symbol table may be
removed, the dynamic symbol table (.dynsym, with its strings in .dynstr)
remains because it's needed at runtime to resolve symbols like
__libc_start_main, __gmon_start__,
and puts.
Let's use radare to patch the ELF file. We'll hide (not remove)
the section headers to show they're optional and that the program can run
without them. To do that, we'll zero out the three fields e_shoff,
e_shnum, and e_shstrndx.
And it runs!
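The same patch can also be sketched in C rather than in radare. The helper below, hide_section_headers (a hypothetical name), only clears the three header fields; the section header table itself stays in the file, it is just no longer reachable from the ELF header.

```c
#include <elf.h>

/* Hypothetical helper: clears the three fields that locate the section
 * header table. The program header fields are left untouched. */
void hide_section_headers(Elf64_Ehdr *eh)
{
    eh->e_shoff = 0;            /* file offset of the table */
    eh->e_shnum = 0;            /* number of entries */
    eh->e_shstrndx = SHN_UNDEF; /* index of .shstrtab (SHN_UNDEF is 0) */
}
```

To apply it, one would read the first sizeof(Elf64_Ehdr) bytes of a copy of the binary, call the helper, and write the header back at offset 0.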
"But wait" I hear you say, "didn't we say that .dynsym and .dynstr are important?"
Yes, they're still referenced by a program header
of type PT_DYNAMIC.
Note: if you want to remove sections that aren't referenced by the program headers,
use the sstrip tool.
Program headers
Thus, we understand that program headers are essential for executing an ELF binary.
Each program header describes a segment, loadable or not, and in 64-bit ELF files,
the structure is Elf64_Phdr
(see elf.h [2]).
They act as instructions for the loader, guiding the operating system as it maps the executable into memory and starts execution. In the following subsections, we'll walk through the different types and explain what each one does.
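As a small companion to the types discussed in this section, the sketch below maps a p_type value to its name, similar in spirit to what readelf -l prints. The function name phdr_type_name is hypothetical, and the list only covers the types this article walks through.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: names the program header types covered here.
 * Anything else is reported as "other". */
const char *phdr_type_name(uint32_t p_type)
{
    switch (p_type) {
    case PT_NULL:      return "PT_NULL";
    case PT_PHDR:      return "PT_PHDR";
    case PT_LOAD:      return "PT_LOAD";
    case PT_INTERP:    return "PT_INTERP";
    case PT_DYNAMIC:   return "PT_DYNAMIC";
    case PT_GNU_STACK: return "PT_GNU_STACK";
    case PT_GNU_RELRO: return "PT_GNU_RELRO";
    default:           return "other";
    }
}
```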
PT_NULL
The loader ignores PT_NULL entries.
PT_PHDR
This segment references the program header table itself.
PT_LOAD
Loadable segments are represented as PT_LOAD entries. Each one describes where the segment should appear in memory (p_vaddr), its size and alignment (p_memsz, p_align), and its access permissions in p_flags: PF_R (read), PF_W (write), and PF_X (execute).
// The size, rounded up
// to the segment alignment
// (p_align, a power of two
// such as the 4096-byte
// page size). Assumes
// p_vaddr is itself aligned.
size_t size;
size = (phdr->p_memsz + phdr->p_align - 1) & ~(phdr->p_align - 1);
mmap(
// The virtual address at
// which the segment
// starts.
(void *)phdr->p_vaddr,
size,
// The protections of
// the pages: writable at
// first, since the data
// still needs to be copied.
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
-1,
0);
At load time, the loader maps the segment and copies p_filesz bytes from the file starting at p_offset into memory, and if p_memsz is larger than p_filesz, the remaining bytes are zero-filled.
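// Seek to the start of
// the segment's data
// in the file.
fseek(fp, phdr->p_offset, SEEK_SET);
fread(
// Read directly into
// the mmap()'d
// region.
(void *)phdr->p_vaddr,
// The number of
// bytes to read
// from the file.
phdr->p_filesz, 1,
// The file pointer
// to the ELF.
fp);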
int prot = 0;
prot |= (phdr->p_flags & PF_X) ? PROT_EXEC : 0;
prot |= (phdr->p_flags & PF_R) ? PROT_READ : 0;
prot |= (phdr->p_flags & PF_W) ? PROT_WRITE : 0;
// The protection is
// applied after
// reading data.
mprotect((void *)phdr->p_vaddr, size, prot);
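The fragments above treat p_vaddr as if it were page-aligned, which is often not the case for data segments. A minimal sketch of the rounding a loader has to do follows; load_range is a hypothetical helper, and the page size is passed in as a parameter (it must be a power of two).

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: computes the page-aligned range that covers a
 * PT_LOAD segment. page must be a power of two, e.g. 4096. */
void load_range(const Elf64_Phdr *ph, uint64_t page,
                uint64_t *start, uint64_t *len)
{
    uint64_t end = ph->p_vaddr + ph->p_memsz;

    *start = ph->p_vaddr & ~(page - 1);               /* round start down */
    *len = ((end + page - 1) & ~(page - 1)) - *start; /* round end up */
}
```

For a segment with p_vaddr = 0x3db8 and p_memsz = 0x268, the segment ends at 0x4020, so the mapping starts at 0x3000 and spans two pages (0x2000 bytes). A segment that is already aligned, such as p_vaddr = 0x1000 with p_memsz = 0x1000, gets exactly one page.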
PT_INTERP
This executable can't really run in isolation. It is dynamically linked, so
it relies on a dynamic linker/loader, which maps the executable
and its shared-library dependencies into memory, resolves external
symbols, and applies relocations before transferring control to the
program's entry point.
The kernel determines which dynamic linker to invoke from the
PT_INTERP program header,
which contains the filesystem path to the loader. The kernel maps it
and transfers control to it; in glibc, its startup routine is _dl_start().
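To illustrate how the interpreter path is found, here is a minimal sketch that scans the program headers of an ELF image held in memory and returns the string that PT_INTERP points to. The name find_interp is hypothetical, and this is not how the kernel implements the lookup; it only follows the same fields.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the interpreter path stored in an ELF
 * image, or NULL if there is no usable PT_INTERP entry. */
const char *find_interp(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        uint64_t off = eh.e_phoff + (uint64_t)i * sizeof ph;

        if (off > len || len - off < sizeof ph)
            return NULL;                 /* table runs past the image */
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz)
            return NULL;                 /* path lies outside the image */
        if (img[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;                 /* path is not NUL-terminated */
        return (const char *)img + ph.p_offset;
    }
    return NULL;
}
```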
PT_DYNAMIC
This segment provides all the information required for dynamic linking at runtime. It's an
array of Elf64_Dyn entries, each starting with a tag that tells what its
value holds. The tags include:
- DT_NEEDED: libraries that are to be loaded.
- DT_SYMTAB: references .dynsym.
- DT_JMPREL: references .rela.plt.
- DT_STRTAB: references .dynstr.
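Walking this array is straightforward, because it ends with an entry whose tag is DT_NULL. The sketch below looks up the first entry with a given tag; find_dyn is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: returns the first entry with the given tag, or
 * NULL if the DT_NULL terminator is reached first. */
const Elf64_Dyn *find_dyn(const Elf64_Dyn *dyn, Elf64_Sxword tag)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn;
    return NULL;
}
```

For address-like tags such as DT_STRTAB or DT_SYMTAB the value is read from d_un.d_ptr, while for DT_NEEDED the value in d_un.d_val is an offset into the string table referenced by DT_STRTAB.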
PT_GNU_STACK
PT_GNU_STACK is a GNU extension program header that tells the loader what memory protections to use for the stack in p_flags. In the past, attackers could place injected code on the stack (via a local buffer overflow or crafted environment variables) and redirect control flow to it. Modern defenses, most notably non-executable (NX) stacks, together with ASLR killed classic "stack shellcode" attacks.
case PT_GNU_STACK:
stack_flags = ph->p_flags;
break;
gcc -Wl,-z,execstack main.c # Enable executable stack.
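To check what a given binary requests, one can scan its program headers for PT_GNU_STACK and test the PF_X bit. The sketch below does that on an array of headers; stack_exec_requested is a hypothetical name, and the behavior when the header is absent is deliberately left to the caller because it depends on the platform.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if PT_GNU_STACK asks for an executable
 * stack, 0 if it asks for a non-executable one, and -1 if the header is
 * missing (the default is then platform-dependent). */
int stack_exec_requested(const Elf64_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == PT_GNU_STACK)
            return (ph[i].p_flags & PF_X) != 0;
    return -1;
}
```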
PT_GNU_RELRO
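Because defenses like stack canaries made it harder to overwrite return addresses, attackers had to find other routes to control the instruction pointer. Their attention turned to overwriting GOT entries (to redirect indirect calls) or function pointers stored in .fini_array (destructors executed at exit). PT_GNU_RELRO marks the region that the dynamic linker makes read-only once relocations have been applied.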
case PT_GNU_RELRO:
l->l_relro_addr = ph->p_vaddr;
l->l_relro_size = ph->p_memsz;
break;
Partial RELRO (enabled by default) protects sensitive
sections by making them read-only, while leaving part of
.got.plt writable to support lazy binding. In this example, that part is the last
sizeof(void*) = 8 bytes, the entry for puts().
For Full RELRO, all symbols are resolved at startup and the entire GOT becomes read-only.
gcc -Wl,-z,relro,-z,now main.c # Enable Full RELRO.
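Since memory protections apply to whole pages, the recorded address and size have to be turned into a page range before the region can be made read-only. The sketch below shows one way to do that; relro_range is a hypothetical helper, and rounding both ends down is an assumption based on my reading of glibc, so treat it as an illustration rather than the exact implementation.

```c
#include <stdint.h>

/* Hypothetical helper: page range a loader could pass to mprotect() with
 * PROT_READ for a PT_GNU_RELRO segment. Both ends are rounded down
 * (assumption), so a trailing partial page stays writable. page must be
 * a power of two. */
void relro_range(uint64_t base, uint64_t relro_addr, uint64_t relro_size,
                 uint64_t page, uint64_t *start, uint64_t *len)
{
    uint64_t lo = (base + relro_addr) & ~(page - 1);
    uint64_t hi = (base + relro_addr + relro_size) & ~(page - 1);

    *start = lo;
    *len = hi - lo;
}
```

With a base of 0, an address of 0x3db8, and a size of 0x248, the region ends exactly at 0x4000, so the protected range is the single page starting at 0x3000.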
Section headers
Section headers are optional metadata used mainly for linking, symbols, and debugging.
Each entry in the section header table is an Elf64_Shdr. The sections marked as
SHF_ALLOC fall within ranges that are mapped by loadable
segments.
The section header field sh_name is an offset into the section-name string table
(.shstrtab), which is typically not mapped at runtime (it lacks SHF_ALLOC).
The ELF header field e_shstrndx stores the index of .shstrtab in
the section header table.
String tables
The SHT_STRTAB sections are string tables: sequences of null-terminated strings, each addressed by its byte offset. .shstrtab contains section names, .strtab names symbols in .symtab (often stripped), and .dynstr names symbols in .dynsym (used at runtime by the dynamic linker).
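Looking up a name therefore amounts to adding an offset to the start of the table. A minimal sketch with bounds checking follows; strtab_get is a hypothetical helper, and the table contents in the usage note are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the string at byte offset off in a string
 * table of the given size, or NULL if the offset is out of range or the
 * string is not terminated inside the table. */
const char *strtab_get(const char *tab, size_t size, uint32_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}
```

With a table holding the bytes "\0.text\0.shstrtab\0", offset 1 yields ".text" and offset 7 yields ".shstrtab". Offset 0 is the empty string, which is why string tables start with a null byte.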
Symbol tables
A symbol table is an array of symbol entries (Elf64_Sym). Each one stores its name as
an offset in st_name. That offset is resolved in the string table linked by the
symbol table section header (sh_link).
The st_info field is a single byte that packs two values: the symbol's binding and type.
#define ELF32_ST_BIND(val) (((unsigned char) (val)) >> 4)
#define ELF32_ST_TYPE(val) ((val) & 0xf)
#define ELF32_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))
The symbol type indicates what it represents.
- STT_TLS: Thread-local storage object
- STT_FILE: Source file name
- STT_FUNC: Function (code)
- STT_NOTYPE: Symbol type is unspecified
- STT_OBJECT: Data object (e.g. a global variable)
- STT_SECTION: Symbol associated with a section
- STT_COMMON: Common object (unallocated data)
The symbol binding determines its linkage scope.
- STB_LOCAL: confined to the current object.
- STB_GLOBAL: may be referenced from other objects.
- STB_WEAK: fallback, used only if no strong (global) definition is available.
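As a short illustration of how st_info is unpacked, the sketch below turns the byte into readable binding and type names. It uses the ELF64_ST_* macros from <elf.h>, which expand to the ELF32_ST_* ones shown earlier; the helper names sym_bind_name and sym_type_name are hypothetical.

```c
#include <elf.h>

/* Hypothetical helper: names the binding stored in the high 4 bits. */
const char *sym_bind_name(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "other";
    }
}

/* Hypothetical helper: names the type stored in the low 4 bits. */
const char *sym_type_name(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    case STT_COMMON:  return "COMMON";
    case STT_TLS:     return "TLS";
    default:          return "other";
    }
}
```

For example, a global function has st_info equal to ELF64_ST_INFO(STB_GLOBAL, STT_FUNC), which is 0x12.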
Relocations
Relocation sections are either SHT_REL or SHT_RELA; the latter is similar but has an extra field called r_addend in each relocation entry. Before reaching main, the loader performs a calculation based on each relocation's type and stores the result at the address given by r_offset, for every entry in .rela.dyn.
#define ELF64_R_SYM(i) ((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
#define ELF64_R_INFO(sym,type) ((((Elf64_Xword) (sym)) << 32) + (type))
Relocation types determine the formula that the loader uses to calculate the
value to place at the relocation site (r_offset). The variables
used in these formulas are:
- A: addend
- B: base address at which the object is loaded
- S: resolved symbol value
- G: offset of the symbol's GOT entry
- P: address of the relocation site
- GOT: address of the Global Offset Table
| Relocation type | Calculation |
| --- | --- |
| R_X86_64_64 | S + A |
| R_X86_64_PC32 | S + A - P |
| R_X86_64_GLOB_DAT | S |
| R_X86_64_JUMP_SLOT | S |
| R_X86_64_RELATIVE | B + A |
| R_X86_64_GOTPCRELX | GOT + G + A - P |
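To make the formulas concrete, here is a minimal sketch that decodes the type from r_info and evaluates a few of them. The helper reloc_value is hypothetical and covers only a subset of types. It takes an extra input B, the base address at which the object is loaded, because the psABI defines R_X86_64_RELATIVE as B + A; it also uses the <elf.h> spelling R_X86_64_JUMP_SLOT.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: evaluates the relocation formula for a few types.
 * S: symbol value, A: addend, P: relocation site, B: load base.
 * Returns 0 on success, -1 for types this sketch does not cover. */
int reloc_value(uint64_t r_info, uint64_t S, int64_t A, uint64_t P,
                uint64_t B, uint64_t *out)
{
    switch (ELF64_R_TYPE(r_info)) {
    case R_X86_64_64:
        *out = S + (uint64_t)A;
        return 0;
    case R_X86_64_PC32:
        *out = S + (uint64_t)A - P; /* stored truncated to 32 bits */
        return 0;
    case R_X86_64_GLOB_DAT:
    case R_X86_64_JUMP_SLOT:
        *out = S;
        return 0;
    case R_X86_64_RELATIVE:
        *out = B + (uint64_t)A;
        return 0;
    default:
        return -1;
    }
}
```

As a worked example, a R_X86_64_PC32 relocation with S = 0x401000, A = -4, and P = 0x400500 yields 0x401000 - 4 - 0x400500 = 0xafc.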
Because this file is a fully linked executable (ET_EXEC), the relocation sections you see
(.rela.dyn and .rela.plt) contain dynamic relocations. Their sh_link
field points to the dynamic symbol table (SHT_DYNSYM). These entries are
intended for the runtime loader, and r_offset is the
virtual address of the location to patch.
Lazy binding
A program can use many external functions, and it'd be expensive to resolve all symbols
at once during loading, since some of the code that uses them
might never even be reached.
To optimize this, lazy binding defers resolution until the first call. It relies on the PLT
(Procedure Linkage Table), which we'll talk about shortly, together with the relocation
entries, the link_map, and a function that performs symbol
resolution on request (_dl_runtime_resolve).
Link map
This structure holds information about loaded files. Starting from the main executable,
a doubly linked list references the objects it depends on
(the dynamic linker, the vDSO, and libc) as well as any that are
loaded explicitly using dlopen().
Most importantly, it has a field named l_ld that references the dynamic section (the PT_DYNAMIC segment),
which leads to all the structures needed at runtime, including string tables,
dynamic symbols, and relocations. Other fields are l_addr for the
base address and l_name for the path of the file, but most fields
are internal and aren't exposed in link.h.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
void* handle;
struct link_map* link_map;
// Get a handle
// to the current
// executable.
handle = dlopen(NULL, RTLD_LAZY);
// Get its
// link_map.
dlinfo(handle, RTLD_DI_LINKMAP, &link_map);
// Close
// the handle.
dlclose(handle);
GOT
The resolution process happens once, and the result must be stored for subsequent
calls. This is the role of the Global Offset Table (GOT). For example, a call to
puts("Hello world") requires that puts be resolved on
the first call and that its address be stored in a predefined entry in the
GOT.
The GOT is a table of addresses used by position-independent code to access
global data and external functions. It is split across two sections: .got
holds entries for symbols resolved before main() is called (e.g. __libc_start_main_impl,
__gmon_start__), while .got.plt starts with a header followed by slots reserved for
symbols that are resolved on their first call.
The header is prepared by the dynamic linker's _dl_start(),
specifically the inlined function elf_machine_runtime_setup().
.got.plt[0] = .dynamic
.got.plt[1] = struct link_map*
.got.plt[2] = _dl_runtime_resolve_xsave
PLT
The Procedure Linkage Table (PLT) has call stubs for external functions in a section
named .plt. Each stub's first instruction jumps to the address stored in the function's GOT entry.
Initially, that entry points back to the instruction right after the jmp:
a push $n that pushes the relocation index, followed by a jump to
PLT0.
PLT0 pushes the struct link_map* (from .got.plt[1]) and transfers
execution to the resolver _dl_runtime_resolve_xsave (from .got.plt[2]), which locates
the entry in .rela.plt (DT_JMPREL) and performs the relocation at runtime.
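The mechanism can be imitated in plain C with a function pointer that plays the role of the GOT entry. This is only a toy model, not how glibc implements it: the names got_slot, lazy_stub, call_square, and real_square are all made up, and the "resolution" step is a simple assignment instead of a symbol lookup.

```c
typedef int (*fn_t)(int);

int resolve_count = 0;          /* how often the slow path ran */

/* Stands in for the library function that needs resolving. */
int real_square(int x)
{
    return x * x;
}

int lazy_stub(int x);           /* forward declaration */

/* The "GOT entry": initially points at the stub, like an unresolved slot. */
fn_t got_slot = lazy_stub;

/* Slow path: "resolve" the symbol, patch the slot, then finish the call. */
int lazy_stub(int x)
{
    resolve_count++;
    got_slot = real_square;     /* later calls skip the stub */
    return got_slot(x);
}

/* The "PLT stub": always an indirect call through the slot. */
int call_square(int x)
{
    return got_slot(x);
}
```

The first call to call_square goes through lazy_stub and patches got_slot; every later call jumps straight to real_square, so resolve_count stays at 1.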
Lazy binding is simple to disable. With -fno-plt, no PLT stub remains: the call site references
puts@GLIBC_2.2.5 through its GOT entry, which is filled in at load time, instead of going through puts@plt.
gcc main.c -fno-plt