HACKING: THE ART OF EXPLOITATION
2ND EDITION
Jon Erickson
0x100 INTRODUCTION
0x200 PROGRAMMING
0x250 Getting Your Hands Dirty
firstprog.c:

    #include <stdio.h>

    int main()
    {
      int i;
      for(i=0; i < 10; i++)
      {
        puts("Hello, world!\n");
      }
      return 0;
    }
0x251 The Bigger Picture
Each architecture has a different machine language, so the compiler acts as a middle ground, translating C code into machine language for the target architecture. The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let's start by looking at the machine code the main() function was translated into. The output is piped into grep with the command-line option to only display 20 lines after the regular expression pattern "main.:".

    $ objdump -D a.out | grep -A20 main.:
    000000000000063a <main>:
     63a:   55                      push   %rbp
     63b:   48 89 e5                mov    %rsp,%rbp
     63e:   48 83 ec 10             sub    $0x10,%rsp
     642:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
     649:   eb 10                   jmp    65b <main+0x21>
     64b:   48 8d 3d a2 00 00 00    lea    0xa2(%rip),%rdi        # 6f4 <_IO_stdin_used+0x4>
     652:   e8 b9 fe ff ff          callq  510 <puts@plt>
     657:   83 45 fc 01             addl   $0x1,-0x4(%rbp)
     65b:   83 7d fc 09             cmpl   $0x9,-0x4(%rbp)
     65f:   7e ea                   jle    64b <main+0x11>
     661:   b8 00 00 00 00          mov    $0x0,%eax
     666:   c9                      leaveq
     667:   c3                      retq
     668:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
     66f:   00
    0000000000000670 <__libc_csu_init>:
     670:   41 57                   push   %r15
     672:   41 56                   push   %r14
     674:   49 89 d7                mov    %rdx,%r15
- The hexadecimal numbers—starting with 000000000000063a on the far left—are memory addresses. Older Intel x86 processors use a 32-bit addressing scheme, while newer ones use a 64-bit one.
- The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor.
- The instructions on the far right are in assembly language. Assembly language is really just a collection of mnemonics for the corresponding machine language instructions.
x86 assembly language is written in one of two main syntaxes: AT&T syntax and Intel syntax.
AT&T syntax prefixes registers with % and immediate values with $, and lists the source operand before the destination. Intel syntax can be shown by passing an additional command-line option, -M intel, to objdump:

    $ objdump -D -M intel a.out | grep -A20 main.:
    000000000000063a <main>:
     63a:   55                      push   rbp
     63b:   48 89 e5                mov    rbp,rsp
     63e:   48 83 ec 10             sub    rsp,0x10
     642:   c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
     649:   eb 10                   jmp    65b <main+0x21>
     64b:   48 8d 3d a2 00 00 00    lea    rdi,[rip+0xa2]        # 6f4 <_IO_stdin_used+0x4>
     652:   e8 b9 fe ff ff          call   510 <puts@plt>
     657:   83 45 fc 01             add    DWORD PTR [rbp-0x4],0x1
     65b:   83 7d fc 09             cmp    DWORD PTR [rbp-0x4],0x9
     65f:   7e ea                   jle    64b <main+0x11>
     661:   b8 00 00 00 00          mov    eax,0x0
     666:   c9                      leave
     667:   c3                      ret
     668:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
     66f:   00
    0000000000000670 <__libc_csu_init>:
     670:   41 57                   push   r15
     672:   41 56                   push   r14
     674:   49 89 d7                mov    r15,rdx

These instructions consist of an operation and sometimes additional arguments that describe the destination and/or the source for the operation.
0x252 The x86 Processor
The x86 processor has several registers, which are like internal variables for the processor. Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.

    $ gdb -q ./a.out
    Reading symbols from ./a.out...(no debugging symbols found)...done.
    (gdb) break main
    Breakpoint 1 at 0x63e
    (gdb) run
    Starting program: /home/jerry/test/a.out

    Breakpoint 1, 0x000055555555463e in main ()
    (gdb) info registers
    rax            0x55555555463a      93824992233018
    rbx            0x0                 0
    rcx            0x555555554670      93824992233072
    rdx            0x7fffffffdec8      140737488346824
    rsi            0x7fffffffdeb8      140737488346808
    rdi            0x1                 1
    rbp            0x7fffffffddd0      0x7fffffffddd0
    rsp            0x7fffffffddd0      0x7fffffffddd0
    r8             0x7ffff7dd0d80      140737351847296
    r9             0x7ffff7dd0d80      140737351847296
    r10            0x0                 0
    r11            0x3                 3
    r12            0x555555554530      93824992232752
    r13            0x7fffffffdeb0      140737488346800
    r14            0x0                 0
    r15            0x0                 0
    rip            0x55555555463e      0x55555555463e <main+4>
    eflags         0x246               [ PF ZF IF ]
    cs             0x33                51
    ss             0x2b                43
    ds             0x0                 0
    es             0x0                 0
    fs             0x0                 0
    gs             0x0                 0
    (gdb)
0x253 Assembly Language
Inside GDB, the disassembly syntax can be set to Intel by simply typing set disassembly intel. You can make this setting apply every time GDB starts up by putting the command in the .gdbinit file in your home directory:

    $ echo "set disassembly intel" > ~/.gdbinit

The assembly instructions in Intel syntax generally follow this style:

    operation <destination>, <source>

The destination and source values will either be a register, a memory address, or a value. For example,
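     63b:   48 89 e5                mov    rbp,rsp
     63e:   48 83 ec 10             sub    rsp,0x10

The instructions above:
- move the value from RSP to RBP
- subtract 0x10 from RSP

The cmp operation is used to compare values, and any operation beginning with j is used to jump to a different part of the code (depending on the result of the comparison). In the example below, cmp compares the 4-byte value located at RBP minus 4 with the number 9. The jle instruction that follows is shorthand for "jump if less than or equal to": if that value is less than or equal to 9, execution jumps back to the instruction at 0x64b. Otherwise, execution flows to the next instruction, a mov operation.

     65b:   83 7d fc 09             cmp    DWORD PTR [rbp-0x4],0x9
     65f:   7e ea                   jle    64b <main+0x11>
     661:   b8 00 00 00 00          mov    eax,0x0
     666:   c9                      leave

A quick history of the x86 registers: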
- Prior to the 8086, the registers had single-letter names, e.g., A, B, C, D. Each was an 8-bit register. A stood for accumulator, which was an implicit operand and return value of the arithmetic and logical operations.
- The 8086 had 16-bit registers that could be referenced either 8 bits at a time or all 16 bits at once. For example, we could reference the 8 low-order bits of the A register, the 8 high-order bits of the A register, or the entire 16 bits of the A register. The two halves were named AL and AH, where L/H designates the low-order or the high-order half. A name was then needed for the full 16 bits, so the letter X was selected. The X was simply an arbitrary letter that combined both L and H, much like the use of x in algebra to designate the unknown.
- The 8086 also introduced the segment registers and more general-purpose registers:
  - SP is the stack pointer
  - BP is the base pointer
  - SI is the source index
  - DI is the destination index
- x86 (32-bit): The main registers were extended to 32 bits by adding an E prefix (for example, AX became EAX).
- x86-64: In 2003 AMD effectively took over the architectural leadership and introduced the first 64-bit processor in the x86 lineage.
  - The eight main registers were extended to 64 bits, with an R prefix (RAX, RSP, and so on, as seen in the listings above).
  - The new registers (R8 to R15) also got their "narrow" versions.
The -g flag can be used by the GCC compiler to include extra debugging information, which will give GDB access to the source code.
    $ ls -l *.out
    -rwxr-xr-x 1 jerry jerry  8296 Sep 22 20:02 a.out
    $ gcc -g test.c
    $ ls -l *.out
    -rwxr-xr-x 1 jerry jerry 10760 Sep 23 09:45 a.out
With the debugging information included, GDB's list command displays the source code:

    $ gdb -q ./a.out
    Reading symbols from ./a.out...done.
    (gdb) list
    1	#include <stdio.h>
    2
    3	int main()
    4	{
    5	  int i;
    6	  for(i=0; i < 10; i++)
    7	  {
    8	    puts("Hello, world!\n");
    9	  }
    10	  return 0;

The -q option stands for "quiet": GDB does not print the introductory and copyright messages.
    (gdb) disassemble main
    Dump of assembler code for function main:
       0x000055555555463a <+0>:	push   rbp
       0x000055555555463b <+1>:	mov    rbp,rsp
       0x000055555555463e <+4>:	sub    rsp,0x10
    => 0x555555554642 <+8>:	mov    DWORD PTR [rbp-0x4],0x0
       0x0000555555554649 <+15>:	jmp    0x55555555465b <main+33>
       0x000055555555464b <+17>:	lea    rdi,[rip+0xa2]        # 0x5555555546f4
       0x0000555555554652 <+24>:	call   0x555555554510 <puts@plt>
       0x0000555555554657 <+29>:	add    DWORD PTR [rbp-0x4],0x1
       0x000055555555465b <+33>:	cmp    DWORD PTR [rbp-0x4],0x9
       0x000055555555465f <+37>:	jle    0x55555555464b <main+17>
       0x0000555555554661 <+39>:	mov    eax,0x0
       0x0000555555554666 <+44>:	leave
       0x0000555555554667 <+45>:	ret
    End of assembler dump.
    (gdb) break main
    Breakpoint 1 at 0x642: file test.c, line 6.
    (gdb) run
    Starting program: /home/jerry/test/a.out

    Breakpoint 1, main () at test.c:6
    6	  for(i=0; i < 10; i++)
    (gdb) info register rip
    rip            0x555555554642      0x555555554642 <main+8>

Notice that RIP contains a memory address that points to an instruction in the main() function's disassembly.
The instructions before this are collectively known as the function prologue and are generated by the compiler to set up memory for the rest of the main() function’s local variables.
The debugger knows this part of the code is automatically generated and is smart enough to skip over it.

Examining memory is a critical skill for any hacker. The GDB debugger provides a direct method to examine memory, using the command x, which is short for examine. This command expects two arguments when it's used:
- the location in memory to examine
- how to display that memory

The memory the RIP register is pointing to can be examined by using the address stored in RIP.
    (gdb) info register rip
    rip            0x555555554642      0x555555554642 <main+8>
    (gdb) x/o 0x555555554642
    0x555555554642 <main+8>:	077042707
    (gdb) x/o $rip
    0x555555554642 <main+8>:	077042707

The debugger lets you reference registers directly, so $rip is equivalent to the value RIP contains at that moment.

The display format is given as a single letter after the slash, such as the o used above. Some common format letters:
- o Display in octal.
- x Display in hexadecimal.
- u Display in unsigned, standard base-10 decimal.
- t Display in binary.
A number can also be prepended to the format of the examine command to examine multiple units at the target address.
The default size of a single unit is a 4-byte unit called a word. The size of the display units for the examine command can be changed by adding a size letter to the end of the format letter:
- b A single byte
- h A halfword, which is 2 bytes in size
- w A word, which is 4 bytes in size
- g A giant, which is 8 bytes in size
    (gdb) x/4xw $rip
    0x555555554642 <main+8>:	0x00fc45c7	0xeb000000	0x3d8d4810	0x000000a2
    (gdb) x/16xb $rip
    0x555555554642 <main+8>:	0xc7	0x45	0xfc	0x00	0x00	0x00	0x00	0xeb
    0x55555555464a <main+16>:	0x10	0x48	0x8d	0x3d	0xa2	0x00	0x00	0x00

On the x86 processor, values are stored in little-endian byte order, which means the least significant byte is stored first.
A small program to show the byte order:

    #include <stdio.h>

    int main()
    {
      int i, num=0x12345678;
      int *p=NULL;
      p = (int *) &num;    /* address of the 4-byte integer */
      char *b=NULL;
      b = (char *) p;      /* view the same memory one byte at a time */
      for(i=0; i < sizeof(int); i++)
      {
        printf("Hello, world! %x\n", *(b+i) );   /* lowest address first */
      }
      return 0;
    }

    $ ./test
    Hello, world! 78
    Hello, world! 56
    Hello, world! 34
    Hello, world! 12

The most significant byte (0x12) is at the highest address.

The GDB debugger is smart enough to know how values are stored, so when a multi-byte unit is examined, the bytes are reversed to display the correct values in hexadecimal.
    (gdb) x/4ub $rip
    0x555555554642 <main+8>:	199	69	252	0
    (gdb) x/1uw $rip
    0x555555554642 <main+8>:	16532935

    16532935 = 0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)

The examine command also accepts the format letter i, short for instruction, to display the content of the memory as disassembled assembly language instructions.
    (gdb) x/7xb $rip
    0x555555554642 <main+8>:	0xc7	0x45	0xfc	0x00	0x00	0x00	0x00
    (gdb) x/i $rip
    => 0x555555554642 <main+8>:	mov    DWORD PTR [rbp-0x4],0x0

We have seen that these bytes are disassembled as follows:

     642:   c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0

This assembly instruction will move the value of 0 into memory located at the address stored in the RBP register, minus 4.
This is where the C variable i is stored in memory; i was declared as an integer that uses 4 bytes of memory on the x86 processor.
Basically, this command will zero out the variable i for the for loop.
Let’s execute the current instruction using the command nexti (next instruction):
    Breakpoint 1, main () at test.c:6
    6	  for(i=0; i < sizeof(int); i++)
    (gdb) nexti
    0x0000555555554659	6	  for(i=0; i < sizeof(int); i++)

(This session appears to have been recorded with the byte-order program above, whose loop condition is i < sizeof(int), so the addresses differ from the earlier listings.)

The processor will read the instruction at RIP, execute it, and advance RIP to the next instruction:

    (gdb) info r rip
    rip            0x555555554659      0x555555554659 <main+15>
    (gdb) x/i $rip
    => 0x555555554659 <main+15>:	jmp    0x555555554670 <main+38>