HACKING: THE ART OF EXPLOITATION

2ND EDITION
Jon Erickson

0x100 INTRODUCTION

0x200 PROGRAMMING

0x250 Getting Your Hands Dirty

firstprog.c:

#include <stdio.h>
int main()
{
  int i;
  for(i=0; i < 10; i++)
  {
  puts("Hello, world!\n");
  }
  return 0;
}

0x251 The Bigger Picture

Each architecture has a different machine language, so the compiler acts as a middle ground, translating C code into machine language for the target architecture. The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let’s start by looking at the machine code the main() function was translated into. The output is piped into grep with the -A20 option, which displays only the 20 lines following each match of the regular expression main.: (the dot matches the closing angle bracket in <main>:).

$ objdump -D a.out | grep -A20 main.:
000000000000063a <main>:
 63a:	55                   	push   %rbp
 63b:	48 89 e5             	mov    %rsp,%rbp
 63e:	48 83 ec 10          	sub    $0x10,%rsp
 642:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%rbp)
 649:	eb 10                	jmp    65b <main+0x21>
 64b:	48 8d 3d a2 00 00 00 	lea    0xa2(%rip),%rdi        # 6f4 <_IO_stdin_used+0x4>
 652:	e8 b9 fe ff ff       	callq  510 <puts@plt>
 657:	83 45 fc 01          	addl   $0x1,-0x4(%rbp)
 65b:	83 7d fc 09          	cmpl   $0x9,-0x4(%rbp)
 65f:	7e ea                	jle    64b <main+0x11>
 661:	b8 00 00 00 00       	mov    $0x0,%eax
 666:	c9                   	leaveq 
 667:	c3                   	retq   
 668:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
 66f:	00 

0000000000000670 <__libc_csu_init>:
 670:	41 57                	push   %r15
 672:	41 56                	push   %r14
 674:	49 89 d7             	mov    %rdx,%r15

  • The hexadecimal numbers—starting with 000000000000063a on the far left—are memory addresses.
  • Older Intel x86 processors use a 32-bit addressing scheme, while newer ones use a 64-bit one.
  • The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor.
  • The instructions on the far right are in assembly language.
  • Assembly language is really just a collection of mnemonics for the corresponding machine language instructions.
    x86 assembly language has two main syntaxes: AT&T and Intel.
    AT&T syntax prefixes registers with % and immediate values with $. Intel syntax can be displayed by passing the additional command-line option -M intel to objdump:
    
    $ objdump -D -M intel  a.out | grep -A20 main.:
    000000000000063a <main>:
     63a:	55                   	push   rbp
     63b:	48 89 e5             	mov    rbp,rsp
     63e:	48 83 ec 10          	sub    rsp,0x10
     642:	c7 45 fc 00 00 00 00 	mov    DWORD PTR [rbp-0x4],0x0
     649:	eb 10                	jmp    65b <main+0x21>
     64b:	48 8d 3d a2 00 00 00 	lea    rdi,[rip+0xa2]        # 6f4 <_IO_stdin_used+0x4>
     652:	e8 b9 fe ff ff       	call   510 <puts@plt>
     657:	83 45 fc 01          	add    DWORD PTR [rbp-0x4],0x1
     65b:	83 7d fc 09          	cmp    DWORD PTR [rbp-0x4],0x9
     65f:	7e ea                	jle    64b <main+0x11>
     661:	b8 00 00 00 00       	mov    eax,0x0
     666:	c9                   	leave  
     667:	c3                   	ret    
     668:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
     66f:	00 
    
    0000000000000670 <__libc_csu_init>:
     670:	41 57                	push   r15
     672:	41 56                	push   r14
     674:	49 89 d7             	mov    r15,rdx
      
    
    These instructions consist of an operation and sometimes additional arguments that describe the destination and/or the source for the operation.

0x252 The x86 Processor

The x86 processor has several registers, which are like internal variables for the processor. Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.

$ gdb -q ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x63e
(gdb) run
Starting program: /home/jerry/test/a.out 

Breakpoint 1, 0x000055555555463e in main ()
(gdb) info registers
rax            0x55555555463a	93824992233018
rbx            0x0	0
rcx            0x555555554670	93824992233072
rdx            0x7fffffffdec8	140737488346824
rsi            0x7fffffffdeb8	140737488346808
rdi            0x1	1
rbp            0x7fffffffddd0	0x7fffffffddd0
rsp            0x7fffffffddd0	0x7fffffffddd0
r8             0x7ffff7dd0d80	140737351847296
r9             0x7ffff7dd0d80	140737351847296
r10            0x0	0
r11            0x3	3
r12            0x555555554530	93824992232752
r13            0x7fffffffdeb0	140737488346800
r14            0x0	0
r15            0x0	0
rip            0x55555555463e	0x55555555463e <main+4>
eflags         0x246	[ PF ZF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
(gdb) 

0x253 Assembly Language

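Inside GDB, the disassembly syntax can be set to Intel by typing set disassembly intel (short for set disassembly-flavor intel). To apply this setting every time GDB starts up, put the command in the file .gdbinit in your home directory. Note that > overwrites an existing ~/.gdbinit; use >> to append instead: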

$ echo "set disassembly intel" > ~/.gdbinit  
The assembly instructions in Intel syntax generally follow this style:

  operation <destination>, <source>
The destination and source values will either be a register, a memory address, or a value. For example,

 63b:	48 89 e5             	mov    rbp,rsp
 63e:	48 83 ec 10          	sub    rsp,0x10
The instructions above:
  • move the value from RSP to RBP
  • subtract 0x10 from RSP
There are also operations that are used to control the flow of execution:
  • The cmp operation compares two values.
  • Any operation beginning with j jumps to a different part of the code, in most cases depending on the result of the comparison.
The example below first compares a 4-byte value located at RBP minus 4 with the number 9. The next instruction, jle, is shorthand for jump if less than or equal to, referring to the result of the previous comparison. If that value is less than or equal to 9, execution jumps to the instruction at 0x64b.
Otherwise, execution flows to the next instruction, a mov operation.

 65b:	83 7d fc 09          	cmp    DWORD PTR [rbp-0x4],0x9
 65f:	7e ea                	jle    64b <main+0x11>
 661:	b8 00 00 00 00       	mov    eax,0x0
 666:	c9                   	leave   
A quick history of the x86 accumulator register:
  • Before the 8086
    The registers had single-letter names, e.g., A, B, C, D.
    Each was an 8-bit register.
    A stood for accumulator, which was an implicit operand and return value of the arithmetic and logical operations.
  • 8086
    The 8086 had 16-bit registers that could be referenced either 8 bits at a time or all 16 bits at once.
    For example, we could reference the 8 low-order bits of the A register, the 8 high-order bits of the A register, or the entire 16 bits of the A register. The two halves were named AL and AH, where the L/H designates the low-order or the high-order half. A name was then needed for the full 16 bits, so the letter X was selected, giving AX. The X was simply an arbitrary letter that combined both L and H, much like the use of x in algebra to designate an unknown.
    The 8086 also introduced the segment registers and more general-purpose registers:
    • SP is the stack pointer
    • BP is the base pointer
    • SI is the source index
    • DI is the destination index
  • x86 (32-bit, starting with the 80386)
    The main registers were extended to 32 bits by adding an E prefix (EAX, ESP, and so on).
  • x86-64
    In 2003, AMD effectively took over the architectural leadership and introduced the first 64-bit processor in the x86 lineage.
    • The eight main registers were extended to 64 bits, using an R prefix (RAX, RSP, and so on).
    • The new registers (R8 to R15) also got their “narrow” versions.
Let’s use the debugger to step through the first program at the assembly instruction level.
The -g flag can be used by the GCC compiler to include extra debugging information, which will give GDB access to the source code.

$ ls -l *.out
-rwxr-xr-x 1 jerry jerry 8296 Sep 22 20:02 a.out
$ gcc -g test.c
$ ls -l *.out
-rwxr-xr-x 1 jerry jerry 10760 Sep 23 09:45 a.out
  • Display the source code:
    $ gdb -q ./a.out
    Reading symbols from ./a.out...done.
    (gdb) list
    1	#include <stdio.h>
    2	
    3	int main()
    4	{
    5	  int i;
    6	  for(i=0; i < 10; i++)
    7	  {
    8	  puts("Hello, world!\n");
    9	  }
    10	  return 0;
    
    
    The -q option stands for "quiet": GDB does not print its introductory and copyright messages.
  • Show the disassembly of the main() function:
    (gdb) disassemble main
    Dump of assembler code for function main:
       0x000055555555463a <+0>:	push   rbp
       0x000055555555463b <+1>:	mov    rbp,rsp
       0x000055555555463e <+4>:	sub    rsp,0x10
    => 0x0000555555554642 <+8>:	mov    DWORD PTR [rbp-0x4],0x0
       0x0000555555554649 <+15>:	jmp    0x55555555465b <main+33>
       0x000055555555464b <+17>:	lea    rdi,[rip+0xa2]        # 0x5555555546f4
       0x0000555555554652 <+24>:	call   0x555555554510 <puts@plt>
       0x0000555555554657 <+29>:	add    DWORD PTR [rbp-0x4],0x1
       0x000055555555465b <+33>:	cmp    DWORD PTR [rbp-0x4],0x9
       0x000055555555465f <+37>:	jle    0x55555555464b <main+17>
       0x0000555555554661 <+39>:	mov    eax,0x0
       0x0000555555554666 <+44>:	leave  
       0x0000555555554667 <+45>:	ret
    
    End of assembler dump.
      
      
  • Set a breakpoint at the start of main() and run the program:
    (gdb) break main
    Breakpoint 1 at 0x642: file test.c, line 6.
    (gdb) run
    Starting program: /home/jerry/test/a.out 
    
    Breakpoint 1, main () at test.c:6
    6	  for(i=0; i < 10; i++)
      
        
  • Display the value of RIP (the instruction pointer):
    (gdb) info register rip
    rip            0x555555554642	0x555555554642 <main+8>
         
    Notice that RIP contains a memory address that points to an instruction in the main() function’s disassembly.
    The instructions before this are collectively known as the function prologue and are generated by the compiler to set up memory for the rest of the main() function’s local variables.
    The debugger knows this part of the code is automatically generated and is smart enough to skip over it, which is why the breakpoint set on main() lands at main+8 rather than at main+0.
Examining memory is a critical skill for any hacker. The GDB debugger provides a direct method to examine memory, using the command x, which is short for examine. This command expects two arguments when it’s used:
  • The location in memory to examine.
    The memory the RIP register is pointing to can be examined by using the address stored in RIP.
    
    (gdb) info register rip
    rip            0x555555554642	0x555555554642 <main+8>
    (gdb) x/o 0x555555554642
    0x555555554642 <main+8>:	077042707
    (gdb) x/o $rip
    0x555555554642 <main+8>:	077042707
      
         
    The debugger lets you reference registers directly, so $rip is equivalent to the value RIP contains at that moment.
  • How to display that memory.
    The display format uses a single-letter shorthand:
    • o  Display in octal.
    • x  Display in hexadecimal.
    • u  Display in unsigned, standard base-10 decimal.
    • t  Display in binary.
    The format letter is optionally preceded by a count of how many units to examine at the target address.
    The default size of a single unit is a 4-byte unit called a word. The size of the display units for the examine command can be changed by adding a size letter to the end of the format letter:
    • b  A single byte
    • h  A halfword, which is 2 bytes in size
    • w  A word, which is 4 bytes in size
    • g  A giant, which is 8 bytes in size
    
    (gdb) x/4xw $rip
    0x555555554642 <main+8>:	0x00fc45c7	0xeb000000	0x3d8d4810	0x000000a2
    (gdb) x/16xb $rip
    0x555555554642 <main+8>:	0xc7	0x45	0xfc	0x00	0x00	0x00	0x00	0xeb
    0x55555555464a <main+16>:	0x10	0x48	0x8d	0x3d	0xa2	0x00	0x00	0x00
    
         
On the x86 processor, values are stored in little-endian byte order, which means the least significant byte is stored first (at the lowest address).
A short program that prints the bytes of an integer in memory order shows this:

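#include <stdio.h>

int main()
{
  size_t i;
  int num = 0x12345678;
  /* View the integer as raw bytes. unsigned char avoids sign extension
     when a byte of 0x80 or above is passed to printf. */
  unsigned char *b = (unsigned char *) &num;

  /* Print each byte, from the lowest address to the highest. */
  for(i=0; i < sizeof(int); i++)
  {
    printf("Hello, world! %x\n", b[i]);
  }
  return 0;
}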

$ ./test
Hello, world! 78
Hello, world! 56
Hello, world! 34
Hello, world! 12  // the most significant byte is at the highest address
  
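The GDB debugger is smart enough to know how values are stored, so when a unit larger than a byte is examined, the bytes are reversed to display the correct value. The example below shows the same four bytes first as individual bytes and then as one word, both in unsigned decimal.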

(gdb) x/4ub $rip
0x555555554642 <main+8>:	199	69	252	0
(gdb) x/1uw $rip
0x555555554642 <main+8>:	16532935

16532935 = 0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
     
The examine command also accepts the format letter i , short for instruction, to display the content of the memory as disassembled assembly language instructions.

(gdb) x/7xb $rip
0x555555554642 <main+8>:	0xc7	0x45	0xfc	0x00	0x00	0x00	0x00
(gdb) x/i $rip
=> 0x555555554642 <main+8>:	mov    DWORD PTR [rbp-0x4],0x0

We have seen that the binary dump is disassembled as the following:

642:	c7 45 fc 00 00 00 00 	mov    DWORD PTR [rbp-0x4],0x0
This assembly instruction will move the value of 0 into memory located at the address stored in the RBP register, minus 4.
This is where the C variable i is stored in memory; i was declared as an integer that uses 4 bytes of memory on the x86 processor.
Basically, this command will zero out the variable i for the for loop.
Let’s execute the current instruction using the command nexti (next instruction):

Breakpoint 1, main () at test.c:6
6	  for(i=0; i < sizeof(int); i++)
(gdb) nexti
0x0000555555554659	6	  for(i=0; i < sizeof(int); i++)

The processor will read the instruction at RIP, execute it, and advance RIP to the next instruction:

(gdb) info r rip
rip            0x555555554659	0x555555554659 <main+15>
(gdb) x/i $rip
=> 0x555555554659 <main+15>:	jmp    0x555555554670 <main+38>
