January 26, 2014

Kernel Debugging

Debug Hacks

Basics

#4 Core Dump of a Process

The default action of certain signals is to cause a process to terminate and produce a core dump file, a disk file containing an image of the process's memory at the time of termination.
This image can be used in a debugger (e.g., gdb(1)) to inspect the state of the program at the time that it terminated.

By default, a typical shell environment does not allow core files to be produced:


$ ulimit -c
0

-c: The maximum size of core files created.

To allow core files to be generated:


$ ulimit -c unlimited

You can see a process’s limits by running cat /proc/PID/limits.
Use the following code for testing:


#include <string.h>

int main(){
 char *ptr=NULL;

 *ptr=0;

}

Compile and run it to generate a core file:


$ gcc -g seg.c -o ./test

$ ./test
Segmentation fault (core dumped)

$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './test', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: './test', platform: 'x86_64'

$ gdb -c core ./test
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
...
Reading symbols from ./test...(no debugging symbols found)...done.
[New LWP 7209]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055a3a2a8060a in main () at seg.c:6
6  *ptr=0;
(gdb)

Note that the program must be compiled with -g so that gdb can map the crash address back to a source line.
gdb's list command (l 6) prints the source code around the crashing line:


(gdb) l 6
1 #include <string.h>
2 
3 int main(){
4  char *ptr=NULL;
5 
6  *ptr=0;
7  
8 }

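By default, the core file is created in the process's current working directory. For large software, however, it is hard to find out which working directory a given program was running in, so it is better to have a dedicated directory for core files; this also makes it possible to control how much space they take.

The name and location of core files are controlled by /proc/sys/kernel/core_pattern. Linux also supports an alternate syntax for this file: if the first character is a pipe symbol (|), the remainder of the line is interpreted as a program to be executed, and the core dump is given to that program on its standard input.

Ubuntu desktop comes with Apport preinstalled. Apport is an error-collection system: when an application crashes or hits a bug, Apport pops up a dialog to warn the user and ask whether to submit a crash report.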
Apport uses /proc/sys/kernel/core_pattern to directly pipe the core dump into apport:


|/usr/share/apport/apport %p %s %c %d %P

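The specifiers that can appear in core_pattern include:

%p: PID of the dumped process, as seen in the PID namespace in which the process resides
%s: number of the signal causing the dump
%c: core file size soft resource limit of the crashing process
%d: dump mode (the same value as returned by prctl(2) PR_GET_DUMPABLE)
%P: PID of the dumped process, as seen in the initial PID namespace
%e: executable filename (without path prefix)
%h: hostname
%t: time of the dump, as a timestamp (seconds since the Epoch)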
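To get a feel for how a file-name pattern expands, the substitution can be imitated in the shell. This is only a rough sketch of what the kernel does: expand_core_pattern is a name made up for this example, it handles just %e, %p, %h and %t, and it assumes the substituted values contain no / or & characters (which would confuse sed).

```shell
# Imitate core_pattern expansion for a few specifiers (illustration only).
# $1 = pattern, $2 = executable name (%e), $3 = PID (%p),
# $4 = hostname (%h), $5 = time of dump in epoch seconds (%t)
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s/%e/$2/g" \
        -e "s/%p/$3/g" \
        -e "s/%h/$4/g" \
        -e "s/%t/$5/g"
}

expand_core_pattern '/tmp/core-%e.%p.%h.%t' test 9950 myhost 1577308679
```

With the sample arguments this prints /tmp/core-test.9950.myhost.1577308679; the host name myhost is a placeholder.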

Apport logs to /var/log/apport.log. By default, it ignores crashes from binaries that are not part of an Ubuntu package. Rather than trying to convince Apport to hand over my core dumps, I ended up overriding it and setting kernel.core_pattern directly:


sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t

The generated core dump file: /tmp/core-test.9950.jerry-Latitude-E6410.1577308679

The Linux-specific /proc/[pid]/coredump_filter file can be used to control which memory segments are written to the core dump file. This file is provided only if the kernel was built with the CONFIG_ELF_CORE configuration option. The value in the file is a bit mask of memory mapping types (see mmap(2)):

bit 0 Dump anonymous private mappings.
bit 1 Dump anonymous shared mappings.
bit 2 Dump file-backed private mappings.
bit 3 Dump file-backed shared mappings.
bit 4 Dump ELF headers.
bit 5 Dump private huge pages.
bit 6 Dump shared huge pages.
bit 7 Dump private DAX pages.
bit 8 Dump shared DAX pages.

By default, bits 0, 1, 4, and 5 are set (binary 110011, 0x33). If you do not want large shared-memory mappings to be dumped, change the mask. For example, to dump only anonymous private mappings:


echo 1 > /proc/[pid]/coredump_filter
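A mask value is easier to read once it is broken into its bits. The helper below is a minimal sketch: decode_coredump_filter and the short labels are names invented for this example, and the bit order simply follows the list above.

```shell
# Print which mapping types a coredump_filter mask selects (bits 0 to 8).
decode_coredump_filter() {
    mask=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    bit=0
    for name in anon-private anon-shared file-private file-shared \
                elf-headers hugetlb-private hugetlb-shared \
                dax-private dax-shared; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}

decode_coredump_filter 0x33
```

The file itself shows the mask in hexadecimal without a 0x prefix, so a value read from /proc/[pid]/coredump_filter needs 0x prepended before it is passed to the helper.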

#5 GDB Basic 1: Trace

Build the program with the -g debug option:

gcc -Wall -O2 -g

Adding -Werror makes gcc treat all warnings as errors.

Start gdb


$ gdb program

Set breakpoint


break position

position can be any of:

function name
line number
filename:function_name
filename:line_number
+offset
-offset
*address


(gdb) b main
Breakpoint 1 at 0x5fe: file seg.c, line 4.
(gdb) b 4
Note: breakpoint 1 also set at pc 0x5fe.
Breakpoint 2 at 0x5fe: file seg.c, line 4.


(gdb) info break

Delete breakpoint

clear
clear function
clear filename:function
clear linenum
clear filename:linenum
delete [breakpoints] [range...]
delete all


run program_parameters

start

next
step

nexti

stepi


continue count

Set breakpoint with condition


break position if cond

cond

Backtrace: Displays the call trace for the currently selected thread.

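Each function call generates information about the call, including:

the location of the call in your program
the arguments of the call
the local variables of the function being called

This information is saved in a block of data called a stack frame, and the stack frames are allocated in a region of memory called the call stack. When the program starts, the stack has only one frame, that of main; this is the initial frame, also called the outermost frame.

The backtrace command (bt) prints one line per frame: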


#include <stdio.h>

void call_2()
{
    printf("hello 2\n");
}

void call_1()
{
    printf("hello 1\n");
    call_2();
}

int main()
{
    printf("hello\n");
    call_1();

    return 0;
}


(gdb) b call_2
Breakpoint 1 at 0x63e: file stack.c, line 5.
(gdb) r
Starting program: /home/jerry/test/a.out 
hello
hello 1

Breakpoint 1, call_2 () at stack.c:5
5     printf("hello 2\n");
(gdb) bt
#0  call_2 () at stack.c:5
#1  0x0000555555554667 in call_1 () at stack.c:11
#2  0x0000555555554684 in main () at stack.c:17

Set variables


set variable variable=expression

print: Examining Data

print


print expr

info registers


(gdb) p $registerName

p/format


(gdb) x $pc
0x55555555466e <main+4>: 0xaf3d8d48


(gdb) x/i $pc
=> 0x55555555466e <main+4>: lea    0xaf(%rip),%rdi        # 0x555555554724


 x/NFU addr

N: the repeat count
F: the display format
U: the unit size


(gdb) x/5i $pc
=> 0x55555555466e <main+4>: lea    0xaf(%rip),%rdi        # 0x555555554724


disassemble
disassemble [Function]
disassemble [Address]
disassemble [Start],[End]
disassemble [Function],+[Length]
disassemble [Address],+[Length]
disassemble /m [...]
disassemble /r [...]

watchpoint

Set a watchpoint for an expression. GDB will break when expr is written into by the program and its value changes.


watch expr

Set a watchpoint that will break when the value of expr is read by the program.


rwatch expr

Set a watchpoint that will break when expr is either read or written into by the program.


awatch expr

This command prints a list of watchpoints, using the same format as info break. Older GDB versions also listed breakpoints and catchpoints here.


info watchpoints

Generate core file


generate-core-file


$ gcore PID

#6 GDB Basic 2

Debugging an already-running process


attach process-id

stop

examine and modify an attached process with all the GDB commands

run

detach

If you exit GDB or use the run command while you have an attached process, you kill that process.

Break conditions


 if ( node == 0 )


  break position


  condition #break condition


  condition #break

Breakpoint command lists

a series of commands to execute when your program stops due to that breakpoint


  commands #break
  ... command-list ...
  end

#8 Intel Architecture Basic

#9 Stack

The primary purpose of a call stack is to store the return addresses. When a subroutine is called, the location (address) of the instruction at which the calling routine can later resume needs to be saved somewhere. A call stack is composed of stack frames. These are machine-dependent and ABI-dependent data structures containing subroutine state information. Each stack frame corresponds to a call to a subroutine which has not yet terminated with a return. For example, if a subroutine named DrawLine() is currently running, having been called by a subroutine DrawSquare(), the top part of the call stack holds the frame for DrawLine() on top of the frame for DrawSquare().

The stack frame at the top of the stack is for the currently executing routine. The stack frame usually includes at least the following items (in push order):

the arguments (parameter values) passed to the routine (if any)
the return address back to the routine's caller (e.g. in the DrawLine() stack frame, an address into DrawSquare()'s code)
space for the local variables of the routine (if any).

Chapter 3 Prepare for Kernel Debug

Decode Oops Messages

Bug hunting: Kernel bug reports often come with a stack dump; depending on the severity of the issue, the report may also contain the word Oops. The following file, oopsdemo.c, deliberately triggers an oops:


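#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("Dual BSD/GPL");

/* Module init: deliberately write through a NULL pointer to trigger an oops. */
static int init_oopsdemo(void) {
  *((int *) 0x00 ) = 0x123456;
  return 0;
}

/* Nothing to clean up; present so that the module has an exit handler. */
static void cleanup_oopsdemo(void){

}

module_init(init_oopsdemo);
module_exit(cleanup_oopsdemo);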

Makefile:


# If KERNELRELEASE is defined, we've been invoked in the kernel build system
ifneq ($(KERNELRELEASE),)
obj-m := oopsdemo.o
# Otherwise we were called directly from the command line
else
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD  := $(shell pwd)
default:
        $(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif

Load the oopsdemo.ko module with insmod; the kernel log (dmesg) then shows the oops:



[  618.291854] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  618.291864] #PF error: [WRITE]
[  618.291866] PGD 0 P4D 0 
[  618.291877] Oops: 0002 [#1] SMP PTI

[  618.291884] CPU: 0 PID: 7275 Comm: insmod Tainted: G           OE     5.0.0-23-generic #24~18.04.1-Ubuntu
[  618.291886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  618.291897] RIP: 0010:init_oopsdemo+0x8/0x20 [oopsdemo]
[  618.291903] Code: Bad RIP value.
[  618.291904] RSP: 0018:ffffafad41b03c70 EFLAGS: 00010246
[  618.291906] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  618.291908] RDX: 000000000000f1f9 RSI: 00000000006000c0 RDI: ffffffffc0529000
[  618.291909] RBP: ffffafad41b03ce8 R08: ffff95dbfda27080 R09: ffff95dbfd401900
[  618.291910] R10: ffffe40e01e88580 R11: ffff95dbfffae000 R12: ffffffffc0529000
[  618.291911] R13: ffff95dbb560eea0 R14: ffffafad41b03e78 R15: ffffffffc052b000
[  618.291913] FS:  00007f889cc40540(0000) GS:ffff95dbfda00000(0000) knlGS:0000000000000000
[  618.291914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  618.291916] CR2: ffffffffc0528fde CR3: 0000000078fd6000 CR4: 00000000000006f0
[  618.291920] Call Trace:
[  618.291946]  ? do_one_initcall+0x4a/0x1c9
[  618.291960]  ? _cond_resched+0x19/0x40
[  618.291966]  ? kmem_cache_alloc_trace+0x42/0x1c0
[  618.291971]  do_init_module+0x5f/0x216
[  618.291974]  load_module+0x19f6/0x20a0
[  618.291978]  __do_sys_finit_module+0xfc/0x120
[  618.291979]  ? __do_sys_finit_module+0xfc/0x120
[  618.291982]  __x64_sys_finit_module+0x1a/0x20
[  618.291984]  do_syscall_64+0x5a/0x120
[  618.291987]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  618.291990] RIP: 0033:0x7f889c756839
[  618.291993] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[  618.291995] RSP: 002b:00007fff7a2e7ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  618.291997] RAX: ffffffffffffffda RBX: 00005556adb9d780 RCX: 00007f889c756839
[  618.291998] RDX: 0000000000000000 RSI: 00005556ad454d2e RDI: 0000000000000003
[  618.292000] RBP: 00005556ad454d2e R08: 0000000000000000 R09: 00007f889ca29000
[  618.292001] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[  618.292002] R13: 00005556adb9ff70 R14: 0000000000000000 R15: 0000000000000000
[  618.292004] Modules linked in: oopsdemo(OE+) snd_hda_codec_generic ledtrig_audio snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul ghash_clmulni_intel snd_hda_core joydev aesni_intel snd_hwdep aes_x86_64 snd_pcm qxl snd_seq_midi crypto_simd snd_seq_midi_event ttm snd_rawmidi cryptd snd_seq glue_helper drm_kms_helper snd_seq_device snd_timer snd drm fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt input_leds serio_raw qemu_fw_cfg mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid psmouse virtio_blk virtio_net net_failover failover i2c_piix4 pata_acpi floppy
[  618.292057] CR2: 0000000000000000
[  618.292064] ---[ end trace dfd9153d646f12aa ]---
[  618.292068] RIP: 0010:init_oopsdemo+0x8/0x20 [oopsdemo]
[  618.292072] Code: Bad RIP value.
[  618.292074] RSP: 0018:ffffafad41b03c70 EFLAGS: 00010246
[  618.292075] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  618.292082] RDX: 000000000000f1f9 RSI: 00000000006000c0 RDI: ffffffffc0529000
[  618.292084] RBP: ffffafad41b03ce8 R08: ffff95dbfda27080 R09: ffff95dbfd401900
[  618.292085] R10: ffffe40e01e88580 R11: ffff95dbfffae000 R12: ffffffffc0529000
[  618.292087] R13: ffff95dbb560eea0 R14: ffffafad41b03e78 R15: ffffffffc052b000
[  618.292088] FS:  00007f889cc40540(0000) GS:ffff95dbfda00000(0000) knlGS:0000000000000000
[  618.292090] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  618.292091] CR2: ffffffffc0528fde CR3: 0000000078fd6000 CR4: 00000000000006f0

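Oops: error code [#count]

On x86, the error code of a page-fault oops is the page-fault error code in hexadecimal, and the number after # counts how many oopses have occurred so far. The low bits of the error code mean:

bit 0: 0 = no page found, 1 = protection fault
bit 1: 0 = read access, 1 = write access
bit 2: 0 = kernel-mode access, 1 = user-mode access

In the log above, 0002 is therefore a write, from kernel mode, to a page that was not found.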
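The low three bits of the error code printed on the Oops: line can be decoded with a small helper. This is a minimal sketch that assumes the usual x86 page-fault meanings (bit 0: protection fault vs. page not found, bit 1: write vs. read, bit 2: user mode vs. kernel mode); decode_oops_code is a name made up for this example.

```shell
# Decode bits 0 to 2 of an x86 page-fault error code from an "Oops:" line.
decode_oops_code() {
    code=$(( 0x$1 ))        # the Oops line prints the code in hex, e.g. 0002
    if [ $(( code & 1 )) -eq 0 ]; then echo "no page found"; else echo "protection fault"; fi
    if [ $(( code & 2 )) -eq 0 ]; then echo "read access"; else echo "write access"; fi
    if [ $(( code & 4 )) -eq 0 ]; then echo "kernel mode"; else echo "user mode"; fi
}

decode_oops_code 0002
```

For the oopsdemo log this reports a write access in kernel mode to a page that was not found, which agrees with the "#PF error: [WRITE]" line and the NULL address.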

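CPU: the CPU on which the oops occurred
PID: the ID of the process that was running
Comm: the command name of that process (insmod here)
Tainted: the kernel taint flags (here O = an out-of-tree module was loaded, E = an unsigned module was loaded)

Call Trace: the stack backtrace (also called a stack dump or stack trace) at the time of the oops. It helps identify the line inside the kernel's source code where the bug happened.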

Normally the Oops text is read from the kernel buffers by klogd and handed to syslogd which writes it to a syslog file, typically /var/log/messages (depends on /etc/syslog.conf). On systems with systemd, it may also be stored by the journald daemon, and accessed by running journalctl command. If klogd dies, you can run:


 dmesg > file

or,


cat /proc/kmsg > file

If the machine has crashed so badly that you cannot enter commands or the disk is not available, you need to get the Oops by:

Hand copy the text from the screen/terminal
Use Kdump

To find the bug’s location:

objdump


$ objdump -r -S -l --disassemble oopsdemo.o

oopsdemo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <init_oopsdemo>:
init_oopsdemo():
   0: e8 00 00 00 00        callq  5 <init_oopsdemo+0x5>
   1: R_X86_64_PC32 __fentry__-0x4
   5: 55                    push   %rbp
   6: 31 c0                 xor    %eax,%eax
   8: c7 04 25 00 00 00 00  movl   $0x123456,0x0
   f: 56 34 12 00 
  13: 48 89 e5              mov    %rsp,%rbp
  16: 5d                    pop    %rbp
  17: c3                    retq   
  18: 0f 1f 84 00 00 00 00  nopl   0x0(%rax,%rax,1)
  1f: 00 

0000000000000020 <cleanup_oopsdemo>:
cleanup_oopsdemo():
  20: e8 00 00 00 00        callq  25 <cleanup_oopsdemo+0x5>
   21: R_X86_64_PC32 __fentry__-0x4
  25: 55                    push   %rbp
  26: 48 89 e5              mov    %rsp,%rbp
  29: 5d                    pop    %rbp
  2a: c3                    retq
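The RIP line of the oops, init_oopsdemo+0x8/0x20, gives the offset of the faulting instruction inside the function and the function's length. Offset 0x8 in the listing above is the movl $0x123456,0x0 instruction, that is, the NULL write. Splitting such a string can be sketched in plain shell; parse_rip is a hypothetical helper name, not an existing tool.

```shell
# Split "symbol+0xOFF/0xLEN" (as printed on the RIP line) into its parts.
parse_rip() {
    sym=${1%%+*}            # text before the first '+'
    rest=${1#*+}            # "0xOFF/0xLEN"
    off=${rest%%/*}         # offset of the faulting instruction
    len=${rest#*/}          # total length of the function
    echo "symbol=$sym offset=$(( $off )) length=$(( $len ))"
}

parse_rip 'init_oopsdemo+0x8/0x20'
```

The decimal offset can then be matched against the address column of the objdump output.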

gdb

The kernel must have been built with debug info:


$ grep CONFIG_DEBUG_INFO /boot/config-`uname -r`
CONFIG_DEBUG_INFO=y


$ gdb vmlinux
(gdb) l *0xc021e50e

#18 Linux Magic System Request Key

It is a ‘magical’ key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up. To enable the magic SysRq key:


  CONFIG_MAGIC_SYSRQ=y

/proc/sys/kernel/sysrq controls the functions allowed to be invoked via the SysRq key:

0: disable SysRq completely
1: enable all SysRq functions
>1: a bitmask of allowed SysRq functions:


  2 =   0x2 - enable control of console logging level
  4 =   0x4 - enable control of keyboard (SAK, unraw)
  8 =   0x8 - enable debugging dumps of processes etc.
 16 =  0x10 - enable sync command
 32 =  0x20 - enable remount read-only
 64 =  0x40 - enable signalling of processes (term, kill, oom-kill)
128 =  0x80 - allow reboot/poweroff
256 = 0x100 - allow nicing of all RT tasks
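A value read from /proc/sys/kernel/sysrq can be decoded against this table. The function below is a minimal sketch; decode_sysrq is a name made up here, and 176 is used only as an illustrative value.

```shell
# Decode a /proc/sys/kernel/sysrq value using the bitmask table above.
decode_sysrq() {
    mask=$(( $1 ))
    if [ "$mask" -eq 0 ]; then echo "sysrq disabled"; return; fi
    if [ "$mask" -eq 1 ]; then echo "all functions enabled"; return; fi
    for entry in "2:console logging level" "4:keyboard (SAK, unraw)" \
                 "8:debugging dumps" "16:sync" "32:remount read-only" \
                 "64:signalling of processes" "128:reboot/poweroff" \
                 "256:nicing of all RT tasks"; do
        bit=${entry%%:*}    # numeric flag before the colon
        desc=${entry#*:}    # description after the colon
        if [ $(( mask & bit )) -ne 0 ]; then echo "$bit: $desc"; fi
    done
}

decode_sysrq 176
```

Since 176 = 128 + 32 + 16, the call lists sync, remount read-only and reboot/poweroff as the allowed functions.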

To use the magic SysRq key, write a command character to /proc/sysrq-trigger. e.g.:


echo m > /proc/sysrq-trigger

To enable all functions:


$ sudo -i
# echo 1 > /proc/sys/kernel/sysrq
# echo m > /proc/sysrq-trigger
# dmesg
...
[ 8575.330056] sysrq: SysRq : Show Memory
[ 8575.330067] Mem-Info:
...

Following are the command keys available for Alt+SysRq+commandkey.

‘k’ – Kills all processes running on the current virtual console.
‘s’ – Attempts to sync all mounted file systems.
‘b’ – Immediately reboots the system, without unmounting partitions or syncing.
‘c’ – Triggers a crash dump.
‘e’ – Sends SIGTERM to all processes except init.
‘m’ – Outputs current memory information to the console.
‘i’ – Sends SIGKILL to all processes except init.
‘r’ – Switches the keyboard from raw mode (the mode used by programs such as X11) to XLATE mode.
‘t’ – Outputs a list of current tasks and their information to the console.
‘u’ – Remounts all mounted file systems read-only.
‘o’ – Shuts down the system immediately.
‘p’ – Prints the current registers and flags to the console.
‘0-9’ – Sets the console log level, controlling which kernel messages are printed to the console.
‘f’ – Calls the OOM killer to kill a memory-hogging process.
‘h’ – Displays help; any key not listed above also prints the help.

Use Kdump to get kernel crash dump

Kdump is a kernel crash dumping mechanism that allows you to save the contents of the system’s memory for later analysis. It relies on kexec, which can be used to boot a Linux kernel from the context of another kernel, bypass BIOS, and preserve the contents of the first kernel’s memory that would otherwise be lost. The kernel crash dump utility is installed with the following command:


sudo apt install linux-crashdump

The kdump mechanism will be enabled during the installation. Kdump utilizes two kernels:

System kernel
Dump capture kernel

In real production environments the system kernel and the dump capture kernel are different: the system kernel needs many features and is built with many configuration options and drivers, while the dump capture kernel aims to be minimal and use as little memory as possible.

Compiling the dump capture kernel

CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y

Setup kdump kernel


  crashkernel=64M

/boot/grub/grub.cfg

/etc/default/grub

/etc/grub.d/

/etc/default/grub


GRUB_TIMEOUT_STYLE=menu
GRUB_CMDLINE_LINUX="crashkernel=128M"

/etc/grub.d/

update-grub


$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=900ed0fb-963a-4962-a941-2f6056f43d9e ro crashkernel=128M quiet splash vt.handoff=1


crashkernel=(range1):(size1),(range2):(size2)


crashkernel=128M@16M

Configuring the kdump type

a file in a local file system
written directly to a device
sent over a network using the NFS (Network File System) or SSH (Secure Shell) protocol

To confirm that the kernel dump mechanism is enabled,

the crashkernel boot parameter is present


cat /proc/cmdline

the requested memory area for the kdump kernel is reserved


$ dmesg | grep -i crash

display the current config


$ kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.0.0-37-generic
kdump initrd: 
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.0.0-37-generic
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=900ed0fb-963a-4962-a941-2f6056f43d9e ro quiet splash vt.handoff=1 systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Testing the Crash Dump Mechanism,


$ sudo -s
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

After the system reboots normally, the kernel crash dump file and its related subdirectories can be found in the /var/crash directory:


drwxr-sr-x 2 root whoopsie  4096  一   7 13:44 202001071344
-rw-r--r-- 1 root whoopsie   298  一   7 13:44 kexec_cmd
-rw-r----- 1 root whoopsie 17439  一   7 13:44 linux-image-5.0.0-37-generic-202001071344.crash

Using the crash utility

To determine the cause of the system crash, you can use the crash utility, which provides an interactive prompt very similar to the GNU Debugger (GDB). This utility allows you to interactively analyze a running Linux system as well as a core dump created by netdump, diskdump, xendump, or kdump. Install crash packages,


sudo apt-get install crash

Running the crash utility,


crash vmlinux  /var/crash/(timestamp)/vmcore

Chapter 6 Debug Tips

#56 Surviving the Linux OOM Killer

When a Linux machine runs out of memory, the kernel invokes the Out of Memory (OOM) killer to free some. The kernel gives each running process a score, oom_score, that shows how likely the process is to be terminated when available memory is low; it can be read from /proc/[pid]/oom_score. The OOM killer also takes /proc/[pid]/oom_score_adj (-1000 to 1000) into account when computing the final score: -1000 exempts the process, and higher values make it a more likely victim. To check whether any of your processes have been OOM-killed:


grep -i kill /var/log/syslog

( -i, --ignore-case )
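To see which processes the OOM killer would pick first, the scores can be read straight from /proc. The loop below is a minimal sketch, assuming a Linux /proc with oom_score, oom_score_adj and comm files; list_oom_scores is a name invented for this example.

```shell
# Print "oom_score oom_score_adj pid name", highest score first.
# $1 = how many lines to show (default 10)
list_oom_scores() {
    for dir in /proc/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process may exit while we read it; skip it in that case.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        adj=$(cat "$dir/oom_score_adj" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score $adj ${dir#/proc/} $name"
    done | sort -rn | head -n "${1:-10}"
}

list_oom_scores 5
```

A process can be made a less likely victim by writing a negative value to its oom_score_adj file, which requires root.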

Hacking: The Art of Exploitation, 2nd Edition

by Jon Erickson

Chapter 0x200. PROGRAMMING

0x250. Getting Your Hands Dirty

firstprog.c:


#include <stdio.h>
int main() {
  int i;
  for(i=0; i < 10; i++) {
    puts("Hello, world!\n");
  }
  return 0;
}

0x251. The Bigger Picture

The GNU development tools include a program called objdump, which can be used to examine compiled binaries.


$ objdump -D a.out | grep -A20 main.:
000000000000063a <main>:
 63a: 55                    push   %rbp
 63b: 48 89 e5              mov    %rsp,%rbp
 63e: 48 83 ec 10           sub    $0x10,%rsp
 642: c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
 649: eb 10                 jmp    65b <main+0x21>
 64b: 48 8d 3d a2 00 00 00  lea    0xa2(%rip),%rdi        # 6f4 <_IO_stdin_used+0x4>
 652: e8 b9 fe ff ff        callq  510 <puts@plt>
 657: 83 45 fc 01           addl   $0x1,-0x4(%rbp)
 65b: 83 7d fc 09           cmpl   $0x9,-0x4(%rbp)
 65f: 7e ea                 jle    64b <main+0x11>
 661: b8 00 00 00 00        mov    $0x0,%eax
 666: c9                    leaveq 
 667: c3                    retq   
 668: 0f 1f 84 00 00 00 00  nopl   0x0(%rax,%rax,1)
 66f: 00 

0000000000000670 <__libc_csu_init>:
 670: 41 57                 push   %r15
 672: 41 56                 push   %r14
 674: 49 89 d7              mov    %rdx,%r15

The output is piped through grep with the command-line option "-A20" so that only the 20 lines following the match for the regular expression main.: are displayed. The same code can be shown in Intel syntax by providing an additional command-line option, "-M intel", to objdump:


$ objdump -M intel -D a.out | grep -A20 main.:
000000000000063a <main>: 
 63a: 55                    push   rbp
 63b: 48 89 e5              mov    rbp,rsp
 63e: 48 83 ec 10           sub    rsp,0x10
 642: c7 45 fc 00 00 00 00  mov    DWORD PTR [rbp-0x4],0x0
 649: eb 10                 jmp    65b <main+0x21>
 64b: 48 8d 3d a2 00 00 00  lea    rdi,[rip+0xa2]        # 6f4 <_IO_stdin_used+0x4>
 652: e8 b9 fe ff ff        call   510 <puts@plt>
 657: 83 45 fc 01           add    DWORD PTR [rbp-0x4],0x1
 65b: 83 7d fc 09           cmp    DWORD PTR [rbp-0x4],0x9
 65f: 7e ea                 jle    64b <main+0x11>
 661: b8 00 00 00 00        mov    eax,0x0
 666: c9                    leave  
 667: c3                    ret    
 668: 0f 1f 84 00 00 00 00  nop    DWORD PTR [rax+rax*1+0x0]
 66f: 00 

0000000000000670 <__libc_csu_init>:
 670: 41 57                 push   r15
 672: 41 56                 push   r14
 674: 49 89 d7              mov    r15,rdx

0x252. The x64 Processor


$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) break main
Breakpoint 1 at 0x642: file firstprog.c, line 4.
(gdb) run
Starting program: /home/jerry/test/a.out 

Breakpoint 1, main () at firstprog.c:4
4   for(i=0; i < 10; i++) {
(gdb) info registers
rax            0x55555555463a 93824992233018
rbx            0x0 0
rcx            0x555555554670 93824992233072
rdx            0x7fffffffdd78 140737488346488
rsi            0x7fffffffdd68 140737488346472
rdi            0x1 1
rbp            0x7fffffffdc80 0x7fffffffdc80
rsp            0x7fffffffdc70 0x7fffffffdc70
r8             0x7ffff7dd0d80 140737351847296
r9             0x7ffff7dd0d80 140737351847296
r10            0x2 2
r11            0x3 3
r12            0x555555554530 93824992232752
r13            0x7fffffffdd60 140737488346464
r14            0x0 0
r15            0x0 0
rip            0x555555554642 0x555555554642 <main+8>
eflags         0x202 [ IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0 0
es             0x0 0
fs             0x0 0
gs             0x0 0

There are sixteen 64-bit registers in x86-64. By convention,

%rax is used to store a function’s return value, if it exists and is no more than 64 bits long.
%rbx, %rbp, and %r12-r15 are callee-save registers, meaning that they are saved across function calls.
%rsp is used as the stack pointer, a pointer to the topmost element in the stack.
%rdi, %rsi, %rdx, %rcx, %r8, and %r9 are used to pass the first six integer or pointer parameters to called functions. Additional parameters (or large parameters such as structs passed by value) are passed on the stack.

0x253. Assembly Language

White Paper: Red Hat Crash Utility

Abstract

Crash is a tool for interactively analyzing the state of the Linux system while it is running, or after a kernel crash has occurred and a core dump has been created by the netdump, diskdump, LKCD, kdump, xendump or kvmdump facilities. It is loosely based on the SVR4 UNIX crash command, but has been significantly enhanced by completely merging it with the gdb debugger. The crash utility is designed to be independent of Linux version dependencies.

Prerequisites

The crash utility has the following prerequisites:

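kernel object file: a vmlinux kernel object file built with -g, so that it contains the debug data needed for symbolic debugging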

memory image
platform processor types
Linux kernel versions

Installation


$ sudo apt-get install crash

Invocation

When crash is run on a dumpfile, at least two arguments are always required:

The kernel object filename
The dumpfile name

pr_debug()

“Kernel hacking” → kobject debugging [CONFIG_DEBUG_KOBJECT]

Some files call pr_debug(), which is ordinarily an empty macro that discards its arguments at compile time. To enable debugging output, build the appropriate file with -DDEBUG by adding

CFLAGS_[filename].o := -DDEBUG

to the makefile.

For example, to see all attempts to spawn a usermode helper (such as /sbin/hotplug), add to lib/Makefile the line:

CFLAGS_kobject_uevent.o := -DDEBUG

Then boot the new kernel, do something that spawns a usermode helper, and use the "dmesg" command to view the pr_debug() output.

Debugging Support in the Kernel

Except where specified otherwise, all of these options are found under the “kernel hacking” menu in whatever kernel configuration tool you prefer. Note that some of these options are not supported by all architectures.

CONFIG_DEBUG_KERNEL

This option just makes other debugging options available; it should be turned on but does not, by itself, enable any features.

CONFIG_DEBUG_SLAB

This crucial option turns on several types of checks in the kernel memory allocation functions; with these checks enabled, it is possible to detect a number of memory overrun and missing initialization errors. Each byte of allocated memory is set to 0xa5 before being handed to the caller and then set to 0x6b when it is freed. If you ever see either of those “poison” patterns repeating in output from your driver (or often in an oops listing), you’ll know exactly what sort of error to look for. When debugging is enabled, the kernel also places special guard values before and after every allocated memory object; if those values ever get changed, the kernel knows that somebody has overrun a memory allocation, and it complains loudly. Various checks for more obscure errors are enabled as well.

CONFIG_DEBUG_PAGEALLOC

Full pages are removed from the kernel address space when freed. This option can slow things down significantly, but it can also quickly point out certain kinds of memory corruption errors.

CONFIG_DEBUG_SPINLOCK

With this option enabled, the kernel catches operations on uninitialized spinlocks and various other errors (such as unlocking a lock twice).

CONFIG_DEBUG_SPINLOCK_SLEEP

This option enables a check for attempts to sleep while holding a spinlock. In fact, it complains if you call a function that could potentially sleep, even if the call in question would not sleep.

CONFIG_INIT_DEBUG

Items marked with __init (or __initdata) are discarded after system initialization or module load time. This option enables checks for code that attempts to access initialization-time memory after initialization is complete.

CONFIG_DEBUG_INFO

This option causes the kernel to be built with full debugging information included. You’ll need that information if you want to debug the kernel with gdb. You may also want to enable CONFIG_FRAME_POINTER if you plan to use gdb.

CONFIG_MAGIC_SYSRQ

Enables the “magic SysRq” key. We look at this key in the section “System Hangs,” later in this chapter.

CONFIG_DEBUG_STACKOVERFLOW

CONFIG_DEBUG_STACK_USAGE

These options can help track down kernel stack overflows. A sure sign of a stack overflow is an oops listing without any sort of reasonable back trace. The first option adds explicit overflow checks to the kernel; the second causes the kernel to monitor stack usage and make some statistics available via the magic SysRq key.

CONFIG_KALLSYMS

This option (under “General setup/Standard features”) causes kernel symbol information to be built into the kernel; it is enabled by default. The symbol information is used in debugging contexts; without it, an oops listing can give you a kernel traceback only in hexadecimal, which is not very useful.

CONFIG_IKCONFIG

CONFIG_IKCONFIG_PROC

These options (found in the “General setup” menu) cause the full kernel configuration state to be built into the kernel and to be made available via /proc. Most kernel developers know which configuration they used and do not need these options (which make the kernel bigger). They can be useful, though, if you are trying to debug a problem in a kernel built by somebody else.

CONFIG_ACPI_DEBUG

Under “Power management/ACPI.” This option turns on verbose ACPI (Advanced Configuration and Power Interface) debugging information, which can be useful if you suspect a problem related to ACPI.

CONFIG_DEBUG_DRIVER

Under “Device drivers.” Turns on debugging information in the driver core, which can be useful for tracking down problems in the low-level support code. We’ll look at the driver core in Chapter 14.

CONFIG_SCSI_CONSTANTS

This option, found under “Device drivers/SCSI device support,” builds in information for verbose SCSI error messages. If you are working on a SCSI driver, you probably want this option.

CONFIG_INPUT_EVBUG

This option (under “Device drivers/Input device support”) turns on verbose logging of input events. If you are working on a driver for an input device, this option may be helpful. Be aware of the security implications of this option, however: it logs everything you type, including your passwords.

CONFIG_PROFILING

This option is found under “Profiling support.” Profiling is normally used for system performance tuning, but it can also be useful for tracking down some kernel hangs and related problems.

We will revisit some of the above options as we look at various ways of tracking down kernel problems. But first, we will look at the classic debugging technique: print statements.

KERN_EMERG: An emergency condition; the system is probably dead.
KERN_ALERT: A problem that requires immediate attention.
KERN_CRIT: A critical condition.
KERN_ERR: An error.
KERN_WARNING: A warning.
KERN_NOTICE: A normal, but perhaps noteworthy, condition.
KERN_INFO: An informational message.
KERN_DEBUG: A debug message, typically superfluous.

If the priority is less than the integer variable console_loglevel, the message is delivered to the console one line at a time (nothing is sent unless a trailing newline is provided).

The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL.

It is also possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk.

The file hosts four integer values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel.

Writing a single value to this file changes the current loglevel to that value; thus, for example, you can cause all kernel messages to appear at the console by simply entering:

# echo 8 > /proc/sys/kernel/printk
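The four values in the file and the "less than console_loglevel" rule can be spelled out with two small helpers. This is a minimal sketch: describe_printk and reaches_console are names made up for this example, and the fallback values 4 4 1 7 are only an assumption used when the file cannot be read.

```shell
# Label the four integers read from /proc/sys/kernel/printk.
describe_printk() {
    echo "current console loglevel: $1"
    echo "default message loglevel: $2"
    echo "minimum console loglevel: $3"
    echo "boot-time default console loglevel: $4"
}

# A message reaches the console when its priority ($1) is numerically
# lower than the current console loglevel ($2).
reaches_console() {
    [ "$1" -lt "$2" ]
}

# Word splitting of the file content into four arguments is intentional.
describe_printk $(cat /proc/sys/kernel/printk 2>/dev/null || echo "4 4 1 7")

# KERN_ALERT is priority 1, KERN_DEBUG is priority 7.
if reaches_console 1 4; then echo "KERN_ALERT shown at loglevel 4"; fi
if ! reaches_console 7 4; then echo "KERN_DEBUG hidden at loglevel 4"; fi
```

With a console loglevel of 8, every priority from 0 to 7 passes the check, which is why writing 8 to the file makes all kernel messages appear.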

It should now be apparent why the hello.c sample had the KERN_ALERT markers; they are there to make sure that the messages appear on the console.

It is worth the memory use, however, during not only development but also deployment. The configuration option CONFIG_KALLSYMS_ALL additionally stores the symbolic name of all symbols, not only functions. This is generally needed only by specialized debuggers. The CONFIG_KALLSYMS_EXTRA_PASS option causes the kernel build process to make a second pass over the kernel’s object code. It is useful only when debugging kallsyms itself.

Thanks to kernel preemption, the kernel has a central atomicity counter. The kernel can be set such that if a task sleeps while atomic, or even does something that might sleep, the kernel prints a warning and provides a back trace. Potential bugs that are detectable include calling schedule() while holding a lock, issuing a blocking memory allocation while holding a lock, or sleeping while holding a reference to per-CPU data. This debugging infrastructure catches a lot of bugs and is highly recommended. The following options make the best use of this feature:

CONFIG_PREEMPT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_KALLSYMS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y

Most architectures define BUG() and BUG_ON() as illegal instructions, which result in the desired oops. You normally use these routines as assertions, to flag situations that should not happen:

if (bad_thing)
        BUG();

Or even better:

BUG_ON(bad_thing);

A more critical error is signaled via panic(). A call to panic() prints an error message and then halts the kernel. Obviously, you want to use it only in the worst of situations:

if (terrible_thing)
        panic("terrible_thing is %ld!\n", terrible_thing);

Sometimes, you just want a simple stack trace issued on the console to help you in debugging. In those cases, dump_stack() is used. It simply dumps the contents of the registers and a function back trace to the console:

if (!debug_check) {
        printk(KERN_DEBUG "provide some information...\n");
        dump_stack();
}
