Kernel Debugging
Debug Hacks
Basics
#4 Core Dump of a Process
The default action of certain signals is to cause a process to terminate and produce a core dump file, a disk file containing an image of the process's memory at the time of termination.
This image can be used in a debugger (e.g., gdb(1)) to inspect the state of the program at the time that it terminated.
A typical shell environment restricts core file generation:
$ ulimit -c
0
-c: The maximum size of core files created.
To allow core files to be generated:
$ ulimit -c unlimited
You can see a process's limits by running cat /proc/PID/limits.
Use the following code for testing:
#include <string.h>

int main(){
    char *ptr=NULL;

    *ptr=0;

}
Compile and run it to generate a core file:
$ gcc -g seg.c -o ./test
$ ./test
Segmentation fault (core dumped)
$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './test', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: './test', platform: 'x86_64'
$ gdb -c core ./test
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
...
Reading symbols from ./test...(no debugging symbols found)...done.
[New LWP 7209]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055a3a2a8060a in main () at seg.c:6
6 *ptr=0;
(gdb)
Note that the program must be compiled with -g so that gdb can map the crash address to a source line.
gdb's list command ('l 6') prints the source code around the crashing line:
(gdb) l 6
1 #include <string.h>
2
3 int main(){
4     char *ptr=NULL;
5
6     *ptr=0;
7
8 }
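By default the core file is created in the current working directory. For large software, however, it is hard to tell which working directory a program was running in, so it is better to have a dedicated directory for core files; this also makes it possible to control how large they get.
Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed.
Ubuntu desktop ships with Apport preinstalled. Apport is an error collection system: when an application crashes or hits a bug, Apport warns the user with a pop-up and asks whether to submit a crash report.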
Apport uses /proc/sys/kernel/core_pattern to directly pipe the core dump into apport:
|/usr/share/apport/apport %p %s %c %d %P
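- %p PID of the dumped process, as seen in the PID namespace in which the process resides
- %s number of the signal causing the dump
- %c core file size soft resource limit of the crashing process
- %d dump mode
- %P PID of the dumped process, as seen in the initial PID namespace
- %e program name
- %h hostname
- %t time of the dump, expressed as seconds since the Epoch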
To write core files to a dedicated location instead, set the pattern with sysctl:
sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t
The core dump file generated:
/tmp/core-test.9950.jerry-Latitude-E6410.1577308679
The Linux-specific /proc/[pid]/coredump_filter file can be used to control which memory segments are written to the core dump file. This file is provided only if the kernel was built with the CONFIG_ELF_CORE configuration option. The value in the file is a bit mask of memory mapping types (see mmap(2)):
- bit 0 Dump anonymous private mappings.
- bit 1 Dump anonymous shared mappings.
- bit 2 Dump file-backed private mappings.
- bit 3 Dump file-backed shared mappings.
- bit 4 Dump ELF headers.
- bit 5 Dump private huge pages.
- bit 6 Dump shared huge pages.
- bit 7 Dump private DAX pages.
- bit 8 Dump shared DAX pages.
For example, to dump only anonymous private mappings (bit 0):
echo 1 > /proc/[pid]/coredump_filter
#5 GDB Basic 1: Trace
- Build the program with the debug option -g
gcc -Wall -O2 -g
-g produces debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF). -Werror treats warnings as errors.
- Start gdb
$ gdb program
- Set a breakpoint
break position
where position can be one of the following:
- function name
- line number
- filename:function_name
- filename:line_number
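- +offset offset lines forward from the position where execution is currently stopped
- -offset offset lines back from the position where execution is currently stopped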
- *address
(gdb) b main
Breakpoint 1 at 0x5fe: file seg.c, line 4.
(gdb) b 4
Note: breakpoint 1 also set at pc 0x5fe.
Breakpoint 2 at 0x5fe: file seg.c, line 4.
Show the breakpoints currently set:
(gdb) info break
- clear Delete any breakpoints at the next instruction to be executed in the selected stack frame (see section Selecting a frame). When the innermost frame is selected, this is a good way to delete a breakpoint where your program just stopped.
- clear function
- clear filename:function Delete any breakpoints set at entry to the function function.
- clear linenum
- clear filename:linenum Delete any breakpoints set at or within the code of the specified line.
- delete [breakpoints] [range...] Delete the breakpoints, watchpoints, or catchpoints of the breakpoint ranges specified as arguments. If no argument is specified, delete all breakpoints (GDB asks confirmation, unless you have set confirm off). You can abbreviate this command as d.
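- Run the program
run program_parameters
Setting a breakpoint on main and running to it is common practice; the start command does both in one step.
There are two ways to execute a single line of code:
- next treats a function call as a single line (it does not step into the function)
- step steps into the function when it reaches a function call
- Continue execution
continue count
Resumes execution until the next breakpoint. If count is given, the breakpoint at the current location is ignored until the count-th time it is reached.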
break position if cond
Evaluate the expression cond each time the breakpoint is reached, and stop only if the value is nonzero -- that is, if cond evaluates as true.
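- Show the call stack
backtrace (abbreviated bt) prints one line per stack frame. Each time your program performs a function call, information about the call is generated, including:
- the location of the call in your program
- the arguments of the call
- the local variables of the function being called
For example, given stack.c: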
#include <stdio.h>

void call_2()
{
    printf("hello 2\n");
}

void call_1()
{
    printf("hello 1\n");
    call_2();
}

int main()
{
    printf("hello\n");
    call_1();
    return 0;
}
Set the breakpoint in call_2() then backtrace:
(gdb) b call_2
Breakpoint 1 at 0x63e: file stack.c, line 5.
(gdb) r
Starting program: /home/jerry/test/a.out
hello
hello 1
Breakpoint 1, call_2 () at stack.c:5
5 printf("hello 2\n");
(gdb) bt
#0 call_2 () at stack.c:5
#1 0x0000555555554667 in call_1 () at stack.c:11
#2 0x0000555555554684 in main () at stack.c:17
- Change the value of a variable
set variable variable=expression
- Print the value of an expression
print expr
info registers prints the names and values of all registers except floating-point registers (in the selected stack frame). To print a register:
(gdb) p $registerName
You can specify the display format with p/format, where format is one of:
- x hex
- d decimal
- c character
- s string
- t binary
- a address
(gdb) x $pc
0x55555555466e <main+4>: 0xaf3d8d48
You can also disassemble the instruction at that address:
(gdb) x/i $pc
=> 0x55555555466e <main+4>: lea 0xaf(%rip),%rdi # 0x555555554724
More generally, x can dump and interpret a range of memory:
x/NFU addr
- N the repeat count: how many units to display
- F the display format: one of the formats used by print, plus `s' (null-terminated string) or `i' (machine instruction). The default is `x' (hexadecimal) initially.
- U the unit size: b (bytes), h (halfwords, two bytes), w (words, four bytes; this is the initial default), or g (giant words, eight bytes).
(gdb) x/5i $pc
=> 0x55555555466e <main+4>: lea 0xaf(%rip),%rdi # 0x555555554724
gdb can also disassemble a specified function or a fragment of memory:
disassemble
disassemble [Function]
disassemble [Address]
disassemble [Start],[End]
disassemble [Function],+[Length]
disassemble [Address],+[Length]
disassemble /m [...]
disassemble /r [...]
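- Set a watchpoint for an expression
watch expr
GDB will break when expr is written into by the program and its value changes.
rwatch expr
GDB will break when the value of expr is read by the program.
awatch expr
GDB will break when expr is either read from or written into by the program.
info watchpoints
Lists the watchpoints currently set.
- Generate a core file from within gdb
generate-core-file
To generate a core file for a running process from the shell:
$ gcore PID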
#6 GDB Basic 2
- Debugging an already-running process
attach process-id
This command attaches to a running process, one that was started outside GDB. (info files shows your active targets.) The first thing GDB does after arranging to debug the specified process is to stop it. You can examine and modify an attached process with all the GDB commands that are ordinarily available when you start processes with run.
When you have finished debugging the attached process, you can use the detach command to release it from GDB control. After the detach command, that process and GDB become completely independent once more. If you exit GDB while you have an attached process, you detach that process; if you use the run command, you kill that process.
Attaching lets you use backtrace to find out why a process is waiting or stuck in an infinite loop.
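- Conditional breakpoints
Stop at the breakpoint only when a given condition holds, for example node == 0:
break position if condition
Add a condition to an existing breakpoint number:
condition #break condition
Remove the condition from a breakpoint number:
condition #break
Run a list of commands each time a breakpoint is hit:
commands #break
... command-list ...
end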
#8 Intel Architecture Basic
#9 Stack
The primary purpose of a call stack is to store return addresses. When a subroutine is called, the location (address) of the instruction at which the calling routine can later resume needs to be saved somewhere.
A call stack is composed of stack frames. These are machine-dependent and ABI-dependent data structures containing subroutine state information. Each stack frame corresponds to a call to a subroutine which has not yet terminated with a return. For example, if a subroutine named DrawLine() is currently running, having been called by a subroutine DrawSquare(), the top part of the call stack holds the frame for DrawLine() on top of the frame for DrawSquare().
The stack frame at the top of the stack is for the currently executing routine. The stack frame usually includes at least the following items (in push order):
- the arguments (parameter values) passed to the routine (if any)
- the return address back to the routine's caller (e.g. in the DrawLine() stack frame, an address into DrawSquare()'s code)
- space for the local variables of the routine (if any).
Chapter 3 Prepare for Kernel Debug
Decode Oops Messages
Bug hunting: kernel bug reports often come with a stack dump and, depending on the severity of the issue, may also contain the word Oops. The following file, oopsdemo.c, is used to generate an Oops:
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("Dual BSD/GPL");

static int init_oopsdemo(void)
{
    /* Deliberately write through a NULL pointer to trigger an Oops. */
    *((int *) 0x00 ) = 0x123456;
    return 0;
}

static void cleanup_oopsdemo(void){
}

module_init(init_oopsdemo);
module_exit(cleanup_oopsdemo);
Makefile:
# If KERNELRELEASE is defined, we've been invoked in the kernel build system
ifneq ($(KERNELRELEASE),)
    obj-m := oopsdemo.o
# Otherwise we were called directly from the command line
else
    KERNELDIR ?= /lib/modules/$(shell uname -r)/build
    PWD := $(shell pwd)
default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif
Load the oopsdemo.ko module; the kernel log then shows:
[ 618.291854] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 618.291864] #PF error: [WRITE]
[ 618.291866] PGD 0 P4D 0
[ 618.291877] Oops: 0002 [#1] SMP PTI
[ 618.291884] CPU: 0 PID: 7275 Comm: insmod Tainted: G OE 5.0.0-23-generic #24~18.04.1-Ubuntu
[ 618.291886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 618.291897] RIP: 0010:init_oopsdemo+0x8/0x20 [oopsdemo]
[ 618.291903] Code: Bad RIP value.
[ 618.291904] RSP: 0018:ffffafad41b03c70 EFLAGS: 00010246
[ 618.291906] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 618.291908] RDX: 000000000000f1f9 RSI: 00000000006000c0 RDI: ffffffffc0529000
[ 618.291909] RBP: ffffafad41b03ce8 R08: ffff95dbfda27080 R09: ffff95dbfd401900
[ 618.291910] R10: ffffe40e01e88580 R11: ffff95dbfffae000 R12: ffffffffc0529000
[ 618.291911] R13: ffff95dbb560eea0 R14: ffffafad41b03e78 R15: ffffffffc052b000
[ 618.291913] FS: 00007f889cc40540(0000) GS:ffff95dbfda00000(0000) knlGS:0000000000000000
[ 618.291914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 618.291916] CR2: ffffffffc0528fde CR3: 0000000078fd6000 CR4: 00000000000006f0
[ 618.291920] Call Trace:
[ 618.291946] ? do_one_initcall+0x4a/0x1c9
[ 618.291960] ? _cond_resched+0x19/0x40
[ 618.291966] ? kmem_cache_alloc_trace+0x42/0x1c0
[ 618.291971] do_init_module+0x5f/0x216
[ 618.291974] load_module+0x19f6/0x20a0
[ 618.291978] __do_sys_finit_module+0xfc/0x120
[ 618.291979] ? __do_sys_finit_module+0xfc/0x120
[ 618.291982] __x64_sys_finit_module+0x1a/0x20
[ 618.291984] do_syscall_64+0x5a/0x120
[ 618.291987] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 618.291990] RIP: 0033:0x7f889c756839
[ 618.291993] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[ 618.291995] RSP: 002b:00007fff7a2e7ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 618.291997] RAX: ffffffffffffffda RBX: 00005556adb9d780 RCX: 00007f889c756839
[ 618.291998] RDX: 0000000000000000 RSI: 00005556ad454d2e RDI: 0000000000000003
[ 618.292000] RBP: 00005556ad454d2e R08: 0000000000000000 R09: 00007f889ca29000
[ 618.292001] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[ 618.292002] R13: 00005556adb9ff70 R14: 0000000000000000 R15: 0000000000000000
[ 618.292004] Modules linked in: oopsdemo(OE+) snd_hda_codec_generic ledtrig_audio snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul ghash_clmulni_intel snd_hda_core joydev aesni_intel snd_hwdep aes_x86_64 snd_pcm qxl snd_seq_midi crypto_simd snd_seq_midi_event ttm snd_rawmidi cryptd snd_seq glue_helper drm_kms_helper snd_seq_device snd_timer snd drm fb_sys_fops soundcore syscopyarea sysfillrect sysimgblt input_leds serio_raw qemu_fw_cfg mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid psmouse virtio_blk virtio_net net_failover failover i2c_piix4 pata_acpi floppy
[ 618.292057] CR2: 0000000000000000
[ 618.292064] ---[ end trace dfd9153d646f12aa ]---
[ 618.292068] RIP: 0010:init_oopsdemo+0x8/0x20 [oopsdemo]
[ 618.292072] Code: Bad RIP value.
[ 618.292074] RSP: 0018:ffffafad41b03c70 EFLAGS: 00010246
[ 618.292075] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 618.292082] RDX: 000000000000f1f9 RSI: 00000000006000c0 RDI: ffffffffc0529000
[ 618.292084] RBP: ffffafad41b03ce8 R08: ffff95dbfda27080 R09: ffff95dbfd401900
[ 618.292085] R10: ffffe40e01e88580 R11: ffff95dbfffae000 R12: ffffffffc0529000
[ 618.292087] R13: ffff95dbb560eea0 R14: ffffafad41b03e78 R15: ffffffffc052b000
[ 618.292088] FS: 00007f889cc40540(0000) GS:ffff95dbfda00000(0000) knlGS:0000000000000000
[ 618.292090] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 618.292091] CR2: ffffffffc0528fde CR3: 0000000078fd6000 CR4: 00000000000006f0
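- Oops: error code [#N] The error code value in hex, followed by a counter of the Oopses seen so far (0002 and #1 in the log above). Each bit of the error code has a significance of its own:
- bit 0 0 means no page found, 1 means a protection fault
- bit 1 0 means read, 1 means write
- bit 2 0 means kernel, 1 means user-mode
- CPU: the CPU on which the error occurred
- PID: the ID of the process that was running when the Oops occurred (7275 in the log above)
- Comm: the command name of that process (insmod in the log above)
- Tainted: the taint flags are defined in kernel/panic.c:
- P Proprietary module has been loaded.
- F Module has been forcibly loaded.
- S SMP with a CPU not designed for SMP.
- R User forced a module unload.
- M System experienced a machine check exception.
- B System has hit bad_page.
- U Userspace-defined naughtiness.
- A ACPI table overridden.
- W Taint on warning.
- O Out-of-tree module has been loaded.
- E Unsigned module has been loaded.
The log above shows G OE: G is printed in place of P when no proprietary module has been loaded.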
- Call Trace Oops reports end with a stack dump and (possibly lengthy) stack backtrace showing what caused the kernel to reach the point it is at. In this case, a kernel stack backtrace alongside register contents and other pertinent information is printed to the system console and also recorded to the system logs. Such stack traces provide enough information to identify the line inside the Kernel’s source code where the bug happened.
To save the Oops message to a file:
dmesg > file
or,
cat /proc/kmsg > file
If the machine has crashed so badly that you cannot enter commands or the disk is not available, you need to get the Oops by:
- Hand copy the text from the screen/terminal
- Use Kdump
- objdump To debug a kernel, use objdump and look up the hex offset from the crash output to find the corresponding line of code/assembler.
$ objdump -r -S -l --disassemble oopsdemo.o

oopsdemo.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <init_oopsdemo>:
init_oopsdemo():
   0: e8 00 00 00 00        callq 5
      1: R_X86_64_PC32 __fentry__-0x4
   5: 55                    push %rbp
   6: 31 c0                 xor %eax,%eax
   8: c7 04 25 00 00 00 00  movl $0x123456,0x0
   f: 56 34 12 00
  13: 48 89 e5              mov %rsp,%rbp
  16: 5d                    pop %rbp
  17: c3                    retq
  18: 0f 1f 84 00 00 00 00  nopl 0x0(%rax,%rax,1)
  1f: 00

0000000000000020 <cleanup_oopsdemo>:
cleanup_oopsdemo():
  20: e8 00 00 00 00        callq 25
      21: R_X86_64_PC32 __fentry__-0x4
  25: 55                    push %rbp
  26: 48 89 e5              mov %rsp,%rbp
  29: 5d                    pop %rbp
  2a: c3                    retq
The Oops reported RIP at init_oopsdemo+0x8; offset 8 in the listing is the movl that stores 0x123456 to address 0x0, which is the NULL pointer write.
$ grep CONFIG_DEBUG_INFO /boot/config-`uname -r`
CONFIG_DEBUG_INFO=y
On a kernel compiled with CONFIG_DEBUG_INFO, you can simply copy the EIP value (RIP on x86-64) from the Oops and use GDB to translate it to human-readable form:
$ gdb vmlinux
(gdb) l *0xc021e50e
#18 Linux Magic System Request Key
It is a 'magical' key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up. To enable the magic SysRq key, build the kernel with:
CONFIG_MAGIC_SYSRQ=y
/proc/sys/kernel/sysrq controls the functions allowed to be invoked via the SysRq key:
- 0 disable sysrq completely
- 1 enable all functions of sysrq
- >1 bitmask of allowed sysrq functions:
2 = 0x2 - enable control of console logging level
4 = 0x4 - enable control of keyboard (SAK, unraw)
8 = 0x8 - enable debugging dumps of processes etc.
16 = 0x10 - enable sync command
32 = 0x20 - enable remount read-only
64 = 0x40 - enable signalling of processes (term, kill, oom-kill)
128 = 0x80 - allow reboot/poweroff
256 = 0x100 - allow nicing of all RT tasks
To use the magic SysRq key from the shell, write a command character to /proc/sysrq-trigger, e.g.:
echo m > /proc/sysrq-trigger
To enable all functions:
$ sudo -i
# echo 1 > /proc/sys/kernel/sysrq
# echo m > /proc/sysrq-trigger
# dmesg
...
[ 8575.330056] sysrq: SysRq : Show Memory
[ 8575.330067] Mem-Info:
...
The following command keys are available for Alt+SysRq+commandkey:
- 'k' Kills all the processes running on the current virtual console.
- 's' Attempts to sync all mounted file systems.
- 'b' Immediately reboots the system, without unmounting partitions or syncing.
- 'c' Generates a crash dump.
- 'e' Sends SIGTERM to all processes except init.
- 'm' Outputs current memory information to the console.
- 'i' Sends SIGKILL to all processes except init.
- 'r' Switches the keyboard from raw mode (the mode used by programs such as X11) to XLATE mode.
- 't' Outputs a list of current tasks and their information to the console.
- 'u' Remounts all mounted filesystems in read-only mode.
- 'o' Shuts down the system immediately.
- 'p' Prints the current registers and flags to the console.
- '0-9' Sets the console log level, controlling which kernel messages will be printed to your console.
- 'f' Calls oom_kill to kill a process that is using a lot of memory.
- 'h' Displays help. Any key not listed above also prints help.
Use Kdump to get kernel crash dump
Kdump is a kernel crash dumping mechanism that allows you to save the contents of the system's memory for later analysis. It relies on kexec, which can boot a Linux kernel from the context of another kernel, bypassing the BIOS and preserving the contents of the first kernel's memory that would otherwise be lost.
The kernel crash dump utility is installed with the following command:
sudo apt install linux-crashdump
The kdump mechanism will be enabled during the installation.
Kdump utilizes two kernels:
- System kernel The normal kernel, booted with special kdump-specific flags. The system kernel must be told to reserve some amount of physical memory into which the dump-capture kernel will be loaded.
- Dump capture kernel When a kernel crash happens, the kernel crash handler uses the kexec mechanism to boot the dump-capture kernel. The memory of the system kernel is left untouched and is accessible from the dump-capture kernel exactly as it was at the moment of the crash. Once the dump-capture kernel has booted, the file /proc/vmcore gives access to the memory of the crashed system kernel.
- Compiling the dump capture kernel To build such a kernel, edit the kernel config (or config.x86_64) file and enable the following configuration options:
- CONFIG_DEBUG_INFO=y
- CONFIG_CRASH_DUMP=y
- CONFIG_PROC_VMCORE=y
Then check the config of the built kernel, for example /boot/config-5.0.0-23-generic.
- Configuring the kdump type When a kernel crash is captured, the core dump can be:
- stored as a file in a local file system
- written directly to a device
- sent over a network using the NFS (Network File System) or SSH (Secure Shell) protocol
- Set up the kdump kernel To reserve memory for the dump-capture kernel, edit your bootloader configuration and add a boot option such as
crashkernel=64M
to the system kernel you just installed. Taking GRUB as an example: on an installed system, GRUB loads the /boot/grub/grub.cfg configuration file at each boot. That file is generated, and the generation process can be influenced by a variety of options in /etc/default/grub and scripts in /etc/grub.d/:
- /etc/default/grub
- /etc/grub.d/
For example, in /etc/default/grub:
GRUB_TIMEOUT_STYLE=menu
GRUB_CMDLINE_LINUX="crashkernel=128M"
Remember to always regenerate the main configuration file by running 'update-grub' after making changes to /etc/default/grub and/or files in /etc/grub.d/. After rebooting, make sure "crashkernel=128M" has been added to the kernel command line:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=900ed0fb-963a-4962-a941-2f6056f43d9e ro crashkernel=128M quiet splash vt.handoff=1
You can also set the amount of reserved memory to be variable, depending on the total amount of installed memory:
crashkernel=(range1):(size1),(range2):(size2)
For example, "crashkernel=512M-2G:64M,2G-:128M" reserves 64 MB of memory if the total amount of system memory is between 512 MB and 2 GB, and 128 MB if it is 2 GB or more.
To place the reserved memory at a fixed offset, use the following syntax:
crashkernel=128M@16M
This reserves 128 MB of memory starting at 16 MB (physical address 0x01000000).
- Verify the kdump setup Check that:
- the crashkernel boot parameter is present
cat /proc/cmdline
- the requested memory area for the kdump kernel is reserved
$ dmesg | grep -i crash
- the current kdump configuration is as expected
$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.0.0-37-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.0.0-37-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=900ed0fb-963a-4962-a941-2f6056f43d9e ro quiet splash vt.handoff=1 systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
- Test the crash dump Trigger a crash through SysRq:
$ sudo -s
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
After the system has rebooted normally, you will find the kernel crash dump file and related subdirectories in the /var/crash directory:
drwxr-sr-x 2 root whoopsie 4096 一 7 13:44 202001071344
-rw-r--r-- 1 root whoopsie 298 一 7 13:44 kexec_cmd
-rw-r----- 1 root whoopsie 17439 一 7 13:44 linux-image-5.0.0-37-generic-202001071344.crash
Using the crash utility
To determine the cause of the system crash, you can use the crash utility, which provides an interactive prompt very similar to the GNU Debugger (GDB). This utility allows you to interactively analyze a running Linux system as well as a core dump created by netdump, diskdump, xendump, or kdump.
Install the crash package:
sudo apt-get install crash
Run the crash utility:
crash vmlinux /var/crash/(timestamp)/vmcore
Chapter 6 Debug Tips
#56 Surviving the Linux OOM Killer
When your Linux machine runs out of memory, the kernel calls the Out of Memory (OOM) killer to free some memory. The kernel gives each running process a score, called oom_score, which shows how likely the process is to be terminated when available memory is low. The oom_score of a process can be found under /proc, in /proc/[pid]/oom_score. The OOM killer uses oom_score_adj (-1000 to 1000) to adjust the final calculated score.
To check if any of your processes have been OOM-killed:
grep -i kill /var/log/syslog
( -i, --ignore-case )
Hacking: The Art of Exploitation, 2nd Edition
by Jon Erickson
Chapter 0x200. PROGRAMMING
0x250. Getting Your Hands Dirty
firstprog.c:
#include <stdio.h>
int main() {
    int i;
    for(i=0; i < 10; i++) {
        puts("Hello, world!\n");
    }
    return 0;
}
0x251. The Bigger Picture
The GNU development tools include a program called objdump, which can be used to examine compiled binaries.
$ objdump -D a.out | grep -A20 main.:
000000000000063a <main>:
 63a: 55                    push %rbp
 63b: 48 89 e5              mov %rsp,%rbp
 63e: 48 83 ec 10           sub $0x10,%rsp
 642: c7 45 fc 00 00 00 00  movl $0x0,-0x4(%rbp)
 649: eb 10                 jmp 65b
 64b: 48 8d 3d a2 00 00 00  lea 0xa2(%rip),%rdi # 6f4 <_IO_stdin_used+0x4>
 652: e8 b9 fe ff ff        callq 510
 657: 83 45 fc 01           addl $0x1,-0x4(%rbp)
 65b: 83 7d fc 09           cmpl $0x9,-0x4(%rbp)
 65f: 7e ea                 jle 64b
 661: b8 00 00 00 00        mov $0x0,%eax
 666: c9                    leaveq
 667: c3                    retq
 668: 0f 1f 84 00 00 00 00  nopl 0x0(%rax,%rax,1)
 66f: 00
0000000000000670 <__libc_csu_init>:
 670: 41 57                 push %r15
 672: 41 56                 push %r14
 674: 49 89 d7              mov %rdx,%r15
The output is piped through grep with the command-line option "-A20" so that only the 20 lines after the match for the regular expression main.: are displayed. The same code can be shown in Intel syntax by providing an additional command-line option, "-M intel", to objdump:
$ objdump -M intel -D a.out | grep -A20 main.:
000000000000063a <main>:
 63a: 55                    push rbp
 63b: 48 89 e5              mov rbp,rsp
 63e: 48 83 ec 10           sub rsp,0x10
 642: c7 45 fc 00 00 00 00  mov DWORD PTR [rbp-0x4],0x0
 649: eb 10                 jmp 65b
 64b: 48 8d 3d a2 00 00 00  lea rdi,[rip+0xa2] # 6f4 <_IO_stdin_used+0x4>
 652: e8 b9 fe ff ff        call 510
 657: 83 45 fc 01           add DWORD PTR [rbp-0x4],0x1
 65b: 83 7d fc 09           cmp DWORD PTR [rbp-0x4],0x9
 65f: 7e ea                 jle 64b
 661: b8 00 00 00 00        mov eax,0x0
 666: c9                    leave
 667: c3                    ret
 668: 0f 1f 84 00 00 00 00  nop DWORD PTR [rax+rax*1+0x0]
 66f: 00
0000000000000670 <__libc_csu_init>:
 670: 41 57                 push r15
 672: 41 56                 push r14
 674: 49 89 d7              mov r15,rdx
0x252. The x64 Processor
$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) break main
Breakpoint 1 at 0x642: file firstprog.c, line 4.
(gdb) run
Starting program: /home/jerry/test/a.out

Breakpoint 1, main () at firstprog.c:4
4 for(i=0; i < 10; i++) {
(gdb) info registers
rax 0x55555555463a 93824992233018
rbx 0x0 0
rcx 0x555555554670 93824992233072
rdx 0x7fffffffdd78 140737488346488
rsi 0x7fffffffdd68 140737488346472
rdi 0x1 1
rbp 0x7fffffffdc80 0x7fffffffdc80
rsp 0x7fffffffdc70 0x7fffffffdc70
r8 0x7ffff7dd0d80 140737351847296
r9 0x7ffff7dd0d80 140737351847296
r10 0x2 2
r11 0x3 3
r12 0x555555554530 93824992232752
r13 0x7fffffffdd60 140737488346464
r14 0x0 0
r15 0x0 0
rip 0x555555554642 0x555555554642
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
There are sixteen 64-bit registers in x86-64. By convention,
- %rax is used to store a function’s return value, if it exists and is no more than 64 bits long.
- %rbx, %rbp, and %r12-r15 are callee-save registers, meaning that they are saved across function calls.
- %rsp is used as the stack pointer, a pointer to the topmost element in the stack.
- %rdi, %rsi, %rdx, %rcx, %r8, and %r9 are used to pass the first six integer or pointer parameters to called functions. Additional parameters (or large parameters such as structs passed by value) are passed on the stack.
0x253. Assembly Language
White Paper: Red Hat Crash Utility
Abstract
Crash is a tool for interactively analyzing the state of the Linux system while it is running, or after a kernel crash has occurred and a core dump has been created by the netdump, diskdump, LKCD, kdump, xendump or kvmdump facilities. It is loosely based on the SVR4 UNIX crash command, but has been significantly enhanced by completely merging it with the gdb debugger. The crash utility is designed to be independent of Linux version dependencies.
Prerequisites
The crash utility has the following prerequisites:
- kernel object file A vmlinux kernel object file must have been built with the -g C flag.
- memory image A kernel crash dump file generated from any of the supported dump facilities, or live system memory accessed via /dev/mem (/dev/crash). If no dump file argument is issued on the crash command line, live system memory will be used by default. When examining a live system, root privileges are required.
- platform processor types The crash utility is actively developed and tested on the x86, x86_64, ia64, ppc64, arm, s390 and s390x processors.
- Linux kernel versions The crash utility is backwards-compatible.
Installation
$ sudo apt-get install crash
Invocation
When crash is run on a dumpfile, at least two arguments are always required:
- The kernel object filename vmlinux
- The dumpfile name vmcore
pr_debug()
“Kernel hacking” -> kobject debugging [CONFIG_DEBUG_KOBJECT]
Some files call pr_debug(), which is ordinarily an empty macro that discards
its arguments at compile time. To enable debugging output, build the
appropriate file with -DDEBUG by adding
CFLAGS_[filename].o := -DDEBUG
to the makefile.
For example, to see all attempts to spawn a usermode helper (such as
/sbin/hotplug), add to lib/Makefile the line:
CFLAGS_kobject_uevent.o := -DDEBUG
Then boot the new kernel, do something that spawns a usermode helper, and
use the "dmesg" command to view the pr_debug() output.
Debugging Support in the Kernel
Except where specified otherwise, all of these options are found under the “kernel hacking” menu in whatever kernel configuration tool you prefer. Note that some of these options are not supported by all architectures.
CONFIG_DEBUG_KERNEL
This option just makes other debugging options available; it should be turned on but does not, by itself, enable any features.
CONFIG_DEBUG_SLAB
This crucial option turns on several types of checks in the kernel memory allocation functions; with these checks enabled, it is possible to detect a number of memory overrun and missing initialization errors. Each byte of allocated memory is set to 0xa5 before being handed to the caller and then set to 0x6b when it is freed. If you ever see either of those “poison” patterns repeating in output from your driver (or often in an oops listing), you’ll know exactly what sort of error to look for. When debugging is enabled, the kernel also places special guard values before and after every allocated memory object; if those values ever get changed, the kernel knows that somebody has overrun a memory allocation, and it complains loudly. Various checks for more obscure errors are enabled as well.
CONFIG_DEBUG_PAGEALLOC
Full pages are removed from the kernel address space when freed. This option can slow things down significantly, but it can also quickly point out certain kinds of memory corruption errors.
CONFIG_DEBUG_SPINLOCK
With this option enabled, the kernel catches operations on uninitialized spinlocks and various other errors (such as unlocking a lock twice).
CONFIG_DEBUG_SPINLOCK_SLEEP
This option enables a check for attempts to sleep while holding a spinlock. In fact, it complains if you call a function that could potentially sleep, even if the call in question would not sleep.
CONFIG_INIT_DEBUG
Items marked with __init (or __initdata) are discarded after system initialization or module load time. This option enables checks for code that attempts to access initialization-time memory after initialization is complete.
CONFIG_DEBUG_INFO
This option causes the kernel to be built with full debugging information included. You’ll need that information if you want to debug the kernel with gdb. You may also want to enable CONFIG_FRAME_POINTER if you plan to use gdb.
CONFIG_MAGIC_SYSRQ
Enables the “magic SysRq” key. We look at this key in the section “System Hangs,” later in this chapter.
CONFIG_DEBUG_STACKOVERFLOW
CONFIG_DEBUG_STACK_USAGE
These options can help track down kernel stack overflows. A sure sign of a stack overflow is an oops listing without any sort of reasonable back trace. The first option adds explicit overflow checks to the kernel; the second causes the kernel to monitor stack usage and make some statistics available via the magic SysRq key.
CONFIG_KALLSYMS
This option (under “General setup/Standard features”) causes kernel symbol information to be built into the kernel; it is enabled by default. The symbol information is used in debugging contexts; without it, an oops listing can give you a kernel traceback only in hexadecimal, which is not very useful.
CONFIG_IKCONFIG
CONFIG_IKCONFIG_PROC
These options (found in the “General setup” menu) cause the full kernel configuration state to be built into the kernel and to be made available via /proc. Most kernel developers know which configuration they used and do not need these options (which make the kernel bigger). They can be useful, though, if you are trying to debug a problem in a kernel built by somebody else.
CONFIG_ACPI_DEBUG
Under “Power management/ACPI.” This option turns on verbose ACPI (Advanced Configuration and Power Interface) debugging information, which can be useful if you suspect a problem related to ACPI.
CONFIG_DEBUG_DRIVER
Under “Device drivers.” Turns on debugging information in the driver core, which can be useful for tracking down problems in the low-level support code. We’ll look at the driver core in Chapter 14.
CONFIG_SCSI_CONSTANTS
This option, found under “Device drivers/SCSI device support,” builds in information for verbose SCSI error messages. If you are working on a SCSI driver, you probably want this option.
CONFIG_INPUT_EVBUG
This option (under “Device drivers/Input device support”) turns on verbose logging of input events. If you are working on a driver for an input device, this option may be helpful. Be aware of the security implications of this option, however: it logs everything you type, including your passwords.
CONFIG_PROFILING
This option is found under “Profiling support.” Profiling is normally used for system performance tuning, but it can also be useful for tracking down some kernel hangs and related problems.
We will revisit some of the above options as we look at various ways of tracking down kernel problems. But first, we will look at the classic debugging technique: print statements.
KERN_EMERG | An emergency condition; the system is probably dead. |
KERN_ALERT | A problem that requires immediate attention. |
KERN_CRIT | A critical condition. |
KERN_ERR | An error. |
KERN_WARNING | A warning. |
KERN_NOTICE | A normal, but perhaps noteworthy, condition. |
KERN_INFO | An informational message. |
KERN_DEBUG | A debug message, typically superfluous. |
If the priority is less than the integer variable console_loglevel, the message is delivered to the console one line at a time (nothing is sent unless a trailing newline is provided).
The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL.
It is also possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk.
The file hosts four integer values: the current loglevel, the default level for messages that lack an explicit loglevel, the minimum allowed loglevel, and the boot-time default loglevel.
Writing a single value to this file changes the current loglevel to that value; thus, for example, you can cause all kernel messages to appear at the console by simply entering:
# echo 8 > /proc/sys/kernel/printk
It should now be apparent why the hello.c sample had the KERN_ALERT markers; they
are there to make sure that the messages appear on the console.