Linux Kernel in a Nutshell
by Greg Kroah-Hartman
Chapter 1, Introduction
This book will go into how to build and install a custom kernel, and provide some hints on how to enable specific options that you will probably wish to use for different situations.
Chapter 2, Requirements for Building and Using the Kernel
Compiler
Be warned that getting the most recent gcc version is not always a good idea. Some of the newest gcc releases don’t build the kernel properly.
$ gcc --version
Linker
An additional set of tools known as binutils is needed to do the linking and assembling of source files.
$ ld -v
make
make is a tool that walks the kernel source tree to determine which files need to be compiled, and then calls the compiler and other build tools to do the work in building the kernel.
$ make --version
After building, the following files are generated in the top directory of the kernel source tree:
Module.symvers
System.map
vmlinux
Tools to Use the Kernel
There are a small number of programs for which the kernel version matters. If the kernel is upgraded, some of these packages may also need to be upgraded for the system to work properly.
- util-linux Most of these utilities handle the mounting and creation of disk partitions and manipulation of the hardware clock in the system.
- module-init-tools The module-init-tools package is needed if you wish to use Linux kernel modules (current distributions ship its successor, kmod, instead). The module linker (the code that resolves all symbols and figures out how to put the pieces together in memory) is built into the kernel.
- e2fsprogs For ext2/ext3/ext4
- jfsutils For JFS
- quota-tools
- nfs-utils
- udev Almost all Linux distributions use udev to manage the /dev directory; it provides a persistent device-naming system there. Because udev relies on the structure of /sys, which can change between kernel releases, it is highly recommended that you use the version of udev that comes with your Linux distribution.
- procps The package procps includes the commonly used tools ps and top, as well as many other handy tools for managing and monitoring processes running on the system.
- pcmciautils
Chapter 3, Retrieving the Kernel Source
The distribution packages have the advantage of being built to be compatible with the compiler and other tools provided by the distribution. If you can create your own environment with the latest kernel, compiler, and other tools, you will be able to build exactly what you want.
Kernel development release cycle:
While the development of the new features was happening, the 2.6.17.1, 2.6.17.2, and other stable kernel versions were released, containing bug fixes and security updates.
Open Source
- Set up the Build Environment
sudo apt-get install gcc make perl
sudo apt-get install build-essential libncurses-dev bison flex libssl-dev libelf-dev
- Download and extract the source
$ mkdir linux; cd linux
$ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.5.8.tar.xz
$ unxz linux-5.5.8.tar.xz
$ tar xf linux-5.5.8.tar
Ubuntu
- Set up the Build Environment
Uncomment the 'deb-src' lines in /etc/apt/sources.list, then run:
sudo apt-get update
If you have not built a kernel on your system before, some packages are needed before you can build successfully. You can install them with:
sudo apt-get build-dep linux linux-image-$(uname -r)
sudo apt-get install libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf
- Get the source of the kernel that is installed on your system
apt-get source linux-image-$(uname -r)
The source code will be downloaded and extracted.
- Alternatively, clone the Ubuntu kernel git tree for a release (replace <release codename> with the codename of your Ubuntu release)
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-<release codename>.git
Chapter 4, Configuring and Building
Independent Build
- Modifying the configuration Copy the existing config file:
cp -v /boot/config-$(uname -r) linux/linux-5.5.8/.config
The kernel configuration is kept in a file called .config in the top directory of the kernel source tree. It can be created by:
- make config The kernel configuration program will step through every configuration option and ask you if you wish to enable this option or not.
- make defconfig Every kernel version comes with a “default” kernel configuration.
- make menuconfig a console-based program that offers a way to move around the kernel configuration using the arrow keys on the keyboard.
- make gconfig, make xconfig Graphical programs; use the mouse to navigate the submenus and select options.
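Whichever tool creates it, .config is a plain-text file: an enabled option appears as CONFIG_NAME=y (built in) or CONFIG_NAME=m (built as a module), an option that takes a value as CONFIG_NAME=value, and a disabled option as the comment # CONFIG_NAME is not set. A minimal C sketch of a classifier for that line format follows; parse_config_line and enum cfg_state are hypothetical names, not part of the kernel's kconfig tools.

```c
#include <assert.h>
#include <string.h>

/* State of one option as recorded in a .config file. */
enum cfg_state { CFG_NONE, CFG_NO, CFG_YES, CFG_MODULE, CFG_OTHER };

/* Copy len bytes of src into dst as a string, truncating to fit n. */
static void copy_name(char *dst, size_t n, const char *src, size_t len)
{
    if (len >= n)
        len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/*
 * Classify one .config line and store the option name in name:
 *   CONFIG_FOO=y             -> CFG_YES    (built into the kernel)
 *   CONFIG_FOO=m             -> CFG_MODULE (built as a module)
 *   CONFIG_FOO=<other value> -> CFG_OTHER  (strings, numbers)
 *   # CONFIG_FOO is not set  -> CFG_NO
 *   anything else            -> CFG_NONE   (blank lines, plain comments)
 */
enum cfg_state parse_config_line(const char *line, char *name, size_t n)
{
    static const char off_suffix[] = " is not set";
    const size_t slen = sizeof off_suffix - 1;
    size_t len = strcspn(line, "\r\n");   /* ignore a trailing newline */

    assert(name != NULL && n > 0);
    name[0] = '\0';

    if (strncmp(line, "CONFIG_", 7) == 0) {
        const char *eq = memchr(line, '=', len);
        if (eq == NULL)
            return CFG_NONE;
        copy_name(name, n, line, (size_t)(eq - line));
        const char *val = eq + 1;
        size_t vlen = len - (size_t)(val - line);
        if (vlen == 1 && val[0] == 'y')
            return CFG_YES;
        if (vlen == 1 && val[0] == 'm')
            return CFG_MODULE;
        return CFG_OTHER;
    }

    if (len > 2 + slen && strncmp(line, "# CONFIG_", 9) == 0 &&
        strncmp(line + len - slen, off_suffix, slen) == 0) {
        copy_name(name, n, line + 2, len - 2 - slen);
        return CFG_NO;
    }
    return CFG_NONE;
}
```

Feeding it each line read with fgets() from .config is enough to, for example, count how many options are built as modules.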
make
To build the whole kernel:
$ make
To build only one directory:
$ make drivers/usb/serial/
$ make M=drivers/usb/serial
To build only a specific file in the kernel tree, just pass it as the argument to make:
$ make drivers/usb/serial/visor.ko
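The build can be customized with variables on the make command line:
- ARCH= selects the architecture to build for
- CC= selects the compiler
- CROSS_COMPILE= sets the prefix of the cross-compile toolchain; make prepends it to each tool name, so CROSS_COMPILE=/usr/local/bin/arm-linux- runs /usr/local/bin/arm-linux-gcc, /usr/local/bin/arm-linux-ld, and so on
$ make ARCH=x86_64 defconfig
$ make ARCH=arm CROSS_COMPILE=/usr/local/bin/arm-linux-
Overriding the compiler with CC= is useful even for a native (non-cross-compiled) build. Examples are the distcc and ccache programs, both of which can greatly reduce the time it takes to build a kernel.
$ make CC="ccache gcc"
$ make CC="ccache distcc"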
ccache is a software development tool that caches the output of C/C++ compilation so that the next time, the same compilation can be avoided and the results can be taken from the cache.
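To make the idea concrete, here is a toy C model of a compiler cache: the key is a hash of the compiler arguments plus the source, and the compiler runs only when that key has not been seen before. This is an illustrative sketch, not ccache's code; the real tool also hashes the compiler itself, works on the preprocessed source, uses a cryptographic hash, and stores results on disk. cached_compile and fake_compile are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A 64-bit FNV-1a hash and a small in-memory table stand in for
 * ccache's real hash and on-disk object store. */
#define CACHE_SLOTS 64

struct cache_entry {
    int used;
    uint64_t key;
    char object[64];          /* stands in for the compiled .o file */
};

static struct cache_entry cache[CACHE_SLOTS];
static int compiler_runs;     /* how often the "compiler" really ran */

/* Continue an FNV-1a hash h over the bytes of s. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;            /* 64-bit FNV prime */
    }
    return h;
}

/* Stand-in for actually running gcc. */
static void fake_compile(const char *source, char *out, size_t n)
{
    compiler_runs++;
    snprintf(out, n, "object for %zu-byte source", strlen(source));
}

/* Return the object for (args, source), compiling only on a cache miss. */
const char *cached_compile(const char *args, const char *source)
{
    uint64_t key = 14695981039346656037ULL;   /* FNV offset basis */
    key = fnv1a(key, args);
    key = fnv1a(key, "\n");                   /* separate the two inputs */
    key = fnv1a(key, source);

    struct cache_entry *e = &cache[key % CACHE_SLOTS];
    if (e->used && e->key == key)
        return e->object;                     /* hit: reuse stored result */

    fake_compile(source, e->object, sizeof e->object);   /* miss */
    e->used = 1;
    e->key = key;
    return e->object;
}
```

Calling cached_compile("-O2", "int x;") twice runs the fake compiler once; changing either the arguments or the source produces a new key and therefore a new compilation, which is why a rebuild after make clean can be served largely from the cache.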
Ubuntu
- Modifying the configuration This step can be skipped if no configuration changes are wanted. Add something like "+test1" to the end of the first version number in the debian/changelog file, before building.
chmod a+x debian/rules
chmod a+x debian/scripts/*
chmod a+x debian/scripts/misc/*
fakeroot debian/rules clean
fakeroot debian/rules editconfigs
This takes the current configuration for each architecture/flavour supported and calls menuconfig to edit its config file.
- Building the kernel
fakeroot debian/rules clean
# quicker build:
fakeroot debian/rules binary-headers binary-generic binary-perarch
# if you need linux-tools or lowlatency kernel, run instead:
fakeroot debian/rules binary
If the build is successful, a set of three .deb binary package files will be produced in the directory above the build root directory.
sudo dpkg -i linux*4.8.0-17.19*.deb
sudo reboot
Chapter 5, Installing and Booting from a Kernel
Using the Installation Scripts
If you have built any modules, you must install them first:
# make modules_install
This will install all the modules that you have built and place them in the proper location in the filesystem for the new kernel to find them. Modules are placed in the /lib/modules/kernel_version directory, where kernel_version is the kernel version of the new kernel you have just built, for example:
/lib/modules/5.5.8
Almost all distributions come with a script called installkernel that can be used by the kernel build system to automatically install a built kernel into the proper location and modify the bootloader so that nothing extra needs to be done by the developer.
installkernel installs a new kernel image onto the system from the Linux source tree. It is called by the Linux kernel makefiles when make install is invoked there.
# make install
This kicks off the following process:
- The kernel build system verifies that the kernel has been built successfully.
- The build system installs the static kernel image (vmlinuz-x.x.x) into the /boot directory and names this file after the kernel version of the built kernel.
- Any needed initial ramdisk images (initrd.img-x.x.x) are automatically created, using the modules that have just been installed during the modules_install phase. An initial ramdisk can also be created manually, without make install:
- Ubuntu
$ sudo update-initramfs -c -k 5.5.8
- Distributions that use mkinitrd
$ sudo mkinitrd /boot/initrd.img-5.5.8 5.5.8
The files in an initrd can be listed with:
$ lsinitramfs initrd.img-5.13.0-1007-intel
The file type of the initrd image can be checked with "file"; it can be one of the following file types:
- pure ramdisk
- cpio archive
$ file /boot/initrd.img-5.3.0-40-generic
/boot/initrd.img-5.3.0-40-generic: ASCII cpio archive (SVR4 with no CRC)
An initrd can also be a compressed image, which file reports as compressed data:
$ file initrd-file
initrd-file: gzip compressed data, was "build.initramfs", from Unix
After decompressing it, file reports the filesystem it contains, for example:
Linux Compressed ROM File System data
One popular use of an initrd is for a framebuffer image to display during bootup, using programs such as bootsplash.
It has many more functional uses, however, such as being a place where modules can be placed that will automatically get loaded on bootup, which is handy for many reasons. Most off-the-shelf distributions use an initrd file in a standard installation, but an initrd is not normally strictly required.
The initrd ramdisk contains the modules required for mounting the root partition.
The initrd usually resides on the same partition as the kernel image.
The boot loader loads it into memory along with the kernel; the kernel then uses the modules in it to mount the root partition in read-only mode.
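The formats that file reports for an initrd are recognized from magic bytes at the start of the image: the ASCII string 070701 for the SVR4 "newc" cpio format (070702 when it carries a CRC), 1f 8b for gzip, and the cramfs magic number for a Linux Compressed ROM File System. The sketch below checks a buffer against a few such signatures, plus xz, zstd and the ext2 superblock magic of a pure ramdisk image. initrd_format is a hypothetical helper; the real file(1) knows far more formats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess the format of an initrd image from its leading bytes, roughly the
 * way file(1) does. Only a few common formats are recognized. */
const char *initrd_format(const unsigned char *buf, size_t len)
{
    static const unsigned char gz[]   = { 0x1f, 0x8b };
    static const unsigned char xz[]   = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };
    static const unsigned char zstd[] = { 0x28, 0xb5, 0x2f, 0xfd };
    /* CRAMFS_MAGIC 0x28cd3d45 as stored by a little-endian machine */
    static const unsigned char cram[] = { 0x45, 0x3d, 0xcd, 0x28 };

    assert(buf != NULL);
    if (len >= 6 && (memcmp(buf, "070701", 6) == 0 ||    /* newc, no CRC */
                     memcmp(buf, "070702", 6) == 0))     /* newc with CRC */
        return "cpio archive";
    if (len >= sizeof gz && memcmp(buf, gz, sizeof gz) == 0)
        return "gzip compressed data";
    if (len >= sizeof xz && memcmp(buf, xz, sizeof xz) == 0)
        return "XZ compressed data";
    if (len >= sizeof zstd && memcmp(buf, zstd, sizeof zstd) == 0)
        return "Zstandard compressed data";
    if (len >= sizeof cram && memcmp(buf, cram, sizeof cram) == 0)
        return "Linux Compressed ROM File System data";
    /* ext2/3/4: superblock at byte 1024, 16-bit magic 0xEF53 at offset 56 */
    if (len >= 1082 && buf[1080] == 0x53 && buf[1081] == 0xef)
        return "ext2 filesystem (pure ramdisk image)";
    return "unknown";
}
```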
$ sudo update-grub
update-grub scans the /boot directory and adds the new kernel to /boot/grub/grub.cfg automatically.
After installation, /boot contains the new files:
/boot/vmlinuz-5.5.8
/boot/initrd.img-5.5.8
/boot/config-5.5.8
/boot/System.map-5.5.8
and /boot/grub/grub.cfg has been updated.
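Installing by Hand
If your distribution does not have an installkernel command, or you wish to just do the work by hand:
- The modules must be installed:
# make modules_install
- Find out the kernel version:
# make kernelversion
5.5.8
- Copy the kernel image and the System.map symbol table into /boot, named after that version (on current x86 trees the image is built as arch/x86/boot/bzImage; old 2.6 trees used arch/i386/boot/bzImage):
# cp arch/x86/boot/bzImage /boot/bzImage-KERNEL_VERSION
# cp System.map /boot/System.map-KERNEL_VERSION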
The system initialization
The computer system undergoes several phases of bootstrap processes from the power-on event until it offers the fully functional operating system (OS) to the user.
The typical bootstrap process is like a four-stage rocket: each stage hands control of the system over to the next one.
- Stage 1: the UEFI
When a computer is powered on, the boot manager is the first stage of the boot process. It checks the boot configuration and, based on its settings, executes the specified OS boot loader or operating system kernel (usually the boot loader).
An EFI system partition (ESP) is a data storage device partition that stores UEFI applications and the files these applications need to run, including operating system boot loaders. (On a legacy PC system, the BIOS, running the boot code stored in the MBR, takes this role instead.)
- Stage 2: the boot loader
The boot loader is the second stage of the boot process, started by the UEFI. It loads the system kernel image and the initrd image into memory and hands control over to them.
- Stage 3: the mini-Debian system
This stage is started by the boot loader. It runs the system kernel with a root filesystem in memory; this system is commonly referred to as the initrd or initramfs system. The kernel:
  - converts initrd into a "normal" RAM disk and frees the memory used by initrd
  - if the root device is not /dev/ram0, follows the old (deprecated) change_root procedure
  - if the root device is /dev/ram0, mounts the initrd image as root
  - executes /sbin/init (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything init can do)
The "/init" program is executed as the first program in this root filesystem in memory. It initializes the kernel in user space and hands control over to the next stage. The "/init" program can be:
  - a shell script program if the initramfs was created by initramfs-tools
  - a binary systemd program if the initramfs was created by dracut
- Stage 4: the normal Debian system
The system kernel of the mini-Debian system continues to run in this environment. The root filesystem is switched from the one in memory to the one on the real hard disk. The "/init" program:
  - mounts the "real" root file system
  - places the root file system at the root directory using the pivot_root system call
  - execs /sbin/init on the new root filesystem, performing the usual boot sequence
  - removes the initrd file system
GRUB
GRUB stands for GRand Unified Bootloader. When a computer is turned on, BIOS finds the configured primary bootable device (usually the computer's hard disk) and loads and executes the initial bootstrap program from the master boot record (MBR). The MBR is the first sector of the hard disk, with zero as its offset (sectors counting starts at zero).
- boot.img boot.img is written to the first 440 bytes of the Master Boot Record (MBR in sector 0). It addresses diskboot.img which is the first sector of core.img.
- diskboot.img It loads the rest of core.img.
- core.img It enters 32-bit protected mode and decompresses itself, after which it is able to access /boot/grub. Then it loads /boot/grub/<platform>/normal.mod.
- normal.mod It parses /boot/grub/grub.cfg, optionally loads modules (e.g. for a graphical UI) and shows the menu.
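The MBR layout that boot.img has to fit into can be shown with a short C sketch: bytes 0 to 439 hold boot code, 440 to 443 a disk signature, 446 to 509 four 16-byte partition entries, and the last two bytes the 0x55 0xAA boot signature. The helper names are hypothetical; to try them on a real disk, the first sector can be copied out with, for example, dd if=/dev/sda of=mbr.bin bs=512 count=1 (as root).

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the 512-byte MBR (sector 0):
 *   0..439   boot code (GRUB's boot.img goes here)
 *   440..443 disk signature
 *   444..445 usually zero
 *   446..509 four 16-byte primary partition entries
 *   510..511 boot signature 0x55 0xAA */
#define MBR_SIZE        512
#define MBR_PART_OFFSET 446
#define MBR_PART_SIZE   16

/* Return 1 if the sector carries the 0x55AA boot signature. */
int mbr_has_signature(const uint8_t sector[MBR_SIZE])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Partition type byte of primary entry i (0..3); 0 means unused,
 * 0xEE marks the protective MBR of a GPT disk. */
uint8_t mbr_partition_type(const uint8_t sector[MBR_SIZE], int i)
{
    assert(i >= 0 && i < 4);
    return sector[MBR_PART_OFFSET + i * MBR_PART_SIZE + 4];
}

/* Starting LBA of entry i, stored little-endian in bytes 8..11 of the entry. */
uint32_t mbr_partition_start(const uint8_t sector[MBR_SIZE], int i)
{
    assert(i >= 0 && i < 4);
    const uint8_t *e = sector + MBR_PART_OFFSET + i * MBR_PART_SIZE;
    return (uint32_t)e[8] | (uint32_t)e[9] << 8 |
           (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
}
```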
Grub can be configured to automatically load a specified OS after a user-defined timeout. If the timeout is set to zero seconds, pressing and holding ⇧ Shift while the computer is booting makes it possible to access the boot menu.
In the operating system selection menu GRUB accepts a couple of commands:
- Pressing e makes it possible to edit the kernel parameters of the selected menu item before the operating system is started.
- Pressing c enters the GRUB command line.
GRUB is configured by the file /boot/grub/grub.cfg.
On a modern Ubuntu, rather than editing this file directly (and risking mistakes), you edit a few settings in /etc/default/grub and then run update-grub to regenerate it.
Some useful variables in /etc/default/grub:
GRUB_DEFAULT=saved
GRUB_TIMEOUT=2
GRUB_CMDLINE_LINUX_DEFAULT="panic=5"
- GRUB_DEFAULT=saved This tells GRUB to use the last-saved selection. You can configure which kernel is initially the one that’s ‘saved’. To find available items to be selected:
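$ grep menuentry /boot/grub/grub.cfg
Find the label and set it as the saved default for the future:
sudo grub-set-default "Ubuntu, with Linux 5.3.0-40-generic"
If the entry sits inside a submenu (such as "Advanced options for Ubuntu"), give the submenu title first, separated by >, for example "Advanced options for Ubuntu>Ubuntu, with Linux 5.3.0-40-generic".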
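The label that grub-set-default expects is simply the first quoted string on each menuentry (or submenu) line of grub.cfg. A small C sketch that extracts it, assuming the title is quoted with ' or " and contains no escaped quotes; grub_entry_title is a hypothetical name:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Extract the title from a grub.cfg "menuentry" or "submenu" line, e.g.
 *   menuentry 'Ubuntu, with Linux 5.3.0-40-generic' --class ubuntu ... {
 * Returns 1 and fills title on success, 0 if the line is not an entry. */
int grub_entry_title(const char *line, char *title, size_t n)
{
    const char *p = line, *start, *end;
    char quote;

    assert(title != NULL && n > 0);
    while (isspace((unsigned char)*p))
        p++;                          /* entries inside submenus are indented */
    if (strncmp(p, "menuentry ", 10) == 0)
        p += 10;
    else if (strncmp(p, "submenu ", 8) == 0)
        p += 8;
    else
        return 0;

    while (*p == ' ')
        p++;
    quote = *p;
    if (quote != '\'' && quote != '"')
        return 0;
    start = p + 1;
    end = strchr(start, quote);       /* closing quote of the title */
    if (end == NULL)
        return 0;

    size_t len = (size_t)(end - start);
    if (len >= n)
        len = n - 1;                  /* truncate rather than overflow */
    memcpy(title, start, len);
    title[len] = '\0';
    return 1;
}
```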
Chapter 6, Upgrading a Kernel
Download the New Source
Which Patch Applies to Which Release?
- Stable kernel patches apply to the base kernel version. The 2.6.17.10 patch will only apply to the 2.6.17 kernel release; it will not apply to the 2.6.17.9 kernel or any other release.
- Base kernel release patches only apply to the previous base kernel version. The 2.6.18 patch will only apply to the 2.6.17 kernel release.
Finding the Patch
Incremental patches, which move a tree from one stable release to the next, can be used instead. Two of them are needed to go from the 2.6.17.9 to the 2.6.17.11 release:
- patch-2.6.17.9-10.bz2
- patch-2.6.17.10-11.bz2
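Because each incremental patch moves the tree forward by exactly one stable release, the list of patches needed is one name per step. A minimal C sketch that generates the names for the naming scheme above; incremental_patches is a hypothetical helper, and it does not check that the patches actually exist on kernel.org.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the names of the incremental patches that move a tree from stable
 * release base.from to base.to (e.g. 2.6.17.9 -> 2.6.17.11) into buf, one
 * per line, in the order they must be applied. Returns the number of
 * patches, or -1 on bad input or if buf is too small. */
int incremental_patches(const char *base, int from, int to,
                        char *buf, size_t size)
{
    size_t used = 0;

    assert(base != NULL && buf != NULL);
    if (from < 0 || to <= from || size == 0)
        return -1;
    buf[0] = '\0';
    for (int v = from; v < to; v++) {
        /* each patch moves exactly one step: v -> v + 1 */
        int w = snprintf(buf + used, size - used,
                         "patch-%s.%d-%d.bz2\n", base, v, v + 1);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return to - from;
}
```

The patches are then applied in that order, each one decompressed on the fly, for example bzcat ../patch-2.6.17.9-10.bz2 | patch -p1 followed by the 10-11 patch.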
Applying the Patch
Decompress the patch
bzip2 -dv patch-2.6.17.9-10.bz2
Apply the patch files to the kernel directory:
cd linux-2.6.17.9
patch -p1 < ../patch-2.6.17.9-10
It is a good idea to look at the Makefile of the kernel to check the version of the patched kernel:
$ head -n 5 Makefile
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 17
EXTRAVERSION = .10
NAME=Crazed Snow-Weasel
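The version string is these four fields joined together (VERSION.PATCHLEVEL.SUBLEVEL followed by EXTRAVERSION), which is what make kernelversion prints. A C sketch of reading them from the Makefile header, assuming the simple KEY = value layout shown above; struct kversion, kversion_parse_line and kversion_format are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Version fields from the top of the kernel Makefile. */
struct kversion {
    int version, patchlevel, sublevel;
    char extraversion[32];
};

/* Feed the Makefile one line at a time; "KEY = value" lines update kv. */
void kversion_parse_line(struct kversion *kv, const char *line)
{
    assert(kv != NULL && line != NULL);
    if (sscanf(line, "VERSION = %d", &kv->version) == 1 ||
        sscanf(line, "PATCHLEVEL = %d", &kv->patchlevel) == 1 ||
        sscanf(line, "SUBLEVEL = %d", &kv->sublevel) == 1)
        return;

    if (strncmp(line, "EXTRAVERSION", 12) == 0) {
        const char *p = line + 12 + strspn(line + 12, " \t");
        char extra[32] = "";
        if (*p != '=')
            return;
        sscanf(p + 1, " %31s", extra);   /* may be empty on a release tree */
        strcpy(kv->extraversion, extra);
    }
}

/* Join the fields into a version string such as "2.6.17.10". */
void kversion_format(const struct kversion *kv, char *buf, size_t n)
{
    snprintf(buf, n, "%d.%d.%d%s", kv->version, kv->patchlevel,
             kv->sublevel, kv->extraversion);
}
```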
Reconfigure the Kernel
Once you have a working configuration, the only thing that is necessary is to update it with any new options that have been added to the kernel since the last release. To do this, use make oldconfig or make silentoldconfig.
- make oldconfig make oldconfig reads the current kernel configuration from the .config file. Whenever it finds a new option, it stops and asks the user what the new option should be set to.
- make silentoldconfig make silentoldconfig works exactly the same way as oldconfig, but it does not print anything to the screen unless it needs to ask a question about a new configuration option. Newer kernels no longer provide this target; there, use make oldconfig, or make olddefconfig to set every new option to its default value without asking.
Chapter 7, Customizing a Kernel
This chapter explains how to decide which drivers and configuration options are needed for your machine to work properly.
Using a Distribution Kernel
Most distribution kernels are built to include the configuration within the /proc filesystem:
$ cp /proc/config.gz ~/linux/
$ cd ~/linux
$ gzip -dv config.gz
The disadvantage of starting from a distribution configuration is that you will build almost every kernel module and driver that is present in the kernel source tree.
A virtual filesystem called sysfs provides a glimpse into how the different portions of the kernel are hooked together.
sysfs should always be mounted at the /sys location in your filesystem.
Example: Determining the network driver
- find out the network device
$ ls /sys/class/net/
eno1  lo  wlp2s0
- find out the driver bound to it
$ ls -l /sys/class/net/eno1/device/driver/module/drivers
total 0
lrwxrwxrwx 1 root root 0 Mar 12 15:32 pci:e1000e -> ../../../bus/pci/drivers/e1000e
- find the Makefile lines, and from them the configuration option (CONFIG_E1000E), that build the driver
$ find -type f -name Makefile | xargs grep e1000e
./drivers/net/ethernet/intel/Makefile:obj-$(CONFIG_E1000E) += e1000e/
./drivers/net/ethernet/intel/e1000e/Makefile:obj-$(CONFIG_E1000E) += e1000e.o
./drivers/net/ethernet/intel/e1000e/Makefile:e1000e-objs := 82571.o ich8lan.o 80003es2lan.o \
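The driver lookup can also be done from a program: /sys/class/net/<interface>/device/driver is a symbolic link into the bus's drivers directory, and the last component of its target is the driver name. A minimal C sketch using readlink(2); netdev_driver is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Find the driver bound to a network interface by resolving the sysfs
 * symlink /sys/class/net/<ifname>/device/driver; its last path component is
 * the driver name (e.g. "e1000e"). Returns 0 on success, -1 if the interface
 * does not exist or has no backing device (virtual interfaces such as lo). */
int netdev_driver(const char *ifname, char *drv, size_t n)
{
    char path[4096], target[4096];
    const char *base;
    ssize_t len;

    assert(ifname != NULL && drv != NULL && n > 0);
    snprintf(path, sizeof path, "/sys/class/net/%s/device/driver", ifname);
    len = readlink(path, target, sizeof target - 1);
    if (len < 0)
        return -1;
    target[len] = '\0';                 /* readlink does not NUL-terminate */

    base = strrchr(target, '/');
    snprintf(drv, n, "%s", base ? base + 1 : target);
    return 0;
}
```

On the machine above, netdev_driver("eno1", buf, sizeof buf) would be expected to yield e1000e, matching the ls -l output.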
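The following script lists every module needed by the devices present in the system, by asking modprobe which modules each device's modalias resolves to:
#!/bin/bash
#
# find_all_modules.sh
#
for i in `find /sys/ -name modalias -exec cat {} \;`; do
    /sbin/modprobe --config /dev/null --show-depends $i ;
done | rev | cut -f 1 -d '/' | rev | sort -u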
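Each modalias file holds a string that encodes the device's IDs, and modprobe resolves it by matching it against the wildcard aliases that modules declare (the alias: lines printed by modinfo). The C sketch below parses the vendor and device IDs out of a PCI modalias and does a shell-style wildcard match with fnmatch(3); it illustrates the idea and is not how kmod implements the lookup. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>

/* Extract the vendor and device IDs from a PCI modalias such as
 *   pci:v00008086d000010EAsv00001028sd0000040Bbc02sc00i00
 * The field widths matter: 'd' is itself a hex digit, so an unbounded %x
 * would run straight past the separator. Returns 0 on success, -1 otherwise. */
int parse_pci_modalias(const char *alias, unsigned *vendor, unsigned *device)
{
    assert(alias != NULL && vendor != NULL && device != NULL);
    return sscanf(alias, "pci:v%8xd%8x", vendor, device) == 2 ? 0 : -1;
}

/* Return 1 if a device modalias matches a module alias pattern of the kind
 * modinfo prints, e.g. "pci:v00008086d00001C20sv*sd*bc*sc*i*". */
int alias_matches(const char *pattern, const char *modalias)
{
    return fnmatch(pattern, modalias, 0) == 0;
}
```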
Determining the Correct Module from Scratch
The easiest way to figure out which driver controls a new device is to build all of the different drivers of that type in the kernel source tree as modules, and let the udev startup process match the driver to the device.
Find the driver for a device
- PCI PCI devices are distinguished by vendor ID and device ID.
$ lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82577LM Gigabit Network Connection (rev 05)
The first field of the lspci output shows the PCI bus ID for this device, 00:19.0. That is the value to use when looking through sysfs:
$ ls /sys/bus/pci/devices/ | grep 00:19.0
0000:00:19.0
$ cat /sys/bus/pci/devices/0000:00:19.0/vendor
0x8086
$ cat /sys/bus/pci/devices/0000:00:19.0/device
0x10ea
The kernel uses the vendor and device IDs of this PCI device. PCI drivers tell the kernel which vendor and device IDs they support so that the kernel knows how to bind the driver to the proper device. The next step is to find the kernel driver that advertises support for this device.
- Search include/linux/pci_ids.h for the vendor and device numbers
$ grep -i 0x8086 include/linux/pci_ids.h | grep VENDOR
#define PCI_VENDOR_ID_INTEL 0x8086
$ grep -i 0x10ea include/linux/pci_ids.h | grep DEVICE
If the second search prints nothing, the device ID is not defined in pci_ids.h; search the network drivers for the vendor macro instead:
$ grep -Rl PCI_VENDOR_ID_INTEL drivers/net
drivers/net/wireless/intel/ipw2x00/ipw2100.c
drivers/net/wireless/intel/ipw2x00/ipw2200.c
drivers/net/wireless/intel/iwlwifi/pcie/drv.c
drivers/net/wireless/intel/iwlegacy/common.h
drivers/net/can/pch_can.c
drivers/net/can/c_can/c_can_pci.c
drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
drivers/net/ethernet/broadcom/tg3.c
drivers/net/ethernet/dec/tulip/tulip_core.c
drivers/net/ethernet/intel/i40e/i40e_common.c
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
drivers/net/ethernet/intel/ixgb/ixgb_main.c
drivers/net/ethernet/intel/e1000/e1000_main.c
drivers/net/ethernet/intel/e1000/e1000.h
drivers/net/ethernet/intel/e100.c
drivers/net/ethernet/intel/i40evf/i40e_common.c
All PCI drivers contain a list of the different devices that they support. That list is contained in a structure of struct pci_device_id values.
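The matching rule behind such a list is simple: each field of a table entry either equals the device's value or is the wildcard PCI_ANY_ID, and the table ends with an all-zero entry. The C sketch below is a user-space model of that rule, not kernel code: struct pci_id_model keeps only four of the fields of the kernel's struct pci_device_id (class matching is omitted), and the table contents and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_ANY_ID (~0u)            /* wildcard value, as in the kernel */

/* Reduced model of struct pci_device_id. */
struct pci_id_model {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
};

/* Hypothetical driver table; the all-zero entry terminates it. */
static const struct pci_id_model demo_ids[] = {
    { 0x8086, 0x10ea, PCI_ANY_ID, PCI_ANY_ID },
    { 0x8086, 0x10eb, PCI_ANY_ID, PCI_ANY_ID },
    { 0, 0, 0, 0 }
};

/* A table field matches if it is the wildcard or equals the device value. */
static int field_ok(uint32_t want, uint32_t have)
{
    return want == PCI_ANY_ID || want == have;
}

/* Return the first table entry matching the device, or NULL. */
const struct pci_id_model *pci_match_id_model(const struct pci_id_model *ids,
        uint32_t vendor, uint32_t device,
        uint32_t subvendor, uint32_t subdevice)
{
    for (; ids->vendor || ids->subvendor; ids++) {
        if (field_ok(ids->vendor, vendor) && field_ok(ids->device, device) &&
            field_ok(ids->subvendor, subvendor) &&
            field_ok(ids->subdevice, subdevice))
            return ids;
    }
    return NULL;
}
```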
$ lsusb | grep Mouse
Bus 002 Device 003: ID 046d:c077 Logitech, Inc. M105 Optical Mouse
To be sure which entry belongs to the device, unplug it and run lsusb again. Note that USB device numbers are not unique; they change every time a device is plugged in. What is stable is the vendor and product ID shown after "ID ". Unfortunately, no single file contains all of the USB vendor IDs, as there is for PCI. So a search of the whole kernel source tree is necessary:
Chapter 8, Kernel Configuration Recipes
Chapter 9, Kernel Boot Command-Line Parameter Reference
Chapter 10, Kernel Build Command-Line Reference
Chapter 11, Kernel Configuration Option Reference
Appendix A, Helpful Utilities
Inline assembly for x86 in Linux
GNU assembler syntax in brief
Register naming
Register names are prefixed by %. That is, if eax has to be used, it should be written as %eax.
Source and destination ordering
In any instruction, the source comes first and the destination follows. This differs from Intel syntax, where the source comes after the destination.
Transfer the contents of eax to ebx:
mov %eax, %ebx
Size of operand
The instructions are suffixed by b, w, or l, depending on whether the operand is a byte, word, or long. This is not mandatory; the assembler tries to infer the suffix from the operands.
But specifying the suffixes manually improves readability and eliminates the possibility of the assembler guessing incorrectly.
movb %al, %bl
movw %ax, %bx
movl %eax, %ebx
Immediate operand
An immediate operand is specified by using $.
Move the value 0xffff into the eax register:
movl $0xffff, %eax
Indirect memory reference
Any indirect references to memory are done by using ( ).
Transfer the byte in memory pointed to by esi into the al register:
movb (%esi), %al
Inline assembly
GCC provides the special construct "asm" for inline assembly
Basic inline
asm("assembly code");
Example:
asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
Extended asm
In extended assembly, we can also specify the operands. It has the following format:
asm ( assembler template
      : output operands               (optional)
      : input operands                (optional)
      : list of clobbered registers   (optional)
    );
where:
- assembler template This consists of assembly instructions.
- output operands the C expressions (lvalues) that receive the outputs of the assembly instructions
- input operands the C expressions that serve as input operands to the instructions
- clobbered registers the registers that are modified by this code
Commas separate the operands within each group.
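Examples:
- filling memory with rep stosl: fill_value is a pure input, while count (%ecx) and dest (%edi) are changed by rep stosl, so they are declared read-write with "+" instead of being listed as clobbers (GCC rejects a clobbered register that is also used by an operand); "memory" tells GCC that memory is written
asm ("cld\n\t"
     "rep stosl"
     : "+c" (count), "+D" (dest)
     : "a" (fill_value)
     : "memory");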
int x = 10, y;
asm ("movl %1, %%eax;"
     "movl %%eax, %0;"
     : "=r" (y)    /* %0: y is the output operand */
     : "r" (x)     /* %1: x is the input operand */
     : "%eax");    /* %eax is a clobbered register */
- y is the output operand, referred to by %0
- x is the input operand, referred to by %1
- "r" and "=r" are constraints on the operands. "r" tells GCC it may use any register to hold the operand; "=r" additionally marks it as a write-only output operand.
- The clobbered register %eax after the third colon tells GCC that the value of %eax is modified inside the asm, so GCC won't use this register to hold any other value across it.
- Operand references have a single % as prefix, while register names are written with %% so that GCC can tell a register apart from an operand reference.
For example, GCC may allocate both x and y in %edx, producing:
movl %edx, %eax /* x is moved to %eax */
movl %eax, %edx /* y is allocated in edx and updated */
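A compilable version of these examples, as a sketch: asm_copy is the %1 to %%eax to %0 copy from above, and asm_add uses the "0" matching constraint to place input a in the same register as output %0, listing "cc" because addl changes the flags. It uses the __asm__ spelling, which also works under strict -std=c99 or -std=c11 where plain asm is not a keyword, and falls back to plain C on non-x86 targets. The function names are hypothetical.

```c
#include <assert.h>

/* Add two ints with GCC extended asm (AT&T syntax).
 *   %0 = sum (output, "=r": any register, write-only)
 *   %1 = a   (input tied to the same register as %0 by "0")
 *   %2 = b   (input in any register) */
int asm_add(int a, int b)
{
    int sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("addl %2, %0"          /* AT&T order: source, then destination */
             : "=r" (sum)
             : "0" (a), "r" (b)
             : "cc");               /* addl modifies the condition flags */
#else
    sum = a + b;
#endif
    return sum;
}

/* Copy x to y through %eax, declaring %eax as clobbered. */
int asm_copy(int x)
{
    int y;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %1, %%eax\n\t"
             "movl %%eax, %0"
             : "=r" (y)
             : "r" (x)
             : "%eax");
#else
    y = x;
#endif
    return y;
}
```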
Kernel Modules
Obtaining information
Modules are stored in:
$ ls /usr/lib/modules/$(uname -r)
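A program can find the same directory through uname(2), whose release field is the string that uname -r prints. A minimal C sketch with hypothetical helper names; it uses /lib/modules, which on distributions with a merged /usr is the same directory as /usr/lib/modules.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Build the module directory for a kernel release, e.g. "5.5.8" ->
 * "/lib/modules/5.5.8". Returns 0 on success, -1 if buf is too small. */
int modules_dir(const char *release, char *buf, size_t n)
{
    int w = snprintf(buf, n, "/lib/modules/%s", release);
    return (w >= 0 && (size_t)w < n) ? 0 : -1;
}

/* Same for the running kernel: uname() fills u.release with the string
 * that `uname -r` prints. */
int running_modules_dir(char *buf, size_t n)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return modules_dir(u.release, buf, n);
}
```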
- To show information about a module
$ modinfo snd_hda_intel
filename:       /lib/modules/5.4.0-80-generic/kernel/sound/pci/hda/snd-hda-intel.ko
description:    Intel HDA driver
license:        GPL
srcversion:     2F60277DAE563209FA7BA4A
alias:          pci:v00001D17d00003288sv*sd*bc*sc*i*
...
alias:          pci:v00008086d00001C20sv*sd*bc*sc*i*
depends:        snd-hda-core,snd-hda-codec,snd-pcm,snd,snd-intel-dspcfg
...
parm:           index:Index value for Intel HD audio interface. (array of int)
...
- To show the modprobe configuration (options, aliases, blacklist entries) that mentions a module
$ modprobe -c | grep snd_hda_intel
- To show the modules that would be loaded for a module, in order
$ modprobe --show-depends snd_hda_intel
insmod /lib/modules/5.4.0-80-generic/kernel/sound/soundcore.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/core/snd.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/core/snd-timer.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/core/snd-pcm.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/core/snd-hwdep.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/hda/snd-hda-core.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/pci/hda/snd-hda-codec.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/hda/snd-intel-dspcfg.ko
insmod /lib/modules/5.4.0-80-generic/kernel/sound/pci/hda/snd-hda-intel.ko
Automatic module loading with systemd
All necessary module loading is handled automatically by udev, so if you do not need any out-of-tree kernel modules, there is no need to list the modules that should be loaded at boot in any configuration file.
Kernel modules can be explicitly listed in files under /etc/modules-load.d/ for systemd to load them during boot.
systemd-modules-load.service reads files from /etc/modules-load.d/ which contain kernel modules to load during boot in a static list.
Each configuration file under /etc/modules-load.d/:
- is named in the style
/etc/modules-load.d/*.conf
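Per modules-load.d(5), such a file lists one module name per line, and empty lines and lines whose first non-whitespace character is # or ; are ignored. A C sketch of that line rule; modules_load_line is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* If a line from a .conf file under /etc/modules-load.d/ names a module,
 * copy the name into mod and return 1. Blank lines and lines whose first
 * non-whitespace character is '#' or ';' are ignored (return 0). */
int modules_load_line(const char *line, char *mod, size_t n)
{
    size_t len;

    assert(line != NULL && mod != NULL && n > 0);
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#' || *line == ';')
        return 0;

    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;                       /* drop the trailing newline and spaces */
    if (len >= n)
        return 0;                    /* name too long for the buffer */
    memcpy(mod, line, len);
    mod[len] = '\0';
    return 1;
}
```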
Setting module options
- Manually set parameters at load time using modprobe
$ sudo modprobe module_name name=value
- Using files in /etc/modprobe.d/
The syntax is:
options module_name name=value
If any of the affected modules is loaded from the initramfs, the initramfs must be rebuilt so that it includes the new .conf file: on Arch Linux, add the file to FILES in mkinitcpio.conf or use the modconf hook; on Debian and Ubuntu, run update-initramfs -u.
- Using the kernel command line
module_name.name=value
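The kernel command-line form names the same parameter, prefixed with the module name and a dot. A C sketch that splits such a token, normalizing - to _ in the module name because the two are interchangeable in module names; parse_module_param and struct mod_param are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct mod_param {
    char module[64], name[64], value[128];
};

/* Split one kernel command-line token of the form module.name=value
 * (e.g. "snd-hda-intel.power_save=1"). Returns 0 on success, -1 if the
 * token is not a module parameter (e.g. "quiet" or "root=/dev/sda1"). */
int parse_module_param(const char *tok, struct mod_param *mp)
{
    assert(tok != NULL && mp != NULL);
    const char *eq = strchr(tok, '=');
    if (eq == NULL)
        return -1;
    /* the module name ends at the first '.' before the '=' */
    const char *dot = memchr(tok, '.', (size_t)(eq - tok));
    if (dot == NULL || dot == tok)
        return -1;

    size_t mlen = (size_t)(dot - tok);
    size_t nlen = (size_t)(eq - dot - 1);
    if (mlen >= sizeof mp->module || nlen == 0 ||
        nlen >= sizeof mp->name || strlen(eq + 1) >= sizeof mp->value)
        return -1;

    memcpy(mp->module, tok, mlen);
    mp->module[mlen] = '\0';
    for (char *p = mp->module; *p; p++)
        if (*p == '-')
            *p = '_';                 /* snd-hda-intel == snd_hda_intel */
    memcpy(mp->name, dot + 1, nlen);
    mp->name[nlen] = '\0';
    strcpy(mp->value, eq + 1);
    return 0;
}
```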
Aliasing
Aliases are alternate names for a module. To create an alias, add a line to a file such as /etc/modprobe.d/myalias.conf:
alias mymod really_long_module_name
This means you can use "modprobe mymod" instead of "modprobe really_long_module_name".
Blacklisting
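Blacklisting is a mechanism to prevent a kernel module from loading. To blacklist a module:
- Using files in /etc/modprobe.d/: add a line blacklist module_name (this stops automatic loading by alias; the module can still be loaded explicitly or as a dependency of another module)
- Using the kernel command line: add module_blacklist=module_name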