Booting ARM Linux SMP on MPCore
It is important to understand what happens from the moment the power button is
pressed until the command shell appears with all 4 CPU cores running. The boot
process of an embedded Linux kernel differs from that of a PC, mainly because
the environment settings and the available hardware change from one platform
to another. For example, an embedded system typically has no hard disk or PC
BIOS, but includes a boot monitor and flash memories. So the main difference
between the boot processes of different architectures lies in the application
used to find and load the kernel. Once the kernel is in memory, the same
sequence of events occurs on all CPU architectures, with some functions
overridden by architecture-specific implementations.
When the system is powered on, the Boot Monitor code executes from a predefined
address in the NOR flash memory (0x00000000). The Boot Monitor initializes the
PB11MPCore hardware peripherals, and then launches the real bootloader, U-Boot,
if an automatic script is provided; otherwise the user runs U-Boot manually by
entering the appropriate command in the Boot Monitor command shell. U-Boot
initializes the main memory and copies the compressed Linux kernel image
(uImage), which is located either in the on-board NOR flash memory, on an MMC
or CompactFlash card, or on a host PC, to the main memory to be executed by the
ARM11 MPCore, after passing some initialization parameters to the kernel. The
Linux kernel image then decompresses itself, starts initializing its data
structures, creates some user processes, boots all the CPU cores and finally
runs the command shell environment in user space.
This was a brief introduction to the whole boot process. The next sections
explain each stage in detail and highlight the Linux source code that executes
the corresponding stage.
a) System startup (Boot Monitor)
When the system is powered on or reset, all CPUs of the ARM11 MPCore load the
reset vector address into their PC register and fetch their first instruction
from it. In our case, it is the first address in the NOR flash memory
(0x00000000), where the Boot Monitor program resides. Only CPU0 continues to
execute the Boot Monitor code; the secondary CPUs (CPU1, CPU2, and CPU3)
execute a WFI instruction inside a loop that checks the value of the SYS_FLAGS
register. The secondary CPUs start executing meaningful code during the Linux
kernel boot process, which is explained in detail later in this section, in
paragraph c) ARM Linux.
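The waiting loop of the secondary CPUs can be pictured with a small C model. This is only a sketch: the real code is ARM assembly inside the Boot Monitor, and the function name pen_wait, the bounded number of polls, and the release condition (a non-zero value with its two low bits clear) are assumptions made for illustration. A plain variable stands in for the memory-mapped SYS_FLAGS register.

```c
#include <stdint.h>

/*
 * Hypothetical model of the secondary-CPU holding loop.
 * `sys_flags` stands in for the memory-mapped SYS_FLAGS register.
 * Assumption: a CPU is released once the register holds a non-zero
 * value whose two low bits are clear (a word-aligned entry address).
 * Returns the entry address, or 0 if no release was seen within
 * `max_polls` reads.
 */
uint32_t pen_wait(const volatile uint32_t *sys_flags, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t value = *sys_flags;      /* re-read on every pass */

        if (value != 0 && (value & 3u) == 0)
            return value;                 /* released: jump target */
        /* On real hardware the CPU would execute WFI here. */
    }
    return 0;                             /* still held in the loop */
}
```

The bounded loop only exists so that the model terminates; a real secondary CPU would spin until it is released.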
The Boot Monitor is the standard ARM application that runs
when the system is booted and is built with the ARM platform library.
On reset, the Boot Monitor performs the following actions:
- Execute the main code on CPU0 and the WFI instruction on the secondary CPUs
- Initialize the memory controllers and configure the main board peripherals
- Set up a stack in memory
- Copy itself to the main memory (DRAM)
- Reset the boot memory remapping
- Remap and redirect the C library I/O routines depending on the settings of the
switches on the front panel of the PB11MPCore (output: UART0 or LCD; input:
UART0 or keyboard)
- Run a bootscript automatically, if it exists in the NOR flash memory and the
corresponding switch is ON on the front panel of the PB11MPCore. Otherwise, the
Boot Monitor command shell is displayed
So the Boot Monitor application shipped with the board is similar to the BIOS
in a PC. It has limited functionality and cannot boot a Linux kernel image, so
another bootloader, U-Boot, is needed to complete the boot process. The U-Boot
code is cross-compiled for the ARM platform and flashed to the NOR flash
memory. The final step is to launch the U-Boot image from the Boot Monitor
command line. This can be done using a script or manually by entering the
appropriate command.
b) Bootloader (U-Boot)
When the bootloader is called by the Boot Monitor, it is located in the NOR
flash memory without access to system RAM, because the memory controller is not
yet initialized the way U-Boot expects. So how does U-Boot move itself from the
flash memory to the main memory?
In order to get the C environment working properly and run the initialization
code, U-Boot needs to allocate a minimal stack. In the case of the ARM11
MPCore, this is done in a locked part of the L1 data cache memory. In this way,
the cache memory is used as temporary data storage to initialize U-Boot before
the SDRAM controller is set up. Then, U-Boot initializes the ARM11 MPCore, its
caches and the SCU. Next, all available memory banks are mapped using a
preliminary mapping and a simple memory test is run to determine the size of
the SDRAM banks. Finally, the bootloader installs itself at the upper end of
the SDRAM area and allocates memory for use by malloc() and for the global
board info data. The exception vector code is copied to low memory. Now, the
final stack is set up.
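The placement at the upper end of SDRAM can be illustrated with a short address calculation. The sketch below is not U-Boot code: the structure, the function plan_layout, the alignment values, and the exact ordering of the regions below the relocated image are assumptions chosen to mirror the description above.

```c
#include <stdint.h>

/* Hypothetical description of where each region ends up in SDRAM. */
struct boot_layout {
    uint32_t uboot_base;   /* relocated bootloader image */
    uint32_t malloc_base;  /* heap used by malloc() */
    uint32_t bdinfo_base;  /* global board info data */
    uint32_t stack_top;    /* final stack grows down from here */
};

/* Round `addr` down to a multiple of `align` (a power of two). */
static uint32_t align_down(uint32_t addr, uint32_t align)
{
    return addr & ~(align - 1u);
}

/* Carve the regions out of the top of RAM, highest address first. */
struct boot_layout plan_layout(uint32_t ram_base, uint32_t ram_size,
                               uint32_t uboot_size, uint32_t malloc_size,
                               uint32_t bdinfo_size)
{
    struct boot_layout l;
    uint32_t top = ram_base + ram_size;

    l.uboot_base  = align_down(top - uboot_size, 4096u);
    l.malloc_base = align_down(l.uboot_base - malloc_size, 8u);
    l.bdinfo_base = align_down(l.malloc_base - bdinfo_size, 8u);
    l.stack_top   = l.bdinfo_base;
    return l;
}
```

For example, with 128 MB of RAM at address 0, a 256 KB image and a 128 KB heap, the image lands at 0x07fc0000 and the heap at 0x07fa0000.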
At this stage, the second-stage bootloader, U-Boot, is in the main memory and a
C environment is set up. The bootloader is ready to launch the Linux kernel
image from a pre-specified location after passing some boot parameters to it.
In addition, it initializes a serial or video console for the kernel. Finally,
it calls the kernel image by jumping directly to the ‘start’ label in the
arch/arm/boot/compressed/head.S assembly file, which is the start header of the
Linux kernel decompressor.
The bootloader can perform many functions; however, a minimal set of
requirements must always be met:
- Configure the system’s main memory:
The Linux kernel has no knowledge of the setup or configuration of the RAM
within a system. It is the task of the bootloader to find and initialize, in a
machine-dependent manner, all the RAM that the kernel will use for volatile
data storage, and then to pass the physical memory layout to the kernel using
the ATAG_MEM parameter, which is explained later.
- Load the kernel image at the correct memory address:
The ‘uImage’ encapsulates a compressed Linux kernel image with header
information, marked by a special magic number, and a data portion. Both the
header and the data are protected against corruption by CRC32 checksums. The
data field stores the start and end offsets of the image; they are used to
determine the length of the compressed image and therefore how much memory must
be allocated. The ARM Linux kernel image expects to be loaded at address 0x7fc0
in the main memory; with the 64-byte uImage header, the encapsulated kernel
data then starts at 0x8000.
- Initialize a console:
Since a serial console is essential on all platforms to allow communication
with the target and to provide early kernel debugging facilities, the
bootloader should initialize and enable one serial port on the target. It then
passes the relevant console parameter option to the kernel to inform it of the
already enabled port.
- Initialize the boot parameters to pass to the kernel:
The bootloader must pass parameters to the kernel in the form of tags, to
describe the setup it has performed, the size and shape of memory in the system
and, optionally, numerous other values as described in Table 1:
Tag name | Description
ATAG_NONE | Empty tag used to end list
ATAG_CORE | First tag used to start list
ATAG_MEM | Describes a physical area of memory
ATAG_VIDEOTEXT | Describes a VGA text display
ATAG_RAMDISK | Describes how the ramdisk will be used in kernel
ATAG_INITRD2 | Describes where the compressed ramdisk image is placed in memory
ATAG_SERIAL | 64 bit board serial number
ATAG_REVISION | 32 bit board revision number
ATAG_VIDEOLFB | Initial values for vesafb-type framebuffers
ATAG_CMDLINE | Command line to pass to kernel
- Obtain the ARM Linux machine type:
The bootloader should provide the machine type of the ARM system, which is a
simple unique number that identifies the platform. It can be hard-coded in the
source code since it is predefined, or read from a board register. The machine
type number can be obtained from the ARM-Linux project website.
- Enter the kernel with the appropriate register values:
Finally, before starting execution of the Linux kernel image, the ARM11 MPCore
registers must be set in an appropriate way:
  - Supervisor (SVC) mode
  - IRQ and FIQ interrupts disabled
  - MMU off (no translation of memory addresses is required)
  - Data cache off
  - Instruction cache may be either on or off
  - CPU register 0 (r0) = 0
  - CPU register 1 (r1) = ARM Linux machine type
  - CPU register 2 (r2) = physical address of the parameter list
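The tagged parameter list can be made concrete with a short sketch. Each tag starts with a two-word header (size in 32-bit words, then the tag identifier), the list begins with ATAG_CORE and ends with ATAG_NONE. The function names, the chosen ATAG_CORE payload (flags, page size, root device) and the buffer handling below are assumptions for illustration, not code taken from U-Boot; the caller is assumed to provide a large enough buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATAG_NONE    0x00000000u
#define ATAG_CORE    0x54410001u
#define ATAG_MEM     0x54410002u
#define ATAG_CMDLINE 0x54410009u

/* Append one tag with `nwords` payload words; return the next free word. */
static uint32_t *put_tag(uint32_t *p, uint32_t tag,
                         const uint32_t *payload, uint32_t nwords)
{
    p[0] = 2u + nwords;               /* size in words, header included */
    p[1] = tag;
    for (uint32_t i = 0; i < nwords; i++)
        p[2 + i] = payload[i];
    return p + 2 + nwords;
}

/* Build CORE, MEM, CMDLINE and NONE tags; return the length in words. */
size_t build_atags(uint32_t *buf, uint32_t mem_start, uint32_t mem_size,
                   const char *cmdline)
{
    const uint32_t core[3] = { 0u, 4096u, 0u };  /* flags, page size, rootdev */
    const uint32_t mem[2]  = { mem_size, mem_start };
    size_t len = strlen(cmdline) + 1;            /* include the NUL byte */
    uint32_t cmd_words = (uint32_t)((len + 3) / 4);
    uint32_t *p = buf;

    p = put_tag(p, ATAG_CORE, core, 3);
    p = put_tag(p, ATAG_MEM, mem, 2);

    p[0] = 2u + cmd_words;
    p[1] = ATAG_CMDLINE;
    memset(&p[2], 0, cmd_words * sizeof(uint32_t));  /* zero the padding */
    memcpy(&p[2], cmdline, len);
    p += 2 + cmd_words;

    p[0] = 0u;                        /* ATAG_NONE has size 0 */
    p[1] = ATAG_NONE;
    p += 2;
    return (size_t)(p - buf);
}
```

The bootloader would then place the physical address of buf in r2, the machine type in r1 and 0 in r0 before jumping to the kernel, as listed above.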
c) ARM Linux
As mentioned earlier, the bootloader jumps to the compressed kernel image code
and passes some initialization parameters in the form of ATAGs. The beginning
of the compressed Linux kernel image is the ‘start’ label in the
arch/arm/boot/compressed/head.S assembly file. From this point, the boot
process comprises 3 main stages. First, the kernel decompresses itself. Then,
the processor-dependent (ARM11 MPCore) kernel code executes, which initializes
the CPU and memory. Finally, the processor-independent kernel code executes,
which starts up the ARM Linux SMP kernel by booting all the ARM11 cores and
initializes all the kernel components and data structures.
The flowchart in Figure 2
summarizes the boot process of the ARM Linux kernel:
In the Linux SMP environment, CPU0 is responsible for initializing all
resources just as in a uniprocessor environment. Once configured, access to a
resource is tightly controlled using synchronization primitives such as
spinlocks. CPU0 configures the boot page translation so that secondary cores
boot from a dedicated section of Linux rather than the default reset vector.
When secondary cores boot the same Linux image, they enter Linux at a specific
location, so they only initialize resources specific to their own core (caches,
MMU) without reinitializing resources that have already been configured, and
then execute the idle process with PID 0.
A step-by-step walkthrough of the Linux kernel boot process for ARM-based
systems, specifically the ARM11 MPCore, is provided below, highlighting the
kernel source code that executes each step. The boot process comprises 3 main
stages:
Image decompression:
- U-Boot jumps to the ‘start’ label in arch/arm/boot/compressed/head.S
- The parameters passed by U-Boot in r1 (machine type, i.e. the architecture ID)
and r2 (ATAG parameter list pointer) are saved
- Execute architecture-specific code, then turn off the cache and MMU
- Set up the C environment properly
- Assign the appropriate values to the registers and stack pointer, e.g.
r4 = kernel physical start address, sp = decompressor code
- Turn on the cache memory again by calling the cache_on procedure, which walks
through the proc_types list and finds the entry for the corresponding ARM
architecture. For the ARM11 MPCore (ARMv6), the __armv4_mmu_cache_on,
__armv4_mmu_cache_off, and __armv6_mmu_cache_flush procedures are used to turn
on, turn off, and flush the cache memory to RAM respectively
- Check whether the decompressed image will overwrite the compressed image and
jump to the appropriate routine
- Call the decompressor routine decompress_kernel(), which is located in
arch/arm/boot/compressed/misc.c. decompress_kernel() displays the
“Uncompressing Linux...” message on the output terminal, calls the gunzip()
function, and then displays the “ done, booting the kernel” message
- Flush the cache memory contents to RAM using __armv6_mmu_cache_flush
- Turn off the cache using __armv4_mmu_cache_off, because the kernel
initialization routines expect the cache memory to be off at the beginning
- Jump to the start of the kernel in RAM, whose address is stored in the r4
register. The kernel start address is specific to each platform architecture.
For the PB11MPCore, it is stored in the zreladdr-y variable in
arch/arm/mach-realview/Makefile.boot (zreladdr-y := 0x00008000)
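The check for whether the decompressed kernel would overwrite the compressed image amounts to an interval comparison. The function below is a sketch in C of that test; the real check is a few assembly instructions in head.S, and the name can_decompress_directly and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the decompressor's overlap test.
 *   kernel_dest  final address of the decompressed kernel (zreladdr)
 *   kernel_len   space needed by the decompressed kernel
 *   image_start  start of the running compressed image
 *   image_end    end of the compressed image and its working space
 * Decompressing straight to kernel_dest is safe when the destination
 * lies entirely above or entirely below the running image.
 */
bool can_decompress_directly(uint32_t kernel_dest, uint32_t kernel_len,
                             uint32_t image_start, uint32_t image_end)
{
    if (kernel_dest >= image_end)
        return true;                   /* destination is above the image */
    if (kernel_dest + kernel_len <= image_start)
        return true;                   /* destination ends below the image */
    return false;                      /* overlap: relocate first */
}
```

When the test fails, the decompressor has to take the alternative routine mentioned above, decompressing elsewhere and moving the result into place afterwards.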
Processor-dependent (ARM-specific) kernel code:
The kernel startup entry point is the stext procedure in the
arch/arm/kernel/head.S file, to which the decompressor jumps after turning off
the MMU and cache memory and setting the appropriate registers. At this stage,
the following sequence of events takes place in stext (arch/arm/kernel/head.S):
- Ensure that the CPU runs in Supervisor mode and disable all the interrupts
- Look up the processor type using the __lookup_processor_type procedure defined
in arch/arm/kernel/head-common.S. This returns a pointer to a proc_info_list
defined in arch/arm/mm/proc-v6.S for the ARM11 MPCore
- Look up the machine type using the __lookup_machine_type procedure defined in
arch/arm/kernel/head-common.S. This returns a pointer to a machine_desc struct
defined for the PB11MPCore
- Create the page table using the __create_page_tables procedure, which sets up
the barest amount of page tables required to get the kernel running; in other
words, to map in the kernel code
- Jump to the __v6_setup procedure in arch/arm/mm/proc-v6.S, which initializes
the TLB, cache and MMU state of CPU0
- Enable the MMU using the __enable_mmu procedure, which sets up some
configuration bits and then calls __turn_mmu_on (arch/arm/kernel/head.S)
- In __turn_mmu_on, the appropriate control registers are set and then it jumps
to __switch_data, which executes the first procedure, __mmap_switched
(arch/arm/kernel/head-common.S)
- In the __mmap_switched procedure, the data segment is copied to RAM and the
BSS segment is cleared. Finally, it jumps to the start_kernel() routine in
init/main.c, where the Linux kernel starts
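The processor-type lookup is essentially a masked comparison of the CPU ID register against a table of known processors. The sketch below models it in C; the real routine is written in assembly, and the structure layout, the function name and the two table entries (including their value/mask pairs) are illustrative assumptions rather than a copy of the kernel tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one entry of the kernel's processor table. */
struct proc_info {
    uint32_t cpu_val;    /* expected ID bits */
    uint32_t cpu_mask;   /* which ID bits are compared */
    const char *name;
};

/* Illustrative entries only; values are assumptions for this sketch. */
const struct proc_info example_table[] = {
    { 0x41069260u, 0xff0ffff0u, "ARM926-class" },
    { 0x0007b000u, 0x0007f000u, "ARMv6-class (ARM11)" },
};

/* Return the first entry whose masked value matches `cpuid`, or NULL. */
const struct proc_info *lookup_processor_type(const struct proc_info *table,
                                              size_t n, uint32_t cpuid)
{
    for (size_t i = 0; i < n; i++) {
        if ((cpuid & table[i].cpu_mask) == table[i].cpu_val)
            return &table[i];
    }
    return NULL;   /* unknown processor: the kernel cannot continue */
}
```

A CPU ID such as 0x410fb020 would select the second entry here, since only the masked part-number bits are compared.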
Processor-independent kernel code:
From this stage on, the boot process of the Linux kernel follows a common
sequence of events, independent of the hardware architecture. Some functions
are still hardware dependent, however, and they override the generic
implementation. We will concentrate mainly on how the SMP part of Linux boots
and how the CPUs in the ARM11 MPCore are initialized.
In start_kernel(): (init/main.c)
- Disable the interrupts on CPU0 using local_irq_disable()
(include/linux/irqflags.h)
- Lock the kernel (take the Big Kernel Lock) using lock_kernel() to prevent the
boot code from being interrupted or preempted by high-priority interrupts
(include/linux/smp_lock.h)
- Activate the first processor (CPU0) using boot_cpu_init() (init/main.c)
- Initialize the kernel tick control using tick_init()
(kernel/time/tick-common.c)
- Initialize the memory subsystem using page_address_init() (mm/highmem.c)
- Display the kernel version on the console using printk(linux_banner)
(init/version.c)
- Set up architecture-specific subsystems such as memory, I/O, processors, etc.
using setup_arch(&command_line). The command_line is the parameter list passed
by U-Boot when calling the kernel. (arch/arm/kernel/setup.c)
  - setup_arch(&command_line) executes architecture-dependent code. For the
  ARM11 MPCore, smp_init_cpus() is called, which initializes the CPU map. It is
  at this stage that the kernel learns that there are 4 cores in the ARM11
  MPCore. (arch/arm/mach-realview/platsmp.c)
  - Initialize one processor (CPU0 in this case) using cpu_init(), which dumps
  the cache information, initializes SMP-specific information, and sets up the
  per-CPU stacks (arch/arm/kernel/setup.c)
- Set up a multiprocessing environment using setup_per_cpu_areas(). This
function determines the size of memory a single CPU requires, then allocates
and initializes the memory for each corresponding CPU (4 CPUs). This way, each
CPU has its own region to place its data. (init/main.c)
- Allow the booting processor (CPU0) to access its own, already initialized
storage data using smp_prepare_boot_cpu() (arch/arm/kernel/smp.c)
- Set up the Linux scheduler using sched_init() (kernel/sched.c)
  - Initialize a runqueue for each of the 4 CPUs with its corresponding data
  (kernel/sched.c)
  - Fork an idle thread for CPU0 using init_idle(current, smp_processor_id())
  (kernel/sched.c)
- Initialize the memory zones such as DMA, normal, and high memory using
build_all_zonelists() (mm/page_alloc.c)
- Parse the arguments passed to the Linux kernel using parse_early_param()
(init/main.c) and parse_args() (kernel/params.c)
- Initialize the interrupt table and GIC and the trap exception vectors using
init_IRQ() (arch/arm/kernel/irq.c) and trap_init() (arch/arm/kernel/traps.c).
Also assign the processor affinity for each interrupt.
- Prepare the boot CPU (CPU0) to accept notifications from tasklets using
softirq_init() (kernel/softirq.c)
- Initialize and run the system timer using time_init()
(arch/arm/kernel/time.c)
- Enable the local interrupts on CPU0 using local_irq_enable()
(include/linux/irqflags.h)
- Initialize the console terminal using console_init() (drivers/char/tty_io.c)
- Find the total number of free pages in all memory zones using mem_init()
(arch/arm/mm/init.c)
- Initialize the slab allocator using kmem_cache_init() (mm/slab.c)
- Determine the speed of the CPU clock in BogoMIPS using calibrate_delay()
(init/calibrate.c)
- Initialize the kernel internal components such as page tables, SLAB caches,
VFS, buffers, signal queues, the maximum number of threads and processes, etc.
- Initialize the /proc filesystem using proc_root_init() (fs/proc/root.c)
- Call rest_init(), which creates Process 1
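The idea behind the per-CPU areas can be sketched in user-space C: one copy of a template data block is made for every CPU, and each CPU then only touches its own copy. The names below (percpu_areas, setup_percpu_areas, percpu_ptr), the 32-byte rounding and the use of malloc() are assumptions for illustration; the kernel uses its own boot-time allocator and linker-defined template section.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 4   /* the ARM11 MPCore described here has 4 cores */

/* Hypothetical bookkeeping for the per-CPU copies. */
struct percpu_areas {
    size_t unit_size;               /* size of one CPU's area */
    unsigned char *base;            /* MODEL_NR_CPUS consecutive areas */
    size_t offset[MODEL_NR_CPUS];   /* offset of each CPU's area */
};

/* Allocate one area per CPU and initialize each from the template. */
int setup_percpu_areas(struct percpu_areas *pa, const void *tmpl, size_t size)
{
    /* Round up so that areas do not share a cache line (32 bytes assumed). */
    size_t unit = (size + 31u) & ~(size_t)31u;

    pa->base = malloc(unit * MODEL_NR_CPUS);
    if (pa->base == NULL)
        return -1;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        pa->offset[cpu] = (size_t)cpu * unit;
        memcpy(pa->base + pa->offset[cpu], tmpl, size);
    }
    pa->unit_size = unit;
    return 0;
}

/* Return the private copy that belongs to `cpu`. */
void *percpu_ptr(const struct percpu_areas *pa, int cpu)
{
    return pa->base + pa->offset[cpu];
}
```

Writing through percpu_ptr(&pa, 1) changes only CPU1's copy; the copies of the other CPUs keep the template value, which is what lets each CPU work on its data without locking.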
In rest_init(): (init/main.c)
- Create the init process, which is also called Process 1, using
kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND)
- Create the kernel thread daemon, which is the parent of all kernel threads and
has PID 2, using pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES)
(kernel/kthread.c)
- Release the kernel lock that was taken at the beginning of start_kernel()
using unlock_kernel() (include/linux/smp_lock.h)
- Call schedule() to start running the scheduler (kernel/sched.c)
- Execute the CPU idle thread on CPU0 using cpu_idle(). This thread yields CPU0
to the scheduler and is returned to when the scheduler has no other pending
process to run on CPU0. The CPU idle thread tries to conserve power and keep
overall latency low (arch/arm/kernel/process.c)
In kernel_init(): (init/main.c) <Process 1>
- Start preparing the SMP environment by calling smp_prepare_cpus()
(arch/arm/mach-realview/platsmp.c)
  - Enable the local timer of the current processor, which is CPU0, using
  local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
  - Move data corresponding to CPU0 to its own storage using
  smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
  - Initialize the present CPU map, which describes the set of CPUs actually
  populated at the present time, using cpu_set(i, cpu_present_map). This informs
  the kernel that there are 4 CPUs.
  - Initialize the Snoop Control Unit using scu_enable()
  (arch/arm/mach-realview/platsmp.c)
  - Call the poke_milo() function, which takes care of booting the secondary
  processors (arch/arm/mach-realview/platsmp.c)
    - poke_milo() triggers the other CPUs to execute the
    realview_secondary_startup procedure by clearing the lower 2 bits of the
    flags through the SYS_FLAGSCLR register and writing the physical address of
    the realview_secondary_startup procedure to SYS_FLAGSSET
    (arch/arm/mach-realview/headsmp.S)
    - In the realview_secondary_startup procedure, the secondary CPUs wait for a
    synchronization signal from the kernel (running on CPU0) indicating that
    they may be initialized. When all the processors are ready, they are
    initialized using the secondary_startup procedure
    (arch/arm/mach-realview/headsmp.S)
    - The secondary_startup procedure performs operations similar to those of
    the stext procedure when CPU0 was booted: (arch/arm/kernel/head.S)
      - Switch to Supervisor mode and disable all the interrupts
      - Look up the processor type using the __lookup_processor_type procedure
      defined in arch/arm/kernel/head-common.S. This returns a pointer to a
      proc_info_list defined in arch/arm/mm/proc-v6.S for the ARM11 MPCore
      - Use the page tables supplied by __cpu_up for each of the CPUs (explained
      later under the cpu_up function)
      - Jump to the __v6_setup procedure in arch/arm/mm/proc-v6.S, which
      initializes the TLB, cache and MMU state of the corresponding secondary
      CPU
      - Enable the MMU using the __enable_mmu procedure, which sets up some
      configuration bits and then calls __turn_mmu_on (arch/arm/kernel/head.S)
      - In __turn_mmu_on, the appropriate control registers are set and then it
      jumps to __secondary_data, which executes the __secondary_switched
      procedure (arch/arm/kernel/head.S)
      - The __secondary_switched procedure jumps to the secondary_start_kernel
      routine in arch/arm/kernel/smp.c after setting the stack pointer to a
      thread structure allocated by the cpu_up function running on CPU0
      (explained later)
      - secondary_start_kernel (arch/arm/kernel/smp.c) is the official start of
      the kernel for the secondary CPUs. It is considered a kernel thread
      running on the corresponding CPU (see previous step). In this thread,
      further initialization is done:
        - Initialize the CPU using cpu_init(), which dumps the cache
        information, initializes SMP-specific information, and sets up the
        per-CPU stacks (arch/arm/kernel/setup.c)
        - Synchronize with the boot thread on CPU0 and enable some interrupts,
        such as the timer IRQ, in the corresponding CPU interface of the
        Distributed Interrupt Controller using platform_secondary_init(cpu)
        (arch/arm/mach-realview/platsmp.c)
        - Enable the local interrupts using local_irq_enable() and
        local_fiq_enable() (include/linux/irqflags.h)
        - Set up the local timer of the corresponding CPU using
        local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
        - Determine the speed of the CPU clock in BogoMIPS using
        calibrate_delay() (init/calibrate.c)
        - Move data corresponding to CPUx to its own storage using
        smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
        - Execute the idle thread (also called process 0) on the corresponding
        secondary CPU using cpu_idle(), which yields CPUx to the scheduler and
        is returned to when the scheduler has no other pending process to run on
        CPUx (arch/arm/kernel/process.c)
- Call smp_init() (init/main.c) <we are on CPU0>
  - Boot every offline CPU, which are CPU1, CPU2 and CPU3, using cpu_up(cpu):
  (arch/arm/kernel/smp.c)
    - Create a new idle process manually using fork_idle(cpu) and assign it to
    the data structure of the corresponding CPU
    - Allocate initial page tables to allow the secondary CPU to enable the MMU
    safely using pgd_alloc()
    - Inform the secondary CPU where to find its stack and page tables
    - Boot the secondary CPU using boot_secondary(cpu, idle):
    (arch/arm/mach-realview/platsmp.c)
      - Synchronize between the boot processor (CPU0) and the secondary
      processor using the locking mechanism spin_lock(&boot_lock);
      - Inform the secondary processor that it can start booting its part of the
      kernel
      - Wake the secondary core up using smp_cross_call(mask_cpu), which sends a
      soft interrupt (include/asm-arm/mach-realview/smp.h)
      - Wait for the secondary core to finish its booting and calibrations,
      which are done in the secondary_start_kernel function (explained before)
    - Repeat this process for every secondary CPU
  - Display the kernel message “SMP: Total of 4 processors activated (334.02
  BogoMIPS)” on the console using smp_cpus_done(max_cpus)
  (arch/arm/kernel/smp.c)
- Call sched_init_smp() (kernel/sched.c)
  - Build the scheduler domains using arch_init_sched_domains(&cpu_online_map),
  which sets the topology of the multicore (kernel/sched.c)
  - Check how many online CPUs exist and adjust the scheduler granularity value
  appropriately using sched_init_granularity() (kernel/sched.c)
- The do_basic_setup() function initializes the driver model using driver_init()
(drivers/base/init.c), the sysctl interface, the network socket interface, and
work queue support using init_workqueues(). Finally, it calls do_initcalls(),
which runs the initialization routines of the built-in device drivers
(init/main.c)
- Call init_post() (init/main.c)
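The handshake between CPU0 in boot_secondary() and a waiting secondary CPU can be modelled as a tiny state machine. The sketch below is an assumption-laden illustration, not the kernel code: the variable name pen_release, the structure and the three step functions are hypothetical, and leaving the holding loop and acknowledging the release are merged into one step for brevity. On real hardware the two sides run on different CPUs and need memory barriers and cache maintenance, which the model omits.

```c
#include <stdbool.h>

/* Hypothetical shared state: -1 means no CPU is currently being released. */
struct pen {
    int pen_release;
};

/* CPU0 side: allow exactly one secondary CPU to leave the holding loop. */
void pen_open(struct pen *p, int cpu)
{
    p->pen_release = cpu;
}

/*
 * Secondary side: one pass of the holding loop on CPU `cpu`.
 * Returns true when this CPU has been released; it then acknowledges
 * by resetting the variable so that CPU0 can see it has started.
 */
bool pen_try_leave(struct pen *p, int cpu)
{
    if (p->pen_release != cpu)
        return false;        /* not our turn: keep waiting */
    p->pen_release = -1;     /* acknowledge to CPU0 */
    return true;
}

/* CPU0 side: has the released CPU acknowledged yet? */
bool pen_acknowledged(const struct pen *p)
{
    return p->pen_release == -1;
}
```

CPU0 would call pen_open(), send the soft interrupt, and then poll pen_acknowledged() with a timeout before moving on to the next secondary CPU.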
In init_post() (init/main.c):
This is where we switch to user mode by trying the following programs in
sequence; the first one that can be executed becomes the init process:
run_init_process("/sbin/init");
run_init_process("/etc/init");
run_init_process("/bin/init");
run_init_process("/bin/sh");
The /sbin/init process executes and displays many messages on the console, and
finally it transfers control to the console and stays alive.
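The fallback behaviour of this sequence can be sketched as a loop over candidate paths. In the kernel, run_init_process() only returns when the program could not be executed, so the next candidate is tried. The function first_runnable_init, the exec_fn callback type and the fake_exec stand-in below are hypothetical names introduced for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Callback that tries to start `path`; returns 0 on success. */
typedef int (*exec_fn)(const char *path);

/* Try each candidate in order and return the first one that starts. */
const char *first_runnable_init(const char *const *paths, size_t n,
                                exec_fn try_exec)
{
    for (size_t i = 0; i < n; i++) {
        if (try_exec(paths[i]) == 0)
            return paths[i];
    }
    return NULL;   /* nothing could be started: the kernel would panic */
}

/* Stand-in for a real exec: pretends that only /bin/sh exists. */
int fake_exec(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}
```

With the four paths listed above and fake_exec as the callback, the loop skips the three missing init programs and ends up with /bin/sh.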
SMP Linux on a Minimal Dual-Core Arm Cortex-A15 System
The Hardware Design
- The CPU must be updated to the dual-core Arm Cortex-A15 processor.
- An extra memory must be added, which is used to communicate the starting address to the secondary core.
- For SMP Linux there is a register (offset 0x30) which is used to pass the jump address to the secondary CPU.
- For the minimal system, provide a simple memory at the base address of the System Registers, which is system address 0x1c010000.
Only offsets 0x30 and 0x34 are used, and the values must be initialized to 0 because the secondary code waits for a non-zero value.
When the secondary CPU sees a non-zero value, it jumps to the address contained at 0x1c010030.
If this address is not 0 at startup, the system will not boot properly.
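The role of the two offsets can be shown with a small model in which an ordinary array stands in for the memory at 0x1c010000. The function names and macro names are hypothetical; the write order (0xffffffff to offset 0x34, then the jump address to offset 0x30) follows the behaviour of the primary CPU described in this article.

```c
#include <stdint.h>

#define JUMP_ADDR_OFFSET 0x30u   /* secondary CPU reads its jump address here */
#define CLEAR_OFFSET     0x34u   /* primary CPU writes 0xffffffff here first */

/* Secondary side: one poll of the jump address; 0 means keep waiting. */
uint32_t secondary_poll(const volatile uint32_t *sysregs)
{
    return sysregs[JUMP_ADDR_OFFSET / 4];
}

/* Primary side: publish the address the secondary CPU should jump to. */
void release_secondary(volatile uint32_t *sysregs, uint32_t jump_addr)
{
    sysregs[CLEAR_OFFSET / 4] = 0xffffffffu;
    sysregs[JUMP_ADDR_OFFSET / 4] = jump_addr;
}
```

Because the secondary CPU treats any non-zero value as a valid jump address, the memory has to read as 0 until release_secondary() has run, which is why the initial contents matter.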
Linux Changes
- Enable SMP in the kernel config
- Rebuild the kernel, adjusting CROSS_COMPILE to match the prefix of your ARM cross compiler
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j 4
Starting the Secondary CPU
After reset, both CPUs start running the code in boot.S, which is located at 0x80000000.
- The code determines whether it is running on CPU0 or CPU1. This is done by reading the CPU ID register located in co-processor 15 (CP15).
- The secondary core waits in the boot loader for a jump address to be provided at address 0x1c010030.
- The primary CPU, which is running Linux, is responsible for releasing the secondary CPU by writing the jump address and sending an interrupt. The well-commented last line in the screenshot below (arch/arm/mach-vexpress/platsmp.c) gives the details:
- Check
  - Messages in the boot log should indicate that 2 CPUs are running.
  - /proc/cpuinfo
The CPU ID register is also referred to as the Multiprocessor Affinity Register (MPIDR).
It provides information about which core of an MPCore processor the code is running on, and which cluster of a multi-cluster system the code is running on.
In this example, we have a single cluster and two cores, so the code simply identifies CPU ID 0 as the primary core and CPU ID 1 as the secondary. The primary core finishes the boot loader and immediately starts running Linux.
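Decoding the register value is a matter of extracting two bit fields. The helper names below are hypothetical, and the sketch takes the register value as a parameter so that it can run anywhere; on the target, the value would be read with the CP15 instruction `mrc p15, 0, <Rt>, c0, c0, 5`. The field positions follow the ARMv7 affinity layout (affinity level 0 in bits [7:0], affinity level 1 in bits [15:8]).

```c
#include <stdint.h>

/* Core number within its cluster: affinity level 0, bits [7:0]. */
unsigned mpidr_cpu_id(uint32_t mpidr)
{
    return mpidr & 0xffu;
}

/* Cluster number: affinity level 1, bits [15:8]. */
unsigned mpidr_cluster_id(uint32_t mpidr)
{
    return (mpidr >> 8) & 0xffu;
}

/* In a single-cluster system, CPU ID 0 is treated as the primary core. */
int is_primary_core(uint32_t mpidr)
{
    return mpidr_cpu_id(mpidr) == 0;
}
```

For a value of 0x80000001, for instance, the helpers report core 1 in cluster 0, so that core takes the secondary path and waits for its jump address.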
A screenshot of the memory contents for the System Registers address range shows that
the primary CPU has written 0xffffffff into address 0x1c010034 and then the 32-bit jump address into 0x1c010030.
The virtual address 0xf8010000 was entered into the memory viewer window.