Booting ARM Linux SMP on MPCore



It is important to understand what happens from the moment the power button is switched on until the command shell appears with all 4 CPU cores running. The boot process of an embedded Linux kernel differs from that of a PC, mainly because the environment settings and the available hardware change from one platform to another. For example, an embedded system typically has no hard disk or PC BIOS, but includes a boot monitor and flash memories. The main difference between each architecture’s boot process therefore lies in the application used to find and load the kernel. Once the kernel is in memory, the same sequence of events occurs on all CPU architectures, with some functions overridden by architecture-specific implementations.

The Linux boot process can be represented in 3 stages as shown in Figure 1:

Figure 1 Linux boot process
When the system is powered on, the Boot Monitor code executes from a predefined address in the NOR flash memory (0x00000000). The Boot Monitor initializes the PB11MPCore hardware peripherals and then launches the real bootloader, U-Boot, if an automatic script is provided; otherwise, the user runs U-Boot manually by entering the appropriate command in the Boot Monitor command shell. U-Boot initializes the main memory and copies the compressed Linux kernel image (uImage), which is located on the on-board NOR flash memory, MMC, CompactFlash or a host PC, to the main memory to be executed by the ARM11 MPCore, after passing some initialization parameters to the kernel. Then the Linux kernel image decompresses itself, starts initializing its data structures, creates some user processes, boots all the CPU cores and finally runs the command shell environment in user space.

This was a brief introduction to the whole boot process. In the next sections, we explain each stage in detail and highlight the Linux source code that executes the corresponding stage.


a)   System startup (Boot Monitor)


When the system is powered on or reset, all CPUs of the ARM11 MPCore load the reset vector address into their PC register and fetch their first instruction from it. In our case, it is the first address in the NOR flash memory (0x00000000), where the Boot Monitor program resides. Only CPU0 continues to execute the Boot Monitor code; the secondary CPUs (CPU1, CPU2, and CPU3) execute a WFI instruction inside a loop that checks the value of the SYS_FLAGS register. The secondary CPUs start executing meaningful code during the Linux kernel boot process, which is explained in detail later in the ARM Linux section.
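The waiting loop of a secondary CPU can be sketched in C as follows. This is only an illustration, not the Boot Monitor code: the function name `wait_for_jump_address` and the `max_polls` limit are hypothetical, and the sketch assumes that a non-zero register value means "released". Real firmware would loop without a limit and sleep with WFI between reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: poll a flags register until it holds a non-zero
 * jump address, or until max_polls reads have been made.
 * Returns the address read, or 0 if the register stayed clear. */
uint32_t wait_for_jump_address(const volatile uint32_t *sys_flags,
                               unsigned max_polls)
{
    assert(sys_flags != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t addr = *sys_flags;   /* volatile read, never cached */
        if (addr != 0)
            return addr;              /* released: caller would jump here */
        /* a real secondary CPU would execute WFI at this point */
    }
    return 0;
}
```

The `volatile` qualifier matters here: without it the compiler could hoist the read out of the loop and never observe the write made by CPU0.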


The Boot Monitor is the standard ARM application that runs when the system is booted and is built with the ARM platform library.

On reset, the Boot Monitor performs the following actions:
  • Execute the main code on CPU0 and the WFI instruction on the secondary CPUs
  • Initialize the memory controllers and configure the main board peripherals
  • Set up a stack in memory
  • Copy itself to the main memory DRAM
  • Reset the boot memory remapping
  • Remap and redirect the C library I/O routines depending on the settings of the switches on the front panel of the PB11MPCore (output: UART0 or LCD – input: UART0 or keyboard)
  • Run a bootscript automatically, if it exists in the NOR flash memory and the corresponding switch on the front panel of the PB11MPCore is ON. Otherwise, the Boot Monitor command shell prompt is displayed

So basically, the Boot Monitor application shipped with the board is similar to the BIOS in a PC. It has limited functionality and cannot boot a Linux kernel image, so another bootloader, U-Boot, is needed to complete the boot process. The U-Boot code is cross-compiled for the ARM platform and flashed to the NOR flash memory. The final step is to launch the U-Boot image from the Boot Monitor command line, either using a script or manually by entering the appropriate command.






b)  Bootloader (U-Boot)


When the bootloader is called by the Boot Monitor, it is located in the NOR flash memory without access to system RAM, because the memory controller is not initialized in the way U-Boot expects. So how does U-Boot move itself from the flash memory to the main memory?

In order to get the C environment working properly and run the initialization code, U-Boot needs to allocate a minimal stack. On the ARM11 MPCore, this is done in a locked part of the L1 data cache memory. In this way, the cache memory is used as temporary data storage to initialize U-Boot before the SDRAM controller is set up. Then, U-Boot initializes the ARM11 MPCore, its caches and the SCU. Next, all available memory banks are mapped using a preliminary mapping, and a simple memory test is run to determine the size of the SDRAM banks. Finally, the bootloader installs itself at the upper end of the SDRAM area and allocates memory for use by malloc() and for the global board info data. The exception vector code is copied to low memory. Now, the final stack is set up.
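The way U-Boot carves up the top of SDRAM can be sketched as simple address arithmetic. The structure, the function name `plan_relocation`, the region order and the alignment values below are assumptions made for illustration; they are not taken from the U-Boot source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only; names and alignments are assumptions. */
struct reloc_layout {
    uint32_t uboot_base;   /* relocated U-Boot image */
    uint32_t malloc_base;  /* heap used by malloc() */
    uint32_t bdinfo_base;  /* global board info data */
    uint32_t stack_top;    /* final stack grows down from here */
};

static uint32_t align_down(uint32_t addr, uint32_t align)
{
    return addr & ~(align - 1u);   /* align must be a power of two */
}

/* Work downwards from the end of RAM: image, then heap, then board info. */
struct reloc_layout plan_relocation(uint32_t ram_base, uint32_t ram_size,
                                    uint32_t image_size, uint32_t malloc_size,
                                    uint32_t bdinfo_size)
{
    struct reloc_layout l;
    uint32_t top = ram_base + ram_size;

    assert(image_size + malloc_size + bdinfo_size < ram_size);
    l.uboot_base  = align_down(top - image_size, 4096u);
    l.malloc_base = l.uboot_base - malloc_size;
    l.bdinfo_base = align_down(l.malloc_base - bdinfo_size, 8u);
    l.stack_top   = l.bdinfo_base;
    return l;
}
```

Placing everything at the top of RAM keeps the low addresses free for the kernel image that is loaded next.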

At this stage, the second-stage bootloader U-Boot is in the main memory and a C environment is set up. The bootloader is ready to launch the Linux kernel image from a pre-specified location after passing some boot parameters to it. In addition, it initializes a serial or video console for the kernel. Finally, it calls the kernel image by jumping directly to the ‘start’ label in the arch/arm/boot/compressed/head.S assembly file, which is the start header of the Linux kernel decompressor.

The bootloader can perform many functions; however, a minimal set of requirements must always be met:

-          Configure the system’s main memory:
The Linux kernel has no knowledge of the setup or configuration of the RAM within a system. It is the task of the bootloader to find and initialize, in a machine dependent manner, the entire RAM that the kernel will use for volatile data storage, and then to pass the physical memory layout to the kernel using the ATAG_MEM parameter, which will be explained later.

-          Load the kernel image at the correct memory address:
The ‘uImage’ encapsulates a compressed Linux kernel image with header information that is marked by a special magic number and a data portion. Both the header and data are secured against corruption by a CRC32 checksum. In the data field, the start and end offsets of the image are stored. They are used to determine the length of the compressed image in order to know how much memory must be allocated. The uImage is expected to be loaded at address 0x7fc0 in the main memory; with the 64-byte uImage header in front, the kernel image itself then starts at 0x8000.
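A minimal sketch of the checks described above is shown below. It assumes the legacy U-Boot image layout as commonly documented (64-byte header, big-endian fields, magic 0x27051956, data size at byte offset 12 and data CRC at byte offset 24). The function names are hypothetical, and the header CRC check is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_MAGIC       0x27051956u  /* legacy U-Boot image magic */
#define UIMAGE_HEADER_SIZE 64u          /* bytes in front of the payload */
#define UIMAGE_OFF_SIZE    12u          /* assumed offset of data size */
#define UIMAGE_OFF_DCRC    24u          /* assumed offset of data CRC */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but short. */
uint32_t crc32_simple(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Header fields are stored big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the buffer is long enough and starts with the magic. */
int uimage_has_magic(const uint8_t *image, size_t len)
{
    assert(image != NULL);
    return len >= UIMAGE_HEADER_SIZE && be32(image) == UIMAGE_MAGIC;
}

/* Returns 1 if the payload length fits and its CRC-32 matches the header. */
int uimage_data_ok(const uint8_t *image, size_t len)
{
    uint32_t size;
    if (!uimage_has_magic(image, len))
        return 0;
    size = be32(image + UIMAGE_OFF_SIZE);
    if (size > len - UIMAGE_HEADER_SIZE)
        return 0;
    return crc32_simple(image + UIMAGE_HEADER_SIZE, size) ==
           be32(image + UIMAGE_OFF_DCRC);
}

/* Address of the kernel payload when the uImage is loaded at load_addr. */
uint32_t uimage_payload_addr(uint32_t load_addr)
{
    return load_addr + UIMAGE_HEADER_SIZE;
}
```

With a load address of 0x7fc0, `uimage_payload_addr` yields 0x8000, which is where the header arithmetic places the start of the kernel image.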

-          Initialize a console:
Since a serial console is essential on all the platforms in order to allow communication with the target and early kernel debugging facilities, the bootloader should initialize and enable one serial port on the target. Then it passes the relevant console parameter option to the kernel in order to inform it of the already enabled port.

-          Initialize the boot parameters to pass to the kernel:
The bootloader must pass parameters to the kernel in the form of tags, to describe the setup it has performed, the size and shape of memory in the system and, optionally, numerous other values as described in Table 1:

Table 1 Linux kernel parameter list

Tag name          Description
ATAG_NONE         Empty tag used to end list
ATAG_CORE         First tag used to start list
ATAG_MEM          Describes a physical area of memory
ATAG_VIDEOTEXT    Describes a VGA text display
ATAG_RAMDISK      Describes how the ramdisk will be used in kernel
ATAG_INITRD2      Describes where the compressed ramdisk image is placed in memory
ATAG_SERIAL       64 bit board serial number
ATAG_REVISION     32 bit board revision number
ATAG_VIDEOLFB     Initial values for vesafb-type framebuffers
ATAG_CMDLINE      Command line to pass to kernel
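As an illustration of the tag format, the sketch below builds the smallest useful list (ATAG_CORE, one ATAG_MEM, ATAG_NONE) in a word buffer and then walks it. The tag identifiers and the "size in 32-bit words, then tag id" header follow the ARM Linux tagged-list convention as I understand it; the function names `build_atags` and `atags_total_memory` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

/* Write CORE, one MEM bank and NONE into buf. Each tag starts with its
 * size in 32-bit words, followed by its tag id. Returns words written. */
size_t build_atags(uint32_t *buf, size_t cap_words,
                   uint32_t mem_size, uint32_t mem_start)
{
    size_t n = 0;
    assert(buf != NULL && cap_words >= 8);
    buf[n++] = 2;            /* ATAG_CORE without payload: header only */
    buf[n++] = ATAG_CORE;
    buf[n++] = 4;            /* header (2 words) + size + start */
    buf[n++] = ATAG_MEM;
    buf[n++] = mem_size;
    buf[n++] = mem_start;
    buf[n++] = 0;            /* ATAG_NONE is written with size 0 */
    buf[n++] = ATAG_NONE;
    return n;
}

/* Walk the list and add up the memory described by all ATAG_MEM tags. */
uint32_t atags_total_memory(const uint32_t *tag)
{
    uint32_t total = 0;
    if (tag[1] != ATAG_CORE)
        return 0;                /* a valid list starts with ATAG_CORE */
    while (tag[0] != 0) {        /* size 0 marks the terminating tag */
        if (tag[1] == ATAG_MEM)
            total += tag[2];
        tag += tag[0];           /* advance by the tag size in words */
    }
    return total;
}
```

The physical address of `buf` is what the bootloader would place in r2 before entering the kernel.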


-          Obtain the ARM Linux machine type:
The bootloader should provide the machine type of the ARM system, which is a simple unique number that identifies the platform. It can be hard coded in the source code, since it is pre-defined, or read from some board register. The machine type number can be obtained from the ARM-Linux project website.

-          Enter the kernel with the appropriate register values:

Finally, and before starting execution of the Linux kernel image, the ARM11 MPCore registers must be set in an appropriate way:
  • Supervisor (SVC) mode
  • IRQ and FIQ interrupts disabled
  • MMU off (no translation of memory addresses is required)
  • Data cache off
  • Instruction cache may be either on or off
  • CPU register0 = 0
  • CPU register1 = ARM Linux machine type
  • CPU register2 = physical address of the parameter list
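The entry conditions listed above can be expressed as a single predicate over a snapshot of the CPU state. This is a sketch for illustration: the `entry_state` structure and the function name are hypothetical, while the bit positions (CPSR mode field, I and F bits, SCTLR M and C bits) follow the ARM architecture definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_MODE_SVC  0x13u       /* Supervisor mode */
#define CPSR_F_BIT     (1u << 6)   /* FIQ masked when set */
#define CPSR_I_BIT     (1u << 7)   /* IRQ masked when set */
#define SCTLR_M_BIT    (1u << 0)   /* MMU enable */
#define SCTLR_C_BIT    (1u << 2)   /* data cache enable */

/* Hypothetical snapshot of the state just before jumping to the kernel. */
struct entry_state {
    uint32_t cpsr;
    uint32_t sctlr;
    uint32_t r0, r1, r2;
};

/* Returns 1 if the state matches the kernel entry requirements.
 * The instruction cache bit is not checked: it may be on or off. */
int kernel_entry_state_ok(const struct entry_state *s, uint32_t machine_type)
{
    assert(s != NULL);
    return (s->cpsr & CPSR_MODE_MASK) == CPSR_MODE_SVC
        && (s->cpsr & CPSR_I_BIT) != 0
        && (s->cpsr & CPSR_F_BIT) != 0
        && (s->sctlr & SCTLR_M_BIT) == 0
        && (s->sctlr & SCTLR_C_BIT) == 0
        && s->r0 == 0
        && s->r1 == machine_type
        && s->r2 != 0;             /* must point at the tagged list */
}
```

A bootloader would not normally run such a check at run time; it is shown only to make the register and control-bit requirements concrete.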

c)   ARM Linux


As mentioned earlier, the bootloader jumps to the compressed kernel image code and passes some initialization parameters in the form of ATAGs. The beginning of the compressed Linux kernel image is the ‘start’ label in the arch/arm/boot/compressed/head.S assembly file. From this point, the boot process comprises 3 main stages. First, the kernel decompresses itself. Then, the processor-dependent (ARM11 MPCore) kernel code executes, which initializes the CPU and memory. Finally, the processor-independent kernel code executes, which starts up the ARM Linux SMP kernel by booting all the ARM11 cores and initializing all the kernel components and data structures.

The flowchart in Figure 2 summarizes the boot process of the ARM Linux kernel:


Figure 2 ARM Linux kernel boot
In the Linux SMP environment, CPU0 is responsible for initializing all resources just as in a uniprocessor environment. Once configured, access to a resource is tightly controlled using synchronization rules such as a spinlock. CPU0 will configure the boot page translation so secondary cores boot from a dedicated section of Linux rather than the default reset vector. When secondary cores boot the same Linux image, they will enter Linux at a specific location so they simply initialize resources specific only to their core (caches, MMU) and don’t reinitialize resources that have already been configured, and then execute the idle process with PID 0.

A step-by-step walkthrough of the Linux kernel boot process for ARM-based systems, specifically the ARM11 MPCore, is provided below, highlighting the kernel source code that executes each step. The boot process comprises 3 main stages:

Image decompression:

Ø  U-Boot jumps at the ‘start’ label in arch/arm/boot/compressed/head.S
Ø  The parameters passed by U-Boot in r1 (machine architecture ID) and r2 (ATAG parameter list pointer) are saved
Ø  Execute architecture specific code, then turn off the cache and MMU
Ø  Set up the C environment properly
Ø  Assign the appropriate values to the registers and stack pointer, i.e. r4 = kernel physical start address, sp = decompressor code
Ø  Turn on the cache memory again by calling the cache_on procedure, which walks through the proc_types list to find the corresponding ARM architecture. For the ARM11 MPCore (ARM v6), the __armv4_mmu_cache_on, __armv4_mmu_cache_off, and __armv6_mmu_cache_flush procedures are called to turn on, turn off, and flush the cache memory to RAM respectively
Ø  Check if the decompressed image will overwrite the compressed image and jump to the appropriate routine
Ø  Call the decompressor routine decompress_kernel(), which is located in arch/arm/boot/compressed/misc.c. decompress_kernel() displays the “Uncompressing Linux...” message on the output terminal, calls the gunzip() function, and then displays the “ done, booting the kernel” message.
Ø  Flush the cache memory contents to RAM using __armv6_mmu_cache_flush
Ø  Turn off the cache using __armv4_mmu_cache_off, because the kernel initialization routines expect the cache memory to be off at the beginning
Ø  Jump to the start of the kernel in RAM, whose address is stored in the r4 register. The kernel start address is specific to each platform architecture. For the PB11MPCore, it is stored in arch/arm/mach-realview/Makefile.boot in the zreladdr-y variable
(zreladdr-y := 0x00008000)
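The overwrite check in the list above boils down to a test for overlapping address ranges. The sketch below shows that test in C under the assumption that both regions are described by a start address and a size; the function name is hypothetical, and the real decompressor performs the comparison in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if [a_start, a_start + a_size) and [b_start, b_start + b_size)
 * share at least one byte. 64-bit end addresses avoid wrap-around at 4 GiB. */
int ranges_overlap(uint32_t a_start, uint32_t a_size,
                   uint32_t b_start, uint32_t b_size)
{
    uint64_t a_end, b_end;

    assert(a_size > 0 && b_size > 0);
    a_end = (uint64_t)a_start + a_size;
    b_end = (uint64_t)b_start + b_size;
    return a_start < b_end && b_start < a_end;
}
```

For example, a 4 MiB decompressed image placed at 0x00008000 overlaps a compressed image sitting at 0x00200000, so in that case the decompressor would have to take the path that avoids overwriting itself.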


Processor dependent (ARM) specific kernel code:


The kernel startup entry point is in stext procedure in arch/arm/kernel/head.S file, where the decompressor has jumped after turning off the MMU and cache memory and setting the appropriate registers. At this stage, the following sequence of events is done in stext: (arch/arm/kernel/head.S)
Ø  Ensure that the CPU runs in Supervisor mode and disable all the interrupts
Ø  Look up the processor type using the __lookup_processor_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a proc_info_list defined in arch/arm/mm/proc-v6.S for the ARM11 MPCore
Ø  Look up the machine type using the __lookup_machine_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a machine_desc struct defined for the PB11MPCore
Ø  Create the page table using the __create_page_tables procedure, which sets up the barest amount of page tables required to get the kernel running; in other words, enough to map in the kernel code
Ø  Jump to the __v6_setup procedure in arch/arm/mm/proc-v6.S, which initializes the TLB, cache and MMU state of CPU0
Ø  Enable the MMU using the __enable_mmu procedure, which sets up some configuration bits and then calls __turn_mmu_on (arch/arm/kernel/head.S)
Ø  In __turn_mmu_on, the appropriate control registers are set and then it jumps through __switch_data, whose first entry is the __mmap_switched procedure (arch/arm/kernel/head-common.S)
Ø  In the __mmap_switched procedure, the data segment is copied to RAM and the BSS segment is cleared. Finally, it jumps to the start_kernel() routine in init/main.c, where the Linux kernel proper starts
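The processor type lookup in the list above is essentially a masked comparison of the CPU ID register against a table of known processors. The sketch below shows the idea in C. The structure is a simplified stand-in for proc_info_list, and any values or masks used with it are illustrative assumptions rather than values copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one proc_info_list entry. */
struct proc_info {
    uint32_t    cpu_val;    /* expected ID bits */
    uint32_t    cpu_mask;   /* which ID bits are compared */
    const char *name;
};

/* Return the first entry whose masked value matches cpuid, or NULL
 * if the processor is unknown (the kernel would stop booting here). */
const struct proc_info *lookup_processor(const struct proc_info *table,
                                         size_t count, uint32_t cpuid)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (((cpuid ^ table[i].cpu_val) & table[i].cpu_mask) == 0)
            return &table[i];
    }
    return NULL;
}
```

The machine type lookup works in a similar way, except that the key is the machine number passed by the bootloader in r1 rather than the CPU ID register.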


 Processor independent kernel code

From this stage on, the boot process follows a sequence of events that is common to all hardware architectures. Some functions are still hardware dependent, however, and they override the architecture-independent implementation. We will concentrate mainly on how the SMP part of Linux boots and how the CPUs in the ARM11 MPCore are initialized.

In start_kernel(): (init/main.c)
Ø  Disable the interrupts on CPU0 using local_irq_disable() (include/linux/irqflags.h)
Ø  Lock the kernel using lock_kernel() to prevent it from being interrupted or preempted by high priority interrupts (include/linux/smp_lock.h)
Ø  Activate the first processor (CPU0) using boot_cpu_init() (init/main.c)
Ø  Initialize the kernel tick control using tick_init() (kernel/time/tick-common.c)
Ø  Initialize the memory subsystem using page_address_init() (mm/highmem.c)
Ø  Display the kernel version on the console using printk(linux_banner)  (init/version.c)
Ø  Set up architecture specific subsystems such as memory, I/O, processors, etc. using setup_arch(&command_line). The command_line is the parameter list passed by U-Boot when calling the kernel. (arch/arm/kernel/setup.c)
o   setup_arch(&command_line) executes architecture dependent code. For the ARM11 MPCore, smp_init_cpus() is called, which initializes the CPU map. It is at this stage that the kernel learns that there are 4 cores in the ARM11 MPCore. (arch/arm/mach-realview/platsmp.c)
o   Initialize one processor (CPU0 in this case) using cpu_init(), which dumps the cache information, initializes SMP specific information, and sets up the per-cpu stacks (arch/arm/kernel/setup.c)
Ø  Set up a multiprocessing environment using setup_per_cpu_areas(). This function determines the size of memory a single CPU requires, then allocates and initializes the memory for each CPU (4 CPUs). This way, each CPU has its own region in which to place its data. (init/main.c)
Ø  Allow the booting processor (CPU0) to access its own, already initialized, storage data using smp_prepare_boot_cpu() (arch/arm/kernel/smp.c)
Ø  Set up the Linux scheduler using sched_init() (kernel/sched.c)
o   Initialize a runqueue for each of the 4 CPUs with its corresponding data (kernel/sched.c)
o   Fork an idle thread for CPU0 using init_idle(current, smp_processor_id()) (kernel/sched.c)
Ø  Initialize the memory zones such as DMA, normal, high memory using build_all_zonelists() (mm/page_alloc.c)
Ø  Parse the arguments passed to Linux kernel using parse_early_param() (init/main.c) and parse_args() (kernel/params.c)
Ø  Initialize the interrupt table and GIC and trap exception vectors using init_IRQ() (arch/arm/kernel/irq.c) and trap_init() (arch/arm/kernel/traps.c). Also assign the processor affinity for each interrupt.
Ø  Prepare the boot CPU (CPU0) to accept notifications from tasklets using softirq_init() (kernel/softirq.c)
Ø  Initialize and run the system timer using time_init() (arch/arm/kernel/time.c)
Ø  Enable the local interrupts on CPU0 using local_irq_enable() (include/linux/irqflags.h)
Ø  Initialize the console terminal using console_init() (drivers/char/tty_io.c)
Ø  Find the total number of free pages in all memory zones using mem_init() (arch/arm/mm/init.c)
Ø  Initialize the slab allocation using kmem_cache_init() (mm/slab.c)
Ø  Determine the speed of the CPU clock in BogoMips using calibrate_delay() (init/calibrate.c)
Ø  Initialize the kernel internal components such as page tables, SLAB caches, VFS, buffers, signals queues, max number of threads and processes, etc…
Ø  Initialize the proc/ filesystem using proc_root_init() (fs/proc/root.c)
Ø  Call rest_init() which will create Process 1
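The idea behind setup_per_cpu_areas() in the list above can be sketched in user-space C: one private copy of a template data block is allocated per CPU so that each core works on its own data. This is an analogy only. The structure, the function names and the use of malloc() are assumptions for illustration; the kernel uses its own boot-time allocator and linker-defined per-CPU sections.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4   /* the ARM11 MPCore discussed here has 4 cores */

/* One private data area per CPU; the array index is the CPU number. */
struct percpu_areas {
    void  *area[NR_CPUS];
    size_t size;
};

void free_percpu_areas(struct percpu_areas *p)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        free(p->area[cpu]);
        p->area[cpu] = NULL;
    }
}

/* Give every CPU its own copy of the template. Returns 0 on success,
 * -1 if an allocation failed (all areas are released in that case). */
int setup_percpu_areas(struct percpu_areas *p, const void *tmpl, size_t size)
{
    assert(p != NULL && tmpl != NULL && size > 0);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        p->area[cpu] = NULL;
    p->size = size;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        p->area[cpu] = malloc(size);
        if (p->area[cpu] == NULL) {
            free_percpu_areas(p);
            return -1;
        }
        memcpy(p->area[cpu], tmpl, size);
    }
    return 0;
}
```

Because each CPU touches only its own copy, such data needs no locking, which is the main reason per-CPU areas exist.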

In rest_init(): (init/main.c)
Ø  Create the init process, which is also called Process 1, using kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND)
Ø  Create the kernel thread daemon, which is the parent of all kernel threads and has PID  2, using pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES) (kernel/kthread.c)
Ø  Release the kernel lock that was taken at the beginning of start_kernel() using unlock_kernel() (include/linux/smp_lock.h)
Ø  Call schedule() to start running the scheduler (kernel/sched.c)
Ø  Execute the CPU idle thread on CPU0 using cpu_idle(). This thread yields CPU0 to the scheduler and is returned to when the scheduler has no other pending process to run on CPU0. CPU idle thread tries to conserve power and keep overall latency low (arch/arm/kernel/process.c)

In kernel_init(): (init/main.c) <Process 1>
Ø  Start preparing the SMP environment by calling smp_prepare_cpus() (arch/arm/mach-realview/platsmp.c)
o   Enable the local timer of the current processor which is CPU0, using local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
o   Move data corresponding to CPU0 to its own storage using smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
o   Initialize the present CPU map which describes the set of CPUs actually populated at the present time using cpu_set(i, cpu_present_map). This will inform the kernel that there are 4 CPUs.
o   Initialize the Snoop Control Unit using scu_enable() (arch/arm/mach-realview/platsmp.c)
o   Call poke_milo() function which will take care of booting the secondary processors (arch/arm/mach-realview/platsmp.c)
§  poke_milo() triggers the other CPUs to execute the realview_secondary_startup procedure by clearing the lower 2 bits of SYS_FLAGS through the SYS_FLAGSCLR register and writing the physical address of the realview_secondary_startup procedure to SYS_FLAGSSET (arch/arm/mach-realview/headsmp.S)
§  In the realview_secondary_startup procedure, the secondary CPUs wait for a synchronization signal from the kernel (running on CPU0) which tells them that they may be initialized. Once a processor is released, it is initialized using the secondary_startup procedure (arch/arm/mach-realview/headsmp.S)
§  The secondary_startup procedure performs operations similar to those of the stext procedure when CPU0 was booted: (arch/arm/kernel/head.S)
·         Switch to Supervisor protected mode and disable all the interrupts
·         Look up the processor type using the __lookup_processor_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a proc_info_list defined in arch/arm/mm/proc-v6.S for the ARM11 MPCore
·         Use the page tables supplied from  __cpu_up for each of the CPUs (to be explained later in cpu_up function)
·         Jump to __v6_setup procedure in arch/arm/mm/proc-v6.S, which will initialize the TLB, cache and MMU state of the corresponding secondary CPU
·         Enable the MMU using __enable_mmu procedure, which will setup some configuration bits and then call __turn_mmu_on (arch/arm/kernel/head.S)
·         In __turn_mmu_on, the appropriate control registers are set and then it jumps to __secondary_data which will execute __secondary_switched procedure (arch/arm/kernel/head.S)
·         In __secondary_switched procedure, it jumps to secondary_start_kernel routine in arch/arm/kernel/smp.c source code after setting the stack pointer to a thread structure allocated via cpu_up function that is running on CPU0. (to be explained later)
·         secondary_start_kernel (arch/arm/kernel/smp.c) is the official start of the kernel for the secondary CPUs. It is considered as a kernel thread which is running on the corresponding CPU (see previous step). In this thread, further initialization is done such as: 
o   Initialize the CPU using cpu_init() which dumps the cache information, initializes SMP specific information, and sets up the per-cpu stacks (arch/arm/kernel/setup.c)
o   Synchronize with the boot thread in CPU0 and enable some interrupts such as timer irq in the corresponding CPU interface of the Distributed Interrupt Controller using platform_secondary_init(cpu) function (arch/arm/mach-realview/platsmp.c)
o   Enable the local interrupts using local_irq_enable() and local_fiq_enable() (include/linux/irqflags.h)
o   Setup the local timer of the corresponding CPU using local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
o   Determine the speed of the CPU clock in BogoMips using calibrate_delay() (init/calibrate.c)
o   Move data corresponding to CPUx to its own storage using smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
o   Execute the idle thread (also known as process 0) on the corresponding secondary CPU using cpu_idle(). This thread yields CPUx to the scheduler and is returned to when the scheduler has no other pending process to run on CPUx (arch/arm/kernel/process.c)


Ø  Call smp_init() (init/main.c) <we are on CPU0>
§  Boot every offline CPU (CPU1, CPU2 and CPU3) using cpu_up(cpu): (arch/arm/kernel/smp.c)
·         Create a new idle process manually using fork_idle(cpu) and assign it to the data structure of the corresponding CPU
·         Allocate initial page tables to allow the secondary CPU to enable the MMU safely using pgd_alloc()
·         Inform the secondary CPU where to find its stack and page tables
·         Boot the secondary CPU using boot_secondary(cpu,idle): (arch/arm/mach-realview/platsmp.c)
o   Synchronize between the boot processor (CPU0) and the secondary processor using locking mechanism spin_lock(&boot_lock);
o   Inform the secondary processor that it can start booting its part of the kernel
o   Wake the secondary core up using smp_cross_call(mask_cpu), which will send a soft interrupt (include/asm-arm/mach-realview/smp.h)
o   Wait for the secondary core to finish its booting and calibrations that are done using secondary_start_kernel function (explained before)
·         Repeat this process for every secondary CPU
§  Display the kernel message “SMP: Total of 4 processors activated (334.02 BogoMIPS)” on the console, using smp_cpus_done(max_cpus) (arch/arm/kernel/smp.c)

Ø  Call sched_init_smp() (kernel/sched.c)
§  Build the scheduler domains using arch_init_sched_domains(&cpu_online_map) which will set the topology of the multicore (kernel/sched.c)
§  Check how many online CPUs exist and  adjust the scheduler granularity value appropriately using sched_init_granularity() (kernel/sched.c)
Ø  The do_basic_setup() function initializes the driver model using driver_init() (drivers/base/init.c), the sysctl interface, the network socket interface, and work queue support using init_workqueues(). Finally, it calls do_initcalls(), which runs the initialization routines of the built-in device drivers (init/main.c)
Ø  Call init_post() (init/main.c)
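The handshake between CPU0 and a secondary CPU described in the steps above (release, wake up, wait until the secondary reports back) can be modelled with a single shared variable. The sketch below is a single-threaded model meant only to show the protocol. The variable name `pen_release` follows my recollection of the RealView platform code and should be treated as an assumption, as should the function names. The real code additionally needs a spinlock, memory barriers, cache maintenance and a soft interrupt.

```c
#include <assert.h>

/* -1 means that no secondary CPU is allowed to leave the holding pen. */
static volatile int pen_release = -1;

/* CPU0 side: allow exactly one secondary CPU to start booting. */
void release_cpu(int cpu)
{
    assert(cpu > 0);        /* CPU0 is the boot CPU and is never released */
    pen_release = cpu;
}

/* Secondary side: one iteration of the polling loop. Returns 1 if this
 * CPU has been released; it then resets the variable as acknowledgement. */
int secondary_try_leave_pen(int my_cpu)
{
    if (pen_release != my_cpu)
        return 0;           /* not our turn, keep waiting */
    pen_release = -1;       /* tell CPU0 that we are out of the pen */
    return 1;
}

/* CPU0 side: has the released CPU acknowledged yet? */
int boot_cpu_sees_started(void)
{
    return pen_release == -1;
}
```

Releasing one CPU at a time keeps the bring-up sequential, which matches the description above in which CPU0 repeats the procedure for every secondary CPU.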

In init_post() (init/main.c):

This is where the kernel switches to user mode, by trying each of the following programs in turn until one of them starts successfully:
            run_init_process("/sbin/init");
            run_init_process("/etc/init");
            run_init_process("/bin/init");
            run_init_process("/bin/sh");
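The fallback order above can be sketched as a loop over candidate paths. In the kernel a successful exec never returns, so the next call is reached only when the previous one failed; the sketch models this with a hypothetical callback that returns 0 on success. The names `run_first_available`, `fake_exec_only_shell` and `fake_exec_nothing` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback that tries to start a program; returns 0 on success. */
typedef int (*exec_fn)(const char *path);

/* Candidate init programs, in the order the kernel tries them. */
const char *const init_candidates[] = {
    "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
};

/* Try each path in order and return the first one that starts,
 * or NULL if none could be started (the kernel would panic here). */
const char *run_first_available(const char *const *paths, size_t count,
                                exec_fn try_exec)
{
    assert(paths != NULL && try_exec != NULL);
    for (size_t i = 0; i < count; i++) {
        if (try_exec(paths[i]) == 0)
            return paths[i];
    }
    return NULL;
}

/* Stand-in for a root filesystem that contains only /bin/sh. */
int fake_exec_only_shell(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}

/* Stand-in for a root filesystem without any init program. */
int fake_exec_nothing(const char *path)
{
    (void)path;
    return -1;
}
```

This ordering is why a minimal root filesystem that contains nothing but a shell at /bin/sh can still boot to a prompt.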


The /sbin/init process executes and displays many messages on the console; finally, it transfers control to the console and stays alive.

SMP Linux on a Minimal Dual-Core Arm Cortex-A15 System

The Hardware Design

  • The CPU must be updated to the dual-core Arm Cortex-A15 processor.
  • An extra memory must be added, which is used to communicate the starting address to the secondary core.
    • For SMP Linux, there is a register (offset 0x30) which is used to pass the jump address to the secondary CPU.
    • For the minimal system, provide a simple memory at the base address of the System Registers, which is system address 0x1c010000.
      Only offsets 0x30 and 0x34 are used, and the values must be initialized to 0 because the secondary code waits for a non-zero value.
      When the secondary CPU sees a non-zero value, it jumps to the address contained at 0x1c010030.
      If this address is not 0 at startup, the system will not boot properly.
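The behaviour of this small memory can be modelled in a few lines of C. The model below is a sketch under the assumptions stated in the list above (two words at offsets 0x30 and 0x34, both zero after reset); the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSREG_BASE 0x1c010000u
#define OFF_JUMP    0x30u   /* jump address for the secondary CPU */
#define OFF_AUX     0x34u   /* second word used by Linux */

/* The two words the minimal system has to provide; both reset to 0. */
struct mini_sysregs {
    uint32_t jump;
    uint32_t aux;
};

/* Map a system address to the backing word, or NULL if it is unmapped. */
static uint32_t *decode(struct mini_sysregs *m, uint32_t addr)
{
    if (addr == SYSREG_BASE + OFF_JUMP)
        return &m->jump;
    if (addr == SYSREG_BASE + OFF_AUX)
        return &m->aux;
    return NULL;
}

/* Returns 0 on success, -1 if the address is not implemented. */
int sysregs_write(struct mini_sysregs *m, uint32_t addr, uint32_t value)
{
    uint32_t *word;
    assert(m != NULL);
    word = decode(m, addr);
    if (word == NULL)
        return -1;
    *word = value;
    return 0;
}

/* Returns 0 on success, -1 if the address is not implemented. */
int sysregs_read(struct mini_sysregs *m, uint32_t addr, uint32_t *value)
{
    uint32_t *word;
    assert(m != NULL && value != NULL);
    word = decode(m, addr);
    if (word == NULL)
        return -1;
    *value = *word;
    return 0;
}

/* What the waiting secondary CPU checks: a non-zero jump address. */
int secondary_released(const struct mini_sysregs *m)
{
    return m->jump != 0;
}
```

The model also makes the reset requirement visible: if `jump` held a stale non-zero value at startup, `secondary_released` would be true immediately and the secondary CPU would jump to a meaningless address.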

Linux Changes

  • Enable SMP in the kernel config.
  • Rebuild the kernel, adjusting CROSS_COMPILE to match the prefix of your ARM cross compiler:

    $ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- -j 4


Starting the Secondary CPU

After reset, both CPUs start running the code in boot.S, which is located at 0x80000000.
  1. The code determines whether it is running on CPU0 or CPU1 by reading the CPU ID register located in co-processor 15 (CP15).
    This register is also referred to as the Multiprocessor Affinity Register, MPIDR.
    It provides information about which core of an MPCore processor the code is running on, and which cluster of a multi-cluster system the code is running on.
    In this system there is a single cluster and two cores, so the code simply identifies CPU ID 0 as the primary core and CPU ID 1 as the secondary.
  2. The primary core finishes the boot loader and immediately starts running Linux.
  3. The secondary core waits in the boot loader for a jump address to be provided at address 0x1c010030.
  4. The primary CPU, which is running Linux, is responsible for releasing the secondary CPU by writing the jump address and sending an interrupt.
    The comments in arch/arm/mach-vexpress/platsmp.c give the details.
  5. The memory contents for the System Registers address range show the result: the primary CPU has written 0xffffffff into address 0x1c010034 and then the 32-bit jump address into 0x1c010030.
    The virtual address 0xf8010000 was entered into the memory viewer window to display this range.
  6. Check that both CPUs are running:
    • Messages in the boot log should indicate that 2 CPUs are running.
    • /proc/cpuinfo should list both CPUs.
Method for Booting ARM Based Multi-Core SoCs
