ARM Embedded System
Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C (Third Edition)
Preface
The book introduces basic programming of ARM Cortex-M cores in assembly and C at the register level, and the fundamentals of embedded system design. It presents basic concepts such as data representations (integer, fixed-point, floating-point), assembly instructions, the stack, and implementing the basic control structures and functions of the C language at the assembly level.
It covers advanced topics such as interrupts, mixing C and assembly, direct memory access (DMA), system timer (SysTick), multi-tasking, SIMD instructions for digital signal processing (DSP), and instruction encoding/decoding.
The book also gives detailed examples of interfacing peripherals, such as general purpose I/O (GPIO), LCD driver, keypad interaction, stepper motor control, PWM output, timer input capture, DAC, ADC, real-time clock (RTC), and serial communication (USART, I2C, SPI, and USB).
1. See a Program Running
This chapter shows how a program is generated and executed.
1.1 Translate a C Program into a Machine Program
Compilers first perform some analysis on the source program and then create an intermediate representation (IR). For C programs, the intermediate program is similar to an assembly program.
Finally, the compiler translates the assembly program into a machine program (a binary executable).
The binary machine program follows a standard called the Executable and Linkable Format (ELF), which most ARM-based systems support.
ELF defines two interfaces:
- a linkable interface, used at link time to combine multiple files
- an executable interface, used at run time to create a process image when the program is loaded and executed
ELF also provides two views of a file:
- load view: The load view classifies the input sections into read-write and read-only regions.
- execution view: The execution view provides information for the processor to load the executable at runtime.
This depends on four critical sections:
- a text segment
- a read-only data segment
- a read-write data segment
- a zero-initialized data segment
1.2 Load a Machine Program into Memory
1.2.1 Harvard Architecture and Von Neumann Architecture
There are two types of architecture for memory access: the Harvard architecture, in which instructions and data are held in separate memories, and the Von Neumann architecture, in which they share one memory. Because the data and instruction memory are small enough to fit in the same 32-bit memory address space, they often share the same memory address bus. For example, 256 KB of data memory and 4 KB of instruction memory can share the address bus.
1.2.2 Creating a Runtime Memory Image
ARM Cortex-M3/M4/M7 processors use the Harvard architecture. The instruction memory (flash) and data memory (SRAM) are built into the processor chip. A simple example shows how the Harvard architecture loads a program to start execution: when the processor boots successfully, the first instruction of the program is loaded from the instruction memory into the processor, and the program starts to run.
The memory map is pre-defined by the chip manufacturer and is usually not programmable.
For example, consider a memory map of the 4 GB memory space:
A peripheral has a set of registers and may contain a small memory. The processor maps the registers and memory of all peripherals into the same memory address space.
To interface with a peripheral, the processor uses regular memory access instructions to read or write the pre-defined addresses for this peripheral.
This method is called memory-mapped I/O.
1.3 Registers
All registers are of the same size and typically hold 16, 32, or 64 bits. A processor core has two types of registers: general-purpose and special-purpose registers.
1.3.1 Reusing Registers to Improve Performance
Some data items are accessed more frequently than others. Therefore, most compilers try to place the values of frequently or recently accessed variables and memory addresses in registers to improve performance.
Processor architecture design may use caching and prefetching to speed up the performance.
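The number of registers on a processor is often small, for two reasons:
- Registers typically exhibit the highest temperature on the chip, so adding more of them makes heat harder to manage.
- Each register operand must be encoded in the instruction, so more registers require more bits in each instruction.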
2. Data Representation
3. ARM Instruction Set Architecture
4. Arithmetic and Logic
5. Load and Store
6. Branch and Conditional Execution
7. Structured Programming
8. Subroutines
9. 64-bit Data Processing
10. Mixing C and Assembly
11. Interrupt
12. Fixed-point and Floating-point Arithmetic
13. Instruction Encoding and Decoding
14. General-purpose I/O
15. General-purpose Timers
16. Stepper Motor Control
17. Liquid-crystal Display (LCD)
18. Real-time Clock (RTC)
19. Direct Memory Access (DMA)
20. Analog-to-Digital Converter (ADC)
21. Digital-to-Analog Converter (DAC)
22. Serial Communication Protocols
23. Multitasking
24. Digital Signal Processing
Appendix A: GNU Compiler
Short Lectures
1. Why use Two's Complement?
2. Carry flag for unsigned addition and subtraction
3. Overflow flag for signed addition and subtraction
4. C Pointer
5. Memory-mapped I/O
This short video explains what memory-mapped I/O is. Usually, each on-chip peripheral device has a few registers, such as control registers, status registers, data input registers, and data output registers.
In general, there are two approaches to exchanging data between the processor core and a peripheral device:
- Port-mapped I/O: Port-mapped I/O uses special CPU instructions designed specifically for I/O operations, such as the in and out instructions found on microprocessors based on the x86 and x86-64 architectures.
- Memory-mapped I/O: Each device register is assigned to a memory address in the memory address space of the microprocessor.
The memory and registers of the I/O devices are mapped to (associated with) address values. So a memory address may refer to either a portion of physical RAM, or instead to memory and registers of the I/O device.
Each I/O device monitors the CPU's address bus and responds to any CPU access of an address assigned to that device, connecting the data bus to the desired device's hardware register.
To accommodate the I/O devices, some areas of the address space used by the CPU must be reserved for I/O and must not be available for normal physical memory.
Memory-mapped I/O is performed by the native load and store instructions of the processor.
LDR/STR Reg, [Reg, #imm]
Therefore, memory-mapped I/O is a more convenient way to interface with I/O devices.
Here is an example of memory mapped I/O.
When you write to this special memory location 0x48000014, the data you write is sent to the corresponding I/O device.
The memory address of ARM Cortex-M has a total of 32 bits, supporting 4GB of memory space.
The memory space is divided into six pre-defined regions.
- The first region is the code region: It is primarily used to store program code, but it can also store data. The code region is on-chip memory, typically on-chip flash. The size of on-chip flash is limited to half a GB. The actual size of the on-chip flash varies among vendors and chips.
- The second region is SRAM: It is primarily used to store data, such as heaps and stacks. We can also put code here. It supports half a GB.
- The third region is the peripheral region: It covers the memory addresses of all on-chip peripherals. These peripherals include Advanced High-performance Bus (AHB) peripherals, such as GPIO and ADC, and Advanced Peripheral Bus (APB) peripherals, such as timers and UART. Specific mapping addresses depend on vendors and chips.
- The fourth region is external RAM: It is an executable region for data. It is off-chip memory, primarily used to store large data blocks. It has a total of 1 GB.
- The fifth region is for external devices, such as an SD card.
- The sixth region is the system region: It includes the NVIC, system timer, system control block, and vendor-specific memory.
For example, on the STM32L4, the registers of GPIO Port A are mapped to a small memory region starting at 0x48000000.
Let's take a closer look at the memory map for GPIO Port A.
Each port has 12 registers, and each register has 4 bytes.
While a total of 1 KB of space is reserved for Port A, only 48 bytes are used.
Within this 48-byte memory region, the GPIO mode register (MODER) is mapped to the lowest memory address, and the GPIO analog switch control register (ASCR) is mapped to the highest memory address.
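The register layout described above can be sketched as a C struct whose field offsets mirror the memory map. This is a minimal sketch, not the vendor header: the type name `GpioLayout` and the helper `gpioa_register_address` are hypothetical, and the order of the registers between MODER and ASCR is an assumption taken from the usual STM32L4 device header rather than from these notes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 12-register, 48-byte GPIO block.
 * Only MODER (lowest), ODR (0x14), and ASCR (highest) are named in the notes;
 * the remaining order is an assumption based on the STM32L4 device header. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00: mode register */
    volatile uint32_t OTYPER;   /* 0x04: output type register */
    volatile uint32_t OSPEEDR;  /* 0x08: output speed register */
    volatile uint32_t PUPDR;    /* 0x0C: pull-up/pull-down register */
    volatile uint32_t IDR;      /* 0x10: input data register */
    volatile uint32_t ODR;      /* 0x14: output data register */
    volatile uint32_t BSRR;     /* 0x18: bit set/reset register */
    volatile uint32_t LCKR;     /* 0x1C: configuration lock register */
    volatile uint32_t AFR[2];   /* 0x20, 0x24: alternate function registers */
    volatile uint32_t BRR;      /* 0x28: bit reset register */
    volatile uint32_t ASCR;     /* 0x2C: analog switch control register */
} GpioLayout;

#define GPIOA_BASE_ADDR 0x48000000UL  /* start of the Port A region */

/* Absolute address of a Port A register, given its byte offset in the block. */
static inline uint32_t gpioa_register_address(size_t offset) {
    return (uint32_t)(GPIOA_BASE_ADDR + offset);
}
```

With this layout, `offsetof(GpioLayout, ODR)` is 0x14, so the ODR of Port A lands at 0x48000014, which matches the address used in the rest of this lecture.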
If we want to set the output of pin#14 of the GPIO port A to high, we need to set bit 14 of the output data register(ODR) of GPIO port A to 1.
The output data register (ODR) of Port A on STM32L4 is mapped to the memory addresses from 0x48000014 to 0x48000017.
If little endian is used, the highest memory address holds the most significant 8 bits, and the lowest memory address holds the least significant 8 bits.
This can be done with a single C statement that dereferences the register address. That statement performs a sequence of load, modify, and store operations:
- the statement casts the memory address to a pointer to a 32-bit unsigned integer
- the dereference operator retrieves the ODR register value as a 32-bit integer
- a bitwise operation modifies this unsigned integer value
- the updated value is stored back to the ODR register through the dereferenced pointer
When a variable is declared as volatile, the compiler is informed that even though no statements in the program appear to change it, the value might still change.
Typically, compilers minimize the number of memory accesses by storing the memory value in a register and then repeatedly using it without accessing the memory.
The volatile qualifier on a variable prevents the compiler from making such optimizations on this variable.
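The load, modify, and store sequence on a volatile register can be sketched as follows. The helper names `reg_set_bit` and `reg_clear_bit` are hypothetical. On the target, setting pin 14 of Port A would presumably be written as `*((volatile uint32_t *)0x48000014) |= 1UL << 14;`, which is the same operation with the address cast inline.

```c
#include <stdint.h>

/* Set one bit of a memory-mapped register.
 * The volatile qualifier forces a real load and a real store on every call. */
static inline void reg_set_bit(volatile uint32_t *reg, unsigned bit) {
    *reg |= (uint32_t)1U << bit;      /* load, OR in the bit, store back */
}

/* Clear one bit of a memory-mapped register. */
static inline void reg_clear_bit(volatile uint32_t *reg, unsigned bit) {
    *reg &= ~((uint32_t)1U << bit);   /* load, mask out the bit, store back */
}
```

Taking the register as a pointer parameter keeps the sketch testable on a host machine: an ordinary variable can stand in for the hardware register.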
6. GPIO Output: Lighting up a LED
7. GPIO Input: Interfacing joystick
8. LCD Driver
9. Interrupts
This short video explains how interrupts work on ARM Cortex-M microprocessors. Take the STM32L4 Discovery kit as an example: it has two LEDs and a joystick with four push buttons. Suppose we want to develop software that turns on the red LED when a button is pressed. There are two ways to monitor the logic state of an input pin attached to a push button:
- polling
- interrupt: When the interrupt signal is generated, the processor receives the interrupt, suspends the current program, and starts executing a special program called the interrupt handler. After the interrupt handler completes, the processor resumes the suspended program.
If the memory address is 32 bits, it can support 4GB of memory space.
The memory space is divided into six pre-defined regions, and each region has a suggested usage.
The internal SRAM is divided into several segments.
- Initialized data: It contains global and static variables to which the program gives initial values.
- Zero-initialized data: It contains all global or static variables that are uninitialized, or initialized to 0, in the program.
- Heap: It holds data objects that an application creates dynamically at runtime. It grows upwards.
- Stack: It saves the runtime environment and the local variables of subroutines, and it passes arguments to a subroutine.
The stack is placed at the top of the internal SRAM, and it grows downwards.
The heap and the stack grow in opposite directions.
When the stack meets the heap, free memory space is exhausted.
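The relationship between the upward-growing heap and the downward-growing stack can be sketched as a simple subtraction. The function name `sram_free_bytes` is hypothetical, and the addresses used to exercise it would be illustrative values, not taken from any linker script.

```c
#include <stdint.h>

/* Bytes still free between the top of the heap (grows upwards)
 * and the stack pointer (grows downwards). Returns 0 once they meet. */
static inline uint32_t sram_free_bytes(uint32_t heap_top, uint32_t stack_pointer) {
    return (stack_pointer > heap_top) ? (stack_pointer - heap_top) : 0U;
}
```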
While the code region can be as large as half a GB in the address space, much of this space is reserved.
For example, the STM32L4 chip has only 1 MB of on-chip flash memory, which starts at 0x08000000 and ends at 0x080FFFFF.
In addition, a small flash memory region starting at 0x08000000 is mapped to the lowest memory region, starting at address 0.
The Nested Vectored Interrupt Controller (NVIC) prioritizes and handles all interrupts.
When we press the push button connected to pin PA3, the hardware generates an electrical signal, called an interrupt request, EXTI3.
When NVIC receives the interrupt request, it forces the processor to jump to and execute a special piece of code, called an interrupt service routine or an interrupt handler.
The entry points of all interrupt service routines are stored in a special table, called an interrupt vector table.
The interrupt vector table is stored at a pre-defined area in the memory.
For ARM Cortex-M processors, the interrupt vector entries start at memory address 0x00000004; the word at address 0 holds the initial value of the stack pointer.
By default, the interrupt vector table is mapped to the lowest address of the internal flash memory.
However, software can re-map it to a different location, such as internal SRAM.
Each entry in the table is the starting address of an interrupt service routine.
The interrupt number is used to index the interrupt table.
The reset handler entry contains the function pointer that is called when the processor is reset.
When the processor comes out of reset, the program counter is initialized to this address value.
Typically, the reset handler performs some HW initialization, then calls the main function.
When an interrupt arrives, the interrupt controller reads the address of the interrupt handler stored in the interrupt vector table (IVT) and then sets the program counter to that value.
This forces the processor to jump to the interrupt service routine (ISR).
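The table lookup described above can be modeled in plain C as an array of function pointers indexed by a number. This is only a host-side model: on a real Cortex-M the hardware performs the lookup itself, and the table contents, handler names, and index values here are hypothetical and do not follow the real vector numbering.

```c
#include <stddef.h>

typedef void (*handler_t)(void);

/* Counters so the effect of each handler can be observed. */
static int reset_count = 0;
static int exti3_count = 0;

static void reset_handler(void)   { reset_count++; }
static void exti3_handler(void)   { exti3_count++; }
static void default_handler(void) { /* placeholder for unused entries */ }

/* Each entry holds the starting address of one service routine. */
static handler_t const vector_table[] = {
    reset_handler,    /* index 0 (illustrative) */
    default_handler,  /* index 1 (illustrative) */
    exti3_handler     /* index 2 (illustrative) */
};

/* Look up the handler by index and jump to it.
 * Returns 0 on success, -1 if the index is outside the table. */
static int dispatch(size_t index) {
    if (index >= sizeof vector_table / sizeof vector_table[0]) {
        return -1;
    }
    vector_table[index]();
    return 0;
}
```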
Before jumping to the ISR, the interrupt controller performs stacking to preserve the program's status.
Note that ARM uses a descending stack: if a 32-bit item is pushed onto the stack, the stack pointer (SP) is decremented by 4.
BX LR
The above instruction informs the interrupt controller to perform the unstacking process.
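The descending-stack behavior used during stacking and unstacking can be sketched with a pointer into an ordinary array. The helpers `stack_push` and `stack_pop` are hypothetical names; the sketch assumes a full descending stack, where SP points at the last item pushed.

```c
#include <stdint.h>

/* Push one 32-bit word: SP moves down by one word (4 bytes), then the value is stored. */
static inline uint32_t *stack_push(uint32_t *sp, uint32_t value) {
    sp -= 1;
    *sp = value;
    return sp;
}

/* Pop one 32-bit word: the value is read, then SP moves up by one word (4 bytes). */
static inline uint32_t *stack_pop(uint32_t *sp, uint32_t *value) {
    *value = *sp;
    return sp + 1;
}
```

Both helpers return the new stack pointer instead of updating a global, which keeps the 4-byte step visible to the caller.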
10. Interrupt Enable and Interrupt Priority
A Cortex-M microcontroller supports up to 256 interrupts.
- Each interrupt, except the reset interrupt, has an interrupt number.
- The first 16 interrupts are system interrupts, also called system exceptions. CMSIS (Cortex Microcontroller Software Interface Standard) defines all system exceptions by using negative values.
- The remaining 240 interrupts are peripheral interrupts, also called non-system exceptions. The peripheral interrupt numbers start at 0.
Peripheral interrupts are defined by chip manufacturers.
The total number of peripheral interrupts supported varies among chips.
    NVIC_DisableIRQ(IRQn);               // disable interrupt
    NVIC_EnableIRQ(IRQn);                // enable interrupt
    NVIC_ClearPendingIRQ(IRQn);          // clear pending status
    NVIC_SetPriority(IRQn, priority);    // set priority level
When an interrupt is serviced, the current interrupt or exception number is recorded in the program status register (PSR).
The number recorded in the PSR is different from the number in CMSIS. In this tutorial, "interrupt number" means the interrupt number defined by CMSIS.
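The two numbering schemes differ by a fixed offset of 16, because the 16 system exceptions come first. A minimal sketch of the conversion, with hypothetical helper names:

```c
/* Convert the exception number recorded in the PSR to the CMSIS interrupt number.
 * System exceptions (1..15) map to negative values; peripheral interrupts start at 0. */
static inline int irqn_from_exception_number(unsigned exception_number) {
    return (int)exception_number - 16;
}

/* Convert a CMSIS interrupt number back to the exception number seen in the PSR. */
static inline unsigned exception_number_from_irqn(int irqn) {
    return (unsigned)(irqn + 16);
}
```

For instance, exception number 15 (the system timer) becomes -1 in CMSIS numbering, and the first peripheral interrupt, exception number 16, becomes 0.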
This is the interrupt number definition for STM32L4 Cortex-M4 microprocessors. It is always defined in a header file.
Enabling a system exception is different from enabling a peripheral interrupt.
There are no enable/disable registers for system exceptions:
- Some system exceptions, such as reset and hard fault, cannot be disabled. They are always enabled.
- The other system exceptions can be enabled or disabled by the corresponding modules, such as system timer
On the other hand, enabling and disabling peripheral interrupts is implemented by modifying two sets of registers: the interrupt set-enable registers (ISER) and the interrupt clear-enable registers (ICER).
We can enable a peripheral interrupt by writing 1 to the corresponding bit of the ISER register.
For example, to enable the Timer 7 interrupt:
- the interrupt number of Timer 7 is 44 for STM32L1
- since 44 = 32 + 12, we need to set bit 12 of ISER1 to 1
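The mapping from an interrupt number to a register and a bit can be sketched as a division by 32. The helper names are hypothetical, and the `iser` parameter stands in for the base of the ISER register array so the sketch can run on a host. Because writing 0 to an ISER bit has no effect, a plain assignment is enough; no read-modify-write is needed.

```c
#include <stdint.h>

/* Each ISER register covers 32 interrupts. */
static inline unsigned iser_index(unsigned irqn) { return irqn >> 5; }     /* irqn / 32 */
static inline unsigned iser_bit(unsigned irqn)   { return irqn & 0x1FU; }  /* irqn % 32 */

/* Enable one peripheral interrupt by writing 1 to its bit in the matching ISER register. */
static inline void irq_enable(volatile uint32_t *iser, unsigned irqn) {
    iser[iser_index(irqn)] = (uint32_t)1U << iser_bit(irqn);
}
```

For interrupt number 44, `iser_index` gives 1 and `iser_bit` gives 12, which is bit 12 of ISER1.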
What should the processor do if multiple interrupts arrive at the same time?
ARM processors allow software to set priority levels for almost every interrupt.
In ARM, numerically low priority values are used to specify logically high interrupt priorities.
The priority of some interrupts is fixed.
Interrupt priority is configured through the Interrupt Priority (IP) registers.
In embedded systems, we often have to perform some critical operations, in which data should not be corrupted by other interrupts.
Therefore, we need to disable all interrupts with less urgency to ensure that the execution of the critical code will not be interrupted by other interrupts.
We can use the Base Priority Mask Register(BASEPRI) to achieve the protection of critical code.
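In this example, we disable all interrupts whose priority value is greater than or equal to 5 during the execution of the critical code.
    __set_BASEPRI(5 << 4);    // mask interrupts with priority value >= 5
    // critical code start
    ..
    // critical code end
    __set_BASEPRI(0);         // writing 0 disables the masking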
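The shift by 4 in the BASEPRI example makes sense if the chip implements only the upper 4 bits of each 8-bit priority field. The sketch below models that masking rule on a host. `PRIO_BITS` and both helper names are hypothetical, and the value 4 is an assumption about the chip, not something these notes state.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4U  /* assumption: 4 implemented priority bits, held in bits 7:4 */

/* Place a priority level in the upper bits of the 8-bit priority field. */
static inline uint8_t priority_to_register(uint8_t level) {
    return (uint8_t)(level << (8U - PRIO_BITS));
}

/* An interrupt is masked when BASEPRI is non-zero and its priority value
 * is greater than or equal to BASEPRI (that is, equal or lower urgency). */
static inline bool masked_by_basepri(uint8_t basepri, uint8_t level) {
    return basepri != 0U && priority_to_register(level) >= basepri;
}
```

With these assumptions, level 5 is encoded as 0x50, so interrupts at levels 5 and above are held off while level 4 and more urgent interrupts still get through.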
11. External interrupts (EXTI)
This lecture shows how to configure and program external interrupts (EXTI).
External interrupts are generated by peripherals or devices external to the microcontroller, such as push buttons and keypads.
There are two approaches to monitoring and responding to external events:
- polling
- interrupt
The interrupt controller:
- temporarily stops the normal flow of program execution
- causes the interrupt service routine (ISR) to be executed
After the ISR completes, normal program execution resumes at the point where it was interrupted.
Take the STM32L4 kit as an example.
GPIO port A's pins PA0, PA1, PA5, PA2, and PA3 are connected to the "center", "left", "down", "right", and "up" pins of the joystick, respectively. Each pin is connected to ground via a capacitor.
These capacitors perform hardware switch debouncing.
When the "up" button of the joystick is pressed, this switch is closed.
As a result, PA3 is connected to 3 V via the "COMMON" terminal.
Note that:
- The default voltage of the "CENTER" pin is 0 because of the pull-down resistor R59.
- The other four joystick terminals are not pulled down. Their default voltage may be high or low, depending on their last usage.
Each GPIO pin can trigger an interrupt request signal independently.
Software can configure the external interrupt controller so that:
- PA#0 triggers EXTI0
- PA#1 triggers EXTI1
- PA#5 triggers EXTI5
- PA#2 triggers EXTI2
- PA#3 triggers EXTI3
The external interrupt controller monitors the change of the voltage signal.
The rising or falling edge of the voltage signal can make the external interrupt controller generate an interrupt request.
The interrupt request will be sent to the NVIC.
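Edge detection amounts to comparing the previous and current logic levels of the pin. A minimal host-side sketch follows; the type `edge_t` and the function `detect_edge` are hypothetical names, and the real controller does this in hardware.

```c
#include <stdbool.h>

typedef enum { EDGE_NONE, EDGE_RISING, EDGE_FALLING } edge_t;

/* Classify the transition between two consecutive samples of a pin. */
static inline edge_t detect_edge(bool previous, bool current) {
    if (!previous && current) {
        return EDGE_RISING;    /* low to high, e.g. the button connects the pin to 3 V */
    }
    if (previous && !current) {
        return EDGE_FALLING;   /* high to low, e.g. the button is released */
    }
    return EDGE_NONE;          /* level unchanged, no request */
}
```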
The external interrupt controller supports 16 external interrupt inputs. These inputs are named external interrupt 0 to 15 and are associated with GPIO pins.
Each interrupt input is associated with the pin of the same number on one selected GPIO port.
Pins with different numbers, on the same or on different GPIO ports, can be used as interrupt sources simultaneously.
However, pins with the same number on different ports share one interrupt input, so only one port's pin of a given number can be used at a time.
The interrupt controller has one multiplexer for each GPIO pin number, so there are 16 multiplexers.
A multiplexer (MUX) is a simple circuit. It selects one of its inputs and forwards it to the output.
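Selecting the input of one of these multiplexers can be sketched as writing a small field in a configuration register. The sketch assumes the layout used by the STM32 SYSCFG external interrupt configuration registers: four registers, each holding four 4-bit fields, with port A encoded as 0, port B as 1, and so on. The function name `exti_select_port` is hypothetical.

```c
#include <stdint.h>

/* Route pin `line` of the given port to EXTI input `line`.
 * exticr: array of 4 configuration registers (assumed layout: four 4-bit fields each).
 * port:   0 for port A, 1 for port B, ... (assumed encoding). */
static inline void exti_select_port(volatile uint32_t exticr[4],
                                    unsigned line, unsigned port) {
    unsigned reg   = line / 4U;          /* which register holds this line's field */
    unsigned shift = (line % 4U) * 4U;   /* position of the field inside the register */
    uint32_t mask  = (uint32_t)0xFU << shift;

    /* Clear the old selection, then write the new port code. */
    exticr[reg] = (exticr[reg] & ~mask) | ((uint32_t)(port & 0xFU) << shift);
}
```

Because each line has exactly one field, choosing port A for line 3 necessarily deselects pin 3 of every other port.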
There are dedicated interrupt handlers for external interrupts.
For example:
- PA.3 can be mapped to EXTI3, and its corresponding interrupt handler is EXTI3_IRQHandler.
- External interrupts 5 to 9 share the same interrupt handler, EXTI9_5_IRQHandler.
- External interrupts 10 to 15 share the same interrupt handler, EXTI15_10_IRQHandler.
The external interrupt controller handles two kinds of interrupts:
- configurable external interrupts: Interrupts associated with GPIO, RTC, comparators, the power voltage detector (PVD), and peripheral voltage monitoring (PVM). For these interrupts, the controller has a programmable edge detector, and software can select which active edge generates an interrupt request. In addition, software can generate an interrupt request by writing 1 to the software interrupt event register (SWIER).
- direct external interrupts: Only the rising edge can generate an interrupt request. These interrupts are mostly used for communication peripherals, the low-power timer, and the LCD.
Let's work on the software part: if we press the "UP" button of the joystick, the software turns on the LED.
The "UP" button is connected to GPIO pin PA3, which can generate external interrupt request 3.
- First, enable the clock of GPIO port A.
- Then, configure the mode of pin PA.3 as digital input.
- Set PA.3 as pull-down. PA.3 is neither pulled up nor pulled down externally. It is connected to ground via a capacitor.
- Enable external interrupt 3 in the NVIC.
- Select PA.3 as the source of external interrupt 3.
- Select the rising edge as the trigger.
- Unmask the line in the interrupt mask register.
- Write the ISR for external interrupt 3. After receiving the interrupt request, the NVIC forces the processor to execute the interrupt handler EXTI3_IRQHandler().
    RCC->AHB2ENR |= RCC_AHB2ENR_GPIOAEN;    // enable the clock of GPIO port A

    // GPIO mode: digital input (00), digital output (01), alternate function (10), analog (11, default)
    GPIOA->MODER &= ~(3U << 6);             // PA.3 uses bits 7:6; 00 selects digital input
If the processor doesn't pull it down internally, the voltage on pin PA.3 is floating.
    // GPIO pull: no pull-up/pull-down (00), pull-up (01), pull-down (10), reserved (11)
    GPIOA->PUPDR &= ~(3U << 6);             // clear the two bits of PA.3 first
    GPIOA->PUPDR |= 2U << 6;                // pull-down (10)

    NVIC_EnableIRQ(EXTI3_IRQn);

    RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN;   // enable the clock of SYSCFG
    SYSCFG->EXTICR[0] &= ~SYSCFG_EXTICR1_EXTI3;
    SYSCFG->EXTICR[0] |= SYSCFG_EXTICR1_EXTI3_PA;
When PA.3 is selected, pin 3 of the other ports cannot be used to generate external interrupts.
    // 0: trigger disabled, 1: trigger enabled
    EXTI->RTSR1 |= EXTI_RTSR1_RT3;

    // 0: masked, 1: not masked
    EXTI->IMR1 |= EXTI_IMR1_IM3;

    void EXTI3_IRQHandler(void) {
        if ((EXTI->PR1 & EXTI_PR1_PIF3) != 0) {
            // toggle LED
            ..
            // clear interrupt flag; the bit is cleared by writing 1 to it,
            // so a plain write avoids clearing other pending flags
            EXTI->PR1 = EXTI_PR1_PIF3;
        }
    }
12. System Timer (SysTick)
13. Timer PWM output
14. Timer Input Capture
15. Booting Process
16. Volatile Variables
17. Race Condition
18. ADC
19. Floating-Point Unit (FPU)
20. Fixed Point Numbers
21. Why learn assembly language
22. Big Endian and Little Endian
23. Load and Store Instructions
24. Addressing mode: pre-index, post-index, and pre-index with update
25. Arithmetic and Logical Instructions
26. Updating NZCV bit flags
27. Branch Instructions
28. Conditional Execution
29. Calling a subroutine
30. Passing arguments to a subroutine
31. Preserving registers in a subroutine
32. Mixing C and assembly
SoC, MPU, MCU
Microcontrollers vs. Microprocessors: What’s the difference?
Microcontrollers (MCUs) tend to be less expensive, simpler to set up, and simpler to operate than microprocessors (MPUs). An MCU can be viewed as a single-chip computer, whereas an MPU has surrounding chips that support various functions like memory, interfaces, and I/O.
One of the main differences between microcontrollers and microprocessors is that
- a microprocessor will typically run an operating system. An operating system allows multiple processes to run at the same time via multiple threads. Drivers are required to support peripherals.
- A microcontroller will typically run "bare metal," which means there is no operating system. Without an operating system, a microcontroller can only run one control loop at a time.
From a software perspective, this means a single thread is running on the microcontroller’s processor or Central Processing Unit (CPU).
An MCU might have I2C, SPI, a UART (serial), and sometimes a low-level USB connection.
These basic interfaces are often used just for programming the MCU.
An MCU provides more on a single chip than an MPU. The difference between MCUs and MPUs is becoming less pronounced since some MCUs now come with simple software drivers for more sophisticated peripherals and more MPUs can be found that have integrated peripherals on-chip.
SoC
An SoC (System-on-a-Chip) can be based on an MCU or MPU and will provide everything that's necessary to perform certain types of applications. SoCs enable an entire system of chips on a single, tiny IC.
For example, for image processing, an SoC might have a combination of
- MPU
- a Digital Signal Processor (DSP)
- a Graphic Processing Unit (GPU) for performing rapid algorithm calculations, along with on-chip interfaces for driving a display and an HDMI or other audio/video input/output technology.
ARM Instruction Set
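- R13 is usually used as the stack pointer (SP). In practice, a block of memory is allocated as the stack, and during system initialization the bottom address of this stack is stored in R13.
- R14 is the link register (LR), which holds the return address of a subroutine. For example, when assembly code executes an instruction such as BL or BLX, the return address, taken from the PC, is copied into R14.
- R15 is the program counter (PC), which holds the address of the next instruction.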
Basic Syntax
label opcode operand1, operand2, ...; Comments
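- label: Optional; usually used to mark an address.
- opcode: The operation code of the instruction.
- operand: The first operand is the destination of the instruction result; the number of operands varies by instruction.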
- The state of an ARM system is determined by the content of visible registers and memory.
- A user-mode program can see 15 32-bit general-purpose registers (R0-R14), the program counter (PC), and the CPSR.
- Instruction set defines the operations that can change the state.
The ISA defines the supported data types, the registers, how the hardware manages main memory, key features (such as virtual memory), which instructions a microprocessor can execute, and the input/output model of multiple ISA implementations.
ARM instructions are all 32 bits long (except in Thumb mode).
There are 2^32 possible machine instructions. Fortunately, they are structured.
Regarding registers, briefly:
- r0 Return value, first function argument
- r1-r3 Function arguments and general scratch
- r4-r11 Saved registers
- r12 ip. Intra-procedure scratch register, rarely used by the linker
- r13 sp. Stack pointer, a pointer to the end of the stack. Moved by push and pop instructions.
- r14 lr. Link register, storing the address to return to when the function is done. Written by "bl" (branch and link, like function call), often saved with a push/pop sequence, read by "bx lr" (branch to link register) or the pop.
- r15 pc. Program counter, the current memory address being executed. It's very unusual, but handy, to have the program counter just be another register--for example, you can do program counter relative addressing very easily, by just loading from [pc+addr].
- Data processing They are move, arithmetic, logical, comparison and multiply instructions.
- Data movement
- Flow control
C6: A64 Base Instruction Descriptions
C6.2.173 MRS(Move System Register)
To read an AArch64 System register into a general-purpose register.
C6.2.175 MSR (register)
To write an AArch64 System register from a general-purpose register.
What is the purpose of WFI and WFE instructions and the event signals?
We have two instructions for entering a low-power standby state where most clocks are gated: WFI and WFE.
- WFI is targeted at entering standby, dormant, or shutdown mode, where an interrupt is required to wake up the processor.
- WFE makes use of the event register, the SEV instruction and EVENTI, EVENTO signals. A usage for WFE is to put it into a spinlock loop.
Where a CPU wants to access a shared resource such as shared memory, we can use a semaphore flag location managed by exclusive load and store access.
If multiple CPUs are trying to access the resource, one will get access and will start to use the resource while the other CPUs will be stuck in the spinlock loop.
To save power, you can insert the WFE instruction into the loop so the CPUs instead of looping continuously will enter STANDBYWFE.
Then the CPU that has been using the resource should execute the SEV instruction after it has finished using the resource.
This will wake up all other CPUs from STANDBYWFE and another CPU can then access the shared resource.
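The spinlock pattern described above can be sketched with C11 atomics. This is a host-runnable model under stated assumptions: `cpu_wait_for_event` and `cpu_send_event` are hypothetical stand-ins for the WFE and SEV instructions (empty here so the code runs anywhere), and the `atomic_flag` plays the role of the semaphore flag that an ARM target would manage with exclusive load and store accesses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: on an ARM target these would issue WFE and SEV. */
static inline void cpu_wait_for_event(void) { /* WFE would go here */ }
static inline void cpu_send_event(void)     { /* SEV would go here */ }

typedef struct {
    atomic_flag held;   /* set while some CPU owns the shared resource */
} spinlock_t;

/* Try to take the lock once. Returns true if this caller now owns it. */
static bool spin_try_lock(spinlock_t *lock) {
    return !atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire);
}

/* Wait until the lock is free, sleeping between attempts instead of busy looping. */
static void spin_lock(spinlock_t *lock) {
    while (!spin_try_lock(lock)) {
        cpu_wait_for_event();
    }
}

/* Release the lock, then wake the CPUs that are waiting for an event. */
static void spin_unlock(spinlock_t *lock) {
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
    cpu_send_event();
}
```

Clearing the flag before sending the event matters: a woken CPU that retries immediately should find the lock already free.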
RASPBERRY PI ON QEMU
Emulate Raspberry Pi 3 using QEMU in 64 bit
Learning to Implement a Small Operating System
Low-Level Programming University
ARM Cortex-A Series Programmer's Guide for ARMv7-A
Preface
The purpose of this book is to provide a single guide for programmers who want to develop applications for the Cortex-A series of processors, bringing together information from a wide variety of sources that will be useful to both assembly language and C programmers. Hardware concepts such as caches and Memory Management Units are covered, but only where this is valuable to the application writer.
We will also look at the way operating systems such as Linux make use of ARM features, and how to take full advantage of the capabilities of the ARM processor, in particular writing software for multi-core processors.
This is not an introductory level book. It assumes some knowledge of the C programming language and microprocessors, but not of any ARM-specific background.
We hope that the book is suitable for programmers who have a desktop PC or x86 background and are taking their first steps into the ARM processor based world.
Chapter 1 Introduction
Chapter 2 ARM Architecture and Processors
Chapter 3 ARM Processor Modes and Registers
The ARM architecture is a modal architecture. Before the introduction of Security Extensions it had seven processor modes: six privileged modes and a non-privileged user mode.
- User (USR) Mode in which most programs and applications run
- FIQ Entered on an FIQ interrupt exception
- IRQ Entered on an IRQ interrupt exception
- Supervisor (SVC) Entered on reset or when a Supervisor Call instruction (SVC) is executed
- Abort (ABT) Entered on a memory access exception
- Undef (UND) Entered when an undefined instruction executed
- System (SYS) (kernel) Mode in which the OS runs, sharing the register view with User mode
For example, User mode cannot perform MMU configuration or cache operations.
Modes are associated with exception events, which are described in Exception Handling.
The introduction of the TrustZone Security Extensions created two security states for the processor that are independent of Privilege and processor mode, with a new Monitor mode to act as a gateway between the Secure and Non-secure states and modes existing independently for each security state.
For processors that implement the TrustZone extension, system security is achieved by dividing all of the hardware and software resources for the device. When a processor is in the Non-secure state, it cannot access the memory that is allocated for Secure state.
In this situation the Secure Monitor acts as a gateway for moving between these two worlds. Software executing in Monitor mode controls transition between Secure and Non-secure processor states.
The ARMv7-A architecture Virtualization Extensions add a hypervisor mode (Hyp), in addition to the existing privileged modes.
Virtualization enables more than one Operating System to co-exist and operate on the same system.
- PL0 Software executing at PL0 can make only unprivileged memory accesses.
- PL1 Software execution in all modes other than User mode and Hyp mode is at PL1.
- PL2 Hyp mode is normally used by a hypervisor, that controls, and can switch between Guest Operating Systems that execute at PL1.
Normally, operating system software executes at PL1.
These privilege levels are separate from the TrustZone Secure and Normal (Non-secure) settings.
The privilege level defines the ability to access resources in the current security state, and does not imply anything about the ability to access resources in the other security state.
Chapter 4
Generic Interrupt Controller (GIC)
A Generic Interrupt Controller (GIC) takes interrupts from peripherals, prioritizes them, and delivers them to the appropriate processor core. The Arm GIC architecture has three forms in general use with the A-profile and R-profile processors.
1. Introduction
Terminology
About the Generic Interrupt Controller architecture
The GIC is a centralized resource for supporting and managing interrupts in a system that includes at least one processor. It provides registers for managing interrupt sources, interrupt behavior, and interrupt routing to one or more processors.
The GIC includes interrupt grouping functionality that supports:
- configuring each interrupt as either Group 0 or Group 1
- signaling Group 0 interrupts to the target processor using either the IRQ or the FIQ exception request
- signaling Group 1 interrupts to the target processor using the IRQ exception request only
- a unified scheme for handling the priority of Group 0 and Group 1 interrupts
- optional lockdown of the configuration of some Group 0 interrupts.
Security Extensions support
Virtualization support
Terminology
- Interrupt states
- Interrupt types
- Models for handling interrupts
- Spurious interrupts
- Processor security state and Secure and Non-secure GIC accesses: A processor in Non-secure state can make only Non-secure accesses to a GIC. A processor in Secure state can make both Secure and Non-secure accesses to a GIC.
- Banking
- Interrupt banking
- Register banking
2. GIC Partitioning
About GIC partitioning
The GIC architecture splits logically into a Distributor block and one or more CPU interface blocks.
The GIC Virtualization Extensions add one or more virtual CPU interfaces to the GIC.
- Distributor The Distributor block performs interrupt prioritization and distribution to the CPU interface blocks that connect to the processors in the system. The Distributor block registers are identified by the GICD_ prefix.
- CPU interfaces Each CPU interface block performs priority masking and preemption handling for a connected processor in the system. CPU interface block registers are identified by the GICC_ prefix.
- Virtual CPU interfaces The GIC Virtualization Extensions add a virtual CPU interface for each processor in the system. Each virtual CPU interface is partitioned into the following blocks:
- Virtual interface control The main component of the virtual interface control block is the GIC virtual interface control registers, which include a list of active and pending virtual interrupts for the current virtual machine on the connected processor. Typically, these registers are managed by the hypervisor that is running on that processor. Virtual interface control block registers are identified by the GICH_ prefix.
- Virtual CPU interface Each virtual CPU interface block provides physical signaling of virtual interrupts to the connected processor. The ARM processor Virtualization Extensions signal these interrupts to the current virtual machine on that processor. The GIC virtual CPU interface registers, accessed by the virtual machine, provide interrupt control and status information for the virtual interrupts. Virtual CPU interface block registers are identified by the GICV_ prefix.
The Distributor
The Distributor centralizes all interrupt sources, determines the priority of each interrupt, and for each CPU interface forwards the interrupt with the highest priority to the interface, for priority masking and preemption handling.
Interrupts from sources are identified using ID numbers. Each CPU interface can see up to 1020 interrupts.
CPU interfaces
Each CPU interface block provides the interface for a processor that is connected to the GIC.
3. Interrupt Handling and Prioritization
4. Programmers' Model
This chapter describes the Distributor and CPU interface registers.
The programmers' model for the GIC Distributor and CPU interfaces is to operate using a memory-mapped register interface.
About the programmers' model
GIC register names
Distributor register map
CPU interface register map
GIC register access
Enabling and disabling the Distributor and CPU interfaces
Effect of the GIC Security Extensions on the programmers' model
GICv3 and GICv4 Software Overview
1. Preface
1.3 Terms and Abbreviations
- ARE Affinity Routing Enable
- PE The term Processing Element or PE is used as a generic term for a machine that implements the ARM architecture.
For example, the ARM® Cortex®-A57 MPCore can have up to 4 cores. Each core is what the architecture specifications refer to as a PE.
2. Introduction
2.4 Legacy support
The programmers’ model that is used is controlled by the Affinity Routing Enable (ARE) bits in GICD_CTLR:
- When ARE == 0, affinity routing is disabled (legacy operation).
- When ARE == 1, affinity routing is enabled.
3. GICv3 fundamentals
3.1 Interrupts types
3.1.3 How interrupts are signaled to the interrupt controller
- Traditionally, interrupts are signaled from a peripheral to the interrupt controller using a dedicated hardware signal.
- GICv3 supports message-based interrupts. A message-based interrupt is an interrupt that is set and cleared by a write to a register in the interrupt controller. Using a message to forward the interrupt from a peripheral to the interrupt controller removes the requirement for a dedicated signal per interrupt source.
3.3 Affinity routing
GICv3 uses affinity routing to identify connected PEs and to route interrupts to a specific PE or group of PEs.
The affinity of a PE is represented as four 8-bit fields:
<affinity level 3>.<affinity level 2>.<affinity level 1>.<affinity level 0>
The affinity scheme matches that used in ARMv8-A, with the affinity of a PE reported in MPIDR_EL1.
System designers must ensure that the affinity value indicated by MPIDR_EL1 is identical to that indicated by GICR_TYPER for the Redistributor connected to the PE.
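The requirement above can be pictured as a boot-time sanity check. The following is a minimal sketch, not taken from any firmware: the function names are hypothetical, and it assumes that GICR_TYPER carries the packed affinity value in bits [63:32] and that MPIDR_EL1 places Aff3 in bits [39:32] and Aff2, Aff1, Aff0 in bits [23:0].

```c
#include <stdint.h>
#include <stdbool.h>

/* Pack the four 8-bit affinity fields of an MPIDR_EL1 value into one
 * 32-bit word laid out as Aff3.Aff2.Aff1.Aff0 (Aff3 in the top byte). */
uint32_t mpidr_to_affinity(uint64_t mpidr)
{
    uint32_t aff3   = (uint32_t)((mpidr >> 32) & 0xFF);
    uint32_t aff210 = (uint32_t)(mpidr & 0x00FFFFFF);
    return (aff3 << 24) | aff210;
}

/* Assumption of this sketch: GICR_TYPER[63:32] holds the same packed
 * affinity value for the Redistributor connected to the PE. */
bool redistributor_matches_pe(uint64_t gicr_typer, uint64_t mpidr)
{
    return (uint32_t)(gicr_typer >> 32) == mpidr_to_affinity(mpidr);
}
```

On real hardware the two inputs would come from an MRS read of MPIDR_EL1 and a memory-mapped read of the Redistributor's GICR_TYPER; here they are plain integers so the comparison can be tried on any host.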
The exact meaning of the different levels of affinity is defined by the specific processor and SoC.
For example:
<group of groups> . <group of processors> .<processor> .<core>
<group of processors> .<processor> .<core> .<thread>
3.4 Security model
3.5 Programmers’ model
The register interface of a GICv3 interrupt controller is split into three groups:
- Distributor interface (GICD_*).
- Redistributor interface (GICR_*).
- CPU interface (ICC_*_ELn). In GICv3 the CPU Interface registers are accessed as System registers (ICC_*_ELn).
Generic Timer
The Generic Timer includes a System Counter and a set of per-core timers. The System Counter is an always-on device, which provides a fixed-frequency incrementing system count.
The system count value is broadcast to all the cores in the system, giving the cores a common view of the passage of time.
Each core has a set of timers. These timers are comparators, which compare against the broadcast system count that is provided by the System Counter.
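Each timer has the following three system registers:
- <timer>_CTL_EL<x> Control register
- <timer>_CVAL_EL<x> Comparator value
- <timer>_TVAL_EL<x> Timer value
For example, CNTP_CVAL_EL0 is the Comparator register of the EL1 physical timer.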
The CNTPCT_EL0 system register reports the current system count value.
CNTFRQ_EL0 reports the frequency of the system count. However, this register is not populated by hardware.
Timer virtualization
Timers can be divided into two groups: virtual timers and physical timers.
- Physical timers, like the EL3 physical timer, CNTPS, compare against the count value provided by the System Counter. This value is referred to as the physical count and is reported by CNTPCT_EL0.
- Virtual timers, like the EL1 Virtual Timer, CNTV, compare against a virtual count.
The virtual count is calculated as:
Virtual Count = Physical Count - <offset>
The offset value is specified in the register CNTVOFF_EL2, which is only accessible at EL2 or EL3.
If EL2 is not implemented, the offset is fixed as 0. This means that the virtual and physical count values are always the same. The virtual count allows a hypervisor to show virtual time to a Virtual Machine (VM).
This means that the virtual count can represent time experienced by the VM, rather than wall clock time.
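To make the formula concrete, here is a small host-side sketch of how an offset could hide the time a VM spent paused. It only models the arithmetic; the helper names are hypothetical and the policy of adding the paused interval to the offset is an assumption for illustration, not something the architecture mandates.

```c
#include <stdint.h>

/* Virtual Count = Physical Count - offset, where the offset is the value
 * a hypervisor would program into CNTVOFF_EL2.
 * Unsigned arithmetic wraps modulo 2^64, like the 64-bit counter. */
uint64_t virtual_count(uint64_t physical_count, uint64_t cntvoff)
{
    return physical_count - cntvoff;
}

/* Hypothetical policy: grow the offset by the number of physical ticks
 * during which the VM was not running, so the VM does not observe them. */
uint64_t offset_after_pause(uint64_t old_offset,
                            uint64_t paused_at, uint64_t resumed_at)
{
    return old_offset + (resumed_at - paused_at);
}
```

For example, a VM paused at physical count 1000 and resumed at 1500 gets an offset of 500, so its virtual count at the moment of resume is still 1000.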
System Counter
The System Counter generates the system count value that is distributed to all the cores in the system.
This means that all cores share the same view of the passing of time.
Consider the following example:
- Device A reads the current system count and adds it to a message as a timestamp, then sends the message to Device B.
- When Device B receives the message, it compares the timestamp to the current system count.
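Because both devices see the same count, the comparison in this example is a plain subtraction. The sketch below shows it, together with a conversion to microseconds using the counter frequency that software would read from CNTFRQ_EL0. The function names are hypothetical, and the 54 MHz figure in the usage note is only an example value.

```c
#include <stdint.h>

/* Ticks elapsed between a message timestamp and the current system count. */
uint64_t elapsed_ticks(uint64_t timestamp, uint64_t now)
{
    return now - timestamp;
}

/* Convert ticks to microseconds for a counter running at freq_hz.
 * freq_hz must be non-zero. The whole seconds and the remainder are
 * converted separately so the multiplication by 1000000 does not overflow
 * for frequencies that fit in 32 bits. */
uint64_t ticks_to_us(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t whole = ticks / freq_hz;
    uint64_t rem   = ticks % freq_hz;
    return whole * 1000000u + (rem * 1000000u) / freq_hz;
}
```

With a 54 MHz counter, a message stamped at count 1000 and received at count 55000 took 54000 ticks, which `ticks_to_us` reports as 1000 microseconds.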
The System Counter measures real time.
The count must continue to increment at its fixed frequency.
The System Counter provides two register frames: CNTControlBase and CNTReadBase.
Registers
CNTFRQ_EL0, Counter-timer Frequency register
This register is provided so that software can discover the frequency of the system counter.
It must be programmed with this value as part of system initialization.
The value of the register is not interpreted by hardware.
CNTFRQ_EL0 is a 64-bit register.
Bits [31:0] indicate the system counter clock frequency, in Hz.
CNTPCT_EL0, Counter-timer Physical Count register
This holds the 64-bit physical count value.
CNTVCT_EL0, Counter-timer Virtual Count register
This holds the 64-bit virtual count value.
The virtual count value is equal to the physical count value visible in CNTPCT_EL0 minus the virtual offset visible in CNTVOFF_EL2.
This register can be read using MRS with the following syntax:
MRS <Xt>, <systemreg>
CNTVOFF_EL2, Counter-timer Virtual Offset register
This holds the 64-bit virtual offset.
This is the offset between the physical count value visible in CNTPCT_EL0 and the virtual count value visible in CNTVCT_EL0.
MRS <Xt>, <systemreg>
MIDR, Main ID Register
Provides identification information for the PE, including an implementer code for the device and a device ID number.
There is one instance of this register that is used in both Secure and Non-secure states.
Some fields of the MIDR are IMPLEMENTATION DEFINED.
- Implementer, bits [31:24] This field must hold an implementer code that has been assigned by ARM.
- Variant, bits [23:20] An IMPLEMENTATION DEFINED variant number.
- Architecture, bits [19:16]
- PartNum, bits [15:4] An IMPLEMENTATION DEFINED primary part number for the device.
- Revision, bits [3:0] An IMPLEMENTATION DEFINED revision number for the device.
For example, the Implementer code for NVIDIA is 0x4E.
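The field layout listed above maps directly onto shifts and masks. The following is a minimal decoding sketch; the struct and function names are hypothetical, and the sample value used in the usage note is made up for illustration, apart from the 0x4E implementer code mentioned in the text.

```c
#include <stdint.h>

/* One member per MIDR field, using the bit positions listed above. */
struct midr_fields {
    uint8_t  implementer;   /* bits [31:24] */
    uint8_t  variant;       /* bits [23:20] */
    uint8_t  architecture;  /* bits [19:16] */
    uint16_t part_num;      /* bits [15:4]  */
    uint8_t  revision;      /* bits [3:0]   */
};

/* Split a 32-bit MIDR value into its fields. */
struct midr_fields midr_decode(uint32_t midr)
{
    struct midr_fields f;
    f.implementer  = (uint8_t)(midr >> 24);
    f.variant      = (uint8_t)((midr >> 20) & 0xF);
    f.architecture = (uint8_t)((midr >> 16) & 0xF);
    f.part_num     = (uint16_t)((midr >> 4) & 0xFFF);
    f.revision     = (uint8_t)(midr & 0xF);
    return f;
}
```

Decoding the made-up value 0x4E1F0042 gives implementer 0x4E, variant 1, architecture 0xF, part number 0x004 and revision 2.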
System Control Register (SCTLR)
The SCTLR provides the top level control of the system, including its memory system.
- EE, bit [25] The value of the PSTATE.E bit on branch to an exception vector or coming out of reset, and the endianness of stage 1 translation table walks in the PL1&0 translation regime. The possible values of this bit are:
- 0 Little-endian. PSTATE.E is cleared to 0 on taking an exception or coming out of reset. Stage 1 translation table walks in the PL1&0 translation regime are little-endian.
- 1 Big-endian. PSTATE.E is set to 1 on taking an exception or coming out of reset. Stage 1 translation table walks in the PL1&0 translation regime are big-endian.
- I, bit [12] Instruction access Cacheability control, for accesses at EL1 and EL0. Possible values of this bit are:
- 0 All instruction accesses to Normal memory from PL1 and PL0 are Non-cacheable for all levels of instruction and unified cache. If the value of SCTLR.M is 0, instruction accesses from stage 1 of the PL1&0 translation regime are to Normal, Outer Shareable, Inner Non-cacheable, Outer Non-cacheable memory.
- 1 All instruction accesses to Normal memory from PL1 and PL0 can be cached at all levels of instruction and unified cache. If the value of SCTLR.M is 0, instruction accesses from stage 1 of the PL1&0 translation regime are to Normal,
- C, bit [2] Cacheability control, for data accesses at EL1 and EL0:
- 0 All data accesses to Normal memory from PL1 and PL0, and all accesses to the PL1&0 stage 1 translation tables, are Non-cacheable for all levels of data and unified cache.
- 1 All data accesses to Normal memory from PL1 and PL0, and all accesses to the PL1&0 stage 1 translation tables, can be cached at all levels of data and unified cache.
- M, bit [0] MMU enable for EL1 and EL0 stage 1 address translation.
- 0 EL1 and EL0 stage 1 address translation disabled. See the SCTLR.I field for the behavior of instruction accesses to Normal memory.
- 1 EL1 and EL0 stage 1 address translation enabled.
SCTLR_EL1, System Control Register (EL1)
Provides top level control of the system, including its memory system, at EL1 and EL0.
AArch64 System register SCTLR_EL1 bits [31:0] are architecturally mapped to AArch32 System register SCTLR[31:0].
- DSSBS, bit [44] Default PSTATE.SSBS value on Exception Entry.
- When FEAT_SSBS is implemented
- 0 PSTATE.SSBS is set to 0 on an exception to EL1.
- 1 PSTATE.SSBS is set to 1 on an exception to EL1.
- Otherwise Reserved, RES0.
SSBS, Speculative Store Bypass Safe
This register is present only when FEAT_SSBS is implemented. Otherwise, direct accesses to SSBS are UNDEFINED.
HCR_EL2, Hypervisor Configuration Register (EL2)
Provides configuration controls for virtualization, including defining whether various Non-secure operations are trapped to EL2.
- RW, bit [31] Execution state control for lower Exception levels:
- 0 Lower levels are all AArch32.
- 1 The Execution state for EL1 is AArch64.
The Execution state for EL0 is determined by the current value of PSTATE.nRW when executing at EL0.
SCR_EL3, Secure Configuration Register (EL3)
Defines the configuration of the current Security state. It specifies:
- The Security state of EL0 and EL1, either Secure or Non-secure.
- The Execution state at lower Exception levels.
- Whether IRQ, FIQ, SError interrupts, and External abort exceptions are taken to EL3.
- RW, bit [10] Execution state control for lower Exception levels.
- 0 Lower levels are all AArch32.
- 1 The next lower level is AArch64.
- If EL2 is present:
- EL2 is AArch64.
- EL2 controls EL1 and EL0 behaviors.
- If EL2 is not present:
- EL1 is AArch64.
- EL0 is determined by the Execution state described in the current process state when executing at EL0.
- Bits [5:4] Reserved, RES1.
- NS, bit [0] Non-secure bit.
- 0 Indicates that EL0 and EL1 are in Secure state, and so memory accesses from those Exception levels can access Secure memory. When executing at EL3:
- The AT S1E2R, AT S1E2W, TLBI VAE2, TLBI VALE2, TLBI VAE2IS, TLBI VALE2IS, TLBI ALLE2, and TLBI ALLE2IS System instructions are UNDEFINED.
- Each AT S12E** System instruction executes as the corresponding AT S1E** instruction. For example, AT S12E0R executes as AT S1E0R.
- Each of the TLBI IPAS2E1, TLBI IPAS2E1IS, TLBI IPAS2LE1, and TLBI IPAS2LE1IS System instructions executes as a NOP.
- A TLBI VMALLS12E1 System instruction executes as TLBI VMALLE1, and a TLBI VMALLS12E1IS System instruction executes as TLBI VMALLE1IS.
- 1 Indicates that EL0 and EL1 are in Non-secure state, and so memory accesses from those Exception levels cannot access Secure memory.
SPSR_EL3, Saved Program Status Register (EL3)
Holds the saved process state when an exception is taken to EL3.
ACTLR, Auxiliary Control Register
AArch32 System register ACTLR provides IMPLEMENTATION DEFINED configuration and control options for execution at EL1 and EL0.
ACTLR is a 32-bit register, and is part of:
- The Other system control registers functional group.
- The Implementation defined functional group.
ACTLR_EL1, Auxiliary Control Register (EL1)
Provides IMPLEMENTATION DEFINED configuration and control options for execution at EL1 and EL0.
ACTLR_EL1 is a 64-bit register.
ACTLR_EL2, Auxiliary Control Register (EL2)
Provides IMPLEMENTATION DEFINED configuration and control options for EL2.
ACTLR_EL3, Auxiliary Control Register (EL3)
Provides IMPLEMENTATION DEFINED configuration and control options for EL3.
ACTLR_EL3 is a 64-bit register.
MPIDR_EL1, Multiprocessor Affinity Register, EL1
The MPIDR_EL1 provides an additional core identification mechanism for scheduling purposes in a cluster.
What constitutes a processing element (PE) in an ARM core or cluster is described by the MPIDR system register.
The MPIDR_EL1 enables software to determine on which core it is executing.
This register has a different value for each processing element in the system.
The format of this is as follows (for AArch64):
- RES0, [63:40] Reserved.
- Aff3, [39:32] Affinity level 3. Highest level affinity field.
- RES1, [31] Reserved.
- U, [30] Indicates whether this is a single core or a multi-core cluster. 0 means the core is part of a multiprocessor system. This is the value for implementations with more than one core, and for implementations with an ACE or CHI master interface.
- [29:25] Reserved.
- MT, [24] Indicates whether the lowest level of affinity consists of logical cores that are implemented using a multithreading type approach.
- Aff2, [23:16] Affinity level 2.
- Aff1, [15:12] Part of Affinity level 1. Read-As-Zero.
- Aff1, [11:8] Part of Affinity level 1. CPUID. Identification number for each CPU in the Cortex-A75 cluster:
- 0x0 MP1: CPUID: 0
- ...
- 0x7 MP8: CPUID: 7
- Aff0, [7:0] Affinity level 0. The level identifies individual threads within a multithreaded core. The Cortex-A75 core is single-threaded, so this field has the value 0x00.
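Cores are independent of one another and can execute in parallel. Each core has its own physical hardware, such as registers and L1 and L2 caches.
Intel built on the core concept with Hyper-Threading, in which a single core presents several logical cores; each of these is called a thread.
A thread in this sense is a logical processing unit rather than a separate physical core.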
The affinity fields give a hierarchical description of the core's location relative to other cores.
Typically,
- Affinity 0 is the core ID within the cluster
- Affinity 1 is the cluster ID.
// Read the current CPU ID; if it is not 0 (the primary core), branch to halt and stay there.
// mrs -- Move the contents of a special register to a general-purpose register.
// mpidr_el1 is read to obtain the core ID.
mrs x1, mpidr_el1
and x1, x1, #0xFF // CPU number is in MPIDR Affinity Level 0
cbnz x1, halt // Hang for all non-primary CPU
arch/arm64/include/asm/sysreg.h
#define read_sysreg_s(r) ({ \
	u64 __val; \
	asm volatile(__mrs_s("%0", r) : "=r" (__val)); \
	__val; \
})
arch/arm64/include/asm/cputype.h
#define read_cpuid(reg) read_sysreg_s(SYS_ ## reg)
arch/arm/include/asm/cputype.h
#define CPUID_MPIDR 5
static inline unsigned int __attribute_const__ read_cpuid_mpidr(void)
{
	return read_cpuid(CPUID_MPIDR);
}
ARM GCC Inline Assembler Cookbook
The GNU C compiler for ARM RISC processors offers a way to embed assembly language code into C programs.
GCC asm statement
With inline assembly you can use the same assembler instruction mnemonics as you'd use for writing pure ARM assembly code.
Basic inline assembly syntax
__asm [volatile] (code);
code is the assembly instruction.
For example:
/* NOP example */
asm("mov r0,r0");
You can write more than one assembler instruction in a single inline asm statement.
asm(
"mov r0, r0\n\t"
"mov r0, r0\n\t"
"mov r0, r0\n\t"
"mov r0, r0"
);
Extended inline assembly syntax
However, registers and constants are specified in a different way if they refer to C expressions.
__asm [volatile] ( code_template : output operand list : input operand list : clobber list);
code_template is a template for an assembly instruction.
The connection between assembly language and C operands is provided by an optional second and third part of the asm statement, the list of output and input operands.
Each operand consists of:
- a symbolic name in square brackets
- a constraint string
- "=r" for the output operands
- "r" for the input operands
- a C expression in parentheses.
/* Rotating bits example */
asm("mov %[result], %[value], ror #1" : [result] "=r" (y) : [value] "r" (x));
The following example sets the current program status register of the ARM CPU. It uses an input, but no output operand.
asm("msr cpsr,%[ps]" : : [ps] "r" (status));
ARM Trusted Firmware Porting Guide
Introduction
Porting the ARM Trusted Firmware to a new platform involves making some mandatory and optional modifications for both the cold and warm boot paths.
Common Modifications
Common mandatory modifications
A platform port must enable the Memory Management Unit (MMU) with identity mapped page tables, and enable both the instruction and data caches for each BL stage.
In the ARM FVP port, each BL stage configures the MMU in its platform-specific architecture setup function, for example blX_plat_arch_setup().
2.2 Handling reset
BL1 by default implements the reset vector where execution starts from a cold or warm boot.
BL3-1 can be optionally set as a reset vector using the RESET_TO_BL31 make variable.
2.3 Common optional modifications
The following are helper functions implemented by the firmware that perform common platform-specific tasks.
- int platform_get_core_pos(unsigned long) A platform may need to convert the MPIDR of a CPU to an absolute number, which can be used as a CPU-specific linear index into blocks of memory.
This routine contains a simple mechanism to perform this conversion, using the assumption that each cluster contains a maximum of 4 CPUs:
linear index = cpu_id + (cluster_id * 4)
cpu_id = 8-bit value in MPIDR at affinity level 0
cluster_id = 8-bit value in MPIDR at affinity level 1
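The conversion described above is short enough to sketch in C. This is an illustrative host-side version, not the firmware's own routine: the function name and the CPUS_PER_CLUSTER macro are hypothetical, and the limit of 4 CPUs per cluster is the assumption stated in the text.

```c
#include <stdint.h>

#define CPUS_PER_CLUSTER 4u  /* assumption: at most 4 CPUs per cluster */

/* linear index = cpu_id + (cluster_id * 4) */
unsigned int core_pos_from_mpidr(uint64_t mpidr)
{
    unsigned int cpu_id     = (unsigned int)(mpidr & 0xFF);        /* affinity level 0 */
    unsigned int cluster_id = (unsigned int)((mpidr >> 8) & 0xFF); /* affinity level 1 */
    return cpu_id + cluster_id * CPUS_PER_CLUSTER;
}
```

For an MPIDR value of 0x80000103 (cluster 1, CPU 3) the index is 3 + 1 * 4 = 7, so per-CPU data can live in a flat array indexed by this number.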
3 Boot Loader stage specific modifications
3.1 Boot Loader stage 1 (BL1)
3.2 Boot Loader stage 2 (BL2)
3.3 Boot Loader stage 3-1 (BL3-1)
3.3.1 Power State Coordination Interface (in BL3-1)
The ARM Trusted Firmware's implementation of the PSCI API is based around the concept of an affinity instance.
Each affinity instance can be uniquely identified in a system by a CPU ID (the processor MPIDR is used in the PSCI interface) and an affinity level.
CPU affinity enables binding a process or multiple processes to a specific CPU core in a way that the process(es) will run from that specific core only.
When running performance tests on a host with many cores, it is wise to run multiple instances of a process, each on a different core.
This enables higher CPU utilization.
PSCI implementation (in BL3-1)
Interrupt Management framework (in BL3-1)
Crash Reporting mechanism (in BL3-1)
C Library
Storage abstraction layer
Fixed Virtual Platforms(FVP)
Fixed Virtual Platforms (FVPs) are complete simulations of an Arm system, including processor, memory and peripherals.
These are set out in a "programmer's view", which gives you a comprehensive model on which to build and test your software.
Learning operating system development using Linux kernel and Raspberry Pi
Introduction
Contribution guide
Prerequisites
Lesson 1: Kernel Initialization
1.1 Introducing RPi OS, or bare metal “Hello, world!”
Linux
1.2 Project structure
1.3 Kernel build system
1.4 Startup sequence
1.5 Exercises
Lesson 2: Processor initialization
2.1 RPi OS
Exception levels
Each ARM processor that supports the ARMv8 architecture has 4 exception levels.
You can think about an exception level (or EL for short) as a processor execution mode in which only a subset of all operations and registers is available.
The least privileged exception level is level 0. When the processor operates at this level, it mostly uses only general purpose registers (X0 - X30) and the stack pointer register (SP). EL0 also allows using the LDR and STR instructions to load and store data from and to memory, and a few other instructions commonly used by a user program.
An operating system should deal with exception levels because it needs to implement process isolation.
A user process should not be able to access another process’s data.
To achieve such behavior, an operating system always runs each user process at EL0.
Operating at this exception level, a process can only use its own virtual memory and can’t execute any instructions that change virtual memory settings.
So, to ensure process isolation, an OS needs to prepare a separate virtual memory mapping for each process and put the processor into EL0 before transferring execution to a user process.
An operating system itself usually works at EL1.
While running at this exception level, the processor gets access to the registers that allow configuring virtual memory settings, as well as to some system registers. Raspberry Pi OS will also be using EL1.
EL2 is used in a scenario where we are using a hypervisor.
In this case the host operating system runs at EL2 and guest operating systems can only use EL1.
This allows the host OS to isolate guest OSes in a similar way to how an OS isolates user processes.
EL3 is used for transitions between the ARM “Secure world” and the “Normal world”.
This abstraction exists to provide full hardware isolation between the software running in the two different “worlds”.
An application from the “Normal world” has no way to access or modify information (both instructions and data) that belongs to the “Secure world”, and this restriction is enforced at the hardware level.
Finding current Exception level
A small function can report the exception level at which the processor is currently running:
.globl get_el
get_el:
    mrs x0, CurrentEL
    lsr x0, x0, #2
    ret
Here we use the mrs instruction to read the value from the CurrentEL system register into the x0 register.
Then we shift this value 2 bits to the right (we need to do this because the lowest 2 bits in the CurrentEL register are reserved and always have the value 0).
Finally, register x0 holds an integer indicating the current exception level.
To display this value:
int el = get_el();
printf("Exception level: %d \r\n", el);
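The shift performed by get_el can be checked on any host with a small C model. This is only a sketch of the bit manipulation: the function name is hypothetical, it takes the raw register value as a parameter instead of reading CurrentEL, and the extra mask is a defensive addition that the assembly version does not have.

```c
#include <stdint.h>

/* CurrentEL keeps the exception level in bits [3:2]; bits [1:0] are
 * reserved and read as 0, so shifting right by 2 yields the level. */
unsigned int el_from_currentel(uint64_t currentel)
{
    return (unsigned int)((currentel >> 2) & 0x3);
}
```

A raw value of 0x4 therefore decodes to EL1, 0x8 to EL2 and 0xC to EL3.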
Changing current exception level
In the ARM architecture there is no way for a program to increase its own exception level without the participation of software that already runs at a higher level.
The current EL can be changed only if an exception is generated. This can happen if:
- a program performs an illegal operation, for example tries to access memory at a nonexistent address or executes an undefined instruction
- an application runs the svc instruction to generate an exception on purpose
- a hardware interrupt occurs
When an exception is taken to ELn, the following steps take place:
- The address of the current instruction is saved in the ELR_ELn register (Exception Link Register).
- The current processor state is stored in the SPSR_ELn register (Saved Program Status Register).
- An exception handler is executed and does whatever job it needs to do. The exception handler also needs to save the state of all general purpose registers and restore it afterwards.
- The exception handler executes the eret instruction. This instruction restores the processor state from SPSR_ELn and resumes execution at the address stored in the ELR_ELn register.
Two points are worth noting:
- The exception handler is not obliged to return to the same location from which the exception originated.
- Both ELR_ELn and SPSR_ELn are writable, and the exception handler can modify them if it wants to.
Switching to EL1
Strictly speaking, an operating system is not obliged to switch to EL1, but EL1 is a natural choice because this level has just the right set of privileges to implement all common OS tasks.
#include "arm/sysregs.h"
#include "mm.h"

.section ".text.boot"

.globl _start
_start:
    mrs x0, mpidr_el1
    and x0, x0,#0xFF // Check processor id
    cbz x0, master // Hang for all non-primary CPU
    b proc_hang

proc_hang:
    b proc_hang

master:
    ldr x0, =SCTLR_VALUE_MMU_DISABLED
    msr sctlr_el1, x0

    ldr x0, =HCR_VALUE
    msr hcr_el2, x0

    ldr x0, =SCR_VALUE
    msr scr_el3, x0

    ldr x0, =SPSR_VALUE
    msr spsr_el3, x0

    adr x0, el1_entry
    msr elr_el3, x0

    eret

el1_entry:
    adr x0, bss_begin
    adr x1, bss_end
    sub x1, x1, x0
    bl memzero

    mov sp, #LOW_MEMORY
    bl kernel_main
    b proc_hang // should never come here
Analysis:
- sctlr_el1 sctlr_el1 is responsible for configuring different parameters of the processor when it operates at EL1. For example, it controls whether the cache is enabled and, what is most important for us, whether the MMU (Memory Management Unit) is turned on. sctlr_el1 is accessible from all exception levels higher than or equal to EL1 (you can infer this from the _el1 postfix).
// Some bits in the description of sctlr_el1 register are marked as RES1.
// Those bits are reserved for future usage and should be initialized with 1.
#define SCTLR_RESERVED ((3 << 28) | (3 << 22) | (1 << 20) | (1 << 11))
// This field controls endianness of explicit data access at EL1.
// We are going to configure the processor to work only with little-endian format.
#define SCTLR_EE_LITTLE_ENDIAN (0 << 25)
// This one controls endianness of explicit data access at EL0.
#define SCTLR_EOE_LITTLE_ENDIAN (0 << 24)
// Disable instruction cache.
#define SCTLR_I_CACHE_DISABLED (0 << 12)
// Disable data cache.
#define SCTLR_D_CACHE_DISABLED (0 << 2)
// Disable MMU.
#define SCTLR_MMU_DISABLED (0 << 0)
#define SCTLR_MMU_ENABLED (1 << 0)
#define SCTLR_VALUE_MMU_DISABLED (SCTLR_RESERVED | SCTLR_EE_LITTLE_ENDIAN | SCTLR_I_CACHE_DISABLED | SCTLR_D_CACHE_DISABLED | SCTLR_MMU_DISABLED)
- hcr_el2 We are not going to implement our own hypervisor. Still, we need to use this register because, among other settings, it controls the execution state at EL1. Execution state must be AArch64 and not AArch32.
#define HCR_RW (1 << 31)
#define HCR_VALUE HCR_RW
- scr_el3 This register is responsible for configuring security settings. For example, it controls whether all lower levels are executed in “secure” or “nonsecure” state. It also controls execution state at EL2.
#define SCR_RESERVED (3 << 4)
#define SCR_RW (1 << 10)
#define SCR_NS (1 << 0)
#define SCR_VALUE (SCR_RESERVED | SCR_RW | SCR_NS)
- spsr_el3 spsr_el3 contains the processor state that will be restored after we execute the eret instruction. It is worth saying a few words explaining what processor state is. Processor state includes the following information:
- Condition Flags Those flags contain information about the previously executed operation: whether the result was negative (N flag), zero (Z flag), has unsigned overflow (C flag) or has signed overflow (V flag). Values of those flags can be used in conditional branch instructions. For example, the b.eq instruction will jump to the provided label only if the result of the last comparison operation is equal to 0. The processor checks this by testing whether the Z flag is set to 1.
- Interrupt disable bits Those bits allow enabling and disabling different types of interrupts.
- Some other information, required to fully restore the processor execution state after an exception is handled.
Normally spsr_el3 is filled in automatically when an exception is taken to EL3. However, this register is writable, so we take advantage of this fact and manually prepare the processor state.
// After we change EL to EL1 all types of interrupts will be masked (or disabled, which is the same).
#define SPSR_MASK_ALL (7 << 6)
// At EL1 we can either use our own dedicated stack pointer or use EL0 stack pointer.
// EL1h mode means that we are using EL1 dedicated stack pointer.
#define SPSR_EL1h (5 << 0)
#define SPSR_VALUE (SPSR_MASK_ALL | SPSR_EL1h)
- elr_el3 Holds the address at which execution resumes after eret. The startup code sets it to the el1_entry label.
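As a cross-check of SPSR_VALUE, the value can be composed and decoded on a host machine. The sketch below repeats the two SPSR #defines with unsigned constants and adds three hypothetical decoding helpers. It assumes the AArch64 SPSR layout in which bits [3:2] of the mode field select the target exception level and bit [0] selects the dedicated stack pointer.

```c
#include <stdint.h>

/* Same composition as the SPSR #defines in the text. */
#define SPSR_MASK_ALL (7u << 6)   /* three mask bits, bits [8:6] */
#define SPSR_EL1h     (5u << 0)   /* mode field, bits [3:0] */
#define SPSR_VALUE    (SPSR_MASK_ALL | SPSR_EL1h)

/* Exception level that eret returns to: bits [3:2] of the mode field. */
unsigned int spsr_target_el(uint32_t spsr)
{
    return (spsr >> 2) & 0x3u;
}

/* 1 if the dedicated SP_ELx stack pointer is selected (the "h" suffix),
 * 0 if SP_EL0 is selected. */
unsigned int spsr_uses_dedicated_sp(uint32_t spsr)
{
    return spsr & 0x1u;
}

/* 1 if all three bits covered by SPSR_MASK_ALL are set. */
unsigned int spsr_all_masked(uint32_t spsr)
{
    return (spsr & SPSR_MASK_ALL) == SPSR_MASK_ALL;
}
```

SPSR_VALUE works out to 0x1C5: the mode field 5 decodes to EL1 with the dedicated stack pointer, and bits 6 to 8 are all set, which is what the startup code relies on when it executes eret.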
2.2 Linux
2.3 Exercises
Lesson 3: Interrupt handling
3.1 RPi OS
Linux
3.2 Low level exception handling
3.3 Interrupt controllers
3.4 Timers
3.5 Exercises
Lesson 4: Process scheduler
4.1 RPi OS
Linux
4.2 Scheduler basic structures
4.3 Forking a task
4.4 Scheduler
4.5 Exercises
Lesson 5: User processes and system calls
5.1 RPi OS
5.2 Linux
5.3 Exercises
Lesson 6: Virtual memory management
6.1 RPi OS
6.2 Linux (In progress)
6.3 Exercises
Lesson 7: Signals and interrupt waiting (To be done)
Lesson 8: File systems (To be done)
Lesson 9: Executable files (ELF) (To be done)
Lesson 10: Drivers (To be done)
Lesson 11: Networking (To be done)
Building Embedded Systems: Developing Firmware That Runs on STM32 (嵌入式系統建構:開發運作於STM32的韌體程式)
Programming with 64-Bit ARM Assembly Language
Single Board Computer Development for Raspberry Pi and Mobile Devices
Stephen Smith
Introduction
This book delves into how these are programmed at the bare metal level and provides insight into their architecture.
Knowing how the processor works will let you write more efficient C code.
Source Code Location: https://github.com/Apress/Programming-with-64-Bit-ARM--Assembly-Languag
CHAPTER 1 Getting Started
The idea was to use reduced instruction set computer (RISC) technology as opposed to complex instruction set computer (CISC).
Writing in Assembly is harder, as you must solve problems with memory addressing and CPU registers that are all handled transparently by high-level languages.
Hardware
- Broadcom BCM2711, quad-core Cortex-A72 (ARM v8) 64-bit processor at 1.5 GHz
- 4GB LPDDR4-3200 SDRAM
- 2.4 GHz/5.0 GHz IEEE 802.11b/g/n/ac wireless networking, Bluetooth 5.0 BLE
- Gigabit Ethernet
- 2 USB 3.0 ports; 2 USB 2.0 ports
- Raspberry Pi standard 40-pin GPIO header
- 2 micro-HDMI ports (display output up to 4Kp60)
- 2-lane MIPI DSI display port
- 2-lane MIPI CSI camera port
- 4-pole stereo audio and composite video port
- H264 (1080p60 decode, 1080p30 encode)
- OpenGL ES 3.0 graphics
- Micro-SD card slot
- 5V DC input via USB-C connector (minimum 3A)
- 5V DC input via GPIO header (minimum 3A)
- 5V DC input via PoE (requires a separate PoE add-on board)
- Operating temperature: 0 - 50 degrees C
Software
Raspberry Pi OS with desktop
- Downloading and Installing the Operating System
- Raspberry Pi OS (Legacy) with desktop
https://downloads.raspberrypi.org/raspios_oldstable_armhf/images/raspios_oldstable_armhf-2022-04-07/2022-04-04-raspios-buster-armhf.img.xz
https://downloads.raspberrypi.org/raspios_lite_armhf/images/raspios_lite_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf-lite.img.xz
https://downloads.raspberrypi.org/raspios_armhf/images/raspios_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf.img.xz
https://downloads.raspberrypi.org/raspios_full_armhf/images/raspios_full_armhf-2022-04-07/2022-04-04-raspios-bullseye-armhf-full.img.xz
https://downloads.raspberrypi.org/raspios_arm64/images/raspios_arm64-2022-04-07/2022-04-04-raspios-bullseye-arm64.img.xz
$ sudo apt install rpi-imager
Open Raspberry Pi Imager and choose the required OS from the list presented.
Or, on Linux, you can use the standard command line tools:
$ sudo dd if=2021-10-30-raspios-bullseye-armhf.img of=/dev/sdX bs=4M conv=fsync
$ tree /dev/disk/
...
└── by-uuid
    ├── 137e2641-afc7-4d05-bfbf-a40cad4f8261 -> ../../sda1 (swap, 8G)
    ├── ec464b47-461d-4b86-acc8-6ab342d6a8e3 -> ../../sda2 (/usr, 8G)
    ├── cff70456-f637-4eed-945b-3c95a8bc48db -> ../../sda3 (/opt, 1G)
    ├── 69b33879-49f9-4d4e-b787-07b0b60211ba -> ../../sda5 (/var, 2G)
    └── f0d406f3-3da0-4fa7-8aa7-9eaf2b74047e -> ../../sda6 (/home, 9.6G)
$ sudo mkswap /dev/sda1
$ sudo swapon -U 137e2641-afc7-4d05-bfbf-a40cad4f8261
$ cat /etc/fstab
proc /proc proc defaults 0 0
PARTUUID=003e8b7d-01 /boot vfat defaults,flush 0 2
PARTUUID=003e8b7d-02 / ext4 defaults,noatime 0 1
# a swapfile is not a swap partition, no line here
# use dphys-swapfile swap[on|off] for that
# /dev/sda1 for swap
UUID=137e2641-afc7-4d05-bfbf-a40cad4f8261 none swap sw 0 0
# /dev/sda2 for /usr
UUID=ec464b47-461d-4b86-acc8-6ab342d6a8e3 /usr ext4 defaults 0 2
# /dev/sda3 for /opt
UUID=cff70456-f637-4eed-945b-3c95a8bc48db /opt ext4 defaults 0 2
# /dev/sda5 for /var
UUID=69b33879-49f9-4d4e-b787-07b0b60211ba /var ext4 defaults 0 2
# /dev/sda6 for /home
UUID=f0d406f3-3da0-4fa7-8aa7-9eaf2b74047e /home ext4 defaults 0 2
$ sudo apt-get -y install scim-chewing登出後再重新登入,即會生效. 輸入框按下「Ctrl」 + 「Space」後,螢幕右上角圖示變更,此時可輸入中文。 If you are installing Raspberry Pi OS Lite and intend to run it headless, you will still need to create a new user account. Since you will not be able to create the user account on first boot, you MUST configure the operating system using the Advanced Menu.
Ubuntu
$ xzcat /home/jerry/Downloads/ubuntu-22.04-preinstalled-desktop-arm64+raspi.img.xz | sudo dd of=/dev/sdd bs=32M; sync
Kali Linux
Kali Linux works very well, and we will be using it to test all the programs in this book.
Kali Linux contains several hundred tools targeted towards various information security tasks, such as Penetration Testing, Security Research, Computer Forensics and Reverse Engineering.
To install a pre-built image of the standard build of Kali Linux on your Raspberry Pi 4, follow these instructions:
- Get a fast microSD card with at least 16GB capacity. Class 10 cards are highly recommended.
- Download and validate our preferred Kali Raspberry Pi 4 image from the downloads area. The process for validating an image is described in more detail on Downloading Kali Linux.
- Use the dd utility to image this file to your microSD card (the same process as making a Kali USB). Assume the storage device is located at /dev/sdd.
$ xzcat kali-linux-2022.1-raspberry-pi-arm64.img.xz | sudo dd of=/dev/sdd bs=4M status=progress
You should be able to log in to Kali.
User: kali Password: kali
Enable ssh login:
- Install Kali Linux remote SSH-OpenSSH server
- Enable Kali Linux Remote SSH Service
- Check whether the service is running.
$ sudo apt-get install ssh
$ sudo service ssh start
$ sudo update-rc.d -f ssh remove
$ sudo update-rc.d -f ssh defaults
$ sudo apt-get install chkconfig
$ sudo chkconfig -l ssh
ARM Assembly Instructions
The ARM is what is called a RISC computer: there are fewer instructions and each one is simple, so the processor can execute each instruction quickly.
CPU Registers
The registers are part of the CPU circuitry, allowing instant access, whereas memory is a separate component and there is a transfer time for the CPU to access it.
In all computers, data is not operated on in the computer’s memory; instead it’s loaded into a CPU register, then the data processing or arithmetic operation is performed in the registers.
If you want to add two numbers, you might do the following:
- Load one into one register and the other into another register.
- Perform the add operation putting the result into a third register.
- Copy the answer from the results register into memory.
- X0–X30 These 31 registers are general purpose; you can use them for anything you like, though some have standard agreed-upon usage that we will cover later.
- SP, XZR The stack pointer or zero register depending on the context.
- X30, LR The link register. If you call a function, this register will be used to hold the return address. As this is a common operation, you should avoid using this register for other things.
- PC The program counter. The memory address of the currently executing instruction.
- W0–W30 The lower 32 bits of X0–X30. Using 32 bits saves memory.
ARM Instruction Format
Each ARM binary instruction is 32 bits long.
Every bit in the instruction is used to tell the processor what to do.
There are quite a few instruction formats, and it can be helpful to know how the bits for each instruction are packed into 32 bits.
Since there are 32 registers in user mode, it takes 5 bits to specify a register.
Because instructions are small and fixed length, the processor doesn’t need to start decoding an instruction to know how long it is and hence where the next instruction starts.
This is a key feature to allowing processing parallelism and efficiency.
Each instruction that takes registers can either use the 32-bit W version or the 64-bit X version.
To specify which is the case, the high bit of each instruction specifies how we are viewing the registers.
The data processing instructions are the move, arithmetic, logical, comparison and multiply instructions.
An instruction in isolation takes three clock cycles:
- one to load the instruction from memory
- one to decode the instruction, and then
- one to execute the instruction
Computer Memory
The 64-bit mode means:
- Memory addresses are specified using 64 bits.
- The CPU registers are each 64 bits wide and perform 64-bit integer arithmetic.
You can load from memory by using a register to specify the address to load.
This is called indirect memory access.
About the GCC Assembler
The general way you specify Assembly instructions is:
label: opcode operands
- label: optional and only required if you want the instruction to be the target of a branch instruction.
- opcodes each one is a short mnemonic such as
- ADD for addition
- LDR for load a register
- B for branch
- There are quite a few different formats for the operands
- Install the GNU Compilers Collection (GCC)’s toolchain for the x86_64 platform
$ sudo apt install -y build-essential
$ sudo apt install -y crossbuild-essential-arm64
$ sudo apt install -y crossbuild-essential-armhf
$ sudo apt update && sudo apt dist-upgrade
$ sudo apt-get install build-essential gawk gcc g++ gfortran git texinfo bison libncurses-dev bc flex libssl-dev make
Hello World
HelloWorld.s:
.global _start // Provide program starting address
_start: mov x0, #1 /* 1 = StdOut */
        ldr x1, =helloworld /* string to print */
        mov x2, #13 /* length of our string */
        mov x8, #64 /* linux write() system call */
        svc 0 /* call Linux system call */
// setup parameters to exit the program gracefully
        mov x0, #0 // return code = 0
        mov x8, #93 // service call 93
        svc 0 /* call Linux system call */
.data
helloworld: .ascii "Hello World!\n"
Build the executable:
$ as -o HelloWorld.o HelloWorld.s
$ ld -o HelloWorld HelloWorld.o
$ ./HelloWorld
Hello World!
About Comments
Comments work the same way as in C/C++ code:
- // double slashes
- /* and */
Where to Start
The Assembler marks the statement containing _start as the program entry point; then the linker can find it.
Only one file can contain _start.
Assembly Instructions
svc 0 is the command that executes software interrupt number 0.
This branches to the interrupt handler in the Linux kernel.
Data
A label “helloworld” followed by an .ascii directive which allocates one or more bytes of memory in the current section, and defines the initial contents of the memory from a string literal.
Calling Linux
This program makes two Linux system calls to do its work:
- The first is the Linux write to file command (#64).
- The second is the Linux exit command (#93).
The calling convention is the same for both:
- Each system call is selected by putting its function number in X8.
- The parameters go in registers X0–X7, depending on how many parameters are needed.
- A return code is placed in X0 for checking the execution result.
Reverse Engineering Our Program
$ objdump -s -d HelloWorld.o
HelloWorld.o: file format elf64-littleaarch64
Contents of section .text:
 0000 200080d2 e1000058 a20180d2 080880d2 ......X........
 0010 010000d4 000080d2 a80b80d2 010000d4 ................
 0020 00000000 00000000 ........
Contents of section .data:
 0000 48656c6c 6f20576f 726c6421 0a Hello World!.
Disassembly of section .text:
0000000000000000 <_start>:
   0: d2800020 mov x0, #0x1 // #1
   4: 580000e1 ldr x1, 20 <_start+0x20>
   8: d28001a2 mov x2, #0xd // #13
   c: d2800808 mov x8, #0x40 // #64
  10: d4000001 svc #0x0
  14: d2800000 mov x0, #0x0 // #0
  18: d2800ba8 mov x8, #0x5d // #93
  1c: d4000001 svc #0x0
  ...
Let’s investigate the binary representation of the first MOV instruction, which was assembled to 0xd2800020:
- The 1st bit is 1 It means to use the 64-bit version of the registers, in this case X0 rather than W0.
- The 3rd bit is 0 It means that this instruction doesn’t set any flags that would affect conditional instructions.
- The 2nd bit combined with the 4-th to 9-th bits make up the opcode for this MOV instruction. This is move wide immediate, meaning it contains a 16-bit immediate value as the operand.
- The 10-th and 11-th bits of 0 indicate there is no shift operation involved.
- The 12-th to 27-th bits are the immediate value which is 1
- The last 5 bits are the register to load. These are 0 since we are loading register X0.
Chapter 2: Loading and Adding
This chapter builds an understanding of the ARM instruction set by going slowly through the MOV and ADD instructions.
Negative Numbers
If negative numbers are stored with a separate sign bit, the CPU must look at the sign bits, then decide whether to add or subtract and in which order.
About Two’s Complement
To form the two’s complement, change all the 1s to 0s and all the 0s to 1s and then add 1.
-3 can be represented as
~ (0000 0011) + 1 = 1111 1101 = 0xFD
For a 1-byte calculation,
5 - 3 = 5 + (-3) = 5 + 0xFD = 0x102, which is 0x02 = 2 once the carry out of the byte is dropped
About Gnome Programmer’s Calculator
The Gnome programmer’s calculator can calculate the two’s complement.
About One’s Complement
If we don’t add 1, and just change all the 1s to 0s and vice versa, then this is called one’s complement.
Big vs. Little Endian
Big endian is how we normally deal with numbers: the most significant byte or digits are placed leftmost in the structure (the big end, the low memory address). Known as the "network byte order," the TCP/IP Internet protocol also uses big endian regardless of the hardware at either end.
About Bi-endian
Pros of Little Endian
Even though Linux uses little endian, many protocols like TCP/IP used on the Internet use big endian and so require a transformation when moving data from the computer to the outside world.
Shifting and Rotating
0x30 = 3 * 16 = 3 * 2^4
About Carry Flag
When instructions execute, they can optionally set some flags that contain useful information on what happened. Then other instructions can test these flags and process accordingly.
About the Barrel Shifter
Basics of Shifting and Rotating
- Logical shift left The last bit shifted out ends up in the carry flag.
- Logical shift right The last bit shifted out ends up in the carry flag.
- Arithmetic shift right If we want to preserve the sign bit, use arithmetic shift right. Here a 1 comes in from the left, if the number is negative, and a 0 if it is positive.
- Rotate right
Loading Registers
Instruction Aliases
MOV isn’t an ARM Assembly instruction; it’s an alias.
The Assembler finds a real ARM instruction to do the job.
For ex.,
ADD X0, XZR, X1
This instruction adds the contents of register X1 to the zero register and puts the result in X0.
If you use objdump, it might show the same alias you used, another alternate alias, or the real instruction. There is a “-M no-aliases” option for objdump where you can see the true underlying instruction.
MOV/MOVK/MOVN
There are several forms of the MOV instruction:
- MOV (Register to Register) For example:
MOV X1, X2
This copies register X2 into register X1.
- MOV and MOVK (immediate) For example, to load register X2 with the 64-bit hex value 0x1234FEDC4F5D6E3A:
MOV X2, #0x6E3A
MOVK X2, #0x4F5D, LSL #16
MOVK X2, #0xFEDC, LSL #32
MOVK X2, #0x1234, LSL #48
The above example adds a shift operator to the second operand.
About Operand2
All the ARM’s data processing instructions have the option of taking a flexible Operand2 as one of their parameters.
There are three formats for Operand2:
- A register and a shift You can specify a register and a shift.
For ex.,
MOV X1, X2, LSL #1 // Logical shift left
MOV X1, X2, LSR #1 // Logical shift right
MOV X1, X2, ASR #1 // Arithmetic shift right
MOV X1, X2, ROR #1 // Rotate right
To make the code a little more readable, the Assembler provides mnemonics (aliases) for these to generate the same byte code,
LSL X1, X2, #1 // Logical shift left
LSR X1, X2, #1 // Logical shift right
ASR X1, X2, #1 // Arithmetic shift right
ROR X1, X2, #1 // Rotate right
- uxtb Unsigned extend byte
- uxth Unsigned extend halfword
- uxtw Unsigned extend word
- sxtb Sign-extend byte
- sxth Sign-extend halfword
- sxtw Sign-extend word
// Too big for #imm16
MOV X1, #0xAB000000
will be translated by the Assembler to
MOV x1, #0xAB00, LSL #16
MOVN(Move Not)
It works just like MOV, except it reverses all the 1s and 0s as it loads the register.
It applies a logical NOT operation to each bit in the word you are loading into the register.
Its main usage:
- To calculate the one’s complement
- Multiply by -1. The negative of a number is the two’s complement of the number, or the one’s complement plus one.
MOV Examples
This example illustrates the MOV instructions.
The program doesn’t do anything besides move various numbers into registers.
movexamps.s,
// Examples of the MOV instruction.
//
.global _start // Provide program starting address
// Load X2 with 0x1234FEDC4F5D6E3A first using MOV and MOVK
_start: MOV X2, #0x6E3A
        MOVK X2, #0x4F5D, LSL #16
        MOVK X2, #0xFEDC, LSL #32
        MOVK X2, #0x1234, LSL #48
// Just move W2 into W1
        MOV W1, W2
// Now lets see all the shift versions of MOV
        MOV X1, X2, LSL #1 // Logical shift left
        MOV X1, X2, LSR #1 // Logical shift right
        MOV X1, X2, ASR #1 // Arithmetic shift right
// Repeat the above shifts using mnemonics.
        LSL X1, X2, #1 // Logical shift left
        LSR X1, X2, #1 // Logical shift right
        ASR X1, X2, #1 // Arithmetic shift right
        ROR X1, X2, #1 // Rotate right
// Example that works with 8 bit immediate and shift
        MOV X1, #0xAB000000 // Too big for #imm16
// Example that can't be represented and results in an error
// Uncomment the instruction if you want to see the error
//      MOV X1, #0xABCDEF11 // Too big for #imm16 and can't be represented.
// Example of MOVN
        MOVN W1, #45
// Example of a MOV that the Assembler will change to MOVN
        MOV W1, #0xFFFFFFFE // (-2)
// Setup the parameters to exit the program
// and then call Linux to do it.
        MOV X0, #0 // Use 0 return code
        MOV X8, #93 // Service command code 93 terminates
        SVC 0 // Call linux to terminate
We can see the true ARM 64-bit instructions that are produced by the Assembler by using objdump:
$ objdump -s -d -M no-aliases movexamps.o
movexamps.o: file format elf64-littleaarch64
Contents of section .text:
 0000 42c78dd2 a2eba9f2 82dbdff2 8246e2f2 B............F..
 0010 e103022a e10702aa e10742aa e10782aa ...*......B.....
 0020 41f87fd3 41fc41d3 41fc4193 4104c293 A...A.A.A.A.A...
 0030 0160b5d2 a1058012 21008012 000080d2 .`......!.......
 0040 a80b80d2 010000d4 ........
Disassembly of section .text:
0000000000000000 <_start>:
   0: d28dc742 movz x2, #0x6e3a
   4: f2a9eba2 movk x2, #0x4f5d, lsl #16
   8: f2dfdb82 movk x2, #0xfedc, lsl #32
   c: f2e24682 movk x2, #0x1234, lsl #48
  10: 2a0203e1 orr w1, wzr, w2
  14: aa0207e1 orr x1, xzr, x2, lsl #1
  18: aa4207e1 orr x1, xzr, x2, lsr #1
  1c: aa8207e1 orr x1, xzr, x2, asr #1
  20: d37ff841 ubfm x1, x2, #63, #62
  24: d341fc41 ubfm x1, x2, #1, #63
  28: 9341fc41 sbfm x1, x2, #1, #63
  2c: 93c20441 extr x1, x2, x2, #1
  30: d2b56001 movz x1, #0xab00, lsl #16
  34: 128005a1 movn w1, #0x2d
  38: 12800021 movn w1, #0x1
  3c: d2800000 movz x0, #0x0
  40: d2800ba8 movz x8, #0x5d
  44: d4000001 svc #0x0
We can see the shift instructions were converted into UBFM, SBFM, and EXTR instructions.
ADD/ADC
These instructions all add their second and third parameters and put the result in their first parameter, the destination register (Rd):
ADD{S} Xd, Xs, Operand2
ADC{S} Xd, Xs, Operand2
The destination register (Rd) and the source register (Rs) can be the same.
Examples,
// the immediate value can be 12-bits, so 0-4095
// X2 = X1 + 4000
ADD X2, X1, #4000
// the shift on an immediate can be 0 or 12
// X2 = X1 + 0x20000
ADD X2, X1, #0x20, LSL 12
// simple addition of two registers
// X2 = X1 + X0
ADD X2, X1, X0
// addition of a register with a shifted register
// X2 = X1 + (X0 * 4)
ADD X2, X1, X0, LSL 2
// With register extension options
// X2 = X1 + signed extended byte(X0)
ADD X2, X1, X0, SXTB
// X2 = X1 + zero extended hal...
To print out a number, we must first convert the number to an ASCII string.
There is a trick: we can get one number out of our program via the program’s return code.
/* This is a comment */
.global _start /* 'main' is our entry point and must be global */
_start: /* This is main */
        mov w0, #2 /* Put a 2 inside the register w0 */
// Setup the parameters to exit the program and then call Linux to do it.
// W0 is the return code
        MOV X8, #93 // Service command code 93
        SVC 0 // Call linux to terminate
To see the return code after execution:
$ echo $?
2
Add with Carry
We can combine multiple ADD instructions to add arbitrarily large integers. The key to this is the carry flag.
When an addition overflows, it sets the carry flag.
The ARM processor adds 64 bits at a time, so we only need the carry flag if we are dealing with numbers larger than what will fit into 64 bits.
Instructions do not alter the condition flags by default. If we want an instruction to alter them, we place an “S” on the end of the opcode, and the Assembler sets the S bit (bit 29) when it builds the binary version of the instruction.
This example will add two 128-bit integers,
- registers X2 and X3 for the first 128-bit number
- registers X4 and X5 for the second 128-bit number
- X0 and X1 for the result.
ADDS X1, X3, X5 // Lower order 64-bits
ADC X0, X2, X4 // Higher order 64-bits
- ADDS adds the lower order 64 bits and sets the carry flag
- ADC adds the higher-order words, plus the carry flag
SUB/SBC
SUB{S} Xd, Xs, Operand2
SBC{S} Xd, Xs, Operand2
The carry flag is used to indicate when a borrow is necessary.
SUBS will clear the carry flag if the result is negative and set it if positive; SBC then subtracts one if the carry flag is clear.
Chapter 3: Tooling Up
GNU Make
Rebuilding a File
A Rule for Building .s Files
%.o : %.s
	as $< -o $@
HelloWorld: HelloWorld.o
	ld -o HelloWorld HelloWorld.o
- %.s is like a wildcard meaning any .s file.
- $< is a symbol for the source file.
- $@ is a symbol for the output file.
Defining Variables
TARGET = HelloWorld
OBJS = $(TARGET).o
GDB
sudo apt-get install gdb
Preparing to Debug
To add debug information to our program, we must assemble it with the -g flag.
Use a Makefile variable to control the debug flag,
ifdef DEBUG
DEBUGFLGS = -g
else
DEBUGFLGS =
endif
Beginning GDB
Commands:
- gdb executable
- run runs to completion
- list lists ten lines.
- disassemble _start shows the actual code produced by the Assembler with no comments.
- b _start sets a breakpoint. We can specify a line number or a symbol for our breakpoint.
- s step through the program
- i r see the values of the registers
- c continue to the next breakpoint
- i b see information about all breakpoints
- delete 1 delete a breakpoint with the delete command, specifying the breakpoint number to delete.
- x /Nfu addr display content of memory in different formats.
  - N the number of units to be displayed
  - f the display format, commonly used:
    - t binary
    - x hexadecimal
    - d decimal
    - i instruction
    - s string
  - u unit size.
    - b bytes
    - h halfwords (16 bits)
    - w words (32 bits)
    - g giant words (64 bits)
- q quit gdb
(gdb) x /4ubft _start
0x400078 <_start>: 01000010 11000111 10001101 11010010
(gdb) x /4ubfi _start
0x400078 <_start>: mov x2, #0x6e3a // #28218
=> 0x40007c <_start+4>: movk x2, #0x4f5d, lsl #16
0x400080 <_start+8>: movk x2, #0xfedc, lsl #32
0x400084 <_start+12>: movk x2, #0x1234, lsl #48
(gdb) x /4ubfx _start
0x400078 <_start>: 0x42 0xc7 0x8d 0xd2
(gdb) x /4ubfd _start
0x400078 <_start>: 66 -57 -115 -46
Cross-Compiling
Get all the necessary GNU and Linux tools to compile for ARM,
sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
These tools will be installed under /usr/aarch64-linux-gnu/ so that they are not picked up through the default path on the Intel-based host machine.
To use the cross-platform tools, add this path in our makefile:
TOOLPATH = /usr/aarch64-linux-gnu/bin
HelloWorld: HelloWorld.o
	$(TOOLPATH)/ld -o HelloWorld HelloWorld.o
HelloWorld.o: HelloWorld.s
	$(TOOLPATH)/as -o HelloWorld.o HelloWorld.s
It can be faster to do your builds on a more powerful laptop or desktop than on the target.
The workflow is to build the program on a full development (native) system and then transfer the program to the target processor using a USB cable, serial cable, or via Ethernet.
Emulation
There are quite a few different emulators available with Ubuntu Linux running on an Intel CPU.
To play around with Arm assembly without an Arm board, the QEMU user mode emulation is more than sufficient.
- Executing ARM64 binaries (C to Binary) Setting up QEMU user-mode emulation on your x86_64 Linux host system.
$ sudo apt install qemu-user qemu-user-static gcc-aarch64-linux-gnu binutils-aarch64-linux-gnu binutils-aarch64-linux-gnu-dbg build-essential
Create a file containing a simple C program for testing,
#include <stdio.h>
int main(void) {
    return printf("Hello, I'm executing ARM64 instructions!\n");
}
To compile the code as a static executable,
$ aarch64-linux-gnu-gcc -static -o hello64 hello.c
$ file hello64
hello64: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=f6e13f22124754ff411cd4c40011b3da72388684, for GNU/Linux 3.7.0, not stripped
Thanks to qemu-user-static, a statically linked aarch64 binary can be run on our x86_64 host directly,
$ ./hello64
Hello, I'm executing ARM64 instructions!
To execute a dynamically linked Arm executable on our x86_64 host, the package that makes this possible is qemu-user.
To compile the code as a dynamically linked executable, compile the C code without the -static flag.
$ aarch64-linux-gnu-gcc -o hello64dyn hello.c
$ file ./hello64dyn
./hello64dyn: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=8d5a19d29c460ef70c98912db056e5e1ca9e9607, for GNU/Linux 3.7.0, not stripped
Then, we need to use qemu-aarch64 and supply the aarch64 libraries via the -L flag.
$ qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello64dyn
Hello, I'm executing ARM64 instructions!
sudo apt install gcc-arm-linux-gnueabihf binutils-arm-linux-gnueabihf binutils-arm-linux-gnueabihf-dbg
Android NDK
Apple XCode
Source Control and Build Servers
Git
Jenkins
Chapter 4: Controlling Program Flow
Unconditional Branch
An unconditional branch to a label:
B label
The label is interpreted as an offset from the current PC register and has 26 bits in the instruction.
This allows a jump of up to 128 megabytes in either direction.
An endless loop:
_start: MOV X1, #1 B _start
About Condition Flags
The condition flags are
- Negative N is set if the signed result is negative and cleared if the result is positive or 0.
- Zero Z is set if the result is 0; this usually denotes an equal result from a comparison. If the result is nonzero, this flag is cleared.
- Carry C For addition type operations, this flag is set if the result produces an unsigned overflow. For subtraction type operations, this flag is set if the result does not require a borrow. Also, it’s used in shifting to hold the last bit that is shifted out.
- Overflow V For addition and subtraction, this flag is set if a signed overflow occurred. For a 32-bit operation, overflow occurs if the result is greater than or equal to 2^31, or less than -2^31; for a 64-bit operation the limits are 2^63 and -2^63.
These flags are only set if you append an “S” to the end of the instruction’s opcode, otherwise the flags will remain unmodified.
Branch on Condition
To branch only if certain condition flags are set or clear:
B.{condition} label
where {condition} is a condition code such as EQ, NE, GE, LT, GT, or LE. For ex.,
B.EQ _start
will branch to _start if the Z flag is set.
About the CMP Instruction
CMP Xn, Operand2
This instruction compares the contents of register Xn with Operand2.
This instruction is equivalent to
SUBS XZR, Xn, Operand2
The status flags will be updated accordingly. For example, to do a branch only if register W4 is 45,
CMP W4, #45
B.EQ _start
Loops
Loops can be constructed with branch and comparison instructions.
FOR Loops
FOR I = 1 to 10
    ... some statements...
The above can be implemented:
        MOV W2, #1 // W2 holds I
loop:   // body of the loop goes here.
        // Most of the logic is at the end
        ADD W2, W2, #1 // I = I + 1
        CMP W2, #10
        B.LE loop // IF I <= 10 goto loop
While Loop
// WHILE X < 5
// ... other statements ....
// END WHILE
// W4 is X and has been initialized
loop:   CMP W4, #5
        B.GE loopdone
        // ... other statements in the loop body ...
        B loop
loopdone: // program continues
If/Then/Else
For ex,
IF W5 < 10 THEN
    .... if statements ...
ELSE
    ... else statements ...
END IF
Implement:
        CMP W5, #10
        B.GE elseclause
        ... if statements ...
        B endif
elseclause:
        ... else statements ...
endif:  // continue on after the /then/else ...
Logical Operators
The ARM’s logical operators manipulate the bits in the registers.
AND{S} Xd, Xs, Operand2
EOR{S} Xd, Xs, Operand2
ORR{S} Xd, Xs, Operand2
BIC{S} Xd, Xs, Operand2
AND
AND performs a bitwise logical and operation between each bit in Xs and Operand2, putting the result in Xd.
For ex., if we only want the high-order byte of a register
AND W6, W6, #0xFF000000
// shift the byte down to the
// low order position.
LSR W6, W6, #24
EOR
EOR performs a bitwise exclusive or operation between each bit in Xs and Operand2, putting the result in Xd.
ORR
ORR performs a bitwise logical or operation between each bit in Xs and Operand2, putting the result in Xd.
For ex., set the low-order byte of X6 to all 1 bits (0xFF) while leaving the seven other bytes unaffected.
ORR X6, X6, #0xFF
BIC
BIC (bit clear) performs Xs AND NOT Operand2.
The reason this is called bit clear is that
- if the bit in Operand2 is 1, then the resulting bit will be 0.
- if the bit in Operand2 is 0, then the corresponding bit in Xs is passed through unchanged.
For ex., this clears the low-order byte of X6, while leaving the other seven bytes unaffected
BIC X6, X6, #0xFF
Design Patterns
If you adopt a few standard design patterns for how to perform loops and other programming constructs, it will make reading your programs much easier.
Converting Integers to ASCII
Pseudo-code to print a register:
outstr = memory where we want the string + 17
// (string is form 0x123456789ABCDEF0 and we want
// the last character)
FOR W5 = 16 TO 1 STEP -1
    digit = X4 AND 0xf
    IF digit < 10 THEN
        asciichar = digit + '0'
    ELSE
        asciichar = digit + 'A' - 10
    END IF
    *outstr = asciichar
    outstr = outstr - 1
    X4 = X4 shifted right by 4 bits
NEXT W5
printdword.s:
//
// Assembler program to print a register in hex
// to stdout.
//
// X0-X2 - parameters to linux function services
// X1 - is also address of byte we are writing
// X4 - register to print
// W5 - loop index
// W6 - current character
// X8 - linux function number
//
.global _start // Provide program starting address
_start: MOV X4, #0x6E3A
        MOVK X4, #0x4F5D, LSL #16
        MOVK X4, #0xFEDC, LSL #32
        MOVK X4, #0x1234, LSL #48
        LDR X1, =hexstr // start of string
        ADD X1, X1, #17 // start at least sig digit
// The loop is FOR W5 = 16 TO 1 STEP -1
        MOV W5, #16 // 16 digits to print
loop:   AND W6, W4, #0xf // mask of least sig digit
// If W6 >= 10 then goto letter
        CMP W6, #10 // is 0-9 or A-F
        B.GE letter
// Else its a number so convert to an ASCII digit
        ADD W6, W6, #'0'
        B cont // goto to end if
letter: // handle the digits A to F
        ADD W6, W6, #('A'-10)
cont:   // end if
        STRB W6, [X1] // store ascii digit
        SUB X1, X1, #1 // decrement address for next digit
        LSR X4, X4, #4 // shift off the digit
// next W5
        SUBS W5, W5, #1 // step W5 by -1
        B.NE loop // another for loop if not done
// Setup the parameters to print our hex number
// and then call Linux to do it.
        mov X0, #1 // 1 = StdOut
        ldr X1, =hexstr // string to print
        mov X2, #19 // length of our string
        mov X8, #64 // linux write system call
        svc 0 // Call linux to output the string
// Setup the parameters to exit the program
// and then call Linux to do it.
        mov X0, #0 // Use 0 return code
        mov X8, #93 // Service code 93 terminates
        svc 0 // Call linux to terminate
.data
hexstr: .ascii "0x123456789ABCDEFG\n"
Compile and execute the program,
$ as printdword.s -o printdword.o
$ ld -o printdword printdword.o
$ ./printdword
0x1234FEDC4F5D6E3A
Using Expressions in Immediate Constants
ADD W6, W6, #('A'-10)
Storing a Register to Memory
STRB W6, [X1]
The store byte (STRB) instruction saves the low-order byte of the first register into the memory location contained in X1.
The syntax [X1] is to make clear that we are using memory indirection, and not just putting the byte into register X1.
Why Not Print in Decimal
Performance of Branch Instructions
If you put a lot of branches in your code, you suffer a performance penalty.
More Comparison Instructions
Summary
Chapter 5: Thanks for the Memories
- how to define data in memory
- how to load memory into registers for processing
- how to write the results back to memory
Defining Memory Contents
The GNU Assembler contains several directives to help you define memory in a .data section of your program.
Some sample memory directives:
label: .byte 74, 0112, 0b00101010, 0x4A, 0X4a, 'J', 'H' + 2
       .word 0x1234ABCD, -1434
       .quad 0x123456789ABCDEF0
       .ascii "Hello World\n"
The .byte statement defines 1 or more bytes of memory.
The list of memory definition Assembler directives,
Aligning Data
These data directives put the data in memory contiguously byte by byte.
We can instruct the Assembler to align the next piece of data with an .align directive.
For ex.,
.data
.byte 0x3F
.align 4
.word 0x12345678
Because the first item is only 1 byte, the next word of data would not be aligned without the directive.
We can add the “.align 4” directive to make it word aligned.
This will result in three wasted bytes.
ARM Assembly instructions must be word aligned.
Usually the Assembler will give you an error when alignment is required, and throwing in an “.align 4” directive is a quick fix.
Loading a Register with an Address
PC Relative Addressing
Addresses can be represented as a register-relative or PC-relative expression.
- A register-relative expression evaluates to a named register combined with a numeric expression.
- A PC-relative expression is written in source code as the PC or a label combined with a numeric expression. For PC relative addressing, it really becomes addressing relative to the current instruction.
It can be expressed in the form:
[PC, #number]
The assembler calculates the required offset from the label and the address of the current instruction.
It is recommended to write PC-relative expressions using labels rather than PC because the value of PC depends on the instruction set.
     LDR r4,=data+4*n ; n is an assembly-time variable
     ; code
     MOV pc,lr
data DCD value_0
     ; n-1 DCD directives
     DCD value_n ; data+4*n points here
     ; more DCD directives
A simpler ex.,
LDR X1, =helloworld
to load the address of our helloworld string into X1.
The Assembler knows the value of the program counter at this point, so it can provide an offset to the correct memory address.
Loading Data from Memory
The simple form of LDR to load data given an address is
LDR{type} Xt, [Xa]
where type is one of the types:
- B Unsigned byte
- SB signed byte
- H Unsigned halfword (16 bits)
- SH signed halfword (16 bits)
- SW signed word
// load the address of mynumber into X1
LDR X1, =mynumber
// load the doubleword stored at mynumber into X2
LDR X2, [X1]
.data
mynumber: .QUAD 0x123456789ABCDEF0
This loads 0x123456789ABCDEF0 into X2.
Note the square bracket syntax represents indirect memory access.
This means load the data stored at the address pointed to by X1, not move the contents of X1 into X2.
Indexing Through Memory
The ARM instruction set gives us support for the array indexing operation.
Suppose we have an array of 10 words (4 bytes each) defined:
arr1: .FILL 10, 4, 0
LDR X1, =arr1 // load the array’s address
// Load the first element
LDR W2, [X1]
// Load element 3
// The elements count from 0, so 2 is
// the third one. Each word is 4 bytes,
// so we need to multiply by 4
LDR W2, [X1, #(2 * 4)]
Using a register as an offset
// The 3rd element is still number 2
MOV X3, #(2 * 4)
// Add the offset in X3 to X1 to get our element.
LDR W2, [X1, X3]
If X1 points to the end of the array, we can index backward using negative offsets
LDR W2, [X1, #-(2 * 4)]
MOV X3, #(-2 * 4)
LDR W2, [X1, X3]
Post-Indexed Addressing:
// Load X1 with the memory pointed to by X2
// Then do X2 = X2 + 2
LDR X1, [X2], #2
An Example Converting to Upper-Case
Pseudo-code:
i = 0
DO
    char = inStr[i]
    IF char >= 'a' AND char <= 'z' THEN
        char = char - ('a' - 'A')
    END IF
    outStr[i] = char
    i = i + 1
UNTIL char == 0
PRINT outStr
In this example, NULL-terminated strings are used. The input string is not changed; a new output string with the upper-case version of the input string is generated.
upper.s:
//
// X0-X2 - parameters to Linux function services
// X3 - address of output string
// X4 - address of input string
// W5 - current character being processed
// X8 - linux function number
//
.global _start // Provide program starting address to linker
_start: LDR X4, =instr // start of input string
        LDR X3, =outstr // address of output string
// The loop is until byte pointed to by X4 is zero
loop:   LDRB W5, [X4], #1 // load character and incr pointer
// If W5 > 'z' then goto cont
        CMP W5, #'z' // is letter > 'z'?
        B.GT cont
// Else if W5 < 'a' then goto end if
        CMP W5, #'a'
        B.LT cont // goto to end if
// if we got here then the letter is lower case, so convert it.
        SUB W5, W5, #('a'-'A')
cont:   // end if
        STRB W5, [X3], #1 // store character to output str
        CMP W5, #0 // stop on hitting a null character
        B.NE loop // loop if character isn't null
// Setup the parameters to print our string
// and then call Linux to do it.
        MOV X0, #1 // 1 = StdOut
        LDR X1, =outstr // string to print
        SUB X2, X3, X1 // get the len by sub'ing the pointers
        MOV X8, #64 // Linux write system call
        SVC 0 // Call Linux to output the string
// Setup the parameters to exit the program
// and then call Linux to do it.
        MOV X0, #0 // Use 0 return code
        MOV X8, #93 // Service code 93 terminates
        SVC 0 // Call Linux to terminate the program
.data
instr: .asciz "This is our Test String that we will convert.\n"
outstr: .fill 255, 1, 0
Compile and run the program,
$ as upper.s -o upper.o
$ ld -o upper upper.o
$ ./upper
THIS IS OUR TEST STRING THAT WE WILL CONVERT.
LDR and STR just load and save; they don’t have functionality to examine what they are loading or saving, so they can’t set the condition flags, hence the need for the CMP instruction in the UNTIL part of the loop to test for NULL.
Storing a Register
The STR instruction is a mirror of the LDR instruction.
Double Registers
There are register-pair versions of the LDR and STR instructions: LDP and STP. For example, the following code loads the address of a 128-bit quantity (the address is still 64 bits) and then loads the 128 bits into X2 and X3. It then stores X2 and X3 back into myoctaword:
    LDR X1, =myoctaword
    LDP X2, X3, [X1]
    STP X2, X3, [X1]
    .data
    myoctaword: .OCTA 0x12345678876543211234567887654321
These instructions are used extensively when we need to save registers to the stack and later restore them.
Summary
Chapter 6: Functions and the Stack
Stacks on Linux
Branch with Link
Nesting Function Calls
Function Parameters and Return Values
Managing the Registers
Summary of the Function Call Algorithm
Upper-Case Revisited
Stack Frames
Stack Frame Example
Macros
Include Directive
Macro Definition
Labels
Why Macros
Macros to Improve Code
Summary
Chapter 7: Linux Operating System Services
So Many Services
Calling Convention
Linux System Call Numbers
Return Codes
Structures
Wrappers
Converting a File to Upper-Case
Building .S Files
Opening a File
Error Checking
Looping
Summary
Chapter 8: Programming GPIO Pins
We can program the GPIO pins in two ways:
- by using the Linux device driver
- by accessing the GPIO controller’s registers directly
GPIO Overview
On the Raspberry Pi, pins 3, 5, 7–8, 10–13, 15, 16, 18, 19, 21–24, and 26 are programmable general-purpose pins.
In Linux, Everything Is a File
Flashing LEDs
Moving Closer to the Metal
Virtual Memory
In Devices, Everything Is Memory
Registers in Bits
GPIO Function Select Registers
GPIO Output Set and Clear Registers
More Flashing LEDs
Root Access
Table Driven
Setting Pin Direction
Setting and Clearing Pins
Summary
Chapter 9: Interacting with C and Python
Calling C Routines
Printing Debug Information
Adding with Carry Revisited
Calling Assembly Routines from C
Packaging Our Code
Static Library
Shared Library
Embedding Assembly Code Inside C Code
Calling Assembly from Python
Summary
Chapter 10: Interfacing with Kotlin and Swift
Chapter 11: Multiply, Divide, and Accumulate
Chapter 12: Floating-Point Operations
Chapter 13: Neon Coprocessor
Chapter 14: Optimizing Code
Chapter 15: Reading and Understanding Code
Chapter 16: Hacking Code
Appendix A: The ARM Instruction Set
Appendix B: Binary Formats
Appendix C: Assembler Directives
Appendix D: ASCII Character Set
ARM (32-bits) assembler in Raspberry Pi
1 Introduction
2 Registers and basic arithmetic
3 Memory, addresses. Load and store.
4 GDB
5 Branches
6 Control structures
7 Indexing modes
8 Arrays and structures and more indexing modes.
9 Functions (I)
10 Functions (II). The stack
11 Predication
12 Loops and the status register
13 Floating point numbers
14 Matrix multiply
15 Integer division
16 Switch control structure
17 Passing data to functions
18 Local data and the frame pointer
19 The operating system
20 Indirect calls
21 Subword data
22 The Thumb instruction set
23 Nested functions
24 Trampolines
25 Integer SIMD
26 A primer about linking
27 Dynamic linking
Introduction to Computer Organization: ARM Assembly Language Using the Raspberry Pi
Robert G. Plantz
Chapter 1 Introduction
This book begins with the fundamental high-level language concepts and “looks under the hood” to see how they are implemented at the assembly language level.
There are many challenging opportunities in programming embedded systems, and much of the work in this area demands at least an understanding of the ISA (instruction set architecture).
1.1 Efficient Use of This Book
1.2 Computer Subsystems
The von Neumann architecture: both the program instructions and data are stored in a memory unit that is separate from the processing unit.
We will focus on how the program and data are stored in memory and how the CPU executes instructions.
1.3 How the Subsystems Interact
The buses shown here are logical groupings of the signals that must pass between the three subsystems.
For example, the PCI bus standard uses the same physical pathway for the address and the data, but at different times.
Control signals indicate whether there is an address or data on the lines at any given time.
If the CPU is instructed to store data in memory, it places the data on the data bus, places the location in memory where the data is to be stored on the address bus, and places a “write” signal on the control bus. The memory subsystem responds by copying the data on the data bus into the specified memory location.
1.4 Setting Up Your Raspberry Pi
Install the binutils-doc package to get full documentation for the GNU assembler, as.
Chapter 2 Data Storage Formats
2.1 Bits and Groups of Bits
2.2 Exercises
2.3 Mathematical Equivalence of Binary and Decimal
2.4 Exercises
2.5 Unsigned Decimal to Binary Conversion
2.6 Exercises
2.7 Memory
2.8 Exercises
2.9 Using C Programs to Explore Data Formats
2.10 Programming Exercises
2.11 Examining Memory With a Debugger
    /* intAndFloat.c
     * Using printf to display an integer and a float.
     * 2017-09-29: Bob Plantz
     */
    #include <stdio.h>

    int main(void)
    {
      int anInt = 19088743;
      float aFloat = 19088.743;

      printf("The integer is %d and the float is %f\n", anInt, aFloat);

      return 0;
    }
Build the example, then run it under gdb:
    $ gcc -g -Wall -o intAndFloat intAndFloat.c
    $ gdb ./intAndFloat
    GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
    Copyright (C) 2021 Free Software Foundation, Inc.
    ...
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./intAndFloat...
    (gdb)
gdb has a large number of commands.
The few here will be sufficient to get you started:
- li LineNumber List ten lines of the source code, centered at the line number specified by LineNumber.
    (gdb) li
    1       /* intAndFloat.c
    2        * Using printf to display an integer and a float.
    3        * 2017-09-29: Bob Plantz
    4        */
    5       #include <stdio.h>
    6
    7       int main(void)
    8       {
    9         int anInt = 19088743;
    10        float aFloat = 19088.743;
    (gdb)
    11
    12        printf("The integer is %d and the float is %f\n", anInt, aFloat);
    13
    14        return 0;
    15      }
    16
Simply pushing the return key will repeat the previous command, and li is smart enough to display the next (up to) ten lines.
- br LineNumber Set a breakpoint at the specified line. Control will return to gdb when the line number is encountered.
    (gdb) br 12
    Breakpoint 1 at 0x798: file intAndFloat.c, line 12.
I set a breakpoint at line 12.
Execution will pause before the statement is executed.
    (gdb) r
    Starting program: /home/pi/intAndFloat

    Breakpoint 1, main () at intAndFloat.c:12
    12        printf("The integer is %d and the float is %f\n", anInt, aFloat);
The run command (r) causes the program to start execution from the beginning.
    (gdb) print anInt
    $1 = 19088743
    (gdb) print aFloat
    $2 = 19088.7422
    (gdb) printf "anInt = %i and aFloat = %f\n", anInt, aFloat
    anInt = 19088743 and aFloat = 19088.742188
The format string of the gdb printf command follows the same rules as printf in the C Standard Library.
    (gdb) help x
    Examine memory: x/FMT ADDRESS.
    ADDRESS is an expression for the memory address to examine.
    FMT is a repeat count followed by a format letter and a size letter.
    Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
      t(binary), f(float), a(address), i(instruction), c(char), s(string)
      and z(hex, zero padded on the left).
    Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
    The specified number of objects of the specified size are printed
    according to the format.  If a negative number is specified, memory is
    examined backward from the address.

    Defaults for format and size letters are those previously used.
    Default count is 1.  Default address is following last thing printed
    with this command or "print".
The x command can then be used to examine the two variables in memory:
    (gdb) print &anInt
    $3 = (int *) 0x7ffffff3dc
    (gdb) print &aFloat
    $4 = (float *) 0x7ffffff3d8
    (gdb) x/1dw 0x7ffffff3dc
    0x7ffffff3dc:   19088743
    (gdb) x/1fw 0x7ffffff3d8
    0x7ffffff3d8:   19088.7422
    (gdb) x/1xw 0x7ffffff3dc
    0x7ffffff3dc:   0x01234567
    (gdb) x/4xb 0x7ffffff3dc
    0x7ffffff3dc:   0x67    0x45    0x23    0x01
2.12 Programming Exercise
2.13 Storing Characters
2.14 Programming Exercise
2.15 Low-level Character Handling
2.16 Programming Exercises
2.17 Accessing the GPIO in C
Chapter 3 Computer Arithmetic
3.1 Addition and Subtraction
3.2 Exercises
3.3 Arithmetic Errors—Unsigned Integers
Use four-bit values to simplify the discussion.
Consider addition of the two unsigned integers, 2 and 4, followed by an addition and a subtraction whose true results do not fit in four bits:
      0010         0100         0100
    + 0100       + 1110       - 1110
    ------       ------       ------
      0110         0010         0110
    Carry=0      Carry=1      Carry=1
These four-bit arithmetic examples generalize to any size arithmetic performed by the computer.
When adding or subtracting two unsigned integers, the result is arithmetically correct if and only if the carry condition flag (C) is set to zero. In the subtraction example, "carry" means that a borrow was needed. Note that ARM hardware records the inverse after a subtraction: C is 1 when no borrow occurred and 0 when one did.
The C flag in the CPSR register is always set to the appropriate value, 0 or 1, each time a flag-setting addition or subtraction is performed by the CPU.
In particular, the CPU will not ignore the C flag when there is no carry; it will actively set it to zero.
3.4 Signed Integers
3.5 Exercises
3.6 Arithmetic Errors—Signed Integers
The number of bits used to represent a value is determined at the time a program is written.
The flags register, CPSR, provides a bit, the overflow condition flag, V, for detecting whether the sum of two n-bit, signed numbers stored in the two's complement code has exceeded the range allocated for it.
      1           <-- penultimate carry
      0001 0101
    + 0110 1111
    -----------
      1000 0100
    Carry=0
The V flag is equal to the exclusive or of carry and penultimate carry:
    V = C ^ penultimate carry
where ‘^’ is the exclusive or operator.
The CPU does not consider integers as either signed or unsigned.
- If your algorithm treats the result as unsigned, the carry condition flag (C) is zero if and only if the result is within the n-bit range; V is irrelevant.
- If your algorithm treats the result as signed, the overflow condition flag (V) is zero if and only if the result is within the n-bit range; C is irrelevant.
After each addition or subtraction operation the program should check the state of C for unsigned integers or V for signed integers and at least indicate when the sum is in error.
3.7 Exercises
Chapter 4 Basic Data Types
4.1 C/C++ Basic Data Types
4.2 Hexadecimal to Integer Conversion
4.3 Programming Exercise
4.4 Bitwise Logical Operations
4.5 Programming Exercise
4.6 Other Codes
Chapter 5 Boolean Algebra
5.1 Boolean Algebra Operations
5.2 Exercises
5.3 Canonical (Standard) Forms
5.4 Exercise
5.5 Boolean Function Minimization
Chapter 6 Logic Gates
6.1 Crash Course in Electronics
6.2 CMOS Transistors
6.3 NAND and NOR Gates
6.4 Exercise
Chapter 7 Logic Circuits
7.1 Combinational Logic Circuits
7.2 Programmable Logic Devices
7.3 Sequential Logic Circuits
7.4 Designing Sequential Circuits
7.5 Memory Organization
Chapter 8 Central Processing Unit
ARM CPUs used in different Raspberry Pi models. The 64-bit ARM processor in the Raspberry Pi 3 B can be run in either AARCH32 (32-bit) or AARCH64 (64-bit) state.
8.1 Overview
CPU block diagram. The CPU communicates with the Memory and I/O subsystems via the Address, Data, and Control buses.
- Program Counter Contains the address of the next instruction to be executed. (Also called an Instruction Pointer.)
- L1 Cache Memory Very fast memory on the CPU chip. Many modern CPUs use two L1 cache memories organized in a Harvard architecture: one for instructions, the other for data. (See Section 1.2.) Its use is generally transparent to an applications programmer.
- Instruction Register Contains the instruction that is currently being executed.
- Control Unit Controls the activities of all the units in the CPU.
- Register A named group of several bytes of memory within the CPU.
- Arithmetic Logic Unit (ALU)
- Bus Interface The means for the CPU to communicate with the rest of the computer system, that is, memory and I/O devices. It contains circuitry to place addresses on the address bus, read and write data on the data bus, and read and write signals on the control bus. The Bus Interface on many CPUs interfaces with external bus control units that in turn interface with memory and with different types of I/O buses, e.g., Serial ATA, PCI-E, USB, etc.
- Condition Flags Bits in a status register that show results of many operations performed by the ALU.
8.2 CPU Registers
A portion of the memory in the CPU is organized into registers. Machine instructions access CPU registers by their addresses.
The registers are in the CPU, and the assembler has predefined names for them.
Applications programmers have access to 16 integer registers in the AARCH32 (32-bit) state, r0 through r15.
The names of the registers and their usage in AARCH32 state are summarized below:
    Register Name   Register Number   Usage
    -----------------------------------------------------------
    r0–r10          0–10              General Purpose
    r11 or fp       11                Frame Pointer
    r12 or ip       12                Intra-procedure-call scratch
    r13 or sp       13                Stack Pointer
    r14 or lr       14                Link Register
    r15 or pc       15                Program Counter
In AARCH64 (64-bit) state applications programmers have access to 31 general-purpose integer registers.
    Full 64-bit Register Name   Low 32-bit Register Name   Register Number   Usage
    --------------------------------------------------------------------------------------
    r0–r30 or x0–x30            w0–w30                     0–30              General Purpose
    sp                          wsp                        31                Stack Pointer
    xzr                         wzr                        virtual           Zero Register
Using wn, where n = 0, 1, …, 30, refers to the low-order 32-bit portion of the register.
If an instruction reads these 32 bits from the register, bits 63–32 are ignored, and if an instruction writes to the 32 bits, bits 63–32 are set to zero.
Many instructions can access one byte in a register, which consists of the bits 7–0 in the specified register. And accessing two bytes at a time works on bits 15–0 in the specified register. This is specified in the instruction, not in the register name.
8.3 CPU Interaction with Memory
To store one byte, 0xcd, at location 0x7efff174, the control unit
- places 0x7efff174 on the address bus,
- places 0xcd on the data bus, and then
- places a “write” signal on the control bus.
8.4 Program Execution in the CPU
The CPU is programmed via the instruction register, whose bit pattern determines what the CPU will do.
Once that action has been completed, the bit pattern in the instruction register can be changed, and the CPU will perform the operation specified by this next bit pattern.
Most modern CPUs use an instruction queue.
Several instructions are waiting in the queue, ready to be executed.
Since instructions are simply bit patterns, they can be stored in memory.
The instruction pointer register (the program counter) always has the memory address of (points to) the next instruction to be executed.
In order for the control unit to execute this instruction, it is copied into the instruction register.
The scenario is:
1. A sequence of instructions is stored in memory.
2. The memory address where the first instruction is located is copied to the program counter.
3. The CPU sends the address in the program counter to memory via the address bus.
4. Memory responds by sending a copy of the state of the bits at that memory location on the data bus, which the CPU then copies into its instruction register.
5. The program counter is automatically incremented to contain the address of the next instruction in memory.
6. The CPU executes the instruction in the instruction register.
7. Go to step 3.
Steps 3–7 make up a cycle, the instruction execution cycle.
The wfi (“wait for interrupt”) instruction places the CPU in an idle state, where it remains until an I/O device sends an interrupt signal to the CPU.
For now, it is enough to understand that the wfi instruction stops the instruction execution cycle.
The instructions for a program are stored in a file.
When you indicate to the operating system that you wish to execute a program, the operating system locates a region of memory large enough to hold the instructions in the program, and then copies them from the file to memory.
8.5 Using gdb to View the CPU Registers
We will use the following program to illustrate the use of gdb to view the contents of the CPU registers.
    /* gdbExample1.c
     * Subtracts one from user integer.
     * Demonstrate use of gdb to examine registers, etc.
     * 2017-09-29: Bob Plantz
     */

    #include <stdio.h>

    int main(void)
    {
      register int wye;
      int *ptr;
      int ex;

      ptr = &ex;
      ex = 305441741;
      wye = -1;
      printf("Enter an integer: ");
      scanf("%i", ptr);
      wye += *ptr;
      printf("The result is %i\n", wye);

      return 0;
    }
Compile the program for gdb debugging:
    $ gcc -g -O0 -Wall -o gdbExample1 gdbExample1.c
- The “-g” option tells the compiler to include debugger information in the executable program.
- The “-Wall” option causes the compiler to warn you about many constructions that might be a programming error.
    $ gdb ./gdbExample1
Some additional commands that will be useful in this section:
- li LineNumber Lists ten lines of source code centered around the specified line number.
    (gdb) li 11
    6
    7       #include <stdio.h>
    8
    9       int main(void)
    10      {
    11        register int wye;
    12        int *ptr;
    13        int ex;
    14
    15        ptr = &ex;
    (gdb) br 18
    Breakpoint 1 at 0x10478: file gdbExample1.c, line 18.
    (gdb) run
    Starting program: /home/pi/gdbExample1

    Breakpoint 1, main () at gdbExample1.c:18
    18        printf("Enter an integer: ");
When line 18 is reached, the program is paused before the statement is executed.
    (gdb) print ex
    $1 = 305441741
    (gdb) print &ex
    $2 = (int *) 0x7efff430
    (gdb) x/1dw 0x7efff430
    0x7efff430:     305441741
    (gdb) x/1xw 0x7efff430
    0x7efff430:     0x1234abcd
    (gdb) x/4xb 0x7efff430
    0x7efff430:     0xcd    0xab    0x34    0x12
Note:
- 0xcd is stored in the byte at address 0x7efff430
- 0xab is stored in the byte at address 0x7efff431
- 0x34 is stored in the byte at address 0x7efff432
- 0x12 is stored in the byte at address 0x7efff433
    (gdb) print ptr
    $2 = (int *) 0x7efff430
    (gdb) print &ptr
    $3 = (int **) 0x7efff504
The ptr variable is located at address 0x7efff504 and its content is 0x7efff430, the address of the variable ex.
It is important that you can distinguish between a memory address and the value that is stored there, which can be another memory address.
    (gdb) print wye
    $4 = -1
    (gdb) print &wye
    Address requested for identifier "wye" which is in register $r4
Registers are located in the CPU and do not have memory addresses.
The i r (info registers) command lists the integer registers and their contents:
    (gdb) i r
    r0             0x1                 1
    r1             0x7efff674          2130703988
    r2             0x7efff67c          2130703996
    r3             0x1234abcd          305441741
    r4             0xffffffff          4294967295
    r5             0x0                 0
    r6             0x10368             66408
    r7             0x0                 0
    r8             0x0                 0
    r9             0x0                 0
    r10            0x76fff000          1996484608
    r11            0x7efff514          2130703636
    r12            0x7efff528          2130703656
    sp             0x7efff500          0x7efff500
    lr             0x76e6abe0          1994828768
    pc             0x10478             0x10478 <main+32>
    cpsr           0x60000010          1610612752
    fpscr          0x0                 0
- The first column is the name of the register.
- The second shows the current bit pattern in the register, in hexadecimal. Notice that leading zeros are not displayed.
- The third column shows the register contents in 32-bit unsigned decimal, except for sp and pc, which are shown again in hexadecimal.
8.6 Programming Exercises
Chapter 9 Programming in Assembly Language
9.1 Program Organization
    /* doNothingProg1.c
     * The minimum components of a C program.
     * 2017-09-29: Bob Plantz
     */

    int main(void)
    {
      return 0;
    }
Use the -S command line option to look at the assembly language that the compiler produces:
    $ gcc -S -O0 doNothingProg1.c
- -S causes the compiler to create the .s file, which contains the assembly language equivalent of the source code.
- -O0 tells the compiler not to do any optimization. For instructional purposes, we want to see every step of the assembly language. (This is upper-case “oh” followed by the numeral zero.)
            .arch armv6
            .eabi_attribute 28, 1
            .eabi_attribute 20, 1
            .eabi_attribute 21, 1
            .eabi_attribute 23, 3
            .eabi_attribute 24, 1
            .eabi_attribute 25, 1
            .eabi_attribute 26, 2
            .eabi_attribute 30, 6
            .eabi_attribute 34, 1
            .eabi_attribute 18, 4
            .file "doNothingProg1.c"
            .text
            .align 2
            .global main
            .arch armv6
            .syntax unified
            .arm
            .fpu vfp
            .type main, %function
    main:
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, uses_anonymous_args = 0
            @ link register save eliminated.
            str fp, [sp, #-4]!
            add fp, sp, #0
            mov r3, #0
            mov r0, r3
            add sp, fp, #0
            @ sp needed
            ldr fp, [sp], #4
            bx lr
            .size main, .-main
            .ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
            .section .note.GNU-stack,"",%progbits
Use this programmer's version for investigation:
    @ doNothingProg2.s
    @ Minimum components of a C program, in assembly language.
    @ 2017-09-29: Bob Plantz

    @ Define my Raspberry Pi
            .cpu cortex-a53
            .fpu neon-fp-armv8
            .syntax unified         @ modern syntax

    @ Program code
            .text
            .align 2
            .global main
            .type main, %function
    main:
            str fp, [sp, -4]!       @ save caller frame pointer
            add fp, sp, 0           @ establish our frame pointer
            mov r3, 0               @ return 0;
            mov r0, r3              @ return values go in r0
            sub sp, fp, 0           @ restore stack pointer
            ldr fp, [sp], 4         @ restore caller's frame pointer
            bx lr                   @ back to caller
The assembly language is line-oriented. That is, there is only one assembly language statement on each line, and none of the statements spans more than one line.
The following assembly language statement is equivalent to the machine language "0xe3a03000":
    mov r3, 0
Next, notice that the pattern of each assembly line falls into one of three categories:
- comment The ‘@’ character any place on a line designates the rest of the line as a comment.
- blank line Blank lines are used for readability.
- statement Each of the assembly language statements is organized into four possible fields:
    label: operation operand(s) @ comment
  - label Gives a symbolic name to any line in the program. The memory location can then be referred to by this symbolic name.
  - operation There are two types of operations:
    - An assembly language mnemonic
    - An assembler directive, or pseudo op, which begins with a period (‘.’)
  - operand
  - comment
Identifiers are called Symbol Names. Case is significant.
- Compiler-generated labels begin with the ‘.’ character.
- Many system-related names begin with the ‘_’ character.
Assembler Directives
Assembler directives are directions to the assembler to take some action or change a setting.
Assembler directives do not represent instructions, and are not translated into machine code.
For this assembler, all directives begin with a “.” or “#” (the comment is a #), and the directive must exist on a separate line from any other assembler directive or assembler instruction.
There are 4 main assembler directives:
- .text The .text directive tells the assembler that the information that follows is program text (assembly instructions), and the translated machine code is to be written to the text segment of memory.
- .data The .data directive tells the assembler that information that follows is program data. The information following a .data instruction will be data values, and will be stored in the data segment.
- .label A label is an address in memory corresponding to either an instruction or data value. It is just a convenience so the programmer can reference an address by a name.
- .number The number directive tells the assembler to set aside 2 bytes of memory for a data value, and to initialize the memory to the given value. It will often be used with the .label directive to set a label to a 2-byte memory value, and initialize the value
When a source code file is translated into machine code, an object file is produced.
The object file format used is Executable and Linking Format (ELF).
Programs that store information in ELF files store it in sections. The ELF standard specifies many different types of sections, each depending on the type of information stored in it.
The .text directive specifies that when the following assembly language statements are translated into machine instructions, they should be stored in a text section in the object file. Text sections are used to store program instructions in machine code format.
In memory, a running program is organized into segments:
- Text Segment Where program instructions and constant data are stored. The operating system prevents a program from changing anything stored in the text segment, treating it as read-only memory during program execution. Also called the code segment.
- Data Segment Where global variables and static local variables are stored. Both read-only and read-write data segments can occur in a program. The data segment remains in place for the duration of program execution.
- Stack Segment Where automatic local variables and the data that links functions are stored. It is read-write memory that is allocated and deallocated dynamically as the program executes.
- Heap Segment The pool of memory available when a C program calls the malloc function (or C++ calls new). It is read-write memory that is allocated and deallocated by the program.
The operating system needs to view an ELF file as a set of segments. One of the functions of the ld program is to group ELF sections together into segments so that they can be loaded into memory.
When the operating system loads the program into memory, it uses the segment view of the ELF file. Thus, for example, the contents of all the text sections will be loaded into the text segment of the program process.
The readelf program is also useful for learning about ELF files.
The AArch32 target selection directives specify code generation parameters for AArch32 targets.
The following three directives identify the characteristics of the ARM processor this code will run on:
    .cpu cortex-a53
    .fpu neon-fp-armv8
    .syntax unified @ modern syntax
There are many variations of the ARM architecture, and the assembler needs to know which one this code is intended for. The appropriate values for each directive for the various Raspberry Pi models are given below:
Raspberry Pi | .cpu | .fpu |
---|---|---|
Pi Zero | arm1176jzf-s | vfp |
Pi 1 A+ | arm1176jzf-s | vfp |
Pi 1 B+ | arm1176jzf-s | vfp |
Pi 2 B | cortex-a7 | neon-vfpv4 |
Pi 3 B | cortex-a53 | neon-fp-armv8 |
The first assembler directive in the text segment has one operand, 2:
    .align 2
For the ARM, this tells the assembler to ensure that the lowest two bits of the starting address of the generated code are zero.
That is, the addressing is adjusted, incremented if necessary, to be a multiple of four.
Each machine instruction is four bytes long, so this ensures proper alignment of the instructions in memory.
The .global directive makes the name globally known, so code outside this file can refer to it.
    .global main
When a program is executed, the operating system does some preliminary set up of system resources. It then starts program execution by calling a function named “main,” so the name must be global in scope.
The following declares the label, main, as the name of a function in the program:
    .type main, %function
The .file directive simply identifies the original C source code file:
    .file "doNothingProg1.c"
The .size directive gives the number of bytes in the code, and the .ident directive lists the version of the compiler that produced this assembly language.
These directives are used to describe the characteristics of the statements that follow.
They are not translated into actual machine instructions, and none of them occupy any memory in the finished program.
9.2 First Assembly Language Instructions
To see the details of the instructions, you need to read the ARM manuals:
- ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition, for 32-bit
- ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile, for 64-bit
I will use ‘%’ to add my comments.
9.2.1 Some Notation
The syntax that ARM uses for their assembly language is called Unified Assembler Language (UAL).
The assembler, as, recognizes the UAL syntax if you use the assembler directives to identify the ARM model correctly.
The version of gcc running on Raspbian at the time of writing (August 2016) uses pre-UAL syntax. The differences are minor.
For example, the compiler-generated assembly language uses a ‘#’ character to prefix each literal value:
    str fp, [sp, #-4]!
But the UAL syntax specifies that the ‘#’ character is optional.
The ‘#’ character for immediate values will not be used in my examples in this book.
Using the UAL syntax when writing your own assembly language programs will become very important when we get to the floating-point instructions.
9.2.2 Condition Codes
Most AARCH32 ARM instructions have an option that allows you to specify that it will be executed only if a specific setting of the condition flags exists.
These settings are expressed by adding a mnemonic Condition Code to the instruction mnemonic.
Mnemonic suffixes for conditional execution of instructions. Meaning depends on whether the values are integers or floats. The cond column shows the machine code.
9.2.3 Shift Options
Many ARM instructions include an option to shift one of the data values during the operation that the instruction performs.
Mnemonic codes for adding shifts to instructions. The ‘#’ is optional.
As an example of how the shifting syntax is used,
    mov r0, 12              @ store 12 in r0
    mov r1, 60              @ store 60 in r1
    add r2, r0, r1, lsl 2   @ second operand is r1 shifted left two bits: 60 << 2 = 240
would store 252 in r2. The value in r1 itself is not changed.
To let the amount of the shift be under program control,
    mov r0, 12
    mov r1, 60
    mov r3, 2
    add r2, r0, r1, lsl r3
9.2.4 First Instructions
Even though the program does nothing, it uses six instructions.
- MOV Copies (moves) a value into a register. Format
    MOV{S}{<c>} <Rd>, #<const> % immediate
    MOV{S}{<c>} <Rd>, <Rm> % register
  - S If ‘S’ is present, the condition flags are updated according to the value being moved. If absent, the condition flags are not changed.
  - c <c> is the condition code.
  - Rd specifies the destination register.
  - Rm specifies the source register.
  - const is in the range [-257, +256].
- MVN Copies the bitwise complement of a value into a register. Format
    MVN{S}{<c>} <Rd>, #<const> % immediate
    MVN{S}{<c>} <Rd>, <Rm>{, <shift>} % register
    MVN{S}{<c>} <Rd>, <Rm>, <type> <Rs> % register-shifted register
- ADD Adds two values. Format
    ADD{S}{<c>} {<Rd>,} <Rn>, #<const> % immediate
    ADD{S}{<c>} {<Rd>,} <Rn>, <Rm>{, <shift>} % register
    ADD{S}{<c>} {<Rd>,} <Rn>, <Rm>, <type> <Rs> % register-shifted register
- SUB Subtracts one value from another. Format
    SUB{S}{<c>} {<Rd>,} <Rn>, #<const> % immediate
    SUB{S}{<c>} {<Rd>,} <Rn>, <Rm>{, <shift>} % register
    SUB{S}{<c>} {<Rd>,} <Rn>, <Rm>, <type> <Rs> % register-shifted register
- BX Branches to the address held in a register. Format
    BX{<c>} <Rm>
  The value in the Rm register is moved to the pc, thus causing program execution to branch to that location.
  The value in Rm does not change.
- LDR Loads a value from memory into a register. Format
    LDR<c> <Rt>, <label> % Label
    LDR<c> <Rt>, [<Rn>{, #+/-<imm>}] % Offset
    LDR<c> <Rt>, [<Rn>, #+/-<imm>]! % Pre-indexed
    LDR<c> <Rt>, [<Rn>], #+/-<imm> % Post-indexed
  - <Rt> is the destination register, and <Rn> is the base register.
  - <label> is a labeled memory address.
  - Label form The value at the address corresponding to <label> is loaded into <Rt>.
  - Offset form The signed integer, <imm>, is added to the value in the base register, <Rn>, and the value at this address is loaded into <Rt>, but the base register is not changed.
  - Pre-indexed form The signed integer is added to the value in the base register, <Rn>, the base register is updated to the new address, and then the value at this new address is loaded into <Rt>.
  - Post-indexed form The value in the base register, <Rn>, is used as an address, and the value at that address is loaded into <Rt>. Then the signed integer is added to the value in the base register.
- STR Stores a value from a register into memory. Format
    STR<c> <Rt>, <label> % Label
    STR<c> <Rt>, [<Rn>{, #+/-<imm>}] % Offset
    STR<c> <Rt>, [<Rn>, #+/-<imm>]! % Pre-indexed
    STR<c> <Rt>, [<Rn>], #+/-<imm> % Post-indexed
  - <Rt> is the source register, and <Rn> is the base register.
  - <label> is a labeled memory address.
9.2.5 Code Walkthrough
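While a function executes, it has a frame that represents the region of memory the function uses. The system variable that points to the address where the current function's local variables begin is called the frame pointer.
A call stack is composed of stack frames.
The stack frame at the top of the stack is for the currently executing routine, which can access information within its frame (such as parameters or local variables).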
The stack frame usually includes at least the following items (in push order):
- the arguments (parameter values) passed to the routine (if any);
- the return address back to the routine's caller
- space for the local variables of the routine (if any).
While the subroutine is active, the frame pointer points at the top of the stack. (Stacks grow downward.)
This instruction
    str fp, [sp, -4]! @ save caller frame pointer
- first determines a memory address by subtracting 4 from the address in the sp register, and updates the sp register to this new address.
- It then stores the address in the fp register in memory at this new address.
Each function in the program has its own area of the stack, known as a Stack Frame.
The function keeps track of where its frame is by maintaining its memory address in the fp register.