Mastering Embedded Linux Programming

Third Edition
Create fast and reliable embedded solutions with Linux 5.4 and the Yocto Project 3.1 (Dunfell)
Frank Vasquez
Chris Simmonds

Section 3: Writing Embedded Applications

Data alignment

A CPU usually fetches several bytes at a time, and it fetches them on aligned boundaries: a 32-bit CPU reads 32 bits (4 bytes) per access, while a 64-bit CPU reads 64 bits.
For example, if a 4-byte datum occupies addresses 1 through 4, it does not start on a multiple of 4, so the CPU needs extra accesses to fetch it, which slows things down:

When copying data from a memory address that is not aligned to the architecture's natural alignment boundary (such as 4-byte alignment on many systems), performance can suffer from what is known as "unaligned memory access."
Unaligned memory access occurs when a memory operation crosses the boundary of the architecture's natural alignment.

Unaligned memory access can result in performance penalties and, in some cases, even incorrect behavior, depending on the architecture and the specific operation being performed. Here are some potential issues and their impacts:

  • Performance Impact
  • Unaligned memory accesses might require the processor to perform multiple memory fetches and additional calculations to properly align the data. This can slow down memory operations and lead to performance degradation. Modern processors are often optimized for aligned memory accesses, and unaligned accesses can cause pipeline stalls or increased memory latency.
  • Data Transfer Impact
  • When copying data that is unaligned, the process might involve fetching data from multiple memory locations, which can increase memory traffic and potentially lead to cache inefficiencies. This can have a negative impact on memory bandwidth.
  • Architecture-Specific Effects
  • Different processors and architectures handle unaligned memory access differently. Some architectures might handle unaligned accesses with minimal performance penalty, while others might require more complex operations to align the data before performing the access. In some cases, unaligned memory accesses might result in exceptions or faults that need to be handled by the operating system.
  • Code Portability
  • Code that relies heavily on unaligned memory access might not be portable across different architectures. Architectures with strict alignment requirements might not support unaligned access at all, leading to code that doesn't work as expected or crashes on certain systems.
To mitigate the performance impact of unaligned memory access, it's generally recommended to align data structures and memory allocations to the natural alignment boundary of the architecture. This might involve adding padding bytes to structures or using compiler-specific alignment directives. Additionally, using platform-specific optimized memory copy functions (e.g., `memcpy` variations) can help minimize the negative effects of unaligned memory access.

Memory alignment is an important consideration when implementing a memory copy function. Here's an improved version of the memcpy() function that takes alignment into account:


#include <stddef.h>
#include <stdint.h>

void* my_memcpy(void* dest, const void* src, size_t n) {
    unsigned char* d = (unsigned char*)dest;
    const unsigned char* s = (const unsigned char*)src;

    // Copy byte by byte until dest is aligned on a word boundary
    while (((uintptr_t)d % sizeof(uintptr_t)) != 0 && n > 0) {
        *d++ = *s++;
        --n;
    }

    // Bulk copy in units of uintptr_t (note: src may still be
    // unaligned, which some architectures penalize or fault on)
    size_t bulk = n / sizeof(uintptr_t);
    uintptr_t* dest_ptr = (uintptr_t*)d;
    const uintptr_t* src_ptr = (const uintptr_t*)s;

    for (size_t i = 0; i < bulk; ++i) {
        *dest_ptr++ = *src_ptr++;
    }

    // Copy any remaining tail bytes
    d = (unsigned char*)dest_ptr;
    s = (const unsigned char*)src_ptr;

    for (size_t i = 0; i < n % sizeof(uintptr_t); ++i) {
        *d++ = *s++;
    }

    return dest;
}

Specifying Attributes of Types

The keyword __attribute__ allows you to specify various special properties of types.
Some type attributes apply only to structure and union types, and in C++, also class types, while others can apply to any type defined via a typedef declaration.

The __attribute__ keyword is followed by an attribute specification enclosed in double parentheses.
You may specify type attributes in an enum, struct or union type declaration or definition by placing them immediately after the struct, union or enum keyword.
The following type attributes are supported on most targets.

  • aligned (alignment)
  • The aligned attribute specifies a minimum alignment (in bytes) for variables of the specified type.
    When specified, alignment must be a power of 2.
    Specifying no alignment argument implies the maximum alignment for the target, which is often, but by no means always, 8 or 16 bytes.
    For example, the declarations:
    
    struct __attribute__ ((aligned (8))) S { short f[3]; };
    typedef int more_aligned_int __attribute__ ((aligned (8)));    
        
    force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on an 8-byte boundary.
    Note that the aligned attribute may change the memory layout of structures, by inserting padding between members. Consequently, the size of the structure changes.
  • packed
  • This attribute, attached to a struct, union, or C++ class type definition, specifies that each of its members (other than zero-width bit-fields) is placed to minimize the memory required.
    This is equivalent to specifying the packed attribute on each of the members.
    When attached to an enum definition, the packed attribute indicates that the smallest integral type should be used.
    For example,
    
    struct my_unpacked_struct
     {
        char c;
        int i;
     };
    
    struct __attribute__ ((packed)) my_packed_struct
      {
         char c;
         int  i;
         struct my_unpacked_struct s;
      };    
        
    • struct my_packed_struct's members are packed closely together
    • the internal layout of its s member struct my_unpacked_struct is not packed

Proper Use of C's volatile Keyword

A variable should be declared volatile whenever its value could change unexpectedly.
In practice, only three classes of variables can change unexpectedly:
  • Memory-mapped peripheral registers
  • As a very simple example, consider an 8-bit status register that is memory mapped at address 0x1234.
    It is required that you poll the status register until it becomes non-zero.
    The naive and incorrect implementation is as follows:
    
    uint8_t * p_reg = (uint8_t *) 0x1234;
    
    // Wait for register to read non-zero 
    do { ... } while (0 == *p_reg);
    		
    This code will almost certainly fail as soon as you turn compiler optimization on.
    That's because the compiler will generate assembly language (here for a 16-bit x86 processor) that looks something like this:
    
      mov p_reg, #0x1234
      mov a, @p_reg
    loop:
      ...
      bz loop
    		
    The rationale of the optimizer is quite simple: having already read the variable's value into the accumulator (on the second line of assembly), there is no need to reread it, since the value will (duh!) always be the same.
    Thus, from the third line of assembly, we enter an infinite loop.
  • Global variables modified by an interrupt service routine (ISR)
  • An incorrect implementation of this might be:
    
    bool gb_etx_found = false;
    
    void main() 
    {
        ... 
        while (!gb_etx_found) 
        {
            // Wait
        } 
        ...
    }
    
    interrupt void rx_isr(void) 
    {
        ... 
        if (ETX == rx_char) 
        {
            gb_etx_found = true;
        } 
        ...
    }
    	
    The problem is that the compiler has no idea that gb_etx_found can be changed within the ISR function, which doesn't appear to be ever called.
    The solution is to declare the variable gb_etx_found to be volatile.
  • Global variables accessed by multiple tasks within a multi-threaded application
  • Thus, a task asynchronously modifying a shared global is conceptually the same as the ISR scenario discussed above.
Thus all shared global objects (variables, memory buffers, hardware registers, etc.) must also be declared volatile to prevent compiler optimization from introducing unexpected behaviors.
[WARNING: Global variables shared by tasks and ISRs will also need to be protected against race conditions, e.g. by a mutex.]

Mutex vs Semaphore

There are two mechanisms used for process synchronization: the mutex and the semaphore.
The mutex object lets all processes use the same resource, but only one process may use it at a time.

A semaphore is an integer variable S, initialized with the number of resource instances present in the system, and used for process synchronization.
First, the semaphore variable is initialized with the number of resources available.
Whenever a process needs a resource, it calls the wait() function, which decreases the value of the semaphore variable by one.
When the value of the semaphore variable reaches 0, any other process that wants a resource has to wait its turn.
After using the resource, the process calls the signal() function, which increases the value of the semaphore variable by one.

Let's have a look into the difference between mutex and semaphore:

  • A mutex uses a locking mechanism: if a process wants a resource, it locks the resource, uses it, and then releases it. A semaphore, on the other hand, uses a signalling mechanism, where the wait() and signal() operations indicate whether a process is taking or releasing a resource.
  • A mutex is an object, but a semaphore is an integer variable.
  • A semaphore has wait() and signal() functions. A mutex has no such functions; it provides lock and unlock operations instead.
  • A mutex object allows multiple process threads to access a single shared resource, but only one at a time. A semaphore allows multiple process threads to access a finite set of resource instances, as long as instances remain available.
  • A mutex admits only one thread into the critical section, while a semaphore can be configured to admit a chosen number of threads.
  • With a mutex, the lock must be acquired and released by the same process. The value of a semaphore variable, however, can be modified by any process that needs the resource, though only one process changes it at a time.
  • The biggest difference is that a mutex can be unlocked only by the thread that locked it. A semaphore has no such restriction: it can be signaled by the thread that waited on it or by another thread.

Storage

Common flash-memory standards for smartphones include eMMC, UFS, and NVMe. Android phones mainly adopt the eMMC and UFS standards, while NVMe is the flash-memory standard used by Apple's iPhone.

Introduction

eMMC (embedded MultiMedia Card) is a flash-memory standard defined by the MMC Association for phones and tablets, and it predates both UFS and NVMe. eMMC evolved from the MMC memory card and uses parallel signaling, so reads and writes must be performed separately. Although it offers only half-duplex operation, it has the advantages of small size, high integration, and low complexity. The latest eMMC 5.1 standard reaches a sequential read speed of about 250 MB/s.

UFS (Universal Flash Storage) was first introduced by JEDEC in 2011. It adopts serial signaling and supports simultaneous reads and writes. The first generation of UFS did not differ much in speed from the eMMC standard of the time and was more expensive, so it never became widespread. After the UFS 2.0 standard arrived in 2014, with a sequential read speed of about 800 MB/s, UFS gradually became the standard configuration in flagship Android phones. The latest UFS 3.1 standard reaches a sequential read speed of about 1,700 MB/s.

NVMe (NVM Express) was originally one of the mainstream solid-state storage standards used in the MacBook, featuring high efficiency, low overhead, and low latency. Apple first brought NVMe to the iPhone 6S series in 2015, with a sequential read speed of about 900 MB/s. Because early Android phones mostly used the eMMC 5.1 standard, the iPhone held a clear lead in sequential read speed, and the gap only narrowed after UFS 3.0 appeared.

Because eMMC's parallel signaling requires reads and writes to be performed separately, eMMC responds much more slowly than UFS or NVMe when the phone is multitasking. In addition, the average day-to-day power consumption of UFS and NVMe is lower than that of eMMC.

GPIO

GPIO Sysfs Interface for Userspace

THIS ABI IS DEPRECATED, NEW USERSPACE CONSUMERS ARE SUPPOSED TO USE THE CHARACTER DEVICE ABI. THIS OLD SYSFS ABI WILL NOT BE DEVELOPED (NO NEW FEATURES), IT WILL JUST BE MAINTAINED.
DO NOT ABUSE SYSFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS.

See the userspace header in include/uapi/linux/gpio.h for the new character device ABI.



The deprecated sysfs ABI

Paths in Sysfs


/sys/class/gpio
|-- export
|-- gpiochip200 -> ../../devices/pci0000:00/INT34BB:00/gpio/gpiochip200
`-- unexport
There are three kinds of entries in /sys/class/gpio:
  • Control interfaces used to get userspace control over GPIOs
  • The control interfaces are write-only:
    • export
    • Userspace may ask the kernel to export control of a GPIO to userspace by writing its number to this file.
      For example, this will create a "gpio19" node for GPIO #19, if that's not requested by kernel code.
      
            	echo 19 > export
              
    • unexport
    • Reverses the effect of exporting to userspace.
      For example, this will remove a "gpio19" node exported using the "export" file.
      
            	echo 19 > unexport
              
  • GPIOs themselves
  • GPIO signals have paths like /sys/class/gpio/gpioN/ (for GPIO #N) and have the following read/write attributes:
    • direction
    • reads as either "in" or "out".
      This value may normally be written.
      Writing "out" defaults to initializing the value as low.
      To ensure glitch free operation, values "low" and "high" may be written to configure the GPIO as an output with that initial value.
      Note that this attribute *will not exist* if the kernel doesn't support changing the direction of a GPIO, or it was exported by kernel code that didn't explicitly allow userspace to reconfigure this GPIO's direction.
    • value
    • reads as either 0 (low) or 1 (high).
      If the GPIO is configured as an output, this value may be written; any nonzero value is treated as high.
      If the pin can be configured to generate interrupts and has been configured to do so (see the description of "edge"), you can poll(2) on that file, and poll(2) will return whenever the interrupt is triggered.
      If you use poll(2), set the events POLLPRI and POLLERR.
      If you use select(2), set the file descriptor in exceptfds.
      After poll(2) returns, either lseek(2) to the beginning of the sysfs file and read the new value or close the file and re-open it to read the value.
    • edge
    • reads as either "none", "rising", "falling", or "both".
      Write these strings to select the signal edge(s) that will make poll(2) on the "value" file return.
      This file exists only if the pin can be configured as an interrupt generating input pin.
    • active_low
    • reads as either 0 (false) or 1 (true).
      Write any nonzero value to invert the value attribute both for reading and writing.
      Edge detection configured via the edge attribute ("rising" and "falling") follows this setting, both for existing and for subsequent poll(2) use.
  • GPIO controllers ("gpio_chip" instances)
  • GPIO controllers have paths like /sys/class/gpio/gpiochip200/ (for the controller implementing GPIOs starting at #200) and have the following read-only attributes:
    • base
    • same as N, the first GPIO managed by this chip
    • label
    • provided for diagnostics (not always unique)
    • ngpio
    • how many GPIOs this manages (N to N + ngpio - 1)
    
    /sys/class/gpio/gpiochip200
    |-- base
    |-- device -> ../../../INT34BB:00
    |-- label
    |-- ngpio
    |-- power
    |   |-- async
    |   |-- autosuspend_delay_ms
    |   |-- control
    |   |-- runtime_active_kids
    |   |-- runtime_active_time
    |   |-- runtime_enabled
    |   |-- runtime_status
    |   |-- runtime_suspended_time
    |   `-- runtime_usage
    |-- subsystem -> ../../../../../class/gpio
    `-- uevent    
        

Exporting from Kernel code

Kernel code can explicitly manage exports of GPIOs which have already been requested using gpio_request().

	/* export the GPIO to userspace */
	int gpiod_export(struct gpio_desc *desc, bool direction_may_change);

	/* reverse gpio_export() */
	void gpiod_unexport(struct gpio_desc *desc);

	/* create a sysfs link to an exported GPIO node */
	int gpiod_export_link(struct device *dev, const char *name,
		      struct gpio_desc *desc);
After a kernel driver requests a GPIO, it may only be made available in the sysfs interface by gpiod_export().
The driver can control whether the signal direction may change. This helps drivers prevent userspace code from accidentally clobbering important system state. After the GPIO has been exported, gpiod_export_link() allows creating symlinks from elsewhere in sysfs to the GPIO sysfs node.
Drivers can use this to provide the interface under their own device in sysfs with a descriptive name.

Subsystem drivers using GPIO

These drivers can quite easily interconnect with other kernel subsystems using hardware descriptions such as device tree or ACPI:
  • leds-gpio
  • drivers/leds/leds-gpio.c will handle LEDs connected to GPIO lines, giving you the LED sysfs interface
  • gpio-keys
  • drivers/input/keyboard/gpio_keys.c is used when your GPIO line can generate interrupts in response to a key press. Also supports debounce.
  • gpio_mouse
  • drivers/input/mouse/gpio_mouse.c is used to provide a mouse with up to three buttons by simply using GPIOs and no mouse port. You can cut the mouse cable and connect the wires to GPIO lines or solder a mouse connector to the lines for a more permanent solution of this type.
  • restart-gpio
  • drivers/power/reset/gpio-restart.c is used to restart/reboot the system by pulling a GPIO line and will register a restart handler so userspace can issue the right system call to restart the system.
  • poweroff-gpio
  • drivers/power/reset/gpio-poweroff.c is used to power the system down by pulling a GPIO line and will register a pm_power_off() callback so that userspace can issue the right system call to power down the system.
  • i2c-gpio
  • drivers/i2c/busses/i2c-gpio.c is used to drive an I2C bus (two wires, SDA and SCL lines) by hammering (bitbang) two GPIO lines. It will appear as any other I2C bus to the system and makes it possible to connect drivers for the I2C devices on the bus like any other I2C bus driver.
  • gpio-fan
  • drivers/hwmon/gpio-fan.c is used to control a fan for cooling the system, connected to a GPIO line (and optionally a GPIO alarm line), presenting all the right in-kernel and sysfs interfaces to make your system not overheat.
Use these instead of talking directly to the GPIOs from userspace; they integrate with kernel frameworks better than your userspace code could.

References

  • https://jasonblog.github.io/note/index.html

Data Alignment

What is Data Alignment?

Data alignment means that the address of a datum is evenly divisible by 1, 2, 4, or 8.
In other words, a data object can have 1-byte, 2-byte, 4-byte, 8-byte alignment, or any power of 2.
For instance, if the address of a datum is 12FEECh (1,244,908 in decimal), then it is 4-byte aligned, because 4 is the largest power of two that divides the address evenly. (The address is also divisible by 2 and 1, but 4 is the highest.)

A CPU accesses memory in 2-, 4-, 8-, 16-, or 32-byte chunks at a time.
The reason for this is performance: accessing an address on a 4-byte or 16-byte boundary is much faster than accessing an address on an arbitrary 1-byte boundary.

If the data is misaligned with respect to the 4-byte boundary, the CPU has to perform extra work to access it:

  • load 2 chunks of data
  • shift out the unwanted bytes
  • combine the two parts together
This process slows down performance and wastes CPU cycles just to get the right data from memory.

Structure Member Alignment

For example, if a struct contains one char variable (1 byte) and one int variable (4 bytes), the compiler pads 3 bytes between the two members.
The total size of the struct is therefore 8 bytes instead of 5.
This padding makes the address of the int member evenly divisible by 4.
This is called structure member alignment.
