Mastering Embedded Linux Programming

Third Edition
Create fast and reliable embedded solutions with Linux 5.4 and the Yocto Project 3.1 (Dunfell)
Frank Vasquez
Chris Simmonds

Section 3: Writing Embedded Applications

Data alignment

A CPU usually fetches several bytes at a time, and it fetches them on aligned boundaries: a 32-bit CPU reads 32 bits (4 bytes) per access, while a 64-bit CPU reads 64 bits.
For example, if a 4-byte datum occupies addresses 1 through 4, it does not start on a multiple of 4, so the CPU needs extra accesses to fetch it, which slows things down:

When copying data from a memory address that is not aligned to the architecture's natural alignment boundary (such as 4-byte alignment on many systems), performance can suffer from what is known as "unaligned memory access."
Unaligned memory access occurs when a memory operation crosses the boundary of the architecture's natural alignment.

Unaligned memory access can result in performance penalties and, in some cases, even incorrect behavior, depending on the architecture and the specific operation being performed. Here are some potential issues and their impacts:

  • Performance Impact
  • Unaligned memory accesses might require the processor to perform multiple memory fetches and additional calculations to properly align the data. This can slow down memory operations and lead to performance degradation. Modern processors are often optimized for aligned memory accesses, and unaligned accesses can cause pipeline stalls or increased memory latency.
  • Data Transfer Impact
  • When copying data that is unaligned, the process might involve fetching data from multiple memory locations, which can increase memory traffic and potentially lead to cache inefficiencies. This can have a negative impact on memory bandwidth.
  • Architecture-Specific Effects
  • Different processors and architectures handle unaligned memory access differently. Some architectures might handle unaligned accesses with minimal performance penalty, while others might require more complex operations to align the data before performing the access. In some cases, unaligned memory accesses might result in exceptions or faults that need to be handled by the operating system.
  • Code Portability
  • Code that relies heavily on unaligned memory access might not be portable across different architectures. Architectures with strict alignment requirements might not support unaligned access at all, leading to code that doesn't work as expected or crashes on certain systems.
To mitigate the performance impact of unaligned memory access, it's generally recommended to align data structures and memory allocations to the natural alignment boundary of the architecture. This might involve adding padding bytes to structures or using compiler-specific alignment directives. Additionally, using platform-specific optimized memory copy functions (e.g., `memcpy` variations) can help minimize the negative effects of unaligned memory access.

Memory alignment is an important consideration when implementing a memory copy function. Here's an improved version of the memcpy() function that takes alignment into account:


#include <stddef.h>
#include <stdint.h>

void* my_memcpy(void* dest, const void* src, size_t n) {
    unsigned char* d = (unsigned char*)dest;
    const unsigned char* s = (const unsigned char*)src;

    // Copy byte by byte until dest is aligned on a word boundary
    while (((uintptr_t)d % sizeof(uintptr_t)) != 0 && n > 0) {
        *d++ = *s++;
        --n;
    }

    // Bulk copy in units of uintptr_t (note: src may still be
    // unaligned, which some architectures penalize or fault on)
    size_t bulk = n / sizeof(uintptr_t);
    uintptr_t* dest_ptr = (uintptr_t*)d;
    const uintptr_t* src_ptr = (const uintptr_t*)s;

    for (size_t i = 0; i < bulk; ++i) {
        *dest_ptr++ = *src_ptr++;
    }

    // Copy any remaining tail bytes
    d = (unsigned char*)dest_ptr;
    s = (const unsigned char*)src_ptr;

    for (size_t i = 0; i < n % sizeof(uintptr_t); ++i) {
        *d++ = *s++;
    }

    return dest;
}

Specifying Attributes of Types

The keyword __attribute__ allows you to specify various special properties of types.
Some type attributes apply only to structure and union types, and in C++, also class types, while others can apply to any type defined via a typedef declaration.

The __attribute__ keyword is followed by an attribute specification enclosed in double parentheses.
You may specify type attributes in an enum, struct or union type declaration or definition by placing them immediately after the struct, union or enum keyword.
The following type attributes are supported on most targets.

  • aligned (alignment)
  • The aligned attribute specifies a minimum alignment (in bytes) for variables of the specified type.
    When specified, alignment must be a power of 2.
    Specifying no alignment argument implies the maximum alignment for the target, which is often, but by no means always, 8 or 16 bytes.
    For example, the declarations:
    
    struct __attribute__ ((aligned (8))) S { short f[3]; };
    typedef int more_aligned_int __attribute__ ((aligned (8)));    
        
    force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on an 8-byte boundary.
    Note that the aligned attribute may change the memory layout of structures, by inserting padding between members. Consequently, the size of the structure changes.
  • packed
  • This attribute, attached to a struct, union, or C++ class type definition, specifies that each of its members (other than zero-width bit-fields) is placed to minimize the memory required.
    This is equivalent to specifying the packed attribute on each of the members.
    When attached to an enum definition, the packed attribute indicates that the smallest integral type should be used.
    For example,
    
    struct my_unpacked_struct
     {
        char c;
        int i;
     };
    
    struct __attribute__ ((packed)) my_packed_struct
      {
         char c;
         int  i;
         struct my_unpacked_struct s;
      };    
        
    • struct my_packed_struct's members are packed closely together
    • the internal layout of its s member struct my_unpacked_struct is not packed

Proper Use of C's volatile Keyword

A variable should be declared volatile whenever its value could change unexpectedly.
In practice, only three classes of variables can change unexpectedly:
  • Memory-mapped peripheral registers
  • As a very simple example, consider an 8-bit status register that is memory mapped at address 0x1234.
    It is required that you poll the status register until it becomes non-zero.
    The naive and incorrect implementation is as follows:
    
    uint8_t * p_reg = (uint8_t *) 0x1234;
    
    // Wait for register to read non-zero 
    do { ... } while (0 == *p_reg);
    		
    This code will almost certainly fail as soon as you turn compiler optimization on.
    That's because the compiler will generate assembly language (here for a 16-bit x86 processor) that looks something like this:
    
      mov p_reg, #0x1234
      mov a, @p_reg
    loop:
      ...
      bz loop
    		
    The rationale of the optimizer is quite simple: having already read the variable's value into the accumulator (on the second line of assembly), there is no need to reread it, since the value will (duh!) always be the same.
    Thus, from the third line of assembly, we enter an infinite loop.
  • Global variables modified by an interrupt service routine (ISR)
  • An incorrect implementation of this might be:
    
    bool gb_etx_found = false;
    
    void main() 
    {
        ... 
        while (!gb_etx_found) 
        {
            // Wait
        } 
        ...
    }
    
    interrupt void rx_isr(void) 
    {
        ... 
        if (ETX == rx_char) 
        {
            gb_etx_found = true;
        } 
        ...
    }
    	
    The problem is that the compiler has no idea that gb_etx_found can be changed within the ISR function, which doesn't appear to be ever called.
    The solution is to declare the variable gb_etx_found to be volatile.
  • Global variables accessed by multiple tasks within a multi-threaded application
  • Thus, a task asynchronously modifying a shared global is conceptually the same as the ISR scenario discussed above.
Thus all shared global objects (variables, memory buffers, hardware registers, etc.) must also be declared volatile to prevent compiler optimization from introducing unexpected behaviors.
[WARNING: Global variables shared by tasks and ISRs will also need to be protected against race conditions, e.g. by a mutex.]

Mutex vs Semaphore

There are two mechanisms used for process synchronization: the mutex and the semaphore.
The mutex object lets all processes use the same resource, but only one process may use it at a time.

A semaphore is an integer variable S, initialized with the number of resource instances present in the system, and used for process synchronization.
First, the semaphore variable is initialized with the number of resources available.
Whenever a process needs a resource, it calls the wait() function, which decreases the value of the semaphore variable by one.
When the value of the semaphore variable reaches 0, any other process that wants a resource has to wait its turn.
After using the resource, the process calls the signal() function, which increases the value of the semaphore variable by one.

Let's have a look into the difference between mutex and semaphore:

  • A mutex uses a locking mechanism: if a process wants a resource, it locks the resource, uses it, and then releases it. A semaphore, on the other hand, uses a signalling mechanism, where the wait() and signal() operations indicate whether a process is taking or releasing a resource.
  • A mutex is an object, but a semaphore is an integer variable.
  • A semaphore has wait() and signal() functions. A mutex has no such functions; it provides lock and unlock operations instead.
  • A mutex object allows multiple process threads to access a single shared resource, but only one at a time. A semaphore allows multiple process threads to access a finite set of resource instances, as long as instances remain available.
  • A mutex admits only one thread into the critical section, while a semaphore can be configured to admit a chosen number of threads.
  • With a mutex, the lock must be acquired and released by the same process. The value of a semaphore variable, however, can be modified by any process that needs the resource, though only one process changes it at a time.
  • The biggest difference is that a mutex can be unlocked only by the thread that locked it. A semaphore has no such restriction: it can be signaled by the thread that waited on it or by another thread.

Storage

Common flash-memory standards for smartphones include eMMC, UFS, and NVMe. Android phones mainly adopt the eMMC and UFS standards, while NVMe is the flash-memory standard used by Apple's iPhone.

Introduction

eMMC (embedded MultiMedia Card) is a flash-memory standard defined by the MMC Association for phones and tablets, and it predates both UFS and NVMe. eMMC evolved from the MMC memory card and uses parallel signaling, so reads and writes must be performed separately. Although it offers only half-duplex operation, it has the advantages of small size, high integration, and low complexity. The latest eMMC 5.1 standard reaches a sequential read speed of about 250 MB/s.

UFS (Universal Flash Storage) was first introduced by JEDEC in 2011. It adopts serial signaling and supports simultaneous reads and writes. The first generation of UFS did not differ much in speed from the eMMC standard of the time and was more expensive, so it never became widespread. After the UFS 2.0 standard arrived in 2014, with a sequential read speed of about 800 MB/s, UFS gradually became the standard configuration in flagship Android phones. The latest UFS 3.1 standard reaches a sequential read speed of about 1,700 MB/s.

NVMe (NVM Express) was originally one of the mainstream solid-state storage standards used in the MacBook, featuring high efficiency, low overhead, and low latency. Apple first brought NVMe to the iPhone 6S series in 2015, with a sequential read speed of about 900 MB/s. Because early Android phones mostly used the eMMC 5.1 standard, the iPhone held a clear lead in sequential read speed, and the gap only narrowed after UFS 3.0 appeared.

Because eMMC's parallel signaling requires reads and writes to be performed separately, eMMC responds much more slowly than UFS or NVMe when the phone is multitasking. In addition, the average day-to-day power consumption of UFS and NVMe is lower than that of eMMC.

GPIO

GPIO Sysfs Interface for Userspace

THIS ABI IS DEPRECATED, NEW USERSPACE CONSUMERS ARE SUPPOSED TO USE THE CHARACTER DEVICE ABI. THIS OLD SYSFS ABI WILL NOT BE DEVELOPED (NO NEW FEATURES), IT WILL JUST BE MAINTAINED.
DO NOT ABUSE SYSFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS.

See the userspace header in include/uapi/linux/gpio.h for the new character device ABI.



The deprecated sysfs ABI

Paths in Sysfs


/sys/class/gpio
|-- export
|-- gpiochip200 -> ../../devices/pci0000:00/INT34BB:00/gpio/gpiochip200
`-- unexport
There are three kinds of entries in /sys/class/gpio:
  • Control interfaces used to get userspace control over GPIOs
  • The control interfaces are write-only:
    • export
    • Userspace may ask the kernel to export control of a GPIO to userspace by writing its number to this file.
      For example, this will create a "gpio19" node for GPIO #19, if that's not requested by kernel code.
      
            	echo 19 > export
              
    • unexport
    • Reverses the effect of exporting to userspace.
      For example, this will remove a "gpio19" node exported using the "export" file.
      
            	echo 19 > unexport
              
  • GPIOs themselves
  • GPIO signals have paths like /sys/class/gpio/gpioN/ (for GPIO #N) and have the following read/write attributes:
    • direction
    • reads as either "in" or "out".
      This value may normally be written.
      Writing "out" defaults to initializing the value as low.
      To ensure glitch free operation, values "low" and "high" may be written to configure the GPIO as an output with that initial value.
      Note that this attribute *will not exist* if the kernel doesn't support changing the direction of a GPIO, or it was exported by kernel code that didn't explicitly allow userspace to reconfigure this GPIO's direction.
    • value
    • reads as either 0 (low) or 1 (high).
      If the GPIO is configured as an output, this value may be written; any nonzero value is treated as high.
      If the pin can be configured to generate interrupts and has been configured to do so (see the description of "edge"), you can poll(2) on that file, and poll(2) will return whenever the interrupt is triggered.
      If you use poll(2), set the events POLLPRI and POLLERR.
      If you use select(2), set the file descriptor in exceptfds.
      After poll(2) returns, either lseek(2) to the beginning of the sysfs file and read the new value or close the file and re-open it to read the value.
    • edge
    • reads as either "none", "rising", "falling", or "both".
      Write these strings to select the signal edge(s) that will make poll(2) on the "value" file return.
      This file exists only if the pin can be configured as an interrupt generating input pin.
    • active_low
    • reads as either 0 (false) or 1 (true).
      Write any nonzero value to invert the value attribute both for reading and writing.
      Edge detection configured via the edge attribute ("rising" and "falling") follows this setting, both for existing and for subsequent poll(2) use.
  • GPIO controllers ("gpio_chip" instances)
  • GPIO controllers have paths like /sys/class/gpio/gpiochip200/ (for the controller implementing GPIOs starting at #200) and have the following read-only attributes:
    • base
    • same as N, the first GPIO managed by this chip
    • label
    • provided for diagnostics (not always unique)
    • ngpio
    • how many GPIOs this manages (N to N + ngpio - 1)
    
    /sys/class/gpio/gpiochip200
    |-- base
    |-- device -> ../../../INT34BB:00
    |-- label
    |-- ngpio
    |-- power
    |   |-- async
    |   |-- autosuspend_delay_ms
    |   |-- control
    |   |-- runtime_active_kids
    |   |-- runtime_active_time
    |   |-- runtime_enabled
    |   |-- runtime_status
    |   |-- runtime_suspended_time
    |   `-- runtime_usage
    |-- subsystem -> ../../../../../class/gpio
    `-- uevent    
        

Exporting from Kernel code

Kernel code can explicitly manage exports of GPIOs which have already been requested using gpio_request().

	/* export the GPIO to userspace */
	int gpiod_export(struct gpio_desc *desc, bool direction_may_change);

	/* reverse gpio_export() */
	void gpiod_unexport(struct gpio_desc *desc);

	/* create a sysfs link to an exported GPIO node */
	int gpiod_export_link(struct device *dev, const char *name,
		      struct gpio_desc *desc);
After a kernel driver requests a GPIO, it may only be made available in the sysfs interface by gpiod_export().
The driver can control whether the signal direction may change. This helps drivers prevent userspace code from accidentally clobbering important system state. After the GPIO has been exported, gpiod_export_link() allows creating symlinks from elsewhere in sysfs to the GPIO sysfs node.
Drivers can use this to provide the interface under their own device in sysfs with a descriptive name.

Subsystem drivers using GPIO

These drivers can quite easily interconnect with other kernel subsystems using hardware descriptions such as device tree or ACPI:
  • leds-gpio
  • drivers/leds/leds-gpio.c will handle LEDs connected to GPIO lines, giving you the LED sysfs interface
  • gpio-keys
  • drivers/input/keyboard/gpio_keys.c is used when your GPIO line can generate interrupts in response to a key press. Also supports debounce.
  • gpio_mouse
  • drivers/input/mouse/gpio_mouse.c is used to provide a mouse with up to three buttons by simply using GPIOs and no mouse port. You can cut the mouse cable and connect the wires to GPIO lines or solder a mouse connector to the lines for a more permanent solution of this type.
  • restart-gpio
  • drivers/power/reset/gpio-restart.c is used to restart/reboot the system by pulling a GPIO line and will register a restart handler so userspace can issue the right system call to restart the system.
  • poweroff-gpio
  • drivers/power/reset/gpio-poweroff.c is used to power the system down by pulling a GPIO line and will register a pm_power_off() callback so that userspace can issue the right system call to power down the system.
  • i2c-gpio
  • drivers/i2c/busses/i2c-gpio.c is used to drive an I2C bus (two wires, SDA and SCL lines) by hammering (bitbang) two GPIO lines. It will appear as any other I2C bus to the system and makes it possible to connect drivers for the I2C devices on the bus like any other I2C bus driver.
  • gpio-fan
  • drivers/hwmon/gpio-fan.c is used to control a fan for cooling the system, connected to a GPIO line (and optionally a GPIO alarm line), presenting all the right in-kernel and sysfs interfaces to make your system not overheat.
Use these instead of talking directly to the GPIOs from userspace; they integrate with kernel frameworks better than your userspace code could.

References

  • https://jasonblog.github.io/note/index.html

Data Alignment

What is Data Alignment?

Data alignment means that the address of a datum is evenly divisible by 1, 2, 4, or 8.
In other words, a data object can have 1-byte, 2-byte, 4-byte, 8-byte alignment, or any power of 2.
For instance, if the address of a datum is 12FEECh (1,244,908 in decimal), then it is 4-byte aligned, because 4 is the largest power of two that divides the address evenly. (The address is also divisible by 2 and 1, but 4 is the highest.)

A CPU accesses memory in 2-, 4-, 8-, 16-, or 32-byte chunks at a time.
The reason for this is performance: accessing an address on a 4-byte or 16-byte boundary is much faster than accessing an address on an arbitrary 1-byte boundary.

If the data is misaligned with respect to the 4-byte boundary, the CPU has to perform extra work to access it:

  • load 2 chunks of data
  • shift out the unwanted bytes
  • combine the two parts together
This process slows down performance and wastes CPU cycles just to get the right data from memory.

Structure Member Alignment

For example, if a struct contains one char variable (1 byte) and one int variable (4 bytes), the compiler pads 3 bytes between the two members.
The total size of the struct is therefore 8 bytes instead of 5.
This padding makes the address of the int member evenly divisible by 4.
This is called structure member alignment.
