Embedded Systems Architecture: A Comprehensive Guide for Engineers and Programmers

By
Tammy Noergaard

CHAPTER 4 Embedded Processors

An embedded device contains at least one master processor, which acts as the central controlling device, and may have additional slave processors that work with and are controlled by the master processor.

For example, the STPC Atlas is a powerful x86-core, PC-compatible information-appliance system-on-chip (SoC).


The STPC Atlas integrates a standard fifth-generation x86 core with a powerful UMA graphics/video chipset and support logic, including PCI, ISA, local bus, USB, and EIDE controllers, and combines them with standard I/O interfaces to provide a complete PC-compatible subsystem on a single device, suitable for all kinds of terminal and industrial appliances.

In the block diagram of an x86 reference board, the STPC Atlas is the master processor, and the super I/O and Ethernet controllers are slave processors.

The complexity of the master processor usually determines whether it is classified as a microprocessor or a microcontroller:
  • Microprocessors contain a minimal set of integrated memory and I/O components.
  • Microcontrollers have most of the system memory and I/O components integrated on the chip.
With fewer components and lower power requirements, an integrated processor may result in a smaller and cheaper board.
Processors are considered to be of the same architecture when they can execute the same set of machine code instructions.

4.1 ISA Architecture Models

The features that are built into an architecture’s instruction set are commonly referred to as the Instruction Set Architecture or ISA.

CHAPTER 5 Board Memory

Embedded platforms can have a memory hierarchy, a collection of different types of memory, each with unique speeds, sizes, and usages.
  • Some of this memory is physically integrated on the processor, such as registers and certain types of primary memory (memory connected directly to or integrated into the processor), for example ROM, RAM, and level-1 cache.
  • Other memory is located on the board itself: some types of primary memory, such as ROM, level-2+ cache, and main memory, as well as secondary/tertiary memory, which is connected to the board but not directly to the master processor.
The basics of memory operation are essentially the same whether the memory is integrated into an IC or located discretely on a board.

Primary memory is typically a part of a memory subsystem made up of three components:

  • The memory IC, which is itself made up of three units: the memory array, the address decoder, and the data interface (a rough model of these units follows this list)
  • An address bus
  • A data bus
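As a rough illustration of how these three units and the two buses cooperate, the following C sketch models a read cycle through a simplified memory IC: the address decoder selects a row in the memory array, and the data interface drives that row's contents onto the data bus. The sizes, names, and little-endian byte ordering here are hypothetical, chosen only to mirror the structure described above.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define ROWS      256   /* hypothetical number of rows in the memory array */
    #define ROW_BYTES 4     /* hypothetical data-bus width: 4 bytes (32 bits)  */

    /* The memory array: ROWS rows, each ROW_BYTES wide. */
    static uint8_t memory_array[ROWS][ROW_BYTES];

    /* Address decoder: maps an incoming address to one row of the array. */
    static unsigned address_decode(uint32_t address)
    {
        return (address / ROW_BYTES) % ROWS;    /* select the row holding this address */
    }

    /* Data interface: drives the selected row onto the data bus. */
    static uint32_t data_interface_read(unsigned row)
    {
        uint32_t word = 0;
        for (int i = 0; i < ROW_BYTES; i++)
            word |= (uint32_t)memory_array[row][i] << (8 * i);  /* assemble the bus word */
        return word;
    }

    /* One read cycle: address bus in, data bus out. */
    uint32_t memory_ic_read(uint32_t address_bus)
    {
        return data_interface_read(address_decode(address_bus));
    }

    int main(void)
    {
        memory_array[1][0] = 0xEF;   /* plant a known value at address 0x4 */
        printf("read 0x%08" PRIX32 "\n", memory_ic_read(0x00000004));
        return 0;
    }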
Memory ICs that connect to a board come in a variety of packages, including:
  • dual inline packages (DIPs)
  • single in-line memory modules (SIMMs)
  • dual in-line memory modules (DIMMs)
The capacitors in a DRAM memory array cannot hold their charge (data) indefinitely; the charge gradually dissipates over time, so an additional refresh mechanism is required to maintain the integrity of the data.
SRAMs usually consume less power than DRAMs, since there is no extra energy needed for a refresh.
DRAM is usually used as the "main" memory, where larger quantities are required, and is also used for video RAM and cache.

Level 2+ (level 2 and higher) cache is the level of memory that exists between the CPU and main memory in the memory hierarchy.

Basically, cache is used to store subsets of main memory that are used or accessed often.
  • Writes must eventually be performed in both cache and main memory to ensure that the two remain consistent.
  • When the CPU wants to read data from memory, level-1 cache is checked first. If the data is there, it is called a cache hit; the data is returned to the CPU and the memory access is complete. If the data is not in level-1 cache, it is a cache miss. External off-chip caches are then checked, and if there is a miss there as well, main memory is accessed to retrieve and return the data to the CPU (a sketch of this read path follows).
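The read path just described can be sketched in C as follows. The two tiny direct-mapped caches and the word-sized "main memory" are hypothetical stand-ins; the point is only the order of the checks, level-1 first, then the external (level-2) cache, then main memory.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LINES     16      /* hypothetical number of lines per cache       */
    #define MEM_WORDS 1024    /* hypothetical size of "main memory", in words */

    typedef struct { bool valid; uint32_t tag; uint32_t data; } line_t;

    static line_t   l1[LINES], l2[LINES];      /* two tiny direct-mapped caches */
    static uint32_t main_memory[MEM_WORDS];    /* stand-in for DRAM             */

    static bool lookup(line_t *cache, uint32_t addr, uint32_t *data)
    {
        line_t *ln = &cache[addr % LINES];
        if (ln->valid && ln->tag == addr) { *data = ln->data; return true; }   /* hit  */
        return false;                                                          /* miss */
    }

    static void fill(line_t *cache, uint32_t addr, uint32_t data)
    {
        line_t *ln = &cache[addr % LINES];
        ln->valid = true; ln->tag = addr; ln->data = data;
    }

    /* Read path: level-1 cache first, then the external cache, then main memory. */
    uint32_t cpu_read(uint32_t addr)
    {
        uint32_t data;
        if (lookup(l1, addr, &data))                 /* level-1 cache hit           */
            return data;
        if (!lookup(l2, addr, &data)) {              /* miss everywhere: go to DRAM */
            data = main_memory[addr % MEM_WORDS];
            fill(l2, addr, data);
        }
        fill(l1, addr, data);                        /* keep the data close to the CPU */
        return data;
    }

    int main(void)
    {
        main_memory[42] = 0xCAFE;
        printf("first read:  0x%X\n", (unsigned)cpu_read(42));   /* misses, fetched from memory */
        printf("second read: 0x%X\n", (unsigned)cpu_read(42));   /* now a level-1 hit           */
        return 0;
    }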
In systems with a memory management unit (MMU) to perform address translation, cache can be integrated between the master processor and the MMU, or between the MMU and main memory.

5.4 Memory Management of External Memory

The two most common types of memory managers found on an embedded board are memory controllers (MEMC) and memory management units (MMUs).

Memory Controller:

  • The memory controller is a hardware component responsible for managing the flow of data between the CPU and the system memory (RAM).
  • It implements and provides glueless interfaces to the different types of memory in the system, such as SRAM and DRAM, synchronizing access to memory and verifying the integrity of the data being transferred.
  • It controls the timing and organization of data transfer between the CPU and RAM, including tasks such as fetching instructions and data from memory and writing data back to memory.
  • The memory controller ensures that data is transferred reliably and efficiently between the CPU and memory modules.
  • It also handles various memory-related tasks such as memory refresh operations (in DRAM), error correction, and sometimes memory mapping.
  • The controller manages the request from the master processor and accesses the appropriate banks, awaiting feedback and returning that feedback to the master processor.
Memory Management Unit (MMU):
  • The MMU is also a hardware component but is more closely associated with the CPU.
  • Its primary function is to translate virtual addresses generated by the CPU into physical addresses used by the memory subsystem.
  • The MMU enables the use of virtual memory, allowing programs to address more memory than physically available by utilizing disk space as an extension of RAM.
  • It implements techniques such as paging and segmentation to manage virtual memory, allocate memory space to processes, and control memory access permissions.
  • Whether the MMU's scheme supports segmentation, paging, or both typically depends on the software (the operating system).
  • Additionally, the MMU often handles memory protection, ensuring that processes cannot access memory locations outside their allocated address space.
To speed up the translation of addresses, the MMU can use level-1 cache, or portions of cache allocated as buffers for caching address translations, commonly referred to as the translation lookaside buffer (TLB), on the processor to store the mappings of logical addresses to physical addresses.

5.5 Board Memory and Performance

Performance (throughput) can be negatively impacted by main memory in particular, since the DRAM used for main memory can have a much lower bandwidth than that of the processor.

CHAPTER 6 Board I/O (Input/Output)

Input/output (I/O) components on a board are responsible for moving information into and out of the board to I/O devices connected to an embedded system.
Board I/O can consist of:
  • input components, which only bring information from an input device to the master processor
  • output components, which take information out of the master processor to an output device
  • components that do both
In short, board I/O can be:
  • as simple as a basic electronic circuit that connects the master processor directly to an I/O device, such as a master processor's I/O port wired to a clock or LED located on the board
  • as complex as an I/O subsystem whose circuitry includes several units

6.2 Interfacing the I/O Components

I/O hardware is made up of all or some combination of integrated master processor I/O, I/O controllers, a communications interface, a communication port, I/O buses, and a transmission medium.

For off-board I/O devices, such as keyboards, mice, LCDs, printers, and so on, a transmission medium is used to interconnect the I/O device to an embedded board via a communication port.

The communication port would then be interfaced to an I/O controller, a communication interface controller, or the master processor (with an integrated communication interface) via an I/O bus on the embedded board.
An I/O bus is essentially a collection of wires transmitting the data.
I/O buses typically support various protocols and standards, such as USB (Universal Serial Bus), SATA (Serial ATA), PCIe (Peripheral Component Interconnect Express), and Ethernet, depending on the type of device and the speed and bandwidth requirements.

The design of the communications interface between the I/O controller and master is based on four requirements:

  1. An ability of the master CPU to initialize and monitor the I/O controller.
    I/O controllers can typically be configured via control registers and monitored via status registers.
  2. A way for the master processor to request I/O.
    The most common mechanism used by the master processor to request I/O via the I/O controller is memory-mapped I/O, in which the I/O controller registers have reserved spaces in main memory (a minimal sketch of this mechanism follows the list).
  3. A way for the I/O device to contact the master CPU.
    Generally, an I/O device initiates an asynchronous interrupt request to signal (for example) that control and status registers can be read from or written to.
  4. Some mechanism for both to exchange data.
    This defines how data is actually exchanged between the I/O controller and the master processor. For example, a DMA controller can manage data transmissions and receptions directly between main memory and an I/O device; essentially, DMA requests control of the bus from the master processor.
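A minimal C sketch of the memory-mapped I/O mechanism named in requirement 2 is shown below: the controller's control, status, and data registers are simply reserved addresses in the memory map that the driver accesses through volatile pointers. The base address, register offsets, and bit definitions are hypothetical; a real driver takes them from the board and controller data sheets, and the code only runs on a target whose memory map actually reserves these locations.

    #include <stdint.h>

    /* Hypothetical addresses reserved in the memory map for an I/O controller. */
    #define IOC_BASE     0x40001000u
    #define IOC_CONTROL  (*(volatile uint32_t *)(IOC_BASE + 0x00))
    #define IOC_STATUS   (*(volatile uint32_t *)(IOC_BASE + 0x04))
    #define IOC_DATA     (*(volatile uint32_t *)(IOC_BASE + 0x08))

    /* Hypothetical bit definitions. */
    #define IOC_CTRL_ENABLE  (1u << 0)
    #define IOC_CTRL_IRQ_EN  (1u << 1)   /* let the device raise an interrupt when ready */
    #define IOC_STAT_READY   (1u << 0)

    /* Requirement 1: the master CPU initializes the controller via its control register. */
    void ioc_init(void)
    {
        IOC_CONTROL = IOC_CTRL_ENABLE | IOC_CTRL_IRQ_EN;
    }

    /* Requirements 2-4: the master requests I/O by touching the mapped registers, learns
     * that the device is ready by polling the status register (or by the device's
     * asynchronous interrupt), and exchanges the data through the data register. */
    uint32_t ioc_read_word(void)
    {
        while ((IOC_STATUS & IOC_STAT_READY) == 0)
            ;                            /* wait until the controller signals data is ready */
        return IOC_DATA;
    }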

CHAPTER 8 Device Drivers

  • A device driver that is architecture-specific manages the hardware that is integrated into the master processor (the architecture).
  • Examples of architecture-specific drivers that initialize and enable components within a master processor include on-chip memory, integrated memory managers (MMUs), and floating point hardware.
  • A device driver that is generic manages hardware that is located on the board and not integrated onto the master processor.
  • A generic driver can be configured to run on a variety of architectures that contain the related board hardware for which the driver is written.

8.2 Example 2: Memory Device Drivers

The master processor and programmers view memory as a large one-dimensional array, commonly referred to as the Memory Map. In the memory map, each cell of the array is a row of bytes (8 bits) and the number of bytes per row depends on the width of the data bus (8-bit, 16-bit, 32-bit, 64-bit, etc.).
[Figure: sample memory map]
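To make the memory-map idea concrete, the sketch below lays out a hypothetical embedded memory map as a table of regions (base address, size, and type), which is how a programmer typically consults it. The addresses and sizes are invented for illustration; a real map is fixed by the board's hardware design.

    #include <stdint.h>
    #include <stdio.h>

    /* One row of a (hypothetical) board memory map. */
    typedef struct {
        uint32_t    base;    /* first address of the region     */
        uint32_t    size;    /* length of the region in bytes   */
        const char *type;    /* what kind of memory lives there */
    } mem_region_t;

    static const mem_region_t memory_map[] = {
        { 0x00000000, 0x00200000, "Flash (boot ROM)"   },
        { 0x10000000, 0x04000000, "DRAM (main memory)" },
        { 0x40000000, 0x00100000, "Memory-mapped I/O"  },
    };

    int main(void)
    {
        for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
            const mem_region_t *r = &memory_map[i];
            printf("0x%08X - 0x%08X  %s\n",
                   (unsigned)r->base, (unsigned)(r->base + r->size - 1), r->type);
        }
        return 0;
    }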
When physical memory is referenced from the software’s point-of-view, it is commonly referred to as logical memory, and its most basic unit is the byte.
  • Logical memory refers to the memory space that a process or program can access.
  • It consists of the logical addresses generated by the CPU during program execution.
  • Programs interact with logical memory through pointers, variables, and data structures.
  • Logical memory provides a uniform and abstracted view of memory for programs, allowing them to operate independently of the underlying hardware.
  • Logical memory is made up of all the physical memory (registers, ROM, and RAM) in the entire embedded system.
The memory subsystem includes all types of memory management components, such as memory controllers and MMU, as well as the types of memory in the memory map, such as registers, cache, ROM, DRAM, and so on.
In more complex address translation schemes, the logical address provided via the OS is made up of a segment number (the address of the start of the segment) and an offset (within that segment), which together are used to determine the physical address of the memory location.
The primary role of the Memory Management Unit (MMU) is to translate logical addresses generated by the CPU into physical addresses.
This translation process allows programs to operate using logical addresses while the MMU handles the mapping of these logical addresses to physical memory locations.
Through this address translation mechanism, the logical memory space of each process is mapped to the physical memory space available in the system.

Virtual memory is a memory management technique that extends the available logical memory beyond the physical memory capacity of the system.

  • It allows programs to use more memory than physically available by utilizing disk space as an extension of RAM.
  • Virtual memory systems maintain a mapping between logical addresses and physical addresses, swapping data between physical memory and disk storage as needed.
  • This enables efficient memory utilization and allows multiple programs to run simultaneously without exhausting physical memory resources.
OSes implement various memory management policies to efficiently manage the mapping between logical and physical memory.
These policies include page replacement algorithms (e.g., LRU, FIFO), memory allocation strategies (e.g., paging, segmentation), and memory protection mechanisms.
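As an illustration of one such policy, here is a minimal FIFO page-replacement sketch in C; an LRU policy would differ only in how the victim frame is chosen. The frame count and reference string are arbitrary, and this is a sketch of the policy itself, not of any particular operating system's implementation.

    #include <stdio.h>

    #define NUM_FRAMES 3            /* hypothetical number of physical frames   */

    static int frames[NUM_FRAMES];  /* which page currently occupies each frame */
    static int loaded      = 0;     /* how many frames are in use so far        */
    static int next_victim = 0;     /* FIFO pointer: the oldest resident page   */

    /* Returns the frame holding 'page', loading (and possibly evicting) as needed. */
    static int access_page(int page)
    {
        for (int i = 0; i < loaded; i++)      /* already resident? */
            if (frames[i] == page)
                return i;

        if (loaded < NUM_FRAMES) {            /* a free frame is still available */
            frames[loaded] = page;
            return loaded++;
        }

        int victim = next_victim;             /* FIFO: evict the oldest resident page */
        printf("page fault: evicting page %d for page %d\n", frames[victim], page);
        frames[victim] = page;
        next_victim = (next_victim + 1) % NUM_FRAMES;
        return victim;
    }

    int main(void)
    {
        int refs[] = { 1, 2, 3, 4, 1, 2, 5 }; /* a short page-reference string */
        for (int i = 0; i < (int)(sizeof refs / sizeof refs[0]); i++)
            access_page(refs[i]);
        return 0;
    }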

The terms "virtual address" and "logical address" are sometimes used interchangeably, but they can have distinct meanings depending on the context.
The main differences:

  • In virtual memory systems, each process has its own virtual address space, which may be larger than the physical memory available in the system.
  • Logical addresses typically correspond directly to physical memory locations in traditional memory addressing schemes, where the size of the logical memory space is the same as that of the physical memory.
  • Virtual addresses are translated into physical addresses by the MMU, allowing for dynamic mapping of memory pages to physical memory locations.
Virtual addresses are generated by the CPU during program execution according to the memory addressing scheme implemented by the OS.
The process of generating virtual addresses involves several steps:
  1. Address Space Allocation
    When a program is loaded into memory, the OS allocates a contiguous block of virtual memory to the process. This block represents the process's address space, which includes the code, data, and stack segments.
  2. Segmentation and Paging
    Depending on the memory management scheme used by the operating system, virtual memory may be managed using segmentation, paging, or a combination of both. Segmentation divides the process's address space into logical segments, such as the code segment, data segment, and stack segment; each segment has its own base address and length. Paging divides the address space into fixed-size pages, typically 4 KB or 8 KB in size; pages can be mapped to physical memory frames or swapped out to disk as needed.
  3. Logical Address Generation
    The CPU generates logical addresses during program execution. These addresses are relative to the base address of the segment or page being accessed. For example, when accessing a variable in the data segment, the CPU calculates the logical address as the sum of the base address of the data segment and the offset of the variable within the segment; similarly, when accessing an instruction in the code segment, the CPU calculates the logical address from the base address of the code segment and the offset of the instruction.
  4. Translation to Physical Address
    The MMU (Memory Management Unit) translates logical addresses into physical addresses using hardware mechanisms such as page tables and translation lookaside buffers (TLBs).
    • If the translation is present in the TLB, the MMU retrieves the corresponding physical address directly.
    • If the translation is not in the TLB, the page tables are consulted; a page that is not resident in physical memory triggers a page fault, prompting the operating system to load the required page into physical memory from disk.
  5. Accessing Physical Memory
    Once the MMU has translated the virtual address to a physical address, the CPU can access the corresponding location in physical memory. Data is read from or written to physical memory using the address obtained from the translation process. (A minimal C sketch of steps 3 and 4 follows this list.)
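The following C sketch, referenced from steps 3 and 4 above, shows a logical address being formed as a segment base plus an offset and then being translated through a single-level page table into a physical address. The segment bases, page size, and page-table contents are invented for illustration and are far simpler than a real MMU's structures.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u                 /* assume 4 KB pages */

    /* Hypothetical per-process segment table: base logical address of each segment. */
    enum { SEG_CODE, SEG_DATA, SEG_STACK };
    static const uint32_t segment_base[] = { 0x00000000, 0x00100000, 0x7FF00000 };

    /* Hypothetical page table: logical page number -> physical frame number. */
    static uint32_t page_table[1u << 20];   /* one entry per 4 KB page of a 4 GB space */

    /* Step 3: the logical address is the segment's base plus the offset within it. */
    static uint32_t logical_address(int segment, uint32_t offset)
    {
        return segment_base[segment] + offset;
    }

    /* Step 4: the MMU splits the logical address into a page number and a page offset,
     * then replaces the page number with the frame number found in the page table. */
    static uint32_t translate(uint32_t logical)
    {
        uint32_t page   = logical / PAGE_SIZE;
        uint32_t offset = logical % PAGE_SIZE;
        return page_table[page] * PAGE_SIZE + offset;
    }

    int main(void)
    {
        uint32_t la = logical_address(SEG_DATA, 0x24);  /* a variable in the data segment  */
        page_table[la / PAGE_SIZE] = 0x00ABC;           /* pretend the OS mapped this page */
        printf("logical 0x%08X -> physical 0x%08X\n",
               (unsigned)la, (unsigned)translate(la));
        return 0;
    }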

8.2.1 Memory Management Device Driver Pseudocode Examples

The following outlines the implementation of various memory management routines on the MPC860.

  1. Initializing the Memory Controller and connected ROM/RAM
    The on-board memory (Flash, SRAM, DRAM, etc.) is initialized by initializing the MPC860's integrated memory controller. The memory controller has two different types of subunits that exist to connect to certain types of memory:
    • the general-purpose chip-select machine (GPCM), designed to interface to SRAM, EPROM, Flash EPROM, and other peripherals (such as PCMCIA)
    • the user-programmable machines (UPMs), designed to interface to a wide variety of memory, including DRAM
    The pinout of the MPC860's memory controller reflects the different signals that connect these subunits to the various types of memory (for example, the PowerPC core connected to SRAM or to DRAM). For every chip select (CS), there is an associated memory bank.
    With every new access request to external memory, the memory controller determines whether the associated address falls into one of the eight address ranges (one for each bank) defined by the eight pairs of base registers (which specify the start address of each bank) and option registers (which specify the bank length). If it does, the memory access is processed by either the GPCM or one of the UPMs, depending on the type of memory located in the memory bank that contains the desired address. Because each memory bank has a pair of base and option registers (BR0/OR0–BR7/OR7), these register pairs must be configured in the memory controller initialization drivers.
  2. Initializing the Internal Memory Map
    The MPC860's internal memory map contains the architecture's special purpose registers (SPRs).
  3. Initializing the MMU
    The MPC860 MMU supports a 4 GB uniform (user) address space that can be divided into pages of a variety of sizes, specifically 4 KB, 16 KB, 512 KB, or 8 MB, which can be individually protected and mapped to physical memory.
    Using the smallest page size (4 KB), the 4 GB address space divides into roughly one million pages (4 GB / 4 KB = 2^20), so a full translation table (also called a memory map or page table) would contain about a million address translation entries, one for each 4 KB page.
    The MPC860 MMU does not manage the entire translation table at one time (in fact, most MMUs do not); there is typically no need for 4 GB of physical memory to be managed at once. As a result, the MPC860 MMU contains small caches, the translation lookaside buffers (TLBs), to store a subset of this memory map, with separate TLBs for instructions and data.
    In the case of the MPC860,
    • the TLBs are 32-entry, fully associative caches
    • the entire memory map is stored in cheaper off-chip main memory as a two-level tree of data structures that define the physical memory layout of the board and the corresponding effective memory addresses
    The TLB is how the MMU translates (maps) logical/virtual addresses to physical addresses. When software attempts to access a part of the memory map not within the TLB, a TLB miss occurs, which is essentially a trap requiring the system software (through an exception handler) to load the required translation entry into the TLB. The system software loads the new entry through a process called a tablewalk, which is basically the process of traversing the MPC860's two-level memory map tree in main memory to locate the desired entry to be loaded into the TLB (a simplified sketch of this traversal follows the list).
    • The first level has 1024 entries (indexed by the 10-bit level-1 index), where each entry is 4 bytes and represents a 4 MB segment of virtual memory.
    • Each level-1 entry holds a pointer to the base address of the level-2 table that describes the associated 4 MB segment of virtual memory.
    • Within each level-2 table, every entry represents one page of the respective virtual memory segment.
    The format of the 32-bit logical (effective) address generated by the PowerPC core differs depending on the page size.
    • For a 4 KB page (0x1000 bytes, a 12-bit addressing range), the effective address is made up of a 10-bit level-1 index, a 10-bit level-2 index, and a 12-bit page offset.
    • For a 16 KB page (0x4000 bytes, a 14-bit addressing range), the effective address is made up of a 10-bit level-1 index, an 8-bit level-2 index, and a 14-bit page offset.
    The larger the virtual memory page size, the less memory is used for level-2 translation tables.
    In short, the MMU uses these effective address fields (level-1 index, level-2 index, and offset) in conjunction with other registers, the TLBs, the translation tables, and the tablewalk process to determine the associated physical address. The MMU initialization sequence involves initializing the MMU registers and translation table entries.
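The sketch below, referenced from the tablewalk discussion above, shows how the 4 KB-page effective-address format (a 10-bit level-1 index, a 10-bit level-2 index, and a 12-bit page offset) can drive a software walk of a two-level table after a TLB miss. The table-entry layouts, names, and the use of ordinary pointers are simplified placeholders rather than the MPC860's actual register-level formats; the code is meant only to illustrate the two-level traversal.

    #include <stdint.h>
    #include <stdio.h>

    /* Effective-address fields for a 4 KB page:
     * 10-bit level-1 index | 10-bit level-2 index | 12-bit page offset */
    #define L1_INDEX(ea)    (((ea) >> 22) & 0x3FFu)
    #define L2_INDEX(ea)    (((ea) >> 12) & 0x3FFu)
    #define PAGE_OFFSET(ea) ((ea) & 0xFFFu)

    /* Simplified, hypothetical table-entry layouts: each level-1 entry points to a
     * level-2 table covering a 4 MB segment, and each level-2 entry gives the
     * physical base address of one 4 KB page. */
    typedef struct { uint32_t phys_page_base; int valid; } l2_entry_t;
    typedef struct { l2_entry_t *l2_table;    int valid; } l1_entry_t;

    static l1_entry_t level1_table[1024];    /* in a real system: a tree kept in main memory */

    /* Tablewalk: traverse the two-level tree to find the translation that the exception
     * handler would load into the TLB after a TLB miss. Returns 0 if no mapping exists. */
    static int tablewalk(uint32_t effective_addr, uint32_t *phys_addr)
    {
        l1_entry_t *l1 = &level1_table[L1_INDEX(effective_addr)];
        if (!l1->valid)
            return 0;                        /* no level-2 table for this 4 MB segment */

        l2_entry_t *l2 = &l1->l2_table[L2_INDEX(effective_addr)];
        if (!l2->valid)
            return 0;                        /* page not mapped */

        *phys_addr = l2->phys_page_base | PAGE_OFFSET(effective_addr);
        return 1;                            /* a real handler would now load the TLB */
    }

    int main(void)
    {
        static l2_entry_t seg0_pages[1024];               /* level-2 table for segment 0 */
        seg0_pages[2]   = (l2_entry_t){ 0x00ABC000u, 1 }; /* map page 2 of segment 0     */
        level1_table[0] = (l1_entry_t){ seg0_pages, 1 };

        uint32_t pa;
        if (tablewalk(0x00002123u, &pa))     /* level-1 index 0, level-2 index 2, offset 0x123 */
            printf("physical address: 0x%08X\n", (unsigned)pa);
        return 0;
    }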

CHAPTER 9 Embedded Operating Systems

9.3 Memory Management

The CPU only executes task code that is in cache or RAM.
The OS treats memory as one large one-dimensional array, called a memory map. The conversion between logical and physical addresses is done either by a hardware component integrated into the master CPU or located on the board (such as an MMU), or it must be handled by the OS.

Kernel routines run in kernel mode (also referred to as supervisor mode). Higher layers of software run in user mode and can only access anything running in kernel mode via system calls, the higher-level interfaces to the kernel's subroutines.

Because multiple processes share the same physical memory when loaded into RAM for processing, the operating system uses memory "swapping," in which partitions of memory are swapped in and out of memory at run-time. The most common partitions of memory used in swapping are segments and pages. Swapping is the foundation for virtual memory.

A process encapsulates all the information that is involved in executing a program; all of the different types of information within a process are divided into "logical" memory units of variable sizes, called segments. A segment is a set of logical addresses containing the same type of information.
Most OSes typically allow processes to have all or some combination of five types of information within segments:

  • text (or code) segment
  • data segment
  • bss (block started by symbol) segment
  • stack segment
  • heap segment
The data, text, and bss segments are all fixed in size at compile time and are therefore static segments; it is these three segments that typically are part of the executable file.
The OS creates a task’s image by memory mapping the contents of the executable file.
The stack and heap segments, on the other hand, are not fixed at compile time, and can change in size at runtime and so are dynamic allocation components.
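A small C example may make the mapping from program objects to segments concrete; the comments note where each object typically ends up, though the exact placement is toolchain-dependent.

    #include <stdio.h>
    #include <stdlib.h>

    int initialized_value = 42;     /* data segment: initialized global variables            */
    int uninitialized_value;        /* bss segment: zero-initialized (uninitialized) globals */
    const char banner[] = "hello";  /* read-only data, often grouped with the text segment   */

    int main(void)                  /* the compiled function code itself: text segment       */
    {
        int local = 7;                           /* stack segment: automatic variables */
        int *dynamic = malloc(sizeof *dynamic);  /* heap segment: run-time allocation  */

        printf("%d %d %s %d\n", initialized_value, uninitialized_value, banner, local);
        free(dynamic);
        return 0;
    }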

Either with or without segmentation, some OSes divide logical memory into some number of fixed-size partitions, called blocks, frames, pages or some combination of a few or all of these.

  • When a process is loaded in its entirety into memory (in the form of pages), its pages may not be located within a contiguous set of frames. Every process has an associated process table that tracks its pages and each page's corresponding frame in memory.
  • The logical addresses generated for each process are unique, typically made up of a page number, which identifies the page, and an offset, which identifies the actual memory location within that page.
  • In effect, the logical address is formed from the page number combined with the offset.
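For example (with invented numbers): assuming 4 KB pages, logical address 0x3204 falls in page 3 at offset 0x204; if the process table maps page 3 to frame 7, the corresponding physical location is 7 × 0x1000 + 0x204 = 0x7204.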
Dividing up logical memory into pages aids the OS in more easily managing tasks being relocated in and out of various types of memory in the memory hierarchy, a process called swapping.

Virtual memory is typically implemented via demand segmentation and/or demand paging. When virtual memory is implemented via these "demand" techniques, only the pages and/or segments that are currently in use are loaded into RAM.

The OS:

  • generates virtual addresses based on the logical addresses
  • maintains tables for converting sets of logical addresses into virtual addresses
  • manages more than one address space for each process (the physical, logical, and virtual address spaces)
The process views memory as one continuous memory space, whereas the kernel actually manages memory as several fragmented pieces which can be segmented and paged, segmented and unpaged, unsegmented and paged, or unsegmented and unpaged.
