August 25, 2022

ARM Virtualization

Understanding virtualization facilities in the ARMv8 processor architecture

While ARMv7 had a special CPU mode to run a hypervisor as an extension, in ARMv8, it has become a part of the architecture, and it has been integrated into the privilege-level system under the name EL2.
Virtualization in ARMv8-based systems is organized as follows: the EL2 privilege level runs a hypervisor that controls the execution of virtual machine (VM) code and the sharing of resources between VMs.
The hypervisor, also known as the Virtual Machine Monitor (VMM), is a software layer that provides virtualization capabilities.

When a program is executed (as a process), it operates within its own virtual address (VA) space.
Each process has its own unique set of translation tables, and the kernel switches from one set to another as part of a context switch between tasks.
The program accesses memory using VAs, unaware of the actual physical memory locations.
When the program accesses memory using a VA, the CPU needs to translate this VA into a physical address (PA) before it can retrieve or store data in memory.
Once the PA is obtained, the CPU can access the data in memory using this physical address.

The levels of EL1 (OS kernel, privileged code) and EL0 (unprivileged code) are left for VM instances.
Address translation is performed in two stages.

When a portion of memory is mapped to a cache, it's possible for the cached data to become out of sync with the actual data in memory.
An "invalidate operation" is used to inform the cache that the data it holds is no longer valid, and it needs to be reloaded from the main memory when accessed next.
The term "invalidate operation" typically refers to a mechanism used to mark or flag certain cached data as invalid or stale.

The Translation Lookaside Buffer (TLB) is a hardware cache that stores a mapping between VAs used by a program and their corresponding PAs in RAM (Random Access Memory).
When the CPU issues a 64-bit VA for an instruction fetch, or data access, the MMU hardware translates the VA to the corresponding PA.
The TLB is searched (a TLB lookup) to see if there's a mapping for that VA.

If there is a match (a TLB hit), the corresponding PA is obtained directly from the TLB, and the access to memory proceeds.
If there is no match (a TLB miss), the mapping for the VA that caused the miss is retrieved from the translation tables in main memory, and the TLB is updated with it. On ARMv8-A this translation table walk is performed by the MMU hardware.
TLB entries can become outdated or invalid. In such cases, TLB entries related to the outdated mapping need to be invalidated. This ensures that the TLB doesn't contain stale information.
A TLB flush operation involves clearing all or a subset of entries in the TLB.
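The hit, miss, and invalidate behaviour described above can be sketched with a toy software model. This is only an illustration, not how the hardware is implemented: the class name `ToyTLB`, the 4KB page size, and the dictionary standing in for the translation tables are all assumptions made for the example.

```python
class ToyTLB:
    """Tiny software model of a TLB: caches virtual page -> physical page."""

    def __init__(self, page_table, page_shift=12):
        self.page_table = page_table  # stands in for the translation tables in RAM
        self.page_shift = page_shift  # 12 means 4KB pages (assumption of this sketch)
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn = va >> self.page_shift
        offset = va & ((1 << self.page_shift) - 1)
        if vpn in self.entries:
            self.hits += 1  # TLB hit: no table access needed
        else:
            self.misses += 1  # TLB miss: walk the table and refill the TLB
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << self.page_shift) | offset

    def invalidate(self, va=None):
        """Drop one entry, or flush everything when no address is given."""
        if va is None:
            self.entries.clear()
        else:
            self.entries.pop(va >> self.page_shift, None)


page_table = {0x1: 0x80, 0x2: 0x95}
toy = ToyTLB(page_table)
first = toy.translate(0x1234)   # miss, then refill
second = toy.translate(0x1ABC)  # hit, same page
page_table[0x1] = 0x81          # the mapping changes ...
toy.invalidate(0x1000)          # ... so the stale entry must be invalidated
third = toy.translate(0x1234)   # miss again, picks up the new mapping
```

Without the `invalidate` call, the third access would still return the old physical page, which is exactly the stale-entry problem that TLB invalidation exists to prevent.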

For a 48-bit Virtual Address, the top 16 bits [63:48] (upper bits) must be all 0s or all 1s; otherwise the address triggers a fault.

The base addresses of the translation tables are specified in the Translation Table Base Registers TTBR0_EL1 (application VA space) and TTBR1_EL1 (kernel VA space), which keeps the two VA spaces separate.

The translation table pointed to by TTBR0 is selected when the upper bits of the VA are all 0.
TTBR1 is selected when the upper bits of the VA are all set to 1.

The Translation Control Register (TCR_EL1) defines the exact number of most significant (upper) bits that are checked.

The integers in the fields T0SZ[5:0] and T1SZ[5:0] give the number of most significant bits (upper bits) that must be all 0s (for TTBR0_EL1) or all 1s (for TTBR1_EL1).
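As a rough illustration of the selection rule, the following sketch picks a base register from the upper bits of a VA. The function name `select_ttbr` is made up for this example, the default of 16 checked bits corresponds to a 48-bit VA space, and features such as tagged addresses (top-byte ignore) are deliberately left out.

```python
def select_ttbr(va, t0sz=16, t1sz=16):
    """Return the register whose table translates va, or None for a fault.

    t0sz and t1sz play the role of TCR_EL1.T0SZ and TCR_EL1.T1SZ: the number
    of upper bits that must be all 0s (TTBR0) or all 1s (TTBR1).
    """
    va &= (1 << 64) - 1                      # treat va as a 64-bit value
    if va >> (64 - t0sz) == 0:               # upper bits all 0
        return "TTBR0_EL1"
    if va >> (64 - t1sz) == (1 << t1sz) - 1:  # upper bits all 1
        return "TTBR1_EL1"
    return None                              # neither range: translation fault


user = select_ttbr(0x0000_7FFF_FFFF_F000)
kernel = select_ttbr(0xFFFF_8000_0000_0000)
hole = select_ttbr(0x0001_0000_0000_0000)
```

Larger T0SZ/T1SZ values shrink the two valid ranges and widen the faulting gap between them.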

The least significant bits are then used to give an offset within the selected section, so that the MMU combines:

the Physical Address bits from the block table entry
the least significant bits from the original address to produce the final address
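The combination step can be written as a pair of bit masks. This is a minimal sketch with a hypothetical helper name (`output_address`) and made-up example addresses; the two calls use a 512MB block (29 offset bits) and a 64KB page (16 offset bits).

```python
def output_address(entry_base, va, offset_bits):
    """Combine the upper bits from a table entry with the low bits of the VA."""
    mask = (1 << offset_bits) - 1
    return (entry_base & ~mask) | (va & mask)


# 512MB block: VA bits [28:0] pass through unchanged.
block_pa = output_address(0x4000_0000, 0x2345_6789, 29)

# 64KB page: VA bits [15:0] pass through unchanged.
page_pa = output_address(0x8765_0000, 0x1234_ABCD, 16)
```

The larger the block or page, the more low-order bits are copied straight from the VA and the fewer bits the table entry has to supply.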

Virtual to Physical Address translation for a 512MB block (offset bits [28:0]).

Virtual to Physical Address translation for a 64KB page (offset bits [15:0]).

The processor performs a stage 1 translation table walk when the lookup misses in the TLBs.

Two-level translation

The two-level translation allows VMs to maintain their own translation tables while also allowing the hypervisor to fully control the final results.

the first stage translation

The Intermediate Physical Address Size (IPS) field of TCR_EL1 controls the maximum size of the output address, that is, of the IPA.

000=32 bits of Physical Address
101=48 bits of Physical Address
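For reference, the full set of IPS encodings can be decoded with a small lookup. This sketch assumes the field sits at bits [34:32] of TCR_EL1; the function name is invented for the example, and the 52-bit encoding is only meaningful on implementations with the large physical address extension.

```python
# TCR_EL1.IPS encoding -> output address size in bits
IPS_BITS = {
    0b000: 32,
    0b001: 36,
    0b010: 40,
    0b011: 42,
    0b100: 44,
    0b101: 48,
    0b110: 52,  # only with the large physical address extension
}


def ips_output_bits(tcr_el1):
    """Return the output address size selected by TCR_EL1.IPS, or None if reserved."""
    ips = (tcr_el1 >> 32) & 0b111
    return IPS_BITS.get(ips)


size_48 = ips_output_bits(0b101 << 32)
size_32 = ips_output_bits(0)
```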

IPA refers to an address that is an intermediate step in the translation process from a VA to a PA in ARMv8-A's memory management system.

An IPA is calculated from a VA using first-level translation tables (pointers held in TTBR0_EL1/TTBR1_EL1 registers).
The two-bit Translation Granule (TG) fields TG1 and TG0 give the granule size for kernel and user space respectively. The encodings differ between the two fields: for TG0, 00=4KB, 01=64KB, 10=16KB; for TG1, 01=16KB, 10=4KB, 11=64KB.

the second stage translation

The real physical address (PA) is calculated from the IPA using the second-level table prepared by the hypervisor (the pointer is stored in the VTTBR_EL2 register).
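The chaining of the two stages can be modelled in a few lines. In this sketch, plain dictionaries stand in for the guest's tables (reached through TTBR0_EL1/TTBR1_EL1) and the hypervisor's tables (reached through VTTBR_EL2); the names, the 4KB page size, and the sample addresses are all illustrative assumptions.

```python
PAGE_SHIFT = 12  # 4KB pages, an assumption of this sketch


def walk(table, addr):
    """Translate one address through a single-stage page map."""
    page = addr >> PAGE_SHIFT
    offset = addr & ((1 << PAGE_SHIFT) - 1)
    if page not in table:
        raise LookupError(f"translation fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | offset


def translate(va, stage1, stage2):
    """VA -> IPA using the guest's tables, then IPA -> PA using the hypervisor's."""
    ipa = walk(stage1, va)
    pa = walk(stage2, ipa)
    return ipa, pa


guest_stage1 = {0x400: 0x10}   # maintained by the guest OS
hyp_stage2 = {0x10: 0x9F000}   # maintained by the hypervisor
ipa, pa = translate(0x400ABC, guest_stage1, hyp_stage2)
```

The guest can change `guest_stage1` freely, but it can only ever reach physical pages that appear on the right-hand side of `hyp_stage2`, which is how the hypervisor keeps control of the final result.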

In ARMv8-A architecture, virtualization support allows multiple operating systems or Virtual Machines (VMs) to run concurrently on a single physical processor. The Hypervisor (or Virtual Machine Monitor) is responsible for managing and orchestrating these virtualized environments.

The VTTBR_EL2 (Virtualization Translation Table Base Register for Exception Level 2) is a system register in the ARMv8-A architecture. It is specific to virtualization and is used in the context of the Hypervisor mode (EL2).
The VTTBR_EL2 register holds the base address of the translation table for stage 2 address translation, which is controlled by the Hypervisor. This translation table maps the IPAs produced by a VM's stage 1 translation to PAs in the system's memory.

The VTTBR_EL2 register is part of the virtualization support in ARMv8-A architecture, enabling efficient and secure virtualization of hardware resources.
The VTTBR_EL2 register allows the Hypervisor to efficiently manage address translation and memory access for multiple VMs running concurrently on the same physical processor.

System memory management unit

Interrupt delivery and device (I/O) memory access in virtualized ARMv8 systems are handled by two units: the generic interrupt controller (GIC) and the system memory management unit (SMMU).

SMMUs translate I/O addresses in the same way as the CPU's MMU translates addresses issued by the processor.

The SMMU supports the one- and two-stage translation of I/O addresses.
The benefits of translation and protection of memory areas can be used in VMs as well as in the hypervisor.
Hence, devices are allowed to read/write only to/from specific memory address ranges.

The usage model of translation stages is almost the same as that for the CPU cores:

the output of the first stage produces an IPA unique to the current VM
the output of the second stage produces the real PA unique to the entire system

The format of SMMU translation tables is similar to that for the CPU, with some differences in page attributes.
Each involved device has its own translation context (which ultimately selects the associated translation table set).
Context selection is performed by the unit using the so-called Stream ID, a hardware-dependent device identifier.

SMMU maintenance resembles that of the CPU memory management unit (MMU).
However, the operations on the processor MMU (TLB reset, translation result retrieval, etc.) are performed via special instructions, while for SMMUs, they are performed by accessing context registers.

SMMU versions 3.0 and 3.1 have support for extended stream IDs and use tables in RAM to match the IDs of streams and contexts.
Such tables can have one or two levels.
Table elements contain pointers to context descriptors that are also stored in the memory, as well as the VM identifier to which the element is related, and pointers to the second-level translation tables.
Context descriptors, in turn, contain pointers to the first-level translation tables.
SMMUv3 also supports VM identifier masks, which allows the sharing of translation tables between different VMs, thus reducing the TLB pressure.
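The lookup chain described above (stream ID, table element, context descriptor, translation tables) can be sketched as a toy model. The class and field names below are invented for illustration and do not reflect the real in-memory SMMUv3 structure formats; dictionaries stand in for translation tables, and 4KB pages are assumed.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # 4KB pages, an assumption of this sketch


@dataclass
class ContextDescriptor:
    stage1: dict  # first-level tables: I/O virtual page -> IPA page


@dataclass
class StreamTableEntry:
    vmid: int                   # VM the device is assigned to
    context: ContextDescriptor  # pointer to the context descriptor
    stage2: dict                # second-level tables: IPA page -> PA page


def smmu_translate(stream_table, stream_id, io_addr):
    """Translate a device access; PermissionError models an SMMU fault."""
    ste = stream_table.get(stream_id)
    if ste is None:
        raise PermissionError(f"unknown stream ID {stream_id:#x}")
    page = io_addr >> PAGE_SHIFT
    offset = io_addr & ((1 << PAGE_SHIFT) - 1)
    try:
        ipa_page = ste.context.stage1[page]
        pa_page = ste.stage2[ipa_page]
    except KeyError:
        raise PermissionError(f"no mapping for {io_addr:#x}") from None
    return (pa_page << PAGE_SHIFT) | offset


stream_table = {
    0x20: StreamTableEntry(
        vmid=1,
        context=ContextDescriptor(stage1={0x5: 0x40}),
        stage2={0x40: 0x8040},
    ),
}
dma_pa = smmu_translate(stream_table, 0x20, 0x5010)
```

A device whose stream ID is missing from the table, or that touches an unmapped address, gets a fault instead of a physical address, which is what restricts it to its permitted memory ranges.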

Interrupt virtualization

A system that uses virtualization is more complex in interrupt handling:

Some interrupts might be handled by the hypervisor itself.
Other interrupts might come from devices allocated to a Virtual Machine (VM), and need to be handled by software within that VM.

This means that:

you need mechanisms to support the handling of some interrupts in EL2 by the hypervisor.
you also need mechanisms for forwarding other interrupts to a specific VM or specific Virtual CPU (vCPU) within a VM.

To enable these mechanisms, the architecture includes support for virtual interrupts: vIRQs, vFIQs, and vSErrors.
These virtual interrupts behave like their physical counterparts (IRQs, FIQs, and SErrors), but can only be signaled while executing in EL0 and EL1.
It is not possible to receive a virtual interrupt while executing in EL2 or EL3.

Virtual interrupts are processed by the processor in exactly the same way as physical ones.
The processing of interrupts in virtualized environments based on ARMv8 is organized as follows:

physical interrupts from the devices are sent to the EL2 level (to the hypervisor)
the hypervisor activates the corresponding virtual interrupt on the virtual processor if the interrupt is intended for it
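The two steps above amount to a small dispatch decision in the hypervisor. The following is a simplified sketch: the function name, the `owners` mapping, and the interrupt numbers are hypothetical, and a real hypervisor would program the GIC rather than append to a Python list.

```python
def deliver_physical_irq(intid, owners, pending_virqs):
    """Model of the EL2 handler: keep the interrupt or forward it as a virtual one.

    owners maps an interrupt ID to a (vm, vcpu) pair; IDs that are not listed
    belong to the hypervisor itself.
    """
    target = owners.get(intid)
    if target is None:
        return "handled by hypervisor"
    pending_virqs.setdefault(target, []).append(intid)
    return "virtual interrupt pending"


owners = {42: ("vm0", 1)}  # interrupt 42 belongs to vCPU 1 of vm0
pending = {}
own = deliver_physical_irq(27, owners, pending)
forwarded = deliver_physical_irq(42, owners, pending)
```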

Both system and service interrupts can be routed to the hypervisor.
The hypervisor handles physical interrupts before they are virtualized, in accordance with the GIC specification.

The GIC plays a crucial role in virtualized environment functioning.
GIC itself is quite a complicated device due to the necessity of delivering interrupts in multiprocessor systems (existing implementations can have 256 or more hardware threads).
Virtual interrupts are classified into one of two virtual groups, 0 and 1:

Group 0 holds the so-called fast interrupt requests (FIQs)
Group 1 holds all the others (interrupt requests, IRQs)

Both FIQs and IRQs are mechanisms used to signal the CPU that an external event or condition requires attention.
However, FIQs and IRQs differ in priority, in handling (FIQ mode, IRQ mode, and the interrupt service routine (ISR) that runs), in use cases, and in availability.

To signal virtual interrupts to EL0/1 (enable virtual interrupts), a hypervisor must set the corresponding routing bit in HCR_EL2.
For example, to enable vIRQ signaling, a hypervisor must set HCR_EL2.IMO. This setting routes physical IRQ exceptions to EL2, and enables signaling of the virtual exception to EL1.

There are two mechanisms for generating virtual interrupts:

Internally by the core, using controls in HCR_EL2.

VI = Setting this bit registers a vIRQ.
VF = Setting this bit registers a vFIQ.
VSE = Setting this bit registers a vSError.

Using a GICv2, or later, interrupt controller, which can signal both physical and virtual interrupts.
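The HCR_EL2 controls mentioned above can be expressed as bit masks. This sketch only manipulates an integer that stands in for the register value (real code would use MSR/MRS at EL2); the helper names are invented, and the bit positions are those of the HCR_EL2 layout as I understand it.

```python
# HCR_EL2 bit positions for interrupt routing and virtual interrupt injection
HCR_FMO = 1 << 3  # route physical FIQs to EL2
HCR_IMO = 1 << 4  # route physical IRQs to EL2
HCR_AMO = 1 << 5  # route physical SErrors to EL2
HCR_VF = 1 << 6   # virtual FIQ pending
HCR_VI = 1 << 7   # virtual IRQ pending
HCR_VSE = 1 << 8  # virtual SError pending


def enable_virq_routing(hcr):
    """Route physical IRQs to EL2, which also enables vIRQ signaling to EL1."""
    return hcr | HCR_IMO


def set_virq_pending(hcr, pending):
    """Register or withdraw a vIRQ using the VI bit."""
    return hcr | HCR_VI if pending else hcr & ~HCR_VI


hcr = enable_virq_routing(0)
hcr_pending = set_virq_pending(hcr, True)
hcr_cleared = set_virq_pending(hcr_pending, False)
```

Because the VI bit is a level the hypervisor drives directly, this mechanism suits simple cases; anything involving priorities or many interrupt sources is usually better served by the GIC.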

Support for interrupt virtualization in GIC is backed by a list of events representing virtual interrupts, stored in corresponding registers and handled as virtual IRQs or FIQs.
The control of virtual interrupts through the processor register interface resembles that of physical interrupts.
Thus, software running on a virtual processor is able to do the following:

set virtual priority masks
control the way virtual priority is interpreted within groups
acknowledge virtual interrupts
lower the priority of virtual interrupts
deactivate virtual interrupts

To manage virtual interrupts, the CPU interface provides a set of system registers located at the same addresses as the physical interrupt control registers. This means that the control mechanism is absolutely transparent for the VMs.

Interfaces and interaction of the components with the interrupt controller in a virtualized environment in an ARMv8-based system.

Armv8-A virtualization

1 Overview

2 Introduction to virtualization

The term hypervisor means a piece of software that is responsible for creating, managing, and scheduling Virtual Machines (VMs).
In a Type 2 hypervisor configuration, the Host OS has full control of the hardware platform and all its resources, including CPU and physical memory.

The hypervisor can then host virtual machines, which themselves run an OS. We refer to this as the Guest OS.

A standalone, or Type 1, hypervisor has no host OS.
The hypervisor runs directly on the hardware, and has full control of the hardware platform and all its resources, including CPU and physical memory.

A VM will contain one or more vCPUs.

A page of memory might be allocated to a VM, and therefore be accessible to all the vCPUs in that VM.
However, a virtual interrupt is targeted at a specific vCPU, and can only go to that vCPU.

3 Virtualization in AArch64

Secure EL2 is shown in gray. This is because support for EL2 in Secure state is not always available.

4 Stage 2 translation

Stage 2 translation allows a hypervisor to control a view of memory in a Virtual Machine (VM).
Specifically, it allows the hypervisor to control which memory-mapped system resources a VM can access.
Stage 2 translation can be used to ensure that a VM can only see the resources that are allocated to it.

The OS-controlled translation is called stage 1 translation. It maps virtual addresses to what the guest OS treats as physical addresses, known as Intermediate Physical Addresses (IPAs).
The hypervisor-controlled translation is called stage 2 translation. It maps IPAs to real physical addresses.

Each VM is assigned a virtual machine identifier (VMID). The VMID is used to tag translation lookaside buffer (TLB) entries, to identify which VM each entry belongs to.
TLB entries can also be tagged with an Address Space Identifier (ASID).
An application is assigned an ASID by the OS, and all the TLB entries for that application are tagged with that ASID.
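The effect of tagging can be shown with a toy TLB keyed by VMID and ASID. This is an illustrative sketch only: the class `TaggedTLB` and the sample page numbers are made up, and real hardware has additional rules (for example global entries that ignore the ASID).

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with (VMID, ASID)."""

    def __init__(self):
        self.entries = {}

    def fill(self, vmid, asid, vpn, ppn):
        self.entries[(vmid, asid, vpn)] = ppn

    def lookup(self, vmid, asid, vpn):
        """Return the cached physical page, or None on a miss."""
        return self.entries.get((vmid, asid, vpn))

    def invalidate_vm(self, vmid):
        """Drop every entry belonging to one VM, leaving other VMs untouched."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != vmid}


tagged = TaggedTLB()
tagged.fill(vmid=1, asid=7, vpn=0x400, ppn=0x9000)
tagged.fill(vmid=2, asid=7, vpn=0x400, ppn=0xA000)  # same ASID and VA, other VM
vm1_hit = tagged.lookup(1, 7, 0x400)
vm2_hit = tagged.lookup(2, 7, 0x400)
other_app = tagged.lookup(1, 8, 0x400)  # different application in VM 1
```

Because the tags keep entries apart, the hypervisor can switch between VMs, and a guest OS between applications, without flushing the whole TLB.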

An assigned peripheral is a real physical device that has been allocated to the VM, and mapped into its IPA space.
A virtual peripheral is one that the hypervisor emulates completely in software.

An access to a virtual peripheral triggers a stage 2 fault, with the hypervisor emulating the peripheral access in the exception handler.
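This trap-and-emulate pattern can be sketched as follows. Everything here is hypothetical: `ToyUart` is an invented virtual peripheral, the addresses are arbitrary, and the "fault" is modelled simply as the IPA having no stage 2 mapping.

```python
PAGE_SHIFT = 12  # 4KB pages, an assumption of this sketch


class ToyUart:
    """Hypothetical virtual peripheral: collects bytes written to offset 0."""

    def __init__(self):
        self.output = []

    def write(self, offset, value):
        if offset == 0:
            self.output.append(chr(value & 0xFF))


def guest_store(ipa, value, stage2, ram, virtual_devices):
    """Model a guest store: direct if mapped at stage 2, otherwise emulated."""
    page = ipa >> PAGE_SHIFT
    offset = ipa & ((1 << PAGE_SHIFT) - 1)
    if page in stage2:  # RAM or an assigned peripheral: no hypervisor involvement
        ram[(stage2[page] << PAGE_SHIFT) | offset] = value
        return "direct"
    # No stage 2 mapping: the access faults to EL2 and the handler emulates it.
    for base, size, device in virtual_devices:
        if base <= ipa < base + size:
            device.write(ipa - base, value)
            return "emulated"
    raise LookupError(f"unhandled stage 2 fault at {ipa:#x}")


uart = ToyUart()
stage2 = {0x80: 0x1234}
ram = {}
devices = [(0x0900_0000, 0x1000, uart)]
ram_store = guest_store(0x80010, 0xAA, stage2, ram, devices)
uart_store = guest_store(0x0900_0000, ord("A"), stage2, ram, devices)
```

The guest issues the same store instruction in both cases; only the stage 2 mapping decides whether it reaches memory directly or is handled by the hypervisor.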

ARMv8

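Introduction to the ARMv8 architecture

An important feature of the ARMv8 architecture is that it is compatible with the architectures that preceded it.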

Execution state

AArch64 : 64-bit Execution state

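Provides 31 64-bit general-purpose registers (X0-X30), of which X30 is used as the Procedure Link Register
Provides a 64-bit Program Counter (PC), Stack Pointer (SP), and Exception Link Register (ELR)
Defines up to four privilege levels (EL0 - EL3)
Supports 64-bit virtual addressing
Defines a set of PSTATE fields to hold the PE state
Has no concept of coprocessors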

AArch32 : 32-bit Execution state

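Provides 16 32-bit general-purpose registers
Provides one ELR, used for returning from exceptions taken to Hyp mode
Provides two instruction sets: A32 (compatible with ARMv7 ARM) and T32 (compatible with ARMv7 Thumb)
Uses 32-bit virtual addresses
Uses a single CPSR to hold the PE state
Supports only coprocessors CP10, CP11, CP14, and CP15

Switching between A32 and T32 only requires a BX instruction, but switching between AArch32 and AArch64 is possible only through an exception.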

Introduction to the AArch64 instruction set

ARMv8-A_Architecture_Reference_Manual P.111.

System Level Architecture

ARMv8-A_Architecture_Reference_Manual P.1405

Exception levels

ARMv8-A_Architecture_Reference_Manual P.1408

The ARMv8-A architecture defines four exception levels, EL0 to EL3, where a higher number means greater privilege.

EL0: unprivileged mode
EL1: OS kernel mode
EL2: Hypervisor mode
EL3: TrustZone® monitor mode

Moving to a higher level requires an exception (for example an interrupt or a page fault).

EL0 => EL1: SVC (system call)
EL1 => EL2: HVC (hypervisor call)
EL2 => EL3: SMC (secure monitor call)

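In the current architecture, the Execution state of the underlying (more privileged) software layer determines which state the layer running on top of it can use.

Security state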

You can think of TrustZone as a virtualization technology for ARM CPUs: it virtualizes a physical ARM CPU core.
A TrustZone enabled ARMv8 core can exist in one of 2 states: Secure OR Non-Secure.
This, in turn, allows us to partition all system HW and SW resources so that they exist in 1 of the 2 worlds. The ARMv8-A architecture provides two security states, and each security state has its own physical memory address space.

Secure state: the PE can access both the Secure and the Non-secure physical address spaces; it has EL0, EL1, and EL3
Non-secure state: the PE can access only the Non-secure physical address space; it has EL0, EL1, and EL2

After a power on or reset, an Armv8 system begins executing code in the `secure state`.
This usually involves secure booting of the system along with some level of system initialization.
On Armv8-M processors, a few configuration routines are used to divide the system's entire memory-map into non-overlapping secure, non-secure and non-secure-callable regions.
The result of configurations is called the "security attribution map" (for the entire system).

Configuration of `memory security attributes` is done via 2 HW blocks called the `security attribution unit` (SAU) and/or the `implementation defined attribution unit` (IDAU).
After assigning `security attributes` to system memory, every memory access by the processor is tested for its `memory security attributes` (i.e. is it a secure or non-secure address).
SAU and IDAU work together to enforce memory access restrictions at runtime.

