Linux Device Tree
In the past, the bootloader tells the kernel two things:
- which board is being booted through a machine type integer, passed in register r1 ,
- additional information, called ATAGS, which address is passed to the kernel through register r2.
With a device tree, the description of the hardware is located in a separate binary: the Device Tree Bolb. The bootloader passes the DTB address through the register r2.
On ARM, all Device Tree Source (DTS) files are located in arch/arm/boot/dts:
- .dts files for board-level definitions
- .dtsi files for included files
The tool Device Tree Compiler is used to compile the source into a binary form, the Device Tree Bolb.
The bootloader loads it then the kernel parse it at boot time.
The bootloader loads it then the kernel parse it at boot time.
arch/arm/boot/dts/Makefile lists which DTB should be generated at kernel's building time.
Some bootloaders can't support the device tree. A compatibiity mechanism was added:
- CONFIG_ARM_APPENDED_DTB tells the kernel to look for a DTB right after the kernel image, therefore, the DTB needs to be appended to the kernel image
- CONFIG_ARM_ATAG_DTB_COMPAT tells the kernel to read ATAG information from the bootloader, and use them to update the DT.
There are a great many number of embedded devices that are based on the same silicon chips but designed into custom boards or devices, all of which could use the same kernel if not for the slight differences between them. Furthermore, even the same family of chips can have slightly different connections leading to a custom kernel being required.
As the number of chip-sets increase, so too does the software needed to support them at the kernel level.
Device tree's try to combat these slight differences by allowing hardware resources to be described in a human-readable form which is then converted into a binary object and passed to the kernel at boot-time. This allows the kernel to discover the hardware it is running on instead of embedding this information inside the kernel itself.
This idea is based on the Open Firmware IEEE 1275 Open Firmware implementation.
The Flattened Device Tree (FDT) is a data structure for describing the hardware in a system. It is a derived from the device tree format used by Open Firmware to encapsulate platform information and convey it to the operating system. The operating system uses the FDT data to find and register the devices in the system.
Currently the Linux kernel can read device tree information in the ARM, x86, Microblaze, PowerPC, and Sparc architectures. There is interest in extending support for device trees to other platforms, to unify the handling of platform description across kernel architectures.
The format of the FDT is expressive and able to describe most board design aspects including:
- the number and type of CPUs,
- base addresses and size of RAM,
- busses and bridges,
- peripheral device connections, and
- interrupt controllers and IRQ line connections.
Just like initrd images, an FDT image can either be statically linked into the kernel or passed to the kernel at boot time.
For SOC vendors, FDT reduce or eliminate effort needed to write machine support code (ie arch/arm/mach-*). Focus on device driver development instead.
All Device Tree bindings recognized by the kernel are documented in Documentation/devicetree/bindings. Each binding doucmentations described which properties are accepted, with which values, which properties are mandatory or optional, etc.
http://elinux.org/Device_Tree
Introduction
Device Tree data can be represented in several different formats. It is derived from the device tree format used by Open Firmware to encapsulate platform information and convey it to the Linux operating system. The device tree data is typically created and maintained in a human readable format in .dts source files and .dtsi source include files. The Linux build system pre-processes the source with cpp.
The device tree source is compiled into a binary format contained in a .dtb blob file. The format of the data in the .dtb blob file is commonly referred to as a Flattened Device Tree (FDT). The Linux operating system uses the device tree data to find and register the devices in the system. The FDT is accessed in the raw form during the very early phases of boot, but is expanded into a kernel internal data structure for more efficient access for later phases of the boot and after the system has completed booting.
Currently the Linux kernel can read device tree information in the ARM, x86, Microblaze, PowerPC, and Sparc architectures. There is interest in extending support for device trees to other platforms, to unify the handling of platform description across kernel architectures.
The Flattened Device Tree is...
The Flattened Device Tree (FDT) is a data structure. Nothing more.
It describes a machine hardware configuration. It is derived from the device tree format used by Open Firmware. The format is expressive and able to describe most board design aspects including:
- the number and type of CPUs,
- base addresses and size of RAM,
- busses and bridges,
- peripheral device connections, and
- interrupt controllers and IRQ line connections.
Just like initrd images, an FDT image can either be statically linked into the kernel or passed to the kernel at boot time.
The Flattened Device Tree is not...
- is not a solution to all board port problems
- Nothing will eliminate all board specific drivers for custom and complex boards.
- is not a firmware interface
- It might be part of a generic firmware interface, but on its own the device tree is just a data structure.
- does not replace ATAGS... but an FDT image can be passed via an ATAG.
- See "Competing Solutions" below
- is not intended to be a universal interface.
- It is a useful data structure which solves several problems, but whether or not to use it is still up to the board port author.
- is not an invasive change
- Device Tree is required for new board support in the ARM architecture.
- No requirement to convert existing board ports
- No requirement to modify existing firmware
Advantages
for distributions
- potentially fewer kernel images needed on an installer image (ie. for ARM netbooks)
- Ship one FDT image per machine
- Becomes feasible for current installer image to boot on future hardware platforms using same chipset.
- Note: FDT is only part of the solution here. Some boot software is still required to select and pass in the correct FDT image.
for System on Chip (SoC) vendors
- Reduce or eliminate effort needed to write machine support code (ie arch/arm/mach-* ). Focus on device driver development instead.
for board designers
- Reduce effort required to port.
- SoC vendor supplied reference design binaries may also be bootable on custom machine.
- No need to allocate a new global ARM machine id for each new board variant.
- Use the device tree
, namespace instead
- Use the device tree
- Most board specific code changes constrained to device tree file and device drivers.
- Example: Xilinx FPGA toolchain has a tool to generate a device tree source file from the FPGA design files.
- Since the hardware description is constrained to the device tree source, FPGA engineers can test design changes without getting involved with kernel code.
- Alternately, kernel coders don't need to manually extract design changes from the FPGA design files.
for embedded Linux ecosystem
- smaller amount of board support code to merge
- greater likelihood of mainline support for boards from "uninterested" vendors
- greater ability to correct poor board support by fixing or replacing broken FDT images.
for firmware/bootloader developers
- reduce impact of getting board description wrong (FDT stored as a separate image instead of statically linked into firmware). If initial release gets the board description wrong, then it is easily updated without a risky reflash of firmware.
- expressive format to describe related board variants without allocating new machine numbers or new ATAGs.
- Note: The FDT isn't a replacement to ATAGS, but does supplement them.
Other advantages
- Device tree source and FDTs can easily be machine generated and/or modified.
- Xilinx FPGA tools do device tree source generation
- U-Boot firmware can inspect and modify an FDT image before booting
Competing Solutions
board specific data structures
Some platforms use board-specific C data structures for passing data from the bootloader to the kernel. Notable here is embedded PowerPC support before standardizing on the FDT data format.
Experience with PowerPC demonstrated that using a custom C data structure is certainly an expedient solution for small amounts of data, but it causes maintainability issues in the long term and it doesn't make any attempt to solve the problem of describing the board configuration as a whole. Special cases tend to grow and there is no way for the kernel to determine what specific version of the data structure is passed to it. PowerPCs board info structure ended up being a mess of #ifdefs and ugly hacks, and it still only passed a handful of data like memory size and Ethernet MAC addresses.
ATAGs have the elegance of providing an well defined namespace for passing individual data items (memory regions, initrd address, etc) and the operating system can reliably decode them. However, only a dozen or so ATAGs are defined and is not expressive enough to describe the board design. Using ATAGs essentially requires a separate machine number to be allocated for each board variant, even if they are based on the same design.
That being said, an ATAG is an ideal method for passing an FDT image to the kernel in the same way an ATAG is used to pass the initrd address.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree
Documentation for device trees, a data structure by which bootloaders pass hardware layout to Linux in a device-independent manner, simplifying hardware probing.
booting-without-of.txt
I - Introduction
================
During the development of the Linux/ppc64 kernel, and more specifically, the addition of new platform types outside of the old IBM pSeries/iSeries pair, it was decided to enforce some strict rules regarding the kernel entry and bootloader <-> kernel interfaces, in order to avoid the degeneration that had become the ppc32 kernel entry point and the way a new platform should be added to the kernel. ->
The legacy iSeries platform breaks those rules as it predates this scheme, but no new board support will be accepted in the main tree that doesn't follow them properly.
In addition, since the advent of the arch/powerpc merged architecture for ppc32 and ppc64, new 32-bit platforms and 32-bit platforms which move into arch/powerpc will be required to use these rules as well.
The main requirement that will be defined in more detail below is the presence of a device-tree whose format is defined after Open Firmware specification. However, in order to make life easier to embedded board vendors, the kernel doesn't require the device-tree to represent every device in the system and only requires some nodes and properties to be present. This will be described in detail in section III, but, for example, the kernel does not require you to create a node for every PCI device in the system. It is a requirement
to have a node for PCI host bridges in order to provide interrupt routing information and memory/IO ranges, among others. It is also recommended to define nodes for on chip devices and other buses that don't specifically fit in an existing OF specification. This creates a
great flexibility in the way the kernel can then probe those and match drivers to device, without having to hard code all sorts of tables. It also makes it more flexible for board vendors to do minor hardware upgrades without significantly impacting the kernel code or cluttering it with special cases.
1) Entry point for arch/arm
---------------------------
There is one single entry point to the kernel, at the start of the kernel image.
That entry point supports two calling conventions. A summary of the interface is described here. A full description of the boot requirements is documented in Documentation/arm/Booting
a) ATAGS interface. Minimal information is passed from firmware
to the kernel with a tagged list of predefined parameters.
r0 : 0
r1 : Machine type number
r2 : Physical address of tagged list in system RAM
b) Entry with a flattened device-tree block. Firmware loads the
physical address of the flattened device tree block (dtb) into r2,
r1 is not used, but it is considered good practice to use a valid
machine number as described in Documentation/arm/Booting.
r0 : 0
r1 : Valid machine type number. When using a device tree, a single machine type number will often be assigned to represent a class or family of SoCs.
r2 : physical pointer to the device-tree block(defined in chapter II) in RAM. Device tree can be located anywhere in system RAM, but it should be aligned on a 64 bit boundary.
The kernel will differentiate between ATAGS and device tree booting by reading the memory pointed to by r2 and looking for either the flattened device tree block magic value (0xd00dfeed) or the ATAG_CORE value at offset 0x4 from r2 (0x54410001).
*** Note ***
The "Machine ID" is specific to ARM Linux, and the numbers were assigned by the ARM kernel maintainer. Machines supported in mainline are listed in arch/arm/tools/mach-types
; the full registry can be found here.
ARM systems are problematic in that there is no "standard" hardware layout (e.g. the IBM PC-compatible for x86), no standard firmware (e.g. ACPI BIOS), and most peripherals are directly connected to the CPU rather than being behind a probable bus (e.g. PCI). Hence the ARM kernel had to rely on the bootloader telling it what machine it's running on, thus which hard-coded hardware definition/support code to use (see arch/arm/mach-*/
).
Note that this system is now obsolete and the preferred way of describing hardware is with Device Tree, which removes most of the need to fill the kernel with machine-specific code (indeed many of those older systems that are still supported are being converted from "boardfiles" to DT).
2) Entry point for arch/powerpc
-------------------------------
There is one single entry point to the kernel, at the start of the kernel image. That entry point supports two calling conventions:
a) Boot from Open Firmware. If your firmware is compatible
with Open Firmware (IEEE 1275) or provides an OF compatible
client interface API (support for "interpret" callback of
forth words isn't required), you can enter the kernel with:
r5 : OF callback pointer as defined by IEEE 1275
bindings to powerpc. Only the 32-bit client interface
is currently supported
r3, r4 : address & length of an initrd if any or 0
The MMU is either on or off; the kernel will run the
trampoline located in arch/powerpc/kernel/prom_init.c to
extract the device-tree and other information from open
firmware and build a flattened device-tree as described
in b). prom_init() will then re-enter the kernel using
the second method. This trampoline code runs in the
context of the firmware, which is supposed to handle all
exceptions during that time.
b) Direct entry with a flattened device-tree block. This entry
point is called by a) after the OF trampoline and can also be
called directly by a bootloader that does not support the Open
Firmware client interface. It is also used by "kexec" to
implement "hot" booting of a new kernel from a previous
running one. This method is what I will describe in more
details in this document, as method a) is simply standard Open
Firmware, and thus should be implemented according to the
various standard documents defining it and its binding to the
PowerPC platform. The entry point definition then becomes:
r3 : physical pointer to the device-tree block
(defined in chapter II) in RAM
r4 : physical pointer to the kernel itself. This is
used by the assembly code to properly disable the MMU
in case you are entering the kernel with MMU enabled
and a non-1:1 mapping.
r5 : NULL (as to differentiate with method a)
Note about SMP entry: Either your firmware puts your other
CPUs in some sleep loop or spin loop in ROM where you can get
them out via a soft reset or some other means, in which case
you don't need to care, or you'll have to enter the kernel
with all CPUs. The way to do that with method b) will be
described in a later revision of this document.
Board supports (platforms) are not exclusive config options. An
arbitrary set of board supports can be built in a single kernel
image. The kernel will "know" what set of functions to use for a
given platform based on the content of the device-tree. Thus, you
should:
a) add your platform support as a _boolean_ option in
arch/powerpc/Kconfig, following the example of PPC_PSERIES,
PPC_PMAC and PPC_MAPLE. The later is probably a good
example of a board support to start from.
b) create your main platform file as
"arch/powerpc/platforms/myplatform/myboard_setup.c" and add it
to the Makefile under the condition of your CONFIG_
option. This file will define a structure of type "ppc_md"
containing the various callbacks that the generic code will
use to get to your platform specific code
A kernel image may support multiple platforms, but only if the
platforms feature the same core architecture. A single kernel build
cannot support both configurations with Book E and configurations
with classic Powerpc architectures.
3) Entry point for arch/x86
-------------------------------
There is one single 32bit entry point to the kernel at code32_start,
the decompressor (the real mode entry point goes to the same 32bit
entry point once it switched into protected mode). That entry point
supports one calling convention which is documented in
Documentation/x86/boot.txt
The physical pointer to the device-tree block (defined in chapter II)
is passed via setup_data which requires at least boot protocol 2.09.
The type filed is defined as
#define SETUP_DTB 2
This device-tree is used as an extension to the "boot page". As such it
does not parse / consider data which is already covered by the boot
page. This includes memory size, reserved ranges, command line arguments
or initrd address. It simply holds information which can not be retrieved
otherwise like interrupt routing or a list of devices behind an I2C bus.
4) Entry point for arch/mips/bmips
----------------------------------
Some bootloaders only support a single entry point, at the start of the kernel image. Other bootloaders will jump to the ELF start address.
Both schemes are supported; CONFIG_BOOT_RAW=y and CONFIG_NO_EXCEPT_FILL=y, so the first instruction immediately jumps to kernel_entry().
Similar to the arch/arm case (b), a DT-aware bootloader is expected to set up the following registers:
a0 : 0
a1 : 0xffffffff
a2 : Physical pointer to the device tree block (defined in chapter
II) in RAM. The device tree can be located anywhere in the first
512MB of the physical address space (0x00000000 - 0x1fffffff),
aligned on a 64 bit boundary.
Legacy bootloaders do not use this convention, and they do not pass in a
DT block. In this case, Linux will look for a builtin DTB, selected via
CONFIG_DT_*.
This convention is defined for 32-bit systems only, as there are not currently any 64-bit BMIPS implementations.
II - The DT block format
========================
This chapter defines the actual format of the flattened device-tree passed to the kernel. The actual content of it and kernel requirements are described later. You can find example of code manipulating that format in various places, including arch/powerpc/kernel/prom_init.c which will generate a flattened device-tree from the Open Firmware representation, or the fs2dt utility which is part of the kexec tools which will generate one from a filesystem representation. It is expected that a bootloader like uboot provides a bit more support, that will be discussed later as well.
Note: The block has to be in main memory. It has to be accessible in both real mode and virtual mode with no mapping other than main memory. If you are writing a simple flash bootloader, it should copy the block to RAM before passing it to the kernel.
1) Header
---------
The kernel is passed the physical address pointing to an area of memory
that is roughly described in include/linux/of_fdt.h by the structure
boot_param_header:
struct boot_param_header {
u32 magic; /* magic word OF_DT_HEADER */
u32 totalsize; /* total size of DT block */
u32 off_dt_struct; /* offset to structure */
u32 off_dt_strings; /* offset to strings */
u32 off_mem_rsvmap; /* offset to memory reserve map
*/
u32 version; /* format version */
u32 last_comp_version; /* last compatible version */
/* version 2 fields below */
u32 boot_cpuid_phys; /* Which physical CPU id we're
booting on */
/* version 3 fields below */
u32 size_dt_strings; /* size of the strings block */
/* version 17 fields below */
u32 size_dt_struct; /* size of the DT structure block */
};
Along with the constants:
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER 0xd00dfeed /* 4: version,
4: total size */
#define OF_DT_BEGIN_NODE 0x1 /* Start node: full name
*/
#define OF_DT_END_NODE 0x2 /* End node */
#define OF_DT_PROP 0x3 /* Property: name off,
size, content */
#define OF_DT_END 0x9
All values in this header are in big endian format, the various
fields in this header are defined more precisely below. All
"offset" values are in bytes from the start of the header; that is
from the physical base address of the device tree block.
- magic
This is a magic value that "marks" the beginning of the
device-tree block header. It contains the value 0xd00dfeed and is
defined by the constant OF_DT_HEADER
- totalsize
This is the total size of the DT block including the header. The
"DT" block should enclose all data structures defined in this
chapter (who are pointed to by offsets in this header). That is,
the device-tree structure, strings, and the memory reserve map.
- off_dt_struct
This is an offset from the beginning of the header to the start
of the "structure" part the device tree. (see 2) device tree)
- off_dt_strings
This is an offset from the beginning of the header to the start
of the "strings" part of the device-tree
- off_mem_rsvmap
This is an offset from the beginning of the header to the start
of the reserved memory map. This map is a list of pairs of 64-
bit integers. Each pair is a physical address and a size. The
list is terminated by an entry of size 0. This map provides the
kernel with a list of physical memory areas that are "reserved"
and thus not to be used for memory allocations, especially during
early initialization. The kernel needs to allocate memory during
boot for things like un-flattening the device-tree, allocating an
MMU hash table, etc... Those allocations must be done in such a
way to avoid overriding critical things like, on Open Firmware
capable machines, the RTAS instance, or on some pSeries, the TCE
tables used for the iommu. Typically, the reserve map should
contain _at least_ this DT block itself (header,total_size). If
you are passing an initrd to the kernel, you should reserve it as
well. You do not need to reserve the kernel image itself. The map
should be 64-bit aligned.
- version
This is the version of this structure. Version 1 stops
here. Version 2 adds an additional field boot_cpuid_phys.
Version 3 adds the size of the strings block, allowing the kernel
to reallocate it easily at boot and free up the unused flattened
structure after expansion. Version 16 introduces a new more
"compact" format for the tree itself that is however not backward
compatible. Version 17 adds an additional field, size_dt_struct,
allowing it to be reallocated or moved more easily (this is
particularly useful for bootloaders which need to make
adjustments to a device tree based on probed information). You
should always generate a structure of the highest version defined
at the time of your implementation. Currently that is version 17,
unless you explicitly aim at being backward compatible.
- last_comp_version
Last compatible version. This indicates down to what version of
the DT block you are backward compatible. For example, version 2
is backward compatible with version 1 (that is, a kernel build
for version 1 will be able to boot with a version 2 format). You
should put a 1 in this field if you generate a device tree of
version 1 to 3, or 16 if you generate a tree of version 16 or 17
using the new unit name format.
- boot_cpuid_phys
This field only exist on version 2 headers. It indicate which
physical CPU ID is calling the kernel entry point. This is used,
among others, by kexec. If you are on an SMP system, this value
should match the content of the "reg" property of the CPU node in
the device-tree corresponding to the CPU calling the kernel entry
point (see further chapters for more information on the required
device-tree contents)
- size_dt_strings
This field only exists on version 3 and later headers. It
gives the size of the "strings" section of the device tree (which
starts at the offset given by off_dt_strings).
- size_dt_struct
This field only exists on version 17 and later headers. It gives
the size of the "structure" section of the device tree (which
starts at the offset given by off_dt_struct).
So the typical layout of a DT block (though the various parts don't
need to be in that order) looks like this (addresses go from top to
bottom):
------------------------------
base -> | struct boot_param_header |
------------------------------
| (alignment gap) (*) |
------------------------------
| memory reserve map |
------------------------------
| (alignment gap) |
------------------------------
| |
| device-tree structure |
| |
------------------------------
| (alignment gap) |
------------------------------
| |
| device-tree strings |
| |
-----> ------------------------------
|
|
--- (base + totalsize)
(*) The alignment gaps are not necessarily present; their presence
and size are dependent on the various alignment requirements of
the individual data blocks.
2) Device tree generalities
---------------------------
This device-tree itself is separated in two different blocks, a structure block and a strings block. Both need to be aligned to a 4 byte boundary.
First, let's quickly describe the device-tree concept before detailing the storage format. This chapter does _not_ describe the detail of the required types of nodes & properties for the kernel, this is done later in chapter III.
The device-tree layout is strongly inherited from the definition of the Open Firmware IEEE 1275 device-tree. It's basically a tree of nodes, each node having two or more named properties. A property can have a value or not.
It is a tree, so each node has one and only one parent except for the root node who has no parent.
A node has 2 names. The actual node name is generally contained in a
property of type "name" in the node property list whose value is a
zero terminated string and is mandatory for version 1 to 3 of the
format definition (as it is in Open Firmware). Version 16 makes it
optional as it can generate it from the unit name defined below.
There is also a "unit name" that is used to differentiate nodes with
the same name at the same level, it is usually made of the node
names, the "@" sign, and a "unit address", which definition is
specific to the bus type the node sits on.
The unit name doesn't exist as a property per-se but is included in
the device-tree structure. It is typically used to represent "path" in
the device-tree. More details about the actual format of these will be
below.
The kernel generic code does not make any formal use of the
unit address (though some board support code may do) so the only real
requirement here for the unit address is to ensure uniqueness of
the node unit name at a given level of the tree. Nodes with no notion
of address and no possible sibling of the same name (like /memory or
/cpus) may omit the unit address in the context of this specification,
or use the "@0" default unit address. The unit name is used to define
a node "full path", which is the concatenation of all parent node
unit names separated with "/".
The root node doesn't have a defined name, and isn't required to have
a name property either if you are using version 3 or earlier of the
format. It also has no unit address (no @ symbol followed by a unit
address). The root node unit name is thus an empty string. The full
path to the root node is "/".
Every node which actually represents an actual device (that is, a node
which isn't only a virtual "container" for more nodes, like "/cpus"
is) is also required to have a "compatible" property indicating the
specific hardware and an optional list of devices it is fully
backwards compatible with.
Finally, every node that can be referenced from a property in another
node is required to have either a "phandle" or a "linux,phandle"
property. Real Open Firmware implementations provide a unique
"phandle" value for every node that the "prom_init()" trampoline code
turns into "linux,phandle" properties. However, this is made optional
if the flattened device tree is used directly. An example of a node
referencing another node via "phandle" is when laying out the
interrupt tree which will be described in a further version of this
document.
The "phandle" property is a 32-bit value that uniquely
identifies a node. You are free to use whatever values or system of
values, internal pointers, or whatever to generate these, the only
requirement is that every node for which you provide that property has
a unique value for it.
Here is an example of a simple device-tree. In this example, an "o"
designates a node followed by the node unit name. Properties are
presented with their name followed by their content. "content"
represents an ASCII string (zero terminated) value, while
represents a 32-bit value, specified in decimal or hexadecimal (the
latter prefixed 0x). The various nodes in this example will be
discussed in a later chapter. At this point, it is only meant to give
you a idea of what a device-tree looks like. I have purposefully kept
the "name" and "linux,phandle" properties which aren't necessary in
order to give you a better idea of what the tree looks like in
practice.
/ o device-tree
|- name = "device-tree"
|- model = "MyBoardName"
|- compatible = "MyBoardFamilyName"
|- #address-cells = <2>
|- #size-cells = <2>
|- linux,phandle = <0>
|
o cpus
| | - name = "cpus"
| | - linux,phandle = <1>
| | - #address-cells = <1>
| | - #size-cells = <0>
| |
| o PowerPC,970@0
| |- name = "PowerPC,970"
| |- device_type = "cpu"
| |- reg = <0>
| |- clock-frequency = <0x5f5e1000>
| |- 64-bit
| |- linux,phandle = <2>
|
o memory@0
| |- name = "memory"
| |- device_type = "memory"
| |- reg = <0x00000000 0x00000000="" 0x20000000="">
| |- linux,phandle = <3>
|
o chosen
|- name = "chosen"
|- bootargs = "root=/dev/sda2"
|- linux,phandle = <4>
This tree is almost a minimal tree. It pretty much contains the
minimal set of required nodes and properties to boot a linux kernel;
that is, some basic model information at the root, the CPUs, and the
physical memory layout. It also includes misc information passed
through /chosen, like in this example, the platform type (mandatory)
and the kernel command line arguments (optional).
The /cpus/PowerPC,970@0/64-bit property is an example of a
property without a value. All other properties have a value. The
significance of the #address-cells and #size-cells properties will be
explained in chapter IV which defines precisely the required nodes and
properties and their content.
3) Device tree "structure" block
The structure of the device tree is a linearized tree structure. The
"OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE"
ends that node definition. Child nodes are simply defined before
"OF_DT_END_NODE" (that is nodes within the node). A 'token' is a 32
bit value. The tree has to be "finished" with a OF_DT_END token
Here's the basic structure of a single node:
* token OF_DT_BEGIN_NODE (that is 0x00000001)
* for version 1 to 3, this is the node full path as a zero
terminated string, starting with "/". For version 16 and later,
this is the node unit name only (or an empty string for the
root node)
* [align gap to next 4 bytes boundary]
* for each property:
* token OF_DT_PROP (that is 0x00000003)
* 32-bit value of property value size in bytes (or 0 if no
value)
* 32-bit value of offset in string block of property name
* property value data if any
* [align gap to next 4 bytes boundary]
* [child nodes if any]
* token OF_DT_END_NODE (that is 0x00000002)
So the node content can be summarized as a start token, a full path,
a list of properties, a list of child nodes, and an end token. Every
child node is a full node structure itself as defined above.
NOTE: The above definition requires that all property definitions for
a particular node MUST precede any subnode definitions for that node.
Although the structure would not be ambiguous if properties and
subnodes were intermingled, the kernel parser requires that the
properties come first (up until at least 2.6.22). Any tools
manipulating a flattened tree must take care to preserve this
constraint.
4) Device tree "strings" block
In order to save space, property names, which are generally redundant,
are stored separately in the "strings" block. This block is simply the
whole bunch of zero terminated strings for all property names
concatenated together. The device-tree property definitions in the
structure block will contain offset values from the beginning of the
strings block.
III - Required content of the device tree
=========================================
WARNING: All "linux,*" properties defined in this document apply only
to a flattened device-tree. If your platform uses a real
implementation of Open Firmware or an implementation compatible with
the Open Firmware client interface, those properties will be created
by the trampoline code in the kernel's prom_init() file. For example,
that's where you'll have to add code to detect your board model and
set the platform number. However, when using the flattened device-tree
entry point, there is no prom_init() pass, and thus you have to
provide those properties yourself.
1) Note about cells and address representation
----------------------------------------------
The general rule is documented in the various Open Firmware
documentations. If you choose to describe a bus with the device-tree
and there exist an OF bus binding, then you should follow the
specification. However, the kernel does not require every single
device or bus to be described by the device tree.
In general, the format of an address for a device is defined by the
parent bus type, based on the #address-cells and #size-cells
properties. Note that the parent's parent definitions of #address-cells
and #size-cells are not inherited so every node with children must specify
them. The kernel requires the root node to have those properties defining
addresses format for devices directly mapped on the processor bus.
Those 2 properties define 'cells' for representing an address and a
size. A "cell" is a 32-bit number. For example, if both contain 2
like the example tree given above, then an address and a size are both
composed of 2 cells, and each is a 64-bit number (cells are
concatenated and expected to be in big endian format). Another example
is the way Apple firmware defines them, with 2 cells for an address
and one cell for a size. Most 32-bit implementations should define
#address-cells and #size-cells to 1, which represents a 32-bit value.
Some 32-bit processors allow for physical addresses greater than 32
bits; these processors should define #address-cells as 2.
"reg" properties are always a tuple of the type "address size" where
the number of cells of address and size is specified by the bus
#address-cells and #size-cells. When a bus supports various address
spaces and other flags relative to a given address allocation (like
prefetchable, etc...) those flags are usually added to the top level
bits of the physical address. For example, a PCI physical address is
made of 3 cells, the bottom two containing the actual address itself
while the top cell contains address space indication, flags, and pci
bus & device numbers.
For buses that support dynamic allocation, it's the accepted practice
to then not provide the address in "reg" (keep it 0) though while
providing a flag indicating the address is dynamically allocated, and
then, to provide a separate "assigned-addresses" property that
contains the fully allocated addresses. See the PCI OF bindings for
details.
In general, a simple bus with no address space bits and no dynamic
allocation is preferred if it reflects your hardware, as the existing
kernel address parsing functions will work out of the box. If you
define a bus type with a more complex address format, including things
like address space bits, you'll have to add a bus translator to the
prom_parse.c file of the recent kernels for your bus type.
The "reg" property only defines addresses and sizes (if #size-cells is
non-0) within a given bus. In order to translate addresses upward
(that is into parent bus addresses, and possibly into CPU physical
addresses), all buses must contain a "ranges" property. If the
"ranges" property is missing at a given level, it's assumed that
translation isn't possible, i.e., the registers are not visible on the
parent bus. The format of the "ranges" property for a bus is a list
of:
bus address, parent bus address, size
"bus address" is in the format of the bus this bus node is defining,
that is, for a PCI bridge, it would be a PCI address. Thus, (bus
address, size) defines a range of addresses for child devices. "parent
bus address" is in the format of the parent bus of this bus. For
example, for a PCI host controller, that would be a CPU address. For a
PCI<->ISA bridge, that would be a PCI address. It defines the base
address in the parent bus where the beginning of that range is mapped.
For new 64-bit board support, I recommend either the 2/2 format or
Apple's 2/1 format which is slightly more compact since sizes usually
fit in a single 32-bit word. New 32-bit board support should use a
1/1 format, unless the processor supports physical addresses greater
than 32-bits, in which case a 2/1 format is recommended.
Alternatively, the "ranges" property may be empty, indicating that the
registers are visible on the parent bus using an identity mapping
translation. In other words, the parent bus address space is the same
as the child bus address space.
2) Note about "compatible" properties
-------------------------------------
These properties are optional, but recommended in devices and the root
node. The format of a "compatible" property is a list of concatenated
zero terminated strings. They allow a device to express its
compatibility with a family of similar devices, in some cases,
allowing a single driver to match against several devices regardless
of their actual names.
3) Note about "name" properties
-------------------------------
While earlier users of Open Firmware like OldWorld macintoshes tended
to use the actual device name for the "name" property, it's nowadays
considered a good practice to use a name that is closer to the device
class (often equal to device_type). For example, nowadays, Ethernet
controllers are named "ethernet", an additional "model" property
defining precisely the chip type/model, and "compatible" property
defining the family in case a single driver can driver more than one
of these chips. However, the kernel doesn't generally put any
restriction on the "name" property; it is simply considered good
practice to follow the standard and its evolutions as closely as
possible.
Note also that the new format version 16 makes the "name" property
optional. If it's absent for a node, then the node's unit name is then
used to reconstruct the name. That is, the part of the unit name
before the "@" sign is used (or the entire unit name if no "@" sign
is present).
4) Note about node and property names and character set
-------------------------------------------------------
While Open Firmware provides more flexible usage of 8859-1, this
specification enforces more strict rules. Nodes and properties should
be comprised only of ASCII characters 'a' to 'z', '0' to
'9', ',', '.', '_', '+', '#', '?', and '-'. Node names additionally
allow uppercase characters 'A' to 'Z' (property names should be
lowercase. The fact that vendors like Apple don't respect this rule is
irrelevant here). Additionally, node and property names should always
begin with a character in the range 'a' to 'z' (or 'A' to 'Z' for node
names).
The maximum number of characters for both nodes and property names
is 31. In the case of node names, this is only the leftmost part of
a unit name (the pure "name" property), it doesn't include the unit
address which can extend beyond that limit.
5) Required nodes and properties
--------------------------------
These are all that are currently required. However, it is strongly
recommended that you expose PCI host bridges as documented in the
PCI binding to Open Firmware, and your interrupt tree as documented
in OF interrupt tree specification.
a) The root node
The root node requires some properties to be present:
- model : this is your board name/model
- #address-cells : address representation for "root" devices
- #size-cells: the size representation for "root" devices
- compatible : the board "family" generally finds its way here,
for example, if you have 2 board models with a similar layout,
that typically get driven by the same platform code in the
kernel, you would specify the exact board model in the
compatible property followed by an entry that represents the SoC
model.
The root node is also generally where you add additional properties
specific to your board like the serial number if any, that sort of
thing. It is recommended that if you add any "custom" property whose
name may clash with standard defined ones, you prefix them with your
vendor name and a comma.
Additional properties for the root node:
- serial-number : a string representing the device's serial number
b) The /cpus node
This node is the parent of all individual CPU nodes. It doesn't
have any specific requirements, though it's generally good practice
to have at least:
#address-cells = <00000001>
#size-cells = <00000000>
This defines that the "address" for a CPU is a single cell, and has
no meaningful size. This is not necessary but the kernel will assume
that format when reading the "reg" properties of a CPU node, see
below
c) The /cpus/* nodes
So under /cpus, you are supposed to create a node for every CPU on
the machine. There is no specific restriction on the name of the
CPU, though it's common to call it ,. For
example, Apple uses PowerPC,G5 while IBM uses PowerPC,970FX.
However, the Generic Names convention suggests that it would be
better to simply use 'cpu' for each cpu node and use the compatible
property to identify the specific cpu core.
Required properties:
- device_type : has to be "cpu"
- reg : This is the physical CPU number, it's a single 32-bit cell
and is also used as-is as the unit number for constructing the
unit name in the full path. For example, with 2 CPUs, you would
have the full path:
/cpus/PowerPC,970FX@0
/cpus/PowerPC,970FX@1
(unit addresses do not require leading zeroes)
- d-cache-block-size : one cell, L1 data cache block size in bytes (*)
- i-cache-block-size : one cell, L1 instruction cache block size in
bytes
- d-cache-size : one cell, size of L1 data cache in bytes
- i-cache-size : one cell, size of L1 instruction cache in bytes
(*) The cache "block" size is the size on which the cache management
instructions operate. Historically, this document used the cache
"line" size here which is incorrect. The kernel will prefer the cache
block size and will fallback to cache line size for backward
compatibility.
Recommended properties:
- timebase-frequency : a cell indicating the frequency of the
timebase in Hz. This is not directly used by the generic code,
but you are welcome to copy/paste the pSeries code for setting
the kernel timebase/decrementer calibration based on this
value.
- clock-frequency : a cell indicating the CPU core clock frequency
in Hz. A new property will be defined for 64-bit values, but if
your frequency is < 4Ghz, one cell is enough. Here as well as
for the above, the common code doesn't use that property, but
you are welcome to re-use the pSeries or Maple one. A future
kernel version might provide a common function for this.
- d-cache-line-size : one cell, L1 data cache line size in bytes
if different from the block size
- i-cache-line-size : one cell, L1 instruction cache line size in
bytes if different from the block size
You are welcome to add any property you find relevant to your board,
like some information about the mechanism used to soft-reset the
CPUs. For example, Apple puts the GPIO number for CPU soft reset
lines in there as a "soft-reset" property since they start secondary
CPUs by soft-resetting them.
d) the /memory node(s)
To define the physical memory layout of your board, you should
create one or more memory node(s). You can either create a single
node with all memory ranges in its reg property, or you can create
several nodes, as you wish. The unit address (@ part) used for the
full path is the address of the first range of memory defined by a
given node. If you use a single memory node, this will typically be
@0.
Required properties:
- device_type : has to be "memory"
- reg : This property contains all the physical memory ranges of
your board. It's a list of addresses/sizes concatenated
together, with the number of cells of each defined by the
#address-cells and #size-cells of the root node. For example,
with both of these properties being 2 like in the example given
earlier, a 970 based machine with 6Gb of RAM could typically
have a "reg" property here that looks like:
00000000 00000000 00000000 80000000
00000001 00000000 00000001 00000000
That is a range starting at 0 of 0x80000000 bytes and a range
starting at 0x100000000 and of 0x100000000 bytes. You can see
that there is no memory covering the IO hole between 2Gb and
4Gb. Some vendors prefer splitting those ranges into smaller
segments, but the kernel doesn't care.
e) The /chosen node
This node is a bit "special". Normally, that's where Open Firmware
puts some variable environment information, like the arguments, or
the default input/output devices.
This specification makes a few of these mandatory, but also defines
some linux-specific properties that would be normally constructed by
the prom_init() trampoline when booting with an OF client interface,
but that you have to provide yourself when using the flattened format.
Recommended properties:
- bootargs : This zero-terminated string is passed as the kernel
command line
- linux,stdout-path : This is the full path to your standard
console device if any. Typically, if you have serial devices on
your board, you may want to put the full path to the one set as
the default console in the firmware here, for the kernel to pick
it up as its own default console.
Note that u-boot creates and fills in the chosen node for platforms
that use it.
(Note: a practice that is now obsolete was to include a property
under /chosen called interrupt-controller which had a phandle value
that pointed to the main interrupt controller)
f) the /soc node
This node is used to represent a system-on-a-chip (SoC) and must be
present if the processor is a SoC. The top-level soc node contains
information that is global to all devices on the SoC. The node name
should contain a unit address for the SoC, which is the base address
of the memory-mapped register set for the SoC. The name of an SoC
node should start with "soc", and the remainder of the name should
represent the part number for the soc. For example, the MPC8540's
soc node would be called "soc8540".
Required properties:
- ranges : Should be defined as specified in 1) to describe the
translation of SoC addresses for memory mapped SoC registers.
- bus-frequency: Contains the bus frequency for the SoC node.
Typically, the value of this field is filled in by the boot
loader.
- compatible : Exact model of the SoC
Recommended properties:
- reg : This property defines the address and size of the
memory-mapped registers that are used for the SOC node itself.
It does not include the child device registers - these will be
defined inside each child node. The address specified in the
"reg" property should match the unit address of the SOC node.
- #address-cells : Address representation for "soc" devices. The
format of this field may vary depending on whether or not the
device registers are memory mapped. For memory mapped
registers, this field represents the number of cells needed to
represent the address of the registers. For SOCs that do not
use MMIO, a special address format should be defined that
contains enough cells to represent the required information.
See 1) above for more details on defining #address-cells.
- #size-cells : Size representation for "soc" devices
- #interrupt-cells : Defines the width of cells used to represent
interrupts. Typically this value is <2>, which includes a
32-bit number that represents the interrupt number, and a
32-bit number that represents the interrupt sense and level.
This field is only needed if the SOC contains an interrupt
controller.
The SOC node may contain child nodes for each SOC device that the
platform uses. Nodes should not be created for devices which exist
on the SOC but are not used by a particular platform. See chapter VI
for more information on how to specify devices that are part of a SOC.
Example SOC node for the MPC8540:
soc8540@e0000000 {
#address-cells = <1>;
#size-cells = <1>;
#interrupt-cells = <2>;
device_type = "soc";
ranges = <0x00000000 0x00100000="" 0xe0000000="">
reg = <0xe0000000 0x00003000="">;
bus-frequency = <0>;
}
IV - "dtc", the device tree compiler
====================================
dtc source code can be found at
WARNING: This version is still in early development stage; the
resulting device-tree "blobs" have not yet been validated with the
kernel. The current generated block lacks a useful reserve map (it will
be fixed to generate an empty one, it's up to the bootloader to fill
it up) among others. The error handling needs work, bugs are lurking,
etc...
dtc basically takes a device-tree in a given format and outputs a
device-tree in another format. The currently supported formats are:
Input formats:
-------------
- "dtb": "blob" format, that is a flattened device-tree block
with
header all in a binary blob.
- "dts": "source" format. This is a text file containing a
"source" for a device-tree. The format is defined later in this
chapter.
- "fs" format. This is a representation equivalent to the
output of /proc/device-tree, that is nodes are directories and
properties are files
Output formats:
---------------
- "dtb": "blob" format
- "dts": "source" format
- "asm": assembly language file. This is a file that can be
sourced by gas to generate a device-tree "blob". That file can
then simply be added to your Makefile. Additionally, the
assembly file exports some symbols that can be used.
The syntax of the dtc tool is
dtc [-I ] [-O ]
[-o output-filename] [-V output_version] input_filename
The "output_version" defines what version of the "blob" format will be
generated. Supported versions are 1,2,3 and 16. The default is
currently version 3 but that may change in the future to version 16.
Additionally, dtc performs various sanity checks on the tree, like the
uniqueness of linux, phandle properties, validity of strings, etc...
The format of the .dts "source" file is "C" like, supports C and C++
style comments.
/ {
}
The above is the "device-tree" definition. It's the only statement
supported currently at the toplevel.
/ {
property1 = "string_value"; /* define a property containing a 0
* terminated string
*/
property2 = <0x1234abcd>; /* define a property containing a
* numerical 32-bit value (hexadecimal)
*/
property3 = <0x12345678 0x12345678="" 0xdeadbeef="">;
/* define a property containing 3
* numerical 32-bit values (cells) in
* hexadecimal
*/
property4 = [0x0a 0x0b 0x0c 0x0d 0xde 0xea 0xad 0xbe 0xef];
/* define a property whose content is
* an arbitrary array of bytes
*/
childnode@address { /* define a child node named "childnode"
* whose unit name is "childnode at
* address"
*/
childprop = "hello\n"; /* define a property "childprop" of
* childnode (in this case, a string)
*/
};
};
Nodes can contain other nodes etc... thus defining the hierarchical
structure of the tree.
Strings support common escape sequences from C: "\n", "\t", "\r",
"\(octal value)", "\x(hex value)".
It is also suggested that you pipe your source file through cpp (gcc
preprocessor) so you can use #include's, #define for constants, etc...
Finally, various options are planned but not yet implemented, like
automatic generation of phandles, labels (exported to the asm file so
you can point to a property content and change it easily from whatever
you link the device-tree with), label or path instead of numeric value
in some cells to "point" to a node (replaced by a phandle at compile
time), export of reserve map address to the asm file, ability to
specify reserve map content at compile time, etc...
We may provide a .h include file with common definitions of that
proves useful for some properties (like building PCI properties or
interrupt maps) though it may be better to add a notion of struct
definitions to the compiler...
V - Recommendations for a bootloader
====================================
Here are some various ideas/recommendations that have been proposed
while all this has been defined and implemented.
- The bootloader may want to be able to use the device-tree itself
and may want to manipulate it (to add/edit some properties,
like physical memory size or kernel arguments). At this point, 2
choices can be made. Either the bootloader works directly on the
flattened format, or the bootloader has its own internal tree
representation with pointers (similar to the kernel one) and
re-flattens the tree when booting the kernel. The former is a bit
more difficult to edit/modify, the later requires probably a bit
more code to handle the tree structure. Note that the structure
format has been designed so it's relatively easy to "insert"
properties or nodes or delete them by just memmoving things
around. It contains no internal offsets or pointers for this
purpose.
- An example of code for iterating nodes & retrieving properties
directly from the flattened tree format can be found in the kernel
file drivers/of/fdt.c. Look at the of_scan_flat_dt() function,
its usage in early_init_devtree(), and the corresponding various
early_init_dt_scan_*() callbacks. That code can be re-used in a
GPL bootloader, and as the author of that code, I would be happy
to discuss possible free licensing to any vendor who wishes to
integrate all or part of this code into a non-GPL bootloader.
(reference needed; who is 'I' here? ---gcl Jan 31, 2011)
VI - System-on-a-chip devices and nodes
=======================================
Many companies are now starting to develop system-on-a-chip
processors, where the processor core (CPU) and many peripheral devices
exist on a single piece of silicon. For these SOCs, an SOC node
should be used that defines child nodes for the devices that make
up the SOC. While platforms are not required to use this model in
order to boot the kernel, it is highly encouraged that all SOC
implementations define as complete a flat-device-tree as possible to
describe the devices on the SOC. This will allow for the
genericization of much of the kernel code.
1) Defining child nodes of an SOC
---------------------------------
Each device that is part of an SOC may have its own node entry inside
the SOC node. For each device that is included in the SOC, the unit
address property represents the address offset for this device's
memory-mapped registers in the parent's address space. The parent's
address space is defined by the "ranges" property in the top-level soc
node. The "reg" property for each node that exists directly under the
SOC node should contain the address mapping from the child address space
to the parent SOC address space and the size of the device's
memory-mapped register file.
For many devices that may exist inside an SOC, there are predefined
specifications for the format of the device tree node. All SOC child
nodes should follow these specifications, except where noted in this
document.
See appendix A for an example partial SOC node definition for the
MPC8540.
2) Representing devices without a current OF specification
----------------------------------------------------------
Currently, there are many devices on SoCs that do not have a standard
representation defined as part of the Open Firmware specifications,
mainly because the boards that contain these SoCs are not currently
booted using Open Firmware. Binding documentation for new devices
should be added to the Documentation/devicetree/bindings directory.
That directory will expand as device tree support is added to more and
more SoCs.
VII - Specifying interrupt information for devices
===================================================
The device tree represents the buses and devices of a hardware
system in a form similar to the physical bus topology of the
hardware.
In addition, a logical 'interrupt tree' exists which represents the
hierarchy and routing of interrupts in the hardware.
The interrupt tree model is fully described in the
document "Open Firmware Recommended Practice: Interrupt
Mapping Version 0.9". The document is available at:
1) interrupts property
----------------------
Devices that generate interrupts to a single interrupt controller
should use the conventional OF representation described in the
OF interrupt mapping documentation.
Each device which generates interrupts must have an 'interrupt'
property. The interrupt property value is an arbitrary number of
of 'interrupt specifier' values which describe the interrupt or
interrupts for the device.
The encoding of an interrupt specifier is determined by the
interrupt domain in which the device is located in the
interrupt tree. The root of an interrupt domain specifies in
its #interrupt-cells property the number of 32-bit cells
required to encode an interrupt specifier. See the OF interrupt
mapping documentation for a detailed description of domains.
For example, the binding for the OpenPIC interrupt controller
specifies an #interrupt-cells value of 2 to encode the interrupt
number and level/sense information. All interrupt children in an
OpenPIC interrupt domain use 2 cells per interrupt in their interrupts
property.
The PCI bus binding specifies a #interrupt-cell value of 1 to encode
which interrupt pin (INTA,INTB,INTC,INTD) is used.
2) interrupt-parent property
----------------------------
The interrupt-parent property is specified to define an explicit
link between a device node and its interrupt parent in
the interrupt tree. The value of interrupt-parent is the
phandle of the parent node.
If the interrupt-parent property is not defined for a node, its
interrupt parent is assumed to be an ancestor in the node's
_device tree_ hierarchy.
3) OpenPIC Interrupt Controllers
--------------------------------
OpenPIC interrupt controllers require 2 cells to encode
interrupt information. The first cell defines the interrupt
number. The second cell defines the sense and level
information.
Sense and level information should be encoded as follows:
0 = low to high edge sensitive type enabled
1 = active low level sensitive type enabled
2 = active high level sensitive type enabled
3 = high to low edge sensitive type enabled
4) ISA Interrupt Controllers
----------------------------
ISA PIC interrupt controllers require 2 cells to encode
interrupt information. The first cell defines the interrupt
number. The second cell defines the sense and level
information.
ISA PIC interrupt controllers should adhere to the ISA PIC
encodings listed below:
0 = active low level sensitive type enabled
1 = active high level sensitive type enabled
2 = high to low edge sensitive type enabled
3 = low to high edge sensitive type enabled
VIII - Specifying Device Power Management Information (sleep property)
===================================================================
Devices on SOCs often have mechanisms for placing devices into low-power
states that are decoupled from the devices' own register blocks. Sometimes,
this information is more complicated than a cell-index property can
reasonably describe. Thus, each device controlled in such a manner
may contain a "sleep" property which describes these connections.
The sleep property consists of one or more sleep resources, each of
which consists of a phandle to a sleep controller, followed by a
controller-specific sleep specifier of zero or more cells.
The semantics of what type of low power modes are possible are defined
by the sleep controller. Some examples of the types of low power modes
that may be supported are:
- Dynamic: The device may be disabled or enabled at any time.
- System Suspend: The device may request to be disabled or remain
awake during system suspend, but will not be disabled until then.
- Permanent: The device is disabled permanently (until the next hard
reset).
Some devices may share a clock domain with each other, such that they should
only be suspended when none of the devices are in use. Where reasonable,
such nodes should be placed on a virtual bus, where the bus has the sleep
property. If the clock domain is shared among devices that cannot be
reasonably grouped in this manner, then create a virtual sleep controller
(similar to an interrupt nexus, except that defining a standardized
sleep-map should wait until its necessity is demonstrated).
IX - Specifying dma bus information
Some devices may have DMA memory range shifted relatively to the beginning of
RAM, or even placed outside of kernel RAM. For example, the Keystone 2 SoC
worked in LPAE mode with 4G memory has:
- RAM range: [0x8 0000 0000, 0x8 FFFF FFFF]
- DMA range: [ 0x8000 0000, 0xFFFF FFFF]
and DMA range is aliased into first 2G of RAM in HW.
In such cases, DMA addresses translation should be performed between CPU phys
and DMA addresses. The "dma-ranges" property is intended to be used
for describing the configuration of such system in DT.
In addition, each DMA master device on the DMA bus may or may not support
coherent DMA operations. The "dma-coherent" property is intended to be used
for identifying devices supported coherent DMA operations in DT.
* DMA Bus master
Optional property:
- dma-ranges: encoded as arbitrary number of triplets of
(child-bus-address, parent-bus-address, length). Each triplet specified
describes a contiguous DMA address range.
The dma-ranges property is used to describe the direct memory access (DMA)
structure of a memory-mapped bus whose device tree parent can be accessed
from DMA operations originating from the bus. It provides a means of
defining a mapping or translation between the physical address space of
the bus and the physical address space of the parent of the bus.
(for more information see ePAPR specification)
* DMA Bus child
Optional property:
- dma-ranges: value. if present - It means that DMA addresses
translation has to be enabled for this device.
- dma-coherent: Present if dma operations are coherent
Example:
soc {
compatible = "ti,keystone","simple-bus";
ranges = <0x0 0x0="" 0xc0000000="">;
dma-ranges = <0x80000000 0x00000000="" 0x80000000="" 0x8="">;
[...]
usb: usb@2680000 {
compatible = "ti,keystone-dwc3";
[...]
dma-coherent;
};
};
Appendix A - Sample SOC node for MPC8540
========================================
soc@e0000000 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "fsl,mpc8540-ccsr", "simple-bus";
device_type = "soc";
ranges = <0x00000000 0x00100000="" 0xe0000000="">
bus-frequency = <0>;
interrupt-parent = <&pic>;
ethernet@24000 {
#address-cells = <1>;
#size-cells = <1>;
device_type = "network";
model = "TSEC";
compatible = "gianfar", "simple-bus";
reg = <0x24000 0x1000="">;
local-mac-address = [ 0x00 0xE0 0x0C 0x00 0x73 0x00 ];
interrupts = <0x29 0x30="" 0x34="" 2="">;
phy-handle = <&phy0>;
sleep = <&pmc 0x00000080>;
ranges;
mdio@24520 {
reg = <0x24520 0x20="">;
compatible = "fsl,gianfar-mdio";
phy0: ethernet-phy@0 {
interrupts = <5 1="">;
reg = <0>;
};
phy1: ethernet-phy@1 {
interrupts = <5 1="">;
reg = <1>;
};
phy3: ethernet-phy@3 {
interrupts = <7 1="">;
reg = <3>;
};
};
};
ethernet@25000 {
device_type = "network";
model = "TSEC";
compatible = "gianfar";
reg = <0x25000 0x1000="">;
local-mac-address = [ 0x00 0xE0 0x0C 0x00 0x73 0x01 ];
interrupts = <0x13 0x14="" 0x18="" 2="">;
phy-handle = <&phy1>;
sleep = <&pmc 0x00000040>;
};
ethernet@26000 {
device_type = "network";
model = "FEC";
compatible = "gianfar";
reg = <0x26000 0x1000="">;
local-mac-address = [ 0x00 0xE0 0x0C 0x00 0x73 0x02 ];
interrupts = <0x41 2="">;
phy-handle = <&phy3>;
sleep = <&pmc 0x00000020>;
};
serial@4500 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "fsl,mpc8540-duart", "simple-bus";
sleep = <&pmc 0x00000002>;
ranges;
serial@4500 {
device_type = "serial";
compatible = "ns16550";
reg = <0x4500 0x100="">;
clock-frequency = <0>;
interrupts = <0x42 2="">;
};
serial@4600 {
device_type = "serial";
compatible = "ns16550";
reg = <0x4600 0x100="">;
clock-frequency = <0>;
interrupts = <0x42 2="">;
};
};
pic: pic@40000 {
interrupt-controller;
#address-cells = <0>;
#interrupt-cells = <2>;
reg = <0x40000 0x40000="">;
compatible = "chrp,open-pic";
device_type = "open-pic";
};
i2c@3000 {
interrupts = <0x43 2="">;
reg = <0x3000 0x100="">;
compatible = "fsl-i2c";
dfsrr;
sleep = <&pmc 0x00000004>;
};
pmc: power@e0070 {
compatible = "fsl,mpc8540-pmc", "fsl,mpc8548-pmc";
reg = <0xe0070 0x20="">;
};
};0xe0070>0x3000>0x43>0x40000>2>0>0x42>0>0x4600>0x42>0>0x4500>1>1>0x41>0x26000>0x13>0x25000>3>7>1>5>0>5>0x24520>0x29>0x24000>1>1>0>0x00000000>1>1>0x80000000>0x0> 0x12345678>0x1234abcd> 0>0xe0000000>0x00000000>2>1>1>2> 00000000>00000001>->4>3>0x00000000>2>0x5f5e1000>0>0>1>1>0>2>2>
usage-model.txt
Linux and the Device Tree ------------------------- The Linux usage model for device tree data Author: Grant LikelyThis article describes how Linux uses the device tree. An overview of the device tree data format can be found on the device tree usage page at devicetree.org[1]. [1] http://devicetree.org/Device_Tree_Usage The "Open Firmware Device Tree", or simply Device Tree (DT), is a data structure and language for describing hardware. More specifically, it is a description of hardware that is readable by an operating system so that the operating system doesn't need to hard code details of the machine. Structurally, the DT is a tree, or acyclic graph with named nodes, and nodes may have an arbitrary number of named properties encapsulating arbitrary data. A mechanism also exists to create arbitrary links from one node to another outside of the natural tree structure. Conceptually, a common set of usage conventions, called 'bindings', is defined for how data should appear in the tree to describe typical hardware characteristics including data busses, interrupt lines, GPIO connections, and peripheral devices. As much as possible, hardware is described using existing bindings to maximize use of existing support code, but since property and node names are simply text strings, it is easy to extend existing bindings or create new ones by defining new nodes and properties. Be wary, however, of creating a new binding without first doing some homework about what already exists. There are currently two different, incompatible, bindings for i2c busses that came about because the new binding was created without first investigating how i2c devices were already being enumerated in existing systems. 1. History ---------- The DT was originally created by Open Firmware as part of the communication method for passing data from Open Firmware to a client program (like to an operating system). An operating system used the Device Tree to discover the topology of the hardware at runtime, and thereby support a majority of available hardware without hard coded information (assuming drivers were available for all devices). Since Open Firmware is commonly used on PowerPC and SPARC platforms, the Linux support for those architectures has for a long time used the Device Tree. In 2005, when PowerPC Linux began a major cleanup and to merge 32-bit and 64-bit support, the decision was made to require DT support on all powerpc platforms, regardless of whether or not they used Open Firmware. To do this, a DT representation called the Flattened Device Tree (FDT) was created which could be passed to the kernel as a binary blob without requiring a real Open Firmware implementation. U-Boot, kexec, and other bootloaders were modified to support both passing a Device Tree Binary (dtb) and to modify a dtb at boot time. DT was also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that a dtb could be wrapped up with the kernel image to support booting existing non-DT aware firmware. Some time later, FDT infrastructure was generalized to be usable by all architectures. At the time of this writing, 6 mainlined architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1 out of mainline (nios) have some level of DT support. 2. Data Model ------------- If you haven't already read the Device Tree Usage[1] page, then go read it now. It's okay, I'll wait.... 2.1 High Level View ------------------- The most important thing to understand is that the DT is simply a data structure that describes the hardware. There is nothing magical about it, and it doesn't magically make all hardware configuration problems go away. What it does do is provide a language for decoupling the hardware configuration from the board and device driver support in the Linux kernel (or any other operating system for that matter). Using it allows board and device support to become data driven; to make setup decisions based on data passed into the kernel instead of on per-machine hard coded selections. Ideally, data driven platform setup should result in less code duplication and make it easier to support a wide range of hardware with a single kernel image. Linux uses DT data for three major purposes: 1) platform identification, 2) runtime configuration, and 3) device population. 2.2 Platform Identification --------------------------- First and foremost, the kernel will use data in the DT to identify the specific machine. In a perfect world, the specific platform shouldn't matter to the kernel because all platform details would be described perfectly by the device tree in a consistent and reliable manner. Hardware is not perfect though, and so the kernel must identify the machine during early boot so that it has the opportunity to run machine-specific fixups. In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine's core CPU or SoC. On ARM for example, setup_arch() in arch/arm/kernel/setup.c will call setup_machine_fdt() in arch/arm/kernel/devtree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the 'compatible' property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc (which is defined in arch/arm/include/asm/mach/arch.h if you're curious). The 'compatible' property contains a sorted list of strings starting with the exact name of the machine, followed by an optional list of boards it is compatible with sorted from most compatible to least. For example, the root compatible properties for the TI BeagleBoard and its successor, the BeagleBoard xM board might look like, respectively: compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3"; compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3"; Where "ti,omap3-beagleboard-xm" specifies the exact model, it also claims that it compatible with the OMAP 3450 SoC, and the omap3 family of SoCs in general. You'll notice that the list is sorted from most specific (exact board) to least specific (SoC family). Astute readers might point out that the Beagle xM could also claim compatibility with the original Beagle board. However, one should be cautioned about doing so at the board level since there is typically a high level of change from one board to another, even within the same product line, and it is hard to nail down exactly what is meant when one board claims to be compatible with another. For the top level, it is better to err on the side of caution and not claim one board is compatible with another. The notable exception would be when one board is a carrier for another, such as a CPU module attached to a carrier board. One more note on compatible values. Any string used in a compatible property must be documented as to what it indicates. Add documentation for compatible strings in Documentation/devicetree/bindings. Again on ARM, for each machine_desc, the kernel looks to see if any of the dt_compat list entries appear in the compatible property. If one does, then that machine_desc is a candidate for driving the machine. After searching the entire table of machine_descs, setup_machine_fdt() returns the 'most compatible' machine_desc based on which entry in the compatible property each machine_desc matches against. If no matching machine_desc is found, then it returns NULL. The reasoning behind this scheme is the observation that in the majority of cases, a single machine_desc can support a large number of boards if they all use the same SoC, or same family of SoCs. However, invariably there will be some exceptions where a specific board will require special setup code that is not useful in the generic case. Special cases could be handled by explicitly checking for the troublesome board(s) in generic setup code, but doing so very quickly becomes ugly and/or unmaintainable if it is more than just a couple of cases. Instead, the compatible list allows a generic machine_desc to provide support for a wide common set of boards by specifying "less compatible" values in the dt_compat list. In the example above, generic board support can claim compatibility with "ti,omap3" or "ti,omap3450". If a bug was discovered on the original beagleboard that required special workaround code during early boot, then a new machine_desc could be added which implements the workarounds and only matches on "ti,omap3-beagleboard". PowerPC uses a slightly different scheme where it calls the .probe() hook from each machine_desc, and the first one returning TRUE is used. However, this approach does not take into account the priority of the compatible list, and probably should be avoided for new architecture support. 2.3 Runtime configuration ------------------------- In most cases, a DT will be the sole method of communicating data from firmware to the kernel, so also gets used to pass in runtime and configuration data like the kernel parameters string and the location of an initrd image. Most of this data is contained in the /chosen node, and when booting Linux it will look something like this: chosen { bootargs = "console=ttyS0,115200 loglevel=8"; initrd-start = <0xc8000000>; initrd-end = <0xc8200000>; }; The bootargs property contains the kernel arguments, and the initrd-* properties define the address and size of an initrd blob. Note that initrd-end is the first address after the initrd image, so this doesn't match the usual semantic of struct resource. The chosen node may also optionally contain an arbitrary number of additional properties for platform-specific configuration data. During early boot, the architecture setup code calls of_scan_flat_dt() several times with different helper callbacks to parse device tree data before paging is setup. The of_scan_flat_dt() code scans through the device tree and uses the helpers to extract information required during early boot. Typically the early_init_dt_scan_chosen() helper is used to parse the chosen node including kernel parameters, early_init_dt_scan_root() to initialize the DT address space model, and early_init_dt_scan_memory() to determine the size and location of usable RAM. On ARM, the function setup_machine_fdt() is responsible for early scanning of the device tree after selecting the correct machine_desc that supports the board. 2.4 Device population --------------------- After the board has been identified, and after the early configuration data has been parsed, then kernel initialization can proceed in the normal way. At some point in this process, unflatten_device_tree() is called to convert the data into a more efficient runtime representation. This is also when machine-specific setup hooks will get called, like the machine_desc .init_early(), .init_irq() and .init_machine() hooks on ARM. The remainder of this section uses examples from the ARM implementation, but all architectures will do pretty much the same thing when using a DT. As can be guessed by the names, .init_early() is used for any machine- specific setup that needs to be executed early in the boot process, and .init_irq() is used to set up interrupt handling. Using a DT doesn't materially change the behaviour of either of these functions. If a DT is provided, then both .init_early() and .init_irq() are able to call any of the DT query functions (of_* in include/linux/of*.h) to get additional data about the platform. The most interesting hook in the DT context is .init_machine() which is primarily responsible for populating the Linux device model with data about the platform. Historically this has been implemented on embedded platforms by defining a set of static clock structures, platform_devices, and other data in the board support .c file, and registering it en-masse in .init_machine(). When DT is used, then instead of hard coding static devices for each platform, the list of devices can be obtained by parsing the DT, and allocating device structures dynamically. The simplest case is when .init_machine() is only responsible for registering a block of platform_devices. A platform_device is a concept used by Linux for memory or I/O mapped devices which cannot be detected by hardware, and for 'composite' or 'virtual' devices (more on those later). While there is no 'platform device' terminology for the DT, platform devices roughly correspond to device nodes at the root of the tree and children of simple memory mapped bus nodes. About now is a good time to lay out an example. Here is part of the device tree for the NVIDIA Tegra board. /{ compatible = "nvidia,harmony", "nvidia,tegra20"; #address-cells = <1>; #size-cells = <1>; interrupt-parent = <&intc>; chosen { }; aliases { }; memory { device_type = "memory"; reg = <0x00000000 0x40000000="">; }; soc { compatible = "nvidia,tegra20-soc", "simple-bus"; #address-cells = <1>; #size-cells = <1>; ranges; intc: interrupt-controller@50041000 { compatible = "nvidia,tegra20-gic"; interrupt-controller; #interrupt-cells = <1>; reg = <0x50041000 0x1000="">, < 0x50040100 0x0100 >; }; serial@70006300 { compatible = "nvidia,tegra20-uart"; reg = <0x70006300 0x100="">; interrupts = <122>; }; i2s1: i2s@70002800 { compatible = "nvidia,tegra20-i2s"; reg = <0x70002800 0x100="">; interrupts = <77>; codec = <&wm8903>; }; i2c@7000c000 { compatible = "nvidia,tegra20-i2c"; #address-cells = <1>; #size-cells = <0>; reg = <0x7000c000 0x100="">; interrupts = <70>; wm8903: codec@1a { compatible = "wlf,wm8903"; reg = <0x1a>; interrupts = <347>; }; }; }; sound { compatible = "nvidia,harmony-sound"; i2s-controller = <&i2s1>; i2s-codec = <&wm8903>; }; }; At .init_machine() time, Tegra board support code will need to look at this DT and decide which nodes to create platform_devices for. However, looking at the tree, it is not immediately obvious what kind of device each node represents, or even if a node represents a device at all. The /chosen, /aliases, and /memory nodes are informational nodes that don't describe devices (although arguably memory could be considered a device). The children of the /soc node are memory mapped devices, but the codec@1a is an i2c device, and the sound node represents not a device, but rather how other devices are connected together to create the audio subsystem. I know what each device is because I'm familiar with the board design, but how does the kernel know what to do with each node? The trick is that the kernel starts at the root of the tree and looks for nodes that have a 'compatible' property. First, it is generally assumed that any node with a 'compatible' property represents a device of some kind, and second, it can be assumed that any node at the root of the tree is either directly attached to the processor bus, or is a miscellaneous system device that cannot be described any other way. For each of these nodes, Linux allocates and registers a platform_device, which in turn may get bound to a platform_driver. Why is using a platform_device for these nodes a safe assumption? Well, for the way that Linux models devices, just about all bus_types assume that its devices are children of a bus controller. For example, each i2c_client is a child of an i2c_master. Each spi_device is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The same hierarchy is also found in the DT, where I2C device nodes only ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB, etc. The only devices which do not require a specific type of parent device are platform_devices (and amba_devices, but more on that later), which will happily live at the base of the Linux /sys/devices tree. Therefore, if a DT node is at the root of the tree, then it really probably is best registered as a platform_device. Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL) to kick off discovery of devices at the root of the tree. The parameters are all NULL because when starting from the root of the tree, there is no need to provide a starting node (the first NULL), a parent struct device (the last NULL), and we're not using a match table (yet). For a board that only needs to register devices, .init_machine() can be completely empty except for the of_platform_populate() call. In the Tegra example, this accounts for the /soc and /sound nodes, but what about the children of the SoC node? Shouldn't they be registered as platform devices too? For Linux DT support, the generic behaviour is for child devices to be registered by the parent's device driver at driver .probe() time. So, an i2c bus device driver will register a i2c_client for each child node, an SPI bus driver will register its spi_device children, and similarly for other bus_types. According to that model, a driver could be written that binds to the SoC node and simply registers platform_devices for each of its children. The board support code would allocate and register an SoC device, a (theoretical) SoC device driver could bind to the SoC device, and register platform_devices for /soc/interrupt-controller, /soc/serial, /soc/i2s, and /soc/i2c in its .probe() hook. Easy, right? Actually, it turns out that registering children of some platform_devices as more platform_devices is a common pattern, and the device tree support code reflects that and makes the above example simpler. The second argument to of_platform_populate() is an of_device_id table, and any node that matches an entry in that table will also get its child nodes registered. In the Tegra case, the code can look something like this: static void __init harmony_init_machine(void) { /* ... */ of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); } "simple-bus" is defined in the ePAPR 1.0 specification as a property meaning a simple memory mapped bus, so the of_platform_populate() code could be written to just assume simple-bus compatible nodes will always be traversed. However, we pass it in as an argument so that board support code can always override the default behaviour. [Need to add discussion of adding i2c/spi/etc child devices] Appendix A: AMBA devices ------------------------ ARM Primecells are a certain kind of device attached to the ARM AMBA bus which include some support for hardware detection and power management. In Linux, struct amba_device and the amba_bus_type is used to represent Primecell devices. However, the fiddly bit is that not all devices on an AMBA bus are Primecells, and for Linux it is typical for both amba_device and platform_device instances to be siblings of the same bus segment. When using the DT, this creates problems for of_platform_populate() because it must decide whether to register each node as either a platform_device or an amba_device. This unfortunately complicates the device creation model a little bit, but the solution turns out not to be too invasive. If a node is compatible with "arm,amba-primecell", then of_platform_populate() will register it as an amba_device instead of a platform_device.347>0x1a>70>0x7000c000>0>1>77>0x70002800>122>0x70006300>0x50041000>1>1>1>0x00000000>1>1>0xc8200000>0xc8000000>
留言