Linux Device Drivers - V
Content
- 15. Memory Mapping and DMA
- 16. Block Drivers
- 17. Network Drivers
- 18. TTY Drivers
Chapter 15: Memory Mapping and DMA
Understanding how Linux memory management works makes the following techniques possible:
- mapping device memory directly into a user process’s address space with the mmap() system call
- direct access to user-space pages, mapping user-space memory into the kernel with get_user_pages()
- direct memory access (DMA) I/O operations, which let peripherals access system memory directly
Memory addressing
Physical Memory
- Single address space: simple systems have a single address space.
- Memory and peripherals share the same addresses: memory is mapped to one part, peripherals to another.
- All processes and the OS share the same memory space: no memory protection!
- CPUs with a single address space: 8086-80286, ARM Cortex-M, 8- and 16-bit PIC, AVR, SH-1, SH-2, most 8- and 16-bit systems
Virtual Memory
Virtual memory is a system that uses an address mapping to translate a virtual address space into a physical address space:
- Maps virtual addresses to physical RAM
- Maps virtual addresses to hardware devices (PCI devices, GPU RAM, on-SoC IP blocks)
- Each process can have a different memory mapping
- One process's RAM is inaccessible (and invisible) to other processes, which gives built-in memory protection
- Kernel RAM is invisible to user-space processes
- Memory can be swapped to disk
- Hardware device memory can be mapped into a process's address space (this requires the kernel to perform the mapping)
- Physical RAM can be mapped into multiple processes at once (shared memory)
- Memory regions can have access permissions (read, write, execute)
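The shared-memory property in the list above can be observed from user space. The following is a minimal sketch, assuming a Linux system where mmap() supports MAP_SHARED | MAP_ANONYMOUS; sketch_shared_roundtrip() is a hypothetical helper name, not something from the kernel or from LDD3. A child process writes through its own mapping of the page, and the parent then reads the value back through its mapping of the same physical page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one shared anonymous page, let a child process
 * store `value` in it, and return what the parent reads back afterwards.
 * Returns -1 if mmap(), fork() or waitpid() fails.
 */
int sketch_shared_roundtrip(int value)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* MAP_SHARED: parent and child keep seeing the same physical page. */
    int *shared = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == 0) {
        *shared = value;        /* child writes through its own mapping */
        _exit(0);
    }

    int seen = -1;
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        seen = *shared;         /* parent reads the child's store */

    munmap(shared, (size_t)page);
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED, the child's store would land in a private copy of the page and the parent would still read 0.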
Memory-Management Unit
The memory-management unit (MMU) is the hardware responsible for implementing virtual memory. The MMU sits between the CPU core and memory, and it transparently handles all memory accesses from load/store instructions:
- Maps virtual addresses to system RAM or memory-mapped peripheral hardware
- Handles permissions
- Generates an exception (page fault) on an invalid access (unmapped address or insufficient permissions)
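The three MMU duties listed above can be illustrated with a toy, single-level page table in user space. This is only a sketch under simplifying assumptions: 4 KB pages, a 16-entry table, and invented names (toy_pte, toy_translate, TOY_*). Real MMUs use multi-level tables and architecture-specific entry formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u                      /* assumed 4 KB pages */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES  16u                      /* size of the toy table */

enum { TOY_READ = 1, TOY_WRITE = 2, TOY_EXEC = 4 };

/* One toy page-table entry: presence bit, permissions, physical frame. */
struct toy_pte {
    int      present;
    unsigned perms;
    uint32_t frame;
};

/*
 * Translate vaddr for the given access type.
 * Returns 0 and stores the physical address in *paddr on success.
 * Returns -1 to model a page fault (unmapped address or insufficient
 * permissions).
 */
int toy_translate(const struct toy_pte *table, uint32_t vaddr,
                  unsigned access, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);

    uint32_t vpn    = vaddr >> TOY_PAGE_SHIFT;        /* virtual page number */
    uint32_t offset = vaddr & (TOY_PAGE_SIZE - 1);    /* offset inside page */

    if (vpn >= TOY_NUM_PAGES || !table[vpn].present)
        return -1;                                    /* unmapped address */
    if ((table[vpn].perms & access) != access)
        return -1;                                    /* permission fault */

    *paddr = (table[vpn].frame << TOY_PAGE_SHIFT) | offset;
    return 0;
}
```

For example, with virtual page 1 mapped read-only to frame 7, a read of 0x1234 translates to 0x7234, while a write to the same address or any access to an unmapped page reports a fault.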
Memory Management in Linux
Address Types
- Physical addresses: addresses as used by the hardware
- Virtual addresses: addresses as used by software
- User virtual addresses: These are the regular addresses seen by user-space programs. Each process has its own virtual address space, described by the struct mm_struct that its task_struct points to. User-space processes make full use of the MMU:
- Only the used portions of RAM are mapped
- Memory is not contiguous
- Memory may be swapped out
- Memory can be moved
- Physical addresses: The addresses used between the processor and the system’s memory. Physical addresses are 32- or 64-bit quantities; even 32-bit systems can use larger physical addresses in some situations.
- Bus addresses: The addresses used between peripheral buses and memory. Often, they are the same as the physical addresses used by the processor, but that is not necessarily the case. Some architectures can provide an I/O memory management unit (IOMMU) that remaps addresses between a bus and main memory.
- Kernel logical addresses: These make up the normal address space of the kernel. On most architectures, logical addresses and their associated physical addresses differ only by a constant offset. Memory returned from kmalloc() has a kernel logical address. If you have a logical address, the macro __pa() (defined in <asm/page.h>) returns its associated physical address.
- Kernel virtual addresses: Kernel virtual addresses are not necessarily a linear, one-to-one mapping to physical addresses. In Linux, the kernel uses virtual addresses, as user-space processes do:
- The upper part is used for the kernel space
- The lower part is used for user space
For 64-bit, the split varies by architecture. The memory allocated by vmalloc() has a virtual address (but no direct physical mapping).
Different kernel functions require different types of addresses.
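The constant-offset relationship between kernel logical addresses and physical addresses can be written out as plain arithmetic. The sketch below runs in user space and only mimics what __pa() and its inverse do. The offset value 0xC0000000 is an assumption that corresponds to a 32-bit 3 GB/1 GB split, and the names sketch_pa(), sketch_va() and SKETCH_PAGE_OFFSET are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the kernel's linear mapping (32-bit, 3 GB/1 GB split). */
#define SKETCH_PAGE_OFFSET UINT32_C(0xC0000000)

/* Logical -> physical: subtract the constant offset (the idea behind __pa()). */
uint32_t sketch_pa(uint32_t logical)
{
    assert(logical >= SKETCH_PAGE_OFFSET);  /* must lie in the linear mapping */
    return logical - SKETCH_PAGE_OFFSET;
}

/* Physical -> logical: add the constant offset back. */
uint32_t sketch_va(uint32_t phys)
{
    return phys + SKETCH_PAGE_OFFSET;
}
```

Addresses returned by vmalloc() do not follow this rule, which is why this arithmetic must not be applied to them.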
Physical Addresses and Pages
Physical memory is divided into discrete units called pages. The constant PAGE_SIZE gives the page size on any given architecture.
High and Low Memory
With 32 bits, it is possible to address 4 GB of memory. A typical split dedicates 3 GB to user space and 1 GB to kernel space.
- Low memory
- Physical memory which has a kernel logical address
- Physically contiguous
- High memory
- Physical memory beyond ~896MB
- Has no logical address
- Not physically contiguous when used in the kernel. It is often used for large buffers that could be too large to find as contiguous memory; such buffers are allocated with vmalloc().
- Memory-mapped I/O: peripheral devices (PCI, SoC IP blocks) are mapped into the kernel with ioremap(); kmap() is used to map high-memory pages.
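The page and low/high memory arithmetic above can be sketched as follows. The 4 KB page size and the 896 MB boundary are assumptions that match the typical 32-bit layout described in this section. The page frame number (the physical address with the in-page offset shifted out) is a notion added here for illustration, and all SKETCH_/sketch_ names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12                      /* assumed 4 KB pages */
#define SKETCH_PAGE_SIZE    (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_LOWMEM_LIMIT (UINT64_C(896) << 20)   /* assumed ~896 MB boundary */

static_assert(SKETCH_LOWMEM_LIMIT % SKETCH_PAGE_SIZE == 0,
              "low-memory limit must be page aligned");

/* Page frame number: drop the offset within the page. */
uint64_t sketch_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Non-zero if the physical address would have a kernel logical address. */
int sketch_is_lowmem(uint64_t phys)
{
    return phys < SKETCH_LOWMEM_LIMIT;
}
```

Under these assumptions a physical address at 256 MB is low memory, while one at 1 GB is high memory and needs a temporary or vmalloc-style mapping before the kernel can touch it.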
The Memory Map and Struct Page
Chapter 16: Block Drivers
Chapter 17: Network Drivers
A network interface is similar to a mounted block device: it must register itself within specific kernel data structures in order to be invoked when packets are exchanged with the outside world. There are some differences:
- a block device needs a file under /dev for operations, whereas a network interface has no such entry point
- block drivers operate only in response to requests from the kernel, whereas network drivers receive packets asynchronously from the outside and ask to push them toward the kernel
How snull Is Designed
- snull supports only IP traffic
- The snull module creates two interfaces.
- whatever you transmit through one of the interfaces loops back to the other one
- Hidden loopback: the snull interface toggles the least significant bit of the third octet of both the source and destination addresses.
x.x.0.x <-> x.x.1.x
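The address rewrite described above amounts to a single XOR on the third octet. A minimal user-space sketch, assuming the IPv4 address is held as four bytes in network order; the function names are hypothetical and this is not the snull source itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toggle the least significant bit of the third octet: x.x.0.x <-> x.x.1.x */
void sketch_toggle_third_octet(uint8_t addr[4])
{
    assert(addr != NULL);
    addr[2] ^= 1;
}

/* Apply the rewrite to both source and destination, as snull is described to do. */
void sketch_snull_rewrite(uint8_t saddr[4], uint8_t daddr[4])
{
    sketch_toggle_third_octet(saddr);
    sketch_toggle_third_octet(daddr);
}
```

Because XOR is its own inverse, applying the rewrite twice restores the original addresses, so a packet sent from 192.168.0.1 to 192.168.0.2 appears on the other interface as coming from 192.168.1.1 and addressed to 192.168.1.2.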
Assigning IP Numbers
- snullnet0 is the network that is connected to the sn0 interface
- local0 is the IP address assigned to the sn0 interface
- remote0 is a host in snullnet0, and its fourth octet is the same as that of local1 .
- /etc/networks
    snullnet0 192.168.0.0
    snullnet1 192.168.1.0
The interfaces are then configured with:
    ifconfig sn0 local0
    ifconfig sn1 local1
Connecting to the Kernel
To see how real-world Linux network drivers operate, look at loopback.c, plip.c, and e100.c.
Kernel Implementation of Sockets
The BSD socket is a framework that provides a common interface to various protocol families (PF_INET, PF_IPX, PF_PACKET) and socket types (SOCK_STREAM, SOCK_DGRAM). sys_socket() is the function called in the kernel when a user application calls the socket() system call. sys_socket() performs two steps:
- initialize the socket and sock structures for the protocol family by calling sock_create()
- link the socket with the VFS by calling sock_map_fd()
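The link between a socket and the VFS is visible from user space: socket() returns an ordinary file descriptor, and fstat() on it reports a socket file type. The sketch below checks exactly that; sketch_socket_is_file() is a hypothetical helper name, and the kernel-side path (sock_create(), sock_map_fd()) is not exercised directly, only observed through its result.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a socket and check that its descriptor is backed by a file
 * object of type "socket".
 * Returns 1 if so, 0 if not, and -1 if socket() itself failed.
 */
int sketch_socket_is_file(int family, int type)
{
    int fd = socket(family, type, 0);
    if (fd < 0)
        return -1;

    struct stat st;
    int is_sock = (fstat(fd, &st) == 0) && S_ISSOCK(st.st_mode);

    close(fd);      /* the descriptor is released like any other file */
    return is_sock;
}
```

Calling it with (AF_INET, SOCK_DGRAM) or (AF_UNIX, SOCK_STREAM) should give the same answer, since the VFS link does not depend on the protocol family.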
Networking and Network Devices APIs
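linux/net.h:
- enum sock_type Socket types:
- SOCK_STREAM stream (connection) socket
- SOCK_DGRAM datagram (connectionless) socket
- SOCK_RAW raw socket
- SOCK_RDM reliably-delivered message
- SOCK_SEQPACKET sequential packet socket
- SOCK_DCCP Datagram Congestion Control Protocol socket
- SOCK_PACKET Linux-specific way of getting packets at the dev level
- struct socket general BSD socket.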
    struct socket {
        socket_state            state;
        short                   type;
        unsigned long           flags;
        struct file             *file;
        struct sock             *sk;
        const struct proto_ops  *ops;
        struct socket_wq        wq;
    };
- state socket state (SS_CONNECTED, etc). This field describes the connection status of the socket.
- type socket type (SOCK_STREAM, etc)
- flags socket flags (SOCK_NOSPACE, etc). These flags reflect the resource status for a given socket and are associated with the receive and send buffers (space availability).
- ops protocol specific socket operations. This is the pointer to the proto_ops structure containing the set of functions specific to the protocol family
    struct proto_ops {
        int             family;
        struct module   *owner;
        int             (*release)   (struct socket *sock);
        int             (*bind)      (struct socket *sock,
                                      struct sockaddr *myaddr,
                                      int sockaddr_len);
        int             (*connect)   (struct socket *sock,
                                      struct sockaddr *vaddr,
                                      int sockaddr_len, int flags);
        int             (*socketpair)(struct socket *sock1,
                                      struct socket *sock2);
        int             (*accept)    (struct socket *sock,
                                      struct socket *newsock,
                                      int flags, bool kern);
        int             (*getname)   (struct socket *sock,
                                      struct sockaddr *addr,
                                      int peer);
        __poll_t        (*poll)      (struct file *file, struct socket *sock,
                                      struct poll_table_struct *wait);
        int             (*ioctl)     (struct socket *sock, unsigned int cmd,
                                      unsigned long arg);
        ...
    };
    struct sock {
        ...
        struct socket   *sk_socket;
        ...
    };
The following kernel data structures and functions are to be used:
- struct net_device linux/netdevice.h:
struct net_device { ... };
- netdev_features_t features;
- netdev_features_t hw_features;
- const struct net_device_ops *netdev_ops;
- int (*ndo_init)(struct net_device *dev); This function is called once when a network device is registered. The network device can use this for any late stage initialization or semantic validation. It can fail with an error code which will be propagated back to register_netdev.
- int (*ndo_open)(struct net_device *dev); This function is called when a network device transitions to the up state.
- int (*ndo_stop)(struct net_device *dev); This function is called when a network device transitions to the down state.
- netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev); Called when a packet needs to be transmitted. Returns NETDEV_TX_OK.
- void (*ndo_get_stats64)(struct net_device *dev, struct rtnl_link_stats64 *storage);
- struct net_device_stats* (*ndo_get_stats)(struct net_device *dev); Called when a user wants to get the network device usage statistics. Drivers must do one of the following:
- Define ndo_get_stats64() to fill in a zero-initialised rtnl_link_stats64 structure passed by the caller.
- Define ndo_get_stats() to update a net_device_stats structure (which should normally be dev->stats) and return a pointer to it. The structure may be changed asynchronously only if each field is written atomically.
- Update dev->stats asynchronously and atomically, and define neither operation.
- int (*ndo_set_mac_address)(struct net_device *dev, void *addr); This function is called when the Media Access Control address needs to be changed. If this interface is not defined, the MAC address can not be changed.
- int (*ndo_set_config)(struct net_device *dev, struct ifmap *map); Used to set network devices bus interface parameters. This interface is retained for legacy reasons; new devices should use the bus interface (PCI) for low level management.
- int (*ndo_do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd); Called when a user requests an ioctl which can't be handled by the generic interface code. If not defined ioctls return not supported error code.
- ...
- const struct ethtool_ops *ethtool_ops; Management operations, optional netdev operations.
struct ethtool_ops { u32 supported_coalesce_params; void (*get_drvinfo)(struct net_device *, struct ethtool_drvinfo *); ... };
- const struct header_ops *header_ops; Link-layer header operations:
- int (*create) (struct sk_buff *skb, struct net_device *dev, unsigned short type, const void *daddr, const void *saddr, unsigned int len);
- int (*parse)(const struct sk_buff *skb, unsigned char *haddr);
- int (*cache)(const struct neighbour *neigh, struct hh_cache *hh, __be16 type);
- void (*cache_update)(struct hh_cache *hh,const struct net_device *dev, const unsigned char *haddr);
- bool (*validate)(const char *ll_header, unsigned int len);
- __be16 (*parse_protocol)(const struct sk_buff *skb);
alloc_netdev_mqs() allocates a network device. Its first parameters are:
- sizeof_priv size of private data to allocate space for
- name device name format string, the name of this interface, as is seen by user space; this name can have a printf-style %d in it. The kernel replaces the %d with the next available interface number.
- name_assign_type origin of device name
- setup callback to initialize device
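The %d expansion of the name parameter can be mimicked in user space. The sketch below is not the kernel's implementation: it assumes that "next available interface number" means the lowest number whose resulting name is not yet in use, it caps the search at 100 units for brevity, and the names sketch_alloc_name(), sketch_name_taken() and SKETCH_IFNAMSIZ are invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_IFNAMSIZ 16  /* same size as the kernel's IFNAMSIZ */

/* Return 1 if `name` already appears in the `used` list. */
static int sketch_name_taken(const char *name, const char *const used[],
                             size_t n_used)
{
    for (size_t i = 0; i < n_used; i++)
        if (strcmp(used[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Expand the first "%d" in fmt to the lowest unit number that yields an
 * unused name. A fmt without "%d" is copied as-is if it is still free.
 * Returns 0 on success, -1 if no name fits or is available.
 */
int sketch_alloc_name(const char *fmt, const char *const used[],
                      size_t n_used, char out[SKETCH_IFNAMSIZ])
{
    assert(fmt != NULL && out != NULL);
    const char *pct = strstr(fmt, "%d");

    if (pct == NULL) {
        if (strlen(fmt) >= SKETCH_IFNAMSIZ ||
            sketch_name_taken(fmt, used, n_used))
            return -1;
        strcpy(out, fmt);
        return 0;
    }

    for (int unit = 0; unit < 100; unit++) {
        /* prefix before "%d", the unit number, then the rest of fmt */
        int n = snprintf(out, SKETCH_IFNAMSIZ, "%.*s%d%s",
                         (int)(pct - fmt), fmt, unit, pct + 2);
        if (n < 0 || n >= SKETCH_IFNAMSIZ)
            return -1;              /* name would not fit */
        if (!sketch_name_taken(out, used, n_used))
            return 0;
    }
    return -1;
}
```

With "sn0" and "eth0" already registered, the format "sn%d" yields "sn1" and "eth%d" yields "eth1".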
    #define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
    #define alloc_etherdev_mq(sizeof_priv, count) alloc_etherdev_mqs(sizeof_priv, count, count)

    struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
                                          unsigned int rxqs)
    {
        return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
                                ether_setup, txqs, rxqs);
    }

    /**
     * ether_setup - setup Ethernet network device
     * @dev: network device
     *
     * Fill in the fields of the device structure with Ethernet-generic values.
     */
    void ether_setup(struct net_device *dev)
    {
        dev->header_ops         = &eth_header_ops;
        dev->type               = ARPHRD_ETHER;
        dev->hard_header_len    = ETH_HLEN;
        dev->min_header_len     = ETH_HLEN;
        dev->mtu                = ETH_DATA_LEN;
        dev->min_mtu            = ETH_MIN_MTU;
        dev->max_mtu            = ETH_DATA_LEN;
        dev->addr_len           = ETH_ALEN;
        dev->tx_queue_len       = DEFAULT_TX_QUEUE_LEN;
        dev->flags              = IFF_BROADCAST|IFF_MULTICAST;
        dev->priv_flags         |= IFF_TX_SKB_SHARING;

        eth_broadcast_addr(dev->broadcast);
    }
alloc_etherdev() allocates a network device using eth%d for the name argument. It provides its own initialization function (ether_setup) that sets several net_device fields with appropriate values for Ethernet devices. Writers of drivers for other types of devices may want to take advantage of one of the other helper functions:
- alloc_fcdev() for fiber-channel devices
- alloc_fddidev() for FDDI devices
- alloc_trdev() for token ring devices
Setting the network namespace for a network device is done by calling dev_net_set(), and getting the network namespace associated with a network device is done by calling dev_net(). The nd_net member is typically set when a network device is registered or when a network device is moved to a different network namespace.
    int register_netdev(struct net_device *dev)
    {
        int err;

        if (rtnl_lock_killable())
            return -EINTR;
        err = register_netdevice(dev);
        rtnl_unlock();
        return err;
    }
register_netdev() takes a completed network device structure and adds it to the kernel interfaces. A NETDEV_REGISTER message is sent to the netdev notifier chain. This is a wrapper around register_netdevice() that takes the rtnl semaphore and expands the device name if you passed a format string to alloc_netdev().
Device Registration
When a driver module is loaded into a running kernel, it requests resources and offers facilities; there’s nothing new in that. Since there is no equivalent of major and minor numbers for network interfaces, the driver inserts a data structure for each newly detected interface into a global list of network devices.
The implementation of the snull driver:
- pointers to two struct net_device instances
    struct net_device *snull_devs[2];

    /* LDD3-era three-argument form; newer kernels add a name_assign_type argument */
    snull_devs[0] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
    snull_devs[1] = alloc_netdev(sizeof(struct snull_priv), "sn%d", snull_init);
    if (snull_devs[0] == NULL || snull_devs[1] == NULL)
        goto out;

    for (i = 0; i < 2; i++)
        if ((result = register_netdev(snull_devs[i])))
            printk("snull: error %i registering device \"%s\"\n",
                   result, snull_devs[i]->name);
Initializing Each Device
snull uses a separate initialization function, snull_init(struct net_device *dev). The core of this function is shown here. The listing uses the older per-device method pointers (dev->open, dev->hard_start_xmit, and so on); in current kernels the net_device_ops structure replaces them.
    void snull_init(struct net_device *dev)
    {
        struct snull_priv *priv;

        ether_setup(dev); /* assign some of the fields */

        dev->open            = snull_open;
        dev->stop            = snull_release;
        dev->set_config      = snull_config;
        dev->hard_start_xmit = snull_tx;
        dev->do_ioctl        = snull_ioctl;
        dev->get_stats       = snull_stats;
        dev->rebuild_header  = snull_rebuild_header;
        dev->hard_header     = snull_header;
        dev->tx_timeout      = snull_tx_timeout;
        dev->watchdog_timeo  = timeout;
        /* keep the default flags, just add NOARP */
        dev->flags           |= IFF_NOARP;
        dev->features        |= NETIF_F_NO_CSUM;
        dev->hard_header_cache = NULL; /* Disable caching */

        /*
         * Then, initialize the priv field. This encloses the statistics
         * and a few private fields.
         */
        priv = netdev_priv(dev);
        memset(priv, 0, sizeof(struct snull_priv));
        spin_lock_init(&priv->lock);
        snull_rx_ints(dev, 1); /* enable receive interrupts */
        snull_setup_pool(dev);
    }
- IFF_NOARP Since the “remote” systems simulated by snull do not really exist, there is nobody available to answer ARP requests for them.
- hard_header_cache Setting this field to NULL disables the caching of the (nonexistent) ARP replies on this interface.
Module Unloading
    void snull_cleanup(void)
    {
        int i;

        for (i = 0; i < 2; i++) {
            if (snull_devs[i]) {
                unregister_netdev(snull_devs[i]);
                snull_teardown_pool(snull_devs[i]);
                free_netdev(snull_devs[i]);
            }
        }
        return;
    }
The net_device Structure in Detail
Opening and Closing
The kernel opens or closes an interface in response to the ifconfig command. When ifconfig assigns an address to an interface, it performs two tasks:
- it assigns the address by means of ioctl(SIOCSIFADDR) (Socket I/O Control Set Interface Address).
- it sets the IFF_UP bit in dev->flags by means of ioctl(SIOCSIFFLAGS) (Socket I/O Control Set Interface Flags). This calls the ndo_open() method for the device, which requests any system resources it needs and tells the interface to come up.
Similarly, when the interface is shut down, ifconfig uses ioctl(SIOCSIFFLAGS) to clear IFF_UP, and the ndo_stop() method is called; it shuts down the interface and releases system resources.
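The way a change of IFF_UP turns into a call to ndo_open() or ndo_stop() can be modelled with a small user-space mock. Everything here is a simplified stand-in: struct sketch_netdev, struct sketch_netdev_ops and sketch_change_flags() are invented names that only imitate the shape of the kernel's net_device, net_device_ops and flag handling, and the call counters exist purely so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_IFF_UP 0x1u   /* mirrors the value of the kernel's IFF_UP bit */

struct sketch_netdev;

/* Mock of the two net_device_ops callbacks discussed in this section. */
struct sketch_netdev_ops {
    int (*ndo_open)(struct sketch_netdev *dev);
    int (*ndo_stop)(struct sketch_netdev *dev);
};

struct sketch_netdev {
    unsigned int flags;
    const struct sketch_netdev_ops *ops;
    int open_calls;   /* instrumentation for the example only */
    int stop_calls;
};

/* Example driver callbacks: a real driver would claim or release resources. */
static int sketch_open(struct sketch_netdev *dev) { dev->open_calls++; return 0; }
static int sketch_stop(struct sketch_netdev *dev) { dev->stop_calls++; return 0; }

const struct sketch_netdev_ops sketch_ops = {
    .ndo_open = sketch_open,
    .ndo_stop = sketch_stop,
};

/*
 * Model of what a SIOCSIFFLAGS request leads to: when the IFF_UP bit
 * changes, call ndo_open() or ndo_stop(); otherwise just store the flags.
 * On callback failure the old flags are kept and the error is returned.
 */
int sketch_change_flags(struct sketch_netdev *dev, unsigned int new_flags)
{
    assert(dev != NULL && dev->ops != NULL);

    if ((dev->flags ^ new_flags) & SKETCH_IFF_UP) {
        int (*cb)(struct sketch_netdev *) = (new_flags & SKETCH_IFF_UP)
            ? dev->ops->ndo_open : dev->ops->ndo_stop;
        int err = cb ? cb(dev) : 0;
        if (err)
            return err;
    }
    dev->flags = new_flags;
    return 0;
}
```

Setting IFF_UP twice in a row triggers only one open call in this model, because the second request does not change the bit.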
Bulk network packet transmission
Every time a packet is transmitted over the network, a sequence of operations must be performed. These include:
- acquiring the lock for the queue of outgoing packets
- passing a packet to the driver
- putting the packet in the device's transmit queue
- telling the device to start transmitting
Packet Transmission
Whenever the kernel needs to transmit a data packet, it calls the driver’s ndo_start_xmit() method (hard_start_xmit() in older kernels) to put the data on an outgoing queue. Each packet handled by the kernel is contained in a socket buffer structure, struct sk_buff. A pointer to an sk_buff is usually called skb. The sk_buff is the network buffer that represents a network packet in the Linux TCP/IP stack. It contains many fields:
- Some fields are pointers to the transport layer, network layer, and link layer headers. For an outgoing packet, each layer prepends its own header in front of the data handed down by the layer above, so the protocol headers are built as the packet (sk_buff) travels down the protocol layers for transmission.
- Some fields contain control information for each protocol, which may be used to build headers and to decide the next action to take based on specific events.
- Some fields manipulate the actual packet data
- Some fields provide information about the device the packet arrived on or will leave from
An sk_buff consists of three parts:
- sk_buff structure: the sk_buff header
- Linear data block: the packet data
- Nonlinear data portion: the struct skb_shared_info
    struct sk_buff {
        union {
            struct {
                /* These two members must be first. */
                struct sk_buff *next;
                struct sk_buff *prev;
                union {
                    struct net_device *dev;
                    /* Some protocols might use this space to store information,
                     * while device pointer would be NULL.
                     * UDP receive path is one user.
                     */
                    unsigned long dev_scratch;
                };
            ...
            struct list_head list;
        };
        union {
            struct sock *sk;
            ...
        };
        union {
            ktime_t tstamp;
            u64 skb_mstamp_ns; /* earliest departure time */
        };
        ...
        unsigned int len, data_len;
        __u16 mac_len, hdr_len;
        ...
        __u8 pkt_type:3;
        ...
        union {
            __wsum csum;
            struct {
                __u16 csum_start;
                __u16 csum_offset;
            };
        };
        __u32 priority;
        ...
        __be16 protocol;
        __u16 transport_header;
        __u16 network_header;
        __u16 mac_header;
        ...
        /* These elements must be at the end, see alloc_skb() for details. */
        sk_buff_data_t tail;
        sk_buff_data_t end;
        unsigned char *head, *data;
        ...
    };

    #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
- next, prev These fields link the related sk_buff together. For example, when a packet is fragmented, each fragment of the original packet is linked through the next field.
- dev Device we arrived on/are leaving by
- dev_scratch alternate use of dev when dev would be NULL
- list This is a pointer to the queue (struct sk_buff_head) or list on which this sk_buff is currently placed.
- sk Pointer to the socket to which this packet ( sk_buff ) belongs.
- tstamp Time we arrived/left
- skb_mstamp_ns earliest departure time; start point for retransmit timer
- len Length of actual data. This field keeps the total length of the data associated with the sk_buff (packet length at any point of time).
- data_len Data length. This field is used only when we have nonlinear data (paged data) associated with the sk_buff. This field indicates the portion of the total packet length that is contained as paged data, which means that the linear data length will be (skb->len - skb->data_len).
- mac_len Length of link layer header
- hdr_len writable header length of cloned skb
- pkt_type This field contains information about the type of the packet. The types generally are multicast, broadcast, loopback, host, other hosts, outgoing, and so on.
- csum Checksum (must include start/offset pair). This is the checksum of the protocol at any point in time.
- priority This field keeps information about the queuing priority of the packet. This is based on the TOS field of the IP header.
- protocol Packet protocol from driver
- transport_header Transport layer header
- network_header Network layer header
- mac_header Link layer header
- tail Tail pointer. This field points to the end of the data residing in the linear data area (just past the last data byte).
- end End pointer. This field points to the end of the linear data area and is different from tail.
- head Head of buffer. This field points to the start of the linear data area (first byte of the linear data area allocated for the sk_buff).
- data Data head pointer. This field points to the start of the data residing in the linear data area. The data residing in the linear data area may not always start from the start of the linear data area pointed to by head.
Whenever we allocate a new sk_buff, we provide the size of the linear data area. At the same time, we initialize the four fields of the sk_buff (head, data, tail, and end) to point to the appropriate positions in the linear data area.
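The relationship between head, data, tail and end is easiest to see by moving the pointers by hand. The following user-space model is a sketch only: struct toy_skb and the toy_skb_*() functions are invented names that imitate the behaviour of the kernel helpers skb_reserve(), skb_put() and skb_push(), and they use plain pointers for all four fields even though the kernel may store tail and end as offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of the linear data area of an sk_buff. */
struct toy_skb {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of the valid data */
    unsigned char *tail;   /* end of the valid data */
    unsigned char *end;    /* end of the allocated buffer */
    unsigned int   len;    /* number of valid data bytes */
};

/* Allocate `size` bytes; an empty buffer has head == data == tail. */
int toy_skb_alloc(struct toy_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (skb->head == NULL)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end  = skb->head + size;
    skb->len  = 0;
    return 0;
}

/* Leave `n` bytes of headroom for headers (only valid on an empty buffer). */
void toy_skb_reserve(struct toy_skb *skb, size_t n)
{
    assert(skb->len == 0 && n <= (size_t)(skb->end - skb->tail));
    skb->data += n;
    skb->tail += n;
}

/* Append `n` bytes at the tail; returns where the caller should write them. */
unsigned char *toy_skb_put(struct toy_skb *skb, size_t n)
{
    assert(n <= (size_t)(skb->end - skb->tail));
    unsigned char *old_tail = skb->tail;
    skb->tail += n;
    skb->len  += (unsigned int)n;
    return old_tail;
}

/* Prepend `n` bytes in the headroom; returns the new start of data. */
unsigned char *toy_skb_push(struct toy_skb *skb, size_t n)
{
    assert(n <= (size_t)(skb->data - skb->head));
    skb->data -= n;
    skb->len  += (unsigned int)n;
    return skb->data;
}

void toy_skb_free(struct toy_skb *skb)
{
    free(skb->head);
    skb->head = skb->data = skb->tail = skb->end = NULL;
    skb->len = 0;
}
```

A typical transmit-side sequence reserves headroom first, appends the payload with the put operation, and then lets each lower layer prepend its header with the push operation, which is how headers accumulate in front of the data as the packet moves down the stack.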