DMA in Linux
http://coweb.cc.gatech.edu/sysHackfest/uploads/58/DMA_howto.1.txt
this document is broken up into three sections:
1) a brief, non-hw-specific overview of DMA in Linux
2) a summary of DMA hw for the arm architecture
3) a summary of DMA sw for the arm architecture
after reading this, you should read Documentation/DMA-mapping.txt, which has some good info
that is specific to device drivers (for SCSI, PCI, etc. devices)
========================================================================
========================================================================
======================= section 1 ======================================
======================= DMA overview (non-arch specific) ===============
========================================================================
========================================================================
note: this is just a summary of DMA in Linux; for a more detailed description of DMA read:
(1) "Linux Device Drivers", chapter 13, pages 289-300
(2) "Understanding the Linux Kernel", chapter 13, pages 377-391
also note that the information here was taken from these chapters, and is basically a
summary of the two.
DMA data transfer:
there are two ways a DMA data transfer can be triggered: (1) software asks for
data (e.g. using read); (2) the hardware asynchronously pushes data to the system
(1) a) the process calls read and the driver method allocates a DMA buffer. The driver then
tells the hardware to transfer its data and the process is put to sleep
b) the hardware writes data to the DMA buffer and raises an interrupt when done
c) the interrupt handler gets the input data, acks the interrupt and wakes up the process,
which can now read the data.
(2) a) the hardware raises an interrupt to announce the arrival of data
b) the interrupt handler allocates a buffer and tells the hardware where to transfer its data
c) the peripheral device writes data to the buffer and raises another interrupt when done
d) the handler dispatches the new data, wakes up any relevant process and "housekeeps"
DMA buffer allocation:
If you need more than one page, the pages need to be contiguous in physical memory
because the device transfers data using the ISA or PCI bus etc., which use physical
addresses.
Note that this constraint doesn't apply to the SBus because it uses virtual
addresses on the peripheral bus.
ISA - when using kmalloc() you need to bitwise-or GFP_DMA with GFP_KERNEL (or GFP_ATOMIC)
because of the following:
GFP_DMA guarantees: (1) physical addresses are consecutive when get_free_pages
returns more than one page and (2) only addresses lower than MAX_DMA_ADDRESS
are returned. MAX_DMA_ADDRESS is 16MB on the PC because of ISA constraints
PCI - don't need to use GFP_DMA because there is no MAX_DMA_ADDRESS limit
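the two GFP_DMA guarantees above can be checked with simple arithmetic. Below is a minimal
userspace sketch, not kernel code: the function names and the SKETCH_ constants are made up
for illustration, and the 4KB page size and 16MB limit are just the PC values quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE       4096u       /* assumed PC page size */
#define SKETCH_MAX_DMA_ADDRESS 0x1000000u  /* 16MB ISA limit on the PC */

/* Return 1 if the physical range [phys, phys + size) lies entirely
 * below the ISA DMA limit, 0 otherwise. */
int isa_dma_reachable(uint32_t phys, size_t size)
{
    if (size == 0 || phys >= SKETCH_MAX_DMA_ADDRESS)
        return 0;
    return size <= SKETCH_MAX_DMA_ADDRESS - phys;
}

/* Number of whole pages needed to hold size bytes. */
unsigned int pages_needed(size_t size)
{
    assert(size > 0);
    return (unsigned int)((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

with these assumptions a 128KB buffer needs 32 pages, and a buffer that ends above the
16MB mark is rejected even if it starts below it.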
when you need more than 128KB (get_free_pages and kmalloc can't return more than
128KB, i.e. 32 pages, of consecutive memory), do the following:
you should allocate memory at boot time or reserve the top of physical RAM for
your buffer. Reserving RAM at boot time is done by passing the 'mem=' argument to
the kernel. For example, if you have 32MB of RAM use 'mem=31M' so the kernel doesn't
use the last meg. Then get that memory with one of the calls below.
vremap is the old way, ioremap is the new way. Both techniques are shown here, but use
ioremap.
dmabuf = vremap( 0x1F00000 /* 31MB */, 0x100000 /* 1MB */ );
dmabuf = ioremap( 0x1F00000 /* 31MB */, 0x100000 /* 1MB */ );
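the address and size passed to ioremap follow directly from the 'mem=' value. The sketch
below is plain userspace arithmetic, assuming RAM starts at physical address 0; the struct
and function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024u * 1024u)

struct reserved_region {
    uint32_t base;  /* physical start address of the reserved block */
    uint32_t size;  /* length of the reserved block in bytes */
};

/* Given the installed RAM and the value passed as mem=, both in MB,
 * return the physical region the kernel leaves untouched.
 * Assumes RAM starts at physical address 0. */
struct reserved_region reserved_top_of_ram(uint32_t total_mb, uint32_t mem_mb)
{
    struct reserved_region r;

    assert(mem_mb < total_mb);  /* something must be left over */
    assert(total_mb <= 4095);   /* keep the result within 32 bits */
    r.base = mem_mb * MB;
    r.size = (total_mb - mem_mb) * MB;
    return r;
}
```

for 32MB of RAM and 'mem=31M' this gives base 0x1F00000 and size 0x100000, the values
used in the ioremap call above.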
Bus Addresses:
DMA hardware uses bus addresses (instead of physical). Note: on the PC, ISA and PCI
addresses are the same as physical addresses, but this is not true for every platform.
use the following conversions when necessary:
unsigned long virt_to_bus(volatile void *addr);
void *bus_to_virt(unsigned long addr);
virt_to_bus() would be used when the driver sends address information to an I/O device
(such as a DMAC) and bus_to_virt() is used when information is received from
the bus.
DMA for PCI
"The 2.4 kernel includes a flexible mechanism that supports PCI DMA (also known as bus
mastering). It handles the details of buffer allocation and can deal with setting
up the bus hardware for multipage transfers on hardware that supports them. This
code also takes care of situations in which a buffer lives in a non-DMA-capable
zone of memory, though only on some platforms and at a computational cost."
Difficult hardware:
To see if PCI DMA is supported on a particular platform, use the following call:
int pci_dma_supported( struct pci_dev *pdev, dma_addr_t mask);
where mask is a bit mask describing which address bits the device can use. A nonzero
return value indicates that it is supported. For example
if your device can only handle 16-bit addresses, you could do the following:
if (pci_dma_supported (pdev, 0xffff))
pdev->dma_mask = 0xffff;
else {
card->use_dma = 0; /* We'll have to live without DMA */
printk (KERN_WARNING "mydev: DMA not supported\n");
}
NOTE: for devices that can handle 32-bit addresses, there is no need to call
pci_dma_supported()
As of 2.4.3, pci_set_dma_mask() has been provided:
int pci_set_dma_mask(struct pci_dev *pdev, dma_addr_t mask);
if DMA is supported with the mask, this returns 0 and sets the dma_mask field;
otherwise it returns -EIO.
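to make the meaning of the mask concrete, here is a small userspace model of the question
"can a device with this mask reach this buffer?". It is only an illustration of the address
arithmetic: the function name is made up, the mask is assumed to be a run of low bits
(0xffff, 0xffffffff, ...), and it says nothing about how pci_dma_supported() decides
internally, which is platform specific.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if every byte of [bus_addr, bus_addr + size) can be addressed
 * by a device whose usable address bits are given by mask.
 * Assumes mask is a contiguous run of low bits, e.g. 0xffff. */
int dma_range_fits_mask(uint64_t bus_addr, uint64_t size, uint64_t mask)
{
    uint64_t last;

    assert(size > 0);
    last = bus_addr + size - 1;
    if (last < bus_addr)    /* the range wraps around */
        return 0;
    return (bus_addr & ~mask) == 0 && (last & ~mask) == 0;
}
```

a device limited to 16-bit addresses can reach a 4KB buffer at 0xF000 but not one byte more.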
DMA mappings:
mapping is a combination of allocating a DMA buffer and generating an address for
that buffer that is accessible by the device. There are two types of mappings:
Consistent DMA mappings:
these exist for the life of the driver. This mapped buffer must be
available to both the CPU and the peripheral at the same time. (note: other types of
mappings can only be available to one at a time). The buffer should
avoid caching issues which would cause one side not to see an update made
by the other.
Streaming DMA mappings:
used for a single operation. These are preferred over consistent mappings because:
1) each DMA mapping uses one or more mapping registers on the bus and
long-lived consistent mappings can tie these up; 2) on some hardware,
streaming mappings can be optimized.
setting up consistent:
void *pci_alloc_consistent(struct pci_dev *pdev, size_t size, dma_addr_t *bus_addr);
This allocates and maps the buffer. It returns a kernel virtual address for the
buffer. The associated bus address is returned in bus_addr.
To free this buffer use:
void pci_free_consistent(struct pci_dev *pdev, size_t size, void *cpu_addr, dma_addr_t bus_addr);
setting up streaming:
to get a buffer, use the function:
dma_addr_t pci_map_single(struct pci_dev *pdev, void *buffer,
size_t size, int direction);
This returns the bus address to be used by the device or NULL if this failed.
the direction can be:
PCI_DMA_TODEVICE, PCI_DMA_FROMDEVICE, PCI_DMA_BIDIRECTIONAL, PCI_DMA_NONE.
using bidirectional should be avoided if possible because it causes severe
performance penalties. Using buffers with type NONE will cause a kernel panic.
use this function when the transfer is complete:
void pci_unmap_single(struct pci_dev *pdev, dma_addr_t bus_addr,
size_t size, int direction);
the size and direction args must match the ones used to create the mapping.
NOTE: - buffer transfers must be used in the direction for which they were allocated
- the mapped buffer belongs to the device not the processor. The driver shouldn't
touch the buffer until it has been pci_unmap_single()'ed
- the buffer must not be unmapped while the DMA is still active.
if it is necessary to access the streaming DMA buffer without unmapping it, use:
void pci_dma_sync_single(struct pci_dev *pdev, dma_addr_t bus_addr,
size_t size, int direction);
this is to be called before the processor accesses a
PCI_DMA_FROMDEVICE buffer, and after an access to a PCI_DMA_TODEVICE buffer.
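the ownership rules for streaming mappings can be pictured as a tiny state machine. The
sketch below is a userspace model only: every sk_ name is hypothetical, it does not call
the PCI DMA API, and it leaves out the sync call. It just encodes two of the rules above:
the CPU must not touch a mapped buffer, and unmap must be given the same size and
direction as map.

```c
#include <assert.h>
#include <stddef.h>

enum sk_dir { SK_TODEVICE, SK_FROMDEVICE, SK_BIDIRECTIONAL };

struct sk_mapping {
    size_t size;      /* size given at map time */
    enum sk_dir dir;  /* direction given at map time */
    int mapped;       /* 1 while the device owns the buffer */
};

/* Map the buffer: from here on it belongs to the device. */
void sk_map(struct sk_mapping *m, size_t size, enum sk_dir dir)
{
    assert(size > 0);
    m->size = size;
    m->dir = dir;
    m->mapped = 1;
}

/* The driver may touch the buffer only while it is not mapped. */
int sk_cpu_may_touch(const struct sk_mapping *m)
{
    return !m->mapped;
}

/* Unmap: returns 0 on success, -1 if the buffer is not mapped or the
 * size/direction differ from the ones used to create the mapping. */
int sk_unmap(struct sk_mapping *m, size_t size, enum sk_dir dir)
{
    if (!m->mapped || m->size != size || m->dir != dir)
        return -1;
    m->mapped = 0;
    return 0;
}
```

a real driver gets no such error return for a mismatched unmap, which is why the matching
rule has to be followed by hand.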
Scatter-gather mappings:
this is a special case of streaming DMA mappings. You may need to transfer several buffers
all at the same time. See pages 409-410 of the device driver book for details on how this
may occur and why scatter-gather has advantages over mapping each buffer individually.
To use scatter-gather mappings, you need to:
- create and fill in an array of struct scatterlist for the buffers to be transferred
- found in
- has: char *address - address of buffer
unsigned int length - length of that buffer.
- call:
int pci_map_sg(struct pci_dev *pdev, struct scatterlist *list,
int nents, int direction);
This returns the number of DMA buffers to transfer. NOTE: this may be
less than nents, which is the number of scatterlist entries passed in.
- transfer each buffer returned by pci_map_sg(). use the following calls to
make your code portable:
dma_addr_t sg_dma_address(struct scatterlist *sg);
unsigned int sg_dma_len(struct scatterlist *sg);
use these to get the fields of the scatterlist entries because the location
in the structure varies from arch to arch.
NOTE: the address and length of the buffers may be different from what was passed into
pci_map_sg().
- after the transfer is complete, you need to unmap the scatter-gather mapping:
void pci_unmap_sg(struct pci_dev *pdev, struct scatterlist *list,
int nents, int direction);
NOTE: nents must be the same value that you passed into pci_map_sg (not what it returned).
The same rules apply to scatter-gather mappings as the regular streaming mappings.
Use pci_dma_sync_sg() to sync the buffer before access
to the mapped buffer.
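one way the count returned by pci_map_sg() can end up smaller than nents is that entries
which turn out to be adjacent on the bus get merged. The sketch below models that idea in
userspace. The struct and function are hypothetical stand-ins, not the kernel's struct
scatterlist, and the real merging policy is architecture specific.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_seg {
    uint32_t bus_addr;  /* bus address of the segment */
    uint32_t len;       /* length of the segment in bytes */
};

/* Merge segments that are adjacent on the bus, in place.
 * Returns the number of segments left, which may be less than nents. */
int sketch_coalesce(struct sketch_seg *seg, int nents)
{
    int out = 0;
    int i;

    assert(nents > 0);
    for (i = 1; i < nents; i++) {
        if (seg[out].bus_addr + seg[out].len == seg[i].bus_addr)
            seg[out].len += seg[i].len;  /* extends the previous segment */
        else
            seg[++out] = seg[i];         /* starts a new segment */
    }
    return out + 1;
}
```

this also shows why the driver has to reread the address and length of each entry after
mapping instead of reusing the values it filled in.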
See the device driver book page 411 to see how different architectures support PCI DMA, because the details are very
hardware dependent.
NOTE: the PCI DMA interface didn't exist prior to kernel 2.3.41, so this next section is old. Use this for
kernels prior to 2.3.41.
PCI supports multiple bus-masters, so DMA reduces to bus-mastering. Programming a DMA
transfer with a PCI device includes:
(1) allocating a buffer. you do not need to specify GFP_DMA when calling kmalloc()
because there is no 16MB limit.
(2) talking to the device. The device needs to know about the DMA buffer, so
it needs the address and size of the buffer. The address passed to a PCI
device needs to be a bus address.
see device driver book page 412 for an SBus explanation.
this next section was from the older book (covering up to kernel 2.1.43) but ISA hasn't changed
since 2.0, so it's still correct
DMA for ISA:
two types of DMA transfers for the ISA bus
(1) native transfer where the DMA uses standard DMAC circuitry on the mainboard to
drive the signal lines on the ISA bus.
(2) ISA bus-master DMA where the transfer is controlled by the peripheral device.
This is similar to PCI devices.
for the native transfer there are 3 pieces involved:
(1) the DMAC - holds information about the DMA transfer such as direction, memory
address and the size of the transfer. It also keeps a counter that tracks
the status of ongoing transfers.
(2) the peripheral device - the device must activate the DMA request signal when
it wants to transfer data. The device issues an interrupt when the
transfer is complete.
(3) the device driver - this provides the DMAC with the direction, RAM address and
the size of the transfer. It also talks to the peripheral device to
prepare it for transferring data, and then responds to the interrupt raised
by the device when the transfer is complete.
The DMAC holds "channels", each of which is associated with one set of DMA registers. PCs
have two DMAC devices (each having 4 channels). The second one is the master and
is connected to the system processor. The first is the slave and is connected to channel
0 of the master. Channels are numbered 0 to 7 (#4 is the cascade of the slave onto
the master). The size of the DMA transfer is a 16-bit number representing the
number of bus cycles. The maximum transfer size is 64KB for the slave and 128KB for
the master because the master contains 16-bit channels while the slave contains
8-bit channels.
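the channel numbering and size limits above boil down to a small lookup. This is a
userspace sketch with a made-up function name; it only restates the paragraph above
(channels 0-3 on the 8-bit slave, 5-7 on the 16-bit master, 4 used for the cascade).

```c
#include <assert.h>

/* Maximum number of bytes in one transfer on a PC ISA DMA channel,
 * or 0 for channel 4, which is taken by the cascade. */
unsigned long isa_dma_max_bytes(unsigned int channel)
{
    assert(channel <= 7);
    if (channel == 4)
        return 0;          /* cascade, not usable for transfers */
    if (channel <= 3)
        return 65536UL;    /* 8-bit channel: 2^16 cycles * 1 byte */
    return 131072UL;       /* 16-bit channel: 2^16 cycles * 2 bytes */
}
```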
requesting DMA channels:
The flow in the use of a DMA channel is as follows:
(1) request an interrupt line (via request_irq())
(2) request the channel (via request_dma(unsigned int channel, const char *name))
(3) "do stuff"
(4) release dma channel (via free_dma(unsigned int channel))
(5) release irq line (via free_irq(unsigned int irq))
The driver needs to configure the DMAC when a read or write is called, or for asynchronous
transfers. The asynchronous transfer is set up at open time or in response to
an ioctl call. Here are functions for controlling the DMAC:
set_dma_mode() - set the mode (DMA_MODE_READ, DMA_MODE_WRITE, or DMA_MODE_CASCADE)
set_dma_addr() - set the RAM address of the beginning of the data to transfer
set_dma_count() - set the number of bytes to transfer
disable_dma() - disable a dma channel
enable_dma() - enable a dma channel
get_dma_residue() - used to see if transfer is complete
clear_dma_ff() - clear the DMA flip-flop which is used to control access
to the 16-bit registers. This flip-flop automatically toggles
when 8 bits have been transferred. The programmer needs to
clear the flip-flop before accessing the DMA registers.
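to see why the flip-flop has to be cleared first, here is a userspace model of a 16-bit
register that is written through an 8-bit port. The struct and the sketch_ functions are
hypothetical and do not touch real hardware; they only mimic the toggle behaviour
described above, with the low byte expected first.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_dma_reg {
    uint16_t value;  /* the 16-bit register behind the 8-bit port */
    int flipflop;    /* 0: next byte is the low half, 1: the high half */
};

/* Model of clear_dma_ff(): the next byte written is the low half. */
void sketch_clear_ff(struct sketch_dma_reg *r)
{
    r->flipflop = 0;
}

/* One 8-bit port write; the flip-flop toggles after every byte. */
void sketch_write_byte(struct sketch_dma_reg *r, uint8_t b)
{
    if (r->flipflop == 0)
        r->value = (uint16_t)((r->value & 0xff00u) | b);
    else
        r->value = (uint16_t)((r->value & 0x00ffu) | ((uint16_t)b << 8));
    r->flipflop ^= 1;
}

/* Safe 16-bit write: clear the flip-flop, then low byte, then high byte. */
void sketch_write16(struct sketch_dma_reg *r, uint16_t v)
{
    sketch_clear_ff(r);
    sketch_write_byte(r, (uint8_t)(v & 0xffu));
    sketch_write_byte(r, (uint8_t)(v >> 8));
    assert(r->flipflop == 0);  /* two writes leave it where it started */
}
```

if the flip-flop happens to be in the wrong state and is not cleared, the two bytes land
in swapped halves of the register.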
========================================================================
========================================================================
======================= section 2 ======================================
======================= DMA hw for arm =================================
========================================================================
========================================================================
note: - modified by summarizing and put into summary type list
OVERVIEW
- DMAC consists of six independent DMA channels (2 channels required to service a full-duplex serial controller)
- DMAC relieves processor of interrupt overhead
- any peripherals (except UDC) may be serviced with programmed I/O instead of DMA
- DMAC has set of config and control regs for each channel and a common data transfer engine to service active channel
CHANNELS
- Channels are serviced in a fixed priority (channel 0 has the highest, and 5 the lowest priority)
- each channel is serviced in increments of that device's burst size and delivered in the granularity of that device's port width
- the burst size and port width for each device is programmed in channel's registers and based on the device's
FIFO depth and bandwidth needs.
- when multiple channels are active, each one is serviced with a burst of data, then the DMAC may perform a context
switch to another channel
- DMAC context switches based on:
- if channel is active
- whether its target device is requesting service (FIFO is half-empty)
- where that channel lies in priority scheme.
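the three context-switch criteria above amount to "pick the lowest-numbered channel that
is active and whose device is requesting service". A minimal userspace sketch of that
selection follows; the struct and function names are made up, and the real DMAC does this
in hardware.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NCHAN 6

struct sketch_chan {
    int active;      /* channel has been started */
    int requesting;  /* target device is asking for service */
};

/* Return the number of the channel to service next (0 is the highest
 * priority), or -1 if no active channel is requesting service. */
int sketch_next_channel(const struct sketch_chan *ch)
{
    int n;

    assert(ch != NULL);
    for (n = 0; n < SKETCH_NCHAN; n++)
        if (ch[n].active && ch[n].requesting)
            return n;
    return -1;
}
```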
DATA TRANSFERS
- data transfers are performed between a device (a serial controller) and memory (ROM, RAM, Flash, SRAM, or DRAM).
- can't transfer to/from PCMCIA space.
- during write:
- burst of data is read from mem as words into a buff inside DMAC
- then data is written to the device according to device's port width and the state of the endian bit (E)
- during read:
- data is read from device according to device's port width and then sent to mem as words.
- the organization of the bytes inside that word is determined by the endian bit (E)
DMAC CONTROL REGISTERS
- DMAC has the following control regs:
- two starting address regs
- two transfer count regs
- these are programmed by the system at start of transfer.
- buffs: the regs control two rotating buffs during a transfer. the buffs (A and B) can be chained together so
when a transfer of one completes, the other immediately begins
- by interrogating the status info in the channel control/status register, the user can safely update the addr pointer
and transfer count of the inactive buff.
DMA REGISTER DEFINITIONS:
- each DMA channel has six 32-bit registers (which is part of the DMAC hardware)
- DMA device addr reg (DDARn)
- DMA control/status reg (DCSRn)
- DMA buffer A start addr (DBSAn)
- DMA buff B start addr (DBSBn)
- DMA buff A transfer count (DBTAn)
- DMA buff B transfer count (DBTBn)
note: n is a value from 0 to 5 and is the channel number
Below is a description of each register:
===========================================================================================
DDARn
- 32-bit read/write reg which has channel info for the target device.
- writes are blocked if the RUN bit in DCSRn is one
- The DA 31:8 field is constructed as follows:
DA 31:28 = Device port address 31:28.
Device port address 27:22 is assumed to be zero.
DA 27:8 = Device port address 21:2.
Device port address 1:0 is assumed to be zero.
below is the format for DDARn
Bits Name Description
0 RW Device data transfer direction (read/write).
0 = Transfer is a write (memory to device).
1 = Transfer is a read (device to memory).
fixed for each device type
1 E Device endianness.
0 = Byte ordering is little endian.
1 = Byte ordering is big endian.
2 BS Device burst size.
0 = Four datums per burst.
1 = Eight datums per burst.
fixed for each device type. This val is chosen based on the FIFO size of the device
3 DW Device datum width.
0 = Datum size is one byte.
1 = Datum size is one half-word.
This is fixed for each device type
7..4 DS 3..0 Device select.
This field is programmed to point to the desired device (which device this channel
responds to).
31..8 DA 31..8 Device address field.
This field is a partial address of the data port of the
device currently being serviced.
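the DDARn layout above can be turned into a pair of pack/unpack helpers. This is a
userspace sketch: the SK_ macro names and both functions are made up for the example, and
the port address and device-select value used to try it out are arbitrary, not taken from
the manual.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DDAR_RW (1u << 0)  /* 1 = read (device to memory) */
#define SK_DDAR_E  (1u << 1)  /* 1 = big endian */
#define SK_DDAR_BS (1u << 2)  /* 1 = eight datums per burst */
#define SK_DDAR_DW (1u << 3)  /* 1 = half-word datum */

/* Build a DDARn value from the device port address, the 4-bit device
 * select value and the SK_DDAR_* flag bits. */
uint32_t ddar_encode(uint32_t port_addr, unsigned int ds, uint32_t flags)
{
    uint32_t da;

    assert(ds <= 0xf);
    assert((flags & ~0xfu) == 0);
    /* address bits 27:22 and 1:0 are assumed zero and are not stored */
    assert((port_addr & 0x0fc00003u) == 0);

    da = (port_addr & 0xf0000000u)          /* bits 31:28 stay in place */
       | ((port_addr & 0x003ffffcu) << 6);  /* bits 21:2 move to 27:8 */
    return da | ((uint32_t)ds << 4) | flags;
}

/* Recover the device port address from a DDARn value. */
uint32_t ddar_port_addr(uint32_t ddar)
{
    return (ddar & 0xf0000000u) | ((ddar & 0x0fffff00u) >> 6);
}
```

for example, port address 0x80050014 with device select 4 and no flags packs to
0x81400540, and unpacking that value gives the original address back.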
===========================================================================================
DMA control/status reg: DCSRn
- 32-bit read/write reg which contains the control and status bits for the channel
Bits Name Description
0 RUN Run bit.
This is a control bit and is set by the user to indicate
that the device address register has been loaded. No
transfer will occur on this channel unless this bit is set.
Clearing the RUN bit on an active channel acts as a pause
to that channel. Operation can then be resumed by again
setting the RUN bit. If the RUN bit is cleared in the middle
of a burst, the burst will complete before the channel is paused.
The DDAR may be written only when RUN is zero.
1 IE Interrupt enable.
This bit enables interrupts to be passed onto the interrupt
controller. An interrupt is the 'OR' of the DONEA, DONEB,
and ERROR bits. The interrupt is negated when all of these
status bits are cleared.
2 ERROR Transfer error bit.
ERROR is a status bit and is set to indicate that a memory
error has occurred (only reserved mem, not non-existent mem).
It can generate an interrupt if the IE bit is set. ERROR is
cleared by software through setting the RUN bit. If enabled,
ERROR generates a channel interrupt.
3 DONEA Buffer A done.
This bit is a status bit and indicates that the transfer
into or out of buffer A has completed. It is cleared by
writing a one to it or by setting the STRTA bit. DONEA can
generate an interrupt if IE is set. If enabled, DONEA causes
a channel interrupt.
4 STRTA Buffer A transfer start.
This bit is a control bit and is written by the user. It
causes the buffer A transfer to begin. This bit is
functional only if the RUN bit is set. The immediate action
from setting STRTA depends on the state of the BIU bit.
5 DONEB Buffer B done.
This bit is a status bit and indicates that the transfer
into or out of buffer B has completed. It is cleared by
writing a one to it or by setting the STRTB bit. DONEB can
generate an interrupt if IE is set. If enabled, DONEB will
cause a channel interrupt.
6 STRTB Buffer B transfer start.
This bit is a control bit and is written by the processor.
It causes the buffer B transfer to begin. This bit is
functional only if the RUN bit is set. The immediate action
from setting STRTB depends on the state of the BIU bit.
7 BIU Buffer in use.
BIU is a status bit and may be read to indicate which
buffer (A or B) is active. This bit is toggled by the DMA
controller when DONEA or DONEB are set. This bit is cleared
by all reset sources (hard, sleep, watchdog, or software).
The processor must interrogate this bit before programming the
channel for a new transfer. If both STRTA and STRTB are set at
the same time, the first buffer serviced depends on the state of BIU.
31..8 -- Reserved.
These bits are reserved and read as zeros. Writes to this
field have no effect.
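the DCSRn bit positions from the table, plus three of the rules stated in this section,
written out as a userspace sketch. The SK_DCSR_ macros and the function names are made up
for the example; only the bit numbers and the rules come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define SK_DCSR_RUN   (1u << 0)
#define SK_DCSR_IE    (1u << 1)
#define SK_DCSR_ERROR (1u << 2)
#define SK_DCSR_DONEA (1u << 3)
#define SK_DCSR_STRTA (1u << 4)
#define SK_DCSR_DONEB (1u << 5)
#define SK_DCSR_STRTB (1u << 6)
#define SK_DCSR_BIU   (1u << 7)

/* The channel interrupt is IE and the OR of DONEA, DONEB and ERROR. */
int dcsr_irq_asserted(uint32_t dcsr)
{
    assert((dcsr & ~0xffu) == 0);  /* bits 31..8 are reserved, read as zero */
    return (dcsr & SK_DCSR_IE) &&
           (dcsr & (SK_DCSR_DONEA | SK_DCSR_DONEB | SK_DCSR_ERROR));
}

/* DDARn may be written only while RUN is zero. */
int dcsr_ddar_writable(uint32_t dcsr)
{
    return (dcsr & SK_DCSR_RUN) == 0;
}

/* DBSAn and DBTAn may be written only while STRTA is zero. */
int dcsr_buf_a_writable(uint32_t dcsr)
{
    return (dcsr & SK_DCSR_STRTA) == 0;
}
```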
===========================================================================================
DMA Buffer A Start Address Register (DBSAn)
- 32-bit read/write register that contains the starting memory address for buffer A.
- may be written only when STRTA is zero.
===========================================================================================
DMA Buffer A Transfer Count Register (DBTAn)
- 32-bit read/write register that contains the current transfer count in bytes for buffer A.
- may be written only when the STRTA bit for this channel is a zero.
Bits Name Description
12..0 TCA 12..0 Transfer count (buffer A).
This field is a 13-bit value and contains the current transfer count
(in bytes) for the transfer to or from buffer A. The maximum value
programmed via this transfer count is 8 Kbyte.
31..13 -- Reserved. These bits are reserved and read as zeros. Writes to this field have no effect.
===========================================================================================
DMA Buffer B Start Address Register (DBSBn)
- 32-bit read/write register that contains the starting memory address for buffer B.
- may be written only while STRTB in the DCSR is zero.
===========================================================================================
DMA Buffer B Transfer Count Register (DBTBn)
- 32-bit read/write register that contains the current transfer count in bytes for buffer B.
- may be written only when the STRTB bit for this channel is a zero.
Bits Name Description
12..0 TCB 12..0 Transfer count (buffer B).
This field is a 13-bit value and contains the current transfer count (in bytes) for the transfer to or
from buffer B. The maximum value programmed via this transfer count is 8 Kbyte.
31..13 -- Reserved.
These bits are reserved and read as zeros. Writes to this field have no effect.
===========================================================================================
DMA OPERATION
- DMAC has dynamic context switching between active channels (this is demand based)
- context switch occurs when channel completes a command or when a burst (portion of transfer) is complete
- e.g. when the FIFO in a transmit serial controller is full and can't accept any more data, that channel can
be switched out of the active context for another channel requesting service.
- channels are serviced in priority: channel 0 highest, channel 5 being lowest.
see: SA-1110 Developer's Manual 11-13 -> Peripheral Control Module -> 11.6.2 DMA Operation
- contains a table representing the DMA registers
========================================================================
========================================================================
======================= section 3 ======================================
======================= DMA sw for arm =================================
========================================================================
========================================================================
DMA software stuff
as stated above in hw section:
- DMAC = six independent DMA channels (2 channels required to service a full-duplex serial controller)
- DMAC relieves processor of interrupt overhead
- any peripherals (except UDC) may be serviced with programmed I/O instead of DMA
SA11x0 DMA API
--------------
int sa1100_request_dma( dmach_t *channel, const char *device_id );
- This is to be called before any other DMA calls
- search for free DMA channel and put it in *channel (passed in)
- device_id points to a string identifying DMA usage or device (used for /proc)
- if no channel is available, an error code is returned.
===========================================================================================
int sa1100_dma_set_device( dmach_t channel, dma_device_t device );
- after a DMA channel has been requested, it needs to be assigned to a peripheral port.
NOTE: reading from and writing to a port are different streams, thus a separate
DMA channel is needed for each.
- see include/asm-arm/arch-sa1100/dma.h for possible dma_device_t values.
- channel arg is the one from calling sa1100_request_dma()
===========================================================================================
int sa1100_dma_queue_buffer( dmach_t channel, void *buf_id,
dma_addr_t data, int size );
- enqueues the buffer for DMA processing.
- buffer is transmitted or filled with incoming data depending on channel
configuration (from sa1100_dma_set_device() )
- if the queue is empty, DMA starts immediately on the given buffer
Arguments are:
dmach_t channel: the channel number.
void *buf_id: a buffer identification known by the caller.
dma_addr_t data: the buffer's physical address.
int size: the buffer size in bytes.
NOTE: dma_addr_t is not the virtual addr (the address returned by kmalloc(), etc)
the DMAC needs a physical addr to a buffer which is not cached by the CPU
data cache. To achieve this, use the DMA mapping functions (see
Documentation/DMA-mapping.txt). The relevant ones are pci_alloc_consistent(),
pci_map_single() and the unmap counterparts. The PCI dev arg is NULL.
- no restriction on buffer size. DMA code will split it up internally for DMAC as needed.
- if buffer can't be enqueued, the appropriate error code is returned.
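the internal splitting mentioned above follows from the 13-bit transfer count registers.
The sketch below shows the arithmetic in userspace, assuming each hardware transfer
carries at most 8191 bytes (the largest 13-bit value). The chunk size the kernel code
really uses is not stated here, so SK_MAX_CHUNK and both function names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_CHUNK 8191u  /* largest value of a 13-bit count field */

/* Number of hardware transfers needed for a buffer of size bytes. */
size_t sk_chunk_count(size_t size)
{
    return (size + SK_MAX_CHUNK - 1) / SK_MAX_CHUNK;
}

/* Size in bytes of chunk number idx (0-based) of a size-byte buffer. */
size_t sk_chunk_size(size_t size, size_t idx)
{
    size_t done = idx * SK_MAX_CHUNK;

    assert(done < size);  /* idx must name an existing chunk */
    return (size - done > SK_MAX_CHUNK) ? SK_MAX_CHUNK : size - done;
}
```

under this assumption a 20000-byte buffer becomes three transfers of 8191, 8191 and 3618
bytes.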
===========================================================================================
int sa1100_dma_set_callback( dmach_t channel, dma_callback_t cb );
- a callback function is needed to notify the driver when the DMA completes with a
buffer. The callback prototype is:
void dma_callback( void *buf_id, int size );
buf_id: buffer id as passed into sa1100_dma_queue_buffer()
size: number of bytes the DMA processed (should be same as buffer size)
NOTE: The callback func is called while in interrupt context, so it should be small
and efficient while postponing more complex stuff to a bottom-half function.
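the interplay between queueing and the completion callback can be modelled in userspace.
Everything below is a hypothetical stand-in: the sk_ names are not the SA11x0 API,
sk_complete() plays the role of the DMA interrupt, and the callback only adds up byte
counts, in keeping with the advice to keep it small.

```c
#include <assert.h>
#include <stddef.h>

#define SK_QLEN 8

typedef void (*sk_callback_t)(void *buf_id, int size);

struct sk_buf { void *buf_id; int size; };

struct sk_queue {
    struct sk_buf buf[SK_QLEN];
    int head, count;
    int busy;          /* 1 while a transfer is in flight */
    sk_callback_t cb;
};

int sk_bytes_done;     /* updated by the example callback */

/* Example callback: short, does nothing but bookkeeping. */
void sk_count_bytes(void *buf_id, int size)
{
    (void)buf_id;
    sk_bytes_done += size;
}

void sk_init(struct sk_queue *q, sk_callback_t cb)
{
    q->head = q->count = q->busy = 0;
    q->cb = cb;
}

/* Enqueue a buffer; returns 0, or -1 if the queue is full.
 * If the queue was empty the transfer starts right away. */
int sk_queue_buffer(struct sk_queue *q, void *buf_id, int size)
{
    if (q->count == SK_QLEN)
        return -1;
    q->buf[(q->head + q->count) % SK_QLEN] = (struct sk_buf){ buf_id, size };
    q->count++;
    q->busy = 1;
    return 0;
}

/* Stand-in for the completion interrupt: retire the head buffer,
 * move on to the next one if any, then notify the driver. */
void sk_complete(struct sk_queue *q)
{
    struct sk_buf done;

    assert(q->count > 0);
    done = q->buf[q->head];
    q->head = (q->head + 1) % SK_QLEN;
    q->count--;
    q->busy = q->count > 0;
    if (q->cb)
        q->cb(done.buf_id, done.size);
}
```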
===========================================================================================
int sa1100_dma_get_current( dmach_t channel, void **buf_id,
dma_addr_t *addr );
- returns the buffer ID and DMA addr pointer within the buffer currently being
processed
- this is used for mmap()'ed buffers like in audio drivers
- if no such buffer is being processed, an error code is returned
===========================================================================================
int sa1100_dma_stop( dmach_t channel );
- stops any DMA transfer on the given channel.
===========================================================================================
int sa1100_dma_resume( dmach_t channel );
- resumes a DMA transfer which would have been stopped through sa1100_dma_stop().
===========================================================================================
int sa1100_dma_flush_all( dmach_t channel );
- flushes all queued buffers and DMA transfers on the channel provided.
- the next enqueued buffer after this call will be processed immediately
===========================================================================================
int sa1100_dma_set_spin( dmach_t channel, dma_addr_t addr, int size );
Because there is at least one device out there that uses its receive
signal for its transmit clock reference, we need a mechanism to make the
DMA "spin" on a certain buffer for when there is no more actual buffer to
process. The 'addr' argument is the physical memory address to use, and
the 'size' argument determines the spin DMA chunk. This size can't be
larger than 8191 (if so, it is clamped to 4096). When the size is 0,
the spin function is turned off.
When activated, DMA will "spin" until there is any buffer in the queue.
The current DMA chunk will terminate before a newly queued buffer is
processed. The spin buffer will only be reused when there is no more
actual buffer to process.
It is important not to choose a too small 'size' value since it will
greatly increase the interrupt load required to restart the spin. Since
this feature will typically be used on transmit DMAs, and because a buffer
full of zeros is probably the best thing to spin out, the 'addr' argument
may well be used with FLUSH_BASE_PHYS for which no allocation nor memory
bus request are needed.
The spinning DMA is affected by sa1100_dma_stop() and sa1100_dma_resume()
but not by sa1100_dma_flush_all().
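the size rule for the spin buffer described above (0 turns spinning off, anything over
8191 is clamped to 4096) is easy to get wrong, so here it is as a userspace sketch. The
function name is made up; only the numbers come from the text.

```c
#include <assert.h>

/* Effective spin chunk size for a requested size.
 * 0 means the spin function is turned off. */
int sk_spin_size(int size)
{
    assert(size >= 0);
    if (size > 8191)
        return 4096;  /* too large: clamped, as described above */
    return size;
}
```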
===========================================================================================
void sa1100_free_dma( dmach_t channel );
- clears all activities on a given DMA channel and releases it for future requests.
===========================================================================================