December 22, 2018

Linux Device Drivers -III

Contents

7. Time, Delays, and Deferred Work
8. Allocating Memory
9. Communicating with Hardware
10. Interrupt Handling

Chapter 7: Time, Delays, and Deferred Work

Measuring Time Lapses

Timer interrupts are generated by the system’s timing hardware at regular intervals; this interval is programmed at boot time by the kernel according to the value of HZ.
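HZ is architecture-dependent and, on most platforms, chosen at build time (CONFIG_HZ); early 2.6 kernels on the popular x86 PC used 1000, while the current upstream default is 250.
Every time a timer interrupt occurs, the value of an internal kernel counter, jiffies, is incremented. The counter is initialized at boot (to INITIAL_JIFFIES, a value chosen so that the low 32 bits of the counter wrap around five minutes after boot, which exposes wraparound bugs early), so it represents the number of clock ticks since the last boot plus that fixed offset.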

Using the jiffies Counter


#include <linux/jiffies.h>

unsigned long j, stamp_1, stamp_half, stamp_n;
j = jiffies;                 /* read the current value */
stamp_1 = j + HZ;            /* 1 second in the future */
stamp_half = j + HZ/2;       /* half a second */
stamp_n = j + n * HZ / 1000; /* n milliseconds */
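The n * HZ / 1000 expression above truncates, so a short delay can round down to zero ticks when HZ is small; the kernel offers msecs_to_jiffies() in <linux/jiffies.h> for this conversion. The user-space sketch below, assuming a hypothetical HZ of 250 and using illustrative helper names, contrasts truncation with the rounding-up behaviour msecs_to_jiffies() aims for. It is not the kernel's implementation, which also handles overflow and other HZ values.

```c
#include <assert.h>

/* Assumption: HZ is a compile-time constant, as in the kernel; 250 is only an example. */
#define HZ 250UL
#define MSEC_PER_SEC 1000UL

/* "n * HZ / 1000": truncates, so 1 ms becomes 0 ticks at HZ=250. */
static unsigned long ms_to_jiffies_trunc(unsigned long ms)
{
    return ms * HZ / MSEC_PER_SEC;
}

/* Round up instead, so a requested delay is never shortened. */
static unsigned long ms_to_jiffies_ceil(unsigned long ms)
{
    return (ms * HZ + MSEC_PER_SEC - 1) / MSEC_PER_SEC;
}
```

With HZ at 250, one tick lasts 4 ms, so any request between 1 and 4 ms needs one full tick to avoid returning early.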

To compare jiffies samples correctly, even when the counter wraps around, use these macros (shown here in prototype form):


int time_after(unsigned long a, unsigned long b);
int time_before(unsigned long a, unsigned long b);
int time_after_eq(unsigned long a, unsigned long b);
int time_before_eq(unsigned long a, unsigned long b);
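In the kernel these comparisons are macros that subtract the two values and test the sign of the result interpreted as a signed long, which stays correct across wraparound as long as the two samples are less than half the counter range apart. The sketch below copies that logic into user space (without the kernel's typecheck() guards) under hypothetical my_ names, and shows a naive < comparison failing where the macro succeeds.

```c
#include <assert.h>
#include <limits.h>

/* Same arithmetic as the kernel macros; the typecheck() guards are omitted. */
#define my_time_after(a, b)     ((long)((b) - (a)) < 0)
#define my_time_before(a, b)    my_time_after(b, a)
#define my_time_after_eq(a, b)  ((long)((a) - (b)) >= 0)
#define my_time_before_eq(a, b) my_time_after_eq(b, a)
```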

The kernel exports four helper functions to convert time values expressed as jiffies to and from the struct timeval and struct timespec types used by user-space programs:


#include <linux/time.h>

unsigned long timespec_to_jiffies(struct timespec *value);
void jiffies_to_timespec(unsigned long jiffies, struct timespec *value);
unsigned long timeval_to_jiffies(struct timeval *value);
void jiffies_to_timeval(unsigned long jiffies, struct timeval *value);
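The conversion itself is simple arithmetic on HZ. A rough user-space sketch, assuming a hypothetical HZ of 250 and illustrative my_ function names (the kernel versions differ in detail and, in recent kernels, operate on struct timespec64):

```c
#include <assert.h>
#include <time.h>

#define HZ 250UL                   /* assumed example tick rate */
#define NSEC_PER_SEC 1000000000L

/* Split a tick count into whole seconds plus leftover nanoseconds. */
static struct timespec my_jiffies_to_timespec(unsigned long j)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(j / HZ);
    ts.tv_nsec = (long)((j % HZ) * (NSEC_PER_SEC / HZ));
    return ts;
}

/* Inverse: a partial tick is rounded up so a delay is never shortened. */
static unsigned long my_timespec_to_jiffies(const struct timespec *ts)
{
    unsigned long ns_per_tick = NSEC_PER_SEC / HZ;
    return (unsigned long)ts->tv_sec * HZ +
           ((unsigned long)ts->tv_nsec + ns_per_tick - 1) / ns_per_tick;
}
```

For example, 625 ticks at HZ=250 are 2.5 seconds, and converting that back yields 625 again.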

To read the full 64-bit counter, jiffies_64, use get_jiffies_64(); on 32-bit platforms it takes care of reading both halves consistently:


u64 get_jiffies_64(void);

Processor-Specific Registers

If you need extremely high precision, you can resort to platform-dependent resources such as the TSC (timestamp counter), introduced in x86 processors with the Pentium.

Knowing the Current Time

Looking at jiffies is almost always sufficient when you need to measure time intervals.
Dealing with real-world time is usually best left to user space, where the C library offers better support.
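There is, however, a kernel function that turns a broken-down wall-clock time into a count of seconds since the Epoch (1970-01-01 00:00:00 UTC); recent kernels provide it as mktime64(), which returns a time64_t: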


#include <linux/time.h>

unsigned long mktime (unsigned int year, unsigned int mon,
                          unsigned int day, unsigned int hour,
                          unsigned int min, unsigned int sec);
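The kernel computes this with a compact Gauss-style formula that treats March as the first month, so that the leap day falls at the end of the year. The following user-space transcription of that formula is named my_mktime64 here to avoid confusion with the C library's mktime(), which takes a struct tm in local time; it can be checked against well-known timestamps:

```c
#include <assert.h>

/* Gregorian date/time (UTC) to seconds since 1970-01-01 00:00:00. */
static long long my_mktime64(unsigned int year, unsigned int mon,
                             unsigned int day, unsigned int hour,
                             unsigned int min, unsigned int sec)
{
    /* 1..12 -> 11,12,1..10: March becomes month 1, February (leap day) comes last. */
    if (0 >= (int)(mon -= 2)) {
        mon += 12;
        year -= 1;
    }

    /* Days since the Epoch, then hours, minutes and seconds. */
    return ((((long long)(year / 4 - year / 100 + year / 400 +
                          367 * mon / 12 + day) +
              year * 365 - 719499) * 24 + hour) * 60 + min) * 60 + sec;
}
```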

Delaying Execution

In this chapter, we use the phrase “long delay” to refer to a multiple-jiffy delay.

Long Delays

Occasionally a driver needs to delay execution for relatively long periods—more than one clock tick.

Busy waiting

The easiest implementation is a loop that monitors the jiffy counter:


while ( time_before(jiffies, target_j) )
    cpu_relax( );

The call to cpu_relax() invokes an architecture-specific way of saying that you’re not doing much with the processor at the moment.
This busy loop severely degrades system performance. If you didn’t configure your kernel for preemptive operation, the loop completely locks the processor for the duration of the delay; the scheduler never preempts a process that is running in
kernel space, and the computer looks completely dead until time target_j is reached.
This delay implementation is available in the jit module.
To test the busy-wait code, you can read /proc/jitbusy, which busy-loops for one second for each line it returns.
The suggested command to read /proc/jitbusy is shown below; the count= argument, which sets the number of blocks, is optional:


dd bs=20 count=3 < /proc/jitbusy

Each 20-byte line returned by the file represents the value the jiffy counter had before and after the delay. Each delay is exactly one second (HZ jiffies).

Under Linux, user-space programs have always been preemptible: the kernel interrupts them to switch to other threads, using the regular clock tick. This means that an infinite loop in a user-space program cannot block the system.
If the kernel is not preemptible, however, an infinite loop in kernel code blocks the entire system.
Kernel preemption was therefore introduced in the 2.6 kernels; it is enabled or disabled with the CONFIG_PREEMPT option. With CONFIG_PREEMPT enabled, kernel code can be preempted anywhere except where preemption is disabled, for example while a spinlock is held or while local interrupts are disabled.
If you repeat the command on a preemptible kernel while running a program that forks 50 processes, the individual delays are far longer than one second, because the process is preempted during its delay while other processes are scheduled.
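For comparison, a user-space version of the same busy-wait pattern is sketched below, with CLOCK_MONOTONIC standing in for jiffies; now_ns() and busy_wait_ns() are illustrative names. Because user space is preemptible, such a loop only wastes CPU time rather than freezing the machine, which is the difference described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Monotonic clock in nanoseconds; it stands in for jiffies in this sketch. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin until the deadline passes: the thread never gives up the CPU voluntarily. */
static void busy_wait_ns(long long delay_ns)
{
    long long target = now_ns() + delay_ns;
    while (now_ns() < target)
        ;   /* the kernel version calls cpu_relax() here */
}
```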

Yielding the processor

Busy waiting imposes a heavy load on the system because it monopolizes the CPU. A better way is to explicitly release the CPU when we’re not interested in it.
This is accomplished by calling the schedule( ) function, declared in <linux/sched.h>:


while ( time_before(jiffies, j1) ) {
    schedule( );
}

This loop can be tested by reading /proc/jitsched.
The current process does nothing but release the CPU, but it remains in the run queue. If it is the only runnable process, it actually keeps running: it calls the scheduler, which selects the same process again. Moreover, once a process releases the processor with schedule(), there is no guarantee that it will get the processor back anytime soon. Under load, you can see that the delay associated with each line of the output is extended by a few seconds, because other processes are using the CPU when the timeout expires.

Sleeping with Timeouts

If your driver needs to wait for a condition, but only for a bounded period of time, the better way is to ask the kernel to do the waiting for you:


#include <linux/wait.h>

long wait_event_timeout(wait_queue_head_t q, condition, long timeout);
long wait_event_interruptible_timeout(wait_queue_head_t q, condition, long timeout);

Note that the timeout value represents the number of jiffies to wait.

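If the timeout expires while the condition is still false, the functions return 0; if the condition becomes true before that, they return the remaining delay in jiffies (at least 1). wait_event_interruptible_timeout() can also return -ERESTARTSYS if the sleep is interrupted by a signal.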
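The return-value contract can be illustrated outside the kernel. In this user-space analogy, which is a sketch under stated assumptions and not kernel code, a POSIX condition variable plays the wait queue, the boolean ready plays the condition, and time is measured in milliseconds instead of jiffies; wait_ready_timeout() and waker() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue = PTHREAD_COND_INITIALIZER;  /* plays the wait queue */
static bool ready;                                        /* plays the condition */

/* 0 if the timeout expired with the condition still false,
 * otherwise the milliseconds left (at least 1). */
static long wait_ready_timeout(long timeout_ms)
{
    struct timespec deadline, now;
    int rc = 0;
    bool ok;

    clock_gettime(CLOCK_REALTIME, &deadline);   /* condvars use CLOCK_REALTIME by default */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    while (!ready && rc != ETIMEDOUT)   /* re-check after every wakeup, like wait_event */
        rc = pthread_cond_timedwait(&queue, &lock, &deadline);
    ok = ready;
    pthread_mutex_unlock(&lock);

    if (!ok)
        return 0;
    clock_gettime(CLOCK_REALTIME, &now);
    long long left = (long long)(deadline.tv_sec - now.tv_sec) * 1000 +
                     (deadline.tv_nsec - now.tv_nsec) / 1000000;
    return left > 0 ? (long)left : 1;
}

/* Plays the role of the code that sets the condition and calls wake_up(). */
static void *waker(void *arg)
{
    struct timespec d = { 0, 10 * 1000000L };   /* 10 ms */
    (void)arg;
    nanosleep(&d, NULL);
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_broadcast(&queue);
    pthread_mutex_unlock(&lock);
    return NULL;
}
```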

To simply sleep, with no event to wait for, use 0 as the condition.
Reading /proc/jitqueue, which delays this way, shows that the result is near optimal.

In that case the wait queue head is never really used, so the kernel offers schedule_timeout(), which does the same job without one:


#include <linux/sched.h>

signed long schedule_timeout(signed long timeout);

It puts the current task to sleep until timeout jiffies have elapsed.
The return value is 0, unless the function returns before the given timeout has elapsed (in response to a signal), in which case it returns the number of jiffies remaining. The caller must first set the current process state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, so that the scheduler won’t run the current process again until the timeout expires:


set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(delay);

Short Delays

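When a device driver needs to deal with latencies in its hardware, the delays involved are usually a few dozen microseconds at most, far shorter than a clock tick, so relying on the tick is not an option.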
The kernel provides functions for short delays:


#include <linux/delay.h>
void ndelay(unsigned long nsecs);
void udelay(unsigned long usecs);
void mdelay(unsigned long msecs);

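The actual implementations of these functions are architecture-specific, in <asm/delay.h>, and are sometimes built on an external function. Every architecture implements udelay, but the other functions may or may not be defined; if they are not, <linux/delay.h> offers a default version based on udelay.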
It’s important to remember that the three short delay functions are busy-waiting; other tasks can’t be run during the time lapse.

There is another way of achieving millisecond (and longer) delays that does not involve busy waiting:


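void msleep(unsigned int millisecs);
void ssleep(unsigned int seconds);

unsigned long msleep_interruptible(unsigned int millisecs);

msleep and ssleep are not interruptible. msleep_interruptible normally returns 0, but if the process is woken early by a signal it returns the number of milliseconds remaining in the requested sleep.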

Interrupts

An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself.
Interrupts can be grouped into two categories based on the source of the interrupt:

synchronous interrupts, also called exceptions
    processor detected: faults, traps, aborts
    programmed: int n

asynchronous interrupts, caused by external events
    maskable: can be ignored; signalled via the INT pin
    non-maskable: cannot be ignored; signalled via the NMI pin

IRET

Interrupt context

Code that runs in interrupt context:

runs as a result of an IRQ (not of an exception)
has no well-defined process context associated with it
is not allowed to trigger a context switch (no sleep, schedule, or user memory access)

Deferrable actions are used to run callback functions at a later time. If a deferrable action is scheduled from an interrupt handler, the associated callback function runs after the interrupt handler has completed.
There are two large categories of deferrable actions:

those that run in interrupt context
those that run in process context

Deferring work this way avoids doing too much work in the interrupt handler itself.

In Linux there are 3 types of deferrable actions:

softIRQ

runs in interrupt context
statically allocated
same handler may run in parallel on multiple cores

tasklet

runs in interrupt context
can be dynamically allocated
same handler runs are serialized

workqueues

run in process context

Softirqs

The softirq implementation is built on a per-processor kernel thread, ksoftirqd. You can list these threads with systemd-cgls:


$ systemd-cgls -k | grep ksoft
├─   9 [ksoftirqd/0]
├─  18 [ksoftirqd/1]
├─  24 [ksoftirqd/2]
├─  30 [ksoftirqd/3]

The ksoftirqd threads are started by the spawn_ksoftirqd() function, which is registered as an early initcall:


early_initcall(spawn_ksoftirqd);

Softirqs are determined statically, at compile time of the Linux kernel. They are stored in the softirq_vec array, which holds NR_SOFTIRQS entries of type struct softirq_action, one for each softirq type:


static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

enum
{
        HI_SOFTIRQ=0,
        TIMER_SOFTIRQ,
        // two for networking
        NET_TX_SOFTIRQ,
        NET_RX_SOFTIRQ,
        // two for the block layer
        BLOCK_SOFTIRQ,
        BLOCK_IOPOLL_SOFTIRQ,
        TASKLET_SOFTIRQ,
        SCHED_SOFTIRQ,
        HRTIMER_SOFTIRQ,
        RCU_SOFTIRQ,
        NR_SOFTIRQS
};

const char * const softirq_to_name[NR_SOFTIRQS] = {
        "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
        "TASKLET", "SCHED", "HRTIMER", "RCU"
};

struct softirq_action
{
         void    (*action)(struct softirq_action *);
};

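struct softirq_action contains a single field, action, which points to the softirq handler. Per-CPU counters for each softirq type can be read from /proc/softirqs: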


$ cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
          HI:      65208          6         13      60996          0          0          0          0
       TIMER:     899907     837399     927077     865850          0          0          0          0
      NET_TX:          2          3         65          7          0          0          0          0
      NET_RX:       1102       1174        914       6913          0          0          0          0
       BLOCK:      29141        344       8994      58669          0          0          0          0
    IRQ_POLL:          0          0          0          0          0          0          0          0
     TASKLET:        242         26       4900        166          0          0          0          0
       SCHED:    1002338     923765     999987     918435          0          0          0          0
     HRTIMER:          0          0        539          0          0          0          0          0
         RCU:     470947     468360     483743     481231          0          0          0          0

Each processor has its own ksoftirqd thread.
Each ksoftirqd thread can handle any of the NR_SOFTIRQS (here, 10) softirq types.
The kernel checks periodically for pending deferred interrupts.

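The ksoftirqd thread checks for pending deferred interrupts and calls the __do_softirq() function depending on the result of the check.
Pending softirqs are also processed when a hardware interrupt handler finishes: on x86, do_IRQ() calls exiting_irq(), which calls irq_exit(), which calls invoke_softirq() if softirqs are pending and we are not nested inside another interrupt: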


if (!in_interrupt() && local_softirq_pending())
    invoke_softirq();
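The following user-space model, a simplification and not kernel code, shows how the pieces above fit together: softirq_vec maps numbers to handlers, raising a softirq sets a bit in a pending mask, and a processing loop modeled loosely on __do_softirq() clears the mask and runs handlers from the lowest number up. open_softirq() and raise_softirq() mirror kernel function names but are reimplemented here; do_softirq(), record() and the four-entry enum are illustrative assumptions. The real loop also re-checks for newly raised softirqs a bounded number of times and hands any remaining work to ksoftirqd.

```c
#include <assert.h>
#include <string.h>

/* A subset of the softirq numbers; lower numbers are processed first. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ, NR_SOFTIRQS };

struct softirq_action {
    void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static unsigned int pending;              /* one bit per softirq (per CPU in the kernel) */
static char trace[NR_SOFTIRQS * 4 + 1];   /* handler numbers, in the order they ran */
static int ntrace;

/* Register a handler, like open_softirq(). */
static void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    softirq_vec[nr].action = action;
}

/* Mark a softirq pending; raising it twice before it runs has no extra effect. */
static void raise_softirq(int nr)
{
    pending |= 1u << nr;
}

/* Snapshot and clear the pending mask, then run each set bit from the lowest up. */
static void do_softirq(void)
{
    unsigned int mask = pending;

    pending = 0;
    while (mask) {
        int nr = __builtin_ctz(mask);   /* GCC/Clang builtin: index of the lowest set bit */
        mask &= mask - 1;               /* clear that bit */
        softirq_vec[nr].action(&softirq_vec[nr]);
    }
}

/* Example handler: records which vector entry it was called for. */
static void record(struct softirq_action *a)
{
    trace[ntrace++] = (char)('0' + (int)(a - softirq_vec));
    trace[ntrace] = '\0';
}
```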

Tasklets

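Tasklets are implemented on top of softirqs: they are run by the TASKLET_SOFTIRQ (normal priority) and HI_SOFTIRQ (high priority) softirqs. Tasklets have the following properties:

They always run at interrupt time.
They always run on the same CPU that schedules them.
They receive an unsigned long argument.

Unlike kernel timers, you can’t ask to execute the function at a specific time; by scheduling a tasklet, you simply ask for it to be executed at a later time chosen by the kernel.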

A tasklet can be disabled and re-enabled later; it won’t be executed until it is enabled as many times as it has been disabled.
Just like timers, a tasklet can reregister itself.
A tasklet can be scheduled to execute at normal priority or high priority.
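The nesting rule for disabling comes from the count field of struct tasklet_struct, which must be zero for the tasklet to run (DECLARE_TASKLET_DISABLED simply starts it at 1). The toy model below, with hypothetical toy_ names and no locking or atomics, shows that counting behaviour, and also that scheduling an already-scheduled tasklet does not make it run twice:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy tasklet: only the fields needed to model disable/enable nesting. */
struct toy_tasklet {
    int count;        /* 0 = enabled, >0 = disabled that many times (cf. atomic_t count) */
    bool scheduled;   /* cf. the "scheduled" bit in the state field */
    int runs;
};

static void toy_disable(struct toy_tasklet *t)  { t->count++; }
static void toy_enable(struct toy_tasklet *t)   { t->count--; }
static void toy_schedule(struct toy_tasklet *t) { t->scheduled = true; }

/* One softirq pass: run the tasklet only if it is scheduled and enabled;
 * a disabled tasklet stays scheduled for a later pass. */
static void toy_softirq_pass(struct toy_tasklet *t)
{
    if (t->scheduled && t->count == 0) {
        t->scheduled = false;
        t->runs++;
    }
}
```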

initialization


#include <linux/interrupt.h>
struct tasklet_struct {
 struct tasklet_struct *next;
 unsigned long state;
 atomic_t count;
 void (*func)(unsigned long);
 unsigned long data;
};

void tasklet_init(struct tasklet_struct *t,
    void (*func)(unsigned long), unsigned long data);

#define DECLARE_TASKLET(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }

#define DECLARE_TASKLET_DISABLED(name, func, data) \
struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }

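activation

void tasklet_enable(struct tasklet_struct *t);
void tasklet_disable(struct tasklet_struct *t);

tasklet_enable re-enables a tasklet that was previously disabled. tasklet_disable disables the tasklet: it may still be scheduled with tasklet_schedule, but its execution is deferred until it has been enabled again. If the tasklet is currently running, tasklet_disable busy-waits until it exits.

void tasklet_disable_nosync(struct tasklet_struct *t);
void tasklet_schedule(struct tasklet_struct *t);
void tasklet_hi_schedule(struct tasklet_struct *t);

tasklet_disable_nosync disables the tasklet without waiting for a running instance to finish. tasklet_schedule schedules the tasklet for execution; tasklet_hi_schedule schedules it with higher priority.

stop

void tasklet_kill(struct tasklet_struct *t);

tasklet_kill ensures that the tasklet is not scheduled to run again; it is usually called when a device is being closed or the module is removed.

void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu);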

Work Queues

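There are many cases where an asynchronous process execution context is needed, and work queues are the most commonly used mechanism for such cases. Functions deferred through work queues run in process context, executed by kernel threads.
When such an asynchronous execution context is needed, a work item describing which function to execute is put on a queue. An independent thread serves as the asynchronous execution context. The queue is called the work queue and the thread is called the worker.
While there are work items on the workqueue, the worker executes the functions associated with them one after the other.
When there is no work item left on the workqueue the worker becomes idle.
When a new work item gets queued, the worker begins executing again.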
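To make the worker and work-item vocabulary concrete, here is a small user-space analogy built on POSIX threads rather than the kernel API: one worker thread sleeps while the queue is empty and runs queued items one after the other when woken. struct work, queue_work_item(), worker(), stop_worker() and add_value() are hypothetical names for this sketch only; in a driver you would use struct work_struct with INIT_WORK() and queue_work() or schedule_work().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* A toy work item: function pointer plus queue link (cf. struct work_struct). */
struct work {
    void (*func)(struct work *);
    struct work *next;
    int value;                    /* payload used only by this example */
};

static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_cond = PTHREAD_COND_INITIALIZER;
static struct work *head, *tail;  /* FIFO of pending work items */
static bool stopping;

static void queue_work_item(struct work *w)
{
    pthread_mutex_lock(&wq_lock);
    w->next = NULL;
    if (tail) tail->next = w; else head = w;
    tail = w;
    pthread_cond_signal(&wq_cond);            /* wake the idle worker */
    pthread_mutex_unlock(&wq_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&wq_lock);
    for (;;) {
        while (!head && !stopping)
            pthread_cond_wait(&wq_cond, &wq_lock);   /* idle: nothing queued */
        if (!head)                                   /* stopping and queue drained */
            break;
        struct work *w = head;
        head = w->next;
        if (!head) tail = NULL;
        pthread_mutex_unlock(&wq_lock);
        w->func(w);                    /* thread context: this function may sleep */
        pthread_mutex_lock(&wq_lock);
    }
    pthread_mutex_unlock(&wq_lock);
    return NULL;
}

static void stop_worker(pthread_t t)
{
    pthread_mutex_lock(&wq_lock);
    stopping = true;
    pthread_cond_signal(&wq_cond);
    pthread_mutex_unlock(&wq_lock);
    pthread_join(t, NULL);             /* the worker drains the queue first */
}

static int sum;   /* written only by the worker until it is joined */

static void add_value(struct work *w)
{
    sum += w->value;
}
```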

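A work item is a simple structure that holds a pointer to the function to be executed asynchronously. The functions are executed by worker threads, which are kernel threads.
The main difference from softirqs and tasklets is that deferrable functions run in interrupt context, and must therefore be atomic, while functions in work queues run in process context.
Running in process context is what allows a work item to sleep, for example while waiting for I/O.
The main data structure associated with a work queue is a descriptor called struct workqueue_struct: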


#include <linux/workqueue.h>

struct workqueue_struct {
...
 char   name[WQ_NAME_LEN]; /* I: workqueue name */
...
};

The kernel also predefines several system-wide workqueues that drivers can use directly instead of allocating their own:

EXPORT_SYMBOL(system_wq);
EXPORT_SYMBOL_GPL(system_highpri_wq);
EXPORT_SYMBOL_GPL(system_long_wq);
EXPORT_SYMBOL_GPL(system_unbound_wq);
EXPORT_SYMBOL_GPL(system_freezable_wq);
EXPORT_SYMBOL_GPL(system_power_efficient_wq);
EXPORT_SYMBOL_GPL(system_freezable_power_efficient_wq);

Workqueue APIs

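alloc_workqueue() allocates a workqueue:


struct workqueue_struct *alloc_workqueue(const char *fmt,
      unsigned int flags,
      int max_active, ...);

fmt: printf-style format string for the name of the workqueue
flags: WQ_* flags
max_active: maximum number of in-flight work items (0 selects the default)
args...: arguments for fmt

It returns a pointer to the allocated workqueue on success, or NULL on failure.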

Kernel Timers

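Kernel timers are used to schedule an action, the execution of a function, at a particular time in the future. Expired timers are run from the TIMER_SOFTIRQ softirq, so timer functions must be atomic: they run outside of process context, and the following rules apply: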

Access to user space is NOT allowed.
The current pointer is not meaningful in atomic mode and cannot be used since the relevant code has no connection with the process that has been interrupted.
No sleeping or scheduling may be performed.

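Atomic code may not call schedule() or a form of wait_event(), nor may it call any other function that could sleep; for example, calling kmalloc(..., GFP_KERNEL) is against the rules.
Kernel code can tell whether it is running in interrupt context by calling in_interrupt(), which returns nonzero if the processor is currently running in interrupt context, either a hardware interrupt or a software interrupt. The related helpers are documented in the kernel source as follows: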


/*
 * Are we doing bottom half or hardware interrupt processing?
 *
 * in_irq()       - We're in (hard) IRQ context
 * in_softirq()   - We have BH disabled, or are processing softirqs
 * in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
 * in_serving_softirq() - We're in softirq context
 * in_nmi()       - We're in NMI context
 * in_task()	  - We're in task context
 *
 * Note: due to the BH disabled confusion: in_softirq(),in_interrupt() really
 *       should not be used in new code.
 */
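The "BH disabled confusion" mentioned in that comment comes from how these helpers are computed: they test bit fields of the per-task preempt_count value. The sketch below models that in user space. The field widths are assumed from recent kernels (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits) and may differ in older ones, and the model_ functions are illustrative names. It shows why in_softirq() and in_interrupt() are true after local_bh_disable() in ordinary task context, while in_serving_softirq() is not.

```c
#include <assert.h>

/* Assumed preempt_count layout (recent kernels). */
#define PREEMPT_BITS  8
#define SOFTIRQ_BITS  8
#define HARDIRQ_BITS  4
#define NMI_BITS      4

#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)   /* 8 */
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)   /* 16 */
#define NMI_SHIFT     (HARDIRQ_SHIFT + HARDIRQ_BITS)   /* 20 */

#define MASK(bits, shift) (((1u << (bits)) - 1) << (shift))
#define SOFTIRQ_MASK  MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK  MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK      MASK(NMI_BITS, NMI_SHIFT)

#define SOFTIRQ_OFFSET         (1u << SOFTIRQ_SHIFT)   /* added while serving a softirq */
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)    /* added by local_bh_disable() */
#define HARDIRQ_OFFSET         (1u << HARDIRQ_SHIFT)   /* added on hard IRQ entry */

static int model_in_irq(unsigned pc)             { return (pc & HARDIRQ_MASK) != 0; }
static int model_in_softirq(unsigned pc)         { return (pc & SOFTIRQ_MASK) != 0; }
static int model_in_serving_softirq(unsigned pc) { return (pc & SOFTIRQ_OFFSET) != 0; }
static int model_in_interrupt(unsigned pc)
{
    return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}
```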

The Timer API

A timer is represented by struct timer_list:


struct timer_list {
 struct hlist_node	entry;
 unsigned long  expires;
 void   (*function)(struct timer_list *);
 u32  flags;
#ifdef CONFIG_LOCKDEP
 struct lockdep_map	lockdep_map;
#endif
};

The callback receives only a pointer to the timer, so it uses from_timer() to recover the structure in which the timer is embedded:


#define from_timer(var, callback_timer, timer_fieldname) \
 container_of(callback_timer, typeof(*var), timer_fieldname)
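from_timer() is a thin wrapper around container_of(), which subtracts the offset of the embedded member from the member's address. The user-space sketch below reproduces both macros, relying on the GCC/Clang __typeof__ extension as the kernel does, with a stand-in struct timer_list and a hypothetical struct my_dev:

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalents of the kernel macros. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define from_timer(var, callback_timer, timer_fieldname) \
    container_of(callback_timer, __typeof__(*var), timer_fieldname)

struct timer_list { unsigned long expires; };   /* stand-in, not the kernel struct */

struct my_dev {
    int id;
    struct timer_list timer;                    /* embedded timer, as in struct jit_data */
};

/* What a timer callback does: recover the owner from the timer pointer alone. */
static int timer_owner_id(struct timer_list *t)
{
    struct my_dev *dev = from_timer(dev, t, timer);
    return dev->id;
}
```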

A timer is initialized either statically with DEFINE_TIMER or at run time with timer_setup():


#define DEFINE_TIMER(_name, _function)				\
	struct timer_list _name =				\
		__TIMER_INITIALIZER(_function, 0)

#define timer_setup(timer, callback, flags)			\
	__init_timer((timer), (callback), (flags))

To start a timer, set its expires field and call add_timer():


void add_timer(struct timer_list * timer);

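To deactivate a timer before it expires, call del_timer(); del_timer_sync() additionally waits until a callback that is currently running on another CPU has finished: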


int del_timer(struct timer_list * timer);
int del_timer_sync(struct timer_list *timer);

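The expires field is the jiffies value at which the timer is expected to run; at that time, function is called. In older kernels, function took an unsigned long argument (stored in the timer's data field) instead of a pointer to the timer.

The example timer used to generate the /proc/jitimer data runs every 10 jiffies by default (the tdelay module parameter).

The code below defines the timer function and the data used by the timer: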


/* This data structure is used as "data" for the timer and tasklet functions. */
struct jit_data {
 struct timer_list timer;
...
 wait_queue_head_t wait;
 unsigned long prevjiffies;
 unsigned char *buf;
 int loops;
};

void jit_timer_fn(struct timer_list *t)
{
 /* recover the jit_data structure that embeds this timer */
 struct jit_data *data = from_timer(data, t, timer);
 unsigned long j = jiffies;
 data->buf += sprintf(data->buf, "%9li  %3li     %i    %6i   %i   %s\n",
        j, j - data->prevjiffies, in_interrupt() ? 1 : 0,
        current->pid, smp_processor_id(), current->comm);

 if (--data->loops) {
  /* a timer may re-register itself from its own callback */
  data->timer.expires += tdelay;
  data->prevjiffies = j;
  add_timer(&data->timer);
 } else {
  wake_up_interruptible(&data->wait);
 }
}


int jit_timer(char *buf, char **start, off_t offset,
       int len, int *eof, void *unused_data)
{
 struct jit_data *data;
 char *buf2 = buf;
 unsigned long j = jiffies;

 data = kmalloc(sizeof(*data), GFP_KERNEL);
 if (!data)
  return -ENOMEM;

 /* new-style setup: the callback receives the timer itself */
 timer_setup(&data->timer, jit_timer_fn, 0);
 init_waitqueue_head(&data->wait);

 /* Write the first lines in the buffer. */
 buf2 += sprintf(buf2, "   time   delta  inirq    pid   cpu command\n");
 buf2 += sprintf(buf2, "%9li  %3li     %i    %6i   %i   %s\n",
   j, 0L, in_interrupt() ? 1 : 0,
   current->pid, smp_processor_id(), current->comm);

 /* fill the data for our timer function */
 data->prevjiffies = j;
 data->buf = buf2;
 data->loops = JIT_ASYNC_LOOPS;

 /* register the timer */
 data->timer.expires = j + tdelay; /* parameter */
 add_timer(&data->timer);
 /* wait for the buffer to fill */
 wait_event_interruptible(data->wait, !data->loops);
 /* make sure the callback is neither pending nor running before freeing data */
 del_timer_sync(&data->timer);

 if (signal_pending(current)) {
  kfree(data);
  return -ERESTARTSYS;
 }
 buf2 = data->buf;
 kfree(data);
 *eof = 1;
 return buf2 - buf;
}

remove timers

The Implementation of Kernel Timers

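Whenever kernel code registers a timer (via add_timer() or mod_timer()), the operation is eventually performed by internal_add_timer(), which adds the timer to the kernel's lists of pending timers. When the timer softirq runs, __run_timers() executes all the pending timers that have expired. Kernel timers are far from perfect, however: they suffer from jitter and other artifacts induced by: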

hardware interrupts
other timers
asynchronous tasks

Chapter 8: Allocating Memory

Memory mapping

Overview

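In the Linux kernel it is possible to map a kernel address space to a user address space. This eliminates the overhead of copying data between user space and kernel space. A device driver does this by implementing the mmap() operation in its struct file_operations; user space then calls the mmap() system call on the device file.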

The basic unit for virtual memory management is a page, whose size is usually 4 KB but can be up to 64 KB on some platforms.
Whenever we work with virtual memory we work with two types of addresses:

virtual address
physical address

The CPU, including when it runs kernel code, accesses memory through virtual addresses, which the MMU translates into physical addresses with the help of page tables.

A physical page of memory is identified by its Page Frame Number (PFN).
The PFN can easily be computed from the physical address by dividing it by the page size (or, equivalently, by shifting the physical address PAGE_SHIFT bits to the right).
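As a worked example of that arithmetic, assuming 4 KiB pages (PAGE_SHIFT = 12), the sketch below splits a physical address into its PFN and its offset within the page. The helper names are illustrative; the kernel itself provides macros such as PHYS_PFN() and PFN_PHYS() in <linux/pfn.h>.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* assumes 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static uint64_t phys_to_pfn(uint64_t phys)       { return phys >> PAGE_SHIFT; }
static uint64_t pfn_to_phys(uint64_t pfn)        { return pfn << PAGE_SHIFT; }
static uint64_t offset_in_page_of(uint64_t phys) { return phys & (PAGE_SIZE - 1); }
```

For 0x12345678, the PFN is 0x12345 and the offset within the page is 0x678.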

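The virtual address space is split into user space and kernel space. Kernel space contains a memory-mapped zone, called lowmem, which is contiguously mapped in physical memory, starting from the lowest possible physical address (usually 0). The virtual address at which lowmem starts is defined by PAGE_OFFSET. On 32-bit systems not all physical memory fits in lowmem, so kernel space also has a separate zone, called highmem, that can be used to map physical memory arbitrarily.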

Memory allocated by kmalloc() resides in lowmem and it is physically contiguous.
Memory allocated by vmalloc() is not contiguous and does not reside in lowmem (it has a dedicated zone in highmem).

Chapter 9: Communicating with Hardware

I/O Ports and I/O Memory

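Some CPU manufacturers implement a single address space for memory and devices, while others decided that peripheral devices are different from memory and, therefore, deserve a separate address space (I/O ports).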

I/O Registers and Conventional Memory

Optimization and Memory Barriers

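An optimization barrier primitive ensures that the assembly language instructions corresponding to C statements placed before the primitive are not mixed by the compiler with those corresponding to C statements placed after the primitive. In Linux, the barrier() macro is an optimization barrier; it expands into:


  asm volatile("" ::: "memory")

The asm instruction tells the compiler to insert an assembly language fragment (empty, in this case).
The volatile keyword forbids the compiler to reshuffle the asm instruction with the other instructions of the program.
The memory keyword forces the compiler to assume that all memory locations in RAM have been changed by the assembly language instruction, so it cannot reuse values cached in registers across the barrier.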

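A memory barrier primitive ensures that the operations placed before the primitive are finished before the operations placed after the primitive are started. Linux provides:

mb()        full barrier for reads and writes
rmb()       read barrier
wmb()       write barrier
smp_mb()    like mb(), but only on SMP systems (a compiler barrier otherwise)
smp_rmb()   like rmb(), SMP only
smp_wmb()   like wmb(), SMP only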
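The classic use of smp_wmb() and smp_rmb() is publishing data through a flag: the writer orders the data store before the flag store, and the reader orders the flag load before the data load. The following user-space analogy uses C11 fences instead of the kernel macros, with a release fence playing smp_wmb() and an acquire fence playing smp_rmb(). The C11 fences are somewhat stronger, so this is an approximation, and producer()/consumer() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data published by the producer */
static atomic_int ready;   /* flag, accessed with relaxed atomics plus fences */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_thread_fence(memory_order_release);              /* ~ smp_wmb() */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static int consumer(void)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                                  /* spin until published */
    atomic_thread_fence(memory_order_acquire);              /* ~ smp_rmb() */
    return payload;
}
```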

Using I/O Ports

Using I/O Memory

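Despite the popularity of I/O ports in the x86 world, the main mechanism used to communicate with devices is through memory-mapped registers and device memory. Both are called I/O memory, because the difference between registers and memory is transparent to software.
Depending on the platform and bus, I/O memory may or may not be accessed through page tables. When access passes through page tables, the kernel must first make the physical address visible to the driver, which usually means calling ioremap before doing any I/O.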

I/O Memory Allocation and Mapping


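I/O memory regions must be allocated before use with request_mem_region(), which returns NULL on failure, and released with release_mem_region() when no longer needed:


#include <linux/ioport.h>

struct resource *request_mem_region(unsigned long start, unsigned long len, char *name);
void release_mem_region(unsigned long start, unsigned long len);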

Allocating a region does not make it accessible: ioremap() must also be used to assign virtual addresses to I/O memory regions:


#include <asm/io.h>

void __iomem *ioremap(unsigned long phys_addr, unsigned long size);
void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size);
void iounmap(void __iomem *addr);

Accessing I/O Memory


unsigned int ioread8(void *addr);
unsigned int ioread16(void *addr);
unsigned int ioread32(void *addr);
void iowrite8(u8 value, void *addr);
void iowrite16(u16 value, void *addr);
void iowrite32(u32 value, void *addr);

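Here, addr should be an address obtained from ioremap() (perhaps with an integer offset). To read or write a series of values at a given I/O memory address, use the repeating versions: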


void ioread8_rep(void *addr, void *buf, unsigned long count);
void ioread16_rep(void *addr, void *buf, unsigned long count);
void ioread32_rep(void *addr, void *buf, unsigned long count);
void iowrite8_rep(void *addr, const void *buf, unsigned long count);
void iowrite16_rep(void *addr, const void *buf, unsigned long count);
void iowrite32_rep(void *addr, const void *buf, unsigned long count);

To operate on a whole block of I/O memory, use the I/O versions of memset() and memcpy():


void memset_io(void *addr, u8 value, unsigned int count);
void memcpy_fromio(void *dest, void *source, unsigned int count);
void memcpy_toio(void *dest, void *source, unsigned int count);

Ports as I/O Memory

Chapter 10: Interrupt Handling

The irq_domain interrupt number mapping library

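The irq_domain library maps hardware IRQ numbers, which are local to each interrupt controller, to Linux IRQ numbers. On some architectures several interrupt controllers are involved in delivering an interrupt from the device to the CPU; a typical path on x86 is shown below. An irq_domain is built for each of the three controllers, and the domains are organized into a hierarchy (a stacked irq_chip):

+--------------------------------------------------------------------------+
| Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU |
|           domain#1              domain#2                domain#3         |
+--------------------------------------------------------------------------+
                              stacked irq_chip
                                    /|\
                                     |
                                    \|/
                             Linux IRQ numbers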

Linux generic IRQ handling

Linux Inside

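An interrupt is an event raised by software or hardware when it needs the CPU's attention. Hardware devices signal interrupts through a Programmable Interrupt Controller; on modern x86 systems this is the APIC, which consists of two separate parts:

Local APIC
I/O APIC

For each interrupt, the kernel runs an interrupt handler. Because interrupts can arrive at any time, handlers must be written with concurrency in mind. The addresses of the handlers are kept in a special table, the Interrupt Descriptor Table (IDT), which has 256 entries on x86, the first 32 of which are reserved for exceptions: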


#define IDT_ENTRIES   256
#define NUM_EXCEPTION_VECTORS  32


static void
idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sys)
{
 gate_desc desc;

 for (; size > 0; t++, size--) {
  idt_init_desc(&desc, t);
  write_idt_entry(idt, t->vector, &desc);
  if (sys)
   set_bit(t->vector, system_vectors);
 }
}

static void set_intr_gate(unsigned int n, const void *addr)
{
 struct idt_data data;

 BUG_ON(n > 0xFF);

 memset(&data, 0, sizeof(data));
 data.vector = n;
 data.addr = addr;
 data.segment = __KERNEL_CS;
 data.bits.type = GATE_INTERRUPT;
 data.bits.p = 1;

 idt_setup_from_table(idt_table, &data, 1, false);
}

void __init idt_setup_early_handler(void)
{
 int i;

 for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
  set_intr_gate(i, early_idt_handler_array[i]);
#ifdef CONFIG_X86_32
 for ( ; i < NR_VECTORS; i++)
  set_intr_gate(i, early_ignore_irq);
#endif
 load_idt(&idt_descr);
}

load_idt() loads the IDT register using the lidt instruction:


    asm volatile("lidt %0"::"m" (idt_descr));

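Each IDT entry is selected by a vector number, which the processor uses to locate the handler. Vectors are one byte wide, which is why set_intr_gate() checks: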


  BUG_ON( (unsigned)n > 0xFF );

The first 32 vector numbers from 0 to 31 are reserved by the processor
Vector numbers from 32 to 255 are used for user-defined interrupts.

Maskable interrupts can be disabled on the local CPU: cli clears the interrupt flag, and sti sets it again:


asm volatile("cli": : :"memory");

asm volatile("sti": : :"memory");

Non-maskable interrupts (NMIs) are delivered even while the interrupt flag is cleared.

Linux Device Driver Tutorial

Interrupts in Linux Kernel

Upon receiving an interrupt, the interrupt controller sends a signal to the processor.
The processor detects this signal and interrupts its current execution to handle the interrupt.
The processor can then notify the operating system that an interrupt has occurred, and the operating system can handle the interrupt appropriately.

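Interrupts are either asynchronous interrupts generated by hardware, or synchronous interrupts generated by the processor itself (exceptions).
System calls have traditionally been implemented on x86 by issuing a software interrupt, which traps into the kernel and causes the execution of a special system call handler.
The function the kernel runs in response to a specific interrupt is called an interrupt handler or interrupt service routine (ISR).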

Each device that generates interrupts has an associated interrupt handler.
The interrupt handler for a device is part of the device’s driver (the kernel code that manages the device).

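Interrupt handlers run in interrupt context, as opposed to process context, in which the kernel executes on behalf of a process. Code running in interrupt context must not: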

Go to sleep or relinquish the processor
Acquire a mutex
Perform time-consuming tasks
Access user space virtual memory

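For this reason, interrupt processing is split into two parts:

Top halves
Bottom halves

Bottom halves can be implemented with:

Softirqs
Tasklets
Workqueues
Threaded IRQs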

Interrupts Example Program in Linux Kernel

Interrupt handlers cannot sleep, so they must avoid calling functions that may sleep.
When an interrupt handler needs to enter a critical section, use a spinlock rather than a mutex, because a task that cannot take a mutex goes to sleep until it can.
Interrupt handlers cannot exchange data with user space.
Interrupt handlers must execute as quickly as possible. To ensure this, it is best to split the implementation into two parts, the top half and the bottom half. The top half does the urgent work as quickly as possible, and the remaining work is deferred to the bottom half, which can be implemented with softirqs, tasklets or workqueues.
Interrupt handlers are not reentrant: while a handler is executing, the kernel does not invoke it again for the same IRQ until it is done.
Interrupt handlers can be interrupted by higher-priority handlers. In older kernels you could avoid this by marking a handler as a fast handler (IRQF_DISABLED), which runs with local interrupts disabled; however, marking too many handlers as fast degrades system performance, because interrupt latency grows. Since Linux 2.6.35, all handlers run with local interrupts disabled.

request_irq( unsigned int irq, irq_handler_t handler, unsigned long flags, const char *name, void *dev_id)
free_irq(unsigned int irq, void *dev_id)
enable_irq(unsigned int irq)
disable_irq(unsigned int irq)
disable_irq_nosync(unsigned int irq)
in_irq()
in_interrupt()

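Commonly used values for the flags argument of request_irq():

IRQF_DISABLED: run the handler with local interrupts disabled (removed in Linux 4.1, where this is always the case)
IRQF_SAMPLE_RANDOM: use the interrupt timing as an entropy source (removed in Linux 3.6)
IRQF_SHARED: allow the line to be shared among several devices; dev_id must then be unique, since free_irq() uses it to find the handler to remove
IRQF_TIMER: mark the interrupt as a timer interrupt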
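When IRQF_SHARED is used, the kernel calls every handler registered on the line, and each one must check whether its own device raised the interrupt, returning IRQ_NONE if not. The following user-space model illustrates that protocol; types and names such as struct toy_dev, struct action and dispatch_shared_irq() are hypothetical stand-ins for the kernel's per-line chain of irqaction entries.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* A toy device with a status bit meaning "I raised the interrupt". */
struct toy_dev {
    bool irq_pending;
    int handled;
};

/* Each handler checks its own device and declines interrupts it did not raise. */
static irqreturn_t toy_handler(int irq, void *dev_id)
{
    struct toy_dev *dev = dev_id;
    (void)irq;
    if (!dev->irq_pending)
        return IRQ_NONE;
    dev->irq_pending = false;   /* acknowledge the device */
    dev->handled++;
    return IRQ_HANDLED;
}

/* One registered (handler, dev_id) pair on the shared line. */
struct action {
    irq_handler_t handler;
    void *dev_id;
};

/* Call every handler on the line; return how many claimed the interrupt. */
static int dispatch_shared_irq(int irq, struct action *actions, int n)
{
    int handled = 0;
    for (int i = 0; i < n; i++)
        if (actions[i].handler(irq, actions[i].dev_id) == IRQ_HANDLED)
            handled++;
    return handled;
}
```

If no handler claims an interrupt, the kernel treats it as spurious; that is why returning IRQ_HANDLED unconditionally on a shared line is a bug.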

Software can raise an interrupt with the int instruction.
In Linux, the IRQ-to-vector layout is defined in arch/x86/include/asm/irq_vectors.h:


/*
 * Linux IRQ vector layout.
 *
 * There are 256 IDT entries (per CPU - each entry is 8 bytes) which can
 * be defined by Linux. They are used as a jump table by the CPU when a
 * given vector is triggered - by a CPU-external, CPU-internal or
 * software-triggered event.
 *
 * Linux sets the kernel code address each entry jumps to early during
 * bootup, and never changes them. This is the general layout of the
 * IDT entries:
 *
 *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
 *  Vectors  32 ... 127 : device interrupts
 *  Vector  128         : legacy int80 syscall interface
 *  Vectors 129 ... LOCAL_TIMER_VECTOR-1
 *  Vectors LOCAL_TIMER_VECTOR ... 255 : special interrupts
 *
 * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
 *
 * This file enumerates the exact layout of them:
 */

/*
 * IDT vectors usable for external interrupt sources start at 0x20.
 * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
 */
#define FIRST_EXTERNAL_VECTOR  0x20


#define IRQ0_VECTOR (FIRST_EXTERNAL_VECTOR + 0x10)


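IRQ 11 is therefore mapped to vector IRQ0_VECTOR + 11, that is 0x30 + 0x0B = 0x3B, so it can be raised from software with: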


  asm("int $0x3B")
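Putting those definitions together: FIRST_EXTERNAL_VECTOR is 0x20, so IRQ0_VECTOR is 0x30 and legacy IRQ 11 lands on vector 0x3B, which is 59 in decimal (the vector number that appears in the dmesg output further down). The tiny sketch below restates that arithmetic with a hypothetical isa_irq_to_vector() helper:

```c
#include <assert.h>

/* Values taken from irq_vectors.h as quoted above. */
#define FIRST_EXTERNAL_VECTOR 0x20
#define IRQ0_VECTOR           (FIRST_EXTERNAL_VECTOR + 0x10)

/* Legacy ISA IRQ n maps to IDT vector IRQ0_VECTOR + n. */
static unsigned int isa_irq_to_vector(unsigned int irq)
{
    return IRQ0_VECTOR + irq;
}
```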

sim_irq/Makefile


ifneq ($(KERNELRELEASE),)
obj-m += sim_irq.o
else
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD  := $(shell pwd)
default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif

sim_irq/sim_irq.c


#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/slab.h>                 //kmalloc()
#include <linux/uaccess.h>              //copy_to/from_user()
#include <linux/sysfs.h>
#include <linux/kobject.h>
#include <linux/interrupt.h>
#include <asm/io.h>
 
#define IRQ_NO 11
 
//Interrupt handler for IRQ 11. 
 
static irqreturn_t irq_handler(int irq,void *dev_id) {
  printk(KERN_INFO "Shared IRQ: Interrupt Occurred\n");
  return IRQ_HANDLED;
}
 
 
volatile int etx_value = 0;
 
 
dev_t dev = 0;
static struct class *dev_class;
static struct cdev etx_cdev;
struct kobject *kobj_ref;
 
static int __init etx_driver_init(void);
static void __exit etx_driver_exit(void);
 
/*************** Driver Functions **********************/
static int etx_open(struct inode *inode, struct file *file);
static int etx_release(struct inode *inode, struct file *file);
static ssize_t etx_read(struct file *filp, 
                char __user *buf, size_t len,loff_t * off);
static ssize_t etx_write(struct file *filp, 
                const char __user *buf, size_t len, loff_t * off);
 
/*************** Sysfs Functions **********************/
static ssize_t sysfs_show(struct kobject *kobj, 
                struct kobj_attribute *attr, char *buf);
static ssize_t sysfs_store(struct kobject *kobj, 
                struct kobj_attribute *attr,const char *buf, size_t count);
 
struct kobj_attribute etx_attr = __ATTR(etx_value, 0660, sysfs_show, sysfs_store);
 
static struct file_operations fops =
{
        .owner          = THIS_MODULE,
        .read           = etx_read,
        .write          = etx_write,
        .open           = etx_open,
        .release        = etx_release,
};
 
static ssize_t sysfs_show(struct kobject *kobj, 
                struct kobj_attribute *attr, char *buf)
{
        printk(KERN_INFO "Sysfs - Read!!!\n");
        return sprintf(buf, "%d", etx_value);
}
 
static ssize_t sysfs_store(struct kobject *kobj, 
                struct kobj_attribute *attr,const char *buf, size_t count)
{
        printk(KERN_INFO "Sysfs - Write!!!\n");
        sscanf(buf,"%d",&etx_value);
        return count;
}
 
static int etx_open(struct inode *inode, struct file *file)
{
        printk(KERN_INFO "Device File Opened...!!!\n");
        return 0;
}
 
static int etx_release(struct inode *inode, struct file *file)
{
        printk(KERN_INFO "Device File Closed...!!!\n");
        return 0;
}
 
static ssize_t etx_read(struct file *filp, 
                char __user *buf, size_t len, loff_t *off)
{
        printk(KERN_INFO "Read function\n");
        asm("int $0x3B");  // Corresponding to irq 11
        return 0;
}
static ssize_t etx_write(struct file *filp, 
                const char __user *buf, size_t len, loff_t *off)
{
        printk(KERN_INFO "Write Function\n");
        return 0;
}
 
 
static int __init etx_driver_init(void)
{
        /*Allocating Major number*/
        if((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) <0){
                printk(KERN_INFO "Cannot allocate major number\n");
                return -1;
        }
        printk(KERN_INFO "Major = %d Minor = %d \n",MAJOR(dev), MINOR(dev));
 
        /*Creating cdev structure*/
        cdev_init(&etx_cdev,&fops);
 
        /*Adding character device to the system*/
        if((cdev_add(&etx_cdev,dev,1)) < 0){
            printk(KERN_INFO "Cannot add the device to the system\n");
            goto r_region;
        }
 
        /*Creating struct class (returns an ERR_PTR on failure, not NULL)*/
        dev_class = class_create(THIS_MODULE,"etx_class");
        if(IS_ERR(dev_class)){
            printk(KERN_INFO "Cannot create the struct class\n");
            goto r_cdev;
        }
 
        /*Creating device (also returns an ERR_PTR on failure)*/
        if(IS_ERR(device_create(dev_class,NULL,dev,NULL,"etx_device"))){
            printk(KERN_INFO "Cannot create the Device 1\n");
            goto r_class;
        }
 
        /*Creating a directory in /sys/kernel/ */
        kobj_ref = kobject_create_and_add("etx_sysfs",kernel_kobj);
        if(!kobj_ref)
                goto r_device;
 
        /*Creating sysfs file for etx_value*/
        if(sysfs_create_file(kobj_ref,&etx_attr.attr)){
                printk(KERN_INFO"Cannot create sysfs file......\n");
                goto r_kobj;
        }
        if (request_irq(IRQ_NO, irq_handler, IRQF_SHARED, "etx_device", (void *)(irq_handler))) {
                printk(KERN_INFO "my_device: cannot register IRQ\n");
                goto r_sysfs;   /* the IRQ was not obtained, so do not free it */
        }
        printk(KERN_INFO "Device Driver Insert...Done!!!\n");
        return 0;
 
/* undo the steps in reverse order */
r_sysfs:
        sysfs_remove_file(kobj_ref, &etx_attr.attr);
r_kobj:
        kobject_put(kobj_ref);
r_device:
        device_destroy(dev_class,dev);
r_class:
        class_destroy(dev_class);
r_cdev:
        cdev_del(&etx_cdev);
r_region:
        unregister_chrdev_region(dev,1);
        return -1;
}
 
static void __exit etx_driver_exit(void)
{
        free_irq(IRQ_NO, (void *)(irq_handler));
        /* Remove the file from etx_sysfs before dropping the kobject that owns it */
        sysfs_remove_file(kobj_ref, &etx_attr.attr);
        kobject_put(kobj_ref);
        device_destroy(dev_class, dev);
        class_destroy(dev_class);
        cdev_del(&etx_cdev);
        unregister_chrdev_region(dev, 1);
        printk(KERN_INFO "Device Driver Remove...Done!!!\n");
}
 
module_init(etx_driver_init);
module_exit(etx_driver_exit);
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@gmail.com or admin@embetronicx.com>");
MODULE_DESCRIPTION("A simple device driver - Interrupts");
MODULE_VERSION("1.9");

Build the driver


$ sudo make

Load the driver


$ sudo insmod sim_irq.ko

[ 2781.101790] Major = 239 Minor = 0 
[ 2781.101959] Device Driver Insert...Done!!!

To trigger the interrupt, read from the device:


$ sudo cat /dev/etx_device

Now check the kernel log with dmesg:


$ dmesg
...
[ 2872.619249] Device File Opened...!!!
[ 2872.619260] Read function
[ 2872.619262] do_IRQ: 1.59 No irq handler for vector
[ 2872.619273] Device File Closed...!!!

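The message comes from do_IRQ() in arch/x86/kernel/irq.c. It prints the CPU number and the vector, so "1.59" means CPU 1, vector 59: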

__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
...
        unsigned vector = ~regs->orig_ax;
...
        desc = __this_cpu_read(vector_irq[vector]);
...
        if (desc == VECTOR_UNUSED) {
                pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
                                     __func__, smp_processor_id(),
                                     vector);
        } else {
                __this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
        }
...
}

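The driver called request_irq(11, ...), yet the interrupt vector received is 59 (0x3B). do_IRQ() uses the vector as an index into the per-CPU vector_irq table. A vector with no handler attached holds VECTOR_UNUSED: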


#define NR_VECTORS 256


#define VECTOR_UNUSED  NULL

typedef struct irq_desc* vector_irq_t[NR_VECTORS];
DECLARE_PER_CPU(vector_irq_t, vector_irq);


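0x3B = 0x20 + 0x10 + 11. On x86, external interrupt vectors start at FIRST_EXTERNAL_VECTOR (0x20), and the legacy ISA IRQs 0-15 are placed from 0x30 onward, so IRQ 11 traditionally lands on vector 0x3B. The "No irq handler for vector" message shows that on this kernel, IRQ 11 was not actually bound to vector 0x3B.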

Meanwhile, /proc/interrupts shows the handler registered on IRQ 11, with no interrupts counted:

$ cat /proc/interrupts


 11:          0          0          0          0   IO-APIC  11-edge      sim_irq

Preparing the Parallel Port

Installing an Interrupt Handler


/* <linux/interrupt.h>; since 2.6.19 the handler no longer receives a struct pt_regs * */
int request_irq(unsigned int irq,
                irqreturn_t (*handler)(int, void *),
                unsigned long flags,
                const char *dev_name,
                void *dev_id);

void free_irq(unsigned int irq, void *dev_id);

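irq: the interrupt number being requested.
handler: the function called when the interrupt arrives.
flags: a bit mask of options, defined in <linux/interrupt.h>:

    IRQF_SHARED: the interrupt line may be shared among devices. Each handler on a shared line must read its device's status register to find out whether its own device raised the interrupt.
    IRQF_PROBE_SHARED: set by callers that expect sharing mismatches to occur.
    IRQF_TIMER: marks the interrupt as a timer interrupt.
    IRQF_PERCPU: the interrupt is per CPU.
    IRQF_NOBALANCING: excludes the interrupt from IRQ balancing.
    IRQF_IRQPOLL: the interrupt is used for polling.
    IRQF_ONESHOT: the line is not re-enabled after the hard-IRQ handler finishes; used by threaded handlers.
    IRQF_NO_SUSPEND: do not disable the IRQ during suspend.
    IRQF_FORCE_RESUME: force-enable the IRQ on resume even if IRQF_NO_SUSPEND is set.
    IRQF_NO_THREAD: the interrupt cannot be threaded.
    IRQF_EARLY_RESUME: resume the IRQ early, during syscore, instead of at device resume time.
    IRQF_COND_SUSPEND: if the IRQ is shared with an IRQF_NO_SUSPEND user, run this handler after suspending interrupts.

dev_name: the owner name shown in /proc/interrupts.
dev_id: a pointer used for shared interrupt lines. It is passed back to the handler and tells free_irq() which handler to remove. With IRQF_SHARED it must be unique and non-NULL; otherwise NULL may be passed.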

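Requesting the IRQ in the module initialization function would tie up the interrupt line even while the device is unused, so the request is better made in open.

The correct place to call request_irq is when the device is first opened, before the hardware is instructed to generate interrupts.
The place to call free_irq is the last time the device is closed, after the hardware is told not to interrupt the processor any more.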

The /proc Interface

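The /proc/interrupts file lists each IRQ line that has a handler installed. For each line it shows the per-CPU interrupt counts, the interrupt controller, and the owner name: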
           CPU0       CPU1       CPU2       CPU3       
  0:         32          0          0          0   IO-APIC   2-edge      timer
  1:          0          0      18154          0   IO-APIC   1-edge      i8042
  8:          0          0          0          1   IO-APIC   8-edge      rtc0
...

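/proc/stat also records interrupt counts on its intr line. The first number is the total of all interrupts; each following number is the count for one IRQ, starting at IRQ 0: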


...
intr 641582490 32 18356
...

Autodetecting the IRQ Number

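Having the user specify the interrupt number at load time is a bad idea, because the user rarely knows it. Two better approaches:

the driver retrieves the interrupt number from the device, when the device can report it;
the driver tells the device to generate interrupts and watches what happens (probing).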

Kernel-assisted probing

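unsigned long probe_irq_on(void);
int probe_irq_off(unsigned long);

probe_irq_on() returns a bit mask of unassigned interrupts, which the driver later passes to probe_irq_off(). probe_irq_off() returns the number of the interrupt issued after probe_irq_on(). It returns 0 if no interrupt occurred, and a negative value if more than one did. The sequence is:

call probe_irq_on()
enable interrupts on the probed device and make it generate an interrupt
disable interrupts on the probed device
call probe_irq_off()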


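int count = 0;
do {
        unsigned long mask;

        mask = probe_irq_on();
        /* device-specific: enable interrupts on the probed device,
           make it raise one interrupt, then disable its interrupts */
        udelay(5);                           /* give it some time */
        short_irq = probe_irq_off(mask);
        if (short_irq == 0) {                /* none of them? */
                printk(KERN_INFO "short: no irq reported by probe\n");
                short_irq = -1;
        }
        /* a negative result means several lines fired: try again */
} while (short_irq < 0 && count++ < 5);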

Do-it-yourself probing

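The driver can also probe without kernel help. It installs a temporary handler on each candidate line (line numbers are below NR_IRQS), makes the device interrupt, and watches which handler runs.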


irqreturn_t short_probing(int irq, void *dev_id)
{
...
}

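The handler receives the number of the line that fired in its irq argument, so it can record which candidate line the device is using.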

Fast and Slow Handlers

The internals of interrupt handling on the x86

On x86, an assembly-language file handles much of the machine-level work. A small bit of code is assigned to every possible interrupt. In each case, that code pushes the interrupt number on the stack and jumps to a common segment, which calls do_IRQ(), defined in arch/x86/kernel/irq.c:


__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)

Introduction to deferred interrupts (Softirq, Tasklets and Workqueues)

The handler of an interrupt must execute quickly, yet sometimes an interrupt handler must do a large amount of work. Linux therefore splits interrupt processing into two parts:

top half;
bottom half.

There are several ways of organizing the deferred processing of an interrupt:

softirqs;
tasklets;
workqueues.

Implementing a Handler

The job of a handler is to give feedback to its device about interrupt reception and to read or write data according to the meaning of the interrupt being serviced.

Top and Bottom Halves

The top half is the routine that actually responds to the interrupt: the one you register with request_irq.

The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time.

Several mechanisms may be used to implement bottom-half processing. Tasklets are often the preferred one; they are very fast, but all tasklet code must be atomic. The alternative to tasklets is workqueues, which may have a higher latency but are allowed to sleep.

