Kernel : DIY

Guide to Assembly Language Programming in Linux

by
Sivarama P. Dandamudi

1. Assembly Language


Introduction


Assembly language programming is referred to as low-level programming because each assembly language instruction performs a much lower-level task compared to an instruction in a high-level language.

Assembly language instructions are native to the processor used in the system. For example, a program written in the Intel assembly language cannot be executed on the ARM processor.

Machine language is a close relative of assembly language. Typically, there is a one-to-one correspondence between assembly language and machine language instructions. The processor understands only machine language, whose instructions consist of strings of 1s and 0s.

What Is Assembly Language?


Assembly language is directly influenced by the instruction set and architecture of the processor.
The assembly language code must be processed by a program called an assembler in order to generate the machine language code.
NASM (Netwide Assembler) is a free assembler that supports a variety of object file formats, including those used by Microsoft Windows, Linux, and a host of others.
Let us look at the machine language equivalents of a few assembly language instructions:
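
The listing below is an illustrative sketch (NASM syntax, assembled for 32-bit mode); each instruction is paired with a typical machine language encoding in hexadecimal:

    inc  eax            ; 40                increment the EAX register
    mov  eax, 45        ; B8 2D 00 00 00    load the immediate value 45 into EAX
    add  eax, 10        ; 83 C0 0A          add 10 to EAX
    ret                 ; C3                return from a procedure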

It is obvious from these examples that understanding a program written in machine language is nearly impossible for a human reader.

2. Digital Logic Circuits


Introduction


A computer system has three main components:
  • a central processing unit (CPU)
  • a memory unit
  • input/output (I/O) devices
These three components are interconnected by a system bus.
The three major components of the system bus are :
  • address bus
  • data bus
  • control bus

While the address bus carries the information about the device with which the CPU is communicating and the data bus carries the actual data being processed, the control bus carries commands from the CPU and returns status signals from the devices.
The number and type of lines in a control bus vary, but there are basic lines common to all microprocessors, such as:
  • Read
  • A single line that, when active (logic zero), indicates that the CPU is reading from the device.
  • Write
  • A single line that, when active (logic zero), indicates that the CPU is writing to the device.
  • Byte enable
  • A group of lines that indicate the size of the data being transferred (8, 16, 32, or 64 bits).
Systems that have more than one bus master have additional control bus lines.
For example, when the processor is writing data into the memory, the memory write signal line is asserted.

Data transfers on the system bus are called bus transactions. Every bus transaction involves a master and a slave. The master is the initiator of the transaction and the slave is the target of the transaction. The processor usually acts as the master of the system bus, while components like memory are usually slaves.

3. Memory Organization


The IA-32 Architecture


It is important for the assembly language programmer to understand the segmented memory organization.

Processor Execution Cycle


Processor Registers


The IA-32 architecture provides ten 32-bit and six 16-bit registers.
These registers are grouped into :
  • general registers
  • The general registers are further divided into data, pointer, and index registers.
  • control registers
  • The instruction pointer register is sometimes called the program counter. When an instruction is fetched from memory, the instruction pointer is updated to point to the next instruction. This register is also modified during the execution of an instruction that transfers control to another location in the program, such as a jump, procedure call, or interrupt (a short sketch follows this list).
  • segment registers
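
A hedged sketch in NASM of how control-transfer instructions affect the instruction pointer (the labels are invented for illustration):

    start:
        mov  eax, 5         ; EIP advances to the next sequential instruction
        call double_it      ; pushes the return address and loads EIP with double_it
        jmp  done           ; loads EIP with the address of the label done

    double_it:
        add  eax, eax       ; EAX = 2 * EAX
        ret                 ; pops the return address back into EIP

    done:
        nop                 ; execution continues here after the jump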

Protected Mode Memory Architecture




NASM Assembly Language Tutorials

Lesson 1: Hello, world!


The only interface an assembly programmer has above the actual hardware is the Linux system calls provided by the kernel. These system calls are routines built into the operating system that provide services such as reading input from the keyboard and writing output to the screen.
When you invoke a system call, the kernel suspends execution of your program, contacts the drivers needed to perform the requested task on the hardware, and then returns control to your program.
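
A minimal sketch of such a program, assuming 32-bit Linux and the int 0x80 system call interface (sys_write is call number 4 and sys_exit is call number 1):

    section .data
    msg     db  "Hello, world!", 0x0A   ; the message followed by a newline
    msglen  equ $ - msg                 ; its length in bytes

    section .text
    global _start

    _start:
        mov  eax, 4         ; sys_write system call number
        mov  ebx, 1         ; file descriptor 1 (standard output)
        mov  ecx, msg       ; pointer to the message
        mov  edx, msglen    ; number of bytes to write
        int  0x80           ; transfer control to the kernel

        mov  eax, 1         ; sys_exit system call number
        mov  ebx, 0         ; exit status 0
        int  0x80           ; transfer control to the kernel

Assembled and linked with, for example, nasm -f elf32 hello.asm followed by ld -m elf_i386 -o hello hello.o, the program writes the string to the screen and exits.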

Multiboot kernel
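
As a hedged starting-point sketch following the Multiboot 1 specification: the bootloader (such as GRUB) searches the first 8 KiB of the kernel image for a header whose magic number, flags, and checksum sum to zero. The section name .multiboot is only a convention here; a matching linker script is assumed to place it at the front of the image.

    MBOOT_MAGIC     equ 0x1BADB002                    ; Multiboot 1 magic number
    MBOOT_FLAGS     equ (1 << 0) | (1 << 1)           ; page-align modules, request a memory map
    MBOOT_CHECKSUM  equ -(MBOOT_MAGIC + MBOOT_FLAGS)  ; the three fields must sum to zero

    section .multiboot
    align 4
        dd  MBOOT_MAGIC
        dd  MBOOT_FLAGS
        dd  MBOOT_CHECKSUM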



Basic x86 interrupts


Adding support for hardware I/O requires initializing x86 interrupts.

There are 3 sources or types of interrupts:
  • Hardware interrupts
  • Come from hardware devices such as the keyboard or a network card.
  • Software interrupts
  • Generated by the software int instruction. Most Unix systems and derivatives do not use software interrupts, with the exception of interrupt 0x80, which is used to make system calls (the same vector number can mean something entirely different in another OS). The “traditional” way to invoke a system call uses the int assembly language instruction. During kernel initialization, the trap_init() function sets up the Interrupt Descriptor Table entry corresponding to vector 128 (0x80) as follows:
    
      set_system_gate(0x80, &system_call);
    
    This call loads the following values into the gate descriptor fields:
    • Segment Selector
    • The __KERNEL_CS Segment Selector of the kernel code segment.
    • Offset
    • The pointer to the system_call() handler.
    • Type
    • Set to 15. Indicates that the exception is a Trap and that the corresponding handler does not disable maskable interrupts.
    • DPL (Descriptor Privilege Level)
    • Set to 3.
    Therefore, when a User Mode process issues an int $0x80 instruction, the CPU switches into Kernel Mode and starts executing instructions from the system_call address (a sketch of how such a gate descriptor can be filled in follows this list).
  • Exceptions
  • generated by CPU itself in response to some error like “divide by zero” or “page fault”.
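
As a hedged sketch (not the actual Linux implementation), this is how a DIY kernel could fill an equivalent trap gate for vector 0x80 in NASM. Here idt, system_call, and KERNEL_CS are placeholder names; the attribute byte 0xEF encodes present, DPL 3, 32-bit trap gate (type 15).

    KERNEL_CS equ 0x08                          ; assumed kernel code segment selector

    ; Fill IDT entry 0x80 with a 32-bit trap gate pointing at system_call.
    set_syscall_gate:
        mov  eax, system_call                   ; address of the system call handler
        mov  [idt + 0x80*8],          ax        ; offset bits 15..0
        mov  word [idt + 0x80*8 + 2], KERNEL_CS ; segment selector
        mov  byte [idt + 0x80*8 + 4], 0         ; reserved byte
        mov  byte [idt + 0x80*8 + 5], 0xEF      ; P=1, DPL=3, type=0xF (trap gate)
        shr  eax, 16
        mov  [idt + 0x80*8 + 6],      ax        ; offset bits 31..16
        ret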

The x86 interrupt system involves three parts working together:
  • The Programmable Interrupt Controller (PIC) must be configured to receive interrupt requests (IRQs) from devices and forward them to the CPU.
  • The CPU must be configured to receive IRQs from the PIC and invoke the correct interrupt handler, via a gate described in the Interrupt Descriptor Table (IDT).
  • The operating system kernel must provide Interrupt Service Routines (ISRs) to handle interrupts and must be ready to be preempted by an interrupt at any time. It must also configure both the PIC and the CPU to enable interrupts.

PIC

The PIC is the piece of hardware that the various peripheral devices are connected to instead of the CPU. It provides:
  • More interrupt lines via PIC chaining (2 PICs give 15 interrupt lines)
  • The ability to mask a particular interrupt line instead of disabling all of them (as cli does)
  • Interrupt queueing
  • When an interrupt line is masked, the PIC holds the interrupt for later delivery instead of dropping it.
Without reprogramming the PIC to remap interrupts, the keyboard arrives at the CPU as interrupt number 9 (even though it is IRQ1 on the PIC). The original IBM PC had a separate 8259 PIC chip; modern PC systems have an APIC (Advanced Programmable Interrupt Controller), which also solves interrupt routing for multi-core/multi-processor machines. The PIC is connected to the CPU data bus. This bus is used to:
  • send IRQ number from PIC to CPU
  • send configuration commands from CPU to PIC
  • Configuration commands include PIC initialization, IRQ masking, the End-Of-Interrupt (EOI) command, and so on; a remapping sketch in NASM follows this list.
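
A hedged sketch of the classic 8259 remapping sequence in NASM, using the standard PC/AT port numbers (0x20/0x21 for the master PIC, 0xA0/0xA1 for the slave). It moves the 16 IRQ lines to vectors 0x20-0x2F, away from the vectors reserved for CPU exceptions, and then masks every line except the keyboard:

    remap_pic:
        mov  al, 0x11       ; ICW1: begin initialization, expect ICW4
        out  0x20, al       ; master PIC command port
        out  0xA0, al       ; slave PIC command port

        mov  al, 0x20       ; ICW2 (master): IRQ0-7 mapped to vectors 0x20-0x27
        out  0x21, al
        mov  al, 0x28       ; ICW2 (slave): IRQ8-15 mapped to vectors 0x28-0x2F
        out  0xA1, al

        mov  al, 0x04       ; ICW3 (master): slave PIC attached to IRQ2
        out  0x21, al
        mov  al, 0x02       ; ICW3 (slave): cascade identity 2
        out  0xA1, al

        mov  al, 0x01       ; ICW4: 8086/88 mode
        out  0x21, al
        out  0xA1, al

        mov  al, 0xFD       ; mask everything on the master except IRQ1 (keyboard)
        out  0x21, al
        mov  al, 0xFF       ; mask all slave IRQs
        out  0xA1, al
        ret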

Interrupt descriptor table (IDT)

The IDT is an x86 system table that holds descriptors for Interrupt Service Routines (ISRs). In real (address) mode there is instead an interrupt vector table, located at the fixed address 0x0. The IDT is a table in memory that is created and filled by the OS; it is pointed to by the IDTR register, which is loaded with the lidt instruction. The IDT can only be used in protected (virtual address) mode. When a processor that supports x86 protected mode is powered on, it begins executing instructions in real mode; protected mode may only be entered after the system software sets up a Global Descriptor Table (GDT, with a null descriptor, a CS descriptor, and a DS descriptor) and enables the Protection Enable (PE) bit in control register 0 (CR0). IDT entries are descriptors similar to GDT entries (a sketch of defining and loading an IDT follows the list below):
  • offset
  • A pointer to an ISR within the code segment chosen by the segment selector.
  • type
  • Specifies the gate type: task, trap, or interrupt.
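
A hedged sketch in NASM (the label names are invented for illustration): the IDT can be reserved as 256 eight-byte entries and handed to the CPU with lidt, which takes a 6-byte operand made of a 16-bit limit and a 32-bit base:

    section .bss
    idt:            resb 256 * 8        ; 256 gate descriptors, 8 bytes each

    section .data
    idt_descriptor:
        dw  256 * 8 - 1                 ; limit: size of the IDT in bytes, minus one
        dd  idt                         ; base: linear address of the IDT

    section .text
    load_idt:
        lidt [idt_descriptor]           ; load IDTR with the limit and base above
        ret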

Interrupt service routines (ISR)

The main purpose of the IDT is to store pointers to the ISRs that the CPU will automatically invoke when it receives interrupts. Once you have configured the IDT and enabled interrupts (sti), the CPU will eventually pass control to your handler, doing some work on the way (a minimal handler skeleton follows this list):
  • If the interrupt occurred in user space
  • The CPU saves the current context (the SS, ESP, EFLAGS, CS, and EIP registers) onto the new stack, loads CS and EIP for the ISR, and executes its instructions at the new privilege level.
  • If the interrupt occurred in kernel space
  • The CPU does not switch stacks; an interrupt taken in kernel space does not get its own stack and instead uses the stack of the interrupted procedure.
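
A hedged sketch of a hardware-interrupt ISR in NASM (the handler body is a placeholder; the keyboard data port is 0x60 and the EOI is sent to the master PIC command port 0x20):

    keyboard_isr:
        pusha                   ; save the general-purpose registers
        in   al, 0x60           ; read the scancode so the keyboard controller is serviced
        ; ... handle the scancode here ...
        mov  al, 0x20           ; End-Of-Interrupt command
        out  0x20, al           ; acknowledge the interrupt at the master PIC
        popa                    ; restore the registers
        iret                    ; return to the interrupted code (restores EIP, CS, EFLAGS)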


