GPU in Linux

Graphics Card Fundamentals




Outputs

  • VGA Outputs (D-Sub)
  • DVI Outputs
  • DVI stands for 'Digital Visual Interface'
  • Composite Video
  • S-Video
  • Component Video
  • HDMI

Inputs

  • ISA
  • PCI
  • PCI stands for Peripheral Component Interconnect. It is a 32-bit wide bus that runs at 33 MHz, delivering a bandwidth of 133 MB/s.
  • PCI-X
  • PCI-X stands for 'Peripheral Component Interconnect - Extended', which can be taken literally: its 64-bit wide interface delivers up to 4,266 MB/s, depending on the bus clock speed. PCI-X (not to be confused with PCI Express!) began as a speed upgrade to the PCI bus and later gained features required in the server space. It is not very common in ordinary PCs, and PCI-X graphics cards are very rare.
  • AGP
  • AGP 8x delivers 2.1 GB/s. AGP is being replaced by the PCI Express interface on new motherboards, but AGP 8x (and even AGP 4x) still offers sufficient bandwidth for contemporary video cards.
  • PCI Express
  • PCI Express x16 (16 lanes) offers a bandwidth of 4 GB/s in each direction, or 8 GB/s total. The narrower slot options (x8, x4, x1) are generally not used for graphics.
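
The bandwidth figures in the list above follow from simple arithmetic, sketched below. The function names are illustrative, and two inputs are assumptions not stated in the text: AGP is taken as a 32-bit bus with a 66 MHz base clock and 8 transfers per clock, and first-generation PCI Express is taken as 250 MB/s per lane in each direction.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth of a parallel bus in MB/s (1 MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


def pcie_gb_per_s(lanes, mb_per_lane=250.0):
    """Peak one-direction bandwidth of a PCI Express link in GB/s.

    mb_per_lane=250.0 is an assumption (first-generation PCI Express).
    """
    return lanes * mb_per_lane / 1000


pci = parallel_bus_mb_per_s(32, 33.33)        # 32-bit bus at 33 MHz
pcix = parallel_bus_mb_per_s(64, 533.33)      # 64-bit bus at its fastest clock
agp8x = parallel_bus_mb_per_s(32, 66.66, 8)   # assumed 66 MHz base clock, 8x
pcie_x16 = pcie_gb_per_s(16)                  # 16 lanes, one direction

print(f"PCI:      {pci:.0f} MB/s")
print(f"PCI-X:    {pcix:.0f} MB/s")
print(f"AGP 8x:   {agp8x:.0f} MB/s")
print(f"PCIe x16: {pcie_x16:.1f} GB/s per direction")
```

These are theoretical peaks; real transfers lose some of that to protocol overhead.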

The local graphics memory is usually placed right next to the graphics processor to keep traces as short as possible. This is important to reach high interface clock speeds.
Today's graphics cards are equipped with 128, 256 or 512 MB of local memory, and both DDR2 and GDDR3 memory products are being used. The more local memory the graphics processor can access, the more graphics data (mostly textures) can be stored locally instead of being swapped into the computer's main memory (RAM), which would be a huge bottleneck.


Glossary Of Basic Graphics Terms


  • Refresh Rate
  • The number of times per second that the monitor redraws the image. When the computer renders frames faster than the monitor's refresh rate, a newly calculated frame can be displayed halfway through one of the monitor's refreshes, which shows up as tearing. As a solution, V-sync (short for vertical synchronization) can be enabled so that the calculated frames never exceed the refresh rate.
  • Vertex
  • All objects in a 3D scene are made up of vertices. A vertex is a point in 3D space with X, Y, Z coordinates.
  • Texture
  • A texture is simply a 2D image that is applied to a 3D object to simulate its surface.
  • Shader
  • A shader is a type of computer program originally used for shading in 3D scenes (the production of appropriate levels of light, darkness, and color in a rendered image); flat shading and Phong shading are basic examples. In general, there are two forms of shaders: pixel shaders and vertex shaders. Vertex shaders deform or transform 3D elements. Pixel shaders can change pixel colors based on complex input.
  • (Pixel) Fill Rates
  • The fill rate is the rate at which a graphics processor can draw pixels.

Graphics Processor Architecture


  • Vertex Processors (a.k.a. Vertex Shader Units)
  • Vertex processors are components on the graphics processor designed to process shaders that affect only vertices.
  • Pixel Processors (a.k.a. Pixel Shader Units)
  • These processing units only do calculations regarding pixels (colors).
  • Texture Mapping Units (TMUs)
  • TMUs work in conjunction with pixel and vertex shader units.
  • Raster Operator Units (a.k.a. ROPs)
  • The raster operation processors are responsible for writing pixel data to memory. The speed at which this is done is known as the fill rate.
  • Pipeline
  • Pipeline is a term used to describe the graphics card's architecture. It is a conceptual model that describes what steps a graphics system needs to perform to render a 3D scene to a 2D screen. A graphics pipeline can be divided into three main parts: Application, Geometry and Rasterization.
    • Application
    • The application step is executed by the software on the main processor (CPU).
    • Geometry
    • The geometry step (the geometry pipeline) is responsible for the majority of the operations on polygons and their vertices (the vertex pipeline).
    • Rasterization
    Graphics cards like the Radeon 9700 had eight pixel processors, each attached to a single TMU, and as such it was considered an eight-pipeline card.
  • Graphics Processor Clock Speed
  • Graphics processor clock speed is measured in megahertz (MHz), which can be described as 'millions of cycles per second'. Clock speed isn't everything, however: 16 pipelines at 350 MHz would offer roughly the same performance as 8 pipelines at twice the speed (700 MHz).
  • Local Graphics Memory
  • Where local RAM does come in useful is for higher-resolution texture sets. The amount of RAM has a very small impact on performance when compared to other considerations like clock speed and the memory interface. The memory bus is one of the most important aspects of memory performance. As the memory bus width increases, so does the amount of data that it can carry per cycle, and that is very important for performance. DDR (double data rate) memory transfers data twice per clock cycle. For example, DDR memory is described as "1000 MHz DDR" memory (a.k.a. "1000 MHz effective") when its actual clock speed is 500 MHz. DDR, DDR2 and GDDR3 memory differ mainly in manufacturing technology.
  • Graphics Card Interface
  • There are three types of graphics interfaces currently in use: PCI, AGP, and PCI Express. PCI Express is the preferred graphics card slot today.
  • Microsoft's DirectX And Shader Model Versions
  • DirectX and OpenGL are graphics APIs. DirectX is Microsoft's creation and includes APIs for sound, music, input devices and media. The specific DirectX API that applies to 3D graphics is called Direct3D. As DirectX quickly increased in popularity and use, graphics processor manufacturers began to design their graphics processors to match the newest DirectX capabilities. For this reason, graphics cards will often be described by the DirectX version they support. Different DirectX versions may support different Pixel Shader specifications.
  • Anti-Aliasing
  • Aliasing is a term to describe jagged or blocky patterns associated with displaying digital images. Anti-aliasing (abbreviated 'AA') refers to techniques that reduce these patterns.
  • Texture Filtering
  • High Definition Texture Sets
  • All of the required textures must fit into graphics card memory while playing, otherwise performance will be heavily taxed, as the extra textures required will have to be stored in slower system RAM, or even on the hard disk.
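
The clock-speed and memory-bus points in the list above can be checked with a few lines of arithmetic. This is a rough sketch under simplifying assumptions: each pipeline is assumed to produce one pixel per clock, and the 128-bit and 256-bit bus widths are illustrative values, not figures from the text.

```python
def pixel_throughput_mpix(pipelines, clock_mhz):
    """Theoretical peak in megapixels/s, assuming one pixel per pipeline per clock."""
    return pipelines * clock_mhz


def effective_ddr_mhz(actual_clock_mhz):
    """DDR transfers data twice per clock, so the 'effective' rate is doubled."""
    return actual_clock_mhz * 2


def memory_bandwidth_gb_per_s(bus_width_bits, effective_mhz):
    """Peak memory bandwidth in GB/s for a given bus width and effective clock."""
    return bus_width_bits / 8 * effective_mhz / 1000


wide_slow = pixel_throughput_mpix(16, 350)    # 16 pipelines at 350 MHz
narrow_fast = pixel_throughput_mpix(8, 700)   # 8 pipelines at 700 MHz
ddr = effective_ddr_mhz(500)                  # "1000 MHz DDR"
bw_128 = memory_bandwidth_gb_per_s(128, ddr)  # illustrative 128-bit bus
bw_256 = memory_bandwidth_gb_per_s(256, ddr)  # illustrative 256-bit bus

print(wide_slow, narrow_fast)   # the two configurations match
print(ddr, bw_128, bw_256)      # doubling the bus width doubles bandwidth
```

Both pipeline configurations come out at the same theoretical throughput, and the wider bus carries twice the data at the same memory clock.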

General Information about GPUs and Video Cards by Advanced Micro Devices (AMD)


Field explanations




How to Stress Test a Graphics Card on Linux


GpuTest

GpuTest is a cross-platform (Windows, Linux and Mac OS X) GPU stress test and OpenGL benchmark. GpuTest comes with several GPU tests, including some popular ones from the Windows world (FurMark or TessMark).
GpuTest is available for the following operating systems:
  • Windows 7 and 8, 64-bit
  • Linux 64-bit (Ubuntu-based, openSUSE)
  • OSX 10.7, 10.8 and 10.9
Download then unzip the file. From there, open a terminal window and type the following to start the GpuTest GUI:

python gputest_gui.py

Glxgears

For a quick indication of your current GPU framerate, you can use the Glxgears tool, which is included with the Mesa 3D graphics library and available for Linux users. To install it on a Debian- or Ubuntu-based system:

$ sudo apt-get install mesa-utils
Start Glxgears by typing:

$ glxgears 
As you might guess from the name, it performs a framerate test by loading a 3D simulation of moving gears.
Every five seconds it logs the current framerate in the terminal window. If there are any sudden framerate drops, you can use this information as a sign to investigate your GPU further.
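
Spotting framerate drops by eye gets tedious on long runs, so the log lines can be checked by a small script. The sketch below assumes the usual Glxgears report format ("N frames in 5.0 seconds = X FPS"). The sample lines are made up for illustration, and the 0.8 threshold is an arbitrary assumption.

```python
import re

# Assumed report format: "<frames> frames in <seconds> seconds = <fps> FPS"
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(lines):
    """Extract the FPS value from each Glxgears report line, skipping other lines."""
    values = []
    for line in lines:
        match = FPS_LINE.search(line)
        if match:
            values.append(float(match.group(3)))
    return values


def find_drops(fps_values, threshold=0.8):
    """Return indices of readings below threshold x the previous reading."""
    return [
        i
        for i in range(1, len(fps_values))
        if fps_values[i] < threshold * fps_values[i - 1]
    ]


# Illustrative sample, not real measurements.
sample = [
    "301 frames in 5.0 seconds = 60.045 FPS",
    "300 frames in 5.0 seconds = 59.998 FPS",
    "151 frames in 5.0 seconds = 30.120 FPS",
]
readings = parse_fps(sample)
print(find_drops(readings))  # index of each reading that counts as a drop
```

The same two functions could be fed the lines of a saved Glxgears log instead of the sample list.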

Glmark2


This is another popular open source GPU stress testing and OpenGL benchmark tool, forked from the original Glmark. It is available for the Linux and Android platforms.

$ sudo apt-get install glmark2

$ glmark2

This will open the default 800x600 pixel window and render various 3D objects.
Looping the process puts sustained, heavy stress on the GPU:

$ glmark2 --run-forever
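
To compare runs over time, the final score of a single (non-looping) glmark2 pass can be captured from a script. This is a minimal sketch that assumes the run ends with a summary line of the form "glmark2 Score: N". That format is an assumption here and should be checked against the output of the installed version.

```python
import re
import subprocess

# Assumed summary line format: "glmark2 Score: <integer>"
SCORE_LINE = re.compile(r"glmark2 Score:\s*(\d+)")


def parse_score(output):
    """Return the integer score found in glmark2 output, or None if absent."""
    match = SCORE_LINE.search(output)
    return int(match.group(1)) if match else None


def run_glmark2():
    """Run one glmark2 pass and return its score (requires glmark2 installed)."""
    result = subprocess.run(
        ["glmark2"], capture_output=True, text=True, check=True
    )
    return parse_score(result.stdout)
```

Calling run_glmark2() opens the benchmark window and blocks until the pass finishes. It should not be combined with --run-forever, which never reaches the summary line.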

Unigine Benchmarks

The basic version of the Unigine benchmarks is free and available for Windows, Linux and Mac.
The tests may not run on lower-end graphics cards.
Several benchmarks are available to choose from, including “Valley” and “Heaven”.
To install, go to the Unigine Benchmark website and select the tool you wish to download, then click the “download” button. The tool will come as a RUN file. Once downloaded, open a terminal window, go to the location of the file, and type:

sudo chmod +x ./Benchmark-Filename.run
./Benchmark-Filename.run

Enter the folder that the installation file creates, then execute the related binary.

AGP

The Accelerated Graphics Port (AGP) is a PCI bus technology enhancement that improves 3D graphics performance by using low-cost system memory. AGP chipsets use the Graphics Address Remapping Table (GART) to map discontiguous system memory into a contiguous PCI memory range (known as the AGP Aperture), enabling the graphics card to utilize the mapped aperture range as video memory.


The AGP aperture size is configurable through the computer's CMOS (BIOS) setup and usually defaults to 64 MB. It defines how much system memory (not memory on your video card) the AGP controller uses for texture maps.

The agpgart driver creates a pseudo device node at /dev/agpgart and provides a set of ioctls for managing allocation/deallocation of system memory, setting mappings between system memory and aperture range, and setting up AGP devices. The agpgart driver manages both pseudo and real device nodes, but to initiate AGP-related operations you operate only on the /dev/agpgart pseudo device node. To do this, open /dev/agpgart.




To build the AGP (Accelerated Graphics Port) driver, enable the Linux kernel configuration item CONFIG_AGP:

/dev/agpgart (AGP Support) (AGP) [Y/m/?]


Next, you have to configure a set of options for the Direct Rendering Manager (DRM), a device-independent driver that supports the XFree86 Direct Rendering Infrastructure (DRI).
DRI is meant for direct access to 3-D graphics hardware in advanced graphics cards, such as 3Dfx Banshee and Voodoo3+. To find out more about DRI, see http://dri.freedesktop.org/wiki.

If you have a 3-D graphics card, you can answer Yes to DRM and build the module for the graphics card in your system. If you do not have one of the listed graphics cards, you should answer No to these options.

You should use the AGP driver that works best with your AGP chipset. If you are experiencing problems with stability, you may want to start by disabling AGP and seeing if that solves the problems. Then you can experiment with the AGP driver configuration.

To use the AGPGART driver provided by Linux 2.6 or more recent Linux kernels, both the AGPGART frontend module, agpgart.ko, and the backend module for your AGP chipset (nvidia-agp.ko, intel-agp.ko, via-agp.ko, ...) need to be statically linked into the kernel, or built as modules and loaded.

You can query the current AGP status at any time via the /proc filesystem interface if the backend module for your AGP chipset supports it.

If you are using a Linux 2.6 or newer Linux kernel that has the Linux AGPGART driver statically linked in (some distribution kernels do), you can pass the
agp=off
parameter to the kernel (via LILO or GRUB, for example) to disable AGPGART support.
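
Whether the running kernel was actually booted with this parameter can be checked from the kernel command line, which Linux exposes in /proc/cmdline. The helper names below are illustrative, and this is only a sketch.

```python
def agp_disabled(cmdline):
    """True if the kernel command line contains the agp=off parameter."""
    return "agp=off" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a running system, agp_disabled(read_cmdline()) reports whether the parameter was passed at boot.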

The agp driver provides uniform, abstract methods for controlling the AGP devices.
The following ioctl(2) operations, defined in the driver's header file, can be performed on /dev/agpgart:
  • AGPIOC_INFO
  • Returns state of the agp system. The result is a pointer to the following structure:
      typedef struct _agp_info {
          agp_version version;   /* version of the driver */
          uint32_t bridge_id;    /* bridge vendor/device */
          uint32_t agp_mode;     /* mode info of bridge */
          off_t aper_base;       /* base of aperture */
          size_t aper_size;      /* size of aperture */
          size_t pg_total;       /* max pages (swap + system) */
          size_t pg_system;      /* max pages (system) */
          size_t pg_used;        /* current pages used */
      } agp_info;
  • AGPIOC_ACQUIRE
  • Acquire control of the AGP chipset for use by this client. Returns EBUSY if the AGP chipset is already acquired by another client.
  • AGPIOC_RELEASE
  • Release control of the AGP chipset. This does not unbind or free any allocated memory, which is the responsibility of the client to handle if necessary.
  • AGPIOC_SETUP
  • Enable the AGP hardware with the relevant mode. This ioctl(2) takes the following structure:
      typedef struct _agp_setup {
          uint32_t agp_mode;     /* mode info of bridge */
      } agp_setup;
    The mode bits are defined in the driver's header file.
  • AGPIOC_ALLOCATE
  • Allocate physical memory suitable for mapping into the AGP aperture. This ioctl(2) takes the following structure:
      typedef struct _agp_allocate {
          int key;               /* tag of allocation */
          size_t pg_count;       /* number of pages */
          uint32_t type;         /* 0 == normal, other devspec */
          uint32_t physical;     /* device specific (some devices
                                  * need a phys address of the
                                  * actual page behind the gatt
                                  * table) */
      } agp_allocate;
    Returns a handle to the allocated memory.
  • AGPIOC_DEALLOCATE
  • Free the previously allocated memory associated with the handle passed.
  • AGPIOC_BIND
  • Bind the allocated memory at the given offset within the AGP aperture. Returns EINVAL if the memory is already bound or the offset is not at an AGP page boundary. This ioctl(2) takes the following structure:
      typedef struct _agp_bind {
          int key;               /* tag of allocation */
          off_t pg_start;        /* starting page to populate */
      } agp_bind;
    The tag of allocation is the handle returned by AGPIOC_ALLOCATE.
  • AGPIOC_UNBIND
  • Unbind memory from the AGP aperture. Returns EINVAL if the memory is not bound. This ioctl(2) takes the following structure:
      typedef struct _agp_unbind {
          int key;               /* tag of allocation */
          uint32_t priority;     /* priority for paging out */
      } agp_unbind;
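
The order in which a client would use these operations (acquire, allocate, bind, unbind, deallocate, release) and the documented error cases can be illustrated with a toy in-memory model. This is not the driver API and does not touch /dev/agpgart. The class name, the byte-offset argument to bind and the 4096-byte page size are all assumptions made for the illustration.

```python
import errno

PAGE_SIZE = 4096  # assumed AGP page size for this toy model


class ToyAgp:
    """In-memory model of the ioctl semantics described above (not the real driver)."""

    def __init__(self):
        self.owner = None       # client currently holding the chipset
        self.allocations = {}   # key -> number of pages
        self.bound = {}         # key -> starting page within the aperture
        self._next_key = 1

    def acquire(self, client):
        # AGPIOC_ACQUIRE: EBUSY if another client already holds the chipset.
        if self.owner is not None and self.owner != client:
            raise OSError(errno.EBUSY, "AGP chipset already acquired")
        self.owner = client

    def release(self, client):
        # AGPIOC_RELEASE: gives up control; memory stays allocated and bound.
        if self.owner == client:
            self.owner = None

    def allocate(self, pg_count):
        # AGPIOC_ALLOCATE: returns a handle (key) for later bind/unbind calls.
        key = self._next_key
        self._next_key += 1
        self.allocations[key] = pg_count
        return key

    def bind(self, key, offset_bytes):
        # AGPIOC_BIND: EINVAL if already bound or offset is not page-aligned.
        if key in self.bound or offset_bytes % PAGE_SIZE != 0:
            raise OSError(errno.EINVAL, "already bound or unaligned offset")
        self.bound[key] = offset_bytes // PAGE_SIZE

    def unbind(self, key):
        # AGPIOC_UNBIND: EINVAL if the memory is not bound.
        if key not in self.bound:
            raise OSError(errno.EINVAL, "memory is not bound")
        del self.bound[key]

    def deallocate(self, key):
        # AGPIOC_DEALLOCATE: frees the memory associated with the handle.
        self.bound.pop(key, None)
        del self.allocations[key]


agp = ToyAgp()
agp.acquire("client-a")
key = agp.allocate(pg_count=4)
agp.bind(key, offset_bytes=0)
agp.release("client-a")  # memory is still bound after release
still_bound = key in agp.bound
agp.unbind(key)
agp.deallocate(key)
```

The point of the model is the cleanup responsibility: releasing the chipset leaves the allocation bound, so the client has to unbind and deallocate explicitly.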


AMD’s Cayman GPU Architecture

