Apr 8, 2012

Graphics Processing Unit (GPU)

A graphics processing unit or GPU (also occasionally called visual processing unit or VPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory in such a way so as to accelerate the building of images in a frame buffer intended for output to a display. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel. In a personal computer, a GPU can be present on a video card, or it can be on the motherboard or—in certain CPUs—on the CPU die. More than 90% of new desktop and notebook computers have integrated GPUs, which are usually far less powerful than those on a dedicated video card.
 

History

1980s

 
In 1983 Intel made the Video Graphics Controller Multimodule Board for industrial systems based on the Multibus standard. The framebuffer was also accelerated through loading via DMA. The board was intended for use with Intel's line of Multibus industrial single board computer plugin cards.
The GPU supported line draw, area fill, and included a type of stream processor called a blitter which accelerated the movement, manipulation, and combination of multiple arbitrary bitmaps. Also included was a graphics coprocessor with its own (primitive) instruction set. Prior to this and quite some time after, many other personal computer systems required a general purpose CPU to handle every aspect of drawing the display. In 1986, Texas Instruments released the first microprocessor with on-chip graphics capabilities. It could run general-purpose code, but it had a very graphics-oriented instruction set. In 1990-1991, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA").
 

1990s

In 1991, S3 Graphics introduced the S3 86C911, which its designers named after the Porsche 911 as an indication of the performance increase it promised.
Throughout the 1990s, 2D GUI acceleration continued to evolve. In the early and mid-1990s, CPU-assisted real-time 3D graphics were becoming increasingly common in computer and console games, which led to an increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-marketed 3D graphics hardware can be found in fifth generation video game consoles. These chips were essentially previous-generation 2D accelerators with 3D features bolted on. Many were even pin-compatible with the earlier-generation chips for ease of implementation and minimal cost. Initially, performance 3D graphics were possible only with discrete boards dedicated to accelerating 3D functions. However, as manufacturing technology again progressed, video, 2D GUI acceleration, and 3D functionality were all integrated into one chip. Rendition's Verite chipsets were the first to do this well enough to be worthy of note.
OpenGL appeared in the early 90s as a professional graphics API, but originally suffered from performance issues which allowed the Glide API to step in and become a dominant force on the PC in the late 90s.However these issues were quickly overcome and the Glide API fell by the wayside. Software implementations of OpenGL were common during this time although the influence of OpenGL eventually led to widespread hardware support. Over time a parity emerged between features offered in hardware and those offered in OpenGL. DirectX became popular among Windows game developers during the late 90s. Unlike OpenGL, Microsoft insisted on providing strict one-to-one support of hardware. The approach made DirectX less popular as a stand alone graphics API initially since many GPUs provided their own specific features, which existing OpenGL applications were already able to benefit from, leaving DirectX often one generation behind. Over time Microsoft began to work more closely with hardware developers, and started to target the releases of DirectX with those of the supporting graphics hardware. Direct3D 5.0 was the first version of the burgeoning API to gain widespread adoption in the gaming market, and it competed directly with many more hardware specific, often proprietary graphics libraries, while OpenGL maintained a strong following. Direct3D 7.0 introduced support for hardware-accelerated transform and lighting (T&L) for Direct3D, while OpenGL already had this capability already exposed from its inception. 3D accelerators moved beyond being just simple rasterizers to add another significant hardware stage to the 3D rendering pipeline.  Hardware transform and lighting, both already existing features of OpenGL, came to consumer-level hardware in the 90s and set the precedent for later pixel shader and vertex shader units which were far more flexible and programmable.
 

2000 to present

 
With the advent of the OpenGL API and similar functionality in DirectX, GPUs added programmable shading to their capabilities. Each pixel could now be processed by a short program that could include additional image textures as inputs, and each geometric vertex could likewise be processed by a short program before it was projected onto the screen. Nvidia was first to produce a chip capable of programmable shading. By October 2002, with the introduction of the ATI, the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and in general were quickly becoming as flexible as CPUs, and orders of magnitude faster for image-array operations. Pixel shading is often used for things like bump mapping, which adds texture, to make an object look shiny, dull, rough, or even round or extruded. As the processing power of GPUs has increased, so has their demand for electrical power. High performance GPUs often consume more energy than current CPUs. See also performance per watt and quiet PC. Today, parallel GPUs have begun making computational inroads against the CPU, and a subfield of research, dubbed GPU Computing or GPGPU for General Purpose Computing on GPU, has found its way into fields as diverse as machine learning,  oil exploration, scientific image processing, linear algebra, statistics, 3D reconstruction and even stock options pricing determination. More recently OpenCL has become broadly supported.
 

Computational functions

 
Modern GPUs use most of their transistors to do calculations related to 3D computer graphics. They were initially used to accelerate the memory-intensive work of texture mapping and rendering polygons, later adding units to accelerate geometric calculations such as the rotation and translation of vertices into different coordinate systems. Recent developments in GPUs include support for programmable shaders which can manipulate vertices and textures with many of the same operations supported by CPUs, oversampling and interpolation techniques to reduce aliasing, and very high-precision color spaces. Because most of these computations involve matrix and vector operations, engineers and scientists have increasingly studied the use of GPUs for non-graphical calculations.
In addition to the 3D hardware, today's GPUs include basic 2D acceleration and framebuffer capabilities (usually with a VGA compatibility mode).
 

GPU accelerated video decoding

Most GPUs made since 1995 support the YUV color space and hardware overlays, important for digital video playback, and many GPUs made since 2000 also support MPEG primitives such as motion compensation and iDCT. This process of hardware accelerated video decoding, where portions of the video decoding process and video post-processing are offloaded to the GPU hardware, is commonly referred to as "GPU accelerated video decoding", "GPU assisted video decoding", "GPU hardware accelerated video decoding" or "GPU hardware assisted video decoding".
More recent graphics cards even decode high-definition video on the card, offloading the central processing unit.
 

Video decoding processes that can be accelerated

The video decoding processes that can be accelerated by today's modern GPU hardware are:
  • Motion compensation (mocomp)
  • Inverse discrete cosine transform (iDCT)
    • Inverse telecine 3:2 and 2:2 pull-down correction
  • Inverse modified discrete cosine transform (iMDCT)
  • In-loop deblocking filter
  • Intra-frame prediction
  • Inverse quantization (IQ)
  • Variable-Length Decoding (VLD), more commonly known as slice-level acceleration
  • Spatial-temporal deinterlacing and automatic interlace/progressive source detection
  • Bitstream processing (CAVLC/CABAC).And perfect pixels positioning.

GPU forms

Dedicated graphics card

 
The GPUs of the most powerful class typically interface with the motherboard by means of an expansion slot such as PCI Express (PCIe) or Accelerated Graphics Port (AGP) and can usually be replaced or upgraded with relative ease, assuming the motherboard is capable of supporting the upgrade. A few graphics cards still use Peripheral Component Interconnect (PCI) slots, but their bandwidth is so limited that they are generally used only when a PCIe or AGP slot is not available.
A dedicated GPU is not necessarily removable, nor does it necessarily interface with the motherboard in a standard fashion. The term "dedicated" refers to the fact that dedicated graphics cards have RAM that is dedicated to the card's use, not to the fact that most dedicated GPUs are removable. Dedicated GPUs for portable computers are most commonly interfaced through a non-standard and often proprietary slot due to size and weight constraints. Such ports may still be considered PCIe or AGP in terms of their logical host interface, even if they are not physically interchangeable with their counterparts.
 

Integrated graphics solutions

Integrated graphics solutions, shared graphics solutions, or Integrated graphics processors (IGP) utilize a portion of a computer's system RAM rather than dedicated graphics memory. They are integrated into the motherboard. Exceptions are AMD's IGPs that use dedicated sideport memory on certain motherboards, and APUs, where they are integrated with the CPU die. Computers with integrated graphics account for 90% of all PC shipments. These solutions are less costly to implement than dedicated graphics solutions, but tend to be less capable. Historically, integrated solutions were often considered unfit to play 3D games or run graphically intensive programs but could run less intensive programs such as Adobe Flash. While older platforms had the IGP integrated onto the motherboard, newer platforms integrate the GPU right onto the CPU die. As a GPU is extremely memory intensive, an integrated solution may find itself competing for the already relatively slow system RAM with the CPU, as it has minimal or no dedicated video memory. IGPs can have up to 29.856 GB/s of memory bandwidth from system RAM, however graphics cards can enjoy up to 16GB/s of bandwidth over PCIe 3.0. Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.
 

Hybrid solutions

This newer class of GPUs competes with integrated graphics in the low-end desktop and notebook markets. The most common implementations of this are ATI's HyperMemory and Nvidia's TurboCache. Hybrid graphics cards are somewhat more expensive than integrated graphics, but much less expensive than dedicated graphics cards. These share memory with the system and have a small dedicated memory cache, to make up for the high latency of the system RAM. Technologies within PCI Express can make this possible. While these solutions are sometimes advertised as having as much as 768MB of RAM, this refers to how much can be shared with the system memory.
 

Stream processing and general purpose GPUs (GPGPU)

It is becoming increasing common to use a general purpose graphics processing unit as a modified form of stream processor. This concept turns the massive floating-point computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, as opposed to being hard wired solely to do graphical operations. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. In certain circumstances the GPU calculates forty times faster than the conventional CPUs traditionally used by such applications. GPGPU can be used for many types of embarrassingly parallel task including ray tracing, computational fluid dynamics and weather modelling. They are generally suited to high-throughput type computations that exhibit data-parallelism to exploit the wide vector width SIMD architecture of the GPU.
Furthermore, GPU-based high performance computers are starting to play a significant role in large-scale modelling. Three of the 5 most powerful supercomputers in the world take advantage of GPU acceleration.
Since 2005 there has been interest in using the performance offered by GPUs for evolutionary computation in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to be run. Typically the performance advantage is only obtained by running the single active program simultaneously on many example problems in parallel, using the GPU's SIMD architecture. However, substantial acceleration can also be obtained by not compiling the programs, and instead transferring them to the GPU, to be interpreted there. Acceleration can then be obtained by either interpreting multiple programs simultaneously, simultaneously running multiple example problems, or combinations of both. A modern GPU can readily simultaneously interpret hundreds of thousands of very small programs.

References

  1. Denny Atkin. "Computer Shopper: The Right GPU for You". Retrieved 2007-05-15.
  2. Michael Swaine, "New Chip from Intel Gives High-Quality Displays", March 14, 1983, p.16
  3. 3dfx Glide API
  4. Søren Dreijer. "Bump Mapping Using CG (3rd Edition)". Retrieved 2007-05-30.
  5. http://www.xbitlabs.com/articles/video/display/power-noise.html X-bit labs: Faster, Quieter, Lower: Power Consumption and Noise Level of Contemporary Graphics Cards
  6. http://dl.acm.org/citation.cfm?id=1553486
  7. "Linear algebra operators for GPU implementation of numerical algorithms", Kruger and Westermann, International Conf. on Computer Graphics and Interactive Techniques, 2005
  8. "ABC-SysBio—approximate Bayesian computation in Python with GPU support", Liepe et al., Bioinformatics, (2010), 26:1797-1799 
  9. Khronos Group
  10. Q3 Sales Report from Jon Peddie Research via TechReport.com
  11. http://www.s3graphics.com/en/products/index.aspx
  12. http://www.via.com.tw/en/products/graphics
  13. http://www.matrox.com/graphics/en/products/graphics_cards
  14. AnandTech: µATX Part 2: Intel G33 Performance Review
  15. Tim Tscheblockov. "Xbit Labs: Roundup of 7 Contemporary Integrated Graphics Chipsets for Socket 478 and Socket A Platforms". Retrieved 2007-06-03.
  16. Bradley Sanford. "Integrated Graphics Solutions for Graphics-Intensive Applications". Retrieved 2007-09-02.
  17. Darren Murph. "Stanford University tailors Folding@home to GPUs". Retrieved 2007-10-04.
  18. Mike Houston. "Folding@Home - GPGPU". Retrieved 2007-10-04.
  19. http://pressroom.nvidia.com/easyir/customrel.do?easyirid=A0D622CE9F579F09&version=live&prid=678988&releasejsp=release_157
  20. John Nickolls. "Stanford Lecture: Scalable Parallel Programming with CUDA on Manycore GPUs".
  21. S Harding and W Banzhaf. "Fast genetic programming on GPUs". Retrieved 2008-05-01.
  22. W Langdon and W Banzhaf. "A SIMD interpreter for Genetic Programming on GPU Graphics Cards". Retrieved 2008-05-01.
  23. V. Garcia and E. Debreuve and M. Barlaud. Fast k nearest neighbor search using GPU. In Proceedings of the CVPR Workshop on Computer Vision on GPU, Anchorage, Alaska, USA, June 2008.
  24. http://developer.amd.com/gpu/AMDAPPSDK/assets/OpenVideo_Decode_API.PDF OpenVideo Decode (OVD) API
  25. http://www.tuaw.com/2011/01/20/xbmc-for-ios-and-atv2-now-available/ XBMC for iOS and Apple TV now available
  26. "MATLAB Adds GPGPU Support". 2010-09-20.

0 comments:

Favorite Blogs