Introduction
Most present day computers
sport powerful GPUs (Graphics
Processing
Units)
capable of beating
even high-end CPUs in terms of raw FLOPS when performing at their peak.
Consequently, there has been lot of work recently on implementing
different problems on GPUs to exploit their high processing power. One
such problem is sorting, which has been a major computational bottleneck in
applications spanning the breadth of computer science, most importantly
in databases and data mining. We have designed algorithms to perform sorting on the GPU and extract the maximum computational throughoutput of the GPU. Our algorithm maps the bitonic sorting network to GPUs using simple texture mapping
operations and effectively utilizes the memory bandwidth using cache-efficient memory accesses. We have observed that our implementation is faster than qsort() running on a 3.4 GHz Pentium processor system
with a NVIDIA GeForce 6800 Ultra GPU. GPUSort is a C++ class which
contains our implementation. It is capable of sorting floating point
values in both 16 and
32-bit precision, and can also quickly sort records based on a floating point key. Please
refer
to the documentation for details regarding the API and the contents of
the distribution. Also, please read through the system requirements below
before using the library.
System Requirements
- OS:
Microsoft Windows XP/2000 and Linux
- RAM: Atleast a size of graphics processor video memory is required.
- GPU:
NVIDIA GeForce/Quadro family card with
support for the following OpenGL extensions:
- EXT_framebuffer_object
- ARB_texture_rectangle
- ARB_fragment_program
- The above requirements are
met by NV30-based GPUs and above
(GeForce FX and GeForce 6 series)
- The library has been tested
on the following cards:
- GeForce 6800 Ultra/GTO
- GeForce 5950 Ultra
- GeForce 5800
- Quadro FX 4000
- Laptop graphics cards:
QuadroFX 700 Go, QuadroFX 1000 Go,
QuadroFX 1400 Go, GeForce 6800 Go
For obtaining reasonably high performance, we recommend a PC with AGP8X/PCI-Express NVIDIA GeForce 6800 GT or faster GPU.
- Video
RAM: The Video RAM will
determine the
maximum array length that can be sorted on the GPU. A rough guideline
for sorting 32-bit floats is: Maximum array length in millions = Video
RAM in MB / 16. Therefore, on a card with 256 MB VRAM, the
maximum-length array which can be sorted is 256/16 = 16 Million values
- Drivers: Latest drivers from
NVIDIA (version 7772 or higher for windows, and 7664 for linux)
.
Note:
- ATI
cards: ATI cards are not
supported
in the present release of GPUSort mainly due to the lack of suport for framebuffer objects and ARB_texture_rectangle in current ATI drivers. These cards will be supported in future releases.
- Non
power-of-two arrays: The
present
implementation will not sort arrays with non power-of-two size completely
on the GPU. Our present implementation takes the largest power of two
smaller than the input array size, sorts that part on the GPU and
merges the result with the "tail" after running qsort() on it.
Therefore, the code will run
slower on non-power-of-two arrays
because the CPU will also be used.
We are currently working on
addressing this issue.
Related Links
UNC GAMMA Group
GPGP: General
Purpose Computation using Graphics Hardware
Publications
- A
Cache-Efficient Sorting Algorithm for Database and Data
Mining
Computations using Graphics Processors,
Naga K. Govindaraju, Nikunj Raghuvanshi, Michael Henson, Dinesh Manocha
UNC Tech. Report 2005
- Fast and
Approximate Stream Mining of Quantiles and Frequencies Using Graphics
Processors,
Naga K. Govindaraju, Nikunj Raghuvanshi, Dinesh Manocha
ACM
SIGMOD 2005.
- Interactive
Visibility Ordering and Transparency Computations among Geometric
Primitives in Complex Environments,
Naga K. Govindaraju, Michael Henson, Ming Lin, Dinesh Manocha
ACM
Symposium
on Interactive 3D Graphics and Games 2005.
- Vis-Sort:
Fast Visibility Ordering of 3-D Geometric
Primitives,
Naga K. Govindaraju, Ming Lin, Dinesh Manocha
UNC Tech. Report 2004.
©2003 Department of Computer Science, UNC
Chapel Hill