GP2
ACM Workshop on General Purpose Computing on Graphics Processors

Sponsored by and Co-located with ACM SIGGRAPH
August 7-8, 2004
Wilshire Grand Hotel, Los Angeles, California


The workshop program is here.


Download The Official GP2 Workshop Proceedings (11.2 MB PDF).

Poster Session

A list of poster presentations is here.

Panel Discussion

Title: "GPUs and CPUs: The Uneasy Alliance?"

Panel Slides (4MB PPT)

Moderator: Peter N. Glaskowsky (MicroDesign Resources)

The invited panelists are:

  • Mike Doggett (ATI Technologies)
  • Dave Kirk (NVIDIA)
  • Adam Lake (Intel Corporation)
  • Bill Mark (University of Texas at Austin)
  • Neil Trevett (3Dlabs)

Panel Description

Today's high-end 3D chips are fully programmable microprocessors with extraordinary computational power. On suitable tasks--such as those associated with 3D rendering--these graphics processing units (GPUs) are orders of magnitude faster than conventional CPUs. We have invited representatives from the four most important providers of 3D chips, and a leading academic expert on this subject, to discuss how GPUs and CPUs will both cooperate and compete as computational resources within personal computers and workstations.

Panelist Bios

Neil Trevett is Senior Vice President for Market Development at 3Dlabs, Inc. Trevett also serves as President of the Web3D Consortium and secretary of the Khronos Group developing the OpenML and OpenGL ES standards for dynamic media processing and graphics APIs for embedded appliances and applications.

Michael Doggett is an architect at ATI, where he works on upcoming graphics hardware for Microsoft and on desktop PC graphics chips. Before joining ATI, Doggett was a postdoctoral researcher at the University of Tübingen in Germany; he completed his Ph.D. at the University of New South Wales in Sydney, Australia.

Adam Lake is a Sr. Software Engineer at Intel specializing in 3D graphics. Previous areas of work include stream processing, compilers for high level shading languages, and non-photorealistic rendering. He holds an M.S. degree from the University of North Carolina at Chapel Hill.

David Kirk has been NVIDIA's Chief Scientist since January 1997. Prior to joining NVIDIA, Kirk held positions at Crystal Dynamics and the Apollo Systems Division of Hewlett-Packard Company. Kirk holds M.S. and Ph.D. degrees in Computer Science from the California Institute of Technology.

Bill Mark is an assistant professor in the Department of Computer Sciences at the University of Texas at Austin. Mark was the lead architect of NVIDIA's Cg language and development system. He holds a Ph.D. from the University of North Carolina at Chapel Hill.

Invited Speakers

The following speakers have been confirmed for GP2.

Talk Abstracts

The All-Purpose Unit (APU) based on a tiled-processor architecture: A Logical Successor to the CPU, GPU, and NPU?

Anant Agarwal, CSAIL, Massachusetts Institute of Technology

Users expect computers to double in performance every 3 years. Keeping to this growth schedule has become challenging as microprocessors face serious technological constraints related to wire delays and power. The potential for performance growth, however, certainly exists, because technology continues to offer exponentially more transistors per chip. If it is achieved, the performance growth will in turn enable even more applications such as graphics and signal processing to be run effectively with general purpose processors.

This talk will introduce an approach called tiled processor architecture (TPA) to meet these challenges, one that takes advantage of the increasing number of transistors to run a wide class of applications, including sequential ILP codes and streams. The talk will discuss whether tiled-processor architectures can enable a new kind of polymorphic processor, or APU, which attempts to combine the best of the CPU and GPU worlds. It will also describe a single-chip TPA prototype built at MIT, the Raw microprocessor, and introduce new compilation techniques that take full advantage of such TPAs. Results from the VersaBench application suite, which includes several desktop and embedded programs, will be presented.

On the Power of Streaming Table-Lookup

Frederick P. Brooks, Jr., University of North Carolina at Chapel Hill

Streaming table-lookup was first introduced as the three Convert operations in the IBM 709 in 1957. This is a powerful and flexible way to process data, especially when only one or two transformations are to be performed on each datum but the set of possible transformations is large. I submit that much of the flexibility and power of GPUs comes from exactly this capability.

An interesting computer to exploit this concept was the IBM Harvest, the Model 7950, delivered to the NSA in 1962 and retired only after fourteen years of active use. I will sketch its organization and give examples of how streaming table-lookup can be used.

Streaming table-lookup becomes substantially more powerful when the result of a particular lookup not only goes into the output stream but can also change the base table location for the next lookup; that is, the lookup carries process state.
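The stateful variant described above can be sketched in a few lines of Python (an illustrative model, not Harvest or 709 code): each table entry yields an output and also names the table to use for the next datum, so the set of tables encodes a finite-state transducer.

```python
# Streaming table-lookup with state: each lookup emits an output and
# selects the table used for the next lookup.

def stream_lookup(data, tables, start=0):
    """Each table maps a datum -> (output, next_table_index)."""
    state = start
    out = []
    for d in data:
        o, state = tables[state][d]
        out.append(o)
    return out

# Two tables implementing "pass characters through, but toggle
# uppercase mode whenever '*' is seen":
T0 = {c: (c, 0) for c in "abc*"}
T0["*"] = ("", 1)            # '*' switches to uppercase mode
T1 = {c: (c.upper(), 1) for c in "abc*"}
T1["*"] = ("", 0)            # '*' switches back

print("".join(stream_lookup("ab*ca*b", [T0, T1])))  # -> abCAb
```

A single pass over the stream performs an arbitrary per-datum transformation chosen from a large table set, which is the flexibility the abstract attributes to GPUs.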

"Compiler Support for GPUs: Challenges, Obstacles, and Opportunities"

Keith D. Cooper, Rice University

Graphics processors are high-speed, domain-specific processors. Consumer demand for graphical interfaces has produced an intensely focused design effort for these devices, resulting in increasing processing power and decreasing prices. Simple economics compel us to examine the opportunities to use GPUs in general-purpose computation.

To make GPUs useful for general-purpose computation, we need

  1. mechanisms to identify those portions of a computation that are well suited to execution on the GPU;
  2. tools that produce high-quality code for those program segments and link the code into an executable form; and
  3. tools to debug (both performance and correctness) the resulting programs.

This talk will survey the technical challenges that arise in attacking these problems and the available solutions to them. It will discuss the infrastructure issues that make addressing these problems difficult. It will suggest ways that the community can improve the likelihood of a successful attack on these problems.

"Stream Processors vs. GPUs"

Bill Dally, Stanford University

GPUs have been evolving from fixed-function pipelines toward programmable stream processors. This evolution, however, still has a considerable way to go. While modern GPUs have considerable programmable arithmetic performance, they lack much of the storage and communication facilities of stream processors, making it difficult to exploit some forms of locality. This talk will describe stream processors, contrast them with GPUs, and suggest how the evolution can be completed by augmenting the storage and communication of a GPU.

"Utilizing Commercial Graphics Processors in the Real-Time Geo-Registration of Streaming High-Resolution Imagery"

Laurence Flath, Michael Kartz, and Randy Frank
Lawrence Livermore National Laboratory

There are numerous image- and data-processing applications that are computationally constrained to the point that real-time results are not possible. Comparing data recorded from diverse platforms requires mapping it into a common format, which generally demands significant computational resources and is thus often relegated to a post-processing role. To provide the performance required for automated real-time information extraction from surveillance imagery, the huge data volumes generated by such platforms must be reduced to accommodate realistic analysis, local storage capacity, and transmission bandwidths. Here we present the results of our investigation into mapping the algorithms required for real-time geo-registration of large streaming imagery onto a GPU. We have demonstrated that these algorithms map well onto the processing units available in today's GPUs, and that many of the operators required for these computations are available in commercial graphics processors. We show that novel uses of the high memory bandwidth and massive parallelism of commodity GPUs can overcome many processing limitations.

"Stream Programming Environments"

Pat Hanrahan, Stanford

GPUs are examples of massively parallel chip multiprocessors. How should we program these cheap, ubiquitous, high-performance processors? One approach is to use a graphics API such as OpenGL or DirectX; at the other extreme are threads or communicating sequential processes. As an alternative, we advocate classic data-parallel programming, enhanced to support modern stream processors. In this talk I will discuss different approaches to programming GPUs, describe our work on the Brook programming environment, and describe several applications ported to GPUs using Brook.
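The data-parallel model the abstract advocates can be sketched in plain Python (not actual Brook syntax): a "kernel" is a pure function applied elementwise across input streams, and a reduction collapses a stream to a single value; both are trivially parallelizable, which is what makes the model fit GPUs.

```python
# Minimal sketch of the stream model: elementwise kernels and reductions.

def map_kernel(kernel, *streams):
    # Apply the kernel to corresponding elements of each input stream.
    return [kernel(*elems) for elems in zip(*streams)]

def reduce_kernel(kernel, stream):
    # Pairwise-combine a stream down to a single value.
    acc = stream[0]
    for x in stream[1:]:
        acc = kernel(acc, x)
    return acc

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
saxpy = map_kernel(lambda x, y: 2.0 * x + y, a, b)  # -> [12.0, 24.0, 36.0, 48.0]
total = reduce_kernel(lambda u, v: u + v, saxpy)    # -> 120.0
```

Because each kernel invocation is independent, a runtime is free to execute them across all of a GPU's processing units, which is the essence of the approach Brook takes.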

"Using GPUs as CPUs for Engineering Applications: Challenges and Issues"

Michael A Heroux, Sandia National Laboratories

Although production-quality engineering applications are very complex, a few key computational kernels can provide insight into the performance potential of a computer system on a broad set of these codes. In this presentation we provide an overview of some key kernels used in engineering applications, discussing the computational and memory-access behavior of these operations. We also discuss how these kernels are combined to provide solvers for applications, often the most time-consuming phase of an application run. Finally, we briefly present some issues related to using 32-bit floating-point hardware for these applications.
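As one concrete example of the kind of kernel the abstract refers to (chosen here for illustration; the talk's actual kernel set is not listed), sparse matrix-vector multiply in compressed sparse row (CSR) form is a staple of engineering solvers, and its irregular, memory-bound gathers from the source vector are exactly the access behavior that stresses a machine's memory system.

```python
# Sketch of a representative solver kernel: CSR sparse matrix-vector multiply.

def csr_spmv(row_ptr, col_idx, vals, x):
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]   # gather: indirect load from x
        y.append(acc)
    return y

# 2x2 example: [[2, 0], [1, 3]] times [1, 2] gives [2, 7]
print(csr_spmv([0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0], [1.0, 2.0]))
```

Each nonzero contributes one multiply-add but also one indirect load, so delivered memory bandwidth, not arithmetic rate, usually bounds this kernel's performance.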

"The Case for Asymmetric Multiprocessor Architecture"

Kai Li, Princeton University

Graphics processors have presented CPU-intensive applications with new opportunities at relatively low cost. However, they have several limitations, including relatively low clock frequency, relatively low communication bandwidth with the main processor, and relatively small memory. This talk describes a new research project at Princeton that investigates architectural and system issues in designing an Asymmetric Chip MultiProcessor (ACMP), built from many heterogeneous processing elements, that supports a combination of a threaded programming model and data parallelism.

"GPUs: Economic Attraction and Performance Challenges"

Dan Reed, University of North Carolina at Chapel Hill

Each generation of computing technology -- mainframes, minicomputers, workstations, PCs and mobile devices -- has broadened the base of users. Graphics processors are the latest in this long history, albeit designed for specialized tasks. We will examine the tension between economics and mass marketing on the one hand and balanced system design for scientific computing on the other. The talk will conclude with some insights from application porting and performance measurement on the Sony PlayStation2.

"GPU Requirements for Large Scale Scientific Applications"

Mark Seager, Lawrence Livermore National Lab

The development and use of large-scale scientific applications places heavy demands on the hardware and software environments. For a wide range of scientific applications to be ported to GPUs, the code-development and runtime environments are critical. For instance, a standards-based programming approach to dividing work between the CPU and GPUs should be adopted. Also, automatic and manual methods for sharing data among the GPU, the CPU, and high-performance networking are essential. Basic hardware requirements, such as 64-bit floating-point arithmetic and delivered memory bandwidth, can also be barriers to adopting GPUs. In this talk we discuss the critical hardware and software issues surrounding the use of GPUs for large-scale scientific application development.

"Graphics Memory for General Applications"

Turner Whitted, Microsoft Research

The factors of memory size, bandwidth, and latency that limit the performance of general-purpose processors are just as much of a limiting factor for GPUs. However, there are numerous differences in the ways that GPUs and CPUs address and access memory, and those, just as much as physical factors, dictate the form of general-purpose applications on GPUs. This talk will discuss both the physical characteristics of memory and the behavior of a few sample applications programmed for GPUs and operating on data in graphics memory.