Fig. 1. Our unified lossless compression/decompression system can handle any type of data. We were able to obtain bandwidth savings of 1.5x, 7.7x, and 3.3x for the color, depth, and geometry data, respectively, shown above.
Abstract
In this work, we explore the lossless compression of 32-bit floating-point buffers on graphics hardware. We first adapt a state-of-the-art 16-bit floating-point color and depth buffer compression scheme for operation on 32-bit data and propose two specific enhancements: dynamic bucket selection and a Fibonacci encoder. Next, we describe a unified codec for any type of floating-point buffer: color, depth, geometry, and GPGPU data. We also propose a method to further compress variable-precision data. Finally, we test our techniques on color, depth, and geometry buffers from existing applications. Using our enhancements to an existing technique, we have improved bandwidth savings by an average of 1.26x. Our unified codec achieved average bandwidth savings of 1.5x, 7.9x, and 2.9x for color (including buffers incompressible by past work), depth, and geometry buffers. Even higher savings were achieved when combined with our variable-precision technique, though specific ratios will depend on the tolerance of the application to reducing its precision.
Proposed Enhancements
Dynamic Bucket Selection
As buffers are compressed, they are split into chunks of uniform input size to retain the ability to randomly access the contents. These input chunks can be compressed to different degrees, depending on their contents. Though any compression rate can be seen for any chunk, the compressed outputs are stored in memory segments of one of several discrete sizes to accommodate memory access atoms. We refer to these sizes as "compression buckets."
In past approaches, these bucket sizes have been chosen by the hardware designer to be 25% and 50% of the input chunk size. This does not seem to have caused any problems for 16-bit floating-point numbers, but we found that compressing 32-bit floating-point numbers does not always lead to similar reductions in size. Further, different data sets may not be able to take advantage of such optimistic buckets. So, we propose selecting the buckets used for a given input buffer dynamically, allowing the contents to dictate the bucket sizes and leading to higher effective compression rates.
We allow three buckets to be chosen for any buffer, in increments of 1/8th of the original chunk size. One final bucket, required for all buffers, captures any chunks that cannot be compressed at all. The buckets are chosen for each buffer greedily: as soon as a chunk is compressed that is not tightly bounded by an already-chosen bucket, the tightest bucket is added to the list of buckets for the active buffer. If three buckets have already been chosen, then the chunk is assigned the best-fitting of the available buckets. A "buffer map" keeps track of the bucket assigned to each input chunk of the buffer.
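The greedy selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the hardware implementation: the function name `select_buckets`, the representation of buckets as a count of eighths, and the reading of "best-fitting" as "smallest chosen bucket that still holds the chunk" are all assumptions made here.

```python
def select_buckets(compressed_sizes, chunk_size, max_buckets=3):
    """Greedily pick up to `max_buckets` bucket sizes for one buffer.

    Buckets are expressed in eighths of `chunk_size`; bucket 8 is the
    mandatory "uncompressed" bucket. Returns (buckets, buffer_map), where
    buffer_map[i] is the bucket assigned to chunk i.
    (Hypothetical helper; names and units are assumptions of this sketch.)
    """
    chosen = []       # dynamically selected buckets, in eighths
    buffer_map = []   # one entry per input chunk
    for size in compressed_sizes:
        # Tightest bucket that holds `size` bytes: ceil(size * 8 / chunk_size),
        # clamped to the range 1..8.
        tightest = min(8, max(1, -(-size * 8 // chunk_size)))
        if tightest == 8:
            buffer_map.append(8)              # incompressible chunk
        elif tightest in chosen:
            buffer_map.append(tightest)       # already tightly bounded
        elif len(chosen) < max_buckets:
            chosen.append(tightest)           # greedily add a new bucket
            buffer_map.append(tightest)
        else:
            # All dynamic buckets are taken: use the smallest one that fits,
            # falling back to the uncompressed bucket.
            fitting = [b for b in chosen if b >= tightest]
            buffer_map.append(min(fitting) if fitting else 8)
    return sorted(chosen), buffer_map


# Six 64-byte chunks whose compressed sizes are given in bytes.
buckets, buffer_map = select_buckets([10, 30, 64, 50, 20, 5], chunk_size=64)
print(buckets)      # [2, 4, 7]
print(buffer_map)   # [2, 4, 8, 7, 4, 2]
```

In this example the fifth chunk would fit a 3/8 bucket, but three buckets have already been chosen, so it lands in the 4/8 bucket. This shows how the order in which chunks arrive influences the greedy result.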
Fibonacci Encoder
We explored the use of a Fibonacci encoder as a replacement for the standard unary encoder in one step of Golomb-Rice encoding, a common compression method for numeric data. Fibonacci encoding is well suited to this purpose: like unary encoding, it results in an instantaneous code that satisfies the prefix condition, and it also maps smaller-valued inputs to shorter code words. It differs in two ways. First, its code words for larger values are shorter than the corresponding code words produced by a unary encoder. Second, it is slightly more complicated, though its hardware implementation is not prohibitively expensive, as we detail in the full paper.
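To make the comparison concrete, the following Python sketch encodes the quotient of a Rice code either in unary or with the standard Fibonacci code (Zeckendorf representation, least-significant term first, terminated by an extra 1). The function names, the bit-string output, and the choice to encode `quotient + 1` (Fibonacci codes are only defined for positive integers) are assumptions of this sketch, not details taken from the paper.

```python
def fibonacci_code(n):
    """Fibonacci code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:                 # grow until the last term exceeds n
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    remainder = n
    for f in reversed(fibs[:-1]):        # greedy Zeckendorf decomposition
        if f <= remainder:
            bits.append("1")
            remainder -= f
        else:
            bits.append("0")
    # Emit smallest term first, then the terminating 1, so every code word
    # ends in "11" and no "11" appears earlier.
    return "".join(reversed(bits)) + "1"


def unary_code(q):
    """Unary code of a non-negative integer: q ones followed by a zero."""
    return "1" * q + "0"


def rice_encode(value, k, use_fibonacci):
    """Rice code with parameter k; the quotient prefix is unary or Fibonacci."""
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    prefix = fibonacci_code(quotient + 1) if use_fibonacci else unary_code(quotient)
    suffix = format(remainder, "0{}b".format(k)) if k > 0 else ""
    return prefix + suffix


print(rice_encode(37, 2, use_fibonacci=False))  # 111111111001 (12 bits)
print(rice_encode(37, 2, use_fibonacci=True))   # 01001101     (8 bits)
```

For very small quotients the Fibonacci prefix is slightly longer than the unary one (a quotient of 0 costs two bits instead of one). The gain comes from the larger quotients, where the Fibonacci code word grows roughly logarithmically rather than linearly.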
Dynamic Range Reduction
In our past work, we have seen that significant energy savings can be realized by reducing the precision of vertex and pixel shaders' arithmetic. With this reduced precision, there is no need to move or store the bits that do not add any information to the final results. So, we propose adding a dynamic range reduction step to any given compression scheme to ignore these bits. In our approach, since we interpret floating-point numbers as integers, the unused mantissa bits appear as the LSBs of the integers. By shifting these integers to the right to truncate these bits, we effectively reduce the range of the integer values, which reduces the magnitude of their differences and, finally, their compressed size.
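The effect of the shift can be illustrated with a short Python sketch. It reinterprets 32-bit floats as unsigned integers and drops a caller-chosen number of low mantissa bits. The helper names and the unsigned little-endian reinterpretation are assumptions made for illustration; how the number of unused bits is determined is not covered here.

```python
import struct


def float_to_bits(x):
    """Reinterpret a value as the 32-bit pattern of its float32 encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def reduce_range(values, dropped_bits):
    """Shift out the `dropped_bits` least-significant mantissa bits."""
    return [float_to_bits(v) >> dropped_bits for v in values]


def restore_range(reduced, dropped_bits):
    """Undo the shift; the dropped bits come back as zeros."""
    return [bits_to_float(r << dropped_bits) for r in reduced]


values = [1.0, 1.5, 2.0]
full = [float_to_bits(v) for v in values]
reduced = reduce_range(values, 16)

# Differences between neighbouring values shrink by a factor of 2**16.
print([b - a for a, b in zip(full, full[1:])])        # [4194304, 4194304]
print([b - a for a, b in zip(reduced, reduced[1:])])  # [64, 64]
print(restore_range(reduced, 16))                     # [1.0, 1.5, 2.0]
```

The values in this example survive the round trip exactly because their low 16 mantissa bits are already zero. Inputs that use those bits are truncated, which is why the step is only appropriate when the application tolerates the reduced precision.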
Unified Compressor
We believe that it will very soon become important for GPUs to be able to compress any given floating-point buffer; the increasing general-purpose use and programmability of the GPU dictate this. To this end, we have designed a unified compressor that can handle color, depth, geometry, and GPGPU data. Its goal is to enable compression of data sets that past approaches have had to leave by the wayside: those with negative inputs, alpha channels whose values are not 1.0f at every point, and general buffers without a clear tile-based layout, such as geometry buffers or GPGPU data.
Results
We have designed a general-purpose compression and decompression scheme for 32-bit floating-point data on graphics hardware. It both outperforms an existing 16-bit compressor adapted to handle 32-bit data and is able to compress general data. We have shown this capability by presenting promising compression rates for geometry data (vertex positions, normals, texture coordinates, etc.) for real-world applications. Average rates for color, depth, and geometry data are 1.5x, 7.9x, and 2.9x, respectively.
Furthermore, we have proposed two novel techniques applicable to any hardware compression scheme: dynamic bucket selection and the use of a Fibonacci encoder. These proposals increased compression ratios by averages of 1.25x and 1.06x, with maximum improvements of 2.4x and 1.7x, respectively. Note that these figures are not raw compression rates; they also take quantized storage into account. So, an average improvement of 1.25x (for example) should not be read as every tile improving by 1.25x, but as several tiles remaining unchanged and several others improving by 2x.
Lastly, we have shown that extra savings are available by using range reduction on variable-precision data. The additional savings will depend on the specific application, but are expected to be between 5% and 20%, for overall color, depth, and geometry compression rates of 1.9x, 10.7x, and 3.6x, respectively.



