Paranoia [1] is the name for a program written by William Kahan in the
early 80’s. It was designed to characterize floating-point behavior of computer
systems. The goal of this project is to characterize the floating-point behavior
of graphics hardware, and in fact adopts some of the tests from the original
Paranoia
program.
Floating-Point?
A typical binary floating-point representation would be:
The sign is a single bit. The significand (or mantissa) and exponent are each represented by a fixed number of bits. The important thing to note is that this represents a finite set, not a continuous number system.
You can add, subtract, multiply, and divide, but not in the abstract mathematical sense. This is because the set of floating-point numbers is not closed over these operations. For example: Assume a 4 bit significand. We will ignore the issue of the implied leading one for now, and use one of the bits to store it explicitly. We will use a subscript to indicate the base of the representation. Let’s try the following addition:
The problem is that the exact result cannot be represented with a 4 bit significand. The closest we can come is either (17)10 = (1.001 x 24)2 or (18)10 = (1.010 x 24)2.
In 1987 the IEEE developed a standard for this choice
[2]
The rule is to round to the nearest value,
and in the case of a tie (as in the above case), round to whichever value is
even. Therefore, the result would be
18. The error in the answer, referred to
as rounding error
is 1/2 ulp (unit in last place).
In this case the ulp is one, so the error is
0.5, which is the maximum error you can get for this particular exponent.
CPUs are designed to follow the IEEE standard. GPUs do not.
This is the very question we are trying to answer. There is currently no standard in place for GPUs. Additionally, GPU manufacturers won’t even tell us what they do. So our only recourse is to figure it out for ourselves. Below are the results for two abstract systems and two real systems.
Table 1. Floating-Point Error in ulps* |
||||
Operation |
Exact Rounding |
Chopped |
R300/arbfp** |
NV35/fp30** |
Addition |
[-0.5,0.5] |
(-1,0] |
[-1.000, 0.000] |
[-1.000, 0.000] |
Subtraction |
[-0.5,0.5] |
(-1,1) |
[-1.000, 1.000] |
[-0.750, 0.750] |
Multiplication |
[-0.5,0.5] |
(-1,0] |
[-0.989, 0.125] |
[-0.782, 0.625] |
Division |
[-0.5,0.5] |
(-1,0] |
[-2.869, 0.094] |
[-1.199, 1.375] |
* Note that the R300 has a 16 bit significand, where as the NV35 has 23 bits. Therefore 1 ulp on an R300 is equivalent to 27 ulps on an NV35. ** Results obtained with Cg version 1.2.1, ATI driver 6.14.10.6444, NVIDIA dreiver 56.72 on a Windows/OpenGL platform |
It turns out that neither of these GPUs have an instruction for division. Instead, it is performed by a reciprocal and a multiply, so error is actually incurred twice. Also, division is more expensive to perform, and is therefore more likely to involve aggressive approximation relative to other operations.
We measure the error empirically. To exhaustively try all combinations of all possible floating-point numbers for all four operations would be impractical. Instead, we chose a subset of floating-point numbers that we believe does a reasonable job of characterizing the entire set. This is an approach used by others for testing correct operation of floating-point hardware. Schryer, for example, selected an important set of significands for testing floating-point [3]. Since we are interested in measurement rather than yes/no answers, we expanded the set and included additional exponents in order to push the limites of round-off error and cases where the most work must be performed, such as extensive carry propagation. The significands and exponents are given below.
significand patterns:
1.00...010... (all zeros except a single one, or Schryer's Type 1
)
1.10...010... (same as previous, but with additional leading one)
1.11...101... (all ones except a single zero)
1.00...01... (zeros then ones)
1.10...01... (same as previous, but with additional leading one)
1.11...10... (ones then zeros, or Schryer's Type 2
)
relative exponents:
No, there's plenty more to consider. First of all, there are a number of factors that affect floating-point behaviour here: hardware, driver, compiler, host sytem, etc. This is why we developed a tool that can be easly run whenever needed.
Semantics for programming GPUs currently allow for considerable leeway in how a program is implemented. Instructions can be re-ordered. Subexpressions involving constants or "uniform" parameters may be evaluated on the CPU. Associative and distributive properties, which do not hold for floating-point operations, may be applied in optimization. Our tool does not take into consideration the kinds of optimizations possible in larger program contexts.
The original Paranoia uses six tests for each
operation. The tests are chosen to be tough
cases for exact rounding, or failing that, chopping.
Therefore, the results are seems to be
rounded correctly
, seems to be chopped
or neither
.
There are further tests for a sticky bit, but
those tests are only used if rounding looks good up to that point.
Note that in the case of exact rounding or chopping, you know
the error bars. They are given in Table
1. If the result is neither
, you can’t
say anything about the accuracy of the floating-point operations (although
there are earlier checks to make sure the accuracy is at least reasonable).
A good place to start is What Every Computer Scientist Should Know About Floating-Point Arithmetic, which is listed in the references [4]. I’ve also included a link to the extended abstract that will go into the GP2 Proceedings [5].
Of course! I’ll give you source code and windows executables, but there are a few warnings here.
In general this is the raw code and executables I used to get numbers. There’s nothing nice about it! Repercussions follow:
assemblythat is produces. I'll leave issues of what is meant by
assemblyfor another day.
I may do some cleaning and updating in the future, but that will depend partly on whether this turns out to be useful.
Now that you've been properly warned, here's the info you need to use it: