From Vertices to Pixels: the Transformation Pipeline
Kenny Hoff
6/17/97
One important problem that has been glossed over throughout this paper
is the complex interplay between the different coordinate systems at work
in a typical transformation pipeline. Typically (as in OpenGL), a 3D vertex
in world coordinates goes through the following sequence of transformations
(excluding modeling transformations that take model coords to world coords):
-
Extend the given 3D vector (x,y,z) into homogeneous space by adding a w=1
component: (x,y,z)=>(x,y,z,1)
-
Transform the vector by the current composite transformation matrix C which
is composed of all modeling, viewing, and normalized perspective depth
transformations (does not include viewport transformation): (x,y,z,1)=>(x',y',z',w')
-
Perform homogeneous normalization by dividing through by w' and dropping the w
component to obtain 3D NDC coordinates, where -1<=x'',y'',z''<=1: (x',y',z',w')=>(x'/w',y'/w',z'/w',w'/w')=>(x'',y'',z'',1)=>(x'',y'',z'')
-
Perform the viewport transformation by scaling and translating the (x'',y'') portion
of the NDC coordinates to form pixel coordinates: (x'',y'',z'')=>(x''',y''',z'').
z'' remains the same and becomes the associated Z-value for depth comparisons
(depending on the implementation, this Z-value is sometimes inverted or
scaled and then stored).
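The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (mat_vec4, vertex_to_pixel) are hypothetical, the composite matrix C is passed in as a row-major list of rows, and the viewport mapping (NDC [-1,1] onto [0,width-1] and [0,height-1] with a top-left origin) is one plausible choice rather than the mapping of any particular implementation.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix (list of 4 rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def vertex_to_pixel(vertex, composite, width, height):
    """Run one 3D vertex through the four pipeline steps.

    Returns (pixel_x, pixel_y, ndc_z). Assumes w' is nonzero, i.e. the
    vertex has already survived clipping.
    """
    x, y, z = vertex
    # Step 1: extend to homogeneous space with w = 1.
    homogeneous = [x, y, z, 1.0]
    # Step 2: apply the composite matrix C (no viewport transform yet).
    xc, yc, zc, wc = mat_vec4(composite, homogeneous)
    # Step 3: homogeneous normalization, then drop w to get NDC.
    x_ndc, y_ndc, z_ndc = xc / wc, yc / wc, zc / wc
    # Step 4: viewport transform (assumed mapping, top-left origin,
    # so y is flipped). z is passed through unchanged.
    pixel_x = (x_ndc + 1.0) * 0.5 * (width - 1)
    pixel_y = (1.0 - y_ndc) * 0.5 * (height - 1)
    return pixel_x, pixel_y, z_ndc


# Usage: a matrix that only sets w' = 2, to make the divide visible.
C = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
result = vertex_to_pixel((1.0, -1.0, 1.0), C, 101, 51)
```

With this C the vertex (1,-1,1) becomes (1,-1,1,2) after step 2 and (0.5,-0.5,0.5) after the divide, so the divide by w' is what pulls the point toward the center of the viewport.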
To summarize, the different coordinate systems and possible uses are listed:
-
3D world coordinate system (x,y,z): modeling and scene description
-
4D normalized homogeneous system (x,y,z,1): more convenient for
4x4 matrix transformations that include translations and perspective
-
4D homogeneous system after composite transformation (x,y,z,w):
useful for efficient clipping. Values for x, y, and z should be clipped
to the range [-w,w] so that after the perspective divide x, y, and z are
in [-1,1]. To see why: we need -1<=x/w<=1 after the perspective divide;
multiplying through by w (assuming w>0) gives -w<=x<=w before the divide.
-
3D NDC (Normalized Device Coordinates) fitting in ([-1,1],[-1,1],[-1,1])
(x,y,z): provides efficient means for viewport scaling of the (x,y) values
and the Z-values are normalized for greater precision.
-
3D screen space (pixel space) (x,y,z): rasterization and Z-buffered
hidden surface removal. (x,y) is the pixel location in ([0,ScreenWidth-1],[0,ScreenHeight-1])
with (0,0) being the top-left corner (unlike OpenGL, where the origin is the bottom-left corner).
The Z-value here is a floating point number in [-1,1] and is scaled to
fit within [0,1] for "maximum precision" with the following operation:
NewZ=(OldZ+1)*0.5 (translation of +1 followed by a scaling of
0.5); the resulting normalized Z is depth-compared and stored in the Z-buffer
as necessary.
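Two of the operations in this list are small enough to sketch directly: the clip test in homogeneous space and the remapping of the NDC Z-value to [0,1]. The function names below are hypothetical, and the clip test assumes w>0 (the inequality direction flips for negative w, which a real clipper has to handle separately).

```python
def inside_clip_volume(x, y, z, w):
    """True if -w <= x, y, z <= w, tested before the perspective divide.

    Assumes w > 0; points with w <= 0 are simply rejected here.
    """
    return w > 0 and all(-w <= c <= w for c in (x, y, z))


def ndc_z_to_unit(z_ndc):
    """Map an NDC Z-value in [-1,1] to [0,1]: NewZ = (OldZ + 1) * 0.5."""
    return (z_ndc + 1.0) * 0.5


# Usage: a point that passes the clip test lands in [-1,1] after the divide.
x, y, z, w = 1.0, -2.0, 2.0, 2.0
if inside_clip_volume(x, y, z, w):
    ndc = (x / w, y / w, z / w)      # each component is now in [-1,1]
    unit_z = ndc_z_to_unit(ndc[2])   # and Z is remapped into [0,1]
```

Testing against w before the divide avoids dividing points that will be discarded anyway, which is the efficiency argument made in the list above.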
Fixed-Point Scaling for Z: the value stored in the Z-buffer ranges
from 0 to 1, but typically the Z-buffer contains and compares only fixed-point
integers as opposed to the floating-point values we have computed. PixelFlow
has 32 unsigned bits for depth so the normalized Z values are scaled to
completely cover this range. In fixed-point terminology, the values in
the Z-buffer are in a 0.32 split, meaning that there are 0 whole number
bits and 32 fractional bits. The basic idea is that we must fit the range
[0,1] into [0,2^32-1] where 2^32-1 is the maximum value
obtainable by a 32-bit unsigned integer. So ideally our scaling factor
(Zscale) should be 2^32-1; however, this value cannot be represented exactly
as a single-precision floating point number (it rounds up to 2^32, which no
longer fits in 32 bits). We have to make a tradeoff and take the largest
float below it as our Zscale: 4294967040 (that is, 2^32-256). All Z-compares and writes
are performed in this 0.32 fixed-point system, but if we require a normalized
floating point value again we can simply divide by Zscale.
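The choice of Zscale can be checked with a short experiment. The sketch below uses Python's struct module to round values to single precision; the helper names (to_float32, z_to_fixed, fixed_to_z) are hypothetical, and truncation toward zero when converting to an integer is an assumption, since the text does not say how the conversion rounds. Note also that Python does its arithmetic in double precision, so this only illustrates the idea and does not reproduce hardware behavior bit for bit.

```python
import struct


def to_float32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


MAX_UINT32 = 2**32 - 1   # largest 32-bit unsigned integer
ZSCALE = 4294967040.0    # value given in the text, equal to 2**32 - 256

# 2**32 - 1 is not representable in single precision: it rounds up to
# 2**32, which is one past the largest 32-bit unsigned value.
rounded_ideal = to_float32(float(MAX_UINT32))

# ZSCALE survives the round trip unchanged, so it is representable.
rounded_zscale = to_float32(ZSCALE)


def z_to_fixed(z_unit):
    """Scale a normalized Z in [0,1] to 0.32 fixed point (truncation assumed)."""
    return int(ZSCALE * z_unit)


def fixed_to_z(z_fixed):
    """Recover the normalized floating-point Z by dividing by Zscale."""
    return z_fixed / ZSCALE
```

Single-precision floats between 2^31 and 2^32 are spaced 256 apart, which is why the nearest usable value below 2^32 is 2^32-256. The cost of the tradeoff is that the top 255 codes of the 32-bit range are never produced.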
Depth-Range Scaling for Z: the Z-value before fixed-point scaling
must lie in the range [0,1]; however, before conversion to fixed-point
the Z-values can be scaled and translated to fit inside an interval within
[0,1] specified by the user as [NearRange,FarRange], where NearRange and
FarRange are in [0,1] and NearRange is less than or equal to FarRange.
Taking NormalizedZ to be the NDC Z-value in [-1,1], this calculation is simply:
DepthRangeScaledZ = ((FarRange-NearRange)/2)*NormalizedZ + (NearRange+FarRange)/2
With NearRange=0 and FarRange=1 this reduces to the (OldZ+1)*0.5 scaling given earlier.
The final fixed-point Z-value is then calculated as follows:
FinalZ = Zscale * DepthRangeScaledZ
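As a worked illustration of the two formulas, the sketch below assumes that the Z-value fed into the depth-range formula is the NDC Z in [-1,1], since that is the input range for which the formula produces exactly [Near,Far]. The function names are hypothetical, and truncation to an integer is again an assumption.

```python
ZSCALE = 4294967040.0  # fixed-point scale from the text (2**32 - 256)


def depth_range_scale(z_ndc, near, far):
    """Map an NDC Z in [-1,1] to [near, far], with 0 <= near <= far <= 1."""
    return ((far - near) / 2.0) * z_ndc + (near + far) / 2.0


def final_z(z_ndc, near=0.0, far=1.0):
    """FinalZ = Zscale * DepthRangeScaledZ, truncated to an integer."""
    return int(ZSCALE * depth_range_scale(z_ndc, near, far))


# Usage: restrict depth values to the middle half of the Z-buffer range.
nearest = final_z(-1.0, near=0.25, far=0.75)   # front of the view volume
farthest = final_z(1.0, near=0.25, far=0.75)   # back of the view volume
```

With the default range of [0,1] the scaling is the same as (OldZ+1)*0.5, so the depth-range step can be seen as a generalization of that remapping.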