Pixel-Warping from One View to Another: the Easy
Way
Kenneth E. Hoff III
06/18/98
The goal here is to warp the pixels in one view to the pixels in another view. A view is defined by a camera that specifies a perspective projection and a viewing matrix (for those knowledgeable of OpenGL: we are refering to the GL_PROJECTION matrix and the GL_MODELVIEW matrix minus the modeling part respectively). Typically the complete viewing matrix for an object that takes it from object space to screen space (pixels with depth) is formed as follows:
View = Viewport * Projection * Viewing * Modeling
(We will be assuming that each matrix is premultiplied using column vectors)
So the View matrix takes an object space vertex to the view's pixel screen space. Each matrix is described below:
Modeling: object-space to world-space transformation. World-space is the common area where all objects can be organized for the entire scene. Object-space is the local modeling area; this may be hierarchical for more complex organizations.
Viewing: world-space to eye-space transformation. This is typically defined by the camera's position and orientation (the camera pose). In OpenGL, this is typically defined by the glLookat() matrix.
Projection: eye-space to 4D homogeneous space. This transformation is typically defined by some sort of perspective matrix. Typically, in OpenGL, we use the gluPerspective or the glFrustum call to generate this matrix. Often the perspective divide is performed after this level of transformation to enter NDC space where x,y,z are in [-1,1].
Viewport: technically this takes screen-space to a scaled screen-space that fits the current window viewport. Often this includes a depth transformation as well that takes the NDC [-1,1] z-values and maps them to [0,1]. The spaces transformed between are typically defined by where the perspective divide is performed. It could be done after this transformation as well resulting in a "scaled" NDC space where (x,y) is a pixel value and z is the typical depth value.
We will assume that the perspective divide is performed AFTER the composite View transformation. Clipping is more efficient if the perspective divide is after the projection stage, but it adds the complexity of a post-scaling.
So now that we have defined a complete view matrix, we will now get to our real problem. Given two views each with their own composite viewing matrices View1 and View2, for a given screen-space point (x,y,z) in view 1 we wish to find its values (x',y',z') in view 2. (x,y) refers to the pixel location and z is the pixel's "depth" value. The problem basically reduces to finding the view 1 point (x,y,z) in view 2 by a simply 4x4 matrix transformation. We obtain a composite "warping" matrix M that takes us from view 1 screen-space to view 2 screen-space as follows:
Warping Matrix M = View2 * View1^-1
The inverse of View1 transforms from View1 screen-space to object-space and View2 transforms from object-space to View2 screen-space, so the transformation proceeds as follows: View1 screen-space ==> object-space ==> View2 screen-space. So warping proceeds as follows:
[x' y' z']^T =/w= M * [x y z]^T
=/w= means equivalent after perspectively dividing: [x y z w] => [x/w y/w z/w 1] To transform all "pixels" from view 1 to view 2 we must perform a matrix-vector multiply and a perspective divide for all pixels in the source image - YIKES! This results in 12 multiplies, 12 adds, and 3 divides per pixel! We can take advantage of coherence to reduce this dramatically. In order to transform all pixels we will step through each value of x for each value of y. The looping structure will look like this:
for y = 0 to NumPixelsY-1
for x = 0 to NumPixelsX-1
{
(1) perform matrix-vector multiply for point (x,y,z)
(2) perform perspective divide
}
If we expand the matrix-vector multiply, we can see values that we can precompute for each row:
[ x' ] [ M00 M01 M02 M02 ] [ x ] [ y' ] = [ M10 M11 M12 M12 ] * [ y ] [ z' ] [ M20 M21 M22 M22 ] [ z ] [ w' ] [ M30 M31 M32 M32 ] [ 1 ] [ x' ] = [ M00*x + M01*y + M02*z + M03 ] [ y' ] = [ M10*x + M11*y + M12*z + M13 ] [ z' ] = [ M20*x + M21*y + M22*z + M23 ] [ w' ] = [ M30*x + M31*y + M32*z + M33 ]
First of all, obviously we cannot precompute the z term since it potentially changes for every pixel. The fourth term is constant always and the y term is constant over a row of pixels. The x term changes for every pixel but only by 1 since x is incremented by 1 for each pixel. So for each row y, we can precompute the sum of the second and fourth terms. The inner-loop becomes:
for y = 0 to NumPixelsY-1
{
// PRECOMPUTE VALUES THAT STAY CONSTANT FOR A ROW
M01y_M03 = M01*y + M03;
M11y_M13 = M11*y + M13;
M21y_M23 = M21*y + M23;
M31y_M33 = M31*y + M33;
for x = 0 to NumPixelsX-1
{
(1) perform matrix-vector multiply using precomputed values
[ x' ] = [ M00*x + M02*z + M01y_M03 ]
[ y' ] = [ M10*x + M12*z + M11y_M13 ]
[ z' ] = [ M20*x + M22*z + M21y_M23 ]
[ w' ] = [ M30*x + M32*z + M31y_M33 ]
(2) perform perspective divide
}
}
This reduces the computation (except for the first column of pixels) to only 8 mults, 8 adds, and 3 divides per pixel.
With a little more effort, we can also eliminate the x term. Let's trace a the first few elements of the sequence of x terms (for x from 0 to 2): M00*0, M00*1, M00*2. Each element in the sequence differs only by M00. Each subsequent x term is equivalent to the previous x term incremented by it corresponding matrix element M*1. We can simply update the precomputed values after using them:
for y = 0 to NumPixelsY-1
{
// PRECOMPUTE VALUES THAT STAY CONSTANT FOR A ROW
// x TERM IS INCLUDED, SO VALUES HAVE TO BE UPDATED AFTER USE FOR X+1
// x STARTS AT 0 SO THE TERM IS ZERO (NO NEED TO COMPUTE).
M00x_M01y_M03 = M01*y + M03;
M10x_M11y_M13 = M11*y + M13;
M20x_M21y_M23 = M21*y + M23;
M30x_M31y_M33 = M31*y + M33;
for x = 0 to NumPixelsX-1
{
(1) perform matrix-vector multiply using precomputed values
[ x' ] = [ M02*z + M00x_M01y_M03 ]
[ y' ] = [ M12*z + M10x_M11y_M13 ]
[ z' ] = [ M22*z + M20x_M21y_M23 ]
[ w' ] = [ M32*z + M30x_M31y_M33 ]
but we must now update the precomputed values for x+1
M00x_M01y_M03 += M00;
M10x_M11y_M13 += M10;
M20x_M21y_M23 += M20;
M30x_M31y_M33 += M30;
(2) perform perspective divide
}
}
This reduces the computation down to an amazing 4 mults, 4 adds, and 3 divs per pixel (again, except for the first column of pixels).
In summary, we must be concerned about error accumulation. In the second version containing the x term, we do not accumulate any error since we only store precomputed values and nothing is incremental. In the final version, we accumulate error for each new x-value along a row since we are incrementally updating the x-term. Be warned!