Understanding Projective Textures
Correcting the Double-Perspective Distortion
Kenneth E. Hoff
11/19/97, updated 6/14/98

Projective texturing is nothing more than the classic problem of perspectively-correct texture-mapping, but with a twist:

The problem:

From a given view we store the following:

From this information, we would like to reconstruct the image from a new view (standard image-warping through the use of a dense mesh). This is easy since the mesh is so dense; we can simply assign a pixel color to each vertex and turn on smooth shading. Then we need only apply a back-projection to transform the screen-space coords back into object-space coords. Then we have normal object-space geometry that can be rendered from the new "warped" view.

This is simple for the original mesh since we have a vertex for each pixel in the original snapshot image. However, when we apply simplification to the mesh, a simple assignment of vertex colors is insufficient. We then must resort to a texture-mapped mesh where each vertex is assigned (s,t) coordinates by directly copying over the original (x,y) components of the screen-space coords. This will also work fairly well for slightly simplified meshes, but as the polygons grow larger in size from increased simplification a perspective texture distortion will become apparent across the interpolated regions of each of the larger triangles. The texture coords will be correct at the vertices, but will not be interpolated correctly across the faces.

This is due to the "double-perspective" effect caused by texture-mapping from an image that has already been projected (perspectively-correct). In order to properly map this image back onto the mesh geometry, we must unproject and undo the original perspective mapping. We can do this in several equivalent ways, all of which amount to nothing more than standard projective textures.

The problem arises because we are assigning (x,y,0,1) to the (s,t,r,q) texture coords. In order to perform perspectively correct texture mapping the texture coords must have "q" equal to 1/w where w is the resulting homogeneous coordinate for the perspectively transformed vertex.

The path proceeds as follows: we have an original vertex (x,y,z,1) with tex coords (s,t,0,1). After the 4x4 composite transformation with perspective, the vertex becomes (x', y', z', w') with unaffected tex coords (s,t,0,1). We transform this homogeneous vertex into 3D NDC space through a perspective divide and get (x'/w', y'/w', z'/w', 1), with the tex coords also divided through by w' to give (s/w', t/w', 0, 1/w'). Writing w for w' from here on, each vertex now has an associated 6-vector of the form (x'/w, y'/w, z'/w, s/w, t/w, 1/w), which we can write as (x'', y'', z'', s/w, t/w, 1/w).

Now in order to obtain perspectively-correct texture mapping, each of these six components is linearly interpolated in screen-space along each projected edge, and for each pixel we perform a perspective divide of the current (s,t) tex coords by the current q: ((s/w) / (1/w), (t/w) / (1/w)), which simply gives us (s,t). This effectively cancels the w term and gives us the texture coords corresponding to a correct linear interpolation along the edge in object-space. This could not be achieved directly, since a linear interpolation along an edge in screen-space does not correspond to a linear interpolation along the same edge in object-space. The relationship is nonlinear, but is correctly handled by a rational interpolation of the 3-vector (s/w, t/w, 1/w).
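The rational interpolation described above can be sketched in a few lines of C. This is only an illustration for a single tex coord s along one edge; the type and function names (ScreenVertex, make_screen_vertex, interp_affine, interp_perspective) are made up for this sketch and are not part of OpenGL.

```c
/* Per-vertex data after the perspective divide: the tex coord has already
   been divided by w, and 1/w is stored alongside it. */
typedef struct {
    double s_over_w;  /* s / w */
    double inv_w;     /* 1 / w */
} ScreenVertex;

/* Build the screen-space record for a vertex with tex coord s and
   homogeneous coordinate w. */
ScreenVertex make_screen_vertex(double s, double w)
{
    ScreenVertex v;
    v.s_over_w = s / w;
    v.inv_w = 1.0 / w;
    return v;
}

/* Naive version: interpolate s itself linearly in screen space.
   This is the interpolation that produces the distortion. */
double interp_affine(double s0, double s1, double t)
{
    return (1.0 - t) * s0 + t * s1;
}

/* Rational version: interpolate s/w and 1/w linearly in screen space,
   then divide per pixel to recover s. */
double interp_perspective(ScreenVertex a, ScreenVertex b, double t)
{
    double s_over_w = (1.0 - t) * a.s_over_w + t * b.s_over_w;
    double inv_w    = (1.0 - t) * a.inv_w    + t * b.inv_w;
    return s_over_w / inv_w;
}
```

For an edge running from s = 0 at w = 1 to s = 1 at w = 3, the screen-space midpoint (t = 0.5) gives 0.5 with the naive version and 0.25 with the rational one. The latter matches the object-space position of that pixel, which lies a quarter of the way along the edge.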

So the problem now reduces to finding the missing "q-value" in the texture coords taken directly from the (x,y) projected locations of the mesh vertices. Here are several possible ways to obtain (s,t,q) given the mesh in screen space, the projected snapshot image, the original camera parameters, and the new camera parameters:

(1) Apply the inverse viewing/projection matrix from the original view to the screen-space mesh vertices, and then apply the original viewing/projection matrix again. This is back-projection followed by re-projection, all from the original view. It appears to give the identity matrix, but it does not, because the second pass omits the perspective divide: the original mesh vertices have already been perspectively divided through to obtain the screen-space coords, whereas the re-projected coords are left in homogeneous form. This method goes directly to the final form desired, but incurs the storage and bandwidth cost of 3D texture coordinates (s,t,q must be stored with the mesh vertices, and a texture coordinate function call must accompany each glVertex call). It is important to note that no unnecessary repeated computation is performed with this method. The next method requires the use of the texture-matrix stack and thus performs two full 4x4 transformations for each vertex.
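To see why back-projection followed by re-projection is not the identity, here is a small C sketch. It assumes a simplified symmetric perspective matrix with an identity modelview, so that the inverse can be written analytically; the Persp and Vec4 types and all function names are hypothetical and exist only for this illustration.

```c
/* Simplified perspective projection (column vectors), identity modelview:
     x' = a*x,  y' = b*y,  z' = c*z + d*w,  w' = -z                      */
typedef struct { double a, b, c, d; } Persp;
typedef struct { double x, y, z, w; } Vec4;

/* Forward projection WITHOUT the perspective divide. */
Vec4 project(Persp p, Vec4 v)
{
    Vec4 r;
    r.x = p.a * v.x;
    r.y = p.b * v.y;
    r.z = p.c * v.z + p.d * v.w;
    r.w = -v.z;
    return r;
}

/* Analytic inverse of the projection matrix above, applied to v. */
Vec4 unproject(Persp p, Vec4 v)
{
    Vec4 r;
    r.x = v.x / p.a;
    r.y = v.y / p.b;
    r.z = -v.w;
    r.w = (v.z + p.c * v.w) / p.d;
    return r;
}

/* Screen-space (NDC) vertex -> homogeneous tex coords for the original
   view, returned as (s,t,r,q) in (x,y,z,w): back-project, divide to get
   a w = 1 object-space point, then re-project and keep the homogeneous
   result instead of dividing again. */
Vec4 ndc_to_projective_texcoord(Persp p, double xn, double yn, double zn)
{
    Vec4 ndc, h, obj;
    ndc.x = xn; ndc.y = yn; ndc.z = zn; ndc.w = 1.0;
    h = unproject(p, ndc);
    obj.x = h.x / h.w;
    obj.y = h.y / h.w;
    obj.z = h.z / h.w;
    obj.w = 1.0;
    return project(p, obj);
}
```

With a = b = 1, c = -3, d = -4 (near = 1, far = 2), the eye-space point (0.5, 0.25, -1.5) projects to NDC (1/3, 1/6, 1/3). Feeding that NDC point back through ndc_to_projective_texcoord returns (0.5, 0.25, 0.5, 1.5): the NDC coords scaled by the original w' = 1.5, which is exactly the missing q.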

(2) Apply the inverse viewing/projection matrix from the original view to the screen-space mesh vertices to obtain the object-space vertices. We can then use the vertices themselves as 3D texture-coordinates by pushing the original view/projection matrix onto the texture-matrix stack. This will give us the same coordinates as (1), but will use the OpenGL texture-matrix stack to do the final transformation. This does not require any texture coordinates storage with the mesh vertices, but may or may not require the bandwidth used by a texture coordinate function call depending on whether we can use the automatic texture coordinate generation functions.

As far as OpenGL is concerned, the interesting solutions lie with (2) where we can use the automatic texture generation modes to assist us. These give us two possibilities: creation of 3D texture coords in the eye-space of the new camera view or in object-space.

Most of the projective texture examples that I have seen define the tex-coords in the eye-space of the new view and apply the inverse new camera view/projection transformation to obtain object-space coords (again), followed by the original camera transformation to obtain the desired (s,t,q) in the old view. This seems excessively complicated, since we only want to pass along the object-space mesh vertex coordinates as the texture coordinates and apply the original camera viewing transformation (no inverse transformations required).


METHOD 1: Tex-coords as new-view eye-space coords:

In order to define the eye-space version, we typically see the following:

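// INIT AUTOMATIC TEXTURE COORDINATE GENERATION
// NOTE: OpenGL transforms GL_EYE_PLANE coefficients by the inverse of the
// MODELVIEW matrix in effect when glTexGenfv is called, so MODELVIEW should
// be the identity at this point for the planes below to select raw eye coords.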
GLfloat eyePlaneS[] = {1.0, 0.0, 0.0, 0.0};
GLfloat eyePlaneT[] = {0.0, 1.0, 0.0, 0.0};
GLfloat eyePlaneR[] = {0.0, 0.0, 1.0, 0.0};
GLfloat eyePlaneQ[] = {0.0, 0.0, 0.0, 1.0};
glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_R, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGeni(GL_Q, GL_TEXTURE_GEN_MODE, GL_EYE_LINEAR);
glTexGenfv(GL_S, GL_EYE_PLANE, eyePlaneS);
glTexGenfv(GL_T, GL_EYE_PLANE, eyePlaneT);
glTexGenfv(GL_R, GL_EYE_PLANE, eyePlaneR);
glTexGenfv(GL_Q, GL_EYE_PLANE, eyePlaneQ);
glEnable(GL_TEXTURE_GEN_S);
glEnable(GL_TEXTURE_GEN_T);
glEnable(GL_TEXTURE_GEN_R);
glEnable(GL_TEXTURE_GEN_Q);

with the following texture-matrix stack definition:

glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glTranslatef(0.5,0.5,0);
glScalef(0.5,0.5,1);
glMultMatrixf( OrigCam.GetPerspectiveMatrix() );
glMultMatrixf( OrigCam.GetModelViewMatrix() );
glMultMatrixf( Inverse(NewCam.GetModelViewMatrix()) );
// glMultMatrixf( Inverse(NewCam.GetPerspectiveMatrix()) );

I commented out the last line because we are using OpenGL EYE-coords as the tex-coords. This means that the object-space vertices that were sent with a glVertex call are first transformed by the MODELVIEW matrix from object- to eye-space before the coords are used as the tex-coords. So, we need only undo the MODELVIEW matrix (by applying its inverse), NOT the associated perspective matrix that takes the coords beyond eye-space into the homogeneous space (4D space just before the perspective divide).
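The Inverse() helper used in the texture-matrix setup above is not shown in these notes. As a rough sketch of one way to supply it, the following C function inverts a modelview matrix under the assumption that it contains only rotations and translations (no scaling or shear), which is typical for a camera. The name invert_rigid is hypothetical, and a general 4x4 inverse would be needed if that assumption does not hold.

```c
/* Invert a rigid-body transform stored in OpenGL's column-major layout,
   where the element at row r, column c is m[c * 4 + r].
   ASSUMPTION: the upper-left 3x3 block is a pure rotation R and the
   fourth column holds the translation t. Then the inverse is
       [ R^T | -R^T * t ]
   'out' must not alias 'm'. */
void invert_rigid(const float m[16], float out[16])
{
    int r, c;

    /* Transpose the 3x3 rotation block. */
    for (r = 0; r < 3; ++r)
        for (c = 0; c < 3; ++c)
            out[c * 4 + r] = m[r * 4 + c];

    /* New translation: -R^T * t, using the block just written to 'out'. */
    for (r = 0; r < 3; ++r)
        out[12 + r] = -(out[0 * 4 + r] * m[12] +
                        out[1 * 4 + r] * m[13] +
                        out[2 * 4 + r] * m[14]);

    /* Bottom row of an affine matrix. */
    out[3] = 0.0f;
    out[7] = 0.0f;
    out[11] = 0.0f;
    out[15] = 1.0f;
}
```

The current modelview matrix can be read back with glGetFloatv(GL_MODELVIEW_MATRIX, m), and the result passed to glMultMatrixf in place of Inverse(NewCam.GetModelViewMatrix()).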


METHOD 2: Tex-coords as object-space coords

The alternative in object-space is as follows:

// INIT AUTOMATIC TEXTURE COORDINATE GENERATION
GLfloat objPlaneS[] = {1.0, 0.0, 0.0, 0.0};
GLfloat objPlaneT[] = {0.0, 1.0, 0.0, 0.0};
GLfloat objPlaneR[] = {0.0, 0.0, 1.0, 0.0};
GLfloat objPlaneQ[] = {0.0, 0.0, 0.0, 1.0};
glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
glTexGeni(GL_R, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
glTexGeni(GL_Q, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
glTexGenfv(GL_S, GL_OBJECT_PLANE, objPlaneS);
glTexGenfv(GL_T, GL_OBJECT_PLANE, objPlaneT);
glTexGenfv(GL_R, GL_OBJECT_PLANE, objPlaneR);
glTexGenfv(GL_Q, GL_OBJECT_PLANE, objPlaneQ);
glEnable(GL_TEXTURE_GEN_S);
glEnable(GL_TEXTURE_GEN_T);
glEnable(GL_TEXTURE_GEN_R);
glEnable(GL_TEXTURE_GEN_Q);

with this simpler texture-matrix stack definition:

glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glTranslatef(0.5,0.5,0);
glScalef(0.5,0.5,1);
glMultMatrixf( OrigCam.GetPerspectiveMatrix() );
glMultMatrixf( OrigCam.GetModelViewMatrix() );


The normal matrices are used from the new camera location to view the geometry. Neither method requires defining any explicit texture coordinates. In short, the meshes are rendered normally (with only glVertex calls, no glTexCoord calls) after setting up the automatic texture coordinate generation routines.
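Both texture-matrix definitions above begin with glTranslatef(0.5,0.5,0) and glScalef(0.5,0.5,1). The projection leaves x/w and y/w in the range [-1,1], while texture lookups expect [0,1], so this pair remaps one range onto the other. The C sketch below shows the arithmetic on a single homogeneous point; the TexCoord type and scale_bias function are illustrative names only.

```c
typedef struct { double s, t, r, q; } TexCoord;

/* Apply Translate(0.5, 0.5, 0) * Scale(0.5, 0.5, 1) to a homogeneous
   point (xc, yc, zc, wc) that has been projected but not yet divided.
   In homogeneous coordinates a translation by 0.5 adds 0.5 * w, so
   after the later divide by q the result is 0.5 * (x/w) + 0.5. */
TexCoord scale_bias(double xc, double yc, double zc, double wc)
{
    TexCoord tc;
    tc.s = 0.5 * xc + 0.5 * wc;
    tc.t = 0.5 * yc + 0.5 * wc;
    tc.r = zc;
    tc.q = wc;
    return tc;
}
```

For example, the point (-2, 2, 0, 2) sits at NDC (-1, 1), the upper-left corner of the original view. It maps to (s,t,r,q) = (0, 2, 0, 2), which after the divide by q indexes the texture at (0, 1).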

The use of the automatic texture generation modes for eye-space and object-space results in the glVertex calls being used explicitly as part of the tex-coord definition. In method 1, new-view eye-space coords are used as the tex-coords; this means that the actual coords passed in with the glVertex calls are first transformed by the OpenGL modelview matrix (Object-to-Eye transformation) and then used as the tex-coords for that vertex. In method 2, object-space coords are used explicitly as the tex-coords; the glVertex calls ARE the object-space coords, so the glVertex coords passed in become the tex-coords directly. NOTE: these "tex-coords" refer to the equivalent of making the glTexCoord call explicitly; the actual values come from a transformation of these coords by the texture transformation matrix.

Method 2 seems superior since we do not need any information about the new view; even more, we do not need those nasty inverse matrices that OpenGL gives no direct access to. However, there is a hidden problem in method 2 that will not show up in our particular application: it will only work if the model being viewed (the model that we are applying projective textures to; in our case, the quad-mesh) is static. This means that the object-space glVertex coords must be the same as the world-space coords. The OpenGL modelview matrix is called "modelview" because there are actually two transformations hidden in it: the modeling transformation that transforms object-space vertices into "world-space" vertices, and the viewing transformation that transforms world-space vertices into eye-space vertices. OpenGL has no notion of a world-space, but it is conceptually still there.

As a simple example, imagine we are modeling a robot arm from unit-cube primitives. In this case the object-space coords for each part of the robot arm are all the same; we are "instancing" the unit-cubes to create the parts of the robot-arm. The entire robot-arm as a whole collection of unit-cubes then actually exists only in world-space (or another level of a hierarchy of object spaces - more complicated - we'll just discuss a two-level hierarchy). In order to properly apply these projective textures, we need the glVertex calls to correspond to world-space coords; however, they are object-space, and we can only obtain automatic tex-coord generation from eye-space (after the modelview xform) or from object-space (directly from the glVertex calls). The former xforms the object-space coords directly into eye-space, and the latter uses the object-space coords explicitly.

Method 1 will correctly handle this problem since it takes the object/world space problem into account, but method 2 will have to be further modified to obtain world-coords. We can use the object-space method by first transforming by the modeling matrix (the first part of the MODELVIEW matrix):

METHOD 3:

glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glTranslatef(0.5,0.5,0);
glScalef(0.5,0.5,1);
glMultMatrixf( OrigCam.GetPerspectiveMatrix() );
glMultMatrixf( OrigCam.GetModelViewMatrix() );
glMultMatrixf( Object2WorldMatrix() );  // OBJECT-SPACE TO WORLD

By simply adding the modeling transformation, we obtain the world-space coordinate that can then be projected into the old camera to obtain projective tex-coords. This solution does not require any additional view information, but it does require separating the OpenGL modelview matrix into modeling and viewing xforms, which should conceptually be there anyway. This extra matrix was not needed before since object and world spaces are equivalent in a static, flat-hierarchy model (like our quad-mesh).
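The difference between methods 2 and 3 for instanced geometry can be shown with a toy C example. It assumes a deliberately simple original camera (eye at the origin looking down -z, identity viewing xform, x' = x, y' = y, w' = -z) and a modeling xform that is a pure translation. Every name in it is hypothetical and stands in for the corresponding matrix in the texture-matrix stack.

```c
typedef struct { double x, y, z; } Point3;
typedef struct { double s, t; } TexST;

/* Toy stand-in for the original camera: project a world-space point,
   divide by w' = -z, and remap from [-1,1] to [0,1]. */
TexST project_old_view(Point3 world)
{
    double w = -world.z;
    TexST tc;
    tc.s = 0.5 * (world.x / w) + 0.5;
    tc.t = 0.5 * (world.y / w) + 0.5;
    return tc;
}

/* Toy modeling xform: a translation that places one instance in the world. */
Point3 object_to_world(Point3 obj, Point3 offset)
{
    Point3 world;
    world.x = obj.x + offset.x;
    world.y = obj.y + offset.y;
    world.z = obj.z + offset.z;
    return world;
}

/* Method 2: object-space coords go straight into the old-view projection,
   so the placement of the instance is ignored. */
TexST texcoord_method2(Point3 obj)
{
    return project_old_view(obj);
}

/* Method 3: apply the modeling xform first, then the old-view projection. */
TexST texcoord_method3(Point3 obj, Point3 offset)
{
    return project_old_view(object_to_world(obj, offset));
}
```

Take the object-space vertex (0, 0, -2) instanced twice, once with offset (0, 0, 0) and once with offset (1, 0, 0). Method 2 gives s = 0.5 for both instances, while method 3 gives s = 0.5 for the first and s = 0.75 for the second, so only method 3 textures the moved instance from where it actually sits in the old view.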



SUMMARY:

Here is a summary of the various transformation pipelines:

Method 1:

  1. object-space (glVertex)
  2. xformed by modelview into eye-space
  3. eye-space coords are used as tex-coords
  4. tex-coords are xformed by texture xform matrix: goes from new-view eye-space to old-view screen-space (actual tex-coords indexing into the projective texture). The texture xform matrix goes through several spaces:
    1. new-view eye-space
    2. through new-view inverse modelview xform into object-space
    3. through old-view modelview xform into old-view eye-space
    4. through old-view projection xform into old-view screen-space (for projective texture indexing)

Method 2:

  1. object-space (glVertex) used directly as tex-coords
  2. tex-coords are xformed by texture xform matrix: goes from object-space to old-view screen-space (actual tex-coords indexing into the projective texture). The texture xform matrix goes through several spaces:
    1. object-space
    2. through old-view modelview xform into old-view eye-space
    3. through old-view projection xform into old-view screen-space (for projective texture indexing)

Method 3:

  1. object-space (glVertex) used directly as tex-coords
  2. tex-coords are xformed by texture xform matrix: goes from object-space to old-view screen-space (actual tex-coords indexing into the projective texture). The texture xform matrix goes through several spaces:
    1. object-space
    2. through modeling xform into world-space
    3. through old-view modelview xform into old-view eye-space
    4. through old-view projection xform into old-view screen-space (for projective texture indexing)