My data file consisted only of the list of 172 points in "x y" format. The coordinates are floating-point values, although integers are permissible. In some cases, I magnified an image to get subpixel accuracy on my feature points. My data file format permitted blank lines and lines beginning with "#" for comments. This turned out to be a wise decision when I accidentally left out a feature point on a face and had to "debug" which point I had left out. [Note that this technique assumes a fixed number of feature points, listed in the same order for every face.] An example data file looks like this.
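A minimal sketch of a reader for this format may make the rules concrete. The names Point and readPoints are illustrative and are not taken from the original code. As an assumption, the sketch also treats a "#" that follows leading whitespace as a comment, which is slightly more lenient than the rule stated above.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical point type; the original class is not reproduced here.
struct Point {
    double x;
    double y;
};

// Read one "x y" pair per line, skipping blank lines and "#" comments.
// Assumption: leading whitespace before "#" or before a number is tolerated.
std::vector<Point> readPoints(std::istream& in) {
    std::vector<Point> points;
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        if (line[first] == '#') continue;          // comment line
        std::istringstream fields(line);
        Point p;
        if (!(fields >> p.x >> p.y)) {
            // Reporting the line number helps locate a damaged or missing point.
            throw std::runtime_error("bad point on line " + std::to_string(lineNo));
        }
        points.push_back(p);
    }
    return points;
}
```

A caller can compare the size of the returned vector against the expected 172 points to catch a feature point that was left out.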
So the "pipeline" of data creation looks like this:
I also had to write a "behind-the-scenes" C++ class to handle sets of feature points. The code and header file are given below. This code provides methods to read data files, print in GNUPLOT format, print in interleaved format for the KUIM warping, translate/rotate/scale points, and compute the centroid of the set. These latter functions are essential to implementing the Procrustes metric. I'll skip some of the details of normalizing the faces; see the code below for the exact details. Basically, each face needs to have its centroid translated to the origin, then each face has to be scaled and rotated to give the best match to a canonical feature set.
Next I had to create an "average" face for males and females. I picked a canonical face for each class and normalized everything to those faces, then averaged the feature points. To normalize, I used my normalize.c program with a command like "cat data.canonical data.f1 | grep -v "#" | normalize", where the grep -v "#" step removes the comment lines from the data files. Once I had normalized faces, I would concatenate them all into one file with a command like "cat data.f1.canon data.f2.canon data.f3.canon data.f4.canon > female", then edit the file "female" and put the number "4" at the top of the file to tell my averaging program that 4 faces were to be averaged. To average the faces, I used a command like "cat female | average > data.f.avg". The averaging program is given below.
By this time, I had face data, normalized face data, and averaged face points for all my faces. I created an average female face, an average male face, and an average overall (androgynous) face. These average faces had the same data format as the original face points.
I could do all sorts of fun things with my new data. I created a method for my pointset class to write out the points in GNUPLOT format so I could do a PostScript plot of feature points. I also created an interleave method for my class to take two faces and produce a point format for warping two images. An example program that does this, interleave.c, is given below. Now I could warp any face image to any other face image (based on feature landmarks). In particular, I could warp females to the female average and males to the male average. Finally, I used "ppmblend" to combine the warped faces together to get average images (not just feature points) for females, males, and overall. Since I could also do math on point sets using my class, I could also interpolate between faces or amplify differences from the average face. Once the data reaches this stage, creativity is the only limitation.
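Interpolating between faces and amplifying differences can both be written as one linear blend of two point sets. The sketch below assumes that simple linear form, which the text above does not spell out. The names Point and blend are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type; the original class is not reproduced here.
struct Point {
    double x;
    double y;
};

// Returns a + t * (b - a), point by point.
//   0 <= t <= 1 : interpolate from a (t = 0) to b (t = 1)
//   t > 1       : exaggerate how b differs from a
std::vector<Point> blend(const std::vector<Point>& a,
                         const std::vector<Point>& b,
                         double t) {
    assert(a.size() == b.size());  // same points, same order
    std::vector<Point> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.push_back({a[i].x + t * (b[i].x - a[i].x),
                       a[i].y + t * (b[i].y - a[i].y)});
    }
    return out;
}
```

For example, blend(average, face, 0.5) lies halfway between the average and a given face, while blend(average, face, 2.0) doubles that face's deviation from the average. Both point sets should be normalized to the same canonical face first.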