COMP254 - Image Processing and Analysis
Final Project
Landmark-Based
Statistical Shape Classification
using Thin-Plate Splines
This project was developed in conjunction with
Gregory Pruett.
Outline
- Shape Classification with Image Warping
- Mean Shape and Affine Transformations
- Results
- Applications
- Limitations
- Conclusions
- Future Work
- References
We are interested in classifying objects according to their 3D
shape using 2D projections (images). Given a set of images of
different objects that we use to train the classifier, we would
like to process any new image and assign it to the object it
represents. In this project we investigated the application of
image warping using thin-plate splines to shape classification.
IMAGE WARPING
Given a pair of images, image warping is a
one-to-one mapping describing how the pixels of one image should be displaced
to reproduce the spatial structure of the other image. Image
warping operates on pixel locations (it does not change pixel colors) and can
compress or stretch parts of the warped image, which may require resampling.
We use thin-plate splines as the continuous representation of the mapping
between samples in order to perform the resampling.
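For reference, the standard thin-plate spline formulation can be written as below (the notation is ours, not taken from the project code). P_i = (x_i, y_i) are the source landmarks and v_i is one coordinate of the corresponding destination landmark; one such f is fitted for the x coordinate and one for the y coordinate, and the total bending energy is the sum of the two.

```latex
U(r) = r^2 \log r^2, \qquad U(0) = 0

f(x, y) = a_1 + a_x\,x + a_y\,y + \sum_{i=1}^{n} w_i\, U\big(\lVert P_i - (x, y) \rVert\big)

f(P_i) = v_i, \qquad \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} w_i x_i = \sum_{i=1}^{n} w_i y_i = 0

I_f = \iint_{\mathbb{R}^2} \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) dx\,dy
\;\propto\; \mathbf{w}^{\top} K \mathbf{w}, \qquad K_{ij} = U(\lVert P_i - P_j \rVert)
```

The affine terms a_1, a_x, a_y do not contribute to I_f; only the weights w_i do.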
LANDMARKS
To capture the shape of an object, we represent it as a set
of landmarks (important points on the structure
of the object) that bend an imaginary flat thin plate
placed over its 2D projection (image). Landmarks should be placed on
characteristic points of the object that can be used to
distinguish two objects of the same class (e.g., for two faces:
eyes, nose, ears). Landmarks should also approximate the regions
where the object shows visual discontinuities.
Given two images to be warped, we have to specify landmarks in both the
source and the destination image. Both images must have the same number of
landmarks, specified in the same order, so that they correspond one to one:
a landmark representing the left eye in the source image must
correspond to a landmark representing the left eye in the destination
image.
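A minimal NumPy sketch of fitting a thin-plate spline from corresponding landmarks, assuming landmarks are stored as n x 2 arrays in the same order in both images. The names `tps_kernel`, `fit_tps`, and `apply_tps` are hypothetical and not taken from our system; the linear system is the standard TPS interpolation system. To resample a warped image, such a mapping is typically fitted from destination to source landmarks and evaluated at every output pixel, so each output pixel looks up its source location.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline kernel U(r) = r^2 log r^2, with the limit U(0) = 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0.0, u, 0.0)


def fit_tps(src, dst):
    """Fit the TPS that maps each landmark in `src` (n x 2) exactly onto the
    corresponding landmark in `dst` (n x 2). Returns (W, A): the n x 2
    non-affine weights and the 3 x 2 affine coefficients (rows: 1, x, y)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])  # side conditions on the weights
    coeffs = np.linalg.solve(L, rhs)
    return coeffs[:n], coeffs[n:]


def apply_tps(src, W, A, points):
    """Evaluate the fitted mapping at arbitrary (m x 2) points."""
    src = np.asarray(src, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    U = tps_kernel(np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1))
    return A[0] + points @ A[1:] + U @ W


# Usage: four corners stay fixed while the center landmark moves.
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
dst = src.copy()
dst[4] = [0.6, 0.4]
W, A = fit_tps(src, dst)
mapped = apply_tps(src, W, A, [[0.25, 0.25], [0.75, 0.5]])
```

The fitted mapping passes exactly through every landmark pair and varies smoothly in between, which is what makes it usable for resampling.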
BENDING ENERGY CLASSIFICATION
Given a set of landmarks for each standard object,
we can compute the bending energy required to warp a new set of landmarks (from
an image to be classified) into each of the standard shapes, i.e., the
energy required to bend the thin plate associated with the new set of landmarks
so that it matches the corresponding standard landmark set. Once we have the bending
energy required to warp the new set of landmarks into each standard object, we
classify the new image as the object that requires the least energy to be warped to.
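The bending energy of a warp can be computed from the same linear system used to fit the spline. The sketch below is a minimal version under the assumption that energy is reported up to a constant factor (only comparisons between energies matter for classification); `bending_energy` is a hypothetical name. It returns trace(W^T K W), summed over the x and y coordinates.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline kernel U(r) = r^2 log r^2, with the limit U(0) = 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0.0, u, 0.0)


def bending_energy(src, dst):
    """Bending energy (up to a constant factor) of the TPS warp that maps
    the landmarks `src` (n x 2) onto the corresponding landmarks `dst` (n x 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    W = np.linalg.solve(L, rhs)[:n]    # non-affine weights, one column per coordinate
    return float(np.sum(W * (K @ W)))  # trace(W^T K W)


# Usage: moving only the center landmark bends the plate.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
bent = square.copy()
bent[4] = [0.7, 0.3]
energy = bending_energy(square, bent)
```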
This classification process can be divided into two steps:
- TRAINING STEP:
Select the landmarks for each of the standard objects.
- CLASSIFICATION STEP:
Select the corresponding landmarks on the new image;
Compute the bending energy required to warp the new image into each of the
standard ones, and
Classify the new shape as the standard object that requires the least bending
energy.
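The classification step can be sketched as a minimum over bending energies. This assumes, following the wording above, that the new landmarks act as the source plate and each standard landmark set as the target; `classify` and the class names "square" and "kite" are hypothetical illustrations, not data from our experiments.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline kernel U(r) = r^2 log r^2, with the limit U(0) = 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0.0, u, 0.0)


def bending_energy(src, dst):
    """Bending energy (up to a constant factor) of the TPS warp src -> dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    W = np.linalg.solve(L, np.vstack([dst, np.zeros((3, 2))]))[:n]
    return float(np.sum(W * (K @ W)))


def classify(new_landmarks, standards):
    """Assign the new landmark set to the standard with the lowest bending energy.
    `standards` maps a (hypothetical) class name to its n x 2 landmarks,
    listed in the same order as `new_landmarks`."""
    energies = {name: bending_energy(new_landmarks, std)
                for name, std in standards.items()}
    return min(energies, key=energies.get), energies


square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
kite = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.8, 0.2]])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
new_shape = 2.0 * square @ R.T + np.array([5.0, -3.0])  # rotated, scaled, moved square
label, energies = classify(new_shape, {"square": square, "kite": kite})
```

Because the test shape differs from the square only by a rotation, scale, and translation, its warp to "square" is purely affine and costs essentially no bending energy.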
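The bending energy depends only on the non-affine part of a warp: a purely
affine mapping (translation, rotation, scale, reflection, or shear) requires no
bending energy at all. Moreover, translating, rotating, or reflecting the new set of
landmarks leaves its bending energy unchanged, and uniformly scaling it multiplies
its bending energy to every standard object by the same factor, so the ranking of
the standard objects is preserved. This makes it possible to classify a new image
without having to normalize it to a common origin, scale, and orientation.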
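These properties can be checked numerically. The minimal sketch below uses hypothetical landmark sets `new` and `standard` and reuses the same bending-energy computation as before (the new landmarks are the source plate). It shows that a purely affine warp costs no energy, that rotating, translating, or reflecting the new landmarks leaves the energy unchanged, and that scaling them by s multiplies the energy by 1/s^2, the same factor for every standard.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline kernel U(r) = r^2 log r^2, with the limit U(0) = 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0.0, u, 0.0)


def bending_energy(src, dst):
    """Bending energy (up to a constant factor) of the TPS warp src -> dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    W = np.linalg.solve(L, np.vstack([dst, np.zeros((3, 2))]))[:n]
    return float(np.sum(W * (K @ W)))


standard = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.8, 0.2]])
new = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.4, 0.6]])
theta = np.deg2rad(35.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
shear = np.array([[1.0, 0.5], [0.0, 1.0]])

base = bending_energy(new, standard)
moved = bending_energy(new @ R.T + np.array([4.0, -2.0]), standard)  # rotate + translate
mirrored = bending_energy(new * np.array([-1.0, 1.0]), standard)     # reflect about the y axis
scaled = bending_energy(2.0 * new, standard)                         # uniform scale by 2
affine_only = bending_energy(new, new @ shear.T + 1.0)                # pure shear + shift
```

Shearing the new landmarks themselves, however, generally does change the energy, because it changes the distances between them.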
In order to improve the shape representation of the
standard objects, we can average each object's
landmark representation over several 2D projections (images) and compute
its mean shape. Because different views of the same object differ by
transformations in 3D or in the image plane, their landmarks have to be
normalized to a common origin, scale, and orientation before being averaged.
Given more than one view of an object, we identify the landmarks in
each image, compute the centroid of the
landmarks of each view (the average of its landmark points), and define landmark vectors
from each centroid to the landmark points. We then label corresponding landmarks
consistently across views (in this work we label landmarks with colors). The centroids and
the labels help deal with the transformations between the different views:
- TRANSLATION:
superimpose all centroids at a common origin (e.g., (0,0)) and translate
all landmark vectors accordingly.
- SCALE:
normalize the landmark vectors of each view (for example, so that their lengths
add up to one in each view).
- ROTATION:
rotate the landmark vectors of each view so that the sum of squared distances between
corresponding landmarks and a standard representation (choose one of the views
as the standard, or canonical, one) is minimized (Procrustes metric).
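The translation and scale steps amount to a few array operations. The sketch below assumes the "lengths add up to one" normalization mentioned above; `normalize_view` is a hypothetical name, and the rotation step is handled separately, as described in the next paragraph.

```python
import numpy as np


def normalize_view(landmarks):
    """Move the centroid of one view to the origin and scale the landmark
    vectors so that their lengths add up to one."""
    pts = np.asarray(landmarks, dtype=float)
    vectors = pts - pts.mean(axis=0)                  # TRANSLATION: centroid -> (0, 0)
    total = np.linalg.norm(vectors, axis=1).sum()     # sum of landmark vector lengths
    return vectors / total                            # SCALE: lengths now sum to 1


view = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [1.0, 5.0]])
vectors = normalize_view(view)
```

Because both steps are undone by construction, a translated and uniformly scaled copy of the same view normalizes to the same vectors.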
The Procrustes metric can be minimized by sampling the distance function with a
constant step size, or by applying a root-finding method such as Newton's method or
the secant method to its derivative. For simplicity, we implemented constant-step
sampling with 1-degree accuracy. Although this implementation is slower than the
other methods and has a fixed accuracy, it cannot get trapped in local minima.
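A minimal sketch of the constant-step search, assuming the landmark vectors have already been centered and scaled and are listed in corresponding order. `rotate` and `best_rotation` are hypothetical names. Every angle in [0, 360) is tried at a 1-degree step, so the global minimum is found to within the step size.

```python
import numpy as np


def rotate(vectors, angle_deg):
    """Rotate row vectors counterclockwise by `angle_deg` degrees."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return np.asarray(vectors, dtype=float) @ R.T


def best_rotation(vectors, reference, step_deg=1.0):
    """Constant-step search over a full turn for the angle that minimizes the
    sum of squared distances between corresponding landmark vectors.
    Returns (best_angle_deg, rotated_vectors)."""
    reference = np.asarray(reference, dtype=float)
    angles = np.arange(0.0, 360.0, step_deg)
    costs = [np.sum((rotate(vectors, a) - reference) ** 2) for a in angles]
    best = angles[int(np.argmin(costs))]
    return float(best), rotate(vectors, best)


reference = np.array([[1.0, 0.0], [0.0, 2.0], [-1.5, -0.5], [0.5, -1.5]])
view = rotate(reference, -40.0)  # the same shape, rotated by -40 degrees
angle, aligned = best_rotation(view, reference)
```

Here the true rotation is a whole number of degrees, so the search recovers it exactly; in general the result is within half a step of the optimum.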
After normalizing all the landmark representations of an object to a common
origin, scale, and orientation, corresponding landmarks form clusters when
all the representations are superimposed. We compute the mean shape of the
object by averaging corresponding landmarks over all views. Increasing the
number of views used to compute the mean shape improves the
confidence of the classification.
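Putting the three normalization steps together, a mean shape can be sketched as below. This assumes the first view is chosen as the canonical one and uses the 1-degree constant-step rotation search; `normalize_view`, `align_to`, and `mean_shape` are hypothetical names, and the example views are synthetic.

```python
import numpy as np


def normalize_view(landmarks):
    """Centroid to the origin, then scale so landmark vector lengths sum to one."""
    pts = np.asarray(landmarks, dtype=float)
    vectors = pts - pts.mean(axis=0)
    return vectors / np.linalg.norm(vectors, axis=1).sum()


def rotate(vectors, angle_deg):
    """Rotate row vectors counterclockwise by `angle_deg` degrees."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return np.asarray(vectors, dtype=float) @ R.T


def align_to(vectors, reference, step_deg=1.0):
    """Rotate `vectors` by the sampled angle that minimizes the sum of squared
    distances to `reference` (constant-step search over a full turn)."""
    angles = np.arange(0.0, 360.0, step_deg)
    costs = [np.sum((rotate(vectors, a) - reference) ** 2) for a in angles]
    return rotate(vectors, angles[int(np.argmin(costs))])


def mean_shape(views):
    """Average corresponding landmarks after normalizing every view and
    aligning it to the first (canonical) view."""
    normalized = [normalize_view(v) for v in views]
    reference = normalized[0]
    aligned = [align_to(v, reference) for v in normalized]
    return np.mean(aligned, axis=0)


base = np.array([[0.0, 0.0], [2.0, 0.0], [2.5, 1.5], [0.5, 2.0], [1.0, 0.8]])
views = [3.0 * rotate(base, 10.0) + np.array([4.0, 1.0]),
         0.5 * rotate(base, 55.0) + np.array([-2.0, 7.0]),
         1.7 * rotate(base, -120.0) + np.array([0.0, -3.0])]
mean = mean_shape(views)
```

With real views the normalized landmarks only cluster around each other, and the average smooths out the differences between views.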
RESULTS
Our system implements the mean shape
computation, the bending energy evaluation, and the
shape classification. All the figures below
are organized as follows:
- TOP LEFT: standard image 1
- TOP RIGHT: standard image 2
- BOTTOM LEFT: test image being classified
- BOTTOM RIGHT: image 1 warped into image 2.
Figure: shape classification of different car models.
Figure: shape classification for face recognition.
Figure: shape classification for likeness.
In these examples, the bending energy of a shape increases with the
number of landmarks used. Using more landmarks to represent a shape also makes
both the landmark representation and the bending energy more accurate.
Consequently, increasing the number of landmarks increases the confidence of
the classifications.
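The growth of bending energy with the number of landmarks has a simple explanation: the thin-plate spline is the minimum-bending-energy interpolant of its landmarks, so adding landmarks sampled from the same underlying deformation can only keep the energy the same or raise it. The sketch below checks this on a hypothetical smooth deformation of random points (synthetic data, not from our experiments), using nested landmark subsets.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline kernel U(r) = r^2 log r^2, with the limit U(0) = 0."""
    r2 = np.asarray(r, dtype=float) ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0.0, u, 0.0)


def bending_energy(src, dst):
    """Bending energy (up to a constant factor) of the TPS warp src -> dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    K = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    W = np.linalg.solve(L, np.vstack([dst, np.zeros((3, 2))]))[:n]
    return float(np.sum(W * (K @ W)))


rng = np.random.default_rng(0)
src = rng.uniform(0.0, 1.0, size=(10, 2))
# Hypothetical smooth, non-affine deformation applied to every landmark.
dst = src + 0.05 * np.column_stack([np.sin(3.0 * src[:, 1]), np.cos(3.0 * src[:, 0])])
# Energy using the first k landmarks, for k = 3, 4, ..., 10 (nested subsets).
energies = [bending_energy(src[:k], dst[:k]) for k in range(3, len(src) + 1)]
```

With only three landmarks any correspondence is affine, so the energy starts at zero; each added landmark adds a constraint on the plate.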
APPLICATIONS
- Computer vision and robotics
- Car model recognition
- Face recognition
- Answering the question:
Does she/he look like her/his mother or father?
LIMITATIONS
- LANDMARKS:
  - the user has to specify them
  - same number in all images
  - same order in all images
- 3D rotations and hidden landmarks
CONCLUSIONS
The bending-energy-based classifier proved reliable and robust in our
experiments. Even with small numbers of landmarks (around 10 in the examples
above), the classifications were consistent with the expected results. Using
more landmarks per class (which increases the bending energy) could improve the
confidence even further.
Increasing the number of views used to compute each mean shape can also
improve the confidence of the classifications.
Last updated, Mon Apr 29 23:54:02 EDT 1996
by bastos@cs.unc.edu.