COMP 236 Final Project: Image Mosaic

My final project is called "Image Mosaic". The inspiration came from InfiniteImage, a company that sells images produced using a similar technique.

A picture of myself in LA.

Technique

The idea is to precompute samples of each image in an image library at sizes 1x1, 2x2, 4x4, 8x8, ..., 256x256. Starting from an arbitrary 256x256 image, the program decomposes it into "tiles" that can be zoomed into. Zooming is performed by normal upsampling until the image quality degrades (I chose a doubling in size as the threshold). At that point, the tiles are replaced by their precomputed higher-resolution versions, and the process continues until only one tile is visible. That tile is then decomposed into tiles in turn, and the whole process repeats ad infinitum. My initial goal was to implement this in real time. What I currently have is a working prototype that can display the images and save them to disk. The optimization is still at a very basic level; some possible directions are discussed in this report.
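The zoom-threshold logic above can be sketched as follows. This is an illustrative helper, not the project's actual code: the function name and the idea of tracking levels as log2 of the precomputed side length (so level 8 corresponds to the 256x256 samples) are my assumptions.

```cpp
#include <cassert>

// Hypothetical helper: given the continuous zoom factor currently applied
// to a tile, decide which precomputed resolution level should be shown.
// Following the report, plain upsampling is used until the tile has
// doubled in size; past that threshold the next precomputed level is
// swapped in.  Levels are log2 of the side length: 0 -> 1x1, 8 -> 256x256.
int levelForZoom(int baseLevel, double zoom) {
    int level = baseLevel;
    while (zoom >= 2.0 && level < 8) {  // 8 == log2(256), the largest samples
        zoom /= 2.0;                    // swapping in a 2x larger sample
        ++level;                        // halves the effective zoom factor
    }
    return level;
}
```

A tile shown at 1.5x zoom keeps its current samples; at 2x it steps up one level, at 4x two levels, and so on until the largest precomputed size is reached.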

Implementation details

First, I "farmed" the Web for images using a shareware version of Teleport Pro, and ended up with some 15,000 images.

I wrote an image resampler program in Borland Delphi Personal Edition (a RAD programming environment, free for educational use) to compute all the sampled versions of the images. It uses a Lanczos3 filter for the resampling, since it is fast and produces excellent results. The pictures were saved as GIFs to occupy less space. Below is a screenshot of the resampler:
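For reference, the Lanczos3 kernel used in such a resampler is a sinc function windowed by a wider sinc, with support [-3, 3]. The sketch below shows the standard definition; the Delphi resampler's actual code is not part of this report.

```cpp
#include <cmath>

// Standard Lanczos3 kernel (a = 3): a * sin(pi*x) * sin(pi*x/a) / (pi*x)^2
// for |x| < a, 1 at x = 0, and 0 outside the support.  Each output pixel
// of the resampler is a weighted sum of input pixels with these weights.
double lanczos3(double x) {
    const double a  = 3.0;
    const double PI = 3.14159265358979323846;
    if (x == 0.0) return 1.0;
    if (std::fabs(x) >= a) return 0.0;
    const double pix = PI * x;
    return a * std::sin(pix) * std::sin(pix / a) / (pix * pix);
}
```

The kernel is 1 at the sample point, crosses zero at every other integer offset, and vanishes beyond three pixels, which is what gives Lanczos3 its sharp yet low-ringing results.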

My image resampler.

Of course, any picture that wasn't close to square ended up distorted. The resampler filtered out most of those automatically. Afterwards, I manually deleted the images that looked wrong: partial logos, stretched text, blurred images, etc. I ended up with some 3000 "good" images, which I converted to PPM (a format I could read from C++) using the freeware IrfanView.
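PPM is easy to parse from C++, which is why it was chosen. Below is a minimal reader for the binary (P6) variant; the function name and signature are illustrative, not the project's actual loader.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Minimal binary PPM (P6) reader: parses the "P6 <width> <height> <maxval>"
// header (skipping '#' comment lines), then reads width*height interleaved
// 8-bit RGB triples.  Returns false on any parse error.
bool loadPPM(const std::string &path, int &w, int &h,
             std::vector<unsigned char> &rgb) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string magic;
    in >> magic;
    if (!in || magic != "P6") return false;
    int maxval = 0;
    int *fields[3] = { &w, &h, &maxval };
    for (int i = 0; i < 3; ) {
        in >> std::ws;                       // skip whitespace before a field
        if (in.peek() == '#') {              // comment line: discard it
            std::string line;
            std::getline(in, line);
            continue;
        }
        if (!(in >> *fields[i])) return false;
        ++i;
    }
    if (maxval != 255) return false;         // 8-bit samples only
    in.get();                                // single whitespace before pixels
    rgb.resize(3 * w * h);
    in.read(reinterpret_cast<char *>(&rgb[0]),
            static_cast<std::streamsize>(rgb.size()));
    return bool(in);
}
```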

The application is built on top of the OpenGL skeleton given in class. I added some functions to the FrameBuffer class:

There are two other classes in the program: ResampledImage and MosaicImage.

ResampledImage is used in the zooming process and has the following member functions:

MosaicImage is used to store different versions of the images in the library and has the following member functions:

Discussion

The three classes described above provide the minimal functionality to implement the project. Searching the library for the closest match to a tile works as follows: a tile of size TileSize is downsampled to size DownSize and then compared with the same-size version of every image in the library. A distance function sums the color differences over all pixels, and the library image with the smallest distance is chosen as the closest match.
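One way to realize that distance function is a sum of absolute per-channel differences; the exact metric used in the program is not shown here, so this sketch and its names are assumptions. Buffers are assumed to hold interleaved 8-bit RGB for a DownSize x DownSize image.

```cpp
#include <cstdlib>

// Sum of absolute per-channel differences between a downsampled tile and
// one library image of the same size.  Smaller result == closer match.
long tileDistance(const unsigned char *tile, const unsigned char *lib,
                  int downSize) {
    long dist = 0;
    const int n = 3 * downSize * downSize;   // R, G, B per pixel
    for (int i = 0; i < n; ++i)
        dist += std::abs(int(tile[i]) - int(lib[i]));
    return dist;
}
```

The closest match is then simply the library image minimizing this value over an exhaustive scan.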

I tried different values for TileSize and DownSize. Small values of TileSize produce good-looking pictures. Small values of DownSize produce faster results (less computation), but when zoomed in, the higher-resolution versions do not match the original image closely enough. Large values of TileSize produce slower results, and the quality does not justify the extra time spent: because the number of images in the library is relatively small, there are few candidates for a close match to any particular tile. The images below illustrate the two situations, taken to the extreme on a high-contrast picture.

An image with TileSize=4 and DownSize=2.

The same image, zoomed in.

With bigger tiles, a close match is hard to find (TileSize=8, DownSize=4).

Both situations make it fairly obvious that the larger the image library, the better the chances of finding a close match, and the higher the quality of the mosaic. However, a larger library would take longer to load and compare against, and would occupy more memory.

Exhaustively comparing each tile with each image in the library takes 10 to 20 seconds on my computer, so the next thing that comes to mind is optimization. I eliminated all the multiplications from the distance function and reorganized the memory allocation for the tiles so that the summation runs smoothly. Another idea was to try different color encoding schemes that would allow fewer comparisons while still producing reasonable results. One such scheme transforms RGB to HSV and either encodes the channels with fewer bits or ignores S altogether. With enough images, the overhead of converting between the two encodings would be amortized. However, the gain in speed would not compensate for the loss in quality.
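The report does not show how the multiplications were eliminated, but one standard trick for a squared-difference metric is to precompute a table of squares indexed by the (shifted) byte difference, so the inner loop does only lookups and additions. The names below are illustrative.

```cpp
// Precomputed squares of every possible byte difference (-255..255),
// shifted by 255 so the table index is never negative.  After the one-time
// init, the per-pixel distance needs no multiplications at all.
static long sq[511];

void initSquares() {
    for (int d = -255; d <= 255; ++d)
        sq[d + 255] = long(d) * d;   // the only multiplies, done once
}

// Squared distance over n interleaved 8-bit samples, multiply-free.
long squaredDistance(const unsigned char *a, const unsigned char *b, int n) {
    long dist = 0;
    for (int i = 0; i < n; ++i)
        dist += sq[int(a[i]) - int(b[i]) + 255];   // table lookup, no d*d
    return dist;
}
```

On 2002-era hardware, replacing an integer multiply per pixel with a table lookup in the innermost loop was a meaningful saving.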

Future work

The other optimization direction I thought about was a data structure that would allow efficient closest-match searches. The problem is similar to nearest-neighbor searches in a 3*D*D-dimensional space, where 3 is the number of color channels (R, G, B) and D is the image size at which the comparison is done (DownSize in my program). One of the data structures that can be used for such a query is a tree similar to the KD-tree used for range queries in computational geometry. I did not have enough time to implement it, but it is definitely a direction worth considering.
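To make the idea concrete, here is a small illustrative KD-tree sketch (not part of the project): each library image's downsampled RGB pixels are flattened into one point in k = 3*D*D dimensions, the tree cycles through split dimensions by depth, and the search prunes any subtree whose splitting plane is already farther away than the best match found so far.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

typedef std::vector<double> Point;   // one flattened downsampled image

// Comparator ordering points along a single dimension, for nth_element.
struct AxisLess {
    size_t axis;
    explicit AxisLess(size_t a) : axis(a) {}
    bool operator()(const Point &a, const Point &b) const {
        return a[axis] < b[axis];
    }
};

struct KDTree {
    std::vector<Point> pts;        // node i's point (tree's own order)
    std::vector<int> left, right;  // child node indices, -1 if none
    size_t k;                      // dimensionality, 3*D*D for images

    explicit KDTree(std::vector<Point> data) : k(data[0].size()) {
        buildRec(data.begin(), data.end(), 0);
    }

    int buildRec(std::vector<Point>::iterator lo,
                 std::vector<Point>::iterator hi, size_t depth) {
        if (lo == hi) return -1;
        std::vector<Point>::iterator mid = lo + (hi - lo) / 2;
        std::nth_element(lo, mid, hi, AxisLess(depth % k));  // median split
        int node = int(pts.size());
        pts.push_back(*mid);
        left.push_back(-1);
        right.push_back(-1);
        left[node]  = buildRec(lo, mid, depth + 1);
        right[node] = buildRec(mid + 1, hi, depth + 1);
        return node;
    }

    // Returns the index (into pts) of the stored point closest to q.
    int nearest(const Point &q) const {
        int best = -1;
        double bestD2 = std::numeric_limits<double>::max();
        nearestRec(0, q, 0, best, bestD2);
        return best;
    }

    void nearestRec(int node, const Point &q, size_t depth,
                    int &best, double &bestD2) const {
        if (node < 0) return;
        double d2 = 0;                         // squared distance to node
        for (size_t i = 0; i < k; ++i) {
            double d = pts[node][i] - q[i];
            d2 += d * d;
        }
        if (d2 < bestD2) { bestD2 = d2; best = node; }
        double diff = q[depth % k] - pts[node][depth % k];
        int near = diff < 0 ? left[node] : right[node];
        int far  = diff < 0 ? right[node] : left[node];
        nearestRec(near, q, depth + 1, best, bestD2);
        if (diff * diff < bestD2)              // can the far side still win?
            nearestRec(far, q, depth + 1, best, bestD2);
    }
};
```

A production version would carry the original library index alongside each point, and in such high dimensions the pruning degrades, so approximate variants are often preferred.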

Keyboard interface

The program accepts the following keyboard commands:

Links

Results

Below are some screenshots produced by my program.

Professor Gary Bishop's picture from his webpage.

Last modified on May 11, 2002 by .