COMP 236 Final Project: Image Mosaic

My final project is called "Image Mosaic". The inspiration came from InfiniteImage, a company that sells images produced using a similar technique.

A picture of myself in LA.

Technique

The idea is to precompute samples of each image in an image library at sizes 1x1, 2x2, 4x4, 8x8, ..., 256x256. Starting from an arbitrary 256x256 image, the program decomposes it into "tiles" that can be zoomed into. Zooming is performed by normal upsampling until the image quality degrades (I chose a doubling in size as the threshold). At that point, the tiles are replaced by their precomputed higher-resolution versions, and the process continues until only one tile is visible. That tile is then decomposed into tiles in turn, and the whole process repeats ad infinitum. My initial goal was to implement this in real time. What I currently have is a working prototype that can display the images and save them to disk. The optimization is still at a very basic level; some possible directions are discussed in this report.
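The zoom-threshold logic above can be sketched as follows. This is an illustrative helper, not the project's actual code: the function name and the idea of tracking levels as log2 of the precomputed side length (so level 8 corresponds to the 256x256 samples) are my assumptions.

```cpp
#include <cassert>

// Hypothetical helper: given the continuous zoom factor currently applied
// to a tile, decide which precomputed resolution level should be shown.
// Following the report, plain upsampling is used until the tile has
// doubled in size; past that threshold the next precomputed level is
// swapped in.  Levels are log2 of the side length: 0 -> 1x1, 8 -> 256x256.
int levelForZoom(int baseLevel, double zoom) {
    int level = baseLevel;
    while (zoom >= 2.0 && level < 8) {  // 8 == log2(256), the largest samples
        zoom /= 2.0;                    // swapping in a 2x larger sample
        ++level;                        // halves the effective zoom factor
    }
    return level;
}
```

A tile shown at 1.5x zoom keeps its current samples; at 2x it steps up one level, at 4x two levels, and so on until the largest precomputed size is reached.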

Implementation details

First, I "farmed" the Web for images using a shareware version of Teleport Pro, and ended up with some 15,000 images.

I wrote an image resampler program in Borland Delphi Personal Edition (a RAD programming environment, free for educational use) to compute all the sampled versions of the images. It uses a Lanczos3 filter for the resampling, since it is fast and produces excellent results. The pictures were saved as GIFs to occupy less space. Below is a screenshot of the resampler:
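For reference, the Lanczos3 kernel used in such a resampler is a sinc function windowed by a wider sinc, with support [-3, 3]. The sketch below shows the standard definition; the Delphi resampler's actual code is not part of this report.

```cpp
#include <cmath>

// Standard Lanczos3 kernel (a = 3): a * sin(pi*x) * sin(pi*x/a) / (pi*x)^2
// for |x| < a, 1 at x = 0, and 0 outside the support.  Each output pixel
// of the resampler is a weighted sum of input pixels with these weights.
double lanczos3(double x) {
    const double a  = 3.0;
    const double PI = 3.14159265358979323846;
    if (x == 0.0) return 1.0;
    if (std::fabs(x) >= a) return 0.0;
    const double pix = PI * x;
    return a * std::sin(pix) * std::sin(pix / a) / (pix * pix);
}
```

The kernel is 1 at the sample point, crosses zero at every other integer offset, and vanishes beyond three pixels, which is what gives Lanczos3 its sharp yet low-ringing results.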

My image resampler.

Of course, any picture that wasn't close to square ended up distorted. The resampler filtered out most of those automatically. Afterwards, I manually deleted the images that looked wrong: partial logos, stretched text, blurred images, etc. I ended up with some 3000 "good" images, which I converted to PPM (a format I could read from C++) using the freeware IrfanView.
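PPM is easy to parse from C++, which is why it was chosen. Below is a minimal reader for the binary (P6) variant; the function name and signature are illustrative, not the project's actual loader.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Minimal binary PPM (P6) reader: parses the "P6 <width> <height> <maxval>"
// header (skipping '#' comment lines), then reads width*height interleaved
// 8-bit RGB triples.  Returns false on any parse error.
bool loadPPM(const std::string &path, int &w, int &h,
             std::vector<unsigned char> &rgb) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string magic;
    in >> magic;
    if (!in || magic != "P6") return false;
    int maxval = 0;
    int *fields[3] = { &w, &h, &maxval };
    for (int i = 0; i < 3; ) {
        in >> std::ws;                       // skip whitespace before a field
        if (in.peek() == '#') {              // comment line: discard it
            std::string line;
            std::getline(in, line);
            continue;
        }
        if (!(in >> *fields[i])) return false;
        ++i;
    }
    if (maxval != 255) return false;         // 8-bit samples only
    in.get();                                // single whitespace before pixels
    rgb.resize(3 * w * h);
    in.read(reinterpret_cast<char *>(&rgb[0]),
            static_cast<std::streamsize>(rgb.size()));
    return bool(in);
}
```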

The application is built on top of the OpenGL skeleton given in class. I added some functions to the FrameBuffer class:

There are two other classes in the program: ResampledImage and MosaicImage.

ResampledImage is used in the zooming process and has the following member functions:

MosaicImage is used to store different versions of the images in the library and has the following member functions:

Discussion

The three classes described above provide the minimal functionality to implement the project. Searching the library for the closest match to a tile works as follows: a tile of size TileSize is downsampled to size DownSize and then compared with the same-size version of every image in the library. A distance function sums the color differences over all pixels, and the library image with the smallest distance is chosen as the closest match.
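One way to realize that distance function is a sum of absolute per-channel differences; the exact metric used in the program is not shown here, so this sketch and its names are assumptions. Buffers are assumed to hold interleaved 8-bit RGB for a DownSize x DownSize image.

```cpp
#include <cstdlib>

// Sum of absolute per-channel differences between a downsampled tile and
// one library image of the same size.  Smaller result == closer match.
long tileDistance(const unsigned char *tile, const unsigned char *lib,
                  int downSize) {
    long dist = 0;
    const int n = 3 * downSize * downSize;   // R, G, B per pixel
    for (int i = 0; i < n; ++i)
        dist += std::abs(int(tile[i]) - int(lib[i]));
    return dist;
}
```

The closest match is then simply the library image minimizing this value over an exhaustive scan.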

I tried different values for TileSize and DownSize. Small values of TileSize produce good-looking pictures. Small values of DownSize produce faster results (less computation), but when zoomed in, the higher-resolution versions do not match the original image closely enough. Large values of TileSize produce slower results, and the quality does not justify the extra time spent: because the number of images in the library is relatively small, there are few candidates for a close match to any particular tile. The images below illustrate the two situations, taken to the extreme on a high-contrast picture.

An image with TileSize=4 and DownSize=2.

The same image, zoomed in.

With bigger tiles, a close match is hard to find (TileSize=8, DownSize=4).

Both situations make it fairly obvious that the larger the image library, the better the chances of finding a close match, and the higher the quality of the mosaic. However, a larger library would take longer to load and compare against, and would occupy more memory.

Exhaustively comparing each tile with each image in the library takes 10 to 20 seconds on my computer, so the next thing that comes to mind is optimization. I eliminated all the multiplications from the distance function and reorganized the memory allocation for the tiles so that the summation runs smoothly. Another idea was to try different color encoding schemes that would allow fewer comparisons while still producing reasonable results. One such scheme transforms RGB to HSV and either encodes the channels with fewer bits or ignores S altogether. With enough images, the overhead of converting between the two encodings would be amortized. However, the gain in speed would not compensate for the loss in quality.
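The report does not show how the multiplications were eliminated, but one standard trick for a squared-difference metric is to precompute a table of squares indexed by the (shifted) byte difference, so the inner loop does only lookups and additions. The names below are illustrative.

```cpp
// Precomputed squares of every possible byte difference (-255..255),
// shifted by 255 so the table index is never negative.  After the one-time
// init, the per-pixel distance needs no multiplications at all.
static long sq[511];

void initSquares() {
    for (int d = -255; d <= 255; ++d)
        sq[d + 255] = long(d) * d;   // the only multiplies, done once
}

// Squared distance over n interleaved 8-bit samples, multiply-free.
long squaredDistance(const unsigned char *a, const unsigned char *b, int n) {
    long dist = 0;
    for (int i = 0; i < n; ++i)
        dist += sq[int(a[i]) - int(b[i]) + 255];   // table lookup, no d*d
    return dist;
}
```

On 2002-era hardware, replacing an integer multiply per pixel with a table lookup in the innermost loop was a meaningful saving.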

Future work

The other optimization direction I thought about was a data structure that would allow efficient closest-match searches. The problem is similar to nearest-neighbor searches in a 3*D*D-dimensional space, where 3 is the number of color channels (R, G, B) and D is the image size at which the comparison is done (DownSize in my program). One of the data structures that can be used for such a query is a tree similar to the KD-tree used for range queries in computational geometry. I did not have enough time to implement it, but it is definitely a direction worth considering.
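To make the idea concrete, here is a small illustrative KD-tree sketch (not part of the project): each library image's downsampled RGB pixels are flattened into one point in k = 3*D*D dimensions, the tree cycles through split dimensions by depth, and the search prunes any subtree whose splitting plane is already farther away than the best match found so far.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

typedef std::vector<double> Point;   // one flattened downsampled image

// Comparator ordering points along a single dimension, for nth_element.
struct AxisLess {
    size_t axis;
    explicit AxisLess(size_t a) : axis(a) {}
    bool operator()(const Point &a, const Point &b) const {
        return a[axis] < b[axis];
    }
};

struct KDTree {
    std::vector<Point> pts;        // node i's point (tree's own order)
    std::vector<int> left, right;  // child node indices, -1 if none
    size_t k;                      // dimensionality, 3*D*D for images

    explicit KDTree(std::vector<Point> data) : k(data[0].size()) {
        buildRec(data.begin(), data.end(), 0);
    }

    int buildRec(std::vector<Point>::iterator lo,
                 std::vector<Point>::iterator hi, size_t depth) {
        if (lo == hi) return -1;
        std::vector<Point>::iterator mid = lo + (hi - lo) / 2;
        std::nth_element(lo, mid, hi, AxisLess(depth % k));  // median split
        int node = int(pts.size());
        pts.push_back(*mid);
        left.push_back(-1);
        right.push_back(-1);
        left[node]  = buildRec(lo, mid, depth + 1);
        right[node] = buildRec(mid + 1, hi, depth + 1);
        return node;
    }

    // Returns the index (into pts) of the stored point closest to q.
    int nearest(const Point &q) const {
        int best = -1;
        double bestD2 = std::numeric_limits<double>::max();
        nearestRec(0, q, 0, best, bestD2);
        return best;
    }

    void nearestRec(int node, const Point &q, size_t depth,
                    int &best, double &bestD2) const {
        if (node < 0) return;
        double d2 = 0;                         // squared distance to node
        for (size_t i = 0; i < k; ++i) {
            double d = pts[node][i] - q[i];
            d2 += d * d;
        }
        if (d2 < bestD2) { bestD2 = d2; best = node; }
        double diff = q[depth % k] - pts[node][depth % k];
        int near = diff < 0 ? left[node] : right[node];
        int far  = diff < 0 ? right[node] : left[node];
        nearestRec(near, q, depth + 1, best, bestD2);
        if (diff * diff < bestD2)              // can the far side still win?
            nearestRec(far, q, depth + 1, best, bestD2);
    }
};
```

A production version would carry the original library index alongside each point, and in such high dimensions the pruning degrades, so approximate variants are often preferred.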

Keyboard interface

The program accepts the following keyboard commands:

Links

Results

Below are some screenshots produced by my program.

Professor Gary Bishop's picture from his webpage.

Last modified on May 11, 2002 by .