COMP 790 - Project: Photofractals

Ryan Schubert
Fall, 2008



Original Proposals (added 9/10/2008)



(Added 10/1/2008)

I have decided to do photofractals--I think that in most cases the automatic edge-preserving image compositing project would not produce very attractive results without careful user specification of the source image patch and the destination photo, which sort of defeats my original idea (to computationally find cool image patch relationships in photos that were not initially apparent to a user).


Overview
Basic idea of photofractals as an infinitely zooming slideshow transition:
The user provides some input sequence of photos (for simplicity, we can assume that the order is specified, for now). For each sequential pair of photos, the idea is to find the position and scale at which a scaled-down copy of the second photo best matches a patch of the first, composite it there, and then zoom into that patch until it fills the frame, at which point the process repeats with the next pair.
I don't think MATLAB will work for displaying the resulting transitions, despite how nice it is in terms of interactive testing and debugging. Instead this will likely need to be an OpenGL C++ app.
That said, for initial testing of the target patch/photo alignment algorithm, without any transition animation, I've started writing some MATLAB code.
pf.m - brute-force for loops for finding the lowest-distance position of a 1/20-scaled second photo in the first (a rough sketch of what this search does appears just after this list). This is super slow and absolutely needs lots of speeding up.
box.m - simple helper script I was using to draw a box in a figure
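
Roughly, the search does something like the following (a minimal sketch rather than the actual pf.m; the filenames are placeholders and it assumes a plain summed squared RGB difference):

    % Sketch of the brute-force position search (assumed form; not the real pf.m).
    photo1   = im2double(imread('photo1.jpg'));   % placeholder filenames
    photo2   = im2double(imread('photo2.jpg'));
    patchImg = imresize(photo2, 1/20);            % 1/20-scaled copy of the second photo
    ph = size(patchImg, 1);  pw = size(patchImg, 2);
    h  = size(photo1, 1);    w  = size(photo1, 2);

    bestDist = inf;
    bestPos  = [1 1];
    for r = 1:(h - ph + 1)                        % every possible top-left position
        for c = 1:(w - pw + 1)
            region = photo1(r:r+ph-1, c:c+pw-1, :);
            d = sum((region(:) - patchImg(:)).^2);  % summed squared RGB difference
            if d < bestDist
                bestDist = d;
                bestPos  = [r c];
            end
        end
    end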


Some additional thoughts at this point:
(Added 10/2/2008)
Updated pf.m
Did some tests for the case where photo1 == photo2 (maybe the most fractal-like type of result).
I'm currently looping over some small range of scales for the target patch (e.g. from 0.06 to 0.11 in steps of 0.01). I can also look at L2 distance in RGB space or gradient space now (or some weighted combination of the two).
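
For reference, that combined distance looks roughly like this (a sketch of the idea rather than my actual code; the grayscale gradients and the per-pixel normalization are assumptions):

    % Sketch of the combined RGB + gradient L2 distance (assumed form).
    % Assumes RGB double inputs; this would live in its own file, e.g. patchDistance.m.
    function d = patchDistance(region, patchImg, wRGB, wGrad)
        dRGB = sum((region(:) - patchImg(:)).^2);           % L2 distance in RGB space
        [gx1, gy1] = gradient(rgb2gray(region));             % gradients of the candidate region
        [gx2, gy2] = gradient(rgb2gray(patchImg));           % gradients of the scaled patch
        dGrad = sum((gx1(:) - gx2(:)).^2) + sum((gy1(:) - gy2(:)).^2);
        d = (wRGB * dRGB + wGrad * dGrad) / numel(patchImg); % normalize by patch size
    end

With wGrad = 0 this reduces to the RGB-only case below, and with wRGB = 0 to the gradient-only case.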
Using torre.bmp as both input images and looking over the scale range above I get the following results:
Visually, the scale range of the target patches was as follows:

Distance measure                               Lowest distance match
Only RGB distance                              (result image)
Only gradient distance                         (result image)
RGB and gradient distances weighted equally    (result image)
Looking only at color, we tend to prefer just matching the dark lower quarter of the image to the horizon, while looking only at the image gradients we get terrible color discontinuities. Looking at both color and gradients, however, gives us what I would consider a pretty decent match in the image.
(Added 10/29/2008)
Updated pf.m

At this point I felt the need to test this out on a variety of images with different properties, to try to get an idea for which images work better, what sorts of things might be problematic, and what obvious visual artifacts might result from the compositing.

Overall, there are some cases in which I think the result works fairly well, and then there are others in which the result is terrible.
Patch scales used               Input image     Result image
Some test images to verify that my algorithm is doing what I would expect for obvious test cases.
0.09, 0.10, 0.11, 0.12, 0.13
0.09, 0.10, 0.11, 0.12, 0.13
Test 3 initially did not behave as I had expected, but after considering the difference in line thicknesses between the scaled-down version and the original version, it makes sense (note that I'm currently taking the first 'best' match when there are many tied best positions). It's an interesting test case though, because perceptually it would probably work better in the center, despite the fact that there's actually a higher color difference for the pixels and the fact that the gradients do not line up at all. Perhaps an interesting thing to look into might be some way of estimating a scale-invariant gradient measure: something that might look for 'lines' that are defined by two close complementary gradients--one positive and one negative--and then yield a high response in the middle of those two gradients (essentially defining the 'middle' of the line). Then lines of slightly different widths would still elicit a high gradient response when lined up, rather than the opposite. (A rough sketch of one such measure appears just after this table.)
0.09, 0.10, 0.11, 0.12, 0.13
0.09, 0.10, 0.11, 0.12, 0.13
0.09, 0.10, 0.11, 0.12, 0.13
Note that failing to explore enough patch sizes can result in missing a good, obvious match. In this case, when I constrained the fractal image to only look at one particular patch scale (one that happened to be smaller than the scale that results in a close match in the first example), the end result is pretty bad.
0.09, 0.10, 0.11, 0.12, 0.13
0.09
In this case, it's not immediately obvious where the patch ends up in the resulting image (although once I found it, it became more obvious). I think the result mostly leverages the overall look of the input image, though, and doesn't necessarily exhibit a good match with the underlying image patch.
0.09, 0.10, 0.11, 0.12, 0.13
0.09, 0.10, 0.11, 0.12, 0.13
0.08, 0.09, 0.10, 0.11, 0.12
Here's an example of a pretty bad failure.
0.09, 0.10, 0.11, 0.12, 0.13
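
Regarding the scale-tolerant 'line' idea from the Test 3 note above: one quick stand-in for that kind of measure is a Laplacian-of-Gaussian filter, which peaks near the middle of a thin dark or light line--roughly the "middle of two complementary gradients" behavior described there. Something like this (filename and filter parameters are just placeholders):

    % Sketch: LoG response as a rough line-center measure (size/sigma are guesses).
    img = im2double(imread('test3.bmp'));            % placeholder filename
    if size(img, 3) == 3, img = rgb2gray(img); end
    h = fspecial('log', 13, 2.0);                    % 13x13 LoG with sigma = 2
    lineResponse = abs(imfilter(img, h, 'replicate'));
    imagesc(lineResponse); axis image; colormap gray;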


I also wrote my own image embedding function for inserting an image patch into another image; it supports simple linear crossfading over a specified edge width (in pixels):
embed.m
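
Roughly, the crossfaded embedding works like this (a sketch of the idea rather than the actual embed.m; argument names are placeholders, and it assumes double images in [0,1]):

    % Sketch: embed patchImg into img with its top-left corner at (row, col),
    % crossfading linearly to the underlying image over a border of bw pixels.
    % This would live in its own file, e.g. embedSketch.m.
    function out = embedPatch(img, patchImg, row, col, bw)
        [ph, pw, pc] = size(patchImg);
        distX = min(0:pw-1, pw-1:-1:0);              % distance to nearest left/right edge
        distY = min(0:ph-1, ph-1:-1:0);              % distance to nearest top/bottom edge
        alpha2d = min(1, min(repmat(distY', 1, pw), repmat(distX, ph, 1)) / bw);
        alpha = repmat(alpha2d, [1 1 pc]);           % same weight for all color channels
        region = img(row:row+ph-1, col:col+pw-1, :); % underlying image region
        out = img;
        out(row:row+ph-1, col:col+pw-1, :) = alpha .* patchImg + (1 - alpha) .* region;
    end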

Here are a few examples of different blend widths for the village_oceanview1.jpg result. All of these were run over the same patch scales (0.09 through 0.13):
Blend width (pixels)    Result
5                       (result image)
10                      (result image)
15                      (result image)
20                      (result image)
There are a few observations from this:
1. To completely account for the color differences without simply blending over the entire patch (the light patch of 'sky' in the water is still noticeable) would require some sort of color matching or gradient-domain blending.
2. Even a little blending goes a long way in perceptually smoothing out the edges of the image patch (compare to the result above with no blending).

Another example with blending, this time using torre.jpg (over the same patch scales), with a blend width of 15. In this case we notice the underlying image starting to 'show through' at the point of the spire and on the left side--something which may not actually be desired. One solution to this problem might involve a dynamic blend width based on gradient information, e.g. keep blending as long as you aren't crossing any high gradients.
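
A crude sketch of that dynamic blend idea (purely an assumption about how it might work, building on the linear crossfade sketch above, not something I've implemented): force the blend weight back to full patch opacity wherever the underlying region has a strong gradient, so the background never fades in across an edge like the spire.

    % Sketch: suppress the crossfade where the underlying region has strong
    % gradients (gradThresh is a made-up tuning parameter). Separate file, e.g.
    % gradientLimitedAlpha.m; alpha2d is the 2-D blend weight from the sketch above.
    function alpha2d = gradientLimitedAlpha(alpha2d, region, gradThresh)
        gray = region;
        if size(gray, 3) == 3, gray = rgb2gray(gray); end
        [gx, gy] = gradient(gray);                 % gradients of the underlying image region
        gmag = sqrt(gx.^2 + gy.^2);
        alpha2d(gmag > gradThresh) = 1;            % don't fade out across strong edges
    end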


Some other current thoughts:

I'm currently doing nothing that might favor larger patch sizes--I'm simply looking for the lowest distance (normalized by the total number of pixels in the current patch size). It might actually be preferable to introduce some weighting such that smaller patches (which preserve less of the original detail) would be penalized. One way of thinking about this is to look at the extreme of downsampling a photo to a single pixel (presumably the average pixel value from the original photo). It might match other similarly colored pixels in the image with a very low distance, but it does a poor job of satisfying the effect I'd like to achieve.
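
One simple form this weighting could take (the functional form and the lambda value are pure guesses):

    % Hypothetical scale-penalized score: the per-pixel distance is inflated for
    % small patch scales so that larger patches win when distances are comparable.
    lambda  = 0.5;                                          % made-up tuning parameter
    scoreFn = @(distPerPixel, patchScale) distPerPixel * (1 + lambda / patchScale);
    % e.g. a 0.06-scale patch gets a noticeably larger penalty factor than a 0.13-scale one.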

I've yet to look at a wider range of scales and other orientations, mainly for the sake of keeping my development cycle reasonably timed. It takes anywhere from 1-2 minutes to run a single scale iteration on an image (so 5-10 minutes for most of the runs above that looked at 0.09 through 0.13), varying with the original image resolution. It would be pretty trivial to rotate the patch by increments of 90 degrees to look for matches at those orientations as well, but without further optimization it would simply quadruple the run-time. By extension, the only difficult parts about looking at arbitrary orientations are dealing with resampling artifacts and masking out the rotated patch within a larger image array. I'm sure there are optimizations that could be done on the code that I have not yet implemented, and perhaps some algorithmic improvements as well.
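
For the 90-degree case the change really is small (sketch; searchPosition is a hypothetical wrapper around the brute-force loops above, not an existing function):

    % Sketch: try all four 90-degree rotations of the scaled patch and keep the
    % best match (roughly 4x the run-time of a single orientation).
    bestDist = inf;
    for k = 0:3
        p = rot90(patchImg, k);                   % patch rotated by k*90 degrees
        [d, pos] = searchPosition(photo1, p);     % hypothetical helper wrapping the loops above
        if d < bestDist
            bestDist = d;  bestK = k;  bestPos = pos;
        end
    end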