COMP 776 - Final Project: Ribbon Carving

Ryan Schubert
April 29, 2008


(first progress report webpage can be viewed here)


Things done:

Matlab code used:
ribbon.m - main file
calcEnergy.m - calculates the energy of the given x-time slice. The timeScaleFactor is set inside this file, not passed an argument.
calcSummedArea.m - calculates the summed area table for the given energy image
findSeam.m - takes a summed area image and traces the seam of lowest energy backwards

cleanVid.m - a simple stand-alone MATLAB script that will load in an avi and remove duplicated frames. I wrote this because my digital camera would duplicate every 3rd frame while taking video. It saves out the slimmed down video as [name]_cleaned.avi


Basic algorithm outline:
1. Read in avi file (I use the MATLAB aviread() function for this)
2. Convert video data from an array of structs, each of which contain a 3D matrix (Y-X-color), to a 4D matrix (Y-X-time-color)
3. Convert any 0 color values to 1 (on a scale of 0-255 this causes no perceptible difference). A value of 0 is used later as a mask for easily collapsing the video.
4. Loop over all y slices of X-time images
    a. compute energy of current slice
    b. process energy image if desired (blur or median filter)
    c. keep running sum/max of all energy slices so far
5. Process max/avg energy image if desired (blur or median filter)
6. Compute summed area table using processed max/avg energy image
7. Compute seam of lowest energy from summed area table
8. Mask out ribbon (extruded seam) in video data with 0's
9. Collapse video volume forwards in time to fill in 0's from the cut ribbon
10. Repack 4D video data back into an array of structs
11. Write out avi with the equivalent of 1 frame removed
12. Repeat steps 4-11 until desired number of frames have been removed


Steps 1-3
See ribbon.m MATLAB file.

Step 4: Loop over all y slices of X-time images
Part a: compute energy of current slice
The energy for a given slice of the video volume is computed by taking the x and y image gradients (for R, G, and B) using the gradient() function in MATLAB. I then take the absolute value of these gradients and scale the x gradient components by the timeScaleFactor (since in my image slices, the x dimension corresponds to time). The resulting 6 values (x and y gradients for R, G, and B) are the summed; the resulting image is the energy.

Using a timeScaleFactor of 1 results in image-space high contrast edges overpowering temporal changes in some places, where what we're really intersted in is the temporal gradient. Here you can see the difference between the energy function of 3pens_cleaned.avi with a TSF of 1 and a TSF of 100:
TSF = 1 TSF = 100

Part b: process energy image, if desired (blur or median filter)
There were some cases where object motion in the X-time energy function was being 'split' because of what appeared to be video compression artifacts. An example of this can be seen in this section of the max energy and cut seam from the 3pens video when removing the 28th frame:

While we would prefer the seam to cut around all the motion energy, there is noticible, patterned energy in the static background, possibly resulting from compression artifacts in the frames of video. However, where there is lots of temporal information, the compression artifacts are less severe, resulting in less apparent energy in the space between, but temporally near, the pen's movement.

We can compensate somewhat for this by filtering each energy slice after we compute it, to remove noise. I tried both a gaussian blur and a median filter with a 3x3 and 5x5 kernel and qualitatively decided that the 3x3 median filter was the most appropriate for this step (a 5x5 median filter started noticably removing some salient information, doing more harm than good).

Part c: keep a running sum/max of all energy slices so far
I originally had proposed computing the average over all y energy slices as the energy to use for cutting my ribbon. However, after a short amount of testing it seemed to me that taking the max energy is likely better, at least for the test videos I was using. While there is some intuitive sense that averaging will result in the least salient information being lost once some important information must be lost, if we desire to simply see how many frames we can remove before seeing any significant loss of salience at all, we would prefer that all movement in the video be treated as high energy. Thus maintaining the max energy values here made more sense.

Step 5: Process max/avg energy image if desired (blur or median filter)
This is very similar to part b of step 4, and simply goes further to denoise and/or smooth the calcualted average or max energy function of the x-time images. I tried 3x3 and 5x5 gaussian and median filters for this step as well, and eventually qualitatively chose to use a 3x3 gaussian blurring pass here.

Step 6: Compute summed area table using processed max/avg energy image
This is the first half of the dynamic programming step. I start at the top of the image and work my way down, at each pixel adding that pixel's energy to the cheapest summed energy so far of its top 3 neighbors.

One trick that I included here was a 'diagonalFactor', with which I could penalize the decision of picking one of the two diagonal neighbors. A very small penalty (1.01) was enough to significantly prefer the direct vertical neighbor, resulting in mostly vertical seam cuts. This may or may not be preferable, as vertical cuts more closely resemble removing whole frames of video, resulting in less spatial distortion in the video (less 'trippyness'), but harder transitions where ribbons have been cut. There is an example of this later in the example results. Here you can see the effect in the resulting first seam of a test video with diagonalFactors of 1.0 and 1.01:
diagonalFactor = 1.0 diagonalFactor = 1.01

Step 7: Compute seam of lowest energy from summed area table
The second part of the dynamic programming process, here we simply look at the bottom of the summed area table for the lowest summed energy value and then back-track up through the table to get the cheapest path. In my implementation the resulting seam is simply an array the same length as the x dimension in the image where seam(i) = frame number of the seam at that x position.

Step 8: Mask out ribbon (extruded seam) in video data with 0's
This is where the seam is basically extruded out in y and all RGB values in our video volume that fall on the ribbon are set to 0 to be collapsed in the next step. Note that masking isn't really necessary for this and the next step (and if we cared about preserving 0 values in the video we could implement a less simple method of collapsing the video down 1 frame, however, using 0 as a mask worked fine as a proof of concept for a quicker and easier implementation).

Step 9: Collapse video volume forwards in time to fill in 0's from the cut ribbon
Here I sadly had to replace my elegantly simple:
videoData = permute(videoData, [3 1 2 4]); % now we're t y x c
videoData(videoData == 0) = []; % collapse (in t dim)
numFrames = numFrames - 1;
videoData = reshape(videoData, [numFrames height width 3]);
videoData = permute(videoData, [2 3 1 4]); % back to y x t c
with a much less elegant, but more memory conservative loop through all video frames, shifting in pixels from the next frame wherever there were 0's.

Step 10-11: Repack 4D video data and write out avi
Mostly strait forward--see ribbon.m if you're curious!



Results:
Starting with: test_cleaned.avi
A visualization of the first seam being carved. Everywhere the video is black, that is the path the seam will take through the video volume. This is essentially the result of the masked video before it is collapsed: outputSeam.avi

A sampling of the produced output. Note that for some reason I found that my media player would try to play the linked avi file and fail, while saving the file to my computer and then playing it (still in media player) would work...no idea why. At any rate, it may be much more convenient to browse my outputs from my network drive at \\phosphorus-cs\ribbon
Original video temporal weighting (TSF) diagonal penalty (diagFactor) processing on energy slices processing on max energy Output video with specified number of frames removed and corresponding energy function and last seam
test_cleaned.avi 10.0 1.0 none (using avg energy) 15 (energy) 30 (energy) 45 (energy) 100 (energy)
bflip.avi 10.0 1.0 none none 15 (energy) 30 (energy) 45 (energy) 60 (energy)
bflip.avi 10.0 1.01 none none 15 (energy) 30 (energy) 45 (energy) 60 (energy)
3pens_cleaned.avi 100.0 1.0 none none 15 (energy) 30 (energy) 45 (energy) 100 (energy) 130 (energy)
3pens_cleaned.avi 100.0 1.01 none none 15 (energy) 30 (energy) 45 (energy) 100 (energy) 130 (energy)
3pens_cleaned.avi 100.0 1.0 3x3 median 3x3 gaussian 15 (energy) 30 (energy) 45 (energy) 100 (energy) 130 (energy)
treeBlow1_cleaned.avi 100.0 1.0 3x3 median 3x3 gaussian 15 (energy) 30 (energy) 45 (energy) 100 (energy) 130 (energy)
walking1_cleaned.avi 100.0 1.0 3x3 median 3x3 gaussian 15 (energy) 30 (energy) 45 (energy) 100 (energy)
marcDavidSashi_cleaned.avi 100.0 1.0 3x3 median 3x3 gaussian 15 (energy) 30 (energy) 45 (energy) 100 (energy)