15-418 Final Project | Tim Brooks and Suhaas Reddy
Video summarization enables shortening the duration of videos, while keeping the most important information, and is often useful to condense long surveillance or time-lapse videos, or as an effect in editing clips in a film. Our video summarization algorithm is based on seam carving, and attempts to smoothly remove the least important information from videos in the temporal dimension.
Seam carving, which is traditionally applied to static images, iteratively removes “seams” (illustrated in red) with pixel values similar to those surrounding it, so as to remove information from the video that is the least significant. The resulting image maintains proportions for the most important features, as shown in the example below, while still changing the image's aspect ratio. We will attempt to apply this concept in the temporal dimension by removing surfaces of one unit thickness in time with the least important information in videos. By iteratively removing such surfaces, the video will decrease in duration, while maintaining the most important features of the video over time.
We are currently implementing an initial sequential version as a demonstration of this concept, and to use as a baseline against our parallel implementation. In the sequential implementation, we have taken an energy gradient over voxels in the video. Voxels are similar to pixels but are associated with a specific point in three dimensions, which in this case are x, y and time. We then construct a graph with edge weights corresponding with the energy gradient, find the min cut of the graph, and remove voxels that correspond with the min cut. Creating and updating the energy gradient, calculating the min cut given an energy gradient, and removing a min cut surface from the video will all be important aspects to parallelized. These components are computation heavy, and run sequentially will likely be extremely slow. For longer videos, it would also be feasible to break the video into smaller sections, and parallelize across these sections before combining them.
Maintaining the energy gradient will be difficult because in addition to requiring special locality when reading from and writing to frames in the video, temporal locality will be prominant when accessing memory to update the energy gradient. Both spacial and temporal locality must be considered in our data structures used to store video information, as well as our access patterns when reading and updating these values.
When removing a min cut from a video, we will suffer from divergent execution, since some voxels will be removed while most will not. We will have to experiment with spacial versus temporal parallelization in this case, or possibly utilize a combination of data structures to allow for constant time access and removal of only those voxels which are in the min cut.
A fast calculation of the min cut given an energy gradient will be the most difficult and complicated task. Karger's Algorithm provides an efficient sequential method for attaining the min cut, however, the algorithm requires graph searches called in iteration, which is inherently sequential. We suspect that adapting a parallel version (possibly an approximation) will be necessary to meet our desired speed.
For this project, our code base will start from scratch. We will use the OpenCV library to read and write video files. We have experimented with different graph libraries, and will need to use one, but have not yet determined which will be best for our application. Below are the two primary papers that we are using as references, in addition to a video summarizing the second paper.
It is our goal to quickly shorten videos while making the result contain the most important information. We plan to acheive these result on a series of test videos that we have taken, including a video with multiple people walking by in a row, a video with one person walking by multiple times and a video with one person entering the scene and waving from multiple sides. We plan to achieve at least a 20x speedup from our initial sequential implementation when the video duration is cut in half.
As a reach goal, we will attempt to make the video summarization occur in under a minute. Other additional tasks that we may attampt if we have the time include testing the summarization on more complex videos and videos taken from a moving camera, and correcting for changes in scene brightness when constructing the output video to make the result look more natural.
At the parallelism competition, we hope to display input videos compared to output videos that demonstrate the video summarization algorithm we implemented, as well as graphs comparing the sequential and parallelized runtimes.
We have chosen the latedays cluster as the machine on which to optimize performance, since it allows us to use many nodes to analyze different parts of the video as well as utalize multiple cores within one node. This program is likely not best suited for a GPU since computing the min cut is the most computationally intensive aspect of the program, and it is dependent on large graphs, which will likely not fit on a GPU. We have elected to write our tool in C++, which will allow us to easily use OpenCV, as well as fine-tune our own performance.
|3/28||Implement a working sequential implementation of the video summarization tool. Write project proposal and create website. Research existing video summarization methods.|
|4/4||Optimize the sequential implementation in terms of data locality and algorithic efficiency. Research min cut approximation algorithms and test existing graph libraries.|
|4/11||Implement multiple min cut approximations and compare performance as well as video summarization results of the varying algorithms.|
|4/18||Integrate the ideal min cut algorithm with our code base, time all tests on the latedays cluster. Fine-tune code to optimize for running on latedays.|
|4/25||Test implementation on longer videos processed in groups of frames. Experiment with other test videos. Potentially implement frame blending among removed frames.|
|5/2||Run final timings of test videos. Create graphs comparing runtimes of sequential and parallel implementations. Create presentation materials. Write up results and publish on website.|
|4/21||Implement naive seam carving algorithm (Tim)|
|4/23||Optimize with blocking (Tim)|
|4/24||Optimize with SIMD (Suhaas)|
|4/27||Add approximations to improve speedup (Suhaas/Tim)|
|5/3||Add interface for object removal|
|5/5||Polish object removal interface|
The work completed thus far is a naive sequential implementation of temporal seam carving for videos. The program takes a short video, finds gradients at every pixel based on the values of all of the pixels around it, and creates a corresponding graph. Using the GridCut library, we find the min-cut of the graph, which is also the surface with the least total energy gradient. We remove that surface and recompute the new energy gradients for the voxels before repeating to remove more surfaces. This technique removes one voxel from every x-y coordinate from possibly multiple frames, which theoretically makes the video shorter in the least noticeable way possible.
We have found that the runtime is extremely long for this computation, which makes testing very difficult. Slow testing has resulted in us completing fewer of our goals than we had hoped. While our concept appears to work on short videos that remove a few frames, we have decided that we will gain more and create a more impressive result from modifying our project to spatial seam carving. This allows us to still benefit from the extensive research we have done on seam carving, as well as use a similar platform, while making the final product more feasible.
A large challenge we face is the time crunch involved in meeting our new goals, which are related but somewhat disjoint from our original plans. We are fairly confident in our ability to use OpenCV and to optimize the operations based on the size of the cache size/SIMD width/etc. Beyond that we believe the main obstacle in our way is the amount of time we will be able to spend coding over the next few weeks to create a stunning presentation. We still intend to show a mixture of graphs showing speedup and media at the presentation.