Rolling Shutter Tracker

Best paper award at ISMAR 2016
ISMAR-TVCG paper

This project is motivated by the latency requirements of Augmented Reality (AR) and Virtual Reality (VR) systems. An AR/VR system has three subsystems:

  1. Tracking
  2. Rendering
  3. Display

In this project we target the tracking subsystem. Our goal is to create a high-frequency camera-based tracker.


Why tracking?
Here I would like to quote a sentence from Greg Welch's SCAAT thesis: "tracker latency is unique in that it (with the estimation rate) determines how much time elapses before the first possible opportunity to respond to user motion. When the user moves, we want to know as soon as possible."
This concept of the first possible opportunity to respond to motion is our main motivation for tackling tracking.
NOTE: Tracking latency is not necessarily the inverse of tracking frequency. Here we present a proof-of-concept system that targets only the tracking frequency. To get a sense of the actual latency, we would need to implement our algorithm in Verilog and deploy it on FPGAs.

1.1 Abstract

To maintain a reliable registration of the virtual world with the real world, augmented reality (AR) applications require highly accurate, low-latency tracking of the device. In this paper, we propose a novel method for performing this fast 6-DOF head pose tracking using a cluster of rolling shutter cameras. The key idea is that a rolling shutter camera works by capturing the rows of an image in rapid succession, essentially acting as a high-frequency 1D image sensor. By integrating multiple rolling shutter cameras on the AR device, our tracker is able to perform 6-DOF markerless tracking in a static indoor environment with minimal latency. Compared to state-of-the-art tracking systems, this tracking approach performs at significantly higher frequency, and it works in generalized environments. To demonstrate the feasibility of our system, we present thorough evaluations on synthetically generated data with tracking frequencies reaching 56.7 kHz. We further validate the method's accuracy against ground truth data on real-world images collected from a prototype of our tracking system, which uses standard commodity GoPro cameras capturing at a 120 Hz frame rate and tracks at 80.4 kHz.


1.2 Key Ideas

  1. Rolling shutter as a 1-D high-frequency sensor
  2. Use of redundant cameras
  3. Linearized motion model
  4. Direct pixel-wise comparison

1.2.1 Rolling shutter as a 1-D high-frequency sensor

Most cameras in smartphones and GoPros employ what is called a rolling shutter (RS). With a rolling shutter, each row of the image is captured at a slightly different time, so each row is a snapshot of the scene at a different time instant. We treat each captured row as an individual sample rather than treating the 2-D image as a single frame. The frequency of these row-images (row-samples) is very high: even an iPhone camera capturing 720-row images at 120 fps produces row-samples at more than 80 kHz. In our system, we process each row-image to produce a tracking estimate.
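The row-sample rate quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that rows are read out back to back at a uniform rate with no blanking interval between frames, which real sensors may not satisfy.

```python
def row_sample_rate_hz(rows_per_frame, frames_per_second):
    """Row-samples per second, assuming back-to-back readout with no
    blanking interval between frames (a simplifying assumption)."""
    return rows_per_frame * frames_per_second


def row_timestamp_s(frame_index, row_index, rows_per_frame, frames_per_second):
    """Capture time of a single row, assuming uniform readout over the frame."""
    row_period = 1.0 / row_sample_rate_hz(rows_per_frame, frames_per_second)
    return (frame_index * rows_per_frame + row_index) * row_period


rate = row_sample_rate_hz(720, 120)        # 720 rows at 120 fps
print(f"{rate / 1000:.1f} kHz")            # 86.4 kHz, i.e. more than 80 kHz
print(row_timestamp_s(0, 1, 720, 120))     # time offset of the second row
```

Under these assumptions, consecutive rows are roughly 11.6 microseconds apart, which is the time step available to a per-row tracker.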

1.2.2 Redundant cameras

Using a rolling shutter camera as a tracking sensor has two important limitations:

  1. An RS camera is a dynamic 1-D sensor: the row being captured moves periodically across the image
  2. In an RS camera, each row corresponds to only a small slice of the vertical FoV of the camera

To tackle these two challenges, we use a set of redundant cameras.
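To get a feel for the second limitation, the sketch below estimates how much of the vertical field of view a single row covers. It assumes an ideal pinhole camera with no lens distortion; the 720-row sensor height and the 90 degree vertical FoV are hypothetical values chosen for illustration, not the parameters of our prototype.

```python
import math


def row_slice_deg(row, rows, vertical_fov_deg):
    """Vertical angle (in degrees) covered by one image row, pinhole model."""
    half = rows / 2.0
    # Focal length in pixels implied by the sensor height and vertical FoV.
    focal_px = half / math.tan(math.radians(vertical_fov_deg) / 2.0)
    top = math.atan((row - half) / focal_px)
    bottom = math.atan((row + 1 - half) / focal_px)
    return math.degrees(bottom - top)


rows, vfov = 720, 90.0  # hypothetical sensor height and vertical FoV
center = row_slice_deg(rows // 2, rows, vfov)
print(f"center row covers {center:.3f} deg of a {vfov:.0f} deg FoV")
```

With these numbers a single row sees well under a degree of the scene, which is why one row-sample from one camera constrains the pose only weakly.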

1.2.3 Linearized motion model

Because we track at a very high frequency, we can assume that the motion between two consecutive timestamps is small. This lets us linearize the motion model, which makes it simple and efficient to compute.
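The effect of the small-motion assumption can be illustrated with the standard small-angle approximation of a rotation, R ≈ I + [w]x, compared against the exact Rodrigues formula. This is a generic sketch, not the parametrization used in the paper; the 180 deg/s rotation speed and the helper names are assumptions made for the example.

```python
import math


def skew(w):
    """Cross-product (skew-symmetric) matrix [w]x of a 3-vector."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_exact(w):
    """Rodrigues' formula for a rotation vector w (axis times angle, radians)."""
    theta = math.sqrt(sum(c * c for c in w))
    k = skew(w)
    k2 = matmul(k, k)
    a = math.sin(theta) / theta if theta > 1e-12 else 1.0
    b = (1.0 - math.cos(theta)) / theta ** 2 if theta > 1e-12 else 0.5
    return [[(i == j) + a * k[i][j] + b * k2[i][j] for j in range(3)]
            for i in range(3)]


def rotation_linearized(w):
    """First-order approximation R ~ I + [w]x, valid for small angles."""
    k = skew(w)
    return [[(i == j) + k[i][j] for j in range(3)] for i in range(3)]


def max_abs_error(w):
    exact, approx = rotation_exact(w), rotation_linearized(w)
    return max(abs(exact[i][j] - approx[i][j])
               for i in range(3) for j in range(3))


speed = math.radians(180.0)                    # hypothetical head rotation speed
err_row = max_abs_error((0.0, 0.0, speed / 86400))   # one row period
err_frame = max_abs_error((0.0, 0.0, speed / 120))   # one full frame period
print(f"per-row error:   {err_row:.2e}")
print(f"per-frame error: {err_frame:.2e}")
```

The linearization error grows roughly with the square of the rotation angle, so updating once per row rather than once per frame shrinks it by several orders of magnitude in this example.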

1.2.4 Direct pixel-wise comparison

Using direct pixel-wise comparison instead of a feature-extraction-and-matching pipeline keeps our system efficient. We match pixels using binary descriptors, with the Hamming distance as the matching cost, which makes the system even more efficient.
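A minimal sketch of Hamming-cost matching along a row is given below. The descriptor here is a hypothetical 1-D census-style descriptor invented for this example (one bit per neighboring pixel along the row); it is not the descriptor used in the paper, and the function names and search window are likewise assumptions.

```python
def census_1d(row, x, radius=4):
    """Hypothetical binary descriptor: one bit per neighbor along the row,
    set when that neighbor is darker than the center pixel."""
    bits = 0
    for dx in range(-radius, radius + 1):
        if dx == 0:
            continue
        xi = min(max(x + dx, 0), len(row) - 1)   # clamp at the row borders
        bits = (bits << 1) | int(row[xi] < row[x])
    return bits


def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def best_match(ref_row, cur_row, x, search=3, radius=4):
    """Return (shift, cost) with the lowest Hamming cost within +/- search."""
    target = census_1d(cur_row, x, radius)
    best = None
    for s in range(-search, search + 1):
        xr = x + s
        if 0 <= xr < len(ref_row):
            cost = hamming(target, census_1d(ref_row, xr, radius))
            if best is None or cost < best[1]:
                best = (s, cost)
    return best


ref = [12, 40, 7, 33, 90, 18, 64, 5, 77, 29,
       51, 3, 88, 45, 21, 70, 9, 60, 36, 82]
cur = ref[2:] + [0, 0]            # same row content, shifted by two pixels
print(best_match(ref, cur, 8))    # (2, 0): shift of 2 pixels, zero cost
```

Because the cost is an XOR followed by a bit count, it needs no floating-point arithmetic, which is part of what makes this kind of comparison cheap.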

PDF version of my talk at ISMAR 2016
PPTX version of my talk at ISMAR 2016
PDF version of ISMAR-TVCG paper 2016