Real-time Structured Light Depth Extraction

Kurtis Keller and Jeremy Ackerman

UNC Computer Science, Sitterson Hall, Chapel Hill, NC, USA




Gathering depth data using structured light is an established procedure in many different environments and applications. Many of these systems are used instead of laser line scanning because of their speed. However, some applications, in our case laparoscopic surgery, require that the depth extraction be performed in real time.

We have developed an apparatus that speeds up the raw image display and capture in structured light depth extraction from 30 frames per second to 60 and 180 frames per second. This raises the depth and texture map update rate from about 3 per second to about 15. The increased update rate allows real-time depth extraction for use in augmented medical/surgical applications.

Our miniature, fist-sized projector utilizes an internal ferro-reflective LCD display that is illuminated with cold light from a flexible light pipe. The miniature projector, attachable to a laparoscope, displays inverted pairs of structured light patterns into the body, where these images are viewed by a high-speed camera, set slightly off axis from the projector, that grabs images synchronously.

The images from the camera are ported to a graphics-processing card, where six frames are worked on simultaneously to extract depth and create mapped textures. This information is then sent to the host computer along with the 3D coordinates of the projector/camera and the associated textures. The surgeon is then able to view body images in real time from different vantage points without physically moving the laparoscope imager/projector, thereby reducing the trauma caused by moving laparoscopes within the patient.

Keywords: Depth extraction, structured light, real-time, laparoscopic

1. Introduction

During laparoscopic surgery, the surgeon is limited to the single viewpoint of the laparoscope. Stereo laparoscopes are available, but they are still limited to a single vantage point and to the surgeon's ability to fuse the two images together. If 3D depth information were included in the image, the surgeon could use different vantage points to assist in the procedure. This is especially important in large-cavity areas, such as in abdominal procedures.

We divide this work as follows. Two different designs were built for experimenting with and applying high-speed structured light depth extraction. The first was a prototype, not optimized for size, on which proof-of-concept and animal cadaver experiments were performed. The second utilizes a miniature projector, half the size of a fist, that clamps onto an endoscope; processing is performed in real time by a graphics-processor PC card.

2. Applications

As a research lab, our applications are rather narrow: our development goes toward solving the problems we encounter directly in our own research. Because of that, our two areas of application for high-speed structured light are medical use and teleconferencing. There are many more applications where 3D depth information in digital video would be useful, but we have not investigated them. Some of the more obvious ones are scene reconstruction for rooms, dental cavity molding, and digital studio sets for broadcast TV and other entertainment uses.

2.1. Medical

Our main motivation for real-time structured light depth extraction is medical applications, specifically minimally invasive surgery using laparoscopes. Our area of interest is the abdominal cavity. This part of the body is unique in that it can be inflated with CO2 and the organs viewed from the top. However, this large viewing area also causes working problems: the distance from the actuator to the tool end is quite great, and the surgeon has difficulty perceiving the location of tool ends relative to body parts. Having a 3D image to work from would allow the surgeon to view the area of interest from side to side, greatly increasing his awareness of the task.

To increase accuracy further, a video see-through Head-Mounted Display (HMD) can be worn, with a graphics computer augmenting the view with this 3D information. Such an augmented reality system utilizes the 3D imaging information to the fullest and can shorten surgical procedure times. Reduced surgical time and reduced trauma strongly influence how quickly the patient recovers from a procedure. A quicker procedure is not only more cost effective but also better for the patient overall.

Two views of a simulated medical procedure with augmented reality overlays are shown in Figures 1 and 2. Figure 1 shows a surgeon practicing on our test simulator, and Figure 2 shows the image that he sees in the HMD.

2.2. Teleconference

A second area where we have been experimenting with structured light depth extraction is teleconferencing. The ability to look the person you are talking to in the eye is very important in teleconferencing, bringing in the intimacy that real face-to-face conversation provides. Without real eye contact, the conversation is not much more engaging than a regular telephone call. Vantage points can be shifted from an off-axis camera to the point where the listener would actually be, thereby increasing the sense of presence for all participants. Real-time structured light depth extraction can allow these transformations to happen.

3. Current systems

There are many different optical depth extraction systems available, some of which are real time or close to real time. All of them have limitations, including structured light. We compare and contrast them below to show where real-time structured light depth extraction can be useful, what its current technology limitations are, and where it is not a good choice.

3.1 Line Scan

Laser line scan cameras are normally quite accurate and can make high-resolution images in under a couple of seconds. A secondary process or camera is required to superimpose a texture on the resultant 3D wireframe model. They have the advantages of very distinct separation lines, not easily confused on curved surfaces, and fairly high resolution. Their weak points are that they are slow relative to other depth extraction techniques, which can make moving or live objects difficult to scan; they fail miserably when scanning perforated objects such as hair; their light levels can be distracting to a person or animal being scanned; and they require a secondary camera or image to paint textures onto the model. However, successful systems for live subjects, such as foot scanners for custom orthotics, are available, along with several very good systems for still objects, normally using cylindrical scanning orientations.

3.2 Optical Stereo

Optical stereo depth extraction utilizes two or more cameras to identify features in a scene common to all cameras, then uses the known geometry between the cameras to interpolate the distances of items or features in the scene. This type of system can be relatively fast, including real time, and also has the advantage that the textures are gathered at the same time by the same cameras. These systems are relatively inexpensive for close-to-real-time use.

One large drawback of this type of depth grabber is the difficulty of extracting common features automatically, especially on curved surfaces. These systems operate most effectively in environments with many planes and straight lines, such as rooms. For medical use, where the internals of the body are all curved and surrounded by fluids, this type of depth extraction is not very effective.

3.3 Structured Light

Structured light utilizes a projector to display a known pattern or patterns, stripes or checkerboard designs being the most common, onto a surface. A synchronized, off-axis camera grabs these images, and the coordinates of the edges of the patterns can then be calculated. Progressively finer patterns are projected over the same area, and the increase in displayed edges yields greater depth detail. Starting with a coarse pattern helps reduce errors from strong discontinuities in features.

This type of system normally takes about five camera frames to grab an entire image with depth. Texture is inherent in the grabbing phase, either as a separate image or as one inverted pattern so that all areas have been scanned. Structured light benefits medical use because of its ability, like line scan, to accurately read the curved surfaces where optical stereo cannot. However, it has a greater lag of five or so frames, where optical stereo can be as little as one to two frames behind. This lag in structured light is what we will show how to reduce, making the technology available for real-time use.
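The exact stripe encoding is not specified above; as an illustration, the following is a minimal sketch of binary-coded stripe patterns using a Gray code (a common choice for this technique, assumed here for illustration), projected coarsest bit first, and the per-pixel decoding back to a projector column:

```python
import numpy as np

def gray_code_patterns(width, n_bits):
    """Build n_bits one-row stripe patterns, coarsest bit first.
    A Gray code is used so adjacent columns differ in only one bit,
    limiting the damage from a single misclassified pixel."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                       # binary -> Gray code
    return [(((gray >> (n_bits - 1 - i)) & 1) * 255).astype(np.uint8)
            for i in range(n_bits)]

def decode_column(bits):
    """Recover the projector column from the on/off values observed at
    one camera pixel across the pattern sequence (coarsest first)."""
    g = 0
    for b in bits:
        g = (g << 1) | (1 if b else 0)
    value = g                                       # Gray code -> binary
    while g:
        g >>= 1
        value ^= g
    return value
```

With five patterns, 2^5 = 32 distinct stripe columns can be labeled, which matches the roughly five camera frames per depth image described above.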



Table 1. Comparison of the depth extraction techniques discussed above.

                                  Line Scan                          Optical Stereo    Structured Light
Automatic feature extraction      Yes                                Difficult         Yes
Linear surfaces                   Yes                                Yes               Yes
Curved surfaces                   Yes                                No                Yes
Combined depth/texture grabbing   Separate imager or scan required   Yes               Yes
Speed                             Slow                               Real time         Almost real time

4. First design with a large digital projector

An initial design of a depth-extracting laparoscope was built in 1998 [3]. This system was based on a 3D acquisition system designed for use in an immersive teleconferencing system. The system was further modified for medical use, and several studies were performed on animal cadavers.

4.1 Hardware system

The initial design of the depth-extracting laparoscope was composed of a commercially available video projector (using Texas Instruments' digital micromirrors) with custom optics between it and a flexible fiberscope and rigid endoscope. A camera was rigidly mounted to a second laparoscope, and the two laparoscopes, projector and camera, were held rigid at a predetermined offset. A block diagram of the system is given in Figure 3. A tracking device (Flashpoint™ 5000 by Image-Guided Technologies, Inc.) tracked the assembly to register the acquired geometry to the world for use in augmented reality.

Structured light patterns were generated on an SGI O2 and projected through the tube into a simulated patient. Images from the laparoscope camera were captured and processed on the O2, which could then store or transmit the range images as well as video images for use in our augmented reality system.

This initial system was large and difficult to reposition. Further improvements included adding a fiber optic link from the projector, allowing some ability to reposition the laparoscope without needing to move the unwieldy projector. An image of this improved system is shown in Figure 4.

4.2 Structured light

Structured light is a well-known technique in computer vision for acquiring depth information [1,4,6,8]. These techniques use a projector to project a known light pattern onto a scene. The scene is then viewed by a camera at a slightly different position than the projector. The distance of objects in the scene from the camera can be calculated if the positions of both camera and projector are known and if the camera can recover the features projected onto the scene. Distance measurements may be precalculated for a particular camera/projector geometry, reducing the computation to locating features in the camera images and a table lookup of range values.
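The precalculation idea can be sketched as follows. This is an illustrative simplification, assuming a rectified camera/projector pair; the focal lengths and baseline are hypothetical parameters, not the paper's calibration:

```python
import numpy as np

def build_range_table(cam_width, proj_width, f_cam, f_proj, baseline):
    """Precompute depth for every (camera column, projector column)
    pair so that runtime range recovery is a single table lookup.
    Assumes a rectified pair separated by `baseline` along x."""
    # Normalized image coordinates, principal points at image centers.
    x_cam = (np.arange(cam_width) - cam_width / 2.0) / f_cam
    x_proj = (np.arange(proj_width) - proj_width / 2.0) / f_proj
    # Similar triangles give Z = baseline / (x_cam - x_proj).
    diff = x_cam[:, None] - x_proj[None, :]
    with np.errstate(divide="ignore"):
        return np.where(diff != 0.0, baseline / diff, np.inf)
```

At runtime, a camera pixel whose recovered stripe index is known simply indexes this table, avoiding per-pixel triangulation arithmetic.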

This system used both binary encoded stripe patterns and simple single lines. Binary encoded stripes allow a small number of images to provide information comparable to many more single stripes. Simple thresholding was used to locate the stripes in the camera images.

Complete range images could be calculated and acquired in five to ten seconds with this system. Stereo images rendered from acquired data in our augmented reality system are shown in Figure 5.

4.3 Animal cadaver experiments

This system was able to acquire adequate range images from simple models. Previous implementations of structured light fail when presented with a highly specular environment. The abdominal cavity is typically highly specular because of the fluids that coat the internal organs as well as the optical properties of the cells that line them. Structured light techniques also typically fail in this environment because of the difficulty of automatically recovering the projected features.

To test whether we could recover features from images projected onto internal organs, we tested our device on a porcine cadaver. The experimental setup is shown in Figure 6.

An image was captured from a pattern (a series of squares) projected directly onto the porcine organs. A coating of talcum powder was then blown onto the organs to reduce the amount of specular reflection, and a second image was captured without moving the laparoscope. This approach allowed us to compare the automatically recovered positions of the projected features on the raw organs against those from images whose acquisition we could be confident in. Only small differences in the positions of the projected features were found between the two sets of images.

5. Real-time custom projector system

With what we learned from the digital-projector structured light depth extraction system, we created the requirements for the real-time system. Additionally, hardware became available that would, theoretically, make a real-time system possible.

5.1 Changes from the original digital structured light system

New high-speed cameras, a fast external graphics processing board, and a way to project images at six times the standard projector speed sparked the possibility of a true, real-time structured light depth extraction system, hopefully robust and reliable enough even for medical use.

5.1.1 What we learned

The flexible projection shaft attached to a laparoscope is convenient but not very maneuverable, mainly because it cannot rotate freely. Another problem is that the projector itself is bulky, and in an operating room environment, space near the operating table is constantly at a premium with all of the necessary monitoring and other electronic equipment.

Light output was marginal. This was due to the very small aperture of the flexible laparoscope. Although a custom optical lens system was installed between the Texas Instruments micromirror device and the fiberscope, it was not optimized for light-gathering ability. It is optically difficult to use a large image-plane device and shrink the projected image through a narrow aperture without losing a great percentage of the light. A new optical or projection system would be needed.

The speed of the system was also not as fast as would be acceptable for real-time use. New high-speed cameras and projectors would be needed, and hardware or dedicated preprocessing of the images would need to be performed before the data is sent to a computer bus. We determined that, for an update rate of 15 frames per second with 5 patterns x 2 fields (each pattern and its inverse) at our desired resolution of approximately 500 x 520 pixels, we would require the following bandwidth just for the raw images:

15 updates/sec x 5 frames/update x 2 fields/frame x (500 x 520) pixels x 8 bits/pixel x 1/8 byte/bit ≈ 40 Mbytes/second.

This is the minimum information bandwidth. It does not include the use of color, which adds 30% more time per final frame because a spinning color wheel is used in front of the B/W camera at the end of each depth extraction sequence for inverse color addition. Nor does it include subsampled exposure time, where a long, full-frame exposure is used in conjunction with a duplicate short exposure to extend bit depth and reduce specular reflection difficulties. With these two options, necessary for medical use where the object of interest is highly specular and accurate color is extremely important, the bandwidth increases to around 100 Mbytes/second. A preprocessor board would be needed.
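The raw-image budget can be checked directly with a back-of-the-envelope calculation (the ~100 Mbytes/second figure with color and subsampling added is quoted from the text, not derived here):

```python
fps = 15             # depth/texture updates per second
patterns = 5         # structured light patterns per update
fields = 2           # each pattern plus its inverse
width, height = 500, 520
bytes_per_pixel = 1  # 8 bits/pixel, monochrome

raw_bytes_per_sec = fps * patterns * fields * width * height * bytes_per_pixel
print(raw_bytes_per_sec / 1e6)  # 39.0, rounded to 40 Mbytes/s in the text
```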

5.1.2 New technology:

In the time after the porcine cadaver tests, new hardware appeared that we decided we could build upon. Ferro-reflective LCD displays with extremely fast frame rates (over 180 Hz) were appearing; these displays were also quite small and started us on the route to our own custom high-speed projector. A high-speed camera that could read up to 264 frames/second became available from Dalsa Corporation, and an image processing board from Matrox that could preprocess our images and interface to this camera was also becoming available.


5.2 Hardware Details of High Speed Structured Light Depth Extraction System

An image of the initial breadboard with the first high-speed images captured is displayed at right. Because of the fast display frequency and the positive-and-negative method of displaying these structured light images, the projected patterns are invisible to the eye.

In this system, the host computer generates the structured light images, in our case varying widths and quantities of stripes or squares. They are displayed through an ordinary VGA card at 640 x 480 resolution. Each of the three colors is treated as an individual displayable frame, and the images are pre-generated that way. The graphics card output goes to the custom miniature projector. The projector grabs each color and displays a single color channel at a time as a positive black-and-white image, immediately followed by an inverse of that image. The inverse keeps the LCD liquid from pooling or migrating in the display. After the image and inverse image of the red channel are displayed, the green channel is displayed, in positive and inverse images like the red. This is repeated for the blue channel. Overall, for a single color frame sent to the miniature ferro-reflective projector, six output frames are produced.
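The per-frame expansion above can be sketched as follows. This is a simplified software model; in the actual system the sequencing happens in the projector hardware:

```python
import numpy as np

def projector_sequence(rgb_frame):
    """Expand one RGB frame from the VGA output into the six monochrome
    frames the ferro-reflective LCD displays: for each color plane, the
    positive pattern and then its inverse (the inverse keeps the liquid
    crystal from pooling or migrating)."""
    frames = []
    for channel in range(3):          # R, G, B planes in turn
        plane = rgb_frame[:, :, channel]
        frames.append(plane)          # positive image
        frames.append(255 - plane)    # inverted image
    return frames
```

At 30 input frames per second, this six-fold expansion yields the 180 Hz display rate that the camera synchronizes to.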

The high-speed camera is synchronized to the VGA output driving the projector. The Dalsa CA-D6-0512 camera has a resolution of 532 x 516 and is synchronized to the projector at 180 Hz. Images from this camera are sent via four parallel 8-bit LVDS ports to a Matrox Genesis board with a special high-speed LVDS adapter card. This card performs the same structured light depth extraction techniques as the earlier flexible-scope, SGI-based system, but now in real time, sending out 20 frames/second of 3D depth maps with textures instead of the bandwidth-choking 180 Hz of uncompressed images.

5.2.1 High Speed Projector

A Displaytech ferro-reflective display engine was used as the core of the projector. Because we needed much higher light levels than LEDs could give, yet wanted to maintain a very small handheld package, we used a remote flexible light source. The particular light chosen has a very efficient IR blocker, so only visible light is projected through the fiber onto the LCD display, reducing the damage from IR heat that a normal light would impose.

Collimation lenses were used at the end of the fiber bundle to concentrate the light evenly onto the display. New lenses were also installed on the exit side of the imager to allow the image to be projected through a standard endoscope eye aperture. They were focused to project an image at 3 feet away, about the same distance that a laparoscope is set for. A special quick change mount was also integrated to allow us to switch out different endoscopes and laparoscopes quickly. Overall the projector is a scant 50mm in diameter by 67mm long. The images to the right show an assembled projector attached to a standard laparoscope extending from the right. The bottom image is an exploded view showing the fiber light entrance (top) with collimator lens; the LCD display engine and cube (center); and the exit lenses and mount (toward laparoscope).

There is now a manufacturer producing a high-speed ferro-reflective display with a reduced crystal-migration effect. By reducing the number of inverse images, required because of the crystal migration, one can speed up the overall update rate of the system.

5.2.2 Cameras

We used two different cameras to gather high-speed images in this system. The initial workhorse was a Pulnix progressive scan camera grabbing 60 frames per second. This camera was used for most of the high-speed research because of its reliability. The Dalsa CA-D6-0512 and Matrox Genesis LVDS system were theoretically a matched pair; however, our units of both components were from the first month of introduction and were, unfortunately, very unstable together.

5.2.3 Processor board

The Matrox Genesis board with the LVDS option was the preferred graphics card for the real-time nature of these experiments. The onboard processor was able to convert the image sets as fast as we expected. However, as mentioned above, interfacing to the matched camera via the high-speed LVDS was not easy, and this particular arrangement is not recommended. A more robust high-speed digital link between the camera and the image grabber/processor board is needed to make this more than a laboratory system. Because of the instability of the Matrox card and Dalsa camera together, we could not use this system on subjects.

On the Matrox board, we grab six frames in two separate sets. From each of these sets, three sets of depth information are extracted, each at a finer off-axis resolution, as in the first system running on the Silicon Graphics workstation. These sets are then sent from the Genesis card to the PC processor, where a texture map with lookup tables of depth and light intensity is updated. This data is then grabbed as a 3D skin, and the corresponding texture is overlaid. The display of this information is controlled from an external input device, normally a tracker on an HMD or surgical headband, and then shown on either a monitor or an HMD for the surgeon to view.

5.3 Future additions

There are additions that we were not able to include during these initial experiments that we feel would greatly improve the usefulness of the system:

5.3.1 Color wheel

The images gathered are all black and white. For many applications, especially manufacturing and object tracking, this is perfectly fine. In medical applications, color is imperative. A surgeon needs to see the color and changes in color of the various items and fluids that they encounter during procedures.

We intend to add a synchronized color wheel at the input of the fiber light. When a single color frame is projected, items that respond to that color will appear brighter. When each of the three colors is projected and individually imaged, the three images can be combined to create a full color image. This color image would replace the B/W one grabbed during the general structured light imaging. The color wheel would be active during only a small percentage of the time and not during the structured light grabbing; additionally, it would be displayed every other frame to reduce lag even further. The addition of the color wheel would decrease the final frame update rate by 30%.
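The reconstruction step can be sketched with a hypothetical helper. It assumes the three captures are already registered, which holds here because neither the camera nor the scene moves between the three illuminated fields:

```python
import numpy as np

def combine_color_fields(red, green, blue):
    """Stack three monochrome captures, each taken while one segment of
    the color wheel illuminates the scene, into one full-color frame
    that replaces the B/W texture grabbed during structured light."""
    return np.stack([red, green, blue], axis=-1)
```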

5.3.2 Stereo Laparoscope

A stereo laparoscope would maintain the geometry offsets required for accurate depth extraction. Because of our dual-tube prototype arrangement, the laparoscope is not easily adjusted inside the patient. Several different styles of stereo laparoscope are available, but the style needed requires a high contrast ratio between the two optical paths, since one tube path carries the high-speed camera while the other carries the ultra-bright projector.

5.3.3 Increased light depth

Because the inside of the body is wet, it is highly specular. This causes problems for projection devices, with bright reflections from various places on the surfaces blinding the camera. To reduce this blinding problem at the imager, an increase in digital depth from 8 bits to 12 or more is needed.

Currently, there are no high-resolution, high-speed cameras with more than 8 bits of output sensitivity. When such cameras become available, they will be the easiest way to reduce the specularity problem. For now, we propose a secondary scan of the image with the camera's shutter open for 1/8 the normal duration. Combining this data with the full-intensity data will result in 12 bits of light intensity information per pixel.
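The proposed two-exposure scheme can be sketched as follows. The fusion rule here is an illustrative assumption; the text only specifies the 1/8 exposure ratio:

```python
import numpy as np

def fuse_exposures(full, short, ratio=8):
    """Extend dynamic range by fusing an 8-bit full exposure with an
    8-bit exposure taken at 1/ratio of the shutter time. Where the
    full exposure saturates (specular highlights), substitute the
    short exposure rescaled into full-exposure units."""
    full16 = full.astype(np.uint16)
    short16 = short.astype(np.uint16) * ratio  # rescale to common units
    return np.where(full16 >= 255, short16, full16)
```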

Another method of reducing the specularity problem is to use various polarizing schemes between the camera and the projector. Since the reflective LCD is already polarized, unwanted reflections can be reduced for some angles by adding a polarizer on the camera side.

6. Summary

We have shown that structured light systems are feasible and that high-speed structured light is possible and operational. However, we believe that newer, more stable hardware will allow more reliable systems, making useful systems available for manufacturing, dynamic modeling, and medical use.

7. Acknowledgments

We want to thank the following for all the help and knowledge that they lent us for this research: Henry Fuchs, Mark Livingston, Sang-Uok Kum, Ramesh Raskar, Michael Rosenthal, and Andrei State.

8. References

  1. R.C. Dailey, L. G. Hasselbrook, S. Tungate, J. M. Reisig, T. A. Reed, B. K. Williams, J. S. Daughterys, and M. Bond, "Topolgraphical Analysis with Time Modulated Structured Light", SPIE Proceedings 2488, 1995, pp. 396-407.
  2. Andreas Fountoulakis, A. Loman, S. Chadwick, "Initial Experience of 3D Video Endoscopy in General Surgery", Surgical Technology International, pp. 115-119.
  3. Henry Fuchs, Mark A. Livingston, Ramesh Raskar, D'nardo Colucci, Kurtis Keller, Andrei State, Jessica R. Crawford, Paul Rademacher, Samuel H. Drake, and Anthony A. Meyer, "Augmented Reality Visualization for Laparoscopic Surgery", MICCAI ( Proceedings of First International Conference on Medical Image Computing and Computer-Assisted Intervention) 1998.
  4. P. Levoie, D. Ionescue, and E.M. Petriu, "3-D Object Model Recovery from 2-D Images Using Structured Light", IEEE Instrument Measurement Technology Conference, pp. 377-382.
  5. J. Marco, Mark A. Livingston, and Andrei State. "Managing Latency in Complex Augmented Reality Systems." Proceedings on Interactive 3D Graphics Series, ACM SIGGRAPH, pp. 49-54, 1997.
  6. R Raskar, G Welch, W Chen, "Tabletop Spatially Augmented Reality: Bringing Physical Models to Life using Projected Imagery", Second Int Workshop on Augmented Reality (IWAR'99), October 1999, San Francisco, CA.
  7. Andrei State, Gentaro Hirota, David T. Chen, William F. Garrett, and Mark A. Livingston, "Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking," Proceedings of SIGGRAPH '96, Computer Graphics, Annual Conference Series, 1996, pp. 429-438.
  8. G. Turk and M. Levoy, "Zippered Polygon Meshes from Range Images", Proceedings ACM SIGGRAPH 1994, pp. 311-318.