Principal Investigators: Henry Fuchs and Greg Welch
UNC participation in the National Tele-Immersion Initiative (the "Initiative") has two main impacts. First, the participation will dramatically accelerate our ongoing research and development of the technologies necessary to realize our dream of a tele-immersive system within an office environment. At our current level of NSF funding, it would be very difficult to build and experiment with a large-scale prototype of our "visionary" tele-immersive system. The Initiative will provide the resources to dramatically increase our ability to do so for a reasonably sized office environment.
Second, our participation in the Initiative brings new incentive and resources for addressing the networking aspects of tele-immersion in general, and of our proposed technologies in particular. In addition to the possibility of funding in support of networking research with UNC collaborators, the Initiative offers the expertise of ANS and a concrete framework within which to implement and experiment. Absent the Initiative, we had no immediate plans to address the networking aspects of this research, despite the importance of such work.
With ANS support, and by leveraging existing relevant research and funding under the NSF Science and Technology Center for Graphics and Visualization (STC), we plan to have a preliminary prototype at the end of one year, and a substantial realization of a distributed tele-immersive system in three years. This three-year "visionary" system would accommodate four distant sites (one user per site) in an office-corner tele-cubicle arrangement. In this setup each user would see (optionally in head-tracked stereo) the other three users, a reconstruction of the other three users' environments, and a common mechanical part under design. In addition, the users would be able to collaborate on the design (and manufacture) of mechanical parts and assemblies.
The UNC activities under the Initiative fall into two categories: long-term research into the algorithms and systems aimed at a full realization of our tele-cubicle dream, and more immediate (short-term) connectivity and interoperability tasks aimed at aiding the members of the Initiative who are undertaking the primary networking research.
We plan to devote virtually all of our energies to the research and development of a dramatically new tele-cubicle with a unified approach to scene acquisition, display, and calibration, achieved through globally synchronized control of the cameras, projectors, and lights in the tele-cubicle. We hope that this new approach will yield dramatically improved results and will alter the way people think about the problem of tele-immersion.
In the first year of the Initiative we anticipate being able to offer limited but compelling local (UNC) demonstrations of scene acquisition, display, calibration, and tracking technologies. By the third year of the Initiative we expect that relatively complete technology will be transferable to, and demonstrable at, remote Initiative sites.
The long-term research activities can be broken down into five key areas: scene acquisition, display, rendering, tracking, and applications. Each is discussed in turn below.
The key idea behind our planned long-term approach to scene acquisition is that for every solid particle in the tele-cubicle working area we aim to determine its 3D location and its reflectance properties in real time, and then either (1) display it or (2) display on it. In other words we seek to "extract" the geometric and photometric properties of all scene and display surfaces in the environment. This information could then be used as a (dense) representation of a local scene, acquired solely for the purpose of display at a remote site, or as calibration data for local surfaces that have been designated as "display surfaces" for the purpose of viewing similarly acquired remote scenes. For such designated display surfaces the acquired geometric and photometric data could be used to ensure color and intensity uniformity for arbitrary viewing directions, in effect autocalibrating the display surfaces. This concept is described in section 1.2.
For some time now, UNC and Ruzena Bajcsy's GRASP Laboratory at the University of Pennsylvania (UPenn) have been pursuing two complementary approaches to the scene acquisition problem. Henry Fuchs and Ruzena Bajcsy have been working in close cooperation, monitoring the progress and direction of each approach, aiming to achieve success from either direction. It is possible that the two approaches will converge in the coming years, with the method of choice being some combination of the two. Henry and Ruzena are preparing a one-page document that will clarify the relationship, and have identified several milestones for the first year.
UNC has been experimenting with, and feels that there is promise in, dynamic structured-light approaches to scene acquisition. A common approach is to project into the scene a sequence of known coded patterns, and then to observe corresponding images of the scene in order to determine a correspondence between each projected light ray and camera pixel. This correspondence information can be combined with off-line calibration information that reflects the geometric relationship between the projector and the camera, to produce range or depth data. The results can be improved by first measuring the surface reflectance by projecting and observing minimum and maximum intensity images.
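As a concrete (assumed) example of such a coded-pattern scheme, the following sketch decodes Gray-coded binary stripe patterns into a per-pixel projector column, then converts the correspondence to depth under a simplified rectified geometry; it is illustrative, not the actual UNC implementation.

```python
import numpy as np

def decode_gray_code(images, black, white):
    """Recover, for each camera pixel, the projector column that lit it.
    `images` are frames captured under N Gray-coded stripe patterns;
    `black`/`white` are reference frames for per-pixel thresholding."""
    threshold = (black.astype(np.float32) + white.astype(np.float32)) / 2.0
    gray_bits = [img.astype(np.float32) > threshold for img in images]
    # Convert per-pixel Gray code (MSB first) to a binary column index:
    # binary[i] = binary[i-1] XOR gray[i].
    bit = gray_bits[0].astype(np.uint32)
    column = bit
    for g in gray_bits[1:]:
        bit = bit ^ g.astype(np.uint32)
        column = (column << 1) | bit
    return column

def depth_from_correspondence(cam_x, proj_x, focal_px, baseline_m):
    """Simplified rectified geometry: Z = f * B / (x_cam - x_proj).
    A real system would use the full extrinsic calibration instead."""
    disparity = cam_x.astype(np.float32) - proj_x.astype(np.float32)
    safe = np.where(disparity != 0, disparity, np.inf)  # avoid divide-by-zero
    return focal_px * baseline_m / safe
```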
Such active techniques offer some advantages over "traditional" passive computer vision techniques that rely on the determination of correspondence between image pairs from naturally lit scenes. For example, active techniques can succeed with surfaces that contain little or no texture information. The correspondence of images from such "textureless" scenes is otherwise difficult or impossible to obtain. In addition, controlled reflectance measurements can improve the results for surfaces that are highly specular, a condition that can produce erroneous results with traditional passive computer vision techniques.
While such structured-light techniques are not new, we have been working on two novel aspects of the problem. First, we are working on algorithms and realizations aimed at the real-time application of such structured-light techniques. This work includes a novel calibration procedure that indirectly determines the extrinsic parameters of a camera and a projector, distributed parallelization of the computations, and a variety of more traditional code optimizations.
Second, we are working on technologies that render the otherwise distracting light patterns imperceptible: by driving micromirror-based digital light projectors (DLPs) at rates above the perceptible threshold, the dynamic patterns fuse into what is effectively plain white light. We believe that by populating the tele-cubicle environment with a synchronized collection of such "work lights", and eliminating all other sources of uncontrolled light, we will be able to acquire geometric and photometric scene information without the user ever perceiving the patterns.
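The following toy simulation illustrates why this can work, under the simplifying assumptions that the eye integrates a pattern subframe together with its complement, while a synchronized camera exposes only during the pattern subframe; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = rng.integers(0, 2, size=(480, 640)).astype(np.float32)  # binary code
complement = 1.0 - pattern          # projected immediately after the pattern

# What the eye perceives: the temporal average of the pair, i.e. a flat
# "work light" with no visible structure.
perceived = (pattern + complement) / 2.0
assert np.allclose(perceived, 0.5)  # uniform field; the code is invisible

# What a camera gated to the pattern subframe records: the code itself.
camera_frame = pattern
```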
UPenn, on the other hand, is pushing passive multi-baseline camera correlation techniques as far as possible, with minimal reliance on structured light. The researchers at UPenn have significant experience in this area, and have been making strides primarily in two areas: robust algorithms for dense image correspondence, and optimizations aimed at real-time operation.
Because, as stated above, the correspondence of images from "textureless" scenes is difficult or impossible to obtain, the researchers at UPenn are experimenting with relatively static, random light patterns that effectively apply texture to the scene surfaces, easing the correspondence problem and improving their results. (UNC is trying to minimize this correspondence problem by using dynamic structured light.) UPenn's light patterns could be projected, for example, with DLPs such as those used at UNC, or even with a simple overhead projector.
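As a sketch of the underlying multi-baseline correlation, the following accumulates window-matching error across several camera pairs, searching in inverse depth so that each camera's expected disparity scales with its baseline (in the spirit of the classic sum-of-SSD formulation). The rectified geometry and all names are illustrative assumptions, not UPenn's actual implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sssd_inverse_depth(ref, others, baselines, focal_px, inv_depths, win=7):
    """ref: reference image (H,W); others: images from the other cameras;
    baselines: horizontal offsets (meters) of each camera from the reference.
    Returns the per-pixel inverse depth minimizing the summed window SSD."""
    best_err = np.full(ref.shape, np.inf, dtype=np.float32)
    best_iz = np.zeros(ref.shape, dtype=np.float32)
    reff = ref.astype(np.float32)
    for iz in inv_depths:                      # candidate inverse depths
        err = np.zeros(ref.shape, dtype=np.float32)
        for img, B in zip(others, baselines):
            d = int(round(focal_px * B * iz))  # disparity for this baseline
            shifted = np.roll(img.astype(np.float32), d, axis=1)
            err += uniform_filter((reff - shifted) ** 2, size=win)
        better = err < best_err
        best_err[better], best_iz[better] = err[better], iz
    return best_iz
```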
While this approach appears to have less stringent camera/projector timing and synchronization requirements, UPenn will have to address the problem of the random patterns degrading the appearance of the scene, both locally and remotely. In either case, participants may be distracted by the appearance of the patterns on various surfaces in the scene. It might be necessary, for example, to synchronize the camera rig and the pattern generator(s) so that one multi-baseline snapshot can be taken with the patterns projected into the environment, and then one without. The first could be used to extract dense geometric information (depth); the second could be used to provide color for the extracted points.
In one year we should be able to extract and display a single user and limited desktop area within the tele-cubicle, in 3D. It should be possible to view the results in a nearby room (same building) at interactive rates, and at a remote site at a lower rate with higher latency.
Assuming better access to the DMD chip from Texas Instruments, we should be able to do this using imperceptible structured light.
In three years we believe that we will be able to acquire a complete office area at modest resolution, and the office-corner tele-cubicle in relatively high resolution, at interactive rates. This will require "gen-locking" multiple camera/DLP pairs, i.e. implementing synchronized multiple-cluster simultaneous extraction.
As an intermediate step we will use multiple cameras with a single DLP. In fact we plan to use our recently developed wide-field-of-view, high-resolution "camera cluster" to capture a larger portion of the desktop and office area, allowing virtually seamless views of the entire office work area in the very near future. In the first year we plan to use one or two existing 6-camera clusters. By the third year, we would design, build, and field (with our University of Utah collaborators) new, physically smaller, higher-resolution clusters made specifically to fit in an office work environment. For example, four tele-cubicles, each with a copy of a 90-degree field-of-view camera cluster mounted in an office corner, could be used to obtain a central cylindrical reconstruction of the four physically distant offices.
We have already begun construction of a corner tele-cubicle, with an arrangement that resembles the above figure. The back walls of the tele-cubicle would be used as display surfaces for a tiled display system based on the same type of digital light projectors (DLPs) discussed above. Using multiple ceiling-mounted DLP/camera pairs aimed at the designated display surfaces, we will acquire the geometric and photometric characteristics of those surfaces. This information will then be used for display autocalibration, i.e. to ensure color and intensity uniformity (blending) for arbitrary viewing directions, given arbitrarily overlapping DLP projections. Similar ceiling-mounted DLP/camera pairs will be aimed downward toward the local participant and his immediate environment, and will be used to extract a (dense) representation of the local scene. A portion of this information will then be transmitted over the network and rendered at remote sites (see next section).
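To make the blending half of this concrete, the following is a minimal sketch of one common approach: feather each projector's per-pixel weight toward its image edge, then normalize the weights of all projectors covering a surface point so their combined intensity stays uniform. The feathering heuristic, the assumption that the footprints are already registered to a common surface parameterization, and all names are illustrative assumptions.

```python
import numpy as np

def edge_falloff_weights(h, w):
    """Weight each projector pixel by its distance to the nearest image
    edge, so contributions fade out smoothly at footprint boundaries."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    return np.minimum.reduce([xs + 1.0, w - xs, ys + 1.0, h - ys])

def normalize_blend_masks(weight_maps):
    """Normalize the surface-aligned weights of all projectors covering a
    point so their combined intensity sums to one everywhere."""
    total = np.sum(weight_maps, axis=0)
    return [wm / np.maximum(total, 1e-6) for wm in weight_maps]
```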
Eventually, all of the ceiling-mounted DLP/camera pairs will have to be synchronized so that projections from one do not interfere with another. Once this is accomplished, not only will neighboring DLP/camera pairs not interfere with each other, but DLP/camera pairs could be completely overlapped in space and interleaved in time to obtain higher resolution and/or more light.
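As a sketch of what such synchronization might amount to in the simplest case, the following assigns each DLP/camera pair exclusive, time-division-multiplexed subframe slots in round-robin order; the slot counts and names are hypothetical, for illustration only.

```python
from dataclasses import dataclass

@dataclass
class DlpCameraPair:
    name: str

def build_schedule(pairs, subframes_per_frame):
    """Grant each subframe slot to exactly one DLP/camera pair, in
    round-robin order, so no two projectors light the scene at once."""
    return [(slot, pairs[slot % len(pairs)].name)
            for slot in range(subframes_per_frame)]

# Example: four ceiling-mounted pairs sharing a frame split into 12
# subframes gives each pair 3 exclusive acquisition slots per frame.
pairs = [DlpCameraPair(f"dlp_cam_{i}") for i in range(4)]
for slot, owner in build_schedule(pairs, 12):
    print(f"subframe {slot:2d}: {owner}")
```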
In addition, if the system is to offer stereoscopic display capabilities, there will likely be additional synchronization requirements. Stereo would likely be accomplished with shuttered glasses, the left and right eyes being shuttered alternately in synchrony with the projection of the corresponding left- and right-eye images on the display surface. Thus it might be further necessary to synchronize a user's shuttered stereo glasses with the DLP/camera pairs so that the tiled display surface can be used as a stereoscopic display while the user and his surroundings are simultaneously being "scanned" for reconstruction at the remote site. We will work toward direct control of the micromirrors to enable precise stereo as well as other effects, such as camera capture of the user when both eyeglass lenses are "open" to allow eye contact with collaborators. In the first year, before the "direct mirror drive" interface is finished, the tele-cubicle's operating modes will be limited, for example, to stereo with limited color, full color without stereo, or monochrome stereo with unobstructed eye contact.
With respect to the autocalibration of the display surfaces, we aim to achieve simple "plug-and-play" automation of display surface determination, with the projectors and renderer accommodating the display surface geometry and reflectance characteristics. We expect an initial implementation of autocalibration to become operational by the end of the first year. We expect this initial implementation to be mostly, but not totally, automatic, and we expect the calibration to be neither continuous nor imperceptible. The system will probably require a known target, such as a checkerboard, to be moved about the scene to roughly known locations in order to calibrate the cluster of cameras and projectors. After this semi-automatic procedure, the system will automatically calibrate itself to the (nearly) arbitrary geometry of the projection surfaces. This surface geometry will be integrated with the rendering system, which will compensate both for distortions caused by the display surface geometry and for projector images that overlap in places. We plan to demonstrate the effectiveness of the procedure by altering the screen geometry, running the automatic screen calibration, and showing the changed rendering. Although we expect the initial implementation to be conceptually correct, we are concerned that inaccuracies will necessitate further research and development.
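For planar display surfaces, the geometric half of such a calibration can be reduced to estimating a 3x3 homography per projector/surface pair from the observed target corners. The following direct-linear-transform (DLT) sketch illustrates that reduction; it is an assumption for illustration, not the actual UNC procedure.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Direct linear transform: solve dst ~ H @ src (homogeneous) from
    four or more point correspondences, e.g. projected checkerboard
    corners and their observed camera-image positions."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)   # null-space vector holds H's entries
```

The renderer would then pre-warp each frame through the inverse of this mapping so the image appears undistorted on the display surface.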
In the third year, we plan to have fully automated, continuous, imperceptible display surface autocalibration that will facilitate simple "plug-and-play" setup in a new or changing office environment, as well as keep the system in continuous calibration. Our inspiration for this is our new "ceiling" tracker, which continues to operate effectively, recalibrating itself even when beacon-embedded ceiling tiles are moved many inches from their nominal positions.
Rendering concerns for the long-term, full realization of our tele-cubicle dream appear to be significantly different from those for a short-term geometry-based system such as is typically used in Caves. In the latter case, conventional rendering engines and graphics applications are well understood, and can readily be purchased and applied. However, the dense scene acquisition techniques being pursued by both UNC and UPenn potentially give rise to new rendering (and networking) challenges as the primitives become points instead of polygons. The scene acquisition approaches appear to lend themselves more to real-time image-based rendering than to conventional geometry-based rendering.
For our dream system, the sheer volume of such real-time image-based rendering data could be overwhelming. Consider that under either the UNC or the UPenn approach, or some hybrid of the two, each scene acquisition rig will eventually be collecting both geometry and texture "images" at approximately 30 frames per second; in other words, each rig will require network and rendering bandwidth of approximately 60 frames per second. These requirements scale approximately linearly with the number of rigs: as rigs are distributed about the tele-cubicle and more and more of the scene is acquired in real time, the network and rendering bandwidth grow accordingly. The network bandwidth needs increase both locally within a single acquisition system and between remote sites; the rendering needs increase similarly at both sites (assuming replicated systems).
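A back-of-the-envelope calculation suggests the scale of the problem. The resolution and bit depths below are purely illustrative assumptions; the proposal does not fix them.

```python
# Purely illustrative numbers: 640x480 rigs, 24-bit color + 16-bit depth,
# 30 Hz.
W, H, FPS = 640, 480, 30
color_bytes = W * H * 3   # 24-bit RGB per frame
depth_bytes = W * H * 2   # 16-bit depth per frame
per_rig_mb_per_s = (color_bytes + depth_bytes) * FPS / 1e6
print(f"{per_rig_mb_per_s:.1f} MB/s (~{per_rig_mb_per_s * 8:.0f} Mbit/s) per rig")
# -> 46.1 MB/s (~369 Mbit/s) per rig, before any culling or compression,
#    scaling linearly with the number of rigs.
```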
While some increased network bandwidth and performance is anticipated as a result of help and resources from ANS, it is clear that careful data management and simplification techniques will be required to reduce the data such that both network transfer and rendering are feasible. We intend to work with our local experts on this subject to arrive at an architecture that inherently and/or explicitly achieves such simplification. The fundamental notion is to send only a fraction of the scene information at each "frame" time, for example by placing at the remote site a computationally capable machine that maintains (and predicts) the state of the scene at each of the other sites. In other words, each participant might have his or her own local database server that maintains information about the state of all the scenes remote to its user. However, before we deal with the many-user problem we want to attack the problem for just a few participants, which will be plenty challenging. We are already working on the network limitations for the few-user scenario, e.g., view frustum culling and loss-limited compression of depth maps.
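As a sketch of the first of these simplifications, the following culls acquired 3D points against the remote viewer's view frustum before transmission, so only potentially visible points cross the network; the clip-space camera model and all names are illustrative assumptions.

```python
import numpy as np

def frustum_cull(points, view_proj):
    """Keep points whose clip-space coordinates lie inside the canonical
    frustum (-w <= x,y,z <= w). `points` is (N,3); `view_proj` is the
    viewer's combined 4x4 view-projection matrix."""
    n = points.shape[0]
    hom = np.hstack([points, np.ones((n, 1), dtype=points.dtype)])
    clip = hom @ view_proj.T
    w = clip[:, 3:4]
    inside = np.all((clip[:, :3] >= -w) & (clip[:, :3] <= w), axis=1)
    return points[inside & (w[:, 0] > 0)]
```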
In the first year, we should be able to take local real-time video and depth (UPenn or UNC) and perform image-based rendering to a nearby viewer location at interactive rates. As feasible, we would render from a head-tracked viewpoint, and eventually in stereo.
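A minimal sketch of this image-based rendering step follows: each depth pixel is lifted to a 3D point in the acquisition camera's frame and forward-projected into the viewer's (possibly head-tracked) camera. The pinhole model, the absence of occlusion handling, and all names are illustrative assumptions.

```python
import numpy as np

def reproject(depth, color, K_src, K_dst, R, t):
    """depth: (H,W) meters; color: (H,W,3); K_src/K_dst: 3x3 intrinsics;
    R, t: rigid motion from the source camera to the viewer's camera."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Back-project every pixel to 3D in the source camera frame.
    z = depth.ravel()
    x = (xs.ravel() - K_src[0, 2]) * z / K_src[0, 0]
    y = (ys.ravel() - K_src[1, 2]) * z / K_src[1, 1]
    pts = R @ np.vstack([x, y, z]) + t.reshape(3, 1)
    # Project into the viewer's camera and splat colors (no z-buffering
    # in this sketch; a real renderer would resolve occlusion).
    u = np.round(K_dst[0, 0] * pts[0] / pts[2] + K_dst[0, 2]).astype(int)
    v = np.round(K_dst[1, 1] * pts[1] / pts[2] + K_dst[1, 2]).astype(int)
    out = np.zeros_like(color)
    ok = (pts[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = color.reshape(-1, 3)[ok]
    return out
```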
For either the short-term or the long-term tele-cubicle approach, there is a need to track participants' heads and hands for the purposes of rendering (stereo or head-tracked mono), as well as for application input, e.g., gestural control. Initially, we plan to use the UNC HiBall tracking system for head and hand tracking. As discussed in our recent publications (e.g., SIGGRAPH'97), this system offers data rates, latencies, and accuracy that are unrivaled among commercial tracking systems: 2,000 updates per second, 0.1 millimeter translation precision, and 0.02 degree rotation precision, within a room-sized working volume.
Despite such stunning performance, our eventual goal is to eliminate the need for a separate tracking system altogether by tracking the user as an integral part of scene acquisition. Our hope is that some of the approaches used in our HiBall tracking system can be incorporated into the larger problem of scene capture, display, and continuous calibration. In particular, we plan to pursue imperceptible structured-light approaches that track particular designated scene objects. We believe this can be done under some circumstances by illuminating an object from a DLP and then observing the reflected light from multiple camera viewpoints that individually offer incomplete information but together contain a solution.
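The following sketch illustrates how individually incomplete observations can combine: each camera contributes only a ray toward the illuminated feature, yet the least-squares intersection of several such rays pins down its 3D position. Names and the ray parameterization are illustrative assumptions.

```python
import numpy as np

def intersect_rays(origins, directions):
    """Least-squares point minimizing summed squared distance to rays.
    Each ray is o + s*d; solves  sum(I - d d^T) p = sum(I - d d^T) o."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += M
        b += M @ o
    return np.linalg.solve(A, b)
```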
In the first year we plan to rely on the UNC HiBall tracking system for head and hand tracking. In addition, we believe that we can demonstrate limited hand tracking using our ideas for imperceptible structured light. By the third year we would expect to have more robust hand tracking and integration into an application (see next section) for gestural recognition. In addition, we hope to demonstrate imperceptible structured light head tracking, although we will likely have to rely on known targets placed on the user's shuttered glasses.
We believe strongly that any prototype tele-immersion system should be coupled with one or more designated "real" applications with real users. As opposed to "toy" applications, this practice should give a more objective measure of the usefulness of the tool (the tele-immersion system). For this and other reasons we have had an ongoing relationship with researchers at the University of Utah who are experts in mechanical design and fabrication. As part of our joint work under the STC, we have been working on the collaborative design and rapid prototyping of a variety of mechanical parts, with the dual aims of developing better tools for remote collaborative mechanical CAD, and developing mechanical parts that we needed but were unable (or unlikely) to get from another source.
Participation in the Initiative offers an opportunity to accelerate various efforts to develop telecollaborative CAD tools, and to place such work on a more solid scientific basis. Our Brown and Utah colleagues are working to augment Utah's existing CAD tools with the dramatic new "sketch-like" interface of the Brown Sketch/Jot system. We intend to work with them to incorporate an interface for these CAD tools into our long-term tele-cubicle system. By the third year it might be possible to extend the capability of this system to the other sites within the Initiative so that designers at NY or USC, for example, could design a mechanical part collaboratively and have it manufactured at Utah.
In parallel with long-term efforts to incorporate such advanced CAD tools into the tele-immersion system, we feel that the Initiative offers an opportunity to better assess the applicability and usefulness of such tools in the typical mechanical design process where remote collaborators are involved. We envision participating in an experiment that follows the complete design and manufacture of a real mechanical part, with the following classes of people involved:
We at UNC feel equipped to participate as members of classes (b) and (c), while we anticipate that our Utah colleagues would cover class (a). The ideal circumstance might be a Ph.D. student at Utah interested in arguing for or against the use of tele-immersion systems for different aspects of remote mechanical CAD collaboration.
The work in this second category is proposed in recognition of one of the central goals of the Initiative, namely to provide an application, or class of applications, that will likely stress the networks of the future. The problem is that while we recognize the need to begin network research in the short term, our long-term research into the algorithms and systems aimed at a full realization of our tele-cubicle dream would, by definition, not be appropriate as a viable "network-stressing application" in the short term. Furthermore, we feel it would be a mistake to try to modify the direction of the long-term research to facilitate the early network research, in that such a redirection could adversely affect the quality and compelling nature of the scene acquisition and display systems.
Instead, we propose to participate in more immediate (short-term) connectivity and interoperability tasks aimed at aiding the members of the Initiative who are undertaking the primary networking research. This participation would also have the effect of fostering an appropriate degree of four-site interaction as rapidly as possible.
As part of this effort we envision three specific activities for the short term:
We have tentatively assigned one of our more experienced students to manage this research for the coming semester. Her major research task for the semester will be to engage with other researchers in the consortium on collaboration software, including importing the Cave-related software and bringing it up on our two-wall "PIT", our closest current approximation to a Cave.
In the later years of the Initiative we foresee a transition from our short-term Cave-based implementation to the then-complete "visionary" system with scene acquisition, display, etc. (Note that the long-term system may indeed be implemented on top of a Cave framework; it is difficult to say at this point.)
In summary, within the first year we expect to participate in some immediate (short-term) connectivity and interoperability tasks aimed at aiding the members of the Initiative who are undertaking the primary networking research. Within the first year we also plan to demonstrate portions of the complete long-term "dream system", possibly even a limited tele-cubicle at UNC and some environment at each of the other three sites.
We expect that within three years there will be a comprehensive realization of our "visionary" system, one that unifies environment display, capture, and calibration in a dramatically new way. Further, we expect that this fully functional tele-cubicle will be reproducible at other sites within the Tele-Immersion Initiative, allowing a new level of presence with distant collaborators.
January 30, 1998