Architecture Overview :
The Renderer is the backend component of the 3D Tele-immersion pipeline. It receives dense depth maps over multiple channels, renders them as one large point cloud, and re-creates a life-size, view-dependent stereo display of the acquired scene. Multiple depth maps are time-synchronized to create a composite display frame. The stereo display system uses two projectors in a passive stereo setup, while the user's current position is estimated using a HiBall wide-area tracker.
The old version of ti_viewer was designed to run on a single PC, with both the left and right eye images rendered on the same machine by a dual-output graphics card running in twin-view mode. The renderer had two threads: the first thread read depth maps off the network and re-created a point cloud display list, while the second thread received updates from the wide-area tracker and performed view-dependent rendering. With 3D reconstruction at a rate of 1-2 frames/sec with background segmentation, and depth maps at a resolution of 320 x 240 pixels, this version of ti_viewer successfully rendered approximately 50,000 points for 3 streams of depth maps. However, the architecture was simply not scalable to dense depth maps from full scene reconstruction at 640 x 480 resolution and possibly higher reconstruction frame rates. This was the motivation for creating a 3 machine architecture for a distributed renderer which could handle up to 2 million points and still render them at an interactive rate.
The 3 PC Renderer has three modules / processes: the recon_server, the left eye renderer and the right eye renderer. These 3 processes run on 3 PC's connected over a local dedicated Gigabit Ethernet switch. The PC's run Red Hat Linux 7.3. The hardware configurations of these 3 PC's are described later.
The recon_server is the aggregation node for the renderer, where all the depth map frames are received and synchronized based on a timestamp in the header of each depth map. The data is forwarded to the left and right eye renderers over the dedicated local network using UDP multicast. The actual networking protocol depends on which mode the renderer is running in. The left and right eye renderers are 2 processes running on 2 different machines, and each comprises 2 threads. The first, the network thread, reads data from the recon_server and updates the point cloud model in memory. The second, the rendering thread, renders the model at an interactive rate. The exact rendering architecture depends on the mode of operation of the renderer.
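As a concrete sketch of this two-thread structure, the following assumes POSIX threads and a semaphore hand-off; the helper functions are illustrative placeholders, not the actual ti_viewer code.

/* Minimal sketch of the network / rendering thread split (POSIX
 * threads). Helper functions are illustrative placeholders. */
#include <pthread.h>
#include <semaphore.h>

extern void receive_payload_from_recon_server(void); /* UDP read + parse */
extern void update_point_cloud_model(void);          /* fill back buffer */
extern void swap_to_new_model(void);                 /* flip buffers     */
extern void draw_point_cloud(void);                  /* one GL frame     */

static sem_t model_ready;

static void *network_thread(void *arg)
{
    for (;;) {
        receive_payload_from_recon_server();
        update_point_cloud_model();
        sem_post(&model_ready);          /* hand off to the renderer */
    }
    return NULL;
}

static void *rendering_thread(void *arg)
{
    for (;;) {
        if (sem_trywait(&model_ready) == 0)
            swap_to_new_model();         /* pick up the fresh model */
        draw_point_cloud();              /* keep the refresh rate up */
    }
    return NULL;
}

int main(void)
{
    pthread_t net, ren;
    sem_init(&model_ready, 0, 0);
    pthread_create(&net, NULL, network_thread, NULL);
    pthread_create(&ren, NULL, rendering_thread, NULL);
    pthread_join(net, NULL);
    pthread_join(ren, NULL);
    return 0;
}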
A typical tele-immersion demo is described below, along with the bandwidth and rendering data sets involved. Let us assume 5 streams of 640 x 480 resolution depth maps, each with a reconstruction rate of 1 frame/sec.
The depth maps contain a stream definition header followed by the depth map itself. Every pixel in the depth map
is represented as 5 bytes ( 3 bytes for RGB color and 2 bytes for depth ). Holes in the depth map ( points for which a correspondence was not found with high confidence ) are not rendered and are represented using a special value defined as part of the stream definition. Similarly, background pixels are represented by a special value and are not rendered. Assuming 75% reconstruction efficiency on average, the size of 1 depth map is 0.75 * 640 * 480 * 5 = 1.098 MBytes per depth map.
When a depth map pixel is converted into a 3D world point, it requires 15 bytes for storage :
12 bytes for 3 coordinates and 3 bytes for RGB color.
For the above example, 1 depth map would produce 0.75 * 640 * 480 = 230,400 points.
Each point takes 15 bytes of storage, so the total comes to 230,400 * 15 = 3,456,000 bytes, or about 3.3 MBytes per stream. With 4 streams the renderer has to render almost 1 million points at an interactive rate; with the 5 streams assumed above, over 1.1 million.
Fig 1 : Overview of the 3 PC Renderer Architecture.
Modes of Operation :
There are 2 main modes of operation. An alternative approach would have been to design two different applications, which would have had considerable overlap in functional modules such as stream parsing, networking and timestamp-based synchronization. Instead, both rendering functionalities were built into the same renderer, and the mode to run in can be decided during a tele-immersion session by the user. The 2 modes of operation shall be called the Vertex Array Range ( VAR ) Mode and the Vertex Shader Cg Mode from now on.
Fig 2: 3 PC Renderer in VAR Mode.
The main motivation behind this alternative design of the renderer was the quest for higher rendering speeds. Since conversion of the depth maps into points is a standard matrix multiplication, it was decided that a vertex shader program could probably do the computation faster. Thus the idea was to move the CPU-intensive operation from the recon_server into the rendering processes. An extra benefit would be lower bandwidth over UDP, since 1 depth map pixel ( 5 bytes ) requires more storage space as a 3D point ( 15 bytes ). We still had to implement this design to be sure of the trade-offs and gains of this approach.
- Recon Server in Cg Mode
In this mode the recon_server becomes a simpler module which merely synchronizes the streams based on timestamps and chooses the particular depth maps to display in the next update. Each of these depth maps is forwarded or relayed to the left and right eye renderers over different UDP connections. Each depth map is received in the renderers on a different UDP port which is automatically selected during initialization.
In this mode the complete depth map needs to be transported from the recon_server to the left and right eye renderers running in Vertex Shader Cg Mode. For a low reconstruction efficiency, the size of all the depth maps put together is higher than that of the composite 3D point cloud payload in the VAR mode of operation. However, a simple analysis shows that for a reconstruction efficiency higher than 33%, the bandwidth over UDP for a 3D point cloud is higher than that of all the corresponding depth maps.
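To make the crossover explicit, compare the two payloads for a W x H depth map with reconstruction efficiency e:

$$
\underbrace{15\,e\,WH}_{\text{point cloud bytes}} \;>\; \underbrace{5\,WH}_{\text{depth map bytes}}
\quad\Longleftrightarrow\quad e > \tfrac{1}{3} \approx 33\%.
$$

At the 75% efficiency assumed earlier, the point cloud is 2.25 times the size of the corresponding depth maps.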
- Renderer in Cg Mode
This time the renderer has to parse the depth maps from the UDP stream, in the same way that the recon_server parses the streams in the VAR Mode. The depth map buffer is passed by the rendering thread directly to the vertex shader program. The conversion of depth maps to 3D points is done through floating point instructions in a Cg program. Depth map pixels that are holes or background pixels are projected to the plane at infinity, and in this way are not rendered visible.
A top-level view of the architecture of the renderer in the Cg Mode is shown below :
Fig 3: 3 PC Renderer in Vertex Shaders Cg Mode.
- Parsing Depth Maps / Stream Definition
In this section we shall briefly look at the stream definition, i.e. the protocol for the frames in the depth map stream.
Every Depth Map Frame contains the following fields in the following sequence :
Field           Size / Type                              Description
frame           4 bytes, unsigned int                    Frame number / timestamp field
height          4 bytes, int                             Height of the frame, typically 480 or 240
width           4 bytes, int                             Width of the frame, typically 640 or 320
offset          4 bytes, float                           Offset of the camera from the image plane
stepsize        4 bytes, float                           Depth quantization stepsize used in reconstruction
red length      4 bytes, unsigned int                    Length of the buffer of red values for all pixels
red plane       byte stream of the above length          Red buffer
green length    4 bytes, unsigned int                    Length of the buffer of green values for all pixels
green plane     byte stream of the above length          Green buffer
blue length     4 bytes, unsigned int                    Length of the buffer of blue values for all pixels
blue plane      byte stream of the above length          Blue buffer
depth rows      4 bytes, int                             Depth frame height, typically the same as the texture frame
depth columns   4 bytes, int                             Depth frame width, typically the same as the texture frame
depth length    4 bytes, unsigned int                    Length of the depth pixel buffer
depth image     2-byte unsigned shorts, above length     Depth pixel buffer; each pixel is an inverse-z ( 1/z ) value
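A minimal parsing sketch for this layout is given below. It assumes the fields arrive in the machine's native byte order ( the stream definition above does not specify one ) and uses a hypothetical read_exact() helper that loops on read() until exactly n bytes have arrived.

/* Sketch of a depth map frame parser for the stream definition above.
 * Assumptions: native byte order; read_exact() is a hypothetical
 * helper that loops on read() until n bytes have been consumed. */
#include <stdint.h>
#include <stdlib.h>

extern int read_exact(int sd, void *buf, unsigned n);

typedef struct {
    uint32_t  frame;              /* frame number / timestamp        */
    int32_t   height, width;      /* e.g. 480 x 640                  */
    float     offset, stepsize;   /* depth dequantization parameters */
    uint8_t  *red, *green, *blue; /* one contiguous plane each       */
    int32_t   depth_rows, depth_cols;
    uint16_t *depth;              /* inverse-z values, 2 bytes/pixel */
} DepthFrame;

static int read_plane(int sd, uint8_t **plane)
{
    uint32_t len;
    if (read_exact(sd, &len, 4) < 0) return -1;
    *plane = malloc(len);
    return read_exact(sd, *plane, len);
}

int parse_depth_frame(int sd, DepthFrame *f)
{
    uint32_t dlen;
    if (read_exact(sd, &f->frame,    4) < 0 ||
        read_exact(sd, &f->height,   4) < 0 ||
        read_exact(sd, &f->width,    4) < 0 ||
        read_exact(sd, &f->offset,   4) < 0 ||
        read_exact(sd, &f->stepsize, 4) < 0) return -1;
    if (read_plane(sd, &f->red)   < 0 ||
        read_plane(sd, &f->green) < 0 ||
        read_plane(sd, &f->blue)  < 0) return -1;
    if (read_exact(sd, &f->depth_rows, 4) < 0 ||
        read_exact(sd, &f->depth_cols, 4) < 0 ||
        read_exact(sd, &dlen, 4) < 0) return -1;
    f->depth = malloc(dlen);
    return read_exact(sd, f->depth, dlen);
}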
At startup, every TCP connection in the recon_server receives the camera matrix for its view in the scene acquisition room and stores this information. The camera transformation matrix is used to compute the 3D coordinates of the point from the inverse-z value of the (i,j)-th pixel in the depth map. A brief explanation of how the (X, Y, Z) coordinates are obtained from the inverse-z values follows.
$$
\begin{pmatrix} x_c \\ y_c \\ w_c \end{pmatrix}
= P \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad P \in \mathbb{R}^{3 \times 4}
$$
( x_c, y_c, w_c ) are the homogeneous image coordinates, whereas ( X, Y, Z ) are the world coordinates of the point. Now add a 4th row to the column vector on the left in the above equation, and a 4th row [ 0 0 0 1 ] to the matrix:
$$
\begin{pmatrix} x_c \\ y_c \\ w_c \\ 1 \end{pmatrix}
= \tilde{P} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\tilde{P} = \begin{pmatrix} P \\ 0 \;\; 0 \;\; 0 \;\; 1 \end{pmatrix}
$$
Dividing the column vector on the left by w_c gives the inhomogeneous pixel coordinates ( i, j, 1, 1/w_c ). But w_c is nothing but z_c in the above equation. Thus, by augmenting the normal 3 x 4 camera projection matrix with a [ 0 0 0 1 ] row, we have stored ( 1/w_c ), i.e. the inverse-z value, in the column vector. The matrix that the renderer obtains at startup from the reconstruction module is nothing but the inverse of this 4 x 4 matrix.
Thus for the (i,j)-th pixel, ( X, Y, Z ) is obtained by a matrix multiplication as follows:
$$
\begin{pmatrix} X' \\ Y' \\ Z' \\ W' \end{pmatrix}
= \tilde{P}^{-1} \begin{pmatrix} i \\ j \\ 1 \\ 1/z \end{pmatrix},
\qquad
(X, Y, Z) = \left( \frac{X'}{W'}, \frac{Y'}{W'}, \frac{Z'}{W'} \right)
$$
Since the reconstruction code sampled the depth by some quantum, the stored depth value is multiplied by the reciprocal of the stepsize to recover ( 1/z ) in the above equation, and an extra offset is added because the camera center may not be exactly at the origin in the above model.
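Put together, the per-pixel conversion looks roughly like this; a sketch assuming the inverse 4 x 4 matrix M is stored row-major and d is the quantized inverse-z pixel value:

/* Sketch: convert the (i,j)-th depth pixel to a 3D world point.
 * M is the inverse 4x4 camera matrix received at startup (row-major);
 * d is the quantized inverse-z value from the depth buffer. */
typedef struct { float x, y, z; } Point3;

Point3 pixel_to_world(const float M[4][4], int i, int j,
                      unsigned short d, float stepsize, float offset)
{
    /* recover 1/z from the quantized value, as described above */
    float v[4] = { (float)i, (float)j, 1.0f,
                   (float)d * (1.0f / stepsize) + offset };
    float p[4];
    for (int r = 0; r < 4; r++)
        p[r] = M[r][0]*v[0] + M[r][1]*v[1] + M[r][2]*v[2] + M[r][3]*v[3];
    /* divide by the homogeneous coordinate to get inhomogeneous X,Y,Z */
    Point3 w = { p[0] / p[3], p[1] / p[3], p[2] / p[3] };
    return w;
}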
- Timestamp based Synchronisation
The recon_server maintains a buffer of M depth map frames from each of the N streams; thus at any time the buffer can hold at most M*N frames, whose timestamps are recorded alongside. The buffer is treated like a sliding window of depth frames of fixed size M, where M is typically set to a small value such as 5. Two timestamp values are considered equal if they differ by no more than a tolerance value, typically 15 ms. Based on the timestamp values, a candidate frame from each of the N streams is chosen and sent on to the renderers. A pointer to the last frame is stored in each of the N buffer windows; when a frame is received on one of the N streams, its pointer is incremented in a circular fashion.
Suppose that in this scheme the frames on one of the connections get delayed. Let us take a concrete example and discuss what would happen.
Say M = 5, N = 3 and the last timestamp displayed at the renderer ( current pointer in the buffer ) for all the streams is t=100.
Next, say frames arrive on streams 1 and 2 corresponding to t=101, but the frame on stream 3 gets delayed. The recon_server records the two new frames corresponding to timestamp t=101, but realizes that it hasn't received a frame on the third connection, so it does not create a refresh event until it has 3 frames with matching timestamps. Either the delayed frame arrives first on stream 3 with t=101, in which case the recon_server checks the buffer and sees it has 3 frames corresponding to t=101, or future frames arrive on streams 1 and 2 but do not get displayed until the third stream catches up.
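The bookkeeping can be sketched as follows, under the assumptions above ( M = 5, N = 3, 15 ms tolerance ); this is illustrative, not the actual recon_server code:

/* Sketch of the timestamp-based sliding window. Illustrative only. */
#define M_FRAMES  5    /* window size per stream    */
#define N_STREAMS 3    /* number of depth streams   */
#define TOL_MS    15   /* timestamp match tolerance */

typedef struct { unsigned int ts; void *data; } Frame;

static Frame win[N_STREAMS][M_FRAMES];  /* circular buffers       */
static int   last[N_STREAMS];           /* newest slot per stream */

/* Called when a frame arrives on stream s. Returns 1 when every
 * stream has a frame matching f's timestamp within TOL_MS, i.e. a
 * composite refresh can be forwarded to the renderers. */
int on_frame(int s, Frame f)
{
    last[s] = (last[s] + 1) % M_FRAMES;  /* slide circularly */
    win[s][last[s]] = f;

    for (int k = 0; k < N_STREAMS; k++) {
        int found = 0;
        for (int m = 0; m < M_FRAMES; m++) {
            int dt = (int)win[k][m].ts - (int)f.ts;
            if (win[k][m].data && dt >= -TOL_MS && dt <= TOL_MS)
                found = 1;
        }
        if (!found)
            return 0;   /* wait for the delayed stream to catch up */
    }
    return 1;
}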
The timestamp-based synchronization code hasn't been well tested, its behaviour is not well understood, and this needs to be looked at.
- Rendering using Vertex Array Range
Rendering is done in OpenGL, making use of graphics acceleration features from the latest of Nvidia's graphics cards and drivers. In the earlier version of the renderer, OpenGL display lists were constructed on every frame update from the recon_server. However, since display list creation required the CPU, this would freeze the renderer and create a break-in-presence in the perception of tele-immersion. To alleviate this problem, the construction of the display list was spread over a few iterations of the rendering loop instead of being done in one pass. This introduced a very small latency, but with the added benefit of reducing the break-in-presence considerably.
Since we were looking for faster rendering options, we looked at the Vertex Array Range ( VAR ) OpenGL extension from Nvidia. This seemed a promising option since, contrary to display lists, Vertex Array Range allows the graphics primitives ( in our case a vertex array of points ) to be stored directly in AGP or video memory that the GPU can read. The official OpenGL extension specification for NV_vertex_array_range is at:
http://oss.sgi.com/projects/ogl-sample/registry/NV/vertex_array_range.txt
Using VAR has accelerated rendering by up to 300%.
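The VAR setup can be sketched as follows under GLX; this assumes the NV extension entry points have been resolved ( e.g. via glXGetProcAddress ) and pads each 15-byte point to 16 bytes for alignment, which may differ from the actual renderer's layout:

/* Sketch of point rendering with NV_vertex_array_range under GLX. */
#include <GL/gl.h>
#include <GL/glx.h>

typedef struct { GLfloat x, y, z; GLubyte r, g, b, pad; } VarPoint;

static VarPoint *points;

void var_init(int maxPoints)
{
    GLsizei bytes = maxPoints * (GLsizei)sizeof(VarPoint);
    /* AGP memory: written often by the CPU, read by the GPU */
    points = glXAllocateMemoryNV(bytes, 0.0f, 0.0f, 0.5f);
    glVertexArrayRangeNV(bytes, points);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(3, GL_FLOAT, sizeof(VarPoint), &points[0].x);
    glColorPointer(3, GL_UNSIGNED_BYTE, sizeof(VarPoint), &points[0].r);
}

void var_draw(int numPoints)
{
    glFlushVertexArrayRangeNV();   /* make CPU writes visible to GPU */
    glDrawArrays(GL_POINTS, 0, numPoints);
}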
- Rendering using Vertex Shaders
I need some input from Scott.
- Left and Right Eye Synchronisation
Since the data transfer between the recon_server and the renderers in either mode of operation is essentially UDP, without any underlying guaranteed delivery mechanism, there is a possibility of packets being dropped. Moreover, the time taken to deliver packets to the left and right eye renderers may differ, so there has to be a mechanism to tie the left and right eye renderers together such that they update the reconstructed point cloud model at the same instant. If this is not done correctly, the left and right eye will see different models for a short period of time, one eye seeing the older model while the other sees the updated one, and this causes severe discomfort to the user.
To synchronize the left and right eye renderers, the 2 network threads send special messages to each other to signal the complete receipt of a payload ( either a point cloud or a group of depth maps ). Both renderers essentially work in the following way:

...
read packets from UDP
signal peer ( the other eye's renderer )
wait for signal from peer
signal rendering thread that the new model is ready ( unblock semaphore )
...

The special signal message is sent to the peer process over a TCP connection which is initialized at startup. The special message is a "HELLO NTII <ip address>" message, 14 bytes long.
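The handshake in the network thread can be sketched as follows, assuming peer_sd is the startup TCP connection to the peer and model_ready is the semaphore that unblocks the rendering thread:

/* Sketch of the left/right eye handshake. peer_sd and model_ready
 * are assumed to be set up at startup; the message format follows
 * the "HELLO NTII <ip address>" convention described above. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <semaphore.h>

extern int   peer_sd;       /* TCP connection to the peer renderer */
extern sem_t model_ready;   /* unblocks the rendering thread       */

void sync_with_peer(const char *my_ip)
{
    char msg[64], buf[64];
    snprintf(msg, sizeof(msg), "HELLO NTII %s", my_ip);

    write(peer_sd, msg, strlen(msg));  /* signal: payload received   */
    read(peer_sd, buf, sizeof(buf));   /* wait for the peer's signal */

    sem_post(&model_ready);            /* new model is ready to draw */
}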
Network Interface in the 3 PC Renderer :
The basic motivation behind using UDP is that it is fast. Moreover, on a dedicated direct link it will probably not drop packets unless something unusual happens. UDP also allows us to use multicasting, which is practical in our case since both the left and right eye renderers need the same data payload from the recon_server.
A basic UDP sender and receiver function completely asynchronously. However, the various modules of the renderer need to synchronize with each other to function properly. This synchronization is done in our own protocol, which is now described.

The idea behind the UDP-based protocol in the 3 PC Renderer is to have a signalling connection and a separate data connection for every abstract stream of data. The signal connection is used to synchronize the sender and the receiver at the beginning of a data transfer. Once both are synchronized, data is transferred over the data connection at the fastest rate possible. The sender informs the receiver how much data is going to be sent in the very first packet that it sends. Next it tries to send as much data as the OS send buffer can hold; if it tried to send more, there would be a high chance of buffer overflow, leading to packet loss. The time interval between two such consecutive sends is controlled by the user through a command line option.
The receiver knows when it has received all packets, and signals back to the sender saying "I am done". However, both receiver and sender are only allowed a small time interval in which to finish the data transfer. This restriction is imposed by the application. Right now the timeout value is a constant, but this could easily be changed into an adaptive timeout based on the update frame rates and the percentage packet losses.
In VAR mode there is only 1 UDP signal socket and 1 data socket. The payload in this case is always one composite point cloud, however many depth map streams there may be. For an idea of the size of the point cloud and the time to deliver the data to the left and right eye renderers, refer to the section on rendering performance. The details of the UDP-based protocol are covered in the section on "Framing and Timing Issues".
The following file was modified :

/etc/rc.d/rc.local

# Increase socket buffer sizes.
echo "4096 87380 8388608" > /proc/sys/net/ipv4/tcp_rmem
echo "4096 65536 8388608" > /proc/sys/net/ipv4/tcp_wmem
echo "8388608 8388608 8388608" > /proc/sys/net/ipv4/tcp_mem
echo 8388608 > /proc/sys/net/core/wmem_max
echo 8388608 > /proc/sys/net/core/rmem_max
echo 65536 > /proc/sys/net/core/rmem_default
echo 65536 > /proc/sys/net/core/wmem_default
This means :

/proc/sys/net/core/rmem_max     = 8388608 = 8MB  - Max Receive Window Size
/proc/sys/net/core/wmem_max     = 8388608 = 8MB  - Max Send Window Size
/proc/sys/net/core/rmem_default = 65536   = 64KB - Default Receive Window Size
/proc/sys/net/core/wmem_default = 65536   = 64KB - Default Send Window Size
/proc/sys/net/ipv4/tcp_rmem = "4096 87380 8388608"      = 4KB, 85KB, 8MB - Receive Window ( min, default, max )
/proc/sys/net/ipv4/tcp_wmem = "4096 65536 8388608"      = 4KB, 64KB, 8MB - Send Window ( min, default, max )
/proc/sys/net/ipv4/tcp_mem  = "8388608 8388608 8388608" = 8MB, 8MB, 8MB  - TCP Memory ( min, pressure, max )
At any instant the sender tries to send only a finite number of packets, enough to fill the OS's write buffer. It waits for a short timeout interval ( specified through the command line ) before it sends the next batch. Before the sender starts sending out packets, it calculates the number of batches it needs to send. When the sender has sent all the batches of packets, it waits for the receiver to signal back that it is done. However, it waits only for a finite amount of time, based on a timeout which is proportional to the amount of data sent but never greater than a maximum value ( typically set to 1 second ).

The maximum amount of time the sender will ever take to deliver the payload is this fixed timeout interval ( set to 1 second in most of our test runs ). This worst case happens only when packets are dropped on the network: the receiver doesn't get all the packets and never signals back to the sender saying "I am done". The sender times out, triggers a warning saying the receivers weren't responding, and goes ahead with the next frame of data. The receiver, on the other hand, also times out and realizes that it has dropped packets; it goes ahead and renders whatever data it has received, and synchronizes with the sender again at the beginning of the next update cycle.
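The sender's loop can be sketched as follows; udp_send_packet() and wait_for_ack() are hypothetical helpers, and MAX_TIMEOUT_MS mirrors the 1 second cap described above:

/* Sketch of the UDP sender's batching loop. udp_send_packet() and
 * wait_for_ack() are hypothetical helpers. The very first packet is
 * assumed to announce the total payload size, as described above. */
#include <stdio.h>
#include <unistd.h>

#define MAX_TIMEOUT_MS 1000

extern void udp_send_packet(const char *payload, int index);
extern int  wait_for_ack(int timeout_ms);  /* receiver's "I am done" */

void send_payload(const char *payload, int nPackets,
                  int batchSize, int sendIntervalMs)
{
    int nBatches = (nPackets + batchSize - 1) / batchSize;

    for (int b = 0; b < nBatches; b++) {
        for (int p = 0; p < batchSize; p++) {
            int idx = b * batchSize + p;
            if (idx >= nPackets) break;
            udp_send_packet(payload, idx);
        }
        usleep(sendIntervalMs * 1000);   /* let the OS buffer drain */
    }

    /* bounded wait; on timeout warn and move on to the next frame */
    if (!wait_for_ack(MAX_TIMEOUT_MS))
        fprintf(stderr, "warning: receivers were not responding\n");
}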
The behaviour of this protocol has been tested thoroughly in VAR mode; it works quite well and transfers large data payloads in a small amount of time. The best part is that even if packets are dropped occasionally, the sender and the receiver recover within a few cycles. However, the networking code still has to be tested thoroughly in Vertex Shader Cg Mode. The multiple channels of data in the Cg Mode may cause contention on the wire. Moreover, de-multiplexing the data to the different receiving ports on the renderer may cause extra delay. This needs to be tested thoroughly.
Rendering Performance:
The columns in the table mean the following:
- Dataset Name : Describes whether the data set consisted mostly of foreground pixels ( and hence a smaller reconstruction percentage in every depth frame ) or of full scene reconstruction depth maps.
- Resolution : Depth map resolution.
- Bytes delivered by UDP per update : The UDP payload size between the recon_server ( sender ) and the renderers ( receivers ).
- Num of Points Rendered : The number of points in the point cloud; every repeated point is counted as many times as it appears.
- UDP Transfer Time : The time it took to transfer one point cloud payload, or one set of depth maps ( that will be displayed together ).
- Refresh Rate : The framebuffer refresh rate. This controls how smooth and interactive the rendered scene looks. A refresh rate higher than 30 fps is usually quite comfortable for interactive walkthroughs and real life scenes.
- Update Rate : The rate at which depth maps were consumed by the renderer. During demo or test runs, the depth maps are fed in by a stream player which simulates the front end of the system. The update rate in the table is the highest update rate tolerated by the renderer before it started dropping packets. If we try to run the stream player at a faster update rate, the renderer will automatically slow it down; this is the point at which the renderer becomes the bottleneck of the system for that particular data set size.
- VAR Mode
For the numbers in the following table :
- TCP was used between the streamplayer ( the reconstruction simulator / stream playback application ) and the recon_server.
- The renderer was not compiled with optimization options.
- Some minimal data was being printed on the console.
- Making these optimizations may improve the update rate ( last column ) slightly.
- The table pertains to a simulation in which 9 streams at 640x480 were run using TCP between the streamplayer and the recon_server.
Conclusion : We are spending far too much time parsing the streams. That is why, when we have 9 streams running and are displaying only 1, we can run at only 1.7 frames/sec, whereas when running with only 1 stream we can go up to 5.3 frames/sec.
Question about bandwidth utilization :
Inter PC bandwidth :
During these tests we were transferring data from the recon_server to the left and right eye renderers at ( 31.5 / 0.56 ) * 8 Mbits/sec = 450 Mbits/sec.
Incoming bandwidth : During these tests we were running the streamplayer on a local machine directly connected to the same Gbit Ethernet switch. Each of the 9 depth maps is about 1.5 MBytes. We recorded the time taken to read one stream off the network and parse it to be 0.05 seconds on average; thus parsing the 9 streams takes up 0.45 seconds at the recon_server. Since the frame rate is 0.70 frames/sec for 9 parallel streams, the total incoming bandwidth is only 1.5 * 9 * 8 / 0.45 = 240 Mbits/sec.

But where is the rest of the bandwidth going? Shouldn't this have added up to roughly 1000 Mbits/sec? In other words, can we do better than 450 Mbits/sec over UDP between the recon_server and the left and right eye renderers, assuming that we are trying to send as fast as we can?
Dataset Name               Resolution   UDP payload / update ( MB )   Points rendered   UDP transfer time ( s )   Refresh rate ( fps )   Update rate ( /sec )
Foregrnd only, 3 streams   320 x 240    0.800                         55,000            -                         -                      -
Foregrnd only, 5 streams   320 x 240    1.3                           100,000           -                         -                      -
Full Scene, 1 stream       640 x 480    3.2                           232,000           -                         -                      -
Full Scene, 2 streams      640 x 480    6.3                           455,000           -                         -                      -
Full Scene, 3 streams      640 x 480    9.9                           720,000           -                         -                      -
Full Scene, 4 streams      640 x 480    13.3                          900,000           -                         -                      -
Full Scene, 5 streams      640 x 480    17.5                          1,160,000         -                         -                      -
Full Scene, 7 streams      640 x 480    -                             -                 -                         -                      -
Full Scene, 9 streams      640 x 480    -                             -                 -                         -                      -
- Vertex Shader Cg Mode

Dataset Name               Resolution   UDP payload / update   UDP transfer time   Refresh rate   Update rate
Foregrnd, 3 streams        320 x 240    -                      -                   -              -
Foregrnd, 5 streams        640 x 480    -                      -                   -              -
Full Scene, 1 stream       640 x 480    -                      -                   -              -
Full Scene, 2 streams      640 x 480    -                      -                   -              -
Full Scene, 3 streams      640 x 480    -                      -                   -              -
Full Scene, 5 streams      640 x 480    -                      -                   -              -
Full Scene, 7 streams      640 x 480    -                      -                   -              -
Full Scene, 8 streams      640 x 480    -                      -                   -              -
Usage:
- General Instructions :
- Recon_server
Steps to run the recon_server
cd ~/stc/d7/3dti/bin/ti_viewer/hires414243/recon_server
./reconserver <cmd line options>
Command Line Options for recon_server :
-v Verbose mode.
-p Assign incoming TCP port number.
-c Assign outgoing client machine name:port.
   e.g. hires41-tn:5001 means port 5001 of hires41-tn.
-m Use UDP multicast.
-l Change default local machine name.
-n Local network interface on which to receive depth streams over TCP.
-i Assign timeout value between successive sends ( in millisecs ).
-y Vertex Shader Cg mode enabled. [ Default Mode : VAR Mode ].
-r Num of frames to run before graceful exit [ only if profiling has been enabled ].
- Left and Right Eye Renderer
Steps to execute on the relevant machines:
cd ~/stc/d7/3dti/bin/ti_viewer/hires414243
./tiv_left <cmd line options>
./tiv_right <cmd line options>
Command Line Options for ti_viewer
-c [FILE] calibration file
-notracker do not use tracker updates (default off)
-simtracker simulate tracker reports (default off)
-nobackground do not create background office model (default off)
-eye [r/l/b] render eye right/left/both option (default left)
-full use full resolution textures for the model (default half)
-p port number to connect to (can't use with -udp option)
-udp [host] UDP socket retransmitter machine:port (can't use with -p option)
-multicast Use multicast on UDP socket retransmitter
-multithread Multithreaded version (default single-threaded)
-master [host] Peer Machine host name
-localhost [host] Change local machine name
-r Repeat so many frames ( enabled while doing profiling)
VAR Mode : Cmd Line Options For Recon_Server

./reconserver -p 5001 -p 5002 -p 5003 -i 3
// render 3 streams of data arriving on ports 5001, 5002, 5003.
// let the internal UDP send timeout value be 3 milliseconds.

Cmd Line Options For Ti_Viewer

./tiv_left -nobackground -multithread -master
./tiv_right -nobackground -multithread -slave <peer host name>

In either case we shall start seeing the rendered scene on both hires41-tn and hires42-tn ( full screen rendering ).

Demo Setup :
- Start the VRPN Server ( Hardware Box ) by switching on the RED SWITCH.
- Start the HiBall 3.0 Server.
- Point the infra-red camera towards the ceiling and double check on the TV screen that the ceiling LED's are functioning correctly.
- Turn on the Projectors.
- The PC's the demos are run on are the following
- Hires41-tn ( Left Eye Renderer )
- Hires42-tn ( Right Eye Renderer )
- Hires44-tn / Hires43-tn ( Recon Server )
To get the renderer ready :
- Log on to all 3 machines, preferably using a lightweight window manager.
- Open 3 consoles on Hires44-tn / Hires43-tn ; let's call them consoles A, B and C.
- Remote login to Hires41-tn and Hires42-tn on consoles A and B respectively, and export DISPLAY=:0.0 there.
- In console A @ hires41-tn ( remote console A ) run the left eye renderer.
- In console B @ hires42-tn ( remote console B ) run the right eye renderer.
- In console C @ hires44-tn / hires43-tn ( local console C ) run the recon_server.
- Open another console, say D, and run the ntii_StreamPlayer to simulate the rest of the system for a stand-alone demo.
- Otherwise, the front end system and the reconstruction modules need to be started at this point to start feeding depth maps.
To check projector alignment.
Refer to the discussion regarding bandwidth utilization above. We need to see if Myrinet can improve the UDP transfer time further compared to what we have.
We tried running in VAR Mode using the Myrinet local interfaces instead of the Gigabit interfaces for the 3 PC communication. During this test the streamplayer was streaming depth maps to the recon_server on the Gigabit interface as before. However, we ran into an issue with IP multicasting over Myrinet; as of now the issue has still not been resolved. The problem with the way we were doing multicasting was the following :
===========================================================================
...
mreq.imr_multiaddr.s_addr = mcastAddr.s_addr;
mreq.imr_interface.s_addr = htonl(INADDR_ANY);
// Let the OS pick the network interface.
...
rc = setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                (void *) &mreq, sizeof(mreq));
===========================================================================
We fixed this problem, so that we could explicitly select the interface used for UDP multicasting ( intended to be the Myrinet interface ), by doing the following :

// myhostname = "IP address of the particular interface"
ipAddr = gethostbyname(myhostname);
if (ipAddr == NULL) {
    printf("UDPSndRcv:InitSocket");
    exit(1);
}
bcopy((BCOPY_T *) ipAddr->h_addr, (BCOPY_T *) &ia, ipAddr->h_length);
bcopy((BCOPY_T *) &ia, (BCOPY_T *) &mreq.imr_interface.s_addr,
      sizeof(struct in_addr));
rc = setsockopt(sd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                (void *) &mreq, sizeof(mreq));
With the above piece of code we were able to choose the Gigabit Ethernet interface to multicast on, but IP multicasting over Myrinet would still not work.
Currently, when we run in VAR mode, a large composite point cloud model is generated which becomes the payload that needs to be transferred from the recon_server to the left and right eye renderers over UDP. The fact that every depth map pixel ( 5 bytes ) is converted into a 3D point ( 15 bytes ) actually increases the dataset size. Refer to the section on network protocols to see how this data size determines the timeout value for the UDP sender. A larger timeout value means that the UDP sender will actually wait that long before it goes and reads the TCP connections for the next set of depth map frames. Thus larger payloads imply a larger timeout value and potentially longer wait intervals, leading to bottlenecks in the whole pipeline.
Things we could do :

The current stream definition has the following characteristics: for every depth frame, the whole red plane is sent in one contiguous block, followed by the green plane, the blue plane, and finally the depth buffer. Since depth extraction computations are done pixel by pixel, or at least scanline by scanline, there must be a stage in the front end of the system where pixels are de-interleaved and arranged in this contiguous fashion. However, parsing the depth map in the renderer would be much more efficient if the R, G, B and inverse-z values were interleaved instead. Thus there are two places in the system where the stream data is de-interleaved and re-interleaved without any added benefit. These points need to be identified, and the parsing code in the renderer made more efficient.
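A sketch of the two layouts, with illustrative struct names:

/* Current layout: one contiguous plane per component. */
typedef struct {
    unsigned char  *red, *green, *blue;  /* W*H bytes each           */
    unsigned short *inv_z;               /* W*H quantized 1/z values */
} PlanarFrame;

/* Proposed layout: components interleaved per pixel, so the parser
 * touches each pixel exactly once. */
typedef struct {
    unsigned char  r, g, b;
    unsigned short inv_z;
} InterleavedPixel;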
Things we could do :
The current rendering architecture tries to optimize rendering performance from a systems standpoint, by trying to increase the framebuffer refresh rate and handle the highest reconstruction frame rate for datasets as large as 1.5 million points. The current renderer is robust and can easily handle datasets of this size; refer to the section on rendering performance. However, the rendering strategy is a brute-force one: whatever points are generated by reconstruction are all drawn and sent through the graphics pipeline. Even reducing the redundancy of repeated points in the point cloud is avoided, because this extra processing might slow down the renderer. Based on the notion that any processing aimed at improving rendering quality or introducing adaptive simplification has a high cost, we avoided special processing.
With every depth map pixel that is reconstructed, a confidence value is computed, as it is used as a heuristic in the stereo correspondence algorithm. If this value were sent to the renderer using a minimal number of bits, the renderer might be able to use it to intelligently filter outliers.