Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination

Abstract

Sharp discontinuities in depth and occlusions in multiview imaging systems pose serious challenges for many dense correspondence algorithms. Yet it is important for 3D reconstruction methods to preserve depth edges, as they correspond to important shape features like silhouettes, which are critical for understanding the structure of a scene. In this paper we show how active illumination algorithms can produce a rich set of feature maps that are useful in dense 3D reconstruction. We start by showing how to compute, from a single camera, a qualitative depth map that encodes relative object distances and serves as a useful prior for stereo. In a multiview setup, we show that, along with depth edges, binocular half-occluded pixels can also be explicitly and reliably labeled. To demonstrate the usefulness of these feature maps, we show how they can be used in two different algorithms for dense stereo correspondence. Our experimental results show that the enhanced stereo algorithms are able to extract high-quality, discontinuity-preserving correspondence maps from scenes that are extremely challenging for conventional stereo methods.


Depth Edges with Multi-Flash   

Small baseline multi-flash illumination allows reliable and efficient depth edge detection. The main observation is that when a flash illuminates a scene during image capture, thin slivers of cast shadow are created at depth discontinuities. Thus, by capturing a sequence of images in which different light sources illuminate the subject from different positions, we can use the shadows in each image to assemble a depth edge map.
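A minimal sketch of this idea is shown below, assuming grayscale flash images registered to a common viewpoint; the function name, the per-flash direction convention, and the thresholds are illustrative, not the exact procedure used in our implementation.

import numpy as np

def detect_depth_edges(flash_images, light_dirs, lit=0.9, shadow=0.5):
    """Sketch of multi-flash depth edge detection.
    flash_images: list of H x W float arrays, one per flash.
    light_dirs: per-flash (dy, dx) pixel offset pointing toward the light.
    Pixels where the ratio image drops from lit to shadowed along the
    direction away from the light are marked as depth edges."""
    imax = np.maximum.reduce(flash_images)        # shadow-free max composite
    edges = np.zeros(imax.shape, dtype=bool)
    for img, (dy, dx) in zip(flash_images, light_dirs):
        ratio = img / (imax + 1e-6)               # cast shadows -> low ratio
        # neighbor[y, x] == ratio[y + dy, x + dx], one step toward the light
        neighbor = np.roll(ratio, (-dy, -dx), axis=(0, 1))
        edges |= (neighbor > lit) & (ratio < shadow)
    return edges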

See the NPR camera website for more details on multi-flash depth edge detection.

Qualitative Depth Map    

We use a single multi-flash camera to derive a qualitative depth map from two important measurements: the shadow width, which encodes relative object distances, and the sign of each depth edge pixel, which indicates which side of the edge is foreground and which is background. Based on these measurements, we create a depth gradient field and integrate it by solving a Poisson equation. The resulting map effectively segments the objects in the scene, providing depth-order relations.
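The integration step can be sketched with a standard DCT-based Poisson solver, assuming a gradient field (gx, gy) already built from the edge signs and shadow widths; this is a generic least-squares integrator, not necessarily the exact discretization used in our code.

import numpy as np
from scipy.fft import dctn, idctn

def integrate_gradient_field(gx, gy):
    """Least-squares integration of (gx, gy) by solving the Poisson equation
    lap(z) = div(g) with Neumann boundaries in the DCT domain."""
    h, w = gx.shape
    div = np.zeros((h, w))
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]      # backward difference of gx
    div[1:, :] += gy[1:, :] - gy[:-1, :]      # backward difference of gy
    yy, xx = np.mgrid[0:h, 0:w]
    denom = (2 * np.cos(np.pi * xx / w) - 2) + (2 * np.cos(np.pi * yy / h) - 2)
    zhat = dctn(div)
    denom[0, 0] = 1.0                          # avoid division by zero at DC
    zhat /= denom
    zhat[0, 0] = 0.0                           # fix the free constant of integration
    return idctn(zhat)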

Generating the Normal Map

We use the NPR output obtained with multi-flash imaging as a height field to create a normal map. The main advantage of our approach is that the NPR image captures bumpy features such as hair, wrinkles, and beards, allowing us to create fine-detail 3D models and NPR illustrations automatically.

To create the normal map, we first negate the NPR texture image so that darker regions of the height field are lower and lighter regions are higher. Then we compute the normals from the partial derivatives of the height-field surface, exactly as demonstrated in the CG book [2], page 203.
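A minimal sketch of this step, assuming a grayscale NPR image with values in [0, 1]; the 'strength' parameter scaling the bump amplitude is an illustrative addition, not part of the original procedure.

import numpy as np

def height_field_to_normal_map(npr_image, strength=1.0):
    """Negate the NPR image to get a height field, then derive normals
    from its partial derivatives (central differences)."""
    height = 1.0 - np.asarray(npr_image, dtype=np.float64)  # darker = lower
    dzdx = strength * np.gradient(height, axis=1)
    dzdy = strength * np.gradient(height, axis=0)
    n = np.dstack((-dzdx, -dzdy, np.ones_like(height)))     # (-hx, -hy, 1)
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return 0.5 * (n + 1.0)   # pack [-1, 1] components into [0, 1] for a texture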

Tangent-space Bump Mapping

Since we are dealing with arbitrary geometry, we need to align the coordinate system of the normal vectors in the normal map (tangent space) with the coordinate system of the light vectors. This is done by creating a rotation matrix for each vertex whose columns are the corresponding tangent, binormal, and normal vectors (see CG book [2], page 225, for how to compute these vectors). We transform both the light vector and the normal-map vectors into the same consistent eye-space coordinate system.
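A sketch of the per-vertex transform, assuming unit-length basis vectors already expressed in eye space; the function name is illustrative.

import numpy as np

def tangent_to_eye(tangent, binormal, normal, v):
    """Rotation whose columns are the tangent-space basis expressed in eye
    space: it maps a normal-map vector v into eye space. Its transpose takes
    eye-space vectors (e.g. the light direction) into tangent space."""
    tbn = np.column_stack((tangent, binormal, normal))
    return tbn @ v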

Results


From left to right: 3D model, bump mapping with small scale, bump mapping with large scale.


From left to right: Texture-mapped 3D model, Fine-detail 3D model with bumpy features created automatically, Non-photorealistic illustration with large scale bump mapping.


Occlusion Map    

Binocular half-occluded points are those visible in only one of the two views of a binocular imaging system. They are a major source of error in stereo matching algorithms, because half-occluded points have no correspondence in the other view, leading to false disparity estimates. By placing the light sources close to the center of projection of each camera, we can use the length of the shadows created by the lights surrounding the other camera to bound the half-occluded regions. This allows us to segment occlusions in both textured and textureless regions, without having to solve the correspondence problem.
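In a rectified two-camera setup this reduces to a simple test on the ratio image of the flash mounted next to the other camera; the sketch below is illustrative, with an assumed fixed threshold rather than the bound derived from the shadow length.

import numpy as np

def half_occlusion_map(flash_image, max_image, shadow_thresh=0.5):
    """Pixels left in shadow by the flash placed near the other camera's
    center of projection bound the binocular half-occluded region."""
    ratio = flash_image / (max_image + 1e-6)   # cast shadows -> low ratio
    return ratio < shadow_thresh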

Depth Edge Preserving Stereo    

We now demonstrate the usefulness of these feature maps by incorporating them into two different dense stereo correspondence algorithms, one based on local search and the other based on belief propagation.

Enhanced Local Stereo. We adopt a sliding window that varies in shape and size, according to depth edges and occlusions, to perform local correlation. To determine the size and shape of the window for each pixel, we find the set of pixels that have approximately the same disparity as the center pixel of the window. This is achieved by a region growing algorithm (starting at the center pixel) that uses depth edges and half-occluded points as boundaries. Only this set of pixels is then used for matching in the other view; the remaining pixels in the window are discarded, since they correspond to a different disparity.
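A minimal sketch of the region growing step, assuming a boolean 'boundary' map that combines the depth edge and half-occlusion maps; the names and the 4-connected growth are illustrative.

import numpy as np
from collections import deque

def support_region(cy, cx, half_win, boundary):
    """Grow the set of window pixels 4-connected to the center (cy, cx)
    without crossing depth edges or half-occluded pixels."""
    h, w = boundary.shape
    keep = np.zeros((h, w), dtype=bool)
    keep[cy, cx] = True
    queue = deque([(cy, cx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (abs(ny - cy) <= half_win and abs(nx - cx) <= half_win
                    and 0 <= ny < h and 0 <= nx < w
                    and not keep[ny, nx] and not boundary[ny, nx]):
                keep[ny, nx] = True
                queue.append((ny, nx))
    return keep

The matching cost for a candidate disparity is then accumulated only over the kept pixels.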

From left to right: left view, ground truth, conventional local stereo, our local approach, conventional global stereo, our global approach.

Enhanced Global Stereo. The best results in stereo matching thus far are given by global methods, particularly those based on belief propagation and graph cuts. We use the qualitative depth and occlusion maps as prior information for these methods, so that smoothness constraints are stopped at object boundaries and neighboring pixels along depth edges are encouraged to take disparity values consistent with the depth differences in the qualitative map.
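One way to realize this is to modify the pairwise term of the energy, as in the sketch below; the truncated-linear baseline and the weights are illustrative, not the exact potentials in our implementation.

def pairwise_cost(dp, dq, crosses_edge, qual_diff, lam=1.0, trunc=2.0):
    """Smoothness term between neighboring pixels p and q with candidate
    disparities dp and dq. Inside objects we keep a standard truncated
    linear penalty; across a depth edge the disparity jump is instead
    pulled toward the difference predicted by the qualitative depth map."""
    if crosses_edge:
        return lam * abs((dp - dq) - qual_diff)
    return lam * min(abs(dp - dq), trunc)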

Multi-Flash Stereo Datasets   

We encourage researchers to develop novel methods for stereo that take into account small baseline multi-flash illumination. We are currently collecting new datasets with different camera-flash configurations.

Tripod Scene (tripod.zip 4.7MB) - A challenging scene with ambiguous patterns, textureless regions, thin structures, and a geometrically complex object.


Source Code    

Depth Edge Detection (dephEdges.zip 5.8MB) - Matlab code and test images

Qualitative Depth Map (qualitative.zip 1.29MB) - Matlab code + Visual C++ 7.0 source code (or just the binary) + test images

Our stereo algorithms are modified versions of the code available at the Middlebury Stereo Vision webpage.

Phase Functions

When modeling scattering within the layer, we use phase functions to describe the result of light interacting with particles in the layer. A phase function describes the angular distribution of light scattered after a ray hits a particle. We use the Henyey-Greenstein phase function (see [3], page 55), which depends on the incident and outgoing directions and takes an asymmetry parameter g ranging from -1 to 1, spanning strong retro-reflection to strong forward scattering.
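The Henyey-Greenstein phase function has a simple closed form; a direct implementation in terms of the cosine of the angle between the incident and outgoing rays:

import numpy as np

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function. g in (-1, 1): g < 0 gives
    back-scattering (retro-reflection), g > 0 forward scattering,
    and g = 0 isotropic scattering."""
    return (1.0 - g * g) / (4.0 * np.pi * (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5)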

The phase function is used, along with the scattering albedo, to determine the BRDF that describes single scattering from a medium (see [3], page 56). Multiple scattering is empirically approximated by summing three single-scattering terms with different values of the asymmetry parameter g.
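Using the henyey_greenstein function from the sketch above, the empirical approximation amounts to a weighted sum of three lobes; the (weight, g) pairs below are illustrative placeholders, not the values used in our renderer.

def three_lobe_scattering(cos_theta, lobes=((0.7, 0.6), (0.2, 0.0), (0.1, -0.4))):
    """Empirical multiple-scattering approximation: a weighted sum of
    three Henyey-Greenstein single-scattering lobes with different g."""
    return sum(w * henyey_greenstein(cos_theta, g) for w, g in lobes)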

Fresnel Effect

We also need to consider the Fresnel effect, which occurs when light enters and exits the surface. This is important for determining the incoming and outgoing directions (and intensities) of the light rays inside the medium, so that the BRDF and scattering are properly computed.
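For a sketch, Schlick's approximation to the Fresnel reflectance is usually sufficient; the refractive index of 1.4 below is a typical value for skin, used here only as an illustration.

def schlick_fresnel(cos_theta, n1=1.0, n2=1.4):
    """Schlick's approximation to the Fresnel reflectance at a dielectric
    boundary; 1 - R(cos_theta) is the fraction of light transmitted
    into the layer."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5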

Results

From left to right: subsurface scattering with mostly backscattering (note the glow effect), same as before with bump mapping, subsurface scattering with mostly forward scattering.


Subsurface scattering tends to smooth the lighting effects. We assume a constant surface thickness for the face; properly modeling thickness variation would produce reddish effects (due to light interaction with blood) along thin facial features such as ears and nostrils.