Overview
This project served as an introduction to calculating the disparity between a stereo pair of
images. A stereo pair consists of two images taken by two separate cameras. In these
stereo images, the two cameras are separated by a baseline vector that has only a
horizontal component. More specifically, the two cameras are at the same height and
distance from the object of interest. In a stereo camera setup, parts of the scene in the right
camera may not be seen by the left camera, and parts of the scene in the left camera may
not be seen by the right camera. This is known as occlusion. Occlusion is problematic
because disparity cannot be defined for a pixel that is visible in only one of the two images.
The disparity is defined as the displacement between the pixels in the left and right images
that correspond to the same physical location in the scene.
The results shown in this project are calculated from the motorbike and chair stereo image
sets on Middlebury College’s computer vision website. The motivation of this project was to
determine how the patch size used for patch matching affects the disparity image. The
disparity image is defined as the two-dimensional matrix that holds, for each pixel of
interest in the left image, the displacement to the corresponding pixel in the right image.
The three main measures for patch matching are the sum of squared differences, raw
correlation, and normalized cross correlation. This project focuses on applying
normalized cross correlation to patch matching. To speed up the computation,
this project downsamples the stereo images by a factor of four. With the downsampled
images, the patch size can be adjusted without affecting the computation time significantly.
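The three measures named above can be sketched as follows. This is a minimal Python illustration, not the code used in this project (which is written in MATLAB); the function names and the flattened-list patch representation are assumptions made for the example.

```python
import math

def ssd(a, b):
    """Sum of squared differences: 0 for identical patches, larger means worse."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def raw_correlation(a, b):
    """Raw correlation: plain dot product of the two patches, larger means better."""
    return sum(x * y for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross correlation in [-1, 1]; returns 0.0 if either patch is flat."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return raw_correlation(da, db) / denom if denom > 0 else 0.0

# Patches are flattened to 1-D lists of intensities for simplicity.
patch = [10.0, 20.0, 30.0, 40.0]
brighter = [2 * v + 5 for v in patch]  # same pattern, different gain and offset
```

In this toy case `ncc(patch, brighter)` is 1 because NCC ignores gain and offset, while `ssd(patch, brighter)` is large even though the two patches show the same pattern.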
The maximum disparities for the native-resolution motorbike and chair stereo images are
270 pixels and 280 pixels, respectively. These values are provided with the stereo image
sets and come from the true physical disparity, calibrated for the native resolution. Because
the images are downsampled by a factor of four, the maximum disparities in the processed
disparity images are 67.5 pixels and 70 pixels, respectively.
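The scaling above is a single division, since every horizontal distance shrinks by the downsampling factor. A short Python sketch (the function name is hypothetical):

```python
def scaled_max_disparity(native_max_disparity, downsample_factor):
    """Maximum disparity, in pixels, after shrinking the images by downsample_factor."""
    return native_max_disparity / downsample_factor

motorbike = scaled_max_disparity(270, 4)  # native value quoted in the text
chair = scaled_max_disparity(280, 4)      # native value quoted in the text
```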
The patch size affects the apparent granularity of the processed disparity images. With
smaller patch sizes the disparity image is calculated more quickly, because there are fewer
calculations per patch comparison, but the result appears coarser. Larger patch sizes add
extra constraints that a pair of patches must satisfy to be classified as the best
match. More specifically, the smaller the patch size, the more likely it is that matching
pairs are not unique. In general, the larger the patch size, the smoother the
processed disparity image. The details of how the disparity is calculated from patch
matching are described in the outline of procedural approaches.
Outline of Procedural Approaches
Our program consists primarily of two files: main.m and ourDisparity.m, where ourDisparity is
a function. The program starts in main where it reads the left and right images into the
program, both of which are cast as doubles. We next define a variable for max disparity to
be 270 and a variable for patch width to be an arbitrary number. This width is used later on
as both the width and height of our patch.
Using the images that were read in, we extract just the green channel of the two
images so we have two single-channel images. We then scale our single-channel
images by a factor of 1/N, where N is an arbitrary integer. Once we have all of these
components, we are able to call the function ourDisparity, which calculates the disparity
image using the single-channel images, patch width, and maximum disparity as inputs.
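The preprocessing described above can be sketched in Python as below. The report does not state how the images were resized, so plain decimation (keeping every N-th pixel) is an assumption here, as are the function names and the nested-list image layout.

```python
def green_channel(rgb_image):
    """Keep only the green value of each (r, g, b) pixel, cast to float."""
    return [[float(pixel[1]) for pixel in row] for row in rgb_image]

def downsample(image, n):
    """Keep every n-th row and column (plain decimation, no smoothing)."""
    return [row[::n] for row in image[::n]]

# Toy 4x4 RGB image whose green value encodes the pixel position (10*row + col).
rgb = [[(0, 10 * r + c, 0) for c in range(4)] for r in range(4)]
small = downsample(green_channel(rgb), 2)
```

A resize that filters before subsampling would reduce aliasing; decimation is used here only to keep the sketch short.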
ourDisparity starts off by defining two matrices. The first is for the left image and holds
one patch for every pixel that, when at the center of a patch, keeps the whole patch inside
the image. The second is for the right image and stores rows of the image (horizontal
strips running along the x-axis), where each row has a height equal to the patch height.
There is one such row for every line of pixels that, when at the vertical center of the row,
keeps the whole row inside the image (i.e. rows with centers within patch width / 2 of the
top or bottom are not included). These rows express the epipolar constraint of the stereo
setup: because the cameras are displaced only horizontally, the match for a left patch is
searched for only in the row at the same height in the right image.
From there, the two matrices are filled using two nested for loops and the disparity image is
calculated using normalized cross correlation (NCC). We decided on two nested for loops
because it was a format we both understood conceptually and it was as fast as any other
method that we tested. The outer for loop uses the variable i to represent a row of pixels,
where that row is the center of a row of height “patch width”. i starts at the closest point to
the top of the image where the pixel would not have any patch pixels outside of the image
and goes to the closest point to the bottom of the image that meets the same condition. In
this loop, we calculate the top and bottom bounds of the image row to extract from the right
image. The extracted row is then added to our row matrix.
The inner loop uses the variable j to represent the columns of the image, where each
column is the center of a patch. With the same patch conditions as i, j starts at the left of the
image and works towards the right. The inner loop starts by calculating the bounds for the
left, right, top and bottom of the patch used for the left image. These bounds are then used
to extract the patch from the left image; the patch is stored in the left patch matrix.
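The bounds used by the two loops can be sketched as follows. This is a hedged Python illustration with 0-based indices (MATLAB indices start at 1); the function names are hypothetical and an odd patch width is assumed.

```python
def valid_centers(size, patch_width):
    """Indices along one axis where a centered patch stays inside the image."""
    half = patch_width // 2
    return range(half, size - half)

def patch_bounds(center_row, center_col, patch_width):
    """Inclusive (top, bottom, left, right) bounds of the patch around a center."""
    half = patch_width // 2
    return (center_row - half, center_row + half,
            center_col - half, center_col + half)

# Rows usable as patch centers in a 10-pixel-tall image with a 5x5 patch.
rows = list(valid_centers(10, 5))
```

The first valid center is `patch_width // 2` pixels in from each edge, which is why the uncomputed border is 2, 3 and 5 pixels wide for 5x5, 7x7 and 11x11 patches.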
Once a patch is extracted, we use the function normxcorr2() to search the corresponding
row of the right image for the patch that most closely matches the left patch. All of the
patch scores are stored in the variable patchMatch. We then look along the centerline of
patchMatch to determine which position has the highest score when compared to the
current patch (while also being within the maximum disparity threshold). Once this index is
found, we find the horizontal distance between the two pixels. The distance is stored in our
disparity image matrix at the same index as our left patch’s center pixel. The loops continue
until all disparity values are calculated.
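The whole procedure can be summarised by the sketch below. It is a simplified Python stand-in, not the MATLAB code described above: the call to normxcorr2() is replaced by an explicit NCC score per candidate position, and it assumes that the match for a left-image pixel at column j lies at column j - d in the right image, with d between 0 and the maximum disparity. All names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross correlation of two flattened patches (0.0 if one is flat)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

def extract_patch(image, row, col, half):
    """Flatten the square patch of width 2*half + 1 centered on (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def disparity_map(left, right, patch_width, max_disparity):
    height, width = len(left), len(left[0])
    half = patch_width // 2
    disparity = [[0] * width for _ in range(height)]  # border pixels stay 0
    for i in range(half, height - half):        # outer loop: rows
        for j in range(half, width - half):     # inner loop: columns
            left_patch = extract_patch(left, i, j, half)
            best_score, best_d = -2.0, 0        # any NCC score beats -2
            for d in range(max_disparity + 1):  # search along the same row only
                k = j - d
                if k < half:                    # candidate patch would leave the image
                    break
                score = ncc(left_patch, extract_patch(right, i, k, half))
                if score > best_score:
                    best_score, best_d = score, d
            disparity[i][j] = best_d
    return disparity
```

Calling `disparity_map(left, right, 11, 67)` on two equally sized single-channel images returns a matrix with the same shape as the inputs, with zeros in the border where no full patch fits.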
A flowchart showing the structure of our code can be seen below:
Figure 1 - Flowchart showing the layout of our code
Experimental Observations
Figures 2 through 4 show the normalized cross correlation disparity images
calculated from the motorbike stereo image. In each of these cases, the resolution of the
motorbike stereo image was reduced by a factor of 4 to speed up computation. The patch
sizes that were studied are 5x5, 7x7 and 11x11. It is important to note that as the patch size
increases, progressively fewer border pixels are computed in the disparity image. Figures 2
to 4 very subtly show the increasing black border. Note: in MATLAB the figures have the full
black border, but when the figures were saved as .png files, part of the border was
removed from some of the images.
Figure 2 - Disparity image of the motorbike with a patch size of 5x5 using normalized cross
correlation. This image is downsampled by a factor of four.
Figure 3 - Disparity image of the motorbike with a patch size of 7x7 using normalized cross
correlation. This image is downsampled by a factor of four.
Figure 4 - Disparity image of the motorbike with a patch size of 11x11 using normalized
cross correlation. This image is downsampled by a factor of four.
Through observation, it can be seen that as the patch size becomes smaller, the disparity
image becomes coarser. Conversely, as the patch size becomes larger, the disparity image
becomes smoother. This is because each patch considers a larger sample of the image and
is thus more reliable when determining the disparity. It is more reliable because the larger
the patch size, the smaller the probability of a mismatch between a patch in the left stereo
image and a patch in the right stereo image. Within an image, fine-scale details are more
likely to repeat than larger-scale structure. Another observation, however, is that as the
patch size becomes larger, so does the computational cost of determining the disparity
image. This analysis reveals a tradeoff one must consider:
speed vs. accuracy.
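The speed side of this tradeoff can be estimated with a rough operation count. The Python sketch below assumes brute-force matching, in which every valid pixel is compared against every candidate disparity with one multiply-add per patch element. An optimized routine such as normxcorr2() may scale differently, and the function name and image size are hypothetical.

```python
def matching_cost(height, width, patch_width, max_disparity):
    """Approximate multiply-add count for brute-force patch matching."""
    half = patch_width // 2
    valid_pixels = max(height - 2 * half, 0) * max(width - 2 * half, 0)
    return valid_pixels * (max_disparity + 1) * patch_width ** 2

# Hypothetical 100x100 image with an 11-candidate disparity search.
small_patch_cost = matching_cost(100, 100, 5, 10)
large_patch_cost = matching_cost(100, 100, 11, 10)
```

Under this model the cost grows with the square of the patch width, while downsampling by a factor of 4 divides the pixel count by 16 and the disparity range by 4.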
We also considered how downsampling affects the
disparity image. Figure 5 shows the disparity image calculated with a patch size of 11x11
using normalized cross correlation, but with the stereo image downsampled by a factor of 2
instead of 4.
Figure 5 - Disparity image of the motorbike with a patch size of 11x11 using normalized
cross correlation. This image is downsampled by a factor of two.
An interesting thing to note is that as the resolution of the stereo images increases, the
11x11 patch covers a smaller portion of the image. This image, computed with a
patch size of 11x11, has the same level of “coarse” details as the disparity image
that was downsampled by a factor of 4 using a patch size of 5x5. The less the stereo image
is downsampled, the smoother the disparity image is, because less information is lost
when downsampling. This can be seen in that figure 5 appears to have smoother surfaces
than figure 2 at an equivalent level of detail. The right and left images differ very subtly,
as the camera is only shifted horizontally, so an equivalent region in both the left and right
images will change very slightly during the downsampling process.
Figures 6 to 8 show a different stereo image, which depicts a
rocking chair. In this case, the patch sizes were chosen to be much larger and the stereo
image was downsampled by a factor of 4. The patch sizes under consideration are 11x11,
21x21, and 35x35. These were chosen to study how much larger patch sizes affect the
disparity image.
Figure 6 - Disparity image of the chair with a patch size of 11x11 using normalized cross
correlation. This image is downsampled by a factor of four.
Figure 7 - Disparity image of the chair with a patch size of 21x21 using normalized cross
correlation. This image is downsampled by a factor of four.
Figure 8 - Disparity image of the chair with a patch size of 35x35 using normalized cross
correlation. This image is downsampled by a factor of four.
As expected, figures 6 to 8 show that as the patch size is increased, the disparity image
appears smoother. The black borders also make it very evident which pixel disparities
cannot be computed. This is because if a patch were centered on one of
the border pixels, the extracted patch would be smaller than specified by the patch size. In
this project, no border handling was considered. For the chair, it appears that a patch size of
35x35 is the best choice for the disparity image because it does not have any high
disparities calculated in the background, as are present in the 11x11 case and, slightly, in
the 21x21 case. There are some errors on the back of the chair and throughout the image,
but it is much better at generally segmenting different regions of depth within the image
without too much error. As mentioned before, the disparity image is expected to be more
accurate without downsampling. To get a result comparable to the one with the patch size
of 35x35, the patch size should be increased to reflect the increase in resolution when
downsampling less.
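One way to pick the adjusted patch size is to scale the width by the ratio of the downsampling factors and round to the nearest odd integer, so that the patch covers roughly the same part of the scene. The Python sketch below is an illustration of that rule of thumb, not a procedure used in this project; the function name is hypothetical.

```python
import math

def equivalent_patch_width(patch_width, old_factor, new_factor):
    """Odd patch width covering about the same scene area at a new downsampling factor."""
    scaled = patch_width * old_factor / new_factor
    # Round (scaled - 1) / 2 half-up, then map back to an odd width of at least 1.
    return max(2 * math.floor((scaled - 1) / 2 + 0.5) + 1, 1)

# The 11x11 patch at a factor of 2 corresponds to about 5x5 at a factor of 4,
# consistent with the comparison between figures 5 and 2.
at_factor_four = equivalent_patch_width(11, 2, 4)
# The 35x35 patch at a factor of 4, carried over to the native resolution.
at_native = equivalent_patch_width(35, 4, 1)
```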
Exploration
Of the three ways to calculate the disparity image, normalized cross correlation gave the
best results in this project. The other two methods are based on patch matching using raw
correlation values and the sum of squared differences. Figure 9 shows the disparity image
for the same motorbike stereo image downsampled by a factor of 4. The patch size under
consideration is 11x11 and the patch matching used was raw correlation. In this
implementation of raw correlation, the average value of the patch in the left image was
subtracted from the patch before being matched with the patches in the right image.
Figure 9 - Disparity image of the motorbike with a patch size of 11x11 using raw correlation.
This image is downsampled by a factor of four.
This disparity image is saturated with high disparity measurements throughout, in both the
background and the foreground. Interestingly enough, the outline of the motorbike is still
discernible. In comparison to normalized cross correlation, this result is far less accurate.
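One property of raw correlation that can lead to mismatches is that its score grows with the brightness and contrast of the candidate patch, so a bright candidate can outscore an exact match. The Python sketch below illustrates this on toy values. It is not a diagnosis of figure 9, and the function name and patch values are made up for the example.

```python
def mean_subtracted_correlation(left_patch, right_patch):
    """Raw correlation after subtracting the left patch's mean from the left patch."""
    mean_left = sum(left_patch) / len(left_patch)
    return sum((x - mean_left) * y for x, y in zip(left_patch, right_patch))

left_patch = [1.0, 2.0, 3.0, 4.0]
exact_match = [1.0, 2.0, 3.0, 4.0]      # identical pattern
bright_mismatch = [0.0, 0.0, 0.0, 40.0]  # different pattern, one bright pixel

score_exact = mean_subtracted_correlation(left_patch, exact_match)
score_bright = mean_subtracted_correlation(left_patch, bright_mismatch)
```

Here the mismatched candidate scores higher than the exact match. Normalizing by the energy of both patches, as NCC does, removes this dependence on brightness and contrast.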
Figure 10 - Disparity image of the motorbike with a patch size of 11x11 using sum of squared
differences (SSD). This image is downsampled by a factor of four.
The disparity image using SSD as a patch matching method, seen in figure 10, has highly
varying grayscale values across the image. The individual features of the image cannot be
discerned with any fair degree of certainty, though the overall outline of the bike is somewhat
visible. Overall, using SSD as a patch matching method is far less accurate
than using NCC in our case.