ABSTRACT
The availability of 3D content is still dwarfed by that of its 2D
counterpart, despite significant growth over the last few years. To
close this gap, many 2D-to-3D image and video conversion methods have
been proposed. Specifically, a sense of depth can now be experienced
in scenes captured by conventional cameras, where virtually no depth
information was ever provided. A sense of depth is useful in many
applications, such as training simulations, gaming, scientific model
exploration, and cinema. All of these applications can use depth to
convey realism, so that users can truly appreciate the content they
are viewing. The method proposed here is a global method that
estimates the entire depth of an input query image from a 3D
repository (3D images + depth) using a nearest-neighbour regression
type idea.
Keywords - 3D images, depth, disparity, stereoscopic
images.
1. INTRODUCTION
Viewing 3D content, whether it be in the home, at the
cinema, or in scientific environments, has become
popular over the last few years. In the past, much research
focused on obtaining a sense of depth of a scene by using
only two views, corresponding to the left and right eye
viewpoints, known as stereo correspondence. Using a
reference viewpoint, whether it is the left or right eye, a
sense of depth is determined for each point in the
reference, where points in one image are matched with
their corresponding points in the other image. The amount
of shift that the corresponding points undergo is known as
disparity. Disparity and depth have an inverse relationship: the
larger the horizontal shift, or disparity, the smaller the depth, and
the closer the point is to the viewer.
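The inverse disparity-depth relationship can be illustrated with a short sketch; for a rectified stereo pair, depth is Z = f*B/d. The focal length and baseline values below are illustrative assumptions, not taken from the text:

```python
# Depth from disparity for a rectified stereo pair:
# Z = f * B / d, where f is the focal length (pixels),
# B is the camera baseline (metres), and d is the disparity (pixels).
def depth_from_disparity(d, f=700.0, B=0.1):
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * B / d

near = depth_from_disparity(40.0)  # large disparity -> small depth (close point)
far = depth_from_disparity(4.0)    # small disparity -> large depth (distant point)
assert near < far
```

The assertion at the end checks the inverse relationship: a tenfold increase in disparity yields a tenfold decrease in depth.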
By presenting these two views of a scene to a human observer, one
slightly offset from the other, the illusion of depth is produced and
the image is perceived as 3D. The work is primarily done
www.ijsret.org
586
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882
Volume 4, Issue 5, May 2015
3. 2D TO 3D CONVERSION BASED ON GLOBAL METHOD OF DEPTH LEARNING WITH INPUT IMAGE DENOISING
In this section, a method is developed that estimates the global depth
map of a query image or video frame directly from a repository of 3D
images (image+depth pairs or stereopairs) using a nearest-neighbour
regression type idea.
The approach proposed here is built upon a key
observation and an assumption. The observation is that among the
millions of 3D images available on-line, there likely exist many whose
3D content matches that of a 2D input (query) we wish to convert to
3D. The assumption is that two images that are photometrically similar
also have similar 3D structure (depth). This is not unreasonable,
since photometric
properties are often correlated with 3D content (depth,
disparity). For example, edges in a depth map almost
always coincide with photometric edges. Given a
monocular query image Q, assumed to be the left image
of a stereopair that we wish to compute, we rely on the
above observation and assumption to learn the entire
depth field from a repository of 3D images I and render a
stereopair in the following steps:
1) Image denoising: remove the noise from the input query image using
NL-means filtering.
2) Search for representative depth fields: find the k 3D images in the
repository I whose depth is most similar to that of the query image,
for example by performing a k nearest-neighbour (kNN) search.
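Step 1 can be sketched with scikit-image's non-local means filter; the filter parameters below are illustrative assumptions, not values specified in the text:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def denoise_query(img):
    """Remove noise from the 2D query image with NL-means filtering.
    img: grayscale float image in [0, 1] of shape (H, W)."""
    sigma = float(np.mean(estimate_sigma(img)))  # estimate the noise level
    return denoise_nl_means(img, h=1.15 * sigma,
                            patch_size=5, patch_distance=6,
                            fast_mode=True)

# toy usage: a noisy constant image becomes smoother after filtering
rng = np.random.default_rng(0)
noisy = 0.5 + 0.1 * rng.standard_normal((64, 64))
clean = denoise_query(noisy)
assert clean.std() < noisy.std()
```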
www.ijsret.org
588
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882
Volume 4, Issue 5, May 2015
www.ijsret.org
589
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882
Volume 4, Issue 5, May 2015
The K depth fields are then fused into a single estimate by a
pixel-wise median:

d[x, y] = median{ d_i[x, y] : i = 1, ..., K }.     (1)
Although these depths are overly smooth, they provide a
globally correct, although coarse, assignment of distances
to various areas of the scene.
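The kNN search and the median fusion of Eq. (1) can be sketched as follows. The feature vectors here stand in for whatever photometric descriptor is used (the paper uses GLCM features); the Euclidean distance and the toy data are assumptions for illustration:

```python
import numpy as np

def fuse_depth(query_feat, repo_feats, repo_depths, K=5):
    """Find the K repository images photometrically closest to the query
    and fuse their depth fields by a pixel-wise median (Eq. 1).
    repo_feats: (N, D) feature vectors; repo_depths: (N, H, W) depth fields."""
    dists = np.linalg.norm(repo_feats - query_feat, axis=1)
    knn = np.argsort(dists)[:K]                 # indices of the k nearest neighbours
    return np.median(repo_depths[knn], axis=0)  # fused (coarse) depth field

# toy usage with random data
rng = np.random.default_rng(1)
feats = rng.random((20, 8))
depths = rng.random((20, 4, 4))
d_fused = fuse_depth(feats[0], feats, depths, K=3)
assert d_fused.shape == (4, 4)
```

Because the median is taken independently at each pixel, a depth value that is an outlier in only a few neighbours is suppressed, which is what makes the fused depth globally consistent but overly smooth.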
3.4 Cross Bilateral Filtering
While the median-based fusion helps make depth more
consistent globally, the fused depth is overly smooth and
locally inconsistent with the query image due to edge
misalignment between the depth fields of the kNNs and
the query image. This often results in the lack of edges in
the fused depth where sharp object boundaries should
occur and/or the lack of fused-depth smoothness where
smooth depth is expected.
In order to correct this, similarly to Angot et al. [6],
cross-bilateral filtering (CBF) is applied. CBF is a variant
of bilateral filtering, which is an edge-preserving image
smoothing method that applies anisotropic diffusion
controlled by the local content of the image itself [7]. In
CBF, however, the diffusion is not controlled by the local
content of the image under smoothing but by an external
input. CBF is applied to the fused depth d by using the
query image Q to control diffusion. This allows the
conversion to achieve two goals simultaneously:
alignment of the depth edges with those of the luminance
Y in the query image Q and local noise/granularity
suppression in the fused depth d.
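The cross-bilateral filter described above can be sketched directly in numpy. The kernel widths sigma_s and sigma_r below are illustrative assumptions; the key point is that the range weights come from the query luminance, not from the depth being smoothed:

```python
import numpy as np

def cross_bilateral(depth, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Cross-bilateral filtering: smooth `depth`, with range weights taken
    from the external `guide` image (the query luminance Y), so that depth
    edges align with photometric edges."""
    H, W = depth.shape
    out = np.zeros_like(depth)
    pad_d = np.pad(depth, radius, mode="edge")
    pad_g = np.pad(guide, radius, mode="edge")
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # spatial kernel
    for i in range(H):
        for j in range(W):
            win_d = pad_d[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            win_g = pad_g[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # range weights from the guide, not from the depth itself
            rng_w = np.exp(-(win_g - guide[i, j])**2 / (2 * sigma_r**2))
            w = spatial * rng_w
            out[i, j] = np.sum(w * win_d) / np.sum(w)
    return out

# toy usage: noise within flat regions is suppressed, but the depth step
# aligned with the guide's photometric edge is preserved
guide = np.zeros((16, 16)); guide[:, 8:] = 1.0
noisy_depth = guide + 0.05 * np.random.default_rng(0).standard_normal((16, 16))
filtered = cross_bilateral(noisy_depth, guide)
```

With sigma_r small relative to the step height, the range weights across the edge are essentially zero, so no averaging leaks across object boundaries.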
4. EXPERIMENTAL RESULTS
We have tested our approach on the Make3D outdoor dataset, with depth
fields captured by a laser range finder. Note that the Make3D [8]
images are of 1704x2272 resolution, but the corresponding depth fields
are only of 55x305 spatial resolution and relatively coarse
quantization. Therefore, for computational efficiency, we have
re-sized the images to 240x320 resolution. We selected one image+depth
pair from the database as the 2D query, treating the remaining pairs
as the 3D image repository I, based on which a depth estimate and a
right-image estimate are computed. As the quality metric, we used the
normalized cross-covariance C between the estimated depth and the
ground-truth depth. The normalized cross-covariance C takes values
between -1 and +1 (for values close to +1 the depths are very similar,
and for values close to -1 they are complementary).
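The quality metric can be written out as a short function: subtract the mean from each depth field and normalize the cross term by both standard deviations, giving a value in [-1, +1]:

```python
import numpy as np

def norm_cross_cov(d_est, d_true):
    """Normalized cross-covariance C between the estimated and the
    ground-truth depth fields; C lies in [-1, +1], with +1 meaning
    the depths are very similar and -1 that they are complementary."""
    a = d_est - d_est.mean()
    b = d_true - d_true.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

d = np.arange(12.0).reshape(3, 4)
assert abs(norm_cross_cov(d, d) - 1.0) < 1e-12    # identical depths -> +1
assert abs(norm_cross_cov(d, -d) + 1.0) < 1e-12   # complementary depths -> -1
```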
In addition, with the image denoising modification providing cleaner
input images, the error rate of the output has also been reduced. This
performance improvement of the algorithm is evaluated using the Mean
Squared Error (MSE) and the corresponding Peak Signal-to-Noise Ratio
(PSNR). The performance graphs are shown in Fig. 5 and Fig. 6.
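The two error measures are standard; for completeness, a minimal sketch (the peak value of 255 assumes 8-bit image data):

```python
import numpy as np

def mse(a, b):
    """Mean Squared Error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher PSNR means lower error."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak**2 / e)

ref = np.full((8, 8), 128.0)
noisy = ref + 16.0                 # constant error of 16 -> MSE = 256
assert mse(ref, noisy) == 256.0
assert abs(psnr(ref, noisy) - 10.0 * np.log10(255.0**2 / 256.0)) < 1e-9
```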
Table 1. Average and Median Normalized Cross-Covariance C Computed
Across All Images in Make3D for the Proposed Method (Global GLCM), the
Make3D Algorithm, and Konrad et al. [1]

Method           Average C    Median C
Global GLCM      0.82         0.88
Make 3D          0.78         0.78
Konrad et al.    0.80         0.86
5. CONCLUSION
We have proposed a new class of methods for 2D-to-3D image conversion,
based on the radically different approach of learning from examples.
The proposed method globally estimates the entire depth field of a
query directly from a repository of image+depth pairs using
nearest-neighbour-based regression, with GLCM features used for
feature extraction. We have objectively validated our algorithm's
performance against state-of-the-art algorithms. Our global method
performed better than the state-of-the-art algorithms in terms of
cumulative performance across the Make3D dataset, and did so at a
fraction of the CPU execution time. Anaglyph images produced by our
algorithm result in a comfortable 3D experience but are not completely
free of distortions. Certainly, there is room for improvement in the
future. With the continuous increase in the amount of 3D data on-line,
and with the rapidly growing computing power in the cloud, the
proposed work seems a promising alternative to operator-assisted
2D-to-3D image and video conversion.
ACKNOWLEDGEMENTS
Fig. 5 Performance evaluation of the proposed method against Konrad et
al. [1] using MSE
REFERENCES
1. J. Konrad, M. Wang, P. Ishwar, C. Wu, and D. Mukherjee,
"Learning-based, automatic 2D-to-3D image and video conversion," IEEE
Trans. Image Processing, vol. 22, no. 9, pp. 3485-3496, Sep. 2013.
2. B. Liu, S. Gould, and D. Koller, "Single image depth estimation
from predicted semantic labels," in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit., Jun. 2010, pp. 1253-1260.
3. A. Saxena, M. Sun, and A. Ng, "Make3D: Learning 3D scene structure
from a single still image," IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 31, no. 5, pp. 824-840, May 2009.
4. A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images:
A large data set for nonparametric object and scene recognition," IEEE
Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958-1970,
Nov. 2008.
5. M. Wang, J. Konrad, P. Ishwar, K. Jing, and H. Rowley, "Image
saliency: From intrinsic to extrinsic context," in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., Jun. 2011, pp. 417-424.
6. L. Angot, W.-J. Huang, and K.-C. Liu, "A 2D to 3D video and image
conversion technique based on a bilateral filter," Proc. SPIE, vol.
7526, p. 75260D, Feb. 2010.
7. F. Durand and J. Dorsey, "Fast bilateral filtering for the display
of high-dynamic-range images," ACM Trans. Graph., vol. 21, pp.
257-266, Jul. 2002.
8. Make3D (2012). [Online]. Available:
http://make3d.cs.cornell.edu/data.html