Deformation
Salah Eddine KABBOUR1 Pierre-Yves RICHARD2
*CentraleSupelec
1 salah7ddine@gmail.com
2 pierre-yves.richard@centralesupelec.fr

Abstract: In this paper, a novel fully automated method is developed to
acquire an accurate 3D surface reconstruction of the human ear using
multi-view stereo vision and a morphable model without texture. As the
results show, our method outperforms state-of-the-art approaches.
Our method uses a template to estimate the position and orientation of
the camera without relying on correspondences. After dense
reconstruction is done, the ear morphable model is fitted to the
resulting point cloud by minimizing the distance between them. The shape
of the model is controlled by its coefficients, and the fit relies only
on shape, not on texture, to converge.

I. INTRODUCTION
The 3D reconstruction of the human ear has been the subject of various
studies for some time, but interest has peaked in recent years due to
the ever-growing number of applications using this technology. First of
all, current face recognition algorithms are not fail-proof and are very
limited when the face is occluded or partially invisible in the image,
so coupling them with ear recognition is a preferred approach. As Pflug
et al. [1] showed in their survey, 3D-based methods outperform the
others in human identification via ear shape. Even in the absence of the
face, ear biometrics alone are capable of accurately identifying people.
This was first demonstrated by Alfred Iannarelli [2], who conducted his
experiments on a large ear database; his success led to his work being
used in criminal investigations.

Another emerging field of study that uses the 3D human ear model is
three-dimensional sound. This is possible because sound waves are
deformed by the shape of the ear. These changes are described by the
Head-Related Transfer Function (HRTF), which is unique to each person.
To find the HRTF for a specific person, we can simulate the deformation
of the sound waves once we have the 3D shape of the ear; this has been
shown to be possible in [3], [4].

Numerous studies focus on the problem of human ear reconstruction. The
approach of Cadavid et al. [5] consists of taking several photos (or
video frames) of the person's ear and applying shape from shading (SFS)
techniques to each photo. This method is prone to considerable
inaccuracy in determining the 3D form, which is why they combine the
results by selecting the model with the greatest similarity to the rest
of the 3D models. The method requires fixed illumination, and the
slightest variation in brightness threatens its success.

To address the low texture of the human ear, Liu et al. [6] use an ear
sampling device consisting of a small half-cylinder enclosure with fixed
illumination; the subject places their ear in a hole and several photos
are taken. This device allows for precise ear segmentation and feature
extraction. They also use Harris corner detection paired with RANSAC to
filter outliers, and a semi-automatic approach for further
correspondence detection.

Zeng et al. [7] also use a device for photo acquisition, in this case a
binocular stereo camera pair, to obtain 3D ear points. From only two
photos, SIFT [8] matching combined with epipolar constraints is used to
produce sparse correspondences. A match propagation algorithm based on
ZNCC is then used to produce a semi-dense result.

In our work, we focus on an approach that can be used by anyone without
the need to purchase an expensive device. We also go beyond dense
points: our results take the form of accurate meshes representing the
ear, which are much more useful in simulations.

II. MARKER ASSISTED MULTIVIEW STRUCTURE
Reconstructing the 3D form of an object is not cutting-edge science, as
numerous methods already achieve this purpose. However, the uniform
texture of the ear makes it hard for traditional systems such as Bundler
with SIFT [9] to give an accurate result, which is one of the reasons
why state-of-the-art methods use specialized devices.

Our proposed method uses a template that anyone can print and a
smartphone camera. The template is used for accurate acquisition of
photos, which allows precise estimation of the camera positions and
orientations relative to the ear.

A. The Template
The template consists of 36 AprilTags [10], which have been shown to
give robust corner detection, sometimes even in tough illumination
conditions. Each tag has a unique identifier and measures 1 cm by 1 cm.
The template is designed so that the ear is placed in the center,
meaning that even if some tags are not visible in an image, the camera
still captures the whole ear (the template is shown in Figure 1).

An adaptive threshold must be used for detection in order to mitigate
the problems of illumination variation. Each tag detected in an image
provides 4 corners, which adds up to at most 144 points per image.
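
The corner count above can be illustrated with a minimal sketch. The tag
size (1 cm) and the number of tags (36) come from the text; the layout is
an assumption made only for illustration (tags on the border of a 10x10
grid, which happens to have 36 cells, with a hypothetical 1.5 cm pitch),
and the detector output format is likewise hypothetical. Pairs of known
3D template corners and detected 2D corners of this kind are the usual
input to a perspective-n-point pose solver.

```python
TAG_SIZE_CM = 1.0  # from the text: each tag measures 1 cm by 1 cm
GRID = 10          # assumption: tags lie on the border of a 10x10 grid
PITCH_CM = 1.5     # assumption: spacing between neighbouring tag origins


def template_corners():
    """Return {tag_id: four (x, y, z) corners in cm} for the assumed layout."""
    corners = {}
    tag_id = 0
    for row in range(GRID):
        for col in range(GRID):
            on_border = row in (0, GRID - 1) or col in (0, GRID - 1)
            if not on_border:
                continue  # interior cells stay free for the ear
            x0, y0 = col * PITCH_CM, row * PITCH_CM
            corners[tag_id] = [
                (x0, y0, 0.0),
                (x0 + TAG_SIZE_CM, y0, 0.0),
                (x0 + TAG_SIZE_CM, y0 + TAG_SIZE_CM, 0.0),
                (x0, y0 + TAG_SIZE_CM, 0.0),
            ]
            tag_id += 1
    return corners


def build_correspondences(detections, corners):
    """Pair detected 2D corners with the known 3D template corners.

    detections: {tag_id: four (u, v) pixel corners}, a hypothetical
    detector output. Unknown tags and incomplete detections are skipped.
    """
    object_points, image_points = [], []
    for tag_id, pixels in sorted(detections.items()):
        if tag_id not in corners or len(pixels) != 4:
            continue
        object_points.extend(corners[tag_id])
        image_points.extend(pixels)
    return object_points, image_points
```

Because every tag carries a unique identifier, an image in which only a
few tags are visible still yields unambiguous 2D-3D pairs, four per
detected tag.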
Fig. 3: The morphable model is deformed in order to fit the human ear.
As (b) shows, it can have a different scale from the starting point (a),
which is the average ear before optimization.
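
The deformation shown in Fig. 3 can be sketched as follows. This is an
assumption rather than the exact formulation used here: it follows the
usual linear morphable model, where a shape is the mean shape plus a
weighted sum of basis displacements, with an extra global scale factor.
The function name and data layout are hypothetical.

```python
def deform(mean_shape, basis, coeffs, scale=1.0):
    """Return the deformed vertices of a linear morphable model.

    mean_shape: list of (x, y, z) vertices of the average ear.
    basis: one displacement field per coefficient, each a list of
           (dx, dy, dz) with the same length as mean_shape.
    coeffs: one weight per displacement field.
    scale: global scale applied after the deformation.
    """
    deformed = []
    for i, (x, y, z) in enumerate(mean_shape):
        for k, c in enumerate(coeffs):
            dx, dy, dz = basis[k][i]
            x += c * dx
            y += c * dy
            z += c * dz
        deformed.append((scale * x, scale * y, scale * z))
    return deformed
```

With all coefficients at zero and a scale of 1 the function returns the
mean ear, which corresponds to the starting point (a) of the
optimization.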
IV. RESULTS
The entire process is automated and needs no human intervention or
special device other than the template, which can be printed and used.

The images used in all experiments are 1836x3264 pixels but were reduced
to 459x816 to ensure that PMVS functions properly; the images were taken
with a cell phone. The algorithm was developed entirely in Python,
except for PMVS, and the whole process took around 4 minutes on average
on a machine with a v3 CPU at 3.30 GHz and 8 GB of RAM. The run time can
be drastically reduced through algorithm optimisation and implementation
in faster languages such as C++; the whole algorithm could then run
close to real time.

The number of images taken is between 10 and 20, with no pre-processing.
Images taken from too close or too far are filtered out automatically
(due to the absence of tags).

A variant of Sequential Least Squares Programming (SLSQP) [17] was used
to minimize the error function. The results are shown in Figures 3 and
4. The result is quantified by comparing the model at the end of the
optimization with the eFit scans, which were considered ground truth:
the model is aligned with the ear scan using ICP, and the distance to
the nearest neighbour is then calculated.

Fig. 4: The heatmap is calculated by searching, for each point of the
morphable model, for its nearest neighbour in the set of ground-truth
points. (a) is the eFit scan, (b) shows the distance error heatmap of
the fitted model (average error 0.0685 cm), and (c) shows the error
heatmap of the mean ear before fitting (average error 0.0808 cm).

Our results show an overall successful fitting of the ear, especially
when compared with the mean ear. The method also works well for ears
bigger or smaller than those in the database, since the optimization
scales the morphable model to fit the size of the subject's ear. There
is still room for improvement: for example, the lobule has more error
than the rest of the ear, because this part is not rigid and is thus
vulnerable to the slightest movement caused by placing the template.
After discussion with experts in biometrics and 3D sound generation, it
turns out that this does not affect the result in any meaningful way.
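
The error measure used above (distance from each model point to its
nearest neighbour in the ground truth, then averaged) can be sketched as
follows. This is a minimal brute-force illustration, assuming the two
point sets are already aligned (for example by ICP) and expressed in the
same unit; the function names are hypothetical, and a spatial index such
as a k-d tree would be preferable for dense clouds.

```python
import math


def nearest_neighbor_errors(model_points, truth_points):
    """For each model point, return the distance to the closest truth point."""
    return [
        min(math.dist(p, q) for q in truth_points)
        for p in model_points
    ]


def mean_error(model_points, truth_points):
    """Average nearest-neighbour distance, in the unit of the inputs."""
    errors = nearest_neighbor_errors(model_points, truth_points)
    return sum(errors) / len(errors)
```

The per-point values returned by `nearest_neighbor_errors` are what a
heatmap such as the one in Fig. 4 would colour; `mean_error` gives the
single average figure.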
V. CONCLUSION
We have proposed a novel method to reconstruct the surface of the ear
that best fits a dense point cloud reconstruction. We also presented the
entire pipeline, which automates the acquisition of the surface
reconstruction of the human ear from cell phone images alone using a
printed template, whereas most other methods are not fully automatic.
Future work could remove the template entirely while maintaining the
same level of accuracy and fast running time. This could be done by
looking into novel matching algorithms that aim to find accurate and
rich matches for texture-less objects such as human ears.
REFERENCES
[1] A. Pflug and C. Busch, “Ear biometrics: a survey of detection, feature
extraction and recognition methods,” IET biometrics, vol. 1, no. 2, pp.
114–129, 2012.
[2] A. V. Iannarelli, Ear identification. Paramont Publishing Company,
1989.
[3] S. Ghorbal, T. Auclair, C. Soladié, and R. Séguier, “Pinna
morphological parameters influencing hrtf sets.”
[4] S. Ghorbal, R. Séguier, and X. Bonjour, “Process of hrtf
individualization by 3d statistical ear model,” in Audio Engineering
Society Convention 141. Audio Engineering Society, 2016.
[5] S. Cadavid and M. Abdel-Mottaleb, “3-D ear modeling and recognition
from video sequences using shape from shading,” IEEE Transactions
on Information Forensics and Security, vol. 3, no. 4, pp. 709–718,
2008.
[6] H. Liu and J. Yan, “Multi-view Ear Shape Feature Extraction and
Reconstruction.” IEEE, Dec. 2007, pp. 652–658.
[7] H. Zeng, Z.-C. Mu, K. Wang, and C. Sun, “Automatic 3d ear
reconstruction based on binocular stereo vision,” in Systems, Man
and Cybernetics, 2009. SMC 2009. IEEE International Conference
on. IEEE, 2009, pp. 5205–5208.
[8] D. G. Lowe, “Object recognition from local scale-invariant features,”
in Computer vision, 1999. The proceedings of the seventh IEEE
international conference on, vol. 2. Ieee, 1999, pp. 1150–1157.
[9] N. Snavely, S. Seitz, and R. Szeliski, “Photo tourism: Exploring
image collections in 3d (2006),” URL
http://www.cs.cornell.edu/~snavely/bundler.
[10] E. Olson, “AprilTag: A robust and flexible visual fiducial system,” in
Robotics and Automation (ICRA), 2011 IEEE International Conference
on. IEEE, 2011, pp. 3400–3407.
[11] B. M. Haralick, C.-N. Lee, K. Ottenberg, and M. Nölle, “Review and
analysis of solutions of the three point perspective pose estimation
problem,” International journal of computer vision, vol. 13, no. 3, pp.
331–356, 1994.
[12] F. Moreno-Noguer, V. Lepetit, and P. Fua, “Accurate non-iterative
O(n) solution to the PnP problem,” in Computer vision, 2007. ICCV
2007. IEEE 11th international conference on. IEEE, 2007, pp. 1–8.
[13] S. Li, C. Xu, and M. Xie, “A robust O(n) solution to the perspective-
n-point problem,” IEEE transactions on pattern analysis and machine
intelligence, vol. 34, no. 7, pp. 1444–1450, 2012.
[14] Y. Furukawa and J. Ponce, “Accurate, dense, and robust multiview
stereopsis,” IEEE transactions on pattern analysis and machine
intelligence, vol. 32, no. 8, pp. 1362–1376, 2010.
[15] V. Blanz and T. Vetter, “Face recognition based on fitting a 3d
morphable model,” IEEE Transactions on pattern analysis and
machine intelligence, vol. 25, no. 9, pp. 1063–1074, 2003.
[16] T. Vetter and V. Blanz, “Estimating coloured 3d face models from
single images: An example based approach,” in European Conference
on Computer Vision. Springer, 1998, pp. 499–513.
[17] D. Kraft, “A software package for sequential quadratic programming,”
Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt für
Luft- und Raumfahrt, 1988.