
Dictionary of Computer Vision and Image Processing

R. B. Fisher
University of Edinburgh

K. Dawson-Howe
Trinity College Dublin

A. Fitzgibbon
Oxford University

C. Robertson
CEO, Epipole Ltd

C. Williams
University of Edinburgh

A JOHN WILEY & SONS, INC., PUBLICATION


Copyright © 2004 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Dictionary of Computer Vision and Image Processing / Robert B. Fisher . . . [et al.].
***
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
From Bob to Rosemary, Mies, Hannah, Phoebe and Lars

From AWF to Liz, to my parents, and again to D.

To Karen and Aidan. Thanks pips!

From Ken to Jane, William and Susie

From Manuel to Emily, Francesca, and Alistair
CONTENTS

Preface

References
PREFACE

This dictionary arose out of a continuing interest in the resources needed by


beginning students and researchers in the fields of image processing, computer
vision and machine vision (however you choose to define these overlapping fields).
As instructors and mentors, we often found confusion about what various terms
and concepts mean for the beginner. To support these learners, we have tried to
define the key concepts that a competent generalist should know about these
fields.
This second edition adds approximately 1000 new terms to the more than 2500
terms in the original dictionary. We have chosen new terms that have entered
reasonably common usage (e.g., appeared in the index of influential books), and
terms that were not included originally. We are pleased to welcome Chris Williams
into the authorial team and to thank Manuel Trucco for all of his help in the past.
One innovation in the second edition is the addition of reference links for a
majority of the old and new terms. Unlike more traditional dictionaries, which
provide references to establish the origin or meaning of the word, our goal here
was instead to provide further information about the term. Hence, we tried to use
Wikipedia as much as possible as this is an easily accessible and constantly
improving resource.
This is a dictionary, not an encyclopedia, so the definitions are necessarily brief
and are not intended to replace a proper textbook explanation of the term. We
have tried to capture the essentials of the terms, with short examples or
mathematical precision where feasible or necessary for clarity.
Further information about many of the terms can be found in the references
below. These are mostly general textbooks, each providing a broad view of a


portion of the field. Some of the concepts are also quite recent and, although
commonly used in research publications, have not yet appeared in mainstream
textbooks. Thus this book is also a useful source for recent terminology and
concepts.
Certainly some concepts are still missing from the dictionary, but we have
scanned both textbooks and the research literature to find the central and
commonly used terms.
Although the dictionary was intended for beginning and intermediate students
and researchers, as we developed the dictionary it was clear that we also had some
confusions and vague understandings of the concepts. It also surprised us that
some terms had multiple usages. To improve quality and coverage, each definition
was reviewed during development by at least two people besides its author. We
hope that this has caught any errors and vagueness, as well as reproduced the
alternative meanings. Each of the co-authors is quite experienced in the topics
covered here, but it was still educational to learn more about our field in the
process of compiling the dictionary. We hope that you find using the dictionary
equally valuable.
The authors would like to thank Xiang (Lily) Li and Georgios Papadimitriou for
their help with finding citations for the content from the first edition. We also
greatly appreciate all the support from the Wiley editorial and production team!

To help the reader, terms appearing elsewhere in the dictionary are underlined.
We have tried to be reasonably thorough about this, but some terms, such as 2D,
3D, light, camera, image, pixel and color were so commonly used that we decided
to not cross-reference all of these.

We have tried to be consistent with the mathematical notation: italics for scalars (s), arrowed italics for points and vectors (~v), and boldface letters for matrices (M).

The reference for most of the terms has two parts: AAA: BBB. The AAA
component refers to one of the items listed below. The BBB component normally
refers to a chapter/section/page in reference AAA. Wikipedia entries (WP) are
slightly different, in that the BBB term is the relevant Wikipedia page
http://en.wikipedia.org/wiki/BBB
REFERENCES

1. A. Gibbons. Algorithmic Graph Theory, Cambridge University Press, 1985.


2. A. Hornberg (Ed.). Handbook of Machine Vision, Wiley-VCH, 2006.
3. A. Jain. Fundamentals of Digital Image Processing, Prentice Hall Intl, 1989.
4. A. Low, Introductory Computer Vision and Image Processing, McGraw-Hill, 1991.
5. A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill,
New York, Third Edition, 1991.
6. T. Acharya, A. K. Roy. Image Processing, Wiley, 2005.
7. B. A. Wandell. Foundations of Vision, Sinauer, 1995.
8. D. Ballard, C. Brown. Computer Vision, Prentice Hall, 1982.
9. A. M. Bronstein, M. M. Bronstein, R. Kimmel. Numerical Geometry of Non-Rigid
Shapes, Springer, 2008.
10. B. G. Batchelor, D. A. Hill, D. C. Hodgson. Automated Visual Inspection, IFS, 1985.
11. A. Blake, S. Isard. Active Contours, Springer, 1998.
12. J. C. Bezdek, J. Keller, R. Krisnapuram, N. Pal, Fuzzy Models and Algorithms for
Pattern Recognition and Image Processing, Springer 2005.
13. B. K. P. Horn. Robot Vision, MIT Press, 1986.
14. M. Bennamoun, G. J. Mamic. Object Recognition - Fundamentals and Case Studies,
Springer, 2002.
15. B. Noble, Applied Linear Algebra, Prentice-Hall, 1969.
16. R. D. Boyle, R. C. Thomas. Computer Vision: A First Course, Blackwell, 1988.
17. S. Boyd, L. Vandenberghe. Convex Optimization, Cambridge University Press, 2004.


18. C. Chatfield. The Analysis of Time Series: An Introduction, Chapman and Hall, London, 4th edition, 1989.
19. C. M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.
20. B. Croft, D. Metzler, T. Strohman. Search Engines: Information Retrieval in
Practice, Addison-Wesley Publishing Company, USA, 2009.
21. B. Cyganek, J. P. Siebert. An Introduction to 3D Computer Vision Techniques and
Algorithms, Wiley, 2009.
22. T. M. Cover, J. A. Thomas. Elements of Information Theory, John Wiley & Sons,
1991.
23. R. O. Duda, P. E. Hart. Pattern Classification and Scene Analysis, James Wiley,
1973.
24. D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms,
Cambridge University Press, Cambridge, 2003.
25. D. Marr. Vision, Freeman, 1982.
26. A. Desolneux, L. Moisan, J.-M. Morel. From Gestalt Theory to Image Analysis,
Springer, 2008.
27. E. Hecht. Optics. Addison-Wesley, 1987.
28. E. R. Davies. Machine Vision, Academic Press, 1990.
29. E. W. Weisstein. MathWorld, A Wolfram Web Resource, http://mathworld.wolfram.com/, accessed March 1, 2012.
30. F. R. K. Chung. Spectral Graph Theory, American Mathematical Society, 1997.
31. D. Forsyth, J. Ponce. Computer Vision - a modern approach, Prentice Hall, 2003.
32. J. Flusser, T. Suk, B. Zitová. Moments and Moment Invariants in Pattern Recognition, Wiley, 2009.
33. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin. Bayesian Data Analysis, Chapman and Hall, London, 1995.
34. A. Gersho, R. Gray. Vector Quantization and Signal Compression, Kluwer, 1992.
35. P. Green, L. MacDonald (Eds.). Colour Engineering, Wiley, 2003.
36. G. R. Grimmett, D. R. Stirzaker. Probability and Random Processes, Clarendon
Press, Oxford, Second edition, 1992.
37. G. H. Golub, C. F. Van Loan. Matrix Computations, Johns Hopkins University
Press, Second edition, 1989.
38. S. Gong, T. Xiang. Visual Analysis of Behaviour: From Pixels to Semantics,
Springer, 2011.
39. H. Freeman (Ed). Machine Vision for Three-dimensional Scenes, Academic Press,
1990.
40. H. Freeman (Ed). Machine Vision for Measurement and Inspection, Academic Press,
1989.
41. R. M. Haralick, L. G. Shapiro. Computer and Robot Vision, Addison-Wesley
Longman Publishing, 1992.
42. H. Samet. Applications of Spatial Data Structures, Addison Wesley, 1990.
43. T. J. Hastie, R. J. Tibshirani, J. Friedman. The Elements of Statistical Learning,
Springer-Verlag, 2008.
44. R. Hartley, A. Zisserman. Multiple View Geometry, Cambridge University Press,
2000.

45. J. C. McGlone (Ed.). Manual of Photogrammetry, ASPRS, 2004.


46. J. J. Koenderink, What does the occluding contour tell us about solid shape?,
Perception, Vol. 13, pp 321-330, 1984.
47. R. Jain, R. Kasturi, B. Schunck. Machine Vision, McGraw Hill, 1995.
48. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988.
49. D. Koller, N. Friedman. Probabilistic Graphical Models, MIT Press, 2009.
50. K. Fukunaga. Introduction to Statistical Pattern Recognition, Academic Press, 1990.
51. B. V. K. Vijaya Kumar, A. Mahalanobis, R. D. Juday. Correlation Pattern
Recognition, Cambridge, 2005.
52. K. P. Murphy. Machine Learning: a Probabilistic Perspective, MIT Press, 2012.
53. L. A. Wasserman. All of Statistics, Springer, 2004.
54. L. J. Galbiati. Machine Vision and Digital Image Processing Fundamentals, Prentice
Hall, 1990.
55. T. Luhmann, S. Robson, S. Kyle, I. Harley. Close Range Photogrammetry, Whittles, 2006.
56. S. Lovett. Differential Geometry of Manifolds, Peters, 2010.
57. F. Mokhtarian, M. Bober. Curvature Scale Space Representation: Theory, Applications and MPEG-7 Standardization, Springer, Computational Imaging and Vision Series, Vol. 25, 2003.
58. K. V. Mardia, J. T. Kent, J. M. Bibby. Multivariate Analysis, Academic Press,
London, 1979.
59. C. D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
60. J.-M. Morel, S. Solimini. Variational Models for Image Segmentation: with seven image processing experiments, Birkhäuser, 1994.
61. V. S. Nalwa. A Guided Tour of Computer Vision, Addison Wesley, 1993.
62. M. Nixon, A. Aguado. Feature Extraction & Image Processing, Elsevier Newnes,
2005.
63. N. A. C. Cressie. Statistics for Spatial Data, Wiley, New York, 1993.
64. O. Faugeras. Three-Dimensional Computer Vision - A Geometric Viewpoint, MIT
Press, 1999.
65. M. Petrou and P. Bosdogianni. Image Processing: The Fundamentals, Wiley
Interscience, 1999.
66. M. Petrou, P. Garcia Sevilla. Image Processing - Dealing with Texture, Wiley, 2006.
67. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery. Numerical Recipes
in C, Cambridge University Press, Second edition, 1992.
68. R. J. Schalkoff. Digital Image Processing and Computer Vision, Wiley, 1989.
69. R. Nevatia. Machine Perception, Prentice-Hall, 1982.
70. C. E. Rasmussen, C. K. I. Williams. Gaussian Processes for Machine Learning, MIT
Press, Cambridge, Massachusetts, 2006.
71. R. W. G. Hunt. The Reproduction of Colour, Wiley, 2004.
72. E. Reinhard, G. Ward, S. Pattanaik, P. Debevec. High Dynamic Range Imaging,
Morgan Kaufman, 2006.

73. R. Szeliski. Computer Vision: Algorithms and Applications, Springer, 2010.


74. C. Solomon, T. Breckon. Fundamentals of Digital Image Processing,
Wiley-Blackwell, 2011.
75. R. S. Sutton, A. G. Barto. Reinforcement Learning, MIT Press, 1998.
76. S. E. Palmer. Vision Science: Photons to Phenomenology, MIT Press, 1999.
77. S. E. Umbaugh. Computer Vision and Image Processing, Prentice Hall, 1998.
78. M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis, and Machine Vision,
Chapman and Hall, 1993.
79. M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis, and Machine Vision,
Thompson, 2008.
80. M. Seul, L. O'Gorman, M. J. Sammon. Practical Algorithms for Image Analysis, Cambridge University Press, 2000.
81. W. E. Snyder and H. Qi. Machine Vision, Cambridge, 2004.
82. L. Shapiro, G. Stockman. Computer Vision, Prentice Hall. 2001.
83. B. Schölkopf, A. Smola. Learning with Kernels, MIT Press, 2002.
84. J. Shawe-Taylor, N. Cristianini. Kernel Methods for Pattern Analysis, Cambridge
University Press, 2004.
85. S. Winkler. Digital Video Quality, Wiley, 2005.
86. S. Thrun, W. Burgard, D. Fox. Probabilistic Robotics, MIT Press, 2005.
87. J. T. Tou, R. C. Gonzalez. Pattern Recognition Principles, Addison Wesley, 1974.
88. T. M. Mitchell. Machine Learning, McGraw-Hill, New York, 1997.
89. E. Trucco, A. Verri. Introductory Techniques for 3-D Computer Vision, Prentice
Hall, 1998.
90. Wikipedia, http://en.wikipedia.org, accessed March 11, 2011.
91. X. S. Zhou, Y. Rui, T. S. Huang. Exploration of Visual Data, Kluwer Academic,
2003.
0

1D: One dimensional, usually in reference to some structure. Examples include: 1) a signal x(t) that is a function of time t, 2) the dimensionality of a single property value or 3) one degree of freedom in shape variation or motion. [ EH:2.1]

2D: Two dimensional. A space describable using any pair of orthogonal basis vectors consisting of two elements. [ WP:Two-dimensional space]

2D coordinate system: A system associating uniquely 2 real numbers to any point of a plane. First, two intersecting lines (axes) are chosen on the plane, usually perpendicular to each other. The point of intersection is the origin of the system. Second, metric units are established on each axis (often the same for both axes) to associate numbers to points. The coordinates Px and Py of a point P are obtained by projecting P onto each axis in a direction parallel to the other axis and reading the numbers at the intersections. [ JKS:1.4]

2D Fourier transform: A special case of the general Fourier transform often used to find structures in images. [ FP:7.3.1]

2D image: A matrix of data representing samples taken at discrete intervals. The data may be from a variety of sources and sampled in a variety of ways. In computer vision applications the image values are often encoded color or monochrome intensity samples taken by digital cameras, but may also be range data. (The original includes a figure showing a small grid of typical intensity values.) [ SQ:4.1.1]

2D input device: A device for sampling light intensity from the real world into a 2D matrix of measurements. The most popular two dimensional imaging device is the charge-coupled device (CCD) camera. Other common devices are flatbed scanners and X-ray scanners. [ SQ:4.2.1]

2D point: A point in a 2D space, that is, characterized by two coordinates; most often, a point on a plane, for instance an image point in pixel coordinates. Notice, however, that two coordinates do not necessarily imply a plane: a point on a 3D surface can be expressed either in 3D coordinates or by two coordinates given a surface parameterization (see surface patch). [ JKS:1.4]

2D point feature: Localized structures in a 2D image, such as interest points, corners and line meeting points (X, Y and T shaped, for example). One detector for these features is the SUSAN corner finder. [ TV:4.1]

2D pose estimation: A fundamental open problem in computer vision where the correspondence between two sets of 2D points is found. The problem is defined as follows: given two sets of points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the Euclidean transformation $\{R, \vec{t}\}$ (the pose) and the match matrix $\{M_{jk}\}$ (the correspondences) that best relates them. A large number of techniques have been used to address this problem, for example tree-pruning methods, the Hough transform and geometric hashing. A special case of 3D pose estimation.

2D projection: A transformation mapping higher dimensional space onto two dimensional space. The simplest method is to simply discard higher dimensional coordinates, although generally a viewing position is used and the projection is performed. (The original figure shows a 3D solid projected onto a 2D space from a viewpoint.) For example, the main steps for a computer graphics projection are as follows: apply a normalizing transform to 3D world coordinate points; clip against the canonical view volume; project onto the projection plane; transform into the viewport in 2D device coordinates for display. Commonly used projection functions are parallel projection and perspective projection. [ JKS:1.4]
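As an illustration of the perspective case of 2D projection, here is a minimal sketch (numpy assumed; the function name and interface are ours, not part of the dictionary):

```python
import numpy as np

def project_points(points_3d, focal_length=1.0):
    """Perspective projection of Nx3 points in camera coordinates onto the
    image plane z = focal_length (camera at the origin, looking along +z)."""
    points_3d = np.asarray(points_3d, dtype=float)
    z = points_3d[:, 2:3]                  # keep 2D shape for broadcasting
    return focal_length * points_3d[:, :2] / z

# A unit square farther from the camera projects to a smaller square.
near = np.array([[0, 0, 2], [1, 0, 2], [1, 1, 2], [0, 1, 2]])
far = near + np.array([0, 0, 2])
print(project_points(near))
print(project_points(far))
```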
2.5D image: A range image obtained by scanning from a single viewpoint. This allows the data to be represented in a single image array, where each pixel value encodes the distance to the observed scene. The reason this is not called a 3D image is to make explicit the fact that the back sides of the scene objects are not represented. [ SQ:4.1.1]

2.5D sketch: Central structure of Marr's theory of vision. An intermediate description of a scene indicating the visible surfaces and their arrangement with respect to the viewer. It is built from several different elements: the contour, texture and shading information coming from the primal sketch, stereo information and motion. The description is theorized to be a kind of buffer where partial resolution of the objects takes place. The name 2 1/2 D sketch stems from the fact that although local changes in depth and discontinuities are well resolved, the absolute distance to all scene points may remain unknown. [ FP:11.3.2]

3D: Three dimensional. A space describable using any triple of mutually orthogonal basis vectors consisting of three elements. [ WP:Three-dimensional space]

3D coordinate system: Same as 2D coordinate system, but in three dimensions. (The original figure shows the +X, +Y and +Z axes.) [ JKS:1.4]

3D data: Data described in all three spatial dimensions. See also range data, CAT and NMR. (The original includes a rendering of an example 3D data set.)

3D data acquisition: Sampling data in all three spatial dimensions. There are a variety of ways to perform this sampling, for example using structured light triangulation. [ FP:21.1]

3D image: See range image. [ SQ:4.1.1]

3D interpretation: A 3D model, e.g., a solid object, that explains an image or a set of image data. For instance, a certain configuration of image lines can be explained as the perspective projection of a polyhedron; in simpler words, the image lines are the images of some of the polyhedron's lines. See also image interpretation. [ BB:9.1]
3D model: A description of a 3D object that primarily describes its shape. Models of this sort are regularly used as exemplars in model based recognition and 3D computer graphics. [ TV:10.6]

3D moments: A special case of moment where the data comes from a set of 3D points.

3D object: A subset of $R^3$. In computer vision, often taken to mean a volume in $R^3$ that is bounded by a surface. Any solid object around you is an example: table, chairs, books, cups, and you yourself. [ BB:9.1]

3D point: An infinitesimal volume of 3D space. [ JKS:1.4]

3D point feature: A point feature on a 3D object or in a 3D environment. For instance, a corner in 3D space.

3D pose estimation: The process of determining the transformation (translation and rotation) of an object in one coordinate frame with respect to another coordinate frame. Generally, only rigid objects are considered, models of those objects exist a priori, and we wish to determine the position of that object in an image on the basis of matched features. This is a fundamental open problem in computer vision where the correspondence between two sets of 3D points is found. The problem is defined as follows: given two sets of points $\{\vec{x}_j\}$ and $\{\vec{y}_k\}$, find the parameters of a Euclidean transformation $\{R, \vec{t}\}$ (the pose) and the match matrix $\{M_{jk}\}$ (the correspondences) that best relates them. Assuming the points correspond, they should match exactly under this transformation. [ TV:11.2]

3D reconstruction: A general term referring to the computation of a 3D model from 2D images. [ BT:8]

3D skeleton: See skeleton. [ FP:24.2.1]

3D stratigraphy: A modeling and visualization tool used to display different underground layers. Often used for visualizations of archaeological sites or for detecting different rock and soil structures in geological surveying.

3D structure recovery: See 3D reconstruction. [ BT:8]

3D texture: The appearance of texture on a 3D surface when imaged, for instance, the fact that the density of texels varies with distance due to perspective effects. 3D surface properties (e.g., shape, distances, orientation) can be estimated from such effects. See also shape from texture, texture orientation.

3D vision: A branch of computer vision dealing with characterizing data composed of 3D measurements. For example, this may involve segmentation of the data into individual surfaces that are then used to identify the data as one of several models. Reverse engineering is a specialism inside 3D vision. [ ERD:16.2]

4 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to the four neighboring pixels that share a common crack edge. See also 8 connectedness. (The original figure shows the four pixels connected to a central pixel (*), and the four groups of pixels joined by 4 connectedness.) [ SQ:4.5]

8 connectedness: A type of image connectedness in which each rectangular pixel is considered to be connected to all eight neighboring pixels. See also 4 connectedness. (The original figure shows the eight pixels connected to a central pixel (*), and the two groups of pixels joined by 8 connectedness.) [ SQ:4.5]
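A small sketch of the two neighborhood definitions in pure Python (the helper name is ours):

```python
def neighbors(y, x, height, width, connectivity=4):
    """Yield the 4- or 8-connected neighbors of pixel (y, x) that fall
    inside a height x width image."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # share a crack edge
    else:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]              # adds the diagonals
    for dy, dx in offsets:
        ny, nx = y + dy, x + dx
        if 0 <= ny < height and 0 <= nx < width:
            yield ny, nx

print(list(neighbors(0, 0, 3, 3)))                       # [(1, 0), (0, 1)]
print(len(list(neighbors(1, 1, 3, 3, connectivity=8))))  # 8
```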
connected to all eight neighboring
A

A*: A search technique that performs best-first searching based on an evaluation function that combines the cost so far and the estimated cost to the goal. [ WP:A* search algorithm]
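A compact sketch of the technique on an arbitrary graph (pure Python; all names are ours):

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Best-first search ordered by f(n) = g(n) + h(n): the cost so far
    plus the estimated cost to the goal. `neighbors(n)` yields
    (next_node, step_cost) pairs; `heuristic(n)` estimates remaining cost."""
    tie = count()            # tiebreaker so the heap never compares paths
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue         # already reached more cheaply
        best_g[node] = g
        for nxt, cost in neighbors(node):
            heapq.heappush(frontier, (g + cost + heuristic(nxt),
                                      next(tie), g + cost, nxt, path + [nxt]))
    return None, float("inf")
```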

a posteriori probability: Literally, "after" probability. It is the probability p(s|e) that some situation s holds after some evidence e has been observed. This contrasts with the a priori probability p(s), the probability of s before any evidence is observed. Bayes' rule is often used to compute the a posteriori probability from the a priori probability and the evidence. [ JKS:15.5]

a priori probability: Suppose that there is a set Q of equally likely outcomes for a given action. If a particular event E could occur on any one of a subset S of these outcomes, then the a priori or theoretical probability of E is defined by $P(E) = \frac{size(S)}{size(Q)}$. [ JKS:15.5]

aberration: Problem exhibited by a lens or a mirror whereby unexpected results are obtained. There are two types of aberration commonly encountered: chromatic aberration, where different frequencies of light focus at different positions (illustrated in the original by a lens focusing blue and red rays at different depths), and spherical aberration, where light passing through the edges of a lens (or mirror) focuses at slightly different positions. [ FP:1.2.3]

absolute conic: The conic in 3D projective space that is the intersection of the unit (or any) sphere with the plane at infinity. It consists only of complex points. Its importance in computer vision is due to its role in the problem of autocalibration: the image of the absolute conic (IAC), a 2D conic, is represented by a $3 \times 3$ matrix $\omega$ that is the inverse of the matrix $KK^\top$, where $K$ is the matrix of the internal camera calibration parameters. Thus, identifying $\omega$ allows the camera calibration to be computed. [ FP:13.6]

absolute coordinates: Generally used in contrast to local or relative coordinates. A coordinate system that is referenced to some external datum. For example, a pixel in a satellite image might be at (100,200) in image coordinates, but at (51:48:05N, 8:17:54W) in georeferenced absolute coordinates. [ JKS:1.4.2]

absolute orientation: In photogrammetry, the problem of registering two corresponding sets of 3D points. Used to register a photogrammetric reconstruction to some absolute coordinate system. Often expressed as the problem of determining the rotation $R$, translation $\vec{t}$ and scale $s$ that best transforms a set of model points $\{\vec{m}_1, \ldots, \vec{m}_n\}$ to corresponding data points $\{\vec{d}_1, \ldots, \vec{d}_n\}$ by minimizing the least-squares error

$$\epsilon(R, \vec{t}, s) = \sum_{i=1}^{n} \left\| \vec{d}_i - s(R\vec{m}_i + \vec{t}\,) \right\|^2,$$

to which a solution may be found by using singular value decomposition. [ JKS:1.4.2]

absolute point: A 3D point defining the origin of a coordinate system.

absolute quadric: The symmetric $4 \times 4$ rank 3 matrix $\Omega = \begin{pmatrix} I_3 & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$. Like the absolute conic, it is defined to be invariant under Euclidean transformations, is rescaled under similarities, takes the form $\begin{pmatrix} AA^\top & \vec{0}_3 \\ \vec{0}_3^\top & 0 \end{pmatrix}$ under affine transforms and becomes an arbitrary $4 \times 4$ rank 3 matrix under projective transforms. [ FP:13.6]

absorption: Attenuation of light caused by passing through an optical system or being incident on an object surface. [ EH:3.5]

accumulation method: A method of accumulating evidence in histogram form, then searching for peaks, which correspond to hypotheses. See also Hough transform, generalized Hough transform. [ AL:9.3]

accumulative difference: A means of detecting motion in image sequences. Each frame in the sequence is compared to a reference frame (after registration if necessary) to produce a difference image. Thresholding the difference image gives a binary motion mask. A counter for each pixel location in the accumulative image is incremented every time the difference between the reference image and the current image exceeds some threshold. Used for change detection. [ JKS:14.1.1]

accuracy: The error of a value away from the true value. Contrast this with precision. [ WP:Accuracy and precision]

acoustic sonar: SOund NAvigation And Ranging. A device that is used primarily for the detection and location of objects (e.g., underwater or in air, as in mobile robotics, or internal to a human body, as in medical ultrasound) by reflecting and intercepting acoustic waves. It operates with acoustic waves in an analogous way to that of radar, using both the time of flight and Doppler effects, giving the radial component of relative position and velocity. [ WP:Sonar]
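Returning to the absolute orientation entry above, here is a sketch of the SVD-based least-squares solution (an Umeyama-style closed form; numpy assumed, function name ours):

```python
import numpy as np

def absolute_orientation(model, data):
    """Find rotation R, translation t and scale s minimizing
    sum_i ||d_i - s (R m_i + t)||^2 for corresponding Nx3 point sets."""
    m_bar, d_bar = model.mean(axis=0), data.mean(axis=0)
    M, D = model - m_bar, data - d_bar
    U, S, Vt = np.linalg.svd(D.T @ M)        # SVD of the cross-covariance
    C = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    R = U @ C @ Vt
    s = (S * np.diag(C)).sum() / (M ** 2).sum()
    t = d_bar - s * (R @ m_bar)
    return R, t, s

# Quick self-check with a synthetic similarity transform.
rng = np.random.default_rng(0)
m = rng.normal(size=(10, 3))
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                       # force a proper rotation
d = 2.0 * m @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t, s = absolute_orientation(m, d)
print(np.allclose(s * (m @ R.T) + t, d))     # True
```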
ACRONYM: A vision system developed by Brooks that attempted to recognize three dimensional objects from two dimensional images, using generalized cylinder primitives to represent both stored models and objects extracted from the image. [ RN:10.2]

active appearance model: A generalization of the widely used active shape model approach that includes all of the information in the image region covered by the target object, rather than just that near modeled edges. The active appearance model has a statistical model of the shape and gray-level appearance of the object of interest. This statistical model generalizes to cover most valid examples. Matching to an image involves finding model parameters that minimize the difference between the image and a synthesized model example, projected into the image. [ NA:6.5]

active blob: A region based approach to the tracking of non-rigid motion in which an active shape model is used. The model is based on an initial region that is divided using Delaunay triangulation and then each patch is tracked from frame to frame (note that the patches can deform).

active contour models: A technique used in model based vision where object boundaries are detected using a deformable curve representation such as a snake. The term active refers to the ability of the snake to deform its shape to better match the image data. See also active shape model. [ SQ:8.5]

active contour tracking: A technique used in model based vision where object boundaries are tracked in a video sequence using active contour models.

active illumination: A system of lighting where intensity, orientation, or pattern may be continuously controlled and altered. This kind of system may be used to generate structured light. [ CS:1.2]

active learning: Learning about the environment through interaction (e.g., looking at an object from a new viewpoint). [ WP:Active learning]

active net: An active shape model that parameterizes a triangulated mesh.

active sensing: 1) A sensing activity carried out in an active or purposive way, for instance where a camera is moved in space to acquire multiple or optimal views of an object. (See also active vision, purposive vision, sensor planning.) 2) A sensing activity implying the projection of a pattern of energy, for instance a laser line, onto the scene. See also laser stripe triangulation, structured light triangulation. [ FP:21.1]

active shape model: Statistical models of the shapes of objects that can deform to fit a new example of the object. The shapes are constrained by a statistical shape model so that they may vary only in ways seen in a training set. The models are usually formed by using principal component analysis to identify the dominant modes of shape variation in observed examples of the shape. Model shapes are formed by linear combinations of the dominant modes. [ WP:Active shape model]
active stereo: An alternative approach to traditional binocular stereo. One of the cameras is replaced with a structured light projector, which projects light onto the object of interest. If the camera calibration is known, the triangulation for computing the 3D coordinates of object points simply involves finding the intersection of a ray and known structures in the light field. [ CS:1.2]

active surface: 1) A surface determined using a range sensor; 2) an active shape model that deforms to fit a surface. [ WP:Active surface]

active triangulation: Determination of surface depth by triangulation between a light source at a known position and a camera that observes the effects of the illuminant on the scene. Light stripe ranging is one form of active triangulation. A variant is to use a single scanning laser beam to illuminate the scene and use a stereo pair of cameras to compute depth. [ WP:3D scanner#Triangulation]

active vision: An approach to computer vision in which the camera or sensor is moved in a controlled manner, so as to simplify the nature of a problem. For example, rotating a camera with constant angular velocity while maintaining fixation at a point allows absolute calculation of scene point depth, instead of only relative depth that depends on the camera speed. (See also kinetic depth.) [ VSN:10]

active volume: The volume of interest in a machine vision application.

activity analysis: Analyzing the behavior of people or objects in a video sequence, for the purpose of identifying the immediate actions occurring or the long term sequence of actions. For example, detecting potential intruders in a restricted area. [ WP:Occupational therapy#Activity analysis]

acuity: The ability of a vision system to discriminate (or resolve) between closely arranged visual stimuli. This can be measured using a grating, i.e., a pattern of parallel black and white stripes of equal widths. Once the bars become too close, the grating becomes indistinguishable from a uniform image of the same average intensity as the bars. Under optimal lighting, the minimum spacing that a person can resolve is 0.5 min of arc. [ SEU:7.6]

adaptive: The property of an algorithm to adjust its parameters to the data at hand in order to optimize performance. Examples include adaptive contrast enhancement, adaptive filtering and adaptive smoothing. [ WP:Adaptive algorithm]

adaptive coding: A scheme for the transmission of signals over unreliable channels, for example a wireless link. Adaptive coding varies the parameters of the encoding to respond to changes in the channel, for example fading, where the signal-to-noise ratio degrades. [ WP:Adaptive coding]

adaptive contrast enhancement: An image processing operation that applies histogram equalization locally across an image. [ WP:Adaptive histogram equalization]
adaptive edge detection: Edge detection with adaptive thresholding of the gradient magnitude image. [ VSN:3.1.2]

adaptive filtering: In signal processing, any filtering process in which the parameters of the filter change over time, or where the parameters are different at different parts of the signal or image. [ WP:Adaptive filter]

adaptive histogram equalization: A localized method of improving image contrast. A histogram is constructed of the gray levels present. These gray levels are re-mapped so that the histogram is approximately flat. It can be made perfectly flat by dithering. (The original shows an image before and after adaptive histogram equalization.) [ WP:Adaptive histogram equalization]

adaptive Hough transform: A Hough transform method that iteratively increases the resolution of the parameter space quantization. It is particularly useful for dealing with high dimensional parameter spaces. Its disadvantage is that sharp peaks in the histogram can be missed. [ NA:5.6]

adaptive meshing: Methods for creating simplified meshes where elements are made smaller in regions of high detail (rapid changes in surface orientation) and larger in regions of low detail, such as planes. [ WP:Adaptive mesh refinement]

adaptive pyramid: A method of multi-scale processing where small areas of the image having some feature in common (say color) are first extracted into a graph representation. This graph is then manipulated, for example by pruning or merging, until the desired level of scale is reached.

adaptive reconstruction: Data driven methods for creating statistically significant data in areas of a 3D data cloud where data may be missing due to sampling problems.

adaptive smoothing: An iterative smoothing algorithm that avoids smoothing over edges. Given an image $I(x,y)$, one iteration of adaptive smoothing proceeds as follows:

1. Compute the gradient magnitude image $G(x,y) = |\nabla I(x,y)|$.

2. Make a weights image $W(x,y) = e^{-G(x,y)}$.

3. Smooth the image:
$$S(x,y) = \frac{\sum_{i=-1}^{1} \sum_{j=-1}^{1} A_{xyij}}{\sum_{i=-1}^{1} \sum_{j=-1}^{1} B_{xyij}}$$
where $A_{xyij} = I(x+i, y+j)\,W(x+i, y+j)$ and $B_{xyij} = W(x+i, y+j)$. [ WP:Additive smoothing]

adaptive thresholding: An improved image thresholding technique where the threshold value is varied at each pixel. A common technique is to use the average intensity in a neighbourhood to set the threshold. (The original figure shows an image I, its smoothed version S, and the result of thresholding with I > S - 6.) [ ERD:4.4]
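A minimal numpy sketch of the adaptive smoothing iteration defined above, taking the weights as W = exp(-G), with the neighbourhood-mean flavor of adaptive thresholding at the end (all names ours; edge handling is a simple replicating pad):

```python
import numpy as np

def adaptive_smooth(image, n_iter=5):
    """Iterative edge-preserving smoothing: weights W = exp(-G) shrink
    where the gradient magnitude G is large, so edges survive."""
    img = np.asarray(image, dtype=float)
    for _ in range(n_iter):
        gy, gx = np.gradient(img)
        w = np.exp(-np.hypot(gx, gy))        # W(x,y) = e^{-G(x,y)}
        ip = np.pad(img, 1, mode="edge")
        wp = np.pad(w, 1, mode="edge")
        num = np.zeros_like(img)
        den = np.zeros_like(img)
        h, wd = img.shape
        for i in (-1, 0, 1):                 # 3x3 weighted average
            for j in (-1, 0, 1):
                num += ip[1 + i:1 + i + h, 1 + j:1 + j + wd] * \
                       wp[1 + i:1 + i + h, 1 + j:1 + j + wd]
                den += wp[1 + i:1 + i + h, 1 + j:1 + j + wd]
        img = num / den
    return img

def adaptive_threshold(image, offset=6):
    """Threshold each pixel against its locally smoothed value."""
    return np.asarray(image, dtype=float) > adaptive_smooth(image) - offset
```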
adaptive triangulation: See adaptive meshing.

adaptive visual servoing: See visual servoing. [ WP:Visual Servoing]

additive color: The way in which multiple wavelengths of light can be combined to allow other colors to be perceived (e.g., if equal amounts of green and red light are shone on a sheet of white paper, the paper will appear to be illuminated with a yellow light source). Contrast this with subtractive color. (The original figure shows overlapping green and red lights producing yellow.) [ LG:3.7]

additive noise: Generally image-independent noise that is added to an image by some external process. The recorded image $I$ at pixel $(i,j)$ is then the sum of the true signal $S$ and the noise $N$:

$$I_{i,j} = S_{i,j} + N_{i,j}$$

The noise added at each pixel $(i,j)$ could be different. [ SEU:3.2]

adjacent: Commonly meaning next to each other, whether in a physical sense of being connected pixels in an image, image regions sharing some common boundary, nodes in a graph connected by an arc, or components in a geometric model sharing some common bounding component, etc. Formally defining adjacent can be somewhat heuristic because you may need a way to specify closeness (e.g., on a quantized grid of pixels) or consider how much shared boundary is required before two structures are adjacent. [ RN:2.1.1]

adjacency: See adjacent. [ RN:2.1.1]

adjacency graph: A graph that shows the adjacency between structures, such as segmented image regions. The nodes of the graph are the structures and an arc implies adjacency of the two structures connected by the arc. (The original figure shows a segmented image on the left and the associated adjacency graph on the right.)

affine: A term first used by Euler. Affine geometry is the study of properties of geometric objects that remain invariant under affine transformations (mappings). These include: parallelness, cross ratio, adjacency. [ WP:Affine geometry]

affine arc length: For a parametric equation of a curve $\vec{f}(u) = (x(u), y(u))$, arc length is not preserved under an affine transformation. The affine length

$$\tau(u) = \int_0^u (\dot{x}\ddot{y} - \ddot{x}\dot{y})^{\frac{1}{3}}\, du$$

is invariant under affine transformations. [ SQ:8.4]
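Looking back at the adjacency graph entry, here is a sketch of building the graph from a labeled segmentation (numpy assumed; names ours):

```python
import numpy as np

def adjacency_graph(labels):
    """Arcs between regions of a labeled image: two labels are adjacent
    when 4-connected pixels carry different values. Returns a set of
    (a, b) pairs with a < b."""
    labels = np.asarray(labels)
    arcs = set()
    for u, v in ((labels[:-1, :], labels[1:, :]),    # vertical neighbors
                 (labels[:, :-1], labels[:, 1:])):   # horizontal neighbors
        diff = u != v
        for a, b in zip(u[diff].tolist(), v[diff].tolist()):
            arcs.add((min(a, b), max(a, b)))
    return arcs

segmentation = np.array([[1, 1, 2],
                         [1, 3, 2],
                         [3, 3, 2]])
print(sorted(adjacency_graph(segmentation)))  # [(1, 2), (1, 3), (2, 3)]
```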

affine camera: A special case of the projective camera that is obtained by constraining the $3 \times 4$ camera parameter matrix $T$ such that $T_{3,1} = T_{3,2} = T_{3,3} = 0$, reducing the camera parameter vector from 11 degrees of freedom to 8. [ FP:2.3.1]

affine curvature: A measure of curvature based on the affine arc length $\tau$. For a parametric equation of a curve $\vec{f}(u) = (x(u), y(u))$, its affine curvature $\mu$ is

$$\mu(\tau) = x''(\tau)\,y'''(\tau) - x'''(\tau)\,y''(\tau)$$

[ WP:Affine curvature]

affine flow: A method of finding the movement of a surface patch by estimating the affine transformation parameters required to transform the patch from its position in one view to another.

affine fundamental matrix: The fundamental matrix that is obtained from a pair of cameras under affine viewing conditions. It is a $3 \times 3$ matrix whose upper left $2 \times 2$ submatrix is all zero. [ HZ:13.2.1]

affine invariant: An object or shape property that is not changed (i.e., is invariant) by the application of an affine transformation. See also invariant. [ FP:18.4.1]

affine length: See affine arc length. [ WP:Affine curvature]

affine moment: Four shape measures derived from second and third order moments that remain invariant under affine transformations. They are given by

$$I_1 = \frac{\mu_{20}\mu_{02} - \mu_{11}^2}{\mu_{00}^4}$$

$$I_2 = \frac{\mu_{30}^2\mu_{03}^2 - 6\mu_{30}\mu_{21}\mu_{12}\mu_{03} + 4\mu_{30}\mu_{12}^3 + 4\mu_{21}^3\mu_{03} - 3\mu_{21}^2\mu_{12}^2}{\mu_{00}^{10}}$$

$$I_3 = \frac{\mu_{20}(\mu_{21}\mu_{03} - \mu_{12}^2) - \mu_{11}(\mu_{30}\mu_{03} - \mu_{21}\mu_{12}) + \mu_{02}(\mu_{30}\mu_{12} - \mu_{21}^2)}{\mu_{00}^7}$$

$$I_4 = \big(\mu_{20}^3\mu_{03}^2 - 6\mu_{20}^2\mu_{11}\mu_{12}\mu_{03} - 6\mu_{20}^2\mu_{02}\mu_{21}\mu_{03} + 9\mu_{20}^2\mu_{02}\mu_{12}^2 + 12\mu_{20}\mu_{11}^2\mu_{21}\mu_{03} + 6\mu_{20}\mu_{11}\mu_{02}\mu_{30}\mu_{03} - 18\mu_{20}\mu_{11}\mu_{02}\mu_{21}\mu_{12} - 8\mu_{11}^3\mu_{30}\mu_{03} - 6\mu_{20}\mu_{02}^2\mu_{30}\mu_{12} + 9\mu_{20}\mu_{02}^2\mu_{21}^2 + 12\mu_{11}^2\mu_{02}\mu_{30}\mu_{12} - 6\mu_{11}\mu_{02}^2\mu_{30}\mu_{21} + \mu_{02}^3\mu_{30}^2\big)/\mu_{00}^{11}$$

where each $\mu_{pq}$ is the associated central moment. [ NA:7.3]

affine quadrifocal tensor: The form taken by the quadrifocal tensor when specialized to the viewing conditions modeled by the affine camera.

affine reconstruction: A three dimensional reconstruction where the ambiguity in the choice of basis is affine only. Planes that are parallel in the Euclidean basis are parallel in the affine reconstruction. A projective reconstruction can be upgraded to affine by identification of the plane at infinity, often by locating the absolute conic in the reconstruction. [ HZ:9.4.1]

affine stereo: A method of scene reconstruction using two calibrated views of a scene from known viewpoints. It is a simple but very robust approximation to the geometry of stereo vision, used to estimate positions, shapes and surface orientations. It can be calibrated very easily by observing just four reference points. Any two views of the same planar surface will be related by an affine transformation that maps one image to the other. This consists of a translation and a tensor, known as the disparity gradient tensor, representing the distortion in image shape. If the standard unit vectors X and Y in one image are the projections of some vectors on the object surface and the linear mapping between images is represented by a $2 \times 3$ matrix A, then the first two columns of A will be the corresponding vectors in the other image. Since the centroid of the plane will map to both image centroids, it can be used to find the surface orientation.
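A sketch computing the first of the affine moment invariants above from image central moments (numpy assumed; names ours):

```python
import numpy as np

def central_moment(img, p, q):
    """mu_pq of a gray-level image treated as a 2D mass distribution."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def affine_invariant_I1(img):
    """I1 = (mu20 * mu02 - mu11^2) / mu00^4, unchanged by affine maps."""
    mu = lambda p, q: central_moment(img, p, q)
    return (mu(2, 0) * mu(0, 2) - mu(1, 1) ** 2) / mu(0, 0) ** 4
```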

affine transformation: A special set of transformations in Euclidean geometry that preserve some properties of the construct being transformed. Affine transformations preserve:

- Collinearity of points: if three points belong to the same straight line, their images under affine transformations also belong to the same line and the middle point remains between the other two points.
- Parallel lines remain parallel; concurrent lines remain concurrent (images of intersecting lines intersect).
- The ratio of lengths of line segments of a given line remains constant.
- The ratio of areas of two triangles remains constant.
- Ellipses remain ellipses, and the same is true for parabolas and hyperbolas.
- Barycenters of triangles (and other shapes) map into the corresponding barycenters.

Analytically, affine transformations are represented in the matrix form

$$f(\vec{x}) = A\vec{x} + \vec{b}$$

where the determinant $\det(A)$ of the square matrix $A$ is not 0. In 2D the matrix is $2 \times 2$; in 3D it is $3 \times 3$. [ FP:2.2]

affine trifocal tensor: The form taken by the trifocal tensor when specialized to the viewing conditions modeled by the affine camera.

affinely invariant region: Image patches that automatically deform with changing viewpoint in such a way that they cover identical physical parts of a scene. Since such regions are describable by a set of invariant features, they are relatively easy to match between views under changing illumination.

agglomerative clustering: A class of iterative clustering algorithms that begin with a large number of clusters and at each iteration merge pairs (or tuples) of clusters. Stopping the process at a certain number of iterations gives the final set of clusters, or the process can be run until only one cluster remains, with the progress of the algorithm represented as a dendrogram. [ WP:Cluster analysis#Agglomerative hierarchical clustering]
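A naive sketch of the merging loop just described (single-linkage flavor, numpy assumed; quadratic, so only for small point sets):

```python
import numpy as np

def agglomerate(points, n_clusters=1):
    """Start with one cluster per point; repeatedly merge the two clusters
    whose closest members are nearest, until n_clusters remain."""
    pts = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(pts[i] - pts[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)       # merge the closest pair
    return clusters

print(agglomerate([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], n_clusters=2))
```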
albedo: Whiteness. Originally a term used in astronomy to describe reflecting power. If a body reflects 50% of the light falling on it, it is said to have albedo 0.5. (The original figure shows patches with albedo values 1.0, 0.75, 0.5, 0.25 and 0.0.) [ FP:4.3.3]

algebraic distance: A linear distance metric commonly used in computer vision applications because of its simple form and standard matrix based least mean square estimation operations. If a curve or surface is defined implicitly by $f(\vec{x}, \vec{a}) = 0$ (e.g., $\vec{x} \cdot \vec{a} = 0$ for a hyperplane), the algebraic distance of a point $\vec{x}_i$ to the surface is simply $f(\vec{x}_i, \vec{a})$. [ FP:10.1.5]

aliasing: The erroneous replacement of high spatial frequency (HF) components by low-frequency ones when a signal is sampled. The affected HF components are those that are higher than the Nyquist frequency, or half the sampling frequency. Examples include the slowing of periodic signals by strobe lighting, and corruption of areas of detail in image resizing. If the source signal has no HF components, the effects of aliasing are avoided, so the low pass filtering of a signal to remove HF components prior to sampling is one form of anti-aliasing. (The original shows two perspective projections of a checkerboard, obtained by sampling the scene at a set of integer locations. In the first, the spatial frequency increases as the plane recedes, producing aliasing artifacts: jagged lines in the foreground, moire patterns in the background. In the second, removing high-frequency components, i.e., smoothing, before downsampling mitigates the effect.) [ FP:7.4]

alignment: An approach to geometric model matching by registering a geometric model to the image data. [ FP:18.2]

ALVINN: Autonomous Land Vehicle In a Neural Network. An early attempt, at Carnegie-Mellon University, to learn a complex behaviour (maneuvering a vehicle) by observing humans.

ambient light: Illumination by diffuse reflections from all surfaces within a scene (including the sky, which acts as an external distant surface). In other words, light that comes from all directions, such as the sky on a cloudy day. Ambient light ensures that all surfaces are illuminated, including those not directly facing light sources. [ FP:5.3.3]
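A sketch of the algebraic distance of 2D points to an implicit conic $f(\vec{x}, \vec{a}) = 0$ (numpy assumed; names ours):

```python
import numpy as np

def algebraic_distance(points, a):
    """f(x, y; a) = a0 x^2 + a1 x y + a2 y^2 + a3 x + a4 y + a5 evaluated
    at each point; zero on the conic, nonzero off it."""
    x, y = np.asarray(points, dtype=float).T
    design = np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)
    return design @ np.asarray(a, dtype=float)

# Unit circle: x^2 + y^2 - 1 = 0. On-curve point gives 0, others do not.
a = [1, 0, 1, 0, 0, -1]
print(algebraic_distance([[1, 0], [2, 0]], a))   # [0. 3.]
```

Minimizing the sum of squared algebraic distances reduces curve fitting to linear least squares, which is the appeal mentioned in the entry.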

AMBLER: An autonomous active vision system using both structured light and sonar, developed by NASA and Carnegie-Mellon University. It is supported by a 12-legged robot and is intended for planetary exploration.

amplifier noise: Spurious additive noise signal generated by the electronics in a sampling device. The standard model for this type of noise is Gaussian. It is independent of the signal. In color cameras, where more amplification is used in the blue color channel than in the green or red channel, there tends to be more noise in the blue channel. In well-designed electronics, amplifier noise is generally negligible.

analytic curve finding: A method of detecting parametric curves by first transforming data into a feature space that is then searched for the hypothesized curve parameters. An example is line finding using the Hough transform.

anamorphic lens: A lens having one or more cylindrical surfaces. Anamorphic lenses are used in photography to produce images that are compressed in one dimension. Images can later be restored to true form using another, reversing anamorphic lens set. This form of lens is used in wide-screen movie photography.

anatomical map: A biological model usable for alignment with or region labeling of a corresponding image dataset. For example, one could use a model of the brain's functional regions to assist in the identification of brain structures in an NMR dataset.

AND operator: A boolean logic operator that combines two input binary images, applying the AND logic

p | q | p AND q
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

at each pair of corresponding pixels. This approach is used to select image regions. (In the original, the rightmost of three images is the result of ANDing the two leftmost images.) [ SB:3.2.2]

angiography: A method for imaging blood vessels by introducing a dye that is opaque when photographed by X-ray. Also the study of images obtained in this way. [ WP:Angiography]

angularity ratio: Given two figures X and Y, if $\alpha_i(X)$ and $\alpha_j(Y)$ are angles subtending convex parts of the contour of the figure X, and $\beta_k(X)$ are angles subtending plane parts of the contour of figure X, then the angularity ratios are $\sum_i \frac{\alpha_i(X)}{360^\circ}$ and $\frac{\sum_j \alpha_j(X)}{\sum_k \beta_k(X)}$.

anisotropic filtering: Any filtering technique where the filter parameters vary over the image or signal being filtered. [ WP:Anisotropic filtering]

anomalous behavior detection: A special case of surveillance where human movement is analyzed. Used in particular to detect intruders or behavior likely to precede or indicate crime. [ WP:Anomaly detection]
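The AND operator above is one line of numpy (names ours):

```python
import numpy as np

def image_and(a, b):
    """Pixelwise AND of two binary images of equal shape."""
    return np.logical_and(a, b).astype(np.uint8)

m1 = np.array([[1, 1], [0, 1]], dtype=np.uint8)
m2 = np.array([[1, 0], [0, 1]], dtype=np.uint8)
print(image_and(m1, m2))   # [[1 0]
                           #  [0 1]]
```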
antimode: The minimum between two maxima. For example, one method of threshold selection determines the antimode in a bimodal histogram. (The original figure marks the antimode of a function f(x).)

aperture: Opening in the lens diaphragm of a camera through which light is admitted. This device is often arranged so that the amount of light can be controlled accurately. A small aperture reduces the amount of light available, but increases the depth of field. (The original figure shows nearly closed and nearly open aperture positions.) [ TV:2.2.2]

aperture control: Mechanism for varying the size of a camera's aperture. [ WP:Aperture#Aperture control]

aperture problem: If a motion sensor has a finite receptive field, it perceives the world through something resembling an aperture, making the motion of a homogeneous contour seem locally ambiguous. Within that aperture, different physical motions are therefore indistinguishable. (In the original figure, two alternative motions of a square are identical in the circled receptive fields.) [ VSN:8.1.1]

apparent contour: The apparent contour of a surface S in 3D is the set of critical values of the projection of S on a plane; in other words, the silhouette. If the surface is transparent, the apparent contour can be decomposed into a collection of closed curves with double points and cusps. The convex envelope of an apparent contour is also the boundary of its convex hull. [ VSN:4]

apparent motion: The 3D motion suggested by the image motion field, but not necessarily matching the real 3D motion. The reason for this mismatch is that motion fields may be ambiguous, that is, may be generated by different 3D motions or by light source movement. Mathematically, there may be multiple solutions to the problem of reconstructing 3D motion from the image motion field. See also visual illusion, motion estimation. [ WP:Apparent motion]

appearance: The way an object looks from a particular viewpoint under particular lighting conditions. [ FP:25.1.3]

appearance based recognition: Object recognition where the object model encodes the possible appearances of the object (as contrasted with a geometric model that encodes the shape, as used in model based recognition). In principle, it is impossible to encode all appearances when occlusions are considered; however, small numbers of appearances can often be adequate, especially if there are not many models in the model base. There are many approaches to appearance based recognition, such as using a principal component model to encode all appearances in a compressed framework, using color histograms to summarize the appearance, or using a set of local appearance descriptors such as Gabor filters extracted at interest points. A common feature of these approaches is learning the models from examples. [ TV:10.4]

appearance based tracking: Methods for object or target recognition in real time, based on image pixel values in each frame rather than derived features. Temporal filtering, such as the Kalman filter, is often used.

appearance change: Changes in an image that are not easily accounted for by motion, such as an object actually changing form.

appearance enhancement transform: Generic term for operations applied to images to change, or enhance, some aspect of them. Examples include brightness adjustment, contrast adjustment, edge sharpening, histogram equalization, saturation adjustment and magnification.

appearance flow: Robust methods for real time object recognition from a sequence of images depicting a moving object. Changes in the images are used rather than the images themselves. It is analogous to processing using optical flow.

appearance model: A representation used for interpreting images that is based on the appearance of the object. These models are usually learned by using multiple views of the objects. See also active appearance model and appearance based recognition. [ WP:Active appearance model]

appearance prediction: Part of the science of appearance engineering, where an object's texture is changed so that the viewer experience is predictable.

appearance singularity: An image position where a small change in viewer position can cause a dramatic change in the appearance of the observed scene, such as the appearance or disappearance of image features. This is contrasted with changes occurring when in a generic viewpoint. For example, when viewing the corner of a cube from a distance, a small change in viewpoint still leaves the three surfaces at the corner visible. However, when the viewpoint moves into the infinite plane containing one of the cube faces (a singularity), one or more of the planes disappears.

arc length: If f is a function such that its derivative f' is continuous on some closed interval [a, b], then the arc length of f from x = a to x = b is the integral

$$\int_a^b \sqrt{1 + [f'(x)]^2}\, dx$$

(The original figure shows the arc of f between x = a and x = b.) [ FP:19.1]
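A numeric check of the arc length integral above, using the trapezoid rule (numpy assumed; names ours):

```python
import numpy as np

def arc_length(f_prime, a, b, n=100_000):
    """Approximate the integral of sqrt(1 + f'(x)^2) on [a, b]."""
    x = np.linspace(a, b, n)
    return np.trapz(np.sqrt(1.0 + f_prime(x) ** 2), x)

# f(x) = x^2 on [0, 1]: exact value is sqrt(5)/2 + asinh(2)/4 ~ 1.4789.
print(arc_length(lambda x: 2.0 * x, 0.0, 1.0))
```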
arc of graph: Two nodes in a graph can be connected by an arc. (The original figure shows three nodes A, B and C with dashed lines as the arcs.) [ WP:Graph (mathematics)]

architectural model reconstruction: A generic term for reverse engineering buildings based on collected 3D data as well as libraries of building constraints.

area: The measure of a region's or surface's extension in some given units. The units could be image units, such as square pixels, or scene units, such as square centimeters. [ JKS:2.2.1]

area based: An image operation that is applied to a region of an image, as opposed to pixel based. [ CS:6.6]

array processor: A group of time-synchronized processing elements that perform computations on data distributed across them. Some array processors have elements that communicate only with their immediate neighbors, as in a grid topology (shown as a figure in the original). See also single instruction multiple data. [ WP:Vector processor]

arterial tree segmentation: Generic term for methods used in finding internal pipe-like structures in medical images. Example image types are NMR images, angiograms and X-rays. Example trees are bronchial systems and veins.

articulated object: An object composed of a number of (usually) rigid subparts or components connected by joints, which can be arranged in a number of different configurations. The human body is a typical example. [ BM:1.9]

articulated object model: A representation of an articulated object that includes both its separate parts and their range of movement (typically joint angles) relative to each other.

articulated object segmentation: Methods for acquiring an articulated object from 2D or 3D data.

articulated object tracking: Tracking an articulated object in an image sequence. This includes both the pose of the object and also its shape parameters, such as joint angles. [ WP:Finger tracking]
aspect graph: A graph of the set of views (aspects) of an object, where the arcs of the graph are transitions between two neighboring views (the nodes) and a change between aspects is called a visual event. See also characteristic view. (The original figure shows some of the aspects of a hippopotamus.) [ FP:20]

aspect ratio: 1) The ratio of the sides of the bounding box of an object, where the orientation of the box is chosen to maximize this ratio. Since this measure is scale invariant, it is a useful metric for object recognition. 2) In a camera, the ratio of the horizontal to vertical pixel sizes. 3) In an image, the ratio of the image width to height. For example, an image of 640 by 480 pixels has an aspect ratio of 4:3. [ AL:2.2]

aspects: See characteristic view and aspect graph. [ FP:20]

association graph: A graph used in structure matching, such as matching a geometric model to a data description. In this graph, each node corresponds to a pairing between a model and a data feature (with the implicit assumption that they are compatible). Arcs in the graph mean that the two connected nodes are pairwise compatible. Finding maximal cliques is one technique for finding good matches. (The original figure shows a set of pairings of model features A, B and C with image features a, b, c and d; the maximal clique consisting of A:a, B:b and C:c is one match hypothesis.) [ BB:11.2.1]

astigmatism: A refractive error where the light is focused within an optical system. It occurs when a lens has irregular curvature, causing light rays to focus on an area rather than at a point. It may be corrected with a toric lens, which has a greater refractive index on one axis than the others. In human eyes, astigmatism often occurs with nearsightedness and farsightedness. [ FP:1.2.3]

atlas based segmentation: A segmentation technique used in medical image processing, especially with brain images. Automatic tissue segmentation is achieved using a model of the brain structure and imagery (see atlas registration) compiled with the assistance of human experts. See also image segmentation.

atlas registration: An image registration technique used in medical image processing, especially to register brain images. An atlas is a model (perhaps statistical) of the characteristics of multiple brains, providing examples of normal and pathological structures. This makes it possible to take into account anomalies that single-image registration could not. See also medical image registration.

ATR: See automatic target recognition. [ WP:Automatic target recognition]

attention: See visual attention. [ WP:Attention]

attenuation: The reduction of a particular phenomenon, for instance, noise attenuation as the reduction of image noise. [ WP:Attenuation]

attributed graph: A graph useful for representing different properties of an image. Its nodes are attributed pairs of image segments, their color or shape for example. The relations between them, such as relative texture or brightness, are encoded as arcs. [ BM:4.5.2]

augmented reality: Primarily a projection method that adds graphics or sound, etc. as an overlay to an original image or audio. For example, a fire-fighter's helmet display could show exit routes registered to his/her view of the building. [ WP:Augmented reality]

autocalibration: The recovery of a camera's calibration using only point (or other feature) correspondences from multiple uncalibrated images and geometric consistency constraints (e.g., that the camera settings are the same for all images in a sequence). [ AL:13.7]

autocorrelation: The extent to which a signal is similar to shifted copies of itself. For an infinitely long 1D signal $f(t): \mathbb{R} \mapsto \mathbb{R}$, the autocorrelation at a shift $\tau$ is
$$R_f(\tau) = \int f(t) f(t + \tau)\, dt$$
The autocorrelation function $R_f$ always has a maximum at 0. A peaked autocorrelation function decays quickly away from $\tau = 0$. The sample autocorrelation function of a finite set of values $f_{1..n}$ is $\{r_f(d) \mid d = 1, \ldots, n-1\}$ where
$$r_f(d) = \frac{\sum_{i=1}^{n-d} (f_i - \bar{f})(f_{i+d} - \bar{f})}{\sum_{i=1}^{n} (f_i - \bar{f})^2}$$
and $\bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i$ is the sample mean. [ WP:Autocorrelation]
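The sample autocorrelation translates directly into code. A minimal numpy sketch (the function name and layout are ours, not from any particular library):

import numpy as np

def sample_autocorrelation(f):
    # r_f(d) for d = 1..n-1, normalized by the total squared deviation
    f = np.asarray(f, dtype=float)
    n = len(f)
    g = f - f.mean()
    denom = np.sum(g * g)
    return np.array([np.sum(g[:n - d] * g[d:]) / denom for d in range(1, n)])

# A periodic signal produces peaks at shifts equal to multiples of its period.
t = np.arange(100)
r = sample_autocorrelation(np.sin(2 * np.pi * t / 10))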
autofocus: Automatic determination and control of image sharpness in an optical or vision system. There are two major variations in this control system: active focusing and passive focusing. Active autofocus is performed using a sonar or infrared signal to determine the object distance. Passive autofocus is performed by analyzing the image itself to optimize differences between adjacent pixels in the CCD array. [ WP:Autofocus]

automatic: Performed by a machine without human intervention. The opposite of manual. [ WP:Automation]

automatic target recognition (ATR): Sensors and algorithms used for detecting hostile objects in a scene. Sensors are of many different types, sampling in infrared, visible light and using sonar and radar. [ WP:Automatic target recognition]

autonomous vehicle: A mobile robot controlled by computer, with human input operating only at a very high level, stating the ultimate destination or task for example. Autonomous navigation requires the visual tasks of route detection, self-localization, landmark location and obstacle detection, as well as robotics tasks such as route planning and motor control. [ WP:Driverless car]

autoregressive model: A model that uses statistical properties of past behavior of some variable to predict future behavior of that variable. A signal $x_t$ at time $t$ satisfies an autoregressive model if $x_t = \sum_{n=1}^{p} \alpha_n x_{t-n} + \epsilon_t$, where $\epsilon_t$ is noise. [ WP:Autoregressive model]
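As an illustrative sketch, a predictor for an AR(p) model with assumed known coefficients, following the formula above (coefficients and data here are invented):

import numpy as np

def ar_predict(history, coeffs):
    # x_t = sum_{n=1..p} alpha_n * x_{t-n}; history[-1] is x_{t-1},
    # coeffs[0] is alpha_1
    p = len(coeffs)
    past = np.asarray(history, dtype=float)[-1:-p - 1:-1]
    return float(np.dot(coeffs, past))

# AR(2) example with hypothetical coefficients alpha = (0.9, -0.2):
print(ar_predict([1.0, 1.1, 0.95, 0.9], [0.9, -0.2]))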
autostereogram: An image similar to a random dot stereogram in which the corresponding features are combined into a single image. Stereo fusion allows the perception of a 3D shape in the 2D image. [ WP:Autostereogram]

average smoothing: See mean smoothing. [ VSN:3.1]

AVI: Microsoft format for audio and video files (audio video interleaved). Unlike MPEG, it is not a standard, so compatibility of AVI video files and AVI players is not always guaranteed. [ WP:Audio Video Interleave]

axial representation: A region representation that uses a curve to describe the image region. The axis may be a skeleton derived from the region by a thinning process.

axis of elongation: 1) The line that minimizes the second moment of the data points. If $\{\vec{x}_i\}$ are the data points and $d(\vec{x}, L)$ is the distance from point $\vec{x}$ to line $L$, then the axis of elongation $A$ minimizes $\sum_i d(\vec{x}_i, A)^2$. Let $\vec{\mu}$ be the mean of $\{\vec{x}_i\}$. Define the scatter matrix $S = \sum_i (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T$. Then the axis of elongation is the eigenvector of $S$ with the largest eigenvalue. See also principal component analysis. 2) The longer midline of the bounding box with largest length-to-width ratio. A possible axis of elongation is the line in this figure: [figure: a point cloud with a line through its long axis] [ JKS:2.2.3]
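A short numpy sketch of definition 1): form the scatter matrix and take its dominant eigenvector (the function and variable names are ours):

import numpy as np

def axis_of_elongation(points):
    # points: (N, d) array; returns (mean, unit axis direction)
    X = np.asarray(points, dtype=float)
    mu = X.mean(axis=0)
    D = X - mu
    S = D.T @ D                      # scatter matrix: sum (x-mu)(x-mu)^T
    w, V = np.linalg.eigh(S)         # symmetric eigendecomposition
    return mu, V[:, np.argmax(w)]    # eigenvector of the largest eigenvalue

pts = np.random.randn(200, 2) * [5.0, 1.0]    # points elongated along x
mu, axis = axis_of_elongation(pts)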
axis of rotation: A line about which a rotation is performed. Equivalently, the line whose points are fixed under the action of a rotation. Given a 3D rotation matrix $R$, the axis is the eigenvector of $R$ corresponding to the eigenvalue 1. [ JKS:12.2.2]

axis-angle curve representation: A rotation representation based on the amount of twist about the axis of rotation, here a unit vector $\vec{a}$. The quaternion rotation representation is similar.

B

B-rep: See surface boundary representation. [ BT:8]

b-spline: A curve approximation spline represented as a combination of basis functions:
$$\vec{c}(t) = \sum_{i=0}^{m} \vec{a}_i B_i(t)$$
where the $B_i$ are the basis functions and the $\vec{a}_i$ are the control points. B-splines do not necessarily pass through any of the control points; however, if b-splines are calculated for adjacent sets of control points the curve segments will join up and produce a continuous curve. [ JKS:13.7.1]

b-spline fitting: Fitting a b-spline to a set of data points. This is useful for noise reduction or for producing a more compact model of the observed curve. [ JKS:13.7.1]

b-spline snake: A snake made from b-splines.
back projection: 1) A form of display where a translucent screen is illuminated from the side not facing the viewer. 2) The computation of a 3D quantity from its 2D projection. For example, a 2D homogeneous point $\vec{x}$ is the projection of a 3D point $\vec{X}$ by a perspective projection matrix $P$, so $\vec{x} = P\vec{X}$. The backprojection of $\vec{x}$ is the 3D line $\{\mathrm{null}(P) + \lambda P^{+}\vec{x}\}$ where $P^{+}$ is the pseudoinverse of $P$. 3) Sometimes used interchangeably with triangulation. 4) Technique to compute the attenuation coefficients from intensity profiles covering a total cross section under various angles. It is used in CT and MRI to recover 3D from essentially 2D images. 5) Projection of the estimated 3D position of a shape back into the 2D image from which the shape's pose was estimated. [ AJ:10.3]
background: In computer vision, generally used in the context of object recognition. The background is either (1) the area of the scene behind an object or objects of interest or (2) the part of the image whose pixels sample from the background in the scene. As opposed to foreground. See also figure/ground separation. [ JKS:2.5]

background labeling: Methods for differentiating objects in the foreground of images, or those of interest, from those in the background. [ AL:10.4]

background modeling: Segmentation or change detection method where the scene behind the objects of interest is modeled as a fixed or slowly changing background, with possible foreground occlusions. Each pixel is modeled as a distribution, which is then used to decide if a given observation belongs to the background or an occluding object. [ NA:3.5.2]

background normalization: Removal of the background by some image processing technique to estimate the background image and then dividing or subtracting the background from an original image. The technique is useful when the background is non-uniform. The images below illustrate this: the first shows the input image, the second is the background estimate obtained by dilation with a ball(9, 9) structuring element and the third is the (normalized) division of the input image by the background image. [figure: input image, background estimate, normalized result] [ JKS:3.2.1]
backlighting: A method of illuminating a scene where the background receives more illumination than the foreground. Commonly this is used to produce silhouettes of opaque objects against a lit background, for easier object detection. [ LG:2.1.1]

bandpass filter: A signal processing filtering technique that allows signals between two specified frequencies to pass but cuts out signals at all other frequencies. [ FP:9.2.2]
back-propagation: One of the best-studied neural network training algorithms for supervised learning. The name arises from using the propagation of the discrepancies between the computed and desired responses at the network output back to the network inputs. The discrepancies are one of the inputs into the network weight recomputation process. [ WP:Backpropagation]

back-tracking: A basic technique for graph searching: if a terminal but non-solution node is reached, search does not terminate with failure, but continues with still unexplored children of a previously visited non-terminal node. Classic back-tracking algorithms are breadth-first, depth-first, and A*. See also graph, graph searching, search tree. [ BB:11.3.2]

bar: A raw primal sketch primitive that represents a dark line segment against a lighter background (or its inverse). Bars are also one of the primitives in Marr's theory of vision. The following is a small dark bar observed inside a receptive field: [figure: a receptive field containing a dark bar]

bar detector: 1) Method or algorithm that produces maximum excitation when a bar is in its receptive field. 2) Device used by thirsty undergraduates. [ WP:Feature detection (nervous system)#History]

bar-code reading: Methods and algorithms used for the detection, imaging and interpretation of black parallel lines of different widths arranged to give details on products or other objects. Bar codes themselves have many different coding standards and arrangements. An example bar code is: [figure: bar code] [ LG:7]

barrel distortion: Geometric lens distortion in an optical system that causes the outlines of an object to curve outward, forming a barrel shape. See also pincushion distortion. [ EH:6.3.1]

barycentrum: See center of mass. [ JKS:2.2.2]

bas-relief ambiguity: The ambiguity in reconstructing a 3D object with Lambertian reflectance using shading from an image under orthographic projection. If the true surface is $z(x, y)$, then the family of surfaces $a z(x, y) + b x + c y$ generates identical images under these viewing conditions, so any reconstruction, for any values of $a, b, c$, is equally valid. The ambiguity is thus up to a three-parameter family.
baseline: Distance between two cameras used in a binocular stereo system. [figure: left and right cameras with their image planes, an object point, the epipolar plane and corresponding epipolar lines; the stereo baseline joins the camera centers] [ DH:10.6]
basis function representation: A method of representing a function as a sum of simple (usually orthonormal) ones. For example the Fourier transform represents functions as a weighted sum of sines and cosines. [ AJ:1.2]

Bayes rule: The relationship between the conditional probability of event A given B and the conditional probability of event B given event A. This is expressed as
$$P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}$$
providing that $P(B) \neq 0$. [ SQ:14.2.1]
Bayesian classifier: A mathematical approach to classifying a set of data, by selecting the class most likely to have generated that data. If $\vec{x}$ is the data and $c$ is a class, then the probability of that class is $p(c|\vec{x})$. This probability can be hard to compute, so Bayes rule can be used here, which says that $p(c|\vec{x}) = \frac{p(\vec{x}|c)\, p(c)}{p(\vec{x})}$. Then we can compute the probability of the class $p(c|\vec{x})$ in terms of the probability of having observed the given data $\vec{x}$ with, $p(\vec{x}|c)$, and without, $p(\vec{x})$, assuming the class $c$, plus the a priori likelihood, $p(c)$, of observing the class. The Bayesian classifier is the most common statistical classifier currently used in computer vision processes. [ DH:3.3.1]
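An illustrative two-class sketch of this rule, assuming Gaussian class-conditional densities $p(x|c)$; the class names, priors and parameters here are invented, and scipy is assumed to be available:

from scipy.stats import norm

# Hypothetical 1D models: (p(x|c), p(c)) for each class c.
classes = {
    "road": (norm(loc=80, scale=10), 0.7),
    "sky": (norm(loc=200, scale=15), 0.3),
}

def classify(x):
    # argmax_c p(x|c) p(c); the common factor p(x) can be dropped
    return max(classes, key=lambda c: classes[c][0].pdf(x) * classes[c][1])

print(classify(95.0))    # -> 'road'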
Bayesian filtering: A probabilistic data fusion technique. It uses a formulation of probabilities to represent the system state and likelihood functions to represent their relationships. In this form, Bayes rule can be applied and further related probabilities deduced. [ WP:Bayesian filtering]

Bayesian model: A statistical modeling technique based on two input models:

1. a likelihood model $p(y|x, h)$, describing the density of observing $y$ given $x$ and $h$. Regarded as a function of $h$, for a fixed $y$ and $x$, the density is also known as the likelihood of $h$.

2. a prior model, $p(h|D_0)$, which specifies the a priori density of $h$ given some known information denoted by $D_0$ before any new data are taken into account.

The aim of the Bayesian model is to predict the density for outcomes $y$ in test situations $x$ given data $D = D_T \cup D_0$ with both pre-known and training data.

Bayesian model learning: See probabilistic model learning. [ DH:3.1]

Bayesian network: A belief modeling approach using a graph structure. Nodes are variables and arcs are implied causal dependencies and are given probabilities. These networks are useful for fusing multiple data (possibly of different types) in a uniform and rigorous manner. [ WP:Bayesian network]

BDRF/BRDF: See bidirectional reflectance distribution function. [ FP:4.2.2]

beam splitter: An optical system that divides unpolarized light into two orthogonally polarized beams, each at 90° to the other, as in this example: [figure: incoming beam split into two orthogonal polarized beams] [ EH:4.3.4]
behavior analysis: Model based vision techniques for identifying and tracking behavior in humans. Often used for threat analysis. [ WP:Applied behavior analysis]

behavior learning: Generation of goal-driven behavior models by some learning algorithm, for example reinforcement learning.

Beltrami flow: A noise suppression technique where images are treated as surfaces and the surface area is minimized in such a way as to preserve edges. See also diffusion smoothing.

bending energy: 1) A metaphor borrowed from the mechanics of thin metal plates. If a set of landmarks is distributed on two infinite flat metal plates and the differences in the coordinates between the two sets are vertical displacements of the plate, one Cartesian coordinate at a time, then the bending energy is the energy required to bend the metal plate so that the landmarks are coincident. When applied to images, the sets of landmarks may be sets of features. 2) Denotes the amount of energy that is stored due to an object's shape.

best next view: See next view planning.
Bhattacharyya distance: A measure of the (dis)similarity of two probability distributions. Given two arbitrary distributions $\{p_i(x)\}_{i=1,2}$ the Bhattacharyya distance between them is
$$d = -\log \int \sqrt{p_1(x)\, p_2(x)}\, dx$$
[ PGS:4.5]
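For discrete distributions (e.g., normalized histograms) the integral becomes a sum. A minimal numpy sketch:

import numpy as np

def bhattacharyya_distance(p1, p2):
    # d = -log sum_i sqrt(p1_i * p2_i), after normalizing the inputs
    p1 = np.asarray(p1, dtype=float) / np.sum(p1)
    p2 = np.asarray(p2, dtype=float) / np.sum(p2)
    return -np.log(np.sum(np.sqrt(p1 * p2)))

print(bhattacharyya_distance([1, 2, 4, 1], [1, 2, 3, 2]))  # near 0: similar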
bicubic spline interpolation: A special case of surface interpolation that uses cubic spline functions in two dimensions. This is like bilinear surface interpolation except that the interpolating surface is curved, instead of flat. [ WP:Bicubic interpolation#Bicubic convolution algorithm]

bidirectional reflectance distribution function (BRDF/BDRF): If the energy arriving at a surface patch is denoted $E(\theta_i, \phi_i)$ and the energy radiated in a particular direction is denoted $L(\theta_e, \phi_e)$ in polar coordinates, then the BRDF is defined as the ratio of the energy radiated from a patch of a surface in some direction to the amount of energy arriving there. The radiance is determined from the irradiance by
$$L(\theta_e, \phi_e) = f(\theta_i, \phi_i, \theta_e, \phi_e)\, E(\theta_i, \phi_i)$$
where the function $f$ is the bidirectional reflectance distribution function. This function often only depends on the difference between the incident angle $\phi_i$ of the ray falling on the surface and the angle $\phi_e$ of the reflected ray. The geometry is illustrated by: [figure: surface normal n, incident ray E at angles (theta_i, phi_i), emitted ray L at angles (theta_e, phi_e)] [ FP:4.2.2]

bilateral filtering: A non-iterative alternative to anisotropic filtering where images can be smoothed but edges present in them are preserved. [ WP:Bilateral filter]

bilateral smoothing: See bilateral filtering. [ WP:Bilateral filter]
bilinear surface interpolation: To determine the value of a function $f(x, y)$ at an arbitrary location $(x, y)$, of which only discrete samples $f_{ij} = \{f(x_i, y_j)\}_{i=1..n,\, j=1..m}$ are available. The samples are arranged on a 2D grid, so the value at point $(x, y)$ is interpolated from the values at the four surrounding points. In the diagram below,
$$f_{\text{bilinear}}(x, y) = \frac{A + B}{(d_1 + d_1')(d_2 + d_2')}$$
where
$$A = d_1' d_2' f_{11} + d_1 d_2' f_{21}, \quad B = d_1' d_2 f_{12} + d_1 d_2 f_{22}$$
so each function value $f_{ij}$ is weighted by the two $d$ values on the far side of $(x, y)$. [figure: grid cell with corner samples f11, f21, f12, f22, the interior point (x, y), and the distances d1, d1', d2, d2' from the point to the cell edges] [ TV:8.4.2]
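A sketch of the formula for a unit grid cell, where $d_1' = 1 - d_1$ and $d_2' = 1 - d_2$, so the denominator is 1:

def bilinear(f11, f21, f12, f22, d1, d2):
    # f11/f21 are the corners on one side of the cell, f12/f22 the other;
    # d1, d2 in [0, 1] are the offsets of (x, y) inside the cell
    d1p, d2p = 1.0 - d1, 1.0 - d2
    a = d1p * d2p * f11 + d1 * d2p * f21
    b = d1p * d2 * f12 + d1 * d2 * f22
    return a + b

print(bilinear(0.0, 1.0, 0.0, 1.0, d1=0.25, d2=0.5))   # 0.25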
bilinearity: A function of two variables $x$ and $y$ is bilinear in $x$ and $y$ if it is linear in $y$ for fixed $x$ and linear in $x$ for fixed $y$. For example, if $\vec{x}$ and $\vec{y}$ are vectors and $A$ is a matrix such that $\vec{x}^{\top} A \vec{y}$ is defined, then the function $f(\vec{x}, \vec{y}) = \vec{x}^{\top} A \vec{y}$ is bilinear in $\vec{x}$ and $\vec{y}$. [ WP:Bilinear form]
bimodal histogram: A histogram with two pronounced peaks, or modes. This is a convenient intensity histogram for determining a binarizing threshold. An example is: [figure: intensity histogram with two peaks]

bin-picking: The problem of getting a robot manipulator equipped with vision sensors to pick parts, for instance screws, bolts, or components of a given assembly, from a random pile. A classic challenge for hand-eye robotic systems, involving at least segmentation, object recognition in clutter and pose estimation.

binarization: See thresholding. [ ERD:2.2.1]

binary image: An image whose pixels can either be in an on or off state, represented by the integers 1 and 0 respectively. An example is: [figure: binary image] [ DH:7.4]

binary mathematical morphology: A group of shape-based operations that can be applied to binary images, based around a few simple mathematical concepts from set theory. Common usages include noise reduction, image enhancement and image segmentation. The two most basic operations are dilation and erosion. These operators take two pieces of data as input: the input binary image and a structuring element (also known as a kernel). Virtually all other mathematical morphology operators can be defined in terms of combinations of erosion and dilation along with set operators such as intersection and union. Some of the more important are opening, closing and skeletonization. Binary morphology is a special case of gray scale mathematical morphology. See also mathematical morphology. [ SQ:7.1]

binary moment: Given a binary image $B(i, j)$, there is an infinite family of moments indexed by the integer values $p$ and $q$. The $pq^{th}$ moment is given by $m_{pq} = \sum_i \sum_j i^p j^q B(i, j)$.
observations with models or stereo to
binary noise reduction: A method
solve the correspondence problem .
of removing salt-and-pepper noise from
Assume a set V of nodes partitioned
binary images. For example, a point
into two non-intersecting subsets V 1
could have its value set to the median
and V 2 . In other words, V = V 1 V 2
value of its eight neighbors.
and V 1 V 2 = 0. The only arcs E in
binary object recognition: the graph lie between the two subsets,
1 2 2 1
Model based techniques and algorithms i.e., E {V V } {V V }. This
used to recognize objects from their is the bipartite graph. The bipartite
binary images . matching problem is to find a maximal
matching in the bipartite graph, in
binary operation: An operation that other words, a maximal set of nodes
takes two images as inputs, such as from the two subsets connected by arcs
image subtraction . [ SOS:2.3] such that each node is connected by
exactly one arc. One maximal matching
binary region skeleton: See in the graph below with sets
skeleton. [ ERD:6.8] V 1 = {A, B, C} and V 2 = {X, Y } pairs
(A, Y ) and (C, X). The selected arcs
binocular: A system that has two
are solid, and other arcs are dashed.
cameras looking at the same scene
B 29

bipartite matching: Graph matching technique often applied in model based vision to match observations with models, or in stereo to solve the correspondence problem. Assume a set $V$ of nodes partitioned into two non-intersecting subsets $V^1$ and $V^2$. In other words, $V = V^1 \cup V^2$ and $V^1 \cap V^2 = \emptyset$. The only arcs $E$ in the graph lie between the two subsets, i.e., $E \subseteq \{V^1 \times V^2\} \cup \{V^2 \times V^1\}$. This is the bipartite graph. The bipartite matching problem is to find a maximal matching in the bipartite graph, in other words, a maximal set of nodes from the two subsets connected by arcs such that each node is connected by exactly one arc. One maximal matching in the graph below with sets $V^1 = \{A, B, C\}$ and $V^2 = \{X, Y\}$ pairs (A, Y) and (C, X). The selected arcs are solid, and other arcs are dashed. [figure: bipartite graph with node sets V1 = {A, B, C} and V2 = {X, Y}] [ WP:Matching (graph theory)#Maximum matchings in bipartite graphs]

bit map: An image with one bit per pixel. [ JKS:3.3.1]
bit-plane encoding: An image compression technique where the image is broken into bit planes and run length coding is applied to each plane. To get the bit planes of an 8-bit gray scale image, the picture has a boolean AND operator applied with the binary value corresponding to the desired plane. For example, ANDing the image with 00010000 gives the fifth bit plane. [ AJ:11.2]
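The AND-mask idea in numpy; a sketch that splits an 8-bit image into its eight bit planes:

import numpy as np

def bit_planes(image):
    # plane k is obtained by ANDing with the mask 1 << k
    img = np.asarray(image, dtype=np.uint8)
    return [(img & (1 << k)) >> k for k in range(8)]

img = np.array([[0b00010000, 0b11111111]], dtype=np.uint8)
print(bit_planes(img)[4])     # the fifth bit plane: [[1 1]]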
bitangent: See curve bitangent. [ WP:Bitangent]

bitshift operator: The bitshift operator shifts the binary representation of each pixel to the left or right by a set number of bit positions. Shifting 01010110 right by 2 bits gives 00010101. The bitshift operator is a computationally cheap method of dividing or multiplying an image by a power of 2. A shift of $n$ positions is a multiplication or division by $2^n$. [ WP:Bitwise operation#Bit shifts]

blanking: Clearing a CRT or video device. The vertical blanking interval (VBI) in television transmission is used to carry data other than audio and video. [ WP:Blanking (video)]

blending operator: An image processing operator that creates a third image $C$ by a weighted combination of the input images $A$ and $B$. In other words, $C(i, j) = \alpha A(i, j) + \beta B(i, j)$ for two scalar weights $\alpha$ and $\beta$. Usually, $\alpha + \beta = 1$. The results of some process can be illustrated by blending the original and result images. An example of blending that adds a detected boundary to the original image is: [figure: image blended with its detected boundary]
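The weighted combination in a few lines of numpy; clipping keeps the result in the 8-bit range when $\alpha + \beta > 1$:

import numpy as np

def blend(a, b, alpha=0.5, beta=0.5):
    # C(i, j) = alpha * A(i, j) + beta * B(i, j), saturated to 8 bits
    c = alpha * a.astype(float) + beta * b.astype(float)
    return np.clip(c, 0, 255).astype(np.uint8)

a = np.full((2, 2), 100, dtype=np.uint8)
b = np.full((2, 2), 200, dtype=np.uint8)
print(blend(a, b))            # all 150 for alpha = beta = 0.5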
blob analysis: Blob analysis is a group of algorithms used in medical image analysis. There are four steps in the process: derive an optimum foreground/background threshold to segment objects from their background; binarize the images by applying a thresholding operation; perform region growing and assign a label to each discrete group (blob) of connected pixels; extract physical measurements from the blobs.

blob extraction: A part of blob analysis. See connected component labeling. [ WP:Blob extraction]

block coding: A class of signal coding techniques. The input signal is partitioned into fixed-size blocks, and each block is transmitted after translation to a smaller (for compression) or larger (for error-correction) block size. [ AJ:11.1]

blocks world: The blocks world is the simplified problem domain in which much early artificial intelligence and computer vision research was done. The essential feature of the blocks world is the restriction of analysis to simplified geometric objects such as polyhedra and the assumption that geometric descriptions such as image edges can be easily recovered from the image. An example blocks world scene is: [figure: scene of simple polyhedral blocks] [ VSN:4]

blooming: Blooming occurs when too much light enters a digital optical system. The light saturates CCD pixels, causing charge to overspill into surrounding elements, giving either vertical or horizontal streaking in the image (depending on the orientation of the CCD). [ CS:3.3.5]

Blum's medial axis: See medial axis transform. [ JKS:2.5.10]

blur: A measure of sharpness in an image. Blurring can arise from the sensor being out of focus, noise in the environment or image capture process, target or sensor motion, as a side effect of an image processing operation, etc. A blurred image is: [figure: blurred image] [ WP:Blur]

border detection: See boundary detection. [ RN:7.1]

border tracing: Given a pre-labeled (or segmented) image, the border is the inner layer of each region's connected pixel set. It can be traced using a simple 8-connective or 4-connective stepping procedure in a 3×3 neighborhood. [ RN:8.1.4]

boundary: A general term for the lower dimensional structure that separates two objects, such as the curve between neighboring surfaces, or the surface between neighboring volumes. [ JKS:2.5.1]

boundary description: Functional, geometry based or set-theoretic description of a region boundary. For an example, see chain code. [ ERD:7.8]

boundary detection: An image processing algorithm that finds and labels the edge pixels between two neighboring image segments after segmentation. The boundary represents physical discontinuities in the scene, for example changes in color, depth, shape or texture. [ RN:7.1]

boundary grouping: An image processing algorithm that attempts to complete a fully connected image-segment boundary from many broken pieces. A boundary might be broken because it is commonplace for sharp transitions in property values to appear in the image as slow transitions, or sometimes disappear, due to noise, blurring, digitization artifacts, poor lighting, surface irregularities, etc.

boundary length: The length of the boundary of an object. See also perimeter. [ WP:Perimeter]

boundary matching: See curve matching.
boundary property: Characteristics of a boundary, such as arc length, curvature, etc.

boundary representation: See boundary description and B-rep. [ BT:8]

boundary segmentation: See curve segmentation.

boundary-region fusion: Region growing segmentation approach where two adjacent regions are merged when their characteristics are close enough to pass some similarity test. The candidate neighborhood for testing similarity can be the pixels lying near the shared region boundary. [ WP:Region growing]

bounding box: The smallest rectangular prism that completely encloses either an object or a set of points. The ratio of the lengths of box sides is often used as a classification metric in model based recognition. [ WP:Minimum bounding box]

bottom-up: Reasoning that proceeds from the data to the conclusions. In computer vision, describes algorithms that use the data to generate hypotheses at a low level, that are refined as the algorithm proceeds. Compare top-down. [ RJS:6]

BRDF/BDRF: See bidirectional reflectance distribution function. [ WP:Bidirectional reflectance distribution function]

breakpoint detection: See curve segmentation. [ BB:8.2.1]

breast scan analysis: See mammogram analysis. [ CS:8.4.7]

Brewster's angle: When light reflects from a dielectric surface it will be polarized perpendicularly to the surface normal. The degree of polarization depends on the incident angle and the refractive indices of the air and reflective medium. The angle of maximum polarization is called Brewster's angle and is given by
$$\theta_B = \tan^{-1}\left(\frac{n_1}{n_2}\right)$$
where $n_1$ and $n_2$ are the refractive indices of the two materials. [ EH:8.6]

brightness: The quantity of radiation reaching a detector after incidence on a surface. Often measured in lux or ANSI lumens. When translated into an image, the values are scaled to fit the bit patterns available. For example, if an 8-bit byte is used, the maximum value is 255. See also luminance. [ DH:7.2]

brightness adjustment: Increase or decrease in the luminance of an image. To decrease, one can linearly interpolate between the image and a pure black image. To increase, one can linearly extrapolate from a black image and the target. The extrapolation function is
$$v = (1 - \alpha)\, i_0 + \alpha\, i_1$$
where $\alpha$ is the blending factor (often between 0 and 1), $v$ is the output pixel value and $i_0$ and $i_1$ are the corresponding image and black pixels. See also gamma correction and contrast enhancement. [ WP:Gamma correction]
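A sketch of this formula with $i_1$ a pure black image, so $v = (1 - \alpha)\, i_0$; a negative $\alpha$ extrapolates, i.e., brightens:

import numpy as np

def adjust_brightness(image, alpha):
    # v = (1 - alpha) * i0 + alpha * i1, with i1 = 0 (black) everywhere
    v = (1.0 - alpha) * image.astype(float)
    return np.clip(v, 0, 255).astype(np.uint8)

img = np.array([[100, 200]], dtype=np.uint8)
print(adjust_brightness(img, 0.5))     # darker:   [[ 50 100]]
print(adjust_brightness(img, -0.5))    # brighter: [[150 255]]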
Brodatz texture: A well-known set of texture images often used for testing texture-related algorithms. [ NA:8.2]

building detection: A general term for a specific, model-based set of algorithms for finding buildings in data. The range of data used is large, encompassing stereo images, range images, and aerial and ground-level photographs.
bundle adjustment: An algorithm used to optimally determine the three dimensional coordinates of points and camera positions from two dimensional image measurements. This is done by minimizing some cost function that includes the model fitting error and the camera variations. The bundles are the light rays between detected 3D features and each camera center. It is these bundles that are iteratively adjusted (with respect to both camera centers and feature positions). [ FP:13.4.2]

burn-in: 1) A phenomenon of early tube-based cameras and monitors where, if the same image was presented for long periods of time, it became permanently burnt into the phosphorescent layer. Since the advent of modern monitors (1980s) this no longer happens. 2) The practice of shipping only electronic components that have been tested for long periods, in the hope that any defects will manifest themselves early in the component's life (e.g., 72 hours of typical use). 3) The practice of discarding the first several samples of an MCMC process in the hope that a very low-probability starting point will converge to a high-probability point before beginning to output samples. [ NA:1.4.1]

butterfly filter: A linear filter designed to respond to "butterfly" patterns in images. A small butterfly filter convolution kernel is
$$\begin{pmatrix} 0 & -2 & 0 \\ 1 & 2 & 1 \\ 0 & -2 & 0 \end{pmatrix}$$
It is often used in conjunction with the Hough transform for finding peaks in the Hough feature space, particularly when searching for lines. The line parameter values of $(\rho, \theta)$ will generally give a butterfly shape with a peak at the approximately correct values.
C

CAD: See computer aided design. [ WP:Computer-aided design]

calculus of variations: See variational approach. [ BKPH:6.13]

calibration object: An object or small scene with easily locatable features used for camera calibration. [ HZ:7.5.2]

camera: 1) The physical device used to acquire images. 2) The mathematical representation of the physical device and its characteristics, such as position and calibration. 3) A class of mathematical models of the projection from 3D to 2D, such as the affine, orthographic or pinhole camera. [ NA:1.4.1]

camera calibration: Methods for determining the position and orientation of cameras and range sensors in a scene and relating them to scene coordinates. There are essentially four problems in calibration:

1. Interior orientation. Determining the internal camera geometry, including its principal point, focal length and lens distortion.

2. Exterior orientation. Determining the orientation and position of the camera with respect to some absolute coordinate system.

3. Absolute orientation. Determining the transformation between two coordinate systems, the position and orientation of the sensor in the absolute coordinate system from the calibration points.

4. Relative orientation. Determining the relative position and orientation between two cameras from projections of calibration points in the scene.

These are classic problems in the field of photogrammetry. [ FP:3]
camera coordinates: 1) A viewer-centered representation relative to the camera. The camera coordinate system is positioned and oriented relative to the scene coordinate system and this relationship is determined by camera calibration. 2) An image coordinate system that places the camera's principal point at the origin $(0, 0)$, with unit aspect ratio and zero skew. The focal length in camera coordinates may or may not equal 1. If image coordinates are such that the $3 \times 4$ projection matrix is of the form
$$\begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \left[\, R \mid \vec{t}\, \right]$$
then the image and camera coordinate systems are identical. [ HZ:5.1]

camera geometry: The physical geometry of a camera system. See also camera model. [ RJS:2]

camera model: A mathematical model of the projection from 3D (real world) space to the camera image plane. For example see pinhole camera model. [ RJS:2]

camera motion compensation: See sensor motion compensation.

camera motion estimation: See sensor motion estimation. [ WP:Egomotion]

camera position estimation: Estimation of the optical position of the camera relative to the scene or observed structure. This generally consists of six degrees of freedom (three for rotation, three for translation). It is often a component of camera calibration. Camera position is sometimes called the extrinsic parameters of the camera. Multiple camera positions may be estimated simultaneously with the reconstruction of 3D scene structure in structure-and-motion algorithms.

Canny edge detector: The first of the modern edge detectors. It took account of the trade-off between sensitivity of edge detection versus the accuracy of edge localization. The edge detector consists of four stages: 1) Gaussian smoothing to reduce noise and remove small details, 2) gradient magnitude and direction calculation, 3) non-maximal suppression of smaller gradients by larger ones to focus edge localization and 4) gradient magnitude thresholding and linking that uses hysteresis so as to start linking at strong edge positions, but then also track weaker edges. An example of the edge detection results is: [figure: input image and Canny edge map] [ JKS:5.6.1]
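The four stages are packaged by common libraries; a minimal sketch assuming scikit-image is available (the library and parameter choices are an assumption, not part of this entry):

from skimage import data, feature

image = data.camera()       # sample gray scale image
# sigma controls stage 1 (Gaussian smoothing); the two thresholds drive
# stage 4 (hysteresis linking of strong and weak edges)
edges = feature.canny(image, sigma=2.0, low_threshold=0.1,
                      high_threshold=0.3, use_quantiles=True)
print(edges.shape, edges.dtype)    # boolean edge map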
canonical configuration: A stereo camera configuration in which the optical axes of the cameras are parallel, the baselines are parallel to the image planes and the horizontal axes of the image planes are parallel. This results in epipolar lines that are parallel to the horizontal axes, hence simplifying the search for correspondences. [figure: two parallel cameras with optical centers, optical axes, image planes and corresponding epipolar lines]

cardiac image analysis: Techniques involving the development of 3D vision algorithms for tracking the motion of the heart from NMR and echocardiographic images.

Cartesian coordinates: A position description system where an $n$-dimensional point, $\vec{P}$, is described by exactly $n$ coordinates with respect to $n$ linearly independent and often orthonormal vectors, known as axes. [figure: point P = (x_c, y_c, z_c) with respect to the X, Y, Z axes] [ WP:Cartesian coordinate system]

cartography: The study of maps and map-building. Automated cartography is the development of algorithms that reduce the manual effort in map building. [ WP:Cartography]

cascaded Hough transform: An application of several successive Hough transforms, with the output of one transform used as input to the next.

cascading Gaussians: A term referring to the fact that the convolution of a Gaussian with itself is another Gaussian. [ JKS:4.5.4]

CAT: See X-ray CAT. [ RN:10.3.4]

catadioptric optics: The general approach of using mirrors in combination with conventional imaging systems to get wide viewing angles (180°). It is desirable that a catadioptric system has a single viewpoint because it permits the generation of geometrically correct perspective images from the captured images. [ WP:Catadioptric system]

categorization: The subdivision of a set of elements into clearly distinct groups, or categories, defined by specific properties. Also the assignment of an element to a category, or recognition of its category. [ WP:Categorization]

category: A group or class used in a classification system. For example, in mean and Gaussian curvature shape classification, the local shape of a surface is classified into four main categories: planar, ellipsoidal, hyperbolic, and cylindrical. Another example is the classification of observed grazing animals into one of {sheep, cow, horse}. See also categorization. [ WP:Categorization]

CBIR: See content based image retrieval. [ WP:Content-based image retrieval]

CCD: Charge-Coupled Device. A solid state device that can record the number of photons falling on it. A 2D matrix of CCD elements is used, together with a lens system, in digital cameras, where each pixel value in the final images corresponds to the output of one or more of the elements. [figure: CCD sensor chip] [ FP:1.4.1]
CCIR camera: Camera fulfilling color conversion and pixel formation criteria laid out by the Comité Consultatif International des Radio. [ SEU:1.7.3]

cell microscopic analysis: Automated image processing procedures for finding and analyzing different cell types from images taken by a microscope vision system. Common examples are the analysis of pre-cancerous cells and blood cell analysis. [ WP:Live blood analysis]

cellular array: A massively parallel computing architecture, composed of a high number of processing elements. Particularly useful in machine vision applications when a simple 1:N mapping is possible between image pixels and processing elements. See also systolic array and SIMD. [ WP:Systolic array]

center line: See medial line.

center of curvature: The center of the circle of curvature (or osculating circle) at a point $\vec{P}$ of a plane curve at which the curvature is nonzero. The circle of curvature is tangent to the curve at $\vec{P}$, has the same curvature as the curve at $\vec{P}$, and lies towards the concave (inner) side of the curve. This figure shows the circle and center of curvature, $\vec{C}$, of a curve at point $\vec{P}$: [figure: curve with osculating circle, its center C and the point P] [ FP:19.1.1]

center of mass: The point within an object at which the force of gravity appears to act. If the object can be described by a multi-dimensional point set $\{\vec{x}_i\}$ containing $N$ points, the center of mass is $\frac{1}{N} \sum_{i=1}^{N} \vec{x}_i f(\vec{x}_i)$, where $f(\vec{x}_i)$ is the value of the image (e.g., binary or gray scale) at point $\vec{x}_i$. [ JKS:2.2.2]

center of projection: The origin of the camera reference frame in the pinhole camera model. In such a camera, the projection of a point in space is determined by the line passing through the point itself and the center of projection. See: [figure: lens with center of projection, optical axis, image plane and scene object] [ JKS:8.1]
center-surround operator: An operator that is particularly sensitive to spot-like image features that have higher (or lower) pixel values in the center than in the surrounding areas. A simple convolution mask that can be used as an orientation independent spot detector is:
$$\begin{pmatrix} -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \\ -\frac{1}{8} & 1 & -\frac{1}{8} \\ -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \end{pmatrix}$$

central moments: A family of image moments that are invariant to translation because the center of mass has been subtracted during the calculation. If $f(c, r)$ is the input image pixel value (binary or gray scale) at row $r$ and column $c$, then the $pq^{th}$ central moment is
$$\mu_{pq} = \sum_r \sum_c (c - \bar{c})^p (r - \bar{r})^q f(c, r)$$
where $(\bar{c}, \bar{r})$ is the center of mass of the image. [ RJS:6]
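A direct numpy transcription of the formula (the plain moment $m_{pq}$ of the earlier binary moment entry simply drops the mean subtraction):

import numpy as np

def central_moment(f, p, q):
    # mu_pq = sum_r sum_c (c - cbar)^p * (r - rbar)^q * f(c, r)
    f = np.asarray(f, dtype=float)
    r, c = np.indices(f.shape)
    m00 = f.sum()
    cbar, rbar = (c * f).sum() / m00, (r * f).sum() / m00
    return ((c - cbar) ** p * (r - rbar) ** q * f).sum()

img = np.zeros((5, 5)); img[1:4, 2] = 1      # small vertical bar
print(central_moment(img, 0, 2))             # spread along rows: 2.0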
central projection: It is defined by projection of an image on the surface of a sphere onto a tangential plane by rays from the center of the sphere. A great circle is the intersection of a plane with the sphere. The image of the great circle under central projection will be a line. Also known as the gnomonic projection. [ RJS:2]

centroid: See center of mass. [ JKS:2.2.2]

certainty representation: Any of a set of techniques for encoding the belief in a hypothesis, conclusion, calculation, etc. Example representation methods are probability and fuzzy logic.

chain code: An efficient method for contour coding where an arbitrary curve is represented by a sequence of small vectors of unit length in a limited set of possible directions. Depending on whether the 4 connected or the 8 connected grid is employed, the chain code is defined as the digits from 0 to 3 or 0 to 7, assigned to the 4 or 8 neighboring grid points in a counter-clockwise sense. For example, the string 222233000011 describes the small curve shown below using a 4 connected coding scheme, starting from the upper right pixel. [figure: the four direction codes 0-3 and a small rectangular curve of pixels] [ JKS:6.2.1]
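A sketch that decodes a 4 connected chain code into pixel coordinates. The direction numbering used here (0 = right, 1 = up, 2 = left, 3 = down) is one common counter-clockwise assignment, chosen for this illustration:

# counter-clockwise direction vectors (dx, dy) for codes 0..3 (our choice)
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def decode_chain(start, code):
    x, y = start
    points = [(x, y)]
    for ch in code:
        dx, dy = STEPS[int(ch)]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

# The example string from this entry traces a small closed rectangle.
print(decode_chain((4, 2), "222233000011"))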
chamfer matching: A matching technique based on the comparison of contours, and based on the concept of chamfer distance assessing the similarity of two sets of points. This can be used for matching edge images using the distance transform. See also Hausdorff distance. To find the parameters (for example, translation and scale) that register a library image and a test image, the binary edge map of the test image is compared to the distance transform. Edges are detected on image 1, and the distance transform of the edge pixels is computed. The edges from image 2 are then matched. [figure: Image 1, Image 2, the distance transform of Image 1's edges, Edges 2, and the best match overlay] [ ZRH:2.3]
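A minimal sketch of the scoring step, assuming scipy is available: the distance transform of one edge map gives, at every pixel, the distance to the nearest edge, so averaging it over the other image's edge pixels scores a candidate registration.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(edges1, edges2):
    # mean distance from edges2's edge pixels to the nearest edge of edges1
    dt = distance_transform_edt(~edges1)
    return dt[edges2].mean()

e1 = np.zeros((50, 50), bool); e1[25, 10:40] = True
e2 = np.zeros((50, 50), bool); e2[27, 10:40] = True   # shifted by 2 rows
print(chamfer_score(e1, e2))     # 2.0; search over poses for the minimum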
chamfering: See distance transform. [ JKS:2.5.9]
change detection: See motion detection. [ JKS:14.1]

character recognition: See optical character recognition. [ RJS:8]

character verification: A process used to confirm that printed or displayed characters are within some tolerance that guarantees that they are readable by humans. It is used in applications such as labeling.

characteristic view: An approach to object representation in which an object is encoded by a set of views of the object. The views are chosen so that small changes in viewpoint do not cause large changes in appearance (e.g., a singularity event). Real objects have an unrealistic number of singularities, so practical approaches to creating characteristic views require approximations, such as only using views on a tessellated viewsphere, or only representing the viewpoints that are reasonably stable over large ranges on the viewsphere. See also aspect graph and appearance based recognition.

chess board distance metric: See Manhattan metric. [ WP:Chebyshev distance]

chi-squared distribution: The chi-squared ($\chi^2$) probability distribution describes the distribution of squared lengths of vectors drawn from a normal distribution. Specifically, let the cumulative distribution function of the $\chi^2$ distribution with $d$ degrees of freedom be denoted $\chi^2(d, u)$. Then the probability that a point $\vec{x}$ drawn from a $d$-dimensional Gaussian distribution will have squared norm $|\vec{x}|^2$ less than a value $\tau$ is given by $\chi^2(d, \tau)$. Empirical and theoretical plots of the $\chi^2$ probability density function with five degrees of freedom are here: [figure: computed and empirical plots of the density of |X|^2 for X in R^5] [ WP:Chi-square distribution]

chi-squared test: A statistical test of the hypothesis that a set of sampled values has been drawn from a given distribution. See also chi-squared distribution. [ WP:Chi-square test]
chip sensor: A CCD or other semiconductor based light sensitive imaging device.

chord distribution: A 2D shape description technique based on all chords in the shape (that is, all pairwise segments between points on the boundary). Histograms of their lengths and orientations are computed. The values in the length histogram are invariant to rotations and scale linearly with the size of the object. The orientation histogram values are invariant to scale and shifts.

chroma: The color portion of a video signal that includes hue and saturation, requiring luminance to make it visible. It is also referred to as chrominance. [ WP:Chroma]

chromatic aberration: A focusing problem where light of different wavelengths (colors) is refracted by different amounts and consequently images at different places. As blue light is refracted more than red light, objects may be imaged with color fringes at places where there are strong changes in lightness. [ FP:1.2.3]

chromaticity diagram: A 2D slice of a 3D color space. The CIE 1931 chromaticity diagram is the slice through the xyz color space of the CIE where $x + y + z = 1$. This slice is shown below. The color gamut of standard 0-1 RGB values in this model is the bright triangle in the center of the horseshoe-like shape. Points outside the triangle have had their saturations truncated. See also CIE chromaticity coordinates. [figure: the CIE 1931 chromaticity horseshoe, with spectral wavelengths from 380 nm to 780 nm marked around its boundary] [ WP:Chromaticity diagram#The CIE xy chromaticity diagram and the CIE xyY color space]

chrominance: 1) The part of a video signal that carries color. 2) One or both of the color axes in a 3D color space that distinguishes intensity and color. See also chroma. [ WP:Chrominance]

chromosome analysis: Vision technique used for the diagnosis of some genetic disorders from microscope images. This usually includes sorting the chromosomes into the 23 pairs and displaying them in a standard chart.

CID: Charge Injection Device. A type of semiconductor imaging device with a matrix of light-sensitive cells. Every pixel in a CID array can be individually addressed via electrical indexing of row and column electrodes. It is unlike a CCD because it transfers collected charge out of the pixel during readout, thus erasing the image.

CIE chromaticity coordinates: Coordinates in the CIE color space with reference to three ideal standard colors X, Y and Z. Any visible color can be expressed as a weighted sum of these three ideal colors, for example, for a color $p = w_1 X + w_2 Y + w_3 Z$. The normalized values are given by
$$x = \frac{w_1}{w_1 + w_2 + w_3}, \quad y = \frac{w_2}{w_1 + w_2 + w_3}, \quad z = \frac{w_3}{w_1 + w_2 + w_3}$$
Since $x + y + z = 1$, we only need to know two of these values, say $(x, y)$. These are the chromaticity coordinates. [ JKS:10.3]
CIE L*A*B* model: A color representation model based on that proposed by the Commission Internationale d'Eclairage (CIE) as an international standard for color measurement. It is designed to be device-independent and perceptually uniform (i.e., the separation between two points in this space corresponds to the perceptual difference between the colors). L*A*B* color consists of a luminance, L*, and two chromatic components: the A* component, from green to red; and the B* component, from blue to yellow. See also CIE L*U*V* model. [ JKS:10.3]

CIE L*U*V* model: A color representation system where colors are represented by luminance (L*) and two chrominance components (U*, V*). A given change in value in any component corresponds approximately to the same perceptual difference. See also CIE L*A*B* model. [ JKS:10.3]

circle: A curve consisting of all points on a plane lying a fixed radius $r$ from the center point $C$. The arc defining the entire circle is known as the circumference and is of length $2\pi r$. The area contained inside the curve is given by $A = \pi r^2$. A circle centered at the point $(h, k)$ has equation $(x - h)^2 + (y - k)^2 = r^2$. The circle is a special case of the ellipse. [figure: circle with center C and radius r] [ NA:5.4.3]

circle detection: A class of algorithms, for example the Hough transform, that locate the centers and radii of circles in digital images. In general images, scene circles usually appear as ellipses, as in this example: [figure: image with detected circles appearing as ellipses] [ ERD:9]

circle fitting: Techniques for deriving circle parameters from either 2D or 3D observations. As with all fitting problems, one can either search the parameter space using a good metric (using, for example, a Hough transform), or can solve a well-posed least-squares problem. [ JKS:6.8.4]

circular convolution: The circular convolution $(c_k)$ of two vectors $\{x_i\}$ and $\{y_i\}$ that are of length $n$ is defined as $c_k = \sum_{i=0}^{n-1} x_i y_j$ where $0 \leq k < n$ and $j = (i - k) \bmod n$. [ WP:Circular convolution]
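A direct numpy sketch of this definition; convolving with a unit impulse returns the signal itself, which makes a handy check:

import numpy as np

def circular_convolution(x, y):
    # c_k = sum_i x_i * y_{(i-k) mod n}; np.roll(y, k)[i] == y[(i-k) mod n]
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.dot(x, np.roll(y, k)) for k in range(len(x))])

print(circular_convolution([1, 2, 3, 4], [1, 0, 0, 0]))   # [1. 2. 3. 4.]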
circularity: One measure $C$ of the degree to which a 2D shape is similar to a circle is given by
$$C = \frac{4 \pi A}{P^2}$$
where $C$ varies from 0 (non-circular) to 1 (perfectly circular). $A$ is the object area and $P$ is the object perimeter. [ WP:Circular definition]
city block distance: See Manhattan metric. [ JKS:2.5.8]

classification: A general term for the assignment of a label (or class) to structures (e.g., pixels, regions, lines, etc.). Example classification problems include: a) labelling pixels as road, vegetation or sky, b) deciding whether cells are cancerous based on cell shapes or c) deciding whether the person with the observed face is an allowed system user. [ ERD:1.2.1]

classifier: An algorithm assigning a class among several possible to an input pattern or data. See also classification, unsupervised classification, clustering, supervised classification and rule-based classification. [ FP:22]

clipping: Removal or non-rendering of objects that do not coincide with the display area. [ NA:3.3.1]

clique: A clique of a graph G is a fully connected subgraph of G. In a fully connected graph, every vertex is a neighbor of all others. The graph below has a clique with five nodes. (There are other cliques in the graph with fewer nodes, e.g., ABac with four nodes, etc.). [figure: graph containing a five-node clique on nodes A, B, a, b, c] [ WP:Clique (graph theory)]

close operator: The application of two binary morphology operators, dilation followed by erosion, which has the effect of filling small holes in an image. This figure shows the result of closing with a mask 22 pixels in diameter: [figure: binary image before and after closing] [ JKS:2.6]
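A sketch using scipy's binary morphology (an assumption; any morphology library would do): dilation followed by erosion fills a one-pixel hole.

import numpy as np
from scipy.ndimage import binary_closing

img = np.ones((7, 7), dtype=bool)
img[3, 3] = False                    # a one-pixel hole
closed = binary_closing(img, structure=np.ones((3, 3), dtype=bool))
print(closed[3, 3])                  # True: the hole has been filled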
clustering: 1) Grouping together image regions or pixels into larger, homogeneous regions sharing some property. 2) Identifying the subsets of a set of data points $\{\vec{x}_i\}$ based on some property such as proximity. [ FP:14.1.2]

clutter: A generic term for unmodeled or uninteresting elements in an image. For example, a face detector generally has a model for faces, and not for other objects, which are regarded as clutter. The background of an image is often expected to include clutter. Loosely speaking, clutter is more structured than noise. [ FP:18.2.1]

CMOS: Complementary metal-oxide semiconductor. A technology used in making image sensors and other computer chips. [ NA:1.4.1]

CMY: See CMYK. [ LG:3.7]
CMYB: See CMYK. [ LG:3.7]

CMYK: Cyan, magenta, yellow and black color model. It is a subtractive model where colors are absorbed by a medium, for example pigments in paints. Where the RGB color model adds hues to black to generate a particular color, the CMYK model subtracts from white. Red, green and blue are secondary colors in this model. [ LG:3.7]

coarse-to-fine processing: Multi-scale algorithm application that begins by processing at a large or coarse level and then proceeds, iteratively, to a small or fine level. Importantly, results from each level must be propagated to ensure a good final result. It is used for computing, for example, optical flow. [ FP:7.7.2]

coaxial illumination: Front lighting with the illumination path running along the imaging optical axis. Advantages of this technique are no visible shadows or direct specularities from the camera's viewpoint. [figure: light source reflected off a half-silvered mirror so that it illuminates the target area along the camera's optical axis]

cognitive vision: A part of computer vision focusing on techniques for recognition and categorization of objects, structures and events, learning and knowledge representation, control and visual attention.

coherence detection: Stereo vision technique where maximal patch correlations are searched for across two images to generate features. It relies on having a good correlation measure and a suitably chosen patch size.

coherent fiber optics: Many fiber optic elements bound into a single cable component with the individual fiber spatial positions aligned, so that it can be used to transmit images.

coherent light: Light, for example generated by a laser, in which the emitted light waves have the same wavelength and are in phase. Such light waves can remain focused over long distances. [ WP:Collimated light]

coincidental alignment: When two structures seem to be related, but in fact the structures are independent or the alignment is just a consequence of being in some special viewpoint. Examples are random edges being collinear or surfaces coplanar, or object corners being nearby. See also non-accidentalness.

collimate: To align the optics of a vision system, especially those in a telescopic system.
collimated lighting: Collimated lighting (e.g., directional back-lighting) is a special form of structured light. A collimator produces light in which all the rays are parallel. [figure: lamp, collimating optical system, object and camera] It is used to produce well defined shadows that can be cast directly onto either a sensor or an object.

collinearity: The property of lying along the same straight line. [ HZ:1.3]

collineation: See projective transformation. [ OF:2.2.1]

color: Color is both a physical and psychological phenomenon. Physically, color refers to the nature of an object's texture that allows it to reflect or absorb particular parts of the light incident on it. (See also reflectance.) The psychological aspect is characterized by the visual sensation experienced when light of a particular frequency or wavelength is incident on the retina. The key paradox here concerns why light of slightly different wavelengths should be so perceptually different (e.g., red versus blue). [ EH:4.4]

color based database indexing: See color based image retrieval. [ WP:Content-based image retrieval#Color]

color based image retrieval: An example of the more general image database indexing process, where one of the main indices into the image database comes from either color samples, the color distribution from a sample image, or a set of text color terms (e.g., "red"), etc. [ WP:Content-based image retrieval#Color]

color clustering: See color image segmentation.

color constancy: The ability of a vision system to assign a color description to an object that is independent of the lighting environment. This will allow the system to recognize objects under many different lighting conditions. The human vision system does this automatically, but most machine vision systems cannot. For example, humans observing a red object in a cluttered scene under a blue light will still see the object as red. A machine vision system might see it as a very dark blue. [ WP:Color constancy]

color co-occurrence matrix: A matrix (actually a histogram) whose elements represent the sum of color values existing, in a given image in a sequence, at a certain pixel position relative to another color existing at a different position in the image. See also co-occurrence matrix. [ WP:Co-occurrence matrix]
color correction: 1) Adjustment of colors to achieve color constancy. 2) Any change to the colors of an image. See also gamma correction. [ WP:Color correction]

color differential invariant: A type of differential invariant based on color information, such as $\frac{\nabla R \cdot \nabla G}{\|\nabla R\|\, \|\nabla G\|}$, that has the same value invariant to translation, rotation and variations in uniform illumination.

color doppler: A method for noninvasively imaging blood flow through the heart or other body parts by displaying flow data on the two dimensional echocardiographic image. Blood flow in different directions will be displayed in different colors.

color edge detection: The process of edge detection in color images. A simple approach is to combine (e.g., by addition) the edge strengths of the individual RGB color planes.

color efficiency: A tradeoff that is made with lighting systems, where conflicting design constraints require energy efficient production of light while simultaneously producing sufficiently broad spectrum illumination that the colors look natural. An obvious example of a skewed tradeoff is with low pressure sodium street lighting. This is energy efficient but has poor color appearance.

color gamut: The subset of all possible colors that a particular display device (CRT, LCD, printer) can display. Because of physical differences in how various devices produce colors, each scanner, display, and printer has a different gamut, or range of colors, that it can represent. The RGB color gamut can only display approximately 70% of the colors that can be perceived. The CMYK color gamut is much smaller, reproducing about 20% of perceivable colors. The color gamut achieved with premixed inks (like the Pantone Matching System) is also smaller than the RGB gamut. [ WP:Gamut]

color halftoning: See dithering. [ WP:Halftone#Multiple screens and color halftoning]

color histogram matching: Used in color image indexing, where the similarity measure is the distance between color histograms of two images, e.g., by using the Kullback-Leibler divergence or Bhattacharyya distance.

color image: An image where each element (pixel) is a tuple of values from a set of color bases. [ SEU:1.7.3]

color image restoration: See image restoration. [ SEU:1.3]

color image segmentation: Segmenting a color image into homogeneous regions based on some similarity criteria. The boundaries around typical regions are shown here: [figure: color image with region boundaries overlaid]

color indexing: Using color information, e.g., color histograms, for image database indexing. A key issue is varying illumination. It is possible to use ratios of colors from neighboring locations to obtain illumination invariance. [ WP:Color index]
C 45

matching is expressed as :

C = RR + GG + BB

where a color stimulus C is matched by


R units of primary stimulus R mixed 16,777,216 colors 256 colors
with G units of primary stimulus G
and B units of primary stimulus B.
[ SW:2.5.1]

color mixture model: A


mixture model based on distributions
in some color representation system
that specifies both the color groups in a 16 colors 4 colors

model as well as their relationships to [ WP:Color quantization]


each other. The conditional probability
of a observed pixel ~xi belonging to an color re-mapping: An image
object O is modeled as a mixture with transformation where each original
K components. color is replaced by another color from
a colormap. If the image has indexed
color models: See colors, this can be a very fast operation
color representation system . and can provide special graphical
[ WP:Color model] effects for very low processing overhead.
color moment: A color image
description based on moments of each
color channels histogram , e.g., the
mean, variance and skewness of the
histograms.

color normalization: Techniques for


normalizing the distribution of color
values in a color image, so that the
image description is invariant to
illumination . One simple method for
Original Color remapped
producing invariance to lightness is to color representation system: A 2D
use vectors of unit length for color or 3D space used to represent a set of
entries, rather than coordinates in the absolute color coordinates. RGB and
color representation system . CIE are examples of such spaces.
color quantization: The process of color spaces: See
reducing the number of colors in a color representation system .
image by selecting a subset of colors, [ WP:Color space]
then representing the original image
using only them. This has the color temperature: A scalar measure
side-effect of allowing of colour. 1) The colour temperature of
image compression with fewer bits. A a given colour C is the temperature in
color image encoded with progressively kelvins at which a heated black body
fewer numbers of colors is shown here: would emit light that is dominated by
46 C

colour C. It is relevant to computer response of separate edge operators


vision in that the illumination color applied at several orientations. The
changes the appearance of the observed edge response at a pixel is commonly
objects. The color temperature of the maximum of the responses over the
incandescent lights is about 3200 several orientations.
kelvins and sunlight is about 5500
kelvins. 2) Photographic color composite filter: Hardware or
temperature is the ratio of blue to red software image processing method
intensity. [ WP:Color temperature] based on a mixture of components such
as noise reduction , feature detection ,
color texture: Variations ( texture ) in grouping, etc.
the appearance of a surface (or region , [ WP:Composite image filter]
illumination , etc.) arising because of
spatial variations in either the color , composite video: A television video
reflectance or lightness of a surface. transmission method created as a
backward-compatible solution for the
colorimetry: The measurement of transition from black-and-white to color
color intensity relative to some television. The black-and-white TV
standard. [ WP:Colorimetry] sets ignore the color component while
color TV sets separate out the color
combinatorial explosion: When information and display it with the
used correctly, this term refers to how black-and-white intensity.
the computational requirements of an [ WP:Composite video]
algorithm increases very quickly
relative to the increase in the number of compression: See image compression .
elements to be processed, as a [ SEU:1.3]
consequence of having to consider all
combinations of elements. For example, computational theory: An approach
consider matching M model features to to computer vision algorithm
D data features with D M , each data description promoted by Marr. A
feature can be used at most once and process can be described at three levels,
all model features must be matched. implementation (e.g., as a program),
Then the number of possible matchings algorithm (e.g., as a sequence of
that need to be considered is activities) and computational theory.
D (D 1) (D 2) (D M + 1). This third level is characterized by the
Here, if M increases by only one, assumptions behind the process, the
approximately D times as much mathematical relationship between the
matching effort is needed. input and output process and the
Combinatorial explosion is also loosely description of the properties of the
used for other non-combination input data (e.g., assumptions of
algorithms whose effort grows rapidly statistical distributions). The claimed
with even small increases in input data advantage of this approach is that the
sizes. [ WP:Combinatorial explosion] computational theory level makes
explicit the essentials of the process,
compactness: A scale , translation that can then be compared to the
and rotation invariant descriptor based essentials of other processes solving the
2
on the ratio perimeter
area . [ JKS:2.5.7] same problem. By this method, the
implementation details that can confuse
compass edge detector: A class of comparisons can be ignored.
edge detectors based on combining the
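To make the color histogram matching entry above concrete, here is a minimal NumPy sketch that compares two images by the Bhattacharyya distance between their normalized joint RGB histograms; the 8-bins-per-channel quantization and the function names are illustrative assumptions:

    import numpy as np

    def color_histogram(img, bins=8):
        # Joint RGB histogram, normalized to sum to 1 (img: H x W x 3, values 0..255).
        h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                              range=((0, 256),) * 3)
        return h / h.sum()

    def bhattacharyya_distance(p, q):
        # Smaller distance means more similar histograms.
        bc = np.sum(np.sqrt(p * q))            # Bhattacharyya coefficient
        return -np.log(max(bc, 1e-12))

    a = np.random.randint(0, 256, (32, 32, 3))
    b = np.random.randint(0, 256, (32, 32, 3))
    print(bhattacharyya_distance(color_histogram(a), color_histogram(b)))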
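Similarly, the compactness descriptor above (perimeter squared over area) can be sketched for a binary region; counting exposed pixel edges is only one of several possible perimeter estimates:

    import numpy as np

    def compactness(mask):
        # mask: 2D boolean array, True inside the region.
        area = mask.sum()
        padded = np.pad(mask, 1)
        # Perimeter estimate: count region pixels with a background 4-neighbor.
        perimeter = sum(
            (padded[1:-1, 1:-1] & ~np.roll(padded, s, axis=a)[1:-1, 1:-1]).sum()
            for a in (0, 1) for s in (1, -1))
        return perimeter ** 2 / area

    disk = np.hypot(*np.ogrid[-20:21, -20:21]) <= 15
    print(compactness(disk))   # compact shapes give small values; edge counting inflates the ideal 4*pi a little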
computational vision: See computer vision. [ JKS:1.1]

computer aided design: 1) A general term for object design processes where a computer assists the designer, e.g., in the specification and layout of components. For example, most current mechanical parts are designed by a computer aided design (CAD) process. 2) A term used for distinguishing objects designed with the assistance of a computer. [ WP:Computer-aided design]

computer vision: A broad term for the processing of image data. Every professional will have a different definition that distinguishes computer vision from machine vision, image processing or pattern recognition. The boundary is not clear, but the main issues that lead to this term being used are more emphasis on: 1) underlying theories of optics, light and surfaces, 2) underlying statistical, property and shape models, 3) theory-based algorithms, as contrasted to commercially exploitable algorithms, and 4) issues related to what humans broadly relate to understanding, as contrasted with automation. [ JKS:1.1]

computed axial tomography: Also known as CAT. An X-ray procedure used in conjunction with vision techniques to build a 3D volumetric image from multiple X-ray images taken from different viewpoints. The procedure can be used to produce a series of cross sections of a selected part of the human body, that can be used for medical diagnosis. [ WP:X-ray computed tomography#Terminology]

concave mirror: The type of mirror used for imaging, in which a concave surface is used to reflect light to a focus. The reflecting surface usually is rotationally symmetric about the optical or principal axis and the mirror surface can be part of a sphere, paraboloid, ellipsoid, hyperboloid or other surfaces. It is also known as a converging mirror because it brings light to a focus. In the case of the spherical mirror, half way between the vertex and the sphere center, C, is the mirror focal point, F, as shown here: [Figure: a concave mirror imaging an object on the principal axis, with sphere center C and focal point F.] [ WP:Curved mirror#Concave mirrors]

concave residue: The set difference between a shape and its convex hull. For a convex shape, the concave residue is empty. Some shapes (in black) and their concave residues (in gray) are shown here: [Figure: example shapes and their concave residues.]

concavity: Loosely, a depression, dent, hollow or hole in a shape or surface. More precisely, a connected component of a shape's concave residue.

concavity tree: A hierarchical description of an object in the form of a tree. The concavity tree of a shape has the convex hull of its shape as the parent node and the concavity trees of its concavities as the child nodes. These are subtracted from the parent shape to give the original object. The concavity tree of a convex shape is the shape itself. The concavity tree of the gray shape is shown below [ ERD:6.6]: [Figure: a shape S with concavities S1, S2, S3 (containing S31 and S32, with S31 containing S311) and S4 (containing S41), and the corresponding tree.]

concurrence matrix: See co-occurrence matrix. [ RJS:6]

condensation tracking: Conditional density propagation tracking. The particle filter technique applied by Blake and Isard to edge tracking. A framework for object tracking with multiple simultaneous hypotheses that switches between multiple continuous autoregressive process motion models according to a discrete transition matrix. Using importance sampling it is possible to keep only the N strongest hypotheses.

condenser lens: An optical device used to collect light over a wide angle and produce a collimated output beam.

conditional dilation: A binary image operation that is a combination of the dilation operator and a logical AND operation with a mask, that only allows dilation into pixels that belong to the mask. This process can be described by the formula $\mathrm{dilate}(X, J) \cap M$, where $X$ is the original image, $M$ is the mask and $J$ is the structuring element.

conditional distribution: A distribution of one variable given the values of one or more other variables. [ WP:Conditional probability distribution]

conditional replenishment: A method for coding of video signals, where only the portion of a video image that has changed since the previous frame is transmitted. Effective for sequences with largely stationary backgrounds, but more complex sequences require more sophisticated algorithms that perform motion compensation. [ WP:MPEG-1#Motion vectors]

conformal mapping: A function from the complex plane to itself, $f : \mathbb{C} \to \mathbb{C}$, that preserves local angles. For example, the complex function $y = \sin(z) = \frac{1}{2i}(e^{iz} - e^{-iz})$ is conformal. [ WP:Conformal map]

conic: Curves arising from the intersection of a cone with a plane (also called conic sections). This is a family of curves including the circle, ellipse, parabola and hyperbola. The general form for a conic in 2D is $ax^2 + bxy + cy^2 + dx + ey + f = 0$. Some example conics are [ JKS:6.6]: [Figure: a circle, an ellipse, a parabola and a hyperbola.]

conic fitting: The fitting of a geometric model of a conic section $ax^2 + bxy + cy^2 + dx + ey + f = 0$ to a set of data points $\{(x_i, y_i)\}$. Special cases include fitting circles and ellipses. [ JKS:6.6]
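A minimal sketch of algebraic conic fitting as in the entry above: find the parameter vector (a, b, c, d, e, f), normalized to unit length, that minimizes the algebraic residual in the least-squares sense via the SVD. A shape-specific direct method (e.g., direct ellipse fitting) would impose a different constraint; this generic version is an assumption for illustration:

    import numpy as np

    def fit_conic(x, y):
        # Design matrix rows: [x^2, xy, y^2, x, y, 1] per data point.
        D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
        # Right singular vector of the smallest singular value minimizes
        # ||D p|| subject to ||p|| = 1 (avoids the trivial all-zero solution).
        _, _, vt = np.linalg.svd(D)
        return vt[-1]

    t = np.linspace(0, 2 * np.pi, 50)
    x, y = 3 * np.cos(t) + 1, 2 * np.sin(t) - 1   # noise-free points on an ellipse
    a, b, c, d, e, f = fit_conic(x, y)
    print(b * b - 4 * a * c < 0)                  # negative discriminant => ellipse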
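And a sketch of the conditional dilation formula above, dilate(X, J) ∩ M, using SciPy's binary morphology; iterating until convergence (the loop shown is an assumed application, not part of the definition) reconstructs the part of the mask M reachable from seeds in X:

    import numpy as np
    from scipy.ndimage import binary_dilation

    def conditional_dilation(X, M, J=np.ones((3, 3), bool)):
        # One step: dilate X by structuring element J, then AND with mask M.
        return binary_dilation(X, structure=J) & M

    M = np.zeros((7, 7), bool); M[1:6, 1:6] = True   # mask
    X = np.zeros((7, 7), bool); X[3, 3] = True       # seed
    while True:
        Y = conditional_dilation(X, M)
        if (Y == X).all():
            break
        X = Y
    print(X.sum())   # 25: the masked region reachable from the seed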
conic invariant: An invariant of a conic section. If the conic is in canonical form $ax^2 + bxy + cy^2 + dx + ey + f = 0$ with $a^2 + b^2 + c^2 + d^2 + e^2 + f^2 = 1$, then the two invariants to rotation and translation are functions of the eigenvalues of the leading quadratic form matrix $A = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$. For example, the trace and determinant are invariants that are convenient to compute. For an ellipse, the eigenvalues are functions of the radii. The only invariant to affine transformation is the class of the conic (hyperbola, ellipse, parabola, etc.). The invariant to projective transformation is the set of signs of the eigenvalues of the $3 \times 3$ matrix representing the conic in homogeneous coordinates.

conical mirror: A mirror in the shape of (possibly part of) a cone. It is particularly useful for robot navigation since a camera placed facing the apex of the cone, aligning the cone's axis and the optical axis, and oriented towards its base can have a full 360° view. Conical mirrors were used in antiquity to produce cipher images known as anamorphoses.

conjugate direction: Optimization scheme where a set of independent directions are identified on the search space. A pair of vectors $\vec{u}$ and $\vec{v}$ are conjugate with respect to matrix $A$ if $\vec{u}^{\top} A \vec{v} = 0$. A conjugate direction optimization method is one in which a series of optimization directions are devised that are conjugate with respect to the normal matrix but do not require the normal matrix in order for them to be determined.

conjugate gradient: A basic technique of numerical optimization in which the minimum of a numerical target function is found by iteratively descending along non-interfering (conjugate) directions. The conjugate gradient method does not require second derivatives and can find the optima of an $N$ dimensional quadric form in $N$ iterations. By comparison, a Newton method requires one iteration and gradient descent can require an arbitrarily large number of iterations. [ WP:Conjugate gradient method]

connected component labeling: 1) A standard graph problem. Given a graph consisting of nodes and arcs, the problem is to identify nodes forming a connected set. A node is in a set if it has an arc connecting it to another node in the set. 2) Connected component labeling is used in binary and gray scale image processing to join together neighboring pixels into regions. There are several efficient sequential algorithms for this procedure. In this image, the pixels in each connected component have a different color [ JKS:2.5.2]: [Figure: a binary image with each connected component shown in a different color.]

connectivity: See pixel connectivity. [ JKS:2.5.1]

conservative smoothing: A noise filtering technique whose name derives from the fact that it employs a fast filtering algorithm that sacrifices noise suppression power to preserve the image detail. A simple form of conservative smoothing replaces a pixel that is larger (smaller) than its 8-connected neighbors by the largest (smallest) value amongst those neighbors. This process works well with impulse noise but is not as effective with Gaussian noise.

constrained least squares: It is sometimes useful to minimize $\|A\vec{x} - \vec{b}\|^2$ over some subset of possible solutions $\vec{x}$ that are predetermined. For example, one may already know the function values at certain points on the parameterized curve. This leads to an equality constrained version of the least squares problem, stated as: minimize $\|A\vec{x} - \vec{b}\|^2$ subject to $B\vec{x} = \vec{c}$. There are several approaches to the solution of this problem such as QR factorization and the SVD. As an example, this regression technique can be useful in least squares surface fitting where the plane described by $\vec{x}$ is constrained to be perpendicular to some other plane.

constrained matching: A generic term for recognition approaches where two objects are compared under a constraint on either or both. One example of this would be a search for moving vehicles under 20 feet in length.

constrained optimization: Optimization of a function $f$ subject to constraints on the parameters of the function. The general problem is to find the $x$ that minimizes (or maximizes) $f(x)$ subject to $g(x) = 0$ and $h(x) \geq 0$, where the functions $f, g, h$ may all take vector-valued arguments, and $g$ and $h$ may also be vector-valued, encoding multiple constraints to be satisfied. Optimization subject to equality constraints is achieved by the method of Lagrange multipliers. Optimization of a quadratic form subject to equality constraints results in a generalized eigensystem. Optimization of a general $f$ subject to general $g$ and $h$ may be achieved by iterative methods, most notably sequential quadratic programming. [ WP:Constraint optimization]

constraint satisfaction: An approach to problem solving that consists of three components: 1) a list of what variables need values, 2) a set of allowable values for each variable and 3) a set of relationships that must hold between the values for each variable (i.e., the constraints). For example, in computer vision, this approach has been used for different structure labelling (e.g., line labelling, region labelling) and geometric model recovery tasks (e.g., reverse engineering of 3D parts or buildings from range data). [ WP:Constraint satisfaction]

constructive solid geometry (CSG): A method for defining 3D shapes in terms of a mathematically defined set of primitive shapes. Boolean set theoretic operations of intersection, union and difference are used to combine shapes to make more complex shapes. For example [ JKS:15.3.2]: [Figure: one primitive shape subtracted from another to give a new composite shape.]

content based image retrieval: Image database searching methods that produce matches based on the contents of the images in the database, as contrasted with using text descriptors to do the indexing. For example, one can use descriptors based on color moments to select images with similar invariants. [ WP:Content-based image retrieval]

context: In vision, the elements, information, or knowledge occurring together with or accompanying some data, contributing to the data's full meaning. For example, in a video sequence one can speak of spatial context of a pixel, indicating the intensities at surrounding locations in a given frame (image), or of temporal context, indicating the intensities at that pixel location (same coordinates) but in previous and following frames. Information deprived of appropriate context can be ambiguous: for instance, differential optical flow methods can only estimate the normal flow; the full flow can be estimated considering the spatial context of each pixel. At the level of scene understanding, knowing that the image data comes from a theater performance provides context information that can help distinguish between a real fight and a stage act. [ DH:2.11]

contextual image classification: Algorithms that take into account the source or setting of images in their search for features and relationships in the image. Often this context is composed of region identifiers, color, topology and spatial relationships as well as task-specific knowledge.

contextual method: Algorithms that take into account the spatial arrangement of found features in their search for new ones.

continuous convolution: The convolution of two continuous signals. In 2D image processing terms the convolution of two images $f$ and $h$ is: $g(x, y) = f(x, y) * h(x, y) = \iint f(u, v)\, h(x - u, y - v)\, du\, dv$

continuous Fourier transform: See Fourier transform. [ NA:2.3]

continuous learning: A general term describing how a system continually updates its model of a process based on current data. For example, updating a background model (for change detection) as the illumination changes during the day.

contour analysis: Analysis of outlines of image regions.

contour following: See contour linking. [ DH:7.7]

contour grouping: See contour linking.

contour length: The length of a contour in appropriate units of measurement. For instance, the length of an image contour in pixels. See also arc length. [ WP:Arc length]

contour linking: Edge detection or boundary detection processes typically identify pixels on the boundary of a region. Connecting these pixels to form a curve is the goal of contour linking.

contour matching: See curve matching.

contour partitioning: See curve segmentation.

contour representation: See boundary representation.

contour tracing: See contour linking.

contour tracking: See contour linking.

contours: See object contour. [ FP:19.2.1]

contrast: 1) The difference in brightness values between two structures, such as regions or pixels. 2) A texture measure. In a gray scale image, contrast, $C$, is defined as $C = \sum_i \sum_j (i - j)^2 P[i, j]$, where $P$ is the gray-level co-occurrence matrix. [ JKS:7.2]

contrast enhancement: Contrast enhancement (also known as contrast stretching) expands the distribution of intensity values in an image so that a larger range of sensitivity in the output device can be used. This can make subtle changes in an image more obvious by increasing the displayed contrast between image brightness levels. Histogram equalization is one method of contrast enhancement. An example of contrast enhancement is here: [Figure: an input image and the same image after contrast enhancement.]

contrast stretching: See contrast enhancement. [ ERD:2.2.1]

control strategy: The guidelines behind the sequence of processes performed by an automatic image analysis or scene understanding system. For instance, control can be top-down (searching for image data that verifies an expected target) or bottom-up (progressively acting on image data or results to derive hypotheses). The control strategy may allow selection of alternative hypotheses, processes or parameter values, etc.

convex hull: Given a set of points, S, the convex hull is the smallest convex set that contains S. A 2D example is shown here [ ERD:6.6]: [Figure: an object and its convex hull.]

convexity ratio: Also known as solidity. A measure that characterizes deviations from convexity. The ratio for shape $X$ is defined as $\mathrm{area}(X)/\mathrm{area}(C_X)$, where $C_X$ is the convex hull of $X$. A convex figure has convexity factor 1, while all other figures have convexity less than 1.

convolution operator: A widely used general image and signal processing operator that computes the weighted sum $y(j) = \sum_i w(i)\, x(j - i)$, where $w(i)$ are the weights, $x(i)$ is the input signal and $y(j)$ is the result. Similarly, convolutions of image data take the form $y(r, c) = \sum_{i,j} w(i, j)\, x(r - i, c - j)$. Similar forms using integrals exist for continuous signals and images. By the appropriate choice of the weight values, convolution can compute low pass/smoothing, high pass/differentiation filtering or template matching/matched filtering, as well as many other linear functions. The right image below is the result of convolving (and then inverting) the left image with a $(+1\ \ -1)$ mask [ FP:7.1.1]: [Figure: an image and the inverted result of convolving it with a +1 -1 mask.]

co-occurrence matrix: A representation commonly used in texture analysis algorithms. It records the likelihood (usually empirical) of two features or properties being at a given position relative to each other. For example, if the center of the matrix M is position (a, b) then the likelihood that the given property is observed at an offset (i, j) from the current pixel is given by matrix value M(a + i, b + j). [ WP:Co-occurrence matrix]

cooperative algorithm: An algorithm that solves a problem by a series of local interactions between adjacent structures, rather than some global process that has access to all data. The value at a structure changes iteratively in response to changing values at the adjacent structures, such as pixels, lines, regions, etc. The expectation is that the process will converge to a good solution. The algorithms are well suited for massive local parallelism (e.g., SIMD), and are sometimes proposed as models for human image processing. An early algorithm to solve the stereo correspondence problem used cooperative processing between elements representing the disparity at a given picture element.

coordinate system: A spanning set of linearly independent vectors defining a vector space. One example is the set generally referred to as the X, Y and Z axes. There are, of course, an infinite number of sets of three linearly independent vectors describing 3D space. The right-handed version of this is shown in the figure. [ FP:2.1.1] [Figure: a right-handed coordinate system with X, Y and Z axes.]

coordinate system transformation: A geometric transformation that maps points, vectors or other structures from one coordinate system to another. It is also used to express the relationship between two coordinate systems. Typical transformations include translation and rotation. See also Euclidean transformation. [ WP:Coordinate system#Transformations]

coplanarity: The property of lying in the same plane. For example, three vectors $\vec{a}$, $\vec{b}$ and $\vec{c}$ are coplanar if their scalar triple product $(\vec{a} \times \vec{b}) \cdot \vec{c}$ is zero. [ WP:Coplanarity]

coplanarity invariant: A projective invariant that allows one to determine when five corresponding points observed in two (or more) views are coplanar in the 3D space. The five points allow the construction of a set of four collinear points whose cross ratio value can be computed. If the five points are coplanar, then the cross ratio value must be the same in the two views. Here, point A is selected and the lines AB, AC, AD and AE are used to define an invariant cross ratio for any line L that intersects them: [Figure: points A, B, C, D, E with lines from A intersected by a line L.]

core line: See medial line.

corner detection: See curve segmentation. [ NA:4.6]

corner feature detectors: See interest point feature detectors and curve segmentation. [ NA:4.6.4]

coronary angiography: A class of image processing techniques (usually based on X-ray data) for visualizing and inspecting the blood vessels surrounding the heart (coronaries). See also angiography. [ WP:Coronary catheterization]

correlation: See cross correlation. [ OF:6.4]

correlation based optical flow estimation: Optical flow estimated by correlating local image texture at each point in two or more images and noting their relative movement.

correlation based stereo: Dense stereo reconstruction (i.e., at every pixel) computed by cross correlating local image neighborhoods in the two images to find corresponding points, from which depth can be computed by stereo triangulation.

correspondence constraint: See stereo correspondence constraint.

correspondence problem: See stereo correspondence problem. [ JKS:11.2]

cosine diffuser: Optical correction mechanism for correcting spatial responsivity to light. Since off-angle light is treated with the same response as normal light, a cosine transfer is used to decrease the relative responsivity to it.

cosine transform: Representation of a signal in terms of a basis of cosine functions. For an even 1D function $f(x)$, the cosine transform is
$$F(u) = 2\int_0^{\infty} f(x) \cos(2\pi u x)\, dx.$$
For a sampled signal $f_{0..(n-1)}$, the discrete cosine transform is the vector $b_{0..(n-1)}$ where, for $k \geq 1$:
$$b_0 = \sqrt{\frac{1}{n}} \sum_{i=0}^{n-1} f_i$$
$$b_k = \sqrt{\frac{2}{n}} \sum_{i=0}^{n-1} f_i \cos\left(\frac{(2i + 1)\pi k}{2n}\right)$$
For a 2D signal $f(x, y)$ the cosine transform $F(u, v)$ is
$$F(u, v) = 4\int_0^{\infty}\!\!\int_0^{\infty} f(x, y) \cos(2\pi u x) \cos(2\pi v y)\, dx\, dy$$
[ SEU:2.5.2]

cost function: The function or metric quantifying the cost of a certain action, move or configuration, that is to be minimized over a given parameter space. A key concept of optimization. See also Newton's optimization method and functional optimization. [ HZ:3.2]

covariance: The covariance, denoted $\sigma^2$, of a random variable $X$ is the expected value of the square of the deviation of the variable from the mean. If $\mu$ is the mean, then $\sigma^2 = E[(X - \mu)^2]$. For a $d$-dimensional data set represented as a set of $n$ column vectors $\vec{x}_{1..n}$, the sample mean is $\vec{\mu} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i$, and the sample covariance is the $d \times d$ matrix $\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^{\top}$.

covariance propagation: A method of statistical error analysis, in which the covariance of a derived variable can be estimated from the covariances of the variables from which it is derived. For example, assume that independent variables $\vec{x}$ and $\vec{y}$ are sampled from multi-variate normal distributions with associated covariance matrices $C_x$ and $C_y$. Then, the covariance of the derived variable $\vec{z} = a\vec{x} + b\vec{y}$ is $C_z = a^2 C_x + b^2 C_y$. [ HZ:4.2]

crack code: A contour description method that codes not the pixels themselves but the cracks between them. This is done as a four-directional scheme as shown below. It can be viewed as a chain code with four directions rather than eight. [ WP:Chain code] [Figure: the four crack directions 0-3 and an example contour with crack code = { 2, 2, 1, 2, 3, 2 }.]

crack edge: A type of edge used in line labeling research to represent where two aligned blocks meet. Here, neither a step edge nor fold edge is seen: [Figure: two aligned blocks meeting at a crack edge.]

crack following: Edge tracking on the dual lattice or cracks between pixels based on the continuous segments of line from a crack code. [ DH:2.7.2]

Crimmins smoothing operator: An iterative algorithm for speckle (salt-and-pepper noise) reduction. It uses a nonlinear noise reduction technique that compares the intensity of each image pixel with its eight neighbors and either increments or decrements the value to try and make it more representative of its surroundings. The algorithm raises the intensity of pixels that are darker relative to their neighbors and lowers pixels that are relatively brighter. More iterations produce more reduction in noise but at the cost of increased blurring of detail.

critical motion: In the problem of self-calibration of a moving camera, there are certain motions for which calibration algorithms fail to give unique solutions. Sequences for which self-calibration is not possible are known as critical motion sequences.

cross correlation: Standard method of estimating the degree to which two series are correlated. Given two series $\{x_i\}$ and $\{y_i\}$, where $i = 0, 1, 2, .., (N - 1)$, the cross correlation $r_d$ at a delay $d$ is defined as
$$r_d = \frac{\sum_i (x_i - m_x)(y_{i-d} - m_y)}{\sqrt{\sum_i (x_i - m_x)^2}\ \sqrt{\sum_i (y_{i-d} - m_y)^2}}$$
where $m_x$ and $m_y$ are the means of the corresponding sequences. [ EH:11.3.4]
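A direct NumPy transcription of the cross correlation formula above; the circular indexing of y at delay d is an assumption made to keep the example short:

    import numpy as np

    def cross_correlation(x, y, d):
        # r_d: correlate x_i with y_{i-d}, mean-removed and normalized.
        x, y = np.asarray(x, float), np.asarray(y, float)
        yd = np.roll(y, d)                     # y_{i-d}, with circular wrap-around
        a, b = x - x.mean(), yd - yd.mean()
        return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

    t = np.linspace(0, 4 * np.pi, 200)
    s = np.sin(t)
    print(round(cross_correlation(s, np.roll(s, -5), 5), 3))   # ~1.0: aligned at delay 5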
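The discrete convolution sum in the convolution operator entry above can also be written out directly; this naive sketch (zero values assumed outside the image) trades speed for clarity compared with FFT-based implementations:

    import numpy as np

    def convolve2d(x, w):
        # y(r, c) = sum_{i,j} w(i, j) x(r - i, c - j), with x zero outside its bounds.
        H, W = x.shape
        y = np.zeros_like(x, dtype=float)
        for i in range(w.shape[0]):
            for j in range(w.shape[1]):
                shifted = np.zeros_like(y)
                shifted[i:, j:] = x[:H - i, :W - j]   # x(r - i, c - j)
                y += w[i, j] * shifted
        return y

    img = np.random.rand(5, 5)
    smooth = convolve2d(img, np.full((3, 3), 1 / 9))   # low pass / smoothing
    edge = convolve2d(img, np.array([[1.0, -1.0]]))    # the +1 -1 mask from the entry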
cross correlation matching: Matching based on the cross correlation of two sets. The closer the correlation is to 1, the better the match is. For example, in correlation based stereo, for each pixel in the first image, the corresponding pixel in the second image is the one with the highest correlation score, where the sets being matched are the local neighborhoods of each pixel. [ NA:5.3.1]

cross ratio: The simplest projective invariant. It generates a scalar from four points of any 1D projective space (e.g., a projective line). The cross ratio for the four points ABCD below is [ FP:13.1]:
$$\frac{(r + s)(s + t)}{s(r + s + t)}$$
[Figure: four collinear points A, B, C, D separated by distances r, s, t, seen from two centers of projection.]

cross section function: Part of the generalized cylinder representation that gives a volumetric based representation of an object. The representation defines the volume by a curved axis, a cross section and a cross section function at each point on that axis. The cross section function defines how the size or shape of the cross section varies as a function of its position along the axis. See also generalized cone. This example shows how the size of the square cross section varies along a straight line to create a truncated pyramid: [Figure: a cross section, axis and cross section function defining a truncated pyramid.]

cross-validation: A test of how well a model generalizes to other data (i.e., using samples other than those that were used to create the model). This approach can be used to determine when to stop training/learning, before over-generalization occurs. See also leave-one-out test. [ FP:16.3.5]

crossing number: The crossing number of a graph is the minimum number of arc intersections in any drawing of that graph. A planar graph has crossing number zero. This graph has a crossing number of one [ ERD:6.8.1]: [Figure: a graph drawn with exactly one arc crossing.]

CSG: See constructive solid geometry. [ BT:8]

CT: See X-ray CAT. [ WP:X-ray computed tomography]

cumulative histogram: A histogram where the bin contains not only the count of all instances having that value but also the count of all bins having a lower index value. This is the discrete equivalent of the cumulative probability distribution. The right figure is the cumulative histogram corresponding to the normal histogram on the left: [ WP:Histogram#Cumulative histogram] [Figure: a normal histogram and the corresponding cumulative histogram.]

currency verification: Algorithms for checking that printed money and coinage are genuine. A specialist field involving optical character recognition.

curse of dimensionality: The exponential growth of possibilities as a function of dimensionality. This might manifest as several effects as the dimensionality increases: 1) the increased amount of computational effort required, 2) the exponentially increasing amount of data required to populate the data space in order that training works and 3) how all data points tend to become equidistant from each other, thus causing problems for clustering and machine learning algorithms. [ WP:Curse of dimensionality]

cursive script recognition: Methods of optical character recognition whereby hand-written cursive (also called joined-up) characters are automatically classified. [ BM:5.2]

curvature: Usually meant to refer to the change in shape of a curve or surface. Mathematically, the curvature of a curve is the length of the second derivative $\left| \frac{\partial^2 \vec{x}(s)}{\partial s^2} \right|$ of the curve $\vec{x}(s)$ parameterized as a function of arc length $s$. A related definition holds for surfaces, only here there are two distinct principal curvatures at each point on a sufficiently smooth surface. [ NA:4.6]

curvature primal sketch: A multi-scale representation of the significant changes in curvature along a planar curve. [ NA:4.8]

curvature scale space: A multi-scale representation of the curvature zero-crossing points of a planar contour as it evolves during smoothing. It is found by parameterizing the contour using arc length, which is then convolved with a Gaussian filter of increasing standard deviation. Curvature zero-crossing points are then recovered and mapped to the scale-space image with the horizontal axis representing the arc length parameter on the original contour and the vertical axis representing the standard deviation of the Gaussian filter. [ WP:Curvature Scale Space]

curvature sign patch classification: A method of local surface classification based on its mean and Gaussian curvature signs, or principal curvature sign class. See also mean and Gaussian curvature shape classification.

curve: A set of connected points in 2D or 3D, where each point has at most two neighbors. The curve could be defined by a set of connected points, by an implicit function (e.g., $y + x^2 = 0$), by an explicit form (e.g., $(t, t^2)$ for all $t$), or by the intersection of two surfaces (e.g., by intersecting the planes $X = 0$ and $Y = 0$), etc. [ NA:4.6.2]

curve binormal: The vector perpendicular to both the tangent and normal vectors to a curve at any given point: [Figure: a curve with tangent, normal and binormal vectors at a point.]

curve bitangent: A line tangent to a curve or surface at two different points, as illustrated here: [ WP:Bitangent] [Figure: a curve with inflection points, bitangent points and the bitangent line.]

curve evolution: A curve abstraction method whereby a curve can be iteratively simplified, as in this example: [Figure: successive evolution stages of a polygonal curve.] For example, a relevance measure is assigned to every vertex in the curve. The least important can be removed at each iteration by directly connecting its neighbors. This elimination is repeated until the desired stage of abstraction is reached. Another method of curve evolution is to progressively smooth the curve with Gaussian weighting of increasing standard deviation.

curve fitting: Methods for finding the parameters of a best-fit curve through a set of 2D (or 3D) data points. This is often posed as a minimization of the least-squares error between some hypothesized curve and the data points. If the curve, $y(x)$, can be thought of as the sum of a set of $m$ arbitrary basis functions $X_k$ and written
$$y(x) = \sum_{k=1}^{m} a_k X_k(x)$$
then the unknown parameters are the weights $a_k$. The curve fitting process can then be considered as the minimization of some log-likelihood function giving the best fit to $N$ points whose Gaussian error has standard deviation $\sigma_i$. This function may be defined as
$$\chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - y(x_i)}{\sigma_i} \right)^2$$
The weights that minimize this can be found from the design matrix $D$,
$$D_{i,j} = \frac{X_j(x_i)}{\sigma_i}$$
by finding the solution to the linear equation
$$D\vec{a} = \vec{r}$$
where the vector $r_i = \frac{y_i}{\sigma_i}$. [ NA:4.6.2]

curve inflection: A point on a curve where the curvature is zero as it changes sign from positive to negative, as in the two examples below [ FP:19.1.1]: [Figure: a curve with two inflection points; the bitangent points and bitangent line are also marked.]
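A compact NumPy version of the design-matrix formulation in the curve fitting entry above: solve D a = r in the least-squares sense. The polynomial basis chosen here is just one possible set of basis functions X_k:

    import numpy as np

    def fit_curve(x, y, sigma, basis):
        # Design matrix D_ij = X_j(x_i) / sigma_i and target r_i = y_i / sigma_i.
        D = np.column_stack([Xk(x) for Xk in basis]) / sigma[:, None]
        r = y / sigma
        a, *_ = np.linalg.lstsq(D, r, rcond=None)   # minimizes chi^2 = ||D a - r||^2
        return a

    basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x ** 2]
    x = np.linspace(-1, 1, 40)
    y = 2 - x + 3 * x ** 2 + 0.01 * np.random.randn(40)
    print(fit_curve(x, y, sigma=np.full(40, 0.01), basis=basis))  # ~ [2, -1, 3]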
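The cumulative histogram entry above is essentially a running sum of bin counts, one line of NumPy; the toy data here is assumed purely for illustration:

    import numpy as np

    counts, edges = np.histogram([1, 2, 2, 3, 3, 3, 4, 5], bins=5, range=(0.5, 5.5))
    cumulative = np.cumsum(counts)   # bin k now counts all values up to its upper edge
    print(counts, cumulative)        # [1 2 3 1 1] [1 3 6 7 8]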
curve invariant: Measures taken over a curve that remain invariant under certain transformations, e.g., arc length and curvature are invariant under Euclidean transformations.

curve invariant point: A point on a curve that has a geometric property that is invariant to changes in projective transformation. Thus, the point can be identified and used for correspondence in multiple views of the same scene. Two well known planar curve invariant points are curvature inflection points and bitangent points, as shown here: [Figure: inflection points, bitangent points and the bitangent line on a planar curve.]

curve matching: The comparison of data sets to previously modeled curves or other curve data sets. If a modeled curve closely corresponds to a data set then an interpretation of similarity can be made. Curve matching differs from curve fitting in that curve fitting involves minimizing the parameters of theoretical models rather than actual examples.

curve normal: The vector perpendicular to the tangent vector to a curve at any given point and that also lies in the plane that locally contains the curve at that point: [Figure: a curve with tangent, normal and binormal vectors at a point.]

curve representation system: Methods of representing or modeling curves parametrically. Examples include: b-splines, crack codes, cross section functions, Fourier descriptors, intrinsic equations, polycurves, polygonal approximations, radius vector functions, snakes, splines, etc. [ JKS:6.1]

curve saliency: A voting method for the detection of curves in a 2D or 3D image. Each pixel is convolved with a curve mask to build a saliency map. This map will hold high values for locations in space where likely candidates for curves exist.

curve segmentation: Methods of identifying and splitting curves into different primitive types. The location of changes between one primitive type and another is particularly important. For example, a good curve segmentation algorithm should detect the four lines that make up a square. Methods include: corner detection, Lowe's method and recursive splitting.

curve smoothing: Methods for rounding polygon approximations or vertex-based approximations of surface boundaries. Examples include Bezier curves in 2D and NURBS in 3D. See also curve evolution. An example of a polygonal data curve smoothed by a Bezier curve is: [Figure: a polygonal data curve and the smoothed (Bezier) curve.]

curve tangent vector: The vector that is instantaneously parallel to a curve at any given point: [Figure: a curve with tangent, normal and binormal vectors at a point.]

cut detection: The identification of the frames in film or video where the camera viewpoint suddenly changes, either to a new viewpoint within the current scene or to a new scene. [ WP:Shot transition detection]

cyclopean view: A term used in stereo image analysis, based on the mythical one-eyed Cyclops. When stereo reconstruction of a scene occurs based on two cameras, one has to consider what coordinate system to use to base the reconstructed 3D coordinates, or what viewpoint to use when presenting the reconstruction. The cyclopean viewpoint is located at the midpoint of the baseline between the two cameras.

cylinder extraction: Methods of identifying the cylinders and the constituent data points from 2.5D and 3D images that are samples from 3D cylinders.

cylinder patch extraction: Given a range image or a set of 3D data points, cylinder patch extraction finds (usually connected) sets of points that lie on the surface of a cylinder, and usually also the equation of that cylinder. This process is useful for detecting and modelling pipework in range images of industrial scenes.

cylindrical mosaic: A photomosaicing approach where individual 2D images are projected onto a cylinder. This is possible only when the camera rotates about a single axis or the camera center of projection remains approximately fixed with respect to the distance to the nearest scene points.

cylindrical surface region: A region of a surface that is locally cylindrical. A region in which all points have zero Gaussian curvature, and nonzero mean curvature.
D
darkfield illumination: A specialized illumination technique that uses oblique illumination to enhance contrast in subjects that are not imaged well under normal illumination conditions. [ LG:2.1.1]

data fusion: See sensor fusion. [ WP:Data fusion]

data integration: See sensor fusion. [ WP:Data integration]

data parallelism: Reference to the parallel structuring of either the input to programs, the organization of programs themselves or the programming language used. Data parallelism is a useful model for much image processing because the same operation can be applied independently and in parallel at all pixels in the image. [ RJS:8]

data reduction: A general term for processes that 1) reduce the number of data points, e.g., by subsampling or by using cluster centers of mass as representative points or by decimation, or 2) reduce the number of dimensions in each data point, e.g., by projection or principal component analysis (PCA). [ WP:Data reduction]

data structure: A fundamental concept in programming: a collection of computer data organized in a precise structure, for instance a tree (see for instance quadtree), a queue, or a stack. Data structures are accompanied by sets of procedures, or libraries, implementing various types of data manipulation, for instance storage and indexing. [ WP:Data structure]

DCT: See discrete cosine transform. [ SEU:2.5.2]
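Tying the DCT cross-reference above back to the formulas given under cosine transform, here is a direct NumPy transcription; an FFT-based routine computes the same values far faster:

    import numpy as np

    def dct(f):
        # b_0 = sqrt(1/n) sum f_i;  b_k = sqrt(2/n) sum f_i cos((2i+1) pi k / 2n)
        n = len(f)
        i = np.arange(n)
        b = np.array([np.sqrt(2.0 / n) *
                      np.sum(f * np.cos((2 * i + 1) * np.pi * k / (2 * n)))
                      for k in range(n)])
        b[0] = np.sqrt(1.0 / n) * np.sum(f)
        return b

    f = np.array([1.0, 2.0, 3.0, 4.0])
    print(dct(f))   # matches scipy.fft.dct(f, norm='ortho')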
deblur: To remove the effect of a known blurring function on an image. If an observed image $I$ is the convolution of an unknown image $I^*$ and a known blurring kernel $B$, so that $I = I^* * B$, then deblurring is the process of computing $I^*$ given $I$ and $B$. See deconvolution, image restoration, Wiener filtering.

decentering distortion (lens): Lens decentering is a common cause of tangential distortion. It arises when the lens elements are not perfectly aligned and creates an asymmetric component to the distortion. [ WP:Distortion (optics)#Software correction]

decimation: 1) In digital signal processing, a filter that keeps one sample out of every $N$, where $N$ is a fixed number. See also subsampling. 2) Mesh decimation: merging of similar adjacent surface patches or mesh vertices in order to reduce the size of a model. Often used as a processing step when deriving a surface model from a range image. [ WP:Decimation (signal processing)]

decision tree: Tools for helping to choose between several courses of action. They are an effective structure within which an agent can search options and investigate the possible outcomes. They also help to balance the risks and rewards associated with each possible course of action. [ WP:Decision tree] [Figure: a decision tree alternating rule nodes, decisions made and results.]

decoding: Converting a signal that has been encoded back into its original form (lossless coding) or into a form close to the original (lossy coding). See also image compression. [ WP:Decoding]

decomposable filters: A complex filter that can be applied as a number of simpler filters applied one after the other. For example the 2D Laplacian of Gaussian filter can be decomposed into four simpler filters.

deconvolution: The inverse process of convolution. Deconvolution is used to remove certain signals (for example blurring) from images by inverse filtering (see deblur). For a convolution producing image $h = f * g + \epsilon$, where $f$ and $g$ are the image and convolution mask, $\epsilon$ is the noise and $*$ is the convolution, deconvolution attempts to estimate $f$. Deconvolution is often an ill-posed problem and may not have a unique solution. See also image restoration. [ AL:14.5]

defocus: Blurring of an image, either accidental or deliberate, by incorrect focus or viewpoint parameter use or estimation. See also shape from focus, shape from defocus. [ BKPH:6.10]

defocus blur: Deformation of an image due to the predictable behavior of optics when incorrectly adjusted. The blurring is the result of light rays that, after entering the optical system, misconverge on the imaging plane. If the camera parameters are known in advance, the blurring can be partially corrected. [ BKPH:6.10]

deformable model: Object descriptors that model a specific class of deformable objects (e.g., eyes, hands) where the shapes vary according to the values of the parameters. If the general, but not specific, characteristics of an object type are known then a deformable model can be constructed and used as a matching template for new data. The degree of deformation needed to match the shape can be used as a matching score. See also modal deformable model, geometric deformable model. [ WP:Active contour model]

deformable shape: See deformable model.

deformable superquadric: A type of superquadric volumetric model that can be deformed by bending, twisting, etc. in order to fit to the data being modeled.

deformable template model: See deformable model.

deformation energy: The metric that must be minimized when determining an active shape model. Comprised of terms for both internal energy (or force) arising from the model shape deformation and external energy (or force) arising from the discrepancy between the model shape and the data. [ WP:Internal energy#Description and definition]

degradation: A loss of quality suffered by an image, the content of which gets corrupted by unwanted processes. For instance, MPEG compression-decompression can alter some intensities, so that the image is degraded. See also JPEG image compression, image noise. [ WP:Degradation (telecommunications)]

degree of freedom: A free variable in a given function. For instance, rotations in 3D space depend on three angles, so that a rotation matrix has nine entries but only three degrees of freedom. [ VSN:3.1.3]

Delaunay triangulation: The Delaunay graph of the point set can be constructed from its Voronoi diagram by connecting the points in adjacent polygons. The connections form the Delaunay triangulation. The triangulation has the property that the circumcircle of every triangle contains no other points. The approach can be used to construct a polyhedral surface approximation from a set of 3D sample points. The solid lines connecting the points below are the Delaunay triangulation and the dashed lines are the boundaries of the Voronoi diagram. [ OF:10.4.4] [Figure: a point set with its Delaunay triangulation (solid lines) and Voronoi boundaries (dashed lines).]

demon: A program that runs in the background, for instance performing checks or guaranteeing the correct functioning of a module of a complex system. [ WP:Daemon (computer software)]
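A quick way to reproduce the construction in the Delaunay triangulation entry above is SciPy's wrapper around Qhull; the random point set is assumed for illustration:

    import numpy as np
    from scipy.spatial import Delaunay

    pts = np.random.rand(20, 2)      # 20 random 2D points
    tri = Delaunay(pts)
    print(tri.simplices[:3])         # first three triangles, as indices into pts
    # Delaunay property: the circumcircle of each triangle contains no other input point.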
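For the deblur/deconvolution entries, a minimal frequency-domain inverse filter under the model h = f * g + noise; circular convolution is assumed, and the small constant guarding against division by zero is a crude stand-in for proper Wiener filtering:

    import numpy as np

    def inverse_filter(h, g, eps=1e-3):
        # Estimate f from h = f * g via a regularized inverse of G.
        H = np.fft.fft2(h)
        G = np.fft.fft2(g, s=h.shape)
        return np.real(np.fft.ifft2(H * np.conj(G) / (np.abs(G) ** 2 + eps)))

    f = np.zeros((32, 32)); f[10:20, 12:18] = 1.0        # "unknown" image
    g = np.outer([0.25, 0.5, 0.25], [0.25, 0.5, 0.25])   # known blurring kernel
    h = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g, s=f.shape)))
    est = inverse_filter(h, g)
    print(round(np.abs(est - f).max(), 3))   # small residual; recovery fails where G ~ 0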
demosaicing: The process of converting a single color per pixel image (as captured by most digital cameras) into a three color per pixel image. [ WP:Demosaicing]

Dempster-Shafer: A belief modeling approach for testing a hypothesis that allows information, in the form of beliefs, to be combined into a plausibility measure for that hypothesis. [ WP:Dempster-Shafer theory]

dense reconstruction: A class of techniques estimating depth at each pixel of an input image or sequence, thus generating a dense sampling of the 3D surfaces imaged. This can be achieved, for instance, by range sensing or stereo vision.

dense stereo matching: A class of methods establishing the correspondence (see stereo correspondence problem) between all pixels in a stereo pair of images. The generated disparity map can then be used for depth estimation.

densitometry: A class of techniques that estimate the density of a material from images, for instance bone density in the medical domain (bone densitometry). [ WP:Densitometry]

depth: Distance of scene points from either the camera center or the camera imaging plane. In a range image, the intensity value in the image is a measure of depth. [ JKS:13.1]

depth estimation: The process of estimating the distance between a sensor (e.g., a stereo pair) and a part of the scene being imaged. Stereo vision and range sensing are two well-known ways to estimate depth.

depth from defocus: The depth from defocus method uses the direct relationships among the depth, camera parameters and the amount of blurring in images to derive the depths from parameters that can be directly measured.

depth from focus: A method to determine distance to one point by taking many images in better and better focus. This is also called autofocus or software focus. [ WP:Depth of focus]

depth image: See range image. [ JKS:11]

depth image edge detector: See range image edge detector.

depth map: See range image. [ JKS:11]

depth of field: The distance between the nearest and the farthest point in focus for a given camera [ JKS:8.3]: [Figure: a camera with the nearest and furthest points in focus delimiting the depth of field.]

depth perception: The ability to perceive distances from visual stimuli, for instance motion or stereo vision. [ WP:Depth perception] [Figure: a 3D model observed from two views, View 1 and View 2.]
depth sensor: See range sensor. [ BT:8]

Deriche edge detector: Convolution filter for edge finding similar to the Canny edge detector. Deriche uses a different optimal operator where the filter is assumed to have infinite extent. The resulting convolution filter, $f(x) = A x e^{-\alpha |x|}$, is sharper than the derivative of the Gaussian that Canny uses. See also edge detection.

derivative based search: Numerical optimization methods assuming that the gradient can be estimated. An example is the quasi-Newton approach, that attempts to generate an estimate of the inverse Hessian matrix. This is then used to determine the next iteration point. [Figure: a conjugate gradient search path from a start point.]

DFT: See discrete Fourier transform. [ SEU:2.5.1]

diagram analysis: Syntactic analysis of images of line drawings, possibly with text in a report or other document. This field is closely related to the analysis of visual languages.

dichroic filter: A dichroic filter selectively transmits light of a given wavelength. [ WP:Dichroic filter]

dichromatic model: The dichromatic model states that the light reflected from a surface is the sum of two components, body and interface reflectance. Body reflectance follows Lambert's law. Interface reflectance models highlights. The model has been applied to several computer vision tasks including color constancy, shape recovery and color image segmentation. See also color.

difference image: An image computed as the pixelwise difference of two other images, that is, each pixel in the difference image is the difference between the pixels at the same location in the two input images. For example, in the figure below the right image is the difference of the left and middle images (after adding 128 for display purposes). [ RJS:5] [Figure: two images and their difference image.]

diffeomorphism: A differentiable one-to-one map between manifolds. The map has a differentiable inverse. [ WP:Diffeomorphism]

difference-of-Gaussians operator: A convolution operator used to locate edges in a gray-scale image using an approximation to the Laplacian of Gaussian operator. In 2D the convolution mask is:
$$c_1 e^{-\frac{x^2 + y^2}{2\sigma_1^2}} - c_2 e^{-\frac{x^2 + y^2}{2\sigma_2^2}}$$
where the constants $c_1$ and $c_2$ control the height of the individual Gaussians and $\sigma_1, \sigma_2$ are the standard deviations. [ CS:4.5.4]
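A small sketch of the difference-of-Gaussians mask above, sampled on a grid; choosing c1 and c2 as the usual Gaussian normalizations is an assumption that gives the classic positive-center, negative-surround profile:

    import numpy as np

    def dog_mask(size=9, s1=1.0, s2=1.6):
        # c1, c2 taken as the Gaussian normalizations 1 / (2 pi sigma^2).
        r = np.arange(size) - size // 2
        x, y = np.meshgrid(r, r)
        rr = x * x + y * y
        g = lambda s: np.exp(-rr / (2 * s * s)) / (2 * np.pi * s * s)
        return g(s1) - g(s2)

    mask = dog_mask()
    print(mask[4, 4] > 0, mask[0, 4] < 0)   # positive center, negative surround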
differential geometry: A field of mathematics studying the local derivative-based properties of curves and surfaces, for instance tangent plane and curvature. [ TV:A.5]

differential invariant: Image descriptors that are invariant under geometric transformations as well as illumination changes. Invariant descriptors are generally classified as global invariants (corresponding to object primitives) and local invariants (typically based on derivatives of the image function). The image function is always assumed to be continuous and differentiable. [ WP:Differential invariant]

differential pulse code modulation: A technique for converting an analogue signal to binary by sampling it, expressing the value of the sampled data modulation in binary and then reducing the bit rate by taking account of the fact that consecutive samples do not change much. [ AJ:11.3]

differentiation filtering: See gradient filter.

diffraction: The bending of light rays at the edge of an object or through a transparent medium. The amount by which a ray is bent is dependent on wavelength. [ VSN:2.1.4]

diffraction grating: An array of diffracting elements that has the effect of producing periodic alterations in a wave's phase, amplitude or both. The simplest arrangement is an array of slits (see moire interferometry). [ WP:Diffraction grating] [Figure: a light source and diffraction grating producing light banding at orders m = 0, 1, 2.]

diffuse illumination: Light energy that comes from a multitude of directions, hence not causing significant shading or shadow effects. The opposite of diffuse illumination is directed illumination.

diffuse reflection: Scattering of light by a surface in many directions. Ideal Lambertian diffusion results in the same energy being reflected in every direction regardless of the direction of the incoming light energy. [ WP:Diffuse reflection] [Figure: incoming light reflected from a surface in many directions.]

diffusion smoothing: A technique achieving Gaussian smoothing as the solution of a diffusion equation with the image to be filtered as the initial boundary condition. The advantage is that, unlike repeated averaging, diffusion smoothing allows the construction of a continuous scale space.

digital camera: A camera in which the image sensing surface is made up of individual semiconductor sampling elements (typically one per pixel of the image), and quantized versions of the sensed values are recorded when an image is captured. [ WP:Digital camera]
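A minimal explicit sketch of the diffusion smoothing idea above: iterating the heat equation on the image with a discrete Laplacian. The step count and step size lambda are illustrative choices (lambda <= 0.25 for stability), with total diffusion time relating to the equivalent Gaussian scale:

    import numpy as np

    def diffuse(img, steps=20, lam=0.2):
        I = img.astype(float).copy()
        for _ in range(steps):
            # 4-neighbor discrete Laplacian, borders replicated via np.pad.
            P = np.pad(I, 1, mode='edge')
            lap = P[:-2, 1:-1] + P[2:, 1:-1] + P[1:-1, :-2] + P[1:-1, 2:] - 4 * I
            I += lam * lap   # one explicit diffusion step
        return I

    img = np.zeros((21, 21)); img[10, 10] = 1.0
    out = diffuse(img)       # an impulse spreads into an approximately Gaussian blob
    print(round(out[10, 10], 4))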
digital elevation map: A sampled and quantized map where every point represents a height above a reference ground plane (i.e., the elevation).

digital geometry: Geometry (points, lines, angles, surfaces, etc.) in a sampled and quantized domain. [ WP:Digital geometry]

digital image: Any sampled and quantized image. [ SEU:1.7] [Figure: an 8 x 8 grid of quantized intensity values, beginning 41 43 45 51 56 49 45 40.]

digital image processing: Image processing restricted to the domain of digital images. [ WP:Digital image processing]

digital signal processor: A class of co-processors designed to execute processing operations on digitized signals efficiently. A common characteristic is the provision of a fast multiply and accumulate function, e.g., $a \leftarrow a + b \times c$. [ WP:Digital signal processor]

digital subtraction angiography: A basic technique used in medical image processing to detect, visualize and inspect blood vessels, based on the subtraction of a background image from the target image, usually where the blood vessels are made more visible by using an X-ray contrast medium. See also medical image registration. [ WP:Digital subtraction angiography] [Figure: a digital subtraction angiography image of blood vessels.]

digital terrain map: See digital elevation map.

digital topology: Topology (i.e., how things are connected/arranged) in a digital domain (e.g., in a digital image). See also connectivity. [ WP:Digital topology]

digital watermarking: The process of embedding a signature/watermark into digital data. In the domain of digital images this is most normally done for copyright protection. The digital watermark may be invisible or visible (as shown). [ WP:Digital watermarking] [Figure: an image carrying a visible watermark.]

digitization: The process of making a sampled digital version of some analog signal (such as an image). [ WP:Digitizing]

dihedral edge: The edge made by two planar surfaces. A fold in a surface: [Figure: two planar surfaces meeting at a dihedral edge.]

dilate operator: The operation of expanding a binary or gray-scale object with respect to the background. This has the effect of filling in any small holes in the object(s) and joining any object regions that are close together. Most frequently described as a morphological transformation, and is the dual of the erode operator. [ SEU:2.4.6] [Figure: a binary image before and after dilation.]

dimensionality: The number of dimensions that need to be considered. For example 3D object location is often considered as a seven dimensional problem (three dimensions for position, three for orientation and one for the object scale). [ SQ:18.3.2]

direct least square fitting: Direct fitting of a model to some data by a method that has a closed form or globally convergent solution.

directed illumination: Light energy that comes from a particular direction hence causing relatively sharp shadows. The opposite of this form of illumination is diffuse illumination.

directional derivative: A derivative taken in a specific direction, for instance, the component of the gradient along one coordinate axis. The images on the right are the vertical and horizontal directional derivatives of the image on the left. [ WP:Directional derivative] [Figure: an image and its vertical and horizontal directional derivatives.]

discontinuity detection: See edge detection.

discontinuity preserving regularization: A method for preserving edges (discontinuities) from being blurred as a result of some regularization operation (such as the recovery of a dense disparity map from a sparse set of disparities computed at matching feature points).

discontinuous event tracking: Tracking of events (such as a moving person) through a sequence of images. The discontinuous nature of the tracking is caused by the distance that a person (or hand, arm, etc.) can travel between frames and also by the possibility of occlusion (or self-occlusion). [Figure: a person tracked across several frames.]

discrete cosine transform (DCT): A transformation that converts digital images into the frequency domain in terms of the coefficients of discrete cosine functions. Used, for example, within JPEG image compression. [ SEU:2.5.2]

discrete Fourier transform (DFT): A version of the Fourier transform for sampled data. [ SEU:2.5.1]

discrete relaxation: A technique for labeling objects in which the possible type of each object is iteratively constrained based on relationships with other objects in the scene. The aim is to obtain a globally consistent interpretation (if possible) from locally consistent relationships.

discrimination function: A binary function separating data into two classes. See classifier. [ DH:2.5.1]
D 69

applied to binary images in which every


disparity: The image distance shifted object point is transformed into a value
between corresponding points in stereo representing the distance from the
image pairs. [ JKS:11.1] point to the nearest object boundary.
This operation is also referred to as
Left image features Right image features Disparity
chamfering (see chamfer matching ).
[ JKS:2.5.9]

disparity gradient: The gradient of a


disparity map for a stereo pair, that
4 3 2 2 2 2 1 1 1 1
estimates the surface slope at each 4 3 2 1 1 1 1 0 0 0
4 3 2 1 0 0 0 0 1 0
image point. See also binocular stereo . 4
4
3
3
2
2
1
1
0
0
1
1
1
2
1
1
1
0
0
0

[ OF:6.2.5] 4
4
3
3
2
2
1
1
0
0
1
0
1
0
1
0
0
0
1
1
4 3 2 1 1 1 1 1 1 1
4 3 2 2 2 2 2 2 2 2

disparity gradient limit: The


maximum allowed disparity gradient in
a potential stereo feature match.
distortion coefficient: A coefficient
disparity limit: The maximum
in a given image distortion model, for
allowed disparity in a potential stereo
instance k1 , k2 in the
feature match. The notion of a
distortion polynomial . See also
disparity limit is supported by evidence
pincushion distortion , barrel distortion
from the human visual system.
.
dispersion: Scattering of light by the
distortion polynomial: A polynomial
medium through which it is traveling.
model of radial lens distortion . A
[ WP:Dispersion (optics)]
common example is
distance function: See x = xd (1 + k1 r2 + k2 r4 ),
distance metric . [ JKS:2.5.8] y = yd (1 + k1 r2 + k2 r4 ). Here, x, y are
the undistorted image coordinates,
distance map: See range image . xd , yd are the distorted image
[ JKS:11] coordinates, r2 = x2d + yd2 , and k1 , k2
are the distortion coefficients . Usually
distance metric: A measure of how k2 is significantly smaller than k1 , and
far apart two things are in terms of can be set to 0 in cases where high
physical distance or similarity. A metric accuracy is not required.
can be other functions besides the
standard Euclidean distance , such as distortion suppression: Correction
the algebraic or Mahalanobis of image distortions (such as
distances. A true metric must satisfy: non-linearities introduced by a lens).
1) d(x, y) + d(y, z) d(x, z), 2) See geometric distortion and
d(x, y) = d(y, x), 3) d(x, x) = 0 and 4) geometric transformation .
d(x, y) = 0 implies x = y, but computer
vision processes often use functions that dithering: A technique simulating the
do not satisfy all of these criteria. appearance of different shades or colors
[ JKS:2.5.8] by varying the pattern of black and
white (or different color) dots. This is a
distance transform: An common task for inkjet printers.
image processing operation normally [ AL:4.3.5]
70 D

character recognition and


document mosaicing ).

document mosaicing:
Image mosaicing of documents.

document retrieval: Identification of


a document in a database of scanned
documents based on some criteria.
divide and conquer: A technique for [ WP:Document retrieval]
solving problems efficiently by
subdividing the problem into smaller DoG: See difference of Gaussians .
subproblems, and then recursively [ CS:4.5.4]
solving these subproblems in the
expectation that the smaller problems dominant plane: A degenerate case
will be easier to solve. An example is encountered in uncalibrated
an algorithms for deriving a structure and motion recovery where
polygonal approximation of a contour most or all of the tracked
in which a straight line estimate is image features are coplanar in the
recursively split in the middle (into two scene.
segments with the midpoint put exactly
Doppler: A physics phenomenon
on the contour) until the distance
whereby an instrument receiving
between the polygonal representation
acoustic or electromagnetic waves from
and the actual contour is below some
a source in relative motion measures an
tolerance.
increasing frequency if the source is
[ WP:Divide and conquer algorithm]
approaching, and decreasing if receding.
The acoustic Doppler effect is employed
Curve
Final Estimate in sonar sensors to estimate target
velocity as well as position.
[ WP:Doppler effect]

Initial Estim
ate downhill simplex: A method for
finding a local minimum using a
divisive clustering: simplex (a geometrical figure specified
Clustering/cluster analysis in which all by N + 1 vertices) to bound the
items are initially considered as a single optimal position in an N -dimensional
set (cluster) and subsequently divided space. See also optimization .
into component subsets (clusters). [ WP:Nelder-Mead method]

DIVX: An MPEG 4 based video DSP: See digital signal processor .


compression technology aiming to [ WP:Digital signal processing]
achieve sufficiently high compression to
enable transfer of digital video contents dual of the image of the absolute
over the Internet, while maintaining conic (DIAC): If is the matrix
high visual quality. [ WP:DivX] representing the image of the
absolute conic , then 1 represents its
document analysis: A general term dual (DIAC). Calibration constraints
describing operations that attempt to are sometimes more readily expressed
derive information from documents in terms of the DIAC than the IAC.
(including for example [ HZ:7.5]
D 71

duality: The property of two concepts


or theories having similar properties
that can be applied to the one or to the
other. For instance, several relations
linking points in a projective space are
formally the same as those linking lines
in a projective space; such relations are
dual. [ OF:2.4.1]
dynamic scene: A scene in which
dynamic appearance model: A some objects move, in contrast to the
model describing the changing common assumption in
appearance of an object/scene over shape from motion that the scene is
time. rigid and only the camera is moving.

dynamic programming: An dynamic stereo: Stereo vision for a


approach to numerical optimization in moving observer. This allows
which an optimal solution is searched shape from motion techniques to be
by keeping several competing partial used in addition to the stereo
paths throughout and pruning techniques.
alternative paths that reach the same
point with a suboptimal value. dynamic time warping: A technique
[ VSN:7.2.2] for matching a sequence of observations
(usually one per time sample) to a
dynamic range: The ratio of the model sequence of feature, where the
brightest and darkest values in an hope is for a one-to-one match of
image. Most digital images have a observations to features. But, because
dynamic range of around 100:1 but of variations in rate at which
humans can perceive detail in dark observations are produced, some
regions when the range is even 10,000:1. features may get skipped or others
To allow for this we can create high matched to more than one observation.
dynamic range images. [ SQ:4.2.1] The usual goal is to minimize the
amount of skipping or multiple samples
matched (time warping). Efficient
algorithms to solve this problem exist
based on the linear ordering of the
sequences. See also
hidden Markov models (HMM) .
[ WP:Dynamic time warping]
E

early vision: A general term referring length of any orthogonal chord.


to the initial stages of computer vision [ WP:Eccentricity (mathematics)]
(i.e., image capture and
image processing ). Also known as

Maxim
low level vision . [ BKPH:1.4]
um o
earth movers distance: A metric for rthog

comparing two distributions by hord


um C
Maxim
onal

evaluating the minimum cost of


chord

transforming one distribution into the


other (e.g., can be applied to
color histogram matching ).
[ FP:25.2.2]
echocardiography: Cardiac
Distribution 1 Distribution 2 Transformation ultrasonography (echocardiography) is
a non-invasive technique for imaging
the heart and surrounding structures.
Generally used to evaluate cardiac
chamber size, wall thickness, wall
motion, valve configuration and motion
eccentricity: A shape representation and the proximal great vessels.
that measures how non-circular a shape [ WP:Echocardiography]
is. One way of computing this is to take
the ratio of the maximum chord length edge: A sharp variation of the
of the shape to the maximum chord intensity function. Represented by its
72
E 73

position, the magnitude of the intensity


gradient, and the direction of the
maximum intensity variation. [ FP:8]

edge based segmentation:


Segmentation of an image based on the
edges detected.
edge finding: See edge detection .
edge based stereo: A type of [ FP:8.3]
feature based stereo where the features
used are edges . [ VSN:7.2.2] edge following: See edge tracking .
[ FP:8.3.2]
edge detection: An image processing
operation that computes edge vectors edge gradient image: See
(gradient and orientation) for every edge image . [ WP:Image gradient]
point in an image. The first stage of
edge based segmentation . [ FP:8.3] edge grouping: See edge tracking .

edge image: An image where every


pixel represents an edge or the
edge magnitude .

edge linking: See edge tracking .


[ AJ:9.4]

edge magnitude: A measure of the


contrast at an edge, typically the
magnitude of the intensity gradient at
the edge point. See also edge detection,
edge point . [ JKS:5.1]

edge direction: The direction edge matching: See curve matching .


perpendicular to the normal to an [ BKPH:13.9.3]
edge, that is, the direction along the
edge motion: The motion of edges
edge, parallel to the lines of constant
through a sequence of images. See also
intensity. Alternatively, the normal
shape from motion and the
direction to the edge, i.e., the direction
aperture problem . [ JKS:14.2.1]
of maximum intensity change
(gradient). See also edge detection , edge orientation: See edge direction .
edge point . [ TV:4.2.2] [ TV:4.2.2]
edge enhancement: An edge point: 1) A location in an image
image enhancement operation that where some quantity (e.g., intensity)
makes the gradient of edges steeper. changes rapidly. 2) A location where
This can be achieved, for example, by the gradient is greater than some
adding some multiple of a Laplacian threshold. [ FP:8]
convolved version of the image L(i, j)
to the image g(i, j). edge preserving smoothing: A
f (i, j) = g(i, j) + L(i, j) where f (i, j) smoothing filter that is designed to
is the enhanced image and is some preserve the edges in the image while
constant. [ RJS:4] reducing image noise . For example see
74 E

median filter . of A are images of faces. These vectors


[ WP:Edge-preserving smoothing] can be used for face recognition .
[ WP:Eigenface]

eigenspace based recognition:


Recognition based on an
eigenspace representation . [ TV:10.4]
edge sharpening: See eigenspace representation: See
edge enhancement . [ RJS:4] principal component representation.
[ TV:10.4.2]
edge tracking: 1) The grouping of
edges into chains of significant edges. eigenvalue: A scalar that for a
The second stage of matrix A satisfies Ax = x where x is a
edge based segmentation . Also known nonzero vector ( eigenvector ).
as edge following , edge grouping and [ SQ:2.2.3]
edge linking . 2) Tracking how the edge
moves in a video sequence. [ ERD:4] eigenvector: A non-zero vector x that
for a matrix A satisfies Ax = x where
edge type labeling: Classification of is a scalar (the eigenvalue ).
edge points or edges into a limited [ SQ:2.2.3]
number of types (e.g., fold edge ,
shadow edge, occluding edge, etc.). eigenvector projection: Projection
[ ERD:6.11] onto the PCA basis vectors.
[ SQ:13.1.4]
EGI: See extended Gaussian image .
[ FP:20.3] electromagnetic spectrum: The
entire range of frequencies of
egomotion: The motion of the electromagnetic waves including X-rays,
observer with respect to the observed ultraviolet, visible light, infrared,
scene. [ FP:17.5.1] microwave and radio waves. [ EH:3.6]
egomotion estimation:
Wavelength (in meters)
Determination of the motion of a -12 -10 -8 -6 -4 -2 2 4
10 10 10 10 10 10 1 10 10
camera. Generally based on image
features corresponding to static objects X rays Microwave Radio
in the scene. See also Ultraviolet Visible Infrared

structure and motion . A typical image


pair where the camera position is to be ellipse fitting: Fitting of an ellipse
estimated is: [ WP:Egomotion] model to the boundary of some shape,
data points, etc. [ TV:5.3]
Image from Position A Image from Position B

Position A Position B

111
000 Motion of the observer
111
000
000
111 000
111
eigenface: An eigenvector determined ellipsoid: A 3D volume in which all
from a matrix A in which the columns plane cross sections are ellipses or
E 75

circles. An ellipsoid is the set of points MPEG and JPEG image compression .
2 2 2
(x, y, z) satisfying xa2 + yb2 + zc2 = 1. [ WP:Code]
Ellipsoids are used in computer vision
as a basic shape primitive and can be endoscope: An instrument for visually
combined with other primitives in order examining the interior of various bodily
to describe a complex shape. [ SQ:9.9] organs. See also fiberscope .
[ WP:Endoscopy]
elliptic snake: An
active contour model of an ellipse energy minimization: The problem
whose parameters are estimated of determining the absolute minimum
through energy minimization from an of a multivariate function representing
initial position. (by a potential energy-like penalty) the
distance of a potential solution from
elongatedness: A the optimal solution. It is a
shape representation that measures specialization of the optimization
how long a shape is with respect to its problem. Two popular minimization
width (i.e., the ratio of the length of algorithms in computer vision are the
the bounding box to its width), as LevenbergMarquardt and Newton
illustrated below. See also eccentricity . optimization methods.
[ WP:Elongatedness] [ WP:Energy minimization]

entropy: 1. Colloquially, the amount


of disorder in a system. 2. A measure
Len
gth of the information content of a
Wid

random variable X. Given that X has


th

a set of possible values or outcomes X,


with probabilities {P (x), x X}, the
entropy H(X) of X is defined as

X
P (x) log P (x)
xX
EM: See expectation maximization .
[ FP:16.1.2]
with the understanding that
empirical evaluation: Evaluation of 0 log 0 := 0. For a multivariate
computer vision algorithms in order to distribution, the joint entropy H(X, Y )
characterize their performance by of X, Y is
comparing the results of several
algorithms on standardized test
problems. Careful evaluation is a X
difficult research problem in its own P (x, y) log P (x, y)
(x,y)XY
right.

encoding: Converting a digital signal,


represented as a set of values, from one For a set of values represented as a
form to another, often to compress the histogram , the entropy of the set may
signal. In lossy encoding, information is be defined as the entropy of the
lost in the process and the decoding probability distribution function
algorithm cannot recover it. See also represented by the histogram.
76 E

epipolar plane image (EPI): An


image that shows how a particular line
from a camera changes as the camera
position is changed such that the image
line remains on the same epipolar plane
. Each line in the EPI is a copy of the
relevant line from the camera at a
different time. Features that are distant
Left: p log p as a function of p. from the camera will remain in the
Probabilities near 0 and 1 signal high same position in each line, and features
entropy, probabilities between are less that are close to the camera will move
entropic. Right: The entropy of the from line to line (the closer the feature
gray scale histograms in some windows the further it will move). [ AL:17.3.4]
on an image. [ AJ:2.13]
Image 1 Image 8
epipolar constraint: A geometric
constraint reducing the dimensionality
of the stereo correspondence problem .
For any point in one image, the possible
matching points in the other image are
constrained to lie on a line known as
EPI from 8 images for highlighted line:
the epipolar line . This constraint may
be described mathematically using the
fundamental matrix . See also
epipolar geometry . [ FP:10.1.1] epipolar plane image analysis: An
approach to determining
epipolar correspondence matching: shape from motion in which epipolar
Stereo matching using the plane images (EPIs) are analyzed. The
epipolar constraint . slope of lines in an EPI is proportional
to the distance of the object from the
epipolar geometry: The geometric
camera, where vertical lines
relationship between two
corresponding to features at infinity
perspective cameras . [ FP:10.1.1]
[ AL:17.3.4]

Real world point epipolar plane motion: See


Optical Center Optical Center epipolar plane image analysis .
Image Plane Image Plane
Eipolar Line Eipolar Line
epipolar rectification: The
Image Point Image Point
Camera 1 Camera 2 image rectification of stereo images so
that the epipolar lines are aligned with
epipolar line: The intersection of the the image rows (or columns).
epipolar plane with the image plane .
See also epipolar constraint . epipolar transfer: The transfer of
[ FP:10.1.1] corresponding epipolar lines in a stereo
pair of images, defined by a
epipolar plane: The plane defined by homography . See also stereo and
any real world scene point together stereo vision . [ FP:10.1.4]
with the optical centers of two
cameras. [ FP:10.1.1] epipole: The point through which all
epipolar lines from a camera appear to
E 77

pass. See also epipolar geometry . function of the translation and rotation
[ FP:10.1.1] of the camera in the world reference
frame. See also the fundamental matrix
Image Epipolar Lines
. [ FP:10.1.2]

Euclidean distance: The geometric


distance between two points (x1 , y1 )
and
p (x2 , y2 ), i.e.,
Epipole
(x1 x2 )2 + (y1 y2 )2 . For
n-dimensional Pnvectors ~x1 and ~x21 , the
distance is ( i=1 (x1,i x2,i )2 ) 2 .
epipole location: The operation of
[ SQ:9.1]
locating the epipoles . [ OF:6.2.1.2]
Euclidean reconstruction: 3D
equalization: See
reconstruction of a scene using a
histogram equalization . [ JKS:4.1]
Euclidean frame of reference, as
erode operator: The operation of opposed to an affine reconstruction or
reducing a binary or gray scale object projective reconstruction . The most
with respect to the background . This complete reconstruction achievable. For
has the effect of removing any isolated example, using stereo vision .
object regions and separating any
Euclidean space: A representation of
object regions that are only connected
the space of all n-tuples (where n is the
by a thin section. Most frequently
dimensionality ). For example the three
described as a
dimensional Euclidean space (X, Y, Z)
morphological transformation and is
is typically used to describe the real
the dual of the dilate operator .
world. Also known as Cartesian space
[ AL:8.2]
(see Cartesian coordinates ).
[ WP:Euclidean space]

Euclidean transformation: A
transformation that operates in
Euclidean space (i.e., maintaining the
Euclidean spatial arrangements).
error propagation: 1) The Examples include rotation and
propagation of errors resulting from one translation. Often applied to
computation to the next computation. homogeneous coordinates. [ FP:2.1.2]
2) The estimation of the error (e.g., [ SQ:7.3]
variance) of a process based on the Euler angle: The Euler angles
estimates of the error in the input data (, , ) are a particular set of angles
and intermediate computations. describing rotations in three
[ WP:Propagation of uncertainty] dimensional space. [ JKS:12.2.1]
essential matrix: In EulerLagrange: The
binocular stereo, a matrix E expressing EulerLagrange equations are the basic
a bilinear constraint between equations in the calculus of variations ,
corresponding image points u, u in a branch of calculus concerned with
camera coordinates: u Eu = 0. This maxima and minima of definite
constraint is the basis for several integrals. They occur, for instance, in
reconstruction algorithms. E is a
78 E

Lagrangian mechanics and have been method works well even when there are
used in computer vision for a variety of missing values. [ FP:16.1.2]
optimizations, including for surface
interpolation. See also expectation value: The mean value
variational approach and of a function (i.e., the average expected
variational problem . [ TV:9.4.2] value). If p(x) is the probability density
function of a random variable
R x, the
Euler number: The number of expectation of x is x = p(x)xdx.
contiguous parts (regions) less the [ VSN:A2.2]
number of holes. Also known as the
genus. [ AJ:9.10] expert system: A system that uses
available knowledge and heuristics to
even field: The first of the two fields solve problems. See also
in an interlaced video signal. knowledge based vision . [ AL:11.2]
[ AJ:11.1]
exponential smoothing: A method
even function: for predicting a data value (Pt+1 ) based
A function where f (x) = f (x) for all x. on the previous observed value (Dt )
[ WP:Even and odd functions#Even functions] and the previous prediction (Pt ).
Pt+1 = Dt + (1 )Pt where is a
weighting value between 0 and 1.
event analysis: See [ WP:Exponential smoothing]
event understanding .
[ WP:Event study]

event detection: Analysis of a


sequence of images to detect activities
in the scene. Pt (=1.0)
Value

Dt
Image from a sequence of images Movement detected in the image

Pt (=0.5)

0 1 2 3 4 5 6 7 8 9
Time

event understanding: Recognition of


an event (such as a person walking) in exponential transformation: See
a sequence of images. Based on the pixel exponential operator .
data provided by event detection .
[ WP:Event study] expression understanding: See
facial expression analysis .
exhaustive matching: Matching
where all possibilities are considered. extended Gaussian image (EGI):
As an alternative see Use of a Gaussian sphere for
hypothesize and verify . histogramming surface normals. Each
surface normal is considered from the
expectation maximization (EM): A center of the sphere and the value
method of finding a maximum associated with the surface patch with
likelihood estimate of some parameters which it intersects is incremented.
based on a sample data set. This [ FP:20.3]
E 79

and an active shape model that is part


of the models deformation energy .
This measure is used to deform the
model to the image data. [ SQ:8.5.1]

extremal point: Points that lie on the


boundary of the smallest convex region
extended light source: A enclosing a set of points (i.e., that lie
light source that has a significant size on the convex hull ). [ SOS:4.6.1]
relative to the scene, i.e., is not
extrinsic parameters: See
approximated well by a
exterior orientation . [ TV:2.4.2]
point light source . In other words this
type of light source has a diameter and eye location: The task of finding eyes
hence can produce fuzzy shadows. in images of faces. Approaches include
Contrast with: point light sources . blink detection, face feature detection ,
[ BKPH:10.5] etc.

eye tracking: Tracking the position of


No shadow the eyes in a face image sequence. Also,
Light Source
Fuzzy shadow tracking the gaze direction .
Complete shadow
[ WP:Eye tracking]

exterior orientation: The position of


a camera in a global coordinate system.
That which is determined by an
absolute orientation calculation.
[ FP:3.4]

external energy (or force): A


measure of fit between the image data
F

face analysis: A general term covering skin color analysis .


the analysis of face images and models. [ WP:Face detection]
Often used to refer to
facial expression analysis .

face authentication: Verification that


(the image of) a face corresponds to a
particular individual. This differs from
the face recognition in that here only
the model of a single person is face feature detection: The location
considered. of features (such as eyes, nose, mouth)
[ WP:Facial recognition system] from a human face. Normally
performed after face detection
although it can be used as part of
face detection . [ WP:Face detection]
?
=

face detection: Identification of faces


within an image or series of images.
This often involved a combination of
human motion analysis and
80
F 81

face identification: See face feature detection .


face recognition . [ WP:Computer facial animation]
[ WP:Facial recognition system]
facial expression analysis: Study or
face indexing: Indexing from a identification of the facial expression(s)
database of known faces as a precursor of a person from an image or sequence
to face recognition . of images.

face modeling: Representing a face Happy Perplexed Surprised

using some type of model typically


derived from an image (or images).
These models are used in
face authentication , face recognition,
etc.

face recognition: The task of factorization: See


recognizing a face from an image as an motion factorization . [ TV:8.5.1]
instance of a person recorded in a
database of faces. false alarm: See false positive .
[ WP:Facial recognition system] [ TV:A.1]

false negative: A binary classifier


c(x) returns + or - for examples x. A
false negative occurs when the classifier
?
returns - for an example that is in
= reality +. [ TV:A.1]

false positive: A binary classifier c(x)


returns + or - for examples x. A false
positive occurs when the classifier
face tracking: Tracking of a face in a returns + for an example that is in
sequence of images. Often used as part reality -. [ TV:A.1]
of a humancomputer interface. [ F. J.
Huang, and T. Chen, Tracking of fast Fourier transform (FFT): A
multiple faces for human-computer version of the Fourier transform for
interfaces and virtual environments, discrete samples that is significantly
IEEE Int. Conf. on Multimedia and more efficient (order N log2 N ) than the
Expo, Vol. 3, pp 1563-1566, 2000.] standard discrete Fourier transform
(which is order N 2 ) on data sets with
face verification: See N points. [ AL:13.5]
face authentication .
[ WP:Facial recognition system] fast marching method: A type of
level set method in which the search
facet model based extraction: The can move in only one direction (hence
extraction of a model based on facets making it faster).
(small simple surfaces; e.g., see [ WP:Fast marching method]
planar facet model ) from range data .
See also planar patch extraction . feature: 1) A distinctive part of
something (e.g., the nose and eyes are
facial animation: The way in which distinctive features of the face), or an
facial expressions change. See also attribute derived from an object/shape
82 F

(e.g., circularity ). See also feature point: The image location at


image feature . 2) A numerical property which a particular feature is found.
(possibly combined with others to form
a feature vector ) and generally used in feature point correspondence:
a classifier . [ TV:4.1] Matching feature points in two or more
images . The assumption is that the
feature based optical flow feature points are the image of the same
estimation: Calculation of scene point. Having the correspondence
optical flow in a sequence of images allows the estimation of the depth from
from image features . binocular stereo , fundamental matrix ,
homography or trifocal tensor in the
feature based stereo: A solution to case of 3D scene structure recovery or
the stereo correspondence problem in of the 3D target motion in the case of
which image features are compared target tracking. [ TV:8.4.2]
from the two images. The main
alternative approach is feature point tracking: Tracking of
correlation based stereo . individual image features in a sequence
of images.
feature based tracking: Tracking the
motion of image features through a feature selection: Selection of
sequence. [ TV:8.4.2] suitable features (properties) for a
specific task, for example, classification.
feature contrast: The difference Typically features should be
between two features. This can be independent, detectable, discriminatory
measured in many domains (e.g., and reliable. [ FP:22.3]
intensity, orientation, etc.).
[ SEU:2.6.1] feature similarity: How much two
features resemble each other. Measures
feature detection: Identification of of feature similarity are required for
given features in an image (or model). feature based stereo ,
For example see corner detection . feature based tracking,
[ SEU:2.6] feature matching , etc. [ SEU:2.6.1]
feature extraction: See feature space: The dimensions of a
feature detection . [ SEU:2.6] feature space are the feature (property)
values of a given problem. An object or
feature location: See
shape is mapped to feature space by
feature detection . [ SEU:2.6]
computing the values of the set of
feature matching: Matching of features defining the space, typically for
image features in several images of the recognition and classification. In the
same object (for instance, example below, different shapes are
feature based stereo ), or of features mapped to a 2D feature space defined
from an unknown object with features by area and rectangularity.
from known objects (feature based [ SEU:2.6.1]
recognition ). [ TV:8.4.2]

feature orientation: The orientation


of an image feature with respect to the
image frame of reference.
F 83

Area
Maximum
Ferets diameter

Rectangularity

feature stabilization: A technique


for stabilizing the position of an image
feature in an image sequence so that it
remains in a particular position on a FERET: A standard database of face
display (allowing/causing the rest of the images with a defined experimental
image to move relative to that feature). protocol for the testing and comparison
of face recognition algorithms.
[ WP:FERET database]
Original sequence
FFT: See fast Fourier transform .
[ AL:13.5]

Stabilized sequence
fiber optics: A medium for
transmitting light that consists of very
thin glass or plastic fibers. It can be
Stabilized feature
used to provide much higher bandwidth
for signals encoded as patterns of light
pulses. Alternately, it can be used to
feature tracking: See transmit images directly through
feature based tracking . [ TV:8.4.2] rigidly connected bundles of fibers, so
as to see around corners, past obstacles,
feature vector: A vector formed by etc. [ EH:5.6]
the values of a number of image
features (properties), typically all fiberscope: A flexible fiber optic
associated with the same object or instrument allowing parts of an object
image. [ SEU:2.6.1] to be viewed that would normally be
inaccessible. Most often used in medical
feedback: The use of outputs from a examinations. [ WP:Fiberscope]
system to control the systems actions.
[ WP:Feedback] fiducial point: A reference point for a
given algorithm, e.g., a fixed, known,
Ferets diameter: The distance easily detectable pattern for a
between two parallel lines at the calibration algorithm.
extremities of some shape that are
tangential to the boundary of the figureground separation: The
shape. Maximum, minimum and mean segmentation of the area of the image
values of Ferets diameter are often representing the object of interest (the
used (where every possible pair of figure) from the remainder of the image
parallel tangent lines is considered). (the background).
84 F

Image Figure Ground


fingerprint indexing: See
fingerprint database indexing .

finite element model: A class of


numerical methods for solving
differential problems. Another relevant
class is finite difference methods.
[ WP:Finite element method]
figure of merit: Any scalar that is
used to characterize the performance of finite impulse response filter (FIR):
an algorithm. [ WP:Figure of merit] A filter that produces an output value
(yn ) based on the current
Ppand past
filter: In general, any algorithm that input values (xi ). yn = i=0 ai xni
transforms a signal into another. For where ai are weights. See also
instance, bandpass filters infinite impulse response filters .
remove/reduce the parts of an input [ AJ:2.3]
signal outside a given frequency
interval; gradient filters allow only FIR: See finite impulse response filter .
image gradients to pass through; [ AJ:2.3]
smoothing filters attenuate high
frequencies. [ ERD:3] Firewire (IEEE 1394): A serial
digital bus system supporting 400
filter ringing: A type of distortion Mbits per second. Power, control and
caused by the application of a steep data signals are carried in a single
recursive filter. Normally this term cable. The bus system makes it possible
applies to electronic filters in which to address up to 64 cameras from a
certain components (e.g., capacitors single interface card and multiple
and inductors) can store energy and computers can acquire images from the
later release it, but there are also same camera simultaneously.
digital equivalents to this effect. [ WP:IEEE 1394]

filtering: Application of a filter . first derivative filter: See


[ BB:3.1] gradient filter .

fingerprint database indexing: first fundamental form: See


Indexing into a database of fingerprints surface curvature . [ FP:21.2.1]
using a number of features derived from
the fingerprints. This allows a smaller Fisher linear discriminant (FLD):
number of fingerprints to be considered A classification method that maps high
when attempting dimensional data into a single
fingerprint identification within the dimension in such a way as to maximize
database. class separability. [ DH:4.10]

fingerprint identification: fisheye lens: See wide angle lens .


Identification of an individual through [ WP:Fisheye lens]
comparison of an unknown fingerprint
flat field: 1) An object of uniform
(or fingerprints) with previously known
color, used for photometric calibration
fingerprints.
of optical systems. 2) A camera system
[ WP:Automated fingerprint identification]
is flat field correct if the gray scale
F 85

output at each pixel is the same for a nuclear magnetic resonance .


given light input. [ AJ:4.4] [ WP:Functional magnetic resonance imaging]

flexible template: A model of a


shape in which the relative position of FOA: See focus of attention .
points is not fixed (e.g., defined in [ WP:Focus of attention]
probabilistic form). This approach FOC: See focus of contraction .
allows for variations in the appearance [ JKS:14.5.2]
of the shape.
focal length: 1) The distance between
FLIR: Forward Looking Infrared. An the camera lens and the focal plane . 2)
infrared system mounted on a vehicle The distance from a lens at which an
looking ahead along the direction of object viewed at infinity would be in
travel. [ WP:Forward looking infrared] focus. [ FP:1.2.2]

LIGHT (from infinity)


Infrared Sensor

Focal Length
flow field: See optical flow field .
[ OF:9.2] focal point: The point on the
optical axis of a lens where light rays
flow histogram: A histogram of the
from an object at infinity (also placed
optical flow in an image sequence. This
on the optical axis) converge.
can be used, for example, to provide a
[ FP:1.2.2]
qualitative description of the motion of
the observer. Focal Point

flow vector field: Optical flow is


Optical Axis
described by a vector (magnitude and
orientation) for each image point.
Hence a flow vector field is the same as
an optical flow field . [ OF:9.2]
focal plane: The plane on which an
fluorescence: The emission of visible image is focused by a lens system.
light by a substance caused by the Generally this consists of an array of
absorption of some other (possibly photosensitive elements. See also
invisible) electromagnetic wavelength. image plane . [ EH:5.2.3]
This property is sometimes used in
focal surface: A term most frequently
industrial machine vision . [ FP:4.2]
used when a concave mirror is used to
fMRI: Functional Magnetic Resonance focus an image (e.g., in a reflector
Imaging, or fMRI, is a technique for telescope). The focal surface in this
identifying which parts of the brain are case is the surface of the mirror.
activated by different types of physical [ WP:Focal surface]
stimulation, e.g., visual or acoustic
Focal Surface
stimuli. A MRI scanner is set up to
register the increased blood flow to the Optical Axis
activated areas of the brain on
Functional MRI scans. See also Focal Point
86 F

directly forwards along the optical axis


focus: To focus a camera is to arrange then the optical flow vectors would all
for the focal points of various image emanate from the principal point
features to converge on the focal plane . (usually near the center of the image).
An image is considered to be in focus if [ FP:10.1.3]
the main subject of interest is in focus.
Note that focus (or lack of focus) can be Two images from a moving observer. Blended Image
used to derive useful information (e.g., FOE

see depth from focus ). [ TV:2.2.2]

In focus Out of focus


FOE: See focus of expansion .
[ FP:10.1.3]

fold edge: A surface orientation


discontinuity. An edge where two
locally planar surfaces meet. The figure
below shows a fold edge.
focus control: The control of the
focus of a lens system usually by
moving the lens along the optical axis FOLD EDGE
or by adjusting the focal length . See
also autofocus .

focus following: A technique for


slowly changing the focus of a camera
as an object of interest moves. See also
depth from focus . [ WP:Follow focus]

focus invariant imaging: Imaging foreground: In computer vision,


systems that are designed to be generally used in the context of object
invariant to focus . Such systems have recognition. The area of the scene or
large depths of field. image in which the object of interest
lies. See figureground separation .
focus of attention (FOA): The [ JKS:2.5.1]
feature or object or area to which the
attention of a visual system is directed. foreshortening: A typical perspective
[ WP:Focus of attention] effect whereby distant objects appear
smaller than closer ones. [ FP:4.1.1]
focus of contraction (FOC): The
point of convergence of the optical flow form factor: The physical size or
vectors for a translating camera. The arrangement of an object. This term is
component of the translation along the frequently used with reference to
optical axis must be nonzero. Compare computer boards. [ FP:5.5.2]
focus of expansion . [ JKS:14.5.2]
Forstner operator: A
focus of expansion (FOE): The feature detector used for
point from which all optical flow corner detection as well as other edge
vectors appear to emanate in a static features. [ WP:Interest-
scene where the observer is moving. For Operator#F.C3.B6rstner-Operator]
example if a camera system was moving
F 87

Fourier space: The frequency domain


forward looking radar: A radar space in which an image (or other
system mounted on a vehicle looking signal) is represented after application
ahead along the direction of travel. See of the Fourier transform.
also side looking radar .
Fourier space smoothing:
FourierBessel transform: See Application of a smoothing filter (e.g.,
Hankel transform . to remove high-frequency noise) in a
[ WP:Hankel transform] Fourier transformed image.
[ SEU:2.5.4]
Fourier domain convolution:
Convolution in the Fourier domain Fourier transform: A transformation
involves simply multiplication of the that allows a signal to be considered in
Fourier transformed image by the the frequency domain as a sum of sine
Fourier transformed filter. For very and cosine waves or equivalently as a
large filters this operation is much more sum of exponentials. For a two
efficient than convolution in the dimensional image F (u, v) =
R R
original domain. [ BB:2.2.4] f (x, y)e2i(xu+yv) dxdy. See

also fast Fourier transform ,
Fourier domain inspection:
discrete Fourier transform and
Identification of defects based on
inverse Fourier transform . [ FP:7.3.1]
features in the Fourier transform of an
image. fovea: The high-resolution central
region of the human retina. The
Fourier image processing:
analogous region in an artificial sensor
Image processing in the
that emulates the retinal arrangement
Fourier domain (i.e., processing images
of photoreceptor, for example a
that have been transformed using the
log-polar sensor. [ FP:1.3]
Fourier transform ). [ SEU:2.5.4]
foveal image: An image in which the
Fourier matched filter object
sampled pattern is inspired by the
recognition: Object recognition in
arrangement of the human fovea, i.e.,
which correlation is determined using a
sampling is most dense in the image
matched filter that is the conjugate of
center and gets progressively sparser
the Fourier transform of the object
towards the periphery of the image.
being located.

Fourier shape descriptor: A


boundary representation of a shape in
terms of the coefficients of a Fourier
transformation. [ BB:8.2.4]

Fourier slice theorem: A slice at an


angle of a 2D Fourier transform of an
object is equal to a 1D Fourier
transform of a parallel projection of the
object taken at the same angle. See
also slice based reconstruction .
[ WP:Projection-slice theorem]
88 F

foveation: 1) The process of creating a frame grabber: See frame store .


foveal image . 2) Directing the camera [ FP:1.4.1]
optical axis to a given direction.
[ WP:Foveated imaging] frame of reference: A
coordinate system defined with respect
fractal image compression: An to some object, the camera or with
image compression method based on respect to the real world.
exploiting self-similarity at different [ WP:Frame of reference]
scales. [ WP:Fractal compression]
Z world
fractal measure/dimension: A
measure of the roughness of a shape.
Consider a curve whose length (L1 and
L2 ) is measured at two scales (S1 and
S2 ). If the curve is rough the length
Ycube
will grow as the scale is increased. The
fractal dimension is D = log(L 1 L2 ) X cube
log(S2 S1 ) . Z cube
[ JKS:7.4]
X world

fractal representation: A
representation based on self-similarity. Yworld
Z
Ycylinder

For example a fractal representation of cylinder X cylinder

an image could be based on similarity


of blocks of pixels. frame store: An electronic device for
recording a frame from an imaging
fractal surface: A surface model that system. Typically such devices are used
is defined progressively using fractals as interfaces between CCIR cameras
(i.e., the surface displays self-similarity and computers. [ ERD:2.2]
at different scales).
freeform surface: A surface that does
fractal texture: A not follow any particular mathematical
texture representation based on form; for example, the folds of a piece
self-similarity between scales. of fabric, as shown below. [ BM:4.1]
[ JKS:7.4]

frame: 1) A complete standard


television video image consisting of
both the even and odd video fields . 2)
A knowledge representation technique
suitable for recording a related set of
facts, rules of inference, preconditions,
etc. [ TV:8.1]

frame buffer: A device that stores a


video frame for access, display and Freeman code: A type of chain code
processing by a computer. For example in which a contour is represented by
such devices are used to store the frame coordinates for the first point followed
from which a video display is refreshed. by a series of direction codes (typically
See also frame store . [ TV:2.3.1] 0 through 7). In the following figure we
show the Freeman codes relative to the
center point on the left and an example
F 89

of the codes derived from a chain of


points on the right. [ AJ:9.6] full primal sketch: A representation
described as part of Marrs theory of
vision, that is made up of the
raw primal sketch primitives together
5 6 7
with grouping information. The sketch
4 0 contains described image structures
3 2 1 that could correspond with scene
structures (e.g., image regions with
0, 0, 2, 3, 1, 0, 7, 7, 6, 0, 1, 2, 2, 4 scene surfaces).
Frenet frame: A triplet of mutually function based model: An
orthogonal unit vectors (the normal , object representation based on the
the tangent and the object functionality (e.g., an objects
binormal/bitangent ) describing a point purpose or the way in which an object
on a curve. [ BB:9.3.1] moves and interacts with other objects)
rather than its geometric properties.
Normal function based recognition:
Tangent Object recognition based on object
Binormal functionality rather than geometric
properties. See also
frequency domain filter: A filter
function based model .
defined by its action in the
Fourier space . See high pass filter and functional optimization: An
low pass filter . [ SEU:3.4] analytical technique for optimizing
(maximizing or minimizing) complex
frequency spectrum: The range of
functions of continuous variables.
(electromagnetic) frequencies.
[ BKPH:6.13]
[ EH:7.8]
functional representation: See
front lighting: A general term
function based model .
covering methods of lighting a scene
[ WP:Function representation]
where the lights are on the same side of
the object as the camera. As an fundamental form: A metric that
alternative consider backlighting . For useful in determining local properties of
example, [ WP:Frontlight] surfaces. See also
first fundamental form and
second fundamental form . [ OF:C.3]
Light
Source fundamental matrix: A bilinear
relationship between corresponding
Camera
Objects points (u, u ) in binocular stereo
being
imaged. images. The fundamental matrix, F,
incorporates the two sets of camera
Light parameters (K, K ) and the relative
Source position (~t) and orientation (R) of the
cameras. Matching points ~u from one
frontal: Frontal presentation of a image and ~u from the other image
planar surface is one in which the plane satisfy ~uT F~u = 0 where S(~t) is the
is parallel to the image plane .
90 F

skew symmetric matrix of ~t and


F = (K1 )T S(~t)R1 (K )1 . See also fuzzy morphology: A type of
the essential matrix . [ TV:7.3.4] mathematical morphology that is based
on fuzzy logic rather than the more
fusion: Integration of data from conventional Boolean logic.
multiple sources into a single
representation. [ SQ:18.5] fuzzy set: A grouping of data (into a
set) where each item in the set has an
fuzzy logic: A form of logic that associated grade/likelihood of
allows a range of possibilities between membership in the set.
true and false (i.e., a degree of truth). [ WP:Fuzzy set]
[ WP:Fuzzy logic]
fuzzy reasoning: See fuzzy logic .
G

Gabor filter: A filter formed by restricted by a Gaussian envelope


multiplying a complex oscillation by an function. [ NA:2.7.3]
elliptical Gaussian distribution
(specified by two standard deviations gaging: Measuring or testing. A
and an orientation). This creates filters standard requirement of industrial
that are local, selective for orientation, machine vision systems.
have different scales and are tuned for
gait analysis: Analysis of the way in
intensity patterns (e.g., edges, bars and
which human subjects move.
other patterns observed to trigger
Frequently used for biometric or
responses in the simple cells of the
medical purposes. [ WP:Gait analysis]
mammalian visual cortex) according to
the frequency chosen for the complex
oscillation. The filter can be applied in
the frequency domain as well as the
spatial domain . [ FP:9.2.2]

Gabor transform: A transformation


that allows a 1D or 2D signal (such as
an image) to be represented as a
weighted sum of Gabor functions.
[ NA:2.7.3]

Gabor wavelets: A type of wavelet gait classification: 1) Classification of


formed by a sinusoidal function that is different types of human motion (such
as walking, running, etc.). 2) Biometric
91
92 G

identification of people based on their Original Image Normal first derivative Gaussian first derivative

gait parameters. [ WP:Gait#Energy-


based gait classification]

Galerkin approximation: A method


for determining the coefficients of a Gaussian distribution: A probability
power series solution for a differential density function with this distribution:
equation.
1 (x)2

gamma: Devices such as cameras and P (x) = e 22


2
displays that convert between analogue
(denoted a) and digital (d) images where is the mean and is the
generally have a nonlinear relationship standard deviation. If ~x d , then the
between a and d. A common model for multivariate probability density
this nonlinearity is that the signals are function is p(~x) =
related by a gamma curve of the form det(2) 12
exp( 21 (~x
~ ) 1 (~x
~ ))

a = c d , for some constant c. For where ~ is the distribution mean and
CRT displays, common values of are is its covariance. [ BKPH:2.5.2]
in the range 1.02.5. [ BB:2.3.1]
Gaussian mixture model: A
gamma correction: The correction of representation for a distribution based
brightness and color ratios so that an on a combination of Gaussians. For
image has the correct dynamic range instance, used to represent color
when displayed on a monitor. histograms with multiple peaks. See
[ WP:Gamma correction] expectation maximization .
[ WP:Mixture model]
gauge coordinates: A coordinate
system local to the image surface itself. Gaussian noise: Noise whose
Gauge coordinates provide a convenient distribution is Gaussian in nature.
frame of reference for operators such as Gaussian noise is specified by its
the gradient operator . standard deviation about a zero mean,
and is often modeled as a form of
Gaussian convolution: See
additive noise . [ TV:3.1.1]
Gaussian smoothing . [ TV:3.2.2]

Gaussian curvature: A measure of


the surface curvature at a point. It is
the product of the maximum and
minimum of the normal curvatures in
all directions through the point. See
also mean curvature . [ FP:19.1.2]

Gaussian derivative: The


combination of Gaussian smoothing Gaussian pyramid: A
and a gradient filter . This results in a multi-resolution representation of an
gradient filter that is less sensitive to image formed by several images, each
noise. [ FP:8.2.1] one a subsampled and
Gaussian smoothed version of the
original one at increasing standard
deviation. [ WP:Gaussian pyramid]
G 93

Gaussian Smoothed Images


Original Image sigma = 1.0 sigma = 3.0

Gaussian smoothing: An
image processing operation aimed to gaze direction tracking: Continuous
attenuate image noise computed by gaze direction estimation (e.g., in a
convolution with a mask sampling a video sequence or a live camera feed).
Gaussian distribution . [ TV:3.2.2]
gaze location: See
gaze direction estimation .

generalized cone: A
generalized cylinder in which the swept
curve changes along the axis.
[ VSN:9.2.3]

generalized curve finding: A general


Gaussian speckle: Speckle that has term referring to methods that locate
a Gaussian distribution . arbitrary curves. For example, see
generalized Hough transform .
Gaussian sphere: A sampled
[ ERD:10]
representation of a unit sphere where
the surface of the sphere is defined by a generalized cylinder: A
number of triangular patches (often volumetric representation where the
computed by dividing a dodecahedron). volume is defined by sweeping a closed
See also extended Gaussian image . curve along an axis. The axis does not
[ VSN:9.2.5] need to be straight and the closed curve
may vary in shape as it is moved along
the axis. For example a cylinder may
be defined by moving a circle along a
straight axis, and a cone may be defined
by moving a circle of changing diameter
along a straight axis. [ FP:24.2.1]

Axis

gaze control: The ability of a human


subject or a robot head to control their
gaze direction.
generalized Hough transform: A
gaze direction estimation: version of the Hough transform
Estimation of the direction in which a capable of detecting the presence of
human subject is looking. Used for arbitrary shapes. [ ERD:10]
humancomputer interaction.
94 G

generalized order statistics filter: influence on perception theories, and


A filter in which the values within the subsequently on computer vision. Its
filter mask are considered in increasing basic tenet was that a perceptual
order and then combined in some pattern has properties as a whole,
fashion. The most common such filter which cannot be explained in terms of
is the median filter that selects the its individual components. In other
middle value. words, the whole is more than the sum
of its parts. This concept was captured
generate and test: See in some basic laws (proximity,
hypothesize and verify . similarity, closure, common destiny
[ WP:Trial and error] or good form, saliency), that would
apply to all mental phenomena, not
generic viewpoint: A viewpoint such
just perception. Much work on
that small motions may cause small
low-level computer vision, most notably
changes in the size or relative positions
on perceptual grouping and
of features, but no features appear or
perceptual organization , has exploited
disappear. This contrasts with a
these ideas. See also visual illusion .
privileged viewpoint .
[ FP:14.2]
[ WP:Neuroesthetics#The Generic Viewpoint]
geodesic: The shortest line between
two points (on a mathematically
genetic algorithm: An optimization
defined surface). [ AJ:3.10]
algorithm seeking solutions by refining
iteratively a small set of candidates geodesic active contour: An
with a process mimicking genetic active contour model similar to the
evolution. The suitability (fitness) of a snake model in that it attempts to
set of possible solutions (population) is minimize an energy function between
used to generate a new population until the model and the data, but which also
some conditions are satisfied (e.g., the incorporates a geometrical model.
best solution has not changed for a
given number of iterations). Initial Contour Final Contour

[ WP:Genetic algorithm]

genetic programming: Application


of genetic algorithms in some
programming language to evolve
programs that satisfy some evaluation geodesic active region: A technique
criteria. [ WP:Genetic programming] for region based segmentation that
genus: In the study of topology , the builds on geodesic active contours by
number of holes in a surface. In adding a force that takes into account
computer vision, sometimes used as a information within regions. Typically a
discriminating feature for simple object geodesic active region will be bounded
recognition. [ WP:Genus] by a single geodesic active contour.

Gestalt: German for shape. The geodesic distance: The length of the
Gestalt school of psychology, led by the shortest path between two points along
German psychologists Wertheimer, some surface. This is different from the
Kohler and Koffka in the first half of Euclidean distance that takes no
the twentieth century, had a profound account of the surface. The following
example shows the geodesic distance
G 95

between Calgary and London (following


the curvature of the Earth). geometric distance: In curve and
[ WP:Distance (graph theory)] surface fitting , the shortest distance
geodesic transform: Assigns to each point the geodesic distance to some feature or class of feature.

geographic information system (GIS): A computer system that stores and manipulates geographically referenced data (such as images of portions of the Earth taken by satellite). [WP:Geographic information system]

geometric compression: The compression of geometric structures such as polygons.

geometric constraint: A limitation on the possible physical arrangement/appearance of objects based on geometry. These types of constraints are used extensively in stereo vision (e.g., the epipolar constraint), motion analysis (e.g., the rigid motion constraint) and object recognition (e.g., focusing on specific classes of objects or relations between features). [OF:6.2.6]

geometric correction: In remote sensing, an algorithm or technique for the correction of geometric distortion. [AJ:8.16]

geometric deformable model: A deformable model in which the deformation of curves is based on the level set method and stops at object boundaries. A typical example is a geodesic active contour model.

geometric distance: In curve and surface fitting, the shortest distance from a given point to a given surface. In many fitting problems, the geometric distance is expensive to compute but yields more accurate solutions. Compare algebraic distance. [HZ:3.2.2]

geometric distortion: Deviations from the idealized image formation model (for example, the pinhole camera) of an imaging system. Examples include radial lens distortion in standard cameras.

geometric feature: A general term describing a shape characteristic of some data that encompasses features such as edges, corners, geons, etc.

geometric feature learning: Learning geometric features from examples of the feature.

geometric feature proximity: A measure of the distance between geometric features, e.g., the distance between data and overlaid model features in hypothesis verification.

geometric hashing: A technique for matching models in which geometric invariant features are mapped into a hash table, and this hash table is used to perform the recognition. [BM:4.5.4]

geometric invariant: A quantity describing some geometric configuration that remains unchanged under certain transformations (e.g., the cross-ratio under perspective projection). [WP:Geometric invariant theory]

geometric model: A model that describes the geometric shape of some object or scene. A model can be 2D (e.g., a polycurve) or 3D (e.g., surface
based models), etc. [WP:Geometric modeling]

geometric model matching: Comparison of two geometric models, or of a model and a set of image data shapes, for the purposes of recognition.

geometric optics: A general term referring to the description of optics from a geometrical point of view. Includes concepts such as the simple pinhole camera model, magnification, lenses, etc. [EH:3]

geometric reasoning: Reasoning with geometric shapes in order to address such tasks as robot motion planning, shape similarity, spatial position estimation, etc.

geometric representation: See geometric model. [WP:RGB color model#Geometric representation]

geometric shape: A shape that takes a relatively simple geometric form (such as a square, ellipse, cube, sphere, generalized cylinder, etc.) or that can be described as a combination of such geometric primitives. [WP:Tomahawk (geometric shape)]

geometric transformation: A class of image processing operations that transform the spatial relationships in an image. They are used for the correction of geometric distortions and for general image manipulation. A geometric transformation requires the definition of a pixel coordinate transformation together with an interpolation scheme (for example, a rotation). [Figure: an image before and after rotation] [SEU:3.5]

geon: GEometrical iON. A basic volumetric primitive proposed by Biederman and used in recognition by components. [Figure: some example geons] [WP:Geon (psychology)]

gesture analysis: Basic analysis of video data representing human gestures, preceding the task of gesture recognition. [WP:Gesture recognition]

gesture recognition: The recognition of human gestures, generally for the purpose of human-computer interaction. See also hand sign recognition. [Figure: example gestures, such as "HI" and "STOP" hand signals] [WP:Gesture recognition]

Gibbs sampling: A method for probabilistic inference based on transition probabilities (between states). [WP:Gibbs sampling]
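A minimal Python sketch of a Gibbs sampler for a bivariate standard normal distribution, alternately resampling each variable from its conditional given the other (the correlation value is an arbitrary choice for the example):

    import random, math

    def gibbs_bivariate_normal(n_samples=1000, rho=0.8):
        """Gibbs sampling from a bivariate standard normal with correlation rho.
        Each step resamples one variable from its conditional distribution
        given the current value of the other."""
        x, y = 0.0, 0.0
        samples = []
        for _ in range(n_samples):
            # x | y ~ N(rho * y, 1 - rho^2); y | x ~ N(rho * x, 1 - rho^2)
            x = random.gauss(rho * y, math.sqrt(1 - rho ** 2))
            y = random.gauss(rho * x, math.sqrt(1 - rho ** 2))
            samples.append((x, y))
        return samples

    pts = gibbs_bivariate_normal()
    print(sum(px * py for px, py in pts) / len(pts))  # should be near rho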

GIF: Graphics Interchange Format. A common compressed image format based on the Lempel-Ziv-Welch algorithm. [SEU:1.8]

GIS: See geographic information system. [WP:Geographic information system]

glint: A specular reflection visible on a mirror-like surface. [Figure: a glint on a specular surface] [WP:Glint]

global: A global property of a mathematical object is one that depends on all components of the object. For example, the average intensity of an image is a global property, as it depends on all the image pixels. [WP:Global variable]

global positioning system (GPS): A system of satellites that allows the position of a GPS receiver to be determined in absolute Earth-referenced coordinates. The accuracy of standard civilian GPS is of the order of meters. Greater accuracy is obtainable using differential GPS. [WP:Global Positioning System]

global structure extraction: Identification of high level structures/relationships in an image (e.g., symmetry detection).

global transform: A general term describing an operator that transforms an image into some other space. Sample global transforms include the discrete cosine transform, the Fourier transform, the Haar transform, the Hadamard transform, the Hartley transform, histograms, the Hough transform, the Karhunen-Loève transform, the Radon transform, and the wavelet transform.

golden template: An image of an unflawed object/scene that is used within template matching to identify any deviations from the ideal object/scene.

gradient: Rate of change. This is frequently associated with edge detection. See also gray scale gradient. [Figure: intensity and its gradient along an image row] [VSN:3.1.2]

gradient based flow estimation: Estimation of the optical flow based on gradient images. This computation can be done directly through the computation of a time derivative, as long as the movement between frames is quite small. See also the aperture problem.

gradient descent: An iterative method for finding a (local) minimum of a function by repeatedly stepping in the direction of the negative gradient. [DH:5.4.2]
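A minimal Python sketch, minimizing a simple quadratic function (the step size and stopping rule are arbitrary choices for the example):

    def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=1000):
        """Iteratively step against the gradient until the update is tiny."""
        x = x0
        for _ in range(max_iter):
            x_new = x - step * grad(x)
            if abs(x_new - x) < tol:
                break
            x = x_new
        return x

    # Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # approaches 3.0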
gradient edge detection: Edge detection based on image gradients. [BB:3.3.1]

gradient filter: A filter that is convolved with an image to create an image in which every point represents the gradient of the original image in an orientation defined by the filter. Normally two orthogonal filters are used and, by combining these, a gradient vector can be determined for every point. Common filters include the Roberts cross gradient operator, the Prewitt gradient operator and the Sobel gradient operator. The Sobel

horizontal gradient operator gives [WP:Edge detection]:

    [Figure: convolution of an image with the horizontal Sobel gradient filter
        -1  0  1
        -2  0  2
        -1  0  1 ]
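A minimal sketch of gradient filtering in Python with NumPy (assumed available), convolving an image with the horizontal and vertical Sobel kernels and combining the two responses into a gradient magnitude:

    import numpy as np

    SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    SOBEL_Y = SOBEL_X.T  # vertical kernel is the transpose

    def convolve2d(image, kernel):
        """Direct 2D convolution (no padding); small and slow but dependency-free."""
        kh, kw = kernel.shape
        flipped = kernel[::-1, ::-1]  # convolution flips the kernel
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
        return out

    image = np.random.rand(16, 16)      # stand-in for a gray scale image
    gx = convolve2d(image, SOBEL_X)     # horizontal gradient component
    gy = convolve2d(image, SOBEL_Y)     # vertical gradient component
    magnitude = np.hypot(gx, gy)        # gradient magnitude at each pixel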

gradient image: See edge image. [WP:Image gradient#Computer vision]

gradient magnitude thresholding: Thresholding of a gradient image in order to identify strong edge points.

gradient matching stereo: An approach to stereo matching in which the image gradients (or features derived from the image gradients) are matched. [CS:6.9]

gradient operator: An image processing operator that produces a gradient image from a gray scale input image I. Depending on the usage of the term, the output could be 1) the vectors ∇I of the x and y derivatives at each point or 2) the magnitudes of these gradient vectors. The usual role of the gradient operator is to locate regions of strong gradients that signal the position of an edge. The figure below shows a gray scale image and its gradient magnitude image, where darker lines indicate stronger magnitudes; the gradient was calculated using the Sobel operator. [Figure: a gray scale image and its gradient magnitude image] [DH:7.3]

gradient space: A representation of surface orientations in which each orientation is represented by a pair (p, q), where p = ∂z/∂x and q = ∂z/∂y (the z axis being aligned with the optical axis of the viewing device). [Figure: vectors representing various surface orientations and the corresponding (p, q) points in gradient space] [BKPH:15.3]

gradient vector: A vector describing the magnitude and direction of maximal change on an N-dimensional surface. [WP:Gradient]

graduated non-convexity: An algorithm for finding a global minimum of a function that has many sharp local minima (a non-convex function). This is achieved by approximating the function by a convex function with just one minimum (near the global minimum of the non-convex function) and then gradually improving the approximation.

grammar: A system of rules constraining the way in which primitives (such as words) can be combined. Used in computer vision to represent objects where the primitives are simple shapes, textures or features. [DH:12.2.1]

grammatical representation: A representation that describes shapes using a number of primitives that can be combined using a particular set of rules (the grammar).

granulometric spectrum: The resultant distribution from a granulometry.

granulometry: The study of the size characteristics of a set (e.g., the sizes of a set of regions). Most commonly this is achieved by applying a series of morphological openings (with structuring elements of increasing size) and then studying the resultant size distributions. [WP:Granulometry (morphology)]

graph: A graph is formed by a set of vertices V and a set of edges E ⊆ V × V linking pairs of vertices. Vertices u and v are neighbors if (u, v) ∈ E or (v, u) ∈ E. See graph isomorphism, subgraph isomorphism. [Figure: a graph with five nodes] [FP:14.5.1]

graph cut: A partition of the vertices V of a directed graph into two disjoint sets S and T. The cost of the cut is the sum of the costs of all the edges that go from a vertex in S to a vertex in T. [CS:6.11]

graph isomorphism: Two graphs are isomorphic if there exists a mapping (bijection) between their vertices that makes the edge sets identical. Determining whether two graphs are isomorphic is the graph isomorphism problem, whose complexity status is unresolved: it is not known to be solvable in polynomial time, nor is it known to be NP-complete. [Figure: two small isomorphic graphs, matching under A:b, C:a, B:c] [OF:11.2.2]

graph matching: A general term describing techniques for comparing two graph models. These techniques may attempt to find graph isomorphisms or subgraph isomorphisms, or may just try to establish similarity between graphs. [Figure: two graph models to be matched] [OF:11.2.2]

graph model: A model of data in terms of a graph. Typical uses in computer vision include object representation (see graph matching) and edge gradients (see graph searching). [WP:Graphical model]

graph partitioning: The operation of splitting a graph into subgraphs satisfying some criteria. For example, we might want to partition a graph of all polygonal edge segments in an image into subgraphs corresponding to objects in the scene. [WP:Graph partition]

graph representation: See graph model. [WP:Graph (data structure)#Representations]

graph searching: Search for a specific node or path through a graph. Used for, among other things, border detection (e.g., in an edge gradient image) and object identification (e.g., with decision trees).

graph similarity: The degree to which two graph representations are similar. Typically (in computer vision) these representations will not be exactly the same, and hence a double subgraph isomorphism may need to be found to evaluate similarity.

graph theoretic clustering: Clustering algorithms that use concepts from graph theory, in particular leveraging efficient graph-theoretic algorithms such as maximum flow. [WP:Cluster analysis#Graph-theoretic methods]

grassfire algorithm: A technique for finding a region skeleton based on wave propagation. A virtual fire is lit on all region boundaries and the skeleton is defined by the intersection of the wave fronts. [Figure: fire fronts propagating inward from an object boundary over time, meeting at the skeleton]

grating: See diffraction grating. [WP:Grating]

gray level . . . : See gray scale . . . [LG:3.4]

gray scale: A monochromatic representation of the value of a pixel. Typically this represents image brightness and ranges from 0 (black) to 255 (white). [Figure: a gray scale ramp from 0 to 255] [LG:3.4]

gray scale co-occurrence: The occurrence of two particular gray levels some particular distance and orientation apart. Used in co-occurrence matrices. [RN:8.3.1]

gray scale correlation: The cross correlation of gray scale values in image windows or full images.

gray scale distribution model: A model of how gray scales are distributed in some image region. See also intensity histogram.

gray scale gradient: The rate of change of the gray levels in a gray scale image. See also edge, gradient image and first derivative filter.

gray scale image: A monochrome image in which pixels typically represent brightness values ranging from 0 to 255. See also gray scale. [SQ:4.1.1]

gray scale mathematical morphology: The application of mathematical morphology to gray scale images. Each quantization level is treated as a distinct set, where pixels are members of the set if they have a value greater than or equal to that quantization level. [SQ:7.2]

gray scale moment: A moment that is based on image or region gray scales. See also binary moment.

gray scale morphology: See gray scale mathematical morphology. [SQ:7.2]

gray scale similarity: See gray scale correlation.

gray scale texture moment: A moment that describes texture in a gray scale image (e.g., the Haralick texture operator describes image homogeneity).

gray scale transformation: A general term describing a class of image processing operations that apply to gray scale images and simply manipulate the gray scale of the pixels. Example operations include contrast stretching and histogram equalization.

gray value . . . : See gray scale . . . [WP:Grey]

greedy search: A search algorithm seeking to maximize a local criterion instead of a global one. Greedy algorithms sacrifice generality for speed. For instance, the stable configuration of a snake is typically found by an iterative energy minimization. The snake configuration at each step of the optimization can be found globally, by searching the space of all allowed configurations of all pixels simultaneously (a large space), or locally (the greedy algorithm), by searching the space of all allowed configurations of each pixel individually (a much smaller space). [NA:6.3.2]

grey . . . : See gray . . . [LG:3.4]

grid filter: An approach to noise reduction where a nonlinear function of features (pixels or averages of a number of pixels) from the local neighborhood is used. Grid filters require a training phase where noisy data and corresponding ideal data are presented.

ground following: See ground tracking.

ground plane: The horizontal plane that corresponds to the ground (the surface on which objects stand). This concept is only really useful when the ground is roughly flat. [Figure: a scene with the ground plane highlighted] [WP:Ground plane]

ground tracking: A loosely defined term describing the robot navigation problem of sensing the ground plane and following some path. [WP:Ground track]

ground truth: In performance analysis, the true value, or the most accurate value achievable, of the output of a specific instrument under analysis, for instance a vision system measuring the diameter of circular holes. Ground truth values may be known theoretically, e.g., from formulae, or obtained through an instrument more accurate than the one being evaluated. [TV:A.1]

grouping: 1) In human perception, the tendency to perceive certain patterns or clusters of stimuli as a coherent, distinct entity, as opposed to a set of independent elements. 2) A whole class of segmentation algorithms based on this idea. Much of this work was inspired by the Gestalt school of psychology. See also segmentation, image segmentation, supervised classification, and clustering. [FP:14]

grouping transform: An image analysis technique for grouping image features together (e.g., based on collinearity). [TV:5.5]
H

Haar transform: A wavelet transform that is used in image compression. The basis functions used are similar to those used by first derivative edge detectors, resulting in images that are decomposed into horizontal, diagonal and vertical edges at different scales. [PGS:4.4]

Hadamard transform: A transformation that can be used to transform an image into its constituent Hadamard components. A fast version of the algorithm exists that is similar to the fast Fourier transform, but all values in the basis functions are either +1 or -1. It requires significantly less computation and as such is often used for image compression. [SEU:2.5.3]

halftoning: See dithering. [WP:Halftone]

Hamming distance: The number of differing bits in corresponding positions of two bit strings. For instance, the Hamming distance of 01110 and 01100 is 1, and that of 10100 and 10001 is 2. A very important concept in digital communications. [CS:6.3.2]
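A minimal Python sketch, counting differing bit positions (shown both for integers, via exclusive-or, and for equal-length bit strings):

    def hamming_int(a, b):
        """Hamming distance between two integers: count set bits in a XOR b."""
        return bin(a ^ b).count("1")

    def hamming_str(s, t):
        """Hamming distance between two equal-length bit strings."""
        assert len(s) == len(t)
        return sum(c1 != c2 for c1, c2 in zip(s, t))

    print(hamming_str("01110", "01100"))  # 1
    print(hamming_str("10100", "10001"))  # 2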
hand sign recognition: The recognition of hand gestures such as those used in sign language. [Figure: hand signs for the letters H and I]

hand tracking: The tracking of a person's hand in a video sequence, often for use in human-computer interaction.

hand-eye calibration: The calibration of a manipulator (such as a robot arm) together with a visual system (such as a number of cameras). The main issue here is ensuring that both systems use the same frame of reference. See also camera calibration.

hand-eye coordination: The use of visual feedback to direct the movement of a manipulator. See also hand-eye calibration.

handwriting verification: Verification that the style of handwriting corresponds to that of some particular individual. [WP:Handwriting recognition]

handwritten character recognition: The automatic recognition of characters that have been written by hand. [WP:Handwriting recognition]

Hankel transform: A simplification of the Fourier transform for radially symmetric functions. [WP:Hankel transform]

hat transform: See Laplacian of Gaussian (also known as the Mexican hat operator) and/or top hat operator. [WP:Top-hat transform]

Harris corner detector: A corner detector in which a corner is detected if the two eigenvalues of the matrix M are large and locally maximal, where f(i, j) is the intensity at point (i, j), fi and fj are its partial derivatives in the i and j directions, and < > denotes summation of the derivative products over a local window:

    M = [ <fi fi>   <fi fj> ]
        [ <fi fj>   <fj fj> ]

To avoid explicit computation of the eigenvalues, the local maxima of det(M) - 0.04 trace(M)^2 can be used. This is also known as the Plessey corner finder. [WP:Harris affine region detector#Harris corner measure]
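A minimal NumPy sketch of the Harris response (assuming NumPy; the image, window size and constant k are illustrative, and simple finite differences stand in for a proper derivative filter):

    import numpy as np

    def harris_response(image, k=0.04, win=3):
        """Harris/Plessey corner response: det(M) - k * trace(M)^2, where M
        sums products of first derivatives over a local window."""
        fi = np.gradient(image, axis=0)   # derivative along rows
        fj = np.gradient(image, axis=1)   # derivative along columns

        def box_sum(a):
            """Sum each pixel's win x win neighborhood (borders clipped)."""
            out = np.zeros_like(a)
            h, w = a.shape
            r = win // 2
            for i in range(h):
                for j in range(w):
                    out[i, j] = a[max(0, i - r):i + r + 1,
                                  max(0, j - r):j + r + 1].sum()
            return out

        a, b, c = box_sum(fi * fi), box_sum(fi * fj), box_sum(fj * fj)
        det = a * c - b * b
        trace = a + c
        return det - k * trace ** 2   # large positive values indicate corners

    resp = harris_response(np.random.rand(32, 32))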
Hartley transform: A transform similar to the Fourier transform, but the coefficients used are real (whereas those used in the Fourier transform are complex). [AL:13.4]

Hausdorff distance: A measure of the distance between two sets of (image) points: for every point in both sets, determine the minimum distance to any point in the other set; the Hausdorff distance is the maximum of these minimum values. [OF:10.3.1]
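A minimal NumPy sketch (assuming NumPy), computing the symmetric Hausdorff distance between two small 2D point sets:

    import numpy as np

    def hausdorff(A, B):
        """Hausdorff distance between point sets A and B (n x 2 arrays):
        the largest of all nearest-neighbor distances, in either direction."""
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise
        h_ab = d.min(axis=1).max()  # farthest A point from its nearest B point
        h_ba = d.min(axis=0).max()  # farthest B point from its nearest A point
        return max(h_ab, h_ba)

    A = np.array([[0.0, 0.0], [1.0, 0.0]])
    B = np.array([[0.0, 0.1], [1.0, 0.2], [3.0, 0.0]])
    print(hausdorff(A, B))  # 2.0: the point (3, 0) is far from all of A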
HDTV: High Definition TeleVision. [WP:High-definition television]

height image: See range image. [WP:Range imaging]

Helmholtz reciprocity: An observation by Helmholtz about the bidirectional reflectance distribution function f_r(i, e) of a local surface patch, where i and e are the incoming and outgoing light rays respectively. The observation is that the reflectance is symmetric about the incoming and outgoing directions, i.e., f_r(i, e) = f_r(e, i). [FP:4.2.2]

Hessian: The matrix of second derivatives of a scalar function of several variables, e.g., for image intensity f(i, j):

    H = [ ∂²f/∂i²    ∂²f/∂i∂j ]
        [ ∂²f/∂j∂i   ∂²f/∂j²  ]

It can be used to design an orientation-dependent second derivative edge detector. [FP:3.1.2]

heterarchical/mixed control: An approach to system control where control is shared amongst several systems.

heuristic search: A search process that employs common-sense rules (heuristics) to speed up search. [BB:4.4]

hexagonal image representation: An image representation where the pixels are hexagonal rather than

rectangular. This representation might be used because 1) it is similar to the arrangement of receptors in the human retina or 2) the distances to all adjacent pixels are equal, unlike for diagonally connected pixels in rectangular grids. [Figure: a hexagonal sampling grid]

hidden Markov model (HMM): A model for predicting the probability of a system's state on the basis of the previous state together with some observations. HMMs have been used extensively in handwritten character recognition. [FP:23.4]

hierarchical: A general term referring to the approach of considering data at a low level of detail initially and then gradually increasing the level of detail. This approach often results in better performance. [WP:Hierarchy]

hierarchical clustering: An approach to grouping in which each item is initially put in a separate cluster, the two most similar clusters are merged, and this merging is repeated until some condition is satisfied (e.g., no clusters of less than a particular size remain). [DH:6.10]

hierarchical coding: Coding of (image) data at multiple layers, starting with the lowest level of detail and gradually increasing the resolution. See also hierarchical image compression.

hierarchical Hough transform: A technique for improving the efficiency of the standard Hough transform. Commonly used to describe any Hough-based technique that solves a sequence of problems beginning with a low-resolution Hough space and proceeding to a high-resolution space, or using low-resolution images, or operating on subimages of the input image before combining the results.

hierarchical image compression: Image compression using hierarchical coding. This leads to the concept of progressive image transmission.

hierarchical matching: Matching at increasingly greater levels of detail. This approach can be used when matching images or more abstract representations.

hierarchical model: A model formed from smaller submodels, each of which may have further smaller submodels. The model may contain multiple instances of the subcomponent models. The subcomponents may be placed relative to the model by using a coordinate system transformation or may just be listed in a set structure. [Figure: a three-level hierarchical model with multiple uses of the subcomponents] [WP:Hierarchical database model]

hierarchical recognition: See hierarchical matching. [WP:Cognitive neuroscience of visual object recognition#Hierarchical Rec]

hierarchical texture: A way of considering texture elements at multiple levels (e.g., basic texture elements may themselves be grouped

together to form a texture element at another scale, and so on). [BB:6.2]

hierarchical thresholding: A thresholding technique where an image is considered at different levels of detail in a pyramid data structure, and thresholds are identified at different levels of the pyramid, starting at the highest level.

high level vision: A general term referring to image analysis and understanding tasks (i.e., those tasks that address reasoning about what is seen, as opposed to basic processing of images). [BKKP:5.10]

high pass filter: A frequency domain filter that removes or suppresses all low-frequency components. [SEU:2.5.4]

highlight: See specular reflection. [FP:4.3.4]

histogram: A representation of the frequency distribution of some values. See intensity histogram. [Figure: a histogram of frequency against gray scale value, 0 to 255] [AL:5.2]

histogram analysis: A general term describing a group of techniques that abstract information from histograms (e.g., determining the anti-mode/trough in a bi-modal histogram for use in thresholding).

histogram equalization: An image enhancement operation that processes a single image and results in an image with a uniform distribution of intensity levels (i.e., whose intensity histogram is flat). When this technique is applied to a digital image, however, the resulting histogram will often have large values interspersed with zeros. [Figure: intensity histograms before and after equalization] [AL:5.3]
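A minimal NumPy sketch of histogram equalization for an 8-bit gray scale image (assuming NumPy; the mapping uses the cumulative histogram):

    import numpy as np

    def equalize(image):
        """Histogram equalization of an 8-bit gray scale image.
        Maps each gray level through the normalized cumulative histogram,
        which flattens the histogram as far as the discrete levels allow."""
        hist = np.bincount(image.ravel(), minlength=256)
        cdf = hist.cumsum()
        cdf_min = cdf[cdf > 0][0]          # first nonzero bin
        n = image.size
        # Standard mapping: scale the CDF into the 0..255 range.
        lut = np.clip(np.round((cdf - cdf_min) / (n - cdf_min) * 255),
                      0, 255).astype(np.uint8)
        return lut[image]

    img = (np.random.rand(64, 64) ** 2 * 255).astype(np.uint8)  # dark-biased
    eq = equalize(img)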
histogram modeling: A class of techniques, such as histogram equalization, that modify the dynamic range and contrast of an image by changing its intensity histogram into one with desired properties.

histogram modification: See histogram modeling. [SEU:4.2.1]

histogram moment: A moment derived from a histogram. [WP:Algorithms for calculating variance#Higher-order statistics]

histogram smoothing: The application of a smoothing filter (e.g., Gaussian smoothing) to a histogram. This is often required before histogram analysis operations can be applied. [Figure: a histogram before and after smoothing]

hit and miss/hit or miss operator: A morphological operation where a new image is formed by ANDing (logical AND) together corresponding bits for every pixel of an input image and a structuring element. This operator is most appropriate for

binary images but may also be applied to gray scale images. [Figure: a binary image, a structuring element and the hit-and-miss result] [WP:Hit-or-miss transform]
space would be described as
(x, y, z, ) for any in
homogeneous coordinates. [ FP:2.1.1]

homogeneous representation: A
representation defined in
HK: See mean and Gaussian curvature projective space . [ HZ:1.2.1]
shape classification . homography: The relationship
[ WP:Gaussian curvature] described by a
HK segmentation: See homography transformation .
mean and Gaussian curvature [ WP:Homography]
shape classification . homography transformation: Any
[ WP:Gaussian curvature] invertible linear transformation between
HMM: See hidden Markov model . projective spaces. It is commonly used
[ FP:23.4] for image transfer , which maps one
planar image or region to another. The
holography: The process of creating a transformation can be estimated using
three dimensional image (a hologram) four non-collinear point pairs.
by recording the interference pattern [ WP:Homography]
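A minimal NumPy sketch (assuming NumPy) of the standard direct linear transformation (DLT) estimate of a 3 x 3 planar homography from four point correspondences:

    import numpy as np

    def estimate_homography(src, dst):
        """Estimate H (3 x 3, up to scale) with dst ~ H @ src from four or
        more point pairs, via the nullspace of the stacked DLT constraints."""
        rows = []
        for (x, y), (u, v) in zip(src, dst):
            rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        _, _, vt = np.linalg.svd(np.array(rows))
        return vt[-1].reshape(3, 3)  # singular vector of smallest singular value

    src = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dst = [(2, 1), (4, 1), (4, 3), (2, 3)]  # translate by (2, 1), scale by 2
    H = estimate_homography(src, dst)
    p = H @ np.array([0.5, 0.5, 1.0])
    print(p[:2] / p[2])  # maps to (3.0, 2.0)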
produced by coherent laser light that
has been passed through a homomorphic filtering: An
diffraction grating. [ WP:Holography] image enhancement technique that
simultaneously normalizes brightness
homogeneous, homogeneity: 1. ( and enhances contrast. It works by
Homogeneous coordinates :) In applying a high pass filter to the
projective n-dimensional geometry, a original image in the frequency domain,
point is represented by a n + 1 element hence reducing intensity variation (that
vector, with the Cartesian changes slowly) and highlighting
representation being found by dividing reflection detail (that changes rapidly).
the first n components by the last one. [ SEU:3.4.4]
Homogeneous quantities such as points
are equal if they are scalar multiples of homotopic transformation: A
each other. For example a 2D point is continuous deformation that preserves
represented as (x, y) in Cartesian the connectivity of object features (e.g.,
coordinates and in homogeneous skeletonization ). Two objects are
coordinates by the point (x, y, 1) and homotopic if they can be made the
any multiple thereof. 2. (Homogeneous same by some series of homotopic
texture:) A two (or higher) dimensional transformations.
pattern, defined on a space S R2 for Hopfield network: A type of neural
which some functions (e.g., mean, network mainly used in optimization
standard deviation) applied to a problems, which has been used in
window on S have values that are object recognition . [ WP:Hopfield net]
independent of the position of the
window. [ WP:Homogeneous space]
108 H

horizon line: The line defined by all vanishing points from the same plane. The most commonly used horizon line is that associated with the ground plane. [Figure: two vanishing points defining the horizon line] [WP:Horizon#Theoretical model]

Hough transform: A technique for transforming image features directly into the likelihood of occurrence of some shape. For example, see Hough transform line finder and generalized Hough transform. [AL:9.3]

Hough transform line finder: A version of the Hough transform based on the parametric equation of a line, s = i cos θ + j sin θ, in which a set of edge points {(i, j)} is transformed into the likelihood of a line being present, represented in (s, θ) space. The likelihood is quantified, in practice, by a histogram of the (s, θ) values observed in the images. [Figure: an image, its edge image, and the significant lines found] [AL:9.3.1]
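A minimal NumPy sketch of the (s, θ) accumulator (assuming NumPy; the quantization of s and θ is an arbitrary choice for the example):

    import numpy as np

    def hough_lines(edge_points, shape, n_theta=180, n_s=200):
        """Accumulate votes in (s, theta) space for a set of edge points.
        Each edge point (i, j) votes for all lines s = i*cos(t) + j*sin(t)."""
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        s_max = np.hypot(*shape)                   # maximum possible |s|
        accumulator = np.zeros((n_s, n_theta), dtype=int)
        for i, j in edge_points:
            s = i * np.cos(thetas) + j * np.sin(thetas)
            s_idx = np.round((s + s_max) / (2 * s_max) * (n_s - 1)).astype(int)
            accumulator[s_idx, np.arange(n_theta)] += 1
        return accumulator, thetas, s_max

    # Points on the line i = 5 (theta = 0, s = 5) should form a strong peak.
    pts = [(5, j) for j in range(20)]
    acc, thetas, s_max = hough_lines(pts, shape=(32, 32))
    peak = np.unravel_index(acc.argmax(), acc.shape)
    print(peak, acc[peak])  # the peak bin accumulates all 20 votes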
HSI: Hue-Saturation-Intensity color image format. [JKS:10.4]

HSL: Hue-Saturation-Luminance color image format. [Figure: a color image decomposed into hue, saturation and luminance components] [WP:HSL and HSV]

HSV: Hue-Saturation-Value color image format. [FP:6.3.2]

hue: Describes color using the dominant wavelength of the light. Hue is a common component of color image formats (see HSI, HSL, HSV). [FP:6.3.2]

Hueckel edge detector: A parametric edge detector that models an edge using a parameterized model within a circular window (the parameters are edge contrast, edge orientation, distance and background mean intensity).

Huffman encoding: An optimal, variable-length encoding of values (e.g., pixel values) based on the relative probability of each value. The code lengths may change dynamically if the relative probabilities of the data source change. This technique is commonly used in image compression. [AL:15.3]
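A minimal sketch of Huffman code construction in Python using the standard library heap (the symbol probabilities are illustrative):

    import heapq

    def huffman_codes(freqs):
        """Build a Huffman code: repeatedly merge the two least probable
        nodes. Returns a dict mapping each symbol to its bit string."""
        # Heap entries: (probability, tiebreak, tree); tree = symbol or (l, r).
        heap = [(p, i, sym) for i, (sym, p) in enumerate(freqs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, t1 = heapq.heappop(heap)
            p2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, count, (t1, t2)))
            count += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
    # Frequent symbols get shorter codes: 'a' -> 1 bit, 'c'/'d' -> 3 bits.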
human motion analysis: A general term describing the application of motion analysis to human subjects. Such analysis is used to track moving people, to recognize the pose of a person and to derive 3D properties. [WP:Motion analysis#Human motion analysis]

HYPER: HYpothesis Predicted and Evaluated Recursively. A well known vision system developed by Nicholas Ayache and Olivier Faugeras, in which geometric relations derived from polygonal models are used for recognition.

hyperbolic surface region: A region of a 3D surface that is locally saddle-shaped; a point on a surface at which the Gaussian curvature is negative (so the signs of the principal curvatures are opposite). [Figure: a saddle-shaped surface patch]

hyperfocal distance: The distance D at which a camera should be focused in order that the depth of field extends from D/2 to infinity. Equivalently, if a camera is focused at a point at distance D, points at D/2 and at infinity are equally blurred. [JKS:8.3]

hyperquadric: A class of volumetric shape representations that includes superquadrics. Hyperquadric models can describe arbitrary convex polyhedra. [SQ:9.11]

hyperspectral image: An image with a large number (perhaps hundreds) of spectral bands. An image with a lower number of spectral bands is referred to as a multi-spectral image. [WP:Hyperspectral imaging]

hyperspectral sensor: A sensor capable of collecting many (perhaps hundreds of) spectral bands simultaneously. Produces a hyperspectral image. [WP:Hyperspectral imaging]

hypothesize and test: See hypothesize and verify. [JKS:15.1]

hypothesize and verify: A common approach to object recognition in which possibilities (of object type and pose) are hypothesized and then evaluated against evidence from the images. This is done either until all possibilities are considered or until a hypothesis with a sufficiently high degree of fit is found. [Figure: a jigsaw piece hypothesized for a gap, and hypotheses that need not be considered] [JKS:15.1]

hysteresis tracking: See thresholding with hysteresis. [OF:4.5]
I

ICA: See independent component analysis. [WP:Independent component analysis]

iconic: Having the characteristics of an image. See iconic model. [SQ:4.1.1]

iconic model: A representation having the characteristics of an image, for example the template used in template matching. [SQ:4.1.1]

iconic recognition: Object recognition using iconic models.

ICP: See iterative closest point. [FP:21.3.2]

ideal line: A line described in the continuous domain, as opposed to one in a digital image, which will suffer from rasterization. [HZ:1.2.2]

ideal point: A point described in the continuous domain, as opposed to one in a digital image, which will suffer from rasterization. May also be used to refer to a vanishing point. [HZ:1.2.2]

IDECS: Image Discrimination Enhancement Combination System. A well-known vision system developed by Haralick and Currier.

identification: The process of associating some observations with a particular instance or class of object that is already known. [TV:10.1]

identity verification: Confirmation of the identity of a person based on some biometrics (e.g., face authentication). This differs from the recognition of an unknown person in that only one model has to be compared with the information that is observed.

IGS: Interpretation Guided Segmentation. A vision technique for grouping image elements into regions
based on semantic interpretations in addition to raw image values. Developed by Tenenbaum and Barrow.

IHS: Intensity-Hue-Saturation color image format. [BB:2.2.5]

IIR: See infinite impulse response filter. [WP:Infinite impulse response]

ill-posed problem: A mathematical problem that infringes at least one of the conditions in the definition of a well-posed problem. Informally, these are that the solution must (a) exist, (b) be unique, and (c) depend continuously on the data. Ill-posed problems in computer vision have been approached using regularization theory. See regularization. [SQ:6.2.1]

illuminance: The total amount of visible light incident upon a point on a surface. Measured in lux (lumens per meter squared) or footcandles (lumens per foot squared). Illuminance decreases as the distance between the viewer and the source increases. [JKS:9.1.1]

illuminant direction: The direction from which illuminance originates. See also light source geometry. [TV:9.3]

illumination: See illuminance. [JKS:9.1.1]

illumination constancy: The phenomenon that allows humans to perceive the lightness/brightness of surfaces as approximately constant regardless of the illuminance.

illumination field calibration: Determination of the illuminance falling on a scene. Typically this is done by taking an image of a white object of known brightness.

illusory contour: A perceived border where there is no edge present in the image data. See also subjective contour. [Figure: the Kanizsa triangle illusion] [FP:14.2]

image: A function describing some quantity (such as brightness) in terms of spatial layout (see image representation). Most frequently, computer vision is concerned with two dimensional digital images. [SB:1.1]

image addition: See pixel addition operator. [SB:3.2.1]

image analysis: A general term covering all forms of analysis of image data. Generally, image analysis operations result in a symbolic description of the image contents. [AJ:1.5]

image acquisition: See image capture. [TV:2.3]

image arithmetic: A general term covering image processing operations that are based on the application of an arithmetic or logical operator to two images. Such operations include addition, subtraction, multiplication, division, blending, AND, NAND, OR, XOR, and XNOR. [SB:3.2]

image based: A general term describing operations or representations that are based on images. [WP:Image analysis]
image based rendering: The production of a new image of a scene from an arbitrary viewpoint, based on a number of images of the scene together with associated range images. [FP:26]

image blending: An arithmetic operation similar to image addition where a new image is formed by blending the values of corresponding pixels from two input images. Each input image is given a weight for the blending so that the total weight is 1.0 (e.g., 0.7 times the first image plus 0.3 times the second). [SB:3.2.1.1]

image capture: The acquisition of an image by a recording device, e.g., a camera. [TV:2.3]

image coding: The mapping or algorithm required to encode or decode an image representation (such as a compressed image). [WP:Graphics Interchange Format#Image coding]

image compression: A method of representing an image in order to reduce the amount of storage space that it occupies. Techniques can be lossless (which allows all image data to be recorded perfectly) or lossy (where some loss of quality is allowed, typically resulting in significantly better compression rates). [SB:1.3.2]

image connectedness: See pixel connectivity. [SB:4.2]

image coordinates: See image plane coordinates and pixel coordinates. [JKS:12.1]

image database indexing: The technique of associating indices (for example, keywords) with images that allows the images to be indexed efficiently within a database. [SQ:13A.3]

image difference: See image subtraction. [SB:3.2.1]

image digitization: The process of sampling and quantizing an analogue image function to create a digital image. [VSN:2.3.1]

image distortion: Any effect that alters an image from the ideal image. Most typically this term refers to geometric distortions, although it can also refer to other types of distortion such as image noise and the effects of sampling and quantization. [Figure: a correct image and a distorted version] [WP:Distortion (optics)]

image encoding: The process of converting an image into a different representation. For example, see image compression. [WP:Image compression]

image enhancement: A general term covering a number of image processing operations that alter an image in order to make it easier for humans to perceive. Example operations include contrast stretching and histogram equalization. [Figure: an image before and after histogram equalization] [SB:4]

image feature: A general term for an interesting image structure that could
arise from a corresponding interesting scene structure. Features can be single points such as interest points, curve vertices, image edges, lines, curves, surfaces, etc. [TV:4.1]

image feature extraction: A group of image processing techniques concerned with the identification of particular features in an image. Examples include edge detection and corner detection. [TV:4.1]

image flow: See optic flow. [JKS:14.4]

image formation: A general term covering issues relating to the manner in which an image is formed. For example, in the case of a digital camera this term would include the camera geometry as well as the process of sampling and quantization. [SB:2.1]

image grid: A geometric map describing the image sampling, in which every image point is represented by a vertex (or hole) in the map/grid.

image indexing: See image database indexing. [SQ:13A.3]

image intensifier: A device for amplifying an image, so that the resultant sensed luminous flux is significantly higher. [WP:Image intensifier]

image interleaving: Describes the way in which image pixels are organized. Different possibilities include pixel interleaving (where the image data is ordered by pixel position) and band interleaving (where the image data is ordered by band and then by pixel position within each band). [WP:Interleaving]

image interpolation: A method for computing a value for a pixel in an output image based on non-integer coordinates in some input image. The computation is based on the values of nearby pixels in the input image. This type of operation is required for most geometric transformations and for computations requiring subpixel resolution. Types of interpolation scheme include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, etc. [Figure: an image enlarged using bicubic interpolation] [RJS:2]
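A minimal NumPy sketch of bilinear interpolation (assuming NumPy), blending the four pixels around a non-integer coordinate:

    import numpy as np

    def bilinear(image, y, x):
        """Bilinear interpolation of a gray scale image at non-integer (y, x):
        a weighted average of the four surrounding pixels."""
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        y1 = min(y0 + 1, image.shape[0] - 1)
        x1 = min(x0 + 1, image.shape[1] - 1)
        dy, dx = y - y0, x - x0
        return ((1 - dy) * (1 - dx) * image[y0, x0] +
                (1 - dy) * dx * image[y0, x1] +
                dy * (1 - dx) * image[y1, x0] +
                dy * dx * image[y1, x1])

    img = np.array([[0.0, 10.0],
                    [20.0, 30.0]])
    print(bilinear(img, 0.5, 0.5))  # 15.0, the average of the four pixels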

image interpretation: A general term for computer vision processes that extract descriptions from images (as opposed to processes that produce output images for human viewing). There is often the assumption that the descriptions are very high-level, e.g., "the boy is walking to the store carrying a book" or "these cells are cancerous". A broader definition would also allow processes that extract information needed by a subsequent (usually non-image-processing) activity, e.g., the position of a bright spot in an image.

image invariant: An image feature or measurement that is invariant to some properties. For example, invariant color features are often used in image database indexing. [WP:Image moment]

image irradiance equation: Usually expressed as E(x, y) = R(p, q), this equality (up to a constant scale factor to account for illumination strength,

surface color and optical efficiency) says


that the observed brightness E at pixel
(x, y) is equal to the reflectance R of
the surface for surface normal
(p, q, 1). Usually there is a
one-degree-of-freedom family of surface
normals with the same reflectance value image morphology: An approach to
so the observed brightness only image processing that considers all
partially constrains local surface operations in terms of set operations.
orientation and thus shape. See mathematical morphology .
[ JKS:9.3.1] [ WP:Mathematical morphology]

image magnification: The extent to image mosaic: A composition of


which an image is expanded for viewing. several images, to provide a single
If the image size is actually changed larger image with covering a wider field
then image interpolation must be used. of view. For example, the following is a
Normally quoted relative to the original mosaic of three images [ RJS:2]:
size (e.g., 2, 10, etc.). [ AJ:7.4]

Magnified image (x4)

image matching: The comparison of


two images, often evaluated using image motion estimation:
cross correlation . See also Computation of optical flow for all
template matching . [ TV:10.4.2] pixels/features in an image.
[ WP:Motion estimation]
Image 1
Image 2
Locations where Image 2 matches Image 1.
image multiplication: See
pixel multiplication operator .
[ SB:3.2.1.2]

image noise: Degradation of an image


image memory: See frame store .
where pixels have values which are
[ ERD:2.2]
different from the ideal values. Often
image modality: A general term for noise is modeled as having a Gaussian
the sensing technique used to capture distribution with a zero mean, although
an image, e.g., a visible light, infrared it can take on different forms such as
or X-ray image. salt-and-pepper noise depending upon
the cause of the noise (e.g., the
image morphing: A gradual environment, electrical inference, etc.).
transformation from one image to Noise is measured in terms of the
another image. [ WP:Morphing] signal-to-noise ratio . [ SB:2.3.3]
[Figure: an original image, with Gaussian noise, and with salt-and-pepper noise]

image normalization: The purpose of image normalization is to reduce or eliminate the effects of different illumination on the same or similar scenes. A typical approach is to subtract the mean of the image and divide by the standard deviation, which produces a zero mean, unit variance image. Since images are not Gaussian random samples, this approach does not completely solve the problem. Further, light source placement can also cause variations in shading that are not corrected by this approach. [Figure: an original image and its normalization] [WP:Normalization (image processing)]

image of absolute conic: See absolute conic. [HZ:7.5]

image pair rectification: See image rectification. [FP:11.1.1]

image plane: The mathematical plane behind the lens onto which an image is focused. In practice, the physical sensing surface aims to be placed here, but its position will vary slightly due to minor variations in sensor shape and placement. The term is also used to describe the geometry of the image recorded at this location. [Figure: the image plane behind the lens, on the optical axis] [JKS:1.4]

image plane coordinates: The position of points in the physical image sensing plane. These have physically meaningful values, such as centimeters. They can be converted to pixel coordinates, which are in pixels. The two meanings are sometimes used interchangeably. [JKS:1.6]

image processing: A general term covering all forms of processing of captured image data. It can also mean processing that starts from an image and results in an image, as contrasted to ending with symbolic descriptions of the image contents or scene. [JKS:1.2]

image processing operator: A function that may be applied to an image in order to transform it in some way. See also image processing. [ERD:2.2]

image pyramid: A hierarchical image representation in which each level contains a smaller version of the image at the previous level. Often the pixel values are obtained by a smoothing process. Usually the reduction is by a power of two (i.e., 2 or 4). The figure below shows four levels of a pyramid in which each level is formed by averaging together two pixels from the previous layer; the levels are enlarged to the original image size for inspection of the effect of the compression. [FP:7.7]
[Figure: four levels of an image pyramid, enlarged to the original size]
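A minimal NumPy sketch of pyramid construction by 2 x 2 block averaging (assuming NumPy and an image whose sides are powers of two):

    import numpy as np

    def build_pyramid(image, levels=4):
        """Image pyramid: each level halves the resolution by averaging
        disjoint 2 x 2 blocks of the level below."""
        pyramid = [image]
        for _ in range(levels - 1):
            a = pyramid[-1]
            h, w = a.shape
            reduced = a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
            pyramid.append(reduced)
        return pyramid

    img = np.random.rand(64, 64)
    for level in build_pyramid(img):
        print(level.shape)  # (64, 64), (32, 32), (16, 16), (8, 8)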
image quality: A general term, usually referring to the extent to which the image data records the observed scene faithfully. The specific issues that are important to image quality are problem specific, but may include low image noise, high image contrast, good image focus, low motion blur, etc. [WP:Image quality]

image querying: A shorthand term for indexing into image databases. This is often done based on color, texture or shape indices. The database keys could be based on global or local measures. [WP:Content-based image retrieval#Query by example]

image reconstruction: A term used in image compression to describe the process of recreating a digital image from some compressed form.

image rectification: A warping of a stereo pair of images such that conjugate epipolar lines (defined by the two cameras' epipoles and any 3D scene point) are collinear. Usually the lines are transformed to be parallel to the horizontal axis so that corresponding image features can be found on the same raster line. This reduces the computational complexity of the stereo correspondence problem. [FP:11.1.1]

image registration: See registration. [FP:21.3]

image representation: A general term for how the image data is represented. Image data can be one, two, three or more dimensional. Image data is often stored in arrays where the spatial layout of the array reflects the spatial layout of the data. [Figure: a small 10 x 10 image patch with the gray scale value of each pixel] [AJ:1.2]

image resolution: Usually used to record the number of pixels in the horizontal and vertical directions in the image, but may also refer to the separation between pixels (e.g., 1 μm) or the angular separation between the lines of sight corresponding to adjacent pixels. [SB:1.2]

image restoration: The process of removing some known (and modelled) distortion from an image, such as the blur in an out-of-focus image. The process may not produce a perfect image, but may remove an undesired distortion (e.g., motion blur) at the cost of another, ignorable distortion (e.g., phase distortion). [SB:6]

image sampling: The process of measuring some pixel values from the physical image focused onto the image plane. The sampling could be monochrome, color or multi-spectral, such as RGB. The sampling usually results in a rectangular array of pixels sampled at nearly equal spacing, but other schemes such as space variant sensing could be used. [VSN:2.3.1]

image scaling: The operation of increasing or reducing the size of an image by some scale factor. This operation may require the use of some type of image interpolation method.

See also image magnification . would be to remove systematic camera


[ WP:Image scaling] motions to produce a motionless image.
See also feature stabilization .
image segmentation: The grouping [ WP:Image stabilization]
of image pixels into meaningful, usually
connected, structures such as curves image sharpening operator: An
and regions . The term is applied to a image enhancement operator that
variety of image modalities , such as increases the high spatial frequency
intensity data or range data and component of the image, so as to make
properties, such as similar the edges of objects appear sharper or
feature orientation , feature motion, less blurred . See also
surface shape or texture . [ SB:10.1] edge enhancement . These images show
a raw image (left) and an image
image sequence: A series of images sharpened with the unsharp operator
generally taken at regular intervals in (right). [ SB:4.6]
time. Typically the camera and/or
objects in the scene will be moving.
[ TV:8.1]

image sequence fusion: The


integration of information from the
many images in an image sequence .
Different types of fusion include
3D structure recovery , production of a image size: The number of pixels in
mosaic of the scanned scene, tracking an image, for example, 768 horizontally
of a moving object, improved scene by 494 vertically.
imaging due to image averaging , etc. [ WP:Wikipedia:What is a featured picture%3F/Image size]
[ WP:Image fusion]

image sequence matching: image smoothing: See


Computing the correspondence between noise reduction . [ TV:3.2]
pixels or image features in frames of
image stabilization: See
the image sequence. With the
image sequence stabilization
correspondences, one can construct
[ WP:Image stabilization]
image mosaics , stabilize image jitter or
recover scene structure . image storage devices: See
frame store . [ ERD:2.2]
image sequence stabilization:
Normal hand-held video camera image subtraction operator: See
recordings contain some image motion pixel subtraction operator . [ SB:3.2.1]
due to the jitter of the human operator.
Image stabilization attempts to image transfer: 1) See
estimate the random portion of the novel view synthesis . 2) Alternatively,
camera motion jitter and translate the a general term describing the movement
images in the sequence to reduce or of an image from one device to another,
remove the jitter. A similar application or alternatively from one representation
to another. [WP:Picture Transfer Protocol]

image understanding: A general term referring to the derivation of high-level (abstract) information from an image or series of images. This term is often used to refer to the emulation of human visual capabilities. [AJ:9.15]

Image Understanding Environment (IUE): A C++ based collection of data-types (classes) and standard computer vision algorithms. The motivation behind the development of the IUE was to reduce the independent re-invention of basic computer vision code in government funded computer vision research.

image warping: A general term for transforming the positions of pixels in an image, usually while maintaining image topology (i.e., neighboring original pixels remain neighbors in the warped image). This results in an image with a new shape. This operation might be done, for example, to correct some geometric distortion, align two images (see image rectification), or transform shapes into a more easily processed form (e.g., circles into straight lines). [SB:7.10]

imaging geometry: A general term referring to the relative placement of sensors, structured light sources, point light sources, etc. [BB:2.2.2]

imaging spectroscopy: The acquisition and analysis of surface composition by using image data from multiple spectral channels. A typical sensor (AVIRIS) records 224 measurements at 10 nm increments from 400 to 2500 nm. The term might refer to the raw multi-dimensional signal or to the classification of that signal into surface types (e.g., vegetation or mineral types). [WP:Imaging spectroscopy]

imaging surface: The surface within a camera on which the image is projected by the lens. In a digital camera this surface is comprised of photosensitive elements that record the incident illumination. See also image plane.

implicit curve: A curve that is defined by an equation of the form f(x) = 0, where x is a point. The curve is the set of points S = {x : f(x) = 0}. [FP:15.3.1]

implicit surface: The representation of a surface as the set of points at which a function has the value zero. For example, the sphere x² + y² + z² = r² of radius r at the origin could be represented by the function f(x, y, z) = x² + y² + z² - r². The set of points where f(x, y, z) = 0 is the implicit surface. [SQ:4.1.2]

impossible object: An object that cannot physically exist. [Figure: an impossible triangle] [VSN:4.1.1]

impulse noise: A form of image corruption where image pixels have their value replaced by the maximum value (e.g., 255). See also salt-and-pepper noise. [Figure: impulse noise on an image] [TV:3.1.2]
incandescent lamp: A light source whose light arises from the glowing of a very hot structure, such as the tungsten filament in the common light bulb. [WP:Incandescent light bulb]

incident light: A general term referring to the light that strikes or illuminates a surface.

incremental learning: Learning that is incremental in nature. See continuous learning. [WP:Population-based incremental learning]

independent component analysis: A multi-variate data analysis method. It finds a linear transformation that makes each component of the transformed data vectors independent of the others. Unlike principal component analysis, which considers only second order properties (covariances) and transforms onto basis vectors that are orthogonal to each other, ICA considers properties of the whole distribution and transforms onto basis vectors that need not be orthogonal. [WP:Independent component analysis]

index of refraction: The absolute index of refraction of a material is the ratio of the speed of an electromagnetic wave in a vacuum to its speed in the material. More commonly used is the relative index of refraction of two media, which is the ratio of their absolute indices of refraction. This ratio is used in lens design and explains the bending of light rays as light passes into a new material (Snell's Law). [FP:1.2.1]

indexing: The process of retrieving an element from a data structure using a key. A powerful concept imported into computer vision from programming. For example, the problem of establishing the identity of an object, given an image and a set of candidate models, is typically approached by locating some characterizing elements in the image, or features, then using the features' properties to index a data base of models. See also model base indexing. [FP:18.4.2]

industrial vision: A general term covering the use of machine vision technology in industrial processes. Applications include product inspection, process feedback, and part or tool alignment. A large range of lighting and sensing techniques are used. A common feature of industrial vision systems is fast processing rates (e.g., several times a second), which may require limiting the rate at which targets are analyzed or limiting the types of processing.

infinite impulse response filter (IIR): A filter that produces an output value (y_n) based on the current and past input values (x_i) together with past output values (y_j):

    y_n = Σ_{i=0..p} a_i x_{n-i} + Σ_{j=1..q} b_j y_{n-j}

where the a_i and b_j are weights. [WP:Infinite impulse response]
index of refraction: The absolute inflection point: A point at which the


index of refraction in a material is the second derivative of a curve changes its
ratio of the speed of an electromagnetic sign, corresponding to a change in
wave in a vacuum to the speed in the concavity. See also curve inflection .
material. More commonly used is the [ FP:19.1.1]
relative index of refraction of two
media, which is the ratio of their
absolute indices of refraction. This
ratio is used in lens design and explains
the bending of light rays as the light
120 I

INFLECTION inspection: A general term for


POINT
visually examining a target to detect
defects. Common practical inspection
examples include printed circuit boards
for breaks or solder joint failures, paper
production for holes or discolorations,
and food for irregularities. [ SQ:17.4]

influence function: A function integer lifting: A method used to


describing the effect of an individual construct wavelet representations.
observations on a statistical model. [ WP:Lifting scheme]
This allows us to evaluate whether the
observation is having an undue integer wavelet transform: An
influence on the model. integer version of the discrete
[ WP:Influence function] wavelet transform .

information fusion: Fusion of integral invariant: An integral (of


information from multiple sources. See some function) that is invariant under a
sensor fusion . set of transformations. For example,
[ WP:Information integration] local integrals along a curve of
curvature or arc length are invariant to
infrared: See infrared light . [ SB:3.1] rotation and translation. Integral
invariants potentially have greater
stability to noise than, e.g., differential
infrared imaging: Production of a invariants, such as curvature itself.
image through use of an
infrared sensor. [ SB:3.1] integration time: The length of time
that a light-sensitive sensor medium is
infrared light: Electromagnetic exposed to the incident light (or other
energy with wavelengths approximately stimulus). Shorter times reduce the
in the range 700 nm to 1 mm. signal strength and possible
Immediately shorter wavelengths are motion blur (if the sensor or objects in
visible light and immediately longer the scene are moving).
wavelengths are microwave radio.
Infrared light is often used in intensity: 1) The brightness of a
machine vision systems because: 1) it light source . 2) Image data that records
is easily observed by most the brightness of the light that comes
semiconductor image sensors yet is not from the observed scene. [ TV:2.2.3]
visible by humans or 2) it is a measure
of the heat emitted by the observed intensity based database indexing:
scene. [ SB:3.1] This is a form of
image database indexing that uses
infrared sensor: A sensor capable of intensity descriptors such as histograms
observing or measuring infrared light . of pixel ( monochrome or color ) values
[ SB:3.1] or vectors of local derivative values.

inlier: A sample that falls within an intensity cross correlation:


assumed probability distribution (e.g., Cross correlation using intensity data .
within the 95 percentile). See also
outlier . [ WP:RANSAC]
I 121

intensity data: Image data that represents the brightness of the measured light. There is not usually a linear mapping between the brightness of the measured light and the stored values. The term can also refer to the intensity of observed visible light.

intensity gradient: The mathematical gradient operation applied to an intensity image I gives the intensity gradient ∇I at each image point. The intensity gradient direction shows the local image direction in which the maximum change in intensity occurs. The intensity gradient magnitude gives the magnitude of the local rate of change in image intensity. (In the original illustration, at each of two designated points the length of a vector shows the magnitude of the change in intensity and the direction of the vector shows the direction of greatest change.) [ WP:Image gradient]

intensity gradient direction: The local image direction in which the maximum change in intensity occurs. See also intensity gradient . [ WP:Gradient#Interpretations]

intensity gradient magnitude: The magnitude of the local rate of change in image intensity. See also intensity gradient . (The original figure shows a raw image and its intensity gradient magnitude, contrast enhanced for clarity.) [ WP:Gradient#Interpretations]
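A minimal sketch of computing the intensity gradient magnitude and direction with finite differences (numpy's gradient operator stands in for a proper derivative filter; this example is not from the dictionary):

    import numpy as np

    def intensity_gradient(image):
        """Return (magnitude, direction) of the intensity gradient of a 2D array."""
        gy, gx = np.gradient(image.astype(float))  # finite-difference derivatives
        magnitude = np.hypot(gx, gy)               # local rate of change
        direction = np.arctan2(gy, gx)             # angle of steepest increase
        return magnitude, direction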
intensity histogram: A data structure that records the number of pixels of each intensity value. A typical gray scale image will have pixels with values in [0,255]. Thus the histogram will have 256 entries recording the number of pixels that had value 0, the number having value 1, etc. (The original figure shows a dark object against a lighter background and its histogram.) [ SB:3.4]

intensity image: An image that records the measured intensity data . [ TV:2.1]

intensity level slicing: An image processing operation in which pixels with values other than the selected value (or range of values) are set to zero. If the image is viewed as a landscape, with height proportional to brightness, then the slicing operator takes a cross section through the height surface. (The original right image shows, in black, the intensity level 80 of the left image.) [ AJ:7.2]
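A minimal sketch of the intensity histogram and intensity level slicing operations just described, assuming an 8-bit grayscale numpy array (the function name and default level are illustrative):

    import numpy as np

    def histogram_and_slice(image, level=80):
        """Intensity histogram of an 8-bit image, plus a level-sliced image."""
        hist = np.bincount(image.ravel(), minlength=256)  # 256-entry histogram
        sliced = np.where(image == level, image, 0)       # keep only one level
        return hist, sliced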
intensity matching: This approach finds corresponding points in a pair of images by matching the gray scale intensity patterns. The goal is to find image neighborhoods that have nearly identical pixel intensities. All image points could be considered for matching, or only feature or interest points . An algorithm where intensity matching is used is correlation based stereo matching.

intensity sensor: A sensor that measures intensity data . [ BM:1.9.1]

interest point: A general term for pixels that have some interesting property. Interest points are often used for making feature point correspondences between images. Thus, the points usually have some identifiable property. Further, because of the need to limit the combinatorial explosion that matching can produce, interest points are often expected to be infrequent in an image. Interest points are often points of high variation in pixel values. See also point feature . (The original figure shows example interest points from the Harris corner detector, courtesy of Marc Pollefeys.) [ SB:10.9]

interest point feature detector: An operator applied to an image to locate interest points . Well-known examples are the Moravec and the Plessey interest point operators. [ SB:10.9]

interference: When 1) ordinary light interacts with matter that has dimensions similar to the wavelength of the light or 2) coherent light interacts with itself, then interference occurs. The most notable effect from a computer vision perspective is the production of interference fringes and the speckle of laser illumination. May alternatively refer to electrical interference, which can affect an image when it is being transmitted on an electrical medium. [ EH:9]

interference fringe: When optical interference occurs, the most noticeable effect it has is the production of interference fringes where the light illuminates a surface. These are parallel, roughly equally spaced lighter and darker bands of brightness . One important consequence of these bands is blurring of the edge positions. [ EH:9.1]

interferometric SAR: An enhancement of synthetic aperture radar (SAR) sensing to incorporate phase information from the reflected signal, increasing accuracy. [ WP:Interferometric synthetic aperture radar]
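A minimal sketch of a Moravec-style interest point response, one of the operators named above (a simplified reading: the window size and shift set are illustrative choices, and interior pixel coordinates are assumed):

    import numpy as np

    def moravec_response(image, y, x, half=2):
        """Minimum SSD between a window and its four axis/diagonal shifts.

        High values indicate intensity variation in all directions,
        i.e., a candidate interest point. (y, x) must be interior pixels.
        """
        img = image.astype(float)
        w = img[y - half:y + half + 1, x - half:x + half + 1]
        ssds = []
        for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
            s = img[y + dy - half:y + dy + half + 1,
                    x + dx - half:x + dx + half + 1]
            ssds.append(((w - s) ** 2).sum())
        return min(ssds)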
interior orientation: A photogrammetry term for the calibration of the intrinsic parameters of a camera, including its focal length, principal point, lens distortion, etc. This allows transformation of measured image coordinates into camera coordinates . [ JKS:12.9]

interlaced scanning: A technique arising from television engineering, whereby alternate rows of an image are scanned or transmitted instead of consecutive rows. Thus, one television frame is transmitted by sending first the odd rows, forming the odd field , and then the even rows, forming the even field . [ LG:4.1.2]

intermediate representation: A representation that is created as a stage in the derivation of some other representation from some input representation. For example, the raw primal sketch , full primal sketch , and 2.5D sketch were intermediate representations between input images and a 3D model in Marr's theory . (The original figure shows a binary image of a notice board, reading "08:32 1 Malahide", as an intermediate representation between the input image and the textual output.) [ WP:Intermediate representation]

internal energy (or force): A measure of the stability of a shape (such as smoothness) of an active shape or deformable contour model , which is part of the deformation energy . This measure is used to constrain the appearance of the model. [ WP:Internal energy]

internal parameters (of camera): See intrinsic parameters. [ FP:2.2]

inter-reflection: The reflection caused by light reflected off a surface and bouncing off another surface of the same object. See also mutual illumination . [ WP:Diffuse reflection#Interreflection]

interpolation: A mathematical process whereby a value is inferred from other nearby values or from a mathematical function linking nearby values. For example, dense values along a curve can be linearly interpolated between two known curve points by fitting a line connecting the two curve points. Image, surface and volume values can be interpolated, as well as higher dimensional structures. Interpolating functions can be curved as well as linear. [ BB:A1.11]

interpretation tree search: An algorithm for matching between members of two discrete sets. For each feature from the first set, it builds a depth-first search tree considering all possible matching features from the second set. After a match is found for one feature (by satisfying a set of consistency tests), it then tries to match the remaining features. The algorithm can cope when no match is possible for a given feature by allowing a given number of skipped features. (The original figure shows a partial interpretation tree matching model features to data features: each data feature DATA 1, DATA 2, ... is tried against model features M1, M2, M3 and a wildcard *, with failed branches marked as consistency failures.) [ TV:10.2]

interval tree: An efficient structure for searching in which every node in the tree is a parent to nodes in a particular interval of values. [ WP:Interval tree]
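A minimal recursive sketch of the depth-first matching loop described under interpretation tree search above; the consistency test and the skip budget are assumed, user-supplied parameters:

    def interpretation_tree(data, models, consistent, skips_allowed=1, match=()):
        """Depth-first search assigning a model label (or '*') to each data feature.

        `consistent(match, d, m)` is an assumed user-supplied test that checks a
        candidate pairing against the matches made so far.
        """
        if not data:
            return match  # all data features explained
        d, rest = data[0], data[1:]
        for m in models:
            if consistent(match, d, m):
                result = interpretation_tree(rest, models, consistent,
                                             skips_allowed, match + ((d, m),))
                if result is not None:
                    return result
        if skips_allowed > 0:  # wildcard branch: skip this feature
            return interpretation_tree(rest, models, consistent,
                                       skips_allowed - 1, match + ((d, '*'),))
        return None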
intrinsic camera parameters: Parameters such as focal length, coefficients of radial lens distortion, and the position of the principal point, that describe the mapping from image pixels to world rays in a camera. Determining the parameters of this mapping is the task of camera calibration . For a pinhole camera, world rays \(\vec{r}\) are mapped to homogeneous image coordinates \(\vec{x}\) by \(\vec{x} = K\vec{r}\), where K is the upper triangular 3 × 3 matrix

\[ K = \begin{pmatrix} \alpha_u f & s & u_0 \\ 0 & \alpha_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \]

In this form, f represents the focal length, s is the skew angle between the image coordinate axes, (u₀, v₀) is the principal point, and α_u and α_v are the aspect ratios (e.g., pixels/mm) in the u and v image directions. [ FP:2.2]

intrinsic dimensionality: The number of dimensions (degrees of freedom) inherent in a data set, independent of the dimensionality of the space in which it is represented. For example, a curve in 3D is intrinsically 1D although its points are represented in 3D. [ WP:Intrinsic dimension]

intrinsic image: A term describing one of a set of images registered with the input intensity image that describe properties intrinsic to the scene, instead of properties of the input image. Example intrinsic images include: distance to scene points, scene surface orientations , surface reflectance , etc. (The original right image shows a depth image registered with the intensity image on the left.) [ BB:1.5]

intruder detection: An application of machine vision , usually analyzing a video sequence to detect the appearance of an unwanted person in a scene. [ SQ:17.5]

invariant: Something that does not change under specified operations (e.g., translation invariant ). [ WP:Invariant (mathematics)]

invariant contour function: The contour function characterizes the shape of a planar figure based on the external boundary. Values invariant to position, scale or orientation can be computed from the contour functions. These invariants can be used for recognition of instances of the planar figure.

inverse convolution: See deconvolution . [ AL:14.5]

inverse Fourier transform: A transformation that allows a signal to be recreated from its Fourier coefficients. See Fourier transform . [ SEU:2.5.1]

inverse square law: A physical law that says the illumination power received at distance d from a point light source is inversely proportional to the square of d, i.e., is proportional to 1/d². [ WP:Inverse-square law]
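A minimal sketch of mapping a camera-frame ray to pixel coordinates with the intrinsic matrix K defined above (the numeric parameter values are illustrative only, not from the dictionary):

    import numpy as np

    def pixel_from_ray(ray, f=500.0, s=0.0, u0=320.0, v0=240.0, au=1.0, av=1.0):
        """Project a 3D camera-frame ray to pixel coordinates via x = K r."""
        K = np.array([[au * f, s,      u0],
                      [0.0,    av * f, v0],
                      [0.0,    0.0,    1.0]])
        x = K @ np.asarray(ray, dtype=float)
        return x[:2] / x[2]  # convert from homogeneous to pixel coordinates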
invert operator: A low-level image processing operation where a new image is formed by replacing each pixel by an inverted value. For binary images , this is 1 if the input pixel is 0 or 0 if the input pixel is 1. For gray level images , this depends on the maximum range of intensity values. If the range of intensity values is [0,255], then the inverse of a pixel with value x is 255 − x. The result is like a photographic negative. (The original figure shows a gray level image and its inverted image.) [ LG:5.1.2]

IR: See infrared . [ SB:3.1]

irradiance: The amount of energy received at a point on a surface from the corresponding scene point. [ JKS:9.1]

isometry: A transformation that preserves distances. Thus the transformation T : x ↦ u is an isometry if, for all pairs (x, y), we have |x − y| = |T(x) − T(y)|. [ HZ:1.4.1]

isophote curvature: Isophotes are curves of constant image intensity. Isophote curvature is defined at any given pixel as −L_vv / L_w, where L_w is the magnitude of the gradient perpendicular to the isophote and L_vv is the curvature of the intensity surface along the isophote at that point.

iso-surface: A surface in a 3D space where the value of some function is constant, i.e., f(x, y, z) = C where C is some constant. [ WP:Isosurface]

isotropic gradient operator: A gradient operator that computes the scalar magnitude of the gradient, i.e., a value that is independent of edge direction. [ JKS:5.1]

isotropic operator: An operator that produces the same output irrespective of the local orientation of the pixel neighborhood where the operator is applied. For example, a mean smoothing operator produces the same output value, even if the image data is rotated at the point where the operator is being applied. On the other hand, a directional derivative operator would produce different values if the image were rotated. This concept is particularly relevant to feature detectors , some of which are sensitive to the local orientation of the image pixel values and some of which are not (isotropic). [ LG:6.4.1]

iterated closest point: See iterative closest point . [ FP:21.3.2]

iterative closest point (ICP): A shape alignment algorithm that works by iterating its two-stage process until some termination point: step 1) given an estimated transformation of the first shape onto the second, find the closest feature from the second shape for each feature of the first shape, and step 2) given the new set of closest features, re-estimate the transformation that maps the first feature set onto the second. Most variations of the algorithm need a good initial estimate of the alignment. [ FP:21.3.2]

IUE: See Image Understanding Environment .
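A minimal 2D sketch of the two-stage ICP loop just described, using brute-force nearest neighbors and a Procrustes/SVD transform estimate (a common choice, assumed here rather than prescribed by the dictionary):

    import numpy as np

    def icp(src, dst, iterations=20):
        """Align 2D point set `src` to `dst` by alternating matching/fitting."""
        R, t = np.eye(2), np.zeros(2)
        for _ in range(iterations):
            moved = src @ R.T + t
            # Step 1: for each source point, find the closest destination point.
            d = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
            nearest = dst[d.argmin(axis=1)]
            # Step 2: re-estimate the rigid transform (Procrustes via SVD).
            mu_s, mu_d = src.mean(axis=0), nearest.mean(axis=0)
            U, _, Vt = np.linalg.svd((src - mu_s).T @ (nearest - mu_d))
            R = (U @ Vt).T
            if np.linalg.det(R) < 0:  # guard against reflections
                Vt[-1] *= -1
                R = (U @ Vt).T
            t = mu_d - mu_s @ R.T
        return R, t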
J

Jacobian: The matrix of derivatives of a vector function. Typically if the function \(\vec{f}(\vec{x})\) is written in component form as

\[ \vec{f}(\vec{x}) = \vec{f}(x_1, x_2, \ldots, x_p) = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_p) \\ f_2(x_1, x_2, \ldots, x_p) \\ \vdots \\ f_n(x_1, x_2, \ldots, x_p) \end{pmatrix} \]

then the Jacobian J is the n × p matrix

\[ J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_p} \end{pmatrix} \]

[ SQ:2.2.1]

joint entropy registration: Registration of data using joint entropy (a measure of the degree of uncertainty) as a criterion. [ WP:Mutual information#Applications of mutual information]

JPEG: A common format for compressed image representation designed by the Joint Photographic Experts Group (JPEG). [ SEU:1.8]

junction label: A symbolic label for the pattern of edges meeting at the junction. This approach is mainly used in blocks world scenes where all objects are polyhedra, and thus all lines are straight and meet at only a limited number of configurations. (The original figure shows example Y junctions, i.e., the corner of a block seen front on, and arrow junctions, i.e., the corner of a block seen from the side.) See also line label . [ VSN:4.1.1]
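A minimal sketch of estimating the Jacobian defined above numerically by central differences (the step size and helper name are illustrative assumptions):

    import numpy as np

    def jacobian(f, x, eps=1e-6):
        """Numerically estimate the n x p Jacobian of f at x."""
        x = np.asarray(x, dtype=float)
        n = len(np.atleast_1d(f(x)))
        J = np.zeros((n, x.size))
        for j in range(x.size):
            step = np.zeros_like(x)
            step[j] = eps
            J[:, j] = (np.atleast_1d(f(x + step))
                       - np.atleast_1d(f(x - step))) / (2 * eps)
        return J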
K

k-means: An iterative squared error clustering algorithm. Input is a set of points \(\{\vec{x}_i\}_{i=1}^n\) and an initial guess at the locations \(\vec{c}_1, \ldots, \vec{c}_k\) of k cluster centers. The algorithm alternates two steps: points are assigned to the cluster center closest to them, and then the cluster centers are recomputed as the mean of the associated points. Iterating yields an estimate of the k cluster centers that is likely to minimize \(\sum_{\vec{x}} \min_{\vec{c}} |\vec{x} - \vec{c}|^2\). [ FP:14.4.2]

k-means clustering: See k-means . [ FP:14.4.2]

k-medians (also k-medoids): A variant of k-means clustering in which multi-dimensional medians are computed instead of means. The definition of multi-dimensional median varies, but options for the median \(\vec{m}\) of a set of points \(\{\vec{x}_i\}_{i=1}^n\), i.e., \(\{(x_1^i, \ldots, x_d^i)\}_{i=1}^n\), include the componentwise definition \(\vec{m} = (\mathrm{median}\{x_1^i\}_{i=1}^n, \ldots, \mathrm{median}\{x_d^i\}_{i=1}^n)\) and the analogue of the one dimensional definition \(\vec{m} = \arg\min_{\vec{m} \in \mathbb{R}^d} \sum_{i=1}^n |\vec{m} - \vec{x}_i|\). [ WP:K-medians clustering]
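A minimal sketch of the two alternating k-means steps described above (fixed iteration count and row-vector point layout are assumed simplifications):

    import numpy as np

    def k_means(points, centers, iterations=100):
        """Alternate assignment and mean-update steps of the k-means algorithm."""
        centers = centers.copy()
        for _ in range(iterations):
            # Assign each point to its closest cluster center.
            d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Recompute each center as the mean of its assigned points.
            for k in range(len(centers)):
                if np.any(labels == k):
                    centers[k] = points[labels == k].mean(axis=0)
        return centers, labels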
k-nearest-neighbor algorithm: A nearest neighbor algorithm that uses the classifications of the nearest k neighbors when making a decision. [ FP:22.1.4]

Kalman filter: A recursive linear estimator of a varying state vector and associated covariance from observations, their associated covariances and a dynamic model of the state evolution. Improved estimates are calculated as new data is obtained. [ FP:17.3]
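A minimal scalar sketch of one Kalman filter predict/update cycle (identity dynamics and the noise variances q and r are illustrative assumptions; a full filter uses state vectors and covariance matrices):

    def kalman_step(x, P, z, q=1e-3, r=1e-2):
        """One predict/update cycle of a scalar Kalman filter.

        x, P: state estimate and its variance; z: new observation;
        q, r: assumed process and measurement noise variances.
        """
        # Predict (identity dynamics assumed for this sketch).
        x_pred, P_pred = x, P + q
        # Update with the new observation.
        K = P_pred / (P_pred + r)        # Kalman gain
        x_new = x_pred + K * (z - x_pred)
        P_new = (1 - K) * P_pred
        return x_new, P_new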
Karhunen–Loève transformation: The projection of a vector (or image treated as a vector) onto an orthogonal space that has uncorrelated components constructed from the autocorrelation (scatter) matrix of a set of example vectors. An advantage is that the orthogonal components have a natural ordering (by the largest eigenvalues of the covariance of the original vector space) so that one can select the most significant variation in the dataset. The transformation can be used as a basis for image compression, for estimating linear models in high dimensional datasets and estimating the dominant modes of variation in a dataset, etc. It is also known as the principal component transformation. (The original figure shows a dataset before and after the KL transform was applied, with the principal eigenvector aligned to an axis.) [ AJ:5.11]
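A minimal sketch of the Karhunen–Loève (principal component) transformation just described, projecting row vectors onto the top eigenvectors of their scatter matrix (function name and component count are illustrative):

    import numpy as np

    def karhunen_loeve(vectors, n_components=2):
        """Project row vectors onto the top eigenvectors of their scatter matrix."""
        X = vectors - vectors.mean(axis=0)            # center the data
        scatter = X.T @ X                             # scatter (autocorrelation) matrix
        eigvals, eigvecs = np.linalg.eigh(scatter)    # ascending eigenvalues
        basis = eigvecs[:, ::-1][:, :n_components]    # largest-eigenvalue directions
        return X @ basis                              # uncorrelated components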
kernel: 1) A small matrix of numbers key frames in a sequence are drawn by
that is used in image convolutions . 2) more experienced animators and then
The structuring element used in intermediate interpolating frames are
mathematical morphology . 3) The drawn by less experienced animators.
mathematical transformation used In computer vision
kernel discriminant analysis . motion sequence analysis , key frames
[ FP:7.1.1] are the analogous video frames ,
typically displaying
kernel discriminant analysis: A motion discontinuities between which
classification approach based on three the scene motion can be smoothly
key observations: 1) some problems interpolated. [ WP:Key frame]
need curved classification boundaries,
2) the classification boundaries should KHOROS: An image processing
be defined locally by the classes rather development environment with a large
than globally and 3) a high set of operators. The system comes
dimensional classification space can be with a pull-down interactive
avoided by using the kernel method. development workspace where operators
The method provides a transformation can be instantiated and connected by
via a kernel so that linear discriminant click and drag operations.
analysis can be done in the input space
instead of the transformed space. kinetic depth: A technique for
estimating
[ WP:Linear discriminant analysis#Practical use] the depth at image feature
points (usually edges ) by exploiting a
Kirsch compass edge detector: A first derivative edge detector that computes the gradient in different directions according to which calculation mask is used. Edges have high gradient values, so thresholding the intensity gradient magnitude is one approach to edge detection . A Kirsch mask that detects edges at 45 degrees is [ SEU:2.3.4]:

\[ \begin{pmatrix} -3 & 5 & 5 \\ -3 & 0 & 5 \\ -3 & -3 & -3 \end{pmatrix} \]
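A minimal sketch of applying the 45-degree Kirsch mask above by convolution over the valid interior of a gray scale image (the helper name is illustrative; the other seven compass masks are rotations of these weights):

    import numpy as np

    # One Kirsch mask (45 degrees).
    KIRSCH_45 = np.array([[-3,  5,  5],
                          [-3,  0,  5],
                          [-3, -3, -3]])

    def kirsch_response(image, mask=KIRSCH_45):
        """Convolve a gray scale image with a Kirsch mask (valid region only)."""
        img = image.astype(float)
        h, w = img.shape
        out = np.zeros((h - 2, w - 2))
        for dy in range(3):
            for dx in range(3):
                # flip indices to implement convolution rather than correlation
                out += mask[2 - dy, 2 - dx] * img[dy:dy + h - 2, dx:dx + w - 2]
        return out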
knowledge-based vision: A style of image interpretation that relies on multiple processing components capable of different image analysis processes, some of which may solve the same task in different ways. Linking the components together is a reasoning algorithm that knows about the capabilities of the different components, when they might be usable or might fail. An additional common component is some form of task dependent knowledge encoded in a knowledge representation that is used to help guide the reasoning algorithm. Also common is some uncertainty mechanism that records the confidence that the system has about the outcomes of its processing. For example, a knowledge-based vision system might be used for aerial analysis of road networks, containing specialized detection modules for straight roads, road junctions, forest roads as well as survey maps, terrain type classifiers, curve linking, etc. [ RN:10.2]

knowledge representation: A general term for methods of computer encoding knowledge. In computer vision systems, this is usually knowledge about recognizable objects and visual processing methods. A common knowledge representation scheme is the geometric model that records the 2D or 3D shape of objects. Other commonly used vision knowledge representation schemes are graph models and frames . [ BT:9]

Koenderink's surface shape classification: An alternative to the more common mean curvature and Gaussian curvature 3D surface shape classification labels. Koenderink's scheme decouples the two intrinsic shape parameters into one parameter (S) that represents the local surface shape (including cylindrical, hyperbolic, spherical and planar) and a second parameter (C) that encodes the magnitude of the curvedness of the shape. (The original figure illustrates the shape classes at S = −1, −1/2, 0, +1/2 and +1.)
Kohonen network: A multi-variate data clustering and analysis method that produces a topological organization of the input data. The response of the whole network to a given data vector can be used as a lower dimensional signature of the data vector. [ WP:Counterpropagation network]

KTC noise: A type of noise associated with Field Effect Transistor (FET) image sensors. The KTC term is used because the noise is proportional to kTC, where T is the temperature, C is the capacitance of the image sensor and k is Boltzmann's constant. This noise arises during image capture at each pixel independently and is also independent of integration time. [ WP:Johnson-Nyquist noise#Thermal noise on capacitors]

Kullback–Leibler distance/divergence: A measure of the relative entropy or distance between two probability densities p₁(x⃗) and p₂(x⃗), defined as [ CS:6.3.4]

\[ D(p_1 \,\|\, p_2) = \int p_1(\vec{x}) \log \frac{p_1(\vec{x})}{p_2(\vec{x})} \, d\vec{x} \]

kurtosis: A measure of the flatness of a distribution of gray scale values. If n_g is the number of pixels out of N with gray scale value g, then the fourth histogram moment is \(\mu_4 = \frac{1}{N} \sum_g n_g (g - \mu_1)^4\), where μ₁ is the mean pixel value. The kurtosis is μ₄ − 3. [ AJ:9.2]

Kuwahara: An edge-preserving noise reduction filter . The filter uses four regions surrounding the pixel being smoothed. The smoothed value for that pixel is the mean value of the region with smallest variance.
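A minimal discrete sketch of the Kullback–Leibler divergence defined above, e.g., between two intensity histograms (normalization and the small epsilon guard are illustrative assumptions):

    import numpy as np

    def kl_divergence(p1, p2, eps=1e-12):
        """Discrete Kullback-Leibler divergence D(p1 || p2) between two histograms."""
        p1 = np.asarray(p1, dtype=float) / np.sum(p1)   # normalize to densities
        p2 = np.asarray(p2, dtype=float) / np.sum(p2)
        return float(np.sum(p1 * np.log((p1 + eps) / (p2 + eps))))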
L

label: A description associated with something for the purposes of identification. For example see region labeling . [ BB:12.4]

labeling problem: Given a set S of image structures (which may be pixels as well as more structured objects like edges ) and a set of labels L, the labeling problem is the question of how to assign a label l ∈ L to each image structure s ∈ S. This process is usually dependent on both the image data and neighboring labels. A typical remote sensing application is to label image pixels by their land type, such as water, snow, sand, wheat field, forest, etc. (In the original figure, a range image has its pixels labeled by the sign of their mean curvature : white for negative, light gray for zero, dark gray for positive, and black for missing data.) [ BB:12.4]

lacunarity: A scale dependent measure of translational invariance based on the size distribution of holes within a set. High lacunarity indicates that the set is heterogeneous and low lacunarity indicates homogeneity. [ PGS:3.3]

LADAR: LAser Detection And Ranging or Light Amplification for Detection and Ranging. See laser radar . [ BB:2.3.2]

Lagrange multiplier technique: A method of constrained optimization to find a solution to a numerical problem that includes one or more constraints. The classical form of the Lagrange multiplier technique finds the parameter vector \(\vec{v}\) minimizing (or maximizing) the function \(f(\vec{v}) = g(\vec{v}) + \lambda h(\vec{v})\), where g() is the function being minimized and h() is a constraint function that has value zero when its argument satisfies the constraint. The Lagrange multiplier is λ. [ BKPH:A.5]
Laguerre formula: A formula for computing the directed angle between two 3D lines based on the cross ratio of four points. Two points arise where the two image lines intersect the ideal line (i.e., the line through the vanishing points ) and the other two points are the ideal line's absolute points (intersection of the ideal line and the absolute conic ).

Lambert's law: The observed shading on ideal diffuse reflectors is independent of observer position and varies with the angle between the surface normal and source direction. (The original figure shows the light source, surface normal and camera geometry.) [ JKS:9.1.2]

Lambertian surface: A surface whose reflectance obeys Lambert's law , more commonly known as a matte surface . These surfaces have an equally bright appearance from all viewpoints . Thus, the shading of the surface depends only on the relative direction of the incident illumination . [ FP:4.3.3]

landmark detection: A general term for detecting an image feature that is commonly used for registration . The registration might be between a model and the image or it might be between two images, etc. Landmarks might be task specific, such as components on an electronic circuit card or an anatomical feature such as the tip of the nose, or might be a more general image feature such as interest points . [ SB:9.1]

LANDSAT: A series of satellites launched by the United States of America that are a common source of satellite images of the Earth. LANDSAT 7, for example, was launched in April 1999 and provides complete coverage of the Earth every 16 days. [ BB:2.3.1]

Laplacian: Loosely, the Laplacian of a function is the sum of its second order partial derivatives. For example, the Laplacian of f(x, y, z) : R³ ↦ R is

\[ \nabla^2 f(x, y, z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \]

In computer vision, the Laplacian operator may be applied to an image by convolution with the Laplacian kernel, one definition of which is given by the sum of the second derivative kernels [1, −2, 1] and [1, −2, 1]ᵀ, with zero padding to make the result 3 × 3 [ JKS:5.3.1]:

\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]

Laplacian of Gaussian operator: A low-level image operator that applies the second derivative Laplacian operator (∇²) after a Gaussian smoothing operation everywhere in an image. It is an isotropic operator . It is often used as part of a zero crossing edge detection operator because the locations where the value of the output image changes sign (positive to negative or vice versa) are located near the edges in the input image, and the detail of the detected edges can be controlled by use of the scale parameter of the Gaussian smoothing. (The original text gives an example mask that implements the Laplacian of Gaussian operator with smoothing parameter σ = 1.4.) [ JKS:5.4]
Laplacian pyramid: A compressed image representation in which a pyramid of Laplacian images is created. At each level of the scheme, the current gray scale image has the Laplacian applied to it. The next level gray scale image is formed by Gaussian smoothing and subsampling. At the final level, the smoothed and subsampled image is kept. The original image can be approximately reconstructed level by level through expanding and smoothing the current level image and then adding the Laplacian. [ FP:9.2.1]

laser: Light Amplification by Stimulated Emission of Radiation. A very bright light source often used for machine vision applications because of its properties: most light is at a single spectral frequency , the light is coherent, so various interference effects can be exploited, and the light beam can be processed so that divergence is slight. Two common applications are structured light triangulation and range sensing . [ EH:14.2]

laser illumination: A very bright light source useful because of its limited spectrum, bright power and coherence. See also laser . [ EH:14.2]

laser radar (LADAR): A LIDAR range sensor that uses laser light. See also laser range sensor . [ BB:2.3.2]

laser range sensor: A laser -based range sensor records the distance from the sensor to a target or target scene by detecting the image of a laser spot or stripe projected onto the scene. These sensors are commonly based on structured light triangulation , time of flight or phase difference technologies. [ TV:2.5.3]

laser speckle: A time-varying light pattern produced by interference of the light reflected from a surface illuminated by a laser . [ EH:14.2.2]

laser stripe triangulation: A structured light triangulation system that uses laser light. For example, a projected plane of light that would normally result in a straight line in the camera image is distorted by any objects in the scene, where the distortion is proportional to the height of the object. (The original figure shows a typical triangulation geometry: a laser stripe projector casting a stripe onto a scene object observed by a camera/sensor.) [ JKS:11.4.1]

lateral inhibition: A process whereby a given feature weakens or eliminates nearby features. An example of this appears in the Canny edge detector , where locally maximal intensity gradient magnitudes cause adjacent gradient values that lie across (as contrasted with along) the edge to be set to zero. [ RN:6.2]
Laws texture energy measure: A measure of the amount of image intensity variation at a pixel. The measure is based on 5 one dimensional finite difference masks convolved orthogonally to give 25 2D masks. The 25 masks are then convolved with the image. The outputs are smoothed nonlinearly and then combined to give 14 contrast and rotation invariant measures. [ PGS:4.6]

least mean square estimation: Also known as least square estimation or mean square estimation. Let \(\vec{v}\) be the parameter vector that we are searching for and \(e_i(\vec{v})\) be the error measure associated with the ith of N data items. The error measure often used is the Euclidean , algebraic or Mahalanobis distance between the ith data item and a curve or surface being fit, that is parameterized by \(\vec{v}\). Then the mean square error is

\[ \frac{1}{N} \sum_{i=1}^{N} e_i(\vec{v})^2 \]

The desired parameter vector \(\vec{v}\) minimizes this sum. [ WP:Least squares]

least median of squares estimation: Let \(\vec{v}\) be the parameter vector that we are searching for and \(e_i(\vec{v})\) be the error associated with the ith of N data items. The error measure often used is the Euclidean , algebraic or Mahalanobis distance between the ith data item and a curve or surface being fit that is parameterized by \(\vec{v}\). Then the median square error is the median or middle value of the sorted set \(\{e_i(\vec{v})^2\}\). The desired parameter vector \(\vec{v}\) minimizes this median value. This estimator usually requires more computation for the iterative and sorting algorithms but can be more robust to outliers than the least mean square estimator . [ JKS:13.6.3]

least square curve fitting: A least mean square estimation process that fits a parametric curve model or a line to a collection of data points, usually 2D or 3D. Fitting often uses the Euclidean , algebraic or Mahalanobis distance to evaluate the goodness of fit. (The original figure shows an example of least square ellipse fitting.) [ FP:15.2-15.3]

least square estimation: See least mean square estimation .

least squares fitting: A general term for a least mean square estimation process that fits some parametric shape, such as a curve or surface , to a collection of data. Fitting often uses the Euclidean , algebraic or Mahalanobis distance to evaluate the goodness of fit. [ BB:A1.9]

least square surface fitting: A least mean square estimation process that fits a parametric surface model to a collection of data points, usually range data. Fitting often uses the Euclidean , algebraic or Mahalanobis distance to evaluate the goodness of fit. (In the original figure, a range image has planar and cylindrical surfaces fitted to the data.) [ JKS:3.5]
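A minimal sketch of the least square curve fitting entries above, for the simplest case of fitting a line y = ax + b by minimizing the vertical (algebraic) error:

    import numpy as np

    def fit_line(x, y):
        """Least square fit of y = a*x + b to point data."""
        A = np.column_stack([x, np.ones_like(x)])       # design matrix
        (a, b), residuals, _, _ = np.linalg.lstsq(A, y, rcond=None)
        return a, b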
leave-one-out test: A method for testing a solution in which one sample is left out of the training set and used instead for testing. This can be done for every sample. [ FP:22.1.5]

LED: Light Emitting semiconductor Diode. Often used as detectable point light source markers or controllable illumination. [ LG:7.1]

left-handed coordinate system: A 3D coordinate system with the XYZ axes arranged such that, with X pointing right and Y pointing up, Z points into the page. The alternative is a right-handed coordinate system . [ WP:Cartesian coordinate system#Orientation and handedness]

Legendre moment: The Legendre moment of a piecewise continuous function f(x, y) with order (m, n) is

\[ \frac{(2m+1)(2n+1)}{4} \int_{-1}^{+1} \int_{-1}^{+1} P_m(x) P_n(y) f(x, y) \, dx \, dy \]

where P_m(x) is the mth order Legendre polynomial. These moments can be used for characterizing image data and images can be reconstructed from the infinite set of moments.

Lempel–Ziv–Welch (LZW): A form of file compression based on encoding commonly occurring byte sequences. This form of compression is used in the common GIF image file format. [ SEU:5.2.3]

lens: A physical optical device for focusing incident light onto an imaging surface, such as photographic film or an electronic sensor. Lenses can also be used to change magnification , enhance or modify a field of view. [ BKPH:2.3]

lens distortion: Unexpected variation in the light field passing through a lens. Examples are radial lens distortion or chromatic aberration , and usually arise from how the lens differs from the ideal lens. [ JKS:12.9]

lens equation: The simplest case of a convex converging lens with focal length f perfectly focused on a target at distance D has distance d between the lens and the image plane as related by the lens equation 1/f = 1/D + 1/d. (The original figure shows the scene object at distance D on the optical axis, the lens, and the image plane at distance d.) [ JKS:8.1]

lens type: A general term for lens shapes and functions, such as convex or half-cylindrical, converging, magnifying, etc. [ BKPH:2.3]

level set: The set of data points \(\vec{x}\) that satisfy a given equation of the form f(\(\vec{x}\)) = c. Varying the value of c gives different sets of usually closely related points. A visual analogy is of a geographic surface and the ocean rising. If the function f() is the sea level, then the level sets are the shore lines for different sea levels c. (The original figure shows an intensity image and the pixels at level (brightness) 80.) [ SQ:8.6.1]
Levenberg–Marquardt optimization: A numerical multi-variate optimization method that switches smoothly between gradient descent when far from a (local) optimum and a second-order inverse Hessian (quadratic) method when nearer. [ FP:3.1.2]

license plate recognition: A computer vision application that aims to identify a vehicle's license plate from image data. Image data is often acquired from automatic cameras at places where vehicles slow down, such as bridges and toll barriers. [ WP:Automatic number plate recognition]

LIDAR: LIght Detection And Ranging. A range sensor using (usually) laser light . It can be based on the time of flight of a pulse of laser light or the phase shift of a waveform. The measurement could be of a single point or an array of measurements if the light beam is swept across the scene/object. [ BB:2.3.2]

Lie groups: A group that can be represented as a continuous and differentiable manifold of a space, such that group operations are also continuous. An example of a Lie group is the orthogonal group SO(3) = {R ∈ ℝ³ˣ³ : RᵀR = I, det(R) = 1} of rigid 3D rotations . [ WP:Lie group]

light: A general term for the electromagnetic radiation used in many computer vision applications. The term could refer to the illumination in the scene or the irradiance coming from the scene onto the sensor. Most computer vision applications use light that is visible , infrared or ultraviolet . [ AJ:3.2]

light source: A general term for the source of illumination in a scene, whether deliberate or accidental. The light source might be a point light source or an extended light source . [ FP:5.2]

light source detection: The process of detecting the position of or direction to the light sources in the scene, even if not observable. The light sources are usually assumed to be point light sources for this process.

light source geometry: A general term referring to the shape and placement of the light sources in a scene.

light source placement: A general term for the positions of the light sources in a scene. It may also refer to the care that machine vision applications engineers take when placing the light sources so as to minimize unwanted lighting effects, such as shadows and specular reflections , and to enhance the visibility of desired scene structures, e.g., by back lighting or oblique lighting .

light stripe ranging: See structured light triangulation . [ JKS:11.4.1]

lightfield: A function that encodes the radiance at an empty point in space as a function of the point's position and the direction of the illumination . A lightfield allows image based rendering of new (unoccluded) scene views from arbitrary positions within the lightfield. [ WP:Light field]
lighting: A general term for the illumination in a scene, whether deliberate or accidental. [ LG:2.1.1]

lightness: The estimated or perceived reflectance of a surface, when viewed in monochrome . [ RN:6.1-6.2]

lightpen: A user-interface device that allows people to indicate places on a computer screen by touching the screen at the desired place with the pen. The computer can then draw items, select actions, etc. It is effectively a type of mouse that acts on the display screen instead of on a mat. [ WP:Light pen]

likelihood ratio: The ratio of probabilities of observing data D with and without condition C: P(D|C) / P(D|¬C). [ WP:Likelihood function]

limb extraction: A process of image interpretation that extracts 1) the arms or legs of people or animals, e.g., for tracking, or 2) the barely visible edge of a curved surface as it curves away from an observer (derived from an astronomical term). See also occluding contour .

line: Usually refers to a straight ideal line that passes through two points, but may also refer to a general curve marking, e.g., on paper. [ RJS:APPENDIX 1]

line cotermination: When two lines have endpoints in exactly or nearly the same location.

line detection operator: A feature detection process that detects lines. Depending on the specific operator, locally linear line segments may be detected or straight lines might be globally detected. Note that this detects lines as contrasted with edges . [ RN:7.3]

line drawing analysis: 1) Analysis of hand-made or CAD drawings to extract a symbolic description or shape description. For example, research has investigated extracting 3D building models from CAD drawings. Another application is the analysis of hand-drawn circuit sketches to form a circuit description. 2) Analysis of the line junctions in a polyhedral blocks world scene, in order to understand the 3D structure of the scene. [ VSN:4]

line fitting: A curve fitting problem where the objective is to estimate the parameters of a straight line that best interpolates given point data. [ DH:9.2]

line following: See line grouping . [ DH:7.7]

line grouping: Generally refers to the process of creating a longer curve by grouping together shorter fragments found by line detection . These might be short connecting locally detected line fragments, or might be longer straight line segments separated by a gap. May also refer to the grouping of line segments on the basis of grouping principles such as parallelism. See also edge tracking , perceptual organization , Gestalt .
line intersection: Where two or more lines intersect at a point. The lines cross or meet at a line junction . [ BKPH:15.6]

line junction: The point at which two or more lines meet. See junction labeling . [ VSN:4.1.1]

line label: In an ideal polyhedral blocks world scene, lines arise from only a limited set of physical situations, such as convex or concave surface shape discontinuities ( fold edges ), occluding edges where a fold edge is seen against the background (blade edge), crack edges where two polyhedra have aligned edges, or shadow edges. Line labels identify the type of line (i.e., one of these types). Assigning labels is one step in scene understanding that helps deduce the 3D structure of the scene. See also junction label . (The original figure shows the usual line labels for convex (+), concave (−) and occluding (>) edges.) [ BKPH:15.6]

line linking: See line grouping .

line matching: The process of making a correspondence between the lines in two sets. One set might be a geometric model such as used in model based recognition or model registration or alignment . Alternatively, the lines may have been extracted from different images, as when doing feature based stereo or estimating the epipolar geometry between the two lines.

line moment: A line moment is similar to the traditional area moment but is calculated only at points (x(s), y(s)) along the object contour. The pqᵗʰ moment is \(\int x(s)^p y(s)^q \, ds\). The infinite set of line moments uniquely determines the contour.

line moment invariant: A set of invariant values computable from the line moments . These may be invariant to translation, scaling and rotation.

line of sight: A straight line from the observer or camera into the scene, usually to some target. [ JKS:1.4]

line scan camera: A camera that uses a solid-state or semiconductor (e.g., CMOS) linear array sensor , in which all of the photosensitive elements are in a single 1D line. Typical line scan cameras have between 32 and 8192 elements. These sensors are used for a variety of machine vision applications such as scanning, flow process control and position sensing. [ BT:3]

line segmentation: See curve segmentation . [ DH:9.2.4]
line spread function: The line spread function describes how an ideal infinitely thin line would be distorted after passing through an optical system. Normally, this can be computed by integrating the point spread functions of an infinite number of points along the line. [ EH:11.3.5]

line thinning: See thinning . [ JKS:2.5.11]

linear: 1) Having a line-like form. 2) A mathematical description for a process in which the relationship between some input variables \(\vec{x}\) and some output variables \(\vec{y}\) is given by \(\vec{y} = A\vec{x}\) where A is a matrix. [ BKPH:6.1]

linear array sensor: A solid-state or semiconductor (e.g., CMOS) sensor in which all of the photosensitive elements are in a single 1D line. Typical linear array sensors have between 32 and 8192 elements and are used in line scan cameras .

linear discriminant analysis: See linear discriminant function . [ SB:11.6]

linear discriminant function: Assume a feature vector \(\vec{x}\) based on observations of some structure. (Assume that the feature vector is augmented with an extra term with value 1.) A linear discriminant function is a basic classification process that determines which of two classes or cases the structure belongs to based on the sign of the linear function \(l = \vec{a} \cdot \vec{x} = \sum_i a_i x_i\), for a given coefficient vector \(\vec{a}\). For example, to discriminate between unit side squares and unit diameter circles based on the area A, the feature vector is \(\vec{x} = (A, 1)^\top\) and the coefficient vector is \(\vec{a} = (1, -0.89)^\top\). If l > 0, then the structure is a square, otherwise a circle. [ SB:11.6]

linear features: A general term for features that are locally or globally straight, such as lines or straight edges.

linear filter: A filter whose output is a weighted sum of its inputs, i.e., all terms in the filter are either constants or variables. If {xᵢ} are the inputs (which may be pixel values from a local neighborhood or pixel values from the same position in different images of the same scene, etc.), then the linear filter output would be \(\sum_i a_i x_i + a_0\), for some constants aᵢ. [ FP:7]

linear regression: Estimation of the parameters of a linear relationship between two random variables X and Y given sets of samples \(\vec{x}_i\) and \(\vec{y}_i\). The objective is to estimate the matrix A and vector \(\vec{a}\) that minimize the residual \(r(A, \vec{a}) = \sum_i \| \vec{y}_i - A\vec{x}_i - \vec{a} \|^2\). In this form, the \(\vec{x}_i\) are assumed to be noise-free quantities. When both variables are subject to error, orthogonal regression is preferred. [ WP:Linear regression]

linear transformation: A mathematical transformation of a set of values by addition and multiplication by constants. If the set of values is a vector \(\vec{x}\), the general linear transformation produces another vector \(\vec{y} = A\vec{x}\), where \(\vec{y}\) need not have the same dimension as \(\vec{x}\) and A is a constant matrix (i.e., is not a function of \(\vec{x}\)). [ SQ:2.2.1]

lip shape analysis: An application of computer vision to understanding the position and shape of human lips as part of face analysis . The goal might be face recognition or expression understanding .

lip tracking: An application of computer vision to following the position and shape of human lips in a video sequence. The goal might be for lip reading, augmentation of deaf sign analysis or focusing of resolution during image compression .
local: A local property of a mathematical object is one that is defined in terms only of a small neighborhood of the object, for instance, curvature . In image processing, a local operator operates on a small number of nearby pixels at a time. [ BKPH:4.2]

local binary pattern: Given a local neighborhood about a point, use the value of the central pixel to threshold the neighborhood. This creates a local descriptor of the gray scale structure that is invariant to lightness and contrast transformations, and that can be used to create local texture primitives . [ PGS:4.7]
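A minimal sketch of the 8-neighbor local binary pattern described above (the neighbor ordering and the >= threshold convention are illustrative choices; interior pixel coordinates are assumed):

    import numpy as np

    def local_binary_pattern(image, y, x):
        """8-bit local binary pattern at (y, x): neighbors thresholded by the center."""
        center = image[y, x]
        # Clockwise 8-neighborhood offsets starting at the top-left pixel.
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        code = 0
        for bit, (dy, dx) in enumerate(offsets):
            if image[y + dy, x + dx] >= center:
                code |= 1 << bit
        return code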
local contrast adjustment: A form of contrast enhancement that adjusts pixel intensities based on the values of nearby pixels instead of the values of all pixels in the image. (In the original figure, the right image has the eye area's brightness enhanced relative to the original image at the left, while maintaining the background's contrast.)

local curvature estimation: A part of surface or curve shape estimation that estimates the curvature at a given point based on the position of nearby parts of the curve or surface. For example, the curve y = sin(x) has zero local curvature at the point x = 0 (i.e., the curve is locally uncurved or straight), although the curve has nonzero local curvature at other points (e.g., at x = π/4). See also differential geometry .

Local Feature Focus (LFF) method: A 2D part identification and pose estimation algorithm that can cope with large amounts of occlusion of the parts. The algorithm uses a mixture of property-based classifiers , graph models and geometric models . The key identification process is based around local configurations of image features , which is more robust to occlusion . [ R. C. Bolles and R. A. Cain, Recognizing and locating partially visible objects: the local-feature-focus method, Int. J. of Robotics Research, 1:57-82, 1982.]

local invariant: See local point invariant .

local operator: An image processing operator that computes its output at each pixel from the values of the nearby pixels instead of using all or most of the pixels in the image. [ JKS:1.7.2]

local point invariant: A property of local shape or intensity that is invariant to, e.g., translation, rotation, scaling, contrast or brightness changes, etc. For example, a surface's Gaussian curvature is invariant to change in position.

local surface shape: The shape of a surface in a small region around a point, often classified into one of a small number of surface shape classifications . Computed as a function of the surface curvatures .

local variance contrast: The variance of the pixel values computed in a neighborhood about each pixel. Contrast is the difference between the larger and smaller values of this variance. Large values of this property occur in highly textured or varying areas.
L 141

larger and smaller values of this calculus. For example, a square can be
variance. Large values of this property defined as: square(s)
occurs in highly textured or varying polygon(s) & number of sides(s, 4)
areas. & e1 e2 (e1 6= e2 &
side of (s, e1 ) & side of (s, e2 )
log-polar image: An & length(e1 ) = length(e2 )
image representation in which the & (parallel(e1 , e2 )
pixels are not in the standard Cartesian | perpendicular(e1 , e2 ))) .
layout but instead have a space varying
layout. In the log-polar case, the image long baseline stereo: See
is parameterized by a polar coordinate wide baseline stereo .
and a radial coordinate r. However,
unlike polar coordinates , the radial long motion sequence: A
distance increases exponentially as r video sequence of more than just a few
grows. The mapping from position frames in which there is significant
(, r) to Cartesian coordinates is camera or scene motion. The essential
r r
( cos(), sin()), where is some idea is that the 3D scene structure can
design parameter. Further, the amount be inferred by effectively a stereo vision
of area of the image plane represented process. Here the matched
by each pixel grows exponentially with image features can be tracked through
r, although the precise pixel size the sequence, instead of having to solve
depends on factors like amount of pixel the stereo correspondence problem . If
overlap, etc. See also foveal image . a long sequence is not available, then
The receptive fields of a log-polar analysis could use optical flow or
image (courtesy of Herman Gomes) can short baseline stereo .
be seen in the outer rings of:
look-up table: Given a finite set of
input values { xi } and a function on
these values, f (x), a look-up table
records the values
{ (xi , f (xi )) } so that the value of the
function f () can be looked up directly
rather than recomputed each time.
Look-up tables can be easily used for
color remapping or standard functions
of integer pixel values (e.g., the
logarithm of a pixels value).
[ BKPH:10.14]
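A minimal sketch of a look-up table for 8-bit pixel values, here precomputing the logarithm-style mapping mentioned above (the exact mapping is an illustrative choice):

    import numpy as np

    # Precompute a 256-entry look-up table for a log-like brightness mapping.
    lut = np.array([int(255 * np.log1p(i) / np.log1p(255)) for i in range(256)],
                   dtype=np.uint8)

    def apply_lut(image, table=lut):
        """Remap every pixel of an 8-bit image through the look-up table."""
        return table[image]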
lossless compression: A category of image compression in which the original image can be exactly reconstructed from the compressed image. This contrasts with lossy compression . [ SB:1.3.2]

lossy compression: A category of image compression in which the original image cannot be exactly reconstructed from the compressed image. The goal is to lose insignificant image details (e.g., noise ) while limiting perception of changes to the image appearance. Lossy algorithms generally produce greater compression than lossless compression . [ SB:1.3.2]
low angle illumination: A machine vision technique, often used for industrial vision , where a light source (usually a point light source ) is placed so that a ray of light from the source to the inspection point is almost perpendicular to the surface normal at that point. The situation can also arise naturally, e.g., from the sun's position at dawn or dusk. One consequence of this low angle is that shallow surface shape defects and cracks cast strong shadows that may simplify the inspection process. (The original figure shows a camera above a target point that is lit almost edge-on by the light source.)

low frequency: Usually referring to low spatial frequency in the context of computer vision . The low-frequency components of an image are the slowly changing intensity components of the image, such as large regions of bright and dark pixels. If low temporal frequency is the intended meaning, then low frequency refers to slowly changing patterns of brightness or darkness at the same pixel in a video sequence. (The original figure shows the low-frequency components of an image.) [ WP:Low frequency]

low level vision: A general and somewhat imprecisely (i.e., contentiously) defined term for the initial stages of image analysis in a vision system. It can also be used for the initial stages of processing in biological vision systems. Roughly, low level vision refers to the first few stages of processing applied to intensity images . Some authors use this term only for operations that result in other images. So, edge detection is about where most authors would say that low-level vision ends and middle-level vision starts. [ BB:1.2]

low pass filter: This term is imported from 1D signal processing theory into image processing . The term low is a shorthand for low frequency, which, in the context of a single image, means low spatial frequency , i.e., intensity patterns that change over many pixels. Thus a low pass filter applied to an image leaves the low spatial frequency patterns, or large, slowly changing patterns, and removes the high spatial frequency components (sharp edges , noise ). Low pass filters are a kind of smoothing or noise reduction filter. Alternatively, filtering is applied to the changing values of a given pixel over an image sequence. In this case the pixel values can be treated as a sampled time sequence and the original signal processing definition of low pass filter is appropriate. Filtering this way removes rapid temporal changes. See also high pass filter . (The original figure shows an image and a low-pass filtered version.) [ LG:6.2]
Lowe's curve segmentation method: An algorithm that tries to split a curve into a sequence of straight line segments. The algorithm has three main stages: 1) a recursive splitting of segments into two shorter, but more line-like segments, until all remaining segments are very short. This forms a tree of segments. 2) Merging segments in the tree in a bottom-up fashion according to a straightness measure. 3) Extracting the remaining unmerged segments from the tree as the segmentation result.

luma: The luminance component of light. Color can be divided into luma and chroma . [ FP:6.3.2]

luminance: The measured intensity from a portion of a scene. [ AJ:3.2]

luminance efficiency: The sensor specific function V(λ) that determines how the observed light I(x, y, λ) at sensor position (x, y) of wavelength λ contributes to the measured luminance \(l(x, y) = \int I(\lambda) V(\lambda) \, d\lambda\) at that point. [ AJ:3.2]

luminous flux: The amount of light at all wavelengths that passes through a given region in space. Proportional to perceived brightness. [ WP:Luminous flux]

luminosity coefficient: A component of tristimulus color theory . The luminosity coefficient is the amount of luminance contributed by a given primary color to the total perceived luminance. [ AJ:3.8]
M

M-estimation: A robust generalization of least square estimation and maximum likelihood estimation . [ FP:15.5.1]

Mach band effect: An effect in the human visual system in which a human observer perceives a variation in brightness at the edges of a region of constant brightness. This variation makes the region appear slightly darker when it is beside a brighter region and appear slightly brighter when it is beside a darker region. [ AJ:3.2]

machine vision: A general term for processing image data by a computer and often synonymous with computer vision . There is a slight tendency to use machine vision for practical vision systems, such as for industrial vision , and computer vision for more exploratory vision systems or for systems that aim at some of the competences of the human vision system. [ JKS:1.1]

macrotexture: The intensity pattern formed by spatially organized texture primitives on a surface, such as a tiling. This contrasts with microtexture . [ JKS:7.1]

magnetic resonance imaging (MRI): See NMR . [ FP:18.6]

magnification: The process of enlargement (e.g., of an image). The amount of enlargement applied. [ AJ:7.4]

magnitude-retrieval problem: The reconstruction of a signal based on only the phase (not the magnitude) of the Fourier transform .
Mahalanobis distance: The distance between two N-dimensional points scaled by the statistical variation in each component of the point. For example, if \(\vec{x}\) and \(\vec{y}\) are two points from the same distribution that has covariance matrix C, then the Mahalanobis distance is given by

\[ \left( (\vec{x} - \vec{y})^\top C^{-1} (\vec{x} - \vec{y}) \right)^{1/2} \]

The Mahalanobis distance is the same as the Euclidean distance if the covariance matrix is the identity matrix. A common usage in computer vision systems is for comparing feature vectors whose elements are quantities having different ranges and amounts of variation, such as a 2-vector recording the properties of area and perimeter. [ SB:11.8]
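A minimal sketch of the Mahalanobis distance formula above (a direct transcription; for repeated queries one would normally factor or cache the inverse of C):

    import numpy as np

    def mahalanobis(x, y, C):
        """Mahalanobis distance between points x and y under covariance matrix C."""
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return float(np.sqrt(d @ np.linalg.inv(C) @ d))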
mammogram analysis: A mammogram is an X-ray of the human female breast. The main purpose of analysis is the detection of potential signs of cancerous growths.

Manhattan distance: Also called the Manhattan metric. Motivated by the problem of only being able to walk along city blocks in dense urban environments, the distance between points (x₁, y₁) and (x₂, y₂) is |x₁ − x₂| + |y₁ − y₂|. [ BB:2.2.6]

many view stereo: See multi-view stereo . [ FP:11.4]

MAP: See maximum a posteriori probability . [ AJ:8.15]

map analysis: Analyzing an image of a map (e.g., obtained with a flat-bed scanner) in order to extract a symbolic description of the terrain described by the map. This is now a largely obsolete process given digital map databases. [ WP:Map analysis]

map registration: The registration of a symbolic map to (usually) aerial or satellite image data. This may require identifying roads, buildings or land features. (The original figure shows a road model, in black, overlaying an aerial image.)

marching cubes: An algorithm for locating surfaces in volumetric datasets. Given a function f() on the voxels , the algorithm estimates the position of the surface f(\(\vec{x}\)) = c for some c. This requires estimating where the surface intersects each of the twelve edges of a voxel. Many implementations propagate from one voxel to its neighbors, hence the marching term. [ W. Lorensen and H. Cline, Marching Cubes: a high resolution 3D surface construction algorithm, Computer Graphics, Vol. 21, pp 163-169, 1987.]

marginal distribution: A probability distribution of a random variable X derived from the joint probability distribution of a number of random variables, integrated over all variables except X. [ WP:Marginal distribution]

Markov Chain Monte Carlo: Markov Chain Monte Carlo (MCMC) is a statistical inference method useful for estimating the parameters of complex distributions. The method generates samples from the distribution by running the Markov Chain that models the problem for a long time (hopefully to equilibrium) and then uses the ensemble of samples to estimate the distribution. The states of the Markov Chain are the possible configurations of the problem. [ WP:Markov chain Monte Carlo]
Markov random field (MRF): An image model in which the value at a pixel can be expressed as a linear weighted sum of the values of pixels in a finite neighborhood about the original pixel plus an additive random noise value. [ JKS:7.4]

Marr's theory: A shortened term for Marr's theory of the human vision system. Some of the key stages in this integrated but incomplete theory are the raw primal sketch, full primal sketch, 2.5D sketch and 3D object recognition. [ BT:11]

Marr-Hildreth edge detector: An edge detector based on multi-scale analysis of the zero-crossings of the Laplacian of Gaussian operator. [ NA:4.3.3]

mask: A term for an $m \times n$ array of numbers or symbolic labels. A mask can be the smoothing mask used in a convolution, the target in a template matching or the kernel used in a mathematical morphology operation, etc. Here is a simple mask for computing an approximation to the Laplacian operator [ TV:3.2]:

    0  1  0
    1 -4  1
    0  1  0

matched filter: A matched filter is an operator that produces a strong result in the output image when it processes a portion of the input image containing a pattern for which it is matched. For example, the filter could be tuned for the letter "e" in a given font size and type style, or a particular face viewed at the right scale. It is similar to template matching except the matched filter can be tuned for spatially separated patterns. This is a signal processing term imported into image processing. [ AJ:9.12]

matching function: See similarity metric. [ DH:6.7]

matching method: A general term for finding the correspondences between two structures (e.g., surface matching) or sets of features (e.g., stereo correspondence). [ JKS:15.5.2]

mathematical morphology operation: A class of mathematically defined image processing operations in which the result is based on the spatial pattern of the input data values rather than the values themselves. For example, a morphological line thinning algorithm would identify places in an image where a line description was represented by data more than 1 pixel wide (i.e., the pattern to match). As this is redundant, the thinning algorithm would choose one of the redundant pixels to be set to 0. Mathematical morphology operations can apply to both binary and gray scale images. This figure shows a small image patch before and after a thinning operation [ SQ:7]: [Figure: image patch before and after thinning]

matrix: A mathematical structure of a given number of rows and columns with each entry usually containing a number. A matrix can be used to represent a transformation between two coordinate systems, record the covariance of a set of vectors, etc. A matrix for rotating a 2D vector by $\frac{\pi}{6}$ radians is [ AJ:2.7]:

$\begin{pmatrix} \cos(\frac{\pi}{6}) & -\sin(\frac{\pi}{6}) \\ \sin(\frac{\pi}{6}) & \cos(\frac{\pi}{6}) \end{pmatrix} = \begin{pmatrix} 0.866 & -0.500 \\ 0.500 & 0.866 \end{pmatrix}$
matrix array camera: A 2D solid state imaging sensor, such as those found in typical current video, webcam and machine vision cameras. [ LG:2.1.3]

matte surface: A surface whose reflectance follows the Lambertian model. [ BB:3.5.1]

maximal clique: A clique (all nodes are connected to all other nodes in the clique) where no further nodes exist that are connected to all nodes in the clique. Maximal cliques may have different sizes; the issue is maximality, not size. Maximal cliques are used in association graph matching algorithms to represent maximally matched structures. The graph below has two maximal cliques: BCDE and ABD. [ BB:11.3.3]

[Figure: a graph with nodes A, B, C, D, E]

maximum a posteriori probability: The highest probability after some event or observations. This term is often used in the context of parameter estimation, pose estimation or object recognition problems, in which case we wish to estimate the parameters, position or identity respectively that have highest probability given the observed image data. [ AJ:8.15]

maximum entropy: A method for extracting the maximum amount of information (entropy) from a measurement (such as an image) in the presence of noise. This method will always give a conservative result, only presenting structure where there is evidence for it. [ AJ:6.2]

maximum entropy restoration: An image restoration technique based on maximum entropy. [ AJ:8.14]

maximum likelihood estimation: Estimating the parameters of a problem that have the highest likelihood or probability given the observed data. For example, the maximum likelihood estimate of the mean of a Gaussian distribution is the average of the observed samples drawn from that distribution. [ AJ:8.15]

MCMC: See Markov Chain Monte Carlo. [ WP:Markov chain Monte Carlo]

MDL: See minimum description length. [ FP:16.3.4]
mean and Gaussian curvature shape classification: A classification of a local (i.e., very small) surface patch (often at single pixels from a range image) into one of a set of simple surface shape classes based on the signs of the mean and Gaussian curvatures. The standard set of shape classes is: {plane, concave cylinder, convex cylinder, concave ellipsoid, convex ellipsoid, saddle valley, saddle ridge, minimal}. Sometimes the classes {saddle valley, saddle ridge, minimal} are conflated into the single class hyperbolic. This table summarizes the classifications based on the curvature signs:

                                MEAN CURVATURE
                         -                 0             +
    GAUSSIAN     -   saddle ridge      minimal      saddle valley
    CURVATURE    0   convex cylinder   plane        concave cylinder
                 +   convex ellipsoid  IMPOSSIBLE   concave ellipsoid
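A small Python sketch of the sign-based lookup above (the eps tolerance for treating a curvature as zero is an assumption; real systems must threshold noisy curvature estimates):

    import numpy as np

    def classify_surface(H, K, eps=1e-6):
        # Map the signs of mean (H) and Gaussian (K) curvature to a
        # shape class, following the standard table.
        table = {
            (-1, -1): "saddle ridge",    (0, -1): "minimal",
            (1, -1): "saddle valley",
            (-1, 0): "convex cylinder",  (0, 0): "plane",
            (1, 0): "concave cylinder",
            (-1, 1): "convex ellipsoid", (0, 1): "impossible",
            (1, 1): "concave ellipsoid",
        }
        sh = 0 if abs(H) < eps else int(np.sign(H))
        sk = 0 if abs(K) < eps else int(np.sign(K))
        return table[(sh, sk)]

    print(classify_surface(0.0, 0.0))    # plane
    print(classify_surface(-0.2, 0.05))  # convex ellipsoid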
mean curvature: A mathematical characterization for a component of local surface shape at a point on a smooth surface. Each point can be uniquely described by a pair of principal curvatures. The mean curvature is the average of the principal curvatures. [ JKS:13.3.2]

mean filter: See mean smoothing operator. [ JKS:4.3]

mean shift: An adaptive gradient ascent technique that operates by iteratively moving the center of a search window to the average of certain points within the window. [ WP:Mean-shift]

mean smoothing operator: A noise reduction operator that can be applied to a gray scale image or to separate components of a multi-spectral image. The output value at each pixel is the average of the values of all pixels in a neighborhood of the input pixel. The size of the neighborhood determines how much smoothing (or noise reduction) is done, but also how much blurring of fine detail occurs. An image with Gaussian noise with $\sigma = 13$ and its mean smoothing are [ JKS:4.3]: [Figure: noisy image and its mean smoothing]

measurement resolution: The degree to which two differing quantities can be distinguished by measurement. This may be the minimum spatial distance that two adjacent pixels represent (spatial resolution) or the minimum time difference between visual observations (temporal resolution), etc. [ WP:Resolution#Measurement resolution]

medial axis skeletonization: See medial axis transform. [ BB:8.3.4]

medial axis transform: An operation on a binary image that transforms regions into sets of pixels that are the centers of circles that are bitangent to the boundary and that fit entirely within the region. The value of each point on the axis is the radius of the bitangent circle. This can be used to represent the region by a simpler axis-like structure and is most effective on elongated regions. A region and its medial axis are below. [ BB:8.3.4] [Figure: a region and its medial axis]

medial line: A curve going through the middle of an elongated structure. See also medial axis transform. This figure shows a region and its medial line. [ BB:8.3.4] [Figure: a region and its medial line]
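Two brief illustrations of the operators above, assuming NumPy, SciPy and scikit-image are installed (none of this code comes from the dictionary; sizes and values are made up). First, the mean smoothing operator as a 5 x 5 box average:

    import numpy as np
    from scipy import ndimage

    # Noisy gray scale image; each output pixel becomes the average of
    # its 5x5 neighborhood, trading fine detail for noise reduction.
    image = np.random.default_rng(0).normal(128, 13, (64, 64))
    smoothed = ndimage.uniform_filter(image, size=5)

And a sketch of the medial axis transform, where the returned distances give the radius of the bitangent circle at each axis pixel:

    import numpy as np
    from skimage.morphology import medial_axis

    region = np.zeros((40, 120), dtype=bool)
    region[10:30, 5:115] = True     # an elongated rectangular region

    # skeleton marks axis pixels; dist holds each pixel's distance to
    # the boundary, so skeleton * dist is the bitangent-circle radius.
    skeleton, dist = medial_axis(region, return_distance=True)
    radii = skeleton * dist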
medial surface: The medial surface of a volume is the 3D generalization of the medial axis of a planar region. It is the locus of centers of spheres that touch the surface of the volume at three or more points. [ BB:8.3.4]

median filter: See median smoothing. [ JKS:4.4]

median flow filtering: A noise reduction operation on vector data that generalizes the median filter on image data. The assumption is that the vectors in a spatial neighborhood about the current vector should be similar. Dissimilar vectors are rejected. The term "flow" arose through the filter's development in the context of image motion.

median smoothing: An image noise reduction operator that replaces a pixel's value by the median (middle) of the sorted pixel values in its neighborhood. An image with salt-and-pepper noise and the result of applying median smoothing are [ JKS:4.4]: [Figure: noisy image and its median smoothing]

medical image registration: A general term for registration of two or more medical image types or an atlas with some image data. A typical registration would align X-ray CAT and NMR images. [ WP:Image registration#Applications]

membrane model: A surface fitting model that minimizes a combination of the smoothness of the fit surface and the closeness of the fit surface to the original data. The surface class must have $C^0$ continuity and thus it differs from the smoother thin plate model that has $C^1$ continuity.

mesh model: A tessellation of an image or surface into polygonal patches, much used in computer aided design (CAD). The vertices of the mesh are called nodes, or nodal points. A popular class of meshes is based on triangles, for instance the Delaunay triangulation. Meshes can be uniform, i.e., all polygons are the same, or non-uniform. Uniform meshes can be represented by small sets of parameters. Surface meshes have been used for modeling free-form surfaces (e.g., faces, landscapes). See also surface fitting. This icosahedron is a mesh model of a nearly spherical object [ JKS:13.5]: [Figure: icosahedron mesh]

mesh subdivision: Methods for subdividing cells in a mesh model into progressively smaller cells. For example see Delaunay triangulation. [ WP:Mesh subdivision]
metameric colors: Colors that are defined by a limited number of channels, each of which integrates a range of the spectrum. Hence the same metameric color can be caused by a variety of spectral distributions. [ BKPH:2.5.1]

metric determinant: The metric determinant is a measure of curvature. For surfaces, it is the square root of the determinant of the first fundamental form matrix of the surface.

metric property: A visual property that is a measurable quantity, such as a distance or area. This contrasts with logical properties such as image connectedness. [ HZ:1.7]

metric reconstruction: Reconstruction of the 3D structure of a scene with correct spatial dimensions and angles. This contrasts with projective reconstruction. Two views of a metrical and projective reconstruction of a cube are below. The metrical projection looks correct from all views, but the perspective projection may look correct only from the views where the data was acquired. [ WP:Camera auto-calibration#Problem statement] [Figure: observed and reconstructed views for a metrical reconstruction and a perspective reconstruction]

metric stratum: These are the set of similarity transformations (i.e., rigid transformations with a scaling). This is what can be recovered from image data without external information such as some known length.

metrical calibration: Calibration of intrinsic and extrinsic camera parameters to enable metric reconstruction of a scene.

Mexican hat operator: A convolution operator that implements either a Laplacian of Gaussian or difference of Gaussians operator (which produce very similar results). The mask that can be used to implement this convolution has a shape similar to a Mexican hat (sombrero), as seen here [ JKS:5.4]: [Figure: surface plot of the sombrero-shaped mask]

micron: One millionth of a meter; a micrometer. [ EH:2.2]

microscope: An optical device for observing small structures such as organic cells, plant fibers or integrated circuits. [ EH:5.7.5]

microtexture: See statistical texture. [ RN:8.3.1]

mid-sagittal plane: The plane that separates the body (and brain) into left and right halves. In medical imaging (e.g., NMR), it usually refers to a view of the brain sliced down the middle between the two hemispheres. [ WP:Sagittal plane#Variations]

middle level vision: A general term referring to the stages of visual data processing between low level and high level vision. There are many variations of the definition of this term, but a usable rule of thumb is that middle level vision starts with descriptions of the contents of an image and results in descriptions of the features of the scene. Thus, binocular stereo would be a middle level vision process because it acts on image edge fragments to produce 3D scene fragments.
MIMD: See multiple instruction multiple data. [ RJS:8]

minimal point: A point on a hyperbolic surface where the two principal curvatures are equal in magnitude but opposite in sign, i.e., $\kappa_1 = -\kappa_2$. [ WP:Maxima and minima]

minimal spanning tree: Consider a graph G and a subset T of the arcs in G such that all nodes in G are still connected in T and there is exactly one path joining any two nodes. T is a spanning tree. If each arc has a weight (possibly constant), the minimal spanning tree is the tree T with smallest total weight. This is a graph and its minimal spanning tree [ DH:6.10.2.1]: [Figure: a graph (left) and its minimal spanning tree (right)]
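For illustration, SciPy provides a minimal spanning tree computation over a weighted arc matrix; the 4-node graph below is hypothetical:

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree  # assumes SciPy

    # Symmetric arc-weight matrix (0 = no arc between the nodes).
    W = np.array([[0, 2, 0, 6],
                  [2, 0, 3, 8],
                  [0, 3, 0, 5],
                  [6, 8, 5, 0]])
    T = minimum_spanning_tree(W)   # sparse matrix holding the tree arcs
    print(T.toarray())             # kept arcs and their weights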
minimum bounding rectangle: The rectangle of smallest area that surrounds a set of image data. [ WP:Minimum bounding rectangle]

minimum description length (MDL): A criterion for comparing descriptions usually based on the implicit assumption that the best description is the one that is shortest (i.e., takes the fewest number of bits to encode). The minimum description usually requires several components: 1) the models observed (e.g., whether lines or circular arcs), 2) the parameters of the models (e.g., the line endpoints), 3) how the image data varies from the models (e.g., explicit deviations or noise model parameters) and 4) the remainder of the image that is not explained by the models. [ FP:16.3.4]

minimum distance classifier: Given an unknown sample with feature vector $\vec{x}$, select the class $c$ with model vector $\vec{m}_c$ for which the distance $\|\vec{x} - \vec{m}_c\|$ is smallest. [ SB:11.8]
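A minimal sketch of this rule, with invented class labels and model vectors:

    import numpy as np

    def min_distance_classify(x, class_means):
        # Return the label whose model (mean) vector is nearest to x.
        labels = list(class_means)
        d = [np.linalg.norm(x - class_means[c]) for c in labels]
        return labels[int(np.argmin(d))]

    # Hypothetical 2D feature space with two class model vectors:
    means = {"bolt": np.array([2.0, 10.0]), "nut": np.array([8.0, 3.0])}
    print(min_distance_classify(np.array([7.0, 4.0]), means))  # -> "nut"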
minimum spanning tree: See minimal spanning tree. [ DH:6.10.2.1]

MIPS: Millions of instructions per second. [ WP:Instructions per second]

mirror: A specularly reflecting surface for which incident light is reflected only at the same angle and in the same plane as the surface normal. [ EH:5.4]

miss-one-out test: See leave-one-out test. [ FP:22.1.5]

missing data: Data that is unavailable, hence requiring it to be estimated. For example, a moving person may become occluded, resulting in missing position data for a number of frames. [ FP:16.6.1]

missing pixel: A pixel for which no value is available (e.g., if there was a problem with a sensing element in the image sensor). [ FP:16.6.1]

mixed pixel: A pixel whose measurement arises from more than one scene phenomenon. For example, a pixel that observes the edge between two regions. This pixel has a gray level that lies between the different gray levels of the two regions.

mixed reality: Image data that contains both original image data and overlaid computer graphics. See also augmented reality. This image shows an example of mixed reality, where the butterfly is a graphical object added to the image of the small robot. [ WP:Mixed reality] [Figure: robot image with an added graphical butterfly]

mixture model: A probabilistic representation in which more than one distribution is combined, modeling a situation where the data may arise from different sources or have different behaviors, each with different probability distributions. [ FP:16.6.1]

MLE: See maximum likelihood estimation. [ AJ:8.15]

modal deformable model: A deformable model based on modal analysis (i.e., study of the different shapes that an object can assume).

mode filter: A noise reduction filter that, for each pixel, outputs the mode (most common) value in its local neighborhood. The figure below shows a raw image with salt-and-pepper noise and the filtered version at the right. [ NA:3.5.3] [Figure: noisy image and mode filtered result]

model: An abstract representation of some object or class of objects. [ WP:Model]

model acquisition: The process of learning a model, usually based on observed instances or examples of the structure being modeled. This may be simply learning the parameters of a distribution from examples. For example, one might learn the image texture properties that distinguish tumorous cells from normal cells. Alternatively, the structure of the object might be learned as well, such as constructing a model of a building from a video sequence. Another type of model acquisition is learning the properties of an object, such as what properties and relations define a square as compared to other geometric shapes. [ FP:21.3]

model base: A database of models usually used as part of an identification process. [ JKS:15.1]

model base indexing: Selecting one or more candidate models from a database of structures known by the system. This is usually to eliminate exhaustive testing with every member of the model base. [ FP:16.3]

model based coding: A method of encoding the contents of an image (or video sequence) using a pre-defined or learned set of models. This could be for producing a more compact description of the image data (see model based compression) or for producing a symbolic description. For example, a Mondrian style image could be encoded by the positions, sizes and colors of the colored rectangular regions.
model based compression: An application of model based coding for the purpose of reducing the amount of memory required to describe the image while still allowing reconstruction of the original image. [ SEU:5.3.6]

model based feature detection: Using a parametric model of a feature to locate instances of the feature in an image. For example, a model in the parametric edge detector uses a parameterized model of a step edge that encodes edge direction and edge magnitude.

model based recognition: Identification of the structures in an image by using some internally represented model of the objects known to the computer system. The models are usually geometric models. The recognition process finds image features that match the model features with the right shape and position. The advantage of model based recognition is that the model encodes the object shape, thus allowing predictions of image data and less chance of coincidental features being falsely recognized. [ TV:10.1]

model based segmentation: An image segmentation process that uses geometric models to partition the image into different regions. For example, aerial images could have the visible roads segmented by using a geographic information system model of the road network. [ FP:14]

model based tracking: An image tracking process that uses models to locate the position of moving targets in an image sequence. For example, the estimated position, orientation and velocity of a modeled vehicle in one image allows a strong prediction of its location in the next image in the sequence. [ FP:17]

model based vision: A general term for using models of the objects expected to be seen in the image data to help with the image analysis. The model allows, among other things, prediction of additional model feature positions, verification that a set of features could be part of the model and understanding of the appearance of the image data. [ FP:18]

model building: See also model acquisition. The process of constructing a geometric model usually based on observed instances or examples of the structure being modeled, such as from a video sequence. [ FP:21.3]

model fitting: See model registration. [ RN:3.3]

model invocation: See model base indexing. [ FP:16.3]

model reconstruction: See model acquisition. [ FP:21.3]

model registration: A general term for aligning a geometric model to a set of image data. The process may require estimating the rotation, translation and scale that maps a model onto the image data. There may also be shape parameters, such as model length, that need to be estimated. The fitting may need to account for perspective distortion. This figure shows a 2D model registered on an intensity image of the same part. [ RN:3.3] [Figure: 2D model overlaid on an intensity image]
model selection: See model base indexing. [ FP:16.3]

modulation transfer function (MTF): Informally, the MTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let $X(f_h, f_v)$ and $Y(f_h, f_v)$ be the Fourier transforms of the input $x(h, v)$ and output $y(h, v)$ images. Then, the MTF of a horizontal and vertical spatial frequency pair $(f_h, f_v)$ is $|H(f_h, f_v)| / |H(0, 0)|$, where $H(f_h, f_v) = Y(f_h, f_v)/X(f_h, f_v)$. This is also the magnitude of the optical transfer function. [ AJ:2.6]

Moire fringe: An interference pattern that is observed when spatially sampling, at a given spatial frequency, a signal that has a slightly different spatial frequency. The result is a set of light and dark bands in the observed image. As well as causing image degradation, this effect can also be used in range sensors, where the fringe positions give an indication of surface depth. An example of typical observed fringe patterns is [ AJ:4.4]: [Figure: Moire fringe patterns]

Moire interferometry: A technique for contouring surfaces that works by projecting a fringe pattern (e.g., of straight lines) and observing this pattern through another grating. This effect can be achieved in other ways as well. The technique is useful for measuring extremely small stress and distortion movements. [ WP:Moire pattern#Interferometric approach]

Moire pattern: See Moire fringe. [ AJ:4.4]

Moire topography: A method for measuring the local shape of a surface by analyzing the spacing of Moire fringes on the target surface.

moment: A method for summarizing the distribution of pixel positions or values. Moments are a parameterized family of values. For example, if $I(x, y)$ is a binary image, then $\sum_{x,y} I(x, y) x^p y^q$ computes its $pq^{th}$ moment $m_{pq}$. (See also gray level moments and moments of intensity.) [ AJ:9.8]
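A short NumPy sketch of the binary-image moment sum above (the region is hypothetical):

    import numpy as np

    def moment(I, p, q):
        # pq-th moment m_pq of a binary image: sum of I(x,y) x^p y^q.
        ys, xs = np.nonzero(I)      # coordinates of pixels where I = 1
        return np.sum((xs ** p) * (ys ** q))

    I = np.zeros((50, 50), dtype=np.uint8)
    I[10:20, 30:45] = 1             # a small rectangular region
    area = moment(I, 0, 0)          # m_00 = region area
    cx = moment(I, 1, 0) / area     # centroid x = m_10 / m_00
    cy = moment(I, 0, 1) / area     # centroid y = m_01 / m_00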
moment characteristic: See moment invariant. [ AJ:9.8]

moment invariant: A function of image moment values that keeps the same value even if the image is transformed in some manner. For example, the value $\frac{1}{A^2}((\mu_{20})^2 + (\mu_{02})^2)$ is invariant, where $\mu_{pq}$ are central moments of a binary image region and $A$ is the area of the region. This value is a constant even if the image data is translated, rotated or scaled. [ AJ:9.8]
moments of intensity: An image moment value that takes account of the gray scales of the image pixels as well as their positions. For example, if $G(x, y)$ is a gray scale image, then $\sum_{x,y} G(x, y) x^p y^q$ computes its $pq^{th}$ moment of intensity $g_{pq}$. See also gray level moment.

Mondrian: A famous visual artist from the Netherlands, whose later paintings were composed of adjacent rectangular blocks of constant (i.e., without shading) color. This style of image has been used for much color vision research and, in particular, color constancy because of its simplified image structure without shading, specularities, shadows or light sources. [ BKPH:9.2]

monochrome: Containing only different shades of a single color. This color is usually different shades of gray, going from pure black to pure white. [ WP:Monochrome]

monocular: Using a single camera, sensor or eye. This contrasts with binocular and multi-ocular stereo where more than one sensor is used. Sometimes there is also the implication that the image data is acquired from only a single viewpoint, as a single camera taking images over time is mathematically equivalent to multiple cameras. [ BB:2.2.2]

monocular depth cue: Image evidence that indicates that one surface may be closer to the viewer than another. For example, motion parallax or occlusion relationships give evidence of relative depths. [ WP:Depth perception#Monocular cues]

monocular visual space: The visual space behind the lens in an optical system. This space is commonly assumed to be without structure, but scene depth can be recovered from the defocus blurring that occurs in this space.

monotonicity: A sequence of values or function that is either continuously increasing (monotone increasing) or continuously decreasing (monotone decreasing). [ WP:Monotonic function]

Moravec interest point operator: An operator that locates interest points at pixels where neighboring intensity values change greatly in at least one direction. These points can be used for stereo matching or feature point tracking. The operator computes the sum of the squares of pixel differences in a line vertically, horizontally and in both diagonal directions in a 5 x 5 window about the given pixel. The minimum of these four values is selected and then all values that are not local maxima or are below a given threshold are suppressed. This image shows the interest points found by the Moravec operator as white dots on the original image. [ JKS:14.3]
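A direct, unoptimized Python sketch of the interest measure just described; the threshold value is arbitrary and the final local-maxima test is omitted for brevity:

    import numpy as np

    def moravec(img, threshold=500.0):
        # Interest measure: minimum over four directions of the sum of
        # squared neighboring-pixel differences in a 5x5 window.
        img = img.astype(float)
        h, w = img.shape
        score = np.zeros((h, w))
        dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horiz, vert, diagonals
        for r in range(2, h - 3):
            for c in range(3, w - 3):
                sums = []
                for dr, dc in dirs:
                    s = 0.0
                    for i in range(-2, 3):
                        for j in range(-2, 3):
                            s += (img[r + i, c + j] -
                                  img[r + i + dr, c + j + dc]) ** 2
                    sums.append(s)
                score[r, c] = min(sums)
        score[score < threshold] = 0.0   # suppress weak responses;
        return score                     # local-maxima test would follow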
morphological gradient: A gray scale mathematical morphology operation applied to gray scale images that results in an output image similar to the standard intensity gradient. The gradient is calculated by $\frac{1}{2}(D_G(A, B) - E_G(A, B))$ where $D_G()$ and $E_G()$ are the gray scale dilate and erode respectively of image A by kernel B. [ CS:4.5.5]
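A brief sketch with SciPy's gray scale morphology routines standing in for $D_G$ and $E_G$ (the image and kernel are hypothetical):

    import numpy as np
    from scipy import ndimage  # assumes SciPy is available

    img = np.random.default_rng(1).integers(0, 256, (64, 64)).astype(float)
    kernel = np.ones((3, 3), dtype=bool)        # structuring element B

    dilated = ndimage.grey_dilation(img, footprint=kernel)  # D_G(A, B)
    eroded = ndimage.grey_erosion(img, footprint=kernel)    # E_G(A, B)
    gradient = 0.5 * (dilated - eroded)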
morphological segmentation: Using mathematical morphology operations applied to binary images to extract isolated regions of the desired shape. The desired shape is specified by the morphological kernel. The process could also be used to separate touching objects.

morphological smoothing: A gray scale mathematical morphology operation applied to gray scale images that results in an output image similar to that produced by standard noise reduction. The smoothing is calculated by $C_G(O_G(A, B), B)$ where $C_G()$ and $O_G()$ are the gray scale close and open operations respectively of image A by kernel B.

morphological transformation: One of a large class of binary and gray scale image transformations whose primary characteristic is that they react to the pattern of the pixel values rather than the values themselves. Examples include dilation, erosion, skeletonizing, thinning, etc. The right figure below is the opening of the left figure, when using a disk shaped structuring element 11 pixels in diameter. [ AJ:9.9] [Figure: an image and its opening]

morphology: The shape of a structure. See also mathematical morphology. [ AJ:9.9]

morphometry: Techniques for the measurement of shape. [ WP:Morphometrics]

mosaic: The construction of a larger image from a collection of partially overlapping images taken from different viewpoints. The reconstructed image could have different geometries, e.g., as if seen from a single perspective viewpoint, or as if seen from an orthographic viewpoint. See also image mosaic. [ RJS:2]

motion: A general language term, but, in the context of computer vision, refers to analysis of an image sequence where the camera position or scene structure changes over time. [ BB:7]

motion analysis: Analysis of an image sequence in order to extract useful information. Examples of information routinely extracted include: shape of the observed scene, figure-ground separation, egomotion estimation, and estimates of a target's position and motion. [ BB:7.2-7.3]

motion blur: The blurring of an image that arises when either the camera or something in the scene moves while the image is being acquired. The image below shows the blurring that occurs when an object moves during image capture. [ WP:Motion blur] [Figure: motion blurred image]
motion coding: 1) A component of video sequence compression in which efficient methods are used for representing movement of image regions between video frames. 2) A term for neural cells tuned to respond to directions and speeds of image motion. [ WP:Motion coding]

motion detection: Analysis of an image sequence to determine if or when something in the observed scene moves. See also change detection. [ JKS:14.1]

motion discontinuity: When the smooth motion of either the camera or something in the scene changes, such as the speed or direction of motion. Another form of motion discontinuity is between two groups of adjacent pixels that have different motions.

motion estimation: Estimating the motion direction and speed of the camera or something in the scene. [ RJS:5]

motion factorization: Given a set of tracked feature points through an image sequence, a measurement matrix can be constructed. This matrix can be factored into component matrices that represent the shape and 3D motion of the structure up to a 3D affine transform (which is removable using knowledge of the intrinsic camera parameters).

motion field: The projection of the relative motion vector for each scene point onto the image plane. In many circumstances this is closely related to the optical flow, but may differ as image intensities can also change due to illumination changes. Similarly, motion of a uniformly shaded region is not observable locally because there is no change in image intensity values. [ TV:8.2]

motion layer segmentation: The segmentation of an image into different regions where the motion is locally consistent. The layering effect is most noticeable when the observer is moving through a scene with objects at different depths (causing different amounts of parallax), some of which might also be moving. See also motion segmentation. [ TV:8.6]

motion model: A mathematical model of types of motion allowable for the target object or camera, such as only linear motion along the optical axis with constant velocity. Another example might allow velocities and accelerations in any direction, but occasionally discontinuities, such as for a bouncing ball. [ BB:7]

motion representation: See motion model. [ BB:7]

motion segmentation: See motion layer segmentation. [ TV:8.6]

motion sequence analysis: The class of computer vision algorithms that process sequences of images captured close together in space and time, typically by a moving camera. These analyses are often characterized by assumptions on temporal coherence that simplify computation. [ BB:7.3]
motion smoothness constraint: The assumption that nearby points in the image have similar motion directions and speeds, or similar optical flow. This constraint is based on the fact that adjacent pixels generally record data from the projection of adjacent surface patches from the scene. These scene components will have similar motion relative to the observer. This assumption can help reduce motion estimation errors or constrain the ambiguity in optical flow estimates arising from the aperture problem.

motion tracking: Identification of the same target feature points through an image sequence. This could also refer to tracking complete objects as well as feature points, including estimating the trajectory or motion parameters of the target. [ FP:17]

movement analysis: A general term for analyzing an image sequence of a scene where objects are moving. It is often used for analysis of human motion, such as for people walking or using sign language. [ BB:7.2-7.3]

moving average smoothing: A form of image noise reduction that occurs over time by averaging the most recent images together. It is based on the assumption that variations in time of the observed intensity at a pixel are random. Thus, averaging the values will produce intensity estimates closer to the true (mean) value. [ DH:7.4]

moving light display: An image sequence of a darkened scene containing objects with attached point light sources. The light sources are observed as a set of moving bright spots. This sort of image sequence was used in the early research on structure from motion.

moving object detection: Analyzing an image sequence, usually with a stationary camera, to detect whether any objects in the scene move. [ JKS:14.1]

moving observer: A camera or other sensor that is moving. Moving observers have been extensively used in recent research on structure from motion. [ VSN:8]

MPEG: Moving Picture Experts Group. A group developing standards for coding digital audio and video, as used in video CD, DVD and digital television. This term is often used to refer to media that is stored in the MPEG 1 format. [ WP:Moving Picture Experts Group]

MPEG 2: A standard formulated by the ISO Motion Pictures Expert Group (MPEG), a subset of ISO Recommendation 13818, meant for transmission of studio-quality audio and video. It covers four levels of video resolution. [ WP:MPEG-2]

MPEG 4: A standard formulated by the ISO Motion Pictures Expert Group (MPEG), originally concerned with similar applications as H.263 (very low bit rate channels, up to 64 kbps). Subsequently extended to encompass a large set of multimedia applications, including over the Internet. [ WP:MPEG-4]

MPEG 7: A standard formulated by the ISO Motion Pictures Expert Group (MPEG). Unlike MPEG 2 and MPEG 4, which deal with compressing multimedia contents within specific applications, it specifies the structure and features of the compressed multimedia content produced by the different standards, for instance to be used in search engines. [ WP:MPEG-7]

MRF: See Markov random field. [ JKS:7.4]
MRI: Magnetic Resonance Imaging. See nuclear magnetic resonance. [ FP:18.6]

MSRE: Mean Squared Reconstruction Error.

MTF: See modulation transfer function. [ AJ:2.6]

multi-dimensional edge detection: A variation on standard edge detection of gray scale images in which the input image is multi-spectral (e.g., an RGB color image). The edge detection operator may detect edges in each dimension independently and then combine the edges, or may use all information at each pixel directly. The following image shows edges detected from the red, green and blue components of an RGB image. [Figure: edge images for the R, G and B channels]

multi-dimensional histogram: A histogram with more than one dimension. For example, consider measurements as vectors, e.g., from a multi-spectral image, with N dimensions in the vector. Then one could create a histogram represented by an array with dimension N. The N components in each vector are used to index into the array. Accumulating counts or other evidence values in the array makes it a histogram. [ BB:5.3.1]
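For example, NumPy's histogramdd builds exactly this kind of N-dimensional accumulator; the RGB pixel vectors below are hypothetical:

    import numpy as np

    # 1000 made-up RGB pixel vectors (N = 3 dimensions).
    pixels = np.random.default_rng(2).integers(0, 256, (1000, 3))

    # 8 bins per channel -> an 8x8x8 accumulator array; each pixel's
    # three components index one cell, whose count is incremented.
    hist, edges = np.histogramdd(pixels, bins=(8, 8, 8),
                                 range=((0, 256),) * 3)
    print(hist.shape)   # (8, 8, 8)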
multi-grid method: An efficient algorithm for solving systems of discretized differential (or other) equations. The term multi-grid is used because the system is first solved at a coarse sampling level, which is then used to initialize a higher-resolution solution. [ WP:Multigrid method]

multi-image registration: A general term for the geometric alignment of two or more image datasets. Alignment allows pixels from the different source images to lie on top of each other or to be combined. (See also sensor fusion.) For example, two overlapping intensity images could be registered to help create a mosaic. Alternatively, the images need not be from the same type of sensor. (See multi-modal fusion.) For example, NMR and CAT images of the same body part could be registered to provide richer information, e.g., for a doctor. This image shows two unregistered range images on the left and the registered datasets on the right. [ FP:21.3]

multi-level: See multi-scale method.

multi-modal analysis: A general term for image analysis using image data from more than one sensor type. There is often the assumption that the data is registered so that each pixel records data of two or more types from the same portion of the observed scene. [ WP:Computer Audition#Multi-modal analysis]
multi-modal fusion: See sensor fusion. [ WP:Multimodal integration]

multi-modal neighborhood signature: A description of a feature point based on the image data in its neighborhood. The data comes from several registered sensors, such as X-ray and NMR.

multi-ocular stereo: A stereo triangulation process that uses more than one camera to infer 3D information. The terms binocular stereo and trinocular stereo are commonly used when there are only two or three cameras respectively.

multi-resolution method: See multi-scale method. [ BB:3.7]

multi-scale description: See multi-scale method.

multi-scale integration: 1) Combining information extracted by using operators with different scales. 2) Combining information extracted from registered images with different scales. These two definitions could just be two ways of considering the same process if the difference in operator scale is only a matter of the amount of smoothing. An example of multi-scale integration occurs when combining edges extracted from images with different amounts of smoothing to produce more reliable edges.

multi-scale method: A general term for a process that uses information obtained from more than one scale of image. The different scales might be obtained by reducing the image size or by Gaussian smoothing of the image. Both methods reduce the spatial frequency of the information. The main reasons for multi-scale methods are: 1) some structures have different natural scales (e.g., a thick bar could also be considered to be two back-to-back edges) and 2) coarse scale information is generally more reliable in the presence of image noise, but the spatial accuracy is better in finer scale information (e.g., an edge detector might use a coarse scale to reliably detect the edges and a finer scale to locate them more accurately). Below is an image with two scales of blurring.

multi-scale representation: A representation having image features or descriptions that belong to two or more scales. An example might be zero crossings detected from intensity images that have received increasing amounts of Gaussian smoothing. A multi-scale model representation might represent an arm as a single generalized cylinder at a coarse scale, two generalized cylinders at an intermediate scale and with a surface triangulation at a fine scale. The representation might have results from several discrete scales or from a more continuous range of scales, as in a scale space. Below are zero crossings found at two scales of Gaussian blurring. [ WP:Scale space#Related multi-scale representations]
multi-sensor geometry: The relative placement of a set of sensors, or multiple views from a single sensor but from different positions. One key consequence of the different placements is the ability to deduce the 3D structure of the scene. The sensors need not be the same type but usually are for convenience. [ FP:11.4]

multi-spectral analysis: Using the observed image brightness at different wavelengths to aid in the understanding of the observed pixels. A simple version uses RGB image data. Seven or more bands, including several infrared wavelengths, are often used for satellite remote sensing analysis. Recent hyperspectral sensors can give measurements at 100-200 different wavelengths. [ SQ:17.1]

multi-spectral image: An image containing data measured at more than one wavelength. The number of wavelengths may be as low as two (e.g., some medical scanners), three (e.g., RGB image data), or seven or more bands, including several infrared wavelengths (e.g., satellite remote sensing). Recent hyperspectral sensors can give measurements at 100-200 different wavelengths. The typical image representation uses a vector to record the different spectral measurements at each pixel of an image array. The following image shows the red, green and blue components of an RGB image. [ SEU:1.7.4] [Figure: R, G and B components of an image]

multi-spectral segmentation: Segmentation of a multi-spectral image. This can be addressed by segmenting the image channels individually and then combining the results, or alternatively the segmentation can be based on some combination of the information from the channels. [ WP:Multispectral segmentation]

multi-spectral thresholding: A segmentation technique for multi-spectral image data. A common approach is to threshold each spectral channel independently and then logically AND together the resulting images. An alternative is to cluster pixels in a multi-spectral space and choose thresholds that select desired clusters. The images below show a colored image first thresholded in the blue channel (0-100 accepted) and then ANDed with the thresholded green channel (0-100 accepted).
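A minimal NumPy sketch of the threshold-then-AND approach just described, using the example's 0-100 acceptance bands on a made-up image:

    import numpy as np

    rgb = np.random.default_rng(3).integers(0, 256, (100, 100, 3))

    # Threshold each spectral channel independently, then AND: keep
    # pixels whose blue AND green values both fall in 0-100.
    blue_ok = rgb[:, :, 2] <= 100
    green_ok = rgb[:, :, 1] <= 100
    mask = np.logical_and(blue_ok, green_ok)   # binary segmentation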
multi-tap camera: A camera that provides multiple outputs.

multi-thresholding: Thresholding using a number of thresholds, giving a result that has a number of gray scales or colors. In the following example the image has been thresholded with two thresholds (113 and 200). [Figure: image thresholded with two thresholds]

multi-variate normal distribution: A Gaussian distribution for a variable that is a vector rather than a scalar. Let $\vec{x}$ be the vector variable with dimension $N$. Assume that this variable has mean value $\vec{\mu}_x$ and covariance matrix $C$. Then the probability of observing the particular value $\vec{x}$ is given by [ SB:11.11]:

$\frac{1}{(2\pi)^{\frac{N}{2}} |C|^{\frac{1}{2}}} e^{-\frac{1}{2}(\vec{x}-\vec{\mu}_x)^{\top} C^{-1} (\vec{x}-\vec{\mu}_x)}$
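A direct transcription of this density into NumPy (the mean and covariance values are hypothetical):

    import numpy as np

    def mvn_pdf(x, mu, C):
        # Density of an N-dimensional Gaussian at x.
        N = len(mu)
        d = x - mu
        norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(C))
        return float(np.exp(-0.5 * d @ np.linalg.inv(C) @ d) / norm)

    mu = np.array([0.0, 0.0])
    C = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(mvn_pdf(np.array([0.5, -0.2]), mu, C))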
multi-view geometry: See multi-sensor geometry. [ FP:11.4]

multi-view image registration: See multi-image registration. [ FP:21.3]

multi-view stereo: See multi-sensor geometry. [ FP:11.4]

multiple instruction multiple data (MIMD): A form of parallelism in which, at any given time, each processor might be executing a different instruction or program on a different dataset or pixel. This contrasts with single instruction multiple data parallelism where all processors execute the same instruction simultaneously although on different pixels. [ RJS:8]

multiple motion segmentation: See motion segmentation. [ TV:8.6]

multiple target tracking: A general term for tracking multiple objects simultaneously in an image sequence. Example applications include tracking football players and automobiles on a road.

multiple view interpolation: A technique for creating (or recognizing) new unobserved views of a scene from example images captured from other viewpoints.

multiplicative noise: A model for the corruption of a signal where the noise is proportional to the signal strength: $f(x, y) = g(x, y) + g(x, y) \cdot v(x, y)$ where $f(x, y)$ is the observed signal, $g(x, y)$ is the ideal (original) signal and $v(x, y)$ is the noise.

Munsell color notation system: A system for precisely specifying colors and their relationships, based on hue, value (brightness) and chroma (saturation). The Munsell Book of Color contains colored chips indexed by these three attributes. The color of any unknown surface can be identified by comparison with the colors in the book under specified lighting and viewing conditions. [ GM:5.3.6]

mutual illumination: When light reflecting from one surface illuminates another surface and vice versa. The consequence of this is that light observed coming from a surface is a function of not only the light source spectrum and the reflectance of the target surface, but also the reflectance of the nearby surface (through the spectrum of the light reflecting from the nearby surface onto the first surface). The following diagram shows how mutual illumination can occur.
[Figure: a camera viewing a green surface and a red surface that reflect light onto each other under a white light source]

mutual information: The amount of information two pieces of data (such as images) have in common. In other words, given a data item A and an unknown data item B, the mutual information is $I(A, B) = H(B) - H(B|A)$ where $H(x)$ is the entropy. [ CS:6.3.4]
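A sketch estimating mutual information from a joint histogram, using the equivalent form $I(A,B) = H(A) + H(B) - H(A,B)$; the bin count is an arbitrary choice:

    import numpy as np

    def mutual_information(a, b, bins=32):
        # I(A,B) = H(A) + H(B) - H(A,B), from a joint histogram.
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        p_ab = joint / joint.sum()
        p_a = p_ab.sum(axis=1)
        p_b = p_ab.sum(axis=0)
        nz = p_ab > 0
        h_ab = -np.sum(p_ab[nz] * np.log2(p_ab[nz]))
        h_a = -np.sum(p_a[p_a > 0] * np.log2(p_a[p_a > 0]))
        h_b = -np.sum(p_b[p_b > 0] * np.log2(p_b[p_b > 0]))
        return h_a + h_b - h_ab

    img = np.random.default_rng(4).integers(0, 256, (64, 64)).astype(float)
    print(mutual_information(img, img))        # high: identical images
    print(mutual_information(img, 255 - img))  # equally high: deterministic map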
mutual interreflection: See mutual illumination.
N
NAND operator: An arithmetic operation where a new image is formed by NANDing (logical AND followed by NOT) together corresponding bits for every pixel of the two images. This operator is most appropriate for binary images but may also be applied to gray scale images. For example, the following shows the NAND operator applied to two binary images [ SB:3.2.2]: [Figure: two binary images and their NAND]

narrow baseline stereo: A form of stereo triangulation in which the sensor positions are close together. The baseline is the distance between the sensor positions. Narrow baseline stereo often occurs when the image data is from a video sequence taken by a moving camera.

near infrared: Light wavelengths approximately in the range 750-5000 nm. [ WP:Infrared]

nearest neighbor: A classification, labeling or grouping principle in which a data item is associated with or takes the same label as the previously classified data item that is nearest to the first data item. This distance might be based on spatial distance or a distance in a property space. In this figure the unknown square is classified with the label of the nearest point, namely a circle. [ JKS:15.5.1] [Figure: labeled points with an unknown square nearest a circle]
Necker cube: A line drawing of a cube drawn under orthographic projection, which as a result can be interpreted in two ways. [ VSN:4]

Necker reversal: An ambiguity in the recovery of 3D structure from multiple images. Under affine viewing conditions, the sequence of 2D images of a set of rotating 3D points is the same as the sequence produced by the rotation in the opposite direction of a different set of points, so that two solutions to the structure and motion problem are possible. The different set of points is the reflection of the first set about any plane perpendicular to the optical axis of the camera. [ HZ:13.6]

needle map: An image representation used for displaying 2D and 3D vector fields, such as surface normals. Each pixel has a vector. Diagrams showing these use little lines with the magnitude and direction of the vector projected onto the image of a 3D vector. To avoid overcrowding the image, the pixels where the lines are drawn are a subset of the full image. This image shows a needle map of the surface normals on the block sides. [ BKPH:11.8] [Figure: needle map of surface normals on a block]

negate operator: See invert operator. [ SB:3.2.2]

neighborhood: 1) The neighborhood of a vertex v in a graph is the set of vertices that are connected to v by an arc. 2) The neighborhood of a point (or pixel) x is a set of points near x. A common definition is the set of points within a certain distance of x, where the distance metric may be Manhattan distance or Euclidean distance. 3) The 4 connected neighborhood of a 2D location $(x, y)$ is the set of image locations $\{(x+1, y), (x-1, y), (x, y+1), (x, y-1)\}$. The 8 connected neighborhood is the set of pixels $\{(x + i, y + j) \mid -1 \le i, j \le 1\}$. The 26 connected neighborhood of a 3D point $(x, y, z)$ is defined analogously. [ SQ:4.5] [Figure: 4-connected and 8-connected neighborhoods]
neural network: A classifier that maps input data $\vec{x}$ of dimension $n$ to a space of outputs $\vec{y}$ of dimension $m$. As a black box, the network is a function $f : \mathbb{R}^n \mapsto [0, 1]^m$. The most commonly used form of neural network is the multi-layer perceptron (MLP). An MLP is characterized by an $m \times n$ matrix of weights $W$, and a transfer function $\sigma$ that maps the reals to $[0, 1]$. The output of the single-layer network is $\vec{f}(\vec{x}) = \sigma(W\vec{x})$ where $\sigma$ is applied elementwise to vector arguments. A multi-layer network is a cascade of single-layer networks, with different weights matrices at each layer. For example, a two-layer network with $k$ hidden nodes is defined by weights matrices $W_1 \in \mathbb{R}^{k \times n}$ and $W_2 \in \mathbb{R}^{m \times k}$, and written $f(\vec{x}) = \sigma(W_2 \sigma(W_1 \vec{x}))$. A common choice for $\sigma$ is the sigmoid function $\sigma(t) = (1 + e^{-st})^{-1}$ for some value of $s$. When we make it explicit that $\vec{f}$ is a function of the weights as well as the input vector, it is written $\vec{f}(W; \vec{x})$. Typically, a neural network is trained to predict the relationship between the $\vec{x}$s and $\vec{y}$s of a given collection of training examples. Training means setting the weights matrices to minimize the training error $e(W) = \sum_i d(\vec{y}_i, \vec{f}(W; \vec{x}_i))$ where $d$ measures distance between the network output and a training example. Common choices for $d(\vec{y}, \vec{y}\,')$ include the 2-norm $\|\vec{y} - \vec{y}\,'\|_2$. [ FP:22.4]
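A minimal NumPy sketch of the two-layer forward computation $f(\vec{x}) = \sigma(W_2 \sigma(W_1 \vec{x}))$ with random, untrained weights (sizes are arbitrary):

    import numpy as np

    def sigmoid(t, s=1.0):
        # Elementwise transfer function mapping the reals to (0, 1).
        return 1.0 / (1.0 + np.exp(-s * t))

    def mlp_forward(x, W1, W2):
        # Two-layer perceptron: f(x) = sigma(W2 sigma(W1 x)).
        return sigmoid(W2 @ sigmoid(W1 @ x))

    n, k, m = 4, 3, 2                # input, hidden and output sizes
    rng = np.random.default_rng(5)
    W1 = rng.normal(size=(k, n))     # W1 in R^(k x n)
    W2 = rng.normal(size=(m, k))     # W2 in R^(m x k)
    print(mlp_forward(rng.normal(size=n), W1, W2))  # m outputs in (0, 1)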
Newton's optimization method: To find a local minimum of function $f : \mathbb{R}^n \mapsto \mathbb{R}$ from starting position $\vec{x}_0$. Given the function's gradient $\nabla f$ and Hessian $H$ evaluated at $\vec{x}_k$, the Newton update is $\vec{x}_{k+1} = \vec{x}_k - H^{-1} \nabla f$. If $f$ is a quadratic form then a single Newton step will directly yield the global minimum. For general $f$, repeated Newton steps will generally converge to a local optimum. [ FP:3.1.2]
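A short NumPy sketch of the update; on the quadratic example below one step reaches the minimum, as the entry notes (the function and values are hypothetical):

    import numpy as np

    def newton_minimize(grad, hess, x0, steps=20):
        # Repeated Newton updates x_{k+1} = x_k - H^-1 grad(x_k).
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - np.linalg.solve(hess(x), grad(x))
        return x

    # Quadratic form f(x) = 0.5 x^T A x - b^T x: gradient Ax - b,
    # Hessian A, so one step from anywhere lands on the minimum.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, -1.0])
    x_min = newton_minimize(lambda x: A @ x - b, lambda x: A,
                            np.zeros(2), steps=1)
    print(x_min, np.linalg.solve(A, b))   # identical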
next view planning: When inspecting an object or obtaining a geometric or appearance-based model, it may be necessary to observe the object from several places. Next view planning determines where to next place the camera (by moving either the object or the camera) based on either what was observed (in the case of unknown objects) or a geometric model (in the case of known objects).

next view prediction: See next view planning.

NMR: See nuclear magnetic resonance. [ FP:18.6]

node of graph: A symbolic representation of some entity or feature. It is connected to other nodes in a graph by arcs that represent relationships between the different entities. [ SQ:12.1]

noise: A general term for the deviation of a signal away from its true value. In the case of images, this leads to pixel values (or other measurements) that are different from their expected values. The causes of noise can be random factors, such as thermal noise in the sensor, or minor scene events, such as dust or smoke. Noise can also represent systematic, but unmodeled, events such as short term lighting variations or quantization. Noise might be reduced or removed using a noise reduction method. Here are images without and with salt-and-pepper noise. [ TV:3.1] [Figure: an image without and with salt-and-pepper noise]

noise model: A way to model the statistical properties of noise without having to model the causes of the noise. One general assumption about noise is that it has some underlying, but perhaps unknown, distribution. A Gaussian noise model is commonly used for random factors and a uniform distribution is often used for unmodeled scene effects. Noise could be modeled with a mixture model. The noise model typically has one or more parameters that control the magnitude of the noise. The noise model can also specify how the noise affects the signal, such as additive noise (which offsets the true value) or multiplicative noise (which rescales the true value). The type of noise model can constrain the type of noise reduction method. [ AJ:8.2]
noise reduction: An image processing method that tries to reduce the distortion of an image that has been caused by noise. For example, the images from a video sequence taken with a stationary camera and scene can be averaged together to reduce the effect of Gaussian noise because the average value of a signal corrupted with this type of noise converges to the true value. Noise reduction methods often introduce other distortions, but these may be less significant to the application than the original noise. An image with salt-and-pepper noise and its noise reduced by median smoothing are shown in the figure. [ TV:3.2] [Figure: noisy image and its median smoothed result]

noise removal: See noise reduction. [ TV:3.2]

noise source: A general term for phenomena that corrupt image data. This could be systematic unmodeled processes (e.g., 60 Hz electromagnetic noise) or random processes (e.g., electronic shot noise). The sources could be in the scene (e.g., chaff), in the medium (e.g., dust), in the lens (e.g., imperfections) or in the sensor (e.g., sensitivity variations). [ WP:Noise]

noise suppression: See noise reduction. [ TV:3.2]

noise-whitening filter: A noise modifying filter that outputs images whose pixels have noise that is independent of 1) other pixels' noise (spatial noise) or 2) other values of that pixel at other times (temporal noise). The resulting image's noise is white noise. [ AJ:6.2]

non-accidentalness: A general principle that can be used to improve image interpretation based on the concept that when regularities appear in an image, they are most likely to result from regularities in the scene. For example, if two straight lines end near to each other, then this could have arisen from a coincidental alignment of the line ends and the observer. However, it is much more probable that the two lines end at the same point in the observed scene. This figure shows line terminations and orientations that are unlikely to be coincidental. [Figure: non-accidental termination and non-accidental parallelism]

non-hierarchical control: A way of structuring the sequence of actions in an image interpretation system. Non-hierarchical control is when there is no master process that orders the sequence of actions or operators applied. Instead, typically, each operator can observe the current results and decide if it is capable of executing and if it is desirable to do so.

nonlinear filter: A process where the outputs are a nonlinear function of the inputs. This covers a large range of algorithms. Examples of nonlinearity might be: 1) doubling the values of all input data does not double the values of the output results (e.g., a filter that reports the position at which a given value appears), 2) applying an operator to the sum of two images gives different results from adding the results of the operator applied to the two original images (e.g., thresholding). [ AJ:8.5]
non-maximal suppression: A technique for suppressing multiple responses (e.g., high values of gradient magnitude) representing a single edge or other feature. The resulting edges should be a single pixel wide. [ JKS:5.6.1]

non-parametric clustering: A data clustering process such as k-nearest neighbor that does not assume an underlying probability distribution.

non-parametric method: A probabilistic method used when the form of the underlying probability distribution is unknown or multi-modal. Typical applications are to estimate the a posteriori probability of a classification given an observation. Parzen windows or k-nearest neighbor classifiers are often used. [ WP:Non-parametric statistics]

non-rigid model representation: A model representation where the shape of the model can change, perhaps under the control of a few parameters. These models are useful for representing objects whose shape can change, such as moving humans or biological specimens. The differences in shape may occur over time or be between different instances. Changes in apparent shape due to perspective projection and observer viewpoint are not relevant here. By contrast, a rigid model would have the same actual shape irrespective of the viewpoint of the observer.

non-rigid motion: A motion of an object in the scene in which the shape of the object also changes. Examples include: 1) the position of a walking person's limbs and 2) the shape of a beating heart. Changes in apparent shape due to perspective projection and viewpoint are not relevant here.

non-rigid registration: The problem of registering, or aligning, two shapes that can take on a variety of configurations (unlike rigid shapes). For instance, a walking person, a fish, and facial features like mouth and eyes are all non-rigid objects, the shape of which changes in time. This type of registration is frequently needed in medical imaging as many human body parts deform. Non-rigid registration is considerably more complex than rigid registration. See also alignment, registration, rigid registration.

non-rigid tracking: A tracking process that is designed to track non-rigid objects. This means that it can cope with changes in actual object shape as well as apparent shape due to perspective projection and observer viewpoint.

non-symbolic representation: A model representation in which the appearance is described by a numerical or image-based description rather than a symbolic or mathematical description. For example, non-symbolic models of a line would be a list of the coordinates of the points in the line or an image of the line. Symbolic object representations include the equation of the line or the endpoints of the line.

normal curvature: A plane that contains the surface normal $\vec{n}$ at point $\vec{p}$ to a surface intersects that surface to form a planar curve $\gamma$ that passes through $\vec{p}$. The normal curvature is the curvature of $\gamma$ at $\vec{p}$. The intersecting plane can be at any specified orientation about the surface normal. See [ JKS:13.3.2]: [Figure: plane containing the surface normal n intersecting the surface at point p]
different viewpoints. One method is by


3D reconstruction, e.g., from
n
binocular stereo , and then rendering
the reconstruction using computer
graphics. However, the main
p approaches to novel view synthesis use
epipolar geometry and the pixels of two
or more images of the object to directly
synthesize a new image without
creating a 3D reconstruction.

NP-complete: A concept in
computational complexity covering a
normal distribution: See special set of problems. All of these
Gaussian distribution . [ AJ:2.9] problems currently can be solved, in the
worst case, in time exponential O(eN )
normal flow: The component of
in the number or size N of their input
optical flow in the direction of the
data. For the subset of exponential
intensity gradient . The orthogonal
problems called NP-complete, if an
component is not locally observable
algorithm for one could be found that
because small motions orthogonally do
executes in polynomial time O(N p ) for
not change the appearance of local
some p, then a related algorithm could
neighborhoods.
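As an informal illustration of the second (statistical) definition, this minimal NumPy sketch (the function name is illustrative, not from the dictionary) computes the normalized correlation of two equal-sized gray scale patches:

    import numpy as np

    def normalized_correlation(x, y):
        # Standardize each signal (zero mean, unit standard deviation),
        # then average the elementwise products; the result lies in
        # [-1, 1], with 1 indicating the most similar inputs.
        x = (np.asarray(x, float) - np.mean(x)) / np.std(x)
        y = (np.asarray(y, float) - np.mean(y)) / np.std(y)
        return float(np.mean(x * y))

Applied to an image patch and a template of the same size, values near 1 indicate a good match regardless of the overall brightness of either input.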
NOT operator: See invert operator . [ SB:3.2.2]

novel view synthesis: A process whereby a new view of an object is synthesized by combining information from several images of the object from different viewpoints. One method is by 3D reconstruction, e.g., from binocular stereo , and then rendering the reconstruction using computer graphics. However, the main approaches to novel view synthesis use epipolar geometry and the pixels of two or more images of the object to directly synthesize a new image without creating a 3D reconstruction.

NP-complete: A concept in computational complexity covering a special set of problems. All of these problems currently can be solved, in the worst case, in time exponential O(e^N) in the number or size N of their input data. For the subset of exponential problems called NP-complete, if an algorithm for one could be found that executes in polynomial time O(N^p) for some p, then a related algorithm could be found for any other NP-complete problem. [ SQ:12.5]

NTSC: National Television System Committee. A television signal recording system used for encoding video data at approximately 60 video fields per second. Used in the USA, Japan and other countries. [ AJ:4.1]

nuclear magnetic resonance (NMR): An imaging technique based on magnetic properties of the atomic nuclei. Protons and neutrons within atomic nuclei generate a magnetic dipole that can respond to an external magnetic field. Several properties related to the relaxation of that magnetic dipole give rise to values that depend on the tissue type, thus allowing identification or at least visualization of the different soft tissue types. The measurement of the signal is a way of measuring the density of certain types of atoms, such as hydrogen in the case of biological NMR scanners. This technology is used for medical body scanning, where a detailed 3D volumetric image can be produced. Signal levels are highly correlated with different biological structures so one can easily observe different tissues and their positions. Also called MRI/magnetic resonance imaging. [ FP:18.6]

NURBS: Non-Uniform Rational B-Splines: a type of shape modeling primitive based on ratios of b-splines . Capable of accurately representing a wide range of geometric shapes including freeform surfaces . [ WP:Non-uniform rational B-spline]

Nyquist frequency: The minimum sampling frequency for which the underlying true image (or signal) can be reconstructed from the samples. If sampling at a lower frequency, then aliasing will occur, creating apparent image structure that does not exist in the original image. [ SB:2.3.2.1]

Nyquist sampling rate: See Nyquist frequency . [ SB:2.3.2.1]

O

object: 1) A general term referring to a group of features in a scene that humans consider to compose a larger structure. In vision it is generally thought of as that to which attention is directed. 2) A general system theory term, where the object is what is of interest (unlike the background ). Resolution or scale may determine what is considered the object. [ AL:p. 236]

object centered representation: A model representation in which the position of the features and components of the model are described relative to the position of the object itself. This might be a relative description (the nose is 4 cm from the mouth) or might use a local coordinate system (e.g., the right eye is at position (0,25,10) where (0,0,0) is the nose). This contrasts with, for example, a viewer centered representation . Here is a rectangular solid defined in its local coordinate system [ JKS:15.3.2]:

[Figure: a rectangular solid with side lengths L, H and W, with a corner vertex labeled (L,H,W), drawn in its local coordinate system.]
object contour: See occluding contour . [ FP:19.2]

object grouping: A general term meaning the clustering of all of the image data associated with a distinct observed object. For example, when observing a person, object grouping could cluster all of the pixels from the image of the person. [ FP:24.1]

object plane: In the case of convex simple lenses typically used in laboratory TV cameras, the object plane is the 3D scene plane where all points are exactly in focus on the image plane (assuming a perfect lens and the optical axis is perpendicular to the image plane). The object plane is illustrated here:

[Figure: a lens with its optical axis; the image plane lies on one side of the lens and the object plane on the other.]

object recognition: A general term for identifying which of several (or many) possible objects is observed in an image. The process may also include computing the object's image or scene position , or labeling the image pixels or image features that belong to the object. [ FP:21.4]

object representation: An encoding of an object into a form suitable for computer manipulation. The models could be geometric models , graph models or appearance models , as well as other forms. [ JKS:15.3]

object verification: A component of an object recognition process that attempts to verify a hypothesized object identity by examining evidence. Commonly, geometric object models are used to verify that object features are observed in the correct image positions. [ FP:18.5]

objective function: 1) The cost function used in an optimization process. 2) A measure of the misfit between the data and the model. [ SQ:2.3]

oblique illumination: See low angle illumination . [ WP:Microscopy#Oblique illumination]

observer: The individual (or camera) making observations. Most frequently this refers to the camera system from which images are being supplied. See also observer motion estimation . [ WP:Observer]

observer motion estimation: When an observer is moving, image data of the scene provides optical flow or trackable scene feature points . These allow an estimate of how the observer is moving relative to the scene, which is useful for navigation control and position estimation. [ BKPH:17.1]

obstacle detection: Using visual data to detect objects in front of the observer, usually for mobile robotics applications.

Occam's razor: An argument attributed to William of Occam (Ockham), an English nominalist philosopher of the early fourteenth century, stating that assumptions must not be needlessly multiplied when explaining something (entia non sunt multiplicanda praeter necessitatem). Often used simply to suggest that, other conditions being equal, the simplest solution must be preferred. Notice variant spelling Ockham. See also minimum description length . [ WP:Occam's razor]

occluding contour: The visible edge of a smooth curved surface as it bends away from an observer . The occluding contour defines a 3D space curve on the surface, such that a line of sight from the observer to a point on the space curve is perpendicular to the surface normal at that point. The 2D image of this curve may also be called the occluding contour. The contour can often be found by an edge detection process. The cylinder boundaries on both the left and right are occluding contours from our viewpoint [ FP:19.2]:

[Figure: a cylinder whose left and right boundaries are marked as occluding contours.]
occluding contour analysis: A general term that includes 1) detection of the occluding contour , 2) inference of the shape of the 3D surface at the occluding contour and 3) determining the relative depth of the surfaces on both sides of the occluding contour. [ FP:19.2]

occluding contour detection: Determining which of the image edges arise from occluding contours . [ FP:19.2]

occlusion: Occlusion occurs when one object lies between an observer and another object. The closer object occludes the more distant one in the acquired image. The occluded surface is the portion of the more distant object hidden by the closer object. Here, the cylinder occludes the more distant brick [ ERD:7.7]:

[Figure: a cylinder partially hiding a brick behind it.]

occlusion recovery: The process of attempting to infer the shape and appearance of a surface hidden by occlusion . This recovery helps improve completeness when reconstructing scenes and objects for virtual reality . This image shows two occluded pipes and an estimated recovery [ ERD:7.7]:

[Figure: two occluded pipes (left) and the recovered, completed pipes (right).]

occlusion understanding: A general term for analyzing scene occlusions that may include occluding contour detection , determining the relative depths of the surfaces on both sides of an occluding contour , searching for tee junctions as a cue for occlusion and depth order, etc. [ ERD:7.7]

occupancy grid: A map construction technique used mainly for autonomous vehicle navigation. The grid is a set of squares or cubes representing the scene, which are marked according to whether the observer believes the corresponding scene region is empty (hence navigable) or full. A probabilistic measure could also be used. Visual evidence from range , binocular stereo or sonar sensors are typically used to construct and update the grid as the observer moves. [ WP:Occupancy grid mapping]

OCR: See optical character recognition. [ JKS:2.7]

octree: A volumetric representation in which 3D space is recursively divided into eight (hence "oct") smaller volumes by planes parallel to the XY, YZ, XZ coordinate system planes. A tree is formed by linking the eight subvolumes to each parent volume. Additional subdivision need not occur when a volume contains only object or empty space. Thus, this representation can be more efficient than a pure voxel representation. Here are three levels of a pictorial representation of an octree, where one octant and the largest (leftmost) level is expanded to give the middle figure, and similarly an octant of the middle [ H. H. Chen and T. S. Huang, A Survey of Construction and Manipulation of Octrees, Computer Vision, Graphics and Image Processing, Vol. 43, pp 409-431, 1988.]:

[Figure: three successive levels of an octree, each octant expanded from the previous level.]
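A minimal sketch of the recursive subdivision, assuming the volume is given as a cubic boolean occupancy array whose side is a power of two (the class and field names are illustrative, not from the dictionary):

    import numpy as np

    class OctreeNode:
        def __init__(self, voxels):
            # Stop subdividing when the volume is uniformly full or
            # uniformly empty, or when a single voxel is reached.
            if voxels.all() or not voxels.any() or voxels.shape[0] == 1:
                self.children = None
                self.full = bool(voxels.any())
            else:
                # Split into eight octants by the three mid-planes.
                h = voxels.shape[0] // 2
                self.children = [OctreeNode(voxels[i:i + h, j:j + h, k:k + h])
                                 for i in (0, h) for j in (0, h) for k in (0, h)]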
odd field: Standard interlaced video transmits all of the even scan lines in an image frame first and then all of the odd lines. The set of odd lines is the odd field. [ AJ:11.1]

O'Gorman edge detector: A parametric edge detector . A decomposition of the image and model by orthogonal Walsh function masks was used to compute the step edge parameters (contrast and orientation). One advantage of the parametric model was a goodness of model fit as well as the edge contrast that increased the reliability of the detected edges. [ LG:5]

omnidirectional sensing: Literally, sensing all directions simultaneously. In practice, this means using mirrors and lenses to project most of the lines of sight at a point onto a single camera image . The space behind the mirrors and camera(s) is typically not visible. See also catadioptric optics . Here a camera using a spherical mirror achieves a very wide field of view: [ WP:Omnidirectional camera]

[Figure: a camera viewing a spherical mirror, giving a very wide field of view.]

opaque: When light cannot pass through a structure. This causes shadows and occlusion . [ WP:Opacity (optics)]

open operator: A mathematical morphology operator applied to a binary image . The operator is a sequence of N erodes followed by N dilates , both using a specified structuring element . The operator is useful for separating touching objects and removing small regions. The right image was created by opening the left image with an 11-pixel disk kernel [ SB:8.15]:

[Figure: a binary image (left) and its opening with an 11-pixel disk kernel (right).]
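A sketch of the operator using SciPy's morphology routines (which also provide a ready-made binary_opening); the disk construction is one common choice of structuring element:

    import numpy as np
    from scipy.ndimage import binary_erosion, binary_dilation

    def open_binary(image, structure, n=1):
        # N erodes followed by N dilates with the same structuring element.
        eroded = binary_erosion(image, structure=structure, iterations=n)
        return binary_dilation(eroded, structure=structure, iterations=n)

    # Example: an approximately disk-shaped structuring element of radius 5.
    r = 5
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disk = (xx**2 + yy**2) <= r**2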
operator: A general term for a function that is applied to some data in order to transform it in some way. For example see image processing operator .

opponent color: A color representation system originally developed by Hering in which an image is represented by three channels with contrasting colors: Red-Green, Yellow-Blue, and Black-White. [ BB:2.2.5]

optical: A process that uses light and lenses is an optical process. [ WP:Optics]

optical axis: The ray, perpendicular to the lens and through the optical center , around which the lens is symmetrical. [ FP:1.1.1]

[Figure: a lens with its focal point and optical axis.]

optical center: See focal point . [ FP:1.2.2]

optical character recognition (OCR): A general term for extracting an alphabetic text description from an image of the text. Common specialisms include bank numerals, handwritten digits, handwritten characters, cursive text, Chinese characters, Arabic characters, etc. [ JKS:2.7]

optical flow: An instantaneous velocity measurement for the direction and speed of the image data across the visual field. This can be observed at every pixel, creating a field of velocity vectors. The set of apparent motions of the image pixel brightness values. [ FP:25.4]

optical flow boundary: The boundary between two regions where the optical flow is different in direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow field segmentation . The dashed line in this image is the boundary between optical flow moving left and right:

[Figure: a flow field whose left half moves left and right half moves right, separated by a dashed boundary.]

optical flow constraint equation: The equation ∂I/∂t + ∇I · u = 0 that links the observed change in image intensities over time, ∂I/∂t, at an image position to the spatial change in pixel intensities at that position, ∇I, and the velocity u of the image data at that pixel. The constraint does not completely determine the image motion, as this has two degrees of freedom. The equation provides only one constraint, thus leading to an aperture problem . [ WP:Optical flow#Estimation of the optical flow]
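Because the constraint determines only the component of the flow along the intensity gradient (the normal flow ), that component can be estimated directly from two frames. A minimal NumPy sketch with simple finite-difference derivatives (illustrative, not from the dictionary):

    import numpy as np

    def normal_flow(frame0, frame1):
        # Spatial derivatives of the first frame and the temporal difference.
        Iy, Ix = np.gradient(frame0.astype(float))
        It = frame1.astype(float) - frame0.astype(float)
        mag2 = Ix**2 + Iy**2 + 1e-12          # avoid division by zero
        # The flow component along grad(I) is -It * grad(I) / |grad(I)|^2.
        return -It * Ix / mag2, -It * Iy / mag2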
optical flow field: The field composed of the optical flow vector at each pixel in an image. [ FP:25.4]

optical flow field segmentation: The segmentation of an optical flow image into regions where the optical flow has a similar direction or magnitude. The regions can arise from objects moving in different directions or surfaces at different depths. See also optical flow boundary .

optical flow region: A region where the optical flow has a similar direction or magnitude. Regions can arise from objects moving in different directions, or surfaces at different depths. See also optical flow boundary .

optical flow smoothness constraint: The constraint that nearby pixels in an image usually have similar optical flow because they usually arise from projection of adjacent surface patches having similar motions relative to the observer . The constraint can be relaxed at optical flow boundaries .

optical image processing: An image processing technique in which the processing occurs by use of lenses and coherent light instead of by a computer. The key principle is that a coherent light beam that passes through a transparency of the target image and is then focused produces the Fourier transform of the image at the focal point where frequency domain filtering can occur. A typical processing arrangement is:

[Figure: a coherent source illuminates the imaging transparency; a filter at the focal plane precedes the sensor.]

optical transfer function (OTF): Informally, the OTF is a measure of how well spatially varying patterns are observed by an optical system. More formally, in a 2D image, let X(f_h, f_v) and Y(f_h, f_v) be the Fourier transforms of the input x(h, v) and output y(h, v) images. Then, the OTF of a horizontal and vertical spatial frequency pair (f_h, f_v) is H(f_h, f_v)/H(0, 0), where H(f_h, f_v) = Y(f_h, f_v)/X(f_h, f_v). The optical transfer function is usually a complex number encoding both the reduction in signal strength at each spatial frequency and the phase shift. [ SB:5.11]

optics: A general term for the manipulation and transformation of light and images using lenses and mirrors . [ JKS:8]

optimal basis encoding: A general technique for encoding image or other data by projecting onto some basis functions of a linear space and then using the projection coefficients instead of the original data. Optimal basis functions produce projection coefficients that allow the best discrimination between different classes of objects or members in a class (such as for face recognition).

optimization: A general term for finding the values of the parameters that maximize or minimize some quantity. [ BB:11.1.2]

optimization parameter estimation: See optimization . [ BB:11.1.2]

OR operator: A pixelwise logic operator defined on binary variables. It takes as input two binary images , I1 and I2 , and returns an image I3 in which the value of each pixel is 0 if both I1 and I2 are 0, and 1 otherwise. The rightmost image below shows the result of ORing the left and middle figures (note that the white pixels have value 1) [ SB:3.2.2]:

[Figure: two binary images and, at the right, their pixelwise OR.]
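A one-line pixelwise implementation for two binary NumPy arrays (an illustrative sketch):

    import numpy as np

    def or_operator(i1, i2):
        # 0 only where both inputs are 0; 1 otherwise.
        return np.logical_or(i1 > 0, i2 > 0).astype(np.uint8)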
order statistic filter: A filter based on order statistics, a technique that sorts the pixels of a neighborhood by intensity value, and assigns a rank (the position in the sorted sequence) to each. An order statistics filter replaces the central value of the filtering neighborhood with the value at a given rank in the sorted list. A popular example is the median filter . As this filter is less sensitive to outliers, it is often used in robust statistics processes. See also rank order filter . [ SEU:3.3.1]
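A direct, unoptimized sketch of the filter (scipy.ndimage.rank_filter provides an efficient equivalent); with size=3, rank 4 of the 9 sorted values reproduces the median filter:

    import numpy as np

    def order_statistic_filter(image, size=3, rank=4):
        pad = size // 2
        padded = np.pad(image, pad, mode='edge')
        out = np.empty_like(image)
        for r in range(image.shape[0]):
            for c in range(image.shape[1]):
                # Sort the neighborhood, keep the value at the given rank.
                window = padded[r:r + size, c:c + size]
                out[r, c] = np.sort(window, axis=None)[rank]
        return out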
ordered texture: See macrotexture. [ JKS:7.3]

ordering: Sorting a collection of objects by a given property, for instance, intensity values in an order statistic filter . [ SEU:3.3.1]

orientation: The property of being directed towards or facing a particular region of space, or of a line; also, the pose or attitude of a body in space. For instance, the orientation of a vector (where the vector points to), specified by its unit vector; the orientation of an ellipsoid , specified by its principal directions ; the orientation of a wire-frame model, specified by its own reference frame with respect to a world reference frame. [ WP:Orientation (computer vision)]

orientation error: The amount of error associated with an orientation value.

orientation representation: See pose representation .

oriented texture: A texture in which a preferential direction can be detected. For instance, the direction of the bricks in a regular brick wall. See also texture direction , texture orientation .

orthogonal image transform: Orthogonal Transform Coding is a well-known class of techniques for image compression. The key process is the projection of the image data onto a set of orthogonal basis functions. See, for instance, the discrete cosine , Fourier or Haar transforms. This is a special case of the linear integral transform.

orthogonal regression: Also known as total least squares. Traditionally seen as the generalization of linear regression to the case where both x and y are measured quantities and subject to error. Given samples x_i and y_i, the objective is to find estimates of the true points (x̄_i, ȳ_i) and line parameters (a, b, c) such that a x̄_i + b ȳ_i + c = 0 for all i, and such that the error Σ_i (x_i − x̄_i)^2 + (y_i − ȳ_i)^2 is minimized. This estimate is easily obtained as the line (or plane, etc., in higher dimensions) passing through the centroid of the data, whose normal is the eigenvector of the data scatter matrix that has smallest eigenvalue. [ WP:Total least squares]
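The closed-form estimate described above can be written directly; a sketch for the 2D line case (the function name is illustrative):

    import numpy as np

    def fit_line_tls(points):
        # points: N x 2 array of (x, y) samples.
        centroid = points.mean(axis=0)
        d = points - centroid
        scatter = d.T @ d
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # eigenvector with smallest eigenvalue: the line normal (a, b).
        _, eigvecs = np.linalg.eigh(scatter)
        a, b = eigvecs[:, 0]
        c = -(a * centroid[0] + b * centroid[1])  # line through the centroid
        return a, b, c                            # the line ax + by + c = 0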
orthographic: The characteristic property of orthographic (or perpendicular) projection onto the image plane. See orthographic projection . [ FP:2.3]

orthographic camera: A camera in which the image is formed according to orthographic projection . [ FP:2.3]

orthographic projection: Rendering of a 3D scene as a 2D image by a set of rays orthogonal to the image plane. The size of the objects imaged does not depend on their distance from the viewer. As a consequence, parallel lines in the scene remain parallel in the image. The equations of orthographic projection are

    x = X,   y = Y,

where x, y are the image coordinates of an image point in the camera reference frame (that is, in millimeters, not pixels), and X, Y, Z are the coordinates of the corresponding scene point. An example is seen here [ FP:2.3]:

[Figure: parallel rays project a scene object onto the image plane.]
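In code the projection amounts to dropping the depth coordinate; a sketch for an N x 3 array of scene points:

    import numpy as np

    def orthographic_projection(points_3d):
        # (X, Y, Z) -> (x, y) = (X, Y): image size is independent of Z.
        return np.asarray(points_3d)[:, :2].copy()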
orthoimage: In photogrammetry, the warp of an aerial photograph to an approximation of the image that would have been taken had the camera pointed directly downwards. See also orthographic projection . [ WP:Orthophoto]

orthonormal: A property of a set of basis functions or vectors. If <,> is the inner product function and a and b are any two different members of the set, then we have <a, a> = <b, b> = 1 and <a, b> = 0. [ WP:Orthonormal basis]

OTF: See optical transfer function . [ SB:5.11]

outlier: If a set of data mostly conforms to some regular process or is well represented by a model, with the exception of a few data points, then these exception points are outliers. Classifying points as outliers depends on both the models used and the statistics of the data. This figure shows a line fit to some points and an outlying point. [ CS:3.4.6]

[Figure: a fitted line through a set of inliers, with one point marked as an outlier lying far from the line.]

outlier rejection: Identifying outliers and removing them from the current process. Identification is often a difficult process. [ CS:3.4.6]

over-segmented: Describing the output of a segmentation algorithm. Given an image where a desired segmentation result is known, the algorithm over-segments if the desired regions are represented by too many algorithmically output regions. This image should be segmented into three regions but it was oversegmented into five regions [ SQ:8.7]:

[Figure: a three-region scene whose segmentation output contains five regions.]

P

paired boundaries: See paired contours .

paired contours: A pair of contours occurring together in images and related by a spatial relationship, for instance the contours generated by river banks in aerial images, or the contours of a human limb (arm, leg). Co-occurrence can be exploited to make contour detection more robust. See also feature extraction . An example is seen here:

[Figure: a pair of roughly parallel contours.]

pairwise geometric histogram: A line- or edge-based shape representation used for object recognition , especially 2D. Histograms are built by computing, for each line segment, the relative angle and perpendicular distance to all other segments. The representation is invariant to rotation and translation. PGHs can be compared using the Bhattacharyya metric .

PAL camera: A camera conforming to the European PAL standard (Phase Alternation by Line). See also NTSC , RS-170 , CCIR camera . [ AJ:4.1]

palette: The range of colors available. [ NA:2.2]

pan: Rotation of a camera about a single axis through the camera center and (approximately) parallel to the image vertical: [ WP:Panning (camera)]

[Figure: a camera rotating about its vertical axis.]

panchromatic: Sensitive to light of all visible wavelengths. Panchromatic images are gray scale images where each pixel averages light equally over the visible range.

panoramic: Associated with a wide field-of-view often created or observed by a panned camera. [ WP:Panoramic photography]

panoramic image mosaic: A class of techniques for collating a set of partially overlapping images into a panoramic, single image. This

[Figure: a panoramic mosaic assembled from overlapping frames.]

is a mosaic built from the frames of a hand-held camera sequence. Typically, the mosaic yields both very high resolution and large field of view, which cannot be simultaneously achieved by a physical camera. There are several ways to build panoramic mosaics, but, in general, there are three necessary steps: first, determining correspondences (see stereo correspondence problem ) between adjacent images; second, using the correspondences to find a warping transformation between the two images (or between the current mosaic and a new image); third, blending the new image into the current mosaic.

panoramic image stereo: A stereo system working with a very large field of view, say 360 degrees in azimuth and 120 degrees in elevation. Disparity maps and depths are recovered for the whole field of view simultaneously. A normal stereo system would have to be moved and results registered to achieve the same result. See also binocular stereo , multi-view stereo , omnidirectional sensing .

Pantone matching system (PMS): A color matching system used by the printing industry to print spot colors. Colors are specified by the Pantone name or number. PMS works well for spot colors but not for process colors, usually specified by the CMYK color model. [ WP:Pantone]

Panum's fusional area: The region of space within which single vision is possible (that is, you do not perceive double images of objects) when the eyes fixate a given point. [ CS:6.7.1.1]

parabolic point: A point on a smooth surface where the Gaussian curvature is zero. See also HK segmentation . [ VSN:9.2.5]

parallax: The angle between the two straight lines that join a point (possibly a moving one) to two viewpoints. In motion analysis, motion parallax occurs when two scene points that project to the same image point at one viewpoint later project to different points as the camera moves. The vector between the two new points is the parallax. See [ TV:8.2.4]:

[Figure: the initial and final camera positions, with the parallax between the projections of two scene points.]

parallel processing: An algorithm is executed in parallel, or through parallel processing, when it can be divided into a number of computations that are performed simultaneously on separate hardware. See also single instruction multiple data, multiple instruction multiple data, pipeline parallelism , task parallelism . [ BB:10.4.1]

parallel projection: A generalization of orthographic projection in which a scene is projected onto the image plane by a set of parallel rays not necessarily perpendicular to the image plane. This is a good approximation of perspective projection, up to a uniform scale factor, when the scene is small in comparison to its distance from the center of projection . Parallel projection is a subset of weak perspective viewing, where the weak perspective projection matrix is subject not only to orthogonality of the rows of the left 2 × 3 submatrix, but also to the constraint that the rows have equal norm. In orthographic projection, both rows have unit norm. [ FP:2.3.1]

parameter estimation: A class of techniques aimed to estimate the parameters of a given parametric model. For instance, assuming that a set of image points lie on an ellipse, and considering the implicit ellipse model ax^2 + bxy + cy^2 + dx + ey + f, the parameter vector [a, b, c, d, e, f] can be estimated, for instance, by least square surface fitting . [ DH:3.1]

parametric edge detector: An edge detection technique that seeks to match image data using a parametric model of edge points and thus detects edges when the image data fits the edge model well. See Hueckel edge detector . [ VSN:3.1.3]

parametric mesh: A type of surface modeling primitive for 3D models in which the surface is defined by a mesh of points. A typical example is NURBS ( non-uniform rational b-splines ).

parametric model: A mathematical model expressed as function of a set of parameters, for instance, the parametric equation of a curve or surface (as opposed to its implicit form), or a parametric edge model (see parametric edge detector ). [ VSN:3.1.3]

paraperspective: An approximation of perspective projection , whereby a scene is divided into parts that are imaged separately by parallel projection with different parameters. [ FP:2.3.1-2.3.3]

part recognition: A class of techniques for recognizing assemblies or articulated objects from their subcomponents (parts), e.g., a human body from head, trunk, and limbs. Parts have been represented by 3D models like generalized cones , superquadrics , and others. In industrial contexts, part recognition indicates the recognition of specific items (parts) in a production line, typically for classification and quality control.

part segmentation: A class of techniques for partitioning a set of data into components (parts) with an identity of their own, for instance a human body into limbs, head, and trunk. Part segmentation methods exist for both 2D and 3D data, that is, intensity images and range images , respectively. Various geometric models have been adopted for the parts, e.g., generalized cylinders , superellipses , and superquadrics . See also articulated object segmentation . [ BM:6.2.2]

partially constrained pose: A situation whereby an object is subject to a number of constraints restricting the number of admissible orientations or positions, but not fixing one univocally. For instance, cars on a road are constrained to rotate around an axis perpendicular to the road.

particle counting: An application of particle segmentation to counting the instances of small objects (particles) like pebbles, cells, or water droplets, in images or sequences, such as in this image: [ WP:Particle counter]

[Figure: an image containing many small particles to be counted.]

particle filter: A tracking strategy where the probability density of the model parameters is represented as a set of particles. A particle is a single sample of the model parameters, with an associated weight. The probability density represented by the particles is typically a set of delta functions or a set of Gaussians with means at the particle centers. At each tracking iteration, the current set of particles represents a prior on the model parameters, which is updated via a dynamical model and observation model to produce the new set representing the posterior distribution. See also condensation tracking . [ WP:Particle filter]
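One tracking iteration can be sketched as follows; the dynamics and likelihood arguments stand in for the user-supplied dynamical and observation models (all names are illustrative, not from the dictionary):

    import numpy as np

    def particle_filter_step(particles, weights, dynamics, likelihood):
        # particles: N x d array of parameter samples; weights: length-N.
        n = len(particles)
        # Resample the prior in proportion to the current weights.
        idx = np.random.choice(n, size=n, p=weights / weights.sum())
        particles = particles[idx]
        # Predict: push each particle through the dynamical model (+ noise).
        particles = dynamics(particles)
        # Update: reweight each particle by the observation likelihood.
        weights = likelihood(particles)
        return particles, weights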
particle segmentation: A class of techniques for detecting individual instances of small objects (particles) like pebbles, cells, or water droplets, in images or sequences. A typical problem is severe occlusion caused by overlapping particles. This problem has been approached successfully with the watershed transform .

particle tracking: See condensation tracking .

Parzen: A Parzen window is a linearly increasing and decreasing weighting window (triangle-shaped) used to limit leakage to spurious frequencies when computing the power spectrum of a signal:

[Figure: a triangular (Parzen) weighting window.]

See also windowing , Fourier transform. [ DH:4.3]

passive sensing: A sensing process that does not emit any stimulus or where the sensor does not move is passive. A normal stationary camera is passive. Structured light triangulation or a moving video camera are active . [ VSN:1.1]

passive stereo: A passive stereo algorithm uses only the information obtainable using a stationary set of cameras and ambient illumination. This contrasts with the active vision paradigm in stereo , where the camera(s) might move or some projected stimulus might be used to help solve the stereo correspondence problem . [ BM:1.9.2]

patch classification: The problem of attributing a surface patch to a particular class in a shape catalogue, typically computed from dense range data using curvature estimates or shading . See also curvature sign patch classification , mean and Gaussian curvature shape classification .

path coherence: A property used in tracking objects in an image sequence . The assumption is that the object motion is mostly smooth in the scene and thus the observed motion in a projected image of the scene is also smooth. [ JKS:14.6]

path finding: The problem of determining a path with given properties in a graph, for example, the shortest path connecting two given nodes, or two nodes with given properties. A path is defined as a linear subgraph. Path finding is a characteristic problem of state-space methods, inherited from symbolic artificial intelligence. See also graph searching . This term is also used in the context of dynamic programming search, for instance applied to the stereo correspondence problem. [ WP:Pathfinding]

pattern grammar: See shape grammar . [ WP:Pattern grammar]

pattern recognition: A large research area concerned with the recognition and classification of structures, relations or patterns in data. Classic techniques include syntactic , structural and statistical pattern recognition . [ RJS:6]

PCA: See principal component analysis . [ FP:22.3.1]

PDM: See point distribution model. [ WP:Point distribution model]

peak: A general term for when a signal value is greater than the neighboring signal values. An example of a signal peak measured in one dimension is when crossing a bright line lying on a dark surface along a scanline . A cross-section along a scanline of an image of a light line on a dark background might observe the pixel values 7, 45, 105, 54, 7. The peak would be at 105. A two dimensional example is when observing the image of a bright spot on a darker background. [ SOS:3.4.5]

pedestrian surveillance: See person surveillance .

pel: See pixel . [ SB:3]

pencil of lines: A bundle of lines passing through the same point. For example, if p is a generic bundle point and p_0 the point through which all lines pass, the bundle is

    p = p_0 + λ v

where λ is a real number and v the direction of the individual line (both are parameters). An example is [ FP:13.1.4]:

[Figure: several lines passing through a common point.]

percentile method: A specialized thresholding technique used for selecting the threshold. The method assumes that the percentage of the scene that belongs to the desired object (e.g., a darker object against a lighter background) is known. The threshold that selects that percentage of pixels is used. [ JKS:3.2.1]
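A sketch of the method, assuming a dark object covering a known fraction of the image:

    import numpy as np

    def percentile_threshold(image, object_fraction):
        # Gray level below which the given fraction of the pixels lies.
        return np.percentile(image, 100.0 * object_fraction)

    # Example: segment a dark object known to cover 30% of the image.
    # mask = image <= percentile_threshold(image, 0.30)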
perception: The process of understanding the world through the analysis of sensory input (such as images). [ DH:1.1]

perceptron: A computational element σ(w · x) that acts on a data vector x, where w is a vector of weights and σ() is the activation function. Perceptrons are often used for classifying data into one of two sets (i.e., if σ(w · x) ≥ 0 or σ(w · x) < 0). See also classification , supervised classification , pattern recognition . [ RN:2.4]
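A minimal sketch with a hard-limit activation, together with the classic learning rule (illustrative, for labels in {0, 1}):

    import numpy as np

    def perceptron_classify(w, x):
        # sigma(w . x) with a step activation: class 1 if w.x >= 0, else 0.
        return 1 if np.dot(w, x) >= 0 else 0

    def perceptron_train(samples, labels, epochs=10, lr=0.1):
        w = np.zeros(samples.shape[1])
        for _ in range(epochs):
            for x, t in zip(samples, labels):
                y = perceptron_classify(w, x)
                w += lr * (t - y) * x    # update only on misclassification
        return w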
perceptron network: A multi-layer arrangement of perceptrons , closely related to the well-known back-propagation networks. [ RN:2.4]

perceptual grouping: See perceptual organization . [ FP:14.2]

perceptual organization: A theory based on Gestalt psychology, centered on the tenet that certain organizations (or interpretations) of visual stimuli are preferred over others by the human visual system. A famous example is that a drawing of a wire-frame cube is immediately interpreted as a 3D object, instead of a 2D collection of lines. This concept has been used in several low-level vision systems, typically to find groups of low-level features most probably generated by interesting objects. See also grouping and Lowe's curve segmentation . A more complex example is below, where the line of feature endings suggests a virtual horizontal line. [ FP:14.2]

[Figure: a row of feature endings that suggests a virtual horizontal line.]

performance characterization: A class of techniques aimed to assess the performance of computer vision systems in terms of, for instance, accuracy, precision, robustness to noise, repeatability, and reliability. [ TV:A.1]

perimeter: 1) The perimeter of a binary image is the set of foreground pixels that touch the background. 2) The length of the path through those pixels. [ JKS:2.5.6]

periodicity estimation: The problem of estimating the period of a periodic phenomenon, e.g., given a texture created by the repetition of a fixed pattern, determine the pattern's size.

person surveillance: A class of techniques aimed at detecting, tracking, counting, and recognizing people or their behavior in CCTV videos, for security purposes. For example, systems have been reported for the automated surveillance of car parks, banks, airports and the like. A typical system must detect the presence of a person, track the person's movement over time, possibly identify the person using a database of known faces, and classify the person's behavior according to a small class of pre-defined behaviors (e.g., normal or anomalous). See also anomalous behavior detection , face recognition , and face tracking .

perspective: The rendering of a 3D scene as a 2D image according to perspective projection , the key characteristic of which is, intuitively, that the size of the imaged objects depends on their distance from the viewer. As a consequence, the image of a bundle of parallel lines is a bundle of lines converging into a point, the vanishing point . The geometry of perspective was formalized by the master painters of the Italian Quattrocento and Renaissance. [ FP:2.2]

perspective camera: A camera in which the image is formed according to perspective projection . The corresponding mathematical model is commonly known as the pinhole camera model . An example of the projection in the perspective camera is [ FP:2.2]:

[Figure: a lens with its optical axis and center of projection; a scene object is imaged on the image plane.]

perspective distortion: A type of distortion in which lines that are parallel in the real world appear to converge in a perspective image. In the example notice how the train tracks appear to converge in the distance. [ SB:2.3.1]

[Figure: train tracks converging toward the horizon.]

perspective inversion: The problem of determining the position of a 3D object from its image. I.e., solving the perspective projection equations for the 3D coordinates. See also absolute orientation . [ FP:2.2]

perspective projection: Imaging a scene with foreshortening. The projection equation of perspective is

    x = f X/Z,   y = f Y/Z,

where x, y are the image coordinates of an image point in the camera reference frame (e.g., in millimeters, not pixels), f is the focal length and X, Y, Z are the coordinates of the corresponding scene point. [ FP:1.1.1]
corresponding mathematical model is positron emission tomography .
commonly known as the [ AJ:10.1]
pinhole camera model . An example of
the projection in the perspective phase congruency: The property
camera is [ FP:2.2]: whereby components of the
Fourier transform of an image are
LENS maximally in phase at feature points
CENTER OF
PROJECTION like step edges or lines. Phase
OPTICAL
AXIS congruency is invariant to image
brightness and contrast and has been
IMAGE
PLANE SCENE therefore used as an absolute measure
OBJECT
of the significance of feature points. See
perspective distortion: A type of also image feature .
distortion in which lines that are [ WP:Phase congruency]
parallel in the real world appear to
phase correlation: A motion
converge in a perspective image. In the
estimation method that uses the
example notice how the train tracks
translation-phase duality property of
appear to converge in the distance.
the Fourier transform , that is, a shift
[ SB:2.3.1]
in the spatial domain is equivalent to a
phase shift in the frequency domain.
When using log-polar coordinates, and
the rotation and scale properties of the
Fourier transform, spatial rotation and
scale can be estimated from the
frequency shift, independent of spatial
translation. See also
planar motion estimation .
[ WP:Phase correlation]
perspective inversion: The problem
of determining the position of a 3D phase matching stereo algorithm:
object from its image. I.e., solving the An algorithm for solving the
perspective projection equations for the stereo correspondence problem by
3D coordinates. See also looking for similarity of the phase of
absolute orientation . [ FP:2.2] the Fourier transform .

perspective projection: Imaging a phase-retrieval problem: The


scene with foreshortening. The problem of reconstructing a signal
186 P

based on only the magnitude (not the sensor, converting light to an electric
phase) of the Fourier transform . signal. [ WP:Photodiode]
[ WP:Phase retrieval]
photogrammetry: A research area
phase spectrum: The concerned with obtaining reliable and
Fourier transform of an image can be accurate measurements from
decomposed into its phase spectrum noncontact imaging, e.g., a digital
and its power spectrum . The phase height map from a pair of overlapping
spectrum is the relative phase offset of satellite images. Consequently, accurate
the given spatial frequency . camera calibration is a primary
[ EH:11.2.1] concern. The techniques used overlap
many typical of image processing and
phase unwrapping technique: The pattern recognition . [ FP:3.4]
process of reconstructing the true phase
shift from phase estimates wrapped photometric invariant: A feature or
into [, ] . The true phase shift characteristic of an image that is
values may not fall in this interval but insensitive to changes in illumination.
instead be mapped into the interval by See also invariant .
addition or subtraction of multiples of
2. The technique maximizes the photometric decalibration: The
smoothness of the phase image by correction of intensities in an image so
adding or subtracting multiples of 2 at that the same surface (at the same
various image locations. See also orientation) will give the same response
Fourier transform . regardless of the position in which it
[ WP:Range imaging#Interferometry] appears in the image.

phis curve (s): A technique for photometric stereo: A technique


representing planar contours . Each recovering surface shape (more
point in the contour is represented by precisely, the surface normal at each
the angle formed by the line through surface point) using multiple images
P and the shapes center (e.g., the acquired from a single viewpoint but
barycentrum or center of mass ) with a under different illumination conditions.
fixed direction, and the distance s from These lead to different
the center to P : reflectance maps, that together
constrain the surface normal at each
point. [ FP:5.4]
s
photometry: A branch of optics
concerned with the measurement of the
amount or the spectrum of light. In
computer vision, one frequently uses
photometric models expressing the
See also shape representation . amount of light emerging from a
[ BB:8.2.3] surface, be it fictitious, or the surface of
a radiating source, or from an
photo consistency: See illuminated object. A well-known
shape from photo consistency . photometric model is Lamberts law.
[ WP:Photo-consistency] [ WP:Photometry (optics)]
photodiode: The basic element, or
pixel, of a CCD or other solid state
P 187

photon noise: Noise generated by the the smallest directly measured


statistical fluctuations associated with image feature . [ SB:3]
photon counting over a finite time
interval in the CCD or other solid state picture tree: A recursive image and
sensor of a digital camera. Photon noise 2D shape representation in which a
is not independent of the signal, and is tree data structure is used. Each node
not additive. See also image noise , in the tree represents a region that is
digital camera . [ WP:Image noise] then decomposed into subregions.
These are represented by child nodes.
photopic response: The The figure below shows a segmented
sensitivity-wavelength curve modeling image with four regions (left) and the
the response of the human eye to corresponding picture tree.
normal lighting conditions. In such [ JKS:3.3.4]
conditions, the cones are the
photoreceptors on the retina that best
respond to light. Their response curve
peaks at 555 nm, indicating that the
eye is maximally sensitive to
green-yellow colors in normal lighting
conditions. When light intensity is very
low, the rods determine the eyes *
response, modeled by the scotopic
curve, which peaks near to 510 nm. C
[ AJ:3.2] B A B
A
D
photosensor spectral response:
C D
The spectral response of a photosensor
characterizing the sensors output as a
function of the input lights spectral
frequency. See also Fourier transform ,
frequency spectrum ,
spectral frequency.
[ WP:Frequency spectrum]
piecewise rigidity: The property of
physics based vision: An area of an object or scene that some of its
computer vision seeking to apply parts, but not the object or scene as a
physics laws or methods (of optics , whole, are rigid. Piecewise rigidity can
surfaces, illumination, etc.) to the be a convenient assumption, e.g., in
analysis of images and videos. motion analysis.
Examples include
polarization based methods , in which pincushion distortion: A form of
physical properties of the scene surfaces radial lens distortion where image
are estimated via estimates of the state points are displaced away from the
of polarization of the incoming light, center of distortion by an amount that
and the use of detailed radiometric increases with the distance to the
models of image formation. center. A straight line that would have
been parallel to an image side is bowed
picture element: A pixel . It is an towards the center of the image. This is
indivisible image measurement. This is the opposite of barrel distortion .
[ EH:6.3.1]
188 P

requires the following steps

a1 = A(x1 )
y1 = B(a1 )
a2 = A(x2 )
y2 = B(a2 )
a3 = A(x3 )
....
ai = A(xi )
yi = B(ai )....

However, notice that we compute yi


pinhole camera model: The just after yi1 , so the computation can
mathematical model for an ideal be arranged as
perspective camera formed by an image
plane and a point aperture, through a1 = A(x1 )
which all incoming rays must pass. For a2 = A(x2 ) y1 = B(a1 )
equations, see perspective projection . a3 = A(x3 ) y2 = B(a2 )
This is a good model for simple convex
lens camera, where all rays pass ....
through the virtual pinhole at the focal ai+1 = A(xi+1 ) yi = B(ai )
point. [ FP:1.1] ....

where steps on the same line may be


computed concurrently as they are
PINHOLE
independent. The output values yi
PRINCIPAL POINT OPTICAL
AXIS therefore arrive at a rate of one every
cycle rather than one every two cycles
IMAGE
PLANE SCENE without pipelining. The pipeline
OBJECT
process can be visualized as:

pink noise: Noise that is not white , xi+1 ai yi1


i.e., when there is a correlation between
A B
the noise at two pixels or at two times.
[ WP:Pink noise] pit: 1) A general term for when a
signal value is lower than the
pipeline parallelism: Parallelism neighboring signal values. Unlike signal
achieved with two or more, possibly peaks , pits usually refer to two
dissimilar, computation devices. The dimensional images . For example, a pit
non-parallel process comprises steps A occurs when observing the image of a
and B, and will operate on a sequence dark spot on a lighter background. 2)
of items xi , i > 0, producing outputs yi . A local point-like concave shape defect
The result of B depends on the result of in a surface.
A, so a sequential computer will
compute ai = A(xi ); yi = B(ai ); for pitch: A 3D rotation representation
each i. A parallel computer cannot (along with yaw and roll ) often used
compute ai and yi simultaneously as for cameras or moving observers. The
they are dependent, so the computation pitch component specifies a rotation
P 189

about a horizontal axis to give an image segmentation ,


u-p-down change in orientation. This supervised classification , and
figure shows the pitch rotation direction clustering. This image shows the pixels
[ JKS:12.2.1]: of the left image classified into four
classes denoted by the four different
shades of gray [ VSN:3.3.1]:
OBSERVATION
DIRECTION

PITCH
DIRECTION

pixel: The intensity values of a digital


image are specified at the locations of a
discrete rectangular grid; each location
is a pixel. A pixel is characterized by
its coordinates (position in the image) pixel connectivity: The pattern
and intensity value (see intensity and specifying which pixels are considered
intensity image ). Values can express neighbors of a given one (X) for the
physical quantities other than intensity purposes of computation. Common
for different kinds of images, as in, e.g., connectivity schemes are
infrared imaging . In physical terms, a 4 connectedness and 8 connectedness ,
pixel is the photosensitive cell on the as seen in the left and right images here
CCD or other solid state sensor of a [ SB:4.2]:
digital camera. The CCD pixel has a
precise size, specified by the
manufacturer and determining the
CCDs aspect ratio . See also
intensity sensor and
X X
photosensor spectral response . [ SB:3]

pixel addition operator: A low-level


pixel coordinates: The coordinates of
image processing operator taking as
a pixel in an image. Normally these are
input two gray scale images, I1 and I2 ,
the row and column position.
and returning an image I3 in which the
[ JKS:12.1]
value of each pixel is I3 = I1 + I2 . This
figure shows at the right the sum of the pixel coordinate transformation:
two images at the left (the sum divided The mathematical transformation
by 2 to rescale to the original intensity linking two image reference frames ,
level) [ SB:3.2.1]: specifying how the coordinates of a
pixel in one reference frame are
obtained from the coordinate of that
pixel in the other reference frame. One
linear transformation can be specified
by i1 = ai2 + bj2 + e
j1 = ci2 + dj2 + f
pixel classification: The problem of where the coordinates of p~2 = (i2 , j2 )
assigning the pixels of an image to are transformed into p~1 = (i1 , j1 ). In
certain classes. See also matrix form, p~1 = A~ p2 + ~t, with
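A sketch of the matrix form, using a rotation by 30 degrees and a translation of (5, 2) as an example:

    import numpy as np

    def transform_pixel(p2, A, t):
        # p1 = A p2 + t
        return A @ np.asarray(p2, float) + t

    theta = np.deg2rad(30)
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.array([5.0, 2.0])
    p1 = transform_pixel((10, 20), A, t)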
pixel counting: A simple algorithm to determine the area of an image region by counting the numbers of pixels composing the region. See also region .

pixel division operator: An operator taking as input two gray scale images, I1 and I2 , and returning an image I3 in which the value of each pixel is I3 = I1 / I2 .

pixel exponential operator: A low-level image processing operator taking as input one gray scale image, I1 , and returning an image I2 in which the value of each pixel is I2 = c b^I1 . This operator is used to change the dynamic range of an image. The value of the basis b depends on the desired degree of compression of the dynamic range. c is a scaling factor. See also logarithmic transformation , pixel logarithm operator . The right image is 1.005 raised to the pixel values of the left image:

[Figure: an input image (left) and the result of the exponential operator with base 1.005 (right).]

pixel gray scale resolution: The number of different gray levels that can be represented in a pixel, depending on the number of bits associated with each pixel. For instance, an 8-bit pixel (or image) can represent 2^8 = 256 different intensity values. See also intensity , intensity image , and intensity sensor .

pixel interpolation: See image interpolation . [ WP:Pixelation]

pixel jitter: A frame grabber must estimate the pixel sampling clock of a digital camera, i.e., the clock used to read out the pixel values, which is not included in the output signal of the camera. Pixel jitter is a form of image noise generated by time variations in the frame grabber's estimate of the camera's clock.

pixel logarithm operator: An image processing operator taking as input one gray scale image, I1 , and returning an image I2 in which the value of each pixel is I2 = c log_b(|I1 + 1|). This operator is used to change the dynamic range of an image (see also contrast enhancement ), such as for the enhancement of the magnitude of the Fourier transform . The base b of the logarithm function is often e, but it does not actually matter because the relationship between logarithms of any two bases is only one of scaling . See also pixel exponential operator . The right image is the scaled logarithm of the pixel values of the left image [ SB:3.3.1]:

[Figure: an input image (left) and its scaled logarithm (right).]
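A sketch of the operator with natural logarithm and a rescaling to an 8-bit display range, as commonly used to show Fourier transform magnitudes:

    import numpy as np

    def pixel_log(image, c=1.0):
        # I2 = c * log(|I1 + 1|), then rescaled to [0, 255] for display.
        out = c * np.log(np.abs(image.astype(float) + 1.0))
        return 255.0 * out / out.max()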
pixel multiplication operator: An image processing operator taking as input two gray scale images, I1 and I2 , and returning an image I3 in which the value of each pixel is I3 = I1 × I2 . The right image is the product of the left and middle images (scaled by 255 for contrast here) [ SB:3.2.1.2]:

[Figure: two input images and, at the right, their rescaled product.]

pixel subsampling: The process of producing a smaller image from a given one by including only one pixel out of every N . Subsampling is rarely applied this literally, however, as severe aliasing is introduced; scale space filtering is applied instead.

pixel subtraction operator: A low-level image processing operator taking as input two gray scale images, I1 and I2 , and returning an image I3 in which the value of each pixel is I3 = I1 − I2 . This operator implements the simplest possible change detection algorithm. The right image (with 128 added) is the middle image subtracted from the left image [ SB:3.2.1]:

[Figure: two input images and, at the right, their difference offset by 128.]

planar facet model: See surface mesh . [ JKS:13.5]

planar mosaic: A panoramic image mosaic of a planar scene. If the scene is planar, the transformation linking different views is a homography .

planar motion estimation: A class of techniques aiming to estimate the motion parameters of bodies moving on a plane in space. See also motion estimation . [ HZ:18.8]

planar patch extraction: The problem of finding planar regions, or patches, most commonly in range images . Plane extraction can be useful, for instance, in 3D pose estimation , as several model-based matching techniques yield higher accuracy with planar than non-planar surfaces.

planar patches: See surface triangulation .

planar projective transformation: See homography . [ HZ:1.3]

planar rectification: A class of rectification algorithms projecting the original images onto a plane parallel to the baseline of the cameras. See also stereo and stereo vision .

planar scene: 1) When the depth of a scene is small with respect to its distance from the camera, the scene can be considered planar, and useful approximations can be adopted; for instance, the transformation between two views taken by a perspective camera is a homography . See also planar mosaic. 2) When all of the surfaces in a scene are planar, e.g., a blocksworld scene.

plane: The locus of all points x such that the surface normal n of the plane and a point in the plane p satisfy the relation (x − p) · n = 0. In 3D space, for instance, a plane is defined by two vectors and a point lying on the plane, so that the plane's parametric equation is

    p = a u + b v + p_0 ,

where p is the generic plane point, and u, v, p_0 are the two vectors and the point defining the plane, respectively. The implicit equation of a plane is ax + by + cz + d = 0, where [x, y, z] are the coordinates of the generic plane point. In vector form, p · n = −d, where p = [x, y, z], n = [a, b, c] is a vector perpendicular to the plane, and d/||n|| is the distance of the plane from the origin. All of these definitions are equivalent. [ JKS:13.3.1]

plane conic: Any of the curves defined by the intersection of a plane with a 3D double cone, namely ellipse, hyperbola and parabola. Two intersecting lines and a single point represent degenerate conics, defined by special configurations of the cone and plane. The implicit equation of a conic is ax^2 + bxy + cy^2 + dx + ey + f = 0. See also conic fitting . This figure shows an ellipse formed by intersection [ JKS:6.6]:

[Figure: a plane cutting a double cone in an ellipse.]

plane projective transfer: An algorithm based on projective invariants that, given two images of a planar object, I1 and I2 , and four feature correspondences, determines the position of any other point of I1 in I2 . Interestingly, no knowledge of the scene or of the imaging system's parameters is necessary.

plane projective transformation: The linear transformation between the coordinates of two projective planes, also known as homography . See also projective geometry , projective plane , and projective transformation . [ FP:18.4.1]

plenoptic function representation: A parameterized function for describing everything that is visible from a given point in space, a fundamental representation in image based rendering . [ FP:26.3]

Plessey corner finder: A well-known corner detector also known as Harris corner detector , based on the local autocorrelation of first-order image derivatives. See also feature extraction . [ WP:Corner detection#The Harris .26 Stephens .2F Plessey .2F Shi-Tomasi corner detection algorithm]

Plücker line coordinates: A representation of lines in projective 3D space. A line is represented by six numbers (l12, l13, l14, l23, l24, l34) that must satisfy the constraint that l12 l34 − l13 l24 + l14 l23 = 0. The numbers are the entries of the Plücker matrix, L, for the line. For any two points A, B on the line, L is given by l_ij = A_i B_j − B_i A_j. The pencil of planes containing the line are the nullspace of L. The six numbers may also be seen as a pair of 3-vectors, one a point a on the line, one the direction n, with a · n = 0. [ OF:2.5.1]
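A sketch computing the six coordinates from two homogeneous points and checking the quadratic constraint (NumPy indices are 0-based, while the text's subscripts run from 1 to 4):

    import numpy as np

    def plucker_coordinates(A, B):
        # l_ij = A_i B_j - B_i A_j: entries of the 4x4 Plucker matrix L.
        L = np.outer(A, B) - np.outer(B, A)
        return np.array([L[0, 1], L[0, 2], L[0, 3],
                         L[1, 2], L[1, 3], L[2, 3]])

    l12, l13, l14, l23, l24, l34 = plucker_coordinates(
        np.array([0.0, 0.0, 0.0, 1.0]), np.array([1.0, 2.0, 3.0, 1.0]))
    assert abs(l12 * l34 - l13 * l24 + l14 * l23) < 1e-9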
P 193

parameters can be learned by point of extreme curvature: A


supervised learning . It is suitable for point where the curvature achieves an
2D shapes that undergo general but extremum, that is, a maximum or a
correlated deformations or variations, minimum. This figure shows one of each
such as component motion or shape type circled: [ WP:Vertex (geometry)]
variation. For instance, fronto-parallel
images of leaves, fish or human hands, MINIMA
resistors on a board, people walking in
surveillance videos, and the like. The
shape variations of the contour in a
series of examples are captured by
principal component analysis .
[ WP:Point distribution model]

point feature: An image feature that


occupies a very small portion of an
image, ideally one pixel, and is therefore
local in nature. Examples are corners
(see corner detection ) or edge pixels. MAXIMA
Notice that, although point features point sampling: Selection of discrete
occupy only one pixel, they require a points of data from a continuous signal.
neighborhood to be defined; for For example a digital camera samples
instance, an edge pixel is characterized a continuous image function into a
by a sharp variation of image values in digital image .
a small neighborhood of the pixel. [ WP:Sampling (signal processing)]
point invariant: A property that 1) can be measured at a point in an image and 2) is invariant to some transformation. For instance, the ratio of a pixel's observed intensity to that of its brightest neighbor is invariant to changes in illumination. Another example: the magnitude of the gradient of intensity at a point is invariant to translation and rotation. (Both of these examples assume ideal images and observation.)

point light source: A point-like light source, typically radiating energy radially, whose intensity decreases as 1/r², where r is the distance to the source. [ FP:5.2.2]

point matching: A class of algorithms solving the matching or correspondence problem for point features.

point of extreme curvature: A point where the curvature achieves an extremum, that is, a maximum or a minimum. This figure shows one of each type circled: [ WP:Vertex (geometry)]

[Figure: a curve with one curvature minimum and one curvature maximum circled (labels: MINIMA, MAXIMA)]

point sampling: Selection of discrete points of data from a continuous signal. For example a digital camera samples a continuous image function into a digital image. [ WP:Sampling (signal processing)]

point similarity measure: A function measuring the similarity of image points (actually small neighborhoods, to include sufficient information to characterize the image location), for instance cross correlation, SAD (sum of absolute differences), or SSD (sum of squared differences).
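As an illustrative sketch (not from the dictionary), the three measures named above for two equally sized gray-value patches:

    import numpy as np

    def ssd(p, q):   # sum of squared differences (lower = more similar)
        return np.sum((p - q) ** 2)

    def sad(p, q):   # sum of absolute differences (lower = more similar)
        return np.sum(np.abs(p - q))

    def ncc(p, q):   # normalized cross correlation (higher = more similar)
        p = (p - p.mean()) / p.std()   # assumes non-constant patches
        q = (q - q.mean()) / q.std()
        return np.mean(p * q)

    a = np.random.rand(7, 7)
    b = a + 0.01 * np.random.rand(7, 7)
    print(ssd(a, b), sad(a, b), ncc(a, b))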
point source: A point light source. An ideal illumination source in which all light comes from a single spatial point. The alternative is an extended light source. The assumption of being a point source allows easier interpretation of shading and shadows, etc. [ FP:5.2.2]

point spread function: The response of a 2D system or filter to an input Dirac impulse. The response is typically spread over a region surrounding the point of application of the impulse, hence the name. Analogous to the
impulse response of a 1D system. See also filter, linear filter. [ FP:7.2.2]

polar coordinates: A system of coordinates specifying the position of a point P in terms of the direction of the line through P and the origin, and the distance from P to the origin along that line. For example, the transformation between polar (r, θ) and Cartesian coordinates (x, y) in the plane is given by x = r cos θ and y = r sin θ, or r = √(x² + y²) and θ = atan(y/x). [ BB:A1.1.2]

polar rectification: A rectification


algorithm designed to cope with any
camera geometry in the context of
uncalibrated vision, re-parameterizing
the images in polar coordinates around
the epipoles .
polarization: The characterizing property of polarized light. [ EH:8]

polarized light: Unpolarized light results from the nondeterministic superposition of the x and y components of the electric field. Otherwise, the light is said to be polarized, and the tip of the electric field evolves on an ellipse (elliptically polarized light). Light is often partially polarized, that is, it can be regarded as the sum of completely polarized and completely unpolarized light. In computer vision, polarization analysis is an area of physics based vision, and has been used for metal-dielectric discrimination, surface reconstruction, fish classification, defect detection, and in structured light triangulation. [ EH:8]

polarizer: A device changing the state of polarization of light to a specific polarized state, for example, producing linearly polarized light in a given plane. [ EH:8.2]

polycurve: A simple curve C that is smooth everywhere but at a finite set of points, and such that, given any point P on C, the tangent to C converges to a limit approaching P from each direction. Computer vision shape models often describe boundary shapes using polycurve models consisting of a sequence of curved or straight segments, such as in this example using four circular arcs. See also polyline.

[Figure: a closed boundary built from four circular arcs]

polygon: A closed, piecewise linear, 2D contour. Squares, rectangles and pentagons are examples of regular polygons, where all sides have equal length and all angles formed by contiguous sides are equal. This does not hold for a general polygon. [ WP:Polygon]

polygon matching: A class of techniques for matching polygonal shapes. See polygon.

polygonal approximation: A polyline approximating a curve. This circular arc is (badly) approximated by the polyline [ BB:8.2]:

[Figure: a circular arc and a coarse polyline approximating it]

polyhedron: A 3D object with planar faces, a 3D polygon. A subset of R³ whose boundary is a subset of finitely many planes. The basic primitive of many 3D modeling schemes, as many hardware accelerators process polygons
particularly quickly. A tetrahedron is the simplest polyhedron [ DH:12.4]:

[Figure: a tetrahedron]
polyline: A piecewise linear contour. If closed, it becomes a polygon. See also polycurve, contour analysis and contour representation. [ JKS:6.4]

pose: The location and orientation of an object in a given reference frame, especially a world or camera reference frame. A classic problem of computer vision is pose estimation. [ SQ:4.2.2]

pose clustering: A class of algorithms solving the pose estimation problem using clustering techniques (see clustering/cluster analysis). See also pose, k-means clustering. [ FP:18.3]

pose consistency: An algorithm seeking to establish whether two shapes are equivalent. Given two sets of points G1 and G2, for example, the algorithm finds a sufficient number of point correspondences to determine a transformation T between the two sets, then applies T to all other points of G1. If the transformed points are close to points in G2, consistency is satisfied. Also known as viewpoint consistency. See also feature point correspondence. [ FP:18.2]

pose determination: See pose estimation. [ WP:Pose (computer vision)#Pose Estimation]

pose estimation: The problem of determining the orientation and translation of an object, especially a 3D one, from one or more images thereof. Often the term means finding the transformation that aligns a geometric model with the image data. Several techniques exist for this purpose. See also alignment, model registration, orientation estimation, and rotation representation. [ WP:Pose (computer vision)#Pose Estimation]

pose representation: The problem of representing the angular position, or pose, of an object (especially 3D) in a given reference frame. A common representation is the rotation matrix, which can be parameterized in different ways, e.g., Euler angles; pitch-, yaw- and roll-angles (rotation angles around the coordinate axes); axis-angle; and quaternions. See also orientation estimation and rotation representation.

position: Location in space (either 2D or 3D). [ WP:Position (vector)]

position dependent brightness correction: A technique seeking to counteract the brightness variation caused by a real imaging system, typically the fact that brightness decreases as one moves away from the optical axis in a lens system with finite aperture. This effect may be noticeable only in the periphery of the image. See also lens.

position invariant: Any property that does not vary with position. For instance, the length of a 3D line segment is invariant to the line's position in 3D space, but the length of the line's projection on the image plane is not. See also invariant.
positron emission tomography (PET): A medical imaging method that can measure the concentration and movement of a positron-emitting isotope in living tissue. [ AJ:10.1]

postal code analysis: A set of image analysis techniques concerned with understanding written or printed postal codes. See handwritten and optical character recognition. [ WP:Handwriting recognition]

posture analysis: A class of techniques aiming to estimate the posture of an articulated body, for instance a human body (e.g., pointing, sitting, standing, crouching, etc.). [ WP:Motion analysis#Applications]

potential field: A mathematical function that assigns some (usually scalar) value at every point in some space. In computer vision and robotics, this is usually a measure of some scalar property at each point of a 2D or 3D space or image, such as the distance from a structure. The representation is used in path planning, such that the potential at every point indicates, for example, the ease/difficulty of getting to some destination. [ DH:5.11]

power spectrum: In the context of computer vision, normally the amount of energy at each spatial frequency. The term could also refer to the amount of energy at each light frequency. Also called the power spectrum density function or spectral density function. [ AJ:11.5]

precision: 1) The repeatability of the accuracy of a vision system (in general, of an instrument) over many measures carried out in the same conditions. Typically measured by the standard deviation of a target error measure. For instance, the precision of a vision system measuring linear size would be assessed by taking thousands of measurements of a perfectly known object and computing the standard deviation of the measurements. See also accuracy. 2) The number of significant bits in a floating point or double precision number that lie to the right of the decimal point. [ WP:Accuracy and precision]

predictive compression method: A class of image compression algorithms using redundancy information, mostly correlation, to build an estimate of a pixel value from values of neighboring pixels. [ WP:Linear predictive coding]

pre-processing: Operations on an image that, for example, suppress some distortion(s) or enhance some feature(s). Examples include geometric transformations, edge detection, image restoration, etc. There is no clear distinction between image pre-processing and image processing. [ WP:Data Pre-processing]

Prewitt gradient operator: An edge detection operator based on template matching. It applies a set of convolution masks, or kernels (see Prewitt kernel), implementing matched filters for edges at various (generally eight) orientations. The magnitude (or strength) of the edge at a given pixel is the maximum of the responses to the masks. Alternatively, some implementations use the sum of the absolute value of the responses from the horizontal and vertical masks. [ JKS:5.2.3]

Prewitt kernel: The mask used by the Prewitt gradient operator. The horizontal and vertical masks are [ JKS:5.2.3]:

[Figure: the two 3×3 Prewitt masks]
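The masks themselves did not survive extraction; as a sketch, the standard 3×3 Prewitt pair, with an illustrative helper (name invented) implementing the sum-of-absolute-responses variant mentioned above:

    import numpy as np
    from scipy.ndimage import convolve

    # The two classic 3x3 Prewitt masks (vertical and horizontal edges).
    prewitt_x = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], float)
    prewitt_y = prewitt_x.T

    def prewitt_edge_strength(image):
        # Sum of absolute responses of the two masks.
        gx = convolve(image, prewitt_x)
        gy = convolve(image, prewitt_y)
        return np.abs(gx) + np.abs(gy)

    edges = prewitt_edge_strength(np.random.rand(32, 32))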

primal sketch: A representation for early vision introduced by Marr, focusing on low-level features like edges. The full primal sketch groups the information computed in the raw primal sketch (consisting largely of edge, bar, end and blob feature information extracted from the images), for instance by forming subjective contours. See also Marr-Hildreth edge detection and raw primal sketch. [ RN:7.2]

primary color: A color coding scheme whereby a range of perceivable colors can be made by a weighted combination of primary colors. For example, color television and computer screens use red, green and blue light-emitting chemicals to produce these three primary colors. The ability to use only three colors to generate all others arises from the tri-chromacy of the human eye, which has cones that respond to three different color spectral ranges. See also additive and subtractive color. [ EH:4.4]

principal component analysis (PCA): A statistical technique useful for reducing the dimensionality of data, at the basis of many computer vision techniques (e.g., point distribution models and eigenspace based recognition). In essence, the deviation of a random vector, ~x, from the population mean, ~μ, can be expressed as the product of A, the matrix of eigenvectors of the covariance matrix of the population, and a vector ~y of projection weights: ~y = A(~x − ~μ), so that ~x = A⁻¹~y + ~μ. Usually only a subset of the components of ~y is sufficient to approximate ~x. The elements of this subset correspond to the largest eigenvalues of the covariance matrix. See also Karhunen-Loève transformation. [ FP:22.3.1]
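A minimal NumPy sketch of the decomposition just described (illustrative only; the rows of A are the eigenvectors, so A⁻¹ = Aᵀ):

    import numpy as np

    def pca(X):
        # X: n samples x d features. Returns mean, eigenvector matrix, eigenvalues.
        mu = X.mean(axis=0)
        C = np.cov(X - mu, rowvar=False)
        evals, evecs = np.linalg.eigh(C)     # ascending eigenvalues
        order = np.argsort(evals)[::-1]      # largest eigenvalue first
        return mu, evecs[:, order].T, evals[order]

    X = np.random.rand(100, 5)
    mu, A, lam = pca(X)
    y = A @ (X[0] - mu)       # projection weights
    x_rec = A.T @ y + mu      # exact reconstruction with all components
    assert np.allclose(x_rec, X[0])

Keeping only the first few rows of A (largest eigenvalues) gives the usual reduced-dimensionality approximation.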
principal component basis space: In principal component analysis, the space generated by the basis formed by the eigenvectors, or eigendirections, of the covariance matrix. [ WP:Principal component analysis]

principal component representation: See principal component analysis. [ FP:22.3.1]

principal curvature: The maximum or minimum normal curvature at a surface point, achieved along a principal direction. The two principal curvatures and directions together completely specify the local surface shape. The principal curvatures in the two directions at the point X on the cylinder of radius r below are 0 (along the axis) and 1/r (across the axis). [ JKS:13.3.2]

[Figure: a cylinder with the two principal directions marked at a point X]

principal curvature sign class: See mean and Gaussian curvature shape classification.
principal direction: The direction in which the normal curvature achieves an extremum, that is, a principal curvature. The two principal curvatures and directions together specify completely the local surface shape. The principal directions at the point X on the cylinder below are parallel to the axis and around the cylinder. [ FP:19.1.2]

[Figure: a cylinder with the principal directions at a point X]

principal point: The point at which the optical axis of a pinhole camera model intersects the image plane, as in [ JKS:12.9]:

[Figure: pinhole camera geometry, with the optical axis from the scene object through the pinhole meeting the image plane at the principal point]

principal texture direction: An algorithm identifying the direction of a texture. A directional or oriented texture in a small image patch generates a peak in the Fourier transform. To determine the direction, the Fourier amplitude plot is regarded as a distribution of physical mass, and the minimum-inertia axis identified.

privileged viewpoint: A viewpoint where small motions cause image features to appear or disappear. This contrasts with a generic viewpoint.

probabilistic causal model: A representation used in artificial intelligence for causal models. The simplest causal model is a causal graph, in essence an acyclic graph in which nodes represent variables and directed arcs represent cause and effect. A probabilistic causal model is a causal graph with the probability distribution of each variable conditional to its causes.

probabilistic Hough transform: The probabilistic Hough transform computes an approximation to the Hough transform by using only a percentage of the image data. The goal is to reduce the computational cost of the standard Hough transform. A threshold effect has been observed, so that if the percentage sampled is above the threshold level then few false positives are detected. [ WP:Randomized Hough Transform]

probabilistic model learning: A class of Bayesian learning algorithms based on probabilistic networks, that allow you to input information at any node (unlike neural networks), and associate uncertainty coefficients to classification answers. See also Bayes rule, Bayesian model, Bayesian network. [ WP:Bayesian probability]

probabilistic principal component analysis: A technique defining a probability model for principal component analysis (PCA). The model can be extended to mixture models, trained using the expectation maximization (EM) algorithm. The original data is modeled as being generated by the reduced-dimensionality subset typical of PCA plus Gaussian noise (called a latent variable model). [ WP:Nonlinear dimensionality reduction#Gaussian-process latent variable models]

probabilistic relaxation: A method of data interpretation in which local
inconsistencies act as inhibitors and local consistencies act as excitors. The hope is that the combination of these two influences constrains the probabilities.

probabilistic relaxation labeling: An extension of relaxation labeling in which each entity to be labeled, for instance each image feature, is not simply assigned to a label, but to a set of probabilities, each giving the likelihood that the feature could be assigned a specific label. [ BM:2.9]

probability: A measure of the confidence one may have in the occurrence of an event, on a scale from 0 (impossible) to 1 (certain), and defined as the proportion of favorable outcomes to the total number of possibilities. For instance, the probability of getting any given number from a die in a single throw is 1/6. Probability theory, an important part of statistics, is the basis of several vision techniques. [ WP:Probability]

probability density estimation: A class of techniques for estimating the density function or its parameters given a sample from a population. A related problem is testing whether a particular sample has been generated by a process characterized by a particular probability distribution. Two common tests are the goodness-of-fit and the Kolmogorov-Smirnov tests. The former is a parametric test best used with large samples; the latter gives good results with smaller samples, but is a non-parametric test and, as such, does not produce estimates of the population parameters. See also non-parametric method. [ VSN:A2.2]

procedural representation: A class of representations used in artificial intelligence that are used to encode how to perform a task (procedural knowledge). A classic example is the production system. In contrast, declarative representations encode how an entity is structured. [ RJS:7]

Procrustes analysis: A method for comparing two data sets through the minimization of squared errors, by translation, rotation and scaling. [ WP:Procrustes analysis]
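A rough sketch of the classical least-squares solution (not from the dictionary; names invented), recovering scale s, rotation R and translation t that map point set Y onto point set X:

    import numpy as np

    def procrustes(X, Y):
        # Finds s, R, t minimizing ||s * Y @ R + t - X||^2 (rows are points).
        muX, muY = X.mean(0), Y.mean(0)
        X0, Y0 = X - muX, Y - muY
        U, S, Vt = np.linalg.svd(Y0.T @ X0)
        R = U @ Vt                       # optimal rotation (up to reflection)
        s = S.sum() / (Y0 ** 2).sum()    # optimal scale
        t = muX - s * muY @ R
        return s, R, t

    Y = np.random.rand(10, 2)
    th = 0.5
    Rot = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    X = 2.0 * Y @ Rot + np.array([1.0, -3.0])
    s, R, t = procrustes(X, Y)
    assert np.allclose(s * Y @ R + t, X)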

production system: 1) An approach to computerized logical reasoning, whereby the logic is represented as a set of production rules. A rule is of the form LHS → RHS. This states that if the pattern or set of conditions encoded in the left-hand side (LHS) are true or hold, then do the actions specified in the right-hand side (RHS), which may simply be the assertion of some conclusion. A sample rule might be "If the number of detected edge fragments is less than 10, then decrease the threshold by 10%". 2) An industrial system that manufactures some product. 3) A system that is to be actually used, as compared to a demonstration system. [ RJS:7]

profiles: A shape signature for image regions, specifying the number of pixels in each column (vertical profile) or row (horizontal profile). Used in pattern recognition. See also shape, shape representation. [ SOS:4.9.2]

progressive image transmission: A method of transmitting an image in which a low-resolution version is first transmitted, followed by details that allow progressively higher resolution versions to be recreated.

[Figure: three stages of progressive transmission, labeled FIRST IMAGE, BETTER IMAGE, BEST IMAGE]

progressive scan camera: A camera that transfers an entire image in the order of left-to-right, top-to-bottom, without the alternate line interlacing used in television standards. This is much more convenient for machine vision and other computer-based applications. [ WP:Digital video#Technical overview]

projection: 1) The transformation of a geometric structure from one space to another, e.g., the projection of a 3D point onto the nearest point in a given plane. The projection may be specified by a linear function, i.e., for all points ~p in the initial structure, the points ~p′ in the projected structure are given by ~p′ = M~p for some matrix M. Alternatively, the projection need not be linear, e.g., ~p′ = f(~p). 2) The specific case of projection of a scene that creates an image on a plane by use of, for example, a perspective camera, according to the rules of perspective. [ VSN:2.1]

projection matrix: The matrix transforming the homogeneous projective coordinates of a 3D scene point (x, y, z, 1) into the pixel coordinates (u, v, 1) of the point's image in a pinhole camera. It can be factored as the product of the two matrices of the intrinsic camera parameters and extrinsic camera parameters. See also camera coordinates, image coordinates, scene coordinates. [ FP:2.2-2.3]

projective geometry: A field of geometry dealing with projective spaces and their properties. A projective geometry is one where only properties preserved by projective transformations are defined. Projective geometry provides a convenient and elegant theory to model the geometry of the common perspective camera. Most notably, the perspective projection equations become linear. [ FP:13.1]

projective invariant: A property, say I, that is not affected by a projective transformation. More specifically, assume an invariant, I(~P), of a geometric structure described by a parameter vector ~P. When the structure is subject to a projective transformation M, this gives a structure with parameter vector ~p, and I(~P) = I(~p). The most fundamental projective invariant is the cross ratio. In some applications, invariants of weight w occur, which transform as I(~p) = I(~P)(det M)^w. [ TV:10.3.2]

projective plane: A plane, usually denoted by P², on which a projective geometry is defined. [ TV:A.4]

projective reconstruction: The problem of reconstructing the geometry of a scene from a set or sequence of images in a projective space. The transformation from projective to Euclidean coordinates is easy if the Euclidean coordinates of the five points in a projective basis are known. See also projective geometry and projective stereo vision. [ WP:Fundamental matrix (computer vision)#Projective Reconstruction]

projective space: A space of (n + 1)-dimensional vectors, usually denoted by P^n, on which a

projective geometry is defined. [ FP:13.1.1]

projective stereo vision: A class of stereo algorithms based on projective geometry. Key concepts expressed elegantly by the projective framework are epipolar geometry, fundamental matrix, and projective reconstruction.

projective stratum: A layer in the stratification of 3D geometries. Moving from the simplest to the most complex, we have the projective, affine, metric and Euclidean strata. See also projective geometry, projective reconstruction.

projective transformation: Also known as projectivity, from one projective plane to another. It can be represented by a non-singular 3 × 3 matrix acting on homogeneous coordinates. The transformation has eight degrees of freedom, as only the ratio of projective coordinates is significant. [ FP:2.1.2]

property based matching: The process of comparing two entities (e.g., image features or patterns) using their properties, e.g., the moments of a region. See also classification, boundary property, metric property.

property learning: A class of algorithms aiming at learning and characterizing attributes of spatio-temporal patterns. For example, learning the color and texture distributions that differentiate between normal and cancerous cells. See also boundary property, metric property, unsupervised learning and supervised learning. [ WP:Supervised learning]

prototype: An object or model serving as representative example for a class, capturing the defining characteristics of the class. [ WP:Prototype]

proximity matrix: A matrix M occurring in cluster analysis. M(i, j) denotes the distance (e.g., the Hamming distance) between clusters i and j.

pseudocolor: A way of assigning a color to pixels that is based on an interpretation of the data rather than the original scene color. The usual purpose of pseudocoloring is to label image pixels in a useful manner. For example, one common pseudocoloring assigns different colors according to the local surface shape class. A pseudocoloring scheme for aerial or satellite images of the earth assigns colors according to the land type, such as water, forest, wheat field, etc. [ JKS:7.7]

PSF: See point spread function. [ FP:7.2.2]

purposive vision: An area of computer vision linking perception with purposive action; that is, modifying the position or parameters of an imaging system purposively, so that a visual task is facilitated or made possible. Examples include changing the lens parameters so as to obtain information about depth, as in depth from defocus, or moving around an object to achieve full shape information.

pyramid: A representation of an image including information at several spatial scales. The pyramid is constructed from the original image (maximum resolution) and a scale operator that reduces the content of the image (e.g., a Gaussian filter) by discarding details at coarser scales:

[Figure: a three-level image pyramid with 256×256, 128×128 and 64×64 levels]

Applying the operator and subsampling the resulting image leads to the next (lower-resolution) level of the pyramid. See also scale space, image pyramid, Gaussian pyramid, Laplacian pyramid, pyramid transform. [ JKS:3.3.2]

pyramid architecture: A computer architecture supporting pyramid-based processing, typically occurring in the context of multi-scale processing. See also scale space, pyramid, image pyramid, Laplacian pyramid, Gaussian pyramid. [ JKS:3.3.2]

pyramid transform: An operator for building a pyramid from an image. See pyramid, image pyramid, Laplacian pyramid, Gaussian pyramid. [ JKS:3.3.2]
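As a minimal sketch of the smooth-and-subsample construction just described (illustrative only, using a Gaussian filter as the scale operator):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def gaussian_pyramid(image, levels=4, sigma=1.0):
        # Repeatedly smooth (the scale operator) and subsample by 2.
        pyramid = [image]
        for _ in range(levels - 1):
            smoothed = gaussian_filter(pyramid[-1], sigma)
            pyramid.append(smoothed[::2, ::2])
        return pyramid

    levels = gaussian_pyramid(np.random.rand(256, 256))
    print([l.shape for l in levels])   # (256,256), (128,128), (64,64), (32,32)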
Q

QBIC: See query by image content. [ WP:Content-based image retrieval#Other query methods]

quadratic variation: 1) Any function (here, expressing a variation of some variables) that can be modeled by a quadratic polynomial. 2) The specific measure of surface shape deformation (f_xx)² + 2(f_xy)² + (f_yy)² of a surface f(x, y), where the subscripts denote second partial derivatives. This measure has been used to constrain the smoothness of reconstructed surfaces. [ BKPH:8.2]

quadrature mirror filter: A class of filters occurring in wavelet and image compression filtering theory. The filter splits a signal into a high pass component and a low pass component, with the low pass component's transfer function a mirror image of that of the high pass component. [ WP:Quadrature mirror filter]

quadric: A surface defined by a second-order polynomial. See also conic. [ FP:2.1.1]

quadric patch: A quadric surface defined over a finite region of the independent variables or parameters; for instance, in range image analysis, a part of a range surface that is well approximated by a quadric (e.g., an elliptical patch). [ WP:Quadric]

quadric patch extraction: A class of algorithms aiming to identify the portions of a surface that are well approximated by quadric patches. Techniques are similar to those applied for conic fitting. See also surface fitting, least square surface fitting.

quadrifocal tensor: An algebraic constraint imposed on quadruples of corresponding points by the geometry

of four simultaneous views, analogous


to the epipolar constraint for the
two-camera case and to the
trifocal tensor for the three-camera
case. See also stereo correspondence,
epipolar geometry . [ FP:10.3]

quadrilinear constraint: The


geometric constraint on four views of a
point (i.e., the intersection of four
epipolar lines ). See also
epipolar constraint and
trilinear constraint . [ FP:10.3]

quadtree: A hierarchical structure


representing 2D image regions, in which
each node represents a region, and the
whole image is the root of the tree.
Each non-leaf node, representing a
region R, has four children, that
represent the four subregions into which
R is divided, as illustrated below.
Hierarchical subdivision continues until
the remaining regions have constant
properties. Quadtrees can be used to
create a compressed image structure.
The 3D extension of a quadtree is the
octree . [ SQ:5.9.1]
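A rough recursive sketch of the subdivision (illustrative only; assumes a square image with power-of-two side):

    import numpy as np

    def quadtree(img, x=0, y=0, size=None):
        # Returns a leaf value if the region is constant, else four children.
        if size is None:
            size = img.shape[0]
        region = img[y:y+size, x:x+size]
        if (region == region.flat[0]).all():
            return region.flat[0]          # leaf: constant subregion
        h = size // 2
        return [quadtree(img, x, y, h),    quadtree(img, x+h, y, h),
                quadtree(img, x, y+h, h),  quadtree(img, x+h, y+h, h)]

    img = np.zeros((8, 8), int)
    img[:4, :4] = 1
    print(quadtree(img))   # [1, 0, 0, 0]: one constant quadrant per child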

qualitative vision: A paradigm based


on the idea that many perceptual tasks
could be better accomplished by
computing only qualitative descriptions
of objects and scenes from images, as
opposed to quantitative information
like accurate measurements. Suggested
in the framework of computational
theories of human vision. [ VSN:10]

quantization: See
spatial quantization . [ SEU:2.2.4]
quantization error: The approximation error created by the quantization of a continuous variable, typically using a regularly spaced scale of values. This figure shows a continuous function (dashed) and its quantized version (solid line) using six values only; the quantization error is the vertical distance between the two curves.

[Figure: a continuous function (dashed) and its six-level quantized version (solid)]

For instance, the intensity values in a digital image can only take on a certain number (often 256) of discrete values. See also sampling theorem and Nyquist sampling rate. [ SQ:4.2.1]

quantization noise: See quantization error. [ SQ:4.2.1]

quasi-invariant: An approximation of an invariant. For instance, quasi-invariant parameterizations of image curves have been built by approximating the invariant arc length with lower spatial derivatives. [ WP:Quasi-invariant measure]

quaternion: A forerunner of the modern vector concept, invented by Hamilton, used in vision to represent rotations. Any rotation matrix, R, can be parameterized by a vector of four numbers, ~q = (q0, q1, q2, q3), such that q0² + q1² + q2² + q3² = 1, that define uniquely the rotation. A rotation has two representations, ~q and −~q. See rotation matrix for alternative representations of rotations. [ FP:21.3.1]
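A sketch of the standard unit-quaternion-to-rotation-matrix conversion (illustrative only; scalar part first):

    import numpy as np

    def quat_to_rot(q):
        # Unit quaternion q = (q0, q1, q2, q3); normalized for safety.
        q0, q1, q2, q3 = q / np.linalg.norm(q)
        return np.array([
            [1-2*(q2*q2+q3*q3), 2*(q1*q2-q0*q3),   2*(q1*q3+q0*q2)],
            [2*(q1*q2+q0*q3),   1-2*(q1*q1+q3*q3), 2*(q2*q3-q0*q1)],
            [2*(q1*q3-q0*q2),   2*(q2*q3+q0*q1),   1-2*(q1*q1+q2*q2)]])

    q = np.array([np.cos(0.25), 0.0, 0.0, np.sin(0.25)])
    # q and -q give the same matrix (the two representations noted above):
    assert np.allclose(quat_to_rot(q), quat_to_rot(-q))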
query by image content (QBIC): A class of techniques for selecting members from a database of images by using examples of the desired image content (as opposed to textual search). Examples of contents include color, shape, and texture. See also image database indexing. [ WP:Content-based image retrieval#Other query methods]
R

R-S curve: A contour representation giving the distance, r, of each point of the contour from an origin chosen arbitrarily, as a function of the arc length, s. Allows rotation-invariant comparison of contours. See also contour, shape representation.

[Figure: a closed contour with arc length s measured from s = 0 and radius r(s) from the origin]

radar: An active sensor detecting the presence of distant objects. A narrow beam of very high-frequency radio pulses is transmitted and reflected by a target back to the transmitter. The direction of the reflected beam and the time of flight of the pulse determine the target's position. See also time-of-flight range sensor. [ TV:2.5.2]
radial lens distortion: A type of geometric distortion introduced by a real lens. The effect is to shift the position of each image point, p, away from its true position, along the line through the image center and p. See also lens, lens distortion, barrel distortion, tangential distortion, pin cushion distortion, distortion coefficient. This figure shows the typical deformations of a square (exaggerated) [ FP:3.3]:

[Figure: barrel and pin cushion deformations of a square]
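For illustration, a sketch of one common polynomial model of this shift (an assumption here; the entry itself does not fix a specific formula):

    import numpy as np

    def radial_distort(x, y, k1, k2=0.0, cx=0.0, cy=0.0):
        # Shift each point along the line through the image center (cx, cy):
        # x_d = c + (x - c) * (1 + k1*r^2 + k2*r^4), with r the center distance.
        dx, dy = x - cx, y - cy
        r2 = dx*dx + dy*dy
        f = 1 + k1*r2 + k2*r2*r2
        return cx + f*dx, cy + f*dy

    # k1 < 0 pulls corners inward (barrel); k1 > 0 pushes them out (pin cushion).
    print(radial_distort(1.0, 1.0, k1=-0.1))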

radiance: The amount of light (radiating energy) leaving a surface. The light can be generated by the surface itself, as in a light source, or reflected by it. The surface can be real (e.g., a wall) or imaginary (e.g., an infinite plane). See also irradiance, radiometry. [ FP:4.1.3]

radiance map: A map of radiance for a scene. Sometimes used to refer to a high dynamic range image. [ FP:4.1.3]

radiant flux: The radiant energy per time unit, that is, the amount of energy transmitted or absorbed per time unit. See also radiance, irradiance, radiometry. [ EH:3.3.1]

radiant intensity: See radiant flux. [ EH:3.3.1]

radiometric calibration: A process seeking to estimate radiance from pixel values. The rationale for radiometric calibration is that the light entering a real camera (the radiance) is, in general, altered by the camera itself. A simple calibration model is E(i, j) = g(i, j)I + o(i, j), where, for each pixel (i, j), E is the radiance to estimate, I the measured intensity, and g and o a pixel-specific gain and offset to be calibrated. Ground truth values for E can be measured using photometers. [ WP:Radiometric calibration]

radiometry: The measurement of optical radiation, i.e., electromagnetic radiation between 3 × 10¹¹ and 3 × 10¹⁶ Hz (wavelengths between 0.01 and 1000 μm). This includes ultraviolet, visible and infrared. Common units encountered are watts/(m² · steradian) and photons/(sec · steradian). Compare with photometry, which is the measurement of visible light. [ FP:4.1]

radius vector function: A contour or boundary representation based about a point ~c in the center of the figure (usually the center of gravity or a physically meaningful point). The representation then records the distance r(θ) from ~c to points on the boundary, as a function of θ, which is the angle between the direction and some reference direction. The representation has problems when the vector at angle θ intersects the boundary more than one time. See:

[Figure: a closed boundary with radius r(θ) drawn from the central point ~c]

Radon transform: A transformation mapping an image into a parameter space highlighting the presence of lines. It can be regarded as an extension of the Hough transform. One definition is

g(ρ, θ) = ∫∫ I(x, y) δ(ρ − x cos θ − y sin θ) dx dy

where I(x, y) is the image (gray values) and ρ = x cos θ + y sin θ is a parametric line in the image. Lines are identified by peaks in the ρ, θ space. See also Hough transform line finder. [ AJ:10.2]
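A rough discrete sketch of this definition (illustrative only): each projection is obtained by rotating the image and summing along columns, so each column sum integrates the image along one line at that orientation:

    import numpy as np
    from scipy.ndimage import rotate

    def radon(image, angles):
        # One row of the sinogram per angle (angles in radians).
        return np.stack([rotate(image, np.degrees(t), reshape=False).sum(axis=0)
                         for t in angles])

    img = np.zeros((64, 64))
    img[32, 10:54] = 1.0                                  # one bright line
    sinogram = radon(img, np.linspace(0, np.pi, 90, endpoint=False))
    print(np.unravel_index(sinogram.argmax(), sinogram.shape))  # peak = the line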
RAG: See region adjacency graph. [ JKS:3.3.4]

random access camera: A random access camera is characterized by the possibility of accessing any image location directly. The name was introduced to distinguish such cameras from sequential scan cameras, where image values are transmitted in a standard order.

random dot stereogram: A stereo pair formed by one random dot image

(that is, a binary image in which each pixel is assigned to black or white at random), and a second image that is derived from the first. This figure shows an example, in which a central square is shifted horizontally. Looking cross-eyed at close distance, you should perceive a strong 3D effect. See also stereo and stereo vision. [ VSN:7.1]

[Figure: a random dot stereogram pair]
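A minimal NumPy sketch of the construction just described (illustrative only; sizes and disparity are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    left = rng.integers(0, 2, (128, 128))        # random black/white dots
    right = left.copy()
    d = 4                                        # disparity of the square
    # Shift a central square horizontally to create the depth signal:
    right[44:84, 44:84] = left[44:84, 44 + d:84 + d]
    # Refill the strip uncovered by the shift with fresh random dots:
    right[44:84, 84 - d:84] = rng.integers(0, 2, (40, d))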
random sample consensus: See RANSAC. [ FP:15.5.2]

random variable: A scalar or a vector variable that takes on a random value. The set of possible values may be describable by a standard distribution, such as the Gaussian, mixture of Gaussians, uniform, or Poisson distributions. [ VSN:A2.2]

randomized Hough transform: A variation of the standard Hough transform designed to produce higher accuracy with less computational effort. The line-finding variant of the algorithm selects pairs of image edge points randomly and increments the accumulator cell corresponding to the line through these two points. The selection process is repeated a fixed number of times. [ WP:Randomized Hough Transform]

range compression: Reducing the dynamic range of an image to enhance the appearance of the image. This is often needed for images resulting from the magnitude of the Fourier transform, which might have pixels with both large and very low values. Without range compression it will be hard to see the structure in the pixels with the low values. The left image shows the magnitude of a 2D Fourier transform with a single bright spot in the middle. The right image shows the logarithm of the left image, revealing more details. [ AJ:7.2]

[Figure: a Fourier magnitude image (left) and its log-compressed version (right)]

range data: A representation of the spatial distribution of a set of 3D points. The data is often acquired by stereo vision or by a range sensor. In computer vision, range data are often represented as a cloud of points, i.e., a set of triplets representing the X, Y, Z coordinates of each point, or as range images, also known as moiré patches. The figure below shows a range image of an industrial part, where brighter pixels are closer [ TV:2.5]:

[Figure: range image of an industrial part; brighter pixels are closer]

range data fusion: The merging of multiple sets of range data, especially for the purpose of 1) extending the portion of an object's surface described by the range data, or 2) increasing the

accuracy of measurements by exploiting the redundancy of multiple measures available for each point of surface area. See also information fusion, fusion, sensor fusion.

range data integration: See range data fusion.

range data registration: See registration. [ FP:21.3]

range data segmentation: A class of techniques partitioning range data into a set of regions. For instance, a well-known method for segmenting range images is HK segmentation, which produces a set of surface patches covering the initial surface. The right image shows the plane, cylinder and spherical patches extracted from the left range image. See also surface segmentation. [ RN:9.3]

[Figure: a range image (left) and its plane, cylinder and sphere patches (right)]

range edge: See surface shape discontinuity.

range flow: A class of algorithms for the measurement of motion in time-varying range data, made possible by the evolution of fast range sensors. See also optical flow.

range image: A representation of range data as an image. The pixel coordinates are related to the spatial position of each point on the range surface, and the pixel value represents the distance of the surface point from the sensor (or from an arbitrary, fixed background). The figure below shows a range image of a face, where darker pixels are closer [ JKS:11.4]:

[Figure: range image of a face; darker pixels are closer]

range image edge detector: An edge detector working on range images. Typically, edges occur where depths or surface normal directions ( fold edge ) change rapidly. See also edge detection, range images. The right image shows the depth and fold edges extracted from the left range image:

[Figure: a range image (left) and its depth and fold edges (right)]

range sensor: Any sensor acquiring range data. The most popular range sensors in computer vision are based on optical and acoustic technologies. A laser range sensor often uses structured light triangulation. A time-of-flight range sensor measures the round-trip time of an acoustic or optical pulse. See also depth estimation. An example of a triangulation range sensor is [ TV:2.5.2]:

[Figure: triangulation range sensor with laser projector, laser stripe, stripe image, camera and object to scan]

rank order filtering: A class of filters the output of which depends on

an ordering (ranking) of the pixels within the region of support. The classic example is the median filter, which selects the middle value of the set of input values. More generally, the filter selects the kth largest value in the input set. [ SB:4.4.3]
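A hand-rolled sketch of the generic rank filter (illustrative only; libraries such as SciPy provide faster equivalents):

    import numpy as np

    def rank_filter(image, k, size=3):
        # Select the k-th smallest value in each size x size neighborhood.
        h, w = image.shape
        r = size // 2
        padded = np.pad(image, r, mode='edge')
        out = np.empty_like(image)
        for i in range(h):
            for j in range(w):
                window = padded[i:i+size, j:j+size].ravel()
                out[i, j] = np.sort(window)[k]
        return out

    img = np.random.rand(16, 16)
    median = rank_filter(img, k=4)   # k = 4 of 0..8 is the median for 3x3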
RANSAC: Acronym for random sample consensus, a robust estimator seeking to counter the effect of outliers in data used, for example, in a least square estimation problem. In essence, RANSAC considers a number of data subsets of the minimum size necessary to solve the problem (say a parametric surface fit), then looks for statistical agreement of the results. See also least median square estimation, M-estimation, outlier rejection. [ FP:15.5.2]
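A rough line-fitting sketch of the idea (illustrative only; thresholds and iteration counts are arbitrary): repeatedly fit to a minimal subset (two points) and keep the hypothesis with the most agreeing inliers.

    import numpy as np

    def ransac_line(pts, iters=200, tol=0.02, rng=np.random.default_rng(1)):
        best, best_inliers = None, 0
        for _ in range(iters):
            p, q = pts[rng.choice(len(pts), 2, replace=False)]
            d = q - p
            n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal
            dist = np.abs((pts - p) @ n)                     # point-line distance
            inliers = np.sum(dist < tol)
            if inliers > best_inliers:
                best, best_inliers = (p, n), inliers
        return best, best_inliers

    t = np.linspace(0, 1, 50)
    line_pts = np.stack([t, 2*t + 0.5], axis=1)
    pts = np.vstack([line_pts, np.random.default_rng(2).uniform(0, 2, (20, 2))])
    (p, n), count = ransac_line(pts)
    print(count)   # roughly the 50 points on the line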
raster scan: Raster refers to the region of a monitor, e.g., a cathode ray tube (CRT) or a liquid crystal display (LCD), capable of rendering images. In a CRT, the raster is a sequence of horizontal lines that are scanned rapidly with an electron beam from left to right and top to bottom, largely in the same way as a TV picture tube is scanned. In an LCD, the raster (usually called a grid) covers the whole device area and is scanned differently, in that image elements are displayed individually. [ AL:4.2]

rate-distortion: A statistical method useful in analog-to-digital conversion. It determines the minimum number of bits required to encode data while tolerating a given level of distortion, or vice versa. [ AJ:2.13]

rate-distortion function: The number of bits per sample (the rate Rd) to encode an analog image (or other signal) value given the allowable distortion D (or mean square of the error). Also needed is the variance σ² of the input value (assuming it is a Gaussian random variable). Then Rd = max(0, (1/2) log2(σ²/D)). [ AJ:2.13]

raw primal sketch: The first representation built in the perception process according to Marr's theory of vision, heavily based on detection of local edge features. It represents the location, orientation, contrast and scale of center-surround, edge, bar and truncated bar features. See also primal sketch.

RBC: See recognition by components. [ WP:Recognition by Components Theory]

real time processing: Any computation performed within the time limits imposed by a given process. For example, in visual servoing a tracking system feeds positional data to a control algorithm generating control signals; if the control signals are generated too slowly, the whole system may become unstable. Different processes can impose very different constraints for real time processing. When processing video-stream data, real time means complete processing of one frame of data in the time before the next frame is acquired (possibly with several frames lag time as in a pipeline parallel process). [ WP:Real-time computing]

receiver operating curves and performance analysis for vision: A receiver operating curve (ROC) is a diagram showing the performance of a classifier. It plots the number or percentage of true positives against the number or percentage of false positives. Performance analysis is a substantial topic in computer vision and the object of an ongoing debate. See also performance characterization, test, classification. [ FP:22.2.1]

receptive field: 1) The retinal area generating the response to a photostimulus. The main cells responsible for visual perception in the retina are the rods and the cones, active in high- and low-intensity situations respectively. See also photopic response. 2) The region of visual space giving rise to that response. 3) The region of an image that is input to the calculation of each output value. (See region of support.) [ FP:1.3]

recognition: See identification. [ TV:10.1]

recognition by components (RBC): 1) A theory of human image understanding devised by Biederman. The foundation is a set of 3D shape primitives called geons, reminiscent of Marr's generalized cones. Different combinations of geons yield a large variety of 3D shapes, including articulated objects. 2) The recognition of a complex object by recognizing subcomponents and then combining these to recognize more complex objects. See also hierarchical matching, shape representation, model based recognition, object recognition. [ WP:Recognition by Components Theory]

recognition by parts: See recognition by components. [ WP:Object recognition (computer vision)#Recognition by parts]

recognition by structural decomposition: See recognition by components.

reconstruction: The problem of computing the shape of a 3D object or surface from one or more intensity or range images. Typical techniques include model acquisition and the many shape from X methods reported (see shape from contour and following entries). [ TV:7.4]

reconstruction error: Inaccuracies in a model when compared to reality. These can be caused by inaccurate sensing or compression. (See lossy compression.) [ WP:Constructivism (learning theory)#Pedagogies based on constructivism]

rectification: A technique warping two images into some form of geometric alignment, e.g., so that the vertical pixel coordinates of corresponding points are equal. See also stereo image rectification. This figure shows a stereo pair (top row) and its rectified version (bottom row), highlighting some of the corresponding scanlines, where corresponding image features lie [ JKS:12.5]:

[Figure: a stereo pair (top) and its rectified version (bottom) with corresponding scanlines marked]

recursive region growing: A class of recursive algorithms for region growing. An initial pixel is chosen. Given an adjacency rule to determine the neighbors of a pixel (e.g., 8-adjacency), the neighboring pixels are explored. If any meets the criteria for addition to the region, the growing procedure is called recursively on that pixel. The process continues until all connected image pixels have been examined. See also adjacent, image connectedness, neighborhood, recursive splitting. [ SQ:8.3.1]
recursive splitting: A class of recursive algorithms for region segmentation, dividing an image into a region set. The region set is initialized to the whole image. A homogeneity criterion is then applied; if not satisfied, the image is split according to a given scheme (e.g., into four sub-images, as in a quadtree), leading to a new region set. The procedure is applied recursively to all regions in the new region set, until all remaining regions are homogeneous. See also region segmentation, region based segmentation, recursive region growing. [ RN:8.1.1]

reference frame transformation: See coordinate system transformation. [ WP:Rotating reference frame]

reference image: An image of a known scene or of a scene at a particular time used for comparison with a current image. See, for example, change detection.

reference views: In iconic recognition, the views chosen as most representative for a 3D object. See also eigenspace based recognition, characteristic view.

reference white: A sample image value which corresponds to a known white object. The knowledge of such a value facilitates white balance corrections. [ WP:White point]

reflectance: The ratio of reflected to incident flux, in other words the ratio of reflected to incident (light) power. See also bidirectional reflectance distribution function. [ JKS:9.1.2]

reflectance estimation: A class of techniques for estimating the bidirectional reflectance distribution function (BRDF). Used notably within the techniques for shape from shading and image based rendering, which seeks to render arbitrary images of scenes from video material only. All information about geometry and photometry (e.g., the BRDF) is derived from video. See also physics based vision. [ FP:4.2.2]

reflectance map: The reflectance map expresses the reflectance of a material in terms of a viewer-centered representation of local surface orientation. The most commonly used is the Lambertian reflectance map, based on Lambert's law. See also shape from shading, photometric stereo. [ JKS:9.3]

reflectance ratio: A photometric invariant used for segmentation and recognition. It is based on the observation that the illumination on both sides of a reflectance or color edge is nearly the same. So, although we cannot factor out the reflectance and illumination from only the observed lightness, the ratio of the lightnesses on both sides of the edge equals the ratio of the reflectances, independent of illumination. Thus the ratio is invariant to illumination and local surface geometry for a significant class of reflectance maps. See also invariant, physics based vision.

reflection: 1) A mathematical transformation where the output image is the input image flipped over about a given transformation line in the image plane. See reflection operator. 2) An optics phenomenon whereby all light incident on a surface is deflected
away, without absorption, diffusion or scattering. An ideal mirror is the perfect reflecting surface. Given a single ray of light incident on a reflecting surface, the angle of incidence equals the angle of reflection, as shown below. See also specular reflection. [ WP:Reflection (physics)]

[Figure: incident and reflected rays making equal angles with the surface]

reflection operator: A linear transformation intuitively changing each vector or point of a given space to its mirror image, as shown below. The transformation's corresponding matrix, say H, has the property HH = I, i.e., H⁻¹ = H: a reflection matrix is its own inverse. See also rotation.

[Figure: a shape and its mirror image under reflection]

refraction: An optical phenomenon whereby a ray of light is deflected while passing through different optic mediums, e.g., from air to water. The amount of deflection is governed by the difference between the refraction indices of the two mediums, according to Snell's law: n1 sin(θ1) = n2 sin(θ2), where n1 and n2 are the refraction indices of the two media, and θ1, θ2 the respective angles of incidence and refraction [ EH:4.1]:

[Figure: an incident ray in medium 1 and the refracted ray in medium 2]

region: A connected part of an image, usually homogeneous with respect to a given criterion. [ BB:5.1]

region adjacency graph (RAG): A graph expressing the adjacency relations among image regions, for instance generated by a segmentation algorithm. See also region segmentation and region based segmentation. The adjacency relations of the regions in the left figure are encoded in the RAG at the right [ JKS:3.3.4]:

[Figure: a segmented image with regions A-D (left) and its region adjacency graph (right)]

region based segmentation: A class of segmentation techniques producing a number of image regions, typically on the basis of a given homogeneity criterion. For instance, intensity image regions can be homogeneous by color (see color image segmentation) or texture properties (see texture field segmentation); range image regions can be homogeneous by shape or curvature properties (see HK segmentation). [ JKS:3.2]
region boundary extraction: The problem of computing the boundary of a region, for example, the contour of a region in an intensity image after color based segmentation.

region decomposition: A class of algorithms aiming to partition an image or region thereof into regions. See also region based segmentation. [ JKS:3.2]

region descriptor: 1) One or more properties of a region, such as compactness or moments. 2) The data structure containing all data pertaining to a region. For instance, for image regions this could include the region's position in the image (e.g., the coordinates of the center of mass), the region's contour (e.g., a list of 2D coordinates), some indicator of the region's shape (e.g., compactness or perimeter squared over area), and the value of the region's homogeneity index. [ NA:7.3]

region detection: A vast class of algorithms seeking to partition an image into regions with particular properties. See for details region identification, region labeling, region matching, region based segmentation. [ SOS:4.3.2]

region filling: A class of algorithms assigning a given value to all the pixels in the interior of a closed contour identifying a region. For instance, one may want to fill the interior of a closed contour in a binary image with zeros or ones. See also morphology, mathematical morphology, binary mathematical morphology. [ SOS:4.3.2]

region growing: A class of algorithms that construct a connected region by incrementally expanding the region, usually at the boundary. New data are merged into the region when the data are consistent with the previous region. The region is often redescribed after each new set of data is added to it. Many region growing algorithms have the form: 1) Describe the region based on the current pixels that belong to the region (e.g., fit a linear model to the intensity distribution). 2) Find all pixels adjacent to the current region. 3) Add an adjacent pixel to the region if the region description also describes this pixel (e.g., it has a similar intensity). 4) Return to step 1 as long as new pixels continue to be added. A similar algorithm exists for region growing with 3D points, giving a surface fitting. The data points could come from a regular grid (pixel or voxel) or from an unstructured list. In the latter case, it is harder to determine adjacency. [ JKS:3.5]
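A rough sketch of the four steps above for intensity images (illustrative only; here the region description is simply its running mean, and 4-adjacency is assumed):

    import numpy as np
    from collections import deque

    def grow_region(image, seed, thresh=10.0):
        # A pixel joins the region if it is close to the region's mean intensity.
        h, w = image.shape
        mask = np.zeros((h, w), bool)
        mask[seed] = True
        total, count = float(image[seed]), 1
        frontier = deque([seed])
        while frontier:
            i, j = frontier.popleft()
            for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if 0 <= ni < h and 0 <= nj < w and not mask[ni, nj] \
                        and abs(image[ni, nj] - total / count) < thresh:
                    mask[ni, nj] = True
                    total += float(image[ni, nj])
                    count += 1
                    frontier.append((ni, nj))
        return mask

    img = np.full((32, 32), 100.0)
    img[8:24, 8:24] = 160.0
    print(grow_region(img, (16, 16)).sum())   # 256 = the 16x16 bright block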
region identification: A class of algorithms seeking to identify regions with special properties, for instance, a human figure in a surveillance video, or road vehicles in an aerial sequence. Region identification covers a very wide area of techniques spanning many applications, including remote sensing, visual surveillance, surveillance, and agricultural and forestry surveying. See also target recognition, automatic target recognition (ATR), binary object recognition, object recognition, pattern recognition.

region invariant: 1) A property of a region that is invariant (does not change) after some transformation is applied to the region, such as translation, rotation or perspective projection. 2) A property or function which is invariant over a region.

region labeling: A class of algorithms which are used to assign a label or meaning to each image region in a given image segmentation to achieve an appropriate image interpretation. Representative techniques are relaxation labeling, probabilistic relaxation labeling, and interpretation trees (see interpretation tree search). See also labeling problem.

region matching: 1) Establishing the correspondences between matching members of two sets of regions. 2) The degree of similarity between two regions, i.e., solving the matching problem for regions. See, for instance, template matching, color matching, color histogram matching.

region merging: A class of algorithms fusing two image regions into one if a given homogeneity criterion is satisfied. See also region, region based segmentation, region splitting. [ RJS:6]

region of interest: A subregion of an image where processing is to occur. Regions of interest may be used to: 1) reduce the amount of computation that is required or 2) focus processing so that image data outside the region do not distract from or distort results. As an example, when tracking a target through an image sequence, most algorithms for locating the target in the next video frame only consider image data from a region of interest surrounding the predicted target position. The figure shows a boxed region of interest: [ WP:Region of interest]

[Figure: an image with a boxed region of interest]

region of support: The subregion of an image that is used in a particular computation. For example, an edge detector usually only uses a subregion of pixels neighboring the pixel currently being considered for being an edge. [ KMJ:5.4.1-5.4.2]

region neighborhood graph: See region adjacency graph. [ JKS:3.3.4]

region propagation: The problem of tracking moving image regions.

region representation: A class of methods to represent the defining characteristics of an image region. For encoding the shapes, see axial representation, convex hull, graph model, quadtree, run-length coding, skeletonization. For encoding a region by its properties, see moments, curvature scale space, Fourier shape descriptor, wavelet descriptor, shape representation. [ JKS:3.3]

region segmentation: See region based segmentation. [ JKS:3.2]

region snake: A snake representing the boundary of some region. The operation of computing the snake may be used as a region segmentation technique.

region splitting: A class of algorithms dividing an image, or a
region thereof, into parts (subregions) if a given homogeneity criterion is not satisfied over the region. See also region, region based segmentation, region merging. [ RJS:6]

registration: A class of techniques aiming to align, superimpose, or match two objects of the same kind (e.g., images, curves, models); more specifically, to compute the geometric transformation superimposing one to the other. For instance, image registration determines the region common to two images, thereby finding the planar transformation (rotation and translation) aligning them; similarly, curve registration determines the transformation aligning the similar (or same) part of two curves. This figure shows the registration (right) of the solid (left) and dashed (middle) curves.

[Figure: two curves (solid and dashed) and their registered superposition]

The transformation need not be rigid; non-rigid registration is common in medical imaging, for instance in digital subtraction angiography. Notice also that most often there is no exact solution, as the two objects are not exactly the same, and the best approximate solution must be found by least squares or more complex methods. See also Euclidean transformation, medical image registration, model registration, multi-image registration. [ FP:21.3]

regression: 1) In statistics, the relationship between one variable and another, as in linear regression. A particular case of curve and surface fitting. 2) Regression testing verifies that changes to the implementation of a system have not caused a loss of functionality, or regression to the state where that functionality did not exist. [ WP:Regression analysis]

regularization: A class of mathematical techniques to solve an ill-posed problem. In essence, to determine a single solution, one introduces the constraint that the solution must be smooth, in the intuitive sense that similar inputs must correspond to similar outputs. The problem is then cast as a variational problem, in which the variational integral depends both on the data and on the smoothness constraint. For instance, a regularization approach to the problem of estimating a function f from a set of values y1, y2, . . . , yn at the data points ~x1, . . . , ~xn leads to the minimization of the functional

H(f) = Σ_{i=1}^{N} (f(~xi) − yi)² + λ Φ(f),

where Φ(f) is the smoothness functional, and λ a positive parameter called the regularization number. [ JKS:13.7]

relational graph: A graph in which the arcs express relations between the properties of image entities (e.g., regions or other features) which are the nodes in the graph. For regions, for instance, commonly used properties are adjacency, inclusion, connectedness, and relative area size. See also region adjacency graph (RAG), shape representation. The adjacency relations of the regions in the left figure are encoded in the RAG at the right [ DH:12.2.2]:

[Figure: a segmented image with regions A-D (left) and the corresponding relational graph (right)]

relational matching: A class of matching algorithms based on
relational descriptors. See also relational graph . [ BB:11.2]

relational model: See relational graph . [ DH:12.2.2]

relational shape description: A class of shape representation techniques based on relations between the properties of image entities (e.g., regions or other features). For regions, for instance, commonly used properties are adjacency, inclusion, connectedness, and relative area size. See also relational graph , region adjacency graph .

relative depth: The difference in depth values (distance from some observer ) for two points. In certain situations, while it may not be possible to compute actual or absolute depth, it may be possible to compute relative depth.

relative motion: The motion of an object with respect to some other, possibly also moving, frame of reference (typically the observer's).

relative orientation: The problem of computing the orientation of an object with respect to another coordinate system, such as that of the sensor. More specifically, the rotation matrix aligning the reference frames attached to the object and second object. See also pose and pose estimation . [ JKS:12.4]

relaxation: A technique for assigning values from a continuous or discrete set to the nodes of a network or graph by propagating the effects of local constraints. The network can be an image grid, in which case the pixels are nodes, or features, for instance edges or regions. At each iteration, each node interacts with its neighbors, altering its value according to the local constraints. As the number of iterations increases, the effects of local constraints are propagated to farther and farther parts of the network. Convergence is achieved when no more changes occur, or changes become insignificant. See also discrete relaxation , relaxation labeling , probabilistic relaxation labeling . [ SQ:6.1]

relaxation labeling: A relaxation technique for assigning a label from a discrete set to each node of a network or graph. A well-known example, a classic in artificial intelligence, is Waltz's line labeling algorithm (see also line drawing analysis ). [ JKS:14.3]
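The sketch below shows one common variant of a probabilistic relaxation labeling update (assumed here for illustration; several update rules exist). Each node holds a probability over labels, and neighboring nodes reinforce compatible labels until the probabilities stop changing.

    import numpy as np

    def relaxation_labeling(p, compat, iters=50):
        # p: (n_nodes, n_labels) initial label probabilities, rows sum to 1.
        # compat: (n_labels, n_labels) compatibility between neighbor labels.
        # For simplicity of the sketch, every node is a neighbor of every other.
        p = p.copy()
        for _ in range(iters):
            total = p.sum(axis=0)
            # Support for label l at node i from the labels of all other nodes.
            support = (total[None, :] - p) @ compat.T
            p = p * support
            p /= p.sum(axis=1, keepdims=True)   # renormalize each node
        return p

    # Two labels; the compatibility matrix favors neighbors taking equal labels.
    compat = np.array([[1.0, 0.2],
                       [0.2, 1.0]])
    p0 = np.array([[0.9, 0.1],    # one confident node...
                   [0.5, 0.5],    # ...drags the ambiguous ones
                   [0.4, 0.6]])
    print(relaxation_labeling(p0, compat))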
relaxation matching: A relaxation labeling technique for model matching, the purpose of which is to label (match) each model primitive with a scene primitive. Starting from an initial labeling, the algorithm iteratively harmonizes neighboring labels using a coherence measure for the set of matches. See also discrete relaxation, relaxation labeling , probabilistic relaxation labeling .

relaxation segmentation: A class of segmentation techniques based on relaxation . See also image segmentation . [ BT:5]

remote sensing: The acquisition, analysis and understanding of imagery, mainly of the Earth's surface, acquired by airplanes or satellites. Used frequently in agricultural, forestry, meteorological and military applications. See also multi-spectral analysis , multi-spectral image , geographic information system (GIS). [ RJS:6]
representation: A description or model specifying the properties defining an object or class of objects. A classic example is shape representation, a group of techniques for describing the geometric shape of 2D and 3D objects. See also Koenderink's surface shape classification . Representations can be symbolic or non-symbolic (see symbolic object representation and non-symbolic representation ), a distinction inherited from artificial intelligence. [ WP:Representation (mathematics)]

resection: The computation of the position of a camera given the images of some known 3D points. Also known as camera calibration , or pose estimation . [ HZ:21.1]

resolution: The number of pixels per unit area, length, visual angle, etc. [ AL:p. 236]
restoration: Given a noisy sample of some true data, the goal of restoration is to recover the best possible estimate of the original true data, using only the noisy sample. [ TV:3.1.1]

reticle: The network of fine wires or receptors placed in the focal plane of an optical instrument for measuring the size or position of the objects under observation. [ WP:Reticle]

retinal image: The image which is formed on the retina of the human eye. [ VSN:1.2.2]

retinex: An image enhancement algorithm based on retinex theory, aimed at computing an illuminant-independent quantity called lightness at each image pixel. The key observation is that normal illumination on a surface changes slowly, leading to slow changes in the observed brightness of a surface. This contrasts with strong changes in brightness at reflectance and fold edges. The retinex algorithm removes the slowly varying components by exploiting the fact that the observed brightness B = L · I is the product of the lightness (or reflectance) L and the illumination I. By taking the logarithm of B at each pixel, the product of L and I becomes a sum of logarithms. Slow changes can be detected by differentiation and then removed by thresholding. Re-integration of the result produces the lightness image (up to an arbitrary scale factor). [ BKPH:9.3]
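The following is a drastically simplified, single-scale sketch in the spirit of the entry, not the full retinex algorithm: it works in the log domain and subtracts a slowly varying (smoothed) illumination estimate, omitting the thresholded-derivative and re-integration steps described above.

    import numpy as np

    def retinex_1d(brightness, sigma=15.0):
        # brightness: 1D positive signal (one scanline), modeled as B = L * I.
        logb = np.log(brightness + 1e-6)
        # Estimate the slowly varying illumination as a Gaussian smoothing
        # of log B, then remove it.
        radius = int(3 * sigma)
        t = np.arange(-radius, radius + 1)
        g = np.exp(-t ** 2 / (2 * sigma ** 2))
        g /= g.sum()
        log_illum = np.convolve(logb, g, mode="same")
        # Log lightness (reflectance), up to an arbitrary scale factor.
        return logb - log_illum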
reverse engineering: The problem of generating a model of a 3D object from a set of views, for instance a VRML or a triangulated model . The model can be purely geometric, that is, describing just the object's shape, or combine shape and textural properties. Techniques exist for reverse engineering from both range images and intensity images. See also geometric model , model acquisition . [ TV:4.6]

RGB: A format for color images, encoding the Red, Green, and Blue components of each pixel in separate channels. See also YUV , color image. [ FP:6.3.1]

ribbon: A shape representation for pipe-like planar objects whose contours are approximately parallel, e.g., roads in aerial imagery. See also generalized cones , shape representation. [ FP:24.2.2-24.2.3]

ridge: A particular type of discontinuity of the intensity function, giving rise to thick edges and lines. This figure
shows a characteristic dark-to-light-to-dark intensity ridge profile along a scanline (intensity plotted against pixel position). See also step edge , roof edge , edge detection . [ WP:Ridge detection]

ridge detection: A class of algorithms, especially edge and line detectors, for detecting ridges in images. [ WP:Ridge detection]

right-handed coordinate system: A 3D coordinate system with the XYZ axes arranged as follows: +X to the right, +Y up, and +Z out of the page. The alternative is a left-handed coordinate system . [ FP:2.1.1]

rigid body segmentation: The problem of partitioning automatically the image of an articulated or deformable body into a number of rigid subcomponents. See also part segmentation , recognition by components (RBC) .

rigid motion estimation: A class of techniques aiming to estimate the 3D motion of a rigid body or scene in space from a sequence of images by assuming that there are no changes in shape. Rigidity simplifies the problem significantly, so that changes in appearance arise solely from changes in relative position and projection. Techniques exist for using known 3D models , for estimating the motion of a general cloud of 3D points or of image feature points, or for estimating motion from optical flow . See also motion estimation , egomotion . [ BKPH:17.2]

rigid registration: Registration where neither the model nor the data is allowed to deform. This reduces registration to estimating the Euclidean transformation that aligns the model with the data. See also non-rigid registration .

rigidity constraint: The assumption that a scene or object under analysis is rigid, implying that all 3D points remain in the same relative position in space. This constraint can simplify significantly many algorithms, for instance shape reconstruction (see shape and the following shape from entries) and motion estimation . [ JKS:14.7]

road structure analysis: A class of techniques which are used to derive information about roads from images. These can be close-up images (e.g., images of the tarmac as acquired from a moving vehicle, to map defects automatically over extended distances) or remotely sensed images (e.g., to analyze the geographical structure of road networks).

Roberts cross gradient operator: An operator used for edge detection, computing an estimate of perpendicular components of the image gradient at each pixel. The image is convolved with the two Roberts kernels , yielding two
components, G_x and G_y, for each pixel. The gradient magnitude \sqrt{G_x^2 + G_y^2} and orientation \arctan(G_y / G_x) can then be estimated as for any 2D vector. See also edge detection , Canny edge detector , Sobel gradient operator , Sobel kernel , Deriche edge detector, Hueckel edge detector , Kirsch edge detector , Marr-Hildreth edge detector , O'Gorman edge detector , Robinson edge detector . [ JKS:5.2.1]

Roberts kernel: A pair of kernels, or masks, used to estimate perpendicular components of the image gradient within the Roberts cross gradient operator :

     0  1        1  0
    -1  0        0 -1

The masks respond maximally to edges oriented at plus or minus 45 degrees from the vertical axis of the image. [ JKS:5.2.1]
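A minimal sketch of the Roberts cross gradient operator described above, applying the two masks to every 2 x 2 neighborhood (a direct, unoptimized loop for clarity):

    import numpy as np

    # The two Roberts kernels (see the entry above).
    K1 = np.array([[0.0, 1.0], [-1.0, 0.0]])
    K2 = np.array([[1.0, 0.0], [0.0, -1.0]])

    def roberts_gradient(img):
        # img: 2D float array. Returns gradient magnitude and orientation
        # estimated from the responses of the two masks.
        h, w = img.shape
        gx = np.zeros((h - 1, w - 1))
        gy = np.zeros((h - 1, w - 1))
        for i in range(h - 1):
            for j in range(w - 1):
                patch = img[i:i + 2, j:j + 2]
                gx[i, j] = (patch * K1).sum()
                gy[i, j] = (patch * K2).sum()
        magnitude = np.hypot(gx, gy)
        orientation = np.arctan2(gy, gx)
        return magnitude, orientation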
Robinson edge detector: An operator for edge detection, computing an estimate of the directional first derivatives of the image in eight directions. The image is convolved with the eight kernels, three of which are shown here:

     1  1  1      1  1  1     -1  1  1
     1 -2  1     -1 -2  1     -1 -2  1
    -1 -1 -1     -1 -1  1     -1  1  1

Two of these, typically those responding maximally to differences along the coordinate axes, can be taken as estimates of the two components of the gradient, G_x and G_y. The gradient magnitude \sqrt{G_x^2 + G_y^2} and orientation \arctan(G_y / G_x) can then be estimated as for any 2D vector. See also edge detection , Roberts cross gradient operator , Sobel gradient operator , Sobel kernel , Canny edge detector , Deriche edge detector, Hueckel edge detector , Kirsch edge detector , Marr-Hildreth edge detector , O'Gorman edge detector . [ SEU:2.3.5]

robust: A general term referring to a technique which is insensitive to noise or other perturbations. [ FP:15.5]

robust estimator: A statistical estimator which, unlike normal least square estimators , is not distracted by even significant percentages of outliers in the data. Popular robust estimators in computer vision include RANSAC , least median of squares , and M-estimators . See also outlier rejection. [ FP:15.5]

robust regression: A form of regression that does not use outlier values in computing the fitting parameters. For example, if doing a least square straight line fit to a set of data, normal regression methods use all data points, which can give distorted results if even one point is very far away from the true line. Robust processes either eliminate these outlying points or reduce their contribution to the results. The figure below shows a rejected outlying point [ JKS:6.8.3]:

[Figure: a straight line fitted through the inliers, with a rejected outlier far from the line]
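As a sketch of one of the robust estimators named above, the following RANSAC-style line fit repeatedly hypothesizes a line from two random points, scores it by its inlier count, and then refits on the inliers only; the tolerance tol is an assumed, data-dependent parameter.

    import numpy as np

    def ransac_line(points, iters=200, tol=0.02):
        # points: (n, 2) array. Robustly fit y = a*x + b, ignoring outliers.
        best_inliers = None
        rng = np.random.default_rng(0)
        for _ in range(iters):
            i, j = rng.choice(len(points), size=2, replace=False)
            (x1, y1), (x2, y2) = points[i], points[j]
            if x1 == x2:
                continue
            a = (y2 - y1) / (x2 - x1)
            b = y1 - a * x1
            residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
            inliers = residuals < tol
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # Final least squares fit on the inliers only.
        x, y = points[best_inliers, 0], points[best_inliers, 1]
        return np.polyfit(x, y, 1)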
robust statistics: A general term describing statistical methods which are not significantly influenced by outliers . [ WP:Robust statistics]

robust technique: See robust estimator . [ FP:15.5]

ROC: See receiver operating characteristic . [ FP:22.2.1]

roll: A 3D rotation representation component (along with pitch and yaw ) often used for cameras or moving observers. The roll component specifies a rotation about the optical axis or line of sight. This figure shows the roll rotation direction [ JKS:12.2.1]:

[Figure: a camera with the roll direction drawn as a rotation about the optical axis]

roof edge: 1) An image edge where the values increase continuously to a maximum and then decrease continuously, such as the brightness values on a Lambertian cylinder when lit by a point light source , or an orientation discontinuity (or fold edge ) in a range image . 2) A scene edge where an orientation discontinuity occurs. The figure shows a horizontal roof edge in a range image [ JKS:5]:

[Figure: a range image surface folding at a horizontal roof edge]

rotating mask: A mask which is considered in a number of orientations relative to some pixel. See, for example, the masks used in the Robinson edge detector . Most commonly used as a type of average smoothing in which the most homogeneous mask is used to compute the smoothed value for every pixel. In the example, notice how although image detail has been reduced the major boundaries have not been smoothed.

[Figure: an image before and after rotating-mask smoothing; detail is reduced but major boundaries are preserved]
rotation: A circular motion of a set of points or an object around a given point (2D) or line (3D, called the axis of rotation ). [ JKS:12.2.1-12.2.2]

rotation estimation: The problem of estimating rotation from raw or processed image, video or range data, typically from two sets of corresponding points (or lines, planes, etc.) taken from rotated versions of a pattern. The problem usually appears in one of three forms: 1) estimating the 3D rotation from 3D data (three points are needed),
2) estimating the 3D rotation from 2D data (three points are needed but lead to multiple solutions), or 3) estimating the 2D rotation from 2D data (two points are needed). A second issue to consider is the effect of noise: typically more than the minimum number of points are needed to counteract the effects of noise, which leads to least square algorithms.

rotation invariant: A property that keeps the same value even if the data values, the camera, the image or the scene from which the data comes is rotated. One needs to distinguish between 2D (i.e., in the image) and 3D (i.e., in the scene) rotation invariance. For example, the angle between two image lines is invariant to image rotation, but not to rotation of the lines in the scene. [ WP:Rotational invariance]
rotation matrix: A linear operator rotating a vector in a given space. The inverse of a rotation matrix equals its transpose. A rotation matrix has only three degrees of freedom in 3D and one in 2D. In 3D space, there are three eigenvalues, namely 1, \cos\theta + i\sin\theta and \cos\theta - i\sin\theta, where i is the imaginary unit and \theta the rotation angle. A rotation matrix in 3D has nine entries but only three degrees of freedom, as it must satisfy six orthogonality constraints. It can be parameterized in various ways, usually through Euler angles , yaw - pitch - roll (rotation angles around the coordinate axes), axis-angle , etc. See also orientation estimation , rotation representation , quaternions. [ FP:2.1.2]

rotation operator: A linear operator expressed by a rotation matrix . [ JKS:12.2.1]

rotation representation: A formalism describing rotations and their algebra. The most frequent is definitely the rotation matrix , but quaternions , Euler angles , yaw - pitch - roll (rotation angles around the coordinate axes), axis-angle , etc. have also been used. [ BKPH:18.10]
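A short sketch tying together the properties listed in the rotation matrix entry, building a 3D rotation from the axis-angle parameterization via Rodrigues' formula and checking that the inverse equals the transpose:

    import numpy as np

    def rotation_matrix(axis, theta):
        # Rodrigues' formula: rotation by theta radians about a unit axis.
        a = np.asarray(axis, dtype=float)
        a /= np.linalg.norm(a)
        K = np.array([[0.0, -a[2], a[1]],
                      [a[2], 0.0, -a[0]],
                      [-a[1], a[0], 0.0]])   # cross-product matrix of the axis
        return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

    R = rotation_matrix([0, 0, 1], np.pi / 4)
    print(np.allclose(R.T @ R, np.eye(3)))   # inverse equals transpose
    print(np.linalg.eigvals(R))              # 1, cos(theta) +/- i sin(theta)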
rotational symmetry: The property of a set of points or an object to remain unchanged after a given rotation. For instance, a cube has several rotational symmetries, with respect to any 90 degree rotation around any axis passing through the centers of opposite faces. See also rotation , rotation matrix . [ WP:Rotational symmetry]

RS-170: The standard black-and-white video format in the United States. The EIA (Electronic Industry Association) is the standards body that originally defined the 525-line, 30 frame per second TV standard for North America, Japan, and a few other parts of the world. The EIA standard, also defined under US standard RS-170A, defines only the monochrome picture component but is mainly used with the NTSC color encoding standard. A version exists for PAL cameras . [ LG:4.1.3]

rubber sheet model: See membrane model. [ WP:Gravitational well#The rubber-sheet model]

rule-based classification: A method of object recognition drawn from artificial intelligence in which logical rules are used to infer object type. [ WP:Concept learning#Rule-Based Theories of Concept Learning]

run code: See run length coding . [ AJ:9.7]

run length coding: A lossless compression technique used to reduce the size of a repeating string of characters, called a run, also applicable to images. The algorithm encodes a run of symbols into two bytes, a count and a symbol. For instance, the 6-byte string xxxxxx would become 6x, occupying 2 bytes only. It can compress any type of information content, but the content itself affects, obviously, the compression ratio. Compression ratios are not high compared to other methods, but the algorithm is easy to implement and quick to execute. Run-length coding is supported by bitmap file formats such as TIFF, BMP and PCX. See also image compression , video compression , JPEG . [ AJ:11.2, 11.9]

run length compression: See run length coding . [ AJ:11.2, 11.9]
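A minimal encoder/decoder pair for the (count, symbol) scheme described in the run length coding entry:

    def rle_encode(data):
        # Encode a string as (count, symbol) pairs, e.g., "xxxxxx" -> [(6, "x")].
        runs = []
        for s in data:
            if runs and runs[-1][1] == s:
                runs[-1][0] += 1
            else:
                runs.append([1, s])
        return [(c, s) for c, s in runs]

    def rle_decode(runs):
        return "".join(s * c for c, s in runs)

    print(rle_encode("xxxxxx"))                          # [(6, 'x')]
    print(rle_decode(rle_encode("aaabcc")) == "aaabcc")  # True

Note how the compression ratio depends entirely on the content: long runs compress well, while strings with no repeats actually grow.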
S
saccade: A movement of the eye or camera, changing the direction of fixation sharply. [ WP:Saccade]

saliency map: A representation encoding the saliency of given image elements, typically features or groups thereof. See also salient feature , Gestalt , perceptual grouping , perceptual organization . [ WP:Salience (neuroscience)]

salient feature: A feature associated with a high value of a saliency measure, quantifying feature suggestiveness for perception (from the Latin salire, to leap). For instance, inflection points have been indicated as salient features for representing contours. Saliency is a concept originated from Gestalt psychology. See also perceptual grouping , perceptual organization .

salt-and-pepper noise: A type of impulsive noise. Let x, y ∈ [0, 1] be two uniform random variables, I the true image value at a given pixel, and I_n the corrupted (noisy) version of I. We can define the effect of salt-and-pepper noise as

    I_n = i_{min} + y (i_{max} - i_{min})   iff   x ≤ l,

where l is a parameter controlling how much of the image is corrupted, and i_{min}, i_{max} the range of the noise. See also image noise , Gaussian noise . This image was corrupted with 1% noise [ TV:3.1.2]:

[Figure: a gray scale image corrupted with 1% salt-and-pepper noise]
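A direct implementation of the noise model in the entry above (the fraction frac plays the role of the parameter l):

    import numpy as np

    def salt_and_pepper(img, frac=0.01, i_min=0.0, i_max=1.0, seed=0):
        # Corrupt a fraction frac of pixels following the entry's model:
        # I_n = i_min + y * (i_max - i_min) wherever x <= frac,
        # with x, y independent uniform random variables in [0, 1].
        rng = np.random.default_rng(seed)
        x = rng.uniform(size=img.shape)
        y = rng.uniform(size=img.shape)
        noisy = img.copy()
        mask = x <= frac
        noisy[mask] = i_min + y[mask] * (i_max - i_min)
        return noisy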
sampling: The transformation of a continuous signal into a discrete one by recording its values at discrete instants or locations. Most digital images are sampled in space, time and intensity, as intensity values are defined only on a regular spatial grid, and can only take integer values. This shows an example of a continuous signal and its samples [ FP:7.4.1]:

[Figure: a continuous 1D signal and its samples taken at regular intervals]

sampling density: The density of a sampling grid, that is, the number of samples collected per unit interval. See also sampling . [ BB:2.2.6]

sampling theorem: If an image is sampled at a rate higher than its Nyquist frequency then an analog image could be reconstructed from the sampled image whose mean square error with the original image converges to zero as the number of samples goes to infinity. [ AJ:4.2]
Sampson approximation: An approximation to the geometric distance in the fitting of implicit curves or surfaces that are defined by a parameterized function of the form f(\vec{a}; \vec{x}) = 0 for \vec{x} on the surface S(\vec{a}) defined by parameter vector \vec{a}. Fitting the surface to the set of points \{\vec{x}_1, ..., \vec{x}_n\} consists in minimizing a function of the form e(\vec{a}) = \sum_{i=1}^{n} d(\vec{x}_i, S(\vec{a})). Simple solutions are often available if the distance function d(\vec{x}, S(\vec{a})) is the algebraic distance d(\vec{x}, S(\vec{a})) = f(\vec{a}; \vec{x})^2, but under certain common assumptions, the optimal solution arises when d is the more complicated geometric distance d(\vec{x}, S(\vec{a})) = \min_{\vec{y} \in S} \|\vec{x} - \vec{y}\|^2. The Sampson approximation defines

    d(\vec{x}, S(\vec{a})) = \frac{f(\vec{a}; \vec{x})^2}{\|\nabla f(\vec{a}; \vec{x})\|^2}

which is a first-order approximation to the geometric distance. If an efficient algorithm for minimizing weighted algebraic distance is available, then the Sampson iterations are a further approximation, where the k-th iterate \vec{a}_k is the solution to

    \vec{a}_k = \arg\min_{\vec{a}} \sum_{i=1}^{n} w_i f(\vec{a}; \vec{x}_i)^2

with weights computed using the previous estimate, so w_i = 1 / \|\nabla f(\vec{a}_{k-1}; \vec{x}_i)\|^2. [ HZ:3.2.6, 11.4]
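A small worked example of the Sampson approximation above, using a circle as the implicit curve f(a; x) = (x - cx)^2 + (y - cy)^2 - r^2 (the circle and test point are illustrative choices):

    import numpy as np

    def sampson_distance(point, cx, cy, r):
        # Implicit circle: f(a; x) = (x - cx)^2 + (y - cy)^2 - r^2 = 0.
        x, y = point
        f = (x - cx) ** 2 + (y - cy) ** 2 - r ** 2
        grad = np.array([2 * (x - cx), 2 * (y - cy)])   # df/dx, df/dy
        return f ** 2 / (grad @ grad)

    # A point 0.1 away from a circle of radius 2: the Sampson distance is
    # close to the squared geometric distance (0.1)^2 = 0.01.
    print(sampson_distance((2.1, 0.0), 0.0, 0.0, 2.0))   # about 0.0095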
SAR: See synthetic aperture radar . [ WP:Synthetic aperture radar]

SAT: See symmetric axis transform. [ VSN:9.2.2]

satellite image: An image of a section of the Earth acquired using a camera mounted on an orbiting satellite. [ WP:Satellite imagery]

saturation: Reaching the upper limit of a dynamic range. For instance, intensity saturation occurs for an 8-bit monochromatic image when intensities greater than 255 are recorded: any such value is encoded as 255, the largest possible value in the range. [ WP:Saturation (color theory)]

Savitzky-Golay filtering: A class of filters achieving least square fitting of a polynomial to a moving window of a signal. Used for fitting and data smoothing. See also linear filter , curve fitting . [ WP:Savitzky-Golay smoothing filter]
scalar: A one dimensional entity; a real number. [ WP:Scalar (mathematics)]

scale: 1) The ratio between the size of an object, image, or feature and that of a reference or model. 2) The property that some image features are apparent only when viewed at a given size, such as a line being enlarged so much that it appears as a pair of parallel edge features. 3) A measure of the degree to which fine features have been removed or reduced in an image. One can analyze images at multiple spatial scales, whereby only features in certain size ranges appear at each scale (see scale space and pyramid ). [ VSN:3.1.2]

scale invariant: A property that keeps the same value even if the data, the image or the scene from which the data comes is shrunk or enlarged. The ratio perimeter²/area is invariant to image scaling. [ WP:Scale invariance]

scale operator: An operator suppressing details (high-frequency contents) in an image, e.g., Gaussian smoothing . Details at small scales are discarded. The resulting content can be represented in a smaller-size image. See also scale space, image pyramid , Gaussian pyramid , Laplacian pyramid , pyramid transform.

scale reduction: The result of the application of a scale operator .

scale space: A theory for early vision developed to account properly for the multi-scale nature of images. The rationale is that, in the absence of a priori information on the optimal spatial scale at which a specific problem should be treated (e.g., edge detection), images should be analyzed at all possible scales, the coarser ones representing simplifications of the finer ones. The finest scale is the input image itself. See scale space representation for details. [ CS:5]

scale space filtering: The filtering operation that transforms one resolution level into another in a scale space , for instance Gaussian filtering. [ RJS:7]

scale space matching: A class of matching techniques that compare shape at various scales. See also scale space and image matching . [ CS:5.2.3]

scale space representation: A representation of an image, and more generally of a signal, making explicit the information contained at multiple spatial scales , and establishing a causal relationship between adjacent scale levels. The scale level is identified by a scalar parameter, called the scale parameter. A crucial requirement is that coarser levels, obtained by successive applications of a scale operator , should constitute simplifications of previous (finer) levels, i.e., introduce no spurious details. A popular scale space representation is the Gaussian scale space, in which the next coarser image is obtained by convolving the current image with a Gaussian kernel. The variance of this kernel is the scale parameter. See also scale space , image pyramid , Gaussian smoothing . [ CS:5.3]
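A minimal sketch of the Gaussian scale space described above: one smoothed image per scale, with the Gaussian applied separably along rows and then columns (the set of scales is an arbitrary example):

    import numpy as np

    def gaussian_kernel(sigma):
        radius = max(1, int(3 * sigma))
        t = np.arange(-radius, radius + 1)
        g = np.exp(-t ** 2 / (2 * sigma ** 2))
        return g / g.sum()

    def gaussian_scale_space(img, sigmas=(1.0, 2.0, 4.0, 8.0)):
        # One smoothed image per scale level; the variance of the Gaussian
        # kernel plays the role of the scale parameter.
        levels = []
        for sigma in sigmas:
            g = gaussian_kernel(sigma)
            tmp = np.apply_along_axis(np.convolve, 1, img, g, mode="same")
            out = np.apply_along_axis(np.convolve, 0, tmp, g, mode="same")
            levels.append(out)
        return levels

Each successive level is a simplification of the previous one: no new structures appear as sigma grows, which is the causality requirement mentioned in the entry.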
scaling: 1) The process of zooming or shrinking an image. 2) Enlarging or shrinking a model to fit a set of data. 3) The process of transforming a set of values so that they lie inside a standard range (e.g., [-1,1]), often to improve numerical stability. [ VSN:6.2.1]

scanline: A single (horizontal) line of an image. Originally this term was used
for cameras in which the image is acquired line by line by a sensing element that generally scans each pixel on a line and then moves onto the next line. [ WP:Scan line]

scanline slice: The cross section of a structure along an image scanline . For instance, the scanline slice of a convex polygon in a binary image is:

[Figure: a convex polygon crossed by a scanline, with the corresponding 0/1 profile along the line]

scanline stereo matching: The stereo matching problem with rectified images, whereby corresponding points lie on scanlines with the same index. See also rectification , stereo correspondence .

scanning electron microscope (SEM): A scientific microscope introduced in 1942. It uses a beam of highly energetic electrons to examine objects on a very fine scale. The imaging process is essentially the same as for a light microscope apart from the type of radiation used. Magnification is much higher than what can be achieved with light. The images are rendered in gray shades. This technique is particularly useful for investigating microscopic details of surfaces. [ BKPH:11.1.3]

scatter matrix: For a set of d-dimensional points represented as column vectors \{\vec{x}_1, ..., \vec{x}_n\}, with mean \vec{\mu} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i, the scatter matrix is the d x d matrix

    S = \sum_{i=1}^{n} (\vec{x}_i - \vec{\mu})(\vec{x}_i - \vec{\mu})^T

It is (n - 1) times the sample covariance matrix. [ DH:4.10]
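A direct check of the definition and of the stated relation to the sample covariance matrix:

    import numpy as np

    def scatter_matrix(X):
        # X: (n, d) data matrix, one point per row.
        mu = X.mean(axis=0)
        centered = X - mu
        return centered.T @ centered    # the d x d scatter matrix

    X = np.random.randn(100, 3)
    S = scatter_matrix(X)
    # S equals (n - 1) times the sample covariance matrix.
    print(np.allclose(S, (len(X) - 1) * np.cov(X, rowvar=False)))   # True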
scattergram: See scatterplot . [ DH:1.2]

scatterplot: A data display technique in which each data item is plotted as a single point in an appropriate coordinate system , that might help a person to better understand the data. For example, if a set of estimated surface normals is plotted in a 3D scatterplot, then planar surfaces should produce tight clusters of points. The figure shows a set of data points plotted according to their values of features 1 and 2 [ DH:1.2]:

[Figure: data points plotted against feature 1 (horizontal axis) and feature 2 (vertical axis)]

scene: The part of 3D space captured by an imaging sensor, and every visible object therein. [ RJS:1]

scene analysis: The process of examining an image or video, for the purpose of inferring information about the scene in view, such as the shape of the visible surfaces, the identity of the objects in the scene, and their spatial or dynamic relationships. See also shape from contour and the following shape from entries, object recognition , and symbolic object representation . [ RJS:6,7]

scene constraint: Any constraint imposed on the image data by the nature of the scene, for instance, rigid motion, or the orthogonality of walls and floors, etc. [ HZ:9.4.1-9.4.2]
scene coordinates: A 3D coordinate system that describes the position of scene objects relative to a given coordinate system origin. Alternative coordinate systems are camera coordinates , viewer centered coordinates or object centered coordinates . [ JKS:1.4.2]

scene labeling: The problem of identifying scene elements from image data, associating them to labels representing their nature and roles. See also labeling problem, region labeling , relaxation labeling , image interpretation , scene understanding . [ BB:12.4]

scene reconstruction: The problem of estimating the 3D geometry of a scene, for example the shape of visible surfaces or contours, from image data. See also reconstruction , shape from contour and the following shape from entries, or architectural model , volumetric , surface and slice based reconstruction. [ WP:Computer vision#Scene reconstruction]

scene understanding: The problem of constructing a semantic interpretation of a scene from image data, that is, describing the scene in terms of object identities and relationships among objects. See also image interpretation , object recognition , symbolic object representation , semantic net , graph model , relational graph.

SCERPO: Spatial Correspondence, Evidential Reasoning and Perceptual Organization. A well known vision system developed by David Lowe that demonstrated recognition of complex polyhedral objects (e.g., razors) in a complex scene.

screw motion: A 3D transformation comprising a rotation about an axis \vec{a} and a translation along \vec{a}. The general Euclidean transformation \vec{x} \mapsto R\vec{x} + \vec{t} is a screw transformation if R\vec{t} = \vec{t}. [ VSN:8.2.1]

search tree: A data structure that records the choices that could be made in a problem-solving activity, while searching through a space of alternative choices for the next action or decision. The tree could be explicitly created or be implicit in the sequence of actions. For example, a tree that records alternative model-to-data feature matching is a specialized search tree called an interpretation tree . If each non-leaf node has two children, we have a binary search tree. See also decision tree , tree classifier . [ DH:12.4.1]

SECAM: SECAM (Sequential Couleur avec Memoire) is the television broadcast standard in France, the Middle East, and most of Eastern Europe. SECAM broadcasts 819 lines per second. It is one of three main television standards throughout the world, the other two being PAL (see PAL camera ) and NTSC . [ AJ:4.1]
second derivative operator: A linear filter estimating the second derivative from an image at a given point and in a given direction. Numerically, a simple approximation of the second derivative of a 1D function f is the central (finite) difference, derived from the Taylor approximation of f:

    f''_i = \frac{f_{i+1} - 2 f_i + f_{i-1}}{h^2} + O(h)

where h is the sampling step (assumed constant), and O(h) indicates that the truncation error vanishes as h does. A similar but more complicated approximation exists for estimating the second derivative in a given direction in an image. See also first derivative filter. [ JKS:5.3]
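The central difference above, applied at all interior points of a sampled 1D function, with a quick accuracy check on sin(x) (whose second derivative is -sin(x)):

    import numpy as np

    def second_derivative(f, h):
        # Central difference (f_{i+1} - 2 f_i + f_{i-1}) / h^2 at interior points.
        return (f[2:] - 2 * f[1:-1] + f[:-2]) / h ** 2

    x = np.linspace(0, np.pi, 101)
    h = x[1] - x[0]
    approx = second_derivative(np.sin(x), h)
    print(np.max(np.abs(approx + np.sin(x)[1:-1])))   # small truncation error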
second fundamental form: See surface curvature . [ FP:19.1.2]

seed region: The initial region used in a region growing process such as surface fitting in range data or intensity region finding in an intensity image . The patch on the surface here is a potential seed region for region growing the full cylindrical patch [ JKS:3.5]:

[Figure: a small patch on a cylindrical surface marked as the seed region]

segmentation: The problem of dividing a data set into parts according to a given set of rules. The assumption is that the different segments correspond to different structures in the original input domain observed in the image. See for instance image segmentation , color image segmentation , curve segmentation , motion segmentation , part segmentation , range data segmentation, texture segmentation . [ FP:14-14.1.2]

self-calibration: The problem of estimating the calibration parameters using only information extracted from a sequence or set of images (typically feature point correspondences in subsequent frames of a sequence or in several simultaneous views), as opposed to traditional calibration in photogrammetry , which adopts specially built calibration objects. Self-calibration is intimately related to basic concepts of multi-view geometry . See also camera calibration , autocalibration , stratification , projective geometry . [ FP:13.6]

self-localization: The problem of estimating the sensor's position within an environment from image or video data. The problem can be cast as geometric model matching if models of sufficiently complex objects are available, i.e., containing enough points to allow a full solution of the pose estimation problem. In some situations it is possible to identify a sufficient number of landmark points (see landmark detection ). If no information at all is available about the scene , one can still apply tracking or optical flow techniques to get corresponding points over time, or stereo correspondences in multiple simultaneous frames. See also motion estimation , egomotion .

self-occlusion: Occlusion in which part of an object is occluded by another part of the same object. In the following example the left leg of the person is occluding their right leg.

[Figure: a walking person whose left leg occludes the right leg]

SEM: See scanning electron microscope . [ BKPH:11.1.3]

semantic net: A graph representation in which nodes represent the objects of a given domain, and arcs properties and relations between objects. See also symbolic object representation , graph model , relational graph . A simple example: an arch and its semantic net representation [ BB:10.2]:
[Figure: an arch made of two posts supporting a top; the semantic net has nodes ARCH, TOP and POST, linked by PART_OF and SUPPORTS arcs]

semantic region growing: A region merging scheme incorporating a priori knowledge about adjacent regions ; for instance, in aerial imagery of countryside areas, the fact that roads are usually surrounded by fields. Constraint propagation can then be applied to achieve a globally optimal region segmentation. See also constraint satisfaction , relaxation labeling , region segmentation , region based segmentation , recursive region growing . [ BB:5.5]

sensor: A general word for a mechanism that records information from the outside world, generally for processing by a computer. The sensor might obtain raw measurements, e.g., a video camera, or partially processed information, e.g., depth from a stereo triangulation process. [ BM:1.9]

sensor fusion: A vast class of techniques aiming to combine the different information contained in data from different sensors, in order to achieve a richer or more accurate description of a scene or action. Among the many paradigms for fusing sensory information are the Kalman filter , Bayesian models , fuzzy logic , Dempster-Shafer evidential reasoning, production systems and neural networks . [ WP:Sensor fusion]

sensor motion compensation: A class of techniques aiming to suppress the motion of a sensor (or its effects) in a video sequence, or in data extracted from the sequence. A typical example is image sequence stabilization , in which a target moving across the image in the original sequence appears stationary in the output sequence. Another example is keeping a robot stationary in front of a target using only visual data (station keeping). Suppression of jitter in hand-held video recorders is now commercially available. Basic ingredients are tracking and motion estimation . See also egomotion.

sensor motion estimation: See egomotion . [ FP:17.5.1]

sensor path planning: See sensor planning.

sensor placement determination: See camera calibration and sensor planning .

sensor planning: A class of techniques aimed at determining optimal sensing strategies for a reconfigurable sensor system, normally given a task and a geometric model of the target object (that may be partially acquired in previous views). For example, given a geometric feature on an object for which a CAD-like model is known, and the task to verify the feature's size, a sensor planning system would determine the best position and orientation of, say, a single camera and associated illumination for estimating the size of each feature. The two basic approaches have been generate-and-test, in which sensor configurations are generated and then evaluated with respect to the task constraints, and synthetic methods, in which task constraints are characterized analytically and the resulting equations solved to yield the optimal sensor configuration. See also active vision , purposive vision .
sensor position estimation: See pose estimation. [ WP:Pose (computer vision)#Pose Estimation]

sensor response: The output of a sensor, or a characterization of some key output quantities, given a set of inputs. Typically expressed in the frequency domain , as a function linking the magnitude and phase of the Fourier transform of the output signal with the known frequency of the input. See also phase spectrum , power spectrum , spectral response .

sensor sensitivity: In general, the weakest input signal that a sensor can detect. It can be inferred from the sensor response curve. For the common CCD sensor of video cameras, sensitivity depends on various parameters, mainly the fill factor (the percentage of the sensor's area actually sensitive to light) and well capacity (the amount of charge that a photosensitive element can hold). The larger the values of the above parameters, the more sensitive the camera. See also sensor spectral sensitivity . [ WP:Sensor#Use]

sensor spectral sensitivity: A characterization of a sensor's response in frequency. For example, this figure

[Figure: the spectral response curve of a typical CCD sensor over visible and near-infrared wavelengths]

shows the spectral sensitivity of a typical CCD sensor (actually its spectral response, from which the spectral sensitivity can be inferred). Notice that the high sensitivity of silicon in the infrared means that IR blocking filters should be considered for fine measurements depending on camera intensities. We also notice that a CCD camera makes a very good sensor for the near-infrared range (750-1000 nm). [ WP:Spectral sensitivity]

separability: A term used in classification problems referring to whether the data is capable of being split into distinct subclasses by some automatic decision process. If property values of two classes overlap, then the classes are not separable. The circle class is linearly separable in the figure below, but the x and box classes are not:

[Figure: a scatterplot in the feature 1 - feature 2 plane where the circles can be cut off by a straight line, while the x and box classes overlap]

separable filter: A 2D (in image processing) filter that can be expressed as the product of two filters, each of which acts independently on rows and columns. The classic example is the linear Gaussian filter (see Gaussian convolution ). Separability implies a significant reduction in computational complexity, typically reducing processing costs from O(N^2) to O(2N), where N is the filter size. See also linear filter , separable template .
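A sketch verifying the separability property described above: convolving rows and then columns with a 1D kernel g gives the same result as a full 2D convolution with the outer product kernel, at O(2N) instead of O(N^2) cost per pixel (the kernel and image here are arbitrary examples).

    import numpy as np

    def conv2d_full_kernel(img, K):
        # Naive 2D convolution with an N x N kernel: O(N^2) work per pixel.
        h, w = img.shape
        n = K.shape[0]
        r = n // 2
        padded = np.pad(img, r)
        out = np.empty_like(img, dtype=float)
        for i in range(h):
            for j in range(w):
                out[i, j] = (padded[i:i + n, j:j + n] * K).sum()
        return out

    def conv2d_separable(img, g):
        # Same result when K = outer(g, g), but only O(2N) work per pixel.
        tmp = np.apply_along_axis(np.convolve, 1, img, g, mode="same")
        return np.apply_along_axis(np.convolve, 0, tmp, g, mode="same")

    g = np.array([0.25, 0.5, 0.25])   # a small symmetric smoothing kernel
    img = np.random.rand(32, 32)
    print(np.allclose(conv2d_separable(img, g),
                      conv2d_full_kernel(img, np.outer(g, g))))   # True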
separable template: A template or structuring element in a filter , for instance a morphological filter (see morphology ), that can be decomposed into a sequence of smaller templates, similarly to separable kernels for linear filters . The main advantage is a
reduction in the computational complexity of the associated filter. See also separable filter .
set theoretic modeling: See constructive solid geometry . [ JKS:15.3.2]

shading: The pattern formed by the graded areas of an intensity image, suggesting light and dark. Variations in the lightness of surfaces in the scene may be due to variations in illumination , surface orientation and surface reflectance . See also illumination , shadow . [ BKPH:10.10]

shading correction: A class of techniques for changing undesirable shading effects , for instance strongly uneven brightness distribution caused by nonuniform illumination . All techniques assume a shading model, i.e., a photometric model of image formation , formalizing the dependency of the measured image brightness on camera parameters (typically gain and offset), illumination and object reflectance . See also shadow , photometry .

shading from shape: A technique recovering the reflectance of isolated objects given a single image and a geometric model , but not exactly the inverse of the classic shape from shading problem. See also photometric stereo .

shadow: A part of a scene that illumination does not reach because of self-occlusion ( attached shadow or self-shadow) or occlusion caused by other objects (cast shadow ). Therefore, this region appears darker than its surroundings. See also shape from shading , shading from shape , photometric stereo [ FP:5.3.1]. See:

[Figure: an object under a light source, showing its attached shadow and the cast shadow thrown on the ground]

shadow, attached: A shadow caused by an object on itself by self-occlusion. See also shadow, cast. [ FP:5.3.1]

shadow, cast: A shadow thrown by an object on another object. See also shadow, attached . [ FP:5.3.1]

shadow detection: The problem of identifying image regions corresponding to shadows in the scene, using photometric properties. Useful for true color estimation and region analysis. See also color , color image segmentation, color matching , photometry , region segmentation .

shadow type labeling: A problem similar to shadow detection , but requiring classification of different types of shadows.

shadow understanding: Estimating various properties of a 3D scene based on the appearance or size of shadows, e.g., building height. See also shadow type labeling .

shape: Informally, the form of an image or scene object. Typically described in computer vision through direct geometric representations (see shape representation ), e.g., modeling image contours with polynomials or b-splines , or range data patches with quadric surfaces. More formally, definitions are: 1. (adj) The quality of an object that is invariant to changes of the coordinate system in which it is expressed. If the coordinate system is Euclidean , this corresponds to the conventional idea of shape. In an affine

coordinate system, the change of parameters) that must be calibrated.


coordinates may be affine, so that, for Depth is estimated using this model
example, an ellipse and a circle have once image readings (pixel values) are
the same shape. 2. (n) A family of available. Notice that the camera uses a
point sets, any pair being related by a large aperture, so that the points in the
coordinate system transformation. scene are in focus over the smallest
3. (n) A specific set of n-dimensional possible depth interval. See also
points, e.g., the set of squares. For shape from focus . [ E. Krotkov,
example a curve in R2 defined Focusing, Int. J. of Computer Vision,
parametrically as ~c(t) = (x(t), y(t)) 1:223-237, 1987.]
comprises the point set or shape
{~c(t) | < t < }. The volume shape from focus: A class of
inside the unit sphere in 3D is the shape algorithms for estimating scene depth
{~x | k~xk < 1, ~x R3 }. [ ZRH:2.3] at each image pixel, and therefore
surface shape, by varying the focus
shape class: One in a set of classes setting of a camera until the image
representing different types of shape in achieves optimal focus (minimum blur)
a given classification, for instance, in a neighborhood of the pixel under
locally convex or hyperbolic in examination. Obviously, pixels
HK segmentation of a range image . corresponding to different depths would
[ TV:4.4.1] achieve optimal focus for different
settings. A model of the relation
shape decomposition: See between depth and image focus is
segmentation and assumed, containing a number of
hierarchical modeling . [ FP:14-14.1.2] parameters (e.g., the optics parameters)
that must be calibrated. Notice that
shape from contours: A class of
the camera uses a large aperture, so
algorithms for estimating the shape of
that the smallest possible depth interval
a 3D object from the contour it
generates in-focus image points. See
generates in an image. A well-known
also shape from defocus . [ EK]
technique, shape from silhouettes ,
consists in extracting the objects shape from line drawings: A class
silhouette from a number of views, and of symbolic algorithms inferring 3D
intersecting the 3D cones generated by properties of scene objects (as opposed
the silhouettes contours and the to exact shape measurements, as in
centers of projections. The intersection other shape from methods) from line
volume is known as the visual hull . drawings. First, assumptions are made
Work also exists on understanding about the type of line drawings
shape from the differential properties of admissible, e.g., polyhedral objects
apparent contours . [ JJK] only, no surface markings or shadows,
maximum three lines forming an image
shape from defocus: A class of
junction. Then, a dictionary of line
algorithms for estimating scene depth
junctions is formed, assigning a
at each image pixel, and therefore
symbolic label to every possible
surface shape, from multiple images
appearance of the line junctions in
acquired at different, controlled focus
space under the given assumptions.
settings. A closed-form model of the
This figure shows part of a simple
relation between depth and image focus
dictionary of junctions and a labeled
is assumed, containing a number of
shape:
parameters (e.g., the optics
234 S

[Figure: part of a dictionary of +, - and arrow labelings for two- and three-line junctions, and a labeled line drawing of a block]

where + means planes intersecting in a convex shape, - in a concave shape, and the arrows a discontinuity (occlusion) between surfaces. Each image junction is then assigned the set of all possible labels that its shape admits locally (e.g., all possible two-line junction labels for a two-line junction). Finally, a constraint satisfaction algorithm is used to prune labels inconsistent with the context. See also Waltz's line labeling , relaxation labeling .

shape from monocular depth cues: A class of algorithms estimating shape from information related to depth detected in a single image, i.e., from monocular cues. See shape from contours , shape from line drawings , shape from perspective , shape from shading , shape from specularity, shape from structured light , shape from texture .

shape from motion: A vast class of algorithms for estimating 3D shape (structure), and often depth , from the motion information contained in an image sequence. Methods exist that rely on tracking sparse sets of image features (for instance, the Tomasi-Kanade factorization ) as well as on dense motion fields, i.e., optical flow, seeking to reconstruct dense surfaces. See also motion factorization . [ JKS:11.3]

shape from multiple sensors: A class of algorithms recovering shape from information collected from a number of sensors of the same type, or of different types. For the former class, see multi-view stereo . For the second class, see sensor fusion .

shape from optical flow: See optical flow .

shape from orthogonal views: See shape from contours . [ JJK]

shape from perspective: A class of techniques estimating depth for various shape features from perspective cues, for instance the fact that a translation along the optical axis of a perspective camera changes the size of the imaged objects. See also pinhole camera model . [ SQ:9A.2.1]

shape from photo consistency: A technique based on space carving for recovering shape from multiple views (photos). The basic constraint is that the underlying shape must be photo-consistent with all the input photos, i.e., roughly speaking, give rise to compatible intensity values in all cameras.

shape from photometric stereo: See photometric stereo . [ JKS:11.3]

shape from polarization: A technique recovering local shape from the polarization properties of a surface under observation. The basic idea is to illuminate a surface with known polarized light , estimate the polarization state of the reflected light, then use this estimate in a closed-form model linking the surface normals with
the measured polarization parameters. In practice, polarization estimates can be noisy. This method can be useful wherever intensity images do not provide information, e.g., featureless specular surfaces. See also polarization based methods .

shape from shading: The problem of estimating shape, here in the sense of a field of normals from which a surface can be recovered up to a scale factor, from the shading pattern (light and shadows) of an image. The key idea is that, assuming a reflectance map for the scene (typically Lambertian), an image irradiance equation can be written linking the surface normals to the illumination direction and the image intensity. The constraint can be used to recover the normals assuming local surface smoothness. [ JKS:9.4]

shape from shadows: A technique for recovering geometry from a number of images of an outdoor scene acquired at different times, i.e., with the sun at different angles. Geometric information can be recovered under various assumptions and knowledge of the sun's position. Also called shape from darkness. See also shape from shading and photometric stereo.

shape from silhouettes: See shape from contours . [ JJK]

shape from specularity: A class of algorithms for estimating local shape from surface specularities. A specularity constrains the surface normal as the incident and reflection angles must coincide. The detection of specularities in images is, in itself, a non-trivial problem.

shape from structured light: See structured light triangulation . [ JKS:11.4.1]

shape from texture: The problem of estimating shape, here in the sense of a field of normals from which a surface can be recovered up to a scale factor, from the image texture . The deformation of a planar texture recorded in an image (the texture gradient ) depends on the shape of the surface to which the texture is applied. Techniques exist for shape estimation from statistical texture and regular texture patterns. [ FP:9.4-9.5]

shape from X: A generic term for a method that generates 3D shape or position estimates from one of a variety of possible techniques, such as stereo , shading , focus , etc. [ TV:9.1]

shape from zoom: The problem of computing shape (in the sense of the distance of each scene point from the sensor) from two or more images acquired at different zoom settings, achieved through a zoom lens . The basic idea is to differentiate the projection equations with respect to the focal length , f, achieving an expression linking the variations of f and pixel displacement with depth . [ J. Ma and S. I. Olsen, Depth from zooming, J. Optical Society of America A, Vol. 7, pp 1883-1890, 1990.]

shape grammar: A grammar specifying a class of shapes , whose rules specify patterns for combining more primitive shapes. Rules are composed of two parts, 1) describing a specific shape and 2) how to replace or transform it. Used also in design, CAD, and architecture. See also production system , expert system . [ BB:6.3.2]
shape index: A measure, usually indicated by S, of the type of shape of a surface patch in terms of its principal curvatures . Formally

    S = \frac{2}{\pi} \arctan \frac{\kappa_M + \kappa_m}{\kappa_M - \kappa_m}

where \kappa_m and \kappa_M are the principal curvatures. S is undetermined for planar patches. A related parameter, R, called curvedness, measures the amount of curvedness of the patch:

    R = \sqrt{(\kappa_M^2 + \kappa_m^2)/2}

All curvature-based shape classes map to the unit circle in the RS plane, with planar patches at the origin. See also mean and Gaussian curvature shape classification , shape representation .
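A direct transcription of the two formulas above (note that sign and range conventions for the shape index vary between authors):

    import numpy as np

    def shape_index(k_min, k_max):
        # S in (-1, 1); undefined for planar patches (k_min = k_max = 0)
        # and for umbilic points where k_min = k_max.
        return (2 / np.pi) * np.arctan((k_max + k_min) / (k_max - k_min))

    def curvedness(k_min, k_max):
        return np.sqrt((k_max ** 2 + k_min ** 2) / 2)

    # A cylindrical (ridge-like) patch: one zero and one nonzero curvature.
    print(shape_index(0.0, 1.0))   # 0.5
    print(curvedness(0.0, 1.0))    # about 0.707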

shape magnitude class: Part of a


local surface curvature representation
scheme in which each point has a
curvature class , and a magnitude of
curvature (shape magnitude). This
representation is an alternative to the
more common shape classification based
on either the two principal curvatures
or the mean and Gaussian curvature .

shape representation: A large class shear transformation: An affine


of techniques seeking to capture the image transformation changing one
salient properties of shapes, both 2D coordinate only. The corresponding
and 3D, for analysis and comparison transformation matrix, S, is equal to
purposes. Many representations have the identity apart from s12 = sx , which
been proposed in the literature, changes the first image coordinate.
including skeletons for 2D and 3D Shear on the second image coordinate is
shapes (see medial axis skeletonization obtained similarly by s21 = sy . An
and distance transform ), example of the result of a shear
curvature-based representations (for transformation is [ SQ:9.1]:
instance, the curvature primal sketch ,
the curvature scale space , the
extended Gaussian image ),
generalized cones for articulated
objects, invariants , and flexible objects
models (for instance snakes ,
deformable superquadrics , and
deformable template model ).
[ ZRH:2.3] shock tree: A 2D
shape representation technique based
S 237

on the singularities (see singularity event ) of the radius function along the medial axis (MA). The MA is represented by a tree with the same structure, and is divided into continuous segments of uniform behavior (local maximum, local minimum, constant, monotonic). See also medial axis skeletonization , distance transform .

short baseline stereo: See narrow baseline stereo .

shot noise: See impulse noise and salt-and-pepper noise . [ FP:1.4.2]

shutter: A device allowing the light into a camera for enough time to form an image on a photosensitive film or chip. Shutters can be mechanical, as in traditional photographic cameras, or electronic, as in a digital camera . In the former case, a window-like mechanism is opened to allow the light to be recorded by a photosensitive film. In the latter case, a CCD or other type of sensor is triggered electronically to record the amount of incident light at each pixel. [ WP:Shutter (photography)]

shutter control: The device controlling the length of time that the shutter is open. [ WP:Exposure (photography)#Exposure control]

side looking radar: A radar projecting a fan-shaped beam illuminating a strip of the scene at the side of the instrument, typically used for mapping a large area. The map is produced as the instrument is carried along by a vehicle, sweeping the surface to the side. See also sonar .

signal coding system: A system for encoding a signal into another, typically for compression or security purposes. See image compression , digital watermarking .

signal processing: The collection of mathematical and computational tools for the analysis of typically 1D (but also 2D, 3D, etc.) signals such as audio recordings or other intensity versus time or position measurements. Digital signal processing is the subset of signal processing which pertains to signals that are represented as streams of binary digits. [ WP:Signal processing]

signal-to-noise ratio (SNR): A measure of the relative strength of the interesting and uninteresting (noise) part of a signal. In signal processing, SNR is usually expressed in decibels as the ratio of the power of signal and noise, i.e., 10 \log_{10}(P_s / P_n). With statistical noise, the SNR can be defined as 10 times the log of the ratio of the standard deviations of signal and noise. [ AJ:3.6]

signature identification: A class of techniques for verifying a written signature. Also known as Dynamic Signature Verification. An area of biometrics . See also handwriting verification , handwritten character recognition , fingerprint identification , face identification. [ WP:Handwriting recognition]

signature verification: The problem of authenticating a signature automatically with image processing techniques; in practice, deciding whether a signature matches a specimen sufficiently well. See also handwriting verification and handwritten character recognition . [ WP:Handwriting recognition]

silhouette: See object contour . [ FP:19.2]
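A direct implementation of the decibel SNR formula given in the signal-to-noise ratio entry above, with powers taken as mean squared values:

    import numpy as np

    def snr_db(signal, noise):
        # 10 log10 of the power ratio P_s / P_n.
        p_signal = np.mean(np.square(signal))
        p_noise = np.mean(np.square(noise))
        return 10 * np.log10(p_signal / p_noise)

    clean = np.sin(np.linspace(0, 8 * np.pi, 1000))
    noise = 0.1 * np.random.randn(1000)
    print(snr_db(clean, noise))   # roughly 17 dB for this amplitude ratio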
SIMD: See single instruction multiple data . [ RJS:8]

similarity: The property that makes two entities (images, models, objects, features, shape, intensity values, etc.) or sets thereof similar, that is, resembling each other. A similarity transformation creates perfectly similar structures and a similarity metric quantifies the degree of similarity of two possibly non-identical structures. Examples of similar structures are 1) two polygons identical except for a change in size, and 2) two image neighborhoods whose intensity values are identical except for scaling by a multiplicative factor. The concept of similarity lies at the heart of several classic vision problems, including stereo correspondence , image matching , and geometric model matching . [ JKS:14.3]

similarity metric: A metric quantifying the similarity of two entities. For instance, cross correlation is a common similarity metric for image regions. For similarity metrics on specific objects encountered in vision, see feature similarity , graph similarity , gray scale similarity . See also point similarity measure , matching . [ DH:6.7]

similarity transformation: A transformation changing an object into a similar-looking one; formally, a conformal mapping preserving the ratio of distances (the magnification ratio). The transformation matrix, T, can be written as T = B^{-1} A B, where A and B are similar matrices, that is, representing the same transformation after a change of basis. Examples include rotation, translation, expansion and contraction (scaling). [ SQ:9.1]

simple lens: A lens composed of a single piece of refracting material, shaped in such a way as to achieve the desired lens behavior. For example, a convex focusing lens. [ BKPH:2.3]

simulated annealing: A coarse-to-fine, iterative optimization algorithm. At each iteration, a smoothed version of the energy landscape is searched and a global minimum located by a statistical (e.g., random) process. The search is then performed at a finer level of smoothing, and so on. The idea is to locate the basin of the absolute minimum at coarse scales, so that fine-resolution search starts from an approximate solution close enough to the absolute minimum to avoid falling into surrounding local minima. The name derives from the homonymous procedure for tempering metal, in which temperature is lowered in stages, each time allowing the material to reach thermal equilibrium. See also coarse-to-fine processing . [ SQ:2.3.3]

single instruction multiple data (SIMD): A computer architecture allowing the same instruction to be simultaneously executed on multiple processors and thus different portions of the data set (e.g., different pixels or image neighborhoods). Useful for a variety of low-level image processing operations. See also MIMD , pipeline parallelism , data parallelism , parallel processing. [ RJS:8]

single photon emission computed tomography (SPECT): A medical imaging technique that involves the rotation of a photon detector array around the body in order to detect photons emitted by the decay of previously injected radionuclides. This technique is particularly useful for creating a volumetric image showing
metabolic activity. Resolution is lower which are the eigenvalues of a special


than PET but imaging is cheaper and symmetric tridiagonal matrix. This
some SPECT radiopharmaceuticals includes the
may be used where PET nuclides discrete cosine transform (DCT) .
cannot. [ WP:SPECT] [ AJ:5.12]

singular value decomposition skeleton: A curve, or tree-like set of


(SVD): A factorization of any m n curves, capturing the basic structure of
matrix A into A = UDVT . The an object. This figure shows an
columns of the m m matrix U are example of a linear skeleton for a
mutually orthogonal unit vectors, as are puppet-like 2D shape:
the columns of the n n matrix V. The
m n matrix D is diagonal, and its
nonzero elements, the singular values
i , satisfy 1 2 . . . n 0. The
SVD has extremely useful properties.
For example:
A is nonsingular if and only if all
its singular values are nonzero,
and the number of nonzero
singular values gives the rank of
A;
the columns of U corresponding The curves forming the skeleton are
to the nonzero singular values typically central to the shape. Several
span the range of A; the columns algorithms exist for computing
of V corresponding to the skeletons, for instance, the medial axis
nonzero singular values span the transform (see
null space of A; medial axis skeletonization ) and the
the squares of the nonzero distance transform , for which the
singular values are the nonzero grassfire algorithm can be applied.
eigenvalues of both AAT and [ AJ:9.9]
AT A, and the columns of U are skeleton by influence zones (SKIZ):
eigenvectors of AAT , those of V Commonly known as the
of AT A. Voronoi diagram . [ SQ:7.3.2]
Moreover, the pseudoinverse of a
matrix, occurring in the solution of skeletonization: A class of techniques
rectangular linear systems, can be that try to reduce a 2D (or 3D) binary
easily computed from the SVD image to a skeleton form in which
definition. [ FP:12.3.2] every remaining pixel is a skeleton
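The properties listed above are easy to verify numerically; the following sketch (our own illustration, assuming numpy is available) factors a rank-deficient matrix and checks the rank and pseudoinverse claims.

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])            # rank 2: the second row is twice the first
U, s, Vt = np.linalg.svd(A)             # A = U @ np.diag(s) @ Vt
print(s)                                # singular values, decreasing, all >= 0
print(np.sum(s > 1e-10))                # number of nonzero singular values = rank = 2
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: the factorization holds
s_inv = np.array([1.0 / x if x > 1e-10 else 0.0 for x in s])
print(np.allclose(np.linalg.pinv(A), Vt.T @ np.diag(s_inv) @ U.T))   # True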
singularity event: A point in the domain of the map of a geometric curve or surface where the first derivatives vanish. [WP:Singular point of a curve]

sinusoidal projection: A family of linear image transforms, C, the rows of which are the eigenvectors of a special symmetric tridiagonal matrix. This includes the discrete cosine transform (DCT). [AJ:5.12]

skeleton: A curve, or tree-like set of curves, capturing the basic structure of an object. This figure shows an example of a linear skeleton for a puppet-like 2D shape:

(Figure: a puppet-like 2D shape with its linear skeleton drawn inside it.)

The curves forming the skeleton are typically central to the shape. Several algorithms exist for computing skeletons, for instance, the medial axis transform (see medial axis skeletonization) and the distance transform, for which the grassfire algorithm can be applied. [AJ:9.9]

skeleton by influence zones (SKIZ): Commonly known as the Voronoi diagram. [SQ:7.3.2]

skeletonization: A class of techniques that try to reduce a 2D (or 3D) binary image to a skeleton form in which every remaining pixel is a skeleton pixel, but the essential shape of the input image is captured. Definitions of the skeleton include the set of centers of circles bitangent to the object boundary and smoothed local symmetries. [RJS:6]

skew: An error introduced in the imaging geometry by a non-orthogonal pixel grid, in which rows and columns of pixels do not form an angle of exactly 90 degrees. This is usually considered only in high-accuracy photogrammetry applications. [JKS:12.10.2]

skew correction: A transformation compensating for the skew error. [JKS:12.10.2]

skew symmetry: A skew symmetric contour is a planar contour such that every straight line oriented at a given angle with respect to a particular axis, called the skew symmetry axis of the contour, intersects the contour at two points equidistant from the axis. An example [BB:9.5.4]:

(Figure: a skew symmetric contour; lines crossing the skew symmetry axis meet the contour at equal distances d on either side.)

skin color analysis: A set of techniques for color analysis applied to images containing skin, for instance for retrieving images from a database (see color based image retrieval). See also color, color image, color image segmentation, color matching, and colorimetry.

SKIZ: See skeleton by influence zones. [SQ:7.3.2]

slant: The angle between a surface normal in the scene and the viewing direction:

(Figure: the slant angle between the surface normal and the direction of view.)

See also tilt, shape from texture. [FP:9.4.1]

slant normalization: A class of algorithms used in handwritten character recognition, transforming slanted cursive characters into vertical ones. See handwritten character recognition, optical character recognition.

slice based reconstruction: The reconstruction of a 3D object from a number of planar slices, or sections, taken across the object. The slice plane is typically advanced at regular spatial intervals to sweep the working volume. See also tomography, computerized tomography, single photon emission computed tomography and nuclear magnetic resonance.

slope density function: This is the histogram of the tangential orientations (slopes) of a curve or region boundary. It can be used to represent the curve shape in a manner invariant to translation and rotation (up to a shift of the density function). [BB:8.4.5]

small motion model: A class of mathematical models representing very small (ideally, infinitesimal) camera-scene motion between frames. Used typically in shape from motion. See also optical flow.

smart camera: A hardware device incorporating a camera and an on-board computer in a single, small container, thus achieving a programmable vision system within the size of a normal video camera. [TV:2.3.1]

smooth motion curve: The curve defined by a motion that can be expressed by smooth (that is, differentiable: derivatives of all orders exist) parametric functions of the image coordinates. Notice that smooth is often used in an intuitive sense, not in the strict mathematical sense above (clearly, an exacting constraint), as, for example, in image smoothing. See also motion, motion analysis.

smoothed local symmetries: A class of skeletonization algorithms, associated with Asada and Brady. Given a 2D curve that bounds a closed region in the plane, the skeleton as computed by smoothed local symmetries is the locus of chord midpoints of bitangent circles. Compare the symmetric axis transform. Two skeleton points as defined by smoothed local symmetries are shown:

(Figure: a closed contour with two bitangent circles; the skeleton points are the midpoints of the chords joining the tangency points.)

smoothing: Generally, any modification of a signal intended to remove the effects of noise. Often used to mean the attenuation of high spatial frequency components of a signal. As many models of noise have a flat power spectral density (PSD), while natural images have a PSD that decays toward zero at high spatial frequencies, suppressing the high frequencies increases the overall signal-to-noise ratio of the image. See also discontinuity preserving smoothing, anisotropic diffusion and adaptive smoothing. [FP:7.1.1]

smoothing filter: Smoothing is often achieved by convolution of the image with a smoothing filter to reduce noise or high spatial frequency detail. Such filters include discrete approximations to symmetric probability densities such as the Gaussian, binomial and uniform distributions. For example, in 1D, the discrete signal x_1 . . . x_n is convolved with the kernel [1/6, 4/6, 1/6] to produce the smoothed signal y_1 . . . y_{n+2} in which y_i = (1/6)x_{i-1} + (4/6)x_i + (1/6)x_{i+1}. [FP:7.1.1]
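A one-line numpy check of this 1D example (our own code, not the dictionary's): full convolution of an n-sample signal with the 3-tap kernel yields n + 2 samples.

import numpy as np

kernel = np.array([1.0, 4.0, 1.0]) / 6.0             # the [1/6, 4/6, 1/6] kernel above
x = np.array([0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0])    # a noisy spike
y = np.convolve(x, kernel)                           # 'full' mode: len(y) = len(x) + 2
print(y)                                             # the spike is attenuated and spread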
smoothness constraint: An additional constraint used in data interpretation problems. The general principle is that results derived from nearby data must themselves have similar values. Traditional examples of where the smoothness constraint can be applied are in shape from shading and optical flow. The underlying observation that supports this computational constraint is that the observed real world surfaces and motions are smooth almost everywhere. [JKS:9.4]

snake: A snake is the combination of a deformable model and an algorithm for fitting that model to image data. In one common embodiment, the model is a parameterized 2D curve, for example a b-spline parameterized by its control points. Image data, which might be a gradient image or 2D points, induces forces on points on the snake that are translated to forces on the control points or parameters. An iterative algorithm adjusts the control points according to these forces and recomputes the forces. Stopping criteria, step lengths, and other issues of optimization are all issues that must be dealt with in an effective snake. [TV:5.4]

SNR: See signal-to-noise ratio. [AJ:3.6]

Sobel edge detector: An edge detector based on the Sobel kernels. The edge magnitude image E is the square root of the sum of squares of the convolution of the image with horizontal and vertical Sobel kernels, given by E = sqrt((Kx ∗ I)² + (Ky ∗ I)²). The Sobel operator applied to the left image gives the right image [JKS:5.2.2]:

(Figure: an intensity image and its Sobel edge magnitude image.)

Sobel gradient operator: See Sobel kernel. [JKS:5.2.2]

Sobel kernel: A gradient estimation kernel used for edge detection. The horizontal kernel is the convolution of a smoothing filter, s = [1, 2, 1], in the horizontal direction and a gradient operator, d = [1, 0, -1], in the vertical direction. The kernel

Ky = s ∗ d = |  1   2   1 |
             |  0   0   0 |
             | -1  -2  -1 |

highlights horizontal edges. The vertical kernel Kx is the transpose of Ky. [JKS:5.2.2]
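The two entries above translate directly into a short sketch. This version is our own and assumes scipy.signal.convolve2d is available; the sign convention of the kernels follows the entry.

import numpy as np
from scipy.signal import convolve2d

Ky = np.array([[ 1.,  2.,  1.],
               [ 0.,  0.,  0.],
               [-1., -2., -1.]])    # smoothing [1,2,1] along x, gradient [1,0,-1] along y
Kx = Ky.T                           # the vertical-edge kernel is the transpose

def sobel_magnitude(image):
    gx = convolve2d(image, Kx, mode="same", boundary="symm")
    gy = convolve2d(image, Ky, mode="same", boundary="symm")
    return np.sqrt(gx ** 2 + gy ** 2)    # E = sqrt((Kx * I)^2 + (Ky * I)^2)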
soft mathematical morphology: An extension of gray scale morphology in which the min/max operations are replaced by other rank operations, e.g., replace each pixel in an image by the 90th percentile value in a 5 × 5 window centered at the pixel. Weighted ranks may be computed. See also fuzzy morphology.

soft morphology: See soft mathematical morphology.

soft vertex: A point on a polyline whose connecting line segments are almost collinear. Soft vertices may arise from segmentation of a smooth curve into line segments. They are called soft because they may be removed if the segments of the polyline are replaced by curve segments. [JKS:6.6]

solid angle: Solid angle is a property of a 3D object: the amount of the unit sphere's surface that the object's projection onto the unit sphere occupies. The unit sphere's surface area is 4π, so the maximum value of a solid angle is 4π steradians [FP:4.1.2]:

(Figure: an object and the solid angle subtended by its projection onto the unit sphere.)

source: An emitter of energy that illuminates the vision system's sensors.

source geometry: See light source geometry.

source image: The image on which an image processing or an image analysis operation is based.

(Figure: a source image and the target image produced from it.)

source placement: See light source placement.

space carving: A method for creating a 3D volumetric model from 2D images. Starting from a voxel representation in which a 3D cube is marked occupied, voxels are removed if they fail to be photo-consistent in the set of 2D images in which they appear. The order in which the voxels are processed is a key aspect of space carving, as it allows otherwise intractable visibility computations to be avoided. [K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, Int. J. of Computer Vision, Vol. 38, pp 199-218, 2000.]

space curve: A curve that may follow a path in 3D space (i.e., it is not restricted to lying in a plane). [WP:Space curve#Topology]

space variant sensor: A sensor in which the pixels are not uniformly sampling the projected image data. For example, a log-polar sensor has rings of pixels of exponentially increasing size as one moves radially from the central point [WP:Space Variant Imaging#Foveated sensors]:

(Figure: a log-polar sensor layout, with rings of pixels growing exponentially in size away from the center.)

spatial angle: The area on a unit sphere that is bounded by a cone with its apex in the center of the sphere. Measured in steradians. This is frequently used when analyzing luminance.

(Figure: the spatial angle cut out of the unit sphere by a cone.)

spatial averaging: The pixels in the output image are weighted averages of their neighboring pixels in the input image. Mean and Gaussian smoothing are examples of spatial averaging. [AJ:7.4]
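For instance, mean smoothing over a 3 × 3 neighborhood can be sketched as follows (our own code, assuming numpy; the uniform weights are the simplest choice of weighted average).

import numpy as np

def mean_filter(image):
    # Each output pixel is the average of its 3 x 3 input neighborhood;
    # border pixels reuse the nearest valid rows and columns.
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += padded[1 + di : 1 + di + image.shape[0],
                          1 + dj : 1 + dj + image.shape[1]]
    return out / 9.0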
spatial domain smoothing: An implementation of smoothing in which each pixel is replaced by a value that is directly computed from other pixels in the image. In contrast, frequency domain smoothing first processes all pixels to create a linear transformation of the image, such as a Fourier transform, and expresses the smoothing operation in terms of the transformed image. [RJS:4]

spatial frequency: The rate of repetition of intensities across an image. In a 2D image the space to which spatial refers is the image's XY plane.

(Figure: an image of vertical stripes.)

This image has significant repetition at a spatial frequency of 1/10 pixel⁻¹ in the horizontal direction. The 2D Fourier transform represents spatial frequency contributions in all directions, at all frequencies. A discrete approximation is efficiently computed using the fast Fourier transform (FFT). [EH:7.7]

spatial hashing: See spatial indexing. [WP:Spatial index]

spatial indexing: 1) Conversion of a shape to a number, so that it may be quickly compared to other shapes. Intimately linked with the computation of invariants to spatial transformations and imaging distortions of the shape. For example, a shape represented as a collection of 2D boundary points might be indexed by its compactness. 2) The design of efficient data structures for search and storage of geometric quantities. For example, closest-point queries are made more efficient by the computation of spatial indices such as the Voronoi diagram, distance transform, k-D trees, or Binary Space Partitioning (BSP) trees. [WP:Spatial index]

spatial matched filter: See matched filter. [ERD:10.4]

spatial occupancy: A form of object or scene representation in which a 3D space is divided into a grid of voxels. Voxels containing a part of the object are marked as being occupied and other voxels are marked as free space. This representation is particularly useful for tasks where properties of the object are less important than simply the presence and position of the object, as in robot navigation. [JKS:15.3.2]

spatial proximity: The distance between two structures in real space (as contrasted with proximity in a feature or property space). [JKS:3.1]

spatial quantization: The conversion of a signal defined on an infinite domain to a finite set of limited-precision samples. For example, the function f(x, y): R² → R might be quantized to the image g, of width w and height h, defined as g(i, j): {1..w} × {1..h} → R. The value of a particular sample g(i, j) is determined by the point-spread function p(x, y), and is given by g(i, j) = ∫ p(x − i, y − j) f(x, y) dx dy. [SEU:2.2.4]
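The integral above can be approximated numerically. In this sketch (entirely our own construction, assuming numpy) the point-spread function is a unit box, so each sample is simply the average of f over the square around the sample point.

import numpy as np

def quantize(f, width, height, sub=10):
    # g(i, j) ~ integral of p(x - i, y - j) f(x, y) dx dy with a box PSF,
    # approximated by averaging f over a sub x sub grid in the unit square.
    offsets = (np.arange(sub) + 0.5) / sub - 0.5
    g = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            g[i, j] = np.mean([f(j + dx, i + dy)
                               for dy in offsets for dx in offsets])
    return g

g = quantize(lambda x, y: np.sin(0.5 * x) + 0.1 * y, width=8, height=4)
print(g.shape)    # (4, 8)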
spatial reasoning: Inference from geometric rather than symbolic or linguistic information. See also geometric reasoning. [WP:Spatial reasoning]

spatial relation: An association of two or more spatial entities, expressing the way in which such entities are connected or related. Examples include perpendicularity or parallelism of lines or planes, and inclusion of one image region in another. [BKKP:5.8]

spatial resolution: The smallest separation between distinct signal features that can be measured by a sensor. For a CCD camera, this is dictated by the distance between adjacent pixel centers. It is often specified as an angle: the angle between the 3D rays corresponding to adjacent pixels. The inverse of the highest spatial frequency that a sensor can represent without aliasing. [JKS:8.2]

spatio-temporal analysis: The analysis of moving images by processing that operates on the 3D volume formed by the stack of 2D images in a sequence. Examples include kinetic occlusion, the epipolar plane image (EPI) and spatio-temporal autoregressive models (STAR).

special case motion: A subproblem of the general structure from motion problem, where the camera motion is known to be constrained a priori. Examples include planar motion, turntable motion or single-axis rotation, and pure translation. In each case, the constrained motion simplifies the general problem, yielding one or more of: closed-form solutions, greater efficiency, increased accuracy. Similar benefits can be obtained from spatial approximations such as the affine camera and weak perspective.

speckle: A pattern of light and dark spots superimposed on the image of a scene that is illuminated by coherent light such as from a laser. Rough surfaces in the scene change the path lengths and thus the interference effects of different rays, so a fixed scene, laser and imager configuration results in a fixed speckle pattern on the imaging surface. [AJ:8.13]

(Figure: a laser source illuminating a rough surface; beam interference gives light and dark spots on the imaging surface, e.g., a CCD array.)

speckle reduction: Restoration of images corrupted with speckle noise, such as laser or ultrasound images. [AJ:8.13]

SPECT: See single-photon emission computed tomography. [WP:SPECT]

spectral analysis: 1) Analysis performed in either the spatial, temporal or electromagnetic frequency domain. 2) Generally, any analysis that involves the examination of eigenvalues. This is a nebulous concept, and consequently the number of spectral techniques is large. Often equivalent to PCA.

spectral decomposition method: See spectral analysis.

spectral density function: See power spectrum. [AJ:2.11]

spectral distribution: The power spectrum or electromagnetic spectrum distribution.

spectral factorization: A method for designing linear filters based on difference equations that have a given spectral density function when applied to white noise. [AJ:6.3]

spectral filtering: Modifying the light before it enters the sensor by using a filter tuned to different spectral frequencies. A common use is with laser sensing, in which the filter is chosen to pass only light at the laser's frequency. Another usage is to eliminate ambient infrared light in order to increase the sharpness of an image (as most silicon-based sensors are also sensitive to infrared light).

spectral frequency: Electromagnetic or spatial frequency. [EH:7.7]

spectral reflectance: See reflectance. [JKS:9.1.2]

spectral response: The response R of an imaging sensor illuminated by monochromatic light of wavelength λ is the product of the input light intensity I and the spectral response at that wavelength s(λ), so R = I s(λ).

spectrum: A range of values such as the electromagnetic spectrum. [WP:Spectrum]

specular reflection: Mirror-like reflection or highlight. Formed when a light source at 3D location L, surface point P, surface normal N at that point and camera center C are all coplanar, and the angles LPN and NPC are equal. [FP:4.3.4-4.3.5]

(Figure: light source L, surface normal N at surface point P, and camera C, with equal angles of incidence and reflection at the surface.)

specularity: See specular reflection. [FP:4.3.4-4.3.5]

sphere: 1. A surface in any dimension defined by the x such that ‖x − c‖ = r for a center c and radius r. 2. The volume of space bounded by the above, or x such that ‖x − c‖ ≤ r. [WP:Sphere]

spherical: Having the shape of, characteristics of, or associations with, a sphere. [WP:Spherical]

spherical harmonic: A function defined on the unit sphere of the form Y_lm(θ, φ) = λ_lm P_lm(cos θ) e^{imφ} is a spherical harmonic, where λ_lm is a normalizing factor and P_lm is a Legendre polynomial. Any real function f(θ, φ) defined on the sphere has an expansion in terms of the spherical harmonics of the form

f(θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} α_lm Y_lm(θ, φ)

that is analogous to the Fourier expansion of a function defined on the plane, with the α_lm analogous to the Fourier coefficients. Polar plots of the first ten spherical harmonics, for m = 0 . . . 2, l = 0 . . . m, are shown; the plots show r = 1 + Y_lm(θ, φ) in polar coordinates [BB:9.2.3]:

(Figure: polar plots of the first spherical harmonics.)

spherical mirror: Sometimes used in catadioptric cameras. A mirror whose shape is a portion of a sphere. [WP:Spherical mirror#Mirror shape]

spin image: A local surface representation of Johnson and Hebert. At selected points p with surface normal n, all other surface points x can be represented in a 2D basis as (α, β) = (sqrt(‖x − p‖² − (n · (x − p))²), n · (x − p)). The spin image is the histogram of all of the (α, β) values for the surface. Each selected point p leads to a different spin image. Matching points compares their spin images by correlation. Key advantages of the representation are 1) it is independent of pose and 2) it avoids ambiguities of representation that can occur with nearly flat surfaces. [FP:21.4.2]
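A compact numpy sketch of the spin image construction (our own code; the histogram limits and bin count are illustrative choices):

import numpy as np

def spin_image(points, p, n, bins=16, radius=1.0):
    # n is assumed to be a unit surface normal at the selected point p.
    d = points - p                       # offsets x - p for all surface points
    beta = d @ n                         # signed distance along the normal
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
    hist, _, _ = np.histogram2d(alpha, beta, bins=bins,
                                range=[[0.0, radius], [-radius, radius]])
    return hist                          # the histogram of (alpha, beta) values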
splash: An invariant representation of the region about a 3D point. It gives a local shape representation useful for position invariant object recognition.

spline: 1) A curve c(t) defined as a weighted sum of control points: c(t) = Σ_{i=0}^{n} w_i(t) p_i, where the control points are p_1...n and one weighting (or blending) function w_i is defined for each control point. The curve may interpolate the control points or approximate them. The construction of the spline offers guarantees of continuity and smoothness. With uniform splines the weighting functions for each point are translated copies of each other, so w_i(t) = w_0(t − i). The form of w_0 determines the type of spline: for B-splines and Bézier curves, w_0(t) is a polynomial (typically cubic) in t. Nonuniform splines reparameterize the t axis, c(t) = c(u(t)), where u(t) maps the integers k = 0..n to knot points t_0..n with linear interpolation for non-integer values of t. Rational splines with n-D control points are perspective projections of normal splines with (n + 1)-D control points. 2) Tensor-product splines define a 3D surface x(u, v) as a product of splines in u and v. [JKS:6.7]

spline smoothing: Smoothing of a discretely sampled signal x(t) by replacing the value at t_i by the value predicted at that point by a spline x̂(t) fitted to neighboring values. [AJ:8.7]

split and merge: A two-stage procedure for segmentation or clustering. The data is divided into subsets, with the initial division being a single set containing all the data. In the split stage, subsets are repeatedly subdivided depending on the extent to which they fail to satisfy a coherence criterion (for example, similarity of pixel colors). In the merge stage, pairs of adjacent sets are found that, when merged, will again satisfy a coherence criterion. Even if the coherence criteria are the same for both stages, the merge stage may still find subsets to merge. [VSN:3.3.2]

SPOT: Système Probatoire de l'Observation de la Terre. A series of satellites launched by France that are a common source of satellite images of the earth. SPOT-5, for example, was launched in May 2002 and provides complete coverage of the earth every 26 days. [WP:SPOT (satellites)]

spot detection: An image processing operation for locating small bright or dark locations against contrasting backgrounds. The issues here are what size of spot and what amount of contrast.

spur: A short segment attached to a more significant line or edge. Spurs often arise when linear structures are tracked through noisy data, such as by an edge detector. This figure shows some spurs [SOS:5.2]:

(Figure: a line with several short spur segments attached.)

squared error clustering: A class of clustering algorithms that attempt to find cluster centers c_1 . . . c_n that minimize the squared error Σ_{x ∈ X} min_{i ∈ {1...n}} (x − c_i)², where X is the set of points to be clustered.
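A minimal k-means style implementation of squared error clustering (our own sketch, assuming numpy; the initialization and fixed iteration count are simplistic):

import numpy as np

def squared_error_clustering(X, n_clusters, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center ...
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # ... then move each center to the mean of its assigned points.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels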
stadimetry: The computation of distance to an object of known size based on its apparent size in the camera's field of view. [WP:Stadimeter]

stationary camera: A camera whose optical center does not move. The camera may pan, tilt and rotate about its optical center, but not translate. Images taken by a stationary camera are always related by a planar homography. Also known as a rotating camera or non-translating camera. The term may also refer to a camera that does not move at all.

statistical classifier: A function mapping from a space of input data to a set of labels. Input data are points x ∈ Rⁿ and labels are scalars. The classifier c(x) = l assigns the label l to point x. The classifier is typically a parameterized function, such as a neural network (with weights as parameters) or a support vector machine. The classifier parameters could be set by optimizing performance on a training set of known (x, l) pairs or by a self-organizing learning algorithm. [AJ:9.14]

statistical pattern recognition: Pattern recognition that depends on classification rules learned from examples rather than constructed by designers. Compare structural pattern recognition. [RJS:6]

statistical shape model: A parameterized shape model where the parameters are assumed to be random variables drawn from a known probability distribution. The distribution is learned from training examples. Examples include point distribution models. [WP:Statistical shape model]

statistical texture: A texture whose description is in terms of the statistics of image neighborhoods. General examples are cooccurrence statistics of pairs of neighboring pixels, Fourier texture descriptors, autocorrelation and autoregressive models. A specific example is the statistics of the distribution of entries in 5 × 5 neighborhoods. These statistics may be learned from a set of training images or automatically discovered via clustering. [RN:8.3.1]

steerable filter: A filter applied to a 2D image, whose response is dependent on a scalar orientation parameter θ, but for which the response at any arbitrary value of θ may be computed as a function of a small number of basis responses, thus saving computation. For example, the directional derivative at orientation θ may be computed in terms of the x and y derivatives Ix and Iy as

dI/dn = (cos θ) Ix + (sin θ) Iy

For non-steerable filters such as Gabor filters, the response must be computed at each orientation, leading to higher computational complexity.
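The steering property is one line of code once the basis responses are available. In this numpy sketch (ours; np.gradient supplies simple finite-difference derivatives), the derivative at any orientation theta is synthesized from Ix and Iy without refiltering.

import numpy as np

def directional_derivative(image, theta):
    Iy, Ix = np.gradient(image.astype(float))        # basis responses, computed once
    return np.cos(theta) * Ix + np.sin(theta) * Iy   # steered to orientation theta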
steganography: Concealing of information in non-suspect carrier data. For example, encoding information in the low-order bits of a digital image. [WP:Steganography]

step edge: 1) A discontinuity in image intensity (compare with fold edge). 2) An idealized model of a step-change in intensity. This plot of intensity I versus position X shows a step edge discontinuity at X = a [JKS:5]:

(Figure: a plot of intensity I against X, with a step discontinuity at X = a.)

steradian: The unit of solid angle. [FP:4.1.2]

stereo: General term for a class of problems in which multiple images of the same scene are used to recover a 3D property such as surface shape, orientation or curvature. In binocular stereo, two images are taken from different viewpoints allowing the computation of 3D structure. In trifocal, trinocular and multiple-view stereo, three or more images are available. In photometric stereo, the viewpoint is the same, but lighting conditions are varied in order to compute surface orientation.

stereo camera calibration: The computation of intrinsic and extrinsic camera parameters for a pair of cameras. Important extrinsic variables are relative orientation: the rotation and translation relating the two cameras. Achieved in several ways: 1) conventional calibration of each camera independently; 2) computation of the essential matrix or fundamental matrix relating the pair, from which relative orientation may be computed along with one or two intrinsic parameters; 3) for a rigid stereo rig, moving the rig and capturing multiple image pairs. [TV:7.1.3]

stereo convergence: The angle between the optical axes of two sensors in a stereo configuration:

(Figure: two sensors with converging optical axes; the angle between the axes is the stereo convergence.)

stereo correspondence problem: The key to recovering depth from stereo is to identify 2D image points that are projections of the same 3D scene point. Pairs of such image points are called correspondences. The correspondence problem is to determine which pairs of image points are correspondences. Unfortunately, matching features or image neighborhoods is usually ambiguous, leading to both massive amounts of computation and many alternative solutions. To reduce the space of matches, corresponding points are usually required to satisfy some constraints, such as having similar orientation and contrast, local smoothness, and uniqueness of match. A powerful constraint is the epipolar constraint: from a single view, an image point is constrained to lie on a 3D ray, whose projection onto the second image is an epipolar curve. For pinhole cameras, the epipolar curve is a line. This greatly reduces the space of potential matches. [JKS:11.2]

stereo fusion: The ability of the human vision system, when presented with a pair of stereo images, one to each eye independently, to form a consistent 3D interpretation of the scene, essentially solving the stereo correspondence problem. The fact that humans can perform fusion even on random dot stereograms means that high-level recognition is not required to solve all stereo correspondence problems. [BB:3.4.2]

stereo image rectification: For a pair of images taken by pinhole cameras, points in stereo correspondence lie on corresponding epipolar lines. Stereo image rectification resamples the 2D images to create two new images, with the same number of rows, so that points on corresponding epipolar lines lie on corresponding rows. This reduces computation for some stereo algorithms, although certain relative orientations (e.g., translation along the optical axis) make rectification difficult to achieve. [JKS:12.5]

stereo matching: See stereo correspondence problem. [JKS:11.2]

stereo triangulation: Determining the 3D position of a point given its 2D positions in each of two images taken by cameras in known positions. In the noise-free case, each 2D point defines a 3D ray by back-projection, and the 3D point is at the intersection of the two rays. With noisy data, the optimal triangulation is computed by finding the 3D point that maximizes the probability that the two imaged points are noisy projections thereof. Also used for the analogous problem in multiple views. [WP:Range imaging#Stereo triangulation]
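In the noisy case a common linear least squares variant solves for the 3D point closest to both back-projected rays. A sketch under our own conventions (camera centers c and unit ray directions d; this is not the probabilistically optimal method mentioned above):

import numpy as np

def triangulate(c1, d1, c2, d2):
    # For a ray x = c + t d (unit d), the matrix P = I - d d^T projects onto
    # the plane perpendicular to the ray, so P (x - c) = 0 exactly on the ray.
    # Summing both constraints gives normal equations for the least squares point.
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in ((c1, d1), (c2, d2)):
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ c
    return np.linalg.solve(A, b)    # fails only if the rays are parallel

X = triangulate(np.zeros(3), np.array([0., 0., 1.]),
                np.array([1., 0., 0.]), np.array([-0.6, 0., 0.8]))
print(X)    # approximately (0, 0, 1.333), where the two rays intersect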
stereo vision: The ability to determine three dimensional structure using two eyes. See also stereo. [TV:7.1]

stimulus: 1) Any object or event that a computer vision system may detect. 2) The perceived radiant energy itself. [WP:Stimulus]

stochastic gradient: An optimization algorithm for minimizing a convex cost function. [WP:Stochastic gradient descent]

stochastic completion field: A strategy for algorithmic discovery of illusory contours.

stochastic process: A process whose next state depends probabilistically on its current state. [WP:Stochastic process]

stratification: A class of solutions to self-calibration in which a projective reconstruction is first converted to an affine reconstruction (by computing the plane at infinity) and then to a Euclidean reconstruction. [HZ:18.5]

streaming video: Video presented as a sequence of images or frames. An algorithm processing such video cannot easily select a particular frame. [WP:Streaming video]

stripe ranging: See structured light triangulation. [JKS:11.4.1]

strobe duration: The time for which a strobe light is illuminated. [LG:2.1.1]

strobed light: A light that is illuminated for a very short period, generally at high intensity. [LG:2.1.1]

structural pattern recognition: Pattern recognition where classification is achieved using high-level rules or patterns, often specified by a human designer. See also syntactic pattern recognition. [WP:Structural pattern recognition]

structural texture: A texture that is formed by the regular repetition of a primitive structure, for example an image of bricks or windows. [AJ:9.11]

structure and motion recovery: The simultaneous computation of 3D scene structure and 3D camera positions from a sequence of images of a scene. Common strategies depend on tracking of 2D image entities (e.g., interest points or edges) through multiple views and thus obtaining constraints on the 3D entities (e.g., points and lines) and camera motion. Constraints are embodied in entities such as the fundamental matrix and trifocal tensor that may be estimated from image data alone, and then allow computation of the 3D camera positions. Recovery is up to certain equivalence classes of scenes, where any member of the class may generate the observed data, such as projective or affine reconstructions.

structure factorization: See motion factorization.

structure from motion: Recovery of the 3D shape of a set of scene points from their motion. For a more modern treatment, see structure and motion recovery. [JKS:14.7]

structure from optical flow: Recovery of camera motion by computing optical flow constrained by the infinitesimal motion fundamental matrix. The small motion approximation replaces the rotation matrix R by I + [ω]×, where ω is the axis of rotation, the unique vector such that Rω = ω.

structure matching: See recognition by components.

structured light: A class of techniques where carefully engineered illumination is employed to simplify computation of scene properties. Common examples include structured light triangulation and moiré fringe sensing. [JKS:11.4.1]

structured light source calibration: The special case of calibration in a structured light system where the position of the light source is determined.

structured light triangulation: Recovery of 3D structure by computing the intersection of a ray (or plane or other light shape) of light with the ray determined by the image of the illuminated scene surface [JKS:11.4.1]:

(Figure: a laser on the right illuminates a surface point; the sensor on the left images the illuminated point, and the depth z follows by triangulation over the baseline D.)

structured model: See hierarchical model.

structuring element: The basic neighborhood structure of morphological image processing. The structuring element is an image, typically small, that defines a shape pattern. Morphological operations on a source image combine the structuring element with the source image in various ways. [JKS:2.6]

subband coding: A means of coding a discrete signal for transmission. The signal is passed through a set of bandpass filters, and each channel is quantized separately. The sampling rate of the individual channels is set such that, before quantization, the sum of the number of per-channel samples is the same as the number of samples of the original system. By varying the quantization for different bands, the number of samples may be reduced with small losses in signal quality. [WP:Subband coding]

subcomponent: An object part used in a hierarchical model.

subcomponent decomposition: Representation of a complete object part by a collection of smaller objects in a hierarchical model.

subgraph isomorphism: Equivalence of a pair of subgraphs of two given graphs. Given graphs A and B, the subgraph isomorphism problem is to enumerate all pairs of subgraphs (a, b) where: a ⊆ A; b ⊆ B; a is isomorphic to b; and some given predicate p(a, b) is true. Appropriate modifications of the problem allow the solution of many graph problems including determining shortest paths and finding maximal cliques. A given graph has a number of subgraphs exponential in the number of vertices and the general problem is NP-complete. This example shows subgraph isomorphism with the matching graph being A:b-C:a-B:c [JKS:15.6.3]:

(Figure: two graphs, one with vertices A, B, C, D and one with vertices a, b, c, with the matched subgraphs highlighted.)

subjective contour: An edge perceived by humans in an image due to Gestalt completion, particularly when no image evidence is present.

(Figure: the Kanizsa triangle illusion.)

In this example, the triangle that appears to float above the black discs is bounded partially by a subjective contour. [RN:7.4]

subpixel edge detection: Estimation of the location of an image edge by subpixel interpolation of the gradient operator response, to give a position more accurately than an integer pixel value. [JKS:5.7]

subpixel interpolation: A class of techniques that essentially interpolate the position of local maxima in images to positions at a resolution smaller than integer pixel coordinates. Examples include subpixel edge detection and interest point detection. A rule of thumb is 0.1 pixel accuracy is often possible. If the input is an image z(x, y) containing the response of some kernel to a source image, a typical approach might be as follows.

1. Identify a local maximum, where z(x, y) ≥ z(a, b) for all (a, b) ∈ neighborhood(x, y).

2. Fit the quadratic surface z = ai² + bij + cj² + di + ej + f to the set of samples (i, j, z(x + i, y + j)) in a neighborhood about (x, y).

3. Compute the position of the local maximum of the quadratic surface:

   (i, j)ᵀ = − [ 2a  b ; b  2c ]⁻¹ (d, e)ᵀ

4. If −1/2 < {i, j} < 1/2 then report a maximum at subpixel location (x + i, y + j).

Similar strategies apply when computing the subpixel location of edges. [JKS:5.7]
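The four steps above condense into a short numpy function for a 3 × 3 neighborhood (our own implementation; the quadratic is fitted by linear least squares):

import numpy as np

def subpixel_peak(z, x, y):
    # Offsets of the 3 x 3 neighborhood around the integer maximum (x, y).
    ii, jj = np.meshgrid((-1, 0, 1), (-1, 0, 1), indexing="ij")
    i, j = ii.ravel(), jj.ravel()
    vals = z[x - 1 : x + 2, y - 1 : y + 2].ravel()
    # Step 2: fit z = a i^2 + b i j + c j^2 + d i + e j + f.
    M = np.column_stack([i * i, i * j, j * j, i, j, np.ones(9)])
    a, b, c, d, e, f = np.linalg.lstsq(M, vals, rcond=None)[0]
    # Step 3: stationary point of the fitted quadratic.
    di, dj = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    # Step 4: accept only subpixel shifts inside the central pixel.
    return (x + di, y + dj) if max(abs(di), abs(dj)) < 0.5 else (x, y)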

subsampling: Reducing the size of an


image by producing a new image whose
pixel values are more widely sampling
the original image (e.g., every third
pixel). Interpolation can produce more
accurate samples To avoid aliasing , The above shows examples of two
spatial frequencies higher than the superellipses. The convex superellipse
Nyquist limit of the coarse grid should has = = 3, the concave example
be removed by low pass filtering the has = = 12 . [ WP:Super ellipse]
image. Also known as downsampling.
[ SOS:3.6] supergrid: An image representation
that is larger than the original image
254 S

and represents explicitly both the registration between viewpoints.


image points and the crack edges [ WP:Super resolution]
between them. [ JKS:3.3.4]
supervised classification: See
classification . [ AJ:9.14]
Pixels

supervised learning: A method for


training a neural network where the
Crack Edges
network is presented (in a training
phase) with a series of patterns and
their correct classification. See also
unsupervised learning . [ SQ:14.1.3]
superpixel: A superpixel is a pixel in
a high resolution image. An support vector machine: A
anti-aliasing computer graphics statistical classifier assigning labels l to
technique produces lower resolution points ~x in Rn . The support vector
image data by a weighted sum of the machine has two defining
superpixels. characteristics. Firstly, the classifier
places the decision surface that
superquadric: A 3D generalization of
separates points ~xi and ~xj that have
the superellipse , the solution set of
different labels li 6= lj in such a way as
x y z to maximize the margin between them.
+ + =1 Roughly speaking, the decision surface
a b c
is as far as possible from any ~x.
As with superellipses, fitting to 3D data Secondly, the classifier operates not on
is non-trivial, although some success the raw feature vectors ~x, but on high
has been achieved. Two examples of dimensional projections
superquadrics, both with = 2: f~(~x) : Rn 7 RN , N > n. However,
because the classifier only ever requires
dot products such as f~(~x) f~(~y ), we
never form f~ explicitly, but specify
instead the kernel function
K(~x, ~y ) = f~(~x) f~(~y ). Wherever the dot
product between high-dimensional
vectors is required, the kernel function
is used instead. [ SQ:14A.2]

support vector regression: A range


of techniques for function estimation
that attempts to determine a function
The convex superquadric has to model data while ensuring that the
= = 3, the concave example has function does not deviate from any data
= = 12 . [ SQ:9.11] sample by more than a certain amount.
See also support vector machine .
superresolution: Generation of a [ WP:Support vector machine#Regression]
high-resolution image from a collection
of low-resolution images of the same
object taken from different viewpoints . surface: A surface in general parlance
The key to successful superresolution is is a 2D shape that is located in 3D.
in the accurate estimation of the Mathematically, it is a 2D subset of R3
S 255

that is almost everywhere locally 14, curve 1 is bounded by vertices b


topologically equivalent to the open and c. [ JKS:15.3.2]
unit ball in R2 . This means that a
cloud of points is not a surface, but the surface class: Koenderinks
surface may have cusps or boundaries. classification of local surface shape into
A parameterization of the surface is a classes based on two functions of the
function from R2 to R3 that defines the principal curvatures :
3D surface point ~x(u, v) as a function of
The shape index
2D parameters (u, v). Restricting (u, v)
S = 2 tan1 11
+2
to subsets of R2 yields a subset of the 2

surface. The surface is the set S of q


1
points on it, defined over a domain D The curvedness C = 2 (1 + 2 )
[ WP:Surface]:
where 1 and 2 are the principal
2
S = {~x(u, v) | (u, v) D R } curvatures. The surface classes are
planar (C = 0), hyperboloid (|S| < 83 )
or ellipsoid (|S| > 85 ) and cylinder
surface area: Given a parametric ( 38 < |S| < 58 ), subdivided into concave
surface S = {~x(u, v) | (u, v) D R2 }, (S < 0) and convex (S > 0).
with unit tangent vectors ~xu (u, v) and Alternative classification systems exist
~xv (u, v), the area of the surface is based on the mean and
[ TV:A.5] Gaussian curvature or the
Z principal curvatures . The former
|~xu (u, v) ~xv (u, v)|dudv distinguishes more classes of
S hyperboloid surfaces.

surface boundary representation: surface continuity: Mathematically,


A method of defining surface models in surface continuity is defined at a single
computer graphics. It defines a 3D point parameterized by (u, v) on the
object as a collection of surfaces with surface {~x(u, v) | (u, v) D R2 }.
boundaries. The model topology states The surface is continuous at that point
which surfaces are connected, and if infinitesimal motions in any direction
which boundaries are shared between away from (u, v) can never cause a
patches. sudden change in the value of ~x. The
surface is everywhere continuous, or
just continuous if it is continuous at all
e 3 f points in D.
2 B 4
b 1 c 5
C surface curvature: Surface curvature
6 g
8 A measures the shape of a 3D surface (the
7
characteristics of the surface that are
a 9 d constant if the surface is rotated or
translated in 3D space). The shape is
The B-rep model of these three faces specified by the surfaces
comprises: 1) the faces A,B,C along principal curvatures at each point. To
with the parameters of their 3D compute the principal curvatures, we
surfaces ; the edges 19 with 3D curve need two pieces of machinery, called the
descriptions; and vertices ag; 2) first and second fundamental forms. In
connectivities of these entities, for the differential geometry of surfaces,
example face B is bounded by curves the first fundamental form encapsulates
256 S

arc-length of curves in a surface. If the surface fitting: A family of


surface is defined in parametric form by parametric surfaces ~x (u, v) is
a smooth function ~x(u, v), the surfaces parameterized by a vector of
tangent vectors at (u, v) are given by parameters . For example, the family
the partial derivatives ~xu (u, v) and of 3D spheres is parameterized by four
~xv (u, v). From these, we define the dot parameters: three for the center, one
products E(u, v) = ~xu ~xu , for the radius. Given a set of n sampled
F (u, v) = ~xu ~xv , G(u, v) = ~xv ~xv . data points {~ p1 , .., p~n }, the task of
Then arclength along a curve in the surface fitting is to find the parameters
surface is given by the first fundamental of the surface that best fits the given
form ds2 = Edu2 + 2F dudv + Gdv 2 . data. Common interpretations of best
The matrix of the first fundamental fit include finding the surface for which
form is the 2 2 matrix the sum of Euclidean distances from
the points to the surface is smallest, or
E F that maximize the probability that the
I=
F G data points could be noisy samples
from the surface. General techniques
The second fundamental form include least squares fitting or
encapsulates the curvature information. nonlinear optimization over the surface
The second partial derivatives are parameters. [ JKS:13.7]
~xuu (u, v) etc. The surface normal at
(u, v) is the unit vector ~n(u, v) along surface interpolation: Generating a
~xu ~xv . Then the matrix of the second continuous surface from sparse data
fundamental form at (u, v) is the 2 2 such as 3D points. For example, given a
matrix set of n sampled data points
S = {~ p1 , .., p~n }, one might wish to
~xuu ~n ~xuv ~n generate other points in R3 that lie on
II = .
~xvu ~n ~xvv ~n a smooth surface that passes through
all the points in S. Techniques include
If d~ = (du, dv) is a direction in the radial basis functions, spline s, natural
tangent space (so its 3D direction is neighbor interpolation. [ JKS:13.6]
~t(d)~ = du~xu + dv~xv ), then the normal
surface matching: Identifying
curvature in the direction d~ is given by
~ = ~
d
II ~
d corresponding points on two 3D
(d) . The minima and maxima
d~ Id~ surfaces, often as a precursor to surface
of as d~ varies at a point (u, v) are the registration .
principal curvatures at the point, given
by the generalized eigenvalues of surface mesh: A
II~z = I~z, i.e., the solutions to the surface boundary representation in
quadratic equation in given by which the faces are typically planar and
det(II I) = 0. [ FP:19.1.2] the edges are straight lines. Such
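All of this machinery is small matrix arithmetic. The following numpy sketch (ours; derivatives are taken by central differences) recovers the principal curvatures of a sphere of radius r, which should both have magnitude 1/r:

import numpy as np

def principal_curvatures(x, u, v, h=1e-4):
    xu = (x(u + h, v) - x(u - h, v)) / (2 * h)
    xv = (x(u, v + h) - x(u, v - h)) / (2 * h)
    xuu = (x(u + h, v) - 2 * x(u, v) + x(u - h, v)) / h ** 2
    xvv = (x(u, v + h) - 2 * x(u, v) + x(u, v - h)) / h ** 2
    xuv = (x(u + h, v + h) - x(u + h, v - h)
           - x(u - h, v + h) + x(u - h, v - h)) / (4 * h ** 2)
    n = np.cross(xu, xv)
    n /= np.linalg.norm(n)
    I = np.array([[xu @ xu, xu @ xv], [xu @ xv, xv @ xv]])
    II = np.array([[xuu @ n, xuv @ n], [xuv @ n, xvv @ n]])
    return np.linalg.eigvals(np.linalg.inv(I) @ II)  # solutions of det(II - kI) = 0

r = 2.0
sphere = lambda u, v: r * np.array([np.cos(u) * np.cos(v),
                                    np.cos(u) * np.sin(v),
                                    np.sin(u)])
print(principal_curvatures(sphere, 0.3, 0.7))  # both near 1/r = 0.5 (sign depends on normal)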
surface discontinuity: A discontinuity is a point at which the surface, or its normal vector, is not continuous. These are often fold edges, where the surface normal has a large change in direction. See also surface continuity.

surface fitting: A family of parametric surfaces x_θ(u, v) is parameterized by a vector of parameters θ. For example, the family of 3D spheres is parameterized by four parameters: three for the center, one for the radius. Given a set of n sampled data points {p_1, .., p_n}, the task of surface fitting is to find the parameters of the surface that best fits the given data. Common interpretations of best fit include finding the surface for which the sum of Euclidean distances from the points to the surface is smallest, or that maximizes the probability that the data points could be noisy samples from the surface. General techniques include least squares fitting or nonlinear optimization over the surface parameters. [JKS:13.7]

surface interpolation: Generating a continuous surface from sparse data such as 3D points. For example, given a set of n sampled data points S = {p_1, .., p_n}, one might wish to generate other points in R³ that lie on a smooth surface that passes through all the points in S. Techniques include radial basis functions, splines, natural neighbor interpolation. [JKS:13.6]

surface matching: Identifying corresponding points on two 3D surfaces, often as a precursor to surface registration.

surface mesh: A surface boundary representation in which the faces are typically planar and the edges are straight lines. Such representations are often associated with efficient data structures (e.g., winged edge, quad edge) that allow fast computation of various geometric and topological properties. Hardware acceleration of polygon rendering is a feature of many computers. [JKS:13.5.1]

surface normal: The direction perpendicular to a surface. For a parametric surface x(u, v), the normal is the unit vector parallel to (∂x/∂u) × (∂x/∂v). For an implicit surface F(x) = 0, the normal is the unit vector parallel to ∇F = [∂F/∂x, ∂F/∂y, ∂F/∂z]. The figure shows the surface normal as defined by the small neighborhood at the point X [TV:A.5]:

(Figure: a surface patch with the normal vector drawn at a point X.)

surface orientation: The convention that decides whether the surface normal or its negation points outside the space bounded by the surface. [JKS:9.2]

surface patch: A surface whose domain is finite. [JKS:13.5.2]

surface reconstruction: The problem of building a surface mesh or B-rep model from unorganized point data. [BM:3.7]

surface reflectance: A description of the manner in which a surface interacts with light. See reflectance. [JKS:9.1.2]

surface roughness characterization: An inspection application where estimates of the roughness of a surface are made, e.g., when inspecting spray-painted surfaces.

surface segmentation: Division of a surface into simpler patches. Given a surface defined over a domain D, determine a partition D = {D_1..n} on which some goodness criteria are well satisfied. For example, it might be required that the maximal distance of a point of each D_i from the best-fit quadric surface is below a threshold. See also range data segmentation. [SQ:8.6]

surface shape classification: The use of curvature information of a surface to classify each point on the surface as locally ellipsoidal, hyperbolic, cylindrical or planar. See also surface class. For example, given a parametric surface x(u, v), the classification function c(u, v) is a mapping from the domain of (u, v) to a set of discrete class labels.

surface shape discontinuity: A discontinuity in the value of a surface shape classification over a surface. For example, a discontinuity in the classification function c(u, v). Another example occurs at the fold edge at point X:

(Figure: a folded surface with a shape discontinuity at the fold point X.)

surface tracking: Identification of the same surface through the frames of a video sequence.

surface triangulation: See surface mesh.

surveillance: An application area of vision concerned with the monitoring of activities in a scene. Typically this will involve at least background modeling and human motion analysis. [WP:Surveillance]

SUSAN corner finder: A popular interest point detector developed by Smith and Brady. Combines the smoothing and central difference stages of a derivative-based operator into a single center-surround comparison. [WP:Corner detection#The SUSAN corner detector]

SVD: See singular value decomposition. [FP:12.3.2]

SVM: See support vector machine. [SQ:14A.2]

swept object representation: A volumetric representation scheme in which 3D objects are formed by sweeping a 2D cross section along an axis or trajectory. A brick can be formed by sweeping a rectangle. Some schemes, like the geon or generalized cylinder representation, allow changes to the size of the cross section and curved trajectories. A cone is defined here by sweeping a circle along a straight axis with a linearly decreasing radius [JKS:15.3.2]:

(Figure: a cone formed by sweeping a circle of linearly decreasing radius along a straight axis.)

symbolic: Inference or computation expressed in terms of a set of symbols rather than a signal. Where a digital signal is a discrete representation of a continuous function, symbols are inherently discrete. For example, an image (signal) is converted to a list of the names of people who appear in it (symbols).

symbolic object representation: Representation of an object by lists of symbolic terms like plane, quadric, corner, or face, etc., rather than the points or pixels of the shape itself. The representation may include the shape and position of the objects, too.

symmetric axis transform (SAT): A transformation that locates all points on the skeleton of a region by identifying those points that are the locus of centers of bitangent circles. See also medial axis skeletonization. In the following example the medial axis is derived from a binary segmentation of a moving subject. [VSN:9.2.2]

(Figure: a binary silhouette of a moving person and its medial axis.)

symmetry: A shape that remains invariant under at least one non-identity transformation from some pre-specified transformation group is symmetric. For example, the set of points comprising an ellipse is the same after the ellipse is subjected to the Euclidean transformation of rotation by 180° about its center. The image of the outline of a surface of revolution under perspective projection is invariant under a certain homography, so the silhouette exhibits a projective symmetry. Affine symmetries are sometimes known as skew symmetries and symmetries induced by reflection about a line are called bilateral symmetries. [SQ:9.3]

symmetry detection: A class of algorithms that search for symmetry in imaged curves, surfaces and point sets.

symmetry line: The axis of a bilateral symmetry. The solid line rectangle has two dashed line symmetry lines [WP:Line symmetry]:

(Figure: a rectangle with its two symmetry lines dashed.)

symmetry plane: The axis of a bilateral symmetry in 3D. The dashed lines show three symmetry planes of this cube:

(Figure: a cube with three of its symmetry planes indicated by dashed lines.)

sync pulse: Abbreviation of synchronization pulse. Any electrical signal that allows two or more electronic devices to share a common time frame. Commonly used to synchronize the capture instants of two cameras in a stereo image capture system. [LG:4.1.1]

syntactic pattern recognition: Object identification by converting an image of the object into a sequence or array of symbols and using grammar parsing techniques to match the sequence of symbols to grammar rules in a database. [RJS:6]

syntactic texture description: Description of texture in terms of grammars of local shapes or image patches and transformation rules. Good for modeling synthetic artificial textures.

synthetic aperture radar (SAR): An imaging device that transmits long-wavelength (in comparison to visible light) radio waves from airborne or space platforms and builds a 2D image of the intensities of the returned reflections. Clouds are transparent at these (centimeter) wavelengths, and the active transmission means that images may be taken at night. The images are captured as a sequence of low-resolution (small aperture) 1D slices as the platform translates across the target area, with a final high-resolution (synthetic [large] aperture) image recoverable via a Fourier transform after all slices have been captured. The time-of-flight of the returned signal determines the distance from the transmitter and therefore, assuming a planar (or known geometry) surface, the pixel location in the cross-path direction. [WP:Synthetic aperture radar]

systolic array: A class of parallel computer in which processors are arranged in a directed graph. The processors synchronously receive data from one set of neighbors (e.g., North and West in a rectangular array), perform a computation, and transmit the computed quantity to another set of neighbors (e.g., South and East). [RJS:8]

T

tabu search: A heuristic search technique that seeks to avoid cycles by forbidding or penalizing moves taking the search to previously visited solution spaces (hence "tabu"). [WP:Tabu search]

tangent angle function: Given a curve (x(t), y(t)), the function θ(t) = tan⁻¹(ẏ(t)/ẋ(t)).

tangent plane: The plane passing through a point on a surface that is perpendicular to the surface normal. [FP:19.1.2]

tangential distortion (lens): A particular lens aberration created, among others, by lens decentering, usually modeled only in high-accuracy calibration systems.

target image: The image resulting from an image processing operation.

(Figure: a source image and the target image that results from processing it.)

target recognition: See automatic target recognition. [WP:Automatic target recognition]

task parallelism: Parallel processing achieved by the concurrent execution of relatively large subsets of a computer program. A large subset might be defined as one whose run time is of the order of tens of milliseconds. The parallel tasks need not be identical; e.g., from a binary image, one task may compute a moment while another computes the perimeter. [WP:Task parallelism]

tee junction: An intersection between line segments (possibly representing edges) where a straight line meets and terminates somewhere along another line segment. See also blocks world. Tee junctions can give useful depth-ordering cues. Here we can hypothesize that surface C lies in front of the surfaces A and B, given the tee junction at p [VSN:4.1.1]:

(Figure: surfaces A, B and C arranged so that the edge between A and B terminates against the boundary of C at the tee junction p.)

telecentric optics: A lens system arranged such that moving the image plane along the optical axis does not change the magnification or image position of imaged world points. One embodiment is to place an aperture in front of the lens so that when an object is imaged off the focal plane of the lens, the center of the (blurred) object is the ray through the center of the aperture, rather than the center of the lens. Placing the aperture at the lens's front focal plane will ensure these rays are parallel after the lens. [WP:Telecentric lens]

(Figure, non-telecentric optics: with no aperture, a world point moving parallel to the focal plane produces an image point that moves on the image plane. Figure, telecentric optics: with an aperture at the front focal plane, the image point remains stationary on the image plane.)

telepresence: Interaction with objects at a location remote from the user via vision or robotic devices. Examples include slaving of remote cameras to the motion of a head-mounted display worn by the user, transmission of audio from the remote location, use of local controls to operate remote machinery, and haptic (i.e., touch) feedback from the remote to the local environment. [WP:Telepresence]

template matching: A strategy for location of an object in an image. The template, a 2D image of the object, is compared with all windows of the same size as the template in the image to be searched. Windows where the difference with the template (as computed by, e.g., normalized cross-correlation or sum of squared differences (SSD)) is within a threshold are reported as instances of the object. Interesting as a brute-force matching strategy. To obtain invariance to scale, rotation or other transformations, the template must be subjected explicitly to the transformations. [FP:25.3.2]
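A brute-force normalized cross-correlation search in the spirit of the entry above (our own sketch; loops are kept explicit for clarity rather than speed):

import numpy as np

def match_template(image, template, threshold=0.9):
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t * t).sum())
    hits = []
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw].astype(float)
            w = w - w.mean()
            denom = t_norm * np.sqrt((w * w).sum())
            if denom > 0 and (w * t).sum() / denom >= threshold:
                hits.append((r, c))      # top-left corners of matching windows
    return hits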
the center of the (blurred) object is the
ray through the center of the aperture, temporal averaging: Any procedure
rather than the center of the lens. for noise reduction in which a signal
Placing the aperture at the lenss front that is known to be static over time is
focal plane will ensure these rays are sampled at different times and the
parallel after the lens. results averaged.
[ WP:Telecentric lens]
temporal representation: A model
representation that encodes the
dynamics of how an objects shape or
Focal plane

3. Image plane
moves
1. World point position can vary over time.

2. No aperture temporal resolution: The frequency


4. Image point
moves
of observations with respect to time
on image plane
Non-telecentric optics (e.g., one per second) as opposed to the
spatial resolution .
Focal plane

3. Image plane 1. World point


moves
temporal stereo: 1) Stereo achieved
2. Aperture through movement of the camera rather
4. Image point
stationary
than use of two separate cameras. 2)
on image plane
Telecentric optics Integration of multiple stereo views of a
262 T

temporal tracking: See tracking. [ FP:17-17.5]

tensor product surface: A parametric representation for a curved surface commonly used in computer modeling and graphics applications. The surface shape is defined by the product of two polynomial (usually cubic) curves in the independent surface coordinates. [ JKS:13.5.3]

terrain analysis: Analysis and interpretation of data representing the shape of the planet's surface. Typical data structures are digital elevation maps or triangulated irregular networks (TINs).

tessellated viewsphere: A division of the viewsphere into distinct subsets of (approximately) equal areas. Often used as a data structure for representation of functions of the form $f(\vec{n})$ where $\vec{n}$ is a unit normal vector in $R^3$. Typically constructed by subdivision of the viewsphere into a polygon mesh such as an icosahedron:

[Figure: a viewsphere tessellated into triangular facets, as in an icosahedral subdivision]

test set: The set used to verify a classifier or other algorithm. The test set contains only examples not included in the training set. [ WP:Test set]

tetrahedral spatial decomposition: A method of decomposing 3D space into packed tetrahedrons instead of the more commonly used rectangular voxel decomposition. A tetrahedral decomposition allows a recursive subdivision of a tetrahedron into eight smaller tetrahedra. This figure illustrates the decomposition with one of the eight smaller volumes shaded:

[Figure: a tetrahedron subdivided into eight smaller tetrahedra, with one of the smaller volumes shaded]

texel: See texture element. [ BB:6.2]

texon: See texture element. [ BB:6.2]

textel: See texture element. [ BB:6.2]

texton: Julesz's 1981 definition of the units in which texture might be perceived. In the texton-based view, a texture is a regular assembly of textons. [ B. Julesz, Textons, the elements of texture perception, and their interactions, Nature, Vol. 290, pp. 91-97, 1981.]

texture: The phenomenon by which uniformity is perceived in regular (etymologically, woven) patterns of (possibly irregular) elements. In computer vision, texture usually refers to the patterns in the appearance or reflectance on a surface. The texture may be regular, i.e., satisfy some texture grammar, or may be statistical, i.e., the distribution of pixel values may vary over the image. Texture could also refer to variations in the local shape on a surface, e.g., its degree of roughness. See also shape texture. [ NA:8.2]
texture-based image retrieval: Content-based image retrieval that uses texture as its classification criterion. [ WP:Content-based image retrieval#Texture]

texture boundary: The boundary between adjacent regions in texture segmentation. The boundary perceived by humans between two regions of different textures. This figure shows the boundary between three regions of different color and shape texture:

[Figure: three textured regions with visible boundaries between them]

texture classification: Assignment of an image (or a window of an image) to one of a set of texture classes. The texture classes are typically defined by presentation of training images representing each class by a human. The basis of texture segmentation. [ NA:8.4]

texture descriptor: A vector valued function computed on an image subwindow that is designed to produce similar outputs when applied to different subwindows of the same texture. The size of the image subwindow controls the scale of the detector. If the response at a pixel position (x, y) is computed as the maximum over several scales, an additional scale output s(x, y) is available. See also texture primitive. [ NA:8.3]

texture direction: The texture gradient or a 90 degree rotation thereof. [ BB:6.5]

texture element (texel): A small geometric pattern that is repeated frequently on some surface resulting in a texture. [ BB:6.2]

texture energy measure: A single-valued texture descriptor with strong response in textured regions. A texture descriptor may be formed by combining the results of several texture energy measures into a vector.

texture enhancement: A procedure analogous to edge-preserving smoothing in which texture boundaries rather than edges are to be preserved.

texture field grouping: See texture segmentation. [ FP:9]

texture field segmentation: See texture segmentation. [ FP:9]

texture grammar: Grammar used to describe textures as instances of simpler patterns with a given spatial relationship (including other textures defined previously in this way). A sentence from this grammar would be a syntactic texture description. [ BB:6.3.1]

texture gradient: The gradient of a single scalar output s(x, y) of a texture descriptor. A common example is the scale output, for homogeneous texture, whose texture gradient can be used to compute the foreshortening direction. [ BB:6.5]

texture mapping: In computer graphics, rendering of a polygonal surface where the surface color at each output screen pixel is obtained by interpolating values from an image, called the texture map. The source image pixel location is computed using correspondences between the polygon's vertex coordinates and texture coordinates on the texture map. [ WP:Texture mapping]
texture matching: Matching of regions based on texture descriptions.

texture model: The theoretical basis for a class of texture descriptor. For example, autocorrelation of linear filter responses, statistical texture descriptions, or syntactic texture descriptions.

texture orientation: See texture gradient. [ BB:6.5]

texture primitive: A basic unit of texture (e.g., a small pattern that is repeated) as used in syntactic texture descriptions.

texture recognition: See texture classification. [ NA:8.4]

texture region extraction: See texture field segmentation. [ FP:9]

texture representation: See texture model.

texture segmentation: Segmentation of an image into patches of coherent texture. This figure shows a region segmented into three regions based on color and shape texture [ FP:9]:

[Figure: an image segmented into three regions of differing color and shape texture]

texture synthesis: The generation of synthetic images of textured scenes. More particularly, the generation of images that appear perceptually to share the texture of a set of training examples of a texture. [ FP:9]

Theil–Sen estimator: A method for robust estimation of curve fits. A family of curves is parameterized by parameters $a_{1..p}$, and is to be fit to data $\vec{x}_{1..n}$. If q is the smallest number of points that uniquely define $a_{1..p}$, then the Theil–Sen estimate of the optimal parameters $a_{1..p}$ is the parameter set that has the median error measure over all the q-point estimates. For example, for line fitting, the number of parameters (slope and intercept, say) is p = 2, and the number of points required to give a fit is also q = 2. Thus the Theil–Sen estimate of the slope is given by the median over the $\binom{n}{q}$ two-point slope estimates. The Theil–Sen estimator is not statistically efficient, nor does it have a particularly high breakdown point, in contrast to such estimators as RANSAC and least median of squares.
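For line fitting (p = q = 2), the estimator reduces to medians over pairwise fits. A minimal sketch, assuming NumPy (our code, using the common median-of-pairwise-slopes formulation rather than an exact transcription of the definition above):

```python
import numpy as np
from itertools import combinations

def theil_sen_line(x, y):
    """Fit y = a*x + b: median of all two-point slope estimates,
    then the median residual as the intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    a = np.median(slopes)
    b = np.median(y - a * x)          # median intercept over all points
    return a, b

x = np.arange(10.0)
y = 2.0 * x + 1.0
y[7] = 50.0                            # one gross outlier
print(theil_sen_line(x, y))            # close to (2.0, 1.0)
```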

thermal noise: In CCD cameras, additional electrons released by thermal vibration in the substrate that are counted with those released by incident photons. Thus, the gray scale values are corrupted by an additive Poisson noise process. [ WP:Thermal noise]

thickening operator: Thickening is a morphological operation that is used to grow selected regions of foreground pixels in binary images, somewhat like dilation or closing. It has several applications, including determining the approximate convex hull of a shape, and determining the skeleton by zone of influence. Thickening is normally only applied to binary images, and it produces another binary image as output. This is an example of thickening six times in the horizontal direction [ AJ:9.9]:

[Figure: a binary region before and after six thickenings in the horizontal direction]

thin plate model: A model of surface smoothness used in the variational approach. The internal energy (or bending energy) of a thin plate represented as a parametric surface (x, y, f(x, y)) is given by

$$f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2.$$

[ FP:26.1.1]

thinning operator: Thinning is a morphological operation that is used to remove selected foreground pixels from binary images, somewhat like erosion or opening. It can be used for several applications, but is particularly useful for skeletonization and to tidy up the output of edge detectors by reducing all lines to single pixel thickness. Thinning is normally only applied to binary images and produces another binary image as output. This is a diagram illustrating the thinning of a region [ JKS:2.5.11]:

[Figure: a binary region and its thinned, single-pixel-wide skeleton]

three view geometry: See trinocular stereo. [ OF:6.9]

three dimensional imaging: Any of a class of techniques that obtain three dimensional information using imaging equipment. 1) 3D volumetric imaging: obtaining measurements of scene properties at all points in a 3D space, including the insides of objects. This is used for inspection, but more commonly for medical imaging. Techniques include nuclear magnetic resonance, computerized tomography, positron emission tomography and single photon emission computed tomography. 2) 3D surface imaging: obtaining surface information embedded in a 3D space. Active techniques generally include a source of structured light (or other electromagnetic or sonar radiation), and a sensor such as a camera or microphone. Either triangulation or time of flight computations allow the distance from the sensor system to be computed. Common technologies include laser scanning, texture projection systems, and moiré fringe methods. Passive 3D imaging depends only on external (and hence unstructured) illumination sources. Examples of such systems are stereo reconstruction and shape from focus techniques.

threshold selection: The automatic choice of threshold values for conversion of a scalar signal (such as a gray scale image) to binary. Often (e.g., Otsu's 1979 method) proceeds by analysis of the histogram of the sample values. Different assumptions about the underlying distributions yield different strategies. [ JKS:3.2.1]
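As an illustration of histogram-based threshold selection, here is a minimal NumPy sketch of Otsu's method (our own implementation, not code from the dictionary): it picks the threshold maximizing the between-class variance of the two resulting populations.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                 # gray level probabilities
    omega = np.cumsum(p)                  # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))    # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2[np.isnan(sigma_b2)] = 0.0    # degenerate thresholds
    return int(np.argmax(sigma_b2))

# Bimodal test data: two populations of gray values.
img = np.concatenate([np.random.randint(0, 80, 500),
                      np.random.randint(170, 255, 500)]).astype(np.uint8)
t = otsu_threshold(img)
print(t, (img > t).mean())   # threshold lands between the two modes
```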

thresholding: Quantization into two values. For example, conversion of a scalar signal (such as a gray scale image) to binary. This figure shows an input image and its thresholded output [ JKS:2.1, 3.2.1]:

[Figure: a gray scale input image and its binary thresholded output]

thresholding with hysteresis: Thresholding of a time-varying scalar signal where the threshold value is a function of previous signal and threshold values. For example, a thermostat control based on temperature receives a signal s(t), and generates an output signal b(t) of the form

$$b(t) = \begin{cases} s(t) > \text{cold} & \text{if } b(t-1) = 0 \\ s(t) < \text{hot} & \text{if } b(t-1) = 1 \end{cases}$$

where the value at time t depends on the previous decision b(t − 1). In computer vision, often associated with the edge following stage of the Canny edge detector. [ NA:4.2.5]
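A minimal sketch of the thermostat example (our code; the threshold values are made up for illustration):

```python
def hysteresis(signal, cold=18.0, hot=22.0):
    """Two-threshold switching: the decision at time t depends on
    the previous decision b(t-1), as in the definition above."""
    b, out = 0, []
    for s in signal:
        b = int(s > cold) if b == 0 else int(s < hot)
        out.append(b)
    return out

temps = [17, 19, 21, 23, 21, 19, 17]
print(hysteresis(temps))   # output follows the printed two-threshold rule
```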
TIFF: Tagged Image File Format. [ SEU:1.8]

tilt: The tilt direction of a 3D surface patch as observed in a 2D image is parallel to the projection of the 3D surface normal into the image. If the 3D surface is represented as a depth map z(x, y) in image coordinates, then the tilt direction at (x, y) is the unit vector parallel to $\left(\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right)$. The tilt angle may be defined as $\tan^{-1}\!\left(\frac{\partial z}{\partial y} \middle/ \frac{\partial z}{\partial x}\right)$. [ FP:9.4.1]

time derivative: A technique for computing how an image sequence changes over time. Typically used as part of shape from motion.

time to collision: See time to contact.

time to contact: From a sequence of images I(t), computation of the value of t at which, assuming constant motion, an image object will intersect the plane parallel to the image plane that contains the camera center. It can be computed even in the absence of metric information about the imaging system, i.e., in an uncalibrated setting.

time to impact: See time to contact.

time-of-flight range sensor: A sensor that computes distance to target points by emitting electromagnetic (or other) radiation and measuring the time between emitting the pulse and observing the reflection of the pulse. [ BM:1.9.2]

tolerance band algorithm: An algorithm for incremental segmentation of a curve into straight line elements. Assume that the current straight line segment defines two parallel boundaries of a tolerance zone at a pre-selected distance from the line segment. When a new curve point leaves the tolerance zone the current line segment is ended and a new segment is started. A tolerance band is illustrated here [ JKS:6.4.2]:

[Figure: a curve inside a tolerance zone, with the exit point where the curve leaves the zone]
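A minimal sketch of the incremental splitting loop (our own formulation, assuming NumPy; the zone test here uses perpendicular distance to the line through the segment's first two points):

```python
import numpy as np

def tolerance_band_segments(points, tol):
    """Split a curve (sequence of 2D points) into straight runs:
    extend the current segment while new points stay within `tol`
    of the line defined by the segment's first two points."""
    pts = np.asarray(points, float)
    segments, start = [], 0
    for i in range(2, len(pts)):
        p0, p1 = pts[start], pts[start + 1]
        d = p1 - p0
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        if abs(np.dot(pts[i] - p0, n)) > tol:  # point leaves the zone
            segments.append((start, i - 1))    # end the current segment
            start = i - 1                      # start a new one here
    segments.append((start, len(pts) - 1))
    return segments

curve = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
print(tolerance_band_segments(curve, tol=0.5))   # [(0, 2), (2, 5)]
```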

tolerance interval: An interval within which a stated proportion of some population will lie. [ WP:Tolerance interval]

Tomasi–Kanade factorization: A maximum-likelihood solution to structure and motion recovery in the situation where points in a static scene are observed by affine cameras and the observed (x, y) positions are corrupted by Gaussian noise. The method depends on the observation that if m points are observed over n views, the 2n × m measurement matrix containing all the observations (after certain transformations have been performed) is of rank 3. The closest rank-3 approximation of the matrix is reliably obtained via the singular value decomposition, after which the 3D points and camera positions are easily extracted, up to an affine ambiguity. [ WP:Tomasi–Kanade factorization]
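A minimal sketch of the rank-3 factorization step (our code, using NumPy's SVD; centering the measurements stands in for the "certain transformations" mentioned above, and the affine ambiguity is left unresolved):

```python
import numpy as np

def factorize(W):
    """W is the 2n x m measurement matrix (n views, m points).
    Returns affine motion M (2n x 3) and shape S (3 x m)."""
    W0 = W - W.mean(axis=1, keepdims=True)      # move centroid to origin
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])               # camera (motion) matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3, :]     # 3D shape, up to affine
    return M, S

# Synthetic test: 4 views x 10 points under random affine projections.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))
W = np.vstack([rng.standard_normal((2, 3)) @ X for _ in range(4)])
M, S = factorize(W)
print(np.allclose(M @ S, W - W.mean(axis=1, keepdims=True)))   # True
```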

tomography: A technique for the reconstruction of a 3D volumetric dataset based on a number of 2D slices. The most common examples occur in medical imaging (e.g., nuclear magnetic resonance, positron emission tomography). [ WP:Tomography]

top-down: A reasoning approach that searches for evidence for high-level hypotheses in the data. For example, a hypothesize-and-test algorithm might have a strategy for making good guesses as to the position of circles in an image and then compare the hypothesized circles to edges in the image, choosing those that have good support. Another example is a human body recognizer that employs body part recognizers (e.g., heads, legs, torso) that, in turn, either directly use image data or recognize even smaller subparts. [ BB:10.4.2]

top hat operator: A morphological operator used to remove structures from images. The top-hat filtering of image I by structuring element S is the difference I − open(I, S), where open(I, S) is the morphological opening of I by S.
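A minimal sketch using SciPy's grayscale morphology (our example; scipy.ndimage provides grey_opening):

```python
import numpy as np
from scipy import ndimage

def top_hat(image, size=5):
    """White top-hat: image minus its morphological opening.
    Keeps bright structures smaller than the structuring element."""
    opened = ndimage.grey_opening(image, size=(size, size))
    return image - opened

img = np.zeros((20, 20))
img[8:10, 10:12] = 1.0        # small bright blob: kept by the top hat
img[0:15, 0:8] = 0.5          # large region: removed (survives the opening)
th = top_hat(img)
print(th[8, 10], th[2, 2])    # 1.0  0.0
```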

topological representation: Any representation that encodes connectedness of elements. For example, in a surface boundary representation comprising faces, edges and vertices, the topology of the representation is the list of face–edge and edge–vertex connections, which is independent of the geometry (or spatial positions and sizes) of the representation. In this case, the fundamental relation is "bounded by", so a face is bounded-by one or more edges, and an edge is bounded-by zero or more vertices.

topology: 1) Properties of point sets (such as surfaces) that are unchanged by continuous reparameterizations (homeomorphisms) of space. 2) The connectedness of objects in discrete geometry (see topological representation). One speaks of the topology of a network, meaning the set of connections within the network, or equivalently the set of neighborhood relationships that describe the network. [ RJS:6]

torsion: A concept in the differential geometry of curves formally representing the intuitive notion of the local twisting of a 3D curve as you move along the curve. The torsion τ(t) of a 3D space curve $\vec{c}(t)$ is the scalar

$$\tau(t) = -\vec{n}(t) \cdot \frac{d\vec{b}}{dt}(t) = \frac{[\dot{\vec{c}}(t),\ \ddot{\vec{c}}(t),\ \dddot{\vec{c}}(t)]}{\|\ddot{\vec{c}}(t)\|^2}$$

where $\vec{n}(t)$ is the curve normal and $\vec{b}(t)$ the binormal. The notation $[\vec{x}, \vec{y}, \vec{z}]$ denotes the scalar triple product $\vec{x} \cdot (\vec{y} \times \vec{z})$. [ FP:19.1.1]

torus: 1) The volume swept by moving a sphere along a circle in 3D. 2) The surface of such a volume. [ WP:Torus]

total variation: A class of regularizer in the variational approach. The total variation regularizer of a function $f(\vec{x}): R^n \mapsto R$ is of the form $R(f) = \int_\Omega |\nabla f(\vec{x})|\, d\vec{x}$ where Ω is (a subset of) the domain of f. [ WP:Total variation]

tracking: A means of estimating the parameters of a dynamic system. A dynamic system is characterized by a set of parameters (e.g., feature point positions, target object positions, human joint angles) evolving over time, of which we have measurements (e.g., photographs of the human) obtained at successive time instants. The task of tracking is to maintain an estimate of the probability distribution over the model parameters, given the measurements, as well as a priori models of how the parameters change over time. Common algorithms for tracking include the Kalman filter and particle filters. Tracking may be viewed as a class of algorithms that operate on sequences of inputs, using assumptions about the coherence of successive inputs to improve performance of the algorithm. Often the task of the algorithm may be cast as estimation of a state vector (a set of parameters such as the joint angles of a human body) at successive time instants t. The state vector $\vec{x}(t)$ is to be estimated using a set of sensors that yield observations, $\vec{z}(t)$, such as the 2D positions of bright spots attached to a human. In the absence of temporal coherence assumptions, $\vec{x}$ must be estimated at each time step solely using the information in $\vec{z}(t)$. With coherence assumptions, the system uses the set of all observations so far $\{\vec{z}(\tau),\ \tau < t\}$ to compute the estimate at time t. In practice, the estimate of the state is represented as a probability density over all possible values, and the current estimate uses only the previous state estimate $\vec{x}(t-1)$ and the current measurements $\vec{z}(t)$ to estimate $\vec{x}(t)$. [ FP:17-17.5]
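As a minimal concrete instance of the predict/update loop described above, here is a 1D constant-velocity Kalman filter sketch (our code and parameter choices, assuming NumPy; real trackers differ mainly in the choice of state and models):

```python
import numpy as np

# State x = (position, velocity); we measure position only.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition model
H = np.array([[1.0, 0.0]])               # measurement model
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.25]])                   # measurement noise covariance

x = np.zeros(2)                          # state estimate
P = np.eye(2)                            # state covariance

rng = np.random.default_rng(1)
for t in range(20):
    z = 0.7 * t + rng.normal(0, 0.5)     # noisy position measurement
    # predict from the previous estimate
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the current measurement
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(x)   # position near 0.7*19, velocity near 0.7
```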

traffic analysis: Analysis of video data of automobile traffic, e.g., to identify number plates, detect accidents, detect congestion, compute throughput, etc.

training set: The set of labeled examples used to learn the parameters of a classifier. In order to build an effective classifier, the training set should be representative of the examples that will be encountered in the eventual domain of application. [ WP:Training set]

trajectory: The path that a moving point makes over time. It could also be the path that a whole object takes if less precision of usage is desired. [ WP:Trajectory]

trajectory estimation: Determination of the 3D trajectory of an object observed in a set of 2D images.

transformation: A mapping of data in one space (such as an image) into another space (e.g., all image processing operations are transformations). [ WP:Transformation (function)]

translation: A transformation of Euclidean space that can be represented in the form $\vec{x} \mapsto T(\vec{x}) = \vec{x} + \vec{t}$. In projective space, a transformation that leaves the plane at infinity pointwise invariant. [ BB:A1.7.4]

translation invariant: A property that keeps the same value even if the data, scene or the image from which the data comes is translated. The distance between two points is translation invariant.

translucency: The transmission of light through a diffusing interface such as frosted glass. Light entering a translucent material has multiple possible exit directions. [ WP:Translucence]

transmittance: Transmittance is the ratio of the (outgoing) power transmitted by a transparent object to the incident (incoming) power. [ WP:Transmittance]

transparency: The property of a surface to be traversed by radiation (e.g., by visible light), so that objects on the other side can be seen. A non-transparent surface is called opaque. [ WP:Transparency and translucency]

tree classifiers: A classifier that applies a sequence of binary tests to input points $\vec{x}$ in order to determine the label l of the class to which it belongs.

tree search method: A class of algorithms to optimize a function defined on tuples of values taken from a finite set. The tree describes the set of all such tuples, and the order in which tuples are explored is defined by the particular search algorithm. Examples are depth-first, breadth-first, A* and best-first search. Applications include the interpretation tree. [ WP:Tree search algorithm]

triangulated models: See surface mesh. [ JKS:13.5.1]

triangulation: See Delaunay triangulation, surface triangulation, stereo triangulation, structured light triangulation. [ HZ:9.1]

trifocal tensor: The geometric entity that relates the images of 3D points observed in three perspective 2D views. Algebraically represented as a 3 × 3 × 3 array of values $T_i^{jk}$. If a single 3D point projects to x, x′, x″ in the first, second, and third views respectively, it must obey the nine equations

$$x^i\, x'^j\, x''^k\, \epsilon_{jpr}\, \epsilon_{kqs}\, T_i^{pq} = 0_{rs}$$

for r and s varying from 1 to 3. In the above, ε is the epsilon-tensor, for which

$$\epsilon_{ijk} = \begin{cases} +1 & ijk \text{ an even permutation of } 123 \\ 0 & \text{two of } i, j, k \text{ equal} \\ -1 & ijk \text{ an odd permutation of } 123 \end{cases}$$

As this equation is linear in the elements of T, it can be used to estimate them given enough 2D point correspondences x, x′, x″. As not all 3 × 3 × 3 arrays represent realizable camera configurations, estimation must also incorporate several nonlinear constraints on the tensor elements. [ FP:10.2]

trilinear constraint: The geometric constraint on three views of a point (i.e., the intersection of three epipolar lines). This is similar to the epipolar constraint which is applied in the two view scenario. [ FP:10.2.1-10.2.3]

trilinear tensor: Another name for the trifocal tensor. [ FP:10.2]

trilinearity: An equation in a set of three variables in which holding two of the variables fixed yields a linear equation in the remaining one. For example xyz = 0 is trilinear in x, y and z, while x² = y is not, as holding y fixed yields a quadratic in x. [ HZ:14.2.1]

trinocular stereo: A multiview stereo process that uses three cameras. [ OF:6.9]

tristimulus theory of color perception: The human visual system has three types of cones, with three different spectral response curves, so that the perception of any incident light is represented as three intensities, roughly corresponding to long (maximum about 558–580 nm), medium (531–545 nm) and short (410–450 nm) wavelengths. [ WP:CIE 1931 color space#Tristimulus values]

tristimulus values: The relative amounts of the three primary colors that need to be combined to match a given color. [ AJ:3.8]

true negative: A hypothesis which is false that has been correctly rejected. [ VSN:3.1.1]

true positive: A hypothesis which is true that has been correctly accepted. [ VSN:3.1.1]

truncated median filter: An approximation to mode filtering when image neighborhoods are small. The filter sharpens blurred image edges as well as reducing noise. The algorithm truncates the local distribution on the mean side of the median and then recomputes the median of the new distribution. The algorithm can iterate and, under normal circumstances, converges approximately to the mode even if the observed distribution has very few samples with no obvious peak. [ ERD:3.4]

tube camera: See tube sensor. [ LG:2.1.3]

tube sensor: A tube sensor converts light to a video signal using a vacuum tube with a photoconductive window. Once the only type of light sensor, the tube camera is now largely superseded by the CCD, but remains useful for some high dynamic range imaging. The image orthicon tube or "immy" is remembered in the name of the US Academy of Television Arts and Sciences' Emmy awards. [ LG:2.1.3]

twist: A 3D rotation representation component that specifies a rotation about the vector defined by the azimuth and elevation. This figure shows the twist rotation direction:

[Figure: axes showing the azimuth and elevation angles, and the twist rotation about the resulting vector]

twisted cubic: The curve $(1, t, t^2, t^3)$ in projective 3-space, or any projective transformation thereof. The general form is thus

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} \begin{pmatrix} 1 \\ t \\ t^2 \\ t^3 \end{pmatrix}$$

The projection of a twisted cubic into a 2D image is a rational cubic spline. [ HZ:2.3]

two view geometry: See binocular stereo. [ JKS:12.6]

type I error: A hypothesis which is true that has been rejected. [ VSN:3.1.1]

type II error: A hypothesis which is false that has been accepted. [ VSN:3.1.1]
U

ultrasonic imaging: Creation of images by the transmission and recording of reflected ultrasonic pulses. A phased array of transmitters emits a set of pulses, and then records the returning pulse intensities. By varying the relative timings of the pulses, the returned intensities can be made to correspond to locations in space, allowing measurements to be taken from within ultrasonic-transparent materials (including the human body, excluding air and bone). [ FP:18.6.1]

ultrasound sequence registration: Registration of overlapping ultrasound images. [ FP:18.6]

ultraviolet: Description of electromagnetic radiation with wavelengths between about 300–420 nm (near ultraviolet) and 40–300 nm (far ultraviolet). The short wavelengths make it useful for fine-scale examination of surfaces. Ordinary glass is opaque to UV radiation; quartz glass is transparent. Often used to excite fluorescent materials. [ EH:3.6.5]

umbilic: A point on a surface where the curvature is the same in every direction. Every point on a sphere is an umbilic point. [ JKS:13.3.2]

umbra: The completely dark area of a shadow caused by a particular light source (i.e., where no light falls from the light source). [ FP:5.3.2]

[Figure: a light source and an occluder producing regions of no shadow, fuzzy shadow, and umbra (complete shadow)]

uncalibrated approach: See uncalibrated vision.

uncalibrated stereo: Stereo reconstruction performed without precalibration of the cameras. Particularly, given a pair of images taken by unknown cameras, the fundamental matrix is computed from point correspondences, after which the images may be rectified and conventional calibrated stereo may proceed. The results of uncalibrated stereo are 3D points in a projective coordinate system, rather than the Euclidean coordinate system that a calibrated setup admits.

uncalibrated vision: The class of vision techniques that require no quantitative information about the camera used in capturing the images on which they operate. For example, techniques that can be applied on archive footage. In particular, applied to geometric problems such as stereo reconstruction that traditionally required that the images be from a camera system upon which calibration measurements had been previously made. Uncalibrated approaches include those, such as uncalibrated stereo, where the traditional calibration step is replaced by procedures that can use image features directly, and others, such as time-to-contact computations, that can be expressed in ways that factor out the calibration parameters. In general, uncalibrated systems will have degrees of freedom that cannot be measured, such as overall scale, or projective ambiguity.

uncertainty representation: A strategy for representation of the probability density of a variable as used in a vision algorithm. In a similar manner, an interval can be used to represent a range of possible values. [ VSN:8.3]

under-segmented: Describing the output of a segmentation algorithm. Given an image where a desired segmentation result is known, the algorithm under-segments if regions output by the algorithm are generally the union of many desired regions. This image should be segmented into three regions but it was under-segmented into two regions [ SQ:8.7]:

[Figure: an image whose three true regions were merged into only two segmented regions]

uniform distribution: A probability distribution in which a variable can take any value in the given range with equal probability. [ WP:Uniform distribution (continuous)]

uniform illumination: An idealized configuration in which the arrangement of lighting within a scene is such that each point receives the same amount of light energy. In computer vision, sometimes uniform illumination has a different meaning: that each point in an image of the scene (or a part thereof such as the background) has similar imaged intensity.

uniform noise: Additive corruption of a sampled signal. If the signal's samples are $s_i$ then the corrupted signal is $\tilde{s}_i = s_i + n_i$, where the $n_i$ are uniformly randomly drawn from a specified range $[-\epsilon, \epsilon]$.

uniqueness stereo constraint: When performing stereo matching or stereo reconstruction, matching can be simplified by assuming that points in one image correspond to only one point in other images. This is generally true, except at object boundaries and other places where pixels are not completely opaque. [ OF:6.2.2]

unit ball: An n dimensional sphere of radius one. [ VSN:2.2]

unit quaternion: A quaternion is a 4-vector $\vec{q} \in R^4$. Quaternions of unit length can be used to parameterize 3D rotation matrices. Given a quaternion with components $(q_0, q_1, q_2, q_3)$ the corresponding rotation matrix R is (letting $S = q_0^2 - q_1^2 - q_2^2 - q_3^2$):

$$R = \begin{pmatrix} S + 2q_1^2 & 2q_1q_2 + 2q_0q_3 & 2q_3q_1 - 2q_0q_2 \\ 2q_1q_2 - 2q_0q_3 & S + 2q_2^2 & 2q_2q_3 + 2q_0q_1 \\ 2q_3q_1 + 2q_0q_2 & 2q_2q_3 - 2q_0q_1 & S + 2q_3^2 \end{pmatrix}$$

The identity rotation is given by the quaternion (1, 0, 0, 0). The rotation axis is the unit vector parallel to $(q_1, q_2, q_3)$. [ WP:Quaternion]
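A minimal NumPy transcription of the matrix above (our code; it assumes the quaternion is already unit length):

```python
import numpy as np

def quat_to_rot(q0, q1, q2, q3):
    """Rotation matrix from a unit quaternion, as in the entry above."""
    S = q0*q0 - q1*q1 - q2*q2 - q3*q3
    return np.array([
        [S + 2*q1*q1, 2*q1*q2 + 2*q0*q3, 2*q3*q1 - 2*q0*q2],
        [2*q1*q2 - 2*q0*q3, S + 2*q2*q2, 2*q2*q3 + 2*q0*q1],
        [2*q3*q1 + 2*q0*q2, 2*q2*q3 - 2*q0*q1, S + 2*q3*q3]])

print(quat_to_rot(1, 0, 0, 0))             # identity rotation
R = quat_to_rot(np.cos(0.5), np.sin(0.5), 0, 0)
print(np.allclose(R @ R.T, np.eye(3)))     # True: R is orthonormal
```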

unit vector: A vector of length one. [ WP:Unit vector]

unitary transform: A reversible transformation (e.g., the discrete Fourier transform). U is a unitary matrix where $U^* U = I$, where $U^*$ is the adjoint (conjugate transpose) matrix and I is the identity matrix. [ AJ:2.7, 5.2]

unrectified: When a stereo camera pair has not been rectified. [ JKS:12.5]

unsharp operator: An image enhancement operator that sharpens edges by adding a high pass filtered version of an image to itself. The high pass filter is implemented by subtracting a smoothed version of the image, yielding

$$I_{\text{unsharp}} = I + (I - I_{\text{smooth}})$$

This shows an input image and its unsharped output [ RJS:4]:

[Figure: an input image and the sharpened result of unsharp masking]
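A minimal sketch with a Gaussian as the smoothing filter (our choice of smoother and of the optional gain parameter; scipy.ndimage assumed):

```python
import numpy as np
from scipy import ndimage

def unsharp(image, sigma=2.0, amount=1.0):
    """Sharpen by adding back the high-pass residual I - I_smooth."""
    smooth = ndimage.gaussian_filter(image, sigma)
    return image + amount * (image - smooth)

# A blurry step edge becomes steeper after unsharp masking.
edge = ndimage.gaussian_filter(np.repeat([0.0, 1.0], 16), 3.0)
sharp = unsharp(edge)
print(float(np.max(np.abs(np.diff(sharp)))) >
      float(np.max(np.abs(np.diff(edge)))))   # True
```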
unsupervised classification: See clustering. [ AJ:9.14]

unsupervised learning: A method for training a neural network or other classifier where the network learns to recognize patterns (in a training set) automatically. See also supervised learning. [ SQ:14.1.3]

updating eigenspace: Algorithms for the incremental updating of eigenspace representations. These algorithms facilitate approaches such as active learning.

USB camera: A camera conforming to the USB (Universal Serial Bus) standard. [ WP:USB camera]
V

validation: Testing whether or not some hypothesis is true. See also hypothesize and verify. [ WP:Validation]

valley: A dark elongated object in a gray scale image, so called because it corresponds to a valley in the image viewed as a 3D surface or elevation map of intensity versus image position.

valley detection: An image processing operator (see also bar detector) that enhances linear features rather than light-to-dark edges. See also valley.

value quantization: When a continuous number is encoded as a finite number of integer values. A common example of this occurs when a voltage or current is encoded as integers in the range 0–255.

vanishing line: The 2D line that is the image of the intersection of a 3D plane with the plane at infinity. The horizon line in an image is the image of the intersection of the ground plane with the plane at infinity, just as a pair of railway lines meeting in a vanishing point is the intersection of two parallel lines and the plane at infinity. This sketch shows the vanishing line for the ground plane with a road and railroad [ HZ:7.6]:

[Figure: a road and a railroad converging to vanishing points that lie on the vanishing line of the ground plane]

vanishing point: The image of the where the truth term measures
point at infinity where two parallel 3D fidelity to the data and the beauty
lines meet. A pair of parallel 3D lines term is a regularizer. These can be seen
are represented as ~a + ~n and ~b + ~n. in a specific example: smoothing. In
The vanishingpoint is the image of the the conventional approach, smoothing
~n might be considered the result of an
3D direction . This sketch shows
0 algorithm: convolve the image with a
the vanishing points for a road and Gaussian kernel. In the variational
railroad [ TV:6.2.3]: approach, the smoothed signal P is the
signal that best trades off smoothness,
measured as R the square of the second
VANISHING POINT derivative (P (t))2 dt, and fidelity to
VANISHING LINE the data, measured as the squared
difference
R between the input and the
output (P (t) I(t))2 dt, with the
balance chosen by a parameter :
Z
E(P ) = (P (t) I(t))2 + (P (t))2 dt

variational method: See


variable focus: A camera system with variational approach .
a lens system that allows zoom to be
changed under user or program control. variational problems: See
An image sequence in which focal variational approach .
length varies through the sequence.
vector field: A multi-valued function
variational approach: Signal f~ : Rn 7 Rm . For example, the
processing expressed as a problem of 2D-to-2D function f~(x, y) = (y, sin x)
variational calculus. The input signal is illustrated below. An RGB image
a function I(t) on the interval I(x, y) = (r(x, y), g(x, y), b(x, y)) is an
t [1, 1]. The processed signal is a example of a 2D-to-3D vector field.
function P defined on the same [ WP:Vector field]
interval, that minimizes an energy
functional E(P ) of the form
[y, sin( x)]
Z 1
E(P ) =
f (P (t), P (t), I(t))dt.
1

The calculus of variations shows that


the minimizing P is the solution to the
associated EulerLagrange equation
f d f
=
P dt P
In computer vision, the functional is
often of the form
Z
E = truth(P, I) + beauty(P )
276 V

vector quantization: Representation of a set of vectors by associating each possible vector with one of a small set of codebook vectors. For example, each pixel in an RGB image has $256^3$ possible values, but one might expect that a particular image uses only a small subset of these values. If a 256-element colormap is computed, and each RGB value is represented by the nearest RGB vector in the colormap, the RGB space has been quantized into 256 elements. [ FP:26.3]
object recognition problem where the controlare looking at the same scene
task is to identify vehicles in video point.
imagery.
verification: In the context of
vehicle license/number plate object recognition , a class of algorithms
analysis: When a visual system aiming to test the validity of various
locates the license plate in a video hypotheses (models) explaining the
image and then recognizes the data. Back projection is such a
characters. technique, typically used with
geometric models. See also
vehicle tracking: An example of the object verification . [ JKS:15.1]
tracking problem applied to images of
vehicles. [ FP:17.5.1] vertex: A point at the end of a line
(edge) segment. Often vertices are
velocity: Rate of change of position. common to two or more line segments.
Generally, for a curve ~x(t) Rn the [ AL:10.2]
velocity is the n-vector ~xt(t)
[ BB:3.6.1] video: 1) Generic term for a set of
images taken at successive instants with
velocity field: The image velocity of small time intervals between them. 2)
each point in an image. See also The analogue signal emitted by a
optical flow field . [ RJS:5] video camera . Each frame of video
corresponds to about 40 ms of electrical
velocity moment: A moment that signal that encodes the start of each
integrates information about region scan line, the image encoding of each
velocity as well as position and shape video scan line, and synchronization
[i]
distribution. Let mpq be the pq th information. 3) A video recording.
central moment of a binary region in [ SEU:1.4]
the ith image. Then the Cartesian
velocity moments are defined as video annotation: The association of
[i]
yi yi1 )s mpq , symbolic objects, such as text
i1 )r (
P
vpqrs = i ( xi x
where (xi , yi ) is the center of mass in descriptions or index terms with frames
the ith image. of video.
V 277

video camera: A camera that records a sequence of images over time. [ FP:1]

video coding: The conversion of video to a digital bitstream. The source may be analogue or digital. Generally, coding also compresses or reduces the bitrate of the video data. [ WP:Video coding]

video compression: Video coding with the specific aim of reducing the number of bits required to represent a video sequence. Examples include MPEG, H.263, and DIVX. [ WP:Video compression]

video indexing: Video annotation with the aim of allowing queries of the form "At what frame did event x occur?" or "Does object x appear?".

video rate system: A real time system that operates at the frame rate of the ambient video standard. Typically 25 or 30 frames per second, 50 or 60 fields per second.

video restoration: Application of image restoration to video, often making use of the temporal coherence of video, or correcting for video-specific degradations.

video segmentation: Application of segmentation to video, 1) with the requirement that the segmentation exhibit the temporal coherence in the original footage and 2) to split the video sequence into different groups of consecutive frames, e.g., when there is a change of scene.

video sequence: See video. [ WP:Video]

video transmission format: A description of the precise form of the analog video signal coding conventions in terms of duration of components such as number of lines, number of pixels, front porch, sync and blanking. [ WP:Moving image formats#Transmission]

vidicon: A type of tube camera, successor of the image orthicon tube. [ RJS:8]

view based object recognition: Recognition of 3D objects using multiple 2D images of the objects rather than a 3D model.

view combination: A class of techniques combining prototype views linearly to form appearance models. See also appearance model, eigenspace based recognition, prototype, representation.

view volume: The infinite volume of 3D space bounded by the camera's center of projection and the edges of the viewable area on the image plane. The volume might also be bounded near and far by other planes because of focusing and depth of field constraints. This figure illustrates the view volume [ JKS:8.4]:

[Figure: the view volume extending from the center of projection through the image plane]

viewer centered representation: A representation of the 3D world that an observer (e.g., robot or human) maintains. In the viewer centered version, the global coordinate system is maintained on the observer, and the representation of the world changes as the observer moves. Compare object centered representation. [ TV:10.6.2]

viewing space: The set of all possible locations from which an object or scene could be viewed. Typically these locations are grouped to give a set of typical or characteristic views of the object. If orthographic projection is used, then the full 3D space of views can be simplified to a viewsphere.

viewpoint: The position and orientation of the camera when an image was captured. The viewpoint may be expressed in absolute coordinates or relative to some arbitrary coordinate system, in which case the relative position of the camera and the scene (or other cameras) is the relevant quantity.

viewpoint consistency constraint: Lowe's term for the concept that a 3D model matched to a set of 2D line segments must admit at least one 3D camera position that projects the 3D model to those lines. Essentially, the 3D and 2D data must allow pose estimation.

viewpoint dependent representations: See viewer centered representation. [ TV:10.6.2]

viewpoint planning: Deciding where an active vision system will look next, in order to maximize the likelihood of achieving some preset goal. A common example is computing the location of a range sensor in several successive positions in order to gain a complete 3D model of a target object. After n pictures have been captured, the viewpoint planning problem is to choose the position of picture n + 1 in order to maximize the amount of new data acquired, while ensuring that the new position will allow the new data to be registered to the n existing images.

viewsphere: The set of camera positions from which an object can be observed. If the camera is orthographic, the viewsphere is parameterized by the 2D set of points on the 3D unit sphere. At the camera position corresponding to a particular point on the viewsphere, all images of the object due to camera rotation are related by a 2D-to-2D image transformation, i.e., no parallax effects occur. See aspect graph. The placement of a camera on the viewsphere is illustrated here:

[Figure: a camera placed on the viewsphere surrounding an object]

vignetting: Darkening of the corners of an image relative to the image center, which is related to the degree to which the points are off the optical axis. [ AL:3.11]

virtual bronchoscopy: Creation of virtual views of the pulmonary system based on, e.g., magnetic resonance imaging as a replacement for endoscope imaging. [ WP:Bronchoscopy]

virtual endoscopy: Simulation of a traditional endoscopy procedure using virtual reality representation of physiological data such as that obtained by an X-ray CAT-scan or magnetic resonance imaging.

virtual reality: The use of computer graphics and other interaction tools to confer on a user the sensation of being in, and interacting with, an alternative environment. This includes simulation of visual, aural, and haptic cues. Common ways in which the visual environment is displayed are: rendering a 3D model of the world into a head-mounted display whose viewpoint is tracked in 3D so that the user's head movements generate images corresponding to their viewpoint; placing the user in a computer augmented virtual environment (CAVE), where as much as possible of the user's field of view can be manipulated by the controlling computer. [ WP:Virtual reality]

virtual view: Visualization of a model from a particular viewpoint.

viscous model: A deformable model based on the concept of a viscous fluid (i.e., a fluid with a relatively high resistance to flow). [ WP:Viscous]

visible light: Description of electromagnetic radiation with wavelengths between about 400 nm (blue) and 700 nm (red), corresponding to the range to which the rods and cones of the human eye are sensitive. [ WP:Visible spectrum]

visibility: Whether or not a particular feature is visible from a camera position.

visibility class: The set of points where exactly the same portion of an object or scene is visible. For example, when viewing the corner of a cube, an observer can move about in about one-eighth of the full viewing space before entering a new visibility class.

visibility locus: All camera positions from which a particular feature is visible.

VISIONS: The early scene understanding system of Hanson and Riseman.

visual attention: The process by which low level feature detection directs high level scene analysis and object recognition strategies. In humans, the results of the process are evident in the pattern of fixations and saccades in normal observation of the world. [ WP:Attention]

visual cortex: A part of the brain dedicated to the processing of visual information. [ WP:Visual cortex]

visual hull: A space carving method for approximating shape from multiple images. The method finds the silhouette contours of a given object in each image. The region of space defined by each camera and the associated image contour imposes a constraint on the shape of the target object. The visual hull is the intersection of all such constraints. As more views are taken, the approximation becomes better. See the shaded areas in this figure [ FP:26.4]:

[Figure: silhouette cones from multiple views intersecting to approximate the object shape (shaded areas)]

visual illusion: The perception of a scene, object or motion not corresponding to the world actually causing the image or sequence. Illusions are caused, in general, by the combination of special arrangements of the visual stimuli, viewing conditions, and responses of the human vision system. Well-known examples include the Ames room (two persons are seen as having very different heights in a seemingly normal room) and the Ponzo illusion:

[Figure: the Ponzo illusion, in which two equal segments drawn between converging lines appear to have different lengths]

Here two equal segments seem to be of different lengths when interpreted as 3D projections. The well-known ambiguous figure–background drawings of Gestalt psychology (see Gestalt), like the famous chalice–faces pattern, are a related subject. [ CS:3.2]

visual industrial inspection: The use of computer vision techniques in order to effect quality control or to control processes in an industrial setting. [ WP:Visual inspection]

visual inspection: A general term for analyzing a visual image to inspect some item, such as might be used for quality control on a production line. [ WP:Visual inspection]

visual learning: The problem of learning visual models from sets of images (examples), or in general knowledge that can be used to carry out vision tasks. An area of the vast field of automated learning. Important applications employing visual learning include face recognition and image database indexing. See also unsupervised learning, supervised learning.

visual localization: The problem of estimating the location of a target in space given one or more images of it. Solutions differ according to several factors including the number of input images (one, as in model based pose estimation; multiple discrete images, as in stereo vision; or video sequences, as in motion analysis), and the a priori knowledge assumed (i.e., camera calibration available or not, full perspective or simplified projection model, geometric model of target available or not).

visual navigation: The problem of navigating (steering) a robot through an environment using visual data, typically video sequences. It is possible, under diverse assumptions, to determine the distance from obstacles, the time-to-contact, and the shape and identity of the objects in view. Both video and range sensors have been used, including acoustic sensors (see sonar). See also visual servoing, visual localization.

visual routine: Ullman's 1984 term for a subcomponent of a visual system that performs a specific task, analogous to a behavior in robotics. [ WP:Visual routine]

visual salience: A (numerical) assessment of the degree to which pixels or areas of a scene attract visual attention. See also the principle of Gestalt organization. [ WP:Salience (neuroscience)]

visual search: The task of searching an image for a particular prespecified object. Often used as an experimental tool in psychophysics. [ WP:Visual search]

visual servoing: Robot control via motions that make the image of, e.g., the robot end effector coincide with the image of the target position. Typically, the system has little or no a priori knowledge of the camera locations, their relation to the robot, or the robot kinematics. These parameters are learned as the robot moves. Visual servoing allows the calibration to change during robot operation. Such systems can adapt well to anomalous conditions, such as an arm bending under a load or motor slippage, or where calibration may not provide sufficient precision to allow the desired actions to be reliably produced purely from the modeled robot kinematics and dynamics. Because only image measurements are available, the inverse kinematic problem may be harder than in conventional servoing. [ WP:Visual Servoing]

visual surveillance: Surveillance dependent only on the use of electromagnetic sensors. [ WP:Surveillance#Types of surveillance]

volume: 1) A region of 3D space. A subset of $R^3$. A (possibly infinite) 3D point set. 2) The space bounded by a closed surface. [ VSN:9.2]

volume detection: The detection of volume-shaped entities in 3D data sets, such as might be produced by a nuclear magnetic resonance scanner.

volume matching: Identification of correspondence between objects or subsets of objects defined using a volumetric representation.

volume skeletons: The skeletons of 3D point sets, by extension of the definitions for 2D curves or regions.

volumetric image: A voxmap or 3D array of points where each entry typically represents some measure of material density or other property in 3D space. Common examples include computerized tomography and nuclear magnetic resonance data.

volumetric reconstruction: Any of several techniques that derive a volumetric representation from image data. Examples include X-ray tomography, space carving and visual hull computation. [ BM:3.7.3]

volumetric representation: A data structure by means of which a subset of 3D space is represented digitally. Examples include voxmap, octree and the space bounded by surface representations. [ VSN:9.2]

Voronoi cell: See Voronoi diagram. [ SQ:7.3.2]

Voronoi diagram: Given n points $\vec{x}_{1..n}$, the Voronoi diagram of the point set is a partition of space into n regions or cells $R_{1..n}$. Every point p in cell $R_i$ is closer to point $\vec{x}_i$ than to any other $\vec{x}$. The hyperplanes separating the Voronoi regions are the perpendicular bisectors of the edges in the Delaunay triangulation of the point set. The Voronoi diagram of these four points is the four cells surrounding them [ SQ:7.3.2]:

[Figure: four points and the four Voronoi cells surrounding them]

voxel: From "volume element", a region of 3D space, named by analogy with pixel. Usually voxels are axis-aligned rectangular solids or cubes. A component of the voxmap representation for 3D volumes. A voxel, like a pixel, may have associated attributes such as color, occupancy, or the density of some measurement at that point. [ FP:21.3.3]

voxmap: A volumetric representation that describes a 3D volume by dividing space into a regular grid of voxels, arranged as a 3D array v(i, j, k). For a boolean voxmap, cell (i, j, k) intersects the volume iff v(i, j, k) = 1. The advantages of the representation are that it can represent arbitrarily complex topologies and is fast to look up. The major disadvantage is the large memory usage, addressed by the octree representation.

VRML: Virtual Reality Markup Language. A means of defining 3D geometric models intended for Internet delivery. [ WP:Vrml]
W

walkthrough: A classification of the infinite number of paths between two points into one of nine equivalence classes: the eight relative directions between the points plus a ninth having no movement. Point B is in equivalence class 2 relative to A:

[Figure: the eight direction classes, numbered 1 to 8, arranged around point A; point B lies in direction class 2]

Walsh function: The Walsh functions of order n are a particular set of square waves $W(n, k): [0, 2^n) \mapsto \{-1, 1\}$ for k from 1 to $2^n$. They are orthogonal, and the product of Walsh functions is a Walsh function. The square waves transition only at integer lattice points so each function can be specified by the vector of values it takes on the points $\{\frac{1}{2}, 1\frac{1}{2}, \ldots, 2^n - \frac{1}{2}\}$. The collection of these values for a given order n is the Hadamard matrix $H_{2^n}$ of order $2^n$. The two functions of order 1 are the rows of

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

and the four of order 2 (depicted below) are

$$H_4 = \begin{pmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

In general, the functions of order n + 1 are generated by the relation

$$H_{2^{n+1}} = \begin{pmatrix} H_{2^n} & H_{2^n} \\ H_{2^n} & -H_{2^n} \end{pmatrix}$$

and this recurrence is the basis of the fast Walsh transform. The four Walsh functions of order 2 are [ SEU:2.5.3]:

[Figure: the four order-2 Walsh functions plotted as square waves on the interval [0, 4)]
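The recurrence above is a two-line function (our NumPy sketch, not code from the dictionary):

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order 2**n via the recurrence above.
    Its rows are the sampled Walsh functions of order n."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

H4 = hadamard(2)
print(H4)              # matches the order-2 matrix in the entry above
print(H4 @ H4.T)       # 4 * identity: the rows are orthogonal
```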
Walsh transform: Expression of a $2^n$-element vector v in terms of a basis of order-n Walsh functions; multiplication by the corresponding Hadamard matrix. The Walsh transform has applications in image coding, logic design and the study of genetic algorithms. [ SEU:2.5.3]

Waltz line labeling: A scheme for the interpretation of line images of polyhedra in blocks-world images. Each image line is labeled to indicate what class of scene edge gave rise to it: concave, convex, occluding, crack or shadow. By including the constraints supplied by junction labeling in a constraint satisfaction problem, Waltz demonstrated that collections of lines whose labels were locally ambiguous could be globally disambiguated. This is a simple example of Waltz line labeling showing concave edges (−), convex edges (+) and occluding edges (>) [ BB:9.5.3]:

[Figure: a line drawing of a block with its edges labeled +, − and >]
warping: Transformation of an image by reparameterization of the 2D plane. Given an image $I(\vec{x})$ and a 2D-to-2D mapping $w: \vec{x} \mapsto \vec{x}'$, the warped image $W(\vec{x})$ is $I(w(\vec{x}))$. Warping functions w are often designed so that certain control points $\vec{p}_{1..n}$ in the source image are mapped to specified locations $\vec{p}\,'_{1..n}$ in the destination image. See also image morphing.

[Figure: the original image $I(\vec{x})$; the warping function represented by arrows joining points $\vec{x}$ to $w^{-1}(\vec{x})$; and the warped image $W(\vec{x})$] [ SB:7.10]
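A minimal sketch of the inverse-warping loop with nearest-neighbor sampling (our code): each destination pixel $\vec{x}$ looks up the source pixel $w(\vec{x})$, which is why the arrows in the figure run through $w^{-1}$.

```python
import numpy as np

def warp(image, w):
    """W(x) = I(w(x)): each output pixel samples the source at w(x)."""
    H, Wd = image.shape
    out = np.zeros_like(image)
    for r in range(H):
        for c in range(Wd):
            sr, sc = w(r, c)                    # source coordinates
            sr, sc = int(round(sr)), int(round(sc))
            if 0 <= sr < H and 0 <= sc < Wd:
                out[r, c] = image[sr, sc]       # nearest-neighbor sample
    return out

img = np.arange(25.0).reshape(5, 5)
print(warp(img, lambda r, c: (r - 1.0, c)))     # content shifts down one row
```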
watermark: See digital watermarking. [ WP:Watermark]

watershed segmentation: Image segmentation by means of the watershed transform. A typical implementation proceeds thus: 1. Detect edges; 2. Compute the distance transform D of the edges; 3. Compute watershed regions in D.

[Figure: (a) original image; (b) Canny edges; (c) distance transform; (d) region boundaries of the watershed transform of (c); (e) mean color in watershed regions; (f) regions overlaid on the image] [ SB:10.10]
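A sketch of the three-step recipe above, assuming scikit-image and SciPy are available (our code; API details may vary between library versions):

```python
import numpy as np
from scipy import ndimage
from skimage.feature import canny
from skimage.segmentation import watershed

def watershed_segmentation(gray):
    """1. Detect edges; 2. distance transform of the edges;
    3. watershed regions of the (negated) distance map."""
    edges = canny(gray, sigma=2.0)
    # distance from every pixel to the nearest edge pixel
    D = ndimage.distance_transform_edt(~edges)
    # flood from the local minima of -D, i.e. points farthest from edges
    return watershed(-D)

# Toy image: two bright squares on a dark background.
img = np.zeros((64, 64))
img[8:28, 8:28] = 1.0
img[36:56, 36:56] = 1.0
labels = watershed_segmentation(img)
print(labels.min(), labels.max())   # a small number of labeled regions
```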
watershed transform: A tool for morphological image segmentation. The watershed transform views the image as an elevation map, with each local minimum in the map given a unique integer label. The watershed transform of the image assigns to each non-minimum pixel, p, the label of the minimum to which a drop of water would fall if placed at p. Points on ridges or watersheds of the elevation map, that could fall into one of two minima, are called watershed points, and the set of pixels surrounding each minimum that share its label are called watershed regions. Efficient algorithms exist for the computation of the watershed transform. [ SB:10.10]

[Figure: an image with its minima superimposed; the same image viewed as a 3D elevation map; and the watershed transform, where different minima give differently colored regions, watershed pixels are shown in white, and one particular watershed is indicated by arrows]

wavelength: The wavelength of a wave is the distance between successive peaks. Denoted λ, it is the wave's speed divided by the frequency. Electromagnetic waves, particularly visible light, are often important in computer vision, with wavelengths of the order of 400–700 nm. [ EH:2.2]

wavelet: A function ψ(x) that has certain properties that mean it can be used to derive a set of basis functions in terms of which other functions can be approximated. Comparing to the Fourier transform basis functions, note that they can be viewed as a set of scalings and translations of $f(x) = \sin(x)$, for example $\cos(3x) = \sin(3x + \frac{\pi}{2}) = f\!\left(\frac{6x + \pi}{2}\right)$. Similarly, a wavelet basis is made from a mother wavelet ψ(x) by translating and scaling: each basis function $\psi_{jk}(x)$ is of the form $\psi_{jk}(x) = \text{const} \cdot \psi(2^j x - k)$. The conditions on ψ ensure that different basis functions (i.e., with different j and k) are orthonormal. There are several popular choices (e.g., by Haar and Daubechies) for ψ, that trade off various desirable properties, such as compactness in space and time, and ability to approximate certain classes of functions.

[Figure: the mother Haar wavelet and some of the derived wavelets $\psi_{j,k}$] [ S. G. Mallat, A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:674-693, 1989.]

wavelet descriptor: Description of a shape in terms of the coefficients of a wavelet decomposition of the original signal, in a manner similar to Fourier shape descriptors for 2D curves. See also wavelet transform. [ FP:22.5.2]

wavelet transform: Representation of a signal in terms of a basis of wavelets. Similar to the Fourier transform, but as the wavelet basis is a two-parameter family of functions $\psi_{jk}$, the wavelet transform of a d-D signal is a (d + 1)-D function. However, the number of distinct values needed to represent the transform of a discrete signal of length n is just O(n). The wavelet transform has similar applications to the Fourier transform, but the wavelet basis offers advantages when representing natural signals such as images. [ SEU:2.5.5]

transform of a discrete signal of length weight associated. The weights might


n is just O(n). The wavelet transform specify the confidence or quality of the
has similar applications to the Fourier data item. The use of weights can help
transform, but the wavelet basis offers make the estimation more robust .
advantages when representing natural [ WP:Weighted least squares#Weighted least squares]
signals such as images. [ SEU:2.5.5]

weak perspective: An approximation weighted walkthrough: A discrete


of viewing geometry between the measure of the relative position of two
pinhole or full perspective camera and regions. The measure is a histogram of
the orthographic imaging model. The the walkthrough relative positions of
projection of a homogeneous 3D point every pair of points selected from the
X~ = (X, Y, Z, 1) is given by the two regions.
formula
weld seam tracking: Using visual
x p11 p12 p13 p14 ~ feedback to control a robot welding
= X
y p21 p22 p23 p24 device, so it maintains the weld along
the desired seam.
for the affine camera , but with the
additional constraint that the vectors white balance: A system of color
(p11 , p12 , p13 ) and (p21 , p22 , p23 ) are correction to deal with differing light
scaled rows of a rotation matrix , i.e., conditions, in order for white objects to
appear white. [ WP:Color balance]
p11 p21 + p12 p22 + p13 p23 = 0
white noise: A noise process in which
[ TV:2.2.4] the noise power at all frequencies is
equal (as compared to pink noise ).
weakly calibrated stereo: Any When considering spatially distributed
two-view stereo algorithm for which noise, white noise means that there is
the only calibration information needed distortion at all spatial frequencies
is the fundamental matrix between the (i.e., large distortions as well as small).
cameras is said to be weakly calibrated. [ WP:White noise]
In the general, multi-view, case, means
the camera calibration is known up to a whitening filter: See
projective ambiguity . Weakly noise-whitening filter. [ AJ:6.2]
calibrated systems cannot determine
Euclidean properties such as absolute wide angle lens: A lens with a field
scale but will return results that are of view greater than about 45 degrees.
projectively equivalent to the Euclidean Wide angle lenses allow more
reconstructions. [ FP:10.1.5] information to be collected in a single
Weber's Law: If a difference can be just perceived between two stimuli of values I and I + ΔI, then it should be possible to perceive a difference between two stimuli with different values J and J + ΔJ, where ΔI/I = ΔJ/J. [ AJ:3.2]
weighted least squares: A least square error estimation process in which the data elements also have a weight associated. The weights might specify the confidence or quality of the data item. The use of weights can help make the estimation more robust . [ WP:Weighted least squares#Weighted least squares]
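For instance, fitting a line y = ax + b by weighted least squares minimizes the sum of w_i (y_i − a x_i − b)^2 over the data. A minimal Python sketch (an illustration added here, not from the dictionary's sources) via the weighted normal equations:

    import numpy as np

    def weighted_line_fit(x, y, w):
        # Solve the weighted normal equations A^T W A p = A^T W y.
        A = np.column_stack([x, np.ones_like(x)])  # design matrix
        W = np.diag(w)                             # weight matrix
        return np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([0.1, 1.1, 1.9, 9.0])   # the last point is an outlier
    w = np.array([1.0, 1.0, 1.0, 0.01])  # low confidence gets low weight
    print(weighted_line_fit(x, y, w))    # dominated by the three inliers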
weighted walkthrough: A discrete measure of the relative position of two regions. The measure is a histogram of the walkthrough relative positions of every pair of points selected from the two regions.

weld seam tracking: Using visual feedback to control a robot welding device, so that it maintains the weld along the desired seam.

white balance: A system of color correction to deal with differing light conditions, in order for white objects to appear white. [ WP:Color balance]
expected to look significantly different
in the second image due to windowing: Looking at a small
foreshortening , occlusion, and lighting portion of a signal or image through a
effects. window. For example, given the
vector ~x = {x1 , . . . , x100 }, one might
wide field-of-view: Where the optics look at the window of 11 values
is designed to capture light rays centered around 50, {x45..55 }. Often
forming large angles (say 60 degrees or used in order to restrict some
more) with the optical axis. See also computation such as the
wide angle lens , Fourier transform to a small part of the
panoramic image mosaic , image. In general, windowing is
panoramic image stereo , described by a windowing function,
plenoptic function representation . which is multiplied by the signal to give
the windowed signal. For example, a
width function: Given a 2D shape (a closed subset of the plane) S ⊂ R^2, the width function w(θ) is the width of the shape as a function of orientation. Specifically, the projection P(θ) := {x cos θ + y sin θ | (x, y) ∈ S}, and w(θ) := max P(θ) − min P(θ).
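A direct Python sketch of this definition, for a shape sampled as a set of boundary points (an illustration added here):

    import numpy as np

    def width_function(points, thetas):
        # w(theta) = max P(theta) - min P(theta), where P projects the
        # points onto the direction (cos theta, sin theta).
        pts = np.asarray(points, dtype=float)
        proj = (pts[:, 0, None] * np.cos(thetas)
                + pts[:, 1, None] * np.sin(thetas))
        return proj.max(axis=0) - proj.min(axis=0)

    square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # unit square corners
    thetas = np.linspace(0.0, np.pi, 5)
    print(width_function(square, thetas))      # 1 at 0, sqrt(2) at 45 degrees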
Wiener filter: A regularized inverse convolution filter. Given a signal g that is known to be the convolution of an unknown signal f and a known corrupting signal k, it is desired to undo the effect of k and recover f. If (F, G, K) are the respective Fourier transforms of (f, g, k), then G = FK, so the inverse filter can recover F = G/K. In practice, however, G is corrupted by noise, so that when an element of K is less than the average noise level, the noise is amplified. Wiener's filter combats this tendency by adding an estimate of the noise to the divisor. Because the divisor is complex, a real formulation is as follows:

F = \frac{G}{K} = \frac{G\bar{K}}{K\bar{K}} = \frac{G\bar{K}}{|K|^2}

and adding the frequency domain noise estimate N, we obtain the Wiener reconstruction of F given G and K [ AJ:8.3]:

F = \frac{G\bar{K}}{|K|^2 + N}
windowing: Looking at a small portion of a signal or image through a window. For example, given the vector x = {x_1, . . . , x_100}, one might look at the window of 11 values centered around 50, {x_45, . . . , x_55}. Often used in order to restrict some computation such as the Fourier transform to a small part of the image. In general, windowing is described by a windowing function, which is multiplied by the signal to give the windowed signal. For example, a signal f(x) : R^n → R and windowing function w(σ; x) are given, where σ controls the scale or width of w. Then the windowed signal is

f_w(x) = f(x) w(σ; x − c)

where c is the center of the window. The Bartlett (1 − |x|/σ), Hanning (1/2 + (1/2) cos(π|x|/σ)), and Gaussian (exp(−|x|²/σ²)) windowing functions in 2D are shown here [ SOS:7.1.3]:
[Figure: Bartlett, Hanning, and Gaussian windowing functions in 2D, not reproduced.]
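A Python sketch of windowing a 1D signal with the Gaussian window above (an illustration added here):

    import numpy as np

    def gaussian_window(sigma, x):
        # Gaussian windowing function exp(-|x|^2 / sigma^2).
        return np.exp(-(x ** 2) / sigma ** 2)

    x = np.arange(100.0)
    f = np.sin(0.2 * x) + np.sin(0.5 * x)   # signal to be windowed
    c, sigma = 50.0, 5.0                    # window center and width
    fw = f * gaussian_window(sigma, x - c)  # the windowed signal
    # fw is near zero away from x = 50, so a Fourier transform of fw
    # analyzes only the structure of f near the window center.
    print(np.abs(fw[:40]).max(), np.abs(fw[45:55]).max())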
winged edge representation: A graph representation for polyhedra in which the nodes represent vertices, edges and faces. Faces point to bounding edge nodes, which point to vertices, which point back to connecting edges, which point to adjacent faces. The winged edge term comes from the fact that edges have four links that connect to the previous and successor edges around each of the two faces that contain the given edge, as seen here [ JKS:13.5.1]:
[Figure: the CURRENT EDGE between FACE 1 and FACE 2, with the LINKED EDGES (the wings) at each end, not reproduced.]
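A minimal sketch of the record kept for each edge (the field names are our own; a full implementation also needs vertex and face records holding one incident edge each):

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class WingedEdge:
        # One edge node in a winged edge representation.
        tail: Any                  # start vertex
        head: Any                  # end vertex
        face_left: Any             # face on the left of the edge
        face_right: Any            # face on the right of the edge
        # The four "wings": previous/successor edges in the cycle
        # around each of the two adjacent faces.
        prev_left: Optional["WingedEdge"] = None
        next_left: Optional["WingedEdge"] = None
        prev_right: Optional["WingedEdge"] = None
        next_right: Optional["WingedEdge"] = None

    def face_edges(start, face):
        # Walk the edge cycle around a face by following wing pointers.
        edges, e = [], start
        while True:
            edges.append(e)
            e = e.next_left if e.face_left is face else e.next_right
            if e is start:
                return edges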
winner-takes-all: A strategy whereby only the best candidate (e.g., algorithm, solution) is chosen, and any other is abandoned. Commonly found in the neural network and learning literature. [ WP:Winner-take-all]
wire frame representation: A representation of 3D geometry in terms of vertices and edges linking the vertices. It does not include descriptions of the surface between the edges and, in particular, does not include information for hidden line removal. This is a wire frame model of a cube [ BT:8]:
[Figure: wire frame model of a cube, not reproduced.]

world coordinates: A coordinate system useful for placing objects in a scene . Usually this is a 3D coordinate system with some arbitrarily placed origin (e.g., at a corner of a room). This contrasts with object centered representations, viewer centered representations or camera coordinates . [ JKS:1.4.2]
X
X-ray: Electromagnetic radiation of shorter wavelength than ultraviolet light, i.e., less than about 4–40 nm. Very short X-rays are called gamma rays. Useful for medical imaging because of their power to penetrate most materials, and for other areas such as lithography because of the short wavelength. [ EH:3.6.6]
X-ray CAT/CT: Computed axial tomography or computer-assisted tomography . A technique for dense 3D imaging of the interior of a material, particularly the human body. Characterized by use of an X-ray source and imaging system that rotate around the object being scanned. [ RN:10.3.4]
xnor operator: A combination of two binary images A, B where each pixel (i, j) in A xnor B is 0 if exactly one of A(i, j) and B(i, j) is 1. The output is the complement of the xor operator. The rightmost image is the xnor of the two left images [ SB:3.2.2]:
[Figure: two binary input images and their xnor, not reproduced.]

xor operator: A combination of two binary images A, B where each pixel (i, j) in A xor B is 1 if exactly one of A(i, j) and B(i, j) is 1. The output is the complement of the xnor operator. The rightmost image is the xor of the two left images [ SB:3.2.2]:
[Figure: two binary input images and their xor, not reproduced.]
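Both operators are one-liners on boolean arrays; a numpy sketch (an illustration added here):

    import numpy as np

    A = np.array([[0, 1], [1, 0]], dtype=bool)
    B = np.array([[0, 1], [0, 1]], dtype=bool)

    xor_image = np.logical_xor(A, B)        # 1 where exactly one input is 1
    xnor_image = np.logical_not(xor_image)  # complement of the xor

    print(xor_image.astype(int))   # [[0 0], [1 1]]
    print(xnor_image.astype(int))  # [[1 1], [0 0]]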
Y
YARF: Yet Another Road Follower. A Carnegie Mellon University autonomous driving system. [ K. Kluge and C. Thorpe, The YARF system for vision-based road following, Mathematical and Computer Modelling, Vol. 22, pp. 213-233, 1995.]
yaw: A 3D rotation representation component (along with pitch and roll ) often used for cameras or moving observers. The yaw component specifies a rotation about a vertical axis to give a side-to-side change in orientation. This figure shows the yaw rotation direction [ JKS:12.2.1]:
[Figure: arrow showing the YAW DIRECTION about the vertical axis, not reproduced.]
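Assuming a coordinate frame with y vertical (conventions vary), yaw by an angle θ is a rotation about that axis; a Python sketch (added for illustration):

    import numpy as np

    def yaw_matrix(theta):
        # Rotation by theta about the vertical (here, y) axis.
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[  c, 0.0,   s],
                         [0.0, 1.0, 0.0],
                         [ -s, 0.0,   c]])

    # A point straight ahead moves sideways under a 90 degree yaw.
    print(yaw_matrix(np.pi / 2) @ np.array([0.0, 0.0, 1.0]))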
YCrCb: See YUV where U=Cb and V=Cr. [ SEU:1.7.3]

YIQ: Color space used in NTSC television. Separates Luminance (Y) and two color signals: In-phase (roughly orange/blue), and Quadrature (roughly purple/green). Conversion to YIQ from RGB is by the row-vector product [Y, I, Q] = [R, G, B] M, where [ BB:2.2.5]

M = \begin{pmatrix} 0.299 & 0.596 & 0.212 \\ 0.587 & -0.275 & -0.523 \\ 0.114 & -0.321 & 0.311 \end{pmatrix}
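A sketch of the conversion with the color triples as row vectors (an illustration added here):

    import numpy as np

    # Columns hold the Y, I, and Q coefficients respectively.
    M = np.array([[0.299,  0.596,  0.212],
                  [0.587, -0.275, -0.523],
                  [0.114, -0.321,  0.311]])

    def rgb_to_yiq(rgb):
        # Convert an N x 3 array of RGB rows to YIQ rows.
        return np.asarray(rgb, dtype=float) @ M

    print(rgb_to_yiq([[1.0, 1.0, 1.0]]))  # white: Y = 1, I and Q near 0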
YUV: A color representation system in which each point is represented by luminance (Y) and two chrominance channels (U, which is scaled Blue minus Y, and V, which is scaled Red minus Y). [ WP:YUV]
Z
Zernike moment: The dot product of an image with one of the Zernike polynomials. The Zernike polynomial

U_n^m(ρ, θ) = R_{nm}(ρ) e^{imθ}

is defined in polar coordinates (ρ, θ) on the plane, only within the unit disk. When projecting an image, data outside the unit disk are generally ignored. The real and imaginary parts are called the even and odd polynomials respectively. The radial function R_{nm}(t) is given by

R_{nm}(t) = \sum_{l=0}^{(n-m)/2} (-1)^l \frac{(n-l)!}{l! \left(\frac{n+m}{2}-l\right)! \left(\frac{n-m}{2}-l\right)!} \, t^{n-2l}

The Zernike polynomials have a history in optics, as basis functions for modeling nonlinear lens distortion . Below, the leftmost column shows the real and imaginary parts of e^{imθ} for m = 1. Columns 2–4 show the real and imaginary parts of Zernike polynomials U_1^1, U_3^1, and U_2^2 [ AJ:9.8]:
[Figure: real and imaginary parts of e^{iθ} and of U_1^1, U_3^1, and U_2^2, not reproduced.]
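A direct Python sketch of the radial function above (an illustration added here; n − m must be even and non-negative):

    from math import factorial

    def zernike_radial(n, m, t):
        # Radial function R_nm(t) of the Zernike polynomial.
        assert n >= m >= 0 and (n - m) % 2 == 0
        total = 0.0
        for l in range((n - m) // 2 + 1):
            num = (-1) ** l * factorial(n - l) * t ** (n - 2 * l)
            den = (factorial(l) * factorial((n + m) // 2 - l)
                   * factorial((n - m) // 2 - l))
            total += num / den
        return total

    print(zernike_radial(2, 0, 1.0))  # R_20(t) = 2t^2 - 1, so R_20(1) = 1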
zero crossing operator: A class of feature detector that, rather than detecting maxima in the first derivative, detects zero crossings in the second derivative. An advantage of finding zero crossings rather than maxima is that the edges always form closed curves, so that regions are clearly delineated. A disadvantage is that noise is enhanced, so the image must be carefully smoothed before the second derivative is computed. A common kernel that combines smoothing and second derivative computation is the Laplacian of Gaussian . [ FP:8.3.1]
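A sketch of zero crossing detection on a Laplacian of Gaussian filtered image, using scipy's gaussian_laplace for the combined smoothing and second derivative (an illustration added here):

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def zero_crossings(image, sigma=2.0):
        # Mark pixels where the LoG response changes sign relative to
        # the right or lower neighbor.
        log = gaussian_laplace(image.astype(float), sigma)
        edges = np.zeros(log.shape, dtype=bool)
        edges[:, :-1] |= np.sign(log[:, :-1]) != np.sign(log[:, 1:])
        edges[:-1, :] |= np.sign(log[:-1, :]) != np.sign(log[1:, :])
        return edges

    img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0  # a bright square
    print(zero_crossings(img).sum())  # pixels on the closed contour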
zero crossings of the Laplacian of a Gaussian: See zero crossing operator. [ FP:8.3.1]
zipcode analysis: See postal code analysis .

zoom: 1) To change the effective focal length of a camera in order to increase magnification of the center of the field of view. 2) Used in referring to the current focal-length setting of a zoom lens . [ AJ:7.4]
zoom lens . [ AJ:7.4]
Specifically, the kernel
zoom lens: A lens that allows the Dz(i, j, k) = S(i, j)c(k), and the kernels
v v
effective focal length (or zoom ) to for x and y are permutations of Dz
be varied after manufacture. Zoom given by Dx(i, j, k) = Dz(j, k, i) and
lenses may be manipulated manually or Dy(i, j, k) = Dz(k, i, j). [ BB:3.3.3]
electrically. [ WP:Zoom lens]
Zucker-Hummel operator: A convolution kernel for surface detection in volumetric images. There is one 3 × 3 × 3 kernel for each of the three derivatives. For example, if v(x, y, z) is the volume image, v_z is computed as the convolution with the kernel whose three slices are [−S, 0, S], where S is the 2D smoothing kernel

S = \begin{pmatrix} a & b & a \\ b & 1 & b \\ a & b & a \end{pmatrix}

and a = 1/√3 and b = 1/√2. Specifically, the kernel Dz(i, j, k) = S(i, j)c(k) with c = (−1, 0, 1), and the kernels for v_x and v_y are permutations of Dz given by Dx(i, j, k) = Dz(j, k, i) and Dy(i, j, k) = Dz(k, i, j). [ BB:3.3.3]
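A Python sketch constructing the three kernels as defined above (an illustration added here):

    import numpy as np

    a, b = 1 / np.sqrt(3), 1 / np.sqrt(2)
    S = np.array([[a, b, a],
                  [b, 1, b],
                  [a, b, a]])              # 2D smoothing kernel
    c = np.array([-1.0, 0.0, 1.0])         # derivative weights along z

    Dz = S[:, :, None] * c[None, None, :]  # Dz(i,j,k) = S(i,j) c(k)
    Dx = Dz.transpose(2, 0, 1)             # Dx(i,j,k) = Dz(j,k,i)
    Dy = Dz.transpose(1, 2, 0)             # Dy(i,j,k) = Dz(k,i,j)
    print(Dz.shape, Dx.shape, Dy.shape)    # three 3x3x3 kernels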
Zuniga-Haralick operator: A corner detection operator that is based on the coefficients of a cubic polynomial approximating the local neighborhood.