Perception & Sensing in Robotic Mobility and Manipulation

Gregory D. Hager
Laboratory for Computation, Sensing, and Control
Department of Computer Science
Johns Hopkins University

The Role of Perception in RMM

Where am I relative to the world?
  sensors: vision, stereo, range sensors, acoustics
  problems: scene modeling/classification/recognition
  integration: localization/mapping algorithms (e.g. SLAM)

What is around me?
  sensors: vision, stereo, range sensors, acoustics, sounds, smell
  problems: object recognition, structure from x, qualitative modeling
  integration: collision avoidance/navigation, learning

The Role of Perception in RMM

How can I safely interact with the environment (including people!)?
  sensors: vision, range, haptics (force+tactile)
  problems: structure/range estimation, modeling, tracking, materials, size, weight, inference
  integration: navigation, manipulation, control, learning

How can I solve new problems (generalization)?
  sensors: vision, range, haptics, undefined new sensor
  problems: categorization by function/shape/context/??
  integration: inference, navigation, manipulation, control, learning

Topics Today

Techniques
  Computational Stereo
  Feature detection and matching
  Motion tracking and visual feedback

Applications in Robotics:
  Obstacle detection, environment interaction
  Mapping, registration, localization, recognition
  Manipulation

What is Computational Stereo?

Viewing the same physical point from two different viewpoints allows depth to be recovered by triangulation.

Computational Stereo

Much of geometric vision is based on information from 2 (or more) camera locations
  hard to recover 3D information from a single 2D image without extra knowledge
  motion and stereo (multiple cameras) are both common in the world

Stereo vision is ubiquitous in nature
  (oddly, nearly 10% of people are stereo blind)

Stereo involves the following three problems:
1. calibration
2. matching (correspondence problem)
3. reconstruction (reconstruction problem)
Binocular Stereo System: Geometry

GOAL: Passive 2-camera system using triangulation to generate a depth map of a world scene.

Depth map: $z = f(x, y)$, where $x, y$ are coordinates in one of the image planes and $z$ is the height above the respective image plane.

Note that for stereo systems which differ only by an offset in x, the v coordinates (projection of y) are the same in both images!

Note we must convert from image (pixel) coordinates to external coordinates -- requires calibration

Non-verged Binocular Stereo System

Assume: images are scan-line aligned

[Figure: camera centers at (0,0) and (b,0) along the X axis, optical axis Z, image plane at Z = f, image coordinates XL and XR]

From perspective projection:
  $x_L = s_x X / Z$
  $x_R = s_x (X - b) / Z$
  $y_L = y_R = s_y Y / Z$

Define disparity:
  $D = x_L - x_R$

so that
  $Z = b s_x / D$

4 intrinsic parameters convert from pixel to metric values: $s_x$, $s_y$, $c_x$, $c_y$

Stereo-System Accuracy

$Z = b s_x / D$

To increase resolution:
  Increase the baseline (b): limited by the size of the system
  Increase the focal length (f): reduces the field of view
  Decrease the pixel size ($1/s_x$): limited by the resolution of the camera

Two-Camera Geometry

It is not hard to show that when we rotate the cameras inward, corresponding points no longer lie on a scan line.
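The accuracy trade-offs listed under Stereo-System Accuracy follow from differentiating the disparity relation: $|dZ/dD| = Z^2 / (b s_x)$, so depth resolution degrades quadratically with range and improves with a longer baseline or a larger $s_x$. The sketch below is only an illustration; the function names and the numeric values (12 cm baseline, $s_x$ = 700 pixels) are assumptions, not values from the slides.

```python
def depth_from_disparity(baseline, sx, disparity):
    """Depth Z = b * sx / D for a non-verged, scan-line-aligned stereo pair."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return baseline * sx / disparity


def depth_resolution(baseline, sx, depth, disparity_step=1.0):
    """Approximate depth change caused by a disparity change of `disparity_step`.

    Differentiating Z = b * sx / D gives |dZ/dD| = Z**2 / (b * sx).
    """
    return depth ** 2 / (baseline * sx) * disparity_step


if __name__ == "__main__":
    b, sx = 0.12, 700.0  # hypothetical: 12 cm baseline, focal length of 700 pixels
    for d in (42.0, 21.0, 10.5):
        z = depth_from_disparity(b, sx, d)
        step = depth_resolution(b, sx, z)
        print(f"D = {d:5.1f} px  ->  Z = {z:.2f} m, about {step:.3f} m per pixel of disparity")
```

Doubling the baseline or $s_x$ halves the depth step at a given range, which is the trade-off the three bullets describe.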

How to Change Epipolar Geometry

Image rectification is the computation of an image as seen by a rotated camera.

[Figure: original image plane and new image plane]

Fundamental Matrix Derivation

[Figure: point P seen as Pl and Pr from two cameras separated by T, with $P_r = R (P_l - T)$]

Note that E is invariant to the scale of the points, therefore we also have
  $p_r^T E p_l = 0$
where p denotes the (metric) image projection of P.

Now if K denotes the internal calibration, converting from metric to pixel coordinates, we have further that
  $r_r^T K^{-T} E K^{-1} r_l = r_r^T F r_l = 0$
where r denotes the pixel coordinates of p. F is called the fundamental matrix.
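A small numeric check can make the derivation concrete. The sketch below assumes the standard form $E = R [T]_\times$ (which is consistent with $P_r = R (P_l - T)$), the same intrinsic matrix K for both cameras, and made-up values for R, T, K and the 3D point; none of these numbers come from the slides.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross_matrix(t):
    """Skew-symmetric matrix S such that S v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def epipolar_residuals(P_l, R, T, K, K_inv):
    """Return (p_r^T E p_l, r_r^T F r_l) for one 3D point in left-camera coordinates."""
    P_r = matvec(R, [a - b for a, b in zip(P_l, T)])   # P_r = R (P_l - T)
    E = matmul(R, cross_matrix(T))                      # assumed form E = R [T]x
    F = matmul(matmul(transpose(K_inv), E), K_inv)      # F = K^-T E K^-1
    p_l = [c / P_l[2] for c in P_l]                     # metric image projections
    p_r = [c / P_r[2] for c in P_r]
    r_l, r_r = matvec(K, p_l), matvec(K, p_r)           # pixel coordinates
    return dot(p_r, matvec(E, p_l)), dot(r_r, matvec(F, r_l))


def demo_residuals():
    """Hypothetical camera pair: small rotation about Y, mostly horizontal offset."""
    c, s = math.cos(0.1), math.sin(0.1)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T = [0.2, 0.0, 0.01]
    K = [[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]]
    K_inv = [[1 / 700.0, 0.0, -320.0 / 700.0], [0.0, 1 / 700.0, -240.0 / 700.0], [0.0, 0.0, 1.0]]
    return epipolar_residuals([0.3, -0.2, 4.0], R, T, K, K_inv)


if __name__ == "__main__":
    print(demo_residuals())  # both residuals should be zero up to rounding error
```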
Stereo-Based Reconstruction

Correspondence Problem:
How to find corresponding areas of two camera images (points, line segments, curves, regions)

MATCHING AND CORRESPONDENCE

Two major approaches
  feature-based
  region based

In feature-based matching, the idea is to pick a feature type (e.g. edges), define a matching criterion (e.g. orientation and contrast sign), and then look for matches within a disparity range.

Results - Reconstruction

[Figure: reconstruction results]

MATCHING AND CORRESPONDENCE

Two major approaches
  feature-based
  region based

In region-based matching, the idea is to pick a region in the image and attempt to find the matching region in the second image by optimizing some measure:
1. normalized SSD
2. SAD
3. normalized cross-correlation

Match Metric Summary

MATCH METRIC: DEFINITION

Normalized Cross-Correlation (NCC):
  $\frac{\sum_{u,v} (I_1(u,v) - \bar{I}_1)(I_2(u+d,v) - \bar{I}_2)}{\sqrt{\sum_{u,v} (I_1(u,v) - \bar{I}_1)^2 \cdot \sum_{u,v} (I_2(u+d,v) - \bar{I}_2)^2}}$

Sum of Squared Differences (SSD):
  $\sum_{u,v} (I_1(u,v) - I_2(u+d,v))^2$

Normalized SSD:
  $\sum_{u,v} \left( \frac{I_1(u,v) - \bar{I}_1}{\sqrt{\sum_{u,v} (I_1(u,v) - \bar{I}_1)^2}} - \frac{I_2(u+d,v) - \bar{I}_2}{\sqrt{\sum_{u,v} (I_2(u+d,v) - \bar{I}_2)^2}} \right)^2$

Sum of Absolute Differences (SAD):
  $\sum_{u,v} |I_1(u,v) - I_2(u+d,v)|$

Zero Mean SAD:
  $\sum_{u,v} |(I_1(u,v) - \bar{I}_1) - (I_2(u+d,v) - \bar{I}_2)|$

Rank:
  $I'_k(u,v) = \sum_{m,n} [I_k(m,n) < I_k(u,v)]$
  $\sum_{u,v} |I'_1(u,v) - I'_2(u+d,v)|$

Census:
  $I'_k(u,v) = \mathrm{BITSTRING}_{m,n}(I_k(m,n) < I_k(u,v))$
  $\sum_{u,v} \mathrm{HAMMING}(I'_1(u,v), I'_2(u+d,v))$

Remember, NCC and normalized SSD are actually the same (normalized SSD = 2 - 2 NCC).

Correspondence Search Algorithm

[Figure: images I1 and I2, window at (u, v) in I1 and at offset d in I2]

  % assumes a similarity score to be maximized (e.g. NCC, which lies in [-1, 1])
  for i = 1:nrows
    for j = 1:ncols
      best(i,j) = -1
      for k = mindisparity:maxdisparity
        c = ComputeMatchMetric(I1(i,j), I2(i,j+k), winsize)
        if (c > best(i,j))
          best(i,j) = c
          disparities(i,j) = k
        end
      end
    end
  end

O(nrows * ncols * disparities * winx * winy)
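The brute-force search above can be sketched in plain Python. This version is an illustration under stated assumptions: images are lists of rows, the cost (SAD or SSD) is minimized rather than a similarity maximized, and `window`, `disparity_map` and `shifted_pair` are hypothetical helper names. The synthetic test pair is a random texture shifted by a known amount.

```python
import random


def window(img, row, col, half):
    """Flatten the (2*half+1) x (2*half+1) window centred on (row, col)."""
    return [img[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]


def sad(w1, w2):
    """Sum of absolute differences between two flattened windows."""
    return sum(abs(a - b) for a, b in zip(w1, w2))


def ssd(w1, w2):
    """Sum of squared differences between two flattened windows."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2))


def disparity_map(I1, I2, min_disp, max_disp, half=1, cost=sad):
    """For each pixel (i, j) of I1, pick the shift k that minimises cost(I1 @ j, I2 @ j+k)."""
    rows, cols = len(I1), len(I1[0])
    disp = [[None] * cols for _ in range(rows)]
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            w1 = window(I1, i, j, half)
            best_cost, best_k = None, None
            for k in range(min_disp, max_disp + 1):
                if j + k - half < 0 or j + k + half >= cols:
                    continue  # window would leave the image
                c = cost(w1, window(I2, i, j + k, half))
                if best_cost is None or c < best_cost:
                    best_cost, best_k = c, k
            disp[i][j] = best_k
    return disp


def shifted_pair(rows, cols, shift, seed=0):
    """Random texture I1 and a copy I2 with I2[i][j + shift] == I1[i][j] (wrapping at the border)."""
    rng = random.Random(seed)
    I1 = [[rng.randrange(256) for _ in range(cols)] for _ in range(rows)]
    I2 = [[I1[i][(j - shift) % cols] for j in range(cols)] for i in range(rows)]
    return I1, I2


if __name__ == "__main__":
    I1, I2 = shifted_pair(8, 16, shift=2)
    for row in disparity_map(I1, I2, 0, 4)[1:-1]:
        print(row)
```

The four nested loops (rows, columns, disparities, window) make the cost stated on the slide visible directly in the code.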
Correspondence Search Algorithm V2

  best = -ones(size(im))
  disp = zeros(size(im))
  for k = mindisparity:maxdisparity
    % pixel-wise product of I1 with I2 shifted by k columns
    prod = I1(:,overlap) .* I2(:,k+overlap)
    % box filter sums the products over each window in one pass
    CC = conv2(prod, fspecial('average',winsize), 'same')
    better = CC > best;
    disp = better .* k + (1-better).*disp;
    best = better .*CC + (1-better).*best;
  end

Typically saves O(winx*winy) operations for most any match metric

An Additional Twist

Note that searching from left to right is not the same as searching from right to left.

As a result, we can obtain a somewhat independent disparity map by flipping the images around.

The results should be the same map up to sign.

LRCheck: displr(i,j) = - disprl(i,j+displr(i,j))
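The LRCheck condition can be applied pixel by pixel to reject inconsistent matches. The following is a minimal sketch, assuming disparity maps stored as lists of rows with `None` marking pixels without a match; the function name and the `tolerance` parameter are illustrative additions.

```python
def left_right_check(disp_lr, disp_rl, tolerance=0):
    """Mark pixel (i, j) valid when disp_lr(i, j) == -disp_rl(i, j + disp_lr(i, j))."""
    rows, cols = len(disp_lr), len(disp_lr[0])
    valid = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            d = disp_lr[i][j]
            if d is None:
                continue
            jj = j + d  # column of the matched pixel in the other image
            if 0 <= jj < cols and disp_rl[i][jj] is not None:
                valid[i][j] = abs(d + disp_rl[i][jj]) <= tolerance
    return valid


if __name__ == "__main__":
    disp_lr = [[2, 2, 2, 0]]
    disp_rl = [[0, 0, -2, -2]]
    print(left_right_check(disp_lr, disp_rl))
```

Pixels that fail the check are typically occluded in one view or mismatched, so they are better left without a disparity than given a wrong one.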

Example Disparity Maps

[Figure: example disparity maps computed with SSD and ZNCC]

Real-Time Stereo

Real-time stereo system | Image size | Frame rate | Range bins | Method | Processor | Cameras
INRIA 1993 | 256x256 | 3.6 fps | 32 | Normalized Correlation | PeRLe-1 | 3
CMU iWarp 1993 | 256x240 | 15 fps | 16 | SSAD | 64 Processor iWarp Computer | 3
Teleos 1995 | 320x240 | 0.5 fps | 32 | Sign Correlation | Pentium 166 MHz | 2
JPL 1995 | 256x240 | 1.7 fps | 32 | SSD | Datacube & 68040 | 2
CMU Stereo Machine 1995 | 256x240 | 30 fps | 30 | SSAD | Custom HW & C40 DSP Array | 6
Point Grey Triclops 1997 | 320x240 | 6 fps | 32 | SAD | Pentium II 450 MHz | 3
SRI SVS 1997 | 320x240 | 12 fps | 32 | SAD | Pentium II 233 MHz | 2
SRI SVM II 1997 | 320x240 | 30+ fps | 32 | SAD | TMS320C60x 200 MHz DSP | 2
Interval PARTS Engine 1997 | 320x240 | 42 fps | 24 | Census Matching | Custom FPGA | 2
CSIRO 1997 | 256x256 | 30 fps | 32 | Census Matching | Custom FPGA | 2
SAZAN 1999 | 320x240 | 20 fps | 25 | SSAD | FPGA & Convolvers | 9
Point Grey Triclops 2001 | 320x240 | 20 fps (2 cameras), 13 fps (3 cameras) | 32 | SAD | Pentium IV 1.4 GHz | 2 or 3
SRI SVS 2001 | 320x240 | 30 fps | 32 | SAD | Pentium III 700 MHz | 2

Applications of Real-Time Stereo

Mobile robotics
  Detect the structure of ground; detect obstacles; convoying

Graphics/video
  Detect foreground objects and matte in other objects (super-matrix effect)

Surveillance
  Detect and classify vehicles on a street or in a parking garage

Medical
  Measurement (e.g. sizing tumors)
  Visualization (e.g. register with pre-operative CT)

Stereo Example: Obstacle Detection

[Figure: obstacle detection example]
Obstacle Detection (contd)

Observation: Removing the ground plane immediately exposes obstacles

Applications of Real-Time Stereo

[Figure: application examples]
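One way to act on the ground-plane observation: in the non-verged model, a planar surface produces a disparity that is an affine function of image position, $D = a u + b v + c$, so pixels whose disparity clearly exceeds the ground prediction are closer than the ground and can be flagged as obstacles. The sketch below assumes the plane coefficients are already known (for example from a prior fit); the function names, the threshold and the sample values are illustrative and not taken from the slides.

```python
def ground_disparity(u, v, plane):
    """Disparity predicted for the ground plane at column u, row v: a*u + b*v + c."""
    a, b, c = plane
    return a * u + b * v + c


def obstacle_mask(disparity, plane, threshold=1.0):
    """Flag pixels whose disparity exceeds the ground prediction by more than `threshold`."""
    return [[d is not None and d - ground_disparity(u, v, plane) > threshold
             for u, d in enumerate(row)]
            for v, row in enumerate(disparity)]


if __name__ == "__main__":
    plane = (0.0, 2.0, 1.0)  # hypothetical ground: disparity grows by 2 per image row
    disparity = [
        [1, 1, 1, 1],
        [3, 3, 8, 3],      # the 8 sits well above the ground prediction of 3
        [5, 5, 5, None],   # None marks a pixel without a stereo match
    ]
    for row in obstacle_mask(disparity, plane):
        print(row)
```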

Other Problems:

Photometric issues:
  specularities
  strongly non-Lambertian BRDFs

Surface structure
  lack of texture
  repeating texture within horopter bracket

Geometric ambiguities
  as surfaces turn away, difficult to get accurate reconstruction (an affine approximation can help)
  at the occluding contour, likelihood of good match but incorrect reconstruction

Local vs. Global Matching

Comparative results on images from the University of Tsukuba, provided by Scharstein and Szeliski [69]. Left to right: left stereo image, ground truth, Muhlmann et al.'s area correlation algorithm [57], dynamic programming (similar to Intille and Bobick [36]), Roy and Cox's maximum flow [65] and Kolmogorov and Zabih's graph cuts [45].

Mapping, Localization, Recognition

Object Recognition: The Problem

Given: A database D of known objects and an image I:

1. Determine which (if any) objects in D appear in I
2. Determine the pose (rotation and translation) of the object

The object recognition conundrum
  Segmentation (where is it 2D)
  Recognition (what is it)
  Pose Est. (where is it 3D)
Recognition From Geometry?

Given a database of objects and an image, determine what, if any, of the objects are present in the image.

Recognition From Appearance?

Columbia SLAM system:
  can handle databases of 100s of objects
  single change in point of view
  uniform lighting conditions

Courtesy Shree Nayar, Columbia U.

Current Best Solution

Generally view based
Uses local features and local invariance (global is too weak)
Uses *lots* of features and some sort of voting
Also recent attempts to perform categorical object recognition using similar techniques

Example: recent papers by Schmid, Lowe, Ponce, Hebert, Perona ...

Here, we discuss SIFT features (Lowe 1999)

Feature Desiderata

Features should be distinctive
Features should be easily detected under changes in pose, lighting, etc.
There should be many features per object

Steps in SIFT Feature Selection

Scale-space peak selection

Keypoint localization
  includes rejection due to poor localization
  also perform cornerness check using eigenvalues; reject those with eigenvalue ratio greater than 10

Orientation Assignment
  dominant orientation plus any within 80% of dominant

Build keypoint descriptor

Normal images yield approx. 2000 stable features
  small objects in cluttered backgrounds require 3-6 features

Peak Detection

Find all max and min in LoG images in both space and scale
  8 spatial neighbors; 9 neighbors in each adjacent scale
  orientation based on maximum of weighted histogram
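The neighbor comparison and the eigenvalue-ratio rejection can be sketched as follows. This is an illustration only: the scale-space is assumed to be a nested list `stack[scale][row][col]` of filter responses, and the ratio test uses the trace/determinant form from Lowe's SIFT work, which avoids computing the eigenvalues explicitly. The function names are hypothetical.

```python
def is_scale_space_extremum(stack, s, r, c):
    """True if stack[s][r][c] is strictly above or below all 26 neighbours.

    The neighbours are the 8 spatial ones at the same scale plus 9 in each
    of the two adjacent scales. (s, r, c) must not lie on the border.
    """
    value = stack[s][r][c]
    neighbours = [stack[s + ds][r + dr][c + dc]
                  for ds in (-1, 0, 1)
                  for dr in (-1, 0, 1)
                  for dc in (-1, 0, 1)
                  if (ds, dr, dc) != (0, 0, 0)]
    return value > max(neighbours) or value < min(neighbours)


def passes_edge_test(dxx, dyy, dxy, max_ratio=10.0):
    """Keep a keypoint only if the 2x2 Hessian eigenvalue ratio is below max_ratio.

    For eigenvalues l1 >= l2 > 0 with ratio r = l1 / l2, trace^2 / det equals
    (r + 1)^2 / r, which grows with r, so the test needs no eigen-decomposition.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: not a well-localised peak
    return trace * trace / det < (max_ratio + 1) ** 2 / max_ratio


if __name__ == "__main__":
    stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    stack[1][1][1] = 1.0
    print(is_scale_space_extremum(stack, 1, 1, 1))
    print(passes_edge_test(1.0, 1.0, 0.0), passes_edge_test(20.0, 1.0, 0.0))
```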
Keypoint Descriptor

[Figure: keypoint descriptor]

Example

[Figure: keypoint example]

PDF of Matching

[Figure: probability density of matching]

Feature Matching

Uses a Hough transform (voting technique)
  parameters are position, orientation and scale for each training view
  features are matched to closest Euclidean distance neighbor in database; each database feature indexed to object and view as well as location, orientation and scale
  features are linked to adjacent model views; these links are also followed and accumulated
  implemented using a hash table
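The hash-table voting step can be illustrated with a dictionary keyed by quantized pose. Everything specific here is an assumption made for the sketch: each match is taken to be a tuple `(object_id, dx, dy, dtheta_deg, scale_ratio)` giving the pose that one feature match predicts, and the bin sizes (32 pixels, 30 degrees, a factor of 2 in scale) are illustrative choices.

```python
import math
from collections import defaultdict


def pose_bin(match, xy_bin=32.0, theta_bin=30.0):
    """Quantise the pose predicted by one feature match into a hashable bin key."""
    obj, dx, dy, dtheta_deg, scale_ratio = match
    return (obj,
            math.floor(dx / xy_bin),
            math.floor(dy / xy_bin),
            math.floor((dtheta_deg % 360.0) / theta_bin),
            round(math.log2(scale_ratio)))  # one bin per factor of 2 in scale


def hough_vote(matches):
    """Accumulate matches in a hash table keyed by pose bin."""
    votes = defaultdict(list)
    for m in matches:
        votes[pose_bin(m)].append(m)
    return votes


def best_hypothesis(matches):
    """Return (bin key, supporting matches) for the bin with the most votes."""
    votes = hough_vote(matches)
    key = max(votes, key=lambda k: len(votes[k]))
    return key, votes[key]


if __name__ == "__main__":
    matches = [
        ("cup", 10.0, 12.0, 5.0, 1.0),
        ("cup", 14.0, 9.0, 12.0, 1.1),
        ("cup", 20.0, 25.0, 20.0, 0.9),
        ("cup", 200.0, 40.0, 170.0, 2.3),  # inconsistent pose: likely a false match
        ("box", 11.0, 10.0, 8.0, 1.0),
    ]
    key, support = best_hypothesis(matches)
    print(key, len(support))
```

Only bins that actually receive votes are stored, which is why a hash table suits this better than a dense accumulator array over all poses.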

Results

Matching requires histogramming followed by alignment

Results

Ponce & Rothganger: 51 test images with 1 to 5 of 8 objects present in each image.

96% recognition rate (no false positives)

Vision-Based Robot Mapping

FastSLAM innovations
  Rao-Blackwellized particle filters

Mapping results for multiple kilometers
  Laser and vision
  joint issue of IJCV and IJRR

prominently vision-based SLAM
  Se, Lowe, Little, 2003

RMS Titanic

Leonard & Eustice
  EKF-based system
  866 images
  3494 camera constraints
  Path length 3.1 km 2D / 3.4 km 3D
  Convex hull > 3100 m^2
  344 min. data / 39 min. ESDF*
  *excludes image registration time

3D Model Building

Reconstruction
(Peter Allen, Columbia University)

[Figure: Cathedral of Saint Pierre]

VISUAL TRACKING
What Is Visual Tracking?

[Figure: tracking examples from Hager & Rasmussen 98, Bregler and Malik 98, Hager & Belhumeur 98, Black and Yacoob 95, Bascle and Blake 98]

Principles of Visual Tracking

[Figure: reference image I0 and current image It, related by parameters pt]

Variability model: $I_t = g(I_0, p_t)$

Incremental Estimation: From $I_0$, $I_{t+1}$ and $p_t$ compute $\Delta p_{t+1}$

$\| I_0 - g(I_{t+1}, p_{t+1}) \|^2 \Rightarrow \min$

Principles of Visual Tracking

Variability model: $I_t = g(I_0, p_t)$

Incremental Estimation: From $I_0$, $I_{t+1}$ and $p_t$ compute $\Delta p_{t+1}$

Visual Tracking = Visual Stabilization

Tracking Cycle

Prediction
  Prior states predict new appearance

Image warping
  Generate a normalized view

Estimation
  Compute change in parameters from changes in the image

State integration
  Apply correction to state

[Figure: tracking loop. The warped image is subtracted from the reference; the model inverse maps the difference to $\Delta p$, which updates p and drives the image warping]

Some Background

Perspective (pinhole) camera
  $X = x/z$
  $Y = y/z$

Para-perspective
  $X = s x$
  $Y = s y$

Lambert's law
  $B = a \cos(\theta)$
  [Figure: angle $\theta$ measured from the surface normal]

Regions: A More Interesting Case

Planar Object => Affine motion model: $u'_i = A u_i + d$

Warping
  $I_t = g(p_t, I_0)$
Stabilization Formulation

Model
  $I_0 = g(p_t, I_t)$   (image I, variation model g, parameters p)
  $\partial I / \partial t = M(p_t, I_t) \, \partial p / \partial t$   (local linearization M)

Define an error
  $e_{t+1} = g(p_t, I_{t+1}) - I_0$

Close the loop
  $p_{t+1} = p_t - (M^T M)^{-1} M^T e_{t+1}$   where $M = M(p_t, I_t)$

On The Structure of M

Planar Object -> Affine motion model: $u'_i = A u_i + d$

M is N x m and is time varying!

[Figure: columns of M for X, Y, Rotation, Scale, Aspect, Shear]
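The closed-loop update can be tried on the simplest possible case: a one-dimensional signal with a single translation parameter, so that M has one column and $(M^T M)^{-1} M^T e$ reduces to a ratio of two sums. This is a sketch under stated assumptions, not the tracker from the slides: images are modeled as Python callables, the warp is $g(p, I)(x) = I(x + p)$, M is estimated by central differences, and the Gaussian test signal and its 1.5-sample shift are made up.

```python
import math


def track_translation(reference, current, xs, p=0.0, iterations=10, h=1e-3):
    """Iterate p <- p - (M^T M)^-1 M^T e for a one-parameter translation warp."""
    for _ in range(iterations):
        # e = g(p, I_t) - I_0, sampled at the template positions xs
        e = [current(x + p) - reference(x) for x in xs]
        # M = d g / d p, an N x 1 Jacobian estimated by central differences
        M = [(current(x + p + h) - current(x + p - h)) / (2.0 * h) for x in xs]
        mtm = sum(m * m for m in M)
        if mtm == 0.0:
            break  # no image gradient: the update is undefined
        p -= sum(m * ei for m, ei in zip(M, e)) / mtm
    return p


def bump(x):
    """Hypothetical reference signal: a Gaussian bump centred at x = 10."""
    return math.exp(-(x - 10.0) ** 2 / 18.0)


if __name__ == "__main__":
    true_shift = 1.5

    def current(x):
        return bump(x - true_shift)  # the reference pattern moved by true_shift

    print(track_translation(bump, current, range(21)))  # should approach 1.5
```

With more parameters (affine motion, illumination) M gains one column per parameter and the scalar division becomes a small linear solve, but the loop structure stays the same.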

3D Case: Global Geometry

Non-Planar Object: $u'_i = A u_i + b z_i + d$

Observations:
  Image coordinates lie in a 4D space
  3D subspace can be fixed
  Motion in two images gives affine structure

3D Case: Local Geometry

Non-Planar Object: $u'_i = A u_i + b z_i + d$

[Figure: motion components x, y, rot z, scale, aspect, rot x, rot y]

3D Case: Illumination Modeling

Non-Planar Object: $I_t = B \alpha + I_0$

Observations:
  Lambertian object, single source, no cast shadows => 3D image space
  With shadows => a cone
  Empirical evidence suggests 5 to 6 basis images suffice

Handling Occlusion

[Figure: tracking loop with an added weighting step. The warped image is subtracted from the reference and weighted; the model inverse maps the result to $\Delta p$, which updates p and drives the image warping]
A Complete Implementation

[Figure: complete tracking implementation]

Extension: Layered Systems
(Kentaro Toyama, MSR)

algorithmic layers
  feature-based tracking
  template-based tracking
  blob tracking
  color thresholding

[Figure: layers arranged between target state and full configuration space]

Layered System: Example

Green: tracking
Red: searching

Motion, Tracking, Control

[Figure: conventional image-plane SSD compared with 3D SSD]

M. Jagersand, U. Alberta
G. Hager, JHU

Adding Kinematics

Vision-Based Control

How should this be programmed?
Vision-Based Control

Solution #1:
  Calibrate camera to robot
  Use stereo coordinates

[Figure: object position Tobject]

Vision-Based Control

Solution #2:
  Compute position of both robot and object

e = Tobj - Trob

Vision-Based Control

Solution #3:
  Compute errors based on images of robot and object

e = fobj - frob

An Observation

Given:
  a desired kinematic constraint T(f1,f2) = 0
  an encoding with e(y1,y2) = 0 iff T(f1,f2) = 0

Compute:
  $de/dt = J_e \, dq/dt$
  $dq/dt = -J_e^{-1} e(y_1, y_2)$

Result:
1. If stable, e->0. This implies T->0.
2. Accuracy is calibration independent.
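The claim that accuracy is calibration independent can be illustrated with a toy simulation. Everything in this sketch is hypothetical: a 2-DOF robot whose position is its feature, an affine camera unknown to the controller, and a deliberately rough Jacobian estimate `J_hat`. The error is taken as robot minus object in the image, and the discrete update is $q \leftarrow q - \text{gain} \cdot \hat{J}^{-1} e$.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def true_camera(f):
    """Hypothetical affine camera y = A f + c; the controller never sees A or c."""
    y = apply([[2.0, 0.3], [-0.2, 1.5]], f)
    return [y[0] + 320.0, y[1] + 240.0]


def servo(q, f_obj, camera, J_hat, gain=0.5, steps=60):
    """Drive the image error e = y_rob - y_obj to zero using an estimated Jacobian."""
    J_inv = inv2(J_hat)
    for _ in range(steps):
        y_rob, y_obj = camera(q), camera(f_obj)
        e = [y_rob[0] - y_obj[0], y_rob[1] - y_obj[1]]   # error measured in the image
        dq = apply(J_inv, e)                              # dq/dt = -J^-1 e, discretised below
        q = [q[0] - gain * dq[0], q[1] - gain * dq[1]]
    return q


if __name__ == "__main__":
    J_hat = [[1.6, 0.0], [0.0, 1.8]]  # rough estimate, not equal to the true A
    print(servo([0.0, 0.0], [1.0, -0.5], true_camera, J_hat))
```

Because the loop is closed on the image error, the robot still converges to the object position even though `J_hat` differs from the true camera Jacobian; the mismatch changes only the convergence rate, provided the loop stays stable.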

More Formally

Task function T
Feature configuration f
Task: T(f) = 0

Set of cameras
Actual camera C
Observation y = C(f)

Image encoding E
Image features y
New task E(y) = 0

When can we ensure
  T(f) = 0 iff E(y) = 0

How can we specify all such tasks?

Example Camera Model Classes

Fix a viewspace H

Given C0 injective on H
  $\mathcal{C}_{all}[C_0] = \{ C : C \text{ injective on } H,\ \mathrm{Im}\, C = \mathrm{Im}\, C_0 \}$
  weakly calibrated injective cameras

Given projective 2-camera C0 inj. on H
  $\mathcal{C}_{proj}[C_0] = \mathcal{C}_{all}[C_0] \cap \{ \text{set of all projective 2-camera models} \}$
  weakly calibrated projective cameras

Given pin-hole 2-camera C0 inj. on H
  $\mathcal{C}_{persp}[C_0] = \mathcal{C}_{all}[C_0] \cap \{ \text{set of all pin-hole 2-camera models} \}$
  weakly calibrated perspective cameras
Weakly Calibrated Sets

Injective cameras:
  Invariance on $G_{all}$ = { group of all bijections }

Projective cameras:
  Invariance on $G_{proj}$ = { group of projective transformations }

Perspective cameras:
  Invariance on $G_{pin-hole}$ = { group of rigid body transformations with scaling }

Some Examples

[Figure: examples]

Some Examples

[Figure: further examples]

Future Challenges

The pieces are starting to appear, why don't we see real systems?

Complex Environments

[Figure panels: Complex Clutter, Complex Geometry, Deformable Objects, Complex Objects, Categories, Materials]

Challenge: Highly Dynamic Environments

Recovering Geometry, Egomotion, Individual/Group Trajectories, and Activities

Human Interaction

Motivators
  aging population
  enabling disabled
  huge market

Challenges (research)
  highly integrative
  unstructured problems
  adaptivity

Challenges (market)
  high initial investment
  safety/reliability

Generalization and Learning

Clear value to data-driven approaches

Rapid progress in recent years in
  dimensional reduction
  unsupervised modeling
  supervised methods

Current methods still
  do not scale well
  do not make use of problem structure
  cannot be validated

Cross-Cutting Challenges

Large-scale verification of algorithms
  data repositories
  accepted evaluation methodologies

System integration
  almost no one has the resources to do it all and do it right

Facing the real world
  > 99% reliability
  manufacturable
  scalable