
CMPEN/EE 454 Project 2

Computing Dense Disparities with Simple Stereo

Matthew McTaggart & Peter Rancourt - Group 1

Overview
This project served as an introduction to calculating the disparity between a stereo pair of
images. A stereo image set consists of two images taken by two separate cameras. In these
stereo images, the two cameras are separated by a baseline vector consisting only of
horizontal displacement. More specifically, the two cameras are at the same height and the
same distance from the object of interest. The disparity of a stereo image is defined as the
displacement between the pixels in the left and right images that correspond to the same
precise location in the physical world. In a stereo camera setup, parts of the scene seen by
the right camera may not be seen by the left camera, and parts of the scene seen by the left
camera may not be seen by the right camera. This is known as occlusion. Occlusion is
problematic because disparity cannot be defined for a point that is visible in only one of the
two images.

The results shown in this project are calculated from the motorbike and chair stereo image
sets on the Middlebury College computer vision website. The motivation of this project was to
determine how the patch size used for patch matching affects the disparity image. The
disparity image is defined as the two-dimensional matrix that stores, for each pixel of
interest in the left image, the displacement to the location of the corresponding pixel in the
right image. The three main measures for patch matching are the sum of squared differences,
raw correlation, and normalized cross correlation. This project focuses on applying
normalized cross correlation for patch matching. To speed up the computation, this project
downsamples the stereo images by a factor of four. With the downsampled images, the patch
size can be adjusted without significantly affecting the computation time.
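
To make the three measures concrete, here is a minimal Python sketch. It is not the project's
MATLAB code: the function names and the choice to represent a patch as a flat list of
intensities are assumptions made only for this illustration.

```python
import math

def ssd(a, b):
    """Sum of squared differences: 0 for identical patches, larger means worse."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def raw_correlation(a, b):
    """Raw correlation: plain dot product of the two patches, larger means better."""
    return sum(x * y for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross correlation in [-1, 1]; 1 means a perfect match."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, treat it as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom

# Hypothetical flattened patches: the right patch is a noisy copy of the left one.
left_patch = [10, 20, 30, 40]
right_patch = [12, 18, 33, 41]

ssd_score = ssd(left_patch, right_patch)
raw_score = raw_correlation(left_patch, right_patch)
ncc_score = ncc(left_patch, right_patch)
print(ssd_score, raw_score, round(ncc_score, 3))
```

Note that the three scores live on different scales: SSD is minimized, while raw correlation
and NCC are maximized, and only NCC is bounded.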

The maximum disparities for the native-resolution stereo images of the motorbike and chair
are 270 pixels and 280 pixels, respectively. These values are provided with the stereo image
sets and come from the true physical disparity calibrated for the native resolution. Because
the images are downsampled by a factor of four, the maximum disparities in the processed
disparity images are 67.5 pixels and 70 pixels, respectively.
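
The scaling can be checked with a few lines of Python. This is only a sketch of the
arithmetic: the function name is hypothetical, and rounding the fractional limit up to a whole
pixel is an assumption, since the report does not say how 67.5 is handled.

```python
import math

def downsampled_max_disparity(native_max_disparity, downsample_factor):
    """Disparities are horizontal pixel offsets, so they shrink with the image width."""
    return native_max_disparity / downsample_factor

motorbike_limit = downsampled_max_disparity(270, 4)
chair_limit = downsampled_max_disparity(280, 4)

# Assumption: a whole-pixel search range rounds the fractional limit up.
motorbike_search_range = math.ceil(motorbike_limit)
print(motorbike_limit, chair_limit, motorbike_search_range)
```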

The patch size affects the apparent granularity of the processed disparity images. With
smaller patch sizes the disparity image is calculated more quickly, because there are fewer
calculations per patch-matching comparison, but the result appears coarser. Larger patch
sizes, however, add extra constraints that a candidate pair must satisfy to be classified as
the best match. More specifically, the smaller the patch size, the more likely it is that a
patch matches more than one location. In general, the larger the patch size, the smoother the
processed disparity image. The details of how the disparity is calculated from patch matching
are described in the Outline of Procedural Approaches.

Outline of Procedural Approaches
Our program consists primarily of two files: main.m and ourDisparity.m, where ourDisparity is
a function. The program starts in main, where it reads the left and right images, both of
which are cast to doubles. We next define a variable for the maximum disparity, set to 270,
and a variable for the patch width, set to an arbitrary number. This width is used later on
as both the width and the height of our patch.

From the images that were read in, we extract just the green channel of each, so that we have
two single-channel images. We then scale down our single-channel images by a factor of 1/N,
where N is an arbitrary integer. Once we have all of these components, we call the function
ourDisparity, which calculates the disparity image using the single-channel images, patch
width, and maximum disparity as inputs.
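
The preprocessing can be sketched in Python as follows. This is an illustration rather than
the contents of main.m: the function names, the nested-list image layout, and the choice to
downsample by keeping every N-th pixel are all assumptions. An interpolating resize, such as
MATLAB's imresize, would give slightly different pixel values.

```python
def green_channel(rgb_image):
    """Return the green channel as floats.

    rgb_image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    return [[float(pixel[1]) for pixel in row] for row in rgb_image]

def downsample(image, n):
    """Scale by 1/n by keeping every n-th pixel in both directions (no filtering)."""
    return [row[::n] for row in image[::n]]

# Hypothetical 4x4 RGB image whose green value encodes the pixel position.
rgb = [[(0, r * 4 + c, 255) for c in range(4)] for r in range(4)]

green = green_channel(rgb)
small = downsample(green, 2)
print(small)
```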

ourDisparity starts off by defining two matrices. One is for the left image and contains a
patch for every pixel in the image that, when at the center of a patch, does not have any
patch values that would lie outside of the image. The other matrix is for the right image,
where we store rows of the image; each stored row is a horizontal strip whose height equals
the patch height. There are as many stored rows as there are lines of pixels that, when used
as the centerline of a strip (running along the x-axis), would not have any values that lie
outside of the image (i.e. rows with centers within patch width / 2 of the top or bottom are
not included). These rows express the epipolar constraint of the stereo image setup: because
the baseline is purely horizontal, the match for a left patch is searched for only along the
same rows of the right image.
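
The border condition amounts to a simple index range. The Python sketch below assumes an odd
patch width and uses 0-based indices (MATLAB is 1-based); the function name is hypothetical.

```python
def valid_centers(length, patch_width):
    """0-based indices along one image axis where a full patch still fits."""
    half = patch_width // 2
    return range(half, length - half)

# Along an axis of 10 pixels, a 5-wide patch can be centered on indices 2..7.
rows = valid_centers(10, 5)

# Width of the uncomputed border on each side for the patch sizes studied later.
border_widths = {w: w // 2 for w in (5, 7, 11)}
print(list(rows), border_widths)
```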

From there, the two matrices are filled using two nested for loops and the disparity image is
calculated using normalized cross correlation (NCC). We decided on two nested for loops
because it was a format we both understood conceptually and it was as fast as any other
method that we tested. The outer for loop uses the variable i to represent a row of pixels,
where said row is the centerline of a strip of rows of height “patch width”. i starts at the
row closest to the top of the image for which no patch pixels would lie outside of the image
and goes to the bottom of the image under the same condition. In this loop, we calculate the
top and bottom bounds of the strip of image rows to extract from the right image. The
extracted strip is then added to our row matrix.

The inner loop uses the variable j to represent the columns of the image, where each column
is the center of a patch. With the same patch conditions as i, j starts at the left of the
image and works towards the right. The inner loop starts by calculating the left, right, top,
and bottom bounds of the patch in the left image. These bounds are then used to extract the
patch from the left image; the patch is stored in the left patch matrix.

Once a patch is extracted, we use the function normxcorr2() to search the corresponding row
of the right image for the patch that most closely matches the left patch. All of the match
scores are stored in the variable patchMatch. We then look along the centerline of patchMatch
to determine which pixel has the highest score when compared to the current patch (while also
being within the maximum disparity threshold). Once this index is found, we compute the
horizontal distance between the two pixels. The distance is stored in our disparity image
matrix at the same index as our left patch's center pixel. The loops continue until all
disparity values are calculated.
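
The nested-loop search can be sketched in plain Python as shown below. This is not
ourDisparity.m: it scores each candidate with an explicit NCC function instead of calling
normxcorr2(), it assumes the match for a left pixel at column j lies at column j - d in the
right image for some d between 0 and the maximum disparity, and all names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross correlation of two equally sized flattened patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom

def extract_patch(image, row, col, half):
    """Flatten the square patch centered on (row, col)."""
    return [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)]

def disparity_map(left, right, patch_width, max_disparity):
    half = patch_width // 2
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]  # border pixels stay 0 (black)
    for i in range(half, height - half):              # outer loop: patch center rows
        for j in range(half, width - half):           # inner loop: patch center columns
            left_patch = extract_patch(left, i, j, half)
            best_score, best_d = -2.0, 0
            for d in range(max_disparity + 1):        # search along the same row only
                jr = j - d
                if jr < half:                         # candidate would leave the image
                    break
                score = ncc(left_patch, extract_patch(right, i, jr, half))
                if score > best_score:
                    best_score, best_d = score, d
            disparity[i][j] = best_d
    return disparity

# Hypothetical test pair: the left image is the right image shifted 2 pixels to the right.
right = [[3, 8, 1, 9, 2, 7, 4, 6, 5],
         [5, 1, 7, 2, 9, 3, 8, 4, 6],
         [2, 9, 4, 6, 1, 8, 3, 7, 5]]
left = [[0, 0] + row[:-2] for row in right]

disparity = disparity_map(left, right, patch_width=3, max_disparity=3)
print(disparity[1])
```

In the center row, every column whose left patch lies fully inside the shifted content should
report a disparity of 2.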

A flowchart showing the structure of our code can be seen below:

Figure 1 - Flowchart showing the layout of our code

How to Run the Code

To run the code, open the main.m file and select Run. All results will be loaded from the
precomputed .mat files corresponding to each result. If you would like to run our code on an
image to calculate the disparity, uncomment one of lines 48, 49, and 50 to select the method
for calculating the disparity image.

Experimental Observations

Figures 2 through 4 show the normalized cross correlation disparity images calculated from
the motorbike stereo image. In each of these cases, the resolution of the motorbike stereo
image was reduced by a factor of 4 to speed up computation. The patch sizes that were studied
are 5x5, 7x7, and 11x11. It is important to note that as the patch size increases,
progressively fewer border pixels are computed in the disparity image. Figures 2 to 4 very
subtly show the increasing black border. Note: in MATLAB the figures have the full black
border, but when the files were saved as .png, some of the border was partially removed for
some of the images.

Figure 2 - Disparity image of the motorbike with a patch size of 5x5 using normalized cross
correlation. This image is downsampled by a factor of four.

Figure 3 - Disparity image of the motorbike with a patch size of 7x7 using normalized cross
correlation. This image is downsampled by a factor of four.

Figure 4 - Disparity image of the motorbike with a patch size of 11x11 using normalized
cross correlation. This image is downsampled by a factor of four.

Through observation, it can be seen that as the patch size becomes smaller, the disparity
image becomes coarser. Conversely, as the patch size becomes larger, the disparity image
becomes smoother. This is because each patch considers a larger sample of the image and is
thus more reliable when determining the disparity: the larger the patch size, the smaller the
probability of mismatching a patch in the left stereo image with a patch in the right stereo
image. Within an image, fine-scale details are more likely to repeat than larger-scale
structures. Another observation to note, however, is that as the patch size becomes larger,
so does the computational cost of determining the disparity image. This analysis shows a
tradeoff one must consider: speed versus accuracy.
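
A rough way to quantify the cost side of this tradeoff is to count the pixels touched per
patch comparison. The Python sketch below assumes a direct, pixel-by-pixel comparison, which
is an assumption about the implementation rather than something measured in this project; the
function name is hypothetical.

```python
def pixels_per_comparison(patch_width):
    """Pixels read from each image when two square patches are compared directly."""
    return patch_width * patch_width

costs = {w: pixels_per_comparison(w) for w in (5, 7, 11)}
relative_cost = costs[11] / costs[5]  # how much more work 11x11 is than 5x5
print(costs, relative_cost)
```

Under this assumption the per-comparison work grows with the square of the patch width, so an
11x11 patch costs a little under five times as much as a 5x5 patch.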

Another observation that was taken into consideration was how downsampling affects the
disparity image. Figure 5 shows the disparity image calculated with a patch size of 11x11
using normalized cross correlation, but downsampled by a factor of 2 instead of 4.

Figure 5 - Disparity image of the motorbike with a patch size of 11x11 using normalized
cross correlation. This image is downsampled by a factor of two.

An interesting thing to note is that as the resolution of the stereo images increases, the
11x11 patch becomes smaller relative to the image. This image, with a patch size of 11x11,
has the same level of “coarse” detail as the disparity image that was downsampled by a factor
of 4 using a patch size of 5x5. The less the stereo image is downsampled, the smoother the
disparity image is, because less information is lost in downsampling. This can be seen in
that Figure 5 appears to have smoother surfaces than Figure 2 at an equivalent level of
detail. The right and left images differ only subtly, as the camera is only shifted
horizontally, so an equivalent region in the left and right images will change only very
slightly during the downsampling process.
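
The comparison between Figure 5 and Figure 2 can be made concrete by expressing each patch in
native-resolution pixels. This Python sketch is only illustrative arithmetic, and the function
name is hypothetical.

```python
def native_footprint(patch_width, downsample_factor):
    """Width, in native-resolution pixels, of the scene region one patch covers."""
    return patch_width * downsample_factor

figure_5 = native_footprint(11, 2)  # 11x11 patch, downsampled by 2
figure_2 = native_footprint(5, 4)   # 5x5 patch, downsampled by 4
figure_4 = native_footprint(11, 4)  # 11x11 patch, downsampled by 4
print(figure_5, figure_2, figure_4)
```

The 11x11 patch at a factor of 2 and the 5x5 patch at a factor of 4 cover nearly the same
region of the scene (22 versus 20 native pixels), which is consistent with the similar level
of detail observed in the two figures.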

Figures 6 to 8 show a different stereo image, which depicts a rocking chair. In this case,
the patch sizes were chosen to be much larger and the stereo image was again downsampled by a
factor of 4. The patch sizes under consideration are 11x11, 21x21, and 35x35. These were
chosen to study how much larger patch sizes affect the disparity image.

Figure 6 - Disparity image of the chair with a patch size of 11x11 using normalized cross
correlation. This image is downsampled by a factor of four.

Figure 7 - Disparity image of the chair with a patch size of 21x21 using normalized cross
correlation. This image is downsampled by a factor of four.

Figure 8 - Disparity image of the chair with a patch size of 35x35 using normalized cross
correlation. This image is downsampled by a factor of four.

As expected, Figures 6 to 8 show that as the patch size is increased, the disparity image
appears smoother. The black borders also make it very evident which pixel disparities cannot
be computed. This is because if a patch were centered on one of the border pixels, the
extracted patch would be smaller than specified by the patch size. In this project, no border
handling was implemented. For the chair, a patch size of 35x35 appears to be the best choice
for the disparity image because it does not produce the high disparities in the background
that are present in the 11x11 case and, slightly, in the 21x21 case. There are some errors on
the back of the chair and throughout the image, but it is much better at generally segmenting
different regions of depth within the image without too much error. As mentioned before, the
disparity image is expected to be more accurate with less downsampling. To get a result
comparable to the one with the patch size of 35x35, the patch size should be increased to
reflect the increase in resolution from not downsampling as much.
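
One way to choose the larger patch size is to keep the region of the scene covered by the
patch constant. The Python sketch below is a suggestion rather than something tested in this
project; the function name is hypothetical, and bumping even results up to the next odd width
is an assumption made so that the patch keeps a center pixel.

```python
def equivalent_patch_width(patch_width, old_factor, new_factor):
    """Patch width at new_factor that covers the same scene region as at old_factor."""
    width = round(patch_width * old_factor / new_factor)
    return width if width % 2 == 1 else width + 1  # keep the width odd

at_factor_2 = equivalent_patch_width(35, 4, 2)
at_native = equivalent_patch_width(35, 4, 1)
print(at_factor_2, at_native)
```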

Exploration

Out of the three ways to calculate the disparity image, normalized cross correlation gave the
best results in this project. The other two methods are based on patch matching using raw
correlation values and the sum of squared differences. Figure 9 shows the same motorbike
stereo image downsampled by a factor of 4. The patch size under consideration is 11x11 and
the patch matching measure was raw correlation. In this implementation of raw correlation,
the average value of the patch in the left image was subtracted from the patch before it was
matched with the patches in the right image.
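
The following Python sketch shows one known weakness of this measure: even with the mean of
the left patch removed, the score still grows with the brightness of the candidate, so a
bright unrelated patch can outscore the exact match. The patches and function names are
hypothetical, and the sketch is not the project's MATLAB implementation.

```python
import math

def raw_correlation_zero_mean(template, candidate):
    """Raw correlation after subtracting the template mean from the template only."""
    mean_t = sum(template) / len(template)
    return sum((t - mean_t) * c for t, c in zip(template, candidate))

def ncc(a, b):
    """Normalized cross correlation, shown for comparison."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom

template = [1, 2, 3]          # flattened patch from the left image
exact_match = [1, 2, 3]       # the true corresponding patch
bright_patch = [0, 0, 100]    # unrelated patch with one very bright pixel

raw_exact = raw_correlation_zero_mean(template, exact_match)
raw_bright = raw_correlation_zero_mean(template, bright_patch)
ncc_exact = ncc(template, exact_match)
ncc_bright = ncc(template, bright_patch)
print(raw_exact, raw_bright, ncc_exact, round(ncc_bright, 3))
```

Raw correlation prefers the bright patch, while NCC, which divides out the patch energy, still
prefers the exact match.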

Figure 9 - Disparity image of the motorbike with a patch size of 11x11 using raw correlation.
This image is downsampled by a factor of four.

This disparity image is saturated with high disparity measurements throughout, in both the
background and the foreground. Interestingly enough, the outline of the motorbike is still
discernible. In comparison to normalized cross correlation, this result is far less accurate.

Figure 10 - Disparity image of the motorbike with a patch size of 11x11 using sum of squared
differences (SSD). This image is downsampled by a factor of four.

The disparity image using SSD as the patch matching method, seen in Figure 10, has highly
varying grayscale values across the image. The individual features of the image cannot be
discerned with any fair degree of certainty, though the overall outline of the bike is
somewhat visible. Using SSD as a patch matching method is therefore far less accurate than
using NCC in our case.
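
One general property that separates the two measures is sensitivity to brightness differences
between the images. The Python sketch below illustrates that property with hypothetical
patches; whether it explains the errors in Figure 10 was not verified in this project.

```python
import math

def ssd(a, b):
    """Sum of squared differences: smaller means a better match."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross correlation: larger means a better match."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom

left_patch = [10, 20, 30]
true_match = [30, 40, 50]    # same pattern, but 20 units brighter
wrong_match = [20, 20, 20]   # featureless patch with a similar average brightness

ssd_true = ssd(left_patch, true_match)
ssd_wrong = ssd(left_patch, wrong_match)
ncc_true = ncc(left_patch, true_match)
ncc_wrong = ncc(left_patch, wrong_match)
print(ssd_true, ssd_wrong, ncc_true, ncc_wrong)
```

SSD ranks the featureless patch ahead of the true match because of the brightness offset,
whereas NCC removes the mean of each patch and scores the true match as perfect.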

Document of Roles on Project

MATLAB Coding for NCC                           Matthew & Peter
MATLAB Coding for Raw Correlation               Matthew
MATLAB Coding for Sum of Squared Differences    Peter
MATLAB Project Code Streamlining & Comments     Matthew
Overview                                        Matthew
Outline of Procedural Approaches                Peter
Experimental Observations                       Matthew & Peter
Exploration                                     Matthew & Peter
