
Published in

PRESENCE: Teleoperators and Virtual Environments

Special Issue on Augmented Reality

Vol. 6, No. 4, August 1997, pp. 433-451

Confluence of Computer Vision and Interactive Graphics for Augmented Reality

Gudrun J. Klinker, Klaus H. Ahlers, David E. Breen, Pierre-Yves Chevalier, Chris Crampton, Douglas S. Greer, Dieter Koller, Andre Kramer, Eric Rose, Mihran Tuceryan, Ross T. Whitaker

European Computer-Industry Research Centre (ECRC)

Arabellastraße 17, 81925 Munich, Germany

Abstract

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. Using AR technology, users can interact with a combination of real and virtual objects in a natural way. This paradigm constitutes the core of a very promising new technology for many applications. However, before it can be applied successfully, AR has to fulfill very strong requirements including precise calibration, registration and tracking of sensors and objects in the scene, as well as a detailed overall understanding of the scene.

We see computer vision and image processing technology play an increasing role in acquiring appropriate sensor and scene models. To balance robustness with automation, we integrate automatic image analysis with both interactive user assistance and input from magnetic trackers and CAD models. Also, in order to meet the requirements of the emerging global information society, future human-computer interaction will be highly collaborative and distributed. We thus conduct research pertaining to distributed and collaborative use of AR technology. We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

1. Introduction

Augmented reality (AR) is a technology in which a user's view of the real world is enhanced or augmented with additional information generated from a computer model. The enhancement may consist of virtual artifacts to be fitted into the environment, or a display of non-geometric information about existing real objects. AR allows a user to work with and examine real 3D objects, while receiving additional information about those objects or the task at hand. By exploiting people's visual and spatial skills, AR brings information into the user's real world rather than pulling the user into the computer's virtual world. Using AR technology, users can thus interact with a mixed virtual and real world in a natural way. This paradigm for user interaction and information visualization constitutes the core of a very promising new technology for many applications. However, real applications impose very strong demands on AR technology that cannot yet be met. Some of these demands are listed below.

In order to combine real and virtual worlds seamlessly so that the virtual objects align well with the real ones, we need very precise models of the user's environment and how it is sensed. It is essential to determine the location and the optical properties of the viewer (or camera) and the display, i.e., we need to calibrate all devices, register them and all objects in a global coordinate system, and track them over time when the user moves and interacts with the scene.

Realistic merging of virtual objects with a real scene requires that objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the AR system needs to have a very detailed description of the physical scene.

In order to create the illusion of an AR interface it is required to present the virtual objects with a high degree of realism, and to build user interfaces with a high degree of immersion. Convincing interaction and information visualization techniques are still very much a research issue. On top of that, for multi-user applications in the context of AR it is necessary to address the distribution and sharing of virtual environments, the support for user collaboration and awareness, and the connection between local and remote AR installations.

We see computer vision and image processing technology—although still relatively brittle and slow—play an increasing role in acquiring appropriate sensor and scene models. Rather than using the video signal merely as a backdrop on which virtual objects are shown, we explore the use of image understanding techniques to calibrate, register and track cameras and objects and to extract the three-dimensional structure of the scene. To balance robustness with automation, we integrate automatic image analysis with interactive user assistance and with input from magnetic trackers and CAD models.

In our approach to AR we combine computer-generated graphics with a live video signal from a camera to produce an enhanced view of a real scene, which is then displayed on a standard video monitor. We track user motion and provide basic pointing capabilities in the form of a 3D pointing device with an attached magnetic tracker, as shown in Figure 6. This suffices in our application scenarios to demonstrate how AR can be used to query information about objects in the real world. For the manipulation of virtual objects, we use mouse-based interaction in several related 2D views of the scene on the screen.

We conduct research pertaining to distributed and collaborative use of AR technology. Considering the growing global information society, we expect an increasing demand for collaborative use of highly interactive computer technology over networks. Our emphasis lies on providing interaction concepts and distribution technology for users who collaboratively explore augmented realities, both locally immersed and remotely in the form of a telepresence.

We have demonstrated our work in several prototype applications, such as collaborative interior design and collaborative mechanical repair. This paper describes our approach to AR with examples from applications, as well as the underlying technology.

2. Previous Work

Research in augmented reality is a recent but expanding area of research. We briefly summarize the research conducted to date. Baudel and Beaudouin-Lafon have looked at the problem of controlling certain objects (e.g., cursors on a presentation screen) through the use of freehand gestures (Baudel & Beaudouin-Lafon, 1993). Feiner et al. have used augmented reality in a laser printer maintenance task. In this example, the augmented reality system aids the user in the steps required to open the printer and replace various parts (Feiner, MacIntyre & Seligmann, 1993). Wellner has demonstrated an augmented reality system for office work in the form of a virtual desktop on a physical desk (Wellner, 1993). He interacts on this physical desk both with real and virtual documents. Bajura et al. have used augmented reality in medical applications in which the ultrasound imagery of a patient is superimposed on the patient's video image (Bajura, Fuchs & Ohbuchi, 1992). Lorensen et al. use an augmented reality system in surgical planning applications (Lorensen, Cline, Nafis, Kikinis, Altobelli & Gleason, 1993). Milgram and Drascic et al. use augmented reality with computer-generated stereo graphics to perform telerobotics tasks (Milgram, Zhai, Drascic & Grodski, 1993; Drascic, Grodski, Milgram, Ruffo, Wong & Zhai, 1993). Caudell and Mizell describe the application of augmented reality to manual manufacturing processes (Caudell & Mizell, 1992). Fournier has posed the problems associated with illumination in combining synthetic images with images of real scenes (Fournier, 1994).

The utilization of computer vision in AR has depended upon the requirements of particular applications. Deering has explored the methods required to produce an accurate high resolution head-tracked stereo display in order to achieve sub-centimeter virtual to physical registration (Deering, 1992). Azuma and Bishop, and Janin et al. describe techniques for calibrating a see-through head-mounted display (Azuma & Bishop, 1994; Janin, Mizell & Caudell, 1993). Gottschalk and Hughes present a method for auto-calibrating tracking equipment used in AR and VR (Gottschalk & Hughes, 1993). Gleicher and Witkin state that their through-the-lens controls may be used to register 3D models with objects in images (Gleicher & Witkin, 1992). More recently, Bajura and Neumann have addressed the issue of dynamic calibration and registration in augmented reality systems (Bajura & Neumann, 1995). They use a closed-loop system which measures the registration error in the combined images and tries to correct the 3D pose errors. Grimson et al. have explored vision techniques to automate the process of registering medical data to a patient's head using segmented CT or MRI data and range data (Grimson, Lozano-Perez, Wells, Ettinger, White & Kikinis, 1994; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995). In a related project, Mellor recently developed a real-time object and camera calibration algorithm that calculates the relationship between the coordinate systems of an object, a geometric model, and the image plane of a camera (Mellor, 1995). Uenohara and Kanade have developed techniques for tracking 2D image features, such as fiducial marks on a patient's leg, in real time using special hardware to correlate affine projections of small image areas between images (Uenohara & Kanade, 1995). Peria et al. use specialized optical tracking devices (calibrated plates with LEDs attached to medical equipment) to track an ultrasound probe and register it with SPECT data (Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995). Betting et al. as well as Henri et al. use stereo data to align a patient's head with MRI or CT data (Betting, Feldmar, Ayache & Devernay, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995).

Some researchers have studied the calibration issues relevant to head-mounted displays (Bajura, Fuchs & Ohbuchi, 1992; Caudell & Mizell, 1992; Azuma & Bishop, 1994; Holloway, 1994; Kancherla, Rolland, Wright & Burdea, 1995). Others have focused on monitor-based approaches (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995; Betting, Feldmar, Ayache & Devernay, 1995; Grimson, Ettinger, White, Gleason, Lozano-Perez, Wells & Kikinis, 1995; Henri, Colchester, Zhao, Hawkes, Hill & Evans, 1995; Mellor, 1995; Peria, Chevalier, François-Joubert, Caravel, Dalsoglio, Lavallee & Cinquin, 1995; Uenohara & Kanade, 1995). Both approaches can be suitable depending on the demands of the particular application.

3. Application Scenarios

We have developed a comprehensive system, GRASP, which we have used as the basis for our application demonstrations. This section discusses two examples. The next sections describe in detail the GRASP system and the research issues that we focus on.

3.1 Collaborative Interior Design

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

The scenario for the interior design application assumes an office manager who is working with an interior designer on the layout of a room (Ahlers, Kramer, Breen, Chevalier, Crampton, Rose, Tuceryan, Whitaker & Greer, 1995). The office manager intends to order furniture for the room. On a computer monitor they both see a picture of the real room from the viewpoint of the camera. By interacting with various manufacturers over a network, they select furniture by querying databases using a graphical paradigm. The system provides descriptions and pictures of furniture that is available from the various manufacturers who have made models available in their databases. Pieces or groups of furniture that meet certain requirements such as color, manufacturer, or price may be requested. The users choose pieces from this "electronic catalogue" and 3D renderings of this furniture appear on the monitor along with the view of the room. The furniture is positioned using a 3D mouse. Furniture can be deleted, added, and rearranged until the users are satisfied with the result; they view these pieces on the monitor as they would appear in the actual room. As they move the camera they can see the furnished room from different points of view.

The users can consult with colleagues at remote sites who are running the same system. Users at remote sites manipulate the same set of furniture using a static picture of the room that is being designed. Changes by one user are seen instantaneously by all of the others, and a distributed locking mechanism ensures that a piece of furniture is moved by only one user at a time. In this way groups of users at different sites can work together on the layout of the room (see Figure 1). The group can record a list of furniture and the layout of that furniture in the room for future reference.

3.2 Collaborative Mechanical Repair

Figure 2. Augmented engine.

In the mechanical maintenance and repair scenario, a mechanic is assisted by an AR system while examining and repairing a complex engine (Kramer & Chevalier, 1996). The system presents a variety of information to the mechanic, as shown in Figure 2. Annotations identify the name of parts, describe their function, or present other important information like maintenance or manufacturing records. The user interacts with the real object in its natural setting with a pointing device monitored by the computer. As the mechanic points to a specific part of the engine, the AR system displays computer-generated lines and text (annotations) that describe the visible components or give the user hints about the object. Queries with the pointing device on the real-world object may be used to add and delete annotation tags. Since we also track the engine, the annotations move with the engine as its orientation changes. The lines attaching the annotation tags to the engine follow the appropriate visible components, allowing the user to easily identify the different parts as the view of the engine changes. The mechanic can also benefit from the assistance of a remote expert who can control what information is displayed on the mechanic's AR system.

4. System Infrastructure

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

The GRASP system forms the central core of our efforts to keep the graphics and visual scene in alignment and to provide an interactive three-dimensional interface (Ahlers, Crampton, Greer, Rose & Tuceryan, 1994). Figure 3 shows a schematic of the GRASP hardware configuration. The workstation hardware generates the graphical image and displays it on a high resolution monitor. A scan converter transforms the graphics displayed on the monitor into a standard video resolution and format. The scan converter also mixes this generated video signal with the video signal input from the camera via luminance keying. A 6-DOF magnetic tracker, which is capable of sensing the three translational and the three rotational degrees of freedom, provides the workstation with continually updated values for the position and orientation of the tracked objects, including the video camera and the pointing device. A frame grabber digitizes video images for processing within the computer during certain operations. The software has been implemented using the C++ programming language. A schematic diagram of the software architecture is shown in Figure 4.

5. Specification and Alignment of Coordinate Spaces

In order to align the virtual and real objects seamlessly, we need very precise models of the user's environment and how it is sensed. It is essential to calibrate sensors and display devices (i.e., to determine their locations and optical properties), to register all objects and interaction devices in a global coordinate system, and to track them while the user operates in the scene.

5.1 Calibration of Sensors and Video Equipment

During the initial setup, the camera characteristics, the location of the 6D tracker and the effects of scan conversion and video mixing must be determined. These procedures are referred to as the image, camera, and tracking calibration (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995). We now describe several such techniques that mix computer vision algorithms with varying amounts of model-based information and interactive input from the user.

5.1.1 Image Calibration

One of the essential steps of our AR system is the mixing of live video input with synthetically generated geometric data. While the live input is captured as an analog video signal by the camera system, the synthetic data is rendered digitally and then scan converted into a video signal. In order to align the two signals, we need to determine the horizontal and vertical positioning of the rendered, scan converted image with respect to the camera image, as well as the relationship between the two aspect ratios.

We use a synthetic test image that has two markers in known positions to compute four distortion parameters (2D translation and scaling). The test image is scan converted into a video signal. For image calibration purposes, we redigitize it and determine the location of the markers in the grabbed image. The discrepancy between the original location of the markers and their position in the grabbed image determines the translational and scaling distortions induced by the scan converter. This interactive image calibration method asks the user to identify the two markers in the grabbed image.

The GRASP system also provides an alternative, automatic routine to compute the distortion parameters. Algorithmically, it is easier to find a large, homogeneously colored area in an image than the thin lines of a marker. Accordingly, the automatic algorithm uses a different test image which contains one black square. It finds the dark area, fits four lines to its boundaries and thus determines the corners of the square. Two of the corners suffice to determine the distortion parameters of the scan converter.
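In either routine, the computation reduces to recovering a 2D scale and translation from two point correspondences. The following minimal sketch illustrates that step; it is our own illustration rather than the GRASP code, and all names are hypothetical.

```cpp
#include <cstdio>

struct Point { double x, y; };

// Recover the four scan-converter distortion parameters (2D scaling and
// translation) from two marker correspondences.
// p1, p2: marker positions in the synthetic test image.
// q1, q2: the same markers as located in the grabbed (redigitized) image.
void imageCalibration(Point p1, Point p2, Point q1, Point q2,
                      double& sx, double& sy, double& tx, double& ty)
{
    sx = (q2.x - q1.x) / (p2.x - p1.x);  // horizontal scaling
    sy = (q2.y - q1.y) / (p2.y - p1.y);  // vertical scaling
    tx = q1.x - sx * p1.x;               // horizontal translation
    ty = q1.y - sy * p1.y;               // vertical translation
}

int main() {
    double sx, sy, tx, ty;
    imageCalibration({100, 100}, {540, 380}, {112, 95}, {548, 372},
                     sx, sy, tx, ty);
    std::printf("scale (%.3f, %.3f), translation (%.1f, %.1f)\n",
                sx, sy, tx, ty);
    return 0;
}
```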

The comparison of the two approaches illustrates an important distinction between interactive and automatic algorithms: while humans work best with sharp line patterns to provide precise interactive input, automatic algorithms need to accommodate imprecision due to noise and digitization effects and thus work better on thicker patterns. On the other hand, automatic algorithms can determine geometric properties of extended areas, such as the center, an edge or a corner of an area, more precisely than humans. In conclusion, it is essential to the design of a system and to its use in an application that visual calibration aids be chosen according to their intended use. This is a recurring theme in our work.

5.1.2 Camera Calibration

Figure 5. The camera calibration grid.

Camera calibration is the process which calculates the extrinsic (position and orientation) and intrinsic parameters (focal length, image center, and pixel size) of the camera. We assume that the intrinsic parameters of the camera remain fixed during the augmented reality session. The camera's extrinsic parameters may be tracked and updated.

To compute the camera's intrinsic and extrinsic parameters, we point the camera at a known object in the scene, the calibration grid shown in Figure 5. The position of the grid and, in particular, the positions of the centers of the butterfly markers on the grid are known within the 3D world coordinate system. We use the mapping from these 3D object features to 2D image features to calculate the current vantage point of the camera and its intrinsic image distortion properties. In principle, each mapping from a 3D point to 2D image coordinates determines a ray in the scene that aligns the object point with the focal point of the camera. According to the pinhole camera model, several such rays from different object points intersect at the focal point and thus uniquely determine the pose of the camera, as well as its imaging properties. Accordingly, we can define a system of equations to compute the intrinsic and extrinsic camera parameters using a mapping of object points to image points and minimizing measurement errors. The details are described in (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).
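In a standard pinhole formulation (our notation, not necessarily that of the cited report), each correspondence between a grid point $(X_i, Y_i, Z_i)$ and its image $(u_i, v_i)$ contributes two equations to this system:

```latex
s_i \begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} =
\underbrace{\begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}}_{\text{intrinsic parameters}}
\underbrace{\begin{pmatrix} R & t \end{pmatrix}}_{\text{extrinsic parameters}}
\begin{pmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{pmatrix}
```

Here $f_x, f_y$ encode the focal length and pixel size, $(u_0, v_0)$ the image center, $(R, t)$ the camera pose, and $s_i$ an arbitrary per-point scale factor; the butterfly centers provide enough correspondences to solve the resulting overdetermined system in a least-squares sense.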

The GRASP system provides an interactive camera calibration routine: a user indicates the center of all butterfly patterns with a mouse and labels them by typing the appropriate code name on the keyboard.

We also use an automatic, computer vision-based camera calibration algorithm. In this approach, we use a calibration board that shows an arrangement of 42 black squares on a white background. Processing the image at a coarse scale, we quickly determine the positions and extents of black blobs in the image. By fitting rectangles to the blob outlines at finer scales and matching them left to right and top to bottom to the squares of the calibration board, we determine the calibration parameters of the camera.

5.1.3 Magnetic Tracker Calibration

Although we emphasize in this paper the use of computer vision techniques for AR, we do not rely exclusively on optical information. Complementarily, we also exploit magnetic tracking technology, as well as other interactive or model-based input. The tracking system consists of a transmitter and several receivers (trackers) that can be attached to objects, cameras and pointers in the scene. The tracking system automatically relates the 3D position and orientation of each tracker to a tracking coordinate system in the transmitter box. It is the task of the tracker calibration procedure to determine where the tracking coordinate system resides with respect to the world coordinate system of the AR application. This is a critical issue that usually does not arise in VR applications since such systems only need to track relative motion. Yet, the absolute positioning and tracking of objects and devices within a real world coordinate frame is of greatest importance in AR scenarios where reality is augmented with virtual information.

At the beginning of each session, we calibrate the magnetic tracking system, relating its local coordinate system to the world coordinate system. This process is currently performed interactively, using the same calibration grid as for camera calibration. We do this by determining the location of at least three points on the calibration grid with magnetic trackers. Since these points are also known in the world coordinate system, we can establish a system of linear equations, relating the tracked coordinates to the world coordinates and thus determining the unknown position and orientation parameters of the tracker (Tuceryan, Greer, Whitaker, Breen, Crampton, Rose & Ahlers, 1995).
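In our own (illustrative) notation, if point $j$ on the grid has known world coordinates $w_j$ and is measured at $m_j$ in the tracking coordinate system, the calibration seeks the rigid transformation $(R, t)$ with

```latex
w_j = R\, m_j + t, \qquad j = 1, \dots, N \;(N \ge 3),
```

which, treating the entries of $R$ and $t$ as unknowns, is exactly the kind of linear system described above and can be solved in a least-squares sense when more than three points are measured.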

5.2 Registration of Interaction Devices and Real Objects

In addition to the sensing devices that were calibrated in the previous section, scenes also contain physical objects that the user wants to interact with using 3D interaction devices. Such objects and gadgets need to be registered with respect to the world coordinate system.

5.2.1 Pointer Registration

Figure 6. 3D pointing device.

Currently, we use the magnetic tracking system to register and track the position of a 3D pointer in our system (see Figure 6).

For the pointer registration, we need to determine the position (offset) of the tip of a pointer in relationship to an attached magnetic tracker. Our procedure requires the user to point to the same point in 3D space several times, using a different orientation each time for a pointer that has been attached to one of the trackers. For each pick, the position and the orientation of the tracker within the tracker coordinate system are recorded. The result of this procedure is a set of points and directions with the common property that the points are all the same distance from the single, picked point in 3D space and all of the directions associated with the points are oriented toward the picked point. From this information, we can compute six parameters defining the position and orientation of the pointing device, using a least-squares approach to solve an overdetermined system of linear equations.
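One way to set up such a system (a sketch in our own notation, not necessarily the exact GRASP formulation): if pick $i$ records the tracker orientation $R_i$ and position $t_i$, and $o$ is the unknown tip offset in tracker coordinates, then every pick must map the tip to the same unknown world point $p$:

```latex
p = R_i\, o + t_i \;\;\forall i
\quad\Longrightarrow\quad
(R_i - R_j)\, o = t_j - t_i \;\;\forall\, i \ne j,
```

an overdetermined linear system in $o$ that is solved in a least-squares sense; $p$ then follows by averaging $R_i\, o + t_i$ over all picks.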

5.2.2 Object Registration

Object registration is the process of finding the six parameters that define the 3D position and orientation, i.e., pose, of an object relative to some known coordinate system. This step is necessary, even when tracking objects magnetically, in order to establish the 3D relationship between a magnetic receiver and the object to which it is fastened.

We have studied two strategies for determining the 3D pose of an object (Whitaker, Crampton, Breen, Tuceryan & Rose, 1995). The first is a camera-based approach, which relies on a calibrated camera to match 3D landmarks ("calibration points") on the object to their projections in the image plane. The second method uses the 3D coordinates of the calibration points, as indicated manually using the 3D pointer with magnetic tracking, in order to infer the 3D pose of the object.

There has been extensive research in pose determination in computer vision (Lowe, 1985; Grimson, 1990), but most of these techniques apply to only limited classes of models and scenes. The focus of the computer vision research is typically automation and recognition, features that are interesting, but not essential to augmented vision. In our work, the locations of landmark points in the image are found manually by a user with a mouse. We assume that the points are mapped from known locations in 3-space to the image via a rigid 3D transformation and a projection.

We represent the orientation of the object as a 3×3 rotation matrix R, which creates a linear system with 12 unknowns. Each point gives 2 equations, and 6 points are necessary for a unique solution. In practice we assume noise in the input data and use an overdetermined system with a least-squares solution in order to get reliable results. However, because we use a 3×3 rotation matrix and treat each element as an independent parameter, this linear system does not guarantee an orthonormal solution for the matrix, and it can produce "non-rigid" rotation matrices. Such non-rigidities can produce undesirable artifacts when these transformations are combined with others in the graphics system. Orthonormality is enforced by adding an additional penalty of the form $\lambda \| R R^{\mathsf T} - I \|^2$ to the least-squares solution. This creates a nonlinear optimization problem which we solve through gradient descent. The gradient descent is initialized with the unconstrained (linear) solution, and constrained solutions are typically found in 10-15 iterations.

Figure 7. Calibration and tracking of an engine model: a wireframe engine model registered to a real model engine using an image-based calibration (a); but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z-direction.

Despite good pointwise alignment in the image plane, the image-based calibration can produce significant error in the depth term which is not seen in the reprojected solutions. For instance, in the case of the engine model shown in Figure 7(a), the image-based approach can produce a rigid transformation which matches landmark points in the image to within about 2 pixels. Yet the error in the z-direction (distance from the camera) can be as much as 2-3 centimeters. This error becomes evident as the object is turned, as in Figure 7(b). We attribute this error primarily to error in the camera calibration, and better camera models and calibration procedures are a topic of ongoing research. Because of such error we have developed the procedure described in the next section for calibrating objects with a 3D pointing device.

The problem here is to compute the rigid transformation between a set of 3D point pairs. Using the 3D pointer and several keystrokes, the user indicates the world coordinates (or some other known coordinate system) of landmark points on the object. This, too, gives rise to a linear system with 12 unknowns. For a unique solution 4 points are needed, but in most cases we use more than 4 points and solve for the least-squares error. As with the image-based object calibration, error in the measurements can produce solutions that represent non-rigid transformations. Thus, the same nonlinear penalty term can be introduced in order to produce constrained solutions.
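Written out (again in our own notation), with measured tip positions $q_i$ for landmarks whose object coordinates $p_i$ are known, the penalized problem is

```latex
\min_{R,\,t} \;\sum_{i=1}^{N} \left\| R\, p_i + t - q_i \right\|^2
\;+\; \lambda \left\| R R^{\mathsf T} - I \right\|^2, \qquad N \ge 4,
```

where each point pair contributes three linear equations to the first term, and the second term is the orthonormality penalty discussed above.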

5.3 Tracking of Objects and Sensors

Calibration and registration refer to stationary aspects of a scene. In a general AR scenario, however, we have to deal with dynamic scene changes. With tracking we denote the ability of our system to cope with those dynamic scene changes. Thus, while the calculation of the external camera parameters and of the pose of an object are the results of calibration and registration, tracking can be regarded as a continuous update of those parameters. We are currently exploring and using two approaches to tracking: magnetic tracking and optical tracking.

5.3.1 Magnetic Tracking

As a magnetic tracking device we use the 6D tracker "Flock of Birds" from Ascension Technology Corporation. Receivers are attached to the camera and each potentially moving object. These receivers sense the six degrees of freedom (three translational and three rotational) with respect to a transmitter, whose location is kept fixed in world coordinates.

Initially, we relied exclusively on this magnetic technology since the trackers provide positional and orientational updates at nearly real-time speeds and operate well in a laboratory setup. However, magnetic tracking is not practicable in large-scale, realistic setups, because the tracking data can easily be corrupted by ferromagnetic materials in the vicinity of the receiver and because the trackers operate only in a limited range. Another drawback is the limited accuracy of the sensor readings.

5.3.2 Optical Tracking

Optical tracking methods are based on detecting and tracking certain features in the image. These can be lines, corners or any other salient features which can be detected easily and reliably in the image and can be uniquely associated with features of the 3D world. Our tracking approach currently uses the corners of squares attached to objects or walls (see Figure 8) to track a moving camera. Once the camera parameters are recovered, the scene can be augmented with virtual objects, such as shelves and chairs (see Figure 9).

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

This scenario is relevant to many AR applications where a user moves in the scene and thus continuously changes his (the camera's) viewpoint. We use a fixed world coordinate system, thus recomputing the camera parameters relative to the world frame in each step. Conversely, we could also recompute the position of the world system relative to the camera frame, thus using an egocentric frame of reference. The advantage of the former approach is that we can exploit certain motion invariants which make the tracking problem much simpler.

We assume that a model of the scene exists and that we are able to add "fiducial marks", such as black squares, to the scene to aid the tracking process. The squares are registered in the 3D scene model. Thus, in principle, the same camera calibration techniques described in section 5.1.2 can be used to determine, at any point in time, the position of the camera in the scene. Yet, during the tracking phase, we need to pay particular attention to the speed and robustness of the algorithms. To our advantage, we can exploit the time coherence of user actions: users move in continuous motions. We can benefit from processing results of previous images and from an adaptive model of the user motion to predict where the tracked features will appear in the next frame. We thus do not need to perform the full camera calibration procedure on every new incoming image.

It is well known that reasoning about three-dimensional information from two-dimensional images is error prone and sensitive to noise, a fact which has to be taken into account in any image processing method using real video data. In order to cope with this noise sensitivity we exploit physical constraints of moving objects. Since we do not have any a priori knowledge about forces changing the motion of the camera or the objects, we assume no forces (accelerations) and hence a constant velocity. In this case a general motion can be decomposed into a constant translational velocity of the center of mass of the object, and a rotation with constant angular velocity around an axis through the center of mass (e.g., Goldstein, 1980). This constitutes our so-called motion model (see Figure 10). So we do not only measure (estimate) the position and orientation of the camera and moving objects—as in the case of magnetic tracking—but also their change in time with respect to a stationary world frame, i.e., their translational and angular velocity. This is also referred to as motion estimation.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation about an axis through the center of mass of the object, which is constant in the absence of any forces; the figure also shows the world coordinate frame and the camera coordinate frame.

The motion parameters (translational and angular velocity according to the motion model) are estimated using time-recursive filtering based on Kalman filter techniques (e.g., Bar-Shalom & Fortmann, 1988; Gelb, 1974), where the unknown accelerations are successfully modeled as so-called process noise, in order to allow for changes of the velocities. The time-recursive filtering process enables smooth motion even in the presence of noisy image measurements, and enables a prediction-measurement-update step for each video frame. The prediction allows a reduction of the search space for features in the next video image and hence speeds up the process.
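As a concrete (and simplified) illustration of such a constant-velocity model, consider a state containing a position $p_k$ and a velocity $v_k$ at frame $k$ with frame time $\Delta t$; the rotational components follow the same pattern. The filter alternates prediction and measurement update:

```latex
\begin{aligned}
\text{state model:}\quad &
\begin{pmatrix} p_{k+1} \\ v_{k+1} \end{pmatrix}
= \begin{pmatrix} I & \Delta t\, I \\ 0 & I \end{pmatrix}
  \begin{pmatrix} p_k \\ v_k \end{pmatrix} + w_k,
\qquad w_k \sim \mathcal{N}(0, Q), \\
\text{measurement:}\quad &
z_k = H x_k + n_k, \qquad n_k \sim \mathcal{N}(0, N),
\end{aligned}
```

where the process noise covariance $Q$ absorbs the unmodeled accelerations and $z_k$ collects the tracked image features. This is the textbook Kalman form (Bar-Shalom & Fortmann, 1988), not a transcription of our filter's exact state vector.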

A typical drawback of optical methods is that they reason about three-dimensional information from two-dimensional image measurements, which can lead to numerical instabilities if not performed carefully. On the other hand, there is the advantage that the image of a real object is almost perfectly aligned with its rendered counterpart, since the alignment error can be minimized in the image. Optical tracking approaches can hence be very accurate. Another advantage of optical tracking is that it is a nonintrusive approach, since it operates just on visual information, and it is essentially not limited to any spatial range. It is furthermore quite natural, since it is the way most humans track objects and navigate within an environment.

6. Object Interaction

Realistic immersion of virtual objects into a real scene requires that the virtual objects behave in physically plausible manners when they are manipulated, i.e., they occlude or are occluded by real objects, they are not able to move through other objects, and they are shadowed or indirectly illuminated by other objects while also casting shadows themselves. To enforce such physical interaction constraints between real and virtual objects, the augmented reality system needs to have a very detailed description of the physical scene.

6.1 Acquisition of 3D Scene Descriptions

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

The most straightforward approach to acquiring scene descriptions would suggest the use of geometric models, e.g., CAD data. Given such models, the AR system needs to align them with their physical counterparts in the real scene, as described in section 5.2.2. The advantage of using such models is that they can easily serve as starting points for accessing high-level, semantic information about the objects, as is demonstrated in the mechanical repair application.

However, there are some problems with this approach. First, geometric models are not available in all cases. For example, interior restoration of old buildings typically needs to operate without CAD data. Second, available models are not complete. Since models are abstractions of reality, real physical objects typically show more detail than is represented in the models. In particular, generic scene models cannot fully anticipate the occurrence of new objects, such as coffee mugs on tables, cars or cranes on construction sites, users' hands, or human collaborators. Furthermore, the system needs to account for the changing appearances of existing objects, such as buildings under construction or engines that are partially disassembled (see Figure 11). When users see such new or changed objects in the scene, they expect the virtual objects to interact with these as they do with the rest of the (modeled) scene.

Computer vision techniques can be used to acquire additional information from the particular scene under inspection. Although such information generally lacks semantic descriptions of the scene and thus cannot be used directly to augment reality with higher-level information, such as the electric wiring within a wall, it provides the essential environmental context for the realistic immersion of virtual objects into the scene. Thus, we expect future AR systems to use hybrid solutions, using model data to provide the necessary high-level understanding of the objects that are most relevant to the tasks performed, and enriching the models with automatically acquired further information about the scene.

We are investigating how state-of-the-art image understanding techniques can be used in AR applications. One particular paradigm in computer vision, shape extraction, determines depth information as so-called 2½-D sketches from images. These are not full 3D descriptions of the scene but rather provide distance (depth) estimates, with respect to the camera, for some or all pixels in an image. Ongoing research develops techniques to determine object shape from stereo images, from motion sequences, from object shading, from shadow casting, from highlights and gloss, and more. It is important to consider whether and how such algorithms can be used continuously, i.e., while the user is working in the scene. Alternatively, the algorithms could be used during the initial setup phase, gathering 3D scene information once and compiling a rough sketch of the scene that then needs to be updated with other techniques during the AR session. Yet other options involve the use of other sensing modalities besides cameras, such as laser range scanners or sonar sensors.

This section discusses two approaches we are investigating.

6.1.1 Dense Shape Estimates from Stereo Data

Stereo is a classical method of building three-dimensional shape from visual cues. It uses two calibrated cameras with two images of the scene from different vantage points. Using stereo triangulation, the 3D location of dominant object features that are seen in both images can be determined: if the same point on an object is seen in both images, rays cast from the focal points of both cameras through the feature positions in the images intersect in 3D space, determining the distance of the object point from the cameras.
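In the special case of two parallel (rectified) cameras with baseline $b$ and focal length $f$ (a textbook simplification, not necessarily our camera arrangement), the triangulated depth follows directly from the disparity $d = u_L - u_R$ between the two image positions of a feature:

```latex
Z = \frac{f\, b}{d}
```

so that large disparities correspond to nearby points and small disparities to distant ones.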

Shape from stereo has been studied extensively in the computer vision literature. The choice of image feature detection algorithms and of feature matching algorithms between images is of critical importance. Depending on the type of methods and algorithms one uses, shape from stereo may result in sparse depth maps or dense depth maps. For our research, the goal is to use the computed 3D shape information in the AR applications. In most if not all such scenarios, dense depth maps are needed. Therefore, we have taken an existing algorithm (Weng, Huang & Ahuja, 1989) to compute a dense depth map which is used in the AR context. The camera geometry is obtained by calibrating both cameras independently using one of the camera calibration methods described in section 5.1.

The details of the stereo algorithm are given in the paper (Weng, Huang & Ahuja, 1989). In summary, the heart of the algorithm lies in the computation of the disparity map (du, dv) which describes the distance between matched points in both images. This is accomplished by computing matches between four kinds of image features derived from the original images: smoothed intensity images, edge magnitudes, positive corners, and negative corners. The positive and negative corners separate the contrast direction at a corner. Distinguishing between these four feature types improves the matching results by preventing incompatible image features, such as positive and negative corners, from being matched between the images.

The overall algorithm iteratively determines the (locally) best match between the image features that have been computed in both images. Starting with an initial hypothetical match, the matches are iteratively changed and improved, minimizing an energy function which integrates—over the entire image—the influence of several error terms related to the quality of the edge matches between the left and right image, as well as a smoothness term which ensures that the recovered surface is not exceedingly rough and noisy.
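Schematically (our simplified rendering, not the exact functional of Weng, Huang & Ahuja), such an energy has the form

```latex
E(d_u, d_v) = \sum_{(u,v)} \sum_{c} \big\| F_c^L(u, v) - F_c^R(u + d_u, v + d_v) \big\|^2
\;+\; \lambda \sum_{(u,v)} \big\| \nabla d \big\|^2,
```

where $F_c^L$ and $F_c^R$ are the four feature images (indexed by $c$) of the left and right views, and the second sum penalizes rough disparity fields.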

Figure 12 shows a pair of stereo images. The disparity maps computed from these images are shown in Figure 13, and the depth map is shown in Figure 14(a). Finally, Figure 14(b) shows how the computed depth map is used to occlude three virtual floating cubes.

Figure 12. An example pair of stereo images: (a) left image and (b) right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.

6.1.2 Shape from Shading

Complementary to geometric shape extraction methods, some approaches exploit the photometric reflection properties of objects. An image of a smooth object with uniform surface reflectance properties exhibits smooth variations in the intensity of the reflected light, referred to as shading. This information is used by human and other natural vision systems to determine the shape of the object. The goal in shape from shading is to replicate this process to the point of being able to design an algorithm that will automatically determine the shape of a smooth object from its image (Horn & Brooks, 1989).
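The classical starting point (standard in the cited literature, though the notation here is ours) is the image irradiance equation, which links measured brightness to surface orientation:

```latex
E(x, y) = R\big(p(x, y),\, q(x, y)\big), \qquad
p = \frac{\partial z}{\partial x}, \quad q = \frac{\partial z}{\partial y},
```

where $E$ is the image intensity, $z(x, y)$ the unknown surface, and $R$ the reflectance map that models how the material reflects light from a given orientation.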

This shape information can be used in a number of application areas where knowledge of the spatial characteristics of a scene is important. In particular, shape from shading information can fill the gaps in sparse depth maps that are left by geometry-based shape extraction methods. Geometric extraction works best on highly textured objects where many features can be matched between images. Shape from shading, on the other hand, can propagate shape information into homogeneous areas.

We are investigating how the second derivative, or Hessian, of a smooth object surface can be determined directly from shading information. The method of characteristic strips, which is often used for calculating shape from shading (Horn, 1986), is set in the framework of modern differential geometry. We extend this method to compute the second derivative of the object's surface, independently from the standard surface orientation calculation. This independently derived information can be used to help classify critical points, verify assumptions about the reflectance function and identify effectively impossible images (Greer & Tuceryan, 1995).

6.2 Mixing of Real and Virtual Worlds

Once appropriate scene descriptions have been obtained interactively or automatically, they form the basis for mixing real and virtual worlds. Since the mixing must be performed at interactive rates, great emphasis has to be placed on efficiency. Depending on the representation of the scene descriptions, different options can be pursued.

If the scene description is available as a geometric model, we can hand the combined list of real and virtual models to the geometric renderer which will then compute the interactions between real and virtual objects for us. By rendering models of real objects in black, we can use the luminance keying feature of the video mixer to substitute the respective area with live video data. As a result, the user sees a picture on the monitor that blends virtual objects with live video, while respecting 3D occlusion relationships between real and virtual objects.

This is a straightforward approach in applications where geometric, polygonal scene descriptions are available. If the descriptions are computed as depth maps, as described in section 6.1, the depth maps still need to be converted into a geometric representation, by tessellating and decimating the data (Schroeder, Zarge & Lorensen, 1992; Turk, 1992).

Alternatively, we can side-step the tessellation and rerendering phases for real objects by initializing the Z-buffer of the graphics hardware with the depth map (Wloka & Anderson, 1995). Occlusion of the virtual objects is then performed automatically. When the virtual object is rendered, pixels that are farther away from the camera than the Z values in the depth map are not drawn. By setting the background color to black, the real objects present in the original video are displayed in these unmodified pixels. Figure 14(b) presents three virtual cubes occluded by a wooden stand with an engine and occluding the other objects in a real room, using the depth-based approach.
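A minimal sketch of this depth-priming step in classic OpenGL (our illustration of the idea in Wloka & Anderson (1995), not GRASP's actual renderer; drawVirtualObjects() is a hypothetical placeholder):

```cpp
#include <GL/gl.h>

// depth: one normalized [0,1] depth value per pixel for the real scene,
// e.g., converted from the stereo depth map of section 6.1.1.
void renderWithRealOcclusion(const float* depth, int width, int height)
{
    // Black background: the luminance keyer replaces black pixels
    // with the live video signal.
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // Write the acquired depth map of the real scene directly into the
    // hardware Z-buffer (the raster color stays black, so these pixels
    // still key to video). Assumes a screen-aligned raster position setup.
    glColor3f(0.0f, 0.0f, 0.0f);
    glRasterPos2i(0, 0);
    glDrawPixels(width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth);

    // Render the virtual objects with normal depth testing: fragments
    // lying behind the real scene fail the test and remain black, so
    // real objects occlude virtual ones in the keyed output.
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    // drawVirtualObjects();  // application-specific geometry
}
```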

These approaches have advantages and disadvantages, depending on the application. Full 3D geometric models are best for real-time movement of cameras. Polygonal approximations to depth maps can be used over a certain range of camera positions since the synthesized scene model is rerendered when the camera moves. Copying the depth maps directly into the Z-buffer is the hardest approach: the map needs to be recomputed after each camera motion because the new projective transformation "shifts" all depth values in the depth map. Thus, this approach only works with stationary cameras or with shape extraction algorithms that perform at interactive speeds.

On the other hand, the geometric modeling approach suffers from an inherent dependence on scene complexity. If the scene needs to be represented by a very large polygonal model, the rendering technology may not be able to process it in real time. In contrast, the size of a depth map does not depend on scene complexity. Which approach to use in an application depends on the overall requirements and the system design.

7. Collaborative Use of AR

So far we have discussed techniques and solutions that make AR "work" for the single user. Object modeling, object interaction, realistic display and immersive interfaces all serve to present the user with a consistent and coherent world of real and virtual objects.

When we consider the application scenarios described above, we are reminded of the fact that in any virtual or real environment it appears natural to encounter other persons and to interact with them. Virtual environments are a promising platform for research in the CSCW area, and distributed multi-user interfaces are a challenge for many VE systems (e.g., the efforts related to the VRML proposal (Bell, Parisi & Pesce, 1995)). In the context of the GRASP system, we are interested in the problem and the paradigms of distributed AR. We are investigating solutions in the area of distributed computing and experimenting with system architectures for collaborative interfaces to shared virtual worlds.

7.1 Architecture for Shared AR

Each system supporting multi-user virtual environments can be characterized by the degree or type of concurrency, distribution, and replication in the system architecture (Dewan, 1995). Sharing between users has to be based on separability in the user interface: we call the database of shared logical objects the "model", and create "views" as a specific interpretation of the model in each interface. The need for rapid feedback in the user interface makes a replicated architecture attractive for AR. This in turn leads to object-level sharing where each user can view and manipulate objects independently. It is necessary to manage the shared information so that simultaneous and conflicting updates do not lead to inconsistent interfaces. This is guaranteed by the distribution component in our applications.

The model replication and distribution support allows the user interfaces of one application to execute as different processes on different host computers. GRASP interfaces are not multi-threaded, so the degree of distribution corresponds to the degree of concurrency in the system. The resulting architecture was implemented and successfully used in the interior design demonstration.

7.2 Providing Distribution

The replicated architecture is directly supported by the Forecast library of the GRASP system. Based on a message bus abstraction, Forecast provides an easy, reliable, and dynamic approach to constructing distributed AR applications.

Central to this support is a one-to-many reliable communication facility which can be described as a distributed extension of a hardware system bus. Components, situated on different machines, can dynamically connect to the same distributed bus and send and receive messages over it. This analogy has been used before for group communication or broadcast systems, and its messaging and selection capability are common to systems such as Linda and Sun's ToolTalk (SunSoft, 1991).

The Forecast message bus implements a one-to-many FIFO (first in, first out) multicast transport protocol. A special sequencer process is used to impose a unique global ordering on messages. In the simpler form of the protocol, nodes that wish to broadcast send their message to the sequencer, which then uses the one-to-many reliable protocol to disseminate the message. A unique global order is imposed on the message streams since all messages pass through the sequencer. Nodes can detect how their messages were scheduled by listening to the global message stream. The protocol is similar to the Amoeba reliable multicast protocol (Kaashoek & Tanenbaum, 1992), except that it uses reliable buffered transmission between nodes and the sequencer node at the expense of extra acknowledgments.
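The sequencer idea itself is small enough to sketch; the following is an illustrative reconstruction in our own types (not Forecast's API):

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Message {
    uint64_t    seq;     // global sequence number, assigned by the sequencer
    int         sender;  // originating node id
    std::string body;    // application payload, e.g., "move chair 3"
};

class Sequencer {
    uint64_t next_ = 0;
    std::vector<Message> log_;  // the single, globally ordered message stream
public:
    // Called when a node's broadcast reaches the sequencer: stamp the
    // message with the next sequence number and hand it to the reliable
    // one-to-many transport (elided) for dissemination to all nodes.
    Message stampAndDisseminate(int sender, std::string body) {
        Message m{next_++, sender, std::move(body)};
        log_.push_back(m);
        return m;
    }
};
```

Because every message passes through one sequencer, all nodes observe the same total order, which is what keeps the replicated models consistent.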

We chose the message bus abstraction because it provides location, invocation and replication transparency for applications (Architecture Projects Management, 1989), which makes the programming of these applications easier. GRASP programmers are familiar with the concept of multiple local views and events, both of which we have extended to our distributed setting.

The Forecast message bus is used within our two collaborative AR demonstrators to implement model replication and direct interaction between components (e.g., to send pointer tracking information to remote participants), and also to provide generic functions like floor control and locking, state transfer, shared manipulators, video transmission (based on the MBONE audio and video library (Macedonia & Brutzman, 1994)), and synchronization between video and tracking events (using RTP-style time-stamps).

8. Discussion

Using augmented reality in realistic applications requires the computer to be very well informed about the 3D world in which users perform their tasks. To this effect, AR systems use various different approaches to obtain, register and track object and scene models. Of particular importance are different sensing devices, such as cameras or magnetic trackers. They provide the essential real-time link between the computer's internal, "virtual" understanding of the world and reality. All such sensors need to be calibrated carefully so that the incoming information is in alignment with the physical world.

Sensor input is not used to its full potential in current AR systems—due to real-time constraints, as well as due to the lack of algorithms that interpret signals or combine information from several sensors. Research fields such as computer vision, signal processing, pattern recognition, speech processing, etc. have investigated such topics for some time. Some algorithms are maturing so that—considering the projected annual increases in computer speed—it should soon become feasible to consider their use in AR applications. In particular, many applications operate under simplified (engineered) conditions so that scene understanding becomes an easier task than the general computer vision problem (see, for example, (Marr, 1982)).

We operate at this borderline between computer vision and AR, injecting as much automation into the process as feasible while using an engineering approach towards simplifying the tasks of the algorithms. In this respect, we emphasize the hybrid use of various different techniques, including interactive user input where convenient, as well as other sensing modalities (magnetic trackers). This paper has shown how we have developed and explored different techniques to address some of the important AR issues. Our pragmatic approach has allowed us to build several realistic demonstrations. Conversely, these applications influence our research focus, indicating clearly the discrepancy between the state of the art and what is needed. Tradeoffs between automation and assistance need to be further explored. User interaction should be reserved as much as possible for the high-level control of the scene and its augmentation with synthetic information from multi-media databases. More sensing modalities need to be explored which will allow the user to interact with the computer via more channels, such as gesture and sound. Experimentation with head-mounted, see-through displays is crucial as well—especially in regard to the question whether and how the AR system can obtain optical input similar to what the user sees so that computer vision techniques can still be used. The foremost concern, however, remains the provision of fast, real-time interaction capabilities with real and virtual objects integrated seamlessly in an augmented world. To this end, the accurate modeling, tracking and prediction of user or camera motion is essential.

A related research direction leads us to investigate the collaborative use of augmented reality. As reported in this paper, we have developed a distributed infrastructure so that all our demonstrations can operate in a collaborative setting. We consider the collaborative use of AR technology to be a key interaction paradigm in the emerging global information society. The highly interactive, visual nature of AR imposes hard requirements on the distributed infrastructure, and demands the development of appropriate collaboration styles.

Augmented reality, especially in a collaborative setting, has the potential to provide much easier and more efficient use of human and computer skills by merging the best capabilities of both. Considering the rapid research progress in this field, we expect futuristic scenarios like collaborative interior design, or joint maintenance and repair of complex mechanical devices, to soon become reality for the professional user.

Acknowledgments

This work was financially supported by Bull SA, ICL PLC, and Siemens AG. We would like to thank the director of ECRC, Alessandro Giacalone, for many stimulating discussions regarding potential application scenarios for distributed, collaborative augmented reality. Many colleagues at ECRC, especially Stéphane Bressan and Philippe Bonnet, contributed significantly to the successful implementation and presentation of the Interior Design and Mechanical Repair demonstrations, providing other key pieces of technology (database access) that were not discussed in this paper.

References

Ahlers, K.H., Crampton, C., Greer, D., Rose, E., & Tuceryan, M. (1994). Augmented vision: A technical introduction to the GRASP 1.2 system. Technical Report ECRC-94-14, http://www.ecrc.de.

Ahlers, K.H., Kramer, A., Breen, D.E., Chevalier, P.-Y., Crampton, C., Rose, E., Tuceryan, M., Whitaker, R.T., & Greer, D. (1995). Distributed augmented reality for collaborative design applications. Proc. Eurographics '95.

Architecture Projects Management. (1989). ANSA: An Engineer's Introduction to the Architecture. APM Limited, Poseidon House, Cambridge CB3 0RD, United Kingdom, Nov.

Azuma, R., & Bishop, G. (1994). Improving static and dynamic registration in an optical see-through display. Computer Graphics, July, 194-204.

Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. Computer Graphics, July, 203-210.

Bajura, M., & Neumann, U. (1995). Dynamic registration correction in augmented-reality systems. Proc. of the Virtual Reality Annual International Symposium (VRAIS '95), 189-196.

Bar-Shalom, Y., & Fortmann, T.E. (1988). Tracking and Data Association. Academic Press, New York.

Baudel, T., & Beaudouin-Lafon, M. (1993). Charade: Remote control of objects using freehand gestures. Communications of the ACM, 36(7), 28-35.

Bell, G., Parisi, A., & Pesce, M. (1995). The virtual reality modeling language, version 1.0 specification. http://vrml.wired.com/vrml.tech/.

Betting, F., Feldmar, J., Ayache, N., & Devernay, F. (1995). A framework for fusing stereo images with volumetric medical images. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 30-39.

Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proc. of the Hawaii International Conference on System Sciences, 659-669.

Deering, M. (1992). High resolution virtual reality. Computer Graphics, 26(2), 195-202.

Dewan, P. (1995). Multi-user architectures. Proc. EHCI '95.

Drascic, D., Grodski, J.J., Milgram, P., Ruffo, K., Wong, P., & Zhai, S. (1993). Argos: A display system for augmenting reality. Formal video program and proc. of the Conference on Human Factors in Computing Systems (INTERCHI '93), 521.

Feiner, S., MacIntyre, B., & Seligmann, D. (1993). Knowledge-based augmented reality. Communications of the ACM, 36(7), 53-62.

Fournier, A. (1994). Illumination problems in computer augmented reality. Journée INRIA, Analyse/Synthèse d'Images, Jan., 1-21.

Gelb, A. (ed.) (1974). Applied Optimal Estimation. MIT Press, Cambridge, MA.

Gleicher, M., & Witkin, A. (1992). Through-the-lens camera control. Computer Graphics, July, 331-340.

Goldstein, H. (1980). Classical Mechanics. Addison-Wesley, Reading, MA.

Gottschalk, S., & Hughes, J. (1993). Autocalibration for virtual environments tracking hardware. Computer Graphics, Aug., 65-72.

Greer, D.S., & Tuceryan, M. (1995). Computing the Hessian of object shape from shading. Technical Report ECRC-95-30, http://www.ecrc.de.

Grimson, W.E.L., Ettinger, G.J., White, S.J., Gleason, P.L., Lozano-Perez, T., Wells, W.M. III, & Kikinis, R. (1995). Evaluating and validating an automated registration system for enhanced reality visualization in surgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 3-12.

Grimson, W.E.L., Lozano-Perez, T., Wells, W.M. III, Ettinger, G.J., White, S.J., & Kikinis, R. (1994). An automatic registration method for frameless stereotaxy, image guided surgery, and enhanced reality visualization. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), 430-436.

Grimson, W.E.L. (1990). Object Recognition by Computer. MIT Press, Cambridge, MA.

Henri, C.J., Colchester, A.C.F., Zhao, J., Hawkes, D.J., Hill, D.L.G., & Evans, R.L. (1995). Registration of 3D surface data for intra-operative guidance and visualization in frameless stereotactic neurosurgery. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 47-58.

Holloway, R. (1994). An Analysis of Registration Errors in a See-Through Head-Mounted Display System for Craniofacial Surgery Planning. Ph.D. thesis, University of North Carolina at Chapel Hill.

Horn, B.K.P. (1986). Robot Vision. MIT Press, Cambridge, MA.

Horn, B.K.P., & Brooks, M.J. (1989). Shape from Shading. MIT Press, Cambridge, MA.

Janin, A., Mizell, D., & Caudell, T. (1993). Calibration of head-mounted displays for augmented reality applications. Proc. of the Virtual Reality Annual International Symposium (VRAIS '93), 246-255.

Kaashoek, M.F., & Tanenbaum, A.S. (1992). Fault tolerance using group communication. Operating Systems Review.

Kancherla, A.R., Rolland, J.P., Wright, D.L., & Burdea, G. (1995). A novel virtual reality tool for teaching dynamic 3D anatomy. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 163-169.

Kramer, A., & Chevalier, P.-Y. (1996). Distributing augmented reality. Submitted to the Virtual Reality Annual International Symposium (VRAIS '96).

Lorensen, W., Cline, H., Nafis, C., Kikinis, R., Altobelli, D., & Gleason, L. (1993). Enhancing reality in the operating room. Proc. of the IEEE Conference on Visualization, 410-415.

Lowe, D. (1985). Perceptual Organization and Visual Recognition. Kluwer Academic, Norwell, MA.

Macedonia, M.R., & Brutzman, D.P. (1994). MBONE provides audio and video across the internet. IEEE Computer, April.

Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, San Francisco.

Mellor, J.P. (1995). Real-time camera calibration for enhanced reality visualizations. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 471-475.

Milgram, P., Zhai, S., Drascic, D., & Grodski, J.J. (1993). Applications of augmented reality for human-robot communication. Proc. of the International Conference on Intelligent Robots and Systems (IROS '93), 1467-1472.

Peria, O., Chevalier, L., François-Joubert, A., Caravel, J.-P., Dalsoglio, S., Lavallee, S., & Cinquin, P. (1995). Using a 3D position sensor for registration of SPECT and US images of the kidney. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 23-29.

Schroeder, W., Zarge, J., & Lorensen, W. (1992). Decimation of triangle meshes. Computer Graphics, 26(2), 65-70.

SunSoft (1991). The ToolTalk Service. Technical report, SunSoft, June.

Tuceryan, M., Greer, D., Whitaker, R., Breen, D., Crampton, C., Rose, E., & Ahlers, K. (1995). Calibration requirements and procedures for a monitor-based augmented reality system. IEEE Transactions on Visualization and Computer Graphics, 1, 255-273.

Turk, G. (1992). Retiling polygonal surfaces. Computer Graphics, 26(2), 55-64.

Uenohara, M., & Kanade, T. (1995). Vision-based object registration for real-time image overlay. Proc. of the IEEE Conference on Computer Vision, Virtual Reality and Robotics in Medicine (CVRMed '95), 13-22.

Wellner, P. (1993). Interacting with paper on the DigitalDesk. Communications of the ACM, 36(7), 87-96.

Weng, J., Huang, T.S., & Ahuja, N. (1989). Motion and structure from two perspective views: Algorithms, error analysis, and error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5), 451-476.

Whitaker, R., Crampton, C., Breen, D., Tuceryan, M., & Rose, E. (1995). Object calibration for augmented reality. Proc. Eurographics '95.

Wloka, M., & Anderson, B. (1995). Resolving occlusion in augmented reality. Proc. of the ACM Symposium on Interactive 3D Graphics, 5-12.

Table of Contents

1. Introduction
2. Previous Work
3. Application Scenarios
3.1 Collaborative Interior Design
3.2 Collaborative Mechanical Repair
4. System Infrastructure
5. Specification and Alignment of Coordinate Spaces
5.1 Calibration of Sensors and Video Equipment
5.1.1 Image Calibration
5.1.2 Camera Calibration
5.1.3 Magnetic Tracker Calibration
5.2 Registration of Interaction Devices and Real Objects
5.2.1 Pointer Registration
5.2.2 Object Registration
5.3 Tracking of Objects and Sensors
5.3.1 Magnetic Tracking
5.3.2 Optical Tracking
6. Object Interaction
6.1 Acquisition of 3D Scene Descriptions
6.1.1 Dense Shape Estimates from Stereo Data
6.1.2 Shape from Shading
6.2 Mixing of Real and Virtual Worlds
7. Collaborative Use of AR
7.1 Architecture for Shared AR
7.2 Providing Distribution
8. Discussion

List of Figures

Figure 1. Augmented room showing a real table with a real telephone and a virtual lamp, surrounded by two virtual chairs. Note that the chairs are partially occluded by the real table while the virtual lamp occludes the table.

Figure 2. Augmented engine.

Figure 3. The GRASP system hardware configuration.

Figure 4. The GRASP system software configuration.

Figure 5. The camera calibration grid.

Figure 6. 3D pointing device.

Figure 7. Calibration and tracking of an engine model: a wireframe engine model registered to a real model engine using an image-based calibration (a); but when the model is turned and its movements tracked (b), the graphics show the misalignment in the camera's z-direction.

Figure 8. Our optical tracking approach currently tracks the corners of squares. The left figure shows a corner of a room with eight squares. The right figure shows the detected squares only.

Figure 9. Augmented scene with a virtual chair and shelf that were rendered using the automatically tracked camera parameters.

Figure 10. Each 3D motion can be decomposed into a translation t and a rotation about an axis through the center of mass of the object, which is constant in the absence of any forces; the figure also shows the world coordinate frame and the camera coordinate frame.

Figure 11. Modified engine. The fact that the user has removed the air cleaner is not yet detected by the AR system. The virtual model thus does not align with its real position.

Figure 12. An example pair of stereo images: (a) left image and (b) right image.

Figure 13. The disparities computed on the stereo pair in Figure 12: (a) disparities in rows (du) and (b) disparities in columns (dv). The brighter points have larger disparities.

Figure 14. (a) The computed depth map from the pair of images in Figure 12. The brighter points are farther away from the camera. (b) The computed depth map in (a) is used to occlude the virtual object (in this case a cube) which has been added to the scene.