
Adaptive and Smart Interface for VCR Remote Control Using Hand Gestures

Zoran Duric, Harry Wechsler, James Yven


Dept. Computer Science, George Mason University
Fairfax, VA 22030-4444, USA
{zduric, wechsler, jyven}@gmu.edu

Abstract

This paper describes an adaptive and smart interface and shows its successful application for VCR remote control using hand gestures. The interface is capable of learning the user's operational habits and can offer self-help, aka a wizard mode of operation; it can then monitor the user's gestures and maintain constant vigilance in an attempt to assist her through feedback using the video display and/or a loud-speaker. The availability of users' profiles is used in an adaptive fashion to enhance human-computer interactions and to make them intelligent, i.e., causal. The smart interface is suitable for handicapped users and it can be used for security purposes too.

1. Introduction

Imagine a smart human-computer interface (HCI) that could understand and predict the user's needs using gestural information. Further imagine that the interface could adapt itself - simplify, highlight, recommend or explain - to improve the human-computer interaction using its diagnoses and predictions. Smart interfaces continuously adapt the interface medium in order to meet specific users' needs and demands. The emergence of human-centered interaction with intelligent systems creates a richer, more versatile and effective environment for human activity. Human-centered design is problem-driven, activity-centered, and context-bound; it employs computing technology as a tool for the user, not as a substitute. The emphasis is on supporting human activity using adaptive and smart interfaces rather than on building (fully) autonomous systems that mimic humans. One approach to a human-centered use of intelligent system technology seeks to make such systems "team players" in the context of human activity, where people and computer technology interact to achieve a common purpose. The goal for smart interfaces is to expand on the human perceptual, intellectual, and motor activities.

HCI has developed mostly along two competing methodologies (Shneiderman & Maes, 1997): direct manipulation and intelligent agents (also known as delegation). The two approaches can be contrasted as the computer sitting passively waiting for input from the human versus the computer taking over from the human. In contrast to those established paradigms, Human Computer Intelligent Interaction (HCII) endows the interface with "smarts" to increase the bandwidth through which humans interact with computers (Huang, 1997; Pavlovic, Sharma, & Huang, 1997). The availability of users' profiles is used in an adaptive fashion to enhance human-computer interactions and to make them intelligent, i.e., causal. The intelligent interface, shown in Fig. 1, is both adaptive and smart. As the interface learns users' profiles and its smarts increase, the interface anticipates and/or guesses users' needs and thus becomes responsive to their requests, expressed both implicitly and explicitly. This paper describes an application of HCII - intelligent VCR remote control - using hand gestures.

Figure 1. Intelligent VCR Remote Control Using Hand Gestures

1051-4651/02 $17.00 © 2002 IEEE
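As a toy illustration of the profile-driven anticipation described above, a user-habit profile can simply count which command tends to follow which, and propose the most frequent continuation. The sketch below is our own (the paper's interface works on gesture recognition vectors rather than command strings, and its wizard logic is richer than a first-order frequency count):

```python
from collections import Counter, defaultdict

class HabitProfile:
    """Toy user-habit profile: records observed command sequences and
    guesses the most likely next command from what the user last did."""

    def __init__(self):
        # For each command, a counter of the commands that followed it.
        self.next_counts = defaultdict(Counter)

    def observe(self, sequence):
        # Record which command followed each command in one session.
        for prev, nxt in zip(sequence, sequence[1:]):
            self.next_counts[prev][nxt] += 1

    def anticipate(self, last_command):
        # Wizard-style guess: the most frequent follower, or None
        # when no habit trend has been recorded yet.
        counts = self.next_counts.get(last_command)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

A profile like this would be updated after every executed command and consulted whenever recognition is ambiguous, mirroring the "anticipate and/or guess users' needs" behaviour described above.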


2. VCR Remote Control Specifications

The physical VCR remote control system interface is shown in Fig. 2. It allows people to operate a VCR with a single hand using user-definable gestures. The gestures can be of any unique and distinct type formable by the user's hand, regardless of any physical deformity. The VCR interface gets its raw information through a video camera, which allows it to "see" what the user's gestures are. The interface personalizes itself by allowing the user to "train" the system on the specific control operations that she wants executed using hand gestures. The set of commands includes play, rewind, fast forward, record, etc. The interface is capable of learning the user's operational habits and can offer self-help, aka a wizard mode of operation; it can then monitor the user's gestures and maintain constant vigilance in an attempt to assist her through feedback using the video display and/or a loud-speaker. Examples of such assistance include shortcuts to commands and alerts to the user that her present gestures are not consistent with past actions, asking her for clarification and/or confirmation.

Figure 2. VCR Remote Gesture Control Physical System

The interface is actuated when an object comes into its field of view, e.g., when one inserts her hand under the camera; it constantly updates the camera image input. Once actuated, the interface can operate in two modes, Learn and Run. In "Learn" mode, the interface attempts to increase its vocabulary of commands by allowing the user to teach it her unique set of gestures. In "Run" mode, the interface attempts to recognize the user's gestures used to control the VCR. When the VCR interface is activated, the feedback video screen displays two vertical boxes; the left side is labelled "Learn" and the right side "Run". The user selects either mode by placing her hand on the right or left side of the camera's field of view. Once the user places her hand, the selected mode is highlighted; if the user allows her hand to dwell on the selected side for more than 3 seconds, the highlighted mode is activated.

When the "Learn" mode is activated, the video feedback screen displays a message asking the user to make her signature; instructions for that purpose are also displayed. Signatures are made by forming a hand gesture, e.g., a tight fist, and then placing the hand within the visual field area. The gesture indicating the signature is displayed on the screen within a grid with a numerical X-Y value, e.g., 2-3; this is the user's signature value. One has to maintain the hand gesture for at least one second, remove the hand from the field of view, and reconfirm the signature using the same hand gesture for at least another second. Once the VCR system identifies a user, it activates her user habit profile file if she is known; if the user is a newcomer, it creates a new user file for her. The VCR message screen now displays the acceptable commands, one at a time, and asks the user to make a sequence of hand gestures ("actions") in the field of view. The user then indicates the command that she wants the interface to learn from her. Once a command to learn is selected, the interface displays a message explaining to the user the start and stop criteria used by the system to view and learn the corresponding gestures.

The "Run" mode works as follows. The video feedback screen displays a message asking the user to make her signature. If the signature is validated, the interface displays a message stating that it is in "vigilance" and is ready to accept the user's commands. The user may now use the hand gesture commands that she has previously taught the interface. Note that the interface updates the user habit profile file as it continuously learns during the "Run" mode of operation.

3. Interface Operation and Implementation

The VCR remote control system consists of the following components:

Data Acquisition: a webcam made by D-Link, model DSB-C100, with a USB interface.
Computer Platform: Intel Celeron 433 MHz eMachine with 256 MB of RAM.
Software: Microsoft Visual C++ V6.0, with Microsoft's Vision SDK V2.0 and DuPont's ImageMagick as auxiliary function support.

Interface Operation:

1. Gesture ("signature") image capture;
2. User signature (gesture) identification;
3. If identification fails then go to step 10 else retrieve user file;
4. Gesture ("command") image capture;
5. Image registration; if registration fails then go to step 4;
6. Pre-processing; if pre-processing fails then go to step 4;
7. Image parsing and feature extraction; if image parsing and/or feature extraction fails, then go to step 4;
8. Command recognition; if successful then {update user habit file profile, execute command, go to step 4} else {form recognition vector, match recognition vector against user habit file profile};
9. If no habit trend is found in the user habit file profile then go to step 4, else use the wizard to consult the user via step 4 (the interface looks for a user response: if the user agrees with the wizard's suggestion then update the user habit file profile, complete and execute the command, and go to step 4);
10. Initialise a new user habit file profile, ask the user to make a hand gesture in order to register her signature for identification, and save the signature gesture for future login; if successful then go to step 4, else go to step 1 and restart.

Interface Implementation:

Data Acquisition: A camera capable of capturing color video frames at the rate of 12 frames per second with 320x240 pixel resolution. The camera is pre-mounted on a static stand with its focus and lighting properly adjusted. The background is a non-reflective black surface, so that any object that comes into its field of view, such as a hand gesture, can have its image captured without much background interference or disturbance.

Registration: Its function is to center the input image over the field of view of the camera. If the image is not "balanced", with some prespecified proportional amounts of blank space on the top, right, and left of the image, a message is displayed on the monitor to inform the user to "center" her hand gesture.

Pre-processing: Its function is to enhance image quality. It starts by averaging RGB values, and the image is converted to B/W. It proceeds to "clean" the image by removing "salt and pepper" artifacts. Towards that end it checks the "entropy" of the image; if the entropy is too "high", an open and a close morphology operation are applied to smooth out the image [1]. Finally it finds the largest connected component ("blob") in the image; this is the hand gesture, since there should be no other objects in the field of view.

Parsing and Feature Extraction: It extracts pre-defined features from the image using image skeletonization [2], e.g., the number of fingers, the size of the hand image in pixel counts, the general orientation in degrees of the hand gesture, etc. The features form a recognition vector for individual gestures.

Recognition: The recognition vector is compared with the set of vector commands that the user has earlier taught the interface, using flexible matching consisting of chamfer matching [3]. A command vector may be composed of one or more recognised gestures.

User ID - Signature: The interface gathers features such as the number of pixels, the angular orientation, the number of fingers extended and their lengths, and RGB values. These features form a unique User ID signature. When an ID login from the user can be verified, the user's habit profile file is retrieved; otherwise a new user habit profile file is created.

Command: If a command vector is recognised, e.g., turn on the VCR play function, the VCR executes it via an electrical signal issued from the computer to an electrical relay switch.

Wizard: The wizard is vigilant to incoming commands as it attempts to anticipate what the user's intent is. If a reasonable "estimate" is found, from comparisons made with stored recognition vectors, it advises the user via a screen message; if the user agrees with the wizard's understanding, the suggested command is executed without further user input. If the wizard detects a discrepancy from previous user-executed commands and/or habits, the wizard flags the user, via a screen message, that her present command deviates from previous similar commands, and asks her for confirmation or correction.

4. Experiments

We describe and illustrate below one of the characteristic uses of the VCR remote control using the smart interface we developed. This example emphasises the challenge that a handicapped user can pose to the interface's ability to adapt.

The user has the use of his left hand with limited mobility and has only the (deformed) index and middle fingers functional. The user elected to use a two-"symbol" gesture to act as a simple, but limited, control command set, which gives 2x2 = 4 control codes. Examples: Rewind = A, A; Play = A, B; Fast forward = B, A; Stop = B, B; where A and B are the two symbols (see Fig. 3a, 3b).

- The user activates the system by inserting any hand gesture into the system camera's field of view.
- The monitor displays the message "User Identification" on the bottom of the screen (displayed as a grid).
- The user places his hand with its identification gesture in the field of view, covering the proper grids on the screen (see Fig. 3c).
- Once the user is identified, the VCR interface proceeds to register his gestures. If the gesture is out
of registration (see Fig. 3d), the interface advises the user (via monitor message display and/or audio message) to reposition the gesture. Proper registration is shown in Fig. 3a and 3b.
- The monitor now displays "Learn" on the right half and "Run" on the left half of the monitor.
- The user makes a hand gesture in the camera's field of view so that the "Run" side is highlighted, and maintains the gesture for at least one second so that the interface can register his request to activate the "Run" mode.
- The user wants to play the VCR (code = A, B), and he places a hand gesture with the index finger extended into the field of view first (see Fig. 3a). The interface acknowledges the request with a screen blink and an audio beep when the proper registration of the gesture is made.
- The system records the average RGB color values of the image, converts the image into B/W (see Fig. 3e), extracts the largest connected component (see Fig. 3f), finds the gesture's outline and centroid (see Fig. 3g), and extracts skeleton features, namely the location of the fingers (see Fig. 3h) and the image grid occupancy (see Fig. 3i). These features form the recognition vector.
- The user removes his hand gesture from the field of view and makes a gesture for symbol B by extending both the index and middle fingers (see Fig. 3a and 3b).
- The system matches the recognition vector with the user habit file profile, understands the user's request to play the VCR, and executes the play function.

Figure 3. Experimental run through the system and its intermediate image outputs.

5. Conclusions

We have described a smart interface and showed its successful application for VCR remote control. The interface is suitable for handicapped users and can be used for security purposes too. We are now expanding the range of commands using two hand gestures and allowing for continuous gestural movements.
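Stepping back, the ten interface-operation steps listed in Section 3 amount to a signature login followed by a capture-process-recognize loop. The sketch below is our own condensation, not the authors' C++ implementation: the function and variable names are ours, and the vision stages (registration, pre-processing, parsing) are reduced to injectable stubs that return a transformed frame or None on failure.

```python
def run_interface(frames, identify, pipeline, profile):
    """Condensed sketch of the Section 3 operation loop.

    frames   -- finite iterable of captured images (steps 1 and 4)
    identify -- maps the first frame to a user id, or None (steps 2-3)
    pipeline -- stage functions (registration, pre-processing, parsing);
                each returns a transformed frame, or None to abort the
                current gesture and capture again (steps 5-7)
    profile  -- user habit profile: maps feature vectors to commands (step 8)
    Returns the user id and the list of executed commands.
    """
    frames = iter(frames)
    user = identify(next(frames))      # steps 1-3: signature login
    if user is None:
        user = "new-user"              # step 10: initialise a new profile
    executed = []
    for img in frames:                 # step 4 onwards: one gesture per frame
        for stage in pipeline:         # steps 5-7: abort on any failure
            img = stage(img)
            if img is None:
                break
        else:
            cmd = profile.get(img)     # step 8: recognition vector matching
            if cmd is not None:
                executed.append(cmd)   # update habits and execute command
    return user, executed
```

In the real system the loop never terminates and step 9 interposes the wizard before execution; here the wizard branch is omitted to keep the control flow readable.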

6. References
[1] R. M. Haralick, S. R. Sternberg and X. Zhuang (1987), Image Analysis Using Mathematical Morphology, IEEE Transactions on PAMI, 9(4): 532-550.
[2] H. Fujiyoshi and A. Lipton (1998), Real-time Human Motion Analysis by Image Skeletonization, IEEE Proc. of the Workshop on Applications of Computer Vision.
[3] G. Borgefors (1986), Distance Transformations in Digital Images, Computer Vision, Graphics and Image Processing, 34: 344-371.
[4] T. S. Huang (1997), Workshop on Human Computer Intelligent Interaction and Human Centered Systems, ISGW '97 NSF Interactive Systems Grantees Workshop.
[5] V. I. Pavlovic, R. Sharma and T. S. Huang (1997), Visual Interpretation of Hand Gestures for Human-Computer Interaction, IEEE Transactions on PAMI, 19(7): 677-695.
[6] B. Shneiderman and P. Maes (1997), Direct Manipulation vs. Interface Agents, Interactions, 4(6): 42-61.

