
2. Review of Related Literature


2.1 PC Camera
A PC camera, popularly known as a web camera or webcam, is a real-time camera widely
used for video conferencing via the Internet. Images acquired from this device can be uploaded
to a web server, making them accessible through the World Wide Web, instant messaging, or a
PC video-calling application. Over the years, several applications have been developed, including in
the fields of astrophotography, traffic monitoring, and weather monitoring. Web cameras
typically include a lens, an image sensor, and some support electronics. The image sensor can be
a CMOS or a CCD, the former being dominant in low-cost cameras. Typically, consumer
webcams offer a resolution in the VGA region at a rate of around 25 frames per second.
Various lenses are also available, the most common being a plastic lens that can be screwed in and out to
manually control the camera focus. Support electronics are present to read the image from the
sensor and transmit it to the host computer.
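To make the capture pipeline concrete, the following is a minimal sketch of reading frames from such a webcam. It assumes OpenCV's Python bindings and a camera exposed at device index 0, neither of which is part of the hardware description above; the resolution and frame-rate values simply mirror the typical VGA, ~25 fps figures mentioned.

```python
import cv2

# Open the default webcam (device index 0 is an assumption; it varies per machine).
capture = cv2.VideoCapture(0)
capture.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # request a VGA-sized frame
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while True:
    ok, frame = capture.read()        # one frame read from the sensor via the driver
    if not ok:
        break
    cv2.imshow("webcam", frame)       # hand the frame to the host application
    if cv2.waitKey(40) & 0xFF == 27:  # poll at roughly 25 fps; Esc exits
        break

capture.release()
cv2.destroyAllWindows()
```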
2.2 Projectors
Projectors are classified into two technologies, DLP (Digital Light Processing) and
LCD (Liquid Crystal Display). This classification refers to the internal mechanism that the projector uses to
compose the image (Projectorpoint).
2.2.1. DLP
DLP technology uses an optical semiconductor known as the Digital
Micromirror Device, or DMD chip, to recreate the source material. Below is an
illustration of how it works (Projectorpoint).

2.2.1.1. Advantages of DLP Projectors


DLP projectors have several advantages over LCD projectors. First,
there is less 'chicken wire' or 'screen door' effect on DLP because its pixels are
much closer together. Another advantage is higher contrast compared to LCD.
DLP projectors are also more portable because they require fewer
components, and finally, it has been claimed that DLP projectors last longer than
LCD projectors (Projectorpoint).
2.2.1.2. Disadvantages of DLP Projectors
DLP projectors also have disadvantages to consider. They have
less color saturation. The 'rainbow effect' appears when looking from one
side of the screen to the other, or when looking away from the projected image
to an off-screen object, and sometimes a 'halo effect' appears (Projectorpoint).
2.2.2. LCD
LCD projectors contain three separate LCD glass panels, one each for the red, green, and
blue components of the image signal sent to the projector. As light
passes through the LCD panels, individual pixels can be opened to allow light to pass
or closed to block it. This activity modulates the light and produces the image
that is projected onto the screen (Projectorpoint).
2.2.2.1. Advantages of LCD Projectors
LCD projectors have advantages over DLP projectors: they are
more 'light efficient' than DLP, they produce more saturated colors that make the image
seem brighter than that of a DLP projector, and they produce a sharper image (Projectorpoint).
2.2.2.2. Disadvantages of LCD Projectors
Disadvantages of LCD projectors compared with DLP projectors are: they produce
a 'chicken wire' effect, causing the image to look more pixelated; they
are bulkier because there are more internal components; dead
pixels, which are pixels that are permanently on or permanently off, can appear
and are irritating to see; and the LCD panels can fail and are very expensive to
replace (Projectorpoint).
2.3 Similar Research
2.3.1. Bare-Hand Human-Computer Interaction
Human-computer interaction describes the interaction between the user and the
machine. Devices such as keyboards, mice, joysticks, electronic pens, and remote
controls are commonly used as the means of human-computer interaction. Real-time
bare-hand interaction is the control of a computer system without any device or
wires attached to the user; the positions of the fingers and the hand are used to
control the applications (Hardenberg, 2001).
2.3.1.1. Applications
Bare-hand computer interaction can be more practical than traditional input
devices. A good example is during a presentation: the presenter may use hand
gestures for selecting slides, minimizing the delays or pauses caused by
moving back and forth to the computer to click for the next slide.
A perceptual interface allows systems to be installed in small areas and allows
users to operate at a distance. Direct manipulation of virtual objects
using the fingers is made possible with this system. Also, a nearly
indestructible interface can be built by mounting the projector and camera
high enough that the user cannot access or touch them, making the system
less prone to damage caused by users (Hardenberg, 2001).
2.3.1.2. Functional Requirements
The functional requirements comprise the services needed for a vision-based
computer interaction system. The three essential services in the
implementation of such a system are detection, identification, and
tracking. Detection determines the presence and position of the acquired
objects; its output can be used for controlling applications. The
identification service recognizes whether an object present in the scene belongs to a
given class of objects; typical identification tasks include recognizing
a certain hand posture and the number of fingers visible. The tracking service is
required to tell which object moved between two frames, since the
identified objects will not rest in the same position over time (Hardenberg,
2001).
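A schematic sketch of how these three services could be separated in code is shown below. The class and method names are illustrative assumptions, not taken from Hardenberg (2001); the point is only the division of responsibility between detection, identification, and tracking.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DetectedObject:
    position: Tuple[int, int]        # pixel coordinates reported by detection
    label: Optional[str] = None      # filled in by identification (e.g. hand posture)
    track_id: Optional[int] = None   # filled in by tracking across frames

class VisionServices:
    """Illustrative split of the three essential services."""

    def detect(self, frame) -> List[DetectedObject]:
        """Detection: report the presence and position of candidate objects."""
        raise NotImplementedError

    def identify(self, frame, objects: List[DetectedObject]) -> List[DetectedObject]:
        """Identification: decide whether each object belongs to a known class,
        such as a particular hand posture or a number of visible fingers."""
        raise NotImplementedError

    def track(self, previous: List[DetectedObject],
              current: List[DetectedObject]) -> List[DetectedObject]:
        """Tracking: associate objects between two frames so movement can be told."""
        raise NotImplementedError
```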
2.3.1.3. Non-Functional Requirements
Non-functional requirements describe the minimum quality expected
from a service. The qualities to be monitored and maintained are latency,
resolution, and stability. Latency is defined as the lag between the user's action
and the response of the system. No system is entirely free of latency, so the
acceptable latency of the system is of particular importance since the
application requires real-time interaction. A minimum input resolution is
important for the detection and identification processes; it is difficult to identify
fingers with a resolution width below six pixels. The tracking service is said to be
stable if, as long as the tracked object does not move, the measured
position does not change (Hardenberg, 2001).
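As a simple illustration of monitoring the latency requirement, the sketch below times one pass of a stand-in processing step per frame; the function and frame used here are placeholders, not part of the cited system.

```python
import time

def measure_latency(process, frame):
    """Time a single pass of the vision pipeline over one frame."""
    start = time.perf_counter()
    result = process(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stand-in frame and processing step; a real pipeline would run
# detection, identification and tracking here.
dummy_frame = [[0] * 640 for _ in range(480)]
_, ms = measure_latency(lambda f: sum(len(row) for row in f), dummy_frame)
print(f"per-frame latency: {ms:.2f} ms")
```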
2.3.2. Dynamically Reconfigurable Vision-Based User Interfaces
Vision-based user interfaces (VB-UI) are an emerging area of user interface
technology where a user’s intentional gestures are detected via camera, interpreted and
used to control an application. The paper describes a system where the application
sends the vision system a description of the user interface as a configuration of widgets.
Based on this, the vision system assembles a set of image processing components that
implement the interface, sharing computational resources when possible. The
parameters of the surfaces where the interface can be realized are defined and stored
independently of any particular interface. These include the size, location and
perspective distortion within the image and characteristics of the physical environment
around that surface, such as the user’s likely position while interacting with it.
The framework presented in this paper should be seen as a way for vision-based
applications to adapt easily to different environments. Moreover, the proposed
vision-system architecture is well suited to the increasingly common situations
where the interface surface is not static (Kjeldsen, 2003).
2.3.2.1. Basic Elements
A VB-UI is composed of configurations, widgets, and surfaces.
Configurations are individual interaction dialogs. A configuration specifies a
boundary area that defines the configuration coordinate system; the boundary
is used during the process of mapping a configuration onto a particular surface.
Each configuration is a collection of interactive widgets. A widget provides an
elemental user interaction, such as detecting a touch or tracking a fingertip, and
generates events back to the controlling application, where they are mapped to
control actions such as triggering an event or establishing the value of a
parameter. A surface is essentially the camera's view of a plane in 3D space. It
defines the spatial layout of widgets with respect to each other and the
world, but it should not be concerned with the details of the recognition process
(Kjeldsen, 2003).
2.3.2.2. Architecture
In this system, each widget is represented internally as a tree of
components. Each component performs one step in the widget’s operation.
There are components for finding the moving pixels in an image (Motion
Detection), finding and tracking fingertips in the motion data (Fingertip
Tracking), looking for touch-like motions in the fingertip paths (Touch Motion
Detection), generating the touch event for the application (Event Generation),
storing the region of application space where this widget resides (Image Region
Definition), and managing the transformation between application space and
the image (Surface Transformation) (Kjeldsen, 2003).
The figure below shows the component tree of a “touch button” and the
“tracking area.”
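As a rough, illustrative sketch of how such a component tree might be assembled, the code below chains the components named above. The class design and the exact nesting are assumptions made for illustration; they are not the structure published by Kjeldsen (2003).

```python
class Component:
    """One processing step in a widget's pipeline."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def process(self, data):
        # A real component would transform the data (detect motion, track a
        # fingertip, emit an event, ...); this sketch only forwards it.
        for child in self.children:
            child.process(data)

# A "touch button" widget assembled as a tree of components
# (the nesting order here is assumed for illustration).
touch_button = Component("Surface Transformation", [
    Component("Image Region Definition", [
        Component("Motion Detection", [
            Component("Fingertip Tracking", [
                Component("Touch Motion Detection", [
                    Component("Event Generation"),
                ]),
            ]),
        ]),
    ]),
])
```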
2.3.2.3. Example Applications
One experimental application developed using the dynamically
reconfigurable vision system is the Everywhere Display Projector (ED), which
provides information access in retail spaces. The Product Finder application is
another example; its goal is to allow a customer to look up products in a store
directory and then guide him or her to where the product is (Kjeldsen, 2003).

2.3.3. Computer Vision-Based Gesture Recognition for an Augmented Reality Interface


Current research looks toward taking computers beyond the desktop, with
ubiquitous computation as one of its objectives. One such direction is wearable
computers that enhance the human visual sense by augmenting computer-generated
information onto the visual input. A main component of this research is gesture
recognition as input, such as pointing and clicking with a finger. Gesture recognition
comprises two steps: (1) capturing the motion of the user's input and (2) classifying
the gesture into one of a set of predefined gesture classes. Capturing is performed by
either a glove-based or an optical-based system. Optical-based gesture recognition
falls into model-based and appearance-based categories. In a model-based system, a
geometric model of the hand is created and matched to the image data to determine
the state of the hand, while in an appearance-based system the recognition is based
on a pixel representation learned from training images. Because both approaches
carry a computational complexity that is undesirable for Augmented Reality (AR)
systems, they often require enhancements such as markers and infrared lighting. The
paper introduces a gesture recognition method intended to provide a useful interface
while keeping computational complexity low, and outlines how the research was
implemented (Moeslund, 2004).
2.3.3.1. Defining the Gestures
Two primary hand gestures are introduced: a pointing gesture and a clicking
gesture. The minimum requirements for controlling the application are
considered, along with other easy-to-remember gestures that serve as shortcut
commands and avoid numerous pop-up menus (Moeslund, 2004).

2.3.3.2. Segmentation
The task of segmentation is to detect and recognize, in the captured 2D image,
the placeholder objects and pointers onto which the visual output of the system
is projected, as well as the hands. To achieve invariance to the changing size
and form of the objects to be detected, the research used a colour pixel-based
approach that segments spots of similar colour in the image. Problems such as
lighting settings, changing illumination, and skin colour detection are discussed
and given solutions (Moeslund, 2004).
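A minimal sketch of such a colour pixel-based segmentation, assuming OpenCV's Python bindings and placeholder HSV bounds (the cited work defines its own colour models and lighting compensation), is given below.

```python
import cv2
import numpy as np

def segment_colour(frame_bgr, lower_hsv, upper_hsv):
    """Label pixels whose colour falls inside a given HSV range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    # Remove small speckles so only coherent colour spots remain.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask

# Example call with a rough, illustrative skin-tone range:
# skin_mask = segment_colour(frame, (0, 40, 60), (25, 255, 255))
```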
2.3.3.3. Gesture Recognition
A basic approach is used to solve this problem: counting the number of
extended fingers. The hand and fingers can be approximated by a circle and a
number of rectangles, where the number of rectangles corresponds to the number
of fingers shown. A polar transformation is performed around the centre of the
hand, and the number of fingers (rectangles) present at each radius is counted.
The algorithm does not use any information about the relative distances between
two fingers, first because this makes the system more general, and second
because different users' hands differ in shape and size (Moeslund, 2004).
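The sketch below gives a simplified reading of the polar-transform idea: it walks around a circle of a given radius centred on the palm and counts how many separate hand regions (finger "rectangles") cross it. The binary mask, centre, and radius are assumed inputs; this is not the authors' implementation.

```python
import numpy as np

def count_fingers_at_radius(mask, centre, radius):
    """Count finger-like regions crossing a circle of the given radius.

    `mask` is a binary hand segmentation, `centre` the (x, y) of the palm.
    Each background-to-hand transition along the circle is counted as one
    finger crossing that radius (the wrist would also count in this
    simplified version).
    """
    h, w = mask.shape
    angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
    xs = np.clip((centre[0] + radius * np.cos(angles)).astype(int), 0, w - 1)
    ys = np.clip((centre[1] + radius * np.sin(angles)).astype(int), 0, h - 1)
    on_hand = mask[ys, xs] > 0
    # Count transitions from background to hand along the circle.
    transitions = np.count_nonzero(on_hand & ~np.roll(on_hand, 1))
    return transitions
```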
2.3.3.4. System Performance
Gesture recognition has been implemented as part of the computer vision
system of an AR multi-user application. The low-level segmentation can
robustly segment seven different colours from the background (skin colour and
six colours for the placeholder objects and pointers), provided there are no large
changes in the illumination colour (Moeslund, 2004).
Segmentation Results
2.3.4. A Design Tool for Camera-Based Interaction
Constructing a camera-based interface can be difficult for most programmers
and requires a good understanding of the machine vision and learning algorithms
involved. In a camera-based interface, a camera serves as the sensor, or 'eyes,' of the
system with respect to user input. The goal is to make the system interactive without
the user wearing any special devices to detect the input, and without traditional
inputs such as the keyboard. This situates computing in the environment rather than
at the desktop. The problem lies in designing a camera-based system: the
programming and mathematics involved are complicated enough that ordinary
programmers do not have the skills for it, especially when bare-hand input is
considered. The main item to be considered in a camera-based interaction is a
classifier that takes an image and identifies the pixels of interest. Skill in building
such a classifier is needed to pursue the idea (Fails, 2003).
Crayons is one of the tools for making a classifier, which can be exported in a form
that can be read by Java. Crayons helps User Interface (UI) designers make the
camera-based interface even without detailed knowledge of image processing. Its
features are unable to distinguish shapes and object orientation, but it does well at
object detection and at tracking hands and objects (Fails, 2003).

Classifier Design Process


The function of Crayons is to create a classifier with ease. Crayons receives
images; after the user provides input, a classifier is created and feedback is
displayed (Fails, 2003).

2.3.4.1. User Interface


There are four pieces of information that a designer must consider and
operate on in designing a classifier interface: (1) the set of classes to be
recognized, (2) the set of training images to be used, (3) the classification of
pixels as defined by the programmer, and (4) the classifier's current
classification of the pixels (Fails, 2003).
2.3.4.2. Crayons Classifier
Automating classifier creation is the main function of the Crayons
tool. It must extract features and generate classifiers as quickly as
possible; the current Crayons prototype computes about 175 features per pixel
(Fails, 2003).
Finally, to accomplish this, a machine learning algorithm that
can handle a large number of examples with a large number of features is
required (Fails, 2003).
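To illustrate the kind of learner this calls for, the sketch below trains a decision tree on synthetic per-pixel features; decision trees train quickly on many examples, which suits an interactive paint-train-correct loop. The use of scikit-learn, the random features, and the labels are all assumptions for illustration and are not the Crayons implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data: each row is one pixel, each column one feature computed
# around that pixel (colour, local texture, ...).  Eight random features
# stand in for the ~175 per-pixel features mentioned above.
rng = np.random.default_rng(0)
features = rng.random((10_000, 8))
labels = (features[:, 0] + features[:, 1] > 1.0).astype(int)  # fake two-class labels

classifier = DecisionTreeClassifier(max_depth=10)
classifier.fit(features, labels)

# Classifying every pixel of a new frame is then one batched prediction.
new_pixels = rng.random((640 * 480, 8))
pixel_classes = classifier.predict(new_pixels)
```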

2.3.5. Using Marking Menus to Develop Command Sets for Computer Vision Based
Hand Gesture Interfaces
This work examines the use of hand gestures for interaction in an approach based
on computer vision. The purpose is to study whether marking menus, with practice,
could support the development of autonomous command sets for gestural interaction.
Some early problems are reported, mainly concerning user fatigue and the precision
of gestures (Lenman, 2002).
Remote control of electronic appliances in a home environment, such as TV
sets and DVD players, was chosen as a starting point. It normally requires the use
of a number of devices, and there are clear benefits to an appliance-free approach.
Only a first prototype was implemented, for exploring pie and marking menus for
gesture-based interaction (Lenman, 2002).
2.3.5.1. Perceptive and Multimodal User Interfaces
Perceptive User Interfaces (PUI) strive for automatic recognition of
natural human gestures integrated with other human expressions, such as body
movements, gaze, facial expression, and speech. The second approach to
gestural interfaces is Multimodal User Interfaces (MUI), where hand
poses and specific gestures are used as commands in a command language. In
this approach, gestures are a replacement for other interaction tools, such
as remote controls, mice, or other interaction devices. The gestures need not
be natural gestures but could be developed for the situation, or based on a
standard sign language.
There is a growing interest in designing multimodal interfaces that
incorporate vision-based technologies. This contrasts the passive mode of PUI
with the active input mode addressed here: although passive
modes may be less obtrusive, active modes are generally more reliable
indicators of user intent and not as prone to error.
The design space for such commands can be characterized along three
dimensions: Cognitive aspects, Articulatory aspects, and Technological aspects.

Cognitive aspects refer to how easy commands are to learn and to remember. It
is often claimed that gestural command sets should be natural and intuitive,
meaning that they should inherently make sense to the user.

Articulatory aspects refer to how easy gestures are to perform, and how tiring
they are for the user. Gestures involving complicated hand or finger poses
should be avoided, because they are difficult to articulate.

Technological aspects refer to the fact that, in order to be appropriate for
practical use, and not only in visionary scenarios and controlled laboratory
situations, a command set for gestural interaction based on computer vision
must take into account the state of the art of the technology (Lenman, 2002).
2.3.5.2. Current Work
The point of departure for the current work is cognitive, leaving
articulatory aspects aside for the moment. A command language based on a
menu structure has the cognitive advantage that commands can be
recognized rather than recalled. Traditional menu-based interaction, however,
is not attractive in a gesture-based scenario. Pie and marking menus might
provide a foundation for developing direct and autonomous gestural command
sets (Lenman, 2002).
Pie menus are pop-up menus with the alternatives arranged radially.
Because the gesture to select an item is directional, users can learn to make
selections without looking at the menu. The direction of the gesture is sufficient
to recognize the selection. If the user hesitates at some point in the interaction,
the underlying menus can be popped up, always giving the opportunity to get
feedback about the current selection.
Hierarchic marking menus are a development of pie menus that allow
more complex choices by the use of sub-menus. The shape of the gesture
(mark), with its movements and turns, can be recognized as a selection,
instead of a sequence of distinct choices between alternatives.
The gestures in the command set would consist of a start pose, a
trajectory defined by the menu organization for each possible selection, and,
lastly, a selection pose. Gestures ending in any way other than with the
selection pose would be discarded (Lenman, 2002).
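A minimal sketch of the directional selection underlying pie and marking menus is shown below: only the direction of the stroke from start pose to selection pose is used to pick an item. The point format and the example command names are assumptions for illustration.

```python
import math

def select_from_pie(start, end, items):
    """Map the direction of a stroke onto one of N radially arranged items.

    `start` and `end` are (x, y) points of the stroke; because only the
    direction matters, a practised user can select without looking at
    the menu.
    """
    angle = math.atan2(end[1] - start[1], end[0] - start[0])   # -pi .. pi
    sector = 2.0 * math.pi / len(items)
    index = int(((angle + sector / 2.0) % (2.0 * math.pi)) // sector)
    return items[index]

# Example: four illustrative commands at right / down / left / up
# (image coordinates, so positive y points downward).
print(select_from_pie((0, 0), (10, 1), ["volume", "channel", "power", "mute"]))
```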
2.3.5.3. A Prototype for Hand Gesture Interaction
Remote control of appliances in a domestic environment was chosen
as the first application. So far, the only hierarchic menu system designed is for
controlling some functions of a TV, a CD player, and a lamp (Lenman,
2002).
A view-based representation of the hand was chosen, which includes
both color and shape cues. The system tracks and recognizes hand poses
based on a combination of multi-scale color feature detection, view-based
hierarchical hand models, and particle filtering. The hand poses are represented
in terms of hierarchies of color image features at different scales, with
qualitative interrelations in terms of scale, position, and orientation. These
hierarchical models capture the coarse shape of the hand poses. In each image,
detection of multi-scale color features is performed.
The particle filtering allows for the evaluation of multiple hypotheses
about the hand position, state, orientation and scale, and a possibility measure
determines what hypothesis to choose. To improve the performance of the
system, a prior on skin color is included in the particle filtering step. In fig. 1,
yellow (white) ellipses show detected multi-scale features in a complex scene
and the correctly detected and recognized hand pose is superimposed in red
(gray).
Fig. 1 Detected multi-scale features and the recognized hand pose superimposed
in an image of a complex scene.

There is a large number of works on real-time hand pose recognition in
the computer vision literature. One of the most closely related approaches uses
normalized correlation of template images of hands for hand pose recognition.
Though efficient, this technique can be expected to be more sensitive to
different users, deformations of the pose, and changes in view, scale, and
background.

However, the performance was far from real-time. The closest approach
represented the poses as elastic graphs with local jets of Gabor filters
computed at each vertex. In order to maximize speed and accuracy in the
prototype, gesture recognition is currently tuned to work against a uniform
background within a limited area, approximately 0.5 by 0.65 m in size, at a
distance of approximately 3 m from the camera, and under relatively fixed
lighting conditions (Lenman, 2002).

Fig. 2 The demo space at CID


2.4 Similar Product
An Interactive Whiteboard (IW) is a projector screen, except that the screen is either
touch sensitive or can respond to a special 'pen.' This means the projected image can be
interacted with directly, which provides a more intuitive way to interact than using input
devices such as the mouse or keyboard to navigate the projected computer screen. There
are two basic functions of an IW: writing on the board and acting as a mouse. All common
IWs have character recognition and can convert scrawls into text boxes.
There are two market leaders in IWs: the Promethean ActivBoard and the
SmartBoard. Promethean has its own presentation system, web browser, and file
system, while SmartBoard uses the computer's native browser. Promethean uses a stylus
pen to interact with the board, while the SmartBoard is operated by touch. The reason to
prefer one over the other depends on the application.
There are some issues regarding IWs. One is that an IW requires a computer with
IW software installed; the need for this software makes it awkward to use an IW with
individual laptops. Another issue is that the IWs in common use are 'front-lit,' meaning
that the user's shadow is thrown across the screen; backlit IWs are currently very
expensive. Lastly, although IWs have both character recognition and an on-screen
keyboard, they are not a good technology for typing. The user can simply return to the
computer keyboard when a lot of typing is needed (Stowell, 2003).

2.5 Computer Vision and Image Processing Development Tools


2.5.1. OpenCV
OpenCV, which stands for Open Source Computer Vision, is an open-source library
developed by Intel. The library is cross-platform, running on both Windows and
Linux, and mainly focuses on real-time image processing. It is intended for
use, incorporation, and modification by researchers, commercial software developers,
governments, and camera vendors, as reflected in its license (Open Source Computer
Vision Library).
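As a brief illustration of the kind of ready-made routines the library provides, the sketch below loads an image and applies smoothing and edge detection. It uses OpenCV's present-day Python bindings rather than the original C interface, and the file name is a placeholder.

```python
import cv2

# Load an image from disk ("sample.png" is a placeholder file name).
image = cv2.imread("sample.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Two of the library's built-in real-time image processing routines:
blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # noise reduction
edges = cv2.Canny(blurred, 50, 150)           # edge detection

cv2.imwrite("edges.png", edges)
```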
2.5.2. Microsoft Vision SDK
The Microsoft Vision SDK is a library for writing programs that perform image
manipulation and analysis on computers running Microsoft Windows operating systems. The
library was developed to support researchers and developers of advanced applications,
including real-time image-processing applications. The Microsoft Vision SDK is a C++
library of object definitions, related software, and documentation for use with
Microsoft Visual C++. It is a low-level library intended to provide a strong
programming foundation for research and application development. It includes classes
and functions for working with images, but it does not include predefined
image-processing functions (The Microsoft Vision SDK).

References:

Hardenberg, C., & Bérard, F. (2001). Bare-hand human-computer interaction. Orlando, FL, USA.

Kjeldsen, R., Levas, A., & Pinhanez, C. (2003). Dynamically Reconfigurable Vision-Based User
Interface. Retrieved from http://www.research.ibm.com/ed/publications/icvs03.pdf

DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from
http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.

Moeslund, T., Liu, Y., & Storring, M. (2004, September). Computer Vision-Based Gesture Recognition for
an Augmented Reality Interface. Marbella, Spain. Retrieved from
http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/augreality.pdf

Fails, J. A., & Olsen, D. (2003). A Design Tool for Camera-Based Interaction. Brigham Young University, Utah.
Retrieved from http://icie.cs.byu.edu/Papers/CameraBaseInteraction.pdf

Lenman, S., Bretzner, L., & Thuresson, B. (2002, October). Using Marking Menus to Develop Command
Sets for Computer Vision Based Hand Gesture Interfaces. Retrieved from
http://delivery.acm.org/10.1145/580000/572055/p239-lenman.pdf?key1=572055&key2=1405429411&coll=GUIDE&dl=ACM&CFID=77345099&CFTOKEN=54215790

Stowell, D. (2003, May). Interactive Whiteboard. Retrieved June 1, 2006, from
http://www.ucl.ac.uk/is/fiso/lifesciences/whiteboard.

Webcam. (n.d.). Wikipedia. Retrieved June 03, 2006, from Answers.com Web site:
http://www.answers.com/topic/web-cam.

Intel. (n.d.). Open source computer vision library. Retrieved June 4, 2006, from
http://www.intel.com/technology/computing/opencv/index.htm.

The Microsoft Vision SDK. (2000, May). Retrieved June 4, 2006 from
http://robotics.dem.uc.pt/norberto/nicola/visSdk.pdf
