A Kinect-Based Sign Language Hand Gesture Recognition System for the Hearing- and Speech-Impaired: A Pilot Study of Pakistani Sign Language

Introduction:

The most common form of interaction between humans is vocal language. Among the many ways we communicate, spoken language is one of the most powerful tools available to us; it has given humans an advantage over other living species by allowing ideas and thoughts to be shared directly. Unfortunately, some people are unable to speak or hear and therefore require an alternative means of communication. Sign language is the primary alternative to spoken language. It uses manual movements and body language to convey thoughts to others; its basic components include hand movements, arm movements, and facial expressions that express particular meanings and feelings. Just as every region of the world has its own spoken language, every region also has its own sign language, so sign languages vary from culture to culture and from region to region (Sandler, 2006). There are 25 sign languages in Africa alone, and America, Asia/Pacific, Europe, and the Middle East each have their own sign languages (Aarons & Philemon, 2002). People with speech and/or hearing impairment find it difficult to communicate with other individuals via sign language because most people do not understand sign language.


Abstract:
The proposed system addresses the problem explained above and uses image processing techniques to assist individuals with hearing and speech disabilities by translating their sign language gestures into spoken language. The basic tool used is a Microsoft Kinect 360™ camera. Microsoft Kinect can directly provide depth images of body joints, and its infrared camera helps minimize the effect of lighting conditions. We have used a Dynamic Time Warping (DTW) based algorithm to recognize a performed gesture, which is then translated to spoken language by an off-the-shelf software tool, NaturalReader 12.0 (http://www.naturalreaders.com/pc_nr12.php). Details on the DTW algorithm can be seen in Al-Naymat, Chawla, and Taheri (2009). Those with a hearing disability only are provided with a text-based interface. The proposed solution also provides the facility to add new gestures to the dictionary, which can later be recognized by the system. For the purpose of testing, we have focused on Indo-Pak sign language. However, for the sake of completeness and to demonstrate the system's generality to other gestures, we have tested the solution on three sets of gestures: (a) Pakistani Sign Language (PSL), (b) generic gestures, and (c) American Sign Language (ASL). The proposed system detects gestures while being insensitive to finger movements (finger movements form part of only a few gestures; examples in PSL include the Quit and Order gestures). The system is able to detect gestures that are performed between the head and the hip. The proposed system can be used by the hearing- and speech-impaired to assist them in communicating better with other members of society. It can also be utilized to make places like schools, shopping marts, and customer service counters more accessible to the speech- and hearing-impaired. Research questions are described in the next section. A review of the literature is presented in the Related Work section. The proposed system is presented in the Methodology section. Data analysis and evaluation of the proposed system are presented in the Experiments and Results section. Finally, the Conclusion and Future Directions section concludes the article with some directions for future work.

Related Work:

This section presents a review of the literature closely related to the problem addressed in this article. It first gives an overview of sign language, then surveys prominent work in gesture recognition, and finally reviews Kinect-based assistive technologies.

Sign Language:
Sign language not only requires movement of the hands and arms but also involves specific formations of the fingers, facial expressions, and head movements. Most importantly, sign language is not universal. Different countries have their own sign languages; for instance, Pakistani sign language contains nearly 4,000 signs for different words. Different regions, towns, and cities can also have their own local sign languages. A short sign can convey more meaning than a single short word (Alvi et al., 2005; Hollis, 2011; Li, Lothrop, Gill, & Lau, 2011). Formal sign language was first introduced in 1960. The manual component of a sign involves the hands, whereas the involvement of the rest of the body in representing signs is described as the non-manual component. Two signs can differ in terms of hand location, orientation, shape, and movement. Single-handed signs can be in a state of motion or can be represented using a static, or rest, position of the hand. Double-handed signs involve either the domination of one hand over the other or both hands sharing equal priority (Diwakar & Basu, 2008; Li et al., 2011). Past, present, and future tenses in sign languages are expressed through differences in how a sign is performed, for example by performing the same sign with a different facial expression or with movement of different body parts. To convey verbal aspect, a sign is performed with repetitions. Sign language also has its own syntax; for instance, a question is indicated by raising the eyebrows (Lucas, 1990). Recognizing gestures therefore also requires including facial features. However, the work presented in this article focuses only on the hands and will incorporate facial features in gesture recognition as a future extension.

Gesture Recognition:
A considerable body of literature addresses gesture recognition with a diverse set of techniques from artificial intelligence and image processing. Li (2012) used Kinect to implement a gesture recognition system for a media player: a 3D vector was used to capture hand motion, and a Hidden Markov Model (HMM; Lang, Block, & Rojas, 2012) was used to detect the hand gesture. The system was limited to hand gestures that do not involve different alignments of the fingertips; it was able to detect top-to-bottom or left-to-right movements of the hands. The system of Raheja, Dutta, Kalita, and Lovendra (2012) was able to grab fingertips and identify the center of the palm. Segmentation was used to separate the hand from the video frames.

Other Kinect-Based Assistive Technologies:

Ren, Meng, Yuan, and Zhang (2011) presented a hand-gesture-based Human Computer Interaction (HCI) solution using the Kinect sensor. They used the Finger-Earth Mover's Distance (FEMD) for hand-gesture recognition and showed its utility in two real-life applications. Abdur, Qamar, Ahmed, Ataur, and Basalamah (2013) used Kinect for an interactive multimedia-based environment that acts as a therapist for the rehabilitation of disabled children. The system in Abdur et al. (2013) can be used in homes, at the convenience of the users, in the absence of an actual therapist; the authors claim to obtain promising results based on initial joint-based angular measurements. A Kinect-based physical rehabilitation system was presented in Chang, Chen, and Huang (2011) in a study involving two young adults with motor impairments. The data collected in Chang et al. (2011) showed significant improvement in the participants, particularly in exercise performance during the intervention phases. An interesting study was presented in Chang, Chen, Chen, and Lin (2013), where the authors introduced a Kinect interface for a powered wheelchair control system with the aim of improving the quality of life of disabled elders. The proposed system in Chang et al. (2013) assists elders in calling a powered wheelchair via hand gestures, and the same gestures are used to park the wheelchair when it is no longer required. A Kinect-based tool to support communication between deaf individuals was presented in Chai et al. (2013) as a communication interface. The solution in Chai et al. (2013) is for communication between deaf individuals only and was implemented for basic communication based on 370 daily Chinese words. Armin, Mehrana, and Fatemeh (2013) used a Kinect-based solution as a teaching aid for disabled people; a strategy was presented to train children who are deaf and blind and have low or no reading and writing skills. A gesture-based computer game was presented in Soltani, Eskandari, and Golestan (2012) for deaf/mute individuals with the aim of encouraging them to be physically active. The same approach can be extended to other games as well (Halim, Baig, & Hasan, 2012). Other related work along similar lines can be seen in Murata, Yoshida, Suzuki, and Takahashi (2013), Cantón, González, Mariscal, and Ruiz (2012), and Memis and Albayrak (2013).

Methodology:
The proposed system consists of a series of steps for detecting a particular gesture and translating it to vocal signals. Initially, the subject positions him/herself facing the Kinect device and starts performing the gesture. The performed gesture is captured by the Kinect as a series of frames. The information of interest is extracted from the captured frames and is then processed by the DTW algorithm for gesture recognition. The gesture is compared with those pre-recorded in the system's dictionary. Once an equivalent match is found in the dictionary, the gesture is converted to text that is later used as input to the off-the-shelf speech engine to produce vocal signals; if no match is found, a "Gesture not found" message is displayed. To achieve our objective, the Kinect SDK was used. The proposed system has two modes: (a) recording mode and (b) translation mode. In recording mode, the Kinect stream is captured, and the Cartesian coordinates that correspond to the joints of interest are extracted. Since users vary in size, it is imperative to normalize the data to nullify the impact of user size when comparing gestures. Normalization is done by shifting the origin from the Kinect to the spine of the human body and then converting to a spherical coordinate system. Next, the normalized skeleton frames are stored in a linked list, and once the gesture is completed, it is written to memory in a gesture dictionary. In translation mode, the whole procedure (listed earlier) is the same as in recording mode; the only difference occurs when the gesture ends. Unlike recording mode, where the gesture is stored in the gesture dictionary, in translation mode the gesture currently stored in the linked list is compared with the gestures in the dictionary. This is achieved through the DTW algorithm (Rath & Manmatha, 2003). The whole process is divided into a number of steps, which are discussed later in detail: (1) receiving the joints of interest from the Kinect video stream, (2) normalizing the skeleton frame data, (3) building a linked list of the normalized data, and (4) storing or detecting the gesture. Most sign language gestures are performed above the body's hip bone and below the head, so in order to start and end a sign language gesture, the user has to perform a specific pre-defined gesture. In our case, the system requires the user to place both hands below the hip bone with the distance between the hands less than 0.5 m to start or end a sign language gesture. We call this gesture the "Recording Translation Gesture" (RTG). After the RTG is performed, the remaining sequence of steps proceeds as explained below.
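
As an illustration, the RTG check described above reduces to a simple geometric test on three joints. The following is a minimal sketch under assumed conventions (joint positions as (x, y, z) tuples in metres with the y-axis pointing upward, as delivered in a Kinect skeleton frame); the function and constant names are ours, not part of the original implementation.

import math

# Minimal sketch of the RTG check: both hands below the hip centre and
# closer together than 0.5 m. Joint positions are assumed to be (x, y, z)
# tuples in metres with the y-axis pointing upward.
HAND_DISTANCE_THRESHOLD = 0.5  # metres, per the description above

def is_rtg(left_hand, right_hand, hip_center):
    """Return True when the start/end (RTG) posture is being performed."""
    both_below_hip = (left_hand[1] < hip_center[1]
                      and right_hand[1] < hip_center[1])
    hand_gap = math.dist(left_hand, right_hand)
    return both_below_hip and hand_gap < HAND_DISTANCE_THRESHOLD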

Joints of Interest:
The Microsoft Kinect SDK provides functions to obtain the Cartesian coordinates of the joints. We use these coordinates to store the movements performed by the users in order to record the performed gesture. The Microsoft Kinect SDK is capable of tracking 20 joints of the human body; however, we are interested only in those joints that are required for detecting sign language gestures. For the RTG, we require the hip joint and the center joint. Figure 2 shows the joints of interest; we have used (1) head, (2) right wrist, (3) left wrist, (4) right hand, (5) left hand, (6) spine, (7) hip bone, (8) left shoulder, (9) center shoulder, and (10) right shoulder. We store and track the coordinates of these joints and then normalize them.
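
A minimal sketch of this filtering step is shown below, assuming the skeleton frame is available as a mapping from Kinect joint names to (x, y, z) Cartesian coordinates; the joint names mirror the Kinect SDK joint types, and the helper function is illustrative rather than part of the original implementation.

# Ten joints of interest, as listed above.
JOINTS_OF_INTEREST = [
    "Head", "WristRight", "WristLeft", "HandRight", "HandLeft",
    "Spine", "HipCenter", "ShoulderLeft", "ShoulderCenter", "ShoulderRight",
]

def extract_joints_of_interest(skeleton):
    """Keep only the ten joints needed for gesture recognition from the
    full 20-joint skeleton frame."""
    return {name: skeleton[name] for name in JOINTS_OF_INTEREST}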

Normalization:
Every user's height and dimensions can be different, and this has a large impact on the performance of the system, because the X, Y, and Z coordinates of the joints differ from user to user. Differences can also arise from the user's varying position relative to the Kinect. Ideally, a user should be at a distance of six feet from the Kinect and directly in front of the camera, but this is not always the case. The data therefore need to be normalized in order to increase the accuracy of gesture recognition. The coordinates, when captured, are in the Cartesian coordinate system. They are then converted to a three-dimensional representation known as the spherical coordinate system, which consists of three attributes: (1) the distance of the point from the origin, r, (2) a polar angle, θ, and (3) an azimuth angle, φ. The angle θ is measured with reference to the z-axis, and the angle φ is measured with respect to the x-axis in the two-dimensional x-y plane. To minimize the problem of differences in the size of users, we normalize the distances of the joints from the origin. This is achieved by dividing all the joint distances by a factor, which we have chosen to be the distance of the head from the origin. Once this is done, the normalization of a frame is complete.
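
The conversion and normalization just described can be summarized by the following minimal sketch. It assumes a frame represented as a dictionary of joint names to Cartesian (x, y, z) coordinates (as in the earlier sketch) and the spherical convention stated above (θ from the z-axis, φ from the x-axis); it is an illustration of the procedure, not the original code.

import math

def to_spherical(x, y, z):
    """Cartesian -> spherical: radial distance r, polar angle theta
    (measured from the z-axis), azimuth angle phi (from the x-axis)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return (r, theta, phi)

def normalize_frame(joints):
    """Shift the origin from the Kinect to the spine joint, convert every
    joint to spherical coordinates, and divide all radial distances by the
    head-to-origin distance to cancel out the user's size."""
    ox, oy, oz = joints["Spine"]
    shifted = {name: (x - ox, y - oy, z - oz)
               for name, (x, y, z) in joints.items()}
    spherical = {name: to_spherical(*p) for name, p in shifted.items()}
    head_r = spherical["Head"][0] or 1.0  # guard against a degenerate frame
    return {name: (r / head_r, theta, phi)
            for name, (r, theta, phi) in spherical.items()}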

Temporary Storage:
After normalization, the data are stored in memory. A linked list is maintained to hold the normalized skeleton frames until the gesture is completed. This is done by storing the coordinates of the joints in private variables of an object of the gesture class and forming a linked list of such objects. The ending RTG marks the end of a gesture. When the complete system executes, the dictionary (explained in the next section) is loaded into memory in the form of a two-dimensional linked list of objects. As shown in Figure 3, all gestures are linked vertically, whereas each individual gesture is connected to a list of objects that holds, for each frame, the values of the joints, their spherical coordinates, and their normalized coordinates.
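
The two-dimensional structure described above can be sketched roughly as follows; the class and attribute names are illustrative, and plain Python object references stand in for whatever linked-list implementation the original system uses.

class FrameNode:
    """One normalized skeleton frame in a gesture's frame list."""
    def __init__(self, normalized_joints):
        self.joints = normalized_joints  # {joint_name: (r, theta, phi)}
        self.next = None                 # horizontal link to the next frame

class Gesture:
    """One dictionary entry: a named chain of frames, linked vertically
    to the next gesture in the dictionary."""
    def __init__(self, name):
        self.name = name
        self.first_frame = None
        self.last_frame = None
        self.next_gesture = None         # vertical link to the next gesture

    def append_frame(self, normalized_joints):
        node = FrameNode(normalized_joints)
        if self.last_frame is None:
            self.first_frame = self.last_frame = node
        else:
            self.last_frame.next = node
            self.last_frame = node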

Dictionary:
A dictionary is maintained to store the gestures that will be recognized by the system. The dictionary has two modes: recording and translation. To add new gestures to the dictionary, the recording mode is enabled. In recording mode, once the ending RTG is performed, the normalized skeleton frame linked list is written to the gesture dictionary. The gesture dictionary is a text file with the joint coordinates stored as comma-separated values, and each gesture is enclosed in braces to differentiate it from the others. In translation mode, the DTW algorithm is applied to compare the gesture stored in the linked list with the gestures stored in the gesture dictionary, and the corresponding result is returned. The dictionary is searched linearly, and each gesture is compared using the DTW algorithm. For details on the computational aspects of the methodology, the reader may refer to Masood et al. (2014).
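
To make the matching step concrete, the sketch below shows a textbook DTW comparison and linear search over an in-memory dictionary. It assumes gestures are held as lists of normalized frames keyed by gesture name, a simple per-frame distance (sum of Euclidean distances between corresponding joint triples), and a rejection threshold for the "Gesture not found" case; the actual system follows the DTW variants cited above (Rath & Manmatha, 2003; Al-Naymat et al., 2009), so treat this only as an outline.

import math

def frame_distance(frame_a, frame_b):
    """Sum of Euclidean distances between corresponding joints of two
    normalized frames ({joint_name: (r, theta, phi)} mappings)."""
    return sum(math.dist(frame_a[j], frame_b[j]) for j in frame_a)

def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def match_gesture(performed_frames, dictionary, threshold):
    """Linear search: return the name of the closest dictionary gesture,
    or None ("Gesture not found") if no entry is within the threshold."""
    best_name, best_cost = None, math.inf
    for name, stored_frames in dictionary.items():
        cost = dtw_distance(performed_frames, stored_frames)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name if best_cost <= threshold else None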

Communicating the Gesture to Non-Sign Language Speakers:


Once the performed gesture is recognized by the system, it is communicated to the individual, who may or may not be a sign language speaker. For this purpose, a text-based user interface is used in which the gesture's meaning is displayed as text. The text can also be converted to audio using text-to-speech software; we have used an off-the-shelf software tool, NaturalReader 12.0, for this purpose. Users can enable or disable the text-to-speech module at their convenience.
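
NaturalReader is driven as an external desktop application rather than through code, but the same voicing step can be illustrated with any programmable text-to-speech engine. The sketch below uses the open-source pyttsx3 library purely as a stand-in, with a flag mirroring the enable/disable behaviour described above; it is not the tool used in the actual system.

import pyttsx3  # illustrative stand-in for the off-the-shelf TTS tool

def announce_gesture(gesture_name, speech_enabled=True):
    """Display the recognized gesture as text and, if enabled, voice it."""
    print(gesture_name if gesture_name else "Gesture not found")
    if speech_enabled and gesture_name:
        engine = pyttsx3.init()
        engine.say(gesture_name)
        engine.runAndWait()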

Experimentation and Results:


To study the performance of the proposed system and to answer the research questions, the system was tested on multiple subjects using two types of evaluations, listed as follows:

Controlled environment:
This environment was based on a laboratory setup, and the research group's premises were used for this purpose.

Field evaluation:
This environment was based on a real-life shopping mart, and the Institute's shopping area was utilized for this purpose. The shopping mart was chosen to test the system's performance in a real-life scenario where the disabled interact with relatively unacquainted people.

A total of 10 disabled subjects, including 4 females and 6 males, participated in the evaluation; their demographics are listed in Table 1. Five subjects had speech impairment, and five had hearing impairment. As listed above, the system's accuracy in recognizing the gestures (both sign language and generic gestures) was tested in the controlled laboratory experiment. However, to test the acceptability, usefulness, portability, and overall output of the system, we tested it in the field evaluation using the shopping mart setup. Figure 4 demonstrates the configuration used in the field evaluation. The field setup was based on the public-dealing environment of the shopping mart, where the disabled subjects (as listed in Table 1) were of different age groups and had to purchase consumables. This scenario was purposely chosen to evaluate the effectiveness of the proposed assistive technology in providing an adequate amount of assistance in communication to the subjects. To make sure that the system was tested under diverse scenarios with different input data, three sets of gestures were used in the controlled environment: PSL gestures, generic gestures, and ASL gestures. These three sets of gestures were further tested in three different configurations of the system (in the controlled environment only) in terms of distance from the Kinect, weight assigned to the hand, and weight assigned to the elbow. A total of 20 signs have been tested, where 10 are from PSL, 5 are from ASL, and 5 are generic signs. Furthermore, the 20 gestures to be tested were also categorized into two groups:

Controlled environment gestures: These include all 20 gestures, with 10 from PSL, 5 generic gestures, and 5 ASL gestures.

Field evaluation gestures: These include the PSL and generic gestures only. The reason for this is the expected familiarity of the individuals involved in the field evaluation with these gestures.

The gesture depictions and details are listed in Table 2. All 20 gestures were tested in the controlled experiment, and 15 out of these 20 in the field experiment.

Controlled Environment Results:


Initially, four individuals (not listed in Table 1) were asked to perform the 20 signs (listed in Table 2) separately for recording in the sign language dictionary; each of these four individuals entered five gestures. Afterwards, the system was tested by the 10 different individuals listed in Table 1. To test the accuracy of the system, three sets of experiments were performed with different configurations of distance from the Kinect and weights assigned to the hand and elbow. First, the test users were positioned at a distance of six feet from the Kinect with the weights assigned to the hand and elbow set to 1. The same test was repeated while keeping the distance from the Kinect constant and varying the weights of the hand and elbow first to 0.5 and 0.2, respectively, and then to 0.3 and 0.1, respectively. Each of the subjects tested the system by performing all gestures in the dictionary three times to obtain average results. The system output was a Boolean value, where 1 was interpreted as the gesture being recognized and 0 as not recognized. With the aforementioned configurations, all the gestures were successfully recognized by the system; the first three columns of Table 3 present the detailed results of this experiment. For the second configuration, users were positioned at a distance of nine feet from the Kinect with the weights assigned to the hand and elbow set to 1. The same test was repeated while keeping the distance from the Kinect constant and changing the weights of the hand and elbow to 0.5 and 0.2 first and then to 0.3 and 0.1. Each subject tested the system by performing all gestures in the dictionary three times; if any single attempt failed, we interpreted it as a gesture recognition failure. With these configurations, all the gestures, with the exception of 2, were successfully recognized by the system; Table 3 also shows the detailed results of this experiment. For the third configuration, users were positioned at a distance of 12 feet from the Kinect with the weights assigned to the hand and elbow set to 1. The same test was repeated while keeping the distance from the Kinect constant and changing the weights of the hand and elbow to 0.5 and 0.2 first and then to 0.3 and 0.1. All subjects tested the system three times by performing the 20 gestures in the dictionary. With these configurations, all the gestures, with the exception of 4, were successfully recognized by the system; the last three columns of Table 3 show the detailed results of this experiment. The system was successful in detecting almost all of the gestures. The overall accuracy of the system is found by taking the average over all of the tests, resulting in an overall accuracy of 91%.

Field Environment Results:


To study the effect of the proposed system in enhancing communication between the speech/hearing-impaired and non-disabled individuals with no knowledge of sign language, we conducted field experiments in two different sessions. In the first session, 10 subjects interacted via sign language with other common individuals without the support of the system. These common individuals were those handling the sales counter at the shopping mart. During the two sessions, a total of five such individuals interacted with the subjects listed in Table 1. The task given to the 10 subjects with speech or hearing disability was to purchase any three items of their choice and leave the shopping mart. Only one subject at a time entered the shopping mart. We noted the elapsed time in the shopping mart for each of the subjects during both sessions. For the second session, the support of the proposed solution was made available. The two sessions were conducted on different days. A survey was conducted immediately after this session; it covered the questions listed in the Research Questions section of this paper. The participants of the survey were the 10 subjects and the 5 individuals handling the cash counter with whom these subjects had interacted with and without the support of the proposed system. The participants were asked questions related to the acceptability, usefulness, portability, overall output, and importance of the system, and had to answer each question with one of three options: agreed, disagreed, or uncertain. The survey showed that 100% of the individuals agreed on the proposed solution's acceptability. For the usefulness of the solution, around 87% agreed that the solution is useful and helps in effective communication, whereas almost 7% disagreed about its usefulness. Although the reported usefulness is high, it can be improved further if the individuals using the system receive some training, since the system requires proper positioning in front of the Kinect in order to start recognizing gestures. Around 53% of the participants found the solution to be portable, whereas 46% disagreed about its portability. Since the proposed solution requires moving the Kinect and connecting it to a processing unit, this certainly decreases its portability; however, the system can be installed at places like schools, shopping marts, and customer service counters to make them more accessible to speech- and hearing-impaired individuals. 73% of the participants agreed that the overall output of the system makes it worth using. Based on this discussion, Table 4 summarizes the key benefits and limitations of the proposed solution. The questions listed in the Research Questions section are answered by the proposed system with an average agreement value of 74% from the participants, considering acceptability,
usefulness, portability, and overall output of the system. Although the results of the field survey suggest improved communication between sign language speakers and non-sign language speakers due to the use of the proposed system, to further see the system's ability to assist in terms of time consumed, we compare the time taken by each of the 10 disabled individuals with and without the support of the system. The durations show the number of minutes consumed by each subject from entry to exit from the shopping mart. Table 5 lists the details and shows that the average time consumed by the subjects was 13.2 minutes without the system, which was reduced to an average of 8.4 minutes with the support of the proposed system. There is a decrease in every subject's time, with a maximum decrease of 7 minutes and a minimum of 2 minutes. A t-test was performed on the data presented in Table 5. The null hypothesis was that the system's support does not reduce the time required for a specific job, and the alternative hypothesis was that the support of the proposed system significantly reduces the time. The task with the proposed system's support is completed in 36% less time. For the t-test with α = 0.05 (95% confidence level), the test gave a p-value of .0000153, which is less than α. Thus, we reject the null hypothesis and accept the alternative hypothesis.
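
The reported figures can be reproduced with a routine paired t-test, sketched below. The per-subject times from Table 5 are not reproduced here, so the function takes the two lists of minutes as arguments; the one-sided alternative (alternative="greater") and the use of SciPy are our assumptions about how such a test could be run, not a description of the authors' exact procedure.

from scipy import stats

def evaluate_time_savings(times_without, times_with, alpha=0.05):
    """Paired t-test of the kind described above. `times_without` and
    `times_with` are equal-length lists of per-subject task times in
    minutes (Table 5). Returns the fractional reduction in mean time and
    whether the null hypothesis (no reduction) is rejected at level alpha."""
    mean_without = sum(times_without) / len(times_without)
    mean_with = sum(times_with) / len(times_with)
    reduction = (mean_without - mean_with) / mean_without  # ~0.36 reported
    # One-sided paired test: are the "without" times significantly larger?
    t_stat, p_value = stats.ttest_rel(times_without, times_with,
                                      alternative="greater")
    return reduction, p_value < alpha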

Discussion:
Based on the two types of evaluations, the major finding of this work is that the use of the proposed assistive technology enhances the communication capability of the speech- and hearing-impaired. The key feature is the system's ability to recognize a performed gesture and communicate it to a non-sign language speaker via an audio or text-based interface. From the point of view of the controlled experiment, the accuracy of the system was tested using the three sets of gestures while varying the distance of the user from the Kinect and the weights assigned to the hand and elbow. For the closest distance (6 feet) of the user from the Kinect, all the gestures were recognized by the system. As the distance is increased from 6 feet to 9 feet, the average recognition accuracy decreases from 100% to 86%; if we further increase the distance to 12 feet, recognition accuracy reduces to 75%. Thus, recognition accuracy is inversely related to the Kinect-to-user distance. As the distance increases, the largest decreases in recognition accuracy are observed for Direct free kick, Quit, Order, and Cost. This is because these gestures involve minor variations of the fingers, and as the distance increases, the system either fails to recognize them or misclassifies them as another gesture with close resemblance. The field evaluation tests the system from the point of view of its users, including both sign language and non-sign language speakers. The field evaluation results in 100% acceptability of the proposed system, and 88% of the subjects find the system useful. However, the portability of the system is rated at 53%, which needs to be looked into in the future. Around 66% of the subjects find the system helpful in assisting them with better communication. If we consider only the individuals with hearing or speech disability, the rating of the system in terms of assisting them in better communication increases to 90%. The overall importance of the system is rated at 66%; however, if the system's importance is considered for the disabled subjects only, it goes up to 80%. This shows that the proposed system is more important for the disabled than for the rest. In summary, the proposed solution is helpful in improving communication between sign language speakers and non-sign language speakers. However, the portability aspect needs to be improved.
Limitations:


Some limitations of the proposed system, identified in the controlled experiment and the field evaluation, are mentioned in this section. The proposed system recognizes the gestures stored in its dictionary; however, it performs poorly on gestures that involve minor variations of the fingers. To illustrate this, the Quit and Order gestures in PSL differ only in minor variations of finger orientation, and the system shows poor performance in both these cases. The proposed system also needs a feature through which hearing people could communicate with deaf people. This will require the system to record the voice of hearing people and convert it to sign language gestures that are later displayed on a screen. From the point of view of field evaluation, the system has been tested only in the shopping mart scenario; it will be interesting to see the system's performance in other situations and places as well. The system also has limitations from a portability perspective: it would be more useful for the disabled if the system were already provided at particular places, since carrying the complete system for better communication seems undesirable for the disabled, keeping the size and weight of the gadgets in mind. Finally, the system has been tested using only 20 gestures; it will be useful to evaluate the system's performance with a larger number of words in its dictionary.

Conclusion and Future Directions:


The work in this article has presented a 3D depth camera (Kinect) based gesture recognition system to assist the speech- and hearing-impaired in communicating with the rest of society, which has no knowledge of sign language. The proposed methodology uses the Microsoft Kinect SDK to obtain the joints of interest from users and, after normalization, stores them in a gesture dictionary. A DTW algorithm is used to match performed gestures with those stored in the dictionary, and matches are later converted to voice using an off-the-shelf software tool. The proposed approach is capable of detecting gestures with good accuracy. By increasing the size of the training data, the accuracy may be further increased, since there will be a larger number of available gestures in the dictionary and the chance of a gesture not being found in the dictionary will decrease. The system has been tested on three sets of gestures, including gestures from two sign languages as well as generic ones. An empirical study of the proposed solution was also conducted using 10 disabled subjects with speech and hearing disabilities. Based on the results of the study, 76% of the subjects considered the solution to be acceptable, useful, portable, and important for better communication. Though the focus of the article is on assistive technology for translating a given sign language to vocal language, the overall approach is not limited to this; the proposed system can be tailored for use in other domains where gesture recognition is desirable.

Future work:
Future work will involve making the system more portable so that the disabled can carry it with them at their convenience. This can preferably be achieved by using the camera of a mobile phone to obtain a sequence of images of hand gestures and then applying image processing techniques to recognize the gesture. Since a mobile camera is 2D, it will be challenging to calculate the depth component of gestures. Another direction is to combine the two techniques of gesture recognition and finger detection into a complete system capable of detecting any type of gesture, specifically those involving minor variations of the fingers. The presented system is designed to recognize gestures and then use off-the-shelf software to convert them to voice/text output. It would be very useful to add a feature that performs the converse process for all possible sign languages; since there are many sign languages, each with its own dictionary, optimizing memory utilization will be important in this case. Incorporating facial expressions into sign language recognition is another important direction that needs to be investigated in the future. Finally, this work can be extended by evaluating its performance with a larger dictionary containing more than 20 gestures and studying the effect on accuracy and speed.

References:
Aarons, D., & Philemon, A. (2002). South African sign language: One language or many? In R. Mesthrie (Ed.), Language in South Africa (pp. 127–147). Cambridge, UK: Cambridge University Press.

Abdur, R. M., Qamar, A. M., Ahmed, M. A., Ataur, R. M., & Basalamah, S. (2013, April). Multimedia interactive therapy environment for children having physical disabilities. Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, pp. 313–314.

Al-Naymat, G., Chawla, S., & Taheri, J. (2009). SparseDTW: A novel approach to speed up dynamic time warping. Proceedings of the Eighth Australasian Data Mining Conference, pp. 117–127.

Alvi, A. K., Azhar, M. Y. B., Usman, M., Mumtaz, S., Rafiq, S., Rehman, R. U., & Ahmed, I. (2005). Pakistan sign language recognition using statistical template matching. Proceedings of World Academy of Science, Engineering and Technology (Vol. 3), pp. 1–4.

Armin, K., Mehrana, Z., & Fatemeh, D. (2013). Using Kinect in teaching children with hearing and visual impairment. In M. Moazeni et al. (Eds.), Proceedings of the 4th International Conference on E-Learning and E-Teaching, ICELET (pp. 86–90). February 13–14, Shiraz, Iran: IEEE.

Buller, D. B., & Burgoon, J. K. (1996). Interpersonal deception theory. Communication Theory, 6(3), 203–242.

Cantón, P., González, Á. L., Mariscal, G., & Ruiz, C. (2012). Applying new interaction paradigms to the education of children with special educational needs. In K. Miesenberger et al. (Eds.), Proceedings of the 13th International Conference ICCHP, Part I (pp. 65–72). July 11–13, Linz, Austria: Springer.

Chai, X., Li, G., Chen, X., Zhou, M., Wu, G., & Li, H. (2013, October). VisualComm: A tool to support communication between deaf and hearing persons with the Kinect. Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility, p. 76.

Chang, C.-L., Chen, C.-C., Chen, C.-Y., & Lin, B.-S. (2013). Kinect-based powered wheelchair control system. In D. Al-Dabass et al. (Eds.), Proceedings of the Fourth International Conference on Intelligent Systems, Modelling and Simulation, ISMS (pp. 186–189). January 29–30, Bangkok, Thailand: IEEE.

Chang, Y. J., Chen, S. F., & Huang, J. D. (2011). A Kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities. Research in Developmental Disabilities, 32(6), 2566–2570.

Cooper, R. G., & Kleinschmidt, E. J. (2011). New products: The key factors in success. Decatur, GA: Marketing Classics Press.

Diwakar, S., & Basu, A. (2008). A multilingual multimedia Indian sign language dictionary tool. Proceedings of the 6th Workshop on Asian Language Resources, Hyderabad, India, 11–12 January 2008, pp. 65–72.

Halim, Z., Baig, A. R., & Hasan, M. (2012). Evolutionary search for entertainment in computer games. Intelligent Automation & Soft Computing, 18(1), 33–47.

Hollis, S. (2011). Sign language for beginners: Discover the art of sign language. Bristol & West House, Bournemouth, UK: Print Smarter.

Lahamy, H., & Lichti, D. (2010). Real-time hand gesture recognition using range cameras. Proceedings of the Canadian Geomatics Conference, Calgary, 15–18 June 2010, pp. 1–6.

Lang, S., Block, M., & Rojas, R. (2012). Sign language recognition with Kinect. Proceedings of the 11th International Conference on Artificial Intelligence and Soft Computing, Zakopane, 29 April–3 May 2012, pp. 394–402.

Li, K. F., Lothrop, K., Gill, E., & Lau, S. (2011). A web-based sign language translator using 3D video processing. Proceedings of the IEEE International Conference on Network-Based Information Systems, Melbourne, 26–28 September 2011, pp. 356–361.

Li, Y. (2012). Hand gesture recognition using Kinect. Proceedings of the 3rd IEEE International Conference on Software Engineering and Service Science, Beijing, 22–24 June 2012, pp. 196–199.

Lucas, C. (Ed.). (1990). Sign language research: Theoretical issues. Washington, DC: Gallaudet University Press.

Masood, S., Parvez, Q. M., Shah, M. B., Ashraf, S., Halim, Z., & Abbas, G. (2014). Dynamic time wrapping based gesture recognition. In A. Ghafoor et al. (Eds.), Proceedings of the International Conference on Robotics and Emerging Allied Technologies in Engineering, iCREATE (pp. 205–210). April 22–24, Islamabad, Pakistan: IEEE.
