
International Journal of Computer Applications (0975 - 8887)

Volume 94, May 2014
Sign Language Recognition using Neural Networks

Manoj Chavan
Asst. Professor, TCET, Mumbai
manoj.chavan@thakureducation.org
Akshay Phadke
University of Mumbai, Mumbai
akshayxomega@yahoo.com
Srishti Narayan
University of Mumbai, Mumbai
srishtinarayan@yahoo.com


ABSTRACT

This paper presents a technique to recognize hand gestures to
facilitate and enhance communication between normal and
hearing-impaired individuals. The basic signs and symbols of
the Indian Sign Language are translated to the corresponding
English text with the help of various Image Processing
techniques (image segmentation and edge detection) and
Neural Networks.

General Terms
Sign language, Gesture Recognition, Neural Networks.

Keywords
HCI, Computer Vision, Image segmentation, Edge Detection,
Moment Functions, Zernike Moments, Backpropagation
Algorithm.

1. INTRODUCTION

People belonging to the deaf and mute community are unable
to communicate in the conventional manner, i.e. by means of a
spoken language, so the need for an alternative language arose.
A form of communication was created which uses hand
movements and facial expressions as gestures. These gestures
act as symbols representing individual alphabets and words.

Thus a foundation was laid for a new language based on these
gestures, referred to as sign language. Various countries have
adopted their own standard sets of gestures to constitute their
own versions of sign language, which hearing-impaired and
mute people use to communicate among themselves.

When a mute person and a hearing person need to
communicate, an interpreter is required, as the two do not share
a common language. The interpreter's task is to facilitate
communication without becoming personally involved in the
interaction and to give both parties equal access to culturally
appropriate information. Sign language interpreters are needed
in all aspects of life, but this is not always feasible due to the
unavailability of translators.

2. IMAGE SEGMENTATION

In signal processing, it is often desirable to be able to perform
some kind of noise reduction on an image or signal. Such noise
reduction is a typical pre-processing step to improve the results
of later processing. The aim is to isolate the hand and remove
the background in order to avoid false detection in later stages.
The color of the skin is taken as the parameter to segment the
input image [1]. The captured image is in the RGB color space,
which gives information only about the color primaries and not
about luminosity [2]. We therefore convert the input image into
the YCbCr color space as follows:

Y = 0.299R + 0.587G + 0.114B
Cr = R - Y
Cb = B - Y

By converting the image into YCbCr color space, we obtain
information about Luminosity (Y), Blue-difference (Cb), and
Red-difference (Cr) components (Chrominance components).
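As a sketch, the conversion above (the paper's unscaled form, without the offset and scaling applied in full digital YCbCr) can be written as:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert RGB values (0..1) to Y, Cb, Cr using the formulas above.

    This is the unscaled form from the text; digital YCbCr additionally
    applies per-component scaling and offsets.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminosity
    cb = b - y                             # blue-difference chroma
    cr = r - y                             # red-difference chroma
    return y, cb, cr
```

For a neutral gray (R = G = B) both chroma components are zero, which is why the skin thresholds are placed on Cb and Cr rather than on Y.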





Fig.1: CbCr planes for Y=0, Y=0.5, Y=1 respectively

By choosing the appropriate values of Chroma components
as thresholds we can isolate the hand from its background.
However, the thresholding is not perfect and leads to false
detection of the background in some cases where the local pixel
values of the background lie within the range of threshold
values. This unwanted background noise can be eliminated by
filtering it with a median filter. The following threshold values,
chosen on a trial-and-error basis, gave the best results [3]:

77 ≤ Cb ≤ 127 and 133 ≤ Cr ≤ 173
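A minimal sketch of the thresholding and median-filter clean-up described above, with the Cb/Cr planes stored as nested lists (the paper's own implementation is not given):

```python
from statistics import median

def skin_mask(cb, cr):
    """Bi-level skin mask from per-pixel Cb/Cr planes, using the
    trial-and-error thresholds 77 <= Cb <= 127 and 133 <= Cr <= 173."""
    return [[1 if 77 <= cb[i][j] <= 127 and 133 <= cr[i][j] <= 173 else 0
             for j in range(len(cb[0]))] for i in range(len(cb))]

def median_filter3(mask):
    """3x3 median filter to suppress isolated false skin detections;
    border pixels are left unchanged for simplicity."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [mask[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = int(median(window))
    return out
```

An isolated pixel that happens to fall inside the chroma thresholds survives the mask but is removed by the median filter, which is exactly the false-detection case described above.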

Fig.2 shows the outcome of image segmentation by
thresholding. A bi-level skin mask is obtained by thresholding,
and that mask is used to remove the background from the
grayscale version of the input image. The input image is
converted to grayscale so that edge detection can be applied to
it in the next stage.



Fig.2: Image segmentation for skin detection


3. EDGE DETECTION
Edge detection is the name for a set of mathematical methods
which aim at identifying points in a digital image at which the
image brightness changes sharply (discontinuities). The points
at which image brightness changes sharply are typically
organized into a set of curved line segments termed as edges
[4]. As the edge detection method is able to provide us with
discontinuities in the intensity of the pixel values, it gives us
the outline of the hand and other features such as position of the
fingers, etc. Thus we obtain a bi-level image which traces the
outline of the hand. This image can be employed so as to be
able to recognize the gestures.
3.1 CANNY EDGE DETECTION
The Canny edge detection algorithm is known to many as the
optimal edge detector. It is important that edges occurring in
images are not missed and that there are no responses to
non-edges. Also, the distance between the edge pixels found
by the detector and the actual edge should be at a minimum. Based
on these criteria, the Canny edge detector first smooths the
image to eliminate noise. It then finds the image gradient
to highlight regions with high spatial derivatives. The algorithm
then tracks along these regions and suppresses any pixel that is
not at the maximum (non-maximum suppression) [5].

Usually, the upper tracking threshold can be set quite high
and the lower threshold quite low for good results. Setting the
lower threshold too high will cause noisy edges to break up.
Setting the upper threshold too low increases the number of
spurious and undesirable edge fragments appearing in the
output.
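The double-threshold tracking stage described above can be sketched as follows; this is an illustrative fragment assuming a precomputed gradient-magnitude array, not the paper's implementation:

```python
def hysteresis(mag, low, high):
    """Double-threshold hysteresis as used in the final stage of Canny:
    pixels above `high` are strong edges; pixels between `low` and `high`
    are kept only if connected (8-neighbourhood) to a strong edge."""
    h, w = len(mag), len(mag[0])
    edge = [[0] * w for _ in range(h)]
    # seed with strong edges
    stack = [(i, j) for i in range(h) for j in range(w) if mag[i][j] >= high]
    for i, j in stack:
        edge[i][j] = 1
    # grow along connected pixels above the lower threshold
    while stack:
        i, j = stack.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and not edge[ni][nj] \
                        and mag[ni][nj] >= low:
                    edge[ni][nj] = 1
                    stack.append((ni, nj))
    return edge
```

A weak-gradient pixel adjacent to a strong edge is kept, while the same pixel in isolation is dropped, which is why raising the lower threshold too far breaks noisy edges apart.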

The Canny edge detection algorithm was employed in our
technique in view of its superior performance compared to
other techniques. Fig.3 shows the Canny edge detection
technique applied to the image after skin detection.

Fig.3: Canny edge detection after skin detection
4. IMAGE MOMENTS
An image moment is a certain particular weighted average
(moment) of the image pixels' intensities, or a function of such
moments, usually chosen to have some attractive property or
interpretation [6]. Image moments are useful to describe objects
after segmentation. Simple properties of the image which are
found via image moments include area (or total intensity), its
centroid, and information about its orientation.
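For instance, the area and centroid mentioned above follow directly from the raw moments M00, M10 and M01; a small sketch:

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over x, y of x^p * y^q * I(y, x),
    with img stored as a nested list of intensities."""
    return sum((x ** p) * (y ** q) * img[y][x]
               for y in range(len(img)) for x in range(len(img[0])))

def area_and_centroid(img):
    """Area (total intensity) M00 and centroid (M10/M00, M01/M00)."""
    m00 = raw_moment(img, 0, 0)
    return m00, raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
```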
4.1 ZERNIKE MOMENTS

The Zernike polynomials were first proposed in 1934 by
Zernike [7]. Complex Zernike moments are constructed using
a set of complex polynomials which form a complete
orthogonal basis set defined on the unit disc x² + y² ≤ 1. They
are expressed as:

    Amn = ((m + 1) / π) Σx Σy f(x, y) [Vmn(x, y)]*,  with x² + y² ≤ 1
where m = 0,1,2,... and defines the order, f(x,y) is the function
being described, * denotes the complex conjugate, while n is an
integer depicting the angular dependence, or rotation, subject
to the conditions:

m − |n| is even, and |n| ≤ m

Vmn(x, y) indicates the Zernike polynomial and is expressed in
the polar form as:

    Vmn(r, θ) = Rmn(r) e^(jnθ)

where (r, θ) are defined over the unit disk and Rmn(r) is the
orthogonal radial polynomial, defined as:

    Rmn(r) = Σ (s = 0 to (m − |n|)/2) (−1)^s · (m − s)! / [ s! ((m + |n|)/2 − s)! ((m − |n|)/2 − s)! ] · r^(m − 2s)
Zernike functions were chosen for moment formulation because
they are among the most popular, outperforming the alternatives
in terms of noise resilience, information redundancy and
reconstruction capability. The Zernike function yields a
complex-valued moment vector, which is given to the Neural
Network as an input.
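The radial polynomial Rmn(r) defined above can be computed directly from factorials; a small sketch (not the paper's code):

```python
from math import factorial

def radial_poly(m, n, r):
    """Zernike radial polynomial R_mn(r), valid when |n| <= m
    and m - |n| is even, per the conditions stated above."""
    n = abs(n)
    assert n <= m and (m - n) % 2 == 0
    total = 0.0
    for s in range((m - n) // 2 + 1):
        coeff = ((-1) ** s) * factorial(m - s) / (
            factorial(s)
            * factorial((m + n) // 2 - s)
            * factorial((m - n) // 2 - s))
        total += coeff * r ** (m - 2 * s)
    return total
```

As a sanity check, R00(r) = 1, R11(r) = r and R20(r) = 2r² − 1, matching the classical low-order Zernike radial terms.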

5. NEURAL NETWORKS

Artificial intelligence, cognitive modelling, and neural
networks are information processing paradigms inspired by the
way biological neural systems process data. Artificial
intelligence and cognitive modelling try to simulate some
properties of biological neural networks.
A neural network is a processing device, realized either as
hardware or as an algorithm, inspired by the design and
functioning of the human brain. It is also known as an artificial
neural network or a neural net. An artificial neural network
(ANN) may be defined as an information processing model
inspired by the way biological nervous systems, such as the
brain, process information [8]. An ANN is composed of a large
number of highly interconnected processing elements (neurons)
working in unison to solve a particular problem.


Fig.4: Structure of Artificial Neural Network (ANN)

For example, in a neural network for handwriting recognition,
a set of input neurons may be activated by the pixels of an input
image representing a letter or digit. The activations of these
neurons are then passed on, weighted and transformed by some
function determined by the network's designer, to other
neurons, etc., until finally an output neuron is activated that
determines which character was read.

What artificial and biological neural networks have in common,
however, is the principle of non-linear, distributed, parallel and
local processing and adaptation.
5.1 BACKPROPAGATION ALGORITHM

Backpropagation, an abbreviation for "backward propagation
of errors", is a common method of training artificial neural
networks: from a desired output, the network learns from many
inputs [9]. It is a supervised learning method and a
generalization of the delta rule. It requires a training set
consisting of the desired output for many inputs, and it is most
useful for feed-forward networks. Fig.5 shows the flowchart of
the backpropagation training algorithm [10].
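As an illustration of the training loop in Fig.5, the sketch below trains a tiny one-hidden-layer network with sigmoid activations and delta-rule updates; the AND-gate dataset, layer sizes, learning rate and epoch count are illustrative assumptions, as the actual system feeds Zernike moment vectors as inputs:

```python
import math
import random

def train_and_gate(epochs=2000, lr=0.5, seed=1):
    """Toy backpropagation run following Fig.5: 2 hidden neurons,
    weights initialized to random values in [-1, 1], sigmoid activation,
    delta-rule weight updates with a constant learning rate.
    Returns the summed squared error per epoch."""
    random.seed(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input->hidden
    b1 = [random.uniform(-1, 1) for _ in range(2)]
    w2 = [random.uniform(-1, 1) for _ in range(2)]                      # hidden->output
    b2 = random.uniform(-1, 1)
    data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND gate
    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            # forward pass: hidden layer, then output layer
            h = [sig(w1[i][0] * x[0] + w1[i][1] * x[1] + b1[i]) for i in range(2)]
            y = sig(w2[0] * h[0] + w2[1] * h[1] + b2)
            total += (t - y) ** 2
            # backward pass: output delta, then hidden deltas
            dy = (t - y) * y * (1 - y)
            dh = [dy * w2[i] * h[i] * (1 - h[i]) for i in range(2)]
            # delta-rule updates with constant learning rate
            for i in range(2):
                w2[i] += lr * dy * h[i]
                for j in range(2):
                    w1[i][j] += lr * dh[i] * x[j]
                b1[i] += lr * dh[i]
            b2 += lr * dy
        errors.append(total)
    return errors
```

The per-epoch error trace corresponds to the "Is error < tolerance?" decision in the flowchart: training continues until the error falls below the chosen tolerance.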































The flowchart proceeds as follows:

1. Normalize the inputs and outputs.
2. Set the number of neurons in the hidden layer.
3. Initialize the weight matrices representing the synapses between the layers to random values between -1 and 1.
4. Present an input and its desired output.
5. Compute the inputs to the hidden layer by multiplying the input with the weights of the corresponding synapses.
6. Evaluate the output of the hidden layer using the activation function.
7. Compute the inputs to the output layer by multiplying the output of the hidden layer with the weights of the corresponding synapses.
8. Evaluate the output of the output layer using the activation function.
9. Calculate the error. If the error is less than the tolerance, stop; otherwise, adjust the weights using the delta rule with a constant learning rate and repeat from step 4.

Fig.5: Flowchart of Backpropagation Algorithm

The backpropagation algorithm was used to train the Neural
Network to compute the weights of the synapses, and the trained
network was tested by providing the test inputs to the network.

6. RESULTS
The recognition of the hand gestures was carried out in the
following manner:

1. The input image was converted from the RGB color space
   to the YCbCr color space, and skin detection was carried
   out by segmenting the image with appropriately chosen
   threshold values.
2. The segmented image was provided to the Canny edge
   detector to obtain the outline of the hand, which provided
   information about the shape of the hand and the position
   of the fingers.
3. The image was converted to a bi-level image, was resized
into a square image, and was given to the Zernike
function to calculate the Zernike moments for the image.
4. The moment vector generated was given to the trained
Neural Network as an input to obtain output
corresponding to the given input image.
The trained Neural Network was tested with sample images
for each of the gestures. The results of the test are as follows:
Table 1: Results of testing the trained Neural Network

Gesture    No. of samples    Correct detection    Percentage accuracy
A          20                20                   100
B          20                18                   90
C          20                17                   85
D          20                15                   75
E          20                18                   90
F
H
I
J
K

It is clear from the experimental results that the trained
Neural Network is successful in accurately detecting the hand
gestures of the Indian Sign Language with an average
accuracy of %
7. CONCLUSION

The development of a system for translating sign language
into English would be of great help for deaf as well as hearing
people. There is a need for an automatic sign language
recognition system which can cater to the needs of hearing-
impaired people.

The ultimate gain of the proposed system would be
enormous. It would create awareness about the language and
also enable easy translation. With further modification, it may
also be used as a teaching tool in educational institutes for the
hearing-impaired.

8. ACKNOWLEDGEMENTS

The authors would like to thank Mrs. Sangeeta Mishra,
Assistant Professor, Department of Electronics and
Telecommunication Engineering, Thakur College of
Engineering and Technology, Mumbai for her guidance and
help.
9. REFERENCES

[1] Albiol, A., Torres, L., and Delp, E. J. 2001. Optimum
color spaces for skin detection. In Proceedings of the
International Conference on Image Processing, vol. 1,
122-124.
[2] YCbCr, Wikipedia
(URL: http://en.wikipedia.org/wiki/YCbCr)
[3] D. Chai and K. N. Ngan, "Face segmentation using skin
colour map in videophone applications", IEEE
Transactions on Circuits and Systems for Video
Technology 9 (4) (1999) 551-564.
[4] Edge Detection definition, Wikipedia
(URL: http://en.wikipedia.org/wiki/Edge_detection)
[5] Canny, J., "A Computational Approach to Edge
Detection", IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 8, 679-714, November 1986.
[6] Image Moments definition, Wikipedia
(URL: http://en.wikipedia.org/wiki/Image_moment)
[7] F. Zernike, "Diffraction theory of the knife-edge test and
its improved form, the phase contrast method", Physica,
1, 689-704, 1934.
[8] Artificial Neural Network definition, S.N. Sivanandam,
S.N. Deepa, Principles of Soft Computing.
[9] Backpropagation definition, Wikipedia
(URL: http://en.wikipedia.org/wiki/Backpropagation)
[10] S. Rajasekaran, G.A. Vijayalakshmi Pai, Neural
Networks, Fuzzy Logic, and Genetic Algorithms.
