Ednaldo B. Pizzolato
Departamento de Computação,
Universidade Federal de São Carlos
São Carlos, Brazil
ednaldo@dc.ufscar.br
Abstract - The question of automatic sign language translation has been widely debated in the computer vision field. However, some works have not adequately addressed the quality of sign gesticulation, mainly because most of them use samples recorded by non-native signers, which differ from samples produced by native signers. This paper addresses LIBRAS translation with special attention to the features of signs performed by natives and experts. Specifically, we look at the translation of native and expert signing and show that, even though these kinds of samples are complex, there are statistical methods able to deal with them. The result is a database of LIBRAS videos recorded by 21 people (18 natives, 2 experts and 1 student of LIBRAS). In addition, this article describes the creation of a sign language translation solution using pattern recognition and image processing.
Keywords: Sign Language; Pattern Recognition; Image Processing; Gesture Recognition; Hidden Markov Models.
Hidden Markov Models (HMMs) have been successfully used in many areas of human knowledge: they have been applied to speech recognition and to the analysis and interpretation of images and signals, with encouraging results. In this paper, HMMs were used to build a solution for the automatic translation of LIBRAS. To create this solution, a data set named ASRP100 was first built and organized into training and validation data. ASRP100 contains videos of 100 sentences performed by 21 native and expert signers.
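The poster does not name a toolkit or a feature set, so the following is only a minimal sketch of the word-level scheme described here, assuming one Gaussian HMM trained per vocabulary word on per-frame feature vectors, with an unseen sequence assigned to the word whose model scores it highest; hmmlearn, the 5-state topology and the feature shapes are illustrative assumptions, not the authors' confirmed setup.

    # Sketch only: one Gaussian HMM per word, classification by max likelihood.
    import numpy as np
    from hmmlearn import hmm

    def train_word_models(samples_by_word, n_states=5):
        """Fit one HMM per word. samples_by_word maps a word to a list of
        2-D arrays, each of shape (n_frames, n_features)."""
        models = {}
        for word, samples in samples_by_word.items():
            X = np.vstack(samples)               # all frames stacked
            lengths = [len(s) for s in samples]  # frame count per sample
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=30)
            m.fit(X, lengths)
            models[word] = m
        return models

    def classify(models, sample):
        """Return the word whose HMM gives the highest log-likelihood."""
        return max(models, key=lambda w: models[w].score(sample))

Under per-word models like these, the two-to-one training/validation split visible in Table 3 maps directly onto the fit and score calls above.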
Data Set
The ASRP100 follows a pattern used in other works that have created state-of-the-art sign language databases. This pattern includes using signs performed by natives and experts, providing videos with subtitles, and using sentences that belong to a specific context. In this paper, two contexts were used: everyday situations and dialogues in hospitals. In addition, the ASRP100 was recorded with a depth sensor, the Kinect. As a result, it contains three types of data: depth data, RGB images and skeleton data (Image 1).
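For concreteness, one recorded frame of these three streams could be held in a structure like the one below; the 640x480 resolution and the 20-joint skeleton are Kinect v1 defaults and are assumptions, since the poster only names the streams.

    # Hypothetical per-frame container for the three ASRP100 streams.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Asrp100Frame:
        rgb: np.ndarray       # (480, 640, 3) uint8 color image
        depth: np.ndarray     # (480, 640) uint16 depth map, millimeters
        skeleton: np.ndarray  # (20, 3) joint positions (x, y, z), meters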
Image 1. From left to right: RGB image, depth map and skeleton data.
Table 1. Comparing ASRP100 with databases containing videos of LIBRAS (* = information not available).

Database   Deaf   Listeners   Words   Sentences   Videos   Annotations
ITA2008    0      2           117     0           234      *
DBOSCO     0      4           50      0           600      *
UFSCAR*    *      *           *       *           *        *
MADEO*     *      *           *       *           *        *
CANEIRO    *      *           *       *           *        *
ALF        *      *           *       *           *        *
ASRP100    18     3           509     100         4200     Yes

Results
Table 3. Final per-word results of the automatic translation (* = information not available).

Word           Training   Validation   Correct   % Correct
Eu             478        239          177       74.06%
Ter            170        85           37        43.53%
El@            130        65           18        27.69%
*              124        62           33        53.23%
Morar          88         44           34        77.27%
Problema       88         44           31        70.45%
Dor            80         40           18        45.00%
Namorad@       56         28           16        57.14%
Viajar         112        56           46        82.14%
Família        110        55           48        87.27%
Câncer         110        55           37        67.27%
Febre          42         21           9         42.86%
*              108        54           39        72.22%
Quebrar        102        51           35        68.63%
Torcer         108        54           34        62.96%
Pedra Rins     36         18           10        55.56%
Pizza          20         10           6         60.00%
Dengue         88         44           35        79.55%
Cinema         88         44           39        88.64%
Avião          80         40           33        82.50%
Braço          88         44           36        81.82%
Aids           88         44           37        84.09%
Próstata       84         42           32        76.19%
Operar         82         41           33        80.49%
Comer          80         40           34        85.00%
Formigamento   80         40           36        90.00%
Average                                          69.06%
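As a reading aid for Table 3: each percentage is the number of correctly recognized validation samples divided by the validation count, and the 69.06% figure in the last row is the unweighted mean of the 26 per-word percentages. A quick check on an excerpt of the table:

    # Each "% Correct" is 100 * correct / validation; the 69.06% average
    # is the plain mean of the 26 per-word percentages.
    rows = {  # excerpt of Table 3: word -> (validation, correct)
        "Eu": (239, 177),
        "Ter": (85, 37),
        "Formigamento": (40, 36),
    }
    for word, (val, ok) in rows.items():
        print(f"{word}: {100 * ok / val:.2f}%")  # 74.06%, 43.53%, 90.00%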
Table 2. Comparing ASRP100 with international databases containing videos of sign language (* = information not available).

Database    Deaf   Listeners   Words   Sentences   Videos   Annotations
RWTH        4      0           *       843         *        Yes
ECHO-NGT    100    0           2647    240         *        Yes
ECHO-BSL    2      0           2865    262         *        Yes
ECHO-SLL    1      0           3117    159         104      Yes
A3LIS-147   10     0           147     0           294      Yes
ASRP100     18     3           509     100         4200     Yes
Image 2. Using the Virtual Wall in depth image processing. On the left, the user's whole body; on the right, the image resulting from the Virtual Wall.
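The poster does not spell out the Virtual Wall algorithm; judging from Image 2, it discards every depth pixel that lies behind a fixed plane so that only the signer's body remains. A minimal sketch under that assumption (the 1200 mm threshold is an illustrative value):

    # Sketch of a "Virtual Wall": keep only depth pixels in front of a
    # fixed plane, suppressing the background and invalid zero readings.
    import numpy as np

    def virtual_wall(depth_mm: np.ndarray, wall_mm: int = 1200) -> np.ndarray:
        mask = (depth_mm > 0) & (depth_mm < wall_mm)
        return np.where(mask, depth_mm, 0)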
Conclusion - In this poster we presented an overview of the ASRP100 (a database with 4200 videos recorded by native and expert signers) and briefly described the depth image processing and pattern recognition pipeline used to achieve automatic translation from LIBRAS to Portuguese.