
Automatic Translation of Brazilian Sign Language

(LIBRAS) With Hidden Markov Models (HMM)


Diego G. Dias
Departamento de Computação,
Universidade Federal de São Carlos
São Carlos, Brazil
diego.dias@dc.ufscar.br

Ednaldo B. Pizzolato
Departamento de Computação,
Universidade Federal de São Carlos
São Carlos, Brazil
ednaldo@dc.ufscar.br

Abstract - The automatic translation of sign languages has been widely studied in the computer vision field. However, some works have not adequately addressed the quality of sign gesticulation, mainly because they use samples produced by non-native communicators, which differ from the samples of native communicators. This paper addresses LIBRAS translation with special attention to the features of signs made by natives and experts. Specifically, we show that even though such samples are complex, there are statistical methods able to deal with them. One result is a database of LIBRAS videos made by 21 people (18 natives, 2 experts, and 1 student of LIBRAS). In addition, this article describes the creation of a sign language translation solution using pattern recognition and image processing.
Keywords: Sign Language; Pattern Recognition; Image Processing; Gesture Recognition; Hidden Markov Models.
Hidden Markov Models (HMM) have been applied successfully in many areas of human knowledge, such as speech recognition and the analysis and interpretation of images and signals, with encouraging results. In this work, HMMs were used to build an automatic translation solution for LIBRAS. As a first step, a data set named ASRP100 was created and organized into training and validation data. ASRP100 contains videos of 100 sentences performed by 21 native and expert communicators.
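Table 3 below suggests how the split was organized: for every word, the number of validation samples is exactly half the number of training samples, i.e. a 2:1 split. A minimal sketch of such a per-word split (the function and variable names are illustrative, not taken from the authors' code):

```python
import random
from collections import defaultdict

def split_per_word(samples, train_ratio=2/3, seed=42):
    """Split (word, video_path) pairs into training and validation sets,
    keeping a 2:1 train/validation ratio within each word, as Table 3 suggests."""
    by_word = defaultdict(list)
    for word, path in samples:
        by_word[word].append(path)

    rng = random.Random(seed)
    train, valid = [], []
    for word, paths in by_word.items():
        rng.shuffle(paths)                       # avoid ordering bias per signer
        cut = round(len(paths) * train_ratio)    # two thirds go to training
        train += [(word, p) for p in paths[:cut]]
        valid += [(word, p) for p in paths[cut:]]
    return train, valid
```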

Data Set
The ASRP100 follows a pattern used by other state-of-the-art sign language databases: signs are performed by natives and experts, the videos carry subtitles, and the sentences belong to a specific context. Two contexts were used in this work: everyday situations and dialogues in hospitals. In addition, ASRP100 was recorded with a Microsoft Kinect depth sensor, so each recording provides three types of data: depth maps, RGB images, and skeleton data (Image 1).

Image 1. From left to right: RGB image, depth map, and skeleton data
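Each recording therefore carries three synchronized streams. A sketch of how one frame might be represented in memory, assuming the first-generation Kinect (640x480 color and depth frames, 20 skeleton joints); the field names are assumptions, not the database's actual layout:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class KinectFrame:
    """One frame of an ASRP100 recording (field names are illustrative)."""
    rgb: np.ndarray       # (480, 640, 3) uint8 color image
    depth: np.ndarray     # (480, 640) uint16 depth map, in millimeters
    skeleton: np.ndarray  # (20, 3) joint positions (x, y, z), in meters

    def __post_init__(self):
        assert self.rgb.shape == (480, 640, 3)
        assert self.depth.shape == (480, 640)
        assert self.skeleton.shape == (20, 3)
```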
Table 1. Comparing ASRP100 with databases containing videos of LIBRAS

Database    Deaf   Listeners   Words   Sentences   Videos   Annotations
ITA2008     0      2           117     0           234      *
DBOSCO      0      4           50      0           600      *
UFSCAR*     0      45          41      0           1845     *
MADEO*      *      *           51      0           *        *
CANEIRO     3      0           ALF     0           234      *
ASRP100     18     3           509     100         4200     Yes

(* = information not available)

Table 2. Comparing ASRP100 with international databases containing videos of sign language

Database    Deaf   Listeners   Words   Sentences   Videos   Annotations
RWTH        4      0           *       843         *        Yes
ECHO-NGT    100    0           2647    240         *        Yes
ECHO-BSL    2      0           2865    262         *        Yes
ECHO-SLL    1      0           3117    159         104      Yes
A3LIS-147   10     0           147     0           294      Yes
ASRP100     18     3           509     100         4200     Yes

Results

Table 3. The final results of the automatic translation

Word                         Training   Validation   Correct   % Correct
Eu (I)                       478        239          177       74.06%
Ter (to have)                170        85           37        43.53%
El@ (he/she)                 130        65           18        27.69%
*                            124        62           33        53.23%
Morar (to live)              88         44           34        77.27%
Problema (problem)           88         44           31        70.45%
Dor (pain)                   80         40           18        45.00%
Namorad@ (boy/girlfriend)    56         28           16        57.14%
Viajar (to travel)           112        56           46        82.14%
Família (family)             110        55           48        87.27%
Câncer (cancer)              110        55           37        67.27%
Febre (fever)                42         21           9         42.86%
*                            108        54           39        72.22%
Quebrar (to break)           102        51           35        68.63%
Torcer (to twist/sprain)     108        54           34        62.96%
Pedra Rins (kidney stone)    36         18           10        55.56%
Pizza (pizza)                20         10           6         60.00%
Dengue (dengue)              88         44           35        79.55%
Cinema (cinema)              88         44           39        88.64%
Avião (airplane)             80         40           33        82.50%
Braço (arm)                  88         44           36        81.82%
Aids (AIDS)                  88         44           37        84.09%
Próstata (prostate)          84         42           32        76.19%
Operar (to operate)          82         41           33        80.49%
Comer (to eat)               80         40           34        85.00%
Formigamento (tingling)      80         40           36        90.00%
Average                                                        69.06%
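Each % Correct figure is computed over the validation samples: for Eu, for example, 177 of 239 validation samples were recognized correctly, i.e. 177/239 ≈ 74.06%. The reported average over all words is 69.06%.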

Depth Image Processing and Pattern Recognition

Image 2. Using the Virtual Wall in depth image processing. On the left, the user's whole body; on the right, the image resulting from the Virtual Wall
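The Virtual Wall can be read as a depth cut-off: pixels farther from the sensor than a chosen plane are discarded, isolating the signer's hands and forearms from the rest of the body and the background. A minimal NumPy sketch of such a threshold (the 1200 mm cut-off is an illustrative value, not one given in the paper):

```python
import numpy as np

def virtual_wall(depth_mm: np.ndarray, wall_mm: int = 1200) -> np.ndarray:
    """Keep only pixels in front of a virtual plane placed wall_mm from the
    sensor; everything behind the plane (torso, background) is zeroed out."""
    mask = (depth_mm > 0) & (depth_mm < wall_mm)  # 0 means no depth reading
    return np.where(mask, depth_mm, 0).astype(depth_mm.dtype)
```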

Image 3. Feature extraction
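A discrete HMM needs a finite observation alphabet, so the continuous feature vectors extracted from each frame have to be mapped to symbols. A common way to do this, shown here as a sketch of the general technique rather than the authors' exact pipeline, is k-means vector quantization:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(feature_vectors: np.ndarray, n_symbols: int = 64) -> KMeans:
    """Learn a codebook: each cluster centre becomes one discrete symbol."""
    return KMeans(n_clusters=n_symbols, n_init=10, random_state=0).fit(feature_vectors)

def quantize(codebook: KMeans, frames: np.ndarray) -> np.ndarray:
    """Map each frame's feature vector to its nearest centre's index,
    turning a video into a sequence of discrete HMM observations."""
    return codebook.predict(frames)
```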

Image 4. Inference process of Discrete HMM
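For recognition, the usual discrete-HMM scheme trains one model per vocabulary word and assigns an unknown symbol sequence to the word whose model scores it highest, which matches the maximum-likelihood decision suggested by Image 4. A sketch of that scheme with the hmmlearn library (the library choice and model sizes are assumptions; hmmlearn names its discrete-emission model CategoricalHMM):

```python
import numpy as np
from hmmlearn import hmm

def train_word_models(sequences_by_word, n_states=5):
    """Fit one discrete-emission HMM per vocabulary word from its
    quantized symbol sequences."""
    models = {}
    for word, seqs in sequences_by_word.items():
        X = np.concatenate(seqs).reshape(-1, 1)  # hmmlearn wants one stacked column
        lengths = [len(s) for s in seqs]         # boundaries between videos
        model = hmm.CategoricalHMM(n_components=n_states, n_iter=100,
                                   random_state=0)
        models[word] = model.fit(X, lengths)
    return models

def recognize(models, symbols):
    """Return the word whose HMM gives the observed sequence the
    highest log-likelihood."""
    obs = np.asarray(symbols).reshape(-1, 1)
    return max(models, key=lambda w: models[w].score(obs))
```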

Conclusion - In this poster we presented an overview of ASRP100 (a database with 4200 videos made by native and expert communicators) and briefly described the depth image processing and pattern recognition steps used to achieve automatic translation from LIBRAS to Portuguese.
