
Đorđević D., Mihajlov D., Josifovski Lj., Computer System for Support of Humans With Damaged Sight: Subsystem for Optical Character Recognition of Printed Cyrillic Text, YU Info '95, Symposium on Computer Science and Informatics (Simpozijum o računarskim naukama i informatici), Proceedings, Vol. 3, pp. 240-243, Brezovica, Serbia, 1995.

Computer System for Support of Humans With Damaged Sight: Subsystem for Optical Character Recognition of Printed Cyrillic Text

Dejan Đorđević, Dragan Mihajlov, Ljubomir Josifovski

Faculty of Electrical Engineering - Skopje
Faculty of Mechanical Engineering - Skopje

Abstract - A subsystem for optical character recognition of printed Macedonian Cyrillic text, developed as part of a system for supporting people with damaged sight, is presented. The system recognizes printed Cyrillic text and then prints it on a Braille printer or reads it aloud through a speech synthesizer. Various commercial optical character recognition programs were studied, but they proved inappropriate for use in the system, so a dedicated optical character recognition program was developed. Several types of classifiers were tested: simple overlapping of binary matrices, a perceptron neural network, a back-propagation network, and adaptive logic networks. A comparative analysis of their accuracy, speed, resource requirements, and sensitivity to noise and deformations was carried out.

Key words: Cyrillic text, optical character recognition, adaptive logic network, neural network

1. INTRODUCTION
People use senses such as sight, hearing, touch, taste and smell to receive information from their surroundings (Fig. 1). If one information input is disabled, the information can be redirected to the other inputs once it has been converted into a form that the other senses can perceive. People with damaged sight, who cannot receive information through sight or can do so only to a limited extent, are trained to receive it through other senses, primarily touch and hearing. For their needs, materials are printed in Braille or recorded on audio tapes. Materials prepared this way are limited in scope and often not up to date.

Figure 1. Model of human as a system (sight, hearing, touch)

As a result of a collaboration between the Faculty of Electrical Engineering and the Department for Rehabilitation of Children and Youngsters with Damaged Sight "Dimitar Vlahov" in Skopje, a system for helping people with damaged sight is under development. The system automatically reads printed Macedonian Cyrillic text and then prints it on a Braille printer or synthesizes it with a speech synthesizer [1]. This paper addresses the subsystem for optical character recognition of printed Cyrillic text, as a part of the system.

2. SYSTEM DESCRIPTION
A schematic structure of the system for helping people with damaged sight is given in Fig. 2. The system includes a personal computer, a graphical scanner, a Braille printer and a speech synthesizer.

Figure 2. System for helping people with damaged sight (scanner, computer, Braille printer, speaker)

Printed text is scanned, and the graphical image data is stored in the computer as a bitmap file. An optical character recognition program analyzes the image of the scanned text, recognizes the printed characters, and stores them in a text file. This text file can then be printed on a Braille printer or synthesized as speech played through a loudspeaker. Once converted to text, documents can be stored on diskettes and reused for printing or speech synthesis (Fig. 3).

Figure 3. Recognizing text, printing, archiving and speech synthesis (printed text scanning, bitmap, OCR, text file, Braille writing, speech synthesis, archiving in digital form)

Several commercial optical character recognition products were tested, but they proved rather inappropriate for recognizing Macedonian Cyrillic letters and for incorporation into the system. Some commercial programs can be trained on foreign characters, but training them to recognize Cyrillic letters is tricky, because they offer no way to redefine the recognition of Cyrillic letters that have the same shape as Latin letters. This is why a dedicated optical character recognition (OCR) program was developed.

Several character recognition methods were tried. The work started with a simple matrix-overlapping method, which, as expected, was not very satisfactory. Other methods, such as contour tracing, were also tried, but they proved slow and very complicated to implement. A neural network was adopted as the pattern classifier because it is very flexible. Several neural network architectures, namely a two-layer perceptron, a multilayer back-propagation network, and an adaptive logic network, have been implemented, and all of them have shown satisfactory results.

3. SUBSYSTEM REALIZATION
A program for optical character recognition of Cyrillic text has been developed. It is written in C and runs on a PC-compatible computer.

It consists of three parts. The first part analyzes the scanned image of the text and separates the characters (Fig. 4). The second part recognizes the characters, and the third part trains the recognition network. The second and third parts are implemented in several versions, using different neural networks and training strategies.
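The simple matrix-overlapping baseline mentioned in Section 2 is not described in detail in the paper. A minimal sketch in C, assuming each class is represented by a single 16x16 binary template and that the score is simply the number of agreeing cells, could look like this; the names BinMatrix, overlap_score and classify_by_overlap are hypothetical.

```c
#include <stddef.h>

#define SIDE 16                  /* sample size used later in the paper: 16x16 */
#define CELLS (SIDE * SIDE)

/* Hypothetical template representation: one binary matrix per class. */
typedef struct {
    unsigned char cells[CELLS];  /* 0 = background, 1 = ink */
} BinMatrix;

/* Number of cells on which two binary matrices agree (the "overlap"). */
static int overlap_score(const BinMatrix *a, const BinMatrix *b)
{
    int score = 0;
    for (int i = 0; i < CELLS; i++)
        score += (a->cells[i] == b->cells[i]);
    return score;
}

/* Return the index of the template that best overlaps the sample,
   or -1 if no templates are given. */
int classify_by_overlap(const BinMatrix *sample,
                        const BinMatrix *templates, size_t n_templates)
{
    int best = -1, best_score = -1;
    for (size_t k = 0; k < n_templates; k++) {
        int s = overlap_score(sample, &templates[k]);
        if (s > best_score) {
            best_score = s;
            best = (int)k;
        }
    }
    return best;
}
```

Because every cell counts equally, shifting a character by a single pixel can change many cells at once, which illustrates why such a method is sensitive to translation and font variations.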

Figure 4. Character separation (TIFF file, analyzing the picture, locating text lines, locating words, locating characters, separating characters into individual matrices, filtering, scaling, to pattern classifier)

Figure 5. Horizontal projections of a scanned text

Figure 6. Vertical projection in a single line of scanned text
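The projection-based separation outlined in Figures 4-6 can be sketched in C as follows. This is a minimal sketch, assuming the TIFF file has already been decoded into a row-major binary image with 1 for ink (TIFF decoding is omitted); the function names and the gap threshold used to tell word gaps from letter gaps are hypothetical, since the paper does not specify them.

```c
/* A run [start, end) of consecutive positions with non-zero projection. */
typedef struct { int start, end; } Run;

/* Horizontal projection: number of ink pixels in each row. */
void row_projection(const unsigned char *img, int rows, int cols, int *proj)
{
    for (int r = 0; r < rows; r++) {
        proj[r] = 0;
        for (int c = 0; c < cols; c++)
            proj[r] += img[r * cols + c];
    }
}

/* Vertical projection restricted to rows [r0, r1), i.e. one text line. */
void col_projection(const unsigned char *img, int cols, int r0, int r1,
                    int *proj)
{
    for (int c = 0; c < cols; c++) {
        proj[c] = 0;
        for (int r = r0; r < r1; r++)
            proj[c] += img[r * cols + c];
    }
}

/* Split a projection profile into runs of non-zero entries. Runs separated
   by fewer than min_gap empty positions are merged into one unit
   (min_gap = 1 keeps every gap). Runs beyond max_runs are dropped.
   Returns the number of runs stored in out. */
int find_runs(const int *proj, int n, int min_gap, Run *out, int max_runs)
{
    int count = 0, i = 0;
    while (i < n) {
        while (i < n && proj[i] == 0) i++;      /* skip background */
        if (i == n) break;
        int start = i;
        while (i < n && proj[i] != 0) i++;      /* consume ink */
        int end = i;
        if (count > 0 && start - out[count - 1].end < min_gap) {
            out[count - 1].end = end;           /* gap too small: same unit */
        } else if (count < max_runs) {
            out[count].start = start;
            out[count].end = end;
            count++;
        }
    }
    return count;
}
```

Applied to the row profile with min_gap = 1, find_runs yields the text lines; applied to the column profile of one line, min_gap = 1 separates characters, while a larger value groups them into words. Characters that touch each other would need extra handling not shown here.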

The first part of the program takes as input a TIFF file holding the scanned text. It analyzes the image, locates the lines of text, then the words in each line, and finally separates the characters. Horizontal projections of the pixels are used to locate the lines of text in the image (Fig. 5). Similarly, the vertical projections of the pixels of each line are calculated and used to locate the words in that line (Fig. 6). Finally, the characters in each word are located. The resulting binary matrices have different dimensions; each one is copied into a square matrix, where some filtering is performed. After filtering, the sample is scaled to a binary square matrix of constant size (16x16), which is sent to the neural network for recognition. The second part of the program is a neural network emulation, which performs the actual pattern recognition and classification. Several neural network architectures have been implemented and compared in their ability to recognize characters.

4. COMPARISON OF APPLIED NEURAL NETWORK ARCHITECTURES
The ability of neural networks to be trained to recognize samples and, once trained, to generalize, i.e. to give appropriate responses to input vectors not presented during training, makes them excellent candidates for OCR systems. To determine how well different neural network architectures suit character recognition, the second and third parts of the program have been implemented in several versions, each using a different network architecture.

In the first version a simple two-layer perceptron-type neural network was used (Fig. 7). The first layer consists of 256 neurons, each connected to one element of the binary matrix holding the character sample. The number of neurons in the second layer equals the number of letters to be recognized. The network was trained using the simple delta rule. This network proved rather fast during training and recognition, but not accurate enough.
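The first version can be illustrated with a minimal C sketch. It assumes that only the connections into the output layer are trained (the 256 input neurons just pass the 16x16 sample through) and uses a delta-rule update of the form w_ji += eta (t_j - y_j) x_i with thresholded outputs. The bias weight, the learning rate eta, the 0/1 targets and the output count N_OUT = 31 (the number of letters in the Macedonian alphabet) are assumptions for illustration, as are all names.

```c
#define N_IN  256   /* one input per cell of the 16x16 sample */
#define N_OUT 31    /* hypothetical: one output neuron per letter */

typedef struct {
    double w[N_OUT][N_IN + 1];   /* last weight in each row is the bias */
} Perceptron;

/* Net input of output neuron j for a binary sample x. */
static double net_input(const Perceptron *p, int j, const unsigned char *x)
{
    double s = p->w[j][N_IN];    /* bias, with its input fixed at 1 */
    for (int i = 0; i < N_IN; i++)
        s += p->w[j][i] * x[i];
    return s;
}

/* One delta-rule step on a single sample x whose class is cls. */
void train_step(Perceptron *p, const unsigned char *x, int cls, double eta)
{
    for (int j = 0; j < N_OUT; j++) {
        double y = net_input(p, j, x) > 0.0 ? 1.0 : 0.0;  /* thresholded output */
        double t = (j == cls) ? 1.0 : 0.0;                /* desired output */
        double err = t - y;
        if (err == 0.0)
            continue;                                     /* already correct */
        for (int i = 0; i < N_IN; i++)
            p->w[j][i] += eta * err * x[i];
        p->w[j][N_IN] += eta * err;
    }
}

/* Recognition: the output neuron with the largest net input wins. */
int recognize(const Perceptron *p, const unsigned char *x)
{
    int best = 0;
    double best_s = net_input(p, 0, x);
    for (int j = 1; j < N_OUT; j++) {
        double s = net_input(p, j, x);
        if (s > best_s) {
            best_s = s;
            best = j;
        }
    }
    return best;
}
```

Each training step costs one pass over the 256 x N_OUT weights, which is consistent with the paper's observation that this network is fast; a single layer of weights, however, can only separate classes linearly.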

Figure 7. Two-layer perceptron

In the second approach a multilayer neural network was used. We tested 3- and 4-layer networks with 50 to 300 neurons in the hidden layers, trained with back-propagation. Training this network was extremely slow; once trained, it gave good recognition results, but recognition itself was rather slow.

In the third approach an adaptive logic network (ALN) was used. An adaptive logic network is a special case of the familiar multilayer perceptron (MLP) feedforward network. Its unique architecture, which uses only the logical functions AND and OR (or logic gates, if realized in hardware), allows the basic pattern classification computations on binary information to be performed at extremely high speed. It is commonly organized as a binary tree with two types of nodes: adaptive elements and leaves (adaptive tree). Each element has two inputs and can operate as an AND, OR, LEFT, or RIGHT gate. The binary inputs and their complements are randomly connected to the leaves, and the output is taken at the root of the tree (Fig. 8). Adaptive logic networks are very convenient for hardware implementation and can be evaluated extremely fast. Even when emulated in software, they are much faster than other neural network models. Moreover, as the dimension of the problem increases, the complexity of an ALN grows relatively slowly, unlike that of, e.g., an MLP. Training an ALN is also much faster than training an MLP. Once trained, an ALN has a built-in capacity to generalize and maintains excellent noise immunity [2]. Another feature of ALNs is that all available data can be used to train the network, without first selecting the data needed for making the decision.

Figure 8. Adaptive logic network (input binary vector and its complements, random connections to the leaves of multiple parallel trees, output vector)

Our implementation of the ALN uses one or more trees per output class. Every tree is trained to recognize a single class, i.e. to answer whether the sample belongs to its class or not. Several trees (usually an odd number) are provided for each class, and they vote for the decision against the trees of the other classes (Fig. 9). The maximum selector decides in favor of the character with the largest number of votes. We tried networks with 512 to 2048 leaves and with 1 to 9 voters. The ALN proved relatively fast during training and extremely fast during recognition. It gives very good recognition results, especially when more voters are used.
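The tree evaluation and voting just described can be sketched in C. This is a minimal sketch, assuming each tree is stored as an array of nodes that reference their children by index; all type and function names are hypothetical, and the ALN training procedure (adapting each node's function, see [2] and [3]) is not shown. Because AND and OR short-circuit, often only part of a tree has to be evaluated, which is one reason ALNs evaluate quickly in software.

```c
typedef enum { ALN_LEAF, ALN_AND, ALN_OR, ALN_LEFT, ALN_RIGHT } AlnType;

/* Node of an adaptive logic tree. */
typedef struct {
    AlnType type;
    int input;        /* leaves only: index into the binary input vector */
    int complement;   /* leaves only: 1 = use the complemented input     */
    int left, right;  /* internal nodes only: child indices              */
} AlnNode;

/* Evaluate the subtree rooted at node n for the binary input x. */
int aln_eval(const AlnNode *nodes, int n, const unsigned char *x)
{
    const AlnNode *nd = &nodes[n];
    switch (nd->type) {
    case ALN_LEAF:
        return nd->complement ? !x[nd->input] : x[nd->input] != 0;
    case ALN_LEFT:   /* pass the left child through */
        return aln_eval(nodes, nd->left, x);
    case ALN_RIGHT:  /* pass the right child through */
        return aln_eval(nodes, nd->right, x);
    case ALN_AND:    /* right subtree skipped when the left one is 0 */
        return aln_eval(nodes, nd->left, x) && aln_eval(nodes, nd->right, x);
    case ALN_OR:     /* right subtree skipped when the left one is 1 */
        return aln_eval(nodes, nd->left, x) || aln_eval(nodes, nd->right, x);
    }
    return 0;
}

/* One tree: its node array and the index of its root. */
typedef struct {
    const AlnNode *nodes;
    int root;
} AlnTree;

/* trees[c * voters + v] is voter v of class c. Each tree votes 1 if it
   claims the sample; the maximum selector returns the class with the
   most votes (ties go to the lower class index, an arbitrary choice). */
int aln_classify(const AlnTree *trees, int n_classes, int voters,
                 const unsigned char *x)
{
    int best = 0, best_votes = -1;
    for (int c = 0; c < n_classes; c++) {
        int votes = 0;
        for (int v = 0; v < voters; v++) {
            const AlnTree *t = &trees[c * voters + v];
            votes += aln_eval(t->nodes, t->root, x);
        }
        if (votes > best_votes) {
            best_votes = votes;
            best = c;
        }
    }
    return best;
}
```

In this sketch the input vector x would hold the 256 cells of the 16x16 sample, and each leaf picks one of them (or its complement), mirroring the random leaf connections of Fig. 8.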


Figure 9. Pattern recognition using adaptive logic trees (a 16x16 binary sample matrix is presented to groups of 5 voting trees per class, whose votes feed a maximum selector)

Table 1. Comparison of character recognition methods (overlapping, contour tracing, two-layer perceptron, MLP (BP), ALN) with respect to sensitivity to rotation, noise, translation and different fonts; required resources (memory, time); complexity; and recognition

Training of the networks is performed by a dedicated automatic training program, to which a set of character samples is presented together with the classes they belong to. After training, the neuron weights, or the functions in the ALN nodes, are saved to a file that is later used during recognition. Table 1 summarizes the features of the different methods commonly used for optical character recognition. Table 2 summarizes the speed of the different neural network architectures during training and recognition.

Table 2. Time during training and recognition (two-layer perceptron, MLP (BP), ALN)

5. CONCLUSION
A subsystem for optical character recognition of printed Macedonian Cyrillic text has been realized and implemented in a system for supporting people with damaged sight. The properties of different types of neural network architectures were analyzed with respect to their suitability for an OCR system. The simple two-layer perceptron, although relatively fast and with small resource requirements, showed rather poor recognition results. Multilayer neural networks showed quite acceptable recognition accuracy, but they proved very slow. Adaptive logic networks proved the most flexible: they are very fast during both recognition and training, and have shown satisfactory accuracy. Once in use, this system would help people with damaged sight follow the daily press and current literature. Since the studies so far have been made on a simulation example, comments from end users are intended to serve as feedback for further improvement of the system.

LITERATURE
[1] D. Mihajlov, D. Đorđević, N. Kotevska, "Computer System for Support of Humans with Damaged Sight", ETAI, Ohrid, 1993.
[2] G. v. Bochmann, W. Armstrong, "Properties of Boolean Functions with a Tree Decomposition", BIT 13, 1974, pp. 1-13.
[3] W. Armstrong, G. Godbout, "Properties of Binary Trees of Flexible Elements Useful in Pattern Recognition", IEEE 1975 International Conference on Cybernetics and Society, San Francisco, 1975, IEEE Cat. No. 75 CHO 997-7 SMC, pp. 447-449.
[4] D. Đorđević, "Optičko prepoznavanje na znaci so upotreba na adaptivna logička mreža" (Optical character recognition using an adaptive logic network), seminar paper for the course Neural Networks and Parallel Distributed Processing, Faculty of Electrical Engineering, Skopje, 1995.
[5] J. L. McClelland, D. E. Rumelhart, "Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises", MIT Press, 1988.
