Abstract - In this paper we present a simple method using a self-organizing map neural network (SOM NN) for character recognition tasks, and describe the results of training a SOM NN to perform optical character recognition on images of printed characters. 49 features have been used to distinguish between 62 characters (both uppercase and lowercase letters of the English language and numerals). The implemented program recognizes text by analyzing an image file. The text to be recognized is currently limited to characters typed using the Verdana font type, bolded with a font size of 18. The program is capable of handling non-ideal images (noisy, colored text, rotated image). Recognition accuracy is consistently 100% for ideal images, but ranges between 80% and 100% for non-ideal images.

Iyer et al. [5] discussed the importance of OCR for digitizing Indian literature, together with their methods. A three-layer back-propagation neural network with 23 inputs and 31 outputs was trained. They claimed a recognition rate of 76% in recognizing Devanagari characters.

Park et al. [6] compared the use of neural networks together with two other classification methods. Their network was trained to recognize 26 handwritten characters using the "A*-like" algorithm. The back-propagation network that was reported consisted of three layers with 680 inputs and 26 outputs.
D. Non-ideal Images

A number of processes have been introduced to overcome imperfect image conditions. Examples include the handling of noisy, colored, and rotated images.

Fig. 9 The recognized characters are displayed to the user.

The above-mentioned processes were implemented into a full-fledged program written in Matlab (Fig. 8). Users are allowed to load an image file and set a number of parameters if the image is non-ideal (Section D).

The program is capable of recognizing an image with multiple lines of text. This is achieved by automatically determining the number of distinct lines of text in the image. Each line is cropped and analyzed separately.

The recognition process is based on the pre-trained SOM neural network. Characters that are identified and recognized are presented in a text box (Fig. 9). The user has the option of saving the characters to a text file.

III. RESULTS

The program and its algorithms have been tested with a variety of images (e.g. ideal, noisy, colored, or rotated). Due to the assumptions previously mentioned, testing images were obtained via screenshots of a text editor program. The images and recognition results are presented in the Appendices section.

Appendix B presents the results of testing the SOM NN with ideal images. Images are a mixture of uppercase and lowercase letters, together with numerals. Recognition accuracy with this set of images is consistently 100%.

Fig. 10 (Left) Three noisy images and (Right) the cleared-up images. Notice that the noise removal process does not remove noise at object borders.

Appendix C presents the results of recognizing text in images that have some noise introduced into them ('salt & pepper' with a noise density of 0.02). Recognition accuracy is in the range of 90% to 100%. This drop can be attributed to the noise removal process. Particles at an object's boundary are not removed, as the program cannot determine whether the particle pixels are part of the object or are noise (Fig. 10).

Appendix D presents the results of recognizing colored text in an image. Recognition accuracy ranges from 95% to 100%. The issue lies in selecting a proper threshold value to convert the colored image to a binary image. A value that properly thresholds an image with light colors does not produce the same desirable results for an image with dark colors.

3) Images with Rotated Text

Appendix E presents the results of recognizing text in a rotated image. Recognition accuracy for this set of images is between 81% and 92%. Rotating an image causes some data corruption on a pixel scale due to interpolation (Fig. 11). While visually insignificant, it is evident that these changes are sufficient to disrupt the recognition process.

Fig. 11 Discrepancies in the image are clearly visible as a result of the rotation processes.

IV. DISCUSSION

A. Choice of Font Properties Used

The original attempt was to recognize text using the Courier New font type. This font was chosen as its characters are of near-equal width and it is commonly used (the default font type for Windows Notepad). However, inspection of the dataset revealed the presence of 'protrusions' at the ends of some characters (Fig. 12). These protrusions occasionally caused adjacent characters to be connected, which in turn prompted the program to frequently mislabel groups of characters as one object.

Fig. 12 Protrusions at the ends of some characters in Courier New font type.

Text editors use a font size of 12 by default. This size is large enough for the human eye to distinguish between characters. However, the characters are made up of insufficient pixels from which significant feature data can be extracted.

It was found that the Verdana font type, bolded and at a font size of 18, produced characters that were sufficiently spaced apart. In addition, the larger font size produced characters that consisted of more pixels (Fig. 13). This increase in pixel area allowed better and more significant feature data to be obtained for training the neural network.

Fig. 13 Characters using Verdana font type lack the protrusions.

B. Features Used to Distinguish Characters

Initially, 15 features were sufficient to differentiate between the 26 uppercase letters. Increasing the set of characters to be recognized to include numerals (a total of 36) required the use of 26 features.

The increase in feature data required to learn an additional 10 characters (26 to 36) is not an indication that the neural network has difficulties in adapting its internal weights to learn to recognize more characters. Rather, based on observations, it is due to the fact that many characters share similar features (e.g. bounding box area, Euler number, orientation). There are few features which present totally different values for all 36 characters. Thus, additional feature data is used to support and reinforce other key features.

Additional feature data was generated by dividing a character into two segments (top-half and bottom-half). This was followed by calculating certain feature data by treating each segment as a separate object. Finally, it was found that the 49 features used (Table I) were the minimum number required to correctly train the SOM network to distinguish between the 62 different characters of the English language.
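The half-splitting scheme described above can be sketched in Python. The paper's implementation is in Matlab, and the exact 49 features are listed in Table I; the specific features computed below (bounding-box area, fill density, aspect ratio) and the helper names are illustrative assumptions only, not the paper's feature set.

```python
# Illustrative sketch only: the paper's actual 49 features (Table I) and
# Matlab implementation are not reproduced here.

def bounding_box(pixels):
    """Bounding box (min_row, min_col, max_row, max_col) of a set of
    foreground pixel coordinates."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)

def segment_features(pixels):
    """A few simple region features for one segment of a character."""
    if not pixels:
        return [0.0, 0.0, 0.0]
    r0, c0, r1, c1 = bounding_box(pixels)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    density = len(pixels) / box_area          # filled fraction of the box
    aspect = (r1 - r0 + 1) / (c1 - c0 + 1)    # height / width
    return [box_area, density, aspect]

def character_features(image):
    """Features for the whole character plus its top and bottom halves,
    treating each half as a separate object, as in the paper."""
    pixels = [(r, c) for r, row in enumerate(image)
                     for c, v in enumerate(row) if v]
    r0, _, r1, _ = bounding_box(pixels)
    mid = (r0 + r1) / 2
    top = [p for p in pixels if p[0] <= mid]
    bottom = [p for p in pixels if p[0] > mid]
    return (segment_features(pixels)
            + segment_features(top)
            + segment_features(bottom))

# Example: a 5x3 binary image of a crude 'T'
glyph = [[1, 1, 1],
         [0, 1, 0],
         [0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
print(character_features(glyph))
```

Splitting the glyph this way lets characters that share whole-glyph statistics (the paper's example: bounding box area, Euler number, orientation) be separated by how their ink is distributed between the two halves.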
C. Self-Organizing Map Neural Networks

Neural networks are ideal for recognition tasks. Feature data for characters in non-ideal images differs slightly from that of ideal images (different pixel values). However, the SOM neural network is capable of correctly recognizing these characters. The results discussed above demonstrate the robustness of using a network that has been trained with an ideal dataset to recognize characters under different conditions.

The learning rate for the neural network was kept at a value of 0.01. The network learned at a slower pace, but the learning was more accurate. Initially, it was noticed that the network had to run for 700 epochs before it successfully learned all 26 uppercase letters. It was thus expected that the network would require a higher number of epochs to learn the additional 10 numerals.

However, on training, the SOM neural network required no increase in the number of epochs to learn the fully combined set of uppercase letters, lowercase letters and numerals. This leads to the belief that teaching the network to recognize different characters is a matter of presenting sufficient significant feature data to the network, rather than training it for a long period (a large number of epochs).

The time required for the recognition process depends on both the number of characters and the number of distinct lines in the image. Recorded times range from less than a second for an image with only a few characters, to 12 seconds for an image with four lines and 98 characters.

D. Comparison

In the literature review, some works related to OCR have been highlighted. Each of the works has focused on different aspects and applications of character recognition. As such, there are no similar techniques against which a direct OCR comparison can be made. Hence, a straightforward one-to-one comparison with other existing techniques is not possible at this stage.

E. Applications of the Work Presented

A wide variety of applications with respect to the use of OCR techniques exist. Examples of applications where the work presented above may be applied include:

* Label scanning.
* Converting books to an electronic format.
* Processing bank checks [11].
* Extracting text from a screenshot (image).
* License plate recognition.

V. FUTURE WORK

Future work that can be performed to further improve the optical character recognition program includes:

* Recognizing images of text typed with the Verdana font type, but with different font sizes. This can be achieved by scaling the read image file to a standard font size that the neural network has been trained to recognize.
* Recognizing frequently used symbols (e.g. '.', ',', '!', '?', etc.). Locating the positions of these small symbols requires matching operations (Fourier Transform).
* Enabling the program to handle slightly stretched or skewed images.
* Training the SOM neural network using better feature data (less affected by rotational and skew changes).
* Implementing a more reliable noise-removal algorithm, such that noisy pixels which appear on the border of characters are properly removed.

VI. CONCLUSION

An optical character recognition program has been implemented using Matlab. The character recognition process is performed using 49 features as inputs to a self-organizing map neural network. The neural network is capable of distinguishing 62 characters of the English language (both uppercase and lowercase letters, together with numerals 0 - 9). In addition, the program is also capable of recognizing characters in images that are noisy, colored or rotated. Recognition accuracy is consistently 100% for ideal images, but ranges between 81% and 100% for non-ideal images.

VII. REFERENCES

[1] S. Rost (1998), "Character Recognition: Image and Video Processing Project One by Stanislav Rost", Available: http://web.mit.edu/stanrost/www/cs585pl/pl.html (Accessed: 2005, September 2).
[2] V. Kwatra, "Optical Character Recognition", Available: http://www.cc.gatech.edu/~kwatra/computer_vision/ocr/OCR.html (Accessed: 2005, September 2).
[3] V. Rao, "Greek Letter Recognition Project", Available: http://umsis.miami.edu/~vrao/project/reportl.html (Accessed: 2005, September 2).
[4] R. Vogt, M. Janeczko, J. LoPorto, and J. Trenkle, "Neural Network Recognition of Machine-Printed Characters", Proceedings of the Fifth U.S.P.S. Advanced Technology Conference, Vol. 2, pp. 715-725, 1992.
[5] P. Iyer, A. Singh, and S. Sanyal, "Optical Character Recognition for Noisy Images in Devanagari Script", UDL Workshop on Optical Character Recognition with Workflow and Document Summarization, 2005.
[6] J. Park and V. Govindaraju, "Active Character Recognition Using A*-like Algorithm", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2000.
[7] K. Kedem and I. Bar-Yosef, "Character Recognition of Ancient Hebrew Manuscripts", Available: http://www.cs.bgu.ac.il/~klara/Project.htm (Accessed: 2005, September 5).
[8] S. Liao, A. Chiang, Q. Lu, and M. Pawlak, "Chinese Character Recognition via Gegenbaur Moments", 16th International Conference on Pattern Recognition, IEEE, pp. 485-488, 2002.
[9] R. Krasteva, A. Boneva, D. Butchvarov, and V. Geortchev, "Basic Components in Optical Character Recognition Systems. Experimental Analysis on Old Bulgarian Character Recognition", Academic Open Internet Journal, Volume 5, http://www.acadjournal.com/, 2001.
[10] A. Cherkasov (2003), "Creating Optical Character Recognition (OCR) applications using Neural Networks", Available: http://www.codeproject.com/dotnet/simple-ocr.asp (Accessed: 2006, June 11).
[11] T. Natschlaeger, "Optical Character Recognition", Available: http://www.igi.tugraz.at/lehre/CI (Accessed: 2005, September 2).
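The training setup described in Section IV.C (a fixed learning rate of 0.01, winner-take-all updates on ideal feature vectors) corresponds to a standard SOM update rule, which can be sketched in Python. The paper's network is implemented in Matlab and maps 49 features to 62 character classes; the unit count, neighborhood schedule, and toy data below are illustrative assumptions, not the paper's configuration.

```python
import random

def train_som(data, n_units, n_features, epochs, lr=0.01, seed=0):
    """Minimal 1-D self-organizing map: winner-take-all with a shrinking
    neighborhood. lr=0.01 follows the paper; everything else is a sketch."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(n_features)] for _ in range(n_units)]
    for epoch in range(epochs):
        # Neighborhood radius shrinks over training (an assumed schedule).
        radius = max(1, int(n_units / 2 * (1 - epoch / epochs)))
        for x in data:
            # Best-matching unit: smallest squared Euclidean distance to x.
            bmu = min(range(n_units),
                      key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
            # Move the BMU and its neighbors toward the input vector.
            for i in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
                for j in range(n_features):
                    weights[i][j] += lr * (x[j] - weights[i][j])
    return weights

def classify(weights, x):
    """Index of the best-matching unit for feature vector x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

# Toy example: two well-separated 'characters' in a 3-feature space.
data = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
som = train_som(data, n_units=4, n_features=3, epochs=200)
print(classify(som, [0.1, 0.0, 0.1]), classify(som, [0.9, 1.0, 0.9]))
```

The small, fixed learning rate means each presentation nudges the winning unit only slightly toward the input, which matches the paper's observation that learning is slower but more accurate, and that adding characters mainly requires sufficiently discriminative features rather than more epochs.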
VIII. APPENDICES
Appendix A. Results of Correcting and Extracting Images Oriented at Different Angles

Test text: HELLO MY NAME IS CHARLES. Recognition score: 19/20 (95.0%).
Appendix C. Images with Noise ('Salt and pepper' noise with a density of 0.02)

Original:   MONASH UNIVERSITY MSIA 2005
Recognized: MONASH UNZVERSIYY MS1A 2005 (21/24, 87.5%)

Original:   This text has been rotated 2 degrees CCW
Recognized: This tact hos beon rotated Z degraes CCW (27/33, 81.8%)