
Optical Character Recognition Program for Images of

Printed Text using a Neural Network


Velappa Ganapathy
School of Engineering, Monash University Malaysia
2 Jln Kolej, Bandar Sunway, 46150 Selangor, Malaysia
velappa.ganapathy@eng.monash.edu.my

Charles C. H. Lean
School of Engineering, Monash University Malaysia
2 Jln Kolej, Bandar Sunway, 46150 Selangor, Malaysia
charleslean@gmail.com

Abstract - In this paper we present a simple method using a self-organizing map neural network (SOM NN) which can be used for character recognition tasks. It describes the results of training a SOM NN to perform optical character recognition on images of printed characters. 49 features have been used to distinguish between 62 characters (both uppercase and lowercase letters of the English language and numerals). The implemented program recognizes text by analyzing an image file. The text to be recognized is currently limited to characters typed using the Verdana font type, bolded with a font size of 18. The program is capable of handling non-ideal images (noisy, colored text, rotated image). Recognition accuracy is consistently 100% for ideal images, but ranges between 80% - 100% for non-ideal images.

Index Terms - Optical character recognition, feature extraction, artificial neural network, image processing, recognition accuracy.

I. INTRODUCTION

Optical character recognition (OCR) programs analyze images of text. Through the use of image processing functions, individual characters are located and recognized.

OCR programs that implement the use of neural networks have been presented by Rost [1] and Kwatra [2]. Both these networks have been similarly trained to specifically recognize vowels of the English language.

Rost's network was trained to recognize only four vowels of the English alphabet (excluding the letter 'I'). Features such as Hu-moments, normalized central moments, and gravity centers were computed for each vowel.

In comparison, Kwatra's network could recognize all five vowels. Kwatra applied similar feature extraction techniques and included additional features such as the use of a shape mask, Euler number, area and orientation.

Rao's [3] program could identify thirteen Greek letters without the implementation of neural networks. His work proves that it is possible to apply similar feature extraction techniques to images of different written languages for recognition purposes.

Vogt et al. [4] discussed their work on recognizing machine-printed characters. They reported on using 245 features to train a three-layer back-propagation neural network to perform the recognition tasks. Scarce mention was made of the features used. Their neural network could recognize the 26 letters of the English alphabet together with 10 numerals.

Iyer et al. [5] discussed the importance of OCR for digitizing Indian literature, together with their methods. A three-layer back-propagation neural network with 23 inputs and 31 outputs was trained. They claimed a recognition rate of 76% in recognizing Devanagari characters.

Park et al. [6] compared the use of neural networks together with two other classification methods. Their network was trained to recognize 26 handwritten characters using the "A*-like" algorithm. The back-propagation network trained was reported to consist of three layers with 680 inputs and 26 outputs.

Other research conducted in the field of OCR includes the recognition of ancient Hebrew manuscripts [7] and Chinese characters [8]. Krasteva et al. [9] presented their findings on recognizing Old Bulgarian characters.

Almost all of the literature reviewed reported similar steps implemented to train neural networks. These steps fall under the major headings of image pre-processing, segmentation, feature extraction and classification. They constitute the main steps in training each author's neural network.

It was found that multi-layered back-propagation neural networks were the most popular choice of neural network. Different authors reported on the use of different transfer functions within the network that improved the recognition accuracy.

However, little mention was made of the reasons for the choice of neural network, nor of the values chosen for the parameters used. Authors also frequently failed to list or discuss the features extracted to train the artificial neural network.

This paper presents the steps used to train a self-organizing map neural network to perform character recognition tasks, instead of the more popular back-propagation neural network. A reduced number of features have been used to train the network, which is capable of recognizing up to 62 different characters.

II. THE CHARACTER RECOGNITION PROCESS

A. Objectives and Assumptions

The aim is to formulate a Matlab-based program that implements the use of a trained neural network to recognize

1-4244-0726-5/06/$20.00 ©2006 IEEE


characters from an image file. The recognized characters are printed to screen or saved to a text file. Assumptions made to implement optical character recognition for the scope of this work are:
* The characters to be recognized are of the Verdana font type, bolded, with a font size of 18.
* The text is a darker color than the background.
* The characters to be recognized are uppercase letters, lowercase letters, and numerals of the English language.
* The image file is stored in *.PNG (Portable Network Graphics), a lossless compression format.
* The image file consists of text only (no graphics).

B. Feature Extraction

Fig. 1 Sample images used to train the neural network (the uppercase letters A-Z, the lowercase letters a-z, and the numerals 0-9).

Fig. 3 Separate objects are cropped from the original image.

TABLE I
FEATURES EXTRACTED TO TRAIN THE NEURAL NETWORK
Area; Bounding box height; Bounding box width; Centroid (2); Compression; Eccentricity; Equivalent diameter; Euler number; Extent (2); Extrema points (6); Filled area; Horizontal intersects (3); Hu-moments (7); Major axis length; Minor axis length; Orientation; Perimeter; Vertical intersects (3); Second moments of area (2); Centroid of objects in half the image (4); Extent of objects in half the image (2); Number of objects in half the image (2); Orientation of objects in half the image (2); Sum of object areas in half the image (2).
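Many of the measurements in Table I are standard region properties. As a rough sketch of how a few of them (area, bounding-box size, centroid, extent) can be computed from a cropped binary character mask, here is an illustrative Python/NumPy fragment; the function and the toy 'L' mask are hypothetical stand-ins, not the authors' Matlab code:

```python
import numpy as np

def character_features(mask):
    """Compute a few of the Table I region features for one binary
    character mask (True = character pixel). Illustrative only."""
    ys, xs = np.nonzero(mask)
    area = len(ys)                       # number of character pixels
    height = ys.max() - ys.min() + 1     # bounding box height
    width = xs.max() - xs.min() + 1      # bounding box width
    centroid = (ys.mean(), xs.mean())    # (row, col) centre of mass
    extent = area / (height * width)     # fill ratio of the bounding box
    return {"area": area, "bbox_height": height, "bbox_width": width,
            "centroid": centroid, "extent": extent}

# A crude 5x4 'L' shape as a stand-in for a cropped character.
L_shape = np.array([[1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [1, 1, 1, 1]], dtype=bool)
feats = character_features(L_shape)
print(feats["area"], feats["bbox_height"], feats["bbox_width"])  # 8 5 4
```

The remaining Table I features (Hu-moments, Euler number, intersect counts, and so on) would be computed from the same mask in the same per-object fashion.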

Fig. 2 Process flowchart for extracting object features: Read image file (RGB) → Threshold to obtain binary image → Label all objects in the image → Remove small particles (e.g. the dot in 'i' and 'j') → Re-label all objects in the image → Crop individual characters and obtain image features → END.

A neural network was trained to recognize 62 different characters of the English language: 26 uppercase letters, 26 lowercase letters, and 10 numerals. Fig. 1 displays the set of training images used. Fig. 2 describes the process involved in extracting the features relevant to each training character.

The image file was thresholded to obtain a binary image. Each distinct character was represented as an object (particle). Small particles such as the dot in 'i' and 'j' were removed. This was to avoid individual characters being mislabeled as two separate objects.

Individual objects (characters) were cropped and analyzed separately (Fig. 3). For each character, 49 features were extracted (Table I). The entire feature dataset for all 62 characters was normalized to produce a training dataset with values ranging between 0 and 100.

C. Training the Artificial Neural Network

Artificial neural networks (NNs) are collections of mathematical models that draw on the analogies of adaptive biological learning. The key element of a neural network is the topology. It consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links).

Learning typically occurs by example through training, or exposure to a set of input-output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems [10].

A competitive learning self-organizing map (SOM) neural network was trained using the training dataset. The inputs to the network consist of 49 features that allow the network to distinguish between 62 different characters.

The SOM network has 62 output neurons, corresponding to the number of characters it has been trained to recognize. SOM networks can learn to detect regularities and constraints in their inputs, and adapt their future responses to those inputs accordingly.

The competitive transfer function returns neuron outputs of 0 for all neurons except the winner (output of 1), which acts as a 'flag'. The winner is the neuron associated with the most positive element of the network input. The weights of the winning neuron are adjusted with the Kohonen learning rule. The Kohonen rule allows the weights of a neuron to learn an input vector, and is very useful in recognition algorithms.

It is for these reasons that this type of neural network was chosen over the more popular feed-forward, back-propagation neural network. In addition, determining the ideal parameter values for back-propagation networks (e.g. number of layers, number of neurons in each layer, transfer functions used, goal, and number of epochs) is a tedious task. A SOM network simplifies this matter, as its only parameters are the learning rate and the number of training epochs.
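The winner-take-all step and the Kohonen weight update described here can be sketched as follows. This is a generic Python illustration, not the authors' Matlab implementation: it picks the winner by Euclidean distance (where the paper's competitive layer uses the most positive net input), while the 49-input/62-neuron sizes and the 0.01 learning rate come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_neurons = 49, 62       # network sizes from the paper
weights = rng.random((n_neurons, n_features))
lr = 0.01                            # learning rate used in the paper

def train_step(weights, x, lr):
    """One competitive-learning step: find the best-matching neuron
    and move its weight vector toward the input (Kohonen rule)."""
    dists = np.linalg.norm(weights - x, axis=1)    # distance to every neuron
    winner = int(np.argmin(dists))                 # best-matching unit
    weights[winner] += lr * (x - weights[winner])  # pull winner toward x
    return winner

x = rng.random(n_features)            # one normalized feature vector
before = np.linalg.norm(weights - x, axis=1).min()
w = train_step(weights, x, lr)
after = np.linalg.norm(weights[w] - x)
print(w, after < before)              # winner index; its distance shrinks
```

Repeating this step over the 62 training vectors for a number of epochs drives each output neuron toward one character's feature vector, which is what makes the winning neuron usable as a class 'flag' at recognition time.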

D. Non-ideal Images

A number of processes have been introduced to overcome imperfect image conditions. Examples include:
1) Images with noise.
2) Images with colored text.
3) Images with rotated text.

1) Images with Noise

The noise in an image constitutes objects with small area values (relative to the area of proper characters). To remove small amounts of noise, each object's area is checked. If the area value is below a threshold, the particle is assumed to be noise and is removed from the binary image.

This threshold value is obtained from analysis of the training dataset. The minimum character area value is first identified, after which the threshold value is chosen to be less than that value. Thus, any object with an area less than the threshold can be assumed to be noise.

Fig. 4 shows an example of the noise removal process. An assumption is that the noise appears in a colored format. To convert the colored image to binary, thresholding is used. The noise appears as white spots against a dark background, or dark spots against a white background. Fig. 4 also shows the result after noise particles have been identified and removed.

Fig. 4 (Top) Image with noise; (Middle) Result after thresholding; (Bottom) Result after the noise removal process.

2) Images with Colored Text

Converting an image with colored text to a binary image involves multiple thresholding processes. The image is first separated into the three primary color planes (Red, Green, and Blue). Each plane is thresholded separately, after which the planes are combined using binary addition (Fig. 5).

Fig. 5 (Top) Image with colored text; (Middle) Results of thresholding the R-, G- and B- color planes separately; (Bottom) Result of binary addition.

3) Images with Rotated Text

To recognize images with rotated text, the line of text in the image is iteratively adjusted back to the horizontal. This method involves first dividing the objects in the image into two groups. In each iteration, the distance from the image's top border to the top of each group's bounding box is measured. The distances are denoted as d1 and d2 respectively (Fig. 6).

Fig. 6 Objects in the rotated image are divided into two groups.

The sign and magnitude of the difference between these distance values are a measure of the direction and angle of rotation. If (d1 - d2) is positive, this implies that the text is rotated in a counter-clockwise direction. Conversely, if the difference is negative, then the image is rotated in a clockwise direction.

If the magnitude of the difference is more than 0.5, the image is determined to be at an angle. The image is rotated 1° in the opposite direction to reduce the difference between d1 and d2. This process is repeated until one of two terminating conditions is met:
1. The magnitude of the difference is less than 0.5, or;
2. Ten rotations have been performed.

The image is rotated in increments of 1°, as this value is the minimum angle required for there to be a discernible change in pixel values. These pixel values are determined using the bicubic interpolation method.

The second terminating condition accounts for the case where the image is at an angle of 0.5°. Subsequent efforts to rotate it back to the horizontal would cause the program to enter an endless loop, with the image being rotated in alternate directions on subsequent iterations.

Fig. 7 shows the result of rotating an image that was initially at an angle of 5° in the counter-clockwise direction (Fig. 6). Appendix A shows more results of rotating images with different initial angles.

Fig. 7 The resultant image after rotating the image back to the horizontal.

E. Implementation

Fig. 8 Graphical user interface of the OCR program.
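The iterative de-rotation procedure of Section D.3 can be summarised in Python as follows. The `top_distances` and `rotate` helpers are hypothetical stand-ins for measuring d1 and d2 and for bicubic rotation; the 0.5 threshold, the 1° step and the ten-rotation cap come from the text.

```python
def straighten(image, top_distances, rotate):
    """Iteratively rotate `image` back to the horizontal.

    top_distances(image) -> (d1, d2): distance from the image's top
    border to the top of each object group's bounding box
    (hypothetical helper).
    rotate(image, degrees): rotate by the given angle, positive being
    counter-clockwise (e.g. via bicubic interpolation).
    """
    for _ in range(10):                 # at most ten 1-degree corrections
        d1, d2 = top_distances(image)
        diff = d1 - d2
        if abs(diff) < 0.5:             # close enough to horizontal
            break
        # positive diff => text tilted counter-clockwise => rotate clockwise
        image = rotate(image, -1 if diff > 0 else 1)
    return image

# Toy model: the 'image' is just its current angle in degrees, and the
# two group distances tilt linearly with that angle.
fake_distances = lambda angle: (10 + angle, 10 - angle)
fake_rotate = lambda angle, step: angle + step
print(straighten(5, fake_distances, fake_rotate))  # 5 deg CCW -> 0
```

Note that with a starting tilt near 0.5° the toy loop oscillates between two angles on alternate iterations, which is exactly the endless-loop case the ten-rotation cap guards against.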

Fig. 9 The recognized characters ('MONASH UNIVERSITY MALAYSIA') are displayed to the user.

The above-mentioned processes were implemented into a full-fledged program written in Matlab (Fig. 8). Users are allowed to load an image file and set a number of parameters if the image is non-ideal (Section D).

The program is capable of recognizing an image with multiple lines of text. This is achieved by automatically determining the number of distinct lines of text in the image. Each line is cropped and analyzed separately.

The recognition process is based on the pre-trained SOM neural network. Characters that are identified and recognized are presented in a text box (Fig. 9). The user has the option of saving the characters to a text file.

III. RESULTS

The program and its algorithms have been tested with a variety of images (e.g. ideal, noisy, colored, or rotated). Due to the assumptions previously mentioned, testing images were obtained via screenshots of a text editor program. The images and recognition results are presented in the Appendices section.

Appendix B presents the results of testing the SOM NN with ideal images. Images are a mixture of uppercase and lowercase letters, together with numerals. Recognition accuracy with this set of images is consistently 100%.

Fig. 10 (Left) Three noisy images and (Right) the cleaned-up images. Notice that the noise removal process does not remove noise at object borders.

Appendix C presents the results of recognizing text in images that have had some noise introduced ('salt & pepper' with a noise density of 0.02). Recognition accuracy is in the range of 90% to 100%. This drop can be attributed to the noise removal process. Particles at an object's boundary are not removed, as the program cannot determine whether the particle pixels are part of the object or are noise (Fig. 10).

Appendix D presents the results of recognizing colored text in an image. Recognition accuracy ranges from 95% to 100%. The issue lies in selecting a proper threshold value to convert the colored image to a binary image. A value that properly thresholds an image with light colors does not produce the same desirable results for an image with dark colors.

Appendix E presents the results of recognizing text in a rotated image. Recognition accuracy for this set of images is between 81% and 92%. Rotating an image causes some data corruption on a pixel scale due to interpolation (Fig. 11). While visually insignificant, these changes are evidently sufficient to disrupt the recognition process.

Fig. 11 Discrepancies in the image are clearly visible as a result of the rotation processes.

IV. DISCUSSION

A. Choice of Font Properties Used

The original attempt was to recognize text using the Courier New font type. This font was chosen because its characters are of near-equal width and it is commonly used (the default font type for Windows Notepad). However, inspection of the dataset revealed the presence of 'protrusions' at the ends of some characters (Fig. 12). These protrusions occasionally caused adjacent characters to be connected, which in turn frequently prompted the program to wrongly label groups of characters as one object.

Fig. 12 Protrusions at the ends of some characters in Courier New font type.

Fig. 13 Characters using Verdana font type lack the protrusions.

Text editors use a font size of 12 by default. This size is large enough for the human eye to distinguish between characters. However, the characters are made up of too few pixels for significant feature data to be extracted.

It was found that characters using the Verdana font type, bolded and at a font size of 18, were sufficiently spaced apart. In addition, the larger font size produced characters that consisted of more pixels (Fig. 13). This increase in pixel area allowed better and more significant feature data to be obtained for training the neural network.

B. Features Used to Distinguish Characters

Initially, 15 features were sufficient to differentiate between 26 uppercase letters. Increasing the set of characters to be recognized to include numerals (a total of 36) required the use of 26 features.

The increase in feature data required to learn an additional 10 characters (26 to 36) is not an indication that the neural network has difficulties in adapting internal weights to learn to recognize more characters. Rather, based on observations, it is due to the fact that many characters share similar features (e.g. bounding box area, Euler number, orientation). There are few features which present totally different values for all 36 characters. Thus, additional feature data is used to support and reinforce other key features.

Additional feature data was generated by dividing a character into two segments (top-half and bottom-half). This was followed by calculating certain feature data by treating each segment as a separate object. Finally, it was found that the 49 features used (Table I) were the minimum number required to correctly train the SOM network to distinguish between 62 different characters of the English language.
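The half-image trick described above (splitting a character at its midpoint and measuring each half as a separate object) can be sketched as follows. This is an illustrative Python fragment with a toy 'T' mask, not the authors' Matlab code:

```python
import numpy as np

def half_image_features(mask):
    """Split a binary character mask at its vertical midpoint and
    measure each half separately, as in Section IV-B (sketch only)."""
    mid = mask.shape[0] // 2
    halves = (mask[:mid], mask[mid:])          # top half, bottom half
    feats = []
    for half in halves:
        area = int(half.sum())                  # pixels in this half
        h, w = half.shape
        extent = area / (h * w) if h * w else 0.0  # fill ratio
        feats.append({"area": area, "extent": extent})
    return feats

# 'T' stand-in: heavy top bar, thin stem below.
T_shape = np.array([[1, 1, 1, 1, 1],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0]], dtype=bool)
top, bottom = half_image_features(T_shape)
print(top["area"], bottom["area"])  # the halves differ: 6 vs 2
```

Whole-character measurements for 'T' and, say, an upside-down 'T' could coincide, but the per-half measurements separate them, which is why the split yields additional discriminating features.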
C. Self-Organizing Map Neural Networks

Neural networks are ideal for recognition tasks. Feature data for characters in non-ideal images differ slightly from those of ideal images (different pixel values). However, the SOM neural network is capable of correctly recognizing these characters. The results discussed above prove the robustness of using a network that has been trained with an ideal dataset to recognize characters under different conditions.

The learning rate for the neural network was kept at a value of 0.01. The network learned at a slower pace, but the learning was more accurate. Initially, it was noticed that the network had to run for 700 epochs before it successfully learned all 26 uppercase letters. It was thus expected that the network would require a higher number of epochs to learn the additional 10 numerals.

However, on training, the SOM neural network required no increase in the number of epochs to learn the fully combined set of uppercase letters, lowercase letters and numerals. This leads to the belief that teaching the network to recognize different characters is a matter of presenting sufficient significant feature data to the network, rather than training it for a long period (a large number of epochs).

The time required for the recognition process depends on both the number of characters and the number of distinct lines in the image. Recorded times range from less than a second for an image with only a few characters, to 12 seconds for an image with four lines and 98 characters.

D. Comparison

In the literature review, some works related to OCR were highlighted. Each of the works has focused on different aspects and applications of character recognition. As such, there are no similar techniques adopted for OCR comparison. Hence, a straightforward one-to-one comparison with other existing techniques is not possible at this stage.

E. Applications of the Work Presented

A wide variety of applications exist with respect to the use of OCR techniques. Examples of applications for the work presented above include:
* Label scanning.
* Converting books to an electronic format.
* Processing bank checks [11].
* Extracting text from a screenshot (image).
* License plate recognition.

V. FUTURE WORK

Future work that can be performed to further improve the optical character recognition program includes:
* Recognizing images of text typed with the Verdana font type, but with different font sizes. This can be achieved by scaling the read image file to a standard font size that the neural network has been trained to recognize.
* Recognizing frequently used symbols (e.g. '.', ',', '!', '?', etc.), and locating the positions of these small objects using matching operations.
* Enabling the program to handle slightly stretched or skewed images.
* Training the SOM neural network using better feature data (less affected by rotational and skew changes).
* Implementing a more reliable noise-removal algorithm, such that noisy pixels which appear on the border of characters are properly removed.

VI. CONCLUSION

An optical character recognition program has been implemented using Matlab. The character recognition process is performed using 49 features as inputs to a self-organizing map neural network. The neural network is capable of distinguishing 62 characters of the English language (both uppercase and lowercase letters, together with numerals 0 - 9). In addition, the program is also capable of recognizing characters in images that are noisy, colored or rotated. Recognition accuracy is consistently 100% for ideal images, but ranges between 81% and 100% for non-ideal images.

VII. REFERENCES

[1] S. Rost (1998), "Character Recognition: Image and Video Processing Project One by Stanislav Rost", Available: http://web.mit.edu/stanrost/www/cs585pl/pl.html (Accessed: 2005, September 2).
[2] V. Kwatra, "Optical Character Recognition", Available: http://www.cc.gatech.edu/~kwatra/computer_vision/ocr/OCR.html (Accessed: 2005, September 2).
[3] V. Rao, "Greek Letter Recognition Project", Available: http://umsis.miami.edu/~vrao/project/reportl.html (Accessed: 2005, September 2).
[4] R. Vogt, M. Janeczko, J. LoPorto, and J. Trenkle, "Neural Network Recognition of Machine-Printed Characters", Proceedings of the Fifth U.S.P.S. Advanced Technology Conference, Vol. 2, pp. 715-725, 1992.
[5] P. Iyer, A. Singh, and S. Sanyal, "Optical Character Recognition for Noisy Images in Devanagari Script", UDL Workshop on Optical Character Recognition with Workflow and Document Summarization, 2005.
[6] J. Park and V. Govindaraju, "Active Character Recognition Using A*-like Algorithm", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2000.
[7] K. Kedem and I. Bar-Yosef, "Character Recognition of Ancient Hebrew Manuscripts", Available: http://www.cs.bgu.ac.il/~klara/Project.htm (Accessed: 2005, September 5).
[8] S. Liao, A. Chiang, Q. Lu, and M. Pawlak, "Chinese Character Recognition via Gegenbauer Moments", 16th International Conference on Pattern Recognition, IEEE, pp. 485-488, 2002.
[9] R. Krasteva, A. Boneva, D. Butchvarov, and V. Geortchev, "Basic Components in Optical Character Recognition Systems: Experimental Analysis on Old Bulgarian Character Recognition", Academic Open Internet Journal, Volume 5, http://www.acadjournal.com/, 2001.
[10] A. Cherkasov (2003), "Creating Optical Character Recognition (OCR) applications using Neural Networks", Available: http://www.codeproject.com/dotnet/simple-ocr.asp (Accessed: 2006, June 11).
[11] T. Natschlaeger, "Optical Character Recognition", Available: http://www.igi.tugraz.at/lehre/CI (Accessed: 2005, September 2).

VIII. APPENDICES
Appendix A. Results of Correcting and Extracting Images Oriented at Different Angles
(Image text in each case: HELLO MY NAME IS CHARLES)

Angle | Recognized text           | Correct | Accuracy
1°    | HELLO MY NAME IS CHARL5S  | 19/20   | 95.0%
2°    | HELLO MY NAME I5 CHARLES  | 19/20   | 95.0%
3°    | HELLO MY NAME fS CgARLES  | 18/20   | 90.0%
4°    | HELLO MY NAM5 IS CijARLES | 18/20   | 90.0%
5°    |                           | 19/20   | 95.0%

Appendix B. Ideal Images

Image text | Recognized text | Correct | Accuracy
MONASH UNIVERSITY MSIA 2005 | MONASH UNIVERSITY MSIA 2005 | 24/24 | 100.0%
HELLO MY NAME IS CHARLES | HELLO MY NAME IS CHARLES | 20/20 | 100.0%
This text is in VERDANA font type / Font size 18 and BOLDED / There are 3 lines and 83 letters in this image | This text is in VERDANA font type / Font size 18 and BOLDED / There are 3 lines and 83 letters in this image | 83/83 | 100.0%
Hello Dr V Ganapathy / Do U know that there are 365 days in a year / It is 27 degrees Celcius / Today is the 5th of September 2005 | Hello Dr V Ganapathy / Do U know that there are 365 days in a year / It is 27 degrees Celcius / Today is the 5th of September 2005 | 98/98 | 100.0%

Appendix C. Images with Noise ('Salt and pepper' noise with a density of 0.02)

Image text | Recognized text | Correct | Accuracy
HELLO MY NAME IS CHARLES | HELLO MY NAME IS CHARLES | 20/20 | 100.0%
MONASH UNIVERSITY MALAYSIA | MONASH UNIVERSITY MALWYSfA | 22/24 | 91.7%
HOW MUCH WOOD WOULD A WOODCHUCK / CHUCK IF A WOODCHUCK COULD CHUCK WOOD | HOW MUCH WOOD WOULD A WOODCHUCK / CHUCK IF A WOODCHUCK COHLD CHUCK WOOD | |
1 This is a sample of what the / 2 OPTICAL CHARACTER RECOGNITION / 3 program can recognise | 1 This is a sampla of what tha / z OPTICAL CHARACYER RECOGNITION / 3 progrom san resognise | 64/71 | 90.1%
Appendix D. Images with Colored Text

Image text | Recognized text | Correct | Accuracy
ABCDEFGHIJ | ABCDEFGHIJ | 10/10 | 100.0%
MONASH UNIVERSITY MSIA 2005 | MONASH UNIVERSITY MSIA 3005 | 23/24 | 95.8%
Black Brown Olive / Red Blue Green | Black Brown Olive / Red Blue Green | 27/27 | 100.0%
3 blind Mice see HOW they run / lack a Jill went UP the Hill / Baa Baa Black Sheep / Twinkle twinkle I E star | 3 blind Mice see HOW they run / Jack and Jill went UP the Hill / Baa Baa Black Sheep / Twinkle twinkle LITTLE star | |
Appendix E. Images with Rotated Text

Image text | Recognized text | Correct | Accuracy
HELLO MY NAME IS CHARLES | HELLO MY NAM 15 CHARL5S | 17/20 | 85.0%
MONASH UNIVERSITY MSIA 2005 | MONASH UNZVERSIYY MS1A 2005 | 21/24 | 87.5%
ABCDEFGHIJKLMNOPQRSTUVWXYZ | ABCDEFGHfJKLMNOPQRSYUVWXYZ | 24/26 | 92.3%
This text has been rotated 2 degrees CCW | This tact hos beon rotated Z degraes CCW | 27/33 | 81.8%
This text has been rotated 7 degrees CW | This tovthasboon rotatod 7dogrees CW | 26/32 | 81.3%

