Abstract: Recently, the world has seen a giant leap in the field of mobile computing. Many
applications that were conventionally based on desktop computers, workstations and specialized
hardware are now being incorporated for mobile devices as well. Optical Character Recognition
being one of them, this paper explores various challenges faced to develop such a technology on
a mobile platform in contrast with conventional platforms such as an optical scanner. The paper
focuses on pre-processing steps required to correct the acquired images through cameras and
mobile devices in order to use them with traditional feature extraction and recognition software.
The paper also explores some solutions and workarounds as adopted by OCR engines including
Tesseract, OCR Droid and ABBYY. The paper lists numerous valuable references in the field and
may serve as a repository for the same.
Keywords: Optical Character Reader, Mobile, Camera, Pre-processing, Document Analysis,
OCR.
Atishay Jain
Akriti Dubey
Rachit Gupta
Nitin Jain
Pooja Tripathi
Associate Professor
ISSN 2319-9725
May, 2013 www.ijirs.com Vol 2 Issue 5
International Journal of Innovative Research and Studies Page 87
1. Introduction:
Optical Character Recognition may be defined as the electronic analysis of an image in order
to identify the areas containing textual information and extracting/recognizing the text from
the given image.
Conventionally, scanners have been used to acquire images of the documents. The document
images are scanned using flat-bed, sheet-fed or mounted imaging devices. Analysis of images
acquired by scanners and the problems associated with them are constrained by their inherent
properties. Scanners have been used to acquire images of printed or handwritten paper
documents such as legal documents, journal papers, cheques and forms, and for archiving books.
A substantial portion of these documents is text, though figures and pictures may also be
present. If the scanned images are of good resolution, processing and feature extraction follow with ease.
A resolution of 300dpi or more is generally adequate for OCR. Optical Character Recognition
in scanned documents and their related problems have been widely addressed in literature.
Recently, with the increasing availability of digital cameras and mobile devices at affordable
prices, the community is showing an increased interest in Camera based OCR. These devices
can also be used in areas where the use of scanners is not possible, e.g. recognising textual
information in a scene, reading of cargo container codes with uneven surfaces[1]. These
devices along with their totally different way of image acquisition pose a new set of
challenges. The lighting conditions may not be proper, the imaged surface may be textured or
non-planar, and the document plane and image plane may not be parallel, leading to
perspective distortion. We have frequently used business cards in this paper to demonstrate
the challenges, because of their familiarity. However, it must be noted that a document may
refer to any scene from which text can be extracted, including licence plates of cars[2], codes
written on shipping containers[1], labels on cylindrical bottles, etc. The challenges are
discussed in detail in section 3.
2. Steps Involved in OCR:
The process of OCR generally involves the following 3 steps:
Figure 1: Flowchart showing steps involved in OCR
2.1. Pre-Processing:
The raw image is generally not suitable for the recognition task and needs to be enhanced:
the text should contrast with the background (black on white), should be projected to a
uniform size, such as 12 pt at 300 dpi, and characters should be sharpened wherever possible.
The image is subjected to some preliminary processing steps to make it usable for further steps
in processing and character recognition. Pre-processing aims to produce data that are easy for
the OCR systems to operate upon accurately. It mainly involves one or more of the following
steps:
2.1.1. Noise Reduction:
Compared to optical scanners, digital cameras are inherently more prone to noise due to their
mode of operation. CCD/CMOS sensors suffer from dark-current and read-out noise. In low-
light conditions, or when the sensor receives less light due to a smaller aperture, more
noise is observed. High shutter speeds and high ISO settings also contribute to increased noise. Blurring
may be introduced due to motion.
Contrast stretching, as introduced by Kuo and Ranganath[3], may be used for enhancing color
and greyscale images. More advanced and sophisticated methods are explained by Taylor and
Dance [4] and Chen et al. [5] in their respective publications.
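As a concrete illustration, a generic percentile-based linear contrast stretch can be sketched as follows. This is a minimal sketch, not the specific method of Kuo and Ranganath[3]; the percentile limits are illustrative assumptions:

```python
import numpy as np

def contrast_stretch(img, low_pct=2, high_pct=98):
    """Linearly map the [low_pct, high_pct] percentile range of
    intensities onto the full [0, 255] range, clipping outliers."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = (img.astype(float) - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# A low-contrast ramp occupying only the 50..150 intensity range.
faded = np.arange(50, 151, dtype=np.uint8).reshape(1, 101)
stretched = contrast_stretch(faded)
```

After stretching, the dark end of the ramp maps to 0 and the bright end to 255, so thin strokes become easier to separate from the background.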
If low-resolution cameras are used, character strokes may appear very thin in the image and
blend with the background, so a binarization technique performed on such an image
may not detect the stroke as part of the foreground. Speckle artefacts, on the other hand,
may be removed by a morphological opening operation. Noise reduction and image enhancement are therefore of prime
importance before binarization.
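Morphological opening (an erosion followed by a dilation with the same structuring element) can be sketched in pure NumPy; the 3x3 square structuring element and the toy image below are illustrative assumptions:

```python
import numpy as np

def erode(img, k=3):
    """Binary erosion with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(img, pad, constant_values=False)
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img, k=3):
    """Binary dilation with a k x k square structuring element."""
    pad = k // 2
    p = np.pad(img, pad, constant_values=False)
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def opening(img, k=3):
    """Opening removes foreground components smaller than the element."""
    return dilate(erode(img, k), k)

# Toy binary image: a 4x4 "stroke" plus two isolated speckle pixels.
img = np.zeros((8, 8), dtype=bool)
img[2:6, 2:6] = True
img[0, 7] = True
img[6, 0] = True
clean = opening(img)
```

The single-pixel speckles are eroded away and never restored, while the 4x4 stroke, being larger than the structuring element, survives intact.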
2.1.2. Skew and perspective correction:
Perspective distortion in imaging using cameras is inevitable and its correction is of prime
importance for OCR applications. Skew detection and correction is the first step in the
document analysis and understanding pipeline. Correcting these distortions is
very important, because they have a direct effect on the reliability and efficiency of the
segmentation and feature extraction stages. Noise and variation in document resolution
and type remain the two main challenges facing skew detection and correction
methods.
2.1.3. Binarization:
It is the process of converting the acquired image, which is typically either a color or a
grayscale image, to a bi-level image. Every pixel is categorized either as a foreground or a
background pixel[6][7][8][9][10][12]. Thresholding is the most popular binarization method,
and can be applied in the following ways.
Global thresholding: The entire document image is processed, and an estimate is calculated
based on the pixel counts such that pixels having higher and lower values than the estimate
are classified into different classes, viz. foreground and background. All pixels are
categorized using global characteristics of the image. Otsu's global thresholding
algorithm[8] is a well-known and frequently used example.
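A minimal sketch of Otsu's method, which exhaustively searches for the threshold maximizing the between-class variance; the synthetic two-level image is an illustrative assumption:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: pick the threshold maximizing the between-class
    variance w0*w1*(mu0 - mu1)^2 of background vs. foreground."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Synthetic bimodal image: dark "text" at 20, bright "paper" at 200.
img = np.concatenate([np.full(200, 20), np.full(200, 200)]).astype(np.uint8)
t = otsu_threshold(img)
binary = img >= t   # True = background (bright) pixels
```

On a clean bimodal histogram the chosen threshold falls between the two modes and separates text from paper exactly.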
Local adaptive thresholding[9][10]: Different threshold values are estimated for each pixel
of the image based on local characteristics.
Mehmet Sezgin et al.[6] categorize thresholding methods into six groups according to the
information they exploit.
2.2. Feature Extraction:
In feature extraction stage, each character is represented as a feature vector, which becomes
its identity. During the training stage, feature vectors of characters are prepared as
templates, which later serve to match the features of acquired symbol images against the
respective templates. Feature extraction methods analyse the input document image and
select a set of features that uniquely identifies and classifies the character. The objective of
feature extraction is to extract a set of features, which will maximize the recognition rate with
the least amount of elements. Due to the inherent nature of handwriting being highly variable
and imprecise, obtaining these features such that extraction will generate a similar feature set
for different instances of the same character is a difficult task. Sometimes multiple templates
of features of the same character are prepared, in order to account for the variability in
handwriting. Shunji Mori [27] explores in detail the various feature extraction methods along
with their mathematical treatment.
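As one concrete example of a simple feature, zoning divides the character bitmap into a grid of cells and uses the foreground-pixel density of each cell as the feature vector. The sketch below is illustrative and not one of the specific methods surveyed by Mori [27]:

```python
import numpy as np

def zoning_features(glyph, zones=4):
    """Split a binary glyph into zones x zones cells; the feature vector
    is the foreground-pixel density of each cell."""
    h, w = glyph.shape
    feats = []
    for i in range(zones):
        for j in range(zones):
            cell = glyph[i * h // zones:(i + 1) * h // zones,
                         j * w // zones:(j + 1) * w // zones]
            feats.append(cell.mean())
    return np.array(feats)

# An 8x8 glyph resembling a vertical bar (e.g. the digit "1").
glyph = np.zeros((8, 8))
glyph[:, 3:5] = 1
f = zoning_features(glyph)
```

Because each cell reports only a density, small positional jitter between two instances of the same character changes the vector only slightly, which is exactly the tolerance handwriting variability demands.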
2.3. Recognition (Matching and Discrimination):
This phase includes the algorithm for recognising characters; the image of every character
must be converted to the appropriate character code. The features of the acquired character
are matched against the features of the template characters and discriminated into a
particular character code. The recognition algorithm may also produce a variety of outputs
for a given input character image and provide a probabilistic analysis of the same. For
instance, recognition of an image of the character "l" can produce the codes "l", "1", "/"
and "\" and list the probability of occurrence of each.
Shunji Mori [27] lists and explains in detail many methods of character recognition at the basic
level, and the book can be used as a valuable reference.
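A minimal sketch of template matching with a probabilistic ranking, in the spirit of the "l"/"1" example above; the Euclidean distance measure, the softmax scoring, and the 2-D feature templates are illustrative assumptions rather than any engine's actual method:

```python
import numpy as np

def classify(feature, templates):
    """Rank template labels by a softmax over negative Euclidean
    distances, giving a probability-like score for each candidate."""
    labels = list(templates)
    d = np.array([np.linalg.norm(feature - templates[k]) for k in labels])
    scores = np.exp(-d)
    scores /= scores.sum()
    order = np.argsort(-scores)
    return [(labels[i], float(scores[i])) for i in order]

# Hypothetical 2-D feature templates for three easily confused symbols.
templates = {"l": np.array([1.0, 0.0]),
             "1": np.array([0.9, 0.1]),
             "/": np.array([0.0, 1.0])}
ranked = classify(np.array([0.98, 0.02]), templates)
```

The full ranked list, rather than only the top candidate, is what allows a later dictionary or language-model stage to resolve ambiguous shapes.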
3. Challenges to OCR of Camera Acquired Images:
OCR techniques assume high-resolution, high-quality images with a simple structure (high-
contrast text and background) for good-quality recognition. Scanners perform very well
for the above-mentioned tasks. Unfortunately, due to the typical nature of cameras and the
imaging environment, it is not always possible to assure the above. Many discrepancies may arise,
which are explained as follows:
3.1. Uneven Lighting Conditions:
A camera has far less control over the lighting of an image than a scanner, where lighting is
uniform; hence discrepancies such as shadows and reflections can creep into the images.
Further problems arise when on-camera flash is used. As Fisher
[11] found, if on-camera flash is used, the centre of the view is the brightest, and the lighting
decays outward. Use of flash may also result in the sensor being over-exposed or cause
flooding of light.
Due to the immense variation in light that may be present in a camera-acquired image, global
thresholding methods for binarization are not ideal (refer Fig. 2). A better alternative is to use
adaptive thresholding[7][9][10][12]. For example, fig. 3 uses adaptive thresholding.
Figure 2: a) Business card showing uneven lighting conditions, b) A result of Otsu's global
thresholding algorithm [8] clearly showing missing text in light areas, particularly at the
bottom-centre portion, c) A result of manually thresholding the image in an attempt to
recover the missing text. Source: Own work.
Trier and Taxt compared 11 locally adaptive thresholding techniques and concluded that
Niblack's method is the most effective for general images[12]. The OCR Droid engine[13] uses
the Sauvola algorithm for thresholding[10]. Both Niblack's and Sauvola's algorithms compute
thresholds individually for each pixel using information from the local neighbourhood of that
pixel. Both perform well and give excellent results even for documents severely degraded
by uneven lighting conditions. J. He et al.[14] compared both of the above
algorithms for thresholding historical documents against the ABBYY OCR engine[30] and
concluded that their proposed improvements to Niblack's and Sauvola's algorithms
performed slightly better than the originals and the commercial ABBYY OCR engine.
Thresholding using both the algorithms on above business card is shown in fig 3.
Figure 3: a) A result of Niblack's local adaptive thresholding algorithm[9] on Fig 2.a) with
neighbourhood size 140x140 pixels. The missing text has appeared, but a lot of artefacts have
emerged in the background. b) Artefacts removed using a dilation process with a circular
structuring element of radius 3 pixels. c) A result of Sauvola's local adaptive thresholding
algorithm[10] on Fig 2.a) with neighbourhood size 140x140 pixels. Source: Own work.
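A straightforward sketch of Sauvola's per-pixel threshold T = m·(1 + k·(s/R − 1)), where m and s are the local mean and standard deviation; the parameter values k = 0.5 and R = 128 are common defaults, used here as assumptions, and the synthetic test image is illustrative:

```python
import numpy as np

def sauvola_mask(img, w=15, k=0.5, R=128.0):
    """Per-pixel Sauvola threshold T = m*(1 + k*(s/R - 1)), with m, s the
    mean and standard deviation of the w x w neighbourhood.  A real
    implementation would use integral images; the loop is for clarity."""
    h, wd = img.shape
    pad = w // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros((h, wd), dtype=bool)
    for y in range(h):
        for x in range(wd):
            win = p[y:y + w, x:x + w]
            m, s = win.mean(), win.std()
            out[y, x] = img[y, x] > m * (1 + k * (s / R - 1))
    return out  # True = background (bright) pixels

# Bright "paper" at 200 with a small dark "text" patch at 50.
img = np.full((20, 20), 200, dtype=np.uint8)
img[8:12, 8:12] = 50
mask = sauvola_mask(img)
```

Because the threshold adapts to the local mean, a shadowed region of the card simply lowers its own threshold instead of swallowing the text, which is why this family of methods handles uneven lighting well.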
3.2. Skewness (Rotation):
When OCR input is taken from a hand-held camera or other imaging device whose
perspective is not fixed like a scanner's, text lines may be skewed from their original
orientation. Feeding such a rotated image to the OCR engine produces extremely poor
results. Many methods to deskew the document image have appeared in the literature,
viz. the Hough transform, projection profiles, Fourier transform methods, the RAST
algorithm, etc.
In the Hough transform, each point (x, y) in the Cartesian coordinate system is mapped to a
sinusoidal curve in the (ρ, θ) parameter space:

ρ = x cos θ + y sin θ
The skew angle is calculated on the basis that the density of points in the transform space
is maximum at the skew angle. Y. Lee [28] describes a method to detect and correct the skew
angle in a printed business form in his patent.
A horizontal projection profile is a histogram of the number of dark pixels along the
horizontal scan lines of a text image; when the text lines are horizontal, the profile shows
sharp, well-separated peaks, so the skew angle can be estimated as the rotation that
maximizes this sharpness.
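The idea above can be sketched as follows: for each candidate angle, the columns are sheared vertically and the variance of the resulting profile is measured, the angle giving the spikiest profile being taken as the skew. The shear approximation (valid for small angles) and the candidate range are illustrative assumptions:

```python
import numpy as np

def profile_variance(img, angle_deg):
    """Shear each column vertically by x*tan(angle) and return the
    variance of the horizontal projection profile; correctly aligned
    text lines give a spiky profile with high variance."""
    h, w = img.shape
    shift = np.round(np.tan(np.radians(angle_deg)) * np.arange(w)).astype(int)
    shift -= shift.min()
    prof = np.zeros(h + shift.max() + 1)
    for x in range(w):
        prof[shift[x]:shift[x] + h] += img[:, x]
    return prof.var()

def estimate_skew(img, angles):
    """Pick the candidate angle whose profile is sharpest."""
    return max(angles, key=lambda a: profile_variance(img, a))

# Synthetic page with three horizontal "text lines".
page = np.zeros((40, 40))
page[10, :] = page[20, :] = page[30, :] = 1
angle = estimate_skew(page, range(-5, 6))
```

Any nonzero shear smears each text line across several profile bins, lowering the variance, so the search converges on the true orientation.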
OCR Droid uses the RAST algorithm, a very robust skew detection and correction algorithm
based on a branch-and-bound text-line finding method, for skew detection and
auto-rotation[15][16].
J.H. Park et al [17] propose a method of skew correction for business cards on PDA, wherein
local adaptive thresholding is applied to the images first. They introduce a process called
stripe generation, wherein adjacent characters and strings are merged and clusters are formed.
Only those clusters having size and eccentricity larger than a certain value are used as stripes.
The skew angle is calculated by first computing the directional angles of the stripes using
central moments and then averaging them.
3.3. Tilting (Perspective Distortion):
As opposed to scanners, where the plane of the document is always parallel to the plane of
the sensor, in document image acquisition with a mobile camera the sensor plane may not
always be parallel to the document plane. As a result, tilted images are captured, wherein
text closer to the camera appears bigger and text farther from the camera appears smaller.
A perspective distortion is observed, which lowers the recognition rate and accuracy if the
recognizer is not tolerant to perspective distortion[18].
OCR Droid takes advantage of the orientation sensors embedded in mobile devices. It detects
the tilt of the phone and prevents users from capturing images when distortion would occur.
It also allows users to align the plane of the camera with that of the document. This is,
however, a preventive technique rather than a corrective one.
Jagannathan and Jawahar demonstrate in their paper[19] the intelligent use of commonly
available clues for perspective rectification. One of the algorithms they propose takes into
account the rectangular border of the document. If the image is distorted, the rectangle
appears as a general quadrilateral. If the positions of its vertices are calculated, the
perspective can be corrected using an equation like:
x' = (h11·x + h12·y + h13) / (h31·x + h32·y + h33)
y' = (h21·x + h22·y + h23) / (h31·x + h32·y + h33)

Where: (x, y) are coordinates of pixels in the acquired image
(x', y') are coordinates of the corresponding pixels in the approximation of the original document
And [