
International Journal of Computer Information Systems, Vol. 3, No. 2, 2012

Adaptive Optical Character Recognition System


Abhinandana G.(1), Gino Sophia(2)
(1) Post Graduate Student, Department of Computer Science and Engineering, Hindustan University, Chennai, India
abhiprasad.g@gmail.com
(2) Associate Professor, Department of Computer Science and Engineering, Hindustan University, Chennai, India
sgsophia@hindustanuniv.edu

Abstract - An optical character recognition (OCR) system is a software tool that converts a typed or handwritten document into a searchable file. A plain scan of a document merely captures an image of the original, which cannot be edited or searched in any way; an OCR system, by contrast, lets us feed a book or magazine article directly into an electronic computer file and then manipulate it. The idea of this project is to demonstrate an OCR that is capable of detecting handwritten text using contours. The system is built on extensive image-processing concepts that detect the characters and process them so that they can be edited.

Keywords - contours, template matching, reconstruction, region of interest, segmentation

I. INTRODUCTION

An OCR program attempts to read the graphic shape of each letter of a word in an image file and then compares it against stored letters in the various fonts it recognizes, in order to translate the text in the image file into editable text in a document. Most OCRs today are adaptive yet are designed to recognize standardized fonts; this project aims to demonstrate an adaptive OCR which can be trained as it functions. A majority of over-the-counter optical character recognition systems are capable of detecting only printed characters. Systems that work by observing the dynamics of handwriting as it is produced are termed online character recognition systems, or intelligent character recognition (ICR) systems. ICRs have fair accuracy but are resource-consuming and still evolving, and commercially available ICRs are very expensive.

The idea of this project is to demonstrate an OCR that is capable of detecting handwritten text. An OCR that works on feature extraction and matching techniques can segment the words and identify indentation and spaces. It can adjust itself to the input by implementing machine-learning algorithms and thus be trained as it functions. The proposed OCR shall be low on system resources and will be open source. It is an inexpensive system implemented entirely in software, using OpenCV.

II. OPENCV

OpenCV is an open-source computer vision library. The library is written in C and C++ and runs under Linux, Windows, and Mac OS X. There is active development on

interfaces for Python, Ruby, MATLAB, and other languages. OpenCV was designed for computational efficiency and with a strong focus on real-time applications; it is written in optimized C and can take advantage of multicore processors. Further automatic optimization on Intel architectures [Intel] can be obtained from Intel's Integrated Performance Primitives (IPP) libraries [IPP], which consist of low-level optimized routines in many different algorithmic areas; OpenCV automatically uses the appropriate IPP library at runtime if that library is installed. One of OpenCV's goals is to provide a simple-to-use computer vision infrastructure that helps people build fairly sophisticated vision applications quickly. The OpenCV library contains over 500 functions that span many areas of vision, including factory product inspection, medical imaging, security, user interfaces, camera calibration, stereo vision, and robotics. Because computer vision and machine learning often go hand-in-hand, OpenCV also contains a full, general-purpose Machine Learning Library (MLL). This sub-library is focused on statistical pattern recognition and clustering; the MLL is highly useful for the vision tasks at the core of OpenCV's mission, but it is general enough to be used for any machine-learning problem.

Most computer scientists and practical programmers are aware of some facet of the role that computer vision plays, but few people are aware of all the ways in which it is used. For example, most people are somewhat aware of its use in surveillance, and many also know that it is increasingly being used for images and video on the Web. A few have seen some use of computer vision in game interfaces. Yet few people realize that most aerial and street-map images (such as in Google's Street View) make heavy use of camera calibration and image-stitching techniques. Some are aware of niche applications in safety monitoring, unmanned flying vehicles, or biomedical analysis.
But few are aware of how pervasive machine vision has become in manufacturing: virtually everything that is mass-produced has been automatically inspected at some point using computer vision. In this paper OpenCV is used for character detection, showing its applicability in this area as well.

February Issue

Page 58 of 62

ISSN 2229 5208

III. SYSTEM ARCHITECTURE

An image is given as input to the system. The image undergoes segmentation, which partitions a digital image into multiple segments; segmentation simplifies and/or changes the representation of an image into something that is more meaningful and easier to analyze. The letters in the image are roughly identified by contours, and the result of segmentation is the set of contours extracted from the image. During segmentation, each character is edge-detected, creating a closed box around its contours that fits the character exactly. The output is the top-left coordinates of the closed box together with the enclosed area around the character; the character is thus marked for choosing the region of interest (ROI). The ROI then undergoes a painting process, in which an empty image is created to embed the image to be matched. Each time matching is performed, the previously embedded image is replaced by a new image for template matching. Painting relies on the condition source image >> template image (the source image is much larger than the template image). Template matching is used for classification: this technique compares portions of images against one another, using a sample image, or template, to recognize smaller objects in the source image. Upon testing each character, the character that reaches the maximum value of 1 is accepted; a maximum value of 1 indicates that the template match for that character is exact. The remaining characters receive a minimum value of 0 when no match is found, or a decimal value below 1 when they only partially match the template.

As shown in Figure 2, once template matching completes successfully, the result is dumped into a file. The dump file consists of the characters and their corresponding coordinates. During the reconstruction (RC) phase of the project, the characters are sorted by their coordinates, and the sorted results are printed so that they display in the same sequential order as the source image.

IV. SYSTEM DESIGN

A. Segmentation

Segmentation focuses on how to isolate objects or parts of objects from the rest of the image. It covers algorithms that find, fill, and isolate objects and object parts in an image. We start with separating foreground objects from a learned background.


Explicit Segmentation. In explicit approaches, one tries to identify the smallest possible word segments (primitive segments) that may be smaller than letters but surely cannot be segmented further. Later in the recognition process these primitive segments are assembled into letters based on input from the character recognizer. The advantage of this strategy is that it is robust and quite straightforward, but it is not very flexible.


Implicit Segmentation. In implicit approaches the words are recognized as wholes, without segmenting them into letters. This is effective and viable only when the set of possible words is small and known in advance, as in the recognition of bank checks and postal addresses.

Background Subtraction. Because of its simplicity, and because camera locations are fixed in many contexts, background subtraction is probably the most fundamental image-processing operation for video security applications. To perform background subtraction, we first must learn a model of the background. Once learned, this background model is compared against the current image, and the known background parts are

Figure 2. System architecture: segmentation → template matching against the database ("Matches?") → write to dump file → sort → print (reconstruction, RC).


subtracted away. The objects left after subtraction are presumably new foreground objects. Of course, "background" is an ill-defined concept that varies by application: if you are watching a highway, for example, average traffic flow should perhaps be considered background. Normally the background is taken to be any static or periodically moving parts of a scene that remain static or periodic over the period of interest. The whole ensemble may have time-varying components, such as trees waving in morning and evening wind but standing still at noon. Two common but substantially distinct environment categories likely to be encountered are indoor and outdoor scenes; we are interested in tools that help in both of these environments.

Frame Differencing. The simplest background-subtraction method is to subtract one frame from another (possibly several frames apart) and label any difference that is big enough as foreground.

Averaging Background Method. The averaging method learns the average and standard deviation (or, similarly but computationally faster, the average difference) of each pixel as its model of the background.

Contour Sequence. One kind of object that can be stored inside memory storage is a sequence. Sequences are themselves linked lists of other structures, and OpenCV can make sequences out of many different kinds of objects. In this sense a sequence is similar to the generic container classes (or container class templates) that exist in various other programming languages. The sequence construct in OpenCV is actually a deque, so it is very fast for random access and for additions and deletions from either end, but a little slow for adding and deleting objects in the middle. The sequence structure itself has some important elements that you should be aware of. The first, and the one you will use most often, is total: the total number of points or objects in the sequence.
The next four important elements are pointers to other sequences: h_prev, h_next, v_prev, and v_next. These four pointers are part of what are called CV_TREE_NODE_FIELDS; they are used not to indicate elements inside the sequence but rather to connect different sequences to one another. Other objects in the OpenCV universe also contain these tree node fields, and any such objects can be linked together in the same way. A contour is represented in OpenCV by a CvSeq sequence that is, one way or another, a sequence of points. The function cvFindContours() computes contours from binary images. It can take images created by cvCanny(), which have edge pixels in them, or images created by functions like cvThreshold() or cvAdaptiveThreshold(), in

which the edges are implicit as boundaries between positive and negative regions.

Contour Finding. Figure 1 depicts the functionality of cvFindContours(). The upper part of the figure shows a test image containing a number of white regions (labeled A through E) on a dark background. The lower portion depicts the same image along with the contours that will be located by cvFindContours(). Those contours are labeled cX or hX, where c stands for contour, h stands for hole, and X is some number. Some of those contours are dashed lines; they represent the exterior boundaries of the white regions (i.e., nonzero regions). OpenCV and cvFindContours() distinguish between these exterior boundaries and the dotted lines, which you may think of either as interior boundaries or as the exterior boundaries of holes (i.e., zero regions). The concept of containment here is important in many applications. For this reason, OpenCV can be asked to assemble the found contours into a contour tree that encodes the containment relationships in its structure. A contour tree corresponding to this test image would have the contour c0 at the root node, with the holes h00 and h01 as its children; those would in turn have as children the contours that they directly contain, and so on. cvFindContours() does not really know anything about edge images: to cvFindContours(), an edge is just a very thin white area. As a result, for every exterior contour there will be a hole contour that almost exactly coincides with it. This hole is actually just inside the exterior boundary; you can think of it as the white-to-black transition that marks the interior edge of the edge.

Figure 1. Test image and the contours located by cvFindContours().


ROI. If a region of interest (ROI) is set, then the function will respect that region; thus one way of speeding up character detection is to trim down the image boundaries using an ROI. ROI and widthStep have great practical importance, since in many situations they speed up computer vision operations by allowing the code to process only a small sub-region of the image. Support for ROI and widthStep is universal in OpenCV: every function allows operation to be limited to a sub-region. To turn ROI on or off, use the cvSetImageROI() and cvResetImageROI() functions. Given a rectangular sub-region of interest in the form of a CvRect, you may pass an image pointer and the rectangle to cvSetImageROI() to turn on ROI; turn off ROI by passing the image pointer to cvResetImageROI().

Painting. The painting process consists of creating an empty image in which to place the image to be matched during template matching. For any pair of images to be compared, the rule of thumb is source image >> template image.

B. Template Matching

Template matching is a technique used in classifying objects. It compares portions of images against one another: a sample image may be used to recognize similar objects in a source image. Template matching is applicable when the standard deviation of the template image relative to the source image is small enough. Templates are most often used to identify printed characters, numbers, and other small, simple objects.

Matching is done on a pixel-by-pixel basis. The template is a small, usually bi-level, image; the task is to find it in the source image with a yes/no decision at each position. Correlation is a measure of the degree to which two variables agree, not necessarily in actual value but in general behavior; here the two variables are the corresponding pixel values in the template and source images. Template matching via cvMatchTemplate() is not based on histograms; rather, the function matches an actual image patch against an input image by sliding the patch over the input image using one of the matching methods described below.

Square difference matching method (method = CV_TM_SQDIFF). These methods match the squared difference, so a perfect match will be 0 and bad matches will be large.

R_sqdiff(x, y) = Σ_{x', y'} [T(x', y') − I(x + x', y + y')]²

Correlation matching methods (method = CV_TM_CCORR). These methods multiplicatively match the template against the image, so a perfect match will be large and bad matches will be small or 0.

R_ccorr(x, y) = Σ_{x', y'} [T(x', y') · I(x + x', y + y')]²

Correlation coefficient matching methods (method = CV_TM_CCOEFF). These methods match the template relative to its mean against the image relative to its mean, so the score is unaffected by uniform changes in brightness.

R_ccoeff(x, y) = Σ_{x', y'} [T'(x', y') · I'(x + x', y + y')]²

where

T'(x', y') = T(x', y') − (1/(w·h)) Σ_{x'', y''} T(x'', y'')

I'(x + x', y + y') = I(x + x', y + y') − (1/(w·h)) Σ_{x'', y''} I(x + x'', y + y'')


Normalized methods. For each of the three methods just described, there are also normalized versions, first developed by Galton [Galton] as described by Rodgers [Rodgers88]. The normalized methods are useful because, as mentioned previously, they can help reduce the effects of lighting differences between the template and the image. In each case the normalization coefficient is the same:

Figure 2. Template matching: the matching process moves the template image to all possible positions in a larger input image and computes a numerical index that indicates how well the template matches the image in that position; the indices form the output image.

Z(x, y) = sqrt( Σ_{x', y'} T(x', y')² · Σ_{x', y'} I(x + x', y + y')² )


C. Normalization

The raw data is subjected to a number of preliminary processing steps to make it usable in the descriptive stages of character analysis. Pre-processing aims to produce data that the OCR system can operate on accurately. The main objectives of pre-processing are the following.

Binarization. Document image binarization (thresholding) refers to the conversion of a gray-scale image into a binary image. There are two categories of thresholding: global thresholding picks one threshold value for the entire document image, often based on an estimate of the background level from the intensity histogram of the image; adaptive (local) thresholding uses a different value for each pixel according to local area information.

Noise Reduction. Noise reduction improves the quality of the document. There are two main approaches: filtering (masks) and morphological operations (erosion, dilation, etc.). Normalization provides a tremendous reduction in data size, and thinning extracts the shape information of the characters.

Slant Reduction. The slant of handwritten text varies from writer to writer. Slant-removal methods are used to normalize all characters to a standard form.

D. Reconstruction

This module rearranges the detected characters into their sequential order. The reconstruction phase is the final module of the system, completing the character-detection process with the following sub-operations.

Sort. With the characters and their corresponding coordinates available, score cards are prepared for each detected character; taking the maximum score-card value into account, the matched character is confirmed.

Print and Display. The result is confirmed to be the same as the input image and displayed to the user.

E. Database

The database consists of characters stored in different fonts, each of the same size, in JPEG or PNG form. The database

can be further expanded with characters of different sizes as well.

V. CONCLUSION

The OCR system is inexpensive because it requires no dedicated hardware circuits; rather, it is a full-fledged software system that enables the detection and manipulation of a document. The database is limited to fonts of a certain size, but it can be expanded in future with fonts of different sizes.

REFERENCES
1. P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision 57 (2004): 137-154.
2. R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," IEEE ICIP (2002): 900-903.
3. [Burt81] P. J. Burt, T. H. Hong, and A. Rosenfeld, "Segmentation and estimation of image region properties through cooperative hierarchical computation," IEEE Transactions on Systems, Man, and Cybernetics 11 (1981): 802-809.
4. [Canny86] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986): 679-714.
5. G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library.
6. A. Ikeda and Y. Shimodaira, "Segmentation of character and natural image documents with neural network model for facsimile equipments."
7. K. Ichikawa and O. Hori, "Component-based robust face detection using AdaBoost and decision tree."
AUTHORS PROFILE
Abhinandana G., Chennai, born 20.10.1988; M.E. Computer Science and Engineering, School of Computing Sciences and Engineering, Hindustan University, Chennai, Tamil Nadu, India.

S. G. Gino Sophia, Chennai, born 13.06.1975; Associate Professor, Department of Computer Science and Engineering, Hindustan University, Chennai, Tamil Nadu, India.

