
Using MATLAB

CHAPTER 1

INTRODUCTION

1.1 Introduction of Image Processing

Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories: image processing (image in, image out), image analysis (image in, measurements out), and image understanding (image in, high-level description out).

We will focus on the fundamental concepts of image processing. Space does not permit us to make more than a few introductory remarks about image analysis. Image understanding requires an approach that differs fundamentally from the theme of this book. Further, we will restrict ourselves to two-dimensional (2D) image processing, although most of the concepts and techniques that are to be described can be extended easily to three or more dimensions.

1.2 Components of an image processing system

[Block diagram: an image processing system consisting of a digitizer, an image processor, a digital computer, mass storage, a display, and an operator console.]

1.2.1 Digitizer

A digitizer performs the conversion of a typically analog object, image, or signal into digital form; examples include the graphics tablet (digitizing tablet) with its lens cursor.

By definition, the digitizer is a device used to convert analog signals into digital signals. In the case of a cell phone, this device is the glass that covers the LCD: the glass piece attached to the LCD is the digitizer, sometimes called the LCD digitizer. It converts your actions (press, swipe, etc.) into a digital signal that the phone understands. The data from the digitizer (glass) is transferred to the phone by the attached digitizer flex cable (or digitizer flex ribbon), which should be included in the purchase of a replacement LCD digitizer.

Dept. Of E.C.E, SIETK, PUTTUR

It is difficult to preserve the integrity of the flex cable during a digitizer replacement. An LCD assembly, minus small parts and frames, therefore has three major components: 1) the digitizer (front glass), 2) the digitizer flex cable (which should be attached to the glass), and 3) the LCD display unit. All three components can be purchased separately or together, although purchasing the digitizer without the digitizer flex cable is not recommended: mounting the flex cable can be tricky on some phones, and if it is not done correctly the touch screen will function poorly.

1.2.2 Image processing

An image processor performs image acquisition, storage, preprocessing, segmentation, recognition, and interpretation, and finally displays and records the resulting image. The following block diagram gives the fundamental steps involved in an image processing system. Each of these fundamental steps may have sub-steps. The fundamental steps are described below.

[Block diagram: fundamental steps of digital image processing — image acquisition, preprocessing, segmentation, and the supporting knowledge base.]

(i) Image Acquisition: This is the first of the fundamental steps of digital image processing. Image acquisition could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage involves preprocessing, such as scaling.

(ii) Image Enhancement: Image enhancement is among the simplest and most appealing areas of digital image processing. The idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image, such as by changing brightness and contrast.

(iii) Image Restoration: Image restoration is an area that also deals with improving the appearance of an image. Unlike enhancement, which is subjective, restoration is objective, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation.

(iv) Color Image Processing: Color image processing is an area that has been gaining in importance because of the significant increase in the use of digital images over the Internet. It includes color modeling and processing in a digital domain.

(v) Wavelets and Multiresolution Processing: Wavelets are the foundation for representing images in various degrees of resolution. Images are subdivided successively into smaller regions for data compression and for pyramidal representation.


(vi) Compression: Compression deals with techniques for reducing the storage required to save an image or the bandwidth needed to transmit it. Compression is particularly necessary for data transmitted over the Internet.

(vii)Morphological Processing: Morphological processing deals with tools for extracting

image components that are useful in the representation and description of shape.

(viii) Segmentation: Segmentation procedures partition an image into its constituent parts

or objects. In general, autonomous segmentation is one of the most difficult tasks in

digital image processing. A rugged segmentation procedure brings the process a long way

toward successful solution of imaging problems that require objects to be identified

individually.

(ix)Representation and Description: Representation and description almost always

follow the output of a segmentation stage, which usually is raw pixel data, constituting

either the boundary of a region or all the points in the region itself. Choosing a

representation is only part of the solution for transforming raw data into a form suitable

for subsequent computer processing. Description deals with extracting attributes that

result in some quantitative information of interest or are basic for differentiating one class

of objects from another.

(x) Object Recognition: Recognition is the process that assigns a label, such as "vehicle", to an object based on its descriptors.

(xi) Knowledge Base: Knowledge may be as simple as detailing regions of an image

where the information of interest is known to be located, thus limiting the search that has

to be conducted in seeking that information. The knowledge base also can be quite

complex, such as an interrelated list of all major possible defects in a materials inspection

problem or an image database containing high-resolution satellite images of a region in

connection with change-detection applications.
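The steps above can be sketched end to end; everything concrete in this sketch (the tiny synthetic image, the threshold value, the labels) is an illustrative assumption, not part of any particular system:

```python
# Minimal sketch of the fundamental image processing steps described above.
# The synthetic image, threshold, and labels are illustrative assumptions.

def acquire():
    # (i) Image acquisition: here, a tiny synthetic 4x4 grayscale image.
    return [[10, 12, 200, 210],
            [11, 13, 205, 202],
            [ 9, 10, 198, 199],
            [12, 11, 201, 204]]

def preprocess(img):
    # Preprocessing/enhancement: clamp values into the 0..255 range.
    return [[min(max(p, 0), 255) for p in row] for row in img]

def segment(img, threshold=128):
    # (viii) Segmentation: partition pixels into object (1) / background (0).
    return [[1 if p >= threshold else 0 for p in row] for row in img]

def describe(mask):
    # (ix) Description: extract a simple quantitative attribute (object area).
    return sum(sum(row) for row in mask)

def recognize(area, min_area=4):
    # (x) Recognition: assign a label based on the descriptor.
    return "object present" if area >= min_area else "no object"

img = preprocess(acquire())
mask = segment(img)
label = recognize(describe(mask))
```

A real system would add a knowledge base (step (xi)) to restrict where these operations are applied; the pipeline shape, however, follows the steps listed above.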

1.2.3 Digital computer

A digital computer is an electronic computer in which the input is discrete rather than continuous, consisting of combinations of numbers, letters, and other characters written in an appropriate programming language and represented internally in binary notation (compare an analog computer, which operates on continuous signals).

1.2.4 Mass storage

Mass storage refers to various techniques and devices for storing large amounts of data. The

earliest storage devices were punched paper cards, which were used as early as 1804 to

control silk-weaving looms. Modern mass storage devices include all types of disk drives

and tape drives. Mass storage is distinct from memory, which refers to temporary storage

areas within the computer. Unlike main memory, mass storage devices retain data even

when the computer is turned off.

1.2.5 Hard copy device

Hard copy devices are those that give the output in tangible form. Printers and plotters are two common hard copy devices.

1.2.6 Operator console

The operator console consists of equipment and arrangements for verification of intermediate results and for alteration of the software as and when required. The operator is also capable of checking for any resulting errors and of entering the requisite data.

1.3 Applications of digital image processing

Some of the major fields in which digital image processing is widely used are mentioned below:

Medical field

Remote sensing

Machine/Robot vision

Color processing

Pattern recognition

Video processing

Microscopic Imaging

Others

1.3.1 Image sharpening and restoration

Image sharpening and restoration here refers to processing images captured with a modern camera to improve them, or to manipulating them to achieve a desired result, much as Photoshop does. This includes zooming, blurring, sharpening, grayscale-to-color conversion (and vice versa), edge detection, image retrieval, and image recognition.

1.3.2 Medical field

The common applications of DIP in the medical field are:

1. Gamma ray imaging

2. PET scan

3. X Ray Imaging

4. Medical CT

5. UV imaging

1.3.3 Transmission and encoding

The very first image transmitted over a wire went from London to New York via a submarine cable. Today we can watch live video feeds or live CCTV footage from one continent to another with a delay of just seconds, which shows how much work has been done in this field. The field does not focus only on transmission but also on encoding: many different formats have been developed for high and low bandwidth to encode photos and stream them over the Internet.

1.3.4 Machine/Robot vision

Apart from the many challenges that a robot faces today, one of the biggest challenges is still to improve the vision of the robot: making the robot able to see things, identify them, identify hurdles, and so on. Much work has been contributed by this field, and a complete other field, computer vision, has been introduced to work on it.

1.3.5 Hurdle detection

Hurdle detection is one of the common tasks done through image processing, by identifying different types of objects in the image and then calculating the distance between the robot and the hurdles.


1.3.6 Line follower robots

Most robots today work by following a line and are thus called line follower robots. This helps a robot move along its path and perform some tasks, and it has also been achieved through image processing.

1.3.7 Color processing

Color processing includes the processing of colored images and the different color spaces that are used, for example the RGB color model, YCbCr, and HSV. It also involves studying the transmission, storage, and encoding of these color images.

1.3.8 Pattern recognition

Pattern recognition involves study from image processing and from various other fields, including machine learning (a branch of artificial intelligence). In pattern recognition, image processing is used to identify the objects in an image, and machine learning is then used to train the system on changes in the pattern. Pattern recognition is used in computer-aided diagnosis, recognition of handwriting, recognition of images, etc.

1.3.9 Video processing

A video is just a very fast sequence of pictures. The quality of the video depends on the number of frames per second and on the quality of each frame. Video processing involves noise reduction, detail enhancement, motion detection, frame rate conversion, aspect ratio conversion, color space conversion, etc.

1.4 Segmentation

The division of an image into meaningful structures, image segmentation, is often an essential step in image analysis, object representation, visualization, and many other image processing tasks. Earlier, we focused on how to analyze and represent an object, but we assumed that the group of pixels identifying that object was known beforehand. Here, we focus on methods that find the particular pixels that make up an object. A great variety of segmentation methods has been proposed in the past

decades, and some categorization is necessary to present the methods properly here. A disjunct categorization does not seem to be possible, though, because even two very different segmentation approaches may share properties that defy singular categorization. The categorization used here therefore reflects the emphasis of an approach rather than a strict division. The following categories are used:

Threshold based segmentation. Histogram thresholding and slicing techniques are used to segment the image. They may be applied directly to an image, but can also be combined with pre- and post-processing techniques.

Edge based segmentation. With this technique, detected edges in an image are assumed to represent object boundaries and are used to identify these objects.

Region based segmentation. Where an edge based technique may attempt to find the object boundaries and then locate the object itself by filling them in, a region based technique takes the opposite approach, by (e.g.) starting in the middle of an object and then growing outward until it meets the object boundaries.

Clustering techniques. Although clustering is sometimes used as a synonym for (agglomerative) segmentation techniques, we use it here to denote techniques that are primarily used in exploratory data analysis of high-dimensional measurement patterns. In this context, clustering methods attempt to group together patterns that are similar in some sense. This goal is very similar to what we are attempting to do when we segment an image, and indeed some clustering techniques can readily be applied for image segmentation.

Matching. When we know what an object we wish to identify in an image (approximately) looks like, we can use this knowledge to locate the object in an image. This approach to segmentation is called matching.

1.5 Text segmentation

Text segmentation is the process of dividing written text into meaningful units,

such as words, sentences, or topics. The term applies both to mental processes used by

humans when reading text, and to artificial processes implemented in computers, which

are the subject of natural language processing. The problem is non-trivial, because while

some written languages have explicit word boundary markers, such as the word spaces of

written English and the distinctive initial, medial and final letter shapes of Arabic, such

signals are sometimes ambiguous and not present in all written languages. Compare

speech segmentation, the process of dividing speech into linguistically meaningful

portions.

1.5.1 Segmentation problems

A. Word segmentation

Word segmentation is the problem of dividing a string of written language into its

component words. In English and many other languages using some form of the Latin

alphabet, the space is a good approximation of a word divider (word delimiter).

However, the equivalent of this character is not found in all written scripts, and without it word segmentation is a difficult problem. Languages in which word segmentation is non-trivial include Chinese and Japanese, where sentences but not words are delimited; Thai and Lao, where phrases and sentences but not words are delimited; and Vietnamese, where syllables but not words are delimited.

In some writing systems however, such as the Ge'ez script used for Amharic and

Tigrinya among other languages, words are explicitly delimited (at least historically) with

a non-whitespace character. The Unicode Consortium has published a Standard Annex on Text Segmentation, exploring the issues of segmentation in multiscript texts. Word splitting is the process of parsing concatenated text (i.e. text that contains no spaces or other word separators) to infer where word breaks exist. Word splitting may also refer to the process of hyphenation.
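For text without explicit word delimiters, one classical baseline is dictionary-based greedy longest matching. The sketch below is a minimal illustration; the toy lexicon is an assumption, and real systems use large lexicons plus statistical models to resolve the ambiguities a greedy match gets wrong:

```python
# Greedy longest-match word segmentation for text with no word delimiters.
# The lexicon is a toy example for illustration only.

def segment_words(text, dictionary):
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as a single-character token.
            words.append(text[i])
            i += 1
    return words

lexicon = {"text", "segmentation", "is", "hard"}
print(segment_words("textsegmentationishard", lexicon))
# → ['text', 'segmentation', 'is', 'hard']
```

Greedy matching fails on inputs where the longest prefix is the wrong choice, which is one reason statistical decision-making (discussed later in this section) is usually layered on top.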

B. Sentence segmentation

Sentence segmentation is the problem of dividing a string of written language into its component sentences. In English and some other languages, using punctuation, particularly the full stop character, is a reasonable approximation. However, even in English this problem is not trivial due to the use of the full stop character in abbreviations, which may or may not also terminate a sentence. For example, "Mr." is not its own sentence in "Mr. Smith went to the shops in Jones Street." When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries. As with word segmentation, not all written languages contain punctuation characters that are useful for approximating sentence boundaries.
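The abbreviation-table idea above can be sketched directly; the abbreviation list here is a small illustrative sample, not an exhaustive table:

```python
# Sentence segmentation using the full-stop heuristic plus an abbreviation
# table, as described above. The abbreviation set is a toy sample; it ignores
# quotes, ellipses, and abbreviations that do end sentences.

ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e."}

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Split only when the token ends a sentence, not an abbreviation.
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Mr. Smith went to the shops in Jones Street. He bought bread."))
```

On the example from the text, "Mr." is correctly kept inside the first sentence, while the stop after "Street." triggers a boundary.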

C. Text segmentation

Topic analysis consists of two main tasks: topic identification and text segmentation. While the first is a simple classification of a specific text, the latter implies that a document may contain multiple topics, and the task of computerized text segmentation may be to discover these topics automatically and segment the text accordingly. The topic boundaries may be apparent from section titles and paragraphs; in other cases, one needs to use techniques similar to those used in document classification. Segmenting the text into topics or discourse turns can be useful in some natural language processing tasks: it can improve information retrieval or speech recognition significantly (by indexing and recognizing documents more precisely, or by returning the specific part of a document corresponding to the query). It is also needed in topic detection and tracking systems and in text summarization. Many different approaches have been tried, e.g. HMMs, lexical chains, passage similarity using word co-occurrence, and clustering. The task is quite ambiguous: people evaluating text segmentation systems often differ on topic boundaries, so evaluation is a dubious problem as well.
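The passage-similarity idea mentioned above (in the spirit of TextTiling) can be sketched as follows; the sample sentences and the similarity threshold are illustrative assumptions:

```python
# Sketch of lexical-cohesion topic segmentation: compute word overlap
# (Jaccard similarity) between adjacent sentences and place a topic
# boundary where the overlap falls below a threshold. The threshold and
# sample sentences are illustrative assumptions.

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def topic_boundaries(sentences, threshold=0.1):
    # A boundary after sentence i means cohesion with sentence i+1 is low.
    return [i + 1 for i in range(len(sentences) - 1)
            if jaccard(sentences[i], sentences[i + 1]) < threshold]

docs = [
    "the cat sat on the mat",
    "the cat chased the mouse on the mat",
    "stock prices fell sharply today",
    "prices of technology stocks fell again",
]
print(topic_boundaries(docs))  # → [2]: topic shifts between sentences 2 and 3
```

Real systems compare blocks of sentences rather than single sentences and smooth the similarity curve, but the boundary-at-the-cohesion-dip principle is the same.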

D. Other segmentation problems

Processes may be required to segment text into segments besides those mentioned, including morphemes (a task usually called morphological analysis) or paragraphs.

E. Automatic segmentation approaches

Automatic segmentation is the problem in natural language processing of implementing a computer process to segment text. When punctuation and similar clues are not consistently available, the segmentation task often requires fairly non-trivial techniques, such as statistical decision-making and large dictionaries, as well as consideration of syntactic and semantic constraints. Effective natural language processing systems and text segmentation tools usually operate on text in specific domains and sources; for example, processing text used in medical records is a very different problem from processing news articles or real estate advertisements. The process of developing text segmentation tools starts with collecting a large corpus of text in an application domain. There are two general approaches:


Manual analysis of the text and writing custom software

Annotating the sample corpus with boundary information and using machine learning

Some text segmentation systems take advantage of markup like HTML and known document formats like PDF to provide additional evidence for sentence and paragraph boundaries.

1.6 Compression

The objective of image compression is to reduce irrelevance and redundancy of

the image data in order to be able to store or transmit data in an efficient form.

1.6.1 Lossy and lossless compression

Image compression may be lossy or lossless. Lossless compression is preferred

for archival purposes and often for medical imaging, technical drawings, clip art, or

comics. Lossy compression methods, especially when used at low bit rates, introduce

compression artifacts. Lossy methods are especially suitable for natural images such as

photographs in applications where minor (sometimes imperceptible) loss of fidelity is

acceptable to achieve a substantial reduction in bit rate. The lossy compression that

produces imperceptible differences may be called visually lossless.

Methods for lossless image compression include:

Run-length encoding (used, e.g., in BMP, TGA, and TIFF)

Entropy encoding

Chain codes

Methods for lossy image compression include:

Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel then just references the index of a color in the color palette; this method can be combined with dithering to avoid posterization.

Chroma subsampling. This takes advantage of the fact that the human eye

perceives spatial changes of brightness more sharply than those of color, by

averaging or dropping some of the chrominance information in the image.

Transform coding. A Fourier-related transform such as the discrete cosine transform (DCT) is widely used (N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform," IEEE Trans. Computers, pp. 90-93, Jan. 1974). The DCT is sometimes referred to as "DCT-II" in the context of the family of discrete cosine transforms. The more recently developed wavelet transform is also used extensively, followed by quantization and entropy coding.

Fractal compression.

The best image quality at a given bit rate (or compression rate) is the main goal of image compression; however, there are other important properties of image compression schemes.

1.6.4 Scalability

Scalability generally refers to a quality reduction achieved by manipulation of the bitstream or file (without decompression and re-compression). Other names for scalability are progressive coding or embedded bitstreams. Despite its contrary nature, scalability also


may be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g., in a web browser) or for providing variable-quality access to, e.g., databases. There are several types of scalability:

Quality progressive (or layer progressive): the bitstream successively refines the reconstructed image.

Resolution progressive: first encode a lower image resolution, then encode the difference to higher resolutions.

1.6.5 Region of interest coding

Certain parts of the image are encoded with higher quality than others. This may be combined with scalability (encode these parts first, others later).

1.6.6 Meta information

Compressed data may contain information about the image which may be used to

categorize, search, or browse images. Such information may include color and texture

statistics, small preview images, and author or copyright information.

1.6.7 Processing power

Compression algorithms require different amounts of processing power to encode and decode; some high-compression algorithms require high processing power. The quality of a compression method is often measured by the peak signal-to-noise ratio (PSNR), which measures the amount of noise introduced through a lossy compression of the image; however, the subjective judgment of the viewer is also regarded as an important measure, perhaps the most important one.
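The PSNR mentioned above has a simple closed form, PSNR = 10·log10(MAX² / MSE); a minimal sketch for 8-bit data (MAX = 255), shown here on 1-D pixel lists for brevity:

```python
# Peak signal-to-noise ratio between an original and a compressed signal,
# for 8-bit samples (max_value = 255). Shown on flat pixel lists; a 2-D
# image would simply be flattened first.
import math

def psnr(original, compressed, max_value=255):
    n = len(original)
    mse = sum((o - c) ** 2 for o, c in zip(original, compressed)) / n
    if mse == 0:
        return float("inf")   # identical signals: no noise introduced
    return 10 * math.log10(max_value ** 2 / mse)

a = [100, 120, 130, 140]
b = [101, 119, 131, 139]   # small distortion -> high PSNR
print(round(psnr(a, b), 2))  # → 48.13
```

Higher PSNR means less introduced noise, but as the text notes, it does not always track the viewer's subjective judgment.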

1.6.8 Bit plane slicing

A bit plane of a digital discrete signal (such as image or sound) is a set of bits

corresponding to a given bit position in each of the binary numbers representing the


signal. For example, for 16-bit data representation there are 16 bit planes: the first bit plane contains the set of the most significant bits, and the 16th contains the set of the least significant bits.

The first bit plane gives the roughest but most significant approximation of the values of the signal, and the higher the number of the bit plane, the smaller its contribution to the final image. Thus, adding a bit plane gives a better approximation.

If a bit on the nth bit plane of an m-bit dataset is set to 1, it contributes a value of 2^(m-n); otherwise it contributes nothing. Each bit plane therefore contributes half of the value of the previous bit plane.
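The decomposition described above can be sketched for 8-bit pixels: plane n holds the bit contributing 2^(m-n), and summing all planes reconstructs the original values exactly:

```python
# Bit-plane slicing as described above: split 8-bit pixel values into
# 8 binary planes and reconstruct the signal from them.

BITS = 8  # m-bit data -> m bit planes

def bit_planes(pixels):
    # Plane 1 holds the most significant bit, plane BITS the least.
    return [[(p >> (BITS - n)) & 1 for p in pixels]
            for n in range(1, BITS + 1)]

def reconstruct(planes):
    # A set bit on the nth plane contributes 2^(m - n).
    return [sum(plane[i] << (BITS - n)
                for n, plane in enumerate(planes, start=1))
            for i in range(len(planes[0]))]

pixels = [0, 13, 128, 255]
planes = bit_planes(pixels)
assert reconstruct(planes) == pixels   # exact reconstruction
assert planes[0] == [0, 0, 1, 1]       # MSB plane: set only for 128 and 255
```

Dropping the low-order planes keeps the coarse approximation while discarding the least significant detail, which is the connection to compression drawn in this section.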


CHAPTER-2

LITERATURE SURVEY

2.1 MULTISCALE SEGMENTATION FOR MRC DOCUMENT

COMPRESSION USING COST FUNCTION

The Mixed Raster Content (MRC) standard (ITU-T T.44) specifies a framework for document compression which can dramatically improve the compression/quality tradeoff as compared to traditional lossy image compression algorithms. The key to MRC's performance is the separation of the document into foreground and background layers, represented as a binary mask. This paper proposes an integrated segmentation algorithm based on the sequential application of two algorithms. The first, Cost Optimized Segmentation (COS), is a blockwise segmentation algorithm.

The second algorithm, Connected Component Classification (CCC), refines the initial segmentation by classifying feature vectors of connected components using a Markov random field (MRF) model. The integrated COS/CCC segmentation algorithms are then incorporated into a resolution enhanced rendering (RER) method, i.e., to achieve high quality rendering of documents containing text, pictures, and graphics, while maintaining the desired compression ratios.

The procedure for Cost Optimized Segmentation (COS) is as follows. The image is first divided into overlapping blocks. Each block contains m x m pixels, and adjacent blocks overlap by m/2 pixels in both the horizontal and vertical directions. The blocks are denoted Oi,j for i = 1, ..., M and j = 1, ..., N, where M and N are the numbers of blocks in the vertical and horizontal directions. The pixels in each block are segmented into foreground (1) or background (0) by the clustering method of Cheng and Bouman. This results in an initial binary mask for each block, denoted Ci,j in {0,1}^(m x m). However, in order to form a consistent segmentation of the page, these initial block segmentations must be merged into a single binary mask. To do this, each block is allowed to be modified using a class assignment si,j in {0, 1, 2, 3}.

The most traditional approach to text segmentation is Otsu's method, which thresholds pixels in an effort to divide the document's histogram into object and background.

There are many modified versions of Otsu's method. While Otsu uses a global thresholding approach, Niblack and Sauvola use local thresholding approaches. Kapur's method uses entropy information for global thresholding, and Tsai uses a moment preserving approach. A comparison of these text segmentation algorithms can be found in the literature.

In order to improve text extraction accuracy, some text segmentation approaches

also use character properties such as size, stroke width, directions, and run-length

histogram. Other binarization approaches for document coding have used rate-distortion minimization as a criterion for document binarization. Many recent approaches to text

segmentation have been based upon statistical models. One of the best commercial text segmentation algorithms is based on a hidden Markov model (HMM).

2.1.1 Advantages

The content used here is a standard framework for layer-based document compression.

It reduces the bit rate of encoded raster documents.

The mixed raster content approach detects and segments text in complex documents with background gradations.

2.2 Segmentation for document images with inhomogeneous backgrounds

A segmentation algorithm using a water flow model [Kim et al., Pattern Recognition 35] has already been presented, in which a document image can be efficiently divided into two regions, characters and background, due to the property of locally adaptive thresholding. However, this method does not decide when to stop the iterative process and requires a long processing time, and characters on poor-contrast backgrounds often fail to be separated successfully. Accordingly, to overcome these drawbacks of the existing method, the current paper presents an improved approach that includes extraction of regions of interest (ROIs), an automatic stopping criterion, and hierarchical thresholding. Experimental results show that the proposed method can achieve a satisfactory binarization quality, especially for document images with a poor contrast background, and is significantly faster than the existing method.

2.3 Wavelets and MRF Model

This paper presents a novel scheme for the extraction of textual areas of an image using globally matched wavelet filters. A clustering-based technique has been devised for estimating globally matched wavelet filters using a collection of ground truth images. The text extraction scheme is extended to the segmentation of document images into text, background, and picture components (which include graphics and continuous tone images). Multiple two-class Fisher classifiers have been used for this purpose, together with an MRF-based labeling scheme for refinement of the segmentation results. Experimental results have established the effectiveness of the approach.

The concept of matched wavelets is used to develop globally matched wavelet (GMW) filters specifically adapted to the text and non-text regions. These filters are used for detecting text regions in scene images and for segmentation of document images into text, picture, and background. The GMW filters are found by training matched wavelets on an image set. The key contribution of this work is a trainable segmentation scheme based on matched wavelet filters: for high-performance systems using application-specific training image sets (e.g., license plate images, handwritten text images, printed text images), filters can be obtained that are customized for a particular application. Compared to other existing methods, the dimensionality, and thus the computation, of the feature space is considerably reduced. The filtering and feature extraction operations account for most of the required computations; however, the method is simple to understand, computationally inexpensive, and efficient. In the latter part, contextual information is exploited using MRF-based post-processing to improve the results of document segmentation. The rest of the paper is organized as follows.

2.3.1 Estimating Globally Matched Wavelet Filters

Matched wavelet estimation for any signal is formulated as finding a closed-form expression for extracting the compactly/infinitely supported wavelet which maximizes the error norm between the signal reconstructed at the initial scaling subspace and the successive lower wavelet subspace [1]. At an abstract level, the system uses a set of trained wavelet filters matched to the text and non-text classes. When a mixed document (having both text and non-text components) is passed through the text-matched filters, we get blacked-out regions in the detail (high pass) space corresponding to the text regions of the document, and vice versa for the non-text matched wavelet filters. These blacked-out regions in the output of the text and non-text wavelet filters are used to classify the various regions as either text or non-text.

In [1], an approach is proposed for estimating matched wavelets for a given image. It is further shown in [1] that the estimated wavelets with separable kernels have a higher peak signal-to-noise ratio (PSNR) at the same bit rate compared with the standard 9/7 wavelet. In this section, a technique is described for estimating a set of matched wavelets from a database of images, termed GMWs. These GMWs are used to generate feature vectors for segmentation; their implementation is discussed in subsequent subsections.

2.3.2 Matched Wavelets & Their Estimation

First, we briefly review the theory of matched wavelets with separable kernels as proposed in [1]. Consider a 2-D two-band wavelet system (with separable kernels) as shown in Fig. 2. Here, x and y are the horizontal and vertical directions; the scaling filter in any direction is represented as h0, its dual as f0, the wavelet filter as h1, and its dual as f1. Further, boxes showing 2 with an upward or downward arrow denote upsampling and downsampling by a factor of two. The input to this system is a 2-D signal which is, for practical purposes, assumed to be continuous, and the output of the system is another 2-D signal constructed from the sum of the channel outputs. The output of the channel passing through both scaling filters is called the approximation subspace or scaling subspace, whereas the outputs of the other three channels are called detail subspaces. This system is designed as a biorthogonal wavelet system, which means that it needs to satisfy the following conditions for perfect reconstruction of the two-band filter bank:

h1(n) = (-1)^n f0(M - n)        (1)

f1(n) = (-1)^n h0(M - n)        (2)
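Conditions (1) and (2) can be checked numerically. The sketch below generates the wavelet (high-pass) filters from given scaling (low-pass) filters; the Haar pair used here is an illustrative choice, not taken from the text.

```python
# Build the wavelet filters h1, f1 from the scaling filters f0, h0 using
# the alternating-sign relations (1) and (2):
#   h1(n) = (-1)^n * f0(M - n),   f1(n) = (-1)^n * h0(M - n)
def wavelet_from_scaling(g, M):
    # g is the scaling filter to flip and modulate, M the modulation index
    return [((-1) ** n) * g[M - n] for n in range(len(g))]

s = 2 ** -0.5                       # 1/sqrt(2)
h0 = [s, s]                         # Haar analysis low-pass (illustrative)
f0 = [s, s]                         # Haar synthesis low-pass

h1 = wavelet_from_scaling(f0, M=1)  # analysis high-pass
f1 = wavelet_from_scaling(h0, M=1)  # synthesis high-pass

print(h1)                           # [0.707..., -0.707...]
print(sum(h1), sum(f1))             # both ~0: high-pass filters reject DC
```

The zero DC response of h1 and f1 is a quick sanity check that the sign flip in (1) and (2) produced genuine high-pass filters.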

The scaling function φ(t) and the wavelet function ψ(t) are governed by two-scale relations for the two-band wavelet system. Similar equations exist for estimating the dual scaling function. The error is defined as

e(x) = a(x) - â(x)        (3)

where a(x) is the continuous 2-D image signal and â(x) represents the 2-D image reconstructed from the detail coefficients only. The corresponding error energy is then defined as

E = ∫ |e(x)|² dx        (4)

In this work, we refer to natural images (photographs, for example) as scene images. The scene images that we consider contain some written or embedded text, and everything else is non-text region. In a document image, however, we consider three components: 1) text, 2) picture, and 3) background. Backgrounds are continuous-tone, low-frequency regions with dull features, although mixed with noise. Pictures are continuous-tone regions falling in between text and background.

Thus, for document images we have extended our work described in the last section (text location in a general image) to segmentation of document images into three classes, viz. text, picture, and background. We have used the same feature vectors and classified them into three classes. For classification, we have used Fisher classifiers, first because of advantages such as ease of training (the data is projected onto one dimension) and time efficiency. Moreover, the results of the Fisher classifier fit naturally into our MRF postprocessing step, as explained in the next section.


The Fisher classifier is often used for two-class classification problems. Although it can be extended to multiclass classification (three classes in our case), the classification accuracy decreases due to the overlap between neighboring classes. Thus, we need to make some modifications to the Fisher classifiers (explained in the last section) to apply them in this case [3]. We use three Fisher classifiers, each optimized for a two-class classification problem (text/picture, picture/background, and background/text). Each classifier outputs a confidence in its classification, and the final decision is made by fusing the outputs of all three classifiers.
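One pairwise classifier of the kind described above can be sketched as follows; the full method trains three such classifiers and fuses their confidences. The 2-D toy features, the class pair, and the midpoint threshold are illustrative assumptions, not values from the text.

```python
# Fisher's linear discriminant for one two-class pair: project feature
# vectors onto w = Sw^-1 (m1 - m0), where Sw is the within-class scatter.
def mean(vs):
    n = len(vs)
    return [sum(v[i] for v in vs) / n for i in range(len(vs[0]))]

def fisher_direction(a, b):
    # 2-D case: Sw = Sa + Sb, then w = Sw^-1 (mean(b) - mean(a))
    ma, mb = mean(a), mean(b)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for vs, m in ((a, ma), (b, mb)):
        for v in vs:
            d = [v[0] - m[0], v[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

text    = [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5]]   # toy feature vectors
picture = [[1.0, 4.0], [1.5, 5.0], [0.5, 4.5]]

w = fisher_direction(picture, text)
proj = lambda v: w[0] * v[0] + w[1] * v[1]
# Threshold halfway between the projected class means (an assumption):
t = (proj(mean(text)) + proj(mean(picture))) / 2
print(proj([5.8, 1.2]) > t)   # True -> classified as text
```

The distance of a projection from the threshold can serve as the per-classifier confidence that the fusion step compares.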

Figure 2.5: Distribution of Y for picture and background as obtained from classifier 1. Similar distributions are obtained for classifiers 2 and 3.

2.3.4 MRF Postprocessing for Document Image Segmentation

Segmentation may lead to overlapping classes in the feature space. This is especially true for the picture and background classes because of the lack of a hard distinction between the textures of these two classes. We deal with this problem by exploiting the contextual information around each pixel. A similar approach has been used recently to refine the results of segmenting handwritten text, printed text, and noise in document images. Results for the document image segmentation of the previous section show that misclassification happens either in the form of isolated clusters of another class occurring inside a given class, or at the boundaries of the different classes, as indicated by Figure 2.6. Removing this misclassification is equivalent to making the classification smoother. In this section, we present an MRF-based postprocessing step that enforces this smoothness.


Figure 2.6: Document Segmentation results obtained for two sample images

Images show that the misclassification occurs either at the class boundaries or

because of the presence of small isolated clusters. The problem of correcting the misclassification belongs to a very general class of problems in vision and can be formulated in terms of energy minimization. Every pixel must be assigned a label in the set {text, picture, background}. Here, f refers to a particular labeling of the pixels and f_p refers to the value of the label of a particular pixel p. We consider the first-order MRF model. This simplifies the energy function to the following form:

E(f) = Σ_{{p,q} ∈ N} V_{p,q}(f_p, f_q) + Σ_{p ∈ P} D_p(f_p)        (5)

Inputs to the algorithm are the classification confidence maps (for picture, text, and background) and the labelings (initial results) that we obtained in the last section using these classification confidence maps. Using the initial labelings, the algorithm evaluates the interaction energy V_{p,q} and minimizes the total energy E(f) to obtain a new labeling. This step is repeated until no further minimization is possible, finally leaving the resulting optimized labeling.
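One standard way to carry out this kind of iterative minimization, sketched here, is iterated conditional modes (ICM): greedily relabel each pixel to lower the energy until no change helps. The 1-D strip of labels, the Potts penalty BETA, and the confidence-derived unary costs are all illustrative stand-ins, not the exact scheme of the text.

```python
# Minimal sketch of minimizing an energy of the form (5):
# E(f) = sum over neighbor pairs of V(fp, fq) + sum over pixels of D(fp).
BETA = 1.0
LABELS = ("text", "picture", "background")

def energy(f, D):
    pairwise = sum(BETA for p in range(len(f) - 1) if f[p] != f[p + 1])
    data = sum(D[p][lab] for p, lab in enumerate(f))
    return pairwise + data

def icm(f, D):
    f = list(f)
    changed = True
    while changed:                       # repeat until no change lowers E
        changed = False
        for p in range(len(f)):
            for lab in LABELS:
                cand = f[:p] + [lab] + f[p + 1:]
                if energy(cand, D) < energy(f, D):
                    f, changed = cand, True
    return f

# Unary costs derived from classifier confidences: low cost = confident.
D = [{"text": 0.1, "picture": 2.0, "background": 2.0} for _ in range(5)]
D[2] = {"text": 1.0, "picture": 0.9, "background": 2.0}  # weak, isolated
init = ["text", "text", "picture", "text", "text"]
result = icm(init, D)
print(result)   # the isolated 'picture' label is smoothed away
```

Because every accepted relabeling strictly decreases E(f), the loop terminates, mirroring the "repeat until no further minimization is possible" rule above.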


Image-to-text conversion has been a vital area of research for many years. Mainly, Optical Character Recognition (OCR) is used to extract characters from an image. Character segmentation is a preprocessing step for OCR. In this paper, we discuss different character segmentation methods used in various domains. Some of the methods are used for handwritten character recognition, and some of the methods are for vehicle Number Plate (NP) detection.

The major focus of this research is to identify the approaches that can be useful in vehicle NP detection. After analyzing the existing character segmentation methods, the favored methods for NP detection are discussed in the conclusion section. The paper concludes by suggesting the future scope of research in this area.

Image-to-text processing has been a topic of research for the last several years. The most common method is Optical Character Recognition (OCR) to extract text from images. In large and complex images, it is essential to segment the image and then extract the characters by using a character segmentation method. The segmented characters are then sent to an OCR engine for further processing.

The process is depicted in Fig. 1. As shown in Fig. 1(a), first an image of a number plate (NP) is captured, which is further processed to find the region of interest. In this figure, the purpose is to extract the characters of the captured NP to detect the vehicle number.

By using an image segmentation process, the number plate region is detected, as shown in Fig. 1(b). In order to identify the vehicle number, each character should be clipped from the segmented NP. This task can be accomplished by using a character segmentation method. The segmented characters are shown in Fig. 1(c).


Figure 2.7: Image segmentation and character segmentation

The remainder of the paper reviews different character segmentation methods, which is followed by a discussion and conclusion section. The paper is concluded by suggesting future scope in the area of character segmentation.

Different character segmentation methods are discussed in this paper. There are different methods of character segmentation, such as localized histogram multilevel thresholding, Bayes theorem, prior knowledge, feature extraction, dynamic programming, nonlinear clustering, multistage graph search algorithms, segment confidence-based binary segmentation, separator symbols frame of reference, and horizontal-vertical segmentation. All these methods are very useful as a preprocessing step for OCR. Some of the algorithms, those based on prior knowledge and on a separator symbols frame of reference, might not be useful for NP segmentation, as it is difficult to have prior knowledge regarding a vehicle NP in advance. Dynamic programming and segment confidence-based binary segmentation (SCBS) based methods can be really useful for NP character extraction.


Traditional generative Markov random fields for segmenting images model the image data and the corresponding labels jointly, which requires extensive independence assumptions for tractability. We present the conditional random field for an application in sign detection, using typical scale- and orientation-selective texture filters and a nonlinear texture operator based on the grating cell. The resulting model captures dependencies between neighboring image region labels in a data-dependent way that escapes the difficult problem of modeling image formation, instead focusing effort and computation on the labeling task. We compare the results of training the model with pseudo-likelihood against an approximation of the full likelihood with the iterative tree reparameterization algorithm and demonstrate improvement over previous methods.

Image segmentation and region labeling are common problems in computer vision. In this work, we seek to identify signs in natural images by classifying regions according to their textural properties. Our goal is to integrate with a wearable system that will recognize any detected signs as a navigational aid to the visually impaired. Generic sign detection is a difficult problem. Signs may be located anywhere in an image, exhibit a wide range of sizes, and contain an extraordinarily broad set of fonts, colors, arrangements, etc. For these reasons, we treat signs as a general texture class and seek to discriminate such a class from the many others present in natural images.

The value of context in computer vision tasks has been studied in various ways for many years. Two types of context are important for this problem: label context and data context. In the absence of label context, local regions are classified independently, which is a common approach to object detection. Such disregard for the (unknown) labels of neighboring regions often leads to isolated false positives and false negatives. The absence of data context means ignoring potentially helpful image data from any neighbors of the region being classified. Both contexts are simultaneously important. For instance, since neighboring regions often have the same label, we could penalize label discontinuity in an image. If such regularity is imposed without regard for the actual data in a region and the local evidence for a label is weak, then continuity constraints would typically override the local data. Conversely, local region evidence for a "sign" label


could be weak, but a strong edge in the adjoining region might bolster belief in the presence of a sign at the site, because the edge indicates a transition. Thus, considering both the labels and the data of neighboring regions is important for predicting labels. This is exactly what the conditional random field (CRF) model provides. The advantage of a discriminative contextual model over a generative one for detection tasks has recently been shown in [8]. We demonstrate a training method that improves prediction results, and we apply the model to a challenging real-world task. First, the details of the model and how it differs from the typical random field are described, followed by a description of the image features we use. We close with experiments and conclusions.

2.5.1 Image Features for Sign Detection

Text and sign detection has been the subject of much research. Earlier approaches either use independent, local classifications or use heuristic methods, such as connected component analysis. Much work has been based on edge detectors or more general texture features, as well as color. Our approach calculates a joint labeling of image patches, rather than labeling patches independently, and it obviates layout heuristics by allowing the CRF to learn the characteristics of regions that contain text. Rather than simply using functions of single filters (e.g., moments) or edges, we use a richer representation that captures important relationships between responses to different scale- and orientation-selective filters.

To measure the general textural properties of both sign and especially non-sign (hence, background) image regions, we use the responses of scale- and orientation-selective filters. Specifically, we use statistics of filter responses in which correlations between steerable pyramid responses of different scales and orientations are the prominent features.

A biologically inspired nonlinear texture operator for detecting gratings of bars at a particular orientation and scale has also been described. Scale- and orientation-selective filters, such as the steerable pyramid or Gabor filters, respond indiscriminately to both single edges and one or more bars.


Figure 2.8: Grating cell data flow for a single scale and orientation.

The two boxes at I, T, and F represent center-on and center-off filters, while the boxes at M are for the six receptive fields. Using an algorithm that ranks the discriminative power of random field model features, we found the top three in the edge-less, context-free MaxEnt model to be (i) the level of green hue (easily identifying vegetation as background), (ii) the mean grating cell response (easily identifying text), and (iii) the correlation between a vertically and a diagonally oriented filter of moderate scale (the single most useful other "textural" feature).

CHAPTER-3

HIGH QUALITY MRC DOCUMENT CODING

3.1 The MRC Model


The mixed raster content (MRC) model can be used to implement highly effective document compression algorithms. However, many MRC-based methods which achieve high compression ratios can distort some fine details of the document, such as thin lines and text edges. In this paper, we present a method called resolution enhanced rendering (RER) to achieve high quality rendering of documents containing text, pictures, and graphics, while maintaining desired compression ratios. The method applies adaptive dithering in the MRC encoder and then performs a nonlinear prediction in the MRC decoder. Both the dithering and nonlinear prediction algorithms are jointly optimized to produce the best quality rendering.

We present experimental results illustrating the performance of our method and comparing it to some existing MRC compression algorithms. Document imaging applications such as scan-to-print, document archiving, and internet fax are driving the need for document compression standards that maintain high quality while achieving high compression ratios. Recently, the mixed raster content (MRC) compression model has been adopted as a standard for document encoding. The MRC standard allows raster documents to be coded at very high compression ratios but with much lower distortion than would be possible using conventional image coding methods. While MRC methods are much better than conventional transform coders, they still can substantially distort fine document details, such as thin lines and text edges.

In this paper, we propose a method called resolution enhanced rendering (RER) for jointly optimizing the MRC encoder and decoder to achieve high quality rendering of document text, images, and graphics, while maintaining desired compression ratios. The method works by adaptively dithering the mask layer of a three-layer MRC encoding to produce the intermediate tone levels required for high quality rendering. The dithering is performed using a novel adaptive error diffusion algorithm. A tree-based nonlinear predictor is then designed into the MRC decoder to reconstruct the desired intermediate tones. Both the dithering and nonlinear prediction algorithms are jointly optimized to produce the best quality rendering. The optimization is performed by iteratively optimizing the encoder and decoder to achieve the minimum distortion.


The RER method has two important practical properties. First, it remains compatible with the MRC standard. That is to say, the RER enhanced encoder works with a conventional MRC decoder, and the RER enhanced decoder works with a conventional MRC encoder. Second, the method can be implemented using the previously proposed Rate Distortion Optimized Segmentation (RDOS) method. The RDOS method computes the segmentation that minimizes a combination of bit-rate and distortion. We will present experimental results comparing the performance of RDOS compression with and without the RER method.

An MRC encoder is based on the Mixed Raster Content imaging model, which represents a document by layers with different properties. As shown in Figure 3.1, a three-layer MRC document contains a background layer, a foreground layer, and a binary mask layer. At each pixel, the value of the binary mask is used to select between the foreground and background pixels. In the MRC model, each layer is compressed independently. This adds some inefficiency, since the foreground and background layers must be coded even when they are not used, but it simplifies the imaging model. Typically, the foreground and background layers are compressed using natural image coders such as JPEG or an embedded wavelet coder, whereas the binary mask is typically encoded with a lossless binary encoder such as JBIG or JBIG2.
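The per-pixel selection rule of the three-layer model can be written directly; the tiny layers below are illustrative values, not data from the text.

```python
# Three-layer MRC imaging model: at each pixel the binary mask selects
# the foreground value where mask == 1 and the background value elsewhere.
def mrc_compose(mask, fg, bg):
    return [[fg[r][c] if mask[r][c] else bg[r][c]
             for c in range(len(mask[0]))]
            for r in range(len(mask))]

mask = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
fg   = [[10] * 4, [10] * 4]     # e.g., a dark text color
bg   = [[200] * 4, [200] * 4]   # e.g., a light paper color
print(mrc_compose(mask, fg, bg))
# [[200, 10, 10, 200], [200, 10, 200, 200]]
```

Because the decoder only ever picks one of two values per pixel, the mask alone cannot express intermediate tones at text edges, which is the limitation RER addresses.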

Figure 3.1: MRC imaging model forms text and line art by using a binary mask to

choose between foreground and background layers.

In this work, we focus on the rate distortion optimized segmentation (RDOS)

algorithm. Strictly speaking, the RDOS encoder is not a true MRC encoder because it

does not encode each layer of the MRC model independently. Nonetheless, the RDOS


method can in principle be modified to be a true MRC method, and the methods introduced in this work are equally applicable to any typical MRC encoder.

The RDOS algorithm classifies each 8×8 block of pixels into one of four classes: picture block, two-color block, one-color block, or other block. Each class corresponds to a different coding method. The picture and other blocks use JPEG block encoders. The one-color blocks are entropy coded using an arithmetic encoder. For each two-color block, both the foreground and background colors are entropy coded using arithmetic encoders, while the 8×8 binary mask is encoded using a JBIG2 encoder. The class of each block is chosen to maximize the rate-distortion performance over the entire document. The optimization is achieved by applying each candidate coding method to each block and then selecting the method which yields the best rate-distortion trade-off.

MRC encoders have an enormous advantage for document encoding because

they can efficiently encode text and line art with very high spatial resolution. However,

one limitation of conventional MRC encoders is that the mask can only represent binary

transitions at text edges. This makes accurate representation of text edges difficult. In

principal, it is possible to add edge detail to the foreground or background layers;

however, in practice this detail is lost when those layers are encoded using natural image

coders at acceptable bit rates. Figure 3.2 shows how the resolution enhanced rendering

(RER) algorithm adds edge detail while retaining the binary MRC mask layer.

First, the RER encoder segments the foreground and background using an

adaptive error diffusion method. This error diffusion method effectively dithers the binary

mask along the edge of the character to represent the gradual transition of true raster

scanned text characters. The error diffusion algorithm uses the local value of the mask to

adapt the error diffusion weights so that error is diffused along the 1-D mask boundary.

The RER decoder uses the binary mask, together with the foreground and background

colors to estimate the true value of the document pixels. This estimation is done using a

nonlinear tree-structured predictor as described in [8, 9]. Importantly, this predictor is


trained to identify the characteristic patterns of the RER encoder. Therefore, it can do a

much better job of accurately estimating the true pixel values.

Figure 3.2: Illustration of the MRC encoder and decoder with RER. Examples were selected from actual RER inputs and outputs.

Figure 3.3 illustrates how the RER encoder and decoder are jointly optimized to maximize the quality of the decoded document. As we will see, both the encoder and the decoder have parameters which can be trained to produce the best possible result. The error diffusion algorithm has a small set of parameters which control its behavior, and the nonlinear predictor has a large number of parameters which specify the nodes of a nonlinear regression tree.


Figure 3.3: Overview of method used to train the optimized encoder and decoder. Once

training is complete, the encoder and decoder function independently.

In each iteration of the optimization, the parameters of the encoder or decoder are

alternatively fixed, while the parameters of the other one are optimized. Importantly, two

different sets of documents are used for training the encoder and decoder. We have found

this improves the robustness of the training procedure. Experimental results are shown

for test documents that are not contained in either set of training documents. The

experimental results indicate that this training process robustly converges to parameters

which reduce the distortion of the decoded document. Moreover, we have found that joint

optimization of the encoder and decoder performs substantially better than independent

optimization of these two functions.

3.3.1 The RER Encoder

Let X_s be a pixel in the raster document at location s. In the MRC format, each pixel also has an associated foreground color, F_s, and background color, B_s. The binary MRC mask then determines whether F_s or B_s will be used to represent the true pixel value X_s. In RDOS encoding, the foreground and background colors are constant over 8×8 blocks, but in other MRC encoding methods, the values of the foreground and background colors can change from pixel to pixel. Next, define the scalar value α_s which determines the relative mixture of foreground and background color in the pixel X_s. More specifically, α_s is given by the value on the real line which minimizes the squared error ||X_s - (α_s F_s + (1 - α_s) B_s)||².


Figure 3.4 gives a geometric interpretation of α_s as the projection of the true pixel color onto the line connecting the foreground and background colors. The solution to this least squares approximation problem is given by

α_s = (X_s - B_s) · (F_s - B_s) / ||F_s - B_s||²

Figure 3.4: Least squares projection of the pixel color onto the line connecting the background, B_s, and foreground, F_s, colors.

Notice that when a pixel is well approximated by either the foreground or the background color, then α_s is close to 0 or 1. On the other hand, when a pixel is best approximated by a mixture of foreground and background colors, then α_s takes an intermediate value. The value of α_s will be used to control the local adaptation of the error diffusion algorithm.

Typically, MRC encoders compute the binary mask by classifying each pixel as foreground or background independently. However, the RER encoder computes the binary mask by applying a form of adaptive error diffusion to the gray scale image α_s. This effectively dithers the binary mask along character edges so that the mask can more accurately represent fine gradations in the transition between foreground and background colors. While the RER error diffusion method is similar to serpentine-scan Floyd-Steinberg error diffusion, it is specially designed and optimized to diffuse error along text edge transitions. This is done by adaptively setting the error diffusion weights at each pixel. Figure 3.5 illustrates the four future pixels s0, s1, s2, s3 which neighbor s in serpentine scan order. Then w_s0, w_s1, w_s2, w_s3 are the values of the four corresponding error diffusion weights. The values of these four weights are adapted at each pixel s.
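As a baseline for the adaptive scheme described above, here is a sketch of serpentine-scan error diffusion with the classic fixed Floyd-Steinberg weights; the RER encoder's per-pixel weight adaptation is not reproduced, since the text does not give its exact formula.

```python
# Serpentine-scan error diffusion of a grayscale alpha image to a binary
# mask. Weights are the fixed Floyd-Steinberg constants 7/16, 3/16,
# 5/16, 1/16; the scan direction flips on each row.
def serpentine_diffuse(a):
    h, w = len(a), len(a[0])
    a = [row[:] for row in a]            # work on a copy
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        rng = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        step = 1 if r % 2 == 0 else -1   # +1 on even rows, -1 on odd rows
        for c in rng:
            mask[r][c] = 1 if a[r][c] >= 0.5 else 0
            err = a[r][c] - mask[r][c]
            if 0 <= c + step < w:
                a[r][c + step] += err * 7 / 16   # pixel ahead in scan
            if r + 1 < h:
                if 0 <= c - step < w:
                    a[r + 1][c - step] += err * 3 / 16
                a[r + 1][c] += err * 5 / 16
                if 0 <= c + step < w:
                    a[r + 1][c + step] += err * 1 / 16
    return mask

alpha = [[0.5] * 4 for _ in range(4)]    # a flat mid-tone region
m = serpentine_diffuse(alpha)
print(m)   # roughly half the pixels on: dithering preserves the mean tone
```

On a mid-tone region the output alternates on and off pixels, which is exactly the intermediate-tone effect the mask dithering exploits along character edges.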

3.3.2 RER Decoder

Here we assume that the foreground and background colors are the same as those used in the RER encoder. The nonlinear predictor works by first extracting the binary mask in a 5×5 window about the pixel in question. The decoder then reconstructs the pixel from the estimated α_s as

X̂_s = α_s F_s + (1 - α_s) B_s


This data forms a binary vector, z_s, which is then used as input to a binary regression tree predictor known as Tree-Based Resolution Synthesis (TBRS). The TBRS predictor estimates the value of α_s in a two-step process. First, it classifies the vector z_s into one of M classes using a binary tree classifier. The basic idea of TBRS is to use a binary regression tree as a piecewise linear approximation to the conditional mean estimator. The classification step is essential because it can separate out the distinct regions of the document corresponding to mask edges of different orientation and shape.

One additional complication occurs with the RDOS method. Since it is not a true MRC encoder, pixels which fall outside of two-color blocks have no binary mask values. This can cause a problem when the pixel s falls near the boundary of a block, and the 5×5 window about the pixel covers part of an adjacent block that is not a two-color block. In this case, the pixels are classified as 0, 1, or 2 depending on whether they are close to the background color, the foreground color, or neither color. Then the values 0, 1, and 2 are encoded as the binary values 00, 01, and 10, to ensure that the input vector z_s remains binary.

3.3.3 Training

The objective of the training process is to optimize the performance of the RER

encoder and decoder by selecting the encoder and decoder parameters to maximize the

decoded document quality over a training set of documents. The distortion metric used to

measure document quality is mean squared error. While mean squared error is not always

a good measure of quality, for this application we found that it was always well correlated

with our subjective evaluation of quality.

The training process alternated between optimization of the encoder and decoder

parameters. So, when optimizing the encoder parameters, the previously obtained

decoder parameters were used; and when optimizing the decoder parameters, the

previously obtained encoder parameters were used. The training phases for encoder and

decoder used different sets of training data. This strategy seemed to produce more robust

training results. The iterative optimization is always started by optimizing the decoder, using the initial encoder parameters.


The decoder is the more computationally expensive component to train. The training process includes three steps: generating training vector pairs, building the regression tree, and generating the least squares prediction filters.

The encoder is optimized by searching for the value of the parameter vector which minimizes the mean squared error averaged over the set of training documents. This search is initialized at the current value of the vector and uses the decoder parameters resulting from the last optimization of the decoder. The optimization is done by sequentially perturbing the elements of the vector using an iterative coordinate descent strategy. Each perturbation is made using a step size which is initialized to the value 0.2. The decision to perturb each component or to leave it unchanged is made based on the value of the mean squared error computed with the set of training documents. If the mean squared error does not decrease with the perturbation of any element of the parameter vector, then the size of the perturbation is automatically reduced.
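The search loop can be sketched generically: the quadratic objective below stands in for the training-set mean squared error, and the halving schedule for the step reduction is an assumption, since the text does not give the exact rule.

```python
# Iterative coordinate descent: perturb each parameter by +/- step and
# keep the change only if the objective drops; if no element improves,
# shrink the step (halving here is an assumed schedule).
def coord_descent(objective, theta, step=0.2, min_step=1e-3):
    theta = list(theta)
    while step > min_step:
        improved = False
        for i in range(len(theta)):
            base = objective(theta)
            for delta in (step, -step):
                trial = theta[:]
                trial[i] += delta
                if objective(trial) < base:
                    theta, improved = trial, True
                    break
        if not improved:
            step /= 2            # reduce the perturbation size
    return theta

# Stand-in objective: squared distance to a known optimum (1.0, -0.5).
mse = lambda th: (th[0] - 1.0) ** 2 + (th[1] + 0.5) ** 2
t = coord_descent(mse, [0.0, 0.0])
print([round(x, 2) for x in t])   # close to [1.0, -0.5]
```

Since each accepted move strictly lowers the objective and the step only shrinks when nothing improves, the loop terminates with parameters within roughly one final step of the minimum.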

Figure 3.7: Comparison of compression results. (a) A portion of the original test image; (b) compressed by standard RDOS at 0.184 bpp (130:1 compression ratio); (c) compressed by RER enhanced RDOS at 0.182 bpp (132:1 compression ratio).


CHAPTER-4

PROPOSED SYSTEM

Our segmentation method is composed of two algorithms that are applied in

sequence: the cost optimized segmentation (COS) algorithm and the connected

component classification (CCC) algorithm. The COS algorithm is a block wise

segmentation algorithm based upon cost optimization. The COS produces a binary image

from a gray level or color document; however,

the resulting binary image typically contains many false text detections.

The CCC algorithm further processes the resulting binary image to improve the accuracy of the segmentation. It does this by detecting non-text components (i.e., false text detections) in a Bayesian framework which incorporates a Markov random field (MRF) model of the component labels.

(MRF) model of the component labels. One important innovation of our method is in the

design of the MRF prior model used in the CCC detection of text components. In

particular, we design the energy terms in the MRF distribution so that they adapt to attributes of the neighboring components' relative locations and appearance. By doing

this, the MRF can enforce stronger dependencies between components which are more

likely to have come from related portions of the document.

The COS algorithm is a block-based segmentation algorithm formulated as a

global cost optimization problem. The COS algorithm is comprised of two components:

blockwise segmentation and global segmentation. The blockwise segmentation divides

the input image into overlapping blocks and produces an initial segmentation for each

block. The global segmentation is then computed from the initial segmented blocks so as

to minimize a global cost function, which is carefully designed to favor segmentations

that capture text components. The parameters of the cost function are optimized in an

offline training procedure. A block diagram for COS is shown in Fig. 4.1.


Figure 4.1: The COS algorithm comprises two steps: blockwise segmentation and global segmentation. The parameters of the cost function used in the global segmentation are optimized in an offline training procedure.

4.1.1 Block wise Segmentation

Block wise segmentation is performed by first dividing the image into overlapping blocks, where each block contains M×M pixels and adjacent blocks overlap in both the horizontal and vertical directions. The blocks are indexed by (i, j), where i and j run over the number of blocks in the vertical and horizontal directions, respectively. If the height and width of the input image are not divisible by the block size, the image is padded with zeros. For each block, the color axis having the largest variance over the block is selected and stored in a corresponding gray image block. The pixels in each block are segmented into foreground ("1") or background ("0") by the clustering method of Cheng and Bouman [24]. The clustering method classifies each pixel by comparing it to a threshold t. This threshold is selected to minimize the total subclass variance. More specifically, the minimum value of the total subclass variance is given by

min over t of { N0_t σ²_0,t + N1_t σ²_1,t }


where N0_t and N1_t are the number of pixels classified as 0 and 1 by the threshold t, and σ²_0,t and σ²_1,t are the variances within each subclass (see Figure 4.2). Note that the subclass variance can be calculated efficiently. First, we create a histogram by counting the number of pixels which fall into each value between 0 and 255. For each threshold t, we can then recursively calculate the subclass statistics from the values calculated for the previous threshold t - 1.
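The threshold search can be sketched from the histogram as described above. For clarity, the sketch below scans all thresholds by brute force rather than using the recursive update; the toy pixel data is illustrative.

```python
# Select the threshold t minimizing N0*var0 + N1*var1 (the total
# subclass variance), computed from a 0-255 histogram.
def best_threshold(pixels):
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    best_t, best_cost = 0, float("inf")
    for t in range(1, 256):
        lo = [(v, n) for v, n in enumerate(hist[:t]) if n]
        hi = [(v, n) for v, n in enumerate(hist[t:], t) if n]
        cost = 0.0
        for grp in (lo, hi):
            n = sum(c for _, c in grp)
            if n == 0:
                continue
            m = sum(v * c for v, c in grp) / n
            cost += sum(c * (v - m) ** 2 for v, c in grp)  # N * variance
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Two well-separated pixel populations (dark text, light background):
pixels = [10, 12, 11, 13] * 5 + [200, 205, 198, 202] * 5
t = best_threshold(pixels)
print(t)                 # lands between the two clusters
```

Any threshold between the clusters gives the same minimal cost, so the scan returns the first such value; the recursive form in the text reaches the same answer in one histogram pass.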

Figure 4.2: Illustration of a blockwise segmentation. The pixels in each block are separated into foreground (1) or background (0) by comparing each pixel to a threshold.

4.1.2 Global Segmentation

The global segmentation step integrates the individual segmentations of each block into a single consistent segmentation of the page. To do this, we allow each block to be modified using a class assignment denoted by c_{i,j} ∈ {0, 1, 2, 3}.


Notice that for each block, the four possible values of the class correspond to four possible changes in the block's segmentation: original, reversed, all background, or all foreground. If the block class is original, the original binary segmentation of the block is retained. If the block class is reversed, the assignment of each pixel in the block is reversed. If the block class is set to all background or all foreground, the pixels in the block are set to all 0s or all 1s, respectively. Fig. 4 illustrates an example of the four possible classes, where black indicates a label of 1 (foreground) and white indicates a label of 0 (background).
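The four block modifications can be sketched compactly (Python sketch, names illustrative):

```python
def apply_block_class(block, cls):
    """Modify a binary block segmentation according to its class.

    block: list of rows of 0/1 labels.
    cls:   0 = original, 1 = reversed, 2 = all background, 3 = all foreground.
    """
    if cls == 0:
        return [row[:] for row in block]          # keep original labels
    if cls == 1:
        return [[1 - p for p in row] for row in block]  # flip every label
    fill = 0 if cls == 2 else 1
    return [[fill] * len(row) for row in block]   # constant block
```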

The class assignments are chosen by minimizing the following global cost as a function of the class assignments of all blocks (i, j). As shown, the cost function contains four terms: the first term represents the fit of the segmentation to the image pixels, and the next three terms represent regularizing constraints on the segmentation. The three weighting values are model parameters which can be adjusted to achieve the best segmentation quality. The first term is the square root of the total subclass variation within a block given the assumed segmentation. More specifically,


where the normalization is by the standard deviation of all the pixels in the block. Since the subclass variance can never exceed the total variance of the block, this term can always be reduced by choosing a finer segmentation (class original or reversed) rather than a smoother segmentation (class all background or all foreground). The second and third terms regularize the segmentation by penalizing excessive spatial variation. To compute the horizontal term, the number of segmentation mismatches between pixels in the overlapping region of a block and its horizontally adjacent block is counted, and this count is divided by the total number of pixels in the overlapping region. The vertical term is defined similarly for vertical mismatches. By minimizing these terms, the segmentation of each block is made consistent with its neighboring blocks. The fourth term is the number of pixels classified as foreground (i.e., 1) in the block divided by the total number of pixels in the block; this cost is used to ensure that most of the area of the image is classified as background. For computational tractability, the cost minimization is performed iteratively on individual rows of blocks, using a dynamic programming approach [37]. Note that the row-wise approach does not generally minimize the global cost function in one pass through the image.
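The row-wise dynamic program can be sketched as a Viterbi pass over the four classes of each block in a row. The cost arrays below are stand-ins for the data and horizontal-consistency terms described above (a hedged Python sketch, not the authors' implementation):

```python
def optimize_row(data_cost, h_cost):
    """Choose a class in {0,1,2,3} for each block of one row by a
    Viterbi-style dynamic program, minimizing per-block data costs
    plus pairwise horizontal-consistency costs.

    data_cost[j][c]  : cost of assigning class c to block j.
    h_cost[j][a][b]  : consistency cost between class a of block j
                       and class b of block j+1.
    Returns the cost-minimizing list of classes.
    """
    n = len(data_cost)
    best = [data_cost[0][c] for c in range(4)]  # best cost ending in class c
    back = []                                   # backpointers for traceback
    for j in range(1, n):
        new, ptr = [], []
        for c in range(4):
            cands = [best[p] + h_cost[j - 1][p][c] for p in range(4)]
            pmin = min(range(4), key=lambda p: cands[p])
            new.append(cands[pmin] + data_cost[j][c])
            ptr.append(pmin)
        best, back = new, back + [ptr]
    # Trace back the optimal class sequence.
    c = min(range(4), key=lambda k: best[k])
    classes = [c]
    for ptr in reversed(back):
        c = ptr[c]
        classes.append(c)
    return classes[::-1]
```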

Therefore, multiple iterations are performed from top to bottom in order to adequately incorporate the vertical consistency term. In the first iteration, the optimization of a given row incorporates the vertical term linking it only to the row above. Starting from the second iteration, the terms linking it to both the row above and the row below are included. The optimization stops when no changes occur to any of the block classes. Experimentally, the sequence of updates typically converges within 20 iterations.

The cost optimization produces a set of classes for the overlapping blocks. Since the output segmentation for each pixel is ambiguous due to the block overlap, the final COS segmentation output is specified by the center region of each overlapping block. The weighting coefficients of the cost function were found by minimizing the weighted error between the computed segmentation and the ground truth over a set of training images.


A ground truth segmentation was generated manually by creating a mask that indicates the text in the image. The weighted error criterion which we minimized is given by a weighted combination of the numbers of pixels in the missed detection and false detection categories.
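One plausible form of the weighted error criterion, assuming the omitted equation normalizes by the total pixel count (a hypothetical function, for illustration only):

```python
def weighted_error(missed, false_det, total, w):
    """Weighted segmentation error: missed detections plus false
    detections weighted by w, normalized by the total pixel count.

    This is one plausible reading of the omitted equation; the exact
    weighting used in the original work may differ.
    """
    return (missed + w * false_det) / total
```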


Figure 3.4. Illustration of how the component inversion step can correct erroneous

segmentations of text. (a) Original document before segmentation. (b) Result of COS

binary segmentation. (c) Corrected segmentation after component inversion.

The CCC algorithm refines the segmentation produced by COS by removing

many of the erroneously detected nontext components. The CCC algorithm proceeds in

three steps: connected component extraction, component inversion, and component

classification. The connected component extraction step identifies all connected

components in the COS binary segmentation using a 4-point neighborhood. In this case, connected components smaller than six pixels were ignored because they are nearly invisible


at 300 dpi resolution. The component inversion step corrects text segmentation errors that sometimes occur in COS segmentation when text is locally embedded in a highlighted region. Fig. 5(b) illustrates this type of error, where text is initially segmented as background. Notice that the text "100 Years of Engineering Excellence" is initially segmented as background due to the red surrounding region. In order to correct these errors, we first detect foreground components that contain more than eight interior background components (holes).
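The connected component extraction step above can be sketched with a simple breadth-first flood fill (Python sketch; the 4-point neighborhood and six-pixel size filter follow the text, everything else is illustrative):

```python
from collections import deque

def connected_components(bitmap, min_size=6):
    """Label 4-connected foreground components in a binary bitmap
    (list of rows of 0/1), discarding components smaller than
    min_size pixels.  Returns a list of components, each a list of
    (row, col) pixel coordinates."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] == 1 and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    # 4-point neighborhood: up, down, left, right
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:
                    comp.sort()
                    comps.append(comp)
    return comps
```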

In each case, if the total number of interior background pixels is less than half of the surrounding foreground pixels, the foreground and background assignments are inverted. Fig. 5(c) shows the result of this inversion process. Note that this type of error is a rare occurrence in the COS segmentation. The final step, component classification, is performed by extracting a feature vector for each component and then computing a MAP estimate of the component label. A feature vector is calculated for each connected component in the COS segmentation; each is a 4-D vector which describes aspects of the component, including edge depth and color uniformity. Finally, the feature vector is used to determine the class label, which takes a value of 0 for nontext and 1 for text.

Figure 6: Bayesian segmentation model for the CCC algorithm, showing the dependency between random variables.


The Bayesian segmentation model used for the CCC algorithm is shown in Fig. 6.

The conditional distribution of the feature vector given its class label is modeled by a multivariate Gaussian mixture, while the underlying true segmentation labels are modeled by an MRF. Using this model, we classify each component by calculating the MAP estimate of the labels given the feature vectors. In order to do this, we first determine which components are neighbors in the MRF, based upon the geometric distance between components on the page.

4.2.1 Statistical Model

Here, we describe more details of the statistical model used for the CCC

algorithm. The feature vectors for the text and nontext groups are modeled as D-dimensional multivariate Gaussian mixture distributions. In order to simplify the data model, we also assume that the feature vectors are conditionally independent given the associated class labels.

The components of the feature vectors include measurements of edge depth and

external color uniformity of the connected component. The edge depth is defined as the

Euclidean distance between RGB values of neighboring pixels across the component

boundary (defined in the initial COS segmentation). The color uniformity is associated

with the variation of the pixels outside the boundary. In this experiment, we defined a feature vector with four components, where the first two are the mean and variance of the edge depth and the last two are the variance and range of the external pixel values. More


details are provided in the Appendix. To use an MRF, we must define a neighborhood system. To do this, we first find the pixel location at the center of mass of each connected component. Then, for each component, we search outward in a spiral pattern until the prescribed number of nearest neighbors is found. This number is determined in an offline training process along with the other model parameters, and the resulting set of neighbors of each connected component defines its MRF neighborhood. To ensure all neighbors are mutual, if one component is a neighbor of a second component, the second is added to the neighbor list of the first if this is not already the case. In order to specify the distribution of the MRF, we first define augmented feature vectors.
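The neighbor search and symmetrization can be sketched as below; a plain distance sort stands in for the spiral search described in the text (function and variable names are illustrative):

```python
import math

def mutual_neighbors(centers, k):
    """Find the k nearest neighbors of each component center (by
    Euclidean distance), then symmetrize the relation so that all
    neighbor pairs are mutual, as required by the MRF neighborhood
    system.  Returns a list of neighbor-index sets."""
    n = len(centers)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(centers[i], centers[j]))
        nbrs.append(set(order[:k]))
    for i in range(n):
        for j in list(nbrs[i]):
            nbrs[j].add(i)  # ensure mutuality: i is also j's neighbor
    return nbrs
```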

The augmented feature vector for a connected component consists of the feature vector concatenated with the horizontal and vertical pixel location of the component's center. We found the location of connected components to be extremely valuable contextual information for text detection. For more details of the augmented feature vector, see the Appendix. Next, we define a measure of dissimilarity Di,j between connected components in terms of the Mahalanobis distance of the augmented feature vectors, where the required covariance matrix is estimated from the augmented feature vectors of training data.
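Assuming the dissimilarity takes the standard Mahalanobis form, it can be sketched as follows (the inverse covariance matrix would be estimated from training data; names are illustrative):

```python
def mahalanobis(z1, z2, cov_inv):
    """Mahalanobis distance between two augmented feature vectors,
    given the inverse covariance matrix as a list of rows:
    sqrt((z1 - z2)^T * cov_inv * (z1 - z2))."""
    d = [a - b for a, b in zip(z1, z2)]
    # quadratic form d^T * cov_inv * d
    q = sum(d[i] * cov_inv[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return q ** 0.5
```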


Using the defined neighborhood system, we adopted an MRF model with pairwise cliques, where the cliques are the pairs of neighboring connected components. The class labels are then assumed to follow a pairwise Gibbs distribution in which S(.) is an indicator function taking the value 0 or 1, and the remaining scalar quantities are parameters of the MRF model. As can be seen, the classification probability is penalized by the number of neighboring pairs which have different classes. This number is also weighted by a term that becomes large when a similar neighbor lies close to a given component, i.e., when Di,j is small. This favors increasing the probability that two similar, nearby neighbors have the same class, as a function of the distance between the two components.


One scalar term of the model controls the balance among text detections, false detections, and missed detections; it may take a positive or negative value. If it is positive, both text detections and false detections increase; if it is negative, false and missed detections are reduced. To find an approximate solution of (12), we use iterative conditional modes (ICM), which sequentially minimizes the local posterior probabilities [38], [39]. The classification labels are initialized with their maximum likelihood (ML) estimates, and then the ICM procedure iterates through the set of classification labels until a stable solution is reached.
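The ICM procedure can be sketched generically; `local_cost` is a stand-in for the local posterior term of the model (illustrative, not the authors' implementation):

```python
def icm(labels, local_cost, max_iter=100):
    """Iterative conditional modes: starting from initial labels
    (e.g., ML estimates), repeatedly set each binary label to the
    value minimizing its local cost given all current labels, and
    stop when a full sweep changes nothing.

    local_cost(i, value, labels) -> cost of assigning `value` to
    component i with the other labels held fixed.
    """
    labels = list(labels)
    for _ in range(max_iter):
        changed = False
        for i in range(len(labels)):
            best = min((0, 1), key=lambda v: local_cost(i, v, labels))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # stable solution reached
    return labels
```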

In order to improve accuracy in the detection of text with varying size, we

incorporated a multiscale framework into the COS/CCC segmentation algorithm. The

multiscale framework allows us to detect both large and small components by combining

results from different resolutions. Since the COS algorithm uses a single block size (i.e.,

single scale), we found that large blocks are typically better suited for detecting large

text, and small blocks are better suited for small text. In order to improve the detection of

both large and small text, we use a multiscale segmentation scheme which uses the results of coarse-scale segmentations to guide segmentation on finer scales. Note that both COS and CCC segmentations are performed on each scale; however, only COS is adapted to the multiscale scheme.
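The coarse-to-fine pass can be sketched as a simple driver loop; `segment_scale` is a hypothetical stand-in for one single-scale segmentation at a given block size:

```python
def multiscale_cos(image, block_sizes, segment_scale):
    """Coarse-to-fine driver: run the (hypothetical) single-scale
    segmentation at each scale, passing the previous coarser result
    so that the cost function can include a consistency term with it.

    block_sizes: largest (coarsest) first, e.g. [64, 32, 16],
                 corresponding to scales L-1 ... 0.
    segment_scale(image, block_size, prev) -> segmentation result.
    """
    prev = None  # no coarser result at the coarsest scale
    for block_size in block_sizes:
        prev = segment_scale(image, block_size, prev)
    return prev
```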


Figure: The multiscale-COS/CCC scheme. Segmentation progresses from coarse to fine scales, incorporating the segmentation result from the previous coarser scale. Both COS and CCC are performed on each scale; however, only COS was adapted to the multiscale scheme.

An overview of our multiscale-COS/CCC scheme follows. In the multiscale scheme, segmentation progresses from coarse to fine scales, where the coarser scales use larger block sizes and the finer scales use smaller block sizes. Each scale is numbered from L-1 to 0, where L-1 is the coarsest scale and 0 is the finest scale. The COS algorithm is modified to use a different block size for each scale, and it incorporates the previous coarser segmentation result by adding a new term to the cost function. The weighting factor of the error for false detection was fixed to 0.5 for the multiscale-COS/CCC training process.


The MRF-based binary segmentation used for the comparison is based upon the MRF statistical model developed by Zheng and Doermann. The purpose of their algorithm is to classify each component as noise, hand-written text, or machine-printed text from binary image inputs. Due to the complexity of implementation, we used a modified version of the CCC algorithm incorporating their model, by simply replacing our MRF classification model with their MRF noise classification model. The multiscale COS algorithm was applied without any change. The clique frequencies of their model were calculated through offline training on a training data set, and the other parameters were set as proposed in their paper.

4.5.1 Preprocessing

For consistency, all scanner outputs were converted to RGB color coordinates and descreened before segmentation. The scanned RGB values were first converted to an intermediate device-independent color space, CIE XYZ, and then transformed to RGB. Then, resolution-synthesis-based denoising (RSD) was applied.

This descreening procedure was applied to all of the training and test images. For

a fair comparison, the test images which were fed to other commercial segmentation

software packages were also descreened by the same procedure.

4.5.2 Segmentation Accuracy and Bit rate

To measure the segmentation accuracy of each algorithm, we used a set of

scanned documents along with corresponding ground truth segmentations. First, 38

documents were chosen from different document types, including flyers, newspapers, and

magazines. The documents were separated into 17 training images and 21 test images,

and then each document was scanned at 300 dots per inch (dpi) resolution on the EPSON

STYLUS PHOTO RX700 scanner. After manually segmenting each of the scanned

documents into text and nontext to create ground truth segmentations, we used the

training images to train the algorithms, as described in the previous sections. The

remaining test images were used to verify the segmentation quality. We also scanned the


test documents on two additional scanners: the HP Photosmart 3300 All-in-One series and the Samsung SCX-5530 FN.

These test images were used to examine the robustness of the algorithms to scanner variations. Fig. 4.9 illustrates segmentations generated by Otsu/CCC, multiscale-COS/CCC/Zheng, DjVu, LuraDocument, COS, COS/CCC, and multiscale-COS/CCC for a 300 dpi test image. The ground truth segmentation is also shown. This test image contains many complex features such as different color text, light-color text on a dark background, and various sizes of text. As shown, COS accurately detects most text components, but the number of false detections is quite large.

However, COS/CCC eliminates most of these false detections without significantly sacrificing text detection. In addition, multiscale-COS/CCC generally detects both large and small text with minimal false component detection. The Otsu/CCC method misses many text detections. LuraDocument is very sensitive to sharp edges embedded in picture regions and detects a large number of false components. DjVu also detects some false components, but the error is less severe than with LuraDocument. The multiscale-COS/CCC/Zheng result is similar to our multiscale-COS/CCC result, but our text detection error is slightly lower.


Figure 4.9: Segmentation results for Otsu/CCC, multiscale-COS/CCC/Zheng, DjVu, LuraDocument, COS, COS/CCC, and multiscale-COS/CCC. (a) Original test image. (b) Ground truth segmentation. (c) Otsu/CCC. (d) Multiscale-COS/CCC/Zheng. (e) DjVu. (f) LuraDocument. (g) COS. (h) COS/CCC. (i) Multiscale-COS/CCC.

CHAPTER-5

SOFTWARE REQUIREMENTS

The main tools required for this project can be classified into two broad categories:

1) Hardware requirements,

2) Software requirements.

On the hardware side, a normal computer on which the MATLAB software can be easily operated is required, i.e., a minimum system configuration of 512 MB RAM, a 20 GB hard disk, and a Pentium III processor.

On the software side, the MATLAB software and the document image which is to be segmented are the minimum requirements. Some of the benefits of MATLAB in image processing are:


Built-in functions for complex operations and algorithms (e.g., FFT, DCT, etc.)

MATLAB is a high-performance language for technical computing. It integrates

computation, visualization, and programming in an easy-to-use environment where

problems and solutions are expressed in familiar mathematical notation. Typical uses

include:

Math and computation

Algorithm development

Data analysis, exploration, and visualization

Scientific and engineering graphics

MATLAB is an interactive system whose basic data element is an array that does

not require dimensioning. This allows solving many technical computing problems,

especially those with matrix and vector formulations.

The name MATLAB stands for matrix laboratory. MATLAB was originally

written to provide easy access to matrix software developed by the LINPACK and

EISPACK projects. Today, MATLAB uses software developed by the LAPACK and

ARPACK projects, which together represent the state-of-the-art in software for matrix

computation.

MATLAB has evolved over a period of years with input from many users. In

university environments, it is the standard instructional tool for introductory and

advanced courses in mathematics, engineering, and science. In industry, MATLAB is the

tool of choice for high-productivity research, development, and analysis.


Toolboxes, which are very important to most users of MATLAB, allow learning and applying

(M-files) that extend the MATLAB environment to solve particular classes of problems.

Areas in which toolboxes are available include signal processing, control systems, neural

networks, fuzzy logic, wavelets, simulation, and many others.

5.2.2 Overview

The MATLAB system consists of five main parts:

A .Development Environment

This is the set of tools and facilities that help to use MATLAB functions and files.

Many of these tools are graphical user interfaces. It includes the MATLAB desktop and

Command Window, a command history, and browsers for viewing help, the workspace,

files, and the search path.

B. The MATLAB Mathematical Function Library

This is a vast collection of computational algorithms ranging from elementary

functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions

like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

C. The MATLAB Language

This is a high-level matrix/array language with control flow statements, functions,

data structures, input/output, and object-oriented programming features. It allows both

"programming in the small" to rapidly create quickly and dirty throwaway programs, and

"programming in the large" to create complete large and complex application programs.

D. Handle Graphics

This is the MATLAB graphics system. It includes high-level commands for two-dimensional and three-dimensional data visualization, image processing, animation, and

presentation graphics. It also includes low-level commands that allow you to fully customize the appearance of graphics, as well as to build complete graphical user interfaces for MATLAB applications.

E. The MATLAB Application Program Interface (API)


This is a library that allows you to write C and FORTRAN programs that interact with MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling MATLAB as a computational engine, and reading and writing MAT-files.

5.2.3 Basic programming

This part provides a brief introduction to starting and quitting MATLAB, and to the tools and functions that help you work with MATLAB variables and files.

A. Starting MATLAB

On a Microsoft Windows platform, to start MATLAB, double-click the MATLAB shortcut icon. On a UNIX platform, to start MATLAB, type matlab at the operating system prompt. You can change the directory in which MATLAB starts, define startup options including running a script upon startup, and reduce startup time in some situations.

B. Quitting MATLAB

To end a MATLAB session, select Exit MATLAB from the File menu in the desktop, or type quit in the Command Window. To execute specified functions each time MATLAB quits, such as saving the workspace, create and run a finish.m script.

C. MATLAB Desktop

When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user interfaces) for managing files, variables, and applications associated with MATLAB. The first time MATLAB starts, the desktop appears as shown in the following illustration, although your Launch Pad may contain different entries.

You can change the way the desktop looks by opening, closing, moving, and resizing the tools in it. You can also move tools outside of the desktop or return them back inside the desktop (docking). All the desktop tools provide common features such as context menus and keyboard shortcuts. By selecting Preferences from the File menu, you can also specify certain characteristics for the desktop tools, for example, the font characteristics for Command Window text.


D. Desktop Tools

This section provides an introduction to MATLAB's desktop tools. MATLAB functions can also be used to perform most of the features found in the desktop tools. The tools are:

Command Window

Command History

Launch Pad

Help Browser

Workspace Browser

Array Editor

Editor/Debugger

i. Command Window

Use the Command Window to enter variables and run functions and M-files.

ii. Command History

The lines entered in the Command Window are logged in the Command History window. In the Command History, you can view previously used functions, and copy and execute selected lines. To save the input and output from a MATLAB session to a file, use the diary function.

iii. Running External Programs

MATLAB can also be used to run external programs from the Command Window.

The exclamation point character ! is a shell escape and indicates that the rest of the input line is a command to the operating system. This is useful for invoking utilities or running other programs without quitting MATLAB. On Linux, for example, !emacs magik.m invokes an editor called emacs for a file named magik.m. When you quit the external program, the operating system returns control to MATLAB.


iv. Launch Pad

MATLAB's Launch Pad provides easy access to tools, demos, and documentation.

v. Help Browser

Use the Help browser to search and view documentation for all the MathWorks products. The Help browser is a Web browser integrated into the MATLAB desktop that displays HTML documents.

To open the Help browser, click the help button in the toolbar, or type helpbrowser in the Command Window. The Help browser consists of two panes: the Help Navigator, which is used to find information, and the display pane, which is used to view the information.

vi. Help Navigator

Product filter - Set the filter to show documentation only for the products

specified.

Contents tab - View the titles and tables of contents of documentation for

products.

Index tab - Find specific index entries (selected keywords) in the MathWorks documentation.

Search tab - Look for a specific phrase in the documentation. To get help for a

specific function, set the Search type to Function Name.

vii. Display Pane

After finding documentation using the Help Navigator, view it in the display pane.

Browse to other pages - Use the arrows at the tops and bottoms of the pages, or use the back and forward buttons in the toolbar.


Find a term in the page - Type a term in the Find in page field in the toolbar and click Go.

Other features available in the display pane are copying information, evaluating a selection, and viewing Web pages.

viii. Search Path

To determine how to execute the functions you call, MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories on your file system. Any file to be run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path.

ix. Workspace Browser

The MATLAB workspace consists of the set of variables (named arrays) built up

during a MATLAB session and stored in memory. Variables are added to the workspace

by using functions, running M-files, and loading saved workspaces.

To view the workspace and information about each variable, use the Workspace browser, or use the functions who and whos. To delete variables from the workspace, select the variable and select Delete from the Edit menu. Alternatively, use the clear function.

To save the workspace to a file that can be read during a later MATLAB session,

select Save Workspace As from the File menu, or use the save function. This saves the

workspace to a binary file called a MAT-file, which has a .mat extension. There are

options for saving to different formats. To read in a MAT-file, select Import Data from the

File menu, or use the load function.

5.2.4 Array Editor


Use the Array Editor to view and edit a visual representation of one- or two-dimensional

numeric arrays, strings, and cell arrays of strings that are in the workspace.

5.2.5 Editor/Debugger

Use the Editor/Debugger to create and debug M-files, which are programs you write to run MATLAB functions. The Editor/Debugger provides a graphical user interface for basic text editing, as well as for M-file debugging. You can use any text editor to create M-files, such as Emacs, and use preferences (accessible from the desktop File menu) to specify that editor as the default. If you use another editor, you can still use the MATLAB Editor/Debugger for debugging, or use debugging functions, such as dbstop, which sets a breakpoint.

If you need to view the contents of an M-file, display it in the Command Window by using the type function.

5.2.6 Manipulating Matrices

Entering Matrices

The best way to get started with MATLAB is to learn how to handle matrices. Start MATLAB and follow along with each example. You can enter matrices into MATLAB in several different ways. Start by entering Dürer's matrix as a list of its elements. To do this, follow a few basic conventions: separate the elements of a row with blanks or commas, use a semicolon ; to indicate the end of each row, and surround the entire list of elements with square brackets [ ].


A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

MATLAB displays the matrix just entered:

A =

    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1

This exactly matches the numbers in the engraving. Once the matrix is entered, it is automatically remembered in the MATLAB workspace. It can now be referred to simply as A.

5.2.7 Expressions

Like most other programming languages, MATLAB provides mathematical

expressions, but unlike most programming languages, these expressions involve entire

matrices. The building blocks of expressions are:

Variables

Numbers

Operators

Functions

5.2.8 Variables

MATLAB does not require any type declarations or dimension statements. When

MATLAB encounters a new variable name, it automatically creates the variable and

allocates the appropriate amount of storage. If the variable already exists, MATLAB

changes its contents and, if necessary, allocates new storage. For example, num_students


= 25 creates a 1-by-1 matrix named num_students and stores the value 25 in its single

element. Variable names consist of a letter, followed by any number of letters, digits, or

underscores. MATLAB uses only the first 31 characters of a variable name. MATLAB is

case sensitive; it distinguishes between uppercase and lowercase letters. A and a are not

the same variable. To view the matrix assigned to any variable, simply enter the variable

name.

5.2.9 Numbers

MATLAB uses conventional decimal notation, with an optional decimal point and

leading plus or minus sign, for numbers. Scientific notation uses the letter e to specify a

power-of-ten scale factor. Imaginary numbers use either i or j as a suffix. Some examples

of legal numbers are

3            -99           0.0001
9.6397238    1.60210e-20   6.02252e23
1i           -3.14159j     3e5i

All numbers are stored internally using the long format specified by the IEEE floating-point standard. Floating-point numbers have a finite precision of roughly 16 significant decimal digits and a finite range of roughly 10^-308 to 10^+308.

5.2.10 Operators

Expressions use familiar arithmetic operators and precedence rules.

+

Addition

Subtraction

Multiplication

Division

64

Using MATLAB

Using MATLAB)

Power

'

()

5.3 Functions

MATLAB provides a large number of standard elementary mathematical

functions, including abs, sqrt, exp, and sin. Taking the square root or logarithm of a

negative number is not an error; the appropriate complex result is produced

automatically. MATLAB also provides many more advanced mathematical functions,

including Bessel and gamma functions. Most of these functions accept complex

arguments. For a list of the elementary mathematical functions, type

help elfun

For a list of more advanced mathematical and matrix functions, type

help specfun

help elmat

Some of the functions, like sqrt and sin, are built-in. They are part of the

MATLAB core so they are very efficient, but the computational details are not readily

accessible. Other functions, like gamma and sinh, are implemented in M-files. The code

can be seen and even can be modified. Several special functions provide values of useful

constants.

pi         3.14159265...
i          Imaginary unit, sqrt(-1)
j          Same as i
eps        Floating-point relative precision
realmin    Smallest floating-point number
realmax    Largest floating-point number
Inf        Infinity
NaN        Not-a-number

CHAPTER-6

EXPERIMENTAL RESULTS


CHAPTER-7

CONCLUSION

We presented a novel segmentation algorithm for the compression of raster documents. While the COS algorithm generates consistent initial segmentations, the CCC algorithm substantially reduces false detections through the use of a component-wise MRF context model. The MRF model uses a pair-wise Gibbs distribution which more heavily weights nearby components with similar features. We showed that the COS/CCC algorithm achieves greater text detection accuracy with a lower false detection rate than the commercial and published algorithms used for comparison. The resulting segmentations are also potentially useful for document processing applications such as OCR.

REFERENCES

[1] ITU-T Recommendation T.44, "Mixed Raster Content (MRC)," International Telecommunication Union, 1999.

[2] G. Nagy, S. Seth, and M. Viswanathan, "A prototype document image analysis system for technical journals," Computer, vol. 25, no. 7, pp. 10-22, 1992.

[3] K. Y. Wong and F. M. Wahl, "Document analysis system," IBM J. Res. Develop., vol. 26, pp. 647-656, 1982.

[4] J. Fisher, "A rule-based system for document image segmentation," in Proc. 10th Int. Conf. Pattern Recognit., 1990, pp. 567-572.

[5] L. O'Gorman, "The document spectrum for page layout analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1162-1173, Nov. 1993.

[6] Y. Chen and B. Wu, "A multi-plane approach for text segmentation of complex document images," Pattern Recognit., vol. 42, no. 7, pp. 1419-1444, 2009.

[7] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Upper Saddle River, NJ: Pearson Education, 2008.
