Text Segmentation

Text Segmentation of MRC Document Compression and Decompression by
Using MATLAB
CHAPTER 1
INTRODUCTION
1.1 Introduction of Image Processing
Modern digital technology has made it possible to manipulate multi-dimensional
signals with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three categories:
Image Processing: image in, image out
Image Analysis: image in, measurements out
Image Understanding: image in, high-level description out
We will focus on the fundamental concepts of image processing. Space does not permit
us to make more than a few introductory remarks about image analysis. Image
understanding requires an approach that differs fundamentally from the theme of this
book. Further, we will restrict ourselves to two-dimensional (2D) image processing
although most of the concepts and techniques that are to be described can be extended
easily to three or more dimensions.
1.2 The Image Processing System
Dept. Of E.C.E, SIETK, PUTTUR

Figure 1.1: Image processing system (blocks: digitizer, mass storage, image processor, digital computer, display, operator console, hard copy device)

1.2.1 Digitizer
Digitizing, or digitization, is the conversion of a typically analog object, image, or
signal into digital form. A digitizer is, by definition, a device used to convert analog
signals into digital signals.
In a cell phone, for example, the digitizer is the glass that covers the LCD (sometimes
called the LCD digitizer). It converts touch actions (press, swipe, etc.) into a digital
signal that the phone understands. The data from the digitizer is transferred to the phone
by the attached digitizer flex cable (or flex ribbon), which should be included with a
replacement LCD digitizer.
It is difficult to preserve the integrity of the flex cable when a digitizer is
replaced. An LCD assembly, excluding small parts and frames, has three major components:
(1) the digitizer (front glass), (2) the digitizer flex cable (which should be attached to
the glass), and (3) the LCD display unit. These components can be purchased separately or
together, although purchasing the digitizer without its flex cable is not recommended.
Mounting the flex cable can be tricky on some phones and, if not done correctly, results
in poor functionality of the touch screen.
Figure 1.2: Digitizer

1.2.2 Image processing
An image processor performs image acquisition, storage, preprocessing,
segmentation, representation, recognition, and interpretation, and finally displays or
records the resulting image. The following block diagram shows the fundamental steps
involved in an image processing system. Each of these fundamental steps may in turn have
sub-steps. The fundamental steps are described below.

Figure 1.3: Fundamental steps of image processing (blocks: image acquisition, preprocessing, segmentation, representation and description, knowledge base, recognition and interpretation)

(i) Image Acquisition: This is the first step or process of the fundamental steps of digital
image processing. Image acquisition could be as simple as being given an image that is
already in digital form. Generally, the image acquisition stage involves preprocessing,
such as scaling.
(ii) Image Enhancement: Image enhancement is among the simplest and most appealing
areas of digital image processing. Basically, the idea behind enhancement techniques is to
bring out detail that is obscured, or simply to highlight certain features of interest in an
image, for example by changing brightness and contrast.
(iii) Image Restoration: Image restoration is an area that also deals with improving
the appearance of an image. However, unlike enhancement, which is subjective, image
restoration is objective, in the sense that restoration techniques tend to be based on
mathematical or probabilistic models of image degradation.
(iv) Color Image Processing: Color image processing is an area that has been gaining
importance because of the significant increase in the use of digital images over the
Internet. It includes color modeling and processing in the digital domain.
(v) Wavelets and Multiresolution Processing: Wavelets are the foundation for
representing images at various degrees of resolution. Images are subdivided successively
into smaller regions for data compression and for pyramidal representation.

(vi) Compression: Compression deals with techniques for reducing the storage required
to save an image or the bandwidth required to transmit it. Compression is particularly
important for data sent over the Internet.
(vii) Morphological Processing: Morphological processing deals with tools for extracting
image components that are useful in the representation and description of shape.
(viii) Segmentation: Segmentation procedures partition an image into its constituent parts
or objects. In general, autonomous segmentation is one of the most difficult tasks in
digital image processing. A rugged segmentation procedure brings the process a long way
toward successful solution of imaging problems that require objects to be identified
individually.
(ix) Representation and Description: Representation and description almost always
follow the output of a segmentation stage, which usually is raw pixel data, constituting
either the boundary of a region or all the points in the region itself. Choosing a
representation is only part of the solution for transforming raw data into a form suitable
for subsequent computer processing. Description deals with extracting attributes that
result in some quantitative information of interest or are basic for differentiating one class
of objects from another.
(x) Object Recognition: Recognition is the process that assigns a label, such as "vehicle",
to an object based on its descriptors.
(xi) Knowledge Base: Knowledge may be as simple as detailing regions of an image
where the information of interest is known to be located, thus limiting the search that has
to be conducted in seeking that information. The knowledge base also can be quite
complex, such as an interrelated list of all major possible defects in a materials inspection
problem or an image database containing high-resolution satellite images of a region in
connection with change-detection applications.
1.2.3 Digital computer
A digital computer is an electronic computer in which the input is discrete rather than
continuous, consisting of combinations of numbers, letters, and other characters written
in an appropriate programming language and represented internally in binary notation. It
contrasts with an analog computer, which operates on continuous quantities.

1.2.4 Mass storage

Mass storage refers to various techniques and devices for storing large amounts of data. The
earliest storage devices were punched paper cards, which were used as early as 1804 to
control silk-weaving looms. Modern mass storage devices include all types of disk drives
and tape drives. Mass storage is distinct from memory, which refers to temporary storage
areas within the computer. Unlike main memory, mass storage devices retain data even
when the computer is turned off.
1.2.5 Hard copy device
Hard copy devices are those that produce output in tangible form. Printers and
plotters are two common hard copy devices.
1.2.6 Operator console
The operator console consists of equipment and arrangements for verifying
intermediate results and for altering the software as and when required. The operator
can also check for any resulting errors and enter the requisite data.
1.3 APPLICATIONS OF DIGITAL IMAGE PROCESSING

Some of the major fields in which digital image processing is widely used are mentioned
below:
Image sharpening and restoration
Medical field
Remote sensing
Transmission and encoding
Machine/Robot vision
Color processing
Pattern recognition
Video processing
Microscopic Imaging
Others

1.3.1 Image sharpening and restoration

Image sharpening and restoration refers to processing images captured by a
camera in order to improve them or to manipulate them to achieve a desired result, as
photo-editing software such as Photoshop does. This includes zooming, blurring,
sharpening, grayscale-to-color conversion and vice versa, edge detection, image
retrieval, and image recognition.
1.3.2 Medical field
Common applications of digital image processing (DIP) in the medical field are:
1. Gamma ray imaging
2. PET scan
3. X Ray Imaging
4. Medical CT
5. UV imaging
1.3.3 Transmission and encoding
The first image transmitted over a wire was sent from London to
New York via a submarine cable. Today, live video feeds or live CCTV footage can be
viewed from one continent to another with a delay of only seconds, which shows how
much work has been done in this field. The field focuses not only on transmission but
also on encoding. Many different formats have been developed to encode photos for high
or low bandwidth and then stream them over the Internet.
1.3.4 Machine/Robot vision
Among the many challenges that robots face today, one of the biggest
is still to improve robot vision: making a robot able to see things, identify them,
identify hurdles, and so on. Much work has been contributed in this area, and the
separate field of computer vision has emerged to address it.
1.3.5 Hurdle detection
Hurdle detection is a common task performed through image
processing: different types of objects in the image are identified, and the
distance between the robot and each hurdle is then calculated.

1.3.6 Line follower robot

Many robots work by following a line and are therefore called line
follower robots. This helps a robot stay on its path and perform its tasks. Line
following has also been achieved through image processing.
1.3.7 Color processing
Color processing includes the processing of color images and the different color spaces
that are used, for example the RGB color model, YCbCr, and HSV. It also involves studying
the transmission, storage, and encoding of these color images.
1.3.8 Pattern recognition
Pattern recognition draws on image processing and on various other
fields, including machine learning (a branch of artificial intelligence). In pattern
recognition, image processing is used to identify the objects in an image, and
machine learning is then used to train the system to handle changes in pattern. Pattern
recognition is used in computer-aided diagnosis, handwriting recognition, image
recognition, and so on.
1.3.9 Video processing
A video is a rapid sequence of pictures. The quality of the
video depends on the number of frames per second and the quality of each frame
being used. Video processing involves noise reduction, detail enhancement, motion
detection, frame rate conversion, aspect ratio conversion, color space conversion, and so on.
1.4 Segmentation
The division of an image into meaningful structures, image segmentation, is often
an essential step in image analysis, object representation, visualization, and many other
image processing tasks. Methods for analyzing and representing an object generally
assume that the group of pixels identifying that object is known beforehand;
segmentation methods find the particular pixels that make up an object. A great variety
of segmentation methods has been proposed in the past decades, and some categorization
is necessary to present the methods properly here. A disjunct categorization does not
seem to be possible, though, because even two very different segmentation approaches
may share properties that defy singular categorization. The categorization presented
here is therefore a categorization by the emphasis of an approach rather than a strict
division. The following categories are used:
Threshold based segmentation. Histogram thresholding and slicing techniques
are used to segment the image. They may be applied directly to an image, but can
also be combined with pre- and post-processing techniques.

Edge based segmentation. With this technique, detected edges in an image are
assumed to represent object boundaries, and used to identify these objects.

Region based segmentation. Where an edge based technique may attempt to find
the object boundaries and then locate the object itself by filling them in, a region
based technique takes the opposite approach, by (e.g.) starting in the middle of an
object and then growing outward until it meets the object boundaries.
Clustering techniques. Although clustering is sometimes used as a synonym for
(agglomerative) segmentation techniques, we use it here to denote techniques that
are primarily used in exploratory data analysis of high-dimensional measurement
patterns. In this context, clustering methods attempt to group together patterns
that are similar in some sense. This goal is very similar to what we are attempting
to do when we segment an image, and indeed some clustering techniques can
readily be applied for image segmentation.

Matching. When we know what an object we wish to identify in an image
(approximately) looks like, we can use this knowledge to locate the object in an
image. This approach to segmentation is called matching.
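
The region based idea described above (start inside an object and grow outward until the boundary is met) can be illustrated with a minimal Python sketch. The function name `region_grow`, the 4-connected neighbourhood, and the fixed intensity tolerance relative to the seed are assumptions made for illustration; they are not taken from a specific method in this report.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected pixels whose
    intensity differs from the seed intensity by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# A bright 2x2 object on a dark background (grayscale values 0-255).
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10],
    [10, 205, 198, 10, 10],
    [10, 10, 10, 10, 10],
]
obj = region_grow(image, seed=(1, 1), tolerance=20)
```

Here the region stops growing as soon as the intensity jump to the background exceeds the tolerance, which plays the role of the object boundary.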

Figure 1.4: Example of Segmentation
1.5 Text Segmentation

Text segmentation is the process of dividing written text into meaningful units,
such as words, sentences, or topics. The term applies both to mental processes used by
humans when reading text, and to artificial processes implemented in computers, which
are the subject of natural language processing. The problem is non-trivial, because while
some written languages have explicit word boundary markers, such as the word spaces of
written English and the distinctive initial, medial and final letter shapes of Arabic, such
signals are sometimes ambiguous and not present in all written languages. Compare
speech segmentation, the process of dividing speech into linguistically meaningful
portions.
1.5.1 Segmentation problems
A. Word segmentation
Word segmentation is the problem of dividing a string of written language into its
component words. In English and many other languages using some form of the Latin
alphabet, the space is a good approximation of a word divider (word delimiter).
However, the equivalent of this character is not found in all written scripts, and without it
word segmentation is a difficult problem.
Languages which do not have a trivial word segmentation process include
Chinese and Japanese, where sentences but not words are delimited; Thai and Lao, where
phrases and sentences but not words are delimited; and Vietnamese, where syllables but
not words are delimited.
In some writing systems however, such as the Ge'ez script used for Amharic and
Tigrinya among other languages, words are explicitly delimited (at least historically) with
a non-whitespace character. The Unicode Consortium has published a Standard Annex on
Text Segmentation, exploring the issues of segmentation in multiscript texts. Word
splitting is the process of parsing concatenated text (i.e. text that contains no spaces or
other word separators) to infer where word breaks exist. Word splitting may also refer to
the process of hyphenation.
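
As an illustration of word splitting, the following minimal Python sketch infers word breaks in concatenated text by dynamic programming over a known vocabulary. The function name `split_words` and the toy vocabulary are hypothetical, and a dictionary lookup is only one of several possible strategies; practical systems usually add statistical scoring on top.

```python
def split_words(text, vocabulary):
    """Return one way to split `text` into vocabulary words, or None."""
    # best[i] holds a list of words covering text[:i], or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        # Scanning `start` from the left prefers the longest final word.
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[len(text)]

vocabulary = {"text", "segment", "segmentation", "is", "hard", "at", "ion"}
words = split_words("textsegmentationishard", vocabulary)
```

The vocabulary deliberately allows both "segmentation" and "segment" + "at" + "ion", which shows why the problem is ambiguous even with a dictionary.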
B. Sentence segmentation
Sentence segmentation is the problem of dividing a string of written language into
its component sentences. In English and some other languages, using punctuation,
particularly the full stop character, is a reasonable approximation. However, even in
English this problem is not trivial due to the use of the full stop character for
abbreviations, which may or may not also terminate a sentence. For example, "Mr." is not its
own sentence in "Mr. Smith went to the shops in Jones Street." When processing plain
text, tables of abbreviations that contain periods can help prevent incorrect assignment of
sentence boundaries. As with word segmentation, not all written languages contain
punctuation characters which are useful for approximating sentence boundaries.
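
The abbreviation-table approach can be sketched in a few lines of Python. The table `ABBREVIATIONS` and the function `split_sentences` are hypothetical names used only for this illustration, and the sketch assumes whitespace-separated tokens.

```python
# A small, illustrative abbreviation table; a real table would be far larger.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "e.g.", "i.e."}

def split_sentences(text):
    """Split on tokens ending in '.', '?' or '!' unless the token is a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_with_stop = token.endswith((".", "?", "!"))
        if ends_with_stop and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a terminator
        sentences.append(" ".join(current))
    return sentences

text = "Mr. Smith went to the shops in Jones Street. He bought bread."
sentences = split_sentences(text)
```

This simple rule still fails when an abbreviation also ends a sentence, which is the ambiguity noted above.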
C. Topic segmentation
Topic analysis consists of two main tasks: topic identification and text
segmentation. While the first is a simple classification of a specific text, the latter
implies that a document may contain multiple topics, and the task of computerized text
segmentation may be to discover these topics automatically and segment the text
accordingly. The topic boundaries may be apparent from section titles and paragraphs. In
other cases, one needs to use techniques similar to those used in document
classification. Segmenting the text into topics or discourse turns can be useful in some
natural language processing tasks: it can improve information retrieval or speech
recognition significantly (by indexing or recognizing documents more precisely, or by
returning the specific part of a document corresponding to the query). It is also needed in
topic detection and tracking systems and in text summarization. Many different
approaches have been tried, e.g. hidden Markov models (HMM), lexical chains, passage
similarity using word co-occurrence, and clustering. The task is quite ambiguous: people
evaluating text segmentation systems often disagree on topic boundaries. Hence,
evaluation is a difficult problem too.
D. Other segmentation problems
Processes may be required to segment text into units other than those mentioned above,
including morphemes (a task usually called morphological analysis) or paragraphs.
E. Automatic segmentation approaches
Automatic segmentation is the problem in natural language processing of
implementing a computer process to segment text. When punctuation and similar clues are
not consistently available, the segmentation task often requires fairly non-trivial
techniques, such as statistical decision-making, large dictionaries, as well as
consideration of syntactic and semantic constraints. Effective natural language processing
systems and text segmentation tools usually operate on text in specific domains and
sources. As an example, processing text used in medical records is a very different
problem from processing news articles or real estate advertisements. The process of
developing text segmentation tools starts with collecting a large corpus of text in an
application domain. There are two general approaches:
Manual analysis of text and writing custom software
Annotating the sample corpus with boundary information and using machine
learning
Some text segmentation systems take advantage of any markup, such as HTML, and known
document formats, such as PDF, to provide additional evidence for sentence and paragraph
boundaries.
1.6 Compression
The objective of image compression is to reduce irrelevance and redundancy of
the image data in order to be able to store or transmit data in an efficient form.
1.6.1 Lossy and lossless compression
Image compression may be lossy or lossless. Lossless compression is preferred
for archival purposes and often for medical imaging, technical drawings, clip art, or
comics. Lossy compression methods, especially when used at low bit rates, introduce
compression artifacts. Lossy methods are especially suitable for natural images such as
photographs in applications where minor (sometimes imperceptible) loss of fidelity is
acceptable to achieve a substantial reduction in bit rate. The lossy compression that
produces imperceptible differences may be called visually lossless.
Methods for lossless image compression are:
Run-length encoding, used as the default method in PCX and as one of the possible methods in
BMP, TGA, and TIFF
Area image compression
DPCM and predictive coding
Entropy encoding
Adaptive dictionary algorithms such as LZW, used in GIF and TIFF
Deflation, used in PNG, MNG, and TIFF
Chain codes
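
Run-length encoding, the first method in the list above, is simple enough to show in full. The following Python sketch encodes one row of pixels as (value, run length) pairs and decodes it again; the function names are illustrative and the pair representation is an assumption, not the byte layout of PCX or any other file format.

```python
def rle_encode(pixels):
    """Encode a sequence of pixel values as (value, run_length) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

row = [0, 0, 0, 0, 255, 255, 0, 0, 0]
encoded = rle_encode(row)
decoded = rle_decode(encoded)
```

Because decoding reproduces the input exactly, the scheme is lossless; it pays off on images with long uniform runs, such as binary text masks.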
1.6.2 Methods for lossy compression:
Reducing the color space to the most common colors in the image. The selected
colors are specified in the color palette in the header of the compressed image.
Each pixel just references the index of a color in the color palette. This method can
be combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the human eye
perceives spatial changes of brightness more sharply than those of color, by
averaging or dropping some of the chrominance information in the image.
Transform coding. This is the most commonly used method. In particular, a
Fourier-related transform such as the Discrete Cosine Transform (DCT) is widely
used (N. Ahmed, T. Natarajan and K. R. Rao, "Discrete Cosine Transform," IEEE
Trans. Computers, 90-93, Jan. 1974). The DCT is sometimes referred to as "DCT-II"
in the context of a family of discrete cosine transforms. The more recently developed
wavelet transform is also used extensively. The transform is followed by quantization
and entropy coding.
Fractal compression.
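
To make the transform coding step concrete, here is a minimal Python sketch of a one-dimensional, unnormalized DCT-II followed by uniform quantization. The function names and the quantization step of 10 are assumptions chosen for illustration; image codecs apply a normalized two-dimensional DCT to pixel blocks and use perceptually tuned quantization tables.

```python
import math

def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_samples = len(signal)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]

def quantize(coefficients, step):
    """Uniform quantization: the lossy step of transform coding."""
    return [round(c / step) for c in coefficients]

# A perfectly flat run of 8 pixels: all energy ends up in the first coefficient.
flat = [100.0] * 8
coeffs = dct2(flat)
levels = quantize(coeffs, step=10)
```

For smooth image regions most coefficients quantize to zero, which is what the subsequent entropy coder exploits.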
1.6.3 Other properties

The best image quality at a given bit rate (or compression rate) is the main goal of
image compression; however, there are other important properties of image compression
schemes.
1.6.4 Scalability
Scalability generally refers to a quality reduction achieved by manipulation of the bitstream
or file (without decompression and re-compression). Other names for scalability are
progressive coding or embedded bitstreams. Despite its contrary nature, scalability also
may be found in lossless codecs, usually in form of coarse-to-fine pixel scans. Scalability
is especially useful for previewing images while downloading them (e.g., in a web
browser) or for providing variable quality access to e.g., databases. There are several
types of scalability:
Quality progressive or layer progressive: The bitstream successively refines the
reconstructed image.
Resolution progressive: First encode a lower image resolution; then encode the
difference to higher resolutions.
Component progressive: First encode grey; then color.
1.6.5 Region of interest coding

Certain parts of the image are encoded with higher quality than others. This may
be combined with scalability (encode these parts first, others later).
1.6.6 Meta information
Compressed data may contain information about the image which may be used to
categorize, search, or browse images. Such information may include color and texture
statistics, small preview images, and author or copyright information.
1.6.7 Processing power
Compression algorithms require different amounts of processing power to encode
and decode. Some high-compression algorithms require high processing power. The
quality of a compression method is often measured by the peak signal-to-noise ratio
(PSNR), which measures the amount of noise introduced through lossy compression of the
image. However, the subjective judgment of the viewer is also regarded as an important
measure, perhaps the most important one.
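
The peak signal-to-noise ratio mentioned above is computed from the mean squared error (MSE) between the original and the decompressed image as PSNR = 10 log10(MAX^2 / MSE), where MAX is the largest possible pixel value. A minimal Python sketch follows; the function name and the representation of an image as a list of rows are assumptions made for illustration.

```python
import math

def psnr(original, compressed, max_value=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0
    count = 0
    for row_a, row_b in zip(original, compressed):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no noise was introduced
    return 10 * math.log10(max_value ** 2 / mse)

original = [[52, 55], [61, 59]]
degraded = [[53, 54], [62, 58]]   # every pixel is off by exactly 1
value_db = psnr(original, degraded)
```

With every pixel off by one grey level the MSE is 1, so the result is 10 log10(255^2), roughly 48 dB. Higher values indicate less distortion.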
1.6.8 Bit plane slicing
A bit plane of a digital discrete signal (such as image or sound) is a set of bits
corresponding to a given bit position in each of the binary numbers representing the
signal. For example, for a 16-bit data representation there are 16 bit planes: the first bit
plane contains the set of the most significant bits, and the 16th contains the least
significant bits.
The first bit plane gives the roughest but the most critical
approximation of the values of a medium, and the higher the number of the bit plane,
the smaller its contribution to the final result. Thus, adding a bit plane gives a better
approximation.
If a bit on the $n$th bit plane of an $m$-bit dataset is set to 1, it contributes a value of
$2^{m-n}$; otherwise it contributes nothing. Therefore, each bit plane can contribute half of
the value of the previous bit plane.
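
The contribution rule above can be checked with a short Python sketch that extracts bit planes from 8-bit values and rebuilds an approximation from the first few planes. The function names are illustrative; plane 1 is taken to be the most significant bit, as in the text.

```python
def bit_plane(values, n, m=8):
    """Return the n-th bit plane (n = 1 is the most significant) of m-bit values."""
    shift = m - n
    return [(v >> shift) & 1 for v in values]

def reconstruct(values, planes_kept, m=8):
    """Approximate each value using only the first `planes_kept` bit planes."""
    total = [0] * len(values)
    for n in range(1, planes_kept + 1):
        weight = 2 ** (m - n)  # a set bit on plane n contributes 2^(m-n)
        for i, bit in enumerate(bit_plane(values, n, m)):
            total[i] += bit * weight
    return total

pixels = [200, 37, 129]            # binary 11001000, 00100101, 10000001
msb_plane = bit_plane(pixels, 1)
coarse = reconstruct(pixels, 2)    # keep only the two most significant planes
exact = reconstruct(pixels, 8)     # all eight planes restore the input
```

Keeping more planes moves the approximation closer to the original values, and keeping all of them restores the input exactly.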
CHAPTER-2
LITERATURE SURVEY
2.1 MULTISCALE SEGMENTATION FOR MRC DOCUMENT
COMPRESSION USING COST FUNCTION
The Mixed Raster Content (MRC) standard (ITU-T T.44) specifies a framework
for document compression which can dramatically improve the compression/quality
tradeoff as compared to traditional lossy image compression algorithms. The key to
MRC's performance is the separation of the document into foreground and background
layers, represented as a binary mask. In this paper, we propose an integrated segmentation
algorithm which is based on the sequential application of two algorithms. The first,
Cost Optimized Segmentation (COS), is a blockwise segmentation algorithm.
The second algorithm, Connected Component Classification (CCC), refines the
initial segmentation by classifying feature vectors of connected components using a
Markov random field (MRF) model. The integrated COS/CCC segmentation algorithms
are then incorporated into a resolution enhanced rendering (RER) method to achieve
high quality rendering of documents containing text, pictures, and graphics, while
maintaining desired compression ratios.
The procedure for Cost Optimized Segmentation (COS) is as follows. The image
is first divided into overlapping blocks. Each block contains $m \times m$ pixels, and adjacent
blocks overlap by $m/2$ pixels in both the horizontal and vertical directions. The blocks are
denoted $O_{i,j}$ for $i = 1, \ldots, M$ and $j = 1, \ldots, N$, where $M$ and $N$ are the numbers of
blocks in the vertical and horizontal directions. The pixels in each block are segmented
into foreground (1) or background (0) by the clustering method of Cheng and
Bouman. This results in an initial binary mask for each block, denoted by $C_{i,j} \in \{0, 1\}^{m \times m}$.
However, in order to form a consistent segmentation of the page, these initial block
segmentations must be merged into a single binary mask. To do this, we allow each block
to be modified using a class assignment, $s_{i,j} \in \{0, 1, 2, 3\}$.
The most traditional approach to text segmentation is Otsu's method, which thresholds
pixels in an effort to divide the document's histogram into object and background.
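
The overlapping block layout used by COS, as described above, can be sketched in Python. The helper names `block_origins` and `extract_block` are hypothetical, and the sketch assumes the image height and width are multiples of m/2 and at least m; it only shows how the blocks are laid out, not the COS cost optimization itself.

```python
def block_origins(height, width, m):
    """Top-left corners of m-by-m blocks whose neighbours overlap by m/2 pixels."""
    step = m // 2
    rows = range(0, height - m + 1, step)
    cols = range(0, width - m + 1, step)
    # Outer list indexes blocks vertically, inner list horizontally.
    return [[(r, c) for c in cols] for r in rows]

def extract_block(image, origin, m):
    """Cut the m-by-m block starting at `origin` out of a list-of-rows image."""
    r, c = origin
    return [row[c:c + m] for row in image[r:r + m]]

# An 8x8 test image whose pixel value encodes its position (row * 8 + column).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
origins = block_origins(8, 8, m=4)
block = extract_block(image, origins[1][2], m=4)
```

For an 8x8 image and m = 4 this yields a 3x3 grid of blocks, so every interior pixel is covered by more than one block, which is why the per-block masks must later be merged.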
Figure2.1: Proposed technique flow diagram

There are many modified versions of Otsu's method. While Otsu uses a global
thresholding approach, Niblack and Sauvola use a local thresholding approach. Kapur's
method uses entropy information for the global thresholding, and Tsai uses a moment
preserving approach. Comparisons of these algorithms for text segmentation have been published.
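
For reference, Otsu's global threshold can be computed from the grey-level histogram by maximizing the between-class variance. The Python sketch below is a minimal version under stated assumptions: pixels are integers in 0..255 given as a flat list, pixels at or below the threshold form one class, and dark pixels are treated as text when building the mask. The function name is illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance."""
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += histogram[t]        # pixels with value <= t
        sum_low += t * histogram[t]
        weight_high = total - weight_low  # pixels with value > t
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold

pixels = [20, 22, 21, 200, 210, 205]   # dark text pixels and light background
t = otsu_threshold(pixels)
mask = [1 if p <= t else 0 for p in pixels]  # 1 = text (assumed dark), 0 = background
```

Because a single threshold is applied to the whole page, this approach struggles with uneven backgrounds, which motivates the local methods of Niblack and Sauvola.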
In order to improve text extraction accuracy, some text segmentation approaches
also use character properties such as size, stroke width, directions, and run-length
histogram. Other binarization approaches for document coding have used rate-distortion
minimization as a criterion for document binarization. Many recent approaches to text
segmentation have been based upon statistical models. One of the best commercial text
segmentation algorithms, which is incorporated in the DjVu document encoder, uses a
hidden Markov model (HMM).
2.1.1 Advantages
Mixed raster content is a standard framework for layer-based document compression.
It reduces the bit rate of encoded raster documents.
The segmentation detects and segments text in complex documents, including text on
background gradations.
2.2 An improved binarization algorithm based on a water flow model
for document image with inhomogeneous backgrounds
A segmentation algorithm using a water flow model [Kim et al., Pattern
Recognition 35] has already been presented, in which a document image can be efficiently
divided into two regions, characters and background, owing to the property of locally
adaptive thresholding. However, this method does not specify when to stop the iterative
process and requires a long processing time. In addition, characters on poor contrast
backgrounds often fail to be separated successfully. Accordingly, to overcome these
drawbacks of the existing method, the paper presents an improved approach that includes
extraction of regions of interest (ROIs), an automatic stopping criterion, and hierarchical
thresholding. Experimental results show that the proposed method can achieve a
satisfactory binarization quality, especially for document images with a poor contrast
background, and is significantly faster than the existing method.
2.3 Text Extraction and Document Image Segmentation Using Matched
Wavelets and MRF Model
We present a novel scheme for the extraction of textual areas of an image using globally
matched wavelet filters. A clustering-based technique has been devised for estimating
globally matched wavelet filters using a collection of ground truth images. We have
extended our text extraction scheme for the segmentation of document images into text,
background, and picture components (which include graphics and continuous tone
images). Multiple two-class Fisher classifiers have been used for this purpose. We also
exploit contextual information by using a Markov random field formulation-based pixel
labeling scheme for refinement of the segmentation results. Experimental results have
established the effectiveness of our approach.
We have used the concept of matched wavelets to develop globally matched wavelet
(GMW) filters specifically adapted to the text and non text regions. We have used these
filters for detecting text regions in scene images and for segmentation of document
images into text, picture and background. We find the GMW filters by training matched
wavelets on an image set. The key contribution of our work is that it is
a trainable segmentation scheme based on matched wavelet filters. For high-performance
systems, using application specific training image sets (e.g., license plate
images, handwritten text images, printed text images), we can obtain filters customized
for a particular application. Compared to other existing methods, the dimensionality, and
thus the computation of the feature space, is considerably reduced. The filtering and the
feature extraction operations account for most of the required computations; however, our
method is very simple to understand, computationally less expensive, and efficient. In the
latter part, we exploit the contextual information using MRF-based post processing to
improve the results of document segmentation.
Figure 2.2: (a)(b) Examples of document images; (c)(d) non document images.
2.3.1 Estimating Globally Matched Wavelet Filters
Matched wavelet estimation for any signal is formulated as finding a closed form
expression for extracting the compactly or infinitely supported wavelet which maximizes
the error norm between the signal reconstructed at the initial scaling subspace and the
successive lower wavelet subspace [1]. At an abstract level, our system uses a set of trained wavelet
filters matched to text and non text classes. When a mixed document (having both text
and non text components) is passed through text matched filters, we get blacked-out
regions in the detail (high pass) space corresponding to the text regions of the document
and vice versa for the non text matched wavelet filters. These blacked-out regions in the
output of text and non text wavelet filters are used to classify various regions as either
text or non text.
In [1], an approach is proposed for estimating matched wavelets for a given image. It
is further shown in [1] that estimated wavelets with separable kernel have higher peak
signal-to-noise ratio (PSNR) for the same bit-rate as compared with standard 9/7 wavelet.
In this section, we describe a technique for estimating a set of matched wavelets from a
database of images. We term them as GMWs. These GMWs are used to generate feature
vectors for segmentation. We discuss more about their implementation in subsequent
subsections.
2.3.2 Matched Wavelets & Their Estimation
First, we briefly review the theory of matched wavelets with separable kernel as
previously proposed. Consider a 2-D two-band wavelet system (with separable kernels) shown in
Fig. 2.3. The filters are applied along the horizontal and vertical directions. The scaling
filter in any direction is represented as $h_0$, its dual is represented as $f_0$, the wavelet
filter is represented as $h_1$, and its dual is shown as $f_1$. Further, boxes showing 2 with an
upward or downward arrow
represent upsampling or downsampling, respectively, of the signal by a factor of 2. The
input to this system is a 2-D signal which is, for practical purposes, assumed to be
continuous, and the output of the system is another 2-D signal constructed from the sum
of the outputs of the four channels shown in Fig. 2.3. The output of channel 1 is called the
approximation subspace or scaling subspace, whereas the outputs of the other three
channels are called detail subspaces. This system is designed as a biorthogonal wavelet
system, which means that it needs to satisfy the following conditions for perfect
reconstruction of the two-band filter bank:
Figure 2.3: Separable kernel filter bank.

$$h_1(n) = (-1)^n f_0(M - n) \qquad (1)$$
$$f_1(n) = (-1)^n h_0(M - n) \qquad (2)$$
where $M$ is any odd delay.
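
To make the four-channel structure concrete, the following minimal sketch runs one level of a separable 2-D two-band analysis and synthesis. It assumes orthonormal Haar filters as a stand-in for the matched wavelet filters, which are not reproduced here; the function names `analyze`, `synthesize`, and `detail_energy` are hypothetical.

```python
# Sketch only: orthonormal Haar filters stand in for the matched wavelet filters.

def analyze(img):
    """One level of separable 2-D analysis; img is a list of rows, even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 2.0)  # channel 1: approximation
            row_lh.append((a - b + d - e) / 2.0)  # detail: difference of horizontal neighbors
            row_hl.append((a + b - d - e) / 2.0)  # detail: difference of vertical neighbors
            row_hh.append((a - b - d + e) / 2.0)  # detail: difference in both directions
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def synthesize(ll, lh, hl, hh):
    """Sum the four upsampled channels back into a full-size image."""
    rows, cols = len(ll), len(ll[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            s, h, v, d = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            out[2 * r][2 * c] = (s + h + v + d) / 2.0
            out[2 * r][2 * c + 1] = (s - h + v - d) / 2.0
            out[2 * r + 1][2 * c] = (s + h - v - d) / 2.0
            out[2 * r + 1][2 * c + 1] = (s - h - v + d) / 2.0
    return out


def detail_energy(lh, hl, hh):
    """Total energy in the three detail channels."""
    return sum(v * v for band in (lh, hl, hh) for row in band for v in row)


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = analyze(img)
rec = synthesize(ll, lh, hl, hh)  # equals img up to rounding
```

With these filters the reconstruction is exact, and a constant image produces zero detail energy, which is the kind of "blacked-out" detail response the segmentation relies on.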

The scaling function $\phi(t)$ and wavelet function $\psi(t)$ are governed by
two-scale relations for the two-band wavelet system. Similar equations exist for
estimating the dual scaling function $\phi'(t)$ and dual wavelet function $\psi'(t)$. The error
between two signals is defined as
$$e(x) = a(x) - \hat{a}(x) \qquad (3)$$
where $a(x)$ is the continuous 2-D image signal and $\hat{a}(x)$ represents the 2-D image
reconstructed from detail coefficients only. Then, the corresponding error energy is defined as
$$E = \int e^2(x)\,dx \qquad (4)$$
2.3.3 Segmentation of Document Images

In this work, we refer to natural images (photographs, for example) as scene images. The
scene images that we consider contain some written or embedded text, and everything else
is a non-text region. In a document image, however, we consider three components: 1)
text, 2) picture, and 3) background. Backgrounds are continuous-tone, low-frequency
regions with dull features, although mixed with noise. Pictures are continuous-tone regions
falling in between text and background.
Figure 2.4: Fisher projected value

Thus, for document images we have extended the work described in the last
section (text location in general images) to the segmentation of document images into three
classes, viz. text, picture, and background. We have used the same feature vectors and
classified them into three classes. For classification, we have used Fisher classifiers
because of advantages such as ease of training (the data are projected onto one
dimension) and time efficiency. Moreover, the results of the Fisher classifier fit naturally
into our MRF post processing step, as explained in the next section.
The Fisher classifier is often used for two-class classification problems. Although
it can be extended to multiclass classification (three classes in our case), the
classification accuracy decreases due to the overlap between neighboring classes. Thus,
we need to make some modifications to the Fisher classifiers (explained in the last
section) to apply them in this case [3]. We use three Fisher classifiers, each optimized for
a two-class classification problem (text/picture, picture/background, and
background/text). Each classifier outputs a confidence in the classification, and the final
decision is made by fusing the outputs of all three classifiers.
Fig. 4. Distribution of Y for image and background as obtained from classifier 1. Similar
distributions are obtained for classifiers 2 and 3.
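
The pairwise scheme can be illustrated with a small sketch. The source does not specify the fusion rule, so the version below makes several assumptions that should be read as illustrative only: features are 2-D, each classifier's confidence is the projected value normalized to +1 and -1 at the two class means, and fusion is a sum of signed confidences per class. All function names and the toy data are hypothetical.

```python
# Sketch only: 2-D features, normalized confidence, and summed-confidence fusion
# are illustrative assumptions, not the fusion rule of the source.

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def mean(points):
    n = float(len(points))
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def train_fisher(class_a, class_b, ridge=1e-6):
    """Fisher direction w = Sw^{-1} (ma - mb) for one two-class problem."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Within-class scatter with a small ridge so the matrix stays invertible.
    sxx, sxy, syy = sa[0] + sb[0] + ridge, sa[1] + sb[1], sa[2] + sb[2] + ridge
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    center = 0.5 * (dot(w, ma) + dot(w, mb))
    half_gap = 0.5 * (dot(w, ma) - dot(w, mb))
    return w, center, half_gap


def confidence(model, x):
    """Signed confidence: +1 at the mean of class_a, -1 at the mean of class_b."""
    w, center, half_gap = model
    return (dot(w, x) - center) / half_gap


def classify(models, x):
    """Fuse the pairwise confidences by summing them per class."""
    scores = {}
    for (a, b), model in models.items():
        c = confidence(model, x)
        scores[a] = scores.get(a, 0.0) + c
        scores[b] = scores.get(b, 0.0) - c
    return max(scores, key=scores.get)


def cluster(cx, cy):
    """Toy training cluster around (cx, cy)."""
    return [(cx + 0.2, cy), (cx - 0.2, cy), (cx, cy + 0.2), (cx, cy - 0.2)]


train = {"text": cluster(0, 0), "picture": cluster(5, 0), "background": cluster(0, 5)}
pairs = [("text", "picture"), ("picture", "background"), ("background", "text")]
models = {(a, b): train_fisher(train[a], train[b]) for a, b in pairs}
```

Because every class takes part in two of the three classifiers, a pixel that both relevant classifiers agree on accumulates a clearly larger score than the alternatives.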
2.3.4 MRF POSTPROCESSING FOR DOCUMENT IMAGE SEGMENTATION
Segmentation may lead to overlapping in the feature space. This is especially true
for the picture and background classes because of the lack of a hard distinction between the
textures of these two classes. We deal with this problem by exploiting the contextual
information around each pixel. A similar approach has been used recently to refine the
results of segmenting handwritten text, printed text, and noise in document images.
Results for the document image segmentation of the previous section show that
misclassification happens either as isolated clusters of another class within a given
class or at the boundaries between different classes, as indicated by
Fig. 5. Removing this misclassification is equivalent to making the classification
smoother. In this section, we present an MRF-based post processing step for this purpose.
Figure 2.6: Document Segmentation results obtained for two sample images
The images show that the misclassification occurs either at the class boundaries or
because of the presence of small isolated clusters. The problem of correcting the
misclassification belongs to a very general class of problems in vision and can be
formulated in terms of energy minimization. Every pixel must be assigned a label in the
set {text, picture, background}. Here $f$ refers to a particular labeling of the pixels and
$f_p$ refers to the value of the label of a particular pixel $p$. We consider the first-order
MRF model. This simplifies the energy function to the following form:
$$E(f) = \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q) + \sum_{p \in P} D_p(f_p) \qquad (5)$$
Inputs to the algorithm are the classification confidence maps (for image, text, and
background) and the labelings (initial results) that we obtained in the last section using
these classification confidence maps. Using the initial labelings, the algorithm evaluates the
interaction energy ($V_{p,q}$) and minimizes the total energy ($E(f)$) to obtain a new labeling.
This step is repeated until no further minimization is possible, leaving the resulting
optimized labeling.
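
The energy in (5) and the repeat-until-no-improvement loop can be sketched as follows. The source does not state the interaction term or the minimizer, so this example assumes a Potts interaction (a constant penalty `beta` whenever two 4-connected neighbors disagree) and uses iterated conditional modes (ICM) as a simple stand-in minimizer. The data costs are made-up numbers that play the role of one minus the classification confidence.

```python
# Sketch only: the Potts interaction, the cost values, and the ICM minimizer are
# illustrative assumptions; the source does not state which minimizer it uses.
LABELS = ("text", "picture", "background")


def neighbors(r, c, rows, cols):
    """4-connected neighbors of pixel (r, c) inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc


def total_energy(labels, data_cost, beta):
    """E(f): Potts interaction over neighboring pairs plus per-pixel data cost."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += data_cost[r][c][labels[r][c]]
            # Count each unordered pair {p, q} once (right and down neighbors).
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += beta
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += beta
    return energy


def icm(labels, data_cost, beta=1.0, max_sweeps=20):
    """Greedy relabeling, repeated until no pixel change lowers the energy."""
    labels = [row[:] for row in labels]
    rows, cols = len(labels), len(labels[0])
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def local(label):
                    clash = sum(label != labels[rr][cc]
                                for rr, cc in neighbors(r, c, rows, cols))
                    return data_cost[r][c][label] + beta * clash
                best = min(LABELS, key=local)
                if local(best) < local(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


confident_text = {"text": 0.1, "picture": 0.9, "background": 0.9}
weak_picture = {"text": 0.6, "picture": 0.4, "background": 1.0}
data_cost = [[dict(confident_text) for _ in range(3)] for _ in range(3)]
data_cost[1][1] = weak_picture
initial = [["text"] * 3 for _ in range(3)]
initial[1][1] = "picture"  # isolated, weakly supported cluster
smoothed = icm(initial, data_cost, beta=1.0)
```

In this toy case the isolated "picture" pixel is relabeled as "text" because its weak data preference is outweighed by disagreement with all four neighbors.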
2.4 A Review of Character Segmentation Methods

Image-to-text conversion has been a vital area of research for many years. Mainly,
Optical Character Recognition (OCR) is used to extract characters from the image.
Character segmentation is a preprocessing step for OCR. In this paper, we
discuss different character segmentation methods used in various domains. Some of the
methods are used for handwritten character recognition and some are for
vehicle Number Plate (NP) detection.
The major focus of this research is to identify the approaches that can be useful in
vehicle NP detection. After analyzing the existing character segmentation methods,
the favored methods for NP detection are discussed in the conclusion section. The paper
concludes by suggesting the future scope of research in this area.
Image-to-text processing has been a topic of research for the last several years. The most
common method for extracting text from images is Optical Character Recognition (OCR).
In large and complex images, it is essential to first segment the image and then extract the
characters by using a character segmentation method. The segmented characters
are then sent to the OCR engine for further processing.
The process is depicted in Fig. 2.7. As shown in Fig. 2.7(a), first an image of a
number plate (NP) is captured, which is further processed to find the region of interest. In
this figure, the purpose is to extract the characters of the captured NP to detect the vehicle
number.
By using an image segmentation process, the number plate region is detected, as
shown in Fig. 2.7(b). In order to identify the vehicle number, each character should be
clipped from the segmented NP. This task can be accomplished by using a character
segmentation method. The segmented characters are shown in Fig. 2.7(c).
Figure 2.7: Image segmentation and character segmentation
In the following section, existing character segmentation methods are discussed,
followed by the discussion and conclusion sections. The paper concludes by
suggesting future scope in the area of character segmentation.
Different character segmentation methods are discussed in this paper, such as localized
histogram multilevel thresholding, Bayes theorem, prior knowledge, feature extraction,
dynamic programming, nonlinear clustering, multistage graph search algorithm, segment
confidence-based binary segmentation, separator symbols frame of reference, and
horizontal-vertical segmentation. All these methods are very useful as a preprocessing step
for OCR. Some algorithms based on prior knowledge and separator symbols frame of reference
might not be useful for NP segmentation, as it is difficult to have prior knowledge regarding
a vehicle NP in advance. Dynamic programming and segment confidence-based binary
segmentation (SCBS) based methods can be really useful for NP character extraction.
2.5 Sign Detection in Natural Images with Conditional Random Fields

Traditional generative Markov random fields for segmenting images model the
image data and corresponding labels jointly, which requires extensive independence
assumptions for tractability. We present the conditional random field for an application
in sign detection, using typical scale and orientation selective texture filters and a
nonlinear texture operator based on the grating cell. The resulting model captures
dependencies between neighboring image region labels in a data-dependent way that
escapes the difficult problem of modeling image formation, instead focusing effort and
computation on the labeling task. We compare the results of training the model with
pseudo-likelihood against an approximation of the full likelihood with the iterative tree
reparameterization algorithm and demonstrate improvement over previous methods.
Image segmentation and region labeling are common problems in computer
vision. In this work, we seek to identify signs in natural images by classifying regions
according to their textural properties. Our goal is to integrate with a wearable system that
will recognize any detected signs as a navigational aid to the visually impaired. Generic
sign detection is a difficult problem. Signs may be located anywhere in an image, exhibit
a wide range of sizes, and contain an extraordinarily broad set of fonts, colors,
arrangements, etc. For these reasons, we treat signs as a general texture class and seek to
discriminate such a class from the many others present in natural images.
The value of context in computer vision tasks has been studied in various ways
for many years. Two types of context are important for this problem: label context and
data context. In the absence of label context, local regions are classified independently,
which is a common approach to object detection. Such disregard for the (unknown) labels
of neighboring regions often leads to isolated false positives and missing false negatives.
The absence of data context means ignoring potentially helpful image data from any
neighbors of the region being classified. Both contexts are simultaneously important. For
instance, since neighboring regions often have the same label, we could penalize label
discontinuity in an image. If such regularity is imposed without regard for the actual data
in a region and local evidence for a label is weak, then continuity constraints would
typically override the local data. Conversely, local region evidence for a "sign" label
could be weak, but a strong edge in the adjoining region might bolster belief in the
presence of a sign at the site because the edge indicates a transition. Thus, considering
both the labels and data of neighboring regions is important for predicting labels. This is
exactly what the conditional random field (CRF) model provides. The advantage of the
discriminative contextual model over a generative one for detection tasks has recently
been shown in [8]. We demonstrate a training method that improves prediction results,
and we apply the model to a challenging real-world task. First, the details of the model
and how it differs from the typical random field are described, followed by a description
of the image features we use. We close with experiments and conclusions.
2.5.1 Image Features for Sign Detection
Text and sign detection has been the subject of much research. Earlier approaches
either use independent, local classifications or use heuristic methods, such as connected
component analysis. Much work has been based on edge detectors or more general
texture features, as well as color. Our approach calculates a joint labeling of image
patches, rather than labeling patches independently, and it obviates layout heuristics by
allowing the CRF to learn the characteristics of regions that contain text. Rather than
simply using functions of single filters (e.g., moments) or edges, we use a richer
representation that captures important relationships between responses to different
scale- and orientation-selective filters.
To measure the general textural properties of both sign and especially non-sign
(hence, background) image regions, we use responses of scale- and orientation-selective
filters. Specifically, we use previously described statistics of filter responses, in which
correlations between steerable pyramid responses of different scales and orientations are
the prominent features.
A biologically inspired non-linear texture operator for detecting gratings of bars at
a particular orientation and scale is described. Scale and orientation selective filters, such
as the steerable pyramid or Gabor filters, respond indiscriminately to both single edges
and one or more bars.
Figure 2.8: Grating cell data flow for a single scale and orientation.
The two boxes at I, T, and F represent center-on and center-off filters, while the boxes at M
are for the six receptive fields.
Using an algorithm that ranks the discriminative power of random
field model features, we found the top three in the edge-less, context-free MaxEnt model
to be (i) the level of green hue (easily identifying vegetation as background), (ii) mean
grating cell response (easily identifying text), and (iii) correlation between a vertically
and diagonally oriented filter of moderate scale (the single most useful other 'textural'
feature).
CHAPTER-3
High Quality MRC Document Coding MRC model
3.1 High Quality MRC Document

The mixed raster content (MRC) model can be used to implement highly effective
document compression algorithms. However, many MRC-based methods which achieve a
high compression ratio can distort fine details of the document, such as thin
lines and text edges. In this paper, we present a method called resolution enhanced
rendering (RER) to achieve high-quality rendering of documents containing text, pictures,
and graphics, while maintaining desired compression ratios. The method applies
adaptive dithering in the MRC encoder and then performs a nonlinear prediction in the
MRC decoder. Both the dithering and nonlinear prediction algorithms are jointly
optimized to produce the best quality rendering.
We present experimental results illustrating the performance of our method and
comparing it to some existing MRC compression algorithms. Document imaging
applications such as scan-to-print, document archiving, and internet fax are driving the
need for document compression standards that maintain high quality while achieving
high compression ratios. Recently, the mixed raster content (MRC) compression model has
been adopted as a standard for document encoding. The MRC standard allows raster
documents to be coded at very high compression ratios but with much lower distortion
than would be possible using conventional image coding methods. While MRC methods
are much better than conventional transform coders, they can still substantially distort
fine document details, such as thin lines and text edges.
In this paper, we propose a method called resolution enhanced rendering (RER)
for jointly optimizing the MRC encoder and decoder to achieve high quality rendering of
document text, image and graphics, while maintaining desired compression ratios. The
method works by adaptively dithering the mask layer of a three-layer MRC encoding to
produce the intermediate tone levels required for high quality rendering. The dithering is
performed using a novel adaptive error diffusion algorithm. A tree-based nonlinear
predictor is then designed into the MRC decoder to reconstruct the desired intermediate
tones. Both the dithering and nonlinear prediction algorithms are jointly optimized to
produce the best quality rendering. The optimization is performed by iteratively
optimizing the encoder and decoder to achieve the minimum distortion.
This method also has a number of potential advantages. First, it is compatible
with the MRC standard. That is to say, the RER enhanced encoder works with a
conventional MRC decoder, and the RER enhanced decoder works with a conventional
MRC encoder. Second, the method can be implemented using the previously proposed
Rate Distortion Optimized Segmentation (RDOS) method. The RDOS method
computes the segmentation that minimizes a combination of bit-rate and distortion. We
will present experimental results comparing the performance of RDOS compression with
and without the RER method.
3.2 Conventional MRC Encoder

An MRC encoder is based on the Mixed Raster Content imaging model, which
represents a document by layers with different properties. As shown in Figure 3.1, a
three-layer MRC document contains a background layer, a foreground layer, and a binary
mask layer. At each pixel, the value of the binary mask is used to select between the
foreground and background pixels. In the MRC model, each layer is compressed
independently. This adds some inefficiency since the foreground and background layers
must be coded even when they are not used, but it simplifies the imaging model.
Typically, the foreground and background layers are compressed using natural image
coders such as JPEG or an embedded wavelet coder, whereas the binary mask is typically
encoded with a lossless binary encoder such as JBIG or JBIG2.
Figure 3.1: MRC imaging model forms text and line art by using a binary mask to
choose between foreground and background layers.
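
The three-layer imaging model can be sketched in a few lines. This is only an illustration of the selection rule on decoded layers (compression of the layers is not modeled), and the function name `mrc_compose` and the gray-level values are hypothetical.

```python
# Sketch only: composes already-decoded MRC layers; layer compression is not modeled.

def mrc_compose(mask, foreground, background):
    """Take the foreground pixel where the mask is 1 and the background pixel where it is 0."""
    return [[f if m else b for m, f, b in zip(m_row, f_row, b_row)]
            for m_row, f_row, b_row in zip(mask, foreground, background)]


mask = [[1, 0], [0, 1]]
foreground = [[0, 0], [0, 0]]            # e.g. dark text color
background = [[255, 250], [245, 240]]    # e.g. smooth paper tone
page = mrc_compose(mask, foreground, background)
```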
In this work, we focus on the rate distortion optimized segmentation (RDOS)
algorithm. Strictly speaking, the RDOS encoder is not a true MRC encoder because it
does not encode each layer of the MRC model independently. Nonetheless, the RDOS
method can in principle be modified to be a true MRC method, and the methods
introduced in this work are equally applicable to any typical MRC encoder.
The RDOS algorithm classifies each 8×8 block of pixels into one of four
classes: picture block, two-color block, one-color block, or other block. Each
class corresponds to a different coding method. The picture and other blocks use JPEG
block encoders. The one-color blocks are entropy coded using an arithmetic encoder. For
each two-color block, both the foreground and background colors are entropy coded
using the arithmetic encoder, while the 8×8 binary mask is encoded using a JBIG2
encoder. The class of each block is chosen to maximize the rate-distortion performance
over the entire document. The optimization is achieved by applying each candidate
coding method to each block and then selecting the method which yields the best
rate-distortion trade-off.
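
A per-block selection of this kind can be sketched as a Lagrangian comparison. The cost form `distortion + lam * rate` and all numbers below are assumptions for illustration; the source only says that a combination of bit rate and distortion is minimized, and the actual encoders are not invoked here.

```python
# Sketch only: the Lagrangian cost and the (rate, distortion) numbers are hypothetical.

def select_block_class(candidates, lam):
    """candidates maps class name -> (rate_in_bits, distortion); return the cheapest class."""
    return min(candidates, key=lambda name: candidates[name][1] + lam * candidates[name][0])


# Made-up results of trying each coding method on one 8x8 block.
candidates = {
    "picture": (120.0, 2.0),
    "two-color": (30.0, 5.0),
    "one-color": (4.0, 60.0),
    "other": (150.0, 1.0),
}
chosen = select_block_class(candidates, lam=0.1)
```

A larger `lam` places more weight on bit rate and pushes the choice toward the cheaper classes, while `lam = 0` selects purely on distortion.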
3.3 Resolution Enhanced Rendering Method

MRC encoders have an enormous advantage for document encoding because
they can efficiently encode text and line art with very high spatial resolution. However,
one limitation of conventional MRC encoders is that the mask can only represent binary
transitions at text edges. This makes accurate representation of text edges difficult. In
principle, it is possible to add edge detail to the foreground or background layers;
however, in practice this detail is lost when those layers are encoded using natural image
coders at acceptable bit rates. Figure 3.2 shows how the resolution enhanced rendering
(RER) algorithm adds edge detail while retaining the binary MRC mask layer.
First, the RER encoder segments the foreground and background using an
adaptive error diffusion method. This error diffusion method effectively dithers the binary
mask along the edge of the character to represent the gradual transition of true raster
scanned text characters. The error diffusion algorithm uses the local value of the mask to
adapt the error diffusion weights so that error is diffused along the 1-D mask boundary.
The RER decoder uses the binary mask, together with the foreground and background
colors to estimate the true value of the document pixels. This estimation is done using a
nonlinear tree-structured predictor as described in [8, 9]. Importantly, this predictor is
trained to identify the characteristic patterns of the RER encoder. Therefore, it can do a
much better job of accurately estimating the true pixel values.
Figure 3.2: Illustration of MRC encoder and decoder with RER. Examples were selected
from actual RER inputs and outputs.
Figure 3.3 illustrates how the RER encoder and decoder are jointly optimized to
maximize the quality of the decoded document. As we will see, both the encoder and the
decoder have parameters which can be trained to produce the best possible result. The
error diffusion algorithm has five parameters which control its behavior, and the
nonlinear predictor has a large number of parameters which specify the nodes of a
nonlinear regression tree.
Figure 3.3: Overview of method used to train the optimized encoder and decoder. Once
training is complete, the encoder and decoder function independently.
In each iteration of the optimization, the parameters of the encoder or decoder are
alternately fixed, while the parameters of the other one are optimized. Importantly, two
different sets of documents are used for training the encoder and decoder. We have found
this improves the robustness of the training procedure. Experimental results are shown
for test documents that are not contained in either set of training documents. The
experimental results indicate that this training process robustly converges to parameters
which reduce the distortion of the decoded document. Moreover, we have found that joint
optimization of the encoder and decoder performs substantially better than independent
optimization of these two functions.
3.3.1 The RER Encoder
Let $X_s$ be a pixel in the raster document at location $s$. In the MRC format, each
pixel also has an associated foreground color, $F_s$, and background color, $B_s$. The binary
MRC mask then determines whether $F_s$ or $B_s$ will be used to represent the true pixel
value $X_s$. In RDOS encoding, the foreground and background colors are constant in 8×8
blocks, but in other MRC encoding methods, the values of the foreground and
background colors can change from pixel to pixel. Next, define the scalar value $\lambda_s$ which
determines the relative mixture of foreground and background color in the pixel $X_s$. More
specifically, $\lambda_s$ is given by the value on the real line which minimizes the squared error
Figure 3.4 gives a geometric interpretation of $\lambda_s$ as the projection of the true pixel color
onto the line connecting the foreground and background colors. The solution to this least
squares approximation problem is given by
Figure 3.4: Least squares approximation of pixel color, $X_s$, by a combination of the
background, $B_s$, and foreground, $F_s$, colors.
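
The geometric picture can be made concrete with a small sketch. It assumes the pixel is modeled as the mixture $\lambda F + (1-\lambda) B$ and that $\lambda$ is the unconstrained orthogonal projection coefficient onto the line through the two colors; whether the source additionally clips the value to a fixed interval is not stated, so no clipping is applied. The function names are hypothetical.

```python
# Sketch only: unconstrained least-squares projection; any clipping used by the
# source method is unknown and therefore not applied here.

def mixture_fraction(x, f, b):
    """Least-squares lambda such that x is approximated by lambda*f + (1 - lambda)*b."""
    d = [fi - bi for fi, bi in zip(f, b)]
    denom = sum(di * di for di in d)
    if denom == 0:
        return 0.0  # foreground and background coincide; lambda is arbitrary
    return sum((xi - bi) * di for xi, bi, di in zip(x, b, d)) / denom


def blend(lam, f, b):
    """Pixel color reconstructed from the mixture fraction."""
    return [lam * fi + (1.0 - lam) * bi for fi, bi in zip(f, b)]


F = (0.0, 0.0, 0.0)          # black text
B = (255.0, 255.0, 255.0)    # white paper
edge_pixel = (127.5, 127.5, 127.5)
lam = mixture_fraction(edge_pixel, F, B)
```

A pixel equal to the background gives a fraction of 0, a pixel equal to the foreground gives 1, and an edge pixel halfway between them gives 0.5.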
Notice that when a pixel is well approximated by either the foreground or
background color, then s is small. On the other hand, when a pixel is best approximated
by a mixture of foreground and background colors, then s is large. The value of s will
be used to control the local adaptation of the error diffusion algorithm.
Typically, MRC encoders compute the binary mask by classifying each pixel as
foreground or background independently. However, the RER encoder computes the binary
mask by applying a form of adaptive error diffusion to the gray scale image $\lambda_s$. This
effectively dithers the binary mask along character edges so that the mask can more
accurately represent fine gradations in the transition between foreground and background
colors. While the RER error diffusion method is similar to serpentine scan Floyd-Steinberg
error diffusion, it is specially designed and optimized to diffuse error along text
edge transitions. This is done by adaptively setting the error diffusion weights at each
pixel. Figure 3.5 illustrates the four future pixels $s_0$, $s_1$, $s_2$, $s_3$ which neighbor $s$ in
serpentine scan order. Then $w_{s_0}$, $w_{s_1}$, $w_{s_2}$, $w_{s_3}$ are the values of the four
corresponding error diffusion weights. The values of these four weights are varied at each
pixel $s$ using the formula
Figure 3.5: Illustration of error diffusion algorithm.
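
For orientation, the baseline that the RER method is compared with, plain serpentine-scan Floyd-Steinberg error diffusion, can be sketched as follows. The weights here are the standard fixed values 7/16, 3/16, 5/16, and 1/16; the adaptive RER weight formula is not reproduced, so this sketch is the non-adaptive baseline only. The function name is hypothetical.

```python
# Sketch only: standard fixed Floyd-Steinberg weights, not the adaptive RER weights.

def serpentine_error_diffusion(gray, threshold=0.5):
    """Binarize a gray image with values in [0, 1] using serpentine Floyd-Steinberg."""
    rows, cols = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        step = 1 if r % 2 == 0 else -1           # alternate the scan direction per row
        columns = range(cols) if step == 1 else range(cols - 1, -1, -1)
        for c in columns:
            out = 1 if work[r][c] >= threshold else 0
            mask[r][c] = out
            err = work[r][c] - out
            # The four "future" neighbors in scan order and their fixed weights.
            for dr, dc, w in ((0, step, 7 / 16), (1, -step, 3 / 16),
                              (1, 0, 5 / 16), (1, step, 1 / 16)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    work[rr][cc] += err * w
    return mask


dithered = serpentine_error_diffusion([[0.5, 0.5]])
```

In the two-pixel example the first pixel is set to 1, and the resulting negative error pushes the second pixel below the threshold, which is the dithering behavior the mask relies on to represent intermediate tones.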

3.3.2 RER Decoder
Here we assume that the foreground and background colors are the same as used
in the RER encoder. The nonlinear predictor works by first extracting the binary mask in
a 5×5 window about the pixel in question.
Figure 3.6: Structure of RER decoder using nonlinear predictor. Block labels: binary image
of mask layer; tree-based classifier; optimal estimator of $\lambda_s$ given $z_s$; output
$X_s = \lambda_s F_s + (1 - \lambda_s) B_s$.
This data forms a binary vector, $z_s$, which is then used as input to a binary regression tree
predictor known as Tree-Based Resolution Synthesis (TBRS). The TBRS predictor
estimates the value of $\lambda_s$ in a two-step process. First, it classifies the vector $z_s$ into
one of $M$ classes using a binary tree classifier. The basic idea of TBRS is to use a binary
regression tree as a piecewise linear approximation to the conditional mean estimator.
The classification step is essential because it can separate out the distinct regions of the
document corresponding to mask edges of different orientation and shape.
One additional complication occurs with the RDOS method. Since it is not a true
MRC encoder, pixels which fall outside of two-color blocks have no binary mask values.
This can cause a problem when the pixel $s$ falls near the boundary of a block, and the
5×5 window about the pixel covers part of an adjacent block that is not a two-color block.
In this case, the pixels are classified as 0, 1, or 2 depending on whether they are close to
the background color, the foreground color, or neither color. Then the values 0, 1, and 2 are
encoded as binary values 00, 01, and 10, to ensure that the input vector $z_s$ remains binary.
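
The two-bit encoding described above can be sketched as follows. The row-by-row ordering of the window and the function name `encode_window` are assumptions for illustration.

```python
# Sketch only: the raster ordering of the window entries is an assumption.
CODES = {0: (0, 0), 1: (0, 1), 2: (1, 0)}  # background, foreground, neither


def encode_window(window):
    """Flatten a window of class values 0/1/2 into a binary vector using two bits per pixel."""
    z = []
    for row in window:
        for value in row:
            z.extend(CODES[value])
    return z


window = [[0] * 5 for _ in range(5)]
window[2][2] = 1   # center pixel close to the foreground color
window[0][4] = 2   # a pixel close to neither color
z = encode_window(window)
```

For a 5×5 window this yields a binary vector with 50 entries.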
3.3.3 Training
The objective of the training process is to optimize the performance of the RER
encoder and decoder by selecting the encoder and decoder parameters to maximize the
decoded document quality over a training set of documents. The distortion metric used to
measure document quality is mean squared error. While mean squared error is not always
a good measure of quality, for this application we found that it was always well correlated
with our subjective evaluation of quality.
The training process alternated between optimization of the encoder and decoder
parameters. So, when optimizing the encoder parameters, the previously obtained
decoder parameters were used; and when optimizing the decoder parameters, the
previously obtained encoder parameters were used. The training phases for encoder and
decoder used different sets of training data. This strategy seemed to produce more robust
training results. The iterative optimization is always started by optimizing the decoder
and using the initial encoder parameters.
3.4 Decoder Optimization
While the TBRS predictor is very efficient to implement, it can be
computationally expensive to train. The training process includes three steps: generating
training vector pairs, building the regression tree, and generating the least squares
prediction filters.
3.5 Encoder Optimization

The encoder is optimized by searching for the value of the encoder parameter vector which
minimizes the mean squared error averaged over the set of training documents. This
search is initialized at the current value of the vector and uses the decoder parameters
resulting from the last optimization of the decoder. The optimization is done by
sequentially perturbing the elements of the vector using an iterative coordinate descent
strategy. Each perturbation is made using a step size which is initialized to the value
0.2. The decision to perturb each component by the step size or to leave it unchanged is
made based on the value of the mean squared error computed with the set of training
documents. If the mean squared error does not decrease with the perturbation of every
element of the parameter vector, then the size of the perturbation is automatically
reduced.
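
The search strategy can be sketched as a generic coordinate descent. Several details are assumptions: perturbations are tried in both the positive and negative direction, the step is halved when a full sweep brings no improvement, and the search stops below a minimum step size. The quadratic cost stands in for the mean squared error over the training documents, which would require the full encoder and decoder to evaluate.

```python
# Sketch only: the shrink factor, the stopping rule, and the toy cost are assumptions.

def coordinate_descent(cost, params, step=0.2, shrink=0.5, min_step=1e-3):
    """Perturb one element at a time; reduce the step when no element improves the cost."""
    params = list(params)
    best = cost(params)
    while step >= min_step:
        improved = False
        for k in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[k] += delta
                value = cost(trial)
                if value < best:          # keep the perturbation only if the cost drops
                    params, best, improved = trial, value, True
                    break
        if not improved:
            step *= shrink
    return params, best


def toy_cost(p):
    """Stand-in for the training-set mean squared error."""
    return (p[0] - 1.0) ** 2 + (p[1] + 0.4) ** 2


solution, final_cost = coordinate_descent(toy_cost, [0.0, 0.0])
```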
Figure 3.7: Comparison of compression results. (a) A portion of the original test image;
(b) Compressed by standard RDOS at 0.184 bpp (130:1 compression ratio); (c)
Compressed by RER enhanced RDOS at 0.182 bpp (132:1 compression ratio).
CHAPTER-4
PROPOSED SYSTEM
Our segmentation method is composed of two algorithms that are applied in
sequence: the cost optimized segmentation (COS) algorithm and the connected
component classification (CCC) algorithm. The COS algorithm is a block wise
segmentation algorithm based upon cost optimization. The COS produces a binary image
from a gray level or color document; however, the resulting binary image typically
contains many false text detections.
The CCC algorithm further processes the resulting binary image to improve the
accuracy of the segmentation. It does this by detecting non-text components (i.e., false
text detections) in a Bayesian framework which incorporates a Markov random field
(MRF) model of the component labels. One important innovation of our method is in the
design of the MRF prior model used in the CCC detection of text components. In
particular, we design the energy terms in the MRF distribution so that they adapt to
attributes of the neighboring components' relative locations and appearance. By doing
this, the MRF can enforce stronger dependencies between components which are more
likely to have come from related portions of the document.
4.1 COST OPTIMIZED SEGMENTATION (COS)

The COS algorithm is a block-based segmentation algorithm formulated as a
global cost optimization problem. The COS algorithm is comprised of two components:
blockwise segmentation and global segmentation. The blockwise segmentation divides
the input image into overlapping blocks and produces an initial segmentation for each
block. The global segmentation is then computed from the initial segmented blocks so as
to minimize a global cost function, which is carefully designed to favor segmentations
that capture text components. The parameters of the cost function are optimized in an
offline training procedure. A block diagram for COS is shown in Fig. 4.1.
Figure 4.1: COS algorithm comprises two steps: block wise segmentation and global
segmentation. The parameters of the cost function used in the global segmentation are
optimized in an offline training procedure.
4.1.1 Block wise Segmentation
Block wise segmentation is performed by first dividing the image into
overlapping blocks, where each block contains M×M pixels, and adjacent blocks overlap
by a fixed number of pixels in both the horizontal and vertical directions. Each block is
indexed by its position in the vertical and horizontal directions. If the height and width
of the input image are not divisible by the block size, the image is
padded with zeros. For each block, the color axis having the largest variance over the
block is selected and stored in a corresponding gray image block. The pixels in each
block are segmented into foreground ("1") or background ("0") by the clustering method
of Cheng and Bouman [24]. The clustering method classifies each pixel in the gray image
block by comparing it to a threshold. This threshold is selected to minimize the total
subclass variance. More specifically, the minimum value of the total subclass variance is
given by
Here, the quantities involved are the numbers of pixels classified as 0 and 1 in the block by
the threshold, and the variances within each subclass (see Fig. 4.2). Note that the subclass
variance can be calculated efficiently. First, we create a histogram by counting the number
of pixels which fall into each value between 0 and 255. For each threshold, we can
recursively calculate these quantities from the values calculated for the previous threshold.
Figure 4.2: Illustration of a blockwise segmentation. The pixels in each block are
separated into foreground (1) or background (0) by comparing each pixel with a threshold $t$.
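
The threshold search described above can be sketched in MATLAB as follows. This is a
minimal illustration and not the authors' implementation: the toy block and the variable
names (blk, h, bestT, gamma2) are assumptions, pixels brighter than the threshold are
arbitrarily labeled 1, and the class statistics are taken from cumulative sums of the
histogram instead of the recursive update mentioned in the text.

```matlab
% Toy 8-bit gray block (hypothetical data): dark background, bright strokes.
blk = [ 10  12  11 200;
        13 210 205  12;
        11 198  10  14;
        12  11 202  13];

% Histogram over gray levels 0..255 (index v+1 holds the count of value v).
h = accumarray(double(blk(:)) + 1, 1, [256 1]);
v = (0:255)';

% Cumulative count, sum and sum of squares of the pixels with value <= t.
n0 = cumsum(h);
s0 = cumsum(h .* v);
q0 = cumsum(h .* v.^2);
n1 = n0(end) - n0;             % the same quantities for pixels with value > t
s1 = s0(end) - s0;
q1 = q0(end) - q0;

% N0*sigma0^2 and N1*sigma1^2 for every t; an empty class contributes 0.
ss0 = q0 - s0.^2 ./ max(n0, 1);
ss1 = q1 - s1.^2 ./ max(n1, 1);

% Total subclass variance for every threshold, and its minimizer.
totalVar = (ss0 + ss1) / numel(blk);
[gamma2, idx] = min(totalVar);
bestT = idx - 1;

C = double(blk > bestT);       % binary block: 1 = foreground, 0 = background
```

Because every threshold between the two clusters gives the same split, min returns the
smallest such threshold; any value in that range would produce the same binary block.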
4.1.2 Global Segmentation
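The global segmentation step integrates the individual segmentations of each
block into a single consistent segmentation of the page. To do this, we allow each block
to be modified using a class assignment denoted by $s_{i,j} \in \{0, 1, 2, 3\}$:
$$
\begin{aligned}
s_{i,j} = 0 &\;\Rightarrow\; \tilde{C}_{i,j} = C_{i,j} && \text{(original)} \\
s_{i,j} = 1 &\;\Rightarrow\; \tilde{C}_{i,j} = \neg C_{i,j} && \text{(reversed)} \\
s_{i,j} = 2 &\;\Rightarrow\; \tilde{C}_{i,j} = \{0\}^{m \times m} && \text{(all background)} \\
s_{i,j} = 3 &\;\Rightarrow\; \tilde{C}_{i,j} = \{1\}^{m \times m} && \text{(all foreground)}
\end{aligned}
$$
where $C_{i,j}$ is the binary segmentation of block $(i,j)$ produced by the blockwise step
and $\tilde{C}_{i,j}$ is the modified block.
Notice that for each block, the four possible values of $s_{i,j}$ correspond to four possible
changes in the block's segmentation: original, reversed, all background, or all foreground.
If the block class is original, then the original binary segmentation of the block is
retained. If the block class is reversed, then the assignment of each pixel in the block is
reversed. If the block class is set to all background or all foreground, then the pixels
in the block are set to all 0s or all 1s, respectively. Fig. 4.3 illustrates an example of the
four possible classes, where black indicates a label of 1 (foreground) and white
indicates a label of 0 (background).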
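
As a small illustration of the four block classes described above, the following MATLAB
sketch builds the four candidate masks for one block. The 3-by-3 block and the names C,
Ctilde and fgFraction are hypothetical and chosen only for this example.

```matlab
% Hypothetical binary block produced by the blockwise step.
C = [0 1 0;
     1 1 0;
     0 0 0];

% Candidate masks for the four block classes 0, 1, 2 and 3.
Ctilde = cell(1, 4);
Ctilde{1} = C;                 % class 0: original
Ctilde{2} = 1 - C;             % class 1: reversed
Ctilde{3} = zeros(size(C));    % class 2: all background
Ctilde{4} = ones(size(C));     % class 3: all foreground

% Fraction of foreground pixels in each candidate mask.
fgFraction = cellfun(@(B) sum(B(:)) / numel(B), Ctilde);
```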
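Our objective is then to select the class assignments, $s_{i,j}$, so that the resulting
binary masks, $\tilde{C}_{i,j}$, are consistent. We do this by minimizing the following
global cost as a function of the class assignments, $S = [s_{i,j}]$ for all $i, j$:
$$
f_1(S) = \sum_{i=1}^{M} \sum_{j=1}^{N} \left\{ E(s_{i,j})
+ \lambda_1 V_1(s_{i,j}, s_{i,j+1})
+ \lambda_2 V_2(s_{i,j}, s_{i+1,j})
+ \lambda_3 V_3(s_{i,j}) \right\}
$$
As shown, the cost function contains four terms: the first term represents the fit of
the segmentation to the image pixels, and the next three terms represent regularizing
constraints on the segmentation. The values $\lambda_1$, $\lambda_2$, and $\lambda_3$ are
model parameters which can be adjusted to achieve the best segmentation quality. The first
term, $E$, is the square root of the total subclass variance within a block given the
assumed segmentation. More specifically,
$$
E(s_{i,j}) =
\begin{cases}
\gamma_{i,j} & \text{if } s_{i,j} = 0 \text{ or } s_{i,j} = 1 \\
\sigma_{i,j} & \text{if } s_{i,j} = 2 \text{ or } s_{i,j} = 3
\end{cases}
$$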
where $\sigma_{i,j}$ is the standard deviation of all the pixels in the block and
$\gamma_{i,j}$ is the square root of the minimum total subclass variance found in the
blockwise step. Since $\gamma_{i,j}$ must always be less than or equal to $\sigma_{i,j}$,
the term $E$ can always be reduced by choosing a finer segmentation corresponding to
$s_{i,j} = 0$ or 1 rather than a smoother segmentation corresponding to $s_{i,j} = 2$ or 3.
The terms $V_1$ and $V_2$ regularize the segmentation by penalizing excessive spatial
variation in the segmentation. To compute the term $V_1$, the number of segmentation
mismatches between pixels in the overlapping region between block $\tilde{C}_{i,j}$ and the
horizontally adjacent block $\tilde{C}_{i,j+1}$ is counted. The term $V_1$ is then
calculated as the number of segmentation mismatches divided by the total number of pixels
in the overlapping region. The term $V_2$ is defined similarly for vertical mismatches. By
minimizing these terms, the segmentation of each block is made consistent with neighboring
blocks. The term $V_3$ denotes the number of pixels classified as foreground (i.e., 1) in
$\tilde{C}_{i,j}$ divided by the total number of pixels in the block. This cost is used to
ensure that most of the area of the image is classified as background.
For computational tractability, the cost minimization is iteratively performed on
individual rows of blocks, using a dynamic programming approach [37]. Note that the
row-wise approach does not generally minimize the global cost function in one pass through
the image. Therefore, multiple iterations are performed from top to bottom in order to
adequately incorporate the vertical consistency term. In the first iteration, the
optimization of the $i$th row incorporates the $V_2$ term containing only the $(i-1)$th
row. Starting from the second iteration, $V_2$ terms for both the $(i-1)$th row and the
$(i+1)$th row are included. The optimization stops when no changes occur to any of the
block classes. Experimentally, the sequence of updates typically converges within 20
iterations.
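
The row-wise dynamic programming step can be sketched in MATLAB as shown below. This is an
illustrative sketch under simplifying assumptions, not the authors' implementation: the
unary costs U (which would collect the fit, foreground and vertical terms for one row) and
the horizontal cost table H are made-up numbers, and H is taken to depend only on the pair
of classes, whereas the real mismatch cost depends on the pixels in the overlapping region.

```matlab
% Hypothetical costs for one row of N = 4 blocks and K = 4 classes.
% U(k, j): cost of giving class k-1 to block j with the other rows held fixed.
U = [1 1 6 6;
     5 5 5 5;
     4 4 1 1;
     6 6 6 6];
% H(k, l): horizontal cost between class k-1 in block j and class l-1 in block j+1.
H = 3 * (1 - eye(4));          % 0 if the classes agree, 3 otherwise (assumed)

[K, N] = size(U);
cost = zeros(K, N);            % cost(k, j): best cost of blocks 1..j ending in k
back = zeros(K, N);            % back-pointers used to recover the minimizer
cost(:, 1) = U(:, 1);
for j = 2:N
    for l = 1:K
        [best, back(l, j)] = min(cost(:, j-1) + H(:, l));
        cost(l, j) = best + U(l, j);
    end
end

% Trace back the minimizing class sequence for the row.
s = zeros(1, N);
[total, s(N)] = min(cost(:, N));
for j = N-1:-1:1
    s(j) = back(s(j+1), j+1);
end
s = s - 1;                     % report classes as 0..3, as in the text
```

With these numbers the first two blocks keep their original segmentation (class 0) and
the last two become all background (class 2), at the price of a single class change.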
The cost optimization produces a set of classes for overlapping blocks. Since the
output segmentation for each pixel is ambiguous due to the block overlap, the final COS
segmentation output is specified by the center region of each overlapping block. The
weighting coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ were found by minimizing
the weighted error between the segmentation results of training images and the
corresponding ground truth segmentations. A ground truth segmentation was generated
manually by creating a mask that indicates the text in the image. The weighted error
criterion which we minimized is given by
$$
\varepsilon_{\text{weighted}} = (1 - \omega)\, N_{MD} + \omega\, N_{FD}
$$
where $\omega \in [0, 1]$, and the terms $N_{MD}$ and $N_{FD}$ are the numbers of pixels in
the missed detection and false detection categories, respectively.
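
As a worked example of the weighted error, the following MATLAB sketch counts missed and
false detections for a toy mask pair. The masks and the value of omega are hypothetical
and serve only to illustrate the computation.

```matlab
% Hypothetical 4x4 ground truth and segmentation masks (1 = text).
truth = [1 1 0 0; 1 1 0 0; 0 0 0 0; 0 0 0 0];
seg   = [1 0 0 0; 1 1 0 1; 0 0 0 1; 0 0 0 0];

omega = 0.25;                               % assumed weight in [0, 1]
N_MD = sum(truth(:) == 1 & seg(:) == 0);    % text pixels that were missed
N_FD = sum(truth(:) == 0 & seg(:) == 1);    % non-text pixels marked as text
err  = (1 - omega) * N_MD + omega * N_FD;   % weighted error
```

A small omega makes missed detections count more heavily than false detections.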
Figure 4.4: Illustration of how the component inversion step can correct erroneous
segmentations of text. (a) Original document before segmentation. (b) Result of COS
binary segmentation. (c) Corrected segmentation after component inversion.
4.2 Connected component classification (CCC)

The CCC algorithm refines the segmentation produced by COS by removing
many of the erroneously detected nontext components. The CCC algorithm proceeds in
three steps: connected component extraction, component inversion, and component
classification. The connected component extraction step identifies all connected
components in the COS binary segmentation using a 4-point neighborhood. In this case,
connected components smaller than six pixels were ignored because they are nearly
invisible at 300 dpi resolution.
The component inversion step corrects text segmentation errors that
sometimes occur in COS segmentation when text is locally embedded in a highlighted
region. Fig. 4.4(b) illustrates this type of error, where text is initially segmented as
background. Notice that the text "100 Years of Engineering Excellence" is initially
segmented as background due to the red surrounding region. In order to correct these
errors, we first detect foreground components that contain more than eight interior
background components (holes). In each case, if the total number of interior background
pixels is less than half of the surrounding foreground pixels, the foreground and
background assignments are inverted. Fig. 4.4(c) shows the result of this inversion
process. Note that this type of error is a rare occurrence in the COS segmentation.
The final step of component classification is
performed by extracting a feature vector for each component, and then computing a MAP
estimate of the component label. The feature vector, $y_i$, is calculated for each
connected component, $CC_i$, in the COS segmentation. Each $y_i$ is a 4-D feature vector
which describes aspects of the $i$th connected component, including edge depth and color
uniformity. Finally, the feature vector $y_i$ is used to determine the class label, $x_i$,
which takes a value of 0 for nontext and 1 for text.
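
The connected component extraction step can be illustrated with the following MATLAB
sketch, which labels components with a 4-point neighborhood using a simple flood fill and
then applies the six-pixel size filter. The toy mask and all variable names are
assumptions, and no toolbox functions are used, so this is a sketch of the idea rather
than the implementation used in the experiments.

```matlab
% Hypothetical binary mask (1 = foreground).
B = [1 1 0 0 0 0;
     1 1 0 0 1 0;
     1 1 0 0 0 0;
     0 0 0 1 1 1;
     0 0 0 1 1 1];
minSize = 6;                        % components smaller than this are ignored

[nr, nc] = size(B);
labels = zeros(nr, nc);             % 0 = background or not yet labeled
nComp = 0;
offsets = [-1 0; 1 0; 0 -1; 0 1];   % 4-point neighborhood

for r0 = 1:nr
    for c0 = 1:nc
        if B(r0, c0) == 1 && labels(r0, c0) == 0
            % Flood fill from the seed pixel using an explicit stack.
            nComp = nComp + 1;
            labels(r0, c0) = nComp;
            stack = [r0 c0];
            while ~isempty(stack)
                p = stack(end, :);
                stack(end, :) = [];
                for k = 1:4
                    q = p + offsets(k, :);
                    if q(1) >= 1 && q(1) <= nr && q(2) >= 1 && q(2) <= nc ...
                            && B(q(1), q(2)) == 1 && labels(q(1), q(2)) == 0
                        labels(q(1), q(2)) = nComp;
                        stack(end+1, :) = q;
                    end
                end
            end
        end
    end
end

% Size of each component, and the components that pass the size filter.
sizes = accumarray(labels(labels > 0), 1, [nComp 1]);
keep  = find(sizes >= minSize)';
```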
Figure 4.5: Illustration of a Bayesian segmentation model. Line segments indicate
dependency between random variables.
The Bayesian segmentation model used for the CCC algorithm is shown in Fig. 4.5.
The conditional distribution of the feature vector $y_i$ given $x_i$ is modeled by a
multivariate Gaussian mixture, while the underlying true segmentation labels are modeled
by a Markov random field (MRF). Using this model, we classify each component by calculating
the MAP estimate of the labels, $x_i$, given the feature vectors, $y_i$. In order to do
this, we first determine which components are neighbors in the MRF. This is done based
upon the geometric distance between components on the page.
4.2.1 Statistical Model
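Here, we describe more details of the statistical model used for the CCC
algorithm. The feature vectors for the text and non-text groups are modeled as
$D$-dimensional multivariate Gaussian mixture distributions,
$$
p(y_i \mid x_i) = \sum_{m=0}^{M_{x_i}-1} \frac{a_{x_i,m}}{(2\pi)^{D/2}}
\left| R_{x_i,m} \right|^{-1/2}
\exp\left\{ -\frac{1}{2} (y_i - \mu_{x_i,m})^T R_{x_i,m}^{-1} (y_i - \mu_{x_i,m}) \right\}
$$
where $x_i \in \{0, 1\}$ is the class label (non-text or text) of the $i$th connected
component, $M_0$ and $M_1$ are the numbers of clusters in each Gaussian mixture
distribution, and $\mu_{x_i,m}$, $R_{x_i,m}$, and $a_{x_i,m}$ are the mean, covariance
matrix, and weighting coefficient of the $m$th cluster in each distribution. In order to
simplify the data model, we also assume that the values $Y_i$ are conditionally
independent given the associated values $X_i$:
$$
p(y \mid x) = \prod_{i=1}^{N} p(y_i \mid x_i)
$$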
The components of the feature vectors include measurements of edge depth and
external color uniformity of the connected component. The edge depth is defined as the
Euclidean distance between RGB values of neighboring pixels across the component
boundary (defined in the initial COS segmentation). The color uniformity is associated
with the variation of the pixels outside the boundary. In this experiment, we defined a
feature vector with four components, where the first two are mean and variance of the
edge depth and the last two are the variance and range of external pixel values. More
details are provided in the Appendix. To use an MRF, we must define a neighborhood
system. To do this, we first find the pixel location at the center of mass for each
connected component. Then for each component we search outward in a spiral pattern
until the $k$ nearest neighbors are found. The number $k$ is determined in an offline
training process along with other model parameters. We will use the symbol $\partial s$ to
denote the set of neighbors of connected component $s$. To ensure all neighbors are
mutual, if component $s$ is a neighbor of component $r$ (i.e., $s \in \partial r$), we add
component $r$ to the neighbor list of component $s$ (i.e., $r \in \partial s$) if this is
not already the case. In order to specify the distribution of the MRF, we first define
augmented feature vectors.
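
The construction of a mutual neighborhood system can be sketched in MATLAB as follows.
The component centers and the value of k are hypothetical, and the nearest neighbors are
found by brute-force distance sorting rather than by the spiral search described above,
which is an assumption made to keep the example short.

```matlab
% Hypothetical component centers (row, column) and neighbor count k.
ctr = [0 0; 0 1; 0 3; 10 10];
k = 1;
n = size(ctr, 1);

isNbr = false(n);                   % isNbr(s, r) is true if r is a neighbor of s
for s = 1:n
    dist = sqrt(sum((ctr - repmat(ctr(s, :), n, 1)).^2, 2));
    dist(s) = Inf;                  % a component is not its own neighbor
    [~, order] = sort(dist);
    isNbr(s, order(1:k)) = true;    % k nearest components (brute force)
end
isNbr = isNbr | isNbr';             % make every neighbor relation mutual
```

After the symmetrization, a component may have more than k neighbors: here the second
component is the nearest neighbor of both the first and the third component.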
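The augmented feature vector, $z_i$, for the connected component $CC_i$ consists of the
feature vector $y_i$ concatenated with the horizontal and vertical pixel location of the
connected component's center. We found the location of connected components to be
extremely valuable contextual information for text detection. For more details of the
augmented feature vector, see the Appendix. Next, we define a measure of dissimilarity
between connected components in terms of the Mahalanobis distance of the augmented
feature vectors, given by
$$
d_{i,j} = \sqrt{ (z_i - z_j)^T\, \Sigma^{-1}\, (z_i - z_j) }
$$
where $\Sigma$ is the covariance matrix of the augmented feature vectors on training
data. Next, the Mahalanobis distance, $d_{i,j}$, is normalized using the equations
$$
D_{i,j} = \frac{d_{i,j}}{\frac{1}{2}\left( \bar{d}_{\partial i} + \bar{d}_{\partial j} \right)},
\qquad
\bar{d}_{\partial i} = \frac{1}{|\partial i|} \sum_{k \in \partial i} d_{i,k},
\qquad
\bar{d}_{\partial j} = \frac{1}{|\partial j|} \sum_{k \in \partial j} d_{j,k}
$$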
Using the defined neighborhood system, we adopted an MRF model with pairwise cliques.
Let $\mathcal{P}$ be the set of all $\{i, j\}$ where $i$ and $j$ denote neighboring
connected components. Then, the $X_i$ are assumed to be distributed as
$$
p(x) = \frac{1}{Z} \exp\left\{ -\sum_{\{i,j\} \in \mathcal{P}} w_{i,j}\,
\delta(x_i \neq x_j) \right\},
\qquad
w_{i,j} = \frac{b}{D_{i,j}^{\,p} + a}
$$
where $Z$ is a normalizing constant, $\delta(\cdot)$ is an indicator function taking the
value 0 or 1, and $a$, $b$, and $p$ are scalar parameters of the MRF model. As we can see,
the classification probability is penalized by the number of neighboring pairs which have
different classes. This number is also weighted by the term $w_{i,j}$. If there exists a
similar neighbor close to a given component, the term $w_{i,j}$ becomes large since
$D_{i,j}$ is small. This favors increasing the probability that the two similar neighbors
have the same class.
Figure 4.6: Illustration of classification probability of $X_i$ given a single neighbor
$X_j$ as a function of the distance between the two components.
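
The effect of the normalized distance on the pairwise weight can be illustrated with the
MATLAB sketch below. The augmented feature vectors Z, the covariance Sigma, the neighbor
lists nbr, and the parameter values a, b and p are all hypothetical, and the weight is
assumed to have the form b / (D^p + a), which grows as the normalized distance D shrinks.

```matlab
% Hypothetical augmented feature vectors (one row per component) and a
% covariance matrix that would normally be estimated from training data.
Z = [0 0; 1 0; 0 2; 5 5];
Sigma = [1 0; 0 4];
nbr = {[2 3], [1 3], [1 2 4], 3};   % mutual neighbor lists (assumed)
a = 0.1; b = 1.0; p = 2;            % assumed MRF parameters

n = size(Z, 1);
d = zeros(n);                       % Mahalanobis distance between components
for i = 1:n
    for j = 1:n
        dz = Z(i, :) - Z(j, :);
        d(i, j) = sqrt(dz / Sigma * dz');
    end
end

% Mean distance from each component to its own neighbors.
dbar = zeros(n, 1);
for i = 1:n
    dbar(i) = mean(d(i, nbr{i}));
end

% Normalized distance and pairwise weight of a neighboring pair.
Dfun = @(i, j) d(i, j) / (0.5 * (dbar(i) + dbar(j)));
wfun = @(i, j) b / (Dfun(i, j)^p + a);

w12 = wfun(1, 2);                   % close, similar pair: larger weight
w34 = wfun(3, 4);                   % distant pair: smaller weight
```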
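Using the models above, the MAP estimate of the labels is obtained by minimizing the cost
$$
\hat{x}_{MAP} = \arg\min_{x \in \{0,1\}^N} \left\{ -\sum_{i} \log p(y_i \mid x_i)
+ \sum_{\{i,j\} \in \mathcal{P}} w_{i,j}\, \delta(x_i \neq x_j)
+ c_{text} \sum_{i} \delta(x_i = 0) \right\} \qquad (12)
$$
We introduced the term $c_{text}$ to control the tradeoff between missed and false
detections. The term may take a positive or negative value. If $c_{text}$ is positive,
both text detections and false detections increase. If it is negative, both false
detections and text detections are reduced. To find an approximate solution of (12), we
use iterative conditional modes (ICM), which sequentially maximizes the local posterior
probability of each label [38], [39]. The classification labels, $x_i$, are initialized
with their maximum likelihood (ML) estimates, and then the ICM procedure iterates through
the set of classification labels until a stable solution is reached.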
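
The ICM procedure can be sketched in MATLAB as follows. This is a minimal illustration
under stated assumptions: the unary costs U stand in for the negative log-likelihoods
(plus any per-label penalty), the symmetric matrix W holds made-up pairwise weights, and
ties are resolved in favor of label 0.

```matlab
% Hypothetical unary costs: U(i, 1) for label 0 (non-text), U(i, 2) for label 1
% (text), e.g. negative log-likelihoods from the Gaussian mixture models.
U = [0.2 2.0;
     1.1 1.0;
     0.3 1.9;
     2.5 0.1];
% Symmetric pairwise weights; W(i, j) > 0 only for neighboring components.
W = [0   1.0 1.0 0;
     1.0 0   1.0 0;
     1.0 1.0 0   0.2;
     0   0   0.2 0];

n = size(U, 1);
[~, x] = min(U, [], 2);        % initialize with the maximum likelihood labels
x = x - 1;                     % labels as 0/1 (column vector)

changed = true;
while changed
    changed = false;
    for i = 1:n
        % Local cost of each candidate label with all other labels held fixed.
        c0 = U(i, 1) + sum(W(i, :)' .* (x ~= 0));
        c1 = U(i, 2) + sum(W(i, :)' .* (x ~= 1));
        newLabel = double(c1 < c0);
        if newLabel ~= x(i)
            x(i) = newLabel;
            changed = true;
        end
    end
end
```

In this toy example the second component is weakly classified as text by its likelihood
alone, but its two strongly coupled non-text neighbors flip it to non-text, while the
fourth component keeps its text label because its coupling is weak.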
4.3 Multiscale-COS/CCC Segmentation Scheme

In order to improve accuracy in the detection of text with varying size, we
incorporated a multiscale framework into the COS/CCC segmentation algorithm. The
multiscale framework allows us to detect both large and small components by combining
results from different resolutions. Since the COS algorithm uses a single block size (i.e.,
single scale), we found that large blocks are typically better suited for detecting large
text, and small blocks are better suited for small text. In order to improve the detection of
both large and small text, we use a multiscale segmentation scheme which uses the results
of coarse-scale segmentations to guide segmentation on finer scales. Note that both COS
and CCC segmentations are performed on each scale; however, only COS is adapted to
the multiscale scheme.
Figure 4.7: Illustration of a multiscale-COS/CCC algorithm. Segmentation progresses
from coarse to fine scales, incorporating the segmentation result from the previous
coarser scale. Both COS and CCC are performed on each scale; however, only COS was
adapted to the multiscale scheme.
The figure above shows an overview of our multiscale-COS/CCC scheme. In the multiscale
scheme, segmentation progresses from coarse to fine scales, where the coarser scales use
larger block sizes, and finer scales use smaller block sizes. Each scale is numbered from
$L-1$ to 0, where $L-1$ is the coarsest scale and 0 is the finest scale. The COS algorithm
is modified to use different block sizes for each scale, and incorporates the previous
coarser segmentation result by adding a new term to the cost function.
The parameter $\omega$, the weighting factor of the error for false detection, was fixed
to 0.5 for the multiscale-COS/CCC training process.
4.5 Performance evaluation of proposed system

The MRF-based binary segmentation used for the comparison is based upon the
MRF statistical model developed by Zheng and Doermann. The purpose of their
algorithm is to classify each component as either noise, hand-written text, or machine
printed text from binary image inputs. Due to the complexity of implementation, we used
a modified version of the CCC algorithm incorporating their MRF model by simply
replacing our MRF classification model by their MRF noise classification model. The
multiscale COS algorithm was applied without any change. The clique frequencies of
their model were calculated through offline training using a training data set. Other
parameters were set as proposed in the paper.
4.5.1 Preprocessing
For consistency all scanner outputs were converted to RGB color coordinates and
descreened before segmentation. The scanned RGB values were first converted to an
intermediate device-independent color space, CIE XYZ, then transformed to RGB. Then,
resolution synthesis-based denoising (RSD) was applied.
This descreening procedure was applied to all of the training and test images. For
a fair comparison, the test images which were fed to other commercial segmentation
software packages were also descreened by the same procedure.
4.5.2 Segmentation Accuracy and Bit rate
To measure the segmentation accuracy of each algorithm, we used a set of
scanned documents along with corresponding ground truth segmentations. First, 38
documents were chosen from different document types, including flyers, newspapers, and
magazines. The documents were separated into 17 training images and 21 test images,
and then each document was scanned at 300 dots per inch (dpi) resolution on the EPSON
STYLUS PHOTO RX700 scanner. After manually segmenting each of the scanned
documents into text and nontext to create ground truth segmentations, we used the
training images to train the algorithms, as described in the previous sections. The
remaining test images were used to verify the segmentation quality. We also scanned the
test documents on two additional scanners: the HP Photosmart 3300 All-in-One series
and the Samsung SCX-5530FN.
These test images were used to examine the robustness of the algorithms to
scanner variations. Fig. 4.9 illustrates segmentations generated by Otsu/CCC,
multiscale-COS/CCC/Zheng, DjVu, LuraDocument, COS, COS/CCC, and multiscale-COS/CCC for
a 300 dpi test image. The ground truth segmentation is also shown. This test image
contains many complex features such as different color text, light-color text on a dark
background, and various sizes of text. As shown, COS accurately detects most text
components, but the number of false detections is quite large.
However, COS/CCC eliminates most of these false detections without
significantly sacrificing text detection. In addition, multiscale-COS/CCC generally
detects both large and small text with minimal false component detection. The Otsu/CCC
method misses many text detections. LuraDocument is very sensitive to sharp edges
embedded in picture regions and detects a large number of false components. DjVu also
detects some false components, but the error is less severe than that of LuraDocument.
The multiscale-COS/CCC/Zheng result is similar to our multiscale-COS/CCC result, but
our text detection error is slightly lower.
Figure 4.9: Binary masks generated from Otsu/CCC, multiscale-COS/CCC/Zheng, DjVu,
LuraDocument, COS, COS/CCC, and multiscale-COS/CCC. (a) Original test image. (b)
Ground truth segmentation. (c) Otsu/CCC. (d) Multiscale-COS/CCC/Zheng. (e) DjVu. (f)
LuraDocument. (g) COS. (h) COS/CCC. (i) Multiscale-COS/CCC.
CHAPTER-5
SOFTWARE REQUIREMENTS
The main tools required for this project can be classified into two broad categories.
1) Hardware requirement,
2) Software requirement.
5.1 Hardware Requirement

In the hardware part a normal computer where MATLAB software can be easily
operated is required, i.e. with a minimum system configuration of: RAM 512MB, hard
disk 20GB and with a processor Pentium III.
5.2 Software Requirements

In the software part, MATLAB software and the document image which is to be segmented
are the minimum requirement. Some of the benefits of MATLAB in image processing are:
Easy to work with, as images are matrices
Built-in functions for complex operations and algorithms (e.g., FFT, DCT)
Image Processing Toolbox
Supports most image formats (.bmp, .jpg, .gif, .tiff, etc.)
5.2.1 Matlab Introduction

MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where
problems and solutions are expressed in familiar mathematical notation. Typical uses
include:
Math and computation
Algorithm development
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does
not require dimensioning. This allows solving many technical computing problems,
especially those with matrix and vector formulations.
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK and
EISPACK projects. Today, MATLAB uses software developed by the LAPACK and
ARPACK projects, which together represent the state-of-the-art in software for matrix
computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses in mathematics, engineering, and science. In industry, MATLAB is the
tool of choice for high-productivity research, development, and analysis.
MATLAB features a family of application-specific solutions called toolboxes.

Very important to most users of MATLAB, toolboxes allow learning and applying
specialized technology. Toolboxes are comprehensive collections of MATLAB functions
(M-files) that extend the MATLAB environment to solve particular classes of problems.
Areas in which toolboxes are available include signal processing, control systems, neural
networks, fuzzy logic, wavelets, simulation, and many others.
5.2.2 Overview
The MATLAB system consists of five main parts:
A. Development Environment
This is the set of tools and facilities that help to use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
Command Window, a command history, and browsers for viewing help, the workspace,
files, and the search path.
B. The MATLAB Mathematical Function Library
This is a vast collection of computational algorithms ranging from elementary
functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions
like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
C. The MATLAB Language
This is a high-level matrix/array language with control flow statements, functions,
data structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick and dirty throwaway programs, and
"programming in the large" to create complete large and complex application programs.
D. Handle Graphics
This is the MATLAB graphics system. It includes high-level commands for
two-dimensional and three-dimensional data visualization, image processing, animation, and
presentation graphics. It also includes low-level commands that allow full customization
of the appearance of graphics, as well as building complete graphical user interfaces for
MATLAB applications.
E. The MATLAB Application Program Interface (API)
This is a library that allows writing C and FORTRAN programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking),
calling MATLAB as a computational engine, and for reading and writing MAT-files.
5.2.3 Basic programming
This part provides a brief introduction to starting and quitting MATLAB, and the
tools and functions that help to work with MATLAB variables and files.
A. Starting MATLAB
On a Microsoft Windows platform, to start MATLAB, double-click the MATLAB
shortcut icon. On a UNIX platform, to start MATLAB, type matlab at the operating
system prompt. The directory in which MATLAB starts can be changed, startup options
can be defined (including running a script upon startup), and startup time can be reduced
in some situations.
B. Quitting MATLAB
To end a MATLAB session, select Exit MATLAB from the File menu in the
desktop, or type quit in the Command Window. To execute specified functions each time
MATLAB quits, such as saving the workspace, create and run a finish.m script.
C. MATLAB Desktop
When starting MATLAB, the MATLAB desktop appears, containing tools
(graphical user interfaces) for managing files, variables, and applications associated with
MATLAB. The first time MATLAB starts, the desktop appears as shown in the following
illustration, although your Launch Pad may contain different entries.
The way the desktop looks can be changed by opening, closing, moving,
and resizing the tools in it. Tools can also be moved outside of the desktop or returned
back inside the desktop (docking). All the desktop tools provide common
features such as context menus and keyboard shortcuts. Certain characteristics of the
desktop tools can be specified by selecting Preferences from the File menu, for example,
the font characteristics for Command Window text.
D. Desktop Tools
This section provides an introduction to MATLAB's desktop tools. MATLAB
functions are also used to perform most of the features found in the desktop tools. The
tools are:
Command Window
Command History
Launch Pad
Help Browser
Current Directory Browser
Workspace Browser
Array Editor
Editor/Debugger
i. Command Window
Use the Command Window to enter variables and run functions and M-files.
ii. Command History
The lines entered in the Command Window are logged in the Command History
window. In the Command History, previously used functions can be viewed, and selected
lines can be copied and executed. To save the input and output from a MATLAB session to a
file, use the diary function.
iii. Running External Programs
MATLAB can also be used to run external programs from the Command Window.
The exclamation point character ! is a shell escape and indicates that the rest of the
input line is a command to the operating system. This is useful for invoking utilities or
running other programs without quitting MATLAB. On Linux, for example, !emacs
magik.m invokes an editor called emacs for a file named magik.m. When quitting the
external program, the operating system returns control to MATLAB.
iv. Launch Pad
MATLAB's Launch Pad provides easy access to tools, demos, and documentation.
v. Help Browser
Use the Help browser to search and view documentation for all the MathWorks
products. The Help browser is a Web browser integrated into the MATLAB desktop that
displays HTML documents.
To open the Help browser, click the help button in the toolbar, or type helpbrowser
in the Command Window. The Help browser consists of two panes: the Help
Navigator, which is used to find information, and the display pane, which is used to view
the information.
vi. Help Navigator
Use the Help Navigator to find information. It includes:
Product filter - Set the filter to show documentation only for the products
specified.
Contents tab - View the titles and tables of contents of documentation for
products.
Index tab - Find specific index entries (selected keywords) in the MathWorks
documentation.
Search tab - Look for a specific phrase in the documentation. To get help for a
specific function, set the Search type to Function Name.
Favorites tab - View a list of documents previously designated as favorites.

vii. Display Pane
After finding documentation using the Help Navigator, view it in the display
pane. While viewing the documentation, it is possible to:
Browse to other pages - Use the arrows at the tops and bottoms of the pages, or
use the back and forward buttons in the toolbar.
Bookmark pages - Click the Add to Favorites button in the toolbar.
Print pages - Click the print button in the toolbar.
Find a term in the page - Type a term in the Find in page field in the toolbar and
click Go.
Other features available in the display pane are: copying information, evaluating a
selection, and viewing Web pages.

viii. Current Directory Browser
To determine how to execute the functions that are called, MATLAB uses a search path to
find M-files and other MATLAB-related files, which are organized in directories on the
file system. Any file to be run in MATLAB must reside in the current directory or in a
directory that is on the search path. By default, the files supplied with MATLAB and
MathWorks toolboxes are included in the search path.
ix. Workspace Browser
The MATLAB workspace consists of the set of variables (named arrays) built up
during a MATLAB session and stored in memory. Variables are added to the workspace
by using functions, running M-files, and loading saved workspaces.
To view the workspace and information about each variable, use the Workspace
browser, or use the functions who and whos. To delete variables from the workspace,
select the variable and select Delete from the Edit menu. Alternatively, use the clear
function.
To save the workspace to a file that can be read during a later MATLAB session,
select Save Workspace As from the File menu, or use the save function. This saves the
workspace to a binary file called a MAT-file, which has a .mat extension. There are
options for saving to different formats. To read in a MAT-file, select Import Data from the
File menu, or use the load function.
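
The workspace functions mentioned above can be tried with a short session such as the
following sketch. The variable names and the temporary file name are chosen only for this
example.

```matlab
% Create two variables, save them to a temporary MAT-file, clear, and reload.
A = [1 2; 3 4];
msg = 'hello';
matFile = [tempname '.mat'];   % temporary file name used only in this example
save(matFile, 'A', 'msg');     % write the two variables to the MAT-file

clear A msg                    % remove them from the workspace
whos                           % list what is still in the workspace
load(matFile);                 % restore A and msg from the file
delete(matFile);               % clean up the temporary file
```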
5.2.4 Array Editor
Double-click on a variable in the Workspace browser to see it in the Array Editor.

Use the Array Editor to view and edit a visual representation of one- or two-dimensional
numeric arrays, strings, and cell arrays of strings that are in the workspace.
5.2.5 Editor/Debugger
Use the Editor/Debugger to create and debug M-files, which are programs to run
MATLAB functions. The Editor/Debugger provides a graphical user interface for basic
text editing, as well as for M-file debugging. Any text editor, such as Emacs, can be used
to create M-files, and preferences (accessible from the desktop File menu) can be used to
specify that editor as the default. If another editor is used, the MATLAB Editor/Debugger
can still be used for debugging, as can debugging functions such as dbstop, which sets a
breakpoint.
If only the contents of an M-file need to be viewed, display it in the Command
Window by using the type function.
5.2.6 Manipulating Matrices
Entering Matrices
The best way to get started with MATLAB is to learn how to handle matrices.
Start MATLAB and follow along with each example.
Enter matrices into MATLAB in several different ways:
Enter an explicit list of elements.
Load matrices from external data files.
Generate matrices using built-in functions.
Create matrices with your own functions in M-files.
Start by entering Dürer's matrix as a list of its elements. Follow a few basic
conventions:
Separate the elements of a row with blanks or commas.
Use a semicolon to indicate the end of each row.
Surround the entire list of elements with square brackets, [ ].
To enter Dürer's matrix, simply type in the Command Window

A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

MATLAB displays the matrix just entered.

A =
    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1

This exactly matches the numbers in the engraving. Once the matrix has been entered, it is
automatically remembered in the MATLAB workspace. It can now be referred to simply as A.
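
As a quick check that the matrix was entered correctly, the sums of its rows, columns and
diagonals can be compared; Dürer's matrix is a magic square, so they should all be equal.
The variable names below are chosen only for this sketch.

```matlab
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1];

colSums = sum(A);                  % sum of each column (row vector)
rowSums = sum(A, 2)';              % sum of each row, transposed to a row vector
diagSum = sum(diag(A));            % main diagonal
antiSum = sum(diag(fliplr(A)));    % anti-diagonal
```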
5.2.7 Expressions
Like most other programming languages, MATLAB provides mathematical
expressions, but unlike most programming languages, these expressions involve entire
matrices. The building blocks of expressions are:
Variables
Numbers
Operators
Functions
5.2.8 Variables
MATLAB does not require any type declarations or dimension statements. When
MATLAB encounters a new variable name, it automatically creates the variable and
allocates the appropriate amount of storage. If the variable already exists, MATLAB
changes its contents and, if necessary, allocates new storage. For example, num_students
= 25 creates a 1-by-1 matrix named num_students and stores the value 25 in its single
element. Variable names consist of a letter, followed by any number of letters, digits, or
underscores. MATLAB uses only the first 31 characters of a variable name. MATLAB is
case sensitive; it distinguishes between uppercase and lowercase letters. A and a are not
the same variable. To view the matrix assigned to any variable, simply enter the variable
name.
5.2.9 Numbers
MATLAB uses conventional decimal notation, with an optional decimal point and
leading plus or minus sign, for numbers. Scientific notation uses the letter e to specify a
power-of-ten scale factor. Imaginary numbers use either i or j as a suffix. Some examples
of legal numbers are
3            -99            0.0001
9.6397238    1.60210e-20    6.02252e23
1i           -3.14159j      3e5i
All numbers are stored internally using the long format specified by the IEEE
floating-point standard. Floating-point numbers have a finite precision of roughly 16
significant decimal digits and a finite range of roughly 10^-308 to 10^+308.
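
The number formats and the finite precision described above can be tried with a few
assignments. This is a small sketch; the variable names are arbitrary.

```matlab
x = 3e5i;                 % imaginary number written in scientific notation
y = -3.14159j;            % j is accepted as the imaginary suffix as well
oneUp   = 1 + eps;        % the next double above 1 (eps is about 2.2e-16)
rounded = 1 + eps/2;      % too small a step: rounds back to exactly 1
```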
5.2.10 Operators
Expressions use familiar arithmetic operators and precedence rules.
+      Addition
-      Subtraction
*      Multiplication
/      Division
\      Left division (described in "Matrices and Linear Algebra" in Using MATLAB)
^      Power
'      Complex conjugate transpose
( )    Specify evaluation order
Table 5.1 Arithmetic Operators
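
A few of the operators in Table 5.1 are illustrated below. The matrices are arbitrary
values chosen for this sketch.

```matlab
A = [2 0; 0 4];
b = [2; 8];
x = A \ b;          % left division: solves A*x = b
y = b' / A;         % right division: solves y*A = b'
p = 2 ^ 3;          % power
z = (1 + 2i)';      % complex conjugate transpose of a scalar
```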
5.3 Functions
MATLAB provides a large number of standard elementary mathematical
functions, including abs, sqrt, exp, and sin. Taking the square root or logarithm of a
negative number is not an error; the appropriate complex result is produced
automatically. MATLAB also provides many more advanced mathematical functions,
including Bessel and gamma functions. Most of these functions accept complex
arguments. For a list of the elementary mathematical functions, type
help elfun
For a list of more advanced mathematical and matrix functions, type
help specfun
help elmat
Some of the functions, like sqrt and sin, are built-in. They are part of the
MATLAB core so they are very efficient, but the computational details are not readily
accessible. Other functions, like gamma and sinh, are implemented in M-files. Their code
can be viewed and even modified. Several special functions provide values of useful
constants.
pi         3.14159265...
i          Imaginary unit, sqrt(-1)
j          Same as i
eps        Floating-point relative precision, 2^-52
realmin    Smallest floating-point number, 2^-1022
realmax    Largest floating-point number, (2-eps)*2^1023
Inf        Infinity
NaN        Not-a-number
Table 5.2 Values of Useful Constants in MATLAB
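The behaviour of the elementary functions on negative arguments and the values in Table 5.2 can be verified with a short sketch. The variable names below are illustrative assumptions, not part of the project code.

```matlab
% Square root and logarithm of a negative number give complex results
r = sqrt(-4);           % 0 + 2i
l = log(-1);            % 0 + pi*i

% eps is the spacing between 1 and the next larger double
epsIsTwoToMinus52 = (eps == 2^-52);
nextAfterOne = (1 + eps) > 1;
lostBelowEps = (1 + eps/2) == 1;   % half of eps is rounded away

% realmin and realmax match the expressions in Table 5.2
minOk = (realmin == 2^-1022);
maxOk = (realmax == (2 - eps) * 2^1023);

% Inf and NaN arise from division by zero and undefined operations
p = 1/0;                % Inf
q = 0/0;                % NaN
nanNotEqual = (q ~= q); % NaN compares unequal to itself
```

Because NaN never equals itself, isnan is the usual way to test for it.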
CHAPTER-6
EXPERIMENTAL RESULTS
CHAPTER-7
CONCLUSION
We presented a novel segmentation algorithm for the compression of raster
documents. While the COS algorithm generates consistent initial segmentations, the CCC
algorithm substantially reduces false detections through the use of a component-wise
MRF context model. The MRF model uses a pair-wise Gibbs distribution which more
heavily weights nearby components with similar features. We showed that the COS/CCC
algorithm achieves greater text detection accuracy with a lower false detection rate, as
compared to state-of-the-art commercial MRC products. Such text-only segmentations
are also potentially useful for document processing applications such as OCR.
REFERENCES
[1] ITU-T Recommendation T.44 Mixed Raster Content (MRC), T.44, International
Telecommunication Union, 1999.
[2] G. Nagy, S. Seth, and M. Viswanathan, A prototype document image analysis system
for technical journals, Computer, vol. 25, no. 7, pp. 1022, 1992.
[3] K. Y.Wong and F. M.Wahl, Document analysis system, IBM J. Res. Develop., vol.
26, pp. 647656, 1982.
68

Using MATLAB
[4] J. Fisher, A rule-based system for document image segmentation, in Proc. 10th Int.
Conf. Pattern Recognit., 1990, pp. 567572.
[5] L. OGorman, The document spectrum for page layout analysis, IEEE Trans.
Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 11621173, Nov. 1993.
[6] Y. Chen and B. Wu, A multi-plane approach for te 14191444, 2009.
[7] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Upper Saddle
River, NJ: Pearson Education, 2008.
[8] ITU-T Recommendation T.44 Mixed Raster Content (MRC), T.44, International
Telecommunication Union, 1999.
[9] G. Nagy, S. Seth, and M. Viswanathan, A prototype document image analysis system
for technical journals, Computer, vol. 25, no. 7, pp. 1022, 1992.
[10] K. Y.Wong and F. M.Wahl, Document analysis system, IBM J. Res. Develop., vol.
26, pp. 647656, 1982.
[11] J. Fisher, A rule-based system for document image segmentation, in Proc. 10th
Int. Conf. Pattern Recognit., 1990, pp. 567572.
[12] L. OGorman, The document spectrum for page layout analysis, IEEE Trans.
Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 11621173, Nov. 1993.
[13] Y. Chen and B. Wu, A multi-plane approach for text segmentation of complex
document images, Pattern Recognit., vol. 42, no. 7, pp. 14191444, 2009.
[14] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Upper Saddle
River, NJ: Pearson Education, 2008.