
B.L.D.E. A’S V.P. Dr. P.G.

HALAKATTI COLLEGE OF ENGINEERING AND


TECHNOLOGY BIJAPUR – 586 103

DEPARTMENT
OF
COMPUTER SCIENCE AND ENGINEERING

SEMINAR REPORT ON
“DOCUMENT TEXT SEGMENTATION”

Submitted in partial fulfillment for the award of


BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
UNDER THE GUIDANCE OF
Prof. Sujata Desai

SUBMITTED BY

NAME USN
ANKITA KULKARNI 2BL16CS013
B.L.D.E. A’S V.P. Dr. P.G. HALAKATTI COLLEGE OF ENGINEERING
AND TECHNOLOGY BIJAPUR – 586 103

DEPARTMENT OF COMPUTER SCIENCE AND


ENGINEERING
CERTIFICATE
This is to certify that the seminar entitled “DOCUMENT TEXT SEGMENTATION” is
a bonafide work carried out by ANKITA KULKARNI (2BL16CS013) and submitted
in partial fulfillment for the award of the Degree of Bachelor of Engineering in the VIII
Semester of Visvesvaraya Technological University, Belgaum during the year
2019-2020.

GUIDE HOD PRINCIPAL

Prof. Sujata Desai Dr. Pushpa Patil Dr. Atul Ayare

CO-ORDINATORS
Prof. D. Y. Ijeri
Prof. S. S. Patil
ACKNOWLEDGEMENT

While presenting this Seminar on “Document Text Segmentation”, I feel that it is my
duty to acknowledge the help rendered to me by various persons.

Firstly, I thank God for showering his blessings on me. I am grateful to my college, V.P. Dr.
P.G. Halakatti College of Engineering and Technology, for providing me a congenial
atmosphere to carry out the seminar successfully.

I wish to express my sincere gratitude to Dr. Pushpa B. Patil, Head of the Department, for
providing me an opportunity to present this seminar.

I am thankful to my guide, Prof. Sujata Desai, for her constant support, ranging from
explaining the concepts, patiently answering even the most trivial and basic
questions, and helping me resolve issues in the actual work. She has been a remarkable
source of inspiration and constant motivation. Her guidance and nurturing greatly
strengthened my technical knowledge and skills.

Finally, I convey my appreciation to the CSE department for providing guidance,
support and timely assistance throughout the seminar.
VISVESVARAYA TECHNOLOGICAL UNIVERSITY JNANA
SANGAMA, BELAGAVI

Department of Computer Science and Engineering


B.L.D.E.A'S V.P. Dr. P.G. HALAKATTI COLLEGE OF ENGINEERING
AND TECHNOLOGY, VIJAYAPUR- 586101

DECLARATION

I, Ankita Kulkarni, a student of Eighth Semester B.E., Computer Science and Engineering,
B.L.D.E.A’s V.P. Dr. P.G. Halakatti College of Engineering and Technology, declare that the Seminar
Report titled “Document Text Segmentation” has been carried out by me and submitted in
partial fulfillment of the course requirements for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering of Visvesvaraya Technological University,
Belagavi, during the academic year 2019-2020.

Place: Vijayapur

Date:
ABSTRACT
Segmentation of text from image documents has many important applications, such as
document retrieval, object identification and detection of vehicle license plates, and it has
been a very popular research field in recent years. Text information in natural scene images
serves as an important clue for many image-based applications such as scene understanding,
content-based image retrieval, assistive navigation and automatic geocoding. However,
locating text in a complex background with multiple colours is a challenging task. Two
different techniques using morphological operations are proposed to find text strings in
natural scene images.
The system employs the Symlet wavelet and 2-means classification for segmentation of
text from image documents, and uses morphological operations such as dilation and
erosion in post-processing. The proposed method for text segmentation from image documents
has been implemented in MATLAB. A method for the segmentation of text regions from
compound images is also proposed. The Haar Discrete Wavelet Transform (DWT) is employed;
the resulting detail-component sub-bands contain both text edges and non-text edges. The
Sobel edge detector is applied to each sub-image to extract the strong edges, and the resulting
edges are combined into an edge map using a weighted OR operator. The edge map is
binarized using a threshold, and a morphological dilation operation is performed on the
processed edge map. The projection profiles are then analysed to localize the text regions.
Finally, a threshold is applied, which results in the segmentation of the real text regions from
the image.
CONTENT

CHAPTER NO. TITLE

1. INTRODUCTION

2. LITERATURE SURVEY

3. MOTIVATION

4. OBJECTIVES AND PROBLEM DEFINITION

5. ARCHITECTURE OF THE SYSTEM

6. TOOLS AND TECHNOLOGY

7. EXPERIMENTAL RESULTS

8. APPLICATIONS

CONCLUSION

REFERENCES
CHAPTER 1
INTRODUCTION

Text line extraction, or segmentation, is an important problem that does not have a
universally accepted solution in the context of automatic handwritten document recognition
systems. Text characteristics can vary in font, size, shape, style, orientation, alignment,
texture, colour, contrast and background information. These variations make the process of
word detection complex and difficult.

Text information extraction (TIE) is a technology used to extract text content from
still images and image sequences and turn it into machine-editable text. The three main
steps of TIE are text localization, text segmentation and text recognition. To increase the
recognition rate, the localized text regions must usually be segmented from the image and
converted into binary images before being sent to an OCR system for recognition. The
accuracy of text segmentation can strongly affect the text recognition rate. Scene text is a
kind of text that easily suffers from uneven lighting and complex backgrounds, so it is
challenging to achieve text segmentation under complex conditions. Many techniques for text
segmentation have been surveyed in the literature, and many other methods have been
proposed in recent years. Recent techniques for text segmentation can be classified into
stroke-based methods, colour-based methods and graph-based methods.

Stroke-based methods search for stroke-like structures in text images using a stroke
filter. Considering strokes as the intrinsic characteristic of text, a definition of text is given
first and a stroke filter is then designed to obtain the response of text strokes. After the text
colour polarity is determined, a local region-growing procedure is performed to refine the
binarized stroke-response map. This approach mainly makes use of the transitional colour
between the strokes and the adjacent background in embedded video text images.
Unfortunately, it may not work for scene text, as there is usually no transitional
colour between scene text characters and the adjacent background.

In colour-based methods, colour information is considered and text segmentation is
performed using a certain colour model. Song performed colour reduction and clustering using
the Euclidean distance to reduce the number of colours and the processing time. Each colour
cluster was treated as a colour plane, and each colour plane was classified into text,
background, noise or text edges. In the work of Mancas-Thillou et al., the Euclidean distance
and the cosine similarity were used for K-means clustering, and Log-Gabor filters were chosen
to combine colour and spatial information to segment characters properly. Based on the
observation that the hue value of chromatic pixels changes less than the lightness value under
shadows and uneven lighting conditions, Yao classified text regions into three types: gray text
regions, chromatic text regions, and mixed gray and chromatic text regions.

Segmentation algorithms based on hue, on hue and lightness, and on lightness were
performed respectively for the three types of text regions. For low-quality text
images, however, this method is not effective enough. Li utilized graph theory to
handle multi-polarity text segmentation: in their model, the intensity map of a
colour image was represented as an undirected, weighted graph by treating each pixel
group at a gray level as a node.
CHAPTER 2
LITERATURE SURVEY

1. Vyankatesh V. Rampurkar , Sahil K. Shah , Gyankamal J. Chhajed , Sanjay


Kumar Biswash’s “An Approach towards Text Detection from Complex Images
Using Morphological Techniques” (2018)
The proposed techniques are based on sibling methods, i.e. an adjacent character
grouping method and a text line grouping method. The text line grouping method can
locate text strings situated at arbitrary orientations. The proposed techniques find text
strings using structure-based partition and grouping methods built on morphological
operations.

2. “Text Line Segmentation Based on Morphology and Histogram Projection” by


Rodolfo P. dos Santos, Gabriela S. Clemente, Tsang Ing Ren and George D.C.
Calvalcanti (2009)
In order to segment text from a page document it is necessary to detect all the
possible manuscript text regions. In this article the authors propose an efficient
algorithm to segment handwritten text lines. The text line algorithm uses a
morphological operator to obtain the features of the images.

3. “Automatic Detection and Segmentation of Text in Low Quality Thai Sign


Images” by Wittaya Jirattitichareon and Thanarat H. Chalidabhongse (2006)
A system for automatic detection and segmentation of text in low quality Thai sign
images is presented in this paper. The method is designed as a part of a real-time
Thai sign translator system which can be used in many applications.

4. “Edge Based Segmentation Approach to Extract Text from Scene Images” by


Kumuda T and L Basavaraj (2017)
After text extraction, the results are given as input to an OCR and then displayed
in Notepad. The test results show that the proposed algorithm extracts text of
different fonts, sizes and orientations efficiently.
5. “Segmentation of text from compound images” by Dr.N.Krishnan , C. Nelson
Kennedy Babu , S.Ravi and Josphine Thavamani (2007)
In this paper, a method for the segmentation of the text regions from compound
images is proposed. The Haar Discrete Wavelet Transform (DWT) is employed, and the
resulting detail-component sub-bands contain both text edges and non-text edges.

6. “ Segmentation and Extraction of Text from Curved Text Lines using Image
Processing Approach” by Monika A. Shejwal, Sangita D. Bharkad (2017)
Camera-captured images containing text often have curved text lines because of
distortions from page curl and the viewing angle of the camera. While scanning a
document, the text should therefore be straight and the words properly in line, but
segmentation of curled text lines is a difficult step for dewarping techniques.
This paper presents a method based on image processing algorithms for
segmentation and extraction of characters from curled text lines in document
images.

7. “Text and Non-text Segmentation and Classification From Document Images”


by Zaidah Ibrahim Dino Isa and Rajprasad Rajkumar (2008)
Heuristic rules are used to segment and classify the text and non-text
blocks. This research focuses on the classification of non-text blocks in technical
documents into tables, graphs and figures. A comparative study is conducted between
a backpropagation neural network and a support vector machine, and the results show
that the support vector machine classifies better than the backpropagation neural network.

8. “Segmentation of Text From Image Document” by Ankush Gautam Department


of Information Technology Graphic Era University, Dehradun
Segmentation of text from image documents has many important applications, such as
document retrieval, object identification and detection of vehicle license plates, and it
is a very popular research field. In this paper, the Symlet wavelet and 2-means
classification are employed for segmentation of text from image documents, with
morphological operations such as dilation and erosion used in post-processing. The
proposed method for text segmentation from image documents has been implemented in
MATLAB.
CHAPTER 3
MOTIVATION

Presently, many common documents such as newspapers, magazines, technical journals
and even business forms are being stored in digital form. Document layout analysis is
therefore very important, as it allows automatic classification based on total content rather
than just a few keywords. Document layout analysis systems separate text and non-text
regions, and much research has been conducted in this area.

The digitization of documents is an important method for increasing the quality and
compatibility of documents. Nowadays, instead of scanning documents, people very often use
camera-captured document images, because cameras are usually available at low cost and
embedded in all mobile gadgets, giving quick and non-contact document imaging. The
quality of document images captured by camera is, however, often poor because of
camera perspective distortions, non-uniform shading, image blurring, character smearing (due
to low resolution) and lighting variations. Document image analysis therefore plays a vital
role in the extraction of information from document images. Extraction of straight text lines
from document images is easy compared to curved text lines; the text of a document may be
curved because of camera perspective and other distortions.

In this report, we try to tackle these obstacles in the binarization of camera-captured
images, using image processing algorithms for segmentation of the text lines of document
images.

Business forms usually do not consist of images or graphics but contain text within
specified areas or boxes. Text extraction from specific business forms is based on prior
knowledge about the location and the length of the text, while properties of the boxes, such as
height and width, are sometimes used to determine the location of the text.

Other documents, like newspapers, magazines and technical journals, have a specific layout
structure, and text extraction is based on prior knowledge of the location of text elements
like headers, footers and page numbers. Since these documents sometimes contain non-text
regions, RLSA and PPC are used to segment the regions, and heuristic rules are applied to
classify the resulting text and non-text regions, since the layout structure of most of these
documents is known. Some systems apply multi-stage thresholding; others use histograms and
heuristics for block segmentation and classification, or take interactive input from the user
for the desired block. One approach performs horizontal dilation with a fixed line length so
that the lines can be extracted completely, and applies heuristic rules for text and non-text
region classification. Region growing and analysis with heuristic rules is also applied where
a generic model of the paper layout is known, as are connected component generation and
heuristics.
CHAPTER 4

OBJECTIVES AND PROBLEM DEFINITION

Image segmentation is a task of fundamental importance in digital image analysis: it is
the process of partitioning a digital image into multiple segments. Documents in which text is
embedded in pictures are increasingly common today, for example in magazines,
advertisements and web pages. Robust detection of text in these documents receives
growing attention owing to its many applications: content-based indexing, text search in
images and archiving of documents.
In the face of the very large mass of information exchanged between different
organizations, the need for systems allowing the recognition, indexation, information
retrieval and automatic classification of complex multi-lingual and multi-script document
images has grown continuously. However, most work on backward conversion of printed
document images is limited to textual block recognition, without handling complex
documents such as letters of information, forms, all types of application sheets, etc. In
practice, these documents can be noisy, skewed, deformed, multi-lingual and multi-script
with irregular textures, and may contain several heterogeneous blocks such as annotations,
machine-printed and/or handwritten script, graphics, pictures, logos, photographs and tabular
structures. This situation makes it difficult to analyse and recognize document images.
There are two main approaches for text and picture segmentation in image documents,
namely the region-based approach and the texture-based approach.

The text regions in a document image can be detected either by region-based or texture-
based methods. These are relatively independent of changes in text size and orientation, but
have difficulties with complex images with non-uniform backgrounds; for example, if a
text string touches a graphical object in the original image, they may form one connected
component in the resultant binary image.
In the region-based approach, we consider each pixel in the image and assign it to a
particular region or object. This approach is divided into two subcategories: edge-based
and connected-component based.

The idea behind edge-based algorithms is that the edges of text symbols are
typically stronger than those of noise, textured backgrounds and other graphical items. In
these top-down techniques, a binary edge image is first generated using an edge detector, and
adjacent edges are then connected by applying morphological operations or other algorithms
such as run-length smoothing.
Connected components of the resulting image are the candidate text areas, as each one
represents either several merged lines or a graphical item. Each component is then
decomposed into smaller regions by analysing its vertical and horizontal projection profiles,
and finally each small region satisfying certain heuristic constraints is labelled as text.
Edge-based methods are fast and can detect text in complex backgrounds, but are restricted
to detecting only horizontally or vertically aligned text strings.

In the texture-based approach, the input image is usually considered a composite of text
and non-text classes, or of text, picture and background texture classes. Many segmentation
algorithms employ a classification window of a certain size in the hope that all or a majority
of the pixels in the window belong to the same class. A classification algorithm can then be
used to label each window in the feature space.

The literature on text segmentation is broad in scope, but there appears to be very little
literature on applying machine learning techniques to this subject. A text segmentation
algorithm should have adaptation and learning capability, but a learner usually needs much
time and training data to achieve satisfactory results, which restricts its practicality. To
overcome these problems, M. M. Haji and S. D. Katebi give a simple procedure for generating
training data from manually segmented images and then applying a Naive Bayes Classifier
(NBC), which is fast in both the training and application phases.
CHAPTER 5
ARCHITECTURE OF THE SYSTEM

1. Convert RGB image into gray image
2. Symlet image
3. Block processing
4. K-means clustering
5. Post-processing by morphological operations
6. Mask image
7. Obtain text image

Fig5.1: Flow diagram of Text Segmentation

A. Image Acquisition
Image acquisition is the process of collecting images for text and picture
segmentation in image documents. We have used scanned images for text and picture
segmentation.

B. Image pre-processing
The aim of pre-processing is to improve the image data: it removes undesired distortions
and/or enhances image features that are relevant for further processing.
The analysis of a picture using image processing can identify shades, colours and
relationships that cannot be perceived by the human eye. Image processing is used to solve
identification problems, such as in forensic medicine or in creating weather maps from
satellite pictures. It deals with images in bitmapped graphics format that have been scanned
in or captured with digital cameras. The colour images are converted to grey-level images
using the following formula:

grey (i , j) = 0.59 green(i , j) + 0.30 red(i , j) + 0.11 blue(i , j)
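The report's implementation is in MATLAB; as a minimal language-agnostic sketch only, the per-pixel conversion above can be written as follows (the function name and image representation are our own, not from the report):

```python
def to_grey(rgb_image):
    """Convert an RGB image (nested lists of (r, g, b) tuples) to grey level
    using the weighted sum grey = 0.59*green + 0.30*red + 0.11*blue."""
    return [
        [0.30 * r + 0.59 * g + 0.11 * b for (r, g, b) in row]
        for row in rgb_image
    ]

# A 1x2 image: one pure-red pixel and one pure-white pixel.
img = [[(255, 0, 0), (255, 255, 255)]]
grey = to_grey(img)
# Red maps to 0.30 * 255 = 76.5; white maps to 255.0.
```

Note that green carries the largest weight because the human eye is most sensitive to green.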

C. Symlet Wavelet
The wavelet transform provides a multi-resolution representation of an image. It has
become quite popular in recent years owing to its huge number of applications in various
fields, such as telecommunications, geophysics, astrophysics and computer vision, where it
enables the detection, analysis and recognition of image features and properties over varying
ranges of scale.
Symlet wavelets are a family of wavelets. They are a modified version of Daubechies
wavelets with increased symmetry. The Symlet wavelet of order n is defined for any positive
integer n. The scaling function and wavelet function have a compact support length of 2n,
and the wavelet function has n vanishing moments.
The Symlet wavelet can be used with the Discrete Wavelet Transform. The discrete
wavelet transform is a mathematical tool for signal analysis and image processing. Through
the wavelet transform, an image can be decomposed into a multi-resolution frame in which
every portion has distinct frequency and spatial properties.

D. Block processing
Images can either be too large to load into memory, or they can be loaded into memory
but be too large to process. Block processing is therefore useful: it automatically divides
the input image into blocks of a user-specified size, processes each block individually and
then reassembles the block results into the output image.
To divide an image into blocks and process each block individually in MATLAB, the
function blkproc can be used to process distinct blocks.
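As an illustrative sketch of this block-processing scheme (not MATLAB's actual blkproc; the helper below and its names are our own, and it assumes the image dimensions are exact multiples of the block size):

```python
def block_process(image, block_h, block_w, fn):
    """Split `image` (a list of lists) into block_h x block_w tiles, apply
    `fn` to each tile, and reassemble the results into an output image."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block_h):
        for c0 in range(0, cols, block_w):
            block = [row[c0:c0 + block_w] for row in image[r0:r0 + block_h]]
            result = fn(block)
            for i in range(block_h):
                for j in range(block_w):
                    out[r0 + i][c0 + j] = result[i][j]
    return out

# Example block function: replace every tile by its mean value.
def block_mean(block):
    m = sum(sum(row) for row in block) / (len(block) * len(block[0]))
    return [[m] * len(block[0]) for _ in block]

img = [[1, 3, 5, 7],
       [1, 3, 5, 7]]
out = block_process(img, 2, 2, block_mean)
# Each 2x2 tile is replaced by its mean: [[2, 2, 6, 6], [2, 2, 6, 6]]
```

The key point is that each tile is processed independently, so only one tile needs to be in working memory at a time.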

E. K-means Clustering
The K-means algorithm is an iterative technique that is used to partition an image into K
clusters. The basic algorithm is:
1. Pick K cluster centres, either randomly or based on some heuristic.
2. Assign each pixel in the image to the cluster that minimizes the distance between the
pixel and the cluster centre.
3. Re-compute each cluster centre by averaging all of the pixels in the cluster.
4. Repeat steps 2 and 3 until convergence is attained (e.g. no pixels change clusters).

The k-means algorithm gains its name from its method of operation: it clusters
observations into k groups, where k is provided as an input parameter. We used
2-means classification in our implementation, with one group of white pixels and a
second group of black pixels.
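A minimal sketch of 2-means on scalar pixel intensities, assuming centres initialised at the minimum and maximum values (one possible heuristic for step 1; the report's MATLAB implementation may differ):

```python
def two_means(values, iters=20):
    """2-means clustering of scalar intensities: returns (centres, labels)."""
    c = [min(values), max(values)]      # step 1: heuristic initialisation
    labels = [0] * len(values)
    for _ in range(iters):
        # Step 2: assign each value to the nearest centre.
        labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        # Step 3: recompute each centre as the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                c[k] = sum(members) / len(members)
    return c, labels

pixels = [10, 12, 11, 200, 210, 205]   # dark (text) vs bright (background)
centres, labels = two_means(pixels)
# labels -> [0, 0, 0, 1, 1, 1]; centres converge near 11 and 205
```

On an image, the same procedure is run over all pixel intensities, yielding the black-pixel and white-pixel groups mentioned above.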

F. Post Processing
Post-processing attempts to increase the quality of the mask image and is performed with
the help of morphology. The morphological operations used are dilation and erosion.
Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels
from object boundaries. The number of pixels added or removed from the objects in an
image depends on the size and shape of the structuring element used to process the image.
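The two operations can be sketched for a binary image with a 3x3 square structuring element (a simplified illustration with our own function names, not the report's MATLAB code):

```python
def binary_morph(img, erode=False):
    """Binary dilation (or erosion) of a 0/1 image with a 3x3 square
    structuring element; the border is handled by zero padding."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [
                img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ]
            # Dilation: any neighbour set -> set; erosion: all neighbours set.
            out[r][c] = min(neigh) if erode else max(neigh)
    return out

stroke = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
dilated = binary_morph(stroke)              # thickens the stroke
eroded = binary_morph(dilated, erode=True)  # shrinks it back
```

Dilation followed by erosion with the same structuring element is the morphological closing used to bridge small gaps between character strokes.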

5.2 HAAR DWT METHOD


In another approach, a morphological method is presented for text extraction. The edges are
identified using a morphological gradient operator, and the resulting edge image is
thresholded to obtain a binary edge image. Adaptive thresholding is performed for each
candidate region in the intensity image, which is less sensitive to illumination conditions and
reflections. Edges that are spatially close are grouped by dilation to form candidate regions,
while small components are removed by erosion. Non-text components are filtered out using
size, thickness, aspect ratio and grey-level homogeneity.
Fig5.2: Block Diagram of the Text Segmentation Algorithm
Fig5.3: Four Components of the input image after 2D Haar DWT

A. Pre-processing
If the input image is a colour image, its RGB components are combined to give the
intensity image Y as follows:
Y = 0.299R + 0.587G + 0.114B
Image Y is then processed with discrete wavelet transform.

B. Haar Discrete Wavelet Transform


The Discrete Wavelet Transform (DWT) provides a powerful tool for modelling the
characteristics of textured images. It is a very useful tool for signal analysis and image
processing, especially for multi-resolution representation, and can decompose a signal into
different components in the frequency domain.
The two-dimensional discrete wavelet transform (2-D DWT) decomposes an input image
into four sub-bands: one average component (LL) and three detail components (LH, HL, HH),
as shown in Fig 5.3. The Haar DWT detects three kinds of edges of the original image. The
illumination components are transformed to the wavelet domain using the Haar wavelet,
resulting in the four sub-image coefficient sets LL, HL, LH and HH.
The candidate text edges in the original image are detected from the three detail-
component sub-bands (HL, LH and HH). After the Haar DWT, the detected edges include
mostly text edges and some non-text edges in every detail-component sub-band. The
operation of the Haar DWT is simpler than that of any other wavelet. Among the advantages
of the Haar wavelet are that it is orthogonal and compactly supported, and its coefficients
are either 1 or -1.
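A one-level 2D Haar DWT can be sketched with pairwise averages and differences (an unnormalised variant for illustration only; library implementations typically scale by 1/sqrt(2), and the function names below are our own):

```python
def haar_2d(img):
    """One-level 2D Haar DWT via pairwise averages/differences along rows,
    then along columns. Returns (LL, LH, HL, HH); image size must be even."""
    def transform_rows(m):
        avg = [[(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
               for row in m]
        dif = [[(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
               for row in m]
        return avg, dif

    def transpose(m):
        return [list(col) for col in zip(*m)]

    low, high = transform_rows(img)  # horizontal pass
    # Vertical pass on each half (transpose, transform rows, transpose back).
    LL, LH = (transpose(x) for x in transform_rows(transpose(low)))
    HL, HH = (transpose(x) for x in transform_rows(transpose(high)))
    return LL, LH, HL, HH

img = [[1, 2],
       [3, 4]]
LL, LH, HL, HH = haar_2d(img)
# LL (average component) = [[2.5]]; the detail bands carry edge information.
```

For a 2x2 block, LL is simply the mean of the four pixels, while the three detail bands respond to horizontal, vertical and diagonal intensity changes.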

C. Extracting Text Edges


Dense edges are a distinct characteristic of text blocks and are used to detect possible
text regions. The candidate text regions are found by detecting the edges in the mentioned
sub-bands and fusing the edges contained in each sub-band. The Sobel edge detector is
efficient at extracting strong edges and is applied to each sub-band to get the candidate
text edges. In the next step, using a weighted ‘OR’ operator, these candidate text edges are
combined to form the edge map. A threshold is applied to the edge map to obtain the binary
edge map, and a morphological dilation operation is then performed on the processed edge
map. This operation fills the gaps inside the obtained character regions.

D. Removing Non-Text Regions


To further enhance the results, the non-text regions are removed. Horizontal rectangular
areas with high edge density indicate text strings, and projection is an efficient way to find
such high-density areas. The idea of coarse-to-fine detection is to locate the text region
progressively by a two-phase projection. There are many detected edge-dense blocks that
include multi-line text, and the projection profile is used to separate these blocks into
single-line text. A horizontal/vertical projection profile is defined as the vector of the sums
of the pixel intensities over each column/row.
The horizontal and vertical projections of the processed edge map are found. The average
of the minimum and maximum values of the vertical projection is taken as the threshold, and
the rows whose sums of pixel intensities are above this threshold are kept. Next, the
horizontal projection of only those rows is found.
The minimum and maximum of the horizontal projection are taken, and their average is
taken as the threshold; only the columns whose sums of pixel intensities are above the
threshold are kept. This results in the localization of the text regions in the image. The
corresponding regions in the original grayscale image are taken and, finally, a threshold is
applied to these regions, which results in the segmentation of the real text regions from the
image.
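The projection-profile analysis above can be sketched as follows (using the report's convention that the vertical profile sums over each row and the horizontal profile over each column; the helper names are our own, and the sketch assumes the edge map contains at least one edge pixel):

```python
def profiles(edge_map):
    """Projection profiles: the vertical profile sums pixel values over each
    row, the horizontal profile over each column."""
    vertical = [sum(row) for row in edge_map]
    horizontal = [sum(col) for col in zip(*edge_map)]
    return vertical, horizontal

def localize(edge_map):
    """Keep the rows, then the columns, whose projection exceeds the average
    of the profile's minimum and maximum."""
    vertical, _ = profiles(edge_map)
    t_row = (min(vertical) + max(vertical)) / 2
    rows = [r for r, s in enumerate(vertical) if s > t_row]
    _, horizontal = profiles([edge_map[r] for r in rows])
    t_col = (min(horizontal) + max(horizontal)) / 2
    cols = [c for c, s in enumerate(horizontal) if s > t_col]
    return rows, cols

em = [[0, 0, 0, 0, 0],
      [0, 1, 1, 1, 0],
      [0, 1, 1, 1, 0],
      [0, 0, 0, 0, 0]]
rows, cols = localize(em)
# The dense block is localized at rows [1, 2] and columns [1, 2, 3].
```

The row pass is the coarse phase and the column pass the fine phase of the coarse-to-fine detection described above.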
CHAPTER 6

TOOLS AND TECHNOLOGY

MATLAB:

When it comes to image processing, few tools are as flexible as MATLAB. The name stands
for “Matrix Laboratory”; it is a fourth-generation high-level programming language developed
by MathWorks (U.S.). It was created in 1984 with the objective of providing an interactive
environment for computation, visualisation and programming, and is written in C, C++ and
Java. MATLAB is used in every facet of computational mathematics. The mathematical
calculations where it is most commonly used include matrix and array manipulation, linear
algebra, algebraic equations, statistics, calculus, integration, transforms, etc. It has a wide
range of applications including signal processing, image and video processing, control
systems, computer vision, AI, etc. MATLAB is the most popular software used in the field
of digital image processing. However, it is not open source: a user has to pay for a licensed
MATLAB interpreter.

TECHNOLOGY-Optical Character Recognition (OCR)

Optical character recognition (OCR) is the process of classifying optical patterns contained
in a digital image. Character recognition is achieved through segmentation, feature extraction
and classification. OCR is the recognition of printed or written text characters by a computer.
It involves scanning the text character by character, analysing the scanned-in image, and then
translating each character image into a character code, such as ASCII, commonly used in data
processing.

In OCR processing, the scanned-in image or bitmap is analysed for light and dark areas in
order to identify each alphabetic letter or numeric digit. When a character is recognized, it is
converted into an ASCII code. Special circuit boards and computer chips designed expressly
for OCR can be used to speed up the recognition process.
OCR is often used as a “hidden” technology powering many well-known systems and
services in our daily life. Less known, but equally important, use cases for OCR technology
include data-entry automation, indexing documents for search engines, automatic number-
plate recognition, and assisting blind and visually impaired persons.

OCR technology has also proven immensely useful in digitising historic newspapers and
texts; these have now been converted into fully searchable formats, which has made
accessing those earlier texts easier and faster.
CHAPTER 7

EXPERIMENTAL RESULTS
We have developed segmentation of text and pictures from image documents with the
help of the Symlet wavelet and 2-means classification using MATLAB R2009a. Here we
consider some images and illustrate the segmentation of text and pictures.

Fig7.1: Original Image

Fig7.2:Output mask image of Fig7.1

Fig7.3:Segmented image of Fig7.1


Fig7.4:Original Image

Fig7.5:Output Mask Image

Fig7.6:Segmented image of Fig7.4


Fig7.7:Original Image

Fig7.8:Output mask image of Fig7.7

Fig7.9:Segmented text
Fig7.10:Original Image

Fig7.11:Real Text Regions

Fig7.12: Segmented Real Text Regions


CHAPTER 8
APPLICATIONS

8.1 Information Retrieval


Information retrieval is the task of identifying the documents in a collection which
satisfy a user’s request for information about a particular subject. Documents meeting this
criterion have come to be known as “relevant” documents, despite the discrepancy between
the usual usage of relevant (meaning related) and what is implied here.

8.2 Text segmentation and Language Modeling


Language modeling has a number of applications, including playing a crucial role in
speech recognition. Speech recognition is the notoriously difficult problem of determining
what words are encoded in an acoustic signal. It is challenging for a number of reasons.
Different speakers pronounce words in subtly different ways. Languages contain homophones,
which are pairs of words that are pronounced the same but spelled differently, like “too” and
“two”. Speakers also slur words and speak at different rates. In addition, there are no explicit
boundaries between words like those marked by whitespace in text.
8.3 Information Extraction
Information extraction (IE) is the task of automatically filling out templates with
particular facts from a document. IE systems use different templates to report different types
of information. For example, if IE were used to convey the details of new product
introductions, one of the slots (the individual pieces of information a template holds) might
be the date when the product will become available.
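As a hedged illustration of slot filling (the template fields, sentence, and regular expression below are invented for this sketch, not taken from any cited system), a product-introduction template might be filled like this:

```python
import re

# Hypothetical "product introduction" template: each slot is a field
# to be filled with a fact extracted from the document.
template = {"product": None, "available": None}

doc = ("Acme announced the Widget X, which will become "
       "available on 2020-03-15.")

# A toy pattern-based extractor; real IE systems use far more
# robust linguistic analysis than a single regular expression.
m = re.search(r"the ([\w ]+), which will become available on "
              r"(\d{4}-\d{2}-\d{2})", doc)
if m:
    template["product"] = m.group(1)
    template["available"] = m.group(2)

print(template)  # {'product': 'Widget X', 'available': '2020-03-15'}
```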

8.4 Thresholding
Thresholding is the simplest method of image segmentation. From a grayscale image,
thresholding can be used to create binary images.
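A minimal sketch of global thresholding on a toy 3x3 grayscale array (the pixel values and the threshold of 128 are illustrative, not parameters from the report):

```python
# Global thresholding: pixels at or above the threshold become 1
# (foreground), all others become 0, yielding a binary image.
def threshold(image, t):
    return [[1 if p >= t else 0 for p in row] for row in image]

gray = [
    [ 12, 200, 190],
    [ 30,  25, 220],
    [210,  15,  18],
]
binary = threshold(gray, 128)
# binary == [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
```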

8.5 Removing Non-Text Regions


To further enhance the results, the non-text regions are removed. Horizontal rectangular
areas with a high density of foreground pixels indicate text strings, and projection is an
efficient way to find such high-density areas. The idea of coarse-to-fine detection is to locate
the text region progressively by a two-phase projection.
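The coarse phase of such a projection can be sketched as follows: sum the foreground pixels of each row of a binary mask, keep rows whose sum exceeds a density threshold, and merge consecutive kept rows into candidate text bands. The mask and threshold below are illustrative:

```python
# Horizontal projection profile: sum of foreground pixels per row.
# Rows whose sum reaches min_density are candidate text rows;
# consecutive candidates are merged into (top, bottom) bands.
def text_bands(mask, min_density):
    profile = [sum(row) for row in mask]
    bands, start = [], None
    for y, s in enumerate(profile):
        if s >= min_density and start is None:
            start = y
        elif s < min_density and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

mask = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],   # dense row -> text
    [1, 1, 1, 1, 0],   # dense row -> text
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],   # dense row -> text
]
print(text_bands(mask, 3))  # [(1, 2), (4, 4)]
```

A second, finer pass would then project vertically within each band to trim the left and right extents of the text region.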
CONCLUSION

We have presented a method of document image segmentation to identify the textual
and the non-textual zones, the latter in the form of graphics or other types of illustration.

The development of an efficient method for detecting all types of graphics and text in
any orientation in real-life documents is a challenging process. Our proposed method for
text segmentation in document images extracts text from images efficiently by applying the
Symlet wavelet.

A method of text extraction from images is proposed using the Haar Discrete Wavelet
Transform, the Sobel edge detector, the weighted OR operator, thresholding and the
morphological dilation operator. These mathematical tools are integrated to detect the text
regions in complex images. The proposed method is robust to the language and
font size of the texts, and it can also decompose blocks containing
multi-line text into single lines of text. According to the experimental results, the proposed
method proves efficient for extracting the text regions from the images.
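Part of this pipeline can be sketched in plain Python on a toy array (the Haar DWT and weighted-OR steps are omitted for brevity, and the image values and edge threshold below are illustrative, not the parameters of the cited method):

```python
# Sketch of edge detection -> thresholding -> dilation on a toy image.
def sobel_mag(img):
    # Sobel gradient magnitude (|Gx| + |Gy|), borders left at zero.
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

def dilate(mask):
    # 3x3 morphological dilation: a pixel is set if any neighbour is set.
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[y + j][x + i]
                      for j in (-1, 0, 1) for i in (-1, 0, 1)
                      if 0 <= y + j < h and 0 <= x + i < w)
             else 0 for x in range(w)] for y in range(h)]

img = [
    [0, 0,   0,   0, 0],
    [0, 0, 255, 255, 0],
    [0, 0, 255, 255, 0],
    [0, 0,   0,   0, 0],
]
edges = sobel_mag(img)
mask = [[1 if v > 200 else 0 for v in row] for row in edges]
grown = dilate(mask)  # merges nearby edge pixels into candidate regions
```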

In the future, research efforts should be devoted to detecting multi-oriented annotations,
improving the separation of graphics linked to text, and improving the segmentation rate.
REFERENCES

1. D. Chen, H. Bourlard and J. Thiran, “Text Identification in Complex Backgrounds Using
SVM”, Proc. of the International Conf. on Computer Vision and Pattern Recognition,
pp. 621-626, 8-14 Dec. 2001.

2. M. Pietikäinen and O. Okun, “Text Extraction from Grey Scale Page Images by Simple
Edge Detectors”, Proc. of the 12th Scandinavian Conf. on Image Analysis, Bergen, Norway,
pp. 628-635, 11-14 June 2001.

3. Jie Xi, Xian-Sheng Hua, Xiang-Rong Chen, et al., “A Video Text Detection and
Recognition System”, Proc. of ICME 2001, Waseda University, Japan, pp. 1080-1083,
August 2001.

4. H. Choi and R. G. Baraniuk, “Multiscale Image Segmentation Using Wavelet-Domain
Hidden Markov Models”, IEEE Transactions on Image Processing, vol. 10, no. 9, pp. 1309-
1321, Sep. 2001.

5. Shulan Deng and Shahram Latifi, “Fast Text Segmentation Using Wavelet for Document
Processing”, Proc. of the 4th WAC, ISSCI, IFMIP, Maui, Hawaii, USA, pp. 739-744, 11-15
June 2000.

6. M. M. Haji and S. D. Katebi, “An Efficient Text Segmentation Technique Based on Naive
Bayes Classifier”, GVIP Journal, vol. 5, no. 7, July 2005.

7. M. M. Haji and S. D. Katebi, “Machine Learning Approaches to Text Segmentation”,
Scientia Iranica, vol. 13, no. 4, pp. 395-403, October 2006.

8. S. Audithan and R. M. Chandrasekaran, “Document Text Extraction from Document Images
Using Haar Discrete Wavelet Transform”, European Journal of Scientific Research, ISSN
1450-216X, vol. 36, no. 4, pp. 502-512, June 2009.

9. Punam Thakare, “A Study of Image Segmentation and Edge Detection Techniques”,
International Journal on Computer Science and Engineering (IJCSE), ISSN 0975-3397,
vol. 3, no. 2, Feb 2011.

10. Danial Md Nor, Rosli Omar, M. Zarar M. Jenu and Jean-Marc Ogier, “Image
Segmentation and Text Extraction: Application to the Extraction of Textual Information in
Scene Images”, ISASM 2011.

11. Neha Gupta and V. K. Banga, “Image Segmentation for Text Extraction”, ICEECE 2012,
April 2012.

12. W. Mao, F. Chung, K. Lam and W. Siu, “Hybrid Chinese/English Text Detection in
Images and Video Frames”, Proc. of the International Conference on Pattern Recognition,
vol. 3, pp. 1015-1018, 2002.

13. Qixiang Ye, Wen Gao, Weiqiang Wang and Wei Zeng, “A Robust Text Detection
Algorithm in Images and Video Frames”, ICICS-PCM 2003, Singapore, 15-18 December
2003.

14. en.wikipedia.org/wiki/Segmentation_(image_processing)
