
2015 International Conference on Computing Communication Control and Automation

Combination of multiple image features along with KNN classifier for classification of Marathi Barakhadi
Dhanashree Joshi
Professor
Dept. of Computer Engineering, Sinhgad Academy of Engineering, Kondhwa, Pune, India
e-mail: dk.joshi28@gmail.com

Sarika Pansare
Student
Dept. of Computer Engineering, Sinhgad Academy of Engineering, Kondhwa, Pune, India
e-mail: sarupansare@gmail.com

Abstract: Character recognition is an emerging area of research, given the many languages spoken all over the world and the writing systems associated with them. India itself has 11 different scripts, and each script has its own subscripts. This diversity gives wide scope for research, out of which the Devnagari script has been chosen for studying its problems and their solutions. Marathi is one of the more complicated languages written in Devnagari, and the barakhadi is its characteristic part. Many researchers have worked on recognizing Marathi characters more efficiently; the problems reported in this work include styles of writing, strokes, aspect ratio, etc. Data mining is evolving in various fields such as satellite images, medical images, object-specific images, etc. This paper discusses a new system that combines image processing methods with a data mining classification algorithm, a new trend called image mining. The proposed technique applies data acquisition; pre-processing steps such as grayscale conversion, edge detection, and binarization; and feature extraction methods from image processing such as Hu moments and GLCM features. The extracted features are given to the KNN data mining classification algorithm to obtain the classification results. The database used consists of 3024 handwritten barakhadi images covering 36 barakhadi consonants and 12 vowels, written by 7 different people from different age groups. The proposed system is intended to classify each character into its exact category efficiently and effectively, with very high performance compared to other approaches; this hybrid system has not been attempted before.

Index Terms- Acquisition, Aspect ratio, Character Recognition, Data Mining, Feature Extraction.

I. INTRODUCTION

Character recognition is a process which allows computers to recognize written or printed characters such as numbers or letters and to change them into a form that the computer can use. Character recognition is becoming more and more important in the modern world. It helps humans ease their jobs and solve more complex problems. It aims at automation, greatly reducing human effort and serving applications such as postal automation, office automation, etc. Handwritten character recognition systems are more complex because of the large amount of data that results from variations in the writing styles of different individuals and from the similarity of character shapes.

Optical character recognition (OCR) is a technology that allows machines to automatically recognize characters through an optical mechanism. OCR is needed when the information should be readable both to humans and to a machine and alternative inputs cannot be predefined. An OCR system can include preprocessing steps such as binarization, skew correction, and text block segmentation prior to recognition.

Approaches used for the design of OCR systems

Matrix Matching: Matrix matching converts each character into a pattern within a matrix and then compares the pattern with an index of known characters. Its recognition is strongest on monotype and uniform single-column pages.

Fuzzy Logic: Fuzzy logic is a multi-valued logic that allows intermediate values to be defined between conventional evaluations like yes/no, true/false, black/white, etc. It is an attempt to bring a more human-like way of logical thinking into the programming of computers. Fuzzy logic is used when answers do not have a distinct true or false value and there is uncertainty involved.

Feature Extraction: This method defines each character by the presence or absence of key features, including height, width, density, loops, lines, stems, and other character traits. Feature extraction is well suited to OCR of magazines, laser print, and high-quality images.

Structural Analysis: Structural analysis identifies characters by examining their sub-features: the shape of the image and its sub-vertical and horizontal histograms. Its character repair capability makes it well suited to low-quality text and newsprint.

Neural Networks: This strategy simulates the way the human neural system works. It samples the pixels in each image and matches them to a known index of character pixel patterns. The ability to recognize characters through abstraction makes it well suited to faxed documents and damaged text. Neural networks are ideal for specific types of problems, such as processing stock market data or finding trends in graphical patterns.

Languages and scripts on which character recognition work has been done include Hindi, Arabic, Malayalam, Sanskrit, Devnagari handwritten numeral recognition, Bangla script, and Tamil.

Why is Marathi a complex language? Marathi is the language spoken by the native people of Maharashtra. It is an Indo-Aryan language spoken by about 71 million people, mainly in the Indian state of Maharashtra and neighboring

978-1-4799-6892-3/15 $31.00 © 2015 IEEE 607


DOI 10.1109/ICCUBEA.2015.124
