1 INTRODUCTION 5
2 LITERATURE SURVEY 7
4 PROPOSED WORK 19
4.1 Image Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Image Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Text Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Text-to-Speech Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Speech Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5 IMPLEMENTATION 22
5.1 Flow of Work Carried Out . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7 APPLICATIONS 32
Chapter 1
INTRODUCTION
Machine replication of human functions such as reading is an old dream, and over the
last five decades machine reading has grown from a dream into reality. Visually impaired
people report numerous difficulties in accessing printed text with existing technology,
including problems with alignment, focus, accuracy, mobility and efficiency. We present
a smart device that assists the visually impaired and travellers by reading paper-printed
text effectively and efficiently. The proposed project uses a camera-based assistive device
that people can use to read text documents. The framework implements an image-capturing
technique in an embedded system based on the Raspberry Pi board. The design is motivated
by preliminary studies with visually impaired people, and it is small-scale and mobile,
which enables more manageable operation with little setup. In this project we propose a
text read-out system for travellers and the visually challenged. The proposed fully
integrated system has a camera as an input device to feed the printed text document for
digitization. Speech is probably the most efficient medium for communication between
humans. To extract the text from an image we use optical character recognition (OCR),
which has become one of the most successful applications of technology in the fields of
pattern recognition and artificial intelligence.
Optical Character Recognition (OCR) is a process that converts scanned or photographed
images of printed or handwritten text into editable text for further processing.
Speech synthesis is the artificial synthesis of human speech. A Text-To-Speech (TTS)
synthesizer is a computer-based system that should be able to read any text aloud, whether
it was entered directly into the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system.
The device was tested on the Raspberry Pi platform. The Raspberry Pi is a basic embedded
system and, being a low-cost single-board computer, is used to reduce the complexity of
systems in real-time applications. The platform is programmed mainly in Python. The
Raspberry Pi provides a Camera Serial Interface (CSI) to interface the Raspberry Pi
camera. Here, the dark and low-contrast images captured by the Raspberry Pi camera module
are enhanced during preprocessing.
Image Processing Based Multilingual Translator
Chapter 2
LITERATURE SURVEY
Asha G. Hagargund et al. carried out a work in which the basic framework is an embedded
system that captures an image, extracts only the region of interest (i.e. the region of
the image that contains text) and converts that text to speech. It is implemented using a
Raspberry Pi and a Raspberry Pi camera. The captured image undergoes a series of image
pre-processing steps to locate only the part of the image that contains the text and to
remove the background. Two tools are used to convert the new image (which contains only
the text) to speech: OCR (Optical Character Recognition) software and a TTS
(Text-to-Speech) engine. The audio output is heard through the Raspberry Pi's audio jack
using speakers or earphones.
OCR based automatic book reader for the visually impaired using
Raspberry PI (International Journal of Innovative Research in
Computer and Communication Engineering - 2016)
Aaron James S et al. carried out a work in which optical character recognition (OCR) is
described as the identification of printed characters using photoelectric devices and
computer software. It converts images of typed, handwritten or printed text, taken from a
scanned document or from subtitle text superimposed on an image, into machine-encoded
text. In this research these images are converted into audio output. OCR is used in
machine processes such as cognitive computing, machine translation, text-to-speech, key
data entry and text mining, and it is a major field of research in character recognition,
artificial intelligence and computer vision. In this research the recognition process is
done using OCR: the character codes in text files are processed on a Raspberry Pi device,
which recognizes characters using the Tesseract engine and Python programming, and the
audio output is played back. To use OCR for pattern recognition and document image
analysis (DIA), information in grid format is used in the design and construction of a
virtual digital library. This research mainly focuses on an OCR-based automatic book
reader for the visually impaired using the Raspberry Pi. The Raspberry Pi features a
Broadcom system on a chip (SoC) which includes an ARM-compatible CPU and an on-chip
graphics processing unit (GPU). It promotes Python as its main programming language, with
support for BBC BASIC.
which one can recognize the object, mark the interesting region within the object, scan
the text and convert the scanned text into binary characters through optical recognition.
A second method has been presented in which the noise present in the scanned image is
eliminated before the characters are recognized. A third method, which converts the
recognised characters into speech through pattern matching, has also been presented.
Applications: an embedded system based on ARM technology has been developed that helps
blind persons read currency notes. All the methods presented in the paper have been
implemented within an embedded application. The embedded board has been tested with
different currency notes, and speech in English has been generated that identifies the
value of the currency. Further work can be done to generate the speech in other national
and international languages.
The features of the B+ version are almost the same as those of the B model; however, USB
and network boot and Power-over-Ethernet support come only with the B+ model. Two extra
USB ports are also added to this device. The SoC (system on chip) combines both CPU and
GPU in a single package and turns out to be faster than the Pi 2 and Pi 3 models.
3.1.1.1 Specifications
4. WiFi: Dual-band 802.11ac wireless LAN (2.4 GHz and 5 GHz) and Bluetooth 4.2
5. Ethernet: Gigabit Ethernet over USB 2.0 (max 300 Mbps); Power-over-Ethernet support
(with separate PoE HAT)
9. Power: 5V/2.5A DC power input
10. Operating system support: Linux and Unix
3.1.2 Pi Camera
The camera module used in this project is the Raspberry Pi camera module, as shown in
Fig. 0.2. It plugs into the CSI connector on the Raspberry Pi and is able to deliver clear
5 MP still images, or 1080p HD video recording at 30 fps. The module attaches to the
Raspberry Pi by a 15-pin ribbon cable connected to the dedicated 15-pin MIPI Camera Serial
Interface (CSI), which was designed especially for interfacing with cameras. The CSI bus
is a high-bandwidth link capable of extremely high data rates, and it exclusively carries
pixel data from the camera back to the BCM2835/BCM2836 processor; this bus travels along
the ribbon cable that attaches the camera board to the Pi.
To meet the increasing need for Raspberry Pi-compatible camera modules, the ArduCAM team
released a revision C add-on camera module for the Raspberry Pi that is fully compatible
with the official one. It improves the optical performance over the previous Pi cameras
and gives the user a much clearer and sharper image. It also provides the FREX and STROBE
signals, which can be used for synchronized multi-camera capture with a suitable camera
driver firmware. It attaches to the Raspberry Pi by way of one of the two small sockets on
the board's upper surface, again using the dedicated CSI interface, and the camera is
supported in the latest version of Raspbian, the Raspberry Pi's preferred operating
system. The board itself is tiny, at around 36 mm x 36 mm. The highlight of this module is
that the lens is replaceable, unlike the official one, making it well suited for mobile or
other applications where size and image quality are important.
The sensor itself has a native resolution of 5 megapixels and has a fixed-focus lens on
board. In terms of still images, the camera is capable of 2592 x 1944 pixel static images,
and it also supports 1080p30, 720p60 and 640x480p60/90 video.
3.1.2.1 Features
• 5MPixel sensor
• Integral IR filter
• Size: 36 x 36 mm
• 15 cm flat ribbon cable to 15-pin MIPI Camera Serial Interface (CSI) connector
3.1.2.2 Applications
• Cellular phones
• PDAs
• Toys
The Python interpreter and the extensive standard library are freely available in source
or binary form for all major platforms from the Python Web site, https://www.python.org/,
and may be freely distributed. The same site also contains distributions of and pointers to
many free third party Python modules, programs and tools, and additional documentation.
Python_Imaging_Library(PIL):
The Python Imaging Library (PIL) is the de facto image processing package for the Python
language. It incorporates lightweight image processing tools that aid in editing, creating
and saving images.
This module is not preloaded with Python, so to install it execute the following command
on the command line: “ pip install pillow ”
MODULES:
Image:
The Image module provides a class with the same name which is used to represent a PIL
image. The module also provides a number of factory functions, including functions to load
images from files and to create new images. The Image module includes various methods that
can be used to perform operations on images; some of them are new( ), open( ), resize( ),
rotate( ), save( ), etc., along with attributes such as size and format.
Used in code: open( ) – parameter: the image location in the computer database; returns an
image_object.
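As a hedged illustration of these operations (the file names here are our own placeholders, and note that in Pillow size and format are attributes rather than methods):

```python
from PIL import Image  # installed via: pip install pillow

# Create a small stand-in image so the sketch is self-contained;
# in practice open() would load an existing file from disk.
Image.new("RGB", (640, 480), "white").save("sample.png")

img = Image.open("sample.png")      # factory function: load from file
print(img.size, img.format)         # (640, 480) PNG

small = img.resize((320, 240))      # returns a new, resized Image
turned = small.rotate(90)           # rotated copy (same size, expand=False)
turned.save("sample_rotated.png")   # write the result back to disk
```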
ImageOps:
The ImageOps module contains a number of ready-made image processing operations; this
project uses its grayscale( ) function.
Google_Translator(googletrans):
Googletrans is a free and unlimited Python library that implements the Google Translate
API. It uses the Google Translate Ajax API to make calls to methods such as detect and
translate. It is an unofficial library using the web API of translate.google.com and is
not associated with Google.
MODULES:
Translator:
• text (UTF-8 str; unicode; string sequence (list, tuple, iterator, generator)) – The
source text(s) to be translated.
• dest (str; unicode) – The language to translate the source text into. The value should
be one of the language codes listed in googletrans.LANGUAGES.
• src (str; unicode) – The language of the source text. The value should be one of the
language codes listed in googletrans.LANGUAGES or one of the language names listed in
googletrans.LANGCODES. If a language is not specified, the system will attempt to
identify the source language automatically.
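A hedged sketch of how these parameters fit together. It assumes googletrans is installed (e.g. `pip install googletrans==4.0.0rc1`) and that network access to translate.google.com is available, so the actual call is shown commented out; the helper `code_for` and the trimmed language table are our own illustration, not part of the library:

```python
# Subset of googletrans.LANGUAGES, trimmed for illustration.
LANGUAGES = {"en": "english", "hi": "hindi", "kn": "kannada",
             "te": "telugu", "ta": "tamil", "mr": "marathi"}

def code_for(name):
    """Map a language name (e.g. 'hindi') to its dest/src code (e.g. 'hi')."""
    for code, lang in LANGUAGES.items():
        if lang == name:
            return code
    raise ValueError(f"unknown language: {name}")

def translate_text(text, dest):
    """Call googletrans; src is auto-detected because it is omitted."""
    from googletrans import Translator   # imported here: optional dependency
    return Translator().translate(text, dest=dest).text

print(code_for("hindi"))   # hi
# translate_text("Hello, world", dest=code_for("hindi"))  # needs network
```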
Google_text-to-speech(gTTS):
gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with Google
Translate's text-to-speech API.
PARAMETERS:
• tld (string) – Top-level domain for the Google Translate host, i.e.
https://translate.google.<tld>. This is useful when google.com might be blocked within a
network but a local or different Google host (e.g. google.cn) is not. Default is com.
• lang (string, optional) – The language (IETF language tag) to read the text in. Default
is en.
• lang_check (bool, optional) – Strictly enforce an existing lang, to catch a language
error early. If set to True, a ValueError is raised if lang doesn't exist. Setting
lang_check to False skips the web requests (to validate the language) and therefore speeds
up instantiation. Default is True.
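A hedged sketch of these parameters in use; it assumes `pip install gTTS`, and the import is deferred because building the object needs the library and `save()` needs network access, so both calls are shown commented out:

```python
def make_tts(text, lang="en", tld="com", lang_check=True):
    """Build a gTTS object with the parameters described above."""
    from gtts import gTTS                     # optional dependency
    return gTTS(text=text, lang=lang, tld=tld, lang_check=lang_check)

# tts = make_tts("Hello, world", lang="hi")  # Hindi voice, default host
# tts.save("hello_hi.mp3")                   # the network request happens here
```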
OS_module:
CREATING_DIRECTORY:
We can create a new directory using the mkdir() function from the OS module.
>>> import os
>>> os.mkdir("File_path")
CHANGING_CURRENT_WORKING_DIRECTORY:
We must first change the current working directory to a newly created one before doing any
operations in it. This is done using the chdir() function.
>>> import os
>>> os.chdir("File_path")
There is a getcwd( ) function in the OS module with which we can confirm whether the
current working directory has been changed.
>>> os.getcwd( )
os.system():
This method is implemented by calling the standard C function system().
Syntax: os.system(command), where command is a string that tells which command to execute.
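The calls above can be combined into a small self-contained sketch (the directory name is our own placeholder; a temporary folder keeps the sketch side-effect free):

```python
import os
import tempfile

base = tempfile.mkdtemp()                # scratch area for the demo
new_dir = os.path.join(base, "reports")  # hypothetical directory name

os.mkdir(new_dir)        # create the directory
os.chdir(new_dir)        # make it the current working directory
print(os.getcwd())       # confirm the change

# os.system() passes its string argument to the C system() call
# and returns the command's exit status (0 on success).
status = os.system("echo hello")
```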
• OCR uses artificial intelligence for text search and recognition in images.
• Tesseract supports Unicode and can recognize more than 100 languages out of the box.
• Tesseract is used for text detection on mobile devices, in video, and in Gmail image
spam detection.
3.2.3 Raspbian OS
• Raspbian is a Debian-based computer operating system for the Raspberry Pi. Since 2015 it
has been officially provided by the Raspberry Pi Foundation as the primary operating
system for the family of Raspberry Pi single-board computers.
• The operating system is still under active development. Raspbian is highly optimized
for the Raspberry Pi line's low-performance ARM CPUs.
• OS family: Unix-like
• Platforms: ARM
Chapter 4
PROPOSED WORK
We observe in our day-to-day life that many travellers around the globe face difficulties
in reading and understanding multilingual text images and destination boards. This happens
because of the different national and regional languages, which are not understandable to
travellers. Similarly, visually impaired persons face difficulties in reading books
without Braille script. In order to overcome these problems we have taken up this project.
Every traveller faces the problem of not understanding the various international languages
such as English, French, Greek, Portuguese, etc., so our project will remove this obstacle
for anyone who travels from country to country.
Fig. 4.1 below gives an abstract representation of the proposed work.
• There are many unnecessary contents in the received images, such as noise and poor
clarity, among other defects.
• This block helps in providing excellent results when extracting the text from the image.
• Each character present in the image is recognized and extracted using a special engine.
• This helps to extract the characters and form a string of characters to obtain the text
contained in the image.
• Once the text is translated, it is ready to be given to the TTS converter.
• The text-to-speech converter converts the text into speech in the language of the
user's choice.
• The speech output can be heard using speakers or earphones, depending on the user's
choice.
The objective of this project is to help travellers all around the globe to read and
understand multilingual text images and destination boards efficiently, without any
difficulty, and to help visually impaired persons read books without Braille script. In
both cases, images are converted into speech in the desired language using the required
techniques.
Chapter 5
IMPLEMENTATION
The flow diagram in the figure above describes the complete work flow carried out:
• The complete work is done using the Python 3.8 programming language, which provides
various libraries and modules to translate the text present in an image into speech.
• First, the image from the database is accessed using PIL (Python Imaging Library),
where we use the Image module; the snap code is shown below.
• Now image preprocessing is done so that the text in the image is clear enough for
further processing. This is done using another PIL module, ImageOps, which provides
various methods for image preprocessing; we use only grayscale( ). The snap code is shown
below.
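The grayscale snap code is likewise lost; a minimal reconstruction of the preprocessing step (the input image here is synthetic) might be:

```python
from PIL import Image, ImageOps

# Synthetic coloured stand-in for a captured text image.
image_object = Image.new("RGB", (200, 80), (30, 60, 200))

# ImageOps.grayscale() returns a single-channel ("L" mode) copy,
# which gives Tesseract a cleaner input than the coloured original.
gray = ImageOps.grayscale(image_object)
print(gray.mode)   # L
```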
• Now the image is ready for text extraction. Here Tesseract OCR comes into the picture:
it is used to extract the text from the preprocessed image through the Python module
pytesseract and its method image_to_string( ), which takes an image_object as its input
parameter. The snap code is shown below.
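A hypothetical reconstruction of the extraction snap code; it assumes `pip install pytesseract` plus a locally installed Tesseract binary, so the import is deferred and the call is shown commented out:

```python
def extract_text(image_object):
    """Run Tesseract OCR on a PIL image and return the decoded string."""
    import pytesseract            # binds to the installed tesseract binary
    return pytesseract.image_to_string(image_object)

# extracted_text = extract_text(gray)   # `gray` from the preprocessing step
```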
• The text obtained is in English, so it needs to be translated into the desired language
(i.e. the speech output language). This is done using the googletrans library and its
Translator module, whose translate( ) method wraps the free Google translation API. A
number of language codes can be used as the parameter for translate( ); they are:
LANGUAGES = { ’af’: ’afrikaans’, ’sq’: ’albanian’, ’am’: ’amharic’,
’ar’: ’arabic’, ’hy’: ’armenian’, ’az’: ’azerbaijani’, ’eu’: ’basque’, ’be’: ’belaru-
sian’, ’bn’: ’bengali’, ’bs’: ’bosnian’, ’bg’: ’bulgarian’, ’ca’: ’catalan’, ’ceb’:
’cebuano’, ’ny’: ’chichewa’, ’zh-cn’: ’chinese (simplified)’, ’zh-tw’: ’chinese (tra-
ditional)’, ’co’: ’corsican’, ’hr’: ’croatian’, ’cs’: ’czech’, ’da’: ’danish’, ’nl’:
’dutch’, ’en’: ’english’, ’eo’: ’esperanto’, ’et’: ’estonian’, ’tl’: ’filipino’, ’fi’:
’finnish’, ’fr’: ’french’, ’fy’: ’frisian’, ’gl’: ’galician’, ’ka’: ’georgian’, ’de’:
’german’, ’el’: ’greek’, ’gu’: ’gujarati’, ’ht’: ’haitian creole’, ’ha’: ’hausa’,
’haw’: ’hawaiian’, ’iw’: ’hebrew’, ’hi’: ’hindi’, ’hmn’: ’hmong’, ’hu’: ’hungar-
ian’, ’is’: ’icelandic’, ’ig’: ’igbo’, ’id’: ’indonesian’, ’ga’: ’irish’, ’it’: ’italian’,
’ja’: ’japanese’, ’jw’: ’javanese’, ’kn’: ’kannada’, ’kk’: ’kazakh’, ’km’: ’khmer’,
’ko’: ’korean’, ’ku’: ’kurdish (kurmanji)’, ’ky’: ’kyrgyz’, ’lo’: ’lao’, ’la’: ’latin’,
’lv’: ’latvian’, ’lt’: ’lithuanian’, ’lb’: ’luxembourgish’, ’mk’: ’macedonian’, ’mg’:
’malagasy’, ’ms’: ’malay’, ’ml’: ’malayalam’, ’mt’: ’maltese’, ’mi’: ’maori’, ’mr’:
’marathi’, ’mn’: ’mongolian’, ’my’: ’myanmar (burmese)’, ’ne’: ’nepali’, ’no’:
’norwegian’, ’ps’: ’pashto’, ’fa’: ’persian’, ’pl’: ’polish’, ’pt’: ’portuguese’, ’pa’:
’punjabi’, ’ro’: ’romanian’, ’ru’: ’russian’, ’sm’: ’samoan’, ’gd’: ’scots gaelic’,
’sr’: ’serbian’, ’st’: ’sesotho’, ’sn’: ’shona’, ’sd’: ’sindhi’, ’si’: ’sinhala’, ’sk’:
’slovak’, ’sl’: ’slovenian’, ’so’: ’somali’, ’es’: ’spanish’, ’su’: ’sundanese’, ’sw’:
’swahili’, ’sv’: ’swedish’, ’tg’: ’tajik’, ’ta’: ’tamil’, ’te’: ’telugu’, ’th’: ’thai’, ’tr’:
’turkish’, ’uk’: ’ukrainian’, ’ur’: ’urdu’, ’uz’: ’uzbek’, ’vi’: ’vietnamese’, ’cy’:
’welsh’, ’xh’: ’xhosa’, ’yi’: ’yiddish’, ’yo’: ’yoruba’, ’zu’: ’zulu’, ’fil’: ’Filipino’,
’he’: ’Hebrew’ }
• For the time being we have used important regional languages of India: {english: ’en’,
hindi: ’hi’, kannada: ’kn’, telugu: ’te’, tamil: ’ta’, marathi: ’mr’}. The snap code to
obtain the translated text in the various languages is shown below:
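A hypothetical reconstruction of the translation snap code using the six regional-language choices above (the preference numbering is our own assumption; googletrans and network access are assumed for non-English choices, so the interactive call is commented out):

```python
# Preference number -> googletrans language code.
PREFERENCES = {1: "en", 2: "hi", 3: "kn", 4: "te", 5: "ta", 6: "mr"}

def translate_extracted(extracted_text, preference):
    """Translate the OCR output into the language the user picked."""
    dest = PREFERENCES[preference]
    if dest == "en":                    # source text is already English
        return extracted_text, dest
    from googletrans import Translator  # network call happens below
    return Translator().translate(extracted_text, dest=dest).text, dest

# choice = int(input("Enter language preference (1-6): "))
# translated_text, dest_lang = translate_extracted(extracted_text, choice)
```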
• Once the translation of the text is done, the translated text is converted into speech
in the same language as the translated text. This is done using the Python library gtts
with the gTTS (Google Text-to-Speech) module; the snap code for this is shown below:
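A hypothetical reconstruction of the text-to-speech snap code; gTTS and network access are assumed, and auto-play uses os.startfile, which exists only on Windows (the OS this implementation actually ran on), so the call is shown commented out:

```python
import os

def speak(translated_text, dest_lang, filename="output.mp3"):
    """Synthesize the translated text and auto-play the saved MP3."""
    from gtts import gTTS                          # optional dependency
    gTTS(text=translated_text, lang=dest_lang).save(filename)  # network call
    if os.name == "nt":
        os.startfile(filename)   # Windows: open in the default audio player
    return filename

# speak(translated_text, dest_lang, "destination_board.mp3")
```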
• Once the speech output is saved in the computer database, it is played automatically
when we run the code; the output can be heard using earphones or the computer's built-in
speakers.
• This is the work flow that constitutes the main functioning of our project, using
Windows as the OS instead of Raspbian OS and a laptop as the hardware instead of the
Raspberry Pi board (the reason is explained in the Future Work chapter of this report).
• The image above is a clear image in which all the characters are visible, and it is
easy for the Tesseract OCR engine to recognize the text since it is a black-and-white
image. We may apply the grayscale preprocessing technique or skip it; it is our choice.
• In the image above we can see that the background is coloured, so here it is necessary
to obtain a grayscale version of the coloured image, in which the black becomes darker and
the background colour is converted to a light colour (almost white). This preprocessing
helps Tesseract recognize the text easily.
• For the image above, the prerequisite is that the handwritten text should use
well-formed English letter shapes, like the fonts used in printed books. Then the image is
ready to be given to Tesseract OCR.
• The image above is a near-perfect text image which can be given to Tesseract OCR and
easily recognized by it.
6.2 OUTPUT
For the first output we select Fig. 6.2 as the input.
• Figure 6.5 above shows the output console before entering the language preference.
There are six preferences for the six different languages into which the text image can be
translated, and the user can select any one of them. For the first trial we select Hindi,
the official language of India, so the preference number is ’2’; the output after that is
shown below.
• In Fig. 6.6 above, we can see that after entering the preference ’2’ the program asks
“Do you want to save as an audio file (y/n):”. When we enter ’y’ it saves the file, and
after that it asks us to enter the file name; whatever file name we give appears when the
audio file is played automatically. The output text in (.txt) format and the audio file in
(.mp3) format are shown below:
The rest of the outputs (both the text and speech outputs) for the image types in Fig.
6.1, Fig. 6.2, Fig. 6.3 and Fig. 6.4 are available at the following link:
https://drive.google.com/drive/u/0/folders/12DreOHX-3ePTy5fBy8xukoEGOYvc5h9A
Copy the link and paste it into your browser (Google Chrome preferred); you can then
access all the outputs for all four images in all the languages mentioned in Fig. 6.6.
Chapter 7
APPLICATIONS
1. This project is useful for travellers all around the world in reading and
understanding destination boards containing multilingual text.
2. It is also useful for visually impaired people, allowing them to read text without
using Braille script.
3. This project will help people in India who do not know English to understand text
images in their own regional language.
Chapter 8
8.1 CONCLUSION
The system enables the visually impaired not to feel at a disadvantage when it comes to
reading text not written in Braille. We have implemented an image-to-speech conversion
technique using the Raspberry Pi. The simulation results have been successfully verified,
and the hardware output has been tested using different samples. Our algorithm
successfully processes the image and reads it out clearly. This is an economical as well
as efficient device for the visually impaired and travellers. We have applied our
algorithm to many images and found that it performs the conversion successfully. The
device is compact and helpful to society. The text-to-speech device can change a
text-image input into sound with sufficiently high performance, a readability tolerance of
less than 2%, and an average processing time of less than two minutes for an A4 page.
This portable device can be used independently. The image pre-processing part allows the
required text region to be extracted from a complex background and gives a good-quality
input to the OCR. The text output of the OCR is sent to the TTS engine, which produces the
speech output. For portability, the device may be powered by a battery.
in India all our orders were cancelled and we could not get the hardware equipment in
hand. So, as future work, whenever we get the hardware equipment we will complete our
project on the intended hardware; in addition, we want to add speech-to-speech conversion
to the project so that the device becomes easier for the user to handle.
The basic software implementation we have completed covers about 70% of the project, and
the implementation carried out constitutes the main functioning of our project.
[2] Aaron James S, Sanjana S, Monisha M, “OCR based automatic book reader for the
visually impaired using Raspberry PI”, International Journal of Innovative Research in
Computer and Communication Engineering, Vol. 4, Issue 7, January 2016.