1 INTRODUCTION 5
2 LITERATURE SURVEY 7
4 PROPOSED WORK 19
4.1 Image Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Image Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Text Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Text-to-Speech Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Speech Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5 IMPLEMENTATION 22
5.1 Flow of Work Carried Out . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7 APPLICATIONS 32
Chapter 1
INTRODUCTION
Machine replication of human functions such as reading is an old dream, and over the
last five decades machine reading has grown from a dream into reality. Visually impaired
people report numerous difficulties in accessing printed text with existing technology,
including problems with alignment, focus, accuracy, mobility and efficiency. We present
a smart device that assists the visually impaired and travellers by reading paper-printed
text effectively and efficiently. The proposed project uses a camera-based assistive device
that people can use to read text documents. The framework implements an image-capturing
technique in an embedded system based on the Raspberry Pi board. The design is motivated
by preliminary studies with visually impaired people, and it is small-scale and mobile,
which enables more manageable operation with little setup. In this project we propose a
text read-out system for travellers and the visually challenged. The proposed fully
integrated system has a camera as an input device to feed the printed text document for
digitization. Speech is probably the most efficient medium for communication between
humans. To extract the text from an image we use optical character recognition (OCR),
which has become one of the most successful applications of technology in the fields of
pattern recognition and artificial intelligence.
Optical Character Recognition (OCR) is a process that converts scanned or photographed
images of printed or handwritten text into editable text for further processing.
Speech synthesis is the artificial synthesis of human speech. A Text-To-Speech (TTS)
synthesizer is a computer-based system that should be able to read any text aloud, whether
it was entered directly into the computer by an operator or scanned and submitted to an
Optical Character Recognition (OCR) system.
The device was tested on the Raspberry Pi platform. The Raspberry Pi is a basic embedded
system and, being a low-cost single-board computer, is used to reduce the complexity of
systems in real-time applications. The platform is programmed mainly in Python. The
Raspberry Pi provides a Camera Serial Interface (CSI) to interface the Raspberry Pi
camera. Here, the dark and low-contrast images captured by the Raspberry Pi camera module
are enhanced during preprocessing.
Image Processing Based Multilingual Translator
Chapter 2
LITERATURE SURVEY
Asha G. Hagargund et al. carried out a work in which the basic framework is an embedded
system that captures an image, extracts only the region of interest (i.e. the region of
the image that contains text) and converts that text to speech. It is implemented using a
Raspberry Pi and a Raspberry Pi camera. The captured image undergoes a series of image
pre-processing steps to locate only the part of the image that contains the text and to
remove the background. Two tools are used to convert the new image (which contains only
the text) to speech: OCR (Optical Character Recognition) software and a TTS
(Text-to-Speech) engine. The audio output is heard through the Raspberry Pi's audio jack
using speakers or earphones.
OCR based automatic book reader for the visually impaired using
Raspberry PI (International Journal of Innovative Research in
Computer and Communication Engineering - 2016)
Aaron James S et al. carried out a work in which optical character recognition (OCR) is
described as the identification of printed characters using photoelectric devices and
computer software. It converts images of typed, handwritten or printed text, taken from a
scanned document or from subtitle text superimposed on an image, into machine-encoded
text. In this research these images are converted into audio output. OCR is used in
machine processes such as cognitive computing, machine translation, text-to-speech, key
data entry and text mining, and it is a major field of research in character recognition,
artificial intelligence and computer vision. In this research the recognition process is
done using OCR: the character codes in text files are processed on a Raspberry Pi device,
which recognizes characters using the Tesseract engine and Python programming, and the
audio output is played back. To use OCR for pattern recognition and document image
analysis (DIA), information in grid format is used in the design and construction of a
virtual digital library. This research mainly focuses on an OCR-based automatic book
reader for the visually impaired using the Raspberry Pi. The Raspberry Pi features a
Broadcom system on a chip (SoC) which includes an ARM-compatible CPU and an on-chip
graphics processing unit (GPU). It promotes Python as its main programming language, with
support for BBC BASIC.
which one can recognize the object, mark the interesting region within the object, scan
the text and convert the scanned text into binary characters through optical recognition.
A second method has been presented in which the noise present in the scanned image is
eliminated before the characters are recognized. A third method, which converts the
recognised characters into speech through pattern matching, has also been presented.
Applications: an embedded system based on ARM technology has been developed that helps
blind persons read currency notes. All the methods presented in the paper have been
implemented within an embedded application. The embedded board has been tested with
different currency notes, and speech in English has been generated that identifies the
value of the currency. Further work can be done to generate the speech in other national
and international languages.
The features of the B+ version are almost the same as those of the B model; however, USB
and network boot and Power-over-Ethernet support come only with the B+ model. Two extra
USB ports are also added to this device. The SoC (system on chip) combines both CPU and
GPU in a single package and turns out to be faster than the Pi 2 and Pi 3 models.
3.1.1.1 Specifications
4. WiFi: Dual-band 802.11ac wireless LAN (2.4 GHz and 5 GHz) and Bluetooth 4.2
5. Ethernet: Gigabit Ethernet over USB 2.0 (max 300 Mbps); Power-over-Ethernet support
(with separate PoE HAT)
9. Power: 5V/2.5A DC power input
10. Operating system support: Linux and Unix
3.1.2 Pi Camera
The camera module used in this project is the Raspberry Pi camera module, as shown in
Fig. 0.2. It plugs into the CSI connector on the Raspberry Pi and is able to deliver clear
5 MP still images, or 1080p HD video recording at 30 fps. The module attaches to the
Raspberry Pi by a 15-pin ribbon cable connected to the dedicated 15-pin MIPI Camera Serial
Interface (CSI), which was designed especially for interfacing with cameras. The CSI bus
is a high-bandwidth link capable of extremely high data rates, and it exclusively carries
pixel data from the camera back to the BCM2835/BCM2836 processor; this bus travels along
the ribbon cable that attaches the camera board to the Pi.
To meet the increasing need for Raspberry Pi-compatible camera modules, the ArduCAM team
released a revision C add-on camera module for the Raspberry Pi that is fully compatible
with the official one. It improves the optical performance over the previous Pi cameras
and gives the user a much clearer and sharper image. It also provides the FREX and STROBE
signals, which can be used for synchronized multi-camera capture with a suitable camera
driver firmware. It attaches to the Raspberry Pi by way of one of the two small sockets on
the board's upper surface, again using the dedicated CSI interface, and the camera is
supported in the latest version of Raspbian, the Raspberry Pi's preferred operating
system. The board itself is tiny, at around 36 mm x 36 mm. The highlight of this module is
that the lens is replaceable, unlike the official one, making it well suited for mobile or
other applications where size and image quality are important.
The sensor itself has a native resolution of 5 megapixels and has a fixed-focus lens on
board. In terms of still images, the camera is capable of 2592 x 1944 pixel static images,
and it also supports 1080p30, 720p60 and 640x480p60/90 video.
3.1.2.1 Features
• 5MPixel sensor
• Integral IR filter
• Size: 36 x 36 mm
• 15 cm flat ribbon cable to 15-pin MIPI Camera Serial Interface (CSI) connector
3.1.2.2 Applications
• Cellular phones
• PDAs
• Toys
The Python interpreter and the extensive standard library are freely available in source
or binary form for all major platforms from the Python Web site, https://www.python.org/,
and may be freely distributed. The same site also contains distributions of and pointers to
many free third party Python modules, programs and tools, and additional documentation.
Python_Imaging_Library(PIL):
The Python Imaging Library (PIL) is the de facto image processing package for the Python
language. It incorporates lightweight image processing tools that aid in editing, creating
and saving images.
This module is not preloaded with Python, so to install it execute the following command
on the command line: “ pip install pillow ”
MODULES:
Image:
The Image module provides a class with the same name which is used to represent a PIL
image. The module also provides a number of factory functions, including functions to load
images from files and to create new images. The Image module includes various methods that
can be used to perform operations on images; some of them are new( ), open( ), resize( ),
rotate( ), save( ), etc., along with attributes such as size and format.
Used in code: open( ) – parameter: the image location in the computer database; returns an
image_object.
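As a hedged illustration of these operations (the file names here are our own placeholders, and note that in Pillow size and format are attributes rather than methods):

```python
from PIL import Image  # installed via: pip install pillow

# Create a small stand-in image so the sketch is self-contained;
# in practice open() would load an existing file from disk.
Image.new("RGB", (640, 480), "white").save("sample.png")

img = Image.open("sample.png")      # factory function: load from file
print(img.size, img.format)         # (640, 480) PNG

small = img.resize((320, 240))      # returns a new, resized Image
turned = small.rotate(90)           # rotated copy (same size, expand=False)
turned.save("sample_rotated.png")   # write the result back to disk
```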
ImageOps:
The ImageOps module contains a number of ready-made image processing operations; this
project uses its grayscale( ) function.
Google_Translator(googletrans):
Googletrans is a free and unlimited Python library that implements the Google Translate
API. It uses the Google Translate Ajax API to make calls to methods such as detect and
translate. It is an unofficial library using the web API of translate.google.com and is
not associated with Google.
MODULES:
Translator:
• text (UTF-8 str; unicode; string sequence (list, tuple, iterator, generator)) – The
source text(s) to be translated.
• dest (str; unicode) – The language to translate the source text into. The value should
be one of the language codes listed in googletrans.LANGUAGES.
• src (str; unicode) – The language of the source text. The value should be one of the
language codes listed in googletrans.LANGUAGES or one of the language names listed in
googletrans.LANGCODES. If a language is not specified, the system will attempt to
identify the source language automatically.
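A hedged sketch of how these parameters fit together. It assumes googletrans is installed (e.g. `pip install googletrans==4.0.0rc1`) and that network access to translate.google.com is available, so the actual call is shown commented out; the helper `code_for` and the trimmed language table are our own illustration, not part of the library:

```python
# Subset of googletrans.LANGUAGES, trimmed for illustration.
LANGUAGES = {"en": "english", "hi": "hindi", "kn": "kannada",
             "te": "telugu", "ta": "tamil", "mr": "marathi"}

def code_for(name):
    """Map a language name (e.g. 'hindi') to its dest/src code (e.g. 'hi')."""
    for code, lang in LANGUAGES.items():
        if lang == name:
            return code
    raise ValueError(f"unknown language: {name}")

def translate_text(text, dest):
    """Call googletrans; src is auto-detected because it is omitted."""
    from googletrans import Translator   # imported here: optional dependency
    return Translator().translate(text, dest=dest).text

print(code_for("hindi"))   # hi
# translate_text("Hello, world", dest=code_for("hindi"))  # needs network
```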
Google_text-to-speech(gTTS):
gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with Google
Translate's text-to-speech API.
PARAMETERS:
• tld (string) – Top-level domain for the Google Translate host, i.e.
https://translate.google.<tld>. This is useful when google.com might be blocked within a
network but a local or different Google host (e.g. google.cn) is not. Default is com.
• lang (string, optional) – The language (IETF language tag) to read the text in. Default
is en.
• lang_check (bool, optional) – Strictly enforce an existing lang, to catch a language
error early. If set to True, a ValueError is raised if lang doesn't exist. Setting
lang_check to False skips the web requests (to validate the language) and therefore speeds
up instantiation. Default is True.
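A hedged sketch of these parameters in use; it assumes `pip install gTTS`, and the import is deferred because building the object needs the library and `save()` needs network access, so both calls are shown commented out:

```python
def make_tts(text, lang="en", tld="com", lang_check=True):
    """Build a gTTS object with the parameters described above."""
    from gtts import gTTS                     # optional dependency
    return gTTS(text=text, lang=lang, tld=tld, lang_check=lang_check)

# tts = make_tts("Hello, world", lang="hi")  # Hindi voice, default host
# tts.save("hello_hi.mp3")                   # the network request happens here
```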
OS_module:
CREATING_DIRECTORY:
We can create a new directory using the mkdir() function from the OS module.
>>> import os
>>> os.mkdir("File_path")
CHANGING_CURRENT_WORKING_DIRECTORY:
We must first change the current working directory to a newly created one before doing any
operations in it. This is done using the chdir() function.
>>> import os
>>> os.chdir("File_path")
There is a getcwd( ) function in the OS module with which we can confirm whether the
current working directory has been changed.
>>> os.getcwd( )
os.system():
This method is implemented by calling the standard C function system().
Syntax: os.system(command), where command is a string that tells which command to execute.
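The calls above can be combined into a small self-contained sketch (the directory name is our own placeholder; a temporary folder keeps the sketch side-effect free):

```python
import os
import tempfile

base = tempfile.mkdtemp()                # scratch area for the demo
new_dir = os.path.join(base, "reports")  # hypothetical directory name

os.mkdir(new_dir)        # create the directory
os.chdir(new_dir)        # make it the current working directory
print(os.getcwd())       # confirm the change

# os.system() passes its string argument to the C system() call
# and returns the command's exit status (0 on success).
status = os.system("echo hello")
```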
• OCR uses artificial intelligence for text search and recognition in images.
• Tesseract supports Unicode and can recognize more than 100 languages out of the box.
• Tesseract is used for text detection on mobile devices, in video, and in Gmail image
spam detection.
3.2.3 Raspbian OS
• Raspbian is a Debian-based computer operating system for the Raspberry Pi. Since 2015 it
has been officially provided by the Raspberry Pi Foundation as the primary operating
system for the family of Raspberry Pi single-board computers.
• The operating system is still under active development. Raspbian is highly optimized
for the Raspberry Pi line's low-performance ARM CPUs.
• OS family: Unix-like
• Platforms: ARM
Chapter 4
PROPOSED WORK
We observe in our day-to-day life that many travellers around the globe face difficulties
in reading and understanding multilingual text images and destination boards. This happens
because of the different national and regional languages, which are not understandable to
travellers. Similarly, visually impaired persons face difficulties in reading books
without Braille script. In order to overcome these problems we have taken up this project.
Every traveller faces the problem of not understanding the various international languages
such as English, French, Greek, Portuguese, etc., so our project will remove this obstacle
for anyone who travels from country to country.
Fig. 4.1 below gives an abstract representation of the proposed work.
• There are many unnecessary contents in the received images, such as noise and poor
clarity, among other defects.
• This block helps in providing excellent results when extracting the text from the image.
• Each character present in the image is recognized and extracted using a special engine.
• This helps to extract the characters and form a string of characters to obtain the text
contained in the image.
• Once the text is translated, it is ready to be given to the TTS converter.
• The text-to-speech converter converts the text into speech in the language of the
user's choice.
• The speech output can be heard using speakers or earphones, depending on the user's
choice.
The objective of this project is to help travellers all around the globe to read and
understand multilingual text images and destination boards efficiently, without any
difficulty, and to help visually impaired persons read books without Braille script. In
both cases, images are converted into speech in the desired language using the required
techniques.
Chapter 5
IMPLEMENTATION
The flow diagram in the figure above describes the complete work flow carried out:
• The complete work is done using the Python 3.8 programming language, which provides
various libraries and modules to translate the text present in an image into speech.
• First, the image from the database is accessed using PIL (Python Imaging Library),
where we use the Image module; the snap code is shown below.
• Now image preprocessing is done so that the text in the image is clear enough for
further processing. This is done using another PIL module, ImageOps, which provides
various methods for image preprocessing; we use only grayscale( ). The snap code is shown
below.
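The grayscale snap code is likewise lost; a minimal reconstruction of the preprocessing step (the input image here is synthetic) might be:

```python
from PIL import Image, ImageOps

# Synthetic coloured stand-in for a captured text image.
image_object = Image.new("RGB", (200, 80), (30, 60, 200))

# ImageOps.grayscale() returns a single-channel ("L" mode) copy,
# which gives Tesseract a cleaner input than the coloured original.
gray = ImageOps.grayscale(image_object)
print(gray.mode)   # L
```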
• Now the image is ready for text extraction. Here Tesseract OCR comes into the picture:
it is used to extract the text from the preprocessed image through the Python module
pytesseract and its method image_to_string( ), which takes an image_object as its input
parameter. The snap code is shown below.
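A hypothetical reconstruction of the extraction snap code; it assumes `pip install pytesseract` plus a locally installed Tesseract binary, so the import is deferred and the call is shown commented out:

```python
def extract_text(image_object):
    """Run Tesseract OCR on a PIL image and return the decoded string."""
    import pytesseract            # binds to the installed tesseract binary
    return pytesseract.image_to_string(image_object)

# extracted_text = extract_text(gray)   # `gray` from the preprocessing step
```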
• The text obtained is in English, so it needs to be translated into the desired language
(i.e. the speech output language). This is done using the googletrans library and its
Translator module, whose translate( ) method wraps the free Google translation API. A
number of language codes can be used as the parameter for translate( ); they are:
LANGUAGES = { ’af’: ’afrikaans’, ’sq’: ’albanian’, ’am’: ’amharic’,
’ar’: ’arabic’, ’hy’: ’armenian’, ’az’: ’azerbaijani’, ’eu’: ’basque’, ’be’: ’belaru-
sian’, ’bn’: ’bengali’, ’bs’: ’bosnian’, ’bg’: ’bulgarian’, ’ca’: ’catalan’, ’ceb’:
’cebuano’, ’ny’: ’chichewa’, ’zh-cn’: ’chinese (simplified)’, ’zh-tw’: ’chinese (tra-
ditional)’, ’co’: ’corsican’, ’hr’: ’croatian’, ’cs’: ’czech’, ’da’: ’danish’, ’nl’:
’dutch’, ’en’: ’english’, ’eo’: ’esperanto’, ’et’: ’estonian’, ’tl’: ’filipino’, ’fi’:
’finnish’, ’fr’: ’french’, ’fy’: ’frisian’, ’gl’: ’galician’, ’ka’: ’georgian’, ’de’:
’german’, ’el’: ’greek’, ’gu’: ’gujarati’, ’ht’: ’haitian creole’, ’ha’: ’hausa’,
’haw’: ’hawaiian’, ’iw’: ’hebrew’, ’hi’: ’hindi’, ’hmn’: ’hmong’, ’hu’: ’hungar-
ian’, ’is’: ’icelandic’, ’ig’: ’igbo’, ’id’: ’indonesian’, ’ga’: ’irish’, ’it’: ’italian’,
’ja’: ’japanese’, ’jw’: ’javanese’, ’kn’: ’kannada’, ’kk’: ’kazakh’, ’km’: ’khmer’,
’ko’: ’korean’, ’ku’: ’kurdish (kurmanji)’, ’ky’: ’kyrgyz’, ’lo’: ’lao’, ’la’: ’latin’,
’lv’: ’latvian’, ’lt’: ’lithuanian’, ’lb’: ’luxembourgish’, ’mk’: ’macedonian’, ’mg’:
’malagasy’, ’ms’: ’malay’, ’ml’: ’malayalam’, ’mt’: ’maltese’, ’mi’: ’maori’, ’mr’:
’marathi’, ’mn’: ’mongolian’, ’my’: ’myanmar (burmese)’, ’ne’: ’nepali’, ’no’:
’norwegian’, ’ps’: ’pashto’, ’fa’: ’persian’, ’pl’: ’polish’, ’pt’: ’portuguese’, ’pa’:
’punjabi’, ’ro’: ’romanian’, ’ru’: ’russian’, ’sm’: ’samoan’, ’gd’: ’scots gaelic’,
’sr’: ’serbian’, ’st’: ’sesotho’, ’sn’: ’shona’, ’sd’: ’sindhi’, ’si’: ’sinhala’, ’sk’:
’slovak’, ’sl’: ’slovenian’, ’so’: ’somali’, ’es’: ’spanish’, ’su’: ’sundanese’, ’sw’:
’swahili’, ’sv’: ’swedish’, ’tg’: ’tajik’, ’ta’: ’tamil’, ’te’: ’telugu’, ’th’: ’thai’, ’tr’:
’turkish’, ’uk’: ’ukrainian’, ’ur’: ’urdu’, ’uz’: ’uzbek’, ’vi’: ’vietnamese’, ’cy’:
’welsh’, ’xh’: ’xhosa’, ’yi’: ’yiddish’, ’yo’: ’yoruba’, ’zu’: ’zulu’, ’fil’: ’Filipino’,
’he’: ’Hebrew’ }
• For the time being we have used important regional languages of India: {english: ’en’,
hindi: ’hi’, kannada: ’kn’, telugu: ’te’, tamil: ’ta’, marathi: ’mr’}. The snap code to
obtain the translated text in the various languages is shown below:
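A hypothetical reconstruction of the translation snap code using the six regional-language choices above (the preference numbering is our own assumption; googletrans and network access are assumed for non-English choices, so the interactive call is commented out):

```python
# Preference number -> googletrans language code.
PREFERENCES = {1: "en", 2: "hi", 3: "kn", 4: "te", 5: "ta", 6: "mr"}

def translate_extracted(extracted_text, preference):
    """Translate the OCR output into the language the user picked."""
    dest = PREFERENCES[preference]
    if dest == "en":                    # source text is already English
        return extracted_text, dest
    from googletrans import Translator  # network call happens below
    return Translator().translate(extracted_text, dest=dest).text, dest

# choice = int(input("Enter language preference (1-6): "))
# translated_text, dest_lang = translate_extracted(extracted_text, choice)
```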
• Once the translation of the text is done, the translated text is converted into speech
in the same language as the translated text. This is done using the Python library gtts
with the gTTS (Google Text-to-Speech) module; the snap code for this is shown below:
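A hypothetical reconstruction of the text-to-speech snap code; gTTS and network access are assumed, and auto-play uses os.startfile, which exists only on Windows (the OS this implementation actually ran on), so the call is shown commented out:

```python
import os

def speak(translated_text, dest_lang, filename="output.mp3"):
    """Synthesize the translated text and auto-play the saved MP3."""
    from gtts import gTTS                          # optional dependency
    gTTS(text=translated_text, lang=dest_lang).save(filename)  # network call
    if os.name == "nt":
        os.startfile(filename)   # Windows: open in the default audio player
    return filename

# speak(translated_text, dest_lang, "destination_board.mp3")
```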
• Once the speech output is saved in the computer database, it is played automatically
when we run the code; the output can be heard using earphones or the computer's built-in
speakers.
• This is the work flow that constitutes the main functioning of our project, using
Windows as the OS instead of Raspbian OS and a laptop as the hardware instead of the
Raspberry Pi board (the reason is explained in the Future Work chapter of this report).
• The image above is a clear image in which all the characters are visible, and it is
easy for the Tesseract OCR engine to recognize the text since it is a black-and-white
image. We may apply the grayscale preprocessing technique or skip it; it is our choice.
• In the image above we can see that the background is coloured, so here it is necessary
to obtain a grayscale version of the coloured image, in which the black becomes darker and
the background colour is converted to a light colour (almost white). This preprocessing
helps Tesseract recognize the text easily.
• For the image above, the prerequisite is that the handwritten text should use
well-formed English letter shapes, like the fonts used in printed books. Then the image is
ready to be given to Tesseract OCR.
• The image above is a near-perfect text image which can be given to Tesseract OCR and
easily recognized by it.
6.2 OUTPUT
For the first output we select Fig. 6.2 as the input.
• Figure 6.5 above shows the output console before entering the language preference.
There are six preferences for the six different languages into which the text image can be
translated, and the user can select any one of them. For the first trial we select Hindi,
the official language of India, so the preference number is ’2’; the output after that is
shown below.
• In Fig. 6.6 above, we can see that after entering the preference ’2’ the program asks
“Do you want to save as an audio file (y/n):”. When we enter ’y’ it saves the file, and
after that it asks us to enter the file name; whatever file name we give appears when the
audio file is played automatically. The output text in (.txt) format and the audio file in
(.mp3) format are shown below:
The rest of the outputs (both the text and speech outputs) for the image types in Fig.
6.1, Fig. 6.2, Fig. 6.3 and Fig. 6.4 are available at the following link:
https://drive.google.com/drive/u/0/folders/12DreOHX-3ePTy5fBy8xukoEGOYvc5h9A
Copy the link and paste it into your browser (Google Chrome preferred); you can then
access all the outputs for all four images in all the languages mentioned in Fig. 6.6.
Chapter 7
APPLICATIONS
1. This project is useful for travellers all around the world in reading and
understanding destination boards containing multilingual text.
2. It is also useful for visually impaired people, allowing them to read text without
using Braille script.
3. This project will help people in India who do not know English to understand text
images in their own regional language.
Chapter 8
8.1 CONCLUSION
The system enables the visually impaired not to feel at a disadvantage when it comes to
reading text not written in Braille. We have implemented an image-to-speech conversion
technique using the Raspberry Pi. The simulation results have been successfully verified,
and the hardware output has been tested using different samples. Our algorithm
successfully processes the image and reads it out clearly. This is an economical as well
as efficient device for the visually impaired and travellers. We have applied our
algorithm to many images and found that it performs the conversion successfully. The
device is compact and helpful to society. The text-to-speech device can change a
text-image input into sound with sufficiently high performance, a readability tolerance of
less than 2%, and an average processing time of less than two minutes for an A4 page.
This portable device can be used independently. The image pre-processing part allows the
required text region to be extracted from a complex background and gives a good-quality
input to the OCR. The text output of the OCR is sent to the TTS engine, which produces the
speech output. For portability, the device may be powered by a battery.
in India all our orders were cancelled and we could not get the hardware equipment in
hand. So, as future work, whenever we get the hardware equipment we will complete our
project on the intended hardware; in addition, we want to add speech-to-speech conversion
to the project so that the device becomes easier for the user to handle.
The basic software implementation we have completed covers about 70% of the project, and
the implementation carried out constitutes the main functioning of our project.
[2] Aaron James S, Sanjana S, Monisha M, “OCR based automatic book reader for the
visually impaired using Raspberry PI”, International Journal of Innovative Research in
Computer and Communication Engineering, Vol. 4, Issue 7, January 2016.