
MAEER's Maharashtra Institute of Technology, Pune

Department of Computer Engineering

SMART SPEECH RECOGNITION APPLICATION ON ANDROID PLATFORM

Under the guidance of:
PROF. R. A. AGRAWAL

LAKSHYA AGGARWAL: B120024277
MAYANK KAUSHIK: B120024283
VARUN KULKARNI: B120024274
POOJA KAMBLE: B120024263

MOTIVATION
The motivation for doing this project was primarily an interest in undertaking a challenging project in an interesting area of research. We wanted to work with speech recognition, as it is a new and upcoming area of research, and to implement some of the machine learning concepts we have been learning in our curriculum.

Literature Survey

1. Author: Google Inc.
Paper: Personalized Speech Recognition on Mobile Devices
Description: Focuses primarily on recognizing clear and descriptive speech and converting it to text.
Contribution: Creation of the Google speech API.

2. Author: Melis Oner, Jeffry A. Pulcifer-Stump, Patrick Seeling and Tolga Kaya
Paper: Towards the Run and Walk Activity Classification through Step Detection
Description: Checks the step count and provides relevant results as the output.
Contribution: Step detection and improvements in the health sector, providing better opportunities and promoting a healthier lifestyle.

3. Author: Mohamed Elfeky, Pedro J. Moreno and Victor Soto, International Conference on Natural Language and Speech Processing (2015)
Paper: Multi-Dialectical Languages Effect on Speech Recognition
Description: Recognition of multilingual aspects so as to diversify the languages the machine is capable of understanding.
Contribution: Addresses the problem that users' voice search queries are usually directed to a dialect-specific recognizer that does not match the user's current location.

4. Author: Dr. Raj Kamal, Professor, School of Computer Science and IT and School of Electronics, Devi Ahilya University
Paper: Machine Learning Techniques for Mobile Intelligent Systems
Description: Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
Contribution: Acquiring or developing new knowledge or skills from existing examples for the sake of optimizing a performance criterion.

5. Author: Wu, Shyi-Shiou, Dept. of Electronics Engineering, Nan Kai University of Technology, Nantou, Taiwan
Paper: The Design of an Intelligent Pedometer using Android
Description: A pedometer is a common auxiliary device used for maintaining health and fitness. In this paper, an intelligent pedometer is developed using Android.
Contribution: The system provides three action modes (time-based, distance-based and count-based), and all tracking data are saved in an SQLite database.

6. Author: Kuei-Chun Liu, Ching-Hung Wu, Shau-Yin Tseng and Yin-Te Tsai, 2015 IEEE International Conference on Computer and Information Technology
Paper: Voice Helper: A Mobile Assistive System for Visually Impaired Persons
Description: The reader, which transforms digital information to text and then to voice by TTS (Text-To-Speech), is widely used to help visually impaired persons operate devices.
Contribution: Provides a Navigation Reader for walking and riding, based on Google Maps, which supports detailed voice guidance for the distance to a destination and the direction of movement.

7. Author: Sanja Primorac and Mladen Russo, FESB, University of Split, Croatia
Paper: Android application for sending SMS messages with speech recognition interface
Description: The user can send messages to an entered phone number or to a contact from the phonebook. Speech recognition is done via the Internet, connecting to Google's server. The application accepts input messages in English; the tools used are the Android SDK, and the installation is done on a mobile phone.
Contribution: Speech recognition for Voice SMS uses a technique based on Hidden Markov Models (HMM), currently the most successful and most flexible approach to speech recognition.

INTRODUCTION TO OUR PROJECT

THE STATEMENT
Our project aims to develop an Android platform based application that uses smart speech recognition to provide voice commands for different mobile applications and to perform repetitive tasks autonomously using machine learning concepts.

Let us go into the details:

The primary focus is on building a smart speech recognition system. This entails providing voice commands for a variety of applications on an Android based platform.
This will include multiple applications: calling, texting, switching sensors (Wi-Fi, GPS, Bluetooth) on and off, and setting alarms.
The application will provide online as well as offline services.
The application will also apply machine learning concepts to identify usage patterns, and tasks that are performed repetitively will be automated.
Services such as activity recognition and recognizing nearby friends using Bluetooth will also be provided.
The importance of the project is that it will help visually challenged people, as well as the general population, have an alternate and very easy way to control applications on Android smartphones using voice commands instead of touch.

NOW,
What is Speech Recognition?
Speech recognition (SR) is an inter-disciplinary sub-field of computational linguistics that incorporates knowledge and research from linguistics, computer science, and electrical engineering to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices, such as those categorized as smart technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).

It is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words.
The recognized words can be an end in themselves, as for applications such as command and control, data entry, and document preparation.
They can also serve as the input to further linguistic processing in order to achieve speech understanding.

System Architecture
The proposed architecture consists of two main components designed to work together: the Speech Recognition module and the Machine Learning module.
Each of these modules can be subdivided into multiple blocks as per requirement.

The Speech Recognition module works in five steps.

The initial block takes input from the user in the form of voice commands. This is passed to the second block, the API that converts speech into text. The output of the Conversion Block is the command in textual form, which is given as input to the next block. Here the input is compared to the predefined list of commands from the database and the appropriate one is chosen. This command is then executed in the next block. In the last, Output Block, the command given by the user is saved as a log entry in the database, which will later be used by the Machine Learning module.
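The matching and logging blocks above can be sketched in plain Java. This is a hypothetical sketch, not the project's actual code: the command keywords, action names and in-memory log are illustrative, and the real application would use Android's speech API and an SQLite database for the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Matching, Execution-hand-off and Output blocks: recognized
// text is compared against a predefined command list and, on a match,
// appended to a log that the Machine Learning module reads later.
public class CommandPipeline {
    // Predefined command list, keyed by a keyword expected in the speech text.
    private final Map<String, String> commands = new LinkedHashMap<>();
    // Command log consumed later by the Machine Learning module.
    private final List<String> log = new ArrayList<>();

    public CommandPipeline() {
        commands.put("call", "ACTION_CALL");
        commands.put("alarm", "ACTION_SET_ALARM");
        commands.put("wifi", "ACTION_TOGGLE_WIFI");
        commands.put("bluetooth", "ACTION_TOGGLE_BLUETOOTH");
    }

    /** Matches recognized text to a command and logs it; null if no match. */
    public String match(String recognizedText) {
        String lower = recognizedText.toLowerCase();
        for (Map.Entry<String, String> e : commands.entrySet()) {
            if (lower.contains(e.getKey())) {
                log.add(e.getValue());   // Output Block: save a log entry
                return e.getValue();     // Execution Block would act on this
            }
        }
        return null; // failure condition: no command match found in database
    }

    public List<String> getLog() { return log; }
}
```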

The Machine Learning module provides multiple functionalities. The first and foremost is the automatic execution of very frequent commands given by the user, with the user's permission.
The module uses the command log generated by the Speech Recognition module to search for patterns and rules and to learn frequently used commands. It then asks the user for permission to execute them automatically.
For example, if a user frequently sets an alarm for 6:00 AM via voice commands, the module recognizes this as a frequent task and will offer to set the alarm automatically next time.
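A minimal sketch of this frequent-command detection, assuming a simple count threshold (the threshold value and names below are illustrative, not the project's actual parameters):

```java
import java.util.HashMap;
import java.util.Map;

// Counts occurrences of each command in the log and flags a command as
// "frequent" the first time it crosses the threshold, so the app can ask
// the user's permission to automate it exactly once.
public class FrequentCommandDetector {
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();

    public FrequentCommandDetector(int threshold) {
        this.threshold = threshold;
    }

    /** Records one log entry; returns true when the command becomes frequent. */
    public boolean record(String command) {
        int c = counts.merge(command, 1, Integer::sum);
        return c == threshold; // trigger the permission prompt only once
    }
}
```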

Another functionality is counting the steps taken by the user within a specific period of time using activity recognition. This time period is defined by user input: counting starts with the user's start voice command and stops with the user's stop voice command. This requires sensor input from the accelerometer. The flow of control is depicted in the figure.
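One common way to count steps from accelerometer samples is threshold-crossing detection on the acceleration magnitude. The sketch below assumes this technique and a made-up threshold value; on Android the samples would come from SensorManager rather than plain method calls.

```java
// Counts a step each time the acceleration magnitude rises above a
// threshold after having fallen below it (simple rising-edge detection).
public class StepCounter {
    private final double threshold;   // m/s^2, above the gravity baseline (assumed value)
    private boolean aboveThreshold = false;
    private int steps = 0;

    public StepCounter(double threshold) {
        this.threshold = threshold;
    }

    /** Feed one accelerometer sample (x, y, z in m/s^2). */
    public void onSample(double x, double y, double z) {
        double magnitude = Math.sqrt(x * x + y * y + z * z);
        if (magnitude > threshold && !aboveThreshold) {
            steps++;                  // rising edge: count one step
            aboveThreshold = true;
        } else if (magnitude < threshold) {
            aboveThreshold = false;   // reset for the next peak
        }
    }

    public int getSteps() { return steps; }
}
```

A real implementation would also low-pass filter the signal and enforce a minimum time between steps to reject noise.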
This module also notifies the user of friends in the vicinity. This is achieved by searching for available paired Bluetooth devices in the user's surroundings. It requires telling the system which paired devices belong to friends; this information is saved in the application database.
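The vicinity check reduces to intersecting the scan results with the saved friend list. In this sketch a plain list of addresses stands in for Android's BluetoothAdapter scan output, and the addresses are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Holds the Bluetooth addresses the user has labelled as friends and
// intersects them with the addresses visible in the current scan.
public class FriendFinder {
    private final Set<String> friendAddresses = new HashSet<>();

    /** Saves a paired device the user identified as a friend. */
    public void addFriend(String bluetoothAddress) {
        friendAddresses.add(bluetoothAddress);
    }

    /** Returns which of the currently visible devices belong to friends. */
    public List<String> friendsNearby(List<String> visibleAddresses) {
        List<String> nearby = new ArrayList<>();
        for (String addr : visibleAddresses) {
            if (friendAddresses.contains(addr)) {
                nearby.add(addr);
            }
        }
        return nearby;
    }
}
```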

Mathematical Models

System Description (S): Module 1 (Speech Recognition)

S = {I, O, Fn, Sc, Fc} where
Input (I): User voice commands
Output (O): Performing the action spoken by the user
Functions (Fn): {Fn1, Fn2, Fn3, Fn4} where
Fn1: Recording user voice commands
Fn2: Speech to text conversion
Fn3: Matching text to the command database
Fn4: Performing the action defined in the database
Success Conditions (Sc): {Sc1, Sc2, Sc3} where
Sc1: Correct speech to text conversion
Sc2: Command match found in the database
Sc3: Successful execution of the command
Failure Conditions (Fc): {Fc1, Fc2, Fc3} where
Fc1: Unrecognizable language or dialect
Fc2: High levels of background noise
Fc3: No command match found in the database

System Description (S): Module 2 (Machine Learning)

S = {I, O, Fn, Sc, Fc} where
Input (I): {I1, I2} where
I1: Sensor data taken for activity recognition
I2: Log of user-given commands
Output (O): Notifications for recommendations made by the system
Functions (Fn): {Fn1, Fn2, Fn3} where
Fn1: Recognizing human activity and identifying patterns
Fn2: Generating recommendations based on patterns
Fn3: Performing the corresponding commands if the user accepts the recommendation
Success Conditions (Sc): {Sc1, Sc2} where
Sc1: Recognizing the activity accurately
Sc2: Desired recommendations are made
Failure Conditions (Fc): {Fc1, Fc2} where
Fc1: Inability to recognize the activity
Fc2: Sensors not working properly

UML Diagrams

Activity Diagram

Use Case Diagram

Deployment Diagram

System Requirements
Hardware Requirements:
Android OS based mobile device with basic sensors: Bluetooth, accelerometer.

Software Requirements:
Android Studio,
Google API,
Database: MySQL, SQLite

Probable Test Cases

Giving different commands to the system by different users in different environmental conditions.

Test cases:
Noisy environment: Users 1 to N, Commands 1 to n
Silent environment: Users 1 to N, Commands 1 to n

Applications
Perform tasks like calling and texting without using the touch screen.
The app can be used for multi-tasking, such as operating the mobile device while driving or cooking.
Can act as a personal assistant.
Can be used as a transcriber.
Helps the visually challenged operate a smartphone with relative ease.

Future Scope
Future versions of the app can aim to bring more applications under the purview of voice commands.
The use of the voice recognition system as an authentication system can also be considered.
Instead of only taking voice commands as input, the app can in time also give output as speech: it can speak out the results instead of only displaying them.
The ability to recognize different languages and dialects, with speech recognition working for them, can be included.
Future work in activity recognition may consider more activities and implement a real-time system on the smartphone. Other query strategies, such as variance reduction and density-weighted methods, may be investigated to enhance the performance of the machine learning schemes proposed here.

Conclusion
An Android platform based smart voice recognition system is developed to operate multiple Android apps with simple voice commands. The technology is implemented in a user-friendly and compact application that operates in both online and offline modes. The project applies the capability of modern smart speech recognition software to increase independence for persons with disabilities.
The major purpose of this system is to provide a way for the blind and physically disabled population to easily control many functions of a smartphone via voice.
The system is very useful for the general population as well: users command the mobile device to do something via voice, such as directly controlling the smartphone, and these commands are then immediately executed.
The application also uses machine learning concepts to execute frequent voice commands automatically, and it keeps track of the steps taken by the user and of paired Bluetooth devices in the vicinity.

References
Wikipedia: https://en.wikipedia.org/wiki/Speech_recognition
https://en.wikipedia.org/wiki/Machine_learning
[1] Sanja Primorac and Mladen Russo, "Android application for sending SMS messages with speech recognition interface", 35th International Convention MIPRO, 2012.
[2] Android Developers, http://developer.android.com
[3] Google Inc., "Personalized speech recognition on mobile devices", Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
[4] Melis Oner, Jeffry A. Pulcifer-Stump, Patrick Seeling and Tolga Kaya, "Towards the Run and Walk Activity Classification through Step Detection, An Android Application", Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2012.
[5] Thomas Olutoyin Oshin, "ERSP: An Energy-efficient Real-time Smartphone Pedometer", IEEE International Conference on Systems, Man, and Cybernetics, 2013.
[6] Dr. Raj Kamal, A. Chaudhary and S. Kolhe, "Machine Learning Techniques for Mobile Intelligent Systems: A Study", Ninth International Conference on Wireless and Optical Communications Networks, 2012.
[7] Wu, Shyi-Shiou and Hsin-Yi Wu, "The Design of an Intelligent Pedometer using Android", Second International Conference on Innovations in Bio-inspired Computing and Applications, 2011.
[8] "HARLib: A human activity recognition library on Android".
[9] Smitha K. S., "Human Activity Recognition, An Android Application", International Journal of Scientific & Engineering Research, 2013.
