
What is Speech Recognition?

Speech recognition is the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned. This project aims to build a device that accepts audio input and sends a corresponding signal to a second device, which then performs the required task.

Components
Microphone.
FPGA (Altera DE2-115).
Bluetooth module.
Microcontroller (Arduino UNO with ATmega328).
5V relays.


Block Diagram

Input
The voice command is spoken into the microphone.
The Altera board has a 24-bit audio CODEC with a microphone interface, so the microphone plugs in directly.
The electrical signal from the microphone is digitized by an analog-to-digital (A/D) converter and stored in memory.
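As a rough illustration of what the A/D converter does, the sketch below samples a sine tone and maps each sample to a signed 24-bit integer, matching the word length of the CODEC. The 8 kHz sample rate, the 440 Hz tone and the function names `quantize` and `sample_tone` are illustrative assumptions, not values taken from the project.

```python
import math
from typing import List

BITS = 24  # word length of the audio CODEC described above


def quantize(sample: float, bits: int = BITS) -> int:
    """Map an analog value in [-1.0, 1.0] to a signed integer code."""
    full_scale = 2 ** (bits - 1) - 1          # 8388607 for 24 bits
    clipped = max(-1.0, min(1.0, sample))     # clip out-of-range input
    return round(clipped * full_scale)


def sample_tone(freq_hz: float, rate_hz: int, n: int) -> List[int]:
    """Sample a unit-amplitude sine tone and digitize every sample."""
    return [
        quantize(math.sin(2 * math.pi * freq_hz * k / rate_hz))
        for k in range(n)
    ]


# Assumed test signal: 8 samples of a 440 Hz tone at 8 kHz.
samples = sample_tone(freq_hz=440.0, rate_hz=8000, n=8)
```

The list `samples` stands in for the block of memory that the later processing stages would read.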

Processing
A typical speech recognition system has two stages:

Pre-processing, which takes a speech waveform as its input and extracts from it feature vectors, or observations, that carry the information required for recognition.
Recognition, or decoding, which is performed using a set of phoneme-level statistical models called hidden Markov models (HMMs).
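The pre-processing stage can be sketched as follows: the waveform is cut into short overlapping frames, each frame is windowed, and a log-magnitude spectrum is kept as the feature vector. This is a simplified stand-in under stated assumptions (practical recognizers usually use richer features such as mel-frequency cepstral coefficients), and the frame size, step and function names are hypothetical choices for illustration.

```python
import cmath
import math
from typing import List


def frames(signal: List[float], size: int, step: int) -> List[List[float]]:
    """Cut the signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def hamming(n: int) -> List[float]:
    """Hamming window, used to reduce spectral leakage at frame edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Magnitude of the discrete Fourier transform, bins 0 .. n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * k / n) for k, x in enumerate(frame)))
        for b in range(n // 2 + 1)
    ]


def extract_features(signal: List[float], size: int = 32, step: int = 16) -> List[List[float]]:
    """Return one log-magnitude feature vector per frame."""
    window = hamming(size)
    features = []
    for frame in frames(signal, size, step):
        windowed = [x * w for x, w in zip(frame, window)]
        # Small constant avoids log(0) for empty bins.
        features.append([math.log(m + 1e-10) for m in magnitude_spectrum(windowed)])
    return features


# Assumed test signal: a tone with exactly 4 cycles per 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * k / 32) for k in range(64)]
features = extract_features(tone)
```

Each entry of `features` is one observation; the sequence of observations is what the recognition stage would decode.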

Approaches
The most common approaches to voice recognition can be divided into two classes: template matching and feature analysis.
Template matching: the input is matched against a stored digitized voice sample, or template.
Feature analysis: the voice input is first processed using Fourier transforms or linear predictive coding (LPC); the system then looks for similarities between the expected inputs and the actual digitized voice input.
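Template matching can be illustrated with dynamic time warping (DTW), a classic way to compare an utterance against stored templates when the speaking rate varies. DTW is a choice made for this sketch; the slides only say that the input is matched against a template. The one-dimensional feature sequences and the template names are made-up illustrative data.

```python
from typing import Dict, List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Cost of the cheapest time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_template(sample: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the name of the template closest to the sample."""
    return min(templates, key=lambda name: dtw_distance(sample, templates[name]))


# Hypothetical templates and a slower rendition of the first one.
templates = {
    "on": [0, 1, 2, 3, 2, 1, 0],
    "off": [3, 2, 1, 0, 1, 2, 3],
}
sample = [0, 1, 1, 2, 3, 3, 2, 1, 0]
match = best_template(sample, templates)
```

Because DTW lets either sequence repeat a value, the stretched `sample` still aligns with the "on" template at zero cost.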

Preferred Methodology: Hidden Markov Model (HMM)

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present-day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs. For small- to medium-sized vocabularies, the word and language models are compiled into a single, integrated model. Recognition is performed using the Viterbi algorithm to find the route through this model which best explains the data.
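A minimal sketch of Viterbi decoding on a toy two-state, left-to-right HMM is given below. The state names, the observation symbols and every probability are invented for illustration; a real recognizer would use trained phoneme models and would normally work with log probabilities to avoid numerical underflow on long utterances.

```python
from typing import Dict, List, Tuple


def viterbi(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state occupied at time t - 1 on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p][0] * trans_p[p][s])
            prob = best[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]]
            column[s] = (prob, prev)
        best.append(column)

    # Trace back from the most likely final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical model: state "A" mostly emits "x", state "B" mostly emits "y",
# and the model cannot move back from "B" to "A".
states = ["A", "B"]
start_p = {"A": 0.8, "B": 0.2}
trans_p = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.0, "B": 1.0}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

path, prob = viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p)
# path is ["A", "A", "B", "B"] with probability of about 0.0995
```

The decoder keeps only the best path into each state at each time step, so its cost grows linearly with the length of the utterance.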

Hardware Implementation
The final implementation would be as follows:

The remote would carry a power button, an LED and a microphone on the outside, with the ASIC and a Bluetooth module inside. At the other end, a microcontroller (ATmega8) with its own Bluetooth module would be connected to the appliances through relays, which must be integrated with the house wiring.

Wireless Communication
The output of the recognition stage is fed to the Bluetooth module in the remote and received by a second Bluetooth module, which is connected to the microcontroller. The microcontroller is programmed to perform the required operation.
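The link between the two ends can be sketched as a one-byte command protocol, simulated here in Python rather than in microcontroller firmware. Everything in it is an assumption made for illustration: the phrase table, the byte layout (relay index in the upper bits, on/off flag in the lowest bit) and the names `encode_command` and `RelayBoard`. The count of six relays follows the cost table.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a recognized phrase to (relay index, desired state).
COMMANDS: Dict[str, Tuple[int, bool]] = {
    "light on": (0, True),
    "light off": (0, False),
    "fan on": (1, True),
    "fan off": (1, False),
}


def encode_command(phrase: str) -> bytes:
    """Sender side: pack relay index and on/off flag into a single byte."""
    relay, on = COMMANDS[phrase]
    return bytes([(relay << 1) | int(on)])


class RelayBoard:
    """Receiver side: stands in for the microcontroller driving the relays."""

    def __init__(self, count: int = 6) -> None:
        self.states: List[bool] = [False] * count

    def handle(self, packet: bytes) -> None:
        """Decode each received byte and switch the addressed relay."""
        for byte in packet:
            relay, on = byte >> 1, bool(byte & 1)
            if relay < len(self.states):   # ignore unknown relay indices
                self.states[relay] = on


board = RelayBoard()
board.handle(encode_command("fan on"))   # relay 1 is now switched on
```

On real hardware the `handle` logic would run in the microcontroller's serial receive loop, with each decoded flag written to the output pin that drives the relay.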

Extensions
Control of televisions with Wi-Fi.
Control of televisions without Wi-Fi.
Implementation with an RF module, so that the cost could go as low as Rs. 1,200.

Cost Expected
Component            Price
Microphone           Rs. 600
Altera DE2-115       Rs. 20,000
Bluetooth modules    Rs. 2,500
Arduino UNO          Rs. 2,000
5V Relays (6)        Rs. 250
Other Components     Rs. 1,000
Total                Rs. 27,000*

Expected Cost of the Product in bulk


Component            Price
ASIC                 Rs. 300
Microphone           Rs. 200
Bluetooth modules    Rs. 2,000
ATmega8              Rs. 85
5V Relays (6)        Rs. 150
Other Components     Rs. 50
Total                Rs. 2,785*

