HM2007 Based Speech Recognition

Visit us at http://www.sunrom.com
Document: Datasheet
Date: 6-Aug-12
Model #: 1180
Products Page: www.sunrom.com/p-762.html
Speech Recognition System based on HM2007

The speech recognition system is a completely assembled, easy-to-use, programmable speech recognition circuit. It is programmable in the sense that you train the words (or vocal utterances) you want the circuit to recognize. This board allows you to experiment with many facets of speech recognition technology. It has an 8-bit data output that can be interfaced with any microcontroller for further development. Some of the applications that can be built on this interface are control of home appliances, robot movements, speech-assisted technologies, speech-to-text translation, and many more.
Features
- Self-contained, stand-alone speech recognition circuit
- User programmable
- Vocabulary of up to 20 words, each up to about two seconds (1.92 s) long
- Multilingual
- Non-volatile memory backup with onboard 3 V battery: keeps the speech recognition data in memory even after power off
- Easily interfaced to control external circuits and appliances
Specification
- Input voltage: 9 to 15 V DC. Note: use a commonly available 12 V, 500 mA DC adapter.
- Output data: 8 bits at 5 V logic level. Note: any microcontroller such as an 8051, PIC or AVR can be interfaced to the data port to interpret the data and implement specialized applications.
Applications
Voice recognition technology has several areas of application:
- Speech-controlled appliances and toys
- Speech-assisted computer games
- Speech-assisted virtual reality
- Telephone assistance systems
- Voice recognition security
- Speech-to-speech translation
Introduction
Speech recognition is likely to become the method of choice for controlling appliances, toys, tools and computers. At its most basic level, speech-controlled appliances and tools allow the user to perform parallel tasks (i.e., while hands and eyes are busy elsewhere) while working with the tool or appliance. The heart of the circuit is the HM2007 speech recognition IC, which can recognize 20 words, each up to 1.92 seconds long.
Complete Schematic of System

[Figure: Complete schematic of the speech recognition system (Sunrom Technologies, "Speech Recognition", Code 1180, dated Wednesday, February 11, 2009). Main parts: HM2007 recognizer (U1) with 3.579 MHz crystal (Y1); HY6264 SRAM (U4) backed up by a 3 V battery (BT1) through BAT85 diodes; 74HC573 latch (U2) driving the 10-pin DATA OUT connector (CN1, DB0 to DB7); two CD4511B BCD-to-7-segment decoders (U3, U5) with LT543 displays (S1, S2); 12-key keypad (SW1 to SW12, including CLEAR and TRAIN); READY LED (D1) and microphone (M1); LM7805 regulator (U6) powered from a 9-12 V DC input (CN2).]
Using the System

The keypad and digital display are used to communicate with and program the HM2007 chip. The keypad is made up of 12 normally open, momentary-contact switches. When the circuit is turned on, the digital display shows 00, the red READY LED is lit, and the circuit waits for a command.

Sunrom Technologies
Your Source for Embedded Systems
Visit us at www.sunrom.com

Training Words for Recognition
Press 1 on the keypad (the display shows 01 and the LED turns off), then press the TRAIN key (the LED turns on) to place the circuit in training mode for word one. Say the target word clearly into the onboard microphone (near the LED). The circuit signals acceptance of the voice input by blinking the LED off and then on; the word (or utterance) is now stored as word 01. If the LED did not blink, start over by pressing 1 and then TRAIN.

You may continue training new words: press 2 then TRAIN to train the second word, and so on. The circuit accepts and recognizes up to 20 words (numbers 1 through 20). It is not necessary to train all word spaces; if you only require 10 target words, train only those 10.

Testing Recognition
Repeat a trained word into the microphone. The number of the word should appear on the digital display. For instance, if the word "directory" was trained as word number 20, saying "directory" into the microphone causes 20 to be displayed.

Error Codes
The chip reports the following error codes:
55 = word too long
66 = word too short
77 = no match

Clearing Memory
To erase all words in memory, press 99 and then CLR. The numbers quickly scroll by on the digital display as the memory is erased.

Changing & Erasing Words
Trained words can easily be changed by overwriting the original word. For instance, suppose word six was the word "Capital" and you want to change it to "State". Simply retrain the word space by pressing 6, then TRAIN, and saying "State" into the microphone. To erase a word without replacing it, press the word number (in this case 6) and then the CLR key. Word six is now erased.

Simulated Independent Recognition
The speech recognition system is speaker dependent, meaning that the voice that trained the system achieves the highest recognition accuracy. However, you can simulate speaker-independent recognition.
To make the recognition system simulate speaker independence, use more than one word space for each target word. For example, with four word spaces per target word you obtain four different enunciations of each target word. Word spaces 01, 02, 03 and 04 are allocated to the first target word, word spaces 05, 06, 07 and 08 to the second, and so on until all the words are programmed. Because there are only 20 word spaces, four spaces per word leaves room for five target words. If you are experimenting with speaker independence, use different people when training a target word. This enables the system to recognize different voices, inflections and enunciations of the target word. The more word spaces allocated to each target word, the more robust the recognition becomes, at the cost of a smaller vocabulary.
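As a sketch of the bookkeeping this scheme implies, the C function below maps a recognized word space (1 to 20) back to its target word when four spaces are allocated per word. It assumes the word number has already been converted to binary (the board's data bus appears to carry two BCD digits, as the 0x55/0x66/0x77 cases in the sample code suggest). The names target_word, target_word_capacity, SPACES_PER_WORD and MAX_WORD_SPACES are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout from the text: each target word owns
   SPACES_PER_WORD consecutive word spaces, starting at space 1. */
#define SPACES_PER_WORD 4u
#define MAX_WORD_SPACES 20u

/* Returns the target word (1-based) for a recognized word space
   (1..20), or 0 if the space number is out of range. */
uint8_t target_word(uint8_t space)
{
    if (space < 1u || space > MAX_WORD_SPACES) {
        return 0u;
    }
    /* Spaces 1-4 -> word 1, 5-8 -> word 2, ... 17-20 -> word 5. */
    return (uint8_t)((space - 1u) / SPACES_PER_WORD + 1u);
}

/* Number of distinct target words the 20 spaces can hold. */
uint8_t target_word_capacity(void)
{
    return (uint8_t)(MAX_WORD_SPACES / SPACES_PER_WORD);
}
```

With four spaces per word, spaces 5 to 8 all map to target word 2, and the 20 spaces hold five target words; changing SPACES_PER_WORD trades robustness against vocabulary size.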
If you are designing for the most robust and accurate recognition of a single speaker, train the target words using one voice with different inflections and enunciations of each target word.

Homonyms
Homonyms and other words that sound alike can confuse the speech recognition circuit. For instance, the words cat, bat, sat and fat sound alike. When choosing target words for your system, avoid homonyms and similar-sounding words.

The Voice with Stress & Excitement
Stress and excitement alter one's voice, and this affects the circuit's recognition accuracy. For instance, assume you are sitting at your workbench and you program target words such as fire, left, right and forward into the circuit. Then you use the circuit to control a flight simulator game, Doom or Duke Nukem. While playing the game you'll likely be yelling "FIRE! Fire! ...FIRE!! ...LEFT, go RIGHT!" In the heat of the action your voice will sound much different than when you were sitting relaxed and programming the circuit. To achieve higher recognition accuracy, mimic that excitement in your voice when programming the circuit. Keep these factors in mind to get the highest accuracy possible from the circuit; this becomes increasingly important when the circuit is taken out of the lab and put to work in the outside world.

Error Codes
When an external circuit is interfaced through the data bus, the decoding circuit must distinguish word numbers from error codes. It must recognize error codes 55, 66 and 77 and not confuse them with word spaces 5, 6 and 7.

Voice Security System
This circuit isn't designed for a voice security system in a commercial application, but that should not prevent anyone from experimenting with it for that purpose. A common approach is to use three or four keywords that must be spoken and recognized in sequence in order to open a lock or allow entry.
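The error-code rule and the keyword-sequence idea above can be combined in a small sketch. It assumes the data bus carries the displayed number as two BCD digits (word 12 as 0x12, error 55 as 0x55); this is inferred from the CD4511B display decoders and the sample code's case labels rather than stated in this datasheet. The names hm_decode_word, voice_lock and voice_lock_feed are hypothetical, and the code is portable C intended to be checked on a PC before porting to a microcontroller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error codes as they appear on the data bus (assumed two BCD digits). */
#define HM_ERR_TOO_LONG  0x55u
#define HM_ERR_TOO_SHORT 0x66u
#define HM_ERR_NO_MATCH  0x77u

/* Converts a data-bus byte to a word number 1..20. Returns 0 for error
   codes and for any byte that is not an in-range two-digit BCD value. */
uint8_t hm_decode_word(uint8_t bus)
{
    uint8_t tens = (uint8_t)(bus >> 4);
    uint8_t ones = (uint8_t)(bus & 0x0Fu);
    uint8_t word;

    if (bus == HM_ERR_TOO_LONG || bus == HM_ERR_TOO_SHORT ||
        bus == HM_ERR_NO_MATCH) {
        return 0u;                 /* error code, not word 5, 6 or 7 */
    }
    if (tens > 9u || ones > 9u) {
        return 0u;                 /* not a valid BCD value */
    }
    word = (uint8_t)(tens * 10u + ones);
    return (word >= 1u && word <= 20u) ? word : 0u;
}

/* Hypothetical voice lock: keyword word numbers must arrive in order. */
typedef struct {
    const uint8_t *sequence;       /* expected word numbers (nonzero) */
    uint8_t length;                /* number of keywords, at least 1 */
    uint8_t matched;               /* keywords matched so far */
} voice_lock;

/* Feeds one data-bus byte into the lock. Returns true when the whole
   sequence has just been completed. Error codes are ignored; a wrong
   word restarts the sequence. */
bool voice_lock_feed(voice_lock *lock, uint8_t bus)
{
    uint8_t word = hm_decode_word(bus);

    if (word == 0u) {
        return false;              /* error code or invalid byte */
    }
    if (word == lock->sequence[lock->matched]) {
        lock->matched++;
        if (lock->matched == lock->length) {
            lock->matched = 0u;    /* re-arm for the next attempt */
            return true;
        }
        return false;
    }
    /* Wrong word: start over, counting it if it is the first keyword. */
    lock->matched = (word == lock->sequence[0]) ? 1u : 0u;
    return false;
}
```

In this design a "no match" (77) between keywords does not reset the sequence, while any wrong recognized word does. A stricter lock might also reset on error codes or enforce a timeout between keywords.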
Aural Interfaces
It has been found that mixing visual and aural information is not effective: products that require visual confirmation of an aural command greatly reduce efficiency. To create an effective AUI (aural user interface), products need to understand (recognize) commands given in an unstructured and efficient way, the way in which people typically communicate verbally.

Learning to Listen
The ability to listen to one person speaking among several at a party is beyond the capabilities of today's speech recognition systems. Speech recognition systems cannot (as yet) separate and filter out what should be considered extraneous noise.

Speech recognition is not understanding speech. Understanding the meaning of words is a higher intellectual function. Just because a circuit can respond to a vocal command doesn't mean it understands the command spoken. In the future, voice recognition systems may have the ability to distinguish nuances of speech and meanings of words, to "Do what I mean, not what I say!"
Speaker Dependent / Speaker Independent
Speech recognition is divided into two broad processing categories: speaker dependent and speaker independent.

Speaker-dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback of this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers.

A speaker-independent system is trained to respond to a word regardless of who speaks. The system must therefore respond to a large variety of speech patterns, inflections and enunciations of the target word. The command word count is usually lower than for speaker-dependent systems, but high accuracy can still be maintained within processing limits. Industrial applications more often require speaker-independent voice recognition systems.

Recognition Style
In addition to the speaker dependent/independent classification, speech recognition also contends with the style of speech it can recognize. There are three styles of speech: isolated, connected and continuous.
- Isolated: Words are spoken separately, or isolated. This is the most common type of speech recognition system available today. The user must pause between each word or command spoken.
- Connected: This is a halfway point between isolated-word and continuous speech recognition. It permits users to speak multiple words. The HM2007 can be set up to identify words or phrases up to 1.92 seconds long, which reduces the recognition vocabulary to 20 words.
- Continuous: This is the natural conversational speech we use in everyday life. It is extremely difficult for a recognizer to sift through the sound because the words tend to merge together. For instance, "Hi, how are you doing?" sounds to a computer like "Hi,.howyadoin". Continuous speech recognition systems are on the market and are under continual development.

More On The HM2007 Chip
The HM2007 is a CMOS voice recognition LSI (Large Scale Integration) circuit. The chip contains an analog front end, voice analysis, regulation, and system control functions. The chip can be used stand-alone or connected to a CPU.

Features:
- Single-chip voice recognition CMOS LSI
- Speaker dependent
- External RAM support
- Maximum 40-word recognition (0.96 second per word)
- Maximum word length 1.92 seconds (20 words)
- Microphone support
- Manual and CPU modes available
- Response time less than 300 milliseconds
- 5 V power supply
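The two capacity figures above trade word count against word length: 20 words of 1.92 s and 40 words of 0.96 s both add up to the same 38.4 s of speech, which suggests the two modes share the same storage. A trivial C check of that arithmetic, working in integer milliseconds; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Figures from the HM2007 feature list, in milliseconds. */
#define WORDS_LONG_MODE   20u
#define LEN_LONG_MODE_MS  1920u
#define WORDS_SHORT_MODE  40u
#define LEN_SHORT_MODE_MS 960u

/* Total speech time a full vocabulary holds in one mode. */
uint32_t total_speech_ms(uint32_t words, uint32_t word_len_ms)
{
    return words * word_len_ms;
}
```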
More information on the HM2007 chip is available in the HM2007 data booklet (DS-HM2007), which can be downloaded from http://www.sunrom.com/files/HM2007.pdf
Interfacing external circuits through data bus

This sample project shows how a circuit can be interfaced through the data bus of the speech recognition circuit. It displays messages and error codes on an LCD and operates four relays according to the data from the speech circuit.
Schematic of interfacing project

[Figure: Schematic of the interfacing project (Sunrom Technologies, "Demo Project of Speech Recognition", Code 1180A, dated Thursday, February 26, 2009). Main parts: AT89S52 microcontroller (U3) with 11.0592 MHz crystal (Y1, 33 pF capacitors C13 and C14) and reset network (R3 10K, C6 10 uF); data from the speech board on the 10-pin connector CN6 (DB0 to DB7) into port P0 with a 10K pull-up resistor array (RN1); 16x2 LCD (U1) with a 50K contrast preset (PR1); ULN2803 driver (U2) switching four relays (LS1 to LS4, RLY 1 to RLY 4) with indicator LEDs and PBT2 terminal connectors (CN1, CN3, CN4, CN7); LM7805 regulator (U6) powered from a 9-12 V DC input (CN2).]
Sample code of the interfacing project: how to use the data output from the speech board

//main.c
#include <REGX51.H>          // standard 8051 defines

// -=-=-=-=- Include files -=-=-=-=-=-=-=
#include "lcd.h"             // project LCD routines (lcdInit, lcdClear, lcdGotoXY, lcdPrint)
#include "utils.h"           // project helpers (delayms)

// -=-=-=-=- Hardware Defines -=-=-=-=-=-=-=
// P0 is already declared as an SFR in REGX51.H, so alias it instead of
// redeclaring it (an sfr declaration needs a constant address).
#define SPEECH_DATA P0       // data bus from the speech board (CN6)
sbit OUT1 = P3^4;            // relay 1 (via ULN2803)
sbit OUT2 = P3^5;            // relay 2
sbit OUT3 = P3^6;            // relay 3
sbit OUT4 = P3^7;            // relay 4

// -=-=-=-=- Variables -=-=-=-=-=-=-=
char code M1[] = "SPEECH: ONE";
char code M2[] = "SPEECH: TWO";
char code M3[] = "SPEECH: THREE";
char code M4[] = "SPEECH: FOUR";

// -=-=-=-=- Main Program -=-=-=-=-=-=-=
void main(void)
{
    unsigned char lastdata, datanow;

    // -=-=- Initialize I/O -=-=-=
    SPEECH_DATA = 0xFF;      // write 1s so the P0 pins can be read as inputs
    OUT1 = 0;                // all relays off
    OUT2 = 0;
    OUT3 = 0;
    OUT4 = 0;

    // -=-=- Initialize LCD -=-=-=
    lcdInit();

    // -=-=- Welcome LCD message -=-=-=
    lcdClear();
    lcdGotoXY(0,0);          // 1st line of LCD
    lcdPrint("Speech Test");
    lcdGotoXY(0,1);          // 2nd line of LCD
    lcdPrint("System");
    delayms(5000);           // 5 s
    lcdClear();
    lcdGotoXY(0,0);          // 1st line of LCD
    lcdPrint("Train: 1-4 key >");
    lcdGotoXY(0,1);          // 2nd line of LCD
    lcdPrint("Train>Speak Now");

    // -=-=- Program loop -=-=-=
    // Act only when the byte on the bus changes. If the same value is read
    // twice in a row (e.g. the same word recognized again), no change is
    // seen and the relay is not toggled a second time.
    lastdata = 0xFF;
    while(1)
    {
        datanow = SPEECH_DATA;       // read data from speech board
        if(lastdata != datanow)      // new data?
        {
            lastdata = datanow;
            switch(lastdata)
            {
            case 0x55:               // error 55: word too long
                lcdClear();
                lcdGotoXY(0,0);      // 1st line of LCD
                lcdPrint("Speech too Long");
                lcdGotoXY(0,1);      // 2nd line of LCD
                lcdPrint("Try Again!");
                break;
            case 0x66:               // error 66: word too short
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint("Speech too Short");
                lcdGotoXY(0,1);
                lcdPrint("Try Again!");
                break;
            case 0x77:               // error 77: no match
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint("No Match");
                lcdGotoXY(0,1);
                lcdPrint("Try Again!");
                break;
            case 0x01:               // word 1: toggle relay 1
                OUT1 = !OUT1;
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint(M1);
                break;
            case 0x02:               // word 2: toggle relay 2
                OUT2 = !OUT2;
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint(M2);
                break;
            case 0x03:               // word 3: toggle relay 3
                OUT3 = !OUT3;
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint(M3);
                break;
            case 0x04:               // word 4: toggle relay 4
                OUT4 = !OUT4;
                lcdClear();
                lcdGotoXY(0,0);
                lcdPrint(M4);
                break;
            }
        }
    }
}
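The decision logic in main.c can be pulled out of the hardware loop into a plain function so it can be checked on a PC without the 8051 board. This is a refactoring sketch, not part of the original project: handle_speech_byte, action, action_kind and the relay bit mask (bit n-1 standing for OUTn) are hypothetical, and the LCD calls are replaced by returning the text to display.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ACT_NONE, ACT_ERROR, ACT_TOGGLE } action_kind;

typedef struct {
    action_kind kind;
    const char *line1;   /* first LCD line, or NULL for no change */
    const char *line2;   /* second LCD line, or NULL */
} action;

/* Mirrors the switch in main.c: error codes produce a message, words
   1-4 toggle bit (word - 1) of *relays, anything else is ignored. */
action handle_speech_byte(uint8_t bus, uint8_t *relays)
{
    static const char *const words[4] = {
        "SPEECH: ONE", "SPEECH: TWO", "SPEECH: THREE", "SPEECH: FOUR"
    };
    action a = { ACT_NONE, NULL, NULL };

    switch (bus) {
    case 0x55: a.kind = ACT_ERROR; a.line1 = "Speech too Long";  a.line2 = "Try Again!"; break;
    case 0x66: a.kind = ACT_ERROR; a.line1 = "Speech too Short"; a.line2 = "Try Again!"; break;
    case 0x77: a.kind = ACT_ERROR; a.line1 = "No Match";         a.line2 = "Try Again!"; break;
    case 0x01: case 0x02: case 0x03: case 0x04:
        *relays ^= (uint8_t)(1u << (bus - 1u));   /* toggle OUT1..OUT4 */
        a.kind = ACT_TOGGLE;
        a.line1 = words[bus - 1u];
        break;
    default:
        break;                                    /* no action, as in main.c */
    }
    return a;
}
```

On the board, main() would call this for each new byte, copy the mask bits to OUT1 to OUT4, and print line1/line2 when they are not NULL.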