

Robotics in the Factory of the Future

A Voice-activated Robot with Artificial Intelligence

Dinesh P. Mital and Goh Wee Leng
School of Electrical and Electronic Engineering, Nanyang Technological Institute, Singapore

In this paper, a voice activated robot arm with intelligence is presented. The robot arm is controlled with natural connected speech input. The language input allows a user to interact with the robot in terms which are familiar to most people. The advantages of speech activated robots are hands-free and fast data input operations. The proposed robot is capable of understanding the meaning of natural language commands. After interpreting the voice commands, a series of control data for performing a task is generated. Finally, the robot actually performs the task. Artificial Intelligence techniques are used to make the robot understand voice commands and act in the desired mode. It is also possible to control the robot using the keyboard input mode.
Keywords: Artificial intelligence, MICROEAR, Blocks world, Natural language, Teach-mode, Voice-activated robot.


Dinesh P. Mital received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, India in 1968. He completed his M.S. and Ph.D. at the State University of New York, Stony Brook, U.S.A. in 1970 and 1974 respectively. From 1974 to 1976, he worked at NCR Corporation, Dayton, Ohio as an advanced development engineer. Thereafter, from 1976 to 1983, he worked as Assistant Professor/Professor in Computer Engineering at the University of Roorkee, India, where he supervised 3 Ph.D. and 20 M.Eng. candidates. Currently, he is working as a senior lecturer in the School of Electrical and Electronic Engineering, NTI. He has published over 40 technical papers in Computer Engineering related areas. His current areas of interest include microprocessor applications, digital controls, Robotics and A.I.
North-Holland
Robotics and Autonomous Systems 4 (1988/89) 339-344

1. Introduction

Speech recognition appears to be emerging as a key new man-machine interface medium. Researchers have found that the psychological problems inherent in talking to machines are a barrier to the acceptance of speech interfaces. To achieve practical continuous speech recognition, systems have to expand their vocabularies by an order of magnitude and increase speed by at least two orders of magnitude [1-7]. Natural language interfaces appear to be the way to vastly increase the number of people who can interact with computers. For a field like robotics, this medium of communication may become very important for remotely stationed robots and for robot operations in hostile environments.
Virtually all robotic systems available today provide the traditional man-machine interface tools. Typically a user must communicate either through teach pendant buttons or by typing archaic command strings at a keyboard. Clearly this makes such robotic systems unusable for a large number of relatively untrained operators. The system described here incorporates technologies taken from the fields of speech recognition, natural language understanding and Artificial Intelligence.
Goh Wee Leng was born in Singapore and obtained his B.Eng. (Elect.) (Hons.) from the University of Singapore in 1969 and his MSEE from the University of Wisconsin, USA in 1971. He has worked with the Hewlett-Packard Corporation in their Santa Clara, Palo Alto and Singapore plants. Before joining the Nanyang Technological Institute (NTI), he was with the National University of Singapore. He is presently Associate Professor and Head of the Division of Computer Engineering at NTI. His areas of research include artificial intelligence and microprocessor applications.



[Fig. 1. Block diagram of the voice activated system.]

[Fig. 2. Speech recognition system.]

This will contribute significantly towards reducing the effort and knowledge needed to operate the robot. The chances of making mistakes are also reduced through natural language usage. Work in this area is not entirely new [1-3]; a voice activated system was first proposed about twenty years ago.
The proposed system consists of an IBM PC/AT microcomputer, MICROEAR (voice recognition hardware) and a Scorbot ER-III robot with controller. Artificial intelligence support is provided with the help of a LISP interpreter (IQLISP 1.7). The control software for the Scorbot ER-III was modified to accommodate speech activated commands and to improve the accuracy and reduce the delay in the movement of the robot. The speech recognition process is somewhat slow, due to the MICROEAR hardware. The system has been tested extensively; overall, the response to speech activated commands has been excellent, without any failures. The block diagram of the set-up is shown in Fig. 1.
The proposed system can be controlled using any of three possible modes. These modes are:
(i) keyboard control mode,
(ii) teach pendant mode,
(iii) speech control mode.
In this paper we shall concentrate on only the speech control mode.
The final action for the robot is based on the individual words. Input language commands are defined using regular grammar language modelling techniques. The model used by the recognition system evaluates only correct strings of words [5]. Using this approach we can increase the recognition accuracy while reducing the processing time. A minimal sketch of such an acceptor is given below.
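The paper does not give the grammar itself, so the following is only an illustrative sketch, in Python (the original system was written in IQLISP), of how a regular grammar can be realised as a finite-state acceptor that admits only well-formed command strings. The state names and word sets are assumptions for illustration, not the paper's actual grammar.

    # Illustrative finite-state acceptor for command word strings.
    # The grammar below is an assumption, not the one used in the paper.
    ACTIONS = {"MOVE", "TURN", "ROTATE", "GO"}
    JOINTS = {"BASE", "SHOULDER", "ELBOW", "WRIST-ROLL", "WRIST-PITCH", "GRIPPER"}
    DIRECTIONS = {"UP", "DOWN", "LEFT", "RIGHT"}
    FILLERS = {"THE", "TO"}  # words that carry no meaning here

    def accept(words):
        """Return True if the word string is a valid command,
        e.g. ['MOVE', 'THE', 'BASE', 'LEFT']."""
        state = "START"
        for w in words:
            if w in FILLERS:
                continue  # skip articles and fillers
            if state == "START" and w in ACTIONS:
                state = "HAVE_ACTION"
            elif state == "HAVE_ACTION" and w in JOINTS:
                state = "HAVE_JOINT"
            elif state == "HAVE_JOINT" and w in DIRECTIONS:
                state = "DONE"
            else:
                return False  # word not allowed in this position
        return state == "DONE"

    print(accept("MOVE THE BASE LEFT".split()))  # True
    print(accept("LEFT BASE MOVE".split()))      # False

Because every path through the acceptor is a legal command, any recognised word that cannot extend the current state can be rejected immediately, which is how such a model trims both error rate and processing time.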
[Table 1. List of vocabulary for the system. Recoverable entries include: MOVE, ROTATE, GO, BASE, SHOULDER, WRIST-ROLL, WRIST-PITCH, GRIPPER, TO, THE, SLOWLY, RAISE, POSITION, STOP, QUIT, HOME, CLEAR, RIGHT, UP, DOWN, OPEN, PICK-UP, PLACE, GET, HOLD, GO-TO, BRAVO, CHARLIE, DELTA, BLK A. Each word is paired with a three-digit string code (000-031) across two vocabulary pages.]

2. Speech Recognition

For this task it was important that communication between the robot and the human be as natural as possible; that is, using continuous short spoken English language sentences. Current technology has not yet reached the point where fluent speech or long sentences can be recognised with the same accuracy as isolated words or small phrases. However, if one applies certain restrictions on the types and structures of sentences, then connected speech recognition can be implemented with a good success rate.

[Fig. 3. Block diagram of MICROEAR modes.]


The block diagram of the speech recognition system is shown in Fig. 2.
The speech recognition hardware used is based on the MICROEAR board. This is, to a certain extent, a speaker dependent, language dependent state-of-the-art recogniser capable of recognising up to 256 words or short phrases. It has its own processor along with 16K of static RAM, and is designed with low power, cool running CMOS. It communicates with any host computer through an RS 232C serial port.
The voice activated hardware operates in two main modes, a training mode and a recognition mode (Fig. 3). The training mode is used to "teach" the system the words which we want it to recognise, while the recognition mode is used when the user wants MICROEAR to "listen" to what the user has to say. In this mode, the system is able to hear those words on which it has previously been trained. There is also a retraining mode, used if a particular word or phrase is not properly recognised during listening; the user can replace that word or retrain the system for it.
The storage capacity of 256 words is divided into eight groups of 32 words each. It is possible to intermix words and phrases from various groups. For controlling the robot, we use only 64 words. These words and phrases are given in Table 1. A hedged sketch of a host-side interface to the two modes is given below.
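The paper does not document the MICROEAR wire protocol, so the sketch below (using the pyserial package) only illustrates the shape of a host-side wrapper for the training and recognition modes. The port name and the TRAIN/LISTEN command strings are placeholders, not the device's actual commands.

    import serial  # pyserial

    class MicroEar:
        """Hypothetical host-side wrapper for a serial word recogniser."""

        def __init__(self, port="COM1", baud=9600):
            # port and baud rate are illustrative assumptions
            self.link = serial.Serial(port, baud, timeout=2)

        def train(self, slot):
            # Placeholder command; the real MICROEAR protocol is not
            # published in the paper.
            self.link.write(f"TRAIN {slot}\r".encode("ascii"))
            return self.link.readline()  # device acknowledgement

        def listen(self):
            # Ask the device to recognise one word and return the
            # number string identifying the matched vocabulary entry.
            self.link.write(b"LISTEN\r")
            return self.link.readline().decode("ascii").strip()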

3. Speech Recognition Software

The voice controlled robot can be commanded to any position within its reach. Commands may be given in simple natural language form. The robot can also memorize a few positions specified by names. For example, it can be told to go to a named position, grasp an object from there, and place it on the conveyor belt. A lot of software development work was involved in implementing this.
The "PROJECT FUNCTION" can be considered the "heart" of the program of the voice controlled system. The whole system is organised in modular form, where lower level modules are transparent to the higher level modules. The organisational structure is given in Figs. 4, 5, 6 and 7.
Data objects created under "PROJECT FUNCTION" are accessible to all other sub-modules in the voice controlled system.

[Fig. 4. Flowchart for FUNCTION block.]

[Fig. 5. Block diagram of FUNCTION block.]

The "PROJECT" module contains a loop that repetitively performs "COMMAND INPUT", "CHECK DICTIONARY" and "EXECUTE SEQUENCE". It also tests for commands like STOP, HOME, MEMORIZE, QUIT, etc.
A few of the other important modules are described in the following sub-sections.

[Fig. 6. Block manipulation functions and their relationship. Recoverable function names include MAKESPACE, GETRIDOF, GRASP, UNGRASP, MOVEOBJECT, MOVESUPPORT, FINDSPACE and ADDSUPPORT.]


3.1. Language Understanding Modules

A time-out was used to recognise the end of a sentence. This module interprets the meaning of a sentence; its flowchart is shown in Fig. 7. It has the following four sub-modules:
(i) RECOGNISE WORD. This sub-module activates MICROEAR to recognise a word and return a number string.
(ii) CONV.INT. This sub-module takes the number string from the first sub-module and generates a numerical code for the spoken word.
(iii) OPERATE. This takes a command from CONV.INT and compares it against the lists of words in the ACTION-DICTIONARY, DIRECTION-DICTIONARY or ADJ-DICTIONARY. It also sets the respective action flag, motor flag or direction flag.
(iv) TEST. This tests whether all the action flags have been set. If so, the relevant function to activate the motors is invoked. It may also call the UPDATE-END function to update the individual motor counters and check whether each has reached its counter limit.
Execution of the above modules is repeated until the action has been interpreted, as sketched below.

[Fig. 7. Flowchart of language understanding program.]
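A minimal sketch (again in Python rather than the original IQLISP) of how the four sub-modules might cooperate. The dictionary contents and flag logic are simplified assumptions; next_word stands in for the RECOGNISE WORD and CONV.INT steps that wrap the MICROEAR call.

    # Simplified word lists standing in for the paper's dictionaries.
    ACTION_DICTIONARY = {"MOVE", "TURN", "ROTATE"}
    MOTOR_DICTIONARY = {"BASE", "SHOULDER", "ELBOW", "GRIPPER"}
    DIRECTION_DICTIONARY = {"UP", "DOWN", "LEFT", "RIGHT"}

    def interpret(next_word):
        """Repeat RECOGNISE WORD / CONV.INT / OPERATE / TEST until a
        complete action (action + motor + direction) is assembled."""
        flags = {"action": None, "motor": None, "direction": None}
        while not all(flags.values()):          # TEST
            word = next_word()                  # RECOGNISE WORD + CONV.INT
            if word in ACTION_DICTIONARY:       # OPERATE: set the right flag
                flags["action"] = word
            elif word in MOTOR_DICTIONARY:
                flags["motor"] = word
            elif word in DIRECTION_DICTIONARY:
                flags["direction"] = word
            # adjectives such as THE are simply ignored
        return flags

    words = iter("MOVE THE BASE LEFT".split())
    print(interpret(lambda: next(words)))
    # {'action': 'MOVE', 'motor': 'BASE', 'direction': 'LEFT'}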

3.2. Robot Control Routines

The sub-modules under this module are:
MOVE_BASE, MOVE_SHOULDER, MOVE_ELBOW, MOVE_WRIST_PITCH, MOVE_WRIST_ROLL and MOVE_GRIPPER.
These functions form the character strings required to activate the motors; the required information is passed on to the robot controller as ASCII strings. These sub-modules calculate the necessary steps and directions to move each motor to the required position and store this information.
Other sub-modules affecting the motion of the robot directly are STOP, UPDATE, UPDATE_END and HOME. These sub-modules are used to keep track of the current motor positions. A sketch of a motor-move routine is given below.
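The exact Scorbot controller command syntax is not given in the paper, so the sketch below only illustrates the idea of computing the step count and direction and emitting an ASCII string; the command format is invented for illustration.

    def move_command(motor, current_steps, target_steps):
        """Compute steps and direction for one motor and format an ASCII
        command string for the controller (format is illustrative)."""
        delta = target_steps - current_steps
        direction = "+" if delta >= 0 else "-"
        return f"M{motor}{direction}{abs(delta):05d}\r"

    # e.g. move the base motor (motor 1) from step 120 to step 480
    print(repr(move_command(1, 120, 480)))  # 'M1+00360\r'

A routine like this would also store the new target so that STOP, UPDATE and HOME can keep the host-side motor counters consistent with the arm.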
3.3. System Start-up Functions


This module initiates the start-up procedures for robot action and has many sub-modules. A few of the important sub-modules are:
START-UP. This sub-module has many routines to reset the action flags, motor flags and direction flags, and to initialise the look-up table required to convert the ASCII values from the MICROEAR into IQLISP objects. Other important functions are used to initialise the data communication lines from the MICROEAR to the SCORBOT-III in the LISP environment.
START-WIND. This module initialises all the windows and attributes required by the system and sets up the display screen for the whole system.
DICTIONARY. The words are classified into the following classes:

ACTION: MOVE, TURN, ROTATE, ...
ROBOT: BASE, SHOULDER, ELBOW, ...
DIRECTION: UP, DOWN, LEFT, RIGHT
ADJECTIVE: THE, OF, AND, OR

Each class has many words, and each word generates a different control sequence of operations to implement its meaning.
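A minimal sketch of how the word classes could be represented. The class/word assignments follow the table above; the data structure itself is an assumption, since the original IQLISP representation is not shown in the paper.

    # Word classes from the DICTIONARY module (structure illustrative).
    DICTIONARY = {
        "ACTION": {"MOVE", "TURN", "ROTATE"},
        "ROBOT": {"BASE", "SHOULDER", "ELBOW"},
        "DIRECTION": {"UP", "DOWN", "LEFT", "RIGHT"},
        "ADJECTIVE": {"THE", "OF", "AND", "OR"},
    }

    def classify(word):
        """Return the class of a word, or None if it is out of vocabulary."""
        for word_class, words in DICTIONARY.items():
            if word in words:
                return word_class
        return None

    print(classify("SHOULDER"))  # ROBOT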


3.4. Other Functions

ERROR-TEST. This routine is executed when a motor is activated but the direction is wrong, or when some other error condition arises.
DELAY. This simply adds a delay of between a few microseconds and a few seconds.
SLOW and QUICK. These perform speed control; only two speed settings have been provided.
CONV_STAR. This converts numbers into strings, which are fed to the robot controller.
The "PROJECT" module is invoked to start the voice controlled system. It first resets all the motor flags and the direction flags, which are necessary for motor movements. Next, the motor counters are set to zero and the positive and negative limits of each motor are preset to known values. After all these steps, the system is ready to interact with the user. A sketch of this start-up sequence is given below.
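A hedged sketch of the reset sequence just described. The motor names and limit values are placeholders, not the SCORBOT's real parameters.

    def initialise(motors=("base", "shoulder", "elbow", "wrist", "gripper")):
        """Reset flags, zero the motor counters and preset known limits,
        as the PROJECT module does before interacting with the user."""
        return {
            "flags": {"action": None, "motor": None, "direction": None},
            "counters": {m: 0 for m in motors},
            # limit values are illustrative only
            "limits": {m: (-1000, 1000) for m in motors},
        }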

4. AI Implementation Using Blocks World


In Artificial Intelligence research, the "BLOCKS WORLD" is one of the many problem domains explored. In our implementation, this approach involves a flat surface on which blocks are placed. Each block is identified by its size, type and position. A human user interacts with the system by giving vocal commands to manipulate the robot arm. The language is English-like, with restrictions in format. The reference points are chosen such that the arm can reach anywhere on the workplace (24" x 18"). We identified four distinct positions, which were taught to the SCORBOT-III. The operations that were performed successfully on the system are:
(i) Place a block on another.
(ii) Pick up a block from any position.
(iii) Move to any of the predefined positions.
(iv) Move a block from the stack from position A to position B.
(v) Move a block from position A to position B while avoiding collision.
Several other variations were also executed successfully. The system prompts the user for commands and then executes the operations. To execute an operation, the task is broken up into many sub-tasks. This process of problem solving is known as "planning" in AI research. We were even successful in trajectory generation under collision avoidance conditions. The general approach here is: the system identifies the user commands and then examines the current state of the Blocks World. The problem is broken into sub-problems and solved methodically. In case no solution is found, an appropriate error message is given. If many possible solutions exist, a solution which is optimal in some sense is selected.
A typical sequence of actions to place a block on top of another block is (see the sketch after this list):
(i) Identify space for a grasp on the block to be moved.
(ii) If there are blocks on top of that block, clear them and put them in vacant positions.
(iii) Grasp the target block.
(iv) Move the block to the destination position.
(v) Ungrasp the block.
(vi) Replace the blocks moved in step (ii).
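A minimal sketch of this plan decomposition, assuming a world state that maps each block to whatever it rests on; primitive arm actions are represented as strings. Steps (i) and (vi), grasp-space search and restoring the cleared blocks, are omitted for brevity.

    def put_on(block, dest, on):
        """Decompose 'put block on dest' into primitive actions.
        `on` maps each block to what it rests on ('TABLE' for the surface)."""
        plan = []
        # (ii) clear any blocks stacked on the block to be moved
        for b, support in list(on.items()):
            if support == block:
                plan.append(f"MOVE {b} TO TABLE")  # park obstruction in a vacant spot
                on[b] = "TABLE"
        plan += [f"GRASP {block}",            # (iii)
                 f"MOVE {block} TO {dest}",   # (iv)
                 f"UNGRASP {block}"]          # (v)
        on[block] = dest
        return plan

    on = {"A": "TABLE", "B": "A"}  # B sits on A
    print(put_on("A", "C", on))
    # ['MOVE B TO TABLE', 'GRASP A', 'MOVE A TO C', 'UNGRASP A']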
Software was developed to incorporate these AI functions. One of the important modules developed to implement this is the LANGUAGE module. This module matches an input phrase against simple templates of keywords: certain (key) words are identified, and the rest are ignored (or used in further processing later on). After template matching, the other sub-modules are activated. A minimal sketch of such a matcher is given below.
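The template format used in the original software is not given, so the sketch below assumes templates are keyword sequences in which "?" captures one word and all other non-matching input words are skipped.

    def match(template, words):
        """Match a phrase against a keyword template; '?' captures one
        word, and any other non-keyword input word is ignored."""
        captures, t = [], 0
        for w in words:
            if t == len(template):
                break
            if template[t] == "?":
                captures.append(w)  # capture a block or position name
                t += 1
            elif w == template[t]:
                t += 1
            # otherwise skip the word (e.g. THE; SLOWLY is handled elsewhere)
        return captures if t == len(template) else None

    words = "MOVE SLOWLY BLK A TO POSITION DELTA".split()
    print(match(("MOVE", "BLK", "?", "POSITION", "?"), words))  # ['A', 'DELTA']

A successful match would then dispatch to the appropriate sub-module (PUT_ON, PICK_UP, etc.) with the captured names as arguments.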
A few of the sub-modules in this module are PUT_ON, PICK_UP, MOVE_ARM and UN_GRASP. A brief explanation of these sub-modules follows.
PUT_ON. This module places a block on top of another block. One block is identified as the target block, the other as the destination block. Appropriate action is generated to implement this.
PICK_UP. This module moves the robot arm to pick up a block and transfers control to the next module.
MOVE_ARM. This module moves the robot arm to a predefined position, which is the destination block.
UN_GRASP. The UN_GRASP module asks the gripper to release the block it is holding. Before placing the block, other sub-modules ensure proper positioning of the block.
There are many other sub-modules which are necessary to complete the task. Because of the limitation of space, those sub-modules are not discussed here.

5. Robot Control
Before the system can be used, it has to be trained. Training involves telling the robot to perform the desired operations. Typically, when starting the system, the initial voice command is "FIND HOME"; some variations of this phrase are accepted. The robot, after interpreting this command, starts to move towards the home position, which it finds from its internal sensor signals. It generates a sequence of operations to reach the home position from any possible starting position, and the system comes into a reset state. This movement may involve repositioning all the arm joints.
The majority of the robot control commands are concerned with planning the motion of the robot so that only the intended objects are moved, without too many changes in the work area. A typical move command might be "MOVE SLOWLY BLK A TO POSITION DELTA". The system would then compute the position of Block A; information about the size and height of Block A is retrieved from the database. The robot arm is then positioned, and Block A is picked up and placed at position "DELTA". "SLOWLY" implies that the motion must be slow enough not to damage Block A.
There is enough intelligence in the system to sense and remember the positions of all the parts in the working area. A part may be picked from the bottom position of a stack and placed anywhere; a proper database has been created for this (a minimal sketch is given below). In case of constraints on the possible trajectory, a collision-free path is generated for placing an object at the desired location. The Blocks World concept has been implemented for a defined work area of 18" x 24".
The results of the voice activated commands have been very successful. We used about 64 voice commands in our experiments.

6. Conclusion

In this paper, an effort has been made to integrate a number of useful technologies into a unified robotic system. The system is user-friendly and intelligent. It can be divided into two major sub-systems. The first is the voice recognition part, which takes the form of a voice controller interfaced serially to the computer; software modules were written for training the voice recogniser and for natural language processing. The second part of the project is the robotics part, consisting of a robot arm with controller linked serially to the computer. Commands processed by the first sub-system are used to activate functions for manipulating the robot arm.
We have developed a workstation which utilises the concepts of Artificial Intelligence, voice control and robotics in a well-known research domain called the "BLOCKS WORLD". This is a simple model of the real world consisting of a robot arm and a few cube-shaped blocks. The robot arm can manipulate these blocks and at the same time remember the status of the blocks as well as the status of all the positions. Control commands for the robot are given in natural language. The vocabulary is limited to 64 words in our implementation, but can easily be expanded to 256 words.

Acknowledgements
The authors would like to thank Mr. Sng Hock Tiong and Mr. Tan Chee Kwang for their help in developing and testing the system. They would also like to thank Prof. Brian Lee, Dean of the School of Electrical and Electronic Engineering, for providing the facilities and making the equipment available for the research work.

References
[1] P. Vicens, Aspects of Speech Recognition by Computer, Ph.D. Thesis, Stanford University, MEMO-AI-85.
[2] G. Miller, R. Bole and M. Sibila, Active damping of ultrasonic transducers for robotic applications, IEEE Int. Conf. on Robotics, Atlanta, USA (1984).
[3] G. Miller and J. Jarvis, A review of robotics at AT&T Bell Laboratories, IECON '84, Tokyo, Japan (1984).
[4] M. Brown, A hand-ear coordination experiment with a robot arm, Proc. 1985 Conf. on Intelligent Systems and Machines, Oakland, MI, USA (1985).
[5] S. Levinson and L. Rabiner, A task oriented conversational mode speech understanding system, Bibliotheca Phonetica 12 (1985).
[6] C. Myers and L. Rabiner, A level building dynamic time warping algorithm for connected word recognition, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 2 (April 1981).
[7] A Voice Activated Robot Controller, Report, Nanyang Technological Institute, Singapore (February 1988).
