Abstract
Although India's average literacy level is about 65%, less than 5% of India's population can use English for communication. And even though the World Wide Web and computer communication have given us access to information at the click of a mouse, 95% of our population is excluded from this revolution because of the dominance of English. To overcome this problem, we develop a multimodal interface to the computer that is relevant for India, i.e., one that enables Indic computing. The components of this Indian language interface are:
1. Keyboard and display interface
2. Speech interface
3. Handwriting interface
In this paper, we propose a design for enabling online web-based form-filling applications with a multimodal interface. As a proof of concept, an example based on the Railway Reservation Enquiry System is described.
Keywords:
1 Introduction
Imagine a villager, who may be semi-literate or even illiterate, walking into a rural Internet kiosk and wanting to use the power of the Internet to communicate with a relative somewhere else, contact a city hospital, or get vital crop information. The currently available English-based keyboard and applications are totally unfamiliar and intimidating to such a user. Computers have become an essential part of many facets of our lives. However, in the Indian context the use of computers is far lower than in the developed nations of the West, for the reason we have already hinted at: the language of the interface is almost always English, and the communication is in the "written" form, i.e., via the keyboard. Barely 65% of our population is literate, of which only an elite minority (less than 5%) can read, write, and speak English. This shuts out most of the Indian population from the World Wide Web and its huge potential.
As a result, he or she feels shut out and is not a part of the ongoing information revolution. On the other hand, if the computer had applications in the local language, it would have been accessible to 60% of the Indian population. In a typical Indian language, however, there are roughly 3000 character clusters. The number of keys on the keyboard would become unmanageably large if we wanted a single keystroke for each cluster. Instead, we have to make do with a sequence of keystrokes per cluster, which makes typing a hassle. Even a literate person will find this difficult. This makes the keyboard interface unnatural for Indian languages.
Therefore, it is essential to have an interface that not only uses the local language but is also more natural and easy to use. Thus arises the need for a handwriting interface catering to the literate population of India. Providing a speech interface would make the Internet and other applications accessible to the semi-literate and illiterate sections of the population as well. We call such an interface, with keyboard, handwriting, and speech interfaces, multimodal. However, the speech and handwriting interfaces supplement the keyboard and do not replace it.
In the West, although speech and handwriting interfaces are available for English, they serve specific applications, namely dictation machines (hands-free) and limited handwriting/graffiti recognition as in Personal Digital Assistants (PDAs). This is primarily because of the simplicity of the Roman script. In the Indian context, by contrast, these interfaces must be part of mainstream applications, namely mail readers, web browsers, and word processors. This is the first such effort to seamlessly integrate all three interfaces.
As of today, the web browser is the most widely used application. Web pages can be very broadly classified into two types, namely passive and active pages. Passive pages are those that just give information to the user and do not require any user input. Active pages are those that require user input. We call pages where the user can enter data web forms, online forms, or web-based forms. A small survey of various websites reveals that online forms are at the heart of applications such as railway reservation enquiry, telephone enquiry, ticket reservation, and online shopping. Hence, the focus of this paper is on enabling web-based forms with a multimodal interface. In the context of web forms, a multimodal interface makes more sense, as the data to be filled in can come from any of the three input modes. A multimodal interface thus provides a natural medium of communication with the system. But for an ordinary user this is not enough: local language support is a must. Thus, with a multimodal interface and local language support, the Internet and the wealth of its resources become available to the Indian population.
The organization of the paper is as follows. In Section 2, we discuss the issues and design for enabling web-based forms with a multimodal interface. In Section 3, we describe the implementation of the multimodal interface. In Section 4, we discuss a prototypical application, a multimodal railway reservation enquiry system. Section 5 concludes the paper.
In this section, we describe the normal processing of a user's request in a browser, and the issues and design related to enabling web forms with a multimodal interface.
Usually, when the user requests a page in the browser, the request is forwarded to the proxy specified in the browser configuration. The proxy sends the requested page to the client, either from its cache or from the requested web server. In the case of web forms, say a search engine, the user inputs the data and, on clicking the "Search" button, a query containing the entered data is sent to the search engine's web server. The web server processes the request and responds by sending the search results page to the client. This exchange also goes through the proxy. In the next section, we describe the issues in providing multimodal support to web-based forms.
As mentioned in the previous section, the user requests a page by giving its Uniform Resource Locator (URL) in the browser. From the user's end, multimodal support means displaying the multimodal version of the web page when he/she requests a web form. We should also be able to send the data values entered through the multimodal interface to the web server and get the response. Usually, it is the browser that interfaces between the user and the web, i.e., it handles the interaction with the user on one hand and with the web on the other, through the proxy. In order to provide multimodal support, we should prevent the browser from displaying the requested page and replace it with the multimodal page. We should handle the user interface at the front end and the interaction with the web at the back end. In the following subsections, we explain how this could be done.
Given the above requirements, we have a design with the multimodal interface implemented as a plugin, along with a local proxy and a Hyper Text Markup Language (HTML) parser, as shown in Figure 1. With respect to the user interface, all that we need to do is display an equivalent multimodal-enabled form when the user requests a URL in the browser. We need the requested URL to check whether it is a web form. Generally, the browser gives the requested URL to its proxy. Hence, we run a local proxy on the client, which is configured in the browser. At the proxy, we trap the requested URL. The URL is then checked against a file that contains the URLs of the web forms, to see whether it is a form. The user can modify this file to suit his needs. If the requested page is a form, we send the HTML page with the multimodal plugin to the client; otherwise, the request goes to the higher-level proxy and the page is retrieved as usual, without multimodal support. In order to generate multimodal pages on the fly, we need knowledge of the controls in the original HTML page, for which we need the original page. To identify the controls in the HTML page, we need an HTML parser.
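The decision taken at the local proxy can be sketched as below. This is a minimal sketch, assuming the user-editable file holds one form URL per line (with `#` comments) and that URLs are compared on scheme, host, and path only; the function names and the file format are hypothetical, and the actual forwarding of requests is left out.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to scheme://host/path so query strings do not affect matching."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def load_form_urls(lines):
    """Read the user-editable list of web-form URLs (one per line, '#' = comment)."""
    urls = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.add(normalize(line))
    return urls


def route_request(url, form_urls):
    """'multimodal': serve the page with the plugin; 'upstream': forward as usual."""
    return "multimodal" if normalize(url) in form_urls else "upstream"
```

Because `load_form_urls` accepts any iterable of lines, an open file object can be passed to it directly, e.g. `load_form_urls(open("forms.txt"))`, where `forms.txt` is a hypothetical file name.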
At the client end, the browser displays the response (the HTML page with the multimodal plugin) from the local proxy, and this triggers the execution of the multimodal plugin (refer to Figure 2). The plugin downloads the requested page, parses it, and stores the identified controls in appropriate data structures. These identified controls are populated on a basic multimodal form and displayed to the user. Now, the user can give input in any of the three modes. Details of the multimodal capability are given in Section 3.
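One way the plugin could identify the controls is with Python's standard `html.parser` module, used here as a stand-in for whatever HTML parser the authors used. The sketch records each form's action URL and method together with every `<input>` and `<select>`; `<textarea>` and option labels are ignored for brevity, and an `<option>` without a `value` attribute is simply recorded as an empty string.

```python
from html.parser import HTMLParser


class FormControlParser(HTMLParser):
    """Collect the action URL, method and controls of every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>
        self._select = None   # the <select> currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": a.get("action", ""),
                # HTML treats a missing method attribute as GET
                "method": (a.get("method") or "get").upper(),
                "controls": [],
            })
        elif not self.forms:
            return  # ignore controls that are not inside a form
        elif tag == "input":
            self.forms[-1]["controls"].append({
                "kind": (a.get("type") or "text").lower(),
                "name": a.get("name"),
                "value": a.get("value", ""),
            })
        elif tag == "select":
            self._select = {"kind": "select", "name": a.get("name"), "options": []}
            self.forms[-1]["controls"].append(self._select)
        elif tag == "option" and self._select is not None:
            self._select["options"].append(a.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
```

Controls whose `kind` is `"text"` are the ones that need speech or handwriting input; radio buttons, check boxes, and `<select>` menus need only a mouse click.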
Figure 2: Steps involved in plugin execution for enabling the multimodal interface

Consider a situation where the user has entered the data in a multimodal form. These data values must now be sent (posted) to the appropriate web server. Posting information to a web server requires the URL to which the information has to be posted, a method of sending the information, and a query. The method, GET or POST, is the one by which the web server accepts data. The query is a string into which the attribute names and the data values are packed. The URL of the web server, the method, and the attribute names are in the HTML page itself; during parsing, this information is also extracted and stored in appropriate data structures. So, on clicking the "Submit" button, the plugin builds the query. The query, along with the URL and the method, is then passed to a Perl script, which connects to the remote web server, posts the query, and gets the response, which is also an HTML page. The response obtained is opened as a normal web page in the browser.
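The query-building step might look like the following. The paper hands the query to a Perl script; this sketch swaps in Python's `urllib` instead and only constructs the request, so passing the result to `urllib.request.urlopen` would be the step that actually contacts the server. The function name and the example URLs and field values are illustrative.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_request(page_url, action, method, fields):
    """Build (but do not send) the HTTP request for a filled-in form."""
    target = urljoin(page_url, action)   # resolve a relative action URL
    query = urlencode(fields)            # name1=value1&name2=value2, percent-encoded
    if method.upper() == "GET":
        # GET: the query travels in the URL itself
        sep = "&" if "?" in target else "?"
        return Request(target + sep + query, method="GET")
    # POST: the query travels in the request body
    return Request(
        target,
        data=query.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The values passed in `fields` must already be in English, since the remote server understands only English; the transliteration step of Section 2.3 has to run before this one.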
2.3 Issues in Input Modes/Interfaces
In this section, we discuss the issues involved in the keyboard, speech, and handwriting interfaces, or input modes.
Local language support for typing is provided as described in [1]. Internationalization and localization concepts have been used to provide local language support for static strings. Here, the language resources are separated from the application. To display static strings in the menu bar, tool bar, etc. in the local language, we have to provide the local language strings to the system a priori. Then, on choosing a particular language, those local language strings are used.
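A minimal sketch of keeping static strings outside the application, assuming a simple in-memory catalog keyed by message identifiers; the keys and the catalog layout are hypothetical. In a Qt application the same idea is normally realized with `QObject::tr()` and translation files loaded through `QTranslator`.

```python
# Hypothetical resource catalogs, kept separate from the application logic.
CATALOGS = {
    "en": {"menu.search": "Search", "menu.exit": "Exit", "menu.help": "Help"},
    "ta": {"menu.search": "தேடு", "menu.exit": "வெளியேறு"},  # "menu.help" not yet translated
}


class StringTable:
    def __init__(self, catalogs, language, fallback="en"):
        self.catalogs = catalogs
        self.language = language
        self.fallback = fallback

    def tr(self, key):
        """Return the string for key in the chosen language, else in the fallback."""
        for lang in (self.language, self.fallback):
            text = self.catalogs.get(lang, {}).get(key)
            if text is not None:
                return text
        return key  # last resort: show the identifier itself
```

Switching the interface language then amounts to constructing a `StringTable` with a different `language` code; no application code changes.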
In local language mode, the English text in the web form, such as instructions for filling it in, legends for the abbreviations used, or other important information, would be lost if we displayed only the controls in the multimodal page. So, we have to transliterate the English text into the local language and display it in the multimodal page. In this mode, data entry is also in the local language. These local language data values must in turn be converted into English before the request is sent, since the remote web server can understand only English. This means that we need algorithms to transliterate English to the local language and vice versa. The crux of the task is that a sequence of keystrokes must be pressed to get one character in the local language, and the typed local language string must later be transliterated into English before it is sent to the web server. This suggests that we should use the same map tables for typing and for transliteration. The keyboard map tables can be specified by the user, but this requires a careful design of the map tables to achieve the purpose.
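The idea of one map table serving both directions can be sketched with a greedy longest-match converter. The table below is a tiny hypothetical fragment, not a real Tamil keyboard map; it can be reversed only because every entry maps to a distinct Tamil string, which is exactly the kind of careful table design the paragraph above calls for.

```python
# Hypothetical map table: Roman keystroke sequence -> Tamil syllable.
MAP_TABLE = {
    "ma": "ம", "du": "து", "rai": "ரை",
    "che": "செ", "nnai": "ன்னை",
}
# The same table read backwards: Tamil syllable -> Roman sequence.
REVERSE_TABLE = {tamil: roman for roman, tamil in MAP_TABLE.items()}


def longest_match_convert(text, table):
    """Convert left to right, always taking the longest table key that matches."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(text[i])  # unmapped character (space, digit): copy through
            i += 1
    return "".join(out)


def type_local(keys):
    """Keystrokes typed by the user -> local language text shown in the form."""
    return longest_match_convert(keys, MAP_TABLE)


def to_english(local):
    """Local language text -> English transliteration sent to the web server."""
    return longest_match_convert(local, REVERSE_TABLE)
```

For example, the keystrokes `madurai` produce the Tamil spelling of Madurai, and `to_english` turns that spelling back into `madurai` before the query is built.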
The keyboard interface is domain independent and language independent, as the same software is used and only the language resources are changed. It is more reliable and robust, but cumbersome for typing in the local language. On the other hand, where the background noise is very high, the speech recognition system may fail, and when the characters are not properly written, handwriting recognition may also fail. In such cases, we need the reliability of the keyboard to correct the wrong recognition. Unlike the keyboard, speech and handwriting require offline training of models for recognition. Also, certain control actions can be done very easily with the keyboard; employing speech in such circumstances may result in unnecessary overhead. Hence, the keyboard is kept active alongside either the handwriting or the speech interface. The focus here is to supplement the keyboard with handwriting and speech, not to replace it.
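The mode policy described above, with the keyboard always on plus at most one recognizer beside it, can be captured in a few lines. The class and method names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SPEECH = "speech"
    HANDWRITING = "handwriting"


class InputModes:
    """The keyboard is always active; at most one recognizer mode is active beside it."""

    def __init__(self):
        self.recognizer = None  # None, Mode.SPEECH or Mode.HANDWRITING

    def select(self, mode):
        # Choosing speech switches handwriting off, and vice versa.
        self.recognizer = mode

    def keyboard_only(self):
        self.recognizer = None

    def active(self):
        """Return the set of input modes currently accepting input."""
        extra = {self.recognizer.value} if self.recognizer else set()
        return {"keyboard"} | extra
```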
All the web forms that we are talking about have a finite vocabulary of words. For a finite vocabulary system with 50 words, the speech and character recognition rates are as high as 96% and 83% [2], respectively, on average. With knowledge of the vocabulary, we can further improve the performance using longest match. The vocabulary of a web form also has a hierarchical structure. For example, in the Bharat Sanchar Nigam Ltd. (BSNL) telephone directory service, in the first form we have to choose the district we need from the list of districts in Tamil Nadu. Here the range of values for a control is known, and hence only the corresponding models are loaded for recognition. This saves a lot of memory, since not all the models are loaded, and also speeds up recognition.
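Loading only the models that a control can produce might be organized as follows. This is a sketch, assuming a hypothetical `load_model` callable that returns a trained model for one vocabulary word; the recognizer itself is not shown, and the district and exchange names below are illustrative, not the full BSNL lists.

```python
# Hypothetical vocabulary tree for a multi-page form: page -> control -> words.
VOCABULARY = {
    "district_form": {"district": ["Chennai", "Coimbatore", "Madurai"]},
    "listing_form": {"exchange": ["Adyar", "Mylapore", "T. Nagar"]},
}


class ModelCache:
    def __init__(self, vocabulary, load_model):
        self.vocabulary = vocabulary  # page -> control -> list of words
        self.load_model = load_model  # callable: word -> trained model
        self.loaded = {}              # word -> model currently in memory

    def activate(self, page, control):
        """Keep in memory only the models for the words this control accepts."""
        words = set(self.vocabulary[page][control])
        for word in list(self.loaded):
            if word not in words:
                del self.loaded[word]  # free models this control never needs
        for word in words:
            if word not in self.loaded:
                self.loaded[word] = self.load_model(word)
        return dict(self.loaded)
```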
Both speech and handwriting require a vocabulary of words for which models need to be trained for recognition. One alternative is to include all the possible words and train for them. Recognition would then be very slow, and accuracy would also drop, because of the larger number of models. Moreover, one user may visit a page more frequently than another, and he needs good performance for that page. Hence, he can train for that web form and use it, rather than having a slow, less accurate recognizer with all the words. He can train for as many forms as needed. We must therefore provide user-friendly interfaces for training speech and handwriting models for web forms.
In the case of handwriting, we have character recognizers, which eliminates the need to train for each web form. We can have a character recognizer for each language and use the web form vocabulary to enhance its performance.
3 Implementation
As mentioned earlier, there are many form-filling applications on the web. These web pages mostly contain controls such as text boxes, radio buttons, combo boxes or pull-down menus, and command buttons like Submit and Clear. Data entry using radio buttons, check buttons, and combo boxes is just a mouse click. We are thus left with only the text box, where the user has to type data.
At this point, we have two methods to provide multimodal support:
1. Provide multimodal support to the text box control of a toolkit, for example, Qt.
2. Develop a multimodal form capable of handling the three input modes.
In the first case, multimodal support has to be provided for each and every control if we decide that each control needs a multimodal interface. There would also be the overhead of each control running its own speech and handwriting recognizers.
In the second case, which we have followed, there is a basic multimodal form with three input modes. Any application using this main window can use the multimodal capability. There is a single speech recognizer and a single handwriting recognizer for the whole application, as shown in Figure 3. Generally, with the keyboard interface, the cursor indicates the position where the typed keystrokes appear. Likewise, for the speech and handwriting interfaces, the user indicates where the recognized string should be placed in the application by keeping the cursor at that position. Speech and handwriting interfaces complement each other and supplement the keyboard.
In the multimodal form there are three icons, namely keyboard, speech, and handwriting. On clicking the speech icon, the record icon comes up. On clicking the record icon, the user's utterance is recorded for a fixed duration. After recognition, the result is obtained from the speech recognizer. Similarly, on clicking the handwriting icon, the write strip comes up. The user can write in the write strip and, on clicking 'Complete', the written word is recognized and the result is obtained from the character recognizer.
In both of the above cases, the result is a string. Using the information about which control has the current focus, the result from either recognizer is displayed in that control by setting its text property to the obtained result.
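The routing of a recognizer result to the focused control can be sketched without Qt. In the authors' Qt form, the equivalent would be to ask for the focused widget (for example with `QApplication::focusWidget()`) and call `setText()` on it; the plain-Python `TextBox` and `MultimodalForm` classes here are hypothetical stand-ins so that the sketch runs on its own.

```python
class TextBox:
    """Stand-in for a toolkit text box with a text property."""

    def __init__(self, name):
        self.name = name
        self.text = ""


class MultimodalForm:
    """One form with one result handler shared by both recognizers."""

    def __init__(self, controls):
        self.controls = {c.name: c for c in controls}
        self.focused = None

    def set_focus(self, name):
        # In a real toolkit this happens when the user clicks or tabs into a control.
        self.focused = self.controls[name]

    def on_recognized(self, result):
        """Called by either the speech or the handwriting recognizer with a string."""
        if self.focused is not None:
            self.focused.text = result
```

Because both recognizers call the same `on_recognized`, adding a recognizer does not require touching the individual controls, which is the advantage of the second design over per-control support.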
We have used Qt to build the basic multimodal form, but the same concept can be applied, and the form generated, with any available toolkit.
As a proof of concept, we have developed in our laboratory a multimodal interface for the Indian Railway Reservation Enquiry application. This application has been developed in a semi-automatic way. However, we have identified the issues to be tackled for any web form, on which we are currently working.
Figure 4 shows the opening page of the multimodal railway reservation enquiry application. It shows three icons at the top left, for the keyboard, speech, and handwriting interfaces. The user has to choose the required mode of input. The keyboard interface is always kept active, along with either handwriting or speech. All the control actions can be done using the keyboard.
Figure 5: Multimodal Railway Reservation Enquiry System with Handwriting Write Strip
The total number of words for speech is around 130, including 15 important cities in India, 84 train numbers, 8 classes, 12 months, and 31 days.
The vocabulary for handwriting recognition consists of 50 important Indian cities in the Tamil language (refer to Figure 5). During recognition, the output from the recognizer is again checked against the dictionary for the longest match, and the matched string is displayed.
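The paper does not spell out how the longest match is computed. One plausible reading, sketched here as an assumption, is to pick the dictionary word that shares the longest common prefix with the recognizer output, with ties going to the word listed first. The city names are illustrative, and the same code works unchanged on Tamil strings.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(recognized, dictionary):
    """Return the dictionary word sharing the longest prefix with the recognizer output.

    max() keeps the first word among equally good candidates.
    """
    return max(dictionary, key=lambda word: common_prefix_len(recognized, word))
```

A prefix criterion suits left-to-right handwriting, where errors often appear late in a word; a recognizer that errs early would call for an edit-distance measure instead.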
We can choose the train number, date, and month in the second form (see Figure 6) using the multimodal interface. Then, on clicking the 'Get It' button, the reservation information is displayed. Here again, the query is sent to the Indian Railway web server, and the reply is parsed and displayed as shown in Figure 7.

Figure 7: Third form of Multimodal Railway Reservation Enquiry System
5 Conclusion
Speech and handwriting interfaces complement each other and supplement the keyboard very well in the multimodal environment. In the Railway Reservation Enquiry application, the multimodal interface has been implemented and does not cause the user any discomfort. In the Indian context, the multimodal interface must be present in all mainstream applications, which is the area of our current work. As a step in that direction, a word processor has also been enabled with the multimodal interface for a finite vocabulary of words. It is hoped that the development of multimodal interfaces to the computer will bridge the gap between the haves and the have-nots and will go a long way towards the empowerment of rural folk in India. With the proliferation of the Internet across the countryside, Mahatma Gandhi's dream of Village Swaraj can indeed become a reality (at least in the Information Technology sphere).
References
[1] Anitha Nalluri, Bala Saraswathi A, Bharathi S, Hema A Murthy, Patricia J, Timothy A Gonsalves, Vidhya M S, Vivekanathan K, "Indian Language Support for X-Window System," in Proceedings of the ICON 2002, (SP-P6.4, May 2004).
[2] Aparna K H, Vidhya Subramanian, Kasirajan M, Vijay Prakash G, V.S. Chakravarthy, Sriganesh Madhavanath, "Online Handwriting Recognition for Tamil," in Proceedings of the IWFHR-9, 2004.