
Multimodal Interface Enabled Web-based Form Filling Applications

Abstract

Although India's average literacy level is about 65%, less than 5% of India's population can
use English for communication. And even though the world-wide web and computer communication
have given us access to information at the click of a mouse, 95% of our population
is excluded from this revolution due to the dominance of English. To overcome this problem,
we develop a multimodal interface to the computer that is relevant for India, i.e., one that
enables Indic computing. The components of this Indian language interface are:
1. Keyboard and display interface
2. Speech interface
3. Handwriting interface
In this paper, we propose a design for enabling online web-based form filling applications with
a multimodal interface. As a proof of concept, an example based on the Railway Reservation
Enquiry System is described.

Keywords:

Multimodal interface, Indic computing

1 Introduction

Imagine a villager, who may be semi-literate or even illiterate, walking into a rural Internet kiosk
and wanting to use the power of the Internet to communicate with a relative somewhere else,
contact a city hospital, or get vital crop information. The currently available English-based keyboard
and applications are totally unfamiliar and intimidating. Computers have become an essential part
of many facets of our lives. However, in the Indian context computers are used far less than
in the developed nations of the West, for the reason we have already hinted at: the
language of the interface is almost always English, and the communication is in the "written" form,
i.e., via the keyboard. Barely 65% of our population is literate, of which only an elite minority (about 5%)
can read, write, and speak English. This shuts out most of the Indian population from
the world wide web and its huge potential.
As a result, such a user feels shut out and is not a part of the ongoing information revolution. On the
other hand, if computer applications were available in the local language, the computer would be
accessible to 60% of the Indian population. A typical Indian language has roughly 3000 character clusters.
A keyboard with a single key for each cluster would be unmanageably large. Instead, each cluster
has to be entered as a sequence of keystrokes, which makes typing a hassle; even a literate person
finds this difficult. This makes the keyboard interface unnatural for Indian languages.
Therefore, it is essential to have an interface that not only uses the local language but is also more
natural and easy to use. Hence the need for a handwriting interface catering to the literate population
of India. Providing a speech interface would make the Internet and other applications accessible even
to the semi-literate and illiterate sections of the population. We call an interface that combines keyboard,
handwriting, and speech interfaces multimodal. Speech and handwriting interfaces, however,
supplement the keyboard and do not replace it.
In the West, although speech and handwriting interfaces are available for English, they serve
specific applications, namely dictation machines (hands-free) and limited handwriting/graffiti recognition
as in Personal Digital Assistants (PDAs). This is primarily because the Roman script is simple
enough to be typed comfortably on a standard keyboard. In the Indian context, by contrast, these
interfaces must be part of mainstream applications, namely mail readers, web browsers, and word
processors. This is the FIRST such effort to seamlessly integrate all three interfaces.
Today, the web browser is the most widely used application. Web pages can be broadly classified
into two types, namely passive and active pages. Passive pages only give information to the user
and do not require any user input. Active pages require user input; we call pages where the user
can enter data web forms, online forms, or web-based forms. A small survey of various websites
reveals that online forms are at the heart of applications like railway reservation enquiry, telephone
enquiry, ticket reservation of any kind, and online shopping. Hence, the focus of this paper is on
enabling web-based forms with a multimodal interface. In the context of web forms, a multimodal
interface makes particular sense, as the data to be filled in can come from any of the three input modes.
A multimodal interface thus provides a natural medium of communication with the system. For an
ordinary person, however, this is not enough: local language support is also a must. With a
multimodal interface and local language support, the Internet and the wealth of its resources become
available to our Indian population.
The organization of the paper is as follows. In Section 2, we discuss the issues in and the design for
enabling web-based forms with a multimodal interface. In Section 3, we describe the implementation of the
multimodal interface. In Section 4, we discuss a prototypical application: a multimodal railway
reservation enquiry system.

2 Design of Multimodal Interface for Web Forms

In this section, we describe the normal processing of a user's request in a browser, and the issues and
design related to enabling web forms with a multimodal interface.

2.1 Form Filling Applications on the Web

Usually, when the user requests a page in the browser, the request is forwarded to the proxy
specified in the browser configuration. The proxy sends the requested page to the client, either from
its cache or from the requested web server. In the case of a web form, say a search engine, the user
enters the data and, on clicking the "Search" button, a query containing the entered data is sent to the
search engine's web server. The web server processes the request and responds by sending the search
results page to the client. This exchange also goes through the proxy. In the next section, we describe
the issues in providing multimodal support to web-based forms.

2.2 Issues in Multimodal Interface for Web Forms

As mentioned in the previous section, the user requests a page by giving its Uniform Resource Locator
(URL) in the browser. From the user's end, multimodal support means displaying the multimodal
version of the web page when he or she requests a web form. We should also be able to send the data
values entered through the multimodal interface to the web server and get the response. Usually
it is the browser that mediates between the user and the web, i.e., it handles the interaction with
the user on one hand and with the web, through the proxy, on the other. To provide multimodal
support, we must prevent the browser from displaying the requested page and replace it with the
multimodal page. We must handle the user interface at the front end and the interaction with the
web at the back end. The following subsections explain how this can be done.

2.2.1 User Interface and Handling Interaction with User

Given the above requirements, our design implements the multimodal interface as a plugin, together
with a local proxy and a HyperText Markup Language (HTML) parser, as shown in Figure 1. As far as
the user interface is concerned, all we need to do is display an equivalent multimodal-enabled form
when the user requests a URL in the browser. For this, we need the requested URL, so that we can check
whether it is a web form. Since the browser normally passes the requested URL to its proxy, we
run a local proxy on the client and configure it in the browser. At the proxy, we trap the requested
URL and check whether it is a form by looking it up in a file that contains the URLs of the web forms.
The user can modify this file to suit his or her needs. If the requested page is a form, the proxy sends
the HTML page with the multimodal plugin to the client; otherwise, the request goes to the higher-level
proxy and the page is retrieved as usual, without multimodal support. To generate multimodal pages
on the fly, we need to know the controls in the original HTML page, for which we need the original
page itself. To identify the controls in the HTML page, we need an HTML parser.
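A minimal sketch of the URL check performed at the local proxy, in Python. It assumes the user-editable form list is a plain text file with one URL per line and '#' comments; the paper does not specify the file format, and the names normalize, load_form_list, and route are hypothetical. Only the routing decision is shown, not the proxy's socket handling.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host and drop the fragment so equivalent URLs compare equal."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def load_form_list(lines) -> set:
    """Parse the form list (assumed format: one URL per line, '#' starts a comment)."""
    forms = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            forms.add(normalize(entry))
    return forms


def route(url: str, form_urls: set) -> str:
    """Decide what the local proxy does with a trapped request URL."""
    if normalize(url) in form_urls:
        return "multimodal"  # reply with the HTML page that embeds the multimodal plugin
    return "upstream"        # forward to the higher-level proxy, no multimodal support
```

A proxy would call route() on every trapped request and, for "multimodal", answer with the plugin page instead of forwarding the request. Normalizing first keeps differences in letter case or URL fragments from defeating the lookup.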
At the client end, the browser displays the response from the local proxy (the HTML page with the
multimodal plugin), and this triggers the execution of the multimodal plugin (refer Figure 2). The plugin
downloads the requested page, parses it, and stores the identified controls in appropriate data structures.
These controls are then populated on a basic multimodal form, which is displayed to the user. The
user can now give input in any of the three modes. Details of the multimodal capability are given in
Section 3.
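The parsing step can be sketched with Python's standard html.parser module. This is an illustration under simplifying assumptions (only the first form is read; textarea elements and option labels are ignored), not the plugin's actual parser; the Control and Form classes are hypothetical stand-ins for the data structures mentioned above.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Control:
    kind: str                                    # "text", "radio", "checkbox", "select", "submit", ...
    name: str
    options: list = field(default_factory=list)  # values offered by a radio group or <select>


@dataclass
class Form:
    action: str = ""
    method: str = "GET"
    controls: list = field(default_factory=list)


class FormParser(HTMLParser):
    """Collect the action, method and controls of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.form = None
        self._select = None   # <select> currently open, if any
        self._done = False    # set once the first </form> is seen

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "form" and self.form is None:
            self.form = Form(action=a.get("action", ""),
                             method=(a.get("method") or "GET").upper())
        elif self.form is None or self._done:
            return
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            name = a.get("name", "")
            if kind == "radio":
                # Radio buttons sharing a name form one control with several options.
                group = next((c for c in self.form.controls
                              if c.kind == "radio" and c.name == name), None)
                if group is None:
                    group = Control("radio", name)
                    self.form.controls.append(group)
                group.options.append(a.get("value", ""))
            else:
                self.form.controls.append(Control(kind, name))
        elif tag == "select":
            self._select = Control("select", a.get("name", ""))
            self.form.controls.append(self._select)
        elif tag == "option" and self._select is not None:
            # Simplification: assumes each <option> carries a value attribute.
            self._select.options.append(a.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self.form is not None:
            self._done = True
```

Calling feed() with the downloaded page fills parser.form, whose controls can then be laid out on the basic multimodal form.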

2.2.2 Handling Interaction with Web

Figure 1: Design for enabling web-based forms with multimodal interface. [Flowchart: the client browser sends a request to the local proxy, which asks "Is it a web form?". Yes: the local proxy sends the HTML page with the multimodal plugin. No: the request goes to the higher-level proxy, and the HTML page is sent from the web server.]

Figure 2: Steps involved in plugin execution for enabling multimodal interface. [Flowchart "Working of multimodal plugin": download requested URL → parse the downloaded page → identify the controls → populate multimodal page with parsed controls → display multimodal page → user.]

Consider a situation where the user has entered data in a multimodal form. These data values
must now be sent (posted) to the appropriate web server. Posting information to a web server requires the
URL to which the information has to be posted, a method of sending the information, and a query.
The method, GET or POST, determines how the web server accepts the data. The query is a string into
which the attribute names and the data values are packed. The URL of the web server, the method, and the
attribute names are all in the HTML page itself; during parsing, this information is also extracted and
stored in appropriate data structures. So, on clicking the "Submit" button, the plugin builds the query.
The query, along with the URL and method, is then passed to a PERL script, which connects
to the remote web server, posts the query, and gets the response, which is also an HTML page. The
response obtained is opened as a normal web page in the browser.
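The query-building step can be illustrated in Python. The paper hands the query to a PERL script; this sketch swaps in urllib.parse.urlencode and urllib.request.Request purely for illustration, and build_request is a hypothetical helper. It shows how the parsed action URL, the method, and the attribute name/value pairs combine into a GET or POST request.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_request(page_url: str, action: str, method: str, values) -> Request:
    """Pack attribute names and values into a query and wrap it in a Request.

    page_url -- URL of the original form page; relative actions are resolved against it
    action, method -- taken from the parsed <form> tag
    values -- (attribute name, value) pairs in form order, already in English
    """
    target = urljoin(page_url, action or page_url)
    query = urlencode(values)  # attribute names and values packed into one string
    if method.upper() == "POST":
        # POST: the query travels in the request body.
        return Request(target, data=query.encode("ascii"), method="POST",
                       headers={"Content-Type": "application/x-www-form-urlencoded"})
    # GET: the query is appended to the URL.
    sep = "&" if "?" in target else "?"
    return Request(target + sep + query, method="GET")
```

Passing the returned object to urllib.request.urlopen() would send it and return the response page. The example values used in testing (train number 2621, class SL) are made up.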
2.3 Issues in Input Modes/Interfaces

In this section, we discuss the issues involved in the keyboard, speech, and handwriting interfaces, or input
modes.

2.3.1 Keyboard Input Mode

Local language support for typing is provided as described in [1]. Internationalization and localization
concepts have been used to provide local language support for static strings: the language
resources are kept separate from the application. To display static strings in the menu bar, tool bar, etc.
in the local language, the local language strings have to be provided to the system a priori. Then, on
choosing a particular language, those local language strings are used.
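Keeping language resources separate from the application can be pictured as one lookup table per language. The sketch below is an assumed, dictionary-based illustration (the keys, strings, and the Translator class are hypothetical); in a Qt application this role is normally played by tr() and compiled translation files.

```python
# Hypothetical resource tables, kept outside the application logic, one per language.
RESOURCES = {
    "en": {"menu.file": "File", "menu.help": "Help", "button.submit": "Submit"},
    "hi": {"menu.file": "फ़ाइल", "button.submit": "जमा करें"},
}


class Translator:
    """Return static UI strings in the language the user has chosen."""

    def __init__(self, resources, language, fallback="en"):
        self.resources = resources
        self.language = language
        self.fallback = fallback

    def tr(self, key):
        # Chosen language first, then the fallback language, then the key itself.
        for lang in (self.language, self.fallback):
            text = self.resources.get(lang, {}).get(key)
            if text is not None:
                return text
        return key
```

Switching language then means constructing the translator with a different language code; the application code that asks for "button.submit" does not change.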
In local language mode, the English text in the web form, such as filling instructions, legends for the
abbreviations used, or other important information, would be lost if we displayed only the controls in
the multimodal page. So, we have to transliterate the English text into the local language and display it in
the multimodal page. In this mode, data entry is also in the local language. These local
language data values must in turn be converted into English before the request is sent, as the remote
web server understands only English. We therefore need algorithms to transliterate English
to the local language and vice versa. The crux of the task here is that a sequence of keystrokes is pressed
to obtain a single local language character, and the typed local language string must later be
transliterated to English before it is sent to the web server. This suggests that the same map tables
should be used for typing and for transliteration. Keyboard map tables can be specified by the user,
but this requires a careful design of the map tables to achieve the purpose.
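Sharing one map table between typing and transliteration can be sketched as greedy longest-match conversion. The table below is a hypothetical toy covering a few Devanagari clusters; a real Indic keyboard map must also handle vowel signs and conjuncts. Inverting the same table gives the reverse (local language to English) direction, which works only as long as every entry's output is unique.

```python
# Hypothetical toy map table: keystroke sequence -> Devanagari cluster.
KEYMAP = {
    "k": "\u0915\u094d",  # क्  (ka with virama)
    "ka": "\u0915",       # क
    "kha": "\u0916",      # ख
    "ha": "\u0939",       # ह
    "ma": "\u092e",       # म
    "la": "\u0932",       # ल
}
# The same table, inverted, drives transliteration back to English.
REVERSE = {v: k for k, v in KEYMAP.items()}


def longest_match(text: str, table: dict) -> str:
    """Convert text by repeatedly taking the longest prefix found in the table."""
    max_len = max(len(k) for k in table)
    out = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            out.append(text[i])  # no entry: pass the character through unchanged
            i += 1
    return "".join(out)
```

Greedy longest match is what turns "kha" into ख rather than क् followed by ह; applying the same rule to the inverted table recovers the original keystrokes, e.g. for "khamala".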
The keyboard interface is domain independent and language independent, as the same software is used
by just changing the language resources. It is more reliable and robust, but cumbersome for typing in the
local language. On the other hand, when the background noise is very high the speech recognition system may
fail, and when the characters are not properly written the handwriting recognizer may also fail. In such cases,
we need the reliability of the keyboard to correct the wrong recognition. Moreover, speech and handwriting,
unlike the keyboard, require offline training of models for recognition. Also, certain control actions can
be done very easily with the keyboard, and employing speech for them may result in unnecessary
overhead. Hence, the keyboard is kept active together with either the handwriting or the speech interface.
The focus here is to supplement the keyboard with handwriting and speech, not to replace it.

2.3.2 Speech and Handwriting Input Modes

Every web form that we are talking about has a finite vocabulary of words. For a finite vocabulary
system with 50 words, the speech and character recognition rates are, on average, as high as 96% and
83% respectively [2]. With knowledge of the vocabulary, we can further improve the performance using
longest match. The vocabulary in the case of web forms has a hierarchical structure. For example, in the
Bharat Sanchar Nigam Ltd. (BSNL) telephone directory service, in the first form we have to choose
the district we need from the list of districts in Tamil Nadu. Here the range of values for a control is
known, so only those models are loaded for recognition. This saves a lot of memory, since not all the
models are loaded, and also speeds up recognition.
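Loading only the models for the words a control can accept might look like the following sketch. The class, the word lists, and the load_model callback are hypothetical, since the paper does not describe its model-loading code; a hierarchical vocabulary could be expressed by keying the next form's word list on the value chosen in the previous one.

```python
class VocabularyModels:
    """Keep in memory only the recognition models for the focused control's vocabulary."""

    def __init__(self, vocab_by_control, load_model):
        self.vocab_by_control = vocab_by_control  # control name -> allowed words
        self.load_model = load_model              # hypothetical loader: word -> model object
        self.active = {}                          # word -> model currently in memory

    def activate(self, control):
        """Switch to a control; return the sorted list of words now recognizable."""
        wanted = set(self.vocab_by_control.get(control, ()))
        # Drop models that the new control cannot use ...
        for word in list(self.active):
            if word not in wanted:
                del self.active[word]
        # ... and load only the ones that are missing.
        for word in wanted:
            if word not in self.active:
                self.active[word] = self.load_model(word)
        return sorted(self.active)
```

Because the allowed values for a control are known in advance, switching controls costs only the models for that control's words, not the whole vocabulary.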
Both speech and handwriting require a vocabulary of words for which the models need to be trained
for recognition. One alternative is to include all possible words and train for them, but then recognition
would be very slow and its accuracy would drop because of the larger number of models. Moreover, one
user may visit a page more frequently than another user and needs good performance for that page.
Hence, a user can train for that web form and use it, rather than having a slow, less accurate recognizer
with all the words. A user can train for as many forms as needed. We must therefore provide user-friendly
interfaces for training speech and handwriting models for web forms.
In the case of handwriting, we have character recognizers, which eliminates the need to train for
each web form. We can have a character recognizer for each language and use the web form vocabulary
to enhance the performance.

3 Implementation

As mentioned earlier, there are many form filling applications on the web. These web pages mostly contain
controls like text boxes, radio buttons, combo boxes or pull-down menus, and command buttons
like submit and clear. Data entry using radio buttons, check buttons, and combo boxes is just a mouse
click. That leaves only the text box, where the user has to type data.
At this point, we have two methods to provide multimodal support:

- Provide multimodal support to the text box control of a toolkit, for example, Qt.
- Develop a multimodal form capable of handling the three input modes.

In the first case, multimodal support has to be provided for each and every control, if we decide that
each control needs a multimodal interface. There would also be the overhead of each control running its
own speech and handwriting recognizer.

Figure 3: Design of multimodal interface. [Diagram: a single multimodal interface with speech, handwriting, and keyboard input serves the recognized controls in a web form; input from the three modes goes to the text box control with the current focus, while the other controls are without focus.]

In the second case, which we have followed, there is a basic multimodal form with three input modes.
Any application using this main window can use the multimodal capability. There is one speech
recognizer and one handwriting recognizer for the whole application, as shown in Figure 3. With the
keyboard interface, the cursor indicates the position where the typed keystrokes appear. Likewise, for the
speech and handwriting interfaces, the cursor position indicates where the recognized string should be
placed in the application. Speech and handwriting interfaces complement each other and supplement
the keyboard.
In the multimodal form there are three icons, namely keyboard, speech, and handwriting. On clicking
the speech icon, the record icon comes up. On clicking the record icon, the user's utterance is recorded
for a fixed duration, and after recognition the result is obtained from the speech recognizer. Similarly,
on clicking the handwriting icon, the write strip comes up. The user can write in the write strip and,
on clicking 'Complete', the written word is recognized and the result is obtained from the character recognizer.
In both cases, the result is a string. Using the information about which control has the current focus,
the result from either recognizer is displayed in that control by setting its text property.
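The routing of recognizer output to the focused control can be sketched independently of any toolkit. LineEdit and MultimodalForm below are hypothetical stand-ins; in Qt, the focused widget would typically be obtained with QApplication.focusWidget() and its text set with QLineEdit.setText(). The test also shows the keyboard correcting a misrecognized word, as motivated in Section 2.3.1.

```python
class LineEdit:
    """Stand-in for a toolkit text box; only its text property matters here."""

    def __init__(self, name):
        self.name = name
        self.text = ""


class MultimodalForm:
    """One speech and one handwriting recognizer shared by every control on the form."""

    def __init__(self, names):
        self.controls = {n: LineEdit(n) for n in names}
        self.focused = None

    def set_focus(self, name):
        self.focused = self.controls[name]

    def on_recognized(self, result: str):
        """Callback for either recognizer: set the focused control's text to the result."""
        if self.focused is not None:
            self.focused.text = result

    def on_key(self, ch: str):
        """Keyboard stays active alongside the other modes: append typed characters."""
        if self.focused is not None:
            self.focused.text += ch
```

Since the recognizers only ever talk to on_recognized(), adding a new control to the form needs no extra recognizer instance.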
We have used Qt for building the basic multimodal form. However, the concept is not tied to Qt, and the
form can be generated using any available toolkit.

4 Prototypical Web-Based Form Filling Application

As a proof of concept, we have developed in our laboratory a multimodal interface for the Indian Railway
Reservation Enquiry application. This application has been developed in a semi-automatic way. However,
we have identified the issues to be tackled for any web form, on which we are currently working.

Figure 4: Multimodal Interface in Hindi

Figure 4 shows the opening page of the multimodal railway reservation enquiry application. Three icons
on the top left represent the keyboard, speech, and handwriting interfaces, and the user chooses the
required mode of input. The keyboard interface is always kept active, together with either handwriting
or speech. All the control actions can be done using the keyboard.

Figure 5: Multimodal Railway Reservation Enquiry System with Handwriting Write Strip

Figure 6: Second form of Multimodal Railway Reservation Enquiry System

For speech, the vocabulary has around 130 words, including 15 important cities in
India, 84 train numbers, 8 classes, 12 months, and 31 days.
For handwriting recognition, the vocabulary consists of 50 important Indian cities in the Tamil language
(refer Figure 5). During recognition, the output from the recognizer is again checked against the dictionary
for the longest match, and the matched string is displayed.
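The paper does not define "longest match" precisely. One plausible reading, sketched below, scores each dictionary word by the length of its longest common subsequence with the raw recognizer output and displays the best-scoring word. The functions are hypothetical, and the city names are romanized for readability even though the prototype's handwriting vocabulary is in Tamil.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def snap_to_vocabulary(output: str, vocabulary) -> str:
    """Return the dictionary word sharing the longest common subsequence with the output.

    Ties are broken in favour of the word whose length is closest to the output's.
    """
    return max(vocabulary,
               key=lambda w: (lcs_length(output, w), -abs(len(w) - len(output))))
```

Because the displayed string always comes from the dictionary, a recognizer output with a dropped or substituted character still yields a valid city name.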
We can choose the train number, date, and month in the second form (see Figure 6) using the
multimodal interface. Then, on clicking the 'Get It' button, the reservation information is displayed.
Here again, the query is sent to the Indian Railway web server, and the reply is parsed and displayed
as shown in Figure 7.

Figure 7: Third form of Multimodal Railway Reservation Enquiry System

5 Conclusion

Speech and handwriting interfaces complement each other and supplement the keyboard very well in
the multimodal environment. In the Railway Reservation Enquiry Application, the multimodal interface
has been implemented and does not pose any discomfort to the user. In the Indian context, the multimodal
interface must be present in all mainstream applications, which is the area of our current work. As a
step in that direction, a word processor has also been enabled with the multimodal interface, with a
finite vocabulary of words. It is hoped that the development of multimodal interfaces to the computer
will bridge the gap between the haves and the have-nots and will go a long way in the empowerment
of the rural folk in India. With the proliferation of the Internet across the countryside, Mahatma
Gandhi's dream of Village Swaraj can indeed become a reality (at least in the Information Technology
sphere).

References

[1] Anitha Nalluri, Bala Saraswathi A, Bharathi S, Hema A Murthy, Patricia J, Timothy A Gonsalves,
Vidhya M S, Vivekanathan K, "Indian Language Support for X-Window System," in Proceedings
of ICON 2002 (SP-P6.4, May 2004).

[2] Aparna K H, Vidhya Subramanian, Kasirajan M, Vijay Prakash G, V.S. Chakravarthy, Sriganesh
Madhavanath, "Online Handwriting Recognition for Tamil," in Proceedings of IWFHR-9, 2004.
