
AFNLP 2008 Meeting

Indonesia Country Report


Hammam Riza

hammam@iptek.net.id
Agency for the Assessment and
Application of Technology (BPPT)
Ministry of Research and Technology
Republic of Indonesia

TOC

Past Activities
Activities in 2007
Activities Plan 2008, 2009
National Language Year 2008

Past NLP Research Projects in Indonesia

Indonesian Text-To-Speech (BPPT, ITB, UI)


GDA/MMA/Linguistic-DS MPEG-7 (Multimedia Annotation)
Cross-Linguistic Portal (dictionaries, corpus, tools)
Web translator (WebTRans)
Standard Indonesian Language Corpus (SILC)
Indonesian Language Dictionaries Project (KBBI)
English-Indonesia Parallel Corpus (INCI)
Speech recognition/synthesis system (Bandung Institute of
Technology/ Telkom RDC/University of Indonesia)
Information retrieval (ITB and University of Indonesia)
Text/Image processing tools (Gajah Mada University)
Computational lexicon (National Language Center)
Computational morphology (Atmajaya University)

Promotion of Language Technologies (2007)

National Language Congress XII in Solo, introducing a toolkit
to build speech databases for endangered languages, and the
Atmajaya Language Workshop (June 2007) in Jakarta, on promoting
local computing policy and speech technologies (both keynote
speeches by Dr. Hammam Riza)

Promotion of the Context Sensitive Dictionary Project for a
Speech Translation Corpus for the Aceh Tsunami Region
(Indonesian-Acehnese, bidirectional)

Activities in Machine Translation (2006-2007)

A rule-based Indonesian-English translator (started in 2006)
was launched on the market in June 2007 by ITB

This translator is combined with English TTS (Windows) and
Indonesian TTS (proprietary)
Experiment of Statistical MT using Pharaoh
decoder (Eng-Indo parallel corpus) by

Current Activities in Speech Tech


Telkom RDC & BPPT collaboration on
Speech Recognition and Summarization
Indonesia Goes Open Source (IGOS)
speech recognition system (funded by
Ministry of Research and Technology)
Speech recognition system for Bahasa
Indonesia (University of Indonesia)
Transcribing speech data that contains
broadcast TV and Radio news
Applications:
sending short messages (SMS)
IVR (health and tourism services)

Research on intonation by example and an automatic prosody
pattern extractor using an Artificial Neural Network (ANN)
Text to Speech system for local languages
(ITB/UI)

100th Year of Bahasa Indonesia
National Language Year 2008

A series of events culminating in the International Conference
on Bahasa Indonesia (Oct 2008)

Importance of Indonesian: its roles and functions in national
life and development (policy making, business, media, education)
Language planning (shaping change)

6 keynote speakers from AFNLP will be invited by the Indonesian
government throughout the year

Major Activities for 2008

Local Language Resource Projects (Language Center)

Indonesian and Local Languages - Wordnet

MALINDO (Malaysia-Indonesia) joint projects

Speech to speech translation for Asian languages (ASTAR)

Speech database Telkom RDC/BPPT (APT support)

Language Resources and Translation English Indonesia (collaboration with PAN Localization)

Speech Corpus for Local Languages (Endangered Languages) using BLARK (ELDA)

Activities Plan for 2008-2009

Speech Recognition and Phrase-based Statistical Machine
Translation (SMT) system for bidirectional Indonesian-English
and Indonesian-Japanese

Mapping and SMT for Indonesian-Regional Languages
(Bahasa Nusantara) and for German, French, Chinese and
Arabic (cross-border languages)

Information Retrieval (cross language speech retrieval)

Topic Detection and Tracking (TDT)

Searching and retrieving Indonesian speech data


Identifying topics in speech data collection
Classifying new data to the existing topics in the collection
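
The last TDT step above, assigning new data to topics already in the collection, could be sketched as follows. This is only an illustration under stated assumptions, not the project's method: it assumes the speech has already been transcribed to text, uses plain bag-of-words vectors with cosine similarity, and every function name, sample transcript, and the threshold value is hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a transcript into a word-count vector (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_topic_profiles(labelled_transcripts):
    """Merge the word counts of all transcripts that share a topic label."""
    profiles = {}
    for topic, text in labelled_transcripts:
        profiles.setdefault(topic, Counter()).update(bag_of_words(text))
    return profiles


def classify(text, profiles, threshold=0.1):
    """Return (topic, score) for the most similar topic.

    The topic is None when no profile reaches the (hypothetical) threshold,
    which a tracking system could treat as a candidate new topic.
    """
    vec = bag_of_words(text)
    best_topic, best_score = None, 0.0
    for topic, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_topic, best_score = topic, score
    if best_score < threshold:
        return None, best_score
    return best_topic, best_score


# Hypothetical transcripts of broadcast news, already converted to text.
labelled = [
    ("health", "rumah sakit dokter pasien obat"),
    ("tourism", "wisata pantai hotel bali"),
    ("health", "dokter vaksin pasien"),
]
profiles = build_topic_profiles(labelled)
print(classify("pasien menunggu dokter di rumah sakit", profiles))
print(classify("harga saham naik", profiles))
```

With these sample transcripts the first query shares words only with the "health" profile, while the second shares none and is left unassigned. A real system would need proper tokenization and term weighting, and would have to cope with recognition errors in the transcripts.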

Speech Synthesis
Speech Summarization

Summarizing Indonesian speech documents



E-dictionary project
National Language Center

Size & Comprehensiveness:
200,000 entries
many subject areas are covered

Method:
corpus-based
primary data: the largest print dictionaries
Kamus Besar Bahasa Indonesia (KBBI), 3rd ed.
Echols & Shadily's Eng-Ind. dictionary

Usefulness:
find the words you need
definitions and examples are helpful

Users:
writers, journalists, editors, scientists, academics,
teachers, students, business people, lawyers, etc.

In Indonesia, there are at least 13 local languages with at
least one million speakers each:
Javanese (75,200,000)
Sundanese (27,000,000)
Malay (20,000,000)
Madurese (13,694,000)
Minangkabau (6,500,000)
Batak (5,150,000)
Buginese (4,000,000)
Balinese (3,800,000)
Acehnese (3,000,000)
Sasak (2,100,000)
Makassarese (1,600,000)
Lampung (1,500,000)
Rejang (1,000,000)

LOCAL & CROSS-BORDER LANGUAGES

Aceh: 32 local languages
East Java: 6 local languages

Note:
Cross-Border Languages in Indonesia:
English, Arabic, Chinese, French,
German, Dutch, Japanese, etc.

Language Digital Divide


Language Preservation

Survey of indigenous local languages


Local computing policy will be
developed for major local languages
Endangered languages are identified
and preserved by means of ICT
Language resources collection for
official and major local languages

Thank You

For any comments, please email
hammam@iptek.net.id
