
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung Tran, Kiem-Hieu Nguyen, Duc-Hanh Bui
Hanoi University of Science and Technology

Friday, October 7, 16
Outline

Statistical language model

Current state of the art

RNN for Vietnamese language model

Experimental results

Conclusion

Statistical language model
A probability distribution over word sequences

E.g. “go to the airport”

? = P(“airport”|“go to the”)

Applications:
LABAN KEY
Spelling checkers, smart keyboards

Enhance speech recognition/machine translation
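
The definition on this slide can be written with the chain rule. The notation w_i for the i-th word of a sequence is introduced here for illustration and does not appear on the original slide.

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
\qquad \text{e.g.} \qquad
P(\text{airport} \mid \text{go to the})
```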

Challenges
Meaningful

grammatically correct

understandable

Context-aware

E.g. I am from Vietnam. My mother tongue is Vietnamese

Out of vocabulary

Slang, abbreviations, etc.

Common approach
N-gram language model
Katz's back-off: estimates the conditional probability of a word given its history in the n-gram
When a trigram is unavailable -> back off to the bigram -> then to the unigram

SOURCE: HTTPS://EN.WIKIPEDIA.ORG/WIKI/KATZ%27S_BACK-OFF_MODEL
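
The back-off chain above (trigram, then bigram, then unigram) can be sketched in a few lines. This is a simplified illustration, not Katz's full method: it replaces Katz's discounting and normalised back-off weights with a fixed penalty, so the scores are not true probabilities. The toy corpus, the function names, and the `penalty` value are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count every n-gram of length 1..max_n in whitespace-tokenised sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(word, history, counts, total_tokens, penalty=0.4):
    """Score `word` after `history`, shortening the history when the n-gram is unseen.

    Simplification (assumption): a fixed `penalty` per back-off step stands in
    for Katz's discounted, normalised back-off weights.
    """
    history = tuple(history)
    weight = 1.0
    while True:
        ngram = history + (word,)
        denom = counts[history] if history else total_tokens
        if counts[ngram] > 0 and denom > 0:
            return weight * counts[ngram] / denom
        if not history:
            return 0.0  # word never seen, even as a unigram
        history = history[1:]  # drop the oldest word: trigram -> bigram -> unigram
        weight *= penalty


corpus = ["go to the airport", "go to the school", "walk to the airport"]
counts = ngram_counts(corpus)
total = sum(c for gram, c in counts.items() if len(gram) == 1)

seen = backoff_score("airport", ("to", "the"), counts, total)         # trigram found
backed_off = backoff_score("airport", ("from", "the"), counts, total)  # falls back to bigram
```

In the second call the trigram "from the airport" never occurs, so the score comes from the bigram "the airport" and is scaled by the penalty once.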
N-gram language model
Only sees a few words back
Only predicts words seen in the same context

Deep learning for NLP
Word embedding

MIKOLOV ET AL. (2013B).

(SOCHER ET AL. (2013A))


Recurrent neural network for text

INPUT : GO TO THE
OUTPUT : TO THE SCHOOL
PROBABILITY (SCHOOL | GO TO THE)
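
The input/output pair on this slide is the usual next-token training setup: the target sequence is the input sequence shifted by one position. A minimal sketch follows; the function name `next_token_pairs` is hypothetical and chosen only for this example.

```python
def next_token_pairs(tokens):
    """Return (inputs, targets), where targets is inputs shifted left by one token."""
    inputs = tokens[:-1]
    targets = tokens[1:]
    return inputs, targets


inputs, targets = next_token_pairs("go to the school".split())
# At the last position the network has read "go to the" and is trained to
# assign high probability to "school".
```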
RNN vs. N-gram
Variable-length word context vs. fixed n-gram context
Personalization through continuous learning
More meaningful text suggestions
Naturally supports phrase and term suggestions

RNN for Vietnamese language model
Character level language model
{previous characters} -> next characters
Syllable level language model
{previous syllables} -> next syllables
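
The two granularities differ only in how text is split into tokens before building (context, next token) pairs. The sketch below assumes syllables are obtained by splitting on whitespace, which matches how written Vietnamese separates syllables, and that text is normalised to NFC so each accented letter is one character. The helper names and the context size are illustrative, not taken from the authors' system.

```python
import unicodedata


def syllable_tokens(text):
    """Syllable-level tokens: written Vietnamese separates syllables with spaces."""
    return unicodedata.normalize("NFC", text).split()


def character_tokens(text):
    """Character-level tokens, including spaces, after NFC normalisation."""
    return list(unicodedata.normalize("NFC", text))


def context_target_pairs(tokens, context_size):
    """Pair each token with the `context_size` tokens that precede it."""
    return [
        (tokens[i - context_size:i], tokens[i])
        for i in range(context_size, len(tokens))
    ]


text = "chào buổi sáng"
syllables = syllable_tokens(text)
characters = character_tokens(text)
syllable_pairs = context_target_pairs(syllables, 2)
character_pairs = context_target_pairs(characters, 4)
```

A character-level model has a small vocabulary and can emit spellings it never saw, while a syllable-level model covers far more text per prediction step.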

LSTM cell

SOURCE: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
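
For readers without the figure, one step of a standard LSTM cell can be sketched in plain Python. This follows the common textbook formulation (forget, input, candidate, and output gates applied to the concatenation of the previous hidden state and the current input); the slides do not say which variant the authors used, and the parameter layout in `params` is an assumption made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step.

    params maps a gate name ("f", "i", "g", "o") to (W, b), where W is applied
    to the concatenation [h_prev; x] (assumed layout).
    """
    z = h_prev + x  # list concatenation: [h_prev; x]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a + bias) for a, bias in zip(matvec(W, z), b)]

    f = gate("f", sigmoid)    # forget gate: how much of the old cell state to keep
    i = gate("i", sigmoid)    # input gate: how much of the candidate to write
    g = gate("g", math.tanh)  # candidate cell values
    o = gate("o", sigmoid)    # output gate: how much of the cell state to expose

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Tiny check with hidden size 1, input size 1, and all-zero parameters.
zero_params = {name: ([[0.0, 0.0]], [0.0]) for name in ("f", "i", "g", "o")}
h, c = lstm_step([1.0], [0.0], [2.0], zero_params)
```

With all-zero parameters every sigmoid gate outputs 0.5 and the candidate is 0, so the new cell state is half the old one.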

Stacking multiple layers

Experiments
1,500 MOVIES - 2,056,308 SENTENCES

Experimental results

Conclusion
First neural language model for Vietnamese
Largest experimental dataset
Future work
Word embedding
Neural net compression
Conversational neural machine translation

Thank you for your attention

Conversational
Chú hoài linh đẹp trai. Chú hoài linh
Chào buổi sáng
chị hát hay wa!! nghe thick a.
chị khởi my ơi e rất la hâm mộ
chú hoài linh thật đẹp zai và chú Trấn thành đẹp

lịch sử ghi nhớ năm 1979
tại hội nghị, đồng chí Phạm Ngọc Thủy Võ Văn Kiệt
tại hội nghị, đồng chí Hồ Chí Minh nói
tại hội nghị, đồng chí Võ Nguyên Giáp và đồng chí Hồ Chí Minh đã ngồi ở
tại đại hội Đảng lần thứ nhất vào năm 1945,
Ngay từ những ngày đầu, Đúng như nhận xét của Giáo sư Nguyễn Văn Linh

