wingifydevfest@nirantk.com
Why should you care?
Language is emotion
Why should you care?
● Research Engineer / NLP Hacker - Maker of hindi2vec
Python
What I expect you know already
Some exposure to modern (deep) machine learning
What I expect you know already
Ideas like:
● Seq2seq
● Text Vectors: GloVe, word2vec
● Transformer
What you'll learn today
What you'll learn today
NEW Idea: Transfer Learning for Text
What you'll learn today
How to do NLP with small datasets
What you'll learn today
There are too many NLP challenges in any language!
Language modeling
Classification
Word sense disambiguation
Lexical normalization
Named entity recognition
What you’ll learn today
EXAMPLE
What you’ll NOT learn today
What you’ll NOT learn today
No Math.
What you’ll NOT learn today
No peeking under the hood. No code. We will do that later!
Text Classification needs a lot of data!
But exactly how much data is enough?
Let's get some estimates from English datasets.
[Table: labeled-data estimates from English datasets, e.g. IMDb]
On IMDb: SAME TASK TRANSFER - Different Data
On TREC-6: MULTI-TASK TRANSFER - Different Data, Different Task
How does this change things for you?
Simpler code & ideas
Simpler code
BEFORE: DEVELOP and REUSE
1. Select Source Task & Model, e.g. Classification
2. Reuse Model, e.g. for classifying car types or screenshot segmentation
3. Tune Model to Your Dataset
   a. Downside: Needs tagged samples, does not learn from untagged samples
   b. Upside: Can give me an initial performance boost
4. Repeat for every New Challenge which you see. BORING!

NOW: DOWNLOAD AND ADAPT to your Task
1. Select Source Model, e.g. ULMFit or BERT
2. Reuse Model, e.g. for text classification or any other text task
3. Tune Model
   a. Can use both untagged and tagged samples
Can use the same source model across multiple tasks, and languages (a code sketch follows the diagram below)

[Diagram: Text Embedding → Backbone → Task-Specific Layer; data flows left to right]
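To make the NOW column concrete, here is a minimal download-and-adapt sketch. It uses the Hugging Face transformers library, which is my assumption and not tooling named in the talk; the model name and label count are illustrative.

# A minimal download-and-adapt sketch; assumes `transformers` and
# `torch` are installed. Model name and num_labels are illustrative.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 1. Select a source model, e.g. BERT.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2. Reuse it for text classification by attaching a task-specific head.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# 3. Tune on your (small) labeled dataset with any standard training loop;
#    here we only run a forward pass to show the adapted model works.
inputs = tokenizer("this movie was great!", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, 2): one score per class

The same downloaded backbone can then be reused for a different task or language by swapping only the task-specific head.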
Simpler code

[Diagram: GLoVe → Language Models → Classifier; data flows left to right]
Simpler Code
We will download pre-trained language models instead of word vectors
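A sketch of the contrast, assuming a locally downloaded GloVe file and the Hugging Face transformers library (both are my assumptions; the talk names no specific tooling here):

import numpy as np

# BEFORE: static word vectors - one fixed vector per word, no context.
# Assumes glove.6B.100d.txt has been downloaded from the GloVe site.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.split()
        glove[word] = np.asarray(values, dtype="float32")

# NOW: download a whole pre-trained language model instead; its
# representations depend on the surrounding sentence, not just the word.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")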
Making the Backbone or Source Model
Making the Backbone
Pre-training for Language Models
The BERT model was trained on two tasks simultaneously: Masked Words (Masked LM) and Next Sentence Prediction.
Making the Backbone
Label = isNext
Making the Backbone
Label = NotNext
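A simplified sketch of how such training pairs could be built. This is my illustration, not the official BERT preprocessing code; real BERT also replaces some chosen tokens with random words or leaves them unchanged.

import random

def make_pretraining_example(sent_a, sent_b, corpus, mask_prob=0.15):
    """Build one (masked tokens, isNext/NotNext) pre-training example."""
    # Next Sentence Prediction: half the time keep the true next sentence
    # (isNext), half the time swap in a random sentence (NotNext).
    if random.random() < 0.5:
        label = "isNext"
    else:
        sent_b, label = random.choice(corpus), "NotNext"

    # Masked LM: hide ~15% of tokens; the model must predict the originals.
    tokens = f"[CLS] {sent_a} [SEP] {sent_b} [SEP]".split()
    masked = [
        "[MASK]" if t not in ("[CLS]", "[SEP]") and random.random() < mask_prob
        else t
        for t in tokens
    ]
    return masked, label

masked, label = make_pretraining_example(
    "the man went to the store",
    "he bought a gallon of milk",
    corpus=["penguins are flightless birds"],
)
print(label, masked)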
Pause!
Any questions at this point?
Indian Languages
e.g. Hindi, Telugu, Tamil
First Challenge: Making a good backbone
Indian Languages
e.g. Hindi, Telugu, Tamil
[Diagram: Text Embedding → Backbone → Task-Specific Layer; data flows left to right]
Hindi2vec: Based on ULMFit
- Designed to work well on tiny datasets and small compute, e.g. I work off free K80 GPUs via Colab
https://github.com/NirantK/hindi2vec
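A minimal sketch of this workflow in the fastai v1 API, the library hindi2vec builds on. File names and hyperparameters below are illustrative, not the repo's actual code.

from fastai.text import *  # fastai v1, as used around the time of this talk

path = Path('data')  # illustrative: a folder containing texts.csv
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# AWD_LSTM backbone pre-trained on Wikitext-103, fine-tuned to your own
# (possibly tiny) corpus - this fits on a free Colab K80.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)  # tune the new head first
learn.unfreeze()
learn.fit_one_cycle(1, 1e-3)  # then fine-tune the whole backbone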
Alternative: Use Google AI’s BERT
Indian Languages
e.g. Hindi, Tamil
[Diagram: Text Embedding → BERT → Language-Specific Layer, e.g. हिंदी; data flows left to right]
BERT: Based on OpenAI’s General Purpose Transformer
- Designed to work well on larger datasets and large compute, e.g. they need a few GPU-days to fine-tune for a specific language
- State of the Art Results on 11 NLP Tasks
BERT: Based on OpenAI’s General Purpose Transformer
BERT-Multilingual: Works for 104 languages!
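For instance, the multilingual checkpoint tokenizes Hindi out of the box. A small sketch, assuming the Hugging Face transformers library; bert-base-multilingual-cased is Google's released checkpoint.

from transformers import AutoTokenizer

# One shared WordPiece vocabulary covers all 104 languages, Hindi included.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
print(tokenizer.tokenize("नमस्ते दुनिया"))  # subword tokens for "hello world"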
RELATED MYTH:
Not enough Indian Language Resources!
Datasets Ready to Use
- Wikimedia Dumps with 100+ languages
- IIT Bombay English Hindi Corpus

Sidenote: You can Make your Own! Sources include the following:
- Online Newspapers and Regional TV
- Forums
- WhatsApp groups!
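For example, the Wikimedia dumps are a single download per language. A sketch: the URL follows the standard dumps.wikimedia.org layout for Hindi, but check the site for the current file name.

import urllib.request

# Latest Hindi Wikipedia articles dump; swap 'hiwiki' for other languages,
# e.g. 'tawiki' for Tamil or 'tewiki' for Telugu.
url = ("https://dumps.wikimedia.org/hiwiki/latest/"
       "hiwiki-latest-pages-articles.xml.bz2")
urllib.request.urlretrieve(url, "hiwiki-latest-pages-articles.xml.bz2")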
@NirantK
Created by @rasagy (typo: 1st Dec 2018, not 2019)
Credits and Citations
- Slides and gifs from Writing Good Code for NLP Research by Joel Grus at AllenAI
- ULMFit Paper and Blog by Jeremy Howard (fast.ai) and Sebastian Ruder (@seb_ruder)
- Recommended Reading: Illustrated BERT
- BERT Dissections: Paper, Blogs: The Encoder, The Specific Mechanics, The Decoder
- Visualisations Made from Neural Nets Visualisation Cheatsheet
Appendix
Appendix: 1 Slide Summary of ULMFit Paper
Howard and Ruder suggest using pre-trained models for solving a wide range of NLP problems. With this approach, you don’t need to train your model from scratch, but only fine-tune the original model. Their method, called Universal Language Model Fine-Tuning (ULMFiT), outperforms state-of-the-art results, reducing the error by 18-24%. Even more, with only 100 labeled examples, ULMFiT matches the performance of models trained from scratch on 10K labeled examples.
However, to be successful, this fine-tuning should take into account several important considerations:
● Different layers should be fine-tuned to different extents, as they capture different kinds of information.
● Adapting the model’s parameters to task-specific features is more efficient if the learning rate is first increased linearly and then decayed linearly.
● Fine-tuning all layers at once is likely to result in catastrophic forgetting; thus, it is better to gradually unfreeze the model starting from the last layer. (A code sketch of these three tricks follows below.)
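A sketch of those three tricks in the fastai v1 API. Values and file names are illustrative, not the paper's exact schedule; fastai's one-cycle policy plays the role of the increase-then-decay learning rate.

from fastai.text import *

path = Path('data')  # illustrative: a folder containing labeled texts.csv
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv')
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)

# Gradual unfreezing: train only the last layer group first...
learn.freeze_to(-1)
learn.fit_one_cycle(1, 2e-2)  # one-cycle: LR rises, then decays

# ...then unfreeze more, with discriminative (per-layer-group) learning
# rates expressed via slice(lowest_lr, highest_lr).
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))

learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))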
Appendix: 1 Slide Summary of BERT Paper
Training Tasks: Masked Language Model (15% of tokens masked at random) and Next Sentence Prediction.
Results: SoTA on 11 NLP Tasks, mostly around Inference and QA. Indicated that the model can be fine-tuned on both new datasets and new tasks.
Model: BERT-Base is inspired by the OpenAI Transformer, with roughly the same parameter count. BERT-Large has 340M parameters; both are based on Transformer Networks.