
Information Extraction from Tweets Posted during Disasters

Shamik Kundu (CS16MTECH11015)


Guides: Srijith P.K. and Maunendra Sankar Desarkar

Contents:
1. Problem Statement description
2. Dataset info
3. Proposed Model
4. Results
5. Brief comparison between models
6. Analysis
7. Future work
8. References
Problem Statement description:
A large set of tweets posted during the 2015 Nepal earthquake, along with a set of classes, is given [FIRE 2016 Microblog track]. Each class identifies a broad information need that arises during a disaster.

The aim is to develop IR methodologies that extract the tweets relevant to each of a given set of topics, where each topic describes a generic information need during the disaster situation.
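As a point of reference for this IR framing, a simple lexical baseline ranks every tweet by its TF-IDF cosine similarity to a topic description. This is a minimal sketch, not the NLP-feature-based model proposed in these slides; the helper names, the tokenizer, and the example topic and tweets are hypothetical, and IDF is computed only over the topic plus the candidate tweets to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; '#' and '@' prefixes are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Smoothed IDF: idf(t) = log((1 + N) / (1 + df(t))) + 1, with raw term counts as TF.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either vector is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_tweets(topic: str, tweets: list[str]) -> list[tuple[float, str]]:
    # Score every tweet against the topic and return (score, tweet), best first.
    vecs = tfidf_vectors([tokenize(topic)] + [tokenize(t) for t in tweets])
    topic_vec, tweet_vecs = vecs[0], vecs[1:]
    scored = [(cosine(topic_vec, v), tw) for v, tw in zip(tweet_vecs, tweets)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical topic text and tweets, for illustration only.
    topic = "medical resources required such as blood medicines doctors"
    tweets = [
        "Urgent need of blood and medicines at Bir hospital",
        "Army helicopters deployed for rescue in Gorkha",
        "Doctors required in Sindhupalchok",
    ]
    for score, tweet in rank_tweets(topic, tweets):
        print(f"{score:.3f}  {tweet}")
```

A purely lexical baseline like this cannot separate classes whose descriptions share most of their vocabulary, which is one reason richer features are worth exploring.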

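Given classes:
FMT1: Resource available
FMT2: Resource required
FMT3: Medical resource available
FMT4: Medical resource required
FMT5: Resource required/available at specific location
FMT6: Govt/NGO works
FMT7: Property damaged/restored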

Detailed example of a class:

Example of some tweets and their class labels:
[Figure: example tweets labelled Resource required (FMT2), Govt/NGO works (FMT6), Property damaged/restored (FMT7) and Medical resource required (FMT4)]

Example of some tweets and their class labels (contd.):
[Figure: example tweets labelled Resource available (FMT1), Medical resource available (FMT3) and Resource required/available at specific location (FMT5)]

Dataset Info:

Basic outline: proposed NLP-feature-based model

Proposed NLP-feature-based model:

Results (contd.):

Results (contd.):

Brief comparison between models:

Analysis:
The classification accuracies for FMT2 (Resource required) and FMT4 (Medical resource required) are below 30%.

Probable causes:

1. Many of the misclassified tweets contain multiple fragments that belong to two different classes.

Analysis (contd.):
2. The descriptions of a few classes [FMT1 (resource available), FMT2 (resource required), FMT3 (medical resource available) and FMT4 (medical resource required)] are very close, which leads to misclassification.

3. Tweets belonging to FMT4 (Medical resource required) are exceptionally rare in the dataset (5.35% of the total data), and the class is also highly similar to FMT2 and FMT3, which leads to very poor classification accuracy.
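To make point 3 concrete, the imbalance can be quantified per class and, as one common mitigation that these slides do not claim to use, turned into inverse-frequency class weights. This is a minimal sketch assuming labels are available as plain strings such as "FMT4"; the helper names and the toy label list are hypothetical. With seven classes (FMT1 to FMT7), a 5.35% share would give FMT4 a weight of about 1 / (7 × 0.0535) ≈ 2.67, so its errors would count roughly 2.7 times as much as those of a class holding an even 1/7 share.

```python
from collections import Counter


def class_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of the labelled tweets that falls into each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def balanced_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights w_c = N / (K * n_c).

    N is the number of labelled tweets, K the number of classes and n_c the
    count of class c, so a class with share p_c gets weight 1 / (K * p_c).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * m) for label, m in counts.items()}


if __name__ == "__main__":
    # Toy label list (hypothetical); FMT4 is deliberately a minority class.
    toy = ["FMT2"] * 6 + ["FMT1"] * 2 + ["FMT4"] * 2
    print(class_shares(toy))  # {'FMT2': 0.6, 'FMT1': 0.2, 'FMT4': 0.2}
    print(balanced_weights(toy))
```

Weighting only addresses the rarity of FMT4; it does nothing about the vocabulary overlap with FMT2 and FMT3 described in point 2.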

Future work:
1. Try multilabel classification to improve classification accuracy.
2. Try multi-view learning to exploit the advantage of redundant views.
3. Try to match resource requirements with resource availabilities.
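Future-work item 1 could be prototyped roughly as follows. This is a minimal sketch assuming a hypothetical upstream model that already produces one independent score in [0, 1] per class (for example, one binary classifier per FMT class); the helper name, the 0.5 threshold and the scores are illustrative assumptions, not results. Instead of forcing a single label, every class that clears the threshold is kept, so a tweet whose fragments belong to two classes (analysis point 1) can receive both.

```python
def assign_labels(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multilabel decision rule over per-class scores (hypothetical helper).

    Keeps every class whose score is at least `threshold`, ordered from
    highest to lowest score. If no class clears the threshold, falls back
    to the single best-scoring class so every tweet still gets a label.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [label for label, score in ranked if score >= threshold]
    return chosen if chosen else [ranked[0][0]]


if __name__ == "__main__":
    # Made-up scores for a tweet that both requests medicine and
    # offers volunteers; these are not model outputs.
    example = {"FMT1": 0.62, "FMT2": 0.41, "FMT3": 0.12, "FMT4": 0.71}
    print(assign_labels(example))  # ['FMT4', 'FMT1']
```

In practice the threshold would be tuned per class on held-out data rather than fixed at 0.5.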

References:
1. Anirban Sen, Koustav Rudra and Saptarshi Ghosh, “Extracting Situational Awareness from Microblogs during Disaster Events”
2. Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow and Rebecca Passonneau, “Sentiment Analysis of Twitter Data”
3. Koustav Rudra, Siddhartha Banerjee, Niloy Ganguly, Pawan Goyal, Muhammad Imran and Prasenjit Mitra, “Summarizing Situational Tweets in Crisis Scenario”

THANK YOU.

