
Mobile Tartu

Understanding activity patterns from Social Media Data: a fusion of Facebook and Twitter

Manos Chaniotakis

CERTH-HIT Research Associate

Ph.D. Candidate
National Technical University of Athens, Greece

Chaniotakis@certh.gr
COST TU1305 | 29-06-2016 | Tartu
Introduction
Social Media Definition and Statistics

Definition (Kaplan and Haenlein, 2010)


A group of Internet-based applications that build on the
ideological and technological foundations of Web 2.0

Facebook: the 3rd most visited website
Twitter: the 10th most visited website
Of the remaining 8 websites in the top 10, 4 are search engines

A. M. Kaplan and M. Haenlein, "Users of the world, unite! The challenges and opportunities of Social
Media," Business Horizons, vol. 53, no. 1, pp. 59-68, Jan. 2010.
Social Media in Transportation
A review
Social Media in Transportation
Challenges

Two major challenges


Functional use of Social Media
Data availability
Social Media Functionalities
Score Card
Social Media Data Availability
Score Card
Datasets (1)

Facebook Check-ins
3 months of data collection
City Centre

Multiple queries (150 m radius around each centroid)

Twitter Data
Streaming (real-time data collection)
Historical (users' timelines)

Conventional Travel Survey


Telephone Survey
Foursquare Check-ins
Datasets (2)
Data Collection

Facebook:
7,511 venues
~20,000 check-ins per day

Twitter Data
7,856 geotagged users
~118,000 tweets

~1,000 (geotagged) tweets per day


Temporal Distribution
Percentile Distribution
Temporal Distribution
Correlation
Activities
Distribution
Problem Definition

Topic Modelling limitations for Tweet text


140 characters
Slang

An activity-based corpus is required to identify activities

Matching of POIs on Twitter


Approach

Spatial Fusion -> Textual Fusion -> Enrich Information

Inputs: Facebook venue locations, geotagged tweet locations
Data Preparation - Cleansing
R script for all operations
Packages used: maptools, rgdal, geosphere, sp, rgeos, stringi, stringr
Spatial Fusion: candidate venues per tweet (5)
Text Operations: upper case, Latin, ASCII
Textual Fusion
Spatial Fusion

Find which tweets were posted within 300 meters of each venue
Find the closest candidate venues (5 venues)
Merge data-sets
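
The spatial fusion steps above could be sketched roughly as follows. The slides report an R script (geosphere, sp, rgeos); this is only a Python illustration, and the haversine formula, the function names, the dictionary layout and the sample coordinates are all assumptions rather than the original implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def candidate_venues(tweet, venues, radius_m=300, k=5):
    """Return up to k (distance, venue name) pairs within radius_m, nearest first."""
    scored = []
    for v in venues:
        d = haversine_m(tweet["lat"], tweet["lon"], v["lat"], v["lon"])
        if d <= radius_m:
            scored.append((d, v["name"]))
    scored.sort()
    return scored[:k]

# Hypothetical sample data, not taken from the study.
tweet = {"lat": 40.6401, "lon": 22.9444}
venues = [
    {"name": "VENUE A", "lat": 40.6403, "lon": 22.9446},  # a few tens of metres away
    {"name": "VENUE B", "lat": 40.6500, "lon": 22.9444},  # about 1 km away
]
candidates = candidate_venues(tweet, venues)
```

A brute-force loop like this is quadratic in the number of tweets and venues; for the data volumes reported on the slides a spatial index would probably be preferable, but the filtering logic stays the same.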
Text Operations

Transform all characters to Latin


Remove Special Characters
Remove punctuation
Remove words with fewer than 3 characters
Make all upper case

Example (raw tweet, upper-case transliteration, cleaned text; the Greek characters of the raw tweet did not survive extraction):
Raw: @1055rock [...] http://t.co/1Nh2dfVF6w [...]
Cleaned: 1055ROCK EKPOMPE KOZMEEEE STIS 11 MECHRI TE 1 HTTP 1NH2DFVF6W MONO EPIDEIXE TAPER ECHOUME
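
A minimal sketch of the text operations listed above, written in Python rather than the R (stringi, stringr) used for the original script. The Greek-to-Latin table below is a small illustrative assumption and will not reproduce the exact transliteration scheme of the slides; the function name `clean_text` is hypothetical.

```python
import re
import unicodedata

# Illustrative Greek -> Latin map (an assumption, not the scheme used by
# the original R script).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "i",
    "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
    "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y",
    "φ": "f", "χ": "ch", "ψ": "ps", "ω": "o",
}

def clean_text(text, min_len=3):
    """Transliterate, strip non-alphanumerics, drop short words, upper-case."""
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Map Greek letters to Latin; other characters pass through unchanged.
    latin = "".join(GREEK_TO_LATIN.get(c, c) for c in no_accents)
    # Special characters and punctuation become spaces.
    ascii_only = re.sub(r"[^a-z0-9]+", " ", latin)
    # Drop words with fewer than min_len characters.
    tokens = [t for t in ascii_only.split() if len(t) >= min_len]
    return " ".join(tokens).upper()

cleaned = clean_text("Καλημέρα Θεσσαλονίκη! :)")
```

Replacing punctuation with spaces (instead of deleting it) splits URLs into pieces, which is one plausible reason fragments such as HTTP appear among the cleaned tokens.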
Tweets Text Statistics

Greek to Latin transliteration
Manually remove (Greek) stopwords
Word Frequency

word          freq
https         45363
thessaloniki  18696
greece        3581
love          2330
henrycavill   1517
happy         1500
follow        1453
menfashion    1401
denim         1340
boutique      1147
bar           1143
kinorri       1143
photo         1091
kalimera      1089
day           998
skg           993
please        984
sten          964
nightlife     958
eleni         948
alphatv       891

Figure: log-log plot of word frequency
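
The word counts and the log-log view could be produced along these lines. This is a Python sketch under assumptions: the function names, the stopword set and the sample tweets are hypothetical, and the original analysis was done in R.

```python
from collections import Counter
import math

def word_frequencies(cleaned_tweets, stopwords=frozenset()):
    """Count words over all cleaned tweets, skipping manually listed stopwords."""
    counts = Counter()
    for text in cleaned_tweets:
        counts.update(w for w in text.lower().split() if w not in stopwords)
    return counts

def log_rank_frequency(counts):
    """Return (log10 rank, log10 frequency) pairs, most frequent word first."""
    ranked = counts.most_common()
    return [(math.log10(rank), math.log10(freq))
            for rank, (_, freq) in enumerate(ranked, start=1)]

# Hypothetical cleaned tweets, not taken from the dataset.
tweets = ["kalimera thessaloniki", "love thessaloniki", "thessaloniki bar kai love"]
freq = word_frequencies(tweets, stopwords={"kai"})
points = log_rank_frequency(freq)
```

Plotting `points` gives the rank-frequency curve; on real text such curves are often close to a straight line on log-log axes.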
Textual Fusion

Exact Match (grepl)


Venue Name
Venue Category

Aggregated Venue Category

Approximate Match (text Distance)


Venue Name
Venue Category

Aggregated Venue Category
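
The exact-match step can be pictured as a plain substring test of each venue field against the cleaned tweet text, comparable to R's `grepl` with `fixed = TRUE`. The sketch below is Python, and the field names and sample values are hypothetical.

```python
def exact_match(tweet_text, venue):
    """Flag which venue fields occur verbatim in the (cleaned) tweet text."""
    fields = ("name", "category", "aggregated_category")
    return {f: bool(venue[f]) and venue[f] in tweet_text for f in fields}

# Hypothetical cleaned tweet and candidate venue.
tweet_text = "KALIMERA APO TO ROCK BAR"
venue = {"name": "ROCK BAR", "category": "BAR", "aggregated_category": "NIGHTLIFE"}

matches = exact_match(tweet_text, venue)
has_any_match = any(matches.values())
```

Substring matching is permissive: a short category such as BAR also matches inside longer words, so word-boundary matching may be worth considering.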


Textual Fusion
Exact Match

Total: 54,319; with at least one match: 12,374


Textual Fusion
Approximate Match

Test words from Facebook venues against the tweet text


Normalized Levenshtein distance (number of edits needed
to turn one word into the other, computed with adist)

adist / nchar(FBexamined)

Restricted to strings with nchar > 3


Threshold = 0.3
adist(Kafe, cafe) = 1, normalized = 1 / 4 = 0.25
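
A small Python sketch of the normalized distance described above (the slides use R's `adist`). The dynamic-programming Levenshtein function stands in for `adist`; treating the threshold as inclusive and applying the length restriction to the Facebook word only are assumptions, as are the function names.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_distance(venue_word, tweet_word):
    """Edit distance divided by the length of the Facebook venue word."""
    return levenshtein(venue_word, tweet_word) / len(venue_word)

def approximate_matches(venue_words, tweet_text, threshold=0.3, min_chars=4):
    """Pairs (venue word, tweet word) whose normalized distance <= threshold."""
    hits = []
    tweet_words = tweet_text.split()
    for vw in venue_words:
        if len(vw) < min_chars:  # mirrors the nchar > 3 restriction
            continue
        for tw in tweet_words:
            if normalized_distance(vw, tw) <= threshold:
                hits.append((vw, tw))
    return hits

distance = levenshtein("Kafe", "cafe")
ratio = normalized_distance("Kafe", "cafe")
hits = approximate_matches(["KAFE", "BAR"], "PAME CAFE SIMERA")
```

With a threshold of 0.3, a four-letter word tolerates one edit (0.25) but not two (0.5), which shows how strongly the normalization depends on word length.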
Textual Fusion
Approximate Match

Total: 54,319; with at least one match: 27,524

Approximate Match (text Distance)


Conclusions

Work in progress
Not all (geotagged) tweets are related to activities
Activity assignment to tweets using Facebook venue data
A start for data fusion (spatial and textual)
Quite promising results
Future Work

Topic Modelling on Activity corpus (from identified Twitter activities)

Use of other data sources (POIs, Foursquare, City)

Connection to real world (questionnaires)


