Data Grabbing
Analysis
July-August, 2017
OUTLINES
2 Data Grabbing
3 Analysis
Hosting Organization
Cutting-edge mobile solutions built with the latest worldwide technologies
Part of Proxym-Group
Objective
Detecting clients' intent across several topics.
Foundations
Relies heavily on data collected from social media.
2 Data Grabbing
Architecture
Filtering
3 Analysis
architecture
Possible Approaches
Real-time Processing
Delayed Processing
Real-time processing
Principle
Detect the intent of each message as soon as it arrives.
Technologies
Relies heavily on Apache Spark
Advantages
1 Ensures scalability
2 Is fast
3 Allows complex machine learning algorithms to be integrated
Architecture
Delayed processing
Principle
Wait until a sufficient amount of information has been collected before detecting intent.
Technologies
Relies heavily on Apache Kafka
Advantages
1 Ensures scalability
2 Allows the model to learn from more data
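The difference between the two approaches can be sketched in plain Python, without Spark or Kafka. This is only an illustration under stated assumptions: `detect_intent` is a hypothetical stand-in for the real model, and `batch_size` is an assumed parameter, not a value from the project.

```python
def detect_intent(message):
    """Hypothetical stand-in for the real model: flags messages mentioning 'buy'."""
    return "buy" in message.lower()

def process_realtime(stream):
    """Real-time style: classify each message as soon as it arrives."""
    for message in stream:
        yield message, detect_intent(message)

def process_delayed(stream, batch_size=3):
    """Delayed style: buffer messages and classify them one batch at a time."""
    batch = []
    for message in stream:
        batch.append(message)
        if len(batch) == batch_size:
            yield [(m, detect_intent(m)) for m in batch]
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield [(m, detect_intent(m)) for m in batch]

messages = ["I want to buy a car", "hello", "nice day", "where to buy tires"]
realtime = list(process_realtime(messages))
batches = list(process_delayed(messages))
```

Here `realtime` holds one result per message, while `batches` groups the same results into chunks of up to `batch_size` messages.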
Our choice
Filtering
1 Based on topics
2 Each topic is defined by a group of keywords
3 Example
If we were to predict a car purchase, then the topic would be "car" and the keywords would be all possible words related to the word "car".
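A minimal sketch of keyword-based topic filtering, assuming each topic is stored as a set of lowercase keywords. The `TOPICS` table, its keywords, and the `match_topics` helper are hypothetical illustrations, not the project's actual filter.

```python
import re

# Hypothetical topic table: each topic maps to a set of related keywords.
TOPICS = {
    "car": {"car", "cars", "vehicle", "dealership", "sedan", "suv"},
}

def match_topics(post, topics=TOPICS):
    """Return the sorted names of topics whose keywords appear in the post."""
    words = set(re.findall(r"[a-z0-9']+", post.lower()))
    return sorted(name for name, keywords in topics.items() if words & keywords)

posts = [
    "Thinking about buying a new car next month",
    "Great weather today",
]
# Keep only the posts that match at least one topic.
kept = [p for p in posts if match_topics(p)]
```

Matching on whole words rather than substrings avoids false hits such as "scar" for the keyword "car".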
3 Analysis
Preprocessing
Training
Data mining
Text cleansing
1 Stop word removal
2 Rare word removal
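The two cleansing steps above can be sketched as follows. The stop-word list and the `min_count` threshold are assumptions chosen for illustration; the original does not state which list or threshold was used.

```python
from collections import Counter

# Assumed stop-word list (tiny, for illustration only).
STOP_WORDS = {"a", "an", "the", "i", "to", "is", "and", "of"}

def cleanse(documents, min_count=2):
    """Tokenize, drop stop words, then drop words seen fewer than
    min_count times across the whole corpus."""
    tokenized = [
        [w for w in doc.lower().split() if w not in STOP_WORDS]
        for doc in documents
    ]
    counts = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if counts[w] >= min_count] for doc in tokenized]

docs = ["I want to buy a car", "the car is expensive", "buy the red car"]
cleaned = cleanse(docs)
```

With this toy corpus only "buy" and "car" occur at least twice, so every other word is removed as rare.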
Features extraction
Term frequency, downscaled for words that occur in many documents (TF-IDF-style weighting)
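A minimal sketch of this weighting, assuming the downscaled term frequency is the common TF-IDF form, with weight = (term count / document length) × log(N / document frequency). The exact variant used in the project is not stated, so the formula and the `tf_idf` helper are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (term count / document length) * log(N / document frequency)
    """
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["buy", "car"], ["car", "expensive"], ["buy", "red", "car"]]
weights = tf_idf(docs)
```

Because "car" appears in every document, its weight drops to zero, while a word found in a single document, such as "expensive", keeps a high weight.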
Algorithms
Naive Bayes
Support Vector Machine
Decision Tree
Stochastic Gradient Descent
Random Forest
Neural Network
Naive Bayes
Assumption
Each feature is independent of the others, given the class
Foundation
$\text{Posterior} = \dfrac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}}$
Accuracy
80%
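The Bayes rule above can be illustrated with a small multinomial Naive Bayes classifier. This is a sketch under assumptions, not the implementation behind the reported accuracy: the training samples, the labels, and the Laplace (add-one) smoothing are all chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tokens, label) pairs.
    Returns class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Pick the class maximizing log prior + sum of log likelihoods.
    The evidence term is the same for every class, so it is omitted."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            # Laplace-smoothed log likelihood of each word given the class.
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([
    (["buy", "car"], "intent"),
    (["want", "new", "car"], "intent"),
    (["nice", "weather"], "no_intent"),
    (["weather", "today"], "no_intent"),
])
```

Summing per-word log likelihoods is exactly where the independence assumption enters: each word contributes to the score separately from the others.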
Assumption
Linear problem
Foundation
Maximise the entropy
Accuracy
78%
Support Vector Machine
Assumption
Linear problem
Foundation
Maximise the distance (the margin) between the hyperplane and the nearest training data
Accuracy
81%
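The margin idea above can be made concrete by measuring, for a given hyperplane, its distance to the nearest training point. The sketch below only compares two hand-picked candidate hyperplanes. A real solver searches over all hyperplanes by optimization, and the points and candidates here are invented for illustration.

```python
import math

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    )

# Two positive and two negative examples (illustrative data).
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -1.5)]

# Candidate separating hyperplanes as (w, b) pairs.
candidates = {
    "diagonal": ((1.0, 1.0), 0.0),  # x + y = 0
    "vertical": ((1.0, 0.0), 0.0),  # x = 0
}
best = max(candidates, key=lambda name: margin(*candidates[name], points))
```

Both candidates separate the two groups, but the diagonal one lies farther from its nearest point, so it is the one a maximum-margin criterion prefers.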
Neural network
Assumption
Universal Approximation :
Accuracy
82%