
Context of the Project

Data Grabbing
Analysis

TUNISIA POLYTECHNIC SCHOOL

Summer Project Presentation

Client Intent Detection for Advanced Services: A Big Data Solution

Carried out by: Tarek BEN CHARRADA
3rd-year engineering student
Supervised by: Mr. Marouan OMEZZINE
Entrepreneur and Technology consultant
Vis-a-Vis: Dr. Takoua ABDELLATIF
Prof. Computer Technology

July-August, 2017

1/23 Tarek BEN CHARRADA (Tunisia Polytechnic School) Proxym Group



OUTLINE

1 Context of the Project

2 Data Grabbing

3 Analysis


1 Context of the Project
Hosting Organization
Presentation of the Project

2 Data Grabbing

3 Analysis


Hosting Organization

Mobile applications and web-based solutions

More than 170 projects and applications developed in-house

Cutting-edge mobile solutions built with the latest technologies

Part of Proxym Group


Presentation of the Project

Objective
Detect clients' intent across several topics.

Foundations
Relies heavily on data collected from social media.


1 Context of the Project

2 Data Grabbing
Architecture
Filtering

3 Analysis


Architecture

[Figure: architecture diagram]


Possible Approaches

Real-time Processing
Delayed Processing


Real-time processing

Principle
Detect the intent as soon as each message arrives.

Technologies
Relies heavily on Apache Spark

Advantages
1 Ensures scalability
2 Is fast
3 Allows the integration of complex machine learning algorithms
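
A minimal, framework-free sketch of the real-time idea: every message is classified the moment it arrives. The `detect_intent` rule and the sample messages are hypothetical stand-ins, and plain Python replaces Apache Spark here purely for illustration.

```python
from typing import Callable, Iterable, Iterator, Tuple


def detect_intent(message: str) -> bool:
    """Hypothetical placeholder rule: flag messages that mention buying."""
    return "buy" in message.lower()


def process_stream(
    messages: Iterable[str],
    classify: Callable[[str], bool] = detect_intent,
) -> Iterator[Tuple[str, bool]]:
    """Yield a (message, has_intent) pair immediately for every incoming message."""
    for message in messages:
        yield message, classify(message)


# Hypothetical incoming messages
stream = iter(["I want to buy a car", "Nice weather today"])
results = list(process_stream(stream))
```

Because `process_stream` is a generator, a result becomes available as soon as its message has been read, without waiting for the rest of the stream.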


Architecture

[Figure: architecture diagram]


Delayed processing

Principle
Wait until a sufficient amount of data has been collected before processing it.

Technologies
Relies heavily on Apache Kafka

Advantages
1 Ensures scalability
2 Allows the model to learn from more data
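
A minimal sketch of the delayed idea, assuming a simple in-memory buffer in place of Apache Kafka: messages are accumulated and only handed on once enough of them have been collected. The `batches` helper and the `batch_size` threshold are hypothetical.

```python
from typing import Iterable, Iterator, List


def batches(messages: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Buffer incoming messages and release them only in groups of batch_size."""
    buffer: List[str] = []
    for message in messages:
        buffer.append(message)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush whatever is left when the stream ends
        yield buffer


# Hypothetical incoming messages
collected = list(batches(["m1", "m2", "m3", "m4", "m5"], batch_size=2))
```

Each released batch could then be passed to a training or prediction step, which is where the model gets to see more data at once.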


Our choice


Filtering

Filtering
1 Based on topics
2 Each topic is defined by a group of keywords
3 Example: if we were to predict a car purchase, the topic would be "car" and the keywords would be all the words related to the word "car".
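
The keyword-based filter can be sketched as follows. The `TOPICS` table, its keywords, and the sample posts are hypothetical; the slides do not list the actual keyword groups.

```python
import re
from typing import Dict, List, Set

# Hypothetical topic definitions: each topic is a set of related keywords.
TOPICS: Dict[str, Set[str]] = {
    "car": {"car", "cars", "engine", "dealership", "sedan"},
}


def matching_topics(text: str, topics: Dict[str, Set[str]] = TOPICS) -> List[str]:
    """Return the topics whose keyword set shares at least one word with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [name for name, keywords in topics.items() if words & keywords]


# Hypothetical social media posts
posts = ["Looking at a new sedan this weekend", "Best pizza in town"]
kept = [post for post in posts if matching_topics(post)]
```

Only posts that match at least one topic are kept for the analysis stage; everything else is discarded early.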


1 Context of the Project

2 Data Grabbing

3 Analysis
Preprocessing
Training


Data mining

Text cleansing
1 Stop word removal
2 Rare word removal

Feature extraction
Term frequency, downscaled
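
A minimal sketch of these preprocessing steps in plain Python. It assumes that "downscaled" refers to TF-IDF-style weighting, which the slides do not state explicitly. The stop-word list, the `min_count` threshold, and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"a", "the", "to", "i", "is"}  # hypothetical, deliberately tiny list


def clean(docs: List[str], min_count: int = 2) -> List[List[str]]:
    """Tokenise, drop stop words, then drop words seen fewer than min_count times overall."""
    tokenised = [[w for w in d.lower().split() if w not in STOP_WORDS] for d in docs]
    totals = Counter(w for doc in tokenised for w in doc)
    return [[w for w in doc if totals[w] >= min_count] for doc in tokenised]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term frequency by log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return [
        {w: count * math.log(n_docs / doc_freq[w]) for w, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical sample documents
docs = ["I want to buy a car", "the car is red", "I want a holiday"]
cleaned = clean(docs)
weights = tf_idf(cleaned)
```

Words that appear in every document get a weight of zero under this scheme, which is the sense in which frequent but uninformative terms are downscaled.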


Algorithms

Naive Bayes
Support Vector Machine
Decision Tree
Stochastic Gradient Descent
Random Forest
Neural Network


Naive Bayes

Assumption
Each feature is independent of the others, given the class

Foundation
\[ \text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}} \]

Accuracy
80%
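
A small worked sketch of the Bayes rule for text, assuming a multinomial model with Laplace smoothing (the slides do not say which variant was used). The training samples and labels are hypothetical toy data and are unrelated to the reported accuracy. The evidence term is omitted because it is the same for every class and does not change which class wins.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(samples: List[Tuple[List[str], str]]):
    """Count class frequencies (prior) and per-class word frequencies (likelihood)."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(words: List[str], model) -> str:
    """Pick the class with the highest log(prior) + sum of log(likelihood)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        score = math.log(count / total)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set
data = [
    (["want", "buy", "car"], "intent"),
    (["need", "new", "car"], "intent"),
    (["nice", "weather"], "no_intent"),
    (["car", "crash", "news"], "no_intent"),
]
model = train(data)
label = predict(["buy", "new", "car"], model)
```

The per-word likelihoods are simply multiplied (added in log space), which is exactly where the independence assumption is used.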


Stochastic Gradient Descent

Assumption
Linear problem

Foundation
Maximise the entropy

Accuracy
78%
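
A minimal sketch of stochastic gradient descent on a linear classifier. The slides do not say which loss was optimised; this example assumes the logistic (log) loss. The learning rate, epoch count, and toy data are hypothetical and unrelated to the reported accuracy.

```python
import math
import random
from typing import List, Sequence, Tuple


def sgd_logistic(
    data: List[Tuple[List[float], int]],
    lr: float = 0.5,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by updating on one shuffled sample at a time."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def classify(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Return 1 if the point lies on the positive side of the linear boundary."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical, linearly separable toy data: label 1 when the first feature dominates
toy = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
w, b = sgd_logistic(toy)
predictions = [classify(x, w, b) for x, _ in toy]
```

Each update uses a single sample rather than the whole data set, which is what makes the method "stochastic" and cheap per step.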


Support Vector Machine

Assumption
Linear problem

Foundation
Maximise the margin: the distance between the separating hyperplane and the nearest training points

Accuracy
81%
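
To illustrate the margin criterion, the sketch below computes the distance from a hyperplane to its nearest training point for two candidate hyperplanes. The `geometric_margin` helper, the toy points, and both hyperplanes are hypothetical; no actual SVM solver is involved.

```python
import math
from typing import List, Sequence, Tuple


def geometric_margin(
    w: Sequence[float], b: float, data: List[Tuple[Sequence[float], int]]
) -> float:
    """Smallest signed distance y * (w.x + b) / ||w|| over all points (labels are +1/-1)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, y in data)


# Hypothetical toy data with labels in {+1, -1}
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Two candidate hyperplanes that both separate the data
margin_a = geometric_margin((1.0, 1.0), -2.0, points)  # line x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.5, points)  # line x1 = 1.5
```

Both lines classify every point correctly, but the first leaves more room around the nearest points, so it is the one a maximum-margin criterion would prefer.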


Neural network

Assumption
Universal approximation

Accuracy
82%


THANK YOU FOR YOUR ATTENTION
