
MACHINE LEARNING

AND
PREDICTIVE MODELING

Prepared and presented by :

Rabi Kulshi

Contents

1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
   - Supervised Learning
     - Regression
     - Classification
   - Unsupervised Learning
     - Clustering
4. Implementation
   - Regression Model
     - Using the Normal Equation
     - Using Gradient Descent
   - Classification Model
     - Using Logistic Regression
     - Using Naive Bayesian Conditional Probability
   - Clustering
     - Using K-Means

Introduction
Why Machine Learning?

- My first thought: is it because humans can't learn any more?
- The Volume, Velocity and Variety of information drove the need for machine learning
- Big Data, Hadoop and analytics challenges
- It was easy for a human to estimate a house price based on 3-4 features like location, size, age and interest rate
- Imagine when you have to understand the 30-40 factors that drive our users' decision to buy an image
- Imagine when you have to detect a fraud attempt that has 100-300 features (attributes) and attempts are arriving at 100 messages/sec

What is Machine Learning?

As defined in Wikipedia (Tom M. Mitchell): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

[Diagram: successfully labeled observations (experiences) are used to train the machine; incoming new events, data sets and observations form the task; performance is measured as the % of correct classification.]
The Learning Model
with continuous knowledge update

[Diagram: successfully labeled observations (experiences) train the model; an incoming new event X = {x1, x2, ..., xn} is fed to the learned function f(x), which produces the outcome y; performance (% of correct classification) drives a knowledge-update loop.]

Knowledge updates can be:
- Based on a sample
- Based on a filter
- Real time or batch
- Supervised
Types of Machine Learning

Supervised Learning
- In supervised learning a supervisor provides a set of labeled data, also known as experience, that is used to train the machine to build knowledge (i.e. a model/function)
- Example: will a user purchase Image A? (Yes or No)

Unsupervised Learning
- Find some structure or groups in the data; also known as a clustering technique
- Examples: news groups, cohesive groups in Facebook, customer groups (customer age, geo location, profession, social group, ...)
Examples of Supervised Learning
where the outcome is a continuous variable

- The outcome or yield (Y) is a continuous variable, i.e. we are predicting a numeric value
- Yes, this is a regression problem
- Examples: revenue from on-line sales for next year; concurrent active user sessions

Here y is the outcome, which depends on m features or attributes x1, x2, ..., xm:

    y = f(x1, x2, ..., xm)
Examples of Supervised Learning
where the outcome is a discrete variable

- The outcome or yield is a discrete variable
- Got it, this is a classification problem
- Outcome with two classes
  - Example: here the outcome Y will have one of two values: Yes or No, True or False
- Outcome with multiple classes
  - Example: classify an email's tone as Threat, Anger, Appreciation, Unhappy, ...; i.e. the value of Y is either Threat or Anger or Appreciation
Regression Model, Visual Analysis

[Scatter plot: the x-axis is passion for photography (feature, independent variable); the y-axis is on-line purchases of images per year, 0 to 4k (dependent variable, yield, result). The points suggest a straight-line fit.]

The fitted line is the simple linear model

    y = f(x) = β0 + β1·x,   with parameter vector β = (β0, β1)^T
Classification Model, Visual Analysis

[Scatter plot: the x-axis is time spent to search and make a decision, 0 to 30 minutes (feature); the y-axis is purchase amount in $, 0 to 200 (feature). A decision boundary separates the two classes, and each observation's distance from the decision boundary is marked.]
Clustering Model, Visual Analysis

[Scatter plot: the x-axis is customer age, 20 to 70; the y-axis is the type of image purchased. The points form five clusters, labeled a through e.]
Key considerations and concepts for the implementation
and development of Machine Learning models

- Visualize your data to identify features, understand the problem and determine the model
- Carefully analyze the standard deviation or the percentage of false positives and false negatives
- Set users' expectations on the probability of a false conclusion; this is very important
- Keep a balance between bias caused by underfitting and variance caused by overfitting
- You may decide to use Training, Validation and Test (60%, 20%, 20%) steps
- Apply a regularization process and select the regularization parameter
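The 60/20/20 split suggested above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides; the function name and the fixed seed are my own choices:

```python
import numpy as np

def three_way_split(X, y, seed=0):
    """Shuffle the data, then cut it into 60% training,
    20% validation and 20% test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

X = np.arange(20).reshape(10, 2)     # 10 observations, 2 features
y = np.arange(10)
train, val, test = three_way_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))   # 6 2 2
```

Shuffling before splitting matters: if the data is ordered (say, by date), an unshuffled split would train and test on systematically different populations.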
A Few Machine Learning Techniques

Machine Learning
- Supervised Learning
  - Regression Model
    - Normal Equation
    - Gradient Descent
  - Classification Model
    - Logistic Regression (Gradient Descent)
    - Naive Bayesian
    - Support Vector Machine
- Unsupervised Learning
  - Clustering Model
    - K-Means
    - Bisecting K-Means
Let's take a break for questions

Linear Regression Model
Introduction to the linear model

Building up from a straight line to n features:

    y = f(x) = a + b·x
    f(x) = a + b1·x1
    f(x) = a + b1·x1 + b2·x2
    f(x) = b0·x0 + b1·x1 + b2·x2,   where b0 = a and x0 = 1
    f(x) = b0·x0 + b1·x1 + b2·x2 + ... + bn·xn

y is the outcome, which depends on the n features x1, x2, ..., xn, where bi is the parameter of xi.
Linear Regression Model
Compact and matrix representation of the linear model

    f(x) = β0·x0 + β1·x1 + β2·x2 + ... + βn·xn
         = Σ_{i=0}^{n} βi·xi
         = β^T X

where x0 = 1 and

    β = [β0, β1, ..., βn]^T,   X = [x0, x1, ..., xn]^T
Linear Regression
Find the beta values using the Normal Equation

Solve analytically using the well-known Normal Equation method:

    β = (X^T X)^(-1) X^T y

- Well-known methodology
- Easy to implement
- Computationally inexpensive for a small to medium number of features (<1000)
- No need to select the learning rate that Gradient Descent requires
- Pay attention to matrix-inversion issues in case of linear dependency among the features
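As a minimal NumPy sketch of the normal-equation method, assuming a design matrix whose first column is the bias term x0 = 1 (the helper name is illustrative, not from the slides):

```python
import numpy as np

def normal_equation(X, y):
    """Estimate beta = (X^T X)^-1 X^T y for an m x (n+1) design
    matrix X whose first column is all ones (the bias term x0 = 1)."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

# Noise-free data generated from y = 2 + 3x, so the fit is exact.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # prepend the bias column
y = 2.0 + 3.0 * x
beta = normal_equation(X, y)
print(beta)   # ~[2. 3.]
```

In practice `np.linalg.pinv` (or `np.linalg.solve`) is preferred over an explicit inverse; the pseudo-inverse also sidesteps the matrix-inversion issue noted above when features are linearly dependent.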


Linear Regression
Find the beta values using Gradient Descent

Cost function with m observations:

    C(β) = (1/2m) · Σ_{i=1}^{m} (f(x^(i)) − y^(i))²

Repeatedly update every parameter simultaneously, where α is the learning rate:

    βj := βj − α · ∂C(β)/∂βj
        = βj − α · (1/m) · Σ_{i=1}^{m} (f(x^(i)) − y^(i)) · xj^(i),   j = 0, 1, ..., n

Now iterate through the above formula to optimize the cost function. Stop iterating when the cost function does not change significantly between iterations.

Let's understand the gradient descent technique using the example of an ant who wants to reach the bottom of the hat.
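The update rule and stopping criterion above can be sketched as follows. This is a minimal batch-gradient-descent illustration with illustrative names and hyperparameters, fitting the same toy data as the normal-equation example:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-12, max_iter=10000):
    """Minimise C(beta) = (1/2m) * sum_i (X beta - y)_i^2 by repeating
    beta_j := beta_j - alpha * dC/dbeta_j until the cost stops
    changing significantly between iterations."""
    m = X.shape[0]
    beta = np.zeros(X.shape[1])
    cost = np.inf
    for _ in range(max_iter):
        residual = X @ beta - y                  # f(x^(i)) - y^(i)
        new_cost = residual @ residual / (2 * m)
        if abs(cost - new_cost) < tol:           # stopping criterion
            break
        cost = new_cost
        beta -= alpha * (X.T @ residual) / m     # simultaneous update of all beta_j
    return beta

x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])        # bias column x0 = 1
y = 2.0 + 3.0 * x
beta_gd = gradient_descent(X, y)
print(beta_gd)   # converges near [2. 3.]
```

Unlike the normal equation, the learning rate α must be chosen: too large and the cost diverges, too small and convergence is slow.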
Classification Model
Solution using Logistic Regression

Classification model
- Y, the outcome, can take one of two values: 0 = negative (No), 1 = positive (Yes)
- Examples: email spam, email threat, fraudulent transaction
- Multi-class classification problem: the tone of a message

The approach
- Compute the decision boundary based on the training data; that boundary is then used to classify an observation as Yes or No depending on which side of the decision boundary the observation falls
- Depending on the dimension (i.e. the number of features), the decision boundary will be a line, a plane or a hyperplane (one degree less than the number of features)
[Scatter plot, as in the earlier visual analysis: the x-axis is time spent to search and make a decision in minutes (feature); the y-axis is purchase amount in $ (feature). A decision boundary separates the two classes, and each observation's distance from the decision boundary is marked.]
Logistic Regression Model

The model:

    f(x) = g(β^T x),   where   g(z) = 1 / (1 + e^(−z))

so that

    f(x) = 1 / (1 + e^(−β^T x)),   0 ≤ f(x) ≤ 1

- The function g is known as the sigmoid function
- With it, the cost function becomes a convex function
- Decision boundary:
  - Predict Y = 1 if g(z) ≥ 0.5, i.e. z = β^T x ≥ 0
  - Predict Y = 0 if g(z) < 0.5, i.e. z = β^T x < 0
- Now we need to optimize the cost function for logistic regression, estimating β using gradient descent
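The model and decision rule above can be sketched in NumPy. This is a minimal illustration on hypothetical, linearly separable data (function names and hyperparameters are my own); the gradient of the logistic cost happens to take the same form as linear regression's:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^-z), squashing any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, alpha=0.5, n_iter=5000):
    """Estimate beta by gradient descent on the logistic cost;
    the gradient is (1/m) * X^T (g(X beta) - y)."""
    m = X.shape[0]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta -= alpha * X.T @ (sigmoid(X @ beta) - y) / m
    return beta

# Separable toy data: low x -> class 0, high x -> class 1.
x = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
X = np.column_stack([np.ones_like(x), x])       # bias column x0 = 1
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
beta = logistic_fit(X, y)
pred = (sigmoid(X @ beta) >= 0.5).astype(int)   # predict Y=1 iff beta^T x >= 0
print(pred)   # [0 0 0 1 1 1]
```

Note the prediction threshold g(z) ≥ 0.5 is exactly the condition β^T x ≥ 0 from the slide, so classification only needs the sign of β^T x.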
Classification Model
Solution using the Bayesian model

- Based on conditional probability
- It assumes the features are statistically independent variables
- We find the probability for a given observation to be classified as class ci, for i = 1, 2, ..., n
- The observation is classified as the type/category for which this conditional probability is maximum

By Bayes' rule:

    p(ci | x1, x2, ..., xn) = p(x1, x2, ..., xn | ci) · p(ci) / p(x1, x2, ..., xn)

With the independence assumption, p(x1, x2, ..., xn | ci) = p(x1 | ci) · p(x2 | ci) · ... · p(xn | ci), so

    p(ci | x1, x2, ..., xn) = p(x1 | ci) · p(x2 | ci) · ... · p(xn | ci) · p(ci) / p(x1, x2, ..., xn)
Clustering Model with K-Means

[Scatter plot, as in the earlier visual analysis: the x-axis is customer age, 20 to 70; the y-axis is the type of image purchased. The points form five clusters, labeled a through e.]
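The K-Means procedure behind the plot above can be sketched as follows: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat. The customer data here is hypothetical, and this plain version ignores edge cases such as empty clusters:

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-Means: alternate nearest-centroid assignment with
    recomputing each centroid as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # (m, k) matrix of point-to-centroid Euclidean distances
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):   # assignments stable
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers as (age, image type), with two clear groups.
pts = np.array([[20.0, 1], [22, 1], [25, 2], [60, 8], [62, 9], [65, 8]])
labels, centroids = kmeans(pts, k=2)
print(labels)
```

Because the result depends on the random initial centroids, K-Means is usually run several times and the clustering with the lowest within-cluster distance is kept.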

Let's take some questions
