
ME6111D Machine Learning and Artificial Intelligence
Vinay V Panicker
vinay@nitc.ac.in
Contact: 9447597959
Course Outcomes:

• CO1: Illustrate the basic concepts of classification and regression and their applications in predictive analytics
• CO2: Comprehend the features of programming tools for data analysis and develop skills to code algorithms
• CO3: Apply the techniques of well-known supervised and unsupervised learning algorithms
• CO4: Explore the application of machine learning algorithms and Artificial Intelligence techniques for solving practical problems
Syllabus
• Module 1: (14 hours)
• Relation between Machine Learning and Statistics. Introduction to Algorithms in
Machine Learning – Classification, Supervised machine learning – linear regression,
Multiple linear regression, Logistic regression – Model representation, Discriminant
Analysis, Classification Trees, Support Vector Machine.
• Module 2: (11 hours)
• Introduction to unsupervised learning - Clustering – types of clustering,
Dimensionality Reduction, Principal Component Analysis algorithm, Factor analysis.
• Module 3: (14 hours)
• Era of Intelligent Systems - The Fourth Industrial Revolution Impact, The Technology
of the Fourth Industrial Revolution, Introduction to Artificial Intelligence and Cognition.
Application of artificial intelligence (AI) techniques: Meta-heuristics: Genetic
Algorithm, Scatter Search, Tabu Search, Particle Swarm Intelligence, Ant Colony
Optimization; Artificial Neural Networks; Fuzzy Logic Systems; Case based
reasoning.
References
• J.F. Hair, W.C. Black, B.J. Babin, and R.E. Anderson, Multivariate Data Analysis. 7th Edn. Pearson New International, 2015.
• T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. 2nd Edn. New York: Springer, 2017.
• E. Rich, K. Knight, S.B. Nair, Artificial Intelligence. 3rd Edn. Tata McGraw-Hill, 2012.
• M. Gardener, Beginning R: The Statistical Programming Language. Wiley India Publication, 2012.
• R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis. 6th Edn. Pearson New International, 2015.
• J.S. Hurwitz, M. Kaufman, and A. Bowles, Cognitive Computing and Big Data Analytics. Wiley, 2015.
• M. Skilton and F. Hovsepian, The 4th Industrial Revolution. Palgrave Macmillan, 2017.
Selected References

• J.F. Hair, W.C. Black, B.J. Babin, and R.E. Anderson, Multivariate Data Analysis. 7th Edn. Pearson New International, 2015.
• U.D. Kumar, Business Analytics – The Science of Data-Driven Decision Making. 1st Edn. India: Wiley, 2017.
• M. Pradhan and U.D. Kumar, Machine Learning using Python. 1st Edn. India: Wiley, 2019.
• A. Srinivasaraghavan and V. Joseph, Machine Learning. 1st Edn. India: Wiley, 2019.
Module 1 Topics

 Relation between Machine Learning and Statistics.


 Analytics – different types
 Correlation analysis – Pearson correlation coefficient, Spearman rank
correlation, Point bi-serial correlation, Phi coefficient
 Supervised machine learning - Simple Linear Regression and
Multiple Linear Regression
 Logistic Regression
 Decision Trees
 Discriminant Analysis
 Support Vector Machine.
Module 2 Topics

 Unsupervised learning - Clustering – types of clustering,


 Dimensionality Reduction
 Principal Component Analysis algorithm
 Factor analysis.
Module 3 Topics

 Era of Intelligent Systems - The Fourth Industrial Revolution Impact, The Technology of
the Fourth Industrial Revolution,
 Introduction to Artificial Intelligence and Cognition.
 Application of artificial intelligence (AI) techniques: Meta-heuristics:
 Genetic Algorithm,
 Scatter Search,
 Tabu Search,
 Particle Swarm Intelligence,
 Ant Colony Optimization;
 Artificial Neural Networks; Fuzzy Logic Systems;
 Case based reasoning.
Evaluation Policy

Component                                 Marks distribution  Remarks
Midterm/Interim Test                      20%                 Portions will be informed in the class
Assignments / Tutorials /                 30%                 Practice sessions and Tutorials (25%),
Course Project / Quizzes                                      Course project (50%), Moodle Quiz(zes) (25%)
End Exam                                  50%                 Coding Exam (10 marks), Written Exam (40 marks)

Minimum mark required for a pass grade: 40 marks out of 100.
Practice Sessions and Tutorials

Sl. No.  Session                                            Hours  Module
1        Introduction to R and Python Languages             2
2        Linear and Multiple Linear Regression              1.5    Module 1
3        Logistic Regression, Support Vector Machine, KNN   2      Module 1
4        Decision Tree, Random Forest                       1.5    Module 1
5        K-means clustering, Principal Component Analysis   1.5    Module 2
6        AI - Artificial Neural Network                     1.5    Module 3
7        Reinforcement Learning                             1.5    Module 3
Why ML&AI?
• "Data Scientist: The Sexiest Job of the Twenty-First Century"
  - Title of a Harvard Business Review article by Thomas Davenport and D.J. Patil
Why ML&AI?
Responsibilities for a Machine Learning Engineer

• Study and transform data science prototypes


• Design machine learning systems
• Research and implement appropriate ML algorithms and tools
• Develop machine learning applications according to
requirements
• Select appropriate datasets and data representation methods
• Run machine learning tests and experiments
• Perform statistical analysis and fine-tuning using test results
• Train and retrain systems when necessary
• Keep abreast of developments in the field
Why ML&AI?
Skill set expected for a Machine Learning Engineer
• Understanding of data structures, data modeling and software
architecture
• Ability to write robust code in languages such as Python, R, and Java
• Proficiency in big data tools such as Hadoop and Spark
• Proficiency in basic libraries for analytics/machine learning such as
scikit-learn and pandas
• Proficiency with a deep learning framework such as TensorFlow or
Keras
• Expertise in visualizing and manipulating big datasets
• Ability to select hardware to run an ML model with the required
latency
What do you think?
1. Which hospital or doctor should be selected for a medically better treatment?
2. Could an insurer anticipate the risk of accepting a careless customer and price the insurance premium accordingly?
3. Could creditors detect bogus bills filed under a wrong name?
4. Could a financial adviser warn me that a property is overvalued?
5. When booking a flight, can I determine whether the airfare is going to drop?
6. Can a business assess the risk of changing economic conditions and growing competition?
But what do they all have in common?
They all have in common… Prediction
Prediction is power.
Making predictions poses a tough challenge. Each prediction depends on multiple factors.
The challenge is tackled by a systematic, scientific means to develop and continually improve predictions – to literally learn to predict.
The solution is machine learning: computers automatically develop new knowledge and capabilities by furiously feeding on modern society's greatest and most potent unnatural resource – DATA.
Quotes

• "In God we trust; all others must bring data." – W. Edwards Deming
• "Learning is not compulsory... neither is survival." – W. Edwards Deming
• "Experience by itself teaches nothing... Without theory, experience has no meaning. Without theory, one has no questions to ask. Hence, without theory, there is no learning." – W. Edwards Deming
[Venn diagram: Machine Learning at the intersection of Computer Science, Engineering, and Mathematics and Statistics]
Definitions of Machine Learning

• Machine Learning (ML) at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction.
• McKinsey & Company states that "ML is based on algorithms that can learn from data without relying on rules-based programming."
• ML lies at the intersection of Computer Science, Engineering, and Statistics.
Journey from Statistics to ML

• Machine Learning (ML) is a branch of study in which a model learns automatically from experience (data), without being explicitly specified the way statistical models are.
• Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation and organization of numerical data.
• Machine Learning is a branch of computer science that learns from past experience and uses that knowledge to make future decisions.
• ML is an intersection of Computer Science, Engineering, and Statistics.
Journey from Statistics to ML

Statistics is broadly divided into:
 Descriptive Statistics
 Inferential Statistics
Journey from Statistics to ML

Business Analytics is broadly divided into:
 Descriptive Analytics
 Predictive Analytics
 Prescriptive Analytics
Relation between Machine Learning and Statistics
 Machine Learning is …
 an algorithm that can learn from data without relying on rules-based programming.
 Its aim is to generalize a detectable pattern or to create an unknown rule from given data.
Relation between Machine Learning and Statistics
 Statistical Modelling is …
 the formalization of relationships between variables in the form of mathematical equations.
 Machine learning and statistical modelling are two different branches of predictive modelling.
 "They are both concerned with the same question: how do we learn from data?"
 Machine learning is sometimes called "glorified statistics".
Sl. No.  Statistical Modelling                                Machine Learning
1.       From the school of statistics and mathematics        From the school of computer science
2.       Formalization of relationships in the form of        Algorithms learn from data without relying on
         mathematical equations                               rule-based programming
3.       Assumes the shape of the model curve prior to        No assumption about the underlying shape of
         fitting the model to the data                        the data
4.       Models can be developed on a single dataset          Algorithms need to be trained on two datasets,
                                                              called training and validation data
5.       Data is split into training and testing data         Data is split into training, validation and
                                                              testing data
6.       Predicts the output with an accuracy of 85 percent,  Just predicts the output with an accuracy of
         with 90 percent confidence about it                  85 percent
7.       Used for research                                    Used for business decisions
[Venn diagram showing how machine learning and statistics are related]

Variables used

• Y - Dependent/response/target/outcome variable
• X - Independent/predictor variable
Statistics and machine learning are like distant cousins
o Both machine learning and statistics share the same goal: learning from data.
o Both focus on drawing knowledge or insights from the data.
o Their methods, however, are different.
 They're related, sure. But their parents are different.
 Machine learning is a subfield of computer science and artificial intelligence. It deals with building systems that can learn from data, instead of following explicitly programmed instructions.
 Statistical modelling, on the other hand, is a subfield of mathematics.
Introduction to Algorithms in Machine Learning

ML algorithms fall into three broad families:

 Supervised Learning
   - Classification (categorical target variable)
   - Regression (continuous target variable)
 Unsupervised Learning (target variable not available)
   - Clustering
   - Association
 Reinforcement Learning
   - Classification (categorical target variable)
   - Control (target variable not available)
Steps in machine learning model development
and deployment
1. Collection of data: Structured source data, web scraping, Application
Programming Interface (API), chat interaction, and so on. ML works on
both structured and unstructured data (voice, image, and text).
2. Data preparation and missing/outlier treatment: Data is to be formatted
as per the chosen machine learning algorithm; also, missing value
treatment needs to be performed by replacing missing and outlier values
with the mean/median, and so on.
3. Data analysis and feature engineering: Data needs to be analyzed in
order to find any hidden patterns and relations between variables, and so
on. Correct feature engineering with appropriate domain knowledge will
solve 70 percent of the problems.
Steps in machine learning model development
and deployment
4. Train algorithm on training and validation data: Post feature
engineering, data will be divided into three chunks (train,
validation, and test data). Machine learning algorithms are applied to the
training data, and the hyperparameters of the model are tuned on the
validation data to avoid overfitting.
5. Test the algorithm on test data: Performance will be checked
against unseen test data. If the performance is still good enough,
we can proceed to the next and final step.
6. Deploy the algorithm: Trained machine learning algorithms will be
deployed on live streaming data to classify the outcomes.
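The three-way split described in steps 4 and 5 can be sketched in a few lines. This is a minimal illustration using scikit-learn's train_test_split; the data here is synthetic and stands in for a real, prepared dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for a prepared feature matrix (steps 2-3).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# First carve out the final test set, then split the remainder
# into training and validation chunks (roughly 60/20/20).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The model is fitted on X_train, tuned on X_val, and evaluated once on X_test.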
Algorithms in Machine Learning
• Supervised Learning - These algorithms have a target/outcome variable (or
dependent variable) which is to be predicted from a given set of predictors
(independent variables). Using this set of variables, we generate a function that
maps inputs to desired outputs. The training process continues until the model
achieves the desired level of accuracy on the training data.
• Supervised learning problems can be further grouped into regression and
classification problems.
 Classification problems
 Logistic regression
 Decision trees (classification trees)
 Bagging classifier
 Random forest classifier
 Boosting classifier (adaboost, gradient boost, and xgboost)
 SVM classifier

 Regression problems
 Linear regression
 Lasso and ridge regression
 Decision trees (regression trees)
 Bagging regressor
 Random forest regressor
 Boosting regressor (adaboost, gradient boost, and xgboost)
 SVM regressor
Algorithms in Machine Learning
• Unsupervised Learning - In these algorithms, we do not have any
target or outcome variable to predict or estimate. They are used for
clustering a population into different groups, for example segmenting
customers for specific interventions.
 Principal component analysis (PCA)
 K-means clustering
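Both techniques are available in scikit-learn. A minimal sketch on synthetic data (two well-separated blobs; all numbers are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs in 5 dimensions; note that no target variable is used.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])

# PCA: project onto the two directions of largest variance.
X2 = PCA(n_components=2).fit_transform(X)

# K-means: partition the points into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(X2.shape, np.bincount(labels))
```

With well-separated blobs, K-means recovers the two groups of 50 points each.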
Algorithms in Machine Learning
• Logistic regression:
 Outcomes are discrete classes rather than continuous values.
 For example, whether a customer will arrive or not, or will
purchase the product or not.
 Logistic regression has a high bias and a low variance error.
 In statistical methodology, it uses the maximum likelihood
method to estimate the parameters of the individual variables.
 In contrast, in machine learning methodology, log loss is
minimized with respect to the β coefficients (also known as
weights).
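The log-loss view can be sketched with scikit-learn, whose LogisticRegression fits the β coefficients by minimizing (regularized) log loss. The data below is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
# Hypothetical purchase data: one feature, binary outcome.
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)   # fits beta by minimizing log loss
p = model.predict_proba(X)[:, 1]         # predicted class probabilities
print(round(log_loss(y, p), 3))
```

A useful sanity check: the fitted model's log loss should beat the 0.693 (ln 2) achieved by always predicting a probability of 0.5.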
Algorithms in Machine Learning
• Linear regression:
 Prediction of continuous variables such as customer income
and so on.
 It utilizes error minimization to fit the best possible line in
statistical methodology.
 However, in machine learning methodology, squared loss will
be minimized with respect to β coefficients.
 Linear regression also has a high bias and a low variance error.
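Minimizing squared loss gives the familiar least-squares coefficients. A minimal sketch on a small hypothetical income-versus-experience dataset, checked against np.polyfit:

```python
import numpy as np

# Hypothetical data: income (y) versus years of experience (x).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([30, 35, 42, 48, 55, 59], dtype=float)

# Least squares: choose beta0, beta1 to minimize the sum of squared errors.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(round(beta0, 2), round(beta1, 2))  # 23.73 6.03
```

np.polyfit(x, y, 1) returns the same slope and intercept, since it solves the same least-squares problem.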
Bias versus variance trade-off
 Every model has both bias and variance error components
 Bias and variance are inversely related to each other; while trying to reduce one
component, the other component of the model will increase
 The true art lies in creating a good fit by balancing both. The ideal model will have
both low bias and low variance
 Errors from the bias component come from erroneous assumptions in the underlying
learning algorithm
 High bias can cause an algorithm to miss the relevant relations between features and
target outputs; this phenomenon causes an underfitting problem
 On the other hand, errors from the variance component come from the model's
sensitivity to even small changes in the training data; high variance can
cause an overfitting problem
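The trade-off can be illustrated by fitting polynomials of increasing degree to noisy synthetic data (a sketch; the signal, noise level, and degrees are arbitrary choices). Training error always falls as the model grows more flexible, while test error improves only up to a point:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)  # noisy nonlinear signal
x_tr, y_tr = x[:40], y[:40]                         # training data
x_te, y_te = x[40:], y[40:]                         # held-out test data

def mse(deg):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    p = Polynomial.fit(x_tr, y_tr, deg)
    return (np.mean((p(x_tr) - y_tr) ** 2),
            np.mean((p(x_te) - y_te) ** 2))

for deg in (1, 3, 12):  # underfit, reasonable fit, very flexible fit
    tr, te = mse(deg)
    print(deg, round(tr, 3), round(te, 3))
```

The degree-1 model underfits (high train and test error); the high-degree model drives training error down further, but its test error is governed by variance rather than bias.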
Performance Measures
• Confusion matrix: A matrix that cross-tabulates actual versus predicted classes.

              Predicted Yes   Predicted No
Actual Yes    TP              FN
Actual No     FP              TN
Performance Measures
• Some terms used in a confusion matrix are:
• True positives (TPs): Cases when we predict the variable as YES
when the variable actually was YES.
• True negatives (TNs): Cases when we predict the variable as
NO when the variable actually was NO.
• False positives (FPs): When we predict as YES when it was NO.
FPs are also considered to be type I errors.
• False negatives (FNs): When we predict as NO when it was
actually YES. FNs are also considered to be type II errors.
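These four counts can be read off with scikit-learn. Note that confusion_matrix (with labels 0 and 1) puts the "No" row first, so ravel() yields (TN, FP, FN, TP). A small sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted labels (1 = YES, 0 = NO).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)  # 4 1 1 4
```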
Performance Measures

• Precision (P): When yes is predicted, how often is it correct?

  P = TP / (TP + FP)

• Recall (R)/sensitivity/true positive rate: Among the actual yeses, what fraction was predicted as yes?

  R = TP / (TP + FN)
Performance Measures
• Specificity: Among the actual nos, what fraction was predicted as no?
  Also equal to 1 - false positive rate:

  Specificity = TN / (TN + FP)

• F1 score (F1): This is the harmonic mean of the precision and recall.
  The constant of 2 scales the score to 1 when both precision and recall are 1:

  F1 = 2 / (1/P + 1/R)
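The four measures can be computed directly from the confusion-matrix counts. The counts below are hypothetical, chosen only to exercise the formulas:

```python
# Hypothetical counts from a confusion matrix.
tp, fn, fp, tn = 40, 10, 20, 30

precision = tp / (tp + fp)             # 40/60
recall = tp / (tp + fn)                # 40/50 (sensitivity / TPR)
specificity = tn / (tn + fp)           # 30/50
f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean of P and R
print(round(precision, 3), round(recall, 3), round(specificity, 3), round(f1, 3))
# 0.667 0.8 0.6 0.727
```

Note that F1 (0.727) sits between precision and recall but closer to the smaller of the two, which is exactly the behaviour of a harmonic mean.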
Performance Measures
• Receiver Operating Characteristic (ROC) curve: a plot of the true
positive rate (TPR) against the false positive rate (FPR), also known
as a sensitivity versus (1 - specificity) graph. The area under this
curve (AUC) summarizes classifier performance as a single number.
Performance Measures
• R-squared (coefficient of determination):
• Measure of the percentage of the response variable variation that is
explained by the model. It is also a measure of how well the model minimizes
error compared with simply using the mean as an estimate.
Measures of Variation

• Total variation is made up of two parts:

  SST = SSR + SSE

  (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

  SST = Σ(Yi - Ȳ)²    SSR = Σ(Ŷi - Ȳ)²    SSE = Σ(Yi - Ŷi)²

where:
  Ȳ  = Mean value of the dependent variable
  Yi = Observed value of the dependent variable
  Ŷi = Predicted value of Y for the given Xi value
FDP_NITC_May 2015
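The decomposition SST = SSR + SSE can be verified numerically for any least-squares fit. A sketch on a small hypothetical dataset:

```python
import numpy as np

# Hypothetical data and an ordinary least-squares line fitted to it.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([30, 35, 42, 48, 55, 59], dtype=float)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x  # fitted values

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares
print(round(sst, 2), round(ssr, 2), round(sse, 2))
```

For a least-squares fit with an intercept, the identity holds exactly (up to floating-point rounding).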
Measures of Variation

[Figure: scatter plot showing, for a point (Xi, Yi), how the total deviation
(Yi - Ȳ) splits into the explained part (Ŷi - Ȳ) and the error part (Yi - Ŷi)
around the fitted line, giving SST = SSR + SSE]
Coefficient of Determination, r²

• The coefficient of determination is the portion of the total variation in the
dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r²

  r² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ r² ≤ 1
Adjusted R-squared

• Adjusted R-squared penalizes the R-squared value if extra variables without a
strong correlation are included in the model:

  R²_adjusted = 1 - (1 - R²)(n - 1) / (n - k - 1)

  n = sample size, k = number of predictors (or variables)

• The adjusted R-squared value is a key metric in evaluating the quality of
linear regressions.
• As a rule of thumb, a linear regression model with adjusted R² >= 0.7 is
considered good enough to implement.
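The penalty is easy to see with a worked example; the sample size, predictor count, and R² below are hypothetical:

```python
# Hypothetical regression summary: 30 observations, 3 predictors, R² = 0.80.
n, k = 30, 3
r2 = 0.80

# Adjusted R² penalizes R² for the number of predictors used.
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 3))  # 0.777
```

Adding predictors that do not improve R² enough always lowers the adjusted value, which is why it is preferred over plain R² for comparing models with different numbers of variables.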
Scatter Diagram
• A scatter diagram is a graphical presentation of the relationship
between two quantitative variables.
• One variable is shown on the horizontal axis and the other
variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the overall
relationship between the variables.
Scatter Diagram
• A Positive Relationship
[Figure: points trending upward from left to right]
Scatter Diagram
• A Negative Relationship
[Figure: points trending downward from left to right]
Scatter Diagram
• No Apparent Relationship
[Figure: points with no visible pattern]
Example: Panthers Football Team
• Scatter Diagram
The Panthers football team is interested in investigating the
relationship, if any, between interceptions made and points
scored.
x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 27
Example: Panthers Football Team
• Scatter Diagram
[Figure: scatter plot of Number of Points Scored (y, 0-30) versus Number of
Interceptions (x, 0-3)]
Example: Panthers Football Team
• The preceding scatter diagram indicates a positive
relationship between the number of interceptions and the
number of points scored.
• Higher points scored are associated with a higher number
of interceptions.
• The relationship is not perfect; all plotted points in the
scatter diagram are not on a straight line.
Measures of Association Between Two Variables

Often a manager or researcher is interested in the relationship between two variables. Two such measures:

• Covariance
• Correlation Coefficient
Covariance

• Covariance measures the strength of the linear association between two variables.
• Positive values indicate a positive relationship.
• Negative values indicate a negative relationship.
• Covariance can take any value, but its magnitude cannot be used to judge the
relative strength of the relationship.
Covariance
• For a sample of size n with observations (x1, y1), (x2, y2), and so on:
• If the data sets are samples, the covariance is denoted by s_xy:

  s_xy = Σ(xi - x̄)(yi - ȳ) / (n - 1)

• If the data sets are populations, the covariance is denoted by σ_xy:

  σ_xy = Σ(xi - μx)(yi - μy) / N
Example
Week Number of Sales volume
commercials (x) (y)
1 2 50
2 5 57
3 1 41
4 3 54
5 4 54
6 1 38
7 5 63
8 3 48
9 4 59
10 2 46
Scatter Diagram
[Figure: scatter plot of Sales (in monetary units, 0-70) versus Number of
commercials (0-6)]
Calculations for sample covariance

 (x)   (y)   (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
  2    50       -1         -1              1
  5    57        2          6             12
  1    41       -2        -10             20
  3    54        0          3              0
  4    54        1          3              3
  1    38       -2        -13             26
  5    63        2         12             24
  3    48        0         -3              0
  4    59        1          8              8
  2    46       -1         -5              5

  s_xy = Σ(xi - x̄)(yi - ȳ) / (n - 1) = 99 / (10 - 1) = 11
Correlation Coefficient
• The coefficient can take on values between -1 and +1.
• Values near -1 indicate a strong negative linear relationship.
• Values near +1 indicate a strong positive linear relationship.
• If the data sets are samples, the coefficient is r_xy:

  r_xy = s_xy / (s_x · s_y)

• If the data sets are populations, the coefficient is ρ_xy:

  ρ_xy = σ_xy / (σ_x · σ_y)
Week  Number of       Sales      (xi - x̄)  (yi - ȳ)  (xi - x̄)²  (yi - ȳ)²  (xi - x̄)(yi - ȳ)
      commercials (x) volume (y)
  1        2             50         -1        -1          1          1             1
  2        5             57          2         6          4         36            12
  3        1             41         -2       -10          4        100            20
  4        3             54          0         3          0          9             0
  5        4             54          1         3          1          9             3
  6        1             38         -2       -13          4        169            26
  7        5             63          2        12          4        144            24
  8        3             48          0        -3          0          9             0
  9        4             59          1         8          1         64             8
 10        2             46         -1        -5          1         25             5
Mean       3             51       Sums:                  20        566            99

  s_x = 1.49,  s_y = 7.93,  s_xy = 11

  r_xy = 11 / (1.49 × 7.93) = 0.930
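Likewise, np.corrcoef reproduces the hand-computed correlation coefficient for the commercials data:

```python
import numpy as np

x = np.array([2, 5, 1, 3, 4, 1, 5, 3, 4, 2])
y = np.array([50, 57, 41, 54, 54, 38, 63, 48, 59, 46])

# Off-diagonal entry of the correlation matrix is the Pearson r.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # 0.93
```

A value this close to +1 confirms the strong positive linear relationship visible in the scatter diagram.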
