
Introduction to Basics of Machine Learning Algorithms

Pankaj Oli
Machine Learning
Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.
- Algorithms or techniques that enable a computer (machine) to “learn” from data.
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” – Tom M. Mitchell

Types of Machine Learning


Supervised Learning
 In supervised learning, the machine is taught by example. The operator provides the machine
learning algorithm with a known dataset that includes desired inputs and outputs, and the
algorithm must find a method to determine how to arrive at those outputs from the given
inputs. While the operator knows the correct answers to the problem, the algorithm identifies
patterns in the data, learns from observations and makes predictions. The algorithm makes
predictions and is corrected by the operator – and this process continues until the algorithm
achieves a high level of accuracy/performance.
 Under the umbrella of supervised learning fall: Classification, Regression and Forecasting.

Classification: In classification tasks, the machine learning program must draw a conclusion
from observed values and determine to what category new observations belong. For example,
when filtering emails as ‘spam’ or ‘not spam’, the program must look at existing observational
data and filter the emails accordingly.
 Regression: In regression tasks, the machine learning program must estimate – and understand
– the relationships among variables. Regression analysis focuses on one dependent variable
and a series of other changing variables – making it particularly useful for prediction
and forecasting.
 Forecasting: Forecasting is the process of making predictions about the future based on the
past and present data, and is commonly used to analyse trends.
Unsupervised learning

 The machine learning algorithm studies data to identify patterns. There is no answer key or
human operator to provide instruction. Instead, the machine determines the correlations and
relationships by analyzing available data. In an unsupervised learning process, the machine
learning algorithm is left to interpret large data sets and address that data accordingly.
The algorithm tries to organize that data in some way to describe its structure. This might
mean grouping the data into clusters or arranging it in a way that looks more organised.
 Under the umbrella of unsupervised learning fall:
 Clustering: Clustering involves grouping sets of similar data (based on defined criteria). It’s
useful for segmenting data into several groups and performing analysis on each data set to find
patterns.
 Dimension reduction: Dimension reduction reduces the number of variables being considered
to find the exact information required.

Reinforcement learning
 Reinforcement learning focuses on regimented learning processes, where a machine learning
algorithm is provided with a set of actions, parameters and end values. By defining the rules,
the machine learning algorithm then tries to explore different options and possibilities,
monitoring and evaluating each result to determine which one is optimal. Reinforcement
learning teaches the machine trial and error. It learns from past experiences and begins to
adapt its approach in response to the situation to achieve the best possible result.
 SUPERVISED ALGORITHMS

 Naïve Bayes Classifier Algorithm (Supervised Learning - Classification)


The Naïve Bayes classifier is based on Bayes’ theorem and classifies every value as
independent of any other value. It allows us to predict a class/category, based on a given set of
features, using probability.
Despite its simplicity, the classifier does surprisingly well and is often used because it can
outperform more sophisticated classification methods.
 P(h|d) = (P(d|h) * P(h)) / P(d)
 P(h|d) is the probability of hypothesis h given the data d. This is called the posterior
probability.
 P(d|h) is the probability of data d given that the hypothesis h was true.
 P(h) is the probability of hypothesis h being true (regardless of the data). This is called the
prior probability of h.
 P(d) is the probability of the data (regardless of the hypothesis).
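The four quantities above can be made concrete with a small worked example in Python. All the numbers here are illustrative, not from a real dataset: suppose 20% of emails are spam, the word “free” appears in 50% of spam emails and 5% of non-spam emails, and we want the posterior probability that an email containing “free” is spam.

```python
# Bayes' theorem on a toy spam-filtering example (all numbers illustrative).
p_spam = 0.2                 # P(h): prior probability an email is spam
p_word_given_spam = 0.5      # P(d|h): "free" appears, given the email is spam
p_word_given_ham = 0.05      # "free" appears, given the email is not spam

# P(d): total probability of seeing the word, by the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(h|d): posterior probability the email is spam, given the word appeared.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # 0.7143
```

Even though only 20% of emails are spam, seeing the word raises the probability to about 71%, which is exactly the update Bayes’ theorem performs.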

 K-Nearest Neighbour (Supervised Learning)


The K-Nearest-Neighbour algorithm estimates how likely a data point is to be a member of
one group or another. It essentially looks at the data points around a single data point to
determine what group it is actually in.
 If we are similar to our neighbours, chances are we belong with them.
 It stores all the available cases and classifies new data or cases based on a similarity measure.
 K denotes the number of nearest neighbours that vote on the class of the test data.
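The idea of storing all cases and letting the k nearest ones vote can be sketched in a few lines of plain Python. The dataset and labels below are made up for illustration:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the stored cases by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy dataset: (features, label) pairs forming two well-separated groups.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]

print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (8, 7)))  # B
```

Note there is no training step at all: the “model” is the stored dataset, and all the work happens at prediction time.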
 Linear Regression (Supervised Learning/Regression)
Linear regression is the most basic type of regression. Simple linear regression allows us to
understand the relationships between two continuous variables.
The simplest form of the regression equation with one dependent and one independent
variable is defined by the formula y = m*x + c, where y = estimated dependent variable score,
c = constant, m = regression coefficient, and x = score on the independent variable.
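For one independent variable, m and c have a closed-form least-squares solution, which a short sketch can verify on data that lies exactly on a line:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # m = covariance(x, y) / variance(x)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]        # exactly y = 2*x + 1
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```

On noisy real data the fitted m and c would not be exact, but the same formula gives the line minimising the sum of squared errors.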

 Logistic Regression (Supervised learning – Classification)


Logistic regression focuses on estimating the probability of an event occurring based on the
previous data provided. It is used when the dependent variable is binary, that is, where only
two values, 0 and 1, represent the outcomes.

 Decision Trees (Supervised Learning – Classification/Regression)


A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate
every possible outcome of a decision. Each node within the tree represents a test on a specific
variable – and each branch is the outcome of that test.
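A learned decision tree is ultimately just nested tests, so a hand-built one shows the structure directly. The variables and thresholds below follow the classic “play tennis” style toy example and are illustrative, not learned from data:

```python
def classify_weather(outlook, humidity, windy):
    """A hand-built decision tree: each `if` is a node testing one variable,
    each branch is an outcome of that test, each `return` is a leaf."""
    if outlook == "sunny":
        return "no play" if humidity > 70 else "play"
    elif outlook == "overcast":
        return "play"
    else:  # rainy
        return "no play" if windy else "play"

print(classify_weather("sunny", 85, False))    # no play
print(classify_weather("overcast", 90, True))  # play
print(classify_weather("rainy", 50, True))     # no play
```

Tree-learning algorithms such as CART choose these tests and thresholds automatically by picking, at each node, the split that best separates the training data.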
 Random Forests (Supervised Learning – Classification/Regression)
Random forests or ‘random decision forests’ are an ensemble learning method, combining
multiple algorithms to generate better results for classification, regression and other tasks.
Each individual classifier is weak, but when combined with others, can produce excellent
results. The algorithm starts with a ‘decision tree’ (a tree-like graph or model of decisions) and
an input is entered at the top. It then travels down the tree, with data being segmented into
smaller and smaller sets, based on specific variables.
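The “weak but combined” idea can be sketched without any tree training: below, three hand-made decision stumps (one-level trees, with invented thresholds) each cast a vote, and the forest returns the majority class. A real random forest would instead grow each tree on a bootstrap sample with random feature subsets:

```python
from collections import Counter

# Three weak "decision stumps", each testing a different aspect of (x1, x2).
stumps = [
    lambda p: "B" if p[0] > 5 else "A",
    lambda p: "B" if p[1] > 5 else "A",
    lambda p: "B" if p[0] + p[1] > 10 else "A",
]

def forest_predict(point):
    """Each stump votes; the forest returns the majority class."""
    votes = Counter(stump(point) for stump in stumps)
    return votes.most_common(1)[0][0]

print(forest_predict((2, 3)))  # A  (all three stumps agree)
print(forest_predict((6, 2)))  # A  (outvoted 2 to 1)
print(forest_predict((8, 9)))  # B
```

The middle case shows why the ensemble is more robust than any single stump: one wrong vote is outvoted by the other two.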

 Support Vector Machine Algorithm (Supervised Learning - Classification)


Support Vector Machine algorithms are supervised learning models that analyse data for
classification and regression analysis. However, they are mostly used in classification problems.
In this algorithm, we plot each data item as a point in n-dimensional space (where n is the
number of features you have), with the value of each feature being the value of a particular
coordinate. Then, we perform classification by finding the hyper-plane that best differentiates
the two classes.
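Finding the maximum-margin hyperplane requires solving an optimisation problem, but once it is found, classifying a point is just checking which side of the plane it falls on. The weights w and offset b below are assumed for illustration, not the result of actual SVM training:

```python
# An assumed separating hyperplane w·x + b = 0 in 2-D: the line x1 + x2 = 10.
w = (1.0, 1.0)   # normal vector of the hyperplane
b = -10.0        # offset

def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

print(classify((8, 7)))  # 1   (above the line)
print(classify((2, 3)))  # -1  (below the line)
```

SVM training chooses, among all hyperplanes that separate the classes, the one with the largest margin to the nearest points (the support vectors).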
Unsupervised Learning Algorithms
 K-Means Clustering:-
 The algorithm will categorize the items into k groups of similarity. To calculate that similarity,
we will use the Euclidean distance as the measurement.
The algorithm works as follows:
 First we initialize k points, called means, randomly.
 We categorize each item to its closest mean and we update the mean’s coordinates, which are
the averages of the items categorized in that mean so far.
 We repeat the process for a given number of iterations and at the end, we have our clusters.
 The “points” mentioned above are called means, because they hold the mean values of the
items categorized in it.
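The steps above can be sketched in plain Python. For determinism this sketch initialises the means from the first k items rather than randomly, which is a simplification of the algorithm as described:

```python
import math

def k_means(items, k, iterations=10):
    """Minimal k-means; means are initialised from the first k items."""
    means = [list(items[i]) for i in range(k)]
    for _ in range(iterations):
        # Step 1: categorize each item to its closest mean (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for item in items:
            j = min(range(k), key=lambda i: math.dist(means[i], item))
            clusters[j].append(item)
        # Step 2: update each mean to the average of the items assigned to it.
        for j, cluster in enumerate(clusters):
            if cluster:
                means[j] = [sum(dim) / len(cluster) for dim in zip(*cluster)]
    return means, clusters

items = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
means, clusters = k_means(items, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

On this toy data the two obvious groups are recovered after a couple of iterations; with random initialisation, k-means is usually run several times and the best result kept.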
 Hierarchical Clustering
 Hierarchical clustering is a clustering technique in which similar data points are grouped into
a cluster. It is an algorithm that builds a hierarchy of clusters. The algorithm starts with each
data point assigned to a cluster of its own. Then the two nearest clusters are merged into the
same cluster, and this repeats until only a single cluster is left: find the closest pair of clusters
using Euclidean distance, merge them into a single cluster, and continue merging the two
nearest clusters until all items are clustered together.

 There are two top-level methods for finding these hierarchical clusters:

 Agglomerative clustering uses a bottom-up approach, wherein each data point starts in its own
cluster. These clusters are then joined greedily, by taking the two most similar clusters together
and merging them.

 Divisive clustering uses a top-down approach, wherein all data points start in the same cluster.
We can then use a parametric clustering algorithm like K-Means to divide the cluster into two
clusters. For each cluster, you further divide it down to two clusters until you hit the desired
number of clusters.
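The agglomerative (bottom-up) variant can be sketched directly: start with one cluster per point and greedily merge the closest pair, here using single linkage (the distance between two clusters is the distance between their closest members), until the desired number of clusters remains. The data is invented for illustration:

```python
import math

def agglomerative(points, k):
    """Bottom-up clustering: repeatedly merge the two closest clusters
    (single linkage) until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters whose closest members are nearest.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: min(math.dist(p, q)
                               for p in clusters[ab[0]] for q in clusters[ab[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # merge the pair greedily
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
result = agglomerative(points, k=2)
print(sorted(len(c) for c in result))  # [2, 3]
```

Running this with k from len(points) down to 1 traces out the full hierarchy (dendrogram) the section describes; stopping at a chosen k gives a flat clustering.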
 Hypothesis Testing
 In statistics, a hypothesis test calculates some quantity under a given assumption. The result of
the test allows us to interpret whether the assumption holds or whether the assumption has
been violated.
 A hypothesis test checks whether a claim we want to test is correct or not.
 Null Hypothesis (H0) – the currently established or accepted value of a parameter.
 Alternative Hypothesis (Ha) – the research hypothesis; it involves the claim to be tested.
 H0 and Ha are mathematically opposite.

 Possible outcomes
 -reject null hypothesis H0
 -Fail to reject null hypothesis H0
 Level of confidence (C) – how confident we are in our decision.
 Level of significance (alpha) – alpha = 1 - C.

 Significance and the p-value


 If p-value > alpha, then fail to reject H0.
 If p-value <= alpha, then reject H0 and accept Ha.
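As a concrete sketch, a two-sided z-test (one simple kind of hypothesis test, applicable when the population standard deviation is known) can be computed with the standard library alone; the sample numbers are invented:

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_std, n):
    """Two-sided z-test p-value for H0: 'the population mean is pop_mean'."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    # Two-sided tail probability of the standard normal, via the error function.
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05  # level of significance, i.e. C = 95% confidence
p = z_test_p_value(sample_mean=52, pop_mean=50, pop_std=5, n=40)
print(p < alpha)  # True: reject H0
```

Here the sample mean of 52 is far enough from the claimed mean of 50 (given n = 40 and a standard deviation of 5) that p <= alpha, so H0 is rejected in favour of Ha.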
 Topic Modelling.

 It is an unsupervised text-mining technique to discover topics across a collection of text documents.
 A topic model forms clusters of similar and related words, which are called topics.

 LSA (Latent Semantic Analysis) – It attempts to leverage the context around each word to
capture hidden concepts, also called topics.
- m is the number of text documents
- n is the number of unique words
- k is the number of topics to be extracted from all documents
- The number of topics (k) is to be specified by the user.

 Steps
- Make a matrix of size m*n.
- Reduce the dimension of the above matrix to k using SVD (singular value decomposition).

SVD decomposes the matrix into three matrices: matrix U, matrix S and matrix Vt (V transpose).

A = U S Vt
- A is the matrix to be decomposed.
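These steps can be sketched with NumPy (assumed available; the word counts are illustrative). The matrix A below has m = 4 documents as rows and n = 5 unique words as columns, and keeping only the top k singular values gives each document a k-dimensional topic representation:

```python
import numpy as np

# Toy m*n document-term matrix: 4 documents, 5 unique words (raw counts).
A = np.array([
    [2, 1, 0, 0, 0],   # doc 1
    [1, 2, 0, 0, 0],   # doc 2
    [0, 0, 1, 2, 1],   # doc 3
    [0, 0, 2, 1, 1],   # doc 4
], dtype=float)

# SVD: A = U S Vt, with singular values returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular values to reduce the matrix to k topics.
k = 2
doc_topics = U[:, :k] * s[:k]   # each row: one document in k-topic space
print(doc_topics.shape)         # (4, 2)
```

The rows of Vt play the dual role: each of the top k rows describes a topic as a weighting over the n words.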
 Ensemble methods in machine learning

 Ensemble methods are machine learning techniques that combine several base
models/learners in order to produce one optimal predictive model.
 They use multiple learning algorithms together at the same time to obtain predictions, with
the aim of making a better prediction.
 Random forest is an ensemble of decision trees.

 Types of Ensemble Methods


 Bagging or Bootstrap Aggregation: Bagging tries to implement similar learners on small
sample populations and then takes a mean of all the predictions. In generalized bagging, we
can use different learners on different populations.
 Implements the same algorithm in parallel on random datasets.

 Boosting: Boosting is an iterative technique which adjusts the weight of an observation based
on the last classification. If an observation was classified incorrectly, it tries to increase the
weight of this observation, and vice versa. Boosting in general decreases the bias error and
builds strong predictive models. However, boosted models may sometimes overfit the
training data.
 Implements algorithms in series.
 Can be used for classification and regression.
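The bagging idea can be sketched with a deliberately trivial base learner: each “learner” simply predicts the mean of its own bootstrap sample (drawn with replacement), and the ensemble averages their predictions. The data is invented; a real bagging ensemble would train actual models such as decision trees on each sample:

```python
import random

def bagged_mean(data, n_learners=50, seed=0):
    """Bagging sketch: each learner predicts the mean of one bootstrap
    sample; the ensemble prediction is the mean of all learners."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_learners):
        # Bootstrap sample: draw len(data) items with replacement.
        sample = [rng.choice(data) for _ in data]
        predictions.append(sum(sample) / len(sample))
    return sum(predictions) / len(predictions)

data = [4, 5, 6, 5, 4, 6, 5]
print(bagged_mean(data))  # close to the plain mean of the data
```

Averaging over many resampled learners reduces the variance of the prediction, which is exactly why bagging helps unstable learners; boosting, by contrast, builds its learners sequentially, reweighting the observations each round.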
Thank you
