Pankaj Oli
Machine Learning
It is a field of study that gives computers the ability to learn without being explicitly
programmed.
- Algorithms or techniques that enable a computer (machine) to “learn” from data.
“A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P if its performance at tasks in T, as measured by P, improves with
experience E.” - Tom M. Mitchell.
Supervised learning
Classification: In classification tasks, the machine learning program must draw a conclusion
from observed values and determine the category to which new observations belong. For
example, when filtering emails as ‘spam’ or ‘not spam’, the program looks at existing labelled
examples and filters new emails accordingly.
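As a rough sketch of this idea (the toy emails, labels and words below are invented for illustration, not a real filter), one could count per-class word frequencies from labelled examples and assign a new email to the class it overlaps most:

```python
# A toy spam classifier: learn per-class word counts from labelled
# examples, then assign a new email to the class whose vocabulary
# it overlaps most (all data invented for illustration).
from collections import Counter

train = [
    ("win money now claim prize", "spam"),
    ("free prize click now", "spam"),
    ("meeting agenda for tomorrow", "not spam"),
    ("project report attached", "not spam"),
]

counts = {"spam": Counter(), "not spam": Counter()}
for text, label in train:
    counts[label].update(text.split())

def classify(text):
    # Score each class by how often it has seen the email's words.
    scores = {label: sum(c[w] for w in text.split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

print(classify("claim your free prize now"))   # → spam
print(classify("project meeting tomorrow"))    # → not spam
```

A real filter would use many more features and a probabilistic model (e.g. naive Bayes), but the learn-from-labelled-examples loop is the same.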
Regression: In regression tasks, the machine learning program must estimate – and understand
– the relationships among variables. Regression analysis focuses on one dependent variable
and a series of other changing variables – making it particularly useful for prediction
and forecasting.
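A minimal sketch of this with one dependent variable: a closed-form least-squares fit of a straight line y = a*x + b (the data points are invented, chosen to lie roughly on y = 2x):

```python
# Least-squares fit of y = slope*x + intercept on toy data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]  # roughly y = 2x (invented)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# slope = cov(x, y) / var(x); intercept makes the line pass
# through the point of means.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # slope ≈ 2, intercept ≈ 0
```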
Forecasting: Forecasting is the process of making predictions about the future based on the
past and present data, and is commonly used to analyse trends.
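A very simple forecasting sketch, assuming a moving-average model (the sales series and window size are invented):

```python
# Naive forecast: predict the next value as the mean of the last k
# observations (simple moving average).
def moving_average_forecast(series, k=3):
    window = series[-k:]
    return sum(window) / len(window)

sales = [10, 12, 13, 15, 14, 16]           # invented past data
print(moving_average_forecast(sales))      # mean of 15, 14, 16 → 15.0
```

Real forecasting models (e.g. exponential smoothing, ARIMA) also account for trend and seasonality, but all of them predict the future from past and present observations in this spirit.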
Unsupervised learning
The machine learning algorithm studies data to identify patterns. There is no answer key or
human operator to provide instruction. Instead, the machine determines the correlations and
relationships by analyzing available data. In an unsupervised learning process, the machine
learning algorithm is left to interpret large data sets and act on that data accordingly.
The algorithm tries to organise the data in some way that describes its structure. This might
mean grouping the data into clusters or arranging it in a more organised form.
Under the umbrella of unsupervised learning fall:
Clustering: Clustering involves grouping sets of similar data (based on defined criteria). It’s
useful for segmenting data into several groups and performing analysis on each data set to find
patterns.
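A minimal clustering sketch in plain Python, assuming the classic K-Means algorithm on toy 2-D points (data, k and iteration count all invented):

```python
# Minimal K-Means: alternate between assigning points to the nearest
# centre and moving each centre to the mean of its group.
import random

def kmeans(points, k, iters=10, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)       # initial centres
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                        (p[1] - centers[i][1]) ** 2)
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:                            # keep centre if group empty
                centers[i] = (sum(q[0] for q in g) / len(g),
                              sum(q[1] for q in g) / len(g))
    return centers, groups

# Two well-separated blobs of three points each (invented).
pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, groups = kmeans(pts, k=2)
print(sorted(len(g) for g in groups))  # → [3, 3]
```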
Dimension reduction: Dimension reduction reduces the number of variables being considered
to only those that carry the information actually required.
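One very simple form of dimension reduction is dropping variables that carry no information, for example columns with near-zero variance (the data matrix and threshold below are invented; techniques like PCA are more general):

```python
# Drop columns whose variance is (near) zero: a constant column
# carries no information about the observations.
data = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.0],
]  # invented: column 1 is constant

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

cols = list(zip(*data))
keep = [i for i, c in enumerate(cols) if variance(c) > 1e-3]
reduced = [[row[i] for i in keep] for row in data]
print(keep)  # → [0, 2]: the constant middle column is dropped
```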
Reinforcement learning
Reinforcement learning focuses on regimented learning processes, where a machine learning
algorithm is provided with a set of actions, parameters and end values. By defining the rules,
the machine learning algorithm then tries to explore different options and possibilities,
monitoring and evaluating each result to determine which one is optimal. Reinforcement
learning teaches the machine through trial and error: it learns from past experience and
adapts its approach to the situation to achieve the best possible result.
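A tiny trial-and-error sketch in this spirit: an epsilon-greedy agent on a two-armed bandit, which mostly exploits the action it currently believes is best but keeps exploring (the payout probabilities, epsilon and step count are invented):

```python
# Epsilon-greedy bandit: learn which of two "arms" pays more by
# trying both and tracking the average reward of each.
import random

random.seed(1)
payout = [0.3, 0.8]          # true reward probability per arm (invented)
value = [0.0, 0.0]           # running reward estimate per arm
pulls = [0, 0]

for step in range(2000):
    if random.random() < 0.1:                       # explore 10% of the time
        arm = random.randrange(2)
    else:                                           # exploit best estimate
        arm = max(range(2), key=lambda a: value[a])
    reward = 1.0 if random.random() < payout[arm] else 0.0
    pulls[arm] += 1
    value[arm] += (reward - value[arm]) / pulls[arm]  # incremental mean

print(value[1] > value[0])  # the agent should rate arm 1 higher
```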
HIERARCHICAL CLUSTERING
Hierarchical clustering is an unsupervised technique that builds a tree of nested clusters.
There are two top-level methods for finding these hierarchical clusters:
Agglomerative clustering uses a bottom-up approach, wherein each data point starts in its own
cluster. Clusters are then joined greedily, by merging the two most similar clusters at each
step.
Divisive clustering uses a top-down approach, wherein all data points start in the same cluster.
A parametric clustering algorithm such as K-Means can then be used to divide the cluster into
two, and each resulting cluster is divided again until the desired number of clusters is
reached.
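The bottom-up (agglomerative) procedure can be sketched on 1-D points, repeatedly merging the two clusters with the closest means until k clusters remain (the points and similarity measure are invented for illustration):

```python
# Agglomerative clustering: every point starts as its own cluster;
# greedily merge the two closest clusters (by distance between
# cluster means) until k clusters remain.
def agglomerative(points, k):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi = sum(clusters[i]) / len(clusters[i])
                mj = sum(clusters[j]) / len(clusters[j])
                d = abs(mi - mj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # greedy merge
        del clusters[j]
    return clusters

result = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], k=3)
print(result)  # → [[1.0, 1.2], [5.0, 5.1], [9.0]]
```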
Hypothesis Testing
In statistics, a hypothesis test calculates some quantity under a given assumption. The result of
the test allows us to interpret whether the assumption holds or whether the assumption has
been violated.
The claim that we want to test may be correct or not.
Null hypothesis (H0) - the currently established or accepted value of the parameter.
Alternative hypothesis (Ha) - the research hypothesis; it contains the claim to be tested.
H0 and Ha are mathematically opposite.
Possible outcomes
-reject null hypothesis H0
-Fail to reject null hypothesis H0
Level of confidence (C) - how confident we are in our decision.
Level of significance (alpha) - alpha = 1 - C.
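A worked one-sample z-test sketch using the definitions above (all numbers invented): H0 says the population mean is 100; we observe a sample mean of 103 with known sigma = 10 and n = 50, and compare the p-value to alpha = 1 - C:

```python
# One-sample z-test: H0 claims mu = 100; do the data contradict it?
import math

mu0, sigma, n = 100.0, 10.0, 50   # H0 value, known sd, sample size
sample_mean = 103.0               # observed (invented)

z = (sample_mean - mu0) / (sigma / math.sqrt(n))
# Two-sided p-value via the standard normal CDF,
# Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

alpha = 0.05                      # C = 0.95
print(round(z, 2), p < alpha)     # p < alpha → reject H0
```

Here z ≈ 2.12 and p ≈ 0.03 < alpha, so we reject H0; had p exceeded alpha, we would fail to reject H0.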
Topic Modelling
Topic modelling is an unsupervised text mining technique used to discover topics across a
collection of text documents. A topic model forms clusters of similar and related words,
which are called topics.
LSA (Latent Semantic Analysis) - attempts to leverage the context around each word to
capture the hidden concepts, also called topics.
- m is the number of text documents
- n is the number of unique words
- K is the number of topics to be extracted from all documents
- The number of topics (K) must be specified by the user.
Steps
- build a document-term matrix of size m*n
- reduce the dimension of the above matrix to k using SVD (singular value decomposition):
A = U S V^T
- A is the matrix to be decomposed.
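The first step above (building the m*n matrix) can be sketched as follows; the toy documents are invented, and a real implementation would then apply SVD to A and keep only the top k singular values:

```python
# Step 1 of LSA: build the m x n document-term matrix A, where
# A[i][j] counts how often word j appears in document i.
docs = [
    "cats and dogs",
    "dogs chase cats",
    "stocks and bonds",
]  # invented toy corpus: m = 3 documents

vocab = sorted({w for d in docs for w in d.split()})      # n unique words
A = [[d.split().count(w) for w in vocab] for d in docs]   # m x n counts

print(vocab)
for row in A:
    print(row)
# In full LSA, A is then decomposed as A = U S V^T and truncated to
# the top k singular values to obtain k topics.
```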
Ensemble methods in machine learning
Ensemble learning is a machine learning technique that combines several base
models/learners in order to produce one optimal predictive model.
It applies multiple learning algorithms to the same task, with the aim of obtaining better
predictions than any single model.
Random forest is an ensemble of decision trees.
Boosting: Boosting is an iterative technique that adjusts the weight of each observation based
on the previous classification. If an observation was classified incorrectly, its weight is
increased, and vice versa. Boosting generally decreases bias error and builds strong
predictive models, although it can sometimes overfit the training data.
The base learners are trained in series (one after another).
Boosting can be used for both classification and regression.
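The weight-update idea can be made concrete with a minimal AdaBoost-style sketch on toy 1-D data, using threshold "stumps" as base learners (all data and round counts are invented; a single stump cannot fit this labelling, but the weighted ensemble can):

```python
# Minimal AdaBoost: each round fits the stump with lowest weighted
# error, then raises the weights of the points it got wrong.
import math

X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, -1, -1, -1, 1, 1]   # not separable by one threshold
w = [1 / len(X)] * len(X)         # uniform starting weights
ensemble = []                     # (alpha, threshold, direction)

def stump(x, thr, sign):
    return sign if x > thr else -sign

for _ in range(5):
    # Pick the stump minimising the current weighted error.
    best = min(((thr, s) for thr in range(0, 9) for s in (1, -1)),
               key=lambda c: sum(wi for wi, xi, yi in zip(w, X, y)
                                 if stump(xi, c[0], c[1]) != yi))
    err = sum(wi for wi, xi, yi in zip(w, X, y)
              if stump(xi, best[0], best[1]) != yi)
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
    ensemble.append((alpha, *best))
    # Re-weight: misclassified points get heavier, then normalise.
    w = [wi * math.exp(-alpha * yi * stump(xi, best[0], best[1]))
         for wi, xi, yi in zip(w, X, y)]
    total = sum(w)
    w = [wi / total for wi in w]

def predict(x):
    score = sum(a * stump(x, thr, s) for a, thr, s in ensemble)
    return 1 if score > 0 else -1

print([predict(x) for x in X])  # → [1, 1, 1, -1, -1, -1, 1, 1]
```

Each round is trained in series on the re-weighted data, which is exactly the sequential behaviour described above; the final prediction is a weighted vote of all the weak stumps.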
Thank you