Machine Learning for Humans
You are free to copy, share, remix and reproduce this book, provided that you properly
credit the original authors and give your readers the same freedom you enjoy.
Read the full terms at https://creativecommons.org/licenses/by-sa/3.0/
Contents
1 Introduction
  1.1 Purpose
    1.1.1 Style
    1.1.2 Approach
    1.1.3 Structure
  1.2 Tools
  1.3 What is Machine Learning?
2 Foundations
  2.1 Describe the world
  2.2 What is Data?
  2.3 Find the edges
  2.4 Find the center
  2.5 Measure error
  2.6 Groups - clusters
  2.7 Probability and uncertainty
  2.8 Supervised or unsupervised?
3 Visualization
  3.1 Intro
  3.2 Scatterplot
4 Supervised Learning
  4.1 Nearest Neighbors
    4.1.1 What?
    4.1.2 Why?
    4.1.3 How?
    4.1.4 Try it.
  4.2 Classification Tree
    4.2.1 What?
    4.2.2 Why?
    4.2.3 How?
    4.2.4 Try it.
  4.3 Linear Regression
    4.3.1 What?
    4.3.2 Why?
    4.3.3 Try it out
  4.4 Logistic Regression
    4.4.1 What?
    4.4.2 Why?
    4.4.3 How?
    4.4.4 Try it.
5 Unsupervised Learning
  5.1 K-Means Clustering
    5.1.1 Additional resources
6 Appendix
  6.1 Further Learning
    6.1.1 Articles
    6.1.2 Books
Chapter 1
Introduction
1.1 Purpose
1.1.1 Style
This book will attempt to avoid technical jargon and complicated mathematics
to the greatest extent possible. It is intended as a gentle introduction for the casual
reader. Many resources are available for continuing on to more in-depth concepts.

Where code examples are helpful, we will use Python, since it is widely adopted
in the machine learning community and has a friendly syntax.
1.1.2 Approach
We will focus on machine learning concepts at a high level, using diagrams
and visual programming where possible. When code is needed, it will be
self-contained: each chapter should stand alone (aside from installing the
dependencies for the book).
1.1.3 Structure
• What? Describe the concept, e.g. what the machine learning algorithm does.
• Why?
• How?
• Try it.

This structure is intended to give the reader a framework for understanding,
together with practical hands-on exercises. The "Try it" exercises should be
relatively easy to complete, so the reader gets a quick sense of accomplishment.
1.2 Tools
The dependencies used in this book are built on other, lower-level dependencies
that are not listed here, for simplicity. As we explore the machine learning code
examples, we will encounter these other dependencies.
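
Before moving on, we can make sure the environment is ready with a quick sanity check. A minimal sketch, assuming (based on the examples later in this book) that the main dependencies are scikit-learn and matplotlib:

    # Environment check. scikit-learn and matplotlib are assumed dependencies,
    # based on the examples later in this book; if an import fails, install
    # the missing package with pip before continuing.
    import matplotlib
    import sklearn

    print("scikit-learn:", sklearn.__version__)
    print("matplotlib:", matplotlib.__version__)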
1.3 What is Machine Learning?

Machine learning approaches are commonly grouped into a few broad types:

• Supervised Learning: when our system learns from examples whose correct
answers, or labels, are known in advance, so that it can predict labels for new
observations.
• Unsupervised Learning: when our system looks at the data and tries to make
sense of weakly defined or hidden rules in the data on its own.
• Reinforcement Learning: when our system learns by trial and error, much like
a newborn child.
Chapter 2
Foundations
2.1 Describe the world

Our process begins by describing the world in some way. We make a series of
observations of some type of object or phenomenon. Each observation has one
or more qualities, commonly called attributes. Attributes can be thought of in
a couple of ways: as names or as numbers.
Names
Names are quite common, and are usually distinguished by different text. For
example, we might have different names for a quality we call color, which could
include red, orange, and blue.
Numbers
Numbers may contain more detail than names. For example, numbers can tell us
how things are ordered, or how large they are when compared with one another.
We will see these types of measurement referred to as Ordinal, meaning ordered,
and Interval, meaning there is a regular distance from one number to the next
(and hence a magnitude).
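
As a small sketch of the distinction, a single observation can be written in Python as a mapping from attribute names to values; the object and its values here are invented for illustration:

    # One observation with both kinds of attributes:
    # 'color' holds a name, 'weight_grams' holds a number.
    observation = {
        "color": "orange",      # a name (one of red, orange, blue, ...)
        "weight_grams": 142.0,  # a number: ordered, with regular intervals
    }
    print(observation)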
2.2 What is Data?

Data is any piece of information or observation about an object or person,
organised into variables to make processing easier. Variables are the basic
building blocks that hold the information about each observation of a particular
object. In a data table, the rows are the observations and each column is a
variable containing information about the object, for example a fruit.
• Categorical Variable: a variable that takes category or label values from a
finite set of values. In a fruit table, "Shape" is categorical if it can only hold
the two values Circular/Bumpy.
• Quantitative Variable: a variable that takes numerical values and represents
some kind of measurement. "Weight" is a quantitative variable, since it contains
numerical measurements. A small sketch of such a table follows this list.
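
Such a table is easy to sketch in code, here using pandas (an assumption; any table-like structure would do). The rows are hypothetical, but they follow the structure just described:

    import pandas as pd

    # A hypothetical fruit table: rows are observations, columns are variables.
    # 'Shape' is a categorical variable, 'Weight' a quantitative one.
    fruits = pd.DataFrame({
        "Fruit": ["apple", "orange", "lemon"],
        "Shape": ["Circular", "Circular", "Bumpy"],
        "Weight": [150.0, 140.0, 90.0],  # measurements in grams (unit assumed)
    })
    print(fruits)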
2.4 Find the center

[Figure: Comparing Mean, Median, and Mode. Notice how the shape of the
distribution affects each measurement. By Cmglee, CC BY-SA 3.0.]
Mean
The Mean, commonly called the 'average', finds the center by balancing the
values of all group members: it tells us where, on balance, the members of the
group reside. It is subject to bias from the most far-reaching members of the
group, known as outliers. Despite the strong influence of outliers, the Mean is a
very common way to find the center of a group, or distribution.
Median
The Median is a way to find the center by dividing the set of observations exactly
in half. It does not matter how large or small the individual values are, just that
there is an equal number of observations on each side.
Mode
Mode considers the center as the value that is seen most frequently. In other
words, the Mode is the value observed with the highest count.
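
All three measures are available in Python's standard library. A quick sketch, with made-up values that include one outlier:

    from statistics import mean, median, mode

    # A small, made-up set of observations; 9.0 is an outlier.
    values = [1.0, 1.2, 1.2, 1.4, 1.6, 9.0]

    print("mean:  ", mean(values))    # about 2.57, pulled upward by the outlier
    print("median:", median(values))  # 1.3, splits the observations in half
    print("mode:  ", mode(values))    # 1.2, the most frequent value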
2.6 Groups - clusters

When observations appear in groups, we often call those groups 'clusters'.
Clustering is the process of dividing a set of observations into different clusters,
or groups, on the basis of their attributes.
2.7 Probability and uncertainty

In many ways, the world around us is uncertain [1]. We have developed ways to
describe our level of uncertainty, and to attempt to predict uncertain events.
When an event might occur, we can say there is some probability [2] that it will
happen.

Probability is typically expressed in percentages. For example, there might be a
30% probability of rain tomorrow. Predicting the outcome of some event is called
forecasting [3].

Conditional probability

Some events depend on previous events. For example, the weather tomorrow is
dependent [4] on the weather today. Dynamic systems [5][6][7] are those, such as
weather, where the current situation is sensitive to previous situations.
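
We can estimate a conditional probability from simple counts. The weather figures below are invented purely for illustration:

    # Hypothetical counts over 100 observed days.
    rainy_today = 30               # days on which it rained
    rainy_today_and_tomorrow = 18  # days on which it rained and the next day too

    # P(rain tomorrow | rain today) = P(rain today and tomorrow) / P(rain today).
    # Estimated from counts over the same 100 days, the denominators cancel.
    p_tomorrow_given_today = rainy_today_and_tomorrow / rainy_today
    print(p_tomorrow_given_today)  # 0.6, i.e. a 60% probability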
2.8 Supervised or unsupervised?

There are times when we want to group observations together, for example
grouping animals by diet. Sometimes the group 'labels' are known in advance,
and sometimes there are no clear labels. These two cases roughly divide the
realm of machine learning into its two pillars: supervised and unsupervised
learning.
Chapter 3
Visualization
3.1 Intro
3.2 Scatterplot
[Figure: Example scatterplot showing three variables: petal width, petal length,
and species.]
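
A plot like this can be produced with matplotlib and the iris data that ships with scikit-learn. A minimal sketch (the column indices follow scikit-learn's layout of the iris measurements):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris

    iris = load_iris()
    # Columns 2 and 3 of the iris data are petal length and petal width.
    petal_length = iris.data[:, 2]
    petal_width = iris.data[:, 3]

    # Color each point by species, showing the third variable.
    plt.scatter(petal_length, petal_width, c=iris.target)
    plt.xlabel("petal length (cm)")
    plt.ylabel("petal width (cm)")
    plt.title("Iris petals by species")
    plt.show()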
Chapter 4
Supervised Learning
4.1 Nearest Neighbors

4.1.1 What?
4.1.2 Why?
4.1.3 How?
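
Nearest neighbors classifies a new observation by the labels of the k most similar observations in the training data. A minimal sketch with scikit-learn's KNeighborsClassifier; the iris data and k = 3 are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()

    # 'Training' just stores the observations; prediction compares
    # a new observation against them.
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(iris.data, iris.target)

    # Classify a new flower by majority vote among its 3 nearest neighbors.
    new_flower = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical measurements
    print(iris.target_names[model.predict(new_flower)[0]])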
4.2 Classification Tree

4.2.1 What?

A classification tree, as its name implies, has a root and a series of branches,
eventually terminating at leaf nodes. Unlike regular trees, however, they tend
to grow upside down.
At the root are all observations in a data set and some decision boundary that
separates the observations into two groups.
The two groups are again separated into two subsets each, along another dividing
point. This process repeats a given number of times, called depth.
At the end of the line, we can see the result of the series of decisions. Every
observation that enters at the top of the decision tree will pass through a number
of decision nodes and end up in one of the leaf nodes.
The nodes also tell us how many observations from the training data set would
be accurately classified with the given decision boundaries.
To improve accuracy, we can increase the depth of the decision tree, at the risk
of overfitting the training data set.
4.2.2 Why?
Classification trees give us humans an intuitive glimpse into the classification
model. They can be used manually to guide decisions, or to gain an understanding
of the factors that distinguish observations. Classification trees are one of the
easier types of machine learning algorithms for humans to understand.
4.2.3 How?
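
One way to grow such a tree is with scikit-learn. A minimal sketch; the iris data and the depth limit of 2 are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()

    # Limit the depth so the tree stays small and readable; increasing
    # max_depth classifies more training points correctly, at the risk
    # of overfitting.
    tree = DecisionTreeClassifier(max_depth=2)
    tree.fit(iris.data, iris.target)

    # Print the decision boundary tested at each node as plain text.
    print(export_text(tree, feature_names=list(iris.feature_names)))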
4.3 Linear Regression

4.3.1 What?
4.3.2 Why?
You may have a lot of data with a real-valued attribute, such as temperature
observations. You may wish to see which 'direction' the data is trending. A
linear regression draws a line through the middle of your data, with as little
error as possible. This gives you a quick glimpse of the data and a way to predict
a y value for any value of x.
[Figure: Visualization of a linear regression. Original author: Jake Vanderplas,
Introduction to Scikit-Learn: Machine Learning with Python.]
Let's say you have an input which, when fed into a function, gives you a certain
output. With that function you can predict the output for any given input.
Mathematically, this can be written as:

output = F(input)

Our task in regression is to predict the output accurately. For that purpose, we
first need an approximate function F so that when we give it an input, it predicts
the desired value for us. Note that the more accurately our function approximates
the true relationship, the more accurate our predictions will be.

The word "approximate" matters because, in real-world data, no variable can be
strictly said to follow a specific pattern: there can be noise, error, and many
other ambiguities.
Let's first understand the data we will be dealing with. The format of the data:

• a variable X (input)
• a variable y (output)

We have various instances of X and their corresponding y's. Our aim is to find
the function that best describes these patterns, and which can also help us
predict values for new X's.

Notation:

• (x, y): a single data point, an instance in our data
• n: the number of data points
4.3.3 Try it out

Task 1
Task 2
Task 3
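
As a starting point for the tasks, here is a minimal sketch that fits a line to noisy synthetic data; the data is generated rather than real, and scikit-learn's LinearRegression stands in for whichever fitting method you prefer:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic data: y = 2x + 1 plus random noise, standing in for
    # real observations such as temperature readings.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(50, 1))
    y = 2 * X[:, 0] + 1 + rng.normal(0, 1, size=50)

    # Fit the approximate function F, here a straight line, by least squares.
    model = LinearRegression()
    model.fit(X, y)

    print("slope:    ", model.coef_[0])    # close to 2
    print("intercept:", model.intercept_)  # close to 1
    print("F(4.0) =  ", model.predict([[4.0]])[0])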
4.4 Logistic Regression

4.4.1 What?
4.4.2 Why?
4.4.3 How?
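
Logistic regression squeezes a linear function through a sigmoid, turning it into a probability between 0 and 1 that an observation belongs to a class. A minimal sketch with scikit-learn; the two-class data below is synthetic, invented only to show the calls:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic one-feature data: small values labelled 0, large values 1.
    X = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
    y = np.array([0, 0, 0, 1, 1, 1])

    model = LogisticRegression()
    model.fit(X, y)

    # The model gives a probability for each class, then a class label.
    print(model.predict_proba([[2.5]]))  # e.g. [[p_class0, p_class1]]
    print(model.predict([[2.5]]))        # the predicted class label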
Chapter 5
Unsupervised Learning
5.1 K-Means Clustering

What?
Why?
How?
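
A minimal sketch with scikit-learn; the blobs below are generated so that organic clusters do exist, though, as the limitation below notes, real data may have none:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Generate synthetic data with 3 actual clusters.
    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

    # K-Means must be told the number of clusters, k, in advance.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)

    print(labels[:10])              # cluster assignment of the first 10 points
    print(kmeans.cluster_centers_)  # the 3 cluster centers that were found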
Limitations
One possible outcome is that there are no organic clusters in the data; instead,
all of the data fall along continuous feature ranges within one single group. [1]
Chapter 6
Appendix
6.1 Further Learning

There are many great resources we can use to further explore machine learning.
Here are a few.
6.1.1 Articles
6.1.2 Books
Chapter 7

Text and image sources, contributors, and licenses

7.1 Text
• Course:Machine Learning for Humans/Introduction/Purpose Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Introduction/Purpose?oldid=8861 Contributors: Brylie, Athale and Anonymous: 1
• Course:Machine Learning for Humans/Introduction/Tools Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Introduction/Tools?oldid=7699 Contributors: Brylie
• Course:Machine Learning for Humans/Introduction/What is Machine Learning? Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Introduction/What_is_Machine_Learning%3F?oldid=8681 Contributors: Vijaykrishnavanshi
• Course:Machine Learning for Humans/Foundations/Describe the world Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Describe_the_world?oldid=7749 Contributors: Brylie
• Course:Machine Learning for Humans/Foundations/What is Data? Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/What_is_Data%3F?oldid=11117 Contributors: Vijaykrishnavanshi and Bro666
• Course:Machine Learning for Humans/Foundations/Find the edges Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Find_the_edges?oldid=7743 Contributors: Brylie
• Course:Machine Learning for Humans/Foundations/Find the center Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Find_the_center?oldid=7737 Contributors: Brylie
• Course:Machine Learning for Humans/Foundations/Measure error Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Measure_error?oldid=7739 Contributors: Brylie
• Course:Machine Learning for Humans/Foundations/Groups - clusters Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Groups_-_clusters?oldid=9064 Contributors: Valsdav and Brylie
• Course:Machine Learning for Humans/Foundations/Probability and uncertainty Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Probability_and_uncertainty?oldid=7788 Contributors: Brylie
• Course:Machine Learning for Humans/Foundations/Supervised or unsupervised? Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Foundations/Supervised_or_unsupervised%3F?oldid=8845 Contributors: Brylie
• Course:Machine Learning for Humans/Visualization/Intro Source: https://en.wikitolearn.org/Course%3AMachine_Learning_for_Humans/Visualization/Intro?oldid=7758 Contributors: Brylie
7.2 Images

• File:Animation_of_k-means_clustering.gif Source: http://en.wikitolearn.org/images/en/0/0d/Animation_of_k-means_clustering.gif License: ? Contributors: ? Original artist: ?
• File:Comparison_mean_median_mode.svg Source: http://en.wikitolearn.org/images/en/d/de/Comparison_mean_median_mode.svg License: ? Contributors: ? Original artist: ?
• File:Data_with_clustered_points.png Source: http://en.wikitolearn.org/images/en/e/e5/Data_with_clustered_points.png License: ? Contributors: ? Original artist: ?
• File:Decision-tree-orange-canvas-iris-dataset.png Source: http://en.wikitolearn.org/images/en/3/3e/Decision-tree-orange-canvas-iris-dataset.png License: ? Contributors: ? Original artist: ?
• File:Decision_tree_leaf_nodes_showing_accuracy_of_classification.png Source: http://en.wikitolearn.org/images/en/1/15/Decision_tree_leaf_nodes_showing_accuracy_of_classification.png License: ? Contributors: ? Original artist: ?
• File:Decision_tree_showing_decision_boundary.png Source: http://en.wikitolearn.org/images/en/6/6a/Decision_tree_showing_decision_boundary.png License: ? Contributors: ? Original artist: ?
• File:K-means_clustering_visualization.png Source: http://en.wikitolearn.org/images/en/0/01/K-means_clustering_visualization.png License: ? Contributors: ? Original artist: ?
• File:Linear_regression_visualization.png Source: http://en.wikitolearn.org/images/en/9/9b/Linear_regression_visualization.png License: ? Contributors: ? Original artist: ?
• File:Scatterplot_example_showing_petal_length_with_petal_width.png Source: http://en.wikitolearn.org/images/en/d/d3/Scatterplot_example_showing_petal_length_with_petal_width.png License: ? Contributors: ? Original artist: ?