
Machine Learning Report

Deciding which Machine Learning Algorithm and Inductive Bias to use

24/02/2020

Contents
Introduction
  Virtual Personal Assistants
  Email Spam and Malware Filtering
  Search Engine Result Refining
  Product Recommendations
  Online Fraud Detection
Machine Learning Algorithms best suited to specific problems
  Probability-based
    Naive Bayes
  Information-based
    Decision trees
  Regression-based
    Linear Regression
  Neural networks
    Deep learning
Criteria when selecting a machine learning algorithm
  Type of data
  Categorize the problem
    Categorize input
    Categorize output
  Constraints
  Find the algorithm
Inductive biases used by various algorithms
  Linear Regression
  Decision Trees
  Naive Bayes
  Neural Network
Glossary
Bibliography

Introduction
Machine learning is a broad field of computer science with applications in almost any domain. It is a
category of artificial intelligence (AI). Over the past couple of decades it has become a common means
of extracting information from large datasets, and that information can be used in many different
ways.

There is a high chance that you are using it in one way or another right now without even knowing it.
According to Medium.com, the following are several examples of how machine learning is used in our
day-to-day life:

Virtual Personal Assistants


Most of us have at least one virtual personal assistant: Windows 10 has Cortana, the iPhone has Siri and
Amazon has Alexa. Virtual personal assistants help us find information when asked by voice and remind
us of our schedule.

Email Spam and Malware Filtering


Email clients use several spam-filtering approaches. To make sure that these spam filters are
continuously up to date with the latest information, they are powered by machine learning. (Daffodil
Software, 2017)

Search Engine Result Refining


Google and other search engines use machine learning to improve their search results. When you
execute a search, their algorithm monitors in the backend how you respond to the results. If you stay
on a webpage for a certain length of time, the search engine assumes that the result matched what you
were looking for. If you stay on the webpage for only a short period of time, it assumes that the result
did not match. This is how the algorithm improves search results. (Daffodil Software, 2017)

Product Recommendations
Most ecommerce websites give customers product recommendations, which can increase revenue
substantially. This is done by identifying patterns in customers' purchases and shopping behavior.
Most ecommerce retailers have taken advantage of machine learning to create a product
recommendation engine. (Himanshu Singh, 2019)

Online Fraud Detection
Machine learning is helping to make cyberspace a safer place, and monitoring monetary fraud online is
one example. Companies use a set of tools to compare billions of transactions occurring between
buyers and sellers and to distinguish between legitimate and illegitimate transactions. (Daffodil
Software, 2017)

In the following chapters I will discuss various machine learning algorithms: which algorithms are
suited to which problems, which selection criteria should be considered when choosing a machine
learning algorithm, and which inductive biases are used by different algorithms.

Machine Learning Algorithms best suited to specific problems

Probability-based

Naive Bayes
The Naïve Bayes algorithm learns the probability that an object with certain features belongs to a
particular class; in other words, it is a simple classifier that classifies based on the probabilities of
events. It is mainly applied to text classification, and it performs well on many problems with
categorical data. Naïve Bayes is a member of the Bayesian family of prediction algorithms.

It is called naïve because it assumes that the occurrence of a certain feature is independent of the
occurrence of other features. For example, suppose it is trying to identify a fruit based on its colour,
shape and taste. If the features of a fruit are orange, spherical and tangy, it will conclude that the fruit
is an orange, with each feature contributing to that conclusion independently of the others.
(GeeksforGeeks, 2017)

With the Naïve Bayes Algorithm, as with all other machine learning models there needs to be an existing
set of training data for each class. See Figure 1 for an example of the Naïve Bayes algorithm and the type
of data it uses and outputs it creates.

Figure 1 This table contains training data and outputs that can be used in a Naïve Bayes Algorithm model

(GeeksforGeeks, 2017)

Figure 1 shows a dataset that is divided into two parts, the feature matrix and the response vector. The
feature matrix contains the rows of the dataset, where each row holds the values of the features. In
the dataset in Figure 1, the features are Outlook, Temperature, Humidity and Windy. The response
vector contains the value of the output (class) for each row of the feature matrix. In the dataset in
Figure 1, the output variable is named "play golf". (GeeksforGeeks, 2017)

The algorithm predicts membership probabilities for each output, that is, the probability that a given
record or data point belongs to a particular class. The class with the highest probability is taken as the
most likely class. (Rahul Saxena, 2017)
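
The idea of picking the class with the highest membership probability can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used by any particular library: the toy weather rows, the function names `train` and `predict_proba`, and the use of add-one smoothing are all assumptions made for the example, and the data is not the dataset from Figure 1.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count how often each class and each (feature, class, value) occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_proba(row, model):
    """Return a normalised probability for every class, given one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing (an assumption of this sketch).
            # Multiplying the per-feature terms is the "naive" independence assumption.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical toy data: (outlook, humidity) -> play golf?
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"),
        ("rainy", "normal"), ("overcast", "high"), ("overcast", "normal")]
labels = ["no", "yes", "no", "yes", "yes", "yes"]

model = train(rows, labels)
probs = predict_proba(("sunny", "high"), model)
prediction = max(probs, key=probs.get)  # class with the highest probability
print(probs, prediction)
```

The last two lines mirror the text: every class receives a probability, and the prediction is simply the class whose probability is largest.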

Because Naïve Bayes is well suited to predicting categorical outputs, it is often used as a classifier.
Real-world applications include filtering spam emails, where the input data could be the location of
incoming emails, the volume of emails and the frequency of emails; sentiment prediction; predicting
items a user might buy; and predicting what a user might want to watch next. It works best with
categorical features and produces a categorical class.

Information-based

Decision trees
Decision trees are a category of powerful machine learning models that can achieve high accuracy in
many tasks while remaining highly explainable. What makes decision trees special among machine
learning models is mainly the clarity of their data representation. The knowledge learned by a decision
tree through training is built directly into a hierarchical data structure in which each decision is a
node. This structure holds and displays the data in such a way that it can easily be understood, even by
non-experts.

It does this using entropy. Entropy measures how impure a node is. If a node is 100% full of one class,
the entropy is zero and you know a lot. If it is split 50/50, the entropy is at its highest and the node
doesn't tell you much. Entropy becomes useful when you compare a node with the two nodes below it
(in a binary tree). A decision tree will choose the split with the highest information gain, which is the
change in entropy from one node to its direct children. (Jake Hoare, 2018)
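
As a rough illustration of these two quantities, the sketch below computes the entropy of a list of class labels and the information gain of a split. The function names and the tiny label lists are made up for the example; the formulas are the standard Shannon entropy and the size-weighted entropy of the child nodes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["a", "a", "b", "b"]        # a 50/50 node: maximum entropy for two classes
clean_split = [["a", "a"], ["b", "b"]]  # each child holds a single class
useless_split = [["a", "b"], ["a", "b"]]  # children are as mixed as the parent

print(entropy(parent))
print(information_gain(parent, clean_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely recovers all of the parent's entropy as gain, while a split that leaves the children as mixed as the parent gains nothing, so the tree would prefer the first split.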

An example of a decision tree is a decision about what activity to do this weekend. It depends on
whether you feel like going out with your friends or spending the weekend alone; in each case, your
decision also depends on the weather. If it's sunny and your friends are available, you may want to
play football. If it ends up snowing, you'll go to a show. And if your friends don't show up, you will play
video games regardless of the weather. This decision tree is displayed in Figure 2.

Figure 2 (George Seif, 2018) decision tree example of deciding to be alone or with friends

A real-world example of decision trees being used is border security, where the task is to assess
whether somebody might be a terrorist. The data being assessed could be how many countries the
person has been to in the past few months, their age or their visa status, and the resulting action
would be detain, report or allow.

Regression-based

Linear Regression
Linear Regression is a machine learning algorithm based on supervised learning. It performs a
regression task: a regression model predicts a target value based on the input values. This is why linear
regression is mostly used for finding the relationship between variables and for forecasting. Regression
models differ in the type of relationship they assume between the dependent and independent
variables and in the number of independent variables used. (Mohit Gupta, 2018)

An example of a linear regression problem is deciding how much money should be allocated for gas.
Figure 3 shows data accumulated on prior trips: the total money paid and the total miles traveled.


Figure 3 (Carolina Bento, 2018) data set

Using this input data on a spreadsheet you will get a scatter graph as seen in Figure 4.

Figure 4 (Carolina Bento 2018) scatter graph using data set

Plotting the data from past trips makes it clear that there is a linear relationship between the miles
driven and the money spent on gas. What we want to calculate is the dependent variable (to be
predicted), which is the money spent. The independent variable (input), the miles traveled, is what we
use to calculate it.

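With this data we can roughly figure out how much to pay for gas for any given number of miles by
fitting a straight line to the points. The epsilon is the error term, the difference between each
observed value and the line. This is shown in Figure 5.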

Figure 5 (Carolina Bento, 2018) scatter graph with epsilon calculated

Regression algorithms are commonly used for predicting continuous data. Examples where linear
regression can be used are predicting the sales of products or economic growth. The algorithm works
best with continuous features and gives a continuous output.
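
The gas-budget example can be sketched with ordinary least squares in plain Python. The mileage and cost figures below are invented for illustration and are not the values from Figure 3; `fit_line` is a hypothetical helper that computes the slope and intercept of the best-fitting line, and the residuals play the role of the epsilon (error) term.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past trips: miles traveled (input) and money paid for gas (output)
miles = [100, 200, 300, 400]
paid = [12.0, 22.0, 33.0, 43.0]

slope, intercept = fit_line(miles, paid)

# Residuals: the part of each observation that the line does not explain
residuals = [y - (slope * x + intercept) for x, y in zip(miles, paid)]

# Predicted gas budget for a new 250-mile trip
budget = slope * 250 + intercept
print(slope, intercept, budget)
```

Both the input (miles) and the output (money) are continuous, which is the setting in which this algorithm is most at home.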

Neural networks

Deep learning
A neural network is a series of algorithms that tries to recognize underlying relationships in a set of data
through a process that mimics the way a biological brain operates. Deep learning provides a
multi-layer approach to learning data representations, typically performed with a multi-layer neural
network. This is shown in Figure 6.

Figure 6 diagram of a deep learning neural network (University of Cincinnati, 2019)

This is as opposed to other machine learning algorithms that only have the ability to have one or two
layers of data transformation.

Deep learning excels at recognizing objects in images, as it is implemented using three or more layers of
artificial neural networks, where each layer is responsible for extracting at least one feature of the
image.

The types of problems that neural networks are best at handling involve images or voice. Examples are
image classification, speech recognition and autonomous driving. The features this algorithm takes are
typically image and audio based.

Criteria when selecting a machine learning algorithm


When solving a machine learning problem it can be hard to determine which is the best suited algorithm
for the task at hand. Different algorithms are best suited for different types of data and problems.

Type of data
Before you begin looking at different machine learning algorithms, you should have a clear picture of
your data, your problem and your constraints.

The type of data we have plays a key role when deciding which machine learning algorithm to use.
Some algorithms work with smaller sample sets, whereas others need very large numbers of samples.
Some algorithms work only with certain types of data. (Rajat Harlalka, 2018)

An example of this is the Naïve Bayes algorithm, which works well with categorical input and is not
very sensitive to missing data. Regression algorithms work best with continuous data.

Categorize the problem

Categorize input
The next criterion to look at when choosing a machine learning algorithm is the type of input. Find out
whether the data being used is labeled or unlabeled.

Labeled data is data that contains features and a label. In supervised training, models learn from labeled
examples. Unlabeled data is data that contains features but no label. Unlabeled examples are the input
to inference. In semi-supervised and unsupervised learning, unlabeled examples are used during
training.(Machine Learning Glossary, 2020)

If your data is labeled, you have a supervised learning problem and will need an algorithm from the
supervised learning category. If your data is unlabeled and you want to find structure in it, you have an
unsupervised learning problem and will need an algorithm from the unsupervised learning category.

If you want to optimize an objective function by interacting with an environment, it is a reinforcement
learning problem. (Rajat Harlalka, 2018)

Categorize output
The next step in categorizing the problem is categorizing the output. If the output of the model is a
number, then it’s a regression problem. If the output of the model is a class, then it is a classification
problem. If the output of the model is a set of input groups, then it is a clustering problem. If the
problem is to detect an anomaly, then it is an anomaly detection problem.
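
The two categorization steps above amount to a pair of lookups, which can be written down as a small sketch. The dictionary keys and the function name `categorize_problem` are hypothetical labels chosen for this example; they simply encode the rules stated in the text.

```python
# Input side: what kind of data or setting do we have?
LEARNING_TYPE = {
    "labeled": "supervised learning",
    "unlabeled": "unsupervised learning",
    "environment": "reinforcement learning",  # optimizing by interacting with an environment
}

# Output side: what should the model produce?
TASK_TYPE = {
    "number": "regression",
    "class": "classification",
    "groups": "clustering",
    "anomaly": "anomaly detection",
}

def categorize_problem(input_kind, output_kind):
    """Map the kind of input and the kind of output to a problem category."""
    return LEARNING_TYPE[input_kind], TASK_TYPE[output_kind]

print(categorize_problem("labeled", "number"))
print(categorize_problem("unlabeled", "groups"))
```

For instance, labeled data with a numeric output is categorized as a supervised regression problem, which narrows the candidate algorithms considerably before any constraints are considered.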

Constraints
It is important to know the constraints when creating a solution with a machine learning algorithm so
that you do not commit to the impossible.

One important constraint is the storage capacity of the database. Depending on that capacity, it may
be impossible to store gigabytes of classification models or gigabytes of clustered data. (Rajat
Harlalka, 2018)

Another constraint is how fast the prediction needs to be. This matters for the Google car and Tesla
cars, which use autonomous driving and need predictions as fast as possible.

Find the algorithm


Now that the problem details, requirements and components have been identified, you need to decide
which algorithm best fits them.

Several algorithms might fit the problem and requirements, but they can differ in complexity. It is best
to go with the simplest algorithm, because the more complex an algorithm is, the higher the chance of
overfitting. (Jean Francois Puget, 2016)

To help people who are new to machine learning work through this process and pick the right
algorithm for the problem, Figure 7 shows a decision tree diagram for choosing an algorithm by asking
the right questions in the correct order.

Figure 7 (scikit-learn, 2013) decision tree for choosing the right algorithm

Inductive biases used by various algorithms
The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict
outputs from given inputs that it has not encountered.

When creating a machine learning model, the data should always be separated into training data and
testing data so that overfitting or underfitting can be detected.
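
A simple way to separate training and testing data is to shuffle the rows and hold a fraction back. The sketch below is a minimal stand-in for what machine learning libraries provide; the function name, the 25% test fraction and the fixed seed are assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))  # stand-in for 20 rows of a dataset
train, test = train_test_split(data)
print(len(train), len(test))
```

The model is fitted on `train` only and then evaluated on `test`. A low error on the training data combined with a high error on the testing data points to overfitting, while a high error on both points to underfitting.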

A classic example of an inductive bias that can be used in everyday life is Occam’s razor - the simplest
solution is most likely the correct one.

Linear Regression
An inductive bias commonly used in linear regression is that the relationship between the attributes x
and the output y is linear. The objective is to minimize the sum of squared errors. (Laura Hamilton,
2014)

This means the model assumes that the output (dependent variable) is linearly related to the
independent variable. If y actually increases exponentially with x, a linear model cannot represent
that relationship and will fit the data poorly. This is how linear regression uses an inductive bias.

Decision Trees
ID3 is an algorithm used to generate a decision tree from a dataset. Given a set of training data,
several different decision trees could be generated from it; which tree is chosen depends on the
inductive bias of the algorithm.

ID3's search strategy selects shorter trees over longer trees and selects trees that place the attributes
with the highest information gain closest to the root. (Mitchell, 2010, p16)

Therefore the inductive bias of ID3 is that shorter trees are preferred over longer trees, and trees that
place high information gain attributes close to the root are preferred over those that do not. (Laura
Hamilton, 2014)

Naive Bayes
In a Naive Bayes network, each arrow represents dependence. In a Naive Bayes network each input is
treated as being dependent on only the output class or label. See Figure 8 for an image of a Naive Bayes
network.

Figure 8 (unknown, 2013) diagram of a Naive Bayesian network

This is an example of an inductive bias in Naive Bayes algorithm; the inputs in Naive Bayes are
independent from each other, or in other words, each input depends only on the output class or label.
(Laura Hamilton,2014)

Neural Network
In neural network algorithms the data can be thought of as points on a graph. Interpolation is useful
when looking for a value between given data points: it connects the data points, and smoothing
connects them in a smooth (continuous) way. Figures 9 and 10 show linear interpolation without and
with smoothing.

Figure 9 (Paul Bourke, 1999) Linear interpolation without smoothing

Figure 10 (Paul Bourke, 1999) Linear interpolation with smoothing between datapoints

The backpropagation algorithm used to train neural networks has an inductive bias of its own: smooth
interpolation between data points. (Laura Hamilton, 2014)
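
To make the difference between the two kinds of interpolation concrete, the sketch below compares straight-line interpolation with cosine interpolation, one common way of getting a smooth transition between two points. It only illustrates the interpolation idea; it is not how backpropagation itself is implemented, and the function names are chosen for this example.

```python
import math

def linear_interpolate(y1, y2, mu):
    """Straight-line interpolation between y1 and y2, with mu in [0, 1]."""
    return y1 * (1 - mu) + y2 * mu

def cosine_interpolate(y1, y2, mu):
    """Smooth interpolation: eases out of y1 and into y2 along a cosine curve."""
    mu2 = (1 - math.cos(mu * math.pi)) / 2
    return y1 * (1 - mu2) + y2 * mu2

# Compare both methods at a few positions between the points 0 and 10
for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(mu, linear_interpolate(0, 10, mu), cosine_interpolate(0, 10, mu))
```

Both methods pass through the two data points and agree at the midpoint, but the cosine version leaves and approaches the points gradually, so joining several segments gives a curve without sharp corners.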

This inductive bias helps models generalize data; continuity is a necessary condition for neural networks
in particular. Being continuous is a necessary condition for being differentiable, and being differentiable
is what lets us train neural networks with backpropagation.

Glossary
AI – Artificial Intelligence

ID3 – Iterative Dichotomiser 3

Bibliography
Daffodil Software, 2017, 9 Applications of Machine Learning from Day-to-Day Life, Medium, viewed 15
February 2020,
https://medium.com/app-affairs/9-applications-of-machine-learning-from-day-to-day-life-112a47a429d0

Mitchell, 2010, Decision Tree Learning, Chapter 3, lecture notes, CptS 570 Machine Learning,
Washington State University, https://www.eecs.wsu.edu/~holder/courses/CptS570/fall08/slides/ch3.pdf

Laura Hamilton, 2014, The Inductive Biases of Various Machine Learning Algorithms, Laura Hamilton,
viewed 17 February 2020,
http://www.lauradhamilton.com/inductive-biases-various-machine-learning-algorithms

Unknown, 2013, Naive Bayes, diagram, viewed 17 February 2020,
http://inductivebias.com/Blog/naive-bayes/

University of Cincinnati, 2019, Deep Learning: How Will It Change Healthcare?, diagram, viewed 16
February 2020, http://orbograph.flywheelsites.com/deep-learning-how-will-it-change-healthcare/

Bento, C, 2018, Linear Regression In Real Life, image, viewed 18 February 2020,
https://towardsdatascience.com/linear-regression-in-real-life-4a78d7159f16

scikit-learn, 2013, Choosing the right estimator, image, viewed 17 February 2020,
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

Gupta, M, 2018, ML | Linear Regression, GeeksforGeeks, viewed 18 February 2020,
https://www.geeksforgeeks.org/ml-linear-regression/

Bourke, P, 1999, Interpolation methods, paulbourke, viewed 17 February 2020,
http://paulbourke.net/miscellaneous/interpolation/

Jean Francois Puget, 2016, Overfitting In Machine Learning, IBM Community, viewed 18 February 2020,
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Overfitting_In_Machine_Learning?lang=en

Hoare, J, 2018, How is Splitting Decided for Decision Trees?, Displayr, viewed 18 February 2020,
https://www.displayr.com/how-is-splitting-decided-for-decision-trees/

Mitchell, T. M. (1980), The need for biases in learning generalizations, CBM-TR 5-110, New Brunswick,
New Jersey, USA: Rutgers University
