
Name: Anand Tarika

Registration Number: 11705802


CA-2.

TABLE OF CONTENTS.
1. INTRODUCTION.
2. SUPERVISED AND UNSUPERVISED LEARNING.
3. DATA USED.
4. MACHINE LEARNING MODELS.
5. CLUSTERING.
6. CLASSIFICATION.
7. REGRESSION.
8. USES OF MODELS.
INTRODUCTION.
The scientific purpose of machine learning is to develop algorithms that find patterns in data
or make predictions from it. Machine learning is increasingly used across many professions
and industries, for example in manufacturing, medicine, finance, robotics, telecommunications
and social media.
Machine learning can appear in many guises. Much of the art of machine learning lies in reducing
a range of fairly disparate problems to a small set of narrow prototypes; machine learning then
solves those prototypical problems and provides good guarantees for the solutions.

SUPERVISED AND UNSUPERVISED LEARNING.

Machine learning is divided into two main categories.


1. Supervised Learning.
2. Unsupervised Learning.

Supervised Learning.
Supervised learning starts from labelled examples: each input is paired with a desired output, and the algorithm learns a mapping from inputs to outputs.

Unsupervised Learning.
Unsupervised learning starts from the data alone, drawing its own conclusions about the structure it
finds in that data.

Examples of Supervised Learning.


1. Classification
a. Spam Filtering.
b. Image Classification.
2. Regression.
a. Given information about a house, predict its price.
b. Amazon Prime: predict the rating a user will give to a movie.
Examples of Unsupervised Learning.
1. Clustering.
2. Association.

Classification Vs Clustering.

Classification is supervised: the classes are known in advance and the model learns them from labelled examples. Clustering is unsupervised: the groups are not known beforehand and are discovered from the structure of the data itself.

DATA USED
The data I have used has been downloaded from https://www.openml.org/t/9980.

The data uses column names such as V1, V2, V3 and so on.

The data set used here is WDBC (Wisconsin Diagnostic Breast Cancer).

MACHINE LEARNING MODELS.


Machine learning is an application of artificial intelligence (AI) that gives systems the ability to
automatically learn and improve from experience without being explicitly programmed. Machine
learning focuses on the development of computer programs that can access data and use it to learn
for themselves.

The process of learning begins with observations or data, such as examples, direct experience, or
instruction, in order to look for patterns in data and make better decisions in the future based on the
examples that we provide. The primary aim is to allow computers to learn automatically, without
human intervention or assistance, and to adjust their actions accordingly.
The models that I have used here are:
1. Classification.
2. Regression.
3. Clustering.

I divided the data into two parts, one for training and one for testing. The strategy I have used
to split the data into training and testing subsets is 70 percent for training and 30
percent for evaluation.
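A split like this can be sketched in R. The snippet below uses a toy data frame rather than the WDBC file, and the seed value is an arbitrary choice made here for reproducibility:

```r
# sketch of a random 70/30 train/test split on a toy data frame
set.seed(42)                                   # arbitrary seed so the split is reproducible
toy <- data.frame(x = rnorm(100), y = rnorm(100))
n <- nrow(toy)
train_idx <- sample(n, size = round(0.7 * n))  # random 70 percent of the row indices
train_set <- toy[train_idx, ]                  # 70 percent for training
test_set  <- toy[-train_idx, ]                 # remaining 30 percent for evaluation
nrow(train_set)  # 70
nrow(test_set)   # 30
```

A random split like this is generally preferable to taking the first rows for training, because it avoids any ordering effects in the file.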

CLUSTERING.

Clustering is an unsupervised machine learning method for partitioning a dataset into a set of
groups, or clusters. Clustering tries to group a set of objects so that objects in the same cluster
are more similar to each other than to objects in other clusters.

Code.

library(dplyr)

anand_2=read.csv("rrdata.csv")

anand_2

head(anand_2)

ca_21=as_tibble(anand_2)

head(ca_21,5)
str(ca_21)

#apply kmeans algorithm to data to find clusters

ca_21$Class=factor(ca_21$Class, levels = c("1", "2"))

set.seed(123) #k-means starts from random centroids, so fix the seed for reproducibility

ca_cluster=kmeans(anand_2[, 1:20], 2) #cluster on the numeric feature columns only

ca_cluster

#normalize data

function1=function(x) { return((x - min(x)) / (max(x) - min(x))) }

data_normal_anand=as.data.frame(lapply(ca_21[,1:20], function1))

data_normal_anand

#apply kmeans algorithm to normalized data

resultcluster_ca_normal=kmeans(data_normal_anand, 2)

resultcluster_ca_normal

Output.
REGRESSION.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships
among variables. It includes many techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent variables (or
'predictors'). More specifically, regression analysis helps one understand how the typical value of the
dependent variable (or 'criterion variable') changes when any one of the independent variables is varied,
while the other independent variables are held fixed.

Regression analysis is used when you want to predict a continuous dependent variable from a number
of independent variables. One thing to keep in mind with regression analysis is that causal relationships
among the variables cannot be determined. While the terminology is such that we say that X "predicts"
Y, we cannot say that X "causes" Y.

The goal is to estimate a real-valued variable y ∈ ℝ given a pattern x. For instance, we might want to
estimate the value of a stock the next day, the yield of a semiconductor fab given the current process,
the iron content of ore given mass spectroscopy measurements, or the heart rate of an athlete given
accelerometer data.

Code.

library(caTools)

data=read.csv(file.choose())

attach(data)

library(ggplot2)

#apply linear regression with V2 through V30 as independent variables

#and Class as the dependent variable

linearMod <- lm(Class ~ V2+V3+V4+V5+V6+V7+V8+V9+V10+V11+V12+V13+V14+V15+V16+V17+V18+V19+V20+V21+V22+V23+V24+V25+V26+V27+V28+V29+V30, data=data)

linearMod

summary(linearMod)
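Once a model like the one above has been fitted, `predict()` can be used to obtain values for new observations. The sketch below uses synthetic data (since the fitted model above depends on the WDBC file); the variable names and numbers here are made up for illustration:

```r
# fit a small linear model on synthetic data and predict for unseen inputs
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20, sd = 0.1)     # y is approximately 2x plus small noise
m <- lm(y ~ x, data = d)                 # fit the linear model y ~ x
new_points <- data.frame(x = c(25, 30))  # inputs the model has not seen
preds <- predict(m, newdata = new_points)
preds                                    # predictions close to 50 and 60
```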

Output
CLASSIFICATION
In machine learning and statistics, classification is a supervised learning approach in which the computer
program learns from the data input given to it and then uses this learning to classify new observations.

I have classified the WDBC data into two classes, Class 1 and Class 2.

CODE

library(dplyr)

library(rattle)

library(class)

library(plyr)

library(ggplot2)

#import data

anand_2=read.csv("rrdata.csv")

anand_21=as_tibble(anand_2)

#convert the Class column to a factor

anand_21$Class=factor(anand_21$Class, levels = c("1", "2"))

#check summary to see the need for normalization

summary(anand_21[,1:21])

#create function for normalization

function1=function(x) { return((x - min(x)) / (max(x) - min(x))) }

#apply normalization function

data_normal_anand=as.data.frame(lapply(anand_21[,1:21], function1))
#split the data into training and testing data

anand_train_data=data_normal_anand[1:350,]

anand_test_data=data_normal_anand[351:500,]

anand_train_label=anand_21$Class[1:350]

#use the kNN algorithm to predict the classes; k = 23 is roughly the square root of the number of observations

data_predict=knn(train=anand_train_data, test=anand_test_data, cl=anand_train_label, k=23)

data_predict

#create a table to compare the predicted classes with the original classes

table(data_predict, anand_2$Class[351:500])

Output

USES OF MODELS.

Of the three models, I found classification to be the one best suited to the WDBC
dataset.
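One simple way to support such a comparison is to turn the confusion table produced by the kNN step into an accuracy figure. The sketch below uses a hand-made 2x2 table with illustrative counts (the real counts would come from the `table()` call above):

```r
# accuracy from a 2x2 confusion matrix; the counts here are made up for illustration
conf <- matrix(c(80, 5,
                 7, 59),
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("1", "2"), actual = c("1", "2")))
accuracy <- sum(diag(conf)) / sum(conf)  # correct predictions over all predictions
accuracy                                 # 139/151, about 0.92
```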

References.

https://www.kaggle.com/datasets
https://toolbox.google.com/datasetsearch
https://data.world/#
https://archive.ics.uci.edu/ml/datasets.html
https://www.openml.org/search?type=data
