2018-2019
MARCH-23
VIT-AP
NAME : D.VENKATARAMANA
REG NO : 17BME7082
1. Basic operations in R
2. Read data analysis
3. Data analysis and random sampling
4. Random sampling
5. Binomial distribution
6. Normal distribution
7. Z-test
8. LAB TEST
9. Regression
Basic operations in R:
1. Simple operations
Code:
> x<-c(2,4,3,5,6)
Output:
> x
[1] 2 4 3 5 6
Code:
> length(x)
Output:
[1] 5
Code:
> x[length(x)]
Output:
[1] 6
Code:
> min(x)
Output:
[1] 2
Code:
> max(x)
Output:
[1] 6
Code:
> x<-1:20
Output:
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Code:
> x[3]
Output:
[1] 3
Code:
> a<-c(2,3,5,9,11,12,23,6)
Output:
>a
[1] 2 3 5 9 11 12 23 6
Code:
> a[3]
Output:
[1] 5
Code:
> a[c(2,5,6)]
Output:
[1] 3 11 12
d) Print the data as {20, 19, …, 2, 1} without entering the data again.
Code:
> x<-20:1
Output:
>x
[1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
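The question asks for the reversed sequence without typing the data again. A minimal sketch of one way to do that reverses the existing vector with base R's rev(). The variable name y is only illustrative.

```r
# Reverse an existing vector instead of entering the data again
x <- 1:20
y <- rev(x)   # 20 19 ... 1
y

# Same result as the colon operator running downwards
identical(y, 20:1)
```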
Code:
x<-c(rep(4,4),rep(3,3),rep(5,3))
output:
>x
[1] 4 4 4 4 3 3 3 5 5 5
Code:
> x<-rep(c(4,3,5),c(4,3,3))
c) Create a list (3, 1, 5, 3, 2, 3, 4, 5, 7, 7, 7, 7, 7, 7, 6, 5, 4, 3, 2, 1, 34, 21, 54) using one
Code:
> x<-c(3,1,5,3,(2:5),rep(7,5),(7:1),34,21,54)
Output:
>x
[1] 3 1 5 3 2 3 4 5 7 7 7 7 7 7 6 5 4 3 2 1 34 21 54
d) First create a list (2, 1, 3, 4). Then append this list at the end with another list (5,
Check whether the number of elements in the augmented list is 11. 4
Code:
> x<-c(2,1,3,4)
> y<-c(5,7,12,6,-8)
> x<-c(x,y)
Output:
>x
[1] 2 1 3 4 5 7 12 6 -8
Code:
> length(x)
Output:
[1] 9
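Code:
> length(x)==11
Output:
[1] FALSE
So the augmented list has 9 elements, not 11. Note that the assignment length(x)=11 does not test the length. It pads x with two NA values, after which length(x) returns 11.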
4. (a) Print all numbers starting with 3 and ending with 7 with an increment of 0.5.
numbers in x.
Code:
> x<-0.5:7
Output:
>x
Code:
> seq(3,7,0.5)
Output:
[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Output:
[1] 2 4 6 8 10 12 14
(a) Type 2*x and see what you get. Each element of x is multiplied by 2.
Code:
> 2*x
Output:
[1] 4 8 12 16 20 24 28
Code:
> seq(1,10,1)
Output:
[1] 1 2 3 4 5 6 7 8 9 10
Code:
> x<-seq(1,10,1)
> sum(x)
Output:
[1] 55
Code:
> mean(x)
Output:
[1] 5.5
Code:
> median(x)
Output:
[1] 5.5
Code:
> mad(x)
Output:
[1] 3.7065
(e) Find the value of $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. This is known as the mean deviation about the mean, $MD_{\bar{x}}$.
Check whether $MD_{\bar{x}}$ is less than or equal to the standard deviation.
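Code:
> sd(x)
Output:
[1] 3.02765
Code:
> md<-mean(abs(x-mean(x)))
> md
Output:
[1] 2.5
Code:
> md<=sd(x)
Output:
[1] TRUE
mad(x) is the median absolute deviation (scaled by the constant 1.4826), not the mean deviation. For that reason mad(x)<=sd(x), which returns FALSE because 3.7065 > 3.02765, does not answer this question.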
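The mean deviation about the mean can be wrapped in a small helper, named mean_dev here purely for illustration. The sketch below compares it with sd() and mad() for x = 1:10, then checks the inequality on many random normal samples. The check should always hold. The mean absolute deviation is at most the root-mean-square deviation (by the Cauchy-Schwarz inequality), and sd() divides by n - 1, which makes it larger still.

```r
# Mean deviation about the mean: (1/n) * sum(|x_i - mean(x)|)
mean_dev <- function(x) mean(abs(x - mean(x)))

x <- 1:10
md <- mean_dev(x)
c(MD = md, SD = sd(x), MAD = mad(x))

# Check MD <= SD on 1000 random samples of size 25
set.seed(1)
ok <- replicate(1000, {
  s <- rnorm(25)
  mean_dev(s) <= sd(s)
})
all(ok)
```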
output:
a
5 59.75 1420 4 1.9 yes
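This record does not give the name of the data file behind the data frame a. With a real file, the usual call is read.table("<file>", header = TRUE). The sketch below avoids guessing a path. It types the first five rows of the house table inline, using rows that appear later in this record. The column name Price is an assumption, while the other names come from the a$... references below. The sketch also shows dim(), which returns the number of rows and columns in one call.

```r
# First five rows of the house table typed inline (hypothetical stand-in for the file)
txt <- "Price FloorArea Rooms Age CentralHeating
52.00 1225 3 6.2 no
54.75 1230 3 7.5 no
57.50 1200 3 4.2 no
57.50 1000 2 8.8 no
59.75 1420 4 1.9 yes"

# stringsAsFactors = TRUE makes CentralHeating a factor with levels "no" and "yes"
a5 <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
a5
dim(a5)   # number of rows and number of columns together
```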
b) How many rows are there in this table? How many columns are there?
Code:
nrow(a)
output:
[1] 20
Code:
> ncol(a)
Output:
[1] 5
c) How to find the number of rows and number of columns by a single command?
Code:
> c(nrow(a),ncol(a))
Output:
[1] 20 5
Code:
> names(a)
Output:
e) If the file is very large, we cannot simply type a, because the output will cover the entire screen and we will not
be able to understand anything. So how can we see the top or bottom few lines of this file?
Code:
> head(a)
Output:
4 57.50 1000 2 8.8 no
Code:
> tail(a)
Output:
f) If the number of columns is too large, we may face the same problem. So how can we see only the first 5 rows and the
first 3 columns?
Code:
a[1:5,1:3]
output:
1 52.00 1225 3
2 54.75 1230 3
3 57.50 1200 3
4 57.50 1000 2
5 59.75 1420 4
g) How to get 1st, 3rd, 6th, and 10th row and 2nd, 4th, and 5th column?
Code:
> a[c(1,3,6,10),c(2,4,5)]
Output:
1 1225 6.2 no
3 1200 4.2 no
6 1450 5.2 no
10 1550 9.2 no
Code:
> a[6,10]
NULL
> a[2,5]
[1] no
Levels: no yes
> a[1,2]
[1] 1225
> a[2,]
2 54.75 1230 3 7.5 no
> a[,5]
[1] no no no no yes no yes no no no yes no yes yes yes yes yes yes yes yes
Levels: no yes
> a[3]
Rooms
1 3
2 3
3 3
4 2
5 4
6 3
7 4
8 4
9 5
10 6
11 6
12 5
13 6
14 7
15 6
16 6
17 6
18 6
19 8
20 7
> a[1,]
1 52 1225 3 6.2 no
> a[0,]
> a[5]
CentralHeating
1 no
2 no
3 no
4 no
5 yes
6 no
7 yes
8 no
9 no
10 no
11 yes
12 no
13 yes
14 yes
15 yes
16 yes
17 yes
18 yes
19 yes
20 yes
3) Calculate simple statistical measures using the values in the data file
a) Find means, medians, standard deviations of Price, Floor Area, Rooms, and Age
code:
mean(a[,1])
[1] 71.5875
> median(a[,1])
[1] 69.875
> sd(a[,1])
[1] 12.21094
> mean(a$Age)
[1] 4.205
> sd(a$Age)
[1] 2.786523
> sum(a$CentralHeating=="yes")
[1] 11
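Computing each statistic column by column gets repetitive. A minimal sketch with sapply() produces the mean, median and standard deviation of every numeric column in one call. It runs on a hypothetical five-row subset (rows 1 to 5 as printed in this record). On the full table, the same call would apply to a[, 1:4], assuming the first four columns are the numeric ones.

```r
# Hypothetical five-row subset of the house table
a5 <- data.frame(
  Price     = c(52.00, 54.75, 57.50, 57.50, 59.75),
  FloorArea = c(1225, 1230, 1200, 1000, 1420),
  Rooms     = c(3, 3, 3, 2, 4),
  Age       = c(6.2, 7.5, 4.2, 8.8, 1.9)
)

# One call: a 3 x 4 matrix with rows mean, median, sd and one column per variable
stats <- sapply(a5, function(col) c(mean = mean(col), median = median(col), sd = sd(col)))
round(stats, 3)
```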
Code:
> hist(a$FloorArea,freq=F)
Output:
Draw all the graphs in (c), (d), and (e) on the same page.
Code:
> par(mfrow=c(2,2))
> plot(a$Price,a$FloorArea,col=1,pch=3)
> plot(a$Price,a$FloorArea,col=1,pch=2)
> hist(a$FloorArea)
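par(mfrow = c(2, 2)) stays in effect for every later plot on the device. A common pattern, sketched here, saves the value that par() returns and restores it afterwards. The two vectors are a hypothetical stand-in for a$Price and a$FloorArea, using the first five rows of the table.

```r
# Hypothetical stand-in data: rows 1-5 of the house table
price     <- c(52.00, 54.75, 57.50, 57.50, 59.75)
floorArea <- c(1225, 1230, 1200, 1000, 1420)

old <- par(mfrow = c(2, 2))   # 2 x 2 grid; par() returns the previous settings
plot(price, floorArea, col = 1, pch = 3)
plot(price, floorArea, col = 1, pch = 2)
hist(floorArea)
hist(floorArea, freq = FALSE)
layout_used <- par("mfrow")
par(old)                      # restore the previous single-plot layout
```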
Data analysis and random sampling:
a) Matrices and arrays are represented as vectors with dimensions. Create a matrix of the numbers 1 to 12 of order 3×4.
Code:
> x<-1:12
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> dim(x)<-c(3,4)
Output:
> x
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Code:
> rownames(x)<-LETTERS[1:3]
Output:
> x
[,1] [,2] [,3] [,4]
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
b) Transpose of the matrix.
Code:
> t(x)
Output:
A B C
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
Filling the numbers by rows instead of by columns gives a different 3×4 matrix, not the transpose:
Code:
> matrix(1:12,nrow=3,byrow=TRUE)
Output:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
c) Use functions cbind and rbind separately to create different matrices.
Code:
> rbind(A=c(1,2,3,4),B=c(5,6,7,8))
Output:
[,1] [,2] [,3] [,4]
A 1 2 3 4
B 5 6 7 8
Code:
> cbind(P=c(1,5,9),Q=c(2,6,10))
Output:
P Q
[1,] 1 2
[2,] 5 6
[3,] 9 10
d) Use arbitrary numbers to create a matrix.
Code:
> p<-matrix(1:4,4,4)
Output:
> p
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
[4,] 4 4 4 4
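The relationships between dim<-, matrix(), byrow and t() can be checked directly. A minimal sketch: setting dim on 1:12 is the same as matrix(1:12, nrow = 3), and its transpose equals a 4×3 matrix filled by rows. A 3×4 matrix filled by rows has a different shape from the transpose.

```r
x <- 1:12
dim(x) <- c(3, 4)             # filled column by column
m <- matrix(1:12, nrow = 3)   # also filled column by column
identical(x, m)

# Transpose of the column-filled 3 x 4 matrix = 4 x 3 matrix filled by rows
identical(t(m), matrix(1:12, nrow = 4, byrow = TRUE))

# byrow = TRUE with the same 3 x 4 shape is not the transpose
dim(matrix(1:12, nrow = 3, byrow = TRUE))   # 3 4
dim(t(m))                                   # 4 3
```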
3. RANDOM SAMPLING
a) In R, you can simulate these situations with the sample function. Pick five numbers at
random from the set 1:40.
Code:
> sample(1:40,5)
Output:
[1] 30 16 1 24 14
Code:
> sample(1:40,5)
Output:
[1] 36 23 26 27 35
Code:
> sample(1:40,5)
Output:
[1] 13 6 26 28 20
Code:
> sample(1:40,5)
Output:
[1] 12 7 9 17 27
b) Notice that the default behaviour of sample is sampling without replacement, so the
samples will not contain the same number twice, and size obviously cannot be bigger than the
length of the vector to be sampled. If you want sampling with replacement, then
add the argument replace=TRUE. Sampling with replacement is suitable for modelling coin
tosses or throws of a die. So, for instance, simulate 10 coin tosses.
> sample(c("suc","fail"),10,replace =T,prob=c(0.9,0.1))
[1] "suc" "suc" "suc" "fail" "suc" "suc" "suc" "suc" "suc"
> sample(c("suc","fail"),10,replace =T,prob=c(0.9,0.1))
[1] "suc" "suc" "suc" "suc" "fail" "suc" "suc" "suc" "suc"
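The four calls to sample(1:40, 5) above gave different results because no seed was fixed. The sketch below shows that set.seed() makes a draw reproducible. It also checks the two properties of sampling without replacement described in b), then simulates 10 fair coin tosses with replace = TRUE. The seed value 40 and the labels "H"/"T" are arbitrary choices.

```r
set.seed(40)                    # same seed => same "random" draw
s1 <- sample(1:40, 5)
set.seed(40)
s2 <- sample(1:40, 5)
identical(s1, s2)               # TRUE

anyDuplicated(s1) == 0          # without replacement: no repeated numbers
try(sample(1:5, 6))             # error: size cannot exceed the population size

# With replacement: 10 tosses of a fair coin (equal probabilities by default)
tosses <- sample(c("H", "T"), 10, replace = TRUE)
table(tosses)
```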
e) Find 5!
Code:
> factorial(5)
Output:
[1] 120
BINOMIAL DISTRIBUTION:
Syntax: dbinom(x, size, prob) gives P(X = x); pbinom(q, size, prob) gives P(X ≤ q).
EXERCISE IN LAB
dbinom(3,5,0.95)
[1] 0.02143438
> dbinom(c(0,1,2,3,4),5,0.95)
> dbinom(0:4,5,0.95)
[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625
> dbinom(2:4,5,0.95)
> sum(dbinom(2:4,5,0.95))
[1] 0.2261891
> pbinom(2:4,5,0.95)
> pbinom(0:4,5,0.95)
> pbinom(4,5,.95)-pbinom(1,5,0.95)
[1] 0.2261891
> x<-seq(0,5,by=0.1)
> y<-pbinom(x,5,0.95)
There were 45 warnings (use warnings() to see them)
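dbinom() evaluates the binomial formula $\binom{n}{k}p^k(1-p)^{n-k}$, and it issues a warning when asked for non-integer k. The sketch below therefore uses only the integer support 0:5. It checks the formula with choose(), relates choose() to factorial(), and computes P(2 ≤ X ≤ 4) in the two ways used in the lab.

```r
n <- 5; p <- 0.95
k <- 0:n                                   # integer support only

# dbinom(k, n, p) = choose(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(manual, dbinom(k, n, p))

# choose() from factorials: C(5, 3) = 5! / (3! 2!) = 10
all.equal(choose(5, 3), factorial(5) / (factorial(3) * factorial(2)))

# P(2 <= X <= 4) two ways
sum(dbinom(2:4, n, p))
pbinom(4, n, p) - pbinom(1, n, p)
```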
Normal distribution:
output:
[1] 0.2524925
3) What percentage of people have a value between 110 and 125?
Code:
Output:
[1] 0.6997171
Code:
qnorm(.25)
output:
[1] -0.6744898
Code:
> qnorm(.25,2,3)
Output:
[1] -0.02346925
Code:
qnorm(0.25,100,15)
output:
[1] 89.88265
Code:
> qnorm(0.01,100,15)
Output:
[1] 65.10478
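The qnorm results above follow from two facts that can be checked numerically. First, qnorm() is the inverse of pnorm(). Second, a quantile of N(μ, σ) is μ + σ times the standard normal quantile. The probabilities in p are arbitrary test values.

```r
p <- c(0.01, 0.25, 0.5, 0.975)

# qnorm is the inverse of pnorm
all.equal(pnorm(qnorm(p, 100, 15), 100, 15), p)

# p-quantile of N(mu, sigma) = mu + sigma * qnorm(p)
all.equal(qnorm(p, 100, 15), 100 + 15 * qnorm(p))

# Lower quartile of N(100, 15), as computed in the lab
100 + 15 * qnorm(0.25)
```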
8) Generate 20 random numbers from a normal distribution with mean 572 and standard
deviation 51, and calculate the mean and standard deviation of the data set.
Code:
norm<-rnorm(20,572,51)
output:
> norm[1:20]
> mean(norm)
[1] 563.0454
> sd(norm)
[1] 46.16659
9) Make an appropriate histogram of the data and visually check whether the normal density curve and the
density estimate are similar.
Code:
hist(norm,main="normal distribution",prob=TRUE)
output:
Code:
curve(dnorm(x,572,51),add=TRUE)
output:
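The question also mentions a density estimate. A minimal sketch adds a kernel density estimate from density() next to the theoretical N(572, 51) curve. set.seed() makes the draw reproducible. The seed 572 is arbitrary, so the sample mean and standard deviation will differ from the lab's 563.0454 and 46.16659. The variable is called norm_data because norm is also the name of a base R function.

```r
set.seed(572)                                   # arbitrary seed, for reproducibility only
norm_data <- rnorm(20, mean = 572, sd = 51)
c(mean = mean(norm_data), sd = sd(norm_data))

# Histogram on the density scale so it is comparable with density curves
hist(norm_data, freq = FALSE, main = "normal distribution")
curve(dnorm(x, mean = 572, sd = 51), add = TRUE)   # theoretical normal density
lines(density(norm_data), lty = 2)                  # kernel density estimate of the sample
```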
Z-test:
Test the hypothesis that the mean systolic blood pressure in a certain population equals 140
mmHg. The standard deviation has a known value of 20, and the data set of 55 patients is:
120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169
Code:
> x<-c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
+ 109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
+ 149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)
Output:
> x
[1] 120 115 94 118 111 102 102 131 104 107 115 139 115 113 114 105 115 134 109
[20] 109 93 118 109 106 125 150 142 119 127 141 149 144 142 149 161 143 140 148
[39] 149 141 146 159 152 135 134 161 130 125 141 148 153 145 137 147 169
Code:
n<-length(x)
Output:
n
[1] 55
Code:
mean(x)
output:
[1] 130
Code:
> sd(x)
Output:
[1] 19.16691
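Code:
> z<-(mean(x)-140)/(20/sqrt(n))
Output:
> z
[1] -3.708099
Code:
> 2*pnorm(-abs(z))
Because the population standard deviation is known (20), it is used in the test statistic instead of sd(x). The two-sided p-value is about 0.0002, far below 0.05, so the hypothesis that the mean systolic blood pressure equals 140 mmHg is rejected.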
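Base R has no z.test() function, so the steps above can be collected in a small helper. z_test is a hypothetical name, and this is a sketch rather than a library API. It computes the one-sample z statistic with a known σ and returns the p-value for the chosen alternative. The example applies it to the blood pressure data from the exercise.

```r
# Hypothetical helper: one-sample z-test with known population sd sigma
z_test <- function(x, mu0, sigma,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  list(statistic = z, p.value = p)
}

# Systolic blood pressure data from the exercise (n = 55)
bp <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
        109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
        149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)

res <- z_test(bp, mu0 = 140, sigma = 20)
res$statistic   # about -3.708
res$p.value     # well below 0.05
```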
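Q) A coin is tossed 100 times and comes up heads 43 times. Test the claim that the coin is fair at the 5% level
of significance.
Code:
> prop.test(43,100,0.5,conf.level=0.95)
Output:
95 percent confidence interval:
0.3326536 0.5327873
sample estimates:
0.43
Since 0.5 lies inside the 95 percent confidence interval, the claim that the coin is fair is not rejected at the 5% level.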
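prop.test() relies on a normal approximation with a continuity correction. binom.test() in the same stats package runs the exact binomial test on the same data. The sketch runs both so the two p-values can be compared. The object names are illustrative.

```r
# Normal approximation (with continuity correction) vs exact binomial test
prop_res  <- prop.test(43, 100, p = 0.5, conf.level = 0.95)
exact_res <- binom.test(43, 100, p = 0.5, conf.level = 0.95)

c(approx = prop_res$p.value, exact = exact_res$p.value)
prop_res$conf.int
exact_res$conf.int
# Both p-values exceed 0.05, so fairness is not rejected at the 5% level
```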
LAB TEST-1
1) An outbreak of salmonella-related illness was attributed to ice cream produced at a
Is there evidence that the mean level of Salmonella in ice cream is greater than 0.3?
SOLUTION:
H0: mu = 0.3 versus H1: mu > 0.3, i.e. alternative="greater", mu=0.3
code:
t.test(v,alternative="greater",mu=0.3)
Result:
data: v
95 percent confidence interval:
0.3245133 Inf
sample estimates:
mean of x
0.4564444
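CONCLUSION:
P-VALUE=0.02927. Since the p-value is less than 0.05 (and the lower confidence bound 0.3245 exceeds 0.3), H0 is rejected at the 5% level: there is evidence that the mean level of Salmonella in the ice cream is greater than 0.3.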
2) Comparing two independent sample means, taken from two populations with
different countries with unknown population variances. Is there any
A: 175 168 168 190 156 181 182 175 174 179
B: 185 169 173 173 188 186 175 174 179 180
CODE:
a<-c(175,168,168,190,156,181,182,175,174,179)
b<-c(185,169,173,173,188,186,175,174,179,180)
var.test(a,b)
Result:
data: a and b
0.5223017 8.4657950
sample estimates:
ratio of variances
2.102784
CODE:
qf(0.95,9,9)
Result:
[1] 3.178893
CODE:
t.test(a,b,var.equal=TRUE,paired=FALSE)
Result:
data: a and b
-10.93994 4.13994
sample estimates:
mean of x mean of y
174.8 178.2
CODE:
> qt(0.975,18)
Result:
[1] 2.100922
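CONCLUSION:
The variance ratio 2.102784 is below the critical value qf(0.95,9,9) = 3.178893, so the two population variances can be treated as equal. The 95 percent confidence interval for the difference of means, (-10.93994, 4.13994), contains 0 (equivalently, |t| is about 0.95, below qt(0.975,18) = 2.100922), so there is no significant difference between the two means at the 5% level.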
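Both test statistics behind this conclusion can be computed by hand from the two samples. The F statistic is the ratio of sample variances. The pooled two-sample t statistic divides the difference of means by $s_p\sqrt{1/n_1 + 1/n_2}$. A minimal sketch that reproduces them and compares them with var.test() and t.test():

```r
a <- c(175,168,168,190,156,181,182,175,174,179)
b <- c(185,169,173,173,188,186,175,174,179,180)

# F statistic: ratio of the two sample variances
F_stat <- var(a) / var(b)

# Pooled two-sample t statistic (equal variances assumed)
n1 <- length(a); n2 <- length(b)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
t_stat <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

c(F = F_stat, t = t_stat)
var.test(a, b)$statistic
t.test(a, b, var.equal = TRUE)$statistic
```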
REGRESSION:
Find:
1. Scatter plot
Code:
x<-c(132,129,120,113,105,92,84,83.2,88.4,59,80,81.5,71,69.2)
> y<-c(46,48,51,52.1,54,52,59,58.7,61.6,64,61.4,54.6,58.8,58.0)
Output:
>x
[1] 132.0 129.0 120.0 113.0 105.0 92.0 84.0 83.2 88.4 59.0 80.0 81.5 71.0 69.2
>y
[1] 46.0 48.0 51.0 52.1 54.0 52.0 59.0 58.7 61.6 64.0 61.4 54.6 58.8 58.0
Code:
plot(y~x)
output:
2. Finding the regression line
Code:
> plot(x,y,pch=16,cex=1.3,main="Fuel quality test",xlab="iodine value",ylab="cetane value")
> abline(lm(y~x),col="red")
Output:
3. Plot the regression line and predict the value of y for x = 90.
Code:
lm(y~x)
output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
75.2224 -0.2095
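The question also asks for a prediction at x = 90, which the fitted coefficients above do not give directly. A minimal sketch uses predict() on the lm fit with a one-row data frame. From the coefficients, the prediction is about 75.2224 - 0.2095 × 90 ≈ 56.37. The interval = "prediction" argument also adds a 95% prediction interval.

```r
x <- c(132,129,120,113,105,92,84,83.2,88.4,59,80,81.5,71,69.2)
y <- c(46,48,51,52.1,54,52,59,58.7,61.6,64,61.4,54.6,58.8,58.0)

fit <- lm(y ~ x)
coef(fit)

# Point prediction of y at x = 90
pred <- predict(fit, newdata = data.frame(x = 90))
pred

# Same prediction with a 95% prediction interval
predict(fit, newdata = data.frame(x = 90), interval = "prediction")
```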
THANK YOU