
R-LAB REPORT

2018-2019

MARCH-23

VIT-AP
NAME : D.VENKATARAMANA
REG NO : 17BME7082


1. Basic operations on R
2. Read data and analyse
3. Data analysis and random sampling
4. Random sampling
5. Binomial distribution
6. Normal distribution
7. Z-test
8. Lab test
9. Regression

Basic operations on R:

1. Simple Operations

a) Enter the data {2,4,3,5,6} directly and store it in a variable x.


Code:
x<-c(2,4,3,5,6)

Output:

>x

[1] 2 4 3 5 6

b) Find the number of elements in x, i.e. in the data list.

Code:

length(x)

output:

[1] 5

c) Find the last element of x.

code:

x[length(x)]

output:

[1] 6

d) Find the minimum element of x.

code:

min(x)

output:

[1] 2

e) Find the maximum element of x.

code:

> max(x)

Output:

[1] 6

2. Enter the data {1, 2, …. ,19,20} in a variable x.

Code:

> x<-1:20

Output:

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

a) Find the 3rd element in the data list.

Code:

> x[3]

Output:

[1] 3

b) Find 3rd to 5th element in the data list.

Code:

> a<-c(2,3,5,9,11,12,23,6)

Output:

>a

[1] 2 3 5 9 11 12 23 6

Code:

> a[3:5]

Output:

[1] 5 9 11

c) Find 2nd, 5th, 6th, and 12th element in the list

code:

> x[c(2,5,6,12)]

Output:

[1] 2 5 6 12

d) Print the data as {20, 19, …, 2, 1} without again entering the data.

Code:

> x<-rev(x)

Output:

>x

[1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

a) Create a data list (4, 4, 4, 4, 3, 3, 3, 5, 5, 5) using ‘rep’ function.

Code:

x<-c(rep(4,4),rep(3,3),rep(5,3))

output:

>x

[1] 4 4 4 4 3 3 3 5 5 5

Code:

> x<-rep(c(4,3,5),c(4,3,3))
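
The two calls above build the same vector. As a minimal sketch, the `times` and `each` arguments of `rep` can be compared directly; the names `x1`, `x2` and `x3` are only illustrative.

```r
# Vector of counts: repeat 4 four times, 3 three times, 5 three times.
x1 <- rep(c(4, 3, 5), times = c(4, 3, 3))

# The same result built piece by piece.
x2 <- c(rep(4, 4), rep(3, 3), rep(5, 3))

# 'each' repeats every element the same number of times.
x3 <- rep(c(4, 3, 5), each = 2)

identical(x1, x2)
x3
```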

c) Create a list (3, 1, 5, 3, 2, 3, 4, 5, 7,7, 7, 7, 7,7, 6, 5, 4, 3, 2, 1, 34, 21, 54) using one

Code:

> x<-c(3,1,5,3,(2:5),rep(7,5),(7:1),34,21,54)

Output:

>x

[1] 3 1 5 3 2 3 4 5 7 7 7 7 7 7 6 5 4 3 2 1 34 21 54

d) First create a list (2, 1, 3, 4). Then append this list at the end with another list (5,
Check whether the number of elements in the augmented list is 11. 4

code:

> x<-c(2,1,3,4)

> y<-c(5,7,12,6,-8)

> x<-c(x,y)

Output:

>x

[1] 2 1 3 4 5 7 12 6 -8

Code:

> length(x)

Output:

[1] 9

Code:

> length(x)==11

Output:

[1] FALSE

The augmented list has 9 elements, not 11.

4. (a) Print all numbers starting with 3 and ending with 7 with an increment of 0.5. Store these numbers in x.

Code:

> x<-seq(3,7,0.5)

Output:

>x

[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

(b) Print all even numbers between 2 and 14 (both inclusive).

Code:

> x<-seq(2,14,2)

Output:

>x

[1] 2 4 6 8 10 12 14
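
As a small sketch of the `seq` calls used in this exercise, the same sequences can be requested either by step size (`by`) or by number of points (`length.out`); the variable names are illustrative only.

```r
# 3 to 7 in steps of 0.5, specified by step size.
s1 <- seq(3, 7, by = 0.5)

# The same sequence, specified by the number of points instead.
s2 <- seq(3, 7, length.out = 9)

# Even numbers from 2 to 14, two equivalent ways.
evens  <- seq(2, 14, by = 2)
evens2 <- 2 * (1:7)

all.equal(s1, s2)
all(evens == evens2)
```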

(c) Type 2*x and see what you get. Each element of x is multiplied by 2.

Code:

> 2*x

Output:

[1] 4 8 12 16 20 24 28


5. Few simple statistical measures:

(a) Enter data as 1,2, … ,10.

Code:

> seq(1,10,1)

Output:

[1] 1 2 3 4 5 6 7 8 9 10

(b) Find sum of the numbers.

Code:

> x<-seq(1,10,1)

> sum(x)

Output:

[1] 55

(c) Find mean, median.

Code:

> mean(x)

Output:

[1] 5.5

Code:

> median(x)

Output:

[1] 5.5

Code:

> mad(x)

Output:

[1] 3.7065

(e) Find the value of $\frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}|$. This is known as the mean deviation about the mean ($MD_{\bar{x}}$). Check whether $MD_{\bar{x}}$ is less than or equal to the standard deviation.

Code:

> md<-mean(abs(x-mean(x)))

> md

Output:

[1] 2.5

Code:

> sd(x)

Output:

[1] 3.02765

Code:

> md<=sd(x)

Output:

[1] TRUE
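
Note that R's `mad()` computes the scaled median absolute deviation, which is not the mean deviation about the mean asked for in (e). A minimal sketch of the definition, using a hypothetical helper `mean_dev` (not a base R function), could look like this:

```r
# Mean deviation about the mean, written out from its definition:
# (1/n) * sum(|x_i - mean(x)|).  'mean_dev' is a hypothetical helper name.
mean_dev <- function(v) mean(abs(v - mean(v)))

x  <- 1:10
md <- mean_dev(x)   # mean deviation about the mean
s  <- sd(x)         # sample standard deviation (divisor n - 1)

md
s
md <= s             # the comparison asked for in (e)
```

For the data 1, 2, ..., 10 the absolute deviations from 5.5 add up to 25, so the mean deviation is 25/10 = 2.5.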

Read data and analyse:


Reading a data file and working with it:
a) Read the file first and store it in a.
Code:
a<-read.csv("house_data_1.csv",header=T)

output:
a

   Price FloorArea Rooms Age CentralHeating
1  52.00      1225     3 6.2             no
2  54.75      1230     3 7.5             no
3  57.50      1200     3 4.2             no
4  57.50      1000     2 8.8             no
5  59.75      1420     4 1.9            yes
6  62.50      1450     3 5.2             no
7  64.75      1380     4 6.6            yes
8  67.25      1510     4 2.3             no
9  67.50      1400     5 6.1             no
10 69.75      1550     6 9.2             no
11 70.00      1720     6 4.3            yes
12 75.50      1700     5 4.3             no
13 77.50      1660     6 1.0            yes
14 78.00      1800     7 7.0            yes
15 81.25      1830     6 3.6            yes
16 82.50      1790     6 1.7            yes
17 86.25      2010     6 1.2            yes
18 87.50      2000     6 0.0            yes
19 88.00      2100     8 2.3            yes
20 92.00      2240     7 0.7            yes

b) How many rows are there in this table? How many columns are there?
Code:
nrow(a)

output:

[1] 20

Code:

> ncol(a)

Output:

[1] 5

c) How to find the number of rows and number of columns by a single command?

Code:

> dim(a)

Output:

[1] 20 5

d) What are the variables in the data file?

Code:

> names(a)

Output:

[1] "Price" "FloorArea" "Rooms" "Age" "CentralHeating"

e) If the file is very large, naturally we cannot simply type 'a', because it will cover the entire screen and we will not be able to understand anything. So how to see the top or bottom few lines in this file?

Code:

> head(a)

Output:

  Price FloorArea Rooms Age CentralHeating
1 52.00      1225     3 6.2             no
2 54.75      1230     3 7.5             no
3 57.50      1200     3 4.2             no
4 57.50      1000     2 8.8             no
5 59.75      1420     4 1.9            yes
6 62.50      1450     3 5.2             no

Code:

> tail(a)

Output:

   Price FloorArea Rooms Age CentralHeating
15 81.25      1830     6 3.6            yes
16 82.50      1790     6 1.7            yes
17 86.25      2010     6 1.2            yes
18 87.50      2000     6 0.0            yes
19 88.00      2100     8 2.3            yes
20 92.00      2240     7 0.7            yes

f) If the number of columns is too large, again we may face the same problem. So how to see the first 5 rows and first 3 columns?

Code:

a[1:5,1:3]

output:

  Price FloorArea Rooms
1 52.00      1225     3
2 54.75      1230     3
3 57.50      1200     3
4 57.50      1000     2
5 59.75      1420     4

g) How to get 1st, 3rd, 6th, and 10th row and 2nd, 4th, and 5th column?

Code:

> a[c(1,3,6,10),c(2,4,5)]

Output:

   FloorArea Age CentralHeating
1       1225 6.2             no
3       1200 4.2             no
6       1450 5.2             no
10      1550 9.2             no

h) How to get values in a specific row or a column?

Code:

> a[6,10]

NULL

> a[2,5]

[1] no

Levels: no yes

> a[1,2]

[1] 1225

> a[2,]

Price FloorArea Rooms Age CentralHeating

2 54.75 1230 3 7.5 no

> a[,5]

[1] no no no no yes no yes no no no yes no yes yes yes yes yes yes yes yes

Levels: no yes

> a[3]

Rooms

1 3

2 3

3 3

4 2

5 4

6 3

7 4

8 4

9 5

10 6

11 6

12 5

13 6

14 7

15 6

16 6

17 6

18 6

19 8

20 7

> a[1,]

Price FloorArea Rooms Age CentralHeating

1 52 1225 3 6.2 no

> a[0,]

[1] Price FloorArea Rooms Age CentralHeating

<0 rows> (or 0-length row.names)

> a[5]

CentralHeating

1 no

2 no

3 no

4 no

5 yes

6 no

7 yes

8 no

9 no

10 no

11 yes

12 no

13 yes

14 yes

15 yes

16 yes

17 yes

18 yes

19 yes

20 yes

3) Calculate simple statistical measures using the values in the data file

a) Find means, medians, standard deviations of Price, Floor Area, Rooms, and Age

code:

mean(a[,1])

[1] 71.5875

> median(a[,1])

[1] 69.875

> sd(a[,1])

[1] 12.21094

> mean(a$Age)

[1] 4.205

> sd(a$Age)

[1] 2.786523

> sum(a$CentralHeating=="yes")

[1] 11
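
Part (a) asks for the mean, median and standard deviation of all four numeric columns. One possible sketch computes them in a single `sapply` call. The data frame `house` is a hypothetical name; its values are re-typed from the listing of house_data_1.csv above (row 20 uses the price 92.00 shown in the `tail(a)` output), so the CSV file itself is not needed.

```r
# Numeric columns re-typed from the listing of house_data_1.csv
# ('house' is an illustrative name, not the object used in the report).
house <- data.frame(
  Price     = c(52.00, 54.75, 57.50, 57.50, 59.75, 62.50, 64.75, 67.25, 67.50, 69.75,
                70.00, 75.50, 77.50, 78.00, 81.25, 82.50, 86.25, 87.50, 88.00, 92.00),
  FloorArea = c(1225, 1230, 1200, 1000, 1420, 1450, 1380, 1510, 1400, 1550,
                1720, 1700, 1660, 1800, 1830, 1790, 2010, 2000, 2100, 2240),
  Rooms     = c(3, 3, 3, 2, 4, 3, 4, 4, 5, 6, 6, 5, 6, 7, 6, 6, 6, 6, 8, 7),
  Age       = c(6.2, 7.5, 4.2, 8.8, 1.9, 5.2, 6.6, 2.3, 6.1, 9.2,
                4.3, 4.3, 1.0, 7.0, 3.6, 1.7, 1.2, 0.0, 2.3, 0.7)
)

# One row per statistic, one column per variable.
stats <- sapply(house, function(col) {
  c(mean = mean(col), median = median(col), sd = sd(col))
})
round(stats, 3)
```

The Price and Age columns of `stats` can be checked against the individual `mean()`, `median()` and `sd()` calls shown above.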

b) Draw histograms of Prices, FloorArea, and Age.

Code:

> hist(a$FloorArea,freq=F)

Output:

Draw all the graphs in (c), (d), and (e) on the same graph paper.

par(mfrow=c(2,2))

> plot(a$Price,a$FloorArea,col=1,pch=3)

> plot(a$Price,a$FloorArea,col=1,pch=2)

> hist(a$FloorArea)

Data analysis and random sampling:
a) Matrices and arrays are represented as vectors with dimensions: create one matrix from the numbers 1 to 12 with order 3x4.
Code:
> x<-1:12
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> dim(x)<-c(3,4)
Output:
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

b) Create same matrix with matrix function.


Code:
> matrix(x,3,4)
Output:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
a) Give name of rows of this matrix with A,B,C.
Code:
> matrix(1:12,3,4)

Output:

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Code:
x<-matrix(1:12,nrow=3,dimnames=list(c("A","B","C"),c("P","

output:
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Code:
> rownames(x)<-LETTERS[1:3]
Output:
> x
  [,1] [,2] [,3] [,4]
A    1    4    7   10
B    2    5    8   11
C    3    6    9   12
b) Transpose of the matrix.
Code:
> matrix(1:12,nrow=3,byrow=T)
Output:
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
c) Use functions cbind and rbind separately to create different matrices.
Code:
> rbind(A=c(1,2,3,4),B=c(5,6,7,8))
Output:
  [,1] [,2] [,3] [,4]
A    1    2    3    4
B    5    6    7    8
Code:
> cbind(P=c(1,5,9),Q=c(2,6,10))
Output:
     P  Q
[1,] 1  2
[2,] 5  6
[3,] 9 10
Code:
> t(x)
Output:
      A  B  C
[1,]  1  2  3
[2,]  4  5  6
[3,]  7  8  9
[4,] 10 11 12
d) Use arbitrary numbers to create matrix.
Code:
> p<-matrix(1:4,4,4)
output:
> p
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3
[4,]    4    4    4    4

g) Verify matrix multiplication.


Code:
> p%*%q
Output:
[,1] [,2] [,3] [,4]
[1,] 22 22 22 22
[2,] 44 44 44 44
[3,] 66 66 66 66
[4,] 88 88 88 88
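
The matrix q used above is not shown in the report. As a sketch under that caveat, the product `p %*% q` can be verified entry by entry with an assumed example matrix; the `q` below is only an illustration and is not the q of the report.

```r
p <- matrix(1:4, 4, 4)     # as in (d): every column is 1, 2, 3, 4
q <- matrix(1:16, 4, 4)    # assumed example matrix, NOT the q of the report

# The product is defined only when ncol(p) equals nrow(q).
stopifnot(ncol(p) == nrow(q))
pq <- p %*% q

# Entry (i, j) is the dot product of row i of p with column j of q.
manual <- matrix(0, nrow(p), ncol(q))
for (i in seq_len(nrow(p))) {
  for (j in seq_len(ncol(q))) {
    manual[i, j] <- sum(p[i, ] * q[, j])
  }
}

all.equal(pq, manual)
```

Note that `p * q` (without the percent signs) would multiply element by element, which is a different operation.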

3.RANDOM SAMPLING
a) In R, you can simulate these situations with the sample function. Pick five numbers at random from the set 1:40.
Code:
> sample(1:40,5)
Output:
[1] 30 16 1 24 14
Code:
> sample(1:40,5)
Output:
[1] 36 23 26 27 35
Code:
> sample(1:40,5)
Output:
[1] 13 6 26 28 20
Code:
> sample(1:40,5)
Output:
[1] 12 7 9 17 27
b) Notice that the default behaviour of sample is sampling without replacement. That is, the samples will not contain the same number twice, and size obviously cannot be bigger than the length of the vector to be sampled. If you want sampling with replacement, then you need to add the argument replace=TRUE. Sampling with replacement is suitable for modelling coin tosses or throws of a die. So, for instance, simulate 10 coin tosses.
Code:
> sample(c("suc","fail"),10,replace =T,prob=c(0.9,0.1))
Output:
[1] "suc" "suc" "suc" "fail" "suc" "suc" "suc" "suc" "suc" ...
Code:
> sample(c("suc","fail"),10,replace =T,prob=c(0.9,0.1))
Output:
[1] "suc" "suc" "suc" "suc" "fail" "suc" "suc" "suc" "suc" ...

c) In fair coin-tossing, the probability of heads should equal the probability of tails, but the idea of a random event is not restricted to symmetric cases. It could be equally well applied to other cases, such as the successful outcome of a surgical procedure. Hopefully, there would be a better than 50% chance of this. Simulate data with nonequal probabilities for the outcomes (say, a 90% chance of success) by using the prob argument to sample.
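
A minimal sketch of both simulations, with a fixed seed so that a run can be repeated. The seed value, the labels "H"/"T" and the sample size of 1000 are arbitrary choices made for this illustration.

```r
set.seed(1)   # arbitrary seed, only so that the run is reproducible

# Fair coin: 10 tosses, sampling with replacement, equal probabilities.
coin <- sample(c("H", "T"), 10, replace = TRUE)

# Unequal probabilities: 90% chance of success, as in part (c).
surgery <- sample(c("suc", "fail"), 1000, replace = TRUE, prob = c(0.9, 0.1))

table(coin)
prop <- mean(surgery == "suc")   # observed share of successes, close to 0.9
prop
```

With many draws the observed proportion of "suc" settles near the value given in `prob`, which is a quick check that the argument was set as intended.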

d) The choose function can be used to calculate the number of ways to choose 5 numbers out of 40.

Code:
> choose(40, 5)
Output:
[1] 658008

e) Find 5!
Code:
> factorial(5)
Output:
[1] 120
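
The two functions are linked by the formula $\binom{n}{k}=\frac{n!}{k!\,(n-k)!}$. A short sketch checking this for n = 40 and k = 5 (variable names are illustrative):

```r
n <- 40
k <- 5

by_choose    <- choose(n, k)
by_factorial <- factorial(n) / (factorial(k) * factorial(n - k))

# Cancelling 35! leaves (36*37*38*39*40) / 5!, which avoids huge numbers.
by_product   <- prod(36:40) / prod(1:5)

all.equal(by_choose, by_factorial)
all.equal(by_choose, by_product)
```

`all.equal` is used instead of `==` because `factorial(40)` is a very large floating-point number and the quotient may differ from 658008 by rounding error.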

BINOMIAL DISTRIBUTION:

Syntax:

• If x is the number of successes in size binomial trials, each with success probability prob:

• P(X = x): dbinom(x, size, prob)

• P(X ≤ x): pbinom(x, size, prob, lower.tail=TRUE)

• P(X > x): pbinom(x, size, prob, lower.tail=FALSE)

• qbinom() gives the quantiles for the binomial distribution
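
The relations between these functions can be checked numerically. This is a sketch using the values size = 5 and prob = 0.95 from the lab exercise; the choice k = 3 is arbitrary.

```r
size <- 5
prob <- 0.95
k    <- 3

p_eq <- dbinom(k, size, prob)                       # P(X = k)
p_le <- pbinom(k, size, prob)                       # P(X <= k)
p_gt <- pbinom(k, size, prob, lower.tail = FALSE)   # P(X > k)

# The cumulative probability is the sum of the point probabilities 0..k.
all.equal(p_le, sum(dbinom(0:k, size, prob)))

# Lower and upper tails add up to 1.
all.equal(p_le + p_gt, 1)

# qbinom returns the smallest x with P(X <= x) >= the given probability.
med <- qbinom(0.5, size, prob)
med
```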

EXERCISE IN LAB

dbinom(3,5,0.95)

[1] 0.02143438

> dbinom(c(0,1,2,3,4),5,0.95)

[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625

> dbinom(0:4,5,0.95)

[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625


> dbinom(2:4,5,0.95)

[1] 0.001128125 0.021434375 0.203626563

> sum(dbinom(2:4,5,0.95))

[1] 0.2261891

> pbinom(2:4,5,0.95)

[1] 0.001158125 0.022592500 0.226219063

> pbinom(0:4,5,0.95)

[1] 0.0000003125 0.0000300000 0.0011581250 0.0225925000 0.2262190625

> pbinom(4,5,.95)-pbinom(1,5,0.95)

[1] 0.2261891

> x<-0:5

> y<-pbinom(x,5,0.95)

dbinom() is defined only at integer values, so x must be 0:5; a grid such as seq(0,5,by=0.1) gives 45 warnings, one for each non-integer point.

> plot(x,dbinom(x,5,0.95),xlab="no. of ready terminals",ylab="prob",type="h")

> plot(x,dbinom(x,5,0.95),xlab="no. of ready terminals",ylab="prob",type="s")

Normal distribution:

IQ is normally distributed with mean 100 and standard deviation 15.

1) What percentage of people have an IQ less than 125?

Code:

pnorm(125,100,15,lower.tail = TRUE)

output:

[1] 0.9522096

2) What percentage of people have an IQ greater than 110?

Code:

pnorm(110,100,15,lower.tail = FALSE)

output:

[1] 0.2524925

3) What percentage of people have an IQ between 110 and 125?

Code:

> pnorm(125,100,15)-pnorm(110,100,15)

Output:

[1] 0.2047022
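
The probability of falling between two values is the difference of two lower-tail probabilities, P(110 < X < 125) = P(X < 125) - P(X < 110). Subtracting an upper-tail probability (lower.tail=FALSE) from a lower-tail one does not give this quantity. A short sketch, with illustrative variable names, that also checks the result on the standard normal scale:

```r
mu    <- 100
sigma <- 15

# Difference of two lower-tail probabilities.
p_between <- pnorm(125, mu, sigma) - pnorm(110, mu, sigma)

# Same calculation after standardising: z = (x - mu) / sigma.
z     <- (c(110, 125) - mu) / sigma
p_std <- diff(pnorm(z))

p_between
all.equal(p_between, p_std)
```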

4) Find the 25th percentile of the standard normal distribution

Code:

qnorm(.25)

output:

[1] -0.6744898

5) Find the 25th percentile of the normal distribution with mean=2, standard deviation=3

Code:

> qnorm(.25,2,3)

Output:

[1] -0.02346925

6) What IQ separates the lower 25% from the others?

Code:

qnorm(0.25,100,15)

output:

[1] 89.88265

7) What IQ separates the top 10% from the others?

Code:

> qnorm(0.90,100,15)

Output:

[1] 119.2233
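
The cut-off for the top 10% is the 90th percentile, because 90% of the population lies below it. As a sketch, `qnorm` and `pnorm` can be used to check each other; the variable names are illustrative.

```r
# The value with 90% of IQs below it, i.e. 10% above it.
cut_top10 <- qnorm(0.90, 100, 15)

# Equivalent call that states the upper-tail probability directly.
same_cut <- qnorm(0.10, 100, 15, lower.tail = FALSE)

# Check: the upper-tail probability at the cut-off should be 0.10.
upper <- pnorm(cut_top10, 100, 15, lower.tail = FALSE)

cut_top10
upper
```

By contrast, `qnorm(0.01,100,15)` gives the value below which only 1% of IQs fall.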

8) Generate 20 random numbers from a normal distribution with mean=572 and standard deviation=51. Calculate the mean and standard deviation of the data set.

Code:

norm<-rnorm(20,572,51)

output:

> norm[1:20]

[1] 655.3100 552.7909 563.5219 570.3602 598.1708 636.6796 517.9256 603.6028 5


547.1251 567.3546 587.7043 552.8809 531.8193 520.1667 438.1240 560.2786 558.4
577.9000

> mean(norm)

[1] 563.0454

> sd(norm)

[1] 46.16659

9) Make an appropriate histogram of the data and check visually whether the normal density curve and the density estimate are similar.

Code:

hist(norm,main="normal distribution",prob=TRUE)

output:

Code:

curve(dnorm(x,572,51),add=TRUE)

output:

Z-test:
Test the hypothesis that the mean systolic blood pressure in a certain population equals 140 mmHg. The standard deviation has a known value of 20 and the data set of 55 patients is:
120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169
Code:
x<-c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,

+ 109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,

+ 149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)
Output:
>x

 [1] 120 115  94 118 111 102 102 131 104 107 115 139 115 113 114 105 115 134 109
[20] 109  93 118 109 106 125 150 142 119 127 141 149 144 142 149 161 143 140 148
[39] 149 141 146 159 152 135 134 161 130 125 141 148 153 145 137 147 169

Code:
n<-length(x)
Output:
n

[1] 55

Code:

mean(x)

output:

[1] 130

Code:

> sd(x)

Output:

[1] 19.16691

Code:

> sigma<-20

> z<-((mean(x)-140)/(sigma/sqrt(n)))

Output:

>z

[1] -3.708099

Code:

> 2*pnorm(-abs(z))

Output:

The two-sided p-value is about 0.0002, far below 0.05, so the hypothesis that the mean is 140 mmHg is rejected.
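
Since the problem states that the standard deviation is known to be 20, a z-test would normally use that value rather than `sd(x)`, and the p-value has to be taken from the tail beyond the observed z. The steps can be collected in a small function. This is only a sketch: `z_test` is a hypothetical helper written for this report and is not a base R function.

```r
# One-sample z-test with known population standard deviation.
# 'z_test' is a hypothetical helper, not part of base R.
z_test <- function(x, mu0, sigma) {
  n <- length(x)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))   # test statistic
  p <- 2 * pnorm(-abs(z))                    # two-sided p-value
  list(z = z, p.value = p)
}

x <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
       109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
       149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)

res <- z_test(x, mu0 = 140, sigma = 20)
res$z
res$p.value
```

Using `-abs(z)` makes the p-value correct whichever side of the hypothesised mean the sample mean falls on.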

Q) A coin is flipped 100 times and comes up heads 43 times. Test the claim that the coin is fair at the 5% level of significance.

Code:

prop.test(43,100,0.5,conf.level=0.95)

output:

1-sample proportions test with continuity correction

data: 43 out of 100, null probability 0.5

X-squared = 1.69, df = 1, p-value = 0.1936

alternative hypothesis: true p is not equal to 0.5

95 percent confidence interval:

0.3326536 0.5327873

sample estimates:

0.43
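
The X-squared value reported by `prop.test` can be reproduced by hand. The sketch below applies the continuity-corrected chi-square formula for a single proportion and compares it with the function's own result; the variable names are illustrative.

```r
heads <- 43
n     <- 100
p0    <- 0.5

# Continuity-corrected statistic: (|x - n*p0| - 0.5)^2 / (n * p0 * (1 - p0)).
chisq <- (abs(heads - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)

res <- prop.test(heads, n, p0, conf.level = 0.95)

chisq
pval
all.equal(unname(res$statistic), chisq)
```

Since the p-value is larger than 0.05, the data give no evidence against the claim that the coin is fair.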

LAB TEST-1

1) An outbreak of salmonella-related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were:

0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418

Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?

SOLUTION:

H0(Null hypothesis): μ= 0.3

Ha(alternate Hypothesis): μ>0.3

we need to include the options

alternative="greater",mu=0.3

code:

v = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)

t.test(v,alternative="greater",mu=0.3)

Result:

One Sample t-test

data: v

t = 2.2051, df = 8, p-value = 0.02927

alternative hypothesis: true mean is greater than 0.3

95 percent confidence interval:

0.3245133 Inf

sample estimates:

mean of x

0.4564444

CONCLUSION:

p-value = 0.02927 < 0.05, so the null hypothesis is rejected at the 0.05 significance level.

There is evidence that the mean Salmonella level in the ice cream is above 0.3 MPN/g.
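
The t statistic and the one-sided p-value can also be computed directly from the formula $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$. A sketch for comparison with the `t.test` output (variable names are illustrative):

```r
v   <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
n   <- length(v)

# Test statistic and upper-tail p-value with n - 1 degrees of freedom.
t_stat <- (mean(v) - mu0) / (sd(v) / sqrt(n))
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)

res <- t.test(v, alternative = "greater", mu = mu0)

t_stat
p_val
all.equal(unname(res$statistic), t_stat)
```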

2) Comparing two independent sample means, taken from two populations with unknown variance. The following data show the heights of individuals from two different countries with unknown population variances. Is there any significant difference between the average heights of the two groups?

A: 175 168 168 190 156 181 182 175 174 179

B: 185 169 173 173 188 186 175 174 179 180

CODE:

a<-c(175,168,168,190,156,181,182,175,174,179)

b<-c(185,169,173,173,188,186,175,174,179,180)

var.test(a,b)

Result:

F test to compare two variances

data: a and b

F = 2.1028, num df = 9, denom df = 9, p-value = 0.2834

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.5223017 8.4657950

sample estimates:

ratio of variances

2.102784

CODE:

qf(0.95,9,9)

Result:

[1] 3.178893

CODE:

t.test(a,b,var.equal=TRUE,paired=FALSE)

Result:

Two Sample t-test

data: a and b

t = -0.94737, df = 18, p-value = 0.356

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-10.93994 4.13994

sample estimates:

mean of x mean of y

174.8 178.2

CODE:

> qt(0.975,18)

Result:

[1] 2.100922

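CONCLUSION:

The F test gives p-value = 0.2834 > 0.05, so equal variances can be assumed. The t test gives |t| = 0.94737 < 2.100922 (p-value = 0.356), so the null hypothesis of equal mean heights cannot be rejected at the 0.05 level.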

REGRESSION:

Find

1.Scatter plot

Code:
x<-c(132,129,120,113,105,92,84,83.2,88.4,59,80,81.5,71,69.2)

> y<-c(46,48,51,52.1,54,52,59,58.7,61.6,64,61.4,54.6,58.8,58.0)

Output:

>x

[1] 132.0 129.0 120.0 113.0 105.0 92.0 84.0 83.2 88.4 59.0 80.0 81.5 71.0 69.2

>y

[1] 46.0 48.0 51.0 52.1 54.0 52.0 59.0 58.7 61.6 64.0 61.4 54.6 58.8 58.0

Code:

plot(y~x)

output:

2. Finding the fitted regression line

Code:
plot(x,y,pch=16,cex=1.3,main="Fuel quality test",xlab="iodine value",ylab="cetane value")

> abline(lm(y~x),col="red")

Output:

3. Plot regression line and Predict the value of y for given x=90.

Code:
lm(y~x)

output:

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

75.2224 -0.2095
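
The exercise also asks for the predicted value of y at x=90. One way to obtain it, sketched here with the illustrative names `fit`, `pred` and `by_hand`, is to pass the new x value to `predict` and to confirm the result from the coefficients.

```r
x <- c(132, 129, 120, 113, 105, 92, 84, 83.2, 88.4, 59, 80, 81.5, 71, 69.2)
y <- c(46, 48, 51, 52.1, 54, 52, 59, 58.7, 61.6, 64, 61.4, 54.6, 58.8, 58.0)

fit <- lm(y ~ x)

# Prediction from the fitted model; the column name in newdata must be 'x'.
pred <- predict(fit, newdata = data.frame(x = 90))

# The same value from the fitted line: intercept + slope * 90.
by_hand <- coef(fit)[1] + coef(fit)[2] * 90

pred
all.equal(unname(pred), unname(by_hand))
```

With the rounded coefficients shown above, the prediction is roughly 75.2224 - 0.2095 * 90, that is about 56.4.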

THANK YOU
