
Statistical Computing and Simulation

Spring 2014
Assignment 2, Due April 3/2014
1. (a) Write a computer program using the Mid-Square Method with 6 digits to
generate 10,000 random numbers ranging over [0, 999999]. Use the
Kolmogorov-Smirnov goodness-of-fit test to see if the random numbers that you
create are uniformly distributed. (Note: You must report the initial seed number
used, and you may adopt 0.05 as the α value. Also, you may see warning
messages when conducting the goodness-of-fit test; comment on the
goodness-of-fit test.)
The following is a program for the mid-square generator:
midsq=function(seed,n) {
  temp=seed
  z=NULL
  for (k in 1:n) {
    temp=temp^2            # square the current 6-digit number
    temp=temp%%(10^9)      # drop the digits above the 9th
    temp=floor(temp/1000)  # drop the last 3 digits, keeping the middle 6
    z=c(z,temp)
  }
  return(z)
}

a=midsq(123456,100)
cor(a[-1],a[-100])    # lag-1 correlation of the first 100 numbers
[1] -0.04486407
# It looks good using 123,456 as the seed.
a=midsq(123456,10000)
a1=floor(a/10^5)      # leading digit of each 6-digit number
table(a1)
# The result looks bad if 10,000 numbers are generated.
a1
   0    1    2    3    4    5    6    7    8    9
  10   18   15   13   11   16 9871   13   14   19
ks.test(a1,punif)
Because the Kolmogorov-Smirnov goodness-of-fit test cannot handle data with
ties, we suggest using the χ² goodness-of-fit test, which gives a similar
conclusion.
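A minimal sketch of that suggested χ² check, applied to the leading-digit counts in a1 above (chisq.test on a table of counts tests whether all ten digits are equally likely):
obs=table(factor(a1,levels=0:9))   # keep all ten digit classes, even empty ones
chisq.test(obs)                    # H0: the ten leading digits are equally likely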
(b) Calculators use U_{n+1} = (π + U_n)^5 (mod 1) to generate random numbers
between 0 and 1. Compare the results with those in (a), and discuss your
findings based on the comparison.
First, we will write down a program:
palm.cal=function(seed,irr,n) {
  temp=seed
  z=NULL
  for (k in 1:n) {
    temp=(temp+irr)^5      # add the irrational constant and raise to the 5th power
    temp=temp-floor(temp)  # keep the fractional part, i.e., (mod 1)
    z=c(z,temp)
  }
  return(z)
}

Plugging in the random number seed 100 and generating 10,000 numbers, the
numbers of observations in 10 equally spaced intervals are 948, 1060, 1028, 964, 975,
964, 1029, 1028, 997, 1007. The test statistic is 5.985 and the p-value is 0.741.
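A minimal sketch of how such counts and the χ² statistic can be obtained (assuming π as the irrational constant and seed 100; different seeds or constants give different counts):
u=palm.cal(100,pi,10000)
counts=table(cut(u,breaks=seq(0,1,by=0.1)))  # 10 equally spaced intervals
stat=sum((counts-1000)^2/1000)               # chi-square statistic, df = 9
pchisq(stat,9,lower.tail=FALSE)              # p-value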
(c) In class, we often use simulation tools in R, such as sample or
ceiling(runif), to generate random numbers from 1 to k, where k is a natural
number. Use graphical tools (such as histograms) and statistical tests to check
which one is a better tool for producing uniform numbers between 1 and k. (Hint:
You may check whether the size of k matters by, for example, assigning k a small and
a big value.)
We can try k = 15 and 100 and see if it makes any difference. We first check
the uniformity of 10,000 random numbers, separating them into 15 and 10
groups for k = 15 and 100, respectively.
t1=sample(c(1:15), 10000, T)    # sample(), k = 15
t2=sample(c(1:100), 10000, T)   # sample(), k = 100
t3=ceiling(15*runif(10000))     # ceiling(runif()), k = 15
t4=ceiling(100*runif(10000))    # ceiling(runif()), k = 100
a1=table(t1)
a2=table(ceiling(t2/10))        # group 1-100 into 10 classes of 10
a3=table(t3)
a4=table(ceiling(t4/10))
The following tables show the numbers in each class (for a2 and a4, each class
groups ten consecutive values):

Class    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
a1     677  633  648  664  668  713  671  651  659  651  696  667  664  648  710
a3     668  626  665  682  723  672  654  662  671  676  659  638  687  686  631

Class    1    2    3    4    5    6    7    8    9   10
a2    1012 1061  990 1030  979  965 1020  948  997  998
a4     990 1017  986 1011 1016  986  997 1003  954 1040

Note: It is suggested that we repeat the preceding simulation about 100 or
1,000 times and record the p-values of the χ² goodness-of-fit test. The ideal
distribution of the p-values should look uniform, and the following is the program
for calculating the p-values when testing the case of k = 15 for 1,000 simulation runs.
t1=NULL
be=10000/15                          # expected count in each of the 15 classes
for (i in 1:1000) {
  a1=sample(c(1:15), 10000, T)
  a2=ceiling(15*runif(10000))
  b1=table(a1)
  b2=table(a2)
  c1=sum((b1-be)^2/be)               # chi-square statistic for sample()
  c2=sum((b2-be)^2/be)               # chi-square statistic for ceiling(runif())
  d1=pchisq(c1,14,lower.tail=FALSE)  # upper-tail p-value, df = 14
  d2=pchisq(c2,14,lower.tail=FALSE)
  t1=cbind(t1,c(d1,d2))
}
The histograms of the p-values, as well as the ks.test for uniformity, show that
sample is better than runif, since the p-values of runif do not look like a
uniform distribution (p-value < 0.05).
Also, it is obvious that these numbers do not violate the assumption of
uniformity (via the chi-square goodness-of-fit test). We can also use the acf and pacf
to briefly check independence, and the results seem random.
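A minimal sketch of that check, continuing from the loop above (row 1 of t1 holds the p-values from sample and row 2 those from ceiling(runif())):
ks.test(t1[1,],"punif")      # p-values from sample()
ks.test(t1[2,],"punif")      # p-values from ceiling(runif())
hist(t1[1,]); hist(t1[2,])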
2. (a) Fibonacci numbers, defined as X_{n+1} = X_n + X_{n-m} (mod 1), are another way of
generating random numbers. The usual setting is to let m = 1 and see if the X_n's
are a sequence of random numbers from U(0,1). However, the orderings
x_n < x_{n+1} < x_{n-1} and x_{n-1} < x_{n+1} < x_n never appear under this setting.
In general, the performance of Fibonacci numbers gets closer to random as m
increases. Write a program to generate Fibonacci numbers and test if they are good
random numbers for various choices of m. (Note: You could simulate 10,000 random
numbers, and use goodness-of-fit tests and independence tests to evaluate the
Fibonacci numbers.)

The program is as follows:

fibonacci.num=function(seed,n){   # seed: initial seeds, a vector of length m + 1
  m=length(seed)-1                # n: number of random numbers to generate
  for (i in 1:n) {
    x=seed[i]+seed[m+i]           # X_{n+1} = X_n + X_{n-m}
    x1=x-floor(x)                 # (mod 1)
    seed=c(seed,x1)
  }
  return(seed[-c(1:(m+1))])       # drop the initial seeds
}

It is obvious that the numbers generated will have the same number of
significant digits as the initial seeds. Still, I conducted 200 simulation runs,
each run with 500 random numbers from the Fibonacci generator, and used the
Kolmogorov-Smirnov goodness-of-fit test and the up-and-down independence test
to check its performance. If the numbers generated behave randomly, the
p-values of the 200 uniformity and independence tests should look like U(0,1). We
can use the command ks.test to check these 200 p-values. The
graph below shows the p-values of checking U(0,1), for values of m from 1 to 10.
We can see that the uniformity is satisfied, but not the independence, even with
larger values of m.
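For reference, a minimal sketch of one such study for a single m (assuming m = 5, random initial seeds from runif, and 500 numbers per run):
m=5
pv=replicate(200,{
  x=fibonacci.num(runif(m+1),500)   # m + 1 initial seeds, 500 numbers per run
  ks.test(x,"punif")$p.value        # uniformity p-value for this run
})
ks.test(pv,"punif")                 # the 200 p-values should look like U(0,1)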

(b) In addition to U_{n+1} = (π + U_n)^5 (mod 1), we can use φ = (1 + √5)/2 (the golden
ratio) or other irrational numbers to replace the value of π, to generate random
numbers between 0 and 1. Use graphical tools (such as histograms) and statistical
tests to check whether π or φ has a better performance in producing uniform numbers
between 0 and 1.
First, similar to (b) in problem #1, I use the following program:
palm.cal=function(seed,irr,n) {
  temp=seed
  z=NULL
  for (k in 1:n) {
    temp=(temp+irr)^5
    temp=temp-floor(temp)
    z=c(z,temp)
  }
  return(z)
}

The random number seed is uniformly distributed between 1 and 100,000. The
simulation is repeated 1,000 times and 10,000 random numbers are generated in
each simulation. The 10,000 random numbers are separated into 10 equally spaced
intervals to check the uniformity by the χ²-test. Here, I check the test statistics for
φ and e: among 1,000 simulation runs, the numbers of times that the test statistic
exceeds the critical value at the 0.05 level are 9 and 4, respectively. It seems φ and e
(and π as well) can be used to generate random numbers from (0, 1). However,
the histogram of the testing p-values (not shown here) shows different information:
it does not look like a uniform distribution.
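A minimal sketch of this comparison for π and φ (assuming seeds drawn uniformly from (1, 100000); e can be checked the same way by passing exp(1)):
phi=(1+sqrt(5))/2
chisq.stat=function(irr){
  u=palm.cal(runif(1,1,100000),irr,10000)
  obs=table(cut(u,breaks=seq(0,1,by=0.1)))   # 10 equally spaced intervals
  sum((obs-1000)^2/1000)                     # chi-square statistic, df = 9
}
s.pi=replicate(1000,chisq.stat(pi))
s.phi=replicate(1000,chisq.stat(phi))
c(sum(s.pi>qchisq(0.95,9)),sum(s.phi>qchisq(0.95,9)))   # rejection counts at the 0.05 level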
3. Write your own R programs to perform the Gap test, Permutation test, and Run test.
Then use these programs to test if the uniform random numbers generated from
Minitab (or SAS, SPSS, Excel) and R are independent.
The programs for the Gap test and the Permutation test are as follows:
gap.test=function(data,a,b){
  n=length(data)
  x=c(1:n)*(a<data & data<b)   # positions of the numbers falling in (a, b)
  x1=x[x>0]
  y=x1[-1]-x1[-length(x1)]-1   # gap lengths between successive hits
  return(table(y))
}

I suggest looking at the counts of each possible gap length before doing the χ²-test.
I generated 6,000 random numbers from U(0,1) in R and obtained 3,622 gap values from
the program above using a = 0.2 and b = 0.8 (i.e., the gap lengths should follow a
geometric distribution), with table values:
Gap      0    1    2    3    4    5    6    7    8    9
Count 2179  867  357  134   52   20    9    1    2    1

The classes are 0, 1, 2, 3, 4, 5, 6, and 7+, which gives expected counts of
2173.2, 869.28, 347.712, 139.0848, 55.63392, 22.253568, 8.901427, and
5.934285. The χ² statistic is 1.552602 (df = 7), indicating that we do not reject the
hypothesis that the numbers are independent.
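A minimal sketch of this calculation (classes 0 to 6 and 7+, with gap probability p = b - a = 0.6):
obs=c(2179,867,357,134,52,20,9,1+2+1)   # observed counts, last class is 7+
p=0.6
expc=3622*c(p*(1-p)^(0:6),(1-p)^7)      # expected counts under the geometric model
sum((obs-expc)^2/expc)                  # chi-square statistic = 1.552602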
The program of the permutation test is as following:
permute.test=function(data,k){
y=rep(10,k)^c((k-1):0)
x=matrix(data,ncol=k,byrow=T)
x1=apply(x,1,rank)
yy=apply(x1*y,2,sum)
return(table(yy))

I choose k = 3, i.e., 3! = 6 possible orderings. Checking with 6,000 numbers from
runif in R, the random number generator seems to perform fine.
Comb.   123  132  213  231  312  321
Counts  340  353  325  298  360  324
Using the χ²-test, the counts above are consistent with the six orderings being
equally likely.
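A minimal sketch of that χ² check on the counts above:
counts=c(340,353,325,298,360,324)   # observed counts of the 3! = 6 orderings
chisq.test(counts)                  # H0: all six orderings are equally likely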
The following is the program of the up-and-down (run) test:
run.test=function(data) {
  n=length(data)
  ave=(2*n-1)/3                # mean number of runs (Levene and Wolfowitz)
  std=sqrt((16*n-29)/90)       # standard deviation of the number of runs
  x=(data[-1]>data[-n])^1      # 1 = up-step, 0 = down-step
  x1=sum(x[-1]!=x[-(n-1)])+1   # number of runs up and down
  value=(x1-ave)/std
  return(pnorm(value))         # lower-tail probability of the normal approximation
}

I generate 10,000 uniform random numbers 1,000 times and see if the testing
statistics generated follow uniform distribution. Theoretically, the p-values of
1,000 replications shall follow the distribution U(0,1). Out of 1000 runs, I found
that there are 52 and 99 times of p-value smaller than 0.05 and 0.10 respectively.
Also, the p-value of the ks-test applying to these 1,000 p-values for U(0,1) is
0.1396.
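A minimal sketch of this replication study (the counts vary from run to run):
pv=replicate(1000,run.test(runif(10000)))
c(sum(pv<0.05),sum(pv<0.10))   # numbers of p-values below 0.05 and 0.10
ks.test(pv,"punif")            # check the 1,000 p-values against U(0,1)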
4. Write a small computer program to perform Up-and-down test. Then use this
program and uniform random numbers generated from R to check if the mean and
variance for the number of runs derived by Levene and Wolfowitz (1944) are
valid.
The following is the program of the up-and-down test (the same as in Problem 3):
run.test=function(data) {
  n=length(data)
  ave=(2*n-1)/3                # mean number of runs (Levene and Wolfowitz)
  std=sqrt((16*n-29)/90)       # standard deviation of the number of runs
  x=(data[-1]>data[-n])^1      # 1 = up-step, 0 = down-step
  x1=sum(x[-1]!=x[-(n-1)])+1   # number of runs up and down
  value=(x1-ave)/std
  return(pnorm(value))
}

I generate 10,000 uniform random numbers 1,000 times and check whether the values
returned by the test follow a uniform distribution. Theoretically, the p-values of the
1,000 replications should follow the U(0,1) distribution. Out of the 1,000 runs, I found
that the p-value is smaller than 0.05 in 48 cases and smaller than 0.10 in 91 cases.
Also, the p-value of the KS test applied to these 1,000 p-values against U(0,1) is
0.4232.
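To check the Levene and Wolfowitz mean and variance directly, a minimal sketch is to record the raw run counts (mirroring the counting step inside run.test) and compare their sample mean and variance with (2n - 1)/3 and (16n - 29)/90:
n=10000
runs=replicate(1000,{
  u=runif(n)
  x=(u[-1]>u[-n])*1           # 1 = up-step, 0 = down-step
  sum(x[-1]!=x[-(n-1)])+1     # number of runs up and down
})
c(mean(runs),(2*n-1)/3)       # empirical vs theoretical mean
c(var(runs),(16*n-29)/90)     # empirical vs theoretical variance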
5. Σ_{i=1}^{12} U_i - 6 can be used to approximate the N(0,1) distribution, where the
U_i's are a random sample from U(0,1).
(a) Based on α = 0.05, compare the results of the Chi-square test and the
Kolmogorov-Smirnov test, and see if there are any differences.
For one sample of 10,000 simulated values, the p-values of the χ²-test and the
Kolmogorov-Smirnov test are 0.3242 and 0.508, respectively. We can use the
following R program to check their difference in p-values for 1,000 simulation
runs.
library(nortest)                       # provides pearson.test()
temp=NULL
for (i in 1:1000) {
  x=matrix(runif(120000),ncol=10000,byrow=F)
  x1=apply(x,2,sum)-6                  # 10,000 values of the sum of 12 uniforms minus 6
  a=pearson.test(x1)[2]                # p-value; in S-Plus, use chisq.gof(x1)[3]
  b=ks.test(x1,pnorm)[2]               # p-value; in S-Plus, use ks.gof(x1)[2]
  temp=cbind(temp,c(a,b))
}

Note that temp in the program is of data type list and we can use
> matrix(unlist(temp),ncol=2,byrow=T)
to produce two columns of p-values for the χ²-test and the Kolmogorov-Smirnov
test.
Further, we can check the number of times (out of 1,000 simulation runs) that a
sample of 10,000 values of Σ_{i=1}^{12} U_i - 6 is rejected (i.e., with p-value < α):

                  α = 0.01   α = 0.05   α = 0.10
Pearson χ²-test       17         69        147
KS-test               73        134

The KS-test has larger power than the Pearson χ²-test. Of course, the hypothesis
that Σ_{i=1}^{12} U_i - 6 resembles normal variables is rejected.

(b) Design two tests of independence (which are not the same as you saw in class)
and apply them to the random sample that you generate.
Note that you need to simulate the random sample of U(0,1) via the LCG. Also, a
minimum of 10,000 simulation runs is required.
A lot of tools can help with the independence test. For example, since the transformed
random numbers are approximately normal, we can use correlation, time-series
tools, and the Durbin-Watson test. Also, distributional results can be used,
including the chi-square distribution, since the sum of squares of k independent
standard normal variables follows a chi-square distribution with k degrees of freedom.
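As a minimal sketch (using runif in place of a hand-written LCG, and z for the approximately normal values), two such checks could be a lag-1 correlation test and a chi-square check on sums of squares:
z=apply(matrix(runif(120000),nrow=12),2,sum)-6   # 10,000 approximately N(0,1) values
n=length(z)
# (1) lag-1 correlation: under independence, sqrt(n)*r1 is approximately N(0,1)
r1=cor(z[-1],z[-n])
2*pnorm(-abs(sqrt(n)*r1))                        # two-sided p-value
# (2) sums of squares of k consecutive values should be chi-square with k df
k=10
s=apply(matrix(z^2,nrow=k),2,sum)                # 1,000 non-overlapping sums
ks.test(s,"pchisq",df=k)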

6. Tools in time series can also be used to check if the random numbers generated
are independent, but the normality assumption is required. Still, we can apply the
acf and pacf plots to check if the random numbers violate the independence
assumption. Apply the time series tools to the random numbers generated by
Σ_{i=1}^{12} U_i - 6 and comment on your findings.

In R, the sample acf and pacf values can be found using acf(data) and pacf(data), and
their standard error is approximately 1/√n, where n is the sample size. The
following program can be used to check whether there are unusual acf or pacf values
for the random numbers generated by Σ_{i=1}^{12} U_i - 6. Note that, as a demonstration,
I only checked the first 24 acf and pacf values, and the first value of acf (at lag 0)
equals 1.
t1=NULL
t2=NULL
for (i in 1:1000) {
  x=matrix(runif(120000),ncol=10000,byrow=F)
  x1=apply(x,2,sum)-6        # 10,000 approximately N(0,1) values
  a=acf(x1,plot=F)
  a1=unlist(a)
  a2=as.numeric(a1[2:25])    # acf at lags 1 to 24 (position 1 is lag 0)
  b=pacf(x1,plot=F)
  b1=unlist(b)
  b2=as.numeric(b1[1:24])    # pacf at lags 1 to 24
  t1=cbind(t1,a2)
  t2=cbind(t2,b2)
}
bound=2/sqrt(10000)          # approximate two-standard-error bound
a3=(t1>bound)*1
b3=(t2>bound)*1
a4=apply(a3,1,sum)           # for each lag, count of acf values above the bound
b4=apply(b3,1,sum)           # for each lag, count of pacf values above the bound

I did not find the proportion of acf or pacf values exceeding the bound to be larger than
5% at any lag. The independence check for the Fibonacci numbers is similar and is omitted.
