
STATS 20 Fall 2016

Final Exam


* Finish and submit within 110 minutes.
* Submit your work in R script only.
* If needed, add comments preceded by #.
* NO chatting or emailing during the final.


1. A basketball player wants to know how many shots he needs to attempt during a
game to reach a score of 20 points. Assume that the score from each shot follows
the distribution below.

Score              Percentage
3                  10%
2                  45%
1 (free throws)    20%
0                  25%

We want to find the number of shots he attempts until he reaches a score of 20.
(Note that the final score can exceed 20, depending on the score from the last
shot.) Write a function score_report that exports a text file report.txt
containing the score from every shot until he reaches 20 and a summary of his
shots. (10 points)
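
One possible shape for such a function is sketched below. The report layout, the summary fields, and the arguments target and file are assumptions made for illustration; they are not taken from the exam's example result.

```r
# Sketch only: the report layout and the arguments `target` and `file`
# are illustrative assumptions, not the layout of the exam's example result.
score_report <- function(target = 20L, file = "report.txt") {
  points <- c(3L, 2L, 1L, 0L)            # possible scores per shot
  probs  <- c(0.10, 0.45, 0.20, 0.25)    # probabilities from the table

  shots <- integer(0)
  while (sum(shots) < target) {
    # draw one shot at a time until the running total reaches the target
    shots <- c(shots, sample(points, size = 1, prob = probs))
  }

  report <- c(
    sprintf("Shot %d: %d point(s), running total %d",
            seq_along(shots), shots, cumsum(shots)),
    sprintf("Total shots: %d", length(shots)),
    sprintf("Final score: %d", sum(shots))
  )
  writeLines(report, file)
  invisible(shots)                        # return the shots for inspection
}
```

Calling score_report() writes report.txt to the working directory; the vector of shots is returned invisibly so it can be checked afterwards.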

The example result:




2. Simulation

A certain type of missile has a failure probability of 0.02. We want to find the
average launch number of the fourth failure. For example, the launch number of
the fourth failure in the following experiment is 23.
SSSSS SSFSS SSSSF SFSSS SSF
To get one estimate of the average launch number, we calculate the mean of 100
launch numbers of the fourth failure. Simulate 1000 times to get 1000 estimates
of the average. Compare the mean of the 1000 averages with the theoretical
expected value of 200. No need to write a function. (Note: The launch number of
the fourth failure follows a negative binomial distribution.) (15 points)
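
A minimal sketch of this simulation follows, assuming rnbinom() is used to draw the launch numbers. The variable names and the seed are illustrative choices. Note that rnbinom(n, size, prob) returns the number of non-target outcomes before the size-th target outcome, so with the missile failure as the target outcome it counts successful launches and the launch number is that count plus 4.

```r
set.seed(20)        # arbitrary seed, only for reproducibility

p_fail     <- 0.02  # failure probability of one launch
k          <- 4     # we wait for the fourth failure
n_per_mean <- 100   # launch numbers averaged into one estimate
n_sims     <- 1000  # number of estimates

# rnbinom() counts the successful launches before the k-th failure,
# so adding k converts the count into the launch number of that failure.
averages <- replicate(
  n_sims,
  mean(rnbinom(n_per_mean, size = k, prob = p_fail) + k)
)

mean(averages)      # simulated mean of the 1000 averages
k / p_fail          # theoretical expectation: 4 / 0.02 = 200
```

The two printed values should be close; the simulated mean varies slightly from seed to seed.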




3. In the R script on CCLE, you are given a vector date.



We want to write a function date_format that changes the format of the date. The
result we want to get from the function is below. (20 points)





4. Data Analysis

Download final.csv from CCLE and import the data into R. The data is from the
U.S. National Longitudinal Study of Young Women in 1988. See the table for the
variable descriptions.

Variable Name    Description
idcode           id of the respondent
age              age in current year
race             race
married          married
grade            current grade completed
collgrad         college graduated
c_city           lives in central cities
union            union worker
wage             hourly wage
hours            usual hours worked
ttl_exp          total work experience
tenure           job tenure (years)

1) Find (i) the number of columns and rows of the data, (ii) variable names, and
(iii) variable types. (5 points)

2) Subset the dataset (i) where the woman is single and lives in a central city
(subset1), and (ii) where the woman is married and does not live in a central city
(subset2). (iii) Compare the average working hours between subset1 and subset2
and test if they are significantly different. (5 points)
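
As a rough illustration of the subsetting and the test, the sketch below uses a small simulated data frame in place of final.csv. The data frame toy, its values, and the 0/1 coding of married and c_city are all assumptions; the real file may code these variables differently.

```r
# Toy stand-in for final.csv. The values are simulated and the 0/1 coding
# of married and c_city is an assumption about the real data.
set.seed(1)
toy <- data.frame(
  married = rbinom(200, 1, 0.6),
  c_city  = rbinom(200, 1, 0.3),
  hours   = round(rnorm(200, mean = 37, sd = 10))
)

subset1 <- subset(toy, married == 0 & c_city == 1)  # single, central city
subset2 <- subset(toy, married == 1 & c_city == 0)  # married, not central city

mean(subset1$hours)
mean(subset2$hours)

# Welch two-sample t-test (R's default; does not assume equal variances)
result <- t.test(subset1$hours, subset2$hours)
result$p.value
```

A p-value below the chosen significance level would indicate that the average working hours differ between the two groups.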

3) Test (i) whether there is an association between being a union worker and
being a college graduate, and (ii) whether average wages differ by race.
Interpret the results in the context of the data. (5 points each)
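
One common pairing for these two questions is a chi-squared test of independence and a one-way ANOVA; the sketch below assumes those tests and again uses simulated data. The category labels and values in toy are hypothetical and do not come from final.csv.

```r
# Toy stand-in for final.csv; labels and values are hypothetical.
set.seed(2)
toy <- data.frame(
  union    = sample(c("union", "nonunion"), 300, replace = TRUE),
  collgrad = sample(c("grad", "not grad"), 300, replace = TRUE),
  race     = sample(c("white", "black", "other"), 300, replace = TRUE),
  wage     = rexp(300, rate = 1 / 8)
)

# (i) chi-squared test of independence on the 2 x 2 contingency table
assoc <- chisq.test(table(toy$union, toy$collgrad))
assoc

# (ii) one-way ANOVA comparing mean wage across the race groups
wage_by_race <- aov(wage ~ race, data = toy)
summary(wage_by_race)
```

For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default. If the ANOVA assumptions look doubtful for the real wages, kruskal.test() is a rank-based alternative.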

4) Finally, we are interested in which variables are linearly related to one's
hourly wage.
(i) Construct the linear model m1 that explains wage using age, grade, c_city,
hours, and total experience. Which variables are significant at the 0.05 level?
Interpret the coefficients of the significant variables. (6 points)

(ii) Draw residual plots and check the model assumptions. (3 points)

(iii) Fit an updated model m2 without the insignificant variables found in
model m1. What differences do you observe in m2? (3 points)

(iv) Test if the two models are different. Are they significantly different? (3
points)
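
The workflow in part 4 can be sketched as follows, assuming the nested models are compared with a partial F-test via anova(). The data frame toy is simulated, so which variables turn out significant here says nothing about the real data; in particular, the choice of grade and ttl_exp for m2 only reflects how the toy wages were generated.

```r
# Toy stand-in for final.csv; all values are simulated for illustration.
set.seed(3)
n <- 200
toy <- data.frame(
  age     = sample(34:46, n, replace = TRUE),
  grade   = sample(8:18, n, replace = TRUE),
  c_city  = rbinom(n, 1, 0.3),
  hours   = round(rnorm(n, mean = 37, sd = 10)),
  ttl_exp = runif(n, min = 0, max = 25)
)
# In the toy data only grade and ttl_exp affect wage.
toy$wage <- 1 + 0.5 * toy$grade + 0.2 * toy$ttl_exp + rnorm(n, sd = 3)

# Full model
m1 <- lm(wage ~ age + grade + c_city + hours + ttl_exp, data = toy)
summary(m1)$coefficients      # estimates, standard errors, t and p values

# Residual diagnostics
plot(m1, which = 1)           # residuals vs fitted values
plot(m1, which = 2)           # normal Q-Q plot of the residuals

# Reduced model: with the real data, keep the variables significant in m1
m2 <- lm(wage ~ grade + ttl_exp, data = toy)

# Partial F-test comparing the nested models
cmp <- anova(m2, m1)
cmp
```

A large p-value in the anova() table suggests the dropped variables do not improve the fit, so the smaller model m2 is adequate.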
