Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Key Vocabulary:
Statistics subject
data, datum participant
variation experimental unit
categorical
individual observation
quantitative
respondent variable
2. The authors claim that this book is very different from a typical mathematics textbook.
Would you agree or disagree, based on what you read in Chapter 1? Explain.
3. According to the authors, what are the three simple steps to doing Statistics right?
6. Explain the difference between a categorical variable and a quantitative variable. Give an
example of each.
Key Vocabulary:
frequency table contingency table
relative frequency table marginal distribution
distribution conditional distribution
Simpsons Paradox
bar chart independent
pie chart segmented bar chart
1. According to the authors, what are the three rules of data analysis?
2. Explain the difference between a frequency table and a relative frequency table.
Key Vocabulary:
distribution spread
histogram mode
relative frequency unimodal
histogram bimodal skewed
stem-and-leaf display multimodal outliers
dotplot uniform gaps
shape symmetric time plot
center tail re-expressing data
7. When describing a distribution, what three things should you always mention?
8. What should you look for when describing the shape of a distribution?
12. What is meant by re-expressing or transforming data? What is the purpose of re-expressing
or transforming data?
Key Vocabulary:
center quartile
spread interquartile range
mean
midrange percentile
standard deviation
median five-number summary
variance
range boxplot
1. Explain the difference between range and interquartile range. Why is the interquartile
range often a better measure of the spread of a distribution?
4. When is it more appropriate to use the mean as a measure of center rather than the median?
Why?
5. When is it more appropriate to use the median as a measure of center rather than the mean?
Why?
Key Vocabulary:
standard deviation parameter
standardized value statistic
rescaling standard Normal model
z-score 68-95-99.7 Rule N( , )
normal model normal probability plot
1. What unit of measurement is used to describe how far a set of values are from the mean?
3. Briefly describe why standardized units are used to compare values that are measured using
different scales, different units, or different populations.
4. How does adding or subtracting a constant amount to each value in a set of data affect the
mean? Why does this happen?
5. How does multiplying or dividing a constant amount by each value in a set of data (also
called rescaling) affect the mean? Why does this happen?
6. How does adding or subtracting a constant amount to each value in a set of data affect the
standard deviation? Why does this happen?
7. How does multiplying or dividing a constant amount by each value in a set of data (also
called rescaling) affect the standard deviation? Why does this happen?
8. How does standardizing a variable affect the shape, center, and spread of its distribution?
9. In what way does a z-score give an indication of how unusual a value is?
10. How would you describe the shape of a normal curve? Draw several examples.
17. Is there a difference between the 80th percentile and the top 80%? Explain.
18. Describe two methods for assessing whether or not a distribution is approximately normal.
Key Vocabulary:
scatterplot form
association scatter
response variable
direction explanatory variable
correlation coefficient
1. What type of graph is used to show the relationship between two quantitative variables?
2. When describing a scatterplot, what four things should you always mention?
7. What three conditions are necessary in order to use correlation as a measure of association?
8. What does the sign of the correlation coefficient tell you about the association?
11. Sketch an example of a scatterplot that shows two variables with a strong association but a
weak correlation.
Key Vocabulary:
parameter y
linear model mean-mean point
predicted value regression line
residual R2
line of best fit coefficient of determination
slope
1. Explain the quote (by George Box, a famous statistician), All models are wrong, but some
are useful.
9. The R2 value shows how much of the variation in the response variable can be accounted for
by the linear regression model. If R2 = 0.95, what can be concluded about the relationship
between x and y?
_____ % of the variability in _____ is accounted for by the linear relationship with _____.
10. What conditions are necessary before using a linear model for a set of data?
12. If a least-squares regression line fits the data well, what characteristics should the residual
plot exhibit? Sketch a well-labeled example.
Key Vocabulary:
extrapolation influential point
interpolation lurking variable
leverage homogeneous
9. Even if a strong association is observed between two variables, it may be wrong to assume
that changing one variable will cause a change in the other variable. Explain why.
10. Why do scatterplots of summary statistics show less scatter than those of individual
observations.
Key Vocabulary:
Re-expression
2. One of the goals of re-expressing data is to make the distribution appear more symmetric.
Why is this advantageous?
3. Another goal of re-expressing data is to make the spread of several groups more alike. Why
is this advantageous?
6. What type of data often benefits from re-expression by taking the square root of values?
7. What type of data often benefits from re-expression by taking the logarithm of values?
8. What type of data often benefits from re-expression by taking the reciprocal of values?
9. If your data contain zeroes, what must you do before re-expressing using logarithms or
reciprocals? Explain.
10. If a scatterplot of the x-values vs. the logarithm of the y-values appears to be linear, what
type of relationship is there between the original x- and y-values?
12. If a scatterplot of the logarithm of the x-values vs. the logarithm of the y-values appears to be
linear, what type of relationship is there between the original x- and y-values?
14. If a scatterplot of the logarithm of the x-values vs. the y-values appears to be linear, what
type of relationship is there between the original x- and y-values?
Key Vocabulary:
random behavior simulation
random numbers component
pseudorandom numbers outcome
trial
Key Vocabulary:
population sampling frame
sample sampling variability
sample survey homogeneous groups
biased heterogeneous groups
voluntary response
randomization strata
sample
census stratified random sample
convenience sampling
parameter cluster sampling
undercoverage
statistic multistage sampling
nonresponse bias
Simple Random Sample systematic sampling
response bias
(SRS) respondents
16. Explain the difference between nonresponse bias and response bias.
Key Vocabulary:
observational study levels
retrospective study treatment
prospective study block
experiment completely randomized double-blind
random assignment experiment placebo
subjects statistically significant placebo effect
participants control matching
experimental units control group confounding
factors single-blind
Randomization
Replication
Blocking
Key Vocabulary:
probability independent
trial Law of Large Numbers
outcome complement
mutually exclusive
event disjoint
14. If two events A and B are independent, what must be true about Ac and Bc?
15. What is meant by the union of two or more events? Draw a diagram.
17. What is meant by the intersection of two or more events? Draw a diagram.
18. Explain the difference between the union and the intersection of two or more events.
Key Vocabulary:
trial mutually exclusive
outcome independent
event conditional probability
sample space tree diagram
disjoint
2. In general (whether events are disjoint or not), what is the formula for finding P ( A B ) ?
3. Explain the difference between the Addition Rule for disjoint events and the General
Addition Rule.
8. In general (whether events are independent or not), what is the formula for finding
P ( A B) ?
9. Explain the difference between the Multiplication Rule for independent events and the
General Multiplication Rule.
10. State the formula used to determine whether or not two events are independent.
Key Vocabulary:
random variable standard deviation
discrete random variable expected value
continuous random E(X)
variable V(X)
2. Explain the difference between a discrete random variable and a continuous random variable.
9. Explain how to calculate the variance of a discrete random variable X using the formula
X2 = ( xi X ) pi
2
10. Given the variance of a random variable, explain how to calculate the standard deviation.
11. Suppose X2 = 2 and Y2 = 3 and X and Y are independent random variables. According to
the rules for variances, what is X +Y ? What is X +Y ?
2
Key Vocabulary:
Bernoulli trials
Geometric model
Binomial model
3. How do you find the expected value and standard deviation of a geometric random variable?
6. What is the difference between a probability distribution function (pdf) and a cumulative
distribution function (cdf)?
8. Explain the difference between the binomial setting and the geometric setting.
9. How do you find the expected value and standard deviation of a binomial random variable?
n n!
12. In the formula = , what does n represent? What does k represent?
k k !( n k ) !
n n!
What does the value of = represent?
k k !( n k ) !
1! 1 1 5! 5x4x3x2x1
2! 2x1 2 6!
3! 3x2x1 6 7!
4! 4x3x2x1 24 n!
n!
14. What is the value of ?
( n 1)!
Key Vocabulary:
parameter sampling distribution
statistic model
Standard error
proportion Central Limit Theorem
5. How is the size of a sample related to the spread of the sampling distribution?
6. In an SRS of size n, what is true about the sampling distribution of p when the sample size n
increases?
pq
10. When does the formula apply to the standard deviation of p ?
n
11. When the sample size n is large, the sampling distribution of p is approximately normal.
What test can you use to determine if the sample is large enough to assume that the sampling
distribution is approximately normal?
14. Because averages are less variable than individual outcomes, what is true about the standard
deviation of the sampling distribution of x ?
15. What is the mean of the sampling distribution of x , if x is the mean of an SRS of size n
drawn from a large population with mean and standard deviation ?
16. What is the standard deviation of the sampling distribution of x , if x is the mean of an SRS
of size n drawn from a large population with mean and standard deviation ?
17. To cut the standard deviation of x in half, you must take a sample _____ times as large.
18. When should you use to calculate the standard deviation of x ?
n
19. If is not known, what can you use to estimate the standard deviation of x ? What is this
called?
20. What does the central limit theorem say about the shape of the sampling distribution of x ?
Key Vocabulary:
standard error margin of error
confidence level critical value
confidence interval one-proportion z-interval
1. Describe the sampling distribution model of p . What assumptions must you make for this
description to be reasonable?
2. What is the standard error of p ? When would you use standard error in place of standard
deviation?
3. Explain the meaning of the following statement: We are 95% confident that between 42.1%
and 61.7% of sea fans are infected.
7. As the confidence level increases, what happens to the margin of error? What happens to the
confidence interval?
8. By how many times must the sample size n increase in order to cut the margin of error in
half?
10. What is the critical value z* for a 90% confidence interval? Draw a sketch.
11. What is the critical value z* for a 95% confidence interval? Draw a sketch.
12. What is the critical value z* for a 99% confidence interval? Draw a sketch.
13. What assumptions and conditions must you consider before creating a
confidence interval for a proportion?
14. What effect does increasing your sample size have on the margin of error? What effect does
it have on the confidence level? What effect does it have on the confidence interval?
15. The formula used to determine the sample size n that will yield a confidence interval for a
pq
population proportion with a specified margin of error m is m = z * . Solve for n.
n
Key Vocabulary:
hypothesis fail to reject (the null
null hypothesis hypothesis)
reject (the null hypothesis) alternative hypothesis
1. What is a hypothesis?
2. After analyzing a set of data, if the results support a hypothesis, does that prove the
hypothesis is true? Explain.
3. After analyzing a set of data, if the results are inconsistent with a hypothesis, does that prove
the hypothesis is false? Explain.
5. When testing hypotheses, always start by assuming that the null hypothesis is true. What is
meant by a null hypothesis?
6. Given a null hypothesis H 0 : p = p0 , what are the parameters of the Normal sampling
distribution model? Under what conditions is this model appropriate?
7. How would you determine whether a particular value of p is so unlikely to have occurred
(assuming p = p0 ) that you would reject the null hypothesis?
8. Why do we say we fail to reject the null hypothesis rather than accept the null
hypothesis?
11. Explain the difference between a two-sided alternative hypothesis and a one-sided alternative
hypothesis. Draw a sketch
Key Vocabulary:
P-value Type I Error
statistically significant Type II Error
alpha level power
significance level
4. A 95% confidence interval corresponds to a two-sided hypothesis test at what alpha level?
5. A 90% confidence interval corresponds to a one-sided hypothesis test at what alpha level?
Key Vocabulary:
pooling
1. What conditions and assumptions are necessary for the sampling model of p 1 p 2 to be
approximately Normal?
2. If the above conditions and assumptions are met, what is the mean and standard deviation of
the sampling model?
3. Describe how to construct a level C confidence interval for the difference between two
proportions, p1 p2 .
4. Explain what is meant by pooling two samples. When is it appropriate to pool samples?
5. For a two-sample hypothesis test where H 0 : p1 p2 = 0 , show how to calculate the z test
statistic?
Key Vocabulary:
t-distribution one-sample t-interval
t-table one-sample t-test
degrees of freedom
8. The z-Table gives the area under the standard normal curve to the left of z. What does the
t-Table give?
9. Samples from normal distributions have very few outliers. If your data contains outliers,
what does this suggest?
10. If the size of the SRS is less than 15, when can we use t procedures on the data?
11. If the size of the SRS is between 15 and 40, when can we use t procedures on the data?
12. If the size of the SRS is at least 40, when can we use t procedures on the data?
Key Vocabulary:
two-sample t-interval pooled t-test
two-sample t-test
1. Describe the assumptions and conditions that must be met in order to use two-sample t-
procedures.
3. In a two-sample problem, what is the null hypothesis for comparing two means?
5. Explain why we ALWAYS pool our statistics when testing two proportions, but almost
NEVER pool our statistics when testing two means.
Key Vocabulary:
paired t-test paired t-interval
1. Explain why a two-sample hypothesis test is not appropriate for paired data.
3. What conditions and assumptions are necessary in order to use a paired t-test?
Key Vocabulary:
2-Test
1. State the null and alternative hypotheses for a goodness of fit test.
7. In order to reject the null hypothesis, what will be true about the chi-square statistic?
9. What is the shape of a chi-square distribution? What happens to the shape as the degrees of
freedom increases?
10. Explain the difference between a chi-square test for goodness-of-fit and a chi-square test of
homogeneity.
11. State the null and alternative hypotheses for comparing more than two proportions.
12. How do you calculate the expected count in any cell of a two-way table when the null
hypothesis is true?
13. How many degrees of freedom does a chi-square test for a two-way table with r rows and c
columns have?
14. If you decide to reject the null hypothesis when performing a chi-square test of homogeneity,
it is necessary to follow up by examining the residuals of the components. Explain how to
standardize the residuals for the components of the chi-square test.
15. What would you look for when interpreting these standardized residuals?
16. State the null and alternative hypotheses for a chi-square test for independence.
17. Explain the difference between a chi-square test of homogeneity and a chi-square test for
independence (think about the hypotheses for each).
Key Vocabulary:
1. What is the equation for the idealized regression line? Label each part of the equation.
2. What is the equation for the least-squares regression line? Label each part of the equation.
3. What are the four assumptions and conditions that must be met in order to perform inference
for regression? Explain how to check each one.
4. What three aspects of the scatterplot affect the standard error of the regression slope?
5. If there is no association between two variables, what should be the value of the slope of the
regression line?
6. When performing inference on the regression slope, what are the null and alternative
hypotheses?
7. How many degrees of freedom does the standardized extimated regression slope have?
8. What is the formula for constructing a confidence interval for the regression slope ?
9. How is this similar to the formula for constructing other confidence intervals?
10. If you want to test for no correlation between your variables x and y, what should be used for
the null hypothesis?
11. When using the LinRegTTest on the TI-83+, what does the symbol represent?