Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Consider the table below describing a data set of individuals who have registered to volunteer at a
public school. Which of the choices below listscategorical variables?
Question 2
The General Social Survey conducted annually in the United States asks how many friends people
have and how they would rate their happiness level (very happy, pretty happy, not too happy). In
order to evaluate the relationship between these two variables a researcher calculates the average
number of friends for people who categorize themselves as very happy, pretty happy, and not too
happy. Which of the following correctlyidentifies the variables used in the study as explanatory
and response?
explanatory:number of friends Inorrect 0.00 Having more friends might cause people to be happier or b
response: happiness level more friends. So we can’t easily determine which variable
(categorical with 3 levels) based on which we might expect to affect which. However
is the explanatory variable since we first divide the data int
analyze summary statistics of number of friends of people
Therefore, number of friends is the response variable.
explanatory:number of friends
response: very happy, pretty
happy, not too happy
explanatory:happiness level
(categorical with 3 levels)
response: number of friends
Total 0.00 /
1.00
Identify the explanatory variable in a pair of variables as the variable suspected of affecting the
other, however note that labeling variables as explanatory and response does not guarantee that the
relationship between the two is actually causal, even if there is an association identified between
the two variables.
Question 3
Past research suggests that students who study with fewer distractions (internet, cell phone, etc.)
tend to get higher grades. Which of the following is the best scenario for being able to generalize
this finding to the population of all students?
None of the students in the sample has any misdemeanors; their answers
can’t be trusted.
A student list for the college is obtained and students are randomly Correct 1.00 Random sampling a
selected from the list, and all selected students participate in the study. population at large,
sampling.
A survey is emailed to all registered students, and the results are based
on the sample of returned surveys.
Sample only includes students who are in classes that the researcher
teaches.
Total 1.00 /
1.00
Classify a study as observational or experimental, and determine whether the study’s results can
be generalized to the population and whether they suggest correlation or causation.
• If random sampling has been employed in data collection, the results should be generalizable to
the target population.
• If random assignment has been employed in study design, the results suggest causality.
Question 4
A school district is considering whether it will no longer allow students to park at school after two
recent accidents where students were severely injured. As a first step, they survey parents of high
school students by mail, asking them whether or not the parents would object to this policy
change. Of 5,799 surveys that go out, 1,209 are returned. Of these 1,209 surveys that were
completed, 926 agreed with the policy change and 283 disagreed. Which of the following
statements is the most plausible?
It is possible that 80% of the parents of high school students disagree with the policy
change.
The survey is unlikely to have any bias because all parents were mailed a survey. Inorrect 0.00
The school district has strong support from parents to move forward with the policy
approval.
Question 5
As part of a statistics project, Andrea would like to collect data on household size in her city. To
do so, she asks each person in her statistics class for the size of their household, and reports that
her sample is a simple random sample. However, this is not a simple random sample. Which of the
following is the best reasoning for why this is not a random sample that is appropriate for this
research question?
Your Answer
Andrea did not block for any variables that might influence the response.
Andrea asked everybody in her class instead of asking her classmates to volunteer.
In this investigation of household size, each household represents a case. Andrea incorrectly sampled individuals instea
of households.
Andrea did not use a random number table to randomize the order in which she collected the students’ responses, so th
sample cannot be random.
Total
Distinguish between simple random, stratified, and cluster sampling, and recognize the benefits
and drawbacks of choosing one sampling scheme over another.
Question 6
True or False: Stratified sampling allows for controlling for possible confounders in the sampling
stage, while blocking allows for controlling for such variables during random assignment.
False Inorrect 0.00 Stratifying and blocking both allow for controlling for potential confounders, but
stratify when we sample (divide population into strata and sample from within ea
random assignment (divide sample into blocks and randomly assign from within
True
Total 0.00 /
1.00
Identify the four principles of experimental design and recognize their purposes:
Question 7
Which of the below data sets has the lowest standard deviation? You do not need to calculate the
exact standard deviations to answer this question.
100, 100, 100, 100, 100, 100, Correct 1.00 The dataset with the most repeated observations has the
101 deviation.
0,1,2,3,4,5,6
0,1,3,3,3,5,6
Note that there are three commonly used measures of center and spread:
center: mean (the arithmetic average), median (the midpoint), mode (the most frequent
observation)
spread: standard deviation (variability around the mean), range (max-min), interquartile
range (middle 50% of the distribution)
Question 8
True or False: The statistic mean/median (mean divided by median) can be used as a measure of
skewness (either right or left). If this statistic is less than 1, the distribution is most likely left
skewed.
False
True Correct 1.00 In a left skewed distribution the median is greater than the mean, therefore
than 1.
Identify the shape of a distribution as symmetric, right skewed, or left skewed, and unimodal,
bimodoal, multimodal, or uniform.
Question 9
Based on the relative frequency histogram below, which of the following statements is supported
by the plot?
Your Answer Score Explanation
The distribution is
multimodal.
It is not possible
to estimate the
median without
knowing the
sample size.
Total 0.00 /
1.00
Use histograms and box plots to visualize the shape, center, and spread of numerical distributions,
and intensity maps for visualizing the spatial distribution of the data.
Question 10
A recent housing survey was conducted to determine the price of a typical home in a city that is
mostly middle-class, with one very expensive suburb. The mean price of a house in this city is
roughly $650,000. Which of the following statements is most likely to be true?
Majority of houses in this city cost Correct 1.00 Since the city is mostly middle-class, with one very ex
less than $650,000. distribution to be right skewed, and therefore the mean
observations fall below the median, more than 50% of
$650,000, the mean.
Total 1.00 /
1.00
Question ExplanationThis question refers to the following learning objective:
Define a robust statistic (e.g. median, IQR) as a statistics that is not heavily affected by skewness
and extreme outliers, and determine when such statistics are more appropriate measures of center
and spread compared to other similar statistics.
Question 11
Phi Delta Kappa (PDK) is an international professional organization for educators that, in
collaboration with Gallup, has been conducting polls on the public’s attitudes toward the public
schools since 1969. The following was one of the questions on the 2011 poll:
”Most teachers in the nation now belong to unions or associations that bargain over salaries,
working conditions, and the like. Has unionization, in your opinion, helped, hurt, or made no
difference in the quality of public school education in the United States?”
The respondents’ answers broken down by party affiliation are shown below. Which of the
following statements is most justified by these data?
Total 1.00 /
1.00
Use contingency tables and segmented bar plots or mosaic plots to assess the relationship between
two categorical variables.
Question 12
In 1948, Austin Bradford Hill, designed a study to test a new treatment for tuberculosis that at the
beginning of the study there was no evidence whether it would be any better or worse than bed
rest. He randomly assigned some patients who volunteered to be a part of this study to receive the
treatment Streptomycin, an antibiotic. The other patients received only bed rest as the control
group. Hill then observed the patients’ outcomes: which patients died and which recovered. The
results of the study are shown below.
We use the following simulation test if there is a difference between the recovery rates under the
two treatments: We write “died” on 18 index cards and “survived” on 89 index cards to indicate
whether or not a patient died. Next, we shuffle the cards and deal them into two groups of 52 and
55, for control and treatment, respectively. We then calculate the simulated difference between the
recovery rates in Streptomycin and control groups (pp Streptomycin − ppControl), and record this
value. We repeat this simulation 100 times. The histogram below shows the distribution simulated
difference between the recovery rates in these 100 simulations.
Which of the following is correct? Choose all that apply (there are multiple correct answers).
The conclusion of this study is Correct 0.11 Since the sample is comprised of volunteers, we can’t gen
generalizable to all tuberculosis
patients.
The alternative hypothesis should Correct 0.11 The evidence could go either way so we should consider a
be that there is a difference
between the recovery rates under
the two treatments.
Based on this study we can Correct 0.11 Also, since this is an experiment we can deduce causation
conclude a causal relationship
between Streptomycin and better
tuberculosis recovery rate.
Streptomycin treatment does not Inorrect 0.00
appear to be effective in treating The observed difference betwee
tuberculosis since the observed isp^Streptomycin−p^control=5155−
number of deaths in the treatment
There is 1 simulation where the simulated difference is ≥0
group would not be considered
hypothesis test, the p-value is 0.01×2=0.02. This is consid
unusual based on the simulation
reject the null hypothesis and hence conclude that the data
results.
treatments.
The alternative hypothesis is that Inorrect 0.00 The evidence could go either way so we should consider a
the Streptomycin treatment is more
effective than bed rest.
Total 0.56 /
1.00
Note that an observed difference in sample statistics suggesting dependence between variables
may be due to random chance, and that we need to use hypothesis testing to determine if this
difference is too large to be attributed to random chance. Set up null and alternative hypotheses for
testing for independence between variables, and evaluate the data support for these hypotheses
using a simulation technique.