
Math 1040 Skittles Term Project

1. Report Introduction

The data below was compiled from 23 students in the Statistics class, each of whom
bought a 2.17 oz. bag of Skittles and recorded the number of candies of each color in
the bag. We then applied the concepts and formulas taught throughout the semester to
the recorded data. In this class we have learned to summarize data, calculate
probabilities, construct confidence intervals, and conduct hypothesis tests. The goal of
this project is to demonstrate our knowledge of these key concepts.

2. Organizing and Displaying Categorical Data: Colors

Color     Count   Proportion
Red       268     0.197
Orange    250     0.184
Yellow    265     0.195
Green     308     0.226
Purple    269     0.198
Total     1360    1.000

According to the pie chart, there is very little variation in the number of Skittles
per color. However, the Pareto chart shows that green Skittles occur more frequently
than any other color, and orange occurs least often. The graphs reflect what we
expected to see, because a single bag of Skittles usually has around the same number
of each color, and these graphs show that the counts are consistent across the sample.
There is no large variation between the colors. The graph below shows a single bag (in
red) compared to the entire sample (in orange). It illustrates how the color counts in
an individual bag can differ from those of the sample as a whole.
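
The proportions in the table and the ordering used by the Pareto chart can be reproduced from the color counts alone. The following is a minimal Python sketch; the variable names are illustrative and not part of the original project.

```python
# Color counts taken from the table in this report.
counts = {"Red": 268, "Orange": 250, "Yellow": 265, "Green": 308, "Purple": 269}

total = sum(counts.values())

# Proportion of each color, rounded to three decimals as in the table.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

# A Pareto chart orders the categories from most to least frequent.
pareto_order = sorted(counts, key=counts.get, reverse=True)

print(total)
print(proportions)
print(pareto_order)
```

Sorting by count puts green first and orange last, which matches the reading of the Pareto chart above.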

3. Organizing and Displaying Quantitative Data: the Number of Candies per Bag

Mean: 59.1

Standard deviation: 2.53

Min: 53

Q1: 58

Median: 59

Q3: 61

Max: 64

Total number of bags: 23


According to the data and charts above, the number of Skittles per bag ranges
from 53 to 64, with most bags containing 59 or 60 candies. The distribution is roughly
symmetric and bell shaped. The charts show what we expected to see: each bag
contains roughly the same number of candies, with little variation. Under the
1.5 × IQR rule the fences are 53.5 and 65.5, so the minimum of 53 falls just below the
lower fence and counts as a mild low outlier. The sample data, combined from the 23
classmates, shows similar results to the data collected from an individual Skittles
bag. My Skittles bag contained 58 candies, 1.1 fewer than the sample mean.
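
One common way to check for outliers is the 1.5 × IQR fence rule. The sketch below applies it to the reported five-number summary; the report does not say which outlier rule was used, so the rule itself is an assumption here, and only the extremes can be checked because the per-bag counts are not reproduced. It also standardizes the 58-candy bag against the sample mean.

```python
# Summary statistics reported above (per-bag counts are not reproduced here).
mean, sd = 59.1, 2.53
minimum, q1, q3, maximum = 53, 58, 61, 64

# Fences under the 1.5 * IQR rule (assumed rule, see lead-in).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Only the extremes are available, so check those against the fences.
min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

# Standardized distance of the 58-candy bag from the sample mean.
z_my_bag = (58 - mean) / sd

print(lower_fence, upper_fence, min_is_outlier, max_is_outlier)
print(round(z_my_bag, 2))
```

Under this rule the minimum of 53 lies just below the lower fence of 53.5, while the maximum of 64 is inside the upper fence of 65.5. The 58-candy bag is less than half a standard deviation below the mean.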

4. Reflection

Categorical data is qualitative: it places each observation in a category rather than
measuring a quantity. In this case the categorical variable is the color of each
Skittle, while the quantitative variable is the number of candies in each bag. It
makes sense to use a pie chart for categorical data because a pie chart shows how the
entire data set divides into separate categories, in this case the different colors
of Skittles. It doesn't make sense to use a box plot for categorical data, since a box
plot displays the five-number summary of quantitative data. It makes sense to use a
frequency histogram for quantitative data, because a histogram groups numeric values
into ranges, in this case the number of candies found in each individual bag. A bar
graph represents the categorical data well, but colors have no mean or spread, so a
bar graph cannot show the center or shape of a distribution. Confidence intervals and
hypothesis tests apply to both kinds of data, but to different parameters. For
categorical data the parameter is a proportion, such as the proportion of yellow or
red candies. For quantitative data the parameter is a mean, such as the mean number
of candies per bag. It does not make sense to estimate or test a mean for categorical
data, because categories cannot be averaged.

5. Confidence Interval Estimates

A confidence interval is a range of values that, at a stated confidence level (95%,
99%, etc.), is expected to contain the true mean or true proportion of the population.

Construct a 99% confidence interval estimate for the true proportion of yellow
candies.

$\hat{p} = x/n = 265/1360 = 0.195$

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 0.195 \pm 2.575\sqrt{0.195(1-0.195)/1360} = 0.195 \pm 0.028$

Interval: (0.167, 0.223)

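
The 99% interval for the proportion of yellow candies can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library for the critical value instead of the table value 2.575; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 265, 1360      # yellow candies and total candies, from the table
confidence = 0.99

p_hat = x / n

# Two-sided critical value z_{alpha/2} from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(round(margin, 3))
print(round(ci[0], 3), round(ci[1], 3))
```

Rounded to three decimals this agrees with the interval (0.167, 0.223) computed by hand.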
Construct a 95% confidence interval estimate for the true mean number of
candies per bag.

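$\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n} = 59.1 \pm 2.074 \cdot 2.53/\sqrt{23} = 59.1 \pm 1.094 = (58.01, 60.19)$

Here n = 23 is the number of bags, not the total number of candies. Because the
population standard deviation is unknown, the critical value comes from the
t-distribution with 22 degrees of freedom.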
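
As a check on the interval for the mean, the sketch below treats the 23 bags as the sample and uses a Student t critical value, since the standard deviation is estimated from the sample. Both choices are assumptions of this sketch. The critical value 2.074 (two-sided 95%, 22 degrees of freedom) is hard-coded from a t-table because the Python standard library has no t-distribution.

```python
from math import sqrt

x_bar, s, n = 59.1, 2.53, 23   # sample mean, sample SD, number of bags
t_crit = 2.074                 # t_{0.025} with n - 1 = 22 df, from a t-table

# Margin of error and interval for the mean number of candies per bag.
margin = t_crit * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(round(margin, 3))
print(round(ci[0], 2), round(ci[1], 2))
```

With n = 23 the margin of error is about 1.09 candies, so the interval runs from roughly 58.0 to 60.2 candies per bag.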

Construct a 98% confidence interval estimate for the standard deviation of the
number of candies per bag.

Not applicable for this class

Discuss and interpret the results of each of your three interval estimates. Include
neatly written and scanned copies of your work.

6. Hypothesis Tests

Hypothesis testing starts from an assumption about a population parameter (the null
hypothesis) and uses sample data to decide whether there is enough evidence to reject
that assumption.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies
are red.

Claim: p = 0.20

Null: p = 0.20

Alternative: p ≠ 0.20

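Test statistic: $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n} = (0.197 - 0.20)/\sqrt{0.20(1-0.20)/1360} = -0.28$

P-value: 2 × 0.3897 = 0.7794 (two-tailed, because the alternative is p ≠ 0.20)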

Fail to reject the null hypothesis. There is not sufficient evidence to suggest that
the proportion of red candies is not equal to 0.20.
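
The test for the proportion of red candies can be sketched in Python as a two-tailed one-proportion z-test. The sketch uses the unrounded sample proportion, so the statistic comes out near -0.27 rather than the -0.28 obtained from the rounded 0.197; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 268, 1360   # red candies and total candies, from the table
p0 = 0.20          # claimed proportion under the null hypothesis
alpha = 0.05

p_hat = x / n

# The standard error uses the null value p0, not the sample proportion.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value: twice the area beyond |z| in one tail.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_null = p_value < alpha

print(round(z, 2), reject_null)
```

The P-value is far above 0.05, so the decision is the same as in the hand calculation: fail to reject the null hypothesis.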

Use a 0.01 significance level to test the claim that the mean number of candies
in a bag of Skittles is 55.

Claim: μ = 55

Null: μ = 55

Alternative: μ ≠ 55

Test statistic: $t = (\bar{x} - \mu_0)/(s/\sqrt{n}) = (59.1 - 55)/(2.53/\sqrt{23}) = 7.772$

P-value: $9.510 \times 10^{-8} \approx 0$

Reject null hypothesis. There is sufficient evidence to suggest that the mean number
of candies is not equal to 55.
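
The test for the mean can be sketched the same way. With the sample mean minus the claimed mean in the numerator, the statistic is positive. The Python standard library has no t-distribution, so instead of a P-value this sketch compares the statistic with the two-sided critical value 2.819 (α = 0.01, 22 degrees of freedom), hard-coded from a t-table as an assumption.

```python
from math import sqrt

x_bar, s, n = 59.1, 2.53, 23   # sample mean, sample SD, number of bags
mu0 = 55                       # claimed mean under the null hypothesis
t_crit = 2.819                 # t_{0.005} with n - 1 = 22 df, from a t-table

# One-sample t statistic.
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed decision at the 0.01 level via the critical-value method.
reject_null = abs(t_stat) > t_crit

print(round(t_stat, 3), reject_null)
```

The statistic is far beyond the critical value, which is consistent with the very small P-value reported above.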

We failed to reject the claim that 20% of the candies are red because there was
insufficient evidence that the proportion of red candies is not equal to 20%.

We rejected the claim that the mean number of candies in a bag is 55 because there
is sufficient evidence that the mean number of candies is not 55.

7. Reflection

Conditions for constructing confidence intervals and for conducting valid hypothesis
tests are largely the same: a random sample, independent observations (with a sample
size of no more than 10% of the population), and an approximately normal sampling
distribution. For a proportion this means at least 10 expected successes and 10
expected failures; for a mean it means roughly normal data or a large sample. Our data
for both confidence intervals and hypothesis tests fit these conditions. Possible
errors include a type I error, rejecting a null hypothesis that is actually true, and
a type II error, failing to reject a null hypothesis that is actually false. There is
also always the chance of human error, for example typing the wrong numbers into the
calculator or purchasing the wrong size bag of candies. One possible improvement to
the sampling would be to count and record the data from each bag together as a class.
Another would be to use a larger sample.

In conclusion, our data shows that the number of Skittles per bag is roughly
consistent, as is the share of each color across the sample. The mean number of
candies per bag was 59.1, with a minimum of 53 and a maximum of 64. The number of
each color varied greatly from bag to bag, but in the compiled data the colors are
distributed fairly evenly, with green appearing slightly more often than the rest.
