
Skittles Term Project

Leana Tran
In my statistics class, we wondered whether every 2.17-ounce bag of Skittles contains the
same number of candies and the same mix of colors. This term project analyzes the data each
student collected from one 2.17-ounce bag of Original Skittles. Each student in the class bought
a bag, sorted the candies by color (red, orange, yellow, green, and purple), and counted each
color. Throughout the semester, we applied the concepts we learned to this data set.
Individual Portion
1. Create a table that displays the counts by color and total from your own bag of candies
together with the counts by color and total for the entire class sample.
Count by color   Red   Orange   Yellow   Green   Purple   Total
My Bag            13       13        9      13       11      59
Class            275      226      247     232      243    1223

Counts

1. Do the graphs reflect what you expected to see? Are there any surprises?
Yes, the class data was what I had expected. My group and I thought that each color
would appear in a different amount. I was somewhat surprised by my own bag, because it
was fairly consistent across the colors: three of my colors had the same count, 13.
Although the counts for each color did not differ hugely, most of the class had a large
portion of red Skittles compared to the other colors. I was also surprised that one
person had only 29 Skittles in their bag and another had 80, which is a lot of Skittles.
2. Are there any observations that appear to be outliers? If so, what impact might they
have on graphics and summary statistics?
Yes, there were outliers. After doing the math, 29 was a significant outlier. Eighty
was also considered an outlier, but not a significant one. On a box plot, the minimum
(29) and the maximum (80) will be noted. The outliers will have the biggest impact on
the mean and possibly the range, and much less on the mode or median.

3. Does the distribution of colors in the total class data match the distribution of your
own data from your single bag of candies or are they different?
The distribution of colors was different for almost everybody. Two other people had
the same number of reds as I did, but the rest of the data was different. Everyone's bag
had a different number of Skittles for each color. Mine remained consistent for the most
part, but others fluctuated, with as few as 3 of one color and as many as 15 of another.
Overall, the class data showed more reds in total than any other color. Yellow and
purple were also popular colors.
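The comparison between one bag and the class can be made concrete by converting the counts to proportions. The following is a minimal Python sketch using the counts reported in this project (my bag: 13 red, 13 orange, 9 yellow, 13 green, 11 purple; class totals: 275, 226, 247, 232, 243). The function name `proportions` is my own and is only for illustration.

```python
# Counts by color, taken from the table and text of this report.
my_bag = {"red": 13, "orange": 13, "yellow": 9, "green": 13, "purple": 11}
class_total = {"red": 275, "orange": 226, "yellow": 247, "green": 232, "purple": 243}

def proportions(counts):
    """Return each color's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

my_props = proportions(my_bag)
class_props = proportions(class_total)

# Print the two distributions side by side for comparison.
for color in my_bag:
    print(f"{color:<7} my bag: {my_props[color]:.3f}   class: {class_props[color]:.3f}")
```

An equal split across five colors would put every proportion at 0.200, so the printed values show how far each color sits from that mark.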
Group Portion

a. We think that the colors are not equally distributed. We do not believe that
each bag contains the same amount of each color, or 20% per color. The reason is
that we think errors could possibly occur. These errors can contribute to slight
deviations from equal proportions.
2. Does the class data represent a random sample? What would the population be?
Collaborate to discuss sampling and our data in a paragraph or two.
a. The data we obtained represents a simple random sample. Each package has the
same weight, 2.17 oz., and the same chance of being chosen.
b. The population represents ALL of the Skittles in the class. The sample
represents the number of Skittles of each color: red, green, yellow, orange, and purple.
c. Our sample size was 22. It is not necessarily too small or too large, because a
sample is generally considered large when n > 30. The greater the sample size, the
smaller the standard error will be. The class data does contain an outlier (29), and
outliers mainly affect the mean.
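The point that a larger sample gives a smaller standard error follows from the formula SE = s / sqrt(n). Here is a small sketch, assuming the standard deviation of 8.95 candies per bag that is reported for the class data in this project. The sample sizes other than 22 are hypothetical and chosen only for comparison.

```python
import math

s = 8.95  # standard deviation of candies per bag reported in this project

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# 22 is the class sample size; 30 and 100 are hypothetical comparisons.
for n in (22, 30, 100):
    print(f"n = {n:>3}: SE = {standard_error(s, n):.3f}")
```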

The chart above shows the total number of each color from the whole class. Our
data was entered online, and our professor combined it in Excel as a simple chart. From
that chart, we could create a pie chart and a bar chart. From these charts, we can conclude
that the colors were not equally distributed in each bag, although the counts were almost
equal. My 2.17-ounce bag of Original Skittles contained 13 red, 13 orange, 9 yellow, 13
green, and 11 purple Skittles. My data set was close, but not quite even.

1. Total candies in each bag:

1. Mean number of candies per bag: 58.23 candies
2. Standard deviation of the number of candies per bag: 8.95 candies
3. 5-number summary for the number of candies per bag:
Minimum: 29
Q1: 57
Median: 59
Q3: 61
Maximum: 80
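
One common way to check for outliers from a 5-number summary is the 1.5 x IQR rule. The report does not say which rule was used, so treating it as the rule here is an assumption. The sketch below applies it to the summary above.

```python
# 5-number summary from the class data above.
minimum, q1, median, q3, maximum = 29, 57, 59, 61, 80

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this are flagged
upper_fence = q3 + 1.5 * iqr  # values above this are flagged

def is_outlier(value):
    """Flag a value outside the 1.5 * IQR fences (assumed rule)."""
    return value < lower_fence or value > upper_fence

for value in (minimum, maximum):
    print(value, "outlier" if is_outlier(value) else "not an outlier")
```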

The histogram and boxplot above show the number of candies in each 2.17-ounce bag of
Original Skittles from the class data. From the two graphs, the distribution seems to be
fairly bell shaped. Outliers were present in the data set, which affects the location of the
mean. The mean number of Skittles per bag was 58.23 pieces. My bag had 59 pieces, which is
very close to the average.
Quantitative data is an amount that is measured or counted, such as weight, length,
time, and more. Categorical data consists of names, labels, groups, colors, or categories.
Unlike quantitative data, categorical data is not measured. Graphs for quantitative data
include histograms, stem-and-leaf plots, and more. Graphs for categorical data include pie
charts and bar charts.
Categorical data, in my mind, is something that you can place into a category. I was always
confused about the difference between categorical and quantitative data until recently. The
example of zip codes has always been an obstacle for me. You may assume that because zip
codes consist of numbers they are quantitative, but that is incorrect. Zip codes sort locations
into categories by region, for example West Jordan, Utah 84081. Quantitative data consists of
numbers such as height, shoe size, and so on. Typically, for categorical data, I would see bar
graphs and sometimes pie charts. Pie charts mainly compare parts of a whole, whereas bar
graphs can show the difference between categories.
Moving on to confidence intervals: a confidence interval gives a range of values that is
likely to contain the true value of a population parameter. A confidence interval gives a range,
whereas a point estimate gives a single number. Confidence intervals can be constructed at a
chosen confidence level (95%, 98%, and so on). The confidence level describes the uncertainty
associated with the sampling method. With smaller samples, we use the t-distribution. The
confidence interval is computed from sample data.
Ex: A 95% confidence level indicates that about 19 out of 20 samples from the population will
produce a confidence interval that contains the true population parameter.
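
The 19-out-of-20 idea can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the project: a made-up normal population with mean 60 and known standard deviation 3.5, samples of size 94, and a z-interval with critical value 1.96.

```python
import math
import random

def coverage(trials=2000, n=94, mu=60.0, sigma=3.5, z=1.96, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mean mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land near 0.95, which is the "19 out of 20" in the example.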

Confidence Interval Work
Total Skittles: 5650
Bags: 94

Yellow Candies (99% confidence interval for the population proportion)
n (total # of candies in sample): 5650
x (total # of yellow candies in sample): 1189
α = 0.01
Critical Value: z(0.005) = 2.576
p (total yellow/total skittles): 1189/5650 = 0.2104
E: 2.576 * sqrt(0.2104 * (1 - 0.2104)/5650) = 0.0140
Lower Bound: p - E = 0.2104 - 0.0140 = 0.1964
Upper Bound: p + E = 0.2104 + 0.0140 = 0.2244
99% confidence interval estimate for the population proportion of yellow skittles: (0.1964, 0.2244)

Population Mean Number of Candies per Bag (95% confidence interval)
n (# of bags in the sample): 94
s = 3.4621
α = 0.05
DF: 94 - 1 = 93
Critical Value: (1 - 0.95)/2 = 0.025; t(0.025) = (1.990 + 1.984)/2 = 1.987
(using t-distribution critical values for df 80 & 100 since 94 was in the middle)
X-bar: 5650/94 = 60.106
E = 1.987 * 3.4621/sqrt(94) = 0.7095
Lower Bound: 60.106 - 0.7095 = 59.396
Upper Bound: 60.106 + 0.7095 = 60.815
95% confidence interval for the population mean number of candies/bag: (59.396, 60.815)

Population Standard Deviation (98% confidence interval)
n = 94
s = 3.4624
s^2 = 11.98
α = 0.02
DF: 94 - 1 = 93
Critical Values:
chi-square(0.01): (135.807 + 124.116)/2 = 129.9615
chi-square(0.99): (70.064 + 61.754)/2 = 65.909
(using chi-square distribution and df 90 & 100 because 94 (n) is in the middle)
Lower Bound: sqrt((94 - 1) * 11.98/129.9615) = 2.927
Upper Bound: sqrt((94 - 1) * 11.98/65.909) = 4.111
98% confidence interval estimate for the population standard deviation: (2.927, 4.111)

1. The 99% confidence interval estimate for the population proportion of yellow Skittles is
(0.1964, 0.2244). We are 99% confident that the true proportion of yellow Skittles falls
within this interval.
2. The 95% confidence interval for the population mean number of candies per bag is (59.396,
60.815). This means that we are 95% confident that the true mean is between the two
values.
3. The 98% confidence interval estimate for the population standard deviation is (2.927, 4.111).
We are 98% confident that the true standard deviation falls between the two values.
The table above records the work done to construct the confidence
intervals. We constructed 99%, 95%, and 98% confidence intervals.
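
The interval calculations can be re-run with a few lines of Python. This is a sketch, not the method used in class: it takes the summary values and the critical values quoted in the table (2.576, 1.987, 129.9615, and 65.909) as given rather than looking them up, and the function names are my own.

```python
import math

def proportion_ci(x, n, z):
    """Confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = x / n
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e, p + e

def mean_ci(x_bar, s, n, t):
    """Confidence interval for a mean: x_bar +/- t * s / sqrt(n)."""
    e = t * s / math.sqrt(n)
    return x_bar - e, x_bar + e

def std_dev_ci(s_squared, n, chi2_right, chi2_left):
    """Confidence interval for a standard deviation from chi-square critical values."""
    lower = math.sqrt((n - 1) * s_squared / chi2_right)
    upper = math.sqrt((n - 1) * s_squared / chi2_left)
    return lower, upper

# Inputs and critical values are the ones quoted in the table.
yellow = proportion_ci(x=1189, n=5650, z=2.576)                          # 99%
mean = mean_ci(x_bar=5650 / 94, s=3.4621, n=94, t=1.987)                 # 95%
std_dev = std_dev_ci(11.98, 94, chi2_right=129.9615, chi2_left=65.909)   # 98%

print("proportion of yellow:", tuple(round(v, 4) for v in yellow))
print("mean candies per bag:", tuple(round(v, 3) for v in mean))
print("standard deviation:  ", tuple(round(v, 3) for v in std_dev))
```

Because the script carries full precision instead of rounding at each step, the last digit of a bound may differ slightly from a hand calculation.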
A few mistakes that could have affected the data include recording an incorrect
count, using the wrong Skittles bag size, non-response error, and more. The outliers from
before did change the mean for the whole class, but the median is a resistant measure.
Summary

I loved how fun the project was! Not only was I able to apply the concepts I learned along
the way, but I also have a better understanding of statistics and how it works hand in hand with
the real world. I have been going through medical research papers in other classes, and many of
them are studies performed using statistics. It was great to work with the other students in the
class and help each other understand concepts we could not understand otherwise.

I have always struggled with math ever since it jumped from high school math to college
math. I have noticed that statistics is in our everyday lives and decisions. This project allowed
me to analyze the problem efficiently. I also learned about the many different types of tests
that you can perform in statistics. I recently realized, working in the Children's Surgical Unit,
that vital signs are gathered every 4 hours. We use statistics to analyze the trends for each
patient, whether it is blood pressure, respiratory rate, oxygen level, or temperature. I can't
wait to apply my knowledge of statistics to other subjects such as Human Physiology and
more.
