
Math 1040 Project:

Statistics Using Skittles


Skittles data individual work part 1:
I bought a bag of Skittles (2.17 oz, Original Skittles). From my bag I gathered this
data:

          Count Red   Count Orange   Count Yellow   Count Green   Count Purple   Total
My Bag    14          15             8              16            9              62

Here is my data for each color converted to percentages:

          Red    Orange   Yellow   Green   Purple
My Bag    23%    24%      13%      26%     15%

After collecting this data, we submitted it to the instructor, who then shared the
entire class's data with everyone. The class was then broken up into groups, where we applied the
objectives we were learning to see how statistics is used in practice.

Skittles Project Group #1 Part 2


Determine the proportion of each color within the overall sample gathered by the class.
FIRST: Guess! What do you expect the proportions to be? Why?
My guess:
Red: 0.232
Orange: 0.071
Yellow: 0.232
Green: 0.25
Purple: 0.214
My group and I arrived at these guesses by sharing our individual results with one another to see
whether they might be close to the proportions for the entire class.

SECOND: Now open the data set and compute the proportions of Red, Orange, Yellow, Green,
and Purple candies in the class data set. Note that the sample size is the total number of
candies collected by the class.
Red: 0.206
Orange: 0.198
Yellow: 0.210
Green: 0.192
Purple: 0.193

Pie Chart

Histogram Chart

Does the class data represent a random sample? What would the population be? Collaborate
to discuss sampling and our data in a paragraph or two.

Sample: The 94 2.17 oz bags of Original Skittles purchased by students in the class
Population: 2.17 oz bags of Original Skittles in the Greater Salt Lake area.

The population really depends on where the students in this class bought their bags of Skittles.
It could be all of Utah if we were spread out evenly across the state; however, I bet most of
the students live in the Greater Salt Lake area, since they are attending Salt Lake
Community College.
This is not a true simple random sample, because we would have had to sample from all
of the 2.17 oz bags of Original Skittles before they were shipped around the world. A
sample of size n from a population of size N is a simple random sample only if every possible
sample of size n has an equal chance of occurring.
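
The definition of a simple random sample can be illustrated with a small Python sketch. The five-bag population below is hypothetical and far smaller than any real population of Skittles bags; it is only meant to show that every possible sample of size n is equally likely when the draw is made without replacement.

```python
import itertools
import random

# Hypothetical miniature population of N = 5 bags (invented for illustration).
population = ["bag1", "bag2", "bag3", "bag4", "bag5"]
n = 2  # sample size

# Every possible sample of size n. Under simple random sampling,
# each of these subsets has the same chance of being selected.
all_samples = list(itertools.combinations(population, n))
chance_each = 1 / len(all_samples)

# random.sample draws n distinct items without replacement,
# so each subset of size n is equally likely.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
drawn = rng.sample(population, n)

print(f"{len(all_samples)} possible samples, each with chance {chance_each}")
print("one simple random sample:", drawn)
```

The class data does not work this way: each student chose a store and a bag, so some bags in the population had no chance of being picked.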

Analyzing Skittles Data: Individual Work Portion Part 2

               Count Red   Count Orange   Count Yellow   Count Green   Count Purple   Total
My Bag         14          15             8              16            9              62
Class Counts   1,238       1,148          1,238          1,133         1,135          5,892

Do the graphs reflect what you expected to see? Are there any surprises?

This Skittles study was really interesting, and I saw a few surprises. I believe it was
bag number 53 that had almost 200 Skittles in it. I found that surprising, but as
I thought about it I came up with a reasonable explanation: possibly they bought a
bigger bag (a king size, for example). However, this probably did not change the
overall percentages much, because the bag had more of each individual color as well
as more Skittles as a whole, so the proportions would have balanced out. The biggest
surprise between my results and the class as a whole was yellow: after converting each
result into percentages, my bag (13% yellow) was about 8 percentage points below the
class as a whole (21% yellow).

Are there any observations that appear to be outliers? If so, what impact might they have on
graphics and summary statistics?

One kind of outlier I noticed was that a few responses had only the red count
entered (and no other colors) in the statistics for the whole class. This would
skew the class percentages for each color. I suppose it is possible that someone
received a whole bag of just red Skittles, but statistically this would be very rare and an
outlier. I wonder if they just accidentally submitted their response before entering their
other colors.

Does the distribution of colors in the total class data match the distribution of your own data
from your single bag of candies or are they different?

They were similar in some ways but different in others. For example, my most common color was
green: in my bag, 26% of all the Skittles were green. However, in the class as a whole
green is tied for the least common color, making up only 19% of all the class's
Skittles. The pattern repeated itself with my least common color, yellow. In my bag
only 13% of the Skittles were yellow, whereas in the class as a whole yellow was tied for the most
common color, making up 21% of all Skittles. Here are the percentages for
my bag compared to the class as a whole:

          Red    Orange   Yellow   Green   Purple
My Bag    23%    24%      13%      26%     15%
Class     21%    19%      21%      19%     19%
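
The class percentages above can be reproduced from the class counts reported earlier (1,238 red, 1,148 orange, 1,238 yellow, 1,133 green, 1,135 purple). The following is a minimal Python sketch of that arithmetic; the variable names are illustrative only.

```python
# Class counts per color, as reported in the class data table.
class_counts = {
    "Red": 1238,
    "Orange": 1148,
    "Yellow": 1238,
    "Green": 1133,
    "Purple": 1135,
}

# The sample size is the total number of candies collected by the class.
total = sum(class_counts.values())

# Proportion of each color = count for that color / total count.
proportions = {color: count / total for color, count in class_counts.items()}

for color, p in proportions.items():
    print(f"{color}: {p:.3f} ({p:.0%})")
```

Rounded to whole percentages, this gives 21%, 19%, 21%, 19%, and 19%, matching the class row of the table.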

Group Project- part 3


As a group we continued to use the data to estimate values.

1.
i. Mean number of candies per bag: 60.1
ii. Standard deviation of the number of candies per bag: 5.6 (sample sd)
iii. 5-number summary for the number of candies per bag:
Minimum: 37
Q1: 58
Median: 60
Q3: 62
Maximum: 82
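
One common way to check a 5-number summary for outliers is the 1.5 x IQR rule. The report does not say the group used this rule, so treat the Python sketch below as an assumption made for illustration. It uses only the summary values listed above.

```python
# Values from the 5-number summary above.
minimum, q1, median, q3, maximum = 37, 58, 60, 62, 82

# Interquartile range: the spread of the middle half of the bags.
iqr = q3 - q1

# 1.5 x IQR rule (an assumed convention, not stated in the report):
# values beyond these fences are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"minimum {minimum} flagged: {min_is_outlier}")
print(f"maximum {maximum} flagged: {max_is_outlier}")
```

Under this rule the fences are 52 and 68, so both the smallest bag (37) and the largest bag (82) would be flagged as potential outliers.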

Histogram:
Total Candies in Each Bag
x-axis: number of candies per bag
y-axis: frequency

Box Plot for Number of Skittles per Bag

Term Project Part 3: Individual Portion


This was my response to the values that were found.

1.

Write a paragraph discussing your findings about the variable Total candies in
each bag. Address the following in your writing:

i.

What is the shape of the distribution of this variable?

1. Based on our group's work and my individual research, I would say that
this is a bell-shaped distribution, because the
majority of the values are concentrated in the middle.

ii.

Do the graphs reflect what you expected to see or are there some surprises?

1. I believe that the values did reflect my expectations. I thought that the
number of candies in a standard bag of Skittles would vary, but only by a small amount.
There were, however, a couple of surprises on both the low end and the high end:
I didn't expect any bags to fall in the outer ranges
(for instance, the 80-84 range and the 30-34 range),
because when I buy a bag of Skittles I assume that I will receive close
to the same amount in every bag.

iii.

Does the overall data collected by the whole class agree with your own single
bag data?
Include the number of candies from your own bag and the total number of bags
in the class sample in your discussion.

1. In my bag the total number of candies was 62. The total number of
candies for the whole class was 5,892 across 98 bags of Skittles, so the
average number of Skittles per bag was 60.1 for the whole class. My data
agrees with the class data in that my bag is very close to the class average.
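
The class average quoted above, and how close my bag is to it, can be checked with a short Python sketch. The z-score step is an addition for illustration and was not part of the original assignment. It uses the sample standard deviation of 5.6 reported in the group portion.

```python
# Figures taken from the report.
total_candies = 5892   # all candies counted by the class
number_of_bags = 98    # bags in the class sample
sample_sd = 5.6        # sample standard deviation from the group portion
my_bag = 62            # candies in my own bag

# Mean number of candies per bag.
mean_per_bag = total_candies / number_of_bags

# z-score: how many standard deviations my bag is from the class mean
# (an illustrative extra step, not part of the original assignment).
z_score = (my_bag - mean_per_bag) / sample_sd

print(f"mean per bag = {mean_per_bag:.1f}")
print(f"z-score of my bag = {z_score:.2f}")
```

A z-score of roughly 0.3 means my bag is about a third of a standard deviation above the class mean, which supports the statement that it is very close to average.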

2.

In a half page, explain the difference between categorical and quantitative data.
Also address the following in your writing:

i.

What types of graphs make sense and what types of graphs do not make
sense for categorical data? For quantitative data? Explain why.

ii.

What types of calculations make sense and what types of calculations do not
make sense for categorical data? For quantitative data? Explain why.

Categorical data describes the characteristics of what you observe. In this specific
example it would be the colors of the candies. The values themselves are not numbers, but you can
group them into categories and count how many fall into each one. Quantitative data, on the
other hand, consists of numerical counts or measurements. In this example it would be the number of
Skittles per bag rather than the colors in the bag.

For categorical data, pie charts are useful for showing the share of each category in visual form. It
would also be possible to use a bar graph. Stem and leaf plots would not work for categories: the stem
and leaf plot relies on numerical values to draw its plot, so categorical data would be
difficult to express in this form.

For quantitative data there are a variety of graph types that make sense
to use, such as bar graphs, histograms, stem and leaf plots, and distribution graphs. These
types of graphs use numbers to position the values on the plot. A pie chart would not work very well
because it relies on categories rather than numbers to differentiate between values.

For categorical data we calculated a lot of percentages to compare the categories.
There is no need to calculate a mean, median, standard deviation,
max, or min for categorical data, because those calculations require numbers and a characteristic
such as color is not a number. The mode, the most common category, is the one such summary that still
makes sense. For quantitative data you would want to
calculate the mean, median, mode, standard deviation, min, or max, because all of these
values summarize where the data are centered and how spread out they are. With quantitative data you
generally don't calculate percentages for individual values, because they don't give you any additional information.
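
As a rough illustration of the distinction above, the following Python sketch summarizes a categorical variable (color) with counts and percentages, and a quantitative variable (candies per bag) with a mean, median, and standard deviation. Both small data sets are hypothetical, invented only for this example. They are not the class data.

```python
from collections import Counter
import statistics

# Hypothetical mini-sample, invented for illustration (not the class data).
colors = ["red", "green", "green", "yellow", "purple", "green", "orange", "red"]
candies_per_bag = [58, 60, 62, 61, 59]

# Categorical: count each category and convert counts to shares.
color_counts = Counter(colors)
color_share = {color: k / len(colors) for color, k in color_counts.items()}
most_common_color = color_counts.most_common(1)[0][0]

# Quantitative: numerical summaries of center and spread make sense.
mean_per_bag = statistics.mean(candies_per_bag)
median_per_bag = statistics.median(candies_per_bag)
sd_per_bag = statistics.stdev(candies_per_bag)  # sample standard deviation

print("share of each color:", color_share)
print("most common color:", most_common_color)
print("mean, median, sd:", mean_per_bag, median_per_bag, round(sd_per_bag, 2))
```

Asking for the mean of the color list would fail, because color names cannot be averaged; that is the practical meaning of "calculations that do not make sense" for categorical data.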

Term project group portion Part 4:
See handwritten calculations on "Term project group portion part 4" attached below this
document.

Term project individual portion part 4:
In life, statistics is used to estimate the probability that an event or characteristic occurs in
a population. However, it is difficult to be 100% accurate when estimating
something about an entire population. This is why a confidence interval is used. Because most of
the time we are unable to get data from an entire population, we take only samples of that
population, so our estimates can vary from sample to sample. With a confidence interval we can state that the
population parameter is likely to fall between two values, and we can put a confidence level on
that range. For example, in my research (with the class's help) I am 99% confident that
the population proportion of yellow Skittles in 2.17 oz bags of Original
Skittles is between 0.1965 and 0.2243. This means that I am 99% confident that, across all 2.17 oz
bags of Original Skittles, between 19.65% and 22.43% of the candies are yellow. It does not mean that
any single bag will fall in that range; my own bag, for example, was only 13% yellow.
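
A confidence interval of this kind can be sketched in Python with the usual normal-approximation formula for a proportion, p-hat plus or minus z times sqrt(p-hat(1 - p-hat)/n). The group's handwritten calculations are not reproduced in this document, so the inputs below are an assumption: the class counts reported earlier (1,238 yellow out of 5,892 candies).

```python
from math import sqrt
from statistics import NormalDist

# Assumed inputs: class counts reported earlier in this report.
yellow_count = 1238
n = 5892
confidence = 0.99

# Sample proportion of yellow candies.
p_hat = yellow_count / n

# Critical value z for a two-sided interval at the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error from the normal approximation to the sampling distribution.
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"99% CI for the proportion of yellow: {lower:.4f} to {upper:.4f}")
```

With these inputs the interval comes out at roughly 0.196 to 0.224, which is close to, but not identical to, the interval quoted above. The small difference presumably comes from the figures or rounding used in the handwritten work.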
Term project part 5: Summary Reflection

The Skittles project that we worked on throughout the semester of statistics taught me how to
conduct a study to form or test a hypothesis about a population,
how to interpret statistics, and most importantly how useful statistics is in everyday life.
This project taught me a real-life application of statistics. In class, while doing
the homework assignments and tests, I was tested on interpreting data that was given to me. With the
Skittles project I actually had to start from scratch: I had to study a population and find and collect the data (it
wasn't given to me in a question) as well as interpret it. As I studied the Skittles population I had
a few questions in mind that I wanted to answer. First, what color is most common? Second,
what color is least common? At first I analyzed just my own bag, and although my calculations were
correct, my sample alone was a poor representation of the population as a whole. As we
progressed through the semester we were able to see the data from everyone in the class. After
comparing results I found that my hypothesis based on my bag alone was a bit different from the class's
data as a whole. We then took our data and applied the objectives as we learned
them through the progressing chapters.
This project helped me understand how useful statistics is in everyday life. If I have questions,
want to influence something, or want to understand the probability of something, I can simply
start by gathering data on the population and then put my data into the equations to help me understand
or control an outcome.