
Julie Prologo

Math 1040
Term Project Skittles data
In this project we compare data collected by the class from 2.17 oz bags
of Skittles. We use several kinds of charts to look at the data, including
Pareto charts, pie charts, histograms, and boxplots. We also look at the
mean, standard deviation, 5-number summary, proportions, and the tables of
collected data.
The total sample size is 2365 candies.
The proportion of red candies is .217.
The proportion of orange is .185.
The proportion of yellow is .188.
The proportion of green is .206.
The proportion of purple is .204.
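The proportions above can be checked from the class color counts given in the class table later in this report. The following is a minimal Python sketch of that arithmetic; the variable names are only illustrative.

```python
# Class color counts, taken from the class table in this report.
class_counts = {
    "red": 513,
    "orange": 438,
    "yellow": 444,
    "green": 487,
    "purple": 483,
}

total = sum(class_counts.values())  # total sample size

# Proportion of each color = count / total, rounded to 3 decimal places.
class_proportions = {
    color: round(count / total, 3) for color, count in class_counts.items()
}

print(f"total candies: {total}")
for color, proportion in class_proportions.items():
    print(f"{color}: {proportion:.3f}")
```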
Pareto chart for Skittles data
[Chart: bars ordered red, green, purple, yellow, orange; count axis from 400 to 520]

Pie chart for Skittles data
[Chart: slices for red, orange, yellow, green, purple]

My observation of this data is that the colors aren't as equally
proportioned as I thought they would be. Red is the most common color,
followed by green, purple, yellow, and last orange.
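The ordering described above is the same ordering a Pareto chart uses: categories sorted from the largest count to the smallest. As a rough sketch (not the tool used to draw the charts in this project), the ranking can be reproduced in Python from the class counts, with a running cumulative percentage added for illustration.

```python
# Class color counts, taken from the class table in this report.
class_counts = {"red": 513, "orange": 438, "yellow": 444, "green": 487, "purple": 483}

total = sum(class_counts.values())

# A Pareto chart lists categories from the largest count to the smallest.
pareto_order = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
for color, count in pareto_order:
    cumulative += count
    # Print the color, its count, and the cumulative share of all candies so far.
    print(f"{color:<8}{count:>5}{cumulative / total:>9.1%}")
```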
My table looks like this:

Red    Orange    Yellow    Green    Purple
9      10        17        13       14

The total number of candies in my bag is 63.


The proportions are as follows:
Red .143, orange .159, yellow .270, green .206, purple .222
The proportions for this single bag of candies are noticeably different from
those of the class as a whole.
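To see where the single bag differs most from the class, the two sets of proportions can be put side by side. This is only a descriptive sketch in Python using the counts from my bag and the class proportions reported earlier; it is not a formal test of whether the differences are statistically significant.

```python
# Counts from my single bag and the class proportions reported in this project.
bag_counts = {"red": 9, "orange": 10, "yellow": 17, "green": 13, "purple": 14}
class_props = {"red": 0.217, "orange": 0.185, "yellow": 0.188, "green": 0.206, "purple": 0.204}

bag_total = sum(bag_counts.values())

differences = {}
for color, count in bag_counts.items():
    bag_prop = count / bag_total
    # Positive difference: the color is more common in my bag than in the class.
    differences[color] = bag_prop - class_props[color]
    print(f"{color:<8} bag {bag_prop:.3f}  class {class_props[color]:.3f}  diff {differences[color]:+.3f}")

# Color whose bag proportion is farthest from the class proportion.
largest_gap = max(differences, key=lambda color: abs(differences[color]))
print(f"largest gap: {largest_gap}")
```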
The table for the whole class looks like this:

Red    Orange    Yellow    Green    Purple
513    438       444       487      483

The mean number of candies is 59.1 candies per bag.


The standard deviation is 3.61 candies.
The 5-number summary (minimum, Q1, median, Q3, maximum) is (49, 57, 60, 62, 67).
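The 5-number summary is what a boxplot is drawn from. The sketch below derives the range and interquartile range from it, and applies the common 1.5 x IQR rule for flagging possible outliers. That rule is an assumption added here for illustration; the project itself does not say whether outliers were checked.

```python
# 5-number summary for candies per bag, as reported in this project.
minimum, q1, median, q3, maximum = 49, 57, 60, 62, 67

data_range = maximum - minimum  # spread of the whole data set
iqr = q3 - q1                   # spread of the middle half (the box in a boxplot)

# ASSUMPTION: the conventional 1.5 x IQR fences for possible outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

min_below_fence = minimum < lower_fence
max_above_fence = maximum > upper_fence

print(f"range = {data_range}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"minimum below lower fence: {min_below_fence}")
print(f"maximum above upper fence: {max_above_fence}")
```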

Observation: The distribution shown in each graph is not a normal
distribution, and the graphs are not what I expected to see. The proportions
and number of candies are also very different from my bag of candies. It was
interesting to see the variation in each entry.

Reflection: The difference between categorical and quantitative data is that
categorical data consists of names or labels that can't be added up, or of
numbers that don't mean anything as quantities, such as eye color or Social
Security numbers. Quantitative data consists of numbers that represent
counts or measurements. The Pareto and pie charts are what we used for the
categorical data, and the histogram and boxplot are what we used for the
quantitative data. For histograms and boxplots you use measurements and
counts to chart the data; for pie charts and Pareto charts you chart the
colors by how many of each are present. You can't measure the colors. The
calculations used for categorical data are the proportions and the tables
showing counts and proportions. The calculations used for quantitative data
are the mean, standard deviation, and 5-number summary, because they
describe something that is measurable: numbers that you can count.
Confidence Interval Estimates: A confidence interval estimate is a range of
values used to estimate a true population parameter.
-Construct a 99% confidence interval estimate for the true proportion of
yellow candies. Using the calculator function 1-PropZInt, the interval is
(0.167, 0.208).
-Construct a 95% confidence interval estimate for the true mean number of
candies per bag. Using the calculator function TInterval, the interval is
(57.945, 60.255).
-Construct a 98% confidence interval estimate for the standard deviation of
the number of candies per bag. Using the chi-square formula for a confidence
interval for the standard deviation, we found the interval to be (2.8, 4.8).
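The three intervals above can be reproduced by hand. The following Python sketch uses the standard textbook formulas and rests on several assumptions that are not stated in the project: the number of bags is taken to be 40 (inferred from 2365 candies divided by a mean of 59.1 per bag), the t critical value 2.023 is a table value for 39 degrees of freedom, and the chi-square critical values 63.691 and 22.164 are table values from the 40 degrees of freedom row, which is the nearest row in many printed tables. Software that uses exactly 39 degrees of freedom would give a slightly different interval for the standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Normal-approximation (z) interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_interval(mean, sd, n, t_crit):
    """t interval for a population mean, given a table critical value."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


def sd_interval(sd, n, chi2_right, chi2_left):
    """Chi-square interval for a population standard deviation."""
    df = n - 1
    return sqrt(df * sd**2 / chi2_right), sqrt(df * sd**2 / chi2_left)


# ASSUMPTION: 40 bags, inferred from 2365 / 59.1; not stated in the project.
N_BAGS = 40

yellow_ci = proportion_interval(444, 2365, 0.99)
# ASSUMPTION: t critical value from a table (df = 39, 95% confidence).
mean_ci = mean_interval(59.1, 3.61, N_BAGS, t_crit=2.023)
# ASSUMPTION: chi-square table values from the df = 40 row (98% confidence).
sd_ci = sd_interval(3.61, N_BAGS, chi2_right=63.691, chi2_left=22.164)

print(f"99% CI, proportion yellow: ({yellow_ci[0]:.3f}, {yellow_ci[1]:.3f})")
print(f"95% CI, mean per bag:      ({mean_ci[0]:.3f}, {mean_ci[1]:.3f})")
print(f"98% CI, std. deviation:    ({sd_ci[0]:.1f}, {sd_ci[1]:.1f})")
```

The critical values are passed in as arguments rather than computed because the Python standard library has no t or chi-square distribution; a statistics package would normally supply them.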
Hypothesis Tests: A hypothesis is a claim or statement about a property of a
population. A hypothesis test (or test of significance) is a procedure for
testing a claim about a property of a population.
-Use a 0.05 significance level to test the claim that 20% of all Skittles
candies are red.
Claim = Null hypothesis: p = 0.20
Alternative hypothesis: p ≠ 0.20
Test statistic: z = 2.0563
P-value: p = 0.0398
Since the P-value is less than the 0.05 significance level, we reject the
null hypothesis.
In other words, there is sufficient evidence to warrant rejection of the
claim that 20% of all Skittles candies are red.
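The test statistic and P-value for the red-candy claim can be reproduced with the usual one-proportion z test. This is a minimal Python sketch of that calculation using the class counts (513 red out of 2365); the function name is illustrative, and the two-sided P-value reflects the two-tailed test of p = 0.20.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(successes=513, n=2365, p0=0.20)
reject_null = p_value < 0.05  # compare with the 0.05 significance level

print(f"z = {z:.4f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```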
-Use a 0.01 significance level to test the claim that the mean number of
candies in a bag is greater than 55.
Null hypothesis: mean = 55
Claim = Alternative hypothesis: mean > 55
Test statistic: t = 7.183
P-value: p ≈ 0 (less than 0.0001)
Since the P-value is less than the 0.01 significance level, we reject the
null hypothesis.
In other words, we have sufficient evidence to support the claim that the
mean number of candies per bag is greater than 55.
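The t statistic for the mean can be checked from the summary statistics alone. The sketch below assumes 40 bags (inferred from 2365 candies and a mean of 59.1 per bag; the project does not state the number of bags) and compares the statistic with 2.426, a t-table critical value for 39 degrees of freedom and a right-tail area of 0.01. It uses the critical-value method because the Python standard library has no t distribution for computing the P-value directly.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, sd, n):
    """t statistic for H0: mean = mu0, from summary statistics."""
    return (sample_mean - mu0) / (sd / sqrt(n))


# ASSUMPTION: 40 bags, inferred from 2365 / 59.1; not stated in the project.
N_BAGS = 40
# ASSUMPTION: right-tailed critical value from a t table (df = 39, alpha = 0.01).
T_CRITICAL = 2.426

t = one_sample_t_statistic(sample_mean=59.1, mu0=55, sd=3.61, n=N_BAGS)
reject_null = t > T_CRITICAL  # right-tailed test: reject when t exceeds the critical value

print(f"t = {t:.3f}, critical value = {T_CRITICAL}, reject H0: {reject_null}")
```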
Reflective Writing and e-portfolio:
As a result of this project I have learned many things that pertain to
statistics in the real world. I have learned everything from probabilities
and how to choose a sample for a survey, to how to read and create graphs
and tables, to confidence intervals and hypothesis testing. These concepts
have many real-world applications. For example, in my job as a nurse we
occasionally participate in studies that certain groups are doing, and it
has been fun learning how this data is interpreted. A while back we took
part in a study on whether giving a certain drug reduced the number of
post-op complications involving an ileus. Patients had to meet certain
criteria before they could be part of this study. It would be interesting to
see the data that the researchers came up with and the information they
learned.
Statistics are used every day in nursing. They are used to look at data on
how to improve patient care, whether a certain protocol or way of doing
something is effective, and how to decide on the treatment of patients.
Statistics are a way of looking at numbers or data rather than relying on
emotional information. Instead of just saying that nurses are overworked,
you can look at numbers that show what is effective. There are many more
reasons to use statistics in nursing. In other professions there are also
many applications for the things we have learned in this class.
These skills will definitely help me in the rest of my education. This
summer I will start the RN-BSN program to finish my degree, and I can see
how this will help me with the rest of the classes I will be taking.