LGD Term Project Final

Term Project
Lorraine G. deDanann
Statistics 1040 Term Project

Introduction
Our term project this semester was the Skittles project, which involved practice with several
sections in our statistics course. The project started with a collection of data from our class. Each
student picked up a small 2.17 ounce bag of Skittles, and we counted the Skittles in each bag,
including the total number of Skittles and the number of each color, and submitted this data to a
group spreadsheet. The 49 bags of Skittles provided the sample for our project. We used this data
to calculate proportions; build charts; calculate the mean, standard deviation, and five-number
summary; look for outliers and distributions with histograms and boxplots; and calculate
confidence interval estimates, including confidence intervals for the population proportion,
mean, and standard deviation. This project has been a great way to apply our learning to a
real-life situation, and to gain a better understanding of statistics and how they can show us
information about the world around us.
I - Data Collection
Our class collected the data for 49 bags of skittles, counting the number of candies per bag, and
the number of each candy color in each bag. The data was collected in a spreadsheet.
II - Organizing and Displaying Categorical Data: Colors

Determine the proportion of each color within the overall sample gathered by the class. Guess!
What do you expect the proportions to be? Why?
We expect the proportions of the five colors within a 2.17 ounce Skittles package to be close to a
relative frequency of .2 each. We think that our individual bags might not be close to that, but we
expect that the larger the sample size is the closer those proportions would be to .2 for each of
the five colors. So once we add everyone's bags together, the colors will be close to even.
Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and Purple
candies in the class data set. Note that the sample size is the total number of candies collected by
the class.
                Red     Orange  Yellow  Green   Purple  Total
Class Counts    617     627     566     567     620     2997
Proportion      .206    .209    .189    .189    .207
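The proportions in the table can be checked with a short calculation. The Python sketch below is only an illustration (the project itself used StatCrunch); the name `class_counts` is our own choice, and the counts are the class totals reported above.

```python
# Class totals for each color, copied from the table above.
class_counts = {"Red": 617, "Orange": 627, "Yellow": 566, "Green": 567, "Purple": 620}

# The sample size is the total number of candies, not the number of bags.
total = sum(class_counts.values())

# Proportion of each color, rounded to three decimals as in the table.
proportions = {color: round(count / total, 3) for color, count in class_counts.items()}

print(total)
for color, p in proportions.items():
    print(f"{color}: {p:.3f}")
```

Dividing by the total number of candies rather than the number of bags matches the note in the prompt that the sample size is the total number of candies collected by the class.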
In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each
color in our class data set. Submit copies of your graphs in this report.
Does the class data represent a random sample? What would the population be? Collaborate to
discuss sampling and our data in a paragraph or two. Be careful to review the definition of a
random sample and be precise.
The class data does not represent a random sample, as the sampling methods were not
random. From dictionary.com, random sampling is a method of selecting a sample from a
statistical population in such a way that every possible sample that could be selected has a
predetermined probability of being selected. For our skittles sample to be random, each skittles
bag in the world would have had an equal chance of being selected. Our sample was a
convenience sample, as the bags of skittles were purchased at various locations that were
convenient to the students. The population for this study is all 2.17 ounce bags of Skittles,
and the 49 bags counted by our class are the sample drawn from it. While the majority of our
class members reside in Utah, many live outside of the state and elsewhere in the world. This
means that the Skittles were purchased in a wide variety of locations, but the bags were still
chosen for convenience rather than by a random process.
Create a table that displays the counts by color and total from your own bag of candies together
with the counts by color and total for the entire class sample.
                Red     Orange  Yellow  Green   Purple  Total
My Bag          16      15      8       10      11      60
My Bag %        26.7    25      13.3    16.7    18.3
Class Counts    617     627     566     567     620     2997
Class %         20.6    20.9    18.9    18.9    20.7
Write a paragraph discussing your observations of this data. Respond to the following prompts:
i. Do the graphs reflect what you expected to see? Are there any surprises?
ii. Are there any observations that appear to be outliers? If so, what impact might they have
on graphics and summary statistics?
iii. Does the distribution of colors in the total class data match the distribution of your own
data from your single bag of candies or are they different?
I expected there to be an equal (or close to equal) proportion of the colors, especially in the
larger sample from the entire class. It is not surprising to see an unequal number of skittles in the
individual bags, as the factory is unlikely to portion an equal amount of each color without a
specific design in the machinery to do so. But as it is most likely that the factory produces an
equal number of each skittles color, the proportion of the colors will be about equal in a large
enough sample size. Our classroom had a sample size of 49 bags, which is not quite a large
enough number to see the proportions equal, but large enough to see the larger trends in the data.
One of the bags in the student count had 110 candies, which is about 83% larger than the
average of all of the other bags of Skittles from the class. The average number of Skittles for the
other 48 bags is 60.15 per bag. I suspect that the bag with 110 Skittles was
not the 2.17 ounce bag that was used by the rest of the class. However, I doubt that this larger
bag had a strong effect on the overall proportions. Calculating the proportions of
the colors without this outlier gives very similar values (Red 20.7%, Orange 20.7%,
Yellow 19%, Green 18.9%, and Purple 20.6%). My bag had a different distribution than the total
class, with more red (27%) and orange (25%), with 17% green, 18% purple, and only 13%
yellow. This is not wholly surprising, as the factory process is unlikely to have a method of
equally distributing the colors in each bag.
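As a quick check on the comparison between the large bag and the rest of the class, the figures can be recomputed from the class totals. This is a minimal sketch; the variable names are our own, and the inputs (2997 candies, 49 bags, one bag of 110) come from the report.

```python
total_candies = 2997   # all candies counted by the class
num_bags = 49          # bags in the class sample
outlier_bag = 110      # the unusually large bag

# Mean of the other 48 bags, with the large bag left out.
mean_without = (total_candies - outlier_bag) / (num_bags - 1)

# How much larger the big bag is than that mean, as a percentage.
percent_larger = (outlier_bag - mean_without) / mean_without * 100

print(f"{mean_without:.2f}")
print(f"{percent_larger:.0f}%")
```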
III - Organizing and Displaying Quantitative Data: the Number of Candies per Bag
In this section, we learned to apply our understanding to create data sets, histograms, and
box plots. This direct application was most helpful to understand how these are useful.
Using the total number of candies in each bag in our class sample, compute the following
measures for the variable Total candies in each bag:
i. mean number of candies per bag: 61.2
ii. standard deviation of the number of candies per bag: 8.9
iii. 5-number summary for the number of candies per bag: 52, 58, 60, 62, 110
Create a frequency histogram for the variable Total candies in each bag.
Create a box plot for the variable Total candies in each bag.
Write a paragraph discussing your findings about the variable Total candies in each bag.
Address the following in your writing:
i. What is the shape of the distribution of this variable?
ii. Do the graphs reflect what you expected to see or are there some surprises?
iii. Does the overall data collected by the whole class agree with your own single bag
data? Include the number of candies from your own bag and the total number of bags in
the class sample in your discussion.
The shape is skewed right. The graphs reflect what I expected to see. The one outlier (the bag
with 110 Skittles) shows up as the one blip on the far right side of the graphs. Most of the data are
between Q1 (58) and Q3 (62); 70% of the data falls between 58 and 62.9. My bag had 60
Skittles, which fits right within the Q1-Q3 range that is reflected in the class data. The class had a
median of 60 Skittles per bag, and a mean of 61.2 Skittles per bag, with a total number of 49 bags
of Skittles in the sample.
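One way to confirm that the 110-candy bag counts as an outlier is the common 1.5 × IQR rule. The report does not say which rule was applied, so the rule itself is an assumption here; the inputs are the five-number summary reported above.

```python
# Five-number summary reported for the number of candies per bag.
minimum, q1, median, q3, maximum = 52, 58, 60, 62, 110

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr  # values above this are flagged as outliers

print(iqr, lower_fence, upper_fence)
print("maximum is an outlier:", maximum > upper_fence)
print("minimum is an outlier:", minimum < lower_fence)
```

Under this rule the maximum lies far above the upper fence, while the minimum sits exactly on the lower fence and is not flagged.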
In a half page, explain the difference between categorical and quantitative data. Also address the
following in your writing:
i. What types of graphs make sense and what types of graphs do not make sense for
categorical data? For quantitative data? Explain why.
ii. What types of calculations make sense and what types of calculations do not make sense
for categorical data? For quantitative data? Explain why.
For quantitative data, the mean and the standard deviation should be used, along with
measures such as the median and quartiles. For categorical data, counts and proportions should
be used, because category labels such as colors cannot be averaged or ordered into quartiles.
Bar graphs are the best type of graph to use with categorical data,
because this graph can easily show categories of data, such as the different colors of Skittles in a
sample. A pie chart is also useful to show this data, as the pie is divided into different
categories, or slices of pie. Both bar graphs and pie charts allow for quick understanding of
categorical, or qualitative, data.
Quantitative data is best shown in a chart that will clearly show the numbered data.
Histograms, when properly constructed, can easily show whether the data are bell-shaped,
left-skewed, or right-skewed, and whether any outliers exist. Histograms with a bell-shaped curve
can also be used to make broad generalizations about the data, using the (statistical model).
Boxplots allow a quick absorption of quantitative data, as they show the IQR, or the Interquartile
Range, which shows the middle 50% of the data. A boxplot also shows the range of the full data
set, as well as any outliers.
IV - Confidence Interval Estimates

In this section, we had the opportunity to apply our fresh knowledge about confidence
interval estimates. I found this section to be most helpful, as it gave us a chance to work
out the equations on real data, to better understand what we were learning to calculate.
The equations seem complicated, but the last paragraph in this section explains what they
mean, and how this data can be used in the real world.
Construct a 99% confidence interval estimate for the population proportion of yellow candies.
Show your work, including the computations for the margin of error and the critical value.
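Sample Proportion: $\hat{p} = \frac{566}{2997} = .189$

Critical Value: 99% confidence gives $\alpha = 0.01$, so $\frac{\alpha}{2} = 0.005$ and $z_{\alpha/2} = 2.575$

Margin of Error: $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.575\sqrt{\frac{.189\,(1 - .189)}{2997}} = .0184$

Lower Limit: $\hat{p} - E = .189 - .0184 \approx .1704$

Upper Limit: $\hat{p} + E = .189 + .0184 \approx .2073$

(The limits are computed from the unrounded values of $\hat{p}$ and $E$.)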
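The proportion interval can be reproduced with a few lines of Python. This is a sketch using the values from the report, including the critical value 2.575; the variable names are our own.

```python
import math

yellow = 566   # yellow candies in the class sample
n = 2997       # total candies in the class sample
z = 2.575      # critical value for 99% confidence, taken from the report

p_hat = yellow / n                               # sample proportion
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
lower, upper = p_hat - margin, p_hat + margin

print(f"{p_hat:.3f} {margin:.4f} {lower:.4f} {upper:.4f}")
```

Carrying the unrounded sample proportion through the calculation is what produces limits of .1704 and .2073 rather than the values obtained by subtracting and adding the rounded figures.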
Construct a 95% confidence interval estimate for the population mean number of candies per
bag. Show your work, including the computations for the margin of error and the critical value.
Mean number of candies per bag: 61.2
Standard deviation of the number of candies per bag: 8.868
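$(1 - \alpha) \cdot 100\% = 95\%$, that is, $(1 - .05) \cdot 100\% = 95\%$, so $\alpha = .05$
Critical Value: 95% confidence gives $\alpha = 0.05$, so $\frac{\alpha}{2} = 0.025$ and $t_{\alpha/2} = 2.000$
Degrees of freedom: $n - 1 = 49 - 1 = 48$
Margin of error: $E = t_{\alpha/2} \cdot \frac{8.868}{\sqrt{48}} = 2.00 \cdot \frac{8.868}{\sqrt{48}} = 2.56$
Lower bound: $\bar{x} - E = 61.2 - 2.00 \cdot \frac{8.868}{\sqrt{48}} = 58.64$
Upper bound: $\bar{x} + E = 61.2 + 2.00 \cdot \frac{8.868}{\sqrt{48}} = 63.76$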
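The arithmetic for the mean interval can be reproduced as follows. This sketch uses the values from the report, including the critical value 2.000 and the divisor $\sqrt{48}$ as they appear in the work above. For comparison only, it also evaluates the more common textbook form $s/\sqrt{n}$ with $n = 49$, which gives a slightly smaller margin; that comparison is our addition and not part of the original work.

```python
import math

mean = 61.2   # sample mean number of candies per bag
s = 8.868     # sample standard deviation
n = 49        # number of bags
t = 2.000     # critical value used in the report

# Margin of error as computed in the report (divisor sqrt(48)).
margin_report = t * s / math.sqrt(n - 1)
print(f"{margin_report:.2f} {mean - margin_report:.2f} {mean + margin_report:.2f}")

# Common textbook form for comparison (divisor sqrt(n)); an added assumption.
margin_textbook = t * s / math.sqrt(n)
print(f"{margin_textbook:.2f} {mean - margin_textbook:.2f} {mean + margin_textbook:.2f}")
```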
Construct a 98% confidence interval estimate for the population standard deviation of the
number of candies per bag. Show your work, including the computations and the critical values.
sample size = 49
degrees of freedom: 48
standard deviation of the number of candies per bag: 8.868
$s^2 = 78.639$
$\chi^2_L = 29.707$
$\chi^2_R = 76.154$
Confidence interval estimate for $\sigma$:
$\sqrt{\frac{48 \cdot 78.639}{76.154}} < \sigma < \sqrt{\frac{48 \cdot 78.639}{29.707}}$
$= 7.04 < \sigma < 11.27$
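The interval for the standard deviation can be checked the same way. This sketch takes the sample variance and both chi-square critical values directly from the report rather than looking them up, so it verifies only the arithmetic; the variable names are our own.

```python
import math

df = 48               # degrees of freedom, n - 1
s_squared = 78.639    # sample variance as reported
chi2_left = 29.707    # left critical value from the report
chi2_right = 76.154   # right critical value from the report

# The larger critical value gives the lower limit, and vice versa.
lower = math.sqrt(df * s_squared / chi2_right)
upper = math.sqrt(df * s_squared / chi2_left)

print(f"{lower:.2f} < sigma < {upper:.2f}")
```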
Discuss and interpret (with complete sentences) the results of each of your three interval
estimates.
We are 99% confident that the population proportion of yellow candies is between .1704 and .2073.
We are 95% confident that the population mean number of candies per bag is
between 58.64 and 63.76.
We are 98% confident that the standard deviation of the number of candies per bag is
between 7.04 and 11.27.
In a paragraph, explain in general the purpose and meaning of a confidence interval.
A confidence interval is a range of values, computed from a sample statistic, that is likely to
contain the population parameter; it indicates how well the sample statistic estimates the
population value. Statisticians use confidence intervals to describe the degree of uncertainty
associated with a sample statistic. If a random sample is drawn many times, the results would vary
between samples. A confidence interval defines the range of values that are estimated to contain
the population parameter x% of the time. Confidence intervals are built from a point estimate (the
most likely value) and a margin of error around that point estimate. The margin of error is added
to and subtracted from the point estimate, and this gives the range of values, or amount of
uncertainty. This range, or interval, is the estimate that should contain the population parameter
x% of the time. For example, a 95% confidence interval means that if the same sampling method
were used on different samples, we expect about 95% of the resulting interval estimates to contain
the true population parameter. The confidence interval does not necessarily contain the true value
of the parameter. It does not show confidence in the data itself, but rather confidence in the
method: how often the intervals it produces would capture the parameter if the collection of the
sample data were repeated.
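The idea of repeated sampling can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population mean and standard deviation are made-up values, not estimates from the class data, and the critical value 1.96 is the usual normal approximation for 95% confidence.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 60.0, 3.0   # hypothetical population values
N, Z, TRIALS = 49, 1.96, 2000    # sample size, 95% critical value, repetitions

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = statistics.mean(sample)
    margin = Z * statistics.stdev(sample) / N ** 0.5
    # Count the interval as a hit when it contains the true mean.
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f}")
```

The printed fraction is the share of simulated intervals that captured the true mean, which should land near the stated confidence level; any single interval either contains the parameter or does not.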
V - Summary
This project has been a great experience. We started with a single bag of skittles for each
student, and with the data collection from our class, and with knowledge learned from the course,
we were able to extrapolate a lot of meaning for this data. From understanding population
proportions, to understanding the mean and standard deviation, we learned how to quantify
variables in a data set. From the histograms and boxplots, we understood how to look for
outliers, and how they can change data. From the confidence intervals, we learned how our
results might vary if we were to run a similar experiment, to better understand what the data
would show in repeated samples. This project has been a great way to apply our learning to a
real-life situation, and to gain a better understanding of statistics and how they can show us
information about the world around us.
VI - Course Reflection
Working through the term project this semester, I have learned how statistics can color and
affect many parts of our modern lives. Seeing the example of the Skittles project (the variance in
the number of Skittles per bag, the proportion of colors of the Skittles in each bag, and how those
numbers change with more samples) helps to not only understand variance and deviation, but
also understand why finding a perfect bag of Skittles with an equal amount of each color is so
difficult. Each bag had a different proportion of colors, and while the proportion would have
come close to equal with a large enough sample (as demonstrated in the law of large numbers),
our 49 bag data set was not quite equal.
Understanding confidence intervals is also important, as this knowledge puts clearer context
into what academic papers, science magazines, and other sources mean when they give
confidence intervals. In particular, it helps to understand that a confidence interval is not
confidence in the data itself, but rather an understanding of what would be shown by multiple
random samples. I especially appreciated writing out each equation, as knowing how these
numbers are computed helps to give a better understanding of what they mean. I am studying
psychology, and statistics are very important in understanding data from psychology studies, and
the inferences that are made in psychology. I have taken this class in preparation for the more
specific statistics classes offered in the psychology department at the University of Utah.
Additionally, I plan to work in education and research, and will use statistics on a regular basis to
understand and interpret my findings, and to help explain the findings to others. While I have
always enjoyed math, this course, and in particular this term project, has given me a better
appreciation for the equations used in statistics, and the inferences drawn from that data.
