
Sam Hogan

Heidi Kenney
Sterling Parker
Jessica Romney
Dakotah Phillips

Math 1040 Term Project

1. Question: What is the correlation between the neck circumference (in inches) and the weight (in
pounds) of black bears in the Pocono Mountains, Pennsylvania, in 1985?
2. Hypothesis: The more a bear weighs, the larger its neck circumference will be.
3. Source: We used a data set found online in which researchers measured 54 bears from a
population of ~6,000 bears. Source of the data: biologist Gary Alt and Minitab, Inc., as described in
People Magazine (Toby Kahn, 1985).
4. Data: The collected data are listed below, followed by 1-variable summaries for each set. Three
data values were removed because they were outliers. These values are noted but crossed off and are
not included in any of our data analyses or charts.
Weight (lb) Neck (in)
26 12
29 10
34 13
40 13
46 13.5
48 13
60 15.5
62 15
64 15
65 16
76 15
79 16.5
80 16
86 17
90 17
94 17
105 17.5
114 17
116 17.5
120 19
125 19
132 20
140 10.5
140 18
144 18
148 19
150 20
150 21
154 22
166 22
166 21.5
180 21.5
182 21
202 24
202 21.5
204 20
204 24
212 23
220 21
220 24
236 23
262 26.5
270 27
316 26
332 29
344 28
348 31.5
356 28
360 27
365 28
416 31
436 30
446 28
514 30.5

1-variable summary for X dataset
(Weight in Pounds)
Mean: 182.889
Standard Deviation: 121.801
Min: 26 Max: 514
Outliers: Any value >426.493 or <-60.715 (0)
There are 3 outliers in this group, which we removed.
1-variable summary for Y dataset
(Neck Circumference in Inches)
Mean: 20.556
Standard Deviation: 5.641
Min: 10 Max: 31.5
Outliers: Any Value >31.838 or <9.274
There are 0 outliers in this group, so none were removed.
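The cutoffs listed in the summaries are consistent with a "mean plus or minus 2 standard deviations" rule. The report does not state the rule, so the multiplier `k = 2` below is an assumption inferred from the numbers, and the function names are only illustrative. A minimal sketch:

```python
# Summary statistics as reported in the 1-variable summaries above.
WEIGHT_MEAN, WEIGHT_SD = 182.889, 121.801
NECK_MEAN, NECK_SD = 20.556, 5.641


def outlier_cutoffs(mean, sd, k=2.0):
    """Return (low, high) cutoffs; k = 2 is an assumed multiplier."""
    return mean - k * sd, mean + k * sd


def flag_outliers(values, mean, sd, k=2.0):
    """Return the values that fall outside the cutoffs."""
    low, high = outlier_cutoffs(mean, sd, k)
    return [v for v in values if v < low or v > high]


# The four heaviest bears in the data table.
heaviest = [416, 436, 446, 514]

print(outlier_cutoffs(WEIGHT_MEAN, WEIGHT_SD))
print(flag_outliers(heaviest, WEIGHT_MEAN, WEIGHT_SD))
print(outlier_cutoffs(NECK_MEAN, NECK_SD))
```

Because the inputs are the rounded summary values, the weight cutoffs agree with the listed ones only to about two decimal places.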
5. Correlation:

y = 0.049x + 11.885
R = 0.9428
Figure: Bear Weight vs. Neck Circumference (x-axis: Weight (in pounds); y-axis: Neck Size (in inches))
Graph of correlation between weight and neck size
Table A-6, which lists critical values of the Pearson correlation coefficient r, states that for 50
data points the r value must be greater than 0.279. Since our calculated r (0.9428) is greater than the
table value for 50 data points, the data show a linear correlation, and we can proceed with the data
analysis. Table A-6 does not give a critical value for 51 data points, but because the critical values
in Table A-6 decrease as the number of data points increases, our r value is still sufficient.
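The missing table entry for 51 data points can be approximated from the standard relationship between the critical r and the critical t value, r = t / sqrt(t^2 + n - 2). The sketch below assumes a two-tailed test at the 0.05 significance level (the level at which the table value for 50 points is 0.279), and the critical t values are rounded figures from a standard t table, not values given in the report.

```python
import math


def critical_r(t_crit, n):
    """Critical Pearson r for n data points, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Assumed two-tailed critical t values at alpha = 0.05 (rounded, standard t table).
T_CRIT = {50: 2.011, 51: 2.010}  # df = 48 and df = 49

r_observed = 0.9428  # value reported with the scatter plot

for n, t in T_CRIT.items():
    r_crit = critical_r(t, n)
    print(f"n = {n}: critical r = {r_crit:.3f}, observed r exceeds it: {r_observed > r_crit}")
```

The value for 51 points comes out slightly below the value for 50 points, in line with the reasoning above.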
The best-fit line for our data (of the form y = mx + b) is Neck Circumference = 0.049 x (weight in
pounds) + 11.885 inches. Here 0.049 is the slope of the line, in inches per pound, and 11.885 is the
y-intercept: the neck circumference, in inches, that the line gives at a weight of 0 pounds.
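For readers who want to see how a slope, intercept, and r of this kind are obtained, here is a minimal least-squares sketch. It is not the tool used for this project; the function name `fit_line` is illustrative, and the sample call uses only the first five rows of the data table, so its output will differ from the line reported above.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# First five rows of the data table, for illustration only.
weights = [26, 29, 34, 40, 46]
necks = [12, 10, 13, 13, 13.5]

slope, intercept, r = fit_line(weights, necks)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

Passing all retained weight and neck values to `fit_line` would give the fit for the full data set.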
6. Residual data:
Residual Data Values (Being Observed Y Values Minus Predicted Y Values)
-1.762
-3.891
-1.106
-1.364
-1.122
-1.708
0.276
-0.31
-0.396
0.561
-0.912
0.459
-0.084
0.658
0.486
0.314
0.341
-0.546
-0.132
1.196
0.981
1.68
-8.164
-0.664
-0.836
-0.008
0.906
1.906
2.734
2.218
1.718
1.116
0.53
2.67
0.17
-1.416
2.584
1.24
-1.104
1.896
0.208
2.59
2.746
-0.232
2.08
0.564
3.892
0.048
-1.124
-0.339
0.468


Figure: Residual Plot (residuals plotted against x)
Graph of relative differences between predicted and observed values
There does not appear to be any significant pattern in the plot of residuals versus x. There is a
slight fanning of the data, but not enough to be considered significant. Slight fanning is expected
because as the bears get larger, their sizes become less consistent.
7. Criteria:
The 3 criteria for a linear correlation are 1) the calculated r must be greater than the critical r
given in Table A-6, 2) the residual plot should not show any pattern, and 3) the residual plot should
not become thicker or thinner when viewed from left to right. Since all 3 criteria were met in our
analyses, we can conclude that there is a linear correlation between the weights of the bears and their
neck circumferences. Therefore, our hypothesis was correct!
8. Predictions:
We can plug some values into our best-fit line equation to predict what a bear's neck
circumference might be given its weight. Each prediction is more of an estimate or an average. Since we
don't have a perfect correlation, there is no way to tell what the neck circumference would really be
without measurements. Below are some weights (in pounds) we decided to plug into our equation.
y = 0.049(50) + 11.885 = 14.335 in
y = 0.049(100) + 11.885 = 16.785 in
y = 0.049(150) + 11.885 = 19.235 in
y = 0.049(200) + 11.885 = 21.685 in
y = 0.049(250) + 11.885 = 24.135 in
y = 0.049(300) + 11.885 = 26.585 in
y = 0.049(350) + 11.885 = 29.035 in
y = 0.049(400) + 11.885 = 31.485 in
y = 0.049(450) + 11.885 = 33.935 in
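The same predictions can be reproduced with a few lines of code. This is a small sketch using the rounded slope and intercept from the best-fit line; the function name `predict_neck` is illustrative.

```python
# Rounded coefficients of the best-fit line reported above.
SLOPE = 0.049       # inches of neck circumference per pound
INTERCEPT = 11.885  # inches


def predict_neck(weight_lb):
    """Predicted neck circumference (inches) for a weight in pounds."""
    return SLOPE * weight_lb + INTERCEPT


# Same weights as in the list above: 50 lb to 450 lb in steps of 50.
for weight in range(50, 451, 50):
    print(f"{weight} lb -> {predict_neck(weight):.3f} in")
```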
We can't make reliable predictions for lower-weight bears because of the way the data are skewed.
The data have some noticeable outliers at the lower weights, which makes it difficult to predict a neck
circumference accurately. The same holds true at the higher end of the weight scale, where the data
start to fan out. Near the center of the data, the points adhere most closely to the prediction line, so
predictions there would be most accurate. We also believe the skew is due to the majority of the sample
being in the lower half of the weight scale, which seems to indicate that there are more small bears
than large bears in this population. It is also not reasonable to think that we can predict outliers or
bears that are much heavier than those covered by our line. Bears can only be so big; nature puts a
limit on size.
9. Afterthoughts
After deciding on our hypothesis, we quickly began to arrange the data and put it into graphs. It
became readily apparent that our hypothesis was accurate and would be supported by the data. From
there, all that was left was to analyze it. In the process of presenting the data we did make a small
error in creating the residual plot: we did not initially graph the residuals against x, and instead
plotted only the residual y-values with no x-values at all. We realized our mistake after examining the
instructions more closely and were able to fix the graph.
We also made a small error in neglecting to exclude outliers from our data analysis. This was an
easy error to fix because our charts were linked to our data tables. All we had to do was move the
outlying data away from the rest of the data, and the charts adjusted themselves. We found these errors
easy to overlook without very careful examination, but with help from an outside perspective we were
able to catch and resolve them in a timely manner.
We also discussed how many data points would be acceptable, and we decided that 54 bears
would be a reasonable representation of the population in this area, even though that is only roughly 1
percent of the population. We believe that the amount of data we have is adequate support for our
hypothesis to be considered correct.
