
For this Data Exploration Project I chose to report on the number of babies born each year in the U.S.

These values are birth rates: the number of babies born each year per 1,000 people. In other words, 14.4 means approximately 14.4 babies for every 1,000 people in the population that year. I chose to cover 1970 through 2000, which gives 31 observations represented in the chart and graphs. I obtained my data from the U.S. Census Bureau online (www.census.gov/prod/2011pubs/12statab/vitstat.pdf, pg. 65) because I was interested to see if there were any surprising trends over time. I also found examining the past intriguing because it's a good indicator for predicting what the future population could become.
Data Information
Sample size: 31
5 Number Summary: (14.2, 14.6, 15.6, 15.9, 18.4)
Mean: 15.46
Median: 15.6
Range: 14.2 to 18.4
IQR: Q3 - Q1 = 15.9 - 14.6 = 1.3
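As a quick check on the summary above, here is a minimal Python sketch. It assumes the 31 rates are the ones listed in this report's squared-deviation table, and it assumes the quartiles were found with the "median of each half" rule (the overall median left out of both halves). The function name `five_number_summary` is only illustrative.

```python
from statistics import mean, median

# The 31 birth rates (per 1,000 people), sorted, as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + (n % 2):]  # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


summary = five_number_summary(rates)
iqr = summary[3] - summary[1]

print("Five-number summary:", summary)
print("Mean:", round(mean(rates), 2))
print("IQR:", round(iqr, 2))
```

Other quartile conventions (for example, the defaults in some calculators or software) can give slightly different Q1 and Q3 values for the same data.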
mean = sum of all values / # of observations = 479.2/31 = 15.46

xi      (xi - x̄)^2
14.2    (14.2 - 15.46)^2 = 1.59
14.2    (14.2 - 15.46)^2 = 1.59
14.3    (14.3 - 15.46)^2 = 1.35
14.4    (14.4 - 15.46)^2 = 1.12
14.4    (14.4 - 15.46)^2 = 1.12
14.6    (14.6 - 15.46)^2 = 0.739
14.6    (14.6 - 15.46)^2 = 0.739
14.6    (14.6 - 15.46)^2 = 0.739
14.8    (14.8 - 15.46)^2 = 0.436
14.8    (14.8 - 15.46)^2 = 0.436
15      (15 - 15.46)^2 = 0.212
15      (15 - 15.46)^2 = 0.212
15.1    (15.1 - 15.46)^2 = 0.129
15.4    (15.4 - 15.46)^2 = 0.004
15.6    (15.6 - 15.46)^2 = 0.019
15.6    (15.6 - 15.46)^2 = 0.019
15.6    (15.6 - 15.46)^2 = 0.019
15.6    (15.6 - 15.46)^2 = 0.019
15.6    (15.6 - 15.46)^2 = 0.019
15.7    (15.7 - 15.46)^2 = 0.058
15.8    (15.8 - 15.46)^2 = 0.116
15.8    (15.8 - 15.46)^2 = 0.116
15.8    (15.8 - 15.46)^2 = 0.116
15.9    (15.9 - 15.46)^2 = 0.194
15.9    (15.9 - 15.46)^2 = 0.194
16      (16 - 15.46)^2 = 0.292
16.2    (16.2 - 15.46)^2 = 0.548
16.4    (16.4 - 15.46)^2 = 0.884
16.7    (16.7 - 15.46)^2 = 1.54
17.2    (17.2 - 15.46)^2 = 3.03
18.4    (18.4 - 15.46)^2 = 8.64
Total   26.239
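The squared deviations and the statistics built from them can be reproduced with a short Python sketch. This is only an illustration: it uses the unrounded mean, so the total of the squared deviations comes out slightly different from a hand total built from rounded entries, and the variable names are my own.

```python
from math import sqrt

# The 31 birth rates (per 1,000 people), as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]

n = len(rates)
x_bar = sum(rates) / n                            # sample mean
squared_devs = [(x - x_bar) ** 2 for x in rates]  # one (xi - mean)^2 per value
variance = sum(squared_devs) / (n - 1)            # sample variance divides by n - 1
std_dev = sqrt(variance)                          # standard deviation

print("Mean:", round(x_bar, 2))
print("Sum of squared deviations:", round(sum(squared_devs), 3))
print("Variance:", round(variance, 3))
print("Standard deviation:", round(std_dev, 3))
```

Dividing by n - 1 rather than n treats the 31 years as a sample, which matches the (31 - 1) divisor used in this report.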

Variance: 26.239 / (31 - 1) = .875
Take the square root of the variance to get the standard deviation.
Standard Deviation: .935

I found that there was only one outlier, which is 18.4. My work for coming to this conclusion is below:
Below outliers: Q1 - (1.5 × IQR)
BO: 14.6 - (1.5 × 1.3) = 12.65
No outliers: the lowest value is 14.2
Above outliers: Q3 + (1.5 × IQR)
AO: 15.9 + (1.5 × 1.3) = 17.85
One outlier: the highest value is 18.4; the second highest is 17.2
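The 1.5 × IQR outlier check can also be sketched in Python. The quartiles are typed in from this report's five-number summary rather than recomputed, and the names `lower_fence` and `upper_fence` are just labels chosen for this example.

```python
# The 31 birth rates (per 1,000 people), as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]

q1, q3 = 14.6, 15.9  # quartiles taken from the five-number summary
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr  # anything below this is a low outlier
upper_fence = q3 + 1.5 * iqr  # anything above this is a high outlier

outliers = [x for x in rates if x < lower_fence or x > upper_fence]

print("Lower fence:", round(lower_fence, 2))
print("Upper fence:", round(upper_fence, 2))
print("Outliers:", outliers)
```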

Now let's just say that we add 100 to each number in our data:
Sample size: still 31
5 Number Summary: (114.2, 114.6, 115.6, 115.9, 118.4)
Mean: 115.46
Median: 115.6
Range: 114.2 to 118.4
Standard Deviation: .935
Variance: .875
IQR: 1.3
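A small Python sketch can confirm how adding a constant behaves. It is only a check under the assumption that the data are the 31 rates listed in this report; it compares the mean, median, and standard deviation before and after adding 100.

```python
from statistics import mean, median, stdev

# The 31 birth rates (per 1,000 people), as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]

shifted = [x + 100 for x in rates]  # add 100 to every observation

# Measures of center move by 100; measures of spread stay the same.
print("Mean:", round(mean(rates), 2), "->", round(mean(shifted), 2))
print("Median:", median(rates), "->", round(median(shifted), 1))
print("Std dev:", round(stdev(rates), 3), "->", round(stdev(shifted), 3))
```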

[Histogram: Frequencies of Birth Rates in the U.S. (1970-2000); horizontal axis: Birth Rates]

Birth Rates in the U.S. Stem plot (1970-2000)

L114 | 22344
H114 | 66688
L115 | 0014
H115 | 66666788899
L116 | 024
H116 | 7
L117 | 2
H117 |
L118 | 4
H118 |

Key: L118 | 4 = 118.4

[Box plot: Birth Rates in the U.S. (1970-2000)]

The mean and median differ from before in that, like the rest of the data, their values increased by 100. The width of the range (4.2) is exactly the same because there is still the same amount of space between the lowest and highest values. The same goes for the standard deviation, since there is the same average amount of space between the values and the mean.

Now let's say we increase the original data by 50%:

Sample size: still 31
5 Number Summary: (21.3, 21.9, 23.4, 23.85, 27.6)
Mean: 23.19
Median: 23.4
Range: 21.3 to 27.6
Standard Deviation: 1.4
Variance: 1.97
IQR: 1.95

[Histogram: Frequencies of Birth Rates in the U.S. (1970-2000); horizontal axis: Birth Rates]

Birth Rates in the U.S. Stem plot (1970-2000)

L21 | 33
H21 | 566999
L22 | 22
H22 | 557
L23 | 144444
H23 | 677799
L24 | 03
H24 | 6
L25 | 1
H25 | 8
L26 |
H26 |
L27 |
H27 | 6

Key: H23 | 6 = 23.6

[Box plot: Birth Rates in the U.S. (1970-2000)]

The mean, the median, and the endpoints of the range are simply the original values multiplied by 1.5. In addition, increasing the data by 50% also increased the standard deviation (by a factor of 1.5) and the variance (by a factor of 1.5 × 1.5 = 2.25) because now there is more space between the values and the mean.
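The effect of a 50% increase can be checked with a short Python sketch, again assuming the 31 rates listed in this report. It compares each statistic on the scaled data with the original statistic multiplied by the expected factor.

```python
from statistics import mean, median, stdev, variance

# The 31 birth rates (per 1,000 people), as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]

factor = 1.5  # a 50% increase multiplies every value by 1.5
scaled = [x * factor for x in rates]

# Center and standard deviation scale by 1.5; variance scales by 1.5 squared.
print("Mean:", round(mean(scaled), 2), "vs", round(factor * mean(rates), 2))
print("Median:", round(median(scaled), 2), "vs", round(factor * median(rates), 2))
print("Std dev:", round(stdev(scaled), 3), "vs", round(factor * stdev(rates), 3))
print("Variance:", round(variance(scaled), 3), "vs",
      round(factor ** 2 * variance(rates), 3))
```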

Now, let's assume my data follows a normal distribution. I will find out the following from the data:

Percentage that is greater than 5 units above the mean
Seeing as my data has values that are very close to each other, 0% of my data is more than 5 units above the mean. If the cutoff were instead .5 units above the mean, it would be 6/31 or 19.35%.

Percentage that is between 3 units below my mean and 2 units above my mean
Again, my data has a very small range, so only one of my values (18.4) fell outside of the requirement, leaving 30/31 or 96.77%.

The number of units required for the top 10%
The top 10% is the data above the lower 90%, which is approximately the first 28 of the 31 observations. Observation number 29 has a value of 16.7, which is 1.24 units above the mean, so the top 10% is approximately the values that are more than 1 unit above the mean.

In conclusion, looking at the original data I noticed a large peak at H15, the upper half of the interval 15 (15.5 to 15.9), and a very small standard deviation, meaning there hasn't been a huge change in the number of births relative to the population. As a whole, though, the birth rate has decreased slightly over the years. I have found that altering data may or may not affect certain characteristics of the set. When I added 100 to the data set, not much changed beyond the center shifting, but multiplying it by 1.5 changed the data along with the graphs that were constructed. As for the shape of the original graph, I found it to be skewed right in the histogram, stem plot and box plot. The graphs for the increase by 50% were also skewed right, though in a different fashion, and had some bimodal qualities. I find it might be interesting as a follow-up to see if certain large events have affected the birth rates over time, but that will be for another day!
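For the three normal-distribution questions, the sketch below puts two approaches side by side: counting the actual observations, and using a normal model with the sample mean and standard deviation. The normal model is an assumption made for the exercise, not something the data guarantee, and the variable names are illustrative. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist, mean, stdev

# The 31 birth rates (per 1,000 people), as listed in this report.
rates = [14.2, 14.2, 14.3, 14.4, 14.4, 14.6, 14.6, 14.6, 14.8, 14.8, 15.0,
         15.0, 15.1, 15.4, 15.6, 15.6, 15.6, 15.6, 15.6, 15.7, 15.8,
         15.8, 15.8, 15.9, 15.9, 16.0, 16.2, 16.4, 16.7, 17.2, 18.4]

n = len(rates)
m, s = mean(rates), stdev(rates)

# Counting the actual observations.
above_5 = sum(1 for x in rates if x > m + 5)
above_half = sum(1 for x in rates if x > m + 0.5)
between = sum(1 for x in rates if m - 3 < x < m + 2)

# Normal model with the same mean and standard deviation (an assumption).
model = NormalDist(mu=m, sigma=s)
p_above_5 = 1 - model.cdf(m + 5)
p_between = model.cdf(m + 2) - model.cdf(m - 3)
cutoff_top_10 = model.inv_cdf(0.90)  # value with 90% of the model below it

print("More than 5 above the mean:", above_5, "of", n, "| model:", p_above_5)
print("More than 0.5 above the mean:", above_half, "of", n)
print("Between mean - 3 and mean + 2:", between, "of", n,
      "| model:", round(p_between, 4))
print("Model cutoff for the top 10%:", round(cutoff_top_10, 2),
      "which is", round(cutoff_top_10 - m, 2), "units above the mean")
```

The counts describe these 31 years exactly, while the model values describe an idealized bell curve, so the two need not agree, especially since the data are skewed right.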
