
STAT 301 Project #1

Instructions:
Provide complete and clear explanations to all questions using the guidelines from class.
Throughout the project, remember the importance of context!
Graphs must be labeled for clear communication.
To adequately label your graphs and describe the data behavior, you MUST use the information from
Information on DEMOGRAPHIC.MTW. It is not sufficient to use the variable name from the
dataset. (You do not need to follow the data source links in this document; they are provided as an FYI.)
Be especially aware of units: some of the rates are NOT percentages, so you must define and use
accurate descriptions.
Your project must be typed, easy to read and have the answer to each question identified by question
number/letter.
Copy and paste the specified Minitab output, including graphs and number summaries, into the
Word document.
The indicated Minitab output MUST appear with the discussion, not as separate pages.
Be sure to include your name, your instructor's name, and your class day and time on a cover sheet at
the beginning of your project.
Your project must be printed and handed in at the beginning of class on the due date specified on the
class schedule. Projects cannot be submitted via email and are only accepted late in documented
extreme emergencies, after faculty approval.




The data set for this project contains data on various demographic characteristics for each of the individual states in the
US. The data are taken from the Statistical Abstract of the United States for 2007, which gives the latest available data for
each of the variables. (This is why you will see different years for different variables. Luckily, this type of demographic
data does not typically change dramatically within one or two years unless there is some dramatic influence.)
These data can be found in the data set DEMOGRAPHIC.MTW, under the data link at .math.iupui.edu/~dhall/

We will begin to investigate this data set by examining several of the variables. REMEMBER: All graphs and calculations
must be completed in Minitab and the Minitab output cut and pasted into your document WITH any other required
information for each problem. (Do NOT append output at the end!)

1. Start by adding an additional column to your downloaded data set. To do this you will be grouping the census
divisions into their Census Regions shown on the map at this link:
://www.census.gov/prod/2006pubs/07statab/cover2.

In Minitab you will select the tab labeled Data. From the pull-down menu, select Code > Text to Text.

Then, fill in your window as follows: (The C5 will appear in this code window at the completion of this step.)

Click OK. When you look at your data set, you will see that the previously empty C5 now has the new column of
data. You should type Census Region in the top cell to label it. If you want to have this updated data set (with the
new column) to work with in a different session, you must save this data set to a USB drive if working in a lab or to
your desktop if working at home. Otherwise you will need to repeat this step each time. There is no output from
this step to paste into your project.
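
For reference, the recoding in this step groups the nine Census Divisions into the four Census Regions. The sketch below shows that grouping outside Minitab, in Python. The division labels are the standard Census Bureau names and are an assumption here; the text values stored in DEMOGRAPHIC.MTW may be spelled or abbreviated differently, and the function name recode_divisions is hypothetical. The project itself must still be completed in Minitab.

```python
# Standard Census Bureau grouping of divisions into regions.
# ASSUMPTION: these division labels may not match the exact text
# values used in DEMOGRAPHIC.MTW; adjust them to the data set.
DIVISION_TO_REGION = {
    "New England": "Northeast",
    "Middle Atlantic": "Northeast",
    "East North Central": "Midwest",
    "West North Central": "Midwest",
    "South Atlantic": "South",
    "East South Central": "South",
    "West South Central": "South",
    "Mountain": "West",
    "Pacific": "West",
}


def recode_divisions(divisions):
    """Return the Census Region for each Census Division label, in order."""
    return [DIVISION_TO_REGION[d] for d in divisions]


# Illustrative labels only, not rows copied from the data set.
sample = ["East North Central", "Pacific", "South Atlantic"]
print(recode_divisions(sample))
```

A lookup table mirrors what the Minitab coding window does: each original text value is paired with one recoded value, and the result is written to a new column.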

2. Make a histogram of the %noHS data (use cut points and classes 1 percentage point wide, starting with 7 to <8, 8 to <9,
etc.). Use Minitab to find the descriptive statistics (cut and paste this output). Identify the mean and median (values
and labels) of the data in the appropriate locations along the x-axis on the histogram using the annotation tools.
Identify Indiana's value on the histogram as well. Using only the histogram, describe the distribution of this variable.
Explain why describing the center is not very useful for a distribution with its distinctive characteristics.

3. Find the numerical summaries (descriptive statistics) and create a boxplot of the % Below Poverty 05 data. Where
does Indiana fall in this data? (Give its value and its quarter location.)
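
As an illustration of what "quarter location" means, the sketch below uses Python to find which quarter of a data set a given value falls in, based on the three quartiles. The numbers are made up for illustration and are not taken from DEMOGRAPHIC.MTW, and the function name quarter_location is hypothetical. Quartile conventions vary between software packages, so the values reported in your Minitab output take precedence.

```python
import statistics


def quarter_location(values, x):
    """Return which quarter of the data x falls in (1 = lowest, 4 = highest)."""
    # statistics.quantiles with n=4 returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    if x < q1:
        return 1
    if x < median:
        return 2
    if x < q3:
        return 3
    return 4


# Made-up values for illustration only (NOT the project data).
rates = [8.0, 9.5, 10.1, 11.3, 12.0, 12.6, 13.2, 14.8, 16.5, 19.9]
print(quarter_location(rates, 12.6))
```

A value that lands exactly on a quartile is assigned to the higher quarter here; that is a choice made for this sketch, not a rule from class.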

4. The numerical summaries (and box plot) in the previous question fail to show important features of the distribution.
Make a stem plot of the distribution of the % Below Poverty 05 data. Describe the variable distribution from the
stem plot; be sure to specifically note the features that are not evident in the numerical summary output from
descriptive information in the previous problem. How does what you observe support the following statement?
"Remember to always start with a graph of your data; numerical summaries are not a complete
description."

5. Make a dotplot of the InfMortRate 03 data. Using only the plot (no number summaries on this step), describe the
distribution of this data (in context) using the guidelines from class. Are there any states that you consider outliers?
If so, use the plot and the original data set to identify them by name. What was Indiana's infant mortality rate in
2003? (Remember that dot plots will often show spaces when the data set is small, even when the values are fairly
close.)

6. Make another dot plot of the InfMortRate 03 data but this time use the stack groups option. The census region will
be your categorical variable. Compare and contrast the InfMortRate 03 data in each census region. Be specific!

7. Create side-by-side box plots of the InfMortRate 03 data by census region, the % Below Poverty 05 by census region
and the %noHS by census region. (You will have three separate sets of box plots in separate windows, with 4 box
plots per set.) Identify the regions that seem to exhibit corresponding rates on the sets of box plots. What does this
seem to suggest that might warrant further investigation? Are there other variables in the Demographic data set
that you think should also be investigated? Give at least 2 specific variables and explain your choices.
