DATA COLLECTION & REPRESENTATION
(STATISTICS)
TEACHER: JOSE ARTURO GONZALEZ
THINKING PROBLEM
Manuel and Santiago play for the same basketball team. Unfortunately, during practice Manuel
suffered an injury and could only play half the season. The points scored by both boys in each
match were:

Manuel: 17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25
Santiago: 19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13, 20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19

Which player’s performance was better?

Things to think about:


• Would it be fair to simply total the points each player scored for the season?
• How could we display the data in a meaningful way?
• What would be the “best” way to solve the problem?
INTRODUCTION
Statistics deals with the collection, organization, display, analysis and interpretation of data.
Individuals and many groups such as businesses and government agencies collect data. This data is then
transformed into information that will later be used to determine whether changes are needed, or if
changes that have been made were successful.
For example, governments periodically carry out what is known as a census. A census is conducted to
gather data from the country’s entire population. Once this data is transformed into information, it is
used to help make decisions that will affect the country’s future. For instance, the government might
have to consider how much money must be provided for health care in the years ahead because the
number of elderly people has increased.
Thinking exercise:
• What might a government decide needs to be done if the country’s birth rate has increased?
• What might a government decide needs to be done if the country’s birth rate has decreased and
its elderly population is increasing?
Usually the results of collecting and interpreting data are displayed using graphs, tables and
diagrams.
SAMPLES AND POPULATIONS
Important words used in statistics:
Population: The whole group of people or objects from which we are collecting data.

Sample: A representative group chosen from the population to take part in the survey, be measured,
or be tested.
Random Sample: A sample selected so that every person or object has the same chance of being
selected as any other.

Inference: A conclusion we can make based on the information that was collected and interpreted.

Consider the following example: suppose we want to determine how many students at CCB like vanilla
ice cream. What could the population be? How could we select a sample? What might an inference be?
SAMPLES AND POPULATIONS
Special Cases:
When a government carries out a CENSUS, it gathers information from everyone in the
population. This process is very expensive and takes a lot of time.

For this reason, many governments decide to gather the required information from a sample of the
population instead. For any inference to be valid, it is critical that the results be as typical of
the whole population as possible. To ensure this, it is important to select the sample randomly and
to make the sample as large as is practical.
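
The idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population of 400 numbered students and the sample size of 40 are made-up values, not figures from the lesson.

```python
import random

# Hypothetical population: student ID numbers 1..400 (assumed, not from the lesson).
population = list(range(1, 401))


def draw_random_sample(population, size, seed=None):
    """Return `size` distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)  # a fixed seed only makes the example repeatable
    return rng.sample(population, size)


sample = draw_random_sample(population, size=40, seed=1)
print(sorted(sample))
```

Because `sample` draws without replacement, no student can be selected twice.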

Class Discussion (To be answered and submitted using classroom)


1. Discuss why:
a. Apparel manufacturers would like to know the body measurements of people in different age groups.
b. CCB’s restaurant would be interested in the types and quantities of food consumed.
c. Meteorologists would be interested in temperature, rainfall and atmospheric pressure measurements
throughout the country and throughout the world.
2. For each of the three situations given in question 1, discuss how information could be collected.
SAMPLES AND POPULATIONS
Exercises:

5. Scientists in the jungle want to find the best estimate for the lion population. They
tagged and released 20 lions as part of a research project. Later, they found 160
lions, 8 of which were tagged. To the nearest whole number, what is the best estimate
for the lion population?

6. Juanita works in an Ornithology Department. Students asked her to find the
best estimate of the local bird population, so she tied a belt around the legs of 40
birds. A few days later, she observed 520 birds, 34 of which had belts. To the
nearest whole number, what is the best estimate for the bird population?
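
Problems of this kind are commonly solved with a capture-recapture proportion: we assume the fraction of tagged animals in the second group matches the fraction of tagged animals in the whole population. Treating this as the intended method is an assumption, and the numbers below are hypothetical so the exercises stay unsolved.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Estimate N from the proportion tagged_first / N = tagged_in_second / caught_second."""
    if tagged_in_second == 0:
        raise ValueError("no tagged animals were found again; cannot estimate")
    return round(tagged_first * caught_second / tagged_in_second)


# Hypothetical numbers: 50 fish tagged, 200 caught later, 10 of those carry a tag.
estimate = estimate_population(50, 200, 10)
print(estimate)  # 1000
```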
CATEGORICAL DATA
When we talk about categorical data we refer to data which can be placed in categories.

For example, we could stand at a street intersection and record the color of each car driving
past. In this case we could use the following code for the colors: R for red, B for blue, G
for green, W for white and O for all other colors.

We could then obtain the following results after observing a 50-car sample:
BGWWR OGWRW OOBBG OGRWR WWWGB
BBGGW WWWOG WOBWW RWWRB OOBWR

Once we have our categorical data, we first organize it in groups. To do this we can use either:
a. a dot plot, or
b. a tally and frequency table.
At this point we can identify key features of the data, for example the mode. The mode is the most
frequently occurring category.
A dot plot is a graph used to display data in which each dot represents one data value. Dot plots
can be horizontal or vertical.
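
As a rough sketch, the 50 car observations can be counted and drawn as a horizontal dot plot in text form. The letter `o` standing in for a dot and the row order are presentation choices made here, not part of the lesson.

```python
from collections import Counter

# The 50 recorded car colors, written in the same groups of five as in the lesson.
recorded = (
    "BGWWR OGWRW OOBBG OGRWR WWWGB "
    "BBGGW WWWOG WOBWW RWWRB OOBWR"
)
observations = recorded.replace(" ", "")

counts = Counter(observations)

# Horizontal dot plot: one "o" per observed car.
for color in "RBGWO":
    print(f"{color} | {'o' * counts[color]}")

# The mode is the category with the highest count.
mode, mode_count = counts.most_common(1)[0]
print("mode:", mode, "with", mode_count, "cars")
```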
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)

If the problem we are studying has lots of data, it might be easier to use a tally and frequency table. This
tool will help us in the data collection process.

The tally part is used to keep a count of the data in each category. The frequency simply summarizes
the tally: it gives the total count for each category.

This type of table is sometimes called a frequency distribution table or simply a frequency table.
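
A tally and frequency table for the car color sample could be built as in the sketch below. The `||||/` notation for a bundle of five strokes and the column widths are assumptions made for plain-text output.

```python
# The 50 recorded car colors from the street intersection example.
recorded = (
    "BGWWR OGWRW OOBBG OGRWR WWWGB "
    "BBGGW WWWOG WOBWW RWWRB OOBWR"
)
observations = recorded.replace(" ", "")


def tally_marks(count):
    """Render a count as bundles of five strokes, e.g. 7 -> '||||/ ||'."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


# Keep a running count for each category as the data is read.
frequency = {}
for color in observations:
    frequency[color] = frequency.get(color, 0) + 1

print(f"{'Color':<7}{'Tally':<25}Frequency")
for color in "RBGWO":
    print(f"{color:<7}{tally_marks(frequency[color]):<25}{frequency[color]}")
print(f"{'Total':<32}{sum(frequency.values())}")
```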
GRAPHS OF CATEGORICAL DATA
Bar Graphs
Bar graphs consist of rectangular columns of equal width. The height of each column represents
the number of observations (frequency) in each category.
GRAPHS OF CATEGORICAL DATA
Pie Chart
Pie charts are a useful way of showing how a quantity is divided up. A full pie/circle represents the
whole quantity. We can then divide the pie into wedges or slices to show the frequency of each category.

The table below shows the results when 8th grade students were asked
“What is your favorite fruit?”

Fruit       Frequency
Orange      13
Apple       21
Banana      10
Pineapple    7
Pear         9
Total       60

There are 60 students in the sample, so each person is entitled to 1/60th of the
pie chart. 1/60th of 360° is 6°, so we can determine the angles of the
different wedges in the pie chart.

13 x 6° = 78° for orange
21 x 6° = 126° for apple
10 x 6° = 60° for banana
7 x 6° = 42° for pineapple
9 x 6° = 54° for pear
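
The wedge angles can be checked with a short calculation. This is a minimal sketch using the frequencies from the favorite fruit survey; the variable names are illustrative.

```python
# Frequencies from the favorite fruit survey.
favorite_fruit = {"orange": 13, "apple": 21, "banana": 10, "pineapple": 7, "pear": 9}

total = sum(favorite_fruit.values())  # number of students in the sample

# Each student is worth 360 / total degrees of the circle.
angles = {fruit: count * 360 / total for fruit, count in favorite_fruit.items()}

for fruit, angle in angles.items():
    print(f"{fruit:<10}{angle:6.1f} degrees")
print(f"{'sum':<10}{sum(angles.values()):6.1f} degrees")
```

A quick check is that the angles always add up to 360 degrees.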
NUMERICAL DATA
When we talk about NUMERICAL DATA, we refer to data which is in number form.

Numerical data can be arranged using either a stem-and-leaf plot or a tally and frequency table. As in
the case of categorical data, numerical data can also be presented by a bar/column graph.

STEM-AND-LEAF PLOTS

A stem-and-leaf plot can be used to show a set of data in order.

Consider the weights (kg) of firefighter recruits:

101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68, 80, 107, 70, 85, 64, 95, 76, 87, 74, 68, 59, 82, 77

For each data value, the units digit will be the leaf, and the digits before it determine the stem on
which the leaf is placed.

For this example the stem labels are 5, 6, 7, 8, 9, and 10. These will be written under one another in
ascending order.
NUMERICAL DATA
Once the stems have been recorded we start to look at each data value. The first value is 101; here 10
is the stem and 1 is the leaf, so we record a 1 to the right of the stem label 10. The next value is
91. Its stem label is 9 and its leaf is 1, so we record a 1 to the right of the stem label 9.
We proceed in this way to record all the data in an unordered stem-and-leaf plot.
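
The procedure can be sketched in Python for the firefighter recruit weights. The function name and the text layout are illustrative choices, and the sketch assumes non-negative whole numbers. It prints the unordered plot first, then the ordered plot with the leaves on each stem sorted.

```python
# Weights (kg) of the firefighter recruits.
weights = [101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68,
           80, 107, 70, 85, 64, 95, 76, 87, 74, 68, 59, 82, 77]


def stem_and_leaf(values, ordered=True):
    """Map each stem (all digits except the last) to its list of leaves (units digits)."""
    plot = {}
    for value in values:
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    if ordered:
        for leaves in plot.values():
            leaves.sort()
    return dict(sorted(plot.items()))  # stems in ascending order


for title, ordered in (("Unordered", False), ("Ordered", True)):
    print(title)
    for stem, leaves in stem_and_leaf(weights, ordered).items():
        print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```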
WORKING WITH NUMERICAL DATA
MEASURES OF CENTRAL TENDENCY
MEAN OR AVERAGE
The mean or average of a set of numbers is an important measure of their middle (central tendency). We
talk about averages all the time. For example:

• The average speed of a car
• Average height or weight
• The average score of an exam
• The average income for a country
The mean or average is the sum of all the numbers in the data set divided by the number of observations.
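
As an illustration, the definition can be applied to the basketball scores from the thinking problem at the start of the unit. This is a minimal sketch; the helper name `mean` is our own.

```python
# Points per match from the thinking problem.
manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]


def mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


for name, points in (("Manuel", manuel), ("Santiago", santiago)):
    print(f"{name}: total {sum(points)}, matches {len(points)}, mean {mean(points):.2f}")
```

Santiago has the larger total, but he played twice as many matches, so the mean gives a fairer comparison than the total.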

MEASURES OF CENTRAL TENDENCY
MEDIAN & MODE
How the median of a data set is found depends on whether the number of observations is odd or
even. To determine the median, first reorder the data set from smallest to largest. If the
number of observations is odd, the median is the observation in the middle of the data set. If the
number of observations is even, the median is the average of the two middle observations.
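
The two cases can be sketched as follows. The odd-sized list is a made-up example, while the even-sized list uses Manuel's scores from the thinking problem.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: single middle observation
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: average of two


odd_example = [7, 3, 5, 9, 1]  # hypothetical data with 5 observations
manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]  # 12 observations

print(median(odd_example))
print(median(manuel))
```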
MEASURES OF CENTRAL TENDENCY
MEDIAN & MODE
The mode of a data set is the observation that occurs most often. It is not uncommon for a data set
to have more than one mode. This happens when two or more observations share the highest frequency in
the data set. A data set with two modes is called bimodal. A data set with three modes is called
trimodal.
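
A short sketch that returns every mode, so that bimodal and trimodal data sets are handled too. The scores come from the thinking problem and the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all observations that share the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

print(modes(manuel))    # one mode
print(modes(santiago))  # two modes, so this data set is bimodal
```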
MEASURE OF VARIABILITY
RANGE
The range of a data set is the difference between the largest value and the smallest value in the
data set. First reorder the data set from smallest to largest, then subtract the first observation
from the last observation.
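
The same steps can be written as a small function. This is a sketch only; it is named `data_range` here to avoid clashing with Python's built-in `range`.

```python
def data_range(values):
    """Largest value minus smallest value, following the reorder-then-subtract steps."""
    ordered = sorted(values)
    return ordered[-1] - ordered[0]


manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

print(data_range(manuel))
print(data_range(santiago))
```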