Objectives:
1. Define Statistics
2. Explain commonly used terms in Statistics.
3. Use the commonly used terms in statistics during class discussion.
Concepts:
Statistics, or statistical methods, is concerned with scientific methods for
collecting, organizing, summarizing, presenting and analyzing data, as well as drawing
valid conclusions and making reasonable decisions on the basis of such analysis.
Data are observations such as measurements, gender and survey responses.
Data are sometimes used to find statistics. Here "statistics" is the plural form of
the term "statistic", which means a statistical tool used in dealing with a sample.
Examples are the t-test, chi-square test, Spearman rank correlation, mean, standard
deviation, etc. Measures obtained from a sample, like average test scores, the proportion
of respondents agreeing or disagreeing with an opinion statement, and so on, are called
statistics. A statistic is a numerical characteristic that describes a sample. It
indicates something about the sample, just as a parameter indicates something about a
population.
Purpose of Statistics
According to Ary and Jacobs (1976), statistics is a body of scientific methods
used for analyzing quantitative data. Statistical procedures have two functions: (1) they
aid the scientist in organizing, summarizing, interpreting, and communicating
quantitative information obtained from observations and (2) they allow the scientist to
extrapolate the data to reach tentative conclusions about the large group from which the
smaller group was derived.
Population, or the universe, refers to the collection of all elements (scores, people,
measurements and so on) under study or under consideration. A small part of this is
called a sample. If the sample is representative of a population, important conclusions
STATISTICS
about the population can often be inferred from the analysis of the sample.
Representativeness is usually approximated by means of random selection. This is a
procedure in which every object or individual in the population has an equal chance of
being included in the sample. This random procedure ensures that chance alone
determines the members chosen for the sample.
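The random selection described above can be sketched in a few lines of Python. This is only an illustration: the population of 200 numbered students, the sample size of 20, and the seed are assumptions made for the example, not values from this module.

```python
import random

# Hypothetical population: 200 students identified by number (assumed for illustration).
population = list(range(1, 201))

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# sample() draws without replacement, so every member has the same chance
# of being included and no member is chosen twice.
sample = rng.sample(population, k=20)

print(sorted(sample))
```

Because chance alone decides which members are drawn, repeating the draw with a different seed gives a different, equally valid sample.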
The phase of statistics that seeks only to describe and analyze a given group,
without drawing any conclusions or inferences about a larger group, is called descriptive
or deductive statistics.
The phase of statistics dealing with the conditions under which conclusions
about the population can be inferred from the analysis of a sample is called inferential
statistics or inductive statistics. Because such inference cannot be absolutely certain,
the language of probability is often used in stating conclusions.
Kinds of Data
Quantitative data consist of numbers representing counts or measurements.
An example is the income of college graduates.
Qualitative or categorical data can be separated into different categories that are
distinguished by some nonnumeric characteristic. An example is the gender of the teacher.
Continuous data result from infinitely many possible values that correspond
to some continuous scale covering a range of values without gaps, interruptions or
jumps. The height of an individual, which can be 62 inches, 63.8 inches, or 65.8615
inches, depending on the accuracy of the measurement, is a continuous variable.
Discrete data result when the number of possible values is either a finite number
or a countable number. The number of children in a family is discrete; it can only
assume the values 0, 1, 2, 3, etc. and cannot be 1.5, 3.3, 4.5, etc.
In general, measurements give rise to continuous data, while enumeration and
counting give rise to discrete data.
Levels of Measurement
The nominal level of measurement is characterized by data that consist of
names, labels and categories only. Since nominal data lacks ordering or numerical
____1.1 The highest and lowest scores in statistics were announced by the teacher.
____1.2 The principal had every seventh student on her list of freshman take a reading
test in order to get an idea on how well all freshmen read.
____1.4 A head teacher administered spelling to all his sixth grade pupils who scored
above the national norm on the test.
3. An investigator has a list of (a) all fifth graders in the city school system. From the list
she drew a (b) subgroup of 50 children for the study.
4. The average height of the population of the third grade boys in Cabanatuan City is
called a ___________ . The average height of a sample of third grade boys in the city is
called ___________.
Population____________________________________
The footnote, which appears immediately below the bottom line of the table, explains,
qualifies or clarifies items in the table that are not readily understandable or are
missing. Proper symbols are used to indicate the items that are clarified or explained.
The source note, generally written below the footnote, indicates the source of the data
presented in the table.
Table Number : Title (Headnote)
Sub Head Master Caption
Column Column Column Column
Caption Caption Caption Caption
Stub
symbols (e.g. asterisks, circles etc.) should be used to distinguish the lines. In any set of
line graphs, plotting symbols and line styles should be used consistently. Also, consider
using the same scale on each graph, when comparisons are to be made across graphs.
Bar Charts Bar charts display simple results clearly. They are not generally
useful for large amounts of structured information. Since the horizontal axis represents
a discrete categorization, there is often no inherent order to the bars. In this case, the
chart is clearer to read if the bars are sorted in order of height, e.g. the first bar
represents the variety with the highest yield; the next bar displays the second highest
yield and so on. The opposite direction, ascending order, can also be used. This advice
has to be compromised when there is a series of charts with the same categories. In
this case, it is usually preferable to have a consistent bar order throughout the series.
Also in a series of bar charts, the shading of the different bars (e.g. black, gray, diagonal
lines etc.) must be consistent. It is frequently useful to "cluster" or group the bars
according to the categories they represent, to highlight certain comparisons. The
method of grouping should be determined by the objective of the chart.
The essential parts of a graph are number at the bottom of the graph, scale,
classification or arrangements, classes (categories indicated in the x and y axis)
symmetry, footnote and source.
Exercises:
1. Record your electric bill for the last 10 months. Present the data in textual,
tabular and graphical form.
2. Go to the nearest busy road in your place. Record all vehicles that pass a
particular point. Present the data in textual, tabular and graphical form.
3. Number of male and female students in the different classes in the graduate
school.
Age Frequency
54 3
55 1
56 1
57 2
58 2
60 2
The mode has an advantage over the median and the mean
as it can be found for both numerical and categorical (non-numerical) data.
There are some limitations in using the mode. In some distributions, the
mode may not reflect the center of the distribution very well. When the
distribution of retirement age is ordered from lowest to highest value, it is easy to
see that the center of the distribution is 57 years, but the mode is lower, at 54
years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same
distribution of data, (bi-modal, or multi-modal). The presence of more than one
mode can limit the ability of the mode in describing the center or typical value of
the distribution because a single value to describe the center cannot be
identified.
In some cases, particularly where the data are continuous, the distribution
may have no mode at all (i.e. if all values are different).
In cases such as these, it may be better to consider using the median or mean,
or to group the data into appropriate intervals and find the modal class.
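The frequency table and the mode of the retirement-age distribution can be checked with a short Python sketch. It uses only the eleven ages listed above; `Counter` and `multimode` come from the standard library, and `multimode` returns every most-frequent value, which also covers the bi-modal case.

```python
from collections import Counter
from statistics import multimode

# Retirement ages from the distribution above.
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Frequency of each age (the Age / Frequency table).
frequency = Counter(ages)
for age in sorted(frequency):
    print(age, frequency[age])

# multimode returns a list, so a bi-modal or multi-modal
# distribution would simply yield more than one value.
modes = multimode(ages)
print("mode(s):", modes)
```

For data in which every value is different, `multimode` returns all of the values, which signals that the mode is not a useful summary for that distribution.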
The median is the middle value in a distribution when the values are
arranged in ascending or descending order.
The median divides the distribution in half (there are 50%
of observations on either side of the median value). In a distribution with an odd
number of observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the
median is the middle value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value
is the mean of the two middle values. In the following distribution, the two middle
values are 56 and 57, therefore the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
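The odd and even cases can be verified with a small Python sketch. The helper name `middle_value` is made up for this illustration; it follows the rule stated above (middle value for an odd count, mean of the two middle values for an even count) and is compared with the standard library's `median`.

```python
from statistics import median

odd_case = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]       # 11 observations
even_case = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]  # 12 observations

def middle_value(values):
    """Median by the rule in the text (illustrative helper, not a library call)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values

print(middle_value(odd_case), median(odd_case))
print(middle_value(even_case), median(even_case))
```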
The median is less affected by outliers and skewed data than the mean, and is
usually the preferred measure of central tendency when the distribution is not
symmetrical.
The median cannot be identified for categorical nominal data, as it cannot be
logically ordered.
The mean is the sum of the values of all observations in a dataset divided by the
number of observations. This is also known as the arithmetic average. Looking at the
retirement age distribution again:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values
(54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the number of
observations (11), which equals 56.6 years.
The mean can be used for both continuous and discrete numeric data. The
mean cannot be calculated for categorical data, as the values cannot be summed. As
the mean includes every value in the distribution the mean is influenced by outliers and
skewed distributions.
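A brief Python sketch of the mean calculation follows. The first part uses the retirement ages above; the second part replaces one value with 90 purely as a hypothetical outlier, to show how the mean shifts while the median does not.

```python
from statistics import median

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)             # sum of all observations
mean_age = total / len(ages)  # divide by the number of observations
print(total, round(mean_age, 1))

# Hypothetical outlier (assumed for illustration): the last 60 becomes 90.
ages_with_outlier = ages[:-1] + [90]
mean_with_outlier = sum(ages_with_outlier) / len(ages_with_outlier)
print(round(mean_with_outlier, 1), median(ages_with_outlier))
```

The mean rises because it includes every value, whereas the median stays at the middle observation.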
Answer the following:
1. Find the mean, median, mode, and range of the data set:
8, 6, 7, 5, 6, 2, 5, 9, 9, 4, 5
mean = ____ median = ______ mode = ____ range = ______
2. Lewis has the following data: 2, 5, m, 2, 4, 3. If the mean is 3, which number
could m be?
3. Study how to get the mean, median and mode using Microsoft Excel.
TESTING HYPOTHESIS
Objectives
1. State the two types of hypotheses.
2. Decide the level of significance to be used in a certain problem.
3. Explain the five-step procedure in testing a hypothesis.
Hypothesis
A statement whose truth is still to be resolved is called a
hypothesis. A hypothesis is subjected to statistical testing. If it is found to be true, it is
accepted, but if it is found to be false, it is rejected.
Types of Hypothesis
There are two kinds of hypothesis, the null hypothesis and the alternative
hypothesis. A null hypothesis expresses the idea of no significant difference or no
significant relationship and is denoted by Ho, while the alternative hypothesis is the
negation of the null hypothesis and is denoted by Ha. Rejection of the null hypothesis leads
to the acceptance of the alternative hypothesis.
Example:
Problem: A teacher wanted to find out who among his students, male or female,
perform better in mathematics.
Types of Errors
In real life, making a decision is sometimes difficult. We are not sure if we made
the right or wrong decision. In hypothesis testing, there is also the possibility of
Level of Significance
The probability of committing a type I error is called the level of significance.
The level of significance for a type I error is denoted by alpha (α), while the probability of
committing a type II error is denoted by beta (β). The value of α is equal to the
probability of making an error in rejecting the null hypothesis when in fact it is true.
Similarly, the value of β is equal to the probability of committing an error in accepting
Ho when in fact it is false.
The choice of the significance level ranges from .01 to .10, depending
on the risk the researcher is willing to take of making a type I error. A .01 or 1% level
of significance means the researcher allows a 1% chance of error in his decision. It
also implies that he is 99% confident that his decision is right. Likewise for α = .05 and α = .10.
Types of Tests
A hypothesis could be one-sided (directional) or two-sided (non-directional). The
test of a one-sided hypothesis is referred to as a one-tailed test. If the alternative hypothesis
is expressed in terms like "greater than" or "less than", the test is one-tailed. The
rejection region lies in only one tail of the distribution.
7. Conclusion is the last step in hypothesis testing. It is the part where the
researcher explains his decision. Interpreting a result may not end by simply saying
Exercises
1. A researcher wanted to determine which venue is more effective for the conduct of K-12
training, hotel or school.
1.1 State the problem based on the given situation.
1.2 Give your null and alternative hypotheses.
1.3 What kind of test are you going to use?
1.4 What is your preferred level of significance?
2. What is your basis for accepting or rejecting the null hypothesis?
3. When you reject the null hypothesis, what is your conclusion?
T-test
Objectives
1. Use the t-test in testing hypotheses.
Concepts
There are several techniques used in testing hypotheses, and one of them is the
significance of the difference between means. Three of the techniques are: first, for
cases with dependent or correlated samples; second, for cases with independent or
uncorrelated samples when cases are few, that is, N is less than 30
(N < 30); and third, when cases are not few, that is, N is more than 30 (N > 30).
Problem Situation :
Students were tested on their ability to predict how moving bodies behave, both
before and after attending a course on Newtonian Physics. Their marks are tabulated
here. The resulting data is presented in Table 1.
Table 1: Students marks before and after the course in Newtonian Physics
                              Difference
Students   Before   After     D      D²
Boots        45       42      -3      9
Melody       56       50      -6     36
Trinity      32       19     -13    169
Efrey        76       78       2      4
Gee          65       63      -2      4
Helen        52       43      -9     81
Jelyn        60       62       2      4
Imme         87       90       3      9
Joan         49       38     -11    121
Abby         59       53      -6     36
Total       581      538   ΣD = -43   ΣD² = 473
then:
$$t = \frac{-43}{\sqrt{\dfrac{10(473) - (-43)^2}{10 - 1}}}$$
$$t = \frac{-43}{17.89} = -2.40$$
The value of t may be either positive or negative. In using the table of critical
values of t, the absolute value of t is considered. The degrees of freedom in this situation
are N - 1, or 10 - 1 = 9.
Using the table of critical values of t, for 9 degrees of freedom, the required
value is 2.26 for significance at the 5 percent level.
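The hand computation can be reproduced with a short Python sketch. It uses the Before and After marks from Table 1 and the same formula as the worked solution (sum of differences divided by the square root of [NΣD² − (ΣD)²]/(N − 1)); the critical value 2.26 is copied from the text rather than computed.

```python
from math import sqrt

# Marks from Table 1, in the order the students are listed.
before = [45, 56, 32, 76, 65, 52, 60, 87, 49, 59]
after = [42, 50, 19, 78, 63, 43, 62, 90, 38, 53]

# D = After - Before for each student.
d = [a - b for b, a in zip(before, after)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(value * value for value in d)

# t for correlated (dependent) samples, as in the worked solution.
t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
df = n - 1

critical_t = 2.26  # from the table of critical values, df = 9, 5 percent level
significant = abs(t) > critical_t

print(sum_d, sum_d2, round(t, 2), df, significant)
```

Only the absolute value of t is compared with the critical value, which is why the sign of the result does not affect the decision.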
4. Interpretation:
Since the absolute value of the computed t, 2.40, is greater than the
required value of t = 2.26 at the 5 percent level, the obtained value of t
is significant.
5. Decision:
Reject the null hypothesis: There is no significant difference in the scores of
the students before and after attending the course on Newtonian Physics.
Accept the alternative hypothesis: There is a significant difference in the
scores of the students before and after attending the course on Newtonian
Physics.
This test is used when only two unrelated groups are being compared and the
measurements are either interval or ratio. The two groups may or may not have the
same number of samples.
Method                   N     Mean (x̄)    S
Calculator-Based (1)     15    28.6        5.9
Lecture (2)              14    21.7        4.6
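$$t = \frac{28.6 - 21.7}{\sqrt{\left[\dfrac{(15 - 1)(5.9)^2 + (14 - 1)(4.6)^2}{15 + 14 - 2}\right]\left[\dfrac{1}{15} + \dfrac{1}{14}\right]}}$$
$$t = \frac{6.9}{\sqrt{\left[\dfrac{487.34 + 275.08}{27}\right]\left[\dfrac{29}{210}\right]}}$$
$$t = \frac{6.9}{\sqrt{(28.24)(29/210)}} = \frac{6.9}{1.97} \approx 3.50$$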
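The same pooled-variance computation can be sketched in Python from the summary values in the table (N, mean, and S for each method). The function name `pooled_t` is made up for this illustration. Because the code does not round the intermediate values, its result (about 3.49) differs slightly from the hand computation, which rounds the denominator to 1.97.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t for two independent groups using a pooled variance (illustrative helper)."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Calculator-Based (group 1) versus Lecture (group 2), values from the table above.
t = pooled_t(28.6, 5.9, 15, 21.7, 4.6, 14)
df = 15 + 14 - 2

print(round(t, 2), df)
```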
Exercises:
1. A sociologist is studying the effect of a certain motion picture film upon
the attitude of Christian students towards Muslim students. She hypothesizes
that viewing the film will cause the scores of the students on a certain attitude
scale to shift downward. The scores of the participating students are recorded
below.
CHI-SQUARE (χ²)
Objectives
1. Calculate and interpret the chi-square value χ².
2. Apply the chi-square test to different situations.
Concepts
This technique is a useful method of comparing experimentally obtained results with
those expected theoretically.
Uses of χ²:
1. to determine whether or not there is a relationship
2. to test the hypothesis of independence.
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Problem Situation
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(15 - 17.5)^2}{17.5} + \frac{(30 - 31.5)^2}{31.5} + \frac{(25 - 21)^2}{21} + \frac{(20 - 20)^2}{20} + \frac{(40 - 36)^2}{36} + \cdots$$
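The chi-square sum can be sketched as a small Python function. The function name `chi_square` is made up for this illustration. Only the five observed and expected pairs that are visible in the computation above are used, so the printed value is a partial sum, not the final chi-square for the problem.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells (illustrative helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# The first five (fo, fe) pairs shown in the worked computation; the remaining
# cells are not shown there, so this is only a partial sum.
observed = [15, 30, 25, 20, 40]
expected = [17.5, 31.5, 21, 20, 36]

partial_sum = chi_square(observed, expected)
print(round(partial_sum, 4))
```

A cell whose observed and expected frequencies agree, such as (20, 20), contributes nothing to the total.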
Objectives
1. Calculate and interpret the linear correlation coefficient r
2. Apply the Pearson Product-Moment Coefficient Of Correlation to daily life
situation.
Concepts
The linear correlation coefficient r measures the strength of the relationship
between two paired variables. The linear correlation coefficient is sometimes referred to as the
Pearson Product-Moment Coefficient of Correlation (rxy) in honor of Karl Pearson. It
is computed by using the formula
$$r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
where
N = represents the number of pairs of data present
Σ = denotes the addition of the items indicated
ΣX = denotes the sum of all x-values
ΣX² = indicates that each x-value should be squared and then those
squares added
(ΣX)² = indicates that the x-values should be added and the total then squared
ΣXY = indicates that each x-value should first be multiplied by its corresponding
y-value. After obtaining all such products, find their sum.
rxy = represents the linear correlation coefficient for a sample
To interpret the correlation coefficient value (rxy) obtained, the following classification
may be applied.
An r from ±0.00 to ±0.20 denotes negligible correlation
An r from ±0.21 to ±0.40 denotes low or slight correlation
An r from ±0.41 to ±0.70 denotes marked or moderate relationship
An r from ±0.71 to ±0.90 denotes high relationship
An r from ±0.91 to ±0.99 denotes very high correlation
An r of ±1.00 denotes perfect correlation
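The classification above can be written as a small lookup function in Python. The function name `describe_r` is made up for this illustration, and the handling of values that fall between the listed bands (for example 0.205) is an assumption: each band is treated as running up to and including its upper limit.

```python
def describe_r(r):
    """Verbal label for a correlation coefficient, following the bands above."""
    size = abs(r)  # the sign gives direction only; strength uses the magnitude
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size == 1.0:
        return "perfect correlation"
    if size > 0.90:
        return "very high correlation"
    if size > 0.70:
        return "high relationship"
    if size > 0.40:
        return "marked or moderate relationship"
    if size > 0.20:
        return "low or slight correlation"
    return "negligible correlation"

for value in (0.98, -0.85, 0.35, 0.10):
    print(value, describe_r(value))
```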
Problem Situation
To identify the relationship of the performance in Mathematics (X) and English
(Y) of 10 BSE students of a certain college, a test was administered. The following are
the results based on their achievement test.
          X      Y      X²      Y²      XY
          30     35     900    1225    1050
          43     44    1849    1936    1892
          53     57    2809    3249    3021
          45     44    2025    1936    1980
          70     80    4900    6400    5600
          45     47    2025    2209    2115
          68     75    4624    5625    5100
          48     47    2304    2209    2256
          38     35    1444    1225    1330
          45     46    2025    2116    2070
Total    485    510   24905   28130   26414
Computation of rxy
$$r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
$$r_{xy} = \frac{10(26414) - (485)(510)}{\sqrt{[10(24905) - (485)^2][10(28130) - (510)^2]}}$$
$$r_{xy} = \frac{16790}{17119.87149}$$
rxy = 0.98 (very high relationship)
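The column totals and the value of rxy can be checked with a short Python sketch that follows the same formula step by step. It uses the ten Mathematics (X) and English (Y) scores from the table; the variable names are chosen for this illustration.

```python
from math import sqrt

# Mathematics (X) and English (Y) scores of the 10 students.
x = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]
y = [35, 44, 57, 44, 80, 47, 75, 47, 35, 46]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(value * value for value in x)   # each x squared, then added
sum_y2 = sum(value * value for value in y)   # each y squared, then added
sum_xy = sum(a * b for a, b in zip(x, y))    # each x times its paired y, then added

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_xy = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(numerator, round(denominator, 2), round(r_xy, 2))
```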
4. Interpretation:
The computed value of rxy = 0.98 indicates a very high
relationship between the students' performance in
Mathematics and English. Students who got high scores in Mathematics also
got high scores in English, and those who got low scores in Mathematics also
got low scores in English.
5. Decision:
Reject the null hypothesis. There is a very high relationship between
the performance in Mathematics and English based on the test scores of 10
BSE students in a certain state college.
Problem Situation
A retail outlet for air conditioners believes that its weekly sales are dependent
upon the average temperature during the week. It picks at random 12 weeks in 2007
and finds that its sales are related to the average temperature in these weeks as
follows:
Mean temperature (degrees)    Sales (no. of air conditioners)
72 3
77 4
82 7
43 1
31 0
28 0
81 8
81 5
76 5
60 4
50 4
55 5
Is it correct that its weekly sales are dependent upon the average temperature of the
week?
Using the table of r values, if n = 12, the critical value is .5529 at the .05 level.
4. Interpretation:
The computed value of rxy = 0.8536 is greater than the critical
value r = 0.5529.
5. Decision:
Reject the null hypothesis that “There is no significant relationship
between weekly sales and average temperature for the week.”
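The decision rule can be sketched in Python using the twelve temperature and sales pairs listed in the problem. The function name `pearson_r` is made up for this illustration, and the critical value .5529 is copied from the text rather than computed. Computed directly from the listed data, r comes out at about 0.85; a small difference from the value reported in the text may come from rounding in the hand computation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation from raw sums (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Mean weekly temperature and number of air conditioners sold, 12 weeks.
temperature = [72, 77, 82, 43, 31, 28, 81, 81, 76, 60, 50, 55]
sales = [3, 4, 7, 1, 0, 0, 8, 5, 5, 4, 4, 5]

r = pearson_r(temperature, sales)
critical_r = 0.5529  # from the table of r values used in the text, .05 level

# Reject the null hypothesis of no relationship when |r| exceeds the critical value.
reject_null = abs(r) > critical_r
print(round(r, 2), reject_null)
```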
.3246
24     1.711   2.064   36.415    40   .3044
25     1.708   2.060   37.652    45   .2875
26     1.706   2.056   38.885    50   .2732
27     1.703   2.052   40.113    60   .2500
28     1.701   2.048   41.337    70   .2319
29     1.699   2.045   42.557    80   .2172
30     1.697   2.042   43.773    90   .2050
40     1.684   2.021
60     1.671   2.000
120    1.658   1.980
∞      1.645   1.960
ANALYSIS OF VARIANCE
Objectives
Use ANOVA in testing hypothesis
Identify the condition in which one can use analysis of variance in testing
hypothesis.
Concepts
ANOVA is used to test the significance of the difference between the means of 2 or more
selected sets of data simultaneously. It is a method of dividing the variation observed in
experimental data into different parts, each part assignable to a known source, cause or
factor.
1. ONE-WAY ANOVA
Problem Situation
Three groups of six students were each subjected to one of three types of teaching
method. The grades of the students were taken at the end of the second grading period
and listed according to grouping.
Steps:
1. $\sum x = 534 + 465 + 561 = 1560$
$\sum x^2 = 47\,636 + 36\,275 + 52\,573 = 136\,484$
2. Total sum of squares (TSS)
$$TSS = \sum x^2 - \frac{(\sum x)^2}{N} = 136\,484 - \frac{(1560)^2}{18} = 1\,284$$
3. The between-column variance, or between-column sum of squares (SSb), is 1/r of the
sum of the squares of the column sums, minus the correction term, where r
refers to the number of rows.
$$SS_b = \frac{1}{r}\sum\Big(\sum x_{col}\Big)^2 - \frac{(\sum x)^2}{N} = \frac{1}{6}\left(534^2 + 465^2 + 561^2\right) - \frac{1560^2}{18} = 817$$
5. These three sums of squares are placed in an Analysis of Variance table, which
contains the sources of variation, their sums of squares, their corresponding
degrees of freedom, and the estimated variances.
Total 1284 17
7. After completing the entries in the ANOVA table, the F-test formula is applied:
$$F = \frac{MSS_b}{MSS_w} = \frac{408.5}{31.13} = 13.12$$
This long computation can be simplified by using the Data Analysis ToolPak of
Microsoft Excel. Copy and paste the raw scores into an Excel worksheet.
Students Group1(Xa) Group2(Xb) Group3(Xc)
Method A Method B Method C
A 84 70 90
B 90 75 95
C 92 90 100
D 96 80 98
E 84 75 88
F 88 75 90
SUMMARY
Groups Count Sum Average Variance
Column 1 6 534 89 22
Column 2 6 465 77.5 47.5
Column 3 6 561 93.5 23.9
ANOVA
Source of
Variation SS df MS F P-value F crit
Between Groups 817 2 408.5 13.12 0.000508 3.68
Within Groups 467 15 31.13
Total 1284 17
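As a cross-check of the hand computation and the spreadsheet output, the one-way ANOVA can also be sketched in plain Python. It uses the raw scores of the three groups and the same sums of squares as the steps above; the variable names are chosen for this illustration, and the critical value is not computed here.

```python
# Raw scores of the six students in each teaching-method group.
groups = {
    "Method A": [84, 90, 92, 96, 84, 88],
    "Method B": [70, 75, 90, 80, 75, 75],
    "Method C": [90, 95, 100, 98, 88, 90],
}

all_scores = [score for scores in groups.values() for score in scores]
n_total = len(all_scores)

# Correction term: (sum of all scores)^2 / N
correction = sum(all_scores) ** 2 / n_total

tss = sum(score * score for score in all_scores) - correction                   # total SS
ssb = sum(sum(scores) ** 2 / len(scores) for scores in groups.values()) - correction  # between groups
ssw = tss - ssb                                                                 # within groups

df_between = len(groups) - 1
df_within = n_total - len(groups)

msb = ssb / df_between
msw = ssw / df_within
f_ratio = msb / msw

print(tss, ssb, ssw)
print(df_between, df_within, round(msb, 2), round(msw, 2), round(f_ratio, 2))
```

The within-group sum of squares is obtained by subtraction, so the three sums of squares always add up as in the ANOVA table.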
4. Interpretation
Since the obtained value of F = 13.12 is greater than the required value of
F = 3.68 at the .05 level of significance with df = 2 and 15,
the obtained value of F = 13.12 is significant. This means that
there is a significant difference among the teaching methods.
5. Decision
Reject Ho (since the computed F is greater than the tabular F, that is, 13.12 > 3.68).
Exercise
The values in the table are measured maximum breadths of male Egyptian
skulls from different epochs. Changes in head shape over time suggest that
interbreeding occurred with immigrant populations. Use the 0.05 level to test the
claim that the mean is the same for the different epochs.