Coverage:
Nature of Statistics
Division of Statistics
Definition of Some Basic Statistical Terms
Symbols for Parameter
Symbols for Statistic
Scales of Measurement of Data
Nature of Statistics
Statistics- is the science of collecting, monitoring, analyzing, summarizing, and interpreting data in
order to make decisions.
Biostatistics- the application of statistical tools to data derived from the biological sciences. It can also
be defined as statistics applied to biological (life) problems, including public health, medicine, ecology, and
the environment.
Areas of Statistics
Descriptive Statistics- consists of methods for organizing, displaying, and describing data by using
tables, graphs, and summary measures.
Inferential Statistics- deals with making a judgment or a conclusion about a population based on the
study of a sample taken from that population. The members of the sample, chosen so as to be representative
of the population, make up only a small part of the entire population. Provided the sample is taken carefully,
conclusions drawn from the sample may be generalized to the entire population.
Example: Citrus extracts mixed with cucumber extracts can repel and kill fire ants.
Variable- a characteristic under study that may assume any of a set of values. It is usually denoted by a capital
letter of the English alphabet.
Example: height, weight, scores etc.
Constant- is a quantity that does not change its value.
Example: The Greek letter pi (π) denotes a constant because its only value is 3.1415…
Data- consists of information coming from observations, interviews, counts, measurements, or
responses from a variable.
Lecture Guide in Statistics for Biology Property of: Leo M. Aresgado 2019
Example: Those found in experiment, survey, records and other modes of research.
Ungrouped or Raw data- are data that are not organized in any specific way. They are simply the
collection of data as they are gathered.
Example: The heights of students as they were recorded.
Grouped data- are raw data organized into groups or categories with corresponding frequencies.
Example: The frequency distribution table.
Population- is the totality or set of all individuals or entities under consideration or study. It does not
necessarily mean a collection of people; it can be people or items whose characteristics are being studied.
The population being studied is called the target population. Unless a population is small, it is usually
impractical to obtain data from every member.
Example: All midwives in the Philippines.
Sample- is a representative set of observations that reflects the characteristics of the population. It is
part of the population or a sub-collection of elements drawn from a population.
Example: Midwives of Occidental Mindoro.
Parameter- a numerical description of a characteristic of a population.
Example: the Population Mean and the Population Standard Deviation.
Statistic- a numerical description of a characteristic of a sample.
Example: Sample Mean and Sample Standard Deviation. TIP: Match the first letters: Population-Parameter,
Sample-Statistic.
Classification/Types of variables
1. Quantitative variable- a variable that can be measured numerically. Data collected from a quantitative
variable are quantitative data.
Example: height, weight, age, test score, speed and body temperature
2. Qualitative or Categorical variable- a variable that cannot be measured numerically but can be divided into
different categories. Data collected from such a variable are called qualitative data.
Example: sex, birthplace, eye color, religious preferences and marital status.
Note: Sometimes a variable can be expressed both quantitatively and qualitatively. For example, school grades
can be expressed in numbers or in letters.
Quantitative variables can be further classified as discrete or continuous.
1. Discrete variable- a variable whose values are countable. It can assume only exact values, with no values
in between.
Example: The number of cars sold must be 0, 1, 2, 3, and so on; it cannot be between 0 and 1.
2. Continuous variable- a variable that can assume any numerical value over a certain range, so its values
cannot be counted using natural numbers.
Example: The time taken for a banker to serve a customer, which can be any value between 5 and 20 seconds.
Figure: Classification of variables and data
Variable
  Quantitative: Discrete, Continuous
  Qualitative
Data
  Quantitative: Interval, Ratio
  Qualitative: Ordinal, Nominal
LEARNING CHECK!
Direction: In two to four sentences, explain the following. Be guided by the rubric.
Area | 3 | 2 | 1
Content | The explanation is clear and correctly stated. | One to two statements in the explanation is/are incorrect. | Three or more statements in the explanation are incorrect.
Direction | The direction was followed. | One or two thing/s in the direction was/were not followed. | Three or more things in the direction were not followed.
Table 2. Rubric
Total 1: ____/6
Total 2: ____/6
Direction: For items 3-10, write T if the statement is true or F if it is false in the space provided before each
number.
3. _____ A statistic is a measure that describes a population characteristic.
4. _____A sample is a subset of a population.
5. _____ The Philippine Statistics Authority (PSA) collects all the census data about the population of the
Philippines.
6. _____ Data at the nominal level are qualitative only.
7. _____ For data at the ratio level, zero entries represent a position only and are not inherent zeros.
8. _____ Data at the ordinal level are quantitative only.
9. _____ For data at the interval level, you cannot calculate meaningful differences between data entries.
10._____ Inferential statistics involves using a population to draw a conclusion about a corresponding sample.
Direction: For items 11-14, determine whether the data set is a population or a sample. Write your answer on
the space provided before each number.
11. __________ The age of each provincial governor.
12. __________ The speed of every fifth motorcycle passing the market.
13. __________ A survey of 500 students at the main campus, which has a population of 5,000.
14. __________ The annual salary of each employee at a company.
Direction: For items 15-18, answer whether the data in each number is qualitative or quantitative. Write your
answer on the space provided before each number.
Direction: For items 19-21, distinguish between statistic and parameter. Write your answer on the space
provided before each number.
19. __________The average grade of 5 of the 14 students is 85.
20. __________ In a survey of a sample of high school students, 43% said that they love Math.
21. __________ In 1997 the interest category for 12% of all new magazines was sports.
Direction: For items 22-25, classify each data set below as nominal, ordinal, interval, or ratio. Write your
answer in the space provided before each number.
22. _________The top 3 best midwives of 2017.
Data:
1. Marween
2. Mark
3. Dem
23. __________The four subjects of midwifery.
Data:
1. Biostat
2. Physics
3. Microbiology and Parasitology
4. Anatomy
3. 9 inches
4. 10 inches
25. __________The distance a vehicle travelled.
Data:
1. 1-2 kilometers
2. 3-4 kilometers
3. 5-6 kilometers
Coverage:
Sources of data
Types of data
Methods of collecting data
Determining sample size
Sampling techniques
Methods of presenting data
Different kinds of graphs/charts
Sources of Data
There are two sources of data. The first is the primary source, from which first-hand information is
obtained, usually by means of personal interviews and actual observation. The second is the secondary
source, in which information is taken from others' works, news reports, readings, and records kept by
the Philippine Statistics Authority, the Securities and Exchange Commission, and other government agencies.
Types of Data
1. Primary- gathered directly from the source, based on direct or first-hand experience.
Examples: first person accounts, autobiographies, diaries etc.
2. Secondary- information taken from published or unpublished data that were previously gathered by
other individuals or agencies.
Examples: magazines, published books, business reports, etc.
Methods of Collecting Data
1. Direct or interview method: A person-to-person exchange between the interviewer and the interviewee.
It provides consistent and more precise information since clarification may be given by the interviewee.
Questions may be repeated or modified to suit the interviewee’s level of understanding.
2. Indirect or questionnaire method: Written responses are given to prepared questions. Questionnaires
may be mailed or hand-carried.
Advantage: This method is inexpensive and can cover a wide area in a shorter span of time. Respondents
may feel freer to express their opinions because greater anonymity is maintained.
Disadvantage: There is a strong probability of non-response, especially if the questionnaires are mailed.
Questions that are not understood may not be answered well.
3. The registration method: Registration is enforced by certain laws. Examples: births, deaths, motor vehicles,
marriages, licenses, etc.
Advantage: The records are systematized and readily available.
4. The observation method: The investigator observes the behavior of persons or organizations and their
outcomes. It is usually used when the subject cannot talk or write. The method makes it possible to record
behavior at the appropriate time and situation.
5. The experiment method: Used when the objective is to determine the cause-and-effect relationship of
certain phenomena under controlled conditions. Scientific researchers usually use this method.
6. Simulation: The use of a mathematical or physical model to reproduce the conditions of a situation or a
process. Data collection often uses computers. Simulation allows the study of situations that are impractical
or even dangerous to create in real life, and it often saves time and money. Example: medicine makers use
animals to test the effectiveness of a drug.
Determining Sample Size
Sample size (n) for a given margin of error (e):
Population (N) | ±1% | ±2% | ±3% | ±4% | ±5% | ±10%
500 | * | * | * | * | 222 | 83
Slovin's formula:
$n = \dfrac{N}{1 + Ne^2}$
Where:
n = sample size
N = population size
e = desired margin of error (the allowance for imprecision that comes from using a sample instead of the
whole population). Remember that the larger the sample, the closer its characteristics are to those of the
entire population, so a small margin of error is preferred; the suggested value is 3%.
Example:
Task: Determine the sample size if the population is 5,000 with a 3% margin of error.
n = 5,000/(1 + 5,000(0.03)²)
= 5,000/(1 + 4.5)
= 5,000/5.5
= 909.09, or 909. This is the same answer as the table value for a population of 5,000.
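The same computation can be sketched in Python. This is a minimal illustration, not part of the original guide: the function names `slovin_sample_size` and `round_half_up` are hypothetical, e is entered as a proportion (0.03 for 3%), and results are rounded to the nearest whole number to match the worked example (909.09 to 909) and the table (222 and 83). Some researchers round up instead, so that the stated margin of error is never exceeded.

```python
def slovin_sample_size(population_size: int, margin_of_error: float) -> float:
    """Slovin's formula: n = N / (1 + N * e^2).

    margin_of_error is a proportion, e.g. 0.03 for +/- 3%.
    """
    return population_size / (1 + population_size * margin_of_error ** 2)


def round_half_up(value: float) -> int:
    """Round a positive value to the nearest whole number, with .5 going up (as done by hand)."""
    return int(value + 0.5)


# Reproduce the worked example and the two filled cells of the table.
for N, e in [(5000, 0.03), (500, 0.05), (500, 0.10)]:
    n = slovin_sample_size(N, e)
    print(f"N = {N}, e = {e:.0%}: n = {n:.2f}, or about {round_half_up(n)}")
```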
Gay (1976) offers some minimum acceptable sizes depending on the type of research:
1. Descriptive research- 10% of the population. For smaller populations, a minimum of 20% may be
required.
2. Correlational research- 30 subjects.
3. Experimental research- 15 subjects per group.
4. Ex post facto/causal comparative- 15 subjects per group.
Sampling techniques
Sampling involves selecting a number of units from a defined population. A sample is more likely to be
representative of the population if all members of the population have an equal chance of being picked.
1. Non-probability sampling techniques: do not involve random selection.
Convenience sampling- consists only of the people available during data
collection.
Example: Surveying the students who come to class early. This technique often leads to biased results and is
not recommended.
2. Probability sampling techniques: involve random selection procedures. All units of the population
should have an equal or at least a known chance of being included in the sample. Generalization is
possible from sample to population.
Simple Random Sampling (SRS)- a method of selecting a sample of size n from a population of size
N in which every unit has an equal chance of being in the sample. A prerequisite is a complete list of the
units of the population, from which the researcher randomly picks the sample. This is the method used in a
raffle.
Methods of SRS:
1. Lottery/Fishbowl sampling- done by writing the names or numbers of all the members of the
population on small pieces of paper, which are rolled and placed in a container from which the sample
is drawn.
2. Table of random numbers- used if the population is large. Close your eyes, point at an entry in the
table, and then proceed in any direction until you reach the sample size.
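As an illustration only, the lottery method can be imitated in Python with `random.sample`, which draws without replacement so that every member has the same chance of selection. The list of 500 names is hypothetical, and the `seed` argument is an assumption added so that a draw can be repeated for checking.

```python
import random


def lottery_sample(population, n, seed=None):
    """Draw n members without replacement, each equally likely (like a fishbowl draw)."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(population, n)


names = [f"Member {i}" for i in range(1, 501)]  # hypothetical list of 500 names
sample = lottery_sample(names, 83, seed=2019)
print(len(sample), sample[:5])
```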
Systematic sampling- combines chance and system in selection. Each member of the population is
assigned a number. A starting number is randomly selected, and then every kth member (for example, every
3rd, 5th, 100th, or 1000th) is selected. If N is known, k can be calculated as k = N/n.
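A minimal sketch of systematic sampling, assuming the population is already a numbered list. The function name `systematic_sample` is hypothetical. Here k = N/n is rounded down when it is not a whole number, the random start is chosen among the first k members, and selection stops once n members have been taken.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every k-th member after a random start, with k = N // n."""
    N = len(population)
    k = N // n                                  # sampling interval, N/n rounded down
    start = random.Random(seed).randrange(k)    # random start among the first k members
    return population[start::k][:n]             # every k-th member, stopping at n


members = list(range(1, 501))  # hypothetical population numbered 1 to 500
chosen = systematic_sample(members, 100, seed=7)
print(chosen[:5])
```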
Stratified sampling- Members of the population are separated into groups (strata) with similar
characteristics such as age, gender, or ethnicity. Then a random sample is selected from each
stratum.
Types of Stratified Sampling:
1. Simple stratified random sampling- taking equal number of elements from each stratum.
Year Level | Population | Sample
4th year | 185 | 50
3rd year | 200 | 50
2nd year | 215 | 50
1st year | 200 | 50
TOTAL | N=800 | n=200
Table 4. Simple Stratified Random Sampling
2. Stratified proportional random sampling- allocates the sample according to the proportion of each subgroup in the population.
Year Level | Population | Proportion | Sample
4th year | 185 | .23 | 46
3rd year | 200 | .25 | 50
2nd year | 215 | .27 | 54
1st year | 200 | .25 | 50
TOTAL | N=800 | 1.00 | n=200
Table 5. Stratified Proportional Random Sampling
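The two allocation rules in Tables 4 and 5 can be sketched as follows. This assumes a 4th-year population of 185, the value consistent with the stated total N = 800 and the proportion .23; the function names are hypothetical. Each stratum's share is computed as N_h × n / N and rounded with Python's `round` (which sends exact halves to the nearest even number); with other data the rounded shares may not add up exactly to n, so the total should be checked. Within each stratum, the members themselves would still be drawn by a random method such as the lottery.

```python
def equal_allocation(strata_sizes, n):
    """Simple stratified random sampling: the same number from each stratum."""
    per_stratum = n // len(strata_sizes)
    return {name: per_stratum for name in strata_sizes}


def proportional_allocation(strata_sizes, n):
    """Stratified proportional sampling: n_h = N_h * n / N, rounded to a whole number."""
    N = sum(strata_sizes.values())
    return {name: round(size * n / N) for name, size in strata_sizes.items()}


strata = {"4th year": 185, "3rd year": 200, "2nd year": 215, "1st year": 200}
print(equal_allocation(strata, 200))
print(proportional_allocation(strata, 200))
```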
Methods of Presenting Data
1. Textual form- The textual method presents data in words, although other symbols and
even numerals are allowed.
Example:
Types of graphs:
1. The bar graph- consists of a series of bars on a plot. It is used to compare data across
categories.
2. The line graph- consists of a series of points on a plot connected by line segments.
It is used to show progression, such as change over time.
3. The pie chart/circle graph- as the name implies, is in the form of a pie. It is often used to
show how parts of the data relate to the whole.
4. The histogram- a sequence of adjoining vertical rectangles, each drawn with a height corresponding to the
frequency of its class.
5. The frequency polygon- constructed by connecting, with line segments, the midpoints of the tops of the
bars of a histogram.
6. Pictograph- a visual representation that uses pictures or symbols related to the subject under
study. A legend is sometimes used to state the quantity represented by a single picture; the picture is then
repeated to indicate differences in quantity.
LEARNING CHECK!
Task: Perform an actual collection, organization and presentation of data. This is an application of what
you have learned.
Steps:
1. Find a partner in the class, someone that you can contact and you can work with.
2. Choose a target group, for example a group of midwives in an office, a group of students in a class,
medical practitioners in an area, or people in your workplace. For this activity only, do not choose a group
with a population of more than 500, because a larger group would be too laborious. Be sure to choose an office
or group that is easy for you to contact.
3. Compose a communication letter (if needed) requesting the concerned official/head to allow you to
get the names of the members and to conduct a very simple interview. If the official/head grants your
request, proceed to step 4. If the request is denied, look for another target group.
4. Once you have the approval of the official/head, determine the actual number of members of the
population (N).
5. Using Slovin's formula, compute the size of your sample. You may use any margin of error from 1% to 10%.
6. Once you know the sample size, apply any sampling technique you have learned from the previous
pages, such as convenience sampling or the lottery method. For example, suppose I need a sample of 83
from a population of 500 individuals (10% margin of error) and I prefer the lottery method. I would
write or type the 500 names of the population and pick only 83 of them. The 83 names I picked
would be my sample.
7. For steps 1-6, do not use separate paper; write the information in Learning Check Form 1 (attached).
8. Using Interview Questionnaire Number 1 (attached), get information about your sample. You are
free to change the question, for example from “What is your age?” to “How many years have you
been working in your job?”, or to anything else you want.
9. The result of the interview will be used in the activity for the next chapter.
Note: Use the box to show your solution in computing the sample size using Slovin's formula.
Explain in not more than eight sentences the process on how you did the sampling.
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
Note: If you have more than 60 samples, you may write the other data on a separate sheet of paper and attach it here.
Coverage:
Now that we know the different sampling techniques and how to collect data, we must give more meaning to
the data. In this chapter, we consider tabular presentation through the frequency distribution.
A Frequency Distribution is a table that groups data into appropriate categories and shows the
number of observations in each group or category.
Parts of a Frequency Distribution Table
1. Class Limit
Groupings or categories defined by lower and upper limits.
Example:
Lower Class Limit | Upper Class Limit
71 | 75
76 | 80
81 | 85
Table 6. Class Limit
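2. Class Width/Size
It is the width of each class interval. For whole-number data, it is the difference between the lower limits
of two consecutive classes (for example, 76 - 71 = 5).
Example:
Lower Limit | Upper Limit
71 | 75
76 | 80
81 | 85
Class size = 5
Table 7. Class Width/Size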
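3. Class Boundaries
The numbers used to separate classes without the gaps created by the class limits. The number to be
subtracted from the lower limit or added to the upper limit is half the difference between the upper limit of
one class and the lower limit of the next class.
Example: For the classes 71-75, 76-80, and 81-85, half the difference between 75 and 76 is 0.5, so the class
boundaries are 70.5-75.5, 75.5-80.5, and 80.5-85.5.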
4. Class Marks/Midpoint
These are the midpoints of the classes. They can be found by adding the lower and upper class limits and
then dividing by two.
Example:
Class Interval | Class Mark | Step
71-75 | 73 | (71+75)/2 = 73
76-80 | 78 | (76+80)/2 = 78
81-85 | 83 | (81+85)/2 = 83
Table 9. Class Marks/Midpoint
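The parts described above (class width, class boundaries, and class marks) can be computed from the class limits with a short sketch. It assumes whole-number limits listed in increasing order with at least two classes; the function name `class_parts` is hypothetical.

```python
def class_parts(limits):
    """For whole-number class limits, return the class width and, per class,
    the boundaries and the class mark (midpoint)."""
    width = limits[1][0] - limits[0][0]        # difference of consecutive lower limits
    gap = (limits[1][0] - limits[0][1]) / 2    # half the gap between classes, e.g. (76 - 75) / 2
    rows = []
    for lower, upper in limits:
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - gap, upper + gap),
            "class_mark": (lower + upper) / 2,
        })
    return width, rows


width, rows = class_parts([(71, 75), (76, 80), (81, 85)])
print("class size:", width)
for row in rows:
    print(row)
```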
Suppose that patients in a hospital on a particular day were interviewed about their ages and the data
were put in a table. We will use the data below to show the steps in constructing a frequency distribution
table.
3 16 14 15 13 14 9 17 16 21
4 13 16 23 12 8 10 21 13 14
6 12 10 13 23 5 20 7 13 18
21 10 14 8 15 14 16 17 17 17
21 11 12 9 21 9 12 6 11 12
Table 10. Ages of Patients in a Hospital
Step 1: Find the range of the values. The range is the highest value minus the lowest value. In the table above,
the highest age is 23 and the lowest age is 3.
Example: range (R) = 23 - 3
Therefore, the range (R) is 20.
Step 2: Determine the class size/class width by dividing the range by the desired number of groupings/class
intervals. Note the word “desired”. For example, if I want to group the ages of the hospital patients above
into five groups/class intervals, then class size/class width = 20/5. Therefore, the class size/class width is 4.
There is a more systematic way of determining the number of groupings/class intervals than simply
deciding on your own how many groups you want. This is through the use of Sturges' formula,
K = 1 + 3.3 log n
Where:
K = the number of class intervals
n = the total number of observations
log = the common (base-10) logarithm
The data above contain 50 ages, so n = 50.
Example:
K=1+3.3 log (50)
K=1+3.3 (1.69897)
K=1+5.60660
K=6.607
K≈7 (approximately 7 class intervals)
Here K, the number of groupings/class intervals, was obtained through systematic computation rather than
by deciding on your own. Either approach, choosing the number of groupings/class intervals yourself or
applying Sturges' formula, is acceptable.
Thus,
class size/class width (C) = R/K = 20/7
class size/class width (C) = 2.857
class size/class width (C) ≈ 3 (approximately 3)
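The arithmetic of Sturges' formula and the class width can be sketched as below. The function names are hypothetical, and both results are rounded up with `math.ceil`, an assumption that agrees with the worked values here (6.607 to 7 and 2.857 to 3). When K × C equals the range exactly (as with 20/5 = 4), an extra class is still needed to hold the highest value, as described in the note under Step 3.

```python
import math


def sturges_classes(n):
    """Sturges' formula K = 1 + 3.3 log10(n), rounded up to a whole number."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(data_range, k):
    """Class width C = R / K, rounded up to a whole number."""
    return math.ceil(data_range / k)


k = sturges_classes(50)       # 1 + 3.3(1.69897) = 6.607, rounded up to 7
c = class_width(23 - 3, k)    # 20 / 7 = 2.857, rounded up to 3
print(k, c)
```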
Step 3: Set up the class limits of each class. Each class is defined by a lower limit and an upper
limit. The highest observed value should fall in the highest class interval, and the lowest observed
value should fall in the lowest class interval. Use the lowest score as the lowest lower limit. For a review of
class limits, see Table 6 under Parts of Frequency Distribution Table.
Note: Sometimes the computed number of classes (K) is not followed exactly. An extra class may be added to
accommodate the highest observed value in the data set, and a class may be deleted if it turns out to be empty.
Step 4: Set up the class boundaries. These are defined by a lower class boundary and an upper class boundary.
For a review of class boundaries, see Table 8 under Parts of Frequency Distribution Table. For whole-number
class limits, the class boundaries are obtained using the following formulas:
Lower Class Boundary = lower limit - 0.5
Upper Class Boundary = upper limit + 0.5
Step 5: Tally the scores in each class to obtain the frequency (f).
Step 6: Solve for the class mark/midpoint (X) of each class. It is obtained by adding the lower class limit and
the upper class limit, then dividing by 2. For a review of class marks, see Table 9 under Parts of Frequency
Distribution Table.
Summary of Examples:
The steps in constructing a frequency distribution using the data in “Table 10. Ages of Patients in a
Hospital” are summarized below.
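Steps 3 to 6 can be sketched in Python for the Table 10 data. This sketch assumes whole-number data, a class width of 3 (from Sturges' formula), and the lowest age, 3, as the first lower limit; the function name `frequency_table` is hypothetical. Classes are added until the highest value is covered, which also handles the extra class mentioned in the note under Step 3. With these settings the sketch produces seven classes, from 3-5 to 21-23, whose frequencies add up to 50.

```python
# Ages of 50 hospital patients (Table 10)
AGES = [
    3, 16, 14, 15, 13, 14, 9, 17, 16, 21,
    4, 13, 16, 23, 12, 8, 10, 21, 13, 14,
    6, 12, 10, 13, 23, 5, 20, 7, 13, 18,
    21, 10, 14, 8, 15, 14, 16, 17, 17, 17,
    21, 11, 12, 9, 21, 9, 12, 6, 11, 12,
]


def frequency_table(data, width):
    """Steps 3-6: class limits, class boundaries, frequencies, and class marks.

    The first lower limit is the lowest value; classes are added until the
    highest value is covered (whole-number data assumed).
    """
    low, high = min(data), max(data)
    table = []
    lower = low
    while lower <= high:
        upper = lower + width - 1
        table.append({
            "limits": f"{lower}-{upper}",
            "boundaries": f"{lower - 0.5}-{upper + 0.5}",
            "f": sum(lower <= x <= upper for x in data),   # tally (Step 5)
            "class_mark": (lower + upper) / 2,
        })
        lower += width
    return table


table = frequency_table(AGES, width=3)
for row in table:
    print(row["limits"], row["boundaries"], row["f"], row["class_mark"])
print("Total:", sum(row["f"] for row in table))
```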
LEARNING CHECK!
Direction: Using the data from the interview that you have conducted, make a frequency distribution table.
Coverage:
Summation Notation
Simple Arithmetic Mean
Weighted Arithmetic Mean
Median
Mode
Measure of Central Tendency- a score that indicates where the center of a distribution tends to be located.
It is also described as a value that can represent all the values in a group.
Computing this value requires knowledge of summation notation, which is why we will deal with it first.
Summation Notation
Mathematical formulas often require the addition of many values. Summation or sigma notation is a
convenient and simple shorthand that gives a concise expression for the sum of the values of a variable.
3. The Starting Point for the Summation or the Lower Limit of Summation
4. The Stopping Point for the Summation or the Upper Limit of Summation
[Figure: parts of summation notation (image: http://www.columbia.edu/itc/sipa/math/images/01/image008.gif)]
$\sum_{i=1}^{n} x_i$ means sum the values of x, starting at $x_1$ and ending with $x_n$.
$\sum_{i=3}^{10} x_i$ means sum the values of x, starting at $x_3$ and ending with $x_{10}$.
$\sum_{i=1}^{n} x_i^2$ means sum the squared values of x, starting at $x_1$ and ending with $x_n$.
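To make the notation concrete, here is a small sketch that evaluates the three expressions for a hypothetical list of ten values. The helper `summation` is hypothetical; it keeps the 1-based indices used in the notation, so $x_i$ is stored at Python index i - 1.

```python
def summation(x, start, stop, term=lambda v: v):
    """Sum term(x_i) for i = start, ..., stop, using the 1-based indices of sigma notation."""
    return sum(term(x[i - 1]) for i in range(start, stop + 1))


x = [2, 5, 1, 7, 3, 8, 4, 6, 9, 10]   # hypothetical values x_1 ... x_10
n = len(x)
print(summation(x, 1, n))                          # sum of x_i, i = 1 to n
print(summation(x, 3, 10))                         # sum of x_i, i = 3 to 10
print(summation(x, 1, n, term=lambda v: v ** 2))   # sum of x_i squared, i = 1 to n
```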
LEARNING CHECK!
Direction: Solve for the following. Refer to the data in Table 11.
i | x_i
1 | 1
2 | 2
3 | 3
4 | 4
Table 11. Data
1. Find
2. Find
Simple Arithmetic Mean
When calculating the arithmetic mean, all items are considered equally important. The arithmetic mean of a
set of data is found by taking the sum of the data and then dividing the sum by the total number of values
in the set. The mean is commonly referred to as the average. It is the simplest and most widely used measure
of central tendency.
Example:
Find the mean driving speed for 6 different cars on the same highway.
Data: 66 mph, 57 mph, 71 mph, 54 mph, 69 mph, 58 mph
Solution: 66 + 57 + 71 + 54 + 69 + 58 = 375; 375 ÷ 6 = 62.5
Answer: The mean driving speed is 62.5 mph.
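The same calculation as a minimal Python sketch. The function name `arithmetic_mean` is hypothetical; Python's standard `statistics.mean` would give the same result for this data.

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


speeds = [66, 57, 71, 54, 69, 58]   # mph, from the example above
print(arithmetic_mean(speeds))
```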
LEARNING CHECK!
Direction: Solve for the following. Use the boxes for your answer.
1. A marathon race was completed by 5 participants in the times given below. What is the mean race time
for this marathon?
Data: 2.7 hr, 8.3 hr, 3.5 hr, 5.1 hr, 4.9 hr
2. The hourly wage (in pesos) of 5 midwives in San Jose, Occidental Mindoro is shown below. Find their
average hourly wage.
Data: 150, 200, 300, 250, 180