Branch of mathematics that transforms data into useful information for decision makers
Science of conducting studies to collect, organize, summarize, present, analyze, and draw conclusions from a set of quantitative data
It is also concerned with the use of probability theory to estimate population parameters

SOME REASONS TO STUDY STATISTICS:
Health Care | Computer Skills | Information Management
Auditing | Process Improvement | Technical Literacy
Marketing | Health Care | Quality Improvement
Purchasing | Product Warranty | Operations Management
Medicine

STATISTICAL CHALLENGES
The ideal data analyst (business professionals using statistics) should possess these characteristics:
- Is technically current (e.g., software-wise).
- Communicates well.
- Is proactive.
- Has a broad outlook.
- Is flexible.
- Focuses on the main problem.
- Meets deadlines.
- Knows his/her limitations and is willing to ask for help.
- Can deal with imperfect information.
- Has professional integrity.

TWO BRANCHES OF STATISTICS:
1. Descriptive Statistics
- Concerned with collecting, organizing, summarizing, presenting, and analyzing numerical data.
- Is that branch of statistics that presents techniques for describing a set of measurements.
2. Inferential Statistics
- Drawing conclusions and/or making decisions concerning a population based only on sample data.
- Also called Statistical inference or Inductive statistics
- Its main concern is to analyze the organized data leading to prediction or inferences.
- Implies that before carrying out an inference, appropriate and correct descriptive measures or methods are employed to bring out good results

Descriptive Statistics | Inferential Statistics
Collect Data (Ex. Survey) | Estimation (Ex. Estimate the population mean weight using the sample mean weight)
Present Data (Ex. Tables and Graphs) | Hypothesis Testing (Ex. Test the claim that the population mean weight is 120 pounds)
Characterize Data (Ex. Sample Mean) | Drawing conclusions and/or making decisions concerning a population based on sample results

Exercises: Determine whether each of the following statistical claims is descriptive or inferential.
1. One out of every five people is an especially appealing target for hungry mosquitoes.
2. Almost 85% of lung cancers in men and 45% in women are tobacco-related.
3. The risk of heart attack is attributed to obesity.
4. Native Americans are significantly more likely to be hit crossing the streets than are people of other ethnicities.
5. There is an 80% chance that in a room full of 30 people at least two people will share the same birthday.

IMPORTANCE OF DATA:
- Data are needed to provide the necessary input to a survey.
- Data are needed to provide the necessary input to a study.
- Data are needed to measure performance of an ongoing service or production process.
- Data are needed to evaluate conformance to standards.
- Data are needed to assist in formulating alternative courses of action in a decision-making process.
- Data are needed to satisfy our curiosity.

Observation – a single member of a collection of items that we want to study, such as a person, firm, or region.
Variable – a characteristic of the subject or individual, such as an employee’s income or an invoice amount.
Data set – consists of all the values of all of the variables for all of the observations we have chosen to observe.
- A data set may consist of many variables.
- The questions that can be explored and the analytical techniques that can be used will depend upon the data type and the number of variables.

POPULATION AND SAMPLE
Population
- refers to the groups or aggregates of people, objects, materials, events, or things of any form.
- consists of all subjects (human or otherwise) that are being studied.
- Measured by Parameters
Sample
- consists of a few or more members of the population.
- Is a subgroup of the population selected for analysis
- a portion of the population, or a subset from a set of units.
- Measured by Statistics

PARAMETER AND STATISTIC
Parameter
- are the measures of the population.
- any numerical summary measure based on data from a population. (Bluman, 2010)
Statistic (estimate)
- is the measure of a sample.

SAMPLE AND CENSUS
Sample
- involves looking only at some items selected from the population.
Census
- is an examination of all items in a defined population.
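To make the parameter/statistic distinction concrete, the sketch below builds a purely hypothetical population of 10,000 weights (the mean of 120 pounds echoes the hypothesis-testing example; the spread of 15, the sample size of 50, and the seed are assumptions chosen only for illustration). The population mean plays the role of a parameter, and the mean of a random sample plays the role of a statistic that estimates it.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable (assumption)

# Hypothetical population: 10,000 weights in pounds (invented data).
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population (a census).
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample of the population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers will generally differ a little; that gap is what inferential statistics tries to quantify.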
• You might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a score of 5, and that is more than a score of 3. But the difference between a pain of 7 and 5 may not be the same as that between 5 and 3. The values simply express an order.
• Movie ratings, from * to *****.
• Rating a product purchased online by giving it up to 5 stars
- Classifies data into distinct categories in which ranking is implied
Numerical Variables
3. Interval Scale – it has all the properties of the ordinal scale.
- Quantitative differences can be determined.
- Indicates an actual amount, and there is an equal unit of measurement separating each score, specifically equal intervals.
- The zero point of the interval scale is arbitrary and does not reflect an absence of the attribute.
- It does not have a true zero value.
- By direct observation or measurement
- By interview (questionnaires or rating scales)
- By mailing of recording or reporting forms
  - Ordinary or special mails
  - Courier services
  - E-mail and fax
- Data from a political survey
- Data collected from an experiment

2. Secondary Sources – The data compilers are secondary sources of data. The information is taken from published or unpublished materials previously gathered by other researchers or agencies.
- Analyzing census data
- Examining data from print journals or data published on the internet (books, newspapers, magazines, journals, published and unpublished theses and dissertations)
- Registration (registry of births/deaths, marriages)

RANDOM VARIABLES
- are variables whose values are determined by chance.
- are needed since one cannot do arithmetic operations on words
- enable us to compute statistics, such as average or variance.
- Example:
• Selecting a product from a manufacturing process
1 – defective 2 – non-defective
• Selecting a student from each class
1 – male 2 – female
• Inspection of the availability of resources
1 – more than enough 2 – enough 3 – scarce resource

1. Random Sampling
- Is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population).
- Each individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in the sample.

2. Simple Random Sampling
- Each member of the population has an equal chance to be included in the sample gathered.
- Lottery or Fishbowl technique
- Table of random numbers – random numbers can be generated by a random number table, software program or a calculator.

Example: There are 800 students currently enrolled in your school. You wish to form a sample of ten students to answer some survey questions.
• Assign numbers 001 to 800 to each student.
• On the table of random numbers, choose a starting place at random (anywhere, say, the 5th column, 2nd row).
• Read numbers in groupings of three digits. Get the first 10 usable groupings.
• 261, 046, 731, 800, 701, 349, 866, 675, 199, 723, 596 (866 is skipped because no student has that number, which leaves ten students)
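The 800-student example can be sketched in code. The first part lets software draw the sample directly; the second part mimics reading three-digit groups from a random number table, using the groups listed in the example. The seed value and the helper name `sample_from_digit_groups` are assumptions made for this illustration only.

```python
import random

POPULATION_SIZE = 800
SAMPLE_SIZE = 10

# Step 1: assign numbers 001 to 800 to the students.
student_ids = [f"{i:03d}" for i in range(1, POPULATION_SIZE + 1)]

# Software route: draw 10 distinct IDs, each student equally likely.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
software_sample = rng.sample(student_ids, SAMPLE_SIZE)


def sample_from_digit_groups(groups, population_size, sample_size):
    """Random-number-table route: keep usable 3-digit groups in order,
    skipping numbers outside 001..population_size and any repeats."""
    chosen = []
    for group in groups:
        if 1 <= int(group) <= population_size and group not in chosen:
            chosen.append(group)
        if len(chosen) == sample_size:
            break
    return chosen


# The three-digit groups read off the table in the example above.
table_groups = ["261", "046", "731", "800", "701", "349",
                "866", "675", "199", "723", "596"]
table_sample = sample_from_digit_groups(table_groups, POPULATION_SIZE, SAMPLE_SIZE)

print("Software sample:", software_sample)
print("Table sample:   ", table_sample)
```

Both routes sample without replacement, so no student can appear twice.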
TWO TYPES OF RANDOM VARIABLES
1. Categorical Random Variables
2. Numerical Random Variables

TWO TYPES OF NUMERICAL RANDOM VARIABLES
1. Discrete Random Variables
2. Continuous Random Variables

RANGE OF APPLICATIONS:
A few examples of business applications of statistics are given below:
1. An auditor can use random sampling techniques to audit the accounts receivable transactions to make inferences and decisions about the validity of the total accounts receivable number reported on the company’s balance sheet.
2. A financial analyst may use regression and correlation to understand the relationship of a financial ratio to a set of other variables in business.
3. A sales manager may use statistical techniques to forecast sales as well as production cost for the coming year.
4. A manager may use probability and statistics in the evaluation of alternative projects or investments.
5. A market researcher may use a test of significance about a group of buyers to which the firm wishes to sell a particular product.
6. A product developer may use statistics to determine what potential customers want in a new product being developed.
7. A production manager tests manufacturing processes to ensure that products are produced with desired customer and regulatory specifications.
8. Restaurant managers use statistics to examine variability of important performance measures, such as customer orders and food cost, to plan work schedules and estimate material purchases.

FOUR PROCESSES OF STATISTICS:
1. Collection
2. Description
3. Analysis
4. Interpretation

METHODS OF DATA COLLECTION
The following are some methods of collecting data.
1. The Direct or Interview Method
2. The Indirect or Questionnaire Method
3. The Registration Method
4. The Experimental or Observation Method

COLLECTION OF DATA
1. Data are needed to provide the necessary input to a survey.
2. Data are needed to provide the necessary input to a study.
3. Data are needed to measure performance of an ongoing service or production process.
4. Data are needed to evaluate conformance to standards.
5. Data are needed to assist in formulating alternative courses of action in a decision-making process.
6. Data are needed to satisfy our curiosity.

PPT2: SAMPLING TECHNIQUES
Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference.
SAMPLING TECHNIQUES

SAMPLING WITH OR WITHOUT REPLACEMENT
Without Replacement
- Not allowing duplicates when sampling
With Replacement
- Allowing duplicates when sampling

Instinctively most people believe that sampling without replacement is preferred over sampling with replacement because allowing duplicates in our sample seems odd.
In reality, sampling without replacement can be a problem when our sample size n is close to our population size N.
At some point in the sampling process, the remaining items in the population will no longer have the same probability of being selected as the items we chose at the beginning of the sampling process. This could lead to a bias.
When should we worry about sampling without replacement?
- Only when the population is finite and the sample size is close to the population size.

Note:
A common criterion is that a finite population is effectively infinite if the sample is no more than 5 percent of the population (i.e., if n/N ≤ .05).
An equivalent statement is that a population is effectively infinite when it is at least 20 times as large as the sample (i.e., when N/n ≥ 20).

3. Systematic Random Sampling
- Obtained when we choose every “nth” individual in a population.
- The items or individuals are arranged in some way, perhaps alphabetically or in some other order.

4. Stratified Random Sampling
- A population is first divided into subsets based on homogeneity, called strata.
- Stratified sampling involves selecting independent samples from a number of subpopulations, groups or strata within the population
- Example: Suppose a farmer wishes to work out the average milk yield of each cow type in his herd, which consists of Ayrshire, Friesian, Galloway and Jersey cows. He could divide up his herd into the four sub-groups and take samples from these.

5. Cluster Sampling
- Can be done by subdividing the population into smaller units or clusters, usually along geographic boundaries, then randomly selecting only some primary units where the study would then be concentrated.
- Strata consist of geographical regions.
- One-stage cluster sampling – sample consists of all elements in each of k randomly chosen subregions (clusters).
- Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster

6. Multi-Stage Sampling
- Combination of several sampling techniques.
- Usually used by researchers who are interested in studying a very large population, say all the provinces included in CALABARZON.
- This is done by starting the selection of the members of the sample using cluster sampling and then dividing each cluster into strata. Then from each stratum individuals are drawn randomly using simple random sampling.

NON-PROBABILITY SAMPLING
- is a sampling technique wherein members of the sample are drawn from the population based on the judgment of the researchers.
- The result of the study using this technique is relatively biased.
- This technique lacks objectivity of selection; hence, it is sometimes called subjective sampling.
- Non-probability sampling techniques are sometimes used because they are convenient and economical.
- Researchers use this method because it is inexpensive and easy to conduct.

1. Convenience Sampling – Take advantage of whatever sample is available at that moment. A quick way to sample
2. Purposive Sampling
3. Judgment Sampling
- A non-probability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
- Can be affected by subconscious bias (i.e., non-randomness in the choice).
- Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.
4. Focus Groups – A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering

OTHER DATA COLLECTION METHODS
- Point-of-sale (POS) systems can collect real-time data on purchases at retail or convenience stores, restaurants, and gas stations.
- Many companies use loyalty cards that are swiped during the purchase. These loyalty cards have the customer’s information, which can be matched to the purchase just made.
- Businesses also send out e-mail surveys to loyal customers on a regular basis to get feedback on their products and services.
- Facebook can track your Internet searches using its software algorithms.
- Google also tracks Internet searches and provides these data through its Google Analytics services.

SAMPLE SIZE
The necessary sample size depends on the inherent variability of the quantity being measured and the desired precision of the estimate.
For example, the caffeine content of Mountain Dew is fairly consistent because each can or bottle is filled at the factory, so a small sample size would suffice to estimate the mean.
In contrast, the amount of caffeine in an individually brewed cup of Bigelow Raspberry Royale tea varies widely because people let it steep for varying lengths of time, so a larger sample would be needed to estimate the mean.
The purposes of the investigation, the costs of sampling, the budget, and time constraints also are taken into account in deciding on sample size.

SOURCES OF ERROR OR BIAS
In sampling, the word bias refers to a systematic tendency to over- or underestimate a population parameter of interest.
However, the words bias and error are often used interchangeably.
The word error generally refers to problems in sample methodology that lead to inaccurate estimates of a population parameter.
No matter how careful you are when conducting a survey, you will encounter potential sources of error.

Nonresponse bias occurs when those who respond have characteristics different from those who don’t respond.
For example, people with caller ID, answering machines, blocked or unlisted numbers, or cell phones are likely to be missed in telephone surveys. Because these are generally more affluent individuals, their socioeconomic class may be underrepresented in the poll.
A special case is selection bias, a self-selected sample.
For example, a talk show host who invites viewers to take a web survey about their sex lives will attract plenty of respondents. But those who are willing to reveal details of their personal lives (and who have time to complete the survey) are likely to differ substantially from those who dislike such surveys or are too busy (and probably weren’t watching the show anyway).
Response error occurs when respondents deliberately give false information to mimic socially acceptable answers, to avoid embarrassment, or to protect personal information.
Coverage error occurs when some important segment of the target population is systematically missed.
Measurement error results when the survey questions do not accurately reveal the construct being assessed.
Interviewer error occurs when the interviewer’s facial expressions, tone of voice, or appearance influences the responses.
Sampling error is uncontrollable random error that is inherent in any random sample.
Even when using a random sampling method, it is possible that the sample will contain unusual responses. This cannot be prevented and is generally undetectable. It is not an error on your part.

PPT3: SURVEY RESEARCH
BASIC STEPS IN SURVEY RESEARCH
Step 1. State the goals of the research.
Step 2. Develop the budget (time, money, staff)
Step 3. Create a research design (target population, frame, sample size)
Step 4. Choose a survey type and method of administration.
Step 5. Design a data collection instrument (questionnaire)
Step 6. Pretest the survey instrument and revise as needed.
Step 7. Administer the survey (follow up if needed)
Step 8. Code the data and analyze it.

SURVEY TYPES
Type of Survey and Characteristics
Mail
• You need a well-targeted and current mailing list (people move a lot).
• Low response rates are typical and nonresponse bias is expected (non-respondents differ from those who respond).
• Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes.
• To encourage participation, a cover letter should clearly explain the uses to which the data will be put.
• Plan for follow-up mailings.
Telephone
• Random dialing yields very low response and is poorly targeted.
• Purchased phone lists help reach the target population, though a low response rate still is typical (disconnected phones, caller screening, answering machines, work hours, no-call lists).
• Other sources of nonresponse bias include the growing number of non-English speakers and distrust caused by scams and spam.
Interviews
• Interviewing is expensive and time-consuming, yet trading sample size for high-quality results may still be worth it.
• Interviews must be carefully handled, so interviewers must be well-trained – an added cost.
• But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).
Web
• Web surveys are growing in popularity, but are subject to nonresponse bias because those who participate may differ from those who feel too busy, don’t own computers or distrust your motives (scams and spam are again to blame).
• This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).
Direct Observation
• This can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior.
• Unobtrusive observation is possible in some non-lab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).

SURVEY GUIDELINES
• Planning - What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, budget.
• Design - Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.
• Quality - Take care in preparing a quality survey so that people will take you seriously.
• Pilot Test - Pretest on friends or co-workers to make sure the survey is clear.
• Buy-In - Improve response rates by stating the purpose of the survey, offering a token of appreciation or paving the way with endorsements.
• Expertise - Work with a consultant early on.

QUESTIONNAIRE DESIGN
• Use a lot of white space in layout.
• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
• Break survey into naturally occurring sections.
• Let respondents bypass sections that are not applicable (e.g., “if you answered no to Question 7, skip directly to Question 15”).
• Pretest and revise as needed.
• Keep as short as possible.

QUESTION WORDING
- The way a question is asked has a profound influence on the response.
- Example:
1. Shall state taxes be cut?
2. Shall state taxes be cut, if it means reducing highway maintenance?
3. Shall state taxes be cut, if it means firing teachers and police?
- Make sure you have covered all the possibilities.
- Example:
Are you married? Yes or No?
- Overlapping classes or unclear categories are a problem.
- Example:
How old is your father?
35 – 45
45 – 55
55 – 65
65 or older

CODING AND DATA SCREENING
• Responses are usually coded numerically (e.g., 1 = male, 2 = female).
• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
• Discard questionnaires that are flawed or missing many responses.
• Watch for multiple responses, outrageous or inconsistent replies or range answers.
• Follow up if necessary and always document your data-coding decisions.

DATA FILE FORMAT
Enter data into a spreadsheet or database as a “flat file” (n subjects x m variables matrix).

ADVICE ON COPYING DATA
• Using commas (,), dollar signs ($), or percents (%) as part of the values may result in your data being treated as text values.
• A numerical variable may only contain the digits 0-9, a decimal point, and a minus sign.
• To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.

PPT4: THE MEASURES OF VARIATION
SIGNIFICANCE OF THE MEASURE:
1. It supports the descriptive value of the measures of central tendency.
2. It functions as a measure of risk or uncertainty in the field of finance.
3. It provides measures of volatility in considering alternatives for pricing commodities.
4. It is commonly used as a measure of error in the field of forecasting.

COMPUTATIONAL PROCEDURES:
1. Range – is the scale distance between the highest and the lowest value in a given set of data
2. Mean Absolute Deviation – the average distance of the values from the arithmetic mean

Ungrouped data: $MAD = \frac{\sum |x - \bar{x}|}{n}$
Grouped data: $MAD = \frac{\sum f\,|x - \bar{x}|}{n}$

Illustration:

3. Quartile Deviation – one half of the distance between the first and third quartiles.

Formula for Quartile Deviation
$QD = \frac{Q_3 - Q_1}{2}$

4. Variance and Standard Deviation
• Variance ($s^2$) – the average squared difference or deviation from the mean
• Standard Deviation (s) – the square root of the variance

PROPERTIES OF STANDARD DEVIATION:
1. Standard Deviation is only used to measure the spread or dispersion around the mean of a data set
2. Standard Deviation is never negative
3. For data with approximately the same mean, the greater the spread, the greater the Standard Deviation
4. If all the values of a data set are the same, the Standard Deviation is zero (because each value is equal to the mean)

Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
(μ = population mean, x̄ = sample mean)

RELATIVE MEASURES OF VARIABILITY
Coefficient of Variation
- Allows comparison of the variability of scores in two sets of data that do not necessarily measure the same thing.
- Denoted by ‘v’
- It is useful to use this coefficient when the means of the distributions being compared are far apart, or the data are in different units.

Formula of Coefficient of Variation
$CV = \frac{s}{\bar{x}} \times 100$
Where:
s = standard deviation
x̄ = mean

Illustration:

Tables and Graphs for Univariate Numerical Data
• Univariate Data – data having a single scalar component. They involve only one variable.

Ungrouped Data and Grouped Data
1. Ordered Array (Data Array)
2. Stem and Leaf Display

Illustration
Construct a stem and leaf display and a frequency distribution table with five classes for the following College Algebra grades of 30 students.

70 83 87 76 80 87 75 84 85 76 81 82 89 77 84 86 71 80 80 79 84 86
93 83 85 88 72 84 84 92

Answer:
Stem | Leaf | Frequency
7 | 01256679 | 8
8 | 00012334444455667789 | 20
9 | 23 | 2

Illustration:
Frequency Distribution of the Philippines’ Richest as of 082509
Column headings: Net Worth in Million of Dollars | Number of Persons | X | Cf< | Cf> | Proportions of Persons | % of Persons | % of Persons "Less than" Lower Boundary of Class Interval
38 - 370 | 26 | 204
371 - 703 | 8 | 537
704 - 1036 | 3 | 870
1037 - 1369 | 1 | 1203
1370 - 1702 | 1 | 1536

Illustration 2:
(same column headings as the table above)
7 – 9      2
10 – 12    8
13 – 15    14
16 – 18    17
19 – 21    20    20
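As a rough sketch of the measures of variation defined in this part of the notes (range, mean absolute deviation, quartile deviation, sample variance, standard deviation, and coefficient of variation), the code below applies them to the 30 College Algebra grades from the stem-and-leaf illustration. The quartile method is an assumption: `statistics.quantiles` uses its default "exclusive" method here, and other textbooks' quartile rules can give slightly different values.

```python
import statistics

# College Algebra grades of 30 students (from the stem-and-leaf illustration).
grades = [70, 83, 87, 76, 80, 87, 75, 84, 85, 76, 81, 82, 89, 77, 84,
          86, 71, 80, 80, 79, 84, 86, 93, 83, 85, 88, 72, 84, 84, 92]

n = len(grades)
mean = statistics.mean(grades)

# 1. Range: distance between the highest and the lowest value.
data_range = max(grades) - min(grades)

# 2. Mean absolute deviation (ungrouped): sum(|x - mean|) / n.
mad = sum(abs(x - mean) for x in grades) / n

# 3. Quartile deviation: (Q3 - Q1) / 2, using the library's default quartile rule.
q1, _median, q3 = statistics.quantiles(grades, n=4)
quartile_deviation = (q3 - q1) / 2

# 4. Sample variance (divides by n - 1) and sample standard deviation.
sample_variance = statistics.variance(grades)
sample_sd = statistics.stdev(grades)

# Coefficient of variation: CV = (s / mean) * 100.
cv = sample_sd / mean * 100

print(f"n = {n}, mean = {mean:.2f}, range = {data_range}")
print(f"MAD = {mad:.3f}, QD = {quartile_deviation:.3f}")
print(f"s^2 = {sample_variance:.3f}, s = {sample_sd:.3f}, CV = {cv:.2f}%")
```

Because CV is unit-free, it is the one figure here that can be compared against a data set measured in different units.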
TRUNCATED MEAN
- Obtained by computing the arithmetic mean but removing first the extreme values or outliers (Q) on both ends of the distribution (lowest and/or highest).
- Removed for the following reasons:
a. to eliminate the directional effect of these values.
b. to give a better descriptive average of the data.

Formula (geometric mean rate of return):
$\bar{R}_G = \sqrt[n]{x_1 x_2 x_3 x_4 \cdots x_n} - 1$
Where:
$x_i = 1 + r_i$
$r_i$ = growth rate in the given period

in the shaping of the attitudes of those farmers who feel negatively towards the design:
• Does not ridge
• Does not work for inter-cropping
• Far too expensive
• New technology too risky
• Too difficult to carry.
- A Likert scale is a composite of itemized scales.
Illustration:
The annual revenue growth for the apartment sector from the 4th quarter of 2004 to the 3rd quarter of 2014 was studied, and the coefficient of variation ratio was used to compare the historical volatility and average annual revenue change at both the national and metro level.
So, according to the coefficient of variation, on a standalone basis,
Phoenix ranks as one of the riskiest metros for investors. Then, when
you look at the less risky metros – specifically Washington DC, SF,
Miami and San Diego – these cities have some common traits, such as
high barriers to entry, strong growth in the 20 to 34-year-old age cohort
(group) and expensive single-family housing.
$Ratio = \frac{\text{Investment A}}{\text{Investment B}}$
Where A = Larger CV and B = Lower CV

EXPECTED RETURN
EV = Σ (Probability × Rate of Return), summed over all possible outcomes

Expected return ($\hat{r}$) = (10% x 0.20) + (15% x 0.60) + (20% x 0.20)
= 0.15 or 15%

PORTFOLIO RETURN
A portfolio is a collection of investments all owned by an individual or a firm.

Illustration:
ABC Corporation is considering investing in three stocks. Listed below are the expected returns for each investment.

Stocks | Individual Expected Returns (%) | Amount Invested
FLI | 10 | 20,000
PA | 15 | 50,000
WEB | 18 | 30,000

What is the expected portfolio return?
Expected return ($\hat{r}_p$) = (10% x 20%) + (15% x 50%) + (18% x 30%)
= 14.90%

MEASURING RISK
Risk is the exposure to uncertainty or danger resulting in changes in the expected return of a given investment.
• Standard deviation (σi) measures total, or stand-alone, risk.
• The larger σi is, the lower the probability that actual returns will be close to expected returns.
• Larger σi is associated with a wider probability distribution of returns.

MEASURING RISK – STANDARD DEVIATION
$\sigma = \sqrt{\sum_{i=1}^{N} (r_i - \hat{r})^2 P_i}$

Illustration:
Consider the following annual dividends of 3 stocks (in thousands) for 5 years:

a. Which of these stocks has the greatest average annual dividend in 5 years?
b. Which is the best stock in terms of its annual dividend?
c. Which among the 3 stocks is least risky?
d. Which among the 3 stocks is the riskiest?
e. Determine how much riskier Stock C is compared to Stock A.
f. How does the measure of variability help in the decision?

Answer:

PPT9: MEASURES OF SHAPE

MEASURES OF SHAPE (SKEWNESS AND KURTOSIS)

INTERPRETATION OF SKEWNESS:
Positive Skewness – the data are positively skewed or skewed right, meaning that the right tail of the distribution is longer than the left
Negative Skewness – the data are negatively skewed or skewed left, meaning that the left tail is longer.
Skewness = 0 – the data are perfectly symmetrical.
A skewness of exactly zero is unlikely for real-world data.

Rule of Thumb in Interpreting the Skewness Number
If skewness is less than –1 or greater than +1 | Highly Skewed (Positively or negatively)
If skewness is between –1 and –½ or between +½ and +1 | Moderately Skewed (Positively or negatively)
If skewness is between –½ and +½ | Approximately Symmetric

TYPES OF SKEWNESS
1. Positive skewness
- The mode is located at the highest point of the curve.
- The median is the middle value of the distribution.
- The mean is greater than the median and the mode and is found towards the right tail of the curve.
- The mean is pulled by the few but very high numerical observations.
2. Negative skewness
- The mode is the highest point
- The median is the middle value
- The mean is less than the median and the mode and is found towards the lower tail end of the curve (left side)

Illustration:
Below are the temperature readings in 12 cities in the country during summer:
72, 74, 75, 77, 78, 79, 82, 85, 86, 90, 93, 94

Solve for the skewness then interpret the result.

Illustration 2:
The age distribution of 60 vacationers in Boracay Beach was observed. It was found that the mean age is 41.83, the median is 43.23, and s = 14.32. Find the value of skewness.
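The two skewness illustrations can be sketched with Pearson's coefficient, sk = 3(mean − median) / s, which is the formula the notes use in the answer to Illustration 2. The function names below are invented for this sketch, the temperature data use the sample standard deviation (an assumption, since the notes do not say which one to use), and the treatment of values falling exactly on ±½ or ±1 is also an assumption because the rule of thumb does not specify it.

```python
import statistics


def pearson_skewness(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / sd


def interpret_skewness(sk):
    """Apply the rule of thumb from the notes (boundary handling is assumed)."""
    if sk < -1 or sk > 1:
        return "highly skewed"
    if sk < -0.5 or sk > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Illustration: temperatures in 12 cities.
temps = [72, 74, 75, 77, 78, 79, 82, 85, 86, 90, 93, 94]
sk_temps = pearson_skewness(
    statistics.mean(temps), statistics.median(temps), statistics.stdev(temps)
)

# Illustration 2: summary values given in the problem statement.
sk_ages = pearson_skewness(mean=41.83, median=43.23, sd=14.32)

for label, sk in [("temperatures", sk_temps), ("vacationer ages", sk_ages)]:
    direction = "positive" if sk > 0 else "negative"
    print(f"{label}: sk = {sk:.2f} ({direction}, {interpret_skewness(sk)})")
```

The sign gives the direction of the longer tail; the rule of thumb only grades how strong the asymmetry is.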
Answer:
$sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(41.83 - 43.23)}{14.32} = -0.29$
The value -0.29 is between -0.5 and +0.5, which indicates that the curve is approximately symmetric, with a slight negative skew.

KURTOSIS
- Tells you how tall and sharp the central peak is, relative to a standard bell curve.
- From the Greek word κυρτός, kyrtos or kurtos, meaning bulging.
- Any measure of the "peakedness" of a frequency distribution
- Refers to the peakedness or flatness of a frequency distribution
- Is a measure of flatness of the distribution.
Heavier tailed distributions have larger kurtosis measures
The normal distribution has a kurtosis of 3.
- Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution.
Positive Kurtosis – indicates a relatively peaked distribution
Negative Kurtosis – indicates a relatively flat distribution

- The truncated mean is obtained by computing the arithmetic mean but removing first the extreme values or outliers (Q) on both ends of the distribution (lowest and/or highest).
- These are removed for the following reasons:
a) To eliminate the directional effect of these values.
b) To give a better descriptive average of the data.

Example: Given the interest rates: 2.12, 2.16, 2.14, 3.12, 2.13. Determine if 3.12 is an outlier.
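For the interest-rate example (2.12, 2.16, 2.14, 3.12, 2.13), here is one possible sketch. It assumes that the "(Q)" in the truncated-mean notes refers to Dixon's Q test, which the notes do not state explicitly, and it uses the commonly tabulated critical value of 0.710 for n = 5 at 95% confidence. The function name is invented for this illustration.

```python
rates = [2.12, 2.16, 2.14, 3.12, 2.13]

# Commonly tabulated Dixon's Q critical value for n = 5 at 95% confidence.
Q_CRITICAL_N5_95 = 0.710


def dixon_q(values, suspect):
    """Q = gap / range, where gap is the distance from the suspect value
    (which must be the minimum or maximum) to its nearest neighbour."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    elif suspect == ordered[0]:
        gap = ordered[1] - ordered[0]
    else:
        raise ValueError("suspect must be the smallest or largest value")
    return gap / data_range


q = dixon_q(rates, 3.12)
is_outlier = q > Q_CRITICAL_N5_95

# Truncated mean: drop the value only if the test flags it as an outlier.
kept = [r for r in rates if not (is_outlier and r == 3.12)]
truncated_mean = sum(kept) / len(kept)
plain_mean = sum(rates) / len(rates)

print(f"Q = {q:.2f}, outlier: {is_outlier}")
print(f"mean with 3.12 = {plain_mean:.4f}, truncated mean = {truncated_mean:.4f}")
```

Under these assumptions 3.12 is flagged, and removing it pulls the average back toward the cluster of rates near 2.1.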
GEOMETRIC MEAN OF RATE OF RETURN
CLASSIFICATIONS OF KURTOSIS: This is done by computing the nth root of the product of n number of
1. Mesokurtic – Normal distribution numerical values minus 1.
- The distribution of data wherein more items are clustered
around a central value and there will be a few extremes.
- The mean = median = mode Where:
2. Leptokurtic – A distribution that is more peaked than the normal
3. Platykurtic – The one that is flatter than the normal ŔG =√ x1 x2 x 3 x 4 … x n−1 x 1=1+r
n
r = growth rate on the given
period
Kurtosis for Ungrouped Data Kurtosis for Grouped Data
4 USES OF THE GEOMETRIC MEAN RATE OF RETURN:
∑ f (x− x́) ∑ f ( x− x́)4 1. An alternative measure of average if the arithmetic mean
Ku= Ku= overestimates or underestimates the average of the data.
n s4 n s4 2. Commonly used for data such as periodic percentage
increase/decrease.
Ku = 0 Mesokurtic 3. Widely used to measure correct average in the periodic growth
Ku > 0 Leptokurtic rate
Ku < 0 Platykurtic
s GEOMETRIC MEAN
Illustration: - Calculated by raising the product of series of numbers to the
Below are the reading temperature in 12 cities in the country during inverse of the total length of the series.
summer: - Used in finance to calculate average growth rates and is referred
72, 74, 75, 77, 78, 79, 82, 85, 86, 90, 93, 94 to as the compounded annual growth rate.
- Most useful when numbers in the series are not independent of
Solve for the kurtosis then interpret the result. each other or if numbers tend to make large fluctuations.
- Applications of the geometric mean are most common in business
PPT10: THE MEASURES OF CENTRAL TENDENCY and finance, where it is commonly used when dealing with
percentages to calculate growth rates and returns on portfolio of
MEAN OF UNGROUPED DATA securities.
The sum of the values divided by the number of values--often - Geometric mean is also used in certain financial and stock market
called the "average." indexes, such as Financial Times' Value Line Geometric index.
Add all of the values together.
Divide by the number of values to obtain the mean. Growth Rates Examples:
1. Consider a stock that grows by 10% in year one, declines by 20%
Illustration: in year two, grows 9% in year three, and grows by 30% in year
The mean of 7, 12, 24, 20, 19 is 16.4 four. The geometric mean of the growth rate is calculated as:
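The growth-rate example (10%, -20%, 9%, 30%) can be checked with a minimal Python sketch. It assumes the geometric mean rate of return is the nth root of the product of the growth factors 1 + r, minus 1, as described in these notes; the function name `geometric_mean_return` is illustrative and not from the source.

```python
import math

def geometric_mean_return(rates):
    """Geometric mean rate of return for per-period growth rates (decimals)."""
    factors = [1 + r for r in rates]          # x_i = 1 + r_i
    product = math.prod(factors)              # x_1 * x_2 * ... * x_n
    return product ** (1 / len(factors)) - 1  # nth root of the product, minus 1

rates = [0.10, -0.20, 0.09, 0.30]             # yearly growth rates of the stock
gm = geometric_mean_return(rates)
am = sum(rates) / len(rates)                  # arithmetic mean, for comparison

print(f"geometric mean:  {gm:.4f}")           # about 0.0567
print(f"arithmetic mean: {am:.4f}")           # 0.0725
```

The arithmetic mean (7.25%) overstates the compounded growth, which is the situation in which the notes recommend the geometric mean.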
To find the Mode:
1. Calculate the frequencies for all of the values in the data.
2. The mode is the value (or values) with the highest frequency.

Below are the reading temperatures in 12 cities in the country during
the summer:
72, 74, 75, 77, 78, 79, 82, 85, 86, 90, 93, 94
APPLICATION
1. The mean grade of the first-year, second-year, and third-year
students in Math in a particular high school is 80. There are 32
first-year, 29 second-year, and 38 third-year students in this group.
If the averages are 84 and 80 for the first-year and second-year
students, respectively, find the average of the third-year students
of this high school.
2. A statistics instructor computes final grades based on quizzes,
long tests, and final exam giving them weights of 2, 3, and 5,
respectively. If a student had grades of 87, 91, and 88, for
quizzes, long tests, and final exam, respectively, find the student’s
final grade in Statistics.
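Both application problems rest on the weighted mean, Σ(w·x) / Σw. A short Python sketch follows, with the helper name `weighted_mean` chosen here for illustration. Problem 1 is solved by rearranging the same formula for the unknown group mean.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Problem 2: quizzes, long tests, final exam with weights 2, 3, 5
final_grade = weighted_mean([87, 91, 88], [2, 3, 5])

# Problem 1: overall mean 80 for 32 + 29 + 38 students; the first two
# group means are 84 and 80, the third is unknown.
sizes = [32, 29, 38]
overall_mean = 80
known_means = [84, 80]

total_points = overall_mean * sum(sizes)                        # 80 * 99
known_points = sum(n * m for n, m in zip(sizes, known_means))   # first two groups
third_year_mean = (total_points - known_points) / sizes[2]

print(f"final grade:     {final_grade:.1f}")      # 88.7
print(f"third-year mean: {third_year_mean:.2f}")  # about 76.63
```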
MEASURES OF LOCATION
1. Median
2. Quantiles

QUANTILES
- are also averages of position or location of the desired item.
- (n/2, n/4, 3n/4, n/10, n/100…)

1. Quartiles - divide the distribution into four equal parts.
Quartile One (Q1) – measures the first one-fourth of the
given distribution (n/4)
Quartile Two (Q2) – divides the distribution into half and
therefore will be the same as the median. (n/2)
Quartile Three (Q3) – measures the point separating the
third from the fourth or last quartile. (3n/4)
2. Deciles – divide the distribution into ten equal parts
3. Percentiles – divide the distribution into one hundred equal parts

QUANTILES FOR UNGROUPED DATA
Quantiles are also measures of location.

ESTIMATION
Estimation is the process of deriving the value of the parameter from
the information obtained from a sample. The population is usually large
and the parameter is always unknown, thus only estimates of the true
parameter are derived or obtained from the sample.

CORRELATION ANALYSIS AND REGRESSION ANALYSIS
Correlation Analysis is a statistical technique used in
determining whether a relationship exists between variables of
interest.

Regression Analysis is a statistical method that identifies the
relationship between quantitative variables.
- Once the relationship is established, regression moves on to
the prediction capability of inferential statistics.

HYPOTHESIS TESTING
Hypothesis Testing is a decision-making process for evaluating
claims or beliefs about a population.
The researcher:
1) States the particular hypothesis that will be evaluated;
2) Gives a significance level;
3) Selects a sample from the population;
4) Collects the data;
5) Performs the calculations required for the test; and
6) Makes a probabilistic decision.
TEST OF NORMALITY
Two descriptive techniques to determine if the sample data fits the
normal distribution:
1) Compare the defining summary statistics of the sampled data with
the properties of the normal distribution.
2) Construct the normal probability plot.
$PC = \frac{3(\bar{X} - \text{median})}{s}$

If the index is greater than or equal to +1 or less than or equal to -1, it
can be concluded that the data are significantly skewed.

PC ≤ −1 or PC ≥ +1

Example 1:
A survey of high-tech firms showed the number of days' inventory they
had on hand. Determine if the data are approximately normally
distributed.

5, 29, 34, 44, 45, 63, 68, 74, 74, 81, 88, 91, 97, 98, 113, 118, 151, 158

whisker plot for moderately sized data, and a frequency
histogram or polygon for large data.
3. Determine how the data are distributed in its range – whether
approximately 2/3 of the data lie within 1 standard deviation
about the mean, 4/5 of the data are within 1.28 standard deviations
about the mean, and approximately 19/20 of the data are within
2 standard deviations about the mean.

WHAT IS A BOX AND WHISKER PLOT?
Box and whisker plot
- Also called a box plot, it displays the five-number summary of a set
of data. The five-number summary is the minimum, first quartile,
median, third quartile, and maximum.
- In a box plot, we draw a box from the first quartile to the third
quartile. A vertical line goes through the box at the median. The
whiskers go from each quartile to the minimum or maximum.
81, 148, 152, 135, 151, 152, 159, 142, 34, 162, 130, 162, 163, 143, 67,
112, 70
Please note that box and whisker plots can be drawn either vertically
or horizontally.
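The descriptive normality checks in this section can be sketched in Python for the Example 1 inventory data: the Pearson coefficient PC = 3(mean − median)/s, the share of data within 1, 1.28, and 2 standard deviations of the mean, and the five-number summary behind a box plot. The sample standard deviation is assumed, and the quartiles use the default "exclusive" method of `statistics.quantiles`, which may differ slightly from the convention used in class.

```python
import statistics

days = [5, 29, 34, 44, 45, 63, 68, 74, 74, 81, 88, 91,
        97, 98, 113, 118, 151, 158]

mean = statistics.mean(days)
median = statistics.median(days)
s = statistics.stdev(days)                 # sample standard deviation (assumption)

# Pearson coefficient of skewness and the +/-1 decision rule
pc = 3 * (mean - median) / s
significantly_skewed = pc <= -1 or pc >= 1

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * s for x in days) / len(days)

# Compare with roughly 2/3, 4/5, and 19/20 for normal data
shares = {k: share_within(k) for k in (1, 1.28, 2)}

# Five-number summary for a box plot
q1, q2, q3 = statistics.quantiles(days, n=4)
five_number = (min(days), q1, q2, q3, max(days))

print(f"PC = {pc:.3f}, significantly skewed: {significantly_skewed}")
print(shares)
print(five_number)
```

Here PC is about 0.15, well inside the ±1 band, so this check gives no evidence of significant skewness.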
Because of the symmetry of the standard normal curve, O1 and On
will have the same numerical value but opposite signs: O1 will
be negative and On will be positive.
Illustration:
Shop Sales
699 996
836 1037
915 1085
921 1119
978 1208
983 1223
Compute for O1, O2, O3, …, On: (proportion or area under the
curve)
O1st = 1/13 = 0.0769
O2nd = 2/13 = 0.1538
O3rd = 3/13 = 0.2308
…
O11th = 11/13 = 0.8462
O12th = 12/13 = 0.9231
O1 = -1.43
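The inverse normal scores for the 12 shop sales can be reproduced with a short Python sketch. It assumes each Oi is the standard normal value with area i/(n + 1) below it, and uses `statistics.NormalDist().inv_cdf` for the inverse of the standard normal cumulative distribution.

```python
from statistics import NormalDist

sales = [699, 836, 915, 921, 978, 983, 996, 1037, 1085, 1119, 1208, 1223]

ordered = sorted(sales)
n = len(ordered)
std_normal = NormalDist()   # mean 0, standard deviation 1

# Area below the i-th ordered value: i / (n + 1)
areas = [i / (n + 1) for i in range(1, n + 1)]

# Standardized normal quantile value O_i for each area
scores = [std_normal.inv_cdf(p) for p in areas]

for x, p, o in zip(ordered, areas, scores):
    print(f"{x:5d}  {p:.4f}  {o:+.2f}")
```

Plotting the ordered sales against these scores gives the normal probability plot; roughly linear points suggest approximately normal data.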
HYPOTHESIS TESTING
- Is a decision-making process for evaluating claims about a
population.
- In hypothesis testing, the researcher must:
• define the population under study;
• state the particular hypotheses that will be investigated;
• give the significance level;
• select a sample from the population;
• collect the data;
• perform the calculations required for the statistical test; and
• reach a conclusion.
In Panel B, the points are nonlinear: they seem to rise
more steeply at first and then less steeply at the end. This pattern is
indicative of the elongated tail of the left-skewed data in Table 3.2.2.

On the other hand, in Panel C, the nonlinear pattern of the points seems
to rise less steeply at first and then more steeply. The steepness at the
right is indicative of the elongated right tail of the skewed data in Table
3.2.2.

TO CONSTRUCT THE NORMAL PROBABILITY PLOT
Perform the Inverse Normal Scores Transformation
by converting the ordered data set x1, x2, x3, …, xn to the
corresponding standardized normal quantile values O1, O2, O3,
…, On, where Oi is the value below which the proportion i/(n + 1)
of the area under the standard normal curve lies.

STATISTICAL HYPOTHESIS
• A Statistical Hypothesis is a statement, assertion, conjecture, or
claim about the nature of a population. It is the basic object of any
experimental inquiry.
• A Statistical Hypothesis is an assumption or statement, which
may or may not be true, concerning one or more populations.

Consider the following:
1. At one time it was thought that middle children are
insecure compared to the first or last child.
2. Females had inferior intelligence compared to males.
3. The mean life span of man is 50 years.
4. The average daily income of a family is P345.
5. Researchers at George Washington University and the National
Institutes of Health claim that approximately 75% of the people
believe tranquilizers work very well to make a person calmer and
more relaxed.
6. Records of a certain hospital showed that the distribution of length
of stay of its patients is normal with a mean of 11.5 days and a
standard deviation of 2 days.
6. μ1 − μ2 ≠ 0
7. μ1 > μ2
8. μ1 < μ
9. The new educational program has a poor effect on the
achievement of the elementary pupils.
10. The use of multimedia as an instructional material in classroom
teaching is highly significant towards student behavior inside the
classroom.
Step 2. Specify the level of significance to be used.
Step 3. Determine the critical region.
Step 4. Select an appropriate test statistic and determine the critical
value of the test statistic.
CRITICAL VALUES OF Z
D. Hypothesis Tests About the Difference Between Two
Population Means for Large (n1, n2 > 30) and Independent
Samples.

GUIDELINES IN HYPOTHESIS TESTING:
1. State the null and alternative hypotheses (H0 and H1).
2. Choose the level of significance.
3. Determine the critical region.
4. Choose the statistical test appropriate to test the hypothesis.
5. Compute the value of the statistical test.
6. Make a decision.
Reject H0 if the test statistic has a value in the critical region;
otherwise, do not reject H0.
7. Interpret and discuss the result.
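The guidelines can be traced in a small Python sketch for a test about the difference between two population means with large, independent samples. The summary statistics below are hypothetical, invented only for illustration. The test statistic z = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2) is the usual large-sample form and is an assumption here, since the notes do not state the formula.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the notes)
n1, mean1, sd1 = 40, 82.0, 6.0
n2, mean2, sd2 = 50, 79.0, 7.0

# Step 1: H0: mu1 - mu2 = 0  versus  H1: mu1 - mu2 != 0 (two-tailed)
# Step 2: level of significance
alpha = 0.05

# Step 3: critical region is |z| >= z_crit
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

# Steps 4-5: choose and compute the test statistic
z = (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Step 6: decision
reject_h0 = abs(z) >= z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

Step 7, interpreting the result, remains a judgment in context: with these made-up numbers the test statistic falls in the critical region, so H0 would be rejected at the 5% level.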