VARIABLES
unit of measurement is arbitrary & there is no absolute zero level where none of a given characteristic is present
4. THE RATIO SCALE
similar to interval scale but has an absolute zero & multiples are meaningful
CHAPTER 2
RAW DATA
have not been manipulated or treated in any way beyond their original collection
will not be arranged or organized in any meaningful manner
WHEN RAW DATA ARE QUANTITATIVE, 2 WAYS TO ADDRESS THIS PROBLEM:
1. Frequency Distribution
2. Histogram
FREQUENCY DISTRIBUTION
A table that divides the data values into classes & shows the number of observed values that fall into each class
RELATIVE FREQUENCY
- Frequency of the ith class / total frequency
- (Frequency of the ith class / total frequency) x 100 (as a percentage)
CUMULATIVE FREQUENCY
- Add the frequency of each class to the cumulative frequency of the class before it
CUMULATIVE RELATIVE FREQUENCY
- Cumulative frequency / total frequency
- (Cumulative frequency / total frequency) x 100 (as a percentage)
HISTOGRAM
Describes a frequency distribution by using a series of adjacent rectangles, each of which has a length that is proportional to the frequency of the observations within the range of values it represents
Represents a quantitative data set.
y-axis: Frequency ; x-axis: Classes
GUIDELINES FOR THE FREQUENCY DISTRIBUTION
The set of classes must be mutually exclusive. There should be no overlap between classes & limits.
The set of classes must be exhaustive. No data values should fall outside the range covered by the frequency distribution.
If possible, the classes should have equal widths. Unequal class widths make it difficult to interpret both frequency distributions & their graphical presentations.
Selecting the number of classes to use is a subjective process.
Whenever possible, class widths should be round numbers.
If possible, avoid using open-end classes. These are classes with either no lower limit or no upper limit.
STEPS IN CREATING A FREQUENCY DISTRIBUTION TABLE
1. Find the highest and lowest value.
2. Find the range
- Highest value – lowest value
3. # of classes (number of rows/distinct classes/groupings): Sturges’ Rule
- K = 1 + 3.322(log10 n)
4. Class width (thickness of a class):
- Width = Range / K
- Classes should not overlap
- Width should be the same
- All raw data should be slotted in a class
5. Lower limit of 1st class (at or below the lowest value in the data)
6. Upper limit (add the width to the lower limit)
7. Define the classes
- First class includes the lowest value in the data
8. Interpretation
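The table-building steps above can be sketched in Python. This is a minimal sketch under stated assumptions: Sturges' K is rounded up to a whole number, the class width is rounded up to a whole number (following the round-number guideline), and each class runs from its lower limit up to, but not including, its upper limit. `frequency_table` and `ClassRow` are illustrative names, and the 20 data values are made up for the example.

```python
import math
from typing import List, NamedTuple


class ClassRow(NamedTuple):
    lower: float         # lower class limit (included in the class)
    upper: float         # upper class limit (excluded): "lower to under upper"
    freq: int            # class frequency
    rel_freq: float      # frequency / total frequency
    cum_freq: int        # frequency of this class plus all classes before it
    cum_rel_freq: float  # cumulative frequency / total frequency


def frequency_table(data: List[float]) -> List[ClassRow]:
    n = len(data)
    lowest, highest = min(data), max(data)     # step 1: highest and lowest value
    data_range = highest - lowest               # step 2: range
    k = math.ceil(1 + 3.322 * math.log10(n))    # step 3: Sturges' rule, rounded up (assumption)
    width = max(1, math.ceil(data_range / k))   # step 4: width = range / K, rounded up
    # Keep the classes exhaustive: the highest value must land inside the last class.
    while lowest + k * width <= highest:
        k += 1
    rows: List[ClassRow] = []
    cum = 0
    for i in range(k):                          # steps 5-7: lower limit, upper limit, classes
        lower = lowest + i * width
        upper = lower + width
        f = sum(1 for x in data if lower <= x < upper)
        cum += f
        rows.append(ClassRow(lower, upper, f, f / n, cum, cum / n))
    return rows


# Illustrative data (20 made-up values)
data = [23, 27, 31, 35, 38, 41, 42, 45, 47, 50,
        52, 55, 56, 58, 61, 63, 67, 70, 74, 79]
for row in frequency_table(data):
    print(f"{row.lower} to under {row.upper}: f = {row.freq}, "
          f"rel = {row.rel_freq:.2f}, cum = {row.cum_freq}, "
          f"cum% = {row.cum_rel_freq * 100:.0f}")
```

With n = 20, Sturges' rule gives K = 1 + 3.322(1.301) ≈ 5.32, rounded up here to 6 classes of width 10 (56 / 6 ≈ 9.33, rounded up), starting at 23. Rounding K to the nearest whole number instead (5 classes) is also common; as the guidelines say, the choice is subjective.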
CLASS
Each category of the frequency distribution
FREQUENCY
The number of data values falling within each class
CLASS LIMITS
The boundaries for each class
These determine which data values are assigned to that class
CLASS INTERVAL
The width of each class
This is the difference between the lower limit of the class & the lower limit of the next higher class
When a frequency distribution is to have equally wide classes, the approximate width of each class is:
Approximate class width = (highest value – lowest value) / number of classes desired
CLASS MARK
The midpoint of each class
Midway between the upper & lower class limits
FREQUENCY POLYGON
Consists of line segments connecting the points formed by the intersections of the class marks with the class frequencies
Relative frequencies or percentages may also be used in constructing the figure
Empty classes are included at each end so the curve will intersect the horizontal axis
y-axis: relative frequency ; x-axis: classes / class marks (midpoints)
OGIVE
A graphical display providing cumulative values for frequencies, relative frequencies or percentages
These values can either be “greater than” or “less than”
y-axis: percentage of observations less than the upper limit of each class ; x-axis: upper class limits
STEM-AND-LEAF DISPLAY
A variant of the frequency distribution, uses a subset of the original digits as class descriptors
A given data value can be represented only once in a display
All values must be expressed in terms of the same stem digit & the same leaf digit
Shows just 2 figures for each data value
If placed horizontally, looks like a histogram
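A stem-and-leaf display can be built by splitting each value into a stem and a single leaf digit. The sketch below assumes non-negative whole numbers, with the tens (and higher) digits as the stem and the units digit as the leaf. `stem_and_leaf` is an illustrative name, and the data are made up.

```python
from collections import defaultdict
from typing import Dict, List


def stem_and_leaf(data: List[int]) -> Dict[int, List[int]]:
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Assumes non-negative integers; every value is represented exactly once.
    """
    display: Dict[int, List[int]] = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    # Keep empty stems between the smallest and largest so gaps in the data stay visible.
    return {s: display.get(s, []) for s in range(min(display), max(display) + 1)}


data = [23, 27, 31, 35, 38, 41, 42, 45, 47, 50,
        52, 55, 56, 58, 61, 63, 67, 70, 74, 79]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 2 | 3 7
# 3 | 1 5 8
# 4 | 1 2 5 7
# 5 | 0 2 5 6 8
# 6 | 1 3 7
# 7 | 0 4 9
```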
DOT PLOT
Displays each data value as a dot & allows us to readily see the shape of the distribution as well as high & low values
BAR CHART
Represents frequencies according to the relative lengths of a set of rectangles, but it differs in 2 respects from the histogram:
- (1) the histogram is used in representing quantitative data, while the bar chart represents qualitative data
- (2) adjacent rectangles in the histogram share a common side, while those in the bar chart have a gap between them
y-axis: classes; x-axis: frequency
LINE GRAPH
Is capable of simultaneously showing values of 2 quantitative variables (y, or vertical axis & x, or horizontal axis); it consists of linear segments connecting points observed or measured for each variable
y-axis: frequency ; x-axis: classes
PIE CHART
Is a circular display divided into sections based on either the number of observations within or the relative values of the segments
It can be constructed by using the principle that a circle contains 360 degrees
The angle for each piece can be calculated as follows:
Angle = (frequency of the category / total frequency) x 360°
SIMPLE TABULATION
Deals with 1 variable
Count of Engine
Engine         Total
1              17
2              13
Grand Total    30
CROSS TABULATION
Deals with 2 or more variables
                 Air-conditioning
Engine           1      2      Grand Total
1                8      9      17
2                5      8      13
Grand Total      13     17     30
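The simple and cross tabulations above are plain counts, so they can be reproduced by counting records. The pie-chart angles then follow from each category's share of 360 degrees. In this sketch, the 30 (engine, air-conditioning) records are hypothetical, rebuilt from the cell counts of the cross tabulation, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical raw records (engine, air_conditioning), rebuilt from the cross-tabulation
# cells: 8 x (1, 1), 9 x (1, 2), 5 x (2, 1), 8 x (2, 2).
records: List[Tuple[int, int]] = [(1, 1)] * 8 + [(1, 2)] * 9 + [(2, 1)] * 5 + [(2, 2)] * 8


def tabulate(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Count how often each value occurs (single values, or tuples for 2+ variables)."""
    return dict(sorted(Counter(values).items()))


def pie_angles(counts: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Angle = (frequency / total frequency) x 360 degrees."""
    total = sum(counts.values())
    # Multiply before dividing so whole-degree answers come out exact.
    return {category: count * 360 / total for category, count in counts.items()}


engine_counts = tabulate(engine for engine, _ in records)  # simple tabulation
cross_tab = tabulate(records)                               # cross tabulation cells
ac_totals = tabulate(ac for _, ac in records)               # column (air-conditioning) totals
print(engine_counts)              # {1: 17, 2: 13}
print(cross_tab)                  # {(1, 1): 8, (1, 2): 9, (2, 1): 5, (2, 2): 8}
print(ac_totals)                  # {1: 13, 2: 17}
print(pie_angles(engine_counts))  # {1: 204.0, 2: 156.0}
```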
PICTOGRAM
Can describe frequencies or other values of interest
Can be misleading
THE SCATTER DIAGRAM OR SCATTERPLOT
2-dimensional dot plot
Each point in the diagram represents a pair of known or observed values of 2 variables, generally referred to as y and x, with y represented along the vertical axis and x represented along the horizontal axis.
2 variables are referred to as:
- dependent (y) variable
- independent (x) variable
A direct (positive) linear relationship between the variables
- Positive slope
- X & Y increase together
An inverse (negative) linear relationship between the variables
- Negative slope
- Y decreases as X increases
CHAPTER 3
ARITHMETIC MEAN
Arithmetic average / mean
Sum of the data divided by the number of observations
Most common measure of central tendency
GROUPED DATA
μ = Σfimi/Σfi
where fi = frequency of the ith class & mi = class mark (midpoint) of the ith class
RAW DATA
- For a population
μ = Σxi/N
- For a sample
x̅ = Σxi/n
THE WEIGHTED MEAN/WEIGHTED AVERAGE
Each data value is weighted according to its relative importance
RAW DATA
- For a population
μ = (Σwixi)/(Σwi)
- For a sample
x̅ = (Σwixi)/(Σwi)
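The three mean formulas above can be compared side by side. This is a small sketch with illustrative names and made-up data. The grouped version treats every value in a class as if it sat at the class mark, so it only approximates the raw-data mean.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Raw data: Σxi / n (the same arithmetic gives μ = Σxi / N for a whole population)."""
    return sum(values) / len(values)


def grouped_mean(freqs: Sequence[int], marks: Sequence[float]) -> float:
    """Grouped data: Σfi·mi / Σfi, treating every value in a class as its class mark mi."""
    return sum(f * m for f, m in zip(freqs, marks)) / sum(freqs)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: Σwi·xi / Σwi."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Illustrative data: 20 made-up values, and the same values grouped into the classes
# 23-under 33, 33-under 43, ..., 73-under 83 (class marks 28, 38, ..., 78).
data = [23, 27, 31, 35, 38, 41, 42, 45, 47, 50,
        52, 55, 56, 58, 61, 63, 67, 70, 74, 79]
freqs = [3, 4, 4, 4, 3, 2]
marks = [28, 38, 48, 58, 68, 78]

print(mean(data))                       # 50.7 (exact mean of the raw values)
print(grouped_mean(freqs, marks))       # 51.0 (approximation from the grouped table)
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```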
THE MEDIAN
Value that has just as many values above it as below it
GROUPED DATA
Md = L + [(N/2 – Cfp) / fmed] x w
where L = lower limit of the median class, Cfp = cumulative frequency of the classes preceding the median class, fmed = frequency of the median class & w = class width
RAW DATA (values arranged in order)
- For an odd number of values, the median is the middle value, at position (N+1)/2
- For an even number of values, the median is halfway between the 2 values in the middle, at positions N/2 & N/2 + 1
THE MODE
RAW DATA
- A value that occurs with the greatest frequency
- When there are 2 modes, a distribution of values is referred to as bimodal.
- There can be no mode
GROUPED DATA
- Midpoint of modal class
- Modal class has the greatest frequency
COMPARISON OF THE MEAN, MEDIAN AND MODE
Mean
- gives equal consideration to even very extreme values in the data
- influenced by one or two very low or high values
- makes more complete use of the data
Median
- tends to focus more closely on the values in the middle of the data array
Mode
- there can be more than one mode, but only one value for the mean and the median
- tends to be less useful than the mean & median as a measure of central tendency
RANGE
Simplest measure of dispersion
Difference between the highest and lowest values
Has the weakness of being able to recognize only the extreme values in the data
If there is even one very extreme value, the range can be a large number that is misleading to the unwary
MIDRANGE
Variant of the range
The average of the lowest data value and highest data value
Vulnerable to the effects of extremely low or high outliers in the data
Sometimes used as a very approximate measure of central tendency
QUARTILES
Separate the data into 4 equal-size groups in order of numerical value
Position of the first quartile: Q1 = (N+1)/4
Position of the second quartile (the median): Q2 = 2(N+1)/4
Position of the third quartile: Q3 = 3(N+1)/4
INTERQUARTILE RANGE
Difference between the third quartile & the first quartile
Distance between the 75% and 25% values
IQR = Q3 – Q1
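The sketch below implements the raw-data rules above and the grouped-data median formula. The median and quartiles are located by position in the ordered data, the mode by counting, and the range and midrange from the extremes. Assumptions: a fractional position such as (N+1)/4 = 5.25 is handled by linear interpolation between the neighboring values, which is one common convention (software packages differ), and a data set in which every value occurs once is treated as having no mode. Function names and data are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def value_at_position(sorted_data: Sequence[float], pos: float) -> float:
    """Value at a 1-based position in ordered data, interpolating fractional positions."""
    whole = math.floor(pos)
    frac = pos - whole
    if whole < 1:                        # position before the first value
        return sorted_data[0]
    if whole >= len(sorted_data):        # position at or past the last value
        return sorted_data[-1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + frac * (above - below)


def median(data: Sequence[float]) -> float:
    """Middle value (odd N) or halfway between the 2 middle values (even N)."""
    s = sorted(data)
    return value_at_position(s, (len(s) + 1) / 2)


def quartiles(data: Sequence[float]) -> Tuple[float, float, float]:
    """Q1, Q2, Q3 located at positions (N+1)/4, 2(N+1)/4 and 3(N+1)/4."""
    s = sorted(data)
    n = len(s)
    q1, q2, q3 = (value_at_position(s, q * (n + 1) / 4) for q in (1, 2, 3))
    return q1, q2, q3


def modes(data: Sequence[float]) -> List[float]:
    """Value(s) occurring with the greatest frequency; [] when every value occurs once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def grouped_median(lowers: Sequence[float], freqs: Sequence[int], width: float) -> float:
    """Md = L + [(N/2 - Cfp) / fmed] x w for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency distribution")
    cfp = 0                               # cumulative frequency of the preceding classes
    for lower, f in zip(lowers, freqs):
        if cfp + f >= n / 2:              # median class: cumulative frequency first reaches N/2
            return lower + (n / 2 - cfp) / f * width
        cfp += f
    raise ValueError("frequencies do not reach N/2")


def data_range(data: Sequence[float]) -> float:
    return max(data) - min(data)


def midrange(data: Sequence[float]) -> float:
    return (max(data) + min(data)) / 2


data = [23, 27, 31, 35, 38, 41, 42, 45, 47, 50,
        52, 55, 56, 58, 61, 63, 67, 70, 74, 79]   # illustrative values
q1, q2, q3 = quartiles(data)
print(median(data))                      # 51.0
print(q1, q2, q3)                        # 38.75 51.0 62.5
print(q3 - q1)                           # 23.75 (interquartile range)
print(data_range(data), midrange(data))  # 56 51.0
print(modes([3, 5, 5, 7, 7, 9]))         # [5, 7] (bimodal)
print(grouped_median([23, 33, 43, 53, 63, 73], [3, 4, 4, 4, 3, 2], 10))  # 50.5
```

The grouped-data median (50.5) is close to, but not the same as, the raw-data median (51.0). The difference arises because the grouped formula only knows how many values fall in each class, not where they sit inside it.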
DISTRIBUTION SHAPE AND MEASURES OF CENTRAL TENDENCY
SYMMETRICAL DISTRIBUTION: left and right sides of the distribution are mirror images of each other
NORMAL DISTRIBUTION: single mode, bell-shaped
SKEWNESS: refers to the tendency of the distribution to “tail off” to the right or left
- SYMMETRICAL DISTRIBUTION: mean, median & mode are the same (only true for a unimodal distribution, not possible for bimodal distributions)
- POSITIVELY SKEWED DISTRIBUTION: mean is greater than the median, which in turn is greater than the mode
*median will tend to be a better measure of central tendency than the mean
- NEGATIVELY SKEWED DISTRIBUTION: mean is less than the median, which in turn is less than the mode
*median is less influenced by extreme values and tends to be a better measure of central tendency than the mean
QUARTILE DEVIATION
One-half the interquartile range
QD = (Q3 – Q1)/2
RESIDUALS
Difference between each data value and the group mean
- Values to the left of (below) the mean have negative residuals (< 0)
- Values to the right of (above) the mean have positive residuals (> 0)
GROUPED DATA
- For a population
Residuals = xi – μ
- For a sample
Residuals = xi – x̅
MEAN ABSOLUTE DEVIATION (MAD)
Average deviation or average absolute deviation
The average of the absolute values of differences from the mean
- For a population
MAD = Σ|xi – μ| / N
- For a sample
MAD = Σ|xi – x̅| / n
VARIANCE
Common measure of dispersion, includes all data values
GROUPED DATA
- For a population
σ² = Σ[fi(mi – μ)²] / N
σ² = [Σfi(mi)² – Nμ²] / N
- For a sample
s² = Σ[fi(mi – x̅)²] / (n – 1)
s² = [Σfi(mi)² – n(x̅)²] / (n – 1)
RAW DATA
- For a population
σ² = Σ(xi – μ)² / N
σ² = [Σ(xi)² – Nμ²] / N
- For a sample
s² = Σ(xi – x̅)² / (n – 1)
s² = [Σ(xi)² – n(x̅)²] / (n – 1)
STANDARD DEVIATION
Positive square root of the variance of either a population or a sample
Important measure of dispersion because it is the basis for determining the proportion of data values within certain distances on either side of the mean for certain types of distribution
GROUPED DATA & RAW DATA
- For a population
σ = √σ²
- For a sample
s = √s²
CHEBYSHEV’S THEOREM
When either a population or sample has a small standard deviation, individual observations will tend to be closer to the mean
A large standard deviation will result when individual observations are scattered widely about their mean
Specifies the minimum percentage of observations that will fall within a given number of standard deviations from the mean, regardless of the shape of the distribution
For either a sample or a population, the percentage of observations that fall within k (for k > 1) standard deviations of the mean will be at least
(1 – 1/k²) x 100
THE EMPIRICAL RULE
Standard deviation rule
Applies only to distributions that are bell-shaped & symmetrical
About 68% of the observations will fall within 1 standard deviation of the mean
About 95% of the observations will fall within 2 standard deviations of the mean
Practically all of the observations will fall within 3 standard deviations of the mean
STANDARDIZED DATA
Based on this concept & involves expressing each data value in terms of its distance (in standard deviations) from the mean
Have no units
Shows how far above or below the mean an individual value is, in units of standard deviation
A negative z means the data value falls below the mean
The mean of the standardized values will always be zero (0)
For the population
zi = (xi – μ)/σ
For the sample
zi = (xi – x̅)/s
THE COEFFICIENT OF VARIATION
Indicates the relative amount of dispersion in the data
Enables us to easily compare the dispersion of 2 sets of data that involve different measurement units or differ substantially in magnitude
For a population
CV = (σ/μ) x 100
For a sample
CV = (s/x̅) x 100
CHAPTER 4: DATA COLLECTION AND SAMPLING METHODS
TYPES OF STUDIES
1. Exploratory Research
2. Descriptive Research
3. Causal Research
4. Predictive Research
EXPLORATORY RESEARCH
Helps us become familiar with the problem situation, identify important variables & use these variables to form hypotheses that can be tested in subsequent research
Can also be of a qualitative nature
Understand the problem, identify relevant variables, formulate hypotheses
No prior study has been done
DESCRIPTIVE RESEARCH
Needs an exploratory study; requires more information than exploratory research
Establish reliable measurements
CAUSAL RESEARCH
Determine the relationship among variables
To determine whether one variable has an effect on another
It should be pointed out that statistical techniques alone cannot prove causality
Proof must be established on the basis of quantitative findings along with logic
PREDICTIVE RESEARCH
Needs a causal study
Uses analysis to forecast
Attempts to forecast some situation or value that will occur in the future
THE RESEARCH PROCESS
1. Define the problem
2. Decide on the type of data required
3. Determine through what means the data will be obtained
4. Plan collection of data/select a sample
5. Collect & analyze data
6. Draw conclusions & report findings
7. Make decisions in terms of research
SOURCES OF DATA
PRIMARY DATA
Data generated by the researcher for this study
Survey, experimental, observational research
Tend to require more time & expense than secondary data
SECONDARY DATA
Gathered by someone else for some other purpose
- INTERNAL: sources within the researcher’s organization
- EXTERNAL: sources including governmental, trade, commercial & internet sources
SURVEY RESEARCH
Communication with a sample of individuals in order to generalize on the characteristics of the population from which they were drawn
TYPES OF SURVEYS
THE MAIL SURVEY
A mailed questionnaire is typically accompanied by a cover letter & a postage-paid return envelope for the respondent’s convenience
THE PERSONAL INTERVIEW
An interviewer personally secures the respondent’s cooperation & carries out what could be described as “purposeful conversation” in which the respondent replies to the questions asked.
TELEPHONE INTERVIEW
Similar to the personal interview, but uses the telephone instead of personal interaction.
An interview conducted over the telephone
WEB SURVEY
A questionnaire completed over the internet
QUESTIONNAIRE DESIGN
Also referred to as DATA COLLECTION INSTRUMENT
Either filled out personally by the respondent or administered & completed by an interviewer
May contain 3 types of questions:
- (1) multiple choice – several alternatives to choose from
- (2) dichotomous – only 2 alternatives available
- (3) open-ended – respondent is free to formulate own answer & expand on the subject of the question
Problems with questionnaires
- (1) the vocabulary level may be inappropriate for the type of person being surveyed
- (2) the respondent may assume a frame of reference other than the one the researcher intended
- (3) the question may contain “leading” words or phrases that unduly influence the response
- (4) the respondent may hesitate to answer a question that involves a sensitive topic
ERRORS IN SURVEY RESEARCH
Survey research may lead to several different kinds of errors
Sampling error is a random error
- Nondirectional or nonsystematic, because measurements exhibiting random error are just as likely to be too high as they are to be too low
Response & nonresponse errors
- Directional or systematic type
1. Sampling Error
2. Non-sampling Error
SAMPLING ERROR
Occurs because a sample has been taken instead of a complete census of the population
Determination of the sample size is necessary to have a given level of confidence that the sample proportion will not be in error by more than a specified amount
RESPONSE ERROR
Some respondents may distort the truth (to put it kindly) when answering a question
Biased questions can encourage such response errors
NONRESPONSE ERROR
Not everyone in the sample will cooperate in returning the questionnaire or in answering an interviewer’s questions
Those who respond may be different from those who don’t
EXPERIMENTATION & OBSERVATIONAL RESEARCH
EXPERIMENTS
Purpose: to identify cause-and-effect relationships between variables
2 key variables in an experiment:
- (1) independent variable or treatment
- (2) dependent variable or measurement
EXPERIMENTAL GROUP
Persons or objects receiving a treatment
CONTROL GROUP
Those who are not exposed to the treatment
EXTRANEOUS VARIABLES
Outside variables that are not part of the experiment, but can influence the results
2 KINDS OF VALIDITY:
INTERNAL VALIDITY
Refers to whether the treatment (T) really made the difference in the measurements obtained
EXTERNAL VALIDITY
Even if T did make the difference, asks whether the results can be generalized to other people or settings
SECONDARY DATA
Collected by someone other than the researcher, for purposes other than the problem or decision at hand; such data can be either internal or external
2 TYPES OF SECONDARY DATA:
1. Internal
2. External
INTERNAL SECONDARY DATA
Generated by your own firm or organization
Internal secondary data have traditionally existed in the form of accounting or financial information
Anything in written form that has ever been generated within the company falls within the realm of internal secondary data
EXTERNAL SECONDARY DATA
Gathered by someone outside the firm or organization
SAMPLING METHODS
Can be categorized as probability or nonprobability
PROBABILITY SAMPLING
- Each person or element in the population has some (nonzero) known or calculable chance of being included in the sample
- However, every person or element may not have an equal chance for inclusion
- Allows us to estimate the maximum amount of sampling error between our sample statistic & the true value of the population parameter being estimated
NONPROBABILITY
- Primarily used in exploratory research studies where there is no intention of making statistical inferences from the sample to the population
PROBABILITY SAMPLING
1. Simple random sampling
2. Systematic sample
3. Two-way stratified sample
4. Cluster sample
population with a number, then use a random number table to select those who will make up the sample
THE SYSTEMATIC SAMPLE
We randomly select a starting point between 1 & k, then sample every kth element from the population
PROBLEMS:
- PERIODICITY: a phenomenon where the order in which the population appears happens to include a cyclical variation in which the length of the cycle is the same as the value of k that we are using in selecting the sample
o Not a common problem, but the possibility of its existence should be considered when undertaking a systematic sample
STRATIFIED SAMPLE
The population is divided into layers or strata, then a simple random sample of members from each stratum is selected
Strata members have the same percentage representation in the sample as they do in the population
TWO-WAY STRATIFIED SAMPLE
A sample can also be stratified on the basis of 2 or more variables at the same time; the sample has been forced to take on the exact percentage breakdown as the population in terms of 2 different measurements or characteristics
Forces the composition of the sample to be the same as that of the population, at least in terms of the stratification variable(s) selected
o Important if some strata are likely to differ greatly from others with regard to the variable(s) of interest
CLUSTER SAMPLE
Involves dividing the population into groups, then randomly selecting some of the groups & taking either a sample or census of the members of the groups selected
Members of the population may not have the same probability of inclusion, but these probabilities could be determined if we wished to exert the time & effort
NONPROBABILITY SAMPLING
Not every unit in the population has a chance of being included in the sample & the process involves at least some degree of personal subjectivity instead of following predetermined, probabilistic rules for selection.
4 TYPES OF NONPROBABILITY SAMPLING:
1. Convenience Sample
2. Quota Sample
3. Purposive Sample
4. Judgment Sample
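The systematic and stratified selection rules above can be sketched with Python's `random` module standing in for a random number table. This is a minimal sketch with the following assumptions. k is taken as N // n. The stratified version uses proportional allocation with rounding, so for some populations the strata sizes may not add up to exactly n. All names and the numbered population are illustrative.

```python
import random
from typing import Dict, List, Sequence, TypeVar

Item = TypeVar("Item")


def systematic_sample(population: Sequence[Item], n: int, rng: random.Random) -> List[Item]:
    """Random start between 1 & k, then every kth element (k = N // n)."""
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = rng.randint(1, k)                     # 1-based starting point, k included
    return [population[i - 1] for i in range(start, start + n * k, k)]


def stratified_sample(strata: Dict[str, Sequence[Item]], n: int,
                      rng: random.Random) -> Dict[str, List[Item]]:
    """Simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample: Dict[str, List[Item]] = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)  # same % representation as in the population
        sample[name] = rng.sample(list(members), share)
    return sample


rng = random.Random(2024)                         # seeded so the draw is repeatable
population = list(range(1, 101))                  # N = 100 numbered elements
print(systematic_sample(population, 10, rng))     # 10 elements, each 10 apart
strata = {"A": population[:60], "B": population[60:]}
print({name: len(s) for name, s in stratified_sample(strata, 10, rng).items()})  # {'A': 6, 'B': 4}
```

Because the systematic sample takes every kth element, a list with a cycle of length k would place every selected element at the same point in the cycle. This is the periodicity problem described above.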
μ = population mean
σ = population standard deviation
x̅ = sample mean
n = sample size
z-score for the sampling distribution of the mean, normally distributed population:
z = (x̅ – μ) / σx̅
z = distance from the mean, measured in standard error units
x̅ = value of the sample mean in which we are interested
μ = population mean
σx̅ = standard error of the sampling distribution of the mean, or σ/√n
π = population proportion
p = sample proportion
z-score for a given value of p:
z = (p – π) / σp, where σp = √[π(1 – π)/n]
SAMPLING DISTRIBUTIONS WHEN THE POPULATION IS FINITE
When sampling is without replacement & from a finite population
Whether we are dealing with the sampling distribution of the mean (x̅) or the sampling distribution of the proportion (p), the same correction factor is applied
Depends on the sample size (n) versus the size of the population (N) & as a rule of thumb, should be applied whenever the sample is at least 5% as large as the population
If n < 0.05N, the correction will have very little effect
Finite correction factor is applied when n ≥ 0.05N
Purpose: to reduce the standard error according to how large the sample is compared to the population, arriving at a corrected (reduced) value of the standard error of the sampling distribution
WHEN THE POPULATION IS NOT NORMALLY DISTRIBUTED
Provided that the sample size is large (n ≥ 30), the sampling distribution of the mean can still be assumed to be normal
This is because of what is known as the central limit theorem:
- For large, simple random samples from a population that is not normally distributed, the
sampling distribution of the mean will be
approximately normal, with the mean μx̅ = μ &
the standard error σx̅ = σ/√n. As the sample size