QUANTITATIVE ANALYSIS

CHAPTER 1

BUSINESS STATISTICS
• Collection, summarization, analysis & reporting of numerical findings relevant to a business decision or situation
• Integral part of our lives
2 TYPES OF STATISTICS
1. DESCRIPTIVE STATISTICS
2. INFERENTIAL STATISTICS

DESCRIPTIVE STATISTICS
• Describing the characteristics of a set of data
• Summarize & describe data collected
INFERENTIAL STATISTICS (INDUCTIVE STATISTICS)
• Proceeding from data characteristics to making generalizations, estimates, forecasts, or other judgments based on data
• Arrive at inferences regarding the phenomenon/phenomena for which sample data were obtained

STATISTIC
• Can be a measure of typicalness / central tendency
• Mean, median, mode or proportion
• Data is measured by:
- Typicalness/center
- Spread

HOW TO MEASURE CENTER?
1. Mean: average
2. Median: middle / physical center
3. Mode: value most frequently observed

HOW TO MEASURE SPREAD?
• How far values are from one another
1. Range: Maximum – Minimum
2. Standard Deviation: roughly the average distance between each point and the center

KEY TERMS FOR INFERENTIAL STATISTICS
SAMPLE
• Smaller number (a subset) of the people or objects that exist within the larger population
POPULATION
• The universe, entire set of people or objects of interest
SAMPLE STATISTIC
• Measured characteristic of the sample
• Can be a measure of TYPICALNESS or CENTRAL TENDENCY, such as mean, median, mode or proportion
• May be a measure of spread or dispersion, such as range & standard deviation
POPULATION PARAMETER
• Numerical characteristic of the population
• Typical parameters include mean, median, proportion & standard deviation
• DATA: pieces of information about individuals organized into variables
• INDIVIDUALS: a particular person or group
• VARIABLE: a particular characteristic of an individual (measurable and concrete characteristics to be analyzed)
• DATA SET: a set of data identified with a particular circumstance

TYPES OF VARIABLES AND SCALES OF MEASUREMENT
TYPES OF VARIABLES
1. QUALITATIVE VARIABLES
2. QUANTITATIVE VARIABLES

1. QUALITATIVE VARIABLES/CATEGORICAL VARIABLES
• Person or object belongs in a category
• Also referred to as attributes, typically involve counting how many people or objects will fall into each category
• We describe the percentage or the number of persons or objects falling into each of the possible categories
2. QUANTITATIVE VARIABLES
• Enable us to determine how much of something is possessed, not just whether it is possessed

2 TYPES OF QUANTITATIVE VARIABLES
1. DISCRETE QUANTITATIVE VARIABLE
• Can take only certain values along an interval, with the possible values having gaps between them
• Usually consists of observations that we can count and often have integer values
2. CONTINUOUS QUANTITATIVE VARIABLE
• Can take on a value at any point along an interval, with no gaps between the possible values

SCALES OF MEASUREMENT
1. NOMINAL SCALE
2. ORDINAL SCALE
3. INTERVAL SCALE
4. RATIO SCALE

1. THE NOMINAL SCALE
• Uses numbers only for the purpose of identifying membership in a group or category
• Numbers have no arithmetic meaning
2. THE ORDINAL SCALE
• Numbers represent "greater than" or "less than" measurements such as preferences or rankings
• Numbers are viewed in terms of rank
• Ordinal scale has no unit of measurement
3. THE INTERVAL SCALE
• Not only includes "greater than" or "less than" relationships, but also has a unit of measurement that permits us to describe how much more or less one object possesses than another

• Unit of measurement is arbitrary & there is no absolute zero level where none of a given characteristic is present
4. THE RATIO SCALE
• Similar to interval scale but has an absolute zero & multiples are meaningful

CHAPTER 2
RAW DATA
• Have not been manipulated or treated in any way beyond their original collection
• Will not be arranged or organized in any meaningful manner
WHEN RAW DATA ARE QUANTITATIVE, 2 WAYS TO ADDRESS THIS PROBLEM:
1. Frequency Distribution
2. Histogram
FREQUENCY DISTRIBUTION
• A table that divides the data values into classes & shows the number of observed values that fall into each class
RELATIVE FREQUENCY
• Frequency of the ith class / total frequency
• As a percentage: (Frequency of the ith class / total frequency) x 100
CUMULATIVE FREQUENCY
• Frequency of the class plus the frequencies of all previous classes
CUMULATIVE RELATIVE FREQUENCY
• Cumulative frequency / total frequency
• As a percentage: (Cumulative frequency / total frequency) x 100
HISTOGRAM
• Describes a frequency distribution by using a series of adjacent rectangles, each of which has a length that is proportional to the frequency of the observations within the range of values it represents
• Represents a quantitative data set
• y-axis: frequency ; x-axis: classes
CLASS
• Each category of the frequency distribution
FREQUENCY
• The number of data values falling within each class
CLASS LIMITS
• The boundaries for each class
• These determine which data values are assigned to that class
CLASS INTERVAL
• The width of each class
• This is the difference between the lower limit of the class & the lower limit of the next higher class
• When a frequency distribution is to have equally wide classes, the approximate width of each class is
Width = (highest value – lowest value) / number of classes
CLASS MARK
• The midpoint of each class
• Midway between the upper & lower class limits

GUIDELINES FOR THE FREQUENCY DISTRIBUTION
• The set of classes must be mutually exclusive. There should be no overlap between classes & limits.
• The set of classes must be exhaustive. No data values should fall outside the range covered by the frequency distribution.
• If possible, the classes should have equal widths. Unequal class widths make it difficult to interpret both frequency distributions & their graphical presentations.
• Selecting the number of classes to use is a subjective process.
• Whenever possible, class widths should be round numbers.
• If possible, avoid using open-end classes. These are classes with either no lower limit or no upper limit.
STEPS IN CREATING A FREQUENCY DISTRIBUTION TABLE
1. Find the highest and lowest value.
2. Find the range
- Highest value – lowest value
3. # of classes (number of rows/distinct classes/groupings): Sturges’ Rule
- K = 1 + 3.322 log₁₀(n)
4. Class width (thickness of a class):
- Width = Range / K
- Classes should not overlap
- Width should be the same
- All raw data should be slotted in a class
5. Lower limit of 1st class (at or below the lowest value in the data)
6. Upper limit (add the width to the lower limit)
7. Define the classes
- First class includes the lowest value in the data
8. Interpretation
FREQUENCY POLYGON
• Consists of line segments connecting the points formed by the intersections of the class marks with the class frequencies
• Relative frequencies or percentages may also be used in constructing the figure
• Empty classes are included at each end so the curve will intersect the horizontal axis
• y-axis: relative frequency ; x-axis: classes / class marks (midpoints)
OGIVE
• A graphical display providing cumulative values for frequencies, relative frequencies or percentages
• These values can either be “greater than” or “less than”
• y-axis: percentage of observations less than upper limit of each class ; x-axis: class marks (midpoints)
STEM-AND-LEAF DISPLAY
• A variant of the frequency distribution, uses a subset of the original digits as class descriptors
• A given data value can be represented only once in a display
• All values must be expressed in terms of the same stem digit & the same leaf digit
• Shows just 2 figures for each data value
• If placed horizontally, looks like a histogram
DOT PLOT
• Displays each data value as a dot & allows us to readily see the shape of the distribution as well as high & low values
BAR CHART
• Represents frequencies according to the relative lengths of a set of rectangles, but it differs in 2 respects from the histogram:
- (1) the histogram is used in representing quantitative data, while the bar chart represents qualitative data
- (2) adjacent rectangles in the histogram share a common side, while those in the bar chart have a gap between them
• y-axis: classes ; x-axis: frequency
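The steps for creating a frequency distribution table listed earlier in this chapter can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_distribution` and the sample data are hypothetical, K and the class width are rounded up to whole numbers (the notes do not say how to round), and each class includes its lower limit but not its upper limit.

```python
import math

def frequency_distribution(values):
    """Tabulate values into equal-width classes sized by Sturges' rule."""
    n = len(values)
    lowest, highest = min(values), max(values)      # step 1
    data_range = highest - lowest                   # step 2: range
    k = math.ceil(1 + 3.322 * math.log10(n))        # step 3: Sturges' rule, rounded up (assumption)
    width = max(1, math.ceil(data_range / k))       # step 4: whole-number width (assumption)
    while lowest + k * width <= highest:            # add a class if the highest value would be left out
        k += 1

    table, cumulative = [], 0
    for i in range(k):
        lower = lowest + i * width                  # steps 5-7: class limits
        upper = lower + width                       # upper limit is exclusive in this sketch
        frequency = sum(lower <= v < upper for v in values)
        cumulative += frequency
        table.append({
            "class": (lower, upper),
            "frequency": frequency,
            "relative": frequency / n,
            "cumulative": cumulative,
            "cumulative_relative": cumulative / n,
        })
    return table

# Hypothetical sample of 16 observations
sample = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 38, 41, 45, 47]
table = frequency_distribution(sample)
for row in table:
    print(row)
```

Because the classes are half-open, they are mutually exclusive, and the loop that adds a class when needed keeps them exhaustive, which matches the first two guidelines for the frequency distribution.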
LINE GRAPH
• Is capable of simultaneously showing values of 2 quantitative variables (y, or vertical axis & x, or horizontal axis); it consists of linear segments connecting points observed or measured for each variable
• y-axis: frequency ; x-axis: classes
PIE CHART
• Is a circular display divided into sections based on either the number of observations within or the relative values of the segments
• It can be constructed by using the principle that a circle contains 360 degrees
• The angle for each piece can be calculated as follows
Angle = (value of the segment / total of all segments) x 360°

SIMPLE TABULATION
• Deals with 1 variable

Count of Engine
Engine         Total
1              17
2              13
Grand Total    30

CROSS TABULATION
• Deals with 2 or more variables

               Air-conditioning
Engine         1      2      Grand Total
1              8      9      17
2              5      8      13
Grand Total    13     17     30
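The pie chart angle rule and the two tabulations above can be reproduced with a short Python sketch. The function names `pie_angles` and `cross_tabulate` are hypothetical, and the individual records are an assumption: they are rebuilt from the cell counts of the cross tabulation, since the notes do not list the raw observations.

```python
from collections import Counter

def pie_angles(counts):
    """Angle in degrees for each category: (count / total) x 360."""
    total = sum(counts.values())
    return {category: count * 360 / total for category, count in counts.items()}

def cross_tabulate(records):
    """Count (row, column) pairs and return cell counts, row totals, column totals, grand total."""
    cells = Counter(records)
    row_totals = Counter(row for row, _ in records)
    col_totals = Counter(col for _, col in records)
    return cells, row_totals, col_totals, len(records)

# Hypothetical (engine, air-conditioning) records rebuilt from the counts in the table
records = [(1, 1)] * 8 + [(1, 2)] * 9 + [(2, 1)] * 5 + [(2, 2)] * 8

cells, engine_totals, aircon_totals, grand_total = cross_tabulate(records)
angles = pie_angles(engine_totals)   # pie chart of the simple tabulation by engine

print(dict(engine_totals), dict(aircon_totals), grand_total)
print(angles)
```

The row totals of the cross tabulation are the same counts as the simple tabulation by engine, so the pie chart angles are computed from them directly.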

PICTOGRAM
• Can describe frequencies or other values of interest
• Can be misleading
THE SCATTER DIAGRAM OR SCATTERPLOT
• 2-dimensional dotplot
• Each point in the diagram represents a pair of known or observed values of 2 variables, generally referred to as y and x, with y represented along the vertical axis and x represented along the horizontal axis
• 2 variables are referred to as:
- dependent (y) variable
- independent (x) variable
• A direct (positive) linear relationship between the variables
- Positive slope
- X & Y are increasing together
• An inverse (negative) linear relationship between the variables
- Negative slope
- Y decreasing & X increasing

CHAPTER 3
ARITHMETIC MEAN
• Arithmetic average / mean
• Sum of the data divided by the number of observations
• Most common measure of central tendency
• GROUPED DATA
μ = Σfᵢmᵢ / Σfᵢ
• RAW DATA
- For a population
μ = Σxᵢ / N
- For a sample
x̅ = Σxᵢ / n
THE WEIGHTED MEAN/WEIGHTED AVERAGE
• Each data value is weighted according to its relative importance
• RAW DATA
- For a population
μ = (Σwᵢxᵢ) / (Σwᵢ)
- For a sample
x̅ = (Σwᵢxᵢ) / (Σwᵢ)
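The three mean formulas above translate directly into code. The following Python sketch is illustrative only: the function names and the example numbers are hypothetical, and the same arithmetic applies whether the data are treated as a population or a sample.

```python
def arithmetic_mean(values):
    """Raw data: sum of the data divided by the number of observations."""
    return sum(values) / len(values)

def grouped_mean(midpoints, frequencies):
    """Grouped data: sum of (frequency x class midpoint) divided by total frequency."""
    return sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)

def weighted_mean(values, weights):
    """Weighted mean: sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical examples
print(arithmetic_mean([2, 4, 6, 8]))            # raw data
print(grouped_mean([5, 15, 25], [2, 3, 5]))     # class midpoints with their frequencies
print(weighted_mean([80, 90], [1, 3]))          # second value counts three times as much
```

The grouped mean is the weighted mean with the class frequencies acting as weights on the class midpoints, which is why the two functions have the same shape.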
THE MEDIAN
• Value that has just as many values above it as below it
• GROUPED DATA
Md = L + [(N/2 – Cfp) / fmed] x w
(L = lower limit of the median class, Cfp = cumulative frequency of the classes preceding it, fmed = frequency of the median class, w = class width)
• RAW DATA
- For an odd number of values, the median is the middle value when the values are arranged in order, at position (N+1)/2
- For an even number of values, the median is halfway between the 2 values in the middle, at positions N/2 and N/2 + 1
THE MODE
• RAW DATA
- A value that occurs with the greatest frequency
- When there are 2 modes, a distribution of values is referred to as bimodal
- There can be no mode
• GROUPED DATA
- Midpoint of modal class
- Modal class has the greatest frequency

COMPARISON OF THE MEAN, MEDIAN AND MODE
• Mean
- Gives equal consideration to even very extreme values in the data
- Influenced by one or two very low or high values
- Makes more complete use of the data
• Median
- Tends to focus more closely on those in the middle of the data array
• Mode
- Can have more than one mode, but only one value for mean and median
- Tends to be less useful than the mean & median as a measure of central tendency

DISTRIBUTION SHAPE AND MEASURES OF CENTRAL TENDENCY
• SYMMETRICAL DISTRIBUTION: left and right sides of the distribution are mirror images of each other
• NORMAL DISTRIBUTION: single mode, bell-shaped
• SKEWNESS: refers to the tendency of the distribution to “tail off” to the right or left
- SYMMETRICAL DISTRIBUTION: mean, median & mode are the same (only true for unimodal distributions, not possible for bimodal distributions)
- POSITIVELY SKEWED DISTRIBUTION: mean is greater than the median, which in turn is greater than the mode
*median will tend to be a better measure of central tendency than the mean
- NEGATIVELY SKEWED DISTRIBUTION: mean is less than the median, which in turn is less than the mode
*median is less influenced by extreme values and tends to be a better measure of central tendency than the mean

RANGE
• Simplest measure of dispersion
• Difference between the highest and lowest values
• Weakness of being able to recognize only the extreme values in the data
- If there is even one very extreme value, the range can be a large number that is misleading to the unwary
MIDRANGE
• Variant of the range
• The average of the lowest data value and highest data value
• Vulnerable to the effects of extremely low or high outliers in the data
• Sometimes used as a very approximate measure of central tendency

QUARTILES
• Separate the data into equal-size groups in order of numerical value
• First quartile
Position of Q1 = (N+1)/4
• Second quartile (the median)
Position of Q2 = 2(N+1)/4
• Third quartile
Position of Q3 = 3(N+1)/4

INTERQUARTILE RANGE
• Difference between the third quartile & the first quartile
• Distance between 75% and 25% values
Q3 – Q1

QUARTILE DEVIATION
• One-half the interquartile range
(Q3 – Q1)/2

RESIDUALS
• Difference between each data value and the group mean
- Left of the mean: negative (< 0)
- Right of the mean: positive (> 0)
• GROUPED DATA
- For a population
Residuals = xᵢ – μ
- For a sample
Residuals = xᵢ – x̅

MEAN ABSOLUTE DEVIATION (MAD)
• Average deviation or average absolute deviation
• The average of the absolute values of differences from the mean
- For a population
MAD = Σ|xᵢ – μ| / N
- For a sample
MAD = Σ|xᵢ – x̅| / n

VARIANCE
• Common measure of dispersion, includes all data values
• GROUPED DATA
- For a population
σ² = Σ[fᵢ(mᵢ – μ)²] / N
σ² = [Σfᵢmᵢ² – Nμ²] / N
- For a sample
s² = Σ[fᵢ(mᵢ – x̅)²] / (n – 1)
s² = [Σfᵢmᵢ² – nx̅²] / (n – 1)
• RAW DATA
- For a population
σ² = Σ(xᵢ – μ)² / N
σ² = [Σxᵢ² – Nμ²] / N
- For a sample
s² = Σ(xᵢ – x̅)² / (n – 1)
s² = [Σxᵢ² – nx̅²] / (n – 1)

STANDARD DEVIATION
• Positive square root of the variance of either a population or a sample
• Important measure of dispersion because it is the basis for determining the proportion of data values within certain distances on either side of the mean for certain types of distribution
• GROUPED DATA & RAW DATA
- For a population
σ = √σ²
- For a sample
s = √s²

CHEBYSHEV’S THEOREM
• When either a population or sample has a small standard deviation, individual observations will tend to be closer to the mean
• A large standard deviation will result when individual observations are scattered widely about their mean
• Specifies the minimum percentage of observations that will fall within a given number of standard deviations from the mean, regardless of the shape of distribution
• For either a sample or a population, the percentage of observations that fall within k (for k>1) standard deviations of the mean will be at least
(1 – 1/k²) x 100

THE EMPIRICAL RULE
• Standard deviation rule
• Applies only to distributions that are bell-shaped & symmetrical
• About 68% of the observations will fall within 1 standard deviation of the mean
• About 95% of the observations will fall within 2 standard deviations of the mean
• Practically all of the observations will fall within 3 standard deviations of the mean

STANDARDIZED DATA
• Based on this concept & involves expressing each data value in terms of its distance (in standard deviations) from the mean
• Have no units
• Show how far above or below the mean an individual value is, in units of standard deviation
• A negative z means the data value falls below the mean
• Mean will always be zero (0)
• For the population
zᵢ = (xᵢ – μ)/σ
• For the sample
zᵢ = (xᵢ – x̅)/s

THE COEFFICIENT OF VARIATION
• Indicates relative amount of dispersion in the data
• Enables us to easily compare the dispersion of 2 sets of data that involve different measurement units or differ substantially in magnitude
• For a population
CV = (σ/μ) x 100
• For a sample
CV = (s/x̅) x 100

CHAPTER 4: DATA COLLECTION AND SAMPLING METHODS
TYPES OF STUDIES
1. Exploratory Research
2. Descriptive Research
3. Causal Research
4. Predictive Research

EXPLORATORY RESEARCH
• Helps us become familiar with the problem situation, identify important variables & use these variables to form hypotheses that can be tested in subsequent research
• Can also be of a qualitative nature
• Understand the problem, identify relevant variables, formulate hypothesis
• No prior study has been done
DESCRIPTIVE RESEARCH
• Needs exploratory study, more information than exploratory
• Establish reliable measurements
CAUSAL RESEARCH
• Determine the relationship among variables
• To determine whether one variable has an effect on another
• It should be pointed out that statistical techniques alone cannot prove causality
• Proof must be established on the basis of quantitative findings along with logic

PREDICTIVE RESEARCH
• Needs causal study
• Use analysis to forecast
• Attempt to forecast some situation or value that will occur in the future

THE RESEARCH PROCESS
1. Define the problem
2. Decide on the type of data required
3. Determine through what means the data will be obtained
4. Plan collection of data/select a sample
5. Collect & analyze data
6. Draw conclusion & report findings
7. Make decisions in terms of research

SOURCES OF DATA
PRIMARY DATA
• Data generated by the researcher of this study
• Survey, experimental, observational research
• Tend to require more time & expense than secondary data
SECONDARY DATA
• Gathered by someone else for some other purpose
- INTERNAL: sources within the researcher’s organization
- EXTERNAL: sources including governmental, trade, commercial & internet sources

SURVEY RESEARCH
• Communication with a sample of individuals in order to generalize on the characteristics of the population from which they were drawn

TYPES OF SURVEYS
THE MAIL SURVEY
• A mailed questionnaire is typically accompanied by a cover letter & a postage-paid return envelope for the respondent’s convenience
THE PERSONAL INTERVIEW
• An interviewer personally secures the respondent’s cooperation & carries out what could be described as “purposeful conversation” in which the respondent replies to the questions asked
TELEPHONE INTERVIEW
• Similar to the personal interview, but uses the telephone instead of personal interaction
• An interview conducted over the telephone
WEB SURVEY
• A questionnaire completed over the internet
QUESTIONNAIRE DESIGN
• Also referred to as DATA COLLECTION INSTRUMENT
• Either filled out personally by the respondent or administered & completed by an interviewer
• May contain 3 types of questions:
- (1) multiple choice – several alternatives to choose from
- (2) dichotomous – only 2 alternatives available
- (3) open-ended – respondent is free to formulate own answer & expand on the subject of the questions
• Problems with questionnaires
- (1) the vocabulary level may be inappropriate for the type of person being surveyed
- (2) the respondent may assume a frame of reference other than the one the researcher intended
- (3) the question may contain “leading” words or phrases that unduly influence the response
- (4) the respondent may hesitate to answer a question that involves a sensitive topic

ERRORS IN SURVEY RESEARCH
• Survey research may lead to several different kinds of errors
• Sampling error is a random error
- Nondirectional or nonsystematic, because measurements exhibiting random error are just as likely to be too high as they are to be too low
• Response & nonresponse errors
- Directional or systematic type
1. Sampling Error
2. Non-sampling Error
SAMPLING ERROR
• Occurs because a sample has been taken instead of a complete census of the population
• Determination of the sample size is necessary to have a given level of confidence that the sample proportion will not be in error by more than a specified amount
RESPONSE ERROR
• Some respondents may distort the truth (to put it kindly) when answering a question
• Biased questions can encourage such response errors
NONRESPONSE ERROR
• Not everyone in the sample will cooperate in returning the questionnaire or in answering an interviewer’s questions
• Those who respond may be different from those who don’t

EXPERIMENTATION & OBSERVATIONAL RESEARCH
EXPERIMENTS
• Purpose: to identify cause-and-effect relationships between variables
• 2 key variables in an experiment:
- (1) independent variable or treatment
- (2) dependent variable or measurement
EXPERIMENTAL GROUP
• Persons or objects receiving a treatment
CONTROL GROUP
• Those who are not exposed to the treatment
EXTRANEOUS VARIABLES
• Outside variables that are not part of the experiment, but can influence the results

2 KINDS OF VALIDITY:
INTERNAL VALIDITY
• Refers to whether T (the treatment) really made the difference in the measurements obtained
EXTERNAL VALIDITY
• Even if T did make the difference, asking whether the results can be generalized to other people or settings

SECONDARY DATA
• Collected by someone other than the researcher, for purposes other than the problem or decision at hand; such data can be either internal or external
2 TYPES OF SECONDARY DATA:
1. Internal
2. External
INTERNAL SECONDARY DATA
• Generated by your own firm or organization
• Internal secondary data have traditionally existed in the form of accounting or financial information
• Anything in written form that has ever been generated within the company falls within the realm of internal secondary data
EXTERNAL SECONDARY DATA
• Gathered by someone outside the firm or organization

SAMPLING METHODS
• Can be categorized as probability or nonprobability
• PROBABILITY SAMPLING
- Each person or element in the population has a known (or calculable) chance of being included in the sample
- Allows us to estimate the maximum amount of sampling error between our sample statistic & the true value of the population parameter being estimated
- Each person or element in the population has some (nonzero) known or calculable chance of being included in the sample
- However, every person or element may not have an equal chance for inclusion
• NONPROBABILITY
- Primarily used in exploratory research studies where there is no intention of making statistical inferences from the sample to the population

PROBABILITY SAMPLING
1. Simple random sampling
2. Systematic sample
3. Two-way stratified sample
4. Cluster sample

THE SIMPLE RANDOM SAMPLING
• Every person or element in the population has an equal chance of being included in the sample
• A practical alternative to placing names in a hat or box is to identify each person or element in the population with a number, then use a random number table to select those who will make up the sample
THE SYSTEMATIC SAMPLE
• We randomly select a starting point between 1 & k, then sample every kth element from the population
• PROBLEMS:
- PERIODICITY: a phenomenon where the order in which the population appears happens to include a cyclical variation in which the length of the cycle is the same as the value of k that we are using in selecting the sample
o Not a common problem, but the possibility of its existence should be considered when undertaking a systematic sample
STRATIFIED SAMPLE
• The population is divided into layers or strata, then a simple random sample of members from each stratum is selected
• Strata members have the same percentage representation in the sample as they do in the population
TWO-WAY STRATIFIED SAMPLE
• A sample can also be stratified on the basis of 2 or more variables at the same time; the sample is then forced to take on the exact percentage breakdown as the population in terms of 2 different measurements or characteristics
• Forces the composition of the sample to be the same as that of the population, at least in terms of the stratification variable(s) selected
o Important if some strata are likely to differ greatly from others with regard to the variable(s) of interest
CLUSTER SAMPLE
• Involves dividing the population into groups, then randomly selecting some of the groups & taking either a sample or census of the members of the groups selected
• Members of the population may not have the same probability of inclusion, but these probabilities could be determined if we wished to exert the time & effort

NONPROBABILITY SAMPLING
• Not every unit in the population has a chance of being included in the sample & the process involves at least some degree of personal subjectivity instead of following predetermined, probabilistic rules for selection
4 TYPES OF NONPROBABILITY SAMPLING:
1. Convenience Sample
2. Quota Sample
3. Purposive Sample
4. Judgment Sample

CONVENIENCE SAMPLE
• Members of such samples are chosen primarily because they are both readily available & willing to participate
QUOTA SAMPLE
• This is similar to the stratified probability sample described previously, except that members of the various strata are not chosen through the use of a probability sampling technique
• This type of sample is far inferior to the stratified sample in terms of representativeness
PURPOSIVE SAMPLE
• Members are chosen specifically because they’re not typical of the population
JUDGMENT SAMPLE
• This sample is selected on the basis that the researcher believes the members to be representative of the population

CHAPTER 5: PROBABILITY
EXPERIMENT
• An activity or measurement that results in an outcome
SAMPLE SPACE
• All possible outcomes of an experiment
EVENT
• One or more of the possible outcomes of an experiment; a subset of the sample space
• The probability for any event will be a number between 0 and 1
PROBABILITY
• A number between 0 and 1 that expresses the chance that an event will occur
0 ≤ P(A) ≤ 1
For any event, the probability will be no less than 0 and no greater than 1.
P(A) + P(A’) = 1
Either the event will occur (A) or it will not occur (A’). A’ is called the complement of A.

3 APPROACHES TO PROBABILITIES
1. Classical Approach
2. Relative Frequency Approach
3. Subjective Approach

THE CLASSICAL APPROACH
• Describes a probability in terms of the proportion of times that an event can be theoretically expected to occur
• Classical probabilities find their greatest application in games of chance, but they are not so useful when we are engaged in real-world activities where either:
- (1) the possible outcomes are not equally likely
- (2) the underlying processes are less well known

THE RELATIVE FREQUENCY APPROACH
• Probability is the proportion of times an event is observed to occur in a very large number of trials:
Relative frequency probability
For a very large number of trials,
P(A) = number of trials in which A occurs / total number of trials
• The relative frequency approach to probability depends on what is known as the law of large numbers
LAW OF LARGE NUMBERS
• Over a large number of trials, the relative frequency with which an event occurs will approach the probability of its occurrence for a single trial

THE SUBJECTIVE APPROACH
• The subjective approach to probability is judgmental, representing the degree to which one happens to believe that an event will or will not happen
SUBJECTIVE PROBABILITIES
• Hunches or educated guesses
• Neither based on mathematical theory nor developed from numerical analyses of the frequencies of past events

PROBABILITIES & ODDS
ODDS
• Sometimes used as a way of expressing the likelihood that something will happen
• Chance of the event occurring is ___ times the chance that it will not occur
• Typically expressed in terms of the lowest applicable integers
• Odds against the occurrence of the event are the reverse of the odds in favor
Conversion from odds to probability & vice versa:
1. If the odds in favor of an event happening are A to B, or A:B, the probability being expressed is A / (A + B)
2. If the probability of an event is x (with 0 ≤ x ≤ 1), the odds in favor of the event are “x to (1 – x)” or “x:(1 – x)”

UNIONS & INTERSECTIONS OF EVENTS
MUTUALLY EXCLUSIVE EVENTS
• If one event occurs, the other cannot occur
• An event (A) and its complement (A’) are always mutually exclusive

EXHAUSTIVE EVENTS
• A set of events is exhaustive if it includes all the possible outcomes of an experiment
• The mutually exclusive events A and A’ are exhaustive because one of them must occur
• When the events within a set are both mutually exclusive & exhaustive, the sum of their probabilities is 1.0
- One of them must happen & they include all the possibilities
- Entries within a relative frequency table are mutually exclusive & exhaustive, so their sum will always be 1.0

INTERSECTIONS OF EVENTS
• 2 or more events occur at the same time
• Represented by “A and B”
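The odds conversions and the law of large numbers described earlier in this chapter can be checked numerically. The Python sketch below is only an illustration: the function names are hypothetical, the simulated event is an assumed "success" with a chosen single-trial probability, and the fixed seed is there just to make runs repeatable.

```python
from fractions import Fraction
import random

def odds_to_probability(a, b):
    """Odds in favor of A:B correspond to a probability of A / (A + B)."""
    return Fraction(a, a + b)

def probability_to_odds(x):
    """Probability x gives odds in favor of x : (1 - x), reduced to lowest integers (requires x < 1)."""
    x = Fraction(x).limit_denominator()
    ratio = x / (1 - x)
    return ratio.numerator, ratio.denominator

def relative_frequency(trials, p=0.5, seed=0):
    """Proportion of simulated trials in which an event with single-trial probability p occurs."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(trials))
    return hits / trials

print(odds_to_probability(3, 2))      # odds of 3:2 in favor
print(probability_to_odds(0.25))      # probability of 0.25
for trials in (10, 1_000, 100_000):
    print(trials, relative_frequency(trials))
```

As the number of trials grows, the printed relative frequency should settle near the single-trial probability, which is what the law of large numbers states. The odds against an event are simply the returned pair in reverse order.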
UNION OF EVENTS
• At least 1 of a number of possible events occurs
• Represented by “A or B”

ADDITION RULES FOR PROBABILITY
WHEN EVENTS ARE MUTUALLY EXCLUSIVE
• When events are mutually exclusive, the occurrence of one means that none of the others can occur
• The probability that one of the events will occur is the sum of their individual probabilities
Rule of addition when events are mutually exclusive:
P(A or B) = P(A) + P(B)

WHEN EVENTS ARE NOT MUTUALLY EXCLUSIVE
• When events are not mutually exclusive, 2 or more of them can happen at the same time
• The general rule of addition can be used in calculating probabilities
General rule of addition when events are not mutually exclusive:
P(A or B) = P(A) + P(B) – P(A and B)

MULTIPLICATION RULES FOR PROBABILITY
MARGINAL PROBABILITY
• The probability that a given event will occur
• No other events are taken into consideration
• Typical expression is P(A)

CONDITIONAL PROBABILITY
• The probability that an event will occur, given that another event has already happened
• A typical expression is P(A|B), with the verbal description, “the probability of A, given B”
Conditional probability of event A, given that B has occurred:
P(A|B) = P(A and B) / P(B)

MULTIPLICATION RULES
• Determine the probability that 2 events will both happen or that 3 or more events will all happen
• Depend on whether the events are INDEPENDENT or DEPENDENT

INDEPENDENT EVENTS
• Events are independent when the occurrence of one event has no effect on the probability that another will occur
• When events are independent, their joint probability is the product of their individual probabilities
Multiplication rule when events are independent:
P(A and B) = P(A) x P(B)

DEPENDENT EVENTS
• Events are dependent when the occurrence of one event changes the probability that another will occur
WHEN EVENTS ARE NOT INDEPENDENT
• When events are not independent, the occurrence of one will influence the probability that another will take place
• Under these conditions, a more general multiplication rule applies:
Multiplication rule when events are not independent:
P(A and B) = P(A) x P(B|A)

BAYES’ THEOREM
• Deals with sequential events, using information obtained about a second event to revise the probability that a first event occurred
Bayes’ theorem for the revision of probability:
General form
Probability of event A, given that event B has occurred
P(A|B) = P(A and B) / P(B) = [P(A) x P(B|A)] / [P(A) x P(B|A) + P(A’) x P(B|A’)]

COUNTING: PERMUTATIONS & COMBINATIONS

THE PRINCIPLE OF MULTIPLICATION
• If, following a first event that can happen n₁ ways, a second event can then happen n₂ ways, the total number of ways both can happen is n₁ x n₂
• Alternatively, if each of k independent events can occur in n different ways, the total number of possibilities is nᵏ

FACTORIAL
• Exclamation point is just a mathematical way of saving space
n! = n x (n – 1) x (n – 2) x … x 1
Note: 0! is defined as 1

PERMUTATIONS
• Refer to the number of different ways in which objects can be arranged in order
• In a permutation, each item can appear only once & each order of the items’ arrangement constitutes a separate permutation
• The number of possible arrangements can be determined as follows:
Number of permutations of n objects taken r at a time:
n! / (n – r)!

COMBINATIONS
• Consider only the possible set of objects, regardless of the order in which the members of the set are arranged
• The number of possible combinations of n objects taken r at a time will be as follows:
Number of combinations of n objects taken r at a time:
n! / [r! x (n – r)!]

CHAPTER 6: DISCRETE PROBABILITY DISTRIBUTIONS
PROBABILITY DISTRIBUTIONS
 The relative frequency distribution that should theoretically occur for observations from a given population
 It can be helpful to proceed from (1) a basic understanding of how a natural process seems to operate in generating events to (2) identifying the probability that a given event may occur

RANDOM VARIABLES: DISCRETE VS. CONTINUOUS
RANDOM VARIABLE
 A variable that can take on different values according to the outcome of an experiment
 Random because we don’t know ahead of time exactly what value it will have following the experiment
 Can either be:
- DISCRETE RANDOM VARIABLE
o A random variable that can take on only certain values along an interval, with the possible values having gaps between them
- CONTINUOUS RANDOM VARIABLE
o A random variable that can take on a value at any point along an interval

NATURE OF DISCRETE PROBABILITY DISTRIBUTION
DISCRETE PROBABILITY DISTRIBUTION
 A listing of all possible outcomes of an experiment, along with their respective probabilities of occurrence
 Discrete probability distributions can be expressed as histograms, where the probabilities for the various x values are expressed by the heights of a series of vertical bars
CHARACTERISTICS OF A DISCRETE PROBABILITY DISTRIBUTION
1. For any value of x, 0 ≤ P(x) ≤ 1
2. The values of x are exhaustive: The probability distribution includes all possible values
3. The values of x are mutually exclusive: Only one value can occur for a given experiment
4. The sum of their probabilities is one, or ΣP(x) = 1.0

THE MEAN & VARIANCE OF A DISCRETE PROBABILITY DISTRIBUTION
 Measures of central tendency & dispersion are used to describe typical observations & scatter in the data values
 It can be useful to describe a discrete probability distribution in terms of its central tendency & dispersion
 EXPECTED VALUE: mean of a discrete probability distribution for a discrete random variable
- Referred to as E(x) or μ
- Weighted average of all the possible outcomes, with each outcome weighted according to its probability of occurrence
 VARIANCE: the expected value of the squared difference between the random variable & its mean, or E[(x – μ)²]
General formulas for the mean & variance of a discrete probability distribution
Mean
μ = E(x) or μ = ΣxP(x) for all possible values of x
Variance
σ² = E[(x – μ)²] or σ² = Σ(x – μ)²P(x) for all possible values of x, & the standard deviation is
σ = √σ²

THE BINOMIAL DISTRIBUTION
 Deals with consecutive trials, each of which has 2 possible outcomes
 The binomial distribution relies on what is known as the BERNOULLI PROCESS
 Family of distributions & the exact member of the family is determined by the values of n & π
 The following observations may be made regarding the Bernoulli process & the requirement that the probability of success (π) remain unchanged
1. If sampling is done with replacement (the person or other item selected from the population is observed, then put back into the population), π will remain constant from one trial to the next
2. If sampling is done without replacement & the number of trials (n) is very small compared to the population (N) of such trials from which the sample is taken, as a practical matter π can be considered to be constant from 1 trial to the next
CHARACTERISTICS OF A BERNOULLI PROCESS:
1. There are 2 or more consecutive trials
2. In each trial, there are just 2 possible outcomes, usually denoted as success or failure
3. The trials are statistically independent; that is, the outcome in any trial is not affected by the outcomes of earlier trials & it does not affect the outcomes of later trials
4. The probability of success remains the same from 1 trial to the next
 The discrete random variable, x, is the number of successes that occur in n consecutive trials of the Bernoulli process.
The binomial probability distribution:
The probability of exactly x successes in n trials is
P(x) = [n! / (x! x (n – x)!)] x π^x x (1 – π)^(n – x)
Mean
μ = E(x) = nπ
Variance
σ² = E[(x – μ)²] = nπ(1 – π)
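
The counting rules and the binomial formulas above can be checked with a short Python sketch. It is illustrative only: the function names and the example values (n = 5, r = 2 for counting; n = 4 trials with π = 0.5 for the binomial) are assumptions chosen for this sketch and do not come from the notes.

```python
from math import comb, factorial, perm, sqrt


def discrete_mean_variance(dist):
    """Return (mean, variance) for a dict that maps each value x to P(x)."""
    mean = sum(x * p for x, p in dist.items())                     # mu = sum of x * P(x)
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # sigma^2 = sum of (x - mu)^2 * P(x)
    return mean, variance


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n Bernoulli trials with success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


# Counting rules (hypothetical values: 5 objects taken 2 at a time).
zero_factorial = factorial(0)   # 0! is defined as 1
permutations = perm(5, 2)       # n! / (n - r)!
combinations = comb(5, 2)       # n! / (r! (n - r)!)

# Binomial distribution (hypothetical values: n = 4 trials, pi = 0.5).
n_trials, pi = 4, 0.5
binomial_dist = {x: binomial_pmf(x, n_trials, pi) for x in range(n_trials + 1)}
mu, sigma2 = discrete_mean_variance(binomial_dist)
sigma = sqrt(sigma2)

print(permutations, combinations)
print(binomial_dist)
print(mu, sigma2, sigma)
```

Computing the mean and variance from the general discrete formulas and comparing them with nπ and nπ(1 – π) is a quick way to confirm that the binomial shortcuts agree with the general definitions.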
CHAPTER 7: CONTINUOUS PROBABILITY DISTRIBUTIONS
 Describe probabilities associated with random variables that are able to assume any of an infinite number of values along an interval
 Are smooth curves where probabilities are expressed as areas under the curves
- Curve is a function of x
- f(x) is referred to as a probability density function
 Probabilities are expressed in terms of the probability that x will be within a specified interval of values
THE PROBABILITY DISTRIBUTION FOR A CONTINUOUS RANDOM VARIABLE
1. The vertical coordinate is a function of x, described as f(x) & referred to as the probability density function
2. The range of possible x values is along the horizontal axis
3. The probability that x will take on a value between a & b will be the area under the curve between points a & b
- The probability density function f(x) for a given continuous distribution is expressed in algebraic terms & the areas beneath are obtained through the mathematics of calculus
4. The total area under the curve will be equal to 1.0

NORMAL DISTRIBUTION
 A bell-shaped, symmetrical curve, the use of which is facilitated by a standard table listing the cumulative areas beneath
1. Many natural & economic phenomena tend to be approximately normally distributed
2. It can be used in approximating other distributions, including the binomial
3. Sample means & proportions tend to be normally distributed whenever repeated samples are taken from a given population of any shape
 Mean, median & mode are all at the same position on the horizontal axis
 The curve is asymptotic, approaching the horizontal axis at both ends, but never intersecting with it
 Total area beneath the curve is equal to 1.0
The normal distribution for the continuous random variable, x (with -∞ ≤ x ≤ +∞)
f(x) = [1 / (σ√(2π))] x e^(–(1/2)[(x – μ)/σ]²)
(in this formula π is the mathematical constant 3.14159, not a proportion)
 The shape of the curve & its location along the x axis will depend on the values of the standard deviation & mean
AREAS BENEATH THE NORMAL CURVE
 Regardless of the shape of a particular normal curve, the areas beneath it can be described for any interval of our choice
 For any normal curve, the areas beneath the curve will be as follows
- About 68.3% of the area is in the interval μ – σ to μ + σ
- About 95.5% of the area is in the interval μ – 2σ to μ + 2σ
- Nearly all of the area (about 99.7%) is in the interval μ – 3σ to μ + 3σ
STANDARDIZING
 Standardizing the normal curve means expressing the original x values in terms of their number of standard deviations away from the mean
 Result is referred to as a STANDARD (OR STANDARDIZED) normal distribution, & it allows us to use a single table to describe areas beneath the curve
 The conversion is done with the z-score:
z = (x – μ) / σ
z = the distance from the mean, measured in standard deviation units
x = the value of x in which we are interested
μ = the mean of the distribution
σ = the standard deviation of the distribution
USING THE STANDARD NORMAL DISTRIBUTION TABLE
1. Convert the information provided into one or more z-scores
2. Use the standard normal table to identify the area(s) corresponding to the z-score(s). The table provides cumulative areas to the z value of interest
3. Interpret the result in such a way as to answer the original question

CHAPTER 8: SAMPLING DISTRIBUTIONS
SAMPLING DISTRIBUTION OF THE MEAN OR DISTRIBUTION OF SAMPLE MEANS
 The probability distribution of the sample means for all possible samples of a particular size
4 IMPORTANT CHARACTERISTICS OF SAMPLING DISTRIBUTIONS:
1. The sampling distribution of the mean will have the same mean as the original population from which the samples were drawn
2. The standard deviation of the sampling distribution of the mean is referred to as the standard error of the mean, or σx̅.
- Standard deviation can be calculated as the positive square root of its variance
3. If the original population is normally distributed, the sampling distribution of the mean will also be normal.
4. If the original population is not normally distributed, the sampling distribution of the mean will be approximately normal for large sample sizes & will more closely approach the normal distribution as the sample size increases.
- Known as the central limit theorem

SAMPLE PROPORTION
 When a discrete random variable is the result of a Bernoulli process, with x = the number of successes in n trials, the result can be expressed as a sample proportion, p:
p = x / n
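
The normal-curve areas and the z-score table lookup described in Chapter 7 can be reproduced numerically. The sketch below is a minimal illustration that uses the error function from Python's standard library in place of a printed table; the function names and the example values (μ = 100, σ = 15) are assumptions made for this sketch, not part of the notes.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviation units."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def area_between(a, b, mu, sigma):
    """Area under a normal curve with mean mu and standard deviation sigma between a and b."""
    return standard_normal_cdf(z_score(b, mu, sigma)) - standard_normal_cdf(z_score(a, mu, sigma))


# Hypothetical normal distribution used only for illustration.
mu, sigma = 100.0, 15.0

within_1 = area_between(mu - sigma, mu + sigma, mu, sigma)          # about 0.683
within_2 = area_between(mu - 2 * sigma, mu + 2 * sigma, mu, sigma)  # about 0.954
within_3 = area_between(mu - 3 * sigma, mu + 3 * sigma, mu, sigma)  # about 0.997

# Table-style lookup: step 1 converts x to z, step 2 finds the cumulative area.
z = z_score(130.0, mu, sigma)
area_to_left = standard_normal_cdf(z)

print(within_1, within_2, within_3)
print(z, area_to_left)
```

Because the areas depend only on z, the same three results are obtained for any other choice of μ and σ, which is why a single standard normal table is enough.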
 Sample proportion or sample mean can be considered a random variable that will take on different values as the procedure leading to the sample proportion is repeated
SAMPLING DISTRIBUTION OF THE PROPORTION
 The probability distribution that results when this procedure is repeated
 It will have an:
- EXPECTED VALUE (π)
o Probability of success on any given trial
- STANDARD DEVIATION σp

WHEN THE POPULATION IS NORMALLY DISTRIBUTED
 When a great many simple random samples of size n are drawn from a population that is normally distributed, the sample means will also be normally distributed
 This will be true regardless of the sample size
 Standard error of the distribution of these means will be smaller for larger values of n
SAMPLING DISTRIBUTION OF THE MEAN, SIMPLE RANDOM SAMPLE FROM A NORMALLY DISTRIBUTED POPULATION
 Regardless of the sample size, the sampling distribution of the mean will be normally distributed with
Mean
E(x̅) = μx̅ = μ
Standard Error
σx̅ = σ/√n
μ = population mean
σ = population standard deviation
n = sample size
 z-score for the sampling distribution of the mean, normally distributed population
z = (x̅ – μ) / σx̅
z = distance from the mean, measured in standard error units
x̅ = value of the sample mean in which we are interested
μ = population mean
σx̅ = standard error of the sampling distribution of the mean, or σ/√n

WHEN THE POPULATION IS NOT NORMALLY DISTRIBUTED
 Provided that the sample size is large (n ≥ 30), the sampling distribution of the mean can still be assumed to be normal
 This is because of what is known as the central limit theorem:
- For large, simple random samples from a population that is not normally distributed, the sampling distribution of the mean will be approximately normal, with the mean μx̅ = μ & the standard error σx̅ = σ/√n. As the sample size (n) is increased, the sampling distribution of the mean will more closely approach the normal distribution.
- Basic to the concept of statistical inference because it permits us to draw conclusions about the population based strictly on sample data & without having any knowledge about the distribution of the underlying population.

THE SAMPLING DISTRIBUTION OF THE PROPORTION
The proportion of successes in a sample consisting of n trials will be
p = x / n
 Whenever both nπ & n(1 – π) are ≥ 5, the normal distribution can be used to approximate the binomial.
 When these conditions are satisfied & the procedure leading to the sample outcome is repeated a great many times, the sampling distribution of the proportion (p) will be approximately normally distributed, with this approximation becoming better for larger values of n & for values of π closer to 0.5
- Sampling distribution of the proportion has an expected value & a standard error
- As the size of the sample becomes larger, the standard error becomes smaller
Sampling distribution of the proportion, p:
Mean = E(p) = π
Standard Error
σp = √[π(1 – π) / n]
π = population proportion
n = sample size
z-score for a given value of p:
z = (p – π) / σp

SAMPLING DISTRIBUTIONS WHEN THE POPULATION IS FINITE
 A correction applies when sampling is without replacement & from a finite population
 Whether we are dealing with the sampling distribution of the mean (x̅) or the sampling distribution of the proportion (p), the same correction factor is applied
 Depends on the sample size (n) versus the size of the population (N) &, as a rule of thumb, should be applied whenever the sample is at least 5% as large as the population
 When n < 0.05N, the correction will have very little effect
 Finite correction factor is applied when n ≥ 0.05N
 Purpose: to reduce the standard error according to how large the sample is compared to the population
- The purpose is to arrive at a corrected (reduced) value of the standard error of the sampling distribution
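
The Chapter 8 quantities (standard error of the mean, standard error of the proportion, their z-scores, and the finite population correction) can be worked through in a short Python sketch. All numbers below are hypothetical. The correction factor √[(N – n)/(N – 1)] is the standard textbook form and is an assumption here, because the notes describe the correction without stating its formula.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def standard_error_mean(sigma, n):
    """Standard error of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def standard_error_proportion(pi, n):
    """Standard error of the sampling distribution of the proportion: sqrt(pi (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


def finite_population_correction(n, N):
    """Assumed textbook form sqrt((N - n) / (N - 1)); not stated in the notes."""
    return sqrt((N - n) / (N - 1))


# Sample mean (hypothetical): mu = 50, sigma = 10, n = 25, question: P(sample mean <= 52).
mu, sigma, n = 50.0, 10.0, 25
se_mean = standard_error_mean(sigma, n)
z_mean = (52.0 - mu) / se_mean
prob_mean = standard_normal_cdf(z_mean)

# Sample proportion (hypothetical): pi = 0.5, n = 100, observed p = 0.6.
pi, n_p = 0.5, 100
normal_ok = n_p * pi >= 5 and n_p * (1 - pi) >= 5   # rule for using the normal approximation
se_prop = standard_error_proportion(pi, n_p)
z_prop = (0.6 - pi) / se_prop

# Finite population (hypothetical): N = 200, so n = 25 is at least 5% of N.
N = 200
apply_correction = n >= 0.05 * N
se_mean_corrected = se_mean * finite_population_correction(n, N)

print(se_mean, z_mean, prob_mean)
print(normal_ok, se_prop, z_prop)
print(apply_correction, se_mean_corrected)
```

In this example the corrected standard error comes out smaller than the uncorrected one, which matches the stated purpose of the correction.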