
CHAPTER 1 – OVERVIEW OF STATISTICS


1.1 WHAT IS STATISTICS?
− Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data.
− A statistic is a single measure, reported as a number, used to summarize a sample data set; for example, the average height of students in a university.
− For the height of students, a graduation gown manufacturer may need to know the average height to set the length of the gowns, or an architect may need to know the maximum height to design the height of the classroom doorways. Both the average and the maximum are examples of statistics.

TWO PRIMARY KINDS:
▪ DESCRIPTIVE STATISTICS
− Refers to the collection, presentation, and summary of data (using charts, graphs, or numerical summaries).
Examples:
1. Sampling and surveys
2. Visual displays
3. Numerical summaries
4. Probability methods
5. “A total of 2,000 out of 15,000 passed the CPA board exam.”
6. “According to The New York Times, Dartmouth College has the highest median mid-career salary.”
7. “According to the Philippine Statistics Authority, the number of employed persons in January was estimated at 43.7 million.”

▪ INFERENTIAL STATISTICS
− Refers to generalizing from a sample to a population: estimating unknown population parameters, drawing conclusions, and making decisions.
Examples:
1. Estimating parameters
2. Testing hypotheses
3. Regression and trends
4. Quality control
5. “A researcher concludes that a student’s Mathematics grade can predict his success in Management Science.”
6. “Drinking decaffeinated coffee can raise cholesterol levels by 7%.”

1.2 WHY STUDY STATISTICS?
1. Knowing statistics will make you a better consumer of other people’s data analyses.
2. You should know enough to handle everyday data problems, to feel confident that others cannot deceive you with spurious arguments, and to know when you’ve reached the limits of your expertise.
3. Statistical knowledge gives a company a competitive advantage over organizations that cannot understand their internal or external market data.
4. Mastery of basic statistics gives an individual manager a competitive advantage as one works one’s way through the promotion process, or when one moves to a new employer.
5. Communication
− Understanding the language of statistics facilitates communication and improves problem solving.
6. Computer skills
− Using spreadsheets for data analysis, and word processors or presentation software for reports, improves upon your existing skills.
7. Information management
− Statistics helps summarize small and large amounts of information (data) and reveal underlying relationships.
8. Technical literacy
− Career opportunities are in growth industries propelled by advanced technology; the use of statistical software increases your technical literacy.
9. Process improvement
− Large firms have formal systems for continuous quality improvement. Statistics helps firms oversee their suppliers, monitor their internal operations, and identify problems. Quality improvement goes far beyond statistics, but every college graduate is expected to know enough statistics to understand its role in quality improvement.

1.3 STATISTICS IN BUSINESS
1. Auditing
− A firm has learned that some invoices are being paid incorrectly, but it doesn’t know how widespread the problem is. A sample of invoices can be used to estimate the proportion of incorrectly paid invoices.
2. Marketing
− Many companies use Customer Relationship Management (CRM) to analyze data from multiple sources. With statistical and analytics tools such as correlation and data mining, they identify the specific needs of different customer groups, which helps them market their products and services more effectively.
3. Health care
− Evaluate 100 incoming patients using a 42-item physical and mental assessment questionnaire.
4. Quality improvement
− Initiate a triple inspection program and penalties for workers who produce poor-quality output.
5. Purchasing
− A food producer purchases plastic containers for packaging its product. Inspection of the most recent shipment of 500 containers found that 3 of the containers were defective. The supplier’s historical defect rate is 0.005. Has the defect rate really risen, or is this simply a “bad” batch?

-1-
cjb.idl
6. Medicine
− Determine whether a new drug is really better than the placebo or whether the difference is due to chance.
7. Operations management
− Manage inventory by forecasting consumer demand.
8. Product warranty
− Determine the average dollar cost of engine warranty claims on a new hybrid engine.

1.4 STATISTICAL CHALLENGES
The ideal analyst (a business professional using statistics) should possess these characteristics:
• Is technically current (e.g., software-wise)
• Communicates well
• Is proactive
• Has a broad outlook
• Is flexible
• Focuses on the main problem
• Meets deadlines
• Knows his/her limitations and is willing to ask for help
• Can deal with imperfect information
• Has professional integrity

BUSINESS ETHICS
Some broad ethical responsibilities of business include the following:
• Treating customers in a fair and honest manner
• Complying with laws that prohibit discrimination
• Ensuring that products and services meet safety regulations
• Standing behind warranties
• Advertising in a factual and informative manner
• Encouraging employees to ask questions and voice concerns about the company’s business practices
• Being responsible for accurately reporting information to management

UPHOLDING ETHICAL STANDARDS
Ethical standards for the data analyst:
• Know and follow accepted procedures
• Maintain data integrity
• Carry out accurate calculations
• Report procedures faithfully
• Protect confidential information
• Cite sources
• Acknowledge sources of financial support

USING CONSULTANTS
− Hire consultants at the beginning of the project, when your team lacks certain skills, or when an unbiased or informed view is needed.
− Take note that some companies expect their employees to be able to interpret the results of a statistical analysis, even if it was completed by an outside consultant.

COMMUNICATING WITH NUMBERS
− Numbers have meaning only when communicated in the context of a specific situation.
− Presentations should be designed so that managers can quickly understand the information they need to make good decisions.

1.5 CRITICAL ANALYSIS
− Statistics is an essential part of critical thinking because it allows us to test an idea against empirical evidence.
− Empirical data are data collected through observation and experiments.
− Statistical tools are used to compare prior ideas with empirical data, but pitfalls do occur.

PITFALL 1: Conclusions from SMALL samples
• Be careful about making generalizations from small samples (e.g., a group of 10 patients who showed improvement).

PITFALL 2: Conclusions from NONRANDOM samples
• Be careful about making generalizations from small samples and from retrospective studies of special groups (e.g., studying heart attack patients without defining a matched control group).

PITFALL 3: Conclusions from RARE events
• Be careful about drawing strong inferences from events that are not surprising when looking at the entire population (e.g., someone winning a lottery).

PITFALL 4: Poor survey methods
• Be careful about using poor sampling methods or vaguely worded questions (e.g., anonymous survey or quiz).

PITFALL 5: Assuming a causal link
• Be careful about drawing conclusions when no cause-and-effect link exists (e.g., teams that play in named ballparks, such as Citi Field for the NY Mets, tend to lose more games than they win). Actually, it is the players and managers who determine whether a team wins.

PITFALL 6: Generalization to individuals
• Avoid reading too much into statistical generalizations (e.g., men are taller than women). Yes, but only in a statistical sense: men are taller on average, but many women are taller than many men.

PITFALL 7: Unconscious bias
• Be careful about unconsciously or subtly allowing bias to color the handling of data (e.g., heart disease in men versus women). Symptoms in men are more obvious than in women.

PITFALL 8: Significance versus importance
• Statistically significant effects may lack practical importance (e.g., Austrian military recruits born in the spring average 0.6 cm taller than those born in the fall). Would anyone notice this difference?
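Pitfall 8 can be illustrated with a two-sample z statistic for a difference in means, z = d / (σ√(2/n)). This formula assumes two groups of equal size n and a known common standard deviation σ. The σ = 7 cm below is a hypothetical value chosen only for illustration, since the notes give no spread for the recruits' heights, and `two_sample_z` is a helper name introduced here. The sketch shows the same 0.6 cm difference crossing the 5% significance threshold purely because n grows.

```python
from math import sqrt


def two_sample_z(diff: float, sigma: float, n_per_group: int) -> float:
    """z statistic for a difference in means: equal group sizes, known common sigma."""
    standard_error = sigma * sqrt(2.0 / n_per_group)
    return diff / standard_error


DIFF_CM = 0.6      # spring-born minus fall-born average height (from the example)
SIGMA_CM = 7.0     # hypothetical standard deviation of recruits' heights
CRITICAL_Z = 1.96  # two-sided 5% significance level

for n in (100, 1_000, 10_000, 100_000):
    z = two_sample_z(DIFF_CM, SIGMA_CM, n)
    verdict = "significant" if abs(z) > CRITICAL_Z else "not significant"
    print(f"n = {n:>7,} per group: z = {z:5.2f} ({verdict})")
```

Under these assumptions, z stays below 1.96 for 100 and 1,000 recruits per group and exceeds it for 10,000 or more. The difference is still 0.6 cm in every case, so significance reflects sample size as much as effect size, and practical importance has to be judged separately.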
CHAPTER 2 – DATA COLLECTION

2.1 VARIABLES AND DATA
DATA TERMINOLOGY: Observations, Variables, and Data Sets
▪ OBSERVATION
− A single member of a collection of items that we want to study, such as a person, firm, or region.

▪ VARIABLE
− A characteristic of the subject or individual, such as an employee’s income or an invoice amount.

▪ DATA SET
− Consists of all the values of all of the variables for all of the observations we have chosen to observe.

A data set may consist of many variables.
• The questions that can be explored and the analytical techniques that can be used will depend upon the data type and the number of variables.
• The textbook starts with univariate data sets (one variable), then moves to bivariate data sets (two variables) and multivariate data sets (more than two variables), as illustrated in Table 2.2.

DATA SET       VARIABLES   EXAMPLE                TYPICAL TASKS
Univariate     1           Income                 Histograms, basic statistics
Bivariate      2           Income, Age            Scatterplots, correlation
Multivariate   3           Income, Age, Gender    Regression modeling

CATEGORICAL AND NUMERICAL DATA
TYPES OF DATA
▪ CATEGORICAL (qualitative)
− Verbal label
Examples:
1. Vehicle type (car, truck, SUV)
2. Gender – binary (male, female)
− Coded label
Examples:
1. Vehicle type (1, 2, 3)
2. Gender – binary (0, 1)

▪ NUMERICAL (quantitative)
− Discrete
Examples:
1. Broken eggs in a carton (0, 1, 2, …, 12)
2. Annual dental visits (0, 1, 2, 3, …)
3. Amount of money spent on clothing in the past month
4. Frequency of making online purchases
− Continuous
Examples:
1. Patient waiting time (14.27 minutes)
2. Customer satisfaction (85.20%)
3. Annual family salary income
4. Amount of time spent studying in the library

NOTE: Ambiguity is introduced when continuous data are rounded to whole numbers. Be cautious.

TIME-SERIES DATA AND CROSS-SECTIONAL DATA
▪ TIME-SERIES DATA
− Each observation in the sample represents a different, equally spaced point in time (e.g., years, months, days).
− The periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.
− Examples of MACROECONOMIC time-series data include national income, economic indicators, and monetary data.
− Examples of MICROECONOMIC time-series data include sales, market share, inventory turnover, etc.
− For time series, we are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008).

▪ CROSS-SECTIONAL DATA
− Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).
− For cross-sectional data, we are interested in:
1. Variation among observations (e.g., accounts receivable in 20 Subway franchises)
2. Relationships (e.g., whether accounts receivable are related to sales volume in 20 Subway franchises)
− We can combine the two data types to get pooled cross-sectional and time-series data.

2.2 LEVELS OF MEASUREMENT
LEVEL OF MEASUREMENT   CHARACTERISTICS                                  EXAMPLES
Nominal                Categories only                                  Eye color (blue, brown, green, etc.)
Ordinal                Rank has meaning; no clear meaning to distance   Frequency ratings (rarely, never, etc.)
Interval               Distance has meaning                             Temperature (57 degrees Celsius)
Ratio                  Meaningful zero exists                           Accounts payable (21.7 million dollars)

▪ NOMINAL MEASUREMENT
− Merely identifies a category.
− Nominal data are qualitative, attribute, categorical, or classification data, and can be coded numerically (e.g., 1 = Apple, 2 = Samsung, 3 = Dell, 4 = HP).
− The only mathematical operations allowed are COUNTING (e.g., frequencies) and simple statistics such as the mode.

▪ ORDINAL MEASUREMENT
− Its codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
− The distance between data codes is not meaningful (e.g., the distance between 1 and 2, or between 3 and 4, lacks meaning).
− Many useful statistical tests exist for ordinal data, and they are especially useful in social science, marketing, and human resource research.
− Example: level of agreement with a statement
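The restriction on operations can be checked directly. Counting (the mode) works for nominal codes, and ranking (the median) works for ordinal codes. Averaging nominal codes, by contrast, produces a number with no meaning. The sketch below reuses the brand and frequency coding schemes from this section, but the response lists are hypothetical values made up for illustration. It uses the standard library's `statistics.median_low` so that the median is always one of the actual codes and never the average of two middle codes, which would not be meaningful on an ordinal scale.

```python
from statistics import mean, median_low, mode

# Nominal codes from the text: 1 = Apple, 2 = Samsung, 3 = Dell, 4 = HP
BRANDS = {1: "Apple", 2: "Samsung", 3: "Dell", 4: "HP"}
# Ordinal codes from the text: 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never
FREQUENCY = {1: "Frequently", 2: "Sometimes", 3: "Rarely", 4: "Never"}

# Hypothetical survey responses (illustrative only)
brand_codes = [1, 3, 3, 2, 4, 3, 1]
frequency_codes = [1, 2, 2, 3, 4, 2, 1]

# Nominal: counting is allowed, so the mode is a valid summary.
print("Most common brand:", BRANDS[mode(brand_codes)])

# Ordinal: ranking is allowed, so a median that is an actual code is valid.
print("Median frequency:", FREQUENCY[median_low(frequency_codes)])

# Arithmetic on nominal codes runs without error but has no meaning.
print("Mean brand code (meaningless):", round(mean(brand_codes), 2))
```

Python computes a mean brand code of 2.43 without complaint. Nothing in the software prevents this, so recognizing the level of measurement is the analyst's job.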

▪ INTERVAL MEASUREMENT
− Not only can the data be ranked, but the intervals between scale points are also meaningful (e.g., the difference between 60 degrees Fahrenheit and 70 degrees Fahrenheit is the same as the difference between 20 degrees Fahrenheit and 30 degrees Fahrenheit).
− Since intervals between numbers represent distances, mathematical operations can be performed (e.g., average).
− The zero point of an interval scale is arbitrary, so ratios are not meaningful (60 degrees Fahrenheit is not twice as warm as 30 degrees Fahrenheit).
− Example: range of grades obtained

▪ RATIO MEASUREMENT
− Data have all the properties of the nominal, ordinal, and interval data types, and also possess a meaningful zero (absence of the quantity being measured).
− Because of this zero point, ratios of data values are meaningful (e.g., a profit of 20 million is twice as much as a profit of 10 million).
− Zero does not have to be observable in the data; it is an absolute reference point.
− Examples: salaries, quiz scores

LIKERT SCALES
− A special case of interval data frequently used in survey research.
− The coarseness of a Likert scale refers to the number of scale points.

Use the following procedure to recognize data types:
QUESTION                                             IF “YES”
Q1. Is there a meaningful zero point?                Ratio data (all statistical operations are allowed)
Q2. Are intervals between scale points meaningful?   Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?              Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?                   Nominal data (only counting allowed, e.g., finding the mode)

CHANGING DATA BY RECODING
• To simplify data, or when the exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).
• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).

2.3 SAMPLING CONCEPTS
POPULATION OR SAMPLE OR CENSUS?
▪ POPULATION
− Includes all of the items one is interested in. It may be FINITE (e.g., all of the passengers on a plane) or effectively INFINITE (e.g., all of the Cokes produced in an ongoing bottling process).

▪ SAMPLE
− A subset of the population; it involves looking only at some of the items selected from the population.

▪ CENSUS
− An examination of all items in a defined population.

SITUATIONS WHERE A SAMPLE MAY BE PREFERRED
1. INFINITE POPULATION
− No census is possible if the population is of indefinite size (an assembly line can keep producing bolts, a doctor can keep seeing more patients).
2. DESTRUCTIVE TESTING
− The act of measurement may destroy or devalue the item (vehicle crash tests).
3. TIMELY RESULTS
− Sampling may yield more timely results (checking wheat samples for moisture content, checking peanut butter for salmonella contamination).
4. ACCURACY
− Instead of spreading resources thinly to attempt a census, the budget might be better spent improving the training of field interviewers and data safeguards.
5. COST
− Even if a census is feasible, the cost, in either time or money, may exceed our budget.
6. SENSITIVE INFORMATION
− A trained interviewer might learn more about sexual harassment in an organization through confidential interviews of a small sample of employees.

SITUATIONS WHERE A CENSUS MAY BE PREFERRED
1. SMALL POPULATION
− If the population is small, there is little reason to sample, for the effort of data collection may be only a small part of the total cost.
2. LARGE SAMPLE SIZE
− If the required sample size approaches the population size, we might as well go ahead and take a census.
3. DATABASE EXISTS
− If the data are on disk, we can examine 100% of the cases. But auditing or validating data against physical records may raise the cost.
4. LEGAL REQUIREMENTS
− Banks must count all the cash in bank teller drawers at the end of each business day. The U.S. Congress forbade sampling in the 2000 decennial population census.

PARAMETER OR STATISTIC?
❖ A parameter is a numerical characteristic of the population (e.g., a mean or a proportion).
❖ A statistic is a numerical value computed from a sample.
❖ Symbols are used to represent population parameters and sample statistics.
❖ From a sample of n items chosen from a population, we compute statistics that can be used as estimates of the corresponding population parameters.
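Here is a minimal sketch of the parameter/statistic distinction, assuming a synthetic finite population. The population mean μ is computed from all N items (a parameter). The sample mean x̄ is computed from a simple random sample of n items (a statistic) and serves as an estimate of μ. The invoice-amount values are generated with fixed seeds purely for illustration, and the standard library's `random.Random.sample` draws the n items without replacement.

```python
import random
from statistics import mean

# Synthetic finite population of N invoice amounts (illustrative values only)
N = 10_000
pop_rng = random.Random(0)
population = [round(pop_rng.uniform(50, 500), 2) for _ in range(N)]
mu = mean(population)  # parameter: uses every item in the population

# Simple random sample of n items, drawn without replacement
n = 200
sample_rng = random.Random(42)  # change this seed to draw a different sample
sample = sample_rng.sample(population, n)
x_bar = mean(sample)  # statistic: uses only the sampled items

print(f"N = {N:,}, n = {n} (n/N = {n / N:.0%})")
print(f"Population mean (mu):  {mu:.2f}")
print(f"Sample mean (x-bar):   {x_bar:.2f}")
```

Changing the sampling seed changes x̄ but leaves μ fixed, which is the practical difference between a statistic and a parameter. Here n/N is 2%, under the 5% criterion the notes use for treating a population as effectively infinite.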
NOTE: To avoid confusion, we use different symbols for each parameter and its corresponding statistic (e.g., μ for the population mean and x̄ for the sample mean; σ and s for the population and sample standard deviations).

TARGET POPULATION
− The population must be carefully specified, and the sample must be drawn scientifically so that it is representative.
− The target population is the population we are interested in (e.g., U.S. gasoline prices).
− The sampling frame is the group from which we take the sample (e.g., 115,000 stations).
− The frame should not differ from the target population.

2.4 SAMPLING METHODS
RANDOM SAMPLING METHODS
Simple random sample   Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic sample      Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified sample      Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster sample         Select random geographical regions (e.g., zip codes) that represent the population.

SIMPLE RANDOM SAMPLE
− We usually denote the population size by N and the sample size by n.
− In a simple random sample, every item in the population of N items has the same chance of being chosen in the sample of n items.
− A physical experiment to accomplish this would be to write each of the N data values on a poker chip, and then to draw n chips from a bowl after stirring it thoroughly.

SAMPLING WITH OR WITHOUT REPLACEMENT
− With the advent of technology, random samples can easily be generated.
− If we do not allow duplicates when sampling, then we are sampling without replacement.
− If we do allow duplicates when sampling, then we are sampling with replacement.
− When should we worry about sampling without replacement? Only when the population is FINITE and the sample size is close to the population size.
− A common criterion is that a finite population is EFFECTIVELY INFINITE if the sample is less than 5% of the population.
− An equivalent statement is that a population is EFFECTIVELY INFINITE when it is at least 20 times as large as the sample.

CLUSTER SAMPLE
− The strata (clusters) consist of geographical regions.
− One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (clusters).
− Two-stage cluster sampling: first choose k subregions (clusters), then choose a random sample of elements within each cluster.

NONRANDOM SAMPLING METHODS
Judgement sample     Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience sample   Use a sample that happens to be available (e.g., ask coworkers’ opinions at lunch).
Focus groups         In-depth dialog with a representative panel of individuals (e.g., iPod users).

OTHER DATA COLLECTION METHODS
• Businesses now have many other methods for collecting data with the use of technology.
• Point-of-sale (POS) systems can collect real-time data on purchases at retail or convenience stores, restaurants, and gas stations.
• Many companies use loyalty cards that are swiped during the purchase. These loyalty cards carry the customer’s information, which can be matched to the purchases just made.
• Businesses also send out e-mail surveys to loyal customers on a regular basis to get feedback on their products and services.
• Facebook can track your Internet searches using its software algorithms.
• Google also tracks Internet searches and provides these data through its Google Analytics services.

SAMPLE SIZE
− The necessary sample size depends on the inherent variability of the quantity being measured and the desired precision of the estimate.
− For example, the caffeine content of Mountain Dew is fairly consistent because each can or bottle is filled at the factory, so a small sample would suffice to estimate the mean.
− In contrast, the amount of caffeine in an individually brewed cup of Bigelow Raspberry Royale tea varies widely because people let it steep for varying lengths of time, so a larger sample would be needed to estimate the mean.

SOURCES OF ERROR OR BIAS
SOURCE OF ERROR     CHARACTERISTICS
Nonresponse bias    Respondents differ from non-respondents
Selection bias      Self-selected respondents are atypical
Response error      Respondents give false information
Coverage error      Incorrect specification of frame or population
Measurement error   Unclear survey instrument wording
Interviewer error   Responses influenced by interviewer
Sampling error      Random and unavoidable

2.5 SURVEYS
BASIC STEPS OF SURVEY RESEARCH
I. State the goal of the research.
II. Develop the budget (time, money, staff).
III. Create a research design (target population, frame, sample size).

IV. Choose a survey type and method of administration.
V. Design a data collection instrument (questionnaire).
VI. Pretest the survey instrument and revise as needed.
VII. Administer the survey (follow up if needed).
VIII. Code the data and analyze it.

FIVE GENERAL CATEGORIES OF SURVEYS
▪ MAIL
− Mail requires a well-targeted and current mailing list (people move a lot). Expect low response rates and nonresponse bias (non-respondents differ from those who respond). Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes. To encourage participation, a cover letter should explain the uses of the survey data. Plan for follow-up mailings.
▪ TELEPHONE
− Random dialing yields low response and is poorly targeted. Purchased phone lists help reach the target population, though a low response rate is still typical (disconnected phones, caller screening, answering machines, work hours, no-call lists). Other sources of nonresponse bias include the growing number of cellphones, non-English speakers, and distrust caused by scams.
▪ INTERVIEWS
− Interviewing is expensive and time-consuming, yet trading a smaller sample size for high-quality results may be worth it. Interviewers must be well trained, which is an added cost. Interviewers can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise).
▪ WEB
− Web surveys are growing in popularity but are subject to nonresponse bias because they miss those who feel too busy, don’t own computers, or distrust your motives (spam and scams). This type of survey works best when targeted to a well-defined interest group on a question of self-interest.
▪ DIRECT OBSERVATION
− Observation can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior. Unobtrusive observation is possible in some non-lab settings.

TYPES OF QUESTIONS
• Open-ended
• Fill-in-the-blank
• Check boxes
• Ranked choices
• Pictograms
• Likert scale

REQUISITES OF A QUESTIONNAIRE
− Plenty of white space in the layout
− Short, clear instructions at the beginning
− A statement of the purpose of the survey
− Anonymity of the respondents
− Coverage of all possible choices

DATA QUALITY
− Responses are usually coded numerically (e.g., 1 = male, 2 = female).
− Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
− Discard questionnaires that are flawed or missing many responses.
− Watch for multiple responses, outrageous or inconsistent replies, or out-of-range answers.
− Follow up if necessary, and always document your data coding decisions.

SURVEY SOFTWARE
− Designing and creating a survey is much easier than it used to be.
− Software is available that automates much of the process, allowing you to use different question formats, skip questions and move to a new section, easily visualize the layout, and use other features.
− Because most surveys are now administered online, survey software also includes features that allow respondents to remain anonymous if warranted and prevent respondents from taking the survey twice.

NOTES FROM OTHER SOURCES:
RANDOM VARIABLE
− a variable whose values depend on the outcomes of an experiment
− a function that maps the outcome of an experiment to real numbers
− usually denoted by an uppercase letter of the alphabet, with its possible values denoted by the corresponding lowercase letter
− unlike variables in algebra, the values that a random variable can assume are associated with specific probabilities and will vary from one trial of a random experiment to another

DISCRETE RANDOM VARIABLE
− a random variable that can take only countable values
− its set of possible values is in one-to-one correspondence with a subset of the natural numbers
− the values that a discrete random variable can assume are called mass points

CONTINUOUS RANDOM VARIABLE
− a random variable that takes an uncountably infinite number of values as a result of measurement
− its possible values form a range (interval) of real numbers

❖ Every sample point in a given experiment can be associated with a specific value of a random variable. Since an event is made up of sample points, any event can be described by specifying a value or range of values of a random variable. Hence the probability of an event corresponds to the probability that a random variable will assume one or more of its possible values.
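The mapping from sample points to numbers can be made explicit by enumeration. The experiment here (three tosses of a fair coin, with X = number of heads) is a standard illustration chosen for this sketch and does not come from these notes. The function name `X` simply mirrors the uppercase convention described above. Each event {X = x} is the set of sample points mapped to x. With 8 equally likely points, its probability is the count of those points divided by 8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three tosses of a fair coin (8 equally likely points)
sample_space = list(product("HT", repeat=3))


def X(outcome):
    """Random variable: maps a sample point (a tuple of 'H'/'T') to its number of heads."""
    return outcome.count("H")


# Probability mass function of the discrete random variable X
counts = Counter(X(point) for point in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, prob in pmf.items():
    print(f"P(X = {x}) = {prob}")

# The event "at least two heads" is described by the range of values X >= 2
print("P(X >= 2) =", sum(p for x, p in pmf.items() if x >= 2))
```

The printed distribution is 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3. X is therefore discrete, with mass points 0 through 3. The event "at least two heads" has probability 1/2.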
