
Introduction to Statistics - I

Mari Sudha

Outline
Glossary Levels of Measurement Sampling Organizing Data Statistics

Glossary
Population
Group of individuals under study

Sample
A finite subset of statistical individuals in a population

Parameter
A value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For example, the population mean is a parameter that is often used to indicate the average value of a quantity. Denoted by Greek letters, e.g., μ and σ

Statistic
A quantity calculated from a sample of data. More than one sample can be drawn from the same population, so the value of a statistic will in general vary from sample to sample. Often assigned Roman letters (e.g., m and s)
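The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up for illustration (1,000 simulated values), and the sample size of 30 and the seed are arbitrary assumptions, not figures from these slides.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated values (not data from the slides).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a sample; its value changes with each sample drawn.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 30)   # sample size 30, without replacement
    sample_means.append(statistics.mean(sample))

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each run of the loop draws a different sample, so the five sample means differ from one another while the population mean stays fixed.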

Glossary (cont.)
Sample Size
No. of individuals in a sample

Population Frame
List of sampling units from which the sample is selected (directories, maps, registered voters, list(s), etc.)

Statistical Inference
Makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken

Experiment
Any process or study which results in the collection of data, the outcome of which is unknown

Glossary (cont.)
Random Process
An experiment, trial, or observation that can be repeated numerous times under the same conditions, and whose outcomes are independent and identically distributed. An outcome is in no way affected by any previous outcome and cannot be predicted with certainty

Random Variable
A variable whose value results from a measurement on some type of random process, e.g., the tossing of a coin. Can be classified as either discrete (may assume either a finite number of values or an infinite sequence of values) or continuous (may assume any numerical value in an interval or collection of intervals)

Independent Variables
Variables that are manipulated and whose effects are measured and compared; also known as treatments; may include price levels, advertising themes, etc.

Glossary (cont.)
Experimental/Test Unit
Individuals, organizations, or other entities whose response to the independent variables or treatments is examined; may include consumers, stores, or geographic areas

Dependent Variables
Variables that measure the effect of the independent variables on the test units; may include sales, profits, and market share

Extraneous Variables
Variables other than the independent variables that affect the response of the test units; can confound the dependent variable measures in a way that weakens or invalidates the results of the experiment. Includes store size, store location, and competitive effort

Raw Data
Data collected in original form

Glossary (cont.)
Frequency
Number of times a particular value or class occurs in the data

Frequency Distribution
The organization of raw data in table form with classes and frequencies

Measurement Scales
Variables differ in how well they can be measured, i.e., in how much measurable information their measurement scale can provide

There is obviously some measurement error involved in every measurement, which determines the amount of information that we can obtain. Another factor that determines the amount of information that can be provided by a variable is its type/level of measurement scale


Levels of Measurement
Data obtained from measurement are classified using numbers (in order to determine the way we are going to measure the variables)

Classification can be done with different levels of precision, or levels of measurement. It is important to know the level of measurement (LOM) we are working with, because it partly determines the arithmetic and statistical operations that can be carried out on the data

Levels of Measurement (cont.)


There are four levels of measurement. In ascending order of precision, they are:
- Nominal
- Ordinal
- Interval
- Ratio

Nominal Levels of Measurement (cont.)


Numbers are used only to classify data; words or letters would be equally appropriate. Variables assessed on a nominal scale are called categorical variables. Examples include:
- Religion (Protestant, Catholic, Hebrew, Buddhist, etc.)
- Race (Caucasian, African-American, Hispanic, Asian, etc.)
- Linguistic Group
- Marital Status (Married, Single, Divorced)
- Credit Card Numbers, Bank Account Numbers, Employee ID

Nominal (cont.)
Simple and widely used when the relationship between two variables is to be studied. Nominal scale numbers are no more than labels, used specifically to identify different categories of responses. E.g.,
What is your gender? [ ] Male [ ] Female

Nominal (cont.)
E.g., a survey of retail stores done on two dimensions: way of maintaining stocks and daily turnover.
How do you stock items at present?
[ ] By product category
[ ] At a centralized store
[ ] Department wise
[ ] Single warehouse
Daily turnover of consumers is?
[ ] Between 100 and 200
[ ] Between 200 and 300
[ ] Above 300

Ordinal Levels of Measurement (cont.)


Simplest attitude-measuring scale used in Marketing Research. Values given to measurements can be ordered. There is a rough quantitative sense to the measurement, but the differences between scores are not necessarily equal.
Example: shoe size. Shoes are assigned a number to represent the size, and larger numbers mean bigger shoes (an ordered relationship between numbered items). We know that a shoe size of 8 is bigger than a shoe size of 4. What we cannot say is that a shoe size of 8 is twice as big as a shoe size of 4

Ordinal (cont.)
E.g., results of a horse race, which say only which horses arrived first, second, third, etc., but include no information about times. Textual labels can be used instead of numbers to represent the category responses

Ordinal (cont.)
E.g.1, Rank the following attributes (1 to 5) on their importance in a microwave oven
1. Company Name
2. Functions
3. Price
4. Comfort
5. Design

The most important attribute is ranked 1 by the respondents and the least important is ranked 5. Instead of numbers, letters or symbols can also be used to rate on an ordinal scale. Such a scale makes no attempt to measure the degree of favorability of different rankings

Ordinal (cont.)
If there are 4 different types of fertilizers and they are ordered on the basis of quality as Grade A, Grade B, Grade C, and Grade D, the result is again an ordinal scale. If there are 5 different brands of talcum powder and a respondent ranks them on, say, freshness (Rank 1 having maximum freshness, Rank 2 the second maximum, and so on), an ordinal scale results

Interval Levels of Measurement (cont.)


Measurements are classified and ordered, with equal distances between each interval on the scale (right along the scale, from the low end to the high end). Does not have an absolute zero; zero is arbitrary, with further numbers placed at equal intervals. Also termed Rating Scales.
E.g., temperature in centigrade: the distance between 96 and 98 °C is the same as between 100 and 102 °C. A measurement of 100 °C does not mean that the temperature is 10 times hotter than something measuring 10 °C, even though the value given on the scale is 10 times as large
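The temperature example can be checked numerically. The sketch below converts the same readings to Fahrenheit (an assumption added here for illustration; the slide uses centigrade only) to show that ratios of readings change with the arbitrary zero, while ratios of differences do not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit (a positive linear map)."""
    return celsius * 9 / 5 + 32

low_c, high_c = 10.0, 100.0

# Ratios of readings depend on where zero was placed, so they are not meaningful.
ratio_c = high_c / low_c                     # 10 times as large in Celsius
ratio_f = c_to_f(high_c) / c_to_f(low_c)     # 212 / 50 in Fahrenheit

# Equal intervals stay equal after the change of units.
diff_ratio_c = (98.0 - 96.0) / (102.0 - 100.0)
diff_ratio_f = (c_to_f(98.0) - c_to_f(96.0)) / (c_to_f(102.0) - c_to_f(100.0))

print(f"reading ratio:    Celsius {ratio_c:.2f}, Fahrenheit {ratio_f:.2f}")
print(f"difference ratio: Celsius {diff_ratio_c:.2f}, Fahrenheit {diff_ratio_f:.2f}")
```

The "10 times" figure is an artifact of the Celsius zero; the same two temperatures differ by a factor of about 4.24 in Fahrenheit, yet the two 2-degree intervals remain equal in both units.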

Interval Levels of Measurement (cont.)


E.g., How do you rate your present refrigerator for the following qualities
Company Name:          Less Known         1  2  3  4  5  Well Known
Functions:             Few                1  2  3  4  5  Many
Price:                 Low                1  2  3  4  5  High
Design:                Poor               1  2  3  4  5  Good
Overall Satisfaction:  Very Dissatisfied  1  2  3  4  5  Very Satisfied

Tells us that position 5 on the scale is above position 4, and also that the distance from 5 to 4 is the same as the distance from 4 to 3. Does not permit the conclusion that position 4 is twice as strong as position 2, because no zero position has been established

Interval (cont.)
E.g.2, Calendar years are an interval scale. The arbitrary 0 (or 1, depending on your viewpoint) was assigned when Christ was born, and time before this is labeled BC.
E.g.3, Difference between the following values is measured by a fixed scale:
- Money
- People
- Education (in years)

Ratio Levels of Measurement (cont.)


Has a natural zero point, and further numbers are placed at equally appearing intervals. Values given to measurements can be ordered. Divisions between the points on the scale have the same distance between them, and numbers on the scale are ranked according to size. Not widely used in Marketing Research unless a base value is available for comparison. Examples include scales for measuring physical quantities like length, weight, etc.

Ratio (cont.)
Data on certain demographic or descriptive attributes, if obtained through open-ended questions, will have ratio-scale properties. E.g.,
What is your annual income before taxes? $ ______
How far is the theater from your home? ______ miles
Answers to these questions have a natural, unambiguous starting point, namely zero. Since the starting point is not chosen arbitrarily, computing and interpreting ratios makes sense. For example, we can say that a respondent with an annual income of $40,000 earns twice as much as one with an annual income of $20,000

Levels of Measurement (cont.)


Nominal: Mode is frequently used for response category

Ordinal: The central tendency can be represented by its mode or its median, but the mean cannot be defined
Interval: Can be represented by its mode, its median, or its arithmetic mean. Statistical dispersion can be measured by range, inter-quartile range, and standard deviation.

Levels of Measurement (cont.)


Scale Type: Nominal (also denoted as categorical or discrete)
- Permissible Statistics: Mode, chi square
- Admissible Scale Transformation: One to one (equality (=))
- Mathematical Structure: Standard set structure (unordered)

Scale Type: Ordinal
- Permissible Statistics: Median, percentile
- Admissible Scale Transformation: Monotonic increasing (order (<))
- Mathematical Structure: Totally ordered set

Scale Type: Interval
- Permissible Statistics: Mean, standard deviation, correlation, regression, analysis of variance
- Admissible Scale Transformation: Positive linear (affine)
- Mathematical Structure: Affine line

Scale Type: Ratio
- Permissible Statistics: All statistics permitted for interval scales plus the following: geometric mean, harmonic mean, coefficient of variation, logarithms
- Admissible Scale Transformation: Positive similarities (multiplication)
- Mathematical Structure: Field

Levels of Measurement (cont.)


OK to compute                                            Nominal  Ordinal  Interval  Ratio
Frequency Distribution                                   Yes      Yes      Yes       Yes
Median and Percentiles                                   No       Yes      Yes       Yes
Add or Subtract                                          No       No       Yes       Yes
Mean, Standard Deviation, Standard Error of the Mean     No       No       Yes       Yes
Ratio or Coefficient of Variation                        No       No       No        Yes
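The table above can be read as a lookup rule: each summary needs a minimum level of measurement. The following is a minimal sketch of that rule; the function name `permissible_summary` and the sample data are hypothetical, and ordinal values are assumed to be coded as numeric ranks.

```python
import statistics

# Rank the four levels so "at least ordinal" etc. can be checked numerically.
LEVELS = {"nominal": 0, "ordinal": 1, "interval": 2, "ratio": 3}

def permissible_summary(values, level):
    """Return only the summaries the table allows at this level of measurement."""
    rank = LEVELS[level]
    summary = {"mode": statistics.mode(values)}   # allowed at every level
    if rank >= LEVELS["ordinal"]:
        # median_low returns an observed value, so it never averages two ranks
        summary["median"] = statistics.median_low(values)
    if rank >= LEVELS["interval"]:
        summary["mean"] = statistics.mean(values)
        summary["stdev"] = statistics.stdev(values)
    if rank >= LEVELS["ratio"]:
        summary["cv"] = summary["stdev"] / summary["mean"]
    return summary

# Hypothetical data, one list per level of measurement.
marital = ["Married", "Single", "Married", "Divorced"]   # nominal labels
ranks = [1, 2, 2, 3, 5]                                   # ordinal ranks
incomes = [20000, 40000, 40000, 60000]                    # ratio-scale incomes

nominal_summary = permissible_summary(marital, "nominal")
ordinal_summary = permissible_summary(ranks, "ordinal")
ratio_summary = permissible_summary(incomes, "ratio")

print(nominal_summary)
print(ordinal_summary)
print(ratio_summary)
```

Nominal data yields only a mode, ordinal data adds a median, and the coefficient of variation appears only for ratio data, mirroring the "Yes/No" pattern in the table.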


Sampling
Depends upon the nature of the data and the type of enquiry. Procedure for selecting a sample:
- Decide on the target population/audience
- Identify the population frame
- Select the sampling procedure/technique
- Decide the sample size
- Execute the sampling process (select the sample individuals)

The nature of selecting a sample can be broadly classified under three heads:
- Non-Probability Sampling
- Probability Sampling
- Mixed Sampling


Sampling (cont.)
Non-Probability Sampling
- Not every individual in the population has an equal chance of being selected
- Suffers from drawbacks of favoritism and nepotism, depending upon the beliefs and prejudices of the investigator
- Statistically valid statements cannot be made about the precision of the estimates (i.e., predictive value is weak)
- Methods of Non-Probability Sampling:
1. Convenience Sampling
2. Judgment Sampling
3. Quota Sampling
4. Snowball Sampling

Sampling (cont.)
Mixed Sampling
- Samples selected partly according to some laws of chance and partly according to a fixed sampling rule
- No assignment of probabilities

Sampling (cont.)
Probability Sampling
- Every individual in the population has a known, non-zero chance of being selected
- Selection is governed by assigned probabilities rather than by the investigator's judgment
- Different types of Probability Sampling:


I. Each individual has an equal chance of being selected
II. Sampling units have different probabilities of being selected
III. Probability of selection of an individual is proportional to the sample size

- Forms of Probability Sampling:


I. Simple Random Sampling
II. Stratified Simple Random Sampling
III. Systematic Sampling
IV. Cluster Sampling (simple and multistage)
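Three of the forms listed above can be sketched with Python's standard `random` module. The population frame of 100 unit IDs, the two strata, and all function names are hypothetical, chosen only to keep the example small; cluster sampling is left out.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit from the frame."""
    k = len(frame) // n           # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

rng = random.Random(0)
frame = list(range(1, 101))       # hypothetical population frame: IDs 1..100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"small stores": frame[:60], "large stores": frame[60:]}
stratified = stratified_sample(strata, 5, rng)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("stratified:   ", {k: sorted(v) for k, v in stratified.items()})
```

The systematic sample is fully determined by its random start, whereas the stratified sample guarantees that each stratum contributes a fixed number of units.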


Organizing Data
The first step in the analysis of the data is organizing the collected numbers. A frequency distribution is a tool for organizing data. The first step in drawing a frequency distribution is to construct a frequency table. A frequency table is a way of organizing the data by listing every possible score (including those not actually obtained in the sample) as one column of numbers and the frequency of occurrence of each score as another column
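A frequency table of this kind can be built with `collections.Counter`. The scores below are hypothetical responses on a 1 to 5 scale; note that the score 2 never occurs but is still listed with a frequency of 0.

```python
from collections import Counter

scores = [3, 5, 4, 3, 1, 5, 5, 3, 4, 5]   # hypothetical raw data
possible_scores = range(1, 6)             # every possible score, 1 through 5

counts = Counter(scores)                  # missing scores count as 0
n = len(scores)

# One row per possible score: (score, absolute frequency, relative frequency)
frequency_table = [(s, counts[s], counts[s] / n) for s in possible_scores]

print("Score  Frequency  Relative")
for score, freq, rel in frequency_table:
    print(f"{score:<6} {freq:<10} {rel:.2f}")
```

Iterating over `possible_scores` rather than over the observed values is what keeps unobserved scores in the table.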

Organizing Data (cont.): Frequency Distribution


Contingency Table
- Frequency tables of two variables presented simultaneously
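A contingency table counts how often each pair of categories occurs together. The sketch below uses made-up survey responses on two nominal variables (stocking method and turnover band); the data and variable names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical responses: (stocking method, daily turnover band)
responses = [
    ("By product category", "100-200"),
    ("By product category", "200-300"),
    ("Department wise", "100-200"),
    ("Department wise", "100-200"),
    ("Single warehouse", "Above 300"),
    ("By product category", "100-200"),
]

cell_counts = Counter(responses)                  # count each (row, column) pair
row_labels = sorted({r for r, _ in responses})
col_labels = sorted({c for _, c in responses})

# Nested dict: table[row][column] gives the joint frequency (0 if never seen).
table = {r: {c: cell_counts[(r, c)] for c in col_labels} for r in row_labels}

for r in row_labels:
    print(r, table[r])
```

Summing a row or a column of `table` recovers the ordinary frequency table of one variable on its own.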

Information contained in the frequency table may be transformed to a graphical or pictorial form, like:
I. Histograms
II. Absolute Frequency Polygons
III. Relative Frequency Polygons
IV. Absolute Cumulative Frequency Polygons
V. Relative Cumulative Polygons
VI. Box Plots
VII. Pie Charts, etc.

Data Analysis
The steps in the analysis of the data include:
- Data must be accurately scored and systematically organized to facilitate data analysis
I. Scoring: assigning a total to each participant's instrument
II. Tabulating: the mechanics of organizing the data
III. Coding: assigning numerals (e.g., ID) to data
IV. Performing both the initial and more detailed analysis


Statistics
Descriptive Statistics
Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way

Inferential Statistics
Provides procedures to draw inferences about a population from a sample
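The two branches can be contrasted on one small sample. The data are hypothetical, and the interval uses the normal-approximation multiplier 1.96, which is an assumption made here for simplicity; for a sample this small a t-based multiplier would usually be preferred.

```python
import math
import statistics

sample = [12.1, 9.8, 11.4, 10.6, 10.9, 9.5, 11.8, 10.2]   # hypothetical sample

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)            # sample standard deviation

# Inferential statistics: say something about the population mean.
se = sd / math.sqrt(len(sample))         # standard error of the mean
interval = (mean - 1.96 * se, mean + 1.96 * se)   # rough 95% interval

print(f"sample mean {mean:.2f}, sample sd {sd:.2f}")
print(f"approximate 95% interval for the population mean: "
      f"({interval[0]:.2f}, {interval[1]:.2f})")
```

The mean and standard deviation describe the eight observed values; the interval is a statement about the unseen population they were drawn from.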

"An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts: for support rather than for illumination." ~ Andrew Lang
