Chapter 16
Statistics for Marketing & Consumer Research Copyright 2008 - Mario Mazzocchi
Would she go back home without buying washing powder at all?
It could be difficult to define a model that explains choices using revealed preference, i.e. by observing behaviour at the checkout till:
  If the customer decides not to buy washing powder at all, how could this choice be inferred simply by looking at the products in her shopping trolley?
  If the customer buys an alternative brand with exactly the same size and price as before the price increase, would a revealed preference model capture that decision?
Choice models
Consumer models usually target average behaviour.
With revealed preference one might apply a regression model in which the purchased quantity is the dependent variable and price and other explanatory variables are on the right-hand side.
With stated preference models a discrete choice variable is on the left-hand side of the equation.
Examples: the choice whether or not to purchase washing powder (binary dependent variable); the choice among a set of alternative brands (categorical dependent variable).
Problems
After least squares estimation, predictions of y from the values of x would produce many values other than zero and one, including values below zero and above one.
A different coding of the binary dependent variable (e.g. one and two, or zero and ten) would lead to very different estimates of the a and b coefficients, which makes the regression parameters difficult to interpret.
The above model does not meet the assumptions of the regression model, since normality of the dependent variable for any value of the explanatory variables is violated.
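The first two problems can be seen in a minimal sketch. The data below are hypothetical (a binary outcome that switches from 0 to 1 as x crosses zero), and `ols` is an illustrative helper, not part of any package discussed in the slides.

```python
import numpy as np

# Hypothetical data: y switches from 0 to 1 as x crosses zero.
x = np.linspace(-3, 3, 50)
y = (x > 0).astype(float)

def ols(x, y):
    """Return (a, b) of the least squares line y = a + b*x."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the slope first, then the intercept
    return a, b

a, b = ols(x, y)
pred = a + b * x
print("min prediction:", pred.min(), "max prediction:", pred.max())

# Recode the same outcome as 0/10 instead of 0/1 and refit.
a10, b10 = ols(x, 10 * y)
print("slope with 0/1 coding:", b, "slope with 0/10 coding:", b10)
```

The fitted line runs below zero at the low end of x and above one at the high end, and the slope scales with the arbitrary coding of y.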
The right-hand side variables are generally assumed to be metric.
Binary and categorical variables on the right-hand side can be translated into dummies and used as explanatory variables, as in regression analysis.
Non-metric dependent variables violate the normality and homoskedasticity assumptions of regression, so an alternative approach is used to estimate discrete choice models.
$$y_i = \begin{cases} 0 & \text{if } z_i \le d \\ 1 & \text{if } z_i > d \end{cases}$$
The dependent variable y is one when a latent continuous variable z is above the threshold d and zero otherwise.
The model is completed by a regression equation linking the latent variable to the explanatory variable:
$$z_i = a + b x_i + \varepsilon_i$$
Problem 2 is easily resolved: as long as the intercept a appears in the regression equation, one may choose d arbitrarily (the easiest way is to fix it at zero), and the only result that changes is the estimate of the intercept a.
Problem 1 requires one to create z for each observation as a function of y, taking into account the information available, that is, the proportions of zeros and ones in the y variable.
It is necessary to make an assumption about the probability distribution of this latent variable and how it is linked to y, i.e. a link function between y and z must be specified.
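Why the threshold can be fixed at zero can be written out in one line. The notation below assumes the latent regression has intercept a, slope b and an error term written here as \varepsilon_i; the starred intercept is a label introduced only for this illustration.

```latex
y_i = 1
\iff a + b x_i + \varepsilon_i > d
\iff (a - d) + b x_i + \varepsilon_i > 0
\iff a^{*} + b x_i + \varepsilon_i > 0,
\qquad a^{*} = a - d
```

Any choice of d is absorbed into the intercept, while the slope b is unaffected.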
Probabilities that y=1 (on the vertical axis) concentrate around zero for values of x below a certain threshold, then go quickly towards 1 when x is above the threshold. The function fits well with the need for approximating the probabilities of a binary outcome as a function of the explanatory variable. The logistic transformation of y into z is obtained by applying the logit link function to the expected value of y.
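For reference, the S-shaped curve and the logit link described here can be written in their standard form. This is a sketch in the slides' notation, with a and b the regression coefficients and p_i introduced here as shorthand for the probability that y_i = 1.

```latex
p_i = P(y_i = 1) = \frac{1}{1 + e^{-(a + b x_i)}},
\qquad
\operatorname{logit}(p_i) = \ln\!\left(\frac{p_i}{1 - p_i}\right) = a + b x_i
```

The two expressions are inverses of each other: the first maps the linear predictor into a probability between zero and one, the second maps that probability back onto the whole real line.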
Logistic regression
The logit transformation is the link function for logistic regression.
The logit transformation is the log of the odds that y=1 relative to y=0.
The logit link allows one to transform the binary variable y into a continuous variable z.
The final equation is a regression model with a continuous variable on the left-hand side.
The only difference from the standard regression model is that the distribution of the error is not normal but logistic.
Estimation of a and b can be obtained by maximum likelihood, which works with any known probability distribution of the errors and returns the maximum likelihood estimates (the parameter values under which the observed data are most likely).
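A minimal sketch of maximum likelihood estimation for this model follows. Everything in it is an assumption made for illustration: the "true" values of a and b, the simulated data, and the use of Newton-Raphson as the optimiser (statistical packages may use a different algorithm).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters and simulated data (illustration only).
a_true, b_true, n = -0.5, 1.0, 2000
x = rng.normal(size=n)
z = a_true + b_true * x + rng.logistic(size=n)  # latent variable with logistic errors
y = (z > 0).astype(float)                        # observed binary outcome, threshold d = 0

X = np.column_stack([np.ones(n), x])  # intercept column plus x
beta = np.zeros(2)                    # starting values for (a, b)

def log_likelihood(beta):
    eta = X @ beta
    # sum of y*eta - log(1 + exp(eta)), in a numerically stable form
    return np.sum(y * eta - np.logaddexp(0.0, eta))

for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
    score = X.T @ (y - p)                          # gradient of the log-likelihood
    hessian = -(X * (p * (1 - p))[:, None]).T @ X  # matrix of second derivatives
    step = np.linalg.solve(hessian, score)
    beta = beta - step                             # Newton-Raphson update
    if np.max(np.abs(step)) < 1e-10:
        break

a_hat, b_hat = beta
print("estimated a, b:", a_hat, b_hat)
```

With a sample of this size the estimates should land close to the values used in the simulation, although the exact figures depend on the random seed.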
Probit model
The probit model is also applied to binary dependent variables, but with different assumptions on the link function and the error distribution.
The link function (called probit) is the inverse of the standard normal cumulative distribution function.
This link function guarantees that the distribution of the model which is finally estimated is still normal.
The choice between the probit and the logit link depends on the type of dependent variable:
  if the dependent variable can reasonably be assumed to be a proxy for a true underlying variable which is normally distributed, the probit model should be chosen;
  if the dependent variable is considered to be truly qualitative and binomial in character, logit modelling should be preferred;
  generally the two models lead to very similar results, unless cases are concentrated in the tails of the distribution, in which case the logit link function should be chosen.
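The similarity of the two links, and their difference in the tails, can be checked numerically. The sketch below compares the standard normal CDF with a rescaled logistic CDF; the rescaling constant 1.702 is a commonly quoted value and is an assumption of this example, not something taken from the slides.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SCALE = 1.702  # commonly quoted rescaling constant (assumption of this sketch)

# Largest vertical distance between the two curves on a fine grid.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print("largest gap between the two curves:", max_gap)

# Tail comparison: probability of falling below z = -4 under each curve.
tail_ratio = logistic_cdf(SCALE * -4.0) / normal_cdf(-4.0)
print("logistic / normal tail probability at z = -4:", tail_ratio)
```

Over the central range the curves are almost indistinguishable, while far in the tail the logistic curve assigns a much larger probability, which is why observations concentrated in the tails are the case where the two models can disagree.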
Generalizations
ordered logit (ordered probit) models
the dependent variable is not binary but categorical and the categories are ordered
Explanatory variables
It is possible to opt for step-wise selection of explanatory variables
Declare categorical variables
Additional statistics
Additional statistics
Hosmer-Lemeshow goodness-of-fit: the test compares the expected frequencies with those actually observed, after dividing the subjects into ten groups of equal size according to their predicted probabilities.
CI for exp(B): in logistic regression the exponentials of the coefficients are odds ratios; this option provides confidence intervals for them.
Output
Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     467.079(a)         .157                  .217
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Hosmer and Lemeshow Test
Step  Chi-square  df  Sig.
1     3.030       8   .932
The hypothesis of equality between observed and predicted frequencies is not rejected.
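The significance level in this test can be reproduced from the chi-square statistic and its degrees of freedom. The sketch below uses a closed-form expression for the upper tail of the chi-square distribution that holds only when the degrees of freedom are even; `chi2_sf_even_df` is an illustrative helper written for this example.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires an even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

# Chi-square = 3.030 with 8 degrees of freedom, as reported in the output.
p_value = chi2_sf_even_df(3.030, 8)
print(p_value)
```

A p-value this far above conventional significance levels is what leads to the conclusion that observed and predicted frequencies do not differ significantly.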
Classification Table(a)
Step 1
Observed              Predicted Butcher: no  Predicted Butcher: yes  Percentage Correct
Butcher: no           243                    34                      87.7
Butcher: yes          89                     54                      37.8
Overall Percentage                                                   70.7
a. The cut value is .500
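The percentages in the classification table follow directly from the four counts. The sketch below reads the counts as observed (rows) by predicted (columns), which is the arrangement that reproduces the reported figures; the dictionary layout is just one convenient way to hold them.

```python
# Counts from the classification table: outer key = observed, inner key = predicted.
table = {
    "no":  {"no": 243, "yes": 34},
    "yes": {"no": 89,  "yes": 54},
}

def percent_correct(observed):
    """Share of cases in an observed category that were predicted correctly."""
    row = table[observed]
    return 100.0 * row[observed] / sum(row.values())

total = sum(sum(row.values()) for row in table.values())
overall = 100.0 * (table["no"]["no"] + table["yes"]["yes"]) / total

print(round(percent_correct("no"), 1), round(percent_correct("yes"), 1), round(overall, 1))
```

The model classifies non-buyers much better than buyers, which the overall percentage on its own would hide.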
Coefficient estimates
These are odds ratios and are interpreted as follows: A one-year increase in age (q51) leads to a 2.2% increase in the odds of purchasing chicken at the butcher shop (i.e. the ratio between the probability of doing it and the probability of not doing it)
Variables in the Equation (Step 1)
Variable  S.E.  Wald    df  Sig.  Exp(B)  95.0% C.I. for Exp(B) Lower  Upper
q51       .007  8.988   1   .003  1.022   1.008                        1.037
q43b      .074  13.327  1   .000  .764    .661                         .883
q21d      .077  32.888  1   .000  1.554   1.337                        1.807
q5        .028  8.975   1   .003  1.088   1.030                        1.150
Constant  .615  26.539  1   .000  .042
As requested, 95% confidence intervals for the odds ratio are shown.
A unit increase in trust in supermarket (q43b) decreases the same odds by 23.6%.
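The percentage changes quoted for age (q51) and trust in supermarkets (q43b) and their confidence intervals can be recovered from Exp(B) and the standard error. This is a sketch under the assumption that the intervals are Wald intervals built on the coefficient scale and then exponentiated; small discrepancies in the third decimal are expected because the inputs are rounded.

```python
import math

# Exp(B) and S.E. for the two variables discussed in the text.
rows = {"q51": (1.022, 0.007), "q43b": (0.764, 0.074)}

results = {}
for name, (odds_ratio, se) in rows.items():
    b = math.log(odds_ratio)                 # coefficient recovered from Exp(B)
    results[name] = {
        "pct_change": (odds_ratio - 1.0) * 100.0,  # percentage change in the odds
        "lower": math.exp(b - 1.96 * se),          # lower bound of the 95% interval
        "upper": math.exp(b + 1.96 * se),          # upper bound of the 95% interval
    }
    print(name, results[name])
```

An odds ratio above one translates into a percentage increase in the odds and one below one into a percentage decrease.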
Artificial variable
Covariates
Model choice
Results are very similar to those obtained from logistic regression
GLM
It is a comprehensive modelling approach for discrete choice modelling, where one or more categorical dependent variables are modelled as the outcome of one or more explanatory variables, which can be metric or non-metric.
Depending on the type of link function, the GLM collapses into:
  logistic regression
  logit or probit models
  multinomial or multivariate logistic regression
GLMs
A binary dependent variable here leads to discrete choice models
It is possible to choose the dependent variable distribution and the link function
Here more model options (e.g. interactions) are defined.
Note that this procedure can also be used for log-linear analysis.
Predictors
Factors are categorical variables
Model
It is necessary to specify how the predictors enter the model.
They need to be included as main effects.
If desired, interactions (including those higher than two-way) may be introduced (see loglinear analysis).
Output
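As expected, the results match those of logistic regression: standard errors and Wald statistics are identical, while the coefficients have the same magnitude but the opposite sign, which is what one would see if the procedure models the other category of the dependent variable as the event.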
Parameter Estimates
Parameter    B      Std. Error  95% Wald CI Lower  95% Wald CI Upper  Wald Chi-Square  df  Sig.
(Intercept)  3.169  .6152       1.963              4.375              26.539           1   .000
q5           -.085  .0283       -.140              -.029              8.975            1   .003
q51          -.022  .0072       -.036              -.008              8.988            1   .003
q21d         -.441  .0769       -.592              -.290              32.888           1   .000
q43b         .269   .0738       .125               .414               13.327           1   .000
(Scale)      1(a)
Dependent Variable: Butcher
Model: (Intercept), q5, q51, q21d, q43b
a. Fixed at the displayed value.
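One way to check the correspondence with the logistic regression output is to exponentiate the GLM coefficients with the sign reversed and compare them with the Exp(B) column reported earlier. In this sketch the rows of the two tables are paired through their identical Wald statistics, which is an assumption of the example.

```python
import math

# B from the GLM table and Exp(B) from the logistic regression table.
glm_b = {"(Intercept)": 3.169, "q5": -0.085, "q51": -0.022, "q21d": -0.441, "q43b": 0.269}
logit_exp_b = {"(Intercept)": 0.042, "q5": 1.088, "q51": 1.022, "q21d": 1.554, "q43b": 0.764}

# exp(-B) from the GLM should reproduce Exp(B) up to rounding of the printed values.
gaps = {name: abs(math.exp(-b) - logit_exp_b[name]) for name, b in glm_b.items()}
max_gap = max(gaps.values())
print(gaps)
```

The remaining differences are of the size one would expect from coefficients printed to three decimals.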
Ordinal logit
Output
Warnings
There are 975 (77.0%) cells (i.e., dependent variable levels by combinations of predictor variable values) with zero frequencies.
A large proportion of empty cells may lead to invalid goodness-of-fit measures.

Model Fitting Information
Model           -2 Log Likelihood  Chi-Square  df  Sig.
Intercept Only  1004.627
Final           987.352            17.276      7   .016
Link function: Logit.
A significant Chi-square statistic indicates that the ordered logit model is better than an intercept-only model.

Goodness-of-Fit
          Chi-Square  df    Sig.
Pearson   1088.968    1073  .360
Deviance  816.931     1073  1.000
Link function: Logit.
The Pearson Chi-square indicates a good fit, intended as the similarity between the predicted and observed data (but it is sensitive to the large number of empty cells).

Pseudo R-Square
Cox and Snell, Nagelkerke, McFadden (values not shown)
Link function: Logit.
The Pseudo R-square statistics are quite low, suggesting that the model could be improved by the inclusion of other covariates and factors.
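The Chi-square in the Model Fitting Information table is the difference between the two -2 log-likelihood values, as the short check below illustrates. The variable names are illustrative; the small gap with the printed value comes from rounding in the output.

```python
# -2 log-likelihood values read from the Model Fitting Information table.
neg2ll_intercept_only = 1004.627
neg2ll_final = 987.352

# Likelihood ratio statistic: improvement of the final model over the intercept-only model.
lr_chi_square = neg2ll_intercept_only - neg2ll_final
print(round(lr_chi_square, 3))
```

The statistic has 7 degrees of freedom, one for each location parameter that is actually estimated (q51 plus six of the seven q60 categories).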
Parameter estimates
Parameter Estimates
                       Estimate  Std. Error  Wald   df  Sig.  95% CI Lower Bound  95% CI Upper Bound
Threshold  [q43j = 1]  -2.861    .941        9.250  1   .002  -4.705              -1.017
           [q43j = 2]  -2.389    .935        6.528  1   .011  -4.222              -.556
           [q43j = 3]  -1.738    .930        3.492  1   .062  -3.560              .085
           [q43j = 4]  -.672     .925        .527   1   .468  -2.486              1.142
           [q43j = 5]  .051      .925        .003   1   .956  -1.761              1.864
           [q43j = 6]  1.302     .930        1.960  1   .162  -.521               3.126
Location   q51         -.009     .006        1.961  1   .161  -.021               .004
           [q60=0]     -.845     .905        .873   1   .350  -2.618              .928
           [q60=1]     -.158     .898        .031   1   .861  -1.917              1.602
           [q60=2]     -.069     .912        .006   1   .940  -1.857              1.719
           [q60=3]     .121      .918        .017   1   .895  -1.678              1.920
           [q60=4]     -.081     .952        .007   1   .933  -1.947              1.786
           [q60=5]     .774      1.111       .486   1   .486  -1.403              2.952
           [q60=6]     0(a)      .           .      0   .     .                   .
Link function: Logit.
a. This parameter is set to zero because it is redundant.

The location parameters translate the predictors into a value for the latent variable.
The threshold determines the cut-off points for allocating an observation to a given value of the dependent variable, according to the value of the latent variable.
The Wald test (corresponding to the t-test in regression) shows that the predictors do not actually contribute significantly. This is consistent with the poor Pseudo R-square statistics.
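How thresholds and location parameters combine into category probabilities can be sketched as follows. The example assumes the cumulative logit is parameterised as threshold minus location, which is a common convention for ordinal models but should be checked against the software documentation; the respondent (age 40, q60 = 6) is hypothetical.

```python
import math

# Threshold and location estimates read from the parameter table.
thresholds = [-2.861, -2.389, -1.738, -0.672, 0.051, 1.302]
b_q51 = -0.009
b_q60 = {0: -0.845, 1: -0.158, 2: -0.069, 3: 0.121, 4: -0.081, 5: 0.774, 6: 0.0}

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(q51, q60):
    """Probabilities of q43j = 1..7, assuming cumulative logit = threshold - location."""
    location = b_q51 * q51 + b_q60[q60]
    cumulative = [logistic_cdf(t - location) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # probability mass between consecutive cut-offs
        previous = c
    return probs

probs = category_probabilities(q51=40, q60=6)  # hypothetical respondent
print([round(p, 3) for p in probs])
```

Six thresholds separate seven ordered categories, so the function returns seven probabilities that sum to one.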
Marginal effects
What could be interesting (at least for a model with a better fit) is the computation of the marginal effects.
They represent the change in the probability of an observation being classified in each specific category of the dependent variable as the values of the predictors change.
Unfortunately, SPSS does not provide marginal effects.
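Marginal effects can be approximated by hand from the estimated parameters. The sketch below uses a central finite difference for the effect of q51 on each category probability; the threshold-minus-location parameterisation, the step size h and the evaluation point (age 40, reference category of q60) are all assumptions of this example.

```python
import math

# Estimates read from the ordinal logit parameter table.
thresholds = [-2.861, -2.389, -1.738, -0.672, 0.051, 1.302]
b_q51 = -0.009

def category_probabilities(location):
    """Category probabilities, assuming cumulative logit = threshold - location."""
    cdf = [1.0 / (1.0 + math.exp(-(t - location))) for t in thresholds] + [1.0]
    return [c - p for c, p in zip(cdf, [0.0] + cdf[:-1])]

def marginal_effects(q51, other_location=0.0, h=1e-4):
    """Central finite-difference change in each category probability per unit of q51."""
    up = category_probabilities(b_q51 * (q51 + h) + other_location)
    down = category_probabilities(b_q51 * (q51 - h) + other_location)
    return [(u - d) / (2.0 * h) for u, d in zip(up, down)]

effects = marginal_effects(q51=40)
print(effects)
```

Because the category probabilities always sum to one, the marginal effects across categories sum to zero: a higher probability for some categories must be offset by a lower probability for others.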
Multinomial logit/probit
The process is similar to the one leading to the estimation of ordered logistic regression and the output should also be interpreted accordingly
LIMDEP was specifically created for the estimation of limited dependent variable models, which include discrete choice models.
It is extremely flexible and contains all the required features and the most up-to-date diagnostics.
Stata estimates discrete choice models with marginal effects.
EViews (Econometric Views) allows estimation of discrete choice models, but the availability of diagnostics is rather limited compared with LIMDEP, and no marginal effects are displayed.
Conjoint analysis
A very popular research technique in marketing, closely associated with stated preference analysis.
Mainly exploited for the development of new products and the modification of product characteristics.
Conjoint analysis is not a model or an estimation technique, but rather a methodology for constructing the data collection instrument when the final objective is choice modelling.
Rather than asking consumers about their evaluation of these attributes on a one-by-one basis, conjoint analysis starts by creating potential combinations of the product attributes. E.g.
  Combination 1: red car, with an mp3 stereo player and no air-conditioning
  Combination 2: red car, but with a standard CD player and air-conditioning
  etc.
Respondents choose among these alternative potential products defined by the combination of attributes.
From the final choice, conjoint analysis elicits the relevance of each attribute.
Conjoint analysis
When several attributes are considered simultaneously, the number of potential combinations is quite high.
Conjoint analysis creates many different choice sets, each one containing a limited number of options.
Conjoint analysis is based on the statistical control of:
  the way choices are allocated in the sample
  the distribution of attributes
Hence, the collected data enable inference on preferences and evaluations for the individual attributes.
By observing many individuals it is possible to go back from stated choices to preferences.
Conjoint analysis is inspired by scientific experimental designs and the terminology reflects this association:
  Attributes are called factors (e.g. car colour)
  The different values factors can assume are the levels (red, blue, yellow, etc.)
Conjoint analysis is a decompositional method (recall multidimensional scaling techniques), as it starts from an overall evaluation to infer preferences for the individual product attributes.
Experimental design
The key problem of conjoint analysis is the large number of alternative combinations of attributes which arise when there are many factors and levels
E.g. a product with six attributes, each with three levels potentially allows for 729 different combinations
It would be unrealistic to assume that respondents are able to choose among so many alternatives.
This problem can be solved by an appropriate experimental design.
Objective: understand the relationship between the factors and the potential choice with as small a number of observations as possible.
The experimental design sets the criteria to obtain the preference information from an aggregation of respondents:
  full factorial designs: all potential products are compared (729 in the example)
  fractional factorial designs: exploit the experimental design to reduce the number of choices, while still guaranteeing that the sample will produce meaningful aggregate results
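The size of the full factorial design in the example can be verified by enumerating every combination. In this sketch the attributes and levels are unnamed placeholders (levels 0, 1, 2 for each of six attributes), since the slides do not specify them.

```python
from itertools import product

levels_per_attribute = 3
n_attributes = 6

# Every possible product profile: one level chosen for each attribute.
full_factorial = list(product(range(levels_per_attribute), repeat=n_attributes))
print(len(full_factorial))  # 3**6 = 729
```

Each extra three-level attribute multiplies the number of profiles by three, which is why fractional designs become necessary very quickly.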
Example
Car colour: red or blue
Air conditioning: yes or no
Single choice set:
  red with air conditioning (AC)
  red without AC
  blue with AC
  blue without AC
Choice-based conjoint, first choose among:
  red with AC
  blue without AC
  none of them
These choices are related, and with a smaller set of choices it is possible to compare all attributes.
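The two-attribute example can be laid out in a few lines to show why the smaller choice set still covers every attribute level. The labels below simply restate the example; counting how often each level appears is an illustrative check, not a procedure described in the slides.

```python
from itertools import product
from collections import Counter

colours = ["red", "blue"]
air_conditioning = ["AC", "no AC"]

# Single choice set: all four products are shown together.
full_profile = list(product(colours, air_conditioning))

# Choice-based conjoint: the pair of products shown in the example (plus "none of them").
choice_set = [("red", "AC"), ("blue", "no AC")]

colour_counts = Counter(colour for colour, _ in choice_set)
ac_counts = Counter(ac for _, ac in choice_set)
print(len(full_profile), dict(colour_counts), dict(ac_counts))
```

Each colour and each air-conditioning level appears once in the reduced set, so both attributes vary across the options even though only two of the four products are shown.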
There are computer packages specifically developed for conjoint analysis.
SPSS Conjoint module:
  deals with the experimental design
  provides estimates based on an orthogonal decomposition of the design matrix
In SAS/STAT, the TRANSREG procedure is a useful support to define the experimental design.