Bayesian Statistics and Marketing, by Rossi and Allenby


Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA
Marketing Science
Publication details, including instructions for authors and subscription information:
http://pubsonline.informs.org
Bayesian Statistics and Marketing

Peter E. Rossi, Greg M. Allenby,
To cite this article:

Peter E. Rossi, Greg M. Allenby (2003) Bayesian Statistics and Marketing. Marketing Science 22(3):304-328. http://dx.doi.org/10.1287/mksc.22.3.304.17739
Peter E. Rossi Greg M. Allenby
Graduate School of Business, University of Chicago, 1101 E. 58th Street, Chicago, Illinois 60637
Fisher College of Business, Ohio State University, 2100 Neil Avenue, Columbus, Ohio 43210
peter.rossi@gsb.uchicago.edu allenby.1@osu.edu
Bayesian methods have become widespread in marketing literature. We review the essence of the Bayesian approach and explain why it is particularly useful for marketing problems. While the appeal of the Bayesian approach has long been noted by researchers, recent developments in computational methods and expanded availability of detailed marketplace data have fueled the growth in application of Bayesian methods in marketing. We emphasize the modularity and flexibility of modern Bayesian approaches. The usefulness of Bayesian methods in situations in which there is limited information about a large number of units or where the information comes from different sources is noted. We include an extensive discussion of open issues and directions for future research.
(Bayesian Statistics; Decision Theory; Marketing Models; Critical Review)
1. Introduction

The past ten years have seen a dramatic increase in the use of Bayesian methods in marketing. Bayesian analyses have been conducted over a wide range of marketing problems from new product introduction to pricing, and with a wide variety of different data sources. Bayesian methods are particularly appropriate to the decision orientation of marketing problems. While the conceptual appeal of Bayesian methods has long been recognized, the recent popularity stems from computational and modeling breakthroughs that have made Bayesian methods attractive for many marketing problems. In this paper, we will outline the basic advantages of the Bayesian approach, explain how hierarchical Bayes models are ideally suited to many marketing data sets and decisions, and outline the nature of the computational revolution. Throughout, we will emphasize the importance of a decision orientation that we believe is an important aspect of marketing as a field.

Until the mid-1980s, Bayesian methods appeared to be impractical because the class of models for which the posterior could be computed was no larger than the class of models for which exact sampling results were available. Moreover, the Bayes approach does require assessment of a prior, which some feel to be an extra cost. Simulation methods, in particular, Markov Chain Monte Carlo (MCMC) methods, have freed us from computational constraints for a very wide class of models. MCMC methods are ideally suited for models built from a sequence of conditional distributions, often called hierarchical models. Bayesian hierarchical models offer tremendous flexibility and modularity and are particularly useful for marketing problems as discussed below.

While Bayesian methods have risen to prominence in many fields, this review will emphasize a perspective on the use of Bayes methods that stems from a basic marketing paradigm. Fundamental to this perspective is the notion that customers are different in their preferences for products and that firms must explicitly take this into account in determining optimal marketing actions. It is useful, therefore, to view statistical analysis as comprised of three components:
1. within-unit behavior (the conditional likelihood);
2. across-unit behavior (the distribution of heterogeneity);
3. action (the solution to a decision problem involving a loss function).
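As a rough illustration of how the three components fit together, the following Python sketch uses a toy normal-normal model that is not taken from this paper. The within-unit likelihood is assumed to be y_it ~ N(theta_i, sigma2), the distribution of heterogeneity is assumed to be theta_i ~ N(mu, tau2) with known hyperparameters, and the action is chosen by minimizing a simulated posterior expected loss. The function names (`unit_posterior`, `best_action`, `squared_error`) and all numeric values are hypothetical and serve only to make the structure concrete.

```python
import random
import statistics


def unit_posterior(y_i, mu, tau2, sigma2):
    """Posterior mean and variance of theta_i for one unit.

    Assumed toy model (not from the paper):
      within-unit likelihood:    y_it ~ N(theta_i, sigma2)
      across-unit heterogeneity: theta_i ~ N(mu, tau2)
    with mu, tau2, and sigma2 treated as known.
    """
    n = len(y_i)
    precision = n / sigma2 + 1.0 / tau2
    mean = (sum(y_i) / sigma2 + mu / tau2) / precision
    return mean, 1.0 / precision


def best_action(post_mean, post_var, actions, loss, draws=2000, seed=0):
    """Pick the action with the smallest simulated posterior expected loss."""
    rng = random.Random(seed)
    thetas = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(draws)]

    def expected_loss(action):
        return statistics.fmean(loss(action, t) for t in thetas)

    return min(actions, key=expected_loss)


def squared_error(action, theta):
    """Hypothetical loss function: squared distance from the unit's parameter."""
    return (action - theta) ** 2


if __name__ == "__main__":
    # A unit with only two observations is shrunk toward the population mean mu.
    mean, var = unit_posterior([3.0, 3.0], mu=0.0, tau2=1.0, sigma2=1.0)
    grid = [0.5 * k for k in range(7)]  # candidate actions 0.0, 0.5, ..., 3.0
    print(mean, var, best_action(mean, var, grid, squared_error))
```

Under squared-error loss the best action is the posterior mean, so the grid search should settle on the candidate closest to it. Other loss functions can be passed in without changing the likelihood or heterogeneity pieces, which is the sense in which the three components are modular.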
Marketing Science, Vol. 22, No. 3, Summer 2003, pp. 304-328. © 2003 INFORMS. 0732-2399/03/2203/0304; 1526-548X electronic ISSN.
We will see how the Bayesian approach provides a unified treatment of all three components.

We will follow these three steps as the outline of the paper, and conclude the paper with a discussion of open issues and directions for future research. We have also included Annotated Citations of Bayesian Applications in Marketing in Appendix 1, which contains a list of published or accepted papers of the last ten years that tackle marketing problems using Bayesian methods. The annotations provide a brief description of the paper and how it relates to the topics discussed in this paper.

2. Bayesian Essentials

In this section, we introduce our notation for the Bayesian paradigm, and comment on the important distinctions between classical and Bayesian approaches. We feel that these distinctions are underappreciated by researchers in marketing. We do not attempt to provide a primer for Bayesian inference. For those interested in an introduction to Bayesian inference and modern Bayesian computing methods, there are many excellent texts, including Bernardo and Smith (1994), Gelman et al. (1995), Robert and Casella (1999), and Liu (2001).

All Bayesian analysis starts with the specification of the data-generating mechanism or the distribution of the data $y$, given the unobservable parameters $\theta$, $p(y \mid \theta)$. Viewed as a function of the parameters, this distribution is sometimes called the likelihood function, $\ell(\theta) = p(y \mid \theta)$. The Bayesian, therefore, subscribes to the likelihood principle that states that the likelihood function contains all relevant information regarding the model parameters. In addition, a probability distribution representing prior beliefs about $\theta$ is required, $p(\theta)$. Bayes' theorem provides the updating mechanism for how prior beliefs are translated into posterior (or after the data) beliefs.

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)$$

$p(\theta \mid y)$ is called the posterior distribution and reflects both the prior beliefs, as well as sample information. We note, immediately, that the posterior is a conditional distribution that conditions on the data. This provides a marked contrast to the sampling theoretic view in which we consider the data random, and we investigate the behavior of test statistics or estimators over imaginary samples from $p(y \mid \theta)$. The Bayesian would regard the sampling distribution as irrelevant to the problem of inference because it considers events $y$ that have not occurred. Inference is the problem of making statements about the unobservables conditional on the data.

Since the posterior distribution can be a high-dimensional object, investigators typically summarize the posterior in terms of some lower dimensional summary statistics. Typically, the posterior mean $E[\theta] = \int \theta\, p(\theta \mid y)\, d\theta$ is used as an estimator and the posterior standard deviation is used as a measure of precision. Both of these quantities are the integrals of specific functions of the parameter vector, $E_{\theta \mid y}[h(\theta)]$. Other important examples include: (i) aspects of the marginal distribution of one element or a subset of the $\theta$ vector; (ii) posterior probabilities of intervals or regions of the parameter space (such as the posterior probability that a price coefficient is negative); and (iii) predictive distributions of the data, $p(y_f \mid y) = \int p(y_f \mid \theta)\, p(\theta \mid y)\, d\theta$. Thus, the Bayesian investigator is faced with the problem of computing a multidimensional integral of the posterior distribution. Methods for computing these integrals are at the core of the recent revolution in computing for Bayesian statistics.

The Bayesian framework is compelling in the sense that it provides a unified approach to modeling, incorporation of prior information, and inference. Inference here refers to making a posteriori statements about all unobservables including both parameters and, as yet unrealized, data (prediction). Bayesian inference adheres to the likelihood principle and is conducted using formal rules of probability theory. This means that, under mild conditions, Bayes estimators are consistent, asymptotically efficient, and admissible. As a practical matter, Bayesian inference is free from the use of asymptotic approximations and delivers exact, finite sample inference. This is particularly important in nonlinear models and models with discrete data. The intuition developed for regression models of the sample size required for asymptotic sampling theory to be accurate does not carry over well to many
of the models used with marketing data. In particular, choice models may require extremely large (as much as 1,000 observations per parameter) samples to ensure the adequacy of asymptotic approximations (cf. McCulloch and Rossi 1994).

In general, Bayesian methods provide a better approximation to the level of uncertainty or, conversely, the amount of information provided by the model and the data than other approaches. For example, consider two-step procedures in which a subset of parameters is estimated in the first stage, then the second stage estimates the remaining parameters, conditional on the first subset. Parameter uncertainty is difficult to account for in multistage analyses. Lenk and DeSarbo (2000) provide an example of how a full Bayesian procedure outperforms an approximate two-step procedure for clustering problems. Parameter uncertainty and model uncertainty are particularly important considerations in optimal decision theory. Optimal decision making should take into account uncertainty to avoid the problem of overconfidence (see Montgomery and Bradlow 1999). Bayesian decision theory provides a unified approach to inference, model choice, and uncertainty as discussed in §6 of this paper.

The advantages of Bayesian inference are not obtained without a cost, however. The Bayesian approach is likelihood-based and requires a prior. Some have criticized Bayesian methods as relying on subjective prior information. It is important to note that the basis of prior information can also be objective or data-based. In addition, all modeling assumptions are a form of prior information. The advantage of the Bayesian approach is that all prior assumptions are explicitly stated. Adherence to the principles of scientific inquiry does not rule out the use of subjective information but, rather, requires the specification of explicit and replicable procedures. It should be noted that in the practical domain of marketing, methods that make full use of prior information are required for reliable inference because information about unknown quantities is hard to come by. Prior information from experts (Sandor and Wedel 2001), theories (Montgomery and Rossi 1999), or other datasets (Lenk and Rao 1990, Putler et al. 1996, Kamakura and Wedel 1997, Wedel and Pieters 2000, Ansari et al. 2000a, Ter Hofstede et al. 2002) is important to marketing problems. Classical inference procedures are silent on how to incorporate information from sources other than the data.

Some contend that specification of the likelihood function is another drawback of the Bayesian approach. For some models, evaluation of the likelihood can be computationally demanding. In other situations, the investigator may be concerned with model specification error induced by specifying an inappropriate likelihood. Recent developments in statistical computing have opened up the possibility of analyzing likelihood functions once thought to be computationally intractable. Regarding prior and likelihood specification, we recommend that the investigator perform sensitivity analysis.

3. MCMC Simulation Methods

The general computational problem facing Bayesians is the computation of various integrals of functions with respect to the posterior distribution. Since these integrals can be written as the posterior expectation of a function of the parameters, simulation methods seem natural candidates for approximation. For example, if we could make i.i.d. draws from the posterior we could simply approximate the integrals by the sample mean

$$I = E_{\theta \mid y}[h(\theta)] = \int h(\theta)\, p(\theta \mid y)\, d\theta$$

$$\hat{I} = \frac{1}{R} \sum_{r=1}^{R} h(\theta^r)$$

If draws from the posterior are available at low computational cost, we could simply use a very large sample to approximate $I$ to any desired degree of accuracy. However, the general problem of drawing from an arbitrary multivariate distribution is extremely difficult and there is no computationally feasible general method.

Instead of using i.i.d. draws, another approach could be to construct a Markov chain with the posterior as its stationary or equilibrium distribution. In practice, this means specifying a transition density that produces a sequence of draws. $\theta^r$ is a draw
from $p(\theta^r \mid \theta^{r-1})$ given $\theta^0$. If $p(\theta \mid y)$ is the stationary distribution of this Markov chain, then we can simply iterate the chain long enough to dissipate the effects of the initial condition and then save these draws to evaluate $I$. While these draws are no longer i.i.d. (they will exhibit some form of autocorrelation in most cases), laws of large numbers still apply and we can approximate $I$ to any desired degree of accuracy. The use of a Markov chain to develop a simulation-based estimate of $I$ has been termed an MCMC method (see Robert and Casella 1999 for a comprehensive discussion of these methods, and Chib 2003 for an excellent overview). The usefulness of the MCMC idea depends on three criteria:
(i) the ability to construct chains for arbitrary posterior distributions;
(ii) the ability of the chain to quickly converge to the equilibrium distribution and not to exhibit highly autocorrelated or near nonstationary behavior;
(iii) ease of drawing from the transition density of the chain.

The class of Metropolis-Hastings algorithms provides a set of methods for constructing Markov chains. Tierney (1994) shows that under very mild conditions (mostly that the posterior density is positive everywhere in the parameter space), the Metropolis-Hastings style methods will converge at a geometric rate to the unique equilibrium distribution that will be the posterior. One particularly useful member of the Metropolis-Hastings class is the so-called Gibbs sampler. The Gibbs sampler is dependent on the ability to draw from various conditional distributions of the joint posterior. Partition the $\theta$ vector into $K$ subvectors, $\theta = (\theta_1, \ldots, \theta_K)$, and consider the conditional distributions $p_k(\theta_k \mid \theta_{-k}, y)$, where $\theta_{-k}$ refers to the elements of $\theta$, other than the $k$th element. If it is possible to draw from these conditional distributions, then the Markov chain that can be constructed by cycling through these conditional distributions has the posterior as its invariant distribution. That is, to draw $\theta^r \mid \theta^{r-1}$, we cycle through each of the $K$ conditionals

$$p_1(\theta_1^r \mid \theta_2^{r-1}, \theta_3^{r-1}, \ldots, \theta_K^{r-1}, y)$$
$$p_2(\theta_2^r \mid \theta_1^r, \theta_3^{r-1}, \ldots, \theta_K^{r-1}, y)$$
$$\vdots$$
$$p_K(\theta_K^r \mid \theta_1^r, \theta_2^r, \ldots, \theta_{K-1}^r, y)$$

The sequence of draws converges in distribution to the joint posterior distribution of the model parameters, $p(\theta_1, \ldots, \theta_K \mid y)$. In addition, draws from the posterior distribution for any one parameter,

$$p_i(\theta_i \mid y) = \int p(\theta_1, \ldots, \theta_K \mid y)\, d\theta_{-i}$$

are obtained by simply discarding the draws of parameters not of interest. For example, the posterior mean of $p_i(\theta_i \mid y)$ can be estimated from the sample mean of the $\theta_i$ draws.

Many models can be expressed in such a way that these various conditionals are available in closed form and from well-known distributions that are easy to sample from. For example, single and groups of linear regressions fall into this class. In addition, data augmentation allows standard probit models to be sampled using the Gibbs sampler. Today most work is being done with models that no longer have a simple Gibbs sampler. However, the modular or conditional setup of the Gibbs sampler is frequently exploited. Typically, some sort of hybrid approach is used, in which Gibbs-style draws are combined with Metropolis-style draws.

It is possible to define a Metropolis-Hastings style MCMC algorithm for many models, including highly nonlinear models or models defined in high dimensions. These algorithms, however, must be investigated closely to ensure that they navigate freely through the parameter space and reach the regions of high posterior probability. It is possible to stop a slowly navigating chain and conclude that the posterior is very tight when, in fact, the algorithm is moving too slowly. We recommend that investigators simulate data to investigate performance of the MCMC algorithm. Independence Metropolis or random walk Metropolis algorithms must have properly chosen candidate sampling distributions in order to function well in high-dimensional parameter spaces. We recommend that investigators check these methods against the slower, but more reliable, one-by-one Griddy Gibbs methods.

In summary, MCMC methods are now available to handle inference in a wide class of models. MCMC methods are particularly well-suited to models which are built from a hierarchy of conditional

distributions. One very important example is random coefficient models, discussed below. Perhaps, more importantly, the modularity of the hierarchical modeling approaches, that dovetails so well with MCMC methods, has enlarged the class of priors and likelihoods available for use in marketing applications.

4. Within-Unit Analysis: Likelihoods and Marketing Data

Marketing is concerned with understanding and reacting to the behavior of individual consumers. Decisions are ultimately made at a disaggregate level, although for some types of decisions (e.g., setting store prices) an aggregate-level analysis is acceptable. We note that it is always possible to derive aggregate predictions of actions by integrating over the distribution of heterogeneity. Our discussion, therefore, focuses on data and models assuming that the unit of analysis is an individual respondent, consumer, or household.

Marketing data is sparse at the individual-unit level. In scanner panels of household purchases, for example, it is rare to have more than 20 observations per household in most product categories. Each observation is a vector response corresponding to the quantity purchased of a particular offering. The most frequent response value is zero, indicating no purchase of the offering, and the second most frequent response is one, indicating that one unit of the good is purchased. Responses also take on integer values in surveys where respondents are asked to choose between discrete alternatives, to rank order objects (Bradlow and Fader 2001), and to provide responses on five- and seven-point scales. Marketing data are typically very lumpy, and are not well-suited to standard distributional assumptions (e.g., normal, gamma, Poisson).

Latent variable models are often used to explain marketing data. A latent variable model typically assumes that there exists an unobserved continuous variable and a censoring mechanism that gives rise to the discrete outcome. In an economic model of choice between near-perfect substitutes, for example, consumers are assumed to select the offering with greatest value, measured as the ratio of marginal utility to price. In the analysis of survey response data, it is sometimes convenient to assume that responses on a fixed-point (e.g., five- and seven-point) scale are a censored realization of a latent, continuous variable. Finally, in the analysis of multiple response data (i.e., pick any of J data), each element of the vector of multivariate binomial responses can be thought of as being equal to one if a latent variable surpasses a threshold, and equal to zero if the latent variable is less than the threshold value.

The advantage of a latent variable approach to modeling marketing data is that it provides a flexible approach to specifying the likelihood function that is consistent with the observed, lumpy data (Rossi et al. 2001, Marshall and Bradlow 2002). Models for the latent utility can be continuous even though the range of the observed dependent variable is discrete. Many useful models can be constructed starting from an underlying multivariate normal regression model

$$z = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \Sigma)$$

Here, $z$ is an $m \times 1$ vector, which is multivariate normal conditional on $x$. The latent vector, $z$, is censored via some function which is not a function of the model parameters, $(\beta, \Sigma)$. Examples include:
Tobit Model: $m = 1$; $y = 0$ if $z < 0$; $y = z$ if $z \geq 0$
Ordered Probit: $m = 1$; $y = r$ if $c_{r-1} \leq z < c_r$, $r = 1, \ldots, R$; $c_0 = -\infty$, $c_R = \infty$
Multinomial Probit (MNP): $y = j$ if $z_j = \max(z_1, \ldots, z_m)$
Multivariate Probit: $y_j = 1$ if $z_j > 0$; else $y_j = 0$

These four examples illustrate the flexibility of the latent framework. The Tobit model produces a discrete-continuous distribution for $y$ given $x$ that has a lump of probability at zero (the no-purchase option, for example). The ordered probit can be applied to ratings data in which the respondent provides ratings on a ratings scale. The MNP probit model is a very flexible general model that accommodates situations in which choices are made from a set of $m$ alternatives. Finally, the multivariate probit model can be used in situations such as the pick j from J alternatives or where binary choice is made in different time periods or categories of products.

Latent variable models can often be given an economic interpretation as a random utility model. Consider, for example, the MNP model. If consumers have

linear utility and can only choose one alternative, the utility-maximizing choice is the choice for which the ratio of marginal utility to price is the highest;

$$y = j, \text{ if } U_j / p_j = \max_i \{U_i / p_i\}$$

where $U_i$ is the marginal utility of choice $i$. In the random utility model, marginal utilities are not fully observable. We only observe various attributes of the choice that are represented in the $x$ vector. If $\ln U_i = V_i + \varepsilon_i$ and $\{\varepsilon_i\} \sim N(0, \Sigma)$, then the model becomes

$$y = j, \text{ if } V_j - \ln p_j + \varepsilon_j = \max_i \{V_i - \ln p_i + \varepsilon_i\}$$

This is a special case of the MNP model. The error terms have the interpretation as the unobservable factors influencing marginal utility. The random utility approach can be applied to any demand model (see Blattberg and George 1991, Arora et al. 1998, Manchanda et al. 1999, Bradlow and Rao 2000, Leichty et al. 2001). If we specify a utility function, the random component of marginal utility will induce a distribution on the quantity demanded via the first order conditions for utility maximization. This will create a likelihood for the data. If the indifference curves of the specified utility function intersect the axes of the positive orthant with nonzero slope, then there is the potential for corner solutions in which some of the components of the demand vector will be zero. These corners will create a mixed discrete-continuous distribution of demand (Kim et al. 2002).

Models involving multivariate latent variables (such as the MNP and multivariate probit models) have a likelihood function that can be computationally challenging to evaluate. For example, consider the MNP model. If alternative $j$ is chosen from $m$ alternatives, this reveals that we are in a certain region of the error space (actually a cone). Thus, the multinomial probabilities required for evaluation of the MNP likelihood involve integrals over a region of the error space

$$\Pr(i) = \int_R \varphi(z \mid \mu = X\beta, \Sigma)\, dz$$

The choice probabilities involve integrals of a multivariate normal density over cones, and these integrals pose a potentially severe computational problem. Classical econometricians have focused on methods for approximating these integrals. The state of the art in this area is the so-called GHK algorithm (Keane 1993). The GHK algorithm uses importance sampling to approximate these probabilities. The current classical practice involves using simulation methods to approximate the likelihood (Huber and Train 2001) and then uses standard maximum likelihood procedures, ignoring the simulation error (this is often called the simulated maximum likelihood approach).

4.1. Data Augmentation

Direct evaluation of the censored normal likelihood can be avoided in a Bayesian approach if the parameters are augmented with a vector of latent variables, $z$ (see Tanner and Wong 1987). To a Bayesian, all unobservable quantities can be considered the object of inference regardless of whether they are called parameters or latent variables. Technically, the number of latent variables can be the same as the number of observations, so large sample inference based on standard asymptotics does not apply. The posterior we now require is the joint distribution of the unobservable latent vector $z$, and the parameter vector $\theta$, given the data $y$. To reduce notational burden, we will only consider the case of one observation. The joint posterior of $z$ and $\theta$ is now the object of inference. The posterior of $\theta$ is a marginal of this joint posterior.

$$p(\theta \mid y) = \int p(z, \theta \mid y)\, dz$$

As it turns out, we can exploit the latent structure of the model to construct a Gibbs-style Markov chain that can sample from the joint posterior of the latents and the parameters. We can then simply marginalize by discarding the draws of $z$. That is, we draw iteratively from the two conditional distributions:

$$p(z \mid \theta, y) \quad \text{and} \quad p(\theta \mid z, y)$$

The draw of the latent $z$ given $\theta$, $y$ is a draw from a truncated normal distribution where the truncation depends on the model. In the MNP case, $z$ is truncated to an $m$-dimensional cone. Given the latent vector $z$, inference proceeds as would standard Bayesian analysis of the underlying latent multivariate regression model. For the linear multivariate regression

model, exact analytic results are available for the posterior of $(\beta, \Sigma)$. Draws from the truncated multivariate normal can easily be accomplished via one-by-one draws from a series of univariate truncated normal distributions (see McCulloch and Rossi 1994, Allenby et al. 1995). This amounts to defining a subchain to draw the truncated normal vector. What is important to note is that by augmenting with the latent variable, we have avoided evaluation of any choice probabilities or other integrals of the multivariate normal. The cost of computational simplification is an enlargement of the state space for the Markov chain. In general, this will cause the data-augmented MCMC method to converge more slowly and exhibit higher autocorrelation than the non-data-augmented sampler. In the case of the MNP model, Nobile (1998) has indicated that, under certain conditions, the standard augmented Gibbs sampler can exhibit very high autocorrelation and proposes an improved chain.

Thus, data augmentation provides a clever way of avoiding evaluation of various multivariate integrals at the possible expense of introducing high autocorrelation into the MCMC method. Our experience, however, has shown that the basic MNP Gibbs sampler works well and can handle problems for which the method of simulated maximum likelihood grinds to a halt.

4.2. Identification

The latent variable formulation provides a natural mechanism for understanding the identification problem in these models. Identification problems stem from the fact that various transformations of the latent variables leave the observed censored outcome variable unchanged. For example, recall that in the MNP model the choice is made with the highest latent value. There are two transformations that leave the index of the maximum unchanged: location and scale shifts (see McCulloch and Rossi 1999, for further details). Identification can be achieved either by imposing exact restrictions on the model parameters, or by employing informative priors on the full parameter space and marginalizing on the identified parameters.

In many classical and Bayesian approaches, the approach to this scaling problem is to fix one of the elements of the covariance matrix (typically, the (1, 1) element) to one. For Bayesian methods, the restriction of the covariance matrix makes it difficult to use standard conjugate priors such as the Wishart prior. McCulloch et al. (2000) show how to construct practical priors on the appropriate space of matrices.

However, Bayesians are not limited to exact restrictions as a way of solving various identification problems. Use of a proper prior distribution ensures that the posterior is proper, even if the likelihood is not identified. In a Bayesian analysis, the issue of statistical identification shifts from an identified/not-identified dichotomy, to an issue of the degree of identification and to subspaces of the posterior distribution that are well identified. For example, we can use a proper but diffuse prior in the unidentified parameter space, and simply marginalize or project down on the space of identified parameters. The only added cost of this procedure is making sure that the induced prior on the identified quantities is sufficiently diffuse to be usable in those situations in which we want our inferences driven primarily by the data.

An even more striking example of the usefulness of this idea of navigating in the full, unidentified space can be found in the multivariate probit model. Here the identified parameters consist only of the correlation matrix of the latent variables because separate scaling constants can be used for each element. Until recently (see Barnard et al. 2000) convenient priors for correlation matrices have not been available. Standard MCMC methods, such as Metropolis-Hastings, are difficult to adapt to the highly restricted space of valid correlation matrices (Manchanda et al. 1999). In other words, it is hard to draw candidate correlation matrices. As Edwards and Allenby (2002) illustrate, all of this can be avoided by navigating in the unidentified space and projecting down to the space of correlation matrices (see also DeSarbo et al. 1999). These algorithms are fast and reliable.

We have seen that disaggregate marketing data is often lumpy, containing discrete mass points of probability. A natural framework for building models with discrete aspects is to use an underlying continuous latent variable, coupled with some sort of censoring mechanism. Not only are latent variables useful for generating models but also the new MCMC Bayesian

inference methods nicely exploit the latent structure. Finally, identification problems that are common in latent variable models can be handled with great flexibility in the Bayesian approach.

5. Across-Unit Analysis: Incorporating Heterogeneity via Hierarchical Models
The explosion in demand data available to marketers comes from the increased availability of disaggregate data. Scanner data at the store and household level is now commonplace. In the pharmaceutical industry, physician-level prescription data is now commonplace. This raises both modeling challenges, as well as major opportunities for improved profitability through decentralized marketing decisions that exploit heterogeneity. This new data comes in panel structure in which $N$, the number of units, is large relative to $T$, the length of the panel. Thus, we may have a large amount of data obtained by observing a large number of decision units. For a variety of reasons, it is unlikely that we will ever have a very large amount of information about any one decision unit. In this situation, it is useful to have a model that pools information among the units. A flexible random effects model, combined with Bayesian inference methods, can produce accurate estimates at both the aggregate and individual decision unit level.

5.1. Heterogeneity and Priors
A useful general structure for disaggregate data is a panel structure in which the units are regarded as independent, conditional on unit-level parameters. Given a joint prior on the collection of unit-level parameters, the posterior distribution can be written as follows:

$$p(\theta_1,\ldots,\theta_N \mid y_1,\ldots,y_N) \propto \Big[\prod_i p(y_i \mid \theta_i)\Big] \times p(\theta_1,\ldots,\theta_N \mid \tau)$$

The term in brackets is the conditional likelihood and the rightmost term is the joint prior with hyperparameter, $\tau$. In many instances, the amount of information available for many of the units is small. This means that the specification of the functional form and hyperparameter for the prior may be important in determining the inferences made for any one unit. A good example of this can be found in choice data sets in which consumers are observed to be choosing from a set of products. Many consumers (units) do not choose all of the alternatives available during the course of observation. In this situation, most standard choice models do not have a bounded maximum likelihood estimate (the likelihood has an asymptote in a certain direction in the parameter space). In this situation, the prior is, in large part, determining the inferences made for these consumers.

Assessment of the joint prior for $(\theta_1,\ldots,\theta_N)$ is difficult, due to the high dimension of the parameter space and, therefore, some sort of simplification of the form of the prior is required. One frequently employed simplification is to assume that, conditional on the hyperparameter, $(\theta_1,\ldots,\theta_N)$ are a priori independent.

$$p(\theta_1,\ldots,\theta_N \mid y_1,\ldots,y_N) \propto \prod_i p(y_i \mid \theta_i)\, p(\theta_i \mid \tau)$$

This means that inference for each unit can be conducted independently of all other units conditional on $\tau$. This is the Bayesian analogue of fixed-effects approaches in classical statistics.

The specification of the conditionally independent prior can be very important, due to the scarcity of data for many of the units. Both the form of the prior and the values of the hyperparameters are important and can have pronounced effects on the unit-level inferences. For example, it is common to specify a normal prior, $\theta_i \sim N(\bar{\theta}, V_\theta)$. The normal form of this prior means that influence of the likelihood for each unit may be attenuated for likelihoods centered far away from the prior. That is, the thin tails of the normal distribution diminish the influence of outlying observations. In this sense, the specification of a normal form for the prior, whatever the values of the hyperparameters, is far from innocuous.

Assessment of the prior hyperparameters can also be challenging in any applied situation. For the case of the normal prior, some relatively diffuse prior may be a reasonable default choice. Rossi and Allenby (1993) use a prior, based on a scaled version of the

pooled model information matrix. The prior covariance is scaled back to represent the expected information in one observation to insure a relatively diffuse prior. Use of this sort of normal prior will induce a phenomenon of shrinkage in which the Bayes estimates (posterior means) $\{\tilde{\theta}_i = E[\theta_i \mid \text{data}_i, \text{prior}]\}$ will be clustered more closely to the prior mean than the unit-level maximum likelihood estimates $\{\hat{\theta}_i\}$. For diffuse prior settings, the normal form of the prior will be responsible for the shrinkage effects. In particular, outliers will be shrunk dramatically toward the prior mean. For many applications, this is a very desirable feature of the normal form prior. We will shrink the outliers in toward the rest of the parameter estimates and leave the rest pretty much alone.

5.2. Hierarchical Models
In general, however, it may be desirable to have the amount of shrinkage induced by the priors driven by information in the data. That is, we should adapt the level of shrinkage to the information in the data regarding the dispersion in $\{\theta_i\}$. If, for example, we observe that the $\{\theta_i\}$ are tightly distributed about some location or that there is very little information in each unit-level likelihood, then we might want to increase the tightness of the prior so that the shrinkage effects are larger. This feature of adaptive shrinkage was the original motivation for work by Efron and Morris (1975) and others on empirical Bayes approaches in which prior parameters were estimated. These empirical Bayes approaches are an approximation to a full Bayes approach in which we specify a second-stage prior on the hyperparameters of the conditional independent prior. This specification is called a hierarchical Bayes model and consists of the unit-level likelihood and two stages of priors.

Likelihood: $p(y_i \mid \theta_i)$
First-stage prior: $p(\theta_i \mid \tau)$
Second-stage prior: $p(\tau \mid h)$.

The joint posterior for the hierarchical model is given by

$$p(\theta_1,\ldots,\theta_m, \tau \mid y_1,\ldots,y_m, h) \propto \Big[\prod_i p(y_i \mid \theta_i)\, p(\theta_i \mid \tau)\Big] \times p(\tau \mid h)$$

In the hierarchical model, the prior induced on the unit-level parameters is not an independent prior. The unit-level parameters are conditionally, but not unconditionally, a priori independent.

$$p(\theta_1,\ldots,\theta_m \mid h) = \int \prod_i p(\theta_i \mid \tau)\, p(\tau \mid h)\, d\tau$$

If, for example, the second-stage prior on $\tau$ is very diffuse, the marginal priors on the unit-level parameters, $\theta_i$, will be highly dependent, as each parameter has a large common component.

The hierarchical model specifies that both prior and sample information will be used to make inferences about the common parameter, $\tau$. For example, in the normal prior, $\theta_i \sim N(\bar{\theta}, V_\theta)$, the common parameters provide the location and the spread of the distribution of $\theta_i$. Thus, the posterior for the $\theta_i$ will reflect a level of shrinkage inferred from the data. It is important to remember, however, that the normal functional form will induce a great deal of shrinkage for outlying units, even if the posterior of $V_\theta$ is centered on large values.

5.3. Inference for Hierarchical Models
Hierarchical models for panel data structures are ideally suited for MCMC methods. In particular, a Gibbs-style Markov chain can often be constructed by considering the basic two sets of conditionals:

(1) $\theta_i \mid \tau, y_i$
and
(2) $\tau \mid \{\theta_i\}$

The first set of conditionals exploits the fact that the $\theta_i$ are conditionally independent. The second set exploits the fact that $\{\theta_i\}$ are sufficient for $\tau$. That is, once the $\{\theta_i\}$ are drawn from (1), these serve as data to the inferences regarding $\tau$. If, for example, the first-stage prior is normal, then standard natural conjugate priors can be used, and all draws can be done one-for-one and in logical blocks. This normal prior model is also the building block for other more complicated priors. The normal model is given by

$$\theta_i \sim N(\bar{\theta}, V_\theta)$$
$$\bar{\theta} \sim N(\bar{\bar{\theta}}, A^{-1})$$
$$V_\theta^{-1} \sim W(\nu, V)$$
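As a rough illustration of the two sets of conditionals in §5.3, the following is a minimal sketch for a deliberately simplified scalar case. It assumes $y_{it} \sim N(\theta_i, \sigma^2)$ with $\sigma^2$ known, and it replaces the Wishart prior with its scalar analogue, an inverse-gamma prior on $V_\theta$. The function name `gibbs_hier_normal`, the default hyperparameter values, and the simulated panel are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gibbs_hier_normal(y, sigma2=1.0, mu0=0.0, A=0.01, a0=2.0, b0=1.0,
                      n_iter=2000, burn=500, seed=0):
    """Two-block Gibbs sampler for a scalar hierarchical normal model.

    Assumed model (illustrative, not from the paper):
      y_it      ~ N(theta_i, sigma2), sigma2 known
      theta_i   ~ N(theta_bar, V)                      first-stage prior
      theta_bar ~ N(mu0, 1/A),  V ~ InvGamma(a0, b0)   second-stage prior
    y is a list of 1-D arrays, one array of observations per unit.
    """
    rng = np.random.default_rng(seed)
    m = len(y)
    T = np.array([len(yi) for yi in y], dtype=float)
    ysum = np.array([np.sum(yi) for yi in y])
    theta = ysum / T                       # start at the unit-level MLEs
    theta_bar, V = theta.mean(), theta.var() + 1e-6
    out = {"theta": [], "theta_bar": [], "V": []}
    for it in range(n_iter):
        # (1) theta_i | tau, y_i: independent normal draws, one per unit
        prec = T / sigma2 + 1.0 / V
        mean = (ysum / sigma2 + theta_bar / V) / prec
        theta = mean + rng.standard_normal(m) / np.sqrt(prec)
        # (2) tau | {theta_i}: the theta_i act as data for (theta_bar, V)
        prec_bar = A + m / V
        mean_bar = (A * mu0 + theta.sum() / V) / prec_bar
        theta_bar = mean_bar + rng.standard_normal() / np.sqrt(prec_bar)
        rate = b0 + 0.5 * np.sum((theta - theta_bar) ** 2)
        V = 1.0 / rng.gamma(a0 + m / 2.0, 1.0 / rate)
        if it >= burn:
            out["theta"].append(theta)
            out["theta_bar"].append(theta_bar)
            out["V"].append(V)
    return {k: np.array(v) for k, v in out.items()}

# Simulated short panel: 200 units, 3 observations each (hypothetical data).
sim = np.random.default_rng(1)
true_theta = 1.0 + np.sqrt(0.5) * sim.standard_normal(200)
y = [t + sim.standard_normal(3) for t in true_theta]

draws = gibbs_hier_normal(y)
mle = np.array([yi.mean() for yi in y])
post_mean = draws["theta"].mean(axis=0)
print("posterior mean of theta_bar:", draws["theta_bar"].mean())
print("spread of MLEs vs. Bayes estimates:", mle.std(), post_mean.std())
```

With a short panel like this one, the posterior means of the $\theta_i$ should be less dispersed than the unit-level maximum likelihood estimates, which is the shrinkage effect described in §5.1. The default `A=0.01` corresponds to a diffuse second-stage prior with variance 100.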

In the normal model, the $\{\theta_i\}$ drawn from (1) are treated as a multivariate normal sample and standard conditionally conjugate priors are used. It is worth noting that in many applications the second-stage priors are set to be very diffuse ($A^{-1} = 100I$ or larger) and the Wishart is set to have expectation $I$ with very small degrees of freedom such as $\dim(\theta) + 3$. As we often have a larger number of units in the analysis, the data seems to overwhelm these priors and we learn a great deal about $\tau$, or in the case of the normal prior, $(\bar{\theta}, V_\theta)$.

In classical approaches to these models, the first-stage prior is called a random effects model and is considered part of the likelihood. The random effects model is used to average the conditional likelihood to produce an unconditional likelihood which is a function of the common parameters alone.

$$\ell(\tau) = \prod_i \int p(y_i \mid \theta_i)\, p(\theta_i \mid \tau)\, d\theta_i$$

In the classic econometric literature, much is made of the distinction between random coefficient models and fixed effect models. Fixed effect models are considered nonparametric in the sense that there is no specified distribution for the $\theta_i$ parameters. Random coefficient models are often considered more efficient, but subject to specification error in the assumed random effects distribution, $p(\theta_i \mid \tau)$. In a Bayesian treatment, we see that the distinction between these two approaches is in the formulation of the joint prior on $\{\theta_1,\ldots,\theta_m\}$.

5.4. Heterogeneity Distributions
Much of the work in both marketing and in the general statistics literature has used the normal prior for the first stage of the hierarchical model. The normal prior offers a great deal of flexibility and fits conveniently with the large Bayesian regression/multivariate analysis literature. The standard normal model can easily handle analysis of many units (Steenburgh et al. 2002), and can be extended to include observable determinants of heterogeneity (see Allenby and Ginter 1995, Rossi et al. 1996, Talukdar et al. 2002). This can be done by introducing a multivariate regression in the observables into the mean function

$$\theta = Bz + u, \qquad u \sim N(0, V_\theta)$$

Here, $z$ is a vector of explanatory variables that are meant to explain across-unit differences. Typically, we might postulate that various demographic or market characteristics might explain differences in intercepts (brand preference) or slopes (marketing mix sensitivities). In linear models, these normal prior specifications amount to specifying a set of interactions between the explanatory variables in the model explaining $y$ (see McCulloch and Rossi 1994, for further discussion of this point).

While the normal model is flexible, there are several drawbacks for marketing applications. As discussed above, the thin tails of the normal model tend to shrink outlying units greatly toward the center of the data. While this may be desirable in many applications, it is a drawback in discovering new structure in the data. For example, if the distribution of the unit-level parameters is bimodal (something to be expected in models with brand intercepts), then a normal first-stage prior may shrink the unit-level estimates to such a degree as to mask the multimodality (see below for further discussions of diagnostics). Fortunately, the normal model provides a building block for a mixture of normals extension of the first-stage prior. The mixture of normals model can be written

$$p(\theta \mid \bar{\theta}_1,\ldots,\bar{\theta}_K, V_1,\ldots,V_K) = r_1\,\phi(\theta \mid \bar{\theta}_1, V_1) + \cdots + r_K\,\phi(\theta \mid \bar{\theta}_K, V_K), \qquad \sum_k r_k = 1$$

It is well-known that the mixture of normals model provides a great deal of flexibility and that with enough components, virtually any multivariate density can be approximated. In particular, multiple modes are possible. Fatter tails than the normal can also be accommodated by mixing in normal components with large variance.

The mixture of normals model can be viewed as a generalization of the popular finite mixture model. The finite mixture model views the prior as a discrete distribution with a set of mass points. This approach

has been very popular in marketing, due to the interpretation of each mixture point as representing a segment and to the ease of estimation. In addition, the finite mixture approach can be given the interpretation of a nonparametric method as in Heckman and Singer (1982). Critics of the finite mixture approach have pointed to the implausibility of the existence of a small number of homogeneous segments, as well as the fact that the finite mixture approach does not allow for extreme units whose parameters lie outside the convex hull of the support points. The mixture of normals approach avoids the drawbacks of the finite mixture model, while incorporating many of the more desirable features.

The MCMC algorithm for the normal heterogeneity model can easily be extended to handle the mixture of normals model by appending indicator variables for the mixture component to the state space. Conditional on the indicator variables, the draws of the normal component parameters are standard conjugate draws given the classification of the observations into one of the $K$ components. The indicator variables, conditional on all other parameters, have a multinomial distribution with probabilities proportional to the number of units assigned to the component and the likelihood that the unit's parameters are from the component distribution.

In mixture of components models, there is a generic identification problem, generally known as the label-switching problem. A model with a given sequence of component parameters is observationally equivalent to any permutations of this sequence of parameters. Component labels, therefore, require identifying restrictions for inference to occur. One solution to this problem is to put informative priors on the model parameters (e.g., $\bar{\theta}_1 > \bar{\theta}_2 > \cdots > \bar{\theta}_K$), which works well when the data are in agreement with the restriction. However, if the data are not in agreement (e.g., the components primarily differ in $V$, not $\bar{\theta}$), then the prior can lead to a chain that is slow to converge (Frühwirth-Schnatter et al. 2003). It should be noted, however, that the presence of label-switching does not affect inference about parameters of a particular unit, $\theta_i$. If the normal component mixing distribution is seen as a flexible device for approximating some unknown heterogeneity distribution, then inference about the distribution of heterogeneity can be made directly with the set of unit parameters, $\{\theta_i\}$, without attempting to identify or estimate the component parameters.

In many situations, we have prior information on the signs of various coefficients in the base model. For example, price parameters are negative and advertising effects are positive. In a Bayesian approach, this sort of prior information can be included by modifying the first-stage prior. We replace the normal distribution with a distribution with restricted support, corresponding to the appropriate sign restrictions. For example, we can use a log-normal distribution for a parameter which is restricted via sign by the reparameterization, $\theta^* = \ln(\theta)$. However, note that this change in the form of the prior can destroy some of the conjugate relationships which are exploited in the Gibbs sampler. However, if Metropolis-style methods are used to generate draws in the Markov chain, it is a simple matter to directly reparameterize the likelihood function, by substituting $\exp(\theta^*)$ for $\theta$, rather than rely on the heterogeneity distribution to impose the range restriction. What is more important is to ask whether the log-normal prior is appropriate. The left tail of the log-normal distribution declines to zero, insuring a mode for the log-normal distribution at a strictly positive value. For situations in which we want to admit zero as a possible value for the parameter, this prior may not be appropriate. Boatwright et al. (1999) explore the use of truncated normal priors as an alternative to the log-normal reparameterization approach. Truncated normal priors are much more flexible, allowing for mass to be piled up at zero.

Bayesian models can also accommodate structural heterogeneity, or changes in the likelihood specification for a unit of analysis. The likelihood is specified as a mixture of likelihoods:

$$p(y_{it} \mid \{\theta_{ik}\}) = r_1\, p_1(y_{it} \mid \theta_{i1}) + \cdots + r_K\, p_K(y_{it} \mid \theta_{iK})$$

and estimation proceeds by appending indicator variables for the mixture component to the state space. Conditional on the indicator variables, the datum, $y_{it}$, is assigned to one of $K$ likelihoods. The indicator variables, conditional on all other parameters, have a multinomial distribution with probabilities proportional to the number of observations assigned to the

component, and the probability that the datum arises from that likelihood. Models of structural heterogeneity have been used to investigate intraindividual change in the decision process due to environmental changes (Yang and Allenby 2000) and fatigue (Otter et al. 2003).

Finally, Bayesian methods have recently been used to relax the commonly made assumption that the unit parameters, $\theta_i$, are i.i.d. draws from the distribution of heterogeneity. Ter Hofstede et al. (2002) employ a conditional Gaussian field specification to study spatial patterns in response coefficients:

$$p(\theta_i \mid \tau) = p(\theta_i \mid \{\theta_j : j \in S_i\}, V)$$

where $S_i$ denotes units that are spatially adjacent to unit $i$. Since the MCMC estimation algorithm employs full conditional distributions of the model parameters, the draw of $\theta_i$ involves using a local average for the mean of the mixing distribution. Yang and Allenby (2002b) employ a simultaneous specification of the unit parameters to reflect the possible presence of interdependent effects, due to the presence of social and information networks.

$$\theta = \rho W \theta + u, \qquad u \sim N(0, \sigma^2 I)$$

where $W$ is a matrix that specifies the network, $\rho$ is a coefficient that measures the influence of the network, and $u$ is an innovation.

5.5. Diagnostic Checks of the First-Stage Prior
In the hierarchical model, the prior is specified in a two stage process:

$$\theta \sim N(\bar{\theta}, V_\theta)$$
$$p(\bar{\theta}, V_\theta)$$

In the classical literature, the normal distribution of $\theta$ would be called the random effects model and would be considered part of the likelihood, rather than part of the prior. Typically, very diffuse priors are used for the second stage. Thus, it is the first-stage prior which is important, and will always remain important, as long as there are only a few observations available per household. Since the parameters of the first-stage prior are inferred from the data, the main focus of concern should be on the form of this distribution.

In the econometric literature, the use of parametric distributions of heterogeneity (e.g., normal distributions) is often criticized on the grounds that their misspecification leads to inconsistent estimates of the common model parameters (cf. Heckman and Singer 1982). For example, if the true distribution of household parameters were skewed or bimodal, our inferences based on a symmetric, unimodal normal prior could be misleading. One simple approach would be to plot the distribution of the posterior household means and compare this to the implied normal distribution evaluated at the Bayes estimates of the hyperparameters, $N(E[\bar{\theta} \mid \text{data}], E[V_\theta \mid \text{data}])$. The posterior means are not constrained to follow the normal distribution because the normal distribution is only part of the prior and the posterior is influenced by the unit-level data. This simple approach is in the right spirit but could be misleading due to the fact that we do not properly account for uncertainty in the unit-level parameter estimates.

Allenby and Rossi (1999) provide a diagnostic check of the assumption of normality in the first stage of the prior distribution that properly accounts for parameter uncertainty. To handle uncertainty in our knowledge of the common parameters of the normal distribution, we compute the predictive distribution of $\theta_{i'}$ for unit $i'$, selected at random from the population of households with the random effects distribution. Using our data and model, we can define the predictive distribution of $\theta_{i'}$ as follows:

$$p(\theta_{i'} \mid \text{data}) = \int \phi(\theta_{i'} \mid \bar{\theta}, V_\theta)\, p(\bar{\theta}, V_\theta \mid \text{data})\, d\bar{\theta}\, dV_\theta$$

Here $\phi(\theta_{i'} \mid \bar{\theta}, V_\theta)$ is the normal prior distribution. We can use our MCMC draws of $(\bar{\theta}, V_\theta)$, coupled with draws from the normal prior, to construct an estimate of this distribution. The diagnostic check is constructed by comparing the distribution of the unit-level posterior means to the predictive distribution based on the model given above.

5.6. Findings and Influence on Marketing Practice
The last ten years of work on heterogeneity in marketing has yielded several important findings.

Researchers have explored a rather large set of first-stage models with a normal distribution of heterogeneity across units. In particular, investigators have considered a first-stage normal linear regression (Blattberg and George 1991), a first-stage logit model (Allenby and Lenk 1994, 1995), a first-stage probit (McCulloch and Rossi 1994), a first-stage Poisson (Neelamegham and Chintagunta 1999), and a first-stage generalized gamma distribution model (Allenby et al. 1999, Jen et al. 2003). The major conclusion is that there is a substantial degree of heterogeneity across units in various marketing data sets. This finding of a large degree of heterogeneity holds out substantial promise for the study of preferences, both in terms of substantive and practical significance (Ansari et al. 2000). There may be substantial heterogeneity bias in models that do not properly account for heterogeneity (Chang et al. 1999), and there is large value in customizing marketing decisions to the unit level (see Rossi et al. 1996).

Yang et al. (2002a) investigate the source of brand preference, and find evidence that variation in the consumption environment, and resulting motivations, leads to changes in a unit's preference for a product offering (see also, Arora and Allenby 1999). Motivating conditions are an interesting domain for research, as they preexist the marketplace, offering a measure of demand that is independent of marketplace offerings. Other research has documented evidence that the decision process employed by a unit is not necessarily constant throughout a unit's purchase (Yang and Allenby 2000) and response (Otter et al. 2003) history. This evidence indicates that the appropriate unit of analysis for marketing is at the level that is less aggregate than a person or respondent, although there is evidence that household sensitivity to marketing variables (Ainslie and Rossi 1998) and state dependence (Seetharaman et al. 1999) is constant across categories.

The normal continuous model of heterogeneity appears to do reasonably well in characterizing this heterogeneity, but there has not yet been sufficient experimentation with alternative models, such as the mixture of normals, to draw any definitive conclusions (see Allenby et al. 1998). With the relatively short panels typically found in marketing applications, it may be difficult to identify much more detailed structure beyond that afforded by the normal model. In addition, relatively short panels may produce a confounding of the finding of heterogeneity with various model misspecifications in the first stage. If only one observation is available for each unit, then the probability model for the unit level is the mixture of the first-stage model with the second-stage prior:

$$p(y \mid \tau) = \int p(y \mid \theta)\, p(\theta \mid \tau)\, d\theta$$

This mixing can provide a more flexible probability model. In the one observation situation, we can never determine whether it is heterogeneity, or lack of flexibility that causes the Bayesian hierarchical model to fit the data well. Obviously, with more than one observation per unit, this changes, and it is possible to separately diagnose first-stage model problems and deficiencies in the assumed heterogeneity distribution. However, with short panels there is unlikely to be a clean separation between these problems, and it may be the case that some of the heterogeneity detected in marketing data is really due to lack of flexibility in the base model.

There have been some comparisons of the normal continuous model with the discrete approximation approach of a finite-mixture model. It is our view that it is conceptually inappropriate to view any population of units as being comprised of only a small number of homogeneous groups and, therefore, the appropriate interpretation of the finite mixture approach is an approximation method. Allenby and Rossi (1999) and Lenk et al. (1996) show some of the shortcomings of the finite-mixture model, and provide some evidence that the finite-mixture model does not recover reasonable unit-level parameter estimates. In contrast, Andrews et al. (2002) use simulated data to suggest that unit-level recovery is comparable between the normal- and finite-mixture approaches.

At the same time that the Bayesian work in the academic literature has shown the ability to produce unit-level estimates, there has been increased interest on the part of practitioners in unit-level analysis. Conjoint researchers have always had an interest in

respondent-level part-worths and had various ad hoc schemes for producing these estimates. Recently, the Bayesian hierarchical approach to the logit model has been implemented in the popular Sawtooth conjoint software. Experience with this software and simulation studies have led Rich Johnson, Sawtooth Software's founder, to conclude that Bayesian methods are superior to others considered in the conjoint literature (Sawtooth Software 2001).

Retailers are amassing volumes of store-level scanner data. Not normally available to academic researchers, this store-level data is potentially useful for informing the basic retail decisions such as pricing and merchandizing. Attempts to develop reliable models for pricing and promotion have been frustrated by the inability to produce reliable promotion and price response parameters. Thus, the promise of store-level pricing has gone unrealized. Recently, a number of firms, including the leader DemandTec, have appeared in this space, offering data-based pricing and promotion services to retail customers. At the heart of DemandTec's approach is a Bayesian shrinkage model applied to store-sku-week data, obtained directly from the retail client. The Bayesian shrinkage methods allow DemandTec to produce reasonable and relatively stable store-level parameter estimates. DemandTec builds on the approach of Montgomery (1997).

6. Decision Theory
The vast majority of the recent Bayesian literature in marketing emphasizes the value of the Bayesian approach to inference, particularly in situations with limited information. Bayesian inference is only a special case of the more general Bayesian decision theoretic approach. Bayesian decision theory has two critical and separate components: (1) a loss function, and (2) the posterior distribution. The loss function associates a loss with a state of nature and an action, $\ell(a, \theta)$, where $a$ is the action and $\theta$ is the state of nature (parameter). The optimal decision maker chooses the action so as to minimize expected loss, where the expectation is taken with respect to the posterior distribution.

$$\min_a \bar{\ell}(a) = \int \ell(a, \theta)\, p(\theta \mid \text{data})\, d\theta$$

Parameter inference is a simple case of the general decision theory set-up, in which the loss is often taken to be quadratic. In this case, the optimal action is an estimator taken to be the posterior mean of the parameters.

6.1. Model Selection
In many scientific settings, the action is a choice between competing models. In the Bayesian approach, it is possible to define a set of models $M_1,\ldots,M_k$, and calculate a measure of the posterior probability of a model. If the loss function is zero when the correct model is chosen and equal for all cases in which the incorrect model is chosen, then the optimal Bayesian decision maker chooses the model with the highest posterior probability. In a parametric setting, the posterior probability of a model can be calculated as follows:

$$p(M_k \mid D) \propto p(D \mid M_k)\, p(M_k)$$
$$p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k)\, d\theta_k$$

where $D$ denotes the data. In the Bayesian approach, the posterior probability only requires specification of the class of models and the priors. There is no distinction between nested and nonnested models as in the hypothesis-testing literature in the classical literature. However, we do require specification of the class of models under consideration; there is no omnibus measure of the plausibility of a given model or group of models versus some unspecified, and possibly unknown, set of alternative models.

In situations where two models are being compared, it is common to compute the ratio of posterior model probabilities. This ratio can be expressed as the ratio of average likelihoods times the prior odds ratio. The ratio of average likelihoods is sometimes called the Bayes factor for a model.

$$\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{\int \ell_1(\theta_1)\, p_1(\theta_1)\, d\theta_1}{\int \ell_2(\theta_2)\, p_2(\theta_2)\, d\theta_2} \times \frac{p(M_1)}{p(M_2)}$$

The Bayes factor can be quite sensitive to the prior specification and, in particular, to the prior diffusion. As the prior becomes more and more spread out, relative to the fixed likelihood, the average value of the likelihood declines. Thus, if the prior for Model 1 is a

great deal more spread out than the prior for Model 2, y is driven by the explanatory variables x and
this may result in Bayes factors which favor Model 2 parameters .
(this is certainly true in a limiting sense). In particular, py x
diffuse and improper priors can result in undened
The decision maker has control over a subset of the

Bayes factors. We recommend that close attention be x vector, x = xd xcov

. xd represents the variables
placed on the prior assessment and that prior sen- under the decision makers control and xcov are the
sitivity analysis be performed whenever computing covariates. The decision maker chooses xd so as to
posterior model probabilities. maximize the expected value of prots where the
A wide variety of methods have been proposed expectation is taken over the distribution of the out-
to approximate the posterior model probability. The come variable. In a fully Bayesian decision theoretic
most widely used method is due to Schwarz (1978), treatment, this expectation is taken with respect to the
who computed an asymptotic approximation that posterior distribution of , as well as the predictive
depends only on the dimension of the model. This is conditional distribution py xd xcov .
the idea behind the well-known Schwarz or Bayesian : xd xcov = E Ey :y xd
Information Criterion (BIC) for model choice. Except

for very special forms of priors, the Schwarz method = E :y xd py xd xcov dy
is extremely inaccurate and should not be relied on
for computation of the posterior model probability. = E :x
d xcov

Various numerical methods that rely on either the
The decision maker chooses xd to maximize prots
Laplace approximation or importance sampling methods of numerical integration are the preferred method of approximation. In particular, Newton and Raftery (1994) offer a convenient method for approximating a Bayes factor using MCMC simulation draws to estimate the average likelihood as the harmonic mean of the likelihoods of a sample from the posterior distribution. This estimator is consistent but may be unstable due to draws of the parameters that are associated with small likelihood values.

6.2. Marketing Decisions and Bayesian Decision Theory
Bayesian decision theory is ideally suited for application to many marketing problems in which a decision must be made, given substantial parameter or modeling uncertainty. In these situations, the uncertainty must factor into the decision itself. The marketing decision maker takes an action by setting the value of various variables designed to quantify the marketing environment facing the consumer (such as price or advertising levels). These decisions should be affected by the level of uncertainty facing the marketer. To make this concrete, begin with a probability model that specifies how the outcome variable

$\pi^{*}$. In general, the decision maker can be viewed as minimizing expected loss, which is frequently taken as profits but need not be in all cases (see, for example, Steenburgh et al. 2002).

6.3. Plug-In vs. Full Bayes Approaches
The use of the posterior distribution of the model parameters to compute expected profits is an important aspect of the Bayesian approach. In an approximate, or conditional, Bayes approach, the integration of the profit function with respect to the posterior distribution of $\theta$ is replaced by an evaluation of the function at the posterior mean or mode of the parameters. This approximate approach is often called the plug-in approach, or according to Morris (1983), Bayes Empirical Bayes.

$$\pi^{*}(x_d) = E_{\theta \mid y}\left[\bar{\pi}(x_d \mid \theta)\right] \neq \bar{\pi}\left(x_d \mid \hat{\theta} = E_{\theta \mid y}[\theta]\right)$$

When the uncertainty in $\theta$ is large and the profit function is nonlinear, errors from the use of the plug-in method can be large. In general, failure to account for parameter uncertainty will overstate the potential profit opportunity and lead to overconfidence that results in an overstatement of the value of information (see also Allenby 1990b, Kalyanam 1996, Montgomery and Bradlow 1999).
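The contrast between the plug-in and full Bayes calculations in Section 6.3 can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the saturating-response profit function, the `margin` value, the gamma-distributed stand-in for posterior draws of $\theta$, and the grid of decision values are all hypothetical choices made here and do not come from the article. Because this particular profit function is concave in $\theta$, the plug-in evaluation is at least as large as the posterior average at every value of $x_d$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MCMC draws from the posterior of a response parameter theta.
theta_draws = rng.gamma(shape=4.0, scale=0.25, size=5000)


def profit(x_d, theta, margin=10.0):
    """Illustrative profit: margin times a saturating response, minus the spend x_d."""
    return margin * (1.0 - np.exp(-theta * x_d)) - x_d


# Candidate values of the decision variable (e.g., an advertising spend level).
x_grid = np.linspace(0.0, 8.0, 161)

# Full Bayes: average the profit function over the posterior draws at each x_d.
full_bayes = np.array([profit(x, theta_draws).mean() for x in x_grid])

# Plug-in: evaluate the profit function at the posterior mean of theta.
theta_hat = theta_draws.mean()
plug_in = profit(x_grid, theta_hat)

x_full = x_grid[full_bayes.argmax()]
x_plug = x_grid[plug_in.argmax()]
print(f"full Bayes: x_d = {x_full:.2f}, expected profit = {full_bayes.max():.3f}")
print(f"plug-in:    x_d = {x_plug:.2f}, claimed profit  = {plug_in.max():.3f}")
```

In this sketch the gap between the two maximized values is one way to see how ignoring parameter uncertainty can overstate the profit opportunity; with a different (for example, convex) profit function the direction of the error could differ.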

6.4. Use of Alternative Information Sets
One of the most appealing aspects of the Bayesian approach is the ability to incorporate a variety of different sources of information. All adaptive shrinkage methods utilize the similarity between cross-sectional units to improve inference at the unit level. A high level of similarity among units leads to a high level of information shared. Because the level of similarity is determined by the data via the first-stage prior, the shrinkage aspects of the Bayesian approach adapt to the data. For example, Neelamegham and Chintagunta (1999) show that similarities between countries can be used to predict the sales patterns following the introduction of new products.
The value of a given information set can be assessed using a profit metric and the posteriors of $\theta$ corresponding to the information sets being compared. For example, consider two information sets A and B, along with corresponding posteriors, $p_A(\theta)$ and $p_B(\theta)$. We solve the decision problem using these two posterior distributions.

$$\Pi_l = \max_{x_d} \pi^{*}_l(x_d \mid x_{cov}) = \max_{x_d} \int \bar{\pi}(x_d \mid x_{cov}, \theta)\, p_l(\theta)\, d\theta, \qquad l = A, B$$

Rossi et al. (1996) use this approach to value various information sets available on individual households. A targeted couponing problem that anticipated the now popular Catalina Marketing Inc. products was used to value a sequence of expanding individual-level information sets. We now turn to the problem of valuing disaggregate information.

6.5. Valuation of Disaggregate Information
Once a fully decision-theoretic approach has been specified, we can use the profit metric to value the information in disaggregate data. We compare profits that can be obtained via our disaggregate inferences about $\{\theta_i\}$ with profits that could be obtained using only aggregate information. The profit opportunities afforded by disaggregate data will depend on both the amount of heterogeneity across the units in the panel data and the level of information at the disaggregate level.
To make these notions explicit, we will lay out the disaggregate and aggregate decision problems. As emphasized in §3, Bayesian methods are ideally suited for inference about the individual or disaggregate parameters, as well as the common parameters. Recall the profit function for the disaggregate decision problem.

$$\pi^{*}_i(x_{d,i} \mid x_{cov,i}) = \int \bar{\pi}(x_{d,i} \mid x_{cov,i}, \theta_i)\, p_i(\theta_i \mid \text{data})\, d\theta_i$$

Here, we take the expectation with respect to the posterior distribution of the parameters for unit $i$. Total profits from the disaggregate data are simply the sum of the maximized values of the profit function above.

$$\Pi_{disagg} = \sum_i \pi^{*}_i(\tilde{x}_{d,i} \mid x_{cov,i})$$

where $\tilde{x}_{d,i}$ is the optimal choice of $x_{d,i}$.
Aggregate profits can be computed by maximizing the expectation of the sum of the disaggregate profit functions with respect to the predictive distribution of $\theta_i$.

$$\pi_{agg}(x_d) = E_{\theta}\left[\sum_i \bar{\pi}(x_d \mid x_{cov,i}, \theta)\right] = \sum_i \int \bar{\pi}(x_d \mid x_{cov,i}, \theta)\, p(\theta)\, d\theta$$

$$\Pi_{agg} = \pi_{agg}(\tilde{x}_d)$$

The appropriate predictive distribution of $\theta$, $p(\theta)$, is formed from the marginal of the first-stage prior with respect to the posterior distribution of the model parameters.

$$p(\theta) = \int p(\theta \mid \tau)\, p(\tau \mid \text{data})\, d\tau$$

Comparison of $\Pi_{agg}$ with $\Pi_{disagg}$ provides a metric for the achievable value of the disaggregate information.

7. Open Issues and Directions for Future Research
Researchers have long noted the conceptual appeal of the Bayesian framework for inference and decision making. However, the potential of the Bayesian approach was not realized due to computational constraints. Without modern simulation-based methods, researchers were restricted to a short list of likelihoods and associated conjugate priors. The developments
of the last 15 years have freed us from computational constraints, allowing for the analysis of virtually any model. We now can consider models once thought to be impossible to compute, and we can use priors of virtually any form. The only constraint now is the ability of the data to identify model parameters, rather than the ability of the analyst to conduct inference for this model. However, the recent developments have an even more profound impact than simply freeing us from computational constraints. The nature of the MCMC methods emphasizes a modularity in the construction of models, typically achieved through a combination of conditional distributions. These conditional distributions specify the nature of the relationships between observed variables and allow for the construction of more complicated relationships. Thus, the researcher can create a more complex model simply by adding layers to the hierarchy.
Consider, as a simple example, the relationship between sales and price. Much attention has been devoted to fitting the conditional distribution of sales $y$ given price $x$. However, the actual decision process is certainly not well represented by one conditional distribution. Many endorse the concept of a latent consideration set (Chiang et al. 1999) in which a product must first be included in the consideration set before a consumer evaluates the impact of price. If $w$ represents the consideration set, then the model has been enlarged to the two layers $y \mid w, x$ and $w \mid z$, where the consideration set is influenced by another variable $z$ (e.g., advertising). In the end, the hierarchical model specifies a special form for the conditional distribution of $y \mid x, z$ that allows exploration of the intermediary conditional relationships. Moreover, the specification of hierarchical conditional models is consistent with process models of consumer behavior (e.g., McFadden 2001).
Consideration sets are only one example of a latent process that intervenes between the measurements of the marketing mix variables and the sales outcome variable. Other important examples include price search and consumption. In typical demand data, we do not observe the consumption of goods but merely their purchases. In much demand modeling in marketing, this distinction is glossed over, and the demand model is based on a utility function defined directly on the purchase quantities. Models that explicitly recognize that purchases are made in anticipation of future consumption have recently received attention. For example, Dubé (2003) explains simultaneous purchases of different varieties via anticipation of changes in tastes over future consumption occasions. Yang et al. (2002) consider a model in which the utility derived from goods is dependent on the context of consumption. Erdem and Keane (2003) consider dynamic models of consumer demand in which households stockpile goods for future consumption. All of these models are amenable to Bayesian analysis via data augmentation in which latent variables, such as consumption, are introduced into the inference procedures.
Price search models are another example of a latent process of great importance in marketing. Consumers are not always fully informed about the prices of choice alternatives and must engage in price search. We do not observe this price search process directly but only the outcomes. In a classical approach, such as Mehta and Srinivasan (2003), the likelihood for the search model must be evaluated by integrating over all possible search paths. In a data augmentation approach, this integration can be achieved by introduction of latent variables that represent search possibilities. In an MCMC method for navigating the posterior distribution of search parameters and latent variables, we do not enumerate all possible search paths but, instead, navigate among paths of high posterior probability. We believe that MCMC approaches, together with data augmentation, hold great promise for analyzing models with very large latent state spaces such as price search models and discrete dynamic programming models, in general.
Many models of consumer behavior include threshold-like effects. For example, some models of consideration set formation have screening rules in which a threshold level of an attribute is defined. The threshold levels are unobservable parameters, and the likelihood over these parameters has discontinuities. This rules out the use of standard derivative-based maximization methods. MCMC methods simply require draws from various conditional posterior distributions in order to navigate the parameter space. Drawing from a distribution with a density that is not
continuous poses no special difficulties. Gilbride and Allenby (2003) illustrate how this can be implemented for choice models with conjunctive and disjunctive screening rules. These developments open many possibilities for analysis of models with threshold components.
Thus, hierarchical modeling methods not only achieve great flexibility, as emphasized in the Bayesian statistics literature, but are also well suited to the elaboration of various latent process views of consumer behavior and decision making. We expect research in marketing to focus on a better understanding of the process by which the consumer makes buying decisions, in hopes of creating more realistic, yet still parsimonious, models of behavior.
A major challenge facing marketing practitioners is the merging of information acquired across a variety of different datasets. For example, a firm may have access to consumer purchase information, survey information on a subsample of consumers, and syndicated aggregate sales information. Marketplace and survey data cannot be combined without some view to the processes by which consumers make buying decisions and respond to survey instruments. Bayesian methods will facilitate the integration of these data sources through the specification of a common set of behavioral parameters and the processes by which these are translated into either survey responses or purchase decisions.
The observational data used in much of quantitative marketing is derived from an environment in which the outcome and input variables are jointly determined. Marketing mix variables are set by managers with a view toward optimizing some objective function that includes the dependent variable. For example, prices may be set with some knowledge of either price sensitivity or price demand shocks. Direct marketing response data is obtained from samples of consumers who were selected in a nonrandom fashion, with a view toward maximizing response rates or profitability. Sales forces are allocated using some sort of heuristic that attempts to create an optimal allocation in which the marginal benefit of further effort is equated to marginal cost. This means that we cannot model just the conditional distribution of the outcome variable, given the marketing mix variables, but that we must consider the joint distribution of all variables.
The joint determination of both outcome and input variables poses considerable challenges for statistical inference and modeling. Manchanda et al. (2003) consider sales force problems in which the level of sales force effort at a given account is a function of sales response parameters. Price endogeneity is another example of a challenging problem that involves deriving the joint distribution of price, sales, and possible exogenous variables. Computational difficulties have limited the use of likelihood-based methods and, instead, instrumental variables procedures have been commonly employed. We believe there is substantial room for improvement in this area by the use of likelihood-based Bayesian approaches. As an example, consider a model of demand and supply in which there are cost shocks and a common demand shock that is used by retailers in setting prices. This model has a likelihood that is the joint distribution of price and quantity sold. This joint distribution is derived from the distribution of cost shocks and demand shocks. While the mapping from shocks to observables is an implicit nonlinear system of equations, there is no conceptual difficulty with implementing a Metropolis algorithm for this system. The modularity of the Metropolis-style MCMC method means that elaborating the model by adding, for example, consumer heterogeneity, is straightforward (see Yang et al. 2003).

8. Conclusion
We have emphasized the value of Bayesian methods in situations with limited information. While the total amount of data available has exploded, the amount of information about any one consumer is likely to remain limited. The customization of marketing actions to finer and finer levels of aggregation requires the ability to make inferences in conditions of limited information and to characterize the level of uncertainty in these inferences. Thus, we expect Bayesian methods will play a critical role in realizing the potential of micromarketing and any analysis conducted at a microlevel.

Finally, there are a number of important problems in marketing that are essentially pure prediction problems. Given a set of information on a consumer, the prediction problem is to predict the response to a given configuration of the marketing environment. Information available about the consumer can be summarized with a huge set of potential variables. The marketing environment itself can also be summarized in many possible ways. One important applied problem is to sift through a large number of possible variables and functional forms to find the best possible prediction rule. In the Bayesian statistics literature, there has been substantial progress in the variable selection problem, and we believe these methods have great promise for application to marketing problems.
Structural or process-oriented approaches to modeling achieve the prediction goal via a specification of the decision process. This guides the selection of variables and the structure of relationships between variables. However, structural theories are typically silent on the exact parametric form of functional relationships or distributions. Again, there is an opportunity for application of Bayesian nonparametric methods to the structural approach as well (Kalyanam and Shively 1998, Shively et al. 2000).
In summary, Bayesian statistical methods offer an appealing set of tools to researchers in marketing. The Bayesian approach offers an integrated view of inference and decision making that is applicable to both theoretical and applied analysis. Moreover, the hierarchical modeling structure that is exploited in MCMC estimation methods is congruent with theories of behavior and offers a means of integrating information across multiple data sources. Finally, the computational advantages of Bayesian methods allow for study of high-dimensional data and complex relationships that are common in marketing. We encourage our colleagues and students to experiment with and apply Bayesian methods.

Acknowledgments
Author Peter Rossi thanks the James M. Kilts Center for Marketing for support of this research. The authors thank Rob McCulloch and Eric Bradlow for useful comments. The authors were inspired by the writings and teachings of Arnold Zellner and Dennis Lindley throughout their careers.

Appendix: Annotated Citations of Bayesian Applications in Marketing
This annotated bibliography represents the results of a search for applications of Bayesian statistics in marketing. Only published or forthcoming articles that feature marketing applications are included.

Ainslie, Andrew, Peter Rossi. 1998. Similarities in choice behavior across product categories. Marketing Sci. 17 91–106.
A multi-category choice model is proposed where household response coefficients are assumed dependent across category. The estimated distribution of heterogeneity reveals that price, display, and feature sensitivity are not uniquely determined for each category but may be related to household-specific factors.

Allenby, Greg M., Thomas Shively, Sha Yang, Mark J. Garratt. 2003. A choice model for packaged goods: Dealing with discrete quantities and quantity discounts. Marketing Sci. Forthcoming.
A method for dealing with the pricing of a product with different package sizes is developed from utility-maximizing principles. The model allows for the estimation of demand when there exist a multitude of size-brand combinations.

Allenby, Greg M., Robert P. Leone, Lichung Jen. 1999. A dynamic model of purchase timing with application to direct marketing. J. Amer. Statist. Assoc. 94 365–374.
Customer interpurchase times are modeled with a heterogeneous generalized gamma distribution, where the distribution of heterogeneity is a finite mixture of inverse generalized gamma components. The model allows for structural heterogeneity where customers can become inactive.

Allenby, Greg M., Neeraj Arora, James L. Ginter. 1998. On the heterogeneity of demand. J. Marketing Res. 35 384–389.
A normal component mixture model is compared to a finite mixture model using conjoint data and scanner panel data. The predictive results provide evidence that the distribution of heterogeneity is continuous, not discrete.

Allenby, Greg M., Lichung Jen, Robert P. Leone. 1996. Economic trends and being trendy: The influence of consumer confidence on retail fashion sales. J. Bus. Econom. Statist. 14 103–111.
A regression model with autoregressive errors is used to estimate the influence of consumer confidence on retail sales. Data are pooled across divisions of a fashion retailer to estimate a model where influence has a differential impact on pre-season versus in-season sales.

Allenby, Greg M., Peter J. Lenk. 1995. Reassessing brand loyalty, price sensitivity, and merchandising effects on consumer brand choice. J. Bus. Econom. Statist. 13 281–289.
The logistic normal regression model of Allenby and Lenk (1994) is used to explore the order of the brand-choice process and to estimate the magnitude of price, display, and feature advertising effects across four scanner panel datasets. The evidence indicates that brand-choice is not zero order, and merchandising effects are much larger than previously thought.

Allenby, Greg M., James L. Ginter. 1995. Using extremes to design products and segment markets. J. Marketing Res. 32 392–403.
A heterogeneous random-effects binary choice model is used to estimate conjoint part-worths using data from a telephone survey. The individual-level coefficients available in hierarchical Bayes models are used to explore extremes of the heterogeneity distribution, where respondents are most and least likely to respond to product offers.

Allenby, Greg M., Neeraj Arora, James L. Ginter. 1995. Incorporating prior knowledge into the analysis of conjoint studies. J. Marketing Res. 32 152–162.
Ordinal prior information is incorporated into a conjoint analysis using a rejection sampling algorithm. The resulting part-worth estimates have sensible algebraic signs that are needed for deriving optimal product configurations.

Allenby, Greg M., Peter J. Lenk. 1994. Modeling household purchase behavior with logistic normal regression. J. Amer. Statist. Assoc. 89 1218–1231.
A discrete choice model with autocorrelated errors and consumer heterogeneity is developed and applied to a scanner panel dataset of ketchup purchases. The results indicate substantial unobserved heterogeneity and autocorrelation in purchase behavior.

Allenby, Greg M. 1990a. Hypothesis testing with scanner data: The advantage of Bayesian methods. J. Marketing Res. 27 379–389.
Bayesian testing for linear restrictions in a multivariate regression model is developed and compared to classical methods.

Allenby, Greg M. 1990b. Cross-validation, the Bayes theorem, and small-sample bias. J. Bus. Econom. Statist. 8 171–178.
Cross-validation methods that employ plug-in point approximations to the average likelihood are compared to formal Bayesian methods. The plug-in approximation is shown to overstate the amount of statistical evidence.

Andrews, Rick, Asim Ansari, Imran Currim. 2002. Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. J. Marketing Res. 87–98.
A simulation study is used to investigate the performance of continuous and discrete distributions of heterogeneity in a regression model. The results indicate that Bayesian methods are robust to the true underlying distribution of heterogeneity, and finite mixture models of heterogeneity perform well in recovering true parameter estimates.

Ansari, Asim, Skander Essegaier, Rajeev Kohli. 2000. Internet recommendation systems. J. Marketing Res. 37 363–375.
Random-effect specifications for respondents and stimuli are proposed within the same linear model specification. The model is used to pool information from multiple data sources.

Ansari, Asim, Kamel Jedidi, Sharan Jagpal. 2000. A hierarchical Bayesian methodology for treating heterogeneity in structural equation models. Marketing Sci. 19 328–347.
Covariance matrix heterogeneity is introduced into a structural equation model, in contrast to standard models in marketing, where heterogeneity is introduced into the mean structure of a model. The biasing effects of not accounting for covariance heterogeneity are documented.

Arora, Neeraj, Greg M. Allenby. 1999. Measuring the influence of individual preference structures in group decision making. J. Marketing Res. 36 476–487.
Group preferences differ from the preferences of individuals in the group. The influence of the group on the distribution of heterogeneity is examined using conjoint data on durable good purchases by a husband's, a wife's, and their joint evaluation.

Arora, Neeraj, Greg M. Allenby, James L. Ginter. 1998. A hierarchical Bayes model of primary and secondary demand. Marketing Sci. 17 29–44.
An economic discrete/continuous demand specification is used to model volumetric conjoint data. The likelihood function is structural, reflecting constrained utility maximization.

Blattberg, Robert C., Edward I. George. 1991. Shrinkage estimation of price and promotional elasticities: Seemingly unrelated equations. J. Amer. Statist. Assoc. 86 304–315.
Weekly sales data across multiple retailers in a chain are modeled using a linear model with heterogeneity. Price and promotional elasticity estimates are shown to have improved predictive performance.

Boatwright, Peter, Robert McCulloch, Peter E. Rossi. 1999. Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. J. Amer. Statist. Assoc. 94 1063–1073.
A common problem in the analysis of sales data is that price coefficients are often estimated with algebraic signs that are incompatible with economic theory. Ordinal constraints are introduced through the prior to address this problem, leading to a truncated distribution of heterogeneity.

Bradlow, Eric T., David Schmittlein. 1999. The little engines that could: Modeling the performance of World Wide Web search engines. Marketing Sci. 19 43–62.
A proximity model is developed for analysis of the performance of Internet search engines. The likelihood function reflects the distance between the engine and specific URLs, with the mean location of the URLs parameterized with a linear model.

Bradlow, Eric T., S. Fader. 2001. A Bayesian lifetime model for the Hot 100 Billboard songs. J. Amer. Statist. Assoc. 96 368–381.
A time series model for ranked data is developed using a latent variable model. The deterministic portion of the latent variable follows a temporal pattern described by a generalized gamma distribution, and the stochastic portion is extreme value.

Bradlow, Eric T., Vithala R. Rao. 2000. A hierarchical Bayes model for assortment choice. J. Marketing Res. 37 259–268.
A statistical measure of attribute assortment is incorporated into a random-utility model to measure consumer preference for assortment beyond the effects from the attribute levels themselves. The model is applied to choices between bundled offerings.

Chiang, Jeongwen, Siddhartha Chib, Chakravarthi Narasimhan. 1999. Markov chain Monte Carlo and models of consideration set and parameter heterogeneity. J. Econometrics 89 223–248.
Consideration sets are enumerated and modeled with a Dirichlet prior in a model of choice. A latent state variable is introduced to indicate the consideration set, resulting in a model of structural heterogeneity.

Chang, Kwangpil, S. Siddarth, Charles B. Weinberg. 1999. The impact of heterogeneity in purchase timing and price responsiveness on estimates of sticker shock effects. Marketing Sci. 18 178–192.
A random utility model with reference prices is examined, with and without allowance for household heterogeneity. When heterogeneity is present in the model, the reference price coefficient is estimated to be close to zero.

DeSarbo, Wayne, Youngchan Kim, Duncan Fong. 1999. A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. J. Econometrics 89 79–108.
The deterministic portion of a latent variable model is specified as a scalar product of consumer and brand coordinates to yield a spatial representation of revealed choice data. The model provides a graphical representation of the market structure of product offerings.

Edwards, Yancy, Greg M. Allenby. 2003. Multivariate analysis of multiple response data. J. Marketing Res. Forthcoming.
Pick-any-of-J data are modeled with a multivariate probit model, allowing standard multivariate techniques to be applied to the parameter of the latent normal distribution. Identifying restrictions for the model are imposed by post-processing the draws of the Markov chain.

Huber, Joel, Kenneth Train. 2001. On the similarity of classical and Bayesian estimates of individual mean partworths. Marketing Lett. 12 259–269.
Classical and Bayesian estimation methods are found to yield similar individual-level estimates. The classical methods condition on estimated hyperparameters, while Bayesian methods account for their uncertainty.

Jen, Lichung, Chien-Heng Chou, Greg M. Allenby. 2003. A Bayesian approach to modeling purchase frequency. Marketing Lett. 14 5–20.
A model of purchase frequency that combines a Poisson likelihood with a gamma mixing distribution is proposed, where the mixing distribution is a function of covariates. The covariates are shown to be useful for customers who have short purchase histories or infrequent interaction with the firm.

Kalyanam, Kirthi, Thomas S. Shively. 1998. Estimating irregular pricing effects: A stochastic spline regression approach. J. Marketing Res. 35 16–29.
Stochastic splines are used to model the relationship between price and sales, resulting in a more flexible specification of the likelihood function.

Kalyanam, Kirthi. 1996. Pricing decision under demand uncertainty: A Bayesian mixture model approach. Marketing Sci. 15 207–221.
Model uncertainty is captured in model predictions by taking a weighted average where the weights correspond to the posterior probability of the model. Pricing decisions are shown to be more robust.

Kamakura, Wagner A., Michel Wedel. 1997. Statistical data fusion for cross-tabulation. J. Marketing Res. 34 485–498.
Imputation methods are proposed for analyzing cross-tabulated data with empty cells. Imputation is conducted in an iterative manner to explore the distribution of missing responses.

Kim, Jaehwan, Greg M. Allenby, Peter E. Rossi. 2002. Modeling consumer demand for variety. Marketing Sci. 21 223–228.
A choice model with interior and corner solutions is derived from a utility function with decreasing marginal utility. Kuhn-Tucker conditions are used to relate the observed data, with utility maximization in the likelihood specification.

Lee, Jonathan, Peter Boatwright, Wagner Kamakura. 2003. A Bayesian model for prelaunch sales forecasting of recorded music. Management Sci. 49 179–196.
The authors study the forecasting of sales for new music albums prior to their introduction. A hierarchical logistic-shaped diffusion model is used to combine a variety of sources of information on attributes of the album, effects of marketing variables, and dynamics of adoption.

Liechty, John, Venkatram Ramaswamy, Steven H. Cohen. 2001. Choice menus for mass customization. J. Marketing Res. 38 183–196.
A multivariate probit model is used to model conjoint data where respondents can select multiple items from a menu. The observed binomial data is modeled with a latent multivariate normal distribution.

Lenk, Peter, Ambar Rao. 1990. New models from old: Forecasting product adoption by hierarchical Bayes procedures. Marketing Sci. 9 42–53.
The nonlinear likelihood function of the Bass model is combined with a random-effects specification across new product introductions. The resulting distribution of heterogeneity is shown to improve early predictions of new product introductions.

Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, Martin R. Young. 1996. Hierarchical Bayes conjoint analysis: Recovery of
partworth heterogeneity from reduced experimental designs. Marketing Sci. 15 173–191.
Fractionated conjoint designs are used to assess ability of the distribution of heterogeneity to bridge conjoint analyses across respondents to impute part-worths for attributes not examined.

Manchanda, Puneet, Asim Ansari, Sunil Gupta. 1999. The shopping basket: A model for multicategory purchase incidence decisions. Marketing Sci. 18 95–114.
Multicategory demand data are modeled with a multivariate probit model. Identifying restrictions in the latent error covariance matrix require use of a modified Metropolis-Hastings algorithm.

Marshall, Pablo, Eric T. Bradlow. 2002. A unified approach to conjoint analysis models. J. Amer. Statist. Assoc. 97 674–682.
Various censoring mechanisms are proposed for relating observed interval, ordinal, and nominal data to a latent linear conjoint model.

McCulloch, Robert E., Peter E. Rossi. 1994. An exact likelihood analysis of the multinomial probit model. J. Econometrics 64 217–228.
The multinomial probit model is estimated using data augmentation methods. Approaches to handling model identification are discussed.

Moe, Wendy, Peter Fader. 2002. Using advance purchase orders to track new product sales. Marketing Sci. 21 347–364.
A hierarchical model of product diffusion is developed for forecasting new product sales. The model features a mixture of Weibulls as the basic model, with a distribution of heterogeneity over related products. The model is applied to data on music album sales.

Montgomery, Alan L. 1997. Creating micro-marketing pricing strategies using supermarket scanner data. Marketing Sci. 16 315–337.
Bayesian hierarchical models are applied to store-level scanner data. The model specification involves store-level demographic variables. Profit opportunities for store-level pricing are explored using constraints on the change in average price.

Montgomery, Alan L., Eric T. Bradlow. 1999. Why analyst overconfidence about the functional form of demand models can lead to overpricing. Marketing Sci. 18 569–583.
The specification of a functional form involves imposing exact restrictions in an analysis. Stochastic restrictions are introduced via a more flexible model specification and prior distribution, resulting in less aggressive policy implications.

Montgomery, Alan L., Peter E. Rossi. 1999. Estimating price elasticities with theory-based priors. J. Marketing Res. 36 413–423.
The prior distribution is used to stochastically impose

standard shrinkage estimators that employ the distribution of heterogeneity.

Neelamegham, Ramya, Pradeep Chintagunta. 1999. A Bayesian model to forecast new product performance in domestic and international markets. Marketing Sci. 18 115–136.
Alternative information sets are explored for making new product forecasts in domestic and international markets, using a Poisson model for attendance with log-normal heterogeneity.

Putler, Daniel S., Kirthi Kalyanam, James S. Hodges. 1996. A Bayesian approach for estimating target market potential with limited geodemographic information. J. Marketing Res. 33 134–149.
Prior information about correlation among variables is combined with data on the marginal distribution to yield a joint posterior distribution.

Rossi, Peter E., Zvi Gilula, Greg M. Allenby. 2001. Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. J. Amer. Statist. Assoc. 96 20–31.
Consumer response data on a fixed-point rating scale are assumed to be censored outcomes from a latent normal distribution. Variation in the censoring cutoffs among respondents allows for scale use heterogeneity.

Rossi, Peter E., Robert E. McCulloch, Greg M. Allenby. 1996. The value of purchase history data in target marketing. Marketing Sci. 15 321–340.
The information content of alternative data sources is evaluated using an economic loss function of coupon profitability. The value of a household's purchase history is shown to be large relative to demographic information and other information sets.

Rossi, Peter E., Greg M. Allenby. 1993. A Bayesian approach to estimating household parameters. J. Marketing Res. 30 171–182.
Individual-level parameters are obtained with the use of an informative, but relatively diffuse, prior distribution. Methods of assessing and specifying the amount of prior information are proposed.

Sandor, Zsolt, Michel Wedel. 2001. Designing conjoint choice experiments using managers' prior beliefs. J. Marketing Res. 28 430–444.
The information from an experiment involving discrete choice models depends on the experimental design and the values of the model parameters. Optimal designs are determined with an information measure that is dependent on the prior distribution.

Seetharaman, P. B., Andrew Ainslie, Pradeep Chintagunta. 1999. Investigating household state dependence effects across categories. J. Marketing Res. 36 488–500.
Multiple scanner panel datasets are used to estimate a model of brand choice with state dependence. Individual-level
restrictions on price elasticity parameters that are consistent estimates of state dependence effects are examined among cat-
with economic theory. This proposed approach is compared to egories.

Shively, Thomas A., Greg M. Allenby, Robert Kohn. 2000. A nonparametric approach to identifying latent relationships in hierarchical models. Marketing Sci. 19 149–162.
Stochastic splines are used to explore the covariate specification in the distribution of heterogeneity. Evidence of highly nonlinear relationships is provided.

Steenburgh, Thomas J., Andrew Ainslie, Peder H. Engebretson. 2002. Massively categorical variables: Revealing the information in zip codes. Marketing Sci. 22 40–57.
The effects associated with massively categorical variables, such as zip codes, are modeled in a random-effects specification. Alternative loss functions are examined for assessing the value of the resulting shrinkage estimates.

Talukdar, Debabrata, K. Sudhir, Andrew Ainslie. 2002. Investigating new product diffusion across products and countries. Marketing Sci. 21 97–116.
The Bass diffusion model is coupled with a random effects specification for the coefficients of innovation, imitation, and market potential. The random effects model includes macroeconomic covariates that have large explanatory power relative to unobserved heterogeneity.

Ter Hofstede, Frenkel, Michel Wedel, Jan-Benedict E. M. Steenkamp. 2002. Identifying spatial segments in international markets. Marketing Sci. 21 160–177.
The distribution of heterogeneity in a linear regression model is specified as a conditional Gaussian field to reflect spatial associations. The heterogeneity specification avoids the assumption that the random effects are globally independent.

Ter Hofstede, Frenkel, Youngchan Kim, Michel Wedel. 2002. Bayesian prediction in hybrid conjoint analysis. J. Marketing Res. 34 253–261.
Self-stated attribute-level importance and profile evaluations are modeled as joint outcomes from a common set of partworths. The likelihoods for the dataset differ and include other, incidental parameters that facilitate the integration of information to produce improved estimates.

Wedel, Michel, Rik Pieters. 2000. Eye fixations on advertisements and memory for brands: A model and findings. Marketing Sci. 19 297–312.
A multilevel model of attention and memory response is used to investigate the effect of brand, pictorial, and text attributes of print advertisements. Information in the data is integrated through a multilayered likelihood specification.

Yang, Sha, Greg M. Allenby. 2000. A model for observation, structural, and household heterogeneity in panel data. Marketing Lett. 11 137–149.
Structural heterogeneity is specified as a finite mixture of nonnested likelihoods, and covariates are associated with the mixture point masses.

Yang, Sha, Greg M. Allenby, Geraldine Fennell. 2002a. Modeling variation in brand preference: The roles of objective environment and motivating conditions. Marketing Sci. 21 14–31.
Intraindividual variation in brand preference is documented and associated with variation in the consumption context and motivations for using the offering. The unit of analysis is shown to be at the level of a person-occasion, not the person.

Yang, Sha, Greg M. Allenby. 2003. Modeling interdependent consumer preferences. J. Marketing Res. Forthcoming.
The distribution of heterogeneity is modeled using a spatial autoregressive process, yielding interdependent draws from the mixing distribution. Heterogeneity is related to multiple networks defined with geographic and demographic variables.

References

Ainslie, Andrew, Peter Rossi. 1998. Similarities in choice behavior across product categories. Marketing Sci. 17 91–106.
Allenby, Greg M., Neeraj Arora, James L. Ginter. 1995. Incorporating prior knowledge into the analysis of conjoint studies. J. Marketing Res. 32 152–162.
Allenby, Greg M., Neeraj Arora, James L. Ginter. 1998. On the heterogeneity of demand. J. Marketing Res. 35 384–389.
Allenby, Greg M., James L. Ginter. 1995. Using extremes to design products and segment markets. J. Marketing Res. 32 392–403.
Allenby, Greg M., Peter J. Lenk. 1994. Modeling household purchase behavior with logistic normal regression. J. Amer. Statist. Assoc. 89 1218–1231.
Allenby, Greg M., Peter J. Lenk. 1995. Reassessing brand loyalty, price sensitivity, and merchandising effects on consumer brand choice. J. Bus. Econom. Statist. 13 281–289.
Allenby, Greg M., Robert P. Leone, Lichung Jen. 1999. A dynamic model of purchase timing with application to direct marketing. J. Amer. Statist. Assoc. 94 365–374.
Allenby, Greg M., Peter E. Rossi. 1999. Marketing models of consumer heterogeneity. J. Econometrics 89 57–78.
Andrews, Rick, Asim Ansari, Imran Currim. 2002. Hierarchical Bayes versus finite mixture conjoint analysis models: A comparison of fit, prediction, and partworth recovery. J. Marketing Res. 87–98.
Ansari, Asim, Skander Essegaier, Rajeev Kohli. 2000a. Internet recommendation systems. J. Marketing Res. 37 363–375.
Ansari, Asim, Kamel Jedidi, Sharan Jagpal. 2000b. A hierarchical Bayesian methodology for treating heterogeneity in structural equation models. Marketing Sci. 19 328–347.
Arora, Neeraj, Greg M. Allenby. 1999. Measuring the influence of individual preference structures in group decision making. J. Marketing Res. 36 476–487.
Arora, Neeraj, Greg M. Allenby, James L. Ginter. 1998. A hierarchical Bayes model of primary and secondary demand. Marketing Sci. 17 29–44.
Barnard, John, Robert E. McCulloch, Xiao-Li Meng. 2000. Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica 10 4–24.

Bernardo, Jose, Adrian F. M. Smith. 1994. Bayesian Theory. John Wiley, New York.
Blattberg, Robert C., Edward I. George. 1991. Shrinkage estimation of price and promotional elasticities: Seemingly unrelated equations. J. Amer. Statist. Assoc. 86 304–315.
Boatwright, Peter, Robert McCulloch, Peter E. Rossi. 1999. Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. J. Amer. Statist. Assoc. 94 1063–1073.
Bradlow, Eric T., S. Fader. 2001. A Bayesian lifetime model for the Hot 100 Billboard songs. J. Amer. Statist. Assoc. 96 368–381.
Bradlow, Eric T., Vithala R. Rao. 2000. A hierarchical Bayes model for assortment choice. J. Marketing Res. 37 259–268.
Chang, Kwangpil, S. Siddarth, Charles B. Weinberg. 1999. The impact of heterogeneity in purchase timing and price responsiveness on estimates of sticker shock effects. Marketing Sci. 18 178–192.
Chiang, Jeongwen, Siddhartha Chib, Chakravarthi Narasimhan. 1999. Markov chain Monte Carlo and models of consideration set and parameter heterogeneity. J. Econometrics 89 223–248.
Chib, Siddhartha. 2003. Monte Carlo methods and Bayesian computation: Overview. S. E. Fienberg, J. B. Kadane, eds. International Encyclopedia of the Social and Behavioral Sciences: Statistics. Elsevier Science, Amsterdam, The Netherlands. In press.
DeSarbo, Wayne, Youngchan Kim, Duncan Fong. 1999. A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. J. Econometrics 89 79–108.
Dube, Jean-Pierre. 2003. Multiple discreteness and product differentiation: Demand for carbonated soft drinks. Marketing Sci. Forthcoming.
Edwards, Yancy, Greg M. Allenby. 2002. Multivariate analysis of multiple response data. J. Marketing Res. Forthcoming.
Efron, Brad, Carl Morris. 1975. Data analysis using Stein's estimator and its generalizations. J. Amer. Statist. Assoc. 70 311–319.
Erdem, Tulin, Michael Keane. 2003. Brand and quantity choice dynamics under price uncertainty. Quantitative Marketing Econom. 1 5–64.
Frühwirth-Schnatter, Sylvia, Regina Tüchler, Thomas Otter. 2003. Bayesian analysis of the heterogeneity model. J. Bus. Econom. Statist. Forthcoming.
Gelfand, Alan E., Adrian F. M. Smith. 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 87(June) 523–532.
Gelman, Andrew, John B. Carlin, Hal S. Stern, Donald B. Rubin. 1995. Bayesian Data Analysis. Chapman & Hall, London.
Gilbride, Tim, Greg Allenby. 2003. Attribute-based consideration sets. Working paper, Ohio State University.
Heckman, James, Bernard Singer. 1982. A method for minimizing the impact of distributional assumptions in econometric models for duration data. Econometrica 52 271–320.
Huber, Joel, Kenneth Train. 2001. On the similarity of classical and Bayesian estimates of individual mean partworths. Marketing Lett. 12 259–269.
Jen, Lichung, Chien-Heng Chou, Greg M. Allenby. 2003. A Bayesian approach to modeling purchase frequency. Marketing Lett. 14 5–20.
Kalyanam, Kirthi. 1996. Pricing decision under demand uncertainty: A Bayesian mixture model approach. Marketing Sci. 15 207–221.
Kalyanam, Kirthi, Thomas S. Shively. 1998. Estimating irregular pricing effects: A stochastic spline regression approach. J. Marketing Res. 35 16–29.
Kamakura, Wagner A., Michel Wedel. 1997. Statistical data fusion for cross-tabulation. J. Marketing Res. 34 485–498.
Keane, Michael. 1993. Simulation estimation methods for limited dependent variable models. G. S. Maddala, C. R. Rao, H. D. Vinod, eds. Handbook of Statistics, Vol. 11. North Holland, Amsterdam, The Netherlands.
Kim, Jaehwan, Greg M. Allenby, Peter E. Rossi. 2002. Modeling consumer demand for variety. Marketing Sci. 21 223–228.
Liechty, John, Venkatram Ramaswamy, Steven H. Cohen. 2001. Choice menus for mass customization. J. Marketing Res. 38 183–196.
Lenk, Peter J., Wayne S. DeSarbo. 2000. Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65 93–119.
Lenk, Peter J., Ambar Rao. 1990. New models from old: Forecasting product adoption by hierarchical Bayes procedures. Marketing Sci. 9 42–53.
Lenk, Peter J., Wayne S. DeSarbo, Paul E. Green, Martin R. Young. 1996. Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Sci. 15 173–191.
Liu, Jun S. 2001. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.
Manchanda, Puneet, Asim Ansari, Sunil Gupta. 1999. The shopping basket: A model for multicategory purchase incidence decisions. Marketing Sci. 18 95–114.
Manchanda, Puneet, Pradeep K. Chintagunta, Peter E. Rossi. 2003. Response modeling with non-random marketing mix variables. Working paper, Graduate School of Business, University of Chicago, Chicago, IL.
Marshall, Pablo, Eric T. Bradlow. 2002. A unified approach to conjoint analysis models. J. Amer. Statist. Assoc. 97 674–682.
McCulloch, Robert, Nicholas Polson, Peter Rossi. 2000. Bayesian analysis of the multinomial probit model with fully identified parameters. J. Econometrics 99 173–193.
McCulloch, Robert, Peter E. Rossi. 1994. An exact likelihood analysis of the multinomial probit model. J. Econometrics 64 217–228.
McCulloch, Robert, Peter E. Rossi. 1999. Bayesian analysis of multinomial probit model. Mariano, Weeks, Schuermann, eds. Simulation-Based Inference in Econometrics. Cambridge University, Cambridge, U.K.
McFadden, Daniel. 2001. Economic choices. Amer. Econom. Rev. 91 351–370.
Mehta, Nitin, Kannan Srinivasan. 2003. Price uncertainty and consumer search: A structural model of consideration set formation. Marketing Sci. 22 58–84.

Montgomery, Alan L. 1997. Creating micro-marketing pricing strategies using supermarket scanner data. Marketing Sci. 16 315–337.
Montgomery, Alan L., Eric T. Bradlow. 1999. Why analyst overconfidence about the functional form of demand models can lead to overpricing. Marketing Sci. 18 569–583.
Montgomery, Alan L., Peter E. Rossi. 1999. Estimating price elasticities with theory-based priors. J. Marketing Res. 36 413–423.
Morris, Carl. 1983. Parametric empirical Bayes inference: Theory and applications. J. Amer. Statist. Assoc. 78 47–65.
Neelamegham, Ramya, Pradeep Chintagunta. 1999. A Bayesian model to forecast new product performance in domestic and international markets. Marketing Sci. 18 115–136.
Newton, Michael, Adrian E. Raftery. 1994. Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). J. Royal Statist. Soc. Series B 56 3–48.
Nobile, Agostino. 1998. A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Statist. Comput. 8 229–242.
Otter, Thomas, Sylvia Frühwirth-Schnatter, Regina Tüchler. 2003. Unobserved preference changes in conjoint analysis. Working paper, University of Vienna.
Putler, Daniel S., Kirthi Kalyanam, James S. Hodges. 1996. A Bayesian approach for estimating target market potential with limited geodemographic information. J. Marketing Res. 33 134–149.
Robert, Christian P., George Casella. 1999. Monte Carlo Statistical Methods. Springer, New York.
Rossi, Peter E., Greg M. Allenby. 1993. A Bayesian approach to estimating household parameters. J. Marketing Res. 30 171–182.
Rossi, Peter E., Zvi Gilula, Greg M. Allenby. 2001. Overcoming scale usage heterogeneity: A Bayesian hierarchical approach. J. Amer. Statist. Assoc. 96 20–31.
Rossi, Peter E., Robert E. McCulloch, Greg M. Allenby. 1996. The value of purchase history data in target marketing. Marketing Sci. 15 321–340.
Sandor, Zsolt, Michel Wedel. 2001. Designing conjoint choice experiments using managers' prior beliefs. J. Marketing Res. 28 430–444.
Sawtooth Software. 2001. CBC hierarchical Bayes analysis technical paper. Sawtooth Software Technical Paper Series, www.sawtoothsoftware.com.
Schwarz, Gideon. 1978. Estimating the dimension of a model. Ann. Statist. 6 461–464.
Seetharaman, P. B., Andrew Ainslie, Pradeep Chintagunta. 1999. Investigating household state dependence effects across categories. J. Marketing Res. 36 488–500.
Shively, Thomas A., Greg M. Allenby, Robert Kohn. 2000. A nonparametric approach to identifying latent relationships in hierarchical models. Marketing Sci. 19 149–162.
Steenburgh, Thomas J., Andrew Ainslie, Peder H. Engebretson. 2002. Massively categorical variables: Revealing the information in zip codes. Marketing Sci. 22 40–57.
Tanner, Martin A., Wing H. Wong. 1987. The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82 528–550.
Ter Hofstede, Frenkel, Michel Wedel, Jan-Benedict E. M. Steenkamp. 2002. Identifying spatial segments in international markets. Marketing Sci. 21 160–177.
Ter Hofstede, Frenkel, Youngchan Kim, Michel Wedel. 2002. Bayesian prediction in hybrid conjoint analysis. J. Marketing Res. 34 253–261.
Tierney, Luke. 1994. Markov chains for exploring posterior distributions. Ann. Statist. 23(4) 1701–1728.
Wedel, Michel, Rik Pieters. 2000. Eye fixations on advertisements and memory for brands: A model and findings. Marketing Sci. 19 297–312.
Yang, Sha, Greg M. Allenby. 2000. A model for observation, structural, and household heterogeneity in panel data. Marketing Lett. 11 137–149.
Yang, Sha, Greg M. Allenby. 2003. Modeling interdependent consumer preferences. J. Marketing Res. Forthcoming.
Yang, Sha, Greg M. Allenby, Geraldine Fennell. 2002. Modeling variation in brand preference: The roles of objective environment and motivating conditions. Marketing Sci. 21 14–31.
Yang, Sha, Yuxin Chen, Greg Allenby. 2003. Bayesian analysis of simultaneous demand and supply. Working paper, Ohio State University.

This paper was received July 28, 2002, and was with the authors 2 months for 2 revisions; processed by Pradeep Chintagunta.
