Data Analysis PDF

Data Analysis
Frequency Distribution
In a frequency distribution, one variable is
considered at a time.
A frequency distribution for a variable produces a
table of frequency counts, percentages, and
cumulative percentages for all the values associated
with that variable.
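A frequency table of this kind can be sketched in a few lines of Python. The `frequency_table` helper and the `ratings` data below are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # cumulative count up to and including the current value
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / n, 1),
            round(100 * running / n, 1),
        ))
    return rows

# Hypothetical ratings on a 1-5 scale (illustrative only)
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
table = frequency_table(ratings)
for row in table:
    print(row)
```

The cumulative percentage in the last row is always 100, which is a quick check that no value was dropped.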
Statistics Associated with Frequency
Distribution
Measures of Location
The mean, or average value, is the most commonly used
measure of central tendency. The mean, $\bar{X}$, is given by
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
where
$X_i$ = observed values of the variable X
n = number of observations (sample size)
The mode is the value that occurs most frequently. It
represents the highest peak of the distribution. The mode
is a good measure of location when the variable is
inherently categorical or has otherwise been grouped into
categories.
The median of a sample is the middle value when
the data are arranged in ascending or descending
order. If the number of data points is even, the
median is usually estimated as the midpoint between
the two middle values by adding the two middle
values and dividing their sum by 2. The median is
the 50th percentile.
Measures of Variability
The variance is the mean squared deviation from the mean:
$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
The standard deviation is the square root of the variance:
$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
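The measures of location and variability above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `sample` values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample (illustrative only, not from the slides)
sample = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(sample)          # sum of the values divided by n
median = statistics.median(sample)      # midpoint of the two middle values, since n is even
mode = statistics.mode(sample)          # most frequently occurring value
variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)      # square root of the sample variance

print(mean, median, mode, variance, std_dev)
```

Note that `statistics.variance` uses the n - 1 divisor, matching the formula on the slide; `statistics.pvariance` would divide by n instead.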
Cross-Tabulation
While a frequency distribution describes one variable
at a time, a cross-tabulation describes two or more
variables simultaneously.
Cross-tabulation results in tables that reflect the joint
distribution of two variables with a limited number of
categories or distinct values.
Since two variables have been cross-classified,
percentages can be computed either columnwise,
based on column totals, or rowwise, based on row
totals.
The general rule is to compute the percentages in the
direction of the independent variable, across the
dependent variable.
Pepsi Consumption by Gender
                      Gender
Pepsi Consumption     Male      Female
Light                 33.3%     66.7%
Heavy                 66.7%     33.3%
Column total          100%      100%
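Column percentages like these can be reproduced from raw cell counts. The sketch below assumes hypothetical counts (the slide reports only percentages) and treats gender, the column variable, as the independent variable; `column_percentages` is an illustrative helper name.

```python
# Hypothetical cell counts (assumed; the slide gives percentages only).
# Rows: consumption level. Columns: gender (the independent variable).
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for r, row in table.items()
    }

percentages = column_percentages(counts)
for level, row in percentages.items():
    print(level, row)
```

Each column sums to 100%, so the comparison runs across the dependent variable within each category of the independent variable.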

Purchase of Fashion Clothing by Marital Status
Purchase of              Current Marital Status
Fashion Clothing         Married     Unmarried
High                     31%         52%
Low                      69%         48%
Column totals            100%        100%
Number of respondents    700         300
Purchase of Fashion Clothing by Marital Status
                                        Sex
Purchase of          Male                        Female
Fashion Clothing     Married     Not Married     Married     Not Married
High                 35%         40%             25%         60%
Low                  65%         60%             75%         40%
Column totals        100%        100%            100%        100%
Number of cases      400         120             300         180
Case Cleopatra
Prelaunch Market Research

Objective to assess
response to Cleopatra advt.
Product acceptance
Design
Supergroup
Ad test, Product Placement
Methodology
??
Toronto
Case Cleopatra

Results
Positive from the group
50% Buying Intention Post Ad
64% Buying Intention Post Trial
Decision
Launch in Quebec
Premium
Advt. and some consumer promotion
Case Cleopatra

Problems
Location
Beyond Trial
Adoption, Purchase Frequency
Poor performance
Sales
Case: Cleopatra
Post Launch Study

204 All Soap Users
99 Cleopatra Users (Try)
Results
High Awareness
73.5%
Low Trials
14%
Case Cleopatra
Trial Implications
Lost Opportunity
73.5% awareness vs. 14.2% trial
Critical factor
High awareness not enough
Awareness, Interest, Evaluation, Trial, Adoption
Case Cleopatra
Low Trials Reasons

Lack of adequate promotional support
Low redemption of coupons
Sweepstakes did not work at all
Problems with the ad : Exhibit 13
63% do not intend to try
59% no or a negative reaction to the Cleopatra
Why?
Case Cleopatra
Problems with the ad

Jug of perfume being poured
Strong smell a problem
Perceived to be harsh and not for skin care
Footnote Exhibit 11
Execution of bath
Showers outnumber baths 4:1 in Quebec (ex. 12)
Not for everyday usage
67%: occasional usage (ex. 12)
Case Cleopatra
Decision Options
Discontinue brand
Continue the current strategy
4.5% market share
Smaller niche
Case Cleopatra
Decision Options
Discontinue brand
Subsidiary/Sales force reputation
Externally
Internally
Need a contender for skin care segment
Case Cleopatra
Decision Options
Continue the current strategy
Significantly higher trial levels
Increase in promotions
Increase in expenses
More losses
Case Cleopatra
Brand Performance
High Conversion rate
Strong diagnostics among users
Exhibit 10
Skin care 50%
Fragrance 53%
Case Cleopatra: Exhibit 9
Brand               Conversion rate (all + most occasions / ever tried)
Aloe and Lanolin    16%
Camay               14%
Cleopatra           31%
Dove                21%
Palmolive           12%
Case Cleopatra
Scale down expectations

Target a smaller segment
Need to profile current acceptors
Need to promote to this group
Change advertising- low/drop.
Reduce distribution coverage
With better incentives
Further Analysis: Crosstabs
Exhibits 9 and 10
Dove Regular vs. Others
Age segments
MHI groups
Problem 0
Pepsi has conducted a pilot U & A study

for its brands. It has found that favourite
brand varies across males and females.
It found that 5/15 males and 10/15
females prefer Mirinda and the reverse
is true for Pepsi. How should Pepsi test
this relationship?
Statistics Associated with Cross-
Tabulation
Chi-Square
To determine whether a systematic association
exists, the probability of obtaining a value of chi-
square as large or larger than the one calculated
from the cross-tabulation is estimated.
An important characteristic of the chi-square statistic
is the number of degrees of freedom (df) associated
with it. That is, df = (r - 1) x (c -1).
The null hypothesis (H0) of no association between
the two variables will be rejected only when the
calculated value of the test statistic is greater than
the critical value of the chi-square distribution with the
appropriate degrees of freedom.
The chi-square statistic ($\chi^2$) is used to test the
statistical significance of the observed association in
a cross-tabulation.
The expected frequency for each cell can be
calculated by using a simple formula:
$f_e = \frac{n_r n_c}{n}$
where $n_r$ = total number in the row
$n_c$ = total number in the column
n = total sample size
For the data in the table, the expected frequencies for
the cells, going from left to right and from top to
bottom, are:
(15 x 15)/30 = 7.50     (15 x 15)/30 = 7.50
(15 x 15)/30 = 7.50     (15 x 15)/30 = 7.50
Then the value of $\chi^2$ is calculated as follows:
$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$

For the data in the table, the value of $\chi^2$ is
calculated as:
$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5}$
= 0.833 + 0.833 + 0.833 + 0.833
= 3.333
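The same calculation can be checked in code. The sketch below uses the Problem 0 counts (5 of 15 males and 10 of 15 females prefer Mirinda, and the reverse for Pepsi); the `chi_square` function name and the row/column layout are assumptions made for illustration.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency n_r * n_c / n
            statistic += (f_o - f_e) ** 2 / f_e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1) x (c - 1)
    return statistic, df

# Rows: preferred brand (Mirinda, Pepsi). Columns: (male, female).
observed = [[5, 10], [10, 5]]
statistic, df = chi_square(observed)
print(round(statistic, 3), df)
```

With 1 degree of freedom, the tabled critical value of chi-square at the 0.05 level is 3.841. The calculated 3.333 falls below it, so on this pilot sample the null hypothesis of no association would not be rejected at that level.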
Marketing Problem 1
Vodafone Mobile has conducted a pilot
customer satisfaction study and has found
that, for a sample of 29 IIM students, the
average is 4.724 on a 7-point satisfaction
scale with a standard deviation of 1.579. The minimum
acceptable value of customer satisfaction
should be greater than 4 for the firm. What should
the market research manager recommend to
the marketing manager?
Hypothesis Testing Using the t
Statistic
1. Formulate the null (H0) and the alternative (H1)
hypotheses.
2. Select the appropriate formula for the t statistic.
3. Select a significance level for testing H0. Typically,
the 0.05 level is selected.
4. Take one or two samples and compute the mean
and standard deviation for each sample.
5. Calculate the t statistic assuming H0 is true.
6. Calculate the degrees of freedom and estimate the
probability of getting a more extreme value of the
statistic from Table 4 (Alternatively, calculate the
critical value of the t statistic).
7. If the probability computed in step 6 is smaller than
the significance level selected in step 3, reject H0.
If the probability is larger, do not reject H0.
8. Express the conclusion reached by the t test in
terms of the marketing research problem.
One Sample
t Test
The hypotheses may be formulated as:
H0: $\mu \le 4.0$
H1: $\mu > 4.0$
$t = (\bar{X} - \mu) / s_{\bar{X}}$
$s_{\bar{X}} = s / \sqrt{n}$
$s_{\bar{X}} = 1.579 / \sqrt{29}$
= 1.579/5.385 = 0.293
t = (4.724 - 4.0)/0.293 = 0.724/0.293 = 2.471

The degrees of freedom for the t statistic to test the
hypothesis about one mean are n - 1. In this case,
n - 1 = 29 - 1 or 28. From Table 4 in the Statistical
Appendix, the probability of getting a more extreme
value than 2.471 is less than 0.05 (Alternatively, the
critical t value for 28 degrees of freedom and a
significance level of 0.05 is 1.7011, which is less than
the calculated value). Hence, the null hypothesis is
rejected. The satisfaction level does exceed 4.0.
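The one-sample calculation for Marketing Problem 1 can be sketched from the summary statistics alone. The function name `one_sample_t` is an assumption for illustration; the inputs and the critical value 1.7011 are the ones quoted above.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1

t, df = one_sample_t(4.724, 4.0, 1.579, 29)
critical_t = 1.7011  # one-tailed critical value for 28 df at 0.05, as quoted above
print(round(t, 2), df, t > critical_t)
```

Without intermediate rounding the statistic comes out at about 2.47; the 2.471 on the slide results from rounding the standard error to 0.293 first. Either way it exceeds 1.7011, so the conclusion is unchanged.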
Marketing Problem 2
Levers has launched a new brand of

coffee. It is interested in knowing if
consumers in South and North India are
responding differently to its new
product. What testing procedure do you
recommend to Levers?
Two Independent Samples
Means
In the case of means for two independent samples,
the hypotheses take the following form.
H0: $\mu_1 = \mu_2$
H1: $\mu_1 \ne \mu_2$
The two populations are sampled and the means and
variances computed based on samples of sizes n1
and n2. If both populations are found to have the
same variance, a pooled variance estimate is
computed from the two sample variances as follows:
$s^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2}$
or
$s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
The standard deviation of the test statistic can be
estimated as:
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
The appropriate value of t can be calculated as:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$
The degrees of freedom in this case are (n1 + n2 - 2).
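The pooled-variance t test can be sketched as follows. The `south` and `north` ratings are hypothetical (the slides give no data for Marketing Problem 2), and `pooled_t` is an illustrative name; the null hypothesis assumed here is that the two population means are equal.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t statistic, degrees of freedom), assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error  # H0: mu1 - mu2 = 0
    return t, n1 + n2 - 2

# Hypothetical product ratings from the two regions (illustrative only)
south = [6, 7, 5, 6, 6]
north = [4, 5, 4, 5, 4]
t, df = pooled_t(south, north)
print(round(t, 3), df)
```

For these made-up ratings the pooled variance is 0.4, the standard error is 0.4, and t is about 4.0 with 8 degrees of freedom.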

F Test
An F test of sample variance may be performed if it is
not known whether the two populations have
equal variance. In this case, the hypotheses
are:
H0: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2 \ne \sigma_2^2$
F Statistic
The F statistic is computed from the sample
variances as follows:
$F_{(n_1 - 1),(n_2 - 1)} = \frac{s_1^2}{s_2^2}$
where
n1 = size of sample 1
n2 = size of sample 2
n1 - 1 = degrees of freedom for sample 1
n2 - 1 = degrees of freedom for sample 2
$s_1^2$ = sample variance for sample 1
$s_2^2$ = sample variance for sample 2
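The variance ratio is simple to compute directly. In this sketch the two samples are hypothetical and `variance_ratio` is an illustrative name; the resulting F would still need to be compared against a tabled critical value for the two degrees of freedom.

```python
import statistics

def variance_ratio(sample1, sample2):
    """Return (F, df for sample 1, df for sample 2) from two sample variances."""
    var1 = statistics.variance(sample1)  # s1 squared, n1 - 1 divisor
    var2 = statistics.variance(sample2)  # s2 squared, n2 - 1 divisor
    return var1 / var2, len(sample1) - 1, len(sample2) - 1

# Hypothetical ratings (illustrative only)
sample1 = [6, 7, 5, 6, 6]
sample2 = [4, 5, 4, 5, 4]
f, df1, df2 = variance_ratio(sample1, sample2)
print(round(f, 3), df1, df2)
```

Here the sample variances are 0.5 and 0.3, giving F of about 1.667 with (4, 4) degrees of freedom.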
Pepsi has launched two new variants of

diet Pepsi. It has decided to conduct a
product test to arrive at a suitable
product. It has decided to conduct a
C.L.T. on a group of consumers. What
testing procedures would you suggest
to Pepsi?
Paired Samples
The difference in these cases is examined by a paired samples t
test. To compute t for paired samples, the paired difference
variable, denoted by D, is formed and its mean and variance
calculated. Then the t statistic is computed. The degrees of
freedom are n - 1, where n is the number of pairs. The relevant
formulas are:
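H0: $\mu_D = 0$
H1: $\mu_D \ne 0$
$t_{n-1} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$
where
$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$
$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$
$s_{\bar{D}} = \frac{s_D}{\sqrt{n}}$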
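A paired samples t test can be sketched as below, for a case where the same consumers rate both product variants. The ratings and the `paired_t` name are hypothetical, and the null hypothesis assumed is that the mean paired difference is zero.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    differences = [a - b for a, b in zip(first, second)]  # paired difference variable D
    n = len(differences)
    mean_d = statistics.mean(differences)
    sd_d = statistics.stdev(differences)      # n - 1 divisor
    t = mean_d / (sd_d / math.sqrt(n))        # H0: mu_D = 0
    return t, n - 1

# Hypothetical ratings of two variants by the same nine consumers (illustrative only)
variant_a = [6, 5, 7, 4, 6, 5, 7, 5, 6]
variant_b = [4, 5, 5, 4, 5, 4, 5, 5, 5]
t, df = paired_t(variant_a, variant_b)
print(round(t, 3), df)
```

For these made-up ratings the mean difference is 1, the variance of the differences is 0.75, and t is about 3.464 with 8 degrees of freedom.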
Nestle has launched a new variant of

drinking chocolate. It has decided to
conduct a product test to assess
consumer response. It has divided the
country into 4 geographic zones and
would like to know if regional
differences are relevant for this new
launch. What testing procedures would
you suggest to Nestle?
Relationship Among Techniques
Analysis of variance (ANOVA) is used as a test of
means for two or more populations. The null
hypothesis, typically, is that all means are equal.
Analysis of variance must have a dependent variable
that is metric (measured using an interval or ratio
scale).
There must also be one or more independent
variables that are all categorical (nonmetric).
Categorical independent variables are also called
factors.
Decomposition of the Total Variation: One-way ANOVA

                     Independent Variable X
                     Categories                         Total Sample
                     X1     X2     X3     ...    Xc
Within-category      Y1     Y1     Y1            Y1     Y1
variation            Y2     Y2     Y2            Y2     Y2         Total variation
= SSwithin           :      :      :             :      :          = SSy
                     Yn     Yn     Yn            Yn     YN
Category mean        Ȳ1     Ȳ2     Ȳ3            Ȳc     Ȳ

Between-category variation = SSbetween = SSx
Statistics Associated with One-way
Analysis of Variance
SSbetween. Also denoted as SSx, this is the variation
in Y related to the variation in the means of the
categories of X. This represents variation between
the categories of X, or the portion of the sum of
squares in Y related to X.
SSwithin. Also referred to as SSerror, this is the

variation in Y due to the variation within each of the
categories of X. This variation is not accounted for
by X.
SSy. This is the total variation in Y.

Conducting One-way Analysis of
Variance
Decompose the Total Variation
The total variation in Y, denoted by SSy, can be
decomposed into two components:
SSy = SSbetween + SSwithin
where the subscripts between and within refer to the

categories of X. SSbetween is the variation in Y related
to the variation in the means of the categories of X.
For this reason, SSbetween is also denoted as SSx.
SSwithin is the variation in Y related to the variation
within each category of X. SSwithin is not accounted
for by X. Therefore it is referred to as SSerror.
Test Significance
The null hypothesis may be tested by the F statistic
based on the ratio between these two estimates:
$F = \frac{SS_x / (c - 1)}{SS_{error} / (N - c)} = \frac{MS_x}{MS_{error}}$
This statistic follows the F distribution, with (c - 1) and
(N - c) degrees of freedom (df).
Interpret the Results
If the null hypothesis of equal category means is not
rejected, then the independent variable does not
have a significant effect on the dependent variable.
On the other hand, if the null hypothesis is rejected,
then the effect of the independent variable is
significant.
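The decomposition and the F ratio can be sketched for a four-zone design like the Nestle problem. The zone ratings are hypothetical and `one_way_anova` is an illustrative name; the sketch computes the sums of squares directly from their definitions.

```python
def one_way_anova(groups):
    """Return sums of squares, F, and degrees of freedom for a one-way ANOVA."""
    all_values = [y for group in groups for y in group]
    big_n, c = len(all_values), len(groups)
    grand_mean = sum(all_values) / big_n
    means = [sum(g) / len(g) for g in groups]
    # Variation of category means around the grand mean (SSx)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own category mean (SSerror)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    f = (ss_between / (c - 1)) / (ss_within / (big_n - c))
    return {
        "SS_between": ss_between,
        "SS_within": ss_within,
        "SS_y": ss_total,
        "F": f,
        "df": (c - 1, big_n - c),
    }

# Hypothetical ratings from four geographic zones (illustrative only)
zones = [[4, 5, 6], [6, 7, 8], [5, 6, 7], [7, 8, 9]]
result = one_way_anova(zones)
print(result)
```

For these made-up ratings SSbetween is 15, SSwithin is 8, and SSy is 23, so the decomposition SSy = SSbetween + SSwithin holds, and F is about 5.0 with (3, 8) degrees of freedom.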
