
Operation 1: Identify & replace missing values. Go to Transform, go to
Replace Missing Values, highlight the variables whose missing values are to
be replaced & transfer them to the right box. In Method, select Series
mean, then click OK. The software will create one additional variable.
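The series-mean replacement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS's own code; the function name and the sample ratings are assumptions made up for the example.

```python
from statistics import mean

def replace_missing_with_series_mean(values):
    """Return a new list in which None entries are replaced by the mean
    of the observed (non-missing) values of the same series."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    series_mean = mean(observed)
    return [series_mean if v is None else v for v in values]

# Hypothetical ratings with two missing entries (None).
ratings = [4, None, 6, 5, None, 5]
ratings_filled = replace_missing_with_series_mean(ratings)
print(ratings_filled)
```

As in SPSS, the original list is left untouched and the filled values go into a new variable.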
Operation 2: If you want to analyze the data separately for each group, use
the Split File command, which is generally used for data based on specific
categories, e.g. 4 cities (Delhi, Mumbai, Kolkata, Chennai), gender, or
company-wise data. Go to Data, click Split File, click Compare groups &, in
the box named Groups Based on, put the appropriate category; e.g. in our
research, we may enter gender as the group. Click OK to continue.
* To unsplit the file, go to Data, Split File, select Analyze all cases, do
not create groups, & click OK to continue.
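Conceptually, splitting a file by a category means that every later analysis is run once per group. A minimal sketch of that idea, assuming a hypothetical list of records with a `gender` and a `rating` field (both names are illustrative):

```python
from collections import defaultdict
from statistics import mean

def split_by(records, group_key):
    """Group the records by the value of group_key, like Split File."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return dict(groups)

# Hypothetical survey records.
records = [
    {"gender": "F", "rating": 6},
    {"gender": "M", "rating": 4},
    {"gender": "F", "rating": 8},
    {"gender": "M", "rating": 5},
]

# Run the same analysis (here, a mean) separately for each group.
group_means = {
    group: mean(row["rating"] for row in rows)
    for group, rows in split_by(records, "gender").items()
}
print(group_means)
```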
Operation 3: To calculate descriptive statistics, SPSS gives you 3 options.
Option 1- Click Analyze, click Descriptive Statistics, click Frequencies,
highlight the variable for which descriptive statistics are to be
calculated. Transfer it to the right window. Click Statistics & select the
required options. Click Continue, click OK & your output is ready.
Option 2- Go to Analyze, Descriptive Statistics, Descriptives. Transfer the
required variable from left to right. Click Options, select the required
options & click Continue. Click OK to get your output.
Option 3- Go to Analyze, Compare Means, click Means, transfer the required
variable into the Dependent List, click Options, select the required
statistics & transfer them to the right. Click Continue, click OK.
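For readers who want to check the SPSS output by hand, the most common descriptive statistics can be reproduced with Python's standard library. The data below are hypothetical, and the selection of statistics is an assumption about which options were ticked.

```python
from statistics import mean, median, mode, stdev

def describe(values):
    """Return a small dictionary of descriptive statistics."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1 divisor)
        "minimum": min(values),
        "maximum": max(values),
    }

# Hypothetical ratings.
summary = describe([3, 4, 2, 5, 3, 4, 5, 3])
for name, value in summary.items():
    print(name, value)
```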
Operation 4: To group a variable into class intervals, go to Transform,
click Visual Binning, select the required variable, transfer it to the
right, click Continue, click Make Cutpoints, click in First Cutpoint
Location & put the appropriate value, click in Width & put your interval
size (e.g. 5, 10, 15 etc.), enter the Number of Cutpoints, click Apply,
click Make Labels, click in Binned Variable & give an appropriate name to
the new variable, click OK to get the output.
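The cutpoint logic can be sketched as follows. The first cutpoint, width and number of cutpoints are hypothetical values, and the sketch assumes that each cutpoint is the upper limit of its bin and is included in it (value <= cutpoint); check the endpoint setting in your own dialog before relying on this.

```python
from bisect import bisect_left

def make_cutpoints(first, width, count):
    """Equal-width cutpoints: first, first + width, first + 2 * width, ..."""
    return [first + i * width for i in range(count)]

def bin_value(value, cutpoints):
    """Return the 1-based bin number for value.

    bisect_left counts the cutpoints strictly below value, so a value equal
    to a cutpoint falls into the lower bin (upper endpoints included)."""
    return bisect_left(cutpoints, value) + 1

cutpoints = make_cutpoints(first=20, width=10, count=3)  # [20, 30, 40]
ages = [18, 20, 25, 30, 41]  # hypothetical data
binned = [bin_value(age, cutpoints) for age in ages]
print(cutpoints, binned)
```

Three cutpoints produce four bins: up to 20, 21 to 30, 31 to 40, and above 40.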

Operation 5: To rank cases, go to Transform, click Rank Cases, select the
appropriate variable & transfer it to the right, click OK to get the
output.
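What ranking does to a variable can be illustrated with a short sketch. It assumes that the smallest value gets rank 1 and that tied values share the mean of their ranks; the function name and data are hypothetical.

```python
def rank_cases(values):
    """Rank values in ascending order; tied values get the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of tied values starting at pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

ranks = rank_cases([50, 20, 50, 10])  # hypothetical sales figures
print(ranks)
```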
Operation 6: To recode a variable, go to Transform, click Recode into
Different Variables, select the variable, transfer it to the right & give
the output variable a name, click Old and New Values, click Range & put
the appropriate lower & upper values, in New Value put 1 & click Add, click
All other values, in New Value put 2 & click Add, click Continue, click OK.
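The recoding rule itself is simple: values inside the chosen range become 1 and all other values become 2. A minimal sketch, with a hypothetical range of 1 to 5 and hypothetical data:

```python
def recode(value, low, high):
    """Return 1 if value lies in the range low..high (inclusive), else 2."""
    return 1 if low <= value <= high else 2

scores = [3, 7, 5, 9]  # hypothetical old values
recoded = [recode(score, low=1, high=5) for score in scores]
print(recoded)
```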
Testing the data for Normality:
In testing of hypothesis using a t-test, we generally use the following 3 t-tests:
1. One-Sample t-test
2. Paired Sample t-test
3. Independent Sample t-test
Independent Sample t-test: Suppose a marketer of a brand of jeans wants to
understand whether a set of customers in Delhi & a set of customers in
Mumbai perceive its brand in the same way or not. The company conducted a
short survey in both the cities asking customers about their perception of
the brand. The company used a 7-point scale for the respondents of each
city, where 1 indicates liked a lot & 7 indicates not liked at all.
Null Hypothesis- There is no significant difference in the perception of
the brand amongst people of Delhi & Mumbai, i.e. μM = μD
Operation 7: Go to Analyze, Compare Means, Independent-Samples T Test, put
ratings in Test Variable(s), city in Grouping Variable, click Define Groups
& type 1 & 2, click Continue, click OK to get the output.
When you perform an independent sample t-test, the output gives two
significance values for the test. The first value corresponds to equal
variances assumed & the second value corresponds to equal variances not
assumed. To select one of the two significance values, we first have to
identify whether the variances are equal or not. For this, the output by
default gives Levene's test for equality of variances. So we first frame
the hypothesis, Null Hypothesis- There is no significant difference in the
variances of Delhi & Mumbai. The significance value of Levene's test is
0.401, which when compared with the α value 0.05 lies in the acceptance
region because 0.401 > 0.05. Hence, we accept our null hypothesis &
conclude that there is no significant difference in the variances of the 2
cities.
Since our hypothesis of equal variances is accepted, we continue with the
equal variances assumed row. Hence, we take 0.010 as our significance value
for the t-test & compare it with the α value 0.05 at 5% level of
significance. The significance value 0.010 < 0.05. Hence, we reject our
null hypothesis & conclude that the perception of the brand in the 2 cities
is significantly different.
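The two-step reading of the output (Levene's test first, then the matching t-test row) can be written as a small decision rule. The sketch below uses the values 0.401 and 0.010 quoted above; the significance value for the "equal variances not assumed" row is not given in these notes, so the 0.012 used here is purely a placeholder.

```python
def choose_t_test_row(levene_p, p_equal_var, p_unequal_var, alpha=0.05):
    """Pick the t-test row to read, based on Levene's test."""
    if levene_p > alpha:
        # Equality of variances is not rejected.
        return "equal variances assumed", p_equal_var
    return "equal variances not assumed", p_unequal_var

row, p_value = choose_t_test_row(
    levene_p=0.401,       # from the output described above
    p_equal_var=0.010,    # from the output described above
    p_unequal_var=0.012,  # placeholder, not from the source
)
reject_null = p_value < 0.05
print(row, p_value, reject_null)
```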
-------------------------------------------------------------------------------------------------
Paired Sample T-test:
When we want to find out the attitude of people, e.g. towards a particular
brand, before & after an advertising campaign, we use a paired sample
t-test. Let us assume that we have surveyed a sample of 18 respondents
before & after an advertising campaign. Each of them was asked to rate
their attitude towards the brand on a 10-point scale, where 1 represents
brand is highly disliked & 10 represents brand is highly liked. Following
is the data collected from the survey.
Sr. No.   Before   After
1         3        5
2         4        6
3         2        6
4         5        7
5         3        8
6         4        4
7         5        6
8         3        7
9         4        5
10        2        4
11        2        6
12        4        7
13        1        4
14        3        6
15        6        8
16        3        4
17        2        5
18        3        6

Null Hypothesis- There is no significant difference in the mean ratings of
the respondents before & after the advertising campaign.
Go to Analyze, go to Compare Means, go to Paired-Samples T Test, put before
as Variable 1 & after as Variable 2 of Pair 1. Click OK to continue.
At 5% level of significance, our significance value is 0.000, which is less
than the α value 0.05. Hence we reject our null hypothesis & conclude that
there exists a significant difference in the mean ratings given by the
respondents before & after the advertisement. The output also reports the
correlation between the before & after ratings, 0.498, which is a moderate
positive correlation; it describes how the two sets of ratings move
together & is not by itself a test of the difference in means.
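As a cross-check of the paired-sample output, the t statistic and the correlation can be recomputed from the 18 before & after ratings listed above. The sketch uses only the standard library; it computes the test statistic, not the significance value, which SPSS takes from the t distribution with n - 1 = 17 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Ratings of the 18 respondents, in the order of the table above.
before = [3, 4, 2, 5, 3, 4, 5, 3, 4, 2, 2, 4, 1, 3, 6, 3, 2, 3]
after = [5, 6, 6, 7, 8, 4, 6, 7, 5, 4, 6, 7, 4, 6, 8, 4, 5, 6]

# Paired t statistic: mean difference divided by its standard error.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)
mean_diff = mean(differences)
t_stat = mean_diff / (stdev(differences) / sqrt(n))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(before, after)
print(round(mean_diff, 2), round(t_stat, 2), round(r, 3))  # 2.5 8.19 0.498
```

A t statistic of about 8.19 is far beyond the two-tailed 5% critical value of 2.110 for 17 degrees of freedom, which agrees with the reported significance value of 0.000.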
ONE SAMPLE T-TEST:
The Coca-Cola company has taken a sample of thirty cola bottles to test the
amount of sugar calories in each bottle. Test at 5% level of significance
if the mean value of the 30 samples is 300 calories.
Ho: μ = 300
Ha: μ ≠ 300
Go to Analyze, Compare Means, One-Sample T Test, transfer calories from the
left side to the right side, in Test Value put 300, and click OK to get the
output.
At 5% level of significance, when we compare our significance value with
the α value 0.05, the significance value is more than the α value. Hence,
we accept our null hypothesis & conclude that the hypothesized mean can be
considered as 300 calories.
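The statistic behind the one-sample test is the distance of the sample mean from the test value, measured in standard errors. The 30 calorie readings are not reproduced in these notes, so the sketch below uses five hypothetical readings purely to show the calculation.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, test_value):
    """t = (sample mean - test value) / (sample std. dev. / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - test_value) / (stdev(sample) / sqrt(n))

# Hypothetical calorie readings, not the data from the case.
calories = [298, 302, 300, 301, 304]
t_value = one_sample_t(calories, test_value=300)
print(round(t_value, 3))
```

For these made-up readings the mean is 301 and the standard error is 1, so t is 1.0, well inside the two-tailed 5% critical value of 2.776 for 4 degrees of freedom; the null hypothesis would be accepted.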

Cross-tabulation of Data in chi-square test:


A company has conducted a consumer survey for a brand of detergents. One of
the questions dealt with the income category of the respondent & another
asked the respondent to rate his purchase intention.
Code   Income (Rs. per month)
1      < 5000
2      5000-10000
3      10000-15000
4      > 15000

Code   Purchase Intention
1      None
2      Low
3      High
4      Very High
5      Certain

The chi-square test for independence helps us test the hypothesis that two
variables A & B are independent of each other. In the given example, we
want to test the hypothesis that purchase intent is independent of income
group.
Ho= Purchase intent is independent of income group
When you request a chi-square test along with the cross-tabulation, the
same output gives you the chi-square test of independence. In the given
output, the significance value of the chi-square test is 0.097. When we
compare this value with the α value 0.05, it is higher than the α value &
hence we accept our null hypothesis & conclude that purchase intent is
independent of the income group.
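The chi-square statistic compares the observed count in each cell of the cross-tabulation with the count expected under independence (row total x column total / grand total). The survey counts are not given in these notes, so the 4 x 5 table below (income codes by purchase-intention codes) is entirely hypothetical; the sketch returns the statistic and its degrees of freedom, not the significance value.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts: rows = income codes 1-4, columns = intention codes 1-5.
observed = [
    [8, 6, 4, 2, 1],
    [5, 7, 6, 4, 2],
    [3, 5, 7, 6, 4],
    [2, 3, 5, 7, 6],
]
statistic, df = chi_square_independence(observed)
print(round(statistic, 2), df)
```

With 4 income groups & 5 intention levels the test has (4 - 1) x (5 - 1) = 12 degrees of freedom.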

ANOVA:
Completely randomized design in one-way ANOVA: This design is used when
there is only one independent categorical variable & one dependent
variable. Each category of the independent variable is called a level. The
independent variable may be different levels of price, packages or colors
& the dependent variable can be the effect on sales.
Case 1: An advertising company has 3 different versions of an ad copy for
a campaign. These 3 versions can be called copy 1, 2, 3. Now the ad agency
wants to test which of these versions of the ad copy is preferred by its
target population before they launch the ad campaign. They have collected
responses of 18 respondents from the target population in the nearby areas
of the city, such that these 18 respondents were assigned to the 3 versions
of the copy. Each version is thus shown to six respondents. The respondents
are asked to rate their liking for the ad copy on a scale of 1-10 such that
1= not liked at all & 10= liked a lot.
Go to Analyze, Compare Means, One-Way ANOVA, put ratings in Dependent List,
ad copy as Factor, click OK.
Ho= There is no significant difference in the mean ratings given to all the
3 ads (i.e. μA = μB = μC)
Ha= At least one of the ads is significantly different.
At 5% level of significance, the α value is 0.05 & the significance value
is 0.203, which is greater than the α value 0.05. Hence we accept our null
hypothesis & conclude that there is no significant difference in the mean
ratings of all the 3 ads.
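The F ratio reported in the one-way ANOVA table is the between-groups mean square divided by the within-groups mean square. The 18 ratings from the case are not reproduced in these notes, so the three groups of six ratings below are hypothetical and will not reproduce the 0.203 significance value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of ratings."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical ratings: six respondents for each of the 3 ad copies.
copy_1 = [6, 7, 5, 8, 6, 7]
copy_2 = [5, 6, 7, 6, 5, 6]
copy_3 = [7, 8, 6, 7, 8, 6]
f_ratio, df_between, df_within = one_way_anova_f([copy_1, copy_2, copy_3])
print(round(f_ratio, 3), df_between, df_within)
```

With 3 ad copies & 18 respondents the degrees of freedom are 2 (between groups) & 15 (within groups), whatever the ratings are.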
Randomized Block Design:
In the case given above, suppose we make a slight change, i.e. we add a
block variable (magazine): does it create a change in the perception of
respondents towards the ad copy? The 3 versions of the ad copy were each
used in six different magazines. These magazines are coded as 1, 2, 3, 4,
5, and 6. Out of the people who saw these ads, 18 randomly chosen
respondents are picked, each of whom has seen a particular version of the
ad. Thus we finally have 1 respondent who has seen a given version of the
ad in a given magazine. In other words, we have one respondent for every
combination of magazine & ad copy.
1st Hypothesis:
Ho= There is no significant difference in the mean ratings given to all the 3
ads.
Ha= At least one ad is significantly different than the others.
2nd Hypothesis:
Ho= The block/magazine doesn't create a significant change in the
perception of people.
Ha= At least one magazine is significantly different.
Go to Analyze, General Linear Model, Univariate, put ratings in Dependent
Variable, put ad copy as Fixed Factor, magazine as Random Factor, and click
OK.
Hypothesis 1: At 5% level of significance, our significance value 0.005 is
less than the α value 0.05. Hence we reject our null hypothesis & conclude
that the mean rating of at least 1 of the ad copies is significantly
different.
Hypothesis 2: At 5% level of significance, our significance value is 0.000,
which is less than the α value 0.05. Hence we reject our null hypothesis &
conclude that the magazine creates a significant difference in the
perception of people towards the ads. This is further confirmed by the
rejection of hypothesis 1, which was accepted in the one-way ANOVA.
Factorial design with 2 or more factors:
This type of design is employed when we have 2 or more independent
variables or factors. The major advantage of this design is that multiple
factors can be tested simultaneously. In such a design, there are 2 kinds
of effects that can be tested simultaneously. One is called the main effect
& the second is called the interaction effect.

Case 1: Rin Supreme detergent wants to test the effect of 2 factors
(independent variables), i.e. package design & price, on the dependent
variable, sales. The company would like to know if each of the factors
independently affects the sales & if there is a combined effect of package
design & price on the sales. The company has conducted an experiment in a
simulated environment on 18 randomly selected respondents. The company has
3 levels of pricing, 8, 11 & 14, & three levels of package design
designated by the main color used, blue, green & red. These independent
variables can be coded as 1, 2, and 3 for the 3 pricing levels & 1, 2, 3
for the 3 colors respectively.
Hypothesis 1: Different levels of pricing do not have a significant effect
on sales.
Hypothesis 2: Different levels of colors do not have a significant effect
on sales.
Hypothesis 3: The combined effect of pricing & colors doesn't create a
significant difference in sales.
The company has collected data for sales at different price levels with
different combinations of colors.
Go to Analyze, go to General Linear Model, go to Univariate, put sales in
Dependent Variable, price in Fixed Factor, packaging in Random Factor,
click OK to continue.
Hypothesis 1: At 5% level of significance, the α value is 0.05. When you
compare this value with the p-value, i.e. the significance value, it is
0.002. Since 0.002 < 0.05, we reject our null hypothesis & conclude that
price has a significant impact on sales.
Hypothesis 2: At 5% level of significance, the α value 0.05 is less than
the p-value 0.193. Hence we accept our null hypothesis & conclude that
different levels of packaging have no significant impact on sales.
Hypothesis 3: At 5% level of significance, the α value 0.05 is less than
the p-value 0.646. Hence, we accept our null hypothesis & conclude that
there is no significant impact of the combined effect of packaging & price
on sales.
---------------------------------------------------------------------------------------------
CORRELATIONS
Correlation analysis is used to measure the degree of association between
two sets of quantitative data, e.g. how sales of product A are correlated
with sales of product B, or how advertising expenditure is correlated with
other promotional expenditures. There is virtually no limit to applying
correlation analysis to any 2 data sets of 2 or more variables, but it is
important that the variables make business sense.
Case 1: A manufacturer & marketer of electric motors would like to build a
regression model consisting of 5 or 6 independent variables to predict
sales. Past data has been collected for 15 sales territories on sales & 6
different independent variables. Build a regression model & recommend
whether it should be used by the company. Following are the variables that
the company has collected:
Dependent variable- Y= sales in Rs. (in lakhs) in the territory.
Independent variables- X1= market potential in the territory, X2= no. of
dealers of the company in the territory, X3= no. of sales people in the
territory, X4= index of competitors' activity in the territory on a scale
of 1-5 where 1= very low & 5= high, X5= no. of service centers in the
territory, X6= no. of existing customers in the territory.
Go to Analyze, Correlate, Bivariate, transfer all the variables from left
to right, click Pearson, click OK.

When you look at the correlations of all the variables with each other, the
correlation values range from -1 to +1. Looking at the last column, we find
that except for the competition index, all the other variables are highly
correlated with sales, with values ranging from 0.732 to 0.95. This means
we have chosen a fairly good set of independent variables. Only the index
of competitors' activity doesn't appear to be strongly correlated with
sales, as its correlation coefficient is -0.5. We must also note that these
are one-to-one correlations of each variable with sales & with each other.
So we may still want to do a multiple regression with these independent
variables, because it's possible that in the presence of other variables
this independent variable may become a good predictor of the dependent
variable. The other observation we need to make from the table is whether
the independent variables are highly correlated with each other
(multicollinearity). If they are, as in this case, it indicates that we may
not be able to use all of them; we may be able to use only 1 or 2 & end up
eliminating a few independent variables.
The result of the regression model gives us the coefficients of the model,
which are also called the B-list. A (intercept)= -3.1728, B1= 0.2268, B2=
0.819, B3= 1.09, B4= -1.89, B5= -0.54 & B6= 0.065. When you substitute
these values in the equation, you get the following equation:
Sales = -3.17 + 0.23(potential) + 0.82(dealers) + 1.09(sales people) - 1.89(competition activity) - 0.55(service centers) + 0.07(existing customers)
Before we use this equation, we need to check the statistical significance
of the model & the r² value. From the ANOVA table, the p-level is seen to
be 0.00004. This indicates that the model is statistically significant at a
confidence level of 1 - 0.00004, i.e. more than 99.99%.
The r² value is 0.97, which indicates that 97.7% of the variation in sales
can be predicted by the given independent variables. We also note that the
significance of the individual independent variables indicates that at the
significance level of 0.10 (equivalent to a confidence level of 90%) only
potential & sales people are statistically significant in the model. The
other 4 independent variables are individually not significant. However,
for the time being, we shall use the model as it is & try to apply it for
decision making. The real use of the regression model would be to try &
predict sales in lakhs, given the values of all the independent variables,
or to check the impact of a change in some of them on the sales figure of a
territory. The equation we obtain simply means that sales will increase in
a territory if the potential increases, the no. of dealers increases, the
no. of sales people increases, the level of competition decreases, the no.
of service centers decreases & the no. of existing customers increases.
The estimated increase in sales for every unit increase or decrease in
these variables is given by the coefficients of the respective variables,
if the other variables are kept constant. For instance, if the no. of sales
people is increased by 1, the sales will increase by Rs. 1.09 lakhs.
Similarly, if 1 more dealer is added, we expect the sales to increase by
Rs. 82,000. The variable service centers does not make much sense: if we
increase the no. of service centers, according to the output, sales is
estimated to decrease by Rs. 55,000. When we look at the significance level
of service centers, we find that this variable is insignificant. Hence, to
keep this variable as a part of the regression model would be unwise.
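To see how the fitted equation is used for prediction, it can be written as a small function. The coefficients are the rounded ones from the equation above; the territory values, the dictionary keys and the function name are hypothetical and only serve the illustration.

```python
# Rounded coefficients from the regression equation above (sales in Rs. lakhs).
INTERCEPT = -3.17
COEFFICIENTS = {
    "potential": 0.23,
    "dealers": 0.82,
    "sales_people": 1.09,
    "competition": -1.89,
    "service_centers": -0.55,
    "existing_customers": 0.07,
}

def predict_sales(territory):
    """Predicted sales (Rs. lakhs) for one territory."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * territory[name] for name in COEFFICIENTS
    )

# Hypothetical territory, not taken from the 15 territories in the case.
territory = {
    "potential": 20,
    "dealers": 4,
    "sales_people": 5,
    "competition": 3,
    "service_centers": 2,
    "existing_customers": 30,
}
base = predict_sales(territory)
with_extra_person = predict_sales(
    {**territory, "sales_people": territory["sales_people"] + 1}
)
print(round(base, 2), round(with_extra_person - base, 2))  # 5.49 1.09
```

Adding one sales person while holding everything else constant changes the prediction by exactly the sales people coefficient, 1.09 lakhs.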
