
HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

STATISTICS FOR ECONOMICS


Autumn, 2018

Students’ information

Name                    Class    ID            Signature
Nguyen Thi Thuy Ngan    1KT-16   1604010069
Le Nguyen Cam Nhung     2TC-16   1604040086
Nguyen Thuy Duong       1KT-16   1604010016
Vu Hai Anh              1KT-16   1604010009
Dieu Anh                4KT-15   1504010006
Nguyen Thi Thao Ngan    3KT16    1604010068
Hoang Duc Thanh         1KT-16   1604010095

Date of submission: November 1st, 2018


Case study: Compensation for Sales Professionals

A. Scenario
Suppose that a local chapter of sales professionals in the greater San Francisco area conducted
a survey of its membership to study the relationship, if any, between the years of experience
and salary for individuals employed in inside and outside sales positions. On the survey,
respondents were asked to specify one of three levels of years of experience: low (1-10 years),
medium (11-20 years), and high (21 or more years). The objective of this study is to test for
any significant interaction between Position and Experience and to test for any significant
differences in salary due to position and years of experience. Use the 0.05 level of significance.

B. Questions

Question 1. What inference technique should be considered for this study? Explain.

From the information given in the scenario, we consider two-way ANOVA the most
appropriate technique. Two-way ANOVA examines the effects of two categorical independent
variables (Position and Experience) on a numerical dependent variable (Salary), as well as the
interaction between the two factors. In other words, our team can test whether there is a
significant interaction between Position and years of Experience in their effect on the Salary
that employees are paid, and whether salary differs significantly by position and by experience.
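For reference, a standard way to write the two-way ANOVA model with interaction is sketched below. The symbols (μ, α_i, β_j, (αβ)_ij, ε_ijk) are our own notation rather than part of the scenario, and we assume the usual sum-to-zero constraints on the effects and 20 respondents per cell.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\\
i = 1, 2 \ (\text{Inside, Outside}), \quad
j = 1, 2, 3 \ (\text{Low, Medium, High}), \quad
k = 1, \dots, 20,
\\
H_0^{AB}: (\alpha\beta)_{ij} = 0 \text{ for all } i, j, \qquad
H_0^{A}: \alpha_1 = \alpha_2 = 0, \qquad
H_0^{B}: \beta_1 = \beta_2 = \beta_3 = 0.
```

The interaction hypothesis H_0^{AB} is the one tested in Question 4; H_0^{A} and H_0^{B} correspond to the tests for differences in salary due to position and due to experience.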

Question 2. Produce descriptive statistics for the dataset. You are expected to generate
as many relevant descriptive statistics as possible using ALL the relevant tools introduced
in the labs of this course. Remember to provide appropriate interpretations for the
descriptive statistics. Try not to include unnecessary or irrelevant descriptive statistics.

In this report, the following descriptive statistics are presented in detail to illustrate the
features of the case study data and some related measures.

a. Cross-tabulation Table
The R code and the output shown in RStudio:
 table(position, exp)
Low Medium High
Inside 20 20 20
Outside 20 20 20
The design is balanced: the whole data set contains 120 observations, with 20 observations in
each of the six combinations of position and experience.

b. Mean of salary for each combination of experience and position

 by(salary,list(position,exp), mean)
: Inside
: Low
[1] 55031.35
----------------------------------------------------------------
: Outside
: Low
[1] 64607.9
----------------------------------------------------------------
: Inside
: Medium
[1] 55607.75
----------------------------------------------------------------
: Outside
: Medium
[1] 81628.5
----------------------------------------------------------------
: Inside
: High
[1] 57422.45
----------------------------------------------------------------
: Outside
: High
[1] 75254.9
The output above shows the mean salary ($) of each group. The highest mean salary, $81,628.50,
belongs to the outside sales position with a medium level of experience, while the lowest,
$55,031.35, belongs to the inside position with a low level of experience.
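Because the design is balanced, the marginal mean salaries for each position and each experience level can be recovered directly from the six cell means. The sketch below does this in base R using only the values printed above; the object names (cell_means, position_means, and so on) are our own, and since the inputs are the rounded printed means, the last digits may differ slightly from values computed on the raw data.

```r
# Cell means copied from the by() output above; each cell has 20 observations.
cell_means <- matrix(
  c(55031.35, 55607.75, 57422.45,   # Inside:  Low, Medium, High
    64607.90, 81628.50, 75254.90),  # Outside: Low, Medium, High
  nrow = 2, byrow = TRUE,
  dimnames = list(position = c("Inside", "Outside"),
                  exp = c("Low", "Medium", "High"))
)

# With equal cell sizes, each marginal mean is the simple average of the
# corresponding cell means, and the grand mean is the average of all six.
position_means <- rowMeans(cell_means)
exp_means <- colMeans(cell_means)
grand_mean <- mean(cell_means)

print(round(position_means, 2))
print(round(exp_means, 2))
print(round(grand_mean, 2))
```

On these figures, the outside position earns about $17,810 more than the inside position on average, and the medium-experience group has the highest marginal mean.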

c. Standard deviation for each combination of experience and position

 by(salary,list(position,exp), sd)
: Inside
: Low
[1] 3619.716
----------------------------------------------------------------
: Outside
: Low
[1] 3556.456
----------------------------------------------------------------
: Inside
: Medium
[1] 3544.737
----------------------------------------------------------------
: Outside
: Medium
[1] 3453.467
----------------------------------------------------------------
: Inside
: High
[1] 3327.372
----------------------------------------------------------------
: Outside
: High
[1] 3830.774
The standard deviations are similar across the six groups: the largest, $3,830.77 (outside, high
experience), is not much larger than the smallest, $3,327.37 (inside, high experience).
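One simple way to quantify this gap is the ratio of the largest to the smallest standard deviation, the same rule of thumb used later when checking the equal-variance assumption. The sketch below uses only the six values printed above; the group labels are our own.

```r
# Group standard deviations copied from the by() output above.
group_sd <- c(
  Inside_Low    = 3619.716, Outside_Low    = 3556.456,
  Inside_Medium = 3544.737, Outside_Medium = 3453.467,
  Inside_High   = 3327.372, Outside_High   = 3830.774
)

# Ratio of the largest to the smallest SD; a common rule of thumb treats
# a ratio below 2 as consistent with equal population variances.
sd_ratio <- max(group_sd) / min(group_sd)
cat("Largest SD: ", names(which.max(group_sd)), "\n")
cat("Smallest SD:", names(which.min(group_sd)), "\n")
cat("Ratio:", round(sd_ratio, 3), "\n")
```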

d. Boxplot

boxplot(salary ~ position * exp, data = salessalary, frame = TRUE,
        xlab = "Position and Experience", ylab = "Salary",
        col = c("red", "steelblue"))
A boxplot depicts groups of numerical data graphically through their quartiles. It lets us
compare the spread within groups, shows the distribution of each group, and helps to detect
outliers.
Based on the boxplot, salaries for the inside position vary little across the three levels of
experience; in contrast, salaries for the outside position vary considerably across experience
levels. We can also point out that the Outside.High group has an outlier below its lower whisker,
and the distribution appears roughly right-skewed.

Question 3. Check all the assumptions of the inference technique you suggest in
question 1. Are the assumptions satisfied? Explain.

As mentioned in Question 1, we use two-way ANOVA as the inferential technique to test whether
there is a significant interaction between Position and Years of Experience in their effect on
Salary. Salary is the dependent variable, and there are two independent variables: Position
(Inside, Outside) and Experience (Low, Medium, High).

Before analysing the data, three assumptions need to be checked:

- Independent observations from a simple random sample.

- Homogeneity of variances.

- Normally distributed populations.

The first assumption is to check whether there is any relationship between the observations in
different groups and whether the data come from a simple random sample. In this case, each
respondent belongs to only one level of experience and one position, so the groups are
independent. We also assume that the survey respondents were selected randomly from the
population of sales professionals in the greater San Francisco area, so that each individual had
an equal probability of being chosen. Under this assumption, the data form a simple random
sample.

For the second assumption, we check the equality of the standard deviations. Among the six
groups, 3830.774 is the largest standard deviation and 3327.372 is the smallest. Their ratio,
3830.774 / 3327.372, is approximately 1.15, which satisfies the rule-of-thumb requirement of
being smaller than 2. However, we also use Levene’s test to check again.

R code:

install.packages("car")   # only needed once
library(car)
leveneTest(salary ~ position*exp, data = salessalary)

Output:

Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   5  0.0722 0.9962
      114
Hypotheses:

H0: All populations have the same standard deviation.

Ha: At least two populations have different standard deviations.

Since p-value = 0.9962 > α = 0.05, we cannot reject the null hypothesis: there is no significant
difference among the group standard deviations. Both methods above give the same result, so
the assumption of equal variances is satisfied.
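For readers without the car package, the median-centred Levene test (also called the Brown-Forsythe test) can be sketched in base R: it is a one-way ANOVA on the absolute deviations of each observation from its own group median. The helper levene_median() below is our own, not a package function; to our understanding it mirrors leveneTest() with its default center = median, but here it is only demonstrated on a small toy vector, not on the survey data.

```r
# Median-centred Levene (Brown-Forsythe) test in base R.
# One-way ANOVA on |y - median of y's own group|.
levene_median <- function(y, group) {
  group <- factor(group)
  group_medians <- ave(y, group, FUN = median)
  abs_dev <- abs(y - group_medians)
  anova(lm(abs_dev ~ group))
}

# Toy check: two groups with identical spread around their medians,
# so the test statistic should be (numerically) zero.
toy_y <- c(1, 2, 3, 11, 12, 13)
toy_group <- c("a", "a", "a", "b", "b", "b")
toy_result <- levene_median(toy_y, toy_group)
print(toy_result)
```

Assuming the salessalary data frame is loaded, levene_median(salessalary$salary, interaction(salessalary$position, salessalary$exp)) should produce an F statistic on 5 and 114 degrees of freedom comparable to the car output above.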

For the final assumption, we check whether the residuals are normally distributed by using a Q-Q plot.

R code:

qqPlot(lm(salary ~ position*exp, data = salessalary), simulate = TRUE,
       main = "Q-Q Plot", id = FALSE)

Output: [Figure: Q-Q Plot]

From the plot above, nearly all points lie close to a straight line, so the residuals are
consistent with samples drawn from normally distributed populations. This assumption is
therefore satisfied.

Question 4. Perform the inference technique you suggest in question 1. Remember to
provide all the necessary steps. What are your interpretations and conclusions? Explain.

We chose the two-way ANOVA technique to test for differences in salary by position and
experience.
Step 1: Identify the null and alternative hypotheses
H0: There is no interaction between Position and Experience.
Ha: There is an interaction between Position and Experience.
Step 2: Level of significance
α = 0.05
Step 3: Decision rule
Reject H0 if p-value < α = 0.05
Step 4: Test statistic
We use two-way ANOVA with Salary as the outcome variable and Position and Experience as the
two factors. Since we are interested in the effects of Position and Experience on Salary and in
their interaction, we use the following R commands:
aovSales <- aov(salary ~ position*exp, data = salessalary)
summary(aovSales)

Here is the R output for the two-way ANOVA:

              Df    Sum Sq   Mean Sq F value Pr(>F)
position       1 9.516e+09 9.516e+09  751.36 <2e-16 ***
exp            2 1.668e+09 8.341e+08   65.86 <2e-16 ***
position:exp   2 1.352e+09 6.760e+08   53.38 <2e-16 ***
Residuals    114 1.444e+09 1.266e+07
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
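As a cross-check on this table, the balanced design lets us recover the sums of squares from the group summaries alone: the effect sums of squares from the six cell means, and the residual sum of squares from the six group standard deviations. The sketch below uses only numbers printed earlier in this report; because those were rounded, it should agree with the table to about three or four significant figures. The variable names are our own.

```r
n_cell <- 20  # observations per cell (from the cross-tabulation)

# Cell means (rows: Inside, Outside; columns: Low, Medium, High).
cell_means <- matrix(c(55031.35, 55607.75, 57422.45,
                       64607.90, 81628.50, 75254.90),
                     nrow = 2, byrow = TRUE)
# Cell standard deviations in the same layout.
cell_sds <- matrix(c(3619.716, 3544.737, 3327.372,
                     3556.456, 3453.467, 3830.774),
                   nrow = 2, byrow = TRUE)

grand <- mean(cell_means)
row_eff <- rowMeans(cell_means) - grand  # position effects
col_eff <- colMeans(cell_means) - grand  # experience effects
# Interaction effects: cell mean - row mean - column mean + grand mean.
int_eff <- sweep(sweep(cell_means - grand, 1, row_eff), 2, col_eff)

ss <- c(
  position    = n_cell * ncol(cell_means) * sum(row_eff^2),
  exp         = n_cell * nrow(cell_means) * sum(col_eff^2),
  interaction = n_cell * sum(int_eff^2),
  # Within-cell variation: (n - 1) * variance, summed over the six cells.
  residual    = (n_cell - 1) * sum(cell_sds^2)
)
print(signif(ss, 4))
```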

Step 5: Conclusion
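For the position:exp interaction, p-value < 2e-16 < α = 0.05. Therefore, we reject H0.
Conclusion: There is enough evidence to conclude that the interaction between position and
experience is significant. This means that the effect of years of experience on salary depends
on the position, and vice versa. The main effects of position and experience are also significant
(both p-values < 2e-16), but because the interaction is significant, they should be interpreted
together with the interaction plot.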
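Steps 4 and 5 can be retraced by hand. Each F value is the effect's mean square divided by the residual mean square, and the p-value is the upper-tail probability of the F distribution with the matching degrees of freedom. The sketch below plugs in the rounded mean squares printed in the table, so the F values differ slightly from R's (for example, about 751.7 instead of 751.36 for position).

```r
# Mean squares and degrees of freedom as printed in the ANOVA table.
ms <- c(position = 9.516e9, exp = 8.341e8, interaction = 6.760e8)
df_effect <- c(position = 1, exp = 2, interaction = 2)
ms_residual <- 1.266e7
df_residual <- 114

# F = (mean square of the effect) / (residual mean square).
f_stat <- ms / ms_residual
# p-value = P(F > observed F) under H0.
p_value <- pf(f_stat, df_effect, df_residual, lower.tail = FALSE)

alpha <- 0.05
print(data.frame(F = round(f_stat, 2), p = signif(p_value, 3),
                 reject_H0 = p_value < alpha))
```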

Question 5. Draw an interaction plot and interpret the plot.

We use the following function in RStudio to draw the interaction plot:


interaction.plot(x.factor = position, trace.factor = exp, response = salary,
                 fun = mean, type = "b", legend = TRUE,
                 xlab = "position", ylab = "salary",
                 col = c("green", "red", "blue"), main = "Interaction plot")

It can be seen from the graph that the three lines are not parallel, which indicates a significant
interaction between position and experience. As a result, the influence of position on mean
salary depends on experience. The effect of experience is much weaker in the inside group than
in the outside group: the inside mean salaries range only from $55,031.35 to $57,422.45, whereas
the outside means range from $64,607.90 to $81,628.50. The inside position also has a lower mean
salary than the outside position at every level of experience. In the outside sales position
group, people with medium experience have the highest mean salary, followed by those with high
and low experience, respectively. In the inside position, people with high experience have the
highest mean salary, followed by those with medium and low experience.
Question 6. Discuss the credibility of the interpretations and conclusions of question 4.
Is there anything we should be concerned about? Explain.

Several points about the credibility of the interpretations and conclusions in Question 4 are
worth discussing. We compared the p-value with the level of significance (α = 0.05) and
concluded that there is an interaction between position and experience. The significance level
means that the probability of a type I error, that is, rejecting the null hypothesis when it is in
fact true, is 5 percent. In other words, if there were truly no interaction between position and
experience, the test would correctly fail to reject the null hypothesis 95% of the time. Because
the observed p-value (< 2e-16) is far below α, the evidence for the interaction is very strong.
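The meaning of α can be illustrated with a small simulation on synthetic data (not the survey data): generate many hypothetical data sets with the same balanced 2 x 3 design but with no interaction at all, run the two-way ANOVA on each, and count how often the interaction test wrongly rejects H0. The salary level, position effect and standard deviation below are made-up values chosen only to be on a similar scale to the survey.

```r
set.seed(2018)
n_sims <- 2000
alpha <- 0.05

# Balanced 2 x 3 design with 20 observations per cell, as in the survey.
position <- factor(rep(c("Inside", "Outside"), each = 60))
experience <- factor(rep(rep(c("Low", "Medium", "High"), each = 20), times = 2))

reject <- replicate(n_sims, {
  # Hypothetical salaries: a position effect but NO interaction, so H0 is true.
  salary <- 60000 + 15000 * (position == "Outside") +
    rnorm(length(position), mean = 0, sd = 3500)
  tab <- anova(lm(salary ~ position * experience))
  tab["position:experience", "Pr(>F)"] < alpha
})

# Proportion of simulated data sets in which H0 is (wrongly) rejected.
type1_rate <- mean(reject)
print(type1_rate)
```

The rejection rate should come out close to 0.05, matching the chosen α.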

There are also some concerns in this study. Two-way ANOVA is not a perfect test; it has
limitations and can give misleading results under certain circumstances. In practical
applications, it is difficult to obtain a balanced design with equal sample sizes in every group,
although this study is balanced, with 20 observations per group. Another limitation of two-way
ANOVA is that it assumes that the groups have the same, or very similar, standard deviations.
The greater the difference in standard deviations between groups, the greater the chance that
the conclusion of the test is inaccurate. As with the normality assumption, this is not a serious
problem as long as the standard deviations are not enormously different and the sample sizes of
the groups are approximately equal. Two-way ANOVA also assumes that the data in each group are
normally distributed, although the test can still be carried out if this is not exactly the case.
If the sample size is small, it is important to test for a possible violation of the normality
assumption, as it can affect the result. Moreover, in this case, the data may be affected by which
workers happened to be at work on the day the survey was conducted. However, since all the
assumptions of the two-way ANOVA appear to be met here, the result should be reliable.
Not only that, many other factors affect salaries, for example employees' productivity,
education and bargaining power. Because these factors are not included in the model, they may
make the results differ from reality.
PEER EVALUATION FORM

Name                    Contribution (100%)    Signature

Nguyen Thi Thuy Ngan 100%

Le Nguyen Cam Nhung 100%

Nguyen Thuy Duong 100%

Vu Hai Anh 100%

Dieu Anh 100%

Nguyen Thi Thao Ngan 100%

Hoang Duc Thanh 100%
