
Dr Kwaku Ohene-Asare & Mr. Abeeku E. Edu

Website: https://sakai.ug.edu.gh/portal

Email: kohene-asare@ug.edu.gh; asedu@ug.edu.gh

The University of Ghana Business School, Dept of OMIS

Feb-Apr, 2016

Defeat is not bitter unless you swallow it. - Joe Clark


Welcome

shadows of multivariate data analysis

"A little learning is a dang'rous thing; Drink deep, or

taste not the Pierian spring." Alexander Pope


Announcements

Lectures: cut across UG, MBA, EMBA, WMBA

Office:

Surgery hours: By appointment

Website: sakai, https://sites.google.com/site/oasare/

Grading Policy

Group assignments & IAs 10%

Class Participation 5%

Total 50%

Some defeat themselves. No!!! Be positive.

Let me know what is & what isn't working for you

Put in the time, effort & energy! You can pass!!!

Anywhere is a walking distance if you have time

Session 1 Overview

Definition

Data is a collection of observations

This session seeks to explain the difference between categorical and numerical data, distinguish among nominal, ordinal, interval and ratio scales of measurement and provide examples for each.

Session 1 Outline

Categorical/Qualitative Data

Numerical/Quantitative Data

Scales of measurement


Reading List

Chapters 1 - 3 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning

Pages 40 - 45 of Lane, D. (2003). Online Statistics Education: A Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA: Association for the Advancement of Computing in Education (AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001

Data (1)

data must be obtained. The data type and the scale of

measurement help relate the type of statistics the

analyst can use to examine the data.


Data (2)

information contained in the data & indicates the most

appropriate data summarization & statistical analyses


Categorical Data

categories

Selecting a bad category can distort the research outcome

2 levels of measurement exist under categorical data

Nominal scale: when the data are labels or names used to identify an attribute, the scale of measurement is considered a nominal scale

There is no intrinsic order

In a dataset, males could be coded as 0, females as 1. Here, the scale of measurement is still nominal even though the data appear as numeric values. Why?

Marital status could be coded as D if divorced, M if married, S if single, & W if widowed. Order is meaningless.

the properties of nominal data & the order or rank or rating of

the data is meaningful

Ranking could be done in ascending or descending order, e.g. attitudes on a Likert scale

Your class rank in school (can be coded): excellent, good & then poor

Responses to questions coded on a scale of 1 to 5 as strongly dislike, dislike, neutral, like & strongly like

Here, what does the rating of 5 indicate?

Levels of Measurement


of ordinal data & the interval between values is expressed in terms of a fixed unit of measure.

Scores on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. Always numeric.

the same difference as between 90 & 80

3 students with SAT math scores of 620, 550, & 470 can be ranked or ordered in terms of best performance to poorest performance.

take on any value within a finite or infinite interval.

They are numeric variables with an absolute (non-arbitrary) 0. You can count, order and measure

number of employees, age

distance, height, weight

time needed to run a mile
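The four scales of measurement can be sketched in R as follows. This is only an illustration: the variable names and the distance values are made up for the example, while the 0/1 gender coding, the 1 to 5 rating labels and the SAT scores come from the slides.

```r
# Nominal: labels only; the 0/1 codes carry no order or magnitude
gender <- factor(c(0, 1, 1, 0), levels = c(0, 1), labels = c("male", "female"))

# Ordinal: ordered categories; ranking is meaningful, distances are not
attitude <- factor(c(5, 2, 4, 3),
                   levels = 1:5,
                   labels = c("strongly dislike", "dislike", "neutral",
                              "like", "strongly like"),
                   ordered = TRUE)

# Interval: differences are meaningful (SAT math scores from the slide)
sat <- c(620, 550, 470)

# Ratio: absolute zero, so ratios are meaningful (hypothetical distances in km)
distance_km <- c(2, 4, 8)

nlevels(gender)                  # number of categories
attitude[1] > attitude[2]        # comparisons are allowed for ordered factors
sat[1] - sat[2]                  # a difference of 70 points
distance_km[3] / distance_km[1]  # 4 times as far
```

Storing nominal data as an unordered factor means R will refuse order comparisons on it, which mirrors the point that coding males as 0 and females as 1 does not make the data numeric.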


Class Exercise 1

Identify the type of data & measurement scale described in each of the following examples.

An opinion poll was taken asking people which party they would vote for in a general election.

A market researcher stops you in Spintex Road and asks you to rate between 1 (disagree strongly) and 5 (agree strongly) your response to opinions presented to you.

Incomes of Ghanaian musicians.

Class Exercise 2

2003) asked 46 questions about subscriber characteristics and interests. State whether each of the following questions provided categorical or numerical data

numerical & ratio

Are you male or female?

categorical & nominal

When did you first start reading Daily Graphic? High school, college, early career, midcareer, late career, or retirement?

categorical & ordinal

How long have you been in your job or position?

numerical & ratio

BMW, Benz, Toyota, Hyundai, Ford etc.

categorical & nominal

or B?

categorical & ordinal

Foreign Affairs

The following questions were asked. Comment on whether each question provides categorical or quantitative data and indicate the level of measurement.

Where do you purchase books? Three options were listed:

Bookstore, Internet, and Book Club.

Do you own a car?

For foreign trips taken in the past three years, what was your

destination? Seven international destinations were listed.


Session 2 Overview

research, we usually make a tentative assumption about the

whole. But making a statement is one thing. Testing its

authenticity is another thing.

This session examines the tools needed to hypothesize &

conclude given there is only 1 sample.


Session 2 Outline

t-test statistic, critical value, p-value

level of significance

statistical & practical significance

Reading List

Chapter 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning

Pages 344-379 of Lane, D. (2003). Online Statistics Education: A Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA: Association for the Advancement of Computing in Education (AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001

What is a hypothesis?

such as the average, variance, standard deviation

A hypothesis can be a statement or a question:

Income has a positive effect on consumption

Are socially responsible firms more profitable?

1 Write null $H_0$ & alternative $H_a$ hypotheses

3 Find the critical value (i.e. tabulated value) from the table using the degrees of freedom (d.f.)

5 Compare the t calculated with the t tabulated. Reject $H_0$ if t calculated > t tabulated

6 Compare p value vs. $\alpha$. Reject $H_0$ if $p < \alpha$

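The null hypothesis $H_0$ is a statement put forward either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.

e.g. in a clinical trial of a new drug, $H_0$: there is no difference between the new drug and the current drug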

Ha

Ha

H 0.


Ha

alternative format (

).

.

And you have to write the

Ha

H0


Tail of test

The form of $H_a$ determines the tail of the test:

$<$, $>$: better than, worse than, at least, at most, not at least, not younger than, above.

$\neq$: not equal to, not different from, differs from, the same as, does not vary from, on, of, was.

Lower Tail Test

$H_0: \mu \geq \mu_H$

$H_a: \mu < \mu_H$

Upper Tail Test

$H_0: \mu \leq \mu_H$

$H_a: \mu > \mu_H$

Nescafe Ghana claims that since the population mean filling weight is at least 3 lb/can, consumers' rights are protected.

$H_0: \mu \geq 3$

$H_a: \mu < 3$

MaxFlight uses a high-technology manufacturing process to produce golf balls with a mean driving distance of 295 yards.

___________

___________

According to the CEO, the new BMW car can run more than 24

miles per gallon

___________

___________


phone bill average not at most Ghc52 per month.

$H_a: \mu > 52$, average is over Ghc52 per month

___________

___________


Suppose that we want to test the hypothesis that the climate has changed since industrialization. If the mean temperature throughout history is not as improved as 50 degrees, what are the null & alternative hypotheses?

$H_0: \mu \leq 50$

$H_a: \mu > 50$

Session 3 Overview

The session demonstrates the process of hypothesizing, testing &

concluding given a single sample.


Session 3 Outline

t-test statistic

critical value, p-value

level of significance

Reading List

Chap 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning

Chap 16 of Buglear, John (2005). Quantitative Methods for Business: The A-Z of QM

Chap 9 of Newbold, P., Carlson, W. & Thorne, B. (2013). Statistics for Business and Economics, 8/E, Pearson

Examples of test statistics:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Use z if sample is large ($n \geq 30$) or t if sample is small ($n < 30$).

Determine the critical value: $t_{n-1,\alpha}$, with $df = n - 1$

p-Value Approach

Use the t calculated or z calculated to compute the p value.

Reject $H_0$ if p value $< \alpha$

The critical value is also called t tabulated.

For z, use only $\alpha$

Reject $H_0$ if t cal > t tab

Under t, use $t_{n-1,\alpha}$ with $df = n - 1$

Z or t? Decision Rule

Ho ,

This implies rejecting $H_0$ when the p value $< \alpha$

t Table

cum. prob   .50    .75    .80    .85    .90    .95    .975   .99    .995   .999    .9995
one-tail    0.50   0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005  0.001   0.0005
two-tails   1.00   0.50   0.40   0.30   0.20   0.10   0.05   0.02   0.01   0.002   0.001
df
1           0.000  1.000  1.376  1.963  3.078  6.314  12.71  31.82  63.66  318.31  636.62
2           0.000  0.816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  22.327  31.599
3           0.000  0.765  0.978  1.250  1.638  2.353  3.182  4.541  5.841  10.215  12.924
4           0.000  0.741  0.941  1.190  1.533  2.132  2.776  3.747  4.604  7.173   8.610
5           0.000  0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  5.893   6.869
6           0.000  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208   5.959
7           0.000  0.711  0.896  1.119  1.415  1.895  2.365  2.998  3.499  4.785   5.408
8           0.000  0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  4.501   5.041
9           0.000  0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297   4.781
10          0.000  0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  4.144   4.587
11          0.000  0.697  0.876  1.088  1.363  1.796  2.201  2.718  3.106  4.025   4.437
12          0.000  0.695  0.873  1.083  1.356  1.782  2.179  2.681  3.055  3.930   4.318
13          0.000  0.694  0.870  1.079  1.350  1.771  2.160  2.650  3.012  3.852   4.221
14          0.000  0.692  0.868  1.076  1.345  1.761  2.145  2.624  2.977  3.787   4.140
15          0.000  0.691  0.866  1.074  1.341  1.753  2.131  2.602  2.947  3.733   4.073
16          0.000  0.690  0.865  1.071  1.337  1.746  2.120  2.583  2.921  3.686   4.015
17          0.000  0.689  0.863  1.069  1.333  1.740  2.110  2.567  2.898  3.646   3.965
18          0.000  0.688  0.862  1.067  1.330  1.734  2.101  2.552  2.878  3.610   3.922
19          0.000  0.688  0.861  1.066  1.328  1.729  2.093  2.539  2.861  3.579   3.883
20          0.000  0.687  0.860  1.064  1.325  1.725  2.086  2.528  2.845  3.552   3.850
21          0.000  0.686  0.859  1.063  1.323  1.721  2.080  2.518  2.831  3.527   3.819
22          0.000  0.686  0.858  1.061  1.321  1.717  2.074  2.508  2.819  3.505   3.792
23          0.000  0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.485   3.768
24          0.000  0.685  0.857  1.059  1.318  1.711  2.064  2.492  2.797  3.467   3.745
25          0.000  0.684  0.856  1.058  1.316  1.708  2.060  2.485  2.787  3.450   3.725
26          0.000  0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.435   3.707
27          0.000  0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.421   3.690
28          0.000  0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.408   3.674
29          0.000  0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396   3.659
30          0.000  0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.385   3.646
40          0.000  0.681  0.851  1.050  1.303  1.684  2.021  2.423  2.704  3.307   3.551
60          0.000  0.679  0.848  1.045  1.296  1.671  2.000  2.390  2.660  3.232   3.460
80          0.000  0.678  0.846  1.043  1.292  1.664  1.990  2.374  2.639  3.195   3.416
100         0.000  0.677  0.845  1.042  1.290  1.660  1.984  2.364  2.626  3.174   3.390
1000        0.000  0.675  0.842  1.037  1.282  1.646  1.962  2.330  2.581  3.098   3.300
z           0.000  0.674  0.842  1.036  1.282  1.645  1.960  2.326  2.576  3.090   3.291
Confidence
Level       0%     50%    60%    70%    80%    90%    95%    98%    99%    99.8%   99.9%

And use the df to find the t tabulated.

E.g., if t calculated = 2.66, to get the p value, scan the body of the t-table, along the row for your df, for 2.66

Always use the


p value

of 0.005 (read on

Do the same for the 1% & 10% for either two-tail or one-tail.
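The table lookups described above can also be done with R's built-in distribution functions. The sketch below is a minimal illustration, not part of the original slides: `qt()` and `qnorm()` return critical values and `pt()` returns tail probabilities. The df of 20 used for the p value is an assumed value chosen only for the example.

```r
alpha <- 0.05

# Two-tail critical value for df = 25 (the 0.05 two-tails column of the table)
t_crit_two <- qt(1 - alpha / 2, df = 25)

# One-tail critical value for df = 9 (the 0.05 one-tail column)
t_crit_one <- qt(1 - alpha, df = 9)

# z critical value, i.e. the z row at the bottom of the table
z_crit_two <- qnorm(1 - alpha / 2)

# Two-tail p value for t calculated = 2.66; df = 20 is an assumption
p_two <- 2 * pt(-abs(2.66), df = 20)

round(c(t_crit_two, t_crit_one, z_crit_two), 3)  # compare with 2.060, 1.833, 1.960
p_two                                            # lies between 0.01 and 0.02
```

The software gives an exact p value, whereas the table only brackets it between two column headings.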


Applied example 1

Suppose that you are thinking of taking over an SME. The current owner claims the weekly turnover of each existing SME is not different from GHc5000, i.e. not below this figure. You examine the books of 26 SMEs chosen at random and find a mean weekly turnover of GHc4900 with standard deviation GHc280.

$H_0: \mu = 5000$

$H_a: \mu \neq 5000$

and $\alpha = 0.05$,

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

$t = \frac{4900 - 5000}{280 / \sqrt{26}} = -1.82$

This is the t calculated. Now, find the t tabulated (t-critical) & compare

Two-tail (see $H_a$). Read df = n - 1 & $\alpha$ to get t tabulated:

$t_{n-1,\alpha} = t_{25,0.05} = 2.060$

As the t calculated of |-1.82| is < the t tabulated of 2.060, we do not reject $H_0$ at 5% & conclude that weekly turnover is GHc5000.

For the p value, use t cal. = |-1.82| and search the t table.

In the df = 25 row, the figure falls between 1.708 & 2.060

Reading at the top, the two-tail p value is between 0.05 & 0.10, i.e. at most 0.1

CONCLUDE: As the p-value is > $\alpha$ = 0.05, we do not reject $H_0$
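Applied example 1 can be checked from its summary figures alone. The following is a minimal sketch using only the numbers in the example (mean 4900, hypothesised mean 5000, standard deviation 280, n = 26); the variable names are illustrative.

```r
x_bar <- 4900   # sample mean weekly turnover
mu0   <- 5000   # hypothesised mean under H0
s     <- 280    # sample standard deviation
n     <- 26     # number of SMEs sampled
alpha <- 0.05

t_cal   <- (x_bar - mu0) / (s / sqrt(n))     # t calculated
t_tab   <- qt(1 - alpha / 2, df = n - 1)     # two-tail t tabulated
p_value <- 2 * pt(-abs(t_cal), df = n - 1)   # two-tail p value

round(c(t_cal = t_cal, t_tab = t_tab), 3)
p_value
abs(t_cal) > t_tab    # TRUE would mean "reject H0"
p_value < alpha       # same decision via the p value approach
```

Both approaches lead to the same decision, as they always do for the same test and significance level.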

Diwoeasem, a tomato grower, has developed a new variety of tomato. He claims that the average yield per plant is at least 4 kg of fruit. A gardening magazine tests this claim by growing some plants & measuring the yield, obtaining a standard deviation of 0.69 kg (see below). Do these data support Diwoeasem's claim? Formulate the hypotheses. Use both the p-value and critical value approaches.

3.6  4.2  3.8  2.7  4.0  4.8  2.7  3.9  4.2  4.5

$H_0: \mu \geq 4$ kg

$H_a: \mu < 4$ kg

Significance level $\alpha$ = 5% = 0.05

Which test is appropriate... Why?

Is this a one tail or a two tail test?

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ and $\alpha = 0.05$

$t = \frac{3.84 - 4}{0.69 / \sqrt{10}} = -0.73$

This is the t calculated or t test statistic

$t_{n-1,\alpha} = t_{9,0.05} = 1.833$ is the t tabulated

Since the t statistic of |-0.73| is < t critical of 1.833, we do not reject $H_0$

Data give no evidence against the claim that the true average yield is at least 4 kg.

Gardening magazine has no grounds to dispute Diwoeasem's claim.

Difference of 4 - 3.84 = 0.16 kg could be just by chance.

To find the p value, use t cal. = |-0.73| and search the t table.

In the df = 9 row, it falls between 0.703 & 0.883. What tail?

Reading at the top, the one-tail p value is between 0.20 & 0.25, roughly 0.25.

CONCLUDE: As p value $\approx 0.25 > \alpha = 0.05$, we do not reject $H_0$ at the 5% sig. level.
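The tomato example can be run directly on the ten yields listed above. This is a minimal sketch using base R's `t.test()`; the object names `yield` and `res` are illustrative, and `alternative = "less"` matches the lower-tail alternative $H_a: \mu < 4$.

```r
# Yields (kg per plant) from the slide
yield <- c(3.6, 4.2, 3.8, 2.7, 4.0, 4.8, 2.7, 3.9, 4.2, 4.5)

mean(yield)   # sample mean, 3.84 on the slide
sd(yield)     # sample standard deviation, about 0.69

# One-sample, lower-tail t test of H0: mu >= 4 against Ha: mu < 4
res <- t.test(yield, mu = 4, alternative = "less")

res$statistic            # t calculated, about -0.73
res$parameter            # degrees of freedom, n - 1 = 9
res$p.value              # exact one-tail p value
qt(1 - 0.05, df = 9)     # t tabulated, about 1.833
```

The exact p value falls inside the bracket read from the table, so the decision not to reject $H_0$ is unchanged.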

Practice Question 1

mean tenure for the CEO was not below 9 years. A survey of

companies reported in The Wall Street Journal found a sample

And(pg357,28)


validity of the claim made by the shareholders' group.

At $\alpha$ = 1%, what is your statistical & practical conclusion

the

using


Practice Question 2

A travel magazine wants to classify transatlantic gateway airports

using mean rating for the population of travelers. A scale with a

low score of 0 & a high score of 10 is used & airports with a

population mean rating above 7 will be designated as superior

airports. The magazine staff sampled 16 travelers at each airport. The sample for London's Heathrow Airport provided a mean rating of 7.25 with a standard deviation of 1.052. Should Heathrow be designated as a superior airport? ($\alpha$ = 10%).

Practice Question 3

According to the label on packets of popcorn there should be 25 g of popcorn in every packet. The standard deviation of the weight of popcorn per packet is known to be 2.2 g & the weights are normally distributed. The mean weight of popcorn in a random sample of 15 packets is 23.5 g. Test the hypothesis that the information on the label is valid using a 1% level of significance.

Session 4 Overview

We extend the 1 sample/population analysis to a 2-sample study, when the difference between the 2 population means is important.

For example, we may want to test for the effect of a customer training workshop on the sales of salespersons in a company or the impact of a reform in an industry. Policy prescriptions may be offered based on the findings.

Session 4 Outline

bivariate paired data

hypotheses on difference between 2 population means using independent samples

draw appropriate conclusions

Reading List

Chap 10 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning

Pages 344-379 of Lane, D. (2003). Online Statistics Education: A Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003 (pp. 1317-1320).

Chap 17 of Buglear, John (2005). Quantitative Methods for Business: The A-Z of QM

Chap 10 of Newbold, P., Carlson, W. & Thorne, B. (2013). Statistics for Business and Economics, 8/E, Pearson

Parametric test           Nonparametric
One-sample t-test         Binomial
Paired t-test (dept)      Wilcoxon signed-rank test
Paired t-test (dept)      McNemar's Chi-square test
Independent t-test        Mann-Whitney U or Wil. ranksum
Pearson's correlation     Spearman's corr (xy)
Pearson's correlation     Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)    Kruskal-Wallis test
Repeated meas. ANOVA      Friedman Test, Cochran Q

Paired/Dependent Samples

2 related/paired/matched/before & after samples

Population difference = $\mu_d$

Differences ($d$) normally distributed

Type of test    $H_0$                   $H_a$
Two-sided       $H_0: \mu_d = 0$        $H_a: \mu_d \neq 0$ (not equal)
One-sided       $H_0: \mu_d \leq 0$     $H_a: \mu_d > 0$ (greater than)
One-sided       $H_0: \mu_d \geq 0$     $H_a: \mu_d < 0$ (less than)

Find $t_{n-1,\alpha}$ = ?? i.e. t-tabulated

$t = \frac{\bar{d}}{S_d / \sqrt{n}}$

$S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$

where

$S_d$ = sample standard dev. of differences

$n$ = the sample size (number of pairs)

Decision Rule

Reject $H_0$ if p value $< \alpha$

Reject $H_0$ if t calculated > t tabulated

Reject $H_0$ if t statistic > critical value

If $H_0$ is rejected, there is a significant difference between 2 samples

Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:

$\bar{d} = \frac{\sum d_i}{n}$, $t = \frac{\bar{d}}{S_d / \sqrt{n}}$ & $S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$

$d$ = After - Before

$H_a: \mu_d \neq 0$, training is effective/positive

Salesperson   Before (1)   After (2)   Difference, $d_i$
C.B.           6            4           -2
T.F.          20            6          -14
M.H.           3            2           -1
R.K.           0            0            0
M.O.           4            0           -4
Sum                                    -21

$\bar{d} = \frac{\sum d_i}{n} = -4.2$

$S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = 5.67$


$t = \frac{\bar{d}}{S_d / \sqrt{n}}$

$t = \frac{-4.2}{5.67 / \sqrt{5}} = -1.66$ i.e. t-calculated

$t_{n-1,\alpha} = t_{5-1,0.05} = 2.776$ i.e. t-tabulated

Use $t = |-1.66|$

In the df = 4 row of the t table, the value falls between 1.533 & 2.132, so the two-tail p value is between 0.10 & 0.20

Since the t statistic of |-1.66| is < t critical of 2.776 (or p value > $\alpha$ = 0.05) we don't reject $H_0$ at the 5% significance level.

No significant difference between complaints before & after.

No evidence that the training worked. Any difference could be by chance.

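Summary of results

Has the training made a difference in the number of complaints?

$H_0: \mu_x - \mu_y = 0$

$H_1: \mu_x - \mu_y \neq 0$

$\alpha = .05$, $\bar{d} = -4.2$, d.f. = n - 1 = 4

Critical values: -2.776 & 2.776 ($\alpha/2$ in each tail). Reject $H_0$ if $t < -2.776$ or $t > 2.776$

Test Statistic: $t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-4.2}{5.67 / \sqrt{5}} = -1.66$ (t stat is not in the reject region)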

Arrange data the normal way in Excel, copy it, then read it into R. You can ignore the names of salespersons

complaint <- read.delim('clipboard') # load data copied from Excel

complaint # view the data. feel it

boxplot(complaint$after, complaint$before, col = "darkgreen") # can check boxplot of the data

t.test(complaint$after, complaint$before, paired = TRUE) # run the paired t test

Learn the independent t-test for independent samples on your own
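The same paired test can be reproduced without the clipboard by typing the five before/after complaint counts from the slide into vectors. This is a minimal sketch with illustrative vector names. It first computes $\bar{d}$, $S_d$ and t by hand, then compares them with `t.test()`.

```r
# Complaints per salesperson (C.B., T.F., M.H., R.K., M.O.) from the slide
before <- c(6, 20, 3, 0, 4)
after  <- c(4, 6, 2, 0, 0)

d     <- after - before          # differences, After - Before
n     <- length(d)               # number of pairs
d_bar <- mean(d)                 # mean difference, -4.2
s_d   <- sd(d)                   # standard deviation of differences, about 5.67
t_cal <- d_bar / (s_d / sqrt(n)) # t calculated, about -1.66

# Built-in paired t test; should agree with the hand calculation
res <- t.test(after, before, paired = TRUE)

c(d_bar = d_bar, s_d = s_d, t_cal = t_cal)
res$p.value                      # exact two-tail p value
qt(1 - 0.05 / 2, df = n - 1)     # t tabulated, about 2.776
```

The exact p value lies between 0.10 and 0.20, which is consistent with not rejecting $H_0$ at the 5% level.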

Practice Question 1

A new therapy has been devised which is supposed to lower blood pressure. The systolic blood pressure of 10 patients was taken before and after completing the course (see below). Does this therapy work? Use a significance level of 0.05.

Before   After   Difference
120      130      10
129      129       0
131      125      -6
136      128      -8
122      124       2
138      129      -9
139      130      -9
131      132       1
123      129       6
125      130       5

Practice Question 2

Tweaa is the VC of a large manufacturing company. He recently noticed an increase in absenteeism that he thinks is related to the general health of employees. Four years ago, in an attempt to improve the situation, he began a fitness program in which employees exercise during their lunch hour. To evaluate the program, he randomly sampled some participants & found the number of days each was absent. Below are the results. At 0.05 significance level, did the program reduce absenteeism?

a) State the null & alternative hypotheses.

b) What is the critical value of the test statistic?

c) What is the value of the test statistic?

d) Decide using the critical value approach.

e) What is the p-value?

g) Did the fitness program reduce absenteeism at the 5% significance level? Conclude practically.

Session 5 Overview

In some practical applications the normality assumption is not tenable, especially when we have a wide range of distributions of the parent population.

In such a case, we use nonparametric tests or distribution free tests.

In this session we study nonparametric tests for testing equality of means/medians of 2 population distributions

Session 5 Outline

Mann-Whitney-U test or Wilcoxon rank sum test

Draw appropriate conclusions

Reading List

Chap 19 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning

Pages 344-379 of Lane, D. (2003). Online Statistics Education: A Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003 (pp. 1317-1320).

Chap 10 of Newbold, P., Carlson, W. & Thorne, B. (2013). Statistics for Business and Economics, 8/E, Pearson


testing the equality of means in 2 independent / unpaired

samples. Mann-Whitney-Wilcoxon (MWW) test.


The main idea is to test whether 2 samples come from the same

population (i.e., if 2 populations have the same shape) by

comparing the ranks or ordinal values of the observations.

Some investigators interpret this test as comparing the medians

between the 2 populations.


$H_0: m_1 - m_2 = 0$

$H_a: m_1 - m_2 \neq 0$

If median (m) of sample 1 is > sample 2, $H_a$ becomes

$H_a: m_1 - m_2 > 0$ or $H_a: m_1 > m_2$

$H_0$: The 2 populations are the same

$H_a$: The 2 populations are different

Example

$H_0$: foreign banks do not have higher efficiency scores than domestic banks.

$H_a$: foreign banks have higher efficiency scores than domestic banks.

Median class size for Math is larger than median class size for English. Write $H_0$ & $H_a$ ____

$H_0$: Median$_M$ $\leq$ Median$_E$ (Math $\leq$ English median)

$H_A$: Median$_M$ > Median$_E$ (Math median is larger)

Median class size for Math is not at least that of the median class size for English. Write $H_0$ and $H_a$ ____

$H_0$: Median$_M$ $\geq$ Median$_E$

Korle Bu hospital is undertaking a clinical trial designed to investigate the effectiveness of a new drug to reduce symptoms of asthma in children. Participants are asked to record the number of episodes of shortness of breath (dyspnea) over a 1 week period following receipt of the assigned treatment. The non-normally distributed data are shown below. Is there a difference in the number of episodes of shortness of breath over the 1 week period in participants receiving the new drug as compared to those receiving the placebo?

have more episodes of shortness of breath, i.e. suffer more from dyspnea

BUT, is this statistically significant?

Plot

Note: $n_1$ may be > $n_2$.

($n_1 = n_2 = 5$), Hypotheses??

$H_0: m_1 - m_2 = 0$ ; $H_a: m_1 - m_2 \neq 0$

(n = 10)

Assign ranks from 1 to 10.

Also, track the group assignments in the total sample.


$R_1$ = sum of the ranks in group 1 (placebo) = 37

$R_2$ = sum of the ranks in group 2 (new drug) = 18

Check if: $R_1 + R_2 = \frac{n(n+1)}{2}$

$37 + 18 = (10 \times 11)/2 = 55$

The Mann-Whitney U statistics:

$U_1 = R_1 - \frac{n_1(n_1+1)}{2}$

$U_2 = R_2 - \frac{n_2(n_2+1)}{2}$

Details of U

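$R_1$ takes its smallest value, $\frac{n_1(n_1+1)}{2}$, if all observations in sample 1 are smaller than all observations in sample 2.

$R_2$ takes its smallest value, $\frac{n_2(n_2+1)}{2}$, if all observations in sample 2 are smaller than all observations in sample 1.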

Results


Find $n_1$ in group 1 along the top of the chart.

Find $n_2$ in group 2 (largest group) along the side of the chart.

Read off $U_{n_1,n_2,\alpha}$, the U tab

Two-sided test: if $U_{min} \leq$ tabulated U, reject $H_0$

One-sided test ($H_A: m_1 < m_2$): reject $H_0$ if $U_1$ is too small, i.e. if $U_1 \leq U_{n_1,n_2,\alpha}$

One-sided test ($H_A: m_1 > m_2$): reject $H_0$ if $U_2$ is too small, i.e. if $U_2 \leq U_{n_1,n_2,\alpha}$

Back to example

$U_1 = R_1 - \frac{n_1(n_1+1)}{2} = 37 - \frac{5(5+1)}{2} = 22$

$U_2 = R_2 - \frac{n_2(n_2+1)}{2} = 18 - \frac{5(5+1)}{2} = 3$ (smaller)

$U_1 + U_2 = n_1 n_2$

$22 + 3 = 5 \times 5 = 25$

Decision

Reject $H_0$ if $U_{min} \leq U_{n_1,n_2,\alpha/2}$

$U_{min} = 3 > U_{5,5,0.05} = 2$

We can't reject $H_0$ because 3 > 2.
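The hand calculation above can be sketched as a small R function. The function name `mann_whitney_u` and its arguments are illustrative and not from any package. The inputs are the rank sums and sample sizes from the example, and the tabulated value of 2 is the one quoted on the slide.

```r
# Compute U1, U2 and Umin from the rank sums of two independent samples
mann_whitney_u <- function(r1, r2, n1, n2) {
  n <- n1 + n2
  # Sanity check from the slide: R1 + R2 must equal n(n + 1) / 2
  stopifnot(r1 + r2 == n * (n + 1) / 2)
  u1 <- r1 - n1 * (n1 + 1) / 2
  u2 <- r2 - n2 * (n2 + 1) / 2
  c(U1 = u1, U2 = u2, Umin = min(u1, u2))
}

u <- mann_whitney_u(r1 = 37, r2 = 18, n1 = 5, n2 = 5)
u                                   # U1 = 22, U2 = 3, Umin = 3
u[["U1"]] + u[["U2"]] == 5 * 5      # check: U1 + U2 = n1 * n2

u_tab  <- 2                         # tabulated U for n1 = n2 = 5, two-sided 5%
reject <- u[["Umin"]] <= u_tab      # decision rule: reject H0 if Umin <= U tab
reject                              # FALSE, so H0 is not rejected
```

With the raw observations available, `wilcox.test()` reports the same U1 as its W statistic.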

Conclusion

At $\alpha = 0.05$, we cannot reject that the 2 populations of episodes of shortness of breath are equal.

That is, no significant difference in the medians. Our example is unique.

On the surface, sample data suggest a difference, but the sample sizes are small.

1st: Arrange data long (vertical, with 1,0 in the next column for the group)

wm <- read.delim('clipboard') ## load data

wm ## view data

boxplot(drug ~ group, data = wm, col = "green") ## quick boxplot by group

wilcox.test(drug ~ group, conf.int = TRUE, data = wm) ## compute mann-whitney test (independent samples is the default)

## result: W = 22, p-value = 0.05855; alternative hypothesis: true location shift is not equal to 0.

## So there is no significant difference between the two samples. We can't reject H0 as p-value > alpha = 0.05

wilcox.test(wm$drug[wm$group == 1], wm$drug[wm$group == 0], alternative = "two.sided", conf.int = TRUE, paired = TRUE) ## compute Wilcoxon signed rank test with continuity correction (for paired data only)

Practice Assignment 1

Random samples of starting monthly salaries for graduates from 2 Ghanaian universities are below (GHc1000s):

UG KNUST

30

28.5

35

38

29

30.5

37.5

26

32

37

40

29

33

32


the median starting salary for UG graduates is higher than the

median starting salary for KNUST graduates? Solve by hand

&/or any software (e.g. R)


Session 6 Overview

difference exists among them.

This is achieved via ANOVA (dependent and independent) and their nonparametric analogues, the Friedman test and the Kruskal-Wallis test.

Session 6 outline

repeated ANOVA & independent ANOVA

Within & Between variations, F statistic,

ANOVA in Excel & R

Reading List

Chap 13 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).

Statistics for Business and Economics (11th ed.). South-Western Cengage

Learning

Pages 493-549 of Lane, D. (2003). Online Statistics Education: A

Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),

Proceedings of World Conference on Educational Multimedia, Hypermedia

and Telecommunications 2003 (pp. 1317-1320).

Chap 15 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics for

Business and Economics, 8/E, Pearson


T-Test


ANOVA


MANOVA


Parametric test           Nonparametric
One-sample t-test         Binomial
Paired t-test (dept)      Wilcoxon signed-rank test
Paired t-test (dept)      McNemar's Chi-square test
Independent t-test        Mann-Whitney U or Wilcoxon rank-sum
Pearson's correlation     Spearman's corr (xy)
Pearson's correlation     Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)    Kruskal-Wallis test
Repeated meas. ANOVA      Friedman test, Cochran Q

Normality

Continuous or scale or ratio data

Categorical groupings

Independent variables across groups

Independent random sampling

Homogeneity of variance for dependent variable


ANOVA

One-way ANOVA: One factor e.g. smoking status (never, former, current)

Two-way ANOVA: Two factors e.g. gender (M/F) & smoking

status (never, former, current)

Three-way ANOVA: Three factors e.g. gender, smoking &

beer consumption


One-way repeated (correlated) measures: single group on which

you have measured something a few times.

It's the parametric analogue of the Friedman test (the nonparametric test for dependent samples)

For example, in research class, I give a test at the start of a

topic, at the end of the topic and at the end of the subject.

I can use a one-way dependent measures ANOVA to see if

student test performance changed over time


In an independent groups test, the subjects in the groups are different people.

In a dependent/repeated measures case, the same subjects are being tested under different conditions. They are the same people.

(It doesn't have to be people; they could be flowers, car engines, firms or even barrels of beer.)

Compare 5 subjects, each tested with 4 drugs. Drug is the repeated variable here.

means of three or more groups

Examples:

Expected mileage for 5 brands of tires


ANOVA (hypotheses)

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_K$

All population means are equal

i.e., no variation in means between groups

$H_A: \mu_i \ne \mu_j$ for at least one pair $i, j$

i.e., there is variation between groups

Does not mean


ANOVA (3)


ANOVA (4)


ANOVA chart


SS = Sum of Squares

df = degrees of freedom

MS = Mean Squares

n = sum of the sample sizes

K = number of groups


SST

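$SST = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$

$x_{ij}$ = $j$th observation from group $i$

$\bar{x} = \frac{\sum_{i=1}^{K} n_i \bar{x}_i}{N}$ = overall sample mean ($N = n$ = sum of the sample sizes)

$SST = (x_{11} - \bar{x})^2 + (x_{12} - \bar{x})^2 + \dots + (x_{Kn_K} - \bar{x})^2$

$MST = \frac{SST}{n-1}$

Mean Square Total = SST/df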

SSW

$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

$\bar{x}_i$ = sample mean from group $i$

$x_{ij}$ = $j$th observation in group $i$

$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \dots + (x_{Kn_K} - \bar{x}_K)^2$

$MSW = \frac{SSW}{n-K}$

Mean Square Within = SSW/df

SSG

$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$

$\bar{x}_i$ = sample mean from group $i$

$\bar{x}$ = grand mean (mean of all data values)

$SSG = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_K(\bar{x}_K - \bar{x})^2$

$MSG = \frac{SSG}{K-1}$

Mean Square Between Groups = SSG/df

and the within estimate of variance

The ratio must always be positive

$df_1 = K - 1$

$df_2 = n - K$

will typically be large

Reject $H_0$ if $F > F_{K-1,\,n-K,\,\alpha}$

Decision: ANOVA


Example of ANOVA

different MBA groups:

1


i. Define null and alternative hypotheses.

ii. Find the critical value of the test statistic.

iii. Find the value of the test statistic.

iv. What is your decision?

v. Is there a statistically significant difference between the mean salaries of these MBA groups? Prove.

vii. If so, which group earns the least? Conclude practically.

ANOVA Table


ANSWER

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_a: \mu_i \ne \mu_j$ for at least one pair $i, j = 1, 2, 3$

$F_{critical} = F_{K-1,\,N-K,\,\alpha}$

$K$ = number of groups, $df_1 = K - 1$, row

$N$ = total sample from all groups, $df_2 = N - K$, column

$F_{tab} = F_{K-1,\,N-K,\,\alpha} = F_{3-1,\,9-3,\,0.05} = F_{2,6,0.05}$

$F_{tab} = 5.14$

ANOVA Table


Recall data

1


First: Find $\bar{x}_i$,

$\bar{x}_1 = \frac{6}{3} = 2$

$\bar{x}_2 = \frac{15}{3} = 5$

$\bar{x}_3 = \frac{24}{3} = 8$

Grand mean:

$\bar{x} = \frac{\sum (x_{ij})}{N} = \frac{1+2+3+4+5+6+7+8+9}{9} = 5$

$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$

$SSG = 3(2-5)^2 + 3(5-5)^2 + 3(8-5)^2 = 54$

$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

$SSW_1 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2$

$SSW_2 = (4-5)^2 + (5-5)^2 + (6-5)^2 = 2$

$SSW_3 = (7-8)^2 + (8-8)^2 + (9-8)^2 = 2$

$SSW = 2 + 2 + 2$

$SSW = 6$

      SS   df   MS   F
BG    54
WG

$MSG = \frac{SSG}{K-1}$ and $MSW = \frac{SSW}{N-K}$

$MSG = \frac{54}{2} = 27$

$MSW = \frac{6}{6} = 1$

      SS   df   MS   F
BG    54        27
WG

$F = \frac{MSG}{MSW}$

$F = \frac{27}{1} = 27$

      SS   df   MS   F
BG    54        27   27
WG

Decision Rule: Reject $H_0$ at 5% since $F = 27 > F_{tab} = 5.14$

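v. Is there a statistically significant difference between the mean salaries of these MBA groups? Prove.

YES. Since we rejected H0, there is a difference between at least two of the group means.

vii. If so, which group earns the least? Conclude practically.

RMBA, with the lowest mean of 2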
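The whole worked example can be checked by computing the sums of squares directly in R. This is a minimal sketch, assuming the three groups hold the values 1 to 3, 4 to 6 and 7 to 9 (as the SSW calculations imply); the group labels `g1`, `g2`, `g3` and all variable names are illustrative.

```r
# Data implied by the worked example: three groups of three values
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
g <- factor(rep(c("g1", "g2", "g3"), each = 3))  # hypothetical group labels

K <- nlevels(g)   # number of groups
N <- length(x)    # total sample size

group_means <- tapply(x, g, mean)
group_sizes <- tapply(x, g, length)

SSG <- sum(group_sizes * (group_means - mean(x))^2)  # between-group variation
SSW <- sum((x - ave(x, g))^2)                        # within-group variation

MSG <- SSG / (K - 1)
MSW <- SSW / (N - K)
F_stat <- MSG / MSW

# Critical value at the 5% level, from the F distribution instead of a table
F_tab <- qf(0.95, df1 = K - 1, df2 = N - K)
reject_H0 <- F_stat > F_tab
```

Using `qf()` replaces the table lookup; it should agree with the tabulated 5.14 up to rounding.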


Data Analysis > ANOVA: Single Factor

For dependent/repeated measures samples:

Data Analysis > ANOVA: Two-Factor Without Replication

Example 2: Attempt


Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

$\bar{x}_1 = 249.2$, $n_1 = 5$

$\bar{x}_2 = 226.0$, $n_2 = 5$

$\bar{x}_3 = 205.8$, $n_3 = 5$

$\bar{x} = 227.0$, $n = 15$, $K = 3$

$SSG = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4$

$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \dots + (204 - 205.8)^2 = 1119.6$

MSG = 4716.4 / (3-1) = 2358.2

MSW = 1119.6 / (15-3) = 93.3

$F = \frac{2358.2}{93.3} = 25.275$


EXCEL: data | data analysis | ANOVA: single factor

SUMMARY

Groups    Count   Sum    Average   Variance
Club 1    5       1246   249.2     108.2
Club 2    5       1130   226       77.5
Club 3    5       1029   205.8     94.2

ANOVA

Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14

(where the 1st group's numbers come first, followed by the next & so on).

This is for independent samples.

## The 2nd column has group names, with the 1st group's names first.

anovaclub = read.delim("clipboard")  ## load the data copied from Excel

str(anovaclub)  ## check the structure: data in order of the 3 groups, as in Excel

anovaclub  ## view data

boxplots of the data. You'll see that the groups don't share a constant mean.

results = aov(distance ~ club, data = anovaclub)  ## fit the ANOVA model (table). distance is the dep. var.; club is the indep. var.

summary(results)  ## show your answer
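For readers without the Excel sheet on the clipboard, the same analysis can be reproduced by typing the club data in directly. This is a sketch that assumes the column names `distance` and `club` used in the commands above; the values are the 15 observations from Example 2.

```r
# Club data from Example 2, entered in long format (one row per observation)
anovaclub <- data.frame(
  distance = c(254, 263, 241, 237, 251,   # Club 1
               234, 218, 235, 227, 216,   # Club 2
               200, 222, 197, 206, 204),  # Club 3
  club = factor(rep(c("Club 1", "Club 2", "Club 3"), each = 5))
)

results <- aov(distance ~ club, data = anovaclub)  # fit one-way ANOVA
anova_table <- summary(results)[[1]]               # the ANOVA table as a data frame

F_stat  <- anova_table[["F value"]][1]  # F statistic for club
p_value <- anova_table[["Pr(>F)"]][1]   # its p-value
```

The extracted `F_stat` and `p_value` can be compared with the F and P-value cells of the Excel output.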


Results

Conclusion?? As


Parametric test           Nonparametric
One-sample t-test         Binomial
Paired t-test (dept)      Wilcoxon signed-rank test
Independent t-test        Mann-Whitney U or Wilcoxon rank-sum
Pearson's correlation     Spearman's corr (xy)
Pearson's correlation     Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)    Kruskal-Wallis test
Repeated meas. ANOVA      Friedman test

Below are the salaries in thousands of Ghana cedis for three different MBA groups:

RMBA: 9, 7, 11, 9, 12, 10

WMBA: 13, 20, 14, 13

EMBA: 10, 9, 15, 14, 15

b. What is the decision rule, given the 5% significance level?

c. Find the critical value of the test statistic

d. Compute the value of the test statistic

mean salaries between these MBA groups? If so, which group

earns the least? Conclude practically.

f. Determine what is driving the difference in means by performing multiple tests for each pairwise difference

g. Which group's mean salary is mostly driving the difference?
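One possible way to set up the pairwise comparisons asked for in part f is Tukey's HSD, sketched below. This is an assumption, not necessarily the procedure intended in the course, and the assignment of values to groups (6 for RMBA, 4 for WMBA, 5 for EMBA) is read off the order of the listing above.

```r
# Salaries (GHc 1000s); the grouping is assumed from the order of the listing
salary <- c(9, 7, 11, 9, 12, 10,    # RMBA
            13, 20, 14, 13,         # WMBA
            10, 9, 15, 14, 15)      # EMBA
group <- factor(rep(c("RMBA", "WMBA", "EMBA"), times = c(6, 4, 5)))

fit <- aov(salary ~ group)   # one-way ANOVA on the three groups
tukey <- TukeyHSD(fit)       # all pairwise differences with adjusted p-values
pairwise <- tukey$group      # matrix: one row per pair of groups
```

Each row of `pairwise` gives the difference in means (`diff`), its confidence limits and an adjusted p-value (`p adj`); `pairwise.t.test(salary, group)` is an alternative built-in route.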


Dr. Asuo had the students in his research class rate his performance as Excellent, Good, Fair, or Poor. The rating (i.e. the treatment) a student gave the doctor was matched with his or her course grade, which could range from 0 to 100. The sample information is reported below. Is there a difference in the mean score/grade of the students in each of the four rating categories? Use the .05 significance level.

Assignment cont.


Assignment cont.

b. What is the decision rule

c. Calculate the critical value of the test statistic

d. Compute the value of the test statistic


Assignment cont.

each of the four rating categories? If so, which category of

students scored the most? Conclude practically.

f. Determine what is driving the difference in means by performing multiple tests for each pairwise difference

Session 7 Overview

Research projects and managerial decisions often involve the linkages between two or more variables. For instance, what is the relationship between blood pressure and a person's weight? Do other factors (age, stress, diet, exercise etc.), apart from weight, affect BP? How do you control these? How do you use each of them to predict BP? What assumptions must be in place for the modelling of such an association to exist? This session examines association via correlation and causality via regression.

Session 7 outline

predictors & outcome variable, scatter plots, error term, goodness-of-fit, significance tests, confidence intervals, p-value, assumptions: linearity, normality, heteroskedasticity, autocorrelation, heterogeneity, multicollinearity etc.

Reading List

Chap 14 & 15 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).

Statistics for Business and Economics (11th ed.). South-Western Cengage

Learning

Chap 2-8 of Gujarati D. (2003), Basic Econometrics, 4th ed

Chap 11 & 12 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics

for Business and Economics, 8/E, Pearson

Chap 1-7 of Wooldridge, J.M. (2013), Introductory Econometrics: A

Modern Approach, 5th ed


What is Correlation?

between 2 variables

Correlations are viewed as strong or weak

Correlations are viewed also as positive or negative


Correlation Coefficient

Correlation coefficient is a statistic that quantifies a relation between two variables

Falls between -1.00 and 1.00

The absolute value of the number (not the sign) indicates the

strength of the relation

measure of Correlation
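As a small illustration of these properties, the sketch below computes Pearson's correlation coefficient both with R's built-in `cor()` and from its definition. The two short vectors are made-up numbers chosen only for the example.

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r_builtin <- cor(x, y)  # Pearson correlation (the default method)

# Same quantity from the definition: covariation over the product of spreads
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

strength  <- abs(r_manual)   # strength of the relation (ignores the sign)
direction <- sign(r_manual)  # +1 for a positive relation, -1 for a negative one
```

The value always lies between -1 and 1, with `strength` carrying the size of the relation and `direction` its sign.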


What is Regression?

Regression analysis is a statistical technique used to analyze the nexus between 1 dependent variable & 1 or several independent variables, leading to simple or multiple regression respectively.

The predictor/independent/explanatory variables must be metric. In certain situations, non-metric (qualitative) IVs can be incorporated using transformations in a dummy variable regression, e.g. education's effect on income, or the impact of hours of study on students' grades.

Simple regression:

$y_i = \alpha + \beta x_i + u_i$

Multiple model:

$y_i = \beta_0 + \beta_1 A_i + \beta_2 B_i + \beta_3 C_i + \beta_4 D_i + \beta_5 E_i + u_i$

$i = 1, \dots, n$ = individual (group, country) index.

This is for a cross-sectional data set

A, B, ...E = values of k regressors for observation i

ui = random noise or idiosyncratic error for observation i


$BP_i = \beta_0 + \beta_1 age_i + \beta_2 weight_i + \beta_3 stress_i + \beta_4 pulse_i + u_i$

$Q^d_i = \beta_0 + \beta_1 P_i + \beta_2 Y_i + \beta_3 A_i + \beta_4 C_i + u_i$

wage = 150 + 0.85educ + 2.4exp + 0.25pos + 0.05abili

Assumptions

Linearity :

are homogenous

$u \sim N(0, \sigma^2 I)$

No high multicollinearity, i.e. we need exogeneity of IVs.

levels:

If more than 2 levels, the number of dummy variables needed is

= number of levels - 1

Also called categorical or dichotomous or binary or 0-1 variable.


ROAi =

i

Interpret the coefficient of profit, assets & type.

On average, ROA was 30 units greater when the firm was domestic than foreign, given profit & assets.

Example

Brukutu Ventures, a local gin distillery is concerned about the

demand for its favourite gin bitters. Demand in most of its retail

shops has been hit hard due to new entrants into the market.

Management is concerned & wants to determine which

bitters, apart from the price (P) of gin. Of the determinants of

demand, the following were identified: price of complements (C),

average real income of customers (M) ...

Example cont'

location of retail shop (where LU=1 if location is urban or 0 if

rural area); dominant occupation (teaching, fishing & trading)

around a retail shop (where OT=1 if the occupation is teaching

or 0 otherwise & OTR=1 if the occupation is trading or 0

otherwise); & finally, the dominant religion (Christian, Muslims

& Buddhists) of the people (where RC=1 if the people are

Christians or 0 otherwise & RM=1 if the people are Muslims or 0

otherwise). Use the correlation matrix & (dummy) regression

output to answer the questions that follow.
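The dummy coding described in this example (number of dummies = number of levels - 1, with fishing as the omitted occupation) is what R produces automatically for a factor. The sketch below uses a made-up vector of occupations; the variable name `occu` is borrowed from the R section of these notes, and the five values are hypothetical.

```r
# Hypothetical occupations for five retail shops
occu <- factor(c("teaching", "fishing", "trading", "fishing", "trading"))

# model.matrix() builds the design matrix R would use inside lm()
dummies <- model.matrix(~ occu)

# One column per non-baseline level; the first level (fishing) is the baseline
n_dummies <- ncol(dummies) - 1   # subtract the intercept column
```

The columns `occuteaching` and `occutrading` play the roles of OT and OTR; a fishing shop has 0 in both.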


Statistics> OK> Input Range

Copy all data including headings.

Tick Labels in First Range

Select Output Range & where the results must be

Tick summary statistics>OK


Correlation questions

Which 3 pairs of variables are perfectly correlated?

Which 2 pairs of variables are most strongly correlated with the

response variable?

Q & P (-0.87)

Q & OTR (-0.78)


Which 3 pairs of variables are mostly multicollinear?

P & OTR (0.76)

P & C (-0.74)

M & RM (-0.73)

Which 3 pairs of variables are least correlated?

M & OT (-0.03)


OT & RC (0.05)

Q & OT (-0.11)

From matrix, identify 2 very good predictors

P (-0.87)

OTR (-0.78)


>OK>Input Y Range> Input X Range


Regression output


answering in multiple regression analysis


the data that is not explained by the independent variables

RSS = Regression Sum of Squares: measures amount of

variation explained by regression

TSS = Total Sum of Squares


$R^2$, that is,

$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$

$0 < R^2 < 1$

Also, Multiple $R = \sqrt{R^2}$

Adjusted R 2 (1)

R2

never

The wish to penalize models with large

adjusted

Both

R2

&

has motivated an

model.


Adjusted R 2 (2)

$\bar{R}^2 = 1 - \frac{ESS/(n-K-1)}{TSS/(n-1)}$

$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-K-1}$

With the

, as

rises,

RSS

&

DoF

both fall


$K =$

$t_j = \frac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$

$\hat{\beta}_j \pm t_{n-K-1,\,\alpha} S_{\hat{\beta}_j}$

$TSS = ESS + RSS$

$N = TSS_{df} + 1$

$R^2 = \frac{RSS}{TSS}$

Multiple $R = \sqrt{R^2}$

$MSR = \frac{RSS}{K}$

$MSE = \frac{ESS}{N-K-1}$

$\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-K-1}$

$F = \frac{MSR}{MSE}$

$F_{K,\,N-K-1,\,\alpha}$

G = RSS = TSS - ESS = 34600 - 402.65 = 34197.35

B = $R^2 = \frac{RSS}{TSS} = \frac{34{,}197.35}{34{,}600} = 0.99$

A =

D = N = $TSS_{df}$ + 1 = 39 + 1 = 40

E = $RSS_{df}$ = 8

F = $TSS_{df}$ = 8 + 31 = 39

Fill in (2)

C = $\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-K-1}$

C = 1 - (1 - 0.99)(39)/(31) = 0.99

H = MSR = RSS/K = 34197.35/8 = 4274.67

I = MSE = ESS/(N-K-1) = 402.65/31 = 12.99

J = F = MSR/MSE = 4274.67/12.99 = 329.07
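The fill-in arithmetic can be verified in R from the four numbers given in the output (TSS = 34600, ESS = 402.65, N = 40, K = 8). This is a sketch with illustrative variable names; it follows the notes' convention that RSS is the regression sum of squares and ESS the error sum of squares.

```r
# Quantities read from the regression output
TSS <- 34600    # total sum of squares
ESS <- 402.65   # error (residual) sum of squares
N   <- 40       # sample size
K   <- 8        # number of regressors

RSS   <- TSS - ESS                              # regression sum of squares (G)
R2    <- RSS / TSS                              # R-squared (B)
adjR2 <- 1 - (1 - R2) * (N - 1) / (N - K - 1)   # adjusted R-squared (C)
MSR   <- RSS / K                                # mean square regression (H)
MSE   <- ESS / (N - K - 1)                      # mean square error (I)
F_stat <- MSR / MSE                             # F statistic (J)

# 5% critical value for the joint significance test
F_tab <- qf(0.95, df1 = K, df2 = N - K - 1)
```

Because this keeps full precision instead of the rounded MSR and MSE, `F_stat` differs from the slide's 329.07 in the first decimal place.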


Fill in (3)

$t_j = \frac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$

K = SE = coeff / t = 0.98 / 1.30 = 0.75

L = t = coeff / SE = 19.43 / 4.92 = 3.95

M =

N: read the p-value for |t| = 1.35; you get N = 0.20

O: read the p-value for |t| = 8.58; you get O = 0.00

$Q = \beta_0 + \beta_1 P + \beta_2 C + \beta_3 M + \beta_4 LU + \beta_5 OT + \beta_6 OTR + \beta_7 RC + \beta_8 RM + u$

Write the estimated regression equation.

Q = 280.27 - 3.16P - 0.11C - 0.98M - 5.11LU - 19.43OT - 46.47OTR + 15.55RC - 23.01RM

ANSWER

YES. Its coefficient is statistically different from 0 (i.e. is significant) because p-value = 0.01 < $\alpha$ = 0.05

Using just the p-values, which variables' coefficients are significantly different from zero?

C (0.01)

OT (0.00)

OTR (0.00)

RC (0.00)

RM (0.00)


Test

ANSWER

$H_A: \beta_1 \ne 0$, i.e. $\beta_1$ is significant (always a two-tail)

ANSWER CONTINUES..

$t_{cal}(\hat{\beta}_1)$ = coeff/s.e. = -2.01, so $|t_{cal}| = 2.01$

$t_{tab} = t_{n-K-1,\,\alpha} = t_{40-8-1,\,0.05} = t_{31,\,0.05} = 2.042$

As the cal = 2.01 < tab = 2.04, we don't reject $H_0$ at the 5% level, & conclude that the coefficient of price of gin is not significant

Fitness of Model

Is the regression model well fit? Interpret the apt measure used

ANSWER

$R^2$ = 0.99 = 99%

Since $R^2$ > 50%, model is fit!!!

the indep. variables

ANSWER

Reject $H_0$ if p-value < $\alpha$

significant.

$H_0: \beta_1 = \beta_2 = \dots = \beta_8 = 0$ (all the coeff.s are insig.)

$H_A$: at least one $\beta_i \ne 0$, $i = 1, \dots, 8$ (at least 1 X affects Y)

Find calculated F & tabulated F

Reject $H_0$ if $F = \frac{MSR}{MSE} > F_{K,\,n-K-1,\,\alpha}$

i.e. if $F > F_{K,\,N-K-1,\,\alpha} = F_{8,31,0.05} = 2.26$

How much higher is the Q predicted to be if a price rises by

GHc8?

Form a 95% confidence interval for the effect of changes in real income on Q.

That is, $\hat{\beta}_j \pm t_{n-K-1,\,\alpha/2} S_{\hat{\beta}_j}$

$\hat{\beta}_3 \pm t_{40-8-1,\,\alpha/2} S_{\hat{\beta}_3}$ = ??

$-0.98 \pm 2.042 \times 0.75$

$-0.98 \pm 1.5315$

CI = -2.51 and 0.55

CI: $-2.51 \le \beta_3 \le 0.55$

(Multicollinearity) Multicollinear variables are 2 independent variables that are highly associated (linked) with each other through their correlation coefficient. In this case, you don't include the dependent/outcome/response variable. It is a curse.

VIF measures how much the variance of your coefficients is "inflated" by multicollinearity

$VIF = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the $R^2$ from regressing the $j$th independent variable on the other independent variables

If there is no collinearity between X1 & X2, then VIF = 1.

As a rule of thumb, VIF > 10 indicates high collinearity
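The VIF formula can be applied by hand with an auxiliary regression. The sketch below uses R's built-in `mtcars` data purely as an illustration; the choice of `wt` as the variable of interest and of `disp` and `hp` as the other predictors is an assumption made for the example.

```r
# Auxiliary regression: one predictor regressed on the other predictors
aux <- lm(wt ~ disp + hp, data = mtcars)

R2_j   <- summary(aux)$r.squared  # R-squared of the auxiliary regression
vif_wt <- 1 / (1 - R2_j)          # variance inflation factor for wt

high_collinearity <- vif_wt > 10  # rule of thumb from the notes
```

For a model with exactly these three predictors, `vif()` from the `car` package should report the same figure for `wt`.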


Question on multicollinearity

Using only the VIF, which one of the pairs of variables selected

to be multicollinear may be deleted? Justify?

ANSWER

P & C, drop P with higher VIF of 24.15

M & RM, drop RM with a higher VIF of 20.00


1 unit rise in average real income of customers will reduce Qd of

Brukutu gin bitters by 0.98 units.

Also, interpret the coefficient of price & complements

A 1 unit rise in the price of complements (C) of gin will reduce Qd of Brukutu gin bitters by 0.11 units, holding all other factors constant.

Interpret the coefficient of teaching occupation

On the average, holding all other factors constant, Qd of Brukutu gin bitters is expected to be 19.43 units less if the people are teachers rather than fishers.

Interpret the coefficient of urban location

On the average, holding all other factors constant, Qd of Brukutu gin bitters is expected to be 5.11 units less if a shop is located in an urban rather than a rural area.

On the average, holding all other factors constant, Qd of

Brukutu gin bitters is expected to be 15.55 units more if the

people are Christians rather than Buddhists.

Interpret the coefficient of Muslim religion

What is the predicted quantity demanded of gin bitters that sells

for GHc10, has a complement price of GHc50, is sold in a fishing,

rural shopping area where the Christian population, on average,

earn an income of GHc5?

Q = 280.27 - 3.16P - 0.11C - 0.98M - 5.11LU - 19.43OT - 46.47OTR + 15.55RC - 23.01RM

Q = 280.27 - 3.16(10) - 0.11(50) - 0.98(5) - 5.11(0) - 19.43(0) - 46.47(0) + 15.55(1) - 23.01(0)

Q = 253.82

What is the predicted quantity demanded of free gin bitters that

has a complement price of GHc30, is located in an urban,

teaching area where the Buddhists population, albeit unemployed

they get gifts from friends?

Q = 280.27 - 3.16P - 0.11C - 0.98M - 5.11LU - 19.43OT - 46.47OTR + 15.55RC - 23.01RM

Q = 280.27 - 3.16(0) - 0.11(30) - 0.98(0) - 5.11(1) - 19.43(1) - 46.47(0) + 15.55(0) - 23.01(0)

Q = 252.43

Comparing Predictions..

some Muslim traders in an urban with that of quantity of gin

bitters made by some Buddhist shers in a rural area given that

both gin bitters cost GHc15, have a complement price of GHc15,

& average real income of customers is GHc20? Which factor(s)

appear to be causing a comparative difference (if any)?

Q = 280.27 - 3.16(15) - 0.11(15) - 0.98(20) - 5.11(1) - 19.43(0) - 46.47(1) + 15.55(0) - 23.01(1)

Q = 137.03

Q = 280.27 - 3.16(15) - 0.11(15) - 0.98(20) - 5.11(0) - 19.43(0) - 46.47(0) + 15.55(0) - 23.01(0)

Q = 211.62

1st scenario

Factors causing the comparative difference are differences in

location, occupation & religion
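The predictions above can be reproduced, and the comparison made explicit, with a small helper function. This is a sketch: the function name `predict_Q` and the scenario variable names are hypothetical, while the coefficients are those of the estimated equation (with the dummy variables defaulting to 0, i.e. rural, fishing, Buddhist).

```r
# Hypothetical helper: evaluates the estimated regression equation
predict_Q <- function(P, C, M, LU = 0, OT = 0, OTR = 0, RC = 0, RM = 0) {
  280.27 - 3.16 * P - 0.11 * C - 0.98 * M - 5.11 * LU -
    19.43 * OT - 46.47 * OTR + 15.55 * RC - 23.01 * RM
}

# Scenarios from the prediction questions
q_christian_fishing_rural <- predict_Q(P = 10, C = 50, M = 5, RC = 1)
q_buddhist_teaching_urban <- predict_Q(P = 0, C = 30, M = 0, LU = 1, OT = 1)
q_muslim_trading_urban    <- predict_Q(P = 15, C = 15, M = 20, LU = 1, OTR = 1, RM = 1)
q_buddhist_fishing_rural  <- predict_Q(P = 15, C = 15, M = 20)

# With equal P, C and M, the gap is driven only by the dummy coefficients
gap <- q_buddhist_fishing_rural - q_muslim_trading_urban
```

With price, complement price and income held equal, `gap` equals the sum of the urban, trading and Muslim coefficients in absolute value, which is why location, occupation and religion are named as the driving factors.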


Excel to R. Use only the uncoded categorical variables, e.g. loca (urban, rural), occu (teaching, fishing & trading) etc. In Excel, hide the variable columns LU, OT, OTR, RC & RM

gin  # look at your data in R console

summary(gin)  # see the summary statistics of the data

require(psych)  # To do the next analyses, choose 'Install Packages' under Packages. Then type 'psych' and click install to install it. Then type

describe(gin)  # Get more descriptive or summary statistics. What's the sample size? Mean for complement?

qqnorm(gin$Q, col="blue", lwd=4); qqline(gin$Q, col="red", lwd=3)  # normal probability plot: a graphical tool for comparing a data set with the normal distribution.

hist(gin$Q, col="blue")  # draw histogram of Qd of gin bitters & colour it blue

shapiro.test(gin$Q)  # test of normality. H0: the sample comes from a normal distribution. Reject H0 if p-value <= 0.05. In ours, p-value = 0.00387 < α, so the Qd data is not normal

plot(density(gin$Q))  # density plot for Qd of gin bitters

Scatter plot of Qd vs P

plot(Q ~ P, data=gin, xlab="P", ylab="Q", pch=19, col="red", lwd=4)  ## where xlab is the x-axis label & pch = plot character: box, dot, star etc. And "col" sets the colour to red. Try blue, lightblue, green. pch=21 plots an open circle, pch=19 plots a solid circle. Try others.

pairs(gin)  # Create scatterplot matrix of whole data

pairs(~Q+P+C+M+loca+rel, col="red", data=gin)  # scatterplot matrix of some variables. Can change colors.

cor(gin)  # do Pearson correlation. You can only do this with the coded data set OR after removing any nominal variable X first

round(cor(gin), 2)  # Generate correlation matrix correct to 2 d.p. with coded data set OR

round(cor(gin[, -which(names(gin) %in% c("loca","occu","rel"))]), 2)  # remove nominal variables first

library(Hmisc)  # install & load library (Hmisc) first. Then,

rcorr(as.matrix(gin))  # Pearson correlation with p-values. OR

rcorr(as.matrix(gin[, -which(names(gin) %in% c("loca","occu","rel"))]))  # remove nominal variables first

Do a simple regression

ginsimple = lm(Q ~ P, data=gin)  # Q = dep. var.; lm = linear model

plot(Q ~ P, col="blue", lwd=2, data=gin, main="my scatterplot of Q & P & fitted line"); abline(ginsimple, lwd=4, col="red")  # add fitted reg line to the scatterplot

summary(ginsimple)  # show answer of this 2by2

Do Multiple Regression in R

ginmulti = lm(Q ~ P+C+M+LU+OT+OTR+RC+RM, data=gin)  # run OLS: multiple reg with coded data

ginmulti = lm(Q ~ P+C+M+loca+occu+rel, data=gin)  # run OLS with the nominal data

summary(ginmulti)  ## display results

round(confint(ginmulti), 2)  ## confidence intervals for the coefficients, correct to 2 d.p.

Do Multiple Regression in R

library(car)  # load the package car first

round(vif(ginmulti), 2)  ## variance inflation factor for multicollinearity, correct to 2 d.p.

vif(ginmulti) > 10  # which ones are problematic (rule of thumb: VIF > 10)?

The cross-sectional data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). It can be found in: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411. The data frame is 32 observations on 11 variables. I've dropped the 8th variable, so you've 10.

automobiles. The more number of cylinders, the more power you

can make. But, some 4 cylinder engines make more power than

V8 engines. The power an engine produces is called horsepower.

Displacement is determined from the bore and stroke of an

engine's cylinders. Use R software to do your assignment.


[, 1] mpg Miles/(US) gallon

[, 2] cyl Number of cylinders

[, 3] disp Displacement (cu.in.)

[, 4] hp Gross horsepower

[, 5] drat Rear axle ratio

[, 6] wt Weight (lb/1000)


[, 9] am Transmission (0 = automatic, 1 = manual)

[,10] gear Number of forward gears

[,11] carb Number of carburetors; regulates the flow of air & gasoline into the engine cylinders.

1. How many variables are categorical & how many are

numerical?

2. Display the full descriptive statistics of the data in R. Why do

you think some variables are starred (*)?

3. What is the median for mpg, standard deviation for

horsepower , skewness for weight & the kurtosis for

displacement?

4. Is the mpg variable normally distributed? (use any

appropriate plot). Test if mpg is normal.


5. Create a blue scatterplot matrix of mpg, disp, hp, drat, wt & qsec. From your graph, is the correlation between mpg & weight positive or negative?

6. What's the Pearson correlation coefficient between mpg & weight and between horsepower & displacement, correct to 2 d.p.?

7. Generate a simple regression plot of mpg on wt with a red fitted line.

8. Run the whole multiple regression model (label the equation as "regcar") & use it to answer other questions below

9. Specify the regression model for the whole data. You may shortcut some of the variable names.

10. In one sentence, which variables are significant and which are not? Justify

11. Interpret the coefficient of displacement, cylsix, ammanual, gearthree, & carbSix.

12. Is the regression model well fit? Explain

13. Compare the mpg of a car with four cylinders, 180 displacement, 115 horsepower, 4.5 drat, 5kg weight, 20 qsec, is manual, has four gearbox and two carburetors with that of the mpg of a car with six cylinders, 200 displacement, 80 horsepower, 4 drat, 3kg weight, 20 qsec, is automatic, has three gearbox and three carburetors. Which factor(s) appear to be causing a comparative difference (if any)?

Golden Tulip Hotel claims it is still the best in Ghana despite stiff competition from Fiesta Royal, La Palm, Robinhood etc. It has embarked on price restructuring and adverts to boost demand for the coming year. It has a number of outlets (including Golden Tulip Kumasi city), each having data on the number of meals served (Q) which is regressed on average price per meal (P in GH¢), .......

GH¢), & the average income per household in each outlet's immediate service area (Y in GH¢). Ordinary least squares estimation of the regression equation based on the data led to the following table. Use it to answer the questions that follow.

Which 3 pairs of variables are perfectly correlated?

Which 3 pairs of variables may be mostly strongly correlated with

the response variable?

Which 3 pairs of variables may be mostly multicollinear

Which 3 pairs of variables are least multicollinear?

Which 3 pairs of variables are least correlated?

Find the missing values in the regression output


Reg Output


1 Specify the regression model for the whole data.

2 Write the estimated regression equation.

3 Interpret the coefficient of price. Is it consistent with expectation?

4 Is the estimate of

advert

price

Test

if the coefficient of

income

words, give the test statistic and explain your real-world conclusions.

Is the regression model well fit?

Are the coefficients jointly significant?

Golden Tulip wants to construct an approximate 95% prediction interval for the number of meals served in an outlet given that price is GH¢4, competitor's price is GH¢5, advertising is GH¢2 and nothing (i.e. no change) for income. As a consultant to Golden Tulip, can you assist them?

Which one of the pairs of multicollinear variables selected earlier could be deleted from the regression? Why?

References

Business and Economics, 7th edition, Publisher: Prentice Hall,

ISBN: 978-0-13-608536-2.

Wood, M. (2003) Making Sense of Statistics: A

Non-Mathematical Approach, Basingstoke and New York:

Palgrave.


References

in Social Research, Pearson Education 11/E

Bryman, A. and Bell, E. (2007) Business Research Methods

(second edition), Oxford

Wisniewski M. (2006) Quantitative Methods for Decision Makers


References

Hemel Hempstead: Prentice Hall.

Morris, C. (2002) Quantitative Approaches in Business Studies,

sixth edition, Harlow, Essex: Financial Times Prentice Hall.

See course outline for more references.
