
Chapter 10

Correlation and Regression Analysis

Correlation

Strength of association between two variables.


Tells us how much the two variables are associated with one another.
However, it does not imply causation.
Simply tells us whether the two variables are positively or negatively correlated.

Regression
If there is a strong correlation between two variables, regression is used to estimate
the value of the dependent variable (Y) from the value of the independent variable (X).
Types
Simple Linear Regression
    Estimates the value of a dependent variable from a single independent variable.
    Simplest form of regression analysis.
Multiple Linear Regression
    Used when the dependent variable is continuous and there are two or more
    independent variables, which may be continuous or categorical.
Logistic Regression
    Used when the outcome variable is categorical.
    The independent variables can be either categorical or continuous.
    Estimates the odds ratio of each independent variable for a dichotomous
    dependent variable.
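
The simple linear regression case above can be illustrated with a minimal sketch. It assumes the usual least-squares fit, which these notes do not spell out, and the function name and the data are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line Y = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data (not from the notes): Y rises by 2 for each unit of X.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(x, y)

# Estimate the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 5
```

With these made-up points the fitted line passes through every observation, so the estimate at X = 5 simply continues the pattern.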

Correlation Analysis is a group of statistical techniques to measure the association between two
variables.
The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
A Scatter Diagram is a chart that portrays the relationship between two variables.
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two
variables. It is also called Pearson's r or Pearson's product-moment correlation coefficient.

It can range from -1.00 to 1.00.


Values of -1.00 or 1.00 indicate perfect correlation.
Negative values indicate an inverse relationship and positive values indicate a direct
relationship.
It requires interval or ratio-scaled data.
Values close to 0.0 indicate weak correlation.
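
As a rough illustration of the properties listed above, the sketch below computes Pearson's r from its standard definition (the sum of cross-products divided by the square root of the product of the sums of squares). The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfectly direct and a perfectly inverse relationship.
r_direct = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

The first pair of samples lies exactly on a rising line and the second on a falling line, so the two results sit at the two ends of the range of r.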

Chapter 12

Chi-Square Test

Sampling tests make certain assumptions about the population and the
samples. For example, a sampling distribution is often assumed to come from
samples drawn from a normally distributed population. Sometimes, however,
no assumption can be made about the distribution of the population from
which the samples are taken. In such situations we use a non-parametric
test. Chi-square is one such non-parametric test.
If the observed frequencies O and the expected frequencies E are known,
then

$\chi^2 = \sum \frac{(O - E)^2}{E}$

follows a distribution known as the chi-square distribution.
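
A minimal sketch of the statistic defined above, summing (O - E)^2 / E over all categories. The function name and the frequencies are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three categories (not from the notes).
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)  # (4 + 4 + 0) / 20
```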

Properties of Chi Square Test

1. It is a non-parametric test. Assumptions about the form of the distribution or its
parameters are not required.
2. It is a distribution-free test, which can be used with any type of population distribution.
3. The chi-square statistic is easy to calculate.
4. It analyses the differences between a set of observed frequencies and a set of
corresponding expected frequencies.
5. It can be applied to multinomial data (more than two categories).
6. The variable ranges from 0 to $\infty$.
7. It is a one-tailed test.

Uses of Chi Square Test

1. Useful for the test of goodness of fit
2. Useful for the test of independence of attributes
3. Useful for testing homogeneity
4. Useful for testing a given population variance

Conditions for Applying Chi Square Test

1. The total frequency (N) must be reasonably large, at least 50.
2. Observed frequency of less than 5 is pooled with the preceding or succeeding frequencies
so that no observed frequency is less than 5. Then the degree of freedom is based on the
resulting number of frequencies.
3. The distribution should not be expressed as proportions or percentages. It should be in
original units.
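
The pooling rule in condition 2 can be sketched as follows. This is one possible reading of the rule, not a prescribed algorithm: a cell below the minimum is merged into the succeeding cell, or into the preceding cell when it is the last one. The function name is hypothetical, and the expected values are placeholders used only to show that they are pooled together with the observed ones.

```python
def pool_small_cells(observed, expected, minimum=5):
    """Merge cells whose observed frequency is below `minimum` into a neighbour."""
    obs, exp = list(observed), list(expected)
    i = 0
    while len(obs) > 1 and i < len(obs):
        if obs[i] < minimum:
            # Merge into the next cell, or the previous one at the end of the list.
            j = i + 1 if i + 1 < len(obs) else i - 1
            obs[j] += obs[i]
            exp[j] += exp[i]
            del obs[i], exp[i]
            i = 0  # rescan, since a merged cell may still be too small
        else:
            i += 1
    return obs, exp


# Observed frequencies with a thin right tail; expected values are placeholders.
pooled_obs, pooled_exp = pool_small_cells([143, 90, 42, 12, 9, 3, 1], [1] * 7)
cells_after_pooling = len(pooled_obs)  # the degrees of freedom are based on this count
```

Here the last three cells (9, 3 and 1) end up in a single cell of 13, leaving five cells.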

How to Test the Goodness of Fit for a Multinomial Distribution?

1. State the null and alternate hypotheses.
2. Select a random sample and record the observed frequencies (Oi) for each category.
3. Assume the null hypothesis is true. With this assumption determine the expected
frequencies (Ei).
4. Compute the value of the test statistic using $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
5. Fix the level of significance.
6. Write the degree of freedom.
7. Find the critical value corresponding to the degree of freedom and level of significance.
8. Check whether
    a. Test statistic is < Critical Value
    b. Test statistic is > Critical Value

Accept the null hypothesis if $\chi^2$ (Test Statistic) < Critical Value

Reject the null hypothesis if $\chi^2$ (Test Statistic) > Critical Value
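
The eight steps above can be sketched in code. This is an illustration under stated assumptions: the degrees of freedom are taken as the number of categories minus one (the usual choice when no parameter is estimated from the sample), the critical values are the standard table entries for the 0.05 level, and the function name and the sample are hypothetical.

```python
# Standard chi-square critical values at the 0.05 level, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed, hypothesised_proportions):
    """Return (statistic, df, critical value, reject?) at the 0.05 level."""
    n = sum(observed)
    # Step 3: expected frequencies under the null hypothesis.
    expected = [n * p for p in hypothesised_proportions]
    # Step 4: test statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 6: degrees of freedom (assumes no parameters were estimated).
    df = len(observed) - 1
    # Step 7: critical value for this df at the 0.05 level.
    critical = CRITICAL_0_05[df]
    # Step 8: compare the statistic with the critical value.
    return statistic, df, critical, statistic > critical


# Hypothetical sample of 100 items; null hypothesis: four equally likely categories.
statistic, df, critical, reject = goodness_of_fit([30, 20, 25, 25], [0.25] * 4)
```

For this made-up sample the statistic is 2.0 on 3 degrees of freedom, which is below the critical value of 7.815, so the null hypothesis would be accepted.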

$\chi^2$ Distribution

Type - I

1. Explain the $\chi^2$ test. How will you use it to test a hypothesis? What are the precautions
which are to be taken for $\chi^2$ testing? Mention the uses of $\chi^2$.

2. The following data shows the distribution of frequency between educational level and
awareness of AIDS. Find the relationship between education and awareness.

                          Awareness Level
Education             Low     Moderate    Higher
Illiterate            320     50          30
Primary               80      15          5
Middle School         110     70          20
High School           200     60          40
Higher Secondary      310     130         60

3. Due to recession, an IT company is planning to lay off some of its personnel. The results
of an opinion survey conducted by the company are given below. Formulate the hypothesis
and test it using the $\chi^2$ test at the 0.05 level of significance.

                            Opinion Towards Retrenchment
Occupation                  Oppose    Undecided    Favor
Administrative Staff        37        16           19
Project Team Manager        46        22           15
Project Team Leader         32        11           2

4. The following table gives data on students' field of study in the University and their
field of specialization in High School.

                                 Field of Study in the University
Specialization in High School    Biology    Medicine    Agriculture
Biology                          26         52          23
Physics & Mathematics            3          44          8
Agriculture                      4          1           1
Humanities                       6          4           10

Test whether there is any association between High School Specialization and field of
study in the University

5. Based on the following data, test the hypothesis that there is no difference in quality
between the brands of tyres. ($\alpha$ = 0.05)

                                   Tyre Brand
                                   A      B      C      D
Failed to last 4000 km             26     23     15     32
Lasted 4000-6000 km                118    93     116    121
Lasted for more than 6000 km       56     84     67     49

6. The following table gives the number of executives according to the time devoted to
public activities, by rank.

                    Rank
Time Devoted        Manager    Sr. Manager    Gr. Manager
A Good Deal         25         13             9
Some Time           62         53             49
Never               12         34             43

Test the hypothesis that the time devoted to public activities is independent of the rank.

7. A survey was conducted in Bangalore city as well as in the rest of Karnataka state
regarding people's first choice among four types of magazines. The results are tabulated
below.

                        City
Type of Magazine        Bangalore    Rest of Karnataka
News Magazine           70           310
Movie Magazine          60           280
Ladies Magazine         40           170
Sports Magazine         30           40

Test whether there is any significant difference between the Bangalore
population and the rest of Karnataka in the choice of magazines.

8. Samples of 115 professionals, 110 businessmen, and 125 farmers were chosen and asked
to express their feelings regarding a national policy. The result of the survey is given
below:

Occupation       Favorable    Against    Indifferent
Professionals    80           21         14
Businessmen      72           15         23
Farmers          69           31         25

Can we conclude that there is no difference in opinion on the policy among the
three classes of people?

9. A survey on the consumption of alcoholic drinks by menial staff of a municipal
corporation gave the following data:

                 Consumers    Non-Consumers
Attenders        86           31
Scavengers       46           26
Road Workers     64           10

Test whether there is any association between type of work and
alcoholic drinking habit.

10. The results of a survey of educational attainment among 100 persons randomly
selected in a locality are given below:

          Education
Sex       Middle School    High School    College
Male      10               15             25
Female    25               10             15

Can you conclude that education depends on the sex of the individual?

11. The following table gives a sample of married women, their level of education and
marriage adjustment scores.

                      Marriage Adjustment Scores
Level of Education    Low    Medium    High
College               24     97        62
High School           26     32        24
Middle School         40     22        14

Analyze the data and give your comments.

12. Formulate an appropriate hypothesis and use the $\chi^2$ test for the following data.

                        Inter-Caste Marriage
Socioeconomic Status    Favorable    Indifferent    Unfavorable
Low                     40           25             10
Moderate                35           30             15
High                    25           45             5

13. An oil company has explored three different areas for possible oil reserves. The results
of the tests are given below.

             Area
             A     B     C
Strikes      7     10    8
Dry Holes    10    18    9

Do the data suggest that the three areas have the same potential at the 10% level of
significance?

Type - II

14. On the basis of the information given below, find whether there is any association between
inoculation and absence of attack of typhoid at the 5% level of significance.

                  Attacked    Not Attacked    Total
Inoculated        12          674             686
Not Inoculated    47          1122            1169

15. Find the relationship between Educational Status and Safety Awareness of workers. Use the
$\chi^2$ test at the 0.05 level of significance. Propose the Null Hypothesis and the Alternate
Hypothesis.

                      Level of Awareness
Educational Status    Low    High
Up to 10th Std.       30     70
Professional          60     40
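
A possible sketch of the test of independence for the 2 x 2 table in problem 15. It assumes the usual expected frequency for each cell, (row total x column total) / N, and degrees of freedom (rows - 1)(columns - 1). Neither formula is stated in these notes, and no continuity correction is applied. The function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Problem 15 data: rows = educational status, columns = low / high awareness.
statistic, df = chi_square_independence([[30, 70], [60, 40]])
```

Under these assumptions the statistic is about 18.18 on 1 degree of freedom, well above the 0.05 critical value of 3.841, which would lead to rejecting the hypothesis of independence.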

Type - III

16. A theory predicts that the proportion of beans in the four groups A, B, C and D should be
9:3:3:1. In an experiment among 1600 beans, the numbers in the four groups were 882,
313, 287 and 118. Does the experimental result support the theory? Apply the $\chi^2$ test.

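
For problem 16, the expected frequencies follow from splitting the 1600 beans in the theoretical ratio 9:3:3:1. The sketch below is one way to set this up. The variable names are hypothetical, and the 0.05 critical value for 3 degrees of freedom (7.815) is an assumed significance level, since the problem does not state one.

```python
observed = [882, 313, 287, 118]
ratio = [9, 3, 3, 1]

total = sum(observed)  # 1600 beans
# Expected frequencies under the theory: 900, 300, 300, 100.
expected = [total * part / sum(ratio) for part in ratio]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_0_05_DF3 = 7.815  # assumed level of significance: 0.05
supports_theory = statistic < CRITICAL_0_05_DF3
```

The statistic works out to about 4.73, below 7.815, so at the assumed 0.05 level the data would be consistent with the theory.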
17. A college is running post graduate classes in five subjects with equal number of students.
The total number of absentees in these five classes is 75. Test the hypothesis that these
classes are alike in absentees if the actual absentees in each are as follows: History = 9;
Philosophy = 18; Economics = 15; Commerce = 12; Chemistry = 11.

18. How well do the airline companies serve their customers? A study showed the following
customer ratings. 3% Excellent, 28% Good, 45% Fair and 24% Poor. In a follow up study
of telephone companies, a sample of 400 adults found the following customer ratings 24
Excellent, 124 Good, 172 Fair and 80 Poor. Does the distribution of customer ratings for
telephone companies differ from the distribution of the customer ratings for the airline
companies?

19. A multinomial population with four categories A, B, C and D is claimed to have the same
proportion of items in every category. A sample of size 300 yielded the following
results: A = 85, B = 95, C = 50 and D = 70. Use $\alpha$ = 0.05 to determine whether the claim
of the proportions being the same in every category is true.

20. During the first 13 weeks of the television season, the Saturday evening 8.00pm to 9.00pm
audience proportions were recorded as ABC = 29%, CBS = 28%, NBC = 25% and
Independent = 18%. A sample of 300 homes two weeks after a Saturday night schedule
revision yielded the following viewing audience data: ABC = 95 homes, CBS = 70 homes,
NBC = 89 homes and Independent = 46 homes. Test with $\alpha$ = 0.05 to determine whether
the viewing audience proportions have changed.

Type - IV

21. A die is thrown 132 times with the following results:

Number Turned Up    1     2     3     4     5     6
Frequency           16    20    25    14    29    28

Is the die biased?

22. Eight coins were tossed 256 times and the following results were obtained:

No. of Heads    0    1    2     3     4     5     6     7     8
Frequency       2    6    30    52    67    56    32    10    1

Is the coin biased? Use the $\chi^2$ test.

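
For problem 22, the expected frequencies under the null hypothesis of unbiased coins come from the binomial distribution with n = 8 and p = 0.5. A minimal sketch follows. The function name is hypothetical.

```python
from math import comb


def binomial_expected(n_trials, p, total):
    """Expected frequency of 0..n_trials successes over `total` repetitions."""
    return [
        total * comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]


observed = [2, 6, 30, 52, 67, 56, 32, 10, 1]
# Null hypothesis: each coin is fair, so P(head) = 0.5.
expected = binomial_expected(8, 0.5, sum(observed))
```

The expected frequencies are 1, 8, 28, 56, 70, 56, 28, 8 and 1. Note that the observed frequencies for 0 and 8 heads are below 5, so under the pooling condition stated earlier they would be combined with their neighbours before the statistic is computed.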
23. The following shows the result of throwing 12 dice 4096 times, a throw of 4, 5 or 6 being
called a success. Fit a Binomial Distribution.

Success      0    1    2     3      4      5      6      7      8      9      10    11    12
Frequency    -    7    60    198    430    731    948    847    536    257    71    11    -

Test for goodness of fit.

24. A typist kept a record of mistakes made per day during 300 working days in a year.

Mistakes/Day    0      1     2     3     4    5    6
No. of Days     143    90    42    12    9    3    1

Fit a Poisson distribution to the data. Test for goodness of fit.
(Given $e^{-0.89} = 0.410652$)
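
A sketch of the fitting step in problem 24: the Poisson mean is estimated by the sample mean (which is where the 0.89 in the hint comes from), and the expected number of days for each count is N times the Poisson probability. The function name is hypothetical.

```python
import math


def poisson_expected(mean, total, max_k):
    """Expected frequency of k = 0..max_k events over `total` observations."""
    return [
        total * math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(max_k + 1)
    ]


mistakes = [0, 1, 2, 3, 4, 5, 6]
days = [143, 90, 42, 12, 9, 3, 1]

total_days = sum(days)  # 300 working days
mean = sum(k * d for k, d in zip(mistakes, days)) / total_days  # 267 / 300 = 0.89
expected = poisson_expected(mean, total_days, max_k=6)
```

The expected frequencies for 0 to 6 mistakes sum to slightly less than 300 because the Poisson distribution also gives a small probability to 7 or more mistakes. Thin cells in the right tail would be pooled before computing the statistic.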

25. The following are the number of arrivals of flights per hour in an airport. Can we
conclude that the following 400 arrivals follow a Poisson distribution with $\lambda$ = 3 at the 5%
level of significance?

No. of Arrivals    0     1     2     3     4     5
No. of Hours       20    57    98    85    78    62
