
A chi-squared test, also referred to as a χ² test (or chi-square test), is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can then be used to reject the hypothesis that the data are independent.

Also considered a chi-square test is a test in which this is asymptotically true, meaning that the
sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square
distribution as closely as desired by making the sample size large enough. The chi-squared test is
used to determine whether there is a significant difference between the expected frequencies and
the observed frequencies in one or more categories. Does the number of individuals or objects that
fall in each category differ significantly from the number you would expect? Is this difference
between the expected and observed due to sampling variation, or is it a real difference?


Examples of chi-square tests with samples


One test statistic that follows a chi-square distribution exactly is the test that the variance of a
normally distributed population has a given value based on a sample variance. Such tests are
uncommon in practice because the true variance of the population is usually unknown. However,
there are several statistical tests where the chi-square distribution is approximately valid:

Pearson's chi-square test


Main article: Pearson's chi-square test

Pearson's chi-square test is also known as the chi-square goodness-of-fit test or the chi-square test for independence. When the chi-square test is mentioned without any modifiers or without other precluding context, this test is often meant (for an exact test used in place of the χ² test, see Fisher's exact test).

Yates's correction for continuity


Main article: Yates's correction for continuity

Using the chi-square distribution to interpret Pearson's chi-square statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-square distribution. This assumption is not quite correct, and introduces some error.

To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the difference between each observed value and its expected value in a 2 × 2 contingency table.[1] This reduces the chi-square value obtained and thus increases its p-value.
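
As an illustration, the following is a minimal sketch (using a hypothetical 2 × 2 table, not taken from this article) of how the corrected and uncorrected statistics compare. It assumes SciPy's chi2_contingency routine, whose correction argument applies Yates's adjustment when the table has one degree of freedom (i.e., is 2 × 2).

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[21, 15],
                  [3, 11]])   # hypothetical 2 x 2 contingency table

chi2_plain, p_plain, dof, expected = chi2_contingency(table, correction=False)
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

# The corrected statistic is smaller than the uncorrected one, so its p-value is larger.
print(f"without correction: chi2 = {chi2_plain:.3f}, p = {p_plain:.4f}")
print(f"with Yates's correction: chi2 = {chi2_yates:.3f}, p = {p_yates:.4f}")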

Other chi-square tests

• Cochran–Mantel–Haenszel chi-squared test
• McNemar's test, used in certain 2 × 2 tables with pairing
• Tukey's test of additivity
• The portmanteau test in time-series analysis, testing for the presence of autocorrelation
• Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one)

Chi-squared test for variance in a normal population


If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of n product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-square distribution with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.
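
As a minimal sketch of this calculation (with hypothetical data, not from the article), the statistic T and the 5% acceptance region for n = 21 can be computed from SciPy's chi2 distribution:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
nominal_variance = 4.0                              # the pre-determined value to be tested
sample = rng.normal(loc=10.0, scale=2.0, size=21)   # n = 21 hypothetical product items
n = sample.size

# Sum of squares about the sample mean, divided by the nominal variance
T = np.sum((sample - sample.mean()) ** 2) / nominal_variance

# Two-sided acceptance region at the 5% significance level (n - 1 = 20 df)
lower = chi2.ppf(0.025, df=n - 1)   # about 9.59
upper = chi2.ppf(0.975, df=n - 1)   # about 34.17
print(f"T = {T:.2f}, acceptance region = ({lower:.2f}, {upper:.2f})")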

Example chi-squared test for categorical data


Suppose there is a city of 1 million residents with four neighborhoods: A, B, C, and D. A random
sample of 650 residents of the city is taken and their occupation is recorded as "blue collar", "white
collar", or "no collar". The null hypothesis is that each person's neighborhood of residence is
independent of the person's occupational classification. The data are tabulated as:

               A     B     C     D   Total
White collar   90    60   104    95    349
Blue collar    30    50    51    20    151
No collar      30    40    45    35    150
Total         150   150   200   150    650

Let us take the sample living in neighborhood A, 150/650, to estimate what proportion of the whole 1 million people live in neighborhood A. Similarly we take 349/650 to estimate what proportion of the 1 million people are white-collar workers. By the assumption of independence under the hypothesis we should "expect" the number of white-collar workers in neighborhood A to be

150 × (349/650) ≈ 80.54.

Then in that "cell" of the table, we have

(observed − expected)² / expected = (90 − 80.54)² / 80.54 ≈ 1.11.

The sum of these quantities over all of the cells is the test statistic. Under the null hypothesis, it has approximately a chi-square distribution whose number of degrees of freedom is

(number of rows − 1) × (number of columns − 1) = (3 − 1)(4 − 1) = 6.

If the test statistic is improbably large according to that chi-square distribution, then one rejects the null hypothesis of independence.
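
A minimal sketch of this computation, assuming SciPy's chi2_contingency routine is available; the expected count and degrees of freedom it reports match the hand calculation above.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[ 90,  60, 104,  95],    # white collar, neighborhoods A-D
                     [ 30,  50,  51,  20],    # blue collar
                     [ 30,  40,  45,  35]])   # no collar

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-squared = {chi2_stat:.2f}, degrees of freedom = {dof}, p = {p_value:.4g}")
print(f"expected white-collar count in A = {expected[0, 0]:.2f}")   # about 80.54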

A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of
the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many
residents of each neighborhood to include. Then each resident has the same chance of being
chosen as do all residents of the same neighborhood, but residents of different neighborhoods would
have different probabilities of being chosen if the four sample sizes are not proportional to the
populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather
than "independence". The question is whether the proportions of blue-collar, white-collar, and no-
collar workers in the four neighborhoods are the same. However, the test is done in the same way.

Applications
In cryptanalysis, the chi-square test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The lowest value of the test statistic means that the decryption was successful with high probability.[2][3] This method can be generalized for solving modern cryptographic problems.[4]
In probability theory, a continuity correction is an adjustment that is made when a
discrete distribution is approximated by a continuous distribution.


Examples
Binomial
If a random variable X has a binomial distribution with parameters n and p, i.e., X is distributed as the number of "successes" in n independent Bernoulli trials with probability p of success on each trial, then

P(X ≤ x) = P(X ≤ x + 1/2)

for any x ∈ {0, 1, 2, ..., n}. If np and np(1 − p) are large (sometimes taken to mean ≥ 5), then the probability above is fairly well approximated by

P(Y ≤ x + 1/2),

where Y is a normally distributed random variable with the same expected value and the same variance as X, i.e., E(Y) = np and var(Y) = np(1 − p). This addition of 1/2 to x is a continuity correction.
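
A minimal sketch of the approximation (with hypothetical n, p and x), comparing the exact binomial probability with the normal approximation with and without the half-unit correction:

import math
from scipy.stats import binom, norm

n, p, x = 100, 0.3, 25          # hypothetical binomial parameters and cutoff
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom.cdf(x, n, p)                                  # P(X <= x)
no_correction = norm.cdf(x, loc=mu, scale=sigma)            # P(Y <= x)
with_correction = norm.cdf(x + 0.5, loc=mu, scale=sigma)    # P(Y <= x + 1/2)

print(f"exact = {exact:.4f}, no correction = {no_correction:.4f}, "
      f"with correction = {with_correction:.4f}")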

Poisson
A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution. For example, if X has a Poisson distribution with expected value λ, then the variance of X is also λ, and

P(X ≤ x) ≈ P(Y ≤ x + 1/2)

if Y is normally distributed with expectation and variance both λ.
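
The same comparison for the Poisson case, again a sketch with a hypothetical λ and cutoff:

import math
from scipy.stats import poisson, norm

lam, x = 20, 15                                              # hypothetical values
exact = poisson.cdf(x, lam)                                  # P(X <= x)
approx = norm.cdf(x + 0.5, loc=lam, scale=math.sqrt(lam))    # P(Y <= x + 1/2)
print(f"exact = {exact:.4f}, approximation with correction = {approx:.4f}")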

Applications
Before the ready availability of statistical software able to evaluate probability distribution functions accurately, continuity corrections played an important role in the practical application of statistical tests in which the test statistic has a discrete distribution: they were of special importance for manual calculations. A particular example of this is the binomial test, involving the binomial distribution, as in checking whether a coin is fair. Where extreme accuracy is not necessary, computer calculations for some ranges of parameters may still rely on using continuity corrections to improve accuracy while retaining simplicity.

In statistics, a likelihood ratio test is a statistical test used to compare the goodness of fit of two models, one of which (the null model) is a special case of the other (the alternative model). The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other. This likelihood ratio, or equivalently its logarithm, can then be used to compute a p-value, or compared to a critical value to decide whether to reject the null model in favour of the alternative model. When the logarithm of the likelihood ratio is used, the statistic is known as a log-likelihood ratio statistic, and the probability distribution of this test statistic, assuming that the null model is true, can be approximated using Wilks's theorem.

In the case of distinguishing between two models, each of which has no unknown parameters, use
of the likelihood ratio test can be justified by the Neyman–Pearson lemma, which demonstrates that
such a test has the highest power among all competitors.[1]


Simple-vs-simple hypotheses
Main article: Neyman–Pearson lemma

A statistical model is often a parametrized family of probability density functions or probability mass functions f_θ(x). A simple-vs-simple hypothesis test has completely specified models under both the null and alternative hypotheses, which for convenience are written in terms of fixed values of a notional parameter θ:

H0 : θ = θ0,
H1 : θ = θ1.

Note that under either hypothesis, the distribution of the data is fully specified; there are no unknown parameters to estimate. The likelihood ratio test is based on the likelihood ratio, which is often denoted by Λ (the capital Greek letter lambda). The likelihood ratio is defined as follows:[2][3]

Λ(x) = L(θ0 | x) / L(θ1 | x)

or

Λ(x) = L(θ0 | x) / sup{ L(θ | x) : θ ∈ {θ0, θ1} },

where L(θ | x) is the likelihood function and sup is the supremum function. Note that some references may use the reciprocal as the definition.[4] In the form stated here, the likelihood ratio is small if the alternative model is better than the null model, and the likelihood ratio test provides the decision rule as follows:

If Λ > c, do not reject H0;
If Λ < c, reject H0;
Reject H0 with probability q if Λ = c.

The values c and q are usually chosen to obtain a specified significance level α, through the relation q · P(Λ = c | H0) + P(Λ < c | H0) = α. The Neyman–Pearson lemma states that this likelihood ratio test is the most powerful among all level-α tests for this problem.[1]
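
As an illustration, here is a minimal sketch (a hypothetical setup, not from the article) of this decision rule for two simple hypotheses about the mean of a normal sample with known variance; the threshold c below is hypothetical and would in practice be chosen to give the desired significance level α.

import numpy as np
from scipy.stats import norm

def likelihood_ratio(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Lambda(x) = L(theta0 | x) / L(theta1 | x) for i.i.d. normal data."""
    log_l0 = norm.logpdf(x, loc=mu0, scale=sigma).sum()
    log_l1 = norm.logpdf(x, loc=mu1, scale=sigma).sum()
    return np.exp(log_l0 - log_l1)

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=30)   # hypothetical data, drawn under H1 here

c = 0.1                                       # hypothetical threshold
Lam = likelihood_ratio(x)
print(f"Lambda = {Lam:.3g};", "reject H0" if Lam < c else "do not reject H0")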

Definition (likelihood ratio test for composite hypotheses)
A null hypothesis is often stated by saying that the parameter θ is in a specified subset Θ0 of the parameter space Θ.

The likelihood function is L(θ | x) = f(x | θ) (with f being the pdf or pmf), which is a function of the parameter θ with x held fixed at the value that was actually observed, i.e., the data. The likelihood ratio test statistic is[5]

λ(x) = sup{ L(θ | x) : θ ∈ Θ0 } / sup{ L(θ | x) : θ ∈ Θ }.

Here, the notation sup refers to the supremum function.

A likelihood ratio test is any test with critical region (or rejection region) of the form {x : λ(x) ≤ c}, where c is any number satisfying 0 ≤ c ≤ 1. Many common test statistics such as the Z-test, the F-test, Pearson's chi-squared test and the G-test are tests for nested models and can be phrased as log-likelihood ratios or approximations thereof.
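
A minimal sketch of this definition (with hypothetical counts) for a one-parameter case: testing a binomial proportion, H0: p = 0.5, against an unrestricted alternative, where the supremum over the whole parameter space is attained at the usual estimate k/n. The comparison to a chi-squared distribution anticipates Wilks's theorem discussed below.

import math
from scipy.stats import binom, chi2

k, n = 62, 100        # hypothetical data: 62 successes in 100 trials
p0 = 0.5              # null parameter set Theta_0 = {0.5}
p_hat = k / n         # maximizes the likelihood over the whole space Theta = [0, 1]

lam = binom.pmf(k, n, p0) / binom.pmf(k, n, p_hat)   # 0 <= lambda(x) <= 1
stat = -2 * math.log(lam)                            # approximately chi-squared, 1 df
p_value = chi2.sf(stat, df=1)
print(f"lambda = {lam:.4f}, -2 log(lambda) = {stat:.3f}, p = {p_value:.4f}")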

Interpretation
Being a function of the data x, the likelihood ratio λ(x) is therefore a statistic. The likelihood ratio test rejects the null hypothesis if the value of this statistic is too small. How small is too small depends on the significance level of the test, i.e., on what probability of Type I error is considered tolerable ("Type I" errors consist of the rejection of a null hypothesis that is true).

The numerator corresponds to the maximum likelihood of an observed outcome under the null hypothesis. The denominator corresponds to the maximum likelihood of an observed outcome varying parameters over the whole parameter space. The numerator of this ratio is less than the denominator. The likelihood ratio hence is between 0 and 1. Low values of the likelihood ratio mean that the observed result was less likely to occur under the null hypothesis as compared to the alternative. High values of the statistic mean that the observed outcome was nearly as likely to occur under the null hypothesis as under the alternative, and the null hypothesis cannot be rejected.

Distribution: Wilks's theorem


If the distribution of the likelihood ratio corresponding to a particular null and alternative hypothesis can be explicitly determined then it can directly be used to form decision regions (to accept/reject the null hypothesis). In most cases, however, the exact distribution of the likelihood ratio corresponding to specific hypotheses is very difficult to determine. A convenient result, attributed to Samuel S. Wilks, says that as the sample size n approaches ∞, the test statistic −2 log(λ) for a nested model will be asymptotically χ²-distributed with degrees of freedom equal to the difference in dimensionality of Θ and Θ0.[6] This means that for a great variety of hypotheses, a practitioner can compute the likelihood ratio λ for the data and compare −2 log(λ) to the χ² value corresponding to a desired statistical significance as an approximate statistical test.

Wilks's theorem assumes that the true but unknown values of the estimated parameters are in the interior of the parameter space. This is commonly violated in, for example, random or mixed effects models when one of the variance components is negligible relative to the others. In some such cases, where one variance component is essentially zero relative to the others or the models are not properly nested, Pinheiro and Bates showed that the true distribution of this likelihood ratio chi-square statistic could be substantially different from the naive χ² distribution, often dramatically so.[7] The naive assumptions could give significance probabilities (p-values) that are far too large on average in some cases and far too small in others.

In general, to test random effects, they recommend using Restricted maximum likelihood (REML).
For fixed effects testing, they say, "a likelihood ratio test for REML fits is not feasible, because"
changing the fixed effects specification changes the meaning of the mixed effects, and the restricted
model is therefore not nested within the larger model.[8]

They simulated tests setting one and two random effects variances to zero. In those particular examples, the simulated p-values with k restrictions most closely matched a 50–50 mixture of χ²(k) and χ²(k − 1). (With k = 1, χ²(0) is 0 with probability 1. This means that a good approximation was 0.5 χ²(1).)

They also simulated tests of different fixed effects. In one test of a factor with 4 levels (degrees of freedom = 3), they found that a 50–50 mixture of χ²(3) and χ²(4) was a good match for actual p-values obtained by simulation -- and the error in using the naive χ²(3) "may not be too alarming."[9] However, in another test of a factor with 15 levels, they found a reasonable match to χ²(18) -- 4 more degrees of freedom than the 14 that one would get from a naive (inappropriate) application of Wilks's theorem, and the simulated p-value was several times the naive χ²(14) p-value. They conclude that for testing fixed effects, it is wise to use simulation. (And they provided a "simulate.lme" function in their "nlme" package for S-PLUS and R to support doing that.)

To be clear, these limitations on Wilks's theorem do not negate any power properties of a particular likelihood ratio test, only the use of a χ² distribution to evaluate its statistical significance.

Use
Each of the two competing models, the null model and the alternative model, is separately fitted to the data and the log-likelihood recorded. The test statistic (often denoted by D) is twice the log of the likelihoods ratio, i.e., it is twice the difference in the log-likelihoods:

D = −2 ln( likelihood for null model / likelihood for alternative model )
  = 2 × [ ln(likelihood for alternative model) − ln(likelihood for null model) ].

The model with more parameters (here alternative) will always fit at least as well as the model with fewer parameters (here null), i.e., it will have a greater or equal log-likelihood. Whether it fits significantly better and should thus be preferred is determined by deriving the probability or p-value of the difference D. Where the null hypothesis represents a special case of the alternative hypothesis, the probability distribution of the test statistic is approximately a chi-squared distribution with degrees of freedom equal to df_alt − df_null.[10] Symbols df_alt and df_null represent the number of free parameters of the alternative and null models, respectively.

Here is an example of use. If the null model has 1 parameter and a log-likelihood of −8024 and the alternative model has 3 parameters and a log-likelihood of −8012, then the probability of this difference is that of a chi-squared value of D = 2 × (−8012 − (−8024)) = 24 with 3 − 1 = 2 degrees of freedom, and is equal to about 6.1 × 10⁻⁶.[6] Certain assumptions must be met for the statistic to follow a chi-squared distribution, and often empirical p-values are computed.
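
A minimal sketch of this calculation, assuming SciPy's chi2 distribution is available:

from scipy.stats import chi2

loglik_null, loglik_alt = -8024.0, -8012.0
df = 3 - 1                                  # difference in number of free parameters

D = 2 * (loglik_alt - loglik_null)          # 24
p_value = chi2.sf(D, df)                    # about 6.1e-6
print(f"D = {D}, p = {p_value:.3g}")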

The likelihood-ratio test requires nested models, i.e. models in which the more complex one can be
transformed into the simpler model by imposing a set of constraints on the parameters. If the models
are not nested, then a generalization of the likelihood-ratio test can usually be used instead:
the relative likelihood.

Examples
Coin tossing
As an example, in the case of Pearson's test, we might try to compare two coins to determine whether they have the same probability of coming up heads. Our observation can be put into a contingency table with rows corresponding to the coin and columns corresponding to heads or tails. The elements of the contingency table will be the number of times the coin for that row came up heads or tails. The contents of this table are our observation X.

Here Θ consists of the possible combinations of values of the parameters p1H, p1T, p2H, and p2T, which are the probabilities that coins 1 and 2 come up heads or tails. In what follows, i = 1, 2 and j = H, T. The hypothesis space H is constrained by the usual constraints on a probability distribution, 0 ≤ pij ≤ 1 and piH + piT = 1. The space of the null hypothesis is the subspace where p1j = p2j. Writing nij for the observed number of times coin i came up j, and p̂ij for the best values for pij under the hypothesis H, the maximum likelihood estimate is given by

p̂ij = nij / (niH + niT).

Similarly, the maximum likelihood estimates of pij under the null hypothesis are given by

p̂j = (n1j + n2j) / (n1H + n1T + n2H + n2T),

which does not depend on the coin i.

The hypothesis and null hypothesis can be rewritten slightly so that they satisfy the constraints for the logarithm of the likelihood ratio to have the desired nice distribution. Since the constraint causes the two-dimensional H to be reduced to the one-dimensional H0, the asymptotic distribution for the test will be χ²(1), the χ² distribution with one degree of freedom.

For the general contingency table, we can write the log-likelihood ratio statistic as

−2 log Λ = 2 Σi,j nij log( nij / Eij ),

where Eij is the expected count in cell (i, j) under the null hypothesis.
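
A minimal sketch of this statistic for the two-coin table (hypothetical counts; the expected counts Eij are the maximum likelihood fits under the null hypothesis):

import numpy as np
from scipy.stats import chi2

# rows: coin 1, coin 2; columns: heads, tails (hypothetical observations)
observed = np.array([[43, 57],
                     [62, 38]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()   # fitted counts under the null

G = 2 * np.sum(observed * np.log(observed / expected))   # -2 log Lambda
p_value = chi2.sf(G, df=1)                               # one degree of freedom
print(f"G = {G:.3f}, p = {p_value:.4f}")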
