
This article was downloaded by: [University of Alberta] On: 7 January 2009 Access details: [subscription number 713587337]

Publisher: Informa Healthcare. Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Encyclopedia of Biopharmaceutical Statistics


Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

Analysis of 2 × K Tables
Shiva Gautam, Harvard Medical School, Boston, Massachusetts, U.S.A. Online Publication Date: 25 August 2004

To cite this Section: Gautam, Shiva (2004) 'Analysis of 2 × K Tables', Encyclopedia of Biopharmaceutical Statistics, 1:1, 1–7


Analysis of 2 × K Tables
Shiva Gautam
Harvard Medical School, Boston, Massachusetts, U.S.A.

INTRODUCTION

Data in 2 × K contingency tables are encountered quite frequently in biomedical, epidemiological, social, and behavioral studies. The variable representing the two rows is often called the row variable, whereas the variable representing the K columns is called the column variable. (Representation of a data set in either a 2 × K or a K × 2 table is just a matter of convenience.) Depending on the research design, both the row and the column variables may be outcome (response) variables, or only one of them may be an outcome variable. More specifically, an observation may have been simultaneously categorized into one of the two rows and into one of the K column categories, or an observation may have been first drawn from a given classification of one of the variables (row or column) and then classified into one of the categories of the other variable (column or row).

For example, without taking into account the pros and cons of study designs, consider a possible study to evaluate the association between smoking and lung cancer. The investigator may choose a design in which he/she first selects two groups of people according to whether or not they have cancer. Then each subject is classified into one of the smoking history categories (e.g., nonsmoker, light smoker, heavy smoker, etc.). Similarly, the investigator may first select people according to smoking status, and then classify each subject from each smoking group according to whether or not he/she has lung cancer. Finally, the investigator may select a fixed number of subjects and then simultaneously classify them into one of the two lung cancer categories and into one of the several smoking categories.

In many situations, the same computational procedures can be used to analyze the data regardless of the study design. In the analysis of 2 × K tables it is important to distinguish between a nominal table and an ordinal table.
The data from the lung cancer and smoking study alluded to above give rise to an ordinal 2 × K table, as the columns of the table (e.g., nonsmoker, light smoker, heavy smoker, etc.) follow an (increasing) ordering or hierarchy, in the sense that any one category is either at a higher level or at a lower level than any of the remaining categories. Such an ordering among categories is sometimes called simple ordering. A 2 × K table with no ordering, in the sense that no category is at a higher or a lower level than any of the remaining categories, is called a nominal table. For example, a table showing low-birthweight babies (low birthweight = yes or no) and ethnicity (Asians, Blacks, Hispanics, Whites, etc.) is an example of a 2 × K nominal table. This paper presents some existing methods of analyzing data in 2 × K nominal and ordinal tables, and then discusses some recently developed methods for 2 × K ordinal tables as an extension of the Pearson chi-square test.

Encyclopedia of Biopharmaceutical Statistics DOI: 10.1081/E-EBS-120023105 Copyright © 2004 by Marcel Dekker, Inc. All rights reserved.

ANALYSIS OF 2 × K NOMINAL TABLES

Let $n_{ij}$ denote the number of observations in the $i$th row ($i = 1, 2$) and $j$th column ($j = 1, 2, \ldots, K$), as displayed in Table 1. Also, let $n_{i+} = \sum_{j=1}^{K} n_{ij}$, $n_{+j} = \sum_{i=1}^{2} n_{ij}$, and $n = \sum_{i=1}^{2}\sum_{j=1}^{K} n_{ij} = \sum_{i=1}^{2} n_{i+} = \sum_{j=1}^{K} n_{+j}$. The columns of the table, for the time being, are assumed to be nominal.

Table 1 A 2 × K contingency table

         Column
Row      1          2          ...   K          Total
1        $n_{11}$   $n_{12}$   ...   $n_{1K}$   $n_{1+}$
2        $n_{21}$   $n_{22}$   ...   $n_{2K}$   $n_{2+}$
Total    $n_{+1}$   $n_{+2}$   ...   $n_{+K}$   $n$

Pearson's Chi-Square Test

Perhaps the most popular method for analyzing 2 × K nominal tables is Pearson's chi-square procedure, introduced by Karl Pearson in 1900. The Pearson chi-square test statistic is defined as

$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{K} \frac{(n_{ij} - \hat{y}_{ij})^2}{\hat{y}_{ij}}, \tag{1}$$

where $\hat{y}_{ij} = n_{i+} n_{+j} / n$. The statistic $X^2$ is asymptotically distributed as a chi-square variate with $(K - 1)$ degrees of freedom. A large value of $X^2$ provides evidence against the null hypothesis. The null hypothesis is often stated as "there is no association between the row and column variables." Depending on the research question, the null hypothesis could be that the distribution of proportions in each row (two populations) is the same, or that the column proportions (K populations) are the same. As usual, the decision to reject (or not to reject) the null hypothesis is based on the p-value. Agresti[1] is an excellent source on chi-square analysis of two-way nominal categorical tables.

Likelihood Ratio Chi-Square
The likelihood ratio chi-square statistic is also used to make inferences from a nominal contingency table. It is defined as

$$G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{K} n_{ij} \log\left(n_{ij}/\hat{y}_{ij}\right), \tag{2}$$

where $\hat{y}_{ij} = n_{i+} n_{+j} / n$. For large $n$, $G^2$ also has a chi-square distribution with $(K - 1)$ degrees of freedom. Hence both $X^2$ and $G^2$ analyses of a given data set in a 2 × K nominal table will generally yield similar results for large $n$.

Logistic Regression

When the two rows of a 2 × K table represent a response, logistic regression may be used to analyze the data by modeling the probability of response (e.g., present vs. absent). Let $p$ = probability of response in the first row. Define dummy variables $X_2, X_3, \ldots, X_K$ such that $X_j = 1$ if the observation is from the $j$th category ($j = 2, 3, \ldots, K$), and $X_j = 0$ otherwise. The logistic regression model can be represented as

$$\operatorname{logit}(p) = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \cdots + \hat{\beta}_K X_K, \tag{3}$$

where $\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right)$.

Note that $\hat{\beta}_1$ is the log odds of responding in row 1 from column 1 (the reference column), or equivalently when $X_2, X_3, \ldots, X_K$ all equal 0 (and the constant term $X_1$ equals 1). In other words, $\hat{\beta}_1 = \log(n_{11}/n_{21})$. From the above equation, $\hat{\beta}_j$ is the excess of the log odds of responding in row 1 due to the $j$th column over the log odds due to the first column. In other words, $\exp(\hat{\beta}_j)$ represents an odds ratio (the odds of response due to the $j$th column compared to the odds of response due to the first column).

[Note: Suppose $p_1$ denotes the probability of lung cancer for a smoker, and $p_2$ denotes the probability of lung cancer for a nonsmoker. Then the odds in favor of lung cancer for a smoker are given by $p_1/(1 - p_1)$, and the odds in favor of lung cancer for a nonsmoker by $p_2/(1 - p_2)$. The ratio $p_1(1 - p_2)/[p_2(1 - p_1)]$ is referred to as the odds ratio, which is often used as a direct/indirect measure of the relative risk of a disease (cancer) with an exposure (smoking) relative to the same disease without the exposure. Sample odds ratios are calculated using the observed proportions instead of probabilities.]

One of the advantages of using logistic regression is that it quantifies the magnitude of association. Furthermore, the effect of any additional variables can be adjusted for in the model. For example, while evaluating the association of low birthweight (yes, no) and race (Blacks, Hispanics, Whites, Others), investigators may want to adjust for the effect of a covariate (e.g., weight of mothers). Because of these and some other desirable properties, and the ease of interpretation of coefficients, the logistic regression procedure[2,3] is widely used to model binary response data from biomedical and other studies.

Maximal Correlation and Pearson's Chi-Square

Consider Table 1 as a sample from a bivariate distribution of a row variable $U$ and a column variable $V$. Let $U = 1$ if an observation is classified into the first row, and $U = 0$ if an observation is classified into the second row. Let $V = s_j$ if an observation is classified into the $j$th category ($j = 1, 2, \ldots, K$), where $s_j$ is a real number. The value $s_j$ taken by the variable $V$ corresponding to the $j$th column of Table 1 will be referred to as a score hereafter. Let $r^2\{s_1, s_2, \ldots, s_K\}$ denote the square of Pearson's correlation between $U$ and $V$ for a given set of scores $\{s_1, s_2, \ldots, s_K\}$. Let $r^2_{\max}$ denote the maximum of $r^2\{s_1, s_2, \ldots, s_K\}$ over all possible sets of scores. Then it can be shown[4,5] that

$$X^2 = n\,r^2_{\max}, \tag{4}$$

where $X^2$ is Pearson's chi-square statistic. It can be shown that the maximum $r^2_{\max}$ is attained by the set of scores $\{n_{11}/n_{+1}, n_{12}/n_{+2}, \ldots, n_{1K}/n_{+K}\}$ or by any set of scores obtained from a linear transformation of these scores.[5] If $r_{\max} = \sqrt{r^2_{\max}}$, then it can be shown that $r^2_{\max} = (r_{\max})^2 = (r_{\min})^2$, where $r_{\min} = -r_{\max}$. Thus $r_{\max}$ is the maximum possible correlation between the row and column variables, and is always nonnegative.

As mentioned earlier, $X^2$ is simply a significance test and does not provide information on the magnitude of association between the row and column variables. As evident from Eq. 4, a large value of chi-square may result if there is a large sample size even when the association is weak. However, the maximum possible correlation $r_{\max} = \sqrt{X^2/n}$ can be used as a measure of association, as it meets several criteria outlined by Goodman and Kruskal[6] for a measure of association. As $\sqrt{X^2/n}$ is also equal to Cramér's V and the $\phi$ coefficient, the maximal correlation gives a new meaning to these quantities, which are incorporated into the output of some statistical packages. Note that $0 \le r_{\max} \le 1$, and $r_{\max} = 1$ if and only if the observations in each column are contributed by only one of the two rows.[5]

Regression and Pearson's Chi-Square

Consider $U$ and $V$ as the row and column variables of Table 1. Furthermore, define $K - 1$ dummy variables $X_2, X_3, \ldots, X_K$ corresponding to the second, third, ..., and $K$th category, respectively (these are the same variables defined earlier in the context of logistic regression). Consider the following fitted line from the linear regression of $U$ on $X_2, X_3, \ldots, X_K$:

$$\hat{U} = \hat{\alpha}_1 + \hat{\alpha}_2 X_2 + \hat{\alpha}_3 X_3 + \cdots + \hat{\alpha}_K X_K. \tag{5}$$

If $R$ denotes the multiple correlation coefficient, then it can be shown that[5]

$$nR^2 = X^2. \tag{6}$$

It follows from Eqs. 4 and 6 that $r^2_{\max} = R^2$. It may be desirable to collapse the columns of a 2 × K table without losing much information.[7] Gautam and Kimeldorf showed that two columns of a 2 × K table can be collapsed into one if the corresponding regression coefficients are equal.[5] Similarly, if a regression coefficient equals zero, then the corresponding column can be collapsed into the column representing the intercept. Therefore, one can test the hypothesis that only a given subset of categories is responsible for the association between the variables. A post hoc analysis may also be performed to find the reduced table obtained by collapsing categories. This is analogous to testing a hypothesis about a subset of regression parameters in the multiple linear regression setting.

Exact Tests

Inferences drawn from the above-described analyses of 2 × K tables are based on large-sample theory. When the entries $n_{ij}$ of Table 1 are small, the p-value is obtained directly. For example, exact tests such as Fisher's exact test for 2 × 2 tables can be extended to a 2 × K table. An exact test for a 2 × K table is often performed by generating all possible tables, or by randomly generating some large number of tables (e.g., 10,000 tables), with the same marginal totals as the observed table, assuming the null hypothesis is true. The p-value is the proportion of tables yielding values of a statistic (e.g., $X^2$) that are equal to or larger than the value of the same statistic obtained from the observed table.

An Example

Consider Table 2 from Helmes and Fekken.[8] The table is also reproduced in Agresti.[1] The table classifies psychiatric patients by their diagnosis and by whether their treatment prescribed drugs. The Pearson chi-square statistic from Table 2 is $X^2 = 84.180$ ($p < 0.0001$, df = 4). This suggests an association between the diagnosis and whether or not a patient's treatment prescribed drugs. Table 2 shows that a schizophrenic patient is most likely to be treated with drugs, followed by a patient diagnosed with active disorder. Patients with neurosis or personality disorder have an almost 50% chance of being prescribed drugs, whereas a patient classified as having special symptoms is not likely to be treated with a drug. The Pearson chi-square test rejects the hypothesis that these proportions are the same in the population. In other words, there seems to be an association between the diagnosis and whether or not the treatment prescribed a drug. The last row of the table shows the odds ratio of being prescribed a drug for a given diagnosis, compared to the odds of being prescribed a drug if a patient is diagnosed as schizophrenic.
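As a rough illustration, the statistics of this section can be computed for the counts of Table 2 with a few lines of Python. This is only a sketch: the function names (`pearson_x2`, `likelihood_ratio_g2`, `max_correlation`, `monte_carlo_p_value`) are hypothetical and not from the article, and the randomization routine is a simple sampled version of the exact test described above, not a full enumeration of tables.

```python
import math
import random


def expected_counts(table):
    """Expected counts n_i+ * n_+j / n under the hypothesis of no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson chi-square statistic X^2."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def likelihood_ratio_g2(table):
    """Likelihood ratio statistic G^2; empty cells contribute zero."""
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row) if o > 0)


def max_correlation(table):
    """Maximal correlation r_max = sqrt(X^2 / n)."""
    n = sum(sum(row) for row in table)
    return math.sqrt(pearson_x2(table) / n)


def monte_carlo_p_value(table, n_tables=2000, seed=0):
    """Proportion of random tables with the observed margins whose X^2 is
    at least as large as the observed X^2."""
    rng = random.Random(seed)
    k = len(table[0])
    # One column label per observation, row 1 first and then row 2.
    labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    n_row1 = sum(table[0])
    observed = pearson_x2(table)
    hits = 0
    for _ in range(n_tables):
        # Shuffling the labels and cutting after n_1+ keeps both margins fixed.
        rng.shuffle(labels)
        row1, row2 = [0] * k, [0] * k
        for position, j in enumerate(labels):
            if position < n_row1:
                row1[j] += 1
            else:
                row2[j] += 1
        if pearson_x2([row1, row2]) >= observed - 1e-9:
            hits += 1
    return hits / n_tables


# Counts from Table 2: drugs (row 1) and no drugs (row 2) by diagnosis.
table2 = [[105, 12, 18, 47, 0], [8, 2, 19, 52, 13]]
print(round(pearson_x2(table2), 2), round(likelihood_ratio_g2(table2), 2))
print(round(max_correlation(table2), 4), monte_carlo_p_value(table2))
```

The computed Pearson statistic should be close to the value of 84.180 quoted in the text, with small differences in the last digits possible. The sketch assumes every column total is positive, so that all expected counts are nonzero.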

Table 2 Diagnosis of patients and whether their treatment prescribed drugs

              Diagnosis
Treatment     Schizophrenia   Active disorder   Neurosis   Personality disorder   Special symptoms   Total
Drugs         105             12                18         47                     0                  182
No drugs      8               2                 19         52                     13                 94
Total         113             14                37         99                     13                 276
% Drugs       92.92           85.71             48.65      47.47                  0
Odds ratio    1.0             0.46              0.07       0.07                   0

Source: Ref. [8].
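The odds ratios in the last row of Table 2 can be reproduced directly from the cell counts. Because the logistic model with K − 1 column dummies has one parameter per column, its estimates have a closed form: the constant is the log odds in the reference column, and each coefficient is a difference of log odds. The sketch below assumes this closed form and the usual large-sample standard errors. The function name `saturated_logit_fit` is hypothetical. For a column with an empty cell the estimate diverges, so the sketch returns an infinite coefficient rather than the large finite value that iterative software may report.

```python
import math


def saturated_logit_fit(row1, row2):
    """Closed-form estimates for the logistic model with K - 1 column dummies.

    Column 1 is the reference. Returns (name, coefficient, standard error,
    exp(coefficient)) tuples, with the constant first.
    """
    def log_odds(a, b):
        # An empty cell pushes the estimate to minus or plus infinity.
        if a == 0:
            return -math.inf
        if b == 0:
            return math.inf
        return math.log(a / b)

    def variance(a, b):
        # Large-sample variance of a log odds; infinite when a cell is empty.
        return 1 / a + 1 / b if a > 0 and b > 0 else math.inf

    base = log_odds(row1[0], row2[0])
    base_var = variance(row1[0], row2[0])
    fits = [("Constant", base, math.sqrt(base_var), math.exp(base))]
    for j in range(1, len(row1)):
        beta = log_odds(row1[j], row2[j]) - base
        se = math.sqrt(base_var + variance(row1[j], row2[j]))
        fits.append((f"X{j + 1}", beta, se, math.exp(beta)))
    return fits


# Drugs (row 1) and no drugs (row 2), columns in the order of Table 2.
for name, beta, se, odds_ratio in saturated_logit_fit([105, 12, 18, 47, 0],
                                                      [8, 2, 19, 52, 13]):
    print(f"{name:8s} {beta:8.3f} {se:8.3f} {odds_ratio:8.3f}")
```

The sketch assumes the reference column has no empty cell. For the first four diagnoses the printed coefficients and standard errors should agree with those reported for the logistic regression in Table 3.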


Logistic Regression

Let

$$X_2 = \begin{cases} 1, & \text{if Active Disorder} \\ 0, & \text{otherwise} \end{cases} \qquad X_3 = \begin{cases} 1, & \text{if Neurosis} \\ 0, & \text{otherwise} \end{cases}$$

$$X_4 = \begin{cases} 1, & \text{if Personality Disorder} \\ 0, & \text{otherwise} \end{cases} \qquad X_5 = \begin{cases} 1, & \text{if Special Symptoms} \\ 0, & \text{otherwise} \end{cases}$$

Table 3 shows the results from the logistic regression model presented in Eq. 3. A quick inspection of Table 3 reveals that the entries in the last column are the odds ratios given in the last row of Table 2, except for the entry corresponding to the constant. The entry corresponding to the constant is simply the odds of being prescribed a drug in the schizophrenia category (105/8 = 13.125). Logistic regression analysis shows that at least one odds ratio is different from one, thus indicating an association between the row and column variables. Some caution must be taken while analyzing data in tables with zero entries. A small number is often added to such entries. As an alternative, exact logistic regression analysis can be performed.

A large value of chi-square is indicative of an association between treatment prescribing drugs and diagnosis, but it does not indicate whether the statistical significance is a reflection of a strong association or an artifact of a weaker association and a large sample size. As discussed above, the maximal correlation could shed further light on this association. From the relationship between the Pearson chi-square and $r_{\max}$, we have

$$r_{\max} = \sqrt{\frac{X^2}{n}} = \sqrt{\frac{84.180}{276}} = 0.5523.$$

This shows that the observed significant association is not solely due to a large sample size.

A natural question that may arise is whether some of the categories can be combined without losing much information.[7] An investigator may be further interested in determining whether this association is mostly due to only a few selected categories of the table. Gautam and Kimeldorf[5] used a multiple regression procedure to compute the contribution of a category to the total association, noting that $r_{\max}$ is equal to the multiple correlation coefficient $R$ obtained from the regression of the row variable $U$ ($U = 1$ if first row and $U = 0$ if second row) on the dummy variables $X_2$, $X_3$, $X_4$, and $X_5$ defined earlier. They showed that if a 2 × 3 table is obtained by collapsing columns 1 and 2 together, collapsing columns 3 and 4 together, and leaving column 5 as it is, then the maximal correlation from this reduced table is 0.5511 (chi-square = 83.824), which is about 99.8% of the maximal correlation from the original 2 × 5 table. If the table is reduced to a 2 × 2 table by collapsing the last four columns into one column, then the maximal correlation from this 2 × 2 table is 0.4740 (chi-square = 62.011). The reduction in the correlation is 0.0783, and the corresponding reduction in chi-square is 22.179 (p-value < 0.001). Furthermore, Gautam and Kimeldorf argue that this significance is due to the sample size rather than an indication of a weakening of the association due to the collapsing of categories.[5] The two columns of this table classify patients into schizophrenia and nonschizophrenia groups, and the 2 × 2 table still retains about 86% of the information provided by the original 2 × 5 table. Hence the association between the row and column variables in the original table is basically the association between a diagnosis of schizophrenia (yes or no) and whether the treatment prescribed a drug (yes or no). In terms of odds ratios, the odds of being on prescription drugs for a person diagnosed with schizophrenia are 14.66 times the odds of being on prescription drugs for a person diagnosed with a nonschizophrenia category (95% CI: 6.71–32.04).
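The collapsing argument can be checked numerically. The sketch below merges columns of the Table 2 counts, recomputes the maximal correlation for each reduced table, and computes the odds ratio for the 2 × 2 schizophrenia versus nonschizophrenia table with a Wald-type 95% confidence interval. The helper names (`collapse`, `odds_ratio_ci`) and the choice of a Wald interval on the log scale are assumptions made for illustration; the article does not state how its interval was obtained.

```python
import math


def pearson_x2(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(row_totals)
               for j, c in enumerate(col_totals))


def max_correlation(table):
    """Maximal correlation r_max = sqrt(X^2 / n)."""
    n = sum(sum(row) for row in table)
    return math.sqrt(pearson_x2(table) / n)


def collapse(table, groups):
    """Merge columns; `groups` lists the 0-based column indices of each new column."""
    return [[sum(row[j] for j in group) for group in groups] for row in table]


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Wald interval computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            math.exp(math.log(odds_ratio) - z * se),
            math.exp(math.log(odds_ratio) + z * se))


table2 = [[105, 12, 18, 47, 0], [8, 2, 19, 52, 13]]
table_2x3 = collapse(table2, [[0, 1], [2, 3], [4]])
table_2x2 = collapse(table2, [[0], [1, 2, 3, 4]])

full = max_correlation(table2)
for reduced in (table_2x3, table_2x2):
    r = max_correlation(reduced)
    print(reduced, round(pearson_x2(reduced), 3), round(r, 4), round(r / full, 3))

# Drugs vs. no drugs for schizophrenia (105, 8) and all other diagnoses (77, 86).
print([round(v, 2) for v in odds_ratio_ci(105, 8, 77, 86)])
```

The retained fractions should come out near 99.8% and 86%, and the odds ratio near 14.66. The chi-square values computed this way may differ from the published ones in the second decimal place.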

Table 3 Results from logistic regression on data in Table 2

Variable    $\hat{\beta}$   SE($\hat{\beta}$)   p-value   Exp($\hat{\beta}$)
X2          -0.783          0.847               0.356     0.457
X3          -2.629          0.493               0.000     0.072
X4          -2.676          0.418               0.000     0.069
X5          -23.777         11,147.524          0.998     0.000
Constant    2.575           0.367               0.00      13.125

ANALYSIS OF 2 × K ORDERED TABLES

Pearson's chi-square procedure and other tests developed for analyzing data in 2 × K nominal tables do not incorporate the information on ordering among the columns of the table. These tests are not directed toward any specific alternative hypothesis. In analyzing data in a 2 × K ordered table, investigators will obviously want to use as much of the information provided by the data as possible, and will also often want to determine whether the null hypothesis can be rejected against a specific alternative hypothesis (e.g., increasing response with the columns). A test that utilizes ordering information will have increased power compared to a test for nominal tables.[1] Methods for analyzing data in 2 × K ordered tables may be broadly classified into two groups, namely, methods that assign and methods that do not assign numerical scores to the ordered categories. Methods that do assign numerical scores to the ordered categories may further be divided into two subgroups. In the first subgroup of methods, scores are chosen a priori and the analysis is carried out using these scores, whereas in the second subgroup, scores are extracted from the data and thus are functions of the observed data.

Methods Without Scores

One of the widely used methods for analyzing data in 2 × K ordered tables with the two rows representing two populations is the Mann–Whitney or, equivalently, the Wilcoxon rank sum procedure.[9] Dykstra et al.[10] proposed a likelihood ratio statistic for testing whether one population with ordered outcome is larger than the other in the sense of likelihood ratio ordering. The proportional odds model is another method used to analyze 2 × K ordered data without assigning scores to ordinal categories.[11]

Methods with Scores

Several methods of data analysis for 2 × K ordered tables assign order-preserving scores. The Cochran–Armitage–Mantel trend test is widely used to evaluate the trend in proportions.[12–14] This trend test requires assignment of order-preserving scores. Another widely used method that utilizes order-preserving scores is logistic regression. The scores are chosen a priori by the investigator, and often equally spaced scores (e.g., 1, 2, 3, ..., K) are assigned to the ordered categories. In some situations, the ordered categories are defined by actual intervals or actual quantities (e.g., dose level), in which case the scores may be chosen as the mid-values of the intervals or the numbers used to define the categories. The trend test is simply a test of significance and does not provide the magnitude of the association. The slope from the logistic regression is a function of the odds ratio [exp(slope) = odds ratio].[2,3] It is worth noting that the test statistics for the trend test and logistic regression are equivalent for a given set of scores.
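For a given set of scores, the trend statistic can be written as $n r^2$, where $r$ is the Pearson correlation between the row indicator $U$ and the column scores (this relation is stated later in this section). The sketch below computes it for the maternal alcohol data of Table 4 under two score choices. The function names are hypothetical, and the p-value helper uses the large-sample chi-square approximation with 1 degree of freedom, so its output need not coincide with exact p-values.

```python
import math


def score_correlation(table, scores):
    """Pearson correlation between U (1 for row 1, 0 for row 2) and the column scores."""
    row1, row2 = table
    col_totals = [a + b for a, b in zip(row1, row2)]
    n = sum(col_totals)
    mean_u = sum(row1) / n
    mean_v = sum(c * s for c, s in zip(col_totals, scores)) / n
    cov = sum(a * s for a, s in zip(row1, scores)) / n - mean_u * mean_v
    var_u = mean_u * (1 - mean_u)
    var_v = sum(c * s * s for c, s in zip(col_totals, scores)) / n - mean_v ** 2
    return cov / math.sqrt(var_u * var_v)


def trend_statistic(table, scores):
    """Trend statistic n * r^2 for the given column scores."""
    n = sum(sum(row) for row in table)
    return n * score_correlation(table, scores) ** 2


def chi2_1df_p_value(x):
    """Upper tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Table 4: malformation absent (row 1) and present (row 2) by drinks per day.
table4 = [[17066, 14464, 788, 126, 37], [48, 38, 5, 1, 1]]
for scores in ([1, 2, 3, 4, 5], [0, 0.5, 1.5, 4.0, 7.0]):
    stat = trend_statistic(table4, scores)
    print(scores, round(stat, 2), round(chi2_1df_p_value(stat), 3))
```

Because $r^2$ is unchanged by any linear transformation of the scores, only the relative spacing of the scores matters. This is why the two score sets above can lead to quite different conclusions for the same table.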
Hence the p-value from the trend test under the null hypothesis of no trend and the p-value for the logistic regression under the null hypothesis of a unit odds ratio are the same. When the two rows represent two independent samples, one can compare the row means for a given set of scores using Student's t-test. Again, the p-value from the t-test will also be equal to the p-value from the trend test or the logistic regression if one were to use these methods. Finally, the test of zero (Pearson) correlation between the row and column variables would also yield the same p-value. Let the row variable be denoted by $U$ and the column variable by $V$, with the category scores as its values. If $r = \operatorname{corr}(U, V)$ is the correlation between $U$ and $V$ for a given set of order-preserving scores, then the test statistic for the trend test and logistic regression is equal to $nr^2$, and the test statistic for the t-test is $t = \sqrt{n}\,r/\sqrt{1 - r^2}$ (where $n$ is the number of observations). Therefore, if linear regression instead of logistic regression is used, then one would obtain the same value of the test statistic and the same p-value while testing the null hypothesis of zero slope. Although the research questions being asked and the designs generating a 2 × K ordered table may be different, a common computational procedure may be employed to obtain the statistical significance. Graubard and Korn listed several equivalent test statistics for a given set of scores.[15]

Choice of Scores

As discussed, several methods (e.g., the trend test) of analyzing data in 2 × K ordered tables use order-preserving scores. Even the Mann–Whitney (or Wilcoxon rank sum) test, which apparently is a non-score-based method, is equivalent to a score-based method: the t-test with midranks as category scores is equivalent to the Wilcoxon rank sum test. Cochran noted that any set of scores gives a valid test.[12] However, different sets of scores may yield different results.[15]

Iso-Chi-Square Test

Gautam et al. proposed the Iso-chi-square test, which is an extension of Pearson's chi-square test.[16] This test also addresses the issue of arbitrariness of the scores assigned to the columns of 2 × K ordered tables. It was shown earlier that the Pearson chi-square test statistic is equal to $n r^2_{\max}$, where the maximum was taken over all possible column scores (with no restriction on the scores). The Iso-chi-square statistic is given by the same expression, but the maximum is taken over all possible order-preserving scores. There is a closed-form solution for the maximal scores, and it is given by isotonic regression. However, it is not necessary to extract the maximal scores. The Iso-chi-square is equal to the Pearson chi-square obtained from either the original 2 × K table or from a reduced table obtained by collapsing certain adjacent categories.
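The collapsing step can be sketched with the pool-adjacent-violators algorithm, a standard way of computing an isotonic regression. The code below assumes that the alternative of interest is a nondecreasing row-1 proportion across the ordered columns; whether the published procedure also considers the opposite direction is not settled here. The function names are hypothetical, and the sketch returns only the statistic, not its p-value.

```python
def pearson_x2(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(row_totals)
               for j, c in enumerate(col_totals))


def pool_adjacent_violators(row1, row2):
    """Pool adjacent columns until the row-1 proportion is nondecreasing.

    Returns the pooled columns as [row-1 count, row-2 count] pairs.
    """
    blocks = []
    for a, b in zip(row1, row2):
        blocks.append([a, b])
        # Merge while the previous block has a larger row-1 proportion than
        # the last one (compared by cross-multiplication of integer counts).
        while len(blocks) > 1:
            (a1, b1), (a2, b2) = blocks[-2], blocks[-1]
            if a1 * (a2 + b2) <= a2 * (a1 + b1):
                break
            blocks.pop()
            blocks[-1] = [a1 + a2, b1 + b2]
    return blocks


def iso_chi_square(table):
    """Pearson chi-square of the table reduced by pooling adjacent violators."""
    blocks = pool_adjacent_violators(table[0], table[1])
    reduced = [[a for a, _ in blocks], [b for _, b in blocks]]
    return pearson_x2(reduced)


# Table 4 with "malformation present" as row 1, so that the row-1 proportion
# is expected to increase with alcohol consumption.
table4 = [[48, 38, 5, 1, 1], [17066, 14464, 788, 126, 37]]
print(pool_adjacent_violators(*table4))
print(round(iso_chi_square(table4), 2), round(pearson_x2(table4), 2))
```

For these data only the first two columns are pooled, so the reduced table has four columns. In general the statistic computed this way cannot exceed the unrestricted Pearson chi-square, and it equals the Pearson chi-square when the observed proportions are already in order.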
The null distribution of the Iso-chi-square therefore is given by a mixture of chi-square distributions with 1 to K − 1 degrees of freedom. Exact p-values can be obtained by generating tables with the given marginal totals. Gautam et al. listed 5% and 1% cut-off values for several values of K.[16] Iso-chi-square also addresses the issue of arbitrary assignment of scores to the ordered categories, as it reports the maximal value of the test statistic. In cases where there is no clear indication of what scores are to be used, Agresti[1] suggests a sensitivity analysis using a few sets of scores. Iso-chi-square in effect considers all possible sets of order-preserving scores. It is obvious that Iso-chi-square is also related to the maximal t-statistic and the maximal trend statistic. Iso-chi-square, which utilizes all possible scores, is equivalent to a method proposed by Dykstra et al.[10] that does not utilize order-preserving scores, and thus links traditional statistical methods with correspondence analysis or dual scaling.[17] It was also pointed out in an earlier section that the Pearson chi-square test statistic is equivalent to the maximal correlation obtained without order restriction on the scores.

An Example

Consider Table 4, which classifies maternal drinking and congenital sex organ malformation of babies.[15] If the two-sample Wilcoxon rank sum test is used, then the p-value = 0.56, which is also the p-value from the trend test with midranks as category scores. If equally spaced scores {1, 2, 3, 4, 5} are used, then the p-value = 0.20 (from the trend test, t-test, logistic regression, linear regression, and correlation analysis). In an example such as this, perhaps the mid-values of the intervals represent the underlying continuous measure. Graubard and Korn used scores of 0, 0.5, 1.5, 4.0, and 7 (somewhat arbitrary), which yield a p-value equal to 0.01.[15] Iso-chi-square analysis for this data set yields a p-value of 0.02. These are exact p-values.

Table 4 Maternal alcohol consumption and congenital sex organ malformation of the children

                 Alcohol consumption (average number of drinks per day)
Malformation     0         <1        1–2     3–5     ≥6
Absent           17,066    14,464    788     126     37
Present          48        38        5       1       1

Stochastic Ordering

Stochastic ordering, in the context of a 2 × K ordered table, is defined as the cumulative distribution function (CDF) of one of the rows not crossing the distribution function of the other. In terms of the entries of Table 1, $F_{1j} = \sum_{t=1}^{j} n_{1t}/n_{1+}$ and $F_{2j} = \sum_{t=1}^{j} n_{2t}/n_{2+}$. If $F_{2j} \le F_{1j}$ for all $j = 1, 2, \ldots, K$, then row 2 is stochastically larger than row 1 (in the observed sample data). It is interesting to note that row 2 is stochastically larger than row 1 if and only if all order-preserving scores yield a larger (or equal) mean for row 2 than the corresponding mean for row 1. Kimeldorf et al. computed the minimum and maximum t-statistics, and argued that if the minimum t-statistic is positive and significant (or the maximum t-statistic is negative and significant), then any set of order-preserving scores will produce a significant result.[18] This could be used as evidence of stochastic ordering in a given 2 × K table. Berger et al. proposed the convex hull test for ordered categorical data.[19]

CONCLUSION

In this article some existing methods of analyzing data in 2 × K (K > 2) contingency tables are discussed. Pearson's chi-square test statistic, which is widely used to analyze nominal data, is shown to be related to the maximal correlation. Using the maximal correlation, an investigator may determine whether only a few categories contribute to the observed association. This relationship between the chi-square and the maximal correlation may also shed light on whether a large value of the chi-square test statistic is only due to a large sample size.

The paper also discusses methods of analysis for 2 × K ordered tables. Some of these methods use order-preserving scores and others do not. Several of the methods that utilize scores are equivalent to each other. As these methods are directed toward a particular alternative hypothesis, they have more power in general than the methods that do not utilize such scores. Also, these methods are computationally simple. However, the scores chosen are often arbitrary. In many situations the columns may provide some indication (e.g., interval, actual dose of a drug, etc.) of which scores make sense. But in a situation where the columns are defined as low, medium, and high, it may be difficult to come up with a set of scores. In such situations, the Iso-chi-square method may be useful. Iso-chi-square may be considered a natural extension of the Pearson chi-square to the 2 × K ordered table, in the sense that if this procedure is applied to 2 × K nominal tables, the test statistic is Pearson's chi-square test statistic. Also, Iso-chi-square may be considered a link between methods that do and do not utilize order-preserving scores.

All the 2 × K tables discussed here are assumed to have simple ordering. There may be other types of tables where the ordering between two categories is not simple. For example, parental drinking or smoking may be classified as neither parent, mother only, father only, and both parents. The first and the last categories each have a distinct place in the hierarchy relative to any other category. However, no such hierarchy is defined between the second and the third categories. Similarly, some 2 × K tables may have mixed categories (both nominal and ordinal categories) or may have open-ended categories.[20,21] The method of Iso-chi-square may be extended in such situations. Gautam presented a case where the choice of an open-ended category may influence the statistical conclusion in the context of the trend test and argued that such an analysis may be misleading.[22]

ACKNOWLEDGMENTS

The author would like to express his sincere thanks to Roger Davis, ScD, for his valuable comments. This research was supported in part by grant RR 01032 to the Beth Israel Deaconess Medical Center General Clinical Research Center from the National Institutes of Health.

REFERENCES

1. Agresti, A. Categorical Data Analysis, 2nd Ed.; Wiley: New York, 2002.
2. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd Ed.; Wiley: New York, 2000.
3. Collett, D. Modeling Binary Data; Chapman & Hall: London, 1991.
4. Haberman, S.J. Test for independence in two-way contingency tables based on canonical correlation and linear-by-linear interaction. Ann. Stat. 1981, 9, 1178–1186.
5. Gautam, S.; Kimeldorf, G. Some results on the maximal correlation in 2 × K contingency tables. Am. Stat. 1999, 53 (4), 336–341.
6. Goodman, L.A.; Kruskal, W.H. Measures of association for cross-classifications. J. Am. Stat. Assoc. 1954, 49, 732–764.
7. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; The MIT Press: Cambridge, 1995.
8. Helmes, E.; Fekken, G.C. Effects of psychotropic drugs and psychiatric illness on vocational aptitude and interest assessment. J. Clin. Psychol. 1986, 42, 569–576.
9. Conover, W.J. Practical Nonparametric Statistics, 2nd Ed.; Wiley: New York, 1980.
10. Dykstra, R.; Kocher, S.; Robertson, T. Inference for likelihood ratio ordering in the two sample problem. J. Am. Stat. Assoc. 1995, 90, 1034–1040.
11. McCullagh, P. Regression models for ordinal data. J. R. Stat. Soc., Ser. B 1980, 42, 109–142.
12. Cochran, W.G. Some methods of strengthening the common chi-square test. Biometrics 1954, 10, 417–451.
13. Armitage, P. Tests for linear trend in proportion and frequency. Biometrics 1955, 11, 375–386.
14. Mantel, N. Chi-square test with one degree of freedom: Extension of the Mantel–Haenszel procedure. J. Am. Stat. Assoc. 1963, 58, 690–700.
15. Graubard, B.I.; Korn, E.L. Choice of column scores for testing independence in ordered 2 × K contingency tables. Biometrics 1987, 43, 471–476.
16. Gautam, S.; Singh, H.; Sampson, A. Iso-chi-square testing in 2 × K ordered tables. Can. J. Stat. 2002, 29, 609–629.
17. Nishisato, S. Analysis of Categorical Data: Dual Scaling and Its Applications; University of Toronto Press: Toronto, 1980.
18. Kimeldorf, G.; Sampson, A.; Whitaker, L. Min max scoring for two sample ordinal data. J. Am. Stat. Assoc. 1992, 87, 241–247.
19. Berger, V.W.; Permutt, T.; Ivanova, A. The convex hull test for ordered categorical data. Biometrics 1998, 54, 1541–1550.
20. Gautam, S. Test for linear trend in 2 × K ordered tables with open-ended categories. Biometrics 1997, 53, 1163–1169.
21. Gautam, S. Analysis of mixed categorical data in 2 × K contingency tables. Stat. Med. 2002, 21, 1471–1484.
22. Gautam, S.; Ashikaga, T. Assessing the effect of open-ended category on the trend in 2 × K ordered tables. J. Data Sci. 2003, 1, 167–183.
