Statistical Test

PART ONE-Matching Statistics with the Research Design
This document will discuss the different types of research questions and describe several basic research designs that are used with each type. Also the general strengths and weaknesses of each design will be explained.
Research Questions
The domain of research designs is divided into three categories of research questions: descriptive, differences, and relationship. A descriptive research question seeks to identify and describe some phenomenon. An example: What is the ethnic breakdown of patients seen in the emergency room for non-emergency conditions? A differences research question asks if there are differences between groups on some phenomenon. For example: Do patients who receive massage experience more relief from sore muscle pain than patients who take a hot bath? A relationship question asks if two or more phenomena are related in some systematic manner. For example: If one increases his level of physical exercise, does muscle mass also increase?
If you are reading a research article, you can decide which type of research question a study used by determining the number of groups and some characteristics of the variables. Use the following chart to decide which type of research question was used:
---------------------------------------------------------------------------
If you have two or more groups ................................ Differences
If you have one group and measure the DV two or more times .... Differences
If you have one group, and measure the DV once ................ Relationship
If you have one group, no IV, and observe the DV once ......... Descriptive
---------------------------------------------------------------------------
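The chart above can be read as a small decision rule. The following Python sketch follows the chart literally; the function name and its three parameters are illustrative assumptions, not part of the original material.

```python
def question_type(n_groups, n_observations, has_iv):
    """Return the research-question type suggested by the chart.

    n_groups       -- number of groups of subjects in the study
    n_observations -- number of times the DV is measured
    has_iv         -- True if the study defines an independent variable
    (All three parameter names are hypothetical, chosen for this sketch.)
    """
    if n_groups < 1 or n_observations < 1:
        raise ValueError("need at least one group and one observation")
    # Two or more groups: differences question.
    if n_groups >= 2:
        return "differences"
    # One group, DV measured two or more times: differences question.
    if n_observations >= 2:
        return "differences"
    # One group, DV observed once: the presence of an IV decides.
    return "relationship" if has_iv else "descriptive"


print(question_type(n_groups=2, n_observations=1, has_iv=True))
print(question_type(n_groups=1, n_observations=1, has_iv=False))
```

The chart is a rule of thumb, so a study may still need to be judged against the fuller descriptions of each question type.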
Table 1 illustrates the research designs used with each type of research question. The numbers of the research designs correspond to the designs on the following pages. The designs are just examples. You can change them around at will.
Table 1 - Research Questions and Designs
-------------------------------------------------------------------------
TYPE OF RESEARCH              RESEARCH DESIGN
QUESTION                      USUALLY USED
-------------------------------------------------------------------------
Descriptive                    1. Observational w/ one observation
(Describe conditions)          2. Observational w/ multiple obs.
                               3. Ex Post Facto *

Differences                    3. Ex Post Facto *
(Is there a difference?)       4. Pre/Post (two obs. of DV)
                               5. Pre/Post w/Control Group (two obs. of DV)
                               6. Two-Group (one after treat. obs. of DV)
                               7. Three-Group (one after treat. obs. of DV)
                               8. Repeated Measures (two or more obs.)
                               9. Factorial (two or more IVs)
                              10. Co-variance (pre-observation as control)
                              11. ABA Time Series (single subject)
                              12. AB Time Series (single subject)

Relationships                 13. Correlation/Regression (one group)
(How do the variables         14. Correlation/Reliability (one group and two obs.)
relate to each other)
-------------------------------------------------------------------------
* This design bridges both types
To measure the phenomena being studied the researcher defines variables. Two types of variables are used: dependent variables (DV) and independent variables (IV). A dependent variable is the phenomenon we want to study. In the examples, ethnic group, sore muscle pain, and muscle mass would all be dependent variables. An independent variable is a phenomenon that, when it changes, makes another phenomenon change. In the examples above, type of therapy (hot bath or massage) and exercise level (none, a little, a lot) are independent variables. The effect of the independent variable on the dependent variable can be seen in the examples: Which treatment therapy (a hot bath or a massage) will reduce sore muscle pain the most? If we increase exercise, by how much will muscle mass increase?
There are two categories for how the variables are measured: continuous and categorical. Continuous variables can be measured using either the interval or ordinal scale of measurement. Categorical variables can be measured using either the ordinal or nominal scale of measurement. It is usually best to call your variable categorical when you have an ordinal measurement scale. Independent and dependent variables can be either continuous or categorical depending on the research design. The combinations of variables allowed to be used in the same design (e.g., a continuous and a categorical) are determined by the various statistical tests. Generally with a differences research question you will have categorical IVs and either categorical or continuous DVs. With a relationships research question all variables must be either categorical or continuous. In Chapter 6 we will deal with which combinations are allowed in more detail.
Interval measurement uses a scale where the distances between the points on the scale are equal across the scale; e.g., measuring with a ruler is an interval measurement because inches are carefully defined to be a uniform length. A ratio scale is almost the same as an interval scale and for our purposes no distinction is made. An ordinal scale is a scale where phenomena are ordered or ranked; e.g., arrange a group of ten people by height and number the tallest person one and the shortest person ten. Finally, a nominal scale consists of categories with no order. The variable, type of therapy, from the example above, is a nominal variable with two categories: hot bath and massage. Note that treatment therapy is the variable and that it is ONE variable (with two categories: hot bath and massage), NOT two variables. Every variable MUST have at least two categories.
You will diagram the research design using a matrix where events are the columns and groups are the rows. An event is the occurrence of one of three possibilities: a treatment, no treatment, or an observation. A treatment event is when the independent variable (IV) is manipulated, e.g., from the above example, when subjects are given a hot bath or a massage. A no treatment event is one where nothing is done to the IV. An observation event is when the dependent variable (DV) is measured. Group is simply the number of groups of subjects in the study. An example matrix for the pain study is below.
EVENT       1    2    3
-------------------------
GROUP 1     O1   X1   O2
GROUP 2     O1   X2   O2
-------------------------
The first event, O1, is an observation of the pain level. The second event is the application of the treatment (X1 for massage and X2 for the hot bath). The third event, O2, is the second observation of the pain level. Finally, we need to introduce you to some terms and shorthand conventions that are used to diagram the various designs.
X - An "X" means that at this point we have administered a treatment. A treatment consists of manipulating the independent variable. The treatments may be numbered: First treatment, X1; second treatment, X2; third treatment, X3; nth treatment, Xn.

O - An "O" means that at this point we have measured the dependent variable (taken an observation).
The nature of the sample (random or non-random) and how subjects were assigned to groups (random or non-random) are indicated by a letter in the upper left corner of the diagram.
R - The letter "R" in a box means that we have used random selection of subjects. If a design does not have this glyph in front of it this means that subjects were selected non-randomly.

A - If we add an "A" to the "R" this means we have used random assignment of subjects to groups. Thus, we could have R, or A, or RA. To use assignment you need to have a differences question and two or more groups.
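The diagram conventions above (events as columns, groups as rows, and a letter code for sampling) map naturally onto a list of rows. The Python sketch below is one possible way to hold and print such a diagram; the function name `render_design` and its layout choices are assumptions made for illustration.

```python
def render_design(groups, sampling=""):
    """Render a design matrix as text, one row per group.

    groups   -- list of rows; each row is a list of event codes such as
                "O1" or "X1", with "" standing for a no-treatment event
    sampling -- "", "R", "A", or "RA" (the letter code in the upper left)
    """
    n_events = max(len(row) for row in groups)
    # Header row: sampling code, then the numbered events.
    numbers = "".join(f"{n:<4}" for n in range(1, n_events + 1)).rstrip()
    lines = [f"{sampling:<3}EVENT    " + numbers]
    for i, row in enumerate(groups, start=1):
        # Pad short rows so every group has the same number of events.
        padded = list(row) + [""] * (n_events - len(row))
        cells = "".join(f"{code:<4}" for code in padded).rstrip()
        lines.append(f"   GROUP {i}  " + cells)
    return "\n".join(lines)


# The pain study: observe, treat (massage or hot bath), observe again.
print(render_design([["O1", "X1", "O2"], ["O1", "X2", "O2"]]))
```

Keeping a blank string for the no-treatment event preserves the column alignment, so a control group row still lines up with the treated groups.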
Descriptive Research Questions
Descriptive research questions give rise to observational designs. Observational designs use three general ways to gather data: observation, interview, or survey. These three methods are obvious in their application. In the first we observe behavior, in the second we ask people to describe their behavior orally, and in the third we seek written replies. The simplest design is to ask a sample of people about their behavior. Suppose we want to know how many people use seat belts. We would have a descriptive hypothesis: What is the percentage of people who wear seat belts? Suppose we wait at a stop light and when it is red we ask drivers if they usually wear their seat belts. A weakness of this study is that we do not know if the subjects are being totally honest with us. This "dishonesty" factor lowers the reliability and validity of this type of design. This is a non-random sample, with no treatment (no defined independent variable), and one dependent variable (percent of the sample who wear seat belts). The dependent variable is continuous and uses interval measurement. We observe the dependent variable once by asking each subject questions about this behavior. Our design looks like (1):
EVENT       1
------------------
Group 1     O
------------------
(1)
There are variables that affect seat belt usage, but in design (1) we are not studying them, thus the term "no defined IV." Suppose we want to know how preferences for presidential candidates change over time. Design (2) is a survey design with multiple observations, no defined IV, and random selection of subjects (note the R in the upper left of the diagram). We will ask the same question to the same people in January, May, October, and just before the election in November. Our dependent variable (continuous and interval) is the percent of people who support each candidate. Our descriptive hypothesis for each observation might be: What percent of the sample support each candidate?
R  EVENT       1    2    3    4
----------------------------------------
   Group 1     O1   O2   O3   O4
----------------------------------------
(2)
A more sophisticated design for the seat belt study from design (1) would emerge if we observed whether or not they were wearing their seat belt and then asked them how often they wore their seat belt, using this scale: 1) all the time, 2) some of the time, or 3) never. This is based on a real study, see Geller, Casali & Johnson (1980). We would now have an independent variable (categorical and nominal): seat belt on or seat belt off. Our dependent variable (categorical and ordinal) would be their answer to the question: 1) always, 2) sometimes, or 3) never. This would give us two groups, still non-random. This type of design is called ex post facto since we do not manipulate the independent variable (IV), we only use it to classify people into groups. The design looks like this:
EVENT              1
------------------------
Group 1 (On)       O
Group 2 (Off)      O
------------------------
(3)
Even though we interview many people the design indicates just one observation (O). Because the conditions under which we interview each person (subject) are the same, it is all one observation. Each subject is called a replication, and we add up all the replications to get an estimate of the dependent variable for that observation. For a more complicated design with several observations, the conditions are different for each observation. We expect the behavior of the subjects to be different for each observation when there are several (see designs 2, 4, and 10 for example). A directional hypothesis for design (3) is: People who are not wearing their seat belt will give answer 2 (sometimes) more often than people who are wearing their seat belts. Also, from the data collected from design (3) we could calculate another DV in terms of the percentages of people who said they wore seat belts when they were wearing them, and who said they wore them when they were not wearing them.
An observation study can provide very reliable data provided the criteria of what to look for are clearly defined. It is best to have two or more people observe, then correlate their observations. If their observations correlate well (a correlation value of .8 or .9), then we know we have reliable observations. If the correlation is not good (less than .7) then we need to redefine our criteria or train the observers better. In the Geller et al (1980) seat belt study they had two observers who observed 1,827 vehicles and they agreed 86.4% of the time on the gender of the driver, and whether or not the driver was wearing a seat belt. Geller et al only used observations where complete agreement was obtained, and discarded the non-agreement data. Descriptive studies are characterized by either not having an IV or by having an IV that is not manipulated. Also, if you have a descriptive hypothesis you have a descriptive design. Often there is confusion about whether or not a descriptive study has an IV. Remember a study with one group usually does NOT have an IV. A one group study will have an IV only when it is a time series, covariance, or repeated measures design. A study with two or more groups always has an IV. If we use the IV only to classify subjects into groups then it is an ex post facto study. When we manipulate the IV we have a differences study, and that is discussed in the next section.
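The observer-agreement check described above can be sketched in a few lines of Python. The data here are invented for illustration, and the helper names `percent_agreement` and `agreed_only` are assumptions; the second helper mirrors the practice of keeping only observations on which both observers agreed.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of paired observations on which two observers agree."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * matches / len(observer_a)


def agreed_only(observer_a, observer_b):
    """Keep only the observations where both observers recorded the same value."""
    return [a for a, b in zip(observer_a, observer_b) if a == b]


# Hypothetical seat belt codes recorded by two observers for five vehicles.
first = ["on", "off", "on", "on", "off"]
second = ["on", "off", "off", "on", "off"]

print(percent_agreement(first, second))
print(agreed_only(first, second))
```

Percent agreement suits categorical codes such as seat belt on or off; for continuous ratings, a correlation between the two observers would be the closer analogue of the .8 or .9 guideline.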
Differences Research Questions
Differences research questions give rise to three types of designs: simple experimental designs (pre/post, two-group and three-group), factorial designs, and time-series designs. These designs are characterized by 1) manipulated IVs, and 2) two or more groups. These are the best designs to prove that altering (manipulating) the IV causes the DV to change. These designs enable you to control for unwanted effects due to extraneous variables. Thus the only thing affecting the DV is your manipulation of the IV. Experimental designs always have categorical IVs and the DV can be categorical or continuous. We will examine the simple experimental designs first, beginning with weak designs and ending with strong designs.
Simple Experimental Designs
Design (3), ex post facto, is often called a simple experimental design even though we do not manipulate the IV; the dividing lines between types of designs are not always clear and you need to be able to accept some ambiguity. A second simple design is the one group, pre/post, dependent means design, with or without random assignment (design 4). The term dependent means refers to the way we measure the DV. If we measure the same group of people twice, then the means of those two observations are related or dependent. The pre-observation (Event 1) is made before any treatment is done. The pre-observation is used to control for differences between individuals at the beginning of the study. In this design we use difference scores as measures of the dependent variable; difference scores are obtained by subtracting O1 from O2. The difference is the change made by the independent variable. Suppose we wanted to see if blood pressure increased after exercise. Our independent variable is time, and it has two nominal categories: before exercise and after exercise. We indicate the exercise period as Event 2 with an X. The directional hypothesis is: There will be a significant gain in blood pressure after exercise, i.e., that O2 will be larger than O1.
R  EVENT       1    2    3
---------------------------------------
   Group 1     O1   X    O2
---------------------------------------
(4)
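The difference scores used with design (4) are simple to compute: subtract each subject's O1 from that subject's O2. The Python sketch below uses made-up blood pressure readings, and the function name is an assumption for illustration.

```python
from statistics import mean


def difference_scores(pre, post):
    """Return O2 - O1 for each subject (same subjects, same order)."""
    if len(pre) != len(post):
        raise ValueError("each subject needs both a pre and a post observation")
    return [o2 - o1 for o1, o2 in zip(pre, post)]


# Hypothetical systolic readings for four subjects.
before_exercise = [118, 122, 130, 115]   # O1
after_exercise = [126, 129, 133, 124]    # O2

gains = difference_scores(before_exercise, after_exercise)
print(gains)        # one difference score per subject
print(mean(gains))  # average gain across subjects
```

Whether the average gain is statistically significant is a separate question for the statistical test matched to this design.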
With design (4) we attribute any change at O2 from O1 to the treatment (X). This is a poor design because it does not have a control group. This type of design is called quasi-experimental because it has some of the qualities of an experimental design but lacks adequate controls to ensure internal validity; in this case it lacks a control group. If a study used non-random selection or non-random assignment it would also be quasi-experimental. An improved, true experimental study is seen in design (5):
RA EVENT       1    2    3
---------------------------------------
   Group 1     O1   X    O2
   Group 2     O1        O2
---------------------------------------
(5)
In design (5) above, group 2 is the control group that is 1) the same (in terms of the distribution of relevant variables, e.g., age, sex, etc.) as group 1 due to the random assignment (RA), and 2) is not exposed to the treatment. A blank is used to indicate the no treatment condition. One could use X2, but a blank seems a logical indicator of the no treatment condition. The directional hypothesis is: The experimental group (group 1) will have a larger gain than the control group (group 2). If the group 1 measurement at O2 is different from the group 2 measurement at O2, that difference is due to the treatment. In this study the IV is treatment/no treatment. The average value of the DV for the two O1 measurements will be the same for each group due to the random assignment.
If random selection of our sample is not possible then the study will not generalize to the population we are studying. Even without random selection an effective control group can be obtained by using random assignment. If random assignment is not possible, the O1 measurement serves as a control feature. Without random assignment, instead of comparing the O2 observations, we compare the remainders after we subtract O1 from O2. Design (5) that looks at differences pre to post is called a covariance design, and allows us to correct for initial differences between groups.
Design (5) could have two treatment groups and no control group. The two treatments would be indicated as X1 for group 1, and X2 for group 2. This would also be a true experimental design. Sometimes control groups are not needed if you know the dependent variable is not affected by the passage of time or other extraneous variables (see Chapter 3 regarding the various threats to the internal validity of a study that might require you to use a control group).
The O1 measurement in design (5) can be deleted to obtain design (6). Since random assignment makes the groups equal there may be no need for the pre-treatment observation. Sometimes a pre-treatment observation can cause problems with internal validity, so if the pre-treatment observation can be deleted the study is improved (see index under pre-test influence for more information). Design (6) is called a two-group, independent means design. The means are independent because we measure each group only once. The directional hypothesis is the same as the one for design (5); only the number of observations has changed.
RA EVENT       1    2
-----------------------------
   Group 1     X    O
   Group 2          O
-----------------------------
(6)
Design (6) is good to use if you have a large sample (20 or more subjects in each group). If the group size is small then, even with random assignment, the groups could be unequal due to chance factors. With small group sizes (10 or fewer subjects per group) design (5) is better than design (6) because the analysis of covariance will mathematically compensate for the unequal starting points. If you have a large sample size then either design (5 or 6) is good to use.
If we define a more complicated IV, we add more groups to our design. Say we want to compare two therapies, individual counseling and group counseling, and are concerned that people tend to get well without counseling. If we add a control group to design (6), a "no counseling" group, we get a three-group design (7). In design (7), we randomly assign randomly selected people with mild emotional problems to three groups. We do not need a pre-measure, but we could use one if desired. Groups 1 and 2 receive different treatments (individual or group counseling) for three months. A measure of emotional adjustment is made after the three months of therapy. The control group receives no therapy but is given the test for emotional adjustment at the same time as the other two groups.
RA EVENT       1    2
-----------------------------
   Group 1     X1   O
   Group 2     X2   O
   Group 3          O
-----------------------------
(7)
The study in design (7) has two directional hypotheses: 1) group 1 will be different from group 2, and 2) group 2 will be different from group 3 (we need not hypothesize that group 1 will be different from group 3 since the answers to the first two hypotheses will provide the answer to the third by a process of elimination). Suppose we want to see if the effect of the therapy is maintained over time. We can add follow up observations every month after stopping therapy. See design (8), events 3, 4, and 5.
RA EVENT       1    2    3    4    5
-----------------------------------------------------------
   Group 1     X1   O1   O2   O3   O4
   Group 2     X2   O1   O2   O3   O4
   Group 3          O1   O2   O3   O4
-----------------------------------------------------------
(8)
Designs like (8) with more than one observation are called repeated measures designs. They can have one or more groups. Repeated measures is not the same as time series, designs (10) and (11), or pre/post, design (5). Time series is used for studies with a single subject. If you have a pre-treatment observation and two or more after treatment observations, use repeated measures rather than a pre/post design. If we had a group of clinic patients and assigned a third of them to each group, as in (8), then we would have random assignment without random selection (there would just be an A in the upper left of the diagram). Using only random assignment the groups are equal, but we cannot generalize our results to a larger population (the internal validity is not threatened, but the external validity is weak). The patients that come to a clinic are not a random population of patients; they are a self-selected group of people.
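Random assignment of an existing pool of subjects, such as the clinic patients above, can be sketched as a shuffle followed by dealing subjects into groups. This is a minimal illustration using Python's standard `random` module; the function name and the `seed` parameter are assumptions added so the example can be repeated.

```python
import random


def randomly_assign(subjects, n_groups, seed=None):
    """Shuffle the subjects, then deal them into n_groups of near-equal size."""
    rng = random.Random(seed)   # a seed makes the assignment repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal like cards: subject 0 to group 1, subject 1 to group 2, and so on.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Thirty hypothetical clinic patients, identified by number.
groups = randomly_assign(range(1, 31), n_groups=3, seed=42)
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {sorted(members)}")
```

This gives random assignment only. The pool itself is still whoever came to the clinic, so the sketch does nothing for random selection.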
PART TWO- RESEARCH DESIGN
Factorial Experimental Designs
Factorial designs are used when you have two or more IVs and want to know if an interaction is present. If we add another IV to design (7) so we have two IVs, then we have what is called a factorial design. We also have a more complex diagram. Suppose we add sex as an IV to the study in (7). Thus the IV, sex, would have two categories: male and female. A study with two IVs needs to have at least four groups. For design (9), we will have 4 groups and we will choose not to have the control group.
RA EVENT       1     2
------------------------------------
   Group 1     X1A   O
   Group 2     X1B   O
   Group 3     X2A   O
   Group 4     X2B   O
------------------------------------
(9)
The groups are composed thus:
Group 1 is composed of (1) men receiving (A) individual counseling
Group 2 is composed of (1) men receiving (B) group counseling
Group 3 is composed of (2) women receiving (A) individual counseling
Group 4 is composed of (2) women receiving (B) group counseling
If we decided we needed a control, we would have to add two control groups, one group of men and one of women. It is often easier to understand how the variables interact to form the groups if a grid diagram is used. Study the example below to understand how the two IVs (factors) are combined and result in four groups, where each group receives a different combination of IV categories.
                              IV - Factor B
                            Type of Counseling
                      --------------------------------
                      (A) Individual     (B) Group
--------------------------------------------------------
IV        (1) Men     Group 1 (X1A)      Group 2 (X1B)
Factor A  ----------------------------------------------
Sex       (2) Women   Group 3 (X2A)      Group 4 (X2B)
--------------------------------------------------------
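The grid above is a crossing of the categories of two factors. A short Python sketch can generate the same four groups with `itertools.product`; the dictionary layout and the function name `factorial_cells` are assumptions made for this illustration.

```python
from itertools import product

# Factor A (sex) and Factor B (type of counseling), keyed by the grid codes.
sex = {"1": "men", "2": "women"}
counseling = {"A": "individual", "B": "group"}


def factorial_cells(factor_a, factor_b):
    """Cross two factors; return (group number, treatment code, A label, B label)."""
    cells = []
    pairs = product(factor_a.items(), factor_b.items())
    for number, ((a_code, a_label), (b_code, b_label)) in enumerate(pairs, start=1):
        cells.append((number, f"X{a_code}{b_code}", a_label, b_label))
    return cells


for number, code, a_label, b_label in factorial_cells(sex, counseling):
    print(f"Group {number} ({code}): {a_label} receiving {b_label} counseling")
```

Adding a third category to either factor, for example a "no counseling" condition, would add one group per category of the other factor, which matches the point that a control condition requires two control groups here.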
This design also leads to a more complicated set of directional hypotheses than the single IV studies. This study, and all other two factor designs, have three directional hypotheses:
1. There is a difference between sexes, i.e., men will be better emotionally adjusted than women after three months (Factor A).
2. There is a difference between therapies, i.e., subjects in individual therapy will be better emotionally adjusted than subjects in group therapy (Factor B).
3. There is an interaction between sex and therapy, i.e., men in individual therapy will be better emotionally adjusted than subjects in the other three groups (Factor A x B).
Interaction means that there is one combination of Factor A and Factor B that produces a bigger difference than another combination. For example, perhaps women do better with group therapy, and men do better with individual therapy.
Time Series Designs
Time series designs are the third category of experimental designs. A time series design differs from an experimental design in purpose. With an experimental design we are interested in outcomes, thus we usually only observe behavior after all treatments are done. With a time series design we are interested in process, and observe behavior during treatment or, more correctly, after each individual treatment. An example of a time series study is when we want to see how well a diabetic patient can regulate their glucose level. To keep their glucose levels stable diabetic patients use insulin every day. To see if they maintain a stable glucose level we need to measure the level each day. A time series study is different from a repeated measures design in that it will have more observations (ten or more is typical), and the observations are taken more often (hourly or daily). Time series designs are also called single subject designs, because these designs were first used with studies that used only one subject.
Single subject designs are an extension of the classical medical case study. The single subject design is very useful with patients, since it parallels the data gathering one does when treating a patient. Data from several patients can be combined to obtain a more valid measure of the effect of the IV on the DV. You need many subjects because a sample size of one is not a reliable measure of the patient population you are studying, i.e. it will not generalize. In design (10) we have no control group, so we use the subject as his own control.
EVENT          1      2     3      4
---------------------------------------
Group 1        O1     X     O2     O3
(Subject)
---------------------------------------
             <--A--> <-----B-----> <--A-->
(10)
Since time series designs usually have numerous observations, it is best to abbreviate each phase on the diagram. Thus, in design (10) observation O1, which establishes a baseline for the subject's behavior, represents a series of observations not just one. Treatment X and observation O2 are a series recording the effect of each treatment on behavior. Finally, observation O3 is a series that records if the behavioral changes are maintained after the treatment is stopped. In a time series design the IV becomes the phases, e.g., in design (10) the IV has two categories: 1) time during which no treatment was given, and 2) time during which treatment was given. Thus, we expect the subject's behavior to change (or be different) due to the conditions of treatment/no treatment. Another way to diagram a time series design is in the form of a graph of the expected observations. Such a graph is shown in Figure 1.
There are three directional hypotheses with design (10) as shown in Figure 1:
1. Behavior frequency will not change during phase A.
2. Behavior frequency will decrease during phase B.
3. Behavior frequency will remain lowered during second phase A.
Since the subject serves as his own control, hypotheses 2 and 3 cannot be tested unless hypothesis 1 is true. The first A phase is the control phase and if it is not stable there is no place to measure changes in the subsequent phases.
A time series study can have several subjects and several groups. The groups in design (11) are the same that we had in design (8), but now the concern is to study the effect of each treatment on the subjects rather than just assessing the "after treatment" effects.
RA EVENT       1      2      3
--------------------------------
   Group 1     O      X1     O
   Group 2     O      X2     O
   Group 3     O             O
--------------------------------
             <--A--> <-----B----->
(11)
Time series studies have their own special terminology. Each time interval is called a trial, and trials without treatments are called phase A and trials with treatments are called phase B. Thus, the design in (10) is called an ABA design and (11) is called an AB design. You use the ABA design with behavioral interventions where the patient's behavior might deteriorate when the treatment is stopped. The second baseline (A) period lets you check if behavior is maintained after treatment is stopped. Use the AB design for rehabilitation interventions where recovery is not dependent on the continuance of therapy. You can have many different designs, for example: ABAB, or ABACA (where C is a different treatment). Always have a baseline (A) period as the first period because the baseline serves as the control from which further changes are evaluated.
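One way to look at single-subject data from an ABA design is to summarize each phase separately. The Python sketch below averages the observations within each consecutive phase; the behavior counts are invented, and the function name `phase_means` is an assumption for illustration.

```python
from itertools import groupby
from statistics import mean


def phase_means(observations, phases):
    """Mean of the observations within each consecutive run of a phase label."""
    if len(observations) != len(phases):
        raise ValueError("each trial needs exactly one phase label")
    paired = zip(phases, observations)
    # groupby keeps the first A phase and the second A phase separate.
    return [
        (phase, mean(value for _, value in run))
        for phase, run in groupby(paired, key=lambda pair: pair[0])
    ]


# Hypothetical daily behavior counts for one subject across twelve trials.
counts = [9, 10, 9, 10, 7, 5, 4, 3, 3, 4, 3, 3]
labels = list("AAAABBBBAAAA")

for phase, average in phase_means(counts, labels):
    print(phase, average)
```

Phase averages are only a summary. A plot of every trial is still needed to judge whether the baseline phase was stable.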
Relationships Research Questions
Relationships research questions are the final category of research questions. These are questions that concern prediction, i.e., can we predict variable X from variable Y? A relationships question would be: Is the relationship between diet and heart disease strong enough that we can predict the onset of heart disease from a knowledge of diet? This type of research question leads to correlation/regression designs. Correlation/regression designs are ex post facto designs because while we may define IVs we do not manipulate them. We do not manipulate the IV because we gather all the data at one time, hence there is no time to manipulate the IV. These designs are similar to observational studies except that we hypothesize a correlation between variables, rather than simple descriptions. A correlation describes how as one variable changes (the IV) another variable also changes (the DV) in a (hopefully) predictable way. For example, as the temperature rises in a room, we become less comfortable. Thus temperature and comfort are correlated. Regression is a mathematical equation that lets you actually make predictions, and the correlation value tells you how accurate your predictions are. With correlational designs we may have many variables, but we observe them all once, and at one time; see design (12).
R  EVENT       1
-------------------
   Group 1     O
-------------------
(12)
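The idea of regression as a prediction equation can be illustrated with an ordinary least-squares straight line. The sketch below is a minimal example using the temperature and comfort illustration; the numbers are invented and the function name `fit_line` is an assumption.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical room temperatures and comfort ratings (10 = most comfortable).
temperature = [20, 22, 24, 26, 28]
comfort = [9, 8, 6, 5, 2]

slope, intercept = fit_line(temperature, comfort)
predicted = intercept + slope * 25   # predicted comfort at a temperature of 25
print(slope, intercept, predicted)
```

A negative slope here corresponds to comfort falling as temperature rises; how closely the points follow the line is what the correlation value summarizes.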
Assume that in design (12) we are studying three variables: exercise level (V1), age (V2), and health care costs (V3). We select a random sample of people, gather data on each variable from each person, and then correlate the data. We will obtain three measures of relationship: V1 with V2, V2 with V3, and V1 with V3. We may find that exercise level decreases with age, that health care costs increase with age, and that health care costs increase as exercise level decreases. This does not tell us that decreased exercise level causes health care costs to increase. In order to prove that one variable causes another to change we must do a factorial experimental study where we vary exercise level and control for age. A correlational study does not prove causality (that A causes B), it just describes how variables relate to each other. We usually use correlational studies to find important variables, that we later study using differences designs to establish causality. The data for a correlational study looks like this:
Subject    Variable 1    Variable 2    etc.
-------------------------------------------------
1          00.0          00.0          00.0
2          00.0          00.0          00.0
.          .             .             .
:          :             :             :
N          00.0          00.0          00.0
-------------------------------------------------
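With data laid out one column per variable, the three measures of relationship in design (12) are the correlations between every pair of columns. The Python sketch below computes Pearson correlations by hand; the values for V1, V2, and V3 are invented, and both function names are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("each variable must vary")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(data):
    """Correlate every pair of variables in a {name: values} dictionary."""
    return {(a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)}


# Hypothetical values for five subjects.
study = {
    "V1 exercise level": [5, 4, 3, 2, 1],
    "V2 age": [25, 35, 45, 60, 70],
    "V3 health care costs": [400, 550, 900, 1500, 1400],
}

for pair, r in pairwise_correlations(study).items():
    print(pair, round(r, 2))
```

Three variables give three pairs, matching V1 with V2, V2 with V3, and V1 with V3. As with any correlational result, the coefficients describe association, not cause.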
Correlational studies give a picture of the relationships between variables. A plot of hypothetical data for age and health care costs appears in Figure 2. Each * in the plot represents the scores for one subject on each variable.
The plot indicates that as age increases, annual health care costs also increase. The straight line is called a regression line. The line describes the "best prediction" of variable Y (Age) from variable X (Health Care Costs). In this particular case the prediction is not very accurate because the actual scores (the *'s) do not lie on the line. There are some older people who have low health care costs. The correlation tells the researcher the strength of the relationships between variables, and the accuracy of prediction, i.e., if I know how much you spend on health care, how accurately can I predict your age?
When is a study relationships and when is it differences? A differences study has one or more of the following characteristics: 1) a manipulated IV, 2) two or more groups, 3) two or more observations, and/or 4) a directional hypothesis of this form, "The IV will cause the DV to change." A directional hypothesis for a relationships study has this form, "The IV and the DV are related, and the IV will predict the DV."
Another type of correlation design occurs when you are measuring the reliability of a test. To measure reliability you administer the test twice under exactly the same conditions. Design (13) illustrates a reliability design.
R   EVENT       1    2
-------------------------
    Group 1     O    O
-------------------------
(13)
Detailed discussions of all the above research designs can be found in Kerlinger (1973), Cox and West (1986), and other books on research design. Associated with each design are appropriate statistical methods for analyzing the data.
There are many more experimental designs, but all designs are just logical diagrams of events and groups. Remember: first develop your hypotheses, then plan your study to test them. Your study need not conform to these examples to be correct.
This document focuses on how to select the correct statistical procedure for your research study. This is an important element of your research design. The choice of how the data are analyzed affects the number of subjects and the way you measure the dependent variables. To avoid using too few subjects or gathering the wrong type of data, you need to decide on the data analysis method during the design phase of your research.

A key element in selecting the correct statistical test is to fit the design one is using into one of four patterns. The patterns are made by the sequence of events: the Xs and Os of the design. You must evaluate the design diagram by looking at the pattern of Xs and Os for group 1, by counting the number of Os for group 1, and by knowing the number of groups.

The simplest pattern is pattern 0. This pattern consists of at least one group and one observation and is used for descriptive and relationship research questions.

EVENT       1
-------------------
GROUP 1     O
-------------------
(Pattern 0)
The remaining three patterns are used for differences research questions. Pattern 1 is an experimental design that is all observations and has one or more groups.

EVENT       1
-------------------
GROUP 1     O
GROUP 2     O
GROUP 3     O
-------------------
(Pattern 1)

EVENT       1    2
-----------------------
GROUP 1     O    O
-----------------------
(Pattern 1)

Pattern 2 is an experimental design that has a pattern of events where first a treatment is done and then the DV is observed. The DV may be observed more than once and still the design is assigned to pattern 2.

EVENT       1     2     3
--------------------------
GROUP 1     X1    O1    O2
GROUP 2     X2    O1    O2
--------------------------
(Pattern 2)
Pattern 3 is for experimental designs that have the sequence of events: observe, treat, observe. It is usually assumed that the first observation is used as a control measure. Pattern 3 designs are assigned to analysis of covariance when the DV uses interval measurement and there are three or more observations of one group, or two or more observations of two or more groups. In this case, the first (pretreatment) observation is used to predict the post-treatment observations and then the residuals are analyzed for differences due to the treatment.

EVENT       1     2     3
-----------------------------
GROUP 1     O1    X1    O2
GROUP 2     O1    X2    O2
GROUP 3     O1          O2
-----------------------------
(Pattern 3)
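The four patterns described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name classify_pattern and the encoding of events as "X" and "O" strings are hypothetical, and pattern 0 is taken to mean one group observed once, so that several groups observed once fall under pattern 1 as in the Pattern 1 diagram.

```python
def classify_pattern(group1_events, n_groups):
    """Return the pattern number (0 to 3) for a design diagram.

    group1_events: the events for group 1 in order, written as "X" for a
        treatment and "O" for an observation (labels such as "X1" or "O2"
        are accepted; this encoding is an assumption made for the sketch).
    n_groups: the number of groups in the design.
    """
    if n_groups < 1:
        raise ValueError("a design needs at least one group")

    # Reduce labels such as "O2" to their first letter and skip blank cells.
    events = [e.strip()[0].upper() for e in group1_events if e.strip()]
    if any(e not in ("X", "O") for e in events):
        raise ValueError("events must be treatments (X) or observations (O)")

    n_observations = events.count("O")
    if n_observations == 0:
        raise ValueError("a design needs at least one observation")

    if "X" not in events:
        # Observations only: one group observed once is pattern 0,
        # anything else (more groups or more observations) is pattern 1.
        return 0 if (n_groups == 1 and n_observations == 1) else 1

    first_treatment = events.index("X")
    if "O" not in events[first_treatment + 1:]:
        raise ValueError("no observation follows the treatment")

    # Treatment first is pattern 2; observe, treat, observe is pattern 3.
    return 2 if first_treatment == 0 else 3


print(classify_pattern(["O"], 1))                # one group, one observation
print(classify_pattern(["O"], 3))                # three groups, one observation
print(classify_pattern(["X1", "O1", "O2"], 2))   # treat, then observe
print(classify_pattern(["O1", "X1", "O2"], 3))   # observe, treat, observe
```

Only group 1 is inspected, which matches the advice to evaluate the pattern of Xs and Os for group 1 and the number of groups.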
The blank cell for group 3, event 2 indicates that group 3 is a control or no treatment group. Designs can have any number of events, but you only need to count the total number of events and observations, and examine the first three events, to deduce the design's pattern. Once you have the pattern you can use Table 2 to locate the correct statistical test.

TABLE 2
Relating Research Designs to Appropriate Statistical Analyses
------------------------------------------------------------------
DESIGN                            STATISTICAL TEST
------------------------------------------------------------------
DIFFERENCES RESEARCH QUESTION

1. Basic two-group design         1. a. t-test - independent means
   (Pattern 1 or 2)                     (Interval or ratio data)*
                                     b. Mann-Whitney U test
                                        (Ordinal data)
                                     c. Chi-square (Nominal data)

2. Pre-test and post-test         2. a. t-test - dependent
   design (Pattern 3)                   (non-independent) means
                                        (Interval)
                                     b. Wilcoxon or Sign test
                                        (Ordinal)
                                     c. McNemar test (Nominal)

3. Time-Series or Single          3. Interrupted time-series
   Subject (Pattern 3)               analysis (Interval)

4. Covariance, or repeated        4. a. Repeated measures analysis
   measures design                      of variance OR Analysis of
   (Pattern 1 or 3)                     covariance (Interval)
                                     b. Friedman's AOV by ranks
                                        (Ordinal)
                                     c. Cochran's Q (Nominal)

5. Three or more groups           5. a. Analysis of variance
   design (Pattern 1, 2 or 3)           (Interval)
                                     b. Kruskal-Wallis AOV
                                        (Ordinal)
                                     c. Chi-square test for K
                                        independent groups
                                        (Nominal)

DESCRIPTIVE RESEARCH QUESTION (Pattern 0)

6. One-group sample from a        6. a. One-group t-test (Interval)
   known population                  b. Kolmogorov-Smirnov test for
                                        goodness-of-fit (Ordinal)
                                     c. Chi-square goodness-of-fit
                                        test (Nominal)

RELATIONSHIPS RESEARCH QUESTION (Pattern 0)

7. Correlational study            7. a. Pearson product moment
   (Two or more variables               correlation coefficient
   and one group)                       (Interval)
                                     b. Spearman's rank order
                                        correlation, Kendall's Tau
                                        (Ordinal)
                                     c. Lambda Beta, Phi coefficient
                                        or Chi-square (Nominal)
------------------------------------------------------------------
* Refers to the way the dependent variable is measured.
  Continuous variables use interval or ratio measurement.
  Categorical variables use nominal or ordinal measurement.

For example, a one group pre-post design (Example 1), with interval measurement of the DV, will need the t-test for dependent means.

EVENT       1     2     3
--------------------------
GROUP 1     O1    X     O2
--------------------------
(Example 1)

If you changed the scale of measurement used by the DV to ordinal, then you would use the Wilcoxon or Sign test instead of the t-test.

Another example. Given the two-group design shown below:

Type of research question:          Differences
Number of independent variables:    1
   Type of variable:                categorical
Number of dependent variables:      1
   Type of variable:                continuous
Number of events:                   3
Number of groups:                   2
EVENT       1     2     3
--------------------------
GROUP 1     O1    O2    O3
GROUP 2     O1    O2    O3
--------------------------

Statistical test to use: a one-factor repeated measures analysis of variance.

Catalog of Statistical Tests
This section describes the uses for the statistical tests that are available to the researcher for data analysis. Table 2 provides a summary of this section.

Categorical IV and Continuous DV
All the tests in this section are for differences research questions. These tests are appropriate for large sample sizes, i.e., twenty or more subjects per group. Some of them can be used with smaller group sizes (10 to 15 subjects per group), but the validity of the test results can be reduced when a small group size is used.

1. t-test for independent means - Use this test when you have two groups and a single measurement of the dependent variable (DV). This test tells you if the mean of one group is different from the mean of the other. Remember that all continuous variables use interval or ratio measurement.

2. t-test for dependent means - Used when you have one group and you measure the DV twice. It tells you if the mean of the first measurement is different from the mean of the second.

3. Analysis of variance (AOV) - Used when you have three or more groups and measure the DV once. It tells you if one of the group means is different from the other means. If you have one independent variable (IV) it is a one-factor AOV, and if you have two IVs it is a two-factor AOV. The two-factor AOV tells you three things: 1) if there is a difference between the DV means for the first IV, 2) if there is a difference between the DV means for the second IV, and 3) if there is a difference between the group means (an interaction effect; see Chapter 4 for more details). An AOV design with two factors has one group for each combination of the categories of the two IVs. If one IV has 2 categories and the second IV has 3 categories then the study will have 6 groups (3 x 2). AOV designs (and RAOV and AOCovar below) can be used with as few as four subjects per group and still obtain valid results.

4. Repeated measures analysis of variance (RAOV) - Used when you have three or more groups and measure the DV two or more times. You can have one or more IVs. In addition to the information provided by an AOV design (see item 3 above) it tells you if the groups are different across the several measurements of the DV. It is also appropriate for three or more observations of one group.

5. Analysis of covariance (AOCovar) - Used when you measure the DV before and after the treatment, have two or more groups, and have one or more after-treatment observations. You can have one or more IVs. The t-test for dependent means is the one-group, two-observation equivalent of the AOCovar. Analysis of covariance is most useful when the groups are not randomly assigned and you want to control for any initial differences between the groups.

6. Multivariate analyses - When you have two or more DVs and three or more groups you can also run a multivariate AOV, RAOV, or AOCovar. This test allows you to see how the DVs relate to each other. It tells you if the DVs overlap in terms of what they measure or if they each measure a separate trait or concept. For example, you may use two different measures for heart rate (resting and immediately after exercise) as DVs, and your single IV may have two categories of exercise levels. A multivariate AOV will tell you about the effects of the exercise levels on each heart rate measurement. Your results may show that the resting heart rate and heart rate immediately after exercise are correlated (i.e., they are both measuring the same trait) but that the immediate measurement is affected more by exercise level while resting heart rate is affected more by overall physical condition. If you have only two groups you use Hotelling's T-squared statistic.

7. One group t-test - If you have only one group, a descriptive research question, and a single measure of the DV, you can see if your sample mean is different from the mean of the population (provided that you know the population mean). Since you have only one group you do NOT have an independent variable. This test is useful to see if your sample is representative of the population.

Categorical IV and DV
The tests in this section are called non-parametric tests because they are based on categorical data. They can be used with small sample sizes between five and twenty per group.
1. When data are frequencies (counts of the number of times an observation falls into one of several nominal categories) use these methods.

   a. Chi-square - There are many uses for Chi-square tests; here we will discuss three:
      i) When you have one set of observations, one DV, and one group you can use the Chi-square goodness-of-fit test to see if your sample frequencies are the same as the population frequencies. This is similar to the one-group t-test.
      ii) When you have one set of observations and one or more groups, one or more IVs, and one or more DVs you can see if a relationship exists between the variables or the groups. This is similar to a correlation, but it only tells you if a relationship exists, not the strength of the relationship (as does a correlation coefficient).
      iii) You can also use Chi-square for the same purpose as a t-test for independent means or a one-way AOV when you only have frequency data.
      A Chi-square test can be used for both a relationship question and a differences question because at this level both questions have the same answer. If a relationship exists then the groups are different. If there is no relationship then the groups are the same. If there is a relationship, then you must use your knowledge of group membership to interpret the results. The frequencies in the categories of the DV are different in a systematic way if a relationship exists.

   b. Cochran's Q - Used when you have three or more observations of a nominal variable and one group. This test is the non-parametric parallel to the repeated measures AOV.

   c. McNemar test - Used with two observations of one group; nominal data. It is the non-parametric equivalent of the t-test for dependent means.

   d. Phi coefficient - This is a measure of relationship used when you have two variables that each have only two categories (e.g., yes/no or 1/0). This type of variable is called a dichotomous variable. This measure gives the same result as Pearson's product moment correlation coefficient computed on the two dichotomous variables.

   e. Lambda beta - This is a measure of relationship used when you have two nominal variables that each have two or more categories. Lambda beta is a more general measure of association than the Phi coefficient. This measure is not the same as Pearson's product moment correlation coefficient.
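The statement in item d that the Phi coefficient gives the same result as Pearson's coefficient can be checked with a short sketch. The cell counts below are invented for illustration, and the function names (phi_from_table, expand_table, pearson_r) are hypothetical; the Phi formula used is the standard one for a 2 x 2 frequency table with categories coded 1 and 0.

```python
from math import sqrt


def phi_from_table(n11, n10, n01, n00):
    """Phi coefficient from the four cell counts of a 2 x 2 table.

    n11 counts subjects coded 1 on both variables, n10 those coded 1 on the
    first and 0 on the second, and so on.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denominator = sqrt(row1 * row0 * col1 * col0)
    if denominator == 0:
        raise ValueError("each variable needs subjects in both categories")
    return (n11 * n00 - n10 * n01) / denominator


def expand_table(n11, n10, n01, n00):
    """Turn the cell counts back into one pair of 0/1 scores per subject."""
    x = [1] * (n11 + n10) + [0] * (n01 + n00)
    y = [1] * n11 + [0] * n10 + [1] * n01 + [0] * n00
    return x, y


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts: 20 yes/yes, 10 yes/no, 5 no/yes, 15 no/no.
counts = (20, 10, 5, 15)
phi = phi_from_table(*counts)
x_scores, y_scores = expand_table(*counts)
r = pearson_r(x_scores, y_scores)

print(f"phi = {phi:.4f}, Pearson r on the 0/1 scores = {r:.4f}")
```

The two values agree apart from floating-point rounding, which is why Phi can be read as a correlation coefficient for dichotomous data.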
2. When data are scores (ordinal measurement) use these methods. These can be used with interval data as well by converting the interval data to ranks.

   a. Spearman's rank order correlation - This is a measure of relationship (association) used when you have two ordinal (rank order) variables. It is the non-parametric parallel to Pearson's product moment correlation coefficient.

   b. Kendall's Tau - This method provides the same kind of measure of relationship as the Spearman. It is best to use Kendall's method when you have ten or fewer subjects.

   c. Kolmogorov-Smirnov test for goodness-of-fit - Used to compare your sample distribution to the population distribution. Use this test when you have ordinal (rank order) data. This test provides the same information as the one-group t-test or Chi-square goodness-of-fit test.

   d. Mann-Whitney U-test - Used when you have two groups and observe the DV once; ordinal data. This test is the non-parametric equivalent of the t-test for independent means.

   e. Wilcoxon matched-pairs, signed-ranks test - Used when you have two observations of one group; ordinal data. This test uses the ranks as data. It is similar in function to the t-test for dependent means and the McNemar test for nominal data.

   f. Sign test - Used for exactly the same data as the Wilcoxon matched-pairs, signed-ranks test. This test is not as powerful as the Wilcoxon because it only uses direction of change (+ or -) and not the actual ranks.

   g. Kruskal-Wallis one-way analysis of variance by ranks - Used when you have three or more groups and one observation of an ordinal variable. This test is the non-parametric parallel to the AOV.

   h. Friedman analysis of variance by ranks - Used when you have three or more observations of one ordinal DV and one group. It provides the same information as Cochran's Q or the repeated measures AOV except that it is used with rank order data. This test is sometimes called a two-way AOV because the subjects are considered as the second factor. It is not a true two-factor design.

Continuous IV and DV
All of these procedures are used with relationships research questions and require large sample sizes, i.e., twenty or more per group. Smaller sample sizes can be used (10 to 15 per group) but the validity of the results can be reduced.

1. Pearson's product moment correlation coefficient - Used with pairs of continuous variables. It provides a measure of the degree of relationship between the two variables.

2. Regression analysis - Also called univariate regression analysis because there is only one DV used. This is a mathematical procedure, not a statistical test. It produces an equation that lets you predict an individual's score on the DV from that individual's score on the IV. The equation looks like this:

      DV = I + (IV * S)

   where I is called the intercept and S is called the slope. The regression equation does not tell you the accuracy of your prediction. The correlation coefficient tells you the accuracy of the prediction.

3. Multiple regression analysis - This procedure combines the functions of Pearson's correlation and regression analysis when you use two or more IVs to predict one DV. The procedure provides a regression (prediction) equation that looks like this:

      DV = I + (IV1 * B1) + (IV2 * B2) + ... + (IVn * Bn)

   where I is the intercept, and B1, B2, and Bn are called beta weights (Bn is the nth beta weight for the nth IV). The beta weights are values that correspond in function to the slope used in the univariate regression analysis. Multiple regression analysis also provides a measure of relationship called Multiple R. This Multiple R (or R squared) tells you the accuracy of prediction of the DV from the several IVs.

4. Multivariate multiple regression analysis - This procedure is used when you have several IVs (like multiple regression above) and you have several DVs. This procedure provides a set of prediction equations (one for each DV), tells you the accuracy of prediction of each DV from the IVs, and finally tells you the degree to which the DVs are related and lets you study the structure of the relationship.

This catalog lists no statistical tests for designs with continuous IVs and categorical DVs. The relationship designs described here require that the IVs and DVs both use the same type of measurement, i.e., both categorical, or both continuous.
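The univariate regression equation in item 2 above, DV = I + (IV * S), can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares fit; the function names fit_line and predict and the scores are made up for the example.

```python
from math import sqrt


def fit_line(iv, dv):
    """Least-squares intercept (I), slope (S), and correlation (r).

    iv and dv are equal-length lists holding each subject's score on the
    independent and dependent variable.
    """
    n = len(iv)
    if n != len(dv) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_iv = sum(iv) / n
    mean_dv = sum(dv) / n
    sxy = sum((x - mean_iv) * (y - mean_dv) for x, y in zip(iv, dv))
    sxx = sum((x - mean_iv) ** 2 for x in iv)
    syy = sum((y - mean_dv) ** 2 for y in dv)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary across subjects")
    slope = sxy / sxx
    intercept = mean_dv - slope * mean_iv
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


def predict(intercept, slope, iv_value):
    """Apply the regression equation DV = I + (IV * S)."""
    return intercept + iv_value * slope


# Hypothetical scores for six subjects (illustrative values only).
iv_scores = [1, 2, 3, 4, 5, 6]
dv_scores = [2.2, 3.9, 6.1, 8.3, 9.8, 12.1]

I, S, r = fit_line(iv_scores, dv_scores)
print(f"DV = {I:.2f} + (IV * {S:.2f}), r = {r:.3f}")
print(f"Predicted DV for IV = 7: {predict(I, S, 7):.2f}")
```

The equation supplies the prediction, while r (or r squared) indicates how closely the actual scores follow the line, which is the division of labor between regression and correlation described in item 2.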
The computational procedures for these statistical procedures can be found in many books on statistics. Three good books for this purpose are: Statistical Principles in Experimental Design by B. J. Winer; Non-parametric Methods of Quantitative Analysis by Jean D. Gibbons; and Statistical Methods in Education and Psychology by Gene V. Glass and Julian C. Stanley.

There are several computer programs available that ease the pain of computing statistics by hand. Three major ones are: SPSS-Statistical Package for the Social Sciences by Norman H. Nie, C. Hadlai Hull, Jean G. Jenkins, Karen Steinbrenner, and Dale H. Bent; SAS-Statistical Analysis System from SAS Institute Inc.; and BMD-Biomedical Computer Programs by W. J. Dixon. Some on-line statistical programs are available from the US Centers for Disease Control and Prevention (CDC). One of them is Epi Info, a statistics package that is available for free from the CDC web site. You can read about it here - http://www.cdc.gov/epiinfo.

Two excellent books that combine statistical procedures with computer programs you can use to compute statistical tests are Multivariate Data Analysis by William W. Cooley and Paul R. Lohnes and Computational Handbook of Statistics by James L. Bruning and B. L. Kintz.
