
Biostatistics Basics

An introduction to an expansive and complex field

2006

Common statistical terms


Data
Measurements or observations of a variable

Variable
A characteristic that is observed or manipulated
Can take on different values

Evidence-based Chiropractic


Statistical terms (cont.)


Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study

Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable

Statistical terms (cont.)


Parameters
Summary data from a population

Statistics
Summary data from a sample


Populations
A population is the group from which a sample is drawn
e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room

In research, it is not practical to include all members of a population
Thus, a sample (a subset of a population) is taken

Random samples
Subjects are selected from a population so that each individual has an equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
They may be biased regarding age, severity of the condition, socioeconomic status, etc.

Random samples (cont.)


Random samples are rarely utilized in health care research
Instead, patients are randomly assigned to treatment and control groups
Each person has an equal chance of being assigned to either of the groups

Random assignment is also known as randomization
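
The difference between random sampling and random assignment can be sketched in a few lines of Python. This is only an illustration: the patient IDs, the function names, and the choice of two equal-sized groups are assumptions made for the example, not part of the slides.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Random sampling: every member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def randomize(subjects, seed=None):
    """Random assignment: shuffle the enrolled subjects, then split them in half."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical patient IDs, for illustration only
patients = [f"P{i:02d}" for i in range(1, 21)]

sample = draw_random_sample(patients, 5, seed=1)
treatment, control = randomize(patients, seed=42)
```

Passing a seed makes the allocation reproducible, which may be convenient for checking the sketch; a real trial would follow its own allocation procedure.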



Descriptive statistics (DSs)


A way to summarize data from a sample or a population
DSs illustrate the shape, central tendency, and variability of a set of data
The shape of data has to do with the frequencies of the values of observations


DSs (cont.)
Central tendency describes the location of the middle of the data
Variability is the extent to which values are spread above and below the middle values
a.k.a. dispersion

DSs can be distinguished from inferential statistics


DSs are not capable of testing hypotheses

Hypothetical study data (partial from book)


Distribution provides a summary of:
Frequencies of each of the values
Visits     2   3   4   5   6   7
Frequency  3   4   3   1   1   2

Case #   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Visits   7   2   2   3   4   3   5   3   4   6   2   3   7   4

Ranges of values
Lowest = 2
Highest = 7

Frequency distribution table


Visits  Frequency  Percent  Cumulative %
2       3          21.4     21.4
3       4          28.6     50.0
4       3          21.4     71.4
5       1           7.1     78.6
6       1           7.1     85.7
7       2          14.3     100.0
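
As a rough sketch, a frequency distribution table like this one can be built from the 14 visit counts in the hypothetical study data. The function name `frequency_table` is an assumption for the example. Cumulative percentages are computed here from the running counts rather than by summing rounded percentages, which can differ by 0.1 in some rows.

```python
from collections import Counter

# Visits per case, taken from the hypothetical study data in these slides
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        percent = round(100 * counts[value] / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((value, counts[value], percent, cumulative))
    return rows


for value, freq, pct, cum_pct in frequency_table(visits):
    print(f"{value:>6} {freq:>9} {pct:>7.1f} {cum_pct:>12.1f}")
```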


Frequency distributions are often depicted by a histogram


Histograms (cont.)
A histogram is a type of bar chart, but there are no spaces between the bars
Histograms are used to visually depict frequency distributions of continuous data
Bar charts are used to depict categorical information
e.g., male/female, mild/moderate/severe, etc.

Measures of central tendency


Mean (a.k.a., average)
The most commonly used DS

To calculate the mean


Add all values of a series of numbers and then divide by the total number of elements


Formula to calculate the mean


Mean of a sample: $\bar{X} = \frac{\sum X}{n}$
Mean of a population: $\mu = \frac{\sum X}{N}$

$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
$n$ is the total number of values in the series of a sample and $N$ is the same for a population
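
The formula can be tried on the 14 visit counts from the hypothetical study data shown earlier in these slides. This is a minimal sketch; the function name `mean` is simply a label chosen for the example.

```python
# Visits per case, taken from the hypothetical study data in these slides
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]


def mean(values):
    """Sum all values, then divide by how many values there are."""
    if not values:
        raise ValueError("mean of an empty series is undefined")
    return sum(values) / len(values)


sample_mean = mean(visits)  # 55 visits / 14 cases, about 3.93
```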


Measures of central tendency (cont.)

Mode
The most frequently occurring value in a series
The modal value is the highest bar in a histogram
[Figure: histograms with the mode labeled]


Measures of central tendency (cont.)

Median
The value that divides a series of values in half when they are all listed in order
When there are an odd number of values
The median is the middle value
When there are an even number of values
Count from each end of the series toward the middle and then average the 2 middle values
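
The odd and even cases for the median can be sketched as follows, again using the visit counts from the hypothetical study data. The hand-written `median` function is only for illustration; Python's standard `statistics` module provides `median` and `mode` directly, and `statistics.mode` is used here for the most frequent value.

```python
import statistics

# Visits per case, taken from the hypothetical study data in these slides
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]


def median(values):
    """Middle value of the ordered series, or the average of the 2 middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the 2 middle values


median_visits = median(visits)          # 14 values, so average the 7th and 8th (3 and 4)
mode_visits = statistics.mode(visits)   # the value 3 occurs four times
```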


Measures of central tendency (cont.)

Each of the three methods of measuring central tendency has certain advantages and disadvantages
Which method should be used?
It depends on the type of data that is being analyzed
e.g., categorical, continuous, and the level of measurement that is involved

Levels of measurement
There are 4 levels of measurement
Nominal, ordinal, interval, and ratio

1. Nominal
Data are coded by a number, name, or letter that is assigned to a category or group
Examples
Gender (e.g., male, female)
Treatment preference (e.g., manipulation, mobilization, massage)

Levels of measurement (cont.)


2. Ordinal
Is similar to nominal because the measurements involve categories
However, the categories are ordered by rank
Examples
Pain level (e.g., mild, moderate, severe)
Military rank (e.g., lieutenant, captain, major, colonel, general)


Levels of measurement (cont.)


Ordinal values only describe order, not quantity
Thus, severe pain is not the same as 2 times mild pain

The only arithmetic operation allowed for nominal and ordinal data is counting the members of each category
e.g., 25 males and 30 females

Levels of measurement (cont.)


3. Interval
Measurements are ordered (like ordinal data)
Have equal intervals
Do not have a true zero
Examples
The Fahrenheit scale, where 0 does not correspond to an absence of heat (no true zero)
In contrast to the Kelvin scale, which does have a true zero

Levels of measurement (cont.)


4. Ratio
Measurements have equal intervals
There is a true zero
Ratio is the most advanced level of measurement, which can handle most types of mathematical operations


Levels of measurement (cont.)


Ratio examples
Range of motion
No movement corresponds to zero degrees
The interval between 10 and 20 degrees is the same as between 40 and 50 degrees

Lifting capacity
A person who is unable to lift scores zero
A person who lifts 30 kg can lift twice as much as one who lifts 15 kg

Levels of measurement (cont.)


NOIR is a mnemonic to help remember the names and order of the levels of measurement
Nominal Ordinal Interval Ratio


Levels of measurement (cont.)


Measurement scale   Permissible mathematic operations                      Best measure of central tendency
Nominal             Counting                                               Mode
Ordinal             Greater or less than operations                        Median
Interval            Addition and subtraction                               Symmetrical: Mean; Skewed: Median
Ratio               Addition, subtraction, multiplication and division     Symmetrical: Mean; Skewed: Median
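
The last column of this table can be read as a simple lookup. The sketch below encodes it as a small Python function; the function name and the `skewed` flag are assumptions introduced for the example.

```python
def best_central_tendency(level, skewed=False):
    """Return the measure the table suggests for a given level of measurement."""
    level = level.lower()
    if level == "nominal":
        return "mode"
    if level == "ordinal":
        return "median"
    if level in ("interval", "ratio"):
        # Symmetrical data: mean; skewed data: median
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


choice = best_central_tendency("ratio", skewed=True)
```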


The shape of data


Histograms of frequency distributions have shape
Distributions are often symmetrical with most scores falling in the middle and fewer toward the extremes
Many biological measurements are approximately symmetrically distributed and form a normal curve (a.k.a., bell-shaped curve)

The shape of data (cont.)

[Figure: line depicting the shape of the data]


The normal distribution


Data that form a normal curve are said to have a normal distribution (a.k.a., Gaussian distribution)
Properties of a normal distribution
It is symmetric about its mean
The highest point is at its mean
The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching zero

The normal distribution (cont.)


[Figure: normal curve overlying a histogram, with the mean marked]
The highest point of the overlying normal curve is at the mean
As one moves away from the mean in either direction, the height of the curve decreases, approaching, but never reaching zero
A normal distribution is symmetric about its mean


The normal distribution (cont.)


Mean = Median = Mode


Skewed distributions
The data are not distributed symmetrically in skewed distributions
Consequently, the mean, median, and mode are not equal and are in different positions
Scores are clustered at one end of the distribution
A small number of extreme values are located in the limits of the opposite end

Skewed distributions (cont.)


Skew is always toward the direction of the longer tail
Positive if skewed to the right
Negative if skewed to the left
The mean is shifted the most, in the direction of the tail


Skewed distributions (cont.)


Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions
The median is a better estimate of the center of skewed distributions
It will be the central point of any distribution
50% of the values are above and 50% below the median
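
A small example may help show how much one extreme value shifts the mean while leaving the median in place. The visit counts below are made up for illustration (they are not from the slides), with a single extreme value forming a long right tail.

```python
import statistics

# Hypothetical, positively skewed data: most values are small, one is extreme
visits = [2, 2, 3, 3, 3, 4, 4, 5, 30]

mean_visits = statistics.mean(visits)      # 56 / 9, about 6.2
median_visits = statistics.median(visits)  # 3, the middle of the 9 ordered values

# Dropping the extreme value barely moves the median but pulls the mean back
without_extreme = visits[:-1]
mean_without = statistics.mean(without_extreme)      # 26 / 8 = 3.25
median_without = statistics.median(without_extreme)  # still 3
```

In this sketch the mean sits above all but one of the observations, which is why the median is usually the safer summary of where a skewed series is centered.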

More properties of normal curves


About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean
About 95.4% is within two SDs
About 99.7% is within three SDs
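
These proportions can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the proportion within k SDs of the mean equals erf(k / sqrt(2)); the function name `proportion_within` is an assumption for the example.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD(s): {proportion_within(k):.4f}")
# prints 0.6827, 0.9545, and 0.9973 for k = 1, 2, 3
```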


More properties of normal curves (cont.)


Standard deviation (SD)


SD is a measure of the variability of a set of data
The mean represents the average of a group of scores, with some of the scores being above the mean and some below
This range of scores is referred to as variability or spread

Variance ($S^2$) is another measure of spread



SD (cont.)
In effect, SD is the average amount of spread in a distribution of scores
The next slide shows a group of 10 patients whose mean age is 40 years
Some are older than 40 and some younger


SD (cont.)
Ages are spread out along an X axis
The amount ages are spread out is known as dispersion or spread


Distances ages deviate above and below the mean

Adding deviations always equals zero


Calculating $S^2$
To find the average deviation, one would normally total the deviations above and below the mean and then divide by the number of values
However, the total always equals zero
Deviations must first be squared, which cancels the negative signs


Calculating $S^2$ (cont.)

$S^2$ is not in the same units (age), but SD is

Symbol for SD of a sample: $S$
Symbol for SD of a population: $\sigma$
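
The steps from deviations to variance to SD can be traced in a short sketch. The 10 ages below are invented for illustration (chosen so that their mean is 40); the slide's own values are not recoverable here. The formulas on the slide did not survive extraction, so the divisors are an assumption based on the usual convention: N for a population and n - 1 for a sample.

```python
import math
import statistics

# Hypothetical ages of 10 patients, chosen so the mean is 40 years
ages = [25, 30, 35, 38, 40, 40, 42, 45, 50, 55]

mean_age = sum(ages) / len(ages)
deviations = [age - mean_age for age in ages]
sum_of_deviations = sum(deviations)        # always zero, so it cannot measure spread

squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)   # squaring removes the negative signs

# Assumed conventions: divide by N for a population, by n - 1 for a sample
population_variance = sum_of_squares / len(ages)
sample_variance = sum_of_squares / (len(ages) - 1)

# The square root returns the result to the original units (years)
sample_sd = math.sqrt(sample_variance)
```

The standard library's `statistics.variance`, `statistics.pvariance`, and `statistics.stdev` give the same results and can be used to cross-check the hand calculation.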

Calculating SD with Excel

Enter values in a column


SD with Excel (cont.)

Click Data Analysis on the Tools menu


SD with Excel (cont.)

Select Descriptive Statistics and click OK


SD with Excel (cont.)

Click the Input Range icon


SD with Excel (cont.)

Highlight all the values in the column


SD with Excel (cont.)

Check the labels box if labels are in the first row
Check Summary Statistics
Click OK


SD with Excel (cont.)

SD is calculated precisely
Plus several other DSs


Wide spread results in higher SDs; narrow spread results in lower SDs


Spread is important when comparing 2 or more group means

It is more difficult to see a clear distinction between groups in the upper example because the spread is wider, even though the means are the same


z-scores
The number of SDs that a specific score is above or below the mean in a distribution
Raw scores can be converted to z-scores by subtracting the mean from the raw score then dividing the difference by the SD
$z = \frac{X - \mu}{\sigma}$, where $X$ is the raw score, $\mu$ is the mean, and $\sigma$ is the SD
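
The conversion described here amounts to one line of arithmetic. The sketch below wraps it in a function; the name `z_score` and the example numbers (a mean of 40 and an SD of 10) are hypothetical values chosen for illustration.

```python
def z_score(raw_score, mean, sd):
    """Number of SDs that raw_score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (raw_score - mean) / sd


# Hypothetical distribution with a mean of 40 and an SD of 10
z_above = z_score(55, 40, 10)  # 1.5 SDs above the mean
z_below = z_score(30, 40, 10)  # 1 SD below the mean
```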

z-scores (cont.)
Standardization
The process of converting raw scores to z-scores
The resulting distribution of z-scores will always have a mean of zero, an SD of one, and an area under the curve equal to one

The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
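
Both points can be checked with Python's standard `statistics` module, as a rough sketch. The ages are hypothetical, `standardize` is a helper name invented for the example, and `NormalDist().cdf` stands in for the z-table lookup, which assumes the scores are normally distributed.

```python
import statistics
from statistics import NormalDist


def standardize(values):
    """Convert raw scores to z-scores using the sample mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical ages, for illustration only
ages = [25, 30, 35, 38, 40, 40, 42, 45, 50, 55]
z_scores = standardize(ages)  # mean of about 0 and SD of about 1

# Proportion of a standard normal distribution below and above z = 1.5
standard_normal = NormalDist(mu=0, sigma=1)
proportion_below = standard_normal.cdf(1.5)  # about 0.9332, as in a z-table
proportion_above = 1 - proportion_below      # about 0.0668
```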

z-scores (cont.)
Refer to a z-table to find the proportion under the curve


z-scores (cont.)

Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441

[Figure callout: the highlighted value, 0.9332 (z = 1.50), corresponds to the area under the curve in black]
