BIOSTATISTICS M391

By Dr. Atallah Z. Rabi

Definitions
Statistics: a field of study concerned with:
the collection, organization, summarization, and analysis of data, and the drawing of inferences about a body of data when only a part of the data is observed.

Biostatistics: the tools of statistics applied in the biological sciences and medicine.

Definitions (cont.)
Statistics is the science and art of collecting, summarizing, and analyzing data that are subject to random variation (Last, 1995). Biostatistics is the application of statistics to biological problems. Data are a collection of items of information. A variable is any quantity that varies: any attribute, phenomenon, or event that can have different values.

Sources of Data
Data are the raw material of statistics and are used to answer questions. Sources of data include routinely kept records (e.g., hospital medical records), surveys (e.g., information about patients' mode of transportation), experiments (e.g., finding the best strategy for patient compliance), and external sources (e.g., published reports).

Common terms used in statistics

Population, sample, variables, measurements, statistical inference, and simple random sample.

Population
A population is the largest collection of entities in which we have an interest at a particular time (e.g., the weights of all newborn babies in a hospital). A population of values is the largest collection of values of a random variable in which we have an interest at a particular time.
A finite population consists of a fixed number of values; an infinite population consists of an endless succession of values.

Sample
A sample is a part of a population (e.g., the weights of some selected newborn babies). There are different types of samples and different sampling techniques.

Variables
A variable is a characteristic that takes different values in different persons, places, or things. Examples of variables: diastolic blood pressure, heart rate, height of adult males, weight of newborn babies, and ages of patients.

Types of variables
Quantitative variables (e.g., weight, height, age) convey information regarding amount.
Qualitative variables (e.g., sick, diabetic) convey information regarding an attribute.

Random variable
Discrete random variable: takes whole-number values (e.g., number of daily admissions).
Continuous random variable: can take any value within a range (e.g., height, weight, skull circumference).

Measurement
Measurement is defined as the assignment of numbers to objects or events according to a set of rules. Measurement has different scales:
Nominal scale: categories only (e.g., male/female, well/sick); the categories are mutually exclusive and collectively exhaustive.
Ordinal scale: observations can be ranked (e.g., low, medium, and high economic status).
Interval scale: the distance between any two measurements is known, but the zero point is arbitrary (e.g., temperature in °C).
Ratio scale: there is a true zero point (e.g., height, weight, and length).
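The difference between the nominal and ordinal scales can be made concrete in code. This is a minimal sketch; the class names `Sex` and `EconomicStatus` and their integer codes are illustrative assumptions, not part of the course material. Ordinal categories can be sorted, while nominal labels have no defined order.

```python
from enum import Enum, IntEnum

class Sex(Enum):
    """Nominal scale: labels only; the categories have no order."""
    MALE = "male"
    FEMALE = "female"

class EconomicStatus(IntEnum):
    """Ordinal scale: categories can be ranked.

    The integer codes (an illustrative assumption) express order only;
    the gap between LOW and MEDIUM is not claimed to equal the gap between
    MEDIUM and HIGH. Equal, known distances would need an interval scale.
    """
    LOW = 1
    MEDIUM = 2
    HIGH = 3

observed = [EconomicStatus.HIGH, EconomicStatus.LOW, EconomicStatus.MEDIUM]
print([s.name for s in sorted(observed)])  # ['LOW', 'MEDIUM', 'HIGH']

# Ranking nominal labels is not defined: Sex.MALE < Sex.FEMALE raises TypeError.
```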

Statistical Inference
Statistical Inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been drawn from that population.

Simple random sample

If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample.
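A simple random sample can be drawn with Python's standard `random` module. In this sketch the sampling frame (ID numbers 1 to 500 for the newborns in one hospital) and the helper name `simple_random_sample` are hypothetical. `random.Random.sample` draws without replacement, so every subset of size n has the same chance of selection, as the definition requires.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    A seed is optional; fixing it makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers of N = 500 newborns in one hospital.
population = list(range(1, 501))
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
```

Recording the seed lets someone else reproduce exactly which individuals were selected.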

What is data?
Data are numbers, and numbers result from:
measurement (e.g., body temperature, body weight) or counting (e.g., number of patients admitted).

Why Statistics?
You should not ignore it: it is too useful. You cannot fight it: everyone else uses it. It gives the right answer 95% of the time, but you do not know which 95%. It is great fun. Trust me.

Why Statistics? Examples

Which drugs should be allowed on the market? What Public Health programs should be pursued? What programs would reduce infant mortality? Are cell phones a good idea for drivers? Is it a good idea for post-menopausal women to take estrogen?

Probability and Statistics

Probability generalizes the concept of replicability. Statistics are often used for decisions about specific (non-replicable) situations. These decisions are often made in the context of what is likely to happen in that specific situation. Probability (likelihood) reflects our belief about replicability.

Populations, Samples, and Individuals

Aristotle speculated about the population of all women (compared to the population of men). He had immediately available to him a sample of two women, and he could have counted the teeth of those two individuals. The population is the collection of all people about whom you would like to ask a research question. This might be a fairly clear-cut, easily defined set of people: What proportion of people 65 or older in the US today have Alzheimer's disease? Or it might be a more hypothetical group: How much of a reduction in symptomatic days could a person expect if treated with a new antiviral for flu?

Typically, you can't study everyone in the population. You can't afford to have everyone 65 or older in the US seen by a neurologist, even if you could find all the old people! You can't test everyone with the flu because the cases haven't even occurred yet! So you study a sample, and you try to generalize to the population. The sample size is the number of individuals in the sample (not the number of measurements you make on each person!). A good study design will help make your sample representative of the population you are concerned about. Good statistical analysis will help tell you the best answer to your question about the population, and also how far off you might be.
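The point that sample size counts individuals, not measurements, is easy to get wrong when data are stored one row per measurement. This sketch assumes a hypothetical layout of `(individual_id, value)` rows with invented systolic BP readings; the helper `sample_size` is illustrative.

```python
def sample_size(rows):
    """Number of distinct individuals in (individual_id, value) rows."""
    return len({individual_id for individual_id, _ in rows})

# Hypothetical systolic BP readings (mmHg): several measurements per patient.
readings = [
    ("patient_01", 120), ("patient_01", 118), ("patient_01", 122),
    ("patient_02", 135), ("patient_02", 131),
    ("patient_03", 110),
]

print(len(readings))          # 6 measurements
print(sample_size(readings))  # 3 individuals: this is n
```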

Looking at data: categorical or continuous?

Most data fall into two broad classes.

Continuous data are used to report a measurement of the individual that can take on any value within an acceptable range, for example age, systolic BP, [K+], or change in weight over 6 months.

Categorical data are used to report a characteristic of the individual that has a finite, usually small, number of possibilities. The categories should be clear-cut, not overlapping, and cover all the possibilities, for example sex (male or female), vital status (alive or dead), disease stage (depends on the disease), or ever smoked (yes or no). Make sure you are very clear about the definitions: does "one cigarette, and I didn't inhale" count as smoking? When designing a study, allow for missing values and refusals.

All biostatistics begins with description. Before you do anything else, you look at the data and summarize the data. Our goal in this hour is to show you how to get a first look at the data and get ready to do more elaborate procedures. A statistic is just a numerical summary of the data, like the largest number in the data set. Descriptive statistics should be clear and easily interpreted. They should not mislead you about the data they are summarizing.

Measures of central tendency

Measures of central tendency tell you, in some sense, where a typical person might be expected to fall: in the middle of the data. The mean is the arithmetic average. For example, if 3 people were in hospital for 8, 10, and 30 days, respectively, the mean stay is 48/3 = 16 days. The median is the value at which half the numbers are higher and half are lower. If the number of individuals is odd, the median is the middle value (rank (n+1)/2); if the number is even, it is the average of the two middle values. The mode is the most common value; it is rarely used.
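The rank rule for the median can be written out directly and checked against Python's `statistics` module, using the hospital-stay example above. The function name `median_by_rank` is an illustrative assumption; the fourth stay (40 days) and the repeated value in the mode example are hypothetical additions used only to show the even-n and mode cases.

```python
import statistics

def median_by_rank(values):
    """Median by the rank rule: middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value of rank (n + 1) / 2, i.e. 0-based index n // 2.
        return ordered[mid]
    # Even n: average of the values of rank n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

stay_days = [8, 10, 30]  # days in hospital, from the example above

print(statistics.mean(stay_days))        # 16
print(median_by_rank(stay_days))         # 10
print(median_by_rank([8, 10, 30, 40]))   # 20.0 (hypothetical fourth stay of 40 days)
print(statistics.mode([8, 10, 10, 30]))  # 10 (hypothetical data with a repeated value)
```

Note how the single long stay of 30 days pulls the mean (16) well above the median (10).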

Mean Calculation
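The formula shown on the original slide did not survive extraction. The standard sample-mean formula, assumed here to be what the slide presented, applied to the hospital-stay example (n = 3 stays of 8, 10, and 30 days) gives:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{8 + 10 + 30}{3} = \frac{48}{3} = 16 \text{ days}
```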

Measures of dispersion
Range, variance, and standard deviation
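The three measures of dispersion can be computed for the same hospital-stay data used for the mean. This sketch assumes the sample variance (dividing by n - 1), which is what Python's `statistics.variance` and `statistics.stdev` use. The by-hand calculation is checked against the standard library.

```python
import math
import statistics

stay_days = [8, 10, 30]  # days in hospital, from the central-tendency example

data_range = max(stay_days) - min(stay_days)  # largest minus smallest value

# Sample variance by hand: sum of squared deviations from the mean, over n - 1.
mean = sum(stay_days) / len(stay_days)
ss = sum((x - mean) ** 2 for x in stay_days)
variance = ss / (len(stay_days) - 1)
sd = math.sqrt(variance)  # standard deviation: square root of the variance

# The standard library uses the same n - 1 definition.
assert math.isclose(variance, statistics.variance(stay_days))
assert math.isclose(sd, statistics.stdev(stay_days))

print(data_range, variance, round(sd, 2))  # 22 148.0 12.17
```

The deviations from the mean of 16 are -8, -6, and 14; their squares sum to 296, and 296 / 2 = 148.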

[Figure: Line histogram showing the distribution of heart rate (HR) in women, titled "Women's HR at 1 L O2/min"; x-axis: heart rate (105 to 175), y-axis: number of women (0 to 14).]
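A line histogram plots the number of individuals in each class interval. The sketch below builds such a frequency table. The heart-rate values are invented for illustration and are not the data behind the figure; the 5-beat intervals starting at 105 are also an assumption, and the helper `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Count values in consecutive class intervals [start + k*width, start + (k+1)*width).

    Empty intervals are kept with a count of 0 so the plotted line
    does not skip over gaps.
    """
    if any(v < start for v in values):
        raise ValueError("all values must be >= start")
    counts = Counter((v - start) // width for v in values)
    n_bins = max(counts) + 1
    return {start + k * width: counts.get(k, 0) for k in range(n_bins)}

# Hypothetical heart rates (beats/min); NOT the data behind the figure.
hr = [108, 112, 117, 119, 121, 124, 126, 133, 141]
width = 5
for lower, count in frequency_table(hr, start=105, width=width).items():
    print(f"{lower}-{lower + width}: {count}")
```

Each interval includes its lower bound and excludes its upper bound, so a heart rate of exactly 110 is counted in the 110-115 interval.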