
DATA

Types, Processing, and Presentation


WRESTI INDRIATMI
Dept. of Dermatology & Venereology
FKUI – RSCM
Jakarta
Biostat_3 MDU-PPDS FKUI Wresti Indriatmi 2

Variables & Data

Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008

Types of variables

Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008

Types of data

Nominal categorical variables

• Data that one can name and put into categories.
• Not measured but simply counted.
• Often consist of unordered ‘either–or’ observations which have two categories and are often known as binary:
– Dead or Alive; Male or Female; Cured or Not Cured; Pregnant or Not Pregnant.
• Can have more than two categories:
– blood group O, A, B, AB; country of origin; ethnic group or eye colour.
• Can be summarised only by the number and percentage in each category.
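Because a nominal variable can only be counted, its whole summary is a frequency table of counts and percentages. A minimal Python sketch; the helper `frequency_table` and the blood-group data (40 hypothetical patients) are illustrative assumptions, not part of the source:

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

# Hypothetical blood groups of 40 patients (illustrative data only)
blood_groups = ["O"] * 18 + ["A"] * 12 + ["B"] * 8 + ["AB"] * 2

for category, n, pct in frequency_table(blood_groups):
    print(f"{category:>2}: {n} ({pct:.0f}%)")
```

Ordering the rows by frequency follows the advice given later for tables without a natural order.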

Ordinal categorical variables


• The data do not have any units of measurement.
• The ordering of the categories is not arbitrary, as it was with nominal variables: it is now possible to order the categories in a meaningful way.
• The difference between any pair of adjacent scores is not necessarily the same as the difference between any other pair of adjacent scores.
• Ordinal data therefore are not real numbers; they cannot be placed on the number line.

1. Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008
2. Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007

Ordinal categorical variables


• EDUCATION is given in four categories:
A. None
B. Elementary school
C. Middle school
D. College and above
• Thus someone who has been to middle school has more education than someone from elementary school, but less than someone from college.
• One cannot say that someone who had middle school education is twice as educated as someone who had only elementary school education.

1. Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008
2. Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007
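The education example can be sketched in Python with an ordered enumeration: the integer codes support ordering (and hence a middle category) but not arithmetic. The class `Education`, its codes and the responses are hypothetical; using `median_low` so that the result is always an observed category is a design assumption:

```python
from enum import IntEnum
from statistics import median_low

class Education(IntEnum):
    # The codes only record order: NONE < ELEMENTARY < MIDDLE < COLLEGE
    NONE = 1
    ELEMENTARY = 2
    MIDDLE = 3
    COLLEGE = 4

# Hypothetical responses (illustrative data only)
responses = [Education.MIDDLE, Education.NONE, Education.ELEMENTARY,
             Education.COLLEGE, Education.ELEMENTARY]

print(Education.MIDDLE > Education.ELEMENTARY)  # True: ordering is meaningful
print(median_low(responses).name)               # ELEMENTARY: the middle observed category
# MIDDLE / ELEMENTARY would evaluate to 1.5, but that number has no meaning:
# middle school is not "1.5 times" elementary school education.
```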

Ranks
In some studies it may be appropriate to assign ranks:
• Patients with rheumatoid arthritis may be asked to order their preference for four dressing aids.
• Although numerical values from 1 to 4 may be assigned to each aid, one cannot treat them as numerical values.
• They are in fact only codes for
1. best,
2. second best,
3. third choice and
4. worst.

Discrete metric variables (Count data)

• Metric discrete variables can be properly counted and have units of measurement: ‘numbers of things’.
• They produce data which are real numbers located on the number line.
• Examples:
– The number of pregnancies each woman has had
– Counts per unit of time, such as the number of deaths in a hospital per year
– The number of attacks of asthma a person has per month
– In dentistry, the number of decayed, filled or missing teeth (DFM)

1. Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008
2. Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007

Continuous metric variables

• These data contain the most information, and are the ones most commonly used in statistics:
– age, years of menstruation and body mass index.
• For simplicity, continuous data are sometimes dichotomised to make nominal data:
– diastolic blood pressure, which is continuous, is converted into hypertension (>90 mmHg) and normotension (≤90 mmHg).
• One can also divide a continuous variable into more than two groups.

Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007
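The dichotomisation of diastolic blood pressure amounts to a single comparison with the cut-off. A minimal Python sketch; the function name `classify_dbp` and the readings are hypothetical, and the 90 mmHg cut-off is the one stated on the slide:

```python
def classify_dbp(diastolic_mmhg):
    """Dichotomise diastolic blood pressure: >90 mmHg hypertension, <=90 mmHg normotension."""
    return "hypertension" if diastolic_mmhg > 90 else "normotension"

readings = [78, 90, 91, 104]   # hypothetical readings in mmHg
print([classify_dbp(r) for r in readings])
```

Note that a reading of exactly 90 mmHg falls in the normotension group, because the rule uses a strict ">". The information lost by dichotomising is visible here: 91 and 104 mmHg become indistinguishable.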

Continuous metric variables


1. Metric continuous variables can be properly measured and have units of measurement.
2. They produce data that are real numbers (located on the number line).
3. The difference between any pair of adjacent values is exactly the same.
4. All of the usual mathematical operations can be applied.

Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008

Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007

Interval and ratio scales

• In an interval scale, such as body temperature or calendar dates, a difference between two measurements has meaning, but their ratio does not.
– When temperature is measured in degrees centigrade, we cannot say that a temperature of 20°C is twice as hot as a temperature of 10°C.
• In a ratio scale, such as body weight, ratios do have meaning:
– a 10% increase implies the same weight increase whether expressed in kilograms or pounds.
• The crucial difference:
– in a ratio scale, the value of zero has real meaning,
– in an interval scale, the position of zero is arbitrary.

Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th Ed. Chichester, John Wiley & Sons. 2007
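The consequence of an arbitrary zero can be checked numerically: changing the temperature scale changes the ratio of two temperatures, whereas changing the weight unit leaves a ratio of weights unchanged. A Python sketch; the conversion factor `KG_TO_LB` is approximate and the weights are hypothetical:

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius (centigrade) to degrees Fahrenheit."""
    return c * 9 / 5 + 32

KG_TO_LB = 2.20462  # approximate pounds per kilogram

# Interval scale: the ratio of two temperatures depends on where zero is placed
print(20 / 10)                                                # 2.0 on the Celsius scale
print(celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))  # 68 / 50 = 1.36 on the Fahrenheit scale
# ... whereas a difference survives the change of unit: 10 °C apart is always 18 °F apart
print(celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))  # 18.0

# Ratio scale: a 10% weight gain is 10% whether measured in kg or lb
before_kg, after_kg = 70.0, 77.0
print(after_kg / before_kg)                                   # about 1.1
print((after_kg * KG_TO_LB) / (before_kg * KG_TO_LB))         # also about 1.1
```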

An algorithm to help identify variable type

Bowers D. Medical Statistics from scratch. 2nd Ed. Chichester, John Wiley & Sons Ltd. 2008
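As a rough textual stand-in for such an algorithm, the definitions on the preceding slides can be chained into a decision function. This is a sketch under that assumption, not Bowers' flowchart verbatim; `variable_type` and its yes/no arguments are hypothetical:

```python
def variable_type(has_units, is_counted=False, is_ordered=False):
    """Classify a variable using the definitions of the four variable types.

    has_units:  can it be properly counted or measured, with units of measurement?
    is_counted: (metric only) does it come from counting rather than measuring?
    is_ordered: (categorical only) can the categories be ordered meaningfully?
    """
    if has_units:
        return "metric discrete" if is_counted else "metric continuous"
    return "ordinal categorical" if is_ordered else "nominal categorical"

print(variable_type(False))                   # blood group
print(variable_type(False, is_ordered=True))  # education level
print(variable_type(True, is_counted=True))   # number of pregnancies
print(variable_type(True))                    # body weight
```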

Summarising categorical data

• A ratio is simply one number divided by another.
– If we measure the weight of a person (in kg) and the height (in metres), then the ratio of weight to height² is the Body Mass Index.
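The Body Mass Index ratio written out in Python; the function name `body_mass_index` and the example person (70 kg, 1.75 m) are illustrative assumptions:

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) divided by the square of height (m), in kg/m²."""
    return weight_kg / height_m ** 2

print(round(body_mass_index(70, 1.75), 1))   # 70 / 1.75² = 70 / 3.0625, about 22.9 kg/m²
```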

Summarising categorical data

• Proportions are ratios of counts where the numerator (the top number) is a subset of the denominator (the bottom number).
• In a study of 50 patients, 30 are depressed, so the
proportion is 30/50 or 0.6.
• It is usually easier to express this as a percentage, so we
multiply the proportion by 100, and state that 60% of the
patients are depressed.
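The depression example in Python: the defining property of a proportion (numerator a subset of the denominator) becomes a check, and the percentage is the proportion multiplied by 100. The helper name `proportion` is hypothetical:

```python
def proportion(numerator, denominator):
    """Proportion of a group with a characteristic; the numerator must be a subset of the denominator."""
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must be a subset of the denominator")
    return numerator / denominator

p = proportion(30, 50)       # 30 depressed patients out of 50
print(p, f"{100 * p:.0f}%")  # 0.6 60%
```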

Summarising categorical data


• A proportion is known as a risk if the
numerator counts events which happen
prospectively.
• Hence if 300 students start nursing school and
15 drop out before finals, the risk of dropping
out is 15/300 = 0.05 or 5%.

Summarising categorical data

• Rates always have a time period attached.
– If 600 000 people in the UK die in one year, out of a population of 60 000 000, the death rate is 600 000/60 000 000 or 0.01 deaths per person per year.
– This is known as the crude death rate. Rates are often expressed as deaths per thousand per year, so the crude death rate is 10 deaths per thousand per year, since it is much easier to imagine 1000 people, of whom 10 die, than it is 0.01 deaths per person.
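The crude death rate calculation in Python, with the multiplier as a parameter; `crude_rate_per` is a hypothetical helper, and the time period (one year) is implied by the counts supplied rather than stored in the code:

```python
def crude_rate_per(deaths, population, per=1000):
    """Crude death rate per `per` persons, over the period covered by the death count."""
    return deaths / population * per

print(crude_rate_per(600_000, 60_000_000, per=1))  # deaths per person per year
print(crude_rate_per(600_000, 60_000_000))         # deaths per 1000 per year
```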

Distributions of variables –
Categorical variables

• NOMINAL variable: the distribution is given by the frequency with which different possible values of the variable occur in the data.
• Bar chart:
– The height of each bar = the frequency
– One axis, showing the frequency, has a scale
– The other axis describes the categories under study

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004

Distributions of variables –
Categorical variables

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004

Distributions of variables –
Numerical variables
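• The strategy used for nominal variables (giving the frequency of each possible value) is usually not possible, because a numerical variable can take too many different values.
• Instead, group the data into intervals on the measuring scale, called bins or class intervals.
• The bins need not be of equal size.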

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004

Distributions of variables –
Numerical variables
• Important differences between a histogram and a bar chart:
– in a bar chart, the height of the column represents the frequency;
– in a histogram, the frequency is represented by the area of the column.

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004
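Why area rather than height matters becomes clear with unequal bins: dividing each count by its bin width (the frequency density) makes each bar's area equal to its frequency. A Python sketch; `frequency_density`, the ages and the bin edges are hypothetical, and bins are taken as half-open intervals [low, high):

```python
def frequency_density(values, edges):
    """Count values in each bin [edges[i], edges[i+1]) and divide by the bin width.

    Using these densities as bar heights makes each bar's area equal to its
    frequency, so a wide bin is not visually exaggerated.
    """
    rows = []
    for low, high in zip(edges, edges[1:]):
        count = sum(low <= v < high for v in values)
        rows.append((low, high, count, count / (high - low)))
    return rows

# Hypothetical ages; the last bin (40 to 80) is four times wider than the others
ages = [12, 15, 18, 22, 25, 27, 29, 33, 45, 52, 61, 70]
for low, high, count, density in frequency_density(ages, [10, 20, 30, 40, 80]):
    print(f"{low} to {high}: n = {count}, density = {density} per year of age")
```

The bins 20 to 30 and 40 to 80 both contain four observations, but the wider bin gets a bar one quarter as tall.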

Descriptive statistics for a continuous variable

1. Measures of location
• Mean is the average value
• Median is the middle point of the ordered data
• Mode is the most common value observed

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004
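Python's standard `statistics` module provides all three measures of location directly; the lengths of stay below are hypothetical:

```python
from statistics import mean, median, mode

lengths_of_stay = [2, 3, 3, 5, 7]   # hypothetical hospital stays in days

print(mean(lengths_of_stay))    # 4 days: (2 + 3 + 3 + 5 + 7) / 5
print(median(lengths_of_stay))  # 3 days: middle of the ordered data
print(mode(lengths_of_stay))    # 3 days: most common value
```

The single long stay of 7 days pulls the mean above the median, a first hint of skewness.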

Descriptive statistics for a continuous variable

2. Measures of the scale or spread
• Range refers to the difference between the maximum value and the minimum value in the data.
• Variance is the average of the squares of the differences between the mean and each observation. For finding the average, we do not use the number of observations but the number of degrees of freedom, which is one less than the number of observations.
• Standard deviation is the square root of the variance.

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004
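The three definitions translate line by line into Python, with n - 1 (the degrees of freedom) as the divisor for the variance. The helper `spread` and the data are hypothetical; the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, which the last line checks:

```python
import math
import statistics

def spread(values):
    """Return range, variance (divided by n - 1 degrees of freedom) and standard deviation."""
    n = len(values)
    m = sum(values) / n
    variance = sum((x - m) ** 2 for x in values) / (n - 1)
    return max(values) - min(values), variance, math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean 5
value_range, variance, sd = spread(data)
print(value_range, round(variance, 3), round(sd, 3))
print(math.isclose(variance, statistics.variance(data)), math.isclose(sd, statistics.stdev(data)))
```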

Descriptive statistics for a continuous variable

3. Measures of shape / symmetry
• Skewness refers to the degree of asymmetry of the distribution of the variable. The normal distribution has zero skewness.
• Kurtosis refers to the ‘peakedness’ of the distribution. The normal distribution has a kurtosis of 3 (equivalently, an excess kurtosis, defined as kurtosis minus 3, of 0).

Cook A, Netuveli G, Sheikh A. Basic skills in statistics. London, Class Publishing Ltd. 2004

Key Messages
• Categorical variables can be summarised using counts and percentages.
• Discrete numerical variables can be summarised using the mode and median as measures of location, and ranges and percentiles as measures of dispersion.
• Normally distributed numerical variables should be summarised using the mean and standard deviation.
• Non-normally distributed numerical variables should usually be summarised with the median and a measure of range.

Shapes of Distributions
SYMMETRIC:
• A histogram in which the right half is a mirror image of the left half.
SKEWED TO THE RIGHT:
• A histogram in which the right tail is more stretched out than the left (long tail to the right).
SKEWED TO THE LEFT:
• A histogram in which the left tail is more stretched out than the right (long tail to the left).

Shapes of Distributions
NUMBER OF MODAL CLASSES:
• The number of distinct peaks in a histogram.
BELL-SHAPED:
• A histogram that looks like a bell.

Skewness
• Skewness measures the degree of
asymmetry exhibited by the data
– Positive skewness – More
observations below the mean than
above it
– Negative skewness – A small number
of low observations and a large
number of high ones
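One common way to quantify skewness is the moment coefficient g1 = m3 / m2^1.5, where m_k is the k-th central moment. The sketch below uses that formula without the small-sample adjustment many statistics packages apply, so packaged software may report slightly different values; the function names and data are hypothetical:

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 (no small-sample correction)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical: most values low, one long tail to the right

print(skewness([1, 2, 3, 4, 5]))                   # 0.0 for symmetric data
print(skewness(right_skewed) > 0)                  # True: more observations below the mean
print(skewness([-x for x in right_skewed]) < 0)    # True: the mirror image is negatively skewed
```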

Skewness

Kurtosis
• Kurtosis measures how peaked the
histogram is
• The excess kurtosis (kurtosis minus 3) of a normal distribution is 0
• Kurtosis characterizes the relative
peakedness or flatness of a
distribution compared to the normal
distribution

Kurtosis
• Platykurtic: when the excess kurtosis < 0, the frequencies throughout the curve are closer to being equal (i.e., the curve is flatter and wider)
– Thus, negative excess kurtosis indicates a relatively flat distribution
• Leptokurtic: when the excess kurtosis > 0, there are high frequencies in only a small part of the curve (i.e., the curve is more peaked)
– Thus, positive excess kurtosis indicates a relatively peaked distribution
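A Python sketch of the moment coefficient of excess kurtosis, g2 = m4 / m2² - 3, which scores 0 for a normal distribution; as with skewness, statistical packages usually add a small-sample adjustment, so their figures can differ. The names and both data sets are hypothetical:

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def excess_kurtosis(values):
    """Moment coefficient g2 = m4 / m2 ** 2 - 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # evenly spread values: platykurtic
peaked = [5, 5, 5, 5, 5, 5, 4, 6, 1, 9]   # most values at the centre, a few far out: leptokurtic

print(round(excess_kurtosis(flat), 2))    # -1.23
print(round(excess_kurtosis(peaked), 2))  # 1.45
```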

Data Presentation Techniques

• TEXT
• TABLES
– the best way of showing structured numeric information
• GRAPHS / CHARTS
– better for showing relationships
– making comparisons
– indicating trends
– it is usual to include a table to show the data from which a graph was drawn

Useful Questions To Ask When Considering How To Display Information
• What do you want to show?
– It is vital to have a clear idea about what is to be displayed: is it important to demonstrate that two sets of data have different distributions, or that they have different mean values?
• What methods are available for this?
– What is the main message?
• Is the method chosen the best? Would another have been better?
– If a chart has been used, would a table have been better, or vice versa?

Recommendations For The Presentation Of Numbers
Summarising categorical data
• Both frequencies and percentages can be used.
• If percentages are reported, it is important that the denominator (i.e. total number of observations) is given.
Summarising continuous numerical data
• Use the mean and standard deviation.
• If the data have a skewed distribution, use the median and range or interquartile range.
• For all of these calculated quantities it is important to state the total number of observations on which they are based.

When to use a TABLE & when to use a GRAPH?
• Text: use a sentence if presenting only one or two numbers
• Tables: when one has more data, or wants to present exact numerical values
• Graphical figures: to give context and interpretation to numbers

When to use a TABLE & when to use a GRAPH?
• Use figures or graphs to highlight relationships.
• When using either tables or graphs to present study data:
– the table or figure should have a clear, concise title;
– the data given in the table or figure should be understandable without needing to refer to the accompanying text.

Presenting Summary Data


• A common error when presenting summary data, either in a table or directly in the text, is to present the ± sign after a mean without specifying what the figure after the sign represents.
• ‘The mean age of the physiotherapy group was 55.4 ± 13.4 years’.

Presenting Summary Data


• A more correct presentation of the same data would be:
• ‘The mean age of the physiotherapy group was 55.4 years (standard deviation 13.4 years)’.
• This latter representation is unambiguous and avoids the incorrect implication that the standard deviation can take a negative value.
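When summaries are generated by code, the unambiguous wording can be built in so that "±" never appears. A Python sketch; `describe` and the four ages are hypothetical, and appending n follows the recommendation to state the number of observations:

```python
from statistics import mean, stdev

def describe(values, unit):
    """Report 'mean unit (standard deviation SD unit), n = N' instead of 'mean ± SD'."""
    return (f"{mean(values):.1f} {unit} "
            f"(standard deviation {stdev(values):.1f} {unit}), n = {len(values)}")

print(describe([40, 50, 60, 70], "years"))
# 55.0 years (standard deviation 12.9 years), n = 4
```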

Golden Rules for Reporting Numbers


• In a sentence, numbers less than 10 are written as words
– In the study group, eight participants did not complete the intervention
• In a sentence, numbers 10 or more are written as numerals
– There were 120 participants in the study
• In a sentence, numbers below 10 that are listed with numbers 10 and above should be written as numerals
– In the sample, 15 boys and 4 girls had diabetes

Golden Rules for Reporting Numbers


• Numbers that represent statistical or mathematical functions should be expressed as numerals
– Raw scores were multiplied by 3 and then converted to standard scores
• Use words to express any number that begins a sentence, title or heading. Try to avoid starting a sentence with a number
– Twenty per cent of participants had diabetes

Golden Rules for Reporting Numbers


• Use a zero before the decimal point when numbers are less than 1
– The p value was 0.013
• Do not use a space between a number and its per cent sign
– In total, 35% of participants had diabetes
• Use one space between a number and its unit
– The mean height of the group was 170 cm

Golden Rules for Reporting Numbers


• Report percentages to only one decimal place if the sample size is larger than 100
– In the sample of 212 children, 10.4% had diabetes
• Report percentages with no decimal places if the sample size is less than 100
– In the sample of 44 children, 11% had diabetes
• Do not use percentages if the sample size is less than 20
– In the sample of 18 children, 2 had diabetes
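These three rules can be applied automatically when reports are generated. A Python sketch; `report_frequency` is hypothetical, treating n = 100 (not covered by the rules) like n < 100 is an assumption, and so is returning count/n for samples under 20:

```python
def report_frequency(count, n):
    """Format a frequency out of n following the percentage-reporting rules.

    Assumptions: n == 100 is treated like n < 100, and samples under 20
    are reported as count/n rather than as a percentage.
    """
    if n < 20:
        return f"{count}/{n}"              # too few observations for percentages
    if n > 100:
        return f"{100 * count / n:.1f}%"   # one decimal place
    return f"{100 * count / n:.0f}%"       # no decimal places

print(report_frequency(22, 212))   # 10.4%
print(report_frequency(5, 44))     # 11%
print(report_frequency(2, 18))     # 2/18
```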

Golden Rules for Reporting Numbers


• For ranges use ‘to’ or a comma, but not ‘-’, to avoid confusion with a minus sign.
• Use the same number of decimal places as the summary statistic
– The mean height was 162 cm (95% CI 156 to 168)
– The mean height was 162 cm (95% CI 156, 168)
– The median was 0.5 mm (inter-quartile range −0.1 to 0.7)
– The range of height was 145 to 170 cm



Golden Rules for Reporting Numbers


• p values between 0.001 and 0.05 should be reported to three decimal places
– There was a significant difference in blood pressure between the two groups (t = 3.0, df = 45, p = 0.004)
• p values shown on output as 0.000 should be reported as <0.0001
– Children with diabetes had significantly lower levels of insulin than control children without diabetes (t = 5.47, df = 78, p < 0.0001)

General Rules when Reporting Frequencies

• For small samples (N < 30), the use of percentages and ratios is not recommended.
– Instead, present the number of observations over the total number of subjects within the group
– for example, 3/11 instead of 27%

General Principles concerning the Construction of Tables
• Tables should be fully self-explanatory.
– The title should clearly indicate what the table shows,
– Tables, including column and row headings, should be clearly labeled and
– a brief summary of the contents of a table should always be given in words, either as part of the title or in the main body of the text.
• Units should be stated for each numerical variable

General Principles concerning the Construction of Tables
• Zero is a number, and numerical observations of zero should be explicitly presented as such.
– If a survey shows no cases of poliomyelitis in a particular county in a particular year, the entry should indicate this fact.
• A dash or a dotted line should be reserved for data that are missing or unobserved.
– If the information from that particular county was incomplete or otherwise unavailable, a dash or a dotted line should be used.

General Principles concerning the Construction of Tables
• Tables should have a purpose; they should contribute to and be integrated with the rest of the text.
– Tables should be used only when they can communicate information more efficiently or effectively than can be done in text or a figure.
• Data presented in tables should not be duplicated elsewhere in the text.
• Data presented in tables should not also be presented in figures, and vice versa.

Presenting Data in Tables


• Try to use relatively few significant digits. Too many decimal places can make data less clear (though sometimes they are necessary).
• If numbers are large, consider using percentages where applicable.
• Consider the orientation of the table. When you want to draw attention to a variable, it is better to place it in the columns rather than the rows.

Presenting Data in Tables – Don’t
• Tables describing fewer than three variables are usually not considered good for data presentation.
• Also, tables with many zero-frequency cells, or with many categories containing small counts, should not be displayed in articles.
• Tables that are too small or too large should be avoided.

Tables

Where there is no natural ordering of the rows (or indeed columns), they should be ordered by size (category with the highest frequency first, lowest frequency last), as this helps the reader to scan for patterns and exceptions in the data.

Presenting Graph
• “a picture is worth a thousand
words.”
• Graphs tell a story in “pictures”
rather than in words or numbers.

Guidelines For Constructing Graphs


• Each graph should have a title
explaining what is being displayed.
• Axes should be clearly labeled.
• Gridlines should be kept to a minimum.
• Avoid three-dimensional graphs as
these can be difficult to read.
• The number of observations should be
included.

Good Graph
• a clear title (with
the sample size),
• labeled axes,
• no gridlines and
• the marital status
categories are
ordered by their
frequency.

Table or Graph?
• A table: a display of numbers in a rectangular grid.
• A graph or chart: a picture in which the numbers are represented by points or lines.

Types of Graphs
Categorical graphs (nominal or ordinal)
– Bar graph
– Pie chart
Numerical graphs
– Stemplot (stem-and-leaf plot)
– Histograms
– Frequency polygon
– Boxplots
– Scatter plot

Bar Chart
• Bar charts give a clear display of simple results.
• They are used when the horizontal axis is composed of categories
– male / female;
– ethnic groups, etc.
• If the bars are not separated by spaces, the chart is referred to as a histogram, rather than a bar chart.

Bar Chart
• A bar graph uses bars to represent the frequencies (or relative frequencies): the height of each bar equals the frequency or relative frequency of each category.
• Bar graph: height indicates count or percent
• Frequencies: counts
• Relative frequencies: percent

Data 1

Bar Graph

Horizontal Bar Graph



Pie Chart

Pie Chart
• Generally pie charts are to be avoided, as they can be difficult to interpret, particularly when the number of categories is greater than five.
• Small proportions can be very hard to discern, as is the case for vaginal breech delivery.

Two- or three-dimensional charts?

Three-dimensional charts should never be used, as they are especially difficult to read and interpret. In a two-dimensional chart the size of each bar or slice is proportional to the number in its category; when the chart is displayed in three dimensions this relationship is lost, as what is displayed becomes a volume. Only the front face is proportional to the numbers in the categories, and so only two-dimensional charts should be displayed.

Data 2

Clustered Bar Chart



Clustered Bar Chart


The data could be presented as two separate pie charts or bar charts side by side, but it is preferable to present the data in one graph with the same scales and axes to make the visual comparisons easier.

Data 3

Stacked Bar Chart


When the number of groups to be compared becomes greater than three or four, a better type of bar chart is the stacked bar chart, where the groups are arranged on the horizontal axis and the variable being compared between the groups is arranged on the vertical axis.

Count Data
• Count data can only take whole numbers, and the best method to display them is using a bar chart.

Count Data
• On the horizontal axis is the number of deaths per day, going from a minimum of 0 deaths per day to a maximum of 16 deaths per day.
• On the vertical axis is the frequency with which these occur during this 5-year period.
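Preparing count data for such a bar chart means tallying how often each whole number occurs, including values that never occur, so that the horizontal axis stays evenly spaced (zero is a number and should appear as such). A Python sketch; `count_frequencies` and the daily counts are hypothetical, and the text bars merely stand in for a plotted chart:

```python
from collections import Counter

def count_frequencies(counts):
    """Frequency of each whole number from the smallest to the largest observed count.

    Values that never occur are listed with frequency 0 rather than left out.
    """
    tally = Counter(counts)
    return {k: tally[k] for k in range(min(counts), max(counts) + 1)}

deaths_per_day = [0, 1, 1, 3, 1, 0, 4]   # hypothetical daily death counts
for deaths, days in count_frequencies(deaths_per_day).items():
    print(f"{deaths:>2} deaths: {'#' * days} ({days} days)")
```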

Graphs For Continuous Data


• The simplest graphs are dotplots and stem and leaf plots, and they both display all the data.
• Other graphs, such as histograms and box-and-whisker plots, provide useful summaries of the data.
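A stem and leaf plot keeps every observation visible while still showing the shape of the distribution. A minimal Python sketch, assuming non-negative whole-number data split into tens (stem) and units (leaf); `stem_and_leaf` and the data are hypothetical:

```python
def stem_and_leaf(values):
    """Text stem and leaf plot for non-negative integers: stem = tens digit(s), leaf = units digit."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems so gaps stay visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stem_and_leaf([12, 15, 21, 23, 23, 37, 52]):
    print(line)
```

The empty stem "4 |" shows the gap between 37 and 52 that a frequency table could easily hide.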

Histograms
In order to construct a histogram:
• the data range is divided into several non-overlapping, equally sized bins (categories);
• the number of observations falling into each bin is counted;
• the categories are then displayed on the horizontal axis (X-axis) and the frequencies on the vertical axis (Y-axis).
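The construction steps can be sketched in Python. `equal_width_histogram` and the heights are hypothetical; the data are assumed to contain at least two distinct values, bins are treated as half-open [low, high), and the last bin also includes the maximum so that no observation is lost:

```python
def equal_width_histogram(values, k):
    """Divide [min, max] into k equal, non-overlapping bins and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        i = min(int((v - lo) / width), k - 1)   # the maximum falls into the last bin
        counts[i] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts

heights_cm = [150, 155, 158, 160, 162, 165, 168, 170, 178, 190]   # hypothetical
edges, counts = equal_width_histogram(heights_cm, 4)
for low, high, n in zip(edges, edges[1:], counts):
    print(f"{low:.0f} to {high:.0f} cm: {n}")
```

Changing `k` changes the apparent shape of the histogram, which is the point illustrated by the three panels of the next figure.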

Histogram
• Breaks the range of the values of a
quantitative variable into intervals and
displays only the count or percent of the
observations that fall into each interval.
• You can choose any convenient number of
intervals.
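• Intervals are usually of equal width; if they are not, the bar heights must be adjusted so that the area of each bar represents its frequency.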

Histogram

Figure 4.6 Histograms of height for leg ulcer patients (n = 222): (a) with only 6 categories, (b) with 22 categories and (c) with 9 categories

Box–Whisker Plots
• Box plots can be particularly useful for comparing the distribution of the data across several groups.
• The box contains the middle 50% of the data, with the lowest 25% of the data lying below it and the highest 25% of the data lying above it.
• The lower and upper edges of the box are the lower and upper quartiles (25th and 75th percentiles); the distance between them is called the interquartile range.
• The horizontal line in the middle of the box represents the median value.

Box–Whisker Plots

Boxplot
• Boxplots graphically represent the scores in a distribution
• Made using the five-number summary (minimum, lower quartile, median, upper quartile, maximum)
• Within the box are all scores that fall between the 25th and 75th percentiles
• The whiskers capture all scores within 1.5 IQRs of the box boundary
• Outliers are between 1.5 and 3 IQRs beyond the box boundary
• Extreme outliers are more than 3 IQRs beyond the box boundary
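The boxplot rules can be sketched in Python with the standard `statistics` module. `box_plot_summary` and the data are hypothetical; the quartiles use `statistics.quantiles(..., method="inclusive")`, which is an assumption, since software packages define quartiles in slightly different ways:

```python
from statistics import median, quantiles

def box_plot_summary(values):
    """Five-number summary, whisker ends and outliers using the 1.5 and 3 IQR rules."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # whiskers reach no further than this
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # beyond this: extreme outliers
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "five_number": (data[0], q1, median(data), q3, data[-1]),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data
                     if not inner[0] <= x <= inner[1] and outer[0] <= x <= outer[1]],
        "extreme": [x for x in data if x < outer[0] or x > outer[1]],
    }

print(box_plot_summary([2, 3, 4, 5, 6, 7, 8, 19, 40]))
```

Here the quartiles are 4 and 8 (IQR 4), so 19 lies between 1.5 and 3 IQRs above the box and is an outlier, while 40 lies more than 3 IQRs above it and is an extreme outlier.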

Displaying The Relationship Between Two Continuous Variables
• The statistical method for assessing the linear association between two continuous variables is known as correlation.
• The method for predicting the value of one continuous variable from another is known as regression.
• When preparing to conduct either analysis it is essential to construct a scatter diagram of the values of one of the variables against the values of the other variable.

Scatter Plots (Scatter Diagram)
• Constructed by drawing X- and Y-axes
– The characteristic hypothesized to explain or predict, or the one that occurs first (the risk factor), is placed on the X-axis
– The characteristic or outcome to be explained or predicted, or the one that occurs second, is placed on the Y-axis
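For orientation, the standard formulas behind the two methods named above are sketched here: Pearson's correlation coefficient and the ordinary least-squares line predicting the Y-axis variable from the X-axis variable. The function names and data are hypothetical, and the scatter diagram should still be drawn first. On Python 3.10 and later, `statistics.correlation` and `statistics.linear_regression` compute the same quantities:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of the linear association between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Intercept and slope of the least-squares line predicting y (Y-axis) from x (X-axis)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data: x = explanatory variable (X-axis), y = outcome (Y-axis)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(x, y), 3))
print(least_squares_line(x, y))
```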

Scatter Diagram

Scatter Diagram

Scatter Diagram

Line Graph
• A graph showing the differences in frequencies or
percentages among categories of an interval-
ratio variable.
• Points representing the frequencies of each
category are placed above the midpoint of the
category and are joined by a straight line.
• Appropriate when the horizontal axis is
continuous rather than categories.
• They could be used to show progress over
time
– e.g. development of a measured skill each week over a
ten-week course

Line Graph

General Principles concerning


the Construction of Graphs
• Graphs should be fully explanatory
– Many readers don't read the detailed text; they just look at the graph.
– The contents of the graph should be as complete as possible.
• The title should include information concerning who or what the subjects or experimental material are,
• what observations are abstracted from those subjects or material,
• and what restrictions of time and place apply to the graph.

General Principles concerning the


Construction of Graphs
• E.g.: a presentation of birth rates in the state of Michigan
– should never be headed merely "Birth Rates,"
– but might well be modified to say "Birth Rates per 1,000 Population, White Race, Michigan, 1920-1960."
• If the length of the title becomes a problem, additional essential material can frequently be included in a footnote.
• The graph should be as self-contained as possible, requiring as little outside information for clear interpretation as is feasible.

General Principles concerning the


Construction of Graphs
• Vertical and horizontal scales should be clearly labeled and units should be identified.
– Most graphs present numerical information in scaled form.
– Scales must be labeled in order to describe fully the variable presented on the scale, and for measurement variables the units of measurement should be identified.
– e.g.: weight (g), age (years), etc.

General Principles concerning the


Construction of Graphs
• Do not try to include too much information in a
single graph.
– It is better to include several graphs than to compress
information too much.
– A device frequently used for the presentation of many
curves or trends is the presentation of a series of
small graphs.
– A safe rule of thumb is to avoid graphs containing
more than 3 curves.

General Principles concerning


the Construction of Graphs
• Graphs are intended to give an overview rather than a highly detailed picture of a set of data.
• Do not include too much detail in a graph.
– Detailed presentations should be reserved for tables.
• Graphs condense detail so that the reader can see the forest rather than the trees.
– If your main interest is in the trees, use a table.
• The inclusion of too much detail in a graph will tend to obscure the essential points.
• Avoid inclusion of numbers within the body of a graph.

References
1. Peat J, Barton B. Medical statistics. A guide to data analysis and critical appraisal. Oxford, Blackwell Publishing Ltd., 2005
2. Freeman JV, Walters SJ, Campbell MJ. How to display data. Oxford, Blackwell Publishing Ltd., 2008
3. Hall GM. How to write a paper. 5th Ed. Oxford, John Wiley & Sons, Ltd., 2013
