
Frequency and Sampling Distribution

Frequency distributions
Raw data are collected data that have not been organised numerically. An
example is the set of weights of, say, 100 male students obtained from an
alphabetical listing in university records.
An array is an arrangement of raw numerical data in ascending or
descending order of magnitude.
To be useful, data are distributed into classes or categories. The tabular
arrangement of data by classes, together with the corresponding class
frequencies, is called a frequency distribution.

Example frequency table

Weights (kg)    Number of students
60-62             5
63-65            18
66-68            42
69-71            27
72-74             8
Total           100

The frequency distribution of the weights of 100 male students at XYZ
University is given above.
The first class consists of weights from 60 to 62 kg. Data organised
as in the above frequency distribution are often called grouped data.
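The class marks and class interval size used in the following slides can be derived from the class limits. The snippet below is a minimal sketch, assuming the weights were recorded to the nearest kilogram so that class boundaries lie half a kilogram outside the class limits; the variable names are illustrative only.

```python
# Grouped data from the example table: (lower limit, upper limit, frequency).
classes = [(60, 62, 5), (63, 65, 18), (66, 68, 42), (69, 71, 27), (72, 74, 8)]

# Total number of observations.
total = sum(freq for _, _, freq in classes)

# Class mark = midpoint of the class limits.
class_marks = [(lo + hi) / 2 for lo, hi, _ in classes]

# Class boundaries (assumption: weights recorded to the nearest kg).
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi, _ in classes]

# Class interval size = upper boundary - lower boundary.
widths = [ub - lb for lb, ub in boundaries]

for (lo, hi, freq), mark, (lb, ub) in zip(classes, class_marks, boundaries):
    print(f"{lo}-{hi} kg  frequency={freq:3d}  class mark={mark}  boundaries={lb} to {ub}")
```

All five classes have the same interval size here, which is why bar heights and bar areas are both proportional to the class frequencies.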

Bar Chart

A bar chart of a frequency distribution (a histogram) is a set of rectangles
with bases on the horizontal axis, centred at the class marks, with widths
equal to the class interval sizes and areas proportional to the class
frequencies.

Frequency Polygon

A frequency polygon is a line graph of the class frequencies plotted
against the class marks. It can be obtained by connecting the midpoints
of the tops of the rectangles in the bar chart.

Cumulative frequency

Weights less than (kg)    Number of students
59.5                        0
62.5                        5
65.5                       23
68.5                       65
71.5                       92
74.5                      100

A graph can also show the cumulative frequency distribution of all values
greater than or equal to the lower class boundary of each class interval.
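Both kinds of cumulative frequency follow from the class frequencies in the example table. The following is a small sketch using only the Python standard library; the "less than" and "or more" labels are the names assumed here for the two variants.

```python
from itertools import accumulate

# Class frequencies from the example table (weights of 100 students).
frequencies = [5, 18, 42, 27, 8]
total = sum(frequencies)

# Class boundaries: 59.5 is the lowest lower boundary, the rest are upper boundaries.
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]

# "Less than" cumulative frequency at each boundary.
less_than = [0] + list(accumulate(frequencies))

# "Or more" cumulative frequency at each boundary.
or_more = [total - c for c in less_than]

for b, lt, om in zip(boundaries, less_than, or_more):
    print(f"{b:5.1f} kg   less than: {lt:3d}   or more: {om:3d}")
```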

Power Plant Example

Frequency distribution of mill availability (1999-2005)

Number of mills    Unit 1    Unit 2    Unit 3    Unit 4
1                       0         0         0         0
2                       0         0         0         0
3                       1         4        10         2
4                      55        48        35         9
5                     204       188        67        29
6                     506       615       411       278
7                    1064      1149      1322      1220
8                     413       286       457       780

Cumulative Frequency
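The cumulative frequencies for the mill availability data can be computed directly from the frequency table. This is a minimal sketch; the dictionary layout is an assumption made for illustration, and the counts are in whatever observation unit the source data used.

```python
from itertools import accumulate

# Number of mills available (1 to 8) and the observed frequency for each unit.
mills = list(range(1, 9))
availability = {
    "unit 1": [0, 0, 1, 55, 204, 506, 1064, 413],
    "unit 2": [0, 0, 4, 48, 188, 615, 1149, 286],
    "unit 3": [0, 0, 10, 35, 67, 411, 1322, 457],
    "unit 4": [0, 0, 2, 9, 29, 278, 1220, 780],
}

# Cumulative frequency: observations with at most k mills available.
cumulative = {unit: list(accumulate(freqs)) for unit, freqs in availability.items()}

for unit, cum in cumulative.items():
    total = cum[-1]
    # Share of observations with 7 or 8 mills available.
    share_7_plus = (total - cum[5]) / total
    print(f"{unit}: cumulative={cum}  share with 7+ mills={share_7_plus:.2f}")
```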

Gas Turbine Data (5 years)

Summary of number of preventive maintenance activities

Sub-units: Gas generator, Lubrication, Control monitoring, Power turbine, Starting system

Gas turbine    Recorded counts
359            17
360            29
361            20
362            40, 15
363            39, 25, 13

There are many preventive maintenance activities, carried out at different intervals. An initial question
to ask is whether or not they were carried out only upon the failure of other items. Some information
about condition monitoring activities was gathered following a number of preventive and corrective
maintenance activities.

Gas Turbine Data (5 years)

Summary of number of corrective maintenance actions

Gas turbine    Gas generator    Lubrication    Control monitoring    Power turbine    Starting system
359                  6               10                4                   0                 3
360                  8                8                1                   3                 2
361                 12                5                2                   2                 1
362                 38               22               10                   1                 4
363                 27               22                7                  10                 1

Dual redundancy with spares was available throughout the five-year observation period. It remains
to be seen how the level of corrective maintenance performed on the oil platform with a lot of
redundancy compares with the level performed with no redundancy. Some corrective maintenance actions
were recorded as failure repairs or replacements, while others were classified as periodic replacements.

PM and CM Activities

The differences between the gas turbines and their sub-units are clearly evident. Gas turbines G362
and G363 both show a large number of failures and maintenance activities. The question here is whether
or not these two gas turbines are identical; if they are not, there may be no particular reason for
the similarity. Likewise, units G359, G360 and G361 display roughly equal numbers of failures and
might have some commonality with each other.
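The comparison between turbines can be checked by totalling the corrective maintenance counts from the summary table. This is a sketch only: it assumes the table columns read in the order gas generator, lubrication, control monitoring, power turbine, starting system, and the "G" prefix follows the naming used in the text.

```python
# Corrective maintenance actions per gas turbine, one count per sub-unit.
sub_units = ["gas generator", "lubrication", "control monitoring",
             "power turbine", "starting system"]
cm_actions = {
    "G359": [6, 10, 4, 0, 3],
    "G360": [8, 8, 1, 3, 2],
    "G361": [12, 5, 2, 2, 1],
    "G362": [38, 22, 10, 1, 4],
    "G363": [27, 22, 7, 10, 1],
}

# Total corrective maintenance actions for each gas turbine.
totals = {turbine: sum(counts) for turbine, counts in cm_actions.items()}

# Total corrective maintenance actions for each sub-unit across all turbines.
by_sub_unit = {
    name: sum(counts[i] for counts in cm_actions.values())
    for i, name in enumerate(sub_units)
}

print(totals)
print(by_sub_unit)
```

On these figures G359, G360 and G361 have almost the same totals, while G362 and G363 each have roughly three times as many, and the gas generator accounts for the largest share of actions.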

Frequency of PM and CM

The history data indicate that a substantial number of preventive maintenance activities,
consisting of minor periodic service tasks, inspections and periodic condition monitoring
activities, are performed, but the failure frequency of the gas generators does not improve.
This might be due to imperfect maintenance, or the interval between PM activities
may not be appropriate, since similar failures were repeatedly observed.

Sampling Distribution

A sampling distribution shows how a statistic would vary with repeated random
sampling of the same size from the same population.

A sampling distribution, therefore, is the probability distribution of the results
of an infinitely large number of such samples.

Descriptive Measures
When data are clustered, or grouped around a central point, this central point is
often used to describe the data, or the population, and is used as a reference.
The mean (average), median and mode are measures of central tendency.

Mean (or average) is the sum of all the observations $x_i$ divided by the
number of observations $n$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Median is the middle value of an ordered set of data.

Mode is the value which occurs most frequently in a set of data.
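The three measures of central tendency can be computed with Python's standard `statistics` module. The data set below is hypothetical and chosen only for illustration; it is not taken from the student weights example.

```python
import statistics

# Hypothetical sample of eight weights in kg (illustrative values only).
weights = [61, 64, 67, 67, 70, 73, 67, 64]

# Mean: sum of the observations divided by the number of observations.
mean_value = sum(weights) / len(weights)

# Median: middle value of the ordered data (average of the two middle
# values when the number of observations is even).
median_value = statistics.median(weights)

# Mode: the most frequently occurring value.
mode_value = statistics.mode(weights)

print(mean_value, median_value, mode_value)
```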

Variance
For a sample and a population the equations are:

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is a data value and $n$
is the number of values ($s$ is the sample standard deviation).

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, $\mu$ is the population average, $x_i$ is a data
value and $N$ is the number of values ($\sigma$ is the population standard deviation).

Standard Deviation
Standard deviation is the square root of the variance. The standard deviation is the
most useful measurement of the spread of data in statistical analysis.

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Population standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

The standard deviation is the measure of spread or scatter in the population expressed in
the original units.
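The difference between the sample and population formulas is only the divisor, n - 1 versus N. The sketch below computes both by hand on a hypothetical data set and compares them with the corresponding functions in Python's `statistics` module.

```python
import math
import statistics

# Hypothetical data values (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

# Treating the data as a whole population: divide by N.
population_variance = squared_deviations / n
population_sd = math.sqrt(population_variance)

# Treating the data as a sample: divide by n - 1.
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)

# The standard library gives the same results.
print(statistics.pvariance(data), statistics.variance(data))
```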

Population Distribution

A population distribution of a random variable is the distribution of its values
for all members of the population.

Thus a population distribution is also the probability distribution of the random
variable when we choose one individual (i.e. observation or subject) from the
population at random.

Sampling distribution of a sample mean

If a population has a normal distribution, then the sampling distribution of the
sample mean $\bar{x}$ of $n$ independent observations will also have a normal
distribution.

General fact: any linear combination of independent normal random variables is
normally distributed.

Standard deviation of a sample mean: standard error

The standard error is the standard deviation of the sample mean. It is calculated
by dividing the population standard deviation $\sigma$ by the square root of the
sample size $n$:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Doing so ties the spread to the sample size $n$: the sampling distribution of the
sample mean has a larger spread across relatively small samples and a smaller
spread across relatively large samples.
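The effect of sample size on the standard error can be tabulated directly from the formula sigma divided by the square root of n. This is a minimal sketch; the population standard deviation of 2.0 is a hypothetical value chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation.
SIGMA = 2.0

# Quadrupling the sample size halves the standard error.
errors = {n: standard_error(SIGMA, n) for n in (4, 16, 64, 256)}

for n, se in errors.items():
    print(f"n={n:3d}  standard error={se}")
```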

Sampling distribution of a sample mean:

If the population distribution is

$$N(\mu, \sigma)$$

then the sampling distribution of the sample mean is

$$\bar{x} \sim N(\mu, \sigma/\sqrt{n})$$

The sampling distribution is normal if the population distribution is normal
(i.e. a sample mean is a linear combination of independent normal random variables).

The sampling distribution is approximately normal for large samples in any case
(according to the Central Limit Theorem).

Non-Normal Population

We can apply the Central Limit Theorem:

Even if the population is not normal, sample means from the population will be
approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Sampling Distributions: Non-Normal Population

[Figure: the population distribution alongside the sampling distribution of the sample mean, which becomes normal as n increases; curves are shown for a smaller and a larger sample size.]
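The behaviour described for a non-normal population can be checked by simulation. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a hypothetical choice, strongly skewed) and draws repeated samples of two sizes; the seed and the number of repeats are arbitrary.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical non-normal population: exponential with mean 1 and sd 1.
MU, SIGMA = 1.0, 1.0

def sample_means(n, repeats=5000):
    """Return `repeats` sample means, each from n independent exponential draws."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

results = {}
for n in (2, 30):
    means = sample_means(n)
    # Observed centre and spread of the simulated sampling distribution.
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:2d}  mean of sample means={results[n][0]:.3f}  "
          f"sd of sample means={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}")
```

In both cases the sample means centre on the population mean, and their spread is close to sigma divided by the square root of n; a histogram of `sample_means(30)` would look far more symmetric than one of `sample_means(2)`.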

Central limit theorem

Sample statistics have distributions.

For large samples these are approximately normally distributed (described by both a mean and a variance).

As sample size increases, our sample statistic approaches the population value.

Example: from a population of five logs of wood, we can only sample
three. The five logs of wood had the following ring widths:

0.50    0.75    1.00    1.50    2.00

population mean = ?
average of all sample means = ?

Population mean = (0.50+0.75+1.00+1.50+2.00)/5 = 1.15

The ten possible samples of three logs have the following means:
(0.50+0.75+1.00)/3 = 0.75
(0.50+0.75+1.50)/3 = 0.92
(0.50+0.75+2.00)/3 = 1.08
(0.50+1.00+1.50)/3 = 1.00
(0.50+1.00+2.00)/3 = 1.17
(0.50+1.50+2.00)/3 = 1.33
(0.75+1.00+1.50)/3 = 1.08
(0.75+1.00+2.00)/3 = 1.25
(0.75+1.50+2.00)/3 = 1.42
(1.00+1.50+2.00)/3 = 1.50
Average of all sample means = 11.50/10 = 1.15, equal to the population mean.
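The log example can be verified by enumerating every possible sample of three logs from the five. This is a short sketch using `itertools.combinations`; the variable names are illustrative.

```python
from itertools import combinations

# Ring widths of the five logs in the population.
ring_widths = [0.50, 0.75, 1.00, 1.50, 2.00]

population_mean = sum(ring_widths) / len(ring_widths)

# Every distinct sample of three logs (order does not matter).
samples = list(combinations(ring_widths, 3))

# Mean ring width of each sample.
sample_means = [sum(s) / 3 for s in samples]

# Average of all the sample means.
average_of_means = sum(sample_means) / len(sample_means)

for s, m in zip(samples, sample_means):
    print(s, round(m, 2))
print("population mean:", population_mean)
print("average of sample means:", round(average_of_means, 4))
```

There are ten such samples, and the average of their means matches the population mean because every log appears in the same number of samples.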

Central limit theorem


Sample size matters a great deal: the more observations we collect,
the closer our information comes to that of the population itself.

Average conditions become more prominent.

The variability about the mean becomes less prominent.

Central Limit Theorem

As the size of a random sample increases, the sampling distribution of the
sample mean gets closer to a normal distribution.

This is true no matter what shape the population distribution has
(provided its variance is finite).

Note: the Central Limit Theorem applies to the sampling distribution not only
of sample means but also of sample sums.

Sampling Dilemma

Sampling does a good job of accepting very good lots and rejecting bad lots.
Unfortunately, a large area of indecision lies in the middle.

The sampling rule is based on probability, and probability predicts that some
lots of substandard quality will be accepted.
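The dilemma can be made concrete by computing the probability that a lot is accepted for different defect rates. The sketch below assumes a hypothetical single sampling plan (inspect 50 items, accept the lot if at most 1 is defective) and a binomial model, which in turn assumes the lot is large compared with the sample; none of these numbers come from the slides.

```python
from math import comb

def acceptance_probability(p, n, c):
    """Probability of at most c defectives in a sample of n (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# Hypothetical plan: sample size and maximum number of defectives for acceptance.
SAMPLE_SIZE, ACCEPT_MAX = 50, 1

# Probability of acceptance for a range of true defect rates.
defect_rates = [0.0, 0.01, 0.02, 0.05, 0.10]
curve = {p: acceptance_probability(p, SAMPLE_SIZE, ACCEPT_MAX) for p in defect_rates}

for p, prob in curve.items():
    print(f"defect rate={p:.2f}  P(accept)={prob:.3f}")
```

Under this plan very good lots are almost always accepted and very poor lots are almost always rejected, but lots of intermediate quality still have a substantial chance of being accepted.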

QUESTIONS?
