
Sampling

4/6/2011

KOPPAR & ASSOCIATES, CHARTERED ACCOUNTANTS

Introduction
- A sample is a portion, piece or segment that is representative of the whole.
- A sample is a part or small section selected from the population.
- A population or universe may be defined as the aggregate of items possessing a common trait or traits.
- A population may be finite or infinite.
- The individual units of the population are called items or elements.
- The process of selecting a sample from a population is known as sampling.

Essentials of Sampling
- A sample should possess the following essentials if valid conclusions are to be drawn from the experimental results:
  i. A sample should have characteristics similar to those of the population from which it has been selected.
  ii. The selected sample should be homogeneous.
  iii. Enough items should be included in the sample to make the results reliable. In other words, the size of the sample should be sufficiently large.

Advantages and Shortcomings of Sampling Techniques


- Advantages:
  - Reduction of cost.
  - Saving of time.
  - In the case of an infinite population, sampling is the only option available.
  - A population consisting of a very large number of elements is difficult to handle in full, whereas a sample is manageable.

Advantages and Shortcomings of Sampling Techniques (Cont'd)

- Shortcomings:
  - If the population is very small, it may be impossible to draw a representative sample from it.
  - If the sample has not been drawn properly, the results may be inaccurate, false or misleading.
  - Personal bias or prejudice involved in selecting a sample may vitiate the results.
  - If the population is heterogeneous, then the sample may not reflect the true characteristics of the population.

Size of the Sample


- The size of the sample is important: a sample that is too small may not truly represent the population, and a sample that is too large will be difficult to manage in terms of analysis and interpretation.
- The optimum sample size is the one that fulfills the requirements of efficiency, representativeness, reliability and flexibility.
- The size of the sample depends on a number of considerations, some of which are as under:
  a. If the population consists of perfectly homogeneous units, a small sample will serve the purpose (e.g., the blood of a person). If the population consists of heterogeneous units, a large sample is needed to yield reliable results.

Size of the Sample (Cont'd)

  b. The larger the size of the population, the bigger should be the sample size.
  c. The nature of the study also affects the size of the sample. For an intensive and continuous study, a small sample may be suitable. For studies which are not likely to be repeated, it may be necessary to take a large sample.
  d. The availability of trained personnel, finance, time and other practical considerations also places a big constraint on the sample size.
  e. It is not necessarily true that only a large sample will give accurate results.
  f. The size of the sample is also influenced by the sampling technique.

Methods of Sampling
There are two methods of selecting samples from populations:

A. Non-Random or Non-Probability Sampling
B. Random or Probability Sampling

Non Probability Sampling


i. Judgment Sampling
ii. Convenience Sampling
iii. Quota Sampling

Judgment Sampling

The method is so called because the choice of sample items depends exclusively on the judgment of the investigator. It is a simple method used to obtain a more representative sample. It is widely used in solving everyday business problems and in making public policy decisions. The drawback of this method is that it is based solely on the judgment of the individual and hence may be biased. The sample may not be representative in character and the results may not be accurate.

Convenience Sampling
- This method involves selecting the sample on the basis of convenience and easy accessibility.
- This method is quick and cheap.
- It may not be representative in character and hence may not yield reliable results.

Quota Sampling
- It is a type of judgment sampling.
- In this method, sample quotas are fixed according to characteristics of the population such as income, sex or religion.
- It involves less time and money.
- It may not be representative of the population as it is based on the personal bias of the selector.

Probability Sampling Methods


- Probability or random sampling gives all members of the population a known chance of being selected for inclusion in the sample, and this chance does not depend upon previous events in the selection process.
- The four methods of random sampling are as under:
  i. Simple Random Sampling
  ii. Systematic Sampling
  iii. Stratified Sampling
  iv. Cluster Sampling

Simple Random Sampling


- Simple random sampling selects samples by methods that allow each possible sample to have an equal probability of being picked and each item in the entire population to have an equal chance of being included in the sample.
- One way of ensuring randomness of selection is to adopt the lottery method.
- This is the most popular and simplest method of selecting a random sample from a finite population (note that this method is inapplicable if the population is infinite).
- In this method, all items of the population are numbered on separate slips of paper of identical size, shape and color. These slips are folded and mixed up in a box and a blindfold selection is made.
- Each slip has to be replaced once it is drawn out to ensure that the probability of selecting a second slip remains the same.

Simple Random Sampling (Cont'd)

- Another method of random sampling is to select a sample with the help of random numbers. These numbers can be generated either by a computer programmed to scramble numbers or by a table of random numbers.
- This method is more scientific as there is less chance of personal bias.
- The method is also economical as it saves time, money and labour.
- The method, however, suffers from the following drawbacks:
  a. The method requires a complete list of all the items of the population. This may not be available in many cases.
  b. When the size of the sample is small, it may not be truly representative of the population.
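The random-number approach can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the seed and the numbered list of 500 items are assumptions made for the example. Note that `random.sample` draws without replacement; `rng.choices(population, k=n)` would correspond to the lottery variant in which each slip is put back.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator, for reproducibility
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: a complete list of items numbered 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

The need to build `frame` first mirrors drawback (a): a complete list of the population must exist before any item can be drawn.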

Systematic Sampling
- In systematic sampling, elements are selected from a population at a uniform interval that is measured in time, order or space.
- Systematic sampling differs from simple random sampling in that each element has an equal chance of being selected but each sample does not have an equal chance of being selected.
- It is a relatively simple and convenient method of sample selection.
- It involves less time and labour.
- The main demerit of this method is that it may not represent the whole population.
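A minimal sketch of selection at a uniform interval, assuming the population is held in an ordered list and the interval is taken as the population size divided by the sample size (the function name and the example data are hypothetical):

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th element after a random start, with k = len(population) // n."""
    k = len(population) // n                    # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds population size")
    start = random.Random(seed).randrange(k)    # random start in [0, k)
    return population[start::k][:n]             # every k-th element from the start

# Hypothetical ordered population of 100 invoice numbers.
invoices = list(range(1, 101))
print(systematic_sample(invoices, 10, seed=3))
```

Only k different samples are possible here (one per starting point), which is why each element has an equal chance of selection while most combinations of elements can never occur together.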

Stratified Sampling
- In this method, we divide the population into relatively homogeneous groups, called strata.
- Then we use one of two approaches: either select at random from each stratum a specified number of elements corresponding to the proportion of that stratum in the population as a whole, or draw an equal number of elements from each stratum and weight the results according to the stratum's proportion of the total population.
- Stratified sampling is appropriate when the population is already divided into groups of different sizes.
- A properly designed stratified sample reflects the characteristics of the population from which it was chosen more accurately than samples drawn by other sampling methods.
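The first approach (proportional allocation) might look like the sketch below. The stratum names and sizes are invented for illustration, and simple rounding is assumed for the per-stratum counts, so the total can differ slightly from n when the proportions do not divide evenly.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Sample each stratum at random, in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)   # allocation for this stratum
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 1,000 accounts split into three strata.
strata = {
    "small": list(range(0, 100)),
    "medium": list(range(100, 400)),
    "large": list(range(400, 1000)),
}
picked = proportional_stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in picked.items()})
```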

Cluster Sampling
- In this method, the population is divided into recognizable subgroups, which are called clusters.
- A random sample of these clusters is drawn, and all the units belonging to the selected clusters constitute the sample.
- In this method, the clusters should be small and the number of units in each cluster should be more or less the same.
- The method offers flexibility which is lacking in other methods.
- It is less time consuming and less expensive.
- The method is less accurate than the other methods of selecting a sample.

Cluster Sampling (Cont'd)

- In both stratified sampling and cluster sampling, the population is divided into well-defined groups.
- Stratified sampling is used when each group has a small variation within itself but there is a wide variation between the groups.
- Cluster sampling is used in the opposite case, when there is considerable variation within each group but the groups are essentially similar to each other.
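For contrast with stratified sampling, a cluster sample can be sketched as follows: whole clusters are drawn at random and every unit inside a chosen cluster is kept. The cluster labels and sizes below are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Draw m clusters at random and keep every unit in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # random sample of cluster labels
    units = [unit for label in chosen for unit in clusters[label]]
    return chosen, units

# Hypothetical population: 10 city blocks of 5 households each.
clusters = {
    f"block_{i}": [f"block_{i}_house_{j}" for j in range(5)] for i in range(10)
}
chosen, units = cluster_sample(clusters, 3, seed=7)
print(chosen, len(units))
```

In a stratified design every group contributes some units; here most groups contribute none, which is only reasonable when the groups resemble one another.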


Sampling Distribution
- The mean and standard deviation computed from one sample need not be the same as the mean and standard deviation computed from another sample.
- A probability distribution of the means of all possible samples is a distribution of the sample means. This is called a sampling distribution of the mean.
- Similarly, a probability distribution of the medians (or modes or proportions) of all possible samples is a sampling distribution of the median (or mode or proportion).
- Such a sampling distribution can be described by its mean and standard deviation.

Sampling Distribution (Cont'd)

- The standard deviation of a sampling distribution of sample means is called the standard error of the mean.
- Similarly, the standard deviation of a sampling distribution of sample proportions is called the standard error of the proportion.
- It is called standard error because the variability in the sample statistic is due to sampling errors.
- Thus the standard deviation of the distribution of a sample statistic is known as the standard error of the statistic.

Variability of a Sampling Distribution


- The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
  - N: The number of observations in the population.
  - n: The number of observations in the sample.
  - The way that the random sample is chosen.
- If the population size is much larger than the sample size, then the sampling distribution has roughly the same sampling error, whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/10) of the population size, the sampling error will be noticeably smaller when we sample without replacement.
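The effect of the sampling fraction can be checked numerically. The sketch below assumes the usual standard-error formulas for the mean, σ/√n with replacement and σ·√(1/n − 1/N) without replacement; the population standard deviation and the two population sizes are arbitrary illustrative values.

```python
import math

def se_with_replacement(sigma, n):
    """Standard error of the mean when sampling with replacement."""
    return sigma / math.sqrt(n)

def se_without_replacement(sigma, n, N):
    """Standard error of the mean when sampling without replacement from N units."""
    return sigma * math.sqrt(1 / n - 1 / N)

sigma = 20.0
ratios = {}
for n, N in [(50, 1_000_000), (50, 500)]:
    ratios[N] = se_without_replacement(sigma, n, N) / se_with_replacement(sigma, n)
    print(f"n={n}, N={N}: without/with replacement ratio = {ratios[N]:.4f}")
```

The ratio equals √(1 − n/N), so it stays close to 1 when the population dwarfs the sample and falls to about 0.95 when the sample is one tenth of the population.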

Central Limit Theorem


- The central limit theorem states that the sampling distribution of a statistic such as the mean will be normal or nearly normal if the sample size is large enough.
- How large is "large enough"? As a rough rule of thumb, many statisticians say that a sample size of 30 is large enough. If you know something about the shape of the sample distribution, you can refine that rule. The sample size is large enough if any of the following conditions apply:
  - The population distribution is normal.
  - The sample distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
  - The sample distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  - The sample size is greater than 40, without outliers.

Sampling Distribution of the Mean


- Suppose we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way, we create a sampling distribution of the mean.
- We know the following. The mean of the population ($\mu$) is equal to the mean of the sampling distribution ($\mu_{\bar{x}}$). And the standard error of the sampling distribution ($\sigma_{\bar{x}}$) is determined by the standard deviation of the population ($\sigma$), the population size, and the sample size. These relationships are shown in the equations below:

  $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma \sqrt{1/n - 1/N}$

Sampling Distribution of Mean


- Therefore, we can specify the sampling distribution of the mean whenever two conditions are met:
  - The population is normally distributed, or the sample size is sufficiently large.
  - The population standard deviation $\sigma$ is known.
- Note: When the population size is very large, the factor 1/N is approximately equal to zero, and the standard deviation formula reduces to $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
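One way to see the reduction, using only the symbols already defined, is to factor 1/n out of the square root. This is a short sketch of the algebra rather than a formal proof:

```latex
\sigma_{\bar{x}}
  = \sigma \sqrt{\frac{1}{n} - \frac{1}{N}}
  = \frac{\sigma}{\sqrt{n}} \sqrt{1 - \frac{n}{N}}
  \;\longrightarrow\; \frac{\sigma}{\sqrt{n}}
  \quad \text{as } N \to \infty
```

The factor $\sqrt{1 - n/N}$ depends only on the sampling fraction n/N, so the simpler formula is a good approximation whenever the sample is a small part of the population.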

Sampling Distribution of the Proportion


- In a population of size N, suppose that the probability of the occurrence of an event (dubbed a "success") is P, and the probability of the event's non-occurrence (dubbed a "failure") is Q. From this population, suppose that we draw all possible samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.
- We find that the mean of the sampling distribution of the proportion ($\mu_p$) is equal to the probability of success in the population (P). And the standard error of the sampling distribution ($\sigma_p$) is determined by the standard deviation of the population ($\sigma$), the population size, and the sample size. These relationships are shown in the equations below:

Sampling Distribution of the Proportion


  $\mu_p = P$ and $\sigma_p = \sigma \sqrt{1/n - 1/N} = \sqrt{PQ/n - PQ/N}$, where $\sigma = \sqrt{PQ}$.

- Note: When the population size is very large, the factor PQ/N is approximately equal to zero, and the standard deviation formula reduces to $\sigma_p = \sqrt{PQ/n}$.

Concept Problem
- Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the average weight of a sampled student will be less than 75 pounds?

Solution
- To solve this problem, we need to define the sampling distribution of the mean. Because our sample size is greater than 40, the Central Limit Theorem tells us that the sampling distribution will be normally distributed.
- To define our normal distribution, we need to know both the mean of the sampling distribution and the standard deviation. Finding the mean of the sampling distribution is easy, since it is equal to the mean of the population. Thus, the mean of the sampling distribution is equal to 80.

Solution
- The standard deviation of the sampling distribution can be computed using the following formula.

  $\sigma_{\bar{x}} = \sigma \sqrt{1/n - 1/N} = 20 \sqrt{1/50 - 1/10000} = 20 \sqrt{0.0199} = 20 \times 0.141 = 2.82$

- We know that the sampling distribution of the mean is normally distributed with a mean of 80 and a standard deviation of 2.82. We want to know the probability that a sample mean is less than or equal to 75 pounds. To solve the problem, we calculate the value of z, $z = (75 - 80)/2.82 = -1.77$, and from that the probability that the average weight of a sampled student is less than 75 pounds is equal to 0.038.
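The arithmetic for the 6th-grader problem can be reproduced with the Python standard library. This is a sketch under the problem's own assumptions (normal sampling distribution, N = 10,000); the function name is made up for the example, and `statistics.NormalDist` stands in for the z table.

```python
import math
from statistics import NormalDist

def prob_sample_mean_below(x, mu, sigma, n, N):
    """Return (standard error, z, P(sample mean <= x)) for a finite population."""
    se = sigma * math.sqrt(1 / n - 1 / N)   # standard error of the mean
    z = (x - mu) / se                       # standardize the cut-off value
    return se, z, NormalDist().cdf(z)       # area to the left of z

se, z, p = prob_sample_mean_below(75, mu=80, sigma=20, n=50, N=10_000)
print(f"standard error = {se:.2f}, z = {z:.2f}, probability = {p:.3f}")
```

The printed values should agree with the figures worked out by hand (2.82 and 0.038) up to rounding.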


Concept Problem

- Find the probability that, of the next 120 births, no more than 40% will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.

Solution
- The Central Limit Theorem tells us that the proportion of boys in 120 births will be normally distributed.
- The mean of the sampling distribution will be equal to the mean of the population distribution. In the population, half of the births result in boys and half in girls. Therefore, the probability of boy births in the population is 0.50. Thus, the mean proportion in the sampling distribution should also be 0.50.
- The standard deviation of the sampling distribution can be computed using the following formula.

  $\sigma_p = \sqrt{PQ/n - PQ/N} = \sqrt{(0.5)(0.5)/120} = \sqrt{0.25/120} = 0.04564$

Solution
- In the above calculation, the term PQ/N was equal to zero, since the population size (N) was assumed to be infinite.
- We know that the sampling distribution of the proportion is normally distributed with a mean of 0.50 and a standard deviation of 0.04564. We want to know the probability that no more than 40% of the sampled births are boys. To solve the problem, we calculate the value of z, $z = (0.40 - 0.50)/0.04564 = -2.19$, and from that the probability that no more than 40% of the sampled births are boys is equal to 0.014.
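The same calculation as a short script, assuming an effectively infinite population so that the PQ/N term is dropped (the function name is illustrative only):

```python
import math
from statistics import NormalDist

def prob_sample_proportion_at_most(p_hat, P, n):
    """Return (standard error, z, P(sample proportion <= p_hat)) for a large population."""
    se = math.sqrt(P * (1 - P) / n)     # sqrt(PQ/n), with Q = 1 - P
    z = (p_hat - P) / se
    return se, z, NormalDist().cdf(z)

se, z, prob = prob_sample_proportion_at_most(0.40, P=0.50, n=120)
print(f"standard error = {se:.5f}, z = {z:.2f}, probability = {prob:.3f}")
```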

Difference Between Proportions


- Suppose we have two populations with proportions equal to $P_1$ and $P_2$. Suppose further that we take all possible samples of size $n_1$ and $n_2$. And finally, suppose that the following assumptions are valid.
  - The size of each population is large relative to the sample drawn from the population. That is, $N_1$ is large relative to $n_1$, and $N_2$ is large relative to $n_2$. (In this context, populations are considered to be large if they are at least 10 times bigger than their sample.)
  - The samples from each population are big enough to justify using a normal distribution to model differences between proportions. The sample sizes will be big enough when the following conditions are met: $n_1 P_1 > 10$, $n_1(1 - P_1) > 10$, $n_2 P_2 > 10$, and $n_2(1 - P_2) > 10$.
  - The samples are independent; that is, observations in population 1 are not affected by observations in population 2, and vice versa.

Difference Between Proportions


- Given these assumptions, we know the following.
  - The set of differences between sample proportions will be normally distributed. We know this from the central limit theorem.
  - The expected value of the difference between all possible sample proportions is equal to the difference between population proportions. Thus, $E(p_1 - p_2) = P_1 - P_2$.
  - The standard deviation of the difference between sample proportions ($\sigma_d$) is approximately equal to:

    $\sigma_d = \sqrt{ [P_1(1 - P_1) / n_1] + [P_2(1 - P_2) / n_2] }$

Difference Between Proportions


- The variance of the difference between independent random variables is equal to the sum of the individual variances. Thus,

  $\sigma_d^2 = \sigma_{p_1 - p_2}^2 = \sigma_1^2 + \sigma_2^2$

- If the populations $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then

  $\sigma_1^2 = P_1(1 - P_1) / n_1$ and $\sigma_2^2 = P_2(1 - P_2) / n_2$

- Therefore,

  $\sigma_d^2 = [P_1(1 - P_1) / n_1] + [P_2(1 - P_2) / n_2]$ and $\sigma_d = \sqrt{ [P_1(1 - P_1) / n_1] + [P_2(1 - P_2) / n_2] }$

Concept Problem 1
- In one state, 52% of the voters are Republicans, and 48% are Democrats. In a second state, 47% of the voters are Republicans, and 53% are Democrats. Suppose 100 voters are surveyed from each state. Assume the survey uses simple random sampling.
- What is the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state?

Solution
- For this analysis, let $P_1$ = the proportion of Republican voters in the first state, $P_2$ = the proportion of Republican voters in the second state, $p_1$ = the proportion of Republican voters in the sample from the first state, and $p_2$ = the proportion of Republican voters in the sample from the second state. The number of voters sampled from the first state ($n_1$) = 100, and the number of voters sampled from the second state ($n_2$) = 100.
- The solution involves four steps.
  - Make sure the samples from each population are big enough to model differences with a normal distribution. Because $n_1 P_1 = 100 \times 0.52 = 52$, $n_1(1 - P_1) = 100 \times 0.48 = 48$, $n_2 P_2 = 100 \times 0.47 = 47$, and $n_2(1 - P_2) = 100 \times 0.53 = 53$ are each greater than 10, the sample size is large enough.
  - Find the mean of the difference in sample proportions: $E(p_1 - p_2) = P_1 - P_2 = 0.52 - 0.47 = 0.05$.

Solution
  - Find the standard deviation of the difference.

    $\sigma_d = \sqrt{ [P_1(1 - P_1) / n_1] + [P_2(1 - P_2) / n_2] } = \sqrt{ [(0.52)(0.48) / 100] + [(0.47)(0.53) / 100] } = \sqrt{0.002496 + 0.002491} = \sqrt{0.004987} = 0.0706$

  - Find the probability. This problem requires us to find the probability that $p_1$ is less than $p_2$. This is equivalent to finding the probability that $p_1 - p_2$ is less than zero. To find this probability, we need to transform the random variable ($p_1 - p_2$) into a z-score. That transformation appears below.

    $z_{p_1 - p_2} = (x - \mu_{p_1 - p_2}) / \sigma_d = (0 - 0.05)/0.0706 = -0.7082$

- The probability of a z-score being -0.7082 or less is 0.24.
- Therefore, the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state is 0.24.
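The four steps of the voter problem can be traced in code. The sketch below assumes the large-population formula for the standard deviation of the difference; the function names are invented for the example.

```python
import math
from statistics import NormalDist

def large_enough(P, n, threshold=10):
    """Step 1: check nP > 10 and n(1 - P) > 10 for one sample."""
    return n * P > threshold and n * (1 - P) > threshold

def prob_p1_less_than_p2(P1, n1, P2, n2):
    """Steps 2 to 4: mean and standard deviation of p1 - p2, then P(p1 - p2 < 0)."""
    mean_d = P1 - P2
    sd_d = math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)
    z = (0 - mean_d) / sd_d
    return mean_d, sd_d, z, NormalDist().cdf(z)

ok = large_enough(0.52, 100) and large_enough(0.47, 100)
mean_d, sd_d, z, prob = prob_p1_less_than_p2(0.52, 100, 0.47, 100)
print(ok, f"mean = {mean_d:.2f}, sd = {sd_d:.4f}, z = {z:.3f}, probability = {prob:.2f}")
```

Because the script keeps full precision for the standard deviation, its z-score differs from the hand-computed -0.7082 in the fourth decimal place; the probability still rounds to 0.24.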

Difference Between Means


- Suppose we have two populations with means equal to $\mu_1$ and $\mu_2$. Suppose further that we take all possible samples of size $n_1$ and $n_2$. And finally, suppose that the following assumptions are valid.
  - The size of each population is large relative to the sample drawn from the population. That is, $N_1$ is large relative to $n_1$, and $N_2$ is large relative to $n_2$. (In this context, populations are considered to be large if they are at least 10 times bigger than their sample.)
  - The samples are independent; that is, observations in population 1 are not affected by observations in population 2, and vice versa.
  - The set of differences between sample means is normally distributed. This will be true if each population is normal or if the sample sizes are large. (Based on the central limit theorem, sample sizes of 40 are large enough.)

Difference Between Means


- Given these assumptions, we know the following.
  - The expected value of the difference between all possible sample means is equal to the difference between population means. Thus, $E(\bar{x}_1 - \bar{x}_2) = \mu_d = \mu_1 - \mu_2$.
  - The standard deviation of the difference between sample means ($\sigma_d$) is approximately equal to:

    $\sigma_d = \sqrt{ \sigma_1^2 / n_1 + \sigma_2^2 / n_2 }$

  - The variance of the difference between independent random variables is equal to the sum of the individual variances.

Difference Between Means


- Thus,

  $\sigma_d^2 = \sigma_{\bar{x}_1 - \bar{x}_2}^2 = \sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2$

- If the populations $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then

  $\sigma_{\bar{x}_1}^2 = \sigma_1^2 / n_1$ and $\sigma_{\bar{x}_2}^2 = \sigma_2^2 / n_2$

- Therefore,

  $\sigma_d^2 = \sigma_1^2 / n_1 + \sigma_2^2 / n_2$ and $\sigma_d = \sqrt{ \sigma_1^2 / n_1 + \sigma_2^2 / n_2 }$

Concept Problem
- For boys, the average number of absences in the first grade is 15 with a standard deviation of 7; for girls, the average number of absences is 10 with a standard deviation of 6.
- In a nationwide survey, suppose 100 boys and 50 girls are sampled. What is the probability that the male sample will have at most three more days of absences than the female sample?

Solution:
- Find the mean difference (male absences minus female absences) in the population.

  $\mu_d = \mu_1 - \mu_2 = 15 - 10 = 5$

- Find the standard deviation of the difference.

  $\sigma_d = \sqrt{ \sigma_1^2 / n_1 + \sigma_2^2 / n_2 } = \sqrt{ 7^2/100 + 6^2/50 } = \sqrt{ 49/100 + 36/50 } = \sqrt{ 0.49 + 0.72 } = \sqrt{1.21} = 1.1$

- Find the z-score that is produced when boys have three more days of absences than girls. When boys have three more days of absences, the number of male absences minus female absences is three. And the associated z-score is

  $z = (x - \mu)/\sigma = (3 - 5)/1.1 = -2/1.1 = -1.818$

Solution
- Find the probability. We find that the probability of a z-score being -1.818 or less is about 0.035.
- Therefore, the probability that the difference between samples will be no more than 3 days is 0.035.
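A sketch of the absences calculation in code, assuming independent samples and large populations as stated in the assumptions for the difference between means (the function name is hypothetical):

```python
import math
from statistics import NormalDist

def prob_mean_difference_at_most(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Return (mean, sd, z, P(x1_bar - x2_bar <= d)) for two independent samples."""
    mean_d = mu1 - mu2
    sd_d = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # variances add
    z = (d - mean_d) / sd_d
    return mean_d, sd_d, z, NormalDist().cdf(z)

mean_d, sd_d, z, prob = prob_mean_difference_at_most(
    3, mu1=15, sigma1=7, n1=100, mu2=10, sigma2=6, n2=50
)
print(f"mean = {mean_d}, sd = {sd_d:.2f}, z = {z:.3f}, probability = {prob:.3f}")
```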


Overview of the Central Limit Theorem


- If a random sample of N cases is drawn from a population with mean $\mu$ and standard deviation $\sigma$ (here N denotes the sample size, not the population size), then the sampling distribution of the mean (the distribution of all possible means for samples of size N):
  1) has a mean equal to the population mean, $\mu_{\bar{x}} = \mu$;
  2) has a standard deviation (also called "standard error" or "standard error of the mean") equal to the population standard deviation, $\sigma$, divided by the square root of the sample size, N: $\sigma_{\bar{x}} = \sigma / \sqrt{N}$;
  3) has a shape that approaches normal as N increases.

This last point is especially important: The shape of the sampling distribution approaches normal as the size of the sample increases, whatever the shape of the population distribution.

Overview of the Central Limit Theorem


- The facts represented in the Central Limit Theorem allow us to determine the likely accuracy of a sample mean, but only if the sampling distribution of the mean is approximately normal.
- If the population distribution is normal, then the sampling distribution of the mean will be normal for any sample size N (even N = 1). If a population distribution is not normal, but it has a bump in the middle and no extreme scores and no strong skew, then a sample of even modest size (e.g., N = 30) will have a sampling distribution of the mean that is very close to normal. However, if the population distribution is far from normal (e.g., extreme outliers or strong skew), then to produce a sampling distribution of the mean that is close to normal it may be necessary to draw a very large sample (e.g., N = 500 or more).
- Important note: You should not assume that the sampling distribution of the mean is normal without considering the shape of the population distribution and the size of your sample. A sample with N > 30 does not guarantee a normal sampling distribution if the population distribution is far from normal.
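A small simulation can make this warning concrete. The sketch below is illustrative only: it assumes a strongly right-skewed population (exponential with mean 1), draws repeated samples, and uses the skewness of the resulting sample means as a rough gauge of how far the sampling distribution is from the symmetric normal shape. The function names, seed and repetition count are arbitrary choices.

```python
import random
import statistics

def draw_exponential(rng):
    """One observation from a right-skewed population (exponential, mean 1)."""
    return rng.expovariate(1.0)

def sample_means(draw, n, reps, seed=0):
    """Means of `reps` independent samples, each of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (1, 5, 30, 200):
    results[n] = skewness(sample_means(draw_exponential, n, reps=2000))
    print(f"sample size {n:>3}: skewness of sample means = {results[n]:.2f}")
```

The skewness shrinks as the sample size grows but is still visibly above zero at a sample size of 30, in line with the note that N > 30 is no guarantee for a far-from-normal population.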


A Few CLT-Related Terms


Mean
- The mean is the most common indicator of an 'average' score. It is computed by dividing the sum of all scores by the number of scores. If the distribution has extreme outliers and/or skew, the mean may not be very descriptive of a 'typical' score.

Standard deviation
- The standard deviation is a common measure of variation of scores. The standard deviation is computed by taking the square root of the variance. The larger the standard deviation (and variance), the wider the distribution and the further the scores are from the mean. Like the mean, the standard deviation is sensitive to outlying scores.

A Few CLT-Related Terms


Variance
- Variance is a measure of how much scores in a distribution vary from the mean. Mathematically, variance is the average of the squared deviations from the mean. Taking the square root of variance results in the standard deviation.

Population versus sample
- A population consists of all cases in the group of interest. A sample is a group of cases selected from all possible cases in the population. For example, if the group of interest is American working women, the population would include each and every working woman in America. Usually it is impossible to collect data on an entire population. Instead, we use one of many sampling techniques to select a subgroup from the population. This subgroup is a sample.
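The definitions of mean, variance and standard deviation can be checked on a tiny data set. The eight scores below are made up for illustration, and the variance is computed as defined above (dividing by the number of scores, which is what `statistics.pvariance` does); a variance estimated from a sample is often computed with n - 1 in the denominator instead, as in `statistics.variance`.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

mean = sum(scores) / len(scores)                               # sum of scores / number of scores
variance = sum((x - mean) ** 2 for x in scores) / len(scores)  # average squared deviation
std_dev = math.sqrt(variance)                                  # square root of the variance

print(mean, variance, std_dev)
```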

A Few CLT-Related Terms


Sample size
- A sample is a subset taken from the population of interest. The number of sampled cases is called the sample size.

Sampling distribution of the mean
- The sampling distribution of the mean is a theoretical distribution. If you were to draw an infinite number of samples with a particular sample size from a population, you would get an infinite number of sample means (one for each sample you drew). The distribution of these means is the sampling distribution of means for your population at that particular sample size.

A Few CLT-Related Terms


Normal distribution
- The normal curve (also called the "Bell curve" or "the Gaussian distribution") is a theoretical distribution mathematically defined by its mean and variance. When graphed, the normal distribution has a shape similar to a bell curve (see Figure below). Naturally occurring distributions are rarely normal in shape. However, the distributions of many chance events do approach normal shape. Importantly, the distribution of possible means for a randomly selected sample is approximately normal if the sample is sufficiently large. The area under the curve for a standardized normal curve is exactly 1.00 or 100%, which is useful for finding probabilities.

Sampling Errors
- Sampling errors have their origin in sampling, as a sample is never a perfect miniature of the population.
- Sampling errors are of two types:
  i. Biased Errors: These errors arise because of bias in selection.
  ii. Unbiased Errors: Unbiased errors arise due to chance differences between members of the population included in the sample and members not included in the sample. This is known as random sampling error. With the increase in the size of the sample, unbiased errors tend to decrease in magnitude.

Sampling from Normal Population


- The sampling distribution of the mean of a sample taken from a normally distributed population demonstrates the following important properties.
  - The sampling distribution has a mean equal to the population mean: $\mu_{\bar{x}} = \mu$.
  - The sampling distribution has a standard deviation (standard error) equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

Sampling from Normal Population (Cont'd)

- To compute the probability that the sample mean will lie within a given range, we use the formula:

  $Z = (\bar{X} - \mu) / \sigma_{\bar{x}}$

  where $\bar{X}$ = sample mean, $\mu$ = population mean, and $\sigma_{\bar{x}}$ = standard error of the mean = $\sigma / \sqrt{n}$.

Sampling from Non Normal Population


- When the population is normally distributed, the sampling distribution of the mean is also normal.
- First, it has been observed that the mean of the sampling distribution of the mean will equal the population mean regardless of the sample size, even if the population is not normal.
- Second, as the sample size increases, the sampling distribution of the mean will approach normality, regardless of the shape of the population distribution.

Sampling from Non Normal Population (Cont'd)

- Central Limit Theorem: The relationship between the shape of the population distribution and the shape of the sampling distribution of the mean is described by the Central Limit Theorem.
- This theorem states that the sampling distribution of the mean approaches the normal distribution as the sample size increases.
- It has been observed that whenever the sample size is at least thirty, the sampling distribution approximates the normal distribution.

Standard Error when Population is Finite


- $\sigma_{\bar{x}} = [\sigma / \sqrt{n}] \sqrt{(N - n)/(N - 1)}$, where N = size of the population and n = size of the sample.
- The term $\sqrt{(N - n)/(N - 1)}$ is called the finite population multiplier.

Concept Problem
- From a population of 125 items with a mean of 105 and a standard deviation of 17, 64 items were chosen.
  i. What is the standard error of the mean?
  ii. What is the probability that the sample mean will be between 107.5 and 109?

Solution:
N = 125, $\mu$ = 105, $\sigma$ = 17, n = 64

$\sigma_{\bar{x}} = [\sigma / \sqrt{n}] \sqrt{(N - n)/(N - 1)} = [17 / \sqrt{64}] \sqrt{61/124} = 1.4904$

To compute the probability, find the area under the normal curve between sample means of 107.5 and 109.

For $\bar{x}$ = 107.5: $z = (\bar{x} - \mu) / \sigma_{\bar{x}} = (107.5 - 105)/1.4904 = 2.5/1.4904 = 1.68$

Solution (Cont'd)
This corresponds to an area of 0.4535 [from Z table].

For $\bar{x}$ = 109: $z = (109 - 105)/1.4904 = 2.68$

This corresponds to an area of 0.4963 [from Z table].

So the probability that the mean will lie between the two values = 0.4963 - 0.4535 = 0.0428, i.e. 4.28%.
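The finite-population calculation can be verified with a short script. It is a sketch that uses the finite population multiplier formula and `statistics.NormalDist` in place of the Z table; the function names are illustrative. Because the script does not round the z-values to two decimals before looking them up, its probability comes out near 0.043 rather than exactly the table-based 0.0428.

```python
import math
from statistics import NormalDist

def se_finite(sigma, n, N):
    """Standard error of the mean with the finite population multiplier."""
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

def prob_mean_between(low, high, mu, se):
    """Area under the normal curve between two sample-mean values."""
    std_normal = NormalDist()
    return std_normal.cdf((high - mu) / se) - std_normal.cdf((low - mu) / se)

se = se_finite(sigma=17, n=64, N=125)
prob = prob_mean_between(107.5, 109, mu=105, se=se)
print(f"standard error = {se:.4f}, probability = {prob:.4f}")
```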
