
N

The sample size (N) is the total number of observations in the sample.

Interpretation
The sample size affects the confidence interval and the power of the test.

Usually, a larger sample size results in a narrower confidence interval. A larger sample size
also gives the test more power to detect a difference. For more information, go to What is
power?.

Mean
Minitab displays the mean for each sample and the mean of the differences between the
paired observations.

The mean summarizes the sample values with a single value that represents the center of
the data. The mean is the average of the data, which is the sum of all the observations
divided by the number of observations.

To calculate the mean difference, Minitab calculates the differences between the paired
observations and then calculates the mean of the differences.
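As a minimal sketch of this calculation (using hypothetical before/after values, not data from the source), the paired differences and their mean can be computed directly:

```python
# Hypothetical paired observations (e.g., heart rates before and after a treatment).
before = [72, 75, 71, 78, 74]
after = [70, 71, 70, 75, 73]

# Difference for each pair, then the mean of those differences.
differences = [b - a for b, a in zip(before, after)]
mean_difference = sum(differences) / len(differences)

print(differences)      # [2, 4, 1, 3, 1]
print(mean_difference)  # 2.2
```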

Interpretation
The mean difference is an estimate of the population mean difference.

Because the mean difference is based on sample data and not on the entire population, it is
unlikely that the sample mean difference equals the population mean difference. To better
estimate the population mean difference, use the confidence interval of the difference.

StDev
The standard deviation is the most common measure of dispersion, or how spread out the
data are about the mean. The symbol σ (sigma) is often used to represent the standard
deviation of a population, while s is used to represent the standard deviation of a sample.
Variation that is random or natural to a process is often referred to as noise.
The standard deviation uses the same units as the data.

Interpretation
Use the standard deviation to determine how spread out the data are from the mean. A
higher standard deviation value indicates greater spread in the data. A good rule of thumb
for a normal distribution is that approximately 68% of the values fall within one standard
deviation of the mean, 95% of the values fall within two standard deviations, and 99.7% of
the values fall within three standard deviations.

The standard deviation of the sample data is an estimate of the population standard
deviation. The standard deviation is used to calculate the confidence interval and the p-
value. A higher value produces less precise (wider) confidence intervals and less powerful
tests.

The standard deviation can also be used to establish a benchmark for estimating the overall
variation of a process.
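A minimal sketch of the calculation, using Python's statistics module and hypothetical discharge times whose mean is 35 minutes (echoing the hospital example below):

```python
import statistics

# Hypothetical discharge times in minutes (not the actual hospital data).
times = [29, 33, 35, 35, 36, 38, 41, 33, 37, 33]

mean = statistics.mean(times)
s = statistics.stdev(times)   # sample standard deviation (divides by n - 1)

print(mean)
print(round(s, 2))  # 3.3 — on average, times deviate about 3.3 minutes from the mean
```

The sample standard deviation (s) divides by n − 1; use statistics.pstdev for the population standard deviation (σ), which divides by n.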

[Histograms: hospital discharge times for Hospital 1 and Hospital 2]


Administrators track the discharge time for patients who are treated in the emergency departments of
two hospitals. Although the average discharge times are about the same (35 minutes), the standard
deviations are significantly different. The standard deviation for hospital 1 is about 6. On average, a
patient's discharge time deviates from the mean (dashed line) by about 6 minutes. The standard deviation
for hospital 2 is about 20. On average, a patient's discharge time deviates from the mean (dashed line) by
about 20 minutes.

SE mean
The standard error of the mean (SE Mean) estimates the variability between sample means
that you would obtain if you took repeated samples from the same population. Whereas the
standard error of the mean estimates the variability between samples, the standard
deviation measures the variability within a single sample.

For example, you have a mean delivery time of 3.80 days, with a standard deviation of 1.43
days, from a random sample of 312 delivery times. These numbers yield a standard error of
the mean of 0.08 days (1.43 divided by the square root of 312). If you took multiple random
samples of the same size, from the same population, the standard deviation of those
different sample means would be around 0.08 days.
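The worked delivery-time example above can be verified directly:

```python
import math

# Numbers from the delivery-time example above.
s = 1.43   # sample standard deviation (days)
n = 312    # sample size

# Standard error of the mean: s divided by the square root of n.
se_mean = s / math.sqrt(n)
print(round(se_mean, 2))  # 0.08
```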

Interpretation
Use the standard error of the mean to determine how precisely the sample mean estimates
the population mean.

A smaller value of the standard error of the mean indicates a more precise estimate of the
population mean. Usually, a larger standard deviation results in a larger standard error of
the mean and a less precise estimate of the population mean. A larger sample size results in
a smaller standard error of the mean and a more precise estimate of the population mean.

Minitab uses the standard error of the mean to calculate the confidence interval.

Confidence interval (CI) and bounds


The confidence interval provides a range of likely values for the population mean difference.
Because samples are random, two samples from a population are unlikely to yield identical
confidence intervals. But, if you repeated your sample many times, a certain percentage of
the resulting confidence intervals or bounds would contain the unknown population mean
difference. The percentage of these confidence intervals or bounds that contain the mean
difference is the confidence level of the interval. For example, a 95% confidence level
indicates that if you take 100 random samples from the population, you could expect
approximately 95 of the samples to produce intervals that contain the population mean
difference.

An upper bound defines a value that the population mean difference is likely to be less
than. A lower bound defines a value that the population mean difference is likely to be
greater than.

The confidence interval helps you assess the practical significance of your results. Use your
specialized knowledge to determine whether the confidence interval includes values that
have practical significance for your situation. If the interval is too wide to be useful, consider
increasing your sample size. For more information, go to Ways to get a more precise
confidence interval.

Estimation for Paired Difference

                          95% CI for
 Mean   StDev   SE Mean   μ_difference
2.200   3.254     0.728   (0.677, 3.723)

µ_difference: mean of (Before - After)

In these results, the estimate for the population mean difference in heart rates is 2.2. You can be 95%
confident that the population mean difference is between 0.677 and 3.723.
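The interval in this output can be reproduced from the summary statistics. The sample size is not shown, so n = 20 pairs is assumed here (which is what SE Mean = StDev / √n ≈ 0.728 implies); the critical value t(0.025, 19) = 2.093 comes from a t-table:

```python
import math

# Summary statistics from the output above.
mean_diff = 2.200
stdev = 3.254
n = 20                            # assumed, implied by SE Mean = StDev / sqrt(n)
se_mean = stdev / math.sqrt(n)    # ≈ 0.728

# Two-sided 95% critical value t_(0.025, 19) from a t-table.
t_crit = 2.093

lower = mean_diff - t_crit * se_mean
upper = mean_diff + t_crit * se_mean
print(round(lower, 3), round(upper, 3))  # 0.677 3.723
```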

Null hypothesis and alternative hypothesis


The null and alternative hypotheses are two mutually exclusive statements about a
population. A hypothesis test uses sample data to determine whether to reject the null
hypothesis.
Null hypothesis
The null hypothesis states that a population parameter (such as the mean, the
standard deviation, and so on) is equal to a hypothesized value. The null hypothesis
is often an initial claim that is based on previous analyses or specialized knowledge.

Alternative hypothesis
The alternative hypothesis states that a population parameter is smaller, larger, or
different from the hypothesized value in the null hypothesis. The alternative
hypothesis is what you might believe to be true or hope to prove true.
In the output, the null and alternative hypotheses help you to verify that you entered
the correct value for the test difference.

T-Value
The t-value is the observed value of the t-test statistic that measures the difference
between an observed sample statistic and its hypothesized population parameter, in
units of standard error.

Interpretation

You can compare the t-value to critical values of the t-distribution to determine whether to
reject the null hypothesis. However, using the p-value of the test to make the same
determination is usually more practical and convenient.

To determine whether to reject the null hypothesis, compare the t-value to the critical value.
The critical value is t_(α/2, n−1) for a two-sided test and t_(α, n−1) for a one-sided test. For a two-sided test, if the absolute value of the t-value is greater than the critical value, you reject the
null hypothesis. If it is not, you fail to reject the null hypothesis. You can calculate the critical
value in Minitab or find the critical value from a t-distribution table in most statistics books.
For more information, go to Using the inverse cumulative distribution function (ICDF) and
click "Use the ICDF to calculate critical values".

The t-value is used to calculate the p-value.
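Using the paired-difference output shown earlier (mean difference 2.200, SE Mean 0.728), a hypothesized difference of 0, and an assumed n = 20 pairs (so 19 degrees of freedom), the t-value and the critical-value comparison look like this:

```python
# Summary statistics from the paired-difference output shown earlier.
mean_diff = 2.200
se_mean = 0.728
hypothesized = 0.0

# t-value: difference between the sample statistic and its hypothesized
# value, in units of standard error.
t_value = (mean_diff - hypothesized) / se_mean
print(round(t_value, 2))  # 3.02

# Two-sided critical value t_(0.025, 19) = 2.093 at alpha = 0.05 (from a t-table).
t_crit = 2.093
print(abs(t_value) > t_crit)  # True, so reject the null hypothesis
```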

P-Value

The p-value is a probability that measures the evidence against the null hypothesis. A
smaller p-value provides stronger evidence against the null hypothesis.

Interpretation

Use the p-value to determine whether the population mean of the differences is statistically
different from the hypothesized mean of the differences.

To determine whether the difference between the population means is statistically significant, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
P-value ≤ α: The difference between the means is statistically significant (Reject H0)

If the p-value is less than or equal to the significance level, the decision is to reject the null
hypothesis. You can conclude that the difference between the population means does not
equal the hypothesized difference. If you did not specify a hypothesized difference, Minitab
tests whether there is no difference between the means (Hypothesized difference = 0). Use
your specialized knowledge to determine whether the difference is practically significant.
For more information, go to Statistical and practical significance.

P-value > α: The difference between the means is not statistically significant
(Fail to reject H0)

If the p-value is greater than the significance level, the decision is to fail to reject the null
hypothesis. You do not have enough evidence to conclude that the mean difference
between the paired observations is statistically significant. You should make sure that your
test has enough power to detect a difference that is practically significant. For more
information, go to Power and Sample Size for Paired t.

P-value
The p-value is a probability that measures the evidence against the null hypothesis. Lower
probabilities provide stronger evidence against the null hypothesis.

Interpretation
Use the p-value to determine whether the data do not follow a normal distribution.

To determine whether the data do not follow a normal distribution, compare the p-value to
the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well.
A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow a
normal distribution when they actually do follow a normal distribution.

P-value ≤ α: The data do not follow a normal distribution (Reject H0)

If the p-value is less than or equal to the significance level, the decision is to reject the null
hypothesis and conclude that your data do not follow a normal distribution. If your data do
not follow the normal distribution, read the data considerations topic for any other analyses
that you want to perform. The data considerations topics indicate one of the following:

• The analysis works well with nonnormal data.
• The analysis works well with nonnormal data that were transformed into normal data.
• The analysis does not work well with nonnormal data. The data considerations topics might identify a different analysis that you can use.
P-value > α: Cannot conclude the data do not follow a normal distribution (Fail to reject H0)

If the p-value is larger than the significance level, the decision is to fail to reject the
null hypothesis because there is not enough evidence to conclude that your data do
not follow a normal distribution. However, you cannot conclude that the data do
follow a normal distribution.

Probability plot
A probability plot creates an estimated cumulative distribution function (CDF) from
your sample by plotting the value of each observation against the observation's
estimated cumulative probability.

Interpretation
Use a probability plot to visualize how well your data fit the normal distribution.

To visualize the fit of the normal distribution, examine the probability plot and assess
how closely the data points follow the fitted distribution line. If your data are perfectly
normal, the data points on the probability plot form a straight line. Skewed data form a
curved line.

[Probability plots: right-skewed data, left-skewed data]
TIP
Hold your pointer over the fitted distribution line to see a chart of percentiles and values.
However, be aware that these values are accurate only if the data follow a normal
distribution.

StDev
The standard deviation is the most common measure of dispersion, or how spread out the
data are from the mean. A larger sample standard deviation indicates that your data are
spread more widely around the mean.

Histogram
A histogram divides sample values into many intervals and represents the frequency of data
values in each interval with a bar.

Interpretation
Use a histogram to assess the shape and spread of the data. Histograms are best when the
sample size is greater than 20.

Skewed data
Examine the spread of your data to determine whether your data appear to be
skewed. When data are skewed, the majority of the data are located on the high or
low side of the graph. Often, skewness is easiest to detect with a histogram or
boxplot.

[Histograms: right-skewed data, left-skewed data]
The histogram with right-skewed data shows wait times. Most of the wait times are relatively
short, and only a few wait times are long. The histogram with left-skewed data shows failure time
data. A few items fail immediately, and many more items fail later.
Data that are severely skewed can affect the validity of the p-value if your sample is
small (less than 20 values). If your data are severely skewed and you have a small
sample, consider increasing your sample size.

Outliers
Outliers, which are data values that are far away from other data values, can strongly
affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On a histogram, isolated bars at either end of the graph identify possible outliers.
Try to identify the cause of any outliers. Correct any data-entry errors or
measurement errors. Consider removing data values for abnormal, one-time events
(also called special causes). Then, repeat the analysis. For more information, go
to Identifying outliers.

Individual value plot


An individual value plot displays the individual values in the sample. Each circle represents
one observation. An individual value plot is especially useful when you have relatively few
observations and when you also need to assess the effect of each observation.

Interpretation
Use an individual value plot to examine the spread of the data and to identify any potential
outliers. Individual value plots are best when the sample size is less than 50.

Skewed data
Examine the spread of your data to determine whether your data appear to be
skewed. When data are skewed, the majority of the data are located on the high or
low side of the graph. Often, skewness is easiest to detect with a histogram or
boxplot.
[Individual value plots: right-skewed data, left-skewed data]

The individual value plot with right-skewed data shows wait times. Most of the wait times are
relatively short, and only a few wait times are long. The individual value plot with left-skewed data
shows failure time data. A few items fail immediately, and many more items fail later.
Data that are severely skewed can affect the validity of the p-value if your sample is
small (less than 20 values). If your data are severely skewed and you have a small
sample, consider increasing your sample size.

Outliers
Outliers, which are data values that are far away from other data values, can strongly
affect the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On an individual value plot, unusually low or high data values indicate possible outliers.
Try to identify the cause of any outliers. Correct any data-entry errors or
measurement errors. Consider removing data values for abnormal, one-time events
(also called special causes). Then, repeat the analysis. For more information, go
to Identifying outliers.

Boxplot
A boxplot provides a graphical summary of the distribution of a sample. The boxplot shows
the shape, central tendency, and variability of the data.
Interpretation
Use a boxplot to examine the spread of the data and to identify any potential
outliers. Boxplots are best when the sample size is greater than 20.

Skewed data
Examine the spread of your data to determine whether your data appear to be
skewed. When data are skewed, the majority of the data are located on the high or low side
of the graph. Often, skewness is easiest to detect with a histogram or boxplot.

[Boxplots: right-skewed data, left-skewed data]

The boxplot with right-skewed data shows wait times. Most of the wait times are relatively
short, and only a few wait times are long. The boxplot with left-skewed data shows failure
time data. A few items fail immediately, and many more items fail later.

Data that are severely skewed can affect the validity of the p-value if your sample is small
(less than 20 values). If your data are severely skewed and you have a small sample, consider
increasing your sample size.

Outliers
Outliers, which are data values that are far away from other data values, can strongly affect
the results of your analysis. Often, outliers are easiest to identify on a boxplot.

On a boxplot, asterisks (*) denote outliers.


Try to identify the cause of any outliers. Correct any data-entry errors or measurement
errors. Consider removing data values for abnormal, one-time events (also called special
causes). Then, repeat the analysis. For more information, go to Identifying outliers.

AD-value
The Anderson-Darling goodness-of-fit statistic (AD-Value) measures the area between the
fitted line (based on the normal distribution) and the empirical distribution function (which
is based on the data points). The Anderson-Darling statistic is a squared distance that is
weighted more heavily in the tails of the distribution.

Interpretation
Minitab uses the Anderson-Darling statistic to calculate the p-value. The p-value is a
probability that measures the evidence against the null hypothesis. Smaller p-values provide
stronger evidence against the null hypothesis. Larger values for the Anderson-Darling
statistic indicate that the data do not follow the normal distribution.
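Outside Minitab, SciPy's scipy.stats.anderson computes a comparable Anderson-Darling statistic (assuming SciPy and NumPy are available in your environment; the simulated data below are hypothetical):

```python
import numpy as np
from scipy import stats

# Simulated sample from a normal distribution (hypothetical data).
rng = np.random.default_rng(seed=1)
data = rng.normal(loc=35, scale=6, size=100)

result = stats.anderson(data, dist='norm')
print(result.statistic)        # AD statistic; larger values indicate a worse fit to normal
print(result.critical_values)  # critical values at the 15%, 10%, 5%, 2.5%, and 1% levels
```

If the statistic exceeds the critical value at your chosen significance level, reject the hypothesis that the data follow a normal distribution.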

Maximum
The maximum is the largest data value.

In these data, the maximum is 19.


13 17 18 19 12 10 7 9 14

Interpretation
Use the maximum to identify a possible outlier or a data-entry error. One of the simplest
ways to assess the spread of your data is to compare the minimum and maximum. If the
maximum value is very high, even when you consider the center, the spread, and the shape
of the data, investigate the cause of the extreme value.

Mean
The mean describes the sample with a single value that represents the center of the data.
The mean is calculated as the average of the data, which is the sum of all the observations
divided by the number of observations.
Minimum
The minimum is the smallest data value.

In these data, the minimum is 7.


13 17 18 19 12 10 7 9 14

Interpretation
Use the minimum to identify a possible outlier or a data-entry error. One of the simplest
ways to assess the spread of your data is to compare the minimum and maximum. If the
minimum value is very low, even when you consider the center, the spread, and the shape of
the data, investigate the cause of the extreme value.

1st Quartile
Quartiles are the three values that divide a sample of ordered data into four equal parts: the 1st quartile at 25% (Q1), the 2nd quartile at 50% (Q2, the median), and the 3rd quartile at 75% (Q3).

The 1st quartile is the 25th percentile and indicates that 25% of the data are less than or equal
to this value.

For this ordered data, the 1st quartile (Q1) is 9.5. That is, 25% of the data are less than or equal to 9.5.
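A sketch of the quartile calculation, assuming the (n + 1) × p interpolation rule that Minitab uses for percentiles:

```python
def quartile(data, p):
    """Percentile using the (n + 1) * p interpolation rule
    (the method Minitab uses for quartiles)."""
    x = sorted(data)
    h = (len(x) + 1) * p  # 1-based fractional rank
    k = int(h)            # integer part of the rank
    if k < 1:
        return x[0]
    if k >= len(x):
        return x[-1]
    frac = h - k          # interpolate between neighboring observations
    return x[k - 1] + frac * (x[k] - x[k - 1])

data = [13, 17, 18, 19, 12, 10, 7, 9, 14]  # the data from the example above
print(quartile(data, 0.25))  # 9.5  (Q1)
print(quartile(data, 0.75))  # 17.5 (Q3)
```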

Median
The median is the midpoint of the data set. This midpoint value is the point at which half the
observations are above the value and half the observations are below the value. The median
is determined by ranking the observations and finding the observation at position
(N + 1) / 2 in the ranked order. If the number of observations is even, then the
median is the average of the observations ranked at positions N / 2 and
(N / 2) + 1.

For this ordered data, the median is 13. That is, half the values are less than or equal to 13, and half the
values are greater than or equal to 13. If you add another observation equal to 20, the median is 13.5,
which is the average of the 5th observation (13) and the 6th observation (14).
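The ranking rule above translates directly into code:

```python
def median(values):
    """Median via the ranking rule described above."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        # Odd count: the observation at position (N + 1) / 2.
        return x[(n + 1) // 2 - 1]
    # Even count: the average of the observations at positions N / 2 and (N / 2) + 1.
    return (x[n // 2 - 1] + x[n // 2]) / 2

data = [13, 17, 18, 19, 12, 10, 7, 9, 14]
print(median(data))         # 13
print(median(data + [20]))  # 13.5
```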

Interpretation
The median and the mean both measure central tendency. But unusual values, called
outliers, can affect the median less than they affect the mean. If your data are symmetric,
the mean and median are similar.

[Histograms: symmetric distribution, non-symmetric distribution]

For the symmetric distribution, the mean (blue line) and median (orange line) are so similar that you can't
easily see both lines. But the non-symmetric distribution is skewed to the right.

3rd Quartile
Quartiles are the three values that divide a sample of ordered data into four equal parts: the 1st quartile at 25% (Q1), the 2nd quartile at 50% (Q2, the median), and the 3rd quartile at 75% (Q3).

The third quartile is the 75th percentile and indicates that 75% of the data are less than or
equal to this value.

For this ordered data, the third quartile (Q3) is 17.5. That is, 75% of the data are less than or equal to 17.5.

Null hypothesis and alternative hypothesis


The null and alternative hypotheses are two mutually exclusive statements about the
distribution of the data. The Anderson-Darling test uses sample data to determine whether
to reject the null hypothesis.
Null Hypothesis
The null hypothesis states that the data follow a normal distribution.

Alternative Hypothesis
The alternative hypothesis states that the data do not follow a normal distribution.


Interpret the key results for Boxplot


Complete the following steps to interpret a boxplot.

In This Topic
Step 1: Assess the key characteristics
Examine the center and spread of the distribution. Assess how the sample size may affect
the appearance of the boxplot.

Center and spread


Examine the following elements to learn more about the center and spread of your sample
data.

Median

The median is represented by the line in the box. The median is a common measure
of the center of your data.

Interquartile range box


The interquartile range box represents the middle 50% of the data.

Whiskers
The whiskers extend from either side of the box. The whiskers represent the ranges
for the bottom 25% and the top 25% of the data values, excluding outliers.

Hold the pointer over the boxplot to display a tooltip that shows these statistics. For
example, the following boxplot of the heights of students shows that the median
height is 69. Most students have a height that is between 66 and 72, but some
students have heights that are as low as 61 and as high as 75.

Investigate any surprising or undesirable characteristics on the boxplot. For example, a boxplot may show that the median length of wood boards is much lower than the target length of 8 feet.

Sample size (N)


The sample size can affect the appearance of the graph. For example, although the
following boxplots seem quite different, both of them were created using randomly
selected samples of data from the same population.
[Boxplots of random samples with N = 15 and N = 500]
A boxplot works best when the sample size is at least 20. If the sample size is too
small, the quartiles and outliers shown by the boxplot may not be meaningful. If the
sample size is less than 20, consider using Individual Value Plot.

Step 2: Look for indicators of nonnormal or unusual data

Skewed data indicate that data may be nonnormal. Outliers may indicate other conditions in your data.

Skewed data
When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally distributed.

The following boxplots are skewed. The boxplot with right-skewed data shows wait
times. Most of the wait times are relatively short, and only a few wait times are long.
The boxplot with left-skewed data shows failure time data. A few items fail
immediately and many more items fail later.

[Boxplots: right-skewed data, left-skewed data]
Some analyses assume that your data come from a normal distribution. If your data
are skewed (nonnormal), read the data considerations topic for the analysis to make
sure that you can use data that are not normal.

Outliers
Outliers, which are data values that are far away from other data values, can strongly
affect your results. Often, outliers are easiest to identify on a boxplot. On a boxplot,
outliers are identified by asterisks (*).

TIP
Hold the pointer over the outlier to identify the data point.
Try to identify the cause of any outliers. Correct any data-entry errors or
measurement errors. Consider removing data values that are associated with
abnormal, one-time events (special causes). Then, repeat the analysis.

Step 3: Assess and compare groups


If your boxplot has groups, assess and compare the center and spread of groups.

Centers
Look for differences between the centers of the groups. For example, the following boxplot shows the thickness of wire from four suppliers. The median thicknesses for some groups seem to be different.

Spreads
Look for differences between the spreads of the groups. For example, the following boxplot shows the fill weights of cereal boxes from four production lines. The median weights of the groups of cereal boxes are similar, but the weights of some groups are more variable than others.

Interpret the key results for Pareto Chart


Complete the following steps to interpret a Pareto chart.

Step 1: Examine the order of the bars


A Pareto chart is a bar chart in which the bars are ordered from highest frequency of
occurrence to lowest frequency of occurrence. Use a Pareto chart to rank your defects from
largest to smallest, so that you can prioritize quality improvement efforts.

After a specified percentage of the defectives are categorized, usually 95%, Minitab
combines the remaining defects into a group called "Other". The Other category is always
displayed as the last bar, even if the Other category has a higher count than previous
categories.

Key Results: Counts, Percent


In these results, the largest source of complaints is from Room. The chart shows 104 complaints about
rooms, which account for 45.4% of all the complaints.

Step 2: Examine the cumulative percentage line


The cumulative percentage line starts at the first (highest) bar, and extends to the last bar to
help you assess the added contribution of each category. The cumulative percentage is also
displayed for each bar under the chart unless you have a by variable and display all on one
graph.
Key Results: Cum %
In these results, 65.1% of all the complaints are from the first two categories, Room and Appliances. Over
90% of all complaints are from the first 4 categories.
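The cumulative percentages can be reproduced with complaint counts chosen to match the percentages in this example (other than Room = 104, the category counts below are assumptions, not the actual data behind the chart):

```python
# Hypothetical complaint counts; only Room = 104 comes from the example.
counts = {"Room": 104, "Appliances": 45, "Service": 38, "Billing": 20, "Other": 22}
total = sum(counts.values())

# Sort descending by count, but keep "Other" last, as the Pareto chart does.
ordered = sorted((k for k in counts if k != "Other"),
                 key=lambda k: counts[k], reverse=True) + ["Other"]

cum = 0
cum_pct = {}
for category in ordered:
    cum += counts[category]
    cum_pct[category] = round(100 * cum / total, 1)
    print(category, counts[category], cum_pct[category])
```

With these counts, Room alone accounts for 45.4% of complaints and the first two categories reach a cumulative 65.1%, matching the example.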

Interpret the key results for Run Chart


Complete the following steps to interpret a run chart.

Step 1: Look for patterns in the data


A run chart plots your process data in the order that they were collected. Use a run chart to
look for patterns or trends in your data that indicate the presence of special-cause variation.

Patterns in your data indicate that the variation is due to special causes that should be
investigated and corrected. However, common-cause variation is variation that is inherent or
a natural part of the process. A process is stable when only common causes, not special
causes, affect the process output. If only common causes of variation exist in your process,
the data exhibit random behavior.
In these results, the data appear to show some clustering in samples 3 through 5.

Step 2: Determine whether mixtures and clusters are present
The test for number of runs about the median is based on the total number of runs that
occur both above and below the median. A run about the median is one or more
consecutive points on the same side of the center line. A run ends when the line that
connects the points crosses the center line. A new run begins with the next plotted point.

This test detects two types of nonrandom behavior: mixtures and clusters.

An observed number of runs that is greater than the expected number of runs indicates
mixtures. An observed number of runs that is less than the expected number of runs
indicates clusters.
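A sketch of the runs-about-the-median count described above, with the expected number of runs taken from the standard Wald-Wolfowitz approximation (the data are hypothetical):

```python
import statistics

def runs_about_median(data):
    """Count runs above/below the median and the expected count
    under randomness (Wald-Wolfowitz approximation)."""
    med = statistics.median(data)
    # Classify each point as above (True) or below (False) the median;
    # points exactly on the median are dropped (one common convention).
    sides = [x > med for x in data if x != med]
    # A new run starts at each crossing of the center line.
    runs = 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    n_above = sum(sides)
    n_below = len(sides) - n_above
    expected = 1 + 2 * n_above * n_below / (n_above + n_below)
    return runs, expected

data = [3, 5, 4, 9, 8, 7, 2, 4, 10, 9, 1, 3]  # hypothetical process data
runs, expected = runs_about_median(data)
print(runs, expected)  # 7 7.0 — observed matches expected: no sign of clusters or mixtures
```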

Cluster patterns

Clusters may indicate special-cause variation, such as measurement problems, lot-to-lot or set-up variability, or sampling from a group of defective parts. Clusters are groups of points in one area of the chart. If the p-value for clustering is less than 0.05, you may have clusters in your data.

This chart shows possible clusters of data.
Mixture patterns

A mixture is characterized by frequent crossing of the center line. Mixtures often indicate combined data from two populations, or two processes operating at different levels. If the p-value for mixtures is less than 0.05, you may have mixtures in your data.

In this chart, the mixture may indicate that the data come from different processes.

Key Results: P-value for Clustering, P-Value for Mixtures


In this example, the p-value for clustering of 0.385 and the p-value for mixtures of 0.615 are
greater than the α of 0.05. Therefore, you can conclude that the data do not indicate
mixtures or clusters.
Step 3: Determine whether trends and oscillation are present
The test for the number of runs up and down is based on the total number of
observed runs up or down. A run up is an upward run of consecutive points that
exclusively increases. A run down is a downward run of consecutive points that
exclusively decreases. A run ends when the direction (either up or down) changes.
For example, when the preceding value is smaller, a run up begins and continues
until a point is larger than the next point; then a run down begins.

This test detects two types of nonrandom behavior: oscillation and trends.

An observed number of runs that is greater than the expected number of runs
indicates oscillation. An observed number of runs that is less than the expected
number of runs indicates trends.
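A sketch of the runs-up-and-down count; the expected number of runs under randomness is (2N − 1) / 3 (the data below are hypothetical):

```python
def runs_up_down(data):
    """Count runs up and down and the expected count under randomness."""
    # Direction of each step: True = up, False = down (ties are dropped).
    dirs = [b > a for a, b in zip(data, data[1:]) if b != a]
    # A new run starts whenever the direction changes.
    runs = 1 + sum(1 for d1, d2 in zip(dirs, dirs[1:]) if d1 != d2)
    n = len(data)
    expected = (2 * n - 1) / 3
    return runs, expected

data = [3, 5, 4, 9, 8, 7, 2, 4, 10, 9, 1, 3]  # hypothetical process data
runs, expected = runs_up_down(data)
print(runs, round(expected, 2))  # 7 7.67 — observed near expected: no trend or oscillation
```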

Trend patterns

A trend is a sustained drift in the data, either up or down. Trends may warn that a
process will soon go out of control. A trend can be caused by factors such as worn
tools, a machine that does not hold a setting, or periodic rotation of operators. If the
p-value for trends is less than 0.05, you may have a trend in your data.

In this chart, the upward trend in the first few data points is easy to see.
Oscillating patterns

Oscillation occurs when the data fluctuate up and down, which indicates that the
process is not steady. If the p-value for oscillation is less than 0.05, you may have
oscillation in your data.
In this chart, the data seem to vary up and down frequently.

Key Results: P-Value for Trends, P-Value for Oscillation


In this example, the p-value for trends of 0.500 and the p-value for oscillation of
0.500 are greater than the α of 0.05. Therefore, you can conclude that the data do
not indicate trends or oscillation.

Interpret the key results for Normality Test


Complete the following steps to interpret a normality test. Key output includes the p-value
and the probability plot.

Step 1: Determine whether the data do not follow a normal distribution
To determine whether the data do not follow a normal distribution, compare the p-value to
the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well.
A significance level of 0.05 indicates a 5% risk of concluding that the data do not follow a
normal distribution when the data do follow a normal distribution.
P-value ≤ α: The data do not follow a normal distribution (Reject H0)

If the p-value is less than or equal to the significance level, the decision is to reject
the null hypothesis and conclude that your data do not follow a normal distribution.

P-value > α: You cannot conclude that the data do not follow a normal distribution
(Fail to reject H0)
If the p-value is larger than the significance level, the decision is to fail to reject the
null hypothesis. You do not have enough evidence to conclude that your data do not
follow a normal distribution.

Key Result: P-Value


In these results, the null hypothesis states that the data follow a normal distribution. Because the p-
value is 0.463, which is greater than the significance level of 0.05, the decision is to fail to reject the
null hypothesis. You cannot conclude that the data do not follow a normal distribution.
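The same reject/fail-to-reject logic can be illustrated with SciPy. Note that Minitab's default normality test is Anderson-Darling; the Shapiro-Wilk test used below is a stand-in, but the comparison of the p-value to the significance level is identical.

```python
# Sketch of the normality-test decision rule using SciPy (assumption:
# Shapiro-Wilk here, whereas Minitab defaults to Anderson-Darling).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=10, scale=2, size=50)  # simulated normal sample

stat, p_value = stats.shapiro(data)
alpha = 0.05
if p_value <= alpha:
    decision = "Reject H0: data do not follow a normal distribution"
else:
    decision = "Fail to reject H0: cannot conclude non-normality"
```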

Step 2: Visualize the fit of the normal distribution


To visualize the fit of the normal distribution, examine the probability plot and assess
how closely the data points follow the fitted distribution line. Normal distributions tend
to fall closely along the straight line. Skewed data form a curved line.
Right-skewed data

Left-skewed data
TIP
In Minitab, hold your pointer over the fitted distribution line to see a chart of
percentiles and values.

In this probability plot, the data points fall approximately along the straight line. The normal
distribution appears to be a good fit to the data.

Interpret the key results for Individual Distribution Identification

Complete the following steps to interpret Individual Distribution Identification. Key output
includes probability plots and p-values.

Step 1: View the fit of the distribution


Use the probability plot to assess how closely your data follow each distribution.

If the distribution is a good fit for the data, the points should fall closely along the fitted
distribution line. Departures from the straight line indicate that the fit is unacceptable.
Good fit

Poor fit
In addition to the probability plots, use the goodness-of-fit measures, such as the p-values,
and your practical process knowledge, to evaluate the distribution fit.

Step 2: Assess the fit of the distribution


Use the p-value to assess the fit of the distribution.

Compare the p-value for each distribution or transformation to the significance level.
Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of
0.05 indicates a 5% risk of concluding that the data do not follow the distribution when they
actually do follow the distribution.

P ≤ α: The data do not follow the distribution (Reject H0)

If the p-value is less than or equal to the significance level, you reject the null
hypothesis and conclude that your data do not follow the distribution.

P > α: Cannot conclude the data do not follow the distribution (Fail to reject H0)
If the p-value is greater than the significance level, you fail to reject the null
hypothesis. There is not enough evidence to conclude that the data do not follow
the distribution. You can assume that the data follow the distribution.
When selecting a distribution to model your data, also rely on your process knowledge.
If several distributions provide a good fit, use the following strategies to choose a
distribution:

• Choose the distribution that is most commonly used in your industry or application.

• Choose the distribution that provides the most conservative results. For example, if you
are performing capability analysis, you can perform the analysis using different
distributions and then choose the distribution that produces the most conservative
capability indices. For more information, go to Distribution percentiles for Individual
Distribution Identification and click "Percents and percentiles".
• Choose the simplest distribution that fits your data well. For example, if a 2-parameter
and a 3-parameter distribution both provide a good fit, you might choose the simpler
2-parameter distribution.
IMPORTANT
Use caution when you interpret results from a very small or a very large sample. If you
have a very small sample, a goodness-of-fit test may not have enough power to detect
significant deviations from the distribution. If you have a very large sample, the test
may be so powerful that it detects even small deviations from the distribution that have
no practical significance. Use the probability plots in addition to the p-values to
evaluate the distribution fit.
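The workflow above (fit several candidate distributions, then compare goodness-of-fit p-values alongside process knowledge) can be sketched with SciPy. The Kolmogorov-Smirnov test here is a stand-in for Minitab's Anderson-Darling statistic, and its p-values are optimistic because the parameters are estimated from the same data.

```python
# Hedged sketch: fit several candidate distributions by maximum likelihood
# and compare goodness-of-fit p-values (KS test as a stand-in for AD).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.weibull(2.0, size=200) * 10  # simulated skewed data

candidates = {
    "normal": stats.norm,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
    "gamma": stats.gamma,
}
results = {}
for name, dist in candidates.items():
    params = dist.fit(sample)
    # KS test against the fitted distribution; p-values are approximate
    # because the parameters were estimated from the same data.
    stat, p = stats.kstest(sample, dist.cdf, args=params)
    results[name] = round(p, 3)
# Among acceptable fits, prefer the simplest or most conservative candidate.
```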
Distribution Identification for Calcium

2-Parameter Exponential

* WARNING * Variance/Covariance matrix of estimated parameters does not exist.
            The threshold parameter is assumed fixed when calculating
            confidence intervals.

3-Parameter Gamma

* WARNING * Variance/Covariance matrix of estimated parameters does not exist.
            The threshold parameter is assumed fixed when calculating
            confidence intervals.

Distribution ID Plot for Calcium

Goodness of Fit Test

Distribution                 AD        P    LRT P
Normal                    0.754    0.046
Box-Cox Transformation    0.414    0.324
Lognormal                 0.650    0.085
3-Parameter Lognormal     0.341        *    0.017
Exponential              20.614   <0.003
2-Parameter Exponential   1.684    0.014    0.000
Weibull                   1.442   <0.010
3-Parameter Weibull       0.230   >0.500    0.000
Smallest Extreme Value    1.656   <0.010
Largest Extreme Value     0.394   >0.250
Gamma                     0.702    0.071
3-Parameter Gamma         0.268        *    0.006
Logistic                  0.726    0.034
Loglogistic               0.659    0.050
3-Parameter Loglogistic   0.432        *    0.027
Johnson Transformation    0.124    0.986

Key Results: P
In these results, several distributions have a p-value that is greater than 0.05. The 3-parameter
Weibull distribution (P > 0.500) and the largest extreme value distribution (P > 0.250) have the
largest p-values and appear to fit the sample data better than the other distributions. Also, the Box-
Cox transformation (P = 0.324) and the Johnson transformation (P = 0.986) are effective in
transforming the data to follow a normal distribution.
NOTE
For several distributions, Minitab also displays results for the distribution with an
additional parameter. For example, for the lognormal distribution, Minitab displays
results for both the 2-parameter and 3-parameter versions of the distribution. For
distributions that have additional parameters, use the likelihood-ratio test p-value (LRT
P) to determine whether adding another parameter significantly improves the fit of the
distribution. An LRT p-value that is less than 0.05 suggests that the improvement in fit
is significant. For more information, go to Goodness of fit for individual distribution
identification and click "LRT P".
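The idea behind LRT P can be illustrated with SciPy: fit the 2-parameter and 3-parameter versions of a distribution, then compare their maximized log-likelihoods with a chi-square test on one degree of freedom. This is a rough sketch, not Minitab's implementation; the usual regularity conditions for the likelihood-ratio test are only approximately met for threshold parameters.

```python
# Sketch of the likelihood-ratio test behind LRT P: does adding a
# threshold (location) parameter to a Weibull significantly improve fit?
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = 5 + rng.weibull(1.5, size=300) * 4  # data with a true threshold

# 2-parameter fit: threshold (location) pinned at 0
p2 = stats.weibull_min.fit(sample, floc=0)
# 3-parameter fit: threshold estimated from the data
p3 = stats.weibull_min.fit(sample)

ll2 = np.sum(stats.weibull_min.logpdf(sample, *p2))
ll3 = np.sum(stats.weibull_min.logpdf(sample, *p3))
lrt_p = stats.chi2.sf(2 * (ll3 - ll2), df=1)  # 1 extra parameter
# lrt_p < 0.05 suggests the third (threshold) parameter improves the fit
```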
