Field Training/Basic Statistics
GE Company Proprietary June, 1998

Where Are We in D-M-A-I-C?
Phases: Define Phase, Measure Phase, Analyze Phase, Improve Phase, Control Phase
Roadmap items: Focusing the Problem; Define Need (CTQ Identification); Define Process; Process Variation; Data Collection; Baselining; Basic Statistics; Monitor Output Customer (Dashboards); Intro to Data Analysis; Pilot Improvements

Basic Statistics: Topics we'll discuss
- Types of data
- Normal distribution and normal probabilities
- Measures of the center of the data
    Mean
    Median
- Measures of the spread of data
    Range
    Variance
    Standard deviation
- Process stability and process capability

Calculating Sigma
It is necessary to define:
- What is a unit?
- What is a defect?
- What is the number of opportunities for error per unit?
- Calculate DPU = (Total number of defects) / (Total number of units)
- Calculate DPMO = (Total number of defects) / (Total number of units x Number of opportunities per unit) x 1,000,000
Note: Some processes may have multiple opportunities per unit.

Use of Data in the DMAIC Cycle
- Define: Collect facts surrounding a problem or opportunity
- Measure: Develop a baseline for process parameters
- Analyze: Establish relationships between process inputs and outputs
- Improve: Quantify the impact of a new process improvement
- Control: Monitor process; ensure process gain is sustained

Types of Data
Hmm... What Kind of Data Is This?
Six Sigma Data: There are several different types of data.
- We collect a statistically appropriate amount of data before we draw any conclusions
- We choose the appropriate analytical tools to measure and control processes
Understanding what type of data we have ensures both of the outcomes listed above.

Types of Data
- Attribute data (qualitative, words)
    Categories (strongly agree, agree, etc.)
    Yes, no (order form filled out accurately or not)
    On time, not on time
    Pass/fail; good/bad (accurate billing/overcharged)
- Variable data (quantitative, numbers)
    Discrete (count) data
        Data is not capable of being meaningfully subdivided into more precise increments
        Sample size needed is much larger than for continuous data
        Ex: # of days > 30 A/R aging; # of times customer hangs up before receiving response
    Continuous data
        Decimal subdivisions are meaningful
        Ex: time to answer the telephone (exact # of seconds per call)
        Sample size of 30 is usually adequate

Continuous vs. Discrete

Continuous: Salary Increases of 100 Employees
  #   Value     #   Value     #   Value     #   Value     #   Value
  1  4.9787    21  4.9759    41  4.9769    61  4.9756    81  4.9746
  2  4.9760    22  4.9761    42  4.9762    62  4.9759    82  4.9758
  3  4.9762    23  4.9746    43  4.9747    63  4.9752    83  4.9786
  4  4.9772    24  4.9764    44  4.9776    64  4.9756    84  4.9759
  5  4.9767    25  4.9758    45  4.9738    65  4.9749    85  4.9756
  6  4.9756    26  4.9761    46  4.9767    66  4.9770    86  4.9754
  7  4.9745    27  4.9779    47  4.9756    67  4.9747    87  4.9751
  8  4.9761    28  4.9777    48  4.9758    68  4.9758    88  4.9757
  9  4.9764    29  4.9770    49  4.9757    69  4.9765    89  4.9752
 10  4.9751    30  4.9764    50  4.9769    70  4.9759    90  4.9744
 11  4.9750    31  4.9752    51  4.9754    71  4.9766    91  4.9755
 12  4.9768    32  4.9762    52  4.9772    72  4.9763    92  4.9764
 13  4.9761    33  4.9767    53  4.9757    73  4.9771    93  4.9768
 14  4.9751    34  4.9766    54  4.9778    74  4.9761    94  4.9760
 15  4.9757    35  4.9767    55  4.9746    75  4.9762    95  4.9742
 16  4.9751    36  4.9757    56  4.9774    76  4.9768    96  4.9772
 17  4.9767    37  4.9763    57  4.9759    77  4.9767    97  4.9768
 18  4.9766    38  4.9778    58  4.9757    78  4.9780    98  4.9754
 19  4.9757    39  4.9746    59  4.9767    79  4.9761    99  4.9764
 20  4.9764    40  4.9756    60  4.9776    80  4.9763   100  4.9767

Discrete: Salary Increases of 100 Random Employees
  #  Value     #  Value     #  Value     #  Value     #  Value
  1    1      21   -1      41    1      61   -1      81   -1
  2    0      22    1      42    1      62   -1      82   -1
  3    1      23   -1      43   -1      63   -1      83    1
  4    1      24    1      44    1      64   -1      84   -1
  5    1      25   -1      45   -1      65   -1      85   -1
  6   -1      26    1      46    1      66    1      86   -1
  7   -1      27    1      47   -1      67   -1      87   -1
  8    1      28    1      48   -1      68   -1      88   -1
  9    1      29    1      49   -1      69    1      89   -1
 10   -1      30    1      50    1      70   -1      90   -1
 11   -1      31   -1      51   -1      71    1      91   -1
 12    1      32    1      52    1      72    1      92    1
 13    1      33    1      53   -1      73    1      93    1
 14   -1      34    1      54    1      74    1      94    0
 15   -1      35    1      55   -1      75    1      95   -1
 16   -1      36   -1      56    1      76    1      96    1
 17    1      37    1      57   -1      77    1      97    1
 18    1      38    1      58   -1      78    1      98   -1
 19   -1      39   -1      59    1      79    1      99    1
 20    1      40   -1      60    1      80    1     100    1

Goals
After completion of Basic Statistics, you should:
- Be able to use statistical terminology
- Be able to graph data
- Be able to calculate mean, median, and standard deviation
- Be able to describe and interpret data
- Use the data to make better and quicker decisions
- Recognize and apply the normal distribution
What Is Statistics?
Collecting data, graphing data, and using that information to make decisions.

Class Exercise
Flip a coin 20 times and record the sequential results (H or T).
Flip: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# of Heads: ____
Tally table (one row for each # of Heads from 0 to 20): # of Heads | Tally | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency
Results: n = ____  X-bar = ____  median = ____  s = ____

Definition of:
n: Sample size, number of observations
Frequency: Number of entries in a cell
Relative Frequency: (Cell frequency) / (Sample size)
Cumulative Frequency: Number in that cell plus previous cells
The cumulative frequency in the last cell must be n, the sample size.
Cumulative Relative Frequency: (Cumulative frequency) / (Total sample size)
The cumulative relative frequency in the last cell must be 1.
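The four definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the class results in `heads_per_student` are made up for the example and are not part of the course material.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (cell, frequency, relative, cumulative, cumulative relative)."""
    n = len(values)                      # n: sample size
    counts = Counter(values)             # frequency: number of entries in each cell
    rows = []
    cumulative = 0
    for cell in sorted(counts):
        frequency = counts[cell]
        cumulative += frequency          # this cell plus all previous cells
        rows.append((cell, frequency, frequency / n, cumulative, cumulative / n))
    return rows

# Hypothetical class results: number of heads each student saw in 20 flips.
heads_per_student = [9, 11, 10, 8, 12, 10, 10, 7, 11, 10]

table = frequency_table(heads_per_student)
print("heads  freq  rel   cum  cum rel")
for cell, freq, rel, cum, cum_rel in table:
    print(f"{cell:>5}  {freq:>4}  {rel:.2f}  {cum:>3}  {cum_rel:.2f}")
```

The last row is a handy check on the arithmetic: its cumulative frequency equals the sample size and its cumulative relative frequency equals 1.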
Observations Vary
- Differences are expected
- Variation is due to:
    People
    Process
    Material
    Measuring instrument (gauge repeatability and reproducibility)
    Seasonality
    Others: _________ _________ _________
There is inherent variability in even a very good product. This can be detected if you have a measurement which is sensitive enough to detect that variation.

Variation often makes our tasks difficult. Statistics deals with variation.
- Descriptive statistics (describing variation with graphs and summary values)
- Inference (decision making and predicting in the presence of uncertainty)
- Design of experiments (methods of collecting data to improve the quality of decisions)
This course concentrates on descriptive statistics.
Profound Statement: Although variation makes our life difficult, through the magic and wonder of statistics, we can describe the variation, reduce the variation, and improve the quality of our decisions.
What is the Value of Graphing Data?
1. __________________________________
2. __________________________________
3. __________________________________
4. __________________________________

Information You Can Gather from Graphing a Set of Data
- A visual picture of data
- Shape of observations (bell-shaped?)
- Spread
- Most frequent values
- Highest and lowest value

Populations and Samples
- Population: all items of interest
- Sample: subset of data from the population
- Random sample: every item in the population has an equal chance of being in the sample
- We might be interested in the population of all range endcaps used in 1996, but we are basing our decision on a sample of n = 30 range endcaps.
Populations and Samples
We measure a sample in order to study the population.
- We use different symbols for population and sample values.

                        Population     Sample
Average                 μ (mu)         X-bar or μ̂ (mu hat)
Standard Deviation      σ (sigma)      s or σ̂ (sigma hat)
(spread of the data)
Variance                σ²             s² or σ̂²

Sample: Mean (Average) Formula
mean = X-bar

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

- The mean is influenced by extreme values
- The mean is the most commonly used measure of the center of the distribution

X-bar Calculation
Use the data from the coin chart.
X_1 = ____  X_2 = ____  X_3 = ____  X_4 = ____  X_5 = ____
X_6 = ____  X_7 = ____  X_8 = ____  X_9 = ____  X_10 = ____
X_11 = ____  X_12 = ____  X_13 = ____  X_14 = ____  X_15 = ____
X_16 = ____  X_17 = ____  X_18 = ____  X_19 = ____  X_20 = ____
n = ____
Sum of X_i = ____
X-bar = (Sum of X_i) / n = ____

Median
- The middle value
- Not influenced by extreme values
- Applicable to income and housing prices because of the extreme values
Example: $60,000 $80,000 $100,000 $120,000 $1,640,000
Median = $100,000
Average = $400,000
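The salary example can be reproduced with Python's standard library. The second list, where the extreme value is swapped for $140,000, is a hypothetical addition used only to show how much a single extreme value moves the mean while leaving the median alone.

```python
import statistics

# The five values from the example above.
salaries = [60_000, 80_000, 100_000, 120_000, 1_640_000]

mean_salary = sum(salaries) / len(salaries)      # sum of X_i divided by n
median_salary = statistics.median(salaries)      # middle ordered value

print(f"mean   = ${mean_salary:,.0f}")
print(f"median = ${median_salary:,.0f}")

# Hypothetical variation: replace the extreme value with a typical one.
salaries_no_extreme = [60_000, 80_000, 100_000, 120_000, 140_000]
mean_no_extreme = sum(salaries_no_extreme) / len(salaries_no_extreme)
median_no_extreme = statistics.median(salaries_no_extreme)

print(f"mean without the extreme value   = ${mean_no_extreme:,.0f}")
print(f"median without the extreme value = ${median_no_extreme:,.0f}")
```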
Calculating the Median
Place the values in order and select the middle value.

Odd number of observations
The median is the (n + 1)/2 ordered value.
Example:
Given the numbers
60,000 80,000 100,000 120,000 1,640,000
n = 5
(n + 1)/2 = (5 + 1)/2 = 6/2 = 3rd ordered value
The 3rd ordered value is 100,000, so median = 100,000

Even number of observations
The median is the average of the two middle values: the n/2 and (n/2 + 1) ordered values.
Example:
Given the numbers
60,000 80,000 100,000 120,000 160,000 1,640,000
n = 6
n/2 and (n/2 + 1) ordered values = 6/2 and (6/2 + 1) = 3rd and 4th ordered values
median = (100,000 + 120,000)/2 = 110,000

Median Calculation
- Coin data, n = _____________
- List the values in order
- Even or odd number of data
- If odd, calculate which ordered observation is the median: (n + 1)/2
- If even, calculate which ordered observations are used for the median: n/2 and (n/2 + 1)
- Find the middle number(s)
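The odd/even procedure above translates directly into code. The sketch below follows the slide's 1-based ordered positions; the function name `median_by_position` is chosen for this example only.

```python
def median_by_position(values):
    """Median using the (n + 1)/2 rule for odd n and the n/2, n/2 + 1 rule for even n."""
    ordered = sorted(values)             # place the values in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle value
        return ordered[position - 1]
    lower = n // 2                       # 1-based positions of the two middle values
    upper = lower + 1
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

odd_example = [60_000, 80_000, 100_000, 120_000, 1_640_000]
even_example = [60_000, 80_000, 100_000, 120_000, 160_000, 1_640_000]

print(median_by_position(odd_example))   # 3rd ordered value
print(median_by_position(even_example))  # average of the 3rd and 4th ordered values
```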
Median Calculation worksheet (coin data): ordered positions 1 through 20, to be filled in with the ordered values.

Thought Provoker
Consider the incomes of all GEA employees:
- Would the mean and the median be the same?
- Which would be greater? Why?
- Which measure would be appropriate to use?
- Would any measure of the center tell you all you want to know about incomes of GEA employees?

Range
Range is a measurement of spread, but it uses only two observations.
Range = Largest value minus smallest value
Coin data: Largest = ____  Smallest = ____  Range = ____

Standard Deviation and Variance
- The standard deviation is a measure of the spread of the data
    Population Standard Deviation = σ (sigma)
    Sample Standard Deviation = s
- The variance is the square of the standard deviation
    Population Variance = σ²
    Sample Variance = s²

The Standard Deviation Formula

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

- Variance and standard deviation use all the observations to determine the spread
- The range and standard deviation are both sensitive to extreme values
- The standard deviation is useful when the distribution is normal
- We divide by (n - 1) so that the sample variance s² is an unbiased estimate of the population variance
- Dividing by n tends to give a low estimate
- The square of a negative number is positive. Example: (-5)² = 25
- If the spread of numbers is large, then s will be large
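As a rough sketch of the range and standard deviation formulas above, the following Python computes each quantity step by step. The data values are hypothetical and chosen so the arithmetic is easy to follow by hand; the function names are likewise illustrative.

```python
import math

def sample_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """The standard deviation s is the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("range    =", sample_range(data))
print("variance =", round(sample_variance(data), 4))
print("s        =", round(sample_std_dev(data), 4))
```

Dividing the same sum of squares by n instead of (n - 1) would give 32/8 = 4 rather than 32/7, which illustrates the "low estimate" noted in the bullets.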
Calculating the Standard Deviation
Worksheet columns (one row for each i from 1 to 20): i | X_i | (X_i - X-bar) | (X_i - X-bar)²

$$\bar{X} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

X-bar = ____

$$s = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

s = ____

[Figure: histogram titled "Results of a computer experiment"; vertical axis: Frequency]
4. Graph the results.

The Normal Distribution
- Many data sets follow a normal distribution (Gaussian, bell-shaped curve).
- Many data sets do not follow a normal distribution, especially when the spec is one-sided and bounded by zero (example: average # of delivery days).
- The Central Limit Theorem states that the sum or average of many independent values tends to follow an approximately normal distribution. In the coin flipping example, we are adding the results of 20 individual trials.
- Always graph the data to see if a normal distribution is reasonable.
- The assumption of a normal distribution is critical when using the normal tables (Z tables).
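The Central Limit Theorem bullet can be illustrated with a small simulation of the coin exercise. This is a sketch under stated assumptions: a fair coin, 2,000 simulated students, and a fixed random seed are all choices made for the example, not values from the course.

```python
import random
from collections import Counter

def simulate_heads(num_students, flips_per_student=20, seed=1998):
    """Each simulated student flips a fair coin and reports the number of heads."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_student))
        for _ in range(num_students)
    ]

heads = simulate_heads(2000)
tally = Counter(heads)

# Text histogram: one star per 10 students in each cell.
for count in range(21):
    print(f"{count:>2} {'*' * (tally[count] // 10)}")
```

Because each result is the sum of 20 individual trials, the printed tally should look roughly bell-shaped and centered near 10 heads.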
Forming the Normal Curve
[Figure: histogram bars building up into a normal curve; horizontal axis: Units of Measure]

Normal Curve
[Figure: normal curve with the horizontal axis marked from -6σ to +6σ]
- Within ±1σ: 68.3%
- Within ±2σ: 95.4%
- Within ±3σ: 99.7%
- Within ±6σ: 99.9999998%

Different Distributions
Comparison of Distributions. Sketch in the means and medians on each distribution.
[Figure: three histograms (Frequency versus C1, C2, and C3) labeled Symmetrical Distribution, Negative Skew, and Positive Skew; the tail of each skewed distribution is marked]
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf

The Normal Distribution
- Property 1: A normal distribution can be described completely by knowing only the mean and the standard deviation
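Because a normal distribution is fully described by its mean and standard deviation, the percentages on the Normal Curve slide can be computed rather than looked up. The sketch below uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k / √2); the function name is illustrative.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"within +/-{k} sigma: {coverage_within(k) * 100:.7f}%")
```

The result does not depend on the particular mean or standard deviation, which is why a single Z table serves every normal distribution.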
[Figure: three normal curves labeled Distribution One, Distribution Two, and Distribution Three, sharing the same Mean]
What is the difference among these three normal distributions?
Note: Means for all three distributions are equal.
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf

The Normal Distribution
- The normal distribution is a distribution of data which has certain consistent properties
- These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained
- Many natural phenomena and man-made processes are distributed normally, or can be represented as normally distributed
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf

The Standard Deviation
[Figure: normal curve with labels 1σ, 3σ, T, USL, and p(d)]
Legend:
- Upper Specification Limit (USL)
- Target Specification (T)
- Lower Specification Limit (LSL)
- Mean of the distribution (μ) or X-bar
- Standard Deviation of the distribution (σ) or s
The distance between the point of inflection and the mean constitutes the size of a standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.
Note: s = the Std Deviation of a Sample; σ = the Std Deviation of a Population
Used With Permission 6 Sigma Academy Inc. 1995

Six Sigma
[Figure: normal curve with labels The Mean, 1σ, 6σ, and Upper Specification Limit (USL)]
If six standard deviations can be fit between the mean and the specification limit, we would say the process has six sigma capability.

Six Sigma
Example for on-time delivery
- μ = 8 AM on Requested Date
- USL = 4 PM on Requested Date
- LSL = 8 AM on Day Prior to Requested Date
- 6σ = 8 hours
- 3σ = 4 hours
- 1σ = 1.33 hours

Process Capability
[Figure: X-Bar Chart for Process B; horizontal axis: Sample Number; vertical axis: Sample Mean; X-bar = 70.98, UCL = 77.27, LCL = 64.70]
Measures of process capability show us:
- The results (output) of our process over time
- When something has changed in our process
- When our process may be statistically out of control
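The on-time delivery example from the Six Sigma slide can be checked numerically by counting how many standard deviations fit between the mean and each specification limit. In this sketch, times are expressed as hours relative to midnight on the requested date; that encoding and the function name `sigma_capability` are assumptions made for illustration.

```python
import math

def sigma_capability(mean, spec_limit, std_dev):
    """Number of standard deviations that fit between the mean and a spec limit."""
    return abs(spec_limit - mean) / std_dev

# Hours relative to midnight on the requested date (an assumed encoding).
mean_hour = 8.0            # 8 AM on the requested date
usl_hour = 16.0            # 4 PM on the requested date
lsl_hour = 8.0 - 24.0      # 8 AM on the day prior
std_dev_hours = 8.0 / 6.0  # 6 sigma = 8 hours, so 1 sigma is about 1.33 hours

upper_capability = sigma_capability(mean_hour, usl_hour, std_dev_hours)
lower_capability = sigma_capability(mean_hour, lsl_hour, std_dev_hours)

print(f"sigma to USL: {upper_capability:.2f}")
print(f"sigma to LSL: {lower_capability:.2f}")
```

The upper limit is the nearer one here, so it is the limit that determines the six sigma capability quoted in the example.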
All Processes Have Variation
[Figure: a measurement fluctuating over time; axes: Measurement versus Time]
- All repetitive activities of a process have a certain amount of fluctuation
- Input, process and output measures will fluctuate
- This fluctuation is called variation

Sources Of Variation
[Figure: inputs feeding into the PROCESS]
- Machines
- Materials
- Methods
- Measurement
- Mother Nature
- People

Variation
- All variation is caused
- There are two major classifications of causes
    Common cause: normal, day-to-day, predictable variation in a process
    Special cause: unusual circumstances generating unpredictable variation
Variation is the voice of the process: learn to listen and understand it.

Common and Special Causes
Common Causes
- Common to all occasions and places
- Degree of presence varies
- Each cause contributes a small effect to the variation in results
- Variation due to common cause will almost always give results that are in statistical control
Special Causes
- Temporary or local; specific
- May come and go sporadically
- Evidence of the lack of statistical control is a signal that a special cause is likely to have occurred
Variation
[Figure: X-Bar Chart for Process A; horizontal axis: Sample Number; vertical axis: Sample Mean; X-bar = 70.91, UCL = 77.20, LCL = 64.62]
[Figure: X-Bar Chart for Process B; horizontal axis: Sample Number; vertical axis: Sample Mean; X-bar = 70.98, UCL = 77.27, LCL = 64.70; annotation: Special Causes]
- While every process displays variation, some processes display controlled variation, while other processes display uncontrolled variation (Walter Shewhart).
- Controlled variation is characterized by a stable and consistent pattern of variation over time. Associated with common causes.
- Uncontrolled variation is characterized by variation that changes over time. Associated with special causes.
- Process A shows controlled variation.
- Process B shows uncontrolled variation.
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf

Can We Tolerate Variability?
- There will always be variability present in any process
- We can tolerate variability if:
    The total variability of the output is relatively small compared to the process specifications and the process is on target
    The process is stable over time
[Figure: two cost curves, "Traditional: Goal Post Mentality" and "New"; vertical axis: Cost; labels: LSL, Nom, USL, Acceptable]
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf

Process Stability
- Determine if process is stable
    If process is not stable, identify and remove causes of instability
- Determine the location of the process mean. Is it on target?
    If not, identify the variables which affect the mean and determine optimal settings to achieve target value
- Estimate the magnitude of the total variability. Is it acceptable with respect to the customer requirements (spec limits)?
    If not, identify the sources of the variability and eliminate or reduce their influence on the process
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf
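The first step above, determining whether the process is stable, is often started by checking sample means against the control limits of an X-bar chart. The sketch below applies only the simplest rule (a point beyond UCL or LCL). The limits are the ones shown for Process B; the sample means are hypothetical, since the charted values are not available in the text.

```python
def points_outside_limits(sample_means, lcl, ucl):
    """Return the 1-based sample numbers whose mean falls outside the control limits."""
    return [
        number
        for number, mean in enumerate(sample_means, start=1)
        if mean < lcl or mean > ucl
    ]

# Control limits from the X-Bar Chart for Process B.
LCL, UCL = 64.70, 77.27

# Hypothetical sample means, made up for illustration only.
sample_means = [70.1, 72.4, 69.8, 78.3, 71.0, 63.9, 70.6]

flagged = points_outside_limits(sample_means, LCL, UCL)
print("samples signaling a possible special cause:", flagged)
```

A flagged point is a signal to look for a special cause, not proof of one; an empty list is likewise only weak evidence of stability, since other patterns (runs, trends) are not checked here.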
Centering and Spread
From a statistical point of view, there are only two problems:
- It has too much spread
- It needs centering
[Figure: two scatter patterns of x marks, one widely spread and one off center]
Let's take a look at both.

Visualizing the Process Dynamics
Is the Process Stable?
[Figure: distributions at Time 1, Time 2, Time 3, and Time 4 plotted against LSL, T, and USL]
- Inherent Capability of the Process, also called short-term capability
- Sustained Capability of the Process, also called long-term capability
General Assumptions: Over time, a typical process will shift and drift by approx. 1.5σ
Used With Permission 6 Sigma Academy Inc. 1995

Is the Variability Acceptable to the Customer?
[Figure: Poor Process Capability: a wide distribution with a very high probability of defects beyond both LSL and USL]
[Figure: Excellent Process Capability: a narrow distribution with a very low probability of defects beyond both LSL and USL]
Note: Specification limits (LSL and USL) must be defined using customer input!
Used With Permission 6 Sigma Academy Inc. 1995

[Figure: normal curve with a Spec Limit; the tail area beyond the limit is the Probability of a Defect]

The Normal Curve and Capability
[Figure: Poor Design Capability: high probability of defects beyond both LSL and USL]
[Figure: low probability of defects beyond both LSL and USL]
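The DPU and DPMO formulas from the Calculating Sigma slide, together with the 1.5σ shift assumption above, can be sketched as follows. The conversion from sigma level to long-term DPMO uses the one-sided normal tail beyond (sigma level - 1.5), which is a common Six Sigma convention and an assumption here rather than something the slides spell out. Function names and the example counts are illustrative.

```python
from statistics import NormalDist

def dpu(defects, units):
    """Defects per unit."""
    return defects / units

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

def long_term_dpmo(sigma_level, shift=1.5):
    """One-sided tail beyond the spec limit after the mean drifts by `shift` sigma."""
    return NormalDist().cdf(-(sigma_level - shift)) * 1_000_000

# Hypothetical counts: 5 defects in 100 units, 10 opportunities per unit.
print("DPU  =", dpu(5, 100))
print("DPMO =", dpmo(5, 100, 10))

for level in (3, 4, 5, 6):
    print(f"{level} sigma -> about {long_term_dpmo(level):,.1f} DPMO (with 1.5 sigma shift)")
```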
Understanding the Differences
- 3σ Capability: Historical Standard
- 4σ Capability: Current Standard
- 6σ Capability: New Standard

3σ
- Area: Floor space of a small hardware store
- Spelling: 1.5 misspelled words per page in a book
- Money: $2.7 million indebtedness per $1 billion in assets
- Time: 3 1/2 months per century
- Distance: Coast-to-coast trip
4σ
- Area: Floor space of a typical living room
- Spelling: 1 misspelled word per 30 pages in a book
- Money: $63,000 indebtedness per $1 billion in assets
- Time: 2 1/2 days per century
- Distance: 45 minutes of freeway driving
5σ
- Area: Size of the bottom of your telephone
- Spelling: 1 misspelled word in a set of encyclopedias
- Money: $570 indebtedness per $1 billion in assets
- Time: 30 minutes per century
- Distance: A trip to the local gas station
6σ
- Area: Size of a typical diamond
- Spelling: 1 misspelled word in all of the books contained in a small library
- Money: $2 indebtedness per $1 billion in assets
- Time: 6 seconds per century
- Distance: 4 steps in any direction

Group Project
Create a histogram of the height of class attendees. Calculate mean and standard deviation.
Worksheet columns (one row for each i from 1 to 60): i | X_i
- Calculate X-bar (mean): X-bar = (Sum of X_i) / n = _______________
- Calculate s (std. deviation): s = σ̂ = _______________

[Figure: blank histogram grid; horizontal axis: Height; vertical axis: Frequency]