2/10/2004
Table of contents
1. Introduction
2. Problem formulation
3. Full factorial designs
4. Analysis of full factorial designs
5. Analysis of full factorial designs. II. Causes of bad models
6. Experimental objective: Screening
7. Post-screening actions
8. Experimental objective: Optimization
9. Experimental objective: Robustness testing
10. Conclusions
11. Additional topics
    D-optimal design
    Blocking the Experimental Plan
    Mixture design
    Other RSM designs
    Multilevel qualitative factors
    The Taguchi approach to robust design
    Simultaneous optimization of several responses
    Partial least squares projections to latent structures
    Design in Latent Variables
12. Mixture design One Day Add-On
13. Exercises
    Getting started: ByHand, CakeMix
    Screening, Full Fac: Pain, Tablets, Protein Spray-Drying
    Screening, Frac Fac: Pilot Plant, Reporter Gene Assay, Chromshper_B
    Optimization: Chiral Separation, Metabolism, Willge, DrugD
    Robustness Testing: Nonafact, HPLC Robustness
    Robust Design: CakeTaguchi, LoafVolume
    D-optimal design: Model Updating
    Blocking the Experimental Plan: Blocking
    Mixture design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp
14. References
Contents
Why/How DOE and where DOE is used
Three primary experimental objectives
Three General Examples
The intuitive approach to experimental work (COST)
A better approach (DOE)
Overview of steps in DOE (using CakeMix)
Benefits of DOE
Summary
Optimization
How shall we find the optimum? Is there a unique optimum, or is a compromise necessary to meet conflicting demands on the responses?
Robustness testing
How shall we adjust our factors to guarantee robustness? Do we have to change our product specifications prior to claiming robustness?
[Figures: plots with axes labelled Treatment and Light; a chromatogram with the time axis in min; plots of X2 against X1]
If not COST, what do we do instead? The solution is to construct a carefully prepared set of representative experiments, in which all relevant factors are varied simultaneously
[Figure: experimental region spanned by X1 (200 to 400), X2 (50 to 100) and X3 (50 to 100), with the standard point at 300/75/75]
2. Define Response(s)
4. Make Model
5. Interpret Model
[Figures: coefficient plot for Taste with terms Sh, Egg and Sh*Egg (N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95); response contour plot at Flour = 400 g]
Variability (Problem 3)
Every measurement and experiment is influenced by noise
Under stable conditions every process and system varies around its mean and stays within control limits, usually ±3 SD
Reacting to noise
Consider one experiment where the temperature is changed from 35 °C to 40 °C
The response change, from slightly below 93% to close to 96%, lies within the variability interval found when replicating
[Figure: ten measurements of yield under identical conditions; yield axis from 92 to 98]
Focusing on effects
COST often implies excess consumption of resources because the experiments are distributed in an informationally inefficient way
DOE provides a better spread of the trials, which gives averaging possibilities and hence more precise effect estimates
[Figure: distribution of experiments in the X1, X2, X3 factor space, response Y1]
Uncertainty of coefficient
[Figure: coefficient plot with confidence intervals (N=11, DF=4, R2=0.995, Q2=0.874)]
Consequence of variability
Two points (experiments) close to each other leave the slope of the line poorly determined
Two points far away from each other make the slope well determined
If a center-point is put in between, it is possible to explore whether our model is adequate: should it be linear or nonlinear?
Screening
Balanced fraction of hypercube
Optimization
Benefits of DOE
Organized approach which connects experiments in a rational manner
More useful information is obtained (the influence of all factors together)
More precise information is acquired in fewer experiments
Results are evaluated in the light of variability
Support for decision-making: map of the system (response contour plot)
Contents
Introduction to problem formulation
Selection of experimental objective
Definition of factors
Definition of responses
Selection of regression model
The model concept
Generation of experimental design
Creation of worksheet
Summary
[Figure: a system or process (examples: Spray Drying Machine, HPLC Equipment) with Factors (X) and Responses (Y)]
Screening, optimization and robustness testing are the most frequently used experimental objectives
The experimental objective tells which kind of investigation one wants to do
One should ask: why is the experiment done, for what purpose, and what is the desired result?
PF - 1a. Familiarization
Useful when one is facing an entirely new type of application or equipment
Spend a limited portion of the available resources, say, 10%
Simple designs are used
Goal: To verify that similar results are obtained for the replicated center-points, and that different results are found in the corners
[Figure: simple design in the Factor 1 / Factor 2 plane]
PF - 1b. Screening
Useful when one wants to find out a little about many factors
Goal: To uncover the important factors and their appropriate ranges
Is the factor/response relationship linear or non-linear?
Pareto principle (80/20 rule): with 25 factors, approximately 5 have an effect
[Figure: results before and after screening, with the noise level indicated]
[Figure: interesting direction in the x1/x2 plane]
PF - 1d. Optimization
Useful when detailed knowledge about the factor influences is needed
We do not ask if a factor is relevant (screening), but how (optimization)
Goal: To identify the factor combination at which the desired response profile is fulfilled (or almost so)
PF - 2. Specification of factors
Categorization of factors:
Controlled & Uncontrolled
Process & Mixture (Formulation)
Quantitative & Qualitative
Examples (MODDE):
Quantitative: Temperature, 10 °C to 50 °C
Quant. Multilevel: Speed, 200/300/400/500 rpm
Qualitative: Catalyst, Pd/Pt/Mo
Formulation: Strawberries 0.3 - 0.4, Milk 0.3 - 0.4, Ice cream 0.3 - 0.4
Filler: Solvent in a mixture, for which the effect is uninteresting
Transformation of factors
A factor can be transformed
Examples: log; neglog; logit; square root; fourth root
When?
Variables with a natural zero
Variables where the max/min ratio exceeds 10
Types of variables: concentrations, volumes, levels
[Figure: y plotted against x and against log x (Modde 3.0 by Umetri AB)]
Constraints of factors
An irregular experimental region may be defined by specifying linear constraints of factors
[Figure: Investigation itdoe_constraint, raw data plot with experiment number labels, pH against Temp (120 to 160)]
Uncontrolled factors
These are factors that cannot be controlled, but which still may influence the results (responses)
Examples: Ambient humidity and temperature
Record values of uncontrolled factors, and include these in the data analysis
Use randomization of experiments
PF - 3. Specification of responses
Choose responses that are relevant; many responses are often necessary (Regular, Derived, Linked)
Continuous:
breakage of weld
soot release when running a truck engine
resolution of two adjacent peaks in liquid chromatography
cost of material used in production (Derived response)
Discrete:
categorical answers of yes/no type
the cake tasted good/did not taste good
Transformation of responses
Responses may be transformed
A non-linear relationship between y and x may be linearized by a suitable transformation of y
Examples: no transf.; log; neglog; logit; square root; fourth root
Transform after executing design
[Figure: y and log y plotted against x]
PF - 4. Selection of model
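We distinguish between three main types of polynomial models:
linear: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
quadratic: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
Interaction: Screening
Quadratic: Optimization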
[Figure: response contour plot of Taste against Shortening (50 to 100) and Eggpowder (50 to 90) at Flour = 400, with contour levels from 3.90 to 5.70]
Toy train
Map of Iceland
DOE is concerned with semi-empirical modeling using linear, interaction, quadratic, or cubic models
PF - 5. Generation of design
The chosen model and the design to be generated are intimately linked
MODDE considers the number of factors, their levels and nature (quantitative, qualitative, ...), and the selected experimental objective, and then recommends a design that is tailored to the researcher's problem
PF - 6. Creation of worksheet
An example worksheet with extra information:
Run order
Constant factor
Uncontrollable factor
Inclusion of experiments
Are the proposed experiments reasonable? Will they fulfil the goals?
(i) selection of experimental objective
(ii) definition of factors
(iii) definition of responses
(iv) selection of regression model
(v) generation of experimental design
(vi) creation of worksheet
interaction:
geometry: twisted plane
objective: screening
design: full or fractional factorial designs
quadratic:
geometry: curved plane
objective: optimization (RSM in MODDE)
design: composite designs
Contents
Introduction to full factorial designs
Construction and geometry of the $2^2$, $2^3$, $2^4$ and $2^5$ designs
Pros and cons of full factorial designs
Main effect of a factor
By-hand methods for computing effects
Interaction effects
Plotting of interaction effects
Computation of effects using least squares analysis
Relationship between effects and coefficients
How to express regression coefficients
Summary
Full factorial designs are regularly used with 2 - 4 factors
In this chapter we consider two-level full factorial designs
Notation
To perform a two-level full factorial design, the investigator has to assign a low level and a high level to each factor
Notation    Low    High    Center
Standard    -      +       0
Extended    -1     +1      0
Examples: Temp, pH, Cat. (A, B)
For a simple system, it may be convenient to display the coded unit together with original factor unit
y1 = yield
Factors, original unit
Exp. no    x1      x2
1          1       25
2          1.5     25
3          1       100
4          1.5     100
5          1.25    62.5
6          1.25    62.5
7          1.25    62.5
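The link between original and coded units in this worksheet can be sketched as follows. This is a minimal illustration, assuming the usual convention that the low level maps to -1, the high level to +1 and the center-point to 0; the function names `to_coded` and `to_original` are hypothetical and not taken from MODDE.

```python
def to_coded(value, low, high):
    """Map an original-unit setting onto the coded scale (-1 = low, +1 = high)."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


def to_original(coded, low, high):
    """Inverse mapping: coded value back to the original unit."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range


# Settings from the worksheet: x1 ranges from 1 to 1.5, x2 from 25 to 100.
worksheet = [(1, 25), (1.5, 25), (1, 100), (1.5, 100),
             (1.25, 62.5), (1.25, 62.5), (1.25, 62.5)]

coded = [(to_coded(x1, 1.0, 1.5), to_coded(x2, 25.0, 100.0))
         for x1, x2 in worksheet]

for exp_no, row in enumerate(coded, start=1):
    print(exp_no, row)
```

The four corner experiments land on the (-1, +1) combinations and experiments 5 to 7 on (0, 0), which is the pattern the notation table describes.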
[Figure: the CakeMix design cube with the standard point at 300/75/75]
Experimental matrix
Exp No    Flour    Shortening    Egg    Taste
1         200      50            50     3.52
2         400      50            50     3.66
3         200      100           50     4.74
4         400      100           50     5.2
5         200      50            100    5.38
6         400      50            100    5.9
7         200      100           100    4.36
8         400      100           100    4.86
9         300      75            75     4.68
10        300      75            75     4.73
11        300      75            75     4.61
Full factorial designs are realistic choices with 2-4 factors; with 5 or more factors fractional factorial designs are recommended
Number of factors:    2     3    4    5     6     7     8     9     10
Number of runs:       --    4    8    16    16    16    16    32    32
[Figure: main effect plot of Taste against Flour (x1), 200 to 400 (N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95)]
Factors in original unit, response y3 in %
Exp. no    x1      x2      y3
1          1.0     25      80.4
2          1.5     25      72.4
3          1.0     100     94.4
4          1.5     100     90.6
5          1.25    62.5    84.5
6          1.25    62.5    85.2
7          1.25    62.5    83.8
[Figure: the design square with X1 (formic acid/enamine, 1.0 to 1.5) and X2 (temperature, 25 to 100); corner experiments 1 to 4 and center-points 5, 6, 7; response Y3 (desired product)]
Main effect of formic acid/enamine:
$\Delta_{1,2} = 72.4 - 80.4 = -8$
$\Delta_{3,4} = 90.6 - 94.4 = -3.8$
$(\Delta_{1,2} + \Delta_{3,4})/2 = (-8 + (-3.8))/2 = -5.9$
Main effect of temperature:
$\Delta_{1,3} = 94.4 - 80.4 = 14$
$\Delta_{2,4} = 90.6 - 72.4 = 18.2$
$(\Delta_{1,3} + \Delta_{2,4})/2 = (14 + 18.2)/2 = 16.1$
[Figure: Investigation Byhand (MLR), main effect plot for x2, response y3 (N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728, Conf. lev.=0.95)]
Calculations refer to the computational matrix:
1st column gives the mean: (+80.4+72.4+94.4+90.6+84.5+85.2+83.8)/7 = 84.5
2nd column gives the molar ratio, x1, main effect: (-80.4+72.4-94.4+90.6)/2 = -5.9
3rd column gives the temperature, x2, main effect: (-80.4-72.4+94.4+90.6)/2 = 16.1
4th column gives the x1*x2 two-factor interaction: (+80.4-72.4-94.4+90.6)/2 = 2.1
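The by-hand calculation with the computational matrix can be reproduced in a few lines. This is only a sketch of the arithmetic shown above; the helper name `effect` is hypothetical, and the sign columns are written out in coded units for the four corner experiments.

```python
# Corner experiments of the 2^2 design in coded units (x1, x2) with response y3.
runs = [(-1, -1, 80.4), (+1, -1, 72.4), (-1, +1, 94.4), (+1, +1, 90.6)]
center_points = [84.5, 85.2, 83.8]

x1 = [r[0] for r in runs]
x2 = [r[1] for r in runs]
y3 = [r[2] for r in runs]
x1x2 = [a * b for a, b in zip(x1, x2)]  # interaction column of the computational matrix


def effect(signs, responses):
    """Signed sum of the responses divided by half the number of corner runs."""
    return sum(s * y for s, y in zip(signs, responses)) / (len(responses) / 2)


mean = (sum(y3) + sum(center_points)) / (len(y3) + len(center_points))
main_x1 = effect(x1, y3)
main_x2 = effect(x2, y3)
interaction = effect(x1x2, y3)

print("mean        ", round(mean, 1))
print("x1 effect   ", round(main_x1, 1))
print("x2 effect   ", round(main_x2, 1))
print("x1*x2 effect", round(interaction, 1))
```

Only the corner experiments enter the effect contrasts; the center-points contribute to the mean.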
[Figure: interaction plot for x1*x2, response y3, with lines for x1 (low) / x1 (high) and x2 (low) / x2 (high) (N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728)]
Mild interaction
Strong interaction
[Figures: interaction plots, including Investigation Cakemix (MLR), interaction plot for Sh*Egg, response Taste, against Shortening (50 to 100) with lines for Egg (low) and Egg (high) (N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974), and two further examples (response Width against Power with lines for Sp; a response against X5 with lines for X2)]
An important consequence of least squares analysis is that the outcome is not main and interaction effect estimates, but a regression model consisting of coefficients reflecting the influence of the factors (see below)
Effect
Indicates response change when factor changes from -1 to +1
Effects are sorted according to abs(size)
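Because an effect describes the change from -1 to +1 (two coded units) while a regression coefficient describes the change per coded unit, each effect is twice the corresponding coefficient. The sketch below illustrates this with the CakeMix data. It relies on the assumption that the coded columns are mutually orthogonal, so each least squares coefficient reduces to a simple ratio; a general design would need a full least squares solver. The short names Fl, Sh and Egg follow the plot labels.

```python
from itertools import combinations

# CakeMix worksheet in coded units (Flour, Shortening, Egg) and the Taste response.
design = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
          (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1),
          (0, 0, 0), (0, 0, 0), (0, 0, 0)]
taste = [3.52, 3.66, 4.74, 5.20, 5.38, 5.90, 4.36, 4.86, 4.68, 4.73, 4.61]
names = ["Fl", "Sh", "Egg"]

# Main-effect columns and all two-factor interaction columns.
columns = {n: [row[i] for row in design] for i, n in enumerate(names)}
for a, b in combinations(names, 2):
    columns[a + "*" + b] = [p * q for p, q in zip(columns[a], columns[b])]

# Orthogonal columns: coefficient = (column . y) / (column . column).
coefficients = {n: sum(x * y for x, y in zip(col, taste)) / sum(x * x for x in col)
                for n, col in columns.items()}
effects = {n: 2 * b for n, b in coefficients.items()}

# Sort by absolute size, as in the effects plot.
order = sorted(effects, key=lambda n: abs(effects[n]), reverse=True)
for n in order:
    print(f"{n:7s} coefficient = {coefficients[n]: .4f}   effect = {effects[n]: .4f}")
```

With these data the Sh*Egg interaction comes out as the largest term in absolute size, followed by Egg and Fl.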
[Figure: effects plot, example CakeMix, with terms Fl, Sh, Egg, Fl*Sh, Fl*Egg and Sh*Egg (N=11, DF=4, R2=0.995, Q2=0.874)]
Contents
Introduction Minimum level of data analysis
Examples: CakeMix & ByHand
Introduction
Analysis of DOE-data consists of three primary stages:
evaluation of raw data
get a general appraisal of regularities and peculiarities in the data
understand and/or remove anomalies
[Figure: replicate plots (response against replicate index) illustrating a good and a bad case]
Regression analysis - R2
Goodness of fit, R2 = 1 - SSres/SStot.corr.
measures how well we can reproduce current runs
varies between 0 and 1
1 = perfect model (all points on line)
easy to get arbitrarily close to 1
provides basis for raw and standardized residuals in N-plot
[Figure: Investigation cakemix (MLR), observed against predicted Taste (N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768)]
Regression analysis - Q2
Goodness of prediction, Q2 = 1 - SSpress/SStot.corr.
uncovers how well we can predict new experiments
varies between -∞ and 1
better indicator of model usefulness
Q2 > 0.5 GOOD
Q2 > 0.9 EXCELLENT
provides basis for deleted studentized residuals in N-plot
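The two definitions can be tried out on the CakeMix data. The sketch below is not MODDE's implementation; it assumes that SSpress is the leave-one-out prediction error sum of squares, computed with the standard identity e_i/(1 - h_ii), and it exploits that the coded design columns are mutually orthogonal. The function name `fit_summary` is hypothetical.

```python
from itertools import product


def fit_summary(columns, y):
    """Return (R2, Q2) for a model whose columns (constant included) are mutually orthogonal."""
    n = len(y)
    norms = [sum(x * x for x in col) for col in columns]
    coefs = [sum(x * v for x, v in zip(col, y)) / nrm for col, nrm in zip(columns, norms)]
    fitted = [sum(b * col[i] for b, col in zip(coefs, columns)) for i in range(n)]
    resid = [v - f for v, f in zip(y, fitted)]
    # Diagonal of the hat matrix; this simple form holds for orthogonal columns only.
    leverage = [sum(col[i] ** 2 / nrm for col, nrm in zip(columns, norms)) for i in range(n)]
    mean = sum(y) / n
    ss_tot = sum((v - mean) ** 2 for v in y)
    ss_res = sum(e * e for e in resid)
    ss_press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))
    return 1 - ss_res / ss_tot, 1 - ss_press / ss_tot


# CakeMix: eight corner runs in standard order plus three center-points.
corners = [(x1, x2, x3) for x3, x2, x1 in product([-1, 1], repeat=3)]
design = corners + [(0, 0, 0)] * 3
taste = [3.52, 3.66, 4.74, 5.20, 5.38, 5.90, 4.36, 4.86, 4.68, 4.73, 4.61]

const = [1] * len(design)
fl = [r[0] for r in design]
sh = [r[1] for r in design]
egg = [r[2] for r in design]


def times(a, b):
    return [p * q for p, q in zip(a, b)]


full_model = [const, fl, sh, egg, times(fl, sh), times(fl, egg), times(sh, egg)]
reduced_model = [const, fl, sh, egg, times(sh, egg)]

r2_full, q2_full = fit_summary(full_model, taste)
r2_red, q2_red = fit_summary(reduced_model, taste)
print("full interaction model:", round(r2_full, 3), round(q2_full, 3))
print("reduced model:         ", round(r2_red, 3), round(q2_red, 3))
```

Under these assumptions the full interaction model gives R2 = 0.995 and Q2 = 0.874, and the model with only Sh*Egg kept gives R2 = 0.988 and Q2 = 0.937, in line with the values printed under the CakeMix plots. Dropping the two small interactions lowers R2 slightly but raises Q2.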
[Figure: Investigation cakemix (MLR), observed against predicted Taste with one experiment marked X (N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768)]
Model Validity < 0.25 indicates significant lack of fit (i.e., model imperfection)
Only available when replicated experiments have been performed
[Figure: observed against predicted values for an example with significant lack-of-fit (N=7, DF=3, R2=0.698, Q2=-10.499, R2 Adj.=0.396, RSD=5.0485)]
[Figures: summary of fit and coefficient plots for Taste with N=11, DF=4 (R2 Adj.=0.988, RSD=0.0768, Cond. no.=1.1726, Conf. lev.=0.95) and with N=11, DF=6 (R2=0.988, Q2=0.937, Cond. no.=1.1726); replicate plot and summary of fit for the ByHand responses y1, y2 and y3 (N=7, DF=3, Cond. no.=1.3229)]
All factorial designs, without center-points, have condition number 1
Compute condition number before and after altering the design
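A small sketch of how the condition number reacts when center-points are added to a two-level full factorial design. It assumes that the condition number is the ratio of the largest to the smallest singular value of the coded design matrix including the constant column; for mutually orthogonal columns these singular values are simply the column norms. The function names are hypothetical.

```python
import math
from itertools import product


def two_level_full_factorial(k, n_center=0):
    """Coded runs of a 2^k full factorial design with optional center-points."""
    runs = [tuple(r) for r in product([-1, 1], repeat=k)]
    return runs + [(0,) * k] * n_center


def condition_number_orthogonal(runs):
    """Condition number of [constant, x1, ..., xk] when these columns are orthogonal."""
    k = len(runs[0])
    columns = [[1] * len(runs)] + [[r[i] for r in runs] for i in range(k)]
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if sum(a * b for a, b in zip(columns[i], columns[j])) != 0:
                raise ValueError("columns are not orthogonal; a general SVD is needed")
    norms = [math.sqrt(sum(a * a for a in col)) for col in columns]
    return max(norms) / min(norms)


print(condition_number_orthogonal(two_level_full_factorial(3)))                        # no center-points
print(round(condition_number_orthogonal(two_level_full_factorial(3, n_center=3)), 4))  # CakeMix layout
print(round(condition_number_orthogonal(two_level_full_factorial(2, n_center=3)), 4))  # ByHand layout
```

Under this assumption the design without center-points gives exactly 1, while the layouts with three center-points give 1.1726 (eight corners) and 1.3229 (four corners), the same figures that appear as Cond. no. under the CakeMix and ByHand plots.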
[Figures: histograms and descriptive statistics: Investigation CakeMix, Taste (Min: 3.52, Max: 5.9, Median: 4.73, Mean: 4.69455); V11 (Min: 9, Max: 77, Median: 62.125, Mean: 56.1667); illustration of skewness; normal probability plot of residuals; response contour plot at Flour = 400g]
All diagnostic tools are retained (R2/Q2, N-plot, etc.). In addition, PLS provides other useful diagnostic tools
[Figures: Investigation Cakemix_cost: histogram of Taste; plot of replications for Taste with experiment number labels; summary of fit and coefficient plots for Taste (N=11, DF=4, R2 Adj.=0.988, RSD=0.0768; N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974; Cond. no.=1.1726); normal probability plot of deleted studentized residuals; contour plots for Max Taste and Min Cost]
use of model
done to find out the impact of the model: What does it mean? Where should new experiments be positioned?
Contents
Review of data analytical steps
evaluation of raw data
regression analysis and model interpretation
use of model
[Figures: histograms of S/B (Min: -0.2, Max: 117, Median: 1.7, Mean: 11.8053) and of the transformed response S/B~ (Min: -2, Max: 2.06896, Median: 0.281033, Mean: 0.219957); replicate plots of S/B and S/B~; Investigation Reporter Gene Assay Screening (MLR), summary of fit (R2, Q2, Model Validity, Reproducibility) for S/B~ (Cond. no.=1.0897, Y-miss=0)]
[Figures: summary of fit, observed against predicted values and normal probability plot of residuals for a response with N=32, DF=21 (R2=0.876, Q2=0.712, R2 Adj.=0.817, RSD=844.4341), and the same plots for the transformed response Time~ (N=32, DF=21, R2=0.990, Q2=0.978, R2 Adj.=0.986, RSD=0.0399, Cond. no.=1.0000); annotations: Straight line, No patterns]
Curvature is a problem in screening because the linear and interaction models used are unable to fit such a phenomenon
Fortunately, problems related to curvature are easily detected and fixed
Detection Tools:
Replicate plot
Low Q2 & Model Validity
LoF (ANOVA)
Example: ByHand
[Figures: replicate plot and summary of fit for y1, y2 and y3 (N=7, DF=3) with Cond. no.=1.3229 and, with a square term (x1*x1 or x2*x2) in the model, Cond. no.=2.8209]
A third common cause of a poor screening model is that the replicated experiments spread too much
Detection Tools:
Replicate plot
ANOVA table
Reproducibility bar (here = 0.53, but not shown)
[Figures: replicate plot; normal probability plots of standardized and raw residuals (N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006)]
A mapping of a new factor requires more experiments; therefore, in reality, we usually only eliminate factors in screening
Contents
General Example 1
Background
Steps in problem formulation
Introduction
Geometry
Confoundings
Generators
Defining Relation
Resolution
Summary of properties
General Example 1
Evaluation of raw data
Regression analysis and model interpretation
Use of model
Summary
Principal investigators: Lena Schultz and Lisbeth Abramo Active Biotech AB, Lund
[Figure: plot with axes labelled Treatment and Light]
PF - Specification of factors
The Ishikawa, or fishbone, system diagram is a very helpful method to overview all factors
Reduces the risk of missing a critical factor
The four Ms: Methods, Manpower, Machines, Materials
Practical maximum depth 4-5 levels
PF - Specification of factors
Six factors:
Number of cells/well (50000 to 400000)
PMA (stimulator) (5 to 100 ng/ml)
Ionomycin (stimulator) (0.1 to 2 µg/ml)
Stimulation time (3 to 6 hours)
Lysing volume (30 to 100 µl)
Ratio sample/substrate (2 to 10)
PF - Specification of responses
It is important to select responses that are relevant according to the experimental goals
Having many responses is not a problem
Here: Signal-to-background ratio, computed as (signal - background)/background
Not all parameters are of appreciable size and meaningful (hierarchy)
Linear terms tend to be larger than two-factor interactions, which, in turn, tend to be larger than three-factor interactions, ...
A $2^k$ full factorial design has a parameter redundancy, i.e., an excess number of parameters which can be estimated but which lack relevance
Fractional factorial designs exploit this redundancy by reducing the number of design runs
[Figures: the CakeMix design cube (Flour 200 to 400, Shortening 50 to 100, Eggpowder 50 to 100) and its half-fractions; design table with the generator x4 = x1x2x3]
Confounding of effects
Reduction of experiments means that effects become confounded, that is, to a certain degree mixed up with each other
The 16 possible effects are evenly allocated as two effects per column
Main effects are confounded with the three-factor interactions
Comparatively simple confounding situation
Confounded pairs, one pair per column of the 8-run computational matrix (generator x4 = x1x2x3):
constant / x1x2x3x4
x1 / x2x3x4
x2 / x1x3x4
x3 / x1x2x4
x4 / x1x2x3
x1x2 / x3x4
x1x3 / x2x4
x1x4 / x2x3
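The confounding pattern can be checked by building the half-fraction and grouping all 16 possible terms by their sign column. This is a minimal sketch, assuming the fraction with x4 = +x1x2x3 and runs listed in standard order; the helper names `column` and `label` are hypothetical.

```python
from itertools import combinations, product
from math import prod

# 2^(4-1) half-fraction: full factorial in x1, x2, x3 (x1 varying fastest), x4 = x1*x2*x3.
runs = [(x1, x2, x3, x1 * x2 * x3) for x3, x2, x1 in product([-1, 1], repeat=3)]


def column(term):
    """Sign column of a term, given as a tuple of 0-based factor indices."""
    return tuple(prod(run[i] for i in term) for run in runs)


def label(term):
    return "".join(f"x{i + 1}" for i in term) or "constant"


# All 16 terms: constant, 4 main effects, 6 two-, 4 three- and 1 four-factor interaction.
terms = [t for size in range(5) for t in combinations(range(4), size)]

groups = {}
for t in terms:
    groups.setdefault(column(t), []).append(t)

for members in groups.values():
    print(" / ".join(label(t) for t in members))
```

Terms that end up in the same group have identical sign columns in these eight runs, so their effects cannot be separated by this design.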
[Figure: coefficient plot in which each bar is labelled with its confounded pair: x1 / x2x3x4, x2 / x1x3x4, x3 / x1x2x4, x4 / x1x2x3, x1x2 / x3x4, x1x3 / x2x4, x1x4 / x2x3]
Generators - Introduction
The generator dictates which specific fraction will be selected, and thereby, indirectly, controls the confounding pattern
[Figure: the CakeMix cube (Flour 200 to 400, Shortening 50 to 100, Eggpowder 50 to 100) and its two half-fractions]
Run    x1    x2    x3 = x1x2
5      -     -     +
2      +     -     -
3      -     +     -
8      +     +     +
Run    x1    x2    -x3 = x1x2
1      -     -     +
6      +     -     -
7      -     +     -
4      +     +     +
Multiple generators
Example: Construction of $2^{5-2}$ fractional factorial design
Generators: x4 = x1x2 and x5 = x1x3
Fourth and fifth factors may be introduced in the design as +x4/+x5, +x4/-x5, -x4/+x5, or -x4/-x5
Four possible quarter-fractions of 8 experiments
Run    x1    x2    x3    x4 = x1x2    x5 = x1x3    x2x3    x1x2x3
1      -     -     -     +            +            +       -
2      +     -     -     -            -            +       +
3      -     +     -     -            +            -       +
4      +     +     -     +            -            -       -
5      -     -     +     +            -            -       +
6      +     -     +     -            +            -       -
7      -     +     +     -            -            +       -
8      +     +     +     +            +            +       +
Rule (2): $X_1 \cdot X_1 = X_1^2 = I$, where $I$ is a column of plus signs
Step 1: Identify generator(s): $X_4 = X_1 X_2 X_3$
Step 2: Multiply both sides by $X_4$: $X_4 \cdot X_4 = X_1 X_2 X_3 X_4$
Step 3: Apply rule 2: $I = X_1 X_2 X_3 X_4$
This is the defining relation for the $2^{4-1}$ design
Resolution IV
I=a*b*c*d
Main effects unconfounded with two-factor interactions. Two-factor interactions still confounded with each other. Recommended for screening
Resolution V
I=a*b*c*d*e
Main effects unconfounded with two-factor interactions. Two-factor interactions unconfounded with each other. Resolution V designs are almost as good as full factorial designs.
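The step from generators to defining relation, resolution and alias pattern can be sketched with set arithmetic. The code assumes the standard convention that the resolution equals the length of the shortest word in the defining relation; factors are represented by their index numbers, and multiplying two words corresponds to a symmetric difference because squared factors cancel (X*X = I). The function names are hypothetical.

```python
from itertools import combinations


def defining_relation(generators):
    """generators: list of (new_factor, parent_factors). Returns all words equal to I."""
    words = [frozenset([new, *parents]) for new, parents in generators]
    relation = set()
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            product_word = frozenset()
            for word in combo:
                product_word = product_word ^ word  # squared factors cancel
            relation.add(product_word)
    return relation


def resolution(relation):
    """Length of the shortest word in the defining relation."""
    return min(len(word) for word in relation)


def aliases(term, relation):
    """Terms confounded with the given term (a collection of factor numbers)."""
    return sorted(sorted(frozenset(term) ^ word) for word in relation)


half_fraction = defining_relation([(4, [1, 2, 3])])               # x4 = x1x2x3
quarter_fraction = defining_relation([(4, [1, 2]), (5, [1, 3])])  # x4 = x1x2, x5 = x1x3

print(resolution(half_fraction), aliases([1], half_fraction))
print(resolution(quarter_fraction), aliases([1], quarter_fraction))
```

For x4 = x1x2x3 the single word has length four (resolution IV) and x1 is aliased with x2x3x4. For the two generators x4 = x1x2 and x5 = x1x3 the shortest word has length three (resolution III), so main effects are aliased with two-factor interactions.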
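Resolution
Defining Relation
Generator(s)
Selected Fraction
[Table: overview of two-level fractional factorial designs by number of runs. 8 runs: $2^{4-1}$ (+/-X4=X1*X2*X3), $2^{5-2}$ (Res III; +/-X4=X1*X2, +/-X5=X1*X3), $2^{6-3}$, $2^{7-4}$. 16 runs: $2^{5-1}$ (Res V; +/-X5=X1*X2*X3*X4), $2^{6-2}$ (Res IV; +/-X5=X1*X2*X3, +/-X6=X2*X3*X4), $2^{7-3}$. 32 runs: $2^{6-1}$, $2^{7-2}$, $2^{8-3}$, $2^{9-4}$, $2^{10-5}$. 64 runs: $2^{7-1}$, $2^{8-2}$, $2^{9-3}$, $2^{10-4}$. 128 runs: $2^{8-1}$, $2^{9-2}$, $2^{10-3}$. Some cells list D-opt instead of a fractional factorial design.]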
[Figures: Investigation Reporter Gene Assay Screening: histogram and plot of replications for S/B before transformation, and histogram and plot of replications for S/B~ with experiment number labels after transformation]
The default linear model looks good with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79, MVal = 0.65, Rep = 0.96)
[Figures: Investigation Reporter Gene Assay Screening (MLR): summary of fit for S/B~ (N=19, DF=12, Cond. no.=1.0897, Y-miss=0) and normal probability plot of residuals with experiment number labels; no outliers]
1) PMA and Ratio were removed
2) Six two-factor interactions were added
3) Only three interactions were kept (Cel*Lys, Ion*StH, and Ion*Lys)
4) The revised model is much better (R2 = 0.96, Q2 = 0.91, MVal = 0.79, Rep = 0.96)
[Figures: Investigation Reporter Gene Assay Screening (MLR): summary of fit and scaled & centered coefficients for S/B~, for the linear model (N=19, DF=12, R2=0.917, Q2=0.791) and for the revised model with terms Cel, Ion, StH, Lys, Cel*Lys, Ion*StH and Ion*Lys (N=19, DF=11, R2=0.962, Q2=0.914, Cond. no.=1.0897)]
Summary
Fractional factorial designs form the most widely used family of screening designs
Many factors can be mapped in few runs
Confounding of effects is a disadvantage, but this can be reasonably tolerated by selecting a Res IV design
Reporter Gene Assay:
Very good model for S/B
Indication of some small interaction terms, which may be important
More experiments, to possibly improve the modelling of S/B and to resolve confounded two-factor interactions, will be done using fold-over (Chapter 7)
Contents
Principles for inter- and extrapolation
Basic requirement: Sound modelling
Main outcomes
Gradient techniques & software optimizer
Adding new experiments
Reporter Gene Assay
Creating the fold-over design
Data Analysis
Summary
Use richness of diagnostic tools to acquire a reliable and predictive model
Example: Reporter Gene Assay (plots from Chap. 6)
[Figures: Investigation Reporter Gene Assay Screening (MLR): summary of fit and scaled & centered coefficients for S/B~ with terms Cel, Ion, StH, Lys, Cel*Lys, Ion*StH and Ion*Lys (N=19, DF=11, R2=0.962, Q2=0.914, Cond. no.=1.0897)]
One of the performed experiments fulfills the experimental goals (IDEAL case)
Make a limited set of new trials to verify the golden run
Predictions inside the region lead forward to an interesting point or small region
Make a limited set of new trials to verify this point or small region
Predictions outside the region lead forward to an interesting point or small region
Make a limited set of new trials to verify this point or small region
Gradient techniques
Steepest ascent or descent. The example shows steepest descent
Gradient techniques work best with fairly few responses, and when the two-factor interactions that occur are fairly small
[Figure: contour plot with axes Airfuel (1.10 to 1.40) and NH3 (10 to 40), illustrating steepest descent]
Software optimizer
The MODDE optimizer will simultaneously start as many as eight simplexes from different locations in the factor space (details in Chapter 8)
Example: Reporter Gene Assay
[Figure: the eight starting points in factor space]
unconfounding
curvature
x4 = -x1x2 x5 = -x1x3
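The effect of switching the generator signs can be illustrated with the $2^{5-2}$ example from Chapter 6. Pairing the fold-over with that particular design is an assumption made for this sketch (the Reporter Gene Assay itself has six factors), and the function names are hypothetical. The fraction with x4 = -x1x2 and x5 = -x1x3 is the same set of runs as the original quarter-fraction with every sign reversed, and in the combined 16 runs the main effect x4 is no longer confounded with x1x2.

```python
from itertools import product


def quarter_fraction(sign):
    """2^(5-2) runs with x4 = sign*x1*x2 and x5 = sign*x1*x3, x1 varying fastest."""
    return [(x1, x2, x3, sign * x1 * x2, sign * x1 * x3)
            for x3, x2, x1 in product([-1, 1], repeat=3)]


original = quarter_fraction(+1)
fold_over = quarter_fraction(-1)
combined = original + fold_over


def contrast(runs, f, g):
    """Inner product of two term columns; zero means the terms are unconfounded."""
    return sum(f(r) * g(r) for r in runs)


def x4(r):
    return r[3]


def x1x2(r):
    return r[0] * r[1]


print("original:", contrast(original, x4, x1x2))  # identical columns
print("combined:", contrast(combined, x4, x1x2))  # orthogonal columns
```

In the original eight runs the x4 column equals the x1x2 column, whereas in the combined design their inner product is zero, which is what the label unconfounding refers to.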
[Figures: Investigation Reporter Gene Assay Screening, fold-over complement: histogram and replicate plot of S/B]
The S/B response is not normally distributed (there are a few extreme values)
[Figures: Investigation Reporter Gene Assay Screening, fold-over complement: histogram and replicate plot of S/B~]
[Figures: Investigation Reporter Gene Assay Screening, fold-over complement (MLR): summary of fit and normal probability plot of residuals for S/B~ with experiment number labels (N=38, DF=30, Cond. no.=1.0897, Y-miss=0); no outliers]
[Figure: Reporter Gene Assay, Screening, fold-over complement (MLR): summary of fit and scaled and centered coefficients for S/B~, shown for two models. First model: N=38, DF=30, cond. no. 1.0897, Y-miss=0. Second model: N=38, DF=33, cond. no. 1.0897, Y-miss=0.]
For the model with DF=33: R2 = 0.912, Q2 = 0.887.
[Figure: Reporter Gene Assay, Screening, fold-over complement (MLR): deleted studentized residuals versus predicted S/B~, and normal probability plot of deleted studentized residuals, with experiment number labels; N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018 (MODDE 7, 2004-02-04)]
[Figure: Reporter Gene Assay, Screening, fold-over complement (MLR): scaled and centered coefficients for S/B~; N=38, DF=33, R2=0.912, Q2=0.887]
Cells, Ionomycin and Stimulation Time (StH) are the most important factors.
Lysing Volume will not be varied in RSM (Chapter 8).
Contents
General Example 2
Background Problem formulation
Summary
Principal investigators: Lena Schultz and Lisbeth Abramo Active Biotech AB, Lund
PF - Specification of factors
Factor              Old range                New range
Cells               50000 - 400000 cells     200000 - 400000 cells
Stimulation time    2 - 6 h                  4 - 6 h
Ionomycin           0.1 - 2 g/ml             1 - 2 g/ml
PF - Specification of responses
It is important to select responses that are relevant to the experimental goals. Having many responses is not a problem. Here: signal-to-background ratio, computed as (signal - background) / background.
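The response formula can be written as a small helper. This is only a sketch of the arithmetic stated above; the function name and the example readings are hypothetical.

```python
def signal_to_background(signal, background):
    """Signal-to-background ratio: (signal - background) / background."""
    if background == 0:
        raise ValueError("background must be non-zero")
    return (signal - background) / background

# Hypothetical readings: a signal of 300 over a background of 100.
ratio = signal_to_background(300.0, 100.0)
```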
All factors are mapped at five levels with the CCC design. This makes it possible to estimate quadratic terms with great rigor. With two factors, the corner experiments and the axial experiments are all situated on the circumference of a circle with radius 1.41, and the experimental region is therefore symmetrical.
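The geometry can be checked with a short sketch. It assumes the axial distance is sqrt(k), which reproduces the radius 1.41 quoted for two factors; other conventions for the axial distance exist, and the function name `ccc_design` and the default of three center points are illustrative choices.

```python
import itertools
import math

def ccc_design(k, n_center=3):
    """Corner, axial and center points of a CCC design in coded units.

    Assumption: axial distance alpha = sqrt(k), so that corner and axial
    points lie at the same distance from the center.
    """
    alpha = math.sqrt(k)
    corners = [list(p) for p in itertools.product([-1.0, 1.0], repeat=k)]
    axials = []
    for i in range(k):
        for value in (-alpha, alpha):
            point = [0.0] * k
            point[i] = value
            axials.append(point)
    centers = [[0.0] * k for _ in range(n_center)]
    return corners + axials + centers

design = ccc_design(2)
# Distance from the center for the 4 corner and 4 axial points.
radii = [math.hypot(*point) for point in design[:8]]
# Distinct levels taken by the first factor.
levels = sorted({round(point[0], 6) for point in design})
```

For two factors this gives 8 + 3 runs, and `levels` holds the five levels of each factor.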
CCF is the recommended design choice for pilot-plant and full-scale investigations.
[Figure: CCC and CCF designs compared]
[Table by number of factors, 2 to 7]
[Figure: Reporter Gene Assay, RSM with CCF: descriptive statistics plot (replicate plot and histogram of S/B) (MODDE 7, 2004-02-04)]
S/B: Min 17.9, Max 221.4, Median 107.8, Mean 120.683
[Figure: Reporter Gene Assay, RSM with CCF (MLR): summary of fit and scaled and centered coefficients for S/B; terms Cel, Cel*Cel, Ion*Ion, Ion, Cel*Ion, StH*StH, Cel*StH, StH*Ion, StH; N=18, DF=8, R2=0.908, Q2=0.558]
The two cross-terms Cel*StH and Cel*Ion and the quadratic term StH*StH were omitted. The revised model is much better (R2 = 0.89, Q2 = 0.74, MVal = 0.92, Rep = 0.79).
[Figure: Reporter Gene Assay, RSM with CCF (MLR): summary of fit and scaled and centered coefficients for S/B; terms Cel, Cel*Cel, Ion*Ion, Ion, StH*Ion, StH; N=18, DF=11, cond. no. 4.0089, Y-miss=0, R2=0.896, Q2=0.739]
[Figure: Reporter Gene Assay, RSM with CCF (MLR): deleted studentized residuals versus predicted S/B with experiment number labels; N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934 (MODDE 7, 2004-02-04)]
Below zero means that we are between Target and Min for the response
Performed in order to reduce the risk of being trapped by a local minimum or maximum
The relevance of the above factor combination was tested in a final robustness testing design.
The CCC and CCF designs differ in how the star points, or axial points, are positioned. Both CCC and CCF support quadratic models.
Contents
Introduction to robustness testing General Example 3
Background Steps in problem formulation
General Example 3
Evaluation of raw data Regression analysis and model interpretation
[Figure: chromatogram; labels: 1a, 1m, H 309/40, (II); time axis 10 to 14 min]
[Figure: chromatogram; labels: H 309/40, 1m, (II); time axis 10 to 14 min]
Specifications:
Res1 should be > 1.5 (complete baseline separation)
k1: N/A
k2: N/A
[Figure: design cubes in Flour (200 to 400), Shortening (50 to 100) and Eggpowder (50 to 100); one panel labelled "3 Center points", one labelled "2 "Center" points"]
In some cases a PB design is a specific fraction of a factorial design. The number of runs is a multiple of 4. PB designs of 12, 20, and 24 runs are of particular interest.
[Figure: summary of fit for k1, k2 and Res1; N=12, DF=6, cond. no. 1.2289, Y-miss=0]
[Figure: HPLC Robustness (MLR): scaled and centered coefficients for Res1 (extended), terms Co(ColA), Co(ColB), pH, Ac, OS, Te, with R2=0.772, Q2=0.121; summary of fit for k1, k2 and Res1 with N=12, DF=6, cond. no. 1.2289, Y-miss=0 (MODDE 7, 2004-01-22)]
[Figure: summary of fit for vetific; N=11, DF=5, cond. no. 1.1726, Y-miss=0]
k2 is used to illustrate this limiting case; temporary specification between 2.7 and 3.3.
The coefficients are used for understanding two things, namely (i) how to get k2 inside specification and (ii) how to produce a non-significant model (how to get the second limiting case?).
Rows 2-3: extreme cases
Rows 4-5: how to get inside specifications
Rows 6-7: how to get a non-significant model
[Figure: summary of fit for k1, k2 and Res1 (N=12, DF=6, cond. no. 1.2289, Y-miss=0) and coefficients for terms Co(ColA), Co(ColB), pH, Ac, OS, Te; R2=0.989, Q2=0.959]
[Figure: Investigation itdoe_roblimcases: three plots of replications for vetific with experiment number labels (MODDE 7, 2003-11-17)]
B) From the set of experiments, a model is derived which captures the relation between factor settings and experimental results (responses).
[Figure: from measurement data to information, knowledge, decision and action]
Contents
D-optimal design Blocking the experimental plan Mixture design Other RSM designs Multilevel qualitative factors The Taguchi approach to robust design Simultaneous optimization of several responses fitted with different models Partial least squares projections to latent structures, PLS Design in latent variables
Additional Topics
D-optimal design
Contents
Introduction to D-optimal design Evaluation criteria
G-efficiency Condition number
[Figures: experimental regions in Factor A and Factor B; design regions in Factor 1 and Factor 3; design region with Factor C (-1 to +1) and settings Sett 1 to Sett 3]
#Factors   CCC/CCF   BB       D-opt
5          26 + 3    40 + 3   26 + 3
6          44 + 3    48 + 3   35 + 3
7          78 + 3    56 + 3   43 + 3
Model upgrading
$$ y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + b_{11} x_1^2 + b_{22} x_2^2 + b_{33} x_3^2 + b_{111} x_1^3 + e $$
When making a combined design for process and mixture factors. LoafVolume is a typical example where D-optimal design could have been utilized.
A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture).
run   x1   x2
1     -1   -1
2     +1   -1
3     -1   +1
4     +1   +1
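For the four runs above and the model terms (1, x1, x2, x1x2):
$$ X = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \quad X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \quad (X'X)^{-1} = \begin{pmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{pmatrix} $$
Precision in b is governed by $(X'X)^{-1}$.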
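The matrices for the four-run design can be reproduced with NumPy. This is a small sketch; it assumes the model columns are the constant, x1, x2 and the interaction x1x2, which matches the 4 x 4 size of X'X on the slide.

```python
import numpy as np

# The 2^2 factorial in coded units.
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])

# Assumed model columns: constant, x1, x2, x1*x2.
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

XtX = X.T @ X                  # information matrix X'X
XtX_inv = np.linalg.inv(XtX)   # scales the variance of the coefficients
```

Because the design is orthogonal, X'X is diagonal and every coefficient is estimated with the same precision.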
(9! / (3! * 6!)) = 84 ways of selecting 3 trials out of 9.
Maximize the determinant det(X'X).
Best precision in the estimated regression coefficients is obtained with det = 16.
[Figure: example selections of three runs on the 3 x 3 grid (levels -1, 0, 1), with det = 0, 1, 4, 9 and 16]
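The search over all 84 selections is small enough to enumerate directly. The sketch below assumes a linear model y = b0 + b1x1 + b2x2 and a 3 x 3 candidate grid with levels -1, 0 and 1, which is consistent with the determinant values on the slide. Real D-optimal software uses exchange algorithms rather than full enumeration.

```python
from itertools import combinations, product

def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Candidate set: the 3 x 3 grid in two coded factors.
candidates = list(product([-1, 0, 1], repeat=2))

results = {}
for runs in combinations(candidates, 3):
    # Model matrix for the assumed linear model: columns 1, x1, x2.
    X = [(1, x1, x2) for x1, x2 in runs]
    # X is square here, so det(X'X) = det(X)^2.
    results[runs] = det3(X) ** 2

best = max(results.values())
best_designs = [runs for runs, value in results.items() if value == best]
```

`best_designs` lists the selections that reach the largest determinant; selections with three collinear points give a determinant of zero.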
[Slide: X and X'X written out for one three-run selection, with X'X = [[3, -1, 0], [-1, 1, 0], [0, 0, 2]]]
Evaluation criteria
Two common evaluation criteria:
Condition number
- ratio of largest to smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit and denotes an orthogonal design
G-efficiency
- computed as G_eff = 100 * p / (n * d)
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended
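Both criteria can be computed for a small design. In this sketch p is taken as the number of model terms, n as the number of runs, and d as the largest value of x (X'X)^-1 x' over the candidate set. The slide gives only the formula, so these definitions are an assumption, as are the function names and the choice of a 3 x 3 candidate grid.

```python
import numpy as np
from itertools import product

def model_matrix(points):
    """Columns 1, x1, x2, x1*x2 for points given in coded units."""
    pts = np.asarray(points, dtype=float)
    x1, x2 = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2])

def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.max() / s.min()

def g_efficiency(X, X_candidates):
    """G_eff = 100 * p / (n * d), with d the maximum of x (X'X)^-1 x'."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    variances = np.sum((X_candidates @ xtx_inv) * X_candidates, axis=1)
    return 100.0 * p / (n * variances.max())

corners = list(product([-1, 1], repeat=2))   # the 2^2 factorial
grid = list(product([-1, 0, 1], repeat=2))   # assumed candidate set

X = model_matrix(corners)
cond = condition_number(X)
geff = g_efficiency(X, model_matrix(grid))
```

For the orthogonal 2^2 factorial both criteria sit at their ideal limits, which makes it a convenient reference case.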
Step 9: Evaluate the resulting designs. In this case all five alternatives are identical
Full factorial design has 4*7 = 28 runs (very many in screening).
A linear model is sufficient in screening:
Yield = b0 + b1*Center + b2*Variety + e
- constant term: 1 DF
- linear term of Center (7 levels): 6 DF
- linear term of Variety (4 levels): 3 DF
- extra: 5 DF
- Total: 15 DF
[Figure: four layouts on the grid of Centers C1 to C7 by Varieties V1 to V4]
Optimization:
span in lifetime 6.02 - 22.28 min
Summary
We have discussed:
- When to use D-optimal design
- What D-optimal design is
- Computational and geometrical aspects of the D-optimality criterion
- The condition number as evaluation criterion of D-optimality
- The G-efficiency as evaluation criterion of D-optimality
- Applications of D-optimal design: model updating, multi-level qualitative factors, combined designs of process and mixture factors
Additional Topics
Contents
Introduction to blocking When to use blocking Blocking in MODDE
Block size Number of blocks Blockable designs Recoding of block factors
Introduction to blocking
Randomization is used as a safeguard against unwanted sources of extraneous systematic variability. When you cannot conduct all the experiments in a homogeneous way, randomizing your experiments may not be sufficient to deal with such variability. Blocking the experiments in synchronized groups may help to decrease the impact of such variability on the effects of the factors.
Example: Blocking_Scr
With a 2^5 design there are two options, with or without block interactions:
Example: Blocking_Scr
With block interactions:
Example: Blocking_Scr
Without block interactions:
Example: Blocking_Scr
Design region (same with or without block interactions). Each block occurs twice in each corner cube.
Blocking in MODDE
MODDE supports orthogonal blocking for two-level full and fractional factorials, CCC, PB, and BB designs (note: CCF is not blockable).
MODDE also supports blocking of D-optimal designs, provided that the number of design runs is a multiple of the number of blocks (note: blocks in D-optimal designs are usually not orthogonal to the factors).
Example: Blocking_RSM
Chemical example with objective to maximize yield. CCC design in two factors where cube and star portions were run at different time points.
CCC design, which is blockable. 2 blocks. Equal number of center points in each block.
[Figure: Blocking_RSM: plot of replications for Yield with experiment number labels (blocks B1 and B2), and histogram of Yield (MODDE 7, 2004-02-04)]
[Figure: Blocking_RSM: summary of fit for Yield (N=12, DF=3, cond. no. 3.1808, Y-miss=0) and coefficients for terms $Blo(B1), $Blo(B2), Tim, Temp, Tim*Tim, Temp*Temp, Tim*Temp, Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1), Temp*$Blo(B2); R2=0.978, Q2=0.949]
There is some evidence that slightly lower yields were obtained in the second block of six runs.
Use of model
Response surface plots visualise that higher yields were obtained in the first experimental campaign (when running the cube portion)
Additional Topics
Mixture Design
Contents
Introduction to mixture design A working strategy for mixture design
Example 1: Tablet formulation (regular experimental region) Example 2: Bubble formation - screening (irregular experimental region) Example 3: Bubble formation - optimization (irregular experimental region)
What is the "problem" with the worksheet? Each row sums to 1.0!
[Figure: coefficient plot; N=10, DF=4]
Coefficients show that binder and fuel have the strongest impact on elasticity.
4. Generation of design
7. Execution of design
Constraint:
No other extra constraint
Response:
Release rate of the active substance (to be maximized)
Bounds on the mixture factors: L_A ≤ A ≤ U_A, L_B ≤ B ≤ U_B, L_C ≤ C ≤ U_C.
These bounds are inconsistent. After a simple arithmetic check (done automatically in the software) the new bounds become:
0.3 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4.
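One common form of such an arithmetic check is sketched below: because the components sum to 1, each lower bound can be raised to 1 minus the sum of the other upper bounds, and each upper bound lowered to 1 minus the sum of the other lower bounds. This is an assumption about the kind of check meant, not a description of the software. The starting bounds in the example are hypothetical, since the original inconsistent bounds are not recoverable from the slide.

```python
def tighten_mixture_bounds(lower, upper, total=1.0):
    """Tighten lower/upper bounds of mixture components that sum to total."""
    if sum(lower) > total or sum(upper) < total:
        raise ValueError("no mixture satisfies these bounds")
    new_lower, new_upper = [], []
    for i in range(len(lower)):
        others_lower = sum(lower) - lower[i]
        others_upper = sum(upper) - upper[i]
        # Component i cannot be smaller than what the others leave over,
        # nor larger than what remains once the others are at their minimum.
        new_lower.append(round(max(lower[i], total - others_upper), 10))
        new_upper.append(round(min(upper[i], total - others_lower), 10))
    return new_lower, new_upper

# Hypothetical starting bounds for A, B, C; the lower bound of A (0.2)
# cannot be reached because B + C is at most 0.3 + 0.4 = 0.7.
lower, upper = tighten_mixture_bounds([0.2, 0.1, 0.2], [0.5, 0.3, 0.4])
```

With these hypothetical inputs, only the lower bound of A changes.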
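Mixture model: Quadratic
$$ y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{23} x_{MF2} x_{MF3} + \varepsilon $$
Cox model type with constraints imposed on the regression coefficients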
Undesired experiments may be deleted from the candidate set prior to generation of the design
[Figure: mixture simplex with vertices B (0/1/0) and C (0/0/1) and edge midpoints such as 0/0.5/0.5; two-component case X1 + X2 = 1]
10. Use of model
4. Generation of design
7. Execution of design
Useful approach to understand how and where the experiments are laid out.
[Figure: panels at Glycerol = 0.0, Glycerol = 0.1 and Glycerol = 0.2]
[Figure: Linear, Quadratic and Special Cubic]
[Figure: mixture simplex with vertices A (1/0/0), B (0/1/0) and C (0/0/1)]
Strongly irregular regions require an efficient algorithm to find the overall centroid. The overall centroid serves the same function as the center point does in process design.
[Figure: summary of fit for release (N=10, DF=4, cond. no. 7.4174, Y-miss=0) and plot of standardized residuals with experiment number labels; R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170 (MODDE 7, 2004-01-23)]
Exp #10 is a probable outlier and should be re-tested. If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.
[Figure: regression coefficients (min); N=10, DF=4, R2=0.985, Q2=0.553]
The model predicts well except for blend 0.5/0.125/0.375. This strange experiment should be repeated and the model possibly updated with this new information.
Constraint:
0.2 ≤ DWL1 + DWL2 ≤ 0.5
Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
Mixture Factors:
Dish-washing liquid 1, SKONA, ICA (0 - 0.4)
Dish-washing liquid 2, NEUTRAL, ADACO (0 - 0.4)
PLS analysis
[Figure: replicate plot and summary of fit for Lifetime~, plot of standardized residuals with experiment number labels, and regression coefficients; N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018 (MODDE 7, 2004-01-23)]
Verifying experiment #1: Temp = 7, Time = 25, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Resp 1 = 1120 sec (18 min 40 sec)
Verifying experiment #2: Temp = 7, Time = 49, Mixture = 0.4 / 0.0 / 0.3 / 0.3, Resp 1 = 810 sec (13 min 30 sec)
Constraint: 0.3 ≤ DWL1 + DWL2 ≤ 0.5
Response: Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
PLS analysis
[Figure: replicate plot and summary of fit for Lifetime~ (N=24, DF=14, cond. no. 12.3206, Y-miss=0), plot of standardized residuals with experiment number labels, and regression coefficients at Glycerol = 0.4; R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358 (MODDE 7, 2004-01-23)]
Additional Topics
Contents
Introduction Three-level full factorial designs Box-Behnken designs Comparison of Composite, Three-level factorial, and Box-Behnken designs
Introduction
Composite designs are commonly used in optimization
[Figure: design geometries drawn in the factors x1 to x5]
Box-Behnken designs
Family of designs employing three levels per varied factor.
BB designs are useful if experimenting in the corners is unwanted.
Mostly, BB designs are used when investigating three or four factors.
Summary
An overview of the number of experiments encoded by composite, three-level full factorial, and Box-Behnken designs, for 2-5 factors:
# Factors   CCC/CCF   Three-level   Box-Behnken
2           8 + 3     9 + 3         ---
3           14 + 3    27 + 3        12 + 3
4           24 + 3    81 + 3        24 + 3
5           26 + 3    243 + 3       40 + 3
Overall, the CCC and CCF designs are most economical. Some parsimony is provided by the BB designs in three and four factors as well, but with five factors the BB design is not an optimal choice. The big drawback of the three-level full factorial designs is the rapidly increasing number of experiments.
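The run counts in the overview follow simple formulas, sketched below without the three center points. The composite count for five factors (26) is reproduced by assuming a half-fraction factorial core, 2^(5-1) + 2*5; that reading is an inference from the numbers, not something the slide states. The Box-Behnken counts are copied from the overview rather than computed.

```python
def three_level_runs(k):
    """Runs in a three-level full factorial with k factors."""
    return 3 ** k

def composite_runs(k, fraction=0):
    """Corner runs 2^(k - fraction) plus 2k axial runs (no center points)."""
    return 2 ** (k - fraction) + 2 * k

# Box-Behnken counts as listed in the overview (not computed here).
BOX_BEHNKEN_RUNS = {3: 12, 4: 24, 5: 40}

rows = []
for k in (2, 3, 4, 5):
    fraction = 1 if k == 5 else 0   # assumed half fraction for five factors
    rows.append(
        (k, composite_runs(k, fraction), three_level_runs(k),
         BOX_BEHNKEN_RUNS.get(k))
    )
```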
Additional Topics
Contents
Introduction Example: Cotton cultivation Regression modelling of multi-level qualitative factors Interpretation of regression models
regression coefficient plot interaction plot
Introduction
Example: Multilevel qualitative factors
Factor A is a qualitative factor with four levels, factor B a qualitative factor with three settings, and factor C a quantitative factor changing between -1 and +1.
Selected objective: screening and linear model.
A full factorial design in 24 experiments is not the best choice.
A D-optimal design (open set or filled set) is a better alternative.
[Figure: design region in Factor A (Level 1 to Level 4), Factor B (Sett 1 to Sett 3) and Factor C (-1 to +1)]
[Figure: coefficient plot with extended terms V(V1) to V(V4), C(C1) to C(C7) and all V*C interaction terms; N=28, DF=0, conf. lev.=0.95 (MODDE 7, 2004-01-23)]
In the case of multi-level qualitative factors, the interaction plot is especially informative.
The best possible combination of factors is Variety #4 and Center #4.
[Figure: interaction plot of Variety (V1 to V4) against Center (C1 to C7)]
The last extended term = negative sum of the other expanded terms.
All extended coefficients of a qualitative factor sum to zero.
Important: Balancing
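The sum-to-zero property can be illustrated with effect coding, in which a factor with L levels is expanded into L - 1 columns and the last level is coded as -1 in every column. Whether MODDE uses exactly this coding is an assumption here; the function names and coefficient values are hypothetical.

```python
def effect_code(level, n_levels):
    """Row of n_levels - 1 sum-to-zero columns for a 1-based level index."""
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if column == level else 0 for column in range(1, n_levels)]

def extend_coefficients(coeffs):
    """Append the coefficient of the last level: minus the sum of the others."""
    return list(coeffs) + [-sum(coeffs)]

# Hypothetical fitted coefficients for levels 1 to 3 of a four-level factor.
extended = extend_coefficients([1.5, -0.5, 2.0])
```

The extended list has one coefficient per level, which is what an extended coefficient plot displays.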
[Figure: design region with Level 4 of the four-level factor, Factor B (Sett 1 to Sett 3) and Factor C (-1 to +1)]
Summary
The interaction plot is an informative tool in regression modelling.
Expansion of qualitative factors in regression modelling gives regular and extended mode coefficient plots.
Multi-level qualitative factors are well handled with D-optimal design.
Additional Topics
Contents
The Taguchi approach to robust design Inner and outer arrays of factors Classical analysis approach Interaction analysis approach Examples
CakeMix DrugD LoafVolume
parameter design
- equivalent to using DOE for finding optimal settings of the process variables
tolerance design
- takes place when optimal factor settings have been specified
- tolerances on the factors are further adjusted if variability in the product quality is unacceptably high
- accomplished by using a mathematical model of the process, and the loss function belonging to the product property of interest
[Figure: inner array in Flour (200 to 400), Shortening (50 to 100) and Eggpowder (50 to 100), with an outer array in Temp (175 to 225) and Time (30 to 50) at each inner-array point]
CakeMix application
The inner and outer array system requires many experiments. CakeMix: 11*5 = 55 experiments.
The experimental goal was to find levels of the three ingredients producing a good cake
(a) when the noise factors temperature and time were correctly set according to the instructions on the box, and (b) when deviations from these specifications occur.
In this kind of testing, the producer has to consider worst-case scenarios corresponding to what the consumer might do with the product, and let these considerations regulate the low and high levels of the noise factors.
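In the classical analysis approach, each inner-array run is summarized over its outer-array runs, typically by an average and a measure of spread. The sketch below assumes the mean and the base-10 logarithm of the standard deviation as the two summaries, in line with the Taste and LogStD responses in the CakeTaguchi_classical investigation. The taste scores are hypothetical.

```python
import math
import statistics

def classical_responses(outer_runs):
    """Mean and log10 standard deviation over one row's outer-array runs."""
    if len(outer_runs) < 2:
        raise ValueError("need at least two outer-array runs")
    spread = statistics.stdev(outer_runs)
    if spread == 0:
        raise ValueError("zero spread; the logarithm is undefined")
    return statistics.mean(outer_runs), math.log10(spread)

# Hypothetical taste scores for one recipe at the five noise settings.
mean_taste, log_std = classical_responses([4.0, 5.0, 6.0, 5.0, 5.0])
```

Repeating this for all 11 recipes would give two responses with 11 values each, which can then be modelled against the three ingredients.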
[Figure: CakeTaguchi_classical: plots of replications for Taste and LogStD with experiment number labels]
[Figure: summary of fit for Taste and LogStD (N=11, DF=4, cond. no. 1.1726, Y-miss=0) with coefficient plots; model statistics shown: R2=0.995, Q2=0.874 and R2=0.959, Q2=-0.284]
[Figure: summary of fit for Taste and LogStD (N=11, DF=6, cond. no. 1.1726, Y-miss=0) with coefficient plots; model statistics shown: R2=0.988, Q2=0.937 and R2=0.939, Q2=0.677]
The best cake mix conditions are found in the upper left-hand corner: Flour = 400 g, Shortening = 50 g, and Eggpowder = 100 g.
For the Taguchi method to be really successful, one would need to be able to estimate the impact of the noise factors and possible interactions between the design and the noise factors.
The existence of such noise-design factor interactions is crucial; otherwise the noise (variability) cannot be reduced by changing some design factors.
[Figure: summary of fit for Taste (N=55, DF=39, cond. no. 1.3110, Y-miss=0), normal probability plot of deleted studentized residuals, and coefficients for terms Fl, Sh, Egg, Te, Ti, Fl*Sh, Fl*Egg, Fl*Te, Fl*Ti, Sh*Egg, Sh*Te, Sh*Ti, Egg*Te, Egg*Ti, Te*Ti; R2=0.605, Q2=0.185, R2 Adj.=0.453, RSD=1.0545 (MODDE 7, 2004-01-23)]
[Figure: Taste at the four noise settings (Te high or low, Ti high or low), marking regions of large and small variation; N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751 (MODDE 7, 2004-02-02)]
Some model refinement is necessary. After refinement: strong model for OneHour, and no model for log SD1h (robust). All factors but Volume influence the average release.
[Figure: summary of fit for OneHour and SD1h~; N=27, DF=18, cond. no. 5.9888, Y-miss=0]
Partially quadratic model with R2 = 0.79 and Q2 = 0.74. N-plot and ANOVA indicate model validity.
[Figure: replicate plot and histogram of OneHour (bins from 30.00 to 35.85), summary of fit and normal probability plot; N=162, DF=147, cond. no. 6.6122, Y-miss=0 (MODDE 7, 2004-02-02)]
[Figure: DrugD - interaction (MLR): scaled and centered coefficients for OneHour (extended); terms Vol, Vol*Vol, Ba(B1) to Ba(B6), Te, Te*Te, PrS, PrS*PrS, pH, pH*pH, Vol*pH; N=162, DF=147, R2=0.791, Q2=0.745]
[Figure: summary of fit for loafvolume and stdev; N=10, DF=4, cond. no. 6.8608, Y-miss=0]
Model interpretation
Is it possible to get a volume of 530 and minimize spread?
[Figure: summary of fit for loafvolume (N=90, DF=75, cond. no. 8.2742, Y-miss=0), score plots (t[1], u[1], u[2]) with experiment number labels, and coefficients for the linear, quadratic and interaction terms in Pr, Mi, Tj, Fo and Ha; R2=0.894, Q2=0.754]
Model interpretation
Volume sensitive to changes in proofing time
Volume is sensitive (= not robust) to changes in proofing and mixing time (which was also discovered in the classical analysis approach).
Inner array: expensive factors. Outer array: cheap factors. 17 experiments per drill!
Summary
We have discussed
- the Taguchi approach to robust design
- the concept of inner and outer arrays of factors
- the classical analysis approach
- the interaction analysis approach
- how to handle robust design testing when some factors are expensive and some inexpensive to vary
Additional Topics
Contents
Background Example Data analysis Linked response Simultaneous optimization
Background
When is there an interest in fitting different models to different responses? When working with many responses that are grouped.
A PLS model fitted to grouped responses tends to have many components and be difficult to interpret.
Example: TruckEngine
Create one investigation for each response
[Figure: TruckE_NOx (MLR): scaled and centered coefficients for NOx (mg/s); TruckE_Soot (MLR): scaled and centered coefficients for Soot~; terms in Air, NL and EGR. Model statistics shown: N=17, DF=10, R2=0.985, Q2=0.959; N=17, DF=13, R2=0.945, Q2=0.917; N=17, DF=9, R2=0.997, Q2=0.987]
Optimization results
Simplex #5 most successful
Optimization results
Bring optimization results to response contour plots, or a SweetSpot plot.
[Figure: plot at Air = 240]
Summary
Linked responses can be used when responses are not correlated. This means that one may, e.g., use PLS in a mother project for the analysis of a group of correlated responses, and then attach (link) another response and its model (MLR) coefficients prior to optimization.
Flexibility/Selectivity
- outliers more easily eliminated
- PLS/MLR: different models for different responses
Additional Topics
Contents
Introduction to PLS Geometrical interpretation of PLS LOWARP example
Notation
K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components
T = matrix of X-scores with columns t1,.., tA (vectors)
P = matrix of X-loadings with columns p1,.., pA (vectors)
W = matrix of PLS X-weights with columns w1,.., wA (vectors)
U = matrix of Y-scores with columns u1,.., uA (vectors)
C = matrix of PLS Y-weights with columns c1,.., cA (vectors)
Scaling of variables
[Figure: measured values and "length" of the variable axes x1, x2 and x3]
Defining/selecting the length of the variable axes (X- and Y-spaces).
Recommended: set each axis to unit length (unit variance scaling).
[Figure: X (N observations) shown in a space with axes x1, x2, x3, and Y (N observations, M=3 responses) in a space with axes y1, y2, y3]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K=M=3).
Each X- and Y-variable has one coordinate axis with the length defined by its scaling, typically unit variance.
Each observation is represented by one point in the X-space and one in the Y-space.
As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems.
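Centering and unit variance scaling of a single variable can be sketched as follows. The sample standard deviation (n - 1 in the denominator) is used here as an assumption, and the function name and values are hypothetical.

```python
import statistics

def autoscale(column):
    """Mean-center a variable and scale it to unit variance."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)   # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("a constant variable cannot be scaled to unit variance")
    return [(value - mean) / sd for value in column]

# Hypothetical measurements of one variable.
scaled = autoscale([2.0, 4.0, 6.0, 8.0])
```

Applying this to every column of X and Y gives each coordinate axis the same length and moves the origin to the average point.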
[Figure: the same observation shown in X-space (x1, x2) and Y-space (y1, y2)]
The mean-centering procedure implies that the origins of the coordinate systems are repositioned.
[Figure: projection of observation i in X-space (x1, x2) and Y-space (y1, y2)]
The first PLS component is a line in X-space and a line in Y-space, calculated to (a) approximate the point swarms in X and Y well and (b) maximize the covariance between the projections (t1 and u1).
These lines pass through the average points.
The projection coordinates, t1 and u1, in the two spaces, X and Y, are connected and correlated through the inner relation $u_{i1} = t_{i1} + h_i$ (where $h_i$ is a residual).
The slope of the dotted line is 1.0.
[Figure: second component (t2) drawn in X-space and Y-space]
The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also going through the average points. These lines, t2 and u2, improve the approximation and correlation as much as possible.
The second projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables.
By inserting the X-values of a new observation into the model, we obtain its t1- and t2-scores, which through the inner relation give values of u1 and u2, which, in turn, enable predicted values of Y to be computed.
PLS predictions
A new observation is similar to the training set if it is inside the tolerance cylinder in X-space.
Then its projection on the X-model (t) can be entered into the T-U relation, giving a u-value for each model dimension.
These values define a point on the Y-space model, which, in turn, corresponds to a predicted value for each y-variable.
[Figure: planes spanned by the PLS components in X-space (x1, x2) and Y-space (y1, y2)]
The PLS components form planes in X- and Y-spaces. The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.
PLS, Overview
$$ X = \mathbf{1}\bar{x}' + T P' + E $$
$$ Y = \mathbf{1}\bar{y}' + U C' + F = \mathbf{1}\bar{y}' + T C' + G $$
(because $U = T + H$, the inner relation)
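The decomposition can be illustrated with a compact NIPALS-style PLS sketch in NumPy. It is an illustration under stated assumptions, not the MODDE implementation: the data are only mean-centered (unit variance scaling is left out), there is no handling of missing values, and the function name `pls_nipals` and the simulated data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-12, max_iter=1000):
    """PLS by NIPALS on mean-centered data.

    Returns scores T and U, weights W and C, loadings P, and the residuals
    E (for X) and G (for Y) left after n_components.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    E = X - X.mean(axis=0)   # centered X, deflated component by component
    F = Y - Y.mean(axis=0)   # centered Y, deflated component by component
    T, U, W, P, C = [], [], [], [], []
    for _ in range(n_components):
        # Start from the Y column with the largest remaining variance.
        u = F[:, np.argmax(F.var(axis=0))].copy()
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)        # X-weights, unit length
            t = E @ w                     # X-scores
            c = F.T @ t / (t @ t)         # Y-weights
            u_new = F @ c / (c @ c)       # Y-scores
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t @ t)             # X-loadings
        E = E - np.outer(t, p)            # remove the component from X
        F = F - np.outer(t, c)            # remove the predicted part from Y
        for store, vec in zip((T, U, W, P, C), (t, u, w, p, c)):
            store.append(vec)
    names = ("T", "U", "W", "P", "C")
    model = {n: np.column_stack(v) for n, v in zip(names, (T, U, W, P, C))}
    model["E"], model["G"] = E, F
    return model

# Simulated data: 12 observations, K = 3 factors, M = 2 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
B = np.array([[1.0, 0.5], [-0.5, 1.0], [0.2, -0.3]])
Y = X @ B + 0.1 * rng.normal(size=(12, 2))
model = pls_nipals(X, Y, n_components=2)
```

With this scaling of the Y-scores, regressing u on t for a component gives a slope of 1, which corresponds to the inner relation stated above.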
LOWARP worksheet
Contains some missing data and many correlated responses
[Figure: Lowarp (PLS, 3 components): R2 and Q2 per component (Comp1 to Comp3) and score scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels; N=17, DF=13, cond. no. 2.0457, Y-miss=10]
[Figure: Lowarp (PLS, comp.=3), loading scatter plot wc[1] vs wc[2]; N=17, DF=13, Cond. no.=2.0457, Y-miss=10]
Variable importance for projection, VIP, is the most condensed way of expressing variable related information
[Figure: Lowarp (PLS, comp.=3), loading scatter plot wc[2] vs wc[3]; N=17, DF=13, Cond. no.=2.0457, Y-miss=10]
Summary PLS
PLS is a multivariate regression method which is useful for handling complex DOE problems.
PLS is especially useful when:
(i) there are several correlated responses in the data set
(ii) the experimental design has a high condition number
(iii) there are small amounts of missing data in the response matrix
PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them.
All diagnostic tools available for MLR are retained for PLS.
In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP.
Additional Topics
Contents
Introduction: what is design in latent variables?
Multivariate characterization
Selecting informative molecules; COST vs DOE
SMD: Increasing reliability of model and data
Example: Lead finding and lead optimization
Example: Onion and an overview of design families
FDs and FFDs
D-optimal design
Cell-based & Grid-based design
Space-filling design
Onion design principles
Introduction
In QSAR the central idea is to develop a model based on a small training set, and to calculate predictions for large numbers of non-tested compounds. This means that the few chemicals in the training set should be representative and have a balanced distribution. How do we accomplish this?
Multivariate characterisation: data matrix examined by PCA
Principal Properties (few, orthogonal)
Statistical Molecular Design (SMD) in principal properties (PP)
Compounds are selected by matching the PP-scores to the chosen design
A way to quantify qualitative, discrete, changes. The chemical descriptors must account for the dominant properties of the compounds, i.e. the principal properties that are known or anticipated to influence biological activity.
Properties such as
hydrophobicity
steric properties (size)
electronic properties
(chemical) reactivity
PCA of the multi-property matrix gives the (latent) principal properties in terms of the principal component scores
The COST Approach
Vertical line: A is held constant while varying B.
Horizontal line: B is kept constant while varying A.
The Design Approach
Both factors A and B are varied simultaneously. This results in a better and more efficient mapping of the modelled response.
Chemical map of 60 haloalkanes
Trace of COSTing
Problem: Limited range of applicability & reliability
[Figure: PCA score plot t[1] vs t[2] of the 60 haloalkanes with compound number labels; the directions Density and Mw/log P are indicated]
Two-level factorial and fractional factorial designs with centre points are useful in QSAR modelling
[Figure: score plot t[1] vs t[2] with compound number labels]
[Figure: chemical structures of the numbered candidate substances]
[Figure: PCA loading plot p[1] vs p[2] showing the substances (e.g. Sub6, Sub7) and the tests (e.g. Test2)]
Substances 1 and 2 are promising as leads. Sub2 would be the natural first choice.
There is some redundancy among the five tests. Three tests are sufficient in the future, e.g. 2, 3 and 4.
Suppose we select Sub2 as our lead.
Convention: 'OH' = pos 1, 'ortho-Cl' = pos 2, 'para-Cl' = pos 3; quinoline scaffold not varied.
Substituent descriptors (principal properties) taken from Skagerberg et al. (QSAR 8 (1989), 32-38).
-1 -1 -1 Cl NO NO2 N3 SO2NH2 OCF3 CN NCS SCN CHO COOH CONH2 CH=NOH NHCSNH2 SOCH3 OSO2CH3 SO2CH3 NHSO2CH3 NHCOCF3 CH=CHNO2 COCH3 SCOCH3 OCOCH3 COCH3 CONHCH3 SO2C2H5 +1 -1 -1 N=CCl2 COOC2H5 CH=CHCOCH3 COOC3H7 N=NC6H5 OSO2C6H5 NHSO2C6H5 OCOC6H5 CHNC6H5 CH2OC6H5 -0.59 -0.77 -0.68 -0.25 -0.54 -0.30 -0.57 -0.32 -0.40 -0.64 -0.53 -0.50 -0.27 -0.21 -0.48 -0.39 -0.43 -0.31 -0.32 -0.27 -0.46 -0.12 -0.29 -0.23 -0.25 -0.15 0.06 0.10 0.10 0.38 0.80 0.78 0.71 0.74 0.74 0.83
t1 1.00 0.88 -0.88 -1.00 -0.83 -0.57 -0.72 -0.57 -0.55 -0.43 -0.41 -0.17 -0.45 -0.52 -0.29 -0.45 -0.33 -0.44 -0.29 -0.19 -0.32 -0.19 -0.25 -0.02 -0.09 -0.07 -0.12 -0.03 0.09 0.14 0.28 0.23 0.43 0.51 0.82
t2 -0.26 -0.07 0.17 0.61 0.55 0.05 0.80 0.63 0.13 0.18 0.04 0.20 0.40 0.28 0.15 1.00 0.02 0.17 0.20 0.06 0.47 0.40 0.20 0.08 0.77 0.18 0.45 0.25 0.19 0.33 0.09 0.74 0.34 0.74 0.29
t3 -0.13 -0.35 -0.36 -0.48 -0.52 -0.10 -0.51 -0.41 -0.25 -0.16 -0.65 -0.72 -0.36 -0.55 -0.06 -0.26 -0.45 -0.34 -0.13 -0.68 -0.01 -0.52 -0.64 -0.06 -0.36 -0.06 -0.14 -0.14 -0.56 -0.40 -0.02 -0.35 -0.35 -0.32 -0.97
t1 -1 -1 +1 Br SOOF SF5 I CF3 SCF3 SOOCF3 CF2CF3 PMe2 COC3H7 +1 -1 +1 2-Thienyl SOOC6H5 COC6H5 -1 +1 +1 CH2Br CH2I CH3 NMe2 Cyclo-Pr CHMe2 C3H7 t-C4H9 CH2C6H5 +1 +1 +1 s-C4H9 n-C4H9 C5H11 C6H5 OC6H5 NHC6H5 CycloHex -0.48 -0.63 -0.28 -0.30 -0.61 -0.15 -0.41 -0.35 -0.24 -0.11 0.26 0.22 0.04 -0.33 -0.15 -0.64 -0.34 -0.24 -0.17 -0.04 -0.10 -0.04 0.06 0.28 0.56 0.36 0.07 0.13 0.46
t2 -0.20 -0.95 -0.81 -0.22 -0.40 -0.36 -1.00 -0.44 -0.10 -0.60 -0.03 -0.84 -0.50 0.16 0.17 0.52 0.79 0.32 0.23 0.43 0.19 0.37 0.07 0.42 0.38 0.01 0.08 0.51 0.24
t3 0.06 0.09 0.35 0.22 0.33 0.18 0.28 0.46 0.39 0.47 0.12 0.18 0.84 0.10 0.25 0.00 0.12 0.26 0.66 0.00 0.95 1.00 0.50 0.02 0.04 0.17 0.71 0.64 0.63
Columns x1-x3 represent substituent position 1, columns x4-x6 position 2, and columns x7-x9 position 3.
The proposed molecular structures should be checked with the synthetic chemists.
CompNo  Position 1   Position 2   Position 3
1       I            NO2          NO2
2       COOC2H5      H            CH(CH3)2
3       H            CH(CH3)2     I
4       C6H5         I            H
5       NO2          C6H5         H
6       COC6H5       COC6H5       I
7       CH(CH3)2     COOC2H5      CH(CH3)2
8       OC3H7        OC3H7        NO2
9       NO2          I            C6H5
10      COC6H5       CH(CH3)2     COOC2H5
11      CH(CH3)2     H            OC3H7
12      OC3H7        NO2          COC6H5
13      I            OC3H7        COC6H5
14      COOC2H5      COOC2H5      OC3H7
15      H            COC6H5       COOC2H5
16      C6H5         C6H5         C6H5
Example: Onion
Data set from AZ, Lund, Bosse Nordn
N = 1107, K = 115
Objective: Select 80 diverse and representative compounds
First PCA score plot: R2X = 0.50 (A = 2), R2X = 0.75 (A = 6)
[Figure: PCA score plot t[1] vs t[2]]
[Figure: score plot M1.t1 vs M1.t2 with the selected compounds numbered 1 to 80, shown next to the PCA score plot t[1] vs t[2]]
Selection depends on mesh size and on the distribution of compounds.
Easy with A = 2, but complicated with A = 6!
[Figure: grid laid over the score plot t[1] vs t[2]]
Space-filling design
Similar to cell- & grid-based design
Distance calculations between points in chemical space
Compounds are selected giving the best coverage (smallest average distance between selected points) of the chemical space
Onion design
Sees the chemical domain as composed of layers.
Selection becomes a function of the number of layers and the type of design laid out in each layer.
Contents
Introduction
A working strategy for mixture design
Example: Tablets
Introduction
Applications of DOE
Designs with process factors
Regular region: Factorials, Composite, Plackett-Burman, Box-Behnken
Irregular region: D-optimal design
$\sum_k X_k = 1$
We can express the relative proportions as fractions or percentages
Linear
Quadratic
Experimental domain is a simplex (or polyhedron)
Experimental region has dimensionality k-1, where k is the number of mixture factors
A D-optimal design is a computer-generated design that locates the experiments in such a way that the experimental region is well covered.
What is the "problem" with the worksheet? Each row sums to 1.0!
[Figure: regression coefficient plot; N=10, DF=4]
Coefficients show that binder and fuel have the strongest impact on elasticity.
4. Generation of design
7. Execution of design
10. Use of model
Constraint:
No other extra constraint
Response:
Release rate of the active substance (to be maximized)
Co-ordinates of a Simplex
At each corner, one component is pure, 1.0 At the opposite side, this component is absent, 0.0 The concentration is the same along a line parallel with the opposite side. E.g. for A along horizontal lines. Going from the corner A (A=1.0) down, corresponds to going through A=1.0, A=0.75, A=0.5, ..., A=0.0 In the same way, going through the corner B towards the opposite side, corresponds to going through B=1.0, B=0.75, B=0.5, ..., B=0.0. And analogously for C.
[Figure: constrained mixture region in the simplex, with the lower and upper bounds LA, UA, LB, UB, LC and UC drawn in]
These bounds are inconsistent. After a simple arithmetic check, L*A (done automatically in the software), the new bounds become:
0.3 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4.
Mixture model:
Quadratic
$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{23} x_{MF2} x_{MF3} + \varepsilon$
Cox model type with constraints imposed on the regression coefficients
The candidate set is the pool of theoretically possible and meaningful experiments, from which the actual design is selected.
Here, the candidate set is small:
3 extreme vertices
3 centers of edges
3 interior points
1 overall centroid
In most cases but mixture applications, undesired experiments may be deleted from the candidate set prior to generation of the design.
The design should contain experiments which are informative and map the experimental region as well as possible.
In this case the experimental region is regular, and then the Simplex Centroid design is applicable.
[Figure: Simplex Centroid design in the simplex with corners such as B (0/1/0) and C (0/0/1) and edge centres such as 0/0.5/0.5]
[Figure: two-component mixture region, X1 + X2 = 1, with axes X1 and X2]
Useful approach to understand how and where the experiments are laid out
[Figure: design region shown in three panels, Glycerol = 0.0, Glycerol = 0.1 and Glycerol = 0.2]
Linear
Quadratic
Special Cubic
Strongly irregular regions require an efficient algorithm to find the overall centroid.
The overall centroid serves the same function as the centerpoint does in process design.
[Figure: simplex with corners A (1/0/0), B (0/1/0) and C (0/0/1) and edge centres such as 0/0.5/0.5]
[Figure: summary of fit and normal probability plot of standardized residuals for release; N=10, DF=4, Cond. no.=7.4174, Y-miss=0; R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170]
Exp #10 is a probable outlier and should be re-tested. If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.
[Figure: regression coefficients for release (min); N=10, DF=4, R2=0.985, Q2=0.553]
release(pred)  Lower  Upper
363            322    404
293            262    324
363            322    405
320            278    361
The model predicts well except for blend 0.5/0.125/0.375. This strange experiment should be repeated and the model possibly updated with this new information.
Summary
DoE is an organized approach
Yields more useful information (influence of all factors together)
Yields more precise information in fewer experiments
Results evaluated in the light of variability
A map of the system is obtained (useful for decision-making)
Approach to mixture design very similar to approach used for conventional process designs
Mixture Factors:
Dish-washing liquid 1, SKONA, ICA (0 - 0.4)
Dish-washing liquid 2, NEUTRAL, ADACO (0 - 0.4)
Tap Water, Umeå (0.4 - 0.8)
Glycerol, APOTEKETS (15% water content / 0.0 - 0.2)
Constraint:
0.2 ≤ DWL1 + DWL2 ≤ 0.5
Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
Mixture Region
Mixture components are not independent: X1 + X2 + ... + Xp = total, where usually total = 1 or 100%.
The mixture region is constrained.
If there are NO additional bounds on the components, that is, every component can vary between 0 and 1:
The mixture region might be very small. The size is inferred from calculations of Range Lower (RL) and Range Upper (RU).
If the region is a regular simplex, several classical mixture designs are available.
If the region is irregular, the experiments are laid out D-optimally.
Consistency of bounds:
Some combinations of bounds are disallowed.
Implied bounds arise from the stated bounds.
The above properties are handled automatically in software (MODDE), but unawareness of them might lead to bad or unexpected results.
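The idea of implied bounds can be made concrete with a short sketch. Because the components must sum to the total, the largest amounts the other components can take force a minimum on each component, and their smallest amounts force a maximum. The function name `implied_bounds` and the starting bounds are hypothetical, chosen only so that the tightened bounds come out as 0.3 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4; this is not a description of how MODDE performs the check.

```python
def implied_bounds(lower, upper, total=1.0):
    """Tighten simple lower/upper bounds of mixture components (hypothetical helper).

    A component cannot go below total - (sum of the other upper bounds)
    and cannot go above total - (sum of the other lower bounds).
    """
    new_lower, new_upper = [], []
    for i in range(len(lower)):
        others_lower = sum(lower) - lower[i]
        others_upper = sum(upper) - upper[i]
        # Rounding only suppresses floating-point noise such as 0.30000000000000004
        new_lower.append(round(max(lower[i], total - others_upper), 10))
        new_upper.append(round(min(upper[i], total - others_lower), 10))
    return new_lower, new_upper

# Assumed stated bounds for A, B, C (not taken from the course material)
stated_lower = [0.2, 0.0, 0.0]
stated_upper = [0.5, 0.3, 0.4]
new_lower, new_upper = implied_bounds(stated_lower, stated_upper)
```

In this example only the lower bounds change: the upper bounds of the other two components leave at least 0.3 for A, 0.1 for B and 0.2 for C.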
Types of Bounds
Lower bounds only
Li ≤ Xi ≤ 1.0
0.4 ≤ Tap water ≤ 1.0 (example, not realistic for bubble formation)
Relational constraints
0.1 ≤ X1 + X5 ≤ 0.5
0.3 ≤ DWL1 + DWL2 ≤ 0.5
Tap water + DWL1 + DWL2 + Glycerol 70 (SEK/l)
[Figure: mixture simplex with corners including B (Oxidizer) and C (Fuel)]
If all extreme points are valid: Simplex (regular region)
Example: 0.1 ≤ A ≤ 0.8, 0.1 ≤ B ≤ 0.8, 0.1 ≤ C ≤ 0.8
These bounds are consistent, but the experimental region is irregular, which calls for a D-optimal design.
Mixture region is irregular following the definition of the relational constraint A + B 0.65 shown as the dotted line
Example: 2 x1 + x2 0.30
Summary
Mixture has only Lower Bounds
Experimental Region is always a simplex
Mixture model:
Process Model: Interaction
Mixture Model: Linear
Process*Mixture Model: Interaction
$y = \beta_0 + \beta_1 x_{PF1} + \beta_2 x_{PF2} + \beta_3 x_{MF3} + \beta_4 x_{MF4} + \beta_5 x_{MF5} + \beta_6 x_{MF6} + \beta_{12} x_{PF1} x_{PF2} + \beta_{13} x_{PF1} x_{MF3} + \beta_{14} x_{PF1} x_{MF4} + \beta_{15} x_{PF1} x_{MF5} + \beta_{16} x_{PF1} x_{MF6} + \beta_{23} x_{PF2} x_{MF3} + \beta_{24} x_{PF2} x_{MF4} + \beta_{25} x_{PF2} x_{MF5} + \beta_{26} x_{PF2} x_{MF6} + \varepsilon$
Add 5 extra experiments to get enough DF.
In addition, 2 supplementary experiments are recommended to handle the complexity introduced by the linear constraint.
This leads to no. of experiments, N = 20 (note: no replicates included in this estimate).
Axes of a Simplex
Definition: The xi axis of the simplex is the one-dimensional subspace of the simplex where xj = (1-xi)/(q-1) for all j ≠ i.
The xi axis is the line perpendicular to the xi = 0 base of the simplex and passing through the centroid of the simplex.
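The definition translates directly into a rule for generating blends on an axis: fix the level of component i and share the remainder equally among the other q-1 components. A small sketch follows; the function name `axis_point` and the chosen levels are illustrative assumptions.

```python
def axis_point(i, xi, q):
    """Blend on the x_i axis of a q-component simplex (hypothetical helper).

    Component i is set to xi; every other component gets (1 - xi)/(q - 1).
    """
    other = (1.0 - xi) / (q - 1)
    return [xi if j == i else other for j in range(q)]

q = 3
vertex = axis_point(0, 1.0, q)        # pure component A
centroid = axis_point(0, 1.0 / q, q)  # all components equal
base = axis_point(0, 0.0, q)          # A absent, rest shared equally
```

Moving xi from 1.0 down to 0.0 traces the axis from the vertex through the centroid to the midpoint of the opposite base.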
[Figure: axes of components in the simplex]
Axial Designs
Axial designs consist of mixtures situated entirely on the axes of the simplex.
With Axial designs most of the points are positioned inside the simplex and consist of complete mixtures, i.e. q-component blends.
Axial designs are recommended for use when component effects are to be measured for screening experiments and when linear models are to be fitted.
Standard Axial
Extended Axial
or a combination of these
The Extreme Vertices designs of McLean-Anderson provide the best available solution to the constrained design.
The Extreme Vertices are those points that lie on the intersection of the constrained boundaries.
Extreme Vertices are generated by forming all possible combinations of the lower and upper bounds of q-1 components, and calculating the level of the qth component.
This gives q*2^(q-1) possible points.
The Extreme Vertices are those points whose component levels lie within the constraints.
Extreme Vertices
Rapidly increasing complexity
q    points
2    4
3    12
4    32
5    80
6    192
7    448
8    1024
9    2304
10   5120
11   11264
12   24576
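The generation rule can be sketched in a few lines: for each choice of the computed component, combine the lower and upper bounds of the remaining q-1 components and obtain the last level from the mixture constraint. The function names and the tolerance are assumptions made for this illustration; the bounds are the regular-region example 0.1 to 0.8 for three components.

```python
from itertools import product

def extreme_vertex_candidates(lower, upper):
    """All q * 2**(q-1) candidate points (hypothetical helper)."""
    q = len(lower)
    points = []
    for free in range(q):
        others = [j for j in range(q) if j != free]
        # Every low/high combination of the other q-1 components
        for levels in product(*[(lower[j], upper[j]) for j in others]):
            point = [0.0] * q
            for j, level in zip(others, levels):
                point[j] = level
            point[free] = 1.0 - sum(levels)  # level fixed by the mixture constraint
            points.append(point)
    return points

def extreme_vertices(lower, upper, tol=1e-9):
    """Keep the candidates whose computed level respects its own bounds."""
    vertices = set()
    for p in extreme_vertex_candidates(lower, upper):
        if all(lo - tol <= x <= hi + tol for x, lo, hi in zip(p, lower, upper)):
            vertices.add(tuple(round(x, 9) for x in p))  # merge duplicates
    return sorted(vertices)

lower, upper = [0.1, 0.1, 0.1], [0.8, 0.8, 0.8]
candidates = extreme_vertex_candidates(lower, upper)
vertices = extreme_vertices(lower, upper)
```

For q = 3 this produces the 12 candidates listed in the table, of which only the three corners of the reduced simplex survive the bounds check.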
[Figure: extreme vertices of a constrained mixture region, with numbered design points]
A D-optimal design is a computer-generated design, and consists of the best subset of experiments selected from the candidate set.
For a given model, $Y = X\beta + \varepsilon$, the following can be said regarding the D-optimal approach:
the selected runs maximize the determinant of the matrix X'X
these experiments span the largest volume possible in the experimental region
A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture).
run   x1   x2
1     -1   -1
2      1   -1
3     -1    1
4      1    1
X (columns: constant, x1, x2, x1*x2)
 1  -1  -1   1
 1   1  -1  -1
 1  -1   1  -1
 1   1   1   1
(X'X)
 4  0  0  0
 0  4  0  0
 0  0  4  0
 0  0  0  4
(X'X)^-1
 0.25  0     0     0
 0     0.25  0     0
 0     0     0.25  0
 0     0     0     0.25
Precision in b from:
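The matrices for the four-run factorial can be checked with a few lines of code. This is only a sketch: the column order (constant, x1, x2, x1*x2) and the helper name `cross_product_matrix` are assumptions.

```python
# Four-run, two-level full factorial in coded units
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]

# Model matrix with columns: constant, x1, x2, x1*x2
X = [[1, x1, x2, x1 * x2] for x1, x2 in runs]

def cross_product_matrix(X):
    """Return X'X for a model matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

XtX = cross_product_matrix(X)

# The design is orthogonal, so X'X is diagonal and its inverse is
# obtained by inverting the diagonal elements one by one.
is_diagonal = all(XtX[i][j] == 0 for i in range(4) for j in range(4) if i != j)
XtX_inv = [[1 / XtX[i][i] if i == j else 0.0 for j in range(4)]
           for i in range(4)]
```

Every diagonal element of the inverse equals 0.25 and all off-diagonal elements are zero, so the four coefficients are estimated independently and with equal precision.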
(9! / (3!*6!)) = 84 ways of selecting 3 trials out of 9
Maximize the determinant det(X'X)
Best precision in estimated regression coefficients with det = 16
[Figure: example three-run designs on the 3x3 grid of candidate points (levels -1, 0, 1), with det = 0, det = 1, det = 4, det = 9 and det = 16]
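For a candidate set this small the D-optimal search can be done by brute force, which makes the numbers easy to verify. The sketch below assumes the linear model y = b0 + b1*x1 + b2*x2 and enumerates every three-run subset of the 3x3 grid; a real D-optimal algorithm does not enumerate exhaustively, and all names here are illustrative.

```python
from itertools import combinations, product

# Candidate set: the 3x3 grid with levels -1, 0, 1 for x1 and x2
candidates = list(product((-1, 0, 1), repeat=2))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def det_xtx(design):
    """det(X'X) for the model y = b0 + b1*x1 + b2*x2 (assumed model)."""
    X = [[1, x1, x2] for x1, x2 in design]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
           for i in range(3)]
    return det3(XtX)

designs = list(combinations(candidates, 3))
dets = {design: det_xtx(design) for design in designs}
best = max(dets.values())
best_designs = [d for d, value in dets.items() if value == best]
distinct_values = sorted(set(dets.values()))
```

The best designs place the three runs on corners of the square, which is where the triangle they span has the largest area.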
[Figure: example design matrices X and the corresponding X'X matrices for alternative three-run designs]
Evaluation criteria
Two common evaluation criteria:
Condition number
- ratio of largest to smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit, denotes an orthogonal design
G-efficiency
- computed as Geff = 100*p/(n*d)
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended
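A small sketch of the G-efficiency calculation follows. It assumes the usual reading of the formula: p is the number of model terms, n the number of runs, and d the maximum relative prediction variance x'(X'X)^-1 x over the candidate points. That reading, the helper names, and the two example designs are assumptions, not taken from the course material. Exact fractions are used so the results are free of rounding.

```python
from fractions import Fraction

def invert(A):
    """Gauss-Jordan inverse with exact fractions (A must be non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def g_efficiency(X, candidates):
    """Geff = 100*p/(n*d), d = max of x'(X'X)^-1 x over the candidate rows."""
    n, p = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    inv = invert(XtX)

    def prediction_variance(x):
        return sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p))

    d = max(prediction_variance(x) for x in candidates)
    return 100 * Fraction(p) / (n * d)

# Candidate rows for the model (constant, x1, x2): the four corners
corners = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
geff_factorial = g_efficiency(corners, corners)      # all four corners
geff_three_runs = g_efficiency(corners[:3], corners)  # one corner left out
```

The full factorial reaches the upper limit, whereas leaving out one corner makes predictions at that corner much less precise and pulls the G-efficiency well below the recommended 60-70%.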
Lead no. of experiments, N = 20 (note: no replicates included in this estimate).
Due to the element of randomness in the D-optimal search, we recommend exploring N ± 4 runs and generating 5 versions for each level of N ± 4.
We explored N=16 to N=24, giving 45 alternative D-optimal designs.
Best design with N = 16 (Geff = 76%, CondNo = 2.7)
Useful approach to understand how and where the experiments are laid out
In MODDE: Show/Design Region
It was concluded that the shape of the experimental region was reasonable and not too distorted, and of sufficient size.
[Figure: design region shown in three panels, Glycerol = 0.0, Glycerol = 0.1 and Glycerol = 0.2]
The reference mixture is used for anchoring the mathematical model.
It is easy to find for regular regions (overall centroid).
Strongly irregular regions require an efficient algorithm to find the centroid.
It serves the same function as the center-points in process design.
Calculated reference mixture: (0.183 / 0.183 / 0.55 / 0.084) (DWL1 / DWL2 / water / glycerol)
Manually modified reference mixture: (0.2 / 0.2 / 0.5 / 0.1)
Averages of all extreme vertices (AVG)
- computationally extensive
Range Normalized Midrange (used in MODDE):
RNM = (s1, s2, ..., si, ..., sq), where $s_i = m_i - R_i \left(\sum_{j=1}^{q} m_j - 1.0\right) / \sum_{j=1}^{q} R_j$, i = 1 to q
Range: Ri = Ui - Li
Midrange: mi = (Ui + Li)/2
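The range-normalized midrange is simple enough to compute directly. The sketch below reads the formula with sums over j in both the numerator and the denominator, which is the reading that makes the result sum to 1; the function name is an assumption. It uses only the simple factor bounds of the bubble screening example (DWL1, DWL2, water, glycerol) and ignores the relational constraint on DWL1 + DWL2, so it is not expected to reproduce the calculated reference mixture quoted for that example.

```python
def range_normalized_midrange(bounds):
    """Reference mixture from simple (lower, upper) bounds (hypothetical helper)."""
    midranges = [(lo + hi) / 2 for lo, hi in bounds]
    ranges = [hi - lo for lo, hi in bounds]
    excess = sum(midranges) - 1.0   # how far the midranges overshoot the total
    total_range = sum(ranges)
    # Each component gives up a share of the excess proportional to its range
    return [m - r * excess / total_range for m, r in zip(midranges, ranges)]

# Simple bounds only: DWL1, DWL2, tap water, glycerol
bubble_bounds = [(0.0, 0.4), (0.0, 0.4), (0.4, 0.8), (0.0, 0.2)]
reference = range_normalized_midrange(bubble_bounds)

# A symmetric region returns the ordinary centroid
symmetric = range_normalized_midrange([(0.1, 0.8)] * 3)
```

Components with a wide range absorb more of the correction than components with a narrow range, which keeps the reference mixture well inside every component's bounds.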
[Figure: Bubb_scr (PLS, comp.=2), summary of fit (including Reproducibility) and Scaled & Centered Coefficients for Lifetime~ (terms Te, Ti, DW1, DW2, Wa, Gly and their Te* and Ti* interactions); N=24, DF=11, Cond. no.=2.7203, Y-miss=0; R2=0.812, Q2=0.185]
PLS -- Notation
K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components
T = matrix of X-scores with columns t1, ..., tA (vectors)
P = matrix of X-loadings with columns p1, ..., pA (vectors)
W = matrix of PLS X-weights with columns w1, ..., wA (vectors)
U = matrix of Y-scores with columns u1, ..., uA (vectors)
C = matrix of PLS Y-weights with columns c1, ..., cA (vectors)
[Figure: variable axes x1, x2, x3 of different lengths before scaling]
Defining/selecting the length of the variable axes (X- and Y-spaces)
Recommended: set each axis to unit length (unit variance scaling)
[Figure: data matrices X and Y with N rows each (M=3 responses) and the corresponding X-space (x1, x2, ...) and Y-space (y1, y2, y3)]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K=M=3).
Each X- and Y-variable has one coordinate axis with the length defined by its scaling, typically unit variance.
Each observation is represented by one point in the X-space and one in the Y-space.
As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems.
[Figure: the same observation shown in X-space (x1, x2) and Y-space (y1, y2)]
The mean-centering procedure implies that the origins of the coordinate systems are repositioned.
[Figure: projection of observation i onto the first component in X-space (x1, x2) and Y-space (y1, y2)]
The first PLS component is a line in X-space and a line in Y-space, calculated to a) approximate the point-swarms in X and Y well and b) maximize the covariance between the projections (t1 and u1).
These lines pass through the average points.
The projection coordinates, t1 and u1, in the two spaces, X and Y, are connected and correlated through the inner relation ui1 = ti1 + hi (where hi is a residual).
The slope of the dotted line is 1.0.
The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also going through the average points. These lines, t2 and u2, improve the approximation and correlation as much as possible.
The second pair of projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables. By inserting the X-values of a new observation into the model, we obtain its t1- and t2-scores, which through the inner relation give values of u1 and u2, which, in turn, enable predicted values of Y to be computed.
[Figure: second PLS component (t2) in X-space (x1, x2) and Y-space (y1, y2)]
The PLS components form planes in the X- and Y-spaces. The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.
PLS -- Overview
$X = \mathbf{1}\bar{x}' + TP' + E$
$Y = \mathbf{1}\bar{y}' + UC' + F = \mathbf{1}\bar{y}' + TC' + G$ (because $U = T + H$, the inner relation)
differences to
3) w are the correlation coefficients between the x's and u
- Columns of X highly correlated with Y are given high weights
4) At convergence, for the orthogonality:
- p is computed so that t*p' is the "best approximation of X"
- t*p' is removed from X for the next component
Summary of PLS
PLS is a multivariate regression method which is useful for handling complex DOE problems.
PLS is especially useful when:
(i) there are several correlated responses in the data set
(ii) the experimental design has a high condition number
(iii) there are small amounts of missing data in the response matrix
PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them.
All diagnostic tools available for MLR are retained for PLS.
In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP.
[Figure: Bubb_scr (PLS, comp.=2), summary of fit (including Reproducibility) and Scaled & Centered Coefficients for Lifetime~; N=24, DF=11, Cond. no.=2.7203, Y-miss=0; R2=0.812, Q2=0.185]
[Figure: summary of fit (including Reproducibility), normal probability plot of standardized residuals and coefficient plot for Lifetime~ after model revision; N=24, DF=18, Cond. no.=2.1537, Y-miss=0; R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018]
Verifying experiment #1
Temp = 7, Time = 25, Mixture = 0.2 / 0.2 / 0.3 / 0.3
Resp 1 = 1120 sec (18 min 40 sec)
Verifying experiment #2
Temp = 7, Time = 49, Mixture = 0.4 / 0.0 / 0.3 / 0.3
Resp 1 = 810 sec (13 min 30 sec)
Summary
Proposed working strategy works for
mixture regions of regular geometry
mixture regions of irregular geometry
experimental series involving both process and mixture factors
Strategy is oriented towards a graphical presentation of modelling results.
In BubbleScr it was possible to raise bubble lifetime from 11 sec. to 6.02 min.
Verifying experiments of model predictions gave an increased lifetime of 18 min 40 sec.
Bubble lifetime further optimized by RSM D-optimal design (see section 5)
Tap Water, Umeå (0.2 - 0.4)
Glycerol, APOTEKETS (15% water content / 0.2 - 0.4)
Constraint: 0.3 ≤ DWL1 + DWL2 ≤ 0.5
Response: Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
Experimental objective:
Optimization
Mixture model:
Quadratic
$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_4 x_{MF4} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{44} x_{MF4}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{14} x_{MF1} x_{MF4} + \beta_{23} x_{MF2} x_{MF3} + \beta_{24} x_{MF2} x_{MF4} + \beta_{34} x_{MF3} x_{MF4} + \varepsilon$
Add 5 extra experiments to get enough DF.
In addition, 2 supplementary experiments are needed to handle the complexity introduced by the linear constraint.
This leads to no. of experiments = 17.
[Figure: design region shown in three panels, Glycerol = 0.2, Glycerol = 0.3 and Glycerol = 0.4]
Calculated reference mixture: (0.2 / 0.2 / 0.3 / 0.3) (DWL1 / DWL2 / water / glycerol)
[Figure: summary of fit (including Reproducibility) and normal probability plot of standardized residuals for Lifetime~; N=24, DF=14, Cond. no.=12.3206, Y-miss=0; R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358]
Regression coefficients
Please do not hesitate to ask the course instructor(s) for help or advice.
Remember that our solutions are just proposals; other alternatives might exist.
Exercises
Getting started: ByHand, CakeMix
Optimization: Chiral Separation, Metabolism, RGA-Phase 3, Willge, DrogenD
D-optimal Design: Model Updating
Blocking: Blocking
Mixture Design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp
Robustness Testing: Nonafact, RGA-Phase 4, HPLC Robustness
Robust Design: CakeTaguchi, LoafVolume
Background
Enamines are reduced by formic acid to saturated amines. In this example morpholine-camphor enamine is the starting material. To investigate the amount of formic acid necessary and at which temperature the reaction should be carried out, design of experiments (DOE) was used.
Objective
The original objective was to make a model for three responses. Our first objective is to do calculations by hand to get an understanding of the arithmetic involved. After that, you should familiarise yourself with the software and perform the same calculations using the computer. The experimental goal was to minimise the amount of side product (Camphor) and the amount of unreacted starting material (Enamine), whilst maximising the yield of the desired product.
Data
Factors                                        Levels:  -      0      +
x1  Amount formic acid/enamine (mole/mole)              1.0    1.25   1.5
x2  Reaction temperature (°C)                           25     62.5   100

Responses                          Goals
y1  Camphor (side product) %       to be minimised
y2  Enamine unreacted %            to be minimised
y3  The desired product %          to be maximised

          Factors          Responses
Exp. no   x1     x2        y1     y2     y3
1         1      25        6.7    12.5   80.4
2         1.5    25        10.5   14.0   72.4
3         1      100       5.5    0.0    94.4
4         1.5    100       7.7    0.0    90.6
5         1.25   62.5      7.5    13.1   84.5
6         1.25   62.5      7.9    13.5   85.2
7         1.25   62.5      7.8    13.3   83.8
Tasks
Task 1
Calculate by hand the coefficients of the equation y = b0 + b1x1 + b2x2 + b12x1x2 + e. Do these calculations only for the first response, y1. Use only the four corner experiments when calculating b1, b2, and b12; include the centre points only when calculating the constant, b0. Otherwise the centre points are used for diagnostics. Hint: use the sign table mentioned in the lecture.
Task 2
Initiate a new investigation in MODDE and define the two factors and the three responses according to the information above. Do File/New and give a name of the investigation. Press Next. Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and low and high settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press OK. Press Next. Now we have defined the factors. Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the first response. Press Add another and give the details of the second response. Press Add another and enter the information regarding the third response. Press OK. Press Next. Now we have defined the responses. Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in four runs. Verify that the number of Centre Points = 3 and Total runs = 7. Press Finish. Set Worksheet Run order to detect curvature and press OK. Now we have generated the experimental design. Enter the response values in the resulting worksheet. Now we are ready for data analysis.
Task 3
Evaluate the raw data. Make replicate plots (Worksheet/Replicate Plot) and histograms (Worksheet/Histogram) to examine the responses. Do Analysis/Fit. Evaluate the model. For which responses is the model reliable? What do you think could be the problem with the misbehaving response? Discuss.
Task 4
Look at the contour plot for each response (Prediction/Contour Plot Wizard). Which conditions should be chosen for preparative large-scale reduction of enamines of the morpholine-camphor type?
Solutions to ByHand
Task 1
Sign table
Exp. no   b0   b1   b2   b12
1         +    -    -    +
2         +    +    -    -
3         +    -    +    -
4         +    +    +    +
5         +    0    0    0
6         +    0    0    0
7         +    0    0    0
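As a cross-check of the hand calculation, the arithmetic behind the sign table can be sketched in Python. This is an illustration rather than the course's official solution: the names `SIGNS`, `Y1_CORNERS`, `Y1_CENTRE`, and `effect_coefficient` are hypothetical, and the sketch assumes coded units (-1/+1) with the y1 values taken from the worksheet above.

```python
# Sign table for the four corner experiments (runs 1-4), in coded units.
SIGNS = {
    "b1": [-1, +1, -1, +1],   # x1
    "b2": [-1, -1, +1, +1],   # x2
    "b12": [+1, -1, -1, +1],  # x1*x2, the product of the two columns above
}
Y1_CORNERS = [6.7, 10.5, 5.5, 7.7]  # Camphor (%), runs 1-4
Y1_CENTRE = [7.5, 7.9, 7.8]         # Camphor (%), runs 5-7


def effect_coefficient(signs, responses):
    """Signed sum of the responses divided by the number of corner runs."""
    return sum(s * y for s, y in zip(signs, responses)) / len(responses)


coefficients = {name: effect_coefficient(col, Y1_CORNERS)
                for name, col in SIGNS.items()}

# The constant b0 is the average of all seven runs, centre points included.
all_runs = Y1_CORNERS + Y1_CENTRE
coefficients["b0"] = sum(all_runs) / len(all_runs)

for name in ("b0", "b1", "b2", "b12"):
    print(f"{name} = {coefficients[name]:.3f}")
```

With the worksheet values this gives b0 of about 7.66, b1 = 1.5, b2 = -1.0, and b12 = -0.4 in coded units.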
Task 3
We start by evaluating the raw data. The three replicate plots indicate that the replicate error is small for each response, which is favourable for the data analysis. It is possible to use the replicate plot to get a rough understanding of the relationships between the factors and the responses. We are going to fit an interaction model to each response. For such a model to be valid, the measurement values of the centre-points should be found in the middle part of the response interval. This is the case for y1 and y3, but not for y2. Hence, the replicate plot for y2 suggests that the relationship between y2 and the factors is curved (non-linear), which is impossible to describe with an interaction model.
[Figure: replicate plots for y1, y2, and y3 against replicate index, with experiment number labels (investigation: Byhand).]
Next, we create one histogram for each response. The three responses are approximately normally distributed and a need for response transformation cannot be detected.
[Figure: histograms of y1, y2, and y3 (investigation: Byhand).]
After the raw data evaluation it is appropriate to carry out the regression modelling. According to the summary of fit plot, the model is reliable for all responses except y2, Enamine unreacted. The reason for this can be any of the following:
- the response includes an outlier
- a mistake was made in recording the response, for example the zeros are missing values
- the model is too simple
- the model is too complicated
[Figure: summary of fit (R2, Q2, Model Validity, Reproducibility) for y1, y2, and y3 (investigation: Byhand, MLR). N=7, DF=3, Cond. no.=1.3229.]
Since we understood from the replicate plot of y2 that curvature is involved, it is likely that the fitted model is too simple. This can easily be checked by making plots of the raw data.
[Figure: raw data plots of y2 against x1 and of y2 against x2, with experiment number labels (investigation: Byhand).]
From the scatter plots shown above the curvature is obvious. Such curvature can only be adequately captured by quadratic model terms, i.e. x1^2 and x2^2. The conclusion is therefore that with the current experimental design we cannot make a good model for y2. To estimate quadratic model terms the design must be expanded to become a composite design.
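The curvature argument can also be checked numerically by comparing the average of the centre points with the average of the four corner experiments. The sketch below is a simple illustration, not a MODDE feature; the names `CORNERS`, `CENTRES`, and `curvature_gap` are hypothetical. For an interaction model the two averages should be close, so a large gap hints at curvature.

```python
# Responses at the corner runs (1-4) and centre-point runs (5-7) of ByHand.
CORNERS = {
    "y1": [6.7, 10.5, 5.5, 7.7],
    "y2": [12.5, 14.0, 0.0, 0.0],
    "y3": [80.4, 72.4, 94.4, 90.6],
}
CENTRES = {
    "y1": [7.5, 7.9, 7.8],
    "y2": [13.1, 13.5, 13.3],
    "y3": [84.5, 85.2, 83.8],
}


def mean(values):
    return sum(values) / len(values)


def curvature_gap(response):
    """Centre-point average minus corner average for one response."""
    return mean(CENTRES[response]) - mean(CORNERS[response])


for name in ("y1", "y2", "y3"):
    print(f"{name}: centre - corners = {curvature_gap(name):+.3f}")
```

For y1 and y3 the gap is a fraction of a percentage point, whereas for y2 it is several percentage points, which is consistent with the curvature seen in the raw data plots.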
Task 4
According to the response contour plots the temperature should be as high as possible and the ratio formic acid/enamine as low as possible. With these conditions we minimise the amount of Camphor (y1) and Enamine (y2) and maximise the amount of Product (y3).
NOTE: Because of the model weakness with regards to y2 we should interpret the second response contour plot with some caution.
Conclusions
The optimal point is low x1 (low molar ratio) and high x2 (high temperature). The model for y2 is weak because the relationship between the factors and this response is non-linear.
Background
The producer of a commercial cake-mix experienced problems with the quality of the resulting cake in that there was considerable taste variation.
Objective
It was decided to use DOE to discover which combination of ingredients produced a tasty cake and which combination produced a reasonable cake at low cost.
Data
Three factors were studied: Flour, Shortening, and Eggpowder. The investigators used a design centred around the standard condition Flour = 300g, Shortening = 75g, and Eggpowder = 75g. Eleven experiments were made using a 2^3 full factorial design (eight runs) augmented with three replicated centre-points. The response is the average taste as assessed by a trained sensory panel.
Goal: Maximize
Tasks
Task 1
Define a new investigation in MODDE with three factors and one response. Do File/New and name the investigation. Press Next. Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press Add another and enter the details of the third factor. Press OK. Press Next. The three factors have now been defined. Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the Taste response. Press OK. Press Next. Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in eight runs. Verify that Centre Points = 3 and Total runs = 11. Press Finish. Set Worksheet Run Order to detect curvature. Enter the response values in the generated worksheet. Now you are ready to analyse the data.
Evaluate the raw data. Fit the regression model. Which factors affect taste? Are there any non-significant model terms? What about lack of fit? Which factor combination gives an optimal taste?
Task 2
It is possible to take the cost of ingredients into account in the data analysis. The following prices were obtained:
Flour 2.95 SEK/kg (0.00295 SEK/g)
Shortening 14.70 SEK/kg (0.0147 SEK/g)
Eggpowder 32.30 SEK/kg (0.0323 SEK/g)
Define a new response, a Derived response. Select Design and Responses. Double-click on the empty row. Define a derived response and press Edit, Next and Finish to enter the formula. Select ingredient from the list and multiply by the cost per gram, as shown below (NB: the parentheses shown in the formula are only used for clarity, they are not needed in reality). Also note that this task does not work with comma as decimal separator.
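The derived response is simply each ingredient amount multiplied by its price per gram, summed over the three ingredients. A minimal Python sketch of that formula follows; it is not MODDE syntax, and the names `PRICE_PER_GRAM` and `ingredient_cost` are hypothetical.

```python
# Prices in SEK per gram, taken from the list in Task 2.
PRICE_PER_GRAM = {
    "Flour": 0.00295,
    "Shortening": 0.0147,
    "Eggpowder": 0.0323,
}


def ingredient_cost(flour_g, shortening_g, eggpowder_g):
    """Total ingredient cost in SEK for one recipe (amounts in grams)."""
    return (PRICE_PER_GRAM["Flour"] * flour_g
            + PRICE_PER_GRAM["Shortening"] * shortening_g
            + PRICE_PER_GRAM["Eggpowder"] * eggpowder_g)


# Cost at the standard (centre-point) condition of the design.
print(f"{ingredient_cost(300, 75, 75):.2f} SEK")
```

The same function can be evaluated at any candidate recipe read off a contour plot, which makes it easy to compare taste against cost.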
Refit the model. Find a recipe which represents a good compromise between a tasty cake and low cost. (Hint: Use Prediction/Contour Plot Wizard).
Solutions to CakeMix
Task 1
We start by evaluating the raw data. First, we examine the curvature diagnostics plot (Worksheet/Curvature Diagnostics Plot) for taste (see below). This plot is constructed by plotting the value of Taste at three points, (1) the -/-/- factor combination, (9) the 0/0/0 factor combination, and (8) the +/+/+ factor combination. It is useful to examine whether the relationship between one response and the factors deviates from linearity. In this case the deviation from linearity is only mild and we may continue with the rest of the experiments. Whenever this plot exhibits strong curvature, reduce the range of the factors by 2/3.
[Figure: curvature diagnostics plot of Taste at the Low (exp. 1), Center (exp. 9), and High (exp. 8) factor settings (investigation: Cakemix). The value in parentheses after each x-axis label is the normalized distance from the plotted experiment to the ideal design point of the design; here it is 0 for all three points.]
The replicate plot shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed. This means that we have good data to work with.
[Figure: plot of replications for Taste with experiment number labels, and histogram of Taste (investigation: Cakemix).]
In the data analysis it is recommended to first examine the Summary of fit plot. This plot shows that we can explain 99% (R2 = 0.99) and predict 87% (Q2 = 0.87) of the response variation. The adequacy of the model is further indicated by MVal = 0.71 and Rep = 0.99. MVal measures the validity of the model and Rep the reproducibility. When the MVal bar is larger than 0.25, there is no Lack of Fit of the model (the model error is in the same range as the pure error). This is also shown by the ANOVA-table below, where the lower p-value is larger than 0.05, which means that the model exhibits no significant lack of fit. The upper p-value is smaller than 0.05, indicating that R2 is statistically significant. If the reproducibility is below 0.5, you have a large pure error, poor control of the experimental set up (the noise level is high), and you cannot assess the validity of the model. This results in low R2 and Q2. You should improve the reproducibility.
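The two rules of thumb in this paragraph (Model Validity above 0.25, Reproducibility not below 0.5) can be written as a small check. This is only a sketch of the reasoning in the text, not a MODDE function; the name `fit_warnings` and the wording of the messages are hypothetical.

```python
def fit_warnings(model_validity, reproducibility):
    """Return warnings based on the two rules of thumb quoted in the text."""
    warnings = []
    if reproducibility < 0.5:
        # Large pure error: the validity of the model cannot be assessed.
        warnings.append("low reproducibility: improve experimental control")
    elif model_validity <= 0.25:
        # Model error is large compared with the pure error.
        warnings.append("possible lack of fit")
    return warnings


# CakeMix full model: MVal = 0.71, Rep = 0.99
print(fit_warnings(0.71, 0.99))
```

An empty list means that neither rule of thumb is violated; it does not replace inspection of R2, Q2, and the residuals.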
[Figure: summary of fit (R2, Q2, Model Validity, Reproducibility) for Taste (investigation: Cakemix, MLR). N=11, DF=4, Cond. no.=1.1726.]
Another diagnostic tool that is often used is the N-plot of residuals. However, with only 11 experiments it is difficult to define which residuals are normally distributed and which are not. In the plot below, the main thing to confirm is that all the experiments lie within 4 SDs, which they do. Inspection of the regression coefficients indicates that two model terms, Fl*Sh and Fl*Egg, are non-significant and can be removed from the model.
[Figure: N-plot of residuals for Taste with experiment number labels, and scaled and centred coefficients for Taste (investigation: Cakemix, MLR). N=11, DF=4, R2=0.995, Q2=0.874.]
After refitting the model a higher Q2 (0.94) is obtained. MVal and the ANOVA table also indicate the usefulness of the model. In the N-plot of residuals, experiment #1 is located beyond 4 SDs, but it is considered harmless given the high Q2 of about 0.94.
[Figure: summary of fit, N-plot of deleted studentized residuals, and coefficient plot (Sh, Egg, Sh*Egg among the terms) for the refitted Taste model (investigation: Cakemix, MLR). N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Cond. no.=1.1726.]
The coefficient plot indicates that the largest model term is the Sh*Egg interaction. It is normal to explore such interactions by means of response contour plots. The three response contour plots shown below indicate that the highest value of taste is found with the factor settings Flour = 400g, Shortening = 50g and Eggpowder = 100g.
[Figure: response contour plots of Taste as a function of Shortening (Sh) and Eggpowder (Egg).]
Task 2
In Task 1 we found that Flour should be fixed at its high level in order to produce a tasty cake. This ingredient is also the cheapest one. The contour plots shown below were constructed using Flour = 400 g.
Apparently, we should stay in the upper left-hand corner to maximize taste. In this corner, the predicted ingredient cost is 5.14 SEK. However, the lower right-hand corner represents a reasonable compromise between taste and cost. Here the predicted cost is just 4.27 SEK.
Conclusions
To maximize taste we should use Flour = 400g, Shortening = 50g and Eggpowder = 100g. To obtain a compromise between high taste and low cost an alternative factor combination would be Flour = 400g, Shortening = 100g, and Eggpowder = 50g.
Background
A new combination of constituents in a formulation with pain-relieving capacity was investigated. The formulation contained two active components, A and B, and the effect of different combinations of these was examined. The response was the time (in minutes) needed for the formulation to reach full anaesthetic effect (the average from testing 12 persons). The desirable result was full effect after 5 minutes. Substance A costs 60 times more than B to produce. Since every experiment was very expensive, the number of experiments was minimised.
Objective
The first objective is to optimise time, with two variables, through the use of contour plots. Another objective is to consider production economy.
Data
Goal: 5 minutes
Tasks
Task 1
Define a new investigation according to the information given above. The default experimental plan in MODDE is the one used in this application. Fit the regression model. Construct a plot that shows under which conditions the formulation achieves the desired effect.
Task 2
Which approved combination of A and B is the most economical? Do this with graphical tools. (Hint: Add the derived response Cost).
Task 3
If the desirable result was full effect within 5 minutes, what would the answer to Task 2 be (at 95% confidence)?
Solutions to Pain
Task 1
In the data analysis it is recommended to examine the R2/Q2 plot first (summary of fit). This plot shows that we can explain 99% (R2 = 0.99) and predict 98% (Q2 = 0.98) of the response variation. Also the statistics MVal = 0.94 and Rep = 0.98 point to an excellent model. The coefficient plot shows that constituent A (CA) affects the release time more strongly than constituent B (CB). If the release time is to be minimised, we should increase the amounts of both constituents.
[Figure: summary of fit (R2, Q2, Model Validity, Reproducibility) and scaled and centred coefficients (CA, CB, CA*CB; unit: min) for Release time (investigation: Pain, MLR). N=7, DF=3, R2=0.995, Q2=0.984, Cond. no.=1.3229.]
Any point along the line Release time = 5 fulfils the experimental goal. (Hint: the number of steps in a contour plot can be changed by double-clicking in the plot, selecting Contour Levels and changing the number of steps and/or min/max. Here, we used 13 steps when going from Min = 4 to Max = 10.)
Task 2
We added a new response, cost (derived from the factors).
The most economical combination of A and B contains as little of A as possible, i.e., approximately CA = 8.9 and CB = 100. At this point, the predicted cost is 634 (in arbitrary currency unit). The left-hand contour plot only gives the point estimate of Release time. The prediction list shows the uncertainty in the predicted value. As shown by this list, using the combination CA = 8.9 and CB = 100 might result in a Release time ranging from 4.6 to 5.4 minutes.
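The derived Cost response is not written out in the text. A formula consistent with the background (A costs 60 times more than B) and with the reported value of 634 is Cost = 60*CA + CB, with B priced at one arbitrary unit per mg. The sketch below assumes that formula; the names `PRICE_RATIO_A_TO_B` and `formulation_cost` are hypothetical.

```python
# Assumption: substance B costs 1 arbitrary unit per mg, A costs 60 times more.
PRICE_RATIO_A_TO_B = 60


def formulation_cost(ca_mg, cb_mg):
    """Cost of one formulation in arbitrary currency units."""
    return PRICE_RATIO_A_TO_B * ca_mg + cb_mg


# The most economical point found on the Release time = 5 contour.
print(round(formulation_cost(8.9, 100)))
```

Because A dominates the cost, moving along the Release time = 5 contour towards less A and more B lowers the cost, which is why the optimum sits at the highest level of B.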
Task 3
If we want to make sure that the Release time does not exceed 5 minutes, we have to move the upper confidence limit from approximately 5.4 to 5.0. This, in turn, implies that we should be looking for a point estimate of approximately 4.6. From the prediction list given below we conclude that, in order to be sure that the pain reliever does not take longer than 5 minutes to reach full effect, we need 9.45 mg of substance A and 100 mg of substance B (the cheapest solution). The limits are given with 95% confidence.
Conclusions
In order to accomplish full anaesthetic effect within five minutes we may use the recipe constituent A = 9.45 mg and constituent B = 100 mg. This combination of ingredients is the most economical one.
Background
A manufacturer experienced problems with variation in the thickness of tablets. The variation caused problems during packaging. The problem was tackled by determining which factors had the largest influence on the thickness of the tablets. Three factors that were considered to have an impact on the thickness of the tablets were investigated using experimental design. These factors were:
- Amount of stearate (lubricant)
- Amount of active substance
- Amount of starch
Objective
The objective of the investigation was to produce an experimental design and model the response. The goal was to produce a 5 mm thick tablet with a fixed level (90 mg) of active substance.
Data
Goal: 5 mm
Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Select Screening as objective. Accept the recommended 11 run design (Full factorial design in 8 runs plus 3 centre-points). Enter the response values in the Worksheet.
Task 2
Do Analysis/Fit. Determine which factors have the strongest influence on the thickness of tablets by looking at the coefficient plot. Are there any interaction effects present?
Task 3
How would you produce a 5 mm thick tablet with 90 mg active substance? With what precision can this be done (5 mm ± ???)? Hint 1: Use a response contour plot to find suitable factor combinations at which to perform predictions. Hint 2: Use the prediction list and compute the predicted value and its associated confidence interval.
Solutions to Tablet
Task 2
As seen below, the factors active substance and starch have a strong influence on the thickness of the tablets. The factor stearate has a small influence. The interaction between Stearate and Starch is small but should be included since the R2 and Q2 numbers decline when it is removed.
[Figure: summary of fit and scaled and centred coefficients (unit: mm) for thickness (investigation: Tablet, MLR). Upper plots, full interaction model: N=11, DF=4, R2=0.969, Q2=0.504. Lower plots, refined model: N=11, DF=6, R2=0.953, Q2=0.808. Cond. no.=1.1726.]
The two lower plots show the result when we have removed insignificant terms. The principle for removing model terms is that of maximisation of Q2. The term ste*actsu has the smallest coefficient and is removed first. The model is then recalculated with the remaining terms and we compare Q2 with the original model. In this case Q2 increases from 0.50 to 0.78. We then continue by removing the second smallest interaction term, actsu*sta, and check Q2. Again Q2 increases, from 0.78 to 0.81. When removing the last interaction term, ste*sta, Q2 drops a little. This indicates that the four-term model displayed above is predictively the optimal one. This example also shows that a small model term need not necessarily be excluded from the model just because it is insignificant according to the confidence interval criterion.
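The idea of judging a term by its effect on Q2 can be illustrated with a short Python sketch. The Tablet worksheet is not reproduced in this text, so the example uses the y1 data from the ByHand exercise instead. It assumes the common definition Q2 = 1 - PRESS/SS, with PRESS from leave-one-out residuals and SS the total sum of squares around the mean; MODDE's own computation may differ in detail. The function name `r2_q2` is hypothetical, and the leverage shortcut used is valid only for designs with mutually orthogonal model columns, which the code verifies.

```python
# Coded model matrix (constant, x1, x2, x1*x2) for ByHand runs 1-7.
X_FULL = [
    [1, -1, -1, +1],
    [1, +1, -1, -1],
    [1, -1, +1, -1],
    [1, +1, +1, +1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
X_REDUCED = [row[:3] for row in X_FULL]  # same model without x1*x2
Y1 = [6.7, 10.5, 5.5, 7.7, 7.5, 7.9, 7.8]


def r2_q2(X, y):
    """R2 and leave-one-out Q2 for a model with orthogonal columns."""
    n, p = len(X), len(X[0])
    # The shortcut below requires mutually orthogonal columns.
    for a in range(p):
        for b in range(a + 1, p):
            assert sum(row[a] * row[b] for row in X) == 0
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    coefs = [sum(row[j] * yi for row, yi in zip(X, y)) / col_ss[j]
             for j in range(p)]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = press = 0.0
    for row, yi in zip(X, y):
        fitted = sum(c * x for c, x in zip(coefs, row))
        leverage = sum(x * x / s for x, s in zip(row, col_ss))
        resid = yi - fitted
        ss_res += resid ** 2
        # Leave-one-out residual obtained without refitting the model.
        press += (resid / (1 - leverage)) ** 2
    return 1 - ss_res / ss_total, 1 - press / ss_total


for label, X in (("with x1*x2", X_FULL), ("without x1*x2", X_REDUCED)):
    r2, q2 = r2_q2(X, Y1)
    print(f"{label}: R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

In this small example Q2 falls when the interaction term is dropped, so the term would be kept. This is the same decision rule as the one applied to ste*sta above.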
Task 3
In the contour plot, the line where the thickness of the tablets is predicted to be 5 mm is of interest. Below, we give some predictions for different factor combinations found along the 5 mm line. According to these predictions the tablets can be made with the following precision: 5.00 ± 0.09 mm up to 5.00 ± 0.12 mm, depending on which factor combination is selected for production.
[Figure: response contour plot of thickness with Active substance fixed at 90 mg.]
Conclusions
Active substance and starch are the two ingredients most profoundly affecting the tablet thickness. According to model predictions the tablets can be made with the precision 5.00 ± 0.09 mm if the factor combination stearate = 1.0 mg, active substance = 90 mg, and starch = 44.5 mg is used in the manufacturing.
Background
Spray-drying is a process often used for drugs intended for inhalation. For the spray-drying of proteins, the prime interest is to produce particles of controlled size. Additionally, it is important that the protein temperature remains rather low to avoid unnecessary denaturation. Protein degradation may involve many complicated physical and chemical processes, including denaturation. Therefore, we would like to study protein stability at a molecular level in order to facilitate formulation applications.
Objective
This example is based on a model protein (D7599) developed by AstraZeneca. Protein powders of D7599 were produced by spray-drying. The experimental objective of this study was to determine which process parameters influence the quality of the spray-dried product. The data analysis will involve dealing with several responses which are not completely correlated. Original data source: Cronholm, M., The Effect of Process Variables on a Spray-dried Protein Intended for Inhalation, Undergraduate Research Study, Department of Pharmaceutics, Uppsala University, Uppsala, Sweden, 1998.
Data
Spray-drying conditions were varied using a full factorial design in four factors:
- Inlet Temperature: temperature of the drying air at the inlet of the equipment. The high and low levels of this factor were set such that degradation would be expected at the high level (220 °C) but not at the low level (100 °C).
- Atomization gas flow: the low level (500 l/h) of the atomization gas (nitrogen) was the minimum required to achieve sufficient energy for atomization. The high level (800 l/h) was the maximum achievable flow with this spray-dryer.
- Aspiration rate: the aspirator draws air through the instrument; this was varied from 60% to 100% (full capacity).
- Feed-flow: the material flow through the equipment. The high level of 5 ml/min was the maximum rate which could be used at the low temperature without condensation appearing in the drying chamber, whereas the low level (2 ml/min) was chosen as the slowest practical rate.

To characterize the outcome of the spray-drying, the following five responses were measured:
- Yield: the amount of product produced. Should be maximized.
- Size: particle size. Ideally, particles should be in the range 0.5 to 3.3 µm in order to reach the lower airways.
- Water: water content in the spray-dried protein. To be minimized.
- Outlet temperature: outlet drying air temperature. This temperature may influence protein degradation and was therefore included. No specific target value was specified for this response.
- HMWP: high molecular weight proteins. Measures the extent of aggregation, i.e., formation of dimers and oligomers of the protein. Should be as low as possible.
Tasks
Task 1
Initiate a new investigation in MODDE. Define the four factors and the five responses according to the information above. Select Screening and the full factorial design in 16 runs supplemented with three center-points. Enter the response data or copy them from PROTEIN SPRAY DRYING.XLS. Evaluate the raw data. Is there any need for data pre-treatment such as a response transformation?
Task 2
Select MLR as the fit method. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model, if necessary. Use the optimizer to predict good operating parameters.
Task 3
In a MODDE investigation you can only have one model (i.e., one set of model terms) for all the responses. Hence, to generate several models with the same factors and underlying design, but for different responses, we make copies of the original investigation, and in each copy keep the responses that will be fitted with the same model. Thereafter we can link all these responses into one of the investigations (File/Link Investigation) and optimize them together. Try to improve the modelling results, by dividing the responses in two separate projects. One project may contain Yield and Size, and another project Water, Outlet Temp and HMWP. Another possibility is to split the mother investigation into five new investigations and tailor-make one model for each response. Repeat Task 2, but analyze sub-sets of responses. Optimize the responses together. There is no solution provided to this Task.
[Figures: plots of replications with experiment number labels for the responses, including Size, Water, Outlet Temp, HMWP, and HMWP~, and histograms of the responses, including Outlet Temp and HMWP (investigation: Protein Spray Drying).]
When dealing with many response variables you should always check the correlation matrix. It will suggest how the variables are correlated. An excerpt of the correlation matrix is shown below. This table indicates there are two groups of responses. The first sub-set contains Yield and Size which correlate with the coefficient 0.75. The second group is made up of Water, Outlet Temp and HMWP, which also have high pairwise correlation coefficients (-0.75, -0.88, and 0.88). Because of the subgrouping of the responses we should not expect them to depend in the same way on the various terms in the regression model.
Copyright Umetrics AB, 04-02-10
Task 2
MLR was used to fit an interaction model to each of the five responses, each of which has 11 model terms (the constant, four linear terms, and six two-factor interactions). As seen below, we have good models for all responses except HMWP.
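The term count and the residual degrees of freedom quoted with the plots follow from simple counting, sketched below in Python. The function names are hypothetical; the counts themselves (11 terms, N=19, DF=8) come from the text and the plot footers.

```python
from math import comb


def interaction_model_terms(n_factors):
    """Constant + linear terms + all two-factor interactions."""
    return 1 + n_factors + comb(n_factors, 2)


def residual_df(n_runs, n_terms):
    """Degrees of freedom left for the residuals after fitting the model."""
    return n_runs - n_terms


terms = interaction_model_terms(4)   # four spray-drying factors
print(terms, residual_df(19, terms))  # 16 factorial runs + 3 centre-points
```

Removing two interaction terms from the model leaves 9 terms and therefore 10 residual degrees of freedom for the same 19 runs.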
[Figure: summary of fit (R2, Q2, Model Validity, Reproducibility) for Yield, Size, Water, Outlet Temp, and HMWP~ (investigation: Protein Spray Drying, MLR). N=19, DF=8.]
The coefficient overview plot below shows all model coefficients (except the constant term) for each response variable. The first two responses (Yield and Size) are dominated by the Atomization gas flow. The Aspiration rate also has an influence on the Yield. The other three responses are highly influenced by the setting of the Inlet Temperature. Water content in the spray-dried protein is also dependent on the Aspiration rate. The different dependence on the factors is consistent with Yield and Size being correlated with each other, and Water, Outlet Temp, and HMWP being correlated with each other.
[Figure: coefficient overview plot (normalized coefficients) for the five responses (investigation: Protein Spray Drying, MLR). N=19, DF=8.]
In an attempt to improve the five models, two model terms were removed. These were: Ato*FF and Asp*FF. Primarily, this gave a much better model for HMWP.
[Figure: summary of fit (R2, Q2, Model Validity, Reproducibility) for Yield, Size, Water, Outlet Temp, and HMWP~ after model revision (investigation: Protein Spray Drying, MLR). N=19, DF=10.]
The coefficients of the revised models are plotted below in the Coefficient Overview plot.
[Figure: coefficient overview plot (normalized coefficients) for the revised models, with the terms InT, Ato, Asp, FF, InT*Ato, InT*Asp, InT*FF, and Ato*Asp (investigation: Protein Spray Drying, MLR). N=19, DF=10.]
Further review of the models using N-plots of residuals shows a mild outlier for Outlet Temp (exp 10), but due to the high R2 and Q2 for this response this point is not alarming. No N-plots are shown. We then decided to use the above models together with the Optimizer to predict a factor combination representing good operating conditions. The response desirabilities were set according to the experimental goals given in the Data section.
The results of running the Optimizer are shown below. Apparently, we have not completely fulfilled the desirabilities of the responses, but many simplexes have reached a point where many of the goals are met. The main difficulty is meeting the requirements on Water. The following approximate operating parameters are suggested in order to comply with most of the endpoints as well as possible: Inlet temperature 160 °C, Atomization gas-flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min.
Comment: Frequently, the optimizer is run iteratively in several steps, letting the results of the preceding stage dictate how to relax factor settings in the next stage. A practical way to do this is first using the optimizer for interpolation and then for extrapolation. In the current application, however, factor limits could not be changed in a second cycle of the optimizer, since they were already set according to performance limitations of the equipment used.
Conclusions
It is possible to develop strong models for the five responses. Good operating conditions predicted by the models are: Inlet temperature 160 °C, Atomization gas-flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min. A further experiment should be done to verify the results at this point, and future work could involve an optimization study anchored around these settings.
Background
The organic synthesis of semi-carbazone from glyoxylic acid is a key step in the synthesis of azuracil (a cytostaticum, anti-cancer drug).
Objective
The objective of this study was to investigate the best operating conditions of a pilot plant for synthesising semi-carbazone. A fractional factorial design in four factors was constructed and three responses were measured. The aims of this experimental protocol were to obtain a high yield of semi-carbazone, high purity, and rapid filtration. This exercise also mirrors some of the difficulties that might appear when several responses have to be considered simultaneously.
Data
Factors
- Time for addition of glyoxylic acid (h)
- Stirring time (h)
- Reaction temperature (°C)
- Amount of water added (ml/mol)

Responses
- Yield (%), isolated. Goal: High
- Purity (%), titrimetric. Goal: High
- Filtration (ordinal scale, -5 worst, 5 best). Goal: High
Tasks
Task 1
Write down the computational matrix with + and - signs. Describe the defining relation and list the confounding pattern for the linear terms and the two-factor interactions.
Task 2
Solve the problem with MODDE. Note that the design does not include centre-points, hence you will see no bars relating to Model Validity and Reproducibility in the Summary of Fit plot. (Hint: add some interaction terms to the model and discuss the problems this might introduce).
Task 3
Show graphically which part of the experimental space should be chosen for the first experiment in the pilot plant (specify levels for the variables). Goal: High Yield, Purity and Filtration.
Task 4
Which method is commonly used to separate confoundings between two-factor interactions?
[Table: computational matrix of + and - signs for runs 1 to 8, with columns const, a, b, c, d, ab, ac, ad, bc, bd, cd]
N.B. In the literature there are two ways of describing the generators and the interactions, with letters and with numbers. We use the more conventional LETTERS.
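The computational matrix and confounding pattern asked for in Task 1 can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the half fraction is taken to be generated by d = abc (defining relation I = abcd), which is the usual resolution IV choice for four factors in eight runs but is not confirmed by the exercise text. The run order printed here is also not necessarily the one used in the worksheet.

```python
from itertools import product

# Assumption: the half fraction is generated by d = abc (I = abcd).
base = list(product([-1, 1], repeat=3))             # full 2^3 factorial in a, b, c
runs = [(a, b, c, a * b * c) for a, b, c in base]   # append d = abc

def column(run, term):
    """Sign of a model term such as 'ab' for one run."""
    signs = dict(zip("abcd", run))
    value = 1
    for letter in term:
        value *= signs[letter]
    return value

terms = ["a", "b", "c", "d", "ab", "ac", "ad", "bc", "bd", "cd"]
columns = {t: tuple(column(r, t) for r in runs) for t in terms}

# Terms whose sign columns are identical cannot be separated: they are confounded.
aliases = {}
for term, col in columns.items():
    aliases.setdefault(col, []).append(term)
alias_groups = sorted(g for g in aliases.values() if len(g) > 1)

# Print the sign matrix for the four factors, one run per line.
for i, run in enumerate(runs, start=1):
    print(i, " ".join("+" if s > 0 else "-" for s in run))
print(alias_groups)
```

Under this assumption the main effects are clear of two-factor interactions, while the six two-factor interactions fall into three confounded pairs.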
Task 2
To the left, we see the confounding pattern. The problem is that we cannot be sure of which of the confounded interaction terms is important when we get a significant coefficient (Note: a model like this one cannot be fitted with MLR, since it contains confounded terms and we only have 8 runs).
Task 3
A linear model is a good choice for Purity and Filtration, but not for Yield.
[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration. Investigation: Pilot plant (MLR); N=8, DF=3, Cond. no.=1.0000, Y-miss=0]
From the regression coefficient plot of Yield, we can see that we have big confidence limits and hence great model uncertainty. One way to improve the model might be to add the interaction between the two largest main effects, i.e., Ad*Te and refit the model.
[Figure: Scaled & Centered Coefficients for yield (Ad, St, Te, wa). Investigation: Pilot plant (MLR); N=8, DF=3, R2=0.676, Q2=-1.304]
The model for Yield improves considerably, but the predictive ability for Purity and Filtration degrades.
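To make the R2 and Q2 figures quoted in these plots more concrete, the following sketch computes both for an eight-run, two-level design. Several things here are assumptions rather than facts from the exercise: the generator d = abc, the response values (invented purely for illustration, not the pilot-plant data), and the definition Q2 = 1 - PRESS/SS with leave-one-out PRESS, which is the common textbook form. The leave-one-out residuals use the leverage shortcut e/(1 - h), where h = p/N holds only for an orthogonal +/-1 design without centre-points.

```python
from itertools import product

# Coded 2^(4-1) design, assuming the generator d = abc.
design = [(a, b, c, a * b * c) for a, b, c in product([-1, 1], repeat=3)]
# Hypothetical response values, for illustration only.
y = [82.0, 85.5, 80.1, 86.9, 84.2, 88.0, 83.3, 90.4]

def fit_r2_q2(columns, y):
    """R2 and leave-one-out Q2 for a constant plus orthogonal +/-1 columns."""
    n = len(y)
    X = [[1] + list(row) for row in columns]
    p = len(X[0])
    # Orthogonal design: each coefficient is simply (column . y) / n.
    coefs = [sum(x[j] * yi for x, yi in zip(X, y)) / n for j in range(p)]
    fitted = [sum(b * xj for b, xj in zip(coefs, x)) for x in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    mean = sum(y) / n
    ss_total = sum((yi - mean) ** 2 for yi in y)
    ss_resid = sum(e ** 2 for e in residuals)
    leverage = p / n                      # same for every run in this design
    press = sum((e / (1 - leverage)) ** 2 for e in residuals)
    return 1 - ss_resid / ss_total, 1 - press / ss_total

# Main-effects model (5 parameters, DF = 3).
r2_main, q2_main = fit_r2_q2(design, y)
# Add the a*c interaction column (6 parameters, DF = 2).
with_interaction = [row + (row[0] * row[2],) for row in design]
r2_int, q2_int = fit_r2_q2(with_interaction, y)

print(round(r2_main, 3), round(q2_main, 3))
print(round(r2_int, 3), round(q2_int, 3))
```

Adding a term can only raise R2, but it also raises the leverage of every run, so Q2 may rise or fall. With so few degrees of freedom Q2 can easily become negative, as in the Yield model above.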
[Figure: Summary of Fit for yield, purity and filtration after adding Ad*Te. Investigation: Pilot plant (MLR); N=8, DF=2, Cond. no.=1.0000, Y-miss=0]
The overview of regression coefficients shows the importance of the added interaction term for Yield. It is also apparent that the second main effect (Stirring time) is insignificant for all three responses. Remove Stirring time and refit the model.
[Figure: overview of scaled and centered coefficients (Ad, St, Te, wa, Ad*Te) for yield, purity and filtration; N=8, DF=2, Cond. no.=1.0000, Y-miss=0]
After the deletion of Stirring time, much better models were obtained. When calculating three models with the same model terms, this is the best result.
[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration after removing Stirring time. Investigation: Pilot plant (MLR); N=8, DF=3, Cond. no.=1.0000, Y-miss=0]
The coefficient overview plot may be used in trying to solve the problem. Recall that our goals are high Yield, high Purity and high Filtration. Addition time and Temperature are the two most important terms. We make contour plots with these as axes. As constant, we set the amount of water added at its centre level (because it has a negative effect for Yield and a positive one for Purity and Filtration).
[Figure: coefficient overview (Ad, Te, wa, Ad*Te) for yield, purity and filtration; Cond. no.=1.0000, Y-miss=0]
By using long addition time and high temperature the goal of simultaneously high Yield, Purity and Filtration is accomplished. The use of the MODDE Optimiser is shown below. Note that the factors have been set for extrapolation outside the investigated area.
Task 4
The method used to unconfound two-factor interactions is called FOLD-OVER.
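A small sketch can show why a fold-over separates confounded two-factor interactions. It assumes, as an illustration only, that the original half fraction was generated by d = abc. For that design, reversing the signs of all four factors merely reproduces the same eight runs, so the sketch folds over on a single factor (a), which is one common variant; whether MODDE would choose this particular complement is not stated in the exercise.

```python
from itertools import product

# Assumption: original half fraction generated by d = abc (I = abcd).
original = [(a, b, c, a * b * c) for a, b, c in product([-1, 1], repeat=3)]

# Reversing ALL signs leaves the product abcd unchanged, so this "full"
# fold-over only repeats the original runs for this particular design.
full_foldover = [(-a, -b, -c, -d) for a, b, c, d in original]

# Folding over on factor a alone gives the complementary fraction d = -abc.
fold_on_a = [(-a, b, c, d) for a, b, c, d in original]
combined = original + fold_on_a

def interaction_overlap(runs, i, j, k, l):
    """Sum over runs of (x_i * x_j) * (x_k * x_l); zero means orthogonal columns."""
    return sum(r[i] * r[j] * r[k] * r[l] for r in runs)

# ab versus cd (indices 0,1 and 2,3): identical before, orthogonal after.
before = interaction_overlap(original, 0, 1, 2, 3)
after = interaction_overlap(combined, 0, 1, 2, 3)
print(before, after, len(set(combined)))
```

The combined sixteen runs form the full 2^4 factorial, in which ab and cd (and the other confounded pairs) have orthogonal columns and can be estimated separately.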
Conclusions
In order to accomplish high yield, high purity, and rapid filtration, the factor combination of addition time 2 h, water 137.5 ml/mol and temperature 60 °C looks interesting and could be verified with additional experiments. The last factor, stirring time, may be set at a level convenient for the experimental process. The optimiser in MODDE indicates that even better results are obtainable when relaxing the high limit of addition time to 2.3 h and the high limit of temperature to 80 °C.
Reference: J-C Vallejos, Diss. IPSOI, Marseille 1978.
Background
Reporter gene assays are used in mechanistic studies of gene regulation. They also have great potential when applied to toxicology and drug development. A reporter gene has an easily measurable phenotype whose transcription is controlled by a promoter. Reporter gene assays provide important information of gene regulation relating to expression (i.e. number of copies) and when and where a particular protein is formed.
Objective
The data-set used in this exercise originates from Active Biotech AB in Lund, Sweden, and we gratefully acknowledge Lena Schultz and Lisbeth Abramo for permitting us to use it. This study deals with the luciferase reporter gene, one of a number of widely used reporter genes. A total of six factors were investigated using DOE and the objective was to increase and stabilise the signal-to-background ratio of the assay. This study is unique in that it contains data related to the full spectrum of DOE applications, i.e. first a screening design was performed, then fold-over, then optimisation and finally robustness testing. The exercise is structured accordingly:
Phase 1 (Screening): A 2^(6-2) fractional factorial design in 16 experiments + 3 centre points.
Phase 2 (Fold-over): The initial screening design was complemented by folding over.
Phase 3 (Optimisation): A CCF design in 17 experiments to optimise three of the six factors.
Phase 4 (Robustness Testing): A 2^(5-1) fractional factorial design in 16 experiments + 3 centre points to investigate the sensitivity of the response to small changes in five of the factors.
Factors:
Cells: number of T-cells used in assay (number per well)
PMA: agent added to stimulate T-cells (ng/ml)
Ionomycin: agent added to stimulate T-cells (µg/ml)
Stimulation time: duration of stimulation (hours)
Lysing volume: volume of buffer needed to lyse T-cells (µl)
Ratio: ratio of amount of sample to amount of substrate required to acquire a signal in the luciferase assay
Phase 1 (Screening)
Task 1.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 1.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?
Phase 2 (Fold-Over)
Task 2.2
Fit the regression model. Which factors are most important? What about the block-factor? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 2.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio? Compare your answer with that obtained in Task 1.3.
Phase 3 (Optimisation)
Tasks
Task 3.1
In optimisation, the objective is to locate an optimal factor combination which can be used as a future set point. Define a new investigation with three factors and one response. Note that the order of the factors has changed and that the factor ranges have been modified according to the results of the screening phase. The new design defines a much smaller experimental domain. Select RSM and choose a CCF design augmented with four centre-points. Enter the response data and evaluate them. Should the response be transformed?
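As a rough illustration of the design described in Task 3.1, the sketch below builds a face-centred central composite (CCF) design in coded units for three factors with four centre-points. The function name ccf_design is hypothetical, and the run order is not MODDE's worksheet order; the sketch only shows where the corner, face-centred axial and centre runs come from.

```python
from itertools import product

def ccf_design(n_factors, n_centre):
    """Coded CCF design: 2^k corners, 2k face-centred axial points, centre points."""
    corners = [list(p) for p in product([-1, 1], repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level          # one factor at its limit, the rest at centre
            axial.append(point)
    centre = [[0] * n_factors for _ in range(n_centre)]
    return corners + axial + centre

design = ccf_design(3, 4)
print(len(design))
for run in design:
    print(run)
```

For three factors this gives 8 + 6 + 4 = 18 runs, each factor taking only the three coded levels -1, 0 and +1, which is what allows quadratic terms to be estimated.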
Task 3.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 3.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?
Copyright Umetrics AB, 04-02-10
The specification of the response was that the signal-to-background ratio should exceed 50 regardless of the factor combination. Define a new investigation in MODDE with five factors and one response. Select Screening and the Frac Fac Res V+ design augmented with three centre-points. Enter the response data and evaluate them. Should the response be transformed? How do the response data compare to the specification?
Task 4.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary. Is the response sensitive to the factor changes?
Task 4.3
Evaluate the results in terms of the four limiting cases of robustness testing. Which case applies here?
Inside specification/Significant model (Limiting case 1)
Inside specification/Non-significant model (Limiting case 2)
Outside specification/Significant model (Limiting case 3)
Outside specification/Non-significant model (Limiting case 4)
Which factors should be better controlled in order to achieve robustness according to both criteria? Propose new factor tolerances where necessary.
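The four limiting cases amount to a two-by-two classification, which can be written down as a small helper. The function names and the example S/B values are hypothetical; the only facts taken from the exercise are the numbering of the cases and the specification that S/B should exceed 50.

```python
def inside_specification(responses, lower_limit=50.0):
    """True if every measured response exceeds the lower specification limit."""
    return all(value > lower_limit for value in responses)

def limiting_case(inside_spec, significant_model):
    """Map the two yes/no questions onto limiting cases 1 to 4."""
    if inside_spec:
        return 1 if significant_model else 2
    return 3 if significant_model else 4

# Hypothetical S/B values, for illustration only.
example_sb = [62.0, 71.5, 58.3, 80.2]
case = limiting_case(inside_specification(example_sb), significant_model=True)
print(case)
```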
[Figures: Histogram and Plot of Replications (with Experiment Number labels) for S/B and for the transformed response S/B~. Investigation: Reporter Gene Assay Screening]
Task 1.2
The default linear model looks good with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79). The top two plots correspond to this model. To try and improve the model, PMA and Ratio were removed and the six two-factor interactions of the four remaining factors added, of which only three were worth keeping (Cel*Lys, Ion*StH, and Ion*Lys). The revised model is much better (R2 = 0.96, Q2 = 0.91). The lower two plots relate to the revised model.
[Figures: Summary of Fit and Scaled & Centered Coefficients for S/B~. Investigation: Reporter Gene Assay Screening (MLR). Default linear model (Cel, PM, Ion, StH, Lys, Rat): N=19, DF=12, R2=0.917, Q2=0.791. Revised model (Cel, Ion, StH, Lys, Cel*Lys, Ion*StH, Ion*Lys): N=19, DF=11, R2=0.962, Q2=0.914]
The revised model contains no outliers (below, left), and the size of the residual is fairly independent of the predicted value (below, right), which is good.
[Figures: normal probability plot of deleted studentized residuals, and deleted studentized residuals versus predicted value, for S/B~ with Experiment Number labels. Investigation: Reporter Gene Assay Screening (MLR); N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467]
Task 1.3
The contour plot below shows how the signal-to-background ratio is predicted to change as a function of the factors Cells and LysVolume, while fixing the other factors at their maximum value. The combination of Cells and LysVolume was chosen to explore the borderline significant two-factor interaction.
Conclusions of Phase 1
The three most important factors are Cells, Ionomycin and Stimulation Time. There are a few two-factor interactions which look interesting as they improve the predictive power of the model. However, these two-factor interactions are confounded with other two-factor interactions. Such confounding can be resolved using the Fold-over technique, see Phase 2 of the exercise.
[Figures: Plot of Replications and Histogram of S/B~. Investigation: Reporter Gene Assay Screening - Fold over complement]
Task 2.2
The default linear model is very good (R2 = 0.92, Q2 = 0.88). The top two plots below relate to this model. The Block factor is not significant so there is no evidence of a time drift between the two sets of experiments. To try and improve the model, PMA, Ratio and $Block were removed. The refined model is only marginally better (R2 = 0.91, Q2 = 0.89). The lower two plots relate to the refined model.
[Figures: Summary of Fit and Scaled & Centered Coefficients for S/B~. Investigation: Reporter Gene Assay Screening - Fold over complement (MLR). Default linear model: N=38, DF=30, R2=0.920, Q2=0.877. Refined model: N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018, Conf. lev.=0.95]
The revised model contains no outliers (below, left). However, the plot of the deleted studentized residuals versus the predicted value (below, right) indicates that some of the largest residuals correspond to the six centre-points. A similar phenomenon was present also in the initial screening design. This hints at curvature problems. Curvature is easy to handle with a quadratic regression model but not with the linear model used here.
[Figures: normal probability plot of deleted studentized residuals, and deleted studentized residuals versus predicted value, for S/B~ with Experiment Number labels. Investigation: Reporter Gene Assay Screening - Fold over complement (MLR); N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018]
Task 2.3
MODDE's Optimizer was used to locate the factor combination which maximises the response. PMA, Ratio and $Block were not included in the final model and are therefore greyed out in the Optimizer factor spreadsheet.
The results of running the Optimizer are shown below. The optimum point corresponds to having three factors at their upper limit and one at its lower limit.
Conclusions of Phase 2
The Fold-over experiments did not indicate any large two-factor interactions. Instead, it confirmed that three of the factors dominate: Cells, Ionomycin and Stimulation Time. These three factors form the basis of the optimisation design employed during Phase 3, which will be better suited to handling the non-linear behaviour noted above.
[Figures: Histogram and Plot of Replications for S/B. S/B Min: 17.9, Max: 221.4, Median: 107.8, Mean: 120.683]
Task 3.2
The default model has a relatively poor Q2 (R2 = 0.91, Q2 = 0.56). The top two plots relate to the initial model. The model was pruned by removing non-significant terms (R2 = 0.89, Q2 = 0.74). The lower two plots relate to the refined model.
[Figures: Summary of Fit and Scaled & Centered Coefficients for S/B. Investigation: Reporter Gene Assay RSM with CCF (MLR). Initial model: N=18, DF=8, R2=0.908, Q2=0.558. Refined model: N=18, DF=11, R2=0.896, Q2=0.739]
There are no outliers (below, left) and the residuals are independent of the predicted value (below, right).
[Figures: normal probability plot of deleted studentized residuals, and deleted studentized residuals versus predicted value, for S/B with Experiment Number labels. Investigation: Reporter Gene Assay RSM with CCF (MLR); N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934]
Task 3.3
The contour plots below show how the signal-to-background ratio varies in relation to the three factors. The optimum factor combination is high Stimulation time (6 hours), high Ionomycin (2) and intermediate Cells (around 320000).
The Optimizer was used to obtain more exact co-ordinates of the optimum.
After the first optimisation round the 8th simplex was found to be best.
During the second optimisation round, new starting points were generated in the vicinity of the best simplex from the first round.
Four of the five new simplexes converge to the same point: Cells 320000, Stimulation Time = 6 and Ionomycin = 2.
The results for the best predicted simplex were transferred to the SweetSpot plot, a plot which clearly shows the location of the optimal point.
Further, the five simplex factor co-ordinates were transferred to the prediction list, showing that the predicted optimal S/B value is 260 ± 40.
Conclusions of Phase 3
The optimal factor combination within the investigated experimental domain is Cells 320000, Stimulation Time = 6 and Ionomycin = 2. In the final DOE stage, this point will be assessed for robustness. However, due to practical considerations, robustness testing was not performed on this precise point but rather one close to it (see Phase 4).
[Figures: Plot of Replications and Histogram of S/B for the robustness testing design]
Task 4.2
In robustness testing, model refinement is usually not performed and the ideal result is no model at all. The model obtained is poor (R2 = 0.93, but Q2 is strongly negative at -84.83). However, the regression coefficient plot indicates that S/B is sensitive to changes in Ionomycin concentration.
[Figure: Summary of Fit for S/B. Investigation: Reporter Gene Assay RobTest Frac Fac (MLR); N=19, DF=3, R2=0.930, Q2=-84.830]
Task 4.3
In Task 4.2 it was shown that S/B is not robust to changes in Ionomycin concentration. However, the response data themselves are robust given that they are within specification. The factor range of Ionomycin must be reduced by half in order to make S/B robust. Hence, the concentration range for Ionomycin within which robustness can be claimed is 1.485-1.515 µg/ml rather than 1.47-1.53 µg/ml.
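The arithmetic behind the narrowed Ionomycin range is simply a halving of the interval about its centre. A minimal sketch follows; the helper name shrink_range is hypothetical and the unit is assumed to be µg/ml.

```python
def shrink_range(low, high, factor):
    """Scale the width of [low, high] by `factor`, keeping the centre fixed."""
    centre = (low + high) / 2
    half_width = (high - low) / 2 * factor
    return round(centre - half_width, 4), round(centre + half_width, 4)

# Tested range 1.47 to 1.53, reduced by half around the centre 1.50.
new_low, new_high = shrink_range(1.47, 1.53, 0.5)
print(new_low, new_high)
```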
[Figure: Scaled & Centered Coefficients for S/B (main effects Cel, PM, Ion, StH, Lys and their two-factor interactions). Investigation: Reporter Gene Assay RobTest Frac Fac (MLR); N=19, DF=3, R2=0.930, Q2=-84.830]
Conclusions of Phase 4
The final DOE phase illustrated the first limiting case of robustness testing, i.e., a significant model and inside specification. S/B was most sensitive to changes in Ionomycin concentration.
Stimulation time = 5.5 (Six hours was optimal according to the CCF design but 5.5 hours fits in better with an 8 hour working day.) LysVolume = 30 (Low level, as found to be optimal during screening).
The signal-to-background ratio is most sensitive to changes in the Ionomycin concentration. However, the response may be regarded as robust given that all the values were within specification. The final conclusion is that the results of the four phases are both coherent and consistent. This indicates the high quality of the underlying experimental data.
Background
One important property in HPLC is the capacity factor. There are several mobile phase constituents that may influence this chromatographic response, such as, pH, temperature, and type and amount of mobile phase modifiers. Thus, optimization of capacity factors is not always straightforward, but requires design of experiments in combination with multivariate modeling for optimal output. This example is based on the publication of Andersson et al (Chromatographia Vol 38, 715-722, 1994).
Objective
In this example the influence of seven factors on chromatographic response (capacity factors) is investigated. Five factors represent mobile phase modifiers, three uncharged and two charged, and the last two are pH and column temperature. The chromatographic response (i.e., capacity factor) for the Chromspher B stationary phase was assessed using five substances (almokalant, amoxicillin, metoprolol, omeprazole and S 29). The goal was to get an overview (screening) of which factors are most influential for the capacity factors, since it is desirable to regulate these through changes in the factors.
Data
Acetonitrile (ACN), methanol (MeOH) and tetrahydrofuran (THF) represent uncharged modifiers, whilst 1-octanesulphonic acid (OSA) and N,N-dimethyloctylamine (DMOA) correspond to charged ones.
Tasks
Task 1
In MODDE, first define the seven factors according to the information given above. The factors OSA and DMOA must be log-transformed (with C1 = 1 and C2 = 0). The next step is to specify the responses. Log-transform all five responses. Set C1 to 1 and C2 to 0 for all responses except Amoxicillin, which should have the settings C1 = 1 and C2 = 0.04. Select Screening as objective, and MODDE will then propose 16+3 experiments in the form of a 2^(7-3) fractional factorial design. Accept this proposal. This design only supports linear terms. However, the experimenters wished to estimate some interaction terms and hence carried out five extra runs selected D-optimally. To append extra experiments to the worksheet, right-click in the worksheet window and select Add Experiment. Continue until the worksheet has 24 experiments. In EXCEL, open the XLS-file CHROMS_B.XLS and COPY/PASTE the worksheet content to the worksheet generated in MODDE.
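The role of the constants C1 and C2 can be illustrated with a short sketch. It assumes that the logarithmic transformation has the form log10(C1*Y + C2); this form is an assumption made for illustration and should be checked against the MODDE documentation. The response values used are hypothetical.

```python
import math

def log_transform(y, c1=1.0, c2=0.0):
    """Assumed form of the transformation: log10(c1 * y + c2)."""
    return math.log10(c1 * y + c2)

# With C1 = 1 and C2 = 0 this is an ordinary base-10 logarithm.
plain = log_transform(10.0)

# A positive C2 (0.04 for Amoxicillin in Task 1) shifts the argument, which
# keeps it positive and tames the transform for responses close to zero.
shifted = log_transform(0.06, c1=1.0, c2=0.04)
print(plain, shifted)
```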
Task 2
Evaluate the raw data by creating replicate plots and histograms. Are the responses approximately normally distributed? What about the replicate error, is it large or small compared with the variation across the entire design?
Task 3
Set runs 20-24 as excluded (Excl). Select MLR as FIT METHOD (Analysis/Select Fit Method) and compute the model. Check R2, Q2, MVal, Rep, ANOVA, and N-plot of residuals for each one of the five responses. Can you trace any anomalies in the data? Look at the coefficients and interpret the model. Which factors seem most relevant?
Task 4
Include runs 20-24. Edit the model (Edit/Model) and add the three interaction terms pH*DMOA, ACN*OSA and MeOH*THF. Compute the model with MLR and compare results with Task 3.
Task 5
Use the same data material as in Task 4, but switch to PLS instead of MLR. What are the similarities and differences between the MLR and the PLS models? How are the different responses correlated? Which factors are most meaningful?
Solutions to CHROMSPHER_B
Task 2
[Figures: histograms of the five log-transformed responses OM~, S29~, Almo~, Amox~ and Meto~. Investigation: chroms_b]
The five histograms show that all responses are approximately normally distributed. This is what you would expect for log-transformed chromatographic data.
[Figures: Plot of Replications with Experiment Number labels for OM~, S29~, Almo~, Amox~ and Meto~. Investigation: chroms_b]
The replicate error is very small for each response, in fact so small that it will be difficult to avoid lack of fit in the ANOVA lack of fit test. The replicate plot for the fourth response (Amox) indicates a deviating behavior of experiment 19.
Task 3
We can see that four out of five responses are well accounted for by the model. One response, Amoxicillin, has a large gap between R2 and Q2 indicating model problems for this response. Model validity is only OK with regards to the first response. In the N-plots below, residuals of a well predicted (S29) and a poorly predicted (Amox) response are plotted. For the problematic response (Amox), experiments 1, 11, 17, and 19 stick out a little, but they are still inside 4 standard deviations. The coefficient plot reveals that for most responses the coefficient patterns are similar. The notable exception is Amox, for which the factor pH has a negative coefficient, and not a positive one as for the other responses.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for OM~, S29~, Almo~, Amox~ and Meto~; normal probability plots of residuals for S29~ and Amox~ with Experiment Number labels; coefficient overview for the five responses. Investigation: chroms_b (MLR); N=20, DF=12]
Task 4
Evidently, the modeling of all responses benefits from the inclusion of the three cross-terms. The N-plot of Amox residuals has improved slightly, because now only experiments 1 and 19 deviate. The MeOH*THF term is the most powerful among the cross-terms, which is seen in the coefficient plots. In the interaction plots, it is possible to discern that the MeOH*THF interaction is more pronounced for Amox than for S29.
[Figures: Summary of Fit for OM~, S29~, Almo~, Amox~ and Meto~; normal probability plot of residuals for Amox~; Scaled & Centered Coefficients (pH, ACN, MeO, THF, OSA~, DMO~, MeO*THF and the other cross-terms); Interaction Plots for MeO*THF (THF low and THF high versus MeOH) for S29~ and Amox~. Investigation: chroms_b (MLR); N=24, DF=13. Fit statistics shown: R2=0.978, Q2=0.930, R2 Adj.=0.961, RSD=0.0621; and R2=0.866, Q2=0.522, R2 Adj.=0.763, RSD=0.1811]
Task 5
According to the R2- and Q2-values of the individual responses, the MLR and PLS models provide similar results. However, one must realize that when using MLR there are five models to consider, whereas PLS only fits one model to all responses.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for OM~, S29~, Almo~, Amox~ and Meto~, for the MLR models and for the PLS model with four components. Investigation: chroms_b; N=24, DF=13]
The PLS model has four components. For the response S29, which is strongly correlated with OM, Almo, and Meto, we can see that primarily the first PLS component explains response variation. For the deviating response Amox, however, the first component of the PLS model reflects hardly any variation. Rather, the second component models this response.
[Figures: PLS Summary (cumulative R2 and Q2 over Comp1 to Comp4) for S29~ and for Amox~. Investigation: chroms_b (PLS, comp.=4); N=24, DF=13. Fit statistics shown: R2=0.961, Q2=0.850, R2 Adj.=0.932, RSD=0.0823; and R2=0.836, Q2=0.548, R2 Adj.=0.710, RSD=0.2002]
PLS provides a diagnostic tool for visualizing the correlation pattern between the X-factors and the Y-responses, namely the PLS t/u score plot. The first component, accounting for almost 72% of the response variation, captures a strong correlation between X and Y. The second component, which explains another 16% of the Y-variation, uncovers a weakly deviating feature of experiment number 19, i.e., the same phenomenon observed in the foregoing tasks.
[Figures: Score Scatter plots t[1] vs u[1] and t[2] vs u[2] with Experiment Number labels. Investigation: chroms_b (PLS, comp.=4); N=24, DF=13, Cond. no.=3.2013, Y-miss=0]
The third and fourth components also display reasonable correlations between t and u, considering they merely model 4 and 2% of the variation in the responses. The third component reveals a weak non-linear relationship.
[Figures: Score Scatter plots t[3] vs u[3] and t[4] vs u[4] with Experiment Number labels. Investigation: chroms_b (PLS, comp.=4); N=24, DF=13]
Because PLS fits only one model to all responses, we may use the PLS loading plot to overview the relationships among all factors, cross-terms, and responses at the same time. The loading plot given below represents 88% of the response variation. This plot corroborates that Amox provides unique information about the experiments. The other four responses are correlated, and correlation coefficients among them always exceed 0.75 (Hint: use Worksheet/Correlation/Correlation Matrix). The loading plot also suggests that the factors pH, ACN, THF, and MeOH are most influential for the responses. The three cross-terms are of comparatively low importance. Basically, the VIP plot confirms the conclusions drawn from the loading plot.
[Figure: PLS loading scatter plot wc[1] vs wc[2] (response labels recoverable from the extraction: Al~, Me~, OSA~, S29~, OM~) and VIP plot, Investigation: chroms_b (PLS, comp.=4). N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]
Conclusions
This application shows how DOE can be applied to explore the performance of chromatographic equipment. Seven factors were screened and the resulting models (MLR or PLS) revealed that four factors (pH, ACN, THF, and MeOH) were considerably more meaningful than the others. These are the factors to consider for further studies, e.g., optimization modeling. One response, Amox, was different, mainly because of a distinctively different pH-dependence. A separate MODDE investigation for this response is reasonable.
Background
Omeprazole is a potent inhibitor of gastric acid secretion and is frequently used against acid-related diseases in the stomach. Both enantiomers of omeprazole are effective in this respect. Omeprazole is metabolised to intermediary products of which hydroxylated omeprazole is the main metabolite. This metabolite is able to block the enzyme H+,K+-ATPase selectively. This enzyme is responsible for the gastric acid production.
Objective
The experimental objective of this study was to optimise the chiral separation (using HPLC) of the (R)- and (S)-enantiomers of omeprazole and its main metabolite, hydroxylated omeprazole. In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on different retention of each analyte on the chiral stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is the minimum acceptable for separation of neighbouring peaks, but complete baseline separation requires a resolution above 1.5.
In this application, four HPLC factors were varied: mobile phase pH, concentration of the organic eluent modifier acetonitrile (ACN), ionic strength, and temperature. Logarithmically transformed capacity factors were measured for the four solutes (R-omeprazole, S-omeprazole, R-hydroxyomeprazole, S-hydroxyomeprazole). The experimental data are taken from the following reference: Karlsson, A., and Hermansson, S., Optimisation of Chiral Separation of Omeprazole and One of Its Metabolites on Immobilized α1-Acid Glycoprotein Using Chemometrics, Chromatographia, 44, 10-18, 1997.
In the treatment of the experimental data below, solute 1 is omeprazole and solute 2 is hydroxyomeprazole. The (R)- and (S)-notation indicates different enantiomers. Capacity factors are denoted k and there are four of these. The resolution responses of interest are denoted Res. The experimental objective was to find a factor combination which:
(a) achieves retention times (capacity factors) of less than 15 minutes;
(b) maintains resolution above 1.5 (complete baseline separation).
Data
Factors
Responses:
Design:
Tasks
Task 1
Create a new project in MODDE. Define the four factors and the eight responses as outlined above. Note 1: The four capacity factors are commonly analysed after transforming to logs. Note 2: The last four responses are derived from the four capacity factors. Res1 is k(S)-1 divided by k(R)-1. Res2 is k(S)-2 divided by k(R)-2. Res3 is k(R)-1 divided by k(R)-2. Res4 is k(S)-2 divided by k(R)-1.
The four derived responses are not shown in the worksheet until a model has been fitted. Select RSM and the second-ranked Reduced CCF design augmented with four centre-points. There are different versions of this design and the one used by the original investigators is not the same as that recommended by MODDE. Therefore, Copy/Paste the contents of CHIRAL SEPARATION.XLS into MODDE. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc). Are the responses approximately normally distributed? How large or small is the replicate error?
Task 2
Fit the model and review and interpret the results. How are the eight responses related? Which model terms are most important? Which factor settings meet the objectives of the study, i.e. capacity factors below 15 minutes and resolutions above 1.5?
Task 3
The experimenters carried out one verifying experiment to test the predictive power of the model. The verifying experiment was Eluent modifier = 11%, Temperature = 25 °C, Ionic Strength = 0.02, and pH = 6.3. At this point, the measured capacity factors were: k(R)-1 = 2.48, k(S)-1 = 5.86, k(R)-2 = 1.59, and k(S)-2 = 3.18. How do the predictions from your model compare with these actual measurements?
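The derived responses defined in Note 2 of Task 1 are plain ratios of capacity factors, so they can be checked by hand at the verification point. The following is a minimal Python sketch, not part of MODDE; the function name `derived_responses` is hypothetical, and the comparison with 1.5 simply applies the resolution target stated in the Objective to these ratios.

```python
# Measured capacity factors at the verifying experiment (values quoted in Task 3).
k = {"k(R)-1": 2.48, "k(S)-1": 5.86, "k(R)-2": 1.59, "k(S)-2": 3.18}

def derived_responses(k):
    """Res1..Res4 as ratios of capacity factors, following Note 2 of Task 1."""
    return {
        "Res1": k["k(S)-1"] / k["k(R)-1"],
        "Res2": k["k(S)-2"] / k["k(R)-2"],
        "Res3": k["k(R)-1"] / k["k(R)-2"],
        "Res4": k["k(S)-2"] / k["k(R)-1"],
    }

res = derived_responses(k)
for name, value in res.items():
    # Flag each ratio against the 1.5 target from the Objective.
    status = "above" if value > 1.5 else "not above"
    print(f"{name} = {value:.2f} ({status} 1.5)")
```

Note that the verifying experiment was run at settings chosen by the experimenters, not at the optimum located later, so these ratios describe only that one point.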
[Figure: Plots of Replications for the log-transformed capacity factors (panels recoverable from the extraction: k(S)-1~, k(R)-2~, k(S)-2~) with Experiment Number labels; x-axis: Replicate Index. Investigation: Chiral Separation.]
The appropriateness of the log transformation is confirmed by the shape of the histograms of each response (below).
[Figure: Histograms of the log-transformed capacity factors, including k(R)-1~ and k(R)-2~; x-axis: Bins, y-axis: Count. Investigation: Chiral Separation.]
Task 2
A quadratic regression model was fitted to the response data. The summary plot (below) indicates that the first response has an excellent model but responses 2-4 suffer from lack of fit.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k(R)-1~, k(S)-1~, k(R)-2~, and k(S)-2~, Investigation: Chiral Separation (MLR). N=24, DF=9.]
The correlation matrix is useful for examining how the derived responses correlate with the measured ones. The figure below is an excerpt of the complete correlation matrix showing just the portion related to the responses. It is evident that all four capacity factors are strongly correlated and so it is reasonable to include them in the same investigation where we would expect them to have similar patterns of regression coefficients. The only really different response is the derived response Res3.
The coefficient overview plot shown below confirms the similarity of the coefficient profiles. There are no coefficients for the derived responses as these are generated from the fitted capacity factors. Overall, the linear terms dominate and by far the most important factors are concentration of acetonitrile (ACN) and temperature. It can also be seen that pH has some influence on the second, third and fourth responses. There is some evidence of a quadratic effect of temperature for the third and fourth responses.
[Figure: Normalized Coefficients for k(R)-1~, k(S)-1~, k(R)-2~, and k(S)-2~, Investigation: Chiral Separation (MLR). Terms: ACN, temp, Ion, pH, ACN*ACN, temp*temp, Ion*Ion, pH*pH, ACN*temp, ACN*Ion, ACN*pH, temp*Ion, temp*pH, Ion*pH. N=24, DF=9.]
The predictive power of the models was improved by removing non-significant model terms. The regression coefficients of the refined models are shown below.
[Figure: Scaled & Centered Coefficients for k(R)-1~, k(S)-1~, k(R)-2~, and k(S)-2~ after model refinement, Investigation: Chiral Separation (MLR). Retained terms: ACN, temp, Ion, pH, ACN*ACN, temp*temp, ACN*temp. N=24, DF=16. Reported statistics, in order of appearance: R2=0.984, Q2=0.967; R2=0.995, Q2=0.987; R2=0.978, Q2=0.944; R2=0.990, Q2=0.973.]
Notice how much the Q2 values have increased as a result of the model pruning (see summary plot below), although responses 2, 3, and 4 still exhibit significant lack of fit. It is concluded that this is due to the extremely low replicate errors for these three responses.
[Figure: Summary of Fit for k(R)-1~, k(S)-1~, k(R)-2~, and k(S)-2~ after model refinement, Investigation: Chiral Separation (MLR). N=24, DF=16.]
In order to interpret the regression models, we created the eight response contour plots shown below. These were constructed using Eluent modifier (ACN) and Temperature as the axes and fixing Ionic strength and pH at their centre levels. The two quartets of response contour plots suggest that the lower left-hand corner (ACN low, temp low, Ion centre, pH centre) is the most interesting.
The conclusion from the eight response contour plots is that it will not be a problem to achieve retention times below 15 minutes. It should also be possible to get all four resolution responses above 1.5. To check this, we used MODDE's Optimizer functionality to locate the optimum factor settings. Because we already know that the capacity factors are not a problem, we excluded them from the optimisation. The specification of the response targets is shown below.
According to the results of the Optimizer (below), simplex #6 is the best. This combination of Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04, and pH = 6.6 is close to the lower left-hand corner identified above in the eight response contour plots.
Task 3
The prediction list below shows point estimates and their associated 95% confidence intervals. The results for the verifying experiment all fall within the 95% confidence interval which corroborates, albeit with just one point, the predictive power of the model.
Conclusions
Excellent R2 and Q2 were obtained for all four capacity factors. However, the models for responses 2-4 suffered from significant lack of fit, which was undoubtedly due to the extremely low replicate errors associated with these responses. The predictions for the verifying experiment were very close to the actual results obtained, which gives confidence in the predictive power of the models. Using MODDE's Optimizer, it was easy to locate the factor settings which met the experimental objectives: Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04, and pH = 6.6. These settings ensure complete baseline separation within reasonable retention times.
Background
In the pharmaceutical industry it is important to study metabolism of candidate drugs. One approach is to incubate substances with microsomal preparations which may be used as model systems to investigate e.g. liver metabolism. During incubation, dedicated inhibitors may be used to block enzymes. This will help uncover which enzyme in the microsomes is responsible for metabolizing a specific drug. However, in order to obtain reliable results it is first necessary to ascertain that the compound under study is sufficiently well metabolized. In the current application the aim was to ensure that drug metabolism exceeded 40%. The example originates from Carlsson Research AB, Gothenburg, Sweden, and we gratefully acknowledge the company for allowing us to use this data set.
Objective
The objective of the investigation was to optimise the assay conditions for the enzymes such that a maximum of 60% of the drug was left after incubation with the microsomal preparation.
Data
The following five factors were of interest:
Factor: Comments/Explanation
Drug: Drug concentration [M]. The higher the drug concentration, the greater the risk that the drug itself will inhibit the enzymes. Expressed on a log-scale.
Microsome: Microsome concentration [mg/ml]. The more enzyme, the more rapid the metabolism. Expressed on a log-scale.
NADPH: NADPH concentration [mM]. Enzyme co-factor. The more co-factor, the lower the risk of total NADPH depletion before the end of the experiment. Expressed on a log-scale.
Time: Duration of incubation [min]. A longer duration will give the enzymes more opportunity to metabolize the drug. The risk is, however, that other factors will be depleted and hence there may be no net gain from prolonging the incubation time.
Ionic strength: Ionic strength of the Na/K-phosphate buffer used [mM]. This buffer may affect the ability of the enzymes to interact with the drug.
The following response was recorded:
Response: Comments/Explanation
%Left: Amount of drug left at the end of the incubation experiment, determined by LC-MS. The experimental objective was to achieve a figure less than 60%.
In order to conduct this optimization study, the five factors were varied using a CCF design. This is a standard RSM design in 26 + 3 experiments.
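The run count of 26 + 3 can be reproduced by building a face-centred central composite design in coded units. The sketch below is illustrative Python, not MODDE output. The count of 26 is consistent with a half-fraction factorial core (16 corner runs) plus 10 axial runs; the specific generator used here (fifth factor equal to the product of the first four) and the function name `ccf_design` are assumptions.

```python
from itertools import product

def ccf_design(n_factors=5, n_center=3):
    """Face-centred central composite design in coded units (-1, 0, +1)."""
    # Half-fraction factorial core: the last factor is the product of the
    # others (generator assumed for illustration only).
    core = []
    for levels in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for x in levels:
            last *= x
        core.append(levels + (last,))
    # Axial points lie on the cube faces: one factor at -1 or +1, the rest at 0.
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(tuple(point))
    # Replicated centre-points estimate the replicate error.
    center = [(0,) * n_factors for _ in range(n_center)]
    return core, axial, center

core, axial, center = ccf_design()
total_runs = len(core) + len(axial) + len(center)
print(len(core), len(axial), len(center), total_runs)  # 16 10 3 29
```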
Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Remember to specify the log-transform for the first three factors. Select RSM as objective. Accept the recommended 29-run design (CCF design in 26 runs plus 3 centre-points). Enter the response values in the Worksheet. Evaluate the raw data. Are there any outliers? Is there a need for response transformation? What can you say about the replicate error?
Task 2
Fit the quadratic regression model. Determine which factors have the strongest influence on the metabolism of the drug by looking at the coefficient plot. Review the fit and revise the model if needed. Which factor combination represents the optimal metabolism environment for the enzymes in the microsomal preparations?
Solutions to Metabolism
Task 1
Experiment number 15 deviates from the rest (below, top left). This is a very interesting point as it is the only one in the worksheet meeting the stipulated goal of %Left less than 60%. Hence, we are reluctant to remove it. This experiment also causes the distribution of %Left to be skewed (below, top right). The replicate error is very small compared with the overall response variation. One possible remedy might be the NegLog transformation. The results after applying this transform are shown in the two lower plots. Evidently, the NegLog transformation is a sensible choice since the distribution of %Left is closer to a normal distribution after transformation. In the following, we will work with the transformed response variable.
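To see why a negative-log transform can help with a percentage response crowded near 100, consider the sketch below. It assumes the transform has the form -log10(100 - y); that this matches MODDE's exact NegLog definition is an assumption, and the %Left values are hypothetical, chosen only to mimic one low point among many high ones.

```python
import math

def neglog(percent, upper=100.0):
    """Assumed NegLog form: -log10(upper - y). Requires y < upper."""
    return -math.log10(upper - percent)

# Hypothetical %Left values: most close to 100, one much lower.
raw = [55.0, 90.0, 95.0, 97.0, 99.0]
transformed = [neglog(y) for y in raw]
for y, t in zip(raw, transformed):
    print(f"%Left = {y:5.1f}  ->  NegLog = {t:6.3f}")
```

Under this assumed form, the crowded region near 100 is stretched out while the low tail is compressed, which pulls a single low value closer to the bulk of the data.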
[Figure: Plot of Replications (x-axis: Replicate Index) and histogram (x-axis: Bins, y-axis: Count) for %left, followed by the same two plots for the NegLog-transformed response %left~, with Experiment Number labels. Investigation: Metabolism.]
Task 2
The fitted quadratic model contains 5 (linear) + 5 (quadratic) + 10 (two-factor interaction) = 20 terms plus the constant. Clearly, many of these are not significant according to the confidence intervals. The model also has negative Q2, which is unsatisfactory. The normal probability plot suggests no outliers in the data and there is no lack of fit (MVal > 0.25).
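The term count and the resulting residual degrees of freedom are simple bookkeeping, sketched below in Python. The helper names are hypothetical. With 29 runs, the full quadratic model leaves 8 residual degrees of freedom, and removing six terms (as is done further on in this solution) leaves 14, in agreement with the DF values reported with the fitted models.

```python
from math import comb

def quadratic_term_count(n_factors):
    """Linear + square + two-factor interaction terms (constant excluded)."""
    return n_factors + n_factors + comb(n_factors, 2)

def residual_df(n_runs, n_terms_without_constant):
    """Residual degrees of freedom: runs minus fitted parameters (terms + constant)."""
    return n_runs - (n_terms_without_constant + 1)

terms = quadratic_term_count(5)
df_full = residual_df(29, terms)
df_pruned = residual_df(29, terms - 6)
print(terms, df_full, df_pruned)  # 20 8 14
```

The same bookkeeping applies to any of the exercises: a quadratic model in four factors has 4 + 4 + 6 = 14 terms plus the constant.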
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and normal probability plot of residuals for %left~, Investigation: Metabolism (MLR). N=29, DF=8, Cond. no.=7.2003, Y-miss=0.]
In order to improve the modelling results, the following six model terms were discarded: Drug*NADPH, NADPH*Time, NADPH*Ionic strength, Drug*Drug, Mic*Mic, and NADPH*NADPH. For this model the performance statistics are: R2 = 0.96, Q2 = 0.85, MVal = 0.41, and Rep = 0.99. These are excellent results and there are no outliers. Hence, the model may be used for predicting a region in the experimental domain where the goal of %Left < 60 is attained.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and normal probability plot of residuals for %left~ after model refinement, Investigation: Metabolism (MLR). N=29, DF=14, Cond. no.=5.3422, Y-miss=0, R2=0.961, Q2=0.850.]
The contour plot below shows a saddle surface and achieving %Left < 60 is not difficult. Staying in the lower part of the contour plot (low ionic strength and drug concentration, and high microsome and cofactor concentration) may enable response values as low as 20% left (80% of the drug is metabolized). The sweet-spot plot is coded according to the requirement on the response variable.
Conclusions
A very strong quadratic model was obtained. The factor combination of low ionic strength and drug concentration, high microsome and cofactor concentration, and an incubation time of 4 hours gives the lowest %Left inside the region explored. The relevance of this area was later verified using additional experiments.
Background
The Willgerodt-Kindler reaction, a rearrangement that takes place when aryl-alkyl-ketones are heated in the presence of sulphur and an amine, is difficult to explain. One way of investigating the reaction mechanism is to find the factors that have the greatest influence on the reaction. The current data are drawn from the thesis of Torbjörn Lundstedt, Umeå University, 1986.
Objective
To determine which factors are the most important using a fractional factorial design. To optimise the system utilising response surfaces.
Data
The proportion Sulphur/Ketone (mol/mol)
The proportion Amine/Ketone (mol/mol)
Temperature (°C)
Grain size of Sulphur (mm)
Stirring speed (rpm)
Tasks
Phase 1: Screening
Task 1
Generate a 2^(5-1) fractional factorial design (16 runs). Enter the response values. Calculate a model showing the influence of the factors on the yield.
Task 2
Why don't you get a summary of fit plot? Edit the model by removing the two smallest terms. Recalculate the model. Which terms do you think are significant? Which factors can be neglected in further investigations?
Solutions to Willge
Task 2
The design is saturated, i.e., there are no degrees of freedom left because we fitted a model of 16 terms to a design of 16 experiments. One way to alert the user to this undesirable situation is to refuse to plot R2, R2adj, or Q2. In the coefficient plot no confidence intervals are given (this is because RSD = 0 and because the t-distribution is undefined for zero degrees of freedom). Nevertheless, we can see that Te has the largest influence on Yield.
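The saturation can be made concrete by constructing a 16-run half-fraction in five factors and counting the model terms. This is a minimal Python sketch using the factor abbreviations from the coefficient plot (SK, MK, Te, Pa, Sti); the generator (fifth column equal to the product of the first four) is an assumption for illustration and may differ from the one MODDE proposes.

```python
from itertools import combinations, product

names = ["SK", "MK", "Te", "Pa", "Sti"]

# 2^(5-1) design: full factorial in the first four factors; the fifth column
# is generated as their product (assumed generator).
runs = []
for levels in product((-1, 1), repeat=4):
    fifth = levels[0] * levels[1] * levels[2] * levels[3]
    runs.append(levels + (fifth,))

# Model: constant + 5 linear terms + 10 two-factor interactions.
terms = ["constant"] + names + [f"{a}*{b}" for a, b in combinations(names, 2)]

df_saturated = len(runs) - len(terms)        # no degrees of freedom left
df_after_pruning = len(runs) - (len(terms) - 2)  # two smallest terms removed
print(len(runs), len(terms), df_saturated, df_after_pruning)  # 16 16 0 2
```

With zero residual degrees of freedom the fit reproduces the data exactly, so no residual standard deviation, and hence no confidence interval, can be computed until some terms are removed.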
[Figure: Scaled & Centered Coefficients for Yield (%), Investigation: Willges (MLR). Terms: SK, MK, Te, Pa, Sti and their two-factor interactions. N=16, DF=0, Conf. lev.=0.95.]
When removing the two smallest model terms, Pa*Sti and MK*Sti, a model is obtained that explains and predicts the variance in the data very well. From the coefficient plot we conclude that the three factors SK, MK, and Te have the largest influence on the Yield. Sti is also significant but will be neglected in further investigations. Through this screening we have thus reduced the number of factors from 5 to 3.
[Figure: Summary of Fit (R2, Q2) and coefficient plot for Yield, Investigation: Willges (MLR). N=16, DF=2, R2=0.999, Q2=0.930, R2 Adj.=0.992, RSD=2.2849, Cond. no.=1.0000, Y-miss=0, Conf. lev.=0.95.]
Task 3
When fitting the quadratic regression model to the data of the CCC design, a model was obtained with high R2 (0.98) and Q2 (0.85), but with negative MVal. Another diagnostic tool, the N-plot of residuals, pinpoints an outlier, i.e., experiment number 8. This outlier has to be removed in order to improve the modelling efficiency.
[Figure: Summary of Fit and normal probability plot of deleted studentized residuals for Yield with Experiment Number labels, Investigation: Willge_Opt (MLR). N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006, Cond. no.=3.5887, Y-miss=0.]
As we can see below, the removal of observation #8 improves the model. We now have an excellent model according to R2 and Q2. Observation #7 is somewhat far away in the residual plot, but it is not an influential point. There is also some indication of lack of fit (MVal), but in this case the replicate error is exceptionally low, which may, at least partly, explain why lack of fit appears.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and normal probability plot of residuals for Yield with Experiment Number labels, after removal of experiment 8, Investigation: Willge_Opt (MLR). N=19, DF=9, R2=0.997, Q2=0.967, R2 Adj.=0.993, RSD=2.1588, Cond. no.=4.0695, Y-miss=0.]
By creating the response contour plots shown below, we can see how the predicted Yield changes as a function of changes in the three factors. Evidently, the model forecasts a region of quantitative yield, i.e., with Temperature = 140, and high molar ratios in the other two factors.
Conclusions
This example illustrates the working principle of first conducting a careful screening investigation and thereafter a detailed optimisation study. The screening phase identified three factors as more influential than the other factors. When bringing these three factors into the optimisation stage, an area of quantitative yield (i.e., 100%) was discovered. The appropriateness of this region of operability was later verified experimentally by Torbjörn Lundstedt.
Background
The stability of an analytical method (in this case release curves) cannot be investigated by changing one factor at a time. More information about the stability can be extracted using DOE. In this case, a small (small volume) design is laid out to describe how the factors should be altered around the standard settings to acquire information on the stability of the method (sensitivity to change). This example is intended to show how experimental design can simplify the examination of a method's sensitivity to small factor changes. The Drug D data originate from a pharmaceutical study at Astra Hässle performed by Tina Riesel and Åsa Backman.
Objective
The objective of this investigation was to examine how the release profile of Drug D was affected by changes in standard conditions (see below). Is the release after 1 hour the same as after 10 hours? Changes in standard conditions here refer to changes in the four factors: volume of an artificial stomach, its temperature, its fluctuation, and pH. In the written documentation, the manufacturer declared that after 1h the release should be between 20 and 40%, and after 10h above 80%. In addition, the specification stated that the factors should not cause more than 5% (1h) or 10% (10h) spread in each response. Hence, one experimental goal was to assess whether the variation in the release rates across the entire design was consistent with this claim.
Data
Tasks
Task 1
Generate a design so that a model with square terms can be evaluated (select a CCF-design). Enter the response data, calculate the model, and interpret the results.
Task 2
Refine the model and make response contour plots to examine whether the responses change appreciably.
Task 3
According to the specification, the changes in the factors should not cause the response to vary more than 5% (1h) or 10% (10h). Is this specification met?
Solutions to Drug D
Task 1
As seen in the Summary of Fit plot the difference between R2 and Q2 is quite large for 10h, which indicates that the model might be too complicated, or that there are some outliers. As shown by the coefficient plot, there are several coefficients that are near zero.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Release 1h and Release 10h, and Scaled & Centered Coefficients (%) for both responses, Investigation: DrogenD (MLR). Terms: Vol, Te, Sti, pH, Vol*Vol, Te*Te, Sti*Sti, pH*pH, Vol*Te, Vol*Sti, Vol*pH, Te*Sti, Te*pH, Sti*pH. N=27, DF=12, Cond. no.=6.6122, Y-miss=0; one panel reports R2=0.948, Q2=0.710.]
Task 2
After the removal of five terms we get an excellent model for the first response and a good model for the second response. We see that the 1h response is much more influenced by linear contributions from the factors than the 10h response. On the other hand, the quadratic influence is more pronounced for the latter response.
[Figure: Summary of Fit and Scaled & Centered Coefficients for Release 1h and Release 10h after model refinement, Investigation: DrogenD (MLR). N=27, DF=17, Cond. no.=5.9887, Y-miss=0; the two coefficient panels report R2=0.943, Q2=0.864 and R2=0.889, Q2=0.705. A further statistics line in the extracted figure text reads R2=0.908, Q2=0.333, R2 Adj.=0.801, RSD=0.7162, Conf. lev.=0.95.]
Task 3
To understand the features of the two responses better, we created the response contour plots shown below. These two figures suggest that the responses change dramatically as a result of altered factor settings. However, this is misleading.
Let us examine these plots more closely. We are tricked by the way they were constructed. More appropriate plots are given below. These plots are response surface plots in which the z-axis, the release axis, has been rescaled to values between 20 and 40% for 1h and between 80 and 100% for 10h. These are more appropriate ranges according to the original objectives of the investigation. Now we can see that the response surfaces are actually quite flat. Remarkably, the difference between the highest and the lowest measured values is as low as 4.1% for 1h and 5.5% for 10h. Hence, we conclude that the release responses are robust because they are inside the given specifications (less than 5% or 10% variation).
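The robustness check described above amounts to comparing the observed spread of each response with the allowed spread from the specification. A minimal Python sketch using only the figures quoted in the text follows; the dictionary and function names are hypothetical.

```python
# Observed spread (highest minus lowest measured release, in %) as quoted in
# the text, and the allowed spread from the manufacturer's specification.
observed_spread = {"1h": 4.1, "10h": 5.5}
allowed_spread = {"1h": 5.0, "10h": 10.0}

def within_spec(observed, allowed):
    """True for each response whose observed spread does not exceed the limit."""
    return {resp: observed[resp] <= allowed[resp] for resp in observed}

result = within_spec(observed_spread, allowed_spread)
print(result)  # {'1h': True, '10h': True}
```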
Conclusions
The release rate after one hour mainly relates linearly (except for pH*pH) to the four factors. The extent of quadratic dependence is more apparent for the release rate after ten hours. The specification for the 1h response is met. The specification for the 10h response is met.
Background
The preparation of therapeutic products derived from the blood of voluntary donors is an important route for tomorrow's pharmaceutical industry. This is because human blood and plasma comprise many proteins which, once extracted and purified, are of great medical and economic importance. Since the health of the millions of patients who receive blood-derived products every year depends on the quality of the processed blood and plasma, it is crucial that high priority is placed on the quality assurance of such products. One big risk is the transmission of infectious diseases via blood transfusion. Strategies for screening blood for infectious agents are advancing, but this is a difficult and time-consuming process due to the continued discovery of new and emerging pathogens. At CLB (Dutch Red Cross)* in Amsterdam, designed experiments are routinely used as part of the viral safety strategy for blood-derived products. The current example is a robustness test of a viral reduction step in the manufacturing of a solvent/detergent-treated factor IX product called Nonafact. We recall that in a robustness testing study the objective is to probe robustness close to the set point (the set point is usually chosen as the center-point in the design). A robust system copes with small factor changes without compromising its effectiveness. In other words, robustness is a measure of a system's reliability under normal use.
*) Reference: H. Hiemstra, CLB. Presented at Blood-Products Safety, February 5-7, 2001, MacLean, Virginia, USA, http://www.healthtech.com/2001/bss/.
Objective
The experimental objective of the study reviewed here was to explore how sensitive a viral inactivation step was to changes in six process parameters. The six factors studied were (i) percentage TNBP, (ii) percentage Tween80, (iii) temperature, (iv) amount of protein, (v) pH, and (vi) concentration of NaCl. TNBP (tri-n-butylphosphate) and Tween80 (a detergent) help disintegrate the viruses, and the other factors may affect the viral inactivation process too. The response measured was the change in virus density when comparing density before and after treatment. Virus density is often expressed and valued on a logarithmic scale, and so any decrease in virus density is commonly expressed as [log (initial virus density) - log (final virus density)]. This difference is often referred to as the reduction factor, or simply RF, and the higher the better. Maintaining RF > 5 is often used as the specification. In the current study, CLB used three enveloped viruses as models: HIV (Human Immunodeficiency Virus), BVDV (Bovine Viral Diarrhea Virus), and PSR (PseudoRabies Virus). BVDV is used as a model virus for human hepatitis C. Responses were measured within 10 minutes following addition of virucidal chemicals. This is a rather short time frame, and in other similar studies up to 30 minutes is used.
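The reduction factor is the difference between the log of the initial and the log of the final virus density. A minimal Python sketch follows; base-10 logarithms are assumed (the text only says "log"), and the density values are hypothetical, chosen purely to illustrate the arithmetic.

```python
import math

def reduction_factor(initial_density, final_density):
    """RF = log10(initial) - log10(final); a higher RF means more inactivation."""
    return math.log10(initial_density) - math.log10(final_density)

# Hypothetical densities: a drop from 1e8 to 1e2 units per ml.
rf = reduction_factor(1e8, 1e2)
meets_spec = rf > 5  # the often-used specification quoted in the text
print(f"RF = {rf:.1f}, meets RF > 5: {meets_spec}")
```

Because RF is a difference of logs, each unit corresponds to a tenfold reduction in virus density under the base-10 assumption.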
Data
Factors (process parameters):
Responses:
Design:
Tasks
Task 1
Start a new MODDE project. Define the six factors and the six responses as outlined above. Select Screening and the Frac Fac Res III design in 8 runs. Use 2 center-points (change the default proposal based on 3 center-points). On your screen the following design should appear:
The design above was not used by the investigators. Instead, they chose to use the following modification:
In order to accomplish the altered experimental design you will have to modify the design manually (or paste the contents of NONAFACT.XLS into the worksheet). Also enter or paste the response data. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc.). Is there a need for response transformation? How large or small is the replicate error? Do the responses comply with the often-used specification of staying above an RF of 5? What can you say about the correlation between the six responses? What can you say about the geometry of the underlying design?
Task 2
Select MLR as the fit method and compute the model. Review and interpret the model. Which linear terms are important? Is this system robust?
Solutions to NONAFACT
Task 1
The replicate plots show acceptable spread in the two replicates for all responses but the second one. However, it would have been desirable to have access to at least three replicates. The replicate plots and histogram plots (no plots shown) do not indicate any skewed response. Only the second response (HIV_5min) consistently scores RF values exceeding the often used specification of 5. However, one should remember the short measurement time. Using a longer time, e.g., 30 minutes, might have resulted in generally higher RF values.
Figure: Plots of replications for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (response versus replicate index, with experiment number labels).
The table below is the correlation matrix. It shows how all terms in the model and all responses relate to each other. A colored cell indicates high correlation. A number of interesting observations can be made. First of all, we can see that the factors Protein/Tween80 and NaCl/Protein are correlated in a pair-wise fashion. This is unexpected and means that the original investigators did not create a correct fractional factorial design. The effect of these non-zero correlations will be inflated confidence intervals around the regression coefficients of Tween80, Protein, and NaCl. Secondly, it appears that the factors TNBP, Tween80, and Temperature generally exert the strongest influence on the responses, i.e., the responses are most susceptible to altered settings in these factors. Thirdly, the response HIV_5min seems to be different from the others, since it only correlates appreciably with HIV_1min. All the other five responses correlate more or less strongly with one another.
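The point about correlated factor columns can be sketched as follows. The example builds an 8-run resolution III design for six factors and checks that all pairwise factor correlations are zero, then changes one setting and shows that correlations appear. The generators (D = AB, E = AC, F = BC) and the flipped setting are assumptions made for illustration; they are not the actual NONAFACT worksheet.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def frac_fac_6_3():
    """8-run two-level design in six factors; generators are assumed."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        runs.append([a, b, c, a * b, a * c, b * c])
    return runs


def max_abs_correlation(design):
    """Largest absolute pairwise correlation between factor columns."""
    cols = list(zip(*design))
    pairs = combinations(range(len(cols)), 2)
    return max(abs(pearson(cols[i], cols[j])) for i, j in pairs)


design = frac_fac_6_3()
orthogonal_max = max_abs_correlation(design)

# Hypothetical manual modification: flip the fourth factor in the first run.
altered = [row[:] for row in design]
altered[0][3] = -altered[0][3]
altered_max = max_abs_correlation(altered)
```

In the unmodified design every factor pair is uncorrelated. After the single change, the altered column correlates with other factors, which is the kind of pattern the correlation matrix reveals.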
Task 2
A linear regression model in seven terms (constant + six main effects) was fitted to each of the six responses. The summary of fit plot below demonstrates that two significant models were obtained, that of HIV_1min and that of PSR_2min. Also recall that in robustness testing we do not generally spend much time with model refinement activities.
Figure: Summary of fit (R2, Q2, model validity, reproducibility) for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (MLR, N=10, DF=3).
The coefficient overview plot presented below is useful to get the overall picture. It appears that keeping TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM) generally corresponds to the most favorable operating conditions (giving the highest virus reduction factors). This setting of pH is slightly unfavorable for BVDV_1min; however, the level of Tween80, which dominates for this response, is set advantageously.
Figure: Coefficient overview plot for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (N=10, DF=3).
By using six response contour plots it is easy to overview the results (see below). These plots were drawn letting TNBP and Temp be the X- and Y-axes, respectively, and by putting Tween80 high, Protein high, pH low, and NaCl high. The colour coding is consistent throughout the six plots. From these plots it is quickly understood that with the short measurement time for the three strains of virus, RF > 5 is not within reach for PSR_2min. The specification is within reach for BVDV_1min if pH is set high. Note that RF for HIV_5min is constantly predicted above 6, hence the flatness of this response contour plot.
Conclusions
Virus reduction factors above 5 are not achievable for PSR_2min. The best workable factor combination is TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM).
Background
The aim of robustness testing is to design a process, or a system, so that its performance remains satisfactory even when some influential factors are allowed to vary. In other words, we want to minimise the system's sensitivity to changes in certain critical factors. The advantages of this include simpler process control, wider range of applicability of the product and higher product quality. A robustness test is usually carried out before the release of an almost finished product, or analytical system, as a last test to ensure quality. Such a design is usually centred on the factor combination currently used for running the analytical system, or the process. We call this the set point. The set point may have been found through a screening design, an optimisation design, or some other identification principle, such as written quality documentation. The aim of robustness testing is, therefore, to explore robustness close to the chosen set point.

The example that we have chosen as an illustration originates from a pharmaceutical company. It represents a typical analytical chemistry problem within the pharmaceutical industry. In analytical chemistry, an HPLC method is often set up for routine analysis of complex mixtures. It is therefore important that such a system will work reliably for a long time, and be reasonably insensitive to varying chromatographic conditions.

In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on different retention of each analyte on the stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is considered the minimum value for separation between neighbouring peaks, but for complete baseline separation a resolution of >1.5 is necessary. As the resolution value approaches zero, it becomes more difficult to discern separate peaks.
Objective
The investigators explored five factors: (1) amount of acetonitrile in the mobile phase; (2) pH of mobile phase; (3) temperature; (4) amount of the OSA counter-ion in the mobile phase; (5) stationary phase batch (column), and mapped their influence on the chromatographic behaviour of two chemical analytes. Note that the last factor is of a qualitative nature. To study whether these factors had an influence on the chromatographic system, the researchers used a 12 run experimental design to encode 12 different chromatographic conditions. For each condition, three quantitative responses reflecting the capacity factors of the two analytes (compounds) and the resolution between the analytes were measured. The goal of this study was to constantly maintain a resolution of 1.5 or higher for all chromatographic conditions. No specifications were given for the two capacity responses.
Data
A 12 run design supporting a linear model was constructed. This design, shown below, is a 2^(5-2) fractional factorial design, supplemented with four centre-points.
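A design of this type can be sketched in coded units as shown here. The generators (OS = Ac*pH, column = Ac*Te) and the split of the four centre-points over the two column batches are assumptions for illustration. The text only states that the design has 8 + 4 runs and that the column factor is qualitative, which means it has no true centre level.

```python
from itertools import product


def hplc_robustness_design():
    """Return 12 runs as (Ac, pH, Te, OS, Column) in coded units."""
    runs = []
    for te, ph, ac in product((-1, 1), repeat=3):
        os_level = ac * ph                          # assumed generator
        column = "ColA" if ac * te < 0 else "ColB"  # assumed generator
        runs.append((ac, ph, te, os_level, column))
    # Centre-points: quantitative factors at 0. The qualitative factor has no
    # centre, so two centre-points are placed on each column (assumption).
    for column in ("ColA", "ColA", "ColB", "ColB"):
        runs.append((0, 0, 0, 0, column))
    return runs


design = hplc_robustness_design()
column_counts = {c: sum(1 for run in design if run[4] == c) for c in ("ColA", "ColB")}
```

Each quantitative column is balanced (it sums to zero), and under the assumed layout both column batches appear in six runs.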
Tasks
Task 1
Define a new investigation in MODDE with five factors and three responses. Select Screening, a linear model, and a relevant fractional factorial design with 8 + 4 runs. Enter the response data. Evaluate the raw data. Is there any need for data pre-treatment, such as a response transformation?
Task 2
Fit the linear regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Comment on any lack of fit. Which responses are robust to changes in the five factors?
Task 3
Assuming that the specification for k2 was 2.7 to 3.3, what would your recommendation be for changing the tolerances of the factors so that robustness is likely to be achieved for this response? NOTE #1: This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is shown here for illustration. NOTE #2: Use the discussion regarding the four limiting cases of robustness testing. It will give guidance to how this problem might be solved.
Figure: Plots of replications for k1, k2 and Res1 (response versus replicate index, with experiment number labels).
In evaluation of the raw data, it is compulsory to check the data distribution of the responses, to reveal any need for response transformation. We may check this need by making a histogram of each response. Such histograms are plotted below and they inform us that it is appropriate to work in the untransformed scale of each response. In most cases it is convenient to work with log k, but not here.
Figure: Histograms of the responses k1, k2 and Res1 (count versus bins).
Task 2
The regression analysis phase in robustness testing is carried out in a manner similar to that of screening and optimisation. However, the focus is primarily placed on the R2 and Q2 parameters, and the analysis of variance results, but not so much on residual plots and other graphical tools. The reason for this is that the interest in robustness testing lies in classifying the regression model as significant or not significant. With such information it is then possible to get an understanding of the robustness. Another modelling difference between robustness testing and screening/optimisation is that model refinement is usually not carried out.

We fitted a linear model with 6 terms to each response. The overall results of the model fitting are displayed in the summary of fit plot. The predictive power ranges from poor to excellent. The Q2 values are 0.92, 0.96, and 0.12, for k1, k2, and Res1, respectively. In robustness testing the ideal result is a Q2 near zero. Hence, the Q2 of 0.12 for Res1 is an indication of an extremely weak relationship between the factors and the response, that is, it seems as if the response is robust. The low Q2 for Res1 might be explained by the fact that this response is close to constant across the entire design, and hence there is not much response variation to account for. The high Q2s of k1 and k2, on the other hand, indicate that these responses are sensitive to the small factor changes. However, for these latter responses we cannot make any robustness statement, as there are no specifications to compare with.
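To make the R2/Q2 reasoning concrete, the sketch below computes both quantities for a small linear model. It assumes the common definitions R2 = 1 − SS_residual/SS_total and Q2 = 1 − PRESS/SS_total with leave-one-out cross-validation; the cross-validation scheme used by MODDE is not stated in this text. The design (2^2 corners plus three centre-points) and both response vectors are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(b, row):
    return sum(bi * xi for bi, xi in zip(b, row))


def r2_q2(X, y):
    """Return (R2, Q2), with Q2 based on leave-one-out predictions."""
    mean = sum(y) / len(y)
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    b = fit(X, y)
    ss_res = sum((yi - predict(b, r)) ** 2 for r, yi in zip(X, y))
    press = 0.0
    for i in range(len(y)):
        b_i = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b_i, X[i])) ** 2
    return 1 - ss_res / ss_tot, 1 - press / ss_tot


# Columns: constant, factor 1, factor 2 (coded units); hypothetical design.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1],
     [1, 0, 0], [1, 0, 0], [1, 0, 0]]
y_sensitive = [2.82, 3.61, 2.39, 3.18, 3.02, 2.98, 3.00]  # hypothetical
y_robust = [1.80, 1.78, 1.83, 1.79, 1.81, 1.80, 1.79]     # hypothetical

r2_s, q2_s = r2_q2(X, y_sensitive)
r2_r, q2_r = r2_q2(X, y_robust)
```

The sensitive response gives R2 and Q2 close to 1. The nearly constant response gives a fairly large R2 but a much lower Q2, which is the pattern described above for a robust response.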
Figure: Summary of fit (R2, Q2, model validity, reproducibility) for k1, k2 and Res1 (MLR, N=12, DF=6, Cond. no.=1.2289).
The results of the second diagnostic tool, the analysis of variance, are summarised in the three tables. Remembering that the upper p-value should be smaller than 0.05 and the lower p-value larger than 0.05, we realise that the former test is a borderline case with respect to Res1, because the upper listed p-value is 0.059. This suggests that the model for Res1 is insignificant, and therefore that Res1 is robust.
Task 3
The derived models will now be used in a general discussion concerning various outcomes of robustness testing. In this discussion a possible solution to the problem given in Task 3 is presented.

First limiting case: Inside specification/Significant model

The first limiting case is inside specification with a significant model. The HPLC application contains one example of this limiting case, the Res1 response. We know from the initial raw data assessment that this response is robust, because all the measured values are inside the specification, that is, above 1.5. Actually, as highlighted in the first figure below, the measured values are all above 1.75. The question of a significant model, however, is more debatable. It is possible to interpret the regression model as a weakly significant regression equation. We will do so in this section for the sake of illustration. The classification of the model as significant is based on a joint assessment of the low, but positive, Q2, seen in the second figure, and the significant linear term of acetonitrile, seen in the third figure. Hence, Res1 may be regarded as a representative of the first limiting case.
Figures: Plot of replications for Res1; summary of fit for k1, k2 and Res1 (MLR, N=12, DF=6, Cond. no.=1.2289); scaled and centered coefficients for Res1 (N=12, DF=6, R2=0.772, Q2=0.121).
An interesting consequence of these modelling results is that it appears to be possible to relax the factor tolerances and still maintain a robust system. For instance, the model interpretation reveals that the amount of acetonitrile could be as high as 28%, without compromising the goal of upholding a resolution above 1.5. Furthermore, in robustness testing it may be useful to estimate the response values of the most extreme experiments. The regression coefficient plot shows how to obtain these estimates. We can see that one extreme experimental condition is given by the factor combination: low Ac, high pH, high Te, high OS, and ColB. The other extreme experiment is this pattern reversed. The prediction spreadsheet gives these Res1 predictions and they are both valid with regard to the given specification.
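The idea of reading the most extreme predictions from the coefficient plot can be sketched for a linear model in coded units (each factor between -1 and +1). The constant and coefficient values below are hypothetical and only chosen so that the pattern low Ac, high pH, high Te, high OS and ColB is one extreme. The qualitative column factor is simplified to a single ±1 contrast, which is also an assumption.

```python
def extreme_predictions(constant, coefficients):
    """Lowest and highest prediction of a linear model over the coded cube."""
    swing = sum(abs(b) for b in coefficients.values())
    return constant - swing, constant + swing


def settings_for_maximum(coefficients):
    """Factor levels giving the highest prediction; reverse them for the lowest."""
    return {name: "high" if b > 0 else "low" for name, b in coefficients.items()}


# Hypothetical scaled and centered coefficients for Res1.
res1_constant = 1.80
res1_coefficients = {"Ac": -0.030, "pH": 0.010, "Te": 0.010, "OS": 0.005, "ColB": 0.005}

lowest, highest = extreme_predictions(res1_constant, res1_coefficients)
best_settings = settings_for_maximum(res1_coefficients)

# Compare both extremes with the resolution specification of 1.5 or higher.
both_inside_spec = lowest >= 1.5
```

Because the model is linear, the two extremes always sit at opposite corners of the design, which is why the second extreme experiment is simply the first pattern reversed.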
Second limiting case: Inside specification/Non-significant model

The second limiting case is inside specification with a non-significant model. This is the ideal outcome of a robustness test. Again, we will use the Res1 response as an illustration. We know that the measured values of this response are all inside specification. In addition, we can interpret the obtained regression model as non-significant. This classification of the model as non-significant is contrary to the classification made in the previous section, but is still reasonable and is made for the purpose of illustrating the second limiting case.

In general, to assess model significance, two diagnostic tools emerge as the most appropriate. The first tool is the R2/Q2 parameters. When these are both near zero, as is the situation in the left-hand figure below, we have the ideal case. This means that we are trying to model a system in which there is no relationship between the factors and the response in question. In reality, however, one has to expect that small deviations from this outcome will occur. A typical result is the case when R2 is rather large, in the range of 0.5-0.8, and Q2 low or close to zero. As shown in the middle figure, this is the case for Res1, which points to an insignificant model.
Figures: Summary of fit for the response vetific (investigation itdoe_roblimcases, Cond. no.=1.1726); summary of fit for k1, k2 and Res1 (N=12, DF=6, Cond. no.=1.2289).
The second important modelling tool relates to the analysis of variance, and particularly the upper F-test, which is a significance test of the regression model. We can see in the right-hand figure that the Res1 model is weakly insignificant because the p-value (0.059) exceeds 0.05. Hence, we conclude that no useful model is obtainable. When no model is obtainable it is reasonable to anticipate that all the variation in the experiments can be seen as a variation around the mean. This variation can then be expressed as the mean value ± t-value * standard deviation.

Third limiting case: Outside specification/Significant model

The third limiting case is outside specification with a significant model. This limiting case occurs whenever a significant regression model is acquired, and the raw response data themselves do not fulfil the goals of the problem formulation. We will use the second response, k2, of the HPLC data to illustrate this limiting case. In order to accomplish a meaningful illustration, we will have to define a specification for k2, for example that k2 should be between 2.7 and 3.3. This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is shown here for illustration. We start by assessing the statistical behaviour of the k2 regression model. This behaviour is evident from the left-hand figure below, which indicates the sensitivity of k2 (as well as k1) to small factor changes. In order to understand what is causing this susceptibility to changes in the factors, it is necessary to consult the regression coefficients displayed in the right-hand figure.
Figures: Summary of fit for k1, k2 and Res1 (MLR, N=12, DF=6, Cond. no.=1.2289); scaled and centered coefficients for k2 (N=12, DF=6, R2=0.989, Q2=0.959).
We can see that it is mainly acetonitrile, pH and temperature, that affect k2. Using the procedure outlined in connection with the first limiting case, we may understand how to change the factor intervals to accomplish two things: (i) how to get k2 inside specification; and (ii) how to produce a non-significant model (i.e., how to approach the second limiting case). Firstly, it is possible to predict the most extreme experimental values (in the investigated area) of k2. These are the predictions listed on the first two rows in the next figure, and they amount to 2.50 and 3.49. Clearly, we are outside the 2.7-3.3 specification.
In order to move to within this specification, we must adjust the factor ranges of the three influential factors, and this is shown in rows three and four. If we also want the 95% confidence intervals, and not only the point estimates, to be inside specification, somewhat harder demands on the factors are needed. Moreover, to get a non-significant regression model even narrower factor intervals are needed. This is done as follows: The regression coefficient of acetonitrile is 0.33 and its 95% confidence interval ±0.036. These numbers mean that this coefficient must be decreased by a factor of 10, that is, be smaller than around 0.03, in order to make this factor non-influential for k2. Since this coefficient corresponds to the response change when the amount of acetonitrile is increased by 1% (from 26% to 27%), the new high level must be lowered from 27% to 26.1%. A similar reasoning applies to the new lower level. Hence, the narrower, more robust, factor tolerances of acetonitrile ought to be between 25.9% and 26.1%. A similar reasoning for temperature indicates that the factor interval should be decreased to one-third of the original size. Appropriate low and high levels thus appear to be 20 °C and 23 °C. Predictions obtained are listed in rows five and six. These new settings must, of course, be verified with a new design. This concludes our treatment of the third limiting case. The take-home message here is that it is possible to use the modelling results to understand how to reformulate the factor settings so that robustness can be obtained.

Fourth limiting case: Outside specification/Non-significant model

The fourth limiting case is outside specification with a non-significant model. This limiting case may be the result when the derived regression model is poor, and there are anomalies in the data. Such anomalies are important to uncover, because their presence will influence the modelling.
An informative graphical tool for identifying whether this limiting case is taking place is the replicate plot. The left-hand figure shows an example in which one strong outlier is present, which will invalidate all possibilities for robustness. The second figure depicts another case where all the replicated centre points have much higher response values than the other runs. This pattern hints at curvature and implies non-robustness. A third common situation, which partly resembles the first case, might take place when one experiment deviates from the rest and also falls outside some predefined robustness limits. This is shown in the last figure.
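Returning briefly to the third limiting case, the acetonitrile tolerance calculation described there can be written out as a small sketch. It assumes, as the text does, that in a linear model the scaled coefficient shrinks in proportion to the factor's half-range; the function name and argument names are hypothetical.

```python
def narrowed_limits(centre, half_range, coefficient, target):
    """Shrink a factor's half-range so that its scaled coefficient falls to `target`.

    Assumes a linear model, where the scaled coefficient is proportional to
    the half-range of the factor.
    """
    new_half_range = half_range * target / abs(coefficient)
    return centre - new_half_range, centre + new_half_range


# Values quoted in the text: coefficient 0.33 for a change of 1% acetonitrile
# around the 26% set point, and a target of about 0.03.
low, high = narrowed_limits(centre=26.0, half_range=1.0, coefficient=0.33, target=0.03)
rounded = (round(low, 1), round(high, 1))
```

Rounded to one decimal, the limits reproduce the 25.9% to 26.1% tolerance given above. As the text notes, such narrowed settings still have to be verified with a new design.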
Figures: Three plots of replications for the response vetific (investigation itdoe_roblimcases), illustrating a strong outlier, centre points lying well above the other runs, and a single deviating experiment.
Evidently, there can be several underlying explanations to this limiting case, and we have just shown a few. Therefore, we consider this limiting case as the most complex one. In summary, we have described four limiting cases of robustness testing, and it is important to realise that robustness testing results are not statically locked to these four extreme outcomes. In principle, there is a gradual transition from one limiting case to another, and hence an infinite number of outcomes are conceivable.
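The four limiting cases can be summarised as a simple decision rule. The sketch below is only a schematic reading of the discussion: the measured values are hypothetical, and whether a model counts as significant is passed in as a judgement made by the analyst (for example from Q2 and the ANOVA p-value).

```python
def limiting_case(values, lower=None, upper=None, model_significant=False):
    """Return the number (1-4) of the limiting case for one response."""
    inside = all(
        (lower is None or v >= lower) and (upper is None or v <= upper)
        for v in values
    )
    if inside:
        return 1 if model_significant else 2
    return 3 if model_significant else 4


# Hypothetical measured values, consistent with the ranges mentioned in the text.
res1_values = [1.78, 1.85, 1.80, 1.76]
k2_values = [2.55, 3.45, 3.00]

res1_if_significant = limiting_case(res1_values, lower=1.5, model_significant=True)
res1_if_not = limiting_case(res1_values, lower=1.5, model_significant=False)
k2_case = limiting_case(k2_values, lower=2.7, upper=3.3, model_significant=True)
```

The two calls for Res1 mirror how the text classifies the same response as either the first or the second limiting case, depending on how the borderline model is interpreted.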
Conclusions
Evaluation of the data demonstrated that the response Res1 was robust because it was possible to maintain a resolution above 1.5 for all 12 experiments.
Background
This is an industrial pilot plant investigation aimed at designing a cake mix giving tasty products.
Objective
The final goal was to design a cake mix which would produce a good cake even when a customer does not rigidly follow the baking instructions. To explore whether this was feasible, the factors Flour, Shortening, and Egg Powder were used as design factors and varied in a cubic inner array. They were varied between 200-400 g (Flour), 50-100 g (Shortening), and 50-100 g (Egg Powder). In addition, two noise factors were incorporated in the experimental design as a square outer array. These factors were baking Temperature, varied between 175 and 225 °C, and Time spent in oven, varied between 30 and 50 minutes.
Data
The investigators made 55 experiments. The inner array is a two-level full factorial design in 11 (8+3) runs, and the outer array a two-level full factorial design in 5 (4+1) runs, resulting in 11*5 = 55 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan is given in Figure 1.
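The crossing of the inner and outer arrays can be sketched as follows. The helper name is hypothetical, and the run order produced here is not claimed to match the worksheet order; only the factor ranges and run counts are taken from the text.

```python
from itertools import product


def two_level_full_factorial(levels, n_centre):
    """Corner runs of a two-level full factorial followed by centre-points.

    `levels` maps each factor name to its (low, high) setting.
    """
    names = list(levels)
    corners = [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]
    centre = {n: (levels[n][0] + levels[n][1]) / 2 for n in names}
    return corners + [dict(centre) for _ in range(n_centre)]


# Inner array: design factors (8 corners + 3 centre-points = 11 runs).
inner = two_level_full_factorial(
    {"Flour": (200, 400), "Shortening": (50, 100), "EggPowder": (50, 100)}, n_centre=3
)
# Outer array: noise factors (4 corners + 1 centre-point = 5 runs).
outer = two_level_full_factorial({"Temp": (175, 225), "Time": (30, 50)}, n_centre=1)

# Every inner-array run is baked at every outer-array condition.
crossed = [{**noise, **recipe} for noise in outer for recipe in inner]
```

The crossed list has 11 * 5 = 55 entries, one for each experimental combination.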
(Schematic: a cube spanned by Flour 200-400, Shortening 50-100 and Eggpowder 50-100, with a Temp 175-225 by Time 30-50 square attached to each inner-array point.)
Figure 1: The arrangement of the factors as inner and outer arrays. This arrangement was introduced by the Japanese engineer Genichi Taguchi.
Organisation of data for part I (MODDE worksheet should have 11 runs): The classical approach for analysing DOE data organised in inner and outer arrays is to form, for each point in the inner array (here: Cake Mix factors), the average response value across all points in the outer array (here: Time & Temperature). This gives two responses, the average taste for each point in the inner array, and the standard deviation around this average. Note: with this approach there will be no model terms related to Time and Temperature.
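As a small illustration of this classical approach, the sketch below forms the two responses for one inner-array point. The five Taste values are those listed in the 55-run worksheet of this exercise for Flour = 400, Shortening = 50, Eggpowder = 100 (runs 6, 17, 28, 39 and 50). The base-10 logarithm is an assumption, suggested by the LogStD label used in the plots.

```python
from math import log10
from statistics import mean, stdev

# Taste across the five outer-array conditions for one cake-mix recipe
# (worksheet runs 6, 17, 28, 39 and 50).
taste_across_outer_array = [5.0, 6.0, 5.9, 5.7, 6.9]

average_taste = mean(taste_across_outer_array)   # first response
taste_stdev = stdev(taste_across_outer_array)    # second response (sample StDev)
log_stdev = log10(taste_stdev)                   # assumed transform behind "LogStD"
```

For this recipe the average Taste is 5.9 and the standard deviation is about 0.68. Repeating the calculation for all 11 inner-array points gives the 11-row worksheet used in part I.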
Tasks
Task 1
Define a new investigation in MODDE with three factors and two responses. Select Screening and an interaction model. Select a full factorial design with 8 (corners) + 3 (centre-points) runs. Evaluate the raw data. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are there any outliers? Comment on lack of fit. Which factor combination leads to an optimal taste? Which factors correlate with StDev? How shall the inner array factors be set to minimise the influence of the outer array factors?
Organisation of data for part II (MODDE worksheet should have 55 runs): A problem with the foregoing analysis approach is that it does not enable a quantitative understanding of the impact of baking Time and Temperature, since these factors were not introduced in the regression model. One way to accomplish this is to re-organise the worksheet so that it contains all 55 experiments and five factors in the model. The consequence of this latter interaction analysis approach is that the StDev response vanishes. Another advantage of this latter approach is that it is possible to identify outliers.
No  Flour  Shortening  Eggpowder  Temp  Time  Taste
1   200    50          50         175   30    1.1
2   400    50          50         175   30    3.8
3   200    100         50         175   30    3.7
4   400    100         50         175   30    4.5
5   200    50          100        175   30    4.2
6   400    50          100        175   30    5
7   200    100         100        175   30    3.1
8   400    100         100        175   30    3.9
9   300    75          75         175   30    3.5
10  300    75          75         175   30    3.4
11  300    75          75         175   30    3.4
12  200    50          50         225   30    5.7
13  400    50          50         225   30    4.9
14  200    100         50         225   30    5.1
15  400    100         50         225   30    6.4
16  200    50          100        225   30    6.8
17  400    50          100        225   30    6
18  200    100         100        225   30    6.3
19  400    100         100        225   30    5.5
20  300    75          75         225   30    5.15
21  300    75          75         225   30    5.3
22  300    75          75         225   30    5.4
23  200    50          50         175   50    6.4
24  400    50          50         175   50    4.3
25  200    100         50         175   50    6.7
26  400    100         50         175   50    5.8
27  200    50          100        175   50    6.5
28  400    50          100        175   50    5.9
29  200    100         100        175   50    6.4
30  400    100         100        175   50    5
31  300    75          75         175   50    4.3
32  300    75          75         175   50    4.05
33  300    75          75         175   50    4.1
34  200    50          50         225   50    1.3
35  400    50          50         225   50    2.1
36  200    100         50         225   50    2.9
37  400    100         50         225   50    5.2
38  200    50          100        225   50    3.5
39  400    50          100        225   50    5.7
40  200    100         100        225   50    3
41  400    100         100        225   50    5.4
42  300    75          75         225   50    4.1
43  300    75          75         225   50    3.8
44  300    75          75         225   50    3.8
45  200    50          50         200   40    3.1
46  400    50          50         200   40    3.2
47  200    100         50         200   40    5.3
48  400    100         50         200   40    4.1
49  200    50          100        200   40    5.9
50  400    50          100        200   40    6.9
51  200    100         100        200   40    3
52  400    100         100        200   40    4.5
53  300    75          75         200   40    6.6
54  300    75          75         200   40    6.5
55  300    75          75         200   40    6.7
Task 2
Define a new investigation in MODDE with five factors and one response. Select Screening as objective and an interaction model. Create a design with 55 rows. Paste contents from CakeTaguchi.DIF into the MODDE worksheet. Evaluate the raw data. Fit the model. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine baking Time and Temperature. Are they influential? How shall the cake-mix recipe be modified to minimise the influence of baking time and temperature?
Solutions to CakeTaguchi
Task 1
It is instructive to first consider the raw experimental data. The first two plots show the replicate plots of the responses. We see that for both responses the replicate error is small and therefore satisfactory. It is also interesting that the responses are inversely correlated (third figure). We recall that the experimental goal is a factor combination producing a tasty cake and with low variation. Hence, it seems as if experiment number 6 is the most promising one.
Figures: Plot of replications for Taste; plot of replications for LogStD; raw data plot of LogStD versus Taste (all with experiment number labels).
Next, we examine the modelling results obtained when fitting an interaction model to each response. Note that the negative Q2 of StDev indicates model problems. The model for Taste is of higher quality, but we remember from previous modelling attempts (see Exercise CakeMix) that even better results are possible if the two nonsignificant two-factor interactions are omitted.
Figures: Summary of fit for Taste and LogStD (interaction model, N=11, DF=4, Cond. no.=1.1726); scaled and centered coefficients for Taste (R2=0.995, Q2=0.874) and for LogStD.
The results from fitting a refined model to each response are seen below. The model for StDev has improved a lot as a result of model pruning. Two interesting observations can now be made. The first is related to the Sh*Egg interaction, which is much smaller for StDev than for Taste. The second observation concerns the Fl main effect, which shows that Flour is the factor causing most spread around the average Taste. Hence, this is a factor to adjust in order to achieve robustness. The models that we have derived will now be used to accomplish the experimental goal.
Figures: Summary of fit for Taste and LogStD (refined model, N=11, DF=6, Cond. no.=1.1726); scaled and centered coefficients for Taste (R2=0.988, Q2=0.937) and for LogStD (R2=0.939, Q2=0.677).
One way to understand the impact of the surviving two-factor interaction is to make interaction plots. Evidently, the impact of this model term is greater for Taste than for StDev. This is inferred from the fact that the two lines cross each other in the plot related to Taste, but do not cross in the other interaction plot. Both plots indicate that low level of Shortening and high level of EggPowder is favourable for high Taste and low StDev.
Figures: Interaction plots for Sh*Egg for the responses Taste and LogStD, with Shortening (50-100) on the x-axis and separate lines for Egg (low) and Egg (high) (N=11, DF=6; Taste: R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974).
An alternative procedure for understanding the modelled system is to make the response contour plots shown below. These contours were created by setting Flour to its high level, as this was found favourable in the modelling. The two contour plots convey an unambiguous message. The best cake mix conditions are found in the upper left-hand corner, where the highest taste is predicted, and at the same time the lowest standard deviation. This location corresponds to the factor settings Flour = 400, Shortening = 50, and EggPowder = 100. At this factor combination, Taste is predicted at 5.84 ± 0.18, and StDev at 0.69 with a 95% confidence interval from 0.55 to 0.87. Bearing in mind that the highest registered average value of Taste is 5.9, and the lowest value of StDev is 0.67, these predictions appear reasonable.
[Figure: Response contour plots of Taste and StDev, Flour = 400 g.]
Task 2
One drawback of the classical data analytical approach is that it does not allow the user to identify which noise factors could affect the variability of the responses. For the Taguchi method to be really successful, one would need to be able to estimate the impact of the noise factors and possible interactions between the design and the noise factors. Clearly, by definition, the success of the Taguchi approach critically depends on the existence of such noise-design factor interactions. Otherwise, the noise (variability) cannot be reduced by changing some design factors. Information about noise-design factor interactions can be extracted if both the noise and the design factors are combined in a single design. Then, a regression model can be fitted which contains both types of factors and their interactions. In this form of analysis, design factor effects in the classical approach (Task 1) now correspond to noise-design factor interactions (Task 2). We will now unfold the data table so that it comprises 55 rows and proceed with the Taguchi analysis. As usual, we commence the data analysis by evaluating the raw data. The replicate plot suggests that the replicate error is small, and the histogram shows that the response is approximately normally distributed. Hence, we may proceed to the regression analysis phase, without further pre-processing of the data.
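The unfolding step described above can be sketched in a few lines. This is a minimal sketch in coded units, assuming the design (inner) array is a 2^3 factorial in Flour, Shortening and EggPowder with three centre points (11 runs) and the noise (outer) array is a 2^2 factorial in Temperature and Time with one centre point (5 runs). These array layouts and the helper name `unfold` are assumptions made for illustration; only the 11 x 5 = 55 structure is taken from the text.

```python
from itertools import product

# Inner (design) array: 2^3 factorial plus 3 centre points, coded -1/0/+1.
# Assumed layout; the text only states that there are 11 design runs.
inner = [dict(zip(("Fl", "Sh", "Egg"), lv)) for lv in product((-1, 1), repeat=3)]
inner += [{"Fl": 0, "Sh": 0, "Egg": 0} for _ in range(3)]

# Outer (noise) array: 2^2 factorial plus 1 centre point (assumed layout).
outer = [dict(zip(("Te", "Ti"), lv)) for lv in product((-1, 1), repeat=2)]
outer += [{"Te": 0, "Ti": 0}]


def unfold(inner_runs, outer_runs):
    """Cross every design run with every noise condition (one row per cake)."""
    rows = []
    for d in inner_runs:
        for n in outer_runs:
            row = {**d, **n}
            # Noise-design interaction columns are plain products of coded levels.
            row["Fl*Te"] = row["Fl"] * row["Te"]
            row["Fl*Ti"] = row["Fl"] * row["Ti"]
            row["Fl*Te*Ti"] = row["Fl"] * row["Te"] * row["Ti"]
            rows.append(row)
    return rows


table = unfold(inner, outer)
print(len(table))  # 11 design runs x 5 noise conditions
```

Each row of `table` would then receive its own Taste value, so that the regression can estimate design factors, noise factors and their interactions in a single model.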
[Figure: Histogram of Taste and replicate plot of Taste against Replicate Index. Investigation: CakeTaguchi_interaction.]
As seen in the summary of fit plot, the regression analysis gave a poor model with R2 = 0.60 and Q2 = 0.18. Such a large gap between R2 and Q2 is undesirable and indicates model inadequacy. The N-plot of residuals in the next figure reveals no clues as to the poor modelling performance. The model also shows lack of fit (negative MVal).
Copyright Umetrics AB, 04-02-10
[Figure: Summary of fit (R2, Q2, Model Validity, Reproducibility) and N-probability plot of residuals for Taste with experiment number labels. Investigation: CakeTaguchi_interaction (MLR); Cond. no.=1.3110.]
However, the regression coefficient plot does reveal two plausible causes. Firstly, the model contains many irrelevant two-factor interactions. Secondly, it is surprising to see that the Fl*Te and Fl*Ti two-factor interactions are so weak. Since we observed (in Task 1) the strong impact of Flour on StDev, we would now expect much stronger noise-design factor interactions. In principle, this means that there must be a crucial higher-order term missing from the model, the Fl*Te*Ti three-factor interaction. Consequently, in the model revision, we decided to add this three-factor interaction and remove six unnecessary two-factor interactions.
[Figure: Scaled & centered coefficients for Taste (main effects Fl, Sh, Egg, Te, Ti and all two-factor interactions). Investigation: CakeTaguchi_interaction (MLR); N=55, DF=39, R2=0.605, Q2=0.185.]
When re-analysing the data, a more stable model with a reasonable R2 = 0.69 and Q2 = 0.57 was the result. An interesting aspect is that the R2 obtained is lower than in the classical analysis approach. This is due to the stabilising effect achieved by forming the average Taste over five trials in the classical analysis approach. Concerning the current model, we are unable to detect significant outliers among the individual experiments. The relevant N-plot of residuals is displayed below.
[Figure: N-probability plot of deleted studentized residuals for Taste with experiment number labels. Investigation: CakeTaguchi_interaction (MLR); N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751, Cond. no.=1.3110.]
Having acquired a reasonable model, it is appropriate to consider the regression coefficients, which are displayed below. We can see the significance of the new three-factor interaction. This is in line with the previous finding on the impact of Flour on StDev. Some smaller two-factor interactions, which are components of the three-factor term (i.e. Fl*Te, Fl*Ti and Te*Ti), are kept in the model to make the three-factor interaction more interpretable.
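The model revision can be cross-checked by counting terms. The sketch below assumes that the DF values reported in the plot footers (39 and 44) equal the number of runs minus the number of model terms, with the constant counted as one term; the variable names are illustrative only.

```python
from itertools import combinations

factors = ["Fl", "Sh", "Egg", "Te", "Ti"]
n_runs = 55

# Initial interaction model: constant + main effects + all two-factor interactions.
two_fi = ["*".join(pair) for pair in combinations(factors, 2)]
initial_terms = ["Constant"] + factors + two_fi

# Revised model: six two-factor interactions removed, Fl*Te*Ti added.
kept_two_fi = ["Sh*Egg", "Fl*Te", "Fl*Ti", "Te*Ti"]
revised_terms = ["Constant"] + factors + kept_two_fi + ["Fl*Te*Ti"]

df_initial = n_runs - len(initial_terms)
df_revised = n_runs - len(revised_terms)
print(df_initial, df_revised)  # prints: 39 44
```

Under these assumptions the counts reproduce the reported degrees of freedom, which suggests that all five main effects were retained alongside the four surviving two-factor interactions.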
[Figure: Scaled & centered coefficients for Taste, revised model including Fl*Te*Ti. Investigation: CakeTaguchi_interaction (MLR); N=55, DF=44, R2=0.693, Q2=0.571.]
The meaning of the three-factor interaction is most easily understood by constructing an interaction plot. The figure below displays the impact of the three-factor interaction. What should we look for in this kind of plot? The answer is that we want to get an indication of how to adjust the controllable factor Flour, so that the impact of variations in the uncontrollable factors Temperature and Time is minimised. The figure shows that by adjusting Flour to 400 g the spread in Taste due to variations in Temperature and Time is reduced.
[Figure: Interaction plot for Fl*Te*Ti, response Taste, plotted against Flour (g).]
Furthermore, in solving the problem, we must not forget the significance of the strong Sh*Egg two-factor interaction. We know from the initial analysis that the combination of low Shortening and high EggPowder produces the best cakes. With these considerations in mind we draw the response contour triplet shown in the figure below. Because these contours are relatively flat, especially when Flour = 400, we can conclude that the system is robust. Hence, when industrially producing a cake mix with the composition Flour 400 g, Shortening 50 g, and EggPowder 100 g, together with a cooking recommendation of 200 °C and 40 min, sufficient robustness towards consumer misuse ought to be the result.
Conclusions
This example illustrates two principal approaches to the analysis of Taguchi-designed data. In the analysis, it was found that an important three-factor interaction existed between Flour, Time and Temperature (Fl*Ti*Te). By interpreting this term it was concluded that the impact of Time and Temperature on variation in Taste was minimised by adjusting Flour from 300 g (initial set-point) to 400 g (new product recipe).
Background
Many factors at a bakery can affect the quality of loaves including factors related to the recipe of the dough and those related to the baking conditions. Naes et al. [Chemometrics and Intelligent Laboratory Systems 41 (1998) 221-235.] carried out an extensive project in which the influence of five factors on the volume of loaves was studied. Three varieties of wheat flour were used, Tjalve, Folke and HardRS, and two factors were related to baking conditions, i.e., mixing time and proofing time of the dough. The latter two factors may be inconsistent from one bakery to another. As a quality index of loaf formation, the loaf volume was used.
Objective
The experimental objective of this study was to accomplish a factor combination yielding the target loaf volume of 530 cm3. The idea was to find a combination of the three wheat flours constantly yielding loaves of the target volume, and thus being insensitive to changes in mixing time and proofing time.
Data
The investigators made 90 experiments. The experimental plan contains an inner array made up of the three mixture factors (Tjalve, Folke, HardRS) and an outer array consisting of the two process factors (mixing time and proofing time). The inner array is a Simplex Centroid design in 10 runs, and the outer array a CCF in 9 runs resulting in 10*9 = 90 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan used is given in Figure 1.
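The crossing of the inner and outer arrays can be sketched as follows. This is a minimal sketch: the source only states that the inner array is a Simplex Centroid design in 10 runs, so the three interior check points used here to reach 10 runs are an assumption, and the process factors are written in coded units.

```python
from itertools import product
from fractions import Fraction as F

# Inner array: simplex centroid in (Tjalve, Folke, HardRS), augmented with
# three interior check points to reach 10 runs (augmentation is assumed).
vertices = [(F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1))]
edges = [(F(1, 2), F(1, 2), F(0)), (F(1, 2), F(0), F(1, 2)), (F(0), F(1, 2), F(1, 2))]
centroid = [(F(1, 3), F(1, 3), F(1, 3))]
interior = [(F(2, 3), F(1, 6), F(1, 6)), (F(1, 6), F(2, 3), F(1, 6)),
            (F(1, 6), F(1, 6), F(2, 3))]
inner = vertices + edges + centroid + interior

# Outer array: CCF in (mixing time, proofing time), i.e. every combination
# of the coded levels -1, 0, +1 with a single centre point.
outer = list(product((-1, 0, 1), repeat=2))

# Every mixture is baked under every process condition.
plan = [mix + proc for mix, proc in product(inner, outer)]
print(len(inner), len(outer), len(plan))
```

Exact fractions are used so that the mixture constraint (proportions summing to 1) can be checked without rounding error.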
Tasks
Task 1
Define a new investigation in MODDE with three formulation factors and two responses. Select RSM as objective and a quadratic model. Create a mixture design with 10 rows. Paste contents from LoafVol2.DIF into the MODDE worksheet.
Task 2
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Is it possible to get a volume of 530 cm3 and minimise the spread (standard deviation)? It is desirable to get the standard deviation below 60.
Task 3
Define a new investigation in MODDE with two process factors, three formulation factors, and one response. Select RSM as objective and a quadratic model. Create a D-optimal design with 90 rows. Paste contents from LoafVolume.DIF into the MODDE worksheet.
Task 4
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine mixing time and proofing time. Are they influential?
Solutions to LoafVolume
Task 2
Using the default quadratic model, a strongly significant model for the average loaf volume was obtained. However, the model for StDev was weaker (low Q2 and problems in ANOVA). The residuals are nearly normally distributed for both responses. Note that the ANOVA is not complete, because there are no replicates available. We can observe from the plot of the raw data (StDev is plotted against loaf volume) that the two responses are strongly correlated (correlation coefficient = 0.90). This means that it will be difficult to get a high value of volume and a low value of the standard deviation.
[Figure: Summary of fit, raw data plot of stdev against loafvolume, and N-probability plots of standardized residuals for loafvolume and stdev with experiment number labels. Investigation: Loafvol2 (PLS, comp.=2); N=10, DF=4, Cond. no.=6.8608. loafvolume: R2=0.953, Q2=0.782, R2 Adj.=0.894, RSD=10.4747. stdev: R2=0.795, Q2=0.281, R2 Adj.=0.539, RSD=8.5309.]
The coefficient plot of the model for loaf volume indicates that both Folke and HardRS affect the volume, whereas Tjalve does not have the same strong influence. The same two factors also affect the StDev response, although these coefficients are not statistically significant according to their 95% confidence intervals. One efficient way of understanding the impact of these models is to make mixture contour plots. The solid arrow indicates where the best compromise is found: the mixture 0.25/0.11/0.64, where loaf volume is estimated at 530 cm3 and StDev is as low as possible. To get the prediction uncertainty for this point we may use the prediction spreadsheet in MODDE. It appears possible to suppress the standard deviation below 70, but not below 60. As a consequence, the conclusion of the classical analysis approach is that the mixture of wheat flours used for loaf baking cannot be made sufficiently insensitive towards changes in mixing and proofing times between different bakeries.
[Figure: Scaled & centered coefficients (cm3) for loafvolume and stdev; terms Tj, Fo, Ha, Tj*Tj, Fo*Fo, Ha*Ha, Tj*Fo, Tj*Ha, Fo*Ha. Investigation: Loafvol2 (PLS, comp.=2); N=10, DF=4.]
Task 4
The PLS modelling resulted in a strong model for loaf volume. The R2 and Q2 values of this model are slightly lower than the corresponding values for the previous model regarding the average loaf volume, but the model is very good. The ANOVA table and the N-plot of residuals also suggest that the acquired model is good. In addition, the two PLS score plots reveal the strong correlation among the five factors and the response. When looking at the regression coefficients we realise the strong impact of proofing time (1st bar in coefficient plot) on loaf volume. Generally, with longer proofing time larger loaves are produced. This sensitivity to proofing time means that baking specifications distributed among the different bakeries ought to contain a recommendation regarding an appropriate proofing time. The time used for mixing the dough is less critical. Unfortunately, because there is no strong interaction between the process factors (proofing time & mixing time) and the mixture factors (three types of wheat flour) it will not be possible to adjust the mixture factors and affect loaf volume and minimise the spread in this property.
[Figure: Summary of fit, N-probability plot of standardized residuals, scaled & centered coefficients (cm3; terms Pr, Mi, Tj, Fo, Ha, Pr*Pr, Mi*Mi, Tj*Tj, Fo*Fo, Ha*Ha, Pr*Mi, Pr*Tj, Pr*Fo, Pr*Ha, Mi*Tj, Mi*Fo, Mi*Ha, Tj*Fo, Tj*Ha, Fo*Ha) and PLS score scatter plots t[1] vs u[1] and t[2] vs u[2] for loafvolume. Investigation: Loafvolume (PLS, comp.=2); N=90, DF=75, R2=0.894, Q2=0.754, R2 Adj.=0.874, RSD=22.6934, Conf. lev.=0.95, Cond. no.=8.2742.]
MODDE offers another graphical option to overview the modelling outcome, the 4D-mixture contour plot, which is displayed below. In this plot the colour coding has been made consistent across all nine plots. Thus, we can, for example, observe that when proofing time is kept low, it is impossible to manufacture loaves of the desired volume (530 cm3).
It is also possible to make a response contour plot showing how the loaf volume changes as a function of proofing and mixing times, at the identified mixture composition 0.25/0.11/0.64 (Tjalve/Folke/HardRS). The first plot shows how this is accomplished in MODDE and the second plot is the resulting graph. From the lower graph we may conclude that the loaf volume varies when changing proofing and mixing times, that is, the composition of the wheat flour mixture cannot be made such that the resulting loaves become insensitive to changes in the two process factors. To accomplish robustness in this respect much tougher restrictions are needed on the proofing and mixing times.
Conclusions
Loaf volume varies when changing proofing and mixing times. This means that the mixture of the wheat flours cannot be made insensitive to changes in the two process factors. To accomplish robustness in this respect much tougher restrictions are needed on the proofing and mixing times.
Background
Complementing an executed experimental design with additional runs is a common need in DOE. For instance, in a screening situation one may use fold-over to add more experiments to the initial fractional factorial design. Additionally, factorial and fractional factorial designs may be upgraded to more elaborate composite designs (CCF or CCC). Design augmentation may also be undertaken after optimization with the goal of transmuting e.g. a quadratic model to a cubic model. A common feature for these design augmentation principles is that the complement runs are appended to improve the modeling results in a general sense. Therefore, the model upgrading is quite unselective, as it applies to model terms originating from all factors varied. However, sometimes such a broad and unselective design augmentation might not provide the optimal solution to a problem. Rather, it might be desirable to select a critically low number of extra experiments, which are tailored to the estimation of a small set of new, well-identified model terms. This can be accomplished through D-optimal design.
Objective
In this example, we are going to work with a screening application concerning laser welding of nickel material in plate heat exchangers. The objective is not so much to deal with the regression analysis, but to focus on how to add extra runs to the original experimental protocol.
Data
This example relates to one step in the process of fabricating a plate heat exchanger, a laser welding step involving the metal nickel. The investigator, Erik Vännman, studied the influence of four factors on the shape and quality of the resulting weld. These factors were Power of laser, Speed of laser, Gas flow at Nozzle of welding equipment, and Gas flow at Root, that is, the underside of the welding equipment. One important response is the width of the weld, which should be in the range 0.7-1.0 mm.
Tasks
Task 1
Define a new project in MODDE consisting of four factors and one response. The design you will need is the 2^(4-1) fractional factorial design (8 + 3 runs). This design supports a linear model in the four factors. Enter the response data and fit the linear model to the data.
Task 2
Revise the model from Task 1 by estimating also the cross-term Po*Sp. Discuss the problem of including this term. (Hint: Look at Show/Confoundings).
Task 3
Model updating is often used after screening, when it is necessary to unconfound two-factor interactions. We will now outline the procedure for adding a few extra experiments to the laser welding data set.
Step 1: Make a copy of the current investigation and switch to this copy.
Step 2: In the new application, do File/Complement design (this opens a wizard).
Step 4: Select the number of additional runs. Comment: To unconfound two two-factor interactions, 4 extra experiments are appropriate. This implies that a balanced number of additional experiments is added.
Step 6: Select the number of additional center-points and name the new investigation. Comment: If we do not want to include any center points in the design supplement, the number of center points should be set to zero. This is appropriate if the time span between the 11 first experiments and the new ones is short. Conversely, if considerable time has elapsed between the initial and the new experiments, it is recommended to add one or two center-points to test that the system is stable over time.
Step 9: Evaluate the resulting designs. In this case all five alternatives are identical.
Your task is now the following: Use the approach outlined above and propose an updated experimental design that is able to resolve Po*Sp and No*Ro from one another. How many extra runs do you think are necessary? Experiment by selecting different numbers of runs and repetitions. Use the condition number and the G-efficiency to identify a suitable design. Also remember that many D-optimal proposals may exist with similar performance measures. It may be necessary to plot the configuration of a set of design candidates to identify the preferred design version. Note: Our solutions to this task display designs different from the one presented above.
As shown by the summary of fit plot below, the linear regression model is not reliable because of the large gap between R2 and Q2. We must then try to identify the cause of the low model quality. However, neither the analysis of variance nor the N-plot of residuals highlights any apparent reason for the model insufficiency. The regression coefficient plot shows that the factors Power of laser and Speed of laser dominate the model. One thing to test in order to improve the model is to estimate the cross-term between these two factors. This is dealt with in Task 2.
[Figure: Summary of fit (R2, Q2, Model Validity, Reproducibility), N-probability plot of residuals with experiment number labels, and scaled & centered coefficients for Width (mm), linear model. Investigation: Updating (MLR); N=11, DF=6, R2=0.816, Q2=-0.068, Cond. no.=1.1726.]
Task 2
As seen below, the introduction of the Po*Sp cross-term has a profound impact on the model quality. The regression coefficient plot shows that this term is almost as large as the main effect of Power of laser. Moreover, the model error has been lowered.
[Figure: Summary of fit (R2, Q2, Model Validity, Reproducibility), N-probability plot of residuals with experiment number labels, and scaled & centered coefficients for Width (mm), model with Po*Sp added. Investigation: Updating (MLR); N=11, DF=5, R2=0.962, Q2=0.610, Cond. no.=1.1726.]
However, because of the moderate resolution (IV) of the design used, the Po*Sp two-factor interaction is confounded with another two-factor interaction, namely No*Ro (see Correlation Matrix below). Therefore, the coefficient labeled Po*Sp (above) reflects the sum of contributions from the terms Po*Sp and No*Ro (plus a few higher-order interactions which are assumed negligible). The Confoundings list below gives an overview of the confounding pattern of the 2^(4-1) fractional factorial design. The only way to resolve Po*Sp and No*Ro from one another is to conduct more experiments. This is discussed in Task 3.
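The confounding described here can be reproduced directly from the design columns. The sketch below builds the eight factorial runs in coded units, assuming the generator Ro = Po*Sp*No; the source does not state the generator, but any generator with defining relation I = Po*Sp*No*Ro gives the same alias pattern. Centre points are omitted because all their interaction columns are zero.

```python
from itertools import combinations, product

names = ("Po", "Sp", "No", "Ro")

# 2^(4-1) design: full factorial in Po, Sp, No; Ro from the assumed generator.
runs = [dict(zip(names, (po, sp, no, po * sp * no)))
        for po, sp, no in product((-1, 1), repeat=3)]

# Build every two-factor interaction column and group identical columns.
groups = {}
for a, b in combinations(names, 2):
    column = tuple(r[a] * r[b] for r in runs)
    groups.setdefault(column, []).append(f"{a}*{b}")

aliases = sorted(groups.values())
for pair in aliases:
    print(" = ".join(pair))
```

Terms that share a column cannot be separated by any regression on these runs, which is why the coefficient labelled Po*Sp carries the No*Ro contribution as well.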
Task 3
The first aspect to consider at this stage is how many extra runs are needed. In principle, only two extra experiments are needed to resolve the Po*Sp and No*Ro two-factor interactions. In practice, however, four additional runs might offer a more stable solution. We will start by adding 2 experiments. This means that in the overview list of the D-optimal designs we should focus on the designs with 13 runs. Below, we see two different proposals displaying identical condition number and G-efficiency. We can see that for both designs variation has been induced in the factors No and Ro. Also note that alternative arrangements of the added experiments exist.
Resource permitting, the addition of four extra experiments will provide even better resolution between Po*Sp and No*Ro. Below, we show the outcome of a design proposal where four extra runs and two extra center points have been appended to the original data set. Remember that many other alternatives exist with identical performance measures. Quite a few of these are not balanced with regards to the four corner runs, meaning that the low and high level of each factor are not explored using the same number of runs for each level. A 2 + 2 distribution is preferable to a 1 + 3 distribution.
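The idea behind the D-optimal updating can be illustrated with a brute-force search. This is only a sketch and not MODDE's algorithm: it assumes the generator Ro = Po*Sp*No for the original design, a model with the constant, four main effects, Po*Sp and No*Ro, and the 16 corners of the full 2^4 factorial as candidate runs. It tries every pair of extra runs and keeps the pair that maximises det(X'X).

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product


def model_row(po, sp, no, ro):
    """Model terms: constant, four main effects, Po*Sp and No*Ro."""
    return [1, po, sp, no, ro, po * sp, no * ro]


def det_xtx(rows):
    """Exact determinant of X'X by Gaussian elimination on fractions."""
    p = len(rows[0])
    m = [[Fraction(sum(r[i] * r[j] for r in rows)) for j in range(p)]
         for i in range(p)]
    det = Fraction(1)
    for c in range(p):
        pivot = next((r for r in range(c, p) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular: some terms are confounded
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p):
                m[r][k] -= f * m[c][k]
    return det


# Original plan: 8 fractional factorial runs (assumed generator) + 3 centre points.
original = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
original += [(0, 0, 0, 0)] * 3
candidates = list(product((-1, 1), repeat=4))  # corners of the full 2^4

base = [model_row(*run) for run in original]
best = max(combinations_with_replacement(candidates, 2),
           key=lambda extra: det_xtx(base + [model_row(*e) for e in extra]))
print(det_xtx(base), best)
```

The determinant is zero for the original eleven runs and becomes positive once suitable runs are appended. Several pairs typically tie for the maximum, in line with the remark that many alternative arrangements exist; `max` simply returns the first of them.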
Conclusions
In the first instance, the researcher conducted a 2^(4-1) fractional factorial design with three center-points, that is, eleven experiments. In the analysis it was found that one two-factor interaction, the one between Po and Sp, was influential. However, because of the moderate resolution of this design, this two-factor interaction is confounded with another two-factor interaction, namely No*Ro. An escape-route out of this problem is to complement the existing design with more experiments. One possibility is the fold-over design, which enables resolution of Po*Sp from No*Ro, as well as resolution of the remaining four two-factor interactions. The disadvantage of making the fold-over is the large number of extra experiments: eleven additional runs are necessary. An alternative approach in this case, less costly in terms of experiments, is to make a D-optimal design updating, adding only a limited number of extra runs. It was shown how either two or four extra experiments, plus an optional number of center-points, could be added to the starting design to achieve this objective. The importance of design balancing was also addressed.
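The effect of the fold-over can be checked numerically. The sketch below assumes that the fold-over supplies the complementary half-fraction, obtained here by reversing the sign of Ro only, and that the original generator is Ro = Po*Sp*No; neither detail is stated in the text. (Reversing all four signs of a fraction with defining relation I = Po*Sp*No*Ro would reproduce the same eight runs.) Centre points are left out.

```python
from itertools import combinations, product

# Original half-fraction (assumed generator) as (Po, Sp, No, Ro) tuples.
original = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
# Complementary half-fraction: same runs with the sign of Ro reversed.
fold = [(a, b, c, -d) for a, b, c, d in original]
combined = original + fold


def two_fi_columns(runs):
    """Columns of all six two-factor interactions, keyed by factor indices."""
    return {(i, j): [r[i] * r[j] for r in runs]
            for i, j in combinations(range(4), 2)}


cols = two_fi_columns(combined)
# Every pair of interaction columns is orthogonal in the combined design.
orthogonal = all(sum(x * y for x, y in zip(cols[p], cols[q])) == 0
                 for p, q in combinations(cols, 2))
print(len(set(combined)), orthogonal)
```

With the two halves combined the runs form the full 2^4 factorial, so all six two-factor interactions become separately estimable, at the cost of eight corner runs plus whatever centre points are repeated.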
Background
In a chemical experiment the influence of two factors (time and temperature) on the yield of the main product was investigated. Initially a 2^2 factorial design augmented with two centre-points was performed. Preliminary examination of the results indicated that the experimental design was correctly positioned in the experimental space and hence there was no need to adjust the low and high settings of the factors. However, there was some indication of a non-linear relationship between the factors and the response and so the design was upgraded to a CCC design by adding the star points and two additional centre points. This design comprised 12 experiments: 4 corner points, 4 star points and 2+2 centre points. This data set is taken from Box GEP, Hunter WG, Hunter JS, Statistics for experimenters, John Wiley & Sons, 1978, p. 519.
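The two-stage design can be written out in coded units as a short sketch. The star distance of sqrt(2) is an assumption (the usual rotatable choice for two factors); the text only notes that the experimenters rounded the star-point settings, and the block labels B1 and B2 follow the plot legends.

```python
import math
from itertools import product

alpha = math.sqrt(2)  # assumed star distance in coded units

# Block B1: 2^2 factorial plus two centre points (first series of runs).
block1 = [(tim, temp, "B1") for tim, temp in product((-1, 1), repeat=2)]
block1 += [(0, 0, "B1")] * 2

# Block B2: four star points plus two further centre points (second series).
stars = [(-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha)]
block2 = [(tim, temp, "B2") for tim, temp in stars]
block2 += [(0, 0, "B2")] * 2

design = block1 + block2  # rows are (time, temperature, block)
for row in design:
    print(row)
```

Keeping the block label as a column is what later allows a block term and its interactions with time and temperature to be included in the regression model.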
Objective
The objectives of this example were two-fold: (1) to identify the optimal settings of time and temperature, (2) to investigate whether there was evidence of a shift in the response data between the two series of experiments (i.e. whether there were significant block effects or not).
Data
Tasks
Task 1
Define a new investigation in MODDE with two factors and one response. Select RSM, the CCC design using two blocks and two center points in each block. Make sure that you tick the Block interactions check box.
Enter the response data. Note that the values of the star points were rounded to the nearest integer by the experimenters, so amend the factor settings accordingly. Evaluate the raw data. Fit the regression model. Which factors affect yield? Are there any nonsignificant model terms? Which factor combination optimises yield? Are the block effects significant?
Solutions to Blocking
Task 1
We start by evaluating the raw data. First we examine the replicate plot which shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed indicating that we have good data to work with.
[Figure: Plot of replications for Yield with experiment number labels (blocks B1 and B2) and histogram of Yield. Investigation: Blocking_RSM.]
A strong model was obtained with R2=0.98, Q2=0.95, Model Validity=0.99 and Reproducibility=0.88. The regression coefficients indicate that a low value of time is best but the linear effect of temperature is not significant. However, the quadratic terms of both time and temperature are significant. The block factor and its interactions with time and temperature are not significant. However, the deletion of any of these model terms causes the model quality to deteriorate and so they are kept in the model. There is some evidence that slightly lower yields were obtained in the second set of runs.
[Figure: Scaled & centered coefficients for Yield in g (extended; terms $Blo(B1), $Blo(B2), Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1), Temp*$Blo(B2), Tim, Tim*Tim, Temp, Temp*Temp, Tim*Temp) and summary of fit. Investigation: Blocking_RSM (MLR); N=12, DF=3, R2=0.978, Q2=0.949, Cond. no.=3.1808.]
The two response surface plots below show that higher yields were obtained in the first set of runs (the factorial part of the design). The average difference between the two blocks is 1.76 g.
Conclusions
To maximise yield we should use Time = 76 min and Temperature = 151 °C. There is a mild shift in yields between the two blocks of experiments.
Example Lower
Data:
Component   Bounds
Binder      0.1-1.0
Oxidizer    0.5-1.0
Fuel        0.1-1.0
Task 1:
Draw the experimental region by-hand.
Task 2:
Use MODDE to calculate the implied upper bounds.
Example Upper
Data:
Component   Bounds
Binder      0.0-0.6
Oxidizer    0.0-0.7
Fuel        0.0-0.4
Task 3:
Draw the experimental region by-hand.
Task 4:
Use MODDE to calculate the implied lower bounds.
Task 5:
Draw the experimental region by-hand.
Task 6:
Use MODDE to calculate the implied lower and upper bounds.
[Figure: Mixture region for Binder, Oxidizer and Fuel, bounded by Binder = 0.1, Oxidizer = 0.5 and Fuel = 0.1. Dashed lines at Binder = 0.4, Oxidizer = 0.8 and Fuel = 0.4 indicate the location of the implied upper bounds.]
Calculate the implied upper bounds:
$R_L = 1 - 0.1 - 0.5 - 0.1 = 0.3$
$U_i^* = L_i + R_L$
Component   Bounds    Implied upper bound
Binder      0.1-1.0   0.1 + 0.3 = 0.4
Oxidizer    0.5-1.0   0.5 + 0.3 = 0.8
Fuel        0.1-1.0   0.1 + 0.3 = 0.4
Task 4:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.
[Figure: Mixture region for Binder, Oxidizer and Fuel, bounded by Binder = 0.6, Oxidizer = 0.7 and Fuel = 0.4.]
Calculate the implied lower bounds:
$R_U = 0.6 + 0.7 + 0.4 - 1 = 0.7$
$L_i^* = U_i - R_U$
Component   Implied lower bound   Upper bound
Binder      -0.1                  0.6
Oxidizer     0.0                  0.7
Fuel        -0.3                  0.4
Task 6:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.
[Figure: Mixture region for Binder, Oxidizer and Fuel, bounded by Binder = 0.2 and 0.5, Oxidizer = 0.2 and 0.5, and Fuel = 0.3 and 0.5.]
Check if bounds are consistent:
$R_L = 1 - 0.2 - 0.2 - 0.3 = 0.3$
$R_U = 0.6 + 0.6 + 0.5 - 1 = 0.7$
Component   Bounds    Resulting upper bound
x1          0.2-0.6   0.5
x2          0.2-0.6   0.5
x3          0.3-0.5   0.5
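The bound calculations in the three examples follow one pattern, which can be sketched as a small function. The function name `compatible_bounds` is hypothetical and this is not MODDE's own routine; it simply applies R_L = 1 - sum(L), R_U = sum(U) - 1, U_i* = L_i + R_L and L_i* = U_i - R_U, and keeps a stated bound whenever it is already tighter than the implied one. The Task 6 bounds are taken from the solution figure.

```python
def compatible_bounds(lower, upper):
    """Tighten mixture bounds so that every bound can actually be reached."""
    r_l = 1 - sum(lower)   # room left above the lower bounds
    r_u = sum(upper) - 1   # excess of the upper bounds over the total
    if r_l < 0 or r_u < 0:
        raise ValueError("bounds are inconsistent: no mixture sums to 1")
    # An implied bound only matters when it is tighter than the stated one.
    new_upper = [min(u, l + r_l) for l, u in zip(lower, upper)]
    new_lower = [max(l, u - r_u) for l, u in zip(lower, upper)]
    return new_lower, new_upper


# Components in the order Binder, Oxidizer, Fuel.
examples = {
    "Lower": ([0.1, 0.5, 0.1], [1.0, 1.0, 1.0]),
    "Upper": ([0.0, 0.0, 0.0], [0.6, 0.7, 0.4]),
    "Task 6": ([0.2, 0.2, 0.3], [0.6, 0.6, 0.5]),
}
for name, (lo, up) in examples.items():
    new_lo, new_up = compatible_bounds(lo, up)
    print(name, [round(v, 3) for v in new_lo], [round(v, 3) for v in new_up])
```

In the "Upper" example the implied lower bounds come out negative or zero, so the stated lower bounds of 0 remain in force; in the "Lower" example the upper bounds shrink to 0.4, 0.8 and 0.4.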
Background
In tablet manufacturing in the pharmaceutical industry it is practical to make experiments according to a mixture design. Here, three constituents were varied according to a modified simplex centroid mixture design in order to produce tablets. The three constituents were cellulose, lactose and dicalcium phosphate.
Objective
The objective of the investigation was to find out how the three excipients influenced release of active substance.
Data
Ten tablets were prepared according to a mixture design in the three excipients mentioned. The response measured was the release (in min) of the active ingredient and this value has to be maximized. The data set is taken from P.J. Waaler, Acta Pharm Nord 4: 9-16, 1992.
Tasks
Task 1
Create a new investigation in MODDE and define the three mixture factors and the single response according to the information given above. Select RSM as objective and accept the first choice design (Modified Simplex Centroid), using Design Runs = 9 and Centerpoints = 1. MODDE now creates a Worksheet identical to the one shown on the foregoing page. Enter the response values.
Task 2
Select PLS as fit technique. Fit the model. Questions to address and answer: Which are the significant terms? Are the residuals approximately normally distributed? What about Lack of Fit? Review the fit and interpret the model. Which formulation corresponds to maximized release (Hint: Use the Optimizer)?
Task 3
The experimenters performed three verifying experiments.
x1      x2      x3      release
0.5     0.125   0.375   370
0.333   0       0.667   340
0.667   0       0.333   345
Solutions to WAALER
Task 2
The PLS analysis of the tablet data gave a model with R2 = 0.98 and Q2 = 0.55 (upper left-hand plot). These statistics point to an imperfect model, because R2 substantially exceeds Q2. Unfortunately, the second diagnostic tool (upper right-hand plot), the ANOVA table, is incomplete because the lack of fit test could not be performed. However, a possible reason for the poor modelling is found when looking at the N-plot of the response residuals given in the middle left-hand figure. Experiment number 10 is an outlier and degrades the predictive ability of the model. If this experiment is omitted and the model refitted, Q2 will increase from 0.55 to 0.69. We decided not to remove the outlier, primarily to conform with the modelling procedure of the original literature source. The subsequent three plots show the inner relation for the respective PLS model dimension.
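The gap between R2 and Q2 can be illustrated with a small sketch. It is not the MODDE calculation: ordinary least squares stands in for PLS, leave-one-out cross-validation stands in for MODDE's cross-validation scheme, and the data are synthetic (the Waaler worksheet is not reproduced here). The function name `r2_q2_loo` and all numbers in the demo are assumptions made for illustration.

```python
import numpy as np


def r2_q2_loo(X, y):
    """Return (R2, Q2): R2 from the full fit, Q2 from leave-one-out residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = float(np.sum((y - X @ coef) ** 2))
    press = 0.0  # prediction error sum of squares
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += float((y[i] - X[i] @ coef_i) ** 2)
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


# Synthetic example (hypothetical data): quadratic trend, one deviating run.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
X_demo = np.column_stack([np.ones_like(x), x, x ** 2])
y_demo = 300 + 80 * x - 60 * x ** 2 + rng.normal(0, 3, size=10)
y_demo[-1] += 25  # deliberately deviating experiment

r2_all, q2_all = r2_q2_loo(X_demo, y_demo)
r2_wo, q2_wo = r2_q2_loo(X_demo[:-1], y_demo[:-1])
print(f"all runs:        R2={r2_all:.2f}  Q2={q2_all:.2f}")
print(f"without run 10:  R2={r2_wo:.2f}  Q2={q2_wo:.2f}")
```

Because each left-out run is predicted by a model that has not seen it, Q2 can never exceed R2 in this setting, and a single deviating run tends to pull Q2 down much more than R2.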
[Figures: Investigation Waaler_rsm (PLS, comp.=3). Summary of Fit (R2, Q2) for release; normal probability plot of standardized residuals with experiment number labels; score scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels. N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Cond. no.=7.4174, Y-miss=0. MODDE 7 - 2003-11-20 09:07:23]
Scaled and centered regression coefficients of the computed model are plotted in the left-hand plot below. This coefficient plot shows that in order to maximize the release, the amount of lactose in the recipe should be kept low and the amount of phosphate high. The presence of significant square and interaction terms indicates the existence of quadratic behavior and non-linear blending effects. These effects are more easily understood by means of the trilinear mixture contour plot shown in the right-hand plot below. This latter plot suggests that with the mixture composition 0.32/0/0.68 one may expect a response value above 350. This point should be tested in reality, thus functioning as an experimental verification of the model.
[Figure: Investigation Waaler_rsm (PLS, comp.=3). Scaled & Centered Coefficients for release. N=10, DF=4, R2=0.985, Q2=0.553.]
Task 3
In this application, the optimizer identified only one point, the mixture 0.32/0/0.68, where the maximum release was predicted to be 363 minutes. This point was not tested in the original work, but one close to it was. The experimenters performed three verifying experiments and these results together with model predictions are summarized in the figure below. As seen, the model predicts well except for the mixture 0.5/0.125/0.375. Recall that the observed values (for the first three rows in the figure below) were 370, 340, and 345.
Conclusions
Maximum release is predicted for the combination 0.32 / 0 / 0.68. The experimental verification produced good agreement between measured and predicted response values for two out of three new formulations. The discrepancy between measured and predicted release for the remaining point suggests some information deficiency in the training set. One way to address this problem is to combine the two sets of data and then update the regression model. As a consequence, a new set of prediction samples should be compiled in order to verify the predictive power of this updated model.
Background
A manufacturer of a rocket propellant mixed three ingredients together to get the best possible product.
Objective
The objective was to formulate a propellant with elasticity > 2900.
Data
Three ingredients, mixture factors, were varied and one response (elasticity) was measured. The data table is shown below. Design: Modified Simplex Centroid. Model: Quadratic model.
Task 1
Create a new investigation in MODDE according to the information given above. Select RSM as objective, a quadratic model, and generate a modified simplex centroid design with 9 + 1 runs. Enter the response data.
Task 2
Evaluate the raw data. Make a histogram to evaluate the distribution of elasticity, and a replicate plot to explore the replicate error. Are there any anomalies in the raw data?
Task 3
Select PLS as fit method. Relate the predictors to the response. Investigate the relevant score and loading plots. Interpret the model. What can you say about the correlation structure among the factors and responses (Hint: Look at PLS score plots)? Which factors are influential for elasticity? Which formulation should be used to maintain an elasticity above 2900?
[Figures: histogram of elasticity (Count vs Bins, 2350 to 3150) and replicate plot of elasticity (Replicate Index 1 to 10). MODDE 7 - 2003-11-19 15:00:13]
Task 3
A two-component PLS model was obtained with R2 = 0.80 and Q2 = 0.25. The gap between R2 and Q2 is large and this is unsatisfactory. The PLS total summary plot shows that the first component is the most important regarding explained variance. In order to investigate the correlation structure, we have plotted the t/u scores of the two model components. These indicate a curved correlation structure in the first component, and that the second component basically is a compensation for the encountered non-linear behavior. Further, the ANOVA table shows that the model is not significant (p = 0.14; p < 0.05 is required for a significant model). The N-plot of residuals shows weakly deviating behavior for experiment number 4, but since it lies within 4 standard deviations it was kept in the modelling.
[Figures: Investigation Rocket (PLS, comp.=2). Summary of Fit (R2, Q2) for Elasticity; PLS total summary (R2, Q2) for Comp1 and Comp2; score scatter plots t[1] vs u[1] and t[2] vs u[2] with experiment number labels; normal probability plot of standardized residuals for Elasticity with experiment number labels. N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071, Cond. no.=7.4174, Y-miss=0. MODDE 7 - 2003-11-19 15:03:58]
The PLS loading plot and the coefficient plot indicate how the various model terms influence the elasticity of the rocket propellant. However, because we have a very weak model we must interpret the model with great care. Some guidance with regard to model refinement may be extracted from the coefficient plot; however, in this case we have not found it possible to improve the model. What one can do in this kind of situation is to use the trilinear mixture contour plot to get a general appraisal of the response function. We understand from the left-hand mixture region plot that we are investigating a small, though simplex-shaped, mixture domain. We conclude from the right-hand mixture contour plot that it seems possible to accomplish an elasticity above 2900 within the investigated region.
[Figures: Investigation Rocket (PLS, comp.=2). Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for Elasticity (R2=0.801, Q2=0.249); mixture region plot and mixture contour plot with axes Binder, Oxidiser and Fuel.]
Conclusions
In this case the experimental goal of obtaining an elasticity above 2900 was achievable. Binder and Fuel were the two ingredients with the largest impact on the response. Oxidiser had an almost negligible effect on the response.
Background
A manufacturer of fish pâté wanted to produce a quality product irrespective of which species of fish were used. Since the market price of different fish varies considerably, a mixture design was used to locate the best tasting pâté.
Objective
The aim was to produce a pâté with a taste rating above 3.
Data
There were three ingredients and one response (taste). The data table is shown on the next page. Design: Modified Simplex Centroid. Model: Linear.
Tasks
Task 1
Create a new investigation in MODDE according to the information given above. Make sure that the design has 19 rows and then paste the contents of CORNE59.dif into the worksheet.
Task 2
Evaluate the raw data by inspecting the distribution and replicate error of taste using Worksheet/Histogram and Worksheet/Replicate Plot respectively. Are there any anomalies in the raw data?
Task 3
Select PLS as the fit method, using Analysis/Select Fit Method/PLS, and fit the model. Interpret the model by investigating the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plot and Analysis/PLS Plots/Loading Scatter Plot respectively. What can you say about the correlation structure among the three ingredients and taste (hint: look at the PLS score plots)? Check the validity of the model by looking at the ANOVA table and residual plot using Analysis/ANOVA/Anova Table and Analysis//Normal Prob. Plot Residuals respectively. Use loadings and coefficient plots, Analysis/Coefficients/Plot, to investigate which ingredients influence the taste of the pâté. Which recipe gives a taste above 3?
Experimental Data
Experiments 1-10 comprise the original design, and experiments 11-19 are replicates.
Solutions to Corne59
Task 2
The histogram suggests that a transformation, such as log, would be preferable. However, for the sake of this preliminary analysis of the data we will not transform the response. The replicate plot clearly illustrates the small replicate error. The correlation matrix is also shown below in order to illustrate the inherent correlation between the ingredients due to the overall mixture constraint, i.e. sum of ingredients = 1.0.
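The inherent correlation caused by the mixture constraint can be reproduced with a small sketch. It uses a plain three-component simplex centroid design (vertices, edge midpoints and overall centroid) as a stand-in, not the modified design with replicates from the worksheet, and the helper names `simplex_centroid` and `pearson` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def simplex_centroid(q):
    """All 2**q - 1 blends with equal proportions on a non-empty subset."""
    points = []
    for k in range(1, q + 1):
        for subset in combinations(range(q), k):
            points.append([1.0 / k if i in subset else 0.0 for i in range(q)])
    return points


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


design = simplex_centroid(3)
columns = list(zip(*design))  # one tuple per ingredient
for i, j in combinations(range(3), 2):
    print(f"corr(x{i + 1}, x{j + 1}) = {pearson(columns[i], columns[j]):.3f}")
```

Every row sums to 1, so the three columns cannot vary independently. For this symmetric design each pairwise correlation comes out at -0.5, even though the ingredients were varied "independently" in the design sense.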
[Figures: Investigation Corne59. Histogram of taste (Count vs Bins); Plot of Replications for taste with experiment number labels (Replicate Index 1 to 10). MODDE 7 - 2003-11-19 11:27:27]
Task 3
A three-component PLS model was obtained with R2=0.97, Q2=0.90, MVal = 0.30, and Rep = 0.98. The PLS Total Summary plot shows that the first component is by far the most important in terms of variance explained. In order to investigate the correlation structure, we have plotted the t/u scores of the first two model components which indicate a strong relationship between taste and the three ingredients. The ANOVA table and the N-plot of residuals also indicate an excellent model.
[Figures: Investigation Corne59 (PLS, comp.=3). Summary of Fit (R2, Q2, Model Validity, Reproducibility) for taste; PLS total summary (R2 & Q2) for Comp1 to Comp3; score scatter plots t[1] vs u[1] and t[2] vs u[2] with experiment number labels; normal probability plot of standardized residuals for taste with experiment number labels. N=19, DF=13, R2=0.971, Q2=0.905, R2 Adj.=0.960, RSD=0.1964, Cond. no.=6.5072, Y-miss=0. MODDE 7 - 2003-11-19 11:34:17]
The PLS loadings plot and the coefficient plot indicate that all three mixture ingredients affect the taste of the fish pâté. The response contour plot shows that, in order to achieve a taste rating above 3, you need to be in the upper part of the mixture triangle, i.e. high x1 and low x2. There is also clear evidence of non-linear blending.
[Figures: Investigation Corne59 (PLS, comp.=3). Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for taste with terms x1, x2, x3, x1*x1, x2*x2, x3*x3, x1*x2, x1*x3, x2*x3. N=19, DF=13, R2=0.971, Q2=0.905, Cond. no.=6.5072, Y-miss=0.]
Conclusions
There is a strong relationship between taste and the three varied ingredients. To obtain a taste rating above 3, ingredient x1 should be high and ingredient x2 low. This gave the manufacturer a clear strategy for maintaining quality whilst simultaneously reducing cost.
Background
Kids like to blow bubbles, but dislike bubbles that burst rapidly. We decided to use mixture design to investigate which factors may affect bubble formation. We browsed the Internet to find a suitable bubble mixture composition that we could use as a starting reference mixture. This recipe was then modified using mixture design, and bubbles were blown for each mixture composition. The investigator, Lennart Eriksson, carried out these experiments while on parental leave, taking care of his son, little Andreas, 14 months old. This ensures high bubble quality.
Objective
The objective was to understand which factors influence the bubble-making process (Screening), and to see whether an optimal recipe could be formulated that would ensure long-lasting bubbles (RSM).
Data
The lifetime in seconds was measured for bubbles of 4-5 cm size. The two process factors were temperature of the solution (°C) and settling time of the mixture (h). The four mixture factors were dish-washing liquid 1 (DWL1), dish-washing liquid 2 (DWL2), tap water and glycerol.
Tasks
Task 1
In MODDE, first define the factors, the response and the constraint as outlined above. Select Screening as the objective. The process model should be an interaction model, and the mixture model a linear model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.5 / 0.1. Open BUBB_SCR.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? How should we proceed to improve the result (get longer-lasting bubbles)?
Task 2
Refine the model from the previous task by removing all insignificant terms. Refit and evaluate the updated model. Which factors are most meaningful to optimize? How should we proceed to improve the result (get longer-lasting bubbles)? Use the MODDE Optimizer to get some suggestions for future experiments.
Task 3
We are now going to use the results of the screening phase and construct an appropriate RSM design. This means that we will hold the two process factors, temperature (7 °C) and time (25 h), constant and vary only the four mixture factors. We shall use the mixture composition 0.2 / 0.2 / 0.3 / 0.3 as our new reference mixture. In MODDE, make a copy investigation and re-define the factor settings and the DWL-constraint according to the following:
The response should be the lifetime of the bubbles acquired (log transformed). Select RSM as the objective. The mixture model should be a quadratic model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.3 / 0.3. Open BUBB_RSM.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? Is it possible to increase the lifetime of the bubbles even further (beyond the measured 18 min 40 s)? Is it possible to find an optimum within the investigated region? Use the Optimizer to explore the mixture region.
Solutions to BUBBLES
Task 1
We can see that the distribution of the response is skewed to the right, so it needs to be log-transformed. The replicate plot shows that the pure error is reasonably low.
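The effect of the log transformation on a right-skewed response can be sketched as follows. The lifetimes below are hypothetical values chosen only to span roughly the same range as the histogram bins (about 11 to 431 seconds); they are not the BUBB_SCR.XLS data, and a base-10 logarithm is assumed because the transformed axis in the plots runs from about 1.0 to 2.8.

```python
from math import log10


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical bubble lifetimes in seconds (illustration only).
lifetime = [11, 15, 20, 25, 30, 40, 55, 80, 120, 200, 431]
log_lifetime = [log10(v) for v in lifetime]

print(f"skewness, raw lifetimes:   {skewness(lifetime):.2f}")
print(f"skewness, log10 lifetimes: {skewness(log_lifetime):.2f}")
```

A few long-lived bubbles dominate the raw scale and give a long right tail; after the transformation the values are spread much more evenly, which is what the second histogram in the solution shows.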
[Figures: Investigation Bubb_scr. Histogram of Lifetime (Count vs Bins, 11 to 431); histogram of the log-transformed response Lifetime~ (Count vs Bins, 1.00 to 2.80); Plot of Replications for Lifetime~ with experiment number labels (Replicate Index).]
PLS was used to fit a model to the data, yielding R2 = 0.81, Q2 = 0.18, MVal = -0.2 and Rep = 0.93. There are several insignificant cross-terms, which cause the low Q2 and MVal. Remove these terms and refit the model.
[Figures: Investigation Bubb_scr (PLS, comp.=2). Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Lifetime~; Scaled & Centered Coefficients for Lifetime~ with terms Te, Ti, DW1, DW2, Wa, Gly, Te*Ti, Te*DW1, Te*DW2, Te*Wa, Te*Gly, Ti*DW1, Ti*DW2, Ti*Wa, Ti*Gly. N=24, DF=11, R2=0.812, Q2=0.185, Cond. no.=2.7203, Y-miss=0.]
Task 2
When refitting the model a much better result was obtained. The refined model looks good according to R2/Q2, N-plot of residuals and Obs/pred. The ANOVA table and the MVal statistic do show lack of fit, but the model is still useful. The model interpretation (with loadings or coefficients) indicates that in order to accomplish longer-lasting bubbles the fraction of glycerol should be increased and the amount of water decreased. In the interpretation one must remember that the regression coefficients refer to the 0.2 / 0.2 / 0.5 / 0.1 reference mixture.
[Figures: Investigation Bubb_scr (PLS, comp.=2), refined model. Summary of Fit for Lifetime~; normal probability plot of standardized residuals with experiment number labels; Observed vs Predicted; Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for Lifetime~ with terms Te, Ti, DW1, DW2, Wa, Gly. N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Cond. no.=2.1537, Y-miss=0. MODDE 7 - 2003-11-19 10:58:13 and 10:58:44]
We then used the MODDE optimizer to compute predictions of where to lay out an optimization design. Two such predicted mixture compositions are shown below, together with the results from the verifying experiments. It was decided to use the first verifying experiment as the reference for the RSM mixture design.
Verifying experiments:
#1 Temp = 7 Time = 25 Mixture = 0.2 / 0.2 / 0.3 / 0.3 Lifetime = 1120 sec (18 min 40 sec)
#2 Temp = 7 Time = 49 Mixture = 0.4 / 0.0 / 0.3 / 0.3 Lifetime = 810 sec (13 min 30 sec)
Task 3
The replicate plot shows that the pure error is low. This plot also indicates that the replicates, i.e., the reference mixture measurements, lie in the upper part of the response interval. This indicates that a quadratic model is needed. The fitted quadratic PLS model had R2 = 0.92, Q2 = 0.71, MVal = 0.56, and Rep = 0.95, which are good values, and of sufficient quality for making an optimization. The model shows no lack of fit (ANOVA table) and has approximately normally distributed residuals. The PLS score plot demonstrates the good correlation between mixture composition and bubble lifetime. According to the coefficient plot, the ingredients water and glycerol have the most impact on bubble lifetime in the mixture region explored. Remember that the reference mixture is 0.2 / 0.2 / 0.3 / 0.3.
[Figures: Investigation Bubb_rsm (PLS, comp.=2). Plot of Replications for Lifetime~ with experiment number labels; Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Lifetime~; normal probability plot of standardized residuals with experiment number labels; score scatter plot t[1] vs u[1] with experiment number labels; Scaled & Centered Coefficients for Lifetime~ with terms DW1, DW2, Wa, Gly, DW1*DW1, DW2*DW2, Wa*Wa, Gly*Gly, DW1*DW2, DW1*Wa, DW1*Gly, DW2*Wa, DW2*Gly, Wa*Gly. N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Cond. no.=12.3206, Y-miss=0. MODDE 7 - 2003-11-19 11:12:35 and 11:13:20]
Because the PLS model is good according to the evaluation criteria (R2/Q2/MVal/Rep, ANOVA, N-plot, t1/u1 score plot) we may proceed and make predictions. The mixture contour plot displayed below was created by putting glycerol, the most important ingredient, on its high level. Evidently, there is no sharp optimum, but rather a ridge structure on which a bubble lifetime in the span 1350-1360 seconds (approx. 22 min 30 s) is encountered.
With the MODDE optimizer, the following five runs were predicted. They are all situated on the ridge found above.
DWL1     DWL2     Water    Glycerol   Lifetime   iter   log(D)
0.22     0.1001   0.2799   0.4        1359.421   148    -0.8289
0.2108   0.1187   0.2705   0.4        1353.342   87     -0.7011
0.2264   0.1001   0.2735   0.4        1360.329   84     -0.8497
0.2229   0.1001   0.2771   0.4        1360.145   105    -0.8455
0.2264   0.1001   0.2735   0.4        1360.329   76     -0.8497
Conclusions
The conclusion is that by first using a screening design, then some steepest ascent predictions, and finally laying out an RSM design, we have made it possible to increase bubble lifetime from 6.02 min to 22.28 min! Unfortunately, however, little Andreas showed more interest in the little red plastic bubble wand than in his father's enormous experimental progress.
Background
A manufacturer wanted to develop a new polymer with the properties of low warp and high strength. To achieve this, the polymer formulation was varied according to an extreme vertices mixture design with 14 runs and 3 centre points based on the following constituents:
No.   Constituent   Range
1     Glas          20 to 40 %
2     Crtp          0 to 20 %
3     Mica          0 to 20 %
4     Amtp          40 to 60 %
Objective
The objective of the investigation was to understand how the four constituents influence the properties of the polymer and if it was possible to manufacture a polymer with the required properties.
Data
Fourteen responses relating to warp, shrinkage and strength were measured on the polymers as shown below.
Tasks
Task 1
Create a new investigation in MODDE and define the four factors and 14 responses (see above). Select SCREENING as the experimental objective. Generate a worksheet with 17 runs and copy/paste the entire data table (including the factor settings) from the file Lowarp.xls.
Task 2
Fit a model relating the constituents (variables 1 - 4) to the responses using PLS. Investigate the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plots and Analysis/PLS Plots/Loading Scatter Plots respectively and interpret the model. What can you say about the correlation structure among the factors and responses (hint: look at the score plots)? How are the 14 responses related (hint: look at the loading plots)? Which factors influence strength and which factors influence warp?
Solutions to LOWARP
Task 2
PLS gives a three-component model with R2 = 0.75 and Q2 = 0.53, which are excellent results considering that all 14 responses are included in one model. The R2 and Q2 values for each individual response are shown in the Summary of Fit plot below. The three PLS score plots confirm the strong correlation between the constituents and the responses. Finally, the DModY plot indicates no outliers in the response data.
[Figures: Investigation Lowarp (PLS, comp.=3). PLS Total Summary (cum) with R2 and Q2; Summary of Fit for the responses wrp1 to wrp8 and st1 to st6; score scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels; DModY plot. N=17, DF=13, Cond. no.=2.0457, Y-miss=10.]
We have a PLS model characterising all 14 responses. Inspection of the VIP-plot indicates that, taken over all 14 responses, mica and glas are the most influential constituents. Since VIP is a squared function of the PLS loadings, it tells us how important each constituent is but not in which direction (positive or negative) it influences a particular response. This information can be obtained from the loadings plot which shows how the variables (constituents and responses) relate to each other. Observe that the eight warp responses are strongly clustered to the right of the loading plot in the direction of amtp and away from mica. Hence, we conclude that increasing amtp will increase warp whilst increasing mica will work in the opposite direction. The six strength responses are more scattered in the loading plot. This suggests that strength is either more difficult to measure or is a more complex phenomenon. crtp is most influential for st3 and st5, whereas glas is most important for st1, st2, st4 and st6. The four coefficient plots, shown below, illustrate the coefficient profiles for both correlated (wrp1 & wrp2) and uncorrelated (st3 & st4) responses.
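For reference, the commonly used textbook definition of VIP is given below; it is assumed, not confirmed by this text, that MODDE computes it in this form. The symbols are introduced here for illustration: K is the number of model terms, A the number of PLS components, w_{ak} the normalized PLS weight of term k in component a, and SSY_a the response sum of squares explained by component a.

```latex
\mathrm{VIP}_k \;=\; \sqrt{\, K \cdot
  \frac{\sum_{a=1}^{A} w_{ak}^{2}\, \mathrm{SSY}_a}
       {\sum_{a=1}^{A} \mathrm{SSY}_a} \,},
\qquad \sum_{k=1}^{K} w_{ak}^{2} = 1 .
```

Because the weights enter squared, their sign is lost, which is why VIP ranks importance but gives no direction. With this normalization the average of the squared VIP values over all K terms equals 1, so values above 1 mark terms of above-average importance.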
[Figures: Investigation Lowarp (PLS, comp.=3). Variable Importance Plot (VIP) for mi, gl, am, cr; Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients (gl, mi, cr, am) for wrp1, wrp2, st3 and st4. N=17, DF=13. The extracted captions list (R2, Q2) pairs of (0.734, 0.610), (0.771, 0.625), (0.958, 0.931) and (0.833, 0.675); their assignment to the four coefficient plots is not recoverable.]
Mixture contour plots provide a better understanding of the relationships between warp and strength and the four constituents. These contour plots are shown below for the four responses discussed previously and were constructed by fixing amtp at 0.5 and letting the other three constituents (crtp/mica/glas) vary. The arrow indicates a reasonable compromise among the four responses yielding the desired properties of high strength and low warp. This mixture is approximately glas = 0.3, crtp = 0.0, mica = 0.2 and amtp = 0.5. This mixture should be tested to verify the model predictions.
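Before running the verification experiment, it is worth confirming that the proposed compromise is a valid point of the design region. The following minimal sketch checks the bounds from the Background section and the mixture constraint; the function name `is_feasible` and the tolerance are assumptions made for illustration.

```python
def is_feasible(mixture, bounds, tol=1e-9):
    """True if every fraction lies within its bounds and the fractions sum to 1."""
    within = all(
        bounds[name][0] - tol <= value <= bounds[name][1] + tol
        for name, value in mixture.items()
    )
    return within and abs(sum(mixture.values()) - 1.0) <= tol


# Constituent ranges from the Background section, as fractions.
bounds = {
    "glas": (0.20, 0.40),
    "crtp": (0.00, 0.20),
    "mica": (0.00, 0.20),
    "amtp": (0.40, 0.60),
}

# Compromise mixture read from the contour plots.
candidate = {"glas": 0.3, "crtp": 0.0, "mica": 0.2, "amtp": 0.5}
print(is_feasible(candidate, bounds))
```

The candidate sits on the lower bound of crtp and the upper bound of mica, so it lies on the edge of the investigated region rather than in its interior, which is one more reason to verify the prediction experimentally.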
Conclusions
The application of a simple mixture design to a complex polymer optimisation problem has successfully generated a mixture point with the desired properties.
Articles
1. Hendrix, C. (1979), What Every Technologist Should Know About Experimental Design, Chemtech, 9, 167-174.
2. Hunter, J.S. (1987), Applying Statistics to Solving Chemical Problems, Chemtech, 17, 167-169.
3. Steinberg, D.M. and Hunter, W.G. (1984), Experimental Design: Review and Comments, Technometrics, 26, 71-97.
4. Grize, Y.L. (1995), A Review of Robust Process Design Approaches, Journal of Chemometrics, 9, 239-262.
5. Ahlinder, S., et al. (1997), Smart Testing - Reaping the Benefits of DoE, Volvo Technology Report No 2 1997, www.volvo.se/rt/trmag/index.html.
6. Nyström, A. and Karlsson, A. (1997), Enantiomeric Resolution on Chiral-AGP with the aid of Experimental Design. Unusual Effects of Mobile Phase pH and Column Temperature, Journal of Chromatography A, 763, 105-113.
7. Eriksson, L., Johansson, E., Wikström, C. (1998), Mixture Design - Design Generation, PLS Analysis and Model Usage, Chemometrics and Intelligent Laboratory Systems, 43, 1-24.
8. Lundstedt, T., et al. (1998), Experimental Design and Optimization, Chemometrics and Intelligent Laboratory Systems, 42, 3-40.
9. Rappaport, K.D., et al. (1998), Perspectives on Implementing Statistical Modeling and Design in an Industrial/Chemical Environment, The American Statistician, May 1998, 52, 152-159.
Articles, general
1. Wold, S., Esbensen, K., Geladi, P. (1987), Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, 2, 37-52.
2. Höskuldsson, A. (1988), PLS Regression Methods, Journal of Chemometrics, 2, 211-228.
3. Ståhle, L., and Wold, S. (1988), Multivariate Data Analysis and Experimental Design in Biomedical Research, In: Ellis, G.P., and West, G.B. (Eds) Progress in Medicinal Chemistry, Elsevier Science Publishers, 291-338.
4. Wold, S., Albano, C., and Dunn W.J., et al. (1989), Multivariate Data Analysis: Converting Chemical Data Tables to Plots, In: Computer Applications in Chemical Research and Education, Heidelberg, Dr. Alfred Hüthig Verlag.
5. Stone, M., and Brooks, R.J. (1990), Continuum Regression: Cross-validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares and Principal Components Regression, Journal of the Royal Statistical Society, Ser. B, 52, 237-269.
6. Frank, I.E., and Friedman, J.H. (1993), A Statistical View of Some Chemometrics Regression Tools, Technometrics, 35, 109-148.
7. Wold, S. (1994), Exponentially Weighted Moving Principal Components Analysis and Projections to Latent Structures, Chemometrics and Intelligent Laboratory Systems, 23, 149-161.
8. Wold, S., Eriksson, L., and Sjöström, M. (1999), PLS in Chemistry, in: Encyclopedia of Computational Chemistry, Elsevier, pp 2006-2020.
Articles, process
1. Kresta, J.V., MacGregor J.F., and Marlin T.E. (1991), Multivariate Statistical Monitoring of Process Operating Performance, The Canadian Journal of Chemical Engineering, 69, 35-47.
2. Kourti, T., and MacGregor, J.F. (1995), Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods, Chemometrics and Intelligent Laboratory Systems, 28, 3-21.
3. MacGregor, J.F. (1996), Using On-line Process Data to Improve Quality, ASQC Statistics Division Newsletter, Vol. 16, No. 2, 6-13.
4. Nijhuis, A., de Jong, S., Vandeginste, B.G.M. (1997), Multivariate Statistical Process Control in Chromatography, Chemometrics and Intelligent Laboratory Systems, 38, 51-61.
5. Rännar, S., MacGregor, J.F., and Wold, S. (1998), Adaptive Batch Monitoring Using Hierarchical PCA, Chemometrics and Intelligent Laboratory Systems, 41, 73-81.
6. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process Part I. Process Supervision with Multivariate Control Charts, Chemometrics and Intelligent Laboratory Systems, 42, 221-231.
7. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process Part II. Multivariate Time-series Analysis of Lagged Latent Variables, Chemometrics and Intelligent Laboratory Systems, 42, 233-240.
8. Wold, S., et al. (1998), Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments, Chemometrics and Intelligent Laboratory Systems, 44, 331-340.
Articles, QSAR
1. Eriksson, L., Hermens, J.L.M., et al. (1995), Multivariate Analysis of Aquatic Toxicity Data with PLS, Aquatic Sciences, 57, 217-241.
2. Eriksson, L., and Johansson, E. (1996), Multivariate Design and Modeling in QSAR, Chemometrics and Intelligent Laboratory Systems, 34, 1-19.
3. Verhaar, H.J.M., Hermens, J.L.M., et al. (1996), Classifying Environmental Pollutants. Separation of Class1 and Class2 Type Compounds Based on Chemical Descriptors, Journal of Chemometrics, 10, 149-162.
4. Goodford, P. (1996), Multivariate Characterization of Molecules for QSAR Analysis, Journal of Chemometrics, 10, 107-117.
5. Lindgren, Å., et al. (1996), Quantitative Structure-Effect Relationships for Some Technical Non-ionic Surfactants, Journal of the American Oil Chemists' Society, 73, 863-875.
6. Sandberg, M., et al. (1998), New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids, Journal of Medicinal Chemistry, 41, 2481-2491.