DOE Handouts Exercises Solutions Wed

Design of Experiments (DOE) Pharma Applications
Contents

1. Introduction
2. Problem formulation
3. Full factorial designs
4. Analysis of full factorial designs
5. Analysis of full factorial designs. II. Causes of bad models
6. Experimental objective: Screening
7. Post-screening actions
8. Experimental objective: Optimization
9. Experimental objective: Robustness testing
10. Conclusions
11. Additional Topics
12. Mixture design One Day Add-on
13. Exercises
2/10/2004
Objectives of DOE Course

To describe how to make experiments efficiently
  Span the experimental domain with the aid of an experimental design
To describe how to analyze the data
  Use good statistical tools to evaluate the results of the experiments
To describe how to interpret the results
  With the clever use of PC-based graphical facilities
To describe how to convert modelling results into concrete action
  MODDE optimizer & verifying experiments
Table of contents
1. Introduction
2. Problem formulation
3. Full factorial designs
4. Analysis of full factorial designs
5. Analysis of full factorial designs. II. Causes of bad models
6. Experimental objective: Screening
7. Post-screening actions
8. Experimental objective: Optimization
9. Experimental objective: Robustness testing
10. Conclusions
11. Additional topics
   D-optimal design
   Blocking the Experimental Plan
   Mixture design
   Other RSM designs
   Multilevel qualitative factors
   The Taguchi approach to robust design
   Simultaneous optimization of several responses
   Partial least squares projections to latent structures
   Design in Latent Variables
12. Mixture design One Day Add-On
13. Exercises
   Getting started: ByHand, CakeMix
   Screening, Full Fac: Pain, Tablets, Protein Spray-Drying
   Screening, Frac Fac: Pilot Plant, Reporter Gene Assay, Chromshper_B
   Optimization: Chiral Separation, Metabolism, Willge, DrugD
   Robustness Testing: Nonafact, HPLC Robustness
   Robust Design: CakeTaguchi, LoafVolume
   D-optimal design: Model Updating
   Blocking the Experimental Plan: Blocking
   Mixture design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp
14. References
Copyright Umetrics AB, 2004-02-11

Chapter 1 Introduction
Contents
Why/How DOE is used and where DOE is used
Three primary experimental objectives
Three general examples
The intuitive approach to experimental work (COST)
A better approach (DOE)
Overview of steps in DOE (using CakeMix)
Benefits of DOE
Summary
Why/How DOE is used

Development of new products and processes
Enhancement of existing products and processes
Optimization of quality and performance of a product
Optimization of an existing manufacturing procedure
Screening of important factors
Minimization of production cost
Robustness testing of products and processes
...
Where DOE is used

Chemical industry
Polymer industry
Car manufacturing industry
Powertrain industry
Pharmaceutical industry
Food and dairy industry
Pulp and paper industry
Steel and mining industry
Plastics and paints industry
TeleCom industry
Marketing and preference mapping; Conjoint analysis
Three primary experimental objectives

Screening
Which factors are most influential? What are their appropriate ranges?
Optimization
How shall we find the optimum? Is there a unique optimum, or is a compromise necessary to meet conflicting demands on the responses?
Robustness testing
How shall we adjust our factors to guarantee robustness? Do we have to change our product specifications prior to claiming robustness?
General Example 1: Screening

Reporter Gene Assay (Active Biotech AB). Changes in the factors give different treatments. The goal was to uncover which factors affected the signal-to-background (S/B) ratio.
Seed cells into plates and culture or treat as desired!
Place in luminometer and measure light emission!
[Figure: cell plates receive a treatment; the light emission is then measured in the luminometer]
General Example 2: Optimization

Reporter Gene Assay (Active Biotech AB). Based on the results of the screening phase, the number of factors was reduced from 6 to 3. The goal was to find the factor setting giving the highest S/B value.
[Figure: treatment and light-emission measurement, as in Example 1]
General Example 3: Robustness Testing

HPLC separation of analytes in the pharmaceutical industry. The goal was to keep the resolution (Res1) consistently above 1.5, which corresponds to complete baseline separation of two adjacent peaks.
[Figure: HPLC chromatogram, 0-14 min, with labelled peaks (H 310/83, H 309/40, (R), (S), (I), (II))]
The "intuitive" (COST) approach to experimental work

Changing one separate factor at a time (COST) does not lead to the real optimum, and gives different conclusions from different starting points. It leads to many experiments and little information, and gives no quantification of interactions!!!
[Figure: two COST searches in the X1-X2 plane (X1 from 10 to 15, X2 from -2 to 10) started from different points]
A better approach - DOE

If not COST, what do we do instead? The solution is to construct a carefully prepared set of representative experiments, in which all relevant factors are varied simultaneously.
[Figure: cube design in X1 (200-400), X2 (50-100) and X3 (50-100) around the standard condition 300/75/75]
Overview of DOE - CakeMix application

Three factors varied: Flour (200-400 g), Shortening (50-100 g), and Egg powder (50-100 g). Response: taste of the resulting cake.
Cake Mix Experimental Plan
Cake No  Flour (g)  Shortening (g)  Egg Powder (g)  Taste
1        200        50              50              3.52
2        400        50              50              3.66
3        200        100             50              4.74
4        400        100             50              5.20
5        200        50              100             5.38
6        400        50              100             5.90
7        200        100             100             4.36
8        400        100             100             4.86
9        300        75              75              4.73
10       300        75              75              4.61
11       300        75              75              4.68
[Figure: cube design in X1 = Flour (200-400), X2 = Shortening (50-100), X3 = Egg powder (50-100); standard condition 300/75/75 at the center]
Overview of steps in DOE - part I

1. Define Factors
2. Define Response(s)
3. Create Design (Make experiments)
Overview of steps in DOE - part II

4. Make Model
[Figure: Summary of Fit for Taste (Cakemix, MLR): R2, Q2, Model Validity, Reproducibility; N=11, DF=6, Cond. no.=1.1726, Y-miss=0]
5. Interpret Model
[Figure: scaled & centered coefficients for Taste (Cakemix, MLR): Fl, Sh, Egg, Sh*Egg; N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95]
Overview of steps in DOE - part III

6. Use Model (make decisions): where should the verifying experiments be done?
[Figure: response contour plot of Taste at Flour = 400 g]
Three critical problems

Three critical problems that DOE will deal with in a better way than the COST-approach:
Problem 1 (Interactions): systems influenced by more than one factor are poorly investigated by changing one separate factor at a time (interactions are missed).
Problem 2 (Interpretation): maps of the system may be misleading without DOE (experiments are often ill-positioned and unable to support a response contour plot).
Problem 3 (Noise): systematic and unsystematic variability (seen "effects" and "noise") are difficult to estimate and consider in the computations without a designed series of experiments; see next slide.
Variability (Problem 3)
Every measurement and experiment is influenced by noise. Under stable conditions, every process and system varies around its mean and stays within control limits, usually ±3 SD.
Reacting to noise
Consider one experiment where the temperature is changed from 35 °C to 40 °C. The response changes from slightly below 93% to close to 96%, but this change lies within the variability interval found when replicating.
[Figure: left, ten measurements of yield (92-98%) under identical conditions; right, two measurements of yield. Any real difference?]
Focusing on effects
COST often implies an excessive consumption of resources because the experiments are distributed in an informationally inefficient way. DOE provides a better spread of the trials, which allows averaging and leads to more precise effect estimates.
[Figure: placement of COST and DOE trials in the factor space X1, X2, X3, with response Y1]
Estimating real effects and noise

Real effects are estimated by the coefficients, and the noise is contained in the confidence intervals
[Figure: scaled & centered coefficients for Taste (cakemix, MLR): Fl, Fl*Sh, Sh, Fl*Egg, Egg, Sh*Egg, with error bars showing the uncertainty of each coefficient; N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95]
Consequence of variability
Two experiments close to each other make the slope of the line poorly determined.
Two experiments far away from each other make the slope well determined.
If a center-point is put in between, it is possible to explore whether our model is OK: should it be linear or nonlinear?
It matters where the experiments are positioned!!! Design is needed.
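The point about spacing can be made quantitative. For a straight-line fit with independent noise of standard deviation sigma, the standard error of the least-squares slope is sigma divided by the square root of the sum of squared deviations of the factor settings from their mean. The sketch below assumes sigma = 1 and uses hypothetical coded settings (not course data) to compare two close experiments, two far-apart experiments, and the far-apart pair plus a center-point.

```python
import math


def slope_standard_error(x, sigma=1.0):
    """Standard error of the least-squares slope of a straight-line fit,
    assuming independent noise with standard deviation sigma."""
    x_mean = sum(x) / len(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sigma / math.sqrt(sxx)


# Hypothetical factor settings in coded units (not taken from the course data)
close_together = [-0.1, 0.1]
far_apart = [-1.0, 1.0]
with_center_point = [-1.0, 0.0, 1.0]

se_close = slope_standard_error(close_together)
se_far = slope_standard_error(far_apart)
se_center = slope_standard_error(with_center_point)

print(f"close together: {se_close:.2f}")
print(f"far apart:      {se_far:.2f}")
print(f"far + center:   {se_center:.2f}")
```

Spreading the two points ten times further apart cuts the slope uncertainty by a factor of ten. Adding the center-point leaves the slope precision unchanged here, but supplies the extra run needed to check whether a straight line is adequate.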

Selected design must match experimental objective

Design geometry (shown for 2 factors, 3 factors and >3 factors) and matching objective:
Hyper cube: Screening
Balanced fraction of hyper cube: Screening & Robustness Testing
Hyper cube + axial points: Optimization
Benefits of DOE
Organized approach which connects experiments in a rational manner
More useful information is obtained (the influence of all factors together)
More precise information is acquired in fewer experiments
Results are evaluated in the light of variability
Support for decision-making: map of the system (response contour plot)
What we have learnt

DOE results in a set of experiments in which all factors are varied at the same time
DOE is used for three primary experimental objectives:
  screening: which factors are important and what are their appropriate ranges?
  optimization: what is the optimal factor setting?
  robustness testing: how sensitive is a response to small changes in the factors?
DOE handles three problems well:
  factor interactions are estimable
  reliable maps of the systems are possible
  seen effects and noise are separable and estimable

Chapter 2 Problem Formulation
Contents
Introduction to problem formulation
Selection of experimental objective
Definition of factors
Definition of responses
Selection of regression model
The model concept
Generation of experimental design
Creation of worksheet
Summary
Introduction to problem formulation (PF)

Problem formulation (PF) is of central importance in DOE. PF involves the selection/definition of a number of important features influencing the experimental work:
(1) selection of experimental objective
(2) definition of factors
(3) definition of responses
(4) selection of regression model
(5) generation of experimental design
(6) creation of worksheet
Introduction to problem formulation (PF)

Responses: variables describing the properties of the system/process
Factors: parameters changed to influence the responses and possibly direct the system/process towards a desired response profile
Model: mathematical expression linking the changes in the factors to the changes in the responses
[Figure: Factors (X) -> system/process (e.g., Reporter Gene Assay, Spray Drying Machine, HPLC Equipment) -> Responses (Y)]
PF - 1. Selection of experimental objective

Experimental objective may be selected from six stages of DOE:
(a) familiarization (b) screening (c) finding the optimal region (d) optimization (e) robustness testing (f) mechanistic modelling
Screening, optimization and robustness testing are the most frequently used. The experimental objective tells which kind of investigation one wants to do. One should ask: why is the experiment done, for what purpose, and what is the desired result?
PF - 1a. Familiarization
Useful when one is facing an entirely new type of application or equipment. Spend a limited portion of the available resources, say 10%. Simple designs are used. Goal: to verify that similar results are obtained for the replicated center-points, and that different results are found in the corners.
[Figure: design in Factor 1 and Factor 2 with corner points and replicated center-points]
PF - 1b. Screening
Useful when one wants to find out a little about many factors. Goal: to uncover the important factors and their appropriate ranges. Is the factor/response relationship linear or non-linear?
[Figure: results before and after screening]
Pareto principle (80/20 rule): with 25 factors, approximately 5 have an effect; the remainder are at the noise level.
PF - 1c. Finding the optimal region

Useful when one is interested in moving the experimental region so that it probably includes the optimum. Goal: to accomplish an adequate re-positioning of the experimental region. Tools:
  Gradient techniques (manual)
  MODDE optimizer (automatic)
[Figure: response Y over x1 and x2; how do we get to the optimum along the interesting direction?]
PF - 1d. Optimization
Useful when detailed knowledge about the factor influences is needed. We do not ask whether a factor is relevant (screening), but how it acts (optimization). Goal: to identify the factor combination at which the desired response profile is fulfilled (or almost so).
RSM: Response surface modelling (methodology)
PF - 1e. Robustness testing

Useful when one wants to understand how to regulate factors so that changes in the responses are minimized. Some factors affect the mean, some the spread around the mean, and some both properties. Goal: to establish factor tolerances (settings) within which robustness can be assured:
  small changes in controllable factors will not affect the result
  small changes in uncontrollable factors will not cause an undesirable spread around the desired result
PF - 1f. Mechanistic modelling

Useful when there is a need to establish a theoretical model for a given field and a given problem. Goal: to prove a new model, or perhaps falsify some competing models. One or more semi-empirical models are used to build such a theoretical model. In this conversion process, the regression coefficients are used to get an idea of appropriate derivative terms in the mechanistic model. Important: correct problem formulation, experimental design and data analysis.
PF - 2. Specification of factors
Categorization of factors, with examples (MODDE):
Quantitative (controlled or uncontrolled; process or mixture/formulation): Temperature, 10 °C to 50 °C
Quantitative multilevel (quantitative & qualitative): Speed, 200/300/400/500 rpm
Qualitative: Catalyst, Pd/Pt/Mo
Formulation: Strawberries 0.3-0.4, Milk 0.3-0.4, Ice cream 0.3-0.4
Filler: solvent in a mixture whose effect is uninteresting
Transformation of factors
A factor can be transformed. Examples: log; neglog; logit; square root; fourth root
When?
  Variables with a natural zero
  Variables where the max/min ratio exceeds 10
Types of variables: concentrations, volumes, levels
Transform before executing the design.
[Figure: y versus x and y versus log x (MODDE 3.0)]
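The "When?" criteria can be turned into a quick check. The sketch below is one possible reading, assumed here rather than taken from MODDE: it flags a factor for log transformation when its range is strictly positive (so the log is defined) and its max/min ratio exceeds 10, and it places the center-point at the geometric mean so the levels are evenly spaced on the log scale. Function names and the example range are hypothetical.

```python
import math


def suggest_log_transform(low, high, ratio_limit=10.0):
    """Flag a factor for log transformation (assumed rule of thumb):
    strictly positive range and max/min ratio above ratio_limit."""
    if low <= 0:
        return False  # log is undefined at or below zero
    return high / low > ratio_limit


def log_scale_levels(low, high):
    """Low, center and high levels spaced evenly on the log10 scale;
    the center is the geometric mean of low and high."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** lo, 10 ** ((lo + hi) / 2), 10 ** hi]


# Hypothetical concentration factor ranging from 0.1 to 10 (units arbitrary)
low, high = 0.1, 10.0
if suggest_log_transform(low, high):
    print("log-transform; levels:", log_scale_levels(low, high))
else:
    print("no transformation needed")
```

With a linear scale the center of 0.1-10 would be 5.05, far from most of the range; on the log scale it is 1.0.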
Constraints of factors
An irregular experimental region may be defined by specifying linear constraints of factors
[Figure: raw data plot (itdoe_constraint) of pH versus Temp (120-160) with experiment numbers; the regions above and below two linear constraints are excluded, and the remaining region is covered by a D-optimal design]
Uncontrolled factors
These are factors that cannot be controlled, but which still may influence the results (responses)
Examples: Ambient humidity and temperature
Record the values of uncontrolled factors, and include them in the data analysis. Randomize the run order of the experiments.
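Randomizing the run order is easy to script. A minimal sketch using Python's standard random module (an assumption; MODDE offers its own run-order randomization). The seed is arbitrary and only makes the worksheet order reproducible; CakeMix with its 11 experiments is used as the example.

```python
import random


def randomized_run_order(n_experiments, seed=None):
    """Return experiment numbers 1..n_experiments in a random run order.
    Passing a seed makes the order reproducible for the worksheet."""
    order = list(range(1, n_experiments + 1))
    random.Random(seed).shuffle(order)
    return order


# CakeMix: 8 corner experiments + 3 center-points = 11 experiments
order = randomized_run_order(11, seed=2004)  # seed chosen arbitrarily
for run, exp_no in enumerate(order, start=1):
    print(f"run {run:2d}: experiment {exp_no}")
```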
PF - 3. Specification of responses
Choose responses that are relevant; many responses are often necessary (regular, derived, linked).
Continuous:
  breakage of weld
  soot release when running a truck engine
  resolution of two adjacent peaks in liquid chromatography
  cost of material used in production (derived response)
Discrete:
  categorical answers of yes/no type
  the cake tasted good/did not taste good
Semi-continuous (product quality was):
  Very poor = 1; Bad = 2; OK = 3; Good = 4; Excellent = 5
Transformation of responses
Responses may be transformed. A non-linear relationship between y and x may be linearized by a suitable transformation of y. Examples: no transformation; log; neglog; logit; square root; fourth root. Transform after executing the design.
[Figure: y versus x, and log y versus x]
PF - 4. Selection of model
We distinguish between three main types of polynomial models
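linear: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$ (screening & robustness testing)
interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$ (screening)
quadratic: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$ (optimization)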
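The three model types differ only in which terms enter the regression. The sketch below (hypothetical helper names) expands coded factor values into the regressor row of each model and counts the coefficients to be estimated, which shows why quadratic models need larger designs. Its term order is constant, linear terms, interactions, squares, which differs from the order in the equations above.

```python
from itertools import combinations


def model_row(x, model="interaction"):
    """Expand coded factor values x = [x1, x2, ...] into the regressor row
    [1, x1, x2, ..., x1*x2, ..., x1^2, ...] of a linear, interaction or
    quadratic model."""
    if model not in ("linear", "interaction", "quadratic"):
        raise ValueError(f"unknown model type: {model}")
    row = [1.0] + list(x)  # constant and linear terms
    if model in ("interaction", "quadratic"):
        row += [a * b for a, b in combinations(x, 2)]  # two-factor interactions
    if model == "quadratic":
        row += [a * a for a in x]  # squared terms
    return row


def n_coefficients(k, model):
    """Number of coefficients (including the constant) for k factors."""
    return len(model_row([0.0] * k, model))


for model in ("linear", "interaction", "quadratic"):
    print(model, [n_coefficients(k, model) for k in (2, 3, 4, 5)])
```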
The model concept

Models are not reality, but approximate representations of some aspects of reality
[Figure: three representations of reality: a toy train, a map of Iceland, and a response contour plot of Taste (cakemix, MLR) versus Shortening (50-100) and Eggpowder (50-100) at Flour = 400]
Empirical, semi-empirical and theoretical models

In DOE, mathematical models are used to relate variation in the factors to variation in the responses. Types of mathematical models:
  empirical: $y = a + bx + \varepsilon$
  semi-empirical: $y = a + b \log x + \varepsilon$
  fundamental: $H\Psi = E\Psi$; $pV = nRT$
DOE is concerned with semi-empirical modeling using linear, interaction, quadratic, or cubic models
PF - 5. Generation of design
The chosen model and the design to be generated are intimately linked. MODDE considers the number of factors, their levels and nature (quantitative, qualitative, ...), and the selected experimental objective, and then recommends a design tailored to the researcher's problem.
PF - 6. Creation of worksheet
An example worksheet with extra information
Run order; Constant factor; Uncontrollable factor; Inclusion of experiments;
Are the proposed experiments reasonable? Will they fulfil the goals?
What we have learnt - part I

Problem formulation comprises six steps:
(i) selection of experimental objective:
  familiarization, screening, finding the optimal region, optimization, robustness testing, (mechanistic modelling)
(ii) definition of factors
(iii) definition of responses
(iv) selection of regression model
(v) generation of experimental design
(vi) creation of worksheet
What we have learnt - part II

Models are not reality, but useful approximations of small parts of reality. Types of polynomial models:
linear: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
  geometry: undistorted plane; objective: screening & robustness testing; design: fractional factorial designs
interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
  geometry: twisted plane; objective: screening; design: full or fractional factorial designs
quadratic: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
  geometry: curved plane; objective: optimization (RSM in MODDE); design: composite designs

Chapter 3 Full factorial designs
Contents
Introduction to full factorial designs
Construction and geometry of the 2^2, 2^3, 2^4 and 2^5 designs
Pros and cons of full factorial designs
Main effect of a factor
By-hand methods for computing effects
Interaction effects
Plotting of interaction effects
Computation of effects using least squares analysis
Relationship between effects and coefficients
How to express regression coefficients
Summary
Introduction to full factorial designs

Full factorial designs form the basis for classical experimental designs. They are important for a number of reasons:
  they require relatively few runs per investigated factor
  they can be upgraded to form composite designs, which are used in optimization
  they form the basis for two-level fractional factorial designs, which are of great practical value at an early stage of a project
  they are easily interpreted by using common sense and elementary arithmetic
Full factorial designs are regularly used with 2-4 factors. In this chapter we consider two-level full factorial designs.
Notation
To perform a two-level full factorial design, the investigator has to assign a low level and a high level to each factor
Notation       Low     High    Center
Standard       -       +       0
Extended       -1      +1      0
Example: Temp  100 °C  200 °C  150 °C
Example: pH    7       9       8
Example: Cat.  A       B       n/a

For a simple system, it may be convenient to display the coded unit together with the original factor unit.
[Figure: y1 = yield versus x1 = temp, with low level -1 (100 °C) and high level +1 (200 °C); one line for Cat A (-1) and one for Cat B (+1)]
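The conversion between original and coded units is a linear rescaling around the center of the range: coded = (value - center) / (half the range). A minimal sketch with hypothetical function names, using the temperature example from the notation table and the 25-100 °C temperature range of the ByHand example:

```python
def to_coded(value, low, high):
    """Original factor unit -> coded unit (-1 at low, 0 at center, +1 at high)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def to_original(coded, low, high):
    """Coded unit -> original factor unit."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Temperature example: low level 100 C, high level 200 C
for temp in (100, 150, 200):
    print(temp, "->", to_coded(temp, 100, 200))

# ByHand temperature range 25-100 C: coded +0.5 corresponds to 81.25 C
print(to_original(0.5, 25, 100))
```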

The 2^2 full factorial design - construction & geometry

Example: ByHand
Definitions
Factors: x1 = amount of formic acid/enamine (mole/mole), levels - (-1) = 1.0, 0 = 1.25, + (+1) = 1.5; x2 = reaction temperature (°C), levels 25, 62.5, 100
Response: y3 = the desired product (%)
Construction
Exp. no  x1 (mole/mole)  x2 (°C)  x1 coded  x2 coded  y3 (%)
1        1.0             25       -         -         80.4
2        1.5             25       +         -         72.4
3        1.0             100      -         +         94.4
4        1.5             100      +         +         90.6
5        1.25            62.5     0         0         84.5
6        1.25            62.5     0         0         85.2
7        1.25            62.5     0         0         83.8
Geometry
[Figure: square in X1 and X2 with experiments 1-4 at the corners and 5-7 at the center]
The 2^3 full factorial design - construction & geometry

Example: CakeMix
Definitions
Factor       Levels (Low/High)  Standard condition
Flour        200 g / 400 g      300 g
Shortening   50 g / 100 g       75 g
Egg powder   50 g / 100 g       75 g
Response: taste of the cake, obtained by averaging the judgement of a sensory panel
Construction
         Design matrix (coded)       Experimental matrix
Exp No   Flour  Shortening  Egg      Flour  Shortening  Egg   Taste
1        -      -           -        200    50          50    3.52
2        +      -           -        400    50          50    3.66
3        -      +           -        200    100         50    4.74
4        +      +           -        400    100         50    5.2
5        -      -           +        200    50          100   5.38
6        +      -           +        400    50          100   5.9
7        -      +           +        200    100         100   4.36
8        +      +           +        400    100         100   4.86
9        0      0           0        300    75          75    4.68
10       0      0           0        300    75          75    4.73
11       0      0           0        300    75          75    4.61
Geometry
[Figure: cube in X1 = Flour (200-400), X2 = Shortening (50-100), X3 = Egg powder (50-100) around the standard condition 300/75/75]
Orthogonality property of full factorials

Illustration: 2^3 design. Each factor is orthogonal to the others in the design, so the effect of a factor can be estimated independently of all other factor influences.
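Orthogonality can be checked numerically: two columns of coded settings are orthogonal when the sum of their element-wise products is zero. The sketch below builds the 2^3 design, adds the three two-factor interaction columns, and checks every pair of columns.

```python
from itertools import combinations

# 2^3 full factorial design in coded units (x1, x2, x3), standard order
design = [
    (-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
    (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1),
]


def dot(u, v):
    """Sum of element-wise products of two columns."""
    return sum(a * b for a, b in zip(u, v))


# Main-effect columns
columns = {f"x{j + 1}": [run[j] for run in design] for j in range(3)}
# Two-factor interaction columns are element-wise products of main-effect columns
main = list(columns.items())
for (name_a, col_a), (name_b, col_b) in combinations(main, 2):
    columns[f"{name_a}*{name_b}"] = [a * b for a, b in zip(col_a, col_b)]

for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
    print(f"{name_a:6s} . {name_b:6s} = {dot(col_a, col_b)}")

all_orthogonal = all(
    dot(col_a, col_b) == 0
    for (_, col_a), (_, col_b) in combinations(columns.items(), 2)
)
print("all columns orthogonal:", all_orthogonal)
```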
The 2^4 and 2^5 full factorial designs

Construction of the 2^2, 2^3, 2^4 and 2^5 designs in 4, 8, 16 and 32 runs, respectively (NOTE: no replicates are included). Geometrically, the 2^4 and 2^5 full factorial designs correspond to regular hyper-cubes in four and five dimensions.
[Table: 2^5 design matrix, runs 1-32, factors X1-X5 at levels - and +]
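Rather than typing the 32 rows, a two-level full factorial can be generated. A minimal sketch in standard order, where x1 alternates fastest and the last factor slowest. In this order, the first 2^m runs restricted to the first m columns reproduce the smaller 2^m design, which is how the 2^2, 2^3 and 2^4 designs sit inside the 2^5 table.

```python
from itertools import product


def full_factorial(k):
    """Two-level full factorial design in coded units (-1/+1), standard order:
    x1 alternates fastest, xk slowest. Returns a list of 2**k tuples."""
    # product() varies its last position fastest, so reverse each tuple
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]


design = full_factorial(5)
print(len(design), "runs")
for run_no, run in enumerate(design[:8], start=1):
    print(run_no, " ".join("+" if v > 0 else "-" for v in run))
```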
Pros and cons of two-level full factorial designs

These designs enable interaction models to be estimated, which is adequate for screening. Each factor is investigated at both levels of all other factors (balancing, orthogonality).
No. of investigated factors (k)  No. of runs, full factorial  No. of runs, fractional factorial
2                                4                             -
3                                8                             4
4                                16                            8
5                                32                            16
6                                64                            16
7                                128                           16
8                                256                           16
9                                512                           32
10                               1024                          32
Full factorial designs are realistic choices with 2-4 factors; with 5 or more factors, fractional factorial designs are recommended.
Main effect of a factor

The main effect of a factor is defined as the change in the response due to varying that factor from its low level to its high level, while keeping the other factors at their center-level. Example: CakeMix
[Figure: main effect plot of Flour for Taste (Cakemix, MLR): y1 = taste versus x1 = flour, from the low level -1 (200 g) to the high level +1 (400 g); N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95]
Computation of main effects in the 2^2 case (ByHand)

The main effect of a factor might be understood as the average difference in response values when moving from low to high level
Temperature (x2) effect at each level of formic acid/enamine (x1), using the ByHand experiments (y3 = desired product, %):
$\Delta_{1,3} = 94.4 - 80.4 = 14$ (experiments 1 and 3, x1 = 1.0)
$\Delta_{2,4} = 90.6 - 72.4 = 18.2$ (experiments 2 and 4, x1 = 1.5)
[Figure: square design in X1 (formic acid/enamine, 1.0-1.5) and X2 (temperature, 25-100) with experiments 1-4 at the corners and 5, 6, 7 at the center]
Main effect of temperature: $(\Delta_{1,3} + \Delta_{2,4})/2 = (14 + 18.2)/2 = 16.1$
Computation of main effects in the 2^2 case (ByHand)

Formic acid/enamine (x1) effect at each temperature:
$\Delta_{1,2} = 72.4 - 80.4 = -8$ (experiments 1 and 2, x2 = 25)
$\Delta_{3,4} = 90.6 - 94.4 = -3.8$ (experiments 3 and 4, x2 = 100)
Main effect of formic acid/enamine: $(\Delta_{1,2} + \Delta_{3,4})/2 = (-8 + (-3.8))/2 = -5.9$
Main effect of temperature: $(\Delta_{1,3} + \Delta_{2,4})/2 = (14 + 18.2)/2 = 16.1$
[Figure: main effect plots for y3 (Byhand, MLR) versus x1 (1.0-1.5) and versus x2 (25-100)]
A quicker by-hand method for computing effects (ByHand)

         Experimental matrix   Computational matrix
Exp. no  x1     x2            mean  x1  x2  x1*x2    Response y3
1        1.0    25            +     -   -   +        80.4
2        1.5    25            +     +   -   -        72.4
3        1.0    100           +     -   +   -        94.4
4        1.5    100           +     +   +   +        90.6
5        1.25   62.5          +     0   0   0        84.5
6        1.25   62.5          +     0   0   0        85.2
7        1.25   62.5          +     0   0   0        83.8
Calculations refer to the computational matrix:
1st column gives the mean: (+80.4+72.4+94.4+90.6+84.5+85.2+83.8)/7 = 84.5
2nd column gives the main effect of the molar ratio, x1: (-80.4+72.4-94.4+90.6)/2 = -5.9
3rd column gives the main effect of temperature, x2: (-80.4-72.4+94.4+90.6)/2 = 16.1
4th column gives the x1*x2 two-factor interaction: (+80.4-72.4-94.4+90.6)/2 = 2.1
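The computational-matrix method is easy to script. A minimal sketch using the ByHand data: each effect is the sign-weighted sum of the responses divided by the number of "+" signs in the column (2 here). The center-points carry sign 0, so they drop out of the effects but still enter the mean.

```python
# ByHand experiments in coded units: (x1 formic acid/enamine, x2 temperature, y3 %)
runs = [
    (-1, -1, 80.4), (1, -1, 72.4), (-1, 1, 94.4), (1, 1, 90.6),
    (0, 0, 84.5), (0, 0, 85.2), (0, 0, 83.8),
]

y = [r[2] for r in runs]
x1 = [r[0] for r in runs]
x2 = [r[1] for r in runs]
x1x2 = [a * b for a, b in zip(x1, x2)]  # interaction column = product of signs


def effect(signs, responses):
    """Sign-weighted sum of responses divided by the number of '+' signs."""
    n_plus = sum(1 for s in signs if s > 0)
    return sum(s * r for s, r in zip(signs, responses)) / n_plus


mean = sum(y) / len(y)
print(f"mean  = {mean:.1f}")
print(f"x1    = {effect(x1, y):.1f}")
print(f"x2    = {effect(x2, y):.1f}")
print(f"x1*x2 = {effect(x1x2, y):.1f}")
```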
Plotting of main and interaction effects (ByHand)

The two main effects make the surface slope, and the two-factor interaction causes its twist.
Interaction plots may be used to specifically explore the nature of interactions.
[Figure: interaction plots for x1*x2, resp. y3 (Byhand, MLR), plotted against x1 (lines for x2 low/high) and against x2 (lines for x1 low/high); N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728]
The interaction plot shows the strength of an interaction

[Figure: three interaction plots titled No interaction, Mild interaction and Strong interaction. The investigations shown are X2*X5 for Tornado (testing zero two factor interaction), Po*Sp for Width (LaserWelding_FO) and Sh*Egg for Taste (Cakemix)]
Computation of effects using least squares fit

The by-hand method is used here because it gives an understanding of the concepts of main and interaction effects. In reality, DOE data are analyzed by calculating a regression model using a least squares fit, which has the following advantages:
(i) robustness to slight fluctuations in the factor settings
(ii) the ability to handle a failing corner where experiments could not be made
(iii) estimation of the experimental noise
(iv) the availability of a number of useful model diagnostic tools
An important consequence of least squares analysis is that the outcome is not main and interaction effect estimates, but a regression model consisting of coefficients reflecting the influence of the factors (see below).
Introduction to least squares analysis

Example of a linear relationship between a factor X1 and a response Y1. The deviation between the model and a measured data point is known as a residual. Least squares analysis seeks to minimize the sum of the squares of such residuals. Goodness of fit: R2 = 1 - SSres/SStot.corr, where
  1 denotes a perfect model
  0 corresponds to no model at all
  0.75 indicates a rough, but stable and useful model
[Figure: Response Y1 versus Factor X1 with the fitted line Y1 = -1.54 + 1.61X1 + e; R2 = 0.75]
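In practice MODDE performs the least squares fit. As a stand-in, here is a minimal pure-Python sketch (no MODDE API involved; helper names are hypothetical) that fits the CakeMix model with terms Fl, Sh, Egg and Sh*Egg in coded units by solving the normal equations, then computes R2 and the residual standard deviation (RSD). Solving the normal equations directly is acceptable here because the design is orthogonal and well conditioned; for ill-conditioned data a QR-based solver would be safer.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y):
    """Coefficients minimizing the sum of squared residuals (normal equations)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# CakeMix in coded units (flour, shortening, egg powder) and measured taste
design = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
          (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1),
          (0, 0, 0), (0, 0, 0), (0, 0, 0)]
taste = [3.52, 3.66, 4.74, 5.20, 5.38, 5.90, 4.36, 4.86, 4.73, 4.61, 4.68]

# Regressor rows: constant, Fl, Sh, Egg, Sh*Egg
X = [[1.0, f, s, e, s * e] for f, s, e in design]
b = least_squares(X, taste)

fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
y_mean = sum(taste) / len(taste)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(taste, fitted))
ss_tot = sum((yi - y_mean) ** 2 for yi in taste)

r2 = 1 - ss_res / ss_tot
rsd = (ss_res / (len(taste) - len(b))) ** 0.5  # residual standard deviation

print("coefficients:", [round(v, 4) for v in b])
print(f"R2 = {r2:.3f}, RSD = {rsd:.4f}")
```

With these data the sketch should agree with the values MODDE reports for the four-term CakeMix model (R2=0.988, RSD=0.0974).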
A coefficient has a value half of that of the effect

Coefficient: indicates the response change when the factor changes from 0 to +1 (in coded factor units). Coefficients are sorted in factor order.
Effect: indicates the response change when the factor changes from -1 to +1. Effects are sorted according to absolute size.
Example: CakeMix
[Figure: scaled & centered coefficients for Taste and effects for Taste (Cakemix, MLR) for the terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg; N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95]
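Because CakeMix is an orthogonal two-level design, the effect and the coefficient of a factor can be read directly from the factorial runs. A minimal sketch for Egg powder, with the center-points excluded since they sit at coded level 0:

```python
# CakeMix factorial runs: coded egg powder level and measured taste
egg = [-1, -1, -1, -1, 1, 1, 1, 1]
taste = [3.52, 3.66, 4.74, 5.20, 5.38, 5.90, 4.36, 4.86]

high = [t for e, t in zip(egg, taste) if e == 1]
low = [t for e, t in zip(egg, taste) if e == -1]

effect = sum(high) / len(high) - sum(low) / len(low)  # response change from -1 to +1
coefficient = effect / 2                               # response change from 0 to +1
print(f"Egg: effect = {effect:.4f}, coefficient = {coefficient:.4f}")
```

For an orthogonal design like this one, the coefficient obtained this way equals the least-squares coefficient of Egg.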
Ways of expressing regression coefficients

Scaled & centered: the constant (4.695) relates to the estimated taste at the design center-point.
Unscaled: the constant relates to taste at the natural zero, i.e., zero grams of flour, shortening, and egg powder (a meaningless cake mix recipe!!!)
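The two ways of expressing coefficients describe the same model. A minimal sketch of the conversion for the refined CakeMix model (constant, Fl, Sh, Egg, Sh*Egg); the coded coefficients are values from a least-squares fit to the CakeMix experimental plan (constant rounded), and all names are hypothetical. Substituting x = (value - center)/half-range and expanding the Sh*Egg product spreads that term over the constant and the Sh and Egg coefficients.

```python
# Coded (scaled & centered) coefficients of the refined CakeMix model,
# from a least-squares fit: constant, Fl, Sh, Egg, Sh*Egg
b0, b_fl, b_sh, b_egg, b_sh_egg = 4.6945, 0.2025, 0.0875, 0.4225, -0.6025

# Factor centers and half-ranges in grams
FL_C, FL_H = 300, 100
SH_C, SH_H = 75, 25
EGG_C, EGG_H = 75, 25


def predict_coded(flour, shortening, egg):
    """Prediction using coded factors x = (value - center) / half_range."""
    xf = (flour - FL_C) / FL_H
    xs = (shortening - SH_C) / SH_H
    xe = (egg - EGG_C) / EGG_H
    return b0 + b_fl * xf + b_sh * xs + b_egg * xe + b_sh_egg * xs * xe


# Unscaled coefficients: expand every term in grams and collect like terms.
u_sh_egg = b_sh_egg / (SH_H * EGG_H)
u_fl = b_fl / FL_H
u_sh = b_sh / SH_H - u_sh_egg * EGG_C
u_egg = b_egg / EGG_H - u_sh_egg * SH_C
u0 = (b0 - b_fl * FL_C / FL_H - b_sh * SH_C / SH_H - b_egg * EGG_C / EGG_H
      + u_sh_egg * SH_C * EGG_C)


def predict_unscaled(flour, shortening, egg):
    """Prediction using the unscaled coefficients and factor values in grams."""
    return (u0 + u_fl * flour + u_sh * shortening + u_egg * egg
            + u_sh_egg * shortening * egg)


print(f"unscaled constant (0 g of everything): {u0:.3f}")
print(f"taste at center, coded vs unscaled: {predict_coded(300, 75, 75):.4f}, "
      f"{predict_unscaled(300, 75, 75):.4f}")
```

The unscaled constant comes out negative, a "predicted taste" for zero grams of every ingredient, which illustrates why it should not be interpreted.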
What we have learnt

Full factorial designs form the basis for classical experimental designs
Full factorial designs are frequently used for exploring 2-4 factors
They are useful for estimating factor main effects and two-factor interactions
The main effect indicates the change in the response when the factor is varied from -1 to +1 (and the others are fixed at 0)
The regression coefficient of a linear model term reflects the change in the response when the factor is raised from 0 to +1
For model interpretation, scaled & centered coefficients should be used
Unscaled coefficients are used when working in EXCEL or with a pocket calculator

Chapter 4 Analysis of full factorial designs
Contents
Introduction
Minimum level of data analysis
  Examples: CakeMix & ByHand
Recommended level of data analysis
  Examples: CakeMix & ByHand
Advanced level of data analysis
  Not illustrated
Overview of data analytical steps in MODDE
Summary
Introduction
Analysis of DOE-data consists of three primary stages:
evaluation of raw data
  get a general appraisal of regularities and peculiarities in the data
  understand and/or remove anomalies
regression analysis and model interpretation
  derive the best possible regression model
  interpret the model
use of regression model
  make a decision on what to do next: a new investigation or verifying experiments?
Minimum level of data analysis

Evaluation of raw data
replicate plot
Regression analysis and model interpretation

R2/Q2/Model Validity/Reproducibility coefficient plot
Use of regression model

response contour plot
Evaluation of raw data - Replicate plot (CakeMix)

The replicate plot shows the variation among the replicates in relation to the variation across the entire design (reproducibility)
[Figure: replicate plots (response versus replicate index). Good: cakemix, Taste. Bad: Test, amylase response]
Regression analysis Summary of fit plot

R2 measures fit (explained variation)
Q2 measures predictive power (predicted variation)
Model validity indicates whether we have an appropriate model
Reproducibility assesses replicate variation
[Figure: Summary of Fit for Taste (CakeMix, MLR): R2, Q2, Model Validity, Reproducibility; N=11, DF=4, Cond. no.=1.1726, Y-miss=0]
Regression analysis - R2
Goodness of fit: $R^2 = 1 - SS_{res}/SS_{tot,corr}$
measures how well we can reproduce the current runs
varies between 0 and 1; 1 = perfect model (all points on the line)
easy to get arbitrarily close to 1
provides the basis for raw and standardized residuals in the N-plot
[Figure: Observed vs. Predicted for Taste, CakeMix (MLR); N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768]
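A minimal sketch in plain Python of the R2 definition above; the function name `r2` is our own and not part of MODDE. It shows the two reference points: predicting only the mean gives R2 = 0, and reproducing every run exactly gives R2 = 1.

```python
def r2(observed, predicted):
    """Goodness of fit: R2 = 1 - SS_res / SS_tot,corr."""
    n = len(observed)
    mean_y = sum(observed) / n
    # Total sum of squares, corrected for the mean
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    # Residual sum of squares of the fitted model
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


print(r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: the mean explains nothing
print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0: perfect reproduction
```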
Regression analysis - Q2
Goodness of prediction: $Q^2 = 1 - PRESS/SS_{tot,corr}$
measures how well we can predict new experiments
varies between -∞ and 1
a better indicator of model usefulness: Q2 > 0.5 good, Q2 > 0.9 excellent
provides the basis for deleted studentized residuals in the N-plot
[Figure: Observed vs. Predicted for Taste, CakeMix (MLR); N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768]
R2 must not exceed Q2 by more than 0.2-0.3!
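A sketch of how Q2 can be computed, assuming an ordinary least-squares (MLR) fit and that NumPy is available; the function `q2_loo` and the data are hypothetical. PRESS is obtained by leave-one-out cross-validation through the standard least-squares identity: the prediction error for run i when that run is left out equals e_i/(1 - h_ii), where h_ii is the leverage from the hat matrix. MODDE's own cross-validation scheme is not described here and may differ.

```python
import numpy as np


def q2_loo(X, y):
    """Goodness of prediction: Q2 = 1 - PRESS / SS_tot,corr (leave-one-out)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    H = X @ np.linalg.pinv(X)            # hat matrix: y_hat = H @ y
    residuals = y - H @ y
    leverage = np.diag(H)
    # Deleted residuals e_i / (1 - h_ii); requires h_ii < 1 for every run
    press = np.sum((residuals / (1.0 - leverage)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


# Hypothetical data: 2^2 factorial + 3 center-points, linear model [1, x1, x2]
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
y = [3.1, 5.2, 4.0, 6.3, 4.5, 4.7, 4.4]
print(round(q2_loo(X, y), 3))
```

Because each deleted residual is at least as large as the ordinary residual, PRESS is never smaller than SS_res, so Q2 never exceeds R2 for such a fit. The formula breaks down for runs with leverage 1, e.g. in a saturated model.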
Regression analysis - Model Validity
Model Validity = 1 + 0.57647 * log10(p), where p is the lack-of-fit p-value from the ANOVA
Model Validity > 0.25 indicates a good model
Model Validity < 0.25 indicates significant lack of fit (i.e., model imperfection)
Only available when replicated experiments have been performed
[Figure: Observed vs. Predicted for y2, ByHand (MLR), showing significant lack of fit; N=7, DF=3, R2=0.698, Q2=-10.499, R2 Adj.=0.396, RSD=5.0485]
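The Model Validity formula is easy to tabulate. A minimal sketch in plain Python (`model_validity` is a hypothetical helper name): with the constant 0.57647, a lack-of-fit p-value of 0.05 maps to about 0.25, which is why 0.25 serves as the threshold, while p = 1 gives 1 and very small p-values give negative values.

```python
import math


def model_validity(p_lof):
    """Model Validity = 1 + 0.57647 * log10(p), p = lack-of-fit p-value."""
    return 1.0 + 0.57647 * math.log10(p_lof)


for p in (1.0, 0.05, 0.001):
    print(p, round(model_validity(p), 3))
```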
Regression analysis - Reproducibility

Reproducibility = 1 - (MS pure error / MS total corrected)
If Reproducibility is below 0.5, there is a large pure error and poor control of the experimental procedure
[Figure: replicate plots. Good: Taste (CakeMix). Bad: amylase (Test)]
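A sketch of the Reproducibility calculation in plain Python, assuming the pure error is pooled over all groups of replicated runs and MS total corrected is the ordinary sample variance of all responses; the helper name and the data are hypothetical.

```python
def reproducibility(responses, replicate_groups):
    """Reproducibility = 1 - MS_pure_error / MS_total_corrected."""
    n = len(responses)
    mean_all = sum(responses) / n
    ms_total = sum((y - mean_all) ** 2 for y in responses) / (n - 1)

    # Pure error pooled over all groups of replicated runs
    ss_pe, df_pe = 0.0, 0
    for group in replicate_groups:
        m = sum(group) / len(group)
        ss_pe += sum((y - m) ** 2 for y in group)
        df_pe += len(group) - 1
    return 1.0 - (ss_pe / df_pe) / ms_total


# Hypothetical: two corner runs plus three replicated center-points
print(reproducibility([0.0, 10.0, 4.0, 5.0, 6.0], [[4.0, 5.0, 6.0]]))
```

Here MS total is 13 and MS pure error is 1, so Reproducibility is 12/13, well above the 0.5 warning level.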
Model interpretation - Coefficient plot (CakeMix)
The coefficient plot shows the importance of the model terms; it is also useful for model refinement
Example: CakeMix, initial model and refined model
[Figure: coefficient plots for Taste (Conf. lev.=0.95). Initial model (Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg): N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768. Refined model (Fl, Sh, Egg, Sh*Egg): N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974]
Q2 increases from 0.87 to 0.94, so the model pruning is justified
Use of model - Response contour plot (CakeMix)

Useful for understanding the impact of large interactions (Sh*Egg in CakeMix)
Typical questions:
Where is the interesting area? Where do we start a new investigation? Where is it appropriate to make verifying experiments?
[Figure: response contour plot of Taste, Flour = 400]
Important: the underlying model must be good (high Q2)!
Evaluation of raw data - Replicate plot (ByHand)

Example: ByHand (all three responses)
y1 (side product); y2 (unreacted starting material); y3 (desired product)
Small replicate errors
Regression analysis - Summary of fit plot (ByHand)

R2/Q2 point to a poor model for y2
The replicate plot suggests a nonlinear relationship between the factors and y2
[Figure: Summary of Fit for y1, y2, y3 (ByHand, MLR; N=7, DF=3) and replicate plot for y2 with experiment number labels]
Model interpretation - Coefficient plot (ByHand)
Poor model for y2
Use of model - Triplet response contour plot (ByHand)

Provides a convenient overview of the three models (but remember the weakness of the model for y2)
Goal: low y1, low y2, high y3
Recommended level of data analysis

Evaluation of raw data: replicate plot + other options of the Worksheet command in MODDE
Regression analysis and model interpretation: R2/Q2/Model Validity/Reproducibility + N-plot of residuals, ANOVA (see Chapter 5), coefficient plot
Use of regression model: response contour plot + response surface plot and prediction spreadsheet

MODDE command: purpose (level of use)
Analysis/Evaluate: condition number (recommended; illustrated)
Worksheet/Scatter plot: plots of raw data (recommended)
Worksheet/Histogram: distribution of a response (recommended; illustrated)
Worksheet/Descriptive Statistics: distributions of several responses (illustrated)
Worksheet/Correlation: plot or table of variable correlations (recommended if the condition number is high)
Worksheet/Replicate plot: plot of signal-to-noise relationship (minimum)
Evaluation of raw data - Condition number

Measures the sphericity of a design
Formally, the condition number is the ratio of the largest and the smallest singular values of the X-matrix
Informally, the condition number may be regarded as the ratio of the longest and shortest design diagonals
Condition number                 Good design   Questionable design   Bad design
Screening & robustness testing   < 3           3-6                   > 6
Optimization                     < 8           8-12                  > 12
All factorial designs without center-points have condition number 1
Compute the condition number before and after altering the design
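A sketch (NumPy assumed) computing the condition number as the ratio of the extreme singular values. It assumes the X-matrix consists of a constant column plus the coded main-effect columns. Under that assumption a $2^3$ design without center-points gives 1, and with three center-points it gives $\sqrt{11/8} \approx 1.1726$, the value shown in the CakeMix Summary of Fit; likewise 16 corners plus 3 center-points give $\sqrt{19/16} \approx 1.0897$, as reported for the Reporter Gene Assay.

```python
import itertools

import numpy as np


def condition_number(design, constant=True):
    """Ratio of the largest to the smallest singular value of the X-matrix."""
    X = np.asarray(design, dtype=float)
    if constant:
        # Assumption: the model matrix includes a column of ones
        X = np.column_stack([np.ones(len(X)), X])
    s = np.linalg.svd(X, compute_uv=False)
    return s.max() / s.min()


corners = list(itertools.product([-1, 1], repeat=3))  # 2^3 factorial, coded units
centers = [(0, 0, 0)] * 3                             # three center-points
print(condition_number(corners))
print(condition_number(corners + centers))
```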
Evaluation of raw data - Histogram & Descriptive Statistics

Tools used to evaluate the distribution of a response:
Near normality: no transformation
Positive skewness: use Log
Negative skewness: use Neglog
[Figure: histograms and descriptive statistics for three example responses.
Taste (CakeMix), near normal: Min 3.52, Max 5.9, Median 4.73, Mean 4.69455
Skewness (itdoe_scr01c2), positive skewness: Min 12.12, Max 65.4, Median 22.15, Mean 23.7118
V11 (microtox), negative skewness: Min 9, Max 77, Median 62.125, Mean 56.1667]
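A minimal sketch in plain Python (data hypothetical) of how a log transformation removes positive skewness, using the moment estimator of skewness. The log requires strictly positive values; a response such as S/B, whose minimum is -0.2, needs an offset first. The Neglog option for negative skewness is not reproduced here because its exact definition in MODDE is not given in this section.

```python
import math


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (moment estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [1.0, 10.0, 100.0, 1000.0]       # hypothetical, strongly right-skewed
logged = [math.log10(v) for v in raw]  # "Log" transformation
print(round(skewness(raw), 2), round(skewness(logged), 2))
```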
Regression analysis - Normal probability plot of residuals

Good tool for finding outliers (deviating experiments)
Example: NOx response of General Example 2
The vertical axis gives the normal probability of the distribution of residuals
The horizontal axis corresponds to the numerical values of the (standardized) residuals
Note: the plot is only useful with more than 12-15 experiments and DF > 3
[Figure: N-plot of deleted studentized residuals for NOx, TruckEngine (MLR), with experiment number labels; N=17, DF=9, R2=0.997, Q2=0.987, R2 Adj.=0.995, RSD=0.4624]
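The N-plot pairs each sorted residual with the normal quantile expected at its rank; points on a straight line suggest normally distributed residuals, and an outlier falls far off that line. A minimal sketch using Python's `statistics.NormalDist`; the plotting position (i - 0.5)/n is an assumption (one common choice), not necessarily the one MODDE uses, and the residuals are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each sorted residual with a theoretical standard-normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n for rank i = 1..n
    return [(r, std_normal.inv_cdf((i - 0.5) / n)) for i, r in enumerate(ordered, start=1)]


for r, z in normal_probability_points([0.3, -0.1, 2.9, -0.4, 0.2]):
    print(f"{r:6.2f}  {z:6.2f}")
```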
Use of model - Making predictions

Example: CakeMix. The upper left-hand corner is interesting; make predictions there
[Figure: response contour plot of Taste with predictions, Flour = 400 g]
Advanced level of data analysis (in Additional Topics)

Use partial least squares (PLS) as the fit method for complicated applications. PLS is appropriate when
(a) there are several correlated responses in the data set
(b) the experimental design has a high condition number, above 10
(c) there are small amounts of missing data in the response matrix
All diagnostic tools are retained (R2/Q2, N-plot, etc.). In addition, PLS provides other useful diagnostic tools
Overview of data analysis in DOE - CakeMix application

Three factors varied: Flour (200-400g), Shortening (50-100g), and Eggpowder (50-100g)
Responses: Taste of resulting cake, and cost of ingredients
Overview of data analysis in DOE - CakeMix application
Compromise: High Taste & Low Cost
Overview - Evaluate raw data

[Figure: histogram of Taste (Cakemix) and replicate plot for Taste with experiment number labels (Cakemix_cost)]
Overview - Regression analysis and model interpretation

Compute model and interpret results
[Figure: coefficient plot for Taste, initial model; R2=0.995, Q2=0.874]
Refine model and interpret results
[Figure: coefficient plot for Taste, refined model (Fl, Egg, Sh, Sh*Egg); N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95]
Overview - Regression analysis and model interpretation

Do further diagnostic testing of the refined model
The N-plot is less useful because there are few experiments
Point 1 is influential but not an alarming outlier (Q2 = 0.937)
[Figure: N-plot of deleted studentized residuals for Taste, Cakemix_cost (MLR), with experiment number labels; N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974]
Overview - Use of model

Use the evaluated, refined model to make decisions: where should the verifying experiments be done?
[Figure: response contour plots at Flour = 400 for Max Taste, Min Cost, and the compromise High Taste/Low Cost]
What have we learnt

Data analysis of DOE data comprises three stages
evaluation of raw data
done to understand and clean the data, and to speed up regression modelling
regression analysis and model interpretation
done to derive the predictively most relevant model with a meaningful mechanistic interpretation
use of model
done to find out the impact of the model: What does it mean? Where should new experiments be positioned?
Chapter 5 Analysis of full factorial designs. II. Causes of bad models
Contents
Review of data analytical steps
evaluation of raw data; regression analysis and model interpretation; use of model
Causes of poor model
skew response distribution; curvature; bad replicates; deviating experiments; missing factors
Review of data analysis in DOE - CakeMix application

Three factors varied: Flour (200-400g), Shortening (50-100g), and Eggpowder (50-100g)
Responses: Taste of resulting cake, and cost of ingredients
Review of data analysis in DOE - CakeMix application
Compromise: High Taste & Low Cost
Review - Evaluate raw data

[Figure: histogram of Taste (Cakemix) and replicate plot for Taste with experiment number labels (Cakemix_cost)]
Review - Regression analysis and model interpretation
Compute model and interpret results
[Figure: coefficient plot for Taste, initial model; R2=0.995, Q2=0.874]
Refine model and interpret results
[Figure: coefficient plot for Taste, refined model (Fl, Egg, Sh, Sh*Egg); N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95]
Regression analysis - Analysis of variance (ANOVA)

ANOVA is concerned with estimating various types of variability in the response data, and then comparing such estimates with each other through F-tests
[Table: ANOVA table of Taste (CakeMix)]
Regression analysis - Analysis of variance (ANOVA)

The upper F-test assesses the significance of the regression model, and is satisfied when p < 0.05
The lower F-test compares the model error with the replicate error, and is satisfied when p > 0.05
The lack-of-fit (LoF) p-value is used in the calculation of Model Validity
[Table: ANOVA table of Taste (CakeMix)]
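A sketch of the two F-statistics in the ANOVA table, in plain Python with hypothetical names and numbers. The residual sum of squares is split into lack of fit and pure (replicate) error. The p-values then come from the F-distribution, for example `scipy.stats.f.sf(F, df1, df2)` if SciPy is available; it is not used below, to keep the sketch dependency-free.

```python
def anova_f_tests(ss_regression, df_regression, ss_residual, df_residual,
                  ss_pure_error, df_pure_error):
    """Return (F_regression, F_lack_of_fit).

    Upper test: regression vs. residual (satisfied when p < 0.05).
    Lower test: lack of fit vs. pure error (satisfied when p > 0.05).
    """
    f_regression = (ss_regression / df_regression) / (ss_residual / df_residual)
    # Residual = lack of fit + pure error
    ss_lof = ss_residual - ss_pure_error
    df_lof = df_residual - df_pure_error
    f_lof = (ss_lof / df_lof) / (ss_pure_error / df_pure_error)
    return f_regression, f_lof


print(anova_f_tests(40.0, 4, 10.0, 5, 2.0, 2))
```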
Review - Regression analysis and model interpretation

Do further diagnostic testing of the refined model: ANOVA OK; not a strong outlier
[Figure: N-plot of deleted studentized residuals for Taste, Cakemix_cost (MLR), with experiment number labels; N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974]
Review - Use of model

Use the evaluated, refined model to make decisions: where should the verifying experiments be done?
[Figure: response contour plots at Flour = 400 for Max Taste, Min Cost, and the compromise High Taste/Low Cost]
Causes of poor model

Skew response distribution
Benefits of response transformation
Curvature
Bad replicates
Deviating experiments
Missing factors
Cause of poor model. 1. - Skew response distribution

A common cause of poor modelling results
Detection tools:
Histogram, Box-Whisker plot, replicate plot (next slide)
[Figure: histograms and descriptive statistics plots for the response S/B of General Example 1 (Reporter Gene Assay Screening), before (S/B) and after (S/B~) log transformation. S/B: Min -0.2, Max 117, Median 1.7, Mean 11.8053. S/B~: Min -2, Max 2.06896, Median 0.281033, Mean 0.219957]
Cause of poor model. 1. - Skew response distribution

Replicate plot
[Figure: replicate plots with experiment number labels for S/B and for the log-transformed S/B~ (Reporter Gene Assay Screening), each with its Summary of Fit (R2, Q2, Model Validity, Reproducibility); N=19, DF=11, Cond. no.=1.0897]
Q2 increases as a result of the log transformation (from -0.2 to 0.91)
Benefits of response transformation

A well-chosen transformation may
(i) simplify the response function by linearizing a non-linear response-factor relationship,
(ii) stabilize the variance of the residuals, and
(iii) make the distribution of the residuals more normal, which sometimes implies that outliers are eliminated
Example of benefits of a transformation

Production of a long-lasting device for service in an aircraft. Ten factors were varied in 32 experiments (screening). The response was the lifetime of the device in hours.
[Figure: Airplane (MLR), untransformed response Time: histogram, Summary of Fit, N-plot of residuals, residuals vs. predicted, and observed vs. predicted; N=32, DF=21, Cond. no.=1.0000, R2=0.876, Q2=0.712, R2 Adj.=0.817, RSD=844.4341]
[Figure: the same plots for the transformed response Time~; N=32, DF=21, R2=0.990, Q2=0.978, R2 Adj.=0.986, RSD=0.0399]
Desired result: R2 and Q2 both high; N-plot a straight line; residuals with no patterns; observed vs. predicted a straight line; histogram a bell-shaped curve
Cause of poor model. 2. - Curvature

Curvature is a problem in screening because the linear and interaction models used are unable to fit such a phenomenon
Fortunately, problems related to curvature are easily detected and fixed
Detection tools:
Replicate plot; low Q2 and Model Validity; lack of fit (ANOVA)
Example: ByHand
[Figure: replicate plot for y2 and Summary of Fit for y1, y2, y3 (ByHand; N=7, DF=3)]
Curvature: How to handle it

Steps:
removal of the non-significant two-factor interaction
addition of Temp² ($x_2^2$)
refitting of the model
higher Q2 and Model Validity, better ANOVA
[Figure: Summary of Fit for y1, y2, y3 (ByHand; N=7, DF=3) before and after model revision]
Curvature: How to handle it

NOTE: the initial $2^2$ factorial design must be augmented with axial experiments to permit a more rigorous assessment of the necessary quadratic terms, because $x_1^2$ and $x_2^2$ are confounded
[Figure: Summary of Fit for y1, y2, y3 and scaled & centered coefficients for y2 (ByHand, MLR), once with $x_1^2$ (terms x1, x2, x1*x1) and once with $x_2^2$ (terms x1, x2, x2*x2) in the model; both fits give N=7, DF=3, R2=0.997, Q2=0.964, R2 Adj.=0.995, RSD=0.4628, Conf. lev.=0.95]
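A small check in plain Python of why $x_1^2$ and $x_2^2$ cannot be separated: in a $2^2$ design with center-points both squared columns are identical. The axial points at distance 1 (face-centered) are an assumption chosen for illustration.

```python
import itertools

# 2^2 factorial with three center-points, in coded units
factorial = list(itertools.product([-1, 1], repeat=2))
design = factorial + [(0, 0)] * 3


def squared_columns(points):
    """Return the x1^2 and x2^2 model columns for a two-factor design."""
    return [x1 ** 2 for x1, _ in points], [x2 ** 2 for _, x2 in points]


sq1, sq2 = squared_columns(design)
print(sq1 == sq2)  # True: identical columns, so the quadratic terms are confounded

# Adding axial experiments makes the columns differ, so both terms become estimable
axial = [(-1, 0), (1, 0), (0, -1), (0, 1)]
sq1, sq2 = squared_columns(design + axial)
print(sq1 == sq2)  # False
```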
Cause of poor model. 3. - Bad replicates

A third common cause of a poor screening model is that replicated experiments spread too much
Detection tools:
Replicate plot; ANOVA table; Reproducibility bar (here = 0.53, not shown)
[Figure: replicate plot for amylase with experiment number labels]
Cause of poor model. 4. - Deviating experiments

Deviating experiments, or outliers, may degrade the predictive ability and blur the interpretation of a regression model
Detection tools: mainly the N-plot, but also other residual plots, the replicate plot, Model Validity, and LoF in ANOVA
[Figure: N-plots of residuals for Yield, Willge_Opt (MLR), with experiment number labels, on three residual scales including standardized and raw residuals; N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006]
Cause of poor model. 5. - Missing factors

The most difficult outcome
May be the case when a model with moderate values of R2 (~0.6) and Q2 (~0.4) is obtained
A missing factor requires additional thinking:
Variation in temperature, humidity, raw material, equipment failure, ...?
Mapping a new factor requires more experiments; therefore, in reality, we usually only eliminate factors in screening
What have we learnt

Data analysis of DOE data comprises three stages
evaluation of raw data; regression analysis and model interpretation; use of model
Causes of bad model and their primary detection tool:
Skew response distribution: histogram
Curvature: Model Validity/LoF in ANOVA
Bad replicates: replicate plot
Deviating experiments: N-plot
Missing factors: generally poor model performance
Chapter 6 Experimental Objective: Screening Illustration: General Example 1 (Reporter Gene Assay)
Contents
General Example 1: background, steps in problem formulation
Fractional factorial designs: introduction, geometry, confoundings, generators, defining relation, resolution, summary of properties
General Example 1: evaluation of raw data, regression analysis and model interpretation, use of model
Summary
Background to General Example 1

Reporter gene assays are used in mechanistic studies of gene regulation (toxicology, drug development, etc.)
A reporter gene has an easily measurable phenotype, and its transcription is controlled by a promoter
Reporter gene assays provide important information on gene regulation relating to expression (i.e., number of copies), and on when and where a particular protein is formed
Principal investigators: Lena Schultz and Lisbeth Abramo, Active Biotech AB, Lund
PF - Selection of experimental objective

In the reporter gene application, the selected experimental objective was screening
With screening one wants to find out a little about many factors, that is, which factors dominate and what are their optimal ranges?
Typically, screening designs involve the study of between 4 and 10 factors, but applications with as many as 12-15 screened factors are not uncommon
The reporter gene case contains six factors, which makes the results easier to overview
PF - Specification of factors
The Ishikawa, or fishbone, system diagram is a very helpful method to overview all factors
It reduces the risk of missing a critical factor
The four Ms: Methods, Manpower, Machines, Materials
Practical maximum depth: 4-5 levels
Six factors:
Number of cells/well (50 000-400 000)
PMA (stimulator) (5-100 ng/ml)
Ionomycin (stimulator) (0.1-2 µg/ml)
Stimulation time (3-6 hours)
Lysing volume (30-100 µl)
Ratio sample/substrate (2-10)
PF - Specification of responses
It is important to select responses that are relevant to the experimental goals; many responses are not a problem
Here: signal-to-background ratio, computed as (signal - background)/background
GOAL: as high as possible (maximize)
PF - Selection of regression model

Linear model or interaction model? With six factors, a linear model requires 16 + 3 and an interaction model 32 + 3 experiments
A linear model was selected:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_6 x_6 + \varepsilon$
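The run counts follow from the number of model coefficients. A minimal sketch (`n_parameters` is a hypothetical helper): six factors give 7 coefficients for the linear model, which fit in 16 runs, and 22 for the interaction model, which exceed 16 and therefore need the 32-run fraction; the "+ 3" are the center-points. Eight runs would also hold 7 coefficients, but the 8-run $2^{6-3}$ design is only Resolution III.

```python
from math import comb


def n_parameters(k, interactions=False):
    """Constant + k linear terms (+ all two-factor interactions)."""
    return 1 + k + (comb(k, 2) if interactions else 0)


print(n_parameters(6), n_parameters(6, interactions=True))  # 7 22
```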
PF - Generation of design and creation of worksheet

The $2^{6-2}$ fractional factorial design is applicable (explained in a moment)
Standard design with 16 corner experiments and 3 center-points
NOTE: randomized run order!
Introduction to fractional factorial designs

Consider the $2^7$ full factorial design in 128 runs. It is possible to estimate 128 model parameters:
1 constant term; 7 linear terms; 21 two-factor interactions; 35 three-factor interactions; 35 four-factor interactions; 21 five-factor interactions; 7 six-factor interactions; 1 seven-factor interaction
Not all parameters are of appreciable size and meaningful (hierarchy): linear terms tend to be larger than two-factor interactions, which, in turn, tend to be larger than three-factor interactions, and so on
A $2^k$ full factorial design therefore has a parameter redundancy, i.e., an excess number of parameters which can be estimated but which lack relevance
Fractional factorial designs exploit this redundancy by reducing the number of design runs
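The term counts above are binomial coefficients, C(7, j) terms of order j, and they sum to $2^7 = 128$. A quick check in Python:

```python
from math import comb

k = 7
# Number of model terms of each order (0 = constant) in a saturated 2^k model
terms_by_order = [comb(k, order) for order in range(k + 1)]
print(terms_by_order)       # [1, 7, 21, 35, 35, 21, 7, 1]
print(sum(terms_by_order))  # 128
```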
Geometry of fractional factorial designs (FFDs)

Fractional factorial designs are used in screening and robustness testing
A fraction of all possible corner experiments is selected
Advantage: reduction of experiments
Disadvantage: confounding of effects
[Figure: CakeMix cube (Flour 200-400, Shortening 50-100, Eggpowder 50-100) with the full set of corner experiments and two half-fractions]
Going from the $2^3$ full factorial design to the $2^{4-1}$ FFD
Use the computational matrix of the parent $2^3$ design:
Run  const  x1  x2  x3  x1x2  x1x3  x2x3  x1x2x3
1    +      -   -   -   +     +     +     -
2    +      +   -   -   -     -     +     +
3    +      -   +   -   -     +     -     +
4    +      +   +   -   +     -     -     -
5    +      -   -   +   +     -     -     +
6    +      +   -   +   -     +     -     -
7    +      -   +   +   -     -     +     -
8    +      +   +   +   +     +     +     +
Introduction of the fourth factor: the x1x2x3 column is used for the new factor, i.e., x4 = x1x2x3
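A minimal sketch (plain Python 3.8+; `fractional_factorial` is a hypothetical helper, not a MODDE function) of this construction: write out the parent $2^3$ design in standard order and append each generated column as a signed product of base columns.

```python
import itertools
from math import prod


def fractional_factorial(k_base, generators):
    """Build a two-level fractional factorial design in coded units (-1/+1).

    k_base     -- number of factors in the parent full factorial
    generators -- list of (sign, factor_indices); each new column equals
                  sign * product of the listed base columns (0-based indices)
    """
    runs = []
    # Standard order: the first factor varies fastest
    for levels in itertools.product([-1, 1], repeat=k_base):
        base = list(reversed(levels))
        extra = [sign * prod(base[i] for i in idx) for sign, idx in generators]
        runs.append(base + extra)
    return runs


# 2^(4-1) design with generator x4 = x1*x2*x3
design = fractional_factorial(3, [(+1, (0, 1, 2))])
for run_number, row in enumerate(design, start=1):
    print(run_number, row)
```

Changing the sign in the generator tuple to -1 gives the other half-fraction.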
Confounding of effects
Reduction of experiments means that effects become confounded, that is, to a certain degree mixed up with each other
The 16 possible effects are evenly allocated, two effects per column
Main effects are confounded with the three-factor interactions
Comparatively simple confounding situation
Confounded pairs (one pair per column of the 8-run design): constant/x1x2x3x4, x1/x2x3x4, x2/x1x3x4, x3/x1x2x4, x4/x1x2x3, x1x2/x3x4, x1x3/x2x4, x1x4/x2x3
Confoundings: Use of Correlation Matrix

Full factorial design ($2^3$): no confounding
Fractional factorial design ($2^{4-1}$): confounding
[Figure: correlation matrices of the two designs]
A graphical interpretation of confoundings

Only the sum of confounded terms is estimated
Main effects usually dominate over three-factor interactions
More experiments are needed to better resolve confounded terms
[Figure: estimated effects labeled by their confounded pairs: x1/x2x3x4, x2/x1x3x4, x3/x1x2x4, x4/x1x2x3, x1x2/x3x4, x1x3/x2x4, x1x4/x2x3]
Generators - Introduction
The generator dictates which specific fraction will be selected and thereby, indirectly, controls the confounding pattern
[Figure: CakeMix cube and its two half-fractions]
x1 = Flour, x2 = Shortening, x3 = Eggpowder
Half-fraction with x3 = x1x2:
Run  x1  x2  x3 = x1x2
5    -   -   +
2    +   -   -
3    -   +   -
8    +   +   +
Half-fraction with -x3 = x1x2:
Run  x1  x2  -x3 = x1x2
1    -   -   +
6    +   -   -
7    -   +   -
4    +   +   +
Generators of the $2^{4-1}$ fractional factorial design
Two versions of the $2^{4-1}$ design exist: one given by the generator x4 = x1x2x3, and the other by the alternative generator -x4 = x1x2x3
Each version selects 8 of the 16 runs of the parent $2^4$ full factorial design (standard order, runs 1-16):
x4 = x1x2x3: runs 1, 10, 11, 4, 13, 6, 7, 16
-x4 = x1x2x3: runs 9, 2, 3, 12, 5, 14, 15, 8
Multiple generators
Example: construction of the $2^{5-2}$ fractional factorial design
Generators: x4 = x1x2 and x5 = x1x3
The fourth and fifth factors may be introduced in the design as +x4/+x5, +x4/-x5, -x4/+x5, or -x4/-x5
This gives four possible quarter-fractions of 8 experiments; the +x4/+x5 version is:
Run  x1  x2  x3  x4 = x1x2  x5 = x1x3  x2x3  x1x2x3
1    -   -   -   +          +          +     -
2    +   -   -   -          -          +     +
3    -   +   -   -          +          -     +
4    +   +   -   +          -          -     -
5    -   -   +   +          -          -     +
6    +   -   +   -          +          -     -
7    -   +   +   -          -          +     -
8    +   +   +   +          +          +     +
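A sketch in plain Python (the helper name is hypothetical) showing that the four sign choices give four disjoint quarter-fractions which together make up the complete $2^5$ factorial:

```python
import itertools


def quarter_fraction(sign4, sign5):
    """2^(5-2) design with sign4*x4 = x1*x2 and sign5*x5 = x1*x3 (coded units)."""
    runs = []
    for x3, x2, x1 in itertools.product([-1, 1], repeat=3):  # x1 varies fastest
        runs.append((x1, x2, x3, sign4 * x1 * x2, sign5 * x1 * x3))
    return runs


fractions = {signs: quarter_fraction(*signs) for signs in itertools.product([1, -1], repeat=2)}
all_runs = [run for runs in fractions.values() for run in runs]
# Four fractions x 8 runs = 32 distinct runs = the complete 2^5 factorial
print(len(all_runs), len(set(all_runs)))  # 32 32
```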
Defining relation - Introduction

The defining relation of a design is a formula, derived from all its generators, that allows the calculation of the resulting confounding pattern
This relation ties together all generators. Example: $2^{4-1}$ design
Rules:
(1) $X_1 \cdot I = X_1$, where I is the identity column (all +)
(2) $X_1 \cdot X_1 = X_1^2 = I$
Step 1: identify the generator(s): $X_4 = X_1X_2X_3$
Step 2: multiply both sides by $X_4$: $X_4^2 = X_1X_2X_3X_4$
Step 3: apply rule 2: $I = X_1X_2X_3X_4$
This is the defining relation for the $2^{4-1}$ design
Use of defining relation

The confounding pattern can be understood through the defining relation. Example: $2^{4-1}$ design. What is x1 confounded with?
Step 1: $I = x_1x_2x_3x_4$
Step 2: $x_1 I = x_1^2 x_2x_3x_4$
Step 3: $x_1 = I x_2x_3x_4$
Step 4: $x_1 = x_2x_3x_4$
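Defining relation of the $2^{5-2}$ fractional factorial design
$I = x_1x_2x_4 = x_1x_3x_5 = x_2x_3x_4x_5$
(the third word is the product of the two generator words: $x_1x_2x_4 \cdot x_1x_3x_5 = x_1^2 x_2x_3x_4x_5 = x_2x_3x_4x_5$)
What is x4 confounded with? $x_4 = x_1x_2 = x_1x_3x_4x_5 = x_2x_3x_5$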
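The multiplication rules can be mechanized by treating each word as a set of factor indices: multiplying two words keeps the factors that occur in exactly one of them (the symmetric difference), since $x_i^2 = I$. A minimal sketch in plain Python; the helper names are hypothetical, and signs are ignored, so with a generator such as -x4 = x1x2x3 the same effects are confounded, but with a negative sign.

```python
from itertools import combinations


def word(*factors):
    """A product of factor columns, e.g. word(1, 2, 4) for x1*x2*x4."""
    return frozenset(factors)


def multiply(a, b):
    """Multiply two words; because xi*xi = I, shared factors cancel."""
    return a ^ b


def defining_relation(generator_words):
    """All non-identity words obtained by multiplying generator words together."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            product = frozenset()
            for w in combo:
                product = multiply(product, w)
            words.add(product)
    return words


def aliases(effect, generator_words):
    """Effects confounded with `effect`: effect multiplied by every defining word."""
    return {multiply(effect, w) for w in defining_relation(generator_words)}


# 2^(5-2) design: x4 = x1x2 and x5 = x1x3 give I = x1x2x4 = x1x3x5
gens = [word(1, 2, 4), word(1, 3, 5)]
print(sorted(sorted(w) for w in defining_relation(gens)))  # [[1, 2, 4], [1, 3, 5], [2, 3, 4, 5]]
print(sorted(sorted(w) for w in aliases(word(4), gens)))   # [[1, 2], [1, 3, 4, 5], [2, 3, 5]]
```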
Confounding pattern of the used $2^{6-2}$ protocol
There are four ways of selecting 16 experiments out of 64
The actual selection is controlled by the generators
Each such quarter-fraction is equivalent from a mathematical point of view
Resolution of fractional factorial designs

The resolution of a design is defined as the length of the shortest word in the defining relation
Resolution III (e.g. I = a*b*c): main effects confounded with two-factor interactions. Resolution III designs are the most reduced, but also the most difficult to analyze. Recommended for robustness testing.
Resolution IV (e.g. I = a*b*c*d): main effects unconfounded with two-factor interactions; two-factor interactions still confounded with each other. Recommended for screening.
Resolution V (e.g. I = a*b*c*d*e): main effects unconfounded with two-factor interactions; two-factor interactions unconfounded with each other. Resolution V designs are almost as good as full factorial designs.
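A sketch in plain Python (`resolution` is a hypothetical helper) that forms every product of generator words and returns the length of the shortest one. Applied to generators from the overview table of common designs, it reproduces the listed resolutions, e.g. IV for $2^{6-2}$ with X5 = X1*X2*X3 and X6 = X2*X3*X4.

```python
from itertools import combinations


def resolution(generator_words):
    """Length of the shortest word in the defining relation.

    Each generator word lists the factors it contains, e.g. x4 = x1x2x3
    gives the word {1, 2, 3, 4}. Multiplying words cancels repeated factors.
    """
    shortest = None
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            product = frozenset()
            for w in combo:
                product = product ^ frozenset(w)
            if shortest is None or len(product) < shortest:
                shortest = len(product)
    return shortest


print(resolution([{1, 2, 3, 4}]))                # 2^(4-1): 4, Resolution IV
print(resolution([{1, 2, 4}, {1, 3, 5}]))        # 2^(5-2): 3, Resolution III
print(resolution([{1, 2, 3, 5}, {2, 3, 4, 6}]))  # 2^(6-2): 4, Resolution IV
```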
Summary of properties of fractional factorial designs

The selected generators control the confounding pattern and the selected fraction of experiments. Indirectly, this means that the selected generators also influence the shape of the defining relation and the resolution of the design.
[Diagram relating Generator(s), Selected Fraction, Defining Relation, Confounding Pattern, and Resolution]
Overview table of common fractional factorial designs

Each generator may be used with either sign (+/-). Designs by number of factors and runs:
3 factors, 4 runs: $2^{3-1}$, Res III; X3=X1*X2
4 factors, 8 runs: $2^{4-1}$, Res IV; X4=X1*X2*X3
5 factors, 8 runs: $2^{5-2}$, Res III; X4=X1*X2, X5=X1*X3
5 factors, 16 runs: $2^{5-1}$, Res V; X5=X1*X2*X3*X4
6 factors, 8 runs: $2^{6-3}$, Res III; X4=X1*X2, X5=X1*X3, X6=X2*X3
6 factors, 16 runs: $2^{6-2}$, Res IV; X5=X1*X2*X3, X6=X2*X3*X4
6 factors, 32 runs: $2^{6-1}$, Res VI; X6=X1*X2*X3*X4*X5
7 factors, 8 runs: $2^{7-4}$, Res III; X4=X1*X2, X5=X1*X3, X6=X2*X3, X7=X1*X2*X3
7 factors, 16 runs: $2^{7-3}$, Res IV; X5=X1*X2*X3, X6=X2*X3*X4, X7=X1*X3*X4
7 factors, 32 runs: $2^{7-2}$, Res IV; X6=X1*X2*X3*X4, X7=X1*X2*X4*X5
7 factors, 64 runs: $2^{7-1}$, Res VII; X7=X1*X2*X3*X4*X5*X6
8 factors, 4 or 8 runs: no design
8 factors, 16 runs: $2^{8-4}$, Res IV; X5=X2*X3*X4, X6=X1*X3*X4, X7=X1*X2*X3, X8=X1*X2*X4
8 factors, 32 runs: $2^{8-3}$, Res IV; X6=X1*X2*X3, X7=X1*X2*X4, X8=X2*X3*X4*X5
8 factors, 64 runs: $2^{8-2}$, Res V; X7=X1*X2*X3*X4, X8=X1*X2*X5*X6
8 factors, 128 runs: $2^{8-1}$, Res VIII; X8=X1*X2*X3*X4*X5*X6*X7
9 factors, 4 or 8 runs: no design
9 factors, 16 runs: $2^{9-5}$, Res III; X5=X1*X2*X3, X6=X2*X3*X4, X7=X1*X3*X4, X8=X1*X2*X4, X9=X1*X2*X3*X4
9 factors, 32 runs: $2^{9-4}$, Res IV; X6=X2*X3*X4*X5, X7=X1*X3*X4*X5, X8=X1*X2*X4*X5, X9=X1*X2*X3*X5
9 factors, 64 runs: $2^{9-3}$, Res IV; X7=X1*X2*X3*X4, X8=X1*X3*X5*X6, X9=X3*X4*X5*X6
9 factors, 128 runs: $2^{9-2}$, Res VI; X8=X1*X3*X4*X6*X7, X9=X2*X3*X5*X6*X7
10 factors, 4 or 8 runs: no design
10 factors, 16 runs: $2^{10-6}$, Res III; X5=X1*X2*X3, X6=X2*X3*X4, X7=X1*X3*X4, X8=X1*X2*X4, X9=X1*X2*X3*X4, X10=X1*X2
10 factors, 32 runs: $2^{10-5}$, Res IV; X6=X1*X2*X3*X4, X7=X1*X2*X3*X5, X8=X1*X2*X4*X5, X9=X1*X3*X4*X5, X10=X2*X3*X4*X5
10 factors, 64 runs: $2^{10-4}$, Res IV; X7=X2*X3*X4*X6, X8=X1*X3*X4*X6, X9=X1*X2*X4*X5, X10=X1*X2*X3*X5
10 factors, 128 runs: $2^{10-3}$, Res V; X8=X1*X2*X3*X7, X9=X2*X3*X4*X5, X10=X1*X3*X4*X6
In addition, the table lists D-optimal designs (D-opt) for some combinations of factors and runs.
Summary of fractional factorial designs

Advantage: we can investigate more factors with drastically fewer runs (always add 3-5 center-points)
No. of factors         2    3    4    5    6    7    8    9-16
Full factorial         4    8    16   32   64   128  256  >512
Fractional factorial   N/A  4    8    16   16   16   16   32
Disadvantage: confounding (aliasing) of effects
Higher resolution gives fewer problems; Resolution IV is recommended
Reporter Gene Assay - Evaluation of raw data

Replicate and histogram plots (before and after log transformation)
[Figure: replicate plots with experiment number labels and histograms of S/B (before) and S/B~ (after), Reporter Gene Assay Screening]
Condition number & Correlation matrix
Good design (low condition number). The response depends on Cells, Ion, and StH.
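The condition number reported for this design (Cond. no. = 1.0897 in the MODDE output) can be reproduced under an assumption: the screening design is a 16-run two-level fraction in six factors plus 3 center-points (N = 19), with an intercept column and coded factor columns. For such an orthogonal design X'X is diagonal, so the condition number (largest over smallest singular value of X) reduces to sqrt(max/min) of that diagonal, here sqrt(19/16). The generators below are illustrative; the actual worksheet is not shown.

```python
import math
from itertools import product


def design_with_centers(n_center):
    """Hypothetical 2^(6-2) fraction (X5 = X1*X2*X3, X6 = X2*X3*X4) + center-points."""
    rows = [[x1, x2, x3, x4, x1 * x2 * x3, x2 * x3 * x4]
            for x1, x2, x3, x4 in product((-1, 1), repeat=4)]
    rows += [[0] * 6 for _ in range(n_center)]
    return rows


def condition_number_orthogonal(rows):
    """Condition number of X = [1, factors] for an orthogonal design.

    Raises if X'X is not diagonal, because sqrt(max(diag) / min(diag))
    is only valid in that case (use an SVD otherwise).
    """
    X = [[1] + r for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    if any(xtx[i][j] != 0 for i in range(p) for j in range(p) if i != j):
        raise ValueError("design is not orthogonal")
    diag = [xtx[i][i] for i in range(p)]
    return math.sqrt(max(diag) / min(diag))


print(round(condition_number_orthogonal(design_with_centers(3)), 4))  # 1.0897
```

The same ratio arises after the fold-over (32 factorial runs + 6 center-points, sqrt(38/32)), which is why the value stays at 1.0897. Without center-points the condition number would be exactly 1.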
Reporter Gene Assay - Regression analysis

The default linear model looks good with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79, MVal = 0.65, Rep = 0.96)
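R2 measures how well the model fits the data and Q2 how well it predicts data left out of the fit. A minimal sketch, assuming the usual definitions R2 = 1 - SS_res/SS_tot and Q2 = 1 - PRESS/SS_tot with leave-one-out PRESS; MODDE's implementation may differ in details. Because a two-level factorial with center-points is orthogonal, least squares reduces to column averages, and the leave-one-out residuals follow from the leverages h_ii without refitting. The response values are hypothetical.

```python
from itertools import product


def fit_orthogonal(X, y):
    """Least squares for a design matrix with mutually orthogonal columns.

    X: rows of model terms, first column = 1 (intercept).
    Returns coefficients, fitted values and leverages h_ii.
    """
    p = len(X[0])
    norms = [sum(row[j] ** 2 for row in X) for j in range(p)]  # diagonal of X'X
    b = [sum(row[j] * yi for row, yi in zip(X, y)) / norms[j] for j in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    leverage = [sum(row[j] ** 2 / norms[j] for j in range(p)) for row in X]
    return b, fitted, leverage


def r2_q2(X, y):
    """R2 = 1 - SS_res/SS_tot and Q2 = 1 - PRESS/SS_tot (leave-one-out)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    _, fitted, h = fit_orthogonal(X, y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ss_res = sum(e ** 2 for e in resid)
    # Leave-one-out residual of run i is e_i / (1 - h_ii): no refitting needed.
    press = sum((e / (1 - hi)) ** 2 for e, hi in zip(resid, h))
    return 1 - ss_res / ss_tot, 1 - press / ss_tot


# Hypothetical 2^2 factorial + 3 center-points, linear model b0 + b1*x1 + b2*x2.
X = [[1, x1, x2] for x1, x2 in product((-1, 1), repeat=2)]
X += [[1, 0, 0] for _ in range(3)]
y = [2.1, 3.9, 2.8, 5.2, 3.4, 3.6, 3.3]
R2, Q2 = r2_q2(X, y)
print(f"R2 = {R2:.3f}, Q2 = {Q2:.3f}")
```

Since each PRESS term is at least as large as the corresponding squared residual, Q2 can never exceed R2; a large gap between them signals a model that fits better than it predicts.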
[Figure: Summary of fit and normal probability plot of residuals for S/B~ with experiment number labels; N=19, DF=12, Cond. no.=1.0897, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472. No outliers.]
Reporter Gene Assay - Model adjustment

Model revision steps:
1) PMA and Ratio were removed.
2) Six two-factor interactions were added.
3) Only three interactions were kept (Cel*Lys, Ion*StH, and Ion*Lys).
4) The revised model is much better (R2 = 0.96, Q2 = 0.91, MVal = 0.79, Rep = 0.96).
[Figure: Summary of fit and scaled & centered coefficients for S/B~. Initial model (Cel, PM, Lys, StH, Rat, Ion): N=19, DF=12, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472. Revised model (Cel, Lys, Ion, StH, Cel*Lys, Ion*Lys, Ion*StH): N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467. Cond. no.=1.0897; conf. lev.=0.95.]
Reporter Gene Assay - Model adjustment

Some observations:
The three most important factors are Cells, Ionomycin and Stimulation Time.
There are a few two-factor interactions which look interesting, as they improve the predictive power of the model.
However, these two-factor interactions are confounded with other two-factor interactions.
The experimenters decided to carry out more experiments:

to possibly improve the modelling of S/B (although the model is already very good);
to resolve the confounded two-factor interactions.
Technique: fold-over (Chapter 7).
Reporter Gene Assay - Use of model

A response contour plot was created with Cells and LysVolume as axes (this allows exploration of their two-factor interaction), while the other factors were fixed at their maximum values (because of their positive regression coefficients). [Figure: region of maximum S/B.]
Summary
Fractional factorial designs form the most widely used family of screening designs.
Many factors can be mapped in few runs.
Confounding of effects is a disadvantage, but it can reasonably be tolerated by selecting a Res IV design.
Reporter Gene Assay:
Very good model for S/B.
Indication of some small interaction terms, which may be important.
More experiments, to possibly improve the modelling of S/B and to resolve the confounded two-factor interactions, will be done using fold-over (Chapter 7).

Chapter 7 Post-screening actions (What to do after screening?)
Contents
Principles for inter- and extrapolation
Basic requirement: sound modelling
Main outcomes
Gradient techniques & software optimizer
Adding new experiments
Reporter Gene Assay: creating the fold-over design, data analysis
Summary
Interpolation basic principle

Interpolation uses the derived regression model for predictions inside the explored experimental region. It is of interest when experimenting outside the investigated region is either impossible or undesired.
Tools: response contour plots, response surface plots, software optimizer.
Extrapolation basic principle

Extrapolation (predictions outside the investigated region) is done when it is possible to change the factor settings.
Tools: response contour plots, response surface plots (gradient techniques), software optimizer.
Extrapolation is more uncertain than interpolation.
Recommendation: avoid extrapolating more than 25% outside the factor interval (example: temperature range 20-60 °C; modified range 10-70 °C).
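The 25% rule is simple arithmetic on the factor interval. Assuming the margin is applied to each end of the range, this small sketch (the function name is hypothetical) reproduces the temperature example:

```python
def extended_range(low, high, fraction=0.25):
    """Widen a factor interval by `fraction` of its width at each end."""
    margin = fraction * (high - low)
    return low - margin, high + margin


print(extended_range(20, 60))  # (10.0, 70.0)
```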
Basic requirement for inter- and extrapolation: a reliable model

Use the full range of diagnostic tools to acquire a reliable and predictive model. Example: Reporter Gene Assay (plots from Chap. 6).
[Figure: Summary of fit and scaled & centered coefficients for S/B~ (Cel, Lys, Ion, StH, Cel*Lys, Ion*Lys, Ion*StH); N=19, DF=11, Cond. no.=1.0897, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467, conf. lev.=0.95.]
Main outcomes: one point that fulfills the goals
One of the performed experiments fulfills the experimental goals (IDEAL case). Make a limited set of new trials to verify the golden run.
Main outcomes: interpolation to find an interesting area
Predictions inside the region lead forward to an interesting point or small region. Make a limited set of new trials to verify this point or small region.
Main outcomes: extrapolation to find an interesting area
Predictions outside the region lead forward to an interesting point or small region. Make a limited set of new trials to verify this point or small region.
How do we find an interesting point or area?

Example: finding the optimal region is an objective which is often used to bridge the gap between screening and optimization.
Two techniques are used for moving the experimental region:

graphically oriented gradient technique;
automatic optimization procedure based on running multiple simplexes.
Gradient techniques
Steepest ascent or descent (the example shows steepest descent). Gradient techniques work best with fairly few responses, and when the two-factor interactions that occur are fairly small.
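For a linear model in coded units, the gradient is simply the vector of coefficients, so the path of steepest ascent moves each factor in proportion to its coefficient, and steepest descent uses the opposite sign. A minimal sketch; the coefficients, step size and number of steps are hypothetical, and the points are in coded units that would still need converting to real factor settings.

```python
import math


def steepest_path(coefs, step=0.5, n_steps=4, descent=False):
    """Points along the steepest ascent (or descent) line from the design center.

    coefs: linear coefficients b1..bk of a model in coded units.
    Each step moves a distance `step` (coded units) along the unit gradient.
    """
    norm = math.sqrt(sum(b * b for b in coefs))
    direction = [(-b if descent else b) / norm for b in coefs]
    return [[round(i * step * d, 3) for d in direction] for i in range(n_steps + 1)]


# Hypothetical coefficients for two factors; minimize the response.
for point in steepest_path([0.6, -0.8], descent=True):
    print(point)
```

Each printed point is a candidate run "on the line"; in practice one runs a few of them and stops when the response no longer improves.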
[Figure: "Adhere to the line": contour plot of a simulated design (load = 100.00) over Airfuel (1.10-1.40) and NH3, with the path of steepest descent.]
Software optimizer
The MODDE optimizer simultaneously starts as many as eight simplexes from different locations in the factor space (details in Chapter 8). Example: Reporter Gene Assay. [Figure: the eight starting points in factor space.]
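The idea of launching several simplexes from different points can be sketched with a basic Nelder-Mead search. This is not MODDE's optimizer: the code uses textbook reflection, expansion, inside-contraction and shrink steps, it is unconstrained, and the quadratic "model" being maximized is hypothetical (its optimum lies inside the coded box). Simplexes that converge to the same point from different starts give some reassurance that the optimum is not a local artefact.

```python
def nelder_mead(f, start, step=0.5, iters=300):
    """Minimize f with a basic Nelder-Mead simplex (reflection, expansion,
    inside contraction and shrink steps)."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink all vertices towards the best one
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


def model(x):
    """Hypothetical fitted quadratic response in coded units (to be maximized)."""
    x1, x2 = x
    return 100 + 10 * x1 + 8 * x2 - 15 * x1 ** 2 - 10 * x2 ** 2


# Hypothetical starting points: corners of the coded square plus the center.
starts = [[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]]
for s in starts:
    opt = nelder_mead(lambda x: -model(x), s)  # maximize by minimizing -model
    print(s, [round(v, 3) for v in opt], round(model(opt), 2))
```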
Adding new experiments: region not moved

Complementing for unconfounding or curvature.
Adding new experiments: region is moved

Complementing using a follow-up screening design (or robustness testing design).
Adding new experiments: region is moved

Complementing using a new RSM design.
What it looks like in software

Software opportunities: covered in Chap. 7, Chap. 8 and Additional Topics (D-optimal design); hinted at in Chap. 5.
The fold-over technique

The fold-over principle gives a complementary design that unconfounds effects in resolution III and resolution IV designs. The example relates to the 2^(5-2) fractional factorial design:
Original fraction: x4 = x1x2, x5 = x1x3
Fold-over fraction: x4 = -x1x2, x5 = -x1x3
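The effect of the fold-over can be checked numerically. The sketch below (plain Python, coded ±1 units, hypothetical function names) builds the 2^(5-2) fraction with x4 = x1x2 and x5 = x1x3, creates the fold-over by reversing the signs of all five columns, and lists every main effect whose column is not orthogonal to some two-factor interaction column, i.e. every main effect still aliased with a two-factor interaction.

```python
from itertools import combinations, product


def base_fraction():
    """2^(5-2) fraction in coded units with generators x4 = x1*x2, x5 = x1*x3."""
    return [[x1, x2, x3, x1 * x2, x1 * x3]
            for x1, x2, x3 in product((-1, 1), repeat=3)]


def fold_over(design):
    """Full fold-over: reverse the sign of every factor in every run."""
    return [[-x for x in run] for run in design]


def aliased_main_effects(design):
    """(main effect, two-factor interaction) pairs whose columns are not orthogonal."""
    k = len(design[0])
    pairs = []
    for m in range(k):
        for i, j in combinations(range(k), 2):
            if m in (i, j):
                continue
            if sum(run[m] * run[i] * run[j] for run in design) != 0:
                pairs.append((f"x{m + 1}", f"x{i + 1}x{j + 1}"))
    return pairs


original = base_fraction()
combined = original + fold_over(original)
print(aliased_main_effects(original))  # six aliased pairs, e.g. ('x1', 'x2x4')
print(aliased_main_effects(combined))  # []
```

In the fold-over runs the relations become x4 = -x1x2 and x5 = -x1x3, so in the combined 16 runs the three-letter words cancel and only x2x3x4x5 remains in the defining relation: the merged design is Resolution IV.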
Summary: Post-screening actions

Post-screening actions depend on:
the quality of the obtained regression model;
whether it is possible and necessary to modify the factor ranges;
whether some of the already conducted experiments are close to fulfilling the goals stated in the problem formulation.
Experiments outside the investigated region impossible or undesired:

One of the performed experiments fulfills the experimental goals (IDEAL case).
None of the performed experiments meets the goals, and the model must be used for finding the best point.
Addition of complementary experiments to the mother design (e.g. fold-over).
Experiments outside the investigated region possible and/or desired:

graphically oriented gradient technique;
automatic procedure based on running multiple simplexes in parallel.
Creating the fold-over reporter gene assay design

The worksheet has been appended with 19 new experiments. The block factor is a precautionary measure that is useful for probing whether significant changes have occurred over time.

The new design must be evaluated in the same way as the parent design.
[Figure: Replicate plot and histogram of S/B for the screening design with fold-over complement (38 experiments).]
The S/B response is not normally distributed (there are a few extreme values).

The response becomes more nearly normal after log-transformation.
[Figure: Replicate plot and histogram of the log-transformed response S/B~ for the screening design with fold-over complement.]

Summary of fit plot of the fitted model
[Figure: Summary of fit (R2, Q2, model validity, reproducibility) and normal probability plot of residuals for S/B~ with experiment number labels; N=38, DF=30, Cond. no.=1.0897, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027.]
No outliers.
Reporter Gene Assay - Model interpretation

The block factor is insignificant (no significant drift over time).
PMA and Ratio are not influential.
Keep four linear terms.
[Figure: Scaled & centered coefficients for S/B~ (Cel, Lys, StH, Rat, $Bl, PM, Ion); N=38, DF=30, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027, conf. lev.=0.95.]
Reporter Gene Assay - Model refinement

Four main effects were kept.
Two-factor interactions were evaluated but were insignificant.
The revised model is only marginally better.
Before refinement: R2 = 0.92, Q2 = 0.88, MVal = 0.47, Rep = 0.97 (N=38, DF=30, R2 Adj.=0.901, RSD=0.3027).
After refinement (Cel, Lys, Ion, StH): R2 = 0.91, Q2 = 0.89, MVal = 0.47, Rep = 0.97 (N=38, DF=33, R2 Adj.=0.902, RSD=0.3018).
[Figure: Summary of fit and scaled & centered coefficients for S/B~ before and after refinement.]
Reporter Gene Assay - Further diagnostic checking

The revised model contains no outliers. However, some of the largest residuals are encountered for the six center-points, which hints at curvature problems. Curvature is easy to handle with a quadratic model (Chap. 8).
[Figure: Normal probability plot of deleted studentized residuals, and deleted studentized residuals vs. predicted S/B~; N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018.]
Reporter Gene Assay - Interpretation of refined model
Mainly a linear dependence of S/B on the factors

Cells, Ionomycin and Stimulation Time (StH) are the most important factors. Lysing Volume will not be varied in the RSM study (Chapter 8).
[Figure: Scaled & centered coefficients for S/B~ (Cel, Lys, Ion, StH); N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018, conf. lev.=0.95.]
MODDE optimizer applied to Reporter Gene Assay data

What does the response look like at the point Cells = 400000, Ionomycin = 2.0, StimH = 6, LysVol = 30?
Bring the optimization results into response contour plots.
What we have learnt

DOE applied to technical and chemical problems often involves proceeding in stages. Initially, a screening design in 19 experiments was laid out. Secondly, a fold-over complement was added. With the combined set of experiments it was possible to corroborate that four factors are more influential than the others.

Chapter 8 Experimental objective: Optimization Illustration: General Example 2 (Reporter Gene Assay)
Contents
General Example 2: background, problem formulation
Introduction to RSM
Composite designs (CCC & CCF): construction & geometry, comparison of CCC and CCF
General Example 2: evaluation of raw data, regression analysis and model interpretation, use of model, MODDE optimizer
Summary

Continuation of the screening designs (Chapters 6 and 7). Main results of the screening designs:
three main factors;
suspected quadratic dependence (structured residuals).
Principal investigators: Lena Schultz and Lisbeth Abramo, Active Biotech AB, Lund
[Figure: assay schematic, treatment and light signal.]

In the third phase of the reporter gene application, the selected experimental objective was optimization. In optimization, the important factors, usually between 2 and 5, have been identified, and one wants to extract in-depth information about them. It is of interest to reveal the nature of the relationships between the few factors and the measured responses. For some factors and responses the relationships might be linear, for others non-linear, that is, curved. Such relationships are conveniently investigated by fitting a quadratic regression model.
Factors:
Factor             Old range                New range
Cells              50000-400000 cells       200000-400000 cells
Stimulation time   2-6 h                    4-6 h
Ionomycin          0.1-2 µg/ml              1-2 µg/ml
PF - Specification of responses
It is important to select responses that are relevant to the experimental goals. Having many responses is not a problem. Here: signal-to-background ratio, computed as (signal - background)/background.
GOAL: As high as possible (Maximize)


A quadratic model was selected
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$$
With three factors, a composite design supporting this quadratic model requires 17 experiments (8 corner + 6 axial + 3 center-points).
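A full quadratic model in k factors has 1 + 2k + k(k-1)/2 coefficients (intercept, linear, squared and two-factor interaction terms), so with three factors it has 10, fewer than the 17 runs of the composite design; the surplus runs provide degrees of freedom for estimating error and checking the model. The sketch below expands one coded run into these terms; the function name and the term ordering are arbitrary choices.

```python
from itertools import combinations


def quadratic_terms(x):
    """Expand coded factors [x1, ..., xk] into full quadratic model terms:
    intercept, linear, squared and two-factor interaction terms."""
    linear = list(x)
    squares = [xi * xi for xi in x]
    interactions = [xi * xj for xi, xj in combinations(x, 2)]
    return [1] + linear + squares + interactions


row = quadratic_terms([1, -1, 0])
print(len(row), row)  # 10 [1, 1, -1, 0, 1, 1, 0, -1, 0, 0]
```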

The experimenters selected a central composite face-centered (CCF) design in 18 runs: a standard design with 8 corner points, 6 axis points, and 4 center-points.
Introduction to response surface methodology (RSM) designs

RSM has been the acronym for response surface methodology, reflecting the use of response surface plots for finding an optimal point. In more recent years, however, the re-interpretation response surface modelling has become more prevalent. Good RSM designs must:
allow the estimation of the model parameters with low uncertainty;
give rise to a model with small prediction error;
have a prediction error independent of direction;
permit a judgement of the adequacy of the model;
encode as few experiments as possible.
The family of central composite designs meets these demands.
The CCC design in two factors

The composite designs are natural extensions of the two-level full and fractional factorial designs. CCC: Central Composite Circumscribed.
The CCC design consists of three building blocks:

(i) regularly arranged corner experiments of a two-level factorial design (ii) symmetrically arrayed star points located on the factor axes, and (iii) repeatedly performed center-points
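The three building blocks translate directly into code. This sketch assumes the rotatable choice alpha = (2^k)^(1/4) for the CCC star distance, which gives the radius 1.41 quoted for two factors; alpha = 1 places the star points on the cube faces and gives the CCF design. Only full factorial cubes are handled, and the function name is hypothetical.

```python
from itertools import product


def central_composite(k, n_center=3, face_centered=False):
    """Central composite design in coded units.

    (i) 2^k corner runs, (ii) 2k star (axial) runs at distance alpha,
    (iii) n_center replicated center-points.
    """
    alpha = 1.0 if face_centered else (2 ** k) ** 0.25
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    stars = []
    for i in range(k):
        for sign in (-1, 1):
            point = [0.0] * k
            point[i] = sign * alpha
            stars.append(point)
    centers = [[0.0] * k for _ in range(n_center)]
    return corners + stars + centers


ccc = central_composite(2)                                  # 4 + 4 + 3 runs
ccf = central_composite(3, n_center=4, face_centered=True)  # 8 + 6 + 4 runs
print(len(ccc), len(ccf))  # 11 18
```

The CCC places each factor at five levels (-1.41, -1, 0, 1, 1.41), while the CCF keeps three levels and stays inside the original factor ranges.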
The CCC design in two factors

Example worksheet of an 11-run CCC design in two factors.
The first four rows represent the corner experiments, the next four rows the star (axial) points, and the last three rows the replicated center-points.
All factors are mapped at five levels with the CCC design. This makes it possible to estimate quadratic terms with great rigor. The corner experiments and the axial experiments are all situated on the circumference of a circle with radius 1.41, and therefore the experimental region is symmetrical.
The CCC design in three factors

The CCC design in three factors is constructed in a fashion similar to that of the two-factor analogue:
(i) eight corner experiments, (ii) six axial experiments, and (iii) three replicated center-points
The CCC design in three factors

(i) eight corner experiments
(ii) six axial experiments
(iii) three replicated center-points
The CCF design in three factors

When it is desirable to maintain the low and high factor levels, and still perform an RSM design, the central composite face-centered (CCF) design is a prudent alternative
[Figure: CCF and CCC designs in MODDE.]
CCF is the recommended design choice for pilot-plant and full-scale investigations.
A comparison of CCC and CCF-designs

Theoretically, the CCF design is slightly inferior to the CCC design:
the CCC design spans a larger volume;
five levels of each factor mean that the CCC design is better prepared for capturing strong curvature, or even cubic response behavior;
quadratic model terms are less correlated in the CCC than in the CCF case.
[Figure: geometry of the CCC and CCF designs.]
Overview of composite designs

It is possible to explore as many as five factors in as few as 29 experiments. When moving up to six factors, there is a huge increase in the number of experiments.
Number of factors       2      3       4       5       6       7
Number of experiments   8+3    14+3    24+3    26+3    44+3    78+3
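The numbers in the table follow from the composition of the design: a cube part, plus 2k star points, plus the center-points (the "+3"). The figures imply a full factorial cube up to four factors and a half-fraction 2^(k-1) from five factors on; this sketch assumes exactly that and reproduces the table.

```python
def composite_runs(k):
    """Corner + star runs of a composite design (center-points excluded)."""
    cube = 2 ** k if k <= 4 else 2 ** (k - 1)  # half-fraction from 5 factors on
    return cube + 2 * k


print([composite_runs(k) for k in range(2, 8)])  # [8, 14, 24, 26, 44, 78]
```

The jump from 26 to 44 runs between five and six factors comes from the cube doubling from 16 to 32 runs.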

The replicate plot shows that the signal-to-background response variable is numerically much higher than in the screening designs
[Figure: Replicate plot, histogram and descriptive statistics of S/B for the CCF design (Reporter Gene Assay RSM, 18 runs); Min: 17.9, Max: 221.4, Median: 107.8, Mean: 120.683.]
The response is sufficiently well distributed that no transformation is needed.

The initial model is good (R2 = 0.91, Q2 = 0.56, MVal = 0.87 and Rep = 0.79), albeit with an undesirably large gap between R2 and Q2. There are some insignificant model terms.
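R2 Adj. corrects R2 for the number of model terms. Assuming the usual definition R2 Adj. = 1 - (1 - R2)(N - 1)/DF, where DF is N minus the number of coefficients, the reported values can be checked from N, DF and R2 alone; small differences come from R2 being printed with three decimals.

```python
def r2_adjusted(r2, n_runs, df_resid):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / DF."""
    return 1 - (1 - r2) * (n_runs - 1) / df_resid


# (R2, N, DF, reported R2 Adj.) as printed in the MODDE plots of this study.
reported = [
    (0.908, 18, 8, 0.805),   # initial CCF model
    (0.896, 18, 11, 0.840),  # revised CCF model
    (0.917, 19, 12, 0.876),  # screening model
    (0.920, 38, 30, 0.901),  # screening + fold-over model
]
for r2, n, df, adj in reported:
    print(f"N={n}, DF={df}: computed {r2_adjusted(r2, n, df):.3f}, reported {adj}")
```

Dropping terms raises DF, so a revised model can have a lower R2 but a similar or better R2 Adj. and Q2, as in the CCF refinement.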
[Figure: Summary of fit and scaled & centered coefficients for S/B (Cel, Cel*Cel, Ion*Ion, Ion, Cel*Ion, StH*StH, Cel*StH, StH*Ion, StH); N=18, DF=8, R2=0.908, Q2=0.558, R2 Adj.=0.805, RSD=25.3554, conf. lev.=0.95.]

The two cross-terms Cel*StH and Cel*Ion and the quadratic term StH*StH were omitted. The revised model is much better (R2 = 0.89, Q2 = 0.74, MVal = 0.92, Rep = 0.79).
[Figure: Summary of fit and scaled & centered coefficients for S/B (Cel, Cel*Cel, Ion*Ion, Ion, StH*Ion, StH); N=18, DF=11, Cond. no.=4.0089, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934, conf. lev.=0.95.]

The revised model contains no outliers (below, left), and the size of the residual is independent of the predicted value (below, right), which is good.
[Figure: Normal probability plot of deleted studentized residuals (left) and deleted studentized residuals vs. predicted S/B (right), with experiment number labels; N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934.]
Reporter Gene Assay - Use of model

The optimum inside the experimental region lies close to the factor combination high Stimulation time (6 h), high Ionomycin (2 µg/ml) and intermediate Cells (ca. 320000). [Figure: region of optimum.]
What to do after RSM - Introduction

Three primary actions are envisioned:
(i) one of the design runs fulfils the goals stated in the problem formulation: this corresponds to the ideal case, and only a couple of verification experiments are needed to establish the usefulness of this factor combination;
(ii) regression modelling and use of the model for predicting the location of a new promising experimental point (most common case);
(iii) supplementation of an existing RSM design with a few extra experiments to support a specific model, e.g., a partly cubic one.
Reporter Gene Assay: case (ii), using the optimization routine in MODDE.
MODDE optimizer applied to the Reporter Gene Data

Factor settings for interpolation
Response is to be maximized above a minimum value

Factor co-ordinates for simplex launching:
Rows 1-5: a 2^(3-1) fractional factorial design in the three most important factors, plus the center point
Rows 6-8: the runs from the worksheet that come closest to meeting the experimental goals

Results of the first optimization round. Evaluation by log(D):
> 0: BAD
0: GOOD
< 0: even better
= -10: IDEAL
Below zero means that we are between Target and Min for the response.
Run #8 has the lowest log(D).

Second optimization: new starting points around run 8
Performed in order to reduce the risk of being trapped by a local minimum or maximum

Evidently, all five points are predicted to meet our experimental goals. In fact, neglecting some small variation in the decimal digits, the five simplexes have converged to the same point, that is, Cells = 320000, StimH = 6 h, and Ionomycin = 2 µg/ml.
Uncertainties in predicted optimal point

The factor co-ordinates were transferred to the prediction list. This list shows that the predicted optimal S/B value is 260 ± 40.
The relevance of the above factor combination was tested in a final robustness testing design.
What we have learnt

The composite designs CCC and CCF are natural extensions of the two-level full and fractional factorial designs. A composite design consists of three building blocks:
(i) regularly arranged corner experiments of a two-level factorial design, (ii) symmetrically arrayed star points located on the factor axes, and (iii) repeatedly performed center-points.
The CCC and CCF designs differ in how the star points, or axis points, are positioned. Both CCC and CCF support quadratic models.
Summary of the Reporter Gene Assay application

Three important factors found in the screening, Cells, Ionomycin and Stimulation Time (StimH), were varied in a CCF design encompassing 18 experiments. A very good model without outliers resulted. The interpretation of this model revealed the factor combination Cells = 320000, Stimulation Time = 6 h and Ionomycin = 2 µg/ml as optimal inside the investigated experimental domain.

Chapter 9 Experimental objective: Robustness testing Illustration: General Example 3 (HPLC)
Contents
Introduction to robustness testing
General Example 3: background, steps in problem formulation
Common designs in robustness testing: fractional factorial designs, Plackett-Burman designs
General Example 3: evaluation of raw data, regression analysis and model interpretation
Four limiting cases of robustness testing
Introduction to robustness testing

Goal: minimize the system's sensitivity to small changes in critical factors.
A robustness test is usually carried out before the release of an almost finished product, or analysis system, as a last test to ensure quality.
Set point: the factor combination which is currently used for running the system.
Aim of robustness testing: to explore robustness close to the set point.

HPLC is used in routine analysis of complex mixtures in the pharmaceutical industry. Example:
5 factors were varied in a 12-run experimental design. Responses: capacity factors of two analytes and the resolution between two adjacent peaks.
[Figure: HPLC chromatogram (0-14 min) with peaks labeled H 310/83 (1a), H 309/40 (1m), (R), (I), (S) and (II).]

The objectives in robustness testing are:
to identify responses which are robust to small factor changes;
to identify responses that are sensitive to small factor changes;
to understand which factors need to be better controlled to achieve robustness.
What counts as a small factor change?

variation that may normally occur in the laboratory;
variation in raw materials, equipment, ...
PF - Specification of factors

Four quantitative factors and one qualitative factor
PF - Specification of responses

Three responses:
Capacity factor 1, k1
Capacity factor 2, k2
Resolution, Res1
[Figure: HPLC chromatogram with peaks labeled H 310/83 (1a), H 309/40 (1m), (R), (I), (S) and (II).]
Specifications:
Res1 should be > 1.5 (complete baseline separation)
k1: N/A
k2: N/A

We distinguish between three main types of polynomial models:
Linear: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + e$
Interaction: $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + \dots + e$
Quadratic: $y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2 + \dots + e$
In robustness testing, a linear model is usually selected

The ideal result in a robustness testing study is identical response values for all trials. A low-resolution screening design is useful. A 2^(5-2) design augmented with four center-points was used (8 + 4 = 12 runs).
Geometry of HPLC-Rob design
How to deal with center-points in case of qualitative factors

If all factors are quantitative, it is easy to add center-points. If one factor is qualitative, one may position center-points at the centers of two faces of the cube.
[Figure: all factors quantitative: 3 center-points; one qualitative factor (TypeA/TypeB): 2 "center" points.]
Common designs in robustness testing - Part I

Fractional factorial designs
Resolution III
[Figure: cube plots in Flour (200-400), Shortening (50-100) and Eggpowder (50-100) illustrating resolution III fractional factorial designs.]
Common designs in robustness testing - Part II

Plackett-Burman designs
two-level designs;
support linear models;
require very few experimental runs per factor;
also used in screening.
In some cases a PB design is a specific fraction of a factorial design. The number of runs is a multiple of 4; PB designs of 12, 20, and 24 runs are of particular interest.
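A 12-run PB design can be generated cyclically: each of the first 11 rows is a cyclic shift of a generator row, and a final row of all minus signs is appended. The sketch uses the commonly tabulated 12-run generator (+ + - + + + - - - + -) and checks that all 11 columns are balanced and mutually orthogonal, which is what lets up to 11 main effects be estimated independently in 12 runs. The function and constant names are illustrative.

```python
from itertools import combinations

# Commonly tabulated generator row for the 12-run Plackett-Burman design.
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """12 runs x 11 columns: 11 cyclic shifts of the generator + a row of -1."""
    n = len(GENERATOR_12)
    rows = [[GENERATOR_12[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


design = plackett_burman_12()
columns = list(zip(*design))
balanced = all(sum(col) == 0 for col in columns)
orthogonal = all(sum(a * b for a, b in zip(c1, c2)) == 0
                 for c1, c2 in combinations(columns, 2))
print(len(design), balanced, orthogonal)  # 12 True True
```

With fewer than 11 factors, the unused columns are simply left out. Unlike a regular fraction, each main effect in a PB design is partially aliased with many two-factor interactions, which is acceptable when a linear model is the goal.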
Common designs in robustness testing - Part III

The example shows the 12-run PB design. Recommended use of PB designs:
No. of factors   Maximum no. of factors   No. of runs
5                7                        8             Use Frac Fac
9                11                       12
13               15                       16            Use Frac Fac
17               19                       20
21               23                       24
25               27                       28
29               31                       32            Use Frac Fac
Always add 3 center-points

HPLC application - Evaluation of raw data

The numerical variation in the resolution response is small. The lowest measured resolution is 1.75 and the highest 1.89. This means that Res1 is robust (inside specification).

All three responses are nearly normally distributed

Low condition number, which means that we have a good design. The responses are strongly correlated.
HPLC application - Regression analysis

Is the model significant or not? Model refinement is usually not carried out. The low Q2 of 0.12 for Res1 suggests that this response is robust.
[Figure: Summary of fit (MLR) for k1, k2 and Res1; N=12, DF=6.]
Four limiting cases of robustness testing

Nature of robustness: is the regression model significant or not? Are the responses inside or outside specifications? Four limiting cases:
Inside specification/Significant model
Inside specification/Non-significant model
Outside specification/Significant model
Outside specification/Non-significant model
First limiting case - Inside specification/Significant model

All the measured values are inside the specification, that is, above 1.5. The regression model is significant: weak Q2, but a significant AcN term.
[Figure: Replicate plot of Res1 with experiment number labels, and scaled & centered coefficients for Res1 (extended: Co(ColA), Co(ColB), pH, Ac, OS, Te); N=12, DF=6, R2=0.772, Q2=0.121, R2 Adj.=0.582, RSD=0.0248, conf. lev.=0.95.]
Predictions for the extreme cases (what is the maximum variation?):
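For a linear model in coded units (-1 to +1), the smallest and largest predictions inside the design region occur at the corners where every factor pushes the response the same way, so the extreme predictions are b0 ± Σ|bj|. A sketch with hypothetical coefficients for Res1 (not the study's values), compared with the 1.5 specification; a two-level qualitative factor is assumed to be coded ±1 like the others.

```python
def extreme_predictions(b0, coefs):
    """Min and max of b0 + sum(bj * xj) over the coded box -1 <= xj <= 1."""
    spread = sum(abs(b) for b in coefs)
    return b0 - spread, b0 + spread


# Hypothetical constant and coefficients for Res1 (not the study's values).
b0, coefs = 1.82, [0.035, -0.010, 0.008, -0.005, 0.004]
low, high = extreme_predictions(b0, coefs)
print(f"Res1 between {low:.3f} and {high:.3f}; above 1.5: {low > 1.5}")
```

If even the worst-case prediction stays inside the specification, the response is robust over the whole tested region, regardless of whether individual coefficients are significant.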
Second limiting case - Inside spec/Non-significant model

This is the ideal outcome, and Res1 can be used to illustrate it. The model is non-significant according to ANOVA, since the p-value of 0.059 exceeds 0.05.
[Figure: Summary of fit (R2, Q2, model validity, reproducibility) for k1, k2, Res1 and vetific; N=12, DF=6.]
Third limiting case - Outside spec/Significant model

k2 is used to illustrate this limiting case, with a temporary specification between 2.7 and 3.3. The coefficients are used for understanding two things: (i) how to get k2 inside specification and (ii) how to produce a non-significant model (that is, how to reach the second limiting case).
Rows 2-3: extreme cases
Rows 4-5: how to get inside specifications
Rows 6-7: how to get a non-significant model

[Figure: Summary of fit for k1, k2 and Res1, and scaled & centered coefficients for k2 (extended: Co(ColA), Co(ColB), pH, Ac, OS, Te); N=12, DF=6, R2=0.989, Q2=0.959, R2 Adj.=0.981, RSD=0.0418, conf. lev.=0.95.]
Fourth limiting case - Outside spec/Non-significant model

Most complex limiting case, as many outcomes are conceivable:
one strong outlier (left);
replicated center-points have much higher response values (middle);
one experiment deviates and falls outside specification (right);
many more...
[Figure: Replicate plots for the response vetific with experiment number labels, illustrating the three outcomes (left, middle, right).]
What we have learnt

We have discussed:
the experimental objective of robustness testing;
common designs in robustness testing;
the HPLC application;
the problem formulation steps of this example;
the evaluation of raw data, the regression analysis, and model interpretation, related to the HPLC example;
four limiting cases of robustness testing;
what to do to possibly convert a non-robust system into a robust one.

Chapter 10 Conclusions
Key features of DOE

How to make experiments efficiently
Span the experimental domain with the aid of a suitable experimental design
How to analyze the data

Use good statistical tools to evaluate experimental results
How to interpret the results

With the use of user-friendly PC-based graphical facilities
How to convert modelling results into concrete actions/decisions

MODDE optimizer & verifying experiments
Design of Experiments (DOE)

Maximizes the information content of an experimental series while keeping the number of experiments low.
A) Prepare a set of representative experiments, in which all factors under investigation are varied simultaneously.
B) From the set of experiments, a model is derived which captures the relation between factor settings and experimental results (responses).
Experimental result = f(factor settings)
[Figure: DoE (Design of Experiments) and MVA (Multivariate Data Analysis) in the path from measurement data to information, knowledge, and decision/action]
Multivariate Data Analysis (MVA)

Captures the systematic parts in Mb data sets and visualizes the information in plots and graphs
[Figure: multivariate modeling turns data into information]

Chapter 11 Additional Topics
Contents
- D-optimal design
- Blocking the experimental plan
- Mixture design
- Other RSM designs
- Multilevel qualitative factors
- The Taguchi approach to robust design
- Simultaneous optimization of several responses fitted with different models
- Partial least squares projections to latent structures, PLS
- Design in latent variables
Additional Topics
D-optimal design
Contents
- Introduction to D-optimal design
- Evaluation criteria: G-efficiency, condition number
- Typical examples of D-optimal design
When to use D-optimal design - Irregular regions

Irregular experimental region in:
- screening
- optimization
- mixture design
[Figure: irregular experimental regions in Factor A/Factor B space (panels A, B and C)]
When to use D-optimal design - Qualitative factors

- Multi-level qualitative factors in screening
[Figure: design with Factor A at four levels (Level 1-4), a second qualitative factor at three settings (Sett 1-3) and Factor C at -1/+1]
- Optimization designs with qualitative factors
[Figure: designs in Factor 1 and Factor 3, run at qualitative factor level A and at level B]
When to use D-optimal design - Special requirements

Special number of runs
8 runs + 3 center-points = 11 total runs (Frac Fac)
11 runs + 3 center-points = 14 total runs (D-optimal)
12 runs + 3 center-points = 15 total runs (PB)
16 runs + 3 center-points = 19 total runs (Frac Fac)

#Factors   CCC/CCF   BB       D-opt
5          26 + 3    40 + 3   26 + 3
6          44 + 3    48 + 3   35 + 3
7          78 + 3    56 + 3   43 + 3
Model upgrading
$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_{12}x_1x_2 + b_{13}x_1x_3 + b_{23}x_2x_3 + b_{22}x_2^2 + e$ (upgrade terms: $b_{11}x_1^2$, $b_{33}x_3^2$)
$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_{12}x_1x_2 + b_{13}x_1x_3 + b_{23}x_2x_3 + b_{11}x_1^2 + b_{22}x_2^2 + b_{33}x_3^2 + b_{111}x_1^3 + e$
When to use D-optimal design - Inclusions

Inclusion of existing experimental information in:
- screening
- optimization
When to use D-opt. design - Process and Mixture Factors

Process and Mixture Factors
- when making a combined design for process and mixture factors
LoafVolume is a typical example where D-optimal design could have been utilized.
Introduction to D-optimal design

A D-optimal design is a computer-generated design, and consists of the best subset of experiments selected from the candidate set. For a given model, $Y = X\beta + \varepsilon$, the following can be said regarding the D-optimal approach:
- the selected runs maximize the determinant of the matrix $X'X$
- these experiments span the largest volume possible in the experimental region
A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture).
A small D-optimal example

Example: $2^2$ full factorial design with factors $x_1$ and $x_2$
run   1    2    3    4
x1   -1    1   -1    1
x2   -1   -1    1    1
Model: $y = b_0 + b_1x_1 + b_2x_2 + b_{12}x_1x_2 + e$
Model in matrix form: $y = Xb + e$, with the least-squares estimate $b = (X'X)^{-1}X'y$
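D-optimal example, the covariance matrix $(X'X)^{-1}$

$X = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$, $X' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$
$X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}$, $(X'X)^{-1} = \begin{pmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{pmatrix}$
Precision in b follows from $(X'X)^{-1}$, RSD and t: the confidence interval of $b_i$ is $\pm t \cdot \mathrm{RSD} \cdot \sqrt{[(X'X)^{-1}]_{ii}}$. The smallest $(X'X)^{-1}$ corresponds to the largest $X'X$.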
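The matrix algebra of this example can be checked numerically. A minimal sketch using NumPy (our choice of tool, not part of the course material) that builds the model matrix of the $2^2$ design with interaction term and confirms that $X'X$ is diagonal:

```python
import numpy as np

# 2^2 full factorial in coded units, runs 1-4 as in the table
x1 = np.array([-1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1])

# Model matrix for y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + e
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

XtX = X.T @ X                  # information matrix X'X
XtX_inv = np.linalg.inv(XtX)   # (X'X)^-1, proportional to the covariance of b

print(XtX)       # 4 on the diagonal, 0 elsewhere
print(XtX_inv)   # 0.25 on the diagonal, 0 elsewhere
```

Because every diagonal element of $(X'X)^{-1}$ equals 0.25, each coefficient gets the same confidence interval, $\pm t \cdot \mathrm{RSD} \cdot 0.5$.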
A second small D-optimal example

Problem: two factors ($x_1$/$x_2$) varied at three levels. Proposed model: $y = b_0 + b_1x_1 + b_2x_2 + e$; the model needs 3 DF.
There are $9!/(3! \cdot 6!) = 84$ ways of selecting 3 trials out of 9. Maximize the determinant $\det(X'X)$: the best precision in the estimated regression coefficients is obtained with $\det(X'X) = 16$.
[Figure: 3-run selections from the 3 x 3 grid in $x_1$/$x_2$ (levels -1, 0, +1) with $\det(X'X)$ = 0, 1, 4, 9 and 16]
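The 84 selections are few enough to score exhaustively. The sketch below (Python with NumPy; the candidate list and helper name are our own) computes $\det(X'X)$ for every three-run subset of the 3 x 3 grid under the linear model and confirms that 16 is the maximum. D-optimal programs typically use exchange algorithms instead, since full enumeration becomes infeasible as the candidate set grows.

```python
from itertools import combinations

import numpy as np

# Candidate set: two factors at three levels (-1, 0, +1) gives 9 candidate runs
levels = (-1, 0, 1)
candidates = [(a, b) for a in levels for b in levels]


def det_xtx(runs):
    """det(X'X) for the linear model y = b0 + b1*x1 + b2*x2 + e."""
    X = np.array([[1.0, a, b] for a, b in runs])
    return np.linalg.det(X.T @ X)


# Score every one of the C(9, 3) = 84 three-run subsets
scored = [(int(round(det_xtx(s))), s) for s in combinations(candidates, 3)]
best_det = max(d for d, _ in scored)
best_designs = [s for d, s in scored if d == best_det]

print(len(scored))                      # 84
print(best_det)                         # 16
print(sorted({d for d, _ in scored}))   # [0, 1, 4, 9, 16]
print(best_designs[0])                  # one optimal selection
```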
How to compute a determinant

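Example: experiments spread according to a determinant of 4, with the runs $(x_1, x_2) = (-1, 0), (0, -1), (0, 1)$
$X = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix}$, $X' = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix}$, $X'X = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$
Expanding $\det(X'X)$ by the rule of Sarrus:
$\det(X'X) = (3 \cdot 1 \cdot 2) + ((-1) \cdot 0 \cdot 0) + (0 \cdot (-1) \cdot 0) - (0 \cdot 1 \cdot 0) - (0 \cdot 0 \cdot 3) - (2 \cdot (-1) \cdot (-1)) = 4$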
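A short check of the hand calculation: the helper below (our own name) applies the rule of Sarrus to $X'X$ for the runs (-1, 0), (0, -1), (0, 1) and compares the result with NumPy's determinant.

```python
import numpy as np


def det3_sarrus(m):
    """Determinant of a 3x3 matrix by the rule of Sarrus."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * e * i + b * f * g + c * d * h
            - c * e * g - a * f * h - b * d * i)


# Runs (x1, x2) = (-1, 0), (0, -1), (0, 1) for y = b0 + b1*x1 + b2*x2 + e
X = np.array([[1, -1, 0],
              [1, 0, -1],
              [1, 0, 1]])
XtX = X.T @ X   # [[3, -1, 0], [-1, 1, 0], [0, 0, 2]]

print(det3_sarrus(XtX.tolist()))   # 4
print(round(np.linalg.det(XtX)))   # 4, the same value from NumPy
```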
Features of the D-optimal approach

- assumes that the selected regression model is "correct" and "true"
- sensitive to model choice
- potential terms may be added to protect against this sensitivity
Evaluation criteria
Two common evaluation criteria:
Condition number
- ratio of the largest to the smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit and denotes an orthogonal design
G-efficiency
- computed as $G_{\mathrm{eff}} = 100 \cdot p/(n \cdot d)$, where p is the number of model terms, n the number of runs, and d the maximum relative prediction variance $x'(X'X)^{-1}x$ over the candidate set
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended
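Both criteria are straightforward to compute from the model matrix. This sketch assumes d is the maximum of $x'(X'X)^{-1}x$ over the candidate set, and uses the $2^2$ factorial with interaction model as both design and candidate set; `condition_number` and `g_efficiency` are hypothetical helper names, not MODDE functions. NumPy's `np.linalg.cond(X)` returns the same singular-value ratio.

```python
import numpy as np


def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.max() / s.min()


def g_efficiency(X, candidates):
    """Geff = 100 * p / (n * d), d = max of x'(X'X)^-1 x over the candidate rows."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    d = max(float(x @ XtX_inv @ x) for x in candidates)
    return 100.0 * p / (n * d)


def model_row(x1, x2):
    # Interaction model y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + e
    return [1.0, x1, x2, x1 * x2]


corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
X = np.array([model_row(a, b) for a, b in corners])

print(condition_number(X))   # about 1.0: orthogonal design
print(g_efficiency(X, X))    # about 100.0: the full factorial itself
```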
Applications of D-optimal design - Model updating

Model updating is common after screening, when it is necessary to unconfound two-factor interactions.
Example: Laser welding. The Po*Sp two-factor interaction is needed, but it is confounded with No*Ro. A fold-over leads to 11 new experiments, whereas selective updating is possible with D-optimal design.
Step 1: Make a copy of the current investigation.
Step 2: In the new application, select File/Complement design (opens a wizard).
Step 3: Select D-optimal design.
Step 4: Select the number of additional runs; to unconfound two two-factor interactions, 4 extra experiments are appropriate.
Step 5: Edit the model to add the interesting terms.
Step 6: Select the number of additional center-points and name the new investigation.
Step 7: Select Screening and 15 + 2 runs as lead numbers.
Step 8: Generate the D-optimal design with 15 runs (here: five variants).
Step 9: Evaluate the resulting designs. In this case all five alternatives are identical.
Step 10: Generate the selected design.
Design tailor-made to resolve Po*Sp and No*Ro
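Selective updating can be sketched as a small augmentation problem: keep the existing runs and pick the extra runs that maximize $\det(X'X)$ for the updated model. The sketch assumes a hypothetical $2^{4-1}$ starting design with generator Ro = Po*Sp*No (which makes the Po*Sp and No*Ro columns identical), a model with the constant, four main effects, Po*Sp and No*Ro, and the 16 corners of the $2^4$ cube as candidates. It ignores center-points and the five alternative versions, and uses exhaustive search rather than MODDE's own algorithm.

```python
from itertools import combinations, product

import numpy as np


def model_row(po, sp, no, ro):
    # Constant, four main effects, and the two interactions to be resolved
    return [1.0, po, sp, no, ro, po * sp, no * ro]


def det_xtx(runs):
    X = np.array([model_row(*r) for r in runs])
    return np.linalg.det(X.T @ X)


# Hypothetical 2^(4-1) starting design with generator Ro = Po*Sp*No,
# so the Po*Sp column is identical to the No*Ro column
start = [(po, sp, no, po * sp * no) for po, sp, no in product((-1, 1), repeat=3)]

# Candidate set: all 16 corners of the 2^4 full factorial
candidates = list(product((-1, 1), repeat=4))

print(abs(det_xtx(start)) < 1e-6)   # True: Po*Sp and No*Ro are confounded

# Keep the 8 existing runs and add the 4 candidates that maximize det(X'X)
best_extra = max(combinations(candidates, 4),
                 key=lambda extra: det_xtx(start + list(extra)))
augmented = start + list(best_extra)

print(len(augmented), det_xtx(augmented) > 0)   # 12 True
```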

Applications of D-opt. design - Multi-level qual. factors

Example: Cotton cultivation
A full factorial design has 4*7 = 28 runs (very many in screening). A linear model is sufficient in screening:
$\mathrm{Yield} = b_0 + b_1\,\mathrm{Center} + b_2\,\mathrm{Variety} + e$
- constant term: 1 DF
- linear term of Center (7 levels): 6 DF
- linear term of Variety (4 levels): 3 DF
- extra: 5 DF
Total: 15 DF
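The DF count follows from the rule that a qualitative factor with k levels uses k-1 DF in a linear model. A tiny sketch of that arithmetic, with a hypothetical helper name:

```python
def linear_model_df(levels_per_factor, extra_df=0):
    """Constant (1 DF) + (k - 1) DF per k-level qualitative factor + extra DF."""
    return 1 + sum(k - 1 for k in levels_per_factor.values()) + extra_df


df = linear_model_df({"Center": 7, "Variety": 4}, extra_df=5)
print(df)   # 15 = 1 + 6 + 3 + 5
```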
Applications of D-opt. design - Multi-level qual. factors

D-optimal designs with 14, 15 and 16 runs were generated, 5 versions for each N:
- 14 runs (above): balanced with regard to Center
- 16 runs (below): balanced with regard to Variety
[Figure: allocation of runs over Center (C1-C7) and Variety (V1-V4) for the 14-run and 16-run designs]
Applications of D-opt. design - Combined design

D-optimal design is useful for combined designs of process and mixture factors.
Example: Bubble formation
- 2 process factors
- 4 mixture factors
- Response: lifetime of bubbles
Objective: screening. The selected model was one with linear and interaction terms:
- interactions allowed among the process factors
- interactions allowed between the process and mixture factors
- but no interactions among the mixture factors themselves
The necessary DF (= 20) are calculated as follows:
- 1 DF for the constant term
- 2 DF for the linear process terms
- 3 DF for the linear mixture terms
- 1 DF for the process*process interaction
- 6 (2*3) DF for the process*mixture interactions
- 2 DF for the relational constraint
- 5 extra DF
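The same tally, written out so each DF contribution can be traced. The dictionary keys are our own labels; the 2 DF for the relational constraint and the 5 extra DF are taken as stated above rather than derived.

```python
n_process, n_mixture = 2, 4

df_terms = {
    "constant": 1,
    "linear process": n_process,                             # 2
    "linear mixture": n_mixture - 1,                         # 3: the mixture sum removes one
    "process*process": n_process * (n_process - 1) // 2,     # 1
    "process*mixture": n_process * (n_mixture - 1),          # 6 (2*3)
    "relational constraint": 2,                              # as stated on the slide
    "extra": 5,
}
print(sum(df_terms.values()))   # 20
```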

Recommended procedure:
- find a lead number of design runs, N
- generate designs with N ± 4 runs
- make five alternative versions for each level of N ± 4 runs
We generated 45 alternative D-optimal designs (N = 16 to N = 24). Selected: N = 16, showing a G-efficiency of 76.1% and a condition number of 2.7. We have obtained a good design.
2 series of 4 replicates were added, making the entire design comprise 16 + 8 = 24 experiments.
Screening: span in lifetime 11-362 sec
Optimization: span in lifetime 6.02-22.28 min
Key to prolonging bubble lifetime: substantial increase of glycerol
Summary
We have discussed:
- when to use D-optimal design
- what D-optimal design is
- computational and geometrical aspects of the D-optimality criterion
- the condition number as an evaluation criterion of D-optimality
- the G-efficiency as an evaluation criterion of D-optimality
- applications of D-optimal design: model updating, multi-level qualitative factors, and combined designs of process and mixture factors
Additional Topics
Blocking the Experimental Plan
Contents
- Introduction to blocking
- When to use blocking
- Blocking in MODDE: block size, number of blocks, blockable designs, recoding of block factors
- Chemical example
- Summary
Introduction to blocking
Randomization is used as a safeguard against unwanted sources of extraneous systematic variability. When you cannot conduct all the experiments in a homogeneous way, randomizing your experiments may not be sufficient to deal with such variability. Blocking the experiments in synchronized groups may help to decrease the impact of such variability on the estimated effects of the factors.
When to use blocking

Suppose you are running a full factorial design in 5 factors and 32 runs, and the batch size of raw material permits 8 experiments per batch. You may then want to run your experiments in 4 blocks, each composed of 8 runs using homogeneous starting material. Orthogonal blocking makes it possible to divide the 32 experiments into 4 blocks of 8 runs such that the difference between the blocks (the raw material) does not affect the estimates of the factor effects.
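One way to obtain 4 orthogonal blocks of 8 runs is to define two blocking factors as products of design factors and assign blocks from their sign combinations. The generators B1 = A*B*C and B2 = C*D*E, the factor names A-E and the sign-to-block mapping below are assumptions for illustration; MODDE chooses its own generators. The check confirms that each block holds 8 runs and that every factor is balanced within each block, which is what keeps the block differences out of the main-effect estimates.

```python
from itertools import product

import numpy as np

# 2^5 full factorial (32 runs) in coded units; columns A..E are placeholder names
runs = np.array(list(product((-1, 1), repeat=5)))
A, B, C, D, E = runs.T

# Hypothetical blocking factors B1 = A*B*C and B2 = C*D*E
b1 = A * B * C
b2 = C * D * E

# Block number from the sign combination of the two blocking factors
block_of = {(-1, -1): 1, (1, -1): 2, (-1, 1): 3, (1, 1): 4}
blocks = np.array([block_of[(int(s1), int(s2))] for s1, s2 in zip(b1, b2)])

for blk in range(1, 5):
    in_block = runs[blocks == blk]
    # Expect 8 runs, and every factor column summing to zero within the block
    print(blk, len(in_block), in_block.sum(axis=0))
```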
Example: Blocking_Scr
With the $2^5$ design there are two options, with or without block interactions:
With block interactions:
Without block interactions:
Design region (the same with or without block interactions). Each block occurs twice in each corner cube.
Blocking in MODDE
MODDE supports orthogonal blocking for two-level full and fractional factorials, CCC, PB, and BB designs (note: CCF is not blockable!). MODDE also supports blocking of D-optimal designs, provided that the number of design runs is a multiple of the number of blocks (note: blocks in D-optimal designs are usually not orthogonal to the factors).
Blocking of full and fractional factorial designs

The maximum number of blocks is 8, with a minimum block size of 4. One blocking factor is used for 2 blocks, 2 for 4 blocks, and 3 for 8 blocks.
The block effects consist of the effects of the blocking factors and all their interactions. Hence with 8 blocks there are 7 block effects, consuming 7 DF.
Pseudo-resolution: the resolution of the design when all block effects (blocking factors and all their interactions) are treated as main effects, under the assumption that there are no interactions between blocks and main effects, or between blocks and interactions of main effects.
Blocking of other designs

- PB: can only be split into two blocks, by introducing one block factor and using its signs to split the design.
- CCC: each block must be a first-order orthogonal block. The design can be split into two blocks, the cube portion and the star portion, and the cube portion can sometimes be split into smaller blocks. Each block must have the same number of center-points.
- BB: BB3 is not blockable; BB4 can be split into 3 blocks, BB5-BB7 into 2 blocks; BB8 N/A.
- D-optimal design: blocks must have equal size, and the total number of runs must be a multiple of the number of blocks. Interactions between the block factor and other factors are disallowed.
Recoding the blocking factors

Blocks are assigned according to the combination of signs of the blocking factors. To generate 4 blocks the following recoding is done:
$B1:       -   +   -   +
$B2:       -   -   +   +
Block no:  1   2   3   4
Example: Blocking_RSM
Chemical example with objective to maximize yield. CCC design in two factors where cube and star portions were run at different time points.
How to specify the problem in MODDE
- CCC design, which is blockable
- 2 blocks
- equal number of center-points in each block
Excellent input data!
[Figures: Plot of Replications for Yield with Experiment Number labels, blocks B1 and B2; Histogram of Yield (Investigation: Blocking_RSM)]
Regression model building

Strong regression model with R2 = 0.98, Q2 = 0.95, model validity (MVal) = 0.99 and reproducibility (Rep) = 0.88.
[Figures: Summary of Fit for Yield; Scaled & Centered Coefficients for Yield (Extended) with the terms Tim, Temp, Tim*Tim, Temp*Temp, Tim*Temp, $Blo(B1), $Blo(B2), Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1) and Temp*$Blo(B2) (Investigation: Blocking_RSM, MLR). N=12, DF=3, R2=0.978, Q2=0.949, R2 Adj.=0.919, RSD=1.2524, Conf. lev.=0.95]
There is some evidence that slightly lower yields were obtained in the second block of six runs.
Use of model
Response surface plots visualise that higher yields were obtained in the first experimental campaign (when running the cube portion)
What we have learnt

Blocking introduces extra factors in the design; this reduces the residual DF and the design resolution. You should only block when the extraneous source of variability is large and cannot be controlled by randomizing the run order.
Additional Topics
Mixture Design
Contents
- Introduction to mixture design
- A working strategy for mixture design
- Example 1: Tablet formulation (regular experimental region)
- Example 2: Bubble formation - screening (irregular experimental region)
- Example 3: Bubble formation - optimization (irregular experimental region)
Introduction to mixture design

Example, Rocket Propellant: Three components were mixed together to form a rocket propellant. The purpose was to find a propellant with an elasticity above 2900.
Formulation factors: Binder (0.2-0.4), Oxidizer (0.4-0.6), Fuel (0.2-0.4)
What is the "problem" with the worksheet? Each row sums to 1.0!
Consequences for the design?
What does the mixture design look like? The experimental domain with 0-1 bounds on the factors takes the form of a triangle. Here we are investigating a limited region of the available experimental domain.
[Figure: triangular mixture domain with vertices Oxidiser, Fuel and Binder, showing the investigated subregion]
A quadratic model was used
[Figure: Scaled & Centered Coefficients for Elasticity (Investigation: Rocket, PLS, 2 components) with terms Bin, Oxi, Fue, Bin*Bin, Oxi*Oxi, Fue*Fue, Bin*Oxi, Bin*Fue and Oxi*Fue. N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071, Conf. lev.=0.95]
The model predicts an area in which an elasticity exceeding 2900 is found
Coefficients show that binder and fuel have the strongest impact on elasticity
We are able to quantitatively describe elasticity in terms of the three varied ingredients.
A Working Strategy for Mixture Design

1. Definition of factors and bounds
2. Selection of experimental objective and mixture model
3. Selection of candidate set
4. Generation of design
5. Evaluation of size and shape of mixture region
6. Definition of reference mixture
7. Execution of design
8. Analysis of data and evaluation of model
9. Visualization of modelling results
10. Use of model
Illustrations: Tablet preparation & Bubble formation
Tablet: - 1. Definition of factors and bounds

Aim: To investigate tablet preparation and find out which factors regulate the release rate of an active substance
Mixture factors:
- Cellulose (0 - 1)
- Lactose (0 - 1)
- Phosphate (0 - 1)
All factors sum to 100% (mixture constraint). The bounds are consistent.
Constraint:
No other extra constraint
Response:
Release rate of the active substance (to be maximized)
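Checking for consistency of bounds. Example:
$0.1 \le A \le 0.5$, $0.1 \le B \le 0.3$, $0.2 \le C \le 0.4$
These bounds are inconsistent: even with B and C at their upper bounds, A is at least $1 - 0.3 - 0.4 = 0.3$, so the lower bound of A can never be reached. After a simple arithmetic check (done automatically in the software) the new bounds become:
$0.3 \le A \le 0.5$, $0.1 \le B \le 0.3$, $0.2 \le C \le 0.4$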
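The arithmetic check can be written down directly: each lower bound is raised to 1 minus the sum of the other upper bounds, and each upper bound is lowered to 1 minus the sum of the other lower bounds. A minimal sketch, assuming a single pass (which is enough for this example); the function name is hypothetical and the rounding only hides floating-point noise.

```python
def consistent_bounds(lower, upper):
    """Tighten mixture bounds so that every bound can actually be reached.

    For component i:  L_i' = max(L_i, 1 - sum of the other upper bounds)
                      U_i' = min(U_i, 1 - sum of the other lower bounds)
    """
    total_lower, total_upper = sum(lower.values()), sum(upper.values())
    new_lower, new_upper = {}, {}
    for name in lower:
        others_upper = total_upper - upper[name]
        others_lower = total_lower - lower[name]
        new_lower[name] = round(max(lower[name], 1 - others_upper), 10)
        new_upper[name] = round(min(upper[name], 1 - others_lower), 10)
    return new_lower, new_upper


lo, hi = consistent_bounds({"A": 0.1, "B": 0.1, "C": 0.2},
                           {"A": 0.5, "B": 0.3, "C": 0.4})
print(lo)   # {'A': 0.3, 'B': 0.1, 'C': 0.2}
print(hi)   # {'A': 0.5, 'B': 0.3, 'C': 0.4}
```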
Tablet: - 2. Selection of experimental objective and mixture model

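Experimental objective: Optimization
Mixture model: Quadratic
$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{23} x_{MF2} x_{MF3} + \varepsilon$
Cox model type, with constraints imposed on the regression coefficients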
Tablet: - 3. Selection of candidate set

The candidate set is the pool of theoretically possible and meaningful experiments from which the actual design is selected. Here, the candidate set is small:
- 3 extreme vertices
- 3 centers of edges
- 3 interior points
- 1 overall centroid
Undesired experiments may be deleted from the candidate set prior to generation of the design.
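The ten candidate points for the three-component tablet region can be listed explicitly. The placement of the three interior points, halfway between the overall centroid and each vertex, is an assumption (the slide only says "3 interior points"), as is the component order; exact fractions keep every blend summing to 1.

```python
from fractions import Fraction as F


def tablet_candidate_set():
    """Ten candidate blends (cellulose, lactose, phosphate) for a regular 3-component region."""
    vertices = [(F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1))]
    edge_centers = [(F(1, 2), F(1, 2), F(0)),
                    (F(1, 2), F(0), F(1, 2)),
                    (F(0), F(1, 2), F(1, 2))]
    # Assumption: interior points halfway between the centroid and each vertex
    interior = [(F(2, 3), F(1, 6), F(1, 6)),
                (F(1, 6), F(2, 3), F(1, 6)),
                (F(1, 6), F(1, 6), F(2, 3))]
    centroid = [(F(1, 3), F(1, 3), F(1, 3))]
    return vertices + edge_centers + interior + centroid


candidates = tablet_candidate_set()
print(len(candidates))                        # 10
print(all(sum(c) == 1 for c in candidates))   # True: every blend sums to 1
```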
Tablet: - 4. Generation of design

The design should contain experiments which are informative and map the experimental region as well as possible. In this case the experimental region is regular, and then the Simplex Centroid design is applicable.
Tablet: - 5. Evaluation of size and shape of mixture region

Introduction to regular mixture regions
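[Figure: regular three-component mixture region, a simplex with vertices A (1/0/0), B (0/1/0) and C (0/0/1), edge centers 0.5/0.5/0, 0.5/0/0.5 and 0/0.5/0.5, and the overall centroid with each component at 1/3; inset: the two-component case $X_1 + X_2 = 1$]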

In MODDE: Show/Design Region. Example: Bubbles (see more information later).
A useful approach to understand how and where the experiments are laid out.
[Figure: design region plots at Glycerol = 0.0, 0.1 and 0.2]

Alternative designs for a regular region (the choice of model will be important):
[Figure: designs for a linear, a quadratic and a special cubic model]
Tablet: - 6. Definition of reference mixture

The reference mixture is used to anchor the mathematical model:
- easy to find for regular regions (overall centroid)
- strongly irregular regions require an efficient algorithm to find the overall centroid
- serves the same function as the center-point does in process design
[Figure: simplex with vertices A (1/0/0), B (0/1/0), C (0/0/1), edge centers and the overall centroid]
Tablet preparation: 1/3, 1/3, 1/3
Tablet: - 7. Execution of design

It is important to carry out the experiments in random order. This converts any systematic time trend into unimportant, unsystematic random variation.
Tablet: - 8. Analysis of data and evaluation of model

Analysis of data with PLS
[Figures: Summary of Fit (R2, Q2) for release; normal probability plot of standardized residuals for release with Experiment Number labels (Investigation: Waaler_rsm, PLS, 3 components). N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170]
Exp #10 is a probable outlier and should be re-tested. If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.
Tablet: - 9. Visualization of modelling results

[Figures: regression coefficients, i.e. Scaled & Centered Coefficients for release (Investigation: Waaler_rsm, PLS, 3 components) with terms la, la*la, ce, ce*ce, ce*la, ph*ph, ce*ph, ph and la*ph (N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Conf. lev.=0.95); tri-linear contour plot of release]
Tablet: - 10. Use of model

Use of verifying experiments
Pred no  cellulose  lactose  phosphate  release (obs)  release (pred)  Lower  Upper
1        0.32       0        0.68       --             363             322    404
2        0.5        0.125    0.375      370            293             262    324
3        0.333      0        0.667      340            363             322    405
4        0.667      0        0.333      345            320             278    361
The model predicts well except for the blend 0.5/0.125/0.375, where the observed release (370) lies above the prediction interval (262-324). This strange experiment should be repeated, and the model possibly updated with this new information.
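Checking the verifying experiments amounts to asking whether each observed release falls inside its prediction interval. A minimal sketch using the three measured blends from the table (the helper name is hypothetical):

```python
# Verifying experiments with a measured response:
# (blend cellulose/lactose/phosphate, observed, predicted, lower, upper)
verifications = [
    ("0.5/0.125/0.375", 370, 293, 262, 324),
    ("0.333/0/0.667", 340, 363, 322, 405),
    ("0.667/0/0.333", 345, 320, 278, 361),
]


def outside_interval(rows):
    """Blends whose observed release falls outside the prediction interval."""
    return [blend for blend, obs, _pred, lower, upper in rows
            if not lower <= obs <= upper]


print(outside_interval(verifications))   # ['0.5/0.125/0.375']
```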
BubbleScr: - 1. Definition of factors and bounds

Aim: To investigate bubble formation and find out which factors dominate bubble lifetime
Process factors:
- Temperature (7-21 °C; refrigerator/kitchen temperature)
- Time (1 - 13 - 25 h)
Mixture factors:
- Dish-washing liquid 1, SKONA, ICA (0 - 0.4)
- Dish-washing liquid 2, NEUTRAL, ADACO (0 - 0.4)
- Tap water, Umeå (0.4 - 0.8)
- Glycerol, APOTEKETS (15% water content; 0.0 - 0.2)
Constraint:
$0.2 \le \mathrm{DWL1} + \mathrm{DWL2} \le 0.5$
Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
BubbleScr: - 4. Generation of design

Best design with N = 16 (Geff = 76%, CondNo = 2.7). 2 series of 4 replicates were added, giving 24 runs.
BubbleScr: - 8. Analysis of data and evaluation of model

PLS analysis
[Figures: Plot of Replications for Lifetime~ with Experiment Number labels; Summary of Fit; normal probability plot of residuals for Lifetime~ with Experiment Number labels (Investigation: Bubb_scr, PLS, 2 components). N=24, DF=18, Cond. no.=2.1537, Y-miss=0, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018]
BubbleScr: - 9. Visualization of modelling results

[Figures: Scaled & Centered Coefficients for Lifetime~ (Investigation: Bubb_scr, PLS, 2 components) with terms Ti, DW1, DW2, Gly, Te and Wa (N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Conf. lev.=0.95); response plot at Glycerol = 0.2, Temp = 14, Time = 13]
Reference mixture 0.2 / 0.2 / 0.5 / 0.1
BubbleScr: - 10. Use of model

MODDE optimizer was used to propose two verifying experiments
Temp  Time  DWL1  DWL2  Water  Glycerol  Lifetime  Lower    Upper
7     25    0.2   0.2   0.3    0.3       570.196   300.513  1081.893
7     49    0.4   0     0.3    0.3       1243.664  456.962  3384.745
Verifying experiment #1: Temp = 7, Time = 25, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Resp 1 = 1120 sec (18 min 40 sec)
BubbleOpt: - 1. Definition of factors and bounds

Verifying experiment #1 was used to adjust the bounds of the four mixture factors.
Process factors:
- Temperature kept constant (+7 °C)
- Time kept constant at 25 h
Mixture factors:
- Dish-washing liquid 1, SKONA, ICA (0.1 - 0.3)
- Dish-washing liquid 2, NEUTRAL, ADACO (0.1 - 0.3)
- Tap water, Umeå (0.2 - 0.4)
- Glycerol, APOTEKETS (15% water content; 0.2 - 0.4)
Constraint: $0.3 \le \mathrm{DWL1} + \mathrm{DWL2} \le 0.5$
Response: Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).
BubbleOpt: - 4. Generation of design

Selected design with 24 (20 + 4) runs: Geff = 83%, CondNo = 16.8
BubbleOpt: - 8. Analysis of data and evaluation of model

PLS analysis
[Figures: Plot of Replications for Lifetime~ with Experiment Number labels; Summary of Fit; normal probability plot of residuals for Lifetime~ with Experiment Number labels (Investigation: Bubb_rsm, PLS, 2 components). N=24, DF=14, Cond. no.=12.3206, Y-miss=0, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358]
BubbleOpt: - 9. Visualization of modelling results

[Figures: Scaled & Centered Coefficients for Lifetime~ (Investigation: Bubb_rsm, PLS, 2 components) with terms Gly*Gly, Gly, DW1*Gly, DW1*DW1, DW2*DW2, DW1*DW2, DW2*Gly, DW1*Wa, DW2*Wa, Wa*Gly, DW1, DW2, Wa and Wa*Wa (N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Conf. lev.=0.95); response plot at Glycerol = 0.4]
BubbleOpt: - 10. Use of model

Raw Data Plot
[Figure: Log (Lifetime) versus ingredient Cost, with Experiment Number labels]
Ingredient cost is easy to take into consideration.

Lowest ingredient cost with long-lasting bubbles
Conclusions, Bubble example

The sequence 1) screening, 2) RSM is very fruitful for rational experimental work. We were able to increase bubble lifetime to 6.02-22.28 min. The key to success was to increase glycerol substantially.
Long-lasting bubbles are obtained with:
- a cooled solution
- 25 h settling time (not popular with kids)
- the formulation DWL1 = 0.23, DWL2 = 0.1, Water = 0.27, Glycerol = 0.4
- a red plastic bubble wand
Additional Topics
Other RSM designs for regular experimental region
Contents
- Introduction
- Three-level full factorial designs
- Box-Behnken designs
- Comparison of Composite, Three-level factorial, and Box-Behnken designs
Introduction
Composite designs are commonly used in optimization
We will now discuss two additional design families namely

(i) Three-level full factorial designs (ii) Box-Behnken designs
Three-level full factorial designs

Three-level full factorial designs are extensions of the two-level full factorials. The geometry of the $3^2$ and $3^3$ designs is displayed. Runs = $3^k$, where k = number of factors: 9, 27, 81, 243, ... With k = 4 or higher this design family is not used to any great extent. Observe that the $3^2$ design is equivalent to the CCF design in two factors.
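Generating the $3^k$ grid makes both the run counts and the $3^2$/CCF equivalence easy to verify. This sketch compares distinct design points only, so replicated center-points are ignored; the helper names are our own.

```python
from itertools import product


def three_level_full_factorial(k):
    """All 3**k combinations of the coded levels -1, 0, +1."""
    return set(product((-1, 0, 1), repeat=k))


def ccf_two_factors():
    """CCF in two factors: 4 corners, 4 face-centred star points, 1 center-point."""
    corners = {(a, b) for a in (-1, 1) for b in (-1, 1)}
    stars = {(-1, 0), (1, 0), (0, -1), (0, 1)}
    return corners | stars | {(0, 0)}


print([len(three_level_full_factorial(k)) for k in range(2, 6)])  # [9, 27, 81, 243]
print(three_level_full_factorial(2) == ccf_two_factors())          # True
```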
Three-level full factorial designs

Geometry of the $3^4$ and $3^5$ designs
[Figure: $3^4$ and $3^5$ full factorial designs drawn as arrays of cubes in x1-x5 (Investigation: itdoe_testingofdesigns, Design: Full Fac (3 levels))]
81 and 243 experiments!
Box-Behnken designs
A family of designs employing three levels per varied factor. BB designs are useful if experimenting in the corners is unwanted. Mostly, BB designs are used when investigating three or four factors.
Summary
An overview of the number of experiments required by composite, three-level full factorial, and Box-Behnken designs, for 2-5 factors:
# Factors     2       3        4        5
CCC/CCF       8 + 3   14 + 3   24 + 3   26 + 3
Three-level   9 + 3   27 + 3   81 + 3   243 + 3
Box-Behnken   --      12 + 3   24 + 3   40 + 3
Overall, the CCC and CCF designs are the most economical. Some parsimony is provided by the BB designs in three and four factors as well, but with five factors the BB design is not an optimal choice. The big drawback of the three-level full factorial designs is the rapidly increasing number of experiments.
Additional Topics
Multi-level qualitative factors
Contents
- Introduction
- Example: Cotton cultivation
- Regression modelling of multi-level qualitative factors
- Interpretation of regression models: regression coefficient plot, interaction plot
Introduction
Example: Multilevel qualitative factors
Factor A is a qualitative factor with four levels, factor B a qualitative factor with three settings, and factor C a quantitative factor changing between -1 and +1. Selected objective: screening, with a linear model. The full factorial design in 24 experiments is not the best choice; a D-optimal design (open set or filled set) is a better alternative.
[Figure: design space with Factor A (Level 1-4), Factor B (Sett 1-3) and Factor C (-1/+1)]
Example - Cotton cultivation
Regression analysis - Coefficient plot

- The data support an interaction model.
- The coefficient plot has 39 bars: V 4 bars, C 7 bars, V*C 28 bars.
- The terms form groups which cannot be split.
- Center has more impact than Variety.
[Figure: Scaled & Centered Coefficients for Yield (Extended) (Investigation: Yates, MLR) with terms V(V1)-V(V4), C(C1)-C(C7) and all V*C interactions V(V1)*C(C1) to V(V4)*C(C7). N=28, DF=0, Conf. lev.=0.95]
Regression analysis - Interaction plot

In the case of multi-level qualitative factors, the interaction plot is especially informative. The best possible combination of factors is Variety #4 and Center #4.
[Figure: Interaction Plot for V*C, response Yield (Investigation: Yates, MLR): Yield versus Center (C1-C7), one line per Variety V1-V4. N=28, DF=0]
Regression coding of qualitative variables

Qualitative variables require a special form of coding for regression analysis to work properly. A qualitative factor with k levels will have k-1 expanded terms in the model calculations.
Expanded term   Level of factor: V1   V2   V3   V4
V(V2)                            -1    1    0    0
V(V3)                            -1    0    1    0
V(V4)                            -1    0    0    1
Regular and extended lists of coefficients

Regular coefficients for Yield:
Constant -0.25
V(V2) -2.75, V(V3) -5.036, V(V4) 7.821; Sum 0.035
C(C2) 4, C(C3) -19.5, C(C4) 56.5, C(C5) -30.75, C(C6) 7.25, C(C7) 14; Sum 31.5
Extended coefficients for Yield:
Constant -0.25
V (DF = 3): V(V1) -0.035, V(V2) -2.75, V(V3) -5.036, V(V4) 7.821; Sum 0
C (DF = 6): C(C1) -31.5, C(C2) 4, C(C3) -19.5, C(C4) 56.5, C(C5) -30.75, C(C6) 7.25, C(C7) 14; Sum 0
The additional extended term (here V(V1) and C(C1)) equals the negative sum of the other expanded terms, so all extended coefficients of a qualitative factor sum to zero.
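The coding and the extended coefficients can be reproduced in a few lines. The sketch assumes the first level is the one coded -1 in every column, as in the coding table for Variety; `expand_qualitative` and `extended` are hypothetical helpers, and the inputs are the regular coefficients listed above.

```python
import numpy as np


def expand_qualitative(values, levels):
    """Code a k-level qualitative factor as k-1 expanded columns.

    The first level is coded -1 in every column; any other level j is +1
    in its own column and 0 elsewhere.
    """
    cols = np.zeros((len(values), len(levels) - 1))
    for i, v in enumerate(values):
        j = levels.index(v)
        if j == 0:
            cols[i, :] = -1
        else:
            cols[i, j - 1] = 1
    return cols


def extended(regular):
    """Prepend the first-level coefficient as minus the sum of the regular ones."""
    return [-sum(regular)] + list(regular)


variety = ["V1", "V2", "V3", "V4"]
print(expand_qualitative(variety, variety))
# rows V1..V4: [-1, -1, -1], [1, 0, 0], [0, 1, 0], [0, 0, 1]

v_ext = extended([-2.75, -5.036, 7.821])                 # Variety
c_ext = extended([4, -19.5, 56.5, -30.75, 7.25, 14])     # Center
print([round(c, 3) for c in v_ext])   # [-0.035, -2.75, -5.036, 7.821]
print(c_ext[0])                       # -31.5
```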
Generation of designs with multi-level qualitative factors

With two-level qualitative factors, standard two-level factorial designs apply. With multi-level qualitative factors, a D-optimal design is a more sensible choice.
Important: balancing
[Figure: design with Factor A (Level 1-4), a three-setting factor (Sett 1-3) and Factor C (-1/+1)]
Summary
- The interaction plot is an informative tool in regression modelling.
- Expansion of qualitative factors in regression modelling gives regular and extended mode coefficient plots.
- Multi-level qualitative factors are well handled with D-optimal design.
Additional Topics
The Taguchi approach to robust design
Contents
- The Taguchi approach to robust design
- Inner and outer arrays of factors
- Classical analysis approach
- Interaction analysis approach
- Examples: CakeMix, DrugD, LoafVolume
- Studying of expensive and inexpensive factors
Robust design vs. Robustness testing

In the Taguchi approach, robustness has a different connotation and objective:
- the objective is to find conditions where the responses simultaneously have values close to target and low variability
- factors are often varied in large intervals, and with designs very different from those used in robustness testing
- factors are varied in inner and outer arrays (often many runs)
In robustness testing, small factor intervals are usually used.
The Taguchi approach - Three phases

Product design
Quality is measured in terms of the loss suffered by society caused by product variability around a specified target (the loss function). A desirable product is one for which the total loss is acceptably small. This phase specifies an acceptable region within which the final product design can lie.
Parameter design
Equivalent to using DOE for finding optimal settings of the process variables.
Tolerance design
Takes place once optimal factor settings have been specified. If the variability in product quality is unacceptably high, the tolerances on the factors are further adjusted. This is accomplished using a mathematical model of the process and the loss function belonging to the product property of interest.
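The loss function named in the product-design phase is not written out on the slide. A common way to express Taguchi's loss is quadratic, L(y) = k(y - T)^2, with target T and a cost constant k; that form, the function names and the numbers below are assumptions for illustration. Averaged over a batch, this loss splits exactly into k(ybar - T)^2 + k*s^2 (with s^2 using an n denominator), which is why the approach asks for a mean near target and low variability at the same time.

```python
from statistics import fmean, pvariance

def quadratic_loss(y: float, target: float, k: float = 1.0) -> float:
    """Loss for one unit under the assumed quadratic loss function."""
    return k * (y - target) ** 2

def expected_quadratic_loss(ys, target, k=1.0):
    """Average loss over a batch of measured units."""
    return fmean(quadratic_loss(y, target, k) for y in ys)

def loss_decomposition(ys, target, k=1.0):
    """Split the average loss into an off-target part and a spread part."""
    off_target = k * (fmean(ys) - target) ** 2
    spread = k * pvariance(ys)  # population variance (n denominator)
    return off_target, spread

# Hypothetical loaf volumes (cm3) against a 530 cm3 target.
batch = [528.0, 531.0, 535.0, 526.0]
print(expected_quadratic_loss(batch, 530.0))
print(loss_decomposition(batch, 530.0))
```

The two parts always add up to the average loss, so reducing either the bias or the spread lowers the loss.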
Arranging factors in inner and outer arrays

Design factors are easy to control and affect the mean; they form the inner array. Noise factors are hard to control and may or may not affect the process mean and the spread around it; they comprise the outer array. Example: CakeMix.
[Figure: CakeMix inner array, a cube in Flour (200-400), Shortening (50-100) and Eggpowder (50-100), with an outer-array square in Temp (175-225) and Time (30-50) at each inner-array point]
CakeMix application
The inner and outer array system requires many experiments; for CakeMix, 11 x 5 = 55 experiments. The experimental goal was to find levels of the three ingredients producing a good cake
(a) when the noise factors temperature and time were set correctly according to the instructions on the box, and (b) when deviations from these specifications occur.
In this kind of testing, the producer has to consider worst-case scenarios corresponding to what the consumer might do with the product, and let these considerations determine the low and high levels of the noise factors.
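A minimal sketch of how the CakeMix crossed array reaches 11 x 5 = 55 runs: every inner-array recipe is repeated at every outer-array noise setting. The factor levels and run order are read from the combined CakeMix table; the function and constant names are hypothetical.

```python
from itertools import product

# Inner array (design factors): 2^3 full factorial plus three centre points.
INNER_LEVELS = {"Flour": (200, 400), "Shortening": (50, 100), "Eggpowder": (50, 100)}
CENTRE = {"Flour": 300, "Shortening": 75, "Eggpowder": 75}
# Outer array (noise factors): four Temp/Time corners plus a centre point.
OUTER = [(175, 30), (225, 30), (175, 50), (225, 50), (200, 40)]

def inner_array(n_centre=3):
    """Inner-array recipes in standard order (Flour varies fastest)."""
    runs = [
        {"Flour": fl, "Shortening": sh, "Eggpowder": egg}
        for egg, sh, fl in product(INNER_LEVELS["Eggpowder"],
                                   INNER_LEVELS["Shortening"],
                                   INNER_LEVELS["Flour"])
    ]
    runs.extend(dict(CENTRE) for _ in range(n_centre))
    return runs

def crossed_array(inner, outer):
    """Repeat every inner-array recipe at every outer-array noise setting."""
    return [{**recipe, "Temp": temp, "Time": time}
            for temp, time in outer
            for recipe in inner]

design = crossed_array(inner_array(), OUTER)
print(len(design))  # 55
print(design[0], design[11])  # same recipe at two noise settings
```

The run order matches the combined table: runs 1-11 at (175, 30), runs 12-22 at (225, 30), and so on.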
The classical analysis approach

For each experimental point in the inner array, two responses are formed: the average of the five outer-array experiments (Taste) and the standard deviation of the five outer-array experiments (StDev).
The classical analysis approach

Questions: Which factors affect only the variation (StDev)? Which factors affect only the mean level (Taste)? And which affect both responses? Note that with this approach, there will be no model terms related to the noise factors. Standard deviation responses tend to be non-normally distributed, and log-transformation is common practice.
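The two classical responses can be computed directly from the CakeMix data. This sketch assumes that LogStD is the base-10 logarithm of the sample standard deviation (n - 1 denominator) of the five outer-array Taste values; MODDE's exact definition may differ. Under that assumption, recipe 6 (Flour 400, Shortening 50, Eggpowder 100) has the highest average Taste.

```python
from math import log10
from statistics import fmean, stdev

# Taste for each inner-array recipe at the outer-array settings
# (Temp, Time) = (175, 30), (225, 30), (175, 50), (225, 50), (200, 40),
# transcribed from the combined CakeMix table (runs i, i+11, i+22, i+33, i+44).
TASTE = {
    1: (1.1, 5.7, 6.4, 1.3, 3.1),    # Flour 200, Shortening 50, Eggpowder 50
    2: (3.8, 4.9, 4.3, 2.1, 3.2),    # 400, 50, 50
    3: (3.7, 5.1, 6.7, 2.9, 5.3),    # 200, 100, 50
    4: (4.5, 6.4, 5.8, 5.2, 4.1),    # 400, 100, 50
    5: (4.2, 6.8, 6.5, 3.5, 5.9),    # 200, 50, 100
    6: (5.0, 6.0, 5.9, 5.7, 6.9),    # 400, 50, 100
    7: (3.1, 6.3, 6.4, 3.0, 3.0),    # 200, 100, 100
    8: (3.9, 5.5, 5.0, 5.4, 4.5),    # 400, 100, 100
    9: (3.5, 5.15, 4.3, 4.1, 6.6),   # centre point 300, 75, 75
    10: (3.4, 5.3, 4.05, 3.8, 6.5),  # centre point
    11: (3.4, 5.4, 4.1, 3.8, 6.7),   # centre point
}

def classical_responses(taste_by_run):
    """Return {run: (mean Taste, log10 of the sample standard deviation)}."""
    return {run: (fmean(v), log10(stdev(v))) for run, v in taste_by_run.items()}

summary = classical_responses(TASTE)
for run, (mean, log_sd) in summary.items():
    print(f"run {run:2d}: Taste = {mean:.2f}  LogStD = {log_sd:+.3f}")
```

Each inner-array run collapses to one (Taste, LogStD) pair, which is why the noise factors themselves disappear from the classical model.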

Replicate plots show no outliers. The responses are inversely correlated, and run #6 appears promising.
[Figures (CakeTaguchi_classical): Plot of Replications for Taste and for LogStD, and Raw Data Plot of LogStD versus Taste, all with experiment number labels]
Modelling results, interaction model

Negative Q2 for StDev; two non-significant two-factor interactions.
[Figures (CakeTaguchi_classical, MLR): Summary of Fit for Taste and LogStD; Scaled & Centered Coefficients for Taste and for LogStD, terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg]
Taste: N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95
LogStD: N=11, DF=4, R2=0.959, Q2=-0.284, R2 Adj.=0.898, RSD=0.0540, Conf. lev.=0.95
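Q2 is usually defined as 1 - PRESS/SS, where PRESS sums the squared errors made when each run is predicted by a model fitted without it, and SS is the total sum of squares around the mean. A negative Q2, as for StDev above, therefore means the model predicts left-out runs worse than simply using the average. The sketch assumes leave-one-out cross-validation for MLR and uses the leverage shortcut e_i/(1 - h_ii); MODDE's cross-validation scheme may differ, and the random response is hypothetical.

```python
from itertools import product
import numpy as np

def r2_q2(X, y):
    """R2 and leave-one-out Q2 for an MLR model y = X b (X includes the constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_tot = np.sum((y - y.mean()) ** 2)
    # Leverages: diagonal of the hat matrix H = X pinv(X).
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    # Leave-one-out prediction residuals without refitting: e_i / (1 - h_ii).
    press = np.sum((resid / (1.0 - h)) ** 2)
    return float(1.0 - np.sum(resid ** 2) / ss_tot), float(1.0 - press / ss_tot)

def q2_by_refitting(X, y):
    """Same Q2, computed the slow way by refitting the model N times."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ b) ** 2
    return float(1.0 - press / ss_tot)

# 2^3 factorial plus three centre points with an interaction model
# (7 terms, DF = 4), the same layout as the CakeMix inner array in coded units.
corners = np.array(list(product([-1.0, 1.0], repeat=3)))
D = np.vstack([corners, np.zeros((3, 3))])
x1, x2, x3 = D.T
X = np.column_stack([np.ones(len(D)), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

rng = np.random.default_rng(0)
y_noise = rng.normal(size=len(D))  # a response with no real factor effects
print(r2_q2(X, y_noise))  # R2 can look acceptable while Q2 is low or negative
```

Because 0 <= h_ii < 1, every leave-one-out residual is at least as large as the ordinary residual, so Q2 can never exceed R2.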
Modelling results, refined model

The model for StDev has improved. The Sh*Egg interaction is much smaller for StDev than for Taste. Flour causes most of the spread around the average.
[Figures (CakeTaguchi_classical, MLR): Summary of Fit; Scaled & Centered Coefficients for Taste and for LogStD, terms Fl, Sh, Egg, Sh*Egg]
Taste: N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95
LogStD: N=11, DF=6, R2=0.939, Q2=0.677, R2 Adj.=0.899, RSD=0.0538, Conf. lev.=0.95
Interpretation of refined model

The interaction term is more important for Taste than for StDev: the two lines cross each other in the interaction plot for Taste, but not in the interaction plot for StDev. Both plots indicate that a low level of Shortening and a high level of Eggpowder are favorable for high Taste and low StDev.
Interpretation of refined model

Response contour plots are useful for interpretation.
The best cake mix conditions are found in the upper left-hand corner: Flour = 400 g, Shortening = 50 g, and Eggpowder = 100 g.
Limitations of classical analysis approach

Which noise factors are important?
There are no noise factors in the model!
For the Taguchi method to be really successful, one needs to be able to estimate the impact of the noise factors and of possible interactions between the design factors and the noise factors. The existence of such noise-design factor interactions is crucial; otherwise the noise (variability) cannot be reduced by changing some of the design factors.
The interaction analysis approach

Information about important noise-design factor interactions can be extracted if the inner and outer arrays are combined into one single design. Expectation: what were design factor effects on StDev in the classical approach now correspond to noise-design factor cross-terms.
Combined design (Flour, Shortening and Eggpowder in g):

No   Flour  Shortening  Eggpowder  Temp  Time  Taste
 1   200     50          50        175   30    1.1
 2   400     50          50        175   30    3.8
 3   200    100          50        175   30    3.7
 4   400    100          50        175   30    4.5
 5   200     50         100        175   30    4.2
 6   400     50         100        175   30    5
 7   200    100         100        175   30    3.1
 8   400    100         100        175   30    3.9
 9   300     75          75        175   30    3.5
10   300     75          75        175   30    3.4
11   300     75          75        175   30    3.4
12   200     50          50        225   30    5.7
13   400     50          50        225   30    4.9
14   200    100          50        225   30    5.1
15   400    100          50        225   30    6.4
16   200     50         100        225   30    6.8
17   400     50         100        225   30    6
18   200    100         100        225   30    6.3
19   400    100         100        225   30    5.5
20   300     75          75        225   30    5.15
21   300     75          75        225   30    5.3
22   300     75          75        225   30    5.4
23   200     50          50        175   50    6.4
24   400     50          50        175   50    4.3
25   200    100          50        175   50    6.7
26   400    100          50        175   50    5.8
27   200     50         100        175   50    6.5
28   400     50         100        175   50    5.9
29   200    100         100        175   50    6.4
30   400    100         100        175   50    5
31   300     75          75        175   50    4.3
32   300     75          75        175   50    4.05
33   300     75          75        175   50    4.1
34   200     50          50        225   50    1.3
35   400     50          50        225   50    2.1
36   200    100          50        225   50    2.9
37   400    100          50        225   50    5.2
38   200     50         100        225   50    3.5
39   400     50         100        225   50    5.7
40   200    100         100        225   50    3
41   400    100         100        225   50    5.4
42   300     75          75        225   50    4.1
43   300     75          75        225   50    3.8
44   300     75          75        225   50    3.8
45   200     50          50        200   40    3.1
46   400     50          50        200   40    3.2
47   200    100          50        200   40    5.3
48   400    100          50        200   40    4.1
49   200     50         100        200   40    5.9
50   400     50         100        200   40    6.9
51   200    100         100        200   40    3
52   400    100         100        200   40    4.5
53   300     75          75        200   40    6.6
54   300     75          75        200   40    6.5
55   300     75          75        200   40    6.7

No strong noise-design factor interaction!
[Figures (CakeTaguchi_interaction, MLR): Summary of Fit for Taste; N-probability plot for Taste with experiment number labels; Scaled & Centered Coefficients for Taste, terms Fl, Sh, Egg, Te, Ti and all ten two-factor interactions]
N=55, DF=39, Cond. no.=1.3110, Y-miss=0
R2=0.605, Q2=0.185, R2 Adj.=0.453, RSD=1.0545, Conf. lev.=0.95

Six terms were removed and a three-factor interaction was added, revealing an important three-factor interaction, Fl*Te*Ti!
[Figures (CakeTaguchi_interaction, MLR): Summary of Fit (R2, Q2, model validity, reproducibility) for Taste; Scaled & Centered Coefficients for Taste, terms Fl, Sh, Egg, Te, Ti, Fl*Te, Fl*Ti, Sh*Egg, Te*Ti and Fl*Te*Ti]
N=55, DF=44, Cond. no.=1.3110, Y-miss=0
Interpretation of the three-factor interaction

By setting Flour to 400 g, the spread in Taste due to variations in Temperature and Time is minimized.
[Figure (CakeTaguchi_interaction, MLR): Interaction Plot for Fl*Te*Ti, response Taste; one line for each combination of Te (low/high) and Ti (low/high), plotted against Flour from 200 to 400; large variation at low Flour, small variation at high Flour]
N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751
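The three-factor interaction can be checked against the raw data. This sketch averages Taste over Shortening and Eggpowder at each Temp/Time corner, separately for Flour = 200 and Flour = 400, using values transcribed from the combined table (centre points left out). These are raw cell means, not MODDE model predictions, so they only approximate the interaction plot; the function names are hypothetical.

```python
from statistics import fmean

# Taste at the four Temp/Time corners (175, 30), (225, 30), (175, 50), (225, 50)
# for the eight corner recipes (Flour, Shortening, Eggpowder) of the inner array.
TASTE_BY_RECIPE = {
    (200, 50, 50): (1.1, 5.7, 6.4, 1.3),
    (400, 50, 50): (3.8, 4.9, 4.3, 2.1),
    (200, 100, 50): (3.7, 5.1, 6.7, 2.9),
    (400, 100, 50): (4.5, 6.4, 5.8, 5.2),
    (200, 50, 100): (4.2, 6.8, 6.5, 3.5),
    (400, 50, 100): (5.0, 6.0, 5.9, 5.7),
    (200, 100, 100): (3.1, 6.3, 6.4, 3.0),
    (400, 100, 100): (3.9, 5.5, 5.0, 5.4),
}

def noise_cell_means(flour):
    """Mean Taste at each Temp/Time corner over the recipes with this Flour level."""
    rows = [v for (fl, _, _), v in TASTE_BY_RECIPE.items() if fl == flour]
    return [fmean(col) for col in zip(*rows)]

def noise_range(flour):
    """Spread (max - min) of the corner means caused by Temp and Time."""
    means = noise_cell_means(flour)
    return max(means) - min(means)

for flour in (200, 400):
    print(flour, [round(m, 3) for m in noise_cell_means(flour)], round(noise_range(flour), 3))
```

With these data the spread across the four Temp/Time corners is 3.825 at Flour = 200 and 1.4 at Flour = 400, in line with the conclusion above.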
Response contour plots

The flatness at Flour = 400 g indicates sufficient robustness towards consumers not following the baking instructions.
(Contour plots at Sh = 50, Egg = 100.)
A second example - DrugD

Classical analysis approach
Interaction analysis approach
DrugD - The classical analysis approach

Some model refinement was necessary. After refinement: a strong model for OneHour, and no model for log SD1h (i.e., it is robust). All factors except Volume influence the average release.
[Figures (DrugD - classical, MLR): Summary of Fit for OneHour and SD1h~ before refinement (N=27, DF=12, Cond. no.=6.6122, Y-miss=0) and after refinement (N=27, DF=18, Cond. no.=5.9888, Y-miss=0)]
DrugD - Graphical evaluation

Flatness of the response surface: the difference between the highest and lowest measured values is only 4.1%, so OneHour is robust.
(Contour plot at Temp = 39, PropSpeed = 100.)
DrugD - The interaction analysis approach

A partially quadratic model was fitted, with R2 = 0.79 and Q2 = 0.74. The N-plot and ANOVA indicate model validity.
[Figures (DrugD - interaction): Plot of Replications for OneHour with experiment number labels; Histogram of OneHour (bins from 30.00 to 35.85); Summary of Fit (R2, Q2, model validity, reproducibility); N-probability plot for OneHour with experiment number labels]
N=162, DF=147, Cond. no.=6.6122, Y-miss=0
R2=0.791, Q2=0.745, R2 Adj.=0.771, RSD=0.5867
DrugD - Modelling results from interaction analysis

Bath B2 gives slightly higher numerical values than the other baths. The effect of B2 is weak and must not be over-interpreted (right-hand plot). Recognizing this small term is important for further fine-tuning of the experimental equipment.
[Figures (DrugD - interaction, MLR): Scaled & Centered Coefficients for OneHour (Extended), on a wide axis scale (left) and a narrow axis scale (right); terms Vol, Vol*Vol, Ba(B1)-Ba(B6), Te, Te*Te, PrS, PrS*PrS, pH, pH*pH, Vol*pH]
N=162, DF=147, R2=0.791, Q2=0.745, R2 Adj.=0.771, RSD=0.5867, Conf. lev.=0.95
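The regular versus extended coefficient lists for the bath factor can be illustrated with one common way of expanding a multi-level qualitative factor: sum-to-zero (effect) coding, in which a k-level factor enters the model as k - 1 columns and the coefficient of the remaining level is minus the sum of the others. The DrugD fit (N=162, DF=147, so 15 regular terms) is consistent with six baths entering as five columns, but whether MODDE uses exactly this coding is an assumption. The bath offsets and function names below are hypothetical.

```python
import numpy as np

def effect_code(levels, categories):
    """Sum-to-zero coding of a qualitative factor: k levels -> k-1 columns.

    Level j (j < k) gets +1 in column j; the last level gets -1 in every column.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    Z = np.zeros((len(levels), k - 1))
    for row, level in enumerate(levels):
        i = index[level]
        if i < k - 1:
            Z[row, i] = 1.0
        else:
            Z[row, :] = -1.0
    return Z

def extended_coefficients(regular):
    """Append the implied coefficient of the last level so all k levels are shown."""
    return np.append(regular, -np.sum(regular))

# Hypothetical example: six baths, each used four times, with bath offsets
# that sum to zero (B2 slightly high).
baths = [f"B{i}" for i in range(1, 7)]
true_offsets = np.array([-0.1, 0.4, -0.1, 0.0, -0.1, -0.1])
levels = baths * 4
y = 33.0 + true_offsets[[baths.index(b) for b in levels]]

X = np.column_stack([np.ones(len(levels)), effect_code(levels, baths)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("regular: ", coef[1:].round(3))                          # five bath terms
print("extended:", extended_coefficients(coef[1:]).round(3))   # all six baths
```

The extended list adds no information to the model; it only displays the level that the coding left implicit.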
A third example: LoafVolume

Investigation of which factors affect loaf volume; target volume = 530 cm3.
Inner array: the recipe, a mixture of three wheat flours (Tjalve, Folke, Hard RS).
Outer array: baking conditions, which vary from bakery to bakery: mixing time of the dough and proofing time of the dough.
LoafVolume - Classical analysis approach

PLS is used for the mixture data. There is a strong model for volume and a weaker model for StDev. Folke and HardRS affect volume.
[Figure (Loafvol2, PLS, 2 components): Summary of Fit (R2, Q2) for loafvolume and stdev]
Model interpretation
Is it possible to get a volume of 530 cm3 and minimize the spread?
The arrow shows the best compromise.
LoafVolume - Interaction analysis approach

PLS is used. Proofing time is important.
[Figures (Loafvolume, PLS, 2 components): Summary of Fit (R2, Q2) for loafvolume; Score Scatter plots t[1] vs u[1] and t[2] vs u[2]; Scaled & Centered Coefficients for loafvolume (cm3) for the flours Tj, Fo, Ha and the baking conditions Pr, Mi, with their squares and interactions]
N=90, DF=75, Cond. no.=8.2742, Y-miss=0
R2=0.894, Q2=0.754, R2 Adj.=0.874, RSD=22.6934, Conf. lev.=0.95
Model interpretation
Volume is sensitive to changes in proofing time.
With a short proofing time, the goal is not obtainable.
Model interpretation at compromise point

What does the contour plot look like at the mixture Tjalve 0.25, Folke 0.11, HardRS 0.64?
Volume is sensitive (= not robust) to changes in proofing and mixing time, which was also discovered in the classical analysis approach.
An additional element of robust design

Sometimes it is more appropriate to distinguish between factors that are expensive and inexpensive to vary. Example, drilling:
Expensive: drill features (diameter, length, geometry)
Cheap: machine conditions (cutting speed, feed rate, cooling yes/no)
Inner array: expensive factors. Outer array: cheap factors. 17 experiments per drill!
Summary
We have discussed
the Taguchi approach to robust design
the concept of inner and outer arrays of factors
the classical analysis approach
the interaction analysis approach
how to handle robust design testing when some factors are expensive and some inexpensive to vary
Additional Topics
Simultaneous optimization of several responses fitted with different models
Contents
Background
Example
Data analysis
Linked response
Simultaneous optimization
Background
When is there an interest in fitting different models to different responses? When working with many responses that fall into groups:
A PLS model fitted to grouped responses tends to have many components and to be difficult to interpret.
Selectivity among responses: a tailor-made model for each response may facilitate optimization toward a factor combination where selectivity among responses is obtained.
Example: TruckEngine
Create one investigation for each response.
Data Analysis: TruckEngine

Fit a unique model to each response (Q2 maximization).
[Figures: Scaled & Centered Coefficients for Fuel (TruckE_Fuel), NOx (TruckE_NOx) and Soot~ (TruckE_Soot), all MLR]
Fuel (mg/st): terms Air, NL, EGR, Air*Air, NL*NL, Air*NL; N=17, DF=10, R2=0.985, Q2=0.959, R2 Adj.=0.976, RSD=1.9080, Conf. lev.=0.95
NOx (mg/s): terms Air, NL, EGR, Air*Air, NL*NL, EGR*EGR, EGR*NL; N=17, DF=9, R2=0.997, Q2=0.987, R2 Adj.=0.995, RSD=0.4624, Conf. lev.=0.95
Soot~ (mg/s): terms Air, NL, EGR; N=17, DF=13, R2=0.945, Q2=0.917, R2 Adj.=0.932, RSD=0.1188, Conf. lev.=0.95
Linking responses into a MODDE investigation

The investigation dealing with Fuel was chosen as the reference project. Define a new response (NOx) and find its root investigation.

Select the response(s) of interest, first NOx.
Then repeat the whole procedure for Soot.

Resulting worksheet: all settings regarding the responses, plus the coefficients of the tailor-made models, are brought into the new base project.
Optimization results
Simplex #5 was the most successful.
Optimization results
Bring the optimization results into response contour plots or a SweetSpot plot.
(Plots at Air = 240.)
Summary
Linked responses can be used when the responses are not correlated. This means that one may, for example, use PLS in a mother project for the analysis of a group of correlated responses, and then attach (link) another response and its MLR model coefficients prior to optimization.
Flexibility/selectivity: outliers are more easily eliminated, and different models (PLS/MLR) can be used for different responses.
Requirement: the same basic worksheet.
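The linking idea can be sketched as follows: every response keeps its own term list, but all models are evaluated at the same factor settings, which is why the same basic worksheet is required. The term lists mirror the TruckEngine slide, but the worksheet, the coefficients and the names `TERMS`, `fit_linked` and `predict_all` are hypothetical; this is not MODDE's implementation.

```python
from itertools import product
import numpy as np

FACTORS = ("Air", "NL", "EGR")
# One tailor-made term list per response (coefficients below are synthetic).
TERMS = {
    "Fuel": ["Air", "NL", "EGR", "Air*Air", "NL*NL", "Air*NL"],
    "NOx": ["Air", "NL", "EGR", "Air*Air", "NL*NL", "EGR*EGR", "EGR*NL"],
    "Soot": ["Air", "NL", "EGR"],
}

def term_value(term, point):
    """Evaluate a term such as 'Air' or 'EGR*NL' at a point in coded units."""
    value = 1.0
    for name in term.split("*"):
        value *= point[name]
    return value

def model_matrix(points, terms):
    """Constant column plus one column per model term."""
    return np.array([[1.0] + [term_value(t, p) for t in terms] for p in points])

def fit_linked(points, responses):
    """Fit every response with its own term list on the same worksheet."""
    return {
        name: np.linalg.lstsq(model_matrix(points, TERMS[name]), y, rcond=None)[0]
        for name, y in responses.items()
    }

def predict_all(coefs, point):
    """Predict all linked responses at one factor setting."""
    return {name: float(model_matrix([point], TERMS[name])[0] @ b)
            for name, b in coefs.items()}

# Shared worksheet: a 3^3 grid in coded units (hypothetical, not the real 17 runs).
worksheet = [dict(zip(FACTORS, lv)) for lv in product((-1.0, 0.0, 1.0), repeat=3)]
rng = np.random.default_rng(1)
true_coefs = {name: rng.normal(size=len(terms) + 1) for name, terms in TERMS.items()}
responses = {name: model_matrix(worksheet, TERMS[name]) @ b
             for name, b in true_coefs.items()}

coefs = fit_linked(worksheet, responses)
print(predict_all(coefs, {"Air": 0.5, "NL": -0.2, "EGR": 0.1}))
```

An optimizer then only needs `predict_all`; the "Simplex #5" result suggests that the MODDE optimizer runs several simplex searches from different starting points, but any search method could sit on top of this function.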
Additional Topics
Partial Least Squares Projections to Latent Structures, PLS
Contents
Introduction to PLS
Geometrical interpretation of PLS
LOWARP example
When to use PLS

PLS is a pertinent choice if (i) there are several correlated responses in the data set, (ii) the experimental design has a high condition number (>10), or (iii) there are small amounts of missing data in the response matrix.
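MODDE reports a condition number for each design, and the rule above flags values over 10. This is a sketch assuming the condition number is the ratio of the largest to the smallest singular value of the model matrix in coded units. MODDE may scale or centre the matrix differently, so the values will not necessarily reproduce those on the slides. The distorted design is hypothetical.

```python
from itertools import product
import numpy as np

def condition_number(X):
    """Largest / smallest singular value of the model matrix."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(s[0] / s[-1])

# Orthogonal 2^2 factorial, model: constant + A + B + A*B in coded units.
a, b = np.array(list(product([-1.0, 1.0], repeat=2))).T
X_orth = np.column_stack([np.ones(4), a, b, a * b])

# Hypothetical distorted design: B nearly follows A, as can happen with
# constrained or irregular experimental regions.
b_skew = np.array([-1.0, -0.9, 0.9, 1.0])
X_skew = np.column_stack([np.ones(4), a, b_skew])

for name, X in (("orthogonal", X_orth), ("distorted", X_skew)):
    print(name, round(condition_number(X), 2))
```

An orthogonal design with equal column lengths gives the ideal value 1; the nearly collinear design lands well above 10. For the default 2-norm, `np.linalg.cond(X)` returns the same ratio.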
Notation
K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components
T = matrix of X-scores with columns t1, ..., tA
P = matrix of X-loadings with columns p1, ..., pA
W = matrix of PLS X-weights with columns w1, ..., wA
U = matrix of Y-scores with columns u1, ..., uA
C = matrix of PLS Y-weights with columns c1, ..., cA
Scaling of variables
[Figure: variables x1, x2, x3 shown with their measured values and "lengths", and after unit variance scaling]
Defining/selecting the length of the variable axes (X- and Y-spaces). Recommended: set each axis to unit length (unit variance scaling).
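Unit variance scaling, together with the mean-centering described for PLS, amounts to subtracting each column's mean and dividing by its standard deviation. A minimal sketch with hypothetical data; the N - 1 denominator is an assumption, and new observations must be transformed with the training-set means and standard deviations.

```python
import numpy as np

def autoscale(data):
    """Centre each column and scale it to unit variance.

    Also returns the column means and standard deviations so that new
    observations can be transformed in exactly the same way.
    """
    mean = data.mean(axis=0)
    sd = data.std(axis=0, ddof=1)  # assumption: N - 1 denominator
    return (data - mean) / sd, mean, sd

# Hypothetical raw data: three variables on very different scales.
raw = np.array([
    [3.0, 200.0, 0.5],
    [5.0, 400.0, 0.7],
    [4.0, 300.0, 0.9],
    [6.0, 100.0, 0.6],
])
scaled, mean, sd = autoscale(raw)
new_obs = (np.array([4.5, 250.0, 0.8]) - mean) / sd  # reuse training mean/sd
print(scaled.round(3))
print(new_obs.round(3))
```

After scaling, the second variable no longer dominates merely because it is measured in larger numbers.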
PLS -- Geometric Interpretation, 1

[Figure: N observations in X (factors/predictors, K=3, axes x1-x3) and in Y (responses, M=3, axes y1-y3)]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K = M = 3). Each X- and Y-variable has one coordinate axis, with its length defined by its scaling, typically unit variance.
Each observation is represented by one point in the X-space and one in the Y-space. As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems.

[Figure: the same observation shown as one point in X-space (x1-x3) and one point in Y-space (y1-y3)]
The mean-centering procedure implies that the origins of the coordinate systems are repositioned.

[Figure: first PLS component, Comp 1 (t1) in X-space and Comp 1 (u1) in Y-space, with the projection of observation i]
The first PLS component is a line in X-space and a line in Y-space, calculated to (a) approximate the point swarms in X and Y well, and (b) maximize the covariance between the projections (t1 and u1). These lines pass through the average points.
PLS -- Geometric Interpretation, 5
The projection coordinates, t1 and u1, in the two spaces, X and Y, are connected and correlated through the inner relation $u_{i1} = t_{i1} + h_i$ (where $h_i$ is a residual). The slope of the dotted line is 1.0.

[Figure: Comp 1 (t1, u1) and Comp 2 (t2, u2) in X- and Y-space]
The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also passing through the average points. These lines, t2 and u2, improve the approximation and correlation as much as possible.
The second pair of projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables. By inserting the X-values of a new observation into the model, we obtain its t1 and t2 scores, which through the inner relation give values of u1 and u2; these, in turn, enable predicted values of Y to be computed.
PLS predictions
A new observation is similar to the training set if it lies inside the tolerance cylinder in X-space. Its projection on the X-model (t) can then be entered into the T-U relation, giving a u-value for each model dimension. These values define a point on the Y-space model, which, in turn, corresponds to a predicted value for each y-variable.

[Figure: the two PLS components, Comp 1 and Comp 2, spanning planes in X- and Y-space]
The PLS components form planes in the X- and Y-spaces. The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.

Repeated plotting of successive pairs of latent variables gives a good appreciation of the correlation structure.
PLS, Overview
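$\mathbf{X} = \mathbf{1}\bar{\mathbf{x}}' + \mathbf{T}\mathbf{P}' + \mathbf{E}$
$\mathbf{Y} = \mathbf{1}\bar{\mathbf{y}}' + \mathbf{U}\mathbf{C}' + \mathbf{F} = \mathbf{1}\bar{\mathbf{y}}' + \mathbf{T}\mathbf{C}' + \mathbf{G}$ (because $\mathbf{U} = \mathbf{T} + \mathbf{H}$, the inner relation)
PLS: a projection of X that both approximates X well and correlates with Y.
Differences to PCA: PCA is a projection of X that is an optimal approximation of X (least squares fit).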
PLS, Parameter properties

For each component:
1) t are linear combinations of X with weights w; t is a summary of the X variables that are correlated with Y.
2) u are linear combinations of Y with weights c; u is a summary of the Y variables.
3) w are the correlation coefficients between the x's and u; columns of X highly correlated with Y are given high weights.
4) At convergence, for orthogonality: p is computed so that t*p' is the best approximation of X, and t*p' is removed from X before the next component.
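The four properties can be traced in a compact NIPALS sketch of PLS. Assumptions: X and Y are already centred (and usually UV-scaled), Y is deflated along with X, and convergence is checked with a simple relative tolerance; MODDE and SIMCA may differ in normalization and stopping details. The coefficients B = W(P'W)^-1 C' express the model in terms of the original X variables, so a new centred observation x gives the prediction x B plus the Y means. The function name and data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_comp, tol=1e-12, max_iter=500):
    """PLS by NIPALS on centred X (N x K) and Y (N x M).

    Returns T, U, W, P, C and coefficients B such that Yhat = X @ B.
    """
    X, Y = X.copy(), Y.copy()
    N, K = X.shape
    M = Y.shape[1]
    T, U = np.zeros((N, n_comp)), np.zeros((N, n_comp))
    W, P = np.zeros((K, n_comp)), np.zeros((K, n_comp))
    C = np.zeros((M, n_comp))
    for a in range(n_comp):
        u = Y[:, np.argmax(Y.var(axis=0))].copy()  # start from the most variable y
        for _ in range(max_iter):
            w = X.T @ u / (u @ u)          # 3) weights: x's against u
            w /= np.linalg.norm(w)
            t = X @ w                      # 1) X-scores: combination of X with w
            c = Y.T @ t / (t @ t)          # Y-weights
            u_new = Y @ c / (c @ c)        # 2) Y-scores: combination of Y with c
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t @ t)              # 4) loadings: t p' best approximates X
        X -= np.outer(t, p)                # remove t p' before the next component
        Y -= np.outer(t, c)
        T[:, a], U[:, a], W[:, a], P[:, a], C[:, a] = t, u, w, p, c
    B = W @ np.linalg.solve(P.T @ W, C.T)  # K x M regression coefficients
    return T, U, W, P, C, B

# Hypothetical data: 20 runs, 4 factors, 2 responses.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(20, 4))
coef_true = np.array([[1.0, 0.5], [-2.0, 0.0], [0.5, 1.0], [0.0, -1.0]])
Y_raw = X_raw @ coef_true + 0.1 * rng.normal(size=(20, 2))
x_mean, y_mean = X_raw.mean(axis=0), Y_raw.mean(axis=0)
T, U, W, P, C, B = pls_nipals(X_raw - x_mean, Y_raw - y_mean, n_comp=2)

x_new = rng.normal(size=4)
y_pred = (x_new - x_mean) @ B + y_mean  # prediction for a new observation
print(y_pred)
```

With as many components as X has independent columns, PLS reproduces the MLR (least squares) solution; fewer components give the stabilized, projected model discussed above.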
The LOWARP application

Production of a polymer. Four factors (ingredients) were varied according to a 17-run mixture design, and 14 responses were measured. The desired combination was low warp/shrinkage and high strength.
LOWARP worksheet
The worksheet contains some missing data and many correlated responses.
PLS model interpretation - R2/Q2 & scores

After three components, R2 = 0.75 and Q2 = 0.53.
[Figures (Lowarp, PLS, 3 components): PLS Total Summary (cumulative R2 and Q2 for Comp1-Comp3); Score Scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels]
N=17, DF=13, Cond. no.=2.0457, Y-miss=10
PLS model interpretation - Loadings
[Figure (lowarp, PLS, 3 components): Loading Scatter wc[1] vs wc[2], showing the factors gl, mi, cr, am together with the responses w1-w8 and st1-st6]
PLS model interpretation - Loadings & Coefficients

Coefficient profiles for correlated and uncorrelated responses
[Figure (Lowarp, PLS, 3 components): Loading Scatter wc[1] vs wc[2]; N=17, DF=13, Cond. no.=2.0457, Y-miss=10]
PLS model interpretation - Coefficients & VIP

Variable importance in the projection (VIP) is the most condensed way of expressing variable-related information.
[Figures (Lowarp, PLS, 3 components): Loading Scatter wc[1] vs wc[2] and wc[2] vs wc[3]; Variable Importance Plot (VIP); N=17, DF=13, Cond. no.=2.0457, Y-miss=10]
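One common formulation of VIP weights each component's normalized X-weights by the Y sum of squares that the component explains: VIP_k = sqrt(K * sum_a SS_a (w_ka/||w_a||)^2 / sum_a SS_a), with SS_a = (c_a'c_a)(t_a't_a). A consequence is that the squared VIP values always sum to K, so VIP > 1 marks a variable with above-average importance. This is a sketch of that formulation, not necessarily MODDE's exact computation; the random arrays only mimic the LOWARP dimensions (4 factors, 17 runs, 14 responses, 3 components).

```python
import numpy as np

def vip(W, T, C):
    """VIP for each X-variable.

    W: K x A X-weights, T: N x A X-scores, C: M x A Y-weights.
    """
    K = W.shape[0]
    ss = (C ** 2).sum(axis=0) * (T ** 2).sum(axis=0)  # Y sum of squares explained per component
    w_norm = W / np.linalg.norm(W, axis=0)            # unit-length weight vectors
    return np.sqrt(K * ((w_norm ** 2) @ ss) / ss.sum())

# Random arrays with LOWARP-like shapes: 4 factors, 17 runs, 14 responses, 3 components.
rng = np.random.default_rng(3)
W = rng.normal(size=(4, 3))
T = rng.normal(size=(17, 3))
C = rng.normal(size=(14, 3))
v = vip(W, T, C)
print(v.round(3), (v ** 2).sum())
```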
Summary PLS
PLS is a multivariate regression method that is useful for handling complex DOE problems. PLS is especially useful when:
(i) there are several correlated responses in the data set
(ii) the experimental design has a high condition number
(iii) there are small amounts of missing data in the response matrix
PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them. All diagnostic tools available for MLR are retained for PLS. In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP.
Additional Topics
Design in Latent Variables
Contents
Introduction: what is design in latent variables?
Multivariate characterization
Selecting informative molecules: COST vs DOE
SMD: increasing reliability of model and data
Example: lead finding and lead optimization
Example: Onion, and an overview of design families
FDs and FFDs; D-optimal design; cell-based and grid-based design; space-filling design; Onion design principles
Onion design: three examples
Summary
Introduction
In QSAR, the central idea is to develop a model based on a small training set and to calculate predictions for large numbers of non-tested compounds. This means that the few chemicals in the training set should be representative and have a balanced distribution. How do we accomplish this?
Multivariate characterization: the data matrix is examined by PCA.
Principal properties (PP): few and orthogonal.
Statistical molecular design (SMD) in the principal properties.
Compounds are selected by matching their PP scores to the chosen design.
Multivariate Characterization: an important step in SMD
A way to quantify qualitative, discrete changes. The chemical descriptors must account for the dominant properties of the compounds, i.e., the principal properties that are known or anticipated to influence biological activity.
Properties such as hydrophobicity, steric properties (size), electronic properties, and (chemical) reactivity.
PCA of the multi-property matrix gives the (latent) principal properties in terms of the principal component scores.
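The principal properties are the PCA scores of the UV-scaled descriptor matrix. A minimal sketch via the singular value decomposition, with a hypothetical random descriptor matrix; R2X for each component is the fraction of the scaled sum of squares it explains.

```python
import numpy as np

def principal_properties(descriptors, n_comp):
    """PCA of the UV-scaled descriptor matrix via SVD.

    Returns scores t (the principal properties), loadings p and R2X per component.
    """
    Z = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]     # t = Z p
    loadings = Vt[:n_comp].T                # one column per component
    r2x = s[:n_comp] ** 2 / np.sum(s ** 2)  # explained fraction of scaled X
    return scores, loadings, r2x

# Hypothetical descriptor matrix: 30 compounds x 6 correlated descriptors.
rng = np.random.default_rng(7)
descriptors = rng.normal(size=(30, 6)) @ rng.normal(size=(6, 6))
t, p, r2x = principal_properties(descriptors, n_comp=3)
print(r2x.round(3))
```

The score columns are mutually orthogonal, which is what makes them convenient design variables for SMD.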
COST versus Statistical Molecular Design
The COST approach: along the vertical line, A is held constant while B is varied; along the horizontal line, B is kept constant while A is varied.
The design approach: both factors A and B are varied simultaneously. This results in a better and more efficient mapping of the modelled response.
Introduction: The intuitive approach - COST

Chemical map of 60 haloalkanes, with the trace of COST-ing. Problem: limited range of applicability and reliability.
[Figure (ESREVID.M9, PCA, work set): score plot t[1]/t[2] of the 60 haloalkanes with Hotelling T2 ellipse (0.05), annotated with Density and Mw/log P]
Increasing reliability of model & data: SMD

[Figure: trace of the COST approach in the structural factor space]
Increasing reliability of model & data: SMD

SMD efficiently fills the space: few points, much information.
[Figure: SMD selection compared with the results from long COST-ing]
SMD: Factorial and fractional factorial designs
Two-level factorial and fractional factorial designs with centre points are useful in QSAR modelling.
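A 2^(4-1) fractional factorial with two centre points is a design of the kind described above for four principal properties. This sketch assumes the generator D = ABC, one standard choice; in SMD each row is then a target region in PP space that has to be matched by a real compound. The function name is hypothetical.

```python
from itertools import product
import numpy as np

def fractional_factorial_2_4_1(n_centre=2):
    """2^(4-1) design in coded units, generator D = ABC, plus centre points.

    Rows are in standard order (A varies fastest).
    """
    base = np.array([(a, b, c) for c, b, a in product((-1.0, 1.0), repeat=3)])
    d = base.prod(axis=1, keepdims=True)  # D = A*B*C
    design = np.hstack([base, d])
    return np.vstack([design, np.zeros((n_centre, 4))])

design = fractional_factorial_2_4_1()
print(design)
```

The eight corner rows are mutually balanced (every column is orthogonal to the others), so four design variables are spanned with only eight compounds plus the centre points.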
Selecting the Training Set

Compound                   DV1  DV2  DV3  DV4    PP1    PP2    PP3    PP4
[52] CH3-CH2-CH2-CH2Br      -    -    -    -    -0.72  -1.26  -1.29  -0.51
[48] CH3-CHCl-CH3           +    -    -    +     1.96  -0.86  -0.81   0.15
[33] CH3-CHBr2              -    +    -    +    -1.77   1.22  -0.14  -0.08
[30] CH3-CH2Br              +    +    -    -     1.20   0.89  -0.90  -0.12
[15] CHCl2-CHCl2            -    -    +    +    -1.69  -0.83   0.92   0.70
[07] CCl3F                  +    -    +    -     1.14  -0.40   0.95  -1.10
[39] CBr3F                  -    +    +    -    -3.20   1.68   1.07  -1.90
[02] CH2Cl2                 +    +    +    +     1.92   0.79   0.13   0.70
[3] CHCl3                   0    0    0    0     0.52   0.22   0.70   0.48
[11] CH2Cl-CH2Cl            0    0    0    0     0.56   0.03   0.28   1.54
Balanced coverage of score plot!
Example of SMD: Surfactants

The 8 lipophilic surfactants were excluded, and an updated PC model was computed: R2X = 0.76, Q2 = 0.55, A = 3.
[Figure (Surfact2.M2, PCA of subset, work set): score plot t[1]/t[2] with Hotelling T2 ellipse (0.05)]
Selected surfactants: 2, 5, 8, 9, 11, 30, 31, 33, 37, 38.
Lead finding and lead optimization with SMD

A desired pharmacological profile based on five biological tests was specified. Eight commercially available compounds (Sub1-Sub8) were tested in the five biological tests.
[Figure: chemical structures of compounds 1-8]
Lead finding and lead optimization with SMD

PCA (R2 = 0.71) gives:
[Figures (aldrich.M2, PCA-X overview model): score plot t[1]/t[2], and loading plot p[1]/p[2] showing Test1-Test5, the Target profile and Sub1-Sub8]
Substances 1 and 2 are promising as leads; Sub2 would be the natural first choice.
There is some redundancy among the five tests. Three tests would be sufficient in the future, e.g., tests 2, 3 and 4.
Substituent scales for aromatic substituents

t1 t2 -0.11 -0.68 -0.75 -0.03 -0.62 -0.18 -0.50 -0.24 -0.29 -0.24 -0.31 -0.26 -0.05 -0.08 -0.53 -0.25 -0.83 -0.11 -0.09 -0.20 -0.44 -0.33 -0.18 -0.45 -0.36 -0.86 -0.16 -0.54 -0.05 -0.59 -0.50 -0.45 -0.20 -0.13 -0.63 -0.06 t3 -0.04 -0.55 -0.18 -0.10 -0.28 -0.06 -0.66 -0.14 -0.06 -0.41 -0.32 -0.60 -0.60 -0.49 -0.50 -0.46 -0.35 -0.19 -0.19 -0.21 -0.24 -0.31 -0.63 -0.25 -0.67 -0.33 -0.45 -0.30 -0.53 -0.30 -0.59 -0.64 -0.62 -0.47 -1.00 -0.22 C CC6H5 NHCOC6H5 -1 +1 -1 F H OH SH NH2 NHNH2 NHCN CH2Cl NHCHO NHCONH2 OCH3 CH2OH SCH3 NHCH3 C CH CH2CN CH=CH2 NHCOCH3 CH2CH3 OCH2CH3 CH2OCH3 SC2H5 NHC2H5 NHCOOC2H5 OCHMe2 i-C4H9 +1 +1 -1 CH2CH2COOH OC3H7 SC3H7 NHC3H7 OC4H9 NHC4H9 N=CHC6H5
Suppose we select Sub2 as our lead Convention: 'OH' = pos 1, 'orto-Cl' pos 2, 'para-Cl' pos 3; Quinoline scaffold not varied Substituent descriptors (principal properties) taken from Skagerberg et al. (QSAR 8 (1989), 32-38
-1 -1 -1 Cl NO NO2 N3 SO2NH2 OCF3 CN NCS SCN CHO COOH CONH2 CH=NOH NHCSNH2 SOCH3 OSO2CH3 SO2CH3 NHSO2CH3 NHCOCF3 CH=CHNO2 COCH3 SCOCH3 OCOCH3 COCH3 CONHCH3 SO2C2H5 +1 -1 -1 N=CCl2 COOC2H5 CH=CHCOCH3 COOC3H7 N=NC6H5 OSO2C6H5 NHSO2C6H5 OCOC6H5 CHNC6H5 CH2OC6H5 -0.59 -0.77 -0.68 -0.25 -0.54 -0.30 -0.57 -0.32 -0.40 -0.64 -0.53 -0.50 -0.27 -0.21 -0.48 -0.39 -0.43 -0.31 -0.32 -0.27 -0.46 -0.12 -0.29 -0.23 -0.25 -0.15 0.06 0.10 0.10 0.38 0.80 0.78 0.71 0.74 0.74 0.83
t1 1.00 0.88 -0.88 -1.00 -0.83 -0.57 -0.72 -0.57 -0.55 -0.43 -0.41 -0.17 -0.45 -0.52 -0.29 -0.45 -0.33 -0.44 -0.29 -0.19 -0.32 -0.19 -0.25 -0.02 -0.09 -0.07 -0.12 -0.03 0.09 0.14 0.28 0.23 0.43 0.51 0.82
t2 -0.26 -0.07 0.17 0.61 0.55 0.05 0.80 0.63 0.13 0.18 0.04 0.20 0.40 0.28 0.15 1.00 0.02 0.17 0.20 0.06 0.47 0.40 0.20 0.08 0.77 0.18 0.45 0.25 0.19 0.33 0.09 0.74 0.34 0.74 0.29
t3 -0.13 -0.35 -0.36 -0.48 -0.52 -0.10 -0.51 -0.41 -0.25 -0.16 -0.65 -0.72 -0.36 -0.55 -0.06 -0.26 -0.45 -0.34 -0.13 -0.68 -0.01 -0.52 -0.64 -0.06 -0.36 -0.06 -0.14 -0.14 -0.56 -0.40 -0.02 -0.35 -0.35 -0.32 -0.97
t1 -1 -1 +1 Br SOOF SF5 I CF3 SCF3 SOOCF3 CF2CF3 PMe2 COC3H7 +1 -1 +1 2-Thienyl SOOC6H5 COC6H5 -1 +1 +1 CH2Br CH2I CH3 NMe2 Cyclo-Pr CHMe2 C3H7 t-C4H9 CH2C6H5 +1 +1 +1 s-C4H9 n-C4H9 C5H11 C6H5 OC6H5 NHC6H5 CycloHex -0.48 -0.63 -0.28 -0.30 -0.61 -0.15 -0.41 -0.35 -0.24 -0.11 0.26 0.22 0.04 -0.33 -0.15 -0.64 -0.34 -0.24 -0.17 -0.04 -0.10 -0.04 0.06 0.28 0.56 0.36 0.07 0.13 0.46
t2 -0.20 -0.95 -0.81 -0.22 -0.40 -0.36 -1.00 -0.44 -0.10 -0.60 -0.03 -0.84 -0.50 0.16 0.17 0.52 0.79 0.32 0.23 0.43 0.19 0.37 0.07 0.42 0.38 0.01 0.08 0.51 0.24
t3 0.06 0.09 0.35 0.22 0.33 0.18 0.28 0.46 0.39 0.47 0.12 0.18 0.84 0.10 0.25 0.00 0.12 0.26 0.66 0.00 0.95 1.00 0.50 0.02 0.04 0.17 0.71 0.64 0.63
180
Selection of representative substituents

Select one representative for each substituent category. Avoiding the most peculiar substituents, a candidate list might be the following:
t1   t2   t3   Substituent
-1   -1   -1   -NO2
+1   -1   -1   -COOC2H5
-1   +1   -1   -H
+1   +1   -1   -OC3H7
-1   -1   +1   -I (or Br)
+1   -1   +1   -COC6H5
-1   +1   +1   -CH(CH3)2
+1   +1   +1   -C6H5
181
Construction of lead-centered multivariate design

2^(9-5) FFD in 16 runs
Position 1: columns a, b, abcd; Position 2: columns c, abc, bcd; Position 3: columns d, acd, abd
CompNo   a   b  abcd   c  abc  bcd   d  acd  abd
1       -1  -1   1    -1  -1   -1   -1  -1   -1
2        1  -1  -1    -1   1   -1   -1   1    1
3       -1   1  -1    -1   1    1   -1  -1    1
4        1   1   1    -1  -1    1   -1   1   -1
5       -1  -1  -1     1   1    1   -1   1   -1
6        1  -1   1     1  -1    1   -1  -1    1
7       -1   1   1     1  -1   -1   -1   1    1
8        1   1  -1     1   1   -1   -1  -1   -1
9       -1  -1  -1    -1  -1    1    1   1    1
10       1  -1   1    -1   1    1    1  -1   -1
11      -1   1   1    -1   1   -1    1   1   -1
12       1   1  -1    -1  -1   -1    1  -1    1
13      -1  -1   1     1   1   -1    1  -1    1
14       1  -1  -1     1  -1   -1    1   1   -1
15      -1   1  -1     1  -1    1    1  -1   -1
16       1   1   1     1   1    1    1   1    1
Columns x1-x3 represent substituent position 1, columns x4-x6 position 2, and columns x7-x9 position 3. The proposed molecular structures should be checked with the synthetic chemists.
CompNo  Position 1  Position 2  Position 3
1       I           NO2         NO2
2       COOC2H5     H           CH(CH3)2
3       H           CH(CH3)2    I
4       C6H5        I           H
5       NO2         C6H5        H
6       COC6H5      COC6H5      I
7       CH(CH3)2    COOC2H5     CH(CH3)2
8       OC3H7       OC3H7       NO2
9       NO2         I           C6H5
10      COC6H5      CH(CH3)2    COOC2H5
11      CH(CH3)2    H           OC3H7
12      OC3H7       NO2         COC6H5
13      I           OC3H7       COC6H5
14      COOC2H5     COOC2H5     OC3H7
15      H           COC6H5      COOC2H5
16      C6H5        C6H5        C6H5
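The lead-centred design can be regenerated in a few lines of Python. This is a minimal sketch that assumes two things. First, the generators are read off the column headings (abcd, abc, bcd, acd, abd). Second, the sign-to-substituent mapping is the one in the candidate list. The function names are hypothetical. The sketch builds the full 2^4 design in a, b, c, d with a varying fastest, forms the generated columns as products, and looks up one substituent per position.

```python
from itertools import product

# Sign pattern (t1, t2, t3) -> representative substituent, from the candidate list
SUBSTITUENT = {
    (-1, -1, -1): "NO2",
    (+1, -1, -1): "COOC2H5",
    (-1, +1, -1): "H",
    (+1, +1, -1): "OC3H7",
    (-1, -1, +1): "I",
    (+1, -1, +1): "COC6H5",
    (-1, +1, +1): "CH(CH3)2",
    (+1, +1, +1): "C6H5",
}

def ffd_2_9_5():
    """16-run 2^(9-5) design: a full 2^4 design in a, b, c, d (a varies
    fastest) plus the generated columns abcd, abc, bcd, acd and abd."""
    runs = []
    for d, c, b, a in product((-1, 1), repeat=4):
        runs.append({
            "pos1": (a, b, a * b * c * d),      # x1-x3
            "pos2": (c, a * b * c, b * c * d),  # x4-x6
            "pos3": (d, a * c * d, a * b * d),  # x7-x9
        })
    return runs

def compounds():
    """One (position 1, position 2, position 3) substituent triple per run."""
    return [tuple(SUBSTITUENT[run[p]] for p in ("pos1", "pos2", "pos3"))
            for run in ffd_2_9_5()]

for comp_no, triple in enumerate(compounds(), start=1):
    print(comp_no, *triple)
```

Replacing entries in `SUBSTITUENT` (e.g. Br instead of I) changes the proposed compounds without touching the design itself.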
182
Overview of designs often used in SMD

- Factorial and fractional factorial design
- D-optimal design (illustrated with example)
- Cluster-based design (illustrated with example)
- Cell-based & grid-based design
- Space-filling design
- Onion (Russian doll) design
- Random complement
- + combinations thereof
183
Example: Onion
Data set from AZ, Lund, Bosse Nordén
N = 1107 compounds, K = 115 variables
Objective: select 80 diverse and representative compounds.
First PCA score plot: R2X = 0.50 (A = 2); R2X = 0.75 (A = 6)
[Figure: onion_I.M1 (PCA-X), score plot t[1]/t[2] for A = 2]
184
Factorial or fractional factorial design

Two-level FDs and FFDs are often used with small data sets. To get 80 compounds we need to use at least 6 PCs, e.g., 2^6 = 64 compounds, plus a random complement and a center-point selection of 16 compounds. Tedious and time-consuming!
185
What is D-optimal design?

A computer-generated design. The D-optimal design maximizes the determinant of the X'X matrix. Geometrically, this is equivalent to saying that the volume spanned by the selected points in X is maximized.
186
D-optimal design in Onion_I

The D-optimal design samples the outer part of the score space, with lots of replicates (i.e., 80 compounds is too many).
[Figure: investigation "Pure D-opt A = 2", raw data plot of M1.t2 against M1.t1 with the selected compounds numbered, shown with the onion_I.M1 (PCA-X) score plot t[1]/t[2]]
With A = 6 the sampling is better, but there is still replication.
187
Grid-based & Cell-based design

A grid is placed over the score space, and the structure closest to
- the centre of each bin (cell-based design), or
- each grid point (grid-based design)
is selected.
[Figure: (PCA-X) score plot t[1]/t[2]]
The selection depends on the mesh size and on the distribution of the compounds. Easy with A = 2, but complicated with A = 6!
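A two-dimensional cell-based selection can be sketched as follows. The sketch makes these assumptions:
- bins have equal width on each score axis;
- distances are Euclidean, measured to the cell centre;
- `cell_based_selection` is a hypothetical name, not a MODDE function.

A grid-based variant would instead keep the compound closest to each grid point. With A score dimensions the number of cells grows as n_bins**A, which is why the approach becomes awkward for A = 6.

```python
import numpy as np

def cell_based_selection(scores, n_bins):
    """Hypothetical helper: split each score axis into n_bins equal-width
    intervals and, in every non-empty cell, keep the compound closest to the
    cell centre. Returns the selected row indices in ascending order."""
    T = np.asarray(scores, dtype=float)
    lo, hi = T.min(axis=0), T.max(axis=0)
    width = (hi - lo) / n_bins
    width = np.where(width > 0, width, 1.0)  # guard against a constant axis
    # cell index along each axis; clip so the maximum falls in the last bin
    idx = np.clip(((T - lo) / width).astype(int), 0, n_bins - 1)
    best = {}  # cell -> (row index, distance to cell centre)
    for i, cell in enumerate(map(tuple, idx)):
        centre = lo + (np.array(cell) + 0.5) * width
        dist = float(np.linalg.norm(T[i] - centre))
        if cell not in best or dist < best[cell][1]:
            best[cell] = (i, dist)
    return sorted(i for i, _ in best.values())

# Five compounds in a (t1, t2) score plot, 2 x 2 cells
points = [(0, 0), (0.5, 0.5), (2, 2), (1.5, 1.5), (2, 0)]
print(cell_based_selection(points, 2))  # [1, 3, 4]
```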
188
Space-filling design
Space-filling design is similar to cell- and grid-based design. Distances between points in chemical space are calculated, and compounds are selected so that they give the best coverage of the chemical space (the smallest average distance from the compounds to their nearest selected compound).
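The coverage idea can be illustrated with a simple greedy heuristic. Coverage is measured here as the average distance from every candidate to its nearest selected compound. The sketch assumes:
- Euclidean distances in the score space;
- a one-point-at-a-time search.

`greedy_coverage` is a hypothetical helper, not the algorithm used by MODDE. It keeps a full n × n distance matrix, which is fine for about a thousand compounds.

```python
import numpy as np

def greedy_coverage(scores, k):
    """Hypothetical helper: pick k rows of `scores` one at a time, each time
    adding the candidate that most reduces the average distance from every
    candidate to its nearest selected point."""
    X = np.asarray(scores, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # n x n
    nearest = np.full(n, np.inf)   # distance to the nearest selected point so far
    taken = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(k):
        # average coverage distance if candidate j were added to the selection
        cost = np.minimum(nearest[:, None], dist).mean(axis=0)
        cost[taken] = np.inf       # never pick the same compound twice
        j = int(np.argmin(cost))
        selected.append(j)
        taken[j] = True
        nearest = np.minimum(nearest, dist[:, j])
    return selected

# Eleven points on a line: the first pick is the middle one (index 5)
line = np.arange(11.0).reshape(-1, 1)
print(greedy_coverage(line, 3))
```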
189
Onion design
Onion design sees the chemical domain as composed of layers. The selection becomes a function of the number of layers and of the type of design laid out in each layer.
190
Some examples of onion designs

Case I: A = 2, L = 3, default settings, no requirements on NDes
Case II: A = 6, L = 3, NDes = 80
Case III: as Case II, but with removal of the outer 5%
191
Onion design generation Step 1

Give the investigation name and storage location. Select the SIMCA-P project, the model number and how many components to consider.
192

Factor definition: the factors have pre-formatted low and high settings given by the minimum and maximum values of each score vector.
193

Response definition: define responses (if any).
194

Select the experimental objective: here we select RSM.
195

Select the desired model and design. The number of layers can be changed here (but we use the default of 3).
196

In the Layers dialogue you can change the size of the layers, the number of design runs and repetitions, and which model is to be supported.
197

One D-optimal design is generated inside each layer.
198

The worksheet of the resulting onion design is inspected.
199

Visualize the design (Design/Doptimal/Onion plot). The layers are clearly seen (the center point is hidden by other triangles).
200

Look at the identity of the selected compounds (Design/Doptimal/Candidate Set). A small excerpt from the candidate set is shown.
201
Case II SMD of 80 compounds

A = 6 (R2X = 0.75; Q2X = 0.70). The optimal onion design has 25 + 24 + 30 + 1 = 80 molecules. This onion design supports interaction, interaction and quadratic models in the respective layers.
202
Case III SMD of 80 compounds; outer 5% removed

L1: 0-30%, L2: 30-60%, L3: 60-95%
SMD of 80 runs (24 + 24 + 31 + 1) supporting interaction, interaction and quadratic models
Extreme observations are not included!
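The layering step of an onion design can be sketched as below. The slides do not say how MODDE defines the layers, so this sketch makes its own assumptions:
- the percentages are read as quantiles of the distance from the centre of the standardized score space;
- compounds beyond 95% are excluded.

`onion_layers` is a hypothetical helper, and it only assigns layers. The D-optimal selection inside each layer is not shown.

```python
import numpy as np

def onion_layers(scores, edges=(0.30, 0.60, 0.95)):
    """Hypothetical helper: layer number (1, 2, 3, ...) for each compound,
    based on its distance from the centre of the standardized score space.
    Assumption: `edges` are quantiles of that distance, so with the defaults
    L1 = 0-30%, L2 = 30-60%, L3 = 60-95%; layer 0 marks the excluded outer 5%."""
    T = np.asarray(scores, dtype=float)
    z = (T - T.mean(axis=0)) / T.std(axis=0)
    dist = np.linalg.norm(z, axis=1)
    cuts = np.quantile(dist, edges)          # outer limit of each layer
    layer = np.zeros(len(dist), dtype=int)   # 0 = outside the last layer
    for k in range(len(cuts), 0, -1):        # outermost first, inner layers overwrite
        layer[dist <= cuts[k - 1]] = k
    return layer

rng = np.random.default_rng(0)
layer = onion_layers(rng.normal(size=(1000, 6)))  # stand-in for A = 6 PC scores
print(np.bincount(layer, minlength=4))  # excluded, L1, L2, L3
```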
203
What have we learnt - I

Multivariate characterisation and PCA offer a compact representation of molecular data, the PC scores, that is well suited for design: statistical molecular design (SMD).
The use of design ensures that systematic and representative variation is introduced into the training set (not possible with the COST approach).
Changes in molecular structures are discrete, not continuous, making D-optimal design a viable alternative.
204
What have we learnt - II

Onions combine the best of D-optimal design (few points) with the best of cell-based and space-filling design (inner coverage). The flexibility of onion designs in terms of the number of layers, and the number of points in each layer, makes them very useful in practice. Space-filling and cell-based designs are very similar, and when relatively few points are selected they give similar results to D-optimal design. Unlike D-optimal and onion designs, space-filling and cell-based designs cannot be modified to correspond to different models, i.e., linear, interaction, quadratic, etc.
205
What have we learnt - III

A random complement to any systematic selection is always useful.
Combinations of different approaches:
- An outer D-optimal design combined with an inner space-filling design is sometimes used within the pharmaceutical industry.
- Onion design has the same objective as the above combination.
206

Chapter 12 Mixture Design One Day Add-On
Contents
Introduction
A working strategy for mixture design
Example: Tablets
Application: Bubbles (Screening)
- Overview of Mixture Region
- Overview of Mixture Design Protocols
- Introduction to D-optimal Design
- Introduction to PLS
Application: Bubbles (RSM)
Mixture Design Add-On
Introduction
Applications of DOE
Designs with process factors
- Regular region: Factorials, Composite, Plackett-Burman, Box-Behnken
- Irregular region: D-optimal design
Designs with mixture factors
- Regular region: Axial designs, Simplex Centroid designs
- Irregular region: D-optimal design
Combined designs of mixture and process factors
- Always D-optimal design
Design of Process Experiments

Experiments where the response Y is a function of the levels or amounts of the factors: $Y = F(X_1, X_2, X_3, \dots, X_p) + \varepsilon$. The changes in the level or amount of each factor Xk are not coupled to (= are independent of) changes in the other factors. Thus, orthogonal arrays of experiments can be constructed, such as:
- factorial designs (screening)
- composite designs (RSM)
- Plackett-Burman designs (screening)
- others
Design of Process Experiments
Experimental domain is a regular (half-) hypercube
Design of Mixture Experiments

Experiments where the response Y is a function of the proportions of the ingredients in the mixture, and not of the amounts of the ingredients: $Y = F(X_1, X_2, X_3, \dots, X_p) + \varepsilon$
Response Y: octane rating of gasoline, crushing strength of a tablet, smoothness of a cream, ... The response depends only on the relative proportions of the ingredients of the mixture:
$\sum_{k} X_k = 1$
We can express the relative proportions as fractions or percentages.
Design of Mixture Experiments
[Figure: panels labelled Linear and Quadratic]
The experimental domain is a simplex (or polyhedron). The experimental region has dimensionality k-1, where k is the number of mixture factors.
Process and mixture factors together

[Figure: Process and Mixture Factors]
Irregular experimental domains: D-optimal design

There are constraints in the experimental space.
A D-optimal design is a computer-generated design that locates the experiments in such a way that the experimental region is well covered.
10
Example: Rocket Propellant

Example, Rocket Propellant: three components were mixed together to form a rocket propellant. The purpose was to find a propellant with an elasticity > 2900.
Formulation factors:
Binder 0.2-0.4
Oxidizer 0.4-0.6
Fuel 0.2-0.4
What is the "problem" with the worksheet? Each row sums to 1.0!
Consequences for the design?
11

What does the mixture design look like? The experimental domain with 0-1 bounds on the factors takes the form of a triangle. Here we are investigating a limited region of the available experimental domain.
[Figure: triangle with vertices Oxidiser, Fuel and Binder]
12

A quadratic model was used.
[Figure: coefficient plot for elasticity with terms Bin, Oxi, Fue, Bin*Bin, Oxi*Oxi, Fue*Fue, Bin*Oxi, Bin*Fue, Oxi*Fue; N = 10, DF = 4, R2 = 0.801, R2 Adj. = 0.553, Q2 = 0.249, RSD = 160.1071, Conf. lev. = 0.95]
The model predicts an area in which an elasticity exceeding 2900 is found.
The coefficients show that binder and fuel have the strongest impact on elasticity.
We are able to quantitatively describe elasticity in terms of the three varied ingredients.
13

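A working strategy for mixture design:
1. Definition of factors and bounds
2. Selection of experimental objective and mixture model
3. Selection of candidate set
4. Generation of design
5. Evaluation of size and shape of mixture region
6. Definition of reference mixture
7. Execution of design
8. Analysis of data and evaluation of model
9. Visualization of modelling results
10. Use of model
Illustrations: Tablet preparation & Bubble formation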
15
Aim: to investigate tablet preparation and find out which factors regulate the release rate of an active substance.
Mixture factors:
Cellulose (0-1)
Lactose (0-1)
Phosphate (0-1)
All factors sum to 100% (mixture constraint); the bounds are consistent.
Constraint:
No other constraint
Response:
Release rate of the active substance (to be maximized)
16
Co-ordinates of a Simplex
At each corner, one component is pure (1.0). At the opposite side, this component is absent (0.0). The concentration is the same along any line parallel to the opposite side, e.g., for A along horizontal lines. Going down from corner A (A = 1.0) corresponds to going through A = 1.0, A = 0.75, A = 0.5, ..., A = 0.0. In the same way, going from corner B towards the opposite side corresponds to going through B = 1.0, B = 0.75, B = 0.5, ..., B = 0.0, and analogously for C.
17
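Checking for consistency of bounds. Example:
0.1 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4
These bounds are inconsistent: even with B and C at their upper bounds, A must be at least L*A = 1 - (0.3 + 0.4) = 0.3. After this simple arithmetic check (done automatically in the software) the new bounds become:
0.3 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4
[Figure: mixture triangle showing the bounds LA, UA, LB, UB, LC, UC]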
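The arithmetic check can be written as an iterative tightening of the bounds. Each lower bound is raised to 1 minus the sum of the other upper bounds, and each upper bound is lowered to 1 minus the sum of the other lower bounds, until nothing changes. This is a sketch of that rule only. MODDE performs its own check automatically, and `tighten_bounds` is a hypothetical helper.

```python
def tighten_bounds(lower, upper, total=1.0, tol=1e-12, max_iter=100):
    """Hypothetical helper: replace stated bounds by implied bounds until
    nothing changes.
    Implied lower bound: L*_i = total - (sum of the other upper bounds).
    Implied upper bound: U*_i = total - (sum of the other lower bounds)."""
    L, U = list(lower), list(upper)
    for _ in range(max_iter):
        changed = False
        for i in range(len(L)):
            implied_L = total - (sum(U) - U[i])
            implied_U = total - (sum(L) - L[i])
            if implied_L > L[i] + tol:
                L[i], changed = implied_L, True
            if implied_U < U[i] - tol:
                U[i], changed = implied_U, True
        if any(l > u + tol for l, u in zip(L, U)):
            raise ValueError("bounds are infeasible: no mixture satisfies them")
        if not changed:
            break
    return L, U

# The example above: 0.1 <= A <= 0.5, 0.1 <= B <= 0.3, 0.2 <= C <= 0.4
L, U = tighten_bounds([0.1, 0.1, 0.2], [0.5, 0.3, 0.4])
print([round(v, 3) for v in L], [round(v, 3) for v in U])
```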
18
Tablet: - 2. Selection of experimental objective and mixture model
Experimental objective: Optimization
Mixture model: Quadratic
$$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{23} x_{MF2} x_{MF3} + \varepsilon$$
This is a Cox-type model, with constraints imposed on the regression coefficients.
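Because the mixture factors sum to 1, the ten terms of the quadratic model are not independent. For example, $1 = x_1 + x_2 + x_3$ and $x_1^2 = x_1(1 - x_2 - x_3)$. A short numerical check makes this concrete. It evaluates all ten columns on ten three-component blends. The interior blends (2/3, 1/6, 1/6) are an assumption about the "interior points", and the helper name is hypothetical. The matrix has ten columns but rank six, which is why the Cox-type model needs constraints on its coefficients.

```python
import numpy as np
from itertools import combinations

def full_quadratic_columns(x):
    """Hypothetical helper: columns 1, x_i, x_i^2 and x_i*x_j of the quadratic model."""
    x = np.asarray(x, dtype=float)
    q = x.shape[1]
    cols = [np.ones(len(x))]
    cols += [x[:, i] for i in range(q)]
    cols += [x[:, i] ** 2 for i in range(q)]
    cols += [x[:, i] * x[:, j] for i, j in combinations(range(q), 2)]
    return np.column_stack(cols)

# Ten blends that all sum to 1: vertices, edge centres,
# assumed interior check blends and the overall centroid
blends = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [2/3, 1/6, 1/6], [1/6, 2/3, 1/6], [1/6, 1/6, 2/3],
    [1/3, 1/3, 1/3],
])
X = full_quadratic_columns(blends)
print(X.shape, np.linalg.matrix_rank(X))  # ten columns, but only six independent
```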
19
Tablet: - 3. Selection of candidate set
The candidate set is the pool of theoretically possible and meaningful experiments from which the actual design is selected. Here, the candidate set is small:
- 3 extreme vertices
- 3 centers of edges
- 3 interior points
- 1 overall centroid
In most cases but mixture applications, undesired experiments may be deleted from the candidate set prior to generation of the design.
20
Tablet: - 4. Generation of design
The design should contain experiments that are informative and map the experimental region as well as possible. In this case the experimental region is regular, so the Simplex Centroid design is applicable.
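The simplex centroid points can be generated from the non-empty subsets of components. Each subset is blended in equal proportions. The sketch uses exact fractions, and the function names are hypothetical. The interior check points are an assumption: blends halfway between the centroid and each vertex, i.e. 2/3, 1/6, 1/6 for three components. For q = 3 this yields 3 vertices, 3 edge centres, the overall centroid and 3 interior points, ten candidates in all. That matches the counts of the tablet candidate set.

```python
from fractions import Fraction
from itertools import combinations

def simplex_centroid(q):
    """All 2**q - 1 blends in which a non-empty subset of the q components is
    mixed in equal proportions: vertices, edge centres, ..., overall centroid."""
    points = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            points.append(tuple(Fraction(1, size) if i in subset else Fraction(0)
                                for i in range(q)))
    return points

def interior_check_points(q):
    """Assumed interior points: halfway between the overall centroid and each
    vertex, i.e. x_k = (q + 1)/(2q) and x_j = 1/(2q) for j != k."""
    return [tuple(Fraction(q + 1, 2 * q) if i == k else Fraction(1, 2 * q)
                  for i in range(q))
            for k in range(q)]

candidates = simplex_centroid(3) + interior_check_points(3)
for point in candidates:
    print(" / ".join(str(x) for x in point))
```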
21

Introduction to regular mixture regions
[Figure: three-component simplex with vertices A (1/0/0), B (0/1/0) and C (0/0/1), edge centers such as 0.5/0.5/0 and 0/0.5/0.5, and the overall centroid 1/3/1/3/1/3; inset: the two-component case, the line X1 + X2 = 1 with X1, X2 ≥ 0]
22

In MODDE: Show/Design Region. Example: Bubbles (more information later).
[Figure: design region at Glycerol = 0.0, 0.1 and 0.2]
23

Alternative designs for a regular region (the choice of model is important):
[Figure: designs for Linear, Quadratic and Special Cubic models]
24
Tablet: - 6. Definition of reference mixture
The reference mixture is used to anchor the mathematical model. It is easy to find for regular regions (the overall centroid), whereas strongly irregular regions require an efficient algorithm to find the overall centroid. The reference mixture serves the same function as the center point does in process design.
[Figure: three-component simplex with vertices A (1/0/0), B (0/1/0) and C (0/0/1), edge centers such as 0.5/0.5/0 and 0/0.5/0.5, and the overall centroid 1/3/1/3/1/3]
Tablet preparation: 1/3, 1/3, 1/3
25
Tablet: - 7. Execution of design
It is important to carry out the experiments in random order. This turns any systematic time trend into unimportant, random, unsystematic variation.
26
Tablet: - 8. Analysis of data and evaluation of model

Analysis of data with PLS
[Figure: Investigation: Waaler_rsm (PLS, comp.=3). Summary of Fit (R2, Q2), and normal probability plot of the residuals for release with experiment-number labels]
N = 10, DF = 4, R2 = 0.985, Q2 = 0.553, R2 Adj. = 0.966, RSD = 18.7170
Exp #10 is a probable outlier and should be re-tested. If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.
27
Tablet: - 9. Visualization of modelling results
[Figure: coefficient plot for release with terms la, ce, ph, la*la, ce*ce, ph*ph, ce*la, ce*ph, la*ph; N = 10, DF = 4, R2 = 0.985, Q2 = 0.553, R2 Adj. = 0.966, RSD = 18.7170, Conf. lev. = 0.95]
28
Tablet: - 10. Use of model
Use of verifying experiments:
Pred No  cellulose  lactose  phosphate  release (obs)  release (pred)  Lower  Upper
1        0.32       0        0.68       --             363             322    404
2        0.5        0.125    0.375      370            293             262    324
3        0.333      0        0.667      340            363             322    405
4        0.667      0        0.333      345            320             278    361
The model predicts well except for blend 0.5/0.125/0.375. This strange experiment should be repeated, and the model possibly updated with this new information.
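Comparing each verifying experiment with the interval reported for its prediction is easy to automate. The sketch uses the numbers from the verifying-experiment table. Treating the Lower/Upper columns as the acceptance interval is an assumption, and `flag_disagreements` is a hypothetical name. Run 1 has no observed value and is skipped.

```python
# (blend cellulose/lactose/phosphate, observed release, lower, upper)
checks = [
    ((0.32, 0, 0.68), None, 322, 404),   # no observed value
    ((0.5, 0.125, 0.375), 370, 262, 324),
    ((0.333, 0, 0.667), 340, 322, 405),
    ((0.667, 0, 0.333), 345, 278, 361),
]

def flag_disagreements(rows):
    """Hypothetical helper: 1-based numbers of verifying runs whose observed
    response lies outside the reported Lower/Upper interval."""
    flagged = []
    for no, (_blend, obs, lower, upper) in enumerate(rows, start=1):
        if obs is not None and not (lower <= obs <= upper):
            flagged.append(no)
    return flagged

print(flag_disagreements(checks))  # [2]
```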
29
Summary
DoE is an organized approach:
- yields more useful information (influence of all factors together)
- yields more precise information in fewer experiments
- results are evaluated in the light of variability
- a map of the system is obtained (useful for decision-making)
Mixture factors are constrained by $\sum_k X_k = 1$:
- such factors cannot be manipulated independently of one another
- the experimental region is a regular or irregular simplex
The approach to mixture design is very similar to the approach used for conventional process designs.
30
Application: Bubbles (Screening)
BubbleScr: - 1. Definition of factors and bounds
Aim: to investigate bubble formation and find out which factors dominate bubble lifetime.
Process factors:
Temperature (7-21 °C; refrigerator/kitchen temperature)
Time (1-13-25 h)
Mixture factors:
Dish-washing liquid 1, SKONA, ICA (0-0.4)
Dish-washing liquid 2, NEUTRAL, ADACO (0-0.4)
Tap water, Umeå (0.4-0.8)
Glycerol, APOTEKETS (15% water content; 0.0-0.2)
Constraint:
0.2 ≤ DWL1 + DWL2 ≤ 0.5
Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. The time until bursting was measured for bubbles 4-5 cm in diameter.
32
Mixture Region
Mixture components are not independent: X1 + X2 + ... + Xp = total, where usually total = 1 or 100%. The mixture region is therefore constrained. If there are NO additional bounds on the components, that is, every component can vary between 0 and 1:
33
Properties of the Mixture Region
Size of region: the mixture region might be very small. The size is inferred from calculations of Range Lower (RL) and Range Upper (RU).
Shape of region: if the region is a regular simplex, several classical mixture designs are available; if the region is irregular, the experiments are laid out D-optimally.
Consistency of bounds: some combinations of bounds are disallowed, and implied bounds arise from the stated bounds.
The above properties are handled automatically in the software (MODDE), but being unaware of them might lead to bad or unexpected results.
34
Types of Bounds
Lower bounds only
Li ≤ Xi ≤ 1.0, e.g., 0.4 ≤ Tap water ≤ 1.0 (example, not realistic for bubble formation)
Lower and upper bounds
Li ≤ Xi ≤ Ui, e.g., 0.1 ≤ DWL1 ≤ 0.3
Upper bounds only
0 ≤ Xi ≤ Ui, e.g., 0 ≤ DWL1 ≤ 0.4
Relational constraints
e.g., 0.1 ≤ X1 + X5 ≤ 0.5; 0.3 ≤ DWL1 + DWL2 ≤ 0.5; cost of Tap water + DWL1 + DWL2 + Glycerol ≤ 70 (SEK/l)
35
Lower Bounds Only -- L-simplex
Example: Rocket experiment, 0.2 ≤ binder ≤ 1, 0.4 ≤ oxidizer ≤ 1, 0.2 ≤ fuel ≤ 1. What is the shape of the mixture region?
[Figure: simplex with vertices A (Binder), B (Oxidizer) and C (Fuel), with the lower bounds Binder > 0.2, Oxidizer > 0.4 and Fuel > 0.2 drawn as lines]
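With lower bounds only, each vertex of the L-simplex puts every component but one at its lower bound and gives the remainder to the last one. The sketch below applies this rule to the rocket bounds; the helper name is hypothetical. Here the slack 1 - (0.2 + 0.4 + 0.2) = 0.2 is what each component can gain above its lower bound. That slack sets the size of the L-simplex relative to the full simplex.

```python
def l_simplex_vertices(lower, total=1.0):
    """Hypothetical helper: vertices of the region defined by lower bounds only.
    Vertex i keeps every other component at its lower bound and gives
    component i whatever is left of the total."""
    slack = total - sum(lower)
    if slack < 0:
        raise ValueError("lower bounds are inconsistent: they sum to more than the total")
    return [tuple(l + slack if j == i else l for j, l in enumerate(lower))
            for i in range(len(lower))]

# Rocket experiment: binder >= 0.2, oxidizer >= 0.4, fuel >= 0.2
for vertex in l_simplex_vertices([0.2, 0.4, 0.2]):
    print(tuple(round(c, 3) for c in vertex))
```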
36
Upper Bounds Only -- U-simplex
A standard mixture design (extended axial) is applicable.
37
Upper Bounds Only -- Irregular region (No simplex)

D-optimal design is the only option.
38
Lower and Upper Bounds - Regular region
Li ≤ xi ≤ Ui
Definition of extreme points: (Ui, Lj≠i), i.e., the upper (or lower) bound of one factor combined with the lower (or upper) bounds of all the others.
If all extreme points are valid, the region is a simplex (regular region).
Example: 0.1 ≤ A ≤ 0.8, 0.1 ≤ B ≤ 0.8, 0.1 ≤ C ≤ 0.8
All extreme points are valid.
39
Lower and Upper Bounds - Irregular region
The experimental region is the intersection of the U-simplex and the L-simplex. Most of the time the resulting region is an irregular polyhedron. Example:
0.2 < A < 0.6, 0.1 < B < 0.6, 0.1 < C < 0.5
These bounds are consistent, but the experimental region is irregular, so a D-optimal design is used.
40
Mixture Region, Lower and Upper bounds
It is easy to display an irregular region in a three-component mixture. With five or more ingredients, the human brain cannot overview the situation; a computer-generated (D-optimal) design solves the problem.
41
Relational Constraints - Three mixture factors
The mixture region is almost never a simplex, but an irregular polyhedron.
The mixture region is irregular because of the relational constraint on A + B (limit 0.65), shown as the dotted line.
42
Relational Constraints - Four mixture factors
Example: 2 x1 + x2 0.30
43
Summary
Mixture has only Lower Bounds
Experimental Region is always a simplex
Mixture has only Upper Bounds

Experimental region is a simplex if the sum of the q-1 largest upper bounds is ≤ 1.0. An irregular region is often the result, which implies D-optimal design.
Mixture has Lower and Upper bounds

Region is often irregular, which implies D-optimal design. MODDE detects inconsistent bounds and proposes a change.
Mixture has relational constraints

Region is often irregular, which implies D-optimal design. MODDE detects inconsistent bounds or constraints and proposes a change.
44
BubbleScr: - 2. Selection of experimental objective and mixture model

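Experimental objective: Screening
Models:
Process model: Interaction
Mixture model: Linear
Process*Mixture model: Interaction
$$y = \beta_0 + \beta_1 x_{PF1} + \beta_2 x_{PF2} + \beta_3 x_{MF3} + \beta_4 x_{MF4} + \beta_5 x_{MF5} + \beta_6 x_{MF6} + \beta_{12} x_{PF1} x_{PF2} + \beta_{13} x_{PF1} x_{MF3} + \beta_{14} x_{PF1} x_{MF4} + \beta_{15} x_{PF1} x_{MF5} + \beta_{16} x_{PF1} x_{MF6} + \beta_{23} x_{PF2} x_{MF3} + \beta_{24} x_{PF2} x_{MF4} + \beta_{25} x_{PF2} x_{MF5} + \beta_{26} x_{PF2} x_{MF6} + \varepsilon$$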
45
BubbleScr: - 3. Selection of candidate set
Overview of the candidate set:
- 48 extreme vertices (4 process "corners" * 12 mixture extreme vertices)
- 48 centers of edges (4*12)
- 72 centroids of high-dimensional surfaces (4*18)
- 1 overall centroid
46
The proposed model needs 13 degrees of freedom (DF):
- 1 DF for the constant
- 2 DF for the linear terms of the process factors
- 3 DF for the linear terms of the mixture factors
- 1 DF for the process*process interaction
- 6 DF (2*3) for the process*mixture interactions
Add 5 extra experiments to get enough DF. In addition, 2 supplementary experiments are recommended to handle the complexity introduced by the linear constraint. This leads to N = 20 experiments (note: no replicates are included in this estimate).
47
Designs for Regular and Irregular Regions
Regular region (simplex): classical mixture designs
- Screening (determine component effects): Axial designs
- Optimization (good approximation of the response): Simplex Centroid designs
Irregular region (when the experimental region is not a simplex):
- Screening & Optimization: D-optimal designs
48
Axes of a Simplex
Definition: the xi axis of the simplex is the one-dimensional subspace of the simplex where xj = (1 - xi)/(q - 1) for all j ≠ i.
The xi axis is the line perpendicular to the xi = 0 base of the simplex and passing through the centroid of the simplex.
[Figure: axes of components A (x1), B (x2) and C (x3)]
49
Axial Designs
Axial designs consist of mixtures situated entirely on the axes of the simplex. With axial designs, most of the points are positioned inside the simplex and are complete mixtures containing all q components. Axial designs are recommended when component effects are to be measured in screening experiments and when linear models are to be fitted.
[Figure: Standard Axial and Extended Axial designs]
50
Simplex Centroid Designs
A Simplex Centroid design is used for optimization; normally, it has experiments situated
- at vertex points
- at edge centers
- at lower-dimensional face centers
- at interior check points
- at the overall centroid
or a combination of these.
51
Designs When the Experimental Region is Irregular
The Extreme Vertices designs of McLean and Anderson provide the best available solution to the constrained design problem. The extreme vertices are the points that lie on the intersections of the constraint boundaries. They are generated by forming all possible combinations of the bounds of q-1 components and calculating the level of the qth component from the mixture constraint.
This gives $q \cdot 2^{q-1}$ possible points; the extreme vertices are those points whose component levels all lie within the constraints.
52
Extreme Vertices
Rapidly increasing complexity:
q       2   3   4   5   6    7    8     9     10    11     12
points  4   12  32  80  192  448  1024  2304  5120  11264  24576
53
Example of Finding the Extreme Vertices
Mixture system with the following constraints:
0.2 ≤ A ≤ 0.6; 0.1 ≤ B ≤ 0.6; 0.1 ≤ C ≤ 0.5
Vertex  A    B    C    Valid
1       .20  .10  -
2       .20  .60  .20  *
3       .60  .10  .30  *
4       .60  .60  -
5       .20  -    .10
6       .20  .30  .50  *
7       .60  .30  .10  *
8       .60  -    .50
9       -    .10  .10
10      .40  .10  .50  *
11      .30  .60  .10  *
12      -    .60  .50
(* = valid extreme vertex; - = the required level falls outside its bounds)
[Figure: the irregular region in the A-B-C triangle with vertices 2, 3, 6, 7, 10 and 11]
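The McLean-Anderson procedure translates directly into code. The sketch uses a hypothetical function name, and rounding to nine decimals merges floating-point duplicates. It tests all q·2^(q-1) candidate points and should return the six valid vertices of the example (2, 3, 6, 7, 10 and 11).

```python
from itertools import product

def extreme_vertices(lower, upper, total=1.0, tol=1e-9):
    """Hypothetical helper (McLean-Anderson procedure): put q-1 components at
    every combination of their lower/upper bounds, compute the remaining
    component from the mixture constraint, and keep the points whose
    computed level respects its own bounds. Duplicates are merged."""
    q = len(lower)
    vertices = set()
    for free in range(q):
        others = [i for i in range(q) if i != free]
        for levels in product(*[(lower[i], upper[i]) for i in others]):
            x = [0.0] * q
            for i, level in zip(others, levels):
                x[i] = level
            x[free] = total - sum(levels)
            if lower[free] - tol <= x[free] <= upper[free] + tol:
                vertices.add(tuple(round(v, 9) for v in x))
    return sorted(vertices)

# The example above: 0.2 <= A <= 0.6, 0.1 <= B <= 0.6, 0.1 <= C <= 0.5
for vertex in extreme_vertices([0.2, 0.1, 0.1], [0.6, 0.6, 0.5]):
    print(vertex)
```

With the regular bounds 0.1 ≤ A, B, C ≤ 0.8, the same function returns only three distinct vertices. A region with q vertices is a simplex, which agrees with the regular-region example earlier.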
54
Mixture Design when Region is Irregular
We have to consider: extreme vertices, edge centers, face centers and the overall centroid.
Design, selected D-optimally:
- Screening (linear model): a subset of, or all, the extreme vertices, plus the overall centroid
- Optimization (quadratic model): all or a subset of the vertices, edge centers, face centers and the centroid
55
Introduction to D-optimal design
A D-optimal design is a computer-generated design, and consists of the best subset of experiments selected from the candidate set. For a given model, $y = X\beta + \varepsilon$, the following can be said regarding the D-optimal approach:
- the selected runs maximize the determinant of the matrix X'X
- these experiments span the largest volume possible in the experimental region
A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture).
56
A small D-optimal example
Example: 2^2 full factorial design with factors x1 and x2
run  x1  x2
1    -1  -1
2     1  -1
3    -1   1
4     1   1
Model: y = b0 + b1x1 + b2x2 + b12x1x2 + e
Model in matrix form: $y = Xb + e$, with the least-squares estimate $b = (X'X)^{-1}X'y$
57
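D-optimal example, the covariance matrix (X'X)^-1
$$X = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix},\quad X' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$
$$X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix},\quad (X'X)^{-1} = \begin{pmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{pmatrix}$$
The precision in b follows from (X'X)^-1, RSD and t: the smallest (X'X)^-1 is obtained with the largest X'X.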
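The matrices of the 2^2 example can be checked with NumPy. The model columns follow the example (constant, x1, x2, x1x2). The response values y are hypothetical and only illustrate the least-squares step b = (X'X)^-1 X'y. For reference, in ordinary least squares the confidence interval of coefficient b_i is b_i ± t·RSD·√c_ii, where c_ii is the i-th diagonal element of (X'X)^-1. A small (X'X)^-1 therefore means precise coefficients.

```python
import numpy as np

# 2^2 full factorial in standard order; model columns: constant, x1, x2, x1*x2
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

XtX = X.T @ X                 # X'X: 4 on the diagonal, 0 elsewhere
XtX_inv = np.linalg.inv(XtX)  # (X'X)^-1: 0.25 on the diagonal, 0 elsewhere

# Hypothetical responses, only to show the least-squares step b = (X'X)^-1 X'y
y = np.array([10.0, 14.0, 12.0, 20.0])
b = XtX_inv @ X.T @ y
print(XtX_inv)
print(b)  # b0, b1, b2, b12
```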

58
A second small D-optimal example
Problem: two factors (x1, x2) varied at three levels. Proposed model:
y = b0 + b1x1 + b2x2 + e; the model needs 3 DF.
There are 9!/(3!·6!) = 84 ways of selecting 3 trials out of 9. Maximize the determinant det(X'X): the best precision in the estimated regression coefficients is obtained with det = 16.
[Figure: example selections of 3 runs from the 3 × 3 grid of levels -1, 0, 1, with det(X'X) = 0, 1, 4, 9 and 16]
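The exhaustive search over the 3 × 3 grid is small enough to run directly. The sketch assumes the levels -1, 0, 1 for both factors and the first-order model y = b0 + b1x1 + b2x2. It evaluates det(X'X) for all 84 subsets of three runs. The largest determinant should be 16, and the only values that occur should be 0, 1, 4, 9 and 16.

```python
import numpy as np
from itertools import combinations

# 3 x 3 candidate grid for (x1, x2) at levels -1, 0, 1
candidates = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]

def det_xtx(points):
    """det(X'X) for the model y = b0 + b1*x1 + b2*x2."""
    X = np.array([[1.0, x1, x2] for x1, x2 in points])
    return np.linalg.det(X.T @ X)

subsets = list(combinations(candidates, 3))
dets = [round(float(det_xtx(s))) for s in subsets]   # determinants are integers here
best = max(dets)
print(len(subsets), best)
print(sorted(set(dets)))  # distinct determinant values
```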
59
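How to compute a determinant
Example: experiments spread according to a determinant of 4
$$X = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix},\quad X' = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix},\quad X'X = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
Rule of Sarrus:
$$\det(X'X) = (3\cdot 1\cdot 2) + (-1\cdot 0\cdot 0) + (0\cdot(-1)\cdot 0) - (0\cdot 1\cdot 0) - (0\cdot 0\cdot 3) - (2\cdot(-1)\cdot(-1)) = 4$$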
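The same calculation can be done in Python. The three runs are assumed to be (x1, x2) = (-1, 0), (0, -1) and (0, 1), read from the rows of X. The rule of Sarrus is written out term by term in the same order as the worked example and checked against `numpy.linalg.det`.

```python
import numpy as np

def sarrus(m):
    """Rule of Sarrus for a 3 x 3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * e * i + b * f * g + c * d * h - c * e * g - a * f * h - b * d * i

X = np.array([[1, -1, 0],
              [1, 0, -1],
              [1, 0, 1]])  # columns: constant, x1, x2
XtX = X.T @ X
print(XtX)
print(sarrus(XtX.tolist()), round(float(np.linalg.det(XtX))))
```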

60
Features of the D-optimal approach
- Assumes that the selected regression model is "correct" and "true"
- Sensitive to model choice
- Potential terms may be added to protect against this sensitivity
61
Evaluation criteria
Two common evaluation criteria:
Condition number
- ratio of the largest to the smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit and denotes an orthogonal design
G-efficiency
- computed as Geff = 100·p/(n·d), where p is the number of model terms, n the number of runs and d the largest value of x'(X'X)^-1 x over the candidate set
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended
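Both criteria can be computed from the design matrix. The sketch uses hypothetical helper names and makes these assumptions:
- G-efficiency is taken as 100·p/(n·d), where d is the maximum of x'(X'X)^-1 x over a 3 × 3 candidate grid;
- the model has a constant, two main effects and their interaction.

For the 2^2 full factorial the sketch should give a condition number of 1 and a G-efficiency of 100%, the ideal values.

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.max() / s.min()

def g_efficiency(X, candidates):
    """Geff = 100 * p / (n * d), with d the largest x'(X'X)^-1 x over the
    candidate rows (expanded to model terms like X)."""
    n, p = X.shape
    M_inv = np.linalg.inv(X.T @ X)
    d = max(float(x @ M_inv @ x) for x in candidates)
    return 100.0 * p / (n * d)

def expand(x1, x2):
    # model terms: constant, x1, x2, x1*x2
    return np.array([1.0, x1, x2, x1 * x2])

grid = [expand(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
factorial = np.array([expand(a, b) for b in (-1, 1) for a in (-1, 1)])
print(condition_number(factorial), g_efficiency(factorial, grid))
```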
62
63
Number of experiments: N = 20 (no replicates included in this estimate).
Due to the element of randomness in the D-optimal search, we recommend exploring N ± 4 runs and generating 5 versions for each level of N.
We explored N = 16 to N = 24, giving 45 alternative D-optimal designs.
Best design: N = 16 (Geff = 76%, CondNo = 2.7)
64
Best design with N = 16 (Geff = 76%, CondNo = 2.7)
Two series of 4 replicates were added, giving 24 runs.
65
In MODDE: Show/Design Region. It was concluded that the shape of the experimental region was reasonable, not too distorted, and of sufficient size.
[Figure: design region at Glycerol = 0.0, 0.1 and 0.2]
66
1. D efinition of facto rs an d b ou nd s
10 . U se of m odel
BubbleScr: - 6. Definition of reference mixture
2. Selection of ex perim ental ob jectiv e an d m ixture m o del
9. V isu alizatio n o f m od ellin g resu lts
3. Selection of ca nd id ate set
8. A n alysis of d ata a nd evalua tio n of m od el
4. G eneration o f d esign
7. E xecutio n o f d esign
5. E valuation of size an d sh ap e of m ixtu re region
6. D efin itio n of reference m ixtu re
The reference mixture is used for anchoring the mathematical model easy to find for regular regions (overall centroid) Strongly irregular regions require efficient algorithm to find the centroid Serves the same function as the center-points in process design Calculated reference mixture: (0.183/0.183/0.55/0.084) (DWL1 / DWL2 / water / glycerol) Manually modified reference mixture: (0.2 / 0.2 / 0.5 / 0.1)
Computation of Centroid for Constrained Region
Several possibilities:
- Overall center of mass (COM): computationally expensive
- Average of all extreme vertices (AVG)
- Range normalized midrange (RNM, used in MODDE):
$$s_i = m_i - \frac{R_i\left(\sum_{j=1}^{q} m_j - 1\right)}{\sum_{j=1}^{q} R_j}, \qquad i = 1, \dots, q$$
where the range is $R_i = U_i - L_i$ and the midrange is $m_i = (U_i + L_i)/2$, with $L_i$ and $U_i$ the lower and upper bounds of component $i$. Subtracting the range-weighted excess $\sum_j m_j - 1$ makes the $s_i$ sum to one.
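The RNM formula is easy to code. The following is a minimal NumPy sketch. It assumes that only the lower and upper bounds enter the calculation, so additional linear constraints, such as a bound on DWL1 + DWL2, are ignored. The function name is ours. The example uses the BubbleOpt bounds given later in this section (0.1-0.3, 0.1-0.3, 0.2-0.4, 0.2-0.4).

```python
import numpy as np


def range_normalized_midrange(lower, upper):
    """Range normalized midrange (RNM) point of a bounded mixture region.

    s_i = m_i - R_i * (sum_j m_j - 1) / sum_j R_j, with m_i the midrange and R_i the range.
    Only the lower/upper bounds are used; extra linear constraints are ignored.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    midrange = (upper + lower) / 2.0
    ranges = upper - lower
    return midrange - ranges * (midrange.sum() - 1.0) / ranges.sum()


# BubbleOpt bounds, in the order DWL1, DWL2, water, glycerol
s_opt = range_normalized_midrange([0.1, 0.1, 0.2, 0.2], [0.3, 0.3, 0.4, 0.4])
print(np.round(s_opt, 3))
```

Here the midranges already sum to one, so the RNM point is simply (0.2, 0.2, 0.3, 0.3), which agrees with the calculated BubbleOpt reference mixture. The correction term only matters when the midranges do not sum to one. Each component then gives up a share of the excess proportional to its range, so wide-range components move the most.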
BubbleScr: 7. Execution of design
Carry out the experiments in randomized order. Here, a pseudo-random order was used because of the time factor (see the RunOrder column).

ExpNo  ExpName  RunOrder  InOut  Temp  Time  DWL1  DWL2  Water  Glycerol  Lifetime
1      N1       15,4      In     7     1     0     0.4   0.4    0.2       139
2      N2       3,1       In     7     1     0.4   0.1   0.5    0         19
3      N3       22,8      In     7     1     0     0.2   0.8    0         14
4      N4       7,3       In     7     1     0.2   0     0.6    0.2       60
5      N5       4,2       In     21    1     0.4   0     0.4    0.2       208
6      N6       18,7      In     21    1     0.1   0.4   0.5    0         15
7      N7       17,6      In     21    1     0.2   0     0.8    0         11
8      N8       16,5      In     21    1     0     0.2   0.6    0.2       35
9      N9       2,17      In     7     25    0.4   0     0.4    0.2       362
10     N10      19,22     In     7     25    0.1   0.4   0.5    0         25
11     N11      14,21     In     7     25    0.2   0     0.8    0         26
12     N12      12,19     In     7     25    0     0.2   0.6    0.2       52
13     N13      21,24     In     21    25    0     0.4   0.4    0.2       213
14     N14      20,23     In     21    25    0.4   0.1   0.5    0         40
15     N15      13,20     In     21    25    0     0.2   0.8    0         33
16     N16      5,18      In     21    25    0.2   0     0.6    0.2       74
17     N17      10,13     In     7     13    0.2   0.2   0.5    0.1       94
18     N18      6,10      In     7     13    0.2   0.2   0.5    0.1       78
19     N19      11,14     In     7     13    0.2   0.2   0.5    0.1       132
20     N20      23,15     In     7     13    0.2   0.2   0.5    0.1       117
21     N21      1,9       In     21    13    0.2   0.2   0.5    0.1       61
22     N22      8,11      In     21    13    0.2   0.2   0.5    0.1       54
23     N23      9,12      In     21    13    0.2   0.2   0.5    0.1       77
24     N24      24,16     In     21    13    0.2   0.2   0.5    0.1       43
Bubble lifetime ranges between 11 and 362 sec
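A quick consistency check of the worksheet above, in plain Python. Every run should be a valid mixture (the four components sum to 1), and the lifetime range should match the 11-362 s quoted. The sketch also takes log10 of the lifetimes. The model plots use a transformed response (Lifetime~), and the Raw Data Plot later in this section shows Log (Lifetime), but treating the transform as log10 here is our assumption.

```python
import math

# (DWL1, DWL2, Water, Glycerol, Lifetime in s) for runs N1..N24 of the worksheet
runs = [
    (0, 0.4, 0.4, 0.2, 139), (0.4, 0.1, 0.5, 0, 19), (0, 0.2, 0.8, 0, 14), (0.2, 0, 0.6, 0.2, 60),
    (0.4, 0, 0.4, 0.2, 208), (0.1, 0.4, 0.5, 0, 15), (0.2, 0, 0.8, 0, 11), (0, 0.2, 0.6, 0.2, 35),
    (0.4, 0, 0.4, 0.2, 362), (0.1, 0.4, 0.5, 0, 25), (0.2, 0, 0.8, 0, 26), (0, 0.2, 0.6, 0.2, 52),
    (0, 0.4, 0.4, 0.2, 213), (0.4, 0.1, 0.5, 0, 40), (0, 0.2, 0.8, 0, 33), (0.2, 0, 0.6, 0.2, 74),
] + [(0.2, 0.2, 0.5, 0.1, y) for y in (94, 78, 132, 117, 61, 54, 77, 43)]  # reference mixture

# Every row must be a valid mixture (components sum to 1)
assert all(math.isclose(sum(r[:4]), 1.0) for r in runs)

lifetimes = [r[4] for r in runs]
log_lifetimes = [math.log10(y) for y in lifetimes]   # assumed log10 transform
print(min(lifetimes), max(lifetimes))
print(round(min(log_lifetimes), 2), round(max(log_lifetimes), 2))
```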

Initial regression model was fitted with PLS.
[Figure: Summary of Fit (R2, Q2, model validity, reproducibility) and regression coefficients for Lifetime~ (transformed lifetime), investigation Bubb_scr (PLS, 2 components); terms Te, Ti, DW1, DW2, Wa, Gly, Te*Ti and the Te and Ti interactions with the mixture factors]
N = 24, DF = 11, Cond. no. = 2.7203, Y-miss = 0; R2 = 0.812, Q2 = 0.185, R2 adj. = 0.608, RSD = 0.2476, confidence level = 0.95
Poor model: many insignificant interaction terms that should be removed.
When to use PLS

PLS is a pertinent choice if (i) there are several correlated responses in the data set, (ii) the experimental design has a high condition number (>10), (iii) there are small amounts of missing data in the response matrix, or (iv) the application involves a mixture (formulation) design.
PLS -- Notation
K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components
T = matrix of X-scores with columns t1, ..., tA
P = matrix of X-loadings with columns p1, ..., pA
W = matrix of PLS X-weights with columns w1, ..., wA
U = matrix of Y-scores with columns u1, ..., uA
C = matrix of PLS Y-weights with columns c1, ..., cA
PLS -- Scaling of variables

[Figure: variable axes x1, x2, x3 with the measured values and their "length", before and after unit variance scaling]
Defining/selecting the length of the variable axes (X- and Y-spaces). Recommended: set each axis to unit length (unit variance scaling).
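Unit variance scaling (autoscaling) amounts to subtracting each column's mean and dividing by its standard deviation, so every axis gets the same length. The following is a minimal NumPy sketch. Using the sample standard deviation (ddof=1) is our assumption about the convention. The example columns are the Temp and Time levels used in BubbleScr.

```python
import numpy as np


def unit_variance_scale(data):
    """Centre each column on its mean and divide by its standard deviation (ddof=1)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    return (data - mean) / std, mean, std


# Two factors on very different scales: temperature (C) and settling time (h)
raw = np.array([[7, 1], [21, 1], [7, 25], [21, 25], [7, 13], [21, 13]])
scaled, mean, std = unit_variance_scale(raw)
print(mean, std.round(3))
print(scaled.std(axis=0, ddof=1))   # every column now has unit length (std = 1)
```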

[Figure: X-space (factors/predictors, K = 3, axes x1, x2, x3) and Y-space (responses, M = 3, axes y1, y2, y3), each holding the N observations]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K = M = 3). Each X- and Y-variable has one coordinate axis with its length defined by its scaling, typically unit variance.
Each observation is represented by one point in the X-space and one in the Y-space. As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems.

[Figure: the same observation shown as a point in the X-space (x1, x2, x3) and in the Y-space (y1, y2, y3)]
The mean-centring procedure implies that the origins of the coordinate systems are repositioned.

[Figure: first PLS component as a line in X-space (t1) and in Y-space (u1), with the projection of observation i]
The first PLS component is a line in X-space and a line in Y-space, calculated to (a) approximate the point swarms in X and Y well and (b) maximize the covariance between the projections (t1 and u1). These lines pass through the average points.
PLS -- Geometric interpretation, 5
The projection coordinates t1 and u1 in the two spaces, X and Y, are connected and correlated through the inner relation $u_{i1} = t_{i1} + h_{i1}$, where $h_{i1}$ is a residual. The slope of the dotted line is 1.0.

[Figure: first and second PLS components as lines in X-space (t1, t2) and Y-space (u1, u2)]
The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also passing through the average points. These lines, t2 and u2, improve the approximation and the correlation as much as possible.
The second pair of projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables. By inserting the X-values of a new observation into the model, we obtain its t1- and t2-scores. Through the inner relation these give values of u1 and u2, which in turn enable predicted values of Y to be computed.

[Figure: the two PLS components span planes in X-space (t1, t2) and Y-space (u1, u2)]
The PLS components form planes in the X- and Y-spaces. The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.
Repeated plotting of successive pairs of latent variables will give a good appreciation of the correlation structure
PLS -- Overview
$$X = \mathbf{1}\bar{x}' + TP' + E$$
$$Y = \mathbf{1}\bar{y}' + UC' + F = \mathbf{1}\bar{y}' + TC' + G$$
(because $U = T + H$, the inner relation)
PLS: projection of X that both approximates X well and correlates with Y.
Difference to PCA: PCA is a projection of X that is an optimal approximation of X (least-squares fit).
PLS -- Parameter properties

For each component:
1) t is a linear combination of the X-variables with weights w; t is a summary of the X-variables that are correlated with Y.
2) u is a linear combination of the Y-variables with weights c; u is a summary of the Y-variables.
3) w are the correlation coefficients between the x's and u; columns of X that are highly correlated with Y are given high weights.
4) At convergence, to keep the components orthogonal: p is computed so that tp' is the best approximation of X, and tp' is removed from X before the next component is computed.
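The four properties above map directly onto the NIPALS algorithm. The sketch below is a textbook-style PLS2 fit in NumPy using the notation of this section (T, U, W, P, C). It assumes X and Y are already centred (and typically scaled), deflates both X and Y after each component, and returns regression coefficients B = W(P'W)^-1 C'. It is not MODDE's implementation and omits cross-validation, VIP and missing-data handling. The simulated data in the usage lines are hypothetical.

```python
import numpy as np


def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS2 by NIPALS on centred (and usually scaled) X (N x K) and Y (N x M)."""
    X = np.array(X, dtype=float)   # work on copies: both matrices are deflated below
    Y = np.array(Y, dtype=float)
    n_obs, k = X.shape
    m = Y.shape[1]
    T, U = np.zeros((n_obs, n_components)), np.zeros((n_obs, n_components))
    W, P = np.zeros((k, n_components)), np.zeros((k, n_components))
    C = np.zeros((m, n_components))
    for a in range(n_components):
        u = Y[:, [np.argmax(Y.var(axis=0))]]       # start from the most variable Y column
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)                # X-weights w
            w /= np.linalg.norm(w)
            t = X @ w                              # X-scores t: linear combination of X
            c = Y.T @ t / (t.T @ t)                # Y-weights c
            u_new = Y @ c / (c.T @ c)              # Y-scores u: linear combination of Y
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)                    # X-loadings: t p' approximates X
        X -= t @ p.T                               # remove t p' before the next component
        Y -= t @ c.T
        T[:, a], U[:, a] = t[:, 0], u[:, 0]
        W[:, a], P[:, a], C[:, a] = w[:, 0], p[:, 0], c[:, 0]
    B = W @ np.linalg.solve(P.T @ W, C.T)         # coefficients for centred data: Y ~ X B
    return {"T": T, "U": U, "W": W, "P": P, "C": C, "B": B}


# Usage on simulated data (hypothetical numbers, not from this course)
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 4))
Y = X @ np.array([[1.0], [0.5], [0.0], [-0.5]]) + 0.1 * rng.normal(size=(24, 1))
x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
model = nipals_pls(X - x_mean, Y - y_mean, n_components=2)
Y_pred = (X - x_mean) @ model["B"] + y_mean
print(model["B"].round(2).ravel())
```

Design note: deflating X by t p' is what makes the successive score vectors t orthogonal (property 4). With as many components as X-variables, the coefficients coincide with ordinary least squares. PLS differs from MLR only when fewer components are kept.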
Summary of PLS
PLS is a multivariate regression method that is useful for handling complex DOE problems. PLS is especially useful when:
(i) there are several correlated responses in the data set, (ii) the experimental design has a high condition number, or (iii) there are small amounts of missing data in the response matrix.
PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them. All diagnostic tools available for MLR are retained for PLS. In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP.
All interaction terms were eliminated and the model was refitted.
[Figure: summary of fit and normal probability plot of standardized residuals for Lifetime~ with experiment number labels, investigation Bubb_scr (PLS, 2 components)]
N = 24, DF = 18, Cond. no. = 2.1537, Y-miss = 0; R2 = 0.796, Q2 = 0.640, R2 adj. = 0.739, RSD = 0.2018
BubbleScr: 9. Visualization of modelling results
[Figure: regression coefficients for Ti, DW1, DW2, Gly, Te and Wa; N = 24, DF = 18, R2 = 0.796, Q2 = 0.640, R2 adj. = 0.739, RSD = 0.2018, confidence level = 0.95]
Regression coefficients (reference mixture: 0.2/0.2/0.5/0.1)

BubbleScr: 10. Use of model
The MODDE optimizer was used to propose two verifying experiments.
Summary
The proposed working strategy works for:
- mixture regions of regular geometry
- mixture regions of irregular geometry
- experimental series involving both process and mixture factors
The strategy is oriented towards a graphical presentation of modelling results. In BubbleScr it was possible to raise bubble lifetime from 11 s to 6.02 min (6 min 02 s, i.e. 362 s). Verifying experiments of the model predictions gave an increased lifetime of 18.40 min (18 min 40 s). Bubble lifetime was further optimized by an RSM D-optimal design (see section 5).
Application: Bubbles (RSM)
BubbleOpt: 1. Definition of factors and bounds
Verifying experiment #1 was used to adjust the bounds of the four mixture factors.
Process factors:
- Temperature kept constant at +7 °C
- Time kept constant at 25 h
Mixture factors:
- Dish-washing liquid 1 (DWL1), SKONA, ICA (0.1 - 0.3)
- Dish-washing liquid 2 (DWL2), NEUTRAL, ADACO (0.1 - 0.3)
- Tap water, Umeå (0.2 - 0.4)
- Glycerol, APOTEKETS (15% water content) (0.2 - 0.4)
Constraint: 0.3 ≤ DWL1 + DWL2 ≤ 0.5
Response: lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm diameter.
BubbleOpt: 2. Selection of experimental objective and mixture model
Experimental objective: Optimization
Mixture model: Quadratic
$$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_4 x_{MF4} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{44} x_{MF4}^2 + \beta_{12} x_{MF1}x_{MF2} + \beta_{13} x_{MF1}x_{MF3} + \beta_{14} x_{MF1}x_{MF4} + \beta_{23} x_{MF2}x_{MF3} + \beta_{24} x_{MF2}x_{MF4} + \beta_{34} x_{MF3}x_{MF4} + \varepsilon$$
BubbleOpt: 3. Selection of candidate set
Overview of candidate set:
- 12 extreme vertices
- 40 centers of edges
- 10 centroids of high-dimensional surfaces
- 1 overall centroid
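The extreme vertices in the candidate set can be found by brute force for a small problem like this one. The procedure is to make three inequality constraints active at a time, add the mixture constraint sum(x) = 1, solve, and keep the solutions that satisfy all constraints. The sketch below uses the BubbleOpt bounds and the constraint 0.3 ≤ DWL1 + DWL2 ≤ 0.5. It is an illustration, not the algorithm MODDE uses, and it does not generate edge centres or face centroids.

```python
import itertools

import numpy as np

# BubbleOpt bounds (DWL1, DWL2, water, glycerol) plus 0.3 <= DWL1 + DWL2 <= 0.5
lower = np.array([0.1, 0.1, 0.2, 0.2])
upper = np.array([0.3, 0.3, 0.4, 0.4])


def constraint_rows():
    """All inequality constraints written as A @ x <= b."""
    rows, rhs = [], []
    for i in range(4):
        e = np.eye(4)[i]
        rows += [e, -e]                       # x_i <= U_i and -x_i <= -L_i
        rhs += [upper[i], -lower[i]]
    dwl_sum = np.array([1.0, 1.0, 0.0, 0.0])
    rows += [dwl_sum, -dwl_sum]               # DWL1 + DWL2 <= 0.5 and >= 0.3
    rhs += [0.5, -0.3]
    return np.array(rows), np.array(rhs)


def extreme_vertices(tol=1e-9):
    """Brute force: make 3 inequalities active, add sum(x) = 1, keep feasible solutions."""
    A, b = constraint_rows()
    vertices = []
    for active in itertools.combinations(range(len(b)), 3):
        M = np.vstack([A[list(active)], np.ones(4)])
        rhs = np.append(b[list(active)], 1.0)
        if abs(np.linalg.det(M)) < tol:
            continue                          # active constraints are not independent
        x = np.linalg.solve(M, rhs)
        feasible = np.all(A @ x <= b + tol)
        if feasible and not any(np.allclose(x, v) for v in vertices):
            vertices.append(x)
    return np.array(vertices)


V = extreme_vertices()
print(len(V))                    # number of extreme vertices
print(V.mean(axis=0).round(3))   # AVG centroid (average of all extreme vertices)
```

By our count this gives the 12 extreme vertices listed above. Their average (the AVG centroid) is (0.2, 0.2, 0.3, 0.3), the same point as the calculated reference mixture.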
BubbleOpt: 4. Generation of design
Degrees of freedom needed by the model:
- 1 DF for the constant
- 3 DF for the linear terms
- 3 + 3 DF for the quadratic and interaction terms
Add 5 extra experiments to get enough DF. In addition, 2 supplementary experiments are needed to handle the complexity introduced by the linear constraint. This leads to 17 experiments.
Selected design with 24 runs (Geff = 83%, CondNo = 16.8)
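The DF count above is the number of terms in a quadratic mixture model for q = 4 components: q linear blending terms plus q(q - 1)/2 binary terms, i.e. 10, which equals 1 + 3 + 3 + 3. The following is a short sketch of that count and of the lead number of runs. The function name is ours, and the 5 + 2 extra runs are the rule of thumb quoted on this slide.

```python
from math import comb


def scheffe_quadratic_terms(q):
    """Number of terms in a quadratic mixture model: q linear + q(q-1)/2 binary blending."""
    return q + comb(q, 2)


q = 4                                   # DWL1, DWL2, water, glycerol
n_terms = scheffe_quadratic_terms(q)    # same total as 1 constant + 3 linear + 6 quadratic/interaction
n_lead = n_terms + 5 + 2                # extra runs + constraint runs, as on the slide
print(n_terms, n_lead)
```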
BubbleOpt: 5. Evaluation of size and shape of mixture region
[Figure: mixture region plots at Glycerol = 0.2, Glycerol = 0.3 and Glycerol = 0.4]
BubbleOpt: 6. Definition of reference mixture
Calculated reference mixture: (0.2 / 0.2 / 0.3 / 0.3) (DWL1 / DWL2 / water / glycerol)
This reference mixture is identical to the verifying experiment used previously.
BubbleOpt: 7. Execution of design
Carry out the experiments in randomized order.
Bubble lifetime ranges between 647 and 1348 sec
BubbleOpt: 8. Analysis of data and evaluation of model
The regression model was fitted with PLS: a good model.
[Figure: summary of fit (R2, Q2, model validity, reproducibility) and normal probability plot of standardized residuals for Lifetime~ with experiment number labels, investigation Bubb_rsm (PLS, 2 components)]
N = 24, DF = 14, Cond. no. = 12.3206, Y-miss = 0; R2 = 0.919, Q2 = 0.708, R2 adj. = 0.868, RSD = 0.0358
BubbleOpt: 9. Visualization of modelling results
[Figure: scaled and centred coefficients for Lifetime~ (DW1, DW2, Wa, Gly, their squares and pairwise interactions), investigation Bubb_rsm (PLS, 2 components); contour plot panel at Glycerol = 0.2, Temp = 14, Time = 13]
N = 24, DF = 14, R2 = 0.919, Q2 = 0.708, R2 adj. = 0.868, RSD = 0.0358, confidence level = 0.95
[Figure: Raw Data Plot of Log (Lifetime) versus ingredient cost, with experiment number labels]
Ingredient cost is easy to take into consideration.
Lowest ingredient cost with long-lasting bubbles
Conclusions, Bubble example

The sequence 1) screening, 2) RSM is very fruitful for rational experimental work. We were able to increase bubble lifetime from 6.02 to 22.28 min (i.e. from 6 min 02 s = 362 s to 22 min 28 s = 1348 s, the longest lifetimes in the screening and RSM series). The key to success was to increase glycerol substantially. Long-lasting bubbles are obtained with:
- a cooled solution
- 25 h settling time (not popular with kids)
- the formulation DWL1 = 0.23, DWL2 = 0.1, water = 0.27, glycerol = 0.4
- a red plastic bubble wand
Mixture Designs, Summary

To obtain the best design you must determine:
- the factors
- the bounds (low-high)
- the type of region: regular or irregular?
- the experimental objective: screening or optimization?
- the number of runs
... and use PLS for modelling!
Design of Experiments (DoE) Pharma Applications

Section 13: Exercises
Overview of Exercises Layout

Each exercise contains the following headlines:
- Background (Why this investigation?)
- Objective (What is the goal of the exercise?)
- Data (Description of X, Y and the observations, originator(s) and literature source(s))
- Tasks (What you are expected to do in this exercise)
- Solutions (A proposed solution to the tasks given)
- Conclusions (Emphasising the main points of the exercise)
Please do not hesitate to ask the course instructor(s) for help or advice. Remember that our solutions are just proposals; other alternatives might exist.
Exercises
Getting started: ByHand, CakeMix
Screening (full factorial designs): Pain, Tablets, Protein, Spray-Drying
Screening (fractional factorial designs): Pilot Plant, RGA-Phase 1, RGA-Phase 2, Chromspher_B
Optimization: Chiral Separation, Metabolism, RGA-Phase 3, Willge, DrogenD
Robustness Testing: Nonafact, RGA-Phase 4, HPLC Robustness
Robust Design: CakeTaguchi, LoafVolume
D-optimal Design: Model Updating
Blocking: Blocking
Mixture Design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp
DOE-Exercise ByHand (Full Fac)

Chemical synthesis: Reduction of Enamine
Background
Enamines are reduced by formic acid to saturated amines. In this example morpholine-camphor enamine is the starting material. Design of experiments (DOE) was used to investigate how much formic acid is necessary and at which temperature the reaction should be carried out.
Objective
The original objective was to make a model for three responses. Our first objective is to do the calculations by hand to get an understanding of the arithmetic involved. After that, you should familiarise yourself with the software and perform the same calculations on the computer. The experimental goal was to minimise the amount of side product (Camphor) and the amount of unreacted starting material (Enamine), while maximising the yield of the desired product.
Data
Factors                                        Levels (- / 0 / +)
x1  Amount formic acid/enamine (mole/mole)     1.0 / 1.25 / 1.5
x2  Reaction temperature (°C)                  25 / 62.5 / 100

Responses                                      Goal
y1  Camphor (side product) %                   to be minimised
y2  Enamine unreacted %                        to be minimised
y3  The desired product %                      to be maximised

Exp. no   x1     x2     y1     y2     y3
1         1.0    25     6.7    12.5   80.4
2         1.5    25     10.5   14.0   72.4
3         1.0    100    5.5    0.0    94.4
4         1.5    100    7.7    0.0    90.6
5         1.25   62.5   7.5    13.1   84.5
6         1.25   62.5   7.9    13.5   85.2
7         1.25   62.5   7.8    13.3   83.8
Tasks
Task 1
Calculate by hand the coefficients of the equation $Y = b_0 + b_1x_1 + b_2x_2 + b_{12}x_1x_2 + e$. Do these calculations only for the first response, y1. Do not include the centre points in these calculations, except when calculating the constant b0; the centre points are used for diagnostics. Hint: use the sign table presented in the lecture.
Task 2
Initiate a new investigation in MODDE and define the two factors and the three responses according to the information above. Do File/New and give the investigation a name. Press Next. Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and low and high settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press OK. Press Next. The factors are now defined. Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the first response. Press Add another and give the details of the second response. Press Add another and enter the information regarding the third response. Press OK. Press Next. The responses are now defined. Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in four runs. Verify that the number of Centre Points = 3 and Total runs = 7. Press Finish. Set Worksheet Run order to detect curvature and press OK. The experimental design has now been generated. Enter the response values in the resulting worksheet. Now we are ready for data analysis.
Task 3
Evaluate the raw data. Make replicate plots (Worksheet/Replicate Plot) and histograms (Worksheet/Histogram) to examine the responses. Do Analysis/Fit. Evaluate the model. For which responses is the model reliable? What do you think could be the problem with the misbehaving response? Discuss.
Task 4
Look at the contour plot for each response (Prediction/Contour Plot Wizard). Which conditions should be chosen for preparative large-scale reduction of enamines of the morpholine-camphor type?
Solutions to ByHand
Task 1
Sign table (coded units):
Exp. no   b0   b1   b2   b12
1         +    -    -    +
2         +    +    -    -
3         +    -    +    -
4         +    +    +    +
5         +    0    0    0
6         +    0    0    0
7         +    0    0    0

b0 = (6.7 + 10.5 + 5.5 + 7.7 + 7.5 + 7.9 + 7.8)/7 = 7.657
b1 = (-6.7 + 10.5 - 5.5 + 7.7)/4 = 1.5
b2 = (-6.7 - 10.5 + 5.5 + 7.7)/4 = -1.0
b12 = (6.7 - 10.5 - 5.5 + 7.7)/4 = -0.4
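The sign-table arithmetic can be reproduced with NumPy. A least-squares fit on all seven runs gives the same coefficients because, in coded units, the centre points only contribute to b0. The following is a minimal sketch; the variable names are ours.

```python
import numpy as np

# Coded design: experiments 1-4 are the 2^2 factorial, 5-7 the centre points
x1 = np.array([-1, 1, -1, 1, 0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0, 0, 0])
y1 = np.array([6.7, 10.5, 5.5, 7.7, 7.5, 7.9, 7.8])

# Sign-table (hand) calculation: b0 uses all runs, the other effects only the factorial runs
fact = slice(0, 4)
b0 = y1.mean()
b1 = (x1[fact] * y1[fact]).sum() / 4
b2 = (x2[fact] * y1[fact]).sum() / 4
b12 = (x1[fact] * x2[fact] * y1[fact]).sum() / 4
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b12, 3))

# Same coefficients by least squares on all 7 runs
X = np.column_stack([np.ones(7), x1, x2, x1 * x2])
b_ls, *_ = np.linalg.lstsq(X, y1, rcond=None)
print(b_ls.round(3))
```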
Task 3
We start by evaluating the raw data. The three replicate plots indicate that the replicate error is small for each response, which is favourable for the data analysis. It is possible to use the replicate plot to get a rough understanding of the relationships between the factors and the responses. We are going to fit an interaction model to each response. For such a model to be valid, the measurement values of the centre-points should be found in the middle part of the response interval. This is the case for y1 and y3, but not for y2. Hence, the replicate plot for y2 suggests that the relationship between y2 and the factors is curved (non-linear), which is impossible to describe with an interaction model.
[Figure: replicate plots of y1, y2 and y3 versus replicate index, with experiment number labels (investigation Byhand)]
Next, we create one histogram for each response. The three responses are approximately normally distributed and a need for response transformation cannot be detected.
[Figure: histograms of y1, y2 and y3 (investigation Byhand)]
After the raw data evaluation it is appropriate to carry out the regression modelling. According to the summary of fit plot, the model is reliable for all responses except y2, Enamine unreacted. The reason for this can be any of the following:
- the response includes an outlier
- a mistake was made in recording the response, for example the zeros are missing values
- the model is too simple
- the model is too complicated
[Figure: summary of fit for y1, y2 and y3, investigation Byhand (MLR), N = 7, DF = 3]
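To see numerically why y2 is singled out, R2, adjusted R2 and Q2 can be computed for the interaction model on the ByHand data. This sketch assumes the usual definitions R2 = 1 - SS_res/SS_tot and Q2 = 1 - PRESS/SS_tot, with PRESS from leave-one-out prediction (obtained here from the leverages). MODDE's exact cross-validation settings are not stated in this handout. The helper name `summary_of_fit` is ours.

```python
import numpy as np

# ByHand design in coded units (Exp. 1-4 factorial, 5-7 centre points)
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones(7), x1, x2, x1 * x2])   # interaction model
responses = {
    "y1": np.array([6.7, 10.5, 5.5, 7.7, 7.5, 7.9, 7.8]),
    "y2": np.array([12.5, 14.0, 0.0, 0.0, 13.1, 13.5, 13.3]),
    "y3": np.array([80.4, 72.4, 94.4, 90.6, 84.5, 85.2, 83.8]),
}


def summary_of_fit(X, y):
    """Return R2, adjusted R2 and leave-one-out Q2 for an MLR fit of y on X."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    leverage = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)
    press = ((resid / (1.0 - leverage)) ** 2).sum()   # squared leave-one-out errors
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1))
    q2 = 1.0 - press / ss_tot
    return r2, r2_adj, q2


for name, y in responses.items():
    r2, r2_adj, q2 = summary_of_fit(X, y)
    print(f"{name}: R2 = {r2:.3f}, R2adj = {r2_adj:.3f}, Q2 = {q2:.3f}")
```

For y1 the sketch gives a high R2 and Q2, whereas Q2 for y2 comes out strongly negative. The centre points sit far above the mean of the factorial points, so leaving out any factorial run ruins its prediction, which is the curvature problem discussed in this solution.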
Since we understood from the replicate plot of y2 that curvature is involved, it is likely that the fitted model is too simple. This can easily be checked by making plots of the raw data.
[Figure: raw data plots of y2 versus x1 and versus x2, with experiment number labels (investigation Byhand)]
From the scatter plots shown above the curvature is obvious. Such curvature can only be adequately captured by quadratic model terms, i.e. $x_1^2$ and $x_2^2$. The conclusion is therefore that with the current experimental design we cannot make a good model for y2. To estimate quadratic model terms the design must be expanded into a composite design.
Task 4
According to the response contour plots the temperature should be as high as possible and the ratio formic acid/enamine as low as possible. With these conditions we minimise the amount of Camphor (y1) and Enamine (y2) and maximise the amount of Product (y3).
Optimal point: low x1, high x2
NOTE: Because of the model weakness with regard to y2, we should interpret the second response contour plot with some caution.
Conclusions
The optimal point is low x1 (low molar ratio) and high x2 (high temperature). The model for y2 is weak because the relationship between the factors and this response is non-linear.
DOE-Exercise CakeMix (Full Fac)

Finding optimal CakeMix composition
Background
The producer of a commercial cake-mix experienced problems with the quality of the resulting cake in that there was considerable taste variation.
Objective
It was decided to use DOE to discover which combination of ingredients produced a tasty cake and which combination produced a reasonable cake at low cost.
Data
Three factors were studied: Flour, Shortening, and Eggpowder. The investigators used a design centred around the standard condition Flour = 300g, Shortening = 75g, and Eggpowder = 75g. Eleven experiments were made using a 2³ full factorial design augmented with three replicated centre-points. The response is the average taste as assessed by a trained sensory panel.
Goal: Maximize
Tasks
Task 1
Define a new investigation in MODDE with three factors and one response. Do File/New and name the investigation. Press Next. Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press Add another and enter the details of the third factor. Press OK. Press Next. The three factors have now been defined. Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the Taste response. Press OK. Press Next. Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in eight runs. Verify that Centre Points = 3 and Total runs = 11. Press Finish. Set Worksheet Run Order to detect curvature. Enter the response values in the generated worksheet. Now you are ready to analyse the data.
Evaluate the raw data. Fit the regression model. Which factors affect taste? Are there any non-significant model terms? What about lack of fit? Which factor combination gives an optimal taste?
Task 2
It is possible to take the cost of ingredients into account in the data analysis. The following prices were obtained:
- Flour: 2.95 SEK/kg (0.00295 SEK/g)
- Shortening: 14.70 SEK/kg (0.0147 SEK/g)
- Eggpowder: 32.30 SEK/kg (0.0323 SEK/g)
Define a new response, a Derived response. Select Design and Responses. Double-click on the empty row. Define a derived response and press Edit, Next and Finish to enter the formula. Select each ingredient from the list and multiply it by its cost per gram, as shown below (NB: the parentheses shown in the formula are used only for clarity; they are not needed in reality). Also note that this task does not work with a comma as the decimal separator.
Refit the model. Find a recipe which represents a good compromise between a tasty cake and low cost. (Hint: Use Prediction/Contour Plot Wizard).
Solutions to CakeMix
Task 1
We start by evaluating the raw data. First, we examine the curvature diagnostics plot (Worksheet/Curvature Diagnostics Plot) for taste (see below). This plot is constructed by plotting the value of Taste at three points, (1) the -/-/- factor combination, (9) the 0/0/0 factor combination, and (8) the +/+/+ factor combination. It is useful to examine whether the relationship between one response and the factors deviates from linearity. In this case the deviation from linearity is only mild and we may continue with the rest of the experiments. Whenever this plot exhibits strong curvature, reduce the range of the factors by 2/3.
[Figure: curvature diagnostics plot for Taste (investigation Cakemix): Taste at the Low, Center and High factor settings (experiments 1, 9 and 8)]
The value inside the parentheses of each X-axis label is the normalized distance from the plotted experiment to the ideal design point of the design.
The replicate plot shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed. This means that we have good data to work with.
[Figure: plot of replications for Taste with experiment number labels, and histogram of Taste (investigation Cakemix)]
In the data analysis it is recommended to first examine the Summary of fit plot. This plot shows that we can explain 99% (R2 = 0.99) and predict 87% (Q2 = 0.87) of the response variation. The adequacy of the model is further indicated by MVal = 0.71 and Rep = 0.99. MVal measures the validity of the model and Rep the reproducibility. When the MVal bar is larger than 0.25, there is no Lack of Fit of the model (the model error is in the same range as the pure error). This is also shown by the ANOVA-table below, where the lower p-value is larger than 0.05, which means that the model exhibits no significant lack of fit. The upper p-value is smaller than 0.05, indicating that R2 is statistically significant. If the reproducibility is below 0.5, you have a large pure error, poor control of the experimental set up (the noise level is high), and you cannot assess the validity of the model. This results in low R2 and Q2. You should improve the reproducibility.
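The lack-of-fit logic in the paragraph above (model error compared with pure error from the replicated centre points) can be sketched with NumPy. The CakeMix responses are not tabulated in this handout, so the sketch reuses the ByHand data (three centre-point replicates, interaction model). The function name is ours. The p-value step is left as a comment because it needs an F-distribution routine such as `scipy.stats.f.sf`.

```python
import numpy as np

# ByHand design in coded units with 3 replicated centre points (runs 5-7)
x1 = np.array([-1, 1, -1, 1, 0, 0, 0], dtype=float)
x2 = np.array([-1, -1, 1, 1, 0, 0, 0], dtype=float)
X = np.column_stack([np.ones(7), x1, x2, x1 * x2])
centre = slice(4, 7)
responses = {
    "y1": np.array([6.7, 10.5, 5.5, 7.7, 7.5, 7.9, 7.8]),
    "y2": np.array([12.5, 14.0, 0.0, 0.0, 13.1, 13.5, 13.3]),
}


def lack_of_fit_F(X, y, replicate_idx):
    """Split the residual SS into lack of fit and pure error; return F = MS_lof / MS_pe.

    Assumes a single group of replicates (here: the centre points).
    """
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = ((y - X @ b) ** 2).sum()
    reps = y[replicate_idx]
    ss_pe = ((reps - reps.mean()) ** 2).sum()     # pure error from the replicates
    df_pe = len(reps) - 1
    df_lof = (n - p) - df_pe
    F = ((ss_res - ss_pe) / df_lof) / (ss_pe / df_pe)
    return F, df_lof, df_pe


for name, y in responses.items():
    F, df_lof, df_pe = lack_of_fit_F(X, y, centre)
    print(name, round(F, 2), df_lof, df_pe)
    # p-value, if SciPy is available: scipy.stats.f.sf(F, df_lof, df_pe)
```

For y1 the F value is small (no lack of fit). For y2 it is very large, because the pure error from the three centre points is tiny compared with the misfit of the interaction model.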
[Figure: summary of fit for Taste, investigation Cakemix (MLR), N = 11, DF = 4]
Another diagnostic tool that is often used is the N-plot of residuals. However, with only 11 experiments it is difficult to define which residuals are normally distributed and which are not. In the plot below, the main thing to confirm is that all the experiments lie within 4 SDs, which they do. Inspection of the regression coefficients indicates that two model terms, Fl*Sh and Fl*Egg, are non-significant and can be removed from the model.
[Figure: normal probability plot of residuals for Taste with experiment number labels, and scaled and centred coefficients for Taste (Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg), investigation Cakemix (MLR)]
N = 11, DF = 4, R2 = 0.995, Q2 = 0.874, R2 adj. = 0.988, RSD = 0.0768, confidence level = 0.95
After refitting the model a higher Q2 (0.94) is obtained. MVal and the ANOVA table also indicate the usefulness of the model. In the N-plot of residuals, experiment #1 is located beyond 4 SDs, but it is considered harmless given the high Q2 of about 0.94.
[Figure: summary of fit, normal probability plot of deleted studentized residuals, and scaled and centred coefficients for Taste after removing Fl*Sh and Fl*Egg, investigation Cakemix (MLR)]
N = 11, DF = 6, R2 = 0.988, Q2 = 0.937, R2 adj. = 0.980, RSD = 0.0974, confidence level = 0.95
The coefficient plot indicates that the largest model term is the Sh*Egg interaction. It is normal to explore such interactions by means of response contour plots. The three response contour plots shown below indicate that the highest value of taste is found with the factor settings Flour = 400g, Shortening = 50g and Eggpowder = 100g.
Task 2
In Task 1 we found that Flour should be fixed at its high level in order to produce a tasty cake. This ingredient is also the cheapest one. The contour plots shown below were constructed using Flour = 400 g.
Apparently, we should stay in the upper left-hand corner to maximize taste. In this corner, the predicted ingredient cost is 5.14 SEK. However, the lower right-hand corner represents a reasonable compromise between taste and cost. Here the predicted cost is just 4.27 SEK.
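Because the derived Cost response is just a linear combination of the factors, the two predicted costs quoted above can be checked directly. The following is a minimal sketch using the prices from Task 2; the function and constant names are ours.

```python
# Ingredient prices from Task 2, in SEK per gram
PRICE_SEK_PER_G = {"Flour": 0.00295, "Shortening": 0.0147, "Eggpowder": 0.0323}


def recipe_cost(flour, shortening, eggpowder):
    """Derived response: ingredient cost in SEK for one recipe (amounts in grams)."""
    return (flour * PRICE_SEK_PER_G["Flour"]
            + shortening * PRICE_SEK_PER_G["Shortening"]
            + eggpowder * PRICE_SEK_PER_G["Eggpowder"])


best_taste = recipe_cost(400, 50, 100)     # upper left-hand corner of the contour plot
compromise = recipe_cost(400, 100, 50)     # lower right-hand corner of the contour plot
print(round(best_taste, 3), round(compromise, 3))
```

The results, about 5.145 and 4.265 SEK, agree with the 5.14 and 4.27 SEK read from the contour plots to within rounding.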
Conclusions
To maximize taste we should use Flour = 400g, Shortening = 50g and Eggpowder = 100g. To obtain a compromise between high taste and low cost an alternative factor combination would be Flour = 400g, Shortening = 100g, and Eggpowder = 50g.
DOE-Exercise Pain (Full Fac)

Combinations of active ingredients in a pain-reliever
Background
A new combination of constituents in a formulation with pain-relieving capacity was investigated. The formulation contained two active components, A and B, and the effect of different combinations of these were examined. The response was the time (in minutes) needed for the formulation to reach full anaesthetic effect (the average from testing 12 persons). The desirable result was full effect after 5 minutes. Substance A costs 60 times more than B to produce. Since every experiment was very expensive the number of experiments was minimised.
Objective
The first objective is to optimise time, with two variables, through the use of contour plots. Another objective is to consider production economy.
Data
Goal: 5 minutes
Tasks
Task 1
Define a new investigation according to the information given above. The default experimental plan in MODDE is the one used in this application. Fit the regression model. Construct a plot that shows under which conditions the formulation achieves the desired effect.
Task 2
Which approved combination of A and B is the most economical? Do this with graphical tools. (Hint: Add the derived response Cost).
Task 3
If the desired result were full effect within 5 minutes, what would the answer to Task 2 be (at the 95% confidence level)?
Hint: use the Prediction menu.
Solutions to Pain
Task 1
In the data analysis it is recommended to examine the R2/Q2 plot (Summary of Fit) first. This plot shows that we can explain 99% (R2 = 0.99) and predict 98% (Q2 = 0.98) of the response variation. The statistics for model validity (MVal = 0.94) and reproducibility (Rep = 0.98) also point to an excellent model. The coefficient plot shows that constituent A (CA) affects the release time more strongly than constituent B (CB). If the release time is to be minimised, we should increase the amounts of both constituents.
[Figures: Summary of Fit and Scaled & Centered Coefficients for Release time (model terms CA, CB, CA*CB); N = 7, DF = 3, R2 = 0.995, Q2 = 0.984, R2 Adj. = 0.990, RSD = 0.1676, confidence level 0.95]
Any point along the line Release time = 5 fulfils the experimental goal. (Hint: the number of steps in a contour plot can be changed by double-clicking in the plot, selecting Contour Levels and changing the number of steps and/or min/max. Here, we used 13 steps when going from Min = 4 to Max = 10.)
Task 2
We added a new response, Cost, derived from the factors.
The most economical combination of A and B contains as little A as possible, i.e., approximately CA = 8.9 and CB = 100. At this point, the predicted cost is 634 (in arbitrary currency units). The left-hand contour plot gives only the point estimate of Release time; the prediction list shows the uncertainty in the predicted value. As shown by this list, the combination CA = 8.9 and CB = 100 might result in a Release time ranging from 4.6 to 5.4 minutes.
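The derived Cost response can be sketched as a linear function of the two factors. This sketch assumes Cost = 60·CA + CB in arbitrary currency units, based on substance A costing 60 times more than B; the formula is an inference rather than something stated in the exercise, but it reproduces the predicted cost of 634 quoted above. The function name `pain_cost` is hypothetical.

```python
def pain_cost(ca: float, cb: float, price_ratio: float = 60.0) -> float:
    """Hypothetical derived response: ingredient cost in arbitrary units.

    Assumes substance B costs 1 unit per mg and substance A costs
    `price_ratio` times as much, so Cost = price_ratio * CA + CB.
    """
    return price_ratio * ca + cb


# Cheapest point on the Release time = 5 contour (Task 2)
print(round(pain_cost(8.9, 100), 2))   # 634.0
# Recipe that keeps the upper confidence limit at 5 minutes (Task 3)
print(round(pain_cost(9.45, 100), 2))  # 667.0
```

Under the same assumption, the stricter Task 3 recipe costs 33 units more than the Task 2 point.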
Task 3
If we want to make sure that the Release time does not exceed 5 minutes, we have to bring the upper confidence limit down from approximately 5.4 to 5.0. This, in turn, implies that we should be looking for a point estimate of approximately 4.6. From the prediction list given below we conclude that, to be sure that the pain-reliever takes no longer than 5 minutes to reach full effect, we need 9.45 mg of substance A and 100 mg of substance B (the cheapest solution). The limits are given with 95% confidence.
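The step from an upper limit of 5.0 to a target point estimate of about 4.6 is simple arithmetic. A minimal sketch, assuming the confidence interval keeps roughly the same half-width (about 0.4 min, read from the interval 4.6 to 5.4 in Task 2) near the chosen factor settings; in practice the half-width varies across the design space, which is why the final recipe should still be checked in the prediction list. The helper name is hypothetical.

```python
def max_point_estimate(limit: float, lower: float, upper: float) -> float:
    """Largest admissible point estimate so that the upper confidence
    limit does not exceed `limit`, assuming the half-width observed in
    the interval [lower, upper] stays roughly constant nearby."""
    half_width = (upper - lower) / 2.0
    return limit - half_width


# Interval reported in Task 2 at CA = 8.9, CB = 100: 4.6 to 5.4 minutes
target = max_point_estimate(limit=5.0, lower=4.6, upper=5.4)
print(f"{target:.2f}")  # 4.60
```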
Conclusions
In order to accomplish full anaesthetic effect within five minutes we may use the recipe constituent A = 9.45 mg and constituent B = 100 mg. This combination of ingredients is the most economical one.
DOE-Exercise Tablet (Full Fac)

Variation in the thickness of pharmaceutical tablets
Background
A manufacturer experienced problems with variation in the thickness of tablets, which caused problems during packaging. The problem was tackled by determining which factors had the largest influence on tablet thickness. Three factors considered to have an impact on the thickness were investigated using experimental design: the amount of stearate (lubricant), the amount of active substance, and the amount of starch.
Objective
The objective of the investigation was to produce an experimental design and model the response. The goal was to produce a 5 mm thick tablet with a fixed level (90 mg) of active substance.
Data
Goal: 5 mm
Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Select Screening as the objective. Accept the recommended 11-run design (full factorial design in 8 runs plus 3 centre-points). Enter the response values in the Worksheet.
Task 2
Do Analysis/Fit. Determine which factors have the strongest influence on the thickness of tablets by looking at the coefficient plot. Are there any interaction effects present?
Task 3
How would you produce a 5 mm thick tablet with 90 mg active substance? With what precision can this be done (5 mm ± ?)? Hint 1: Use a response contour plot to find suitable factor combinations at which to perform predictions. Hint 2: Use the prediction list to compute the predicted value and its associated confidence interval.
Solutions to Tablet
Task 2
As seen below, the factors active substance and starch have a strong influence on the thickness of the tablets. The factor stearate has a small influence. The interaction between Stearate and Starch is small but should be included since the R2 and Q2 numbers decline when it is removed.
[Figures (full interaction model): Summary of Fit and Scaled & Centered Coefficients for thickness; N = 11, DF = 4, R2 = 0.969, Q2 = 0.504, R2 Adj. = 0.922, RSD = 0.1076, confidence level 0.95]
[Figures (refined model with ste, actsu, sta, ste*sta): Summary of Fit and Scaled & Centered Coefficients for thickness; N = 11, DF = 6, R2 = 0.953, Q2 = 0.808, R2 Adj. = 0.921, RSD = 0.1083, condition number 1.1726, confidence level 0.95]
The two lower plots show the result when insignificant terms have been removed. The principle for removing model terms is maximisation of Q2: the term ste*actsu has the smallest coefficient and is removed first. The model is then recalculated with the remaining terms, and Q2 is compared with that of the original model. In this case Q2 increases from 0.50 to 0.78. We then continue by removing the second smallest interaction term, actsu*sta, and check Q2 again. Once more Q2 increases, from 0.78 to 0.81. When the last interaction term, ste*sta, is removed, Q2 drops a little. This indicates that the four-term model displayed above is predictively the optimal one. This example also shows that a small model term need not be excluded from the model just because it is insignificant according to the confidence interval criterion.
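The Q2-driven pruning described above can be sketched in Python with NumPy. Q2 is computed here as 1 - PRESS/SS, with PRESS obtained from leave-one-out refits; this is a standard definition, but MODDE's own cross-validation settings may differ. The function names, the simulated response and the stopping rule (stop as soon as removing the smallest remaining interaction fails to raise Q2) are illustrative assumptions; the data are not the Tablet measurements.

```python
import itertools

import numpy as np


def q2_score(X: np.ndarray, y: np.ndarray) -> float:
    """Q2 = 1 - PRESS/SS, where PRESS sums squared leave-one-out errors."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += float((y[i] - X[i] @ beta) ** 2)
    ss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press / ss


def prune_by_q2(X, y, names, removable):
    """Repeatedly drop the removable term with the smallest |coefficient|
    as long as doing so increases Q2; return kept terms and final Q2."""
    cols = {name: X[:, j] for j, name in enumerate(names)}
    active = list(names)

    def design(terms):
        return np.column_stack([cols[t] for t in terms])

    best_q2 = q2_score(design(active), y)
    while True:
        candidates = [t for t in active if t in removable]
        if not candidates:
            break
        beta, *_ = np.linalg.lstsq(design(active), y, rcond=None)
        smallest = min(candidates, key=lambda t: abs(beta[active.index(t)]))
        trial = [t for t in active if t != smallest]
        trial_q2 = q2_score(design(trial), y)
        if trial_q2 <= best_q2:
            break  # removal would not improve predictive ability
        active, best_q2 = trial, trial_q2
    return active, best_q2


# Illustrative data only: a 2^3 full factorial plus 3 centre-points
# (11 runs, as in the Tablet exercise) with a simulated response.
runs = [list(p) for p in itertools.product([-1, 1], repeat=3)] + [[0, 0, 0]] * 3
F = np.array(runs, dtype=float)
a, b, c = F.T
names = ["const", "a", "b", "c", "a*b", "a*c", "b*c"]
X = np.column_stack([np.ones(len(F)), a, b, c, a * b, a * c, b * c])
rng = np.random.default_rng(0)
y = 5.0 + 0.05 * a + 0.4 * b + 0.3 * c + 0.1 * a * c + rng.normal(0, 0.05, len(F))

terms, q2 = prune_by_q2(X, y, names, removable={"a*b", "a*c", "b*c"})
print(terms, round(q2, 3))
```

Only interaction terms are offered for removal here, mirroring the exercise, where the main effects were kept in the model.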
Task 3
In the contour plot, the line where the tablet thickness is predicted to be 5 mm is of interest. Below, we give some predictions for different factor combinations along the 5 mm line. According to these predictions, the tablets can be made with a precision from 5.00 ± 0.09 mm to 5.00 ± 0.12 mm, depending on which factor combination is selected for production.
[Figure: response contour plot of thickness, with active substance fixed at 90 mg]
Conclusions
Active substance and starch are the two ingredients that most strongly affect tablet thickness. According to model predictions, the tablets can be made with a precision of 5.00 ± 0.09 mm if the factor combination stearate = 1.0 mg, active substance = 90 mg, and starch = 44.5 mg is used in manufacturing.
PROTEIN SPRAY-DRYING (Full Fac)

Investigating the effect of process variables on the degradation of spray-dried protein
Background
Spray-drying is a process often used for drugs intended for inhalation. For the spray-drying of proteins, the prime interest is to produce particles of controlled size. Additionally, it is important that the protein temperature remains rather low to avoid unnecessary denaturation. Protein degradation may involve many complicated physical and chemical processes, including denaturation. Therefore, we would like to study protein stability at a molecular level in order to facilitate formulation applications.
Objective
This example is based on a model protein (D7599) developed by AstraZeneca. Protein powders of D7599 were produced by spray-drying. The experimental objective of this study was to determine which process parameters influence the quality of the spray-dried product. The data analysis will involve dealing with several responses which are not completely correlated. Original data source: Cronholm, M., The Effect of Process Variables on a Spray-dried Protein Intended for Inhalation, Undergraduate Research Study, Department of Pharmaceutics, Uppsala University, Uppsala, Sweden, 1998.
Data
Spray-drying conditions were varied using a full factorial design in four factors:
Inlet temperature: temperature of the drying air at the inlet of the equipment. The high and low levels of this factor were set such that degradation would be expected at the high level (220 °C) but not at the low level (100 °C).
Atomization gas flow: the low level (500 l/h) of the atomization gas (nitrogen) was the minimum required to achieve sufficient energy for atomization. The high level (800 l/h) was the maximum achievable flow with this spray-dryer.
Aspiration rate: the aspirator draws air through the instrument; this was varied from 60% to 100% (full capacity).
Feed-flow: the material flow through the equipment. The high level of 5 ml/min was the maximum rate which could be used at the low temperature without condensation appearing in the drying chamber, whereas the low level (2 ml/min) was chosen as the slowest practical rate.
To characterize the outcome of the spray-drying the following five responses were measured:
Yield: the amount of product produced. Should be maximized.
Size: particle size. Ideally, particles should be in the range 0.5 to 3.3 µm in order to reach the lower airways.
Water: water content in the spray-dried protein. To be minimized.
Outlet temperature: outlet drying air temperature. This temperature may influence protein degradation and was therefore included. No specific target value was specified for this response.
HMWP: high molecular weight proteins. Measures the extent of aggregation, i.e., formation of dimers and oligomers of the protein. Should be as low as possible.
Tasks
Task 1
Initiate a new investigation in MODDE. Define the four factors and the five responses according to the information above. Select Screening and the full factorial design in 16 runs supplemented with three center-points. Enter the response data or copy them from PROTEIN SPRAY DRYING.XLS. Evaluate the raw data. Is there any need for data pre-treatment such as a response transformation?
Task 2
Select MLR as the fit method. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model, if necessary. Use the optimizer to predict good operating parameters.
Task 3
In a MODDE investigation you can only have one model (i.e., one set of model terms) for all the responses. Hence, to generate several models with the same factors and underlying design, but for different responses, we make copies of the original investigation and, in each copy, keep the responses that will be fitted with the same model. Thereafter we can link all these responses into one of the investigations (File/Link Investigation) and optimize them together. Try to improve the modelling results by dividing the responses into two separate investigations: one containing Yield and Size, and another containing Water, Outlet Temp and HMWP. Another possibility is to split the mother investigation into five new investigations and tailor-make one model for each response. Repeat Task 2, but analyze subsets of responses. Optimize the responses together. No solution is provided for this Task.
Solutions to PROTEIN SPRAY DRYING

Task 1
Evaluation of the raw data indicates two things: (i) the replicate error is small for every response, and (ii) the HMWP response needs to be transformed. The last plot in each set of six depicts the situation after log-transforming HMWP using the settings C1 = 1 and C2 = -0.25.
[Figures: replicate plots (Plot of Replications, experiment number labels) for Yield, Size, Water, Outlet Temp, HMWP and the log-transformed HMWP~]
[Figures: histograms of Yield, Size, Water, Outlet Temp, HMWP and HMWP~]
When dealing with many responses, you should always check the correlation matrix, which shows how the responses are correlated with one another. An excerpt of the correlation matrix is shown below. This table indicates that there are two groups of responses. The first group contains Yield and Size, which have a correlation coefficient of 0.75. The second group is made up of Water, Outlet Temp and HMWP, which also have high pairwise correlation coefficients (-0.75, -0.88, and 0.88). Because of this subgrouping, we should not expect all responses to depend in the same way on the terms of the regression model.
Copyright Umetrics AB, 04-02-10
Task 2
MLR was used to fit an interaction model to each of the five responses; each model has 11 terms (the constant, four linear terms, and six two-factor interactions). As seen below, we obtain good models for all responses except HMWP~ (the log-transformed HMWP).
[Figure: Summary of Fit (R2, Q2) for Yield, Size, Water, Outlet Temp and HMWP~, full interaction model; N = 19, DF = 8]
The coefficient overview plot below shows all model coefficients (except the constant term) for each response. The first two responses (Yield and Size) are dominated by the Atomization gas flow; the Aspiration rate also influences Yield. The other three responses are strongly influenced by the Inlet Temperature, and the water content of the spray-dried protein also depends on the Aspiration rate. These different patterns of factor dependence are consistent with the correlation structure noted above: Yield and Size form one correlated group, and Water, Outlet Temp and HMWP another.
[Figure: Normalized Coefficients (coefficient overview) for terms InT, Ato, Asp, FF, InT*Ato, InT*Asp, InT*FF, Ato*Asp, Ato*FF, Asp*FF across Yield, Size, Water, Outlet Temp and HMWP~; N = 19, DF = 8]
In an attempt to improve the five models, two model terms were removed: Ato*FF and Asp*FF. The main benefit was a much better model for HMWP~.
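The degrees of freedom quoted with the plots (DF = 8 for the full interaction model, DF = 10 after the two removals) follow directly from counting model terms. A short check, assuming N = 19 runs (16 factorial runs plus 3 centre-points) and the interaction model described above; the helper names are hypothetical.

```python
from math import comb


def interaction_model_terms(k: int) -> int:
    """Constant + k linear terms + all k(k-1)/2 two-factor interactions."""
    return 1 + k + comb(k, 2)


def residual_df(n_runs: int, n_terms: int) -> int:
    """Residual degrees of freedom of an MLR fit."""
    return n_runs - n_terms


n_runs = 16 + 3                        # 2^4 full factorial + 3 centre-points
p_full = interaction_model_terms(4)    # 11 terms
p_revised = p_full - 2                 # Ato*FF and Asp*FF removed
print(p_full, residual_df(n_runs, p_full))        # 11 8
print(p_revised, residual_df(n_runs, p_revised))  # 9 10
```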
[Figure: Summary of Fit (R2, Q2) for the revised models (Ato*FF and Asp*FF removed); N = 19, DF = 10]
The coefficients of the revised models are plotted below in the Coefficient Overview plot.
[Figure: Normalized Coefficients (coefficient overview) for terms InT, Ato, Asp, FF, InT*Ato, InT*Asp, InT*FF, Ato*Asp across Yield, Size, Water, Outlet Temp and HMWP~; N = 19, DF = 10]
Further review of the models using normal probability plots of residuals (N-plots) shows a mild outlier for Outlet Temp (experiment 10), but because R2 and Q2 are high for this response, this point is not alarming. The N-plots are not shown. We then used the above models together with the Optimizer to predict a factor combination representing good operating conditions. The response desirabilities were set according to the experimental goals stated in the Data section.
The results of running the Optimizer are shown below. Not all response desirabilities are completely fulfilled, but many of the simplexes have converged to points where most of the goals are met. The requirement on Water is the most difficult to satisfy. The following approximate operating parameters are suggested in order to comply with as many of the goals as possible: Inlet temperature 160 °C, Atomization gas flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min.
Comment: Frequently, the optimizer is run iteratively in several steps, letting the results of one stage dictate how to relax the factor limits in the next. A practical way to do this is to use the optimizer first for interpolation and then for extrapolation. In the current application, however, the factor limits could not be changed in a second cycle of the optimizer, since they were already set according to the performance limitations of the equipment.
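The way the Optimizer trades off several response goals can be illustrated with Derringer-type desirability functions: each predicted response is mapped to a desirability between 0 and 1, and the values are combined with a geometric mean. This is a well-known technique substituted here for illustration; it is not necessarily how MODDE scores the responses, the simplex search itself is not reproduced, and all limits and predictions below are hypothetical.

```python
from math import prod


def d_maximize(y: float, low: float, target: float) -> float:
    """Larger-is-better desirability: 0 at/below `low`, 1 at/above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return (y - low) / (target - low)


def d_minimize(y: float, target: float, high: float) -> float:
    """Smaller-is-better desirability: 1 at/below `target`, 0 at/above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return (high - y) / (high - target)


def overall_desirability(ds: list[float]) -> float:
    """Geometric mean: any single response at 0 makes the whole point 0."""
    return prod(ds) ** (1.0 / len(ds))


# Hypothetical predictions and limits, for illustration only.
d_yield = d_maximize(45.0, low=20.0, target=60.0)
d_water = d_minimize(4.0, target=2.0, high=5.0)
print(round(overall_desirability([d_yield, d_water]), 3))
```

In such a scheme, a response whose goal is hard to reach (like Water here) pulls the overall desirability down even when the other goals are met.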
Conclusions
It is possible to develop strong models for the five responses. Good operating conditions predicted by the models are: Inlet temperature 160 °C, Atomization gas flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min. A further experiment should be performed to verify the results at this point, and future work could involve an optimization study anchored around these settings.
DOE-Exercise PILOT PLANT (Frac Fac 2^(4-1))

Organic synthesis of semi-carbazone from glyoxylic acid in a pilot plant
Background
The organic synthesis of semi-carbazone from glyoxylic acid is a key step in the synthesis of azuracil (a cytostatic anti-cancer drug).
Objective
The objective of this study was to find the best operating conditions of a pilot plant for synthesising semi-carbazone. A fractional factorial design in four factors was constructed and three responses were measured. The aims of this experimental protocol were a high yield of semi-carbazone, high purity and rapid filtration. This exercise also mirrors some of the difficulties that may arise when several responses have to be considered simultaneously.
Data
Factors: Time for addition of glyoxylic acid (h), Stirring time (h), Reaction temperature (°C), Amount of water added (ml/mol)
Responses: Yield (%) isolated, Goal: High; Purity (%) titrimetric, Goal: High; Filtration (ordinal scale, -5 worst, 5 best), Goal: High
Tasks
Task 1
Write down the computational matrix with + and - signs. Describe the defining relation and list the confounding pattern for the linear terms and the two-factor interactions.
Task 2
Solve the problem with MODDE. Note that the design does not include centre-points, hence you will see no bars relating to Model Validity and Reproducibility in the Summary of Fit plot. (Hint: add some interaction terms to the model and discuss the problems this might introduce).
Task 3
Show graphically which part of the experimental space should be chosen for the first experiment in the pilot plant (specify levels for the variables). Goal: High Yield, Purity and Filtration.
Task 4
Which method is commonly used to separate confoundings between two-factor interactions?
Solutions to Pilot Plant

Task 1
Defining relation: I = abcd (generator: d = abc). This means that ab is confounded with cd, ac with bd, and ad with bc; in the table below, confounded columns are identical. For the linear terms this means: a = bcd, b = acd, c = abd, d = abc.

Run  const  a  b  c  d  ab  ac  ad  bc  bd  cd
1    +      -  -  -  -  +   +   +   +   +   +
2    +      +  -  -  +  -   -   +   +   -   -
3    +      -  +  -  +  -   +   -   -   +   -
4    +      +  +  -  -  +   -   -   -   -   +
5    +      -  -  +  +  +   -   -   -   -   +
6    +      +  -  +  -  -   +   -   -   +   -
7    +      -  +  +  -  -   -   +   +   -   -
8    +      +  +  +  +  +   +   +   +   +   +
N.B. In the literature there are two ways of describing the generators and the interactions, with letters and with numbers. We use the more conventional LETTERS.
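The defining relation can be checked mechanically. A minimal sketch that builds the half-fraction in standard order from the generator d = abc and compares the sign columns of the aliased pairs; the helper names are hypothetical.

```python
from itertools import product


def half_fraction():
    """2^(4-1) design in standard order (a fastest) with generator d = abc."""
    return [
        {"a": a, "b": b, "c": c, "d": a * b * c}
        for c, b, a in product([-1, 1], repeat=3)
    ]


def column(runs, term: str):
    """Sign column of a product term, e.g. 'ab' or 'abcd'."""
    out = []
    for run in runs:
        sign = 1
        for factor in term:
            sign *= run[factor]
        out.append(sign)
    return out


design = half_fraction()
# Defining relation I = abcd: the word abcd equals +1 in every run.
assert column(design, "abcd") == [1] * 8
for left, right in [("ab", "cd"), ("ac", "bd"), ("ad", "bc"), ("a", "bcd")]:
    print(f"{left} = {right}:", column(design, left) == column(design, right))
```

Every comparison prints True, which is exactly why MLR cannot separate the members of each aliased pair from these eight runs.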
Task 2
To the left, we see the confounding pattern. The problem is that when we get a significant coefficient, we cannot tell which of the confounded interaction terms is responsible. (Note: a model like this one cannot be fitted with MLR, since it contains confounded terms and we only have 8 runs.)
Task 3
A linear model is a good choice for Purity and Filtration, but not for Yield.
[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration, linear model; N = 8, DF = 3]
From the regression coefficient plot for Yield, we can see that the confidence intervals are wide, indicating large model uncertainty. One way to improve the model might be to add the interaction between the two largest main effects, i.e., Ad*Te, and refit the model.
[Figure: Scaled & Centered Coefficients for yield (terms Ad, St, Te, wa); N = 8, DF = 3, R2 = 0.676, Q2 = -1.304, R2 Adj. = 0.244, RSD = 1.0188, confidence level 0.95]
Adding Ad*Te improves the model for Yield a lot, but degrades the predictive ability for Purity and Filtration.
[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration with Ad*Te added; N = 8, DF = 2]
The overview of regression coefficients shows the importance of the added interaction term for Yield. It is also apparent that the second main effect (Stirring time) is insignificant for all three responses. Remove Stirring time and refit the model.
[Figure: Normalized Coefficients (coefficient overview) for terms Ad, St, Te, wa, Ad*Te across yield, purity and filtration; N = 8, DF = 2]
After the deletion of Stirring time, much better models were obtained. Given that all three models must share the same model terms, this is the best result.
[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration after removing Stirring time; N = 8, DF = 3]
The coefficient overview plot may be used in trying to solve the problem. Recall that our goals are high Yield, high Purity and high Filtration. Addition time and Temperature are the two most important terms, so we make contour plots with these as axes. The amount of water added is held constant at its centre level, because it has a negative effect on Yield and a positive one on Purity and Filtration.
[Figure: Normalized Coefficients (coefficient overview) for terms Ad, Te, wa, Ad*Te across yield, purity and filtration; N = 8, DF = 3]
[Figure: response contour plots for the pilot plant responses at Water = 137.5 ml/mol, with the area of interest marked]
By using long addition time and high temperature the goal of simultaneously high Yield, Purity and Filtration is accomplished. The use of the MODDE Optimiser is shown below. Note that the factors have been set for extrapolation outside the investigated area.
Task 4
The method used to unconfound two-factor interactions is called FOLD-OVER.
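One way to see how a fold-over unconfounds the two-factor interactions of this design: reversing the sign of a single factor (here d) in all eight runs gives the complementary half-fraction I = -abcd, and together with the original runs this is the full 2^4 factorial, in which ab and cd can be estimated separately. Folding over on a single factor is one of several fold-over variants (reversing all four signs would not help here, since with I = abcd it reproduces the same eight runs); the sketch does not claim to reproduce the exact option MODDE applies, and the helper names are hypothetical.

```python
from itertools import product


def half_fraction(sign: int):
    """2^(4-1) runs (a, b, c, d) in standard order with d = sign * a*b*c."""
    return [(a, b, c, sign * a * b * c) for c, b, a in product([-1, 1], repeat=3)]


def fold_over_on(runs, factor: int):
    """Reverse the sign of one factor (column index) in every run."""
    return [tuple(-v if j == factor else v for j, v in enumerate(run)) for run in runs]


first = half_fraction(+1)          # I = abcd
second = fold_over_on(first, 3)    # reverse d only: I = -abcd
combined = first + second

# The 16 combined runs are exactly the full 2^4 factorial ...
assert sorted(combined) == sorted(product([-1, 1], repeat=4))
# ... so the ab and cd columns are no longer identical.
ab = [a * b for a, b, c, d in combined]
cd = [c * d for a, b, c, d in combined]
print(ab == cd)  # False

# Reversing all four signs instead just reproduces the original eight runs.
full_fold = [tuple(-v for v in run) for run in first]
assert sorted(full_fold) == sorted(first)
```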
Conclusions
To accomplish high yield, high purity, and rapid filtration, the factor combination addition time 2 h, water 137.5 ml/mol and temperature 60 °C looks interesting and could be verified with additional experiments. The remaining factor, stirring time, may be set at a level convenient for the experimental process. The MODDE optimiser indicates that even better results are obtainable when the high limit of addition time is relaxed to 2.3 h and the high limit of temperature to 80 °C. Reference: J-C Vallejos, Diss. IPSOI, Marseille 1978.
REPORTER GENE ASSAY

Screening, optimisation and robustness testing of a reporter gene assay
Background
Reporter gene assays are used in mechanistic studies of gene regulation. They also have great potential in toxicology and drug development. A reporter gene has an easily measurable phenotype, and its transcription is controlled by a promoter. Reporter gene assays provide important information on gene regulation relating to expression (i.e., number of copies) and to when and where a particular protein is formed.
Objective
The data-set used in this exercise originates from Active Biotech AB in Lund, Sweden, and we gratefully acknowledge Lena Schultz and Lisbeth Abramo for permitting us to use it. This study deals with the luciferase reporter gene, one of a number of widely used reporter genes. A total of six factors were investigated using DOE, and the objective was to increase and stabilise the signal-to-background ratio of the assay. This study is unique in that it contains data covering the full spectrum of DOE applications: first a screening design was performed, then fold-over, then optimisation and finally robustness testing. The exercise is structured accordingly:
Phase 1 (Screening): A 2^(6-2) fractional factorial design in 16 experiments + 3 centre points.
Phase 2 (Fold-over): The initial screening design was complemented by folding over.
Phase 3 (Optimisation): A CCF design in 17 experiments to optimise three of the six factors.
Phase 4 (Robustness Testing): A 2^(5-1) fractional factorial design in 16 experiments + 3 centre points to investigate the sensitivity of the response to small changes in five of the factors.
Factors:
Cells: number of T-cells used in the assay (number per well)
PMA: agent added to stimulate T-cells (ng/ml)
Ionomycin: agent added to stimulate T-cells (µg/ml)
Stimulation time: duration of stimulation (hours)
Lysing volume: volume of buffer needed to lyse T-cells (µl)
Ratio: ratio of the amount of sample to the amount of substrate required to acquire a signal in the luciferase assay
Response: S/B signal-to-background ratio computed as (signal-background)/background.
Phase 1 (Screening)
Tasks Phase 1 (Screening)

Task 1.1
In screening the objective is to identify the most important factors and their ranges. Define a new investigation in MODDE with six factors and one response. Select Screening and the Frac Fac Res IV design in 16 runs augmented with three centre-points. Enter the response data and evaluate them. Should the response be transformed?
Task 1.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 1.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?
Phase 2 (Fold-Over)
Tasks Phase 2 (Fold-Over)

Task 2.1
Fold-over is applied to screening designs to increase the number of experiments so that confounded terms may be resolved. In the existing design, click on File/Complement Design and select the first complement alternative (Fold over a screening fractional factorial of resolution III or IV). Enter a new name for the investigation and select three additional centre-points. MODDE will now construct a new investigation including the existing data and the new runs. It will also add a block-factor ($Block) which is a precautionary measure to check whether the response has drifted with time. If $Block is non-significant, it will be removed from the model. Enter the response data and evaluate them (histogram & replicate plot).
Task 2.2
Fit the regression model. Which factors are most important? What about the block-factor? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 2.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio? How does your answer compare with that obtained in Task 1.3?
Phase 3 (Optimisation)
Tasks Phase 3 (Optimisation)
Task 3.1
In optimisation, the objective is to locate an optimal factor combination which can be used as a future set point. Define a new investigation with three factors and one response. Note that the order of the factors has changed and that the factor ranges have been modified according to the results of the screening phase. The new design defines a much smaller experimental domain. Select RSM and choose a CCF design augmented with four centre-points. Enter the response data and evaluate them. Should the response be transformed?
Task 3.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.
Task 3.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?
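In coded units a CCF (face-centred central composite) design for k factors consists of the 2^k cube corners, 2k axial points placed on the centres of the cube faces (so every factor keeps only three levels), and the centre-points. A minimal sketch for the three factors of Phase 3; the function name is hypothetical and this is not MODDE's internal construction. With three centre-points the count is 8 + 6 + 3 = 17, the number of experiments quoted in the Objective; each additional centre-point adds one run.

```python
from itertools import product


def ccf_design(k: int, n_center: int) -> list[list[int]]:
    """Face-centred central composite (CCF) design in coded units."""
    corners = [list(p) for p in product([-1, 1], repeat=k)]  # 2^k cube corners
    axial = []
    for i in range(k):
        for level in (-1, 1):
            point = [0] * k
            point[i] = level  # alpha = 1: axial points on the cube faces
            axial.append(point)
    centre = [[0] * k for _ in range(n_center)]
    return corners + axial + centre


runs = ccf_design(3, n_center=3)
print(len(runs))  # 17 = 8 corners + 6 axial points + 3 centre-points
```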
Phase 4 (Robustness Testing)
Tasks Phase 4 (Robustness Testing)

Task 4.1
In robustness testing, the objective is to explore the robustness of an assay or method around its set point. The following set point was identified:
Cells = 300000 (320000 was optimal according to the CCF design, but 300000 is more practical as it means less crowding of the sample volume).
PMA = 10 (had virtually no effect in the screening phase; low level chosen).
Ionomycin = 1.5 (2 was optimal according to the CCF design, but 1.5 is more practical. Too high a concentration interferes with the real signal, which could reduce the signal-to-background ratio).
Stimulation time = 5.5 (six hours was optimal according to the CCF design, but 5.5 h fits better into an 8-hour working day).
LysVolume = 30 (low level, which was found to be optimal during the screening phase).
The specification of the response was that the signal-to-background ratio should exceed 50 regardless of the factor combination. Define a new investigation in MODDE with five factors and one response. Select Screening and the Frac Fac Res V+ design augmented with three centre-points. Enter the response data and evaluate them. Should the response be transformed? How do the response data compare to the specification?
Task 4.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary. Is the response sensitive to the factor changes?
Task 4.3
Evaluate the results in terms of the four limiting cases of robustness testing. Which case applies here?
Limiting case 1: inside specification, significant model
Limiting case 2: inside specification, non-significant model
Limiting case 3: outside specification, significant model
Limiting case 4: outside specification, non-significant model
Which factors should be better controlled in order to achieve robustness according to both criteria? Propose new factor tolerances where necessary.
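The four limiting cases combine two yes/no judgements. The sketch below encodes them; `inside_specification` assumes the specification from Task 4.1 (every S/B value must exceed 50), while whether the model is significant still has to be judged from the fitted model and is passed in as a flag. The function names and the example values are hypothetical.

```python
def inside_specification(sb_values: list[float], lower_limit: float = 50.0) -> bool:
    """True if every measured signal-to-background ratio exceeds the limit."""
    return all(v > lower_limit for v in sb_values)


def limiting_case(inside_spec: bool, significant_model: bool) -> int:
    """Map the two robustness criteria onto limiting cases 1-4."""
    if inside_spec:
        return 1 if significant_model else 2
    return 3 if significant_model else 4


# Hypothetical S/B values and model judgement, for illustration only.
case = limiting_case(inside_specification([62.0, 71.5, 55.3]), significant_model=False)
print(case)  # 2
```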
Solutions to REPORTER GENE ASSAY (Phase 1 - Screening)

Task 1.1
The evaluation of the response data reveals a few very large measurements (below, top left), and the histogram is highly skewed (below, top right). A logarithmic transformation therefore seems justified. However, since the response contains negative numbers, which cannot be logged, a small constant must be added before applying the transformation. The lowest response is -0.2. The second histogram (below, middle right) shows the effect of the log-transform using 1 as the constant; the response is still not approximately normally distributed. The third histogram (below, bottom right) shows the effect of changing the constant to 0.21. Now the histogram and replicate plot look much better. A plot of descriptive statistics (not shown here) confirms that this transformation is appropriate.
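The effect of the added constant can be seen directly on the lowest response. A minimal sketch, assuming the transform is log10(y + c) with c the added constant (MODDE's own parameterisation may differ): any c must exceed 0.2 so that y + c stays positive, and with c = 0.21 the lowest value is pulled much further down than with c = 1, spreading out the small responses. Whether the result is closer to normal still has to be judged from the histogram, as in the text. The helper name is hypothetical.

```python
from math import log10


def log_shifted(values: list[float], c: float) -> list[float]:
    """log10(y + c); every y + c must be positive."""
    if min(values) + c <= 0:
        raise ValueError("constant too small: y + c must be > 0 for all y")
    return [log10(y + c) for y in values]


lowest = -0.2
print(round(log_shifted([lowest], 1.0)[0], 2))   # -0.1: lowest value barely moves
print(round(log_shifted([lowest], 0.21)[0], 2))  # -2.0: lowest value pulled far down
```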
[Figure: Reporter Gene Assay Screening. Plot of Replications (with experiment number labels) and Histogram for the response. Top: raw S/B. Middle: log-transformed S/B~ with constant 1. Bottom: log-transformed S/B~ with constant 0.21.]
Task 1.2
The default linear model looks good, with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79). The top two plots correspond to this model. To try to improve the model, PMA and Ratio were removed and the six two-factor interactions of the four remaining factors were added, of which only three were worth keeping (Cel*Lys, Ion*StH and Ion*Lys). The revised model is much better (R2 = 0.96, Q2 = 0.91). The lower two plots relate to the revised model.
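The fit statistics quoted above can be computed for an ordinary MLR model from their textbook definitions. The sketch below assumes two things: that MODDE's "R2 Adj." is the standard adjusted R2, and that Q2 for MLR is 1 - PRESS/SS with leave-one-out prediction errors. The adjusted-R2 formula does reproduce the reported values from N, DF and R2. The function name and the use of NumPy are our own choices.

```python
import numpy as np


def fit_statistics(X, y):
    """R2, adjusted R2 and leave-one-out Q2 for an MLR model.

    X must already contain a column of ones for the constant term.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_res = np.sum(resid ** 2)
    r2 = 1.0 - ss_res / ss_tot
    df = n - p                                     # residual degrees of freedom ("DF")
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / df
    # leverages: diagonal of the hat matrix H = X (X'X)^-1 X'
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X.T @ X) @ X.T)
    press = np.sum((resid / (1.0 - leverage)) ** 2)  # leave-one-out prediction errors
    q2 = 1.0 - press / ss_tot
    return r2, r2_adj, q2


# Adjusted R2 recomputed from the reported R2, N and DF of the two models
for r2, n, df in [(0.917, 19, 12), (0.962, 19, 11)]:
    print(round(1.0 - (1.0 - r2) * (n - 1) / df, 3))
```

The loop gives approximately 0.876 and 0.938, against the reported 0.876 and 0.937; the small difference comes from R2 itself being rounded. Because each leave-one-out error is at least as large as the ordinary residual, Q2 can never exceed R2.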
[Figure: Summary of Fit and Scaled & Centered Coefficients for S/B~. Top, default linear model (terms Cel, Ion, Lys, PM, Rat, StH): N=19, DF=12, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472. Bottom, revised model (terms Cel, Ion, Lys, StH, Cel*Lys, Ion*StH, Ion*Lys): N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467. Confidence level 0.95.]
The revised model contains no outliers (below, left), and the size of the residual is fairly independent of the predicted value (below, right), which is good.
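The outlier check relies on deleted studentized residuals. A sketch follows, assuming these are the standard externally studentized residuals, in which each residual is scaled by a residual standard deviation estimated without that observation. The closed form avoids refitting the model N times. The function name is hypothetical.

```python
import numpy as np


def deleted_studentized_residuals(X, y):
    """Externally studentized ("deleted") residuals for an MLR fit.

    Each residual is divided by a standard deviation estimated with that
    observation left out, so a single outlier cannot mask itself.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X.T @ X) @ X.T)  # leverages
    sse = e @ e
    # residual variance with observation i removed (closed form, no refitting)
    s2_deleted = (sse - e ** 2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_deleted * (1.0 - h))
```

Values outside roughly plus or minus 4 are the usual signal of an outlier, which matches the "inside 4 standard deviations" criterion used in the CHROMSPHER_B solutions.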
[Figure: Revised model for S/B~ (N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467). Left: normal probability plot (N-Probability) of deleted studentized residuals. Right: deleted studentized residuals versus predicted S/B~, with experiment number labels.]
Task 1.3
The contour plot below shows how the signal-to-background ratio is predicted to change as a function of the factors Cells and LysVolume, with the other factors fixed at their maximum values. This pair of factors was chosen to explore the borderline-significant Cel*Lys two-factor interaction.
Conclusions of Phase 1
The three most important factors are Cells, Ionomycin and Stimulation Time. A few two-factor interactions look interesting, as they improve the predictive power of the model. However, these two-factor interactions are confounded with other two-factor interactions. Such confounding can be resolved using the Fold-over technique; see Phase 2 of the exercise.
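Confounding, and the way a fold-over breaks it, can be illustrated with a generic 16-run design in six factors, similar in size to the Phase 1 screening design. The generators E = ABC and F = BCD are a common textbook choice and are assumed here. MODDE's own generators and its "Fold over complement" construction may differ. The sketch folds over on the single factor E, one variant of the technique, chosen only because it is easy to verify.

```python
import itertools

import numpy as np

# 2^4 full factorial in A, B, C, D (coded -1/+1)
base = np.array(list(itertools.product([-1, 1], repeat=4)))
A, B, C, D = base.T
E = A * B * C  # assumed generator E = ABC
F = B * C * D  # assumed generator F = BCD
fraction = np.column_stack([A, B, C, D, E, F])


def confounded(design, i, j, k, l):
    """True if the i*j and k*l interaction columns are identical (fully aliased)."""
    return np.array_equal(design[:, i] * design[:, j], design[:, k] * design[:, l])


print(confounded(fraction, 0, 1, 2, 4))  # True: AB and CE cannot be separated

# Fold-over on factor E: repeat the 16 runs with the sign of E reversed
folded = fraction.copy()
folded[:, 4] *= -1
combined = np.vstack([fraction, folded])
print(confounded(combined, 0, 1, 2, 4))  # False in the combined 32 runs
ab = combined[:, 0] * combined[:, 1]
ce = combined[:, 2] * combined[:, 4]
print(ab @ ce)  # 0: the two interaction columns are now orthogonal
```

Only aliases involving E are broken here. Pairs linked through the untouched generator F = BCD, such as BC and DF, remain confounded in the combined design.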
Solutions to REPORTER GENE ASSAY (Phase 2 - Fold-over)

Task 2.1
For ease of interpretation, the same response transformation was applied as in Phase 1 (Task 1.1). This looks reasonable given the replicate and histogram plots below.
[Figure: Reporter Gene Assay Screening - Fold over complement. Left: Plot of Replications for S/B~ with experiment number labels (experiments 1-38). Right: Histogram of S/B~.]
Task 2.2
The default linear model is very good (R2 = 0.92, Q2 = 0.88). The top two plots below relate to this model. The Block factor is not significant so there is no evidence of a time drift between the two sets of experiments. To try and improve the model, PMA, Ratio and $Block were removed. The refined model is only marginally better (R2 = 0.91, Q2 = 0.89). The lower two plots relate to the refined model.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B~, Fold over complement. Top, default linear model (terms StH, Ion, Cel, Lys, Rat, PM, $Block): N=38, DF=30, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027. Bottom, refined model (terms StH, Cel, Ion, Lys): N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018. Confidence level 0.95.]
The revised model contains no outliers (below, left). However, the plot of the deleted studentized residuals versus the predicted value (below, right) indicates that some of the largest residuals correspond to the six centre-points. A similar phenomenon was present also in the initial screening design. This hints at curvature problems. Curvature is easy to handle with a quadratic regression model but not with the linear model used here.
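The suspicion of curvature can be quantified with the classical centre-point check: compare the mean of the factorial runs with the mean of the centre points. The sketch below is that textbook contrast, not a MODDE function. The standard error comes from the scatter of the centre-point replicates. The numbers in the usage lines are hypothetical.

```python
import numpy as np


def curvature_check(y_factorial, y_centre):
    """Centre-point curvature contrast for a two-level design.

    Returns (difference, standard_error): mean of the factorial runs minus
    mean of the centre points, and its standard error estimated from the
    centre-point replicates (at least two are needed).
    """
    yf = np.asarray(y_factorial, dtype=float)
    yc = np.asarray(y_centre, dtype=float)
    diff = yf.mean() - yc.mean()
    s_pure = yc.std(ddof=1)  # pure-error standard deviation
    se = s_pure * np.sqrt(1.0 / yf.size + 1.0 / yc.size)
    return diff, se


# Hypothetical data: centre points sit well below the factorial runs
diff, se = curvature_check([2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9], [1.0, 1.1, 0.9])
print(diff / se)  # a large ratio points to curvature
```

A difference that is large relative to its standard error (compared against a t distribution with one fewer degree of freedom than the number of centre points) suggests that a quadratic model is needed, which is the motivation for the RSM phase.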
[Figure: Refined Fold-over model for S/B~ (N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018). Left: normal probability plot (N-Probability) of deleted studentized residuals. Right: deleted studentized residuals versus predicted S/B~, with experiment number labels.]
Task 2.3
MODDE's Optimizer was used to locate the factor combination that maximises the response. PMA, Ratio and $Block were not included in the final model and are therefore greyed out in the Optimizer factor spreadsheet.
The results of running the Optimizer are shown below. The optimum point corresponds to having three factors at their upper limit and one at its lower limit.
The Fold-over experiments did not indicate any large two-factor interactions. Instead, they confirmed that three of the factors dominate: Cells, Ionomycin and Stimulation Time. These three factors form the basis of the optimisation design employed during Phase 3, which is better suited to handling the non-linear behaviour noted above.
Solutions to REPORTER GENE ASSAY (Phase 3 - Optimisation)

Task 3.1
The replicate plot shows that the signal-to-background ratio is much higher than in the screening designs. The histogram and Box-Whisker plots indicate that a response transformation is no longer required.
[Figure: Reporter Gene Assay RSM with CCF. Plot of Replications for S/B with experiment number labels, Histogram of S/B, and Descriptive Statistics (Box-Whisker) plot. S/B Min: 17.9, Max: 221.4, Median: 107.8, Mean: 120.683.]
Task 3.2
The default model has a relatively poor Q2 (R2 = 0.91, Q2 = 0.56). The top two plots relate to the initial model. The model was pruned by removing non-significant terms (R2 = 0.89, Q2 = 0.74). The lower two plots relate to the refined model.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B, RSM with CCF. Top, full quadratic model in Cel, Ion and StH: N=18, DF=8, R2=0.908, Q2=0.558, R2 Adj.=0.805, RSD=25.3554. Bottom, pruned model (non-significant terms removed): N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934. Confidence level 0.95.]
There are no outliers (below, left) and the residuals are independent of the predicted value (below, right).
[Figure: Pruned RSM model for S/B (N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934). Left: normal probability plot (N-Probability) of deleted studentized residuals. Right: deleted studentized residuals versus predicted S/B, with experiment number labels.]
Task 3.3
The contour plots below show how the signal-to-background ratio varies in relation to the three factors. The optimum factor combination is high Stimulation time (6 hours), high Ionomycin (2) and intermediate Cells (around 320000).
The Optimizer was used to obtain more exact co-ordinates of the optimum.
After the first optimisation round the 8th simplex was found to be best.
During the second optimisation round, new starting points were generated in the vicinity of the best simplex from the first round.
Four of the five new simplexes converge to the same point: Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2.
The results for the best predicted simplex were transferred to the SweetSpot plot, which clearly shows the location of the optimal point.
Further, the five simplex factor co-ordinates were transferred to the prediction list, showing that the predicted optimal S/B value is 260 ± 40.
The optimal factor combination within the investigated experimental domain is Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2. In the final DOE stage, this point will be assessed for robustness. However, due to practical considerations, robustness testing was not performed on this precise point but rather on one close to it (see Phase 4).
Solutions to REPORTER GENE ASSAY (Phase 4 - Robustness)

Task 4.1
The histogram and replicate plots indicate that a transformation is unnecessary. The replicate plot also shows that all the response values are above 50, i.e. within specification.
[Figure: Reporter Gene Assay RobTest Frac Fac. Left: Plot of Replications for S/B with experiment number labels. Right: Histogram of S/B.]
Task 4.2
In robustness testing, model refinement is usually not performed, and the ideal result is no model at all. The model obtained is poor (R2 = 0.93, Q2 negative). However, the regression coefficient plot indicates that S/B is sensitive to changes in Ionomycin concentration.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B, RobTest Frac Fac. Model terms Cel, Ion, Lys, PM, StH and all ten two-factor interactions: N=19, DF=3, R2=0.930, Q2=-84.830, R2 Adj.=0.577, RSD=6.7501. Confidence level 0.95.]
Task 4.3
In Task 4.2 it was shown that S/B is not robust to changes in Ionomycin concentration. However, the response data themselves are robust, given that they are all within specification. To make S/B robust, the factor range of Ionomycin must be reduced by half. Hence, the concentration range for Ionomycin within which robustness can be claimed is 1.485 to 1.515 µg/ml rather than 1.47 to 1.53 µg/ml.
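Halving the tolerance interval about the Ionomycin set point of 1.5 is simple arithmetic. The sketch assumes a symmetric interval and, implicitly, an approximately linear Ionomycin effect across the range, so that halving the range roughly halves the predicted change in S/B. The function name is hypothetical.

```python
def shrink_range(low, high, fraction):
    """Shrink [low, high] symmetrically about its midpoint (the set point)."""
    centre = (low + high) / 2.0
    half_width = (high - low) / 2.0 * fraction
    return centre - half_width, centre + half_width


print(shrink_range(1.47, 1.53, 0.5))  # approximately (1.485, 1.515)
```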
[Figure: Scaled & Centered Coefficients for S/B, RobTest Frac Fac (N=19, DF=3, R2=0.930, Q2=-84.830, R2 Adj.=0.577, RSD=6.7501, confidence level 0.95).]
The final DOE phase illustrated the first limiting case of robustness testing, i.e., a significant model and inside specification. S/B was most sensitive to changes in Ionomycin concentration.
Discussion and conclusions of REPORTER GENE ASSAY example

Of the six factors originally investigated, three dominated the initial screening phases: Cells, Ionomycin and Stimulation Time (StimH). A few two-factor interactions were also used in the screening model, as they increased the predictive power. However, the problem with two-factor interactions in medium-resolution screening designs is that they are often confounded with other two-factor interactions. Such confounding can be resolved using the Fold-over technique. The addition of the 19 Fold-over experiments showed two things. First, there was no systematic shift in the response data between the two sets of experiments. Second, there were no important two-factor interactions; instead, the Fold-over confirmed the importance of the three key factors. There was also some evidence of curvature (non-linear behaviour), which can be investigated in more detail using a central composite design. In the RSM phase, the key factors Cells, Ionomycin and Stimulation Time were optimised using a CCF design in 18 runs. The factor ranges were adjusted in accordance with the findings of the screening phases. This design identified the optimal factor combination Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2. In the final robustness testing, the set point was defined as:
Cells = 300000 (320000 was optimal according to the CCF design, but 300000 is more practical as it means less crowding of the sample volume.)
PMA = 10 (PMA had almost no influence during screening, so the low level was selected.)
Ionomycin = 1.5 (A concentration of 2 µg/ml was optimal according to the CCF design, but 1.5 is more practical. Too high a concentration creates interference with the real signal and may thus reduce the signal-to-background ratio.)
Stimulation time = 5.5 (Six hours was optimal according to the CCF design, but 5.5 hours fits in better with an 8-hour working day.)
LysVolume = 30 (Low level, as found to be optimal during screening.)
The signal-to-background ratio is most sensitive to changes in the Ionomycin concentration. However, the response may be regarded as robust given that all the values were within specification. The final conclusion is that the results of the four phases are both coherent and consistent. This indicates the high quality of the underlying experimental data.
DOE-Exercise CHROMSPHER_B (Frac Fac)

Evaluation of mobile phase additives in HPLC
Background
One important property in HPLC is the capacity factor. There are several mobile phase constituents that may influence this chromatographic response, such as, pH, temperature, and type and amount of mobile phase modifiers. Thus, optimization of capacity factors is not always straightforward, but requires design of experiments in combination with multivariate modeling for optimal output. This example is based on the publication of Andersson et al (Chromatographia Vol 38, 715-722, 1994).
Objective
In this example the influence of seven factors on chromatographic response (capacity factors) is investigated. Five factors represent mobile phase modifiers, three uncharged and two charged, and the last two are pH and column temperature. The chromatographic response (i.e., capacity factor) for the Chromspher B stationary phase was assessed using five substances (almokalant, amoxicillin, metoprolol, omeprazole and S 29). The goal was to get an overview (screening) of which factors are most influential for the capacity factors, since it is desirable to regulate these through changes in the factors.
Data
Acetonitrile (ACN), methanol (MeOH) and tetrahydrofuran (THF) represent uncharged modifiers, whilst 1-octanesulphonic acid (OSA) and N,N-dimethyloctylamine (DMOA) correspond to charged ones.
Tasks
Task 1
In MODDE, first define the seven factors according to the information given above. The factors OSA and DMOA must be log-transformed (with C1 = 1 and C2 = 0). The next step is to specify the responses. Log-transform all five responses. Set C1 to 1 and C2 to 0 for all responses except Amoxicillin, which should have the settings C1 = 1 and C2 = 0.04. Select Screening as objective, and MODDE will then prompt for 16+3 experiments in terms of a 2^(7-3) FFD. Accept this proposal. This design only supports linear terms. However, the experimenters wished to estimate some interaction terms, and hence they carried out five extra runs selected D-optimally. To append extra experiments to the worksheet, right-mouse click in the worksheet window and select Add Experiment. Continue until the worksheet has 24 experiments. In EXCEL, open the XLS-file CHROMS_B.XLS and COPY/PASTE the worksheet content into the worksheet generated in MODDE.
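The 16+3 runs of a 2^(7-3) design can be written out in coded units, as the sketch below shows. The generators E = ABC, F = BCD and G = ACD are a common textbook choice giving resolution IV. They are an assumption here: MODDE may use a different but equivalent set, and the mapping of physical factors (ACN, MeOH, THF, pH, OSA, DMOA, T) to columns is not given in the handout. The five D-optimal extra runs are not reproduced.

```python
import itertools

import numpy as np


def ffd_2_7_3(n_centre=3):
    """16-run 2^(7-3) fractional factorial plus centre points, coded -1/0/+1.

    Assumed generators: E = ABC, F = BCD, G = ACD (resolution IV).
    """
    base = np.array(list(itertools.product([-1, 1], repeat=4)))
    a, b, c, d = base.T
    factorial = np.column_stack([a, b, c, d, a * b * c, b * c * d, a * c * d])
    centre = np.zeros((n_centre, 7), dtype=int)
    return np.vstack([factorial, centre])


design = ffd_2_7_3()
print(design.shape)  # (19, 7): 16 + 3 runs, seven factors
factorial = design[:16]
print(factorial.T @ factorial)  # 16 times the identity: orthogonal main effects
```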
Task 2
Evaluate the raw data by creating replicate plots and histograms. Are the responses approximately normally distributed? What about the replicate error, is it large or small compared with the variation across the entire design?
Task 3
Set runs 20-24 as excluded (Excl). Select MLR as FIT METHOD (Analysis/Select Fit Method) and compute the model. Check R2, Q2, MVal, Rep, ANOVA, and N-plot of residuals for each one of the five responses. Can you trace any anomalies in the data? Look at the coefficients and interpret the model. Which factors seem most relevant?
Task 4
Include runs 20-24. Edit the model (Edit/Model) and add the three interaction terms pH*DMOA, ACN*OSA and MeOH*THF. Compute the model with MLR and compare results with Task 3.
Task 5
Use the same data material as in Task 4, but switch to PLS instead of MLR. What are the similarities and differences between the MLR and the PLS models? How are the different responses correlated? Which factors are most meaningful?
Solutions to CHROMSPHER_B
Task 2
[Figure: Histograms of the five log-transformed responses OM~, S29~, Almo~, Amox~ and Meto~ (investigation chroms_b).]
The five histograms show that all responses are approximately normally distributed. This is what you would expect for log-transformed chromatographic data.
[Figure: Plots of Replications for OM~, S29~, Almo~, Amox~ and Meto~ with experiment number labels (investigation chroms_b).]
The replicate error is very small for each response, in fact so small that it will be difficult to avoid lack of fit in the ANOVA lack of fit test. The replicate plot for the fourth response (Amox) indicates a deviating behavior of experiment 19.
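The lack-of-fit test compares the part of the residual variation that replicate scatter cannot explain with the replicate (pure) error. Below is a sketch of the classical F ratio. The data are hypothetical, chosen so that the replicates agree closely, which, as noted above, makes a significant lack of fit hard to avoid. MODDE's ANOVA table layout is not reproduced.

```python
import numpy as np


def lack_of_fit_F(points, X, y):
    """ANOVA lack-of-fit F ratio for an MLR model.

    points -- design settings of each run (rows); identical rows are replicates
    X      -- model matrix, including the constant column
    y      -- response values
    """
    points = np.asarray(points, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    # pure error: scatter of replicates around their own group means
    _, group = np.unique(points, axis=0, return_inverse=True)
    group = group.ravel()
    ss_pe = sum(np.sum((y[group == g] - y[group == g].mean()) ** 2) for g in np.unique(group))
    n, p = X.shape
    m = group.max() + 1  # number of distinct design points
    df_pe, df_lof = n - m, m - p
    if df_pe < 1 or df_lof < 1:
        raise ValueError("need replicates and more design points than parameters")
    ss_lof = ss_res - ss_pe
    return (ss_lof / df_lof) / (ss_pe / df_pe)


# Hypothetical curved data with tight replicates, fitted by a straight line
x = np.array([-1, -1, 0, 0, 1, 1], dtype=float)
y = np.array([1.0, 1.2, 0.0, 0.2, 1.0, 1.2])
X = np.column_stack([np.ones_like(x), x])
print(lack_of_fit_F(x.reshape(-1, 1), X, y))  # about 66.7
```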
Task 3
We can see that four out of five responses are well accounted for by the model. One response, Amoxicillin, has a large gap between R2 and Q2, indicating model problems for this response. Model validity is acceptable only for the first response. In the N-plots below, residuals of a well-predicted (S29) and a poorly predicted (Amox) response are plotted. For the problematic response (Amox), experiments 1, 11, 17 and 19 stick out a little, but they are still inside 4 standard deviations. The coefficient plot reveals that for most responses the coefficient patterns are similar. The notable exception is Amox, for which the factor pH has a negative coefficient rather than a positive one as for the other responses.
[Figure: MLR model with runs 20-24 excluded. Summary of Fit for OM~, S29~, Almo~, Amox~ and Meto~; normal probability plots of residuals for S29~ (N=20, DF=12, R2=0.946, Q2=0.836, R2 Adj.=0.914, RSD=0.0796) and Amox~ (N=20, DF=12, R2=0.798, Q2=0.362, R2 Adj.=0.680, RSD=0.2216); Normalized Coefficients for ACN, MeO, THF, pH, OSA~, DMO~ and T across the five responses.]
Task 4
Evidently, the modeling of all responses benefits from the inclusion of the three cross-terms. The N-plot of Amox residuals has improved slightly, because now only experiments 1 and 19 deviate. The MeOH-THF term is the most powerful of the cross-terms, as seen in the coefficient plots. In the interaction plots, it is possible to discern that the MeOH-THF interaction is more pronounced for Amox than for S29.
[Figure: MLR model with the three cross-terms added (N=24, DF=13). Summary of Fit for all five responses; normal probability plot of Amox~ residuals; Scaled & Centered Coefficients for S29~ (R2=0.978, Q2=0.930, R2 Adj.=0.961, RSD=0.0621) and Amox~ (R2=0.866, Q2=0.522, R2 Adj.=0.763, RSD=0.1811); Interaction Plots for MeO*THF (MeOH on the x-axis, THF low and high) for S29~ and Amox~.]
Task 5
According to the R2- and Q2-values of the individual responses, the MLR and PLS models provide similar results. However, one must realize that when using MLR there are five models to consider, whereas PLS only fits one model to all responses.
[Figure: Summary of Fit for OM~, S29~, Almo~, Amox~ and Meto~ for the MLR model and for the PLS model with 4 components (N=24, DF=13).]
The PLS model has four components. For the response S29, which is strongly correlated with OM, Almo, and Meto, we can see that primarily the first PLS component explains response variation. For the deviating response Amox, however, the first component of the PLS model reflects hardly any variation. Rather, the second component models this response.
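The similarity between one PLS model and the separate MLR models can be checked numerically. The sketch below is a plain NIPALS PLS2 on mean-centred (not unit-variance-scaled) synthetic data. MODDE's scaling, cross-validation and component selection are not reproduced, and the function name is hypothetical. With as many components as there are linearly independent X columns, PLS reproduces the least-squares (MLR) coefficients for every response; with fewer components, as in the four-component model above, it gives a compressed approximation.

```python
import numpy as np


def pls2_nipals(X, Y, n_comp, max_iter=500, tol=1e-10):
    """PLS regression coefficients B via NIPALS for several responses at once.

    Predictions are (X - X.mean(0)) @ B + Y.mean(0).
    """
    Xr = X - X.mean(axis=0)
    Yr = Y - Y.mean(axis=0)
    p, m = X.shape[1], Y.shape[1]
    W, P, C = np.zeros((p, n_comp)), np.zeros((p, n_comp)), np.zeros((m, n_comp))
    for a in range(n_comp):
        u = Yr[:, np.argmax(Yr.var(axis=0))].copy()  # start from most variable response
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)        # X weights
            t = Xr @ w                    # X scores
            c = Yr.T @ t / (t @ t)        # Y weights
            u_new = Yr @ c / (c @ c)      # Y scores
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p_load = Xr.T @ t / (t @ t)       # X loadings
        Xr = Xr - np.outer(t, p_load)     # deflate X
        Yr = Yr - np.outer(t, c)          # deflate Y
        W[:, a], P[:, a], C[:, a] = w, p_load, c
    return W @ np.linalg.solve(P.T @ W, C.T)


# Synthetic data: 24 runs, 7 X columns, 5 correlated responses
rng = np.random.default_rng(7)
X = rng.normal(size=(24, 7))
Y = X @ rng.normal(size=(7, 5)) + 0.3 * rng.normal(size=(24, 5))
B_pls = pls2_nipals(X, Y, n_comp=7)
B_mlr, *_ = np.linalg.lstsq(X - X.mean(0), Y - Y.mean(0), rcond=None)
print(np.allclose(B_pls, B_mlr))  # True: full-rank PLS equals MLR
```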
[Figure: PLS Summary (cumulative R2 and Q2 per component, Comp1 to Comp4) for S29~ (R2=0.961, Q2=0.850, R2 Adj.=0.932, RSD=0.0823) and Amox~ (R2=0.836, Q2=0.548, R2 Adj.=0.710, RSD=0.2002); N=24, DF=13.]
PLS provides a diagnostic tool visualizing the correlation pattern between the X-factors and the Y-responses, namely the PLS t/u score plot. The first component, accounting for almost 72% of the response variation, captures a strong correlation between X and Y. The second component, which explains another 16% of the Y-variation, uncovers a weakly deviating feature of experiment number 19, i.e., the same phenomenon observed in the foregoing tasks.
[Figure: PLS score scatter plots t[1] vs u[1] and t[2] vs u[2] with experiment number labels (N=24, DF=13, Cond. no.=3.2013, Y-miss=0).]
The third and fourth components also display reasonable correlations between t and u, considering that they model only 4% and 2% of the variation in the responses. The third component reveals a weak non-linear relationship.
[Figure: PLS score scatter plots t[3] vs u[3] and t[4] vs u[4] with experiment number labels (N=24, DF=13).]
Because PLS fits only one model to all responses, we may use the PLS loading plot to overview the relationships among all factors, cross-terms, and responses at the same time. The loading plot given below represents 88% of the response variation. This plot corroborates that Amox provides unique information about the experiments. The other four responses are correlated, and correlation coefficients among them always exceed 0.75 (Hint: use Worksheet/Correlation/Correlation Matrix). The loading plot also suggests that the factors pH, ACN, THF, and MeOH are most influential for the responses. The three cross-terms are of comparatively low importance. Basically, the VIP plot confirms the conclusions drawn from the loading plot.
[Figure: PLS loading scatter plot wc[1] vs wc[2] showing factors, cross-terms and responses (OM~, S29~, Al~, Me~, Am~); Variable Importance Plot (VIP) for pH, ACN, THF, MeO, T, OSA~, DMO~ and the cross-terms MeO*THF, ACN*OSA~ and pH*DMO~ (N=24, DF=13, Cond. no.=3.2013).]
Conclusions
This application shows how DOE can be applied to explore the performance of chromatographic equipment. Seven factors were screened and the resulting models (MLR or PLS) revealed that four factors (pH, ACN, THF, and MeOH) were considerably more meaningful than the others. These are the factors to consider for further studies, e.g., optimization modeling. One response, Amox, was different, mainly because of a distinctively different pH-dependence. A separate MODDE investigation for this response is reasonable.
CHIRAL SEPARATION (Optimisation)

Optimisation of Chiral Separation of Omeprazole and One of Its Metabolites
Background
Omeprazole is a potent inhibitor of gastric acid secretion and is frequently used against acid-related diseases in the stomach. Both enantiomers of omeprazole are effective in this respect. Omeprazole is metabolised to intermediary products of which hydroxylated omeprazole is the main metabolite. This metabolite is able to block the enzyme H+,K+-ATPase selectively. This enzyme is responsible for the gastric acid production.
Objective
The experimental objective of this study was to optimise the chiral separation (using HPLC) of the (R)- and (S)-enantiomers of omeprazole and its main metabolite, hydroxylated omeprazole. In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on different retention of each analyte on the chiral stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is the minimum acceptable for separation of neighbouring peaks, but complete baseline separation requires a resolution above 1.5.
In this application, four HPLC factors were varied: mobile phase pH, concentration of the organic eluent modifier acetonitrile (ACN), ionic strength and temperature. Capacity factors were measured for the four solutes (R-omeprazole, S-omeprazole, R-hydroxyomeprazole, S-hydroxyomeprazole) and analysed after logarithmic transformation. The experimental data are taken from the following reference: Karlsson, A., and Hermansson, S., Optimisation of Chiral Separation of Omeprazole and One of Its Metabolites on Immobilized α1-Acid Glycoprotein Using Chemometrics, Chromatographia, 44, 10-18, 1997.
In the treatment of the experimental data below, solute 1 is omeprazole and solute 2 is hydroxyomeprazole. The (R)- and (S)-notation indicates different enantiomers. Capacity factors are denoted k, and there are four of these. The resolution responses of interest are denoted Res. The experimental objective was to find a factor combination which:
(a) achieves retention times (capacity factors) of less than 15 minutes
(b) maintains resolution above 1.5 (complete baseline separation).
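For reference, the two chromatographic quantities mentioned above have standard textbook definitions, sketched below: the capacity factor k = (tR - t0)/t0 and the resolution Rs = 2(tR2 - tR1)/(w1 + w2). The handout does not state which peak-width definition the original authors used, so the baseline-width form of the resolution is an assumption. The peak times and widths are hypothetical.

```python
def capacity_factor(t_r, t_0):
    """k = (tR - t0) / t0, where t0 is the column dead time."""
    return (t_r - t_0) / t_0


def resolution(t_r1, t_r2, w1, w2):
    """Rs = 2 (tR2 - tR1) / (w1 + w2), using baseline peak widths w1 and w2."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)


# Hypothetical peaks (minutes), not data from the study
print(capacity_factor(6.0, 1.5))       # 3.0
print(resolution(6.0, 7.0, 0.6, 0.6))  # about 1.67: above 1.5, baseline separated
```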
Data
Factors
Responses:
Design:
Tasks
Task 1
Create a new project in MODDE. Define the four factors and the eight responses as outlined above. Note 1: The four capacity factors are commonly analysed after transforming to logs. Note 2: The last four responses are derived from the four capacity factors. Res1 is k(S)-1 divided by k(R)-1. Res2 is k(S)-2 divided by k(R)-2. Res3 is k(R)-1 divided by k(R)-2. Res4 is k(S)-2 divided by k(R)-1.
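As a quick check on the definitions in Note 2, the derived responses can be computed directly from measured capacity factors. The Python sketch below uses the four capacity factors of the verifying experiment given in Task 3; the function name is hypothetical, and base-10 logarithms are assumed for the log transform (the exercise does not state the base).

```python
import math

def derived_responses(k_r1, k_s1, k_r2, k_s2):
    """Ratio-based derived responses, following the definitions in Note 2 (hypothetical helper)."""
    return {
        "Res1": k_s1 / k_r1,  # k(S)-1 / k(R)-1
        "Res2": k_s2 / k_r2,  # k(S)-2 / k(R)-2
        "Res3": k_r1 / k_r2,  # k(R)-1 / k(R)-2
        "Res4": k_s2 / k_r1,  # k(S)-2 / k(R)-1
    }

# Measured capacity factors of the verifying experiment (Task 3)
k = {"k(R)-1": 2.48, "k(S)-1": 5.86, "k(R)-2": 1.59, "k(S)-2": 3.18}

# Log-transformed capacity factors (base 10 is an assumption here)
log_k = {name: math.log10(value) for name, value in k.items()}

res = derived_responses(k["k(R)-1"], k["k(S)-1"], k["k(R)-2"], k["k(S)-2"])
for name, value in res.items():
    print(f"{name} = {value:.2f}")
```

The same function can be applied to predicted capacity factors (after back-transforming from the log scale) to see how the derived responses follow from the fitted models.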
The four derived responses are not shown in the worksheet until a model has been fitted. Select RSM and the second-ranked Reduced CCF design augmented with four centre-points. There are different versions of this design, and the one used by the original investigators is not the same as the one recommended by MODDE. Therefore, Copy/Paste the contents of CHIRAL SEPARATION.XLS into MODDE. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc.). Are the responses approximately normally distributed? How large or small is the replicate error?
Task 2
Fit the model and review and interpret the results. How are the eight responses related? Which model terms are most important? Which factor settings meet the objectives of the study, i.e. capacity factors below 15 minutes and resolutions above 1.5?
Task 3
The experimenters carried out one verifying experiment to test the predictive power of the model. The verifying experiment was Eluent modifier = 11%, Temperature = 25 °C, Ionic Strength = 0.02 and pH = 6.3. At this point, the measured capacity factors were: k(R)-1 = 2.48, k(S)-1 = 5.86, k(R)-2 = 1.59 and k(S)-2 = 3.18. How do the predictions from your model compare with these actual measurements?
Solutions to CHIRAL SEPARATION

Task 1
The replicate plots below relate to log-transformed capacity factors. In all four cases, the replicate variation is small relative to the overall variation of the responses, particularly for responses 2-4. No replicate plots are shown for the four derived responses.
[Figure: Plots of Replications for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~ versus Replicate Index, with Experiment Number labels.]
The appropriateness of the log transformation is confirmed by the shape of the histograms of each response (below).
[Figure: Histograms of k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~ (Count versus Bins).]
Task 2
A quadratic regression model was fitted to the response data. The summary plot (below) indicates that the first response has an excellent model but responses 2-4 suffer from lack of fit.
[Figure: Summary of Fit (MLR) for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~, full quadratic model; N=24, DF=9.]
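The residual degrees of freedom printed on these plots follow from simple bookkeeping: a full quadratic model in four factors has 4 linear, 4 squared and 6 two-factor interaction terms, plus the constant. A minimal Python sketch; the helper names are hypothetical and are not MODDE functions.

```python
from itertools import combinations

# Hypothetical helpers for illustration; not MODDE functions
def quadratic_terms(factors):
    """Terms of a full quadratic (RSM) model, excluding the constant."""
    linear = list(factors)
    squares = [f"{f}*{f}" for f in factors]
    interactions = [f"{a}*{b}" for a, b in combinations(factors, 2)]
    return linear + squares + interactions

def residual_df(n_runs, n_terms):
    """Residual degrees of freedom: runs minus model terms minus the constant."""
    return n_runs - n_terms - 1

terms = quadratic_terms(["ACN", "temp", "Ion", "pH"])
print(len(terms), residual_df(24, len(terms)))  # 14 9
print(residual_df(24, 7))                       # 16 (refined models with seven terms)
print(residual_df(29, 20))                      # 8  (metabolism study, full quadratic model)
```

The same arithmetic explains why removing terms frees degrees of freedom for estimating the residual error.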
The correlation matrix is useful for examining how the derived responses correlate with the measured ones. The figure below is an excerpt of the complete correlation matrix showing just the portion related to the responses. It is evident that all four capacity factors are strongly correlated and so it is reasonable to include them in the same investigation where we would expect them to have similar patterns of regression coefficients. The only really different response is the derived response Res3.
The coefficient overview plot shown below confirms the similarity of the coefficient profiles. There are no coefficients for the derived responses as these are generated from the fitted capacity factors. Overall, the linear terms dominate and by far the most important factors are concentration of acetonitrile (ACN) and temperature. It can also be seen that pH has some influence on the second, third and fourth responses. There is some evidence of a quadratic effect of temperature for the third and fourth responses.
[Figure: Normalized Coefficients (MLR) for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~. Model terms: ACN, temp, Ion, pH, ACN*ACN, temp*temp, Ion*Ion, pH*pH, ACN*temp, ACN*Ion, ACN*pH, temp*Ion, temp*pH, Ion*pH; N=24, DF=9.]
The predictive power of the models was improved by removing non-significant model terms. The regression coefficients of the refined models are shown below.
[Figure: Scaled & Centered Coefficients (MLR) of the refined models for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~. Model terms: ACN, temp, Ion, pH, ACN*ACN, temp*temp, ACN*temp; N=24, DF=16, confidence level 0.95.
k(R)-1~: R2=0.984, Q2=0.967, R2 Adj.=0.977, RSD=0.0329
k(S)-1~: R2=0.995, Q2=0.987, R2 Adj.=0.993, RSD=0.0231
k(R)-2~: R2=0.978, Q2=0.944, R2 Adj.=0.968, RSD=0.0420
k(S)-2~: R2=0.990, Q2=0.973, R2 Adj.=0.986, RSD=0.0364]
Notice how much Q2 has increased as a result of the model pruning (see summary plot below), although responses 2, 3 and 4 still exhibit significant lack of fit. We conclude that this is due to the extremely low replicate errors for these three responses.
[Figure: Summary of Fit (MLR) for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~ after model pruning; N=24, DF=16.]
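Why can a very small replicate error produce significant lack of fit? The classical lack-of-fit F-test compares the lack-of-fit mean square with the pure-error mean square estimated from replicates, so a tiny pure error inflates the ratio even when the model fits well in absolute terms. The sketch below uses an RSD of 0.042 on 16 residual degrees of freedom, similar to the refined models, and assumes the pure error comes from the four centre-points (3 DF); the two replicate standard deviations are hypothetical. Whether MODDE's Model Validity statistic is computed exactly this way is not stated in this exercise.

```python
def lack_of_fit_f(rsd, df_res, replicate_sd, df_pe):
    """Classical lack-of-fit F ratio from the residual SD and the replicate (pure-error) SD."""
    ss_res = rsd**2 * df_res            # residual sum of squares
    ss_pe = replicate_sd**2 * df_pe     # pure-error sum of squares from the replicates
    ss_lof = ss_res - ss_pe             # lack-of-fit sum of squares
    df_lof = df_res - df_pe
    return (ss_lof / df_lof) / (ss_pe / df_pe)

# Replicate SDs are hypothetical; RSD and DF resemble the refined chiral models
for s_rep in (0.02, 0.005):
    f = lack_of_fit_f(rsd=0.0420, df_res=16, replicate_sd=s_rep, df_pe=3)
    print(f"replicate SD {s_rep}: F = {f:.1f}")
```

With the same residual error, shrinking the replicate SD by a factor of four raises F by more than an order of magnitude, which is the mechanism invoked in the text.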
In order to interpret the regression models, we created the eight response contour plots shown below. These were constructed using Eluent modifier (ACN) and Temperature as the axes and fixing Ionic strength and pH at their centre levels. The two quartets of response contour plots suggest that the lower left-hand corner (ACN low, temp low, Ion centre, pH centre) is the most interesting.
The conclusion from the eight response contour plots is that achieving retention times below 15 minutes will not be a problem. It should also be possible to get all four resolution responses above 1.5. To check this, we used MODDE's Optimizer functionality to locate the optimum factor settings. Because we already know that the capacity factors are not a problem, we excluded them from the optimisation. The specification of the response targets is shown below.
According to the results of the Optimizer (below), simplex #6 is the best. This combination of Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04 and pH = 6.6 is close to the lower left-hand corner identified above in the eight response contour plots.
Task 3
The prediction list below shows point estimates and their associated 95% confidence intervals. The results for the verifying experiment all fall within the 95% confidence intervals, which corroborates, albeit with just one point, the predictive power of the model.
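For an ordinary least-squares model, the confidence interval for the mean response at a new point x0 is y0 ± t · s · sqrt(x0' (X'X)^-1 x0), where s is the RSD and t the two-sided t quantile for the residual degrees of freedom. A NumPy sketch on hypothetical one-factor data; it is an assumption that MODDE's prediction list uses exactly this interval. For the capacity factors, prediction and measurement must be compared on the same (log) scale.

```python
import numpy as np

def mean_response_ci(X, y, x0, t_crit):
    """Point estimate and confidence interval for the mean response at x0 (OLS).

    X must contain a leading column of ones; t_crit is the two-sided t quantile
    for n - p residual degrees of freedom (about 2.365 for 95% and 7 DF).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - p))          # residual standard deviation (RSD)
    xtx_inv = np.linalg.inv(X.T @ X)
    y0 = x0 @ beta
    half_width = t_crit * s * np.sqrt(x0 @ xtx_inv @ x0)
    return y0, y0 - half_width, y0 + half_width

# Hypothetical data: one coded factor, nine runs, response on a log scale
x = np.linspace(-1.0, 1.0, 9)
y = 0.40 - 0.25 * x + np.array([0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.015, 0.01, -0.01])
X = np.column_stack([np.ones_like(x), x])

y0, lo, hi = mean_response_ci(X, y, np.array([1.0, 0.2]), t_crit=2.365)
print(f"prediction {y0:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A verifying measurement is consistent with the model if it falls between lo and hi.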
Conclusions
Excellent R2 and Q2 were obtained for all four capacity factors. However, the models for responses 2-4 suffered from significant lack of fit, which was undoubtedly due to the extremely low replicate errors associated with these responses. The predictions for the verifying experiment were very close to the actual results obtained, which gives confidence in the predictive power of the models. Using MODDE's Optimizer, it was easy to locate the factor settings which met the experimental objectives: Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04 and pH = 6.6. These settings ensure complete baseline separation within reasonable retention times.
DOE-Exercise Metabolism (RSM)

Optimization of a microsome-based metabolism assay
Background
In the pharmaceutical industry it is important to study metabolism of candidate drugs. One approach is to incubate substances with microsomal preparations which may be used as model systems to investigate e.g. liver metabolism. During incubation, dedicated inhibitors may be used to block enzymes. This will help uncover which enzyme in the microsomes is responsible for metabolizing a specific drug. However, in order to obtain reliable results it is first necessary to ascertain that the compound under study is sufficiently well metabolized. In the current application the aim was to ensure that drug metabolism exceeded 40%. The example originates from Carlsson Research AB, Gothenburg, Sweden, and we gratefully acknowledge the company for allowing us to use this data set.
Objective
The objective of the investigation was to optimise the assay conditions for the enzymes such that a maximum of 60% of the drug was left after incubation with the microsomal preparation.
Data
The following five factors were of interest:
Comments/Explanation:
Drug: Drug concentration [µM]. The higher the drug concentration, the greater the risk that the drug itself will inhibit the enzymes. Expressed on a log scale.
Microsome: Microsome concentration [mg/ml]. The more enzyme, the more rapid the metabolism. Expressed on a log scale.
NADPH: NADPH concentration [mM]. Enzyme co-factor. The more co-factor, the lower the risk of total NADPH depletion before the end of the experiment. Expressed on a log scale.
Time: Duration of incubation [min]. A longer duration gives the enzymes more opportunity to metabolize the drug. The risk, however, is that other factors will be depleted, so there may be no net gain from prolonging the incubation time.
Ionic strength: Ionic strength of the Na/K-phosphate buffer used [mM]. This buffer may affect the ability of the enzymes to interact with the drug.
The following response was recorded:
Comments/Explanation:
%Left: Amount of drug left at the end of the incubation experiment, measured by LC-MS. The experimental objective was to achieve a value of less than 60%.
In order to conduct this optimization study, the five factors were varied using a CCF design. This is a standard RSM design in 26 + 3 experiments.
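The run count 26 + 3 is consistent with a half-fraction 2^(5-1) cube (16 runs), 2 x 5 face-centred axial points (10 runs) and 3 centre-points. A NumPy sketch of such a design in coded units; the generator E = ABCD is an assumption, and MODDE's actual fraction and run order may differ. Because the first three factors are log-transformed, their coded levels correspond to equally spaced log-concentrations.

```python
import itertools
import numpy as np

def half_fraction_2_5():
    """2^(5-1) fractional factorial in coded units; generator E = ABCD is an assumption."""
    base = np.array(list(itertools.product([-1, 1], repeat=4)))
    e = base.prod(axis=1, keepdims=True)
    return np.hstack([base, e])

def ccf_design(n_center=3):
    """Face-centred composite: fractional cube + axial points at +/-1 + centre points."""
    cube = half_fraction_2_5()
    k = cube.shape[1]
    axial = np.vstack([sign * np.eye(k, dtype=int)[i] for i in range(k) for sign in (-1, 1)])
    center = np.zeros((n_center, k), dtype=int)
    return np.vstack([cube, axial, center])

design = ccf_design()
print(design.shape)  # (29, 5)
```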
Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Remember to specify the log transform for the first three factors. Select RSM as objective. Accept the recommended 29-run design (CCF design in 26 runs plus 3 centre-points). Enter the response values in the Worksheet. Evaluate the raw data. Are there any outliers? Is there a need for response transformation? What can you say about the replicate error?
Task 2
Fit the quadratic regression model. Determine which factors have the strongest influence on the metabolism of the drug by looking at the coefficient plot. Review the fit and revise the model if needed. Which factor combination represents the optimal metabolism environment for the enzymes in the microsomal preparations?
Solutions to Metabolism
Task 1
Experiment number 15 deviates from the rest (below, top left). This is a very interesting point as it is the only one in the worksheet meeting the stipulated goal of %Left less than 60%. Hence, we are reluctant to remove it. The replicate error is very small compared with the overall response variation. Experiment 15 also causes the distribution of %Left to be skewed (below, top right). One possible remedy for this skewness is the NegLog transformation. The results after applying this transform are shown in the two lower plots. Evidently, the NegLog transformation is a sensible choice, since the distribution of %Left is closer to a normal distribution after transformation. In the following, we will work with the transformed response variable.
[Figure: Top: Plot of Replications for %left with Experiment Number labels (left) and Histogram of %left (right). Bottom: the same plots for the NegLog-transformed response %left~.]
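One common form of a reflect-and-log transformation for a response with a long tail towards low values is y' = -log10(c - y), with c larger than every observed value; it keeps the ordering of the data while pulling in the tail. MODDE's exact NegLog definition is not given in this exercise, so this form, the constant c = 105 and the %Left values below are hypothetical. The sketch compares sample skewness before and after the transform.

```python
import numpy as np

def neg_log(y, c):
    """Reflect-and-log transform -log10(c - y); assumed form, requires c > max(y)."""
    y = np.asarray(y, dtype=float)
    if np.any(y >= c):
        raise ValueError("c must exceed every response value")
    return -np.log10(c - y)

def skewness(x):
    """Sample skewness based on population moments (no bias correction)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

# Hypothetical %Left values: one low point, the rest clustered at high values
y = [44, 80, 85, 88, 90, 92, 93, 95, 97, 99]
print(round(skewness(y), 2), round(skewness(neg_log(y, c=105)), 2))
```

Because -log10(c - y) increases with y, a low %Left still maps to a low transformed value, so the direction of the optimisation goal is unchanged.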
Task 2
The fitted quadratic model contains 5 (linear) + 5 (quadratic) + 10 (two-factor interaction) = 20 terms plus the constant. Clearly, many of these are not significant according to the confidence intervals. The model also has negative Q2, which is unsatisfactory. The normal probability plot suggests no outliers in the data and there is no lack of fit (MVal > 0.25).
[Figure: Summary of Fit, Scaled & Centered Coefficients for %left~ and normal probability plot of the residuals (with Experiment Number labels) for the full quadratic model; N=29, DF=8, R2=0.964, Q2=-1.592, R2 Adj.=0.875, RSD=0.1076.]
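Q2 is based on PRESS, the sum of squared leave-one-out prediction errors, so it penalises terms that merely fit noise; this is why pruning can raise Q2 sharply while R2 barely changes. A NumPy sketch of the standard formulas, using the identity that the leave-one-out residual equals e_i / (1 - h_ii); the data are hypothetical, and it is an assumption that MODDE computes Q2 with exactly this leave-one-out scheme.

```python
import numpy as np

def fit_statistics(X, y):
    """R2, R2 adj., Q2 (leave-one-out via PRESS) and RSD for an OLS model.

    X must include a leading column of ones and have full column rank with
    no leverage equal to 1 (i.e. the model must not be saturated).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))   # diagonal of the hat matrix
    press = np.sum((resid / (1.0 - leverage)) ** 2)          # squared leave-one-out residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_res = resid @ resid
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "R2adj": 1.0 - (ss_res / (n - p)) / (ss_tot / (n - 1)),
        "Q2": 1.0 - press / ss_tot,
        "RSD": np.sqrt(ss_res / (n - p)),
    }

# Hypothetical data: the response depends on 2 of 5 candidate regressors
rng = np.random.default_rng(1)
Z = rng.uniform(-1.0, 1.0, size=(20, 5))
y = 1.0 + 0.8 * Z[:, 0] - 0.5 * Z[:, 1] + 0.1 * rng.normal(size=20)

ones = np.ones((20, 1))
full = fit_statistics(np.hstack([ones, Z]), y)
pruned = fit_statistics(np.hstack([ones, Z[:, :2]]), y)
print({k: round(v, 3) for k, v in full.items()})
print({k: round(v, 3) for k, v in pruned.items()})
```

Since |e_i / (1 - h_ii)| is never smaller than |e_i|, Q2 can never exceed R2, and it becomes negative when PRESS exceeds the total sum of squares.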
In order to improve the modelling results, the following six model terms were discarded: Drug*NADPH, NADPH*Time, NADPH*Ionic strength, Drug*Drug, Mic*Mic and NADPH*NADPH. For the refined model the performance statistics are: R2 = 0.96, Q2 = 0.85, MVal = 0.41 and Rep = 0.99. These are excellent results and there are no outliers. Hence, the model may be used for predicting a region in the experimental domain where the goal of %Left < 60 is attained.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility), Scaled & Centered Coefficients for %left~ and normal probability plot of the residuals for the refined model. Model terms: Drug, Mic, NADPH, Time, Ion, Time*Time, Ion*Ion, Drug*Mic, Drug*Time, Drug*Ion, Mic*NADPH, Mic*Time, Mic*Ion, Time*Ion; N=29, DF=14, R2=0.961, Q2=0.850, R2 Adj.=0.922, RSD=0.0850, Cond. no.=5.3422.]
The contour plot below shows a saddle surface and achieving %Left < 60 is not difficult. Staying in the lower part of the contour plot (low ionic strength and drug concentration, and high microsome and cofactor concentration) may enable response values as low as 20% left (80% of the drug is metabolized). The sweet-spot plot is coded according to the requirement on the response variable.
Conclusions
A very strong quadratic model was obtained. The factor combination of low ionic strength and drug concentration, high microsome and cofactor concentration, and an incubation time of 4 hours gives the lowest %Left inside the region explored. The relevance of this area was later verified using additional experiments.
DOE-Exercise WILLGE (Optimisation)

Chemical synthesis with the Willgerodt-Kindler reaction
Background
The Willgerodt-Kindler reaction, a rearrangement that takes place when aryl-alkyl-ketones are heated in the presence of sulphur and an amine, is difficult to explain. One way of investigating the reaction mechanism is to find the factors that have the greatest influence on the reaction. The current data are drawn from the thesis of Torbjörn Lundstedt, Umeå University, 1986.
Objective
To determine which factors are the most important using a fractional factorial design. To optimise the system utilising response surfaces.
Data
The proportion Sulphur/Ketone (mol/mol)
The proportion Amine/Ketone (mol/mol)
Temperature (°C)
Grain size of Sulphur (mm)
Stirring speed (rpm)
Goal: Quantitative (100%) yield.
Tasks
Phase 1: Screening Task 1
Generate a 2^(5-1) fractional factorial design. Enter the response values. Calculate a model showing the influence of the factors on the yield.
Task 2
Why don't you get a summary of fit plot? Edit the model by removing the two smallest terms. Recalculate the model. Which terms do you think are significant? Which factors can be neglected in further investigations?
Phase 2: Optimisation (RSM) Task 3

Define a new investigation. Continue with the three most important factors from the screening and generate a response surface design (CCC) with 6 centre points. Enter the response values listed in the table below. Note that the factor co-ordinates of the axial points have been rounded off to more manageable numbers. Calculate a regression model and carry out the necessary model revision. Interpret the model. Which factor combination will allow a maximisation of the synthetic yield?
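For three factors, a central composite circumscribed (CCC) design with 6 centre points has 8 cube + 6 axial + 6 centre = 20 runs. A NumPy sketch in coded units; the rotatable axial distance alpha = (2^k)^(1/4) is an assumption about the design MODDE generates, and, as noted above, the axial coordinates in this exercise were rounded off.

```python
import itertools
import numpy as np

def ccc_design(k=3, n_center=6):
    """Central composite circumscribed design in coded units; rotatable alpha is assumed."""
    cube = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    alpha = (2 ** k) ** 0.25                      # rotatability: (number of cube runs) ** (1/4)
    axial = np.vstack([s * alpha * np.eye(k)[i] for i in range(k) for s in (-1.0, 1.0)])
    center = np.zeros((n_center, k))
    return np.vstack([cube, axial, center]), alpha

design, alpha = ccc_design()
print(design.shape, round(alpha, 3))  # (20, 3) 1.682
```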
Solutions to Willge
Task 2
The design is saturated, i.e., there are no degrees of freedom left because we fitted a model of 16 terms (including the constant) to a design of 16 experiments. MODDE alerts the user to this undesirable situation by not plotting R2, R2adj or Q2. In the coefficient plot no confidence intervals are given, because the residuals are all zero, so the residual standard deviation cannot be estimated, and the t-distribution is undefined for zero degrees of freedom. Nevertheless, we can see that Te has the largest influence on Yield.
[Figure: Scaled & Centered Coefficients for Yield (%), saturated model with SK, MK, Te, Pa, Sti and their ten two-factor interactions; N=16, DF=0.]
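The saturation can be reproduced numerically: in a 2^(5-1) design the constant, the five main effects and the ten two-factor interactions give 16 mutually orthogonal columns, so the model interpolates the 16 runs exactly and leaves zero residual degrees of freedom. A NumPy sketch; the generator E = ABCD and the yield values are hypothetical.

```python
import itertools
import numpy as np

# 2^(5-1) design, generator E = ABCD (an assumption; MODDE's generator may differ)
base = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))
design = np.hstack([base, base.prod(axis=1, keepdims=True)])

# Model: constant + 5 main effects + 10 two-factor interactions = 16 columns
cols = [np.ones(16)] + [design[:, i] for i in range(5)]
cols += [design[:, i] * design[:, j] for i, j in itertools.combinations(range(5), 2)]
X = np.column_stack(cols)

rng = np.random.default_rng(0)
y = rng.normal(70.0, 10.0, size=16)          # hypothetical yields
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(X.shape, np.linalg.matrix_rank(X))        # (16, 16) 16
print("residual DF:", X.shape[0] - X.shape[1])  # 0
print("max |residual|:", np.abs(resid).max())   # essentially zero: the model interpolates the data
```

Dropping any two columns (for example the two smallest interactions) leaves two residual degrees of freedom, which is enough to compute R2, Q2 and confidence intervals.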
When removing the two smallest model terms, Pa*Sti and MK*Sti, a model is obtained that explains and predicts the variance in the data very well. From the coefficient plot we conclude that the three factors SK, MK, and Te have the largest influence on the Yield. Sti is also significant but will be neglected in further investigations. Through this screening we have thus reduced the number of factors from 5 to 3.
[Figure: Summary of Fit (R2, Q2) and Scaled & Centered Coefficients for Yield (%) after removing Pa*Sti and MK*Sti; N=16, DF=2.]
Task 3
When fitting the quadratic regression model to the data of the CCC design, a model was obtained with high R2 (0.98) and Q2 (0.85), but with negative MVal. Another diagnostic tool, the N-plot of residuals, pinpoints an outlier, i.e., experiment number 8. This outlier has to be removed in order to improve the modelling efficiency.
[Figure: Summary of Fit and normal probability plot of deleted studentized residuals for Yield, with Experiment Number labels; N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006.]
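The normal probability plot for this model is drawn on deleted studentized residuals: each residual is scaled by an RSD estimated with that run left out, so a single gross outlier cannot hide itself by inflating the scale. A NumPy sketch of the textbook formula, on hypothetical straight-line data with one planted outlier at the eighth run.

```python
import numpy as np

def deleted_studentized_residuals(X, y):
    """Externally studentized residuals for an OLS fit (X includes the constant column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))          # leverages
    sse = e @ e
    s_del = np.sqrt((sse - e**2 / (1 - h)) / (n - p - 1))   # RSD with run i left out
    return e / (s_del * np.sqrt(1 - h))

# Hypothetical data: small noise plus a planted outlier at index 7 (experiment 8)
x = np.arange(10.0)
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, 5.0, -0.05, 0.0])
y = 2.0 + 3.0 * x + noise
t = deleted_studentized_residuals(np.column_stack([np.ones_like(x), x]), y)
print(int(np.argmax(np.abs(t))) + 1)   # 8: experiment number of the most extreme run
```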
As we can see below, the removal of observation #8 improves the model. We now have an excellent model according to R2 and Q2. Observation #7 is somewhat far away in the residual plot, but it is not an influential point. There is also some indication of lack of fit (MVal), but in this case the replicate error is exceptionally low, which may, at least partly, explain why lack of fit appears.
[Figure: Summary of Fit and normal probability plot of residuals for Yield, with Experiment Number labels, after removing experiment 8; N=19, DF=9, R2=0.997, Q2=0.967, R2 Adj.=0.993, RSD=2.1588.]
[Figure: Scaled & Centered Coefficients for Yield. Model terms: SK, MK, Te, SK*SK, MK*MK, Te*Te, SK*MK, SK*Te, MK*Te; N=19, DF=9, R2=0.997, Q2=0.967, R2 Adj.=0.993, RSD=2.1588, confidence level 0.95.]
By creating the response contour plots shown below, we can see how the predicted Yield changes as a function of changes in the three factors. Evidently, the model forecasts a region of quantitative yield, i.e., with Temperature = 140 °C and high molar ratios in the other two factors.
Conclusions
This example illustrates the working principle of first conducting a careful screening investigation and thereafter a detailed optimisation study. The screening phase identified three factors as more influential than the others. When these three factors were brought into the optimisation stage, an area of quantitative yield (i.e., 100%) was discovered. The appropriateness of this region of operability was later verified experimentally by Torbjörn Lundstedt.
DOE-Exercise Drug D (Opt)

Stability: An investigation into the release profiles of a drug
Background
The stability of an analytical method (in this case release curves) cannot be investigated by changing one factor at a time. More information about the stability can be extracted using DOE. In this case, a small (small volume) design is laid out to describe how the factors should be altered around the standard settings to acquire information on the stability of the method (sensitivity to change). This example is intended to show how experimental design can simplify the examination of a method's sensitivity to small factor changes. The Drug D data originate from a pharmaceutical study at Astra Hässle performed by Tina Riesel and Åsa Backman.
Objective
The objective of this investigation was to examine how the release profile of Drug D was affected by changes in standard conditions (see below). Is the release after 1 hour the same as after 10 hours? Changes in standard conditions here refer to changes in the four factors: volume of an artificial stomach, its temperature, its fluctuation, and pH. In the written documentation, the manufacturer declared that after 1h the release should be between 20 and 40%, and after 10h above 80%. In addition, the specification stated that the factors should not cause more than 5% (1h) or 10% (10h) spread in each response. Hence, one experimental goal was to assess whether the variation in the release rates across the entire design was consistent with this claim.
Data
Tasks
Task 1
Generate a design so that a model with square terms can be evaluated (select a CCF-design). Enter the response data, calculate the model, and interpret the results.
Task 2
Refine the model and make response contour plots to examine whether the responses change appreciably.
Task 3
According to the specification, the changes in the factors should not cause the response to vary more than 5% (1h) or 10% (10h). Is this specification met?
Solutions to Drug D
Task 1
As seen in the Summary of Fit plot, the difference between R2 and Q2 is quite large for 10h, which indicates that the model might be too complicated or that there are some outliers. As shown by the coefficient plot, several coefficients are near zero.
[Figure: Summary of Fit and Scaled & Centered Coefficients for Release 1h and Release 10h, full quadratic model (Vol, Te, Sti, pH, their squares and the six two-factor interactions); N=27, DF=12. Release 1h: R2=0.948, Q2=0.710, R2 Adj.=0.886, RSD=0.3838. Release 10h: R2=0.908, Q2=0.333, R2 Adj.=0.801, RSD=0.7162.]
Task 2
After the removal of five terms we get an excellent model for the first response and a good model for the second response. We see that the 1h response is much more influenced by linear contributions from the factors than the 10h response. On the other hand, the quadratic influence is more pronounced for the latter response.
[Figure: Summary of Fit and Scaled & Centered Coefficients for Release 1h and Release 10h, refined model (Vol, Te, Sti, pH, Vol*Vol, Te*Te, pH*pH, Vol*pH, Sti*pH); N=27, DF=17. Release 1h: R2=0.943, Q2=0.864, R2 Adj.=0.913, RSD=0.3366. Release 10h: R2=0.889, Q2=0.705, R2 Adj.=0.831, RSD=0.6615.]
Task 3
To understand the features of the two responses better, we created the response contour plots shown below. These two figures suggest that the responses change dramatically as a result of altered factor settings. However, this is misleading.
Let us examine these plots more closely. We are tricked by the way they were constructed. More appropriate plots are given below. These plots are response surface plots in which the z-axis, the release axis, has been rescaled to values between 20 and 40% for 1h and between 80 and 100% for 10h. These are more appropriate ranges according to the original objectives of the investigation. Now we can see that the response surfaces are actually quite flat. Remarkably, the difference between the highest and the lowest measured values is as low as 4.1% for 1h and 5.5% for 10h. Hence, we conclude that the release responses are robust because they are inside the given specifications (less than 5% or 10% variation).
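The specification check in Task 3 reduces to comparing each response's spread across the design (highest minus lowest measured value) with the allowed spread. A minimal Python sketch using the spreads quoted above (4.1% and 5.5%); the function name is hypothetical.

```python
# Spread = highest minus lowest measured release (%), as quoted in the text
observed_spread = {"Release 1h": 4.1, "Release 10h": 5.5}
# Maximum spread allowed by the specification (%)
allowed_spread = {"Release 1h": 5.0, "Release 10h": 10.0}

def meets_spec(observed, allowed):
    """True for each response whose spread does not exceed its allowed spread (hypothetical helper)."""
    return {name: observed[name] <= allowed[name] for name in observed}

print(meets_spec(observed_spread, allowed_spread))  # {'Release 1h': True, 'Release 10h': True}
```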
Conclusions
The release rate after one hour mainly relates linearly (except for pH*pH) to the four factors. The extent of quadratic dependence is more apparent for the release rate after ten hours. The specification for the 1h response is met. The specification for the 10h response is met.
NONAFACT (Robustness Testing)

Assessing robustness of a viral inactivation step in the manufacturing of a blood product
Background
The preparation of therapeutic products derived from the blood of voluntary donors is an important route for tomorrow's pharmaceutical industry. This is because human blood and plasma comprise many proteins which, once extracted and purified, are of great medical and economic importance. Since the health of the millions of patients who receive blood-derived products every year depends on the quality of the processed blood and plasma, it is crucial that high priority is placed on the quality assurance of such products. One big risk is the transmission of infectious diseases via blood transfusion. Strategies for screening blood for infectious agents are advancing, but this is a difficult and time-consuming process due to the continued discovery of new and emerging pathogens. At CLB (Dutch Red Cross)* in Amsterdam, designed experiments are routinely used as part of the viral safety strategy for blood-derived products. The current example is a robustness test of a viral reduction step in the manufacturing of a solvent/detergent-treated factor IX product called Nonafact. Recall that in a robustness testing study the objective is to probe robustness close to the set point (usually chosen as the center-point of the design). A robust system copes with small factor changes without compromising its effectiveness. In other words, robustness is a measure of a system's reliability under normal use.
*) Reference: H. Hiemstra, CLB. Presented at Blood-Products Safety, February 5-7, 2001, McLean, Virginia, USA, http://www.healthtech.com/2001/bss/.
Objective
The experimental objective of the study reviewed here was to explore how sensitive a viral inactivation step was to changes in six process parameters. The six factors studied were (i) percentage TNBP, (ii) percentage Tween80, (iii) temperature, (iv) amount of protein, (v) pH and (vi) concentration of NaCl. TNBP (tri-n-butylphosphate) and Tween80 (a detergent) help disintegrate the viruses, and the other factors may affect the viral inactivation process too. The response measured was the change in virus density when comparing density before and after treatment. Virus density is often expressed and valued on a logarithmic scale, and so any decrease in virus density is commonly expressed as [log(initial virus density) - log(final virus density)]. This difference is often referred to as the reduction factor, or simply RF, and the higher the better. Maintaining RF > 5 is often used as the specification. In the current study, CLB used three enveloped viruses as models: HIV (Human Immunodeficiency Virus), BVDV (Bovine Viral Diarrhea Virus) and PSR (PseudoRabies Virus). BVDV is used as a model virus for human hepatitis C. Responses were measured within 10 minutes following addition of the virucidal chemicals. This is a rather short time frame; other similar studies use up to 30 minutes.
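The reduction factor is a difference of logarithms, which is the same as the log of the ratio of initial to final virus density. A minimal Python sketch, assuming base-10 logarithms (the usual convention for log reduction, though the text does not state the base) and hypothetical titres.

```python
import math

def reduction_factor(initial_density, final_density):
    """RF = log10(initial virus density) - log10(final virus density); base 10 assumed."""
    return math.log10(initial_density) - math.log10(final_density)

rf = reduction_factor(initial_density=10**6.8, final_density=10**1.2)  # hypothetical titres
print(round(rf, 2), rf > 5)   # 5.6 True
```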
Data
Factors (process parameters):
Responses:
Design:
Tasks
Task 1
Start a new MODDE project. Define the six factors and the six responses as outlined above. Select Screening and the Frac Fac Res III design in 8 runs. Use 2 center-points (change the default proposal based on 3 center-points). On your screen the following design should appear:
The design above was not used by the investigators. Instead, they chose to use the following modification:
To obtain the altered experimental design, you will have to modify the design manually (or paste the contents of NONAFACT.XLS into the worksheet). Also enter or paste the response data. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc.). Is there a need for response transformation? How large or small is the replicate error? Do the responses comply with the often used specification of staying above an RF of 5? What can you say about the correlation between the six responses? What can you say about the geometry of the underlying design?
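An 8-run Resolution III fraction for six factors can be built from a 2^3 full factorial by defining the remaining three columns as interactions of the first three. The sketch below uses D = AB, E = AC and F = BC, a common textbook choice that is only an assumption about what MODDE proposes, adds 2 center-points, and checks that the six factor columns are orthogonal. How A to F map onto TNBP, Tween80, temperature, protein, pH and NaCl is left open.

```python
import itertools
import numpy as np

# Base 2^3 full factorial in A, B, C (coded -1/+1)
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T
# Generators D = AB, E = AC, F = BC are an assumption; MODDE's may differ
design = np.column_stack([A, B, C, A * B, A * C, B * C])
design = np.vstack([design, np.zeros((2, 6), dtype=int)])   # 2 center-points -> 10 runs

cube = design[:8]
print(cube.T @ cube)   # 8 * identity: all six factor columns are mutually orthogonal
```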
Task 2
Select MLR as the fit method and compute the model. Review and interpret the model. Which linear terms are important? Is this system robust?
Solutions to NONAFACT
Task 1
The replicate plots show acceptable spread between the two replicates for all responses except the second one. However, it would have been desirable to have at least three replicates. Neither the replicate plots nor the histograms (not shown) indicate any skewed response. Only the second response (HIV_5min) consistently scores RF values above the often-used specification of 5. However, one should remember the short measurement time. A longer time, e.g., 30 minutes, might have resulted in generally higher RF values.
[Figure: Plots of replications for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (investigation Nonafact), with experiment number labels.]
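The size of the replicate error can also be quantified numerically. The sketch below computes the replicate standard deviation and a ratio of pure-error variance to total variance, similar in spirit to the Reproducibility bar in MODDE's summary of fit plot (that this is exactly how MODDE defines it is an assumption here). The RF values are hypothetical. Note that with only two replicates the pure-error variance rests on a single degree of freedom, which is one reason the text asks for at least three.

```python
import numpy as np


def replicate_summary(y, replicate_idx):
    """Return (replicate SD, reproducibility-style ratio) for one response.

    y             : all response values of the design, in run order.
    replicate_idx : positions of the replicated runs (here the centre points).
    The ratio is 1 - var(replicates) / var(all runs); values near 1 mean the
    replicate error is small compared with the variation across the design.
    """
    y = np.asarray(y, dtype=float)
    var_rep = np.var(y[replicate_idx], ddof=1)  # pure-error variance
    var_all = np.var(y, ddof=1)                 # total variance about the mean
    return float(np.sqrt(var_rep)), float(1.0 - var_rep / var_all)


# Hypothetical RF values for a 10-run design whose runs 9 and 10 are centre points.
rf = [4.1, 5.3, 4.4, 5.6, 4.8, 5.9, 4.2, 5.5, 5.0, 5.1]
sd_rep, reproducibility = replicate_summary(rf, [8, 9])
print(f"replicate SD = {sd_rep:.3f}, reproducibility = {reproducibility:.3f}")
```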
The table below is the correlation matrix. It shows how all terms in the model and all responses relate to each other; a coloured cell indicates high correlation. Several interesting observations can be made. First, the factor pairs Protein/Tween80 and NaCl/Protein are correlated. This is unexpected and means that the original investigators failed to create a correct fractional factorial design, in which all factors would be mutually uncorrelated. These non-zero correlations inflate the confidence intervals of the regression coefficients of Tween80, Protein, and NaCl. Second, the factors TNBP, Tween80, and Temperature generally appear to exert the strongest influence on the responses, i.e., the responses are most susceptible to altered settings of these factors. Third, the response HIV_5min seems to differ from the others, since it correlates appreciably only with HIV_1min. The other five responses correlate more or less strongly with one another.
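The orthogonality check behind this observation can be reproduced outside MODDE: in a correct two-level fractional factorial design (with or without centre points) the factor columns are mutually uncorrelated, so their correlation matrix is the identity. The sketch below assumes a textbook 2^(6-3) resolution III design with generators D = AB, E = AC, F = BC (not necessarily those of the NONAFACT study) and shows how a single altered setting creates non-zero correlations.

```python
import itertools

import numpy as np

# Hypothetical 2^(6-3) design: D = AB, E = AC, F = BC (assumed generators).
base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
a, b, c = base.T
design = np.column_stack([a, b, c, a * b, a * c, b * c])
design = np.vstack([design, np.zeros((2, 6))])  # two centre points -> 10 runs


def factor_correlations(x):
    """Pearson correlation matrix between design columns (coded units)."""
    return np.corrcoef(x, rowvar=False)


corr = factor_correlations(design)
print(np.round(corr, 2))  # identity matrix for an orthogonal design

# Flipping one setting (a hypothetical design error) breaks orthogonality.
bad = design.copy()
bad[0, 3] = -bad[0, 3]
print(np.round(factor_correlations(bad), 2))
```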
Task 2
A linear regression model with seven terms (constant + six main effects) was fitted to each of the six responses. The summary of fit plot below shows that two significant models were obtained, those for HIV_1min and PSR_2min. Recall also that in robustness testing we generally do not spend much time on model refinement.
[Figure: Summary of fit plot for the MLR models of HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (investigation Nonafact); N=10, DF=3.]
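The summary of fit plot reports R2 (goodness of fit) and Q2 (goodness of prediction). A common way to compute Q2, and the one assumed in this sketch, is leave-one-out cross-validation: Q2 = 1 - PRESS/SS, with the leave-one-out residuals obtained from the hat matrix. The design (a 2^(6-3) fraction with assumed generators plus two centre points) and the simulated response are hypothetical. With N=10 and seven model terms only three residual degrees of freedom remain (DF=3), which makes Q2 sensitive to individual runs.

```python
import itertools

import numpy as np


def r2_q2(X, y):
    """R2 and leave-one-out Q2 of a least-squares MLR model.

    X must contain a column of ones for the constant term.
    Q2 = 1 - PRESS / SS_total, where PRESS sums the squared leave-one-out
    residuals e_i / (1 - h_ii) taken from the hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / ss_tot, 1.0 - press / ss_tot


# Hypothetical 2^(6-3) design (D=AB, E=AC, F=BC) plus two centre points: N=10.
base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
a, b, c = base.T
factors = np.vstack([np.column_stack([a, b, c, a * b, a * c, b * c]), np.zeros((2, 6))])
X = np.column_stack([np.ones(len(factors)), factors])  # constant + 6 main effects

rng = np.random.default_rng(0)
y = 5.0 + 0.5 * factors[:, 0] + rng.normal(scale=0.1, size=len(factors))  # simulated RF
r2, q2 = r2_q2(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Because each leave-one-out residual is at least as large as the ordinary residual, Q2 can never exceed R2.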
The coefficient overview plot presented below is useful for getting the overall picture. It appears that keeping TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM) generally corresponds to the most favourable operating conditions (yielding the highest virus reduction factors). With this pH setting there is a minor conflict with respect to BVDV_1min; however, the level of Tween80, which dominates for this response, is set advantageously.
[Figure: Normalized coefficients of TNBP, T80, Temp, Pro, pH and NaCl for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min (investigation Nonafact, MLR); N=10, DF=3.]
Six response contour plots give an easy overview of the results (see below). These plots were drawn with TNBP and Temp as the X- and Y-axes, respectively, and with Tween80 high, Protein high, pH low, and NaCl high. The colour coding is consistent throughout the six plots. These plots quickly show that, with the short measurement times used for the three virus strains, RF > 5 is not within reach for PSR_2min. The specification is within reach for BVDV_1min if pH is set high. Note that RF for HIV_5min is consistently predicted above 6, hence the flatness of this response contour plot.
Conclusions
Virus reduction factors above 5 are not achievable for PSR_2min. The best workable factor combination is TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM).
DOE-Exercise: HPLC Robustness (Robustness Testing)

Evaluating sensitivity of HPLC responses to small changes in regulatory factors
Background
The aim of robustness testing is to design a process, or a system, so that its performance remains satisfactory even when some influential factors are allowed to vary. In other words, we want to minimise the system's sensitivity to changes in certain critical factors. The advantages of this include simpler process control, a wider range of applicability of the product, and higher product quality. A robustness test is usually carried out before the release of an almost finished product, or analytical system, as a last test to ensure quality. Such a design is usually centred on the factor combination currently used for running the analytical system or the process. We call this the set point. The set point may have been found through a screening design, an optimisation design, or some other identification principle, such as written quality documentation. The aim of robustness testing is, therefore, to explore robustness close to the chosen set point. The example chosen as an illustration originates from a pharmaceutical company and represents a typical analytical chemistry problem within the pharmaceutical industry. In analytical chemistry, an HPLC method is often set up for routine analysis of complex mixtures. It is therefore important that such a system works reliably for a long time and is reasonably insensitive to varying chromatographic conditions. In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on the different retention of each analyte on the stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is considered the minimum value for separation between neighbouring peaks, but complete baseline separation requires a resolution above 1.5.
As the resolution value approaches zero, it becomes more difficult to discern separate peaks.
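The two kinds of chromatographic response can be made concrete with their standard textbook definitions: the capacity factor k = (tR - t0)/t0, where t0 is the column dead time, and the resolution Rs = 2(tR2 - tR1)/(w1 + w2), using baseline peak widths. The sketch below applies them to a hypothetical chromatogram; the retention times and widths are invented for illustration.

```python
def capacity_factor(t_r: float, t_0: float) -> float:
    """k = (tR - t0) / t0, with t0 the column dead (void) time."""
    return (t_r - t_0) / t_0


def resolution(t_r1: float, t_r2: float, w1: float, w2: float) -> float:
    """Rs = 2 (tR2 - tR1) / (w1 + w2), using baseline peak widths."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)


# Hypothetical chromatogram (times and widths in minutes).
t0 = 1.0
k1 = capacity_factor(3.0, t0)
k2 = capacity_factor(4.0, t0)
rs = resolution(3.0, 4.0, 0.5, 0.6)
print(k1, k2, round(rs, 2), rs > 1.5)  # 2.0 3.0 1.82 True
```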
Objective
The investigators explored five factors: (1) amount of acetonitrile in the mobile phase; (2) pH of the mobile phase; (3) temperature; (4) amount of the OSA counter-ion in the mobile phase; and (5) stationary phase batch (column), and mapped their influence on the chromatographic behaviour of two chemical analytes. Note that the last factor is qualitative. To study whether these factors influenced the chromatographic system, the researchers used a 12-run experimental design encoding 12 different chromatographic conditions. For each condition, three quantitative responses were measured: the capacity factors of the two analytes (compounds) and the resolution between them. The goal of this study was to maintain a resolution of 1.5 or higher under all chromatographic conditions. No specifications were given for the two capacity responses.
Data
A 12-run design supporting a linear model was constructed. This design, shown below, is a 2^(5-2) fractional factorial design (8 runs), supplemented with four centre-points.
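For readers who want to see the structure of such a design, the sketch below builds a coded 2^(5-2) fraction plus four centre points. The generators D = AB and E = AC are an assumption (the handout does not state which were used), and in the HPLC study one factor, the column batch, is qualitative, so its centre-point setting needs a convention this coded sketch does not model.

```python
import itertools

import numpy as np


def fractional_factorial_2_5_2(n_center: int = 4) -> np.ndarray:
    """Coded 2^(5-2) design (8 runs) plus centre points.

    Generators D = AB and E = AC are assumed for illustration only.
    """
    base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
    a, b, c = base.T
    corners = np.column_stack([a, b, c, a * b, a * c])
    centres = np.zeros((n_center, 5))
    return np.vstack([corners, centres])


design = fractional_factorial_2_5_2()
print(design.shape)  # (12, 5)
```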
Tasks
Task 1
Define a new investigation in MODDE with five factors and three responses. Select Screening, a linear model, and a relevant fractional factorial design with 8 + 4 runs. Enter the response data. Evaluate the raw data. Is there any need for data pre-treatment, such as a response transformation?
Task 2
Fit the linear regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Comment on any lack of fit. Which responses are robust to changes in the five factors?
Task 3
Assuming that the specification for k2 was 2.7 to 3.3, what would your recommendation be for changing the tolerances of the factors so that robustness is likely to be achieved for this response? NOTE #1: This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is shown here for illustration. NOTE #2: Use the discussion regarding the four limiting cases of robustness testing. It will give guidance to how this problem might be solved.
Solutions to HPLC Robustness

Task 1
We start the evaluation of the raw data by inspecting the replicate error of each response. As the replicate plots show, the replicate error was, as expected, small. We would not anticipate large drifts among the replicates, as we have deliberately set up a design in which each run ideally should produce equivalent results. The numerical variation in the resolution response was small: the lowest measured resolution was 1.75 and the highest 1.89. Since the operative goal was to maintain a resolution above 1.5, the raw data already show that this goal was fulfilled, which means that Res1 is robust.
[Figure: Plots of replications for k1, k2 and Res1 (investigation HPLC Robustness), with experiment number labels.]
When evaluating the raw data, it is essential to check the distribution of each response to reveal any need for a response transformation. This can be done with a histogram of each response. The histograms below show that it is appropriate to work in the untransformed scale for each response. In most cases it is convenient to work with log k, but not here.
[Figure: Histograms of k1, k2 and Res1 (investigation HPLC Robustness).]
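A histogram can be complemented by two simple numbers: the sample skewness and the ratio of the largest to the smallest value. A commonly quoted rule of thumb (assumed here, not a MODDE rule) is to consider a log transformation when a positive response spans more than about one order of magnitude (max/min > 10) or is clearly skewed. The values below are hypothetical.

```python
import numpy as np


def transformation_hints(y):
    """Numerical companions to a histogram when judging a log transform."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    skewness = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5  # moment coefficient g1
    return {
        "skewness": float(skewness),
        "max_min_ratio": float(y.max() / y.min()),  # assumes strictly positive data
    }


# Hypothetical, roughly symmetric capacity-factor values (not the study's data).
hints = transformation_hints([1.7, 1.8, 1.9, 2.0, 2.0, 2.1, 2.2, 2.3])
print(hints)
```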
Task 2
The regression analysis phase in robustness testing is carried out in a manner similar to that of screening and optimisation. However, the focus is primarily on the R2 and Q2 parameters and the analysis of variance, and less on residual plots and other graphical tools. The reason is that the interest in robustness testing lies in classifying the regression model as significant or not significant; this classification then gives an understanding of the robustness. Another modelling difference between robustness testing and screening/optimisation is that model refinement is usually not carried out. We fitted a linear model with 6 terms to each response. The overall results of the model fitting are displayed in the summary of fit plot. The predictive power ranges from poor to excellent: the Q2 values are 0.92, 0.96, and 0.12 for k1, k2, and Res1, respectively. In robustness testing the ideal result is a Q2 near zero. Hence, the Q2 of 0.12 for Res1 indicates an extremely weak relationship between the factors and the response, that is, the response seems to be robust. The low Q2 for Res1 might be explained by the fact that this response is nearly constant across the entire design, so there is little response variation to account for. The high Q2 values of k1 and k2, on the other hand, indicate that these responses are sensitive to the small factor changes. However, for these latter responses we cannot make any robustness statement, as there are no specifications to compare with.
[Figure: Summary of fit plot (R2, Q2, model validity, reproducibility) for k1, k2 and Res1 (investigation HPLC Robustness, MLR); N=12, DF=6.]
The results of the second diagnostic tool, the analysis of variance, are summarised in the three tables. Remembering that the upper p-value (the significance test of the regression model) should be smaller than 0.05 and the lower p-value (the lack-of-fit test) larger than 0.05, we realise that the regression test is a borderline case for Res1, because its p-value is 0.059. This suggests that the model for Res1 is insignificant, and therefore that Res1 is robust.
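The two p-values in such an ANOVA table can be sketched with ordinary least squares: the regression F-test compares explained to residual variance, and the lack-of-fit test splits the residual sum of squares into pure error (from replicates) and lack of fit. This is a generic textbook sketch, assumed to mirror the structure of MODDE's table rather than reproduce its exact output; the 12-run design (assumed generators D = AB, E = AC) and the simulated response are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def mlr_anova(X, y, replicate_groups):
    """Regression F-test and lack-of-fit F-test for a least-squares model.

    X: model matrix with a leading column of ones.
    replicate_groups: list of index lists, one per set of replicated runs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_reg = float(np.sum((fitted - y.mean()) ** 2))
    df_reg, df_res = p - 1, n - p
    f_reg = (ss_reg / df_reg) / (ss_res / df_res)

    # Split the residual SS into pure error (within replicate groups) and lack of fit.
    ss_pe = float(sum(np.sum((y[g] - y[g].mean()) ** 2) for g in replicate_groups))
    df_pe = sum(len(g) - 1 for g in replicate_groups)
    ss_lof, df_lof = ss_res - ss_pe, df_res - df_pe
    f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)

    return {
        "ss_reg": ss_reg, "ss_res": ss_res, "ss_pe": ss_pe, "ss_lof": ss_lof,
        "p_regression": float(stats.f.sf(f_reg, df_reg, df_res)),  # "upper" p-value
        "p_lack_of_fit": float(stats.f.sf(f_lof, df_lof, df_pe)),  # "lower" p-value
    }


# Hypothetical 2^(5-2) design + 4 centre points; constant + 5 main effects (DF=6).
base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
a, b, c = base.T
factors = np.vstack([np.column_stack([a, b, c, a * b, a * c]), np.zeros((4, 5))])
X = np.column_stack([np.ones(12), factors])
rng = np.random.default_rng(1)
y = 1.82 - 0.1 * factors[:, 0] + rng.normal(scale=0.01, size=12)  # simulated response
res = mlr_anova(X, y, replicate_groups=[[8, 9, 10, 11]])
print(f"regression p = {res['p_regression']:.3g}, lack-of-fit p = {res['p_lack_of_fit']:.3g}")
```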
Task 3
The derived models will now be used in a general discussion of the various outcomes of robustness testing. This discussion also presents a possible solution to the problem given in Task 3.
First limiting case: Inside specification/Significant model
The first limiting case is inside specification with a significant model. The HPLC application contains one example of this limiting case, the Res1 response. We know from the initial raw data assessment that this response is robust, because all the measured values are inside the specification, that is, above 1.5. In fact, as highlighted in the first figure below, the measured values are all above 1.75. The question of a significant model, however, is more debatable. It is possible to interpret the regression model as a weakly significant regression equation, and we do so in this section for the sake of illustration. The classification of the model as significant is based on a joint assessment of the low, but positive, Q2, seen in the second figure, and the significant linear term of acetonitrile, seen in the third figure. Hence, Res1 may be regarded as representative of the first limiting case.
[Figures: plot of replications for Res1; summary of fit for k1, k2 and Res1; scaled and centred coefficients for Res1 (Ac, pH, Te, OS, Co(ColA), Co(ColB)) (investigation HPLC Robustness, MLR). Res1 model: N=12, DF=6, R2 = 0.772, Q2 = 0.121, R2 Adj. = 0.582, RSD = 0.0248, confidence level 0.95.]
An interesting consequence of these modelling results is that it appears to be possible to relax the factor tolerances and still maintain a robust system. For instance, the model interpretation reveals that the amount of acetonitrile could be as high as 28%, without compromising the goal of upholding a resolution above 1.5. Furthermore, in robustness testing it may be useful to estimate the response values of the most extreme experiments. The regression coefficient plot shows how to obtain these estimates. We can see that one extreme experimental condition is given by the factor combination: low Ac, high pH, high Te, high OS, and ColB. The other extreme experiment is this pattern reversed. The prediction spreadsheet gives these Res1 predictions and they are both valid with regard to the given specification.
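For a linear model in coded units, the two extreme experiments described here follow directly from the coefficient signs: the highest prediction is obtained by setting each factor to the level matching the sign of its coefficient, and the lowest by reversing that pattern. The sketch below assumes the two-level column factor can be coded as -1 (ColA) and +1 (ColB); the coefficient values are hypothetical, chosen only so that their signs reproduce the pattern named in the text.

```python
import numpy as np


def extreme_predictions(b0, coefs):
    """Highest and lowest predictions of a linear model over the coded cube [-1, 1]^k.

    Each factor is set to the level matching the sign of its coefficient for the
    maximum, and the reversed pattern is used for the minimum.
    """
    coefs = np.asarray(coefs, dtype=float)
    x_max = np.sign(coefs)
    x_max[x_max == 0] = 1.0  # a zero coefficient can sit at either level
    x_min = -x_max
    return (b0 + coefs @ x_max, x_max), (b0 + coefs @ x_min, x_min)


# Hypothetical scaled-and-centred coefficients for Ac, pH, Te, OS, Col (not the study's values).
(hi, x_hi), (lo, x_lo) = extreme_predictions(1.82, [-0.03, 0.01, 0.005, 0.004, 0.006])
print(hi, x_hi)  # low Ac, high pH, high Te, high OS, ColB
print(lo, x_lo)  # the reversed pattern
```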
Second limiting case: Inside specification/Non-significant model
The second limiting case is inside specification with a non-significant model. This is the ideal outcome of a robustness test. Again, we use the Res1 response as an illustration. We know that the measured values of this response are all inside specification. In addition, we can interpret the obtained regression model as non-significant. This classification contradicts the one made in the previous section, but it is still reasonable and is made to illustrate the second limiting case. In general, two diagnostic tools are the most appropriate for assessing model significance. The first is the pair of R2/Q2 parameters. When both are near zero, as in the left-hand figure below, we have the ideal case: we are trying to model a system in which there is no relationship between the factors and the response in question. In reality, one has to expect small deviations from this outcome. A typical result is an R2 that is rather large, in the range 0.5-0.8, with a Q2 that is low or close to zero. As the middle figure shows, this is the case for Res1 (R2 = 0.772, Q2 = 0.121), which points to an insignificant model.
[Figures: summary of fit for the response vetific (investigation itdoe_roblimcases, MLR; N=11, DF=5) and for k1, k2 and Res1 (investigation HPLC Robustness, MLR; N=12, DF=6).]
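The paragraph that follows summarises a response without a useful model as the mean value ± t-value × standard deviation. A minimal sketch of that summary, assuming the t-value is the two-sided quantile of Student's t distribution with n - 1 degrees of freedom (the text does not state the degrees of freedom) and using hypothetical Res1-like values:

```python
import numpy as np
from scipy import stats


def mean_pm_t_sd(values, confidence=0.95):
    """Interval mean ± t * s for a response with no useful regression model.

    s is the sample standard deviation; t is the two-sided quantile of
    Student's t distribution with n - 1 degrees of freedom.
    """
    values = np.asarray(values, dtype=float)
    m = values.mean()
    s = values.std(ddof=1)
    t = stats.t.ppf(0.5 + confidence / 2.0, df=values.size - 1)
    return m - t * s, m + t * s


low, high = mean_pm_t_sd([1.75, 1.79, 1.82, 1.84, 1.89])  # hypothetical Res1 values
print(f"{low:.3f} to {high:.3f}")
```

Because the standard deviation of the individual values (not of the mean) is used, the interval describes where individual runs are expected to fall.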
The second important modelling tool is the analysis of variance, particularly the upper F-test, which is a significance test of the regression model. The right-hand figure shows that the Res1 model is weakly insignificant, because the p-value (0.059) exceeds 0.05. Hence, we conclude that no useful model is obtainable. When no model is obtainable, it is reasonable to regard all the variation in the experiments as variation around the mean. This variation can then be summarised as the mean value ± t-value × standard deviation.
Third limiting case: Outside specification/Significant model
The third limiting case is outside specification with a significant model. This limiting case occurs whenever a significant regression model is obtained and the raw response data themselves do not fulfil the goals of the problem formulation. We use the second response, k2, of the HPLC data to illustrate this limiting case. For a meaningful illustration we have to define a specification for k2, for example that k2 should be between 2.7 and 3.3. This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is shown here for illustration. We start by assessing the statistical behaviour of the k2 regression model. The left-hand figure below shows that k2 (as well as k1) is sensitive to small factor changes. To understand what causes this susceptibility, it is necessary to consult the regression coefficients displayed in the right-hand figure.
[Figures: summary of fit for k1, k2 and Res1 (N=12, DF=6); scaled and centred coefficients for k2 (pH, Co(ColA), Co(ColB), OS, Ac, Te) (investigation HPLC Robustness, MLR). k2 model: R2 = 0.989, Q2 = 0.959, R2 Adj. = 0.981, RSD = 0.0418, confidence level 0.95.]
We can see that mainly acetonitrile, pH and temperature affect k2. Using the procedure outlined for the first limiting case, we can work out how to change the factor intervals to accomplish two things: (i) to get k2 inside specification; and (ii) to produce a non-significant model (i.e., to approach the second limiting case). First, it is possible to predict the most extreme values of k2 within the investigated region. These are the predictions listed in the first two rows of the next figure, and they amount to 2.50 and 3.49. Clearly, these are outside the 2.7-3.3 specification.
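The acetonitrile calculation in the following paragraph rests on a simple proportionality: for a linear model, a scaled and centred coefficient is the response change over one half-range of the factor, so shrinking the half-range by some factor shrinks the coefficient by the same factor. A minimal sketch, assuming the original acetonitrile levels were 25% and 27% around a 26% set point (only the 26% and 27% levels are stated explicitly) and a target coefficient of 0.033, i.e., the factor-of-10 reduction:

```python
def narrowed_tolerance(centre, half_range, coef, target_coef):
    """New (low, high) levels that shrink a coefficient to target_coef.

    Assumes a linear model, so the coefficient scales linearly with the
    half-range of the factor.
    """
    new_half = half_range * abs(target_coef) / abs(coef)
    return centre - new_half, centre + new_half


# Acetonitrile: centre 26 %, half-range 1 %, coefficient 0.33 -> about 0.033.
low, high = narrowed_tolerance(26.0, 1.0, 0.33, 0.033)
print(round(low, 2), round(high, 2))  # 25.9 26.1
```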
To move within this specification, we must adjust the ranges of the three influential factors, as shown in rows three and four. If we also want the 95% confidence intervals, and not only the point estimates, to be inside specification, somewhat stricter demands on the factors are needed. Moreover, to get a non-significant regression model, even narrower factor intervals are needed. This is done as follows. The regression coefficient of acetonitrile is 0.33 and its 95% confidence interval is ± 0.036. For acetonitrile to become non-influential for k2, its coefficient must shrink to roughly the size of this confidence interval, i.e., be decreased by a factor of 10 to below about 0.03. Since this coefficient corresponds to the response change when the amount of acetonitrile is increased by 1% (from 26% to 27%), the new high level must be lowered from 27% to 26.1%. Similar reasoning applies to the new low level. Hence, the narrower, more robust, tolerances for acetonitrile ought to be 25.9% to 26.1%. Similar reasoning for temperature indicates that its interval should be reduced to one-third of the original size; appropriate low and high levels thus appear to be 20 °C and 23 °C. The predictions obtained are listed in rows five and six. These new settings must, of course, be verified with a new design. This concludes our treatment of the third limiting case. The take-home message is that the modelling results can be used to reformulate the factor settings so that robustness is obtained.
Fourth limiting case: Outside specification/Non-significant model
The fourth limiting case is outside specification with a non-significant model. This limiting case may arise when the derived regression model is poor and there are anomalies in the data. Such anomalies are important to uncover, because their presence will influence the modelling.
An informative graphical tool for identifying whether this limiting case is taking place is the replicate plot. The left-hand figure shows an example in which one strong outlier is present, which will invalidate all possibilities for robustness. The second figure depicts another case where all the replicated centre points have much higher response values than the other runs. This pattern hints at curvature and implies non-robustness. A third common situation, which partly resembles the first case, might take place when one experiment deviates from the rest and also falls outside some predefined robustness limits. This is shown in the last figure.
[Figure: Plots of replications for the response vetific (investigation itdoe_roblimcases), with experiment number labels, illustrating the three situations described above.]
Evidently, there can be several underlying explanations for this limiting case, and we have shown only a few. We therefore consider this limiting case the most complex one. In summary, we have described four limiting cases of robustness testing, and it is important to realise that robustness testing results are not confined to these four extreme outcomes. In principle, there is a gradual transition from one limiting case to another, and hence an infinite number of outcomes are conceivable.
Conclusions
Evaluation of the data demonstrated that the response Res1 was robust because it was possible to maintain a resolution above 1.5 for all 12 experiments.
DOE-Exercise CakeTaguchi (Inner/Outer arrays)

Finding an optimal cake mix composition insensitive to time and temperature fluctuations
Background
This is an industrial pilot plant investigation aimed at designing a cake mix giving tasty products.
Objective
The final goal was to design a cake mix that would produce a good cake even when a customer does not rigidly follow the baking instructions. To explore whether this was feasible, the factors Flour, Shortening, and Egg Powder were used as design factors and varied in a cubic inner array. They were varied between 200 and 400 g (Flour), 50 and 100 g (Shortening), and 50 and 100 g (Egg Powder). In addition, two noise factors were incorporated in the experimental design as a square outer array. These factors were baking Temperature, varied between 175 and 225 °C, and Time spent in the oven, varied between 30 and 50 minutes.
Data
The investigators made 55 experiments. The inner array is a two-level full factorial design in 11 (8+3) runs, and the outer array a two-level full factorial design in 5 (4+1) runs, resulting in 11*5 = 55 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan is given in Figure 1.
Figure 1: The arrangement of the factors as inner and outer arrays. This arrangement was introduced by the Japanese engineer Genichi Taguchi.
Organisation of data for part I (MODDE worksheet should have 11 runs): The classical approach for analysing DOE data organised in inner and outer arrays is to form, for each point in the inner array (here: Cake Mix factors), the average response value across all points in the outer array (here: Time & Temperature). This gives two responses, the average taste for each point in the inner array, and the standard deviation around this average. Note: with this approach there will be no model terms related to Time and Temperature.
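The classical summarisation can be sketched directly from the taste values in the 55-run table of this exercise: for each of the 11 inner-array runs, take the mean and standard deviation of Taste over the 5 outer-array conditions. The sketch assumes the sample standard deviation (n - 1 denominator) and a base-10 logarithm for LogStD (consistent with the plotted range of LogStD, but not stated in the handout). With these data, run 6 has the highest average Taste (5.9) and run 8 the lowest StDev (about 0.67).

```python
import numpy as np

# Taste for the 11 inner-array runs (rows) under the 5 outer-array conditions
# (columns: Temp/Time = 175/30, 225/30, 175/50, 225/50, 200/40), from the 55-run table.
taste = np.array([
    [1.1, 5.7, 6.4, 1.3, 3.1],    # 1: Flour 200, Shortening 50, Egg 50
    [3.8, 4.9, 4.3, 2.1, 3.2],    # 2: 400, 50, 50
    [3.7, 5.1, 6.7, 2.9, 5.3],    # 3: 200, 100, 50
    [4.5, 6.4, 5.8, 5.2, 4.1],    # 4: 400, 100, 50
    [4.2, 6.8, 6.5, 3.5, 5.9],    # 5: 200, 50, 100
    [5.0, 6.0, 5.9, 5.7, 6.9],    # 6: 400, 50, 100
    [3.1, 6.3, 6.4, 3.0, 3.0],    # 7: 200, 100, 100
    [3.9, 5.5, 5.0, 5.4, 4.5],    # 8: 400, 100, 100
    [3.5, 5.15, 4.3, 4.1, 6.6],   # 9: centre point
    [3.4, 5.3, 4.05, 3.8, 6.5],   # 10: centre point
    [3.4, 5.4, 4.1, 3.8, 6.7],    # 11: centre point
])

mean_taste = taste.mean(axis=1)       # response 1: average over the outer array
sd_taste = taste.std(axis=1, ddof=1)  # response 2: spread over the outer array
log_sd = np.log10(sd_taste)           # assumed form of the LogStD response

for run, (m, s) in enumerate(zip(mean_taste, sd_taste), start=1):
    print(f"run {run:2d}: Taste = {m:.2f}, StDev = {s:.2f}")
```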
Tasks
Task 1
Define a new investigation in MODDE with three factors and two responses. Select Screening and an interaction model. Select a full factorial design with 8 (corners) + 3 (centre-points) runs. Evaluate the raw data. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are there any outliers? Comment on lack of fit. Which factor combination leads to an optimal taste? Which factors correlate with StDev? How shall the inner array factors be set to minimise the influence of the outer array factors?
Organisation of data for part II (MODDE worksheet should have 55 runs): A problem with the foregoing analysis approach is that it does not enable a quantitative understanding of the impact of baking Time and Temperature, since these factors were not introduced in the regression model. One way to accomplish this is to re-organise the worksheet so that it contains all 55 experiments and five factors in the model. The consequence of this latter interaction analysis approach is that the StDev response vanishes. Another advantage of this latter approach is that it is possible to identify outliers.
No   Flour (g)  Shortening (g)  Eggpowder (g)  Temp (°C)  Time (min)  Taste
1    200        50              50             175        30          1.1
2    400        50              50             175        30          3.8
3    200        100             50             175        30          3.7
4    400        100             50             175        30          4.5
5    200        50              100            175        30          4.2
6    400        50              100            175        30          5
7    200        100             100            175        30          3.1
8    400        100             100            175        30          3.9
9    300        75              75             175        30          3.5
10   300        75              75             175        30          3.4
11   300        75              75             175        30          3.4
12   200        50              50             225        30          5.7
13   400        50              50             225        30          4.9
14   200        100             50             225        30          5.1
15   400        100             50             225        30          6.4
16   200        50              100            225        30          6.8
17   400        50              100            225        30          6
18   200        100             100            225        30          6.3
19   400        100             100            225        30          5.5
20   300        75              75             225        30          5.15
21   300        75              75             225        30          5.3
22   300        75              75             225        30          5.4
23   200        50              50             175        50          6.4
24   400        50              50             175        50          4.3
25   200        100             50             175        50          6.7
26   400        100             50             175        50          5.8
27   200        50              100            175        50          6.5
28   400        50              100            175        50          5.9
29   200        100             100            175        50          6.4
30   400        100             100            175        50          5
31   300        75              75             175        50          4.3
32   300        75              75             175        50          4.05
33   300        75              75             175        50          4.1
34   200        50              50             225        50          1.3
35   400        50              50             225        50          2.1
36   200        100             50             225        50          2.9
37   400        100             50             225        50          5.2
38   200        50              100            225        50          3.5
39   400        50              100            225        50          5.7
40   200        100             100            225        50          3
41   400        100             100            225        50          5.4
42   300        75              75             225        50          4.1
43   300        75              75             225        50          3.8
44   300        75              75             225        50          3.8
45   200        50              50             200        40          3.1
46   400        50              50             200        40          3.2
47   200        100             50             200        40          5.3
48   400        100             50             200        40          4.1
49   200        50              100            200        40          5.9
50   400        50              100            200        40          6.9
51   200        100             100            200        40          3
52   400        100             100            200        40          4.5
53   300        75              75             200        40          6.6
54   300        75              75             200        40          6.5
55   300        75              75             200        40          6.7
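The re-organisation into 55 runs amounts to crossing every inner-array run with every outer-array condition and then fitting a five-factor interaction model (constant, 5 main effects and 10 two-factor interactions). The sketch below does this with ordinary least squares in coded units (-1, 0, +1), using the taste values from the table above. It is an illustration of the approach, not a reproduction of MODDE's output; coefficient scaling in MODDE may differ.

```python
import itertools

import numpy as np

# Taste (11 inner-array runs x 5 outer-array conditions), from the 55-run table.
taste = np.array([
    [1.1, 5.7, 6.4, 1.3, 3.1], [3.8, 4.9, 4.3, 2.1, 3.2], [3.7, 5.1, 6.7, 2.9, 5.3],
    [4.5, 6.4, 5.8, 5.2, 4.1], [4.2, 6.8, 6.5, 3.5, 5.9], [5.0, 6.0, 5.9, 5.7, 6.9],
    [3.1, 6.3, 6.4, 3.0, 3.0], [3.9, 5.5, 5.0, 5.4, 4.5], [3.5, 5.15, 4.3, 4.1, 6.6],
    [3.4, 5.3, 4.05, 3.8, 6.5], [3.4, 5.4, 4.1, 3.8, 6.7],
])

# Coded inner array (Flour, Shortening, EggPowder) in the table's run order:
# Flour changes fastest, then Shortening, then EggPowder; then 3 centre points.
inner = np.array([[f, s, e] for e in (-1, 1) for s in (-1, 1) for f in (-1, 1)]
                 + [[0, 0, 0]] * 3, dtype=float)
# Coded outer array (Temp, Time): 175/30, 225/30, 175/50, 225/50, 200/40.
outer = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0]], dtype=float)

# Long format: one row per cake, i.e. every inner run crossed with every outer run.
rows, y = [], []
for j, (temp_c, time_c) in enumerate(outer):
    for i, (fl, sh, egg) in enumerate(inner):
        rows.append([fl, sh, egg, temp_c, time_c])
        y.append(taste[i, j])
factors, y = np.array(rows), np.array(y)

# Interaction model: constant, 5 main effects and all 10 two-factor interactions.
names = ["Fl", "Sh", "Egg", "Temp", "Time"]
columns, labels = [np.ones(len(y))], ["const"]
for k, name in enumerate(names):
    columns.append(factors[:, k])
    labels.append(name)
for k, m in itertools.combinations(range(len(names)), 2):
    columns.append(factors[:, k] * factors[:, m])
    labels.append(f"{names[k]}*{names[m]}")
X = np.column_stack(columns)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

for label, value in zip(labels, coef):
    print(f"{label:>9}: {value:+.3f}")
```

Because the crossed design is orthogonal in coded units, the fitted constant equals the grand mean of all 55 taste values.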
Task 2
Define a new investigation in MODDE with five factors and one response. Select Screening as objective and an interaction model. Create a design with 55 rows. Paste contents from CakeTaguchi.DIF into the MODDE worksheet. Evaluate the raw data. Fit the model. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine baking Time and Temperature. Are they influential? How shall the cake-mix recipe be modified to minimise the influence of baking time and temperature?
Solutions to CakeTaguchi
Task 1
It is instructive to first consider the raw experimental data. The first two plots show the replicate plots of the responses. We see that for both responses the replicate error is small and therefore satisfactory. It is also interesting that the responses are inversely correlated (third figure). We recall that the experimental goal is a factor combination producing a tasty cake and with low variation. Hence, it seems as if experiment number 6 is the most promising one.
[Figures: plots of replications for Taste and LogStD, and raw data plot of LogStD versus Taste (investigation CakeTaguchi_classical), with experiment number labels.]
Next, we examine the modelling results obtained when fitting an interaction model to each response. Note that the negative Q2 of StDev indicates model problems. The model for Taste is of higher quality, but we remember from previous modelling attempts (see Exercise CakeMix) that even better results are possible if the two non-significant two-factor interactions are omitted.
[Figures: summary of fit and scaled and centred coefficients (Fl, Sh, Egg, Fl*Egg, Fl*Sh, Sh*Egg) for the interaction models of Taste and LogStD (investigation CakeTaguchi_classical, MLR; N=11, DF=4). Taste: R2 = 0.995, Q2 = 0.874. LogStD: R2 = 0.959, Q2 = -0.284.]
The results from fitting a refined model to each response are seen below. The model for StDev has improved a lot as a result of model pruning. Two interesting observations can now be made. The first is related to the Sh*Egg interaction, which is much smaller for StDev than for Taste. The second observation concerns the Fl main effect, which shows that Flour is the factor causing most spread around the average Taste. Hence, this is a factor to adjust in order to achieve robustness. The models that we have derived will now be used to accomplish the experimental goal.
[Figures: summary of fit and scaled and centred coefficients for the refined models of Taste and LogStD (investigation CakeTaguchi_classical, MLR; N=11, DF=6). Taste: R2 = 0.988, Q2 = 0.937. LogStD: R2 = 0.939, Q2 = 0.677.]
One way to understand the impact of the surviving two-factor interaction is to make interaction plots. Evidently, the impact of this model term is greater for Taste than for StDev: the two lines cross each other in the plot for Taste, but do not cross in the plot for StDev. Both plots indicate that a low level of Shortening and a high level of EggPowder are favourable for high Taste and low StDev.
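Whether the two lines in a 2 x 2 interaction plot cross can be read directly from the coded coefficients. The sketch below uses hypothetical coefficients, not the fitted CakeTaguchi values. In a model y = b0 + b_Sh*Sh + b_Egg*Egg + b_ShEgg*Sh*Egg, the Egg-low and Egg-high lines cross inside the Shortening range exactly when |b_ShEgg| > |b_Egg|, that is, when the interaction outweighs the EggPowder main effect. The helper names are hypothetical.

```python
def corner_predictions(b0, b_sh, b_egg, b_sh_egg):
    """Predicted response at Sh = -1 and +1 for each coded Egg level, for
    y = b0 + b_sh*Sh + b_egg*Egg + b_sh_egg*Sh*Egg."""
    return {egg: [b0 + b_sh * sh + b_egg * egg + b_sh_egg * sh * egg for sh in (-1, 1)]
            for egg in (-1, 1)}


def lines_cross(lines):
    """True if the (Egg high - Egg low) gap changes sign between Sh = -1 and Sh = +1."""
    gap_at_low_sh = lines[1][0] - lines[-1][0]
    gap_at_high_sh = lines[1][1] - lines[-1][1]
    return gap_at_low_sh * gap_at_high_sh < 0


# Hypothetical coefficients: a dominant interaction versus a weak one
print(lines_cross(corner_predictions(5.0, -0.3, 0.2, -0.4)))    # True
print(lines_cross(corner_predictions(0.1, -0.05, -0.06, 0.02)))  # False
```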
[Figure: Interaction Plots for Sh*Egg, responses Taste and LogStD, versus Shortening (50-100) with separate lines for Egg (low) and Egg (high) (N = 11, DF = 6), CakeTaguchi_classical]
An alternative way of understanding the modelled system is to make the response contour plots shown below. These contours were created by setting Flour to its high level, as this was found favourable in the modelling. The two contour plots convey an unambiguous message. The best cake mix conditions are found in the upper left-hand corner, where the highest Taste and, at the same time, the lowest standard deviation are predicted. This location corresponds to the factor settings Flour = 400, Shortening = 50, and EggPowder = 100. At this factor combination, Taste is predicted at 5.84 ± 0.18, and StDev at 0.69 with a 95% confidence interval of 0.55 to 0.87. Bearing in mind that the highest registered average value of Taste is 5.9, and the lowest value of StDev is 0.67, these predictions appear reasonable.
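The StDev interval (0.55 to 0.87 around 0.69) is asymmetric because StDev was modelled on a log scale (LogStD). A symmetric interval on the log scale becomes asymmetric after back-transformation. The sketch assumes a base-10 logarithm and a log-scale half-width of about 0.1. This half-width is back-calculated from the reported bounds and is not a value stated in the source. The function name `back_transform` is hypothetical.

```python
import math


def back_transform(center_log10, half_width_log10):
    """Back-transform a symmetric interval on the log10 scale to the original scale."""
    point = 10 ** center_log10
    lower = 10 ** (center_log10 - half_width_log10)
    upper = 10 ** (center_log10 + half_width_log10)
    return point, lower, upper


# Assumed: predicted log10(StDev) = log10(0.69) with a 95% half-width of about 0.1
point, lower, upper = back_transform(math.log10(0.69), 0.10)
print(f"StDev ~ {point:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
# StDev ~ 0.69, 95% interval 0.55 to 0.87
```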
[Figure: Response contour plots for Taste and StDev versus Shortening and EggPowder, Flour = 400 g]
Task 2
One drawback of the classical data analytical approach is that it does not allow the user to identify which noise factors affect the variability of the responses. For the Taguchi method to be really successful, one needs to be able to estimate the impact of the noise factors and of possible interactions between the design and the noise factors. Indeed, the success of the Taguchi approach depends critically on the existence of such noise-design factor interactions; otherwise, the noise (variability) cannot be reduced by changing some design factors. Information about noise-design factor interactions can be extracted if both the noise and the design factors are combined in a single design. A regression model can then be fitted that contains both types of factors and their interactions. In this form of analysis, design factor effects in the classical approach (Task 1) correspond to noise-design factor interactions (Task 2). We will now unfold the data table so that it comprises 55 rows and proceed with the Taguchi analysis. As usual, we begin the data analysis by evaluating the raw data. The replicate plot suggests that the replicate error is small, and the histogram shows that the response is approximately normally distributed. Hence, we may proceed to the regression analysis without further pre-processing of the data.
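The unfolding step can be sketched as a wide-to-long reshape. The code makes a hypothetical assumption: the five trials per recipe correspond to a 2^2 outer array in Temperature (Te) and Time (Ti) plus a center point. It also uses placeholder Taste values. The actual noise settings and data are defined in the exercise worksheet, and the names `OUTER` and `unfold` are hypothetical.

```python
from itertools import product

# Assumption: outer (noise) array = 2^2 design in coded Te and Ti plus one center point
OUTER = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]


def unfold(inner_runs, taste_table):
    """Turn one row per inner-array run (five Taste values) into one row per individual trial."""
    rows = []
    for design, tastes in zip(inner_runs, taste_table):
        if len(tastes) != len(OUTER):
            raise ValueError("expected one Taste value per outer-array run")
        for (te, ti), taste in zip(OUTER, tastes):
            rows.append({**design, "Te": te, "Ti": ti, "Taste": taste})
    return rows


# Hypothetical inner array: 2^3 corners in Fl, Sh, Egg plus three center points (11 runs)
inner = [dict(zip(("Fl", "Sh", "Egg"), p)) for p in product((-1, 1), repeat=3)]
inner += [{"Fl": 0, "Sh": 0, "Egg": 0} for _ in range(3)]
tastes = [[4.0, 4.5, 5.0, 5.5, 4.8] for _ in inner]   # placeholder numbers, not the real data
long_table = unfold(inner, tastes)
print(len(long_table))   # 55
```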
[Figure: Plot of Replications for Taste and Histogram of Taste, CakeTaguchi_interaction (55 runs)]
As seen in the summary of fit plot, the regression analysis gave a poor model with R2 = 0.60 and Q2 = 0.18. Such a large gap between R2 and Q2 is undesirable and indicates model inadequacy. The N-plot of residuals in the next figure gives no clue as to the poor modelling performance. The model also shows lack of fit (negative Model Validity).
[Figure: Summary of Fit and N-plot of residuals for Taste, CakeTaguchi_interaction (N = 55, DF = 39, R2 = 0.605, Q2 = 0.185, R2 Adj. = 0.453, RSD = 1.0545)]
However, the regression coefficient plot does reveal two plausible causes. Firstly, the model contains many irrelevant two-factor interactions. Secondly, it is surprising that the Fl*Te and Fl*Ti two-factor interactions are so weak. Since we observed in Task 1 the strong impact of Flour on StDev, we would expect much stronger noise-design factor interactions. This suggests that a crucial higher-order term, the Fl*Te*Ti three-factor interaction, is missing from the model. Consequently, in the model revision, we added this three-factor interaction and removed six unnecessary two-factor interactions.
[Figure: Scaled & Centered Coefficients for Taste, CakeTaguchi_interaction, with main effects Fl, Sh, Egg, Te, Ti and all ten two-factor interactions (N = 55, DF = 39, R2 = 0.605, Q2 = 0.185, R2 Adj. = 0.453, RSD = 1.0545)]
When the data were re-analysed, the result was a more stable model with a reasonable R2 = 0.69 and Q2 = 0.57. Interestingly, the R2 obtained is lower than in the classical analysis approach. This is due to the stabilising effect of averaging Taste over five trials in the classical approach. For the current model, we are unable to detect significant outliers among the individual experiments. The relevant N-plot of residuals is displayed below.
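The lower R2 of the unfolded model is consistent with basic sampling theory: the variance of an average of n independent trials is sigma^2/n. The small seeded simulation below illustrates the roughly fivefold variance reduction that the classical approach benefits from. It assumes a hypothetical sigma = 1, n = 5 and independent normal errors. It is an illustration only and does not use the cake data.

```python
import random
import statistics

random.seed(1)
SIGMA = 1.0   # hypothetical trial-to-trial standard deviation
N_OUTER = 5   # five trials averaged per inner-array run

singles = [random.gauss(0.0, SIGMA) for _ in range(100_000)]
means = [statistics.fmean(singles[i:i + N_OUTER]) for i in range(0, len(singles), N_OUTER)]

# Variance of single trials divided by variance of five-trial averages (theory: 5)
ratio = statistics.variance(singles) / statistics.variance(means)
print(f"variance ratio ~ {ratio:.2f} (theory: {N_OUTER})")
```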
[Figure: Summary of Fit and N-plot of deleted studentized residuals for Taste, revised model (N = 55, DF = 44, R2 = 0.693, Q2 = 0.571, R2 Adj. = 0.623, RSD = 0.8751)]
Having acquired a reasonable model, it is appropriate to consider the regression coefficients, which are displayed below. We can see the significance of the new three-factor interaction. This is in line with the previous finding on the impact of Flour on StDev. Some smaller two-factor interactions, which are components of the three-factor term (i.e. Fl*Te, Fl*Ti and Te*Ti), are kept in the model to make the three-factor interaction more interpretable.
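Keeping Fl*Te, Fl*Ti and Te*Ti alongside Fl*Te*Ti follows the principle of model hierarchy. The hypothetical helper `missing_components` below checks that every lower-order component of each term is present in a term list. For the revised model (ten terms plus a constant, which matches DF = 55 - 11 = 44) it reports nothing missing. This is a sketch and not a MODDE function.

```python
from itertools import combinations


def missing_components(terms):
    """Return lower-order components absent from a model written as terms like 'Fl*Te*Ti'."""
    present = {frozenset(t.split("*")) for t in terms}
    missing = set()
    for term in present:
        for k in range(1, len(term)):
            for sub in combinations(sorted(term), k):
                if frozenset(sub) not in present:
                    missing.add("*".join(sub))
    return missing


revised = ["Fl", "Sh", "Egg", "Te", "Ti", "Sh*Egg", "Fl*Te", "Fl*Ti", "Te*Ti", "Fl*Te*Ti"]
print(missing_components(revised))                                   # set()
print(missing_components([t for t in revised if t != "Te*Ti"]))      # {'Te*Ti'}
```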
[Figure: Scaled & Centered Coefficients for Taste, revised model with terms Fl, Sh, Egg, Te, Ti, Sh*Egg, Fl*Te, Fl*Ti, Te*Ti and Fl*Te*Ti (N = 55, DF = 44, R2 = 0.693, Q2 = 0.571, R2 Adj. = 0.623, RSD = 0.8751)]
The meaning of the three-factor interaction is most easily understood by constructing an interaction plot, as displayed below. What should we look for in this kind of plot? We want an indication of how to adjust the controllable factor Flour so that the impact of variations in the uncontrollable factors Temperature and Time is minimised. The figure shows that setting Flour to 400 g reduces the spread in Taste due to variations in Temperature and Time.
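The way such a plot is read can be sketched numerically. At each Flour level, evaluate the model at the four Te/Ti corners and take the range of the predictions. The coefficients below are hypothetical, chosen only so that the Fl*Te*Ti term damps the noise effects at high Flour; they are not the fitted values. The coding of Flour (-1 = 200 g, 0 = 300 g, +1 = 400 g) is an assumption based on the plot axis. Shortening and EggPowder are held fixed, so their contributions are absorbed into b0.

```python
from itertools import product

# Hypothetical coded coefficients for the Flour/noise part of the Taste model
COEF = {"b0": 4.5, "Fl": 0.30, "Te": 0.15, "Ti": 0.10,
        "Fl*Te": -0.10, "Fl*Ti": -0.05, "Te*Ti": 0.60, "Fl*Te*Ti": -0.55}


def taste(fl, te, ti, c=COEF):
    """Predicted Taste at coded Flour, Temperature and Time."""
    return (c["b0"] + c["Fl"] * fl + c["Te"] * te + c["Ti"] * ti
            + c["Fl*Te"] * fl * te + c["Fl*Ti"] * fl * ti
            + c["Te*Ti"] * te * ti + c["Fl*Te*Ti"] * fl * te * ti)


def noise_spread(fl):
    """Range of predicted Taste over the four Te/Ti corners at a given coded Flour level."""
    values = [taste(fl, te, ti) for te, ti in product((-1, 1), repeat=2)]
    return max(values) - min(values)


grid = [i / 4 for i in range(-4, 5)]       # coded Flour from -1 to +1
best = min(grid, key=noise_spread)
print(best, 300 + 100 * best)              # 1.0 400.0
```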
[Figure: Interaction Plot for Fl*Te*Ti, response Taste, versus Flour (200-400 g), with one line for each Te/Ti low/high combination; Shortening = 50, EggPowder = 100 (N = 55, DF = 44, R2 = 0.693, Q2 = 0.571, R2 Adj. = 0.623, RSD = 0.8751)]
Furthermore, in solving the problem, we must not forget the strong Sh*Egg two-factor interaction. We know from the initial analysis that the combination of low Shortening and high EggPowder produces the best cakes. With these considerations in mind we draw the response contour triplet shown in the figure below. Because these contours are relatively flat, especially when Flour = 400, we can conclude that the system is robust. Hence, when industrially producing a cake mix with the composition Flour 400 g, Shortening 50 g, and EggPowder 100 g, together with a baking recommendation of 200 °C for 40 min, sufficient robustness towards consumer misuse ought to be the result.
Conclusions
This example illustrates two principal approaches to the analysis of Taguchi-designed data. In the analysis, it was found that an important three-factor interaction existed between Flour, Time and Temperature (Fl*Ti*Te). By interpreting this term it was concluded that the impact of Time and Temperature on variation in Taste was minimised by adjusting Flour from 300 g (initial set-point) to 400 g (new product recipe).
DOE-Exercise LoafVolume (Inner/Outer arrays)

Investigation of mixture and process factors affecting the volume of loaves
Background
Many factors at a bakery can affect the quality of loaves, including factors related to the recipe of the dough and those related to the baking conditions. Naes et al. [Chemometrics and Intelligent Laboratory Systems 41 (1998) 221-235] carried out an extensive project in which the influence of five factors on the volume of loaves was studied. Three of the factors were varieties of wheat flour (Tjalve, Folke and HardRS), and two were related to baking conditions, namely mixing time and proofing time of the dough. The latter two factors may vary from one bakery to another. Loaf volume was used as the quality index of loaf formation.
Objective
The experimental objective of this study was to find a factor combination yielding the target loaf volume of 530 cm3. The idea was to find a combination of the three wheat flours that consistently yields loaves of the target volume and is thus insensitive to changes in mixing time and proofing time.
Data
The investigators made 90 experiments. The experimental plan contains an inner array made up of the three mixture factors (Tjalve, Folke, HardRS) and an outer array consisting of the two process factors (mixing time and proofing time). The inner array is a Simplex Centroid design in 10 runs, and the outer array a CCF in 9 runs resulting in 10*9 = 90 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan used is given in Figure 1.
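The crossed (inner x outer) arrangement can be sketched as follows. The outer array is a two-factor face-centred composite design (CCF) in coded units. The ten mixture blends are represented only by placeholder labels, because the exact blend proportions are given in Figure 1 and the worksheet rather than in the text. The function name `ccf_two_factors` is hypothetical.

```python
from itertools import product


def ccf_two_factors():
    """Face-centred central composite design for two factors in coded units (9 runs)."""
    corners = list(product((-1, 1), repeat=2))
    faces = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return corners + faces + [(0, 0)]


outer = ccf_two_factors()                     # (mixing time, proofing time), coded
inner = [f"blend{i}" for i in range(1, 11)]   # placeholder labels for the 10 mixtures
crossed = [(blend, mi, pr) for blend in inner for mi, pr in outer]
print(len(outer), len(crossed))               # 9 90
```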
Organisation of data for part I (MODDE worksheet should have 10 runs):

The classical approach towards analysing DOE data organised in inner and outer arrays is to form, for each point in the inner array (here: mixture design), the average response value across all points in the outer array (here: CCF process design). This gives two responses, the average loaf volume for each point in the inner array, and the standard deviation around this average. Note that with this approach there will be no model terms related to mixing time and proofing time.
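The classical summary in Part I reduces each row of the crossed table to two numbers. The sketch below uses Python's `statistics` module and assumes that the spread measure is the sample standard deviation (n - 1 denominator). The helper name and the loaf volumes in the example are hypothetical.

```python
import statistics


def summarise_inner_runs(table):
    """Return (average, standard deviation) of the outer-array responses for each inner-array run."""
    return [(statistics.fmean(row), statistics.stdev(row)) for row in table]


# Hypothetical loaf volumes (cm3) for two blends, nine outer-array runs each
table = [[500, 510, 520, 530, 540, 520, 515, 525, 520],
         [470, 480, 520, 560, 575, 530, 505, 540, 500]]
for mean, sd in summarise_inner_runs(table):
    print(f"average = {mean:.1f}, StDev = {sd:.1f}")
```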
Tasks
Task 1
Define a new investigation in MODDE with three formulation factors and two responses. Select RSM as objective and a quadratic model. Create a mixture design with 10 rows. Paste contents from LoafVol2.DIF into the MODDE worksheet.
Task 2
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Is it possible to get a volume of 530 cm3 and minimise the spread (standard deviation)? It is desirable to get the standard deviation below 60.
Organisation of data for part II (MODDE worksheet should have 90 runs):

A problem with the foregoing analysis approach is that it does not give a quantitative understanding of the impact of mixing time and proofing time, since these factors are not included in the quadratic model. One way to achieve this is to re-organise the worksheet so that it contains all 90 experiments and all five factors in the model. A consequence of this approach is that the StDev response disappears. An advantage is that it becomes possible to identify deviating single experiments.
Task 3
Define a new investigation in MODDE with two process factors, three formulation factors, and one response. Select RSM as objective and a quadratic model. Create a D-optimal design with 90 rows. Paste contents from LoafVolume.DIF into the MODDE worksheet.
Task 4
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine mixing time and proofing time. Are they influential?
Figure 1: Overview of inner and outer factor arrangement of LoafVolume application.
Solutions to LoafVolume
Task 2
Using the default quadratic model, a strongly significant model for the average loaf volume was obtained. However, the model for StDev was weaker (low Q2 and problems in ANOVA). The residuals are nearly normally distributed for both responses. Note that the ANOVA is not complete, because there are no replicates available. We can observe from the plot of the raw data (StDev is plotted against loaf volume) that the two responses are strongly correlated (correlation coefficient = 0.90). This means that it will be difficult to get a high value of volume and a low value of the standard deviation.
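The correlation of 0.90 quoted above is a Pearson correlation between the two response columns. A minimal sketch follows; Python 3.10 and later also provide `statistics.correlation`. The function name `pearson_r` is hypothetical, and the inputs in the usage lines are illustrative, not the Loafvol2 data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: larger loaves paired with larger spread
volume = [450, 470, 490, 500, 520, 540]
stdev = [45, 52, 50, 61, 66, 75]
print(f"r = {pearson_r(volume, stdev):.2f}")
```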
[Figure: Summary of Fit for loafvolume and stdev; Raw Data Plot of stdev versus loafvolume; N-plots of residuals for loafvolume (N = 10, DF = 4, R2 = 0.953, Q2 = 0.782, R2 Adj. = 0.894, RSD = 10.4747) and stdev (N = 10, DF = 4, R2 = 0.795, Q2 = 0.281, R2 Adj. = 0.539, RSD = 8.5309); Loafvol2, PLS with 2 components]
The coefficient plot of the model for loaf volume indicates that both Folke and HardRS affect the volume, whereas Tjalve does not have the same strong influence. The same two factors also affect the StDev response, although these coefficients are not statistically significant according to their 95% confidence intervals. An efficient way of understanding the impact of these models is to make mixture contour plots. The solid arrow indicates where the best compromise is found: the mixture 0.25/0.11/0.64 (Tjalve/Folke/HardRS), where loaf volume is estimated at 530 cm3 and StDev is as low as possible. To get the prediction uncertainty for this point we may use the prediction spreadsheet in MODDE. It appears possible to suppress the standard deviation below 70, but not below 60. Consequently, the conclusion of the classical analysis approach is that the mixture of wheat flours used for loaf baking cannot be made sufficiently insensitive to changes in mixing and proofing times between different bakeries.
[Figure: Scaled & Centered Coefficients for loafvolume and for stdev (terms Tj, Fo, Ha, Tj*Tj, Fo*Fo, Ha*Ha, Tj*Fo, Tj*Ha, Fo*Ha; units cm3), Loafvol2, PLS with 2 components (loafvolume: R2 = 0.953, Q2 = 0.782; stdev: R2 = 0.795, Q2 = 0.281; N = 10, DF = 4)]
Task 4
The PLS modelling resulted in a strong model for loaf volume. Its R2 and Q2 values are slightly lower than those of the previous model for the average loaf volume, but the model is very good. The ANOVA table and the N-plot of residuals also suggest that the model is good. In addition, the two PLS score plots (t versus u) reveal a strong correlation between the factors and the response. The regression coefficients show the strong impact of proofing time (1st bar in the coefficient plot) on loaf volume: generally, longer proofing times produce larger loaves. This sensitivity to proofing time means that baking specifications distributed among the different bakeries ought to contain a recommendation regarding an appropriate proofing time. The time used for mixing the dough is less critical. Unfortunately, because there is no strong interaction between the process factors (proofing time and mixing time) and the mixture factors (the three types of wheat flour), it will not be possible to adjust the mixture factors so as to hit the target loaf volume while minimising the spread in this property.
[Figure: Summary of Fit, N-plot of standardized residuals and Scaled & Centered Coefficients for loafvolume (terms Pr, Mi, Tj, Fo, Ha, their squares and all pairwise cross terms; N = 90, DF = 75, R2 = 0.894, Q2 = 0.754, R2 Adj. = 0.874, RSD = 22.6934); PLS score scatter plots t[1] vs u[1] and t[2] vs u[2] (Cond. no. = 8.2742); Loafvolume, PLS with 2 components]
MODDE offers another graphical option to overview the modelling outcome, the 4D-mixture contour plot, which is displayed below. In this plot the colour coding has been made consistent across all nine plots. Thus, we can, for example, observe that when proofing time is kept low, it is impossible to manufacture loaves of the desired volume (530 cm3).
It is also possible to make a response contour plot showing how the loaf volume changes as a function of proofing and mixing times at the identified mixture composition 0.25/0.11/0.64 (Tjalve/Folke/HardRS). The first plot shows how this is done in MODDE and the second plot is the resulting graph. From the lower graph we may conclude that the loaf volume varies when the proofing and mixing times change; that is, the composition of the wheat flour mixture cannot be chosen such that the resulting loaves become insensitive to changes in the two process factors. To achieve robustness in this respect, much tighter restrictions on the proofing and mixing times are needed.
Conclusions
Loaf volume varies when changing proofing and mixing times. This means that the mixture of the wheat flours cannot be made insensitive to changes in the two process factors. To accomplish robustness in this respect much tougher restrictions are needed on the proofing and mixing times.
DOE-Exercise Model Updating

Using D-optimal design to update an existing model
Background
Complementing an executed experimental design with additional runs is a common need in DOE. For instance, in a screening situation one may use fold-over to add more experiments to the initial fractional factorial design. Additionally, factorial and fractional factorial designs may be upgraded to more elaborate composite designs (CCF or CCC). Design augmentation may also be undertaken after optimization, with the goal of upgrading, for example, a quadratic model to a cubic model. A common feature of these design augmentation principles is that the complementary runs are appended to improve the modelling results in a general sense. Therefore, the model upgrading is quite unselective, as it applies to model terms originating from all factors varied. However, such a broad and unselective design augmentation might not provide the optimal solution to a problem. Rather, it might be desirable to select a small number of extra experiments, tailored to the estimation of a small set of new, well-identified model terms. This can be accomplished through D-optimal design.
Objective
In this example, we are going to work with a screening application concerning laser welding of nickel material in plate heat exchangers. The objective is not so much to deal with the regression analysis, but to focus on how to add extra runs to the original experimental protocol.
Data
This example relates to one step in the process of fabricating a plate heat exchanger, a laser welding step involving the metal nickel. The investigator, Erik Vnnman, studied the influence of four factors on the shape and quality of the resulting weld. These factors were Power of laser, Speed of laser, Gas flow at Nozzle of welding equipment, and Gas flow at Root, that is, the underside of the welding equipment. One important response is the width of the weld, which should be in the range 0.7-1.0 mm.
Tasks
Task 1
Define a new project in MODDE consisting of four factors and one response. The design you will need is the 2^(4-1) fractional factorial design (8 + 3 runs). This design supports a linear model in the four factors. Enter the response data and fit the linear model to the data.
Task 2
Revise the model from Task 1 by also estimating the cross-term Po*Sp. Discuss the problem of including this term. (Hint: Look at Show/Confoundings.)
Task 3
Model updating is often used after screening, when it is necessary to unconfound two-factor interactions. We will now outline the procedure for adding a few extra experiments to the laser welding data set. Step 1: Make a copy of the current investigation and switch to this copy. Step 2: In the new application, do File/Complement design (this opens a wizard)
Step 3: Select D-optimal design
Step 4: Select the number of additional runs Comment: To unconfound two two-factor interactions 4 extra experiments are appropriate. This implies that a balanced number of additional experiments is added.
Step 5: Edit the model and add the interesting term(s).
Step 6: Select the number of additional center-points and name the new investigation Comment: If we do not want to include any center points in the design supplement, the number of center points should be set to zero. This is appropriate if the time span between the 11 first experiments and the new ones is short. Conversely, if considerable time has elapsed between the initial and the new experiments, it is recommended to add one or two center-points to test that the system is stable over time.
Step 7: Select Screening and 15 + 2 runs as lead numbers.
Step 8: Generate D-optimal designs with 15 runs (here: five repetitions)
Step 9: Evaluate the resulting designs. In this case all five alternatives are identical.
Step 10: Generate the selected design
Design tailor-made to resolve Po*Sp and No*Ro.
Your task is now the following: use the approach outlined above and propose an updated experimental design that is able to resolve Po*Sp and No*Ro from one another. How many extra runs do you think are necessary? Experiment by selecting different numbers of runs and repetitions. Use the condition number and the G-efficiency to identify a suitable design. Also remember that many D-optimal proposals may exist with similar performance measures. It may be necessary to plot the configuration of a set of design candidates to identify the preferred design version. Note: Our solutions to this task display designs different from the one presented above.
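D-optimal augmentation chooses extra runs that maximise the determinant of X'X for the model of interest. The sketch below is a simple greedy sequential selection, not MODDE's exchange algorithm. It makes several assumptions:

- the original half fraction was generated with Ro = Po*Sp*No and has three center points;
- the candidate set is the 16 corners of the 2^4 cube;
- the model columns are the constant, the four main effects, Po*Sp and No*Ro.

Determinants are computed exactly with `fractions.Fraction`, so the singular starting design gives exactly zero. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def model_row(run):
    """Model columns: constant, Po, Sp, No, Ro, Po*Sp, No*Ro (run order Po, Sp, No, Ro)."""
    po, sp, no, ro = run
    return [1, po, sp, no, ro, po * sp, no * ro]


def exact_det(matrix):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def info_det(runs):
    """det(X'X) for the model above, evaluated over the given runs."""
    rows = [model_row(run) for run in runs]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    return exact_det(xtx)


def greedy_augment(existing, candidates, n_new):
    """Add n_new candidate runs one at a time, each time maximising det(X'X)."""
    design = list(existing)
    for _ in range(n_new):
        design.append(max(candidates, key=lambda c: info_det(design + [c])))
    return design


# Assumed original design: 2^(4-1) with Ro = Po*Sp*No, plus three center points
half_fraction = [(po, sp, no, po * sp * no) for po, sp, no in product((-1, 1), repeat=3)]
original = half_fraction + [(0, 0, 0, 0)] * 3
candidates = list(product((-1, 1), repeat=4))   # all 16 corners of the 2^4 cube

augmented = greedy_augment(original, candidates, n_new=2)
print(info_det(original), info_det(augmented) > 0)   # 0 True
print(augmented[11:])                                # the two added runs
```

With the original eleven runs det(X'X) is zero because the Po*Sp and No*Ro columns are identical. The first run added by the greedy step therefore comes from the other half fraction, where Po*Sp = -No*Ro. The condition number and G-efficiency used in the exercise to compare designs are not computed in this sketch.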
Solutions to MODEL UPDATING

Task 1
The geometry of the fractional factorial design selected is shown below. It is a balanced design, which means that all factors are investigated at both levels of the other factors.
As shown by the summary of fit plot below, the linear regression model is not reliable because of the large gap between R2 and Q2. We must then try to identify the cause of the low model quality. However, neither the analysis of variance nor the N-plot of residuals highlights any apparent reason for the model insufficiency. The regression coefficient plot shows that the factors Power of laser and Speed of laser dominate the model. One way to try to improve the model is to estimate the cross-term between these two factors. This is dealt with in Task 2.
[Figure: Summary of Fit, N-plot of residuals and Scaled & Centered Coefficients for Width (terms No, Po, Sp, Ro; units mm), linear model (N = 11, DF = 6, R2 = 0.816, Q2 = -0.068, R2 Adj. = 0.693, RSD = 0.1732)]
Task 2
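As seen below, the introduction of the Po*Sp cross-term has a profound impact on the model quality (R2 = 0.96 and Q2 = 0.61, compared with 0.82 and -0.07 for the linear model). The regression coefficient plot shows that this term is almost as large as the main effect of Power of laser. Moreover, the model error has been lowered: the residual standard deviation drops from 0.173 to 0.086.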
[Figure: Summary of Fit, N-plot of residuals and Scaled & Centered Coefficients for Width (terms No, Po, Sp, Ro, Po*Sp; units mm), model including Po*Sp (N = 11, DF = 5, R2 = 0.962, Q2 = 0.610, R2 Adj. = 0.925, RSD = 0.0858)]
However, because of the moderate resolution (IV) of the design used, the Po*Sp two-factor interaction is confounded with another two-factor interaction, namely No*Ro (see Correlation Matrix below). Therefore, the coefficient labelled Po*Sp (above) reflects the sum of contributions from the terms Po*Sp and No*Ro (plus a few higher-order interactions, which are assumed negligible). The Confoundings list below overviews the confounding pattern of the 2^(4-1) fractional factorial design. The only way to resolve Po*Sp and No*Ro from one another is to conduct more experiments. This is discussed in Task 3.
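The confounding pattern can be verified directly by comparing the columns of the two-factor interactions. The sketch assumes that the half fraction was generated with Ro = Po*Sp*No (defining relation I = Po*Sp*No*Ro). The sign of the generator in the actual worksheet is not stated in the text; a negative sign would give the same alias pairs with opposite sign. Center points are omitted because they do not affect the aliasing.

```python
import math
from itertools import combinations, product

FACTORS = ("Po", "Sp", "No", "Ro")
# Assumed generator: Ro = Po*Sp*No, i.e. defining relation I = Po*Sp*No*Ro
RUNS = [(po, sp, no, po * sp * no) for po, sp, no in product((-1, 1), repeat=3)]


def column(term):
    """Column of +1/-1 products for a term such as ('Po', 'Sp') over the eight corner runs."""
    idx = [FACTORS.index(f) for f in term]
    return tuple(math.prod(run[i] for i in idx) for run in RUNS)


two_factor = list(combinations(FACTORS, 2))
aliases = [("*".join(a), "*".join(b))
           for a, b in combinations(two_factor, 2) if column(a) == column(b)]
print(aliases)  # [('Po*Sp', 'No*Ro'), ('Po*No', 'Sp*Ro'), ('Po*Ro', 'Sp*No')]
```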
Task 3
The first question at this stage is how many extra runs are needed. In principle, only two extra experiments are needed to resolve the Po*Sp and No*Ro two-factor interactions. In practice, however, four additional runs might offer a more stable solution. We will start by adding two experiments. This means that in the overview list of the D-optimal designs we should focus on the designs with 13 runs. Below, we see two different proposals with identical condition number and G-efficiency. In both designs, variation has been induced in the factors No and Ro. Note also that alternative arrangements of the added experiments exist.
Resources permitting, the addition of four extra experiments will provide even better resolution between Po*Sp and No*Ro. Below, we show the outcome of a design proposal where four extra runs and two extra center points have been appended to the original data set. Remember that many other alternatives exist with identical performance measures. Quite a few of these are not balanced with regard to the four corner runs, meaning that the low and high levels of each factor are not explored using the same number of runs. A 2 + 2 distribution is preferable to a 1 + 3 distribution.
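The 2 + 2 versus 1 + 3 balance criterion is easy to check for a proposed set of added runs. The sketch below uses a hypothetical helper, `level_balance`. The example runs are illustrative runs from the half fraction with Ro = -Po*Sp*No; they are not the design proposed by MODDE.

```python
def level_balance(added_runs, factors=("Po", "Sp", "No", "Ro")):
    """Count (low, high) settings per factor among added corner runs; center points are skipped."""
    corners = [run for run in added_runs if any(run)]
    return {name: (sum(run[i] == -1 for run in corners), sum(run[i] == 1 for run in corners))
            for i, name in enumerate(factors)}


# Illustrative runs from the complementary half fraction (hypothetical, not MODDE's proposal)
balanced = [(1, 1, 1, -1), (-1, -1, -1, 1), (1, -1, -1, -1), (-1, 1, 1, 1), (0, 0, 0, 0)]
unbalanced = [(1, 1, 1, -1), (1, -1, -1, -1), (1, 1, -1, 1), (-1, -1, -1, 1)]
print(level_balance(balanced))          # every factor (2, 2)
print(level_balance(unbalanced)["Po"])  # (1, 3)
```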
Conclusions
In the first instance, the researcher conducted a 2^(4-1) fractional factorial design with three center-points, that is, eleven experiments. In the analysis it was found that one two-factor interaction, the one between Po and Sp, was influential. However, because of the moderate resolution of this design, this two-factor interaction is confounded with another two-factor interaction, namely No*Ro. One way out of this problem is to complement the existing design with more experiments. One possibility is the fold-over design, which enables resolution of Po*Sp from No*Ro, as well as resolution of the remaining four two-factor interactions. The disadvantage of the fold-over is the large number of extra experiments: eleven additional runs are necessary. An alternative approach in this case, less costly in terms of experiments, is a D-optimal design update that adds only a limited number of extra runs. It was shown how either two or four extra experiments, plus an optional number of center-points, could be added to the starting design to achieve this objective. The importance of design balance was also addressed.
DOE-Exercise Blocking (RSM)

Investigating block effects in a CCC-design
Background
In a chemical experiment the influence of two factors (time and temperature) on the yield of the main product was investigated. Initially, a 2^2 factorial design augmented with two centre-points was performed. Preliminary examination of the results indicated that the experimental design was correctly positioned in the experimental space, so there was no need to adjust the low and high settings of the factors. However, there was some indication of a non-linear relationship between the factors and the response, so the design was upgraded to a CCC design by adding the star points and two additional centre-points. This design comprised 12 experiments: 4 corner points, 4 star points and 2 + 2 centre-points. This data set is taken from Box GEP, Hunter WG, Hunter JS, Statistics for Experimenters, John Wiley & Sons, 1978, p. 519.
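The two-block CCC can be sketched in coded units. The code makes the following assumptions:

- the factorial part plus two centre-points forms block 1;
- the star points plus two centre-points form block 2;
- the rotatable axial distance alpha = (2^k)^(1/4) = 1.414 applies for k = 2.

For this layout, the same alpha also meets the usual orthogonal-blocking condition: the mean of each squared factor is equal in both blocks. The linear and cross-product columns already sum to zero within each block by symmetry. The experimenters rounded the star settings to integers, so the real design only approximates this. The function name `ccc_blocks` is hypothetical.

```python
from itertools import product


def ccc_blocks(k=2, centers_per_block=2):
    """Rotatable CCC in coded units, split into a factorial block and an axial (star) block,
    each with its own centre-points."""
    alpha = (2 ** k) ** 0.25                      # rotatable axial distance
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    block1 = factorial + [[0.0] * k for _ in range(centers_per_block)]
    block2 = axial + [[0.0] * k for _ in range(centers_per_block)]
    return alpha, block1, block2


alpha, block1, block2 = ccc_blocks()
print(round(alpha, 3), len(block1) + len(block2))   # 1.414 12

# Orthogonal-blocking check: mean of x_i^2 should be the same in both blocks
for i in range(2):
    m1 = sum(run[i] ** 2 for run in block1) / len(block1)
    m2 = sum(run[i] ** 2 for run in block2) / len(block2)
    print(f"x{i + 1}: {m1:.3f} (block 1) vs {m2:.3f} (block 2)")
```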
Objective
The objectives of this example were two-fold: (1) to identify the optimal settings of time and temperature, (2) to investigate whether there was evidence of a shift in the response data between the two series of experiments (i.e. whether there were significant block effects or not).
Data
Tasks
Task 1
Define a new investigation in MODDE with two factors and one response. Select RSM, the CCC design using two blocks and two center points in each block. Make sure that you tick the Block interactions check box.
Enter the response data. Note that the experimenters rounded the values of the star points to the nearest integer, so amend the factor settings accordingly. Evaluate the raw data. Fit the regression model. Which factors affect yield? Are there any non-significant model terms? Which factor combination optimises yield? What about the block effects: are they significant?
Solutions to Blocking
Task 1
We start by evaluating the raw data. The replicate plot shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed, indicating that we have good data to work with.
Figures: Blocking_RSM Plot of Replications for Yield with experiment number labels (blocks B1 and B2); Blocking_RSM Histogram of Yield.
A strong model was obtained with R2=0.98, Q2=0.95, Model Validity=0.99 and Reproducibility=0.88. The regression coefficients indicate that a low value of time is best but the linear effect of temperature is not significant. However, the quadratic terms of both time and temperature are significant. The block factor and its interactions with time and temperature are not significant. However, the deletion of any of these model terms causes the model quality to deteriorate and so they are kept in the model. There is some evidence that slightly lower yields were obtained in the second set of runs.
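R2 and Q2 can be reproduced for an ordinary least-squares (MLR) fit in a few lines. The sketch below assumes that Q2 is based on leave-one-out cross-validation, $Q^2 = 1 - \mathrm{PRESS}/SS_{tot}$, with PRESS obtained from the hat-matrix leverages; MODDE's exact implementation is not reproduced here, and the data in the usage lines are synthetic, not the yields of this exercise. Because every leave-one-out residual is at least as large as the corresponding ordinary residual, Q2 can never exceed R2.

```python
import numpy as np


def r2_q2(X, y):
    """R2 and leave-one-out Q2 for an ordinary least-squares fit.

    X is the model matrix (intercept column plus all model terms)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages h_ii: the diagonal of the hat matrix H = X (X'X)^-1 X'.
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X.T @ X) @ X.T)
    # PRESS from the closed-form leave-one-out residuals e_i / (1 - h_ii);
    # undefined if some h_ii == 1 (a run the model must fit exactly).
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_tot, 1.0 - press / ss_tot


# Synthetic illustration (not the exercise data): quadratic trend plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 12)
X = np.column_stack([np.ones_like(x), x, x ** 2])
y = 80.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.5, x.size)
print(r2_q2(X, y))
```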
Figures: Blocking_RSM (MLR) Summary of Fit for Yield (R2, Q2, Model Validity, Reproducibility); Scaled & Centered Coefficients for Yield (Extended), with terms $Blo(B1), $Blo(B2), Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1), Temp*$Blo(B2), Tim, Tim*Tim, Temp, Temp*Temp and Tim*Temp. N=12, DF=3, R2=0.978, Q2=0.949, R2 Adj.=0.919, RSD=1.2524, Conf. lev.=0.95.
The two response surface plots below show that higher yields were obtained in the first set of runs (the factorial part of the design). The average difference between the two blocks is 1.76 g.
Conclusions
To maximise yield we should use Time = 76 min and Temperature = 151 °C. There is a mild shift in yield between the two blocks of experiments.
DOE-Exercise Mixture Region Training

By-hand training to understand geometries of various mixture regions
Example Lower
Data:
Binder: 0.1-1.0; Oxidizer: 0.5-1.0; Fuel: 0.1-1.0
Task 1:
Draw the experimental region by-hand.
Task 2:
Use MODDE to calculate the implied upper bounds.
Example Upper
Data:
Task 3:
Task 4:
Use MODDE to calculate the implied lower bounds.
Example Lower and Upper

Data:
Task 5:
Task 6:
Use MODDE to calculate the implied lower and upper bounds.
SOLUTIONS to MIXTURE REGION TRAINING

Task 2:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.
Calculate the implied upper bounds: $R_L = 1 - 0.1 - 0.5 - 0.1 = 0.3$ and $U_i^* = L_i + R_L$.
Binder 0.1-1.0: 0.1 + 0.3 = 0.4
Oxidizer 0.5-1.0: 0.5 + 0.3 = 0.8
Fuel 0.1-1.0: 0.1 + 0.3 = 0.4
Figure: ternary plot (Binder, Oxidizer, Fuel) showing the lower bounds Binder = 0.1, Oxidizer = 0.5, Fuel = 0.1 and, as dashed lines, the location of the implied upper bounds Binder = 0.4, Oxidizer = 0.8, Fuel = 0.4.
Task 4:
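Calculate the implied lower bounds: $R_U = 0.6 + 0.7 + 0.4 - 1 = 0.7$ and $L_i^* = U_i - R_U$.
Binder: 0.6 - 0.7 = -0.1 (range -0.1 to 0.6)
Oxidizer: 0.7 - 0.7 = 0.0 (range 0.0 to 0.7)
Fuel: 0.4 - 0.7 = -0.3 (range -0.3 to 0.4)
Since none of these values is positive, there are no implied lower bounds.
Figure: ternary plot (Binder, Oxidizer, Fuel) showing the upper bounds Binder = 0.6, Oxidizer = 0.7, Fuel = 0.4.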
Task 6:
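Check whether the bounds are consistent: $R_L = 1 - 0.2 - 0.2 - 0.3 = 0.3$ and $R_U = 0.6 + 0.6 + 0.5 - 1 = 0.7$.
x1: 0.2-0.6, implied upper bound 0.5
x2: 0.2-0.6, implied upper bound 0.5
x3: 0.3-0.5, implied upper bound 0.5
Figure: ternary plot (Binder, Oxidizer, Fuel) with the lower bounds Binder = 0.2, Oxidizer = 0.2, Fuel = 0.3; dashed lines give the implied upper bounds Binder = 0.5, Oxidizer = 0.5, Fuel = 0.5.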
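The $R_L$/$R_U$ scheme used in Tasks 2, 4 and 6 is easy to script. The Python sketch below applies it once to a set of single-component bounds; it is an illustration, not MODDE's code, and the function names are hypothetical. For the "Example Upper" case the lower bounds are assumed to be zero, which is not stated explicitly above.

```python
def implied_bounds(lower, upper, total=1.0):
    """Return (implied lower, implied upper) bounds for mixture components.

    Uses R_L = total - sum(L) and R_U = sum(U) - total; the implied bounds are
    U_i* = min(U_i, L_i + R_L) and L_i* = max(L_i, U_i - R_U)."""
    r_l = total - sum(lower)
    r_u = sum(upper) - total
    if r_l < 0 or r_u < 0:
        raise ValueError("inconsistent bounds: need sum(L) <= total <= sum(U)")
    new_lower = [max(l, u - r_u) for l, u in zip(lower, upper)]
    new_upper = [min(u, l + r_l) for l, u in zip(lower, upper)]
    return new_lower, new_upper


def show(label, lower, upper):
    lo, hi = implied_bounds(lower, upper)
    print(label, [round(v, 3) for v in lo], [round(v, 3) for v in hi])


# Component order: Binder, Oxidizer, Fuel.
show("Example Lower:", [0.1, 0.5, 0.1], [1.0, 1.0, 1.0])
show("Example Upper:", [0.0, 0.0, 0.0], [0.6, 0.7, 0.4])  # zero lower bounds assumed
show("Example Lower and Upper:", [0.2, 0.2, 0.3], [0.6, 0.6, 0.5])
```

The three calls reproduce the hand calculations: implied upper bounds 0.4/0.8/0.4 in the first case, no implied bounds in the second, and implied upper bounds 0.5/0.5/0.5 in the third.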
DOE-Exercise WAALER (Mixture)

Mixture design for tablet formulation
Background
In tablet manufacturing in the pharmaceutical industry it is practical to plan experiments according to a mixture design. Here, three constituents were varied according to a modified simplex centroid mixture design in order to produce tablets. The three constituents were cellulose, lactose and dicalcium phosphate.
Objective
The objective of the investigation was to find out how the three excipients influenced release of active substance.
Data
Ten tablets were prepared according to a mixture design in the three excipients mentioned. The response measured was the release (in min) of the active ingredient and this value has to be maximized. The data set is taken from P.J. Waaler, Acta Pharm Nord 4: 9-16, 1992.
Tasks
Task 1
Create a new investigation in MODDE and define the three mixture factors and the single response according to the information given above. Select RSM as objective and accept the first choice design (Modified Simplex Centroid), using Design Runs = 9 and Centerpoints = 1. MODDE now creates a Worksheet identical to the one shown on the foregoing page. Enter the response values.
Task 2
Select PLS as fit technique. Fit the model. Questions to address and answer: Which are the significant terms? Are the residuals approximately normally distributed? What about Lack of Fit? Review the fit and interpret the model. Which formulation corresponds to maximized release (Hint: Use the Optimizer)?
Task 3
The experimenters performed three verifying experiments:
x1      x2      x3      release (min)
0.500   0.125   0.375   370
0.333   0       0.667   340
0.667   0       0.333   345
Compute predictions for these formulations to verify the model.
Solutions to WAALER
Task 2
The PLS analysis of the tablet data gave a model with R2 = 0.98 and Q2 = 0.55 (upper left-hand plot). These statistics point to an imperfect model, because R2 substantially exceeds Q2. Unfortunately, the second diagnostic tool (upper right-hand plot), the ANOVA table, is incomplete: the lack-of-fit test could not be performed, because the design contains only one centre point and hence no replicates from which to estimate pure error. However, a possible reason for the poor modelling is found in the N-plot of the response residuals (middle left-hand figure). Experiment number 10 is an outlier and degrades the predictive ability of the model. If this experiment is omitted and the model refitted, Q2 increases from 0.55 to 0.69. We decided not to remove the outlier, primarily to conform with the modelling procedure of the original literature source. The subsequent three plots show the inner relation (t vs u scores) for each of the three PLS components.
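Why the lack-of-fit test is missing can be seen from a degrees-of-freedom count. The Python sketch below is a generic illustration assuming a quadratic Scheffé model for three components (three linear plus three binary blending terms, six parameters); the run labels mark runs with identical factor settings, and the function name is hypothetical. With ten distinct compositions, the four residual degrees of freedom (matching DF=4 reported for this model) are all lack of fit, and none are left for pure error, so no F-test can be formed.

```python
from collections import Counter


def df_split(n_terms, run_labels):
    """Split residual degrees of freedom into pure error and lack of fit.

    run_labels: one label per run; runs sharing a label are replicates."""
    labels = list(run_labels)
    residual = len(labels) - n_terms
    pure_error = sum(count - 1 for count in Counter(labels).values())
    lack_of_fit = residual - pure_error
    return {
        "residual": residual,
        "pure_error": pure_error,
        "lack_of_fit": lack_of_fit,
        "lof_test_possible": pure_error > 0 and lack_of_fit > 0,
    }


# Ten distinct compositions (9 design runs + 1 centre point), 6-term model.
print(df_split(6, range(10)))
```

Adding replicates of existing runs is what creates pure-error degrees of freedom; for example, nine replicate runs added to ten distinct points give nine pure-error and four lack-of-fit degrees of freedom for the same six-term model.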
Figures: Waaler_rsm (PLS, comp.=3) Summary of Fit for release (R2, Q2); normal probability plot of release residuals; score scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3], all with experiment number labels. N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170.
Scaled and centered regression coefficients of the computed model are plotted in the left-hand plot below. This coefficient plot shows that, in order to maximize the release, the amount of lactose in the recipe should be kept low and the amount of phosphate high. The presence of significant square and interaction terms indicates quadratic behavior and non-linear blending effects. These effects are more easily understood by means of the trilinear mixture contour plot shown in the right-hand plot below. This latter plot suggests that with the mixture composition 0.32/0/0.68 one may expect a response value above 350. This point should be tested in reality, thus functioning as an experimental verification of the model.
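Figures: Waaler_rsm (PLS, comp.=3) Scaled & Centered Coefficients for release (min), with terms ce, la, ph, ce*ce, la*la, ph*ph, ce*la, ce*ph and la*ph; trilinear mixture contour plot of release. N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Conf. lev.=0.95.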
Task 3
In this application, the optimizer identified only one point, the mixture 0.32/0/0.68, where the maximum release was predicted to be 363 minutes. This point was not tested in the original work, but one close to it (0.333/0/0.667) was. The experimenters performed three verifying experiments, and these results together with the model predictions are summarized in the figure below. As seen, the model predicts well except for the mixture 0.5/0.125/0.375. Recall that the observed values (for the first three rows in the figure below) were 370, 340 and 345.
Conclusions
Maximum release is predicted for the combination 0.32 / 0 / 0.68. The experimental verification produced good agreement between measured and predicted response values for two out of three new formulations. The discrepancy between measured and predicted release for the remaining point suggests some information deficiency in the training set. One way to address this problem is to combine the two sets of data and then update the regression model. As a consequence, a new set of prediction samples should be compiled in order to verify the predictive power of this updated model.
DOE-Exercise ROCKET (Mixture)

Optimization of elasticity of a rocket propellant
Background
A manufacturer of a rocket propellant mixed three ingredients together to get the best possible product.
Objective
The objective was to formulate a propellant with elasticity > 2900.
Data
Three ingredients (mixture factors) were varied and one response (elasticity) was measured. The data table is shown below. Design: Modified Simplex Centroid. Model: Quadratic.
Task 1
Create a new investigation in MODDE according to the information given above. Select RSM as objective, a quadratic model, and generate a modified simplex centroid design with 9 + 1 runs. Enter the response data.
Task 2
Evaluate the raw data. Make a histogram to evaluate the distribution of elasticity, and a replicate plot to explore the replicate error. Are there any anomalies in the raw data?
Task 3
Select PLS as fit method. Relate the predictors to the response. Investigate the relevant score and loading plots. Interpret the model. What can you say about the correlation structure among the factors and responses (Hint: Look at PLS score plots)? Which factors are influential for elasticity? Which formulation should be used to maintain an elasticity above 2900?
Solutions to ROCKET
Task 2

According to the histogram, elasticity is almost normally distributed. Consequently, we chose not to transform the response variable. The replicate plot shows that the data contain no replicates, which implies that the ANOVA cannot be carried out fully (see below). We have also included a plot of the correlation matrix, as a reminder of the correlation among the factors that arises from the overall mixture constraint (sum of all factors = 1.0).
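The correlation induced by the mixture constraint can be shown directly: because each row of the factor matrix sums to 1, every row of the factor covariance matrix sums to zero, so at least some correlations must be negative. The Python sketch below uses a hypothetical three-component simplex-centroid point set, not the Rocket worksheet; for this symmetric set every pairwise correlation is exactly -0.5.

```python
import numpy as np

# Hypothetical simplex-centroid points: vertices, edge midpoints, centroid.
X = np.array([
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
])
assert np.allclose(X.sum(axis=1), 1.0)  # the mixture constraint

cov = np.cov(X, rowvar=False)
# cov(x_i, x1 + x2 + x3) = cov(x_i, 1) = 0, so every row of cov sums to zero.
print(cov.sum(axis=1))
print(np.corrcoef(X, rowvar=False).round(2))  # off-diagonal entries are -0.5
```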
Figures: Rocket Histogram of Elasticity; Rocket Plot of Replications for Elasticity with experiment number labels.
Task 3
A two-component PLS model was obtained with R2 = 0.80 and Q2 = 0.25. The gap between R2 and Q2 is large, which is unsatisfactory. The PLS Total Summary plot shows that the first component is the most important in terms of explained variance. To investigate the correlation structure, we have plotted the t/u scores of the two model components. These indicate a curved correlation structure in the first component, and that the second component essentially compensates for this non-linear behavior. Further, the ANOVA table shows that the model is not significant (p = 0.14, whereas p < 0.05 is required for a significant model). The N-plot of residuals shows slightly deviating behavior for experiment number 4, but since it lies within ±4 standard deviations it was kept in the model.
Figures: Rocket (PLS, comp.=2) Summary of Fit for Elasticity; PLS Total Summary (cumulative R2 and Q2 for Comp1 and Comp2); score scatter plots t[1] vs u[1] and t[2] vs u[2]; normal probability plot of Elasticity residuals, all with experiment number labels. N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071.
The PLS loading plot and the coefficient plot indicate how the various model terms influence the elasticity of the rocket propellant. However, because the model is very weak, it must be interpreted with great care. Some guidance on model refinement may be extracted from the coefficient plot; in this case, however, we did not find it possible to improve the model. What one can do in this kind of situation is to use the trilinear mixture contour plot to get a general appraisal of the response function. The left-hand mixture region plot shows that we are investigating a small, though simplex-shaped, mixture domain. The right-hand mixture contour plot suggests that it should be possible to achieve an elasticity above 2900 within the investigated region.
Figures: Rocket (PLS, comp.=2) Loading Scatter wc[1] vs wc[2] (terms Bin, Oxi, Fue, Bin*Bin, Oxi*Oxi, Fue*Fue, Bin*Oxi, Bin*Fue, Oxi*Fue and response Ela); coefficient plot for Elasticity; mixture region plot and mixture contour plot (Binder, Oxidiser, Fuel). N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071, Conf. lev.=0.95.
Conclusions
In this case the experimental goal of obtaining an elasticity above 2900 appears achievable, although the model is weak. Binder and Fuel were the two ingredients with the largest impact on elasticity, whereas Oxidiser had an almost negligible effect.
DOE-Exercise CORNE59 (Mixture)

Optimising the Taste of Fish Pat
Background
A manufacturer of fish pat wanted to produce a quality product irrespective of which species of fish were used. Since the market price of different fish varies considerably, a mixture design was used to locate the best tasting pat.
Objective
The aim was to produce a pat with a taste rating above 3.
Data
There were three ingredients and one response (taste). The data table is shown on the next page. Design: Modified Simplex Centroid. Model: Quadratic.
Tasks
Task 1
Create a new investigation in MODDE according to the information given above. Make sure that the design has 19 rows and then paste the contents of CORNE59.dif into the worksheet.
Task 2
Evaluate the raw data by inspecting the distribution and replicate error of taste using Worksheet/Histogram and Worksheet/Replicate Plot respectively. Are there any anomalies in the raw data?
Task 3
Select PLS as the fit method, using Analysis/Select Fit Method/PLS, and fit the model. Interpret the model by investigating the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plot and Analysis/PLS Plots/Loading Scatter Plot respectively. What can you say about the correlation structure among the three ingredients and taste (hint: look at the PLS score plots)? Check the validity of the model by looking at the ANOVA table and residual plot using Analysis/ANOVA/Anova Table and Analysis//Normal Prob. Plot Residuals respectively. Use the loadings and coefficient plots (Analysis/Coefficients/Plot) to investigate which ingredients influence the taste of the pat. Which recipe gives a taste above 3?
Experimental Data
Experiments 1-10 comprise the original design, and experiments 11-19 are replicates.
Solutions to Corne59
Task 2
The histogram suggests that a transformation, such as log, would be preferable. However, for the sake of this preliminary analysis of the data we will not transform the response. The replicate plot clearly illustrates the small replicate error. The correlation matrix is also shown below in order to illustrate the inherent correlation between the ingredients due to the overall mixture constraint, i.e. sum of ingredients = 1.0.
Figures: Corne59 Histogram of taste; Corne59 Plot of Replications for taste with experiment number labels.
Task 3
A three-component PLS model was obtained with R2=0.97, Q2=0.90, MVal = 0.30, and Rep = 0.98. The PLS Total Summary plot shows that the first component is by far the most important in terms of variance explained. In order to investigate the correlation structure, we have plotted the t/u scores of the first two model components which indicate a strong relationship between taste and the three ingredients. The ANOVA table and the N-plot of residuals also indicate an excellent model.
Figures: Corne59 (PLS, comp.=3) Summary of Fit for taste; PLS Total Summary (cumulative R2 and Q2 for Comp1 to Comp3); score scatter plots t[1] vs u[1] and t[2] vs u[2]; normal probability plot of taste residuals, all with experiment number labels. N=19, DF=13, R2=0.971, Q2=0.905, R2 Adj.=0.960, RSD=0.1964, Cond. no.=6.5072, Y-miss=0.
The PLS loadings plot and the coefficient plot indicate that all three mixture ingredients affect the taste of the fish pat. The response contour plot shows that, to achieve a taste rating above 3, one needs to be in the upper part of the mixture triangle, i.e. high x1 and low x2. There is also clear evidence of non-linear blending.
Figures: Corne59 (PLS, comp.=3) Loading Scatter wc[1] vs wc[2] (x1, x2, x3, x1*x1, x2*x2, x3*x3, x1*x2, x1*x3, x2*x3 and taste); Scaled & Centered Coefficients for taste. N=19, DF=13, R2=0.971, Q2=0.905, R2 Adj.=0.960, RSD=0.1964, Cond. no.=6.5072, Conf. lev.=0.95.
Conclusions
There is a strong relationship between taste and the three varied ingredients. To obtain a taste rating above 3, ingredient x1 should be high and ingredient x2 low. This gave the manufacturer a clear strategy for maintaining quality whilst simultaneously reducing cost.
DOE-Exercise BUBBLES (Mixture)

Screening and optimization of bubble formation
Background
Kids like to blow bubbles, but dislike bubbles that burst rapidly. We decided to use mixture design to investigate which factors may affect bubble formation. We searched the Internet for a suitable bubble mixture composition to use as a starting reference mixture. This recipe was then modified according to a mixture design, and bubbles were blown for each mixture composition. The investigator, Lennart Eriksson, carried out these experiments while on parental leave, taking care of his son, little Andreas, 14 months old. This ensured high bubble quality.
Objective
The objective was to understand which factors influence the bubble-making process (screening), and to see whether an optimal recipe could be formulated that would ensure long-lasting bubbles (RSM).
Data
The lifetime in seconds was measured for bubbles of 4-5 cm size. The two process factors were the temperature of the solution (°C) and the settling time of the mixture (h). The four mixture factors were dish-washing liquid 1 (DWL1), dish-washing liquid 2 (DWL2), tap water and glycerol.
Tasks
Task 1
In MODDE, first define the factors, the response and the constraint as outlined above. Select Screening as the objective. The process model should be an interaction model, and the mixture model a linear model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.5 / 0.1. Open BUBB_SCR.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? How should we proceed to improve the result (get longer-lasting bubbles)?
Task 2
Refine the model from the previous task by removing all insignificant terms. Refit and evaluate the updated model. Which factors are most meaningful to optimize? How should we proceed to improve the result (get longer-lasting bubbles)? Use the MODDE Optimizer to get some suggestions for future experiments.
Task 3
We are now going to use the results of the screening phase to construct an appropriate RSM design. This means that we fix the two process factors at constant values, temperature 7 °C and time 25 h, and vary only the four mixture factors. We shall use the mixture composition 0.2 / 0.2 / 0.3 / 0.3 as our new reference mixture. In MODDE, make a copy of the investigation and redefine the factor settings and the DWL constraint according to the following:
The response is the bubble lifetime (log-transformed). Select RSM as the objective. The mixture model should be a quadratic model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.3 / 0.3. Open BUBB_RSM.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? Is it possible to increase the lifetime of the bubbles even further (beyond the measured 18 min 40 s)? Is it possible to find an optimum within the investigated region? Use the Optimizer to explore the mixture region.
Data Set for BUBBLES Screening
Data Set for BUBBLES - RSM
Solutions to BUBBLES
Task 1
The histogram shows that the distribution of the response is skewed to the right, so it needs to be log-transformed. The replicate plot shows that the pure error is reasonably low.
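The effect of a log transformation on a right-skewed response can be illustrated with the moment coefficient of skewness. The Python sketch below uses made-up lifetimes, not the BUBB_SCR data, and assumes a base-10 logarithm; the base only rescales the transformed values and does not change their skewness.

```python
import numpy as np


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5."""
    v = np.asarray(values, dtype=float)
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


# Hypothetical right-skewed lifetimes in seconds (illustration only).
lifetimes = np.array([40, 55, 60, 75, 90, 110, 150, 210, 320, 430])
print(round(skewness(lifetimes), 2), round(skewness(np.log10(lifetimes)), 2))
```

The raw values have a long right tail, whereas the log-transformed values are much closer to symmetric, which is what the two histograms of Lifetime and Lifetime~ convey.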
Figures: Bubb_scr Histogram of Lifetime (raw data); Histogram of Lifetime~ (log-transformed); Plot of Replications for Lifetime~ with experiment number labels.
PLS was used to fit a model to the data, yielding R2 = 0.81, Q2 = 0.18, MVal = -0.2 and Rep = 0.93. There are several insignificant cross-terms, which cause the low Q2 and MVal. Remove these terms and refit the model.
Figures: Bubb_scr (PLS, comp.=2) Summary of Fit for Lifetime~; Scaled & Centered Coefficients for Lifetime~ (DW1, DW2, Wa, Gly, Te, Ti, Te*Ti and the process-mixture interactions Te*DW1, Te*DW2, Te*Wa, Te*Gly, Ti*DW1, Ti*DW2, Ti*Wa, Ti*Gly). N=24, DF=11, R2=0.812, Q2=0.185, R2 Adj.=0.608, RSD=0.2476, Conf. lev.=0.95. Plot annotation: "Large CIs: not a bug, but a theory problem."
Task 2
Refitting the model gave a much better result. The refined model looks good according to R2/Q2, the N-plot of residuals and the observed vs. predicted plot. The ANOVA table and the MVal statistic do indicate lack of fit, but the model is still useful. The model interpretation (with loadings or coefficients) indicates that, to obtain longer-lasting bubbles, the fraction of glycerol should be increased and the amount of water decreased. When interpreting the model, remember that the regression coefficients refer to the 0.2 / 0.2 / 0.5 / 0.1 reference mixture.
Figures: Bubb_scr refined model (PLS, comp.=2): Summary of Fit for Lifetime~; normal probability plot of Lifetime~ residuals and observed vs. predicted Lifetime~, both with experiment number labels; Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for Lifetime~ (DW1, DW2, Wa, Gly, Te, Ti). N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Cond. no.=2.1537, Conf. lev.=0.95.
We then used the MODDE optimizer to compute predictions of where to lay out an optimization design. Two such predicted mixture compositions are shown below, together with the results from the verifying experiments. It was decided to use the first verifying experiment as the reference for the RSM mixture design.
Predictions from MODDE Optimizer:
Verifying experiments:
#1 Temp = 7 °C, Time = 25 h, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Lifetime = 1120 s (18 min 40 s)
#2 Temp = 7 °C, Time = 49 h, Mixture = 0.4 / 0.0 / 0.3 / 0.3, Lifetime = 810 s (13 min 30 s)
Task 3
The replicate plot shows that the pure error is low. It also shows that the replicates, i.e. the reference mixture measurements, lie in the upper part of the response interval. Since the interior reference mixture gives longer lifetimes than most of the other design points, the response surface is curved, which indicates that a quadratic model is needed. The fitted quadratic PLS model had R2 = 0.92, Q2 = 0.71, MVal = 0.56 and Rep = 0.95, which are good values and of sufficient quality for optimization. The model shows no lack of fit (ANOVA table) and has approximately normally distributed residuals. The PLS score plot demonstrates the good correlation between mixture composition and bubble lifetime. According to the coefficient plot, water and glycerol have the most impact on bubble lifetime in the mixture region explored. Remember that the reference mixture is 0.2 / 0.2 / 0.3 / 0.3.
Figures: Bubb_rsm Plot of Replications for Lifetime~ with experiment number labels; Bubb_rsm (PLS, comp.=2) Summary of Fit for Lifetime~; normal probability plot of Lifetime~ residuals; score scatter plot t[1] vs u[1]; Scaled & Centered Coefficients for Lifetime~ (DW1, DW2, Wa, Gly and their square and cross terms). N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Conf. lev.=0.95.
Because the PLS model is good according to the evaluation criteria (R2/Q2/MVal/Rep, ANOVA, N-plot, t1/u1 score plot), we may proceed to make predictions. The mixture contour plot displayed below was created with glycerol, the most important ingredient, set at its high level. Evidently, there is no sharp optimum but rather a ridge on which bubble lifetimes of 1350-1360 seconds (approximately 22.5 min) are encountered.
With the MODDE optimizer, the following five runs were predicted. They are all situated on the ridge found above.
DWL1     DWL2     Water    Glycerol   Lifetime (s)   iter   log(D)
0.2200   0.1001   0.2799   0.4        1359.421       148    -0.8289
0.2108   0.1187   0.2705   0.4        1353.342       87     -0.7011
0.2264   0.1001   0.2735   0.4        1360.329       84     -0.8497
0.2229   0.1001   0.2771   0.4        1360.145       105    -0.8455
0.2264   0.1001   0.2735   0.4        1360.329       76     -0.8497
Conclusions
The conclusion is that by first using a screening design, then some steepest-ascent predictions, and finally an RSM design, we were able to increase bubble lifetime from 6.02 min to 22.28 min! Unfortunately, however, little Andreas showed more interest in the little red plastic bubble wand than in his father's enormous experimental progress.
DOE-Exercise LOWARP (Mixture)

Optimisation of a Polymer
Background
A manufacturer wanted to develop a new polymer with the properties of low warp and high strength. To achieve this, the polymer formulation was varied according to an extreme vertices mixture design with 14 runs and 3 centre points, based on the following constituents:
1 Glas: 20 to 40 %
2 Crtp: 0 to 20 %
3 Mica: 0 to 20 %
4 Amtp: 40 to 60 %
Objective
The objective of the investigation was to understand how the four constituents influence the properties of the polymer and if it was possible to manufacture a polymer with the required properties.
Data
Fourteen responses relating to warp, shrinkage and strength were measured on the polymers as shown below.
Tasks
Task 1
Create a new investigation in MODDE and define the four factors and 14 responses (see above). Select SCREENING as the experimental objective. Generate a worksheet with 17 runs and copy/paste the entire data table (including the factor settings) from the file Lowarp.xls.
Task 2
Fit a model relating the constituents (variables 1 - 4) to the responses using PLS. Investigate the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plots and Analysis/PLS Plots/Loading Scatter Plots respectively and interpret the model. What can you say about the correlation structure among the factors and responses (hint: look at the score plots)? How are the 14 responses related (hint: look at the loading plots)? Which factors influence strength and which factors influence warp?
Solutions to LOWARP
Task 2
PLS gives a three-component model with R2 = 0.75 and Q2 = 0.53, which is an excellent result considering that all 14 responses are included in one model. The R2 and Q2 values for each individual response are shown in the Summary of Fit plot below. The three PLS score plots (t vs u for each component) confirm the strong correlation between the constituents and the responses. Finally, the DModY plot (distance to the model in Y) indicates no outliers in the response data.
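R2 is the fraction of the response variation explained by the fitted model, R2 = 1 - SS_res/SS_tot, and Q2 is the fraction predicted in cross-validation, Q2 = 1 - PRESS/SS_tot, where PRESS sums the squared errors made when each observation is predicted by a model fitted without it. The sketch below computes both for a single response. As an assumption made for brevity, it uses a least-squares linear Scheffé mixture model (the proportions as regressors, no intercept) and leave-one-out cross-validation in place of the PLS model and the cross-validation grouping used in MODDE, so its numbers would not match the values above. The function name is hypothetical.

```python
import numpy as np

def r2_q2(X, y):
    """R2 and leave-one-out Q2 for a least-squares model y ~ X (no intercept).

    In a Scheffe linear mixture model the columns of X are the component
    proportions, which already sum to 1, so no separate intercept is used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation about the mean
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ b) ** 2)          # unexplained variation
    press = 0.0
    for i in range(len(y)):                    # leave observation i out
        keep = np.arange(len(y)) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ b_i) ** 2      # error predicting the left-out point
    return 1 - ss_res / ss_tot, 1 - press / ss_tot
```

For a least-squares fit, each leave-one-out residual is at least as large as the corresponding fitted residual, so Q2 never exceeds R2; the gap between them (here 0.75 versus 0.53) indicates how much of the fit does not carry over to new observations.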
Figures (Lowarp, PLS, comp.=3): PLS Total Summary (cum), R2 and Q2 for Comp1 to Comp3; Summary of Fit, R2 and Q2 for each response wrp1 to wrp8 and st1 to st6; Score Scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels; Distance to Model (Y) for experiments 1 to 17. N=17, DF=13, Cond. no.=2.0457, Y-miss=10.
We have a PLS model characterising all 14 responses. Inspection of the VIP plot indicates that, taken over all 14 responses, mica and glas are the most influential constituents. Since VIP is computed from the squared PLS weights, it tells us how important each constituent is but not in which direction (positive or negative) it influences a particular response. This information can be obtained from the loading plot, which shows how the variables (constituents and responses) relate to each other. Observe that the eight warp responses are tightly clustered on the right of the loading plot, in the direction of amtp and away from mica. Hence, we conclude that increasing amtp will increase warp, whereas increasing mica will decrease it. The six strength responses are more scattered in the loading plot. This suggests that strength is either more difficult to measure or a more complex phenomenon. Crtp is most influential for st3 and st5, whereas glas is most important for st1, st2, st4 and st6. The four coefficient plots shown below illustrate the coefficient profiles for both correlated (wrp1 and wrp2) and uncorrelated (st3 and st4) responses.
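The commonly used VIP definition weights each constituent's squared, normalised PLS weight by the Y sum of squares explained by that component: VIP_j = sqrt(K * sum_a SSY_a * (w_ja / ||w_a||)^2 / sum_a SSY_a), with K the number of X variables. The sketch below implements that definition; MODDE's exact computation may differ in detail, and the function name is hypothetical. Because only squared weights enter, flipping the sign of a weight leaves VIP unchanged, which is why the direction of an effect has to be read from the loading or coefficient plots instead.

```python
import numpy as np

def vip(W, T, C):
    """Variable Importance in the Projection for each X variable.

    W: X weights (K x A), T: X scores (N x A), C: Y weights (M x A).
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    C = np.asarray(C, dtype=float)
    k = W.shape[0]
    # Y sum of squares explained by component a: ||t_a||^2 * ||c_a||^2.
    ss_y = np.sum(T ** 2, axis=0) * np.sum(C ** 2, axis=0)
    # Squared weights, normalised per component so each column sums to 1.
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(k * (w_sq @ ss_y) / ss_y.sum())
```

The squared VIP values always average to 1 over the K constituents, so a common rule of thumb reads VIP above 1 as an influential variable.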
Figures (Lowarp, PLS, comp.=3): Variable Importance Plot (VIP) for gl, mi, cr and am; Loading Scatter wc[1] vs wc[2] for the constituents (gl, mi, cr, am) and the responses (w1 to w8, st1 to st6); Scaled & Centered Coefficients for wrp1, wrp2, st3 and st4. N=17, DF=13, Conf. lev.=0.95.
Mixture contour plots provide a better understanding of the relationships between warp and strength and the four constituents. These contour plots are shown below for the four responses discussed previously and were constructed by fixing amtp at 0.5 and letting the other three constituents (crtp/mica/glas) vary. The arrow indicates a reasonable compromise among the four responses yielding the desired properties of high strength and low warp. This mixture is approximately glas = 0.3, crtp = 0.0, mica = 0.2 and amtp = 0.5. This mixture should be tested to verify the model predictions.
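The contour plots are drawn on a slice of the mixture region where amtp is held at 0.5, so glas, crtp and mica share the remaining 0.5 within their own bounds. The sketch below generates the feasible candidate mixtures on that slice; each point would then be passed to the fitted models to draw the contours. The fitted coefficients are not given in this solution, so the sketch stops at the grid. Working in integer percentages avoids rounding errors; the function name and the 5 % grid step are assumptions for illustration.

```python
def feasible_slice(amtp=50, step=5):
    """Hypothetical helper: mixtures (in %) on the slice amtp = const that
    respect the LOWARP bounds Glas 20-40 %, Crtp 0-20 %, Mica 0-20 %.

    Mica takes up the remainder so every point sums to 100 %.
    `step` should divide 20 so that the grid reaches every bound.
    """
    points = []
    for glas in range(20, 41, step):
        for crtp in range(0, 21, step):
            mica = 100 - amtp - glas - crtp
            if 0 <= mica <= 20:
                points.append((glas, crtp, mica, amtp))  # (glas, crtp, mica, amtp)
    return points
```

On a 5 % grid this yields 19 feasible mixtures, and the suggested compromise (glas 30 %, crtp 0 %, mica 20 %, amtp 50 %) is one of them, that is, it lies on the plotted slice and within all constituent bounds.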
Conclusions
The application of a simple mixture design to a complex polymer optimisation problem has successfully identified a mixture that the models predict will have the desired properties.
List of references (last revised 2004-02-10)

A foreword
This is a list covering a small selection of useful references (books and articles) in the fields of design of experiments (DoE) and multivariate analysis (MVA). It is emphasized that this is by no means an exhaustive account of the available literature. Rather, this compilation highlights references which may guide the reader for further studies.
References for DoE

Books
1. Box, G.E.P., Hunter, W.G., and Hunter, J.S. (1978), Statistics for Experimenters, John Wiley & Sons, New York.
2. Cornell, J.A. (1981), Experiments with Mixtures, John Wiley & Sons, New York.
3. Bayne, C.K., and Rubin, I.B. (1986), Practical Experimental Designs and Optimization Methods for Chemists, VCH Publishers, Deerfield Beach, Florida.
4. Box, G.E.P., and Draper, N.R. (1987), Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York.
5. Haaland, P.D. (1989), Experimental Design in Biotechnology, Marcel Dekker, New York and Basel.
6. Carlson, R. (1991), Design and Optimization in Organic Synthesis, Elsevier Science Publishers, Amsterdam.
7. Montgomery, D.C. (1991), Design and Analysis of Experiments, John Wiley & Sons, New York (ISBN 0-471-52994-X).
8. Morgan, E. (1991), Chemometrics: Experimental Design, John Wiley & Sons, New York.
9. Nortvedt, R., et al. (1996), Anvendelse av kjemometri innen forskning og industri, Tidsskriftforlaget Kjemi AS (ISBN 82-91294-01-1).
10. Goupy, J.L. (1993), Methods for Experimental Design: Principles and Applications for Physicists and Chemists, Elsevier, Amsterdam.
Articles
1. Hendrix, C. (1979), What Every Technologist Should Know About Experimental Design, Chemtech, 9, 167-174.
2. Hunter, J.S. (1987), Applying Statistics to Solving Chemical Problems, Chemtech, 17, 167-169.
3. Steinberg, D.M., and Hunter, W.G. (1984), Experimental Design: Review and Comments, Technometrics, 26, 71-97.
4. Grize, Y.L. (1995), A Review of Robust Process Design Approaches, Journal of Chemometrics, 9, 239-262.
5. Ahlinder, S., et al. (1997), Smart Testing: Reaping the Benefits of DoE, Volvo Technology Report No. 2 1997, www.volvo.se/rt/trmag/index.html.
6. Nyström, A., and Karlsson, A. (1997), Enantiomeric Resolution on Chiral-AGP with the Aid of Experimental Design. Unusual Effects of Mobile Phase pH and Column Temperature, Journal of Chromatography A, 763, 105-113.
7. Eriksson, L., Johansson, E., and Wikström, C. (1998), Mixture Design: Design Generation, PLS Analysis and Model Usage, Chemometrics and Intelligent Laboratory Systems, 43, 1-24.
8. Lundstedt, T., et al. (1998), Experimental Design and Optimization, Chemometrics and Intelligent Laboratory Systems, 42, 3-40.
9. Rappaport, K.D., et al. (1998), Perspectives on Implementing Statistical Modeling and Design in an Industrial/Chemical Environment, The American Statistician, 52, 152-159.
References for MVA

Books
1. Jolliffe, I.T. (1986), Principal Component Analysis, Springer-Verlag, New York (ISBN 0-387-96269-7).
2. Martens, H., and Naes, T. (1989), Multivariate Calibration, John Wiley, New York.
3. Jackson, J.E. (1991), A User's Guide to Principal Components, John Wiley, New York (ISBN 0-471-62267-2).
4. Anthology (1996), Anvendelse av kjemometri innen forskning og industri, Tidsskriftforlaget Kjemi AS, Bergen, Norway (ISBN 82-91294-01-1).
5. Höskuldsson, A. (1996), Prediction Methods in Science and Technology, Thor Publishing, Copenhagen, Denmark (ISBN 87-985941-0-9).
6. Massart, D.L., et al. (1998), Handbook of Chemometrics and Qualimetrics, Part A and Part B, Elsevier, Amsterdam.
Articles, general
1. Wold, S., Esbensen, K., and Geladi, P. (1987), Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, 2, 37-52.
2. Höskuldsson, A. (1988), PLS Regression Methods, Journal of Chemometrics, 2, 211-228.
3. Ståhle, L., and Wold, S. (1988), Multivariate Data Analysis and Experimental Design in Biomedical Research, in: Ellis, G.P., and West, G.B. (Eds), Progress in Medicinal Chemistry, Elsevier Science Publishers, 291-338.
4. Wold, S., Albano, C., Dunn, W.J., et al. (1989), Multivariate Data Analysis: Converting Chemical Data Tables to Plots, in: Computer Applications in Chemical Research and Education, Dr. Alfred Hüthig Verlag, Heidelberg.
5. Stone, M., and Brooks, R.J. (1990), Continuum Regression: Cross-Validated Sequentially Constructed Prediction Embracing Ordinary Least Squares, Partial Least Squares and Principal Components Regression, Journal of the Royal Statistical Society, Series B, 52, 237-269.
6. Frank, I.E., and Friedman, J.H. (1993), A Statistical View of Some Chemometrics Regression Tools, Technometrics, 35, 109-148.
7. Wold, S. (1994), Exponentially Weighted Moving Principal Components Analysis and Projections to Latent Structures, Chemometrics and Intelligent Laboratory Systems, 23, 149-161.
8. Wold, S., Eriksson, L., and Sjöström, M. (1999), PLS in Chemistry, in: Encyclopedia of Computational Chemistry, Elsevier, pp. 2006-2020.
Articles, process
1. Kresta, J.V., MacGregor, J.F., and Marlin, T.E. (1991), Multivariate Statistical Monitoring of Process Operating Performance, The Canadian Journal of Chemical Engineering, 69, 35-47.
2. Kourti, T., and MacGregor, J.F. (1995), Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods, Chemometrics and Intelligent Laboratory Systems, 28, 3-21.
3. MacGregor, J.F. (1996), Using On-line Process Data to Improve Quality, ASQC Statistics Division Newsletter, 16(2), 6-13.
4. Nijhuis, A., de Jong, S., and Vandeginste, B.G.M. (1997), Multivariate Statistical Process Control in Chromatography, Chemometrics and Intelligent Laboratory Systems, 38, 51-61.
5. Rännar, S., MacGregor, J.F., and Wold, S. (1998), Adaptive Batch Monitoring Using Hierarchical PCA, Chemometrics and Intelligent Laboratory Systems, 41, 73-81.
6. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process. Part I. Process Supervision with Multivariate Control Charts, Chemometrics and Intelligent Laboratory Systems, 42, 221-231.
7. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process. Part II. Multivariate Time-series Analysis of Lagged Latent Variables, Chemometrics and Intelligent Laboratory Systems, 42, 233-240.
8. Wold, S., et al. (1998), Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments, Chemometrics and Intelligent Laboratory Systems, 44, 331-340.
Articles, multivariate calibration

1. Brown, P.J. (1982), Multivariate Calibration, Journal of the Royal Statistical Society, Series B, 44, 287-321.
2. Beebe, K.R., and Kowalski, B.R. (1987), An Introduction to Multivariate Calibration and Analysis, Analytical Chemistry, 57, 1007-1017.
3. Trygg, J., and Wold, S. (1998), PLS Regression on Wavelet Compressed NIR Spectra, Chemometrics and Intelligent Laboratory Systems, 42, 209-220.
4. Swierenga, H., et al. (1998), Improvement of PLS Model Transferability by Robust Wavelength Selection, Chemometrics and Intelligent Laboratory Systems, 41, 237-248.
5. Wold, S., Antti, H., et al. (1998), Orthogonal Signal Correction of Near-Infrared Spectra, Chemometrics and Intelligent Laboratory Systems, 43, 123-134.
6. Bro, R. (1996), Håndbog i Multivariabel Kalibrering, KVL, Copenhagen, Denmark.
Articles, multivariate characterization

1. Carlson, R., Lundstedt, T., and Albano, C. (1985), Screening of Suitable Solvents in Organic Synthesis. Strategies for Solvent Selection, Acta Chemica Scandinavica, B39, 79-91.
2. Wallbäcks, L., Edlund, U., and Nordén, B. (1991), Multivariate Characterization of Pulp Using Solid-State 13C NMR, FTIR and NIR, Tappi Journal, 74, 201-206.
3. Cocchi, M., et al. (1992), Theoretical versus Empirical Molecular Descriptors in Monosubstituted Benzenes. A Chemometric Study, Chemometrics and Intelligent Laboratory Systems, 12, 209-224.
4. Eriksson, L., Verhaar, H.J.M., and Hermens, J.L.M. (1994), Multivariate Characterization and Modelling of the Chemical Reactivity of Epoxides, Environmental Toxicology and Chemistry, 13, 683-691.
5. Lindgren, Å., and Sjöström, M. (1994), Multivariate Physico-Chemical Characterization of Some Technical Non-Ionic Surfactants, Chemometrics and Intelligent Laboratory Systems, 23, 179-189.
6. Andersson, P., Haglund, P., and Tysklind, M. (1997), Ultraviolet Absorption Spectra of All 209 Polychlorinated Biphenyls Evaluated by Principal Component Analysis, Fresenius Journal of Analytical Chemistry, 357, 1088-1092.
Articles, QSAR
1. Eriksson, L., Hermens, J.L.M., et al. (1995), Multivariate Analysis of Aquatic Toxicity Data with PLS, Aquatic Sciences, 57, 217-241.
2. Eriksson, L., and Johansson, E. (1996), Multivariate Design and Modeling in QSAR, Chemometrics and Intelligent Laboratory Systems, 34, 1-19.
3. Verhaar, H.J.M., Hermens, J.L.M., et al. (1996), Classifying Environmental Pollutants. Separation of Class 1 and Class 2 Type Compounds Based on Chemical Descriptors, Journal of Chemometrics, 10, 149-162.
4. Goodford, P. (1996), Multivariate Characterization of Molecules for QSAR Analysis, Journal of Chemometrics, 10, 107-117.
5. Lindgren, Å., et al. (1996), Quantitative Structure-Effect Relationships for Some Technical Non-ionic Surfactants, Journal of the American Oil Chemists' Society, 73, 863-875.
6. Sandberg, M., et al. (1998), New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids, Journal of Medicinal Chemistry, 41, 2481-2491.
Copyright Umetrics AB, 2004-02-11
