
Design of Experiments (DOE) Pharma Applications

Contents

1. Introduction
2. Problem formulation
3. Full factorial designs
4. Analysis of full factorial designs
5. Analysis of full factorial designs. II. Causes of bad models
6. Experimental objective: Screening
7. Post-screening actions
8. Experimental objective: Optimization
9. Experimental objective: Robustness testing
10. Conclusions
11. Additional Topics
12. Mixture design One Day Add-on
13. Exercises
2/10/2004

Objectives of DOE Course


To describe how to make experiments efficiently
Span the experimental domain with the aid of an experimental design

To describe how to analyze the data


Use good statistical tools to evaluate the result of the experiments

To describe how to interpret the results


With the clever use of PC-based graphical facilities

To describe how to convert modelling results into concrete action


MODDE optimizer & verifying experiments


Table of contents

1. Introduction
2. Problem formulation
3. Full factorial designs
4. Analysis of full factorial designs
5. Analysis of full factorial designs. II. Causes of bad models
6. Experimental objective: Screening
7. Post-screening actions
8. Experimental objective: Optimization
9. Experimental objective: Robustness testing
10. Conclusions
11. Additional topics
    - D-optimal design
    - Blocking the Experimental Plan
    - Mixture design
    - Other RSM designs
    - Multilevel qualitative factors
    - The Taguchi approach to robust design
    - Simultaneous optimization of several responses
    - Partial least squares projections to latent structures
    - Design in Latent Variables
12. Mixture design One Day Add-On
13. Exercises
    - Getting started: ByHand, CakeMix
    - Screening, Full Fac: Pain, Tablets, Protein Spray-Drying
    - Screening, Frac Fac: Pilot Plant, Reporter Gene Assay, Chromshper_B
    - Optimization: Chiral Separation, Metabolism, Willge, DrugD
    - Robustness Testing: Nonafact, HPLC Robustness
    - Robust Design: CakeTaguchi, LoafVolume
    - D-optimal design: Model Updating
    - Blocking the Experimental Plan: Blocking
    - Mixture design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp
14. References

Copyright Umetrics AB, 2004-02-11



Chapter 1 Introduction

Contents
- Why/How DOE and where DOE is used
- Three primary experimental objectives
- Three General Examples
- The intuitive approach to experimental work (COST)
- A better approach (DOE)
- Overview of steps in DOE (using CakeMix)
- Benefits of DOE
- Summary

Why/How DOE is used


- Development of new products and processes
- Enhancement of existing products and processes
- Optimization of quality and performance of a product
- Optimization of an existing manufacturing procedure
- Screening of important factors
- Minimization of production cost
- Robustness testing of products and processes
- ...

Where DOE is used


- Chemical industry
- Polymer industry
- Car manufacturing industry
- Powertrain industry
- Pharmaceutical industry
- Food and dairy industry
- Pulp and paper industry
- Steel and mining industry
- Plastics and paints industry
- TeleCom industry
- Marketing and preference mapping; Conjoint analysis

Three primary experimental objectives


Screening
Which factors are most influential? What are their appropriate ranges?

Optimization
How shall we find the optimum? Is there a unique optimum, or is a compromise necessary to meet conflicting demands on the responses?

Robustness testing
How shall we adjust our factors to guarantee robustness? Do we have to change our product specifications prior to claiming robustness?


General Example 1: Screening


- Reporter Gene Assay (Active Biotech AB)
- Changes in the factor settings give different treatments
- The goal was to uncover which factors affected the signal-to-background (S/B) ratio

[Figure: assay workflow. Seed cells into plates and culture or treat as desired (Treatment); place in luminometer and measure light emission (Light).]

General Example 2: Optimization


- Reporter Gene Assay (Active Biotech AB)
- Based on results of the screening phase; down-sizing from 6 to 3 factors
- The goal was to find the factor setting resulting in the highest S/B-value

[Figure: assay workflow. Seed cells into plates and culture or treat as desired (Treatment); place in luminometer and measure light emission (Light).]

General Example 3: Robustness Testing


- HPLC separation of analytes in pharmaceutical industry
- The goal was to constantly maintain a resolution (Res1) above 1.5, which corresponds to complete baseline separation of two adjacent peaks

[Figure: chromatogram, signal (2000 to 10000) versus time (10 to 14 min), with peaks labelled H 310/83, H 309/40, (R), (I), (S) and (II).]

The "intuitive" (COST) approach to experimental work


- Changing one separate factor at a time (COST) does not lead to the real optimum, and gives different implications with different starting points
- Leads to many experiments and little information
- No quantification of interactions!

[Figure: two plots of X2 (-2 to 10) versus X1 (10 to 15).]

A better approach - DOE


- If not COST, what do we do instead?
- The solution is to construct a carefully prepared set of representative experiments, in which all relevant factors are varied simultaneously

[Figure: cube-shaped experimental region with axes X1 (200 to 400), X2 (50 to 100) and X3 (50 to 100); standard point 300/75/75.]

Overview of DOE - CakeMix application


- Three factors varied: Flour (200-400 g), Shortening (50-100 g), and Eggpowder (50-100 g)
- Response: Taste of resulting cake

Cake Mix Experimental Plan

Cake No   Flour (g)   Shortening (g)   Egg Powder (g)   Taste
1         200         50               50               3.52
2         400         50               50               3.66
3         200         100              50               4.74
4         400         100              50               5.20
5         200         50               100              5.38
6         400         50               100              5.90
7         200         100              100              4.36
8         400         100              100              4.86
9         300         75               75               4.73
10        300         75               75               4.61
11        300         75               75               4.68

[Figure: cube-shaped experimental region with axes X1 (200 to 400), X2 (50 to 100) and X3 (50 to 100); standard point 300/75/75.]

Overview of steps in DOE - part I


1. Define Factors

2. Define Response(s)

3. Create Design (Make experiments)


Overview of steps in DOE - part II


4. Make Model

[Figure: Summary of Fit for Taste, investigation Cakemix (MLR): bars for R2, Q2, Model Validity and Reproducibility. N=11, DF=6, Cond. no.=1.1726, Y-miss=0.]

5. Interpret Model

[Figure: Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR): terms Fl, Sh, Egg and Sh*Egg. N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95.]

Overview of steps in DOE - part III


6. Use Model (make decisions) Where to do verifying experiments ?

Flour = 400 g

Three critical problems


Three critical problems that DOE will deal with in a better way than the COST-approach:
- Problem 1 (Interactions): Systems influenced by more than one factor are poorly investigated by changing one separate factor at a time (interactions are missed)
- Problem 2 (Interpretation): Maps of the system may be misleading without using DOE (experiments are often ill-positioned and unable to support a response contour plot)
- Problem 3 (Noise): Systematic and unsystematic variability (seen "effects" and "noise") are difficult to estimate and consider in the computations without a designed series of experiments, see next slide

Variability (Problem 3)
- Every measurement and experiment is influenced by noise
- Under stable conditions every process and system varies around its mean, and stays within control limits, usually ±3 standard deviations

Reacting to noise
- Consider one experiment where the temperature is changed from 35 °C to 40 °C
- The response change, from slightly below 93% to close to 96%, lies within the variability interval found when replicating

[Figure: two dot plots on a yield axis (92 to 98%): ten measurements of yield under identical conditions, and two measurements of yield. Any real difference?]

Focusing on effects
- COST often implies an excess consumption of resources because the experiments are distributed in an informationally inefficient way
- DOE provides a better spread of the trials, which gives averaging possibilities and thereby more precise effect estimates

[Figure: sketches with factor axes X1, X2, X3 and response Y1.]

Estimating real effects and noise


- Real effects are estimated by the coefficients, and the noise is contained in the confidence intervals

[Figure: Scaled & Centered Coefficients for Taste, investigation cakemix (MLR): terms Fl, Sh, Egg, Fl*Sh, Fl*Egg and Sh*Egg, with the uncertainty of each coefficient marked. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

Consequence of variability
- Two points (experiments) close to each other leave the slope of the line poorly determined
- Two points far away from each other make the slope well determined
- If a center-point is put in between, it is possible to explore whether our model is OK. Should it be linear or nonlinear?
- It matters where the experiments are positioned! Design is needed.

[Figure: three sketches of Y versus X illustrating the cases above.]
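
Why the positions of the experiments matter can be made concrete with the textbook formula for the standard error of a fitted slope, $\sigma / \sqrt{\sum_i (x_i - \bar{x})^2}$. The sketch below is illustrative only: the noise level and the x positions are hypothetical values chosen for the example, not data from the course.

```python
import math

def slope_standard_error(x, sigma):
    """Standard error of a least-squares slope for design points x,
    assuming independent noise with standard deviation sigma."""
    x_mean = sum(x) / len(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sigma / math.sqrt(sxx)

sigma = 1.0                      # hypothetical noise level
close_points = [4.9, 5.1]        # two experiments close to each other
far_points = [0.0, 10.0]         # two experiments far away from each other
with_center = [0.0, 5.0, 10.0]   # far apart, plus a center-point

se_close = slope_standard_error(close_points, sigma)
se_far = slope_standard_error(far_points, sigma)
se_center = slope_standard_error(with_center, sigma)

print(f"close: {se_close:.3f}  far: {se_far:.3f}  with center: {se_center:.3f}")
```

With these made-up positions the slope is fifty times less certain for the close pair than for the distant pair. The center-point leaves the slope uncertainty unchanged; its value lies in making a check for curvature possible.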



Selected design must match experimental objective


[Figure: design geometries drawn for 2 factors, 3 factors and >3 factors, with the matching experimental objective:]
- Hyper cube: Screening
- Balanced fraction of hyper cube: Screening & Robustness Testing
- Hyper cube + axial points: Optimization

Benefits of DOE
- Organized approach which connects experiments in a rational manner
- More useful information is obtained (the influence of all factors together)
- More precise information is acquired in fewer experiments
- Results are evaluated in the light of variability
- Support for decision-making: Map of the system (response contour plot)

What we have learnt


- DOE results in a set of experiments in which all factors are varied at the same time
- DOE is used for three primary experimental objectives
  - screening: which factors are important and what are their appropriate ranges?
  - optimization: what is the optimal factor setting?
  - robustness testing: how sensitive is a response to small changes in the factors?
- DOE handles three problems well
  - factor interactions are estimable
  - reliable maps of the systems are possible
  - seen effects and noise are separable and estimable



Chapter 2 Problem Formulation

Contents
- Introduction to problem formulation
- Selection of experimental objective
- Definition of factors
- Definition of responses
- Selection of regression model
- The model concept
- Generation of experimental design
- Creation of worksheet
- Summary

Introduction to problem formulation (PF)


- Problem formulation (PF) is of central importance in DOE
- PF involves the selection/definition of a number of important features influencing the experimental work:
  - (1) selection of experimental objective
  - (2) definition of factors
  - (3) definition of responses
  - (4) selection of regression model
  - (5) generation of experimental design
  - (6) creation of worksheet

Introduction to problem formulation (PF)


- Responses: Variables describing the properties of the system/process
- Factors: Parameters changed to influence responses and possibly direct the system/process towards a desired response profile
- Model: Mathematical expression linking the changes in the factors to the changes in the responses

[Figure: Factors (X) enter a System or Process, which gives Responses (Y). Examples shown: Reporter Gene Assay, Spray Drying Machine, HPLC Equipment.]

PF - 1. Selection of experimental objective


- Experimental objective may be selected from six stages of DOE:
  - (a) familiarization
  - (b) screening
  - (c) finding the optimal region
  - (d) optimization
  - (e) robustness testing
  - (f) mechanistic modelling
- Screening, optimization and robustness testing are most frequently used
- The experimental objective tells which kind of investigation one wants to do. One should ask: Why is the experiment done? For what purpose? What is the desired result?

PF - 1a. Familiarization
- Useful when one is facing an entirely new type of application or equipment
- Spend a limited portion of the available resources, say, 10%
- Simple designs are used
- Goal: To verify that similar results are obtained for the replicated center-points, and that different results are found in the corners

[Figure: design sketch in the plane of Factor 1 and Factor 2.]

PF - 1b. Screening
- Useful when one wants to find out a little about many factors
- Goal: To uncover the important factors and their appropriate ranges
- Is the factor/response relationship linear or non-linear?
- Pareto principle (80/20 rule): with 25 factors, approximately 5 have an effect

[Figure: results before and after screening, with the noise level marked.]

PF - 1c. Finding the optimal region


- Useful when one is interested in moving the experimental region so that it probably includes the optimum
- Goal: To accomplish an adequate re-positioning of the experimental region
- Tools:
  - Gradient techniques (manual)
  - MODDE optimizer (automatic)

[Figure: response Y over x1 and x2, annotated "How do we get here?" and "Interesting direction".]

PF - 1d. Optimization
- Useful when detailed knowledge about the factor influences is needed
- We do not ask if a factor is relevant (screening), but how (optimization)
- Goal: To identify the factor combination at which the desired response profile is fulfilled (or almost so)
- RSM: Response surface modelling (methodology)

PF - 1e. Robustness testing


- Useful when one wants to understand how to regulate factors so that changes in the responses are minimized
- Some factors affect the mean, some the spread around the mean, and some both properties
- Goal: To establish factor tolerances (settings) within which robustness can be assured:
  - small changes in controllable factors will not affect the result
  - small changes in uncontrollable factors will not cause an undesirable spread around the desired result

PF - 1f. Mechanistic modelling


- Useful when there is a need to establish a theoretical model for a given field and a given problem
- Goal: To prove a new model, or maybe falsify some competing models
- One or more semi-empirical models are utilized to build such a theoretical model. In this conversion process, regression coefficients are used to get an idea of appropriate derivative terms in the mechanistic model
- Important: correct problem formulation, experimental design and data analysis

PF - 2. Specification of factors
Categorization of factors
- Controlled & Uncontrolled
- Process & Mixture (Formulation)
- Quantitative & Qualitative

Examples (MODDE)

Factor type          Example
Quantitative         Temperature, 10 °C to 50 °C
Quant. Multilevel    Speed, 200/300/400/500 rpm
Qualitative          Catalyst, Pd/Pt/Mo
Formulation          Strawberries 0.3-0.4; Milk 0.3-0.4; Ice cream 0.3-0.4
Filler               Solvent in mixture for which effect is uninteresting

Transformation of factors
- A factor can be transformed
- Examples: log; neglog; logit; square root; fourth root
- When?
  - Variables with a natural zero
  - Variables where the max/min ratio exceeds 10
- Types of variables: concentrations, volumes, levels
- Transform before executing design

[Figure: two plots (Modde 3.0) contrasting y versus x with y versus log x.]

Constraints of factors
- An irregular experimental region may be defined by specifying linear constraints of factors

[Figure: Raw Data Plot with Experiment Number labels, investigation itdoe_constraint: pH versus Temp (120 to 160). Annotations: Exclusion above line, Exclusion below line, D-optimal design.]

Uncontrolled factors
- These are factors that cannot be controlled, but which still may influence the results (responses)
  - Examples: ambient humidity and temperature
- Record values of uncontrolled factors, and include these in the data analysis
- Use randomization of experiments

PF - 3. Specification of responses
- Choose responses that are relevant; many responses are often necessary (Regular, Derived, Linked)
- Continuous:
  - breakage of weld
  - soot release when running a truck engine
  - resolution of two adjacent peaks in liquid chromatography
  - cost of material used in production (Derived response)
- Discrete:
  - categorical answers of yes/no type
  - the cake tasted good/did not taste good
- Semi-continuous (Product quality was):
  - Very poor = 1; Bad = 2; OK = 3; Good = 4; Excellent = 5

Transformation of responses
- Responses may be transformed
- A non-linear relationship between y and x may be linearized by a suitable transformation of y
- Examples: no transf.; log; neglog; logit; square root; fourth root
- Transform after executing design

[Figure: plots of y versus x and log y versus x.]

PF - 4. Selection of model
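We distinguish between three main types of polynomial models
- linear: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
- interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
- quadratic: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$

Matching experimental objectives
- Linear: Screening & Rob. Test.
- Interaction: Screening
- Quadratic: Optimization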
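
The three model types differ only in which terms they contain, which the following sketch makes explicit by listing the terms for any number of factors. The function name and the string format of the terms are made up for this illustration; it is not how MODDE represents models.

```python
from itertools import combinations

def model_terms(factors, model="linear"):
    """List the terms of a linear, interaction or quadratic polynomial model.

    "1" stands for the constant term; factor names are free-form labels."""
    if model not in ("linear", "interaction", "quadratic"):
        raise ValueError(f"unknown model type: {model}")
    terms = ["1"] + list(factors)
    if model in ("interaction", "quadratic"):
        # all two-factor interactions
        terms += [f"{a}*{b}" for a, b in combinations(factors, 2)]
    if model == "quadratic":
        # one squared term per factor
        terms += [f"{f}^2" for f in factors]
    return terms

factors = ["x1", "x2", "x3"]
for kind in ("linear", "interaction", "quadratic"):
    terms = model_terms(factors, kind)
    print(f"{kind:12s} {len(terms):2d} terms: {' + '.join(terms)}")
```

For three factors this gives 4, 7 and 10 terms. The number of terms is a lower bound on the number of distinct experiments needed to estimate the model, which is one reason the quadratic model calls for a larger design.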

The model concept


- Models are not reality, but approximate representations of some aspects of reality
- Examples of models: a toy train, a map of Iceland, a response contour plot

[Figure: Contour of Taste, investigation cakemix (MLR): Eggpowder (50 to 100) versus Shortening (50 to 100) at Flour = 400, contour levels from 3.90 to 5.70.]

Empirical, semi-empirical and theoretical models


- In DOE mathematical models are used for relating variation in factors to variation in responses
- Types of mathematical models
  - empirical: $y = a + bx + \varepsilon$
  - semi-empirical: $y = a + b \log x + \varepsilon$
  - fundamental: $H\Psi = E\Psi$; $pV = nRT$
- DOE is concerned with semi-empirical modeling using linear, interaction, quadratic, or cubic models

PF - 5. Generation of design
- The chosen model and the design to be generated are intimately linked
- MODDE considers the number of factors, their levels and nature (quantitative, qualitative, ...), and the selected experimental objective, and then recommends a design that is tailored to the researcher's problem

PF - 6. Creation of worksheet
- An example worksheet with extra information:
  - Run order; Constant factor; Uncontrollable factor; Inclusion of experiments
- Are the proposed experiments reasonable? Will they fulfil the goals?

What we have learnt - part I


- Problem formulation comprises six steps:
  - (i) selection of experimental objective
    - familiarization
    - screening
    - finding the optimal region
    - optimization
    - robustness testing
    - (mechanistic modelling)
  - (ii) definition of factors
  - (iii) definition of responses
  - (iv) selection of regression model
  - (v) generation of experimental design
  - (vi) creation of worksheet

What we have learnt - part II


- Models are not reality, but useful approximations of small parts of reality
- Types of polynomial models:
  - linear: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$
    - geometry: undistorted plane
    - objective: screening & robustness testing
    - design: fractional factorial designs
  - interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
    - geometry: twisted plane
    - objective: screening
    - design: full or fractional factorial designs
  - quadratic: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \dots + \varepsilon$
    - geometry: curved plane
    - objective: optimization (RSM in MODDE)
    - design: composite designs



Chapter 3 Full factorial designs

Contents
- Introduction to full factorial designs
- Construction and geometry of the 2^2, 2^3, 2^4 and 2^5 designs
- Pros and cons of full factorial designs
- Main effect of a factor
- By-hand methods for computing effects
- Interaction effects
- Plotting of interaction effects
- Computation of effects using least squares analysis
- Relationship between effects and coefficients
- How to express regression coefficients
- Summary

Introduction to full factorial designs


- Full factorial designs form the basis for classical experimental designs
- They are important for a number of reasons:
  - they require relatively few runs per investigated factor
  - they can be upgraded to form composite designs, which are used in optimization
  - they form the basis for two-level fractional factorial designs, which are of great practical value at an early stage of a project
  - they are easily interpreted by using common sense and elementary arithmetic
- Full factorial designs are regularly used with 2-4 factors
- In this chapter we consider two-level full factorial designs

Notation
- To perform a two-level full factorial design, the investigator has to assign a low level and a high level to each factor

Notation             Low       High      Center
Standard             -         +         0
Extended             -1        +1        0
Example: Temp        100 °C    200 °C    150 °C
Example: pH          7         9         8
Example: Cat. (A, B) A         B         n/a

- For a simple system, it may be convenient to display the coded unit together with the original factor unit

[Figure: y1 = yield versus x1 = temp, from low level -1 (100 °C) to high level +1 (200 °C), drawn for Cat A (-1) and Cat B (+1).]

The 2^2 full factorial design - construction & geometry

Example: ByHand

Definitions

Factor / response                                 - (-1)   0      + (+1)
x1  Amount formic acid/enamine (mole/mole)        1.0      1.25   1.5
x2  Reaction temperature (°C)                     25       62.5   100
y3  The desired product (%)                       (response)

Construction

Exp. no   x1 (orig.)   x2 (orig.)   x1 (coded)   x2 (coded)   y3 (%)
1         1            25           -            -            80.4
2         1.5          25           +            -            72.4
3         1            100          -            +            94.4
4         1.5          100          +            +            90.6
5         1.25         62.5         0            0            84.5
6         1.25         62.5         0            0            85.2
7         1.25         62.5         0            0            83.8

Geometry

[Figure: square experimental region spanned by X1 and X2.]

The 2^3 full factorial design - construction & geometry

Example: CakeMix

Definitions

Factor        Levels (Low/High)   Standard conditions
Flour         200 g / 400 g       300 g
Shortening    50 g / 100 g        75 g
Egg powder    50 g / 100 g        75 g

Response: Taste of the cake, obtained by averaging the judgement of a sensory panel

Construction

          Design matrix (coded)          Experimental matrix (g)
Exp No    Flour  Shortening  Egg         Flour  Shortening  Egg     Taste
1         -      -           -           200    50          50      3.52
2         +      -           -           400    50          50      3.66
3         -      +           -           200    100         50      4.74
4         +      +           -           400    100         50      5.2
5         -      -           +           200    50          100     5.38
6         +      -           +           400    50          100     5.9
7         -      +           +           200    100         100     4.36
8         +      +           +           400    100         100     4.86
9         0      0           0           300    75          75      4.68
10        0      0           0           300    75          75      4.73
11        0      0           0           300    75          75      4.61

Geometry

[Figure: cube-shaped experimental region with axes X1 (200 to 400), X2 (50 to 100) and X3 (50 to 100); standard point 300/75/75.]

Orthogonality property of full factorials


- Illustration: 2^3 design
- Each factor is orthogonal to the others in the design
- The effect of a factor can be estimated independently of all other factor influences

The 2^4 and 2^5 full factorial designs

- Construction of 2^2, 2^3, 2^4 and 2^5 designs in 4, 8, 16 and 32 runs, respectively (NOTE: No replicates are included)
- Geometrically, the 2^4 and 2^5 full factorial designs correspond to regular hyper-cubes in four and five dimensions

[Table: design matrix of - and + signs for runs 1 to 32 and factors X1 to X5.]
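
The sign pattern of a two-level full factorial design, and its orthogonality, can be reproduced in a few lines. This is a minimal sketch with made-up function names, assuming the usual standard order in which the first factor alternates fastest; the run order used in any particular software may differ.

```python
from itertools import product

def full_factorial(k):
    """Return the 2**k runs of a two-level full factorial design as tuples
    of -1/+1, in standard order (first factor alternating fastest)."""
    # product() varies its last position fastest, so each tuple is reversed.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]

def is_orthogonal(design):
    """True if every pair of factor columns has a zero dot product."""
    k = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i in range(k)
        for j in range(i + 1, k)
    )

for k in (2, 3, 4, 5):
    design = full_factorial(k)
    print(f"2^{k}: {len(design)} runs, orthogonal: {is_orthogonal(design)}")

# Sign table of the 2^3 design
for run_no, run in enumerate(full_factorial(3), start=1):
    print(run_no, " ".join("+" if level > 0 else "-" for level in run))
```

Center-points and replicates are not generated here; they would be appended as extra rows of zeros in coded units.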


Pros and cons of two-level full factorial designs


- These designs enable interaction models to be estimated, which is adequate for screening
- Each factor is investigated at both levels of all other factors
  - balancing
  - orthogonality
- Full factorial designs are realistic choices with 2-4 factors; with 5 or more factors fractional factorial designs are recommended

No of investigated factors (k)   No of runs, full factorial   No of runs, fractional factorial
2                                4                            --
3                                8                            4
4                                16                           8
5                                32                           16
6                                64                           16
7                                128                          16
8                                256                          16
9                                512                          32
10                               1024                         32

Main effect of a factor


- The main effect of a factor is defined as the change in the response due to varying one factor from its low level to its high level, while keeping the other factors at their center level
- Example: CakeMix

[Figure: Main effect plot of flour with regard to taste (Main Effect for Flour, resp. Taste; investigation Cakemix (MLR)): y1 = taste (4.40 to 5.00) versus x1 = flour, from low level -1 (200 g) to high level +1 (400 g). N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

Computation of main effects in the 2^2 case (ByHand)

- The main effect of a factor might be understood as the average difference in response values when moving from low to high level

Exp. no   x1 (orig.)   x2 (orig.)   y3 (%)
1         1.0          25           80.4
2         1.5          25           72.4
3         1.0          100          94.4
4         1.5          100          90.6
5         1.25         62.5         84.5
6         1.25         62.5         85.2
7         1.25         62.5         83.8

- $\Delta_{1,3} = 94.4 - 80.4 = 14$
- $\Delta_{2,4} = 90.6 - 72.4 = 18.2$
- Main effect of temperature: $(\Delta_{1,3} + \Delta_{2,4})/2 = (14 + 18.2)/2 = 16.1$

[Figure: Y3 (desired product) drawn over X1 (formic acid/enamine, 1.0 to 1.5) and X2 (temperature, 25 to 100), with corner experiments 1 to 4 and center-points 5, 6, 7.]

Computation of main effects in the 2^2 case (ByHand)

- $\Delta_{1,2} = 72.4 - 80.4 = -8$
- $\Delta_{3,4} = 90.6 - 94.4 = -3.8$
- Main effect of formic acid/enamine: $(\Delta_{1,2} + \Delta_{3,4})/2 = (-8 + (-3.8))/2 = -5.9$
- $\Delta_{1,3} = 94.4 - 80.4 = 14$
- $\Delta_{2,4} = 90.6 - 72.4 = 18.2$
- Main effect of temperature: $(\Delta_{1,3} + \Delta_{2,4})/2 = (14 + 18.2)/2 = 16.1$

[Figure: Main Effect for x1, resp. y3, investigation Byhand (MLR): y3 (75 to 95) versus x1 (1.0 to 1.5). N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728, Conf. lev.=0.95.]

[Figure: Main Effect for x2, resp. y3, investigation Byhand (MLR): y3 (75 to 95) versus x2 (30 to 100). N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728, Conf. lev.=0.95.]

A quicker by-hand method for computing effects (ByHand)


          Experimental matrix    Computational matrix             Response
Exp. no   x1      x2             mean   x1    x2    x1*x2         y3
1         1       25             +      -     -     +             80.4
2         1.5     25             +      +     -     -             72.4
3         1       100            +      -     +     -             94.4
4         1.5     100            +      +     +     +             90.6
5         1.25    62.5           +      0     0     0             84.5
6         1.25    62.5           +      0     0     0             85.2
7         1.25    62.5           +      0     0     0             83.8

Calculations refer to the computational matrix:
- 1st column gives the mean: (+80.4+72.4+94.4+90.6+84.5+85.2+83.8)/7 = 84.5
- 2nd column gives the molar ratio, x1, main effect: (-80.4+72.4-94.4+90.6)/2 = -5.9
- 3rd column gives the temperature, x2, main effect: (-80.4-72.4+94.4+90.6)/2 = 16.1
- 4th column gives the x1*x2 two-factor interaction: (+80.4-72.4-94.4+90.6)/2 = 2.1
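
The same arithmetic can be written as a short script, which is convenient for checking by-hand results. This is a sketch only; the variable and function names are invented for the example, and the data are the ByHand values from the table above.

```python
# Responses y3 of the four corner experiments (runs 1-4) and the center-points.
corner_y = [80.4, 72.4, 94.4, 90.6]
center_y = [84.5, 85.2, 83.8]

# Sign columns of the computational matrix for the corner runs.
x1 = [-1, +1, -1, +1]                       # molar ratio
x2 = [-1, -1, +1, +1]                       # temperature
x1x2 = [a * b for a, b in zip(x1, x2)]      # two-factor interaction

def effect(signs, response):
    """Signed sum of the responses, divided by half the number of corner runs."""
    return sum(s * y for s, y in zip(signs, response)) / (len(response) / 2)

all_y = corner_y + center_y
mean = sum(all_y) / len(all_y)

print(f"mean           = {mean:.1f}")
print(f"x1 main effect = {effect(x1, corner_y):.1f}")
print(f"x2 main effect = {effect(x2, corner_y):.1f}")
print(f"x1*x2 effect   = {effect(x1x2, corner_y):.1f}")
```

The center-points enter only the mean, because their coded values are zero in every effect column.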

Plotting of main and interaction effects (ByHand)


- The two main effects make the surface slope and the two-factor interaction causes its twist
- Interaction plots may be used to specifically explore the nature of interactions

[Figure: two Interaction Plots for x1*x2, resp. y3, investigation Byhand (MLR): y3 (75 to 95) versus x1 (1.00 to 1.50) with lines for x2 (low) and x2 (high), and y3 (75 to 95) versus x2 (30 to 100) with lines for x1 (low) and x1 (high). N=7, DF=3, R2=0.997, Q2=0.995, R2 Adj.=0.993, RSD=0.5728.]

The interaction plot shows the strength of an interaction


[Figure: three interaction plots illustrating no interaction, mild interaction and strong interaction:
- Interaction Plot for X2*X5, resp. Tornado (investigation: testing zero two factor interaction), Tornado versus X5 with lines for X2 (low) and X2 (high). N=33, DF=22, R2=0.989, Q2=0.974, R2 Adj.=0.984, RSD=0.3925.
- Interaction Plot for Po*Sp, resp. Width (investigation: LaserWelding_FO), Width versus Power with lines for Sp (low) and Sp (high). N=22, DF=15, R2=0.972, Q2=0.940, R2 Adj.=0.961, RSD=0.0594.
- Interaction Plot for Sh*Egg, resp. Taste (investigation: Cakemix), Taste versus Shortening with lines for Egg (low) and Egg (high). N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974.]

Computation of effects using least squares fit


- The by-hand method is used because it gives an understanding of the main and interaction effect concepts
- In reality, DOE data are analyzed by calculating a regression model using least squares fit, which has the following advantages:
  - (i) the robustness to slight fluctuations in the factor settings
  - (ii) the ability to handle a failing corner where experiments could not be made
  - (iii) the estimation of the experimental noise
  - (iv) the availability of a number of useful model diagnostic tools
- An important consequence of least squares analysis is that the outcome is not main and interaction effect estimates, but a regression model consisting of coefficients reflecting the influence of the factors (see below)

Introduction to least squares analysis


- Example of a linear relationship between a factor X1 and a response Y1
- The deviation between the model and measured data is known as a residual
- Least squares analysis seeks to minimize the sum of the squares of such residuals
- Goodness of fit: $R^2 = 1 - SS_{res}/SS_{tot.corr.}$
  - 1 denotes perfect model
  - 0 corresponds to no model at all
  - 0.75 indicates a rough, but stable and useful model

[Figure: Response Y1 versus Factor X1 (2 to 4) with the fitted line Y1 = -1.54 + 1.61 X1 + e; R2 = 0.75.]

A coefficient has a value half of that of the effect


Coefficient
- Indicates response change when factor changes from 0 to +1 (in coded factor unit)
- Coefficients are sorted in factor order

[Figure: Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR): terms Fl, Sh, Egg, Fl*Sh, Fl*Egg and Sh*Egg. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

Effect
- Indicates response change when factor changes from -1 to +1
- Effects are sorted according to abs(size)

[Figure: Effects for Taste, investigation Cakemix (MLR): terms Fl, Sh, Egg, Fl*Sh, Fl*Egg and Sh*Egg. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

Example: CakeMix
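
The factor of two between effects and coefficients can be checked numerically on the CakeMix data. The sketch below is a minimal illustration, not MODDE's implementation. It relies on the design being orthogonal in coded units, in which case each least-squares coefficient reduces to sum(x*y)/sum(x*x); the effects are computed from the eight corner runs only. All names are made up for the example.

```python
# Coded CakeMix design (Flour, Shortening, Egg powder) with the measured taste.
runs = [
    (-1, -1, -1, 3.52), (+1, -1, -1, 3.66), (-1, +1, -1, 4.74), (+1, +1, -1, 5.20),
    (-1, -1, +1, 5.38), (+1, -1, +1, 5.90), (-1, +1, +1, 4.36), (+1, +1, +1, 4.86),
    (0, 0, 0, 4.73), (0, 0, 0, 4.61), (0, 0, 0, 4.68),
]

# Model terms expressed as functions of the coded factor settings.
terms = {
    "Fl": lambda fl, sh, egg: fl,
    "Sh": lambda fl, sh, egg: sh,
    "Egg": lambda fl, sh, egg: egg,
    "Fl*Sh": lambda fl, sh, egg: fl * sh,
    "Fl*Egg": lambda fl, sh, egg: fl * egg,
    "Sh*Egg": lambda fl, sh, egg: sh * egg,
}

def coefficient(term):
    """Least-squares coefficient of one term (valid because the model
    columns of this design are mutually orthogonal)."""
    sxy = sum(term(fl, sh, egg) * y for fl, sh, egg, y in runs)
    sxx = sum(term(fl, sh, egg) ** 2 for fl, sh, egg, y in runs)
    return sxy / sxx

def effect(term):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [y for fl, sh, egg, y in runs if term(fl, sh, egg) == +1]
    low = [y for fl, sh, egg, y in runs if term(fl, sh, egg) == -1]
    return sum(high) / len(high) - sum(low) / len(low)

constant = sum(y for *_, y in runs) / len(runs)
coefficients = {name: coefficient(t) for name, t in terms.items()}
effects = {name: effect(t) for name, t in terms.items()}

print(f"Constant: {constant:.3f}")
for name in terms:
    print(f"{name:7s} coefficient {coefficients[name]:+.4f}   effect {effects[name]:+.4f}")
```

For a non-orthogonal design the shortcut in `coefficient` no longer applies, and a full least squares fit over all terms at once would be needed.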

Ways of expressing regression coefficients


- Scaled & Centered: Constant (4.695) relates to estimated taste at the design center-point
- Unscaled: Constant relates to taste at natural zero, i.e., zero grams of flour, shortening, and eggpowder (a meaningless cake mix recipe!)

What we have learnt


- Full factorial designs form the basis for classical experimental designs
- Full factorial designs are frequently used for exploring 2-4 factors
- They are useful for estimating factor main effects and two-factor interactions
- The main effect indicates the change in the response when the factor is varied from -1 to +1 (and fixing the others at 0)
- The regression coefficient of a linear model term reflects the change in the response when the factor is raised from 0 to +1
- For model interpretation scaled & centered coefficients should be used
- Unscaled coefficients are used when exercising in EXCEL or with a pocket calculator



Chapter 4 Analysis of full factorial designs

Contents
- Introduction
- Minimum level of data analysis
  - Examples: CakeMix & ByHand
- Recommended level of data analysis
  - Examples: CakeMix & ByHand
- Advanced level of data analysis
  - Not illustrated
- Overview of data analytical steps in MODDE
- Summary

Introduction
Analysis of DOE-data consists of three primary stages:
- evaluation of raw data
  - get a general appraisal of regularities and peculiarities in the data
  - understand and/or remove anomalies
- regression analysis and model interpretation
  - derive the best possible regression model
  - interpret model
- use of regression model
  - decide what to do next
  - new investigation or verifying experiments?

Minimum level of data analysis


- Evaluation of raw data
  - replicate plot
- Regression analysis and model interpretation
  - R2/Q2/Model Validity/Reproducibility
  - coefficient plot
- Use of regression model
  - response contour plot

Evaluation of raw data - Replicate plot (CakeMix)


The replicate plot shows the variation among the replicates in relation to the variation across the entire design (reproducibility)
[Figure: two replicate plots. Left, labelled "Good": investigation cakemix, Taste (3.50-6.00) against Replicate Index. Right, labelled "Bad": investigation Test, amylase response (500-3000) against Replicate Index.]

Regression analysis - Summary of fit plot

R2 measures fit (explained variation)
Q2 measures predictive power (predicted variation)
Model validity indicates whether we have an appropriate model
Reproducibility assesses replicate variation
[Figure: Summary of Fit bar chart for CakeMix (MLR), response Taste, with bars for R2, Q2, Model Validity, and Reproducibility on a 0.00-1.00 scale. N=11, DF=4, Cond. no.=1.1726, Y-miss=0.]

Regression analysis - R2
Goodness of fit: $R^2 = 1 - SS_{res}/SS_{tot.corr.}$
measures how well we can reproduce the current runs
varies between 0 and 1
1 = perfect model (all points on the line)
easy to get arbitrarily close to 1
provides the basis for raw and standardized residuals in the N-plot
[Figure: observed vs. predicted Taste, investigation cakemix (MLR). N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768.]

Regression analysis - Q2
Goodness of prediction: $Q^2 = 1 - SS_{PRESS}/SS_{tot.corr.}$
uncovers how well we can predict new experiments
varies between $-\infty$ and 1
better indicator of model usefulness than R2
Q2 > 0.5: good; Q2 > 0.9: excellent
provides the basis for deleted studentized residuals in the N-plot
R2 must not exceed Q2 by more than 0.2-0.3
[Figure: observed vs. predicted Taste, investigation cakemix (MLR), with one experiment marked X. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768.]
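The two statistics can be sketched in a few lines. This is a minimal illustration, not MODDE's implementation: it assumes an ordinary least-squares fit and assumes that PRESS is obtained by leave-one-out prediction, computed here through the leverages of the hat matrix. The function name `r2_q2` and the response values are hypothetical.

```python
import numpy as np

def r2_q2(X, y):
    """Return (R2, Q2) for a least-squares fit of y on the model matrix X.

    Assumption: PRESS is the sum of squared leave-one-out prediction errors,
    obtained from the ordinary residuals e_i as e_i / (1 - h_ii).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    leverage = np.diag(X @ np.linalg.pinv(X))   # h_ii of the hat matrix
    press_resid = resid / (1.0 - leverage)
    ss_tot = np.sum((y - y.mean()) ** 2)        # total corrected sum of squares
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    q2 = 1.0 - np.sum(press_resid ** 2) / ss_tot
    return r2, q2

# 2^2 factorial with three center-points, linear model: constant, x1, x2
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1],
              [1, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
y = np.array([10.0, 14.0, 12.0, 17.0, 13.0, 13.5, 12.8])  # hypothetical data
r2, q2 = r2_q2(X, y)
```

Because each leave-one-out residual is at least as large in magnitude as the ordinary residual, Q2 computed this way can never exceed R2.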

Regression analysis - Model Validity

$\text{Model Validity} = 1 + 0.57647 \cdot \log_{10}(p)$, where p is the p-value of the lack-of-fit test in ANOVA

Model Validity > 0.25 indicates a good model
Model Validity < 0.25 indicates significant lack of fit (i.e., model imperfection)
Only available when replicated experiments have been performed
[Figure: observed vs. predicted y2, investigation byhand (MLR), annotated "Significant lack-of-fit". N=7, DF=3, R2=0.698, Q2=-10.499, R2 Adj.=0.396, RSD=5.0485.]
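A short sketch of the Model Validity formula shows where the 0.25 threshold comes from: a lack-of-fit p-value of 0.05 maps to a Model Validity of about 0.25. The function name `model_validity` is hypothetical.

```python
import math

def model_validity(p_lack_of_fit):
    """Model Validity as defined on the slide: 1 + 0.57647 * log10(p)."""
    return 1.0 + 0.57647 * math.log10(p_lack_of_fit)

at_threshold = model_validity(0.05)   # close to 0.25
no_lack_of_fit = model_validity(1.0)  # exactly 1.0
strong_lack_of_fit = model_validity(0.001)  # negative
```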

Regression analysis - Reproducibility


$\text{Reproducibility} = 1 - MS_{\text{pure error}} / MS_{\text{total corrected}}$
If reproducibility is below 0.5, you have a large pure error and poor control of the experimental procedure
[Figure: two replicate plots. Left, labelled "Good": investigation cakemix, Taste against Replicate Index. Right, labelled "Bad": investigation Test, amylase response against Replicate Index.]
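The reproducibility formula can be sketched as follows. The mean squares are computed under stated assumptions: the pure-error mean square pools the within-group variation of replicated runs over its degrees of freedom, and the total corrected mean square is the total corrected sum of squares divided by N-1. The function name and the data are hypothetical.

```python
def reproducibility(replicate_groups):
    """1 - MS(pure error) / MS(total corrected).

    replicate_groups: list of lists; each inner list holds the responses
    measured at one factor setting (length 1 if the run was not replicated).
    """
    values = [y for group in replicate_groups for y in group]
    n = len(values)
    grand_mean = sum(values) / n
    ms_total = sum((y - grand_mean) ** 2 for y in values) / (n - 1)

    ss_pure, df_pure = 0.0, 0
    for group in replicate_groups:
        group_mean = sum(group) / len(group)
        ss_pure += sum((y - group_mean) ** 2 for y in group)
        df_pure += len(group) - 1
    if df_pure == 0:
        raise ValueError("no replicated runs: pure error cannot be estimated")
    return 1.0 - (ss_pure / df_pure) / ms_total

# Four single corner runs and one triplicated center-point (hypothetical data)
tight = reproducibility([[3.5], [5.9], [4.0], [5.0], [4.7, 4.8, 4.6]])
loose = reproducibility([[3.5], [5.9], [4.0], [5.0], [3.6, 5.8, 4.6]])
```

With tightly grouped replicates the value approaches 1; when the replicates spread as much as the design points themselves, it drops towards 0 or below.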

Model interpretation - Coefficient plot (Cake Mix)


Coefficient plot shows the importance of model terms; also useful for model refinement
Example: CakeMix
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR), for two models.
 Initial model (terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg): N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.
 Refined model (terms Fl, Sh, Egg, Sh*Egg): N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]
Q2 increases from 0.87 to 0.94, so the model pruning is justified

Use of model - Response contour plot (CakeMix)


Useful for understanding the impact of large interactions (Sh*Egg in CakeMix)
Typical questions:
  Where is the interesting area?
  Where do we start a new investigation?
  Where is it appropriate to make verifying experiments?
Important: the underlying model must be good (high Q2)
[Figure: response contour plot for CakeMix at Flour = 400.]

Evaluation of raw data - Replicate plot (ByHand)


Example: ByHand (all three responses)
y1 (side product); y2 (unreacted starting material); y3 (desired product)

Small replicate errors


Regression analysis - Summary of fit plot (ByHand)


R2/Q2 points to a poor model for y2
The replicate plot suggests a nonlinear relationship between the factors and y2
[Figure: left, Summary of Fit (R2, Q2, Model Validity, Reproducibility) for y1, y2, and y3, investigation Byhand (MLR), N=7, DF=3, Cond. no.=1.3229, Y-miss=0; right, Plot of Replications for y2 with Experiment Number labels.]

Model interpretation - Coefficient plot (ByHand)

Poor model for y2


Use of model - Triplet response contour plot (ByHand)


Provides a convenient overview of the three models (but remember the weakness of the model for y2)
Goal: low y1, low y2, high y3

Recommended level of data analysis


Evaluation of raw data
  replicate plot + other options of the Worksheet command in MODDE
Regression analysis and model interpretation
  R2/Q2/Model Validity/Reproducibility + N-plot of residuals
  ANOVA (see Chapter 5)
  coefficient plot
Use of regression model
  response contour plot + response surface plot and prediction spreadsheet

Evaluation of raw data


MODDE command                      Shows                                     Level
Analysis/Evaluate                  Condition number                          Recommended (illustrated)
Worksheet/Scatter plot             Plots of raw data                         Recommended
Worksheet/Histogram                Distribution of response                  Recommended (illustrated)
Worksheet/Descriptive Statistics   Distributions of several responses        (Illustrated)
Worksheet/Correlation              Plot or table of variable correlations    Recommended if high CondNo
Worksheet/Replicate plot           Plot of signal-to-noise relationship      Minimum

Evaluation of raw data - Condition number


Measures the sphericity of a design
Formally, the condition number is the ratio of the largest to the smallest singular value of the X-matrix
Informally, the condition number may be regarded as the ratio of the longest to the shortest design diagonal

Condition number       Good design   Questionable design   Bad design
Scr. & Rob. Testing    <3            3-6                   >6
Opt.                   <8            8-12                  >12

All factorial designs without center-points have condition number 1
Compute the condition number before and after altering the design
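The formal definition can be checked numerically. The sketch below assumes that the X-matrix consists of a constant column plus the coded (-1/0/+1) model terms; the helper names are hypothetical. Under that assumption, a 2^3 factorial with three center-points and an interaction model reproduces the value 1.1726 reported for CakeMix.

```python
from itertools import product
import numpy as np

def model_matrix(design, interactions=True):
    """Constant column, main effects and (optionally) two-factor interactions."""
    design = np.asarray(design, dtype=float)
    k = design.shape[1]
    columns = [np.ones(len(design))]
    columns += [design[:, j] for j in range(k)]
    if interactions:
        columns += [design[:, i] * design[:, j]
                    for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(columns)

def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return singular_values.max() / singular_values.min()

corners = list(product((-1, 1), repeat=3))        # 2^3 full factorial
with_centers = corners + [(0, 0, 0)] * 3          # plus three center-points

cond_corners = condition_number(model_matrix(corners))        # 1.0
cond_centers = condition_number(model_matrix(with_centers))   # sqrt(11/8)
```

The center-points only lengthen the constant column (11 runs against 8 non-zero entries in every other column), which is why the condition number rises slightly above 1.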

Evaluation of raw data - Histogram & Descriptive Statistics


Tools used to evaluate the distribution of a response

Distribution          Action
Near normality        No transformation
Positive skewness     Use Log
Negative skewness     Use NegLog

[Figure: histogram (Count against Bins) and descriptive statistics plot for three responses.
 Near normality: investigation cakemix, Taste. Min: 3.52, Max: 5.9, Median: 4.73, Mean: 4.69455.
 Positive skewness: investigation itdoe_scr01c2, Skewness. Min: 12.12, Max: 65.4, Median: 22.15, Mean: 23.7118.
 Negative skewness: investigation microtox, V11. Min: 9, Max: 77, Median: 62.125, Mean: 56.1667.]

Regression analysis - Normal probability plot of residuals


Good tool for finding outliers (deviating experiments)
Example: NOx response of General Example 2
The vertical axis gives the normal probability of the distribution of residuals
The horizontal axis corresponds to the numerical values of the (standardized) residuals
Note: the plot is only useful with more than 12-15 experiments and DF > 3
[Figure: normal probability plot (N-Probability against Deleted Studentized Residuals), investigation TruckEngine (MLR), NOx with Experiment Number labels. N=17, DF=9, R2=0.997, Q2=0.987, R2 Adj.=0.995, RSD=0.4624.]

Use of model - Making predictions


Example: CakeMix
Upper left-hand corner is interesting
Make predictions there
[Figure: response contour plot at Flour = 400 g.]

Advanced level of data analysis (in Additional Topics)


Use partial least squares, PLS, as the fit method for complicated applications
PLS is appropriate when
  (a) there are several correlated responses in the data set
  (b) the experimental design has a high condition number, above 10
  (c) there are small amounts of missing data in the response matrix
All diagnostic tools are retained (R2/Q2, N-plot, etc.). In addition, PLS provides other useful diagnostic tools

Overview of data analysis in DOE - CakeMix application


Three factors varied: Flour (200-400g), Shortening (50-100g), and Eggpowder (50-100g)

Responses: Taste of resulting cake, and cost of ingredients


Overview of data analysis in DOE - CakeMix application

Compromise: High Taste & Low Cost


Overview - Evaluate raw data


[Figure: Histogram of taste (Count against Bins, 3.0-6.6), investigation Cakemix, and Plot of Replications for Taste with Experiment Number labels (Taste 3.50-6.00 against Replicate Index), investigation Cakemix_cost.]

Overview - Regression analysis and model interpretation


Compute model and interpret results
[Figure: Summary of Fit and Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR), terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]

Refine model and interpret results
[Figure: Summary of Fit and Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR), terms Fl, Sh, Egg, Sh*Egg. N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]

Overview - Regression analysis and model interpretation


Do further diagnostic testing of the refined model
The N-plot is less useful because there are few experiments
Point 1 is influential but not an alarming outlier (Q2 = 0.937)
[Figure: normal probability plot (N-Probability against Deleted Studentized Residuals), investigation Cakemix_cost (MLR), Taste with Experiment Number labels. N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974.]

Overview - Use of model


Use the evaluated, refined model to make decisions
Where should the verifying experiments be done?
[Figure: response contour plots at Flour = 400 with three marked regions: Max Taste, Min Cost, and Compromise (High Taste/Low Cost).]

What have we learnt


Data analysis of DOE data comprises three stages
evaluation of raw data
  done to understand and clean the data, and to speed up regression modelling
regression analysis and model interpretation
  done to derive the most predictively relevant model with a meaningful mechanistic interpretation
use of model
  done to find out the impact of the model: What does it mean? Where should new experiments be positioned?


Chapter 5 Analysis of full factorial designs. II. Causes of bad models

Contents
Review of data analytical steps
  evaluation of raw data
  regression analysis and model interpretation
  use of model
Causes of poor model
  Skew response distribution
  Curvature
  Bad replicates
  Deviating experiments
  Missing factors

Review of data analysis in DOE - CakeMix application


Three factors varied: Flour (200-400g), Shortening (50-100g), and Eggpowder (50-100g)

Responses: Taste of resulting cake, and cost of ingredients


Review of data analysis in DOE - CakeMix application

Compromise: High Taste & Low Cost


Review - Evaluate raw data


[Figure: Histogram of taste (Count against Bins, 3.0-6.6), investigation Cakemix, and Plot of Replications for Taste with Experiment Number labels (Taste 3.50-6.00 against Replicate Index), investigation Cakemix_cost.]

Review Regression analysis and model interpretation


Compute model and interpret results
[Figure: Summary of Fit and Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR), terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]

Refine model and interpret results
[Figure: Summary of Fit and Scaled & Centered Coefficients for Taste, investigation Cakemix (MLR), terms Fl, Sh, Egg, Sh*Egg. N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]

Regression analysis - Analysis of variance (ANOVA)


ANOVA estimates various types of variability in the response data and then compares these estimates with each other through F-tests
[Table: ANOVA table of Taste (CakeMix)]

Regression analysis - Analysis of variance (ANOVA)


The upper F-test assesses the significance of the regression model, and is satisfied when p < 0.05
The lower F-test (lack of fit, LoF) compares the model error with the replicate error, and is satisfied when p > 0.05
The LoF p-value is used in the calculation of Model Validity
[Table: ANOVA table of Taste (CakeMix)]
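The two F-ratios can be sketched as below. This is an illustration under assumptions, not the MODDE ANOVA table: replicates are taken to be runs with identical rows in the model matrix, the residual sum of squares is split into pure error and lack of fit, and only the F-ratios are returned (turning them into p-values would need the F distribution, which is left out here). The function name and data are hypothetical.

```python
import numpy as np

def anova_f_ratios(X, y):
    """Return (F_regression, F_lack_of_fit) for a least-squares fit.

    X must contain the constant column; it needs at least one replicated run
    and more distinct runs than model terms.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_reg = float(np.sum((fitted - y.mean()) ** 2))

    # Pure error: variation within groups of identical factor settings
    groups = {}
    for row, value in zip(map(tuple, X), y):
        groups.setdefault(row, []).append(value)
    ss_pe = sum(float(np.sum((np.array(g) - np.mean(g)) ** 2))
                for g in groups.values())
    df_pe = n - len(groups)
    df_res = n - p
    df_lof = df_res - df_pe

    f_regression = (ss_reg / (p - 1)) / (ss_res / df_res)
    f_lack_of_fit = ((ss_res - ss_pe) / df_lof) / (ss_pe / df_pe)
    return f_regression, f_lack_of_fit

# 2^2 factorial + 3 center-points; the low center values mimic curvature
X = np.array([[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1],
              [1, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
y = np.array([10.0, 14.0, 12.0, 17.0, 5.0, 5.2, 4.9])  # hypothetical data
f_reg, f_lof = anova_f_ratios(X, y)
```

In this made-up data set the linear model misses the low center-point responses, so the lack-of-fit ratio is very large while the regression ratio is small.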

Review - Regression analysis and model interpretation


Do further diagnostic testing of the refined model: ANOVA OK
[Figure: normal probability plot (N-Probability against Deleted Studentized Residuals), investigation Cakemix_cost (MLR), Taste with Experiment Number labels, annotated "Not a strong outlier". N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974.]

Review - Use of model


Use the evaluated, refined model to make decisions
Where should the verifying experiments be done?
[Figure: response contour plots at Flour = 400 with three marked regions: Max Taste, Min Cost, and Compromise (High Taste/Low Cost).]

Causes of poor model


Skew response distribution
  Benefits of response transformation
Curvature
Bad replicates
Deviating experiments
Missing factors

Cause of poor model. 1. - Skew response distribution


Common cause of poor modelling results
Detection tools:
  Histogram
  Box-Whisker plot
  Replicate plot (next slide)
General Example 1; response S/B
[Figure: histograms and descriptive statistics plots of the untransformed response S/B and the transformed response S/B~, investigation Reporter Gene Assay Screening.
 S/B: Min: -0.2, Max: 117, Median: 1.7, Mean: 11.8053.
 S/B~: Min: -2, Max: 2.06896, Median: 0.281033, Mean: 0.219957.]

Cause of poor model. 1. - Skew response distribution


[Figure: Plot of Replications for S/B and for S/B~ with Experiment Number labels, and the corresponding Summary of Fit plots (R2, Q2, Model Validity, Reproducibility), investigation Reporter Gene Assay Screening (MLR). N=19, DF=11, Cond. no.=1.0897, Y-miss=0.]
Q2 increases as a result of the log-transformation (from -0.2 to 0.91)

Benefits of response transformation


A well-chosen transformation may
  (i) simplify the response function by linearizing a non-linear response-factor relationship,
  (ii) stabilize the variance of the residuals, and
  (iii) make the distribution of the residuals more normal, which sometimes implies that outliers are eliminated
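The effect of a log transformation on a positively skewed response can be sketched with a simple skewness calculation. The data are hypothetical lifetimes, and the plain base-10 logarithm is an assumption chosen for illustration; a response that contains zero or negative values would need an offset before taking logs.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical positively skewed response (e.g. lifetimes in hours)
lifetimes = [316, 400, 520, 700, 950, 1300, 2000, 3200, 5000, 8316]
log_lifetimes = [math.log10(v) for v in lifetimes]

skew_raw = skewness(lifetimes)
skew_log = skewness(log_lifetimes)
```

A skewness near zero after transformation corresponds to the more bell-shaped histogram that the slides aim for.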

Example of benefits of a transformation


Production of a long-lasting device for service in an aircraft. Ten factors were varied in 32 experiments (screening). The response was the lifetime of the device in hours.
[Figure: for the untransformed response Time (top row) and the transformed response Time~ (bottom row), investigation Airplane (MLR): Histogram, Summary of Fit (R2, Q2), Observed against Predicted, Deleted Studentized Residuals against Predicted, and N-plot of Deleted Studentized Residuals, with Experiment Number labels. N=32, DF=21, Cond. no.=1.0000, Y-miss=0.
 Time: R2=0.876, Q2=0.712, R2 Adj.=0.817, RSD=844.4341.
 Time~: R2=0.990, Q2=0.978, R2 Adj.=0.986, RSD=0.0399.]
Desired result: R2 and Q2 both high; bell-shaped histogram; straight line in the observed vs. predicted plot; no patterns in the residuals; straight line in the N-plot

Cause of poor model. 2. - Curvature


Curvature is a problem in screening because the linear and interaction models used are unable to fit such a phenomenon
Fortunately, problems related to curvature are easily detected and fixed
Detection tools:
  Replicate plot
  Low Q2 & Model Validity
  LoF (ANOVA)
Example: ByHand
[Figure: Plot of Replications for y2 with Experiment Number labels (y2 0-14 against Replicate Index), and Summary of Fit (R2, Q2, Model Validity, Reproducibility) for y1, y2, and y3, investigation Byhand (MLR). N=7, DF=3, Cond. no.=1.3229, Y-miss=0.]

Curvature: How to handle it


Steps:
  removal of the non-significant two-factor interaction
  addition of Temp^2 (x2^2)
  refitting of the model
  result: higher Q2 and Model Validity, better ANOVA
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for y1, y2, and y3, investigation Byhand (MLR), before and after the model change. Before: N=7, DF=3, Cond. no.=1.3229, Y-miss=0. After: N=7, DF=3, Cond. no.=2.8209, Y-miss=0.]

Curvature: How to handle it


NOTE: the initial 2^2 factorial design must be augmented with axial experiments to permit a more rigorous assessment of the necessary quadratic terms, because x1^2 and x2^2 are confounded
[Figure: Summary of Fit (y1, y2, y3) and Scaled & Centered Coefficients for y2, investigation Byhand (MLR), for two alternative models: terms x1, x2, x1*x1 and terms x1, x2, x2*x2. Both models give N=7, DF=3, Cond. no.=2.8209, Y-miss=0, R2=0.997, Q2=0.964, R2 Adj.=0.995, RSD=0.4628, Conf. lev.=0.95.]
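Why the two quadratic terms cannot be separated is easy to verify numerically. The sketch below assumes the ByHand layout described in the slides, a 2^2 factorial with three center-points in coded units; in such a design the squared columns of x1 and x2 are identical. As a side check, the condition number of the model with constant, x1, x2, and one squared term comes out at the 2.8209 shown in the plots.

```python
import numpy as np

# 2^2 factorial (coded -1/+1) with three center-points
design = np.array([(-1, -1), (1, -1), (-1, 1), (1, 1),
                   (0, 0), (0, 0), (0, 0)], dtype=float)
x1, x2 = design[:, 0], design[:, 1]

x1_squared = x1 ** 2
x2_squared = x2 ** 2
# Both columns are 1 at the corners and 0 at the center: fully confounded
columns_identical = np.array_equal(x1_squared, x2_squared)

# Model matrix: constant, x1, x2 and ONE quadratic term (either one)
X = np.column_stack([np.ones(len(design)), x1, x2, x2_squared])
singular_values = np.linalg.svd(X, compute_uv=False)
cond_no = singular_values.max() / singular_values.min()
```

Axial experiments (points such as (+1, 0) or (0, -1)) would make the two squared columns differ, which is what allows the terms to be estimated separately.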

Cause of poor model. 3. - Bad replicates


A third common cause of a poor screening model is replicated experiments that spread too much
Detection tools:
  Replicate plot
  ANOVA table
  Reproducibility bar (here = 0.53, but not shown)
[Figure: Plot of Replications for amylase with Experiment Number labels (amylase 500-3000 against Replicate Index), investigation amylase.]

Cause of poor model. 4. - Deviating experiments


Deviating experiments, or outliers, may degrade the predictive ability and blur the interpretation of a regression model
Detection tools: mainly the N-plot, but also other residual plots, the replicate plot, Model Validity, and LoF in ANOVA
[Figure: three normal probability plots for Yield with Experiment Number labels, investigation Willge_Opt (MLR), based on Raw Residuals, Standardized Residuals, and Deleted Studentized Residuals. N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006.]

Cause of poor model. 5. - Missing factors


Most difficult outcome
Might be the case when a model with moderate values of R2 (about 0.6) and Q2 (about 0.4) is obtained
A missing factor requires additional thinking
  Variation in temperature, humidity, raw material, equipment failure, ...?
Mapping a new factor requires more experiments; therefore, in reality, we usually only eliminate factors in screening

What have we learnt


Data analysis of DOE data comprises three stages
  evaluation of raw data
  regression analysis and model interpretation
  use of model

Cause of bad model            Primary detection tool
Skew response distribution    Histogram
Curvature                     Model Validity/LoF in ANOVA
Bad replicates                Replicate plot
Deviating experiments         N-plot
Missing factors               Generally poor model performance


Chapter 6 Experimental Objective: Screening Illustration: General Example 1 (Reporter Gene Assay)

Contents
General Example 1
  Background
  Steps in problem formulation
Fractional factorial designs
  Introduction
  Geometry
  Confoundings
  Generators
  Defining Relation
  Resolution
  Summary of properties
General Example 1
  Evaluation of raw data
  Regression analysis and model interpretation
  Use of model
Summary

Background to General Example 1


Reporter gene assays are used in mechanistic studies of gene regulation (toxicology, drug development, etc.)
A reporter gene has an easily measurable phenotype, and its transcription is controlled by a promoter
Reporter gene assays provide important information on gene regulation relating to expression (i.e., number of copies), and on when and where a particular protein is formed
[Figure: assay workflow. Seed cells into plates and culture or treat as desired; place in luminometer and measure light emission.]
Principal investigators: Lena Schultz and Lisbeth Abramo, Active Biotech AB, Lund

PF - Selection of experimental objective


In the reporter gene application, the selected experimental objective was screening
With screening one wants to find out a little about many factors: which factors dominate, and what are their optimal ranges?
Typically, screening designs involve between 4 and 10 factors, but applications with as many as 12-15 screened factors are not uncommon
The reporter gene case contains six factors, which makes the results easy to overview
[Figure: schematic with the labels Treatment and Light.]

PF - Specification of factors
The Ishikawa, or fishbone, system diagram is a very helpful method for getting an overview of all factors
Reduces the risk of missing a critical factor
The four Ms: Methods, Manpower, Machines, Materials
Practical maximum depth: 4-5 levels

PF - Specification of factors
Six factors:
  Number of cells/well (50000-400000)
  PMA (stimulator) (5-100 ng/ml)
  Ionomycin (stimulator) (0.1-2 µg/ml)
  Stimulation time (3-6 hours)
  Lysing volume (30-100 µl)
  Ratio sample/substrate (2-10)

PF - Specification of responses
It is important to select responses that are relevant to the experimental goals
Many responses are not a problem
Here: signal-to-background ratio, computed as (signal - background)/background
GOAL: as high as possible (maximize)

PF - Selection of regression model


Linear model? Interaction model?
With six factors, a linear model requires 16 + 3 experiments and an interaction model 32 + 3 experiments
A linear model was selected:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_6 x_6 + \varepsilon$

PF - Generation of design and creation of worksheet


The 2^(6-2) fractional factorial design is applicable (explained in a moment)
Standard design with 16 corners and 3 center-points
NOTE: Randomized run order!

Introduction to fractional factorial designs


Consider the 2^7 full factorial design in 128 runs
It is possible to estimate 128 model parameters:
  1 constant term; 7 linear terms; 21 two-factor interactions; 35 three-factor interactions; 35 four-factor interactions; 21 five-factor interactions; 7 six-factor interactions; 1 seven-factor interaction
Not all parameters are of appreciable size and meaningful (hierarchy of effects)
Linear terms tend to be larger than two-factor interactions, which, in turn, tend to be larger than three-factor interactions, ...
A 2^k full factorial design has a parameter redundancy, i.e., an excess of parameters that can be estimated but lack relevance
Fractional factorial designs exploit this redundancy by reducing the number of design runs
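The parameter counts for the seven-factor case are binomial coefficients: the number of interactions of a given order among k factors is "k choose order". A minimal check, with hypothetical variable names:

```python
from math import comb

k = 7  # number of factors in the 2^7 full factorial

# order 0 is the constant term, order 1 the linear terms,
# order 2 the two-factor interactions, and so on
terms_by_order = {order: comb(k, order) for order in range(k + 1)}
total_parameters = sum(terms_by_order.values())  # equals 2**k
```

The total always equals the number of runs, 2^k, which is why a full factorial can estimate every interaction up to the highest order.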

Geometry of fractional factorial designs (FFDs)


Fractional factorial designs are used in screening and robustness testing
A fraction of all possible corner experiments is selected
Advantage: reduction of experiments
Disadvantage: confounding of effects
[Figure: three cubes spanned by Flour (200-400), Shortening (50-100), and Eggpowder (50-100) with numbered corner experiments, showing the 2^3 full factorial design and fractions of it.]

Going from the 2^3 full factorial design to the 2^(4-1) FFD

Use the computational matrix of the parent 2^3 design

Run #   constant   x1   x2   x3   x1x2   x1x3   x2x3   x1x2x3
1       +          -    -    -    +      +      +      -
2       +          +    -    -    -      -      +      +
3       +          -    +    -    -      +      -      +
4       +          +    +    -    +      -      -      -
5       +          -    -    +    +      -      -      +
6       +          +    -    +    -      +      -      -
7       +          -    +    +    -      -      +      -
8       +          +    +    +    +      +      +      +

Introduction of the fourth factor: the column of the three-factor interaction is used for x4

Run #   constant   x1   x2   x3   x1x2   x1x3   x2x3   x4 = x1x2x3
1       +          -    -    -    +      +      +      -
2       +          +    -    -    -      -      +      +
3       +          -    +    -    -      +      -      +
4       +          +    +    -    +      -      -      -
5       +          -    -    +    +      -      -      +
6       +          +    -    +    -      +      -      -
7       +          -    +    +    -      -      +      -
8       +          +    +    +    +      +      +      +
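The construction can be sketched with the standard library. The helper names are hypothetical, and the run numbering assumes the standard order of the 2^4 full factorial (x1 varying fastest, run 1 with all factors low). Under that assumption the generator x4 = x1x2x3 picks runs 1, 10, 11, 4, 13, 6, 7, 16 of the full design, and the alternative generator -x4 = x1x2x3 picks the complementary half.

```python
from itertools import product

def full_factorial(k):
    """All 2^k runs as tuples of -1/+1 in standard order (x1 varies fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

def half_fraction_2_4_1(sign=+1):
    """2^(4-1) design: parent 2^3 design plus x4 = sign * x1*x2*x3."""
    return [(x1, x2, x3, sign * x1 * x2 * x3)
            for x1, x2, x3 in full_factorial(3)]

def run_number(row):
    """1-based position of a run in the full factorial in standard order."""
    return 1 + sum(2 ** i for i, level in enumerate(row) if level == 1)

runs_plus = [run_number(r) for r in half_fraction_2_4_1(+1)]
runs_minus = [run_number(r) for r in half_fraction_2_4_1(-1)]
```

Together the two half-fractions cover all 16 runs of the 2^4 design exactly once.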

Confounding of effects
Reduction of experiments means that effects become confounded, that is, to a certain degree mixed up with each other
The 16 possible effects are evenly allocated as two effects per column
Main effects are confounded with three-factor interactions
Comparatively simple confounding situation

Column of the 2^(4-1) design   Confounded with
constant                       x1x2x3x4
x1                             x2x3x4
x2                             x1x3x4
x3                             x1x2x4
x4 (= x1x2x3)                  x1x2x3
x1x2                           x3x4
x1x3                           x2x4
x1x4                           x2x3

Confoundings: Use of Correlation Matrix


Full factorial design (2^3): no confounding
Fractional factorial design (2^(4-1)): confounding

A graphical interpretation of confoundings


Only the sum of confounded terms is estimated
Main effects usually dominate over three-factor interactions
More experiments are needed to better resolve confounded terms
[Figure: coefficient plot with bars labelled by confounded pairs: x1 / x2x3x4, x2 / x1x3x4, x3 / x1x2x4, x4 / x1x2x3, x1x2 / x3x4, x1x3 / x2x4, x1x4 / x2x3.]

Generators - Introduction
The generator dictates which specific fraction will be selected and thereby, indirectly, controls the confounding pattern
[Figure: three cubes spanned by Flour (200-400), Shortening (50-100), and Eggpowder (50-100) with numbered corner experiments, showing the full 2^3 design and its two half-fractions.]
x1 = Flour, x2 = Shortening, x3 = Eggpowder

Generator x3 = x1x2
Run   x1   x2   x3
5     -    -    +
2     +    -    -
3     -    +    -
8     +    +    +

Generator -x3 = x1x2
Run   x1   x2   x3
1     -    -    -
6     +    -    +
7     -    +    +
4     +    +    -

Generators of the 2^(4-1) fractional factorial design

Two versions of the 2^(4-1) design: one given by the generator x4 = x1x2x3, and the other by the alternative generator -x4 = x1x2x3. Run numbers refer to the 2^4 full factorial design (runs 1-16).

Generator x4 = x1x2x3
Run #   x1   x2   x3   x4
1       -    -    -    -
10      +    -    -    +
11      -    +    -    +
4       +    +    -    -
13      -    -    +    +
6       +    -    +    -
7       -    +    +    -
16      +    +    +    +

Generator -x4 = x1x2x3
Run #   x1   x2   x3   x4
9       -    -    -    +
2       +    -    -    -
3       -    +    -    -
12      +    +    -    +
5       -    -    +    -
14      +    -    +    +
15      -    +    +    +
8       +    +    +    -

Multiple generators
Example: construction of the 2^(5-2) fractional factorial design
Generators: x4 = x1x2 and x5 = x1x3
The fourth and fifth factors may be introduced in the design as +x4/+x5, +x4/-x5, -x4/+x5, or -x4/-x5
Four possible quarter-fractions of 8 experiments

Run #   x1   x2   x3   x4 = x1x2   x5 = x1x3   x2x3   x1x2x3
1       -    -    -    +           +           +      -
2       +    -    -    -           -           +      +
3       -    +    -    -           +           -      +
4       +    +    -    +           -           -      -
5       -    -    +    +           -           -      +
6       +    -    +    -           +           -      -
7       -    +    +    -           -           +      -
8       +    +    +    +           +           +      +

Defining relation - Introduction


The defining relation of a design is a formula derived from all its generators that allows the confounding pattern to be calculated
This relation ties together all generators
Example: 2^(4-1) design
Rules:
  (1) X1*I = X1
  (2) X1*X1 = X1^2 = I (a column of + signs only)
Step 1: Identify the generator(s): X4 = X1X2X3
Step 2: Multiply both sides by X4: X4^2 = X1X2X3X4
Step 3: Apply rule 2: I = X1X2X3X4
This is the defining relation for the 2^(4-1) design

Use of defining relation


Confounding pattern can be understood through defining relation
Example: 2^(4-1) design
What is x1 confounded with?
Step 1: I = x1x2x3x4
Step 2: x1 * I = x1^2 x2x3x4
Step 3: x1 = I * x2x3x4
Step 4: x1 = x2x3x4

Defining relation of the 2^(5-2) fractional factorial design

I = x1x2x4 = x1x3x5 = x2x3x4x5
What is x4 confounded with?
x4 = x1x2 = x1x3x4x5 = x2x3x5
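The two rules (multiplication by I changes nothing, and a squared factor equals I) mean that multiplying two effect "words" amounts to taking the symmetric difference of their factor sets. The sketch below uses that observation to derive the defining relation, the aliases of an effect, and the length of the shortest word. It is only an illustration; the helper names are invented for this example.

```python
from itertools import combinations

def multiply(word_a, word_b):
    """Multiply two effect words; factors appearing twice cancel (xi * xi = I)."""
    return frozenset(word_a) ^ frozenset(word_b)

def defining_relation(generator_words):
    """All words equal to I: every product of one or more generator words."""
    words = set()
    for size in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, size):
            word = frozenset()
            for generator in combo:
                word = multiply(word, generator)
            words.add(word)
    return words

def aliases(effect, relation):
    """Effects confounded with `effect`: the effect times each word of the relation."""
    return {multiply(effect, word) for word in relation}

def label(word):
    return "".join(f"x{i}" for i in sorted(word)) or "I"

# 2^(5-2) design: x4 = x1x2 gives I = x1x2x4, and x5 = x1x3 gives I = x1x3x5
relation = defining_relation([{1, 2, 4}, {1, 3, 5}])
print(sorted(label(w) for w in relation))
print(sorted(label(w) for w in aliases({4}, relation)))
print(min(len(w) for w in relation))   # length of the shortest word
```

The same functions applied to the single word {1, 2, 3, 4} give the 2^(4-1) case, where x1 is confounded with x2x3x4.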

Confounding pattern of the used 2^(6-2) protocol

There are four ways of selecting 16 experiments out of 64
Actual selection controlled by the generators
Each such quarter-fraction is equivalent from a mathematical point of view

Resolution of fractional factorial designs


Resolution of a design is defined as the length of the shortest word in the defining relation

Resolution III: I = a*b*c
Main effects confounded with two-factor interactions. Resolution III designs are the most reduced but also the most difficult to analyze. Recommended for robustness testing.

Resolution IV: I = a*b*c*d
Main effects unconfounded with two-factor interactions. Two-factor interactions still confounded with each other. Recommended for screening.

Resolution V: I = a*b*c*d*e
Main effects unconfounded with two-factor interactions. Two-factor interactions unconfounded with each other. Resolution V designs are almost as good as full factorial designs.

Summary of properties of fractional factorial designs


The selected generators control the confounding pattern and the selected fraction of experiments. Indirectly, this means that the selected generators also influence the shape of the defining relation and the resolution of the design.
[Diagram linking Generator(s), Selected Fraction, Defining Relation, Confounding Pattern and Resolution]

Overview table of common fractional factorial designs


Runs   Factors   Design      Resolution   Generators
4      3         2^(3-1)     III          +/-X3=X1*X2
8      4         2^(4-1)     IV           +/-X4=X1*X2*X3
8      5         2^(5-2)     III          +/-X4=X1*X2, +/-X5=X1*X3
8      6         2^(6-3)     III          +/-X4=X1*X2, +/-X5=X1*X3, +/-X6=X2*X3
8      7         2^(7-4)     III          +/-X4=X1*X2, +/-X5=X1*X3, +/-X6=X2*X3, +/-X7=X1*X2*X3
16     5         2^(5-1)     V            +/-X5=X1*X2*X3*X4
16     6         2^(6-2)     IV           +/-X5=X1*X2*X3, +/-X6=X2*X3*X4
16     7         2^(7-3)     IV           +/-X5=X1*X2*X3, +/-X6=X2*X3*X4, +/-X7=X1*X3*X4
16     8         2^(8-4)     IV           +/-X5=X2*X3*X4, +/-X6=X1*X3*X4, +/-X7=X1*X2*X3, +/-X8=X1*X2*X4
16     9         2^(9-5)     III          +/-X5=X1*X2*X3, +/-X6=X2*X3*X4, +/-X7=X1*X3*X4, +/-X8=X1*X2*X4, +/-X9=X1*X2*X3*X4
16     10        2^(10-6)    III          +/-X5=X1*X2*X3, +/-X6=X2*X3*X4, +/-X7=X1*X3*X4, +/-X8=X1*X2*X4, +/-X9=X1*X2*X3*X4, +/-X10=X1*X2
32     6         2^(6-1)     VI           +/-X6=X1*X2*X3*X4*X5
32     7         2^(7-2)     IV           +/-X6=X1*X2*X3*X4, +/-X7=X1*X2*X4*X5
32     8         2^(8-3)     IV           +/-X6=X1*X2*X3, +/-X7=X1*X2*X4, +/-X8=X2*X3*X4*X5
32     9         2^(9-4)     IV           +/-X6=X2*X3*X4*X5, +/-X7=X1*X3*X4*X5, +/-X8=X1*X2*X4*X5, +/-X9=X1*X2*X3*X5
32     10        2^(10-5)    IV           +/-X6=X1*X2*X3*X4, +/-X7=X1*X2*X3*X5, +/-X8=X1*X2*X4*X5, +/-X9=X1*X3*X4*X5, +/-X10=X2*X3*X4*X5
64     7         2^(7-1)     VII          +/-X7=X1*X2*X3*X4*X5*X6
64     8         2^(8-2)     V            +/-X7=X1*X2*X3*X4, +/-X8=X1*X2*X5*X6
64     9         2^(9-3)     IV           +/-X7=X1*X2*X3*X4, +/-X8=X1*X3*X5*X6, +/-X9=X3*X4*X5*X6
64     10        2^(10-4)    IV           +/-X7=X2*X3*X4*X6, +/-X8=X1*X3*X4*X6, +/-X9=X1*X2*X4*X5, +/-X10=X1*X2*X3*X5
128    8         2^(8-1)     VIII         +/-X8=X1*X2*X3*X4*X5*X6*X7
128    9         2^(9-2)     VI           +/-X8=X1*X3*X4*X6*X7, +/-X9=X2*X3*X5*X6*X7
128    10        2^(10-3)    V            +/-X8=X1*X2*X3*X7, +/-X9=X2*X3*X4*X5, +/-X10=X1*X3*X4*X6

Other run/factor combinations are marked "No design" or "D-opt" in the table

Summary of fractional factorial designs


Advantage: We can investigate more factors with drastically fewer runs (always add 3-5 center-points)

No of Factors   Full Factorial   Fractional Factorial
2               4                N/A
3               8                4
4               16               8
5               32               16
6               64               16
7               128              16
8               256              16
9-16            >=512            32

Disadvantage: Confounding = Aliasing of effects
Higher resolution gives fewer problems - Resolution IV recommended

Reporter Gene Assay - Evaluation of raw data


Replicate and histogram plots (before and after log-transformation)

[Figure: replicate plots (response vs. Replicate Index, with experiment number labels) and histograms of S/B before transformation and of S/B~ after log-transformation. Investigation: Reporter Gene Assay Screening; MODDE 7]

Reporter Gene Assay - Evaluation of raw data


Condition number & Correlation matrix
Good design
Response depends on Cells, Ion, and StH

Reporter Gene Assay - Regression analysis


The default linear model looks good with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79, MVal = 0.65, Rep = 0.96)
No outliers

[Figure: summary of fit plot (R2, Q2, Model Validity, Reproducibility) and normal probability plot of deleted studentized residuals for S/B~. N=19, DF=12, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472, Cond. no.=1.0897]

Reporter Gene Assay - Model adjustment


Model revision steps:
1) PMA and Ratio were removed
2) Six two-factor interactions were added
3) Only three interactions were kept (Cel*Lys, Ion*StH, and Ion*Lys)
4) The revised model is much better (R2 = 0.96, Q2 = 0.91, MVal = 0.79, Rep = 0.96)

[Figure: summary of fit and scaled & centered coefficient plots for S/B~. Before revision: N=19, DF=12, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472. After revision (terms Cel, Lys, Ion, StH, Cel*Lys, Ion*Lys, Ion*StH): N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467. Conf. lev.=0.95]

Reporter Gene Assay - Model adjustment


Some observations:
The three most important factors are Cells, Ionomycin and Stimulation Time
There are a few two-factor interactions which look interesting as they improve the predictive power of the model
However, these two-factor interactions are confounded with other two-factor interactions

Experimenters decided to carry out more experiments
to possibly improve the modelling of S/B (although we must say the model is already very good)
to resolve confounded two-factor interactions
technique: Fold-over (Chapter 7)

Reporter Gene Assay - Use of model


Plot created with Cells and LysVolume as axes (allows exploration of two-factor interaction), while fixing the other factors at their maximum value (because of positive regression coefficients)
Region of Maximum S/B

Summary
Fractional factorial designs form the most widely used family of screening designs
Many factors can be mapped in few runs
Confounding of effects is a disadvantage, but this can be reasonably tolerated by selecting a Res IV design

Reporter Gene Assay:
Very good model for S/B
Indication of some small interaction terms, which may be important
More experiments, to possibly improve the modelling of S/B, and to resolve confounded two-factor interactions, will be done using Fold-over (Chapter 7)

Design of Experiments (DOE) Pharma Applications


Chapter 7 Post-screening actions (What to do after screening?)

Contents
Principles for inter- and extrapolation
Basic requirement: Sound modelling
Main outcomes
Gradient techniques & software optimizer
Adding new experiments
Reporter Gene Assay
  Creating the fold-over design
  Data Analysis
Summary

Interpolation - basic principle

Interpolation is based on using the derived regression model for predictions inside the experimental space explored
Of interest when experimenting outside the investigated region is either impossible or undesired
Response contour plots, response surface plots
Software optimizer

Extrapolation - basic principle

Extrapolation (predictions outside investigated region) is done when it is possible to change factor settings
Response contour plots, response surface plots (Gradient techniques)
Software optimizer
Extrapolation more uncertain than interpolation
Recommendation: Avoid extrapolating more than 25% outside factor interval
(Example: Temperature range 20 - 60 C; modified range 10 - 70 C)
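The temperature example suggests that "25% outside" is counted as a fraction of the interval width on each side (25% of 40 C = 10 C). A small sketch under that reading follows; the function name `extended_range` is illustrative only.

```python
def extended_range(low, high, fraction=0.25):
    """Widen a factor interval by `fraction` of its width on each side."""
    margin = fraction * (high - low)
    return low - margin, high + margin

# Temperature example from the slide: 20 - 60 C becomes 10 - 70 C
print(extended_range(20, 60))
```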

Basic requirement for inter- and extrapolation: Reliable model

Use richness of diagnostic tools to acquire a reliable and predictive model
Example: Reporter Gene Assay (plots from Chap. 6)

[Figure: summary of fit plot and scaled & centered coefficients (Cel, Lys, Ion, StH, Cel*Lys, Ion*Lys, Ion*StH) for S/B~. N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467, Cond. no.=1.0897, Conf. lev.=0.95]

Main outcomes - One point that fulfills the goals

One of the performed experiments fulfills the experimental goals (IDEAL case)
Make a limited set of new trials to verify the golden run

Main outcomes - Interpolation to find interesting area

Predictions inside region lead forward to an interesting point or small region
Make a limited set of new trials to verify this point or small region

Main outcomes - Extrapolation to find interesting area

Predictions outside region lead forward to an interesting point or small region
Make a limited set of new trials to verify this point or small region

How do we find interesting point or area?


Example: Finding the optimal region is an objective which is often used to bridge the gap between screening and optimization

Two techniques are used for moving the experimental region
graphically oriented gradient technique
automatic optimization procedure based on running multiple simplexes

Gradient techniques
Steepest ascent or descent. Example shows steepest descent
Gradient techniques work best with fairly few responses, and when occurring two-factor interactions are fairly small
Adhere to the line

[Figure: response contour plot for NH3 with Airfuel (1.10 - 1.40) on the horizontal axis, load = 100.00, with a simulated design placed along the line of steepest descent]
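For a purely linear model fitted in coded (scaled and centered) units, the steepest ascent direction is proportional to the vector of regression coefficients, and steepest descent is the opposite direction. The sketch below generates candidate points along that line. The coefficient values, step size and function name are hypothetical and only serve to illustrate the idea; they are not taken from the NH3 example.

```python
def steepest_path(center, half_range, coefficients, n_steps=3, step=0.5, ascent=True):
    """Points along the steepest ascent (or descent) line of a linear model.

    center, half_range: factor midpoints and half-widths in original units
    coefficients: linear coefficients of the model in coded units
    step: step length in coded units
    """
    norm = sum(b * b for b in coefficients) ** 0.5
    sign = 1.0 if ascent else -1.0
    direction = [sign * b / norm for b in coefficients]   # unit vector in coded space
    path = []
    for n in range(1, n_steps + 1):
        # convert the coded position back to original factor units
        point = [c + h * n * step * d for c, h, d in zip(center, half_range, direction)]
        path.append(point)
    return path

# Hypothetical two-factor example, already in coded units (center 0, half-range 1)
path = steepest_path([0.0, 0.0], [1.0, 1.0], coefficients=[3.0, 4.0], ascent=False)
for point in path:
    print([round(v, 2) for v in point])
```

Each point on the path is a candidate for a new experiment; the line is only trustworthy as long as interactions and curvature stay small.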

Software optimizer
The MODDE optimizer will simultaneously start as many as eight simplexes, from different locations in the factor space (Details in Chapter 8)
Example: Reporter Gene Assay
The eight starting points in factor space

Adding new experiments - region not moved

Complementing for
unconfounding
curvature

Adding new experiments - region is moved

Complementing using
follow-up screening design (or robustness testing design)

Adding new experiments - region is moved

Complementing using
new RSM design!

What it looks like in software


Software opportunities (screenshot annotations): Chap 7, Chap 8, Add Topics (D-optimal), Hinted at in Chap 5

The fold-over technique


The fold-over principle gives a complementary design that results in unconfounding in resolution III and resolution IV designs
Example relates to the 2^(5-2) fractional factorial design

Original design:   x4 = x1x2,  x5 = x1x3
Fold-over design:  x4 = -x1x2, x5 = -x1x3
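A minimal sketch of the fold-over idea, assuming the complement is obtained by reversing the sign of every factor in every run (a full fold-over). The helper names are illustrative. The sum of products of two columns is used as a simple check: a non-zero sum means the two effects are confounded, zero means they are orthogonal.

```python
from itertools import product

def base_design():
    """2^(5-2) design with generators x4 = x1x2 and x5 = x1x3."""
    runs = []
    for x3, x2, x1 in product((-1, 1), repeat=3):   # x1 varies fastest
        runs.append((x1, x2, x3, x1 * x2, x1 * x3))
    return runs

def fold_over(runs):
    """Complementary design: reverse the sign of every factor in every run."""
    return [tuple(-level for level in run) for run in runs]

def column_dot(runs, f, g):
    """Sum of products of two derived columns over all runs."""
    return sum(f(r) * g(r) for r in runs)

original = base_design()
folded = fold_over(original)
combined = original + folded

x4 = lambda r: r[3]
x1x2 = lambda r: r[0] * r[1]
print(column_dot(original, x4, x1x2))   # x4 =  x1x2 in the original design
print(column_dot(folded, x4, x1x2))     # x4 = -x1x2 in the fold-over design
print(column_dot(combined, x4, x1x2))   # orthogonal in the combined 16 runs
```

In the combined design the main effect x4 is no longer confounded with x1x2, which is the unconfounding the slide refers to.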

Summary; Post-screening actions


Post-screening actions depend on
the quality of the obtained regression model
whether it is possible and necessary to modify the factor ranges
whether some of the already conducted experiments are close to fulfilling the goals stated in the problem formulation

Experiments outside investigated region impossible or undesired
One of the performed experiments fulfills the experimental goals (IDEAL case)
None of the performed experiments meets the goals and the model must be used for finding the best point
Addition of complementary experiments to the mother design (e.g. Fold-over)

Experiments outside investigated region possible and/or desired
graphically oriented gradient technique
automatic procedure based on running multiple simplexes in parallel

Creating the fold-over reporter gene assay design


Worksheet has been appended with 19 new experiments
The block factor is a precautionary measure that is useful for probing whether significant changes over time have occurred

Reporter Gene Assay - Evaluation of raw data


New design must be evaluated in the same way as parent design
The S/B response is not normally distributed (there are a few extreme values)

[Figure: replicate plot (S/B vs. Replicate Index, with experiment number labels) and histogram of S/B. Investigation: Reporter Gene Assay Screening - Fold over complement; MODDE 7]

Reporter Gene Assay - Evaluation of raw data


Response becomes more nearly normal after log transformation

[Figure: replicate plot (S/B~ vs. Replicate Index, with experiment number labels) and histogram of S/B~. Investigation: Reporter Gene Assay Screening - Fold over complement; MODDE 7]

Reporter Gene Assay - Regression analysis


Summary of fit plot of fitted model
No outliers

[Figure: summary of fit plot (R2, Q2, Model Validity, Reproducibility) and normal probability plot of deleted studentized residuals for S/B~. Investigation: Reporter Gene Assay Screening - Fold over complement (MLR). N=38, DF=30, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027, Cond. no.=1.0897]

Reporter Gene Assay - Model interpretation


Block factor is insignificant (no significant drift over time)
PMA and Ratio not influential
Keep four linear terms

[Figure: scaled & centered coefficients for S/B~ (Cel, Lys, StH, Rat, $Bl, PM, Ion). N=38, DF=30, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027, Conf. lev.=0.95]

Reporter Gene Assay - Model refinement


Four main effects were kept
Two-factor interactions were evaluated but were insignificant
The revised model is only marginally better

Before refinement: R2 = 0.92, Q2 = 0.88, MVal = 0.47, Rep = 0.97 (N=38, DF=30, R2 Adj.=0.901, RSD=0.3027)
After refinement: R2 = 0.91, Q2 = 0.89, MVal = 0.47, Rep = 0.97 (N=38, DF=33, R2 Adj.=0.902, RSD=0.3018)

[Figure: summary of fit and scaled & centered coefficient plots for S/B~, before refinement (Cel, Lys, StH, Rat, $Bl, PM, Ion) and after refinement (Cel, Lys, Ion, StH). Investigation: Reporter Gene Assay Screening - Fold over complement (MLR); Cond. no.=1.0897, Conf. lev.=0.95]

Reporter Gene Assay Further Diagnostic Checking


The revised model contains no outliers. However, some of the largest residuals are encountered for the six center-points (hints at curvature problems)
Curvature is easy to handle with a quadratic model (Chap 8)

[Figure: normal probability plot of deleted studentized residuals, and deleted studentized residuals vs. predicted values, for S/B~ with experiment number labels. N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018]

Reporter Gene Assay - Interpretation of refined model

Mainly a linear dependence of S/B on the factors
Cells, Ionomycin and Stimulation Time (StH) are most important
Lysing Volume will not be varied in RSM (Chapter 8)

[Figure: scaled & centered coefficients for S/B~ (Cel, Lys, Ion, StH). N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018, Conf. lev.=0.95]

MODDE optimizer applied to Reporter Gene Assay data


What does it look like at the point Cells = 400000, Ionom. = 2.0, StimH = 6, LysVol = 30?
Bring optimization results into response contour plots

What we have learnt


DOE applied to technical and chemical problems often involves proceeding in stages
Initially, a screening design in 19 experiments was laid out
Secondly, a fold-over complement was added
With the combined set of experiments it was possible to corroborate that four factors are more influential than the others

Design of Experiments (DOE) Pharma Applications


Chapter 8 Experimental objective: Optimization
Illustration: General Example 2 (Reporter Gene Assay)

Contents
General Example 2
  Background
  Problem formulation
Introduction to RSM
Composite designs; CCC & CCF
  Construction & Geometry
  Comparison of CCC and CCF
General Example 2
  Evaluation of raw data
  Regression analysis and model interpretation
  Use of model
  MODDE optimizer
Summary

Background to General Example 2


Continuation of screening designs (Chapters 6 and 7)
Main results of screening designs
  Three main factors
  Suspected quadratic dependence (structured residuals)
Principal investigators: Lena Schultz and Lisbeth Abramo, Active Biotech AB, Lund

[Figure: assay workflow. Seed cells into plates and culture or treat as desired (Treatment); place in luminometer and measure light emission (Light)]

PF - Selection of experimental objective


In the third phase of the reporter gene application, the selected experimental objective was optimization
In optimization, the important factors, usually between 2 and 5, have been identified, and one wants to extract in-depth information about them
It is of interest to reveal the nature of the relationships between the few factors and the measured responses
For some factors and responses the relationships might be linear, for others non-linear, that is, curved
Such relationships are conveniently investigated by fitting a quadratic regression model

PF - Specification of factors
Factor             Old range                New range
Cells              50000 - 400000 cells     200000 - 400000 cells
Stimulation time   2 - 6 h                  4 - 6 h
Ionomycin          0.1 - 2 g/ml             1 - 2 g/ml

PF - Specification of responses
It is important to select responses that are relevant according to the experimental goals
Having many responses is not a problem
Here: Signal-to-background ratio, computed as: S/B = (signal - background) / background

GOAL: As high as possible (Maximize)

PF - Selection of regression model


A quadratic model was selected
y = β0 + β1x1 + β2x2 + β11x1^2 + β22x2^2 + β12x1x2 + ... + ε

With three factors a quadratic model requires a design with 17 experiments (8 + 6 + 3)
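To see why a three-factor quadratic model is comfortably supported by 17 runs, it helps to count its terms: one constant, three linear, three squared and three two-factor interaction terms. The short sketch below enumerates them; the function name is illustrative.

```python
from itertools import combinations

def quadratic_terms(k):
    """Names of the terms in a full quadratic model in k factors."""
    names = [f"x{i}" for i in range(1, k + 1)]
    squares = [f"{n}^2" for n in names]
    interactions = [a + b for a, b in combinations(names, 2)]
    return ["1"] + names + squares + interactions

terms = quadratic_terms(3)
print(terms)
print(len(terms))        # number of coefficients to estimate
print(17 - len(terms))   # degrees of freedom left with 17 runs
```

With ten coefficients, 17 runs leave seven degrees of freedom for judging the model, and an 18-run design leaves eight.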

PF - Generation of design and creation of worksheet


The experimenters selected a central composite face-centered, CCF, design in 18 runs
Standard design with 8 corner points, 6 axis points, and 4 center-points

Introd. to response surface methodology (RSM) designs


RSM has been the acronym for response surface methodology, reflecting the use of response surface plots for finding an optimal point
In more recent years, however, the re-interpretation response surface modelling has become more prevalent
Good RSM designs must
allow the estimation of the parameters of the model with low uncertainty
give rise to a model with small prediction error
have prediction error independent of direction
permit a judgement of the adequacy of the model
encode as few experiments as possible

The family of central composite designs meets these demands

The CCC design in two factors


The composite designs are natural extensions of the two-level full and fractional factorial designs
CCC = Central Composite Circumscribed

The CCC design consists of three building blocks
(i) regularly arranged corner experiments of a two-level factorial design
(ii) symmetrically arrayed star points located on the factor axes, and
(iii) repeatedly performed center-points

The CCC design in two factors


Example worksheet of an 11 run CCC design in two factors
The first four rows represent the corner experiments, the next four rows the star (axial) points, and the last three rows the replicated center-points

All factors are mapped in five levels with the CCC design
This makes it possible to estimate quadratic terms with great rigor
The corner experiments and the axial experiments are all situated on the circumference of a circle with radius 1.41, and therefore the experimental region is symmetrical

The CCC design in three factors


The CCC design in three factors is constructed in a fashion similar to that of the two factor analogue:
(i) eight corner experiments,
(ii) six axial experiments, and
(iii) three replicated center-points

The CCC design in three factors

[Figure: the three building blocks of the three-factor CCC design: (i) eight corner experiments, (ii) six axial experiments, (iii) three replicated center-points]

The CCF design in three factors


When it is desirable to maintain the low and high factor levels, and still perform an RSM design, the central composite face-centered (CCF) design is a prudent alternative
CCF is the recommended design choice for pilot plant and full scale investigations

[Figure: MODDE illustrations of the CCF and CCC designs]

A comparison of CCC and CCF-designs


Theoretically, the CCF design is slightly inferior to the CCC design:
the CCC design spans a larger volume
five levels of each factor also means that the CCC design is better prepared for capturing strong curvature, or even cubic response behavior
quadratic model terms are less correlated in the CCC than CCF case

[Figure: geometry of the CCC and CCF designs]
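The difference between the two designs comes down to the distance of the star points from the center, often called alpha (a name not used on the slides). The sketch below builds both designs in coded units from the three building blocks. It assumes alpha = sqrt(2), about 1.41, for the two-factor CCC as stated earlier, and alpha = 1 for the CCF; the function name is illustrative.

```python
from itertools import product

def central_composite(k, alpha, n_center=3):
    """Corner, star (axial) and center points of a composite design in coded units."""
    corners = [tuple(float(v) for v in reversed(c)) for c in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha   # star point on the axis of factor i
            axial.append(tuple(point))
    centers = [(0.0,) * k for _ in range(n_center)]
    return corners + axial + centers

ccc = central_composite(2, alpha=2 ** 0.5)           # circumscribed, 11 runs
ccf = central_composite(3, alpha=1.0, n_center=4)    # face-centered, 18 runs

print(len(ccc), sorted({round(p[0], 2) for p in ccc}))   # five levels per factor
print(len(ccf), sorted({p[0] for p in ccf}))             # three levels per factor
```

The CCC maps each factor in five levels while the CCF keeps to the original low, center and high levels.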

Overview of composite designs


It is possible to explore as many as five factors in as few as 29 experiments
When moving up to six factors, there is a huge increase in the number of experiments

Number of factors   Number of experiments
2                   8 + 3
3                   14 + 3
4                   24 + 3
5                   26 + 3
6                   44 + 3
7                   78 + 3
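The run counts in the table can be reproduced with a simple rule, shown below as a sketch. The rule is inferred from the numbers rather than stated on the slide: the corner part is assumed to be a full two-level factorial up to four factors and a half-fraction from five factors upwards, plus two star points per factor and three center-points.

```python
def composite_runs(k, n_center=3):
    """(corner + star runs, center-points) of a composite design in k factors."""
    # Assumption inferred from the table: half-fraction corners for k >= 5
    corner_runs = 2 ** k if k <= 4 else 2 ** (k - 1)
    star_runs = 2 * k
    return corner_runs + star_runs, n_center

for k in range(2, 8):
    base, centers = composite_runs(k)
    print(k, f"{base} + {centers}")
```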

Reporter Gene Assay - Evaluation of raw data


The replicate plot shows that the signal-to-background response variable is numerically much higher than in the screening designs
The response is sufficiently well distributed to allow the choice of no transformation

[Figure: replicate plot, histogram and descriptive statistics plot of S/B. Min: 17.9, Max: 221.4, Median: 107.8, Mean: 120.683. Investigation: Reporter Gene Assay RSM with CCF; MODDE 7]

Reporter Gene Assay - Regression analysis


The initial model is good (R2 = 0.91, Q2 = 0.56, MVal = 0.87 and Rep = 0.79), albeit with an undesirably large gap between R2 and Q2
Some insignificant model terms

[Figure: summary of fit plot and scaled & centered coefficients for S/B (Cel, Ion, StH, Cel*Cel, Ion*Ion, StH*StH, Cel*Ion, Cel*StH, StH*Ion). N=18, DF=8, R2=0.908, Q2=0.558, R2 Adj.=0.805, RSD=25.3554, Cond. no.=4.4596, Conf. lev.=0.95]

Reporter Gene Assay - Model refinement


The two cross-terms Cel*StH and Cel*Ion and the quadratic term StH*StH were omitted
The revised model is much better (R2 = 0.89, Q2 = 0.74, MVal = 0.92, Rep = 0.79).

[Figure: summary of fit plot and scaled & centered coefficients for S/B (Cel, Ion, StH, Cel*Cel, Ion*Ion, StH*Ion). N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934, Cond. no.=4.0089, Conf. lev.=0.95]

Reporter Gene Assay - Model refinement


The revised model contains no outliers (below, left), and the size of the residual is independent of the predicted value (below, right), which is good.

[Figure: normal probability plot of deleted studentized residuals (left) and deleted studentized residuals vs. predicted values (right) for S/B with experiment number labels. N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934]

Reporter Gene Assay - Use of model


The optimum inside the experimental region lies close to the factor combination high Stimulation time (6 h), high Ionomycin (2) and intermediate Cells (ca 320000)
Region of optimum

What to do after RSM - Introduction


Three primary actions are envisioned
(i) one of the design runs fulfils the goals stated in the problem formulation: this corresponds to the ideal case, and only a couple of verification experiments are needed to establish the usefulness of this factor combination
(ii) regression modelling and use of the model for predicting the location of a new promising experimental point (most common case)
(iii) supplementation of an existing RSM design with a few extra experiments to support a specific model, e.g., partly cubic

Reporter Gene Assay:
case (ii) by using the optimization routine in MODDE

MODDE optimizer applied to the Reporter Gene Data


Factor settings for interpolation
Response is to be maximized above a minimum value

MODDE optimizer applied to the Reporter Gene Data

Factor co-ordinates for simplex launching
Rows 1-5: 2^(3-1) FFD in three most important factors + center point
Rows 6-8: Runs from worksheet closest to meeting the experimental goals

MODDE optimizer applied to the Reporter Gene Data

Results of first optimization round
Evaluation, log(D):
> 0     BAD
0       GOOD
< 0     Even better
= -10   IDEAL
Below zero means that we are between Target and Min for the response
Run #8 has lowest log(D)

MODDE optimizer applied to the Reporter Gene Data

Second optimization
New starting points around run 8
Performed in order to reduce the risk of being trapped by a local minimum or maximum

MODDE optimizer applied to the Reporter Gene Data

Evidently, all five points are predicted to meet our experimental goals
In fact, by neglecting some small variation among the decimal digits, we find that the five simplexes have converged to the same point, that is, Cells = 320000, StimH = 6 h, and Ionomycin = 2 g/ml.

Uncertainties in predicted optimal point


The factor co-ordinates were transferred to the prediction list. This list shows that the predicted optimal S/B value is 260 ± 40

The relevance of the above factor combination was tested in a final robustness testing design.

What we have learnt


The composite designs CCC and CCF are natural extensions of the two-level full and fractional factorial designs
A composite design consists of three building blocks,
(i) regularly arranged corner experiments of a two-level factorial design,
(ii) symmetrically arrayed star points located on the factor axes, and
(iii) repeatedly performed center-points.

The CCC and CCF designs differ in how the star points, or axis points, are positioned
Both CCC and CCF support quadratic models

Summary of the Reporter Gene Assay application


Three important factors found in the screening, Cells, Ionomycin and Stimulation Time (StimH), were varied in a CCF encompassing 18 experiments
A very good model, without outliers, resulted
The interpretation of this model revealed the factor combination Cells = 320000, Stimulation Time = 6 and Ionomycin = 2 as optimal inside the investigated experimental domain.

Design of Experiments (DOE) Pharma Applications


Chapter 9 Experimental objective: Robustness testing
Illustration: General Example 3 (HPLC)

Contents
Introduction to robustness testing
General Example 3
  Background
  Steps in problem formulation
Common designs in robustness testing
  Fractional factorial designs
  Plackett-Burman designs
General Example 3
  Evaluation of raw data
  Regression analysis and model interpretation
Four limiting cases of robustness testing

Introduction to robustness testing


Minimize the system's sensitivity to small changes in critical factors
A robustness test is usually carried out before the release of an almost finished product, or analysis system, as a last test to ensure quality
Set point: factor combination which is currently used for running the system
Aim of robustness testing:
- to explore robustness close to the set point

Background to General Example 3


HPLC is used in routine analysis of complex mixtures in the pharmaceutical industry.
Example:
- 5 factors were varied in a 12-run experimental design
- Responses: capacity factors of two analytes and resolution between two adjacent peaks

[Figure: chromatogram showing the peaks of H 310/83 (1a) and H 309/40 (1m); time axis in min]

PF - Selection of experimental objective


The objectives in robustness testing are:
- to identify responses that are robust to small factor changes
- to identify responses that are sensitive to small factor changes
- to understand which factors need to be better controlled to achieve robustness

What are small factor changes?
- variation that may normally occur in the laboratory
- variation in raw materials, equipment, ...

PF- Specification of factors


Four quantitative factors and one qualitative factor


PF- Specification of responses


Three responses:
- Capacity factor 1, k1
- Capacity factor 2, k2
- Resolution, Res1

Specifications:
- Res1 should be > 1.5 (complete baseline separation)
- k1: N/A
- k2: N/A

[Figure: chromatogram showing the peaks of H 310/83 (1a) and H 309/40 (1m); time axis in min]

PF - Selection of regression model


We distinguish between three main types of polynomial models:
- linear: $y = b_0 + b_1x_1 + b_2x_2 + \dots + e$
- interaction: $y = b_0 + b_1x_1 + b_2x_2 + b_{12}x_1x_2 + \dots + e$
- quadratic: $y = b_0 + b_1x_1 + b_2x_2 + b_{11}x_1^2 + b_{22}x_2^2 + b_{12}x_1x_2 + \dots + e$

In robustness testing, a linear model is usually selected.
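The three model types differ only in which columns are added to the model matrix. The sketch below illustrates this for two coded factors; the function name `model_matrix` and the example runs are hypothetical and not taken from MODDE.

```python
import numpy as np

def model_matrix(x, kind="linear"):
    """Expand an (n, 2) array of coded factor settings into model columns."""
    x = np.asarray(x, dtype=float)
    x1, x2 = x[:, 0], x[:, 1]
    cols = [np.ones(len(x)), x1, x2]        # b0, b1, b2
    if kind in ("interaction", "quadratic"):
        cols.append(x1 * x2)                # b12
    if kind == "quadratic":
        cols += [x1**2, x2**2]              # b11, b22
    return np.column_stack(cols)

# Four corner runs plus one center-point (illustrative settings only)
runs = [[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0]]
for kind in ("linear", "interaction", "quadratic"):
    print(kind, model_matrix(runs, kind).shape[1], "coefficients")
```

With two factors the linear model has 3 coefficients, the interaction model 4 and the quadratic model 6, which is why a linear model keeps the number of runs low in robustness testing.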

PF - Generation of design and creation of worksheet


The ideal result in a robustness testing study is identical response values for each trial, so a low-resolution screening design is useful.
A $2^{5-2}$ design augmented with four center-points was used.
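As a rough illustration of how the eight corner runs of a $2^{5-2}$ design can be built, the sketch below starts from a full $2^3$ factorial and derives two extra columns. The generators D = A*B and E = A*C are an assumption made for this example (the slides do not state them), and the center-points and the qualitative factor are not handled here.

```python
import itertools
import numpy as np

# Full 2^3 factorial in the base factors A, B, C (coded -1/+1)
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T

# Assumed generators (not given in the slides): D = A*B, E = A*C
design = np.column_stack([A, B, C, A * B, A * C])

print(design.shape)        # 8 corner runs, 5 factors
print(design.T @ design)   # zero off-diagonals: main-effect columns are orthogonal
```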

Geometry of HPLC-Rob design


How to deal with center-points in case of qual. factors


If all factors are quantitative, it is easy to add center-points.
If one factor is qualitative, one may position center-points centered on two surfaces of the cube.

[Figure: cube design with quantitative factors and 3 center-points, next to a cube design with one qualitative factor (TypeA/TypeB) and 2 "center" points]

Common designs in robustness testing - Part I


Fractional factorial designs
- Resolution III

[Figure: cube plots of fractional factorial designs in the factors Flour (200-400), Shortening (50-100) and Eggpowder (50-100)]

Common designs in robustness testing - Part II


Plackett-Burman designs
- two-level designs
- support linear models
- require very few experimental runs per factor
- also used in screening

In some cases a PB-design is a specific fraction of a factorial design.
The number of runs is a multiple of 4.
PB designs of 12, 20, and 24 runs are of particular interest.

Common designs in robustness testing - Part III


Example shows the 12-run PB-design.
Recommended use of PB-designs:

No of factors   Maximum No of factors   No of runs   Note
5               7                       8            Use Frac Fac
9               11                      12
13              15                      16           Use Frac Fac
17              19                      20
21              23                      24
25              27                      28
29              31                      32           Use Frac Fac

Always add 3 center-points.
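A 12-run PB-design can be written down from a single generator row. The sketch below uses the commonly tabulated generator (+ + - + + + - - - + -), shifts it cyclically and closes with a row of minus signs; the variable names are illustrative and the center-points are left out.

```python
import numpy as np

# Commonly tabulated generator row of the 12-run Plackett-Burman design
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])

rows = [np.roll(gen, k) for k in range(11)]   # 11 cyclic shifts of the generator
rows.append(-np.ones(11, dtype=int))          # closing row of all minus signs
pb12 = np.array(rows)

print(pb12.shape)                # 12 runs, up to 11 factors
print(pb12.T @ pb12)             # 12 on the diagonal, 0 elsewhere
```

Unused columns are simply dropped when fewer than 11 factors are studied.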

HPLC application - Evaluation of raw data


The numerical variation in the resolution response is small.
The lowest measured resolution is 1.75 and the highest 1.89.
This means that Res1 is robust (inside specification).

HPLC application - Evaluation of raw data


All three responses are nearly normally distributed


HPLC application - Evaluation of raw data


Low condition number, which means that we have a good design.
Responses are strongly correlated.

HPLC application - Regression analysis


Is the model significant or not?
Model refinement is usually not carried out.
The low Q2 of 0.12 for Res1 suggests that this response is robust.

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1; Investigation: HPLC Robustness (MLR); N=12, DF=6, Cond. no.=1.2289, Y-miss=0]

Four limiting cases of robustness testing


Nature of robustness:
- Is the regression model significant, or not?
- Are the responses inside or outside specifications?

Four limiting cases:
- Inside specification/Significant model
- Inside specification/Non-significant model
- Outside specification/Significant model
- Outside specification/Non-significant model

First limiting case - Inside specification/Significant model


All the measured values are inside the specification, that is, above 1.5.
Regression model significant: weak Q2 and significant term of AcN.
Extreme cases predictions (what is the maximum variation?):

[Figure: Plot of Replications for Res1 with experiment number labels; Investigation: HPLC Robustness]
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1; N=12, DF=6, Cond. no.=1.2289, Y-miss=0]
[Figure: Scaled & Centered Coefficients for Res1 (Extended), terms Co(ColA), Co(ColB), pH, Ac, OS, Te; N=12, DF=6, R2=0.772, Q2=0.121, R2 Adj.=0.582, RSD=0.0248, Conf. lev.=0.95]
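The fit statistics quoted for Res1 can be cross-checked against each other. The sketch below assumes the usual definition of adjusted R2 in terms of the number of runs N and the residual degrees of freedom DF; whether MODDE computes it exactly this way is an assumption, but the reported value is reproduced.

```python
def r2_adjusted(r2, n_runs, residual_df):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / DF_residual (usual definition)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / residual_df

# Values reported for Res1: R2 = 0.772, N = 12, DF = 6
print(round(r2_adjusted(0.772, 12, 6), 3))
```

The gap between R2 and adjusted R2 is large here because only 6 residual degrees of freedom remain.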

Second limiting case - Inside spec/Non-significant model


Ideal outcome.
Res1 can be used to illustrate this.
The model is non-significant according to ANOVA; the p-value of 0.059 exceeds 0.05.

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1; Investigation: HPLC Robustness (MLR); N=12, DF=6, Cond. no.=1.2289, Y-miss=0]
[Figure: Summary of Fit for vetific; Investigation: itdoe_roblimcases (MLR); N=11, DF=5, Cond. no.=1.1726, Y-miss=0]

Third limiting case - Outside spec/Significant model


k2 is used to illustrate this limiting case; temporary specification between 2.7 and 3.3.
Coefficients are used for understanding two things, namely (i) how to get k2 inside specification and (ii) how to produce a non-significant model (how to get the second limiting case?).
- Rows 2-3: extreme cases
- Rows 4-5: how to enter inside specifications
- Rows 6-7: how to get a non-significant model

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1; Investigation: HPLC Robustness (MLR); N=12, DF=6, Cond. no.=1.2289, Y-miss=0]
[Figure: Scaled & Centered Coefficients for k2 (Extended), terms Co(ColA), Co(ColB), pH, Ac, OS, Te; N=12, DF=6, R2=0.989, Q2=0.959, R2 Adj.=0.981, RSD=0.0418, Conf. lev.=0.95]

Fourth limiting case - Outside spec/Non-significant model


Most complex limiting case, as many outcomes are conceivable:
- one strong outlier (left)
- replicated center-points have much higher response values (middle)
- one experiment deviates and falls outside specification (right)
- many more...

[Figure: three Plots of Replications for vetific with experiment number labels; Investigation: itdoe_roblimcases]

What we have learnt


We have discussed:
- the experimental objective of robustness testing
- common designs in robustness testing
- the HPLC application
- the problem formulation steps of this example
- the evaluation of raw data, the regression analysis, and model interpretation, related to the HPLC example
- four limiting cases of robustness testing
- how a non-robust system might be converted into a robust one

Design of Experiments (DOE) Pharma Applications


Chapter 10 Conclusions

Key features of DOE


How to make experiments efficiently
Span the experimental domain with the aid of a suitable experimental design

How to analyze the data


Use good statistical tools to evaluate experimental results

How to interpret the results


With the use of user-friendly PC-based graphical facilities

How to convert modelling results into concrete actions/decisions


MODDE optimizer & verifying experiments


Design of Experiments (DOE)


Maximizes the information content from an experimental series while keeping the number of experiments low.
A) Prepare a set of representative experiments, in which all factors under investigation are varied simultaneously.

B) From the set of experiments, a model is derived which captures the relation between factor settings and experimental results (responses).

Experimental result = f(factor settings)

[Figure: diagram relating measurement data, information, knowledge, decision and action, with DoE (Design of Experiments) and MVA (Multivariate Data Analysis)]

Multivariate Data Analysis (MVA)


Captures the systematic parts in Mb data sets and visualizes the information in plots and graphs

[Figure: DATA converted into INFORMATION by multivariate modeling]

Design of Experiments (DOE) Pharma Applications


Chapter 11 Additional Topics

Contents
- D-optimal design
- Blocking the experimental plan
- Mixture design
- Other RSM designs
- Multilevel qualitative factors
- The Taguchi approach to robust design
- Simultaneous optimization of several responses fitted with different models
- Partial least squares projections to latent structures, PLS
- Design in latent variables

Additional Topics

D-optimal design

Contents
- Introduction to D-optimal design
- Evaluation criteria
  - G-efficiency
  - Condition number
- Typical examples of D-optimal design

When to use D-optimal design - Irregular regions


Irregular experimental region in
- screening
- optimization
- mixture design

[Figure: panels A, B and C showing experimental regions spanned by Factor A and Factor B]

When to use D-optimal design - Qualitative factors


Multi-level qualitative factors in screening
Optimization designs with qualitative factors

[Figure: screening design with a qualitative Factor A (Level 1 to Level 4), a qualitative factor with settings Sett 1 to Sett 3, and Factor C (-1 to +1)]
[Figure: optimization designs in quantitative factors, drawn separately for qualitative factor levels A and B]

When to use D-optimal design - Special requirements


Special number of runs

# Runs   # Center-points   # Total runs   Design type
8        3                 11             Frac Fac
11       3                 14             D-optimal
12       3                 15             PB
16       3                 19             Frac Fac

# Factors   CCC/CCF   BB       D-opt
5           26 + 3    40 + 3   26 + 3
6           44 + 3    48 + 3   35 + 3
7           78 + 3    56 + 3   43 + 3

Model upgrading

$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_{12}x_1x_2 + b_{13}x_1x_3 + b_{23}x_2x_3 + b_{22}x_2^2 + e$
Terms set apart from this model on the slide: $b_{11}x_1^2$, $b_{33}x_3^2$

$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_{12}x_1x_2 + b_{13}x_1x_3 + b_{23}x_2x_3 + b_{11}x_1^2 + b_{22}x_2^2 + b_{33}x_3^2 + b_{111}x_1^3 + e$

When to use D-optimal design - Inclusions


Inclusion of existing experimental information in
- screening
- optimization

When to use D-opt. design - Process and Mixture Factors


Process and Mixture Factors

When making a combined design for process and mixture factors.
LoafVolume is a typical example where D-optimal design could have been utilized.

Introduction to D-optimal design


A D-optimal design is a computer-generated design, and consists of the best subset of experiments selected from the candidate set.
For a given model, $Y = X\beta + \varepsilon$, the following can be said regarding the D-optimal approach:
- the selected runs maximize the determinant of the matrix X'X
- these experiments span the largest volume possible in the experimental region

A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture).

A small D-optimal example


Example: $2^2$ full factorial design with factors x1 and x2

run   x1   x2
1     -1   -1
2      1   -1
3     -1    1
4      1    1

Model: $y = b_0 + b_1x_1 + b_2x_2 + b_{12}x_1x_2 + e$

Model in matrix form: $y = Xb + e$, with $b = (X'X)^{-1}X'y$

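D-optimal example, the covariance matrix $(X'X)^{-1}$

$$X' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

$$X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \quad (X'X)^{-1} = \begin{pmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{pmatrix}$$

Precision in b from: $(X'X)^{-1} \cdot RSD \cdot t$
The smallest $(X'X)^{-1}$ corresponds to the largest $X'X$.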
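The matrices for the $2^2$ example can be reproduced numerically. In the sketch below the response values `y` are made up purely to show the estimation step $b = (X'X)^{-1}X'y$; they do not come from the course material.

```python
import numpy as np

x1 = np.array([-1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1])
# Columns: constant, x1, x2, x1*x2
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

XtX = X.T @ X                 # information matrix, 4 on the diagonal
cov = np.linalg.inv(XtX)      # (X'X)^-1, 0.25 on the diagonal

y = np.array([10.0, 14.0, 12.0, 20.0])   # made-up responses, illustration only
b = cov @ X.T @ y                         # least-squares coefficients

print(XtX)
print(cov)
print(b)
```

Because the design is orthogonal, every coefficient is estimated with the same precision and independently of the others.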

A second small D-optimal example


Problem: two factors (x1/x2) varied in three levels.
Proposed model: $y = b_0 + b_1x_1 + b_2x_2 + e$; the model needs 3 DF.

$9!/(3! \cdot 6!) = 84$ ways of selecting 3 trials out of 9.
Maximize the determinant $\det(X'X)$.
Best precision in the estimated regression coefficients is obtained with det = 16.

[Figure: five selections of 3 trials on the 3 x 3 grid (x1 and x2 from -1 to 1), giving det = 0, 1, 4, 9 and 16]
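For a candidate set this small, the D-optimal selection can be found by brute force. The sketch below enumerates all subsets of 3 runs from the 9 candidates and evaluates $\det(X'X)$ for the linear model; a real D-optimal algorithm uses an exchange search instead of full enumeration, so this is only an illustration.

```python
import itertools
import numpy as np

levels = [-1, 0, 1]
candidates = list(itertools.product(levels, levels))   # 9 candidate runs

def det_xtx(points):
    """det(X'X) for the linear model y = b0 + b1*x1 + b2*x2."""
    X = np.array([[1, x1, x2] for x1, x2 in points], dtype=float)
    return int(round(np.linalg.det(X.T @ X)))

dets = [det_xtx(subset) for subset in itertools.combinations(candidates, 3)]
print(len(dets), sorted(set(dets)), max(dets))
```

The subsets reaching the maximum spread the three runs as far apart as the grid allows, which is the geometric meaning of "spanning the largest volume".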

How to compute a determinant


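Example: experiments spread according to a determinant of 4

$$X = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix}, \quad X' = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix}, \quad X'X = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

$\det(X'X) = (3 \cdot 1 \cdot 2) + (-1 \cdot 0 \cdot 0) + (0 \cdot -1 \cdot 0) - (0 \cdot 1 \cdot 0) - (0 \cdot 0 \cdot 3) - (2 \cdot -1 \cdot -1) = 4$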

Features of the D-optimal approach


- Assumes that the selected regression model is "correct" and "true"
- Sensitive to model choice
- Potential terms may be added to protect against this sensitivity

Evaluation criteria
Two common evaluation criteria:

Condition number
- ratio of largest to smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit, and denotes an orthogonal design

G-efficiency
- computed as $G_{eff} = 100 \cdot p/(n \cdot d)$
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended
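Both criteria are easy to compute for a small design. The sketch below assumes the common reading of the G-efficiency formula, with p the number of model terms, n the number of runs and d the largest value of $x'(X'X)^{-1}x$ over the candidate points; these definitions and the function names are assumptions, since the slide gives only the formula.

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.max() / s.min()

def g_efficiency(X, candidates):
    """100 * p / (n * d), with d the largest x'(X'X)^-1 x over the candidates."""
    n, p = X.shape
    cov = np.linalg.inv(X.T @ X)
    d = max(float(x @ cov @ x) for x in candidates)
    return 100.0 * p / (n * d)

# 2^2 full factorial with the interaction model; candidates = the design itself
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])
X = np.column_stack([np.ones(4), x1, x2, x1 * x2])

cond = condition_number(X)
geff = g_efficiency(X, X)
print(cond, geff)
```

An orthogonal factorial design reaches both ideal limits, which is the benchmark a D-optimal design is compared against.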

Applications of D-optimal design - Model updating


Model updating is common after screening, when it is necessary to unconfound two-factor interactions.
Example: Laser welding
- Po*Sp two-factor interaction needed, but confounded with No*Ro
- Fold-over leads to 11 new experiments
- Selective updating possible with D-optimal design

Step 1: Make a copy of the current investigation.
Step 2: In the new application, do File/Complement design (opens a wizard).
Step 3: Select D-optimal design.
Step 4: Select the number of additional runs; to unconfound two two-factor interactions, 4 extra experiments are appropriate.
Step 5: Edit Model to add the interesting terms.
Step 6: Select the number of additional center-points and name the new investigation.
Step 7: Select Screening and 15 + 2 runs as lead numbers.
Step 8: Generate D-optimal design with 15 runs (here: five variants).
Step 9: Evaluate the resulting designs. In this case all five alternatives are identical.
Step 10: Generate the selected design.

Design tailor-made to resolve Po*Sp and No*Ro

Applications of D-opt. design - Multi-level qual. factors


Example: Cotton cultivation

Full factorial design has 4*7 = 28 runs (very many in screening).
A linear model is sufficient in screening:
Yield = b0 + b1*Center + b2*Variety + e
- constant term: 1 DF
- linear term of Center (7 levels): 6 DF
- linear term of Variety (4 levels): 3 DF
- extra: 5 DF
- Total: 15 DF
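The DF count for qualitative factors follows from how they are expanded into model columns: a factor with k levels needs k - 1 columns. The sketch below uses plain indicator (dummy) coding with the first level as reference, which is an assumption for illustration; MODDE may use a different but equivalent coding.

```python
import itertools
import numpy as np

centers = range(7)      # 7 levels of Center
varieties = range(4)    # 4 levels of Variety
candidates = list(itertools.product(centers, varieties))   # full factorial candidate set

def dummy(level, n_levels):
    """Indicator coding with the first level as reference: n_levels - 1 columns."""
    row = np.zeros(n_levels - 1)
    if level > 0:
        row[level - 1] = 1.0
    return row

X = np.array([np.concatenate(([1.0], dummy(c, 7), dummy(v, 4)))
              for c, v in candidates])
model_df = int(np.linalg.matrix_rank(X))   # constant + 6 + 3
total_runs = model_df + 5                  # plus 5 extra DF

print(len(candidates), model_df, total_runs)
```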

Applications of D-opt. design - Multi-level qual. factors


D-opt designs with 14, 15, and 16 runs were generated, with 5 versions for each N.
- 14 runs (above): balancing with regards to Center
- 16 runs (below): balancing with regards to Variety

[Figure: design layouts of Center (C1 to C7) by Variety (V1 to V4) for the 14-run and 16-run designs]

Applications of D-opt. design - Combined design


D-opt design is useful for combined designs of process and mixture factors.
Example: Bubble formation
- 2 process factors
- 4 mixture factors
- Response: lifetime of bubbles

Applications of D-opt. design - Combined design


Objective: screening.
Selected model was one with linear and interaction terms; interactions allowed
- among process factors
- between the process and mixture factors
- but not among the mixture factors themselves

Necessary DF (= 20) are calculated as follows:
- 1 DF for the constant term
- 2 DF for the linear process terms
- 3 DF for the linear mixture terms
- 1 DF for the process*process interaction
- 6 (2*3) DF for the process*mixture interactions
- 2 DF for the relational constraint
- 5 extra DF

Applications of D-opt. design - Combined design


Recommended procedure:
- find a lead number of design runs, N
- generate designs with N ± 4 runs
- make five alternative versions for each level of N ± 4 runs

We generated 45 alternative D-optimal designs (N = 16 to N = 24).
Selected: N = 16, showing a G-efficiency of 76.1% and a condition number of 2.7.
We have obtained a good design.

Applications of D-opt. design - Combined design


2 series of 4 replicates were added, making the entire design comprise 16 + 8 = 24 experiments.
Screening:
- span in lifetime 11 - 362 sec

Optimization:
- span in lifetime 6.02 - 22.28 min

Key to prolonging bubble lifetime: substantial increase of glycerol

Summary
We have discussed:
- When to use D-optimal design
- What D-optimal design is
- Computational and geometrical aspects of the D-optimality criterion
- The condition number as evaluation criterion of D-optimality
- The G-efficiency as evaluation criterion of D-optimality
- Applications of D-optimal design
  - model updating
  - multi-level qualitative factors
  - combined designs of process and mixture factors

Additional Topics

Blocking the Experimental Plan

Contents
- Introduction to blocking
- When to use blocking
- Blocking in MODDE
  - Block size
  - Number of blocks
  - Blockable designs
  - Recoding of block factors
- Chemical example
- Summary

Introduction to blocking
Randomization is used as a safeguard against unwanted sources of extraneous systematic variability.
When you cannot conduct all the experiments in a homogeneous way, randomizing your experiments may not be sufficient to deal with such variability.
Blocking the experiments in synchronized groups may help to decrease the impact of such variability on the effects of the factors.

When to use blocking


Suppose you are running a full factorial design in 5 factors and 32 runs.
The batch size of the raw material permits 8 experiments per batch.
You may then want to run your experiments in 4 blocks, each composed of 8 runs using homogeneous starting material.
Orthogonal blocking makes it possible to divide the 32 experiments into 4 blocks of 8 runs, such that the difference between the blocks (the raw material) does not affect the estimate of the factor effects.
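One way to see how such a split can work is to assign blocks from the signs of two high-order interaction columns. The sketch below uses the blocking generators A*B*C and C*D*E, which are an illustrative assumption and not necessarily what MODDE chooses; it then checks that every block has 8 runs and that each factor is balanced within each block.

```python
import itertools
import numpy as np

runs = np.array(list(itertools.product([-1, 1], repeat=5)))   # 32 runs, factors A..E
A, B, C, D, E = runs.T

# Assumed blocking generators (illustrative choice, not from the slides)
b1 = A * B * C
b2 = C * D * E
block = 1 + (b1 > 0).astype(int) + 2 * (b2 > 0).astype(int)   # block numbers 1..4

sizes = np.bincount(block)[1:]
# Sum of each factor column within each block; zero means the factor is balanced
balance = np.array([runs[block == k].sum(axis=0) for k in range(1, 5)])

print(sizes)
print(balance)
```

Because each factor appears equally often at -1 and +1 inside every block, a constant shift between blocks cancels out of the main-effect estimates.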

Example: Blocking_Scr
With a $2^5$ design there are two options, with or without block interactions.

[Figure: design with block interactions]
[Figure: design without block interactions]

Design region (same with or without block interactions): each block occurs twice in each corner cube.

Blocking in MODDE
MODDE supports orthogonal blocking for two-level full and fractional factorials, CCC, PB, and BB-designs (Note: CCF is not blockable!).
MODDE also supports blocking of D-optimal designs, provided that the number of design runs is a multiple of the number of blocks (Note: blocks in D-optimal designs are usually not orthogonal to the factors).

Blocking of full and fractional factorial designs


- Maximum number of blocks is 8, with a minimum block size of 4.
- One blocking factor is used for 2 blocks, 2 for 4 blocks, and 3 for 8 blocks.
- The block effects consist of the effects of the blocking factors and all their interactions. Hence with 8 blocks there are 7 block effects consuming 7 DF.
- Pseudo-resolution: the resolution of the design when all block effects (blocking factors and all their interactions) are treated as main effects, under the assumption that there are no interactions between blocks and main effects, or between blocks and main-effect interactions.

Blocking of other designs


- PB: Can only be split into two blocks by introducing one block factor, and using its signs to split the design.
- CCC: Each block must be a first-order orthogonal block. Can be split into two blocks, the cube portion and the star portion. The cube portion can sometimes be split into smaller blocks. Each block must have the same number of center-points.
- BB: BB3 is not blockable; BB4: 3 blocks; BB5-BB7: 2 blocks; BB8: N/A.
- D-optimal design: Blocks must have equal size and the total number of runs must be a multiple of the number of blocks. Interactions between the block factor and other factors are disallowed.

Recoding the blocking factors


Blocks are assigned according to the combination of signs of the blocking factors.
To generate 4 blocks, the four sign combinations of the blocking factors $B1 and $B2 are recoded to block numbers 1, 2, 3 and 4.

Example: Blocking_RSM
Chemical example with objective to maximize yield. CCC design in two factors where cube and star portions were run at different time points.


How to specify the problem in MODDE

- CCC design, which is blockable
- 2 blocks
- Equal number of center-points in each block

Evaluation of raw data


Excellent input data!

[Figure: Plot of Replications for Yield with experiment number labels, blocks B1 and B2; Investigation: Blocking_RSM]
[Figure: Histogram of Yield; Investigation: Blocking_RSM]

Regression model building


Strong regression model with R2 = 0.98, Q2 = 0.95, MVal = 0.99 and Rep = 0.88.
There is some evidence that slightly lower yields were obtained in the second block of six runs.

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Yield; Investigation: Blocking_RSM (MLR); N=12, DF=3, Cond. no.=3.1808, Y-miss=0]
[Figure: Scaled & Centered Coefficients for Yield (Extended), terms Tim, Temp, $Blo(B1), $Blo(B2), Tim*Tim, Temp*Temp, Tim*Temp, Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1), Temp*$Blo(B2); N=12, DF=3, R2=0.978, Q2=0.949, R2 Adj.=0.919, RSD=1.2524, Conf. lev.=0.95]

Use of model
Response surface plots visualise that higher yields were obtained in the first experimental campaign (when running the cube portion)


What have we learnt


Blocking introduces extra factors in the design; this reduces the residual DF and the design resolution.
You should only block when the extraneous source of variability is large and cannot be controlled by randomizing the run order.

Additional Topics

Mixture Design

Contents
- Introduction to mixture design
- A working strategy for mixture design
  - Example 1: Tablet formulation (regular experimental region)
  - Example 2: Bubble formation - screening (irregular experimental region)
  - Example 3: Bubble formation - optimization (irregular experimental region)

Introduction to mixture design


Example, Rocket Propellant: Three components were mixed together to form a rocket propellant. The purpose was to find a propellant with an elasticity of > 2900.
Formulation factors:
- Binder: 0.2-0.4
- Oxidizer: 0.4-0.6
- Fuel: 0.2-0.4

What is the "problem" with the worksheet? Each row sums to 1.0!

Consequences for the design?

Introduction to mixture design


What does the mixture design look like?
The experimental domain with 0-1 bounds on the factors takes the form of a triangle.
Here we are investigating a limited region of the available experimental domain.

[Figure: mixture triangle with axes Oxidiser, Fuel and Binder]

Introduction to mixture design


A quadratic model was used.
- The model predicts an area in which an elasticity exceeding 2900 is found
- Coefficients show that binder and fuel have the strongest impact on elasticity
- We are able to quantitatively describe elasticity in terms of three varied ingredients

[Figure: Scaled & Centered Coefficients for Elasticity, terms Bin, Oxi, Fue, Bin*Bin, Oxi*Oxi, Fue*Fue, Bin*Oxi, Bin*Fue, Oxi*Fue; Investigation: Rocket (PLS, comp.=2); N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071, Conf. lev.=0.95]

A Working Strategy for Mixture Design


Illustrations: Tablet preparation & Bubble formation

1. Definition of factors and bounds
2. Selection of experimental objective and mixture model
3. Selection of candidate set
4. Generation of design
5. Evaluation of size and shape of mixture region
6. Definition of reference mixture
7. Execution of design
8. Analysis of data and evaluation of model
9. Visualization of modelling results
10. Use of model

Tablet: - 1. Definition of factors and bounds


Aim: To investigate tablet preparation and find out which factors regulate the release rate of an active substance.

Mixture Factors:
- Cellulose (0 - 1)
- Lactose (0 - 1)
- Phosphate (0 - 1)
- All factors sum to 100% (mixture constraint)
- Bounds display consistency

Constraint:
- No other extra constraint

Response:
- Release rate of the active substance (to be maximized)

Tablet: - 1. Definition of factors and bounds


Checking for consistency of bounds.
Example:
0.1 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4

These bounds are inconsistent. After a simple arithmetic check (done automatically in the software) the new bounds become:
0.3 ≤ A ≤ 0.5, 0.1 ≤ B ≤ 0.3, 0.2 ≤ C ≤ 0.4

[Figure: mixture triangle showing the lower and upper bounds LA, UA, LB, UB, LC, UC and the adjusted lower bound L*A]
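The arithmetic check can be sketched as follows: since the components sum to 1, a component can never be smaller than 1 minus the upper bounds of the others, nor larger than 1 minus their lower bounds. The function name `tighten_bounds` is hypothetical, and the actual check in the software may differ in detail.

```python
def tighten_bounds(lower, upper, total=1.0):
    """One pass of the implied-bound check for mixture factors summing to `total`."""
    new_lower, new_upper = [], []
    for i in range(len(lower)):
        others_low = sum(lower) - lower[i]   # smallest possible sum of the others
        others_up = sum(upper) - upper[i]    # largest possible sum of the others
        new_lower.append(max(lower[i], total - others_up))
        new_upper.append(min(upper[i], total - others_low))
    return new_lower, new_upper

# Bounds for A, B, C from the example
low, up = tighten_bounds([0.1, 0.1, 0.2], [0.5, 0.3, 0.4])
low = [round(v, 3) for v in low]
up = [round(v, 3) for v in up]
print(low, up)
```

For A, the others can contribute at most 0.3 + 0.4 = 0.7, so A cannot go below 0.3, which reproduces the adjusted lower bound in the example.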

Tablet: - 2. Selection of experimental objective and mixture model


Experimental objective: Optimization

Mixture model: Quadratic

$y = \beta_0 + \beta_1 X_{MF1} + \beta_2 X_{MF2} + \beta_3 X_{MF3} + \beta_{11} X_{MF1}^2 + \beta_{22} X_{MF2}^2 + \beta_{33} X_{MF3}^2 + \beta_{12} X_{MF1} X_{MF2} + \beta_{13} X_{MF1} X_{MF3} + \beta_{23} X_{MF2} X_{MF3} + \varepsilon$

Cox model type with constraints imposed on the regression coefficients

Tablet: - 3. Selection of candidate set


The candidate set is the pool of theoretically possible and meaningful experiments, from which the actual design is selected.
Here, the candidate set is small:
- 3 extreme vertices
- 3 centers of edges
- 3 interior points
- 1 overall centroid

Undesired experiments may be deleted from the candidate set prior to generation of the design.
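For a regular three-component region the candidate points can be listed directly. In the sketch below the three interior points are placed halfway between the overall centroid and each vertex; that placement is an assumption for illustration, as the slides do not give their coordinates.

```python
import itertools
import numpy as np

vertices = np.eye(3)                                   # pure components
edge_centers = np.array([(vertices[i] + vertices[j]) / 2
                         for i, j in itertools.combinations(range(3), 2)])
centroid = np.full(3, 1 / 3)                           # overall centroid
# Assumption: interior points halfway between the centroid and each vertex
interior = (vertices + centroid) / 2

candidates = np.vstack([vertices, edge_centers, interior, centroid])
print(candidates.shape)
print(candidates.sum(axis=1))    # every candidate obeys the mixture constraint
```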

Tablet: - 4. Generation of design


The design should contain experiments which are informative and map the experimental region as well as possible.
In this case the experimental region is regular, and then the Simplex Centroid design is applicable.

Tablet: - 5. Evaluation of size and shape of mixture region


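Introduction to regular mixture regions

[Figure: mixture triangle with vertices A (1/0/0), B (0/1/0) and C (0/0/1), edge midpoints 0.5/0.5/0, 0.5/0/0.5 and 0/0.5/0.5, and overall centroid 1/3, 1/3, 1/3]
[Figure: two-component case, the line X1 + X2 = 1 in the (X1, X2) plane]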

Tablet: - 5. Evaluation of size and shape of mixture region


In MODDE: Show/Design Region.
Example: Bubbles (see more info. later).
Useful approach to understand how and where the experiments are laid out.

[Figure: design regions at Glycerol = 0.0, Glycerol = 0.1 and Glycerol = 0.2]

Tablet: - 5. Evaluation of size and shape of mixture region


Alternative designs for regular region (choice of model will be important)

[Figure: designs supporting Linear, Quadratic and Special Cubic models]

Tablet: - 6. Definition of reference mixture


The reference mixture is used to anchor the mathematical model.
- Easy to find for regular regions (overall centroid)
- Strongly irregular regions require an efficient algorithm to find the overall centroid
- Serves the same function as the center-point does in process design

Tablet preparation: 1/3, 1/3, 1/3

[Figure: mixture triangle with vertices A (1/0/0), B (0/1/0) and C (0/0/1) and overall centroid 1/3, 1/3, 1/3]

Tablet: - 7. Execution of design


Important to carry out the experiments in random order.
This is done so that any systematic time trend is broken down into unimportant, random variation.

Tablet: - 8. Analysis of data and evaluation of model


Analysis of data with PLS

[Figure: Summary of Fit (R2, Q2) for release; Investigation: Waaler_rsm (PLS, comp.=3); N=10, DF=4, Cond. no.=7.4174, Y-miss=0]
[Figure: Normal probability plot of standardized residuals for release with experiment number labels; N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170]

Exp #10 is a probable outlier and should be re-tested.
If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.

Tablet: - 9. Visualization of modelling results


[Figure: Regression coefficients; Scaled & Centered Coefficients for release, terms ce, la, ph, ce*ce, la*la, ph*ph, ce*la, ce*ph, la*ph; Investigation: Waaler_rsm (PLS, comp.=3); N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Conf. lev.=0.95]
[Figure: Tri-linear contour plot]

Tablet: - 10. Use of model


Use of verifying experiments
Pred No   cellulose   lactose   phosphate   release (obs)   release (pred)   Lower   Upper
1         0.32        0         0.68        ---             363              322     404
2         0.5         0.125     0.375       370             293              262     324
3         0.333       0         0.667       340             363              322     405
4         0.667       0         0.333       345             320              278     361

The model predicts well except for the blend 0.5/0.125/0.375.
This strange experiment should be repeated and the model possibly updated with this new information.
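The comparison between observed and predicted release can be made explicit by checking each observation against its prediction interval. The sketch below uses the values from the verifying experiments (prediction 1 has no observed value and is left out); the variable names are illustrative.

```python
# (cellulose, lactose, phosphate), observed release, predicted, lower, upper
checks = [
    ((0.5, 0.125, 0.375), 370, 293, 262, 324),
    ((0.333, 0.0, 0.667), 340, 363, 322, 405),
    ((0.667, 0.0, 0.333), 345, 320, 278, 361),
]

# Blends whose observed release falls outside the prediction interval
outside = [blend for blend, obs, pred, lo, hi in checks if not lo <= obs <= hi]
print(outside)
```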

BubbleScr: - 1. Definition of factors and bounds


Aim: To investigate bubble formation and find out which factors dominate bubble lifetime.

Process Factors:
- Temperature (7 - 21 °C; refrigerator/kitchen temperature)
- Time (1 - 13 - 25 h)

Mixture Factors:
- Dish-washing liquid 1, SKONA, ICA (0 - 0.4)
- Dish-washing liquid 2, NEUTRAL, ADACO (0 - 0.4)
- Tap Water, Umeå (0.4 - 0.8)
- Glycerol, APOTEKETS (15% water content / 0.0 - 0.2)

Constraint:
0.2 ≤ DWL1 + DWL2 ≤ 0.5

Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter).

BubbleScr: - 4. Generation of design


Best design with N = 16 (Geff = 76%, CondNo = 2.7).
2 series of 4 replicates were added, giving 24 runs.

BubbleScr: - 8. Analysis of data and evaluation of model


PLS analysis

[Figure: Plot of Replications for Lifetime~ with experiment number labels; Investigation: Bubb_scr]
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Lifetime~; Investigation: Bubb_scr (PLS, comp.=2); N=24, DF=18, Cond. no.=2.1537, Y-miss=0]
[Figure: Normal probability plot of standardized residuals for Lifetime~ with experiment number labels; N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018]

BubbleScr: - 9. Visualization of modelling results


[Figure: Regression coefficients. Investigation Bubb_scr (PLS, comp.=2), scaled & centered coefficients for Lifetime~ (terms Ti, DW1, DW2, Gly, Te, Wa); N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Conf. lev.=0.95]
[Figure: Tri-linear contour plot]

Glycerol = 0.2, Temp = 14, Time = 13
Reference mixture 0.2 / 0.2 / 0.5 / 0.1

BubbleScr: - 10. Use of model


MODDE optimizer was used to propose two verifying experiments

Temp  Time  DWL1  DWL2  Water  Glycerol  Lifetime  Lower    Upper
7     25    0.2   0.2   0.3    0.3       570.196   300.513  1081.893
7     49    0.4   0     0.3    0.3       1243.664  456.962  3384.745

Verifying experiment #1: Temp = 7, Time = 25, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Resp 1 = 1120 sec (18 min 40 sec)

Verifying experiment #2: Temp = 7, Time = 49, Mixture = 0.4 / 0.0 / 0.3 / 0.3, Resp 1 = 810 sec (13 min 30 sec)

BubbleOpt: - 1. Definition of factors and bounds


Verifying experiment #1 was used to adjust the bounds of the four mixture factors

Process Factors:
Temperature kept constant (+7 °C)
Time kept constant at 25 h

Mixture Factors:
Dish-washing liquid 1 (DWL1), SKONA, ICA (0.1 - 0.3)
Dish-washing liquid 2 (DWL2), NEUTRAL, ADACO (0.1 - 0.3)
Tap Water, Ume (0.2 - 0.4)
Glycerol, APOTEKETS (15% water content; 0.2 - 0.4)

Constraint:
0.3 ≤ DWL1 + DWL2 ≤ 0.5

Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm diameter

BubbleOpt: - 4. Generation of design


Selected design with 24 (20 + 4) runs Geff = 83%, CondNo = 16.8


BubbleOpt: - 8. Analysis of data and evaluation of model


PLS analysis of Lifetime~

[Figures: Investigation Bubb_rsm (PLS, comp.=2), response Lifetime~: plot of replications with experiment number labels; summary of fit (R2, Q2, Model Validity, Reproducibility); normal probability plot of standardized residuals. N=24, DF=14, Cond. no.=12.3206, Y-miss=0, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358]

BubbleOpt: - 9. Visualization of modelling results


[Figure: Regression coefficients. Investigation Bubb_rsm (PLS, comp.=2), scaled & centered coefficients for Lifetime~ (terms DW1, DW2, Wa, Gly, their squares and two-factor interactions); N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Conf. lev.=0.95]
[Figure: Tri-linear contour plot]

Reference mixture 0.2 / 0.2 / 0.3 / 0.3
Glycerol = 0.4

BubbleOpt: - 10. Use of model


[Figure: Raw data plot of Log(Lifetime) versus Cost, with experiment number labels]

Ingredient cost is easy to take into consideration

BubbleOpt: - 10. Use of model


Lowest ingredient cost with longlasting bubbles


Conclusions, Bubble example


The sequence 1) Screening, 2) RSM is very fruitful for rational experimental work
We were able to increase bubble lifetime from 6.02 to 22.28 min
Key to success was to increase glycerol substantially
Long-lasting bubbles are obtained with
  Cooled solution
  25 h settling time (not popular with kids)
  Formulation
    DWL1  DWL2  Water  Glycerol
    0.23  0.1   0.27   0.4
  Red plastic bubble wand

Additional Topics

Other RSM designs for regular experimental region

Contents
Introduction Three-level full factorial designs Box-Behnken designs Comparison of Composite, Three-level factorial, and Box-Behnken designs


Introduction
Composite designs are commonly used in optimization

We will now discuss two additional design families namely


(i) Three-level full factorial designs (ii) Box-Behnken designs


Three-level full factorial designs


Three-level full factorial designs are extensions of the two-level full factorials
Geometry of 3^2 and 3^3 designs displayed
Runs = 3^k; k = no. of factors; 9, 27, 81, 243, ...
With k = 4 or higher this design family is not used to any great extent
Observe that the 3^2 design is equivalent to the CCF design in two factors
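As a minimal sketch of how a three-level full factorial is built, the following enumerates every combination of the coded levels -1, 0 and +1. It also splits the two-factor design into corner, face-centred and centre points, which is the composition of a CCF design in two factors. The function and variable names are illustrative assumptions.

```python
from itertools import product


def three_level_full_factorial(k):
    """Return all 3^k runs in coded units (-1, 0, +1)."""
    return list(product((-1, 0, 1), repeat=k))


# Number of runs for k = 2..5 factors
run_counts = {k: len(three_level_full_factorial(k)) for k in range(2, 6)}
print(run_counts)

# Decompose the two-factor design by the number of factors held at 0
design_3_2 = three_level_full_factorial(2)
corners = [run for run in design_3_2 if run.count(0) == 0]
face_centres = [run for run in design_3_2 if run.count(0) == 1]
centre = [run for run in design_3_2 if run.count(0) == 2]
print(len(corners), len(face_centres), len(centre))
```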

Three-level full factorial designs


Geometry of 3^4 and 3^5 designs
[Figures: Investigation itdoe_testingofdesigns, Design: Full Fac (3 levels); cubes in x1, x2, x3 repeated along x4 and x5]
81 and 243 experiments!

Box-Behnken designs
Family of designs employing three levels per varied factor
BB-designs are useful if experimenting in the corners is unwanted
Mostly, BB-designs are used when investigating three or four factors

Summary
An overview of the number of experiments encoded by composite, three-level full factorial, and Box-Behnken designs, for 2-5 factors

# Factors  CCC/CCF  Three-level  Box-Behnken
2          8 + 3    9 + 3        ---
3          14 + 3   27 + 3       12 + 3
4          24 + 3   81 + 3       24 + 3
5          26 + 3   243 + 3      40 + 3

Overall, the CCC and CCF designs are most economical
Some parsimony is provided by the BB-designs in three and four factors as well, but with five factors the BB design is not an optimal choice
The big drawback of the three-level full factorial designs is the rapidly increasing number of experiments
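The run counts in the overview (excluding the three centre points) can be reproduced with a short sketch. Two assumptions are made here and are not stated on the slide: the composite design with five factors uses a half-fraction factorial core (16 corner runs), and the Box-Behnken design for three to five factors consists of all factor pairs at ±1 with the remaining factors at 0.

```python
from itertools import combinations, product


def composite_runs(k):
    """Corner plus axial runs of a CCC/CCF design (centre points excluded)."""
    # Assumption: full 2^k core up to four factors, half fraction for five
    corners = 2 ** k if k <= 4 else 2 ** (k - 1)
    return corners + 2 * k


def three_level_runs(k):
    """Runs of a three-level full factorial design."""
    return 3 ** k


def box_behnken_points(k):
    """Edge points of a Box-Behnken design (assumed valid for k = 3, 4, 5)."""
    points = set()
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * k
            point[i], point[j] = a, b
            points.add(tuple(point))
    return sorted(points)


for k in range(2, 6):
    bb = len(box_behnken_points(k)) if k >= 3 else None
    print(k, composite_runs(k), three_level_runs(k), bb)
```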

Additional Topics

Multi-level qualitative factors

Contents
Introduction Example: Cotton cultivation Regression modelling of multi-level qualitative factors Interpretation of regression models
regression coefficient plot interaction plot


Introduction
Example: Multilevel qualitative factors
Factor A is a qualitative factor with four levels, factor B a qualitative factor with three settings, and factor C a quantitative factor changing between -1 and +1
Selected objective: Screening and linear model
Full factorial design in 24 experiments is not the best choice
D-optimal design (open set or filled set) is a better alternative
[Figure: candidate points spanned by Factor A (Level 1 - Level 4), Factor B (Sett 1 - Sett 3) and Factor C (-1, +1)]

Example - Cotton cultivation


Regression analysis - Coefficient plot


Data support interaction model
Coefficient plot has 39 bars
  V: 4 bars
  C: 7 bars
  V*C: 28 bars
Groups of terms which cannot be split
Center has more impact than Variety
[Figure: Investigation Yates (MLR), scaled & centered coefficients for Yield (Extended); terms V(V1)-V(V4), C(C1)-C(C7) and all 28 products V(Vi)*C(Cj); N=28, DF=0, Conf. lev.=0.95]

Regression analysis - Interaction plot


In the case of multi-level qualitative factors, the interaction plot is especially informative
Best possible combination of factors is Variety #4 and Center #4
[Figure: Investigation Yates (MLR), interaction plot for V*C, response Yield; one line per variety V1-V4 plotted against Center C1-C7; N=28, DF=0]

Regression coding of qualitative variables


Qualitative variables require a special form of coding for regression analysis to work properly
A qualitative factor with k levels will have k-1 expanded terms in the model calculations

Expanded term   Level of factor
                V1   V2   V3   V4
V(V2)           -1    1    0    0
V(V3)           -1    0    1    0
V(V4)           -1    0    0    1
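The coding table can be generated mechanically: the first level is coded -1 in every expanded term, and each remaining level gets +1 in its own term and 0 elsewhere. The sketch below is an illustration of that rule with a hypothetical helper name, not the routine used by MODDE.

```python
def expand_qualitative(levels):
    """Code a k-level qualitative factor into k-1 expanded terms."""
    reference, others = levels[0], levels[1:]
    coding = {}
    for level in levels:
        if level == reference:
            # The first level is -1 in every expanded term
            coding[level] = [-1] * len(others)
        else:
            # Every other level is +1 in its own term, 0 elsewhere
            coding[level] = [1 if level == other else 0 for other in others]
    return coding


coding = expand_qualitative(["V1", "V2", "V3", "V4"])
for level, row in coding.items():
    print(level, row)
```

Each row lists the values of V(V2), V(V3) and V(V4) for one level, so the printed rows are the columns of the slide's table.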

Regular and extended lists of coefficients


The last extended term = negative sum of the other expanded terms
All extended coefficients of a qualitative factor sum to zero

Regular                  Extended
Yield      Coeff.        Yield      Coeff.
Constant   -0.25         Constant   -0.25
                         V          DF = 3
                         V(V1)      -0.035
V(V2)      -2.75         V(V2)      -2.75
V(V3)      -5.036        V(V3)      -5.036
V(V4)      7.821         V(V4)      7.821
Sum        0.035         Sum        0
                         C          DF = 6
                         C(C1)      -31.5
C(C2)      4             C(C2)      4
C(C3)      -19.5         C(C3)      -19.5
C(C4)      56.5          C(C4)      56.5
C(C5)      -30.75        C(C5)      -30.75
C(C6)      7.25          C(C6)      7.25
C(C7)      14            C(C7)      14
Sum        31.5          Sum        0
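The step from the regular to the extended list is plain arithmetic: the coefficient of the first level equals the negative sum of the other coefficients, so the extended set sums to zero. A minimal sketch using the coefficients from the lists above (the helper name is an illustrative assumption):

```python
regular_v = {"V(V2)": -2.75, "V(V3)": -5.036, "V(V4)": 7.821}
regular_c = {
    "C(C2)": 4, "C(C3)": -19.5, "C(C4)": 56.5,
    "C(C5)": -30.75, "C(C6)": 7.25, "C(C7)": 14,
}


def extend(regular, first_term):
    """Add the first-level term as the negative sum of the regular terms."""
    extended = {first_term: -sum(regular.values())}
    extended.update(regular)
    return extended


extended_v = extend(regular_v, "V(V1)")
extended_c = extend(regular_c, "C(C1)")
print(round(extended_v["V(V1)"], 3), extended_c["C(C1)"])
```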

Generation of designs with multi-level qualitative factors


With two-level qualitative factors standard two-level factorial designs apply
With multi-level qualitative factors a D-optimal design is a more sensible choice
Important: Balancing
[Figure: selected design points in the region spanned by Factor A (Level 1 - Level 4), Factor B (Sett 1 - Sett 3) and Factor C (-1, +1)]

Summary
The interaction plot is an informative tool in regression modelling
Expansion of qualitative factors in regression modelling gives regular and extended mode coefficient plots
Multi-level qualitative factors are well handled with D-optimal design

Additional Topics

The Taguchi approach to robust design

Contents
The Taguchi approach to robust design
Inner and outer arrays of factors
Classical analysis approach
Interaction analysis approach
Examples
  CakeMix
  DrugD
  LoafVolume
Studying of expensive and inexpensive factors

Robust design vs. Robustness testing


In the Taguchi approach, robustness has a different connotation and objective
The objective is to find conditions where the responses simultaneously have values close to target and low variability
Factors are often varied in large intervals and with designs very different from those used in robustness testing
Factors are varied in inner and outer arrays, which often means many runs
In robustness testing small factor intervals are usually used

The Taguchi approach - Three phases


product design
  measures quality in terms of the loss suffered by society caused by product variability around a specified target (loss function)
  a desirable product is one for which the total loss is acceptably small
  specifies an acceptable region within which the final product design can lie

parameter design
  equivalent to using DOE for finding optimal settings of the process variables

tolerance design
  takes place when optimal factor settings have been specified
  tolerances on the factors are further adjusted if variability in the product quality is unacceptably high
  accomplished by using a mathematical model of the process, and the loss function belonging to the product property of interest

Arranging factors in inner and outer arrays


Design factors
  easy to control
  affect the mean
  form the inner array
Noise factors
  hard to control
  may or may not affect the process mean, and the spread around the mean
  comprise the outer array
Example: CakeMix
[Figure: inner array in Flour (200 - 400), Shortening (50 - 100) and Eggpowder (50 - 100); at each inner-array point an outer array in Temp (175 - 225) and Time (30 - 50)]

CakeMix application
Inner and outer array system requires many experiments
CakeMix: 11*5 = 55 experiments
Experimental goal was to find levels of the three ingredients producing a good cake
  (a) when the noise factors temperature and time were correctly set according to the instructions on the box, and
  (b) when deviations from these specifications occur

In this kind of testing, the producer has to consider worst-case scenarios corresponding to what the consumer might do with the product, and let these considerations regulate low and high levels of the noise factors
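The count 11*5 = 55 follows from running every inner-array point at every outer-array setting. The sketch below crosses the two arrays. The levels are taken from the CakeMix worksheet shown later in this section (a 2^3 factorial with three centre points for the ingredients, and four corners plus one centre point for Temp and Time); the variable names are illustrative.

```python
from itertools import product

# Inner array: Flour, Shortening, Eggpowder (8 corners + 3 centre points)
inner = list(product((200, 400), (50, 100), (50, 100))) + [(300, 75, 75)] * 3

# Outer array: Temp, Time (4 corners + 1 centre point)
outer = list(product((175, 225), (30, 50))) + [(200, 40)]

# Crossed design: every inner-array point at every outer-array setting
crossed = [recipe + baking for baking in outer for recipe in inner]
print(len(inner), len(outer), len(crossed))
```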

The classical analysis approach


For each experimental point in the inner array, two responses are formed:
  average response for the five outer array experiments (Taste)
  StDev from the five outer array experiments (StDev)

The classical analysis approach


Questions:
  Which factors affect the variation (StDev) only?
  Which factors affect the mean level (Taste) only?
  And which affect both responses?
Note that with this approach, there will be no model terms related to the noise factors
Standard deviation responses tend to be non-normally distributed, and log-transformation is common practice
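The two classical-analysis responses can be sketched directly from the CakeMix worksheet. For each inner-array point the five outer-array Taste values are averaged, and their standard deviation is log-transformed. The example uses inner-array points 1 and 6 with Taste values from the 55-run table in this section; the use of the sample standard deviation (n - 1) and a base-10 logarithm is an assumption, since the slides do not state either.

```python
from math import log10
from statistics import mean, stdev

# Taste at the five outer-array settings (Temp/Time) for two inner-array points
outer_taste = {
    1: [1.1, 5.7, 6.4, 1.3, 3.1],  # worksheet runs 1, 12, 23, 34, 45
    6: [5.0, 6.0, 5.9, 5.7, 6.9],  # worksheet runs 6, 17, 28, 39, 50
}

# Two responses per inner-array point: mean level and log-transformed spread
summary = {
    point: {"Taste": mean(values), "LogStD": log10(stdev(values))}
    for point, values in outer_taste.items()
}

for point, responses in summary.items():
    print(point, round(responses["Taste"], 2), round(responses["LogStD"], 2))
```

Point 6 combines the higher average with the lower spread, in line with the remark that run #6 appears promising.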

Evaluation of raw data


Replicate plots show no outliers
Responses are inversely correlated and run #6 appears promising
[Figures: Investigation CakeTaguchi_classical: plots of replications for Taste and for LogStD, and raw data plot of LogStD versus Taste, all with experiment number labels]

Modelling results, interaction model


Negative Q2 of StDev
Two non-significant two-factor interactions
[Figure: Investigation CakeTaguchi_classical (MLR), summary of fit (R2, Q2, Model Validity, Reproducibility) for Taste and LogStD; N=11, DF=4, Cond. no.=1.1726, Y-miss=0]
[Figure: Scaled & centered coefficients for Taste (terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg); N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95]
[Figure: Scaled & centered coefficients for LogStD (same terms); N=11, DF=4, R2=0.959, Q2=-0.284, R2 Adj.=0.898, RSD=0.0540, Conf. lev.=0.95]

Modelling results, refined model


Model for StDev has improved
Sh*Egg interaction is much smaller for StDev than for Taste
Flour causes most spread around the average
[Figure: Investigation CakeTaguchi_classical (MLR), summary of fit (R2, Q2, Model Validity, Reproducibility) for Taste and LogStD; N=11, DF=6, Cond. no.=1.1726, Y-miss=0]
[Figure: Scaled & centered coefficients for Taste (terms Fl, Sh, Egg, Sh*Egg); N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95]
[Figure: Scaled & centered coefficients for LogStD (same terms); N=11, DF=6, R2=0.939, Q2=0.677, R2 Adj.=0.899, RSD=0.0538, Conf. lev.=0.95]

Interpretation of refined model


Interaction term is more important for Taste than for StDev
The two lines cross each other in the plot related to Taste, but not in the other interaction plot
Both plots indicate that a low level of Shortening and a high level of Eggpowder is favorable for high Taste and low StDev

Interpretation of refined model


Response contour plots useful for interpretation

The best cake mix conditions are found in the upper left-hand corner: Flour = 400 g, Shortening = 50 g, and Eggpowder = 100 g

Limitations of classical analysis approach


Which noise factors are important?
There are no noise factors in the model!!!!!

For the Taguchi method to be really successful, one would need to be able to estimate the impact of the noise factors and possible interactions between the design and the noise factors
The existence of such noise-design factor interactions is crucial, otherwise the noise (variability) cannot be reduced by changing some design factors

The interaction analysis approach


Information regarding important noise-design factor interactions can be extracted if inner and outer arrays are combined into one single design
Expectation: what were design factor effects on StDev in the classical approach now correspond to noise-design factor cross-terms

No  Flour  Shortening  Eggpowder  Temp  Time  Taste
1   200    50          50         175   30    1.1
2   400    50          50         175   30    3.8
3   200    100         50         175   30    3.7
4   400    100         50         175   30    4.5
5   200    50          100        175   30    4.2
6   400    50          100        175   30    5
7   200    100         100        175   30    3.1
8   400    100         100        175   30    3.9
9   300    75          75         175   30    3.5
10  300    75          75         175   30    3.4
11  300    75          75         175   30    3.4
12  200    50          50         225   30    5.7
13  400    50          50         225   30    4.9
14  200    100         50         225   30    5.1
15  400    100         50         225   30    6.4
16  200    50          100        225   30    6.8
17  400    50          100        225   30    6
18  200    100         100        225   30    6.3
19  400    100         100        225   30    5.5
20  300    75          75         225   30    5.15
21  300    75          75         225   30    5.3
22  300    75          75         225   30    5.4
23  200    50          50         175   50    6.4
24  400    50          50         175   50    4.3
25  200    100         50         175   50    6.7
26  400    100         50         175   50    5.8
27  200    50          100        175   50    6.5
28  400    50          100        175   50    5.9
29  200    100         100        175   50    6.4
30  400    100         100        175   50    5
31  300    75          75         175   50    4.3
32  300    75          75         175   50    4.05
33  300    75          75         175   50    4.1
34  200    50          50         225   50    1.3
35  400    50          50         225   50    2.1
36  200    100         50         225   50    2.9
37  400    100         50         225   50    5.2
38  200    50          100        225   50    3.5
39  400    50          100        225   50    5.7
40  200    100         100        225   50    3
41  400    100         100        225   50    5.4
42  300    75          75         225   50    4.1
43  300    75          75         225   50    3.8
44  300    75          75         225   50    3.8
45  200    50          50         200   40    3.1
46  400    50          50         200   40    3.2
47  200    100         50         200   40    5.3
48  400    100         50         200   40    4.1
49  200    50          100        200   40    5.9
50  400    50          100        200   40    6.9
51  200    100         100        200   40    3
52  400    100         100        200   40    4.5
53  300    75          75         200   40    6.6
54  300    75          75         200   40    6.5
55  300    75          75         200   40    6.7

The interaction analysis approach


No strong noise-design factor interaction!
[Figures: Investigation CakeTaguchi_interaction (MLR), response Taste: summary of fit (R2, Q2, Model Validity, Reproducibility); normal probability plot of deleted studentized residuals with experiment number labels; scaled & centered coefficients (terms Fl, Sh, Egg, Te, Ti, Fl*Sh, Fl*Egg, Fl*Te, Fl*Ti, Sh*Egg, Sh*Te, Sh*Ti, Egg*Te, Egg*Ti, Te*Ti). N=55, DF=39, Cond. no.=1.3110, Y-miss=0, R2=0.605, Q2=0.185, R2 Adj.=0.453, RSD=1.0545, Conf. lev.=0.95]

The interaction analysis approach


Six terms removed and three-factor interaction added
An important three-factor interaction, Fl*Te*Ti!
[Figures: Investigation CakeTaguchi_interaction (MLR), response Taste: summary of fit (R2, Q2, Model Validity, Reproducibility); scaled & centered coefficients (terms Fl, Sh, Egg, Te, Ti, Sh*Egg, Fl*Te, Fl*Ti, Te*Ti, Fl*Te*Ti). N=55, DF=44, Cond. no.=1.3110, Y-miss=0, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751, Conf. lev.=0.95]

Interpretation of the three-factor interaction


By adjusting Flour to 400 g the spread in Taste due to variations in Temperature and Time is minimized
[Figure: Investigation CakeTaguchi_interaction (MLR), interaction plot for Fl*Te*Ti, response Taste; Taste versus Flour (200 - 400) with one line for each combination of Te (low/high) and Ti (low/high); large variation between the lines at Flour = 200, small variation at Flour = 400. N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751]

Response contour plots


Flatness at Flour 400g indicates sufficient robustness towards consumers not following baking instructions

Sh = 50, Egg = 100


A second example - DrugD


Classical analysis approach

Interaction analysis approach


DrugD - The classical analysis approach


Some model refinement necessary
After: Strong model for OneHour, and no model for log SD1h (robust)
All factors but Volume influence the average release
[Figures: Investigation DrugD - classical (MLR), two summary of fit plots (R2, Q2, Model Validity, Reproducibility) for OneHour and SD1h~: one with N=27, DF=12, Cond. no.=6.6122, Y-miss=0 and one with N=27, DF=18, Cond. no.=5.9888, Y-miss=0]

DrugD - Graphical evaluation


Flatness of the response surface: the difference between the highest and lowest measured values is as low as 4.1%
OneHour is robust

Temp = 39, PropSpeed = 100

DrugD - The interaction analysis approach


Partially quadratic model with R2 = 0.79 and Q2 = 0.74
N-plot and ANOVA indicate model validity
[Figures: Investigation DrugD - interaction, response OneHour: plot of replications with experiment number labels; histogram of OneHour; summary of fit (MLR; R2, Q2, Model Validity, Reproducibility); normal probability plot of deleted studentized residuals. N=162, DF=147, Cond. no.=6.6122, Y-miss=0, R2=0.791, Q2=0.745, R2 Adj.=0.771, RSD=0.5867]

DrugD - Modelling results from interaction analysis


Bath B2 provides slightly higher numerical values than other baths
Effect of B2 is weak and must not be over-interpreted (right-hand plot)
The recognition of this small term is important for further fine-tuning of the experimental equipment
[Figures: Investigation DrugD - interaction (MLR), scaled & centered coefficients for OneHour (Extended), shown on two different y-axis scales; terms Vol, Ba(B1)-Ba(B6), Te, PrS, pH, Vol*Vol, Te*Te, PrS*PrS, pH*pH, Vol*pH. N=162, DF=147, R2=0.791, Q2=0.745, R2 Adj.=0.771, RSD=0.5867, Conf. lev.=0.95]

A third example: LoafVolume


Investigation of which factors affect loaf volume
Target volume = 530 cm^3
Inner array: Recipe
  mixture of three wheat flours (Tjalve, Folke, Hard RS)
Outer array: Baking conditions which vary from bakery to bakery
  mixing time of dough
  proofing time of dough

LoafVolume - Classical analysis approach


PLS used for mixture data
Strong model for volume
Weaker model for StDev
Folke and HardRS affect volume
[Figure: Investigation Loafvol2 (PLS, comp.=2), summary of fit (R2, Q2) for loafvolume and stdev; N=10, DF=4, Cond. no.=6.8608, Y-miss=0]

Model interpretation
Is it possible to get a volume of 530 and minimize spread?

Arrow shows best compromise


LoafVolume - Interaction analysis approach


Use PLS
Proofing time important
[Figures: Investigation Loafvolume (PLS, comp.=2): summary of fit (R2, Q2) for loafvolume; score scatter plots t[1] vs u[1] and t[2] vs u[2] with experiment number labels; scaled & centered coefficients for loafvolume in cm^3 (linear, squared and interaction terms in Mi, Pr, Tj, Fo, Ha). N=90, DF=75, Cond. no.=8.2742, Y-miss=0, R2=0.894, Q2=0.754, R2 Adj.=0.874, RSD=22.6934, Conf. lev.=0.95]

Model interpretation
Volume sensitive to changes in proofing time

With short proofing time goal is not obtainable


Model interpretation at compromise point


What does the contour plot look like at the mixture Tjalve 0.25, Folke 0.11, HardRS 0.64?

Volume is sensitive (= not robust) to changes in proofing and mixing time (which was discovered in the classical analysis approach as well)

An additional element of robust design


Sometimes it is more appropriate to distinguish between factors which are expensive and inexpensive to vary
Example: drilling
  Expensive, drill features: diameter, length, geometry
  Cheap, machine conditions: cutting speed, feed rate, cooling (yes/no)
Inner array: Expensive factors
Outer array: Cheap factors
17 experiments per drill!

Summary
We have discussed
  the Taguchi approach to robust design
  the concept of inner and outer arrays of factors
  the classical analysis approach
  the interaction analysis approach
  how to handle robust design testing when some factors are expensive and some inexpensive to vary

Additional Topics

Simultaneous optimization of several responses fitted with different models

Contents
Background Example Data analysis Linked response Simultaneous optimization


Background
When is there an interest in fitting different models to different responses?
When working with many responses that are grouped
  A PLS model fitted to grouped responses tends to have many components and be difficult to interpret
Selectivity among responses
  A tailor-made model for each response may facilitate optimization toward a factor combination where selectivity among responses is obtained

Example: TruckEngine
Create one investigation for each response


Data Analysis: TruckEngine


Fit a unique model to each response (Q2 maximization)
[Figures: scaled & centered coefficients (MLR) for Fuel (investigation TruckE_Fuel, mg/st), NOx (investigation TruckE_NOx, mg/s) and Soot~ (investigation TruckE_Soot); model terms are drawn from Air, NL, EGR, their squares and two-factor interactions]
Fit statistics printed under the three plots (N=17, Conf. lev.=0.95):
  DF=10, R2=0.985, Q2=0.959, R2 Adj.=0.976, RSD=1.9080
  DF=13, R2=0.945, Q2=0.917, R2 Adj.=0.932, RSD=0.1188
  DF=9, R2=0.997, Q2=0.987, R2 Adj.=0.995, RSD=0.4624

Linking responses into a MODDE investigation


Investigation dealing with Fuel was chosen as reference project Define a new response (NOx) and find its root investigation


Linking responses into a MODDE investigation


Select the response(s) of interest
First NOx
Then repeat the whole procedure for Soot

Linking responses into a MODDE investigation


Resulting worksheet
All settings regarding responses + coefficients of tailor-made models are brought into the new base project

Optimization results
Simplex #5 most successful


Optimization results
Bring optimization results to response contour plots, or SweetSpot plot

Air = 240

Summary
Linked responses can be used when responses are not correlated
This means that one may, e.g., use PLS in a mother project for the analysis of a group of correlated responses, and then attach (link) another response and its model (MLR) coefficients prior to optimization
Flexibility/Selectivity
  outliers more easily eliminated
  PLS/MLR
  different models for different responses
Requirement: Same basic worksheet

Additional Topics

Partial Least Squares Projections to Latent Structures, PLS

Contents
Introduction to PLS Geometrical interpretation of PLS LOWARP example


When to use PLS


PLS is a pertinent choice if
  (i) there are several correlated responses in the data set,
  (ii) the experimental design has a high condition number (>10), or
  (iii) there are small amounts of missing data in the response matrix

Notation
K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components

T = matrix of X-scores with columns t1, ..., tA (vectors)
P = matrix of X-loadings with columns p1, ..., pA (vectors)
W = matrix of PLS X-weights with columns w1, ..., wA (vectors)
U = matrix of Y-scores with columns u1, ..., uA (vectors)
C = matrix of PLS Y-weights with columns c1, ..., cA (vectors)

Scaling of variables
[Figure: three variables x1, x2, x3 drawn with their measured values and "length", before and after unit variance scaling]
Defining/selecting the length of the variable axes (X- and Y-spaces)
Recommended: set each axis to unit length (unit variance scaling)

PLS -- Geometric Interpretation, 1


[Figure: data matrices X (N observations, K=3 factors/predictors) and Y (N observations, M=3 responses), with the X-space (axes x1, x2, x3) and the Y-space (axes y1, y2, y3)]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K=M=3)
Each X- and Y-variable has one coordinate axis with the length defined by its scaling, typically unit variance

PLS -- Geometric Interpretation, 2

Each observation is represented by one point in the X-space and one in the Y-space.
As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems.

PLS -- Geometric Interpretation, 3


[Figure: X-space (x1, x2, x3) and Y-space (y1, y2, y3); the same observation is marked in both spaces]

The mean-centering procedure implies that the origins of the coordinate systems are repositioned.

PLS -- Geometric Interpretation, 4


[Figure: Comp 1 (t1) in X-space and Comp 1 (u1) in Y-space, with the projection of observation i]

The first PLS component is a line in X-space and a line in Y-space, calculated to (a) approximate the point swarms in X and Y well and (b) maximize the covariance between the projections (t1 and u1).
These lines pass through the average points.

PLS -- Geometric Interpretation, 5

The projection coordinates, t1 and u1, in the two spaces, X and Y, are connected and correlated through the inner relation $u_{i1} = t_{i1} + h_i$ (where $h_i$ is a residual).
The slope of the dotted line is 1.0.

PLS -- Geometric Interpretation, 6


[Figure: Comp 1 (t1) and Comp 2 (t2) in X-space; Comp 1 (u1) and Comp 2 (u2) in Y-space]

The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also going through the average points. These lines, t2 and u2, improve the approximation and correlation as much as possible.

PLS -- Geometric Interpretation, 7

The second projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables.
By inserting the X-values of a new observation into the model, we obtain its t1- and t2-scores, which through the inner relation give values of u1 and u2, which, in turn, enable predicted values of Y to be computed.

PLS predictions
A new observation is similar to the training set if it is inside the tolerance cylinder in X-space.
Then its projection on the X-model (t) can be entered into the T-U relation, giving a u-value for each model dimension.
These values define a point on the Y-space model, which, in turn, corresponds to a predicted value for each y-variable.

PLS -- Geometric Interpretation, 8


[Figure: Comp 1 (t1) and Comp 2 (t2) in X-space; Comp 1 (u1) and Comp 2 (u2) in Y-space]

The PLS components form planes in the X- and Y-spaces.
The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.

PLS -- Geometric Interpretation, 9


Repeated plotting of successive pairs of latent variables will give a good appreciation of the correlation structure


PLS, Overview
$X = 1\bar{x}' + TP' + E$
$Y = 1\bar{y}' + UC' + F = 1\bar{y}' + TC' + G$ (because $U = T + H$, the inner relation)

Differences between PLS and PCA:
PLS: projection of X that both approximates X well and correlates with Y
PCA: projection of X that is an optimal approximation of X (least squares fit)

PLS, Parameter properties


For each component:
1) t is a linear combination of the X variables with weights w
   - t is a summary of the X variables that are correlated with Y
2) u is a linear combination of the Y variables with weights c
   - u is a summary of the Y variables
3) w are the correlation coefficients between the x's and u
   - Columns of X highly correlated with Y are given high weights
4) At convergence, for the orthogonality:
   - p is computed so that t*p' is the "best approximation of X"
   - t*p' is removed from X for the next component
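
The per-component steps can be sketched with the classical NIPALS iteration. This is a minimal illustration, assuming NumPy and already centered (and scaled) X and Y; it is not the implementation used in MODDE or SIMCA-P, and the function name `pls_nipals`, the tolerance and the simulated data are assumptions made for this example. With the scaling of c and u chosen here, the regression of u on t has slope 1, in line with the inner relation U = T + H.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS by NIPALS on centered X (N x K) and Y (N x M); returns T, U, W, P, C, B."""
    X = np.array(X, dtype=float)   # copies, so the caller's data are not modified
    Y = np.array(Y, dtype=float)
    T, U, W, P, C = [], [], [], [], []
    for _ in range(n_components):
        u = Y[:, [np.argmax(Y.var(axis=0))]]      # start from the most variable y column
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)               # X-weights
            w /= np.linalg.norm(w)
            t = X @ w                             # X-scores: linear combination of X
            c = Y.T @ t / (t.T @ t)               # Y-weights
            u_new = Y @ c / (c.T @ c)             # Y-scores: linear combination of Y
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)                   # X-loadings: t p' approximates X
        X = X - t @ p.T                           # remove t p' before the next component
        Y = Y - t @ c.T
        for store, vec in zip((T, U, W, P, C), (t, u, w, p, c)):
            store.append(vec)
    T, U, W, P, C = (np.hstack(m) for m in (T, U, W, P, C))
    B = W @ np.linalg.inv(P.T @ W) @ C.T          # coefficients for Y_hat = X @ B
    return T, U, W, P, C, B

# Simulated example: 20 observations, K = 3 factors, M = 2 responses
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = X @ np.array([[1.0, 0.5], [-2.0, 0.0], [0.0, 1.5]]) + 0.1 * rng.normal(size=(20, 2))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
T, U, W, P, C, B = pls_nipals(Xc, Yc, n_components=3)
```

A new centered observation `x_new` would be predicted as `x_new @ B`. With as many components as X variables, the coefficients coincide with ordinary least squares; using fewer components is what gives PLS its stabilizing effect.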


The LOWARP application


Production of a polymer
Four factors, ingredients, were varied according to a 17-run mixture design
14 responses were measured
The desired combination was low warp/shrinkage and high strength

LOWARP worksheet
Contains some missing data and many correlated responses


PLS model interpretation - R2/Q2 & scores


[Figures, Investigation: Lowarp (PLS, comp.=3): PLS Total Summary (cum) showing R2 and Q2 for Comp1 to Comp3; score scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels. N=17, DF=13, Cond. no.=2.0457, Y-miss=10]

After three components R2 = 0.75 and Q2 = 0.53.

PLS model interpretation - Loadings

[Figure, Investigation: Lowarp (PLS, comp.=3): loading scatter plot wc[1] vs wc[2], shown twice, with the variables st1-st6, w1-w8, gl, mi, am and cr]

PLS model interpretation - Loadings & Coefficients


Coefficient profiles for correlated and uncorrelated responses

[Figure, Investigation: Lowarp (PLS, comp.=3): loading scatter plot wc[1] vs wc[2]. N=17, DF=13, Cond. no.=2.0457, Y-miss=10]

PLS model interpretation - Coefficients & VIP


[Figures, Investigation: Lowarp (PLS, comp.=3): loading scatter plots wc[1] vs wc[2], wc[1] vs wc[3] and wc[2] vs wc[3], and the Variable Importance Plot (VIP) for mi, gl, am and cr. N=17, DF=13, Cond. no.=2.0457, Y-miss=10]

Variable importance for projection, VIP, is the most condensed way of expressing variable-related information.

Summary PLS
PLS is a multivariate regression method which is useful for handling complex DOE problems.
PLS is especially useful when:
  (i) there are several correlated responses in the data set
  (ii) the experimental design has a high condition number
  (iii) there are small amounts of missing data in the response matrix

PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them.
All diagnostic tools available for MLR are retained for PLS.
In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP.

Additional Topics

Design in Latent Variables

Contents
Introduction: what is design in latent variables?
  Multivariate characterization
  Selecting informative molecules; COST vs DOE
  SMD: Increasing reliability of model and data
Example: Lead finding and lead optimization
Example: Onion and an overview of design families
  FDs and FFDs
  D-optimal design
  Cell-based & Grid-based design
  Space filling design
  Onion design principles
Onion design: three examples
Summary

Introduction
In QSAR the central idea is to develop a model based on a small training set, and to calculate predictions for large numbers of non-tested compounds.
This means that the few chemicals in the training set should be representative and have a balanced distribution.
How do we accomplish this?
  Multivariate characterisation: data matrix examined by PCA
  Principal properties (few, orthogonal)
  Statistical Molecular Design (SMD) in principal properties (PP)
  Compounds are selected by matching the PP-scores to the chosen design
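
The selection step can be illustrated as follows. This is a sketch under stated assumptions, not the procedure of any particular software: it assumes NumPy, uses a simulated descriptor matrix, computes PCA scores by singular value decomposition, rescales each score vector to the coded range -1 to +1, and then picks for every design point the nearest compound that has not already been chosen. The function names and the nearest-neighbour matching rule are hypothetical choices made for the illustration.

```python
import itertools
import numpy as np

def pca_scores(X, n_components):
    """Scores of a PCA on unit-variance-scaled, mean-centered data."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def select_by_design(scores, design):
    """For each design point, pick the closest compound not chosen before."""
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    coded = 2 * (scores - lo) / (hi - lo) - 1      # rescale scores to [-1, +1]
    chosen = []
    for point in design:
        dist = np.linalg.norm(coded - point, axis=1)
        if chosen:
            dist[chosen] = np.inf                  # never select a compound twice
        chosen.append(int(np.argmin(dist)))
    return chosen

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(60, 8))             # hypothetical: 60 compounds, 8 descriptors
scores = pca_scores(descriptors, 2)                # two principal properties
design = np.array(list(itertools.product([-1, 1], repeat=2)) + [(0, 0)])  # 2^2 design + centre point
training_set = select_by_design(scores, design)    # row indices of the selected compounds
```

In practice the compound closest to a design corner may still lie far from it, so the selected set should be inspected in the score plot.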


Multivariate Characterization: an important step in SMD

A way to quantify qualitative, discrete, changes. The chemical descriptors must account for the dominant properties of the compounds, i.e. the principal properties that are known or anticipated to influence biological activity.

Properties such as
  hydrophobicity
  steric properties (size)
  electronic properties
  (chemical) reactivity

PCA of the multi-property matrix gives the (latent) principal properties in terms of the principal component scores.

COST versus Statistical Molecular Design

The COST Approach
Vertical line: A is held constant while varying B
Horizontal line: B is kept constant while varying A

The Design Approach
Both factors A and B are varied simultaneously. This results in a better and more efficient mapping of the modelled response.

Introduction: The intuitive approach - COST


[Figure: PCA score plot t[1] vs t[2] (ESREVID.M9, work set) of the 60 haloalkanes, with Hotelling T2 ellipse (0.05); annotations: Density, Mw/log P]

Chemical map of 60 haloalkanes
Trace of COSTing
Problem: Limited range of applicability & reliability

Increasing reliability of model & data: SMD


Trace of COST approach

Structural factor space


Increasing reliability of model & data: SMD


SMD efficiently fills space; few points, much information

Result from long COSTing

Results from COST-ing



SMD Factorial and fractional factorial designs

Two-level factorial and fractional factorial designs with centre points are useful in QSAR modelling


Selecting the Training Set


DV1-DV4: design variables (sign columns; only the + and 0 entries survive in this copy)
PP1-PP4: principal property scores of the ten selected compounds

No.   Compound             PP1     PP2     PP3     PP4
[52]  CH3-CH2-CH2-CH2Br    -0.72   -1.26   -1.29   -0.51
[48]  CH3-CHCl-CH3          1.96   -0.86   -0.81    0.15
[33]  CH3-CHBr2            -1.77    1.22   -0.14   -0.08
[30]  CH3-CH2Br             1.2     0.89   -0.9    -0.12
[15]  CHCl2-CHCl2          -1.69   -0.83    0.92    0.7
[07]  CCl3F                 1.14   -0.4     0.95   -1.1
[39]  CBr3F                -3.2     1.68    1.07   -1.9
[02]  CH2Cl2                1.92    0.79    0.13    0.7
[3]   CHCl3                 0.52    0.22    0.7     0.48
[11]  CH2Cl-CH2Cl           0.56    0.03    0.28    1.54

Balanced coverage of score plot!

Example of SMD: Surfactants


The 8 lipophilic surfactants were excluded, and an updated PC-model was computed: R2X = 0.76, Q2 = 0.55, A = 3.

[Figure: PCA score plot t[1] vs t[2] (Surfact2.M2, PCA of sub-set, work set) with Hotelling T2 ellipse (0.05)]

Selected surfactants: 2, 5, 8, 9, 11, 30, 31, 33, 37, 38

Lead finding and lead optimization with SMD


A desired pharmacological profile based on five biological tests was specified.
Eight commercially available compounds (Sub1-Sub8) were tested in the five biological tests.

[Figure: chemical structures of compounds 1 to 8]

Lead finding and lead optimization with SMD


PCA (R2 = 0.71) gives:

[Figures (aldrich.M2, PCA-X, overview PCA model): score plot t[1] vs t[2] showing Target and Sub1-Sub8; loading plot p[1] vs p[2] showing Test1-Test5]

Substances 1 and 2 are promising as leads. Sub2 would be the natural first choice.
Some redundancy among the five tests. Three tests are sufficient in the future, e.g. 2, 3 and 4.

Substituent scales for aromatic substituents


t1 t2 -0.11 -0.68 -0.75 -0.03 -0.62 -0.18 -0.50 -0.24 -0.29 -0.24 -0.31 -0.26 -0.05 -0.08 -0.53 -0.25 -0.83 -0.11 -0.09 -0.20 -0.44 -0.33 -0.18 -0.45 -0.36 -0.86 -0.16 -0.54 -0.05 -0.59 -0.50 -0.45 -0.20 -0.13 -0.63 -0.06 t3 -0.04 -0.55 -0.18 -0.10 -0.28 -0.06 -0.66 -0.14 -0.06 -0.41 -0.32 -0.60 -0.60 -0.49 -0.50 -0.46 -0.35 -0.19 -0.19 -0.21 -0.24 -0.31 -0.63 -0.25 -0.67 -0.33 -0.45 -0.30 -0.53 -0.30 -0.59 -0.64 -0.62 -0.47 -1.00 -0.22 C CC6H5 NHCOC6H5 -1 +1 -1 F H OH SH NH2 NHNH2 NHCN CH2Cl NHCHO NHCONH2 OCH3 CH2OH SCH3 NHCH3 C CH CH2CN CH=CH2 NHCOCH3 CH2CH3 OCH2CH3 CH2OCH3 SC2H5 NHC2H5 NHCOOC2H5 OCHMe2 i-C4H9 +1 +1 -1 CH2CH2COOH OC3H7 SC3H7 NHC3H7 OC4H9 NHC4H9 N=CHC6H5

Suppose we select Sub2 as our lead.
Convention: 'OH' = pos 1, 'ortho-Cl' = pos 2, 'para-Cl' = pos 3; quinoline scaffold not varied.
Substituent descriptors (principal properties) taken from Skagerberg et al., QSAR 8 (1989), 32-38.

-1 -1 -1 Cl NO NO2 N3 SO2NH2 OCF3 CN NCS SCN CHO COOH CONH2 CH=NOH NHCSNH2 SOCH3 OSO2CH3 SO2CH3 NHSO2CH3 NHCOCF3 CH=CHNO2 COCH3 SCOCH3 OCOCH3 COCH3 CONHCH3 SO2C2H5 +1 -1 -1 N=CCl2 COOC2H5 CH=CHCOCH3 COOC3H7 N=NC6H5 OSO2C6H5 NHSO2C6H5 OCOC6H5 CHNC6H5 CH2OC6H5 -0.59 -0.77 -0.68 -0.25 -0.54 -0.30 -0.57 -0.32 -0.40 -0.64 -0.53 -0.50 -0.27 -0.21 -0.48 -0.39 -0.43 -0.31 -0.32 -0.27 -0.46 -0.12 -0.29 -0.23 -0.25 -0.15 0.06 0.10 0.10 0.38 0.80 0.78 0.71 0.74 0.74 0.83

t1 1.00 0.88 -0.88 -1.00 -0.83 -0.57 -0.72 -0.57 -0.55 -0.43 -0.41 -0.17 -0.45 -0.52 -0.29 -0.45 -0.33 -0.44 -0.29 -0.19 -0.32 -0.19 -0.25 -0.02 -0.09 -0.07 -0.12 -0.03 0.09 0.14 0.28 0.23 0.43 0.51 0.82

t2 -0.26 -0.07 0.17 0.61 0.55 0.05 0.80 0.63 0.13 0.18 0.04 0.20 0.40 0.28 0.15 1.00 0.02 0.17 0.20 0.06 0.47 0.40 0.20 0.08 0.77 0.18 0.45 0.25 0.19 0.33 0.09 0.74 0.34 0.74 0.29

t3 -0.13 -0.35 -0.36 -0.48 -0.52 -0.10 -0.51 -0.41 -0.25 -0.16 -0.65 -0.72 -0.36 -0.55 -0.06 -0.26 -0.45 -0.34 -0.13 -0.68 -0.01 -0.52 -0.64 -0.06 -0.36 -0.06 -0.14 -0.14 -0.56 -0.40 -0.02 -0.35 -0.35 -0.32 -0.97

t1 -1 -1 +1 Br SOOF SF5 I CF3 SCF3 SOOCF3 CF2CF3 PMe2 COC3H7 +1 -1 +1 2-Thienyl SOOC6H5 COC6H5 -1 +1 +1 CH2Br CH2I CH3 NMe2 Cyclo-Pr CHMe2 C3H7 t-C4H9 CH2C6H5 +1 +1 +1 s-C4H9 n-C4H9 C5H11 C6H5 OC6H5 NHC6H5 CycloHex -0.48 -0.63 -0.28 -0.30 -0.61 -0.15 -0.41 -0.35 -0.24 -0.11 0.26 0.22 0.04 -0.33 -0.15 -0.64 -0.34 -0.24 -0.17 -0.04 -0.10 -0.04 0.06 0.28 0.56 0.36 0.07 0.13 0.46

t2 -0.20 -0.95 -0.81 -0.22 -0.40 -0.36 -1.00 -0.44 -0.10 -0.60 -0.03 -0.84 -0.50 0.16 0.17 0.52 0.79 0.32 0.23 0.43 0.19 0.37 0.07 0.42 0.38 0.01 0.08 0.51 0.24

t3 0.06 0.09 0.35 0.22 0.33 0.18 0.28 0.46 0.39 0.47 0.12 0.18 0.84 0.10 0.25 0.00 0.12 0.26 0.66 0.00 0.95 1.00 0.50 0.02 0.04 0.17 0.71 0.64 0.63


Selection of representative substituents


Select one representative for each substituent category.
Avoiding the most peculiar substituents, a candidate list might be the following:

t1   t2   t3   Substituent
-1   -1   -1   -NO2
+1   -1   -1   -COOC2H5
-1   +1   -1   -H
+1   +1   -1   -OC3H7
-1   -1   +1   -I (or Br)
+1   -1   +1   -COC6H5
-1   +1   +1   -CH(CH3)2
+1   +1   +1   -C6H5

Construction of lead-centered multivariate design


2^(9-5) FFD in 16 runs

          Position 1        Position 2        Position 3
CompNo    a    b    abcd    c    abc  bcd     d    acd  abd
 1       -1   -1   +1      -1   -1   -1      -1   -1   -1
 2       +1   -1   -1      -1   +1   -1      -1   +1   +1
 3       -1   +1   -1      -1   +1   +1      -1   -1   +1
 4       +1   +1   +1      -1   -1   +1      -1   +1   -1
 5       -1   -1   -1      +1   +1   +1      -1   +1   -1
 6       +1   -1   +1      +1   -1   +1      -1   -1   +1
 7       -1   +1   +1      +1   -1   -1      -1   +1   +1
 8       +1   +1   -1      +1   +1   -1      -1   -1   -1
 9       -1   -1   -1      -1   -1   +1      +1   +1   +1
10       +1   -1   +1      -1   +1   +1      +1   -1   -1
11       -1   +1   +1      -1   +1   -1      +1   +1   -1
12       +1   +1   -1      -1   -1   -1      +1   -1   +1
13       -1   -1   +1      +1   +1   -1      +1   -1   +1
14       +1   -1   -1      +1   -1   -1      +1   +1   -1
15       -1   +1   -1      +1   -1   +1      +1   -1   -1
16       +1   +1   +1      +1   +1   +1      +1   +1   +1

Columns x1-x3 represent substituent position 1, columns x4-x6 position 2, and columns x7-x9 position 3.
The proposed molecular structures should be checked with the synthetic chemists.

CompNo   Position 1   Position 2   Position 3
 1       I            NO2          NO2
 2       COOC2H5      H            CH(CH3)2
 3       H            CH(CH3)2     I
 4       C6H5         I            H
 5       NO2          C6H5         H
 6       COC6H5       COC6H5       I
 7       CH(CH3)2     COOC2H5      CH(CH3)2
 8       OC3H7        OC3H7        NO2
 9       NO2          I            C6H5
10       COC6H5       CH(CH3)2     COOC2H5
11       CH(CH3)2     H            OC3H7
12       OC3H7        NO2          COC6H5
13       I            OC3H7        COC6H5
14       COOC2H5      COOC2H5      OC3H7
15       H            COC6H5       COOC2H5
16       C6H5         C6H5         C6H5
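
The construction of this design can be reproduced with a short script. The sketch below assumes that each sign pattern of (t1, t2, t3) maps to the single representative substituent from the candidate list, and that the nine design columns are generated from the four basic factors a, b, c, d as given in the column headings (a, b, abcd for position 1; c, abc, bcd for position 2; d, acd, abd for position 3). The names `SUBSTITUENTS` and `lead_centered_design` are illustrative.

```python
import itertools

# One representative substituent per sign pattern of (t1, t2, t3)
SUBSTITUENTS = {
    (-1, -1, -1): "NO2",
    (+1, -1, -1): "COOC2H5",
    (-1, +1, -1): "H",
    (+1, +1, -1): "OC3H7",
    (-1, -1, +1): "I",
    (+1, -1, +1): "COC6H5",
    (-1, +1, +1): "CH(CH3)2",
    (+1, +1, +1): "C6H5",
}

def lead_centered_design():
    """Return the 16 compounds of the 2^(9-5) design as (pos1, pos2, pos3) tuples."""
    rows = []
    # Standard order: a varies fastest, d slowest
    for d, c, b, a in itertools.product([-1, 1], repeat=4):
        pos1 = (a, b, a * b * c * d)        # columns a, b, abcd
        pos2 = (c, a * b * c, b * c * d)    # columns c, abc, bcd
        pos3 = (d, a * c * d, a * b * d)    # columns d, acd, abd
        rows.append(tuple(SUBSTITUENTS[p] for p in (pos1, pos2, pos3)))
    return rows

compounds = lead_centered_design()
for number, (p1, p2, p3) in enumerate(compounds, start=1):
    print(number, p1, p2, p3)
```

Because every three-column group covers all eight sign patterns twice, each representative substituent appears exactly twice at each position.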


Overview of designs often used in SMD


Factorial and fractional factorial design
D-optimal design (illustrated with example)
Cluster-based design (illustrated with example)
Cell-based & Grid-based design
Space-filling design
Onion (Russian doll) design
Random complement
+ combinations thereof

Example: Onion
Data set from AZ, Lund, Bosse Nordén
N = 1107, K = 115
Objective: Select 80 diverse and representative compounds
First PCA score plot: R2X = 0.50 (A = 2), R2X = 0.75 (A = 6)

[Figure: PCA score plot t[1] vs t[2] (onion_I.M1, PCA-X, A = 2)]

Factorial or fractional factorial design


Two-level FDs and FFDs are often used with small data sets.
To get 80 compounds we need to use at least 6 PCs.
E.g. 2^6 = 64, plus a random complement and centerpoint selection of 16 compounds.
Tedious and time-consuming!

What is D-optimal design?


Computer-generated design.
The D-optimal design maximizes the determinant of the X'X matrix.
Geometrically, this is equivalent to saying that the volume of X is maximized.
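
The criterion can be made concrete with a toy example. The sketch below assumes NumPy, a hypothetical candidate set (a 3 x 3 grid in two coded factors) and a linear model with intercept; it evaluates det(X'X) for every possible 4-run subset and keeps the best one. Real D-optimal software uses exchange algorithms rather than exhaustive search, which is only feasible for very small candidate sets.

```python
import itertools
import numpy as np

def d_criterion(points):
    """det(X'X) for the linear model y = b0 + b1*x1 + b2*x2."""
    X = np.column_stack([np.ones(len(points)), points])   # model matrix with intercept
    return np.linalg.det(X.T @ X)

# Hypothetical candidate set: 3 x 3 grid over two coded factors
candidates = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)

# Exhaustive search over all 4-run subsets of the 9 candidates
best = max(itertools.combinations(range(len(candidates)), 4),
           key=lambda idx: d_criterion(candidates[list(idx)]))
best_points = candidates[list(best)]
```

For this small case the search ends up with the four corner points, which matches the geometric picture of spanning the largest possible volume.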


D-optimal design in Onion_I


D-optimal samples the outer part of the score space with lots of replicates (i.e., 80 compounds is too many).
With A = 6 better sampling, but still replication.

[Figures: raw data plot M1.t1 vs M1.t2 of the pure D-optimal selection (A = 2); PCA score plot t[1] vs t[2] (onion_I.M1, PCA-X, A = 2)]

Grid-based & Cell-based design


A grid is placed over the score space.
The structure closest to
- the centre in each bin (cell), or
- the grid point (grid)
is selected.

[Figure: PCA score plot t[1] vs t[2]]

Selection depends on mesh size and distribution of compounds.
Easy with A = 2, but complicated with A = 6!
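
A cell-based selection in two score dimensions can be sketched as follows. The example assumes NumPy and simulated scores; the function name `cell_based_selection`, the regular grid and the choice of 4 bins per axis are assumptions made for the illustration. In each non-empty cell the point closest to the cell centre is kept.

```python
import numpy as np

def cell_based_selection(scores, n_bins):
    """Pick, in every non-empty cell of a regular grid, the point closest to the cell centre."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    width = (hi - lo) / n_bins
    # Cell index along each axis; points on the upper edge fall in the last cell
    cells = np.minimum(((scores - lo) / width).astype(int), n_bins - 1)
    centres = lo + (cells + 0.5) * width
    dist = np.linalg.norm(scores - centres, axis=1)
    selected = {}
    for i, cell in enumerate(map(tuple, cells)):
        if cell not in selected or dist[i] < dist[selected[cell]]:
            selected[cell] = i
    return sorted(selected.values())

rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 2))          # hypothetical t[1], t[2] scores
chosen = cell_based_selection(scores, n_bins=4)
```

With 4 bins per axis there are at most 4^2 = 16 cells for A = 2, but 4^6 = 4096 cells for A = 6, most of them empty, which illustrates why the approach becomes complicated in higher dimensions.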


Space-filling design
Similar to cell- & grid-based design.
Distance calculations between points in chemical space.
Compounds are selected giving the best coverage (smallest average distance between selected points) of the chemical space.

Onion design
Sees the chemical domain as composed of layers.
Selection becomes a function of the number of layers and the type of design laid out in each layer.

Some examples of onion designs


Case I: A = 2, L = 3, default settings, no requirements on NDes
Case II: A = 6, L = 3, NDes = 80
Case III: As Case II, but with removal of outer 5%

Onion design generation: Step 1

Give investigation name and storage location.
Select SIMCA-P project, model number and how many components to consider.

Onion design generation: Step 2

Factor definition: the factors have pre-formatted settings of low and high according to the minimum and maximum score values of each score vector.

Onion design generation: Step 3

Response definition: define responses (if any).

Onion design generation: Step 4

Select experimental objective: we here select RSM.

Onion design generation: Step 5

Select the desired model and design.
The number of layers can be changed here (but we use the default of 3).

Onion design generation: Step 6

In the Layers dialogue you can change the size of the layers, the number of design runs and repetitions, and which model is to be supported.

Onion design generation: Step 7

One D-optimal design is generated inside each layer.

Onion design generation: Step 8

The worksheet of the resulting onion design is inspected.

Onion design generation: Step 9

Visualize the design (Design/Doptimal/Onion plot).
Layers are clearly seen (center point hidden by other triangles).

Onion design generation: Step 10

Look at the identity of the selected compounds (Design/Doptimal/Candidate Set). A small excerpt from the candidate set is shown.

Case II: SMD of 80 compounds

A = 6 (R2X = 0.75; Q2X = 0.70)
The optimal onion design has 25 + 24 + 30 + 1 = 80 molecules.
This onion design supports interaction, interaction, and quadratic models in the respective layers.

Case III: SMD of 80 compounds; outer 5% removed

L1: 0-30%, L2: 30-60%, L3: 60-95%
SMD of 80 runs (24 + 24 + 31 + 1) supporting interaction, interaction, and quadratic models.
Extreme observations not included!

What have we learnt - I


Multivariate characterisation and PCA offer a compact representation (the PC-scores) of molecular data that is well suited for design: statistical molecular design, SMD.
The use of design ensures that systematic and representative variation is introduced into the training set (not possible with the COST approach).
Changes in molecular structures are discrete, not continuous, making D-optimal design a viable alternative.

What have we learnt - II


Onions combine the best of D-optimal design (few points) with the best of cell-based and space-filling design (inner coverage). The flexibility of onion-designs in terms of the number of layers, and the number of points in each layer, makes them very useful in practice. Space-filling and cell-based designs are very similar, and when relatively few points are selected they give similar results to D-optimal design. Unlike D-optimal and onion designs, space-filling and cell-based designs cannot be modified to correspond to different models, i.e., linear, interaction, quadratic, etc.


What have we learnt - III


A random complement to any systematic selection is always useful.
Combination of different approaches:
  An outer D-optimal design combined with an inner space-filling design is sometimes used within the pharmaceutical industry
  Onion design has the same objective as the above combination

Design of Experiments (DOE) Pharma Applications


Chapter 12: Mixture Design One Day Add-On

Contents
Introduction
A working strategy for mixture design
  Example: Tablets
Application: Bubbles (Screening)
Overview of Mixture Region
Overview of Mixture Design Protocols
Introduction to D-optimal Design
Introduction to PLS
Application: Bubbles (RSM)

Mixture Design Add-On

Introduction

Applications of DOE
Designs with process factors
  Regular region: Factorials, Composite, Plackett-Burman, Box-Behnken
  Irregular region: D-optimal design
Designs with mixture factors
  Regular region: Axial designs, Simplex Centroid designs
  Irregular region: D-optimal design
Combined designs of mixture and process factors
  Always D-optimal design

Design of Process Experiments


Experiments where the response Y is a function of the levels or amounts of the factors:
$Y = F(X_1, X_2, X_3, \ldots, X_p) + \varepsilon$
The changes in the levels or amounts of each factor $X_k$ are not coupled to (= independent of) changes in other factors.
Thus, orthogonal arrays of experiments can be constructed, such as
  factorial designs (screening)
  composite designs (RSM)
  Plackett-Burman (screening)
  others

Design of Process Experiments

Experimental domain is a regular (half-) hypercube


Design of Mixture Experiments


Experiments where the response Y is a function of the proportions of the ingredients in the mixture and not of the amounts of the ingredients:
$Y = F(X_1, X_2, X_3, \ldots, X_p) + \varepsilon$
Response Y: octane rating of gasoline, crushing strength of a tablet, smoothness of a cream, ...
The response depends only on the relative proportions of the ingredients of the mixture:
$\sum_k X_k = 1$
We can express the relative proportions as fractions or percentages.

Design of Mixture Experiments

[Figure: mixture designs for linear and quadratic models]

Experimental domain is a simplex (or polyhedron).
Experimental region has dimensionality k-1, where k is the number of mixture factors.

Process and mixture factors together


[Figure: Process and Mixture Factors]

Irregular experimental domains: D-optimal design


There are constraints in experimental space.
A D-optimal design is a computer-generated design that locates the experiments in such a way that the experimental region is well covered.

Example: Rocket Propellant


Three components were mixed together to form a rocket propellant. The purpose was to find a propellant with an elasticity of > 2900.
Formulation factors:
  Binder    0.2-0.4
  Oxidizer  0.4-0.6
  Fuel      0.2-0.4

What is the "problem" with the worksheet? Each row sums to 1.0!
Consequences for the design?

Example: Rocket Propellant


What does the mixture design look like?
The experimental domain with 0-1 bounds on the factors takes the form of a triangle.
Here we are investigating a limited region of the available experimental domain.

[Figure: triangular mixture region with corners Oxidiser, Fuel and Binder]

Example: Rocket Propellant


A quadratic model was used.

[Figure, Investigation: Rocket (PLS, comp.=2): scaled & centered coefficients for Elasticity (terms Oxi, Bin*Oxi, Oxi*Oxi, Fue*Fue, Bin*Fue, Oxi*Fue, Fue, Bin, Bin*Bin). N=10, DF=4, R2=0.801, Q2=0.249, R2 Adj.=0.553, RSD=160.1071, Conf. lev.=0.95]

The model predicts an area in which an elasticity exceeding 2900 is found.
Coefficients show that binder and fuel have the strongest impact on elasticity.
We are able to quantitatively describe elasticity in terms of three varied ingredients.

Mixture Design Add-On

A Working Strategy for Mixture Design

A Working Strategy for Mixture Design


Illustrations: Tablet preparation & Bubble formation

1. Definition of factors and bounds
2. Selection of experimental objective and mixture model
3. Selection of candidate set
4. Generation of design
5. Evaluation of size and shape of mixture region
6. Definition of reference mixture
7. Execution of design
8. Analysis of data and evaluation of model
9. Visualization of modelling results
10. Use of model

Tablet: - 1. Definition of factors and bounds

Aim: To investigate tablet preparation and find out which factors regulate the release rate of an active substance.
Mixture factors:
  Cellulose (0 - 1)
  Lactose (0 - 1)
  Phosphate (0 - 1)
  All factors sum to 100% (mixture constraint)
  Bounds display consistency
Constraint:
  No other extra constraint
Response:
  Release rate of the active substance (to be maximized)

Co-ordinates of a Simplex
At each corner, one component is pure, 1.0.
At the opposite side, this component is absent, 0.0.
The concentration is the same along a line parallel with the opposite side, e.g. for A along horizontal lines.
Going from the corner A (A=1.0) down corresponds to going through A=1.0, A=0.75, A=0.5, ..., A=0.0.
In the same way, going from the corner B towards the opposite side corresponds to going through B=1.0, B=0.75, B=0.5, ..., B=0.0. And analogously for C.

Tablet: - 1. Definition of factors and bounds

Checking for consistency of bounds. Example:
  0.1 ≤ A ≤ 0.5
  0.1 ≤ B ≤ 0.3
  0.2 ≤ C ≤ 0.4

These bounds are inconsistent: B + C can be at most 0.3 + 0.4 = 0.7, so A can never be below 0.3. After a simple arithmetic check (done automatically in the software), the lower bound of A is raised to L*A = 0.3 and the new bounds become:
  0.3 ≤ A ≤ 0.5
  0.1 ≤ B ≤ 0.3
  0.2 ≤ C ≤ 0.4
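
The arithmetic check can be written out explicitly. The sketch below is an illustration, not the routine used in MODDE: it assumes that each lower bound is raised to one minus the sum of the other upper bounds when that is larger, and each upper bound is lowered to one minus the sum of the other lower bounds when that is smaller. The function name `consistent_bounds` is hypothetical, and the rounding only suppresses floating-point noise.

```python
def consistent_bounds(lower, upper):
    """Tighten mixture bounds so that every bound is attainable when the factors sum to 1."""
    names = list(lower)
    new_lower, new_upper = {}, {}
    for k in names:
        others_upper = sum(upper[j] for j in names if j != k)
        others_lower = sum(lower[j] for j in names if j != k)
        # A factor cannot go below what is left when all others are at their maximum
        new_lower[k] = round(max(lower[k], 1.0 - others_upper), 10)
        # A factor cannot exceed what is left when all others are at their minimum
        new_upper[k] = round(min(upper[k], 1.0 - others_lower), 10)
    return new_lower, new_upper

lower = {"A": 0.1, "B": 0.1, "C": 0.2}
upper = {"A": 0.5, "B": 0.3, "C": 0.4}
new_lower, new_upper = consistent_bounds(lower, upper)
print(new_lower, new_upper)
```

For the example bounds only the lower bound of A changes, from 0.1 to 0.3; all other bounds were already attainable.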


Tablet: - 2. Selection of experimental objective and mixture model

Experimental objective:
  Optimization
Mixture model:
  Quadratic
  $y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{23} x_{MF2} x_{MF3} + \varepsilon$
  Cox model type with constraints imposed on the regression coefficients

Tablet: - 3. Selection of candidate set

The candidate set is the pool of theoretically possible and meaningful experiments, from which the actual design is selected.
Here, the candidate set is small:
  3 extreme vertices
  3 centers of edges
  3 interior points
  1 overall centroid

In most cases but mixture applications, undesired experiments may be deleted from the candidate set prior to generation of the design
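
A candidate set of this kind for three unconstrained mixture factors can be enumerated directly. The sketch below uses exact fractions from the Python standard library. The coordinates of the three interior points are an assumption: they are taken here as the blends (2/3, 1/6, 1/6) halfway between the overall centroid and each vertex, which the slides do not state explicitly.

```python
from fractions import Fraction as F
from itertools import permutations

def simplex_candidates():
    """Ten candidate mixtures for three components on the full 0-1 simplex."""
    vertices = set(permutations((F(1), F(0), F(0))))            # 3 extreme vertices
    edge_centres = set(permutations((F(1, 2), F(1, 2), F(0))))  # 3 centers of edges
    interior = set(permutations((F(2, 3), F(1, 6), F(1, 6))))   # 3 interior points (assumed coordinates)
    centroid = {(F(1, 3), F(1, 3), F(1, 3))}                    # 1 overall centroid
    return sorted(vertices | edge_centres | interior | centroid, reverse=True)

candidates = simplex_candidates()
for mixture in candidates:
    print(" / ".join(str(x) for x in mixture))
```

Every candidate sums to exactly 1, so the mixture constraint holds by construction.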


Tablet: - 4. Generation of design

The design should contain experiments which are informative and map the experimental region as well as possible.
In this case the experimental region is regular, so the Simplex Centroid design is applicable.

Tablet: - 5. Evaluation of size and shape of mixture region

Introduction to regular mixture regions

[Figure: simplex with corners A (1/0/0), B (0/1/0) and C (0/0/1), edge midpoints 0.5/0.5/0, 0.5/0/0.5 and 0/0.5/0.5, and overall centroid 1/3 / 1/3 / 1/3; inset: two-component case X1 + X2 = 1 with axes X1 and X2 running from 0 to 1]

Tablet: - 5. Evaluation of size and shape of mixture region

In MODDE: Show/Design Region
Example: Bubbles (see more info. later)
Useful approach to understand how and where the experiments are laid out

[Figure: design region shown at Glycerol = 0.0, Glycerol = 0.1 and Glycerol = 0.2]

Tablet: - 5. Evaluation of size and shape of mixture region


Alternative designs for regular region (choice of model will be important)

[Figure: designs for a regular region under three models: Linear, Quadratic, Special Cubic]

Tablet: - 6. Definition of reference mixture

The reference mixture is used to anchor the mathematical model
- It is easy to find for regular regions (the overall centroid)
- Strongly irregular regions require an efficient algorithm to find the overall centroid
- It serves the same function as the center point does in process design

[Figure: simplex with vertices A (1/0/0), B (0/1/0) and C (0/0/1), edge centers 0.5/0.5/0, 0.5/0/0.5 and 0/0.5/0.5, and overall centroid 1/3 / 1/3 / 1/3]

Tablet preparation: 1/3, 1/3, 1/3

Tablet: - 7. Execution of design

It is important to carry out the experiments in random order. Randomization turns any systematic time trend into unimportant, random, unsystematic variation.
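A minimal sketch of how a randomized run order could be produced, assuming the 10 runs of the tablet example. The function name `randomized_run_order` and the seed value are illustrative assumptions, not part of MODDE.

```python
import random


def randomized_run_order(n_runs, seed=None):
    """Return a random permutation of the run numbers 1..n_runs."""
    rng = random.Random(seed)          # a seed makes the order reproducible
    order = list(range(1, n_runs + 1))
    rng.shuffle(order)
    return order


# Hypothetical usage for a design with 10 experiments.
run_order = randomized_run_order(10, seed=2004)
print(run_order)
```

Each experiment number appears exactly once, so the list can be used directly as a run-order column in the worksheet.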

Tablet: - 8. Analysis of data and evaluation of model


Analysis of data with PLS

[Figure: Summary of Fit (R2, Q2) for the response "release"; Investigation: Waaler_rsm (PLS, comp.=3); N=10, DF=4, Cond. no.=7.4174, Y-miss=0]

[Figure: normal probability plot of standardized residuals for "release", with experiment number labels; Investigation: Waaler_rsm (PLS, comp.=3); N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170]

Exp #10 is a probable outlier and should be re-tested. If #10 is deleted and the model refitted, Q2 improves (from 0.55 to 0.69), which indicates a more valid model.

Tablet: - 9. Visualization of modelling results

[Figure: Regression coefficients. Scaled & centered coefficients for release (axis in min) for the terms la, la*la, ce, ce*ce, ce*la, ph*ph, ce*ph, ph, la*ph; Investigation: Waaler_rsm (PLS, comp.=3); N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Conf. lev.=0.95]

[Figure: Tri-linear contour plot]

Tablet: - 10. Use of model

Use of verifying experiments

No   cellulose   lactose   phosphate   release (obs)   release (pred)   Lower   Upper
1    0.32        0         0.68        --              363              322     404
2    0.5         0.125     0.375       370             293              262     324
3    0.333       0         0.667       340             363              322     405
4    0.667       0         0.333       345             320              278     361

The model predicts well except for the blend 0.5/0.125/0.375. This strange experiment should be repeated, and the model possibly updated with the new information.
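The comparison of observed and predicted release can be sketched as a simple interval check. The numbers are taken from the verifying-experiment table; the function name `outside_interval` is a hypothetical helper, and blend 1 is skipped because no observed value is given for it.

```python
# (No, observed, predicted, lower, upper) for the verifying experiments
# that have an observed release value in the table.
verifying = [
    (2, 370, 293, 262, 324),
    (3, 340, 363, 322, 405),
    (4, 345, 320, 278, 361),
]


def outside_interval(rows):
    """Return the experiment numbers whose observation falls outside [lower, upper]."""
    return [no for no, obs, _pred, lower, upper in rows
            if not (lower <= obs <= upper)]


flagged = outside_interval(verifying)
print(flagged)
```

Only experiment 2 (blend 0.5/0.125/0.375) is flagged, which agrees with the remark on the slide.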

Summary
DoE is an organized approach
- Yields more useful information (influence of all factors together)
- Yields more precise information in fewer experiments
- Results are evaluated in the light of variability
- A map of the system is obtained (useful for decision-making)

Mixture factors are constrained by $\sum_k X_k = 1$
- Such factors cannot be manipulated independently of one another
- The experimental region is a regular or irregular simplex

The approach to mixture design is very similar to the approach used for conventional process designs

Mixture Design Add-On

Application: Bubbles (Screening)

BubbleScr: - 1. Definition of factors and bounds

Aim: To investigate bubble formation and find out which factors dominate bubble lifetime

Process factors:
- Temperature (7 - 21 °C; refrigerator/kitchen temperature)
- Time (1 - 13 - 25 h)

Mixture factors:
- Dish-washing liquid 1, SKONA, ICA (0 - 0.4)
- Dish-washing liquid 2, NEUTRAL, ADACO (0 - 0.4)
- Tap water, Umeå (0.4 - 0.8)
- Glycerol, APOTEKETS (15% water content; 0.0 - 0.2)

Constraint:
$0.2 \le \mathrm{DWL1} + \mathrm{DWL2} \le 0.5$

Response:
Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm diameter.

Mixture Region

- Mixture components are not independent: $X_1 + X_2 + \ldots + X_p = \text{total}$, where usually total = 1 or 100%
- The mixture region is constrained
- If there are NO additional bounds on the components, that is, every component can vary between 0 and 1, the mixture region is the full simplex

Properties of the Mixture Region

Size of region:
- The mixture region might be very small. The size is inferred from calculations of Range Lower (RL) and Range Upper (RU)

Shape of region:
- If the region is a regular simplex, several classical mixture designs are available
- If the region is irregular, the experiments are laid out D-optimally

Consistency of bounds:
- Some combinations of bounds are disallowed
- Implied bounds arise from the stated bounds

The above properties are handled automatically in software (MODDE), but unawareness of them might lead to bad or unexpected results
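To illustrate what "consistency of bounds" and "implied bounds" mean, here is a minimal sketch. It assumes the usual reading that the bounds are consistent when the lower bounds sum to at most the total and the upper bounds to at least the total, and that each component's bounds are tightened by what the other components leave over. The function name `check_bounds` and the example bounds are hypothetical, and this is not a description of how MODDE implements the check.

```python
def check_bounds(lower, upper, total=1.0, tol=1e-9):
    """Check consistency of mixture bounds and return the implied bounds."""
    if sum(lower) > total + tol or sum(upper) < total - tol:
        raise ValueError("inconsistent bounds: no mixture can sum to the total")
    implied_lower, implied_upper = [], []
    for i in range(len(lower)):
        others_lower = sum(lower) - lower[i]
        others_upper = sum(upper) - upper[i]
        # Component i cannot go below what is left when all others are at
        # their upper bounds, nor above what is left when all others are
        # at their lower bounds.
        implied_lower.append(max(lower[i], total - others_upper))
        implied_upper.append(min(upper[i], total - others_lower))
    return implied_lower, implied_upper


# Hypothetical example: the stated upper bound 0.9 on the first component
# can never be reached, because the other two need at least 0.1 each.
lo, up = check_bounds([0.1, 0.1, 0.1], [0.9, 0.5, 0.5])
print([round(v, 3) for v in lo], [round(v, 3) for v in up])
```

In the example the implied upper bound of the first component is 0.8 rather than the stated 0.9.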

Types of Bounds

Lower bounds only
- $L_i \le X_i \le 1.0$
- $0.4 \le \text{Tap water} \le 1.0$ (example, not realistic for bubble formation)

Lower and upper bounds
- $L_i \le X_i \le U_i$
- $0.1 \le \mathrm{DWL1} \le 0.3$

Upper bounds only
- $0 \le X_i \le U_i$
- $0 \le \mathrm{DWL1} \le 0.4$

Relational constraints
- $0.1 \le X_1 + X_5 \le 0.5$
- $0.3 \le \mathrm{DWL1} + \mathrm{DWL2} \le 0.5$
- A cost constraint on Tap water + DWL1 + DWL2 + Glycerol at 70 (SEK/l)

Lower Bounds Only -- L-simplex

Example: Rocket experiment
- $0.2 \le \text{binder} \le 1$
- $0.4 \le \text{oxidizer} \le 1$
- $0.2 \le \text{fuel} \le 1$

What is the shape of the mixture region?

[Figure: simplex with vertices A (Binder), B (Oxidizer) and C (Fuel), with the boundaries Binder > 0.2, Oxidizer > 0.4 and Fuel > 0.2 marked]
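The shape question for the rocket example can be explored numerically. With lower bounds only, each vertex of the region keeps all but one component at its lower bound and gives the remaining amount to that one component, so the region is a smaller simplex. The sketch below assumes exactly that construction; the function name `l_simplex_vertices` is hypothetical.

```python
def l_simplex_vertices(lower, total=1.0):
    """Vertices of the region defined by lower bounds only (an L-simplex)."""
    slack = total - sum(lower)        # amount left after meeting all lower bounds
    if slack <= 0:
        raise ValueError("lower bounds leave no room to vary the mixture")
    vertices = []
    for i in range(len(lower)):
        vertex = list(lower)
        vertex[i] = lower[i] + slack  # component i takes all remaining slack
        vertices.append(tuple(vertex))
    return vertices


# Rocket example: binder >= 0.2, oxidizer >= 0.4, fuel >= 0.2.
rocket = l_simplex_vertices([0.2, 0.4, 0.2])
for v in rocket:
    print(tuple(round(x, 3) for x in v))
```

For the rocket bounds this gives the vertices (0.4, 0.4, 0.2), (0.2, 0.6, 0.2) and (0.2, 0.4, 0.4) in the order binder/oxidizer/fuel.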

Upper Bounds Only -- U-simplex

[Figure: ternary diagram (vertex A labelled) in which the upper bounds form a U-simplex]

Standard mixture design (Axial extended) applicable

Upper Bounds Only -- Irregular region (No simplex)


[Figure: ternary diagram (vertex A labelled) in which the upper bounds give an irregular region]

D-optimal design is the only option

Lower and Upper Bounds - Regular region

$L_i \le x_i \le U_i$

Definition of extreme points $(U_i, L_{j \ne i})$: the upper (or lower) bound of one factor combined with the lower (or upper) bound of all the others

If all extreme points are valid, the region is a simplex (regular region)

Example: $0.1 \le A \le 0.8$, $0.1 \le B \le 0.8$, $0.1 \le C \le 0.8$. All extreme points are valid.

Lower and Upper Bounds - Irregular region

The experimental region is the intersection of the U-simplex and the L-simplex. Most of the time the resulting region is an irregular polyhedron.

Example: 0.2 < A < 0.6, 0.1 < B < 0.6, 0.1 < C < 0.5

These bounds are consistent, but the experimental region is irregular, which calls for a D-optimal design.

Mixture Region, Lower and Upper bounds

- It is easy to display an irregular region in a three-component mixture
- With five or more ingredients, the human brain cannot overview the situation
- A computer-generated (D-optimal) design solves the problem

[Figure: ternary diagram (vertex A labelled) showing an irregular region]

Relational Constraints - Three mixture factors

The mixture region is almost never a simplex, but an irregular polyhedron

[Figure: ternary diagram (vertices B and C labelled). The mixture region is irregular following the definition of the relational constraint bounding A + B at 0.65, shown as the dotted line]


Relational Constraints - Four mixture factors


Example: 2 x1 + x2 0.30


Summary

Mixture has only lower bounds
- The experimental region is always a simplex

Mixture has only upper bounds
- The experimental region is a simplex if the sum of the q-1 largest upper bounds is $\le 1.0$
- An irregular region is often the result, which implies D-optimal design

Mixture has lower and upper bounds
- The region is often irregular, which implies D-optimal design
- MODDE detects inconsistent bounds and proposes a change

Mixture has relational constraints
- The region is often irregular, which implies D-optimal design
- MODDE detects inconsistent bounds or constraints and proposes a change
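The rule for upper bounds only can be written as a one-line check. The sketch assumes the rule is read as "the sum of the q-1 largest upper bounds is at most 1.0" (the relation sign is not legible in the source slide), and the function name `upper_bounds_give_simplex` and the example bounds are hypothetical.

```python
def upper_bounds_give_simplex(upper, total=1.0, tol=1e-9):
    """True if the q-1 largest upper bounds sum to at most the total.

    Assumes the bounds are feasible, i.e. sum(upper) >= total.
    """
    # Dropping the smallest bound leaves the q-1 largest ones.
    return sum(upper) - min(upper) <= total + tol


print(upper_bounds_give_simplex([0.5, 0.3, 0.4]))  # 0.5 + 0.4 = 0.9
print(upper_bounds_give_simplex([0.7, 0.6, 0.5]))  # 0.7 + 0.6 = 1.3
```

The first set of bounds gives a U-simplex, while the second gives an irregular region for which a D-optimal design would be needed.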

BubbleScr: - 2. Selection of experimental objective and mixture model


Experimental objective: Screening

Mixture model:
- Process model: Interaction
- Mixture model: Linear
- Process*Mixture model: Interaction

$$y = \beta_0 + \beta_1 X_{PF1} + \beta_2 X_{PF2} + \beta_3 X_{MF3} + \beta_4 X_{MF4} + \beta_5 X_{MF5} + \beta_6 X_{MF6} + \beta_{12} X_{PF1} X_{PF2} + \beta_{13} X_{PF1} X_{MF3} + \beta_{14} X_{PF1} X_{MF4} + \beta_{15} X_{PF1} X_{MF5} + \beta_{16} X_{PF1} X_{MF6} + \beta_{23} X_{PF2} X_{MF3} + \beta_{24} X_{PF2} X_{MF4} + \beta_{25} X_{PF2} X_{MF5} + \beta_{26} X_{PF2} X_{MF6} + \varepsilon$$

BubbleScr: - 3. Selection of candidate set

Overview of candidate set
- 48 extreme vertices (4 process "corners" * 12 mixture extreme vertices)
- 48 centers of edges (4*12)
- 72 centroids of high-dimensional surfaces (4*18)
- 1 overall centroid

BubbleScr: - 4. Generation of design

The proposed model needs 13 degrees of freedom (DF)
- 1 DF for the constant
- 2 DF for the linear terms of the process factors
- 3 DF for the linear terms of the mixture factors
- 1 DF for the process*process interaction
- 6 DF (2*3) for the process*mixture interactions

Add 5 extra experiments to get enough DF. In addition, 2 supplementary experiments are recommended to handle the complexity introduced by the linear constraint. This gives a lead number of experiments of N = 20 (note: no replicates are included in this estimate).
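The degree-of-freedom count can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the 3 DF for the mixture terms is taken as stated on the slide.

```python
n_process = 2        # temperature and time
mixture_df = 3       # DF for the linear mixture terms, as stated on the slide

constant_df = 1
process_linear_df = n_process
process_interaction_df = n_process * (n_process - 1) // 2   # one pair: Temp*Time
process_mixture_df = n_process * mixture_df                  # 2 * 3

model_df = (constant_df + process_linear_df + mixture_df
            + process_interaction_df + process_mixture_df)

extra_runs = 5           # extra experiments to get enough DF
supplementary_runs = 2   # recommended because of the linear constraint
lead_n = model_df + extra_runs + supplementary_runs

print(model_df, lead_n)
```

This reproduces the 13 DF of the model and the lead number of experiments N = 20.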

Designs for Regular and Irregular Regions

Regular region (simplex): classical mixture designs
- Screening: determine component effects. Use Axial designs
- Optimization: good approximation of the response. Use Simplex Centroid designs

Irregular region (when the experimental region is not a simplex):
- Screening & Optimization: D-optimal designs

Axes of a Simplex

Definition: The $x_i$ axis of the simplex is the one-dimensional subspace of the simplex where $x_j = (1 - x_i)/(q - 1)$ for all $j \ne i$

The $x_i$ axis is the line perpendicular to the $x_i = 0$ base of the simplex and passing through the centroid of the simplex

[Figure: simplex showing the axes of components A (x1), B (x2) and C (x3)]

Axial Designs

- Axial designs consist of mixtures situated entirely on the axes of the simplex
- With axial designs most of the points are positioned inside the simplex and consist of complete mixtures, that is, q-component blends
- Axial designs are recommended when component effects are to be measured in screening experiments and when linear models are to be fitted

[Figure: two simplex designs: Standard Axial and Extended Axial]
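A minimal sketch of how a point on a simplex axis can be computed from the definition $x_j = (1 - x_i)/(q - 1)$. The function name `axis_point` is hypothetical, and exact fractions are used only to keep the output readable.

```python
from fractions import Fraction


def axis_point(i, x_i, q):
    """Point on the x_i axis of a q-component simplex for a given level x_i."""
    x_i = Fraction(x_i)
    rest = (1 - x_i) / (q - 1)     # every other component gets an equal share
    return tuple(x_i if j == i else rest for j in range(q))


# Points along the axis of component A (index 0) in a three-component mixture.
print(axis_point(0, 1, 3))                # vertex A
print(axis_point(0, Fraction(2, 3), 3))   # interior point on the axis
print(axis_point(0, Fraction(1, 3), 3))   # overall centroid
print(axis_point(0, 0, 3))                # center of the opposite edge
```

Every point returned sums to 1, and all levels between the opposite edge and the vertex lie on the same straight line through the centroid.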

Simplex Centroid Designs

A Simplex Centroid design is used for optimization; normally, it has experiments situated
- at vertex points
- at edge centers
- at lower-dimensioned face centers
- at interior check points
- at the overall centroid

or a combination of these
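As a sketch, the points of a simplex centroid design can be generated as the centroids of all non-empty subsets of components, which gives 2^q - 1 points. The interior check points are placed midway between the overall centroid and each vertex, which is an assumption made for illustration; MODDE may position them differently. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def simplex_centroid(q):
    """Centroids of all non-empty component subsets: vertices, edge centers, ..."""
    points = []
    for size in range(1, q + 1):
        for subset in combinations(range(q), size):
            points.append(tuple(Fraction(1, size) if j in subset else Fraction(0)
                                for j in range(q)))
    return points


def interior_check_points(q):
    """Points midway between the overall centroid and each vertex (assumed placement)."""
    c = Fraction(1, q)
    return [tuple((1 + c) / 2 if j == i else c / 2 for j in range(q))
            for i in range(q)]


design = simplex_centroid(3) + interior_check_points(3)
for point in design:
    print(point)
```

For three components this gives 3 vertices, 3 edge centers, the overall centroid and 3 interior points, 10 candidate points in total.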

Designs When the Experimental Region is Irregular

- The Extreme Vertices designs of McLean-Anderson provide the best available solution to the constrained design
- The extreme vertices are those points that lie on the intersection of the constraint boundaries
- Candidate points are generated by forming all possible combinations of the bounds of q-1 components, and calculating the level of the qth component. This gives $q \cdot 2^{q-1}$ possible points
- The extreme vertices are those candidate points whose component levels all lie within the constraints

Extreme Vertices

Rapidly increasing complexity

q     points
2     4
3     12
4     32
5     80
6     192
7     448
8     1024
9     2304
10    5120
11    11264
12    24576

Example of Finding the Extreme Vertices

Mixture system with the following constraints:
$0.2 \le A \le 0.6$; $0.1 \le B \le 0.6$; $0.1 \le C \le 0.5$

Vertex   A     B     C
1        .20   .10   --
2 *      .20   .60   .20
3 *      .60   .10   .30
4        .60   .60   --
5        .20   --    .10
6 *      .20   .30   .50
7 *      .60   .30   .10
8        .60   --    .50
9        --    .10   .10
10 *     .40   .10   .50
11 *     .30   .60   .10
12       --    .60   .50

(* = valid extreme vertex; -- = computed level falls outside its bounds)

[Figure: ternary diagram (vertices B and C labelled) with the extreme vertices 7, 3, 11, 2, 10 and 6 marked]
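The enumeration in this example can be sketched in a few lines: fix q-1 components at their lower or upper bounds, compute the remaining component, and keep the point if that component is within its own bounds. The function name `extreme_vertices` and the tolerance are illustrative assumptions.

```python
from itertools import product


def extreme_vertices(lower, upper, total=1.0, tol=1e-9):
    """Enumerate the q*2^(q-1) candidate points and keep the valid vertices."""
    q = len(lower)
    n_candidates = 0
    found = set()
    for free in range(q):                        # component computed from the others
        fixed = [j for j in range(q) if j != free]
        for levels in product(*[(lower[j], upper[j]) for j in fixed]):
            n_candidates += 1
            x = [0.0] * q
            for j, level in zip(fixed, levels):
                x[j] = level
            x[free] = total - sum(levels)
            if lower[free] - tol <= x[free] <= upper[free] + tol:
                # Rounding removes floating-point noise and merges duplicates.
                found.add(tuple(round(v, 9) for v in x))
    return n_candidates, sorted(found)


n_candidates, vertices = extreme_vertices([0.2, 0.1, 0.1], [0.6, 0.6, 0.5])
print(n_candidates)
for v in vertices:
    print(v)
```

For the constraints of the example, 12 candidate points are examined and 6 of them are valid extreme vertices, matching the rows marked with an asterisk.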

Mixture Design when Region is Irregular

We have to consider:
- Extreme vertices
- Edge centers
- Face centers
- Overall centroid

Design: selected D-optimally

Screening: linear model
- Subset of, or all, extreme vertices
- Overall centroid

Optimization: quadratic model
- All, or a subset of, vertices
- Edge centers
- Face centers
- Centroid

Introduction to D-optimal design

A D-optimal design is a computer-generated design, and consists of the best subset of experiments selected from the candidate set

For a given model, $Y = X\beta + \varepsilon$, the following can be said regarding the D-optimal approach:
- the selected runs maximize the determinant of the matrix $X'X$
- these experiments span the largest volume possible in the experimental region

A D-optimal design can be tailored to support an irregular experimental region, or a very complex problem set-up (process + mixture)

A small D-optimal example

Example: $2^2$ full factorial design with factors x1 and x2

run   x1   x2
1     -1   -1
2      1   -1
3     -1    1
4      1    1

Model: $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + e$

Model in matrix form: $y = Xb + e$, with $b = (X'X)^{-1} X'y$

D-optimal example, the covariance matrix $(X'X)^{-1}$

$$X = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \qquad X' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

$$X'X = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}, \qquad (X'X)^{-1} = \begin{pmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{pmatrix}$$

Precision in b from: $(X'X)^{-1} \cdot RSD \cdot t$

The smallest $(X'X)^{-1}$ corresponds to the largest $X'X$
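The matrices of this example can be checked with a short pure-Python sketch. The helper names `xtx` and `inverse` are hypothetical; the inverse uses plain Gauss-Jordan elimination on exact fractions and assumes the matrix is non-singular.

```python
from fractions import Fraction

# The 2^2 full factorial and the model columns 1, x1, x2, x1*x2.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
X = [[1, x1, x2, x1 * x2] for x1, x2 in runs]


def xtx(X):
    """Compute X'X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)]
            for a in range(p)]


def inverse(M):
    """Invert a non-singular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


information = xtx(X)
covariance = inverse(information)
print(information)
print([[float(v) for v in row] for row in covariance])
```

The off-diagonal zeros show that the design is orthogonal: each coefficient is estimated independently of the others and with the same precision.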

A second small D-optimal example

Problem: two factors (x1/x2) varied in three levels

Proposed model: $y = b_0 + b_1 x_1 + b_2 x_2 + e$; the model needs 3 DF

- (9! / (3!*6!)) = 84 ways of selecting 3 trials out of 9
- Maximize the determinant $\det(X'X)$
- Best precision in estimated regression coefficients with det = 16

[Figure: five 3x3 grids over the levels -1, 0, 1 of x1 and x2, each showing a selection of 3 runs, with det = 0, det = 1, det = 4, det = 9 and det = 16]
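Because the candidate set is so small, this example can be solved by brute force. The sketch below enumerates all 84 subsets and evaluates det(X'X), using the fact that for a square X the determinant of X'X equals the square of det(X). The helper name `det3` is hypothetical; a real D-optimal algorithm would use an exchange search rather than full enumeration.

```python
from itertools import combinations, product


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Candidate set: both factors at the three levels -1, 0, 1 (9 points).
candidates = list(product((-1, 0, 1), repeat=2))

results = []
for subset in combinations(candidates, 3):
    X = [(1, x1, x2) for x1, x2 in subset]      # model columns 1, x1, x2
    results.append((det3(X) ** 2, subset))      # det(X'X) = det(X)^2 for square X

best_det = max(d for d, _ in results)
best_designs = [s for d, s in results if d == best_det]
print(len(results), best_det)
print(best_designs[0])
```

The enumeration confirms the 84 possible selections, the maximum determinant of 16, and that the determinants take only the values 0, 1, 4, 9 and 16 shown in the figure.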

How to compute a determinant

Example: experiments spread according to a determinant of 4

$$X = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad X' = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix}, \qquad X'X = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

$$\det(X'X) = (3 \cdot 1 \cdot 2) + (-1 \cdot 0 \cdot 0) + (0 \cdot -1 \cdot 0) - (0 \cdot 1 \cdot 0) - (0 \cdot 0 \cdot 3) - (2 \cdot -1 \cdot -1) = 4$$

Features of the D-optimal approach

- Assumes that the selected regression model is "correct" and "true"
- Sensitive to model choice
- Potential terms may be added to protect against this sensitivity

Evaluation criteria

Two common evaluation criteria:

Condition number
- ratio of the largest to the smallest singular value of X
- a measure of sphericity
- 1 is the lower (ideal) limit, and denotes an orthogonal design

G-efficiency
- computed as Geff = 100*p/(n*d), where p is the number of model terms, n the number of runs and d the maximum relative prediction variance over the candidate set
- compares the efficiency of a D-optimal design to that of a fractional factorial design
- 100% is the upper limit and designates that a fractional factorial design was obtained
- above 60-70% is recommended


BubbleScr: - 4. Generation of design

- Lead number of experiments: N = 20 (note: no replicates included in this estimate)
- Due to the element of randomness in the D-optimal search, we recommend exploring N ± 4 runs and generating 5 versions for each level of N ± 4
- We explored N = 16 to N = 24, giving 45 alternative D-optimal designs
- Best design with N = 16 (Geff = 76%, CondNo = 2.7)


BubbleScr: - 4. Generation of design


Best design with N = 16 (Geff = 76%, CondNo = 2.7)

2 series of 4 replicates were added, giving 24 runs

BubbleScr: - 5. Evaluation of size and shape of mixture region

Useful approach to understand how and where the experiments are laid out

In MODDE: Show/Design Region

It was concluded that the shape of the experimental region was reasonable and not too distorted, and that the region was of sufficient size

[Figure: design region shown in three panels: Glycerol = 0.0, Glycerol = 0.1, Glycerol = 0.2]

BubbleScr: - 6. Definition of reference mixture

- The reference mixture is used for anchoring the mathematical model
- It is easy to find for regular regions (overall centroid)
- Strongly irregular regions require an efficient algorithm to find the centroid
- It serves the same function as the center points in process design
- Calculated reference mixture: (0.183 / 0.183 / 0.55 / 0.084) (DWL1 / DWL2 / water / glycerol)
- Manually modified reference mixture: (0.2 / 0.2 / 0.5 / 0.1)

Computation of Centroid for Constrained Region

Several possibilities:
Overall Center of Mass (COM) - computationally intensive
Average of all extreme vertices (AVG)
Range Normalized Midrange (RNM, used in MODDE):
RNM = $(s_1, s_2, \ldots, s_i, \ldots, s_q)$, where
$s_i = m_i - R_i \left( \sum_{j=1}^{q} m_j - 1.0 \right) / \sum_{j=1}^{q} R_j$, for i = 1 to q
Range: $R_i = U_i - L_i$
Midrange: $m_i = (U_i + L_i)/2$
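The Range Normalized Midrange can be sketched in a few lines of Python. This is a minimal sketch, assuming that the two j-indexed quantities in the formula are sums over all q mixture factors (the summation signs are not legible in the slide text). The bounds are illustrative, and no extra linear constraints are handled, so the result is not expected to reproduce the reference mixture reported by MODDE.

```python
def range_normalized_midrange(bounds):
    """Return the RNM point for a list of (lower, upper) bounds.

    Assumption: both j-indexed terms in the formula are sums over all factors.
    """
    midranges = [(low + high) / 2 for low, high in bounds]
    ranges = [high - low for low, high in bounds]
    excess = sum(midranges) - 1.0  # how far the midrange point is from summing to 1
    total_range = sum(ranges)
    # Each factor absorbs a share of the excess proportional to its range.
    return [m - r * excess / total_range for m, r in zip(midranges, ranges)]


# Illustrative bounds for a four-component mixture (assumed, not taken from a factor table).
bounds = [(0.0, 0.4), (0.0, 0.4), (0.4, 0.8), (0.0, 0.2)]
reference = range_normalized_midrange(bounds)
print([round(s, 3) for s in reference])
```

Because the excess is distributed in proportion to the ranges, the returned fractions always sum to 1.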


BubbleScr: - 7. Execution of design

Carry out experiments in randomized order
Here, pseudo-random order was used due to the time factor; look at the RunOrder column

ExpNo  ExpName  RunOrder  InOut  Temp  Time  DWL1  DWL2  Water  Glycerol  Lifetime
1      N1       15,4      In     7     1     0     0.4   0.4    0.2       139
2      N2       3,1       In     7     1     0.4   0.1   0.5    0         19
3      N3       22,8      In     7     1     0     0.2   0.8    0         14
4      N4       7,3       In     7     1     0.2   0     0.6    0.2       60
5      N5       4,2       In     21    1     0.4   0     0.4    0.2       208
6      N6       18,7      In     21    1     0.1   0.4   0.5    0         15
7      N7       17,6      In     21    1     0.2   0     0.8    0         11
8      N8       16,5      In     21    1     0     0.2   0.6    0.2       35
9      N9       2,17      In     7     25    0.4   0     0.4    0.2       362
10     N10      19,22     In     7     25    0.1   0.4   0.5    0         25
11     N11      14,21     In     7     25    0.2   0     0.8    0         26
12     N12      12,19     In     7     25    0     0.2   0.6    0.2       52
13     N13      21,24     In     21    25    0     0.4   0.4    0.2       213
14     N14      20,23     In     21    25    0.4   0.1   0.5    0         40
15     N15      13,20     In     21    25    0     0.2   0.8    0         33
16     N16      5,18      In     21    25    0.2   0     0.6    0.2       74
17     N17      10,13     In     7     13    0.2   0.2   0.5    0.1       94
18     N18      6,10      In     7     13    0.2   0.2   0.5    0.1       78
19     N19      11,14     In     7     13    0.2   0.2   0.5    0.1       132
20     N20      23,15     In     7     13    0.2   0.2   0.5    0.1       117
21     N21      1,9       In     21    13    0.2   0.2   0.5    0.1       61
22     N22      8,11      In     21    13    0.2   0.2   0.5    0.1       54
23     N23      9,12      In     21    13    0.2   0.2   0.5    0.1       77
24     N24      24,16     In     21    13    0.2   0.2   0.5    0.1       43

Bubble lifetime ranges between 11 and 362 sec

BubbleScr: - 8. Analysis of data and evaluation of model

Initial regression model was fitted with PLS

[Figure: Summary of Fit plot (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients plot for Lifetime~; Investigation: Bubb_scr (PLS, comp.=2)]
N=24, DF=11, R2=0.812, Q2=0.185, R2 Adj.=0.608, RSD=0.2476, Cond. no.=2.7203, Conf. lev.=0.95
Model terms shown: Ti, Te*Ti, Te*Gly, Ti*DW1, Te*DW1, Te*DW2, Ti*DW2, Te*Wa, Ti*Gly, Gly, Ti*Wa, Te, DW1, DW2, Wa

Poor model - many insignificant interaction terms that should be removed

When to use PLS

PLS is a pertinent choice if
(i) there are several correlated responses in the data set,
(ii) the experimental design has a high condition number (>10),
(iii) there are small amounts of missing data in the response matrix, or
(iv) the application involves a mixture (formulation) design

PLS -- Notation

K = number of X variables
M = number of Y variables
N = number of observations
A = number of PLS components

T = matrix of X-scores with columns t1, ..., tA (vectors)
P = matrix of X-loadings with columns p1, ..., pA (vectors)
W = matrix of PLS X-weights with columns w1, ..., wA (vectors)
U = matrix of Y-scores with columns u1, ..., uA (vectors)
C = matrix of PLS Y-weights with columns c1, ..., cA (vectors)

PLS -- Scaling of variables

[Figure: axes x1, x2, x3 drawn with lengths given by the measured values, and the same axes after unit variance scaling]
Defining/selecting the length of the variable axes (X- and Y-spaces)
Recommended: set each axis to unit length (unit variance scaling)
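Unit variance scaling can be illustrated with a short Python sketch. The function below also subtracts the mean, which is the centering step described in the geometric interpretation; the sample standard deviation is assumed as the scaling weight, and the two example columns are made-up values.

```python
from statistics import mean, stdev


def autoscale(column):
    """Mean-center a column and scale it to unit variance (sample standard deviation)."""
    center = mean(column)
    spread = stdev(column)  # sample standard deviation (n - 1 in the denominator)
    return [(value - center) / spread for value in column]


temperature = [7, 7, 21, 21, 14]        # made-up values on a wide numeric range
glycerol = [0.0, 0.2, 0.0, 0.2, 0.1]    # made-up values on a narrow numeric range

scaled_temperature = autoscale(temperature)
scaled_glycerol = autoscale(glycerol)
print(round(stdev(scaled_temperature), 6), round(stdev(scaled_glycerol), 6))
```

After scaling, both variables have the same spread, so neither dominates the projection merely because of its measurement unit.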

PLS -- Geometric Interpretation, 1

[Figure: data matrices X (N observations, K=3 factors/predictors) and Y (N observations, M=3 responses) with the corresponding X-space (axes x1, x2, x3) and Y-space (axes y1, y2, y3)]
For each matrix, X and Y, we construct a space with K and M dimensions, respectively (here K=M=3)
Each X- and Y-variable has one coordinate axis with the length defined by its scaling, typically unit variance

PLS -- Geometric Interpretation, 2

Each observation is represented by one point in the X-space and one in the Y-space
As in PCA, the initial step is to calculate and subtract the averages; this corresponds to moving the coordinate systems

PLS -- Geometric Interpretation, 3

[Figure: the same observation shown as one point in the X-space (x1, x2, x3) and one point in the Y-space (y1, y2, y3)]
The mean-centering procedure implies that the origins of the coordinate systems are repositioned

PLS -- Geometric Interpretation, 4

[Figure: projection of observation i onto Comp 1 (t1) in the X-space and onto Comp 1 (u1) in the Y-space]
The first PLS component is a line in X-space and a line in Y-space, calculated to a) approximate the point-swarms in X and Y well and b) maximize the covariance between the projections (t1 and u1)
These lines pass through the average points
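For the special case of a single response, the first component can be sketched without any iteration. This is a minimal illustration, not MODDE's implementation: with one response the Y-side line is simply the response axis, and the X-side direction w is the unit vector proportional to X'y, which maximizes the covariance between the scores t = Xw and y. The data are made up and already mean-centered.

```python
from math import sqrt


def first_pls_component(X, y):
    """Return (w, t) for the first component of a single-response PLS model.

    X is a list of rows and y a list of values; both are assumed mean-centered.
    """
    n_vars = len(X[0])
    # The weight of each X variable is proportional to its covariance with y.
    raw = [sum(row[k] * yi for row, yi in zip(X, y)) for k in range(n_vars)]
    norm = sqrt(sum(v * v for v in raw))
    w = [v / norm for v in raw]
    # Scores are the projections of the observations onto the direction w.
    t = [sum(row[k] * w[k] for k in range(n_vars)) for row in X]
    return w, t


# Small made-up, already centered data set (illustration only).
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [-3.0, 1.0, -1.0, 3.0]
w, t = first_pls_component(X, y)
print([round(v, 3) for v in w], [round(v, 3) for v in t])
```

Projecting onto any other unit direction, for example the x1 axis alone, gives a smaller sum of products with y than the scores t obtained here.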

PLS -- Geometric Interpretation, 5

The projection coordinates, t1 and u1, in the two spaces, X and Y, are connected and correlated through the inner relation ui1 = ti1 + hi (where hi is a residual)
The slope of the dotted line is 1.0

PLS -- Geometric Interpretation, 6

[Figure: Comp 1 (t1) and Comp 2 (t2) in the X-space; Comp 1 (u1) and Comp 2 (u2) in the Y-space]
The second PLS component is represented by lines in the X- and Y-spaces orthogonal to the lines of the first component, also going through the average points. These lines, t2 and u2, improve the approximation and correlation as much as possible.

PLS -- Geometric Interpretation, 7

The second projection coordinates (t2 and u2) correlate, but less well than the first pair of latent variables
By inserting the X-values of a new observation into the model, we obtain its t1- and t2-scores, which through the inner relation give values of u1 and u2, which, in turn, enable predicted values of Y to be computed

PLS -- Geometric Interpretation, 8

[Figure: planes spanned by Comp 1 and Comp 2 in the X-space (t1, t2) and in the Y-space (u1, u2)]
The PLS components form planes in X- and Y-spaces
The variability around the X-plane is used to calculate a tolerance interval within which new observations similar to the training set will be located. This is of interest in classification and prediction.

PLS -- Geometric Interpretation, 9

Repeated plotting of successive pairs of latent variables will give a good appreciation of the correlation structure

PLS -- Overview

$X = \mathbf{1}\bar{x}' + TP' + E$
$Y = \mathbf{1}\bar{y}' + UC' + F = \mathbf{1}\bar{y}' + TC' + G$ (because $U = T + H$, the inner relation)

Differences between PLS and PCA:
PLS: projection of X that both approximates X well and correlates with Y
PCA: projection of X that is an optimal approximation of X (least squares fit)

PLS -- Parameter properties

For each component:
1) t are linear combinations of X with weights w
- t is a summary of the X variables that are correlated with Y
2) u are linear combinations of Y with weights c
- u is a summary of the Y variables
3) w are the correlation coefficients between the x's and u
- Columns of X highly correlated with Y are given high weights
4) At convergence, for the orthogonality:
- p is computed so that t*p' is the "best approximation of X"
- t*p' is removed from X for the next component
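The deflation step in point 4 can be sketched for a single-response model. This is an illustrative NIPALS-style loop under simplifying assumptions (one response, centered made-up data), not the algorithm as implemented in any particular package. Removing t*p' from X before the next component is what makes successive score vectors orthogonal.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def column(X, k):
    return [row[k] for row in X]


def pls1_scores(X, y, n_components):
    """Return the score vectors t1..tA of a single-response PLS model (centered data assumed)."""
    X = [row[:] for row in X]  # work on a copy because X is deflated below
    n_vars = len(X[0])
    scores = []
    for _ in range(n_components):
        raw = [dot(column(X, k), y) for k in range(n_vars)]
        norm = sqrt(dot(raw, raw))
        w = [v / norm for v in raw]                              # X-weights
        t = [dot(row, w) for row in X]                           # X-scores
        tt = dot(t, t)
        p = [dot(column(X, k), t) / tt for k in range(n_vars)]   # X-loadings
        # Deflation: remove t*p' from X before computing the next component.
        X = [[row[k] - ti * p[k] for k in range(n_vars)] for row, ti in zip(X, t)]
        scores.append(t)
    return scores


# Made-up centered data: five observations, three X variables, one response.
X = [[-2, -1, 0], [-1, 1, -1], [0, -1, 1], [1, 2, -1], [2, -1, 1]]
y = [-3, 0, -1, 3, 1]
t1, t2 = pls1_scores(X, y, 2)
print(round(dot(t1, t2), 10))
```

The printed inner product of t1 and t2 is zero up to rounding error, which is the orthogonality referred to in point 4.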


Summary of PLS

PLS is a multivariate regression method which is useful for handling complex DOE problems
PLS is especially useful when:
(i) there are several correlated responses in the data set
(ii) the experimental design has a high condition number
(iii) there are small amounts of missing data in the response matrix
PLS calculates a new variable, t, summarizing X, and another new variable, u, summarizing Y, and investigates the correlation between them
All diagnostic tools available for MLR are retained for PLS
In addition, PLS provides other diagnostic tools, such as scores, loadings, and VIP


BubbleScr: - 8. Analysis of data and evaluation of model


All interaction terms were eliminated and the model was refitted
[Figure: Summary of Fit plot (R2, Q2, Model Validity, Reproducibility) and normal probability plot of standardized residuals for Lifetime~ with experiment number labels; Investigation: Bubb_scr (PLS, comp.=2)]
N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Cond. no.=2.1537
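The reported statistics can be cross-checked with a little arithmetic. The sketch below uses the textbook formula for adjusted R2; that MODDE uses exactly this definition is an assumption, although the values agree with the reported ones to within rounding. The inputs are the Bubb_scr figures before (R2=0.812, DF=11) and after (R2=0.796, DF=18) removal of the interaction terms.

```python
def adjusted_r2(r2, n_runs, residual_df):
    """Adjusted R2 from R2, the number of runs N and the residual degrees of freedom DF."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / residual_df


# Values read from the two Bubb_scr summaries (N=24 in both cases).
full_model = adjusted_r2(0.812, 24, 11)      # model with interaction terms
reduced_model = adjusted_r2(0.796, 24, 18)   # model after removing the interactions
print(round(full_model, 3), round(reduced_model, 3))
```

Dropping the interaction terms lowers R2 slightly but raises both the adjusted R2 and Q2, which is why the reduced model is preferred.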

BubbleScr: - 9. Visualization of modelling results

[Figure: Scaled & Centered Coefficients for Lifetime~ (terms Ti, DW1, DW2, Gly, Te, Wa); Investigation: Bubb_scr (PLS, comp.=2); N=24, DF=18, R2=0.796, Q2=0.640, R2 Adj.=0.739, RSD=0.2018, Conf. lev.=0.95]
Regression coefficients (reference mixture: 0.2 / 0.2 / 0.5 / 0.1)
[Figure: Tri-linear contour plot]

BubbleScr: - 10. Use of model

MODDE optimizer was used to propose two verifying experiments

Verifying experiment #1: Temp = 7, Time = 25, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Resp 1 = 1120 sec (18 min 40 sec)
Verifying experiment #2: Temp = 7, Time = 49, Mixture = 0.4 / 0.0 / 0.3 / 0.3, Resp 1 = 810 sec (13 min 30 sec)

Summary
The proposed working strategy works for
mixture regions of regular geometry
mixture regions of irregular geometry
experimental series involving both process and mixture factors

The strategy is oriented towards a graphical presentation of modelling results
In BubbleScr it was possible to raise bubble lifetime from 11 sec to 6 min 02 sec (362 sec)
Verifying experiments of model predictions gave an increased lifetime of 18 min 40 sec (1120 sec)
Bubble lifetime was further optimized by an RSM D-optimal design (see section 5)

Mixture Design Add-On

Application: Bubbles (RSM)

BubbleOpt: - 1. Definition of factors and bounds

Verifying experiment #1 was used to adjust the bounds of the four mixture factors

Process Factors:
Temperature kept constant (+7 °C)
Time kept constant at 25 h

Mixture Factors:
Dish-washing liquid 1, SKONA, ICA (0.1 - 0.3)
Dish-washing liquid 2, NEUTRAL, ADACO (0.1 - 0.3)
Tap Water, Umeå (0.2 - 0.4)
Glycerol, APOTEKETS (15% water content / 0.2 - 0.4)

Constraint: 0.3 <= DWL1 + DWL2 <= 0.5

Response: Lifetime of bubbles (sec) obtained with a children's bubble wand. Time until bursting was measured for bubbles of 4-5 cm size (diameter)
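The factor definitions translate directly into a feasibility check for candidate mixtures. The sketch below reads the constraint as 0.3 <= DWL1 + DWL2 <= 0.5 and uses the bounds listed for the four mixture factors; the function name and the tolerance are illustrative choices.

```python
BOUNDS = {
    "DWL1": (0.1, 0.3),
    "DWL2": (0.1, 0.3),
    "Water": (0.2, 0.4),
    "Glycerol": (0.2, 0.4),
}
TOL = 1e-9  # tolerance for floating-point comparisons


def is_feasible(mixture):
    """Check bounds, the sum-to-one condition and 0.3 <= DWL1 + DWL2 <= 0.5."""
    within_bounds = all(
        low - TOL <= mixture[name] <= high + TOL for name, (low, high) in BOUNDS.items()
    )
    sums_to_one = abs(sum(mixture.values()) - 1.0) <= TOL
    detergent = mixture["DWL1"] + mixture["DWL2"]
    constraint_ok = 0.3 - TOL <= detergent <= 0.5 + TOL
    return within_bounds and sums_to_one and constraint_ok


reference = {"DWL1": 0.2, "DWL2": 0.2, "Water": 0.3, "Glycerol": 0.3}
all_low = {"DWL1": 0.1, "DWL2": 0.1, "Water": 0.4, "Glycerol": 0.4}
print(is_feasible(reference), is_feasible(all_low))
```

The second mixture respects every individual bound and sums to one, yet it violates the linear constraint on the total amount of dish-washing liquid. This is what makes the experimental region irregular.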


BubbleOpt: - 2. Selection of experimental objective and mixture model

Experimental objective:
Optimization

Mixture model:
Quadratic

$y = \beta_0 + \beta_1 x_{MF1} + \beta_2 x_{MF2} + \beta_3 x_{MF3} + \beta_4 x_{MF4} + \beta_{11} x_{MF1}^2 + \beta_{22} x_{MF2}^2 + \beta_{33} x_{MF3}^2 + \beta_{44} x_{MF4}^2 + \beta_{12} x_{MF1} x_{MF2} + \beta_{13} x_{MF1} x_{MF3} + \beta_{14} x_{MF1} x_{MF4} + \beta_{23} x_{MF2} x_{MF3} + \beta_{24} x_{MF2} x_{MF4} + \beta_{34} x_{MF3} x_{MF4} + \varepsilon$

BubbleOpt: - 3. Selection of candidate set

Overview of candidate set
12 extreme vertices
40 centers of edges
10 centroids of high-dimensional surfaces
1 overall centroid

BubbleOpt: - 4. Generation of design

The proposed model needs 10 degrees of freedom (DF)
1 DF for the constant
3 DF for the linear terms
3 + 3 DF for the quadratic and interaction terms

Add 5 extra experiments to get enough DF
In addition, 2 supplementary experiments are needed to handle the complexity introduced by the linear constraint
Lead number of experiments = 10 + 5 + 2 = 17

Selected design with 24 runs (Geff = 83%, CondNo = 16.8)
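The degrees-of-freedom count can be reproduced with a short calculation. The sketch assumes the following reading of the slide: because the four mixture fractions sum to 1, only three of them vary independently, and a full quadratic polynomial in three independent variables has 1 + 3 + 3 + 3 terms.

```python
def quadratic_terms(n_independent):
    """Terms in a full quadratic polynomial: constant, linear, squared, two-factor interaction."""
    constant = 1
    linear = n_independent
    squared = n_independent
    interactions = n_independent * (n_independent - 1) // 2
    return constant + linear + squared + interactions


n_mixture_factors = 4
# The mixture constraint (fractions sum to 1) leaves one factor fewer free to vary.
n_independent = n_mixture_factors - 1
model_df = quadratic_terms(n_independent)   # 1 + 3 + 3 + 3
lead_runs = model_df + 5 + 2                # extra and supplementary runs from the slide
print(model_df, lead_runs)
```

This prints 10 and 17, matching the DF requirement and the lead number of experiments stated above.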

BubbleOpt: - 5. Evaluation of size and shape of mixture region

[Figure: design region shown at Glycerol = 0.2, 0.3 and 0.4]

BubbleOpt: - 6. Definition of reference mixture

Calculated reference mixture: (0.2 / 0.2 / 0.3 / 0.3) (DWL1 / DWL2 / water / glycerol)
This reference mixture is identical to the previously used verifying experiment

BubbleOpt: - 7. Execution of design

Carry out experiments in randomized order
Bubble lifetime ranges between 647 and 1348 sec

BubbleOpt: - 8. Analysis of data and evaluation of model


Regression model was fitted with PLS - good model
[Figure: Summary of Fit plot (R2, Q2, Model Validity, Reproducibility) and normal probability plot of standardized residuals for Lifetime~ with experiment number labels; Investigation: Bubb_rsm (PLS, comp.=2)]
N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Cond. no.=12.3206

BubbleOpt: - 9. Visualization of modelling results

[Figure: Scaled & Centered Coefficients for Lifetime~ (terms Gly*Gly, DW1, DW2, Gly, Wa, Wa*Wa, DW1*Gly, DW1*DW1, DW2*DW2, DW1*DW2, DW2*Gly, DW1*Wa, DW2*Wa, Wa*Gly); Investigation: Bubb_rsm (PLS, comp.=2); N=24, DF=14, R2=0.919, Q2=0.708, R2 Adj.=0.868, RSD=0.0358, Conf. lev.=0.95]
Regression coefficients; reference mixture 0.2 / 0.2 / 0.3 / 0.3
[Figure: Tri-linear contour plot; plot settings shown: Glycerol = 0.2, Temp = 14, Time = 13]

BubbleOpt: - 10. Use of model

[Figure: Raw Data Plot of Log (Lifetime) versus Cost with experiment number labels]
Ingredient cost is easy to take into consideration

BubbleOpt: - 10. Use of model

Lowest ingredient cost with long-lasting bubbles

Conclusions, Bubble example


The sequence 1) Screening, 2) RSM is very fruitful for rational experimental work
We were able to increase bubble lifetime from 6 min 02 sec to 22 min 28 sec
Key to success was to increase glycerol substantially
Long-lasting bubbles are obtained with
Cooled solution
25 h settling time (not popular for kids)
Formulation: DWL1 = 0.23, DWL2 = 0.1, Water = 0.27, Glycerol = 0.4
Red plastic bubble wand

Mixture Designs, Summary


To obtain the best design you must determine:
Factors
Bounds (low - high)
Type of region: Regular? Irregular?
Experimental objective: Screening? Optimization?
Number of runs

... and use PLS for modelling!

Design of Experiments (DoE) Pharma Applications


Section 13: Exercises


Overview of Exercises Layout


Each exercise contains the following headlines
Background (Why this investigation?)
Objective (What is the goal/objective of the exercise?)
Data (Description of X and Y and observations, originator(s) and literature source(s))
Tasks (What you are expected to do in this exercise)
Solutions (A proposed solution to the tasks given)
Conclusions (Emphasising main points of the exercise)

Please do not hesitate to ask the course instructor(s) for help/advice
Remember that our solutions are just proposals; other alternatives might exist

Exercises
Getting started: ByHand, CakeMix
Screening, Full factorial designs: Pain, Tablets, Protein Spray-Drying
Screening, Fractional factorial designs: Pilot Plant, RGA-Phase 1, RGA-Phase 2, Chromspher_B
Optimization: Chiral Separation, Metabolism, RGA-Phase 3, Willge, DrogenD
Robustness Testing: Nonafact, RGA-Phase 4, HPLC Robustness
Robust Design: CakeTaguchi, LoafVolume
D-optimal Design: Model Updating
Blocking: Blocking
Mixture Design: Mixture Region Training, Waaler, Rocket, Corne59, Bubbles, Lowarp

DOE-Exercise ByHand (Full Fac)


Chemical synthesis: Reduction of Enamine

Background
Enamines are reduced by formic acid to saturated amines. In this example morpholine-camphor enamine is the starting material. To investigate the amount of formic acid necessary and at which temperature the reaction should be carried out, design of experiments (DOE) was used.

Objective
The original objective was to make a model for three responses. Our first objective is to do calculations by hand to get an understanding of the arithmetic involved. After that, you should familiarise yourself with the software and perform the same calculations using the computer. The experimental goal was to minimise the amount of side product (Camphor) and the amount of unreacted starting material (Enamine), whilst maximising the yield of the desired product.

Data

Factors                                         Levels:  -      0      +
x1  Amount formic acid/enamine (mole/mole)               1.0    1.25   1.5
x2  Reaction temperature (°C)                            25     62.5   100

Responses                        Goals
y1  Camphor (side product) %     to be minimised
y2  Enamine unreacted %          to be minimised
y3  The desired product %        to be maximised

          Factors        Responses
Exp. no   x1     x2      y1     y2     y3
1         1      25      6.7    12.5   80.4
2         1.5    25      10.5   14.0   72.4
3         1      100     5.5    0.0    94.4
4         1.5    100     7.7    0.0    90.6
5         1.25   62.5    7.5    13.1   84.5
6         1.25   62.5    7.9    13.5   85.2
7         1.25   62.5    7.8    13.3   83.8

Tasks
Task 1
Calculate by hand the coefficients of the equation $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + e$. Do these calculations only for the first response, y1. Do not include the centre points in these calculations (include them only when calculating the constant, $b_0$); centre points are used for diagnostics. Hint: use the sign table mentioned in the lecture.

Copyright Umetrics AB, 04-02-10

Page 1 (5)

Task 2
Initiate a new investigation in MODDE and define the two factors and the three responses according to the information above.
Do File/New and give the investigation a name. Press Next.
Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and low and high settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press OK. Press Next. Now we have defined the factors.
Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the first response. Press Add another and give the details of the second response. Press Add another and enter the information regarding the third response. Press OK. Press Next. Now we have defined the responses.
Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in four runs. Verify that the number of Centre Points = 3 and Total runs = 7. Press Finish. Set Worksheet Run order to detect curvature and press OK. Now we have generated the experimental design.
Enter the response values in the resulting worksheet. Now we are ready for data analysis.

Task 3
Evaluate the raw data. Make replicate plots (Worksheet/Replicate Plot) and histograms (Worksheet/Histogram) to examine the responses. Do Analysis/Fit. Evaluate the model. For which responses is the model reliable? What do you think could be the problem with the misbehaving response? Discuss.

Task 4
Look at the contour plot for each response (Prediction/Contour Plot Wizard). Which conditions should be chosen for preparative large-scale reduction of enamines of the morpholine-camphor type?


Solutions to ByHand
Task 1
Sign table
Exp. no   b0   b1   b2   b12
1         +    -    -    +
2         +    +    -    -
3         +    -    +    -
4         +    +    +    +
5         +    0    0    0
6         +    0    0    0
7         +    0    0    0

b0 = (6.7 + 10.5 + 5.5 + 7.7 + 7.5 + 7.9 + 7.8)/7 = 7.657
b1 = (-6.7 + 10.5 - 5.5 + 7.7)/4 = 1.5
b2 = (-6.7 - 10.5 + 5.5 + 7.7)/4 = -1.0
b12 = (6.7 - 10.5 - 5.5 + 7.7)/4 = -0.4
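The hand calculation can be checked with a few lines of Python. This is a minimal sketch of the sign-table arithmetic using the y1 values from the data table; the helper name is illustrative, and the coefficients are in coded units (-1, 0, +1).

```python
# Coded design (sign table) and the y1 values (Camphor, %) from the data table.
x1 = [-1, 1, -1, 1, 0, 0, 0]
x2 = [-1, -1, 1, 1, 0, 0, 0]
y1 = [6.7, 10.5, 5.5, 7.7, 7.5, 7.9, 7.8]

x12 = [a * b for a, b in zip(x1, x2)]  # interaction column is the product of the two signs


def effect_coefficient(signs, response, n_factorial=4):
    """Sum of sign * response over all runs, divided by the number of factorial runs."""
    return sum(s * y for s, y in zip(signs, response)) / n_factorial


b0 = sum(y1) / len(y1)            # constant: mean of all seven runs
b1 = effect_coefficient(x1, y1)   # centre points carry sign 0 and drop out
b2 = effect_coefficient(x2, y1)
b12 = effect_coefficient(x12, y1)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b12, 3))
```

The printed values, 7.657, 1.5, -1.0 and -0.4, agree with the solution above.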

Task 3
We start by evaluating the raw data. The three replicate plots indicate that the replicate error is small for each response, which is favourable for the data analysis. It is possible to use the replicate plot to get a rough understanding of the relationships between the factors and the responses. We are going to fit an interaction model to each response. For such a model to be valid, the measurement values of the centre-points should be found in the middle part of the response interval. This is the case for y1 and y3, but not for y2. Hence, the replicate plot for y2 suggests that the relationship between y2 and the factors is curved (non-linear), which is impossible to describe with an interaction model.
[Figure: Plots of Replications for y1, y2 and y3 with experiment number labels; Investigation: Byhand]

Next, we create one histogram for each response. The three responses are approximately normally distributed and a need for response transformation cannot be detected.
[Figure: histograms of y1, y2 and y3 (count versus bins).]


After the raw data evaluation it is appropriate to carry out the regression modelling. According to the summary of fit plot, the model is reliable for all responses except y2, Enamine unreacted. The reason for this can be any of the following:
the response includes an outlier
a mistake was made in recording the response, for example the zeros are missing values
the model is too simple
the model is too complicated

[Figure: Summary of Fit (MLR) for y1, y2 and y3 showing R2, Q2, Model Validity and Reproducibility; N=7, DF=3, Cond. no.=1.3229, Y-miss=0.]

Since we understood from the replicate plot of y2 that curvature is involved, it is likely that the fitted model is too simple. This can easily be checked by making plots of the raw data.
[Figure: raw data plots of y2 versus x1 and y2 versus x2, with experiment number labels.]

From the scatter plots shown above the curvature is obvious. Such curvature can only be adequately captured by quadratic model terms, i.e. x1^2 and x2^2. The conclusion is therefore that with the current experimental design we cannot make a good model for y2. To estimate quadratic model terms the design must be expanded to become a composite design.
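Why the factorial design cannot support quadratic terms can be illustrated with a rank check. The sketch below builds the model matrix of a full quadratic model for the 2^2 design with three centre-points, and again after adding four axial points. The face-centred placement of the axial points (at ±1) is an assumption made for illustration; the text does not say which type of composite design would be used.

```python
import itertools
import numpy as np

def quadratic_model_matrix(design):
    """Columns: constant, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = design[:, 0], design[:, 1]
    return np.column_stack([np.ones(len(design)), x1, x2,
                            x1 * x2, x1 ** 2, x2 ** 2])

corners = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
centre = np.zeros((3, 2))
axial = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])

factorial = np.vstack([corners, centre])          # 7 runs
composite = np.vstack([corners, axial, centre])   # 11 runs

rank_factorial = np.linalg.matrix_rank(quadratic_model_matrix(factorial))
rank_composite = np.linalg.matrix_rank(quadratic_model_matrix(composite))

print("model terms:", 6)
print("rank with factorial + centre-points:", rank_factorial)
print("rank with composite design:", rank_composite)
```

In the factorial design the x1^2 and x2^2 columns are identical (1 at every corner, 0 at the centre), so the two quadratic effects cannot be separated. The axial points break that tie.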


Task 4
According to the response contour plots the temperature should be as high as possible and the ratio formic acid/enamine as low as possible. With these conditions we minimise the amount of Camphor (y1) and Enamine (y2) and maximise the amount of Product (y3).

Optimal point: Low x1 High x2

NOTE: Because of the weakness of the model for y2, the second response contour plot should be interpreted with some caution.

Conclusions
The optimal point is low x1 (low molar ratio) and high x2 (high temperature). The model for y2 is weak because the relationship between the factors and this response is non-linear.


DOE-Exercise CakeMix (Full Fac)


Finding optimal CakeMix composition

Background
The producer of a commercial cake-mix experienced problems with the quality of the resulting cake in that there was considerable taste variation.

Objective
It was decided to use DOE to discover which combination of ingredients produced a tasty cake and which combination produced a reasonable cake at low cost.

Data
Three factors were studied: Flour, Shortening, and Eggpowder. The investigators used a design centred around the standard condition Flour = 300g, Shortening = 75g, and Eggpowder = 75g. Eleven experiments were made using a 2^3 full factorial design augmented with three replicated centre-points. The response is the average taste as assessed by a trained sensory panel.

Goal: Maximize


Tasks
Task 1
Define a new investigation in MODDE with three factors and one response. Do File/New and name the investigation. Press Next. Press New (or double-click on the empty row) and enter the name, abbreviation, unit, and settings of the first factor. Press Add another and fill in the name, abbreviation, unit, and settings of the second factor. Press Add another and enter the details of the third factor. Press OK. Press Next. The three factors have now been defined. Press New (or double-click on the empty row) and enter the name, unit, and abbreviation of the Taste response. Press OK. Press Next. Select Screening. Press Next. Make sure that the selected design is the Full Factorial design in eight runs. Verify that Centre Points = 3 and Total runs = 11. Press Finish. Set Worksheet Run Order to detect curvature. Enter the response values in the generated worksheet. Now you are ready to analyse the data.

Evaluate the raw data. Fit the regression model. Which factors affect taste? Are there any non-significant model terms? What about lack of fit? Which factor combination gives an optimal taste?
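The worksheet that MODDE generates in Task 1 can be reproduced outside the program as a cross-check. The sketch below builds a two-level full factorial in coded units, appends three centre-points, and converts to grams. The factor ranges are inferred from the stated centre (300/75/75 g) and the settings reported in the solution (Flour 400 g, Shortening 50 to 100 g, Eggpowder 50 to 100 g); the Flour low level of 200 g in particular is an assumption. The function names are illustrative and not part of MODDE.

```python
import itertools
import numpy as np

def full_factorial_with_centre(n_factors, n_centre=3):
    """Coded (-1/0/+1) two-level full factorial plus replicated centre-points."""
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))
    corners = corners[:, ::-1]  # first factor varies fastest (standard order)
    centre = np.zeros((n_centre, n_factors))
    return np.vstack([corners, centre])

def to_real_units(coded, low, high):
    """Map coded levels to real settings: -1 -> low, 0 -> midpoint, +1 -> high."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return (high + low) / 2 + coded * (high - low) / 2

coded = full_factorial_with_centre(3)
low = [200, 50, 50]      # Flour, Shortening, Eggpowder (g); assumed ranges
high = [400, 100, 100]
worksheet = to_real_units(coded, low, high)

for run, row in enumerate(worksheet, start=1):
    print(run, row)
```

The rows are in standard order, not in the randomized run order a real worksheet would use. With this ordering, run 1 is the -/-/- corner, run 8 the +/+/+ corner and run 9 the first centre-point, which matches the experiment numbers mentioned in the solution.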

Task 2
It is possible to take the cost of ingredients into account in the data analysis. The following prices were obtained:
Flour: 2.95 SEK/kg (0.00295 SEK/g)
Shortening: 14.70 SEK/kg (0.0147 SEK/g)
Eggpowder: 32.30 SEK/kg (0.0323 SEK/g)

Define a new response, a Derived response. Select Design and Responses. Double-click on the empty row. Define a derived response and press Edit, Next and Finish to enter the formula. Select ingredient from the list and multiply by the cost per gram, as shown below (NB: the parentheses shown in the formula are only used for clarity, they are not needed in reality). Also note that this task does not work with comma as decimal separator.

Refit the model. Find a recipe which represents a good compromise between a tasty cake and low cost. (Hint: Use Prediction/Contour Plot Wizard).


Solutions to CakeMix
Task 1
We start by evaluating the raw data. First, we examine the curvature diagnostics plot (Worksheet/Curvature Diagnostics Plot) for taste (see below). This plot is constructed by plotting the value of Taste at three points, (1) the -/-/- factor combination, (9) the 0/0/0 factor combination, and (8) the +/+/+ factor combination. It is useful to examine whether the relationship between one response and the factors deviates from linearity. In this case the deviation from linearity is only mild and we may continue with the rest of the experiments. Whenever this plot exhibits strong curvature, reduce the range of the factors by 2/3.
[Figure: curvature diagnostics plot for Taste at the low, centre and high factor settings. The value in parentheses for each X-axis label is the normalized distance from the plotted experiment to the ideal design point; all three distances are 0.]

The replicate plot shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed. This means that we have good data to work with.
[Figure: replicate plot for Taste (with experiment number labels) and histogram of Taste.]


In the data analysis it is recommended to first examine the Summary of fit plot. This plot shows that we can explain 99% (R2 = 0.99) and predict 87% (Q2 = 0.87) of the response variation. The adequacy of the model is further indicated by MVal = 0.71 and Rep = 0.99. MVal measures the validity of the model and Rep the reproducibility. When the MVal bar is larger than 0.25, there is no Lack of Fit of the model (the model error is in the same range as the pure error). This is also shown by the ANOVA-table below, where the lower p-value is larger than 0.05, which means that the model exhibits no significant lack of fit. The upper p-value is smaller than 0.05, indicating that R2 is statistically significant. If the reproducibility is below 0.5, you have a large pure error, poor control of the experimental set up (the noise level is high), and you cannot assess the validity of the model. This results in low R2 and Q2. You should improve the reproducibility.
[Figure: Summary of Fit (MLR) for Taste showing R2, Q2, Model Validity and Reproducibility; N=11, DF=4, Cond. no.=1.1726, Y-miss=0.]

Another diagnostic tool that is often used is the N-plot of residuals. However, with only 11 experiments it is difficult to define which residuals are normally distributed and which are not. In the plot below, the main thing to confirm is that all the experiments lie within 4 SDs, which they do. Inspection of the regression coefficients indicates that two model terms, Fl*Sh and Fl*Egg, are non-significant and can be removed from the model.
[Figure: N-plot of deleted studentized residuals and scaled and centered coefficient plot for Taste (full interaction model); N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

After refitting the model a higher Q2 (0.94) is obtained. MVal and the ANOVA table also indicate the usefulness of the model. In the N-plot of residuals, experiment #1 is located beyond 4 SDs, but this is considered harmless given the high Q2 of approximately 0.94.


[Figure: Summary of Fit, N-plot of deleted studentized residuals and scaled and centered coefficient plot for the refined Taste model (terms Fl, Sh, Egg and Sh*Egg); N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95, Cond. no.=1.1726, Y-miss=0.]

The coefficient plot indicates that the largest model term is the Sh*Egg interaction. It is normal to explore such interactions by means of response contour plots. The three response contour plots indicate that the highest value of taste is found with the factor settings Flour = 400g, Shortening = 50g and Eggpowder = 100g.


Task 2
In Task 1 we found that Flour should be fixed at its high level in order to produce a tasty cake. This ingredient is also the cheapest one. The contour plots shown below were constructed using Flour = 400 g.

Apparently, we should stay in the upper left-hand corner to maximize taste. In this corner, the predicted ingredient cost is 5.14 SEK. However, the lower right-hand corner represents a reasonable compromise between taste and cost. Here the predicted cost is just 4.27 SEK.
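The two predicted costs quoted above follow directly from the prices given in Task 2. The sketch below recomputes them, taking the two corners to be the recipes named in the Conclusions. The function and dictionary names are illustrative and do not correspond to anything in MODDE.

```python
# Prices per gram as listed in Task 2 (SEK/g)
PRICE_PER_GRAM = {"Flour": 0.00295, "Shortening": 0.0147, "Eggpowder": 0.0323}

def ingredient_cost(recipe):
    """Total ingredient cost in SEK for a recipe given in grams."""
    return sum(PRICE_PER_GRAM[name] * grams for name, grams in recipe.items())

best_taste = {"Flour": 400, "Shortening": 50, "Eggpowder": 100}
compromise = {"Flour": 400, "Shortening": 100, "Eggpowder": 50}

cost_best_taste = ingredient_cost(best_taste)
cost_compromise = ingredient_cost(compromise)

print(f"best taste: {cost_best_taste:.3f} SEK")
print(f"compromise: {cost_compromise:.3f} SEK")
```

The results, about 5.145 SEK and 4.265 SEK, agree with the 5.14 SEK and 4.27 SEK read from the contour plots. Cost is an exact linear function of the factors, so the derived response carries no model error of its own.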

Conclusions
To maximize taste we should use Flour = 400g, Shortening = 50g and Eggpowder = 100g. To obtain a compromise between high taste and low cost an alternative factor combination would be Flour = 400g, Shortening = 100g, and Eggpowder = 50g.


DOE-Exercise Pain (Full Fac)


Combinations of active ingredients in a pain-reliever

Background
A new combination of constituents in a formulation with pain-relieving capacity was investigated. The formulation contained two active components, A and B, and the effect of different combinations of these were examined. The response was the time (in minutes) needed for the formulation to reach full anaesthetic effect (the average from testing 12 persons). The desirable result was full effect after 5 minutes. Substance A costs 60 times more than B to produce. Since every experiment was very expensive the number of experiments was minimised.

Objective
The first objective is to optimise time, with two variables, through the use of contour plots. Another objective is to consider production economy.

Data

Goal: 5 minutes

Tasks
Task 1
Define a new investigation according to the information given above. The default experimental plan in MODDE is the one used in this application. Fit the regression model. Construct a plot that shows under which conditions the formulation achieves the desired effect.

Task 2
Which approved combination of A and B is the most economical? Do this with graphical tools. (Hint: Add the derived response Cost).

Task 3
If the desirable result was full effect within 5 minutes, what would the answer to Task 2 be (at the 95% confidence level)? Hint: use the Prediction menu.

Solutions to Pain
Task 1
In the data analysis it is recommended to examine the R2/Q2 plot first (summary of fit). This plot shows that we can explain 99% (R2 = 0.99) and predict 98% (Q2 = 0.98) of the response variation. Also the statistics MVal = 0.94 and Rep = 0.98 point to an excellent model. The coefficient plot shows that constituent A (CA) affects the release time more strongly than constituent B (CB). If the release time is to be minimised, we should increase the amounts of both constituents.
[Figure: Summary of Fit (MLR) and scaled and centered coefficient plot (terms CA, CB and CA*CB, in minutes) for Release time; N=7, DF=3, R2=0.995, Q2=0.984, R2 Adj.=0.990, RSD=0.1676, Conf. lev.=0.95, Cond. no.=1.3229, Y-miss=0.]

Any point along the line Release time = 5 fulfils the experimental goal. (Hint: the number of steps in a contour plot can be changed by double-clicking in the plot, selecting Contour Levels and changing the number of steps and/or min/max. Here, we used 13 steps when going from Min = 4 to Max = 10.)


Task 2
We added a new response, cost (derived from the factors).

The most economical combination of A and B contains as little of A as possible, i.e., approximately CA = 8.9 and CB = 100. At this point, the predicted cost is 634 (in arbitrary currency unit). The left-hand contour plot only gives the point estimate of Release time. The prediction list shows the uncertainty in the predicted value. As shown by this list, using the combination CA = 8.9 and CB = 100 might result in a Release time ranging from 4.6 to 5.4 minutes.
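The derived response Cost can be written out explicitly. The sketch below assumes the formula Cost = 60*CA + CB, i.e. one arbitrary currency unit per unit of B and 60 per unit of A. That formula is not given in the text; it is inferred from the statement that A costs 60 times more than B, and it reproduces the predicted cost of 634 at CA = 8.9 and CB = 100.

```python
def formulation_cost(ca, cb, price_a=60.0, price_b=1.0):
    """Cost in arbitrary currency units; the unit prices are assumptions."""
    return price_a * ca + price_b * cb

# Point read from the contour plot in Task 2
cost_task2 = formulation_cost(8.9, 100)

# A formulation with slightly more A, for comparison
cost_more_a = formulation_cost(9.45, 100)

print(f"CA = 8.90, CB = 100: cost = {cost_task2:.0f}")
print(f"CA = 9.45, CB = 100: cost = {cost_more_a:.0f}")
```

Because A dominates the cost, the cheapest acceptable formulation is found by pushing CB to its upper limit and then using only as much A as the release-time requirement demands.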


Task 3
If we want to make sure that the Release time does not exceed 5 minutes, we have to move the upper confidence limit from approximately 5.4 down to 5.0. Since the confidence interval is roughly ±0.4 minutes wide, this implies that we should look for a point estimate of approximately 4.6. From the prediction list we conclude that, in order to be sure that the pain-reliever does not take longer than 5 minutes to reach full effect, we need 9.45 mg of substance A and 100 mg of substance B (the cheapest solution). The limits are given with 95% confidence.

Conclusions
In order to accomplish full anaesthetic effect within five minutes we may use the recipe constituent A = 9.45 mg and constituent B = 100 mg. This combination of ingredients is the most economical one.


DOE-Exercise Tablet (Full Fac)


Variation in the thickness of pharmaceutical tablets

Background
A manufacturer experienced problems with variation in the thickness of tablets. The variation caused problems during packaging. The problem was tackled by determining which factors had the largest influence on the thickness of the tablets. Three factors that were considered to have an impact on the thickness of the tablets were investigated using experimental design. These factors were the amount of stearate (lubricant), the amount of active substance, and the amount of starch.

Objective
The objective of the investigation was to produce an experimental design and model the response. The goal was to produce a 5 mm thick tablet with a fixed level (90 mg) of active substance.

Data

Goal: 5 mm


Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Select Screening as objective. Accept the recommended 11 run design (Full factorial design in 8 runs plus 3 centre-points). Enter the response values in the Worksheet.

Task 2
Do Analysis/Fit. Determine which factors have the strongest influence on the thickness of tablets by looking at the coefficient plot. Are there any interaction effects present?

Task 3
How would you produce a 5 mm thick tablet with 90 mg active substance? With what precision can this be done (5 mm ± ???)? Hint 1: Use a response contour plot to find suitable factor combinations at which to perform predictions. Hint 2: Use the prediction list and compute the predicted value and its associated confidence interval.


Solutions to Tablet
Task 2
As seen below, the factors active substance and starch have a strong influence on the thickness of the tablets. The factor stearate has a small influence. The interaction between Stearate and Starch is small but should be included since the R2 and Q2 numbers decline when it is removed.
[Figure: Summary of Fit and scaled and centered coefficient plots (in mm) for thickness. Upper plots, full interaction model: N=11, DF=4, R2=0.969, Q2=0.504, R2 Adj.=0.922, RSD=0.1076. Lower plots, refined model (ste, actsu, sta and ste*sta): N=11, DF=6, R2=0.953, Q2=0.808, R2 Adj.=0.921, RSD=0.1083. Both: Cond. no.=1.1726, Y-miss=0, Conf. lev.=0.95.]

The two lower plots show the result after insignificant terms have been removed. The principle for removing model terms is maximisation of Q2. The term ste*actsu has the smallest coefficient and is removed first. The model is then recalculated with the remaining terms and Q2 is compared with that of the original model. In this case Q2 increases from 0.50 to 0.78. We then continue by removing the second smallest interaction term, actsu*sta, and check Q2. Again Q2 increases, from 0.78 to 0.81. When the last interaction term, ste*sta, is removed, Q2 drops a little. This indicates that the four-term model displayed above is the predictively optimal one. This example also shows that a small model term need not be excluded from the model just because it is insignificant according to the confidence interval criterion.
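The term-removal procedure described above can be sketched in a few lines. The code below is only an illustration under several assumptions: Q2 is computed as 1 - PRESS/SS with leave-one-out cross-validation, which may differ in detail from MODDE's calculation; the response values are synthetic, since the Tablet data table is not reproduced here; and the generic factor names x1 to x3 and all function names are hypothetical.

```python
import itertools
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_total for a least-squares model."""
    H = X @ np.linalg.pinv(X)                      # hat matrix
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def build_columns(design, names):
    """Constant, linear and two-factor interaction columns, keyed by term name."""
    cols = {"const": np.ones(len(design))}
    for j, name in enumerate(names):
        cols[name] = design[:, j]
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        cols[f"{a}*{b}"] = design[:, i] * design[:, j]
    return cols

def prune_by_q2(cols, y):
    """Drop the smallest interaction term repeatedly while Q2 does not decrease."""
    terms = list(cols)
    best = q2_loo(np.column_stack([cols[t] for t in terms]), y)
    while True:
        interactions = [t for t in terms if "*" in t]
        if not interactions:
            return terms, best
        X = np.column_stack([cols[t] for t in terms])
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        weakest = min(interactions, key=lambda t: abs(coef[terms.index(t)]))
        trial = [t for t in terms if t != weakest]
        q2 = q2_loo(np.column_stack([cols[t] for t in trial]), y)
        if q2 < best:
            return terms, best                     # removal hurt prediction
        terms, best = trial, q2

# 2^3 full factorial plus three centre-points, coded units
corners = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
design = np.vstack([corners, np.zeros((3, 3))])
cols = build_columns(design, ["x1", "x2", "x3"])

# Synthetic response (hypothetical coefficients and noise, for illustration)
noise = np.array([0.05, -0.04, 0.02, -0.06, 0.03, 0.01,
                  -0.02, 0.04, -0.03, 0.02, 0.01])
y = (5.0 + 0.2 * cols["x1"] + 0.4 * cols["x2"] - 0.3 * cols["x3"]
     + 0.15 * cols["x1*x2"] + noise)

q2_full = q2_loo(np.column_stack(list(cols.values())), y)
kept_terms, q2_final = prune_by_q2(cols, y)
print("Q2 full model  :", round(q2_full, 3))
print("terms kept     :", kept_terms)
print("Q2 pruned model:", round(q2_final, 3))
```

Only interaction terms are candidates for removal in this sketch, mirroring the Tablet solution where the linear terms are retained. The stopping rule keeps a small term whenever dropping it lowers Q2, which is the point the solution makes about ste*sta.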


Task 3
In the contour plot, the line where the thickness of the tablets is predicted to be 5 mm is of interest. Below, we give some predictions for different factor combinations found along the 5 mm line. According to these predictions the tablets can be made with a precision ranging from 5.00 ± 0.09 mm to 5.00 ± 0.12 mm, depending on which factor combination is selected for production.

Active substance: 90 mg

Conclusions
Active substance and starch are the two ingredients most profoundly affecting the tablet thickness. According to model predictions the tablets can be made with the precision 5.00 ± 0.09 mm if the factor combination stearate = 1.0 mg, active substance = 90 mg, and starch = 44.5 mg is used in the manufacturing.


PROTEIN SPRAY-DRYING (Full Fac)


Investigating the effect of process variables on the degradation of spray-dried protein

Background
Spray-drying is a process often used for drugs intended for inhalation. For the spray-drying of proteins, the prime interest is to produce particles of controlled size. Additionally, it is important that the protein temperature remains rather low to avoid unnecessary denaturation. Protein degradation may involve many complicated physical and chemical processes, including denaturation. Therefore, we would like to study protein stability at a molecular level in order to facilitate formulation applications.

Objective
This example is based on a model protein (D7599) developed by AstraZeneca. Protein powders of D7599 were produced by spray-drying. The experimental objective of this study was to determine which process parameters influence the quality of the spray-dried product. The data analysis will involve dealing with several responses which are not completely correlated. Original data source: Cronholm, M., The Effect of Process Variables on a Spray-dried Protein Intended for Inhalation, Undergraduate Research Study, Department of Pharmaceutics, Uppsala University, Uppsala, Sweden, 1998.

Data
Spray-drying conditions were varied using a full factorial design in four factors:
Inlet Temperature: temperature of drying air at the inlet of the equipment. The high and low levels of this factor were set such that degradation would be expected at the high level (220 °C) but not at the low level (100 °C).
Atomization gas flow: for this factor the low level (500 l/h) of the atomization gas (nitrogen) was the minimum required to achieve sufficient energy for atomization. The high level (800 l/h) was the maximum achievable flow with this spray-dryer.
Aspiration rate: the aspirator draws air through the instrument and this was varied from 60% to 100% (full capacity).
Feed-flow: indicates the material flow through the equipment. Here, the high level of 5 ml/min was the maximum rate which could be used at the low temperature without condensation appearing in the drying chamber, whereas the low level (2 ml/min) was chosen as the slowest practical rate.

To characterize the outcome of the spray-drying the following five responses were measured:
Yield: the amount of product produced. Should be maximized.
Size: particle size. Ideally, particles should be in the range 0.5 to 3.3 µm in order to reach the lower airways.
Water: water content in spray-dried protein. To be minimized.
Outlet temperature: outlet drying air temperature. This temperature may influence protein degradation and was therefore included. No specific target value was specified for this response.
HMWP: high molecular weight proteins. Measures the extent of aggregation, i.e., formation of dimers and oligomers of the protein. Should be as low as possible.


Tasks
Task 1
Initiate a new investigation in MODDE. Define the four factors and the five responses according to the information above. Select Screening and the full factorial design in 16 runs supplemented with three center-points. Enter the response data or copy them from PROTEIN SPRAY DRYING.XLS. Evaluate the raw data. Is there any need for data pre-treatment such as a response transformation?

Task 2
Select MLR as the fit method. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model, if necessary. Use the optimizer to predict good operating parameters.

Task 3
In a MODDE investigation you can only have one model (i.e., one set of model terms) for all the responses. Hence, to generate several models with the same factors and underlying design, but for different responses, we make copies of the original investigation, and in each copy keep only the responses that will be fitted with the same model. Thereafter we can link all these responses into one of the investigations (File/Link Investigation) and optimize them together. Try to improve the modelling results by dividing the responses into two separate projects. One project may contain Yield and Size, and another project Water, Outlet Temp and HMWP. Another possibility is to split the mother investigation into five new investigations and tailor-make one model for each response. Repeat Task 2, but analyze subsets of responses. Optimize the responses together. There is no solution provided for this task.


Solutions to PROTEIN SPRAY DRYING


Task 1
Evaluation of the raw data indicates two things: (i) the replicate error is small for every response, and (ii) the HMWP response needs to be transformed. The last plot in each set of six depicts the situation after log transforming HMWP using the settings C1 = 1 and C2 = -0.25.
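As a rough illustration of the transformation, the sketch below assumes that the log transform has the form log10(C1*Y + C2); this form should be checked against the MODDE documentation before relying on it. The HMWP values are made up for the example and are not the experimental data.

```python
import numpy as np

def log_transform(y, c1=1.0, c2=0.0):
    """Assumed form of the transform: log10(c1 * y + c2)."""
    arg = c1 * np.asarray(y, dtype=float) + c2
    if np.any(arg <= 0):
        raise ValueError("c1 * y + c2 must be positive for every value")
    return np.log10(arg)

# Hypothetical, right-skewed HMWP values (not the experimental data)
hmwp = np.array([0.35, 0.40, 0.45, 0.50, 0.60, 0.80, 1.25, 3.50])
hmwp_t = log_transform(hmwp, c1=1.0, c2=-0.25)

print(np.round(hmwp_t, 2))
```

A negative C2 stretches the low end of the scale more strongly than a plain logarithm would, which helps when most values are crowded just above a lower bound and a few are much larger.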
[Figure: replicate plots (with experiment number labels) for Yield, Size, Water, Outlet Temp, HMWP and the log-transformed HMWP (HMWP~).]

[Figure: histograms of Yield, Size, Water, Outlet Temp, HMWP and HMWP~.]

When dealing with many response variables you should always check the correlation matrix. It will suggest how the variables are correlated. An excerpt of the correlation matrix is shown below. This table indicates there are two groups of responses. The first sub-set contains Yield and Size which correlate with the coefficient 0.75. The second group is made up of Water, Outlet Temp and HMWP, which also have high pairwise correlation coefficients (-0.75, -0.88, and 0.88). Because of the subgrouping of the responses we should not expect them to depend in the same way on the various terms in the regression model.

Task 2
MLR was used to fit an interaction model to each of the five responses, each of which has 11 model terms (the constant, four linear terms, and six two-factor interactions). As seen below, we have good models for all responses except HMWP.
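The term count and the residual degrees of freedom reported in the plots follow from simple counting, as the short sketch below shows. The function name is illustrative.

```python
from math import comb

def interaction_model_terms(n_factors):
    """Constant + linear terms + two-factor interactions."""
    return 1 + n_factors + comb(n_factors, 2)

n_runs = 16 + 3                      # 2^4 full factorial plus three centre-points
n_terms = interaction_model_terms(4)
df_full = n_runs - n_terms           # residual degrees of freedom
df_reduced = n_runs - (n_terms - 2)  # after removing two interaction terms

print("model terms:", n_terms)
print("DF, full interaction model:", df_full)
print("DF, two terms removed:", df_reduced)
```

The same count gives 4 terms and DF = 3 for the two-factor designs in seven runs (ByHand, Pain), and 7 terms and DF = 4 for the three-factor designs in eleven runs (CakeMix, Tablet).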
[Figure: Summary of Fit (MLR) for Yield, Size, Water, Outlet Temp and HMWP~ showing R2, Q2, Model Validity and Reproducibility; N=19, DF=8, Cond. no.=1.0897, Y-miss=0.]

The coefficient overview plot below shows all model coefficients (except the constant term) for each response variable. The first two responses (Yield and Size) are dominated by the Atomization gas flow. The Aspiration rate also has an influence on the Yield. The other three responses are strongly influenced by the setting of the Inlet Temperature. Water content in the spray-dried protein also depends on the Aspiration rate. These different dependences on the factors are consistent with Yield and Size being correlated, and with Water, Outlet Temp, and HMWP being correlated.


[Figure: coefficient overview plot (normalized coefficients, MLR) for Yield, Size, Water, Outlet Temp and HMWP~ with the terms InT, Ato, Asp, FF, InT*Ato, InT*Asp, InT*FF, Ato*Asp, Ato*FF and Asp*FF; N=19, DF=8, Cond. no.=1.0897, Y-miss=0.]

In an attempt to improve the five models, two model terms were removed. These were: Ato*FF and Asp*FF. Primarily, this gave a much better model for HMWP.
[Figure: Summary of Fit (MLR) for the revised models of Yield, Size, Water, Outlet Temp and HMWP~ showing R2, Q2, Model Validity and Reproducibility; N=19, DF=10, Cond. no.=1.0897, Y-miss=0.]


The coefficients of the revised models are plotted below in the Coefficient Overview plot.
[Figure: coefficient overview plot (normalized coefficients, MLR) for the revised models of Yield, Size, Water, Outlet Temp and HMWP~ with the terms InT, Ato, Asp, FF, InT*Ato, InT*Asp, InT*FF and Ato*Asp; N=19, DF=10, Cond. no.=1.0897, Y-miss=0.]

Further review of the models using N-plots of residuals shows a mild outlier for Outlet Temp (experiment 10), but given the high R2 and Q2 for this response this point is not alarming. No N-plots are shown. We then decided to use the above models together with the Optimizer to predict a factor combination representing good operating conditions. The response desirabilities were set according to the experimental goals stated in the Data section of this exercise.


The results of running the Optimizer are shown below. Apparently, we have not completely fulfilled the desirabilities of the responses, but many simplexes have reached a point where many of the goals are met. The main difficulty is meeting the requirement on Water. The following approximate operating parameters are suggested in order to comply with most of the goals as well as possible: Inlet temperature 160 °C, Atomization gas-flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min.

Comment: Frequently, the optimizer is run iteratively in several steps, letting the results of the preceding stage dictate how to relax factor settings in the next stage. A practical way to do this is first using the optimizer for interpolation and then for extrapolation. In the current application, however, factor limits could not be changed in a second cycle of the optimizer, since they were already set according to performance limitations of the equipment used.

Conclusions
It is possible to develop strong models for the five responses. Good operating conditions predicted by the models are: Inlet temperature 160 °C, Atomization gas-flow 580 l/h, Aspiration rate 100%, and Feed-flow 5 ml/min. A further experiment should be done to verify the results at this point, and future work could involve an optimization study anchored around these settings.


DOE-Exercise PILOT PLANT (Frac Fac 2^(4-1))


Organic synthesis of semi-carbazone from glyoxylic acid in a pilot plant

Background
The organic synthesis of semi-carbazone from glyoxylic acid is a key step in the synthesis of azuracil (a cytostaticum, anti-cancer drug).

Objective
The objective of this study was to investigate the best operating conditions of a pilot plant for synthesising semicarbazone. A fractional factorial design in four factors was constructed and three responses were measured. The intentions with this experimental protocol were to obtain a high yield of semi-carbazone, high purity and rapid filtration. This exercise also mirrors some of the difficulties that might appear when several responses have to be considered simultaneously.

Data
Factors:
Time for addition of glyoxylic acid (h)
Stirring time (h)
Reaction temperature (°C)
Amount of water added (ml/mol)

Responses:
Yield (%), isolated. Goal: High
Purity (%), titrimetric. Goal: High
Filtration (ordinal scale, -5 worst, 5 best). Goal: High


Tasks
Task 1
Write down the computational matrix with + and - signs. Describe the defining relation and list the confounding pattern for the linear terms and the two-factor interactions.

Task 2
Solve the problem with MODDE. Note that the design does not include centre-points, hence you will see no bars relating to Model Validity and Reproducibility in the Summary of Fit plot. (Hint: add some interaction terms to the model and discuss the problems this might introduce).

Task 3
Show graphically which part of the experimental space should be chosen for the first experiment in the pilot plant (specify levels for the variables). Goal: High Yield, Purity and Filtration.

Task 4
Which method is commonly used to separate confoundings between two-factor interactions?


Solutions to Pilot Plant


Task 1
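Defining relation: I = abcd
Generator: d = abc
This means that ab is confounded with cd, ac with bd, and ad with bc (also seen in the table below). For the linear terms this means: a = bcd, b = acd, c = abd, d = abc.

Computational matrix (runs listed in standard order; the top row gives the term confounded with each column):

Alias                         abc   cd    bd    bc    ad    ac    ab
Run   const a     b     c     d     ab    ac    ad    bc    bd    cd
1     +     -     -     -     -     +     +     +     +     +     +
2     +     +     -     -     +     -     -     +     +     -     -
3     +     -     +     -     +     -     +     -     -     +     -
4     +     +     +     -     -     +     -     -     -     -     +
5     +     -     -     +     +     +     -     -     -     -     +
6     +     +     -     +     -     -     +     -     -     +     -
7     +     -     +     +     -     -     -     +     +     -     -
8     +     +     +     +     +     +     +     +     +     +     +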

N.B. In the literature there are two ways of describing the generators and the interactions, with letters and with numbers. We use the more conventional LETTERS.
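
The construction in Task 1 can also be checked programmatically. The following is a minimal sketch, not part of the original exercise: it assumes standard run order (a varying fastest), builds the fourth column from the generator d = abc, and finds the confounded two-factor interactions by comparing sign columns. The names `runs`, `column` and `aliases` are illustrative.

```python
from itertools import product

factors = ["a", "b", "c", "d"]

# Base factors a, b, c in standard order (a varies fastest); d from the generator.
runs = []
for c, b, a in product((-1, 1), repeat=3):
    d = a * b * c                      # generator d = abc
    runs.append({"a": a, "b": b, "c": c, "d": d})

def column(term):
    """Sign column of a term such as 'ab': the product of its factor columns."""
    col = []
    for run in runs:
        value = 1
        for letter in term:
            value *= run[letter]
        col.append(value)
    return tuple(col)

# Two interactions are confounded when their sign columns are identical.
two_factor = ["ab", "ac", "ad", "bc", "bd", "cd"]
aliases = [(s, t) for i, s in enumerate(two_factor)
           for t in two_factor[i + 1:] if column(s) == column(t)]

sign = {1: "+", -1: "-"}
for number, run in enumerate(runs, start=1):
    print(number, " ".join(sign[run[f]] for f in factors))
print(aliases)
```

Every run satisfies a*b*c*d = +1, which is the defining relation I = abcd expressed in signs.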

Task 2

To the left, we see the confounding pattern. The problem is that we cannot be sure of which of the confounded interaction terms is important when we get a significant coefficient (Note: a model like this one cannot be fitted with MLR, since it contains confounded terms and we only have 8 runs).
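
Why the full interaction model cannot be fitted can be illustrated by counting columns. This sketch assumes the same 2^(4-1) design with generator d = abc. It lists the 11 candidate terms (constant, four linear terms, six two-factor interactions) and groups those that share an identical sign column. With 8 runs there are only 8 distinct columns, so at most 8 coefficients can be estimated. The helper names are illustrative.

```python
from itertools import combinations, product

letters = "abcd"
# 2^(4-1) design with generator d = abc (run order does not matter here).
runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def column(term):
    """Sign column of a model term; the empty term is the constant."""
    col = []
    for run in runs:
        value = 1
        for letter in term:
            value *= run[letters.index(letter)]
        col.append(value)
    return tuple(col)

# Constant + 4 linear terms + 6 two-factor interactions = 11 model terms.
terms = [""] + list(letters) + ["".join(p) for p in combinations(letters, 2)]

# Group terms that have exactly the same column in the design.
distinct = {}
for term in terms:
    distinct.setdefault(column(term), []).append(term or "const")

print(len(terms), "terms,", len(runs), "runs,", len(distinct), "distinct columns")
for group in distinct.values():
    print(" = ".join(group))
```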


Task 3
A linear model is a good choice for Purity and Filtration, but not for Yield.

[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration. Investigation: Pilot plant (MLR); N=8, DF=3, Cond. no.=1.0000, Y-miss=0.]

From the regression coefficient plot of Yield, we can see wide confidence intervals and hence great model uncertainty. One way to improve the model might be to add the interaction between the two largest main effects, i.e., Ad*Te, and refit the model.
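
The fit statistics reported with the coefficient plot for yield (N=8, DF=3, R2=0.676, R2 Adj.=0.244) show how strongly a small number of residual degrees of freedom penalises the model. The sketch below uses the standard textbook definition of adjusted R2. That this is the formula used by the software is an assumption, although it does reproduce the reported value.

```python
def r2_adjusted(r2, n_runs, df_resid):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / DF, with DF the residual degrees of freedom."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / df_resid

# Linear model for yield: N = 8 runs, DF = 3 residual degrees of freedom.
print(round(r2_adjusted(0.676, 8, 3), 3))
```

Adding a model term lowers DF by one, so R2 must rise noticeably for the adjusted value to improve.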

[Figure: Scaled & Centered Coefficients for yield (Ad, St, Te, wa). Investigation: Pilot plant (MLR); N=8, DF=3, R2=0.676, Q2=-1.304, R2 Adj.=0.244, RSD=1.0188, Conf. lev.=0.95.]

The model improves considerably with respect to Yield, but its predictive ability degrades for Purity and Filtration.

[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration after adding Ad*Te. Investigation: Pilot plant (MLR); N=8, DF=2, Cond. no.=1.0000, Y-miss=0.]


The overview of regression coefficients shows the importance of the added interaction term for Yield. It is also apparent that the second main effect (Stirring time) is insignificant for all three responses. Remove Stirring time and refit the model.
[Figure: Normalized Coefficients (Ad, St, Te, wa, Ad*Te) for yield, purity and filtration. Investigation: Pilot plant (MLR); N=8, DF=2, Cond. no.=1.0000, Y-miss=0.]

After the deletion of Stirring time, much better models were obtained. When calculating three models with the same model terms, this is the best result.

[Figure: Summary of Fit (R2, Q2) for yield, purity and filtration after removing Stirring time. Investigation: Pilot plant (MLR); N=8, DF=3, Cond. no.=1.0000, Y-miss=0.]

The coefficient overview plot may be used in trying to solve the problem. Recall that our goals are high Yield, high Purity and high Filtration. Addition time and Temperature are the two most important terms. We make contour plots with these as axes. As constant, we set the amount of water added at its centre level (because it has a negative effect for Yield and a positive one for Purity and Filtration).

[Figure: Normalized Coefficients (Ad, Te, wa, Ad*Te) for yield, purity and filtration. Investigation: Pilot plant (MLR); N=8, DF=3, Cond. no.=1.0000, Y-miss=0.]


[Figure: Contour plots with Water = 137.5 ml/mol; the area of interest is marked.]

By using long addition time and high temperature the goal of simultaneously high Yield, Purity and Filtration is accomplished. The use of the MODDE Optimiser is shown below. Note that the factors have been set for extrapolation outside the investigated area.


Task 4
The method used to unconfound two-factor interactions is called FOLD-OVER.
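
The effect of a fold-over on the 2^(4-1) design can be sketched as follows. The exercise does not state which fold-over variant is intended; this example assumes a fold-over on a single factor (the sign of d is reversed in eight additional runs). For comparison, it also shows that reversing all four signs of this particular design reproduces the same eight runs, because the defining word abcd has even length. All names are illustrative.

```python
from itertools import product

def half_fraction(sign):
    """Eight runs (a, b, c, d) with d = sign * a * b * c."""
    return [(a, b, c, sign * a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def column(runs, i, j):
    """Two-factor interaction column for the factors at positions i and j."""
    return [run[i] * run[j] for run in runs]

original = half_fraction(+1)                                # I = abcd
folded_on_d = [(a, b, c, -d) for a, b, c, d in original]    # reverse the sign of d only
combined = original + folded_on_d

# Mirror image of the original design with every sign reversed.
mirror_all = [tuple(-x for x in run) for run in original]

print(column(original, 0, 1) == column(original, 2, 3))    # ab vs cd in the 8 runs
print(column(combined, 0, 1) == column(combined, 2, 3))    # ab vs cd in the 16 runs
print(len(set(combined)))                                   # distinct runs after fold-over
print(set(mirror_all) == set(original))                     # full mirror gives the same runs
```

In the combined 16 runs the ab and cd columns are orthogonal, so the two interactions can be estimated separately.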

Conclusions
In order to accomplish high yield, high purity, and rapid filtration, the factor combination of addition time 2 h, water 137.5 ml/mol and temperature 60 °C looks interesting and could be verified with additional experiments. The last factor, stirring time, may be set at a level convenient for the experimental process. The optimiser in MODDE indicates that even better results are obtainable when relaxing the high limit of addition time to 2.3 h and the high limit of temperature to 80 °C.

Reference: J-C Vallejos, Diss. IPSOI, Marseille 1978.


REPORTER GENE ASSAY


Screening, optimisation and robustness testing of a reporter gene assay

Background
Reporter gene assays are used in mechanistic studies of gene regulation. They also have great potential when applied to toxicology and drug development. A reporter gene has an easily measurable phenotype whose transcription is controlled by a promoter. Reporter gene assays provide important information of gene regulation relating to expression (i.e. number of copies) and when and where a particular protein is formed.

Objective
The data-set used in this exercise originates from Active Biotech AB in Lund, Sweden and we gratefully acknowledge Lena Schultz and Lisbeth Abramo for permitting us to use it. This study deals with the luciferase reporter gene, one of a number of widely used reporter genes. A total of six factors were investigated using DOE and the objective was to increase and stabilise the signal-to-background ratio of the assay. This study is unique in that it contains data related to the full spectrum of DOE applications, i.e. first a screening design was performed, then fold-over, then optimisation and finally robustness testing. The exercise is structured accordingly:
Phase 1 (Screening): A 2^(6-2) fractional factorial design in 16 experiments + 3 centre points.
Phase 2 (Fold-over): The initial screening design was complemented by folding over.
Phase 3 (Optimisation): A CCF design in 17 experiments to optimise three of the six factors.
Phase 4 (Robustness Testing): A 2^(5-1) fractional factorial design in 16 experiments + 3 centre points to investigate the sensitivity of the response to small changes in five of the factors.

Factors:
Cells: number of T-cells used in assay (number per well)
PMA: agent added to stimulate T-cells (ng/ml)
Ionomycin: agent added to stimulate T-cells (µg/ml)
Stimulation time: duration of stimulation (hours)
Lysing volume: volume of buffer needed to lyse T-cells (µl)
Ratio: ratio of amount of sample to amount of substrate required to acquire a signal in the luciferase assay

Response: S/B signal-to-background ratio computed as (signal-background)/background.
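
The response definition can be written as a small function. This is a sketch only; the readings used below are hypothetical and not taken from the data-set. Note that the ratio becomes negative whenever the signal falls below the background.

```python
def signal_to_background(signal, background):
    """S/B = (signal - background) / background."""
    if background == 0:
        raise ValueError("background must be non-zero")
    return (signal - background) / background

# Hypothetical luminescence readings (arbitrary units).
print(signal_to_background(5100.0, 100.0))
print(signal_to_background(80.0, 100.0))   # signal below background: negative S/B
```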


Phase 1 (Screening)

Tasks Phase 1 (Screening)


Task 1.1
In screening the objective is to identify the most important factors and their ranges. Define a new investigation in MODDE with six factors and one response. Select Screening and the Frac Fac Res IV design in 16 runs augmented with three centre-points. Enter the response data and evaluate them. Should the response be transformed?

Task 1.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.

Task 1.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?


Phase 2 (Fold-Over)


Tasks Phase 2 (Fold-Over)


Task 2.1
Fold-over is applied to screening designs to increase the number of experiments so that confounded terms may be resolved. In the existing design, click on File/Complement Design and select the first complement alternative (Fold over a screening fractional factorial of resolution III or IV). Enter a new name for the investigation and select three additional centre-points. MODDE will now construct a new investigation including the existing data and the new runs. It will also add a block-factor ($Block) which is a precautionary measure to check whether the response has drifted with time. If $Block is non-significant, it will be removed from the model. Enter the response data and evaluate them (histogram & replicate plot).

Task 2.2
Fit the regression model. Which factors are most important? What about the block-factor? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.

Task 2.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio? Compare your answer with that obtained in Task 1.3.


Phase 3 (Optimisation)

Tasks Phase 3 (Optimisation)

Task 3.1
In optimisation, the objective is to locate an optimal factor combination which can be used as a future set point. Define a new investigation with three factors and one response. Note that the order of the factors has changed and that the factor ranges have been modified according to the results of the screening phase. The new design defines a much smaller experimental domain. Select RSM and choose a CCF design augmented with four centre-points. Enter the response data and evaluate them. Should the response be transformed?

Task 3.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary.

Task 3.3
Using the model obtained above, which factor combination maximises the signal-to-background ratio?

Phase 4 (Robustness Testing)


Tasks Phase 4 (Robustness Testing)


Task 4.1
In robustness testing, the objective is to explore the robustness of an assay or method around its set point. The following set point was identified:
Cells = 300000 (320000 was optimal according to the CCF design, but 300000 is more practical as it means less crowding of the sample volume.)
PMA = 10 (Had virtually no effect in the screening phase; low level chosen.)
Ionomycin = 1.5 (2 was optimal according to the CCF design, but 1.5 is more practical. Too high a concentration creates an interference with the real signal which could reduce the signal-to-background ratio.)
Stimulation time = 5.5 (Six hours was optimal according to the CCF design, but 5.5 h fits in better with an 8 hour working day.)
LysVolume = 30 (Low level, which was found to be optimal during the screening phase.)

The specification of the response was that the signal-to-background ratio should exceed 50 regardless of the factor combination. Define a new investigation in MODDE with five factors and one response. Select Screening and the Frac Fac Res V+ design augmented with three centre-points. Enter the response data and evaluate them. Should the response be transformed? How do the response data compare to the specification?

Task 4.2
Fit the regression model. Which factors are most important? Are there any non-significant model terms? Are the residuals approximately normally distributed? Refine the model as necessary. Is the response sensitive to the factor changes?

Task 4.3
Evaluate the results in terms of the four limiting cases of robustness testing. Which case applies here?
Inside specification / Significant model (Limiting case 1)
Inside specification / Non-significant model (Limiting case 2)
Outside specification / Significant model (Limiting case 3)
Outside specification / Non-significant model (Limiting case 4)

Which factors should be better controlled in order to achieve robustness according to both criteria? Propose new factor tolerances where necessary.
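
The four limiting cases listed in Task 4.3 amount to a two-way classification, sketched below. The function names and the example response values are hypothetical. The lower limit of 50 is the specification given in Task 4.1; whether the model is significant has to be judged separately from the regression results.

```python
def inside_spec(responses, lower_limit=50.0):
    """True when every measured S/B value exceeds the lower specification limit."""
    return all(value > lower_limit for value in responses)

def limiting_case(inside_specification, significant_model):
    """Return the limiting-case number (1 to 4) as numbered in Task 4.3."""
    if inside_specification:
        return 1 if significant_model else 2
    return 3 if significant_model else 4

# Hypothetical response values, for illustration only.
example = [52.0, 60.5, 86.0]
print(limiting_case(inside_spec(example), significant_model=False))
```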


Solutions to REPORTER GENE ASSAY (Phase 1 - Screening)


Task 1.1
The evaluation of the response data indicates a few very large measurements (below, top left) and their histogram is highly skewed (below, top right). A logarithmic transformation seems justified. However, since the response contains negative numbers, which cannot be logged, a small constant must be added before applying the transformation. The lowest response is -0.2. The second histogram (below, middle right) shows the effect of the log-transform using 1 as the constant. The response is still not approximately normally distributed. The third histogram (below, bottom right) shows the effect of changing the constant to 0.21. Now, the histogram and replicate plot look much better. A plot of descriptive statistics (not shown here) confirms that this transformation is appropriate.
[Figures: Plot of Replications (left) and Histogram (right) for the untransformed response S/B (top), for the log-transformed response S/B~ with constant 1 (middle), and for S/B~ with constant 0.21 (bottom). Investigation: Reporter Gene Assay Screening.]
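
The effect of the added constant can be sketched as follows. The sketch assumes a transformation of the form log10(C1*y + C2) and a lowest response of -0.2, which is consistent with the statement that the response contains negative numbers and with the chosen constant 0.21. Apart from that minimum, the values in `s_b` are hypothetical and are not the course data.

```python
import math

def log_transform(values, c1=1.0, c2=0.0):
    """Return log10(c1 * y + c2) for each y; every argument must be positive."""
    shifted = [c1 * y + c2 for y in values]
    if min(shifted) <= 0:
        raise ValueError("c1 * y + c2 must be positive for every response value")
    return [math.log10(s) for s in shifted]

# Hypothetical S/B values; only the minimum (-0.2) follows the text.
s_b = [-0.2, 0.5, 3.0, 12.0, 120.0]

with_1 = log_transform(s_b, c2=1.0)      # constant 1
with_021 = log_transform(s_b, c2=0.21)   # constant 0.21
print([round(v, 2) for v in with_1])
print([round(v, 2) for v in with_021])
```

A constant only just larger than the magnitude of the lowest value stretches the low end of the scale much more than a constant of 1, which is why the two histograms differ.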


Task 1.2
The default linear model looks good with no evidence of lack of fit (R2 = 0.92, Q2 = 0.79). The top two plots correspond to this model. To try and improve the model, PMA and Ratio were removed and the six two-factor interactions of the four remaining factors added, of which only three were worth keeping (Cel*Lys, Ion*StH, and Ion*Lys). The revised model is much better (R2 = 0.96, Q2 = 0.91). The lower two plots relate to the revised model.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B~. Investigation: Reporter Gene Assay Screening (MLR).
Initial model (Cel, PM, Ion, StH, Lys, Rat): N=19, DF=12, R2=0.917, Q2=0.791, R2 Adj.=0.876, RSD=0.3472, Cond. no.=1.0897, Conf. lev.=0.95.
Revised model (Cel, Ion, StH, Lys, Cel*Lys, Ion*StH, Ion*Lys): N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467, Cond. no.=1.0897, Conf. lev.=0.95.]

The revised model contains no outliers (below, left), and the size of the residual is fairly independent of the predicted value (below, right), which is good.
[Figures: Normal probability plot of the deleted studentized residuals (left) and deleted studentized residuals versus predicted S/B~ (right), with experiment number labels. Investigation: Reporter Gene Assay Screening (MLR); N=19, DF=11, R2=0.962, Q2=0.914, R2 Adj.=0.937, RSD=0.2467.]


Task 1.3
The contour plot below shows how the signal-to-background ratio is predicted to change as a function of the factors Cells and LysVolume, while fixing the other factors at their maximum value. The combination of Cells and LysVolume was chosen to explore the borderline significant two-factor interaction.

Conclusions of Phase 1
The three most important factors are Cells, Ionomycin and Stimulation Time. There are a few two-factor interactions which look interesting as they improve the predictive power of the model. However, these two-factor interactions are confounded with other two-factor interactions. Such confounding can be resolved using the Fold-over technique, see Phase 2 of the exercise.


Solutions to REPORTER GENE ASSAY (Phase 2 - Fold-over)


Task 2.1
For ease of interpretation, the same response transformation was applied as in Task 1. This looks reasonable given the replicate and histogram plots below.
[Figures: Plot of Replications for S/B~ with experiment number labels (left) and Histogram of S/B~ (right). Investigation: Reporter Gene Assay Screening - Fold over complement.]

Task 2.2
The default linear model is very good (R2 = 0.92, Q2 = 0.88). The top two plots below relate to this model. The Block factor is not significant so there is no evidence of a time drift between the two sets of experiments. To try and improve the model, PMA, Ratio and $Block were removed. The refined model is only marginally better (R2 = 0.91, Q2 = 0.89). The lower two plots relate to the refined model.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B~. Investigation: Reporter Gene Assay Screening - Fold over complement (MLR).
Initial model (Cel, PM, Ion, StH, Lys, Rat, $Bl): N=38, DF=30, R2=0.920, Q2=0.877, R2 Adj.=0.901, RSD=0.3027, Cond. no.=1.0897, Conf. lev.=0.95.
Refined model (PMA, Ratio and $Block removed): N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018, Cond. no.=1.0897, Conf. lev.=0.95.]

The revised model contains no outliers (below, left). However, the plot of the deleted studentized residuals versus the predicted value (below, right) indicates that some of the largest residuals correspond to the six centre-points. A similar phenomenon was present also in the initial screening design. This hints at curvature problems. Curvature is easy to handle with a quadratic regression model but not with the linear model used here.
[Figures: Normal probability plot of the deleted studentized residuals (left) and deleted studentized residuals versus predicted S/B~ (right), with experiment number labels. Investigation: Reporter Gene Assay Screening - Fold over complement (MLR); N=38, DF=33, R2=0.912, Q2=0.887, R2 Adj.=0.902, RSD=0.3018.]
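
One simple way to quantify the curvature hinted at by the centre-points is to compare their mean with the mean of the factorial runs, using the spread of the replicated centre-points as the error estimate. The sketch below shows this common textbook check; it is not necessarily the diagnostic used in the course software, and the response values are hypothetical, not the course data.

```python
import math
from statistics import mean, stdev

def curvature_check(factorial_y, centre_y):
    """Difference between centre-point and factorial means, with a rough t-ratio
    based on the replicate standard deviation of the centre-points."""
    diff = mean(centre_y) - mean(factorial_y)
    s = stdev(centre_y)
    se = s * math.sqrt(1 / len(centre_y) + 1 / len(factorial_y))
    return diff, diff / se

# Hypothetical transformed responses, NOT the course data.
factorial_y = [-1.0, -0.5, 0.0, 0.5, 0.5, 1.0, 1.5, 2.0]
centre_y = [1.1, 1.2, 1.3]

diff, t_ratio = curvature_check(factorial_y, centre_y)
print(round(diff, 2), round(t_ratio, 1))
```

A difference that is large relative to its standard error suggests that a linear model cannot describe the centre of the domain, which is the motivation for moving to a quadratic model.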

Task 2.3
MODDE's Optimizer was used to locate the factor combination which maximises the response. PMA, Ratio and $Block were not included in the final model and are therefore greyed out in the Optimizer factor spreadsheet.


The results of running the Optimizer are shown below. The optimum point corresponds to having three factors at their upper limit and one at its lower limit.

Conclusions of Phase 2
The Fold-over experiments did not indicate any large two-factor interactions. Instead, it confirmed that three of the factors dominate: Cells, Ionomycin and Stimulation Time. These three factors form the basis of the optimisation design employed during Phase 3, which will be better suited to handling the non-linear behaviour noted above.


Solutions to REPORTER GENE ASSAY (Phase 3 - Optimisation)


Task 3.1
The replicate plot shows that the signal-to-background ratio is much higher than in the screening designs. The histogram and Box-Whisker plots indicate that a response transformation is no longer required.
[Figures: Plot of Replications for S/B with experiment number labels, Histogram of S/B, and Descriptive Statistics (Box-Whisker) plot. Investigation: Reporter Gene Assay RSM with CCF. S/B Min: 17.9, Max: 221.4, Median: 107.8, Mean: 120.683.]

Task 3.2
The default model has a relatively poor Q2 (R2 = 0.91, Q2 = 0.56). The top two plots relate to the initial model. The model was pruned by removing non-significant terms (R2 = 0.89, Q2 = 0.74). The lower two plots relate to the refined model.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B. Investigation: Reporter Gene Assay RSM with CCF (MLR).
Initial model (Cel, Ion, StH, Cel*Cel, Ion*Ion, StH*StH, Cel*Ion, Cel*StH, StH*Ion): N=18, DF=8, R2=0.908, Q2=0.558, R2 Adj.=0.805, RSD=25.3554, Cond. no.=4.4596, Conf. lev.=0.95.
Refined model: N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934, Cond. no.=4.0089, Conf. lev.=0.95.]

There are no outliers (below, left) and the residuals are independent of the predicted value (below, right).
[Figures: Normal probability plot of the deleted studentized residuals (left) and deleted studentized residuals versus predicted S/B (right), with experiment number labels. Investigation: Reporter Gene Assay RSM with CCF (MLR); N=18, DF=11, R2=0.896, Q2=0.739, R2 Adj.=0.840, RSD=22.9934.]

Task 3.3
The contour plots below show how the signal-to-background ratio varies in relation to the three factors. The optimum factor combination is high Stimulation time (6 hours), high Ionomycin (2) and intermediate Cells (around 320000).

The Optimizer was used to obtain more exact co-ordinates of the optimum.


After the first optimisation round the 8th simplex was found to be best.

During the second optimisation round, new starting points were generated in the vicinity of the best simplex from the first round.

Four of the five new simplexes converge to the same point: Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2.


The results for the best predicted simplex were transferred to the SweetSpot plot, which clearly shows the location of the optimal point.

Further, the five simplex factor co-ordinates were transferred to the prediction list, showing that the predicted optimal S/B value is 260 ± 40.

Conclusions of Phase 3
The optimal factor combination within the investigated experimental domain is Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2. In the final DOE stage, this point will be assessed for robustness. However, due to practical considerations, robustness testing was not performed on this precise point but rather on one close to it (see Phase 4).


Solutions to REPORTER GENE ASSAY (Phase 4 - Robustness)


Task 4.1
The histogram and replicate plots indicate that a transformation is unnecessary. The replicate plot also shows that all the response values are above 50, i.e. within specification.
[Figures: Plot of Replications for S/B with experiment number labels (left) and Histogram of S/B (right). Investigation: Reporter Gene Assay RobTest Frac Fac.]

Task 4.2
In robustness testing model refinement is usually not performed and the ideal result is no model at all. The model obtained is poor (R2 = 0.93, Q2 = negative). However, the regression coefficient plot indicates that S/B is sensitive to changes in Ionomycin concentration.
[Figures: Summary of Fit (R2, Q2, Model Validity, Reproducibility) and Scaled & Centered Coefficients for S/B (Cel, PM, Ion, StH, Lys and their ten two-factor interactions). Investigation: Reporter Gene Assay RobTest Frac Fac (MLR); N=19, DF=3, R2=0.930, Q2=-84.830, R2 Adj.=0.577, RSD=6.7501, Cond. no.=1.0897, Conf. lev.=0.95.]

Task 4.3
In Task 4.2 it was shown that S/B is not robust to changes in Ionomycin concentration. However, the response data themselves are robust given that they are within specification. The factor range of Ionomycin must be reduced by half in order to make S/B robust. Hence, the concentration range for Ionomycin within which robustness can be claimed is 1.485 to 1.515 µg/ml rather than 1.47 to 1.53 µg/ml.
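
The arithmetic behind the halved tolerance can be sketched as follows, taking the set point of 1.5 from Task 4.1 and the tested range of 1.47 to 1.53. The function name is illustrative. The sketch assumes that the effect of Ionomycin is linear over this narrow range, so that halving the range halves its contribution to the variation in S/B.

```python
def scaled_range(set_point, low, high, factor):
    """Shrink (or widen) a tolerance interval about the set point by the given factor."""
    return (set_point - factor * (set_point - low),
            set_point + factor * (high - set_point))

# Tested range 1.47 to 1.53 around the set point 1.5, reduced by half.
low, high = scaled_range(1.5, 1.47, 1.53, 0.5)
print(round(low, 3), round(high, 3))
```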
[Figure: Scaled & Centered Coefficients for S/B (Cel, PM, Ion, StH, Lys and their two-factor interactions). Investigation: Reporter Gene Assay RobTest Frac Fac (MLR); N=19, DF=3, R2=0.930, Q2=-84.830, R2 Adj.=0.577, RSD=6.7501, Conf. lev.=0.95.]

Conclusions of Phase 4
The final DOE phase illustrated the first limiting case of robustness testing, i.e., a significant model and inside specification. S/B was most sensitive to changes in Ionomycin concentration.


Discussion and conclusions of REPORTER GENE ASSAY example


Of the six factors originally investigated, three dominated the initial screening phases: Cells, Ionomycin and Stimulation Time (StimH). A few two-factor interactions were also used in the screening model as they increased the predictive power. However, the problem with two-factor interactions in medium-resolution screening designs is that they are often confounded with other two-factor interactions. Such confounding can be resolved using the Fold-over technique. The addition of the 19 Fold-over experiments showed two things. First of all, there was no systematic shift in the response data between the two sets of experiments. Secondly, there were no important two-factor interactions. On the contrary, it confirmed the importance of the three key factors. There was also some evidence of curvature (non-linear behaviour) which can be investigated in more detail by using a central composite design.

In the RSM phase, the key factors Cells, Ionomycin and Stimulation Time were optimised using a CCF design in 17 runs. The factor ranges were adjusted in accordance with the findings of the screening phases. This design identified the optimal factor combination Cells ≈ 320000, Stimulation Time = 6 and Ionomycin = 2.

In the final robustness testing, the set point was defined as:
Cells = 300000 (320000 was optimal according to the CCF design, but 300000 is more practical as it means less crowding of the sample volume.)
PMA = 10 (Had almost no influence during screening; low level selected.)
Ionomycin = 1.5 (A concentration of 2 µg/ml was optimal according to the CCF design, but 1.5 is more practical. Too high a concentration creates an interference with the real signal and may thus reduce the signal-to-background ratio.)
Stimulation time = 5.5 (Six hours was optimal according to the CCF design, but 5.5 hours fits in better with an 8 hour working day.)
LysVolume = 30 (Low level, as found to be optimal during screening.)

The signal-to-background ratio is most sensitive to changes in the Ionomycin concentration. However, the response may be regarded as robust given that all the values were within specification. The final conclusion is that the results of the four phases are both coherent and consistent. This indicates the high quality of the underlying experimental data.


DOE-Exercise CHROMSPHER_B (Frac Fac)


Evaluation of mobile phase additives in HPLC

Background
One important property in HPLC is the capacity factor. There are several mobile phase constituents that may influence this chromatographic response, such as, pH, temperature, and type and amount of mobile phase modifiers. Thus, optimization of capacity factors is not always straightforward, but requires design of experiments in combination with multivariate modeling for optimal output. This example is based on the publication of Andersson et al (Chromatographia Vol 38, 715-722, 1994).

Objective
In this example the influence of seven factors on chromatographic response (capacity factors) is investigated. Five factors represent mobile phase modifiers, three uncharged and two charged, and the last two are pH and column temperature. The chromatographic response (i.e., capacity factor) for the Chromspher B stationary phase was assessed using five substances (almokalant, amoxicillin, metoprolol, omeprazole and S 29). The goal was to get an overview (screening) of which factors are most influential for the capacity factors, since it is desirable to regulate these through changes in the factors.

Data

Acetonitrile (ACN), methanol (MeOH) and tetrahydrofuran (THF) represent uncharged modifiers, whilst 1-octanesulphonic acid (OSA) and N,N-dimethyloctylamine (DMOA) correspond to charged ones.


Tasks
Task 1
In MODDE, first define the seven factors according to the information given above. The factors OSA and DMOA must be log-transformed (with C1 = 1 and C2 = 0). The next step is to specify the responses. Log-transform all five responses. Set C1 to 1 and C2 to 0 for all responses except Amoxicillin, which should have the settings C1 = 1 and C2 = 0.04. Select Screening as objective and MODDE will then propose 16+3 experiments in the form of a 2^(7-3) fractional factorial design (FFD). Accept this proposal. This design only supports linear terms. However, the experimenters wished to estimate some interaction terms and hence they carried out five extra runs selected D-optimally. To append extra experiments to the worksheet, right-click in the worksheet window and select Add Experiment. Continue until the worksheet has 24 experiments. In EXCEL, open the XLS-file CHROMS_B.XLS and COPY/PASTE the worksheet content to the worksheet generated in MODDE.
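To make the 16+3 run count concrete, the following minimal sketch builds a sixteen-run two-level fraction for seven factors in coded units and appends three centre-points. The generators E = ABC, F = BCD and G = ACD are an assumption (a common resolution IV choice); the generators and run order that MODDE actually uses may differ, and the five D-optimal extra runs are not reproduced here.

```python
from itertools import product

# Factor names follow the exercise; the generator assignment is an assumption.
FACTORS = ["ACN", "MeOH", "THF", "pH", "OSA", "DMOA", "T"]

def frac_fac_7_3(n_center=3):
    """Return coded runs (-1/0/+1) of a 16-run fraction plus centre-points."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):  # full 2^4 in four base factors
        e = a * b * c  # assumed generator E = ABC
        f = b * c * d  # assumed generator F = BCD
        g = a * c * d  # assumed generator G = ACD
        runs.append((a, b, c, d, e, f, g))
    runs += [(0,) * 7] * n_center  # replicated centre-points
    return runs

design = frac_fac_7_3()
print(len(design))  # 16 corner runs + 3 centre-points
for run in design[:4]:
    print(dict(zip(FACTORS, run)))
```

Because all seven columns are mutually orthogonal, the seven linear coefficients can be estimated independently, which is consistent with the statement that the design only supports linear terms.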

Task 2
Evaluate the raw data by creating replicate plots and histograms. Are the responses approximately normally distributed? What about the replicate error, is it large or small compared with the variation across the entire design?

Task 3
Set runs 20-24 as excluded (Excl). Select MLR as FIT METHOD (Analysis/Select Fit Method) and compute the model. Check R2, Q2, MVal, Rep, ANOVA, and N-plot of residuals for each one of the five responses. Can you trace any anomalies in the data? Look at the coefficients and interpret the model. Which factors seem most relevant?

Task 4
Include runs 20-24. Edit the model (Edit/Model) and add the three interaction terms pH*DMOA, ACN*OSA and MeOH*THF. Compute the model with MLR and compare results with Task 3.

Task 5
Use the same data material as in Task 4, but switch to PLS instead of MLR. What are the similarities and differences between the MLR and the PLS models? How are the different responses correlated? Which factors are most meaningful?


Solutions to CHROMSPHER_B
Task 2
[Figures: histograms of the five log-transformed responses OM~, S29~, Almo~, Amox~ and Meto~ (investigation chroms_b; x-axis: Bins, y-axis: Count).]
The five histograms show that all responses are approximately normally distributed. This is what you would expect for log-transformed chromatographic data.

[Figures: replicate plots with experiment number labels for OM~, S29~, Almo~, Amox~ and Meto~ (investigation chroms_b; x-axis: Replicate Index, y-axis: log-transformed response).]

The replicate error is very small for each response, in fact so small that it will be difficult to avoid lack of fit in the ANOVA lack of fit test. The replicate plot for the fourth response (Amox) indicates a deviating behavior of experiment 19.
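The comparison between replicate error and overall variation can be sketched numerically. The example below assumes that reproducibility is computed as 1 - MS(pure error) / MS(total, corrected for the mean), which is our reading of the Reproducibility bar in the summary plots and should be treated as an assumption. The response values are hypothetical and are not taken from CHROMS_B.XLS.

```python
from statistics import mean

def reproducibility(y_all, y_replicates):
    """1 - MS(pure error) / MS(total corrected); assumed definition."""
    grand = mean(y_all)
    ms_total = sum((y - grand) ** 2 for y in y_all) / (len(y_all) - 1)
    centre = mean(y_replicates)
    ms_pure = sum((y - centre) ** 2 for y in y_replicates) / (len(y_replicates) - 1)
    return 1.0 - ms_pure / ms_total

# Hypothetical log-transformed responses: 8 design runs plus 3 centre-point replicates.
design_runs = [-0.60, -0.20, 0.10, 0.45, 0.80, 1.10, -0.35, 0.60]
centre_points = [0.21, 0.23, 0.22]
rep = reproducibility(design_runs + centre_points, centre_points)
print(round(rep, 4))
```

When the replicates are this tight, the pure error used as the yardstick in the lack of fit test becomes very small, so even modest model residuals can register as significant lack of fit.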


Task 3
We can see that four out of five responses are well accounted for by the model. One response, Amoxicillin, has a large gap between R2 and Q2, indicating model problems for this response. Model validity is only acceptable for the first response. In the N-plots below, residuals of a well predicted (S29) and a poorly predicted (Amox) response are plotted. For the problematic response (Amox), experiments 1, 11, 17, and 19 stick out a little, but they are still inside 4 standard deviations. The coefficient plot reveals that the coefficient patterns are similar for most responses. The notable exception is Amox, for which the factor pH has a negative coefficient rather than a positive one as for the other responses.
[Figure: Summary of Fit (MLR) for OM~, S29~, Almo~, Amox~ and Meto~, showing R2, Q2, Model Validity and Reproducibility. N=20, DF=12, Cond. no.=1.6392, Y-miss=0.]

[Figure: normal probability plot of deleted studentized residuals for S29~ with experiment number labels. N=20, DF=12, R2=0.946, Q2=0.836, R2 Adj.=0.914, RSD=0.0796.]

[Figure: normal probability plot of deleted studentized residuals for Amox~ with experiment number labels. N=20, DF=12, R2=0.798, Q2=0.362, R2 Adj.=0.680, RSD=0.2216.]

[Figure: normalized coefficients (MLR) of ACN, MeOH, THF, pH, OSA~, DMOA~ and T for the five responses. N=20, DF=12, Cond. no.=1.6392, Y-miss=0.]
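The gap between R2 and Q2 discussed in Task 3 can be illustrated with a small worked example. R2 measures how much of the response variation the model fits, while Q2 measures how much it predicts under cross-validation. The sketch below assumes Q2 = 1 - PRESS/SS(total), with PRESS obtained from leave-one-out residuals, and uses a hypothetical 2^3 design with three centre-points and made-up response values; none of the numbers come from the CHROMS_B data.

```python
from itertools import product

def r2_q2(X, y):
    """R2 and leave-one-out Q2 for a linear model on an orthogonal coded design.

    Assumes the columns of X are mutually orthogonal and each sums to zero,
    as in a two-level factorial with centre-points, so every coefficient
    can be estimated separately.
    """
    n, k = len(y), len(X[0])
    b0 = sum(y) / n
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(k)]
    b = [sum(row[j] * yi for row, yi in zip(X, y)) / col_ss[j] for j in range(k)]
    ss_tot = sum((yi - b0) ** 2 for yi in y)
    ss_res = press = 0.0
    for row, yi in zip(X, y):
        resid = yi - (b0 + sum(bj * xj for bj, xj in zip(b, row)))
        leverage = 1.0 / n + sum(xj ** 2 / s for xj, s in zip(row, col_ss))
        ss_res += resid ** 2
        press += (resid / (1.0 - leverage)) ** 2  # leave-one-out residual
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot

# Hypothetical design and response values, for illustration only.
X = [list(run) for run in product((-1, 1), repeat=3)] + [[0, 0, 0]] * 3
y = [0.82, 0.77, 0.23, 0.18, 1.79, 1.84, 1.17, 1.22, 1.02, 0.99, 1.01]
r2, q2 = r2_q2(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Q2 can never exceed R2 under this definition, because each leave-one-out residual is at least as large in magnitude as the ordinary residual. A large gap, as seen for Amox, suggests that the fit depends heavily on individual experiments.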

Task 4
Evidently, the modeling of all responses benefits from the inclusion of the three cross-terms. The N-plot of Amox residuals has improved slightly, because now only experiments 1 and 19 deviate. The MeOH*THF term is the most powerful of the cross-terms, as seen in the coefficient plots. In the interaction plots, it is possible to discern that the MeOH*THF interaction is more pronounced for Amox than for S29.
[Figure: Summary of Fit (MLR) for OM~, S29~, Almo~, Amox~ and Meto~, showing R2, Q2, Model Validity and Reproducibility. N=24, DF=13, Cond. no.=3.3153, Y-miss=0.]

[Figure: normal probability plot of deleted studentized residuals for Amox~ with experiment number labels. N=24, DF=13, R2=0.866, Q2=0.522, R2 Adj.=0.763, RSD=0.1811.]

[Figure: scaled & centered coefficients (MLR) for S29~, covering the seven linear terms and the cross-terms pH*DMOA~, ACN*OSA~ and MeOH*THF. N=24, DF=13, R2=0.978, Q2=0.930, R2 Adj.=0.961, RSD=0.0621, Conf. lev.=0.95.]

[Figure: scaled & centered coefficients (MLR) for Amox~, same model terms. N=24, DF=13, R2=0.866, Q2=0.522, R2 Adj.=0.763, RSD=0.1811, Conf. lev.=0.95.]

[Figures: interaction plots for MeOH*THF, responses S29~ and Amox~ (x-axis: MeOH; separate lines for THF low and THF high).]

Task 5
According to the R2- and Q2-values of the individual responses, the MLR and PLS models provide similar results. However, one must realize that when using MLR there are five models to consider, whereas PLS only fits one model to all responses.
[Figure: Summary of Fit (MLR) for OM~, S29~, Almo~, Amox~ and Meto~, showing R2, Q2, Model Validity and Reproducibility. N=24, DF=13, Cond. no.=3.3153, Y-miss=0.]

[Figure: Summary of Fit (PLS, comp.=4) for the same five responses. N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]

The PLS model has four components. For the response S29, which is strongly correlated with OM, Almo, and Meto, we can see that primarily the first PLS component explains response variation. For the deviating response Amox, however, the first component of the PLS model reflects hardly any variation. Rather, the second component models this response.
[Figure: PLS Summary (cumulative R2 and Q2 per component, Comp1 to Comp4) for S29~. N=24, DF=13, R2=0.961, Q2=0.850, R2 Adj.=0.932, RSD=0.0823.]

[Figure: PLS Summary (cumulative R2 and Q2 per component, Comp1 to Comp4) for Amox~. N=24, DF=13, R2=0.836, Q2=0.548, R2 Adj.=0.710, RSD=0.2002.]

PLS provides a diagnostic tool for visualizing the correlation pattern between the X-factors and the Y-responses, namely the PLS t/u score plot. The first component, accounting for almost 72% of the response variation, captures a strong correlation between X and Y. The second component, which explains another 16% of the Y-variation, uncovers a weakly deviating feature of experiment number 19, i.e., the same phenomenon observed in the foregoing tasks.

[Figures: PLS score scatter plots t[1] vs u[1] and t[2] vs u[2] with experiment number labels (PLS, comp.=4). N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]

The third and fourth components also display reasonable correlations between t and u, considering they merely model 4 and 2% of the variation in the responses. The third component reveals a weak non-linear relationship.

[Figures: PLS score scatter plots t[3] vs u[3] and t[4] vs u[4] with experiment number labels (PLS, comp.=4). N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]

Because PLS fits only one model to all responses, we may use the PLS loading plot to overview the relationships among all factors, cross-terms, and responses at the same time. The loading plot given below represents 88% of the response variation. This plot corroborates that Amox provides unique information about the experiments. The other four responses are correlated, and correlation coefficients among them always exceed 0.75 (Hint: use Worksheet/Correlation/Correlation Matrix). The loading plot also suggests that the factors pH, ACN, THF, and MeOH are most influential for the responses. The three cross-terms are of comparatively low importance. Basically, the VIP plot confirms the conclusions drawn from the loading plot.

[Figure: PLS loading scatter plot, wc[1] vs wc[2], showing the seven factors, the three cross-terms and the five responses (PLS, comp.=4). N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]

[Figure: Variable Importance (VIP) plot for the seven factors and the three cross-terms (PLS, comp.=4). N=24, DF=13, Cond. no.=3.2013, Y-miss=0.]

Conclusions
This application shows how DOE can be applied to explore the performance of chromatographic equipment. Seven factors were screened and the resulting models (MLR or PLS) revealed that four factors (pH, ACN, THF, and MeOH) were considerably more meaningful than the others. These are the factors to consider for further studies, e.g., optimization modeling. One response, Amox, was different, mainly because of a distinctively different pH-dependence. A separate MODDE investigation for this response is reasonable.


CHIRAL SEPARATION (Optimisation)


Optimisation of Chiral Separation of Omeprazole and One of Its Metabolites

Background
Omeprazole is a potent inhibitor of gastric acid secretion and is frequently used against acid-related diseases in the stomach. Both enantiomers of omeprazole are effective in this respect. Omeprazole is metabolised to intermediary products of which hydroxylated omeprazole is the main metabolite. This metabolite is able to block the enzyme H+,K+-ATPase selectively. This enzyme is responsible for the gastric acid production.

Objective
The experimental objective of this study was to optimise the chiral separation (using HPLC) of the (R)- and (S)-enantiomers of omeprazole and its main metabolite, hydroxylated omeprazole. In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on different retention of each analyte on the chiral stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is the minimum acceptable for separation of neighbouring peaks, but complete baseline separation requires a resolution above 1.5.

In this application, four HPLC factors were varied: mobile phase pH, concentration of the organic eluent modifier acetonitrile (ACN), ionic strength and temperature. Logarithmically transformed capacity factors were measured for the four solutes (R-omeprazole, S-omeprazole, R-hydroxyomeprazole, S-hydroxyomeprazole). The experimental data are taken from the following reference: Karlsson, A., and Hermansson, S., Optimisation of Chiral Separation of Omeprazole and One of Its Metabolites on Immobilized α1-Acid Glycoprotein Using Chemometrics, Chromatographia, 44, 10-18, 1997.

In the treatment of the experimental data below, solute 1 is omeprazole and solute 2 is hydroxyomeprazole. The (R)- and (S)-notation indicates different enantiomers. Capacity factors are denoted k and there are four of these. The resolution responses of interest are denoted Res. The experimental objective was to find a factor combination which:
(a) achieves retention times (capacity factors) of less than 15 minutes
(b) maintains resolution above 1.5 (complete baseline separation).


Data
Factors

Responses:

Design:


Tasks
Task 1
Create a new project in MODDE. Define the four factors and the eight responses as outlined above. Note 1: The four capacity factors are commonly analysed after transforming to logs. Note 2: The last four responses are derived from the four capacity factors. Res1 is k(S)-1 divided by k(R)-1. Res2 is k(S)-2 divided by k(R)-2. Res3 is k(R)-1 divided by k(R)-2. Res4 is k(S)-2 divided by k(R)-1.

The four derived responses are not shown in the worksheet until a model has been fitted. Select RSM and the second-ranked Reduced CCF design augmented with four centre-points. There are different versions of this design and the one used by the original investigators is not the same as that recommended by MODDE. Therefore, Copy/Paste the contents of CHIRAL SEPARATION.XLS into MODDE. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc). Are the responses approximately normally distributed? How large or small is the replicate error?

Task 2
Fit the model and review and interpret the results. How are the eight responses related? Which model terms are most important? Which factor settings meet the objectives of the study, i.e. capacity factors below 15 minutes and resolutions above 1.5?

Task 3
The experimenters carried out one verifying experiment to test the predictive power of the model. The verifying experiment was Eluent modifier = 11%, Temperature = 25 °C, Ionic Strength = 0.02, and pH = 6.3. At this point, the measured capacity factors were: k(R)-1 = 2.48, k(S)-1 = 5.86, k(R)-2 = 1.59, and k(S)-2 = 3.18. How do the predictions from your model compare with these actual measurements?
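As a worked illustration of the derived responses defined in Task 1, the sketch below applies those four ratios to the measured capacity factors of the verifying experiment. Only the quoted measurements and the Task 1 definitions are used; the dictionary layout and variable names are our own.

```python
# Measured capacity factors at the verifying experiment (values quoted in Task 3).
k = {"k(R)-1": 2.48, "k(S)-1": 5.86, "k(R)-2": 1.59, "k(S)-2": 3.18}

# Derived responses as defined in Task 1: each is a ratio of two capacity factors.
res = {
    "Res1": k["k(S)-1"] / k["k(R)-1"],
    "Res2": k["k(S)-2"] / k["k(R)-2"],
    "Res3": k["k(R)-1"] / k["k(R)-2"],
    "Res4": k["k(S)-2"] / k["k(R)-1"],
}

for name, value in res.items():
    print(f"{name} = {value:.2f}")
```

The same ratios, computed from the model predictions instead of the measurements, give the predicted derived responses that can be compared with these values.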


Solutions to CHIRAL SEPARATION


Task 1
The replicate plots below relate to log-transformed capacity factors. In all four cases, the replicate variation is small relative to the overall variation of the responses, particularly for responses 2-4. No replicate plots are shown for the four derived responses.
[Figures: replicate plots with experiment number labels for the log-transformed capacity factors k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~ (x-axis: Replicate Index).]

The appropriateness of the log transformation is confirmed by the shape of the histograms of each response (below).
[Figures: histograms of k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~ (x-axis: Bins, y-axis: Count).]

Task 2
A quadratic regression model was fitted to the response data. The summary plot (below) indicates that the first response has an excellent model but responses 2-4 suffer from lack of fit.
[Figure: Summary of Fit (MLR) for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~, showing R2, Q2, Model Validity and Reproducibility. N=24, DF=9, Cond. no.=5.9084, Y-miss=0.]

The correlation matrix is useful for examining how the derived responses correlate with the measured ones. The figure below is an excerpt of the complete correlation matrix showing just the portion related to the responses. It is evident that all four capacity factors are strongly correlated and so it is reasonable to include them in the same investigation where we would expect them to have similar patterns of regression coefficients. The only really different response is the derived response Res3.

The coefficient overview plot shown below confirms the similarity of the coefficient profiles. There are no coefficients for the derived responses as these are generated from the fitted capacity factors. Overall, the linear terms dominate and by far the most important factors are concentration of acetonitrile (ACN) and temperature. It can also be seen that pH has some influence on the second, third and fourth responses. There is some evidence of a quadratic effect of temperature for the third and fourth responses.
[Figure: normalized coefficients (MLR) for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~. Model terms: ACN, temp, Ion, pH, ACN*ACN, temp*temp, Ion*Ion, pH*pH, ACN*temp, ACN*Ion, ACN*pH, temp*Ion, temp*pH, Ion*pH. N=24, DF=9, Cond. no.=5.9084, Y-miss=0.]

The predictive power of the models was improved by removing non-significant model terms. The regression coefficients of the refined models are shown below.

[Figures: scaled & centered coefficients (MLR) of the refined models, with terms ACN, temp, Ion, pH, ACN*ACN, temp*temp and ACN*temp. N=24, DF=16, Conf. lev.=0.95 for all four responses. Statistics in the order in which the plots appear:]

k(R)-1~: R2=0.984, Q2=0.967, R2 Adj.=0.977, RSD=0.0329
k(S)-1~: R2=0.995, Q2=0.987, R2 Adj.=0.993, RSD=0.0231
k(R)-2~: R2=0.978, Q2=0.944, R2 Adj.=0.968, RSD=0.0420
k(S)-2~: R2=0.990, Q2=0.973, R2 Adj.=0.986, RSD=0.0364

Notice how much the Q2 values have increased as a result of the model pruning (see summary plot below), although responses 2, 3, and 4 still exhibit significant lack of fit. It is concluded that this is due to the extremely low replicate errors for these three responses.
[Figure: Summary of Fit (MLR) of the refined models for k(R)-1~, k(S)-1~, k(R)-2~ and k(S)-2~. N=24, DF=16, Cond. no.=4.7348, Y-miss=0.]

In order to interpret the regression models, we created the eight response contour plots shown below. These were constructed using Eluent modifier (ACN) and Temperature as the axes and fixing Ionic strength and pH at their centre levels. The two quartets of response contour plots suggest that the lower left-hand corner (ACN low, temp low, Ion centre, pH centre) is the most interesting.


The conclusion from the eight response contour plots is that it will not be a problem to achieve retention times below 15 minutes. It should also be possible to get all four resolution responses above 1.5. To check this, we used MODDE's Optimizer functionality to locate the optimum factor settings. Because we already know that the capacity factors are not a problem, we excluded them from the optimisation. The specification of the response targets is shown below.

According to the results of the Optimizer (below), simplex #6 is the best. This combination of Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04 and pH = 6.6 is close to the lower left-hand corner identified above in the eight response contour plots.


Task 3
The prediction list below shows point estimates and their associated 95% confidence intervals. The results for the verifying experiment all fall within the 95% confidence interval which corroborates, albeit with just one point, the predictive power of the model.

Conclusions
Excellent R2 and Q2 were obtained for all four capacity factors. However, the models for responses 2-4 suffered from significant lack of fit, which was most likely due to the extremely low replicate errors associated with these responses. The predictions for the verifying experiment were very close to the actual results obtained, which gives confidence in the predictive power of the models. Using MODDE's Optimizer, it was easy to locate the factor settings which met the experimental objectives: Eluent modifier = 10%, Temperature = 20 °C, Ionic strength = 0.04 and pH = 6.6. These settings ensure complete baseline separation within reasonable retention times.


DOE-Exercise Metabolism (RSM)


Optimization of a microsome-based metabolism assay

Background
In the pharmaceutical industry it is important to study metabolism of candidate drugs. One approach is to incubate substances with microsomal preparations which may be used as model systems to investigate e.g. liver metabolism. During incubation, dedicated inhibitors may be used to block enzymes. This will help uncover which enzyme in the microsomes is responsible for metabolizing a specific drug. However, in order to obtain reliable results it is first necessary to ascertain that the compound under study is sufficiently well metabolized. In the current application the aim was to ensure that drug metabolism exceeded 40%. The example originates from Carlsson Research AB, Gothenburg, Sweden, and we gratefully acknowledge the company for allowing us to use this data set.

Objective
The objective of the investigation was to optimise the assay conditions for the enzymes such that a maximum of 60% of the drug was left after incubation with the microsomal preparation.

Data
The following five factors were of interest:

Comments/Explanation:

Drug: Drug concentration [M]. The higher the drug concentration, the greater the risk that the drug itself will inhibit the enzymes. Expressed on a log-scale.
Microsome: Microsome concentration [mg/ml]. The more enzyme, the more rapid the metabolism. Expressed on a log-scale.
NADPH: NADPH concentration [mM]. Enzyme co-factor. The more co-factor, the lower the risk of total NADPH depletion before the end of the experiment. Expressed on a log-scale.
Time: Duration of incubation [min]. A longer duration will give the enzymes more opportunity to metabolize the drug. The risk is, however, that other factors will be depleted and hence there may be no net gain from prolonging the incubation time.
Ionic strength: Ionic strength of the Na/K-phosphate buffer used [mM]. This buffer may affect the ability of the enzymes to interact with the drug.

The following response was recorded:

Comments/Explanation:

%Left: Amount of drug left at the end of the incubation experiment, as measured by LC-MS. The experimental objective was to achieve a figure less than 60%.


In order to conduct this optimization study, the five factors were varied using a CCF design. This is a standard RSM design in 26 + 3 experiments.
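The run count of 26 + 3 can be reproduced with a short sketch. With five factors, a full factorial corner part would need 32 runs, so the 26-run CCF must combine a 16-run half-fraction with 10 face-centred axial points. The generator E = ABCD used below is an assumption, and the run order does not correspond to the MODDE worksheet.

```python
from itertools import product

# Factor names follow the exercise; the order and generator are assumptions.
FACTORS = ["Drug", "Microsome", "NADPH", "Time", "Ionic strength"]

def ccf_design(n_center=3):
    """Face-centred central composite design for five factors, coded -1/0/+1."""
    runs = []
    # Corner points: half-fraction of the 2^5 factorial (assumed generator E = ABCD).
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    # Axial points on the faces: one factor at -1 or +1, all others at centre.
    for j in range(len(FACTORS)):
        for level in (-1, 1):
            axial = [0] * len(FACTORS)
            axial[j] = level
            runs.append(tuple(axial))
    # Replicated centre-points.
    runs += [(0,) * len(FACTORS)] * n_center
    return runs

design = ccf_design()
print(len(design))  # 16 corners + 10 axial points + 3 centre-points
```

Because the axial points sit on the faces of the cube, every factor is run at three levels only, which is what allows the quadratic terms to be estimated without leaving the original factor ranges.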

Tasks
Task 1
Initiate a new investigation in MODDE and define the factors and the response according to the information given above. Remember to specify the log-transform for the three first factors. Select RSM as objective. Accept the recommended 29 run design (CCF design in 26 runs plus 3 centre-points). Enter the response values in the Worksheet. Evaluate the raw data. Are there any outliers? Is there a need for response transformation? What can you say about the replicate error?

Task 2
Fit the quadratic regression model. Determine which factors have the strongest influence on the metabolism of the drug by looking at the coefficient plot. Review the fit and revise the model if needed. Which factor combination represents the optimal metabolism environment for the enzymes in the microsomal preparations?


Solutions to Metabolism
Task 1
Experiment number 15 deviates from the rest (below, top left). This is a very interesting point as it is the only one in the worksheet meeting the stipulated goal of %Left less than 60%. Hence, we are reluctant to remove it. This experiment also causes the distribution of %Left to be skewed (below, top right). The replicate error is very small compared with the overall response variation. One possible remedy might be the NegLog transformation. The results after applying this transform are shown in the two lower plots. Evidently, the NegLog transformation is a sensible choice since the distribution of %Left is closer to a normal distribution after transformation. In the following, we will work with the transformed response variable.
[Figures, top row: replicate plot with experiment number labels and histogram of %left. Bottom row: replicate plot and histogram of the NegLog-transformed response %left~.]

Task 2
The fitted quadratic model contains 5 (linear) + 5 (quadratic) + 10 (two-factor interaction) = 20 terms plus the constant. Clearly, many of these are not significant according to the confidence intervals. The model also has negative Q2, which is unsatisfactory. The normal probability plot suggests no outliers in the data and there is no lack of fit (MVal > 0.25).
[Figure: Summary of Fit (MLR) for %left~, showing R2, Q2, Model Validity and Reproducibility. N=29, DF=8, Cond. no.=7.2003, Y-miss=0.]

[Figure: scaled & centered coefficients (MLR) for %left~, full quadratic model with 20 terms. N=29, DF=8, R2=0.964, Q2=-1.592, R2 Adj.=0.875, RSD=0.1076, Conf. lev.=0.95.]

[Figure: normal probability plot of deleted studentized residuals for %left~ with experiment number labels. N=29, DF=8, R2=0.964, Q2=-1.592, R2 Adj.=0.875, RSD=0.1076.]

In order to improve the modelling results, the following six model terms were discarded: Drug*NADPH, NADPH*Time, NADPH*Ionic strength, Drug*Drug, Mic*Mic and NADPH*NADPH. For this model the performance statistics are: R2 = 0.96, Q2 = 0.85, MVal = 0.41, and Rep = 0.99. These are excellent results and there are no outliers. Hence, the model may be used for predicting a region in the experimental domain where the goal of %Left < 60 is attained.
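The term counts behind the two models can be checked with a few lines of arithmetic. The sketch below counts the terms of a full quadratic model in five factors and the residual degrees of freedom for the 29 runs, before and after removing six terms. The results agree with the DF values reported with the MODDE plots (DF=8 and DF=14); the function name is our own.

```python
from math import comb

def quadratic_terms(n_factors):
    """Linear + squared + two-factor interaction terms of a full quadratic model."""
    return n_factors + n_factors + comb(n_factors, 2)

n_runs = 29
full_terms = quadratic_terms(5)               # 5 + 5 + 10
df_full = n_runs - (full_terms + 1)           # +1 for the constant
pruned_terms = full_terms - 6                 # six terms discarded
df_pruned = n_runs - (pruned_terms + 1)

print(full_terms, df_full, pruned_terms, df_pruned)  # 20 8 14 14
```

With only 8 residual degrees of freedom for 21 parameters, the full model has little information left for prediction, which is one plausible reason why pruning raises Q2 so sharply.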
[Figure: Investigation: Metabolism (MLR), Summary of Fit for %left~ (R2, Q2, Model Validity, Reproducibility), revised model. N=29, DF=14, Cond. no.=5.3422, Y-miss=0.]
[Figure: Investigation: Metabolism (MLR), Scaled & Centered Coefficients for %left~, revised model. N=29, DF=14, R2=0.961, Q2=0.850, R2 Adj.=0.922, RSD=0.0850, Conf. lev.=0.95.]
[Figure: Investigation: Metabolism (MLR), normal probability plot of Deleted Studentized Residuals for %left~ with Experiment Number labels. N=29, DF=14, R2=0.961, Q2=0.850, R2 Adj.=0.922, RSD=0.0850.]

The contour plot below shows a saddle surface and achieving %Left < 60 is not difficult. Staying in the lower part of the contour plot (low ionic strength and drug concentration, and high microsome and cofactor concentration) may enable response values as low as 20% left (80% of the drug is metabolized). The sweet-spot plot is coded according to the requirement on the response variable.

Conclusions
A very strong quadratic model was obtained. Using the factor combination, low ionic strength and drug concentration, high microsome and cofactor concentration, and 4 hours gives the lowest %Left inside the region explored. The relevance of this area was later verified using additional experiments.


DOE-Exercise WILLGE (Optimisation)


Chemical synthesis with the Willgerodt-Kindler reaction

Background
The Willgerodt-Kindler reaction, a rearrangement that takes place when aryl-alkyl-ketones are heated in the presence of sulphur and an amine, is difficult to explain. One way of investigating the reaction mechanism is to find the factors that have the greatest influence on the reaction. The current data are drawn from the thesis of Torbjörn Lundstedt, Umeå University, 1986.

Objective
To determine which factors are the most important using a fractional factorial design. To optimise the system utilising response surfaces.

Data
The proportion Sulphur/Ketone (mol/mol)
The proportion Amine/Ketone (mol/mol)
Temperature (°C)
Grain size of Sulphur (mm)
Stirring speed (rpm)

Goal: Quantitative (100%) yield.


Tasks
Phase 1: Screening
Task 1
Generate a 2^(5-1) fractional factorial design. Enter the response values. Calculate a model showing the influence of the factors on the yield.

Task 2
Why don't you get a summary of fit plot? Edit the model by removing the two smallest terms. Recalculate the model. Which terms do you think are significant? Which factors can be neglected in further investigations?

Phase 2: Optimisation (RSM)
Task 3
Define a new investigation. Continue with the three most important factors from the screening and generate a response surface design (CCC) with 6 centre points. Enter the response values listed in the table below. Note that the factor co-ordinates of the axial points have been rounded off to more manageable numbers. Calculate a regression model and carry out the necessary model revision. Interpret the model. Which factor combination will allow a maximisation of the synthetic yield?
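As an illustration of what a CCC design with three factors and six centre points looks like in coded units, here is a minimal Python sketch. It assumes the rotatable axial distance (2^k)^(1/4); the exercise states that the axial co-ordinates were rounded, so the values actually used may differ. The function name is hypothetical.

```python
from itertools import product

def ccc_design(n_factors, n_center):
    """Circumscribed central composite design in coded units (a sketch)."""
    # Assumption: rotatable axial distance; the exercise used rounded values.
    alpha = (2 ** n_factors) ** 0.25
    corners = [tuple(float(v) for v in run)
               for run in product((-1, 1), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * n_factors)] * n_center
    return corners + axial + center

design = ccc_design(n_factors=3, n_center=6)
for run in design:
    print(run)
```

The design has 8 corner runs, 6 axial runs and 6 centre points, which agrees with the N=20 reported for the optimisation model.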


Solutions to Willge
Task 2
The design is saturated, i.e., there are no degrees of freedom left because we fitted a model of 16 terms to a design of 16 experiments. One way to alert the user to this undesirable situation is to deny plotting of R2, R2adj, or Q2, which is why no summary of fit plot is shown. In the coefficient plot no confidence intervals are given (this is because RSD = 0 and because the t-distribution is undefined for zero degrees of freedom). Nevertheless, we can see that Te has the largest influence on Yield.
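The saturation can be made concrete with a small Python sketch that builds a 16-run half fraction and counts the model columns. The generator (fifth factor = product of the other four) and the assignment of factor names to columns are assumptions made for illustration; the design generated by MODDE may be arranged differently.

```python
from itertools import product

def half_fraction():
    """2^(5-1) design: full factorial in four factors, fifth = their product."""
    return [(a, b, c, d, a * b * c * d)
            for a, b, c, d in product((-1, 1), repeat=4)]

def model_columns(run):
    """Constant, 5 main effects and 10 two-factor interactions for one run."""
    cols = [1] + list(run)
    for i in range(5):
        for j in range(i + 1, 5):
            cols.append(run[i] * run[j])
    return cols

design = half_fraction()            # illustrative column order: SK, MK, Te, Pa, Sti
X = [model_columns(run) for run in design]
n_runs, n_terms = len(X), len(X[0])
residual_df = n_runs - n_terms

# All model columns are mutually orthogonal in this design.
columns = list(zip(*X))
orthogonal = all(
    sum(u * v for u, v in zip(columns[i], columns[j])) == 0
    for i in range(n_terms) for j in range(i + 1, n_terms)
)
print(n_runs, n_terms, residual_df, orthogonal)
```

With as many terms as runs the residual degrees of freedom are zero, so neither RSD nor confidence intervals can be estimated until some terms are removed.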
[Figure: Investigation: Willges (MLR), Scaled & Centered Coefficients for Yield (%), saturated model with main effects and two-factor interactions of SK, MK, Te, Pa and Sti. N=16, DF=0, Conf. lev.=0.95.]

When removing the two smallest model terms, Pa*Sti and MK*Sti, a model is obtained that explains and predicts the variance in the data very well. From the coefficient plot we conclude that the three factors SK, MK, and Te have the largest influence on the Yield. Sti is also significant but will be neglected in further investigations. Through this screening we have thus reduced the number of factors from 5 to 3.
[Figure: Investigation: Willges (MLR), Summary of Fit for Yield (R2, Q2). N=16, DF=2, Cond. no.=1.0000, Y-miss=0.]
[Figure: Investigation: Willges (MLR), Scaled & Centered Coefficients for Yield (%), model after removal of two terms. N=16, DF=2, R2=0.999, Q2=0.930, R2 Adj.=0.992, RSD=2.2849, Conf. lev.=0.95.]

Task 3
When fitting the quadratic regression model to the data of the CCC design, a model was obtained with high R2 (0.98) and Q2 (0.85), but with negative MVal. Another diagnostic tool, the N-plot of residuals, pinpoints an outlier, i.e., experiment number 8. This outlier has to be removed in order to improve the modelling efficiency.
[Figure: Investigation: Willge_Opt (MLR), Summary of Fit for Yield (R2, Q2, Model Validity, Reproducibility). N=20, DF=10, Cond. no.=3.5887, Y-miss=0.]
[Figure: Investigation: Willge_Opt (MLR), normal probability plot of Deleted Studentized Residuals for Yield with Experiment Number labels. N=20, DF=10, R2=0.980, Q2=0.849, R2 Adj.=0.961, RSD=5.0006.]

As we can see below, the removal of observation #8 improves the model. We now have an excellent model according to R2 and Q2. Observation #7 is somewhat far away in the residual plot, but it is not an influential point. There is also some indication of lack of fit (MVal), but in this case the replicate error is exceptionally low, which may, at least partly, explain why lack of fit appears.
[Figure: Investigation: Willge_Opt (MLR), Summary of Fit for Yield (R2, Q2, Model Validity, Reproducibility). N=19, DF=9, Cond. no.=4.0695, Y-miss=0.]
[Figure: Investigation: Willge_Opt (MLR), normal probability plot of Deleted Studentized Residuals for Yield with Experiment Number labels. N=19, DF=9, R2=0.997, Q2=0.967, R2 Adj.=0.993, RSD=2.1588.]
[Figure: Investigation: Willge_Opt (MLR), Scaled & Centered Coefficients for Yield: SK, MK, Te, SK*SK, MK*MK, Te*Te, SK*MK, SK*Te, MK*Te. N=19, DF=9, R2=0.997, Q2=0.967, R2 Adj.=0.993, RSD=2.1588, Conf. lev.=0.95.]

By creating the response contour plots shown below, we can see how the predicted Yield changes as a function of the three factors. Evidently, the model forecasts a region of quantitative yield at Temperature = 140 °C and high molar ratios for the other two factors.

Conclusions
This example illustrates the working principle of first conducting a careful screening investigation and thereafter a detailed optimisation study. The screening phase identified three factors as more influential than the other factors. When bringing these three factors into the optimisation stage, an area of quantitative yield (i.e., 100%) was discovered. The appropriateness of this region of operability was later verified experimentally by Torbjörn Lundstedt.


DOE-Exercise Drug D (Opt)


Stability: An investigation into the release profiles of a drug

Background
The stability of an analytical method (in this case release curves) cannot be investigated by changing one factor at a time. More information about the stability can be extracted using DOE. In this case, a small (small volume) design is laid out to describe how the factors should be altered around the standard settings to acquire information on the stability of the method (sensitivity to change). This example is intended to show how experimental design can simplify the examination of a method's sensitivity to small factor changes. The Drug D data originate from a pharmaceutical study at Astra Hässle performed by Tina Riesel and Åsa Backman.

Objective
The objective of this investigation was to examine how the release profile of Drug D was affected by changes in standard conditions (see below). Is the release after 1 hour the same as after 10 hours? Changes in standard conditions here refer to changes in the four factors: volume of an artificial stomach, its temperature, its fluctuation, and pH. In the written documentation, the manufacturer declared that after 1h the release should be between 20 and 40%, and after 10h above 80%. In addition, the specification stated that the factors should not cause more than 5% (1h) or 10% (10h) spread in each response. Hence, one experimental goal was to assess whether the variation in the release rates across the entire design was consistent with this claim.

Data


Tasks
Task 1
Generate a design so that a model with square terms can be evaluated (select a CCF-design). Enter the response data, calculate the model, and interpret the results.
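For orientation, a face-centred composite (CCF) design places its axial points on the faces of the factorial cube, so every factor takes only three levels. The Python sketch below builds such a design for four factors in coded units. The choice of three centre points is an assumption that reproduces the N=27 reported in the solution; the function name is illustrative.

```python
from itertools import product
from math import comb

def ccf_design(n_factors, n_center):
    """Face-centred central composite design in coded units (a sketch)."""
    corners = list(product((-1, 1), repeat=n_factors))
    faces = []
    for i in range(n_factors):
        for sign in (-1, 1):
            point = [0] * n_factors
            point[i] = sign  # axial points sit on the cube faces
            faces.append(tuple(point))
    center = [(0,) * n_factors] * n_center
    return corners + faces + center

design = ccf_design(n_factors=4, n_center=3)   # assumed: 3 centre points
n_terms = 1 + 4 + 4 + comb(4, 2)               # constant, linear, square, interaction
residual_df = len(design) - n_terms
print(len(design), n_terms, residual_df)
```

The full quadratic model in four factors has 15 terms, leaving 12 residual degrees of freedom in 27 runs.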

Task 2
Refine the model and make response contour plots to examine whether the responses change appreciably.

Task 3
According to the specification, the changes in the factors should not cause the response to vary more than 5% (1h) or 10% (10h). Is this specification met?

Solutions to Drug D
Task 1
As seen in the Summary of Fit plot the difference between R2 and Q2 is quite large for 10h, which indicates that the model might be too complicated, or that there are some outliers. As shown by the coefficient plot, there are several coefficients that are near zero.
[Figure: Investigation: DrogenD (MLR), Summary of Fit for Release 1h and Release 10h (R2, Q2, Model Validity, Reproducibility). N=27, DF=12, Cond. no.=6.6122, Y-miss=0.]
[Figure: Investigation: DrogenD (MLR), Scaled & Centered Coefficients for Release 1h (%), full quadratic model in Vol, Te, Sti and pH. N=27, DF=12, R2=0.948, Q2=0.710, R2 Adj.=0.886, RSD=0.3838, Conf. lev.=0.95.]
[Figure: Investigation: DrogenD (MLR), Scaled & Centered Coefficients for Release 10h (%), full quadratic model in Vol, Te, Sti and pH. N=27, DF=12.]

Task 2
After the removal of five terms we get an excellent model for the first response and a good model for the second response. We see that the 1h response is much more influenced by linear contributions from the factors than the 10h response. On the other hand, the quadratic influence is more pronounced for the latter response.
[Figure: Investigation: DrogenD (MLR), Summary of Fit for Release 1h and Release 10h (R2, Q2, Model Validity, Reproducibility), revised models. N=27, DF=17, Cond. no.=5.9887, Y-miss=0.]
[Figure: Investigation: DrogenD (MLR), Scaled & Centered Coefficients for Release 1h (%), revised model: Vol, Te, Sti, pH, Vol*Vol, Te*Te, pH*pH, Vol*pH, Sti*pH. N=27, DF=17, R2=0.943, Q2=0.864, R2 Adj.=0.913, RSD=0.3366, Conf. lev.=0.95.]
[Figure: Investigation: DrogenD (MLR), Scaled & Centered Coefficients for Release 10h (%), revised model with the same terms. N=27, DF=17, R2=0.889, Q2=0.705, R2 Adj.=0.831, RSD=0.6615, Conf. lev.=0.95.]
[Plot footer of the full quadratic model for Release 10h (Task 1): R2=0.908, Q2=0.333, R2 Adj.=0.801, RSD=0.7162, Conf. lev.=0.95.]

Task 3
To understand the features of the two responses better, we created the response contour plots shown below. These two figures suggest that the responses change dramatically as a result of altered factor settings. However, this is misleading.


Let us examine these plots more closely. We are tricked by the way they were constructed. More appropriate plots are given below. These plots are response surface plots in which the z-axis, the release axis, has been rescaled to values between 20 and 40% for 1h and between 80 and 100% for 10h. These are more appropriate ranges according to the original objectives of the investigation. Now we can see that the response surfaces are actually quite flat. Remarkably, the difference between the highest and the lowest measured values is as low as 4.1% for 1h and 5.5% for 10h. Hence, we conclude that the release responses are robust because they are inside the given specifications (less than 5% or 10% variation).
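The comparison against the specification amounts to a simple check of the observed spread against the allowed spread. The sketch below uses only the numbers quoted in the text and assumes that "spread" means the difference between the highest and lowest measured value, in percentage units.

```python
# Observed spread (highest minus lowest measured release, in %), as quoted above.
observed_spread = {"Release 1h": 4.1, "Release 10h": 5.5}
# Allowed spread according to the manufacturer's specification.
allowed_spread = {"Release 1h": 5.0, "Release 10h": 10.0}

def spread_within_spec(observed, allowed):
    """Return, per response, whether the observed spread is below the limit."""
    return {name: observed[name] < allowed[name] for name in observed}

result = spread_within_spec(observed_spread, allowed_spread)
print(result)
```

Both responses pass, with the 1h response closer to its limit than the 10h response.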

Conclusions
The release rate after one hour mainly relates linearly (except for pH*pH) to the four factors. The extent of quadratic dependence is more apparent for the release rate after ten hours. The specification for the 1h response is met. The specification for the 10h response is met.


NONAFACT (Robustness Testing)


Assessing robustness of a viral inactivation step in the manufacturing of a blood product

Background
The preparation of therapeutic products derived from the blood of voluntary donors is an important route for tomorrow's pharmaceutical industry. This is because human blood and plasma comprise many proteins which, once extracted and purified, are of great medical and economic importance. Since the health of the millions of patients who receive blood-derived products every year depends on the quality of the processed blood and plasma, it is crucial that high priority is placed on the quality assurance of such products. One big risk is the transmission of infectious diseases via blood transfusion. Strategies for screening blood for infectious agents are advancing, but this is a difficult and time-consuming process due to the continued discovery of new and emerging pathogens.
At CLB (Dutch Red Cross)* in Amsterdam, designed experiments are routinely used as part of the viral safety strategy for blood-derived products. The current example is a robustness test of a viral reduction step in the manufacturing of a solvent/detergent-treated factor IX product called Nonafact. We recall that in a robustness testing study the objective is to probe robustness close to the set point (the set point is usually chosen as the center-point in the design). A robust system copes with small factor changes without compromising its effectiveness. In other words, robustness is a measure of a system's reliability under normal use.
*) Reference: H. Hiemstra, CLB. Presented at Blood-Products Safety, February 5-7, 2001, MacLean, Virginia, USA, http://www.healthtech.com/2001/bss/.

Objective
The experimental objective of the study reviewed here was to explore how sensitive a viral inactivation step was to changes in six process parameters. The six factors studied were (i) percentage TNBP, (ii) percentage Tween80, (iii) temperature, (iv) amount of protein, (v) pH, and (vi) concentration of NaCl. TNBP (tri-n-butylphosphate) and Tween80 (a detergent) help disintegrate the viruses, and the other factors may affect the viral inactivation process too. The response measured was the change in virus density when comparing density before and after treatment. Virus density is often expressed and valued on a logarithmic scale, and so any decrease in virus density is commonly expressed as [log (initial virus density) - log (final virus density)]. This difference is often referred to as the reduction factor, or simply RF, and the higher the better. Maintaining RF > 5 is often used as the specification. In the current study, CLB used three enveloped viruses as models: HIV (Human Immunodeficiency Virus), BVDV (Bovine Viral Diarrhea Virus), and PSR (PseudoRabies Virus). BVDV is used as a model virus for human hepatitis C. Responses were measured within 10 minutes following addition of virucidal chemicals. This is a rather short time frame and in other similar studies up to 30 minutes is used.
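The reduction factor defined above is easy to compute. The following Python sketch assumes base-10 logarithms and uses made-up virus densities purely for illustration; they are not data from the study.

```python
from math import log10

def reduction_factor(initial_density, final_density):
    """RF = log10(initial virus density) - log10(final virus density)."""
    return log10(initial_density) - log10(final_density)

# Hypothetical densities, for illustration only (not from the Nonafact study).
rf = reduction_factor(initial_density=1e8, final_density=1e2)
meets_spec = rf > 5   # the often used specification RF > 5
print(rf, meets_spec)
```

An RF of 5 thus corresponds to a 100 000-fold decrease in virus density.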


Data
Factors (process parameters):

Responses:

Design:


Tasks
Task 1
Start a new MODDE project. Define the six factors and the six responses as outlined above. Select Screening and the Frac Fac Res III design in 8 runs. Use 2 center-points (change the default proposal based on 3 center-points). On your screen the following design should appear:

The design above was not used by the investigators. Instead, they chose to use the following modification:

In order to accomplish the altered experimental design you will have to modify the design manually (or paste the contents of NONAFACT.XLS into the worksheet). Also enter or paste the response data. Evaluate the raw data and the underlying design (replicate plot, histogram, scatter plot of responses, correlation matrix, etc.). Is there a need for response transformation? How large or small is the replicate error? Do the responses comply with the often used specification of staying above an RF of 5? What can you say about the correlation between the six responses? What can you say about the geometry of the underlying design?

Task 2
Select MLR as the fit method and compute the model. Review and interpret the model. Which linear terms are important? Is this system robust?


Solutions to NONAFACT
Task 1
The replicate plots show acceptable spread between the two replicates for all responses except the second one. However, it would have been desirable to have at least three replicates. The replicate plots and histograms (not shown) do not indicate any skewed response. Only the second response (HIV_5min) consistently scores RF-values exceeding the often used specification of 5. However, one should remember the short measurement time. Using a longer time, e.g., 30 minutes, might have resulted in generally higher RF-values.
[Figures: Investigation: Nonafact, Plots of Replications (response versus Replicate Index, with Experiment Number labels) for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min.]

The table below is the correlation matrix. It shows how all terms in the model and all responses relate to each other. A colored cell indicates high correlation. A number of interesting observations can be made. First of all, we can see that the factors Protein/Tween80 and NaCl/Protein are correlated in a pair-wise fashion. This is unexpected and means that the original investigators have failed to create a correct fractional factorial design. The effect of these non-zero correlations will be inflated confidence intervals around the regression coefficients of Tween80, Protein, and NaCl. Secondly, it appears that the factors TNBP, Tween80, and Temperature generally exert the strongest influence on the responses, i.e., the responses are most susceptible to altered settings in these factors. Thirdly, the response HIV5 seems to be different from the others, since it only correlates appreciably with HIV1. All the other five responses correlate more or less strongly with one another.
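The point about correlated factor columns can be illustrated with a short Python sketch. It builds a regular 2^(6-3) design with two centre points and lists factor pairs with non-zero correlation, first for the regular design and then for a copy in which one setting has been altered. The generators, the assignment of factor names to columns and the altered setting are all assumptions for illustration; the sketch does not reproduce the investigators' actual design.

```python
from itertools import product

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlated_pairs(runs, names, tol=1e-9):
    """Return factor pairs whose design columns are correlated."""
    cols = list(zip(*runs))
    return [(names[i], names[j])
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(pearson(cols[i], cols[j])) > tol]

names = ["TNBP", "Tween80", "Temp", "Protein", "pH", "NaCl"]
# Assumed generators: 4th = 1st*2nd, 5th = 1st*3rd, 6th = 2nd*3rd.
regular = [(a, b, c, a * b, a * c, b * c)
           for a, b, c in product((-1, 1), repeat=3)]
regular += [(0, 0, 0, 0, 0, 0)] * 2          # two centre points

# Hypothetical manual alteration: flip the Protein level in the first run.
altered = [list(run) for run in regular]
altered[0][3] = -altered[0][3]

print(correlated_pairs(regular, names))       # no correlated pairs
print(correlated_pairs(altered, names))       # Protein now correlates with others
```

Any non-empty list signals a loss of orthogonality, which widens the confidence intervals of the coefficients involved.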


Task 2
A linear regression model in seven terms (constant + six main effects) was fitted to each of the six responses. The summary of fit plot below demonstrates that two significant models were obtained, that of HIV_1min and that of PSR_2min. Also recall that in robustness testing we do not generally spend much time with model refinement activities.
[Figure: Investigation: Nonafact (MLR), Summary of Fit (R2, Q2, Model Validity, Reproducibility) for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min. N=10, DF=3, Cond. no.=2.4142, Y-miss=0.]

The coefficient overview plot presented below is useful for getting the overall picture. It appears that keeping TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM) generally corresponds to the most favorable operating conditions (giving the highest virus reduction factors). This setting of pH conflicts slightly with what is best for BVDV_1min; however, the level of Tween80, which dominates for this response, is set advantageously.

[Figure: Investigation: Nonafact (MLR), Normalized Coefficients of TNBP, T80, Temp, Pro, pH and NaCl for HIV_1min, HIV_5min, BVDV_1min, BVDV_5min, PSR_2min and PSR_10min. N=10, DF=3, Cond. no.=2.4142, Y-miss=0.]

By using six response contour plots it is easy to overview the results (see below). These plots were drawn letting TNBP and Temp be the X- and Y-axes, respectively, and by putting Tween80 high, Protein high, pH low, and NaCl high. The colour coding is consistent throughout the six plots. From these plots it is quickly understood that with the short measurement time for the three strains of virus, RF > 5 is not within reach for PSR_2min. The specification is within reach for BVDV_1min if pH is set high. Note that RF for HIV_5min is constantly predicted above 6, hence the flatness of this response contour plot.

Conclusions
Virus reduction factors above 5 are not achievable for PSR_2min. The best workable factor combination is TNBP high (0.35%), Tween80 low (0.8%), Temp high (30 °C), Protein high (35 mg/ml), pH low (5.5) and NaCl high (1250 mM).


DOE-Exercise: HPLC Robustness (Robustness Testing)


Evaluating sensitivity of HPLC responses to small changes in regulatory factors

Background
The aim of robustness testing is to design a process, or a system, so that its performance remains satisfactory even when some influential factors are allowed to vary. In other words, we want to minimise the system's sensitivity to changes in certain critical factors. The advantages of this include simpler process control, wider range of applicability of product and higher quality of product. A robustness test is usually carried out before the release of an almost finished product, or analytical system, as a last test to ensure quality. Such a design is usually centred on a factor combination which is currently used for running the analytical system, or the process. We call this the set point. The set point may have been found through a screening design, an optimisation design, or some other identification principle, such as written quality documentation. The aim of robustness testing is, therefore, to explore robustness close to the chosen set point.
The example that we have chosen as an illustration originates from a pharmaceutical company. It represents a typical analytical chemistry problem within the pharmaceutical industry. In analytical chemistry, an HPLC method is often set up for routine analysis of complex mixtures. It is therefore important that such a system will work reliably for a long time, and be reasonably insensitive to varying chromatographic conditions. In chromatography, the objective is separation of the analytes within a reasonable time. Separation relies on different retention of each analyte on the stationary phase. Thus, the retention of each analyte is important, and this response is described by the capacity factor, k. The degree of separation between two analytes is estimated as the resolution between two adjacent peaks in the chromatogram. A resolution of 1 is considered the minimum value for separation between neighbouring peaks, but for complete baseline separation a resolution of >1.5 is necessary. As the resolution value approaches zero, it becomes more difficult to discern separate peaks.
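The two kinds of response can be illustrated with the usual textbook definitions, which are not spelled out in this exercise: the capacity factor k = (tR - t0)/t0 and the resolution Rs = 2(tR2 - tR1)/(w1 + w2), where tR is a retention time, t0 the dead time and w a baseline peak width. The Python sketch below uses hypothetical chromatographic values; only the thresholds 1 and 1.5 come from the text.

```python
def capacity_factor(t_r, t_0):
    """Capacity factor k from retention time t_r and dead time t_0."""
    return (t_r - t_0) / t_0

def resolution(t_r1, t_r2, w1, w2):
    """Resolution between two adjacent peaks from baseline peak widths."""
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)

def separation_quality(rs):
    """Classify a resolution value using the thresholds quoted in the text."""
    if rs > 1.5:
        return "baseline separation"
    if rs >= 1.0:
        return "minimum separation"
    return "insufficient separation"

# Hypothetical retention data in minutes, for illustration only.
k1 = capacity_factor(t_r=3.0, t_0=1.0)
k2 = capacity_factor(t_r=4.0, t_0=1.0)
rs = resolution(t_r1=3.0, t_r2=4.0, w1=0.5, w2=0.5)
print(k1, k2, rs, separation_quality(rs))
```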

Objective
The investigators explored five factors: (1) amount of acetonitrile in the mobile phase; (2) pH of mobile phase; (3) temperature; (4) amount of the OSA counter-ion in the mobile phase; (5) stationary phase batch (column), and mapped their influence on the chromatographic behaviour of two chemical analytes. Note that the last factor is of a qualitative nature. To study whether these factors had an influence on the chromatographic system, the researchers used a 12 run experimental design to encode 12 different chromatographic conditions. For each condition, three quantitative responses reflecting the capacity factors of the two analytes (compounds) and the resolution between the analytes were measured. The goal of this study was to constantly maintain a resolution of 1.5 or higher for all chromatographic conditions. No specifications were given for the two capacity responses.

Data
A 12 run design supporting a linear model was constructed. This design, shown below, is a 2^(5-2) fractional factorial design, supplemented with four centre-points.


Tasks
Task 1
Define a new investigation in MODDE with five factors and three responses. Select Screening, a linear model, and a relevant fractional factorial design with 8 + 4 runs. Enter the response data. Evaluate the raw data. Is there any need for data pre-treatment, such as a response transformation?

Task 2
Fit the linear regression model. Which are the important factors? Are there any non-significant model terms? Are the residuals approximately normally distributed? Comment on any lack of fit. Which responses are robust to changes in the five factors?

Task 3
Assuming that the specification for k2 was 2.7 to 3.3, what would your recommendation be for changing the tolerances of the factors so that robustness is likely to be achieved for this response? NOTE #1: This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is shown here for illustration. NOTE #2: Use the discussion regarding the four limiting cases of robustness testing. It will give guidance to how this problem might be solved.


Solutions to HPLC Robustness


Task 1
We start with the evaluation of the raw data by inspecting the replicate error of each response. As seen, the replicate error was expectedly small. We would not anticipate large drifts among the replicates, as we have deliberately set up a design where each run ideally should produce equivalent results. The numerical variation in the resolution response was small. The lowest measured resolution was 1.75 and the highest 1.89. Since the operative goal was to maintain a resolution above 1.5, we see already in the raw data that this goal was fulfilled, and this means that Res1 is robust.
[Figures: Investigation: HPLC Robustness, Plots of Replications (response versus Replicate Index, with Experiment Number labels) for k1, k2 and Res1.]

In evaluation of the raw data, it is compulsory to check the data distribution of the responses, to reveal any need for response transformation. We may check this need by making a histogram of each response. Such histograms are plotted below and they inform us that it is appropriate to work in the untransformed scale of each response. In most cases it is convenient to work with log k, but not here.
[Figures: Investigation: HPLC Robustness, Histograms (Count versus Bins) of k1, k2 and Res1.]

Task 2
The regression analysis phase in robustness testing is carried out in a manner similar to that of screening and optimisation. However, the focus is primarily placed on the R2 and Q2 parameters, and the analysis of variance results, but not so much on residual plots and other graphical tools. The reason for this is that the interest in robustness testing lies in classifying the regression model as significant or not significant. With such information it is then possible to get an understanding of the robustness. Another modelling difference between robustness testing and screening/optimisation is that model refinement is usually not carried out.
We fitted a linear model with 6 terms to each response. The overall results of the model fitting are displayed in the summary of fit plot. The predictive power ranges from poor to excellent. The Q2 values are 0.92, 0.96, and 0.12, for k1, k2, and Res1, respectively. In robustness testing the ideal result is a Q2 near zero. Hence, the Q2 of 0.12 for Res1 is an indication of an extremely weak relationship between the factors and the response, that is, it seems as if the response is robust. The low Q2 for Res1 might be explained by the fact that this response is close to constant across the entire design, and hence there is not much response variation to account for. The high Q2s of k1 and k2, on the other hand, indicate that these responses are sensitive to the small factor changes. However, for these latter responses we cannot make any robustness statement, as there are no specifications to compare with.

[Figure: Investigation: HPLC Robustness (MLR), Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1. N=12, DF=6, Cond. no.=1.2289, Y-miss=0.]

The results of the second diagnostic tool, the analysis of variance, are summarised in the three tables. Remembering that the upper p-value should be smaller than 0.05 and the lower p-value larger than 0.05, we realise that the former test is a borderline case with respect to Res1, because the upper listed p-value is 0.059. This suggests that the model for Res1 is insignificant, and therefore that Res1 is robust.
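The decision rule described above can be written down in a few lines. In this Python sketch the regression p-value of 0.059 for Res1 is taken from the text, whereas the lack-of-fit p-value is a hypothetical placeholder because the ANOVA tables are not reproduced here; the function name is illustrative.

```python
def anova_verdict(p_regression, p_lack_of_fit, alpha=0.05):
    """Apply the two ANOVA checks: significant regression, no lack of fit."""
    return {
        "model_significant": p_regression < alpha,   # upper p-value
        "no_lack_of_fit": p_lack_of_fit > alpha,     # lower p-value
    }

# 0.059 is quoted in the text; 0.30 is a hypothetical lack-of-fit p-value.
verdict = anova_verdict(p_regression=0.059, p_lack_of_fit=0.30)
print(verdict)
```

Since 0.059 is just above 0.05, the regression for Res1 is classified as not significant, although only by a narrow margin.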

Task 3
The derived models will now be used in a general discussion concerning various outcomes of robustness testing. In this discussion a possible solution to the problem given in Task 3 is presented.
First limiting case: Inside specification/Significant model
The first limiting case is inside specification and significant model. The HPLC application contains one example of this limiting case, the Res1 response. We know from the initial raw data assessment that this response is robust, because all the measured values are inside the specification, that is, above 1.5. Actually, as highlighted in the first figure below, the measured values are all above 1.75. The question of a significant model, however, is more debatable. It is possible to interpret the regression model as a weakly significant regression equation. We will do so in this section for the sake of illustration. The classification of the model as significant is based on a joint assessment of the low, but positive, Q2, seen in the second figure, and the significant linear term of acetonitrile, seen in the third figure. Hence, Res1 may be regarded as a representative of the first limiting case.


[Figure: Plot of Replications for Res1 with experiment number labels. Investigation: HPLC Robustness.]

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1. Investigation: HPLC Robustness (MLR). N=12, DF=6, Cond. no.=1.2289, Y-miss=0.]

[Figure: Scaled & Centered Coefficients for Res1 (Extended), terms pH, Co(ColA), Co(ColB), OS, Ac, Te. N=12, DF=6, R2=0.772, Q2=0.121, R2 Adj.=0.582, RSD=0.0248, Conf. lev.=0.95.]

An interesting consequence of these modelling results is that it appears to be possible to relax the factor tolerances and still maintain a robust system. For instance, the model interpretation reveals that the amount of acetonitrile could be as high as 28%, without compromising the goal of upholding a resolution above 1.5. Furthermore, in robustness testing it may be useful to estimate the response values of the most extreme experiments. The regression coefficient plot shows how to obtain these estimates. We can see that one extreme experimental condition is given by the factor combination: low Ac, high pH, high Te, high OS, and ColB. The other extreme experiment is this pattern reversed. The prediction spreadsheet gives these Res1 predictions and they are both valid with regard to the given specification.
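How the coefficient plot yields the two extreme experiments can be sketched in a few lines. For a linear model in coded units (each factor at -1 or +1), the largest prediction is obtained by setting every factor to the level matching the sign of its coefficient, and the smallest by reversing that pattern. The constant and coefficient values below are HYPOTHETICAL and chosen only to reproduce the sign pattern described in the text; they are not the fitted Res1 values, and the qualitative column factor is left out for simplicity.

```python
def extreme_predictions(constant, coefficients):
    """Smallest and largest prediction of a linear model in coded (-1/+1) units.

    coefficients maps factor name -> scaled & centered coefficient.
    Returns (y_min, y_max, settings_for_min, settings_for_max).
    """
    spread = sum(abs(b) for b in coefficients.values())
    settings_for_max = {
        name: ("high" if b > 0 else "low") for name, b in coefficients.items()
    }
    settings_for_min = {
        name: ("low" if b > 0 else "high") for name, b in coefficients.items()
    }
    return constant - spread, constant + spread, settings_for_min, settings_for_max


# HYPOTHETICAL numbers for illustration only (not taken from the source).
constant = 2.00
coefficients = {"Ac": -0.04, "pH": 0.02, "Te": 0.01, "OS": 0.01}

y_min, y_max, at_min, at_max = extreme_predictions(constant, coefficients)

SPEC_LOWER = 1.5  # Res1 specification from the text: resolution above 1.5
print(round(y_min, 2), round(y_max, 2))  # 1.92 2.08
print(at_max)  # {'Ac': 'low', 'pH': 'high', 'Te': 'high', 'OS': 'high'}
print(y_min > SPEC_LOWER)  # True
```

With real coefficients, the same two predictions are what the prediction spreadsheet reports for the extreme factor combinations.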

Second limiting case: Inside specification/Non-significant model
The second limiting case is inside specification with a non-significant model. This is the ideal outcome of a robustness test. Again, we will use the Res1 response as an illustration. We know that the measured values of this response are all inside specification. In addition, we can interpret the obtained regression model as non-significant. This classification of the model as non-significant is contrary to the classification made in the previous section, but is still reasonable and is made for the purpose of illustrating the second limiting case.

In general, two diagnostic tools emerge as the most appropriate for assessing model significance. The first tool is the R2/Q2 parameters. When these are both near zero, as in the left-hand figure below, we have the ideal case. This means that we are trying to model a system in which there is no relationship between the factors and the response in question. In reality, however, one has to expect small deviations from this outcome. A typical result is that R2 is rather large, in the range 0.5-0.8, while Q2 is low or close to zero. As shown in the middle figure, this is the case for Res1, which points to an insignificant model.
[Figure, left: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for vetific. Investigation: itdoe_roblimcases (MLR). N=11, DF=5, Cond. no.=1.1726, Y-miss=0.]

[Figure, middle: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1. Investigation: HPLC Robustness (MLR). N=12, DF=6, Cond. no.=1.2289, Y-miss=0.]


The second important modelling tool relates to the analysis of variance, and particularly the upper F-test, which is a significance test of the regression model. We can see in the right-hand figure that the Res1 model is weakly insignificant because the p-value (0.059) exceeds 0.05. Hence, we conclude that no useful model is obtainable. When no model is obtainable it is reasonable to regard all the variation in the experiments as variation around the mean. This variation can then be expressed as the mean value ± t-value * standard deviation.

Third limiting case: Outside specification/Significant model
The third limiting case is outside specification with a significant model. This limiting case occurs whenever a significant regression model is acquired and the raw response data themselves do not fulfil the goals of the problem formulation. We will use the second response, k2, of the HPLC data to illustrate this limiting case. In order to accomplish a meaningful illustration, we have to define a specification for k2, for example that k2 should be between 2.7 and 3.3. This kind of specification of a capacity factor is uncommon in the pharmaceutical industry, but is used here for illustration.

We start by assessing the statistical behaviour of the k2 regression model. This behaviour is evident from the left-hand figure below, which indicates the sensitivity of k2 (as well as k1) to small factor changes. In order to understand what is causing this susceptibility to changes in the factors, it is necessary to consult the regression coefficients displayed in the right-hand figure.
[Figure, left: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for k1, k2 and Res1. Investigation: HPLC Robustness (MLR). N=12, DF=6, Cond. no.=1.2289, Y-miss=0.]

[Figure, right: Scaled & Centered Coefficients for k2 (Extended), terms pH, Co(ColA), Co(ColB), OS, Ac, Te. N=12, DF=6, R2=0.989, Q2=0.959, R2 Adj.=0.981, RSD=0.0418, Conf. lev.=0.95.]

We can see that it is mainly acetonitrile, pH and temperature, that affect k2. Using the procedure outlined in connection with the first limiting case, we may understand how to change the factor intervals to accomplish two things: (i) how to get k2 inside specification; and (ii) how to produce a non-significant model (i.e., how to approach the second limiting case). Firstly, it is possible to predict the most extreme experimental values (in the investigated area) of k2. These are the predictions listed on the first two rows in the next figure, and they amount to 2.50 and 3.49. Clearly, we are outside the 2.7-3.3 specification.
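The comparison of the extreme predictions with the specification is simple arithmetic, sketched below with the numbers quoted above (2.50 and 3.49 against 2.7-3.3). The shrink estimate at the end is an added rough guide, not part of the source: it assumes a purely linear model and that all factor ranges are narrowed by the same fraction around the same centre, so that the predicted spread scales proportionally.

```python
SPEC_LOW, SPEC_HIGH = 2.7, 3.3     # illustrative k2 specification from the text
pred_min, pred_max = 2.50, 3.49    # most extreme k2 predictions from the text

inside_spec = SPEC_LOW <= pred_min and pred_max <= SPEC_HIGH

# Assumption: linear model, so the extremes are symmetric around the centre.
centre = (pred_min + pred_max) / 2
half_spread = (pred_max - pred_min) / 2

# Largest symmetric spread around the centre that still fits the specification.
allowed_half_spread = min(centre - SPEC_LOW, SPEC_HIGH - centre)
shrink = allowed_half_spread / half_spread

print(inside_spec)        # False
print(round(centre, 3))   # 2.995
print(round(shrink, 2))   # 0.6
```

Under these assumptions the point estimates would fit the specification if every factor interval were reduced to roughly 60% of its original width; the text instead narrows only the three influential factors.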


In order to move to within this specification, we must adjust the factor ranges of the three influential factors, and this is shown in rows three and four. If we also want the 95% confidence intervals, and not only the point estimates, to be inside specification, somewhat harder demands on the factors are needed. Moreover, to get a non-significant regression model even narrower factor intervals are needed. This is done as follows. The regression coefficient of acetonitrile is 0.33 and its 95% confidence interval is ±0.036. These numbers mean that the coefficient must be decreased by a factor of 10, that is, to around 0.03 or less, in order to make this factor non-influential for k2. Since this coefficient corresponds to the response change when the amount of acetonitrile is increased by 1% (from 26% to 27%), the new high level must be lowered from 27% to 26.1%. A similar reasoning applies to the new low level. Hence, the narrower, more robust, factor tolerances of acetonitrile ought to be between 25.9% and 26.1%. A similar reasoning for temperature indicates that the factor interval should be decreased to one-third of the original size. Appropriate low and high levels thus appear to be 20 °C and 23 °C. Predictions obtained are listed in rows five and six. These new settings must, of course, be verified with a new design.

This concludes our treatment of the third limiting case. The take-home message here is that it is possible to use the modelling results to understand how to reformulate the factor settings so that robustness can be obtained.

Fourth limiting case: Outside specification/Non-significant model
The fourth limiting case is outside specification with a non-significant model. This limiting case may be the result when the derived regression model is poor and there are anomalies in the data. Such anomalies are important to uncover, because their presence will influence the modelling.
An informative graphical tool for identifying whether this limiting case is taking place is the replicate plot. The left-hand figure shows an example in which one strong outlier is present, which will invalidate all possibilities for robustness. The second figure depicts another case where all the replicated centre points have much higher response values than the other runs. This pattern hints at curvature and implies non-robustness. A third common situation, which partly resembles the first case, might take place when one experiment deviates from the rest and also falls outside some predefined robustness limits. This is shown in the last figure.
[Figures: three Plots of Replications for vetific with experiment number labels. Investigation: itdoe_roblimcases. Left: one strong outlier. Middle: replicated centre points (experiments 9-11) well above the other runs. Right: one deviating experiment outside the robustness limits.]

Evidently, there can be several underlying explanations for this limiting case, and we have shown just a few. Therefore, we consider this limiting case the most complex one.

In summary, we have described four limiting cases of robustness testing, and it is important to realise that robustness testing results are not statically locked to these four extreme outcomes. In principle, there is a gradual transition from one limiting case to another, and hence an infinite number of outcomes are conceivable.
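As a recap, the four limiting cases reduce to two yes/no questions, and the acetonitrile calculation from the third limiting case reduces to dividing the factor half-range by the required shrink factor. The sketch below shows both. The function names are illustrative, and the shrink factor of 10 is the rounded value used in the text (0.33 against a confidence interval of ±0.036).

```python
def classify_limiting_case(inside_spec, model_significant):
    """Map the two robustness-test verdicts to one of the four limiting cases."""
    if inside_spec and model_significant:
        return "first: inside specification / significant model"
    if inside_spec and not model_significant:
        return "second: inside specification / non-significant model (ideal)"
    if not inside_spec and model_significant:
        return "third: outside specification / significant model"
    return "fourth: outside specification / non-significant model"


def narrowed_range(centre, half_range, shrink_factor):
    """Shrink a symmetric factor interval around its centre."""
    new_half_range = half_range / shrink_factor
    return centre - new_half_range, centre + new_half_range


# Res1 read as non-significant (p = 0.059) and inside the > 1.5 specification.
print(classify_limiting_case(inside_spec=True, model_significant=False))

# k2 with the illustrative 2.7-3.3 specification: third limiting case.
print(classify_limiting_case(inside_spec=False, model_significant=True))

# Acetonitrile: centre 26%, original half-range 1%, coefficient reduced 10-fold.
low, high = narrowed_range(centre=26.0, half_range=1.0, shrink_factor=10.0)
print(round(low, 1), round(high, 1))  # 25.9 26.1
```

Real outcomes usually fall between these cases, so the classification is a starting point for discussion and not a verdict in itself.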

Conclusions
Evaluation of the data demonstrated that the response Res1 was robust because it was possible to maintain a resolution above 1.5 for all 12 experiments.


DOE-Exercise CakeTaguchi (Inner/Outer arrays)


Finding an optimal cake mix composition insensitive to time and temperature fluctuations

Background
This is an industrial pilot plant investigation aimed at designing a cake mix giving tasty products.

Objective
The final goal was to design a cake mix which would produce a good cake even when a customer does not rigidly follow the baking instructions. To explore whether this was feasible, the factors Flour, Shortening, and Egg Powder were used as design factors and varied in a cubic inner array. They were varied between 200 and 400 g (Flour), 50 and 100 g (Shortening), and 50 and 100 g (Egg Powder). In addition, two noise factors were incorporated in the experimental design as a square outer array. These factors were baking Temperature, varied between 175 and 225 °C, and Time spent in the oven, varied between 30 and 50 minutes.

Data
The investigators made 55 experiments. The inner array is a two-level full factorial design in 11 (8+3) runs, and the outer array a two-level full factorial design in 5 (4+1) runs, resulting in 11*5 = 55 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan is given in Figure 1.
[Figure 1 (schematic): a cube spanned by Flour (200-400), Shortening (50-100) and Eggpowder (50-100), with a Temp (175-225) x Time (30-50) square drawn at each design point.]

Figure 1: The arrangement of the factors as inner and outer arrays. This arrangement was introduced by the Japanese engineer Genichi Taguchi.
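The crossing of the two arrays can be sketched as follows. The helper name is illustrative, and the row order produced here is not claimed to match the MODDE worksheet order; only the set of 55 factor combinations is the same.

```python
from itertools import product


def two_level_full_factorial(levels, n_centre):
    """Corner runs of a two-level full factorial followed by centre points.

    levels maps factor name -> (low, high).
    """
    names = list(levels)
    corners = [
        dict(zip(names, combo))
        for combo in product(*(levels[name] for name in names))
    ]
    centre = {name: (levels[name][0] + levels[name][1]) / 2 for name in names}
    return corners + [dict(centre) for _ in range(n_centre)]


# Inner array: design factors, 8 corners + 3 centre points = 11 runs
inner = two_level_full_factorial(
    {"Flour": (200, 400), "Shortening": (50, 100), "Eggpowder": (50, 100)},
    n_centre=3,
)
# Outer array: noise factors, 4 corners + 1 centre point = 5 runs
outer = two_level_full_factorial({"Temp": (175, 225), "Time": (30, 50)}, n_centre=1)

# Every inner-array recipe is baked at every outer-array condition.
runs = [{**recipe, **condition} for condition in outer for recipe in inner]

print(len(inner), len(outer), len(runs))  # 11 5 55
print(runs[-1])
```

The last row printed is a centre-point recipe (300, 75, 75) baked at the centre of the outer array (200, 40), which is the nominal baking condition used in the final recommendation.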


Organisation of data for part I (MODDE worksheet should have 11 runs):
The classical approach for analysing DOE data organised in inner and outer arrays is to form, for each point in the inner array (here: Cake Mix factors), the average response value across all points in the outer array (here: Time & Temperature). This gives two responses, the average taste for each point in the inner array, and the standard deviation around this average. Note: with this approach there will be no model terms related to Time and Temperature.
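The two classical responses can be computed for a single inner-array point as sketched below. The Taste values are those listed in the 55-run worksheet of this exercise for the recipe Flour = 400, Shortening = 50, Eggpowder = 100 (runs 6, 17, 28, 39 and 50). Taking log10 of the standard deviation is an assumption based on the plot label LogStD used in the solutions.

```python
import math
import statistics

# Taste at the five outer-array conditions for recipe (400, 50, 100):
# runs 6, 17, 28, 39 and 50 of the 55-run worksheet.
taste = [5.0, 6.0, 5.9, 5.7, 6.9]

mean_taste = statistics.mean(taste)
stdev_taste = statistics.stdev(taste)   # sample standard deviation (n - 1)
log_stdev = math.log10(stdev_taste)     # assumption: LogStD = log10(StDev)

print(round(mean_taste, 2))   # 5.9
print(round(stdev_taste, 2))  # 0.68
print(round(log_stdev, 2))    # -0.17
```

Repeating this for all 11 inner-array points gives the 11-row worksheet with the two responses used in part I.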

Tasks
Task 1
Define a new investigation in MODDE with three factors and two responses. Select Screening and an interaction model. Select a full factorial design with 8 (corners) + 3 (centre-points) runs. Evaluate the raw data. Fit the regression model. Which are the important factors? Are there any non-significant model terms? Are there any outliers? Comment on lack of fit. Which factor combination leads to an optimal taste? Which factors correlate with StDev? How shall the inner array factors be set to minimise the influence of the outer array factors?


Organisation of data for part II (MODDE worksheet should have 55 runs):
A problem with the foregoing analysis approach is that it does not enable a quantitative understanding of the impact of baking Time and Temperature, since these factors were not introduced in the regression model. One way to accomplish this is to re-organise the worksheet so that it contains all 55 experiments and five factors in the model. A consequence of this interaction analysis approach is that the StDev response vanishes. An advantage is that it becomes possible to identify outliers among the individual experiments.

No  Flour  Shortening  Eggpowder  Temp  Time  Taste
 1    200          50         50   175    30    1.1
 2    400          50         50   175    30    3.8
 3    200         100         50   175    30    3.7
 4    400         100         50   175    30    4.5
 5    200          50        100   175    30    4.2
 6    400          50        100   175    30    5
 7    200         100        100   175    30    3.1
 8    400         100        100   175    30    3.9
 9    300          75         75   175    30    3.5
10    300          75         75   175    30    3.4
11    300          75         75   175    30    3.4
12    200          50         50   225    30    5.7
13    400          50         50   225    30    4.9
14    200         100         50   225    30    5.1
15    400         100         50   225    30    6.4
16    200          50        100   225    30    6.8
17    400          50        100   225    30    6
18    200         100        100   225    30    6.3
19    400         100        100   225    30    5.5
20    300          75         75   225    30    5.15
21    300          75         75   225    30    5.3
22    300          75         75   225    30    5.4
23    200          50         50   175    50    6.4
24    400          50         50   175    50    4.3
25    200         100         50   175    50    6.7
26    400         100         50   175    50    5.8
27    200          50        100   175    50    6.5
28    400          50        100   175    50    5.9
29    200         100        100   175    50    6.4
30    400         100        100   175    50    5
31    300          75         75   175    50    4.3
32    300          75         75   175    50    4.05
33    300          75         75   175    50    4.1
34    200          50         50   225    50    1.3
35    400          50         50   225    50    2.1
36    200         100         50   225    50    2.9
37    400         100         50   225    50    5.2
38    200          50        100   225    50    3.5
39    400          50        100   225    50    5.7
40    200         100        100   225    50    3
41    400         100        100   225    50    5.4
42    300          75         75   225    50    4.1
43    300          75         75   225    50    3.8
44    300          75         75   225    50    3.8
45    200          50         50   200    40    3.1
46    400          50         50   200    40    3.2
47    200         100         50   200    40    5.3
48    400         100         50   200    40    4.1
49    200          50        100   200    40    5.9
50    400          50        100   200    40    6.9
51    200         100        100   200    40    3
52    400         100        100   200    40    4.5
53    300          75         75   200    40    6.6
54    300          75         75   200    40    6.5
55    300          75         75   200    40    6.7

Task 2
Define a new investigation in MODDE with five factors and one response. Select Screening as objective and an interaction model. Create a design with 55 rows. Paste contents from CakeTaguchi.DIF into the MODDE worksheet. Evaluate the raw data. Fit the model. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine baking Time and Temperature. Are they influential? How shall the cake-mix recipe be modified to minimise the influence of baking time and temperature?


Solutions to CakeTaguchi
Task 1
It is instructive to first consider the raw experimental data. The first two plots show the replicate plots of the responses. We see that for both responses the replicate error is small and therefore satisfactory. It is also interesting that the responses are inversely correlated (third figure). We recall that the experimental goal is a factor combination producing a tasty cake and with low variation. Hence, it seems as if experiment number 6 is the most promising one.
[Figure: Plot of Replications for Taste with experiment number labels. Investigation: CakeTaguchi_classical.]

[Figure: Plot of Replications for LogStD with experiment number labels. Investigation: CakeTaguchi_classical.]

[Figure: Raw Data Plot of LogStD against Taste with experiment number labels. Investigation: CakeTaguchi_classical.]

Next, we examine the modelling results obtained when fitting an interaction model to each response. Note that the negative Q2 of StDev indicates model problems. The model for Taste is of higher quality, but we remember from previous modelling attempts (see Exercise CakeMix) that even better results are possible if the two non-significant two-factor interactions are omitted.
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Taste and LogStD. Investigation: CakeTaguchi_classical (MLR). N=11, DF=4, Cond. no.=1.1726, Y-miss=0.]

[Figure: Scaled & Centered Coefficients for Taste, terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg. N=11, DF=4, R2=0.995, Q2=0.874, R2 Adj.=0.988, RSD=0.0768, Conf. lev.=0.95.]

[Figure: Scaled & Centered Coefficients for LogStD, terms Fl, Sh, Egg, Fl*Sh, Fl*Egg, Sh*Egg. N=11, DF=4, R2=0.959, Q2=-0.284, R2 Adj.=0.898, RSD=0.0540, Conf. lev.=0.95.]

The results from fitting a refined model to each response are seen below. The model for StDev has improved a lot as a result of model pruning. Two interesting observations can now be made. The first is related to the Sh*Egg interaction, which is much smaller for StDev than for Taste. The second observation concerns the Fl main effect, which shows that Flour is the factor causing most spread around the average Taste. Hence, this is a factor to adjust in order to achieve robustness. The models that we have derived will now be used to accomplish the experimental goal.


[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Taste and LogStD, refined models. Investigation: CakeTaguchi_classical (MLR). N=11, DF=6, Cond. no.=1.1726, Y-miss=0.]

[Figure: Scaled & Centered Coefficients for Taste, terms Fl, Sh, Egg, Sh*Egg. N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974, Conf. lev.=0.95.]

[Figure: Scaled & Centered Coefficients for LogStD, terms Fl, Sh, Egg, Sh*Egg. N=11, DF=6, R2=0.939, Q2=0.677, R2 Adj.=0.899, RSD=0.0538, Conf. lev.=0.95.]

One way to understand the impact of the surviving two-factor interaction is to make interaction plots. Evidently, the impact of this model term is greater for Taste than for StDev. This is inferred from the fact that the two lines cross each other in the plot related to Taste, but do not cross in the other interaction plot. Both plots indicate that low level of Shortening and high level of EggPowder is favourable for high Taste and low StDev.
[Figure: Interaction Plot for Sh*Egg, response Taste, plotted against Shortening (50-100) with lines for Egg (low) and Egg (high). N=11, DF=6, R2=0.988, Q2=0.937, R2 Adj.=0.980, RSD=0.0974.]

[Figure: Interaction Plot for Sh*Egg, response LogStD, plotted against Shortening (50-100) with lines for Egg (low) and Egg (high). N=11, DF=6, R2=0.939, Q2=0.677, R2 Adj.=0.899, RSD=0.0538.]

An alternative procedure for understanding the modelled system is to make the response contour plots shown below. These contours were created by setting Flour to its high level, as this was found favourable in the modelling. The two contour plots convey an unambiguous message. The best cake mix conditions are found in the upper left-hand corner, where the highest taste and, at the same time, the lowest standard deviation are predicted. This location corresponds to the factor settings Flour = 400, Shortening = 50, and EggPowder = 100. At this factor combination, Taste is predicted at 5.84 ± 0.18, and StDev at 0.69 with a 95% confidence interval from 0.55 to 0.87. Bearing in mind that the highest registered average value of Taste is 5.9, and the lowest value of StDev is 0.67, these predictions appear reasonable.
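The asymmetric interval quoted for StDev (0.55 to 0.87 around 0.69) is what one would expect if the response was modelled on a logarithmic scale and the interval was back-transformed. The sketch below checks this with the quoted numbers. That the model was fitted to log10(StDev) is an assumption based on the plot label LogStD.

```python
import math

predicted_sd = 0.69            # back-transformed prediction from the text
ci_low, ci_high = 0.55, 0.87   # quoted 95% confidence limits

# On the original scale the interval is clearly asymmetric.
print(round(predicted_sd - ci_low, 2), round(ci_high - predicted_sd, 2))  # 0.14 0.18

# On the log10 scale the two gaps are nearly equal.
log_pred = math.log10(predicted_sd)
lower_gap = log_pred - math.log10(ci_low)
upper_gap = math.log10(ci_high) - log_pred
print(round(lower_gap, 2), round(upper_gap, 2))  # 0.1 0.1

# Back-transforming a symmetric log-scale interval reproduces the quoted limits.
half_width = (math.log10(ci_high) - math.log10(ci_low)) / 2
back_low = 10 ** (log_pred - half_width)
back_high = 10 ** (log_pred + half_width)
print(round(back_low, 2), round(back_high, 2))  # 0.55 0.87
```

The small residual difference between the two log-scale gaps is within the rounding of the quoted values.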


Task 2
One drawback of the classical data analytical approach is that it does not allow the user to identify which noise factors affect the variability of the responses. For the Taguchi method to be really successful, one needs to be able to estimate the impact of the noise factors and of possible interactions between the design and the noise factors. By definition, the success of the Taguchi approach critically depends on the existence of such noise-design factor interactions. Otherwise, the noise (variability) cannot be reduced by changing some design factors.

Information about noise-design factor interactions can be extracted if both the noise and the design factors are combined in a single design. Then, a regression model can be fitted which contains both types of factors and their interactions. In this form of analysis, the design factor effects on StDev in the classical approach (Task 1) correspond to noise-design factor interactions (Task 2).

We will now unfold the data table so that it comprises 55 rows and proceed with the Taguchi analysis. As usual, we commence the data analysis by evaluating the raw data. The replicate plot suggests that the replicate error is small, and the histogram shows that the response is approximately normally distributed. Hence, we may proceed to the regression analysis phase without further pre-processing of the data.
[Figure: Plot of Replications for Taste with experiment number labels. Investigation: CakeTaguchi_interaction.]

[Figure: Histogram of Taste (bins from 1.00 to 7.40). Investigation: CakeTaguchi_interaction.]

As seen in the summary of fit plot, the regression analysis gave a poor model with R2 = 0.60 and Q2 = 0.18. Such a large gap between R2 and Q2 is undesirable and indicates model inadequacy. The N-plot of residuals in the next figure reveals no clues as to the poor modelling performance. The model also shows lack of fit (negative Model Validity).
[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Taste. Investigation: CakeTaguchi_interaction (MLR). N=55, DF=39, Cond. no.=1.3110, Y-miss=0.]

[Figure: Normal probability plot of deleted studentized residuals for Taste with experiment number labels. N=55, DF=39, R2=0.605, Q2=0.185, R2 Adj.=0.453, RSD=1.0545.]

However, the regression coefficient plot does reveal two plausible causes. Firstly, the model contains many irrelevant two-factor interactions. Secondly, it is surprising to see that the Fl*Te and Fl*Ti two-factor interactions are so weak. Since we observed (in Task 1) the strong impact of Flour on StDev, we would now expect much stronger noise-design factor interactions. In principle, this means that there must be a crucial higher-order term missing from the model, the Fl*Te*Ti three-factor interaction. Consequently, in the model revision, we decided to add this three-factor interaction and remove six unnecessary two-factor interactions.
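The bookkeeping of this model revision can be checked by counting terms. The sketch below enumerates the initial interaction model and the revised model and derives the residual degrees of freedom for N = 55. Which two-factor interactions were kept is inferred from the labels of the revised coefficient plot in the source, so treat that list as an assumption.

```python
from itertools import combinations

N_RUNS = 55
factors = ["Fl", "Sh", "Egg", "Te", "Ti"]

# Initial interaction model: constant + main effects + all two-factor interactions
two_factor = ["*".join(pair) for pair in combinations(factors, 2)]
initial_terms = ["Constant"] + factors + two_factor

# Revised model. Assumption: the kept interactions are the ones labelled in the
# revised coefficient plot (Sh*Egg plus the components of Fl*Te*Ti).
kept_two_factor = ["Sh*Egg", "Fl*Te", "Fl*Ti", "Te*Ti"]
removed = [term for term in two_factor if term not in kept_two_factor]
revised_terms = ["Constant"] + factors + kept_two_factor + ["Fl*Te*Ti"]

print(len(initial_terms), N_RUNS - len(initial_terms))  # 16 39
print(removed)
print(len(revised_terms), N_RUNS - len(revised_terms))  # 11 44
```

The residual degrees of freedom, 39 before and 44 after the revision, match the DF values printed in the plot footers.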
[Figure: Scaled & Centered Coefficients for Taste, terms Fl, Sh, Egg, Te, Ti, Fl*Sh, Fl*Egg, Fl*Te, Fl*Ti, Sh*Egg, Sh*Te, Sh*Ti, Egg*Te, Egg*Ti, Te*Ti. Investigation: CakeTaguchi_interaction (MLR). N=55, DF=39, R2=0.605, Q2=0.185, R2 Adj.=0.453, RSD=1.0545, Conf. lev.=0.95.]

When re-analysing the data, a more stable model with a reasonable R2 = 0.69 and Q2 = 0.57 was the result. An interesting aspect is that the R2 obtained is lower than in the classical analysis approach. This is due to the stabilising effect achieved by forming the average Taste over five trials in the classical analysis approach. Concerning the current model, we are unable to detect significant outliers among the individual experiments. The relevant N-plot of residuals is displayed below.

[Figure: Summary of Fit (R2, Q2, Model Validity, Reproducibility) for Taste, revised model. Investigation: CakeTaguchi_interaction (MLR). N=55, DF=44, Cond. no.=1.3110, Y-miss=0.]

[Figure: Normal probability plot of deleted studentized residuals for Taste with experiment number labels. N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751.]

Having acquired a reasonable model, it is appropriate to consider the regression coefficients, which are displayed below. We can see the significance of the new three-factor interaction. This is in line with the previous finding on the impact of Flour on StDev. Some smaller two-factor interactions, which are components of the three-factor term (i.e. Fl*Te, Fl*Ti and Te*Ti), are kept in the model to make the three-factor interaction more interpretable.
[Figure: Scaled & Centered Coefficients for Taste, revised model, terms Fl, Sh, Egg, Te, Ti, Sh*Egg, Fl*Te, Fl*Ti, Te*Ti, Fl*Te*Ti. Investigation: CakeTaguchi_interaction (MLR). N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751, Conf. lev.=0.95.]

The meaning of the three-factor interaction is most easily understood by constructing an interaction plot. The figure below displays the impact of the three-factor interaction. What should we look for in this kind of plot? The answer is that we want an indication of how to adjust the controllable factor Flour so that the impact of variations in the uncontrollable factors Temperature and Time is minimised. The figure shows that by adjusting Flour to 400 g the spread in Taste due to variations in Temperature and Time is reduced.
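The same tendency is visible in the raw data, without any model. The sketch below compares the range of Taste over the four corner settings of Temperature and Time at Flour = 200 and Flour = 400, for the recipe with Shortening = 50 and Eggpowder = 100 (the settings used in the interaction plot). The values are read from the 55-run worksheet. This is a raw-data illustration only, not the model-based prediction shown in the plot.

```python
# Taste at the four outer-array corners (Temp 175/225, Time 30/50)
# for Shortening = 50 and Eggpowder = 100.
corner_taste = {
    200: [4.2, 6.8, 6.5, 3.5],  # runs 5, 16, 27, 38
    400: [5.0, 6.0, 5.9, 5.7],  # runs 6, 17, 28, 39
}

# Range of Taste caused by the noise factors at each Flour level.
spread = {
    flour: round(max(values) - min(values), 1)
    for flour, values in corner_taste.items()
}

print(spread)  # {200: 3.3, 400: 1.0}
```

At the high Flour level the observed spread across baking conditions is roughly a third of that at the low level, in line with the conclusion drawn from the interaction plot.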

[Figure: Interaction Plot for Fl*Te*Ti, response Taste, plotted against Flour (200-400) with one line for each combination of Te (low/high) and Ti (low/high). Shortening = 50, Eggpowder = 100. Investigation: CakeTaguchi_interaction (MLR). N=55, DF=44, R2=0.693, Q2=0.571, R2 Adj.=0.623, RSD=0.8751.]

Furthermore, in solving the problem, we must not forget the significance of the strong Sh*Egg two-factor interaction. We know from the initial analysis that the combination of low Shortening and high EggPowder produces the best cakes. With these considerations in mind we draw the response contour triplet shown in the figure below. Because these contours are relatively flat, especially when Flour = 400, we can conclude that the system is robust. Hence, when industrially producing a cake mix with the composition Flour 400 g, Shortening 50 g, and EggPowder 100 g, together with a baking recommendation of 200 °C for 40 min, sufficient robustness towards consumer misuse ought to be the result.

Conclusions
This example illustrates two principal approaches to the analysis of Taguchi-designed data. In the analysis, it was found that an important three-factor interaction existed between Flour, Time and Temperature (Fl*Ti*Te). By interpreting this term it was concluded that the impact of Time and Temperature on variation in Taste was minimised by adjusting Flour from 300 g (initial set-point) to 400 g (new product recipe).


DOE-Exercise LoafVolume (Inner/Outer arrays)


Investigation of mixture and process factors affecting the volume of loaves

Background
Many factors at a bakery can affect the quality of loaves including factors related to the recipe of the dough and those related to the baking conditions. Naes et al. [Chemometrics and Intelligent Laboratory Systems 41 (1998) 221-235.] carried out an extensive project in which the influence of five factors on the volume of loaves was studied. Three varieties of wheat flour were used, Tjalve, Folke and HardRS, and two factors were related to baking conditions, i.e., mixing time and proofing time of the dough. The latter two factors may be inconsistent from one bakery to another. As a quality index of loaf formation, the loaf volume was used.

Objective
The experimental objective of this study was to accomplish a factor combination yielding the target loaf volume of 530 cm3. The idea was to find a combination of the three wheat flours constantly yielding loaves of the target volume, and thus being insensitive to changes in mixing time and proofing time.

Data
The investigators made 90 experiments. The experimental plan contains an inner array made up of the three mixture factors (Tjalve, Folke, HardRS) and an outer array consisting of the two process factors (mixing time and proofing time). The inner array is a Simplex Centroid design in 10 runs, and the outer array a CCF in 9 runs resulting in 10*9 = 90 experimental combinations. We will analyse this data set in two ways. A schematic representation of the experimental plan used is given in Figure 1.
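The run count of the outer array can be verified by enumerating a face-centred central composite design (CCF) in coded units. The sketch below assumes a single centre point, which is what gives the 9 runs stated above. The inner array is represented only by its run count, since the individual mixture points are not listed in this text.

```python
from itertools import product


def ccf_coded(n_factors, n_centre=1):
    """Face-centred central composite design in coded units (-1, 0, +1)."""
    corners = list(product((-1, 1), repeat=n_factors))
    axial = []
    for index in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[index] = level       # face centre: one factor at an extreme
            axial.append(tuple(point))
    centre = [(0,) * n_factors for _ in range(n_centre)]
    return corners + axial + centre


# Outer array: mixing time and proofing time
outer = ccf_coded(n_factors=2)

N_INNER_RUNS = 10  # Simplex Centroid design in 10 runs, as stated in the text

print(len(outer))                 # 9
print(N_INNER_RUNS * len(outer))  # 90
```

Each of the 10 flour blends is thus baked under all 9 combinations of the two process factors.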

Organisation of data for part I (MODDE worksheet should have 10 runs):


The classical approach towards analysing DOE data organised in inner and outer arrays is to form, for each point in the inner array (here: mixture design), the average response value across all points in the outer array (here: CCF process design). This gives two responses, the average loaf volume for each point in the inner array, and the standard deviation around this average. Note that with this approach there will be no model terms related to mixing time and proofing time.


Tasks
Task 1
Define a new investigation in MODDE with three formulation factors and two responses. Select RSM as objective and a quadratic model. Create a mixture design with 10 rows. Paste contents from LoafVol2.DIF into the MODDE worksheet.

Task 2
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Is it possible to get a volume of 530 cm3 and at the same time minimise the spread (standard deviation)? It is desirable to get the standard deviation below 60.

Organisation of data for part II (MODDE worksheet should have 90 runs):


A problem with the foregoing analysis approach is that it does not enable a quantitative understanding of the impact of mixing time and proofing time, since these factors were not introduced in the quadratic model. One way to accomplish this is to re-organise the worksheet so that it contains all 90 experiments and five factors in the model. A consequence of this approach is that the StDev response vanishes. An advantage is that it becomes possible to identify deviating single experiments.

Task 3
Define a new investigation in MODDE with two process factors, three formulation factors, and one response. Select RSM as objective and a quadratic model. Create a D-optimal design with 90 rows. Paste contents from LoafVolume.DIF into the MODDE worksheet.

Task 4
Use PLS as the fit method. Which are the important factors? Are the residuals approximately normally distributed? Comment on lack of fit. Investigate the model coefficients and particularly examine mixing time and proofing time. Are they influential?


Figure 1: Overview of inner and outer factor arrangement of LoafVolume application.


Solutions to LoafVolume
Task 2
Using the default quadratic model, a strongly significant model for the average loaf volume was obtained. However, the model for StDev was weaker (low Q2 and problems in ANOVA). The residuals are nearly normally distributed for both responses. Note that the ANOVA is not complete, because there are no replicates available. We can observe from the plot of the raw data (StDev is plotted against loaf volume) that the two responses are strongly correlated (correlation coefficient = 0.90). This means that it will be difficult to get a high value of volume and a low value of the standard deviation.
[Figure: Loafvol2 (PLS, comp.=2), Summary of Fit: R2 and Q2 for loafvolume and stdev. N=10, DF=4, Cond. no.=6.8608, Y-miss=0.]

[Figure: Loafvol2, Raw Data Plot of stdev versus loafvolume, with experiment number labels.]

[Figure: Loafvol2 (PLS, comp.=2), normal probability plot of standardized residuals for loafvolume. N=10, DF=4, R2=0.953, Q2=0.782, R2 Adj.=0.894, RSD=10.4747.]

[Figure: Loafvol2 (PLS, comp.=2), normal probability plot of standardized residuals for stdev. N=10, DF=4, R2=0.795, Q2=0.281, R2 Adj.=0.539, RSD=8.5309.]

The coefficient plot of the model for loaf volume indicates that both Folke and HardRS affect the volume, whereas Tjalve does not have the same strong influence. The same two factors also affect the StDev response, although these coefficients are not statistically significant according to their 95% confidence intervals. One efficient way of understanding the impact of these models is to make mixture contour plots. The solid arrow indicates where the best compromise is found: the mixture 0.25/0.11/0.64 (Tjalve/Folke/HardRS), where loaf volume is estimated at 530 cm3 and StDev is as low as possible. To get the prediction uncertainty for this point we may use the prediction spreadsheet in MODDE. It appears possible to suppress the standard deviation below 70, but not below 60. As a consequence, the conclusion of the classical analysis approach is that the mixture of wheat flours used for loaf baking cannot be made sufficiently insensitive towards changes in mixing and proofing times between different bakeries.
[Figure: Loafvol2 (PLS, comp.=2), Scaled & Centered Coefficients for loafvolume (cm3). Terms: Tj, Fo, Ha, Tj*Tj, Fo*Fo, Ha*Ha, Tj*Fo, Tj*Ha, Fo*Ha. N=10, DF=4, R2=0.953, Q2=0.782, R2 Adj.=0.894, RSD=10.4747, Conf. lev.=0.95.]

[Figure: Loafvol2 (PLS, comp.=2), Scaled & Centered Coefficients for stdev (cm3). Same terms. N=10, DF=4, R2=0.795, Q2=0.281, R2 Adj.=0.539, RSD=8.5309, Conf. lev.=0.95.]

Task 4
The PLS modelling resulted in a strong model for loaf volume. The R2 and Q2 values of this model are slightly lower than the corresponding values for the previous model regarding the average loaf volume, but the model is very good. The ANOVA table and the N-plot of residuals also suggest that the acquired model is good. In addition, the two PLS score plots reveal the strong correlation among the five factors and the response. When looking at the regression coefficients we realise the strong impact of proofing time (1st bar in coefficient plot) on loaf volume. Generally, with longer proofing time larger loaves are produced. This sensitivity to proofing time means that baking specifications distributed among the different bakeries ought to contain a recommendation regarding an appropriate proofing time. The time used for mixing the dough is less critical. Unfortunately, because there is no strong interaction between the process factors (proofing time & mixing time) and the mixture factors (three types of wheat flour) it will not be possible to adjust the mixture factors and affect loaf volume and minimise the spread in this property.
[Figure: Loafvolume (PLS, comp.=2), Summary of Fit: R2 and Q2 for loafvolume. N=90, DF=75, Cond. no.=8.2742, Y-miss=0.]

[Figure: Loafvolume (PLS, comp.=2), normal probability plot of standardized residuals for loafvolume, with experiment number labels. N=90, DF=75, R2=0.894, Q2=0.754, R2 Adj.=0.874, RSD=22.6934.]

[Figure: Loafvolume (PLS, comp.=2), Scaled & Centered Coefficients for loafvolume (cm3). Terms: Pr, Mi, Tj, Fo, Ha, Pr*Pr, Mi*Mi, Tj*Tj, Fo*Fo, Ha*Ha, Pr*Mi, Pr*Tj, Pr*Fo, Pr*Ha, Mi*Tj, Mi*Fo, Mi*Ha, Tj*Fo, Tj*Ha, Fo*Ha. Conf. lev.=0.95.]

[Figure: Loafvolume (PLS, comp.=2), Score Scatter t[1] vs u[1], with experiment number labels. N=90, DF=75, Cond. no.=8.2742, Y-miss=0.]

[Figure: Loafvolume (PLS, comp.=2), Score Scatter t[2] vs u[2], with experiment number labels. N=90, DF=75, Cond. no.=8.2742, Y-miss=0.]

MODDE offers another graphical option to overview the modelling outcome, the 4D-mixture contour plot, which is displayed below. In this plot the colour coding has been made consistent across all nine plots. Thus, we can, for example, observe that when proofing time is kept low, it is impossible to manufacture loaves of the desired volume (530 cm3).


It is also possible to make a response contour plot showing how the loaf volume changes as a function of proofing and mixing times, at the identified mixture composition 0.25/0.11/0.64 (Tjalve/Folke/HardRS). The first plot shows how this is accomplished in MODDE and the second plot is the resulting graph. From the lower graph we may conclude that the loaf volume varies when changing proofing and mixing times, that is, the composition of the wheat flour mixture cannot be made such that the resulting loaves become insensitive to changes in the two process factors. To accomplish robustness in this respect much tougher restrictions are needed on the proofing and mixing times.

Conclusions
Loaf volume varies when changing proofing and mixing times. This means that the mixture of the wheat flours cannot be made insensitive to changes in the two process factors. To accomplish robustness in this respect much tougher restrictions are needed on the proofing and mixing times.


DOE-Exercise Model Updating


Using D-optimal design to update an existing model

Background
Complementing an executed experimental design with additional runs is a common need in DOE. For instance, in a screening situation one may use fold-over to add more experiments to the initial fractional factorial design. Additionally, factorial and fractional factorial designs may be upgraded to more elaborate composite designs (CCF or CCC). Design augmentation may also be undertaken after optimization with the goal of transmuting e.g. a quadratic model to a cubic model. A common feature for these design augmentation principles is that the complement runs are appended to improve the modeling results in a general sense. Therefore, the model upgrading is quite unselective, as it applies to model terms originating from all factors varied. However, sometimes such a broad and unselective design augmentation might not provide the optimal solution to a problem. Rather, it might be desirable to select a critically low number of extra experiments, which are tailored to the estimation of a small set of new, well-identified model terms. This can be accomplished through D-optimal design.

Objective
In this example, we are going to work with a screening application concerning laser welding of nickel material in plate heat exchangers. The objective is not so much to deal with the regression analysis, but to focus on how to add extra runs to the original experimental protocol.

Data
This example relates to one step in the process of fabricating a plate heat exchanger, a laser welding step involving the metal nickel. The investigator, Erik Vännman, studied the influence of four factors on the shape and quality of the resulting weld. These factors were Power of laser, Speed of laser, Gas flow at Nozzle of welding equipment, and Gas flow at Root, that is, the underside of the welding equipment. One important response is the width of the weld, which should be in the range 0.7-1.0 mm.


Tasks
Task 1
Define a new project in MODDE consisting of four factors and one response. The design you will need is the 2^(4-1) fractional factorial design (8 + 3 runs). This design supports a linear model in the four factors. Enter the response data and fit the linear model to the data.
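For readers who want to see what such an 8 + 3 run worksheet looks like in coded units, a minimal sketch follows. The generator (the fourth factor set equal to the product of the first three) and the factor order No, Po, Sp, Ro are assumptions made for illustration; MODDE may order the runs and factors differently.

```python
import itertools
import numpy as np

factors = ["No", "Po", "Sp", "Ro"]

# Full 2^3 factorial in the first three factors (coded -1/+1).
base = np.array(list(itertools.product([-1, 1], repeat=3)))

# Assumed generator: the fourth factor is the product of the first three.
fourth = base.prod(axis=1, keepdims=True)
corners = np.hstack([base, fourth])          # 8 corner runs

center = np.zeros((3, 4), dtype=int)         # 3 center points
design = np.vstack([corners, center])        # 8 + 3 = 11 runs

print("  ".join(factors))
for row in design:
    print("  ".join(f"{v:+d}" for v in row))
```

Each factor is run four times at its low level and four times at its high level, which is what makes the corner part of the design balanced.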

Task 2
Revise the model from Task 1 by estimating also the cross-term Po*Sp. Discuss the problem of including this term. (Hint: Look at Show/Confoundings).


Task 3
Model updating is often used after screening, when it is necessary to unconfound two-factor interactions. We will now outline the procedure for adding a few extra experiments to the laser welding data set.

Step 1: Make a copy of the current investigation and switch to this copy.

Step 2: In the new application, do File/Complement design (this opens a wizard).

Step 3: Select D-optimal design


Step 4: Select the number of additional runs.
Comment: To unconfound two two-factor interactions, 4 extra experiments are appropriate. This implies that a balanced number of additional experiments is added.

Step 5: Edit the model and add the interesting term(s).


Step 6: Select the number of additional center-points and name the new investigation.
Comment: If we do not want to include any center-points in the design supplement, the number of center-points should be set to zero. This is appropriate if the time span between the first 11 experiments and the new ones is short. Conversely, if considerable time has elapsed between the initial and the new experiments, it is recommended to add one or two center-points to test that the system is stable over time.

Step 7: Select Screening and 15 + 2 runs as lead numbers.


Step 8: Generate D-optimal designs with 15 runs (here: five repetitions).

Step 9: Evaluate the resulting designs. In this case all five alternatives are identical.


Step 10: Generate the selected design

Design tailor-made to resolve Po*Sp and No*Ro !!!

Your task is now the following: Use the approach outlined above and propose an updated experimental design, which is able to resolve Po*Sp and No*Ro from one another. How many extra runs do you think are necessary? Experiment by selecting different numbers of runs and repetitions. Use the condition number and the G-efficiency to identify a suitable design! Also remember that many D-optimal proposals may exist with similar performance measures. It may be necessary to plot the configuration of a set of design candidates to identify the preferred design version. Note: Our solutions to this task display designs different from the one presented above.


Solutions to MODEL UPDATING


Task 1
The geometry of the fractional factorial design selected is shown below. It is a balanced design, which means that all factors are investigated at both levels of the other factors.

As shown by the summary of fit plot below, the linear regression model is not reliable because of the large gap between R2 and Q2. We must then try to identify the cause of the low model quality. However, neither the analysis of variance nor the N-plot of residuals highlights any apparent reason for the model insufficiency. The regression coefficient plot shows that the factors Power of laser and Speed of laser dominate the model. Something to test in order to improve the model is to estimate the cross-term between these two factors. This is dealt with in Task 2.
[Figure: Updating (MLR), Summary of Fit for Width: R2, Q2, Model Validity and Reproducibility. N=11, DF=6, Cond. no.=1.1726, Y-miss=0.]
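To make the gap between R2 and Q2 concrete, the sketch below computes both statistics for an ordinary least-squares fit. It assumes that Q2 is obtained from leave-one-out cross-validation (MODDE's own cross-validation scheme may differ), and the response values are simulated with a built-in Po*Sp interaction; they are not the laser welding data.

```python
import itertools
import numpy as np

def r2_q2(X, y):
    """R2 and leave-one-out Q2 for a least-squares fit (X includes the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_total = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum(resid ** 2) / ss_total
    # Leave-one-out prediction residuals via the diagonal of the hat matrix.
    hat = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    press = np.sum((resid / (1 - hat)) ** 2)
    return r2, 1 - press / ss_total

# Hypothetical 8 + 3 run design in four coded factors (not the course data).
base = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
corners = np.hstack([base, base.prod(axis=1, keepdims=True)])
design = np.vstack([corners, np.zeros((3, 4))])
no, po, sp, ro = design.T

# Made-up response with an active Po*Sp interaction plus noise.
rng = np.random.default_rng(1)
y = 1.0 + 0.20 * po - 0.25 * sp + 0.15 * po * sp + rng.normal(0, 0.05, len(design))

ones = np.ones(len(design))
X_linear = np.column_stack([ones, no, po, sp, ro])
X_cross = np.column_stack([X_linear, po * sp])

for label, X in [("linear", X_linear), ("linear + Po*Sp", X_cross)]:
    r2, q2 = r2_q2(X, y)
    print(f"{label:15s} R2 = {r2:.3f}  Q2 = {q2:.3f}")
```

Because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, Q2 computed this way can never exceed R2; a large gap signals that the model fits the data at hand better than it predicts new runs.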

[Figure: Updating (MLR), normal probability plot of deleted studentized residuals for Width, with experiment number labels. N=11, DF=6, R2=0.816, Q2=-0.068, R2 Adj.=0.693, RSD=0.1732.]

[Figure: Updating (MLR), Scaled & Centered Coefficients for Width (mm). Terms: No, Po, Sp, Ro. N=11, DF=6, R2=0.816, Q2=-0.068, R2 Adj.=0.693, RSD=0.1732, Conf. lev.=0.95.]

Task 2
As seen below, the introduction of the Po*Sp cross-term has a profound impact on the model quality. The regression coefficient plot shows that this term is almost as large as the main effect of Power of laser. Moreover, the model error has been lowered.
[Figure: Updating (MLR), Summary of Fit for Width: R2, Q2, Model Validity and Reproducibility. N=11, DF=5, Cond. no.=1.1726, Y-miss=0.]

[Figure: Updating (MLR), normal probability plot of deleted studentized residuals for Width, with experiment number labels. N=11, DF=5, R2=0.962, Q2=0.610, R2 Adj.=0.925, RSD=0.0858.]

[Figure: Updating (MLR), Scaled & Centered Coefficients for Width (mm). Terms: No, Po, Sp, Ro, Po*Sp. N=11, DF=5, R2=0.962, Q2=0.610, R2 Adj.=0.925, RSD=0.0858, Conf. lev.=0.95.]

However, because of the moderate resolution (IV) of the design used, the Po*Sp two-factor interaction is confounded with another two-factor interaction, namely No*Ro (see Correlation Matrix below). Therefore, the coefficient labeled Po*Sp (above) reflects the sum of contributions from the terms Po*Sp and No*Ro (plus a few higher-order interactions which are assumed negligible). The Confoundings list below overviews the confounding pattern of the 2^(4-1) fractional factorial design. The only way to resolve Po*Sp and No*Ro from one another is to conduct more experiments. This is discussed in Task 3.
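The confounding pattern can be checked numerically by comparing interaction columns of the coded design. The sketch below assumes the generator Ro = No*Po*Sp (an assumption; the text does not state the generator), under which every two-factor interaction column coincides with that of the complementary factor pair.

```python
import itertools
import numpy as np

factors = ["No", "Po", "Sp", "Ro"]
base = np.array(list(itertools.product([-1, 1], repeat=3)))
# Assumed generator: Ro = No*Po*Sp.
design = np.hstack([base, base.prod(axis=1, keepdims=True)])

# Column of every two-factor interaction.
interactions = {
    f"{factors[i]}*{factors[j]}": design[:, i] * design[:, j]
    for i, j in itertools.combinations(range(4), 2)
}

# Two interactions are confounded when their columns are identical.
aliases = [
    (a, b)
    for a, b in itertools.combinations(interactions, 2)
    if np.array_equal(interactions[a], interactions[b])
]
for a, b in aliases:
    print(f"{a} is confounded with {b}")
```

Since the two columns are identical in all eight corner runs, no regression method can separate the two effects from these data alone.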


Task 3
The first aspect to consider at this stage is how many extra runs are needed? In principle, only two extra experiments are needed to resolve the Po*Sp and No*Ro two-factor interactions. In practice, however, four additional runs might offer a more stable solution. We will start by adding 2 experiments. This means that in the overview list of the D-optimal designs we should focus on the designs with 13 runs. Below, we see two different proposals displaying identical condition number and G-efficiency. We can see that for both designs variation has been induced in the factors No and Ro. Also note that alternative arrangements of the added experiments exist.
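The idea of adding two runs can be illustrated with a small exhaustive search. This is a sketch under stated assumptions, not MODDE's algorithm: the original design is taken to have the generator Ro = No*Po*Sp, the candidate set is the 16 corners of the full factorial, and the pair of extra runs is chosen to maximise det(X'X) for a model containing the four main effects, Po*Sp and No*Ro. The condition number printed here is that of the plain coded model matrix and need not match the value MODDE reports.

```python
import itertools
import numpy as np

def model_matrix(runs):
    """Columns: constant, No, Po, Sp, Ro, Po*Sp, No*Ro."""
    runs = np.asarray(runs, dtype=float)
    no, po, sp, ro = runs.T
    return np.column_stack([np.ones(len(runs)), no, po, sp, ro, po * sp, no * ro])

# Original 2^(4-1) design (assumed generator Ro = No*Po*Sp) plus 3 center points.
base = np.array(list(itertools.product([-1, 1], repeat=3)))
original = np.vstack([np.hstack([base, base.prod(axis=1, keepdims=True)]),
                      np.zeros((3, 4))])

candidates = list(itertools.product([-1, 1], repeat=4))   # all 16 corners

def d_criterion(extra):
    """det(X'X) of the original design augmented with the extra runs."""
    X = model_matrix(np.vstack([original, extra]))
    return np.linalg.det(X.T @ X)

# Exhaustive search over all pairs of candidate runs (repeats allowed).
best = max(itertools.combinations_with_replacement(candidates, 2),
           key=lambda pair: d_criterion(np.array(pair)))

X_old = model_matrix(original)
X_new = model_matrix(np.vstack([original, np.array(best)]))
print("added runs (No, Po, Sp, Ro):", best)
print("det(X'X) before:", round(np.linalg.det(X_old.T @ X_old), 6))
print("det(X'X) after: ", round(d_criterion(np.array(best)), 1))
print("condition number after:", round(np.linalg.cond(X_new), 2))
```

Any useful extra run must come from the half-fraction that was not used originally, because only there do the Po*Sp and No*Ro columns differ. Several pairs typically share the same criterion value, which mirrors the remark that alternative arrangements of the added experiments exist.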


Resource permitting, the addition of four extra experiments will provide even better resolution between Po*Sp and No*Ro. Below, we show the outcome of a design proposal where four extra runs and two extra center-points have been appended to the original data set. Remember that many other alternatives exist with identical performance measures. Quite a few of these are not balanced with regard to the four corner runs, meaning that the low and high levels of each factor are not explored using the same number of runs. A 2 + 2 distribution is preferable to a 1 + 3 distribution.


Conclusions
In the first instance, the researcher conducted a 2^(4-1) fractional factorial design with three center-points, that is, eleven experiments. In the analysis it was found that one two-factor interaction, the one between Po and Sp, was influential. However, because of the moderate resolution of this design, this two-factor interaction is confounded with another two-factor interaction, namely No*Ro. One way out of this problem is to complement the existing design with more experiments. One possibility is the fold-over design, which enables resolution of Po*Sp from No*Ro, as well as resolution of the remaining four two-factor interactions. The disadvantage of making the fold-over is the large number of extra experiments: eleven additional runs are necessary. An alternative approach in this case, less costly in terms of experiments, is to make a D-optimal design updating, adding only a limited number of extra runs. It was shown how either two or four extra experiments, plus an optional number of center-points, could be added to the starting design to achieve this objective. The importance of design balancing was also addressed.


DOE-Exercise Blocking (RSM)


Investigating block effects in a CCC-design

Background
In a chemical experiment the influence of two factors (time and temperature) on the yield of the main product was investigated. Initially a 2^2 factorial design augmented with two centre-points was performed. Preliminary examination of the results indicated that the experimental design was correctly positioned in the experimental space and hence there was no need to adjust the low and high settings of the factors. However, there was some indication of a non-linear relationship between the factors and the response and so the design was upgraded to a CCC design by adding the star points and two additional centre-points. This design comprised 12 experiments: 4 corner points, 4 star points and 2+2 centre-points. This data set is taken from Box GEP, Hunter WG, Hunter JS, Statistics for experimenters, John Wiley & Sons, 1978, p. 519.

Objective
The objectives of this example were two-fold: (1) to identify the optimal settings of time and temperature, (2) to investigate whether there was evidence of a shift in the response data between the two series of experiments (i.e. whether there were significant block effects or not).

Data


Tasks
Task 1
Define a new investigation in MODDE with two factors and one response. Select RSM, the CCC design using two blocks and two center points in each block. Make sure that you tick the Block interactions check box.
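In coded units the blocked design can be sketched as follows. The star distance of sqrt(2) is an assumption (the usual choice for a two-factor CCC); the actual worksheet uses real factor settings, with the star points rounded as noted in the task text.

```python
import numpy as np

alpha = np.sqrt(2.0)  # assumed star distance for a two-factor CCC

# Block 1: the factorial part with its two centre-points.
block1 = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                   [0, 0], [0, 0]], dtype=float)
# Block 2: the star points with two further centre-points.
block2 = np.array([[-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha],
                   [0, 0], [0, 0]])

design = np.vstack([block1, block2])
block = np.array(["B1"] * len(block1) + ["B2"] * len(block2))

for b, (time, temp) in zip(block, design):
    print(f"{b}  time = {time:+.3f}  temp = {temp:+.3f}")
```

With this star distance each block has the same number of runs and the same sum of squares in each factor, so a shift between the blocks does not bias the estimated linear and quadratic effects.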

Enter the response data. Note that the values of the star points were rounded to the nearest integer by the experimenters, so amend the factor settings accordingly. Evaluate the raw data. Fit the regression model. Which factors affect yield? Are there any non-significant model terms? Which factor combination optimises yield? Are the block effects significant?


Solutions to Blocking
Task 1
We start by evaluating the raw data. First we examine the replicate plot which shows that the replicate error is low, which is good. The histogram shows that the response is approximately normally distributed indicating that we have good data to work with.
[Figure: Blocking_RSM, Plot of Replications for Yield with experiment number labels (blocks B1 and B2).]

[Figure: Blocking_RSM, Histogram of Yield.]

A strong model was obtained with R2=0.98, Q2=0.95, Model Validity=0.99 and Reproducibility=0.88. The regression coefficients indicate that a low value of time is best but the linear effect of temperature is not significant. However, the quadratic terms of both time and temperature are significant. The block factor and its interactions with time and temperature are not significant. However, the deletion of any of these model terms causes the model quality to deteriorate and so they are kept in the model. There is some evidence that slightly lower yields were obtained in the second set of runs.

[Figure: Blocking_RSM (MLR), Summary of Fit for Yield: R2, Q2, Model Validity and Reproducibility. N=12, DF=3, Cond. no.=3.1808, Y-miss=0.]

[Figure: Blocking_RSM (MLR), Scaled & Centered Coefficients for Yield (Extended), in g. Terms: $Blo(B1), $Blo(B2), Tim*$Blo(B1), Tim*$Blo(B2), Temp*$Blo(B1), Temp*$Blo(B2), Tim, Tim*Tim, Temp, Temp*Temp, Tim*Temp. N=12, DF=3, R2=0.978, Q2=0.949, R2 Adj.=0.919, RSD=1.2524, Conf. lev.=0.95.]

The two response surface plots below show that higher yields were obtained in the first set of runs (the factorial part of the design). The average difference between the two blocks is 1.76 g.

Conclusions
To maximise yield we should use Time = 76 min and Temperature = 151 °C. There is a mild shift in yields between the two blocks of experiments.


DOE-Exercise Mixture Region Training


By-hand training to understand geometries of various mixture regions

Example Lower
Data:
Binder:    0.1-1.0
Oxidizer:  0.5-1.0
Fuel:      0.1-1.0

Task 1:
Draw the experimental region by-hand.

Task 2:
Use MODDE to calculate the implied upper bounds.


Example Upper
Data:
Binder:    0.0-0.6
Oxidizer:  0.0-0.7
Fuel:      0.0-0.4

Task 3:
Draw the experimental region by-hand.

Task 4:
Use MODDE to calculate the implied lower bounds.


Example Lower and Upper


Data:
Binder:    0.2-0.6
Oxidizer:  0.2-0.6
Fuel:      0.3-0.5

Task 5:
Draw the experimental region by-hand.

Task 6:
Use MODDE to calculate the implied lower and upper bounds.


SOLUTIONS to MIXTURE REGION TRAINING


Task 2:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.

Calculate the implied upper bounds:

R_L = 1 - 0.1 - 0.5 - 0.1 = 0.3
U_i* = L_i + R_L

Factor     Stated bounds   Implied upper bound
Binder     0.1-1.0         0.1 + 0.3 = 0.4
Oxidizer   0.5-1.0         0.5 + 0.3 = 0.8
Fuel       0.1-1.0         0.1 + 0.3 = 0.4

[Figure: ternary diagram (Binder, Oxidizer, Fuel) showing the lines Binder = 0.1, Binder = 0.4, Oxidizer = 0.5, Oxidizer = 0.8, Fuel = 0.1 and Fuel = 0.4.] Dashed lines indicate location of implied upper bounds.

Task 4:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.

Calculate the implied lower bounds:

R_U = 0.6 + 0.7 + 0.4 - 1 = 0.7
L_i* = U_i - R_U

Factor     L_i* to U_i
Binder     -0.1 - 0.6
Oxidizer    0.0 - 0.7
Fuel       -0.3 - 0.4

There are no implied lower bounds.

[Figure: ternary diagram (Binder, Oxidizer, Fuel) showing the lines Binder = 0.6, Oxidizer = 0.7 and Fuel = 0.4.]

Task 6:
To calculate compatible bounds MODDE followed the scheme listed in the right-hand part of the figure.

Check if bounds are consistent:

R_L = 1 - 0.2 - 0.2 - 0.3 = 0.3
R_U = 0.6 + 0.6 + 0.5 - 1 = 0.7

                          x1        x2        x3
Stated bounds             0.2-0.6   0.2-0.6   0.3-0.5
Resulting upper bounds    0.5       0.5       0.5

[Figure: ternary diagram (Binder, Oxidizer, Fuel) showing the lines Binder = 0.2, Binder = 0.5, Oxidizer = 0.2, Oxidizer = 0.5, Fuel = 0.3 and Fuel = 0.5.] Dashed lines provide implied upper bounds.
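The bound calculations in the three solutions follow the same recipe, which can be written as a short Python function. This is a sketch of the arithmetic shown above, not MODDE's code; the function name is ours.

```python
def implied_bounds(lower, upper):
    """Return consistent (lower, upper) bounds for mixture factors that sum to 1."""
    r_l = 1.0 - sum(lower)          # R_L: room left above the lower bounds
    r_u = sum(upper) - 1.0          # R_U: excess of the upper bounds over 1
    # An implied bound replaces a stated bound only when it is tighter.
    new_upper = [min(u, l + r_l) for l, u in zip(lower, upper)]
    new_lower = [max(l, u - r_u) for l, u in zip(lower, upper)]
    return new_lower, new_upper

# Factor order: Binder, Oxidizer, Fuel.
examples = {
    "Lower": ([0.1, 0.5, 0.1], [1.0, 1.0, 1.0]),
    "Upper": ([0.0, 0.0, 0.0], [0.6, 0.7, 0.4]),
    "Lower and Upper": ([0.2, 0.2, 0.3], [0.6, 0.6, 0.5]),
}
for name, (lo, up) in examples.items():
    new_lo, new_up = implied_bounds(lo, up)
    print(name, [round(v, 3) for v in new_lo], [round(v, 3) for v in new_up])
```

For the third example the implied upper bound for Fuel would be 0.3 + 0.3 = 0.6, which is looser than the stated 0.5, so the stated bound is kept.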

DOE-Exercise WAALER (Mixture)


Mixture design for tablet formulation

Background
In tablet manufacturing in the pharmaceutical industry it is practical to make experiments according to a mixture design. Here, three constituents were varied according to a modified simplex centroid mixture design in order to produce tablets. The three constituents were: cellulose, lactose and dicalciumphosphate.

Objective
The objective of the investigation was to find out how the three excipients influenced release of active substance.

Data
Ten tablets were prepared according to a mixture design in the three excipients mentioned. The response measured was the release (in min) of the active ingredient and this value has to be maximized. The data set is taken from P.J. Waaler, Acta Pharm Nord 4: 9-16, 1992.


Tasks
Task 1
Create a new investigation in MODDE and define the three mixture factors and the single response according to the information given above. Select RSM as objective and accept the first choice design (Modified Simplex Centroid), using Design Runs = 9 and Centerpoints = 1. MODDE now creates a Worksheet identical to the one shown on the foregoing page. Enter the response values.

Task 2
Select PLS as fit technique. Fit the model. Questions to address and answer: Which are the significant terms? Are the residuals approximately normally distributed? What about Lack of Fit? Review the fit and interpret the model. Which formulation corresponds to maximized release (Hint: Use the Optimizer)?

Task 3
The experimenters performed three verifying experiments.

x1      x2      x3      release
0.5     0.125   0.375   370
0.333   0       0.667   340
0.667   0       0.333   345

Compute predictions for these formulations to verify the model.


Solutions to WAALER
Task 2
The PLS analysis of the tablet data gave a model with R2 = 0.98 and Q2 = 0.55 (upper left-hand plot). These statistics point to an imperfect model, because R2 substantially exceeds Q2. Unfortunately, the second diagnostic tool (upper right-hand plot), the ANOVA table, is incomplete because the lack of fit test could not be performed. However, a possible reason for the poor modelling is found when looking at the N-plot of the response residuals given in the middle left-hand figure. Experiment number 10 is an outlier and degrades the predictive ability of the model. If this experiment is omitted and the model refitted, Q2 will increase from 0.55 to 0.69. We decided not to remove the outlier, primarily to conform with the modelling procedure of the original literature source. The subsequent three plots show the inner relation for the respective PLS model dimension.
[Figure: Waaler_rsm (PLS, comp.=3), Summary of Fit: R2 and Q2 for release. N=10, DF=4, Cond. no.=7.4174, Y-miss=0.]

[Figure: Waaler_rsm (PLS, comp.=3), normal probability plot of standardized residuals for release, with experiment number labels. N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170.]

[Figures: Waaler_rsm (PLS, comp.=3), Score Scatter plots t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3], with experiment number labels. N=10, DF=4, Cond. no.=7.4174, Y-miss=0.]

Scaled and centered regression coefficients of the computed model are plotted in the left-hand plot below. This coefficient plot shows that in order to maximize the release, the amount of lactose in the recipe should be kept low and the amount of phosphate high. The presence of significant square and interaction terms indicates the existence of quadratic behavior and non-linear blending effects. These effects are more easily understood by means of the trilinear mixture contour plot shown in the right-hand plot below. This latter plot suggests that with the mixture composition 0.32/0/0.68 one may expect a response value above 350. This point should be tested in reality, thus functioning as an experimental verification of the model.
[Figure: Waaler_rsm (PLS, comp.=3), Scaled & Centered Coefficients for release (min). Terms: ce, la, ph, ce*ce, la*la, ph*ph, ce*la, ce*ph, la*ph. N=10, DF=4, R2=0.985, Q2=0.553, R2 Adj.=0.966, RSD=18.7170, Conf. lev.=0.95.]

Task 3
In this application, the optimizer identified only one point, the mixture 0.32/0/0.68, where maximum release rate was predicted at 363 minutes. This point was not tested in the original work, but one close to it was. The experimenters performed three verifying experiments and these results together with model predictions are summarized in the figure below. As seen, the model predicts well except for the mixture 0.5/0.125/0.375. Recall that the observed values (for the first three rows in the figure below) were 370, 340, and 345.

Conclusions
Maximum release is predicted for the combination 0.32 / 0 / 0.68. The experimental verification produced good agreement between measured and predicted response values for two out of three new formulations. The discrepancy between measured and predicted release for the remaining point suggests some information deficiency in the training set. One way to address this problem is to combine the two sets of data and then update the regression model. As a consequence, a new set of prediction samples should be compiled in order to verify the predictive power of this updated model.


DOE-Exercise ROCKET (Mixture)


Optimization of elasticity of a rocket propellant

Background
A manufacturer of a rocket propellant mixed three ingredients together to get the best possible product.

Objective
The objective was to formulate a propellant with elasticity > 2900.

Data
Three ingredients, mixture factors, were varied and one response (elasticity) was measured. The data table is shown below. Design: Modified Simplex Centroid. Model: Quadratic model.


Task 1
Create a new investigation in MODDE according to the information given above. Select RSM as objective, a quadratic model, and generate a modified simplex centroid design with 9 + 1 runs. Enter the response data.

Task 2
Evaluate the raw data. Make a histogram to evaluate the distribution of elasticity, and a replicate plot to explore the replicate error. Are there any anomalies in the raw data?

Task 3
Select PLS as fit method. Relate the predictors to the response. Investigate the relevant score and loading plots. Interpret the model. What can you say about the correlation structure among the factors and responses (Hint: Look at PLS score plots)? Which factors are influential for elasticity? Which formulation should be used to maintain an elasticity above 2900?


Solutions to ROCKET
Task 2
According to the histogram, elasticity is approximately normally distributed, so the response was not transformed. The replicate plot shows that the data contain no replicates, which means that the ANOVA cannot be carried out in full (see below). We have also included a plot of the correlation matrix as a reminder of the correlation among the factors that arises from the overall mixture constraint (sum of all factors = 1.0).
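The correlation caused by the mixture constraint can be illustrated with a few lines of code. This is a sketch on a hypothetical, symmetric 7-run simplex-centroid design, not the Rocket worksheet (whose data table is not reproduced here). For three components that sum to 1 and are varied symmetrically, the pairwise correlation comes out at -0.5.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical design: vertices, edge midpoints and overall centroid.
design = [
    (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5),
    (1 / 3, 1 / 3, 1 / 3),
]
x1, x2, x3 = (list(column) for column in zip(*design))

r12 = pearson(x1, x2)
r13 = pearson(x1, x3)
r23 = pearson(x2, x3)
print(round(r12, 3), round(r13, 3), round(r23, 3))
```

Because raising one fraction forces the others down, mixture factors can never be varied independently, which is one reason PLS is used as the fit method in these exercises.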
Figure (Rocket): Histogram of Elasticity; Plot of Replications for Elasticity with experiment number labels.

Task 3
A two-component PLS model was obtained with R2 = 0.80 and Q2 = 0.25. The gap between R2 and Q2 is large, which is unsatisfactory. The PLS total summary plot shows that the first component explains most of the variance. To investigate the correlation structure, we plotted the t/u scores of the two model components. These indicate a curved correlation structure in the first component, with the second component essentially compensating for this non-linear behavior. Further, the ANOVA table shows that the model is not significant (p = 0.14; p < 0.05 is required for a significant model). The N-plot of residuals shows that experiment number 4 deviates weakly, but since it lies within 4 standard deviations it was kept in the modelling.
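The difference between R2 (goodness of fit) and Q2 (goodness of prediction) can be sketched with a small example. The code below is an illustration under stated assumptions: it fits an ordinary straight line rather than a PLS model, uses leave-one-out cross-validation (MODDE's own cross-validation scheme may differ), and works on hypothetical curved data. Q2 is computed as 1 - PRESS/SS, where PRESS is the sum of squared prediction errors for the left-out runs.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r2_and_q2(x, y):
    """Return (R2, Q2) with Q2 from leave-one-out cross-validation."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    a, b = fit_line(x, y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    press = 0.0
    for i in range(n):
        # Refit without run i, then predict run i.
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2

    return 1 - ss_resid / ss_total, 1 - press / ss_total


# Hypothetical data with strong curvature that a straight line cannot follow.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.6, 3.4, 3.5, 3.1, 1.9]

r2, q2 = r2_and_q2(x, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

For a least-squares fit, Q2 can never exceed R2; a wide gap, as in the Rocket model, signals that the model describes the training runs better than it predicts new ones.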
Figure (Rocket, PLS, 2 components): Summary of Fit (R2 and Q2 for Elasticity); PLS Total Summary (cumulative R2 and Q2 for Comp1 and Comp2); Score Scatter t[1] vs u[1] and t[2] vs u[2] with experiment number labels; normal probability plot of standardized residuals for Elasticity. N = 10, DF = 4, Cond. no. = 7.4174, Y-miss = 0, R2 = 0.801, Q2 = 0.249, R2 Adj. = 0.553, RSD = 160.1071.

The PLS loading plot and the coefficient plot indicate how the various model terms influence the elasticity of the rocket propellant. However, because the model is very weak, it must be interpreted with great care. Some guidance on model refinement may be extracted from the coefficient plot; in this case, however, we have not found it possible to improve the model. What one can do in this kind of situation is to use the trilinear mixture contour plot to get a general appraisal of the response function. We understand from the left-hand mixture region plot that we are investigating a small, though simplex-shaped, mixture domain. We conclude from the right-hand mixture contour plot that it seems possible to accomplish an elasticity above 2900 within the investigated region.
Figure (Rocket, PLS, 2 components): Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for Elasticity (linear, squared and cross-terms of Bin, Oxi and Fue; Conf. lev. = 0.95); mixture region plot and mixture contour plot with axes Binder, Oxidiser and Fuel.

Conclusions
In this case the experimental goal of obtaining an elasticity above 2900 was achievable. Binder and Fuel were the two ingredients with the largest impact on the response. Oxidiser had an almost negligible effect on the response.


DOE-Exercise CORNE59 (Mixture)


Optimising the Taste of Fish Pâté

Background
A manufacturer of fish pâté wanted to produce a quality product irrespective of which species of fish were used. Since the market price of different fish varies considerably, a mixture design was used to locate the best tasting pâté.

Objective
The aim was to produce a pâté with a taste rating above 3.

Data
There were three ingredients and one response (taste). The data table is shown on the next page. Design: Modified Simplex Centroid. Model: Linear.

Tasks
Task 1
Create a new investigation in MODDE according to the information given above. Make sure that the design has 19 rows and then paste the contents of CORNE59.dif into the worksheet.

Task 2
Evaluate the raw data by inspecting the distribution and replicate error of taste using Worksheet/Histogram and Worksheet/Replicate Plot respectively. Are there any anomalies in the raw data?

Task 3
Select PLS as the fit method, using Analysis/Select Fit Method/PLS, and fit the model. Interpret the model by investigating the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plot and Analysis/PLS Plots/Loading Scatter Plot respectively. What can you say about the correlation structure among the three ingredients and taste (hint: look at the PLS score plots)? Check the validity of the model by looking at the ANOVA table and residual plot using Analysis/ANOVA/Anova Table and Analysis//Normal Prob. Plot Residuals respectively. Use the loading and coefficient plots, Analysis/Coefficients/Plot, to investigate which ingredients influence the taste of the pâté. Which recipe gives a taste above 3?


Experimental Data
Experiments 1-10 comprise the original design, and experiments 11-19 are replicates.


Solutions to Corne59
Task 2
The histogram suggests that a transformation, such as log, would be preferable. However, for the sake of this preliminary analysis of the data we will not transform the response. The replicate plot clearly illustrates the small replicate error. The correlation matrix is also shown below in order to illustrate the inherent correlation between the ingredients due to the overall mixture constraint, i.e. sum of ingredients = 1.0.
Figure (Corne59): Histogram of taste (bins from 1.00 to 5.50); Plot of Replications for taste with experiment number labels.

Task 3
A three-component PLS model was obtained with R2 = 0.97, Q2 = 0.90, model validity (MVal) = 0.30, and reproducibility (Rep) = 0.98. The PLS Total Summary plot shows that the first component is by far the most important in terms of variance explained. To investigate the correlation structure, we plotted the t/u scores of the first two model components, which indicate a strong relationship between taste and the three ingredients. The ANOVA table and the N-plot of residuals also indicate an excellent model.
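The two additional bars in the Summary of Fit plot, model validity and reproducibility, both depend on replicated runs. The sketch below shows how such statistics can be computed. The formulas are assumptions based on how these quantities are commonly documented for MODDE (reproducibility = 1 - MS(pure error) / MS(total, corrected); validity = 1 + 0.57647 * log10(p), where p is the lack-of-fit p-value), and the taste values are hypothetical because the Corne59 data table is not reproduced here.

```python
from math import log10


def pure_error(groups):
    """Sum of squares and degrees of freedom within replicate groups."""
    ss, df = 0.0, 0
    for group in groups:
        mean = sum(group) / len(group)
        ss += sum((v - mean) ** 2 for v in group)
        df += len(group) - 1
    return ss, df


def reproducibility(groups):
    """Assumed definition: 1 - MS(pure error) / MS(total SS corrected)."""
    values = [v for group in groups for v in group]
    n = len(values)
    grand_mean = sum(values) / n
    ms_total = sum((v - grand_mean) ** 2 for v in values) / (n - 1)
    ss_pe, df_pe = pure_error(groups)
    return 1 - (ss_pe / df_pe) / ms_total


def model_validity(p_lack_of_fit):
    """Assumed definition: p = 0.05 maps to a validity of about 0.25."""
    return 1 + 0.57647 * log10(p_lack_of_fit)


# Hypothetical taste ratings: each inner list is one recipe, measured
# once or twice. Replicated recipes supply the pure-error estimate.
taste_groups = [[1.9, 2.0], [3.1, 3.0], [4.6, 4.7], [2.4, 2.5], [1.2]]

rep = reproducibility(taste_groups)
print(f"Reproducibility = {rep:.3f}")
print(f"Validity at p = 0.05: {model_validity(0.05):.3f}")
```

Under these assumed definitions, a validity above roughly 0.25 corresponds to a lack-of-fit p-value above 0.05, so the reported MVal = 0.30 would indicate no significant lack of fit.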
Figure (Corne59, PLS, 3 components): Summary of Fit (R2, Q2, Model Validity and Reproducibility for taste); PLS Total Summary (cumulative R2 and Q2 for Comp1 to Comp3); Score Scatter t[1] vs u[1] and t[2] vs u[2] with experiment number labels; normal probability plot of standardized residuals for taste. N = 19, DF = 13, Cond. no. = 6.5072, Y-miss = 0, R2 = 0.971, Q2 = 0.905, R2 Adj. = 0.960, RSD = 0.1964.

The PLS loadings plot and the coefficient plot indicate that all three mixture ingredients affect the taste of the fish pâté. The response contour plot shows that, in order to achieve a taste rating above 3, you need to be in the upper part of the mixture triangle, i.e. high x1 and low x2. There is also clear evidence of non-linear blending.
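Non-linear blending means that the response of a blend differs from the weighted average of the responses of its pure ingredients. The sketch below illustrates this with a Scheffé-type quadratic mixture model, which is a swapped-in textbook form and not the model parameterisation fitted in MODDE; all coefficients are hypothetical and are not estimates from the Corne59 data.

```python
def scheffe_quadratic(x, linear, cross):
    """Evaluate y = sum(b_i * x_i) + sum(b_ij * x_i * x_j) for a mixture x."""
    y = sum(b * xi for b, xi in zip(linear, x))
    for (i, j), b in cross.items():
        y += b * x[i] * x[j]
    return y


# Hypothetical coefficients: pure-ingredient responses and blending terms.
linear = [4.5, 1.5, 2.5]
cross = {(0, 1): -2.0, (0, 2): 1.0, (1, 2): 0.5}

pure_x1 = scheffe_quadratic((1.0, 0.0, 0.0), linear, cross)
pure_x2 = scheffe_quadratic((0.0, 1.0, 0.0), linear, cross)
blend = scheffe_quadratic((0.5, 0.5, 0.0), linear, cross)

# Deviation of the 50/50 blend from the average of the pure ingredients.
excess = blend - (pure_x1 + pure_x2) / 2
print(f"pure x1 = {pure_x1}, pure x2 = {pure_x2}, blend = {blend}")
print(f"blending excess = {excess}")
```

In this form the 50/50 binary blend deviates from linear blending by one quarter of the cross-term coefficient, so a negative coefficient means the two ingredients taste worse together than their average would suggest.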

Figure (Corne59, PLS, 3 components): Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for taste (terms x1, x2, x3, x1*x1, x2*x2, x3*x3, x1*x2, x1*x3, x2*x3; Conf. lev. = 0.95).

Conclusions
There is a strong relationship between taste and the three varied ingredients. To obtain a taste rating above 3, ingredient x1 should be high and ingredient x2 low. This gave the manufacturer a clear strategy for maintaining quality whilst simultaneously reducing cost.


DOE-Exercise BUBBLES (Mixture)


Screening and optimization of bubble formation

Background
Kids like to blow bubbles, but dislike bubbles which burst rapidly. We decided to use mixture design to investigate which factors may affect bubble formation. We browsed the Internet to find a suitable bubble mixture composition to use as a starting reference mixture. This recipe was then modified using mixture design, and bubbles were blown for each mixture composition. The investigator, Lennart Eriksson, carried out these experiments while on parental leave and taking care of his son, little Andreas, 14 months old. This ensures high bubble quality.

Objective
The objective was to understand which factors influence the bubble-making process (Screening), and to see whether an optimal recipe ensuring long-lasting bubbles could be formulated (RSM).

Data
The lifetime in seconds was measured for bubbles of 4-5 cm size. The two process factors were the temperature of the solution (°C) and the settling time of the mixture (h). The four mixture factors were dish-washing liquid 1 (DWL1), dish-washing liquid 2 (DWL2), tap water and glycerol.


Tasks
Task 1
In MODDE, first define the factors, the response and the constraint as outlined above. Select Screening as the objective. The process model should be an interaction model, and the mixture model a linear model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.5 / 0.1. Open BUBB_SCR.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? How should we proceed to improve the result (get longer-lasting bubbles)?

Task 2
Refine the model from the previous task by removing all insignificant terms. Refit and evaluate the updated model. Which factors are most meaningful to optimize? How should we proceed to improve the result (get longer-lasting bubbles)? Use the MODDE Optimizer to get some suggestions for future experiments.

Task 3
We are now going to use the results of the screening phase to construct an appropriate RSM design. This means that we will hold the two process factors, temperature (7 °C) and time (25 h), constant and vary only the four mixture factors. We shall use the mixture composition 0.2 / 0.2 / 0.3 / 0.3 as our new reference mixture. In MODDE, make a copy of the investigation and re-define the factor settings and the DWL-constraint according to the following:

The response should be the lifetime of the bubbles (log transformed). Select RSM as the objective. The mixture model should be a quadratic model. Create a D-optimal design with 24 runs. Edit the reference mixture so that it becomes 0.2 / 0.2 / 0.3 / 0.3. Open BUBB_RSM.XLS and paste in the real data. Select PLS as fit method and compute the model. Review and interpret the model. Which terms are important? Is there any deviating experiment? Is it possible to increase the lifetime of the bubbles even further (beyond the measured 18 min 40 s)? Is it possible to find an optimum within the investigated region? Use the Optimizer to explore the mixture region.


Data Set for BUBBLES - Screening

Data Set for BUBBLES - RSM


Solutions to BUBBLES
Task 1
We can see that the distribution of the response is skewed to the right, so it needs to be log-transformed. The replicate plot shows that the pure error is reasonably low.
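The effect of the log transformation on a right-skewed response can be checked numerically. This is a minimal sketch on hypothetical lifetimes in seconds (the BUBB_SCR worksheet is not reproduced here), using the moment-based sample skewness and a base-10 logarithm, which is assumed to be what the transformed response Lifetime~ represents.

```python
from math import log10


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical bubble lifetimes (seconds): many short, a few very long.
lifetimes = [12, 15, 20, 25, 30, 40, 60, 90, 150, 400]
log_lifetimes = [log10(v) for v in lifetimes]

skew_raw = skewness(lifetimes)
skew_log = skewness(log_lifetimes)
print(f"skewness raw = {skew_raw:.2f}, after log10 = {skew_log:.2f}")
```

A skewness close to zero after transformation suggests that the long right tail has been pulled in, which is what the second histogram is meant to show.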
Figure (Bubb_scr): Histogram of Lifetime (untransformed, bins from 11 to 431); Histogram of Lifetime~ (log-transformed, bins from 1.00 to 2.80); Plot of Replications for Lifetime~ with experiment number labels.

PLS was used to fit a model to the data, yielding R2 = 0.81, Q2 = 0.18, MVal = -0.2 and Rep = 0.93. There are several insignificant cross-terms, which cause the low Q2 and MVal. Remove these terms and refit the model.
Figure (Bubb_scr, PLS, 2 components): Summary of Fit (R2, Q2, Model Validity and Reproducibility for Lifetime~); Scaled & Centered Coefficients for Lifetime~ (terms DW1, DW2, Wa, Gly, Te, Ti and the cross-terms Te*Ti, Te*DW1, Te*DW2, Te*Wa, Te*Gly, Ti*DW1, Ti*DW2, Ti*Wa, Ti*Gly). N = 24, DF = 11, Cond. no. = 2.7203, Y-miss = 0, R2 = 0.812, Q2 = 0.185, R2 Adj. = 0.608, RSD = 0.2476, Conf. lev. = 0.95. Annotation on the coefficient plot: the large confidence intervals are not a bug, but a theory problem.

Task 2
When the model was refitted, a much better result was obtained. The refined model looks good according to R2/Q2, the N-plot of residuals and the observed versus predicted plot. The ANOVA table and the MVal statistic do show lack of fit, but the model is still useful. The model interpretation (with loadings or coefficients) indicates that, in order to accomplish longer-lasting bubbles, the fraction of glycerol should be increased and the amount of water decreased. In the interpretation one must remember that the regression coefficients refer to the 0.2 / 0.2 / 0.5 / 0.1 reference mixture.
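The effect of pruning can also be seen in the adjusted R2, which penalises the number of model terms. The sketch below assumes the standard adjusted-R2 formula and uses the rounded values reported by MODDE for the two Bubb_scr fits (N = 24; R2 = 0.812 with DF = 11 before pruning, R2 = 0.796 with DF = 18 after), so the last digit may differ slightly from the printed R2 Adj. values of 0.608 and 0.739.

```python
def adjusted_r2(r2, n_runs, df_resid):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / DF, DF = residual d.o.f."""
    return 1 - (1 - r2) * (n_runs - 1) / df_resid


full_model = adjusted_r2(0.812, n_runs=24, df_resid=11)
refined_model = adjusted_r2(0.796, n_runs=24, df_resid=18)

print(f"R2 Adj. before pruning: {full_model:.3f}")
print(f"R2 Adj. after pruning:  {refined_model:.3f}")
```

Although the raw R2 drops slightly when the insignificant cross-terms are removed, the adjusted R2 rises, in line with the large gain in Q2.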

Figure (Bubb_scr, refined PLS model, 2 components): Summary of Fit (R2, Q2, Model Validity and Reproducibility for Lifetime~); normal probability plot of standardized residuals; observed vs. predicted Lifetime~; Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for Lifetime~ (terms DW1, DW2, Wa, Gly, Te, Ti). N = 24, DF = 18, Cond. no. = 2.1537, Y-miss = 0, R2 = 0.796, Q2 = 0.640, R2 Adj. = 0.739, RSD = 0.2018, Conf. lev. = 0.95.

We then used the MODDE optimizer to compute predictions of where to lay out an optimization design. Two such predicted mixture compositions are shown below, together with the results from the verifying experiments. It was decided to use the first verifying experiment as the reference for the RSM mixture design.


Predictions from MODDE Optimizer:

Verifying experiments:

#1 Temp = 7 °C, Time = 25 h, Mixture = 0.2 / 0.2 / 0.3 / 0.3, Lifetime = 1120 s (18 min 40 s)

#2 Temp = 7 °C, Time = 49 h, Mixture = 0.4 / 0.0 / 0.3 / 0.3, Lifetime = 810 s (13 min 30 s)

Task 3
The replicate plot shows that the pure error is low. This plot also indicates that the replicates, i.e. the reference mixture measurements, lie in the upper part of the response interval, which suggests that a quadratic model is needed. The fitted quadratic PLS model had R2 = 0.92, Q2 = 0.71, MVal = 0.56, and Rep = 0.95, which are good values and of sufficient quality for an optimization. The model shows no lack of fit (ANOVA table) and has approximately normally distributed residuals. The PLS score plot demonstrates the good correlation between mixture composition and bubble lifetime. According to the coefficient plot, the excipients water and glycerol have the most impact on bubble lifetime in the mixture region explored. Remember that the reference mixture is 0.2 / 0.2 / 0.3 / 0.3.
Figure (Bubb_rsm, PLS, 2 components): Plot of Replications for Lifetime~ with experiment number labels; Summary of Fit (R2, Q2, Model Validity and Reproducibility for Lifetime~); normal probability plot of standardized residuals; Score Scatter t[1] vs u[1] with experiment number labels; Scaled & Centered Coefficients for Lifetime~ (terms DW1, DW2, Wa, Gly and their squared and cross-terms). N = 24, DF = 14, Cond. no. = 12.3206, Y-miss = 0, R2 = 0.919, Q2 = 0.708, R2 Adj. = 0.868, RSD = 0.0358, Conf. lev. = 0.95.

Because the PLS model is good according to the evaluation criteria (R2/Q2/MVal/Rep, ANOVA, N-plot, t1/u1 score plot), we may proceed and make predictions. The mixture contour plot displayed below was created by putting glycerol, the most important ingredient, on its high level. Evidently, there is no sharp optimum, but rather a ridge structure on which a bubble lifetime in the span 1350-1360 seconds (approximately 22 min 30 s) is encountered.

With the MODDE optimizer, the following five runs were predicted. They are all situated on the ridge found above.
DWL1     DWL2     Water    Glycerol   Lifetime    iter   log(D)
0.22     0.1001   0.2799   0.4        1359.421    148    -0.8289
0.2108   0.1187   0.2705   0.4        1353.342    87     -0.7011
0.2264   0.1001   0.2735   0.4        1360.329    84     -0.8497
0.2229   0.1001   0.2771   0.4        1360.145    105    -0.8455
0.2264   0.1001   0.2735   0.4        1360.329    76     -0.8497
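A quick sanity check on optimizer output of this kind is to confirm that every suggested recipe satisfies the mixture constraint and to express the predicted lifetimes in minutes and seconds. The sketch below uses the five rows listed above; the tolerance of 0.001 is an assumption to allow for the four-decimal rounding of the printed fractions.

```python
# (DWL1, DWL2, Water, Glycerol, predicted lifetime in seconds)
optimizer_runs = [
    (0.22, 0.1001, 0.2799, 0.4, 1359.421),
    (0.2108, 0.1187, 0.2705, 0.4, 1353.342),
    (0.2264, 0.1001, 0.2735, 0.4, 1360.329),
    (0.2229, 0.1001, 0.2771, 0.4, 1360.145),
    (0.2264, 0.1001, 0.2735, 0.4, 1360.329),
]


def to_min_sec(seconds):
    """Convert seconds to (minutes, seconds), rounded to whole seconds."""
    return divmod(round(seconds), 60)


# Sum of the four mixture fractions for each suggested recipe.
totals = [sum(run[:4]) for run in optimizer_runs]
constraint_ok = all(abs(total - 1.0) < 1e-3 for total in totals)

print("Mixture constraint satisfied:", constraint_ok)
for run, total in zip(optimizer_runs, totals):
    minutes, secs = to_min_sec(run[4])
    print(f"sum = {total:.4f}, lifetime = {minutes} min {secs} s")
```

All five recipes keep glycerol at its upper level of 0.4, consistent with the ridge seen in the contour plot.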

Conclusions
The conclusion is that by first using a screening design, then some steepest ascent predictions, and finally laying out an RSM design, we have made it possible to increase bubble lifetime from 6.02 min to 22.28 min! Unfortunately, however, little Andreas showed more interest in the little red plastic bubble wand than in his father's enormous experimental progress.


DOE-Exercise LOWARP (Mixture)


Optimisation of a Polymer

Background
A manufacturer wanted to develop a new polymer with the properties of low warp and high strength. To achieve this, the polymer formulation was varied according to an extreme vertices mixture design with 14 runs and 3 centre points based on the following constituents:
1  Glas  20 to 40 %
2  Crtp  0 to 20 %
3  Mica  0 to 20 %
4  Amtp  40 to 60 %
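The corner points of such a constrained mixture region can be enumerated directly from the bounds. The following is a minimal sketch of the classic extreme-vertices enumeration (set all constituents but one to a lower or upper bound, compute the remaining one from the sum, and keep the point if it is within its own bounds). It is an illustration only, it is not MODDE's design generator, and it does not show which additional points make up the 14 design runs. Percentages are kept as integers to avoid rounding issues.

```python
from itertools import product


def extreme_vertices(bounds, total=100):
    """Enumerate extreme vertices of a bounded mixture region.

    bounds: list of (lower, upper) per constituent, in percent.
    """
    q = len(bounds)
    vertices = set()
    for free in range(q):
        others = [i for i in range(q) if i != free]
        # Try every lower/upper combination for the other constituents.
        for combo in product(*(bounds[i] for i in others)):
            remainder = total - sum(combo)
            low, high = bounds[free]
            if low <= remainder <= high:
                point = [0] * q
                for i, value in zip(others, combo):
                    point[i] = value
                point[free] = remainder
                vertices.add(tuple(point))
    return sorted(vertices)


# Bounds from the table above: Glas, Crtp, Mica, Amtp (percent).
lowarp_bounds = [(20, 40), (0, 20), (0, 20), (40, 60)]
vertices = extreme_vertices(lowarp_bounds)

for glas, crtp, mica, amtp in vertices:
    print(f"Glas {glas}  Crtp {crtp}  Mica {mica}  Amtp {amtp}")
```

With these bounds the region has six extreme vertices; each has two constituents at their upper bound and two at their lower bound.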

Objective
The objective of the investigation was to understand how the four constituents influence the properties of the polymer and if it was possible to manufacture a polymer with the required properties.

Data
Fourteen responses relating to warp, shrinkage and strength were measured on the polymers as shown below.


Tasks
Task 1
Create a new investigation in MODDE and define the four factors and 14 responses (see above). Select SCREENING as the experimental objective. Generate a worksheet with 17 runs and copy/paste the entire data table (including the factor settings) from the file Lowarp.xls.

Task 2
Fit a model relating the constituents (variables 1 - 4) to the responses using PLS. Investigate the relevant score and loading plots using Analysis/PLS Plots/Score Scatter Plots and Analysis/PLS Plots/Loading Scatter Plots respectively and interpret the model. What can you say about the correlation structure among the factors and responses (hint: look at the score plots)? How are the 14 responses related (hint: look at the loading plots)? Which factors influence strength and which factors influence warp?


Solutions to LOWARP
Task 2
PLS gives a three-component model with R2 = 0.75 and Q2 = 0.53, which are excellent results considering that all 14 responses are included in one model. The R2 and Q2 values for each individual response are shown in the Summary of Fit plot below. The three PLS score plots confirm the strong correlation between the constituents and the responses. Finally, the DModY plot indicates no outliers in the response data.
Figure (Lowarp, PLS, 3 components): PLS Total Summary (cumulative R2 and Q2 for Comp1 to Comp3); Summary of Fit (R2, Q2, Model Validity and Reproducibility for the responses wrp1-wrp8 and st1-st6); Score Scatter t[1] vs u[1], t[2] vs u[2] and t[3] vs u[3] with experiment number labels; Distance to Model (Y) by experiment number. N = 17, DF = 13, Cond. no. = 2.0457, Y-miss = 10.

We have a PLS model characterising all 14 responses. Inspection of the VIP-plot indicates that, taken over all 14 responses, mica and glas are the most influential constituents. Since VIP is a squared function of the PLS loadings, it tells us how important each constituent is but not in which direction (positive or negative) it influences a particular response. This information can be obtained from the loadings plot which shows how the variables (constituents and responses) relate to each other. Observe that the eight warp responses are strongly clustered to the right of the loading plot in the direction of amtp and away from mica. Hence, we conclude that increasing amtp will increase warp whilst increasing mica will work in the opposite direction. The six strength responses are more scattered in the loading plot. This suggests that strength is either more difficult to measure or is a more complex phenomenon. crtp is most influential for st3 and st5, whereas glas is most important for st1, st2, st4 and st6. The four coefficient plots, shown below, illustrate the coefficient profiles for both correlated (wrp1 & wrp2) and uncorrelated (st3 & st4) responses.
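The statement that VIP is a squared function of the model can be made concrete. The sketch below uses the commonly cited VIP definition, in which the squared, normalised PLS weight of each variable is weighted by the Y variance explained by each component. It is an assumption that MODDE computes VIP in exactly this way, and the weights and explained sums of squares are hypothetical values, not the Lowarp results.

```python
from math import sqrt


def vip(weights, ss_explained):
    """Variable importance in the projection.

    weights[a][k]: PLS weight of variable k in component a.
    ss_explained[a]: Y sum of squares explained by component a.
    """
    n_vars = len(weights[0])
    total_ss = sum(ss_explained)
    scores = []
    for k in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_explained):
            norm_sq = sum(w * w for w in w_a)
            # Squaring removes the sign, so direction is lost.
            acc += ss_a * w_a[k] ** 2 / norm_sq
        scores.append(sqrt(n_vars * acc / total_ss))
    return scores


# Hypothetical weights for (glas, mica, crtp, amtp) in three components.
weights = [
    [0.45, -0.70, 0.20, 0.52],
    [0.60, 0.10, -0.75, 0.25],
    [-0.30, 0.55, 0.40, 0.66],
]
ss_explained = [6.0, 3.0, 1.0]

vip_scores = vip(weights, ss_explained)
for name, score in zip(["glas", "mica", "crtp", "amtp"], vip_scores):
    print(f"{name}: VIP = {score:.2f}")
```

Under this definition the squared VIP values average to 1, which is why values above 1 are usually read as more influential than average, and why the loading plot is still needed to see the direction of each influence.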
Figure (Lowarp, PLS, 3 components): Variable Importance Plot (VIP for mi, gl, am, cr); Loading Scatter wc[1] vs wc[2]; Scaled & Centered Coefficients for wrp1, wrp2, st3 and st4 (terms gl, mi, cr, am; Conf. lev. = 0.95). N = 17, DF = 13, Cond. no. = 2.0457, Y-miss = 10. Fit statistics printed beneath the four coefficient panels, in extraction order: R2 = 0.734, Q2 = 0.610, R2 Adj. = 0.672, RSD = 0.9196; R2 = 0.771, Q2 = 0.625, R2 Adj. = 0.718, RSD = 0.8504; R2 = 0.958, Q2 = 0.931, R2 Adj. = 0.949, RSD = 83.8212; R2 = 0.833, Q2 = 0.675, R2 Adj. = 0.794, RSD = 1641.0009.

Mixture contour plots provide a better understanding of the relationships between warp and strength and the four constituents. These contour plots are shown below for the four responses discussed previously and were constructed by fixing amtp at 0.5 and letting the other three constituents (crtp/mica/glas) vary. The arrow indicates a reasonable compromise among the four responses yielding the desired properties of high strength and low warp. This mixture is approximately glas = 0.3, crtp = 0.0, mica = 0.2 and amtp = 0.5. This mixture should be tested to verify the model predictions.

Conclusions
The application of a simple mixture design to a complex polymer optimisation problem has successfully generated a mixture point with the desired properties.


List of references (last revised 2004-02-10)


A foreword
This is a list covering a small selection of useful references (books and articles) in the fields of design of experiments (DoE) and multivariate analysis (MVA). It is emphasized that this is by no means an exhaustive account of the available literature. Rather, this compilation highlights references which may guide the reader for further studies.

References for DoE


Books
Box G.E.P., Hunter W.G., Hunter J.S., Statistics for Experimenters, John Wiley & Sons, Inc., New York, (1978). 2. Cornell J.A., Experiments with mixtures, John Wiley & Sons, Inc., New York, (1981). 3. Bayne C. K., Rubin I.B., Practical Experimental Designs and Optimization Methods for Chemists, VCH Publishers, Inc., Deerfield Beach, Florida, (1986). 4. Box G.E.P., Draper N.R., Empirical Model-Building and Response Surfaces, John Wiley & Sons, Inc., New York, (1987). 5. Haaland P.D. Experimental designs in biotechnology, Marcel Dekker, Inc., New York, Basel (1989). 6. Carlson R., Design and Optimization in Organic Synthesis, Elsevier science publishers, Amsterdam (1991). 7. Montogomery, D.C., Design and Analysis of Experiments, John Wiley & Sons, New York (1991) ISBN 0471-52994-X. 8. Morgan E. Chemometrics: Experimental Design, John Wiley & Sons, Inc., New York, (1991). 9. Nortvedt R et al., Anvendelse av kjemometri innen forskning og industri, Tidskriftsforlaget Kjemi AS (1996) ISBN 82-91294-01-1. 10. Goupy, J.L., Methods for Experimental Design Principles and Applications for Physicists and Chemists, Elsevier, Amsterdam (1993). 1.

Articles
1. 2. 3. 4. 5. 6. 7. 8. 9. Hendrix, C. (1979), What Every Technologist Should Know About Experimental Design, Chemtech, 9, 167174. Hunter, J.S. (1987), Applying Statistics to Solving Chemical Problems, Chemtech, 17, 167-169. Steinberg, D.M and Hunter, W.G. (1984), Experimental Design: Review and Comments, Technometrics, 26, 71-97. Grize, Y.L. (1995), A Review of Robust Process Design Approaches, Journal of Chemometrics 9, 239-262. Ahlinder, S., et al. (1997), Smart Testing Reaping the Benefits of DoE, Volvo Technology Report No 2 1997, www.volvo.se/rt/trmag/index.html. Nystrm, A. and Karlsson, A. (1997) Enantiomeric Resolution on Chiral-AGP with the aid of Experimental Design. Unusual Effects of Mobile Phase pH and Column Temperature, Journal of Chromatography A, 763, 105-113. Eriksson, L., Johansson, E., Wikstrm, C. (1998), Mixture Design Design Generation, PLS Analysis and Model Usage, Chemometrics and Intelligent Laboratory Systems, 43, 1-24. Lundstedt, T., et al. (1998), Experimental Design and Optimization, Chemometrics and Intelligent Laboratory Systems, 42, 3-40. Rappaport, K.D., et al. (1998), Perspectives on Implementing Statistical Modeling and Design in an Industrial/Chemical Environment, The American Statistician, May 1998, 52, 152-159.

page 1

References for MVA


Books
1. 2. 3. 4. 5. 6. Jollife, I.T. (1986), Principal component analysis, Springer-Verlag, New York (ISBN 0-387-96269-7). Martens, H. and Naes, T. (1989), Multivariate calibration, John Wiley, New York. Jackson, J.E. (1991), A users guide to principal components, John Wiley, New York. (ISBN 0-471-62267-2). Anthology (1996), Anvendelse av Kjemometri innen forskning og industri, Tidsskriftfolaget Kjemi AS, Bergen, Norway (ISBN 82-91294-01-1). Hskuldsson, A. (1996), Prediction Methods in Science and Technology, Thor Publishing, Copenhagen, Denmark (ISBN 87-985941-0-9). Massart, D.L., et al., Handbook of Chemometrics and Qualimetrics. Part A and Part B, Elsevier, Amsterdam (1998).

Articles, general
1. 2. 3. 4. 5. 6. 7. 8. Wold, S., Esbensen, K., Geladi, P. (1987), Principal Component Analysis, Chemometrics and Intelligent Laboratory Systems, 2, 37-52. Hskuldsson, A. (1988), PLS Regression Methods, Journal of Chemometrics, 2, 211-228. Sthle, L., and Wold, S. (1988), Multivariate Data Analysis and Experimental Design in Biomedical Research, In: Ellis, G.P., and West, G.B. (Eds) Progress in Medical Chemistry, Elsevier Science Publishers, 291-338. Wold, S., Albano, C., and Dunn W.J., et al. (1989), Multivariate Data Analysis: Converting Chemical Data tables to plots, In: Computer Applications in Chemical Research and Education, Heidelberg, Dr. Alfred Htig Verlag. Stone M, Brooks RJ (1990): Continuum regression: Cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression Journal of the Royal Statistical Society, Ser. B, 52, 237-269. Frank, I.E., and Friedman, J.H. (1993), A Statistical View of Some Chemometrics Regression Tools, Technometrics, 35, 109-148. Wold, S. (1994), Exponentially Weighted Moving Principal Components Analysis and Projections to Latent Structures, Chemometrics and Intelligent Laboratory Systems, 23, 149-161. Wold, S., Eriksson, L., and Sjstrm, M. (1999), PLS in Chemistry, in: Encyclopedia of Computational Chemistry, Elsevier, pp 2006-2020.

Articles, process
1. Kresta, J.V., MacGregor, J.F., and Marlin, T.E. (1991), Multivariate Statistical Monitoring of Process Operating Performance, The Canadian Journal of Chemical Engineering, 69, 35-47.
2. Kourti, T., and MacGregor, J.F. (1995), Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods, Chemometrics and Intelligent Laboratory Systems, 28, 3-21.
3. MacGregor, J.F. (1996), Using On-line Process Data to Improve Quality, ASQC Statistics Division Newsletter, Vol. 16, No. 2, pp. 6-13.
4. Nijhuis, A., de Jong, S., and Vandeginste, B.G.M. (1997), Multivariate Statistical Process Control in Chromatography, Chemometrics and Intelligent Laboratory Systems, 38, 51-61.
5. Rännar, S., MacGregor, J.F., and Wold, S. (1998), Adaptive Batch Monitoring Using Hierarchical PCA, Chemometrics and Intelligent Laboratory Systems, 41, 73-81.
6. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process: Part I. Process Supervision with Multivariate Control Charts, Chemometrics and Intelligent Laboratory Systems, 42, 221-231.
7. Wikström, C., et al. (1998), Multivariate Process and Quality Monitoring Applied to an Electrolysis Process: Part II. Multivariate Time-series Analysis of Lagged Latent Variables, Chemometrics and Intelligent Laboratory Systems, 42, 233-240.
8. Wold, S., et al. (1998), Modelling and Diagnostics of Batch Processes and Analogous Kinetic Experiments, Chemometrics and Intelligent Laboratory Systems, 44, 331-340.


Articles, multivariate calibration


1. Brown, P.J. (1982), Multivariate Calibration, Journal of the Royal Statistical Society, B44, 287-321.
2. Beebe, K.R., and Kowalski, B.R. (1987), An Introduction to Multivariate Calibration and Analysis, Analytical Chemistry, 57, 1007-1017.
3. Trygg, J., and Wold, S. (1998), PLS Regression on Wavelet Compressed NIR Spectra, Chemometrics and Intelligent Laboratory Systems, 42, 209-220.
4. Swierenga, H., et al. (1998), Improvement of PLS Model Transferability by Robust Wavelength Selection, Chemometrics and Intelligent Laboratory Systems, 41, 237-248.
5. Wold, S., Antti, H., et al. (1998), Orthogonal Signal Correction of Near-Infrared Spectra, Chemometrics and Intelligent Laboratory Systems, 43, 123-134.
6. Bro, R. (1996), Håndbog i Multivariabel Kalibrering, KVL, Copenhagen, Denmark.

Articles, multivariate characterization


1. Carlson, R., Lundstedt, T., and Albano, C. (1985), Screening of Suitable Solvents in Organic Synthesis. Strategies for Solvent Selection, Acta Chemica Scandinavica, B39, 79-91.
2. Wallbäcks, L., Edlund, U., and Nordén, B. (1991), Multivariate Characterization of Pulp Using Solid-State 13C NMR, FTIR and NIR, Tappi Journal, 74, 201-206.
3. Cocchi, M., et al. (1992), Theoretical versus Empirical Molecular Descriptors in Monosubstituted Benzenes: A Chemometric Study, Chemometrics and Intelligent Laboratory Systems, 12, 209-224.
4. Eriksson, L., Verhaar, H.J.M., and Hermens, J.L.M. (1994), Multivariate Characterization and Modelling of the Chemical Reactivity of Epoxides, Environmental Toxicology and Chemistry, 13, 683-691.
5. Lindgren, Å., and Sjöström, M. (1994), Multivariate Physico-Chemical Characterization of Some Technical Non-Ionic Surfactants, Chemometrics and Intelligent Laboratory Systems, 23, 179-189.
6. Andersson, P., Haglund, P., and Tysklind, M. (1997), Ultraviolet Absorption Spectra of all 209 Polychlorinated Biphenyls Evaluated by Principal Component Analysis, Fresenius Journal of Analytical Chemistry, 357, 1088-1092.

Articles, QSAR
1. Eriksson, L., Hermens, J.L.M., et al. (1995), Multivariate Analysis of Aquatic Toxicity Data with PLS, Aquatic Sciences, 57, 217-241.
2. Eriksson, L., and Johansson, E. (1996), Multivariate Design and Modeling in QSAR, Chemometrics and Intelligent Laboratory Systems, 34, 1-19.
3. Verhaar, H.J.M., Hermens, J.L.M., et al. (1996), Classifying Environmental Pollutants. Separation of Class 1 and Class 2 Type Compounds Based on Chemical Descriptors, Journal of Chemometrics, 10, 149-162.
4. Goodford, P. (1996), Multivariate Characterization of Molecules for QSAR Analysis, Journal of Chemometrics, 10, 107-117.
5. Lindgren, Å., et al. (1996), Quantitative Structure-Effect Relationships for Some Technical Non-ionic Surfactants, Journal of the American Oil Chemists' Society, 73, 863-875.
6. Sandberg, M., et al. (1998), New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids, Journal of Medicinal Chemistry, 41, 2481-2491.

