Tutorials
By CAMO Process AS
This manual was produced using ComponentOne Doc-To-Help 2005 together with Microsoft
Word. Visio and Excel were used to make some of the illustrations. The screen captures were taken
with Paint Shop Pro.
Trademark Acknowledgments
Doc-To-Help is a trademark of ComponentOne LLC.
Microsoft is a registered trademark and Windows 95, Windows 98, Windows NT, Windows
2000, Windows ME, Windows XP, Excel and Word are trademarks of Microsoft Corporation.
PaintShop Pro is a trademark of JASC, Inc.
Visio is a trademark of Shapeware Corporation.
Restrictions
Information in this manual is subject to change without notice. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of CAMO Process AS.
Software Version
This manual is up to date for version 9.5 of The Unscrambler.
Document last updated on March 17, 2006.
Note:
Some model names in these tutorials assume that your files are stored on a computer running a Microsoft Windows operating system (Windows 9X, Windows 2000, Windows NT or Windows XP). Substitute the long filenames with names that comply with the DOS 8.3 rule if The Unscrambler is installed on a file server running Windows for Workgroups or similar.
We suggest that you copy the data files to a safe place. This way you can always start from scratch with the
tutorials.
Read the details below to understand which tutorials are useful in your case, and get some practical advice for
running the tutorials.
Depending on your degree of experience in using The Unscrambler and your fields of interest, here are the
tutorials we recommend that you start with:
(Figure: light absorbance at the Blue and Red wavelengths)
Seven solutions, or samples, have known concentrations (Y) of constituent a and can be used as the calibration samples. Three other samples have unknown concentrations, which are to be predicted using a regression model.
Task
Start The Unscrambler and log in.
How to Do It
Start The Unscrambler by double-clicking on The Unscrambler icon or selecting The Unscrambler from the
Start menu in Windows. A list of the users that are registered in The Unscrambler is shown. (Lookup Image
A002)
Select yourself from the list of users and click OK. If your name does not appear, the system supervisor has to
add your name to the list of users. You are asked to enter your password before The Unscrambler is opened if
the system supervisor has set this option.
Task
Read the Tutor_a data file into the Editor and view some basic statistics of the data table.
How to Do It
Use File - Open to select the file Tutor_a in the Examples directory. This directory should be below the
directory where you installed The Unscrambler. (Lookup Image A003)
Some basic statistics like the Mean, Standard Deviation and Skewness of the samples and variables can be
calculated and shown in a new Editor. Select View - Sample Statistics or View - Variable Statistics. A
dialog pops up which asks you on which part of the data table to calculate the statistics: (Lookup Image
A005)
Accept the default choice (All samples or All variables) and click OK. A new Editor is launched with means,
standard deviations, etc. (Lookup Image A006)
Close the Editor window with the statistics before you continue.
How to Do It
Choose Modify - Edit Set to launch the Set Editor. You see the list of already defined Variable Sets (which
in this case is empty). (Lookup Image A007)
A007 Set Editor Dialog (Variable Sets); A008 New Variable Set Dialog
Press Add... to launch the New Variable Set dialog (Lookup Image A008), where you define the first
variable Set:
Name: Light Absorbance
Data Type: Non-Spectra
Interval: 1-2
You can enter the variable numbers directly in the Set Interval field, or click Select to launch an interactive
Editor where you mark the variables that belong to the Set. De-select variables you have marked by mistake by
pressing <Ctrl> while you click on the variable you want to remove from the Set.
Click OK. Back in the Set Editor, press Add... again to launch the New Variable Set dialog once more,
where you define the second variable Set:
Name: Constituent A
Data Type: Non-Spectra
Set Interval: 3
Click OK.
Change the Set type to Sample Sets by selecting Sample Sets from the drop-down list in the Set Editor.
(Lookup Image A009)
Press Add... to launch the New Sample Set dialog (Lookup Image A010), where you define the following
Sample Sets in the same way as you defined the Variable Sets:
Name: Calibration Samples
Interval: 1-7
Name: Prediction Samples
Interval: 8-10
Click OK when you are finished with the Set Editor.
You will save a lot of time and effort in your own analyses later by defining the necessary Sets from the beginning. All analyses and plotting will be much easier to set up.
Remember to save the data table before you proceed by selecting File - Save or clicking the Save button on the toolbar.
Task
Find the regression of component a on the absorbance of red light (X1).
How to Do It
You do the regression by plotting the red light variable (X1) against component a. We want to do the univariate regression on the calibration samples only, since the Y-values are missing in the prediction Set.
The plot displayed here appears (Lookup Image A012), but without the trend lines. Toggle the regression
and/or target line on and off using View - Trend Lines - Regression Line/Target Line. The target line is
very useful in predicted vs. measured plots.
Statistics for the plot are shown in a special frame in the upper left corner. Toggle it on and off using View -
Plot Statistics.
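The univariate regression performed graphically above can also be sketched numerically. The snippet below is a minimal illustration in Python with numpy; the seven data values are made up for the example and are not the actual Tutor_a measurements.

```python
import numpy as np

# Hypothetical absorbance readings (Red, X1) and known concentrations of a
# for seven calibration samples -- illustrative values, not the Tutor_a data.
red = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 1.00])
conc_a = np.array([0.12, 0.27, 0.38, 0.56, 0.69, 0.84, 1.01])

# Univariate least-squares fit: conc_a ~ b0 + b1 * red
b1, b0 = np.polyfit(red, conc_a, 1)
predicted = b0 + b1 * red
residuals = conc_a - predicted
```

With an intercept in the model, the residuals of an ordinary least-squares fit always average to zero; the trend line in the plot is exactly this fitted line.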
Tutorial A - Calibration
Now it is time to make the first multivariate model.
Task
Make a PLS regression model between the absorbance measurements and the concentration of a.
How to Do It
Activate the Tutor_a data table by clicking on it or selecting it from menu Window 1 Tutor_a. Unmark
the variables by pressing the <Esc> key.
Select Task - Regression. Use the following parameters to define the model in the Regression dialog:
(Lookup Image A013)
Method: PLS1
Samples: Calibration Samples [7]
X-variables: Light Absorbance [2]
Y-variables: Constituent A [1]
Weights: All 1.0
Validation method: Leverage Correction
Num PCs (number of components): 2
Model Size: Full
The Center Data and Issue Warnings tick-boxes should always be checked, and Add Start Noise unchecked. This also applies to all models you make later.
Leverage correction is a validation method that is quick, but may give too optimistic results. It is useful in the
first runs of modeling, and when the data table is small. Therefore we use it here and in most of the later
tutorials. You should use a more conservative validation method for your own data. When Leverage correction
is used in a PLS model, an information dialog pops up (Lookup Image A014).
Click OK to start the calibration.
Task
Interpret the residual variance curve
Display the modeling results
Study the Regression Coefficients plot
You can also always display the modeling results of saved models from the Results menu. Select the kind of
model you want to look at and mark the model in the Results dialog.
Information about the model is available in the Information field. This is useful to answer questions like:
Which sample Set did we use to make the model? Did we remember to weight the X-variables? In the Results dialog, the residual variance curve is displayed in a small preview pane showing the performance of the model.
Go to Edit - Options and select Bars in the Plot Layout field if the plot is displayed as curves. (Lookup
Image A018)
Let the mouse cursor rest over one of the bars to see which variable it is. Click once more to get the object information window. The b-coefficient for the Red absorbance is 1.04, the b-coefficient for the Blue absorbance is -0.208, and the offset (B0) is 1E-06, i.e. approximately zero.
The b-coefficients enable us to write the model equation relating the concentration of a to the Red and Blue light absorbances: a = 1.04 x Red - 0.208 x Blue + B0, with B0 approximately zero.
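With the b-coefficients quoted above, the model equation can be applied directly to new absorbance readings. A minimal sketch in Python; the sample absorbances used in the call are hypothetical, not values from Tutor_a.

```python
# Coefficients as read from the Regression Coefficients plot above.
b_red, b_blue, b0 = 1.04, -0.208, 1e-06

def predict_conc_a(red, blue):
    """Apply the model equation: a = b0 + b_red*Red + b_blue*Blue."""
    return b0 + b_red * red + b_blue * blue

# Hypothetical absorbances for one new sample (not from Tutor_a):
print(round(predict_conc_a(0.50, 0.20), 4))  # -> 0.4784
```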
Tutorial A - Prediction
Most of the time, the purpose of making a regression model is to predict the response value of new samples measured in the future.
Task
Use the calibration model to predict the concentration of a in the three unknown samples in the data table.
How to Do It
Use menu Window 1 Tutor_a to activate the data table, but do not close the Viewer with the regression
coefficients. Then, select Task - Predict. Use the parameters below to make the necessary specifications in
the Prediction dialog: (Lookup Image A019)
Samples: Prediction Samples [3]
X-variables: Light Absorbance [2]
Y-reference: no selection (do not include Y-reference values)
Model Name: Tutorial A
Number of Components: 2
Press Find to select the model if you do not remember its name or are not sure where it is. You may also enter
the model name directly into the field. Click OK to start the prediction.
Task
Evaluate the prediction results by looking at the plots Predicted vs. Measured from the PLS calibration stage
and Predicted with Deviation from the prediction stage.
How to Do It
First, let us look at the predictions you just made. Press View in the Prediction Progress dialog to open the
Viewer with the prediction results: (Lookup Image A020)
Save the results file under the name Tutorial A Prediction 1 before you proceed.
Activate the Viewer with the regression coefficients (Window 2 Tutorial A). This is one way to go back
to the model results. Then select Plot - Predicted vs Measured and specify the following parameters in the
Predicted vs Measured dialog: (Lookup Image A021)
Plot type: Predicted vs. Measured
Y-variable: 1; Comp a
Components: 2
Samples: Calibration
Click OK.
The Predicted vs Measured plot appears. (Lookup Image A022)
Use View - Trend Lines to toggle the regression and/or target line on and off.
Use View - Plot Statistics to toggle the statistics windows on and off.
You see that the prediction by the PLS model is extremely good in this case. Compare this multivariate
regression with the univariate regression result, which used variable Red only to predict Comp a (Lookup
Image A012). The correlation between predicted and measured is higher when the multivariate model, based
on both Red and Blue, is used for prediction.
Task
Make a new calibration with the same parameters as last time, but change the validation method to cross
validation.
How to Do It
Activate the Editor (Window 1 Tutor_a) and select Task - Regression. Use the following parameters:
(Lookup Image A023)
Method: PLS1
Samples: Calibration Samples [7]
X-variables: Light Absorbance [2]
Y-variables: Constituent A [1]
Weights: All 1.0
Validation method: Cross Validation
Num PCs (number of components): 2
Task
Use some of the plot options.
How to Do It
Press View in the PLS1 Regression Progress dialog to launch the regression overview of the latest model.
(Lookup Image A025)
Activate the Scores plot, which is the upper left plot, by clicking in it.
Select Edit - Options and go to the Sample Grouping tab. Tick the box Enable Sample Grouping and
choose Cross Validation Segments in the Group By field. (Lookup Image A026)
Click OK. (Lookup Image A027)
You see that each segment from the cross validation has its own color in the score plot. In this example each
segment had only one sample, but in other cases this is a good way to see how the segments are distributed in
the whole population of samples.
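The logic of full cross validation with one sample per segment (leave-one-out) can be sketched for a simple linear fit: each sample is held out in turn, the model is refitted on the rest, and the held-out sample is predicted. The data values below are illustrative only.

```python
import numpy as np

# Leave-one-out cross validation of a simple linear fit: each sample forms
# its own segment, as in this 7-sample example. Data values are made up.
x = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 1.00])
y = np.array([0.12, 0.27, 0.38, 0.56, 0.69, 0.84, 1.01])

cv_residuals = []
for i in range(len(x)):
    keep = np.arange(len(x)) != i            # all samples except segment i
    b1, b0 = np.polyfit(x[keep], y[keep], 1)
    cv_residuals.append(y[i] - (b0 + b1 * x[i]))

rmsecv = float(np.sqrt(np.mean(np.square(cv_residuals))))
```

Because every prediction is made by a model that never saw the predicted sample, the resulting error estimate is more conservative than a fit on all samples at once.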
Activate the Predicted vs. Measured plot (lower right corner) and select Edit - Options. Enable sample
grouping the same way as you did before, but group by Value of Variable this time, choose X-variable 1 and
select to generate three groups. (Lookup Image A028)
Use the Next Horizontal PC and Previous Horizontal PC buttons to display the active Predicted vs Measured plot for one or two Principal Components. (Lookup Image A030)
Save this last model and give it a meaningful name, if you want to take a look at it later from Results -
Regression without remaking the whole model.
We are also interested in finding a way to rationalize quality control, since the use of taste panels is very
costly. Therefore, we will try to find instrumental measurement variables to replace some of the sensory
testing. Problem II is thus to explore the relationships between sensory variables and chemical/instrumental
measurements.
Finally we would like to predict consumer preference for raspberry jam from descriptive sensory analysis. This
is Problem III.
Insert category variables;
Define Sets;
Decompose by PCA;
Interpret scores and loadings;
PLS regression;
Export models;
Predict response values from new samples;
Estimate regression coefficients;
Find optimal number of components;
Numerical results.
18 Quality Analysis with PCA and PLS (Tutorial B) The Unscrambler Tutorials
Agronomic production variables
The samples are taken from four different cultivars, at three different harvesting times:
No  Name   Cultivar  Harvest time
 1  C2-H1     2          1
 2  C4-H1     4          1
 3  C3-H3     3          3
 4  C3-H1     3          1
 5  C1-H1     1          1
 6  C4-H3     4          3
 7  C2-H3     2          3
 8  C4-H2     4          2
 9  C1-H2     1          2
10  C3-H2     3          2
11  C1-H3     1          3
12  C2-H2     2          2
Note that the agronomic production variables are not used as input variables in any of the matrices, but they are
known information which is very valuable for the interpretation of the results of the data analysis. They will
be utilized as category variables later.
Note that the variable numbers in that table refer to positions within the Instrumental Variable Set, not to the variable numbers in the original data table.
Note that the variable numbers in that table refer to positions within the Sensory Variable Set, not to the variable numbers in the original data table.
Variable Set Preference
114 representative consumers tasted the 12 jam samples and gave them preference scores on a scale from 1 to 9.
The average over all consumers for each sample is given in the data table.
Task
Insert two category variables Cultivar and Harvest Time.
How to Do It
Open the data file Tutor_b by selecting File - Open. (Lookup Image B001)
Activate a cell in the first column of the table as we will insert our category variables at the beginning of the
table. Then, follow these five steps:
1. Select Edit - Insert - Category Variable. The dialog: Category Variable Wizard - Enter
Variable Name and Choosing Method pops up. (Lookup Image B002)
2. Enter Category Variable Name Cultivar and choose I want to specify the levels manually. Press
Next.
3. This launches the Specify Levels dialog, (Lookup Image B003) where you must specify the levels
of your new category variable. Use C1, C2, C3, and C4 as the level values for Cultivar. Type in the
name of the level in the Level name field and press Add to add the level; one for each cultivar.
4. Press Finish and the category variable is inserted into the Editor. (Lookup Image B004)
Note that category variable names appear in blue fonts in the Editor to distinguish them from ordinary
variables.
5. All cells are filled with m to denote missing values. Enter the level for a sample by double-clicking the category variable cell. The cell is highlighted and a drop-down list appears. Click the arrow to see the available levels and click on the correct one. Use the arrow keys to move up and down in the list. The cultivar (and harvest time) values are seen in the sample name.
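Since the cultivar and harvest-time values are encoded in the sample names, filling in the levels is mechanical. A small Python sketch (the sample list is a subset of the table above):

```python
# The cultivar and harvest-time levels are encoded in the sample names
# (e.g. "C2-H1" means cultivar C2, harvest time H1), so the category
# values can be read directly off the names.
samples = ["C2-H1", "C4-H1", "C3-H3", "C3-H1", "C1-H1", "C4-H3"]

cultivar = [name.split("-")[0] for name in samples]
harvest = [name.split("-")[1] for name in samples]
print(cultivar)  # -> ['C2', 'C4', 'C3', 'C3', 'C1', 'C4']
```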
B002 The Category Variable Wizard - Enter Variable Name and Choosing Method dialog; B003 The Category Variable Wizard - Specify Levels dialog
B004 The Tutor_b data table displayed in the Editor (with Cultivar); B005 The Tutor_b data table displayed in the Editor (after insertion of Cultivar and Harvest Time)
Insert the category variable Harvest Time and fill in the correct Harvest Time levels by repeating the five-step procedure above (Lookup Image B005).
Tutorial B - Check Variable Sets
In The Unscrambler, matrices are defined by Sample and Variable Sets. It is a good habit to define all Sets
before any analyses are performed.
Task
Check that the three Variable Sets: Instrumental, Sensory and User Preference were defined.
How to Do It
Select Modify - Edit Set to open the Set Editor. Check that the three following Variable Sets were defined:
(Lookup Image B006)
Set name: Instrumental
Data Type: Non-Spectra
Size: 6 variables
Interval: 3-8
Set name: Preference
Data Type: Non-Spectra
Size: 1 variable
Interval: 14
Set name: Sensory
Data Type: Non-Spectra
Size: 12 variables
Interval: 9-13, 15-21
B006 The Set Editor dialog with three User-defined Variable Sets
These Sets were defined for you, and were saved automatically when the data table Tutor_b was saved. When working on your own data, create Sets by pressing the Add button in the Set Editor dialog. This launches the New Variable
Set dialog. Enter the Name and Data type of your set. Press Select to launch an Editor where you can mark
the variables that belong to the Set you are defining. You may alternatively enter the set intervals directly in
the Set Interval field.
Note that the Set Sensory is not continuous, but consists of two ranges in the data table. Together, the
variables from these two ranges define variable set Sensory.
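Picking out a non-contiguous Set like Sensory amounts to selecting two column ranges from the data matrix. A minimal numpy sketch with a dummy 12x21 table standing in for Tutor_b:

```python
import numpy as np

# A Variable Set need not be contiguous: Sensory covers columns 9-13 and
# 15-21 of the data table (1-based numbering, as in the Set Editor).
table = np.arange(12 * 21).reshape(12, 21)       # dummy 12x21 data table

sensory_cols = list(range(8, 13)) + list(range(14, 21))  # 0-based indices
sensory = table[:, sensory_cols]
print(sensory.shape)  # -> (12, 12)
```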
Task
Define two Sample Sets: Calibration Sam and Prediction Sam.
How to Do It
In the Set Editor dialog, change the set type to Sample Sets and define the following parameters:
Name: Calibration Sam
Set Interval: 1-12
Set Name: Prediction Sam
Set Interval: 13-20
To do this: Press the Add button in the Set Editor dialog. This launches the New Sample Set dialog. Enter
the Name of the set. Press Select to launch an Editor where you can mark the samples that belong to the Set
you are defining. You may alternatively enter the set intervals directly in the Set Interval field.
Save the data file in the Editor before you continue with the tutorial.
Task
Make a PCA model using the Set Sensory (i.e. one data matrix is decomposed by PCA).
How to Do It
Select Task - PCA. Specify the following parameters in the Principal Component Analysis dialog:
(Lookup Image B007)
Samples: Calibration Sam [12]
Variables: Sensory [12]
Weights: All 1.0
Validation method: Cross Validation
Num PCs: 8
B007 The Principal Component Analysis dialog; B008 The Cross Validation Setup dialog
Press Setup in the Validation Method field to specify in the Cross Validation Setup dialog that Full
Cross Validation is to be used (Lookup Image B008). This validation method is more time consuming than
leverage correction, but the estimate of the residual variance is more reliable.
No weighting is used in this model, i.e. all weights are set to 1.0, so that we can see which variables actually vary the most. However, sensory variables are often weighted when you investigate relationships with other variables. The most common weighting is 1/SDev.
Click OK to start the PCA. You see how The Unscrambler makes a PCA model for each segment, twelve in
all. Finally, the global model is made and the residual variance curve is shown for this model.
Task
Interpret the residual variance curve in the PCA Progress dialog, which displays the progress of the modeling. The residual variance should decrease as the number of PCs in the model increases, and should be as small as possible.
How to Do It
The residual variance decreases until PC 5 is reached. Then the residual variance increases again due to overfitting. The important decision we face now is to select the optimal number of PCs in this model.
The lowest residual variance is found with 5 PCs, but the residual variance in a model using 3 PCs is not much
worse. A simple model is more robust than a complex one, and easier to interpret. We therefore choose to work
with a model consisting of 3 PCs.
Note that the residual variance shown here is the residual variance for X, while the regression models made in
Tutorial A and later in this tutorial show the residual variance for Y. This reflects the difference between PCA
and regression models. PCA focuses on one matrix, X, containing variables which describe the samples;
regression models like PLS focus on a second matrix, Y, which contains variables to be predicted.
Press View to take a closer look at the other model results. The residual variance turns up again in the PCA
Overview, (Lookup Image B009) which consists of four plots that reveal a lot of information.
The Viewer which you are now looking at has the most common model results available for you as predefined
plots in the Plot menu. You can always get this display of your model back via the Results menu. Let us look
at the different plots in the PCA overview.
Task
Change the residual variance plot to an explained variance plot.
How to Do It
Activate the lower right plot by clicking in it. Select View - Source and change this option from Residual Variance to Explained Variance. You can also reach this menu option by right-clicking in the plot, or by using the corresponding toolbar button for explained variance. A more elaborate way of doing this is to make the plot once again using Plot - Variances and RMSEP, but the other ways to change the plot are preferred because they are faster.
The residual variance is now converted to explained variance (Lookup Image B010). The information is the
same, but presented in another way. The residual variance is well suited to find the optimal number of PCs to
use in a model, while the explained variance is a better measure to tell how much of the variation in the data
the model describes.
You see that a model with 3 PCs describes almost 92% of the validated variation in the data; for calibration it is 97%. You can read the value by clicking on the data point in the plot. Use the toolbar buttons to switch between plotting only the calibrated variance curve, only the validated one, or both.
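The relation between the two views is simple: the calibrated explained variance per PC is the squared singular value of the centered data matrix over the total. A sketch of this computation; random numbers stand in for the Sensory matrix here, so the percentages will not match the tutorial's.

```python
import numpy as np

# Explained variance from PCA by SVD on a mean-centered matrix: the
# fraction explained by each PC is its squared singular value over the
# total. Random data stand in for the Sensory matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 12))
Xc = X - X.mean(axis=0)                          # center the data

_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                  # per-PC fraction
cumulative = np.cumsum(explained)                # running total, ends at 1.0
```

The residual variance after k PCs is just 1 minus the cumulative explained variance at k, which is why the two plots carry the same information.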
B010 The PCA Explained Variance plot; B011 The PCA Scores plot
Task
Interpret the Scores plot. Use different plot options to ease interpretation.
How to Do It
The score plot shows the projected locations of the objects onto the PCs, and by studying patterns you may
find the meaning of the PCs. (Lookup Image B011) There are many patterns to be detected from score (and
loading) plots.
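What the score plot displays can be stated as one matrix identity: the scores are the projections of the centered data onto the loadings. A brief sketch, again with random stand-in data:

```python
import numpy as np

# Scores are the projections of the centered samples onto the PCs, and
# loadings give each PC's direction in variable space. Random data stand
# in for the Sensory matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 12))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                                   # one row per sample
loadings = Vt.T                                  # one column per PC

# Projecting the centered data onto the loadings reproduces the scores.
assert np.allclose(Xc @ loadings, scores)
```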
On the score plot, you will notice that the 12 samples are not arranged in a random way on the map. When you
move from the left to the right part of the plot, you first encounter samples harvested on time H1, then H2 and
finally H3. Moreover, if you now move from the top to the bottom, you see several C4 samples first, then C3,
then C2, and finally C1.
The category variables that were inserted into the data table will make things even clearer. Select Edit -
Options. Select the Sample Grouping tab and tick Enable Sample Grouping. Choose the following
options: (Lookup Image B012)
Separate with: Colors
Group By: Value of Variable; Levelled Variable
Markers Layout: Name
You may press Select in the Group By field to select the levelled variable that you want to use as a marker. It
launches an Editor where you can mark the category variable of your choice, for example variable 1: Cultivar.
B012 The Sample Grouping sheet in the Options dialog; B013 The Options dialog, Markers Layout before and after adjusting the Markers Layout
Now, we are going to alter the Markers Layout a little. The sample names are entered with an underscore in
the data table. We are going to remove this underscore from the markers in the plot.
Click once in the fifth box in the Name sequence. All boxes that are ticked correspond to letters in the sample
names that will be displayed. Press the <Ctrl> key and click the third box to remove the third character (i.e. the
underscore). (Lookup Image B013)
Note:
The first click marks the beginning of a range; the second click marks the end. Make it a habit to click twice whenever you want to mark a range of marker characters: once for the beginning of the range and once for the end. Press the <Ctrl> key while you click a box to (de-)select a single box in the marker.
Press OK. The Scores plot is updated with the Sample grouping options. Each level of the category variable is
assigned a unique color, and the markers in the plot are displayed without the underscore. (Lookup Image
B014)
B014 PCA, Scores plot with samples grouped by colors
Try to perform a new sample grouping, this time based on the category variable Harvest Time.
Task
Interpret variable relationships in the correlation loadings plot.
How to Do It
Activate the X-Loadings plot by clicking on it, then use menu View - Correlation Loadings or the corresponding shortcut button. The Correlation Loadings plot is best suited to studying variable correlations. (Lookup Image B015)
B015 PCA, Correlation Loadings (X) plot (PC1 vs PC2)
The plot shows that two variables (REDNESS and COLOUR) have an extreme position to the right of the plot
along PC1. They are close to each other, and far from the center, very close to the 100% explained variance
circle; they correlate positively. This also means that objects lying to the right of the score plot have higher
values for those two variables.
Along the vertical axis (PC2), you notice two variables lying at the top (R.SMELL and R.FLAV), opposed to
variable OFF FLAV which lies at the bottom. So we see that raspberry smell and flavor correlate positively
with each other, and negatively with off-flavor. Thus, the more you move up on the score plot, the more the
smell and flavor of the samples will be characteristic of raspberries.
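A correlation loading is simply the correlation between an original variable and a PC's score vector, which is why strongly correlated variables land near the 100% circle together. A sketch of that computation for the first two PCs, with random stand-in data:

```python
import numpy as np

# Correlation loadings: the correlation between each variable and the
# score vector of each PC; values near +/-1 sit near the 100% circle.
rng = np.random.default_rng(2)
X = rng.normal(size=(12, 12))                    # stand-in data
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s

corr_loadings = np.array(
    [[np.corrcoef(Xc[:, j], scores[:, k])[0, 1] for k in range(2)]
     for j in range(Xc.shape[1])]
)
```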
Task
Relate Scores (samples) information to Loadings (variables) information.
How to Do It
The Scores plot and Correlation Loadings plot show that samples C2H3 and C1H3 have strong color and redness intensities, while sample C1H2 has much off-flavour. Samples in one region of the two-vector score plot generally have much of the properties of the variables pointing in the same direction in the loading plot, provided that the plotted PCs describe a large portion of the variance.
PC 3 describes the variation in sweetness, bitterness and chewing resistance. Confirm this by activating the
loading plot (upper right quadrant) and selecting Plot - Loadings. Display PC 1 vs. PC 3 by changing
Vector 2 in the Components field in the Loadings dialog to 3. (Lookup Image B016)
B016 The Loadings dialog
On this new plot, the horizontal axis is unchanged (PC1) and the vertical axis is PC3. Use View -
Correlation Loadings to better interpret variable correlations along PC3.
Task
Interpret the influence plot, which is used to look for outliers.
How to Do It
The influence plot is displayed in the lower left quadrant. The strongest outliers are placed in the upper right
corner of the plot, and have a large leverage and a high residual variance. In this particular case, we do not see
any outliers. (Lookup Image B017)
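The two coordinates of the influence plot can be computed directly from the model: leverage from the scores, and residual variance from what the k retained PCs fail to describe. A sketch with random stand-in data:

```python
import numpy as np

# Influence-plot coordinates: each sample's leverage (from the scores)
# against its residual X-variance after k PCs. Random stand-in data.
rng = np.random.default_rng(3)
X = rng.normal(size=(12, 8))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3                                            # PCs kept in the model
T = (U * s)[:, :k]                               # scores for k PCs

# Leverage: the diagonal of the hat matrix T (T'T)^-1 T'.
leverage = np.einsum('ij,ij->i', T @ np.linalg.inv(T.T @ T), T)

E = Xc - T @ Vt[:k]                              # residual matrix
res_var = np.mean(E**2, axis=1)                  # residual variance/sample
```

A sample with both high leverage and high residual variance sits in the upper right corner of the plot and is the strongest outlier candidate.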
B017 PCA, Influence Plot
Close the PCA overview and save the results file with the name Tutorial B PCA. Close all other Viewers you
may have open at the same time.
Task
Make a PLS2 regression model that predicts the variations in sensory variables from instrumental and chemical
variables.
How to Do It
Select Task - Regression. Specify the following parameters in the Regression dialog: (Lookup Image
B018)
Method: PLS2
Samples: Calibration Sam [12]
X-variables: Instrumental [6]
Y-variables: Sensory [12]
Weights: All 1/SDev in X and Y
Validation Method: Cross Validation
Number of components: 6
Press Weights to launch the Set Weights dialog. (Lookup Image B019) Press All to change the weighting
of all variables at the same time. You can also select the variables by clicking on them in the list. Remember to
hold <Ctrl> down while you select several variables. Choose the A/(SDev+B) radio button. Use constants A = 1 and B = 0.
We are weighting all variables by dividing them with their own standard deviations. This allows all variables
to contribute to the model, regardless of whether they have a small or large standard deviation from the outset;
what really counts is the systematic variation.
Press Update and see the weights change in the list, then click OK.
Remember to adjust the weights for both X-variables and Y-variables.
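The 1/SDev weighting with A = 1 and B = 0 amounts to dividing each variable by its own standard deviation. A short sketch with random stand-in data (whether the software uses the n-1 or n denominator for SDev is an assumption here; ddof=1 is used below):

```python
import numpy as np

# 1/SDev weighting: each variable is divided by its own standard
# deviation, giving every variable the same chance to influence the model.
rng = np.random.default_rng(4)
X = rng.normal(loc=5.0, scale=[0.1, 1.0, 10.0], size=(12, 3))

weights = 1.0 / X.std(axis=0, ddof=1)            # A/(SDev+B) with A=1, B=0
Xw = X * weights                                 # weighted data matrix
```

After weighting, every column has unit standard deviation, so a variable's influence depends on its systematic variation rather than its original scale.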
Press Setup to launch the Cross Validation Setup dialog and choose Full Cross Validation as the cross
validation method. Normally it is more practical to use leverage correction in the first calibration runs to detect
outliers etc., and re-calibrate with a proper validation method (e.g. cross validation) as the last step.
Click OK in the Regression dialog when you have set all parameters. The PLS2 Regression Progress dialog shows how the different segments are made before the final model is calibrated. The prediction error is minimized after five PCs, but the first local minimum is 0.84 after two PCs, which we choose to avoid overfitting.
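The rule applied here, taking the first local minimum of the validation residual curve rather than the global minimum, can be sketched as a small function. The RMSEP values below are illustrative, not the actual curve from this model.

```python
# Picking the number of PCs at the first local minimum of the validation
# residual curve rather than the global minimum, to avoid overfitting.
# The RMSEP values below are illustrative only.
rmsep = [1.20, 0.84, 0.90, 0.88, 0.80, 0.82]     # index 0 = 1 PC

def first_local_minimum(curve):
    for i in range(1, len(curve) - 1):
        if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]:
            return i + 1                          # 1-based number of PCs
    return len(curve)

print(first_local_minimum(rmsep))  # -> 2
```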
This Viewer is your gateway to your model. You can choose the most useful and common predefined result
plots, e.g. loading weights and residuals, from the Plot menu. At later stages you can always review this model
by using Results - Regression and selecting this results file.
Before we continue with the interpretation, let us take a look at the warnings that were issued during the
calibration.
Task
Interpret the warnings given for this model in the Warning List.
How to Do It
Use Window - Warning List to display the warnings at the bottom of the screen. In this case, the warnings
relate to the variance curves and do not indicate any outliers in the data set.
You may also want to take a look at the actual tests that led to these warnings by looking at the outliers list.
Click the Outliers button in the warning window to see the outlier tests displayed in the Outlier List dialog.
For details on how to find and identify outliers, see Tutorial C.
Task
Interpret the explained variance curve, which can be shown either as residual variance, as in the PLS Regression Progress dialog, or as explained variance. The two views are useful for different tasks.
How to Do It
The residual variance plot in the lower left corner is the same as you saw in the PLS Regression Progress
dialog. We saw that a local minimum was reached with two PCs. Now we want to look at how well each of the first six Y-variables is described by the model. We do this by looking at the explained variance.
Activate the lower left window. Select Plot - Variances and RMSEP and use the X- or Y-variance tab,
where you specify the following parameters: (Lookup Image B021)
Variables: Y; 1-6
Samples: Validation
And check the Total box. Press OK.
B021 Variances and RMSEP dialog, X- or Y-variance sheet; B022 PLS2, Explained Validation Variance plot displayed for the Total model and for the six individual Y-variables
Make sure that the plot shows the Explained Variance. If not, change it by selecting View - Source -
Explained Variance. (Lookup Image B022)
We concluded from the residual variance curve that two PCs were optimal. Here, we see that the variables that
are well described reach that level of description with two PCs. About 85% of the color variation (variables 1 and 2), and 80% of
the variation in sweetness (variable 6), can be explained by a combination of the chemical and instrumental
variables.
Note that only 23% of the total Y-variance is explained by the model using two PCs.
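The relationship between the two views is simple: explained variance is the complement of residual variance relative to the total. The sketch below illustrates the conversion with made-up residual variances, not values from the tutorial data.

```python
# Convert residual Y-variances (after 0, 1, 2, ... PCs) into explained
# variance percentages. The numbers are made-up illustrations.
residual = [100.0, 40.0, 15.0, 12.0, 11.5]   # residual variance per PC count
total = residual[0]                           # PC 0: nothing explained yet

explained = [100.0 * (1.0 - r / total) for r in residual]
# explained[0] is 0%; each later entry is the cumulative % explained
```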
Task
Interpret the score plot.
How to Do It
The score plot shows patterns in the samples. This is often difficult to see without some help. Use the category
variables as markers the same way you did in Tutorial B - Interpretation of the Score Plot for the PCA
model, using Edit - Options from the Scores plot, and selecting the relevant options in the Sample
Grouping tab of the Options dialog.
You see that PC 1 describes the harvesting time. Harvest time 1 is placed to the left in the plot and harvest time
3 to the right. The score plot does not reveal information about the cultivars.
A comparison with the loading plot gives more information. Try to interpret the two plots (Scores and
Loadings) together.
Task
Interpret the loadings plot.
Interpret the loading weights plot.
How to Do It
The loadings plot is located in the upper right quadrant. Activate it and select Plot - Loading Weights.
(Lookup Image B023) On the General sheet, make sure you plot both X and Y, which gives you the
loading weights for X and the loadings for Y. Plot PC1 vs. PC2 in the upper right corner.
B023 Loading Weights dialog, General sheet
B024 Loadings Plot
Draw straight lines between the variables through the origin. Variables along the same line, far from the origin,
may be correlated. (Negatively correlated when situated on opposite sides of the origin.) (Lookup Image
B024)
It seems that the spectrophotometric color measurements (L, A, and B) are strongly negatively correlated with
color intensity and redness. Sweetness is, as expected, rather strongly negatively correlated with measured
Acidity. But the R. Flavor shows weak correlation to the PLS-factors (near origin = low PLS loadings).
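The geometric rule used here — variables along the same line through the origin are correlated, on opposite sides negatively correlated — can be sketched numerically: for variables well described by the two PCs, the cosine of the angle between their loading vectors approximates their correlation. The coordinates below are hypothetical, not the tutorial's actual loadings.

```python
import math

def cos_angle(p, q):
    # cosine of the angle between two 2-D loading vectors
    dot = p[0] * q[0] + p[1] * q[1]
    return dot / (math.hypot(*p) * math.hypot(*q))

color = (-0.50, 0.30)     # hypothetical loading coordinates
redness = (0.45, -0.27)   # same line, opposite side of the origin

print(cos_angle(color, redness))  # close to -1: strong negative correlation
```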
We learned in Problem I that the jam quality varied with respect to color, flavor, and sweetness. But the
results so far in Problem II show that the chemical and instrumental variables mainly predict variations in color
and sweetness (as indicated by the low explained Y-variance of Flavor). This means that we cannot
replace the Y-variable Flavor with the present set of X-variables. There is no information in the chemical and
instrumental measurements we have made that is related to the Flavor content in the jam samples.
Use of other instrumental X-variables, e.g. gas chromatographic data, could probably have increased the flavor
prediction ability of the raspberry jam data.
Task
Interpret the predicted vs. measured plot.
How to Do It
The predicted vs. measured plot in the regression overview currently displays the results for the first Y-
variable. (Lookup Image B025) Use Plot - Predicted vs Measured to see how the predictions are for
other variables. Make sure to display these plots for two PCs, as this is the right number of PCs for our model.
B025 PLS2, Predicted vs Measured Plot for variable Redness, model with two PCs
Close the results Viewer and save it with the name Tutorial B Inst-Sens.
Task
Make a PLS1 regression model of the relationships between sensory data and preference.
How to Do It
From the Editor, select Task - Regression, and specify the following parameters in the Regression dialog:
Method: PLS1
Samples: Calibration Sam [12]
X-variables: Sensory [12]
Y-variables: Preference [1]
Weights: All 1/SDev in X and Y
Validation method: Full Cross Validation
Uncertainty test: on
Number of components: 6
Press Weights to launch the Set Weights dialog, and weight all variables with 1/Sdev to get them in the
same range and let them contribute equally in the modeling.
Press OK. In the PLS1 Regression Progress dialog, we see that the residual variance seems to decrease all
the time, which may lead us to think that we should use five or six PCs for predictions. Let us look at the residual
variance plot in the regression overview before we decide upon the number of PCs to use. Click View to open the
regression overview. (Lookup Image B026)
Task
Interpret the regression overview plots, which display the necessary plots to diagnose the model quickly.
How to Do It
We are mostly interested in how well the model can do the predictions. We therefore only comment on the
residual variance and the Predicted vs Measured plots.
B027 PLS1, Residual Validation Variance Plot
B028 The Predicted vs Measured dialog
Predicted vs Measured
Activate the predicted vs. measured plot and select Plot - Predicted vs Measured. Specify the following
parameters in the Predicted vs Measured dialog: (Lookup Image B028)
Y-variable: 1
Components: 2
Samples: Validation
Press OK.
Turn on the regression line and the target line with View - Trend Lines. (Lookup Image B029)
We see that the predictions are fairly good. Some samples are not so well predicted, but the overall correlation
coefficient is good. The warnings issued are of no real consequence for this model.
B029 PLS1, Predicted vs Measured Plot with trend lines (Predicted Y vs Measured Y; plot statistics:
Elements: 12, Slope: 0.838829, Offset: 0.669368, Correlation: 0.921301, RMSEP: 0.830774,
SEP: 0.855452, Bias: -0.139174)
B030 PLS1, Regression Coefficients Plot
Tutorial B - Interpretation of the Regression Coefficients
The regression coefficients are used to calculate the response value from the X-measurements. The size of the
coefficients gives an indication of which variables have an important impact on the response variables.
There are two kinds of regression coefficients, Bw and B. The Bw coefficients are calculated from the
weighted data table and are used for interpretation. The B coefficients are calculated from the raw data table
and are used for predictions.
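The connection between the two kinds of coefficients can be sketched for the common case where each X-variable was weighted with 1/SDev. This is a simplified illustration with hypothetical numbers; the exact back-transformation in The Unscrambler also accounts for the Y-weighting and the intercept B0.

```python
# Sketch: un-weighting Bw coefficients into raw-data B coefficients when
# X was scaled by 1/SDev. All numbers are hypothetical.
bw = [0.40, -0.25, 0.10]     # coefficients on the weighted (scaled) scale
sdev_x = [2.0, 0.5, 4.0]     # SDev of each raw X-variable

# since x_weighted = x / sdev, the raw-scale coefficient is bw / sdev
b = [w / s for w, s in zip(bw, sdev_x)]
```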
Task
Find which variables are important for predicting Y-variable Preference.
How to Do It
The estimated regression coefficients tell us the cumulative importance of each of the sensory variables to the
consumer preference.
Select Plot - Regression Coefficients. Double-click on the preview screen to make the plot fill the whole
Viewer. Choose the Weighted coefficients (BW) option. Specify 2 Components before you click OK.
(Lookup Image B030)
Use Edit - Options to change the layout of the plot to Bars. Then, select Edit - Mark - Significant
X-Variables Only. (Lookup Image B031)
B031 PLS1, Regression Coefficients Plot after automatic marking of significant X-variables
Redness, Color and Sweetness are statistically significant in predicting Preference. Raspberry Smell is also
significant, but contributes negatively to the Preference. Thickness also seems important, as it has a
large (negative) coefficient; however, it is not marked as significant in this model.
Task
Import regression coefficients into an Editor.
How to Do It
Select File - Import - Unscrambler Results and select Import data into New data table in the Import
Target dialog. (Lookup Image B032) Select the file Tutorial B Sens-Pref in the Import dialog. You
will find the file when you use File of Type: Regression.
B032 The Import Target dialog
B033 The Import from Regression Result dialog
In the Import from Regression Result dialog, mark the matrix B and select PCs: 2 in the field below the
matrix list, and then select B0 as well in the matrix list. (Lookup Image B033)
Note that B may be used for prediction of new, un-weighted data, while Bw (studied above in the Regression
Viewer) should be used with new, weighted data. Always identify important variables by studying Bw when
the data used in the model have been weighted.
Click OK. An Editor with the regression coefficients is launched. (Lookup Image B034) The b-coefficients
can then be treated like any other data in an Editor. You may plot the coefficients from the Plot menu, etc.
B034 Editor with the imported B coefficients from the PLS1 model relating
Preference to sensory properties
Close the Editor with the imported B-coefficients before you proceed.
Task
Export the regression model used to predict Preference from Sensory Data.
How to Do It
Select Results - Regression and find the result file Tutorial B Sens-Pref. Mark it and look at the
information given in the lower part of the dialog. Here you see which Sample and Variable Sets were used in
the modeling, whether you used weighting, etc. The information given here is very useful when you want to
find a particular model at a later stage.
Click on the Export button. This launches the Export Model dialog. (Lookup Image B035) Select
Ascii-Mod to launch the dialog Export ASCII-MOD.
B035 The Export Model dialog
B036 The Export ASCII-MOD dialog
The optimal number of components should be used in the export. Therefore, change the number of PCs to 2
before you click OK. (Lookup Image B036)
Full ASCII-MOD export includes all results that are necessary to do outlier detection, etc. You may want to
use this format if you need to use Unscrambler models outside The Unscrambler, for example in a program you
wrote yourself. The ASCII-MOD file is readable by any ASCII editor.
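A program of your own reading such an export would, at its simplest, apply the linear prediction equation yhat = B0 + sum of Bi times xi. The sketch below uses hypothetical coefficient values standing in for a parsed ASCII-MOD file; the real format also carries the matrices needed for outlier detection.

```python
# Minimal use of exported regression coefficients outside The Unscrambler.
# b0 and b are hypothetical stand-ins for values parsed from an export.
b0 = 1.5
b = [0.8, -0.3, 0.05]

def predict(x):
    # yhat = b0 + sum of coefficient * raw X-value
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

print(round(predict([2.0, 1.0, 10.0]), 4))  # 1.5 + 1.6 - 0.3 + 0.5 = 3.3
```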
Task
Predict the Preference for the jam samples.
Interpret the prediction results to see whether the predictions can be trusted.
How to Do It
Activate the Tutor_b Editor. Select Task - Predict and specify the following parameters in the Prediction
dialog: (Lookup Image B037)
Samples: Prediction Sam [8]
X-variables: Sensory [12]
Y-reference: Not included
Model: Tutorial B Sens-Pref
Number of Components: 2
Click OK to perform the prediction.
B037 The Prediction dialog
B038 Prediction Results, Predicted with Deviation Plot
Tutorial B - Interpretation of Predicted with Deviation
No reference measurements were made for the samples in the Prediction Sam Set. This makes it impossible
to check predicted vs. measured values. Because we have made a model based on projection, we have one
option left: to check the reliability of the predictions from the deviations.
Task
Interpret the Predicted with Deviation plot.
How to Do It
Click View in the Progress dialog to see the predicted with deviation plot. (Lookup Image B038)
The predicted preferences for the unknown new jams have rather wide uncertainty limits, i.e. the accuracy of new
predictions is limited. Still, this model can be used to predict the preference of new jam samples and give an
indication of which ones will be accepted by customers.
Save the results file under the name Tutorial B Predict 1.
Task
Plot the RMSEP.
How to Do It
The information you need is stored with the PLS model Tutorial B Sens-Pref. Therefore, we have to find a
way to look at those old results. This is done by opening a results Viewer again. Select Results -
Regression. Mark the model and click View. The regression overview appears.
Select Plot - Variances and RMSEP and go to the RMSE sheet. Double-click on the preview screen to fill
the whole Viewer with the RMSE plot. (Lookup Image B039)
B039 Variances and RMSEP dialog, RMSE sheet
B040 PLS1, Root Mean Square Error Plot
Select only the RMSEP in the Samples section. Click OK. (Lookup Image B040)
Now you can study the RMSEP for Preference for all PCs. RMSEP (using two PCs) is 0.83. This means that
any predicted new sample on the scale from 1 to 9 will have a prediction error around 0.8. This is an acceptable
error level in sensory analysis, which has much uncertainty in all measurements.
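For reference, RMSEP is simply the root mean square of the prediction errors over the validation samples. A sketch of the formula with made-up predicted/measured pairs, not the tutorial's actual samples:

```python
import math

# RMSEP from predicted vs measured values; the pairs are made up and
# merely illustrate the formula.
measured  = [7.0, 5.5, 3.0, 8.0]
predicted = [6.4, 6.1, 3.5, 7.2]

errors = [p - m for p, m in zip(predicted, measured)]
rmsep = math.sqrt(sum(e * e for e in errors) / len(errors))
print(round(rmsep, 2))  # 0.63 for these toy numbers
```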
Spectroscopy and Interference Problems (Tutorial C)
Description of Tutorial C
Context of Tutorial C
We need an easy way to determine the concentration of dye (a brightly red-colored heme protein, Cytochrome-
C), predicted variable Dye, in water solutions. Dye absorbs light in the visible range, and we want to base the
concentration determination on this light absorbance.
In the solutions to be analyzed there are varying, unknown amounts of milk, which absorbs some light in the
same wavelength range as dye and therefore causes chemical interference in the measurements. In addition,
milk contains particles that give serious light scattering.
Another effect that will influence the absorbance spectra is the varying sample thickness.
The Light Absorbance Spectrum figure shows the light absorbance spectrum of one sample of the
dye/milk/water solution (Lookup Image C001). The vertical lines represent the 16 different wavelength
channels selected as predicting variables, (x1, x2, ..., x16), for this sample.
This example is constructed so that it can be duplicated in a lab, because it illustrates well the interference
effects and other effects that make spectroscopy difficult. However, similar problems occur in many industrial
applications, e.g. when measuring the concentration of different chemical species in sewer water, which contains
many other chemical agents, as well as physical interferences like slurries and particles.
The two major peaks (channels x4 and x6) represent the absorbance of dye, while the first peak (x2)
represents absorbance due to an absorbing component in the milk. The broad peak to the right (x12, x13 and
x14) is due to light absorption by water itself.
A problem similar to this tutorial is described extensively in chapter 8 in the book Multivariate Calibration,
by Martens & Naes.
Note that the known Milk and Water quantities will not be used to make the model, only as descriptors in
result plots. The sample names are coded with these quantities as well.
Note: You will find the illustrations for this tutorial (Image C001, etc) at the end of the document.
Task
Open the data table and take a look at the properties of the data. Then define Sets to be used in the analyses.
How to Do It
Select File - Open and the file Tutor_c from the Examples directory. An Editor with the data table is
launched.
Go to Modify - Edit Set to define the necessary Variable and Sample Sets for later analyses in the Set
Editor. Define the Variable Sets and Sample Sets by clicking Add and entering the intervals given here:
Sample Sets:
Name: Calibration
Interval: 1-28
Name : Prediction
Interval: 29-42
Click OK when you have finished defining the variable and sample sets, and save the Editor before you
continue.
Task
Plot some calibration samples in order to see how the spectra vary with varying amount of dye and milk.
How to Do It
We want to plot samples that have the same amount of milk, 10 ml. Do this by marking the samples in the
Editor (samples 6, 14, 19, and 23). Use Edit - Select Samples and specify the sample numbers in the dialog
(Lookup Image C002). The Selection method should be Select. Click OK and you see that the four
samples are marked in the Editor. You could do the same by clicking the sample numbers while holding down
the <Ctrl> key.
Select Plot - Line and specify that you wish to use the Variable set Absorbance in the Line Plot dialog
(Lookup Image C003).
These four samples have the same milk level, and the plot shows that the dye level has influence on the
absorbance of wavelengths number 2 - 8 only.
Plot samples 20, 21, 22, and 23 the same way. These samples have the same dye level, 6 ml.
The plot shows that increasing milk level will increase the absorbance of light of all wavelengths from number
1 to number 16. There seems to be a great deal of interference or scattering to deal with, over the whole
spectrum. This indicates that we may have to do some transformations of our data to get an optimal model.
Close the Viewer so that the Editor with the data is active.
Task
Find the best wavelength on which to make a univariate regression model.
How to Do It
You find the best wavelength by looking at the correlation between each absorbance variable and the Dye level
variable. Activate the Tutor_c Editor. Select Task - Statistics and specify the following parameters in the
Statistics dialog (Lookup Image C005).
Samples: Calibration [28]
Variables: Statistical [17]
Click Close instead of View and save the result file with the name Tutorial C Statistics. We are going to
import the correlation matrix from the result file into an Editor instead.
Select File - Import - Unscrambler Results. Specify New data table in the Import Target dialog to
avoid overwriting the data table in the Editor.
In the Import dialog, change the Files of type to Statistics and select Tutorial C Statistics before you click
Import to launch the Import from Statistics Result dialog (Lookup Image C006). The matrix where the
correlation results are stored is called StatCorr and you should import Group 1.
The values in the Editor are the correlation coefficients between the variables. The variable with the highest
correlation coefficient to Dye Level is Xvar6, with a correlation coefficient of 0.49. Close the Editor with the
correlation matrix; you do not need to save it.
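What the StatCorr import gives you can also be sketched directly: compute the Pearson correlation of each wavelength with the Dye Level and keep the one largest in absolute value. The toy data and channel names below are hypothetical, not the Tutor_c values.

```python
# Sketch: picking the single best wavelength by correlation with the
# response. Data are made up; only the procedure matches the tutorial.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

dye = [1.0, 2.0, 3.0, 4.0]                  # hypothetical Dye Level
channels = {
    "Xvar5": [0.2, 0.3, 0.1, 0.4],          # weakly related (hypothetical)
    "Xvar6": [0.5, 0.9, 1.4, 1.8],          # strongly related (hypothetical)
}
best = max(channels, key=lambda name: abs(pearson(channels[name], dye)))
print(best)  # Xvar6
```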
Now we should illustrate the regression in a plot. To get the right plot we have to copy Xvar6 to a variable left
of Dye Level. Mark the Xvar6 variable in the Tutor_c Editor. Then hold down the <Ctrl> key, click inside the
marked column, and drag Xvar6 until the Dye Level variable is framed. Release the mouse button
and Xvar6 is copied (Lookup Image C007).
Mark the two variables and select Plot - 2D Scatter. Remember to plot only the calibration samples
(Lookup Image C008).
Turn on the Regression Line and Target Line with View - Trend Lines, if they are not turned on by
default. Hopefully we can do better with multivariate regression models. Close the Viewer after you have
studied the plot. Mark the copied variable in the Editor (column 3) and delete it.
Tutorial C - Calibration
We choose to make a PLS regression model because PLS takes the variation in Y into consideration when the
model is calibrated.
Task
Make a PLS regression model between the variable set Absorbance (X) and the variable set Dye Level(Y).
How to Do It
Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
parameters:
Method: PLS1
Samples: Calibration [28]
X-variables: Absorbance [16]
Y-variables: Dye Level [1]
Weights: All 1.0 in X and Y
Validation method: Leverage Correction
Num PCs: 10
Start the calibration by clicking OK.
Task
Find an outlier by looking at warnings and plots.
How to Do It
Click View to enter the Regression Overview plot. This shows the most important regression results, but
we are more interested in the warning list. Select Scores plot by clicking on it. (Lookup Image C009)
Select Window - Warning List if the warning list is not visible. A dockable view appears with all warnings
listed (Lookup Image C010). The first warnings indicate that some samples are outliers. Look for further
information in the outlier list by clicking the Outliers button.
Sample 8 is listed frequently. Investigate that sample further by plotting the raw data table.
Activate the Tutor_c Editor, mark samples 7, 8, 9, and 10. Select Plot - Line and use the Variable set
Absorbance (Lookup Image C011).
It is obvious that the pattern in sample 8 is atypical. Samples that are very different from the others may distort the
model so much that it becomes useless for future use. So this sample should not be included in the calibration
samples used to make the model.
The detection of outliers and the way you should treat them is an important, but difficult task. It makes no
sense to interpret the model as long as outliers are present. Close the Viewer with the line plot and save the
result file with the name Tutorial C. Now you should make a new model without the known outliers.
Task
Make a new PLS1 model with the same parameters as before, but with sample 8 kept out of the calculations.
How to Do It
Activate the Tutor_c Editor and select Task - Regression. In the Regression dialog, specify the following
parameters:
Method: PLS1
Go to the Samples sheet and click Select next to the Keep Out of Calculation field. An Editor pops up
where you can mark the samples that should not be used in the calibration. Mark sample 8. Several samples can be
marked by holding down <Ctrl> while clicking on the samples. If you mark some other samples by mistake,
you may deselect them by holding down <Ctrl> while you click on the undesired samples. Click OK and
sample 8 is inserted in the Keep Out of Calculation field.
There are still some warnings issued, but they do no real harm to the model. We proceed
to look at other modeling results. Do not click View yet!
Task
Study the residual variance in the model.
How to Do It
We want to study the prediction error in the screen output.
(Lookup Image C012). The horizontal bars in the PLS1 Regression Progress dialog indicate the residual
variance after each PC. The first bar, in PC 0, represents the total variance. The second bar is about 10%
smaller, meaning that PC no 1 explains about 10% of the total variance. After PC no 2 about 2/3 of the total
variance has been explained. The numerical value of the residual Y-variance is shown, too.
The variation in the calibration samples cannot be described significantly better with any new PC after the first
five PCs. Very little additional variance is explained up to PC 8, but we still have not explained all of the variance.
After 8 PCs, observe how the prediction variance increases slightly again, due to overfitting and noise
modeling.
The minimum estimated residual variance is less in this run than in the previous run: now 1.1 compared to 2.2
in the first model. It seems that seven PCs will give the optimal model. Eight PCs give a smaller variance, but
the difference is too small to motivate the use of more PCs.
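The rule of thumb applied here — prefer the smallest number of PCs whose residual variance is not meaningfully worse than the minimum — can be sketched as below. The variance values and the 5% tolerance are illustrative, not The Unscrambler's actual significance test.

```python
# Pick the number of PCs: the smallest count whose residual variance is
# within a tolerance of the overall minimum. Values and the 5% rule are
# illustrative only.
residual = [10.0, 8.9, 3.4, 2.5, 1.8, 1.4, 1.25, 1.1, 1.05, 1.2, 1.3]  # PCs 0..10

best = min(residual)
optimal = next(k for k, r in enumerate(residual) if r <= best * 1.05)
print(optimal)  # 7 with these numbers: 8 PCs barely improve on 7
```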
If the model has successfully described systematic variation, we start to interpret different additional modeling
results. The most important model results to study are then the Scores, the Loadings, and the Predicted vs
Measured plot.
Task
Interpret the plots in the regression overview.
How to Do It
The regression overview was launched when you clicked View (Lookup Image C013). It consists of four
plots of the most important modeling results from the regression model. Save the results file under the name
Tutorial C No Outliers before you continue.
The plot in the lower left corner is the residual variance. These are the same results as you saw in the regression
progress dialog while the model was being calibrated. We do not comment further on this plot.
Score Plot
The plot in the upper left corner is the Scores plot. From the Scores plot we can see that the combination
of the two main PCs, PC 1 and PC 2, reflects the variations in the milk and water levels. The milk level increases
from upper left to lower right in the plot, while the water level increases from right to left.
Select Edit - Options and go to the Sample Grouping sheet, where you check Enable Sample Grouping .
Go to Markers Layout and select Value of Variable where you specify Y-variables 1. The score plot now
reveals a clear pattern from lower left to upper right in the plot. You would see the same information in a 2D
scatter loading plot.
Regression Coefficients
The plot in the upper right corner displays a regression coefficients line plot instead of a 2D scatter plot of
loadings and loading weights (which is the default) when models are made from spectral data. This
happens because we changed the Data Type to Spectra (see section Read Data File and Define Sets). It is
easier to interpret the regression coefficients plot than loading and loading weights plots when the variables
are functions of another implicit variable, such as wavelength or time.
Use Edit - Options to change the plot layout to bars instead of a curve.
The regression coefficients plot summarizes the relationship between all predictors and a given response.
Task
Take a closer look at the residual variances in the error measures plots.
How to Do It
Activate the Predicted vs Measured plot and select Plot - Variances and RMSEP and go to the X- and
Y-variance sheet, where you specify the following parameters: (Lookup Image C014)
Variables: Remove the number in the X: and Y: boxes. Only the total variances should be plotted
Samples: Both Calibration and Validation
Change the variance from residual to explained by selecting View - Source - Explained Variance
(Lookup Image C015). The upper plot shows that the model describes much of the variance in the X-variables
in the first PCs, while the lower plot shows that more PCs are needed to describe the variance in Y (dye level).
We are interested in describing Y, therefore we have to include enough PCs in our model to get a high
explained variance for the Y-variable.
Note that the model results are available to you as predefined plots in the Plot menu when you have a result
Viewer active. Activate the Tutor_c Editor and see that the Plot menu changes to general plot options.
Sometimes you close the result Viewer by accident. You can then get the predefined plots back by selecting
Results - Regression and opening the Tutorial C No Outliers result file with the View button. The result
Viewer with the regression overview is launched.
Task
Correct the data for multiplicative scatter effects. Omit variables 1 to 8 in the Set Absorbance as important
variables.
How to Do It
First, we verify the need for MSC by looking at the Scatter Effects plot. This plot is available from a
Statistics model. Select Task - Statistics and specify the following parameters in the Statistics dialog:
Samples: Calibration [28]
Keep out of calculation (samples): 8
Variables: Absorbance [16]
Click OK to make the model and click View in the Progress dialog when it is finished. Then use Plot -
Statistics and select the Scatter sheet. Click the All button and then OK to make the plot.
The plot has to be scaled so the origin is shown. Do this with View - Scaling - Min/Max and enter 0 in both
From fields (Lookup Image C016). Click OK.
The regression lines intersect roughly at the origin, which indicates no need to correct for offset (Lookup
Image C017). But the regression lines have different slopes, which calls for MSC using common
amplification.
Close the Viewer before you continue.
Select Modify - Transform - MSC. Specify the following parameters in the Multiplicative Scatter
Correction dialog: (Lookup Image C018)
Samples: Calibration [28]
Variables: Absorbance [16]
Function: Common Amplification
Test Samples: 8
Omit Important Variables: 1-8
Test samples are not used when computing the correction factors for the MSC. Sample 8 is an
outlier and would give slightly inferior results if it were used. We therefore include it in the Test Samples to
keep it out of the calculation.
Variables 1-8 are omitted as important because the light absorption at these wavelengths varies with the dye level,
while wavelengths 9 to 16 (the water absorption peak) are independent of the concentration of dye. The
difference in these wavelengths is instead caused by the general light scatter due to the milk addition. It is
important that only wavelengths with no chemical information are used to find the correction factors.
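The correction itself can be sketched as follows: estimate each spectrum's amplification against the mean spectrum using only the chemically silent channels, then divide the whole spectrum by that factor (common amplification, no offset term). This is a simplified illustration with toy spectra, not The Unscrambler's exact implementation.

```python
# MSC, common-amplification variant: each spectrum is divided by its slope
# relative to the mean spectrum, with the slope estimated only from channels
# that carry no chemical information. Toy spectra with 6 channels.
spectra = [
    [1.0, 2.0, 1.5, 0.8, 1.0, 1.2],
    [2.0, 4.0, 3.0, 1.6, 2.0, 2.4],   # same shape, doubled by scatter
]
silent = [3, 4, 5]  # channel indices used to estimate the scatter slope

mean = [sum(col) / len(spectra) for col in zip(*spectra)]

def amplification(spec):
    # least-squares slope through the origin over the silent channels
    num = sum(spec[j] * mean[j] for j in silent)
    den = sum(mean[j] ** 2 for j in silent)
    return num / den

corrected = []
for s in spectra:
    amp = amplification(s)
    corrected.append([v / amp for v in s])
# after correction, both toy spectra collapse onto the same shape
```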
Save the MSC model with the name Tutorial C MSC Model. Save the corrected data now displayed in the
Editor with the name Tutorial C MSCorrected.
Look at the corrected data by launching a general Viewer (Results - General View ) and selecting Plot -
Line. Select the data file you just saved with the corrected data in the Line Plot dialog. Plot Samples 20 -
23 using the Variables Set Absorbance. (Lookup Image C019)
Now we are going to plot the original data, but this time not from the original data file. Select Plot - Line and
find the result file Tutorial C No Outliers (Lookup Image C021). You see that the raw data from which the
model is made is saved together with the model matrices. This time you do not have to specify a Variable Set
because the raw data used for the model is only the X-variables from that Set.
Plot samples 20 - 23 from Xraw (Lookup Image C022). You see that the MSCorrected data are different
from the original. The interference and light scatter effects have successfully been corrected for.
Task
Make a PLS1 model with the same model parameters as the model Tutorial C No Outliers.
How to Do It
Activate the Editor with the corrected data. Select Task - Regression and specify the following parameters
in the Regression dialog:
Method: PLS1
Samples: Calibration [28]
X-variables: Absorbance [16]
Y-variables: Dye Level [1]
Weights: All 1.0 in X and Y
Validation Method: Leverage Correction
Num PCs: 10
Click OK to make the model. See how the residual variance decreases faster for each PC in this model
compared to the previous models. The MSCorrection has improved the model.
Click Close and save the model under the name Tutorial C MSCorrected.
Task
Compare the validated residual Y-variances of the three models you have made.
How to Do It
Select Results - General View and then Plot - Line. Click the Browse button against Source and find the
result file Tutorial C. The matrix we are interested in is called ResYValTot (Lookup Image C023).
Select Edit - Add Plot and plot the same matrix for the model Tutorial C No Outliers and Tutorial C
MSCorrected.
The plot shows the validated residual Y-variance for the three models (Lookup Image C024). From this plot
we find approximately where the minimum residual Y-variance is reached:
for Tutorial C MSCorrected, using 6 PCs,
for Tutorial C No Outliers, using 8 PCs,
for Tutorial C, using 3 PCs.
Tutorial C MSCorrected with six PCs gives the lowest estimate for the residual Y-variance. Predictions done
by this model using six PCs therefore give the predictions with the lowest prediction error.
Note again how the Results menu is your way to look at results from older models.
Task
Finally, let us see how large an error in ml dye we have to expect in future predictions: the Root Mean Square
Error of Prediction.
How to Do It
Activate the regression overview Viewer. Select Plot - Variances and RMSEP and go to the RMSE sheet.
Double-click the screen preview to display the plot in the whole Viewer. De-select the calibration samples box
and tick the validation samples (RMSEP) instead (Lookup Image C025).
You see that the shape of the curve is exactly that of the residual variance, but the values have changed. The
plot says that predictions done with this model and using six PCs will have an average prediction error of 0.98.
Task
MSCorrect the prediction samples.
How to Do It
Go back to your data table Tutorial C MSCorrected containing the MSCorrected training samples. In order to
correct the prediction samples, use Modify - Transform - MSC. Specify the following parameters in the
Multiplicative Scatter Correction dialog: (Lookup Image C026)
Samples: Prediction [14]
Variables: Absorbance [16]
Use Existing MSC Model: Tutorial C MSC Model
Click OK and save the MSC coefficients. The prediction samples are then changed according to the
MSCorrection you found previously.
Task
Predict the dye level of these samples.
How to Do It
Select Task - Predict. Specify the following parameters in the Prediction dialog: (Lookup Image C027)
Samples: Prediction [14]
X-variables: Absorbance [16]
Y-reference: None
Model name: Tutorial C MSCorrected
Number of Components: 6
Click View after the prediction is done. The prediction overview plot appears, where the predicted values are
shown together with their deviations (Lookup Image C028). Large deviations indicate that the predictions
cannot be trusted.
Read Data: File - Open or File - Import. You can import data from many instruments - directly or via
e.g. JCAMP-DX or ASCII. Many instruments also write U5 data files or Unsc-ASCII data files.
View and Prepare Data: Look at the Editor, define sets. Select some samples and Plot - Line or Matrix
to get an overview of the spectra (data plot). Histograms of Y-variables are useful too, as well as 3D
scatter plots of constituents if there are several.
Pre-process: Modify - Transform allows you to perform spectroscopic transformations, differentiation,
smoothing, etc. Modify - Reduce (Average) may be useful too. If you have a data plot of your spectra
open, you will see how the spectra change on the fly.
Statistics: Task - Statistics may be useful. The Statistics plot Scatter reveals scatter problems.
Select Samples: If you need to discard data to get a more balanced data set, you may make a PCA of
the spectra or the constituents. From the Score plot, use Edit - Mark and mark samples that span all the
important components (samples far away from the origin, but not extremes). Select Task - Extract
Marked and save as a new file.
Reduce Spectra: If you need to use fewer wavelengths, or perhaps only a range of the spectra, select
Modify - Edit Set - Add - Special intervals - Select Every n variables - Update. You can now
change the starting point in the Interval field. Click OK twice to save the Set, e.g. under the name
NewSet. Then choose Edit - Select Variables - Set and select NewSet. The marked variables can now
be deleted, and you can save the new data file under a new name.
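Selecting every n-th variable, as in the Set definition above, corresponds to plain strided column selection. A trivial numpy illustration with an invented matrix:

```python
import numpy as np

# Invented 5-sample, 12-wavelength spectral matrix
X = np.arange(60, dtype=float).reshape(5, 12)

# Keep every 3rd wavelength, starting at column 0; change the start
# index to mimic changing the starting point in the Interval field
X_reduced = X[:, 0::3]
print(X_reduced.shape)   # (5, 4)
```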
Make First Calibration Model and Look for Outliers: Task - Regression - PLS2 gives a nice
overview if you have several constituents. Otherwise use PLS1. View the results, especially Variance,
Scores and Predicted vs Measured. When plotting results, use Edit - Mark (also available under the
right mouse button) to mark suspicious samples in the score plots. Plot - Sample outliers and XY
Relation outliers are useful to investigate them. You will see that the samples are marked in those plots
too (and all other sample based plots).
View - Raw data produces a link to the raw data table, highlighting the marked samples - or vice versa!
Mark in the raw data table and see them marked in the corresponding plots.
Refine the Model: Task - Recalculate without Marked, gives a new model with the marked samples
removed. Compare results, and look for more outliers. Repeat if necessary.
Study the Model in Detail: Plot - Variances and RMSEP - RMSE/Important variables/Predicted
versus measured are useful tools. View - Trend lines - Regression line and View - Plot
statistics are useful too. Score plots using Edit - Options - Sample grouping (also available under
the right mouse button) are excellent for investigating patterns.
Delete Wavelengths: From the Important variables plot you can Edit - Mark ranges in the spectra that
are not important (potentially noisy). Task - Recalculate without Marked gives you a new model based
on fewer wavelengths, which may be more rugged and have a smaller prediction error.
Validation: Before you finish, make sure the model is properly validated using a suitable cross validation
or test set. Always keep replicates of the same samples in the same segment.
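Keeping replicates in the same cross-validation segment simply means segmenting by sample identity rather than by row. A small sketch (the helper name and the replicate labels are invented):

```python
import numpy as np

def segments_by_group(groups, n_segments):
    """Assign cross-validation segments so that all replicates sharing
    a group label always fall in the same segment."""
    uniq = sorted(set(groups))
    seg_of_group = {g: i % n_segments for i, g in enumerate(uniq)}
    return [seg_of_group[g] for g in groups]

# Invented replicate labels: three physical samples, measured twice each
groups = ["A", "A", "B", "B", "C", "C"]
print(segments_by_group(groups, 3))   # [0, 0, 1, 1, 2, 2]
```

Rows left out together this way prevent a replicate of a validation sample from sneaking into the calibration set.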
Additional Tools: Statistics on the B-vector is helpful to determine the number of PCs. Use Results -
General view - Plot - Line to plot the B-diagnosis for the model (Statistics, vectors 1-6). Vectors 4 and
5 are especially useful (Bsum and SquSum B).
File - Import - Unscrambler results lets you see the numerical values of all results, e.g. B (the
regression coefficients) or ExtraVal, which contains information about the need for slope and bias
adjustment. Use Help for details.
A standard method for the synthesis of enamine from a ketone gave some problems, and a modified procedure
was investigated. A first series of experiments gave two important results:
A new procedure was built up, which shortened reaction time considerably;
It was shown that the optimal operational conditions were highly dependent on the structure of the original
ketone.
Thus, a new investigation had to be conducted to study the specific case of the formation of morpholine
enamine from methyl isobutyl ketone. It was decided to adopt a 2-step strategy:
First, at a screening stage, study the main effects of 4 factors (relative amounts of the reagents, stirring rate
and reaction temperature) and their possible interactions;
Then, conduct an optimization investigation with a reduced number of factors.
Task
Select a screening design which requires a maximum of 11 experiments that will make it possible to estimate
all main effects and detect the existence of 2-factor interactions.
Note: With 4 design variables, you need a fractional factorial design to keep the number of experiments lower
than 16 (2^4).
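A 2^(4-1) fractional factorial with three center samples is one way to reach 11 experiments while keeping all main effects estimable. The sketch below uses the common generator D = ABC; The Unscrambler may pick its fraction differently:

```python
import itertools
import numpy as np

# Full 2^3 design in factors A, B, C; the fourth factor is confounded
# with the three-factor interaction: D = A*B*C (generator of the 2^(4-1))
runs = np.array(list(itertools.product([-1, 1], repeat=3)))
D = runs[:, 0] * runs[:, 1] * runs[:, 2]
design = np.column_stack([runs, D])
center = np.zeros((3, 4))            # three center samples
print(len(design) + len(center))     # 8 + 3 = 11 experiments
```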
How to Do It
Choose File - New Design to launch the Design Wizard, where you can generate a designed data table.
In the Design Wizard - Select Method to Use dialog, choose to build the design From Scratch and Click
Next.
This launches the Design Wizard - Select Design Type dialog, where you select Create Fractional
Factorial Design and proceed by clicking Next.
Next, define the design variables. Do this by clicking the New button. This launches the Add Design
Variable dialog (Lookup Image D001), where you must enter the name of the new variable (e.g. TiCl4,
Morpholine, Temperature and Stirring), select Continuous, and enter the low and high levels as stated
above. Validate by clicking OK and enter the next variable by clicking New again.
Note: In order to be allowed to specify center samples, you will have to define Stirring rate as a continuous
variable; you can give it the arbitrary levels -1 and 1, where -1 stands for no stirring and 1 stands for high
stirring.
Click Next to launch the Design Details dialog. Keep Number of Replicates to 1, and add 3 Center
Samples (Lookup Image D002).
Once you are satisfied with your design specifications, click Finish to exit. The generated design is
automatically displayed on screen (Lookup Image D003).
You can use the View menu to toggle between display options. Try Sample Names and Point Names,
Standard Sample Sequence and Experiment Sample Sequence (randomized order).
It should now be safe to store your new data table into a file, using File - Save As; give it a name, e.g. Enam
FRD. Note that you should not overwrite the existing file Enam_frd. You will need this file later in the
tutorial.
Task
Run an Analysis of Effects.
How to Do It
First, you should enter the response values. Since this has already been done, you just need to read the
complete file. Use File - Open, and select from the Designed Data list in the Open File dialog the file
named Enam_frd, which already contains the response values.
Task
Interpret the results of the Analysis of Effects that you have just run.
How to Do It
The Effects Overview plot shows which effects are significant (Lookup Image D005). By default, the
Significance Testing Method is Center.
Select Plot - Effects and choose COSCIND as Significance Testing Method on the Overview sheet in the
Effects dialog. Click OK to display the new plot. (Lookup Image D006)
You can see that three effects are considered to be significant: Main effect TiCl4 (A), Interaction AB or CD,
and Main effect Morpholine (B).
Select Window - Copy to - 2 : this copies the Effects Overview plot into sub-view 2 (the upper sub-view in
a system of two). Activate the lower sub-view (which is currently empty), and use Plot - Effects. On the
Response Details sheet (Lookup Image D007), select Normal Probability in the Plot type field, and
remove the option Include Table.
The normal probability plot of the effects (Lookup Image D008) confirms the results of the Effects
Overview: the effect of Morpholine (B) is clearly very significant, and AB=CD and TiCl4 (A) are also likely
to be significant.
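For a two-level factor, the effects tested above are just the difference between the mean response at the high and the low level. A worked sketch on an invented 2^2 data set (the function name and values are not from the tutorial):

```python
import numpy as np

def main_effect(levels, y):
    """Effect of a two-level factor: mean response at the high level
    minus mean response at the low level."""
    levels = np.asarray(levels)
    y = np.asarray(y, dtype=float)
    return y[levels == 1].mean() - y[levels == -1].mean()

# Invented 2^2 example: factor B has a strong effect, A a small one
A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
y = [50, 52, 70, 74]
print(main_effect(A, y), main_effect(B, y))   # 3.0 21.0
```

In a normal probability plot, such effects are sorted and plotted against normal quantiles; significant effects fall off the straight line formed by the noise effects.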
Task
Check the data for non-linearities.
How to Do It
Click in the lower plot and use Plot - Statistics. On the Compressed sheet (Lookup Image D010), go to
the Sample Groups field, where you specify that you wish to plot groups containing Design and Center
samples. Validate your choices with OK.
The lower plot (Lookup Image D011) now displays the mean and standard deviation of all Design samples
compared to that of the Center samples only.
You can see that the standard deviation for the center samples is about half the overall standard deviation. This
indicates some lack of reproducibility in the center samples; this is why most of the effects observed in the
Analysis of Effects were not found significant according to the Center Significance testing method. If you go
back to the Editor and study the Yield values, you will notice that center sample Cent-c has a very different
value from Cent-a and -b; maybe that experiment was not performed correctly.
The other important information conveyed by the plot is that there is a strong non-linearity in the actual
relationship between Yield and the design variables: The mean value for the center samples is much higher
than for the overall design.
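The non-linearity diagnosis amounts to comparing the mean response of the center samples with the mean of the design (factorial) points; a large difference signals curvature. A numeric sketch with invented yields (not the tutorial's data):

```python
import numpy as np

# Invented yields: factorial (design) points and center points
y_design = np.array([60.0, 62.0, 71.0, 69.0, 58.0, 61.0, 70.0, 73.0])
y_center = np.array([78.0, 79.0, 71.0])

# A center mean well above the design mean indicates curvature,
# i.e. a non-linear relationship, as observed in the tutorial
curvature = y_center.mean() - y_design.mean()
print(round(curvature, 2))   # 10.5
```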
Task
Build a Central Composite Design in the two variables TiCl4 and Morpholine.
How to Do It
Choose File - New Design to launch the Design Wizard, where you will be able to generate a designed
data table. In the Select Method to Use dialog, choose to build the design from scratch and Click Next.
This launches the Select Design Type dialog, where you select Optimization designs: Central
Composite and validate by Clicking Next.
In the Define Design Variables dialog, you will specify the variables TiCl4 and Morpholine with the same
ranges of variation as before (0.6 to 0.9 and 3.7 to 7.3, respectively), as follows:
Click New to launch the Add Design Variable dialog, where you must enter the name of the new variable,
select Continuous, and enter the Low and High levels. Validate by Clicking OK.
Enter the next variable by Clicking New again.
When both variables have been defined, check that the Define Design Variables dialog indicates the correct
Star Points Distance from Center, namely 1.41.
After all design variables have been defined, click Next to enter the Define Non-design Variables dialog,
where you click New to define the non-designed response variable Yield in the Add Non-designed
Variable dialog.
Once you are satisfied with your variable definitions, use Next to get into the Design Details dialog, where
you set the Number of Replicates to 1 and the Number of Center Samples to 5.
You need not make any further specification in the next dialog, Randomization Details. Click Next again
to launch the Last Checks dialog, where you make sure that all your design parameters have the correct
values. The design should include a total of 13 experiments. Otherwise, use Back to go to the appropriate
dialog and make the necessary corrections.
Once you are satisfied with your design specifications, use Finish to exit. The generated design is
automatically displayed on screen. Save your design for further use, e.g. with the name Enam CCD.
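The numbers in this design can be checked by hand: for k = 2 variables, the rotatable star distance is 2^(k/4) = sqrt(2), about 1.41, and 4 cube points + 4 star points + 5 center samples gives the 13 experiments mentioned above. A sketch (point layout per the textbook definition, not necessarily The Unscrambler's run order):

```python
import itertools
import numpy as np

k = 2                                  # design variables: TiCl4, Morpholine
alpha = 2 ** (k / 4)                   # rotatable star distance, sqrt(2)
cube = np.array(list(itertools.product([-1, 1], repeat=k)))
star = np.array([[alpha, 0], [-alpha, 0], [0, alpha], [0, -alpha]])
center = np.zeros((5, k))              # five center samples
n_runs = len(cube) + len(star) + len(center)
print(round(alpha, 2), n_runs)         # 1.41 13
```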
Task
Run a Response Surface Analysis.
How to Do It
Normally, you would first have to enter the response values, but this has already been done. From the
Designed Data list in the Open File dialog, open the file named Enam_ccd, which already contains the
response values.
Choose Task - Response Surface. In the Response Surface dialog (Lookup Image D012), make the
following selections:
Samples: Default
X-var: Design Vars + Int + Squ (2+3)
Y-variables: Default
When the computations are done, click View to study the results. Do not forget to save the file before you start
interpreting the results!
Task
Interpret the results from the Response Surface Analysis.
How to Do It
The viewer displays a Response Surface Overview, which consists of 4 plots (Lookup Image D013):
Analysis of Variance, Residuals, Response Surface visualized as a contour plot, and Response Surface
visualized as a landscape plot.
First, study the ANOVA results. Use Window - Copy To - 1 to copy the upper left plot to sub-view 1
(which covers the whole Viewer window). You can adjust the width of the various columns of the table if
necessary (Lookup Image D014). Study in turn: Summary, Model Check, Variables, and Lack of Fit.
The Summary shows that the model is globally significant, so we can go on with the interpretation.
The Model Check indicates that the quadratic part of the model is significant, which shows that the
interaction and square effects included in the model are useful.
The ANOVA table for variables displays the values of the b-coefficients, and their significance. You see
that the most significant coefficients are for the linear and quadratic effects of Morpholine; the quadratic
effect of TiCl4 is close to the 0.05 significance level. That section of the table also tells you that the
maximum point is reached for TiCl4=0.835 and Morpholine=6.504; the information displayed on top of
the table shows a Predicted Max Point Value of 96.747.
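The maximum point reported in the ANOVA table is the stationary point of the fitted quadratic surface, found by setting the gradient to zero and solving a small linear system. A sketch with invented coefficients (not the tutorial's fitted values):

```python
import numpy as np

def stationary_point(b1, b2, b11, b22, b12):
    """Solve grad(y) = 0 for the quadratic model
    y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    H = np.array([[2 * b11, b12],
                  [b12, 2 * b22]])
    return np.linalg.solve(H, [-b1, -b2])

# Invented coefficients in coded units; maximum at (0.5, 1.0)
x_opt = stationary_point(b1=2.0, b2=4.0, b11=-2.0, b22=-2.0, b12=0.0)
print(x_opt)   # [0.5 1. ]
```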
Task
Check the residuals from the Response Surface Analysis.
How to Do It
The upper right sub-view (if necessary, use Window - Go To - 5) in the Response Surface Overview plot
shows a Normal Probability plot of the residuals. This plot can be used to detect any outliers. Here, you see
that the residuals form two groups (positive residuals and negative ones). Apart from that, they lie roughly
along a straight line, and no extreme residual is to be found outside that line. This means that there is no
apparent outlier.
From that window, go to Plot - Residuals and select Y-Residuals vs Predicted Y on the General sheet
(Lookup Image D015). Try alternatively the two options Residuals (which shows the raw residuals) and
Studentized (which shows transformed residuals that can be compared to a Student distribution).
In the Studentized residuals plot (Lookup Image D016), all values are within the (-2;+2) range, which
confirms that there are no outliers. Furthermore, there is no clear pattern in the residuals, so nothing seems to
be wrong with the model.
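Studentized residuals divide each raw residual by its estimated standard deviation, s*sqrt(1 - h_ii), which is why they can be compared to a Student distribution and to the (-2; +2) band. A textbook OLS sketch on invented data (not the tutorial's model):

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals of an OLS fit:
    r_i = e_i / (s * sqrt(1 - h_ii)), with h_ii the leverage."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)   # leverages
    dof = len(y) - X.shape[1]
    s = np.sqrt(e @ e / dof)
    return e / (s * np.sqrt(1 - h))

# Invented small fit: intercept + one predictor
X = np.column_stack([np.ones(5), [0.0, 1.0, 2.0, 3.0, 4.0]])
y = np.array([1.0, 2.1, 2.9, 4.2, 4.8])
r = studentized_residuals(X, y)
print(r)   # all within (-2; +2): no outliers
```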
Select Plot - Predicted vs Measured and choose Predicted vs Measured. If necessary, use View -
Trend Lines - Regression Line to display the regression line (blue), and View - Trend Lines - Target
Line to visualize the y=x line (black) (Lookup Image D017).
You can see how the design samples are spread around the regression line; in particular, the Center samples to
the right of the plot show an important spread. This is why so few effects in the model are very significant:
There is quite a large amount of experimental variability.
Task
Interpret the response surface plots.
How to Do It
The landscape plot displayed in the lower right quadrant shows you the shape of the response surface: a kind of
round hill with a maximum somewhere between the center and maximum values of the design variables.
That plot is not precise enough to spot the coordinates of the maximum; the contour plot displayed left
(Lookup Image D018) is better suited for that purpose. For instance, you can change the scaling to zoom
around the optimum, so as to locate its coordinates more accurately. Check that they match what is displayed
in the ANOVA table.
Finally, you may also have noticed that the Predicted Max Point Value is smaller than several of the actually
observed Yield values (sample Cube004a for instance has a Yield of 98.7). This is not paradoxical, since the
model smoothes the observed values; those high observed values might not be reproduced if you performed the
same experiments again.
Since there was no apparent lack of fit, no outliers, and the residuals showed no clear pattern, the model could
be considered valid and its results interpreted more thoroughly.
The response surface showed an optimum predicted Yield of 96.747 for TiCl4=0.835 and Morpholine= 6.504;
the predicted Yield is larger than 95 in the neighboring area, so that even small deviations from the optimal
settings of the two variables will give quite acceptable results.
The training samples are divided into three Sample Sets, each containing 25 samples. The three Sets are:
Setosa, Versicolor, and Virginica. The Sample Set Testing will later be used to test the classification.
Four variables are measured; Sepal length, Sepal width, Petal length, and Petal width. The measurements
are given in centimeters.
Note: You will find the illustrations for this tutorial (Image E001, etc) at the end of the document.
Task
Insert a category variable into the Tutor_e data table.
How to Do It
Enter the right type for each of the 75 samples. A simple way to do this is as follows:
Click on the first cell containing m. From the keyboard, type in m (which activates the entry mode on the
cell) then v (initial of Versicolor), followed by <Enter>. You are now positioned in the next cell; apply the
same procedure, until you reach the first Setosa sample. There, type in m and s followed by <Enter>. Go
on like this, until you reach the first Virginica sample. There, type in m, v and v (we need to type in
v twice to activate the second level which has v as initial).
Save the data table once you have completed this task.
Task
Make a PCA model of all calibration samples.
How to Do It
Use Task - PCA and select the following parameters:
Samples: Training
Variables: Measurements
Weights: 1/SDev
Validation Method: Leverage correction
Number of PCs: 4
We assume that you are familiar with making models by now. Refer to one of the previous tutorials if you have
trouble finding your way in the PCA dialog.
You see that there are few outlier warnings and most of the variance is explained by three PCs. Click View to
look at the modeling results.
Activate the score plot and select Edit - Options. Enable sample grouping and select Value of Variable in
the Group By field. Make sure Leveled Variable 1 is selected. Click OK. (Lookup Image E003) You can
see the three groups in different colors; one very distinct (Setosa) and two that are not so well separated
(Versicolor and Virginica). This indicates that it may be difficult to differentiate Versicolor from Virginica.
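The 1/SDev weighting above is ordinary standardization; a PCA score matrix can then be sketched via the singular value decomposition. The tiny data table below is invented, merely shaped like the iris measurements:

```python
import numpy as np

def pca_scores(X, n_pcs):
    """PCA on a mean-centered, 1/SDev-weighted (standardized) matrix,
    computed via the SVD; scores are T = U * S."""
    X = np.asarray(X, dtype=float)
    Xw = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # weight = 1/SDev
    U, S, Vt = np.linalg.svd(Xw, full_matrices=False)
    return (U * S)[:, :n_pcs]

# Invented 4-sample, 3-variable table
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.3, 3.3, 6.0],
              [5.8, 2.7, 5.1]])
T = pca_scores(X, 2)
print(T.shape)   # (4, 2)
```

Plotting the first two score columns, colored by a level variable, is what the grouped score plot above does.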
Task
Make PCA models for the three classes Setosa, Versicolor, and Virginica.
How to Do It
Go back to the Editor window containing your re-formatted data table. Select Task - PCA and make a model
with the following parameters:
Samples: Setosa
Variables: Measurements
Weights: 1/SDev
Validation: Leverage correction
Number of PCs: 4
When the model is computed, close the PCA Progress dialog and save the class model with name Setosa.
Repeat the procedure successively on Sample Sets Versicolor and Virginica, also saving each new PCA
model.
Task
Assign the Sample Set Testing to the classes Setosa, Versicolor, and Virginica.
How to Do It
Select Task - Classify. Use the following parameters: (Lookup Image E004)
Make sure that Centered Models is checked. Add the three models Setosa, Versicolor, and Virginica.
The suggested number of PCs to use is 3 for all models; keep that default (it is based on the variance curve for
each model). If you are curious, you may select a model in the list and click Variance to display the
calibration and validation variances for that model.
Click OK to start the classification.
Task
Interpret the classification results displayed in a table plot.
How to Do It
Click View when the classification is finished. (Lookup Image E005)
A table plot is displayed, called Classification Table. There are three columns: one for each class model.
Samples recognized as members of a class (they are within the limits on sample-to-model distance and
leverage) have a star * in the corresponding column.
The significance level can be toggled with the Significance option, which is available as a drop-down menu
from the menu bar.
At the 5% significance level, we can see that all but three samples (false negatives: virg1, virg36, virg42) are
recognized by their rightful class model.
However, some samples are classified as belonging to two classes (false positives): 12 Versicolor samples are
also classified as Virginica, while 6 Virginica samples are also classified as Versicolor. Only the Setosa
samples are 100% correctly classified (no false positives, no false negatives).
If you tune the significance limit up to 25%, this reduces the number of false positives but also increases the
number of false negatives (vers41 and Virg35 come in addition).
How to Do It
Select Plot - Classification and choose the Coomans plot for models Virginica and Versicolor. (Lookup
Image E006)
This plot displays the sample-to-model distance for each sample to two models. The newly classified samples
(from sample set Testing) are displayed in green color, while the calibration samples for the two models are
displayed in blue and red. (Lookup Image E007)
The Coomans plot for the classes Virginica and Versicolor shows that all Setosa samples are far away from
the Virginica model (they appear far to the right). However, we can see that many Virginica and Versicolor
samples are within the distance limits for both models. This suggests some classification problems.
Task
Look at the Si vs. Hi plots.
How to Do It
Select Plot - Classification and choose Si vs. Hi for model Versicolor. Before you start interpreting the plot,
turn on Sample Grouping in the Options dialog and choose Name as Markers Layout, with length 2 (tick
only the first two boxes in the Name field). (Lookup Image E008) The plot is much easier to interpret: iris
type appears clearly with the initials Se, Ve, Vi in three different colors.
Some Virginica samples are classified as belonging to the class Versicolor, but most samples that are not
Versicolor are outside the lower left quadrant. The reason for the difficult classification between Versicolor
and Virginica is that the samples are overlapping in the score plot. They are very similar with respect to the
width and length of the sepal and petal.
Task
Look at the Model Distance plots.
How to Do It
Select Plot - Classification and choose the Model Distance for the Setosa model.
This plot allows you to compare different models. A model distance larger than three indicates that the
models are different, i.e. good class separation.
It is clear from this plot that the Setosa model is different from the Versicolor model, while the distance to
Virginica is smaller.
Task
Look at the Discrimination Power plots.
How to Do It
Select Plot - Classification and choose the Discrimination Power for Versicolor projected onto the
Setosa model.
This plot tells you which variables are most useful in describing the difference between the two types of
iris. (Lookup Image E010) We can see that variables Sepal Length and Sepal Width have high
discrimination powers (7.5 to 8) while it is lower for Petal Length and Petal Width (4.5 to 5).
Do the same for Versicolor onto Virginica: all variables have discrimination powers lower than 5. This is
obviously not enough.
Task
Look at the Modeling Power plots.
How to Do It
Select Plot - Classification and choose the Modeling Power for Versicolor.
Variables with a modeling power near one are important for the model. A rule of thumb says that variables
with modeling power less than 0.3 are of little importance for the model.
The plot tells us that all variables have a modeling power larger than 0.3, which means that all variables are
important for describing the model. None of the variables should be deleted from the modeling. The only
chance to improve on the classification between Versicolor and Virginica is to measure some additional
variables.
In this tutorial we show you some of the capabilities The Unscrambler has to interact with other programs
under the Windows operating system. The main focus here is how The Unscrambler is used in conjunction
with other software.
The water content of wheat samples was measured and is the response variable in the data.
Note: You will find the illustrations for this tutorial (Image F001, etc) at the end of the document.
Task
Import the ASCII file Tutor_F.txt.
This launches the Import ASCII dialog, where you specify what the ASCII file looks like (Lookup Image
F001). Use the options displayed in the dialog. Note that the first row in the data file contains variable names
and the first column contains sample names.
Click OK to import the file and the data are read into an Editor.
Task
Import the data file Tutor_F.xls from Excel.
How to Do It
There are two procedures. Use Procedure I if you have Excel or Lotus installed on your Personal Computer or
Procedure II if you do not have a spreadsheet program that can read the file Tutor_F.xls. You only need to
follow one of the procedures.
You are now going to drag the selected data area to the first variable in the Editor in The Unscrambler. Hold
down <Ctrl> and click on one of the sides of the marked area; the cursor changes and you see a + sign on top
of the cursor. Drag the data from Excel to the Editor in The Unscrambler that contains the wheat data. Note
how a frame marks the data area that is covered by the data you copy. Let go of the left mouse button when
you see that the frame covers the first variable completely, i.e. from sample 1 and down.
The dialog Select Drop Method appears (Lookup Image F002). Select Insert as 1 new column. Import
the sample and variable names from Excel the same way.
Find the file Tutor_F.xls in the Import dialog and click Import. This launches the Import Worksheet
dialog, where you specify the options (Lookup Image F004). The Excel file has been prepared by defining
range names: select Water Content for Range names against Data, and specify A2:A56 in the Sheet
range. Delete the entries A1:A1 in the Sheet range field for Sample names, since you import the data
without names. In the Sheet range field against Variable names, specify A1:A1. Then click OK.
Task
Insert a category variable to group the samples into three categories, depending on the water content level.
How to Do It
Place the cursor in the first column and select Edit - Insert - Category variable. This launches the
Category Variable Wizard - Enter Variable Name and Choosing Method dialog (Lookup Image
F005). Enter a name for the variable in the first dialog, select I want to specify the levels manually under
Method and click Next to enter the next dialog, where you specify the levels. Add three levels: Low (Water <
13.0), Medium (13.0 < Water < 15.0), and High (Water > 15.0).
Enter the category values according to the distribution above. Double click the category variable cell and select
the drop-down list. A list of the valid levels is displayed. A faster way to enter the value is to double-click the
cell and Click the first character of the desired level. Click the character repeatedly if many levels begin with
the same character.
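The three levels defined above amount to binning the water content at 13.0 and 15.0. A short sketch (the function name is invented for illustration):

```python
import numpy as np

def water_level(water):
    """Map water content to the category levels used in the tutorial:
    Low (< 13.0), Medium (13.0 - 15.0), High (> 15.0)."""
    bins = [13.0, 15.0]
    labels = np.array(["Low", "Medium", "High"])
    return labels[np.digitize(water, bins)]

print(water_level([12.1, 13.4, 16.2]))   # ['Low' 'Medium' 'High']
```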
The name of the category variable is written in blue text to distinguish this kind of variable from the ordinary
ones.
Task
Define the Variable Sets NIR Spectra and Water Content . Change the data table properties to Spectra.
How to Do It
In the Editor, mark variable number two which now contains the water content of the wheat samples. Select
Modify - Edit Set and make sure that Variable Sets is selected. Click Add to define the Set Water
Content from current Editor. Define another Variable Set NIR Spectra using variables 3 to 22. Change the
Data Type to Spectra for both. We do not need to define a Sample Set because All Samples is automatically
defined as a Set.
Task
Make a PLS1 model from NIR spectra to the Water Content.
How to Do It
Select Task - Regression and specify the following parameters in the Regression dialog:
Method: PLS1
Samples: All Samples [55]
X-variables: NIR Spectra [20]
Y-variables: Water Content [1]
Weights: All 1.0
Validation method: Leverage Correction
Number of components: 5
You see how the model describes more and more of the water content.
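Under the hood, a PLS1 model of this kind can be sketched with the classic NIPALS algorithm. This is a textbook implementation on invented data, not The Unscrambler's own code:

```python
import numpy as np

def pls1(X, y, n_comp):
    """Minimal PLS1 (NIPALS): returns regression vector b and intercept b0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc                     # weights from X-y covariance
        w /= np.linalg.norm(w)
        t = Xc @ w                        # scores
        tt = t @ t
        p = Xc.T @ t / tt                 # X-loadings
        qa = yc @ t / tt                  # y-loading
        Xc = Xc - np.outer(t, p)          # deflate
        yc = yc - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # b = W (P^T W)^-1 q
    return b, y_mean - x_mean @ b

# Invented tiny example: y depends exactly linearly on two X-variables
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, 1.0]) + 3.0
b, b0 = pls1(X, y, n_comp=2)
print(np.round(X @ b + b0, 6))            # reproduces y exactly
```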
Task
Look at the model results.
How to Do It
Click View in the dialog when the model is made. The following plot appears: (Lookup Image F006)
The residual Y-variance goes down nicely and is close to 0 after two PCs. The Predicted vs Measured plot
looks OK. The fit is quite good. From the regression Coefficients we see that there is a distinct peak around
1940.
Task
Transfer plots from The Unscrambler into Word using Copy and Paste.
How to Do It
Open Word. Select the score plot in the regression overview plot and select Edit - Copy. Go to Word and
place the cursor where you want the plot to appear. Select Edit - Paste. The score plot is now inserted as a
graphical object in your Word document.
The plot can be transferred either as a bitmap or a picture file. The picture file option will usually give better
quality of the plot, but also larger Word files. You may want to use the bitmap option if you transfer plots with
many plot objects.
You choose between the two options from File - System Setup and the Viewer tab.
Task
Export an ASCII-MOD file.
How to Do It
Open Results - Regression and select the PLS model you made. The ordinary thing to do would be to open
the regression overview plot and look at the different predefined plots for this model. But this time we take a
look at the numerical results that are available in the model.
Click Variance to see the variances for different PCs in the model (Lookup Image F007). Scroll through
the information field to look at properties of your model.
Take a look at the ASCII file that is generated. The format of the file is described in the chapter Technical
References, available as a PDF file from CAMO's web site www.camo.com/TheUnscrambler/Appendices .
How to Do It
Activate the Wheat Editor and select File - Export. Make sure that the File Format is set to Flat ASCII /
Wide ASCII before you Click OK. Specify the ASCII file as suggested in the Export ASCII dialog (Lookup
Image F008).
Wide ASCII means that each sample is written as a row in the ASCII file with a paragraph mark to tell the end
of the row. The sample and variable names are written as the first column and first row in the ASCII file.
Open the file in an ASCII editor and look at the file. All names are enclosed in double quotes.
A fruit punch is to be prepared by blending three types of fruit juice: watermelon, pineapple and orange. The
purpose of the manufacturer is to use their large supplies of watermelons by introducing watermelon juice, of
little value by itself, into a blend of fruit juices. Therefore, the fruit punch has to contain a substantial amount
of watermelon - at least 30% of the total. Pineapple and orange have been selected as the other components of
the mixture, since juices from these fruits are easy to get and relatively inexpensive.
The manufacturer decides to use experimental design to find out which combination of those three ingredients
maximizes consumer acceptance of the taste of the punch.
The responses of interest for the manufacturer are detailed in the table below.
Consumer acceptance is the most important response, but if the analysis of the results should reveal two areas
with equally high consumer acceptance, the mixture with lower production cost will be preferred. The sensory
descriptors are here to provide an explanation for consumer acceptance and directions for further improvement
(for instance by adding sugar or sweetener if the consumers seem to prefer sweeter mixtures).
Note: You will find the illustrations for this tutorial (Image G001, etc) at the end of the document.
Task
Build a simplex centroid design with the help of the Design Wizard.
How to Do It
Use File - New Design to start the Design Wizard. The first dialog is Design Wizard - Select Method to
Use, where you select option From Scratch and click Next to proceed.
You enter dialog Design Wizard - Select Design Type, where you select option Mixture Design
(Lookup Image G001); you can see that the contents of the Information field at the bottom of the dialog
box are updated and give you some advice about the selected type of design. For instance, the last sentence
states that for optimization purposes we should add interactions and squares to our model. We will remember
that!
Click Next; this starts the Design Wizard - Define Mixture Variables dialog where we will create a new
variable for each of our fruit juices. Click New to access the Add Design Variable dialog. Type in the
details of the first fruit juice (Lookup Image G002):
Name: Watermelon
Lower Bound: 30 %
Upper Bound: 100 %
Click OK to accept your choices and go back to the Design Wizard - Define Mixture Variables dialog.
Apply the same procedure to specify the other fruit juices (Pineapple and Orange, varying from 0% to 70%).
In the next dialog, you have the possibility to define Process Variables (i.e. other design variables which are
not part of the mixture). As we do not need any of those, just click Next.
You are now in the Design Wizard - Define Non-design Variables dialog, where you should specify
your responses. Click New to access the Add Non-design Variable dialog; type in the name of the
response variable. Do that for each response: Accept, Cost, Sweet, Bitter, Fruity. Click Next when all five
responses are specified.
The next dialog, called Design Wizard - Define Model, allows you to add terms to a default linear model.
As you can see (Lookup Image G004), the only available choice in our case is Mixture Interactions and
Squares. Tick that box and proceed with Next.
This leads you to the Design Wizard - Define Design Purpose dialog, where the system detects that your
purpose must be Optimization since you have added interactions and squares. Click Next to proceed.
The next dialog is Design Wizard - Design Type (Mixture). It recommends a Simplex-Centroid Design
with Interior Points (Lookup Image G005), and we accept that choice. Click Next to proceed.
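For three mixture components, a simplex-centroid design with interior points consists of ten distinct blends: the three vertices, the three binary midpoints, the overall centroid, and three interior (axial) points. As a rough sketch only (the design table generated by the Wizard is the authoritative one), these points can be written in pseudo-component space and mapped to the real bounds used in this tutorial:

```python
# Sketch (not The Unscrambler's internal code): the points of a 3-component
# simplex-centroid design with interior points, mapped from pseudo-components
# to the real bounds of the tutorial (Watermelon >= 30%, others 0-70%).
from fractions import Fraction as F

# Pseudo-component coordinates: vertices, binary midpoints, overall centroid,
# and interior (axial) points.
vertices = [(F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(0), F(0), F(1))]
binaries = [(F(1, 2), F(1, 2), F(0)), (F(1, 2), F(0), F(1, 2)), (F(0), F(1, 2), F(1, 2))]
centroid = [(F(1, 3), F(1, 3), F(1, 3))]
interior = [(F(2, 3), F(1, 6), F(1, 6)), (F(1, 6), F(2, 3), F(1, 6)), (F(1, 6), F(1, 6), F(2, 3))]

lower = (F(30), F(0), F(0))   # lower bounds in %: Watermelon >= 30%
free = 100 - sum(lower)       # 70% of the mixture left to distribute

def to_real(point):
    """Map pseudo-components x_i to real percentages L_i + free * x_i."""
    return tuple(float(l + free * x) for l, x in zip(lower, point))

for p in vertices + binaries + centroid + interior:
    print([round(v, 2) for v in to_real(p)])  # each blend sums to 100%
```

With the centroid replicated three times (the Center samples of the next dialog), this accounts for the 12 samples that appear later in the Statistics dialog.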
In the next dialog called Design Wizard - Design Details, we accept the default choice of 1 Replicate and
3 Center Samples and click Next. In the Design Wizard - Randomization Details (General) dialog,
just click Next to proceed.
In the Design Wizard - Last Checks dialog, check that all details of the design are correct (Lookup
Image G006). Should anything be different from what you were supposed to have chosen, go back as many
dialogs as necessary with the Back button, then move forward again.
Click Preview to have a look at the randomized list of experiments. If you are not happy with the
randomization, click OK to go back to the main dialog then Re-randomize to start a new randomization (then
click Preview again to check the result). If you wish to print out the randomized list of experiments, click Lab
Report then OK.
Once you have made all necessary checks and corrections, click Finish; this displays an information dialog
(Lookup Image G007) (click OK after reading its contents) and opens the new designed data table into the
Editor (Lookup Image G008). Be aware that if you need to do any further corrections after that, you will
have to use command File - Duplicate - As Modified Design to access the Design Wizard once again.
Save the new table with File - Save (you may call it Fruit Punch empty for instance).
Task
Import response values from Excel into your designed data table.
Click OK after double-checking your choices: you are now back in your data table, with the response values
filled in.
Select File - Save As and give the table a new name (for instance Fruit Punch).
Task
Run Statistics, display the results as plots, check response variations and look for abnormal values.
How to Do It
With your Fruit Punch data table displayed in the Editor, select Task - Statistics.
Choose the following settings in the Statistics dialog:
Sample Set: All Samples (12)
Variable Set: Cont Non-Design Vars (5)
Calculate Cross-Correlation: not selected
then click OK to start the computations.
Click View in the Statistics Progress dialog: the Statistics results are displayed as two plots (Lookup
Image G011). The upper plot is Percentiles, the lower Mean and SDev.
Save the results file as Fruit Punch Stats.
Now we are going to display the same two plots for Design samples and Center samples, in order to compare
variation over the whole design to variation over the replicated Center samples. If the experiments have been
performed correctly, there should be much more variation among design points than among the three replicates
of the Center sample.
Select Plot - Statistics; in the Statistics dialog (Lookup Image G012), look at the Compressed sheet
and focus on the Sample Groups field. Design should already be selected; select Center as well (you can
see that the plot preview is updated as a result, now showing several groups in different colors) and click OK.
The Percentiles and Mean and SDev plots are now displayed for two groups (Lookup Image G013). The
bars or boxes for Design samples appear in blue and for Center samples, in red (unless you are using your own
color scheme).
On the Percentiles plot, you can see that there is much more variation among design points than among the
Center samples. This also appears clearly on the Mean and SDev plot: for instance, if you click successively
on the blue and red bars for variable Accept, you will see that SDev is 0.75 for Design samples and only 0.25
for Center samples.
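The logic behind this check can be sketched numerically. The Accept values below are made up for illustration, chosen only to mimic the orders of magnitude reported above:

```python
# Sketch of the check behind the Mean and SDev plot: spread of one response
# over the design points vs. over the replicated Center samples.
# The Accept values below are illustrative, not the tutorial's actual data.
from statistics import mean, stdev

design_accept = [1.8, 2.4, 3.1, 2.0, 3.5, 2.9, 1.5, 3.3, 2.6]  # design points
center_accept = [3.0, 2.7, 3.2]                                 # 3 center replicates

sd_design, sd_center = stdev(design_accept), stdev(center_accept)
print(f"SDev design = {sd_design:.2f}, SDev center = {sd_center:.2f}")
# A much larger SDev over the design than over the replicates indicates that
# the observed response variation is real, not just experimental noise.
```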
Conclusions:
The ranges of variation of the 5 responses are as expected.
There is no abnormal value for any response.
There is much more variation over the whole design than among the Center samples, which suggests that
the experiments were performed correctly.
Task
Build a PLS model of the response variations, validate it with cross validation and uncertainty testing. View
the results and check the model.
Method: PLS2
Sample Set: All Samples (12)
X-Variables: Design Def Model (3+6)
Weights for X-vars: All 1/SDev
Y-Variables: Cont Non-Design Vars (5)
Weights for Y-vars: All 1/SDev
Validation Method: Cross Validation
Uncertainty test: Selected
Model Size: Full
Num PCs: 5
Issue Warnings: Selected
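The 1/SDev weighting chosen above is ordinary standardization: each variable is multiplied by the inverse of its standard deviation, so every variable enters the model with unit variance. A minimal sketch with made-up values:

```python
# Sketch of 1/SDev weighting (standardization): each variable is multiplied by
# the inverse of its standard deviation, giving all variables unit variance and
# thus an equal chance to influence the model. Values below are made up.
import statistics

accept = [1.8, 2.4, 3.1, 2.0, 3.5]      # illustrative response values
cost = [12.0, 15.5, 9.8, 14.1, 11.2]    # a variable on a very different scale

def weight_1_over_sdev(values):
    w = 1 / statistics.stdev(values)     # the 1/SDev weight
    return [v * w for v in values]

for col in (accept, cost):
    print(round(statistics.stdev(weight_1_over_sdev(col)), 6))  # 1.0 each time
```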
Click OK, then have a look at the PLS2 Regression Progress dialog (Lookup Image G014). The
model needs 4 PCs, and even then the Y-validation variance is quite high (0.50). We can also see that several
warnings have been issued, especially for PC 0 (that is to say, at the Centering stage of the computations) and
PC 1.
This suggests some problems in the data. Maybe an outlier? We will have to investigate.
Click View to access the Viewer where the regression results are displayed.
Note: Since this is a mixture model, all terms of the model are linked. Therefore it would be meaningless to
remove the non-significant effects from the model. This is why we do not mark the non-significant
coefficients nor recalculate the model without the marked variables, as we would have done in another context.
From the X-variables sheet (Lookup Image G021), choose the following:
Axis 1: Watermelon (A)
Axis 2: Pineapple (B)
Axis 3: Orange (C)
Double-check your choices then click OK. The Response Surface plot for variable Accept is now displayed in
the upper left sub-view.
Do the same in the other three sub-views with responses Sweet, Bitter and Fruity (Lookup Image G022).
Have a look at the four response surfaces and interpret them.
You may copy one of the plots to sub-view 1 (with Window - Copy To - 1) so as to study it in more detail.
Let us do so with response Accept (Lookup Image G023). We can see that consumer acceptance is low
(blue curves) for mixtures with high Watermelon or high Pineapple contents.
Maximum acceptance is reached for a fruit punch with relatively high Orange and low Pineapple. By
clicking on that point we can display its coordinates (A = 38.75, B = 16.04, C = 45.21) and the Accept value
(3.76).
Conclusions:
With the help of the Y-variance curve, the Influence plot and the Outlier List, we have found an error in
the data.
Once the punching error has been corrected, the PLS2 model has good quality (high explained Calibration
and Validation Y-variance).
The Correlation Loadings show the underlying logic in response variations.
The Regression Coefficients have large uncertainties for response Accept, but are better for the sensory
responses.
The Response Surface plots show maximum consumer acceptance for a fruit punch with about 39%
Watermelon, 16% Pineapple and 45% Orange.
Fluorescence spectroscopy is able to distinguish similar molecules and can discriminate identical molecules in
different chemical environments. This is because one can scan excitation spectra at specified emission
wavelengths and emission spectra at specified excitation wavelengths (EEM scans). This procedure
results in 3-D graphs of the fluorescence intensity with respect to different excitation and emission
wavelengths. But the EEM data are strongly intercorrelated and difficult to interpret. Standard unfolding
methods often give unsatisfactory results. We will use a three-way analysis approach to overcome this
problem.
Severity (Y Data)
The Y data is found in table Tutor_h_Y2D, consisting of 32 rows for the 32 woodchip samples and one
column, Severity.
Severity of steaming is a measure reflecting the duration and temperature of steam treatment. The spruce and
beech samples were treated with steam at temperatures from 160°C to 220°C. The Severity values range from
1.7 to 3.5.
Note: You will find the illustrations for this tutorial (Image H001, etc) at the end of the document.
Task
Toggle 3D data layouts.
How to Do It
Open the data file Tutor_h_X3D by selecting File - Open. It is a file of type 3D Data. (Lookup Image
H001)
The table opens in the 3D Editor. It is a table of OV2 layout (1 object mode, 2 variable modes); therefore its
column numbers have two parts. For example, column 1:6 corresponds to primary variable number 1 (Excitation
wavelength 250 nm) and secondary variable number 6 (Emission wavelength 350 nm). (Lookup Image
H002)
Toggle the layout several times (Ctrl+3) until you are back to an OV2 table of size 32 x (66 x 31), that is to say
32 samples, 66 Primary Variables and 31 Secondary variables. The size of the table is shown at the bottom
right corner of the Editor. (Lookup Image H003)
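The layout toggle corresponds to folding and unfolding a three-way array. A small numpy sketch of the idea (The Unscrambler's internal storage and column ordering may differ):

```python
# Sketch of the concept behind the layout toggle: a three-way table
# (32 samples x 66 excitation x 31 emission wavelengths) can be "unfolded"
# to a 2-D matrix and folded back without losing any values.
import numpy as np

X3 = np.arange(32 * 66 * 31, dtype=float).reshape(32, 66, 31)  # dummy EEM data

# Unfold: each sample becomes one row of 66*31 = 2046 columns; flat column
# (p, s) corresponds to primary variable p and secondary variable s.
X2 = X3.reshape(32, 66 * 31)

# Fold back to the OV2 layout:
X3_back = X2.reshape(32, 66, 31)
assert np.array_equal(X3, X3_back)

# Column "1:6" of the tutorial = primary variable 1, secondary variable 6;
# with 0-based indexing that is flat column (1-1)*31 + (6-1) = 5.
assert X2[0, (1 - 1) * 31 + (6 - 1)] == X3[0, 0, 5]
```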
Task
Study the raw data by plotting the fluorescence spectra of a few wood samples.
How to Do It
Go to menu Plot - Matrix 3-D and select sample 13, BFFi (Beech, Fresh wood, Fine grinding). The
excitation-emission spectrum for this sample is displayed in the Viewer. (Lookup Image H004)
You may use the Rotate option (View - Rotate) to view the spectral landscape from various angles.
Use either the mouse or the arrow keys on your keyboard to rotate the plot. Holding your finger on an arrow
key will allow a continuous rotation of the plot; pressing the Alt Gr key at the same time will slow down the
rotation.
Menu Edit - Options allows you to change the Plot Layout from a 3-dimensional Landscape
view into Contour or Map. (Lookup Image H005)
Go back to the 3D Editor and use menu Plot - Matrix 3-D to plot sample 29, SFFi (Spruce, Fresh wood, Fine
grinding). (Lookup Image H006)
Close your various matrix plots before proceeding with the tutorial.
Task
Define a Primary Variables set and a Secondary Variables set.
How to Do It
Go to menu Modify - Edit Set or use the corresponding shortcut Ctrl+E. This opens up the Set Editor
dialog. (Lookup Image H007)
Click on the Add button to open the New Primary Variable Set dialog. Use the following settings:
Name: Excitation 320-540 nm
Data type: Spectra
Interval: 15-59
Alternatively, click the Select button and select wavelengths 320 to 540 nm in the Select Variables
dialog.
(Lookup Image H008)
Click OK; you are back in the Set Editor dialog where you can see your Primary Variable Set.
Use the drop-down list and select option Secondary Variable Set. (Lookup Image H009). Click on the
Add button to open the New Secondary Variable Set dialog, and define a set as follows:
Name: Emission 370-600 nm
Data type: Spectra
Interval: 8-31
Alternatively, click the Select button and select wavelengths 370 to 600 nm in the Select Variables
dialog.
(Lookup Image H010)
Click OK; you are back in the Set Editor dialog where you can see your Secondary Variable Set. (Lookup
Image H011)
Note!
If you made any mistake in defining the variable sets, use the Properties button to return to the New
Primary/Secondary Variable Set dialog and make corrections accordingly.
Click OK; you are back in the 3D Editor. Use menu File-Save As to save the data sets information. You
may call your new table Tutor_h_X3D with sets. (Lookup Image H012)
Task
Set up the options for a Three-Way PLS Regression and launch the model calculations.
How to Do It
Make sure that your 3D data table Tutor_h_X3D with sets is on screen. Select Task - Regression to
open the Regression (Three-Way PLS) dialog. Choose the following options:
Sample Set: All Samples [32]
Match samples in X and Y Data Tables: By row numbers
Pri. X-Vars: Excitation 320-540 [45]
Weights: All 1.0
Sec. X-Vars: Emission 370-600 [24]
Weights: All 1.0
Y-Variable File: Tutor_h_Y2D
Variable Set: Severity [1]
Weights: All 1.0
Validation Method: Cross Validation. Use the Setup button to choose Full Cross Validation
Num PCs: 10
Center Data: selected
(Lookup Image H013)
Note!
In the Y-variables sheet, you may have to Browse to find the Y-Variable File Tutor_h_Y2D.
Click OK to launch the calculations. The Three-Way PLS Regression Progress dialog appears. As the
calculations run, the Y-Validation Residual Variance curve per cross validation segment is shown. When the
calculations are over, the Residual Y-Validation Variance curve for the global model is displayed. (Lookup
Image H014)
Hit the View button. The Regression Overview opens, showing four default plots. These are (clockwise):
Scores, X1-Loading Weights and Y-Loadings, Predicted vs. Measured, Residual Y-Validation Variance.
(Lookup Image H015)
How to Do It
Go to menu Plot - Sample Outliers. Keep the default settings and click OK. Four plots appear in the
Viewer: Scores, Influence, Y-Residual Sample Variance and X-Residual Sample Variance. (Lookup Image
H016)
Click on the Influence plot so that it is active, then use the X and Y buttons to display only X
information, or only Y information, or both. Sample 18 (SOFi) is an outlier with a high Residual Y-Variance.
Go to menu Edit - Mark - One By One or use the corresponding shortcut, then click on sample 18 in
the Influence plot. This sample is now marked by a circle on all plots. (Lookup Image H017)
Go to menu Task - Recalculate Without Marked. This brings up the Regression (Three-Way PLS)
dialog, and you can observe that sample 18 is shown in the Keep Out of Calculation field.
Check that the Cross Validation setup is still Full Cross Validation, and that the number of components
(Num PCs) is 10. (Lookup Image H018)
Click OK to compute a new model without sample 18 (Lookup Image H019). Click View to display the
Regression Overview. Go to menu Plot - Sample Outliers and check that no sample is outlying in this
new model. (Lookup Image H020)
Go to menu File - Save and save the new model as Wood Severity_model 2
Tasks
Interpret the model results: the Y-residual validation variance, the scores, the loading weights,
the regression coefficients and the Predicted vs. Measured plot.
Task
Interpret the Y-Residual Validation Variance plot and determine optimal number of components (PCs).
Note!
If your plot differs from the picture, you may adjust it using the corresponding toolbar buttons.
The Y-residual validation variance shows a plateau from PCs 7-8, in agreement with the suggested number of
components given by the software. We decide to be conservative and use 7 PCs for this model.
Task
Interpret the Scores plot and find out if there are any clear groups of samples.
How to Do It
Activate the Scores plot (map of samples) by clicking on it; it is the plot situated in the first quadrant. The
sample names contain a lot of information. Let us focus on Wood type.
Go to Edit - Options or click on the corresponding toolbar shortcut. This opens the Options dialog. In the Markers Layout
field, choose option Name, then click on the first box. This will disable the following boxes, so that only the
first character in the sample name will be kept (Lookup Image H022). Click OK. The Sample names only
indicate S for Spruce wood (soft) or B for Beech wood (hard).
Click on the Next Vertical PC button, or use the Up arrow key on your keyboard to display the Scores
for PC1 vs. PC3. We can observe that PC3 separates the Spruce samples (to the bottom) from the Beech
samples (to the top). (Lookup Image H023)
Task
Interpret the X-Loading Weights and find out which information is carried by PC3.
Click OK. The Loading Weights for excitation spectra (Primary variables, X1) appear in the top window and
the Loading Weights for emission spectra (Secondary variables, X2) appear in the bottom window. (Lookup
Image H025)
PC3 is represented in green on the plots. On the top plot, it shows a peak for excitation 355 nm. On the bottom
plot, it shows a peak for emission 400 nm.
These peaks correspond to the CH3O functional groups of the wood. The content of CH3O functional groups
is higher in hardwood lignin than in softwood lignin. This information is carried by PC3: the beech samples
have higher scores than the spruce samples for this PC.
Task
Interpret the Regression Coefficients and find important absorption/emission bands.
How to Do It
Go to Plot - Regression Coefficients, and in the Regression Coefficients dialog choose the following
settings:
Plot type: Matrix
X-variables: Primary X Vs Secondary X
Y-variable: 1, Severity
Components: 7
Double click on the preview screen at the top of the dialog to enlarge the plot: the plot will be displayed in Full
Window (Lookup Image H026)
Click OK to display the regression coefficients plot. The plot is shown in landscape layout. (Lookup Image
H027) We can observe four major areas presenting high regression coefficients (three positive, one negative).
To better study the plot, use the rotate function (View - Rotate). Use either the mouse or the arrow
keys on your keyboard to rotate the plot. Holding your finger on an arrow key will allow a continuous rotation
of the plot; pressing the AltGr key at the same time will slow down the rotation.
Menu Edit - Options allows you to change the Plot Layout from a 3-dimensional Landscape
view into Map. Move your mouse over the Map plot to get the coordinates for excitation and emission
wavelengths. (Lookup Image H028)
Task
Interpret the Predicted and Measured plot and find out which samples are best predicted.
How to Do It
Go to Plot - Predicted vs Measured. In the dialog, choose the following settings:
Plot type: Predicted and Measured
Y-variable: 1, Severity
Components: 7
Samples: Calibration
(Lookup Image H030)
Click OK to display the plot. The blue curve corresponds to our model, while the red curve corresponds to the
measured values. There is a good fit of the model. Yet we can observe that several samples are not as well
predicted as the others. By moving the mouse over these samples to identify them, you can see that fresh
wood samples (F) are generally better predicted than old wood samples (O). (Lookup Image H031)
The RMSEC for the model is accessible from Plot - Predicted vs Measured. Choose settings:
Plot type: Predicted vs Measured
Y-variable: 1, Severity
Components: 7
Samples: Calibration
The RMSEC is 0.11, for steam-treatment Severity values ranging from 1.7 to 3.5. This is about the size of
the reproducibility of the Severity measurement.
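RMSEC (Root Mean Square Error of Calibration) is simply the root mean squared difference between predicted and measured calibration values. A sketch with made-up numbers of roughly the magnitude reported above:

```python
# Sketch of what RMSEC measures: the root mean squared difference between
# calibration predictions and measured values. The values here are made up.
import math

measured = [1.7, 2.1, 2.5, 2.9, 3.5]
predicted = [1.8, 2.0, 2.6, 2.8, 3.4]

rmsec = math.sqrt(sum((p - m) ** 2 for p, m in zip(measured, predicted)) / len(measured))
print(f"RMSEC = {rmsec:.2f}")  # errors of ~0.1 on a 1.7-3.5 range, as in the tutorial
```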
Note: You will find the illustrations for this tutorial (Image I001, etc) at the end of the document.
Variables
The first three variables are concentration measurements of blue, green and orange dyes. Variables 4 to 59 are
UV/Vis spectra measured at range 250-800 nm with step 10 nm. In the Set Editor dialog box, select the
Variable Sets option to see the list of existing variable sets. (Lookup Image I003)
When you have seen the sets, click OK to leave this box and return to the data table.
Task
Plot the spectra of all mixture samples together:
How to Do It
1. Select the mixture samples 4-39 (either directly from the Editor, or with Edit - Select Samples; the set
you are interested in is called Mixture).
2. Use Plot - Line (or the corresponding toolbar button) and choose Variable set 250-800nm as scope for
the plot. (Lookup Image I004)
To plot the reference spectra of the three pure components, select samples 1-3 and make a Line plot of
Variable set 250-800nm. (Lookup Image I005)
To plot the reference concentrations of the three dyes, select columns 1-3 and make a Line plot of Sample set
Mixture. (Lookup Image I006)
Note:
Reference measurements of spectra and concentrations of pure components are not necessary to make your
data set suitable for MCR!
Task
Set up the options for an MCR analysis, launch the calculations and plot results.
Task
Plot MCR results for various numbers of pure components.
How to Do It
Actually, the Unscrambler MCR procedure generates several sets of results, covering a number of estimated
pure components from 2 to <optimum +1>. By default, the results are plotted for the optimal number of
components.
You may view the results for varying numbers of pure components. Let us plot the spectral profiles for a
2-component solution. Click on the Estimated Concentrations plot to make it active (blue frame), then click Plot
- Estimated Spectra, select Number of Components as 2, and Profiles 1-2 as shown. (Lookup Image
I010)
Click OK: the plot of estimated spectra for a resolution with two pure components is displayed.
In a similar manner, click on the bottom left subview to make that plot active, then use Plot - Estimated
Spectra, to plot the 4-component solution.
MCR fitting and Principal Component Analysis (PCA) fitting results are also available for varying numbers of
pure components from 2 to <optimum +1>. Each fitting includes Variable Residuals, Sample Residuals and
Total Residuals plots. The plot of Total Residuals for MCR fitting is shown by default in the lower-right
subview. Like any other plot, it can also be accessed from the Plot menu. Click and activate the lower-right
subview, then click Plot - Residuals. In the MCR fitting tab, select Total Residuals. (Lookup Image
I011)
Click OK.
Here are the four plots which should now be displayed in your Viewer: (Lookup Image I012)
If the lower-right plot appears as a curve instead of bars, use Edit - Options (or Ctrl+L) and select
Bars as Plot Layout.
100 Multivariate Curve Resolution of Dye Mixtures (Tutorial I) The Unscrambler Tutorials
Tutorial I - Interpret MCR results
Task
Determine the optimum number of pure components.
How to Do It
In the Total Residuals, MCR Fitting plot, residuals are high for 2 components, low for 3 components, and not
significantly decreasing for 4 components. (Lookup Image I012) This suggests that 3 components is the
optimum solution.
Click and activate the Estimated Spectra plot with 3 components, and enlarge it by clicking Window - Copy
To - 1. The toolbar contains a set of buttons which are used to navigate between results at
different numbers of components. Use the buttons to increase and decrease the number of components, and
watch the impact on the profiles.
As you can see, the 4-component solution contains two almost identical spectral profiles. This also suggests
that 4 components may not be the optimum number, and that the mixtures contain three pure components only.
Task
Run an MCR calculation with Initial Guess.
How to Do It
If prior knowledge such as spectra of pure components or concentrations of mixture samples exists, you may
include this information in the MCR calculation to help the algorithm converge towards the right solution of
curve resolution.
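Under the hood, curve resolution of this kind alternates least-squares estimates of concentrations and spectra, and a spectral initial guess simply seeds that loop. A heavily simplified sketch on simulated data (this is not The Unscrambler's actual algorithm, which applies more refined constraints):

```python
# Minimal MCR-ALS sketch (not The Unscrambler's implementation): alternate
# least-squares estimates of concentrations C and spectra S from mixture data
# D ~= C @ S, clipping negatives, starting from a spectral initial guess.
import numpy as np

rng = np.random.default_rng(0)
S_true = rng.random((3, 40))           # 3 pure spectra over 40 wavelengths
C_true = rng.random((36, 3))           # concentrations for 36 mixture samples
D = C_true @ S_true                    # noise-free mixture data

S = S_true * (1 + 0.05 * rng.random((3, 40)))  # initial guess: perturbed spectra
for _ in range(50):
    C = np.clip(D @ np.linalg.pinv(S), 0, None)   # concentrations, non-negative
    S = np.clip(np.linalg.pinv(C) @ D, 0, None)   # spectra, non-negative

residual = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
print(f"relative residual after ALS: {residual:.2e}")
```

With a good initial guess the loop converges quickly; with only a partial or misleading guess it can settle on a wrong factorization, which is the point of the notes below.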
Go back to data table Tutor_i by using menu Window - Tutor_i. Click Task - MCR. The MCR
dialog box with default settings will open up. In the dialog box, click Enable Initial Guess and select option
Spectra (Samples). (Lookup Image I013)
Click the Select button and pick rows 1 to 3 as initial guess for spectra (Lookup Image I014), then click
OK to return to the MCR dialog box.
Click OK to launch the calculations, then View to open the model results. (Lookup Image I015)
Save the result file as Dye_Result2.
Notes
1. When using the initial guess option, The Unscrambler requires all pure components to be included as
initial guess inputs. Partial reference will generate erroneous results. It is recommended to run MCR without
initial guess if only partial reference is available.
2. The Unscrambler only requires either spectra or concentrations of pure components as an initial guess input.
Tutorial I - Validate the Estimated Results with Reference
Information
Task
We are going to compare the model's Estimated Concentrations for a 3-component solution to the existing
reference concentrations found in the data table and plotted earlier. In a first step we are going to compare the
concentration profiles visually.
How to Do It
Select the Estimated Concentrations plot, then use menu Window - Copy To - 1. Reduce the window size of
the plot on your screen. Then go back to the data table (Window - Tutor_i) and build a line plot of the three
concentrations (first 3 columns of the table). Resize the windows of the two plots in order to compare them on
screen. (Lookup Image I016)
You can observe that the 1st estimated concentration profile is similar to the reference profile of the blue dye
(blue curves on the plots), the 2nd estimated concentration profile is similar to the reference profile of the green
dye (red curves on the plots), and the 3rd estimated concentration profile is very close to the reference
concentration of the orange dye (green curves on the plots).
Note!
Estimated concentrations are relative values within each individual component. The estimated concentrations of
a sample are NOT its real composition.
The estimated spectral profiles can be compared to the reference spectral profiles in the same way as for the
concentrations. Because we used the spectra as initial guess inputs in this example, the comparison shows a
perfect match. However, estimated spectra are unit-vector normalized; they are not the real spectral profiles
of the samples. (Lookup Image I017)
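Assuming "unit-vector normalized" means each estimated spectrum is scaled to Euclidean length 1 (the manual does not spell out which norm is used, so treat this as an assumption), a one-line sketch shows why the shapes match while the intensities do not:

```python
# Sketch of unit-vector normalization: each spectrum is divided by its
# Euclidean norm, preserving its shape but discarding its absolute intensity.
# (Assumption: "unit-vector" = Euclidean length 1.)
import numpy as np

spectrum = np.array([0.2, 1.5, 3.0, 1.1, 0.4])   # illustrative raw spectrum
normalized = spectrum / np.linalg.norm(spectrum)

print(np.linalg.norm(normalized))  # length 1, up to floating-point rounding
```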
Tasks
Import the MCR result matrix of estimated concentrations,
Compare the estimated concentrations to the reference concentrations in 2D scatter plots,
Convert the estimated concentrations into real scale.
How to Do It
Use menu File - Import - Unscrambler Results, and select your MCR result file Dye_Result2. Click
Import. The Import from MCR Result dialog box will open up. Select matrix Estimated Conc and type in 3
in the PCs box, to import the concentration profiles for a 3-component mixture system. (Lookup Image
I018)
Click OK to perform the importation. A new data table Dye_Result2_Estimated Conc is generated.
(Lookup Image I019)
Insert three empty rows at the top of this table, so that the table has a total of 39 rows. (Lookup Image I020)
Go to table Tutor_i, select the first three columns (blue, green and orange), copy them and paste them at the
beginning of the new data table. We now have a table of six columns, containing the three measured
concentrations of the pure dyes followed by the three estimated concentrations. (Lookup Image I021)
Select columns Blue and 1 (press the Ctrl key on your keyboard to select several columns at a time). Click
Plot - 2D Scatter to display a 2D Scatter plot of these columns. The correlation between estimated and
reference concentrations for the blue dye is 0.994. If the box containing plot statistics (among which the
correlation) is not displayed in the upper left corner of your plot, use View - Plot Statistics to display it.
For the green dye (columns Green and 2 in the table), the correlation between estimated and reference
concentrations is 0.997.
As for the orange dye (columns Orange and 3), the correlation is 0.998. These very high correlations
indicate that the MCR calculations have determined the concentration profiles accurately in this case. (Lookup
Image I022)
Now let us convert the estimated Orange concentrations to real scale. In order to do this, at least one reference
measurement is needed. The estimated concentrations (in relative scale) of all samples can be converted into
real concentration scale by multiplying by a factor <real concentration / estimated concentration>.
In the present case, we can use for example sample PROBE_11, which has a reference concentration of Orange
dye of 7 and an estimated concentration of 0.4443.
Use menu Edit - Append - Variables to append a new column at the end of the table, and name it MCR
Orange real scale. Go to Modify - Compute General, and type in the expression: V7=V6*(7/0.4443)
in the Expression space. (Lookup Image I023)
Click OK to perform the calculation. The new column fills up with the values of estimated Orange dye
concentrations converted to real scale. (Lookup Image I024)
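The Compute General expression above amounts to this simple rescaling. The estimated values below are illustrative, except the 0.4443 for PROBE_11, which is quoted from the text:

```python
# Sketch of the real-scale conversion done with Compute General: multiply the
# estimated (relative) concentrations by <reference / estimated> for one
# reference sample. Estimated values are illustrative except PROBE_11's 0.4443.
estimated = [0.1201, 0.4443, 0.3310]   # MCR outputs for three samples
factor = 7 / 0.4443                    # reference Orange conc. of PROBE_11 is 7

real_scale = [round(v * factor, 2) for v in estimated]
print(real_scale)                      # PROBE_11 maps back to 7.0 exactly
```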
Constraint Settings in Multivariate Curve Resolution
(Tutorial J)
Description of Tutorial J
Context of Tutorial J
In this tutorial we will utilize FTIR spectra of an esterification reaction to extract pure spectra and their relative
concentrations. The original data are from University of Rhode Island (Prof. Chris Brown), USA.
The esterification reaction of iso-propanol and acetic anhydride using pyridine as a catalyst in carbon
tetrachloride solution was monitored by FTIR. The initial concentrations of these three chemicals were 15%,
10% and 5% in volume, respectively. Iso-propyl acetate was one of the products in this typical esterification
reaction. The reaction was carried out in a ZnSe cell, and mixture spectra were measured at 4 cm-1 resolution.
The data set consisted of 25 spectra, covering approximately 75 minutes of the reaction. To shift the
equilibrium of the esterification, one-tenth of the volume was removed from the cell at 24, 45 and 60 minutes.
An equal amount of a single reactant was added to the cell in the sequence of acetic anhydride, pyridine and
iso-propanol.
Note: You will find the illustrations for this tutorial (Image J001, etc) at the end of the document.
Task
Run a PCA on the raw data.
How to Do It
Click Task - PCA to run a Principal Component Analysis and choose the following settings:
Sample set: All Samples
Variable set: All Variables
Validation Method: Full cross-validation
Num PCs: 10
(Lookup Image J002)
Once the PCA calculations are done, click View to open the result viewer. (Lookup Image J003)
Click Plot - Loadings, select a plot of type Line, and type in value 1-3 in field Vector 1, so that the first
three principal components will be represented into the same line plot. (Lookup Image J004)
Click OK to display the plot.
Select another plotting area by clicking on it with the mouse, for example the upper-right subview. Click Plot -
Loadings, select a plot of type Line, and type in value 4-6 in field Vector 1. Click OK to display the plot.
(Lookup Image J005)
You can see that the loadings along the 6th principal component are quite noisy. The program recommends four
components as the optimal number of PCs in this model. Select the Explained Variance plot by clicking on it
with the mouse, then click View - Numerical. (Lookup Image J006) As you can see, the explained
variance globally reaches a plateau from the 4th principal component. The 5th and 6th PCs still show some slight
increase; at that stage, it is difficult to know whether they represent noise or real information.
Now, study the Influence plot at the bottom-left corner of the Viewer. You may observe that sample 1 sticks
out from the group of samples, with a high leverage and a high residual variance. Go to menu Plot - Sample
Outliers to display a combination of four useful plots for outlier detection. The plot of Residual Sample
Variance at the bottom-left corner indicates a high validation residual for sample 1. (Lookup Image J007)
As there is no validation check in MCR, we can carry the outlier information obtained from the PCA into our
MCR modelling later on.
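The two quantities behind the Influence plot, leverage and residual sample variance, can be sketched as follows. The data are synthetic, and the deliberate offset on the first sample plays the role of the tutorial's outlying Sample 1:

```python
import numpy as np

# Influence-plot quantities: leverage vs. residual sample variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 40))
X[0] += 5.0  # shift the first sample away from the rest (an outlier)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_pcs = 4
T = U[:, :n_pcs] * s[:n_pcs]  # scores
P = Vt[:n_pcs]                # loadings

# Leverage: distance from the model centre in score space,
# h_i = sum_k t_ik^2 / s_k^2 (scores are orthogonal here).
leverage = np.einsum("ij,ij->i", T / s[:n_pcs] ** 2, T)
# Residual variance: how poorly the n_pcs-component model fits each sample.
res_var = np.mean((Xc - T @ P) ** 2, axis=1)

print("highest leverage: sample", np.argmax(leverage) + 1)
print("highest residual variance: sample", np.argmax(res_var) + 1)
```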
Task
Build a first MCR model with default settings.
How to Do It
Using menu Window - Tutor_j, go back to the data table. Click Task - MCR and keep the default
settings:
Sample set: All Samples
Variable set: All Variables
Non-negative concentrations: selected
Non-negative spectra: selected
Closure: not selected
Unimodality: not selected
Sensitivity to pure components: 100
Note: MCR computations are demanding. Building the model can easily take several minutes depending on the
size of the data set, the selected options and the capacity of your machine.
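The Unscrambler's internal MCR algorithm is not exposed, but the effect of the two default constraints (non-negative concentrations and non-negative spectra) can be illustrated with a small alternating-least-squares sketch. The data are synthetic, and the component count and iteration budget are arbitrary choices:

```python
import numpy as np
from scipy.optimize import nnls

# MCR-ALS sketch with non-negativity on concentrations and spectra.
rng = np.random.default_rng(2)
C_true = np.abs(rng.normal(size=(20, 3)))  # concentration profiles
S_true = np.abs(rng.normal(size=(3, 50)))  # pure-component spectra
D = C_true @ S_true                        # mixture data, D = C S

S = np.abs(rng.normal(size=(3, 50)))       # random initial spectra
for _ in range(50):
    # Solve D = C S for C >= 0, one sample (row of D) at a time ...
    C = np.array([nnls(S.T, d)[0] for d in D])
    # ... then for S >= 0, one variable (column of D) at a time.
    S = np.array([nnls(C, col)[0] for col in D.T]).T

print("relative reconstruction error:",
      np.linalg.norm(D - C @ S) / np.linalg.norm(D))
```

Repeatedly solving many small constrained least-squares problems is also why the full-size computation can take minutes on real spectra.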
Click View when the calculations are finished; the MCR result viewer opens. Notice that the program suggests
4 as the optimal number of pure components, by indicating (4) at the bottom of each plot. (Lookup Image
J009)
Task
Read the MCR Message List and follow the system's recommendation for the Sensitivity to pure
components setting.
How to Do It
Click on menu View - MCR Message List in model mIR Result1 to check the recommendations given
by the system. There are four types of recommendations:
Type 1: Increase sensitivity to pure components
Type 2: Decrease sensitivity to pure components
Type 3: Change sensitivity to pure components (increase or decrease)
Type 4: Baseline offset or normalization is recommended.
In the present case, the system recommends changing the Sensitivity to pure components setting. (Lookup
Image J010)
The default setting (100) that was used for Sensitivity to pure components is usually a good starting point.
After interpreting the results and reading the system recommendations, you can tune it up or down between 10
and 190. The higher the Sensitivity, the more pure components will be extracted. Therefore, if too many
components are extracted, it is recommended to reduce the setting. Conversely, if you would like to see
more components at an almost undetectable level, or even some noise profiles, it is recommended to increase
the setting.
Go back to the data table and re-do the MCR calculation with a Sensitivity to pure components setting of
150. (Lookup Image J011)
The plot of Estimated spectra is now shown by default for 5 components instead of 4 in the previous model.
(Lookup Image J012)
One can compare those profiles with FTIR spectra of known constituents, and identify the 5 estimated spectra
as pyridine, iso-propanol, a possible intermediate, propyl acetate and acetic anhydride, from curves 1-5
respectively.
Task
Run MCR with a closure constraint. Compare two MCR models on the same data, with and without closure.
How to Do It
Among the MCR settings we have used so far, two types of constraints were not selected.
A constraint of Unimodality can be applied to restrict the resolution to concentration profiles that have only
one maximum.
With a constraint of Closure, the resolution will yield concentration profiles whose sum is constant.
In the present case, acetic anhydride was added at 24 minutes (between the 8th and the 9th samples), which
means that the first 8 samples can be treated in closure conditions.
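Closure itself is easy to express numerically: after each update, every sample's concentration vector is rescaled to a constant sum (here 1). A minimal sketch with made-up values:

```python
import numpy as np

# Closure constraint: force each sample's concentrations to sum to 1
# by rescaling every row of the concentration matrix C (illustrative only).
C = np.array([[0.2, 0.5, 0.1],
              [0.4, 0.4, 0.4],
              [0.1, 0.8, 0.3]])

C_closed = C / C.sum(axis=1, keepdims=True)
print(C_closed.sum(axis=1))  # every row now sums to 1
```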
Go back to the data table and run a new MCR model with the following settings:
Sample set: Closure [8]
(contains the first 8 samples of the data table)
Variable set: All Variables
Non-negative concentrations: selected
Non-negative spectra: selected
Closure: selected
Unimodality: not selected
Sensitivity to pure components: 100
(Lookup Image J013)
Once the computations are finished, save the model file as mIR Result3.
You may compare the resolved concentration and spectral profiles of pure components with and without the
closure setting. To do that, compute a new MCR model on sample set Closure without checking the Closure
constraint option. Save the new model file as mIR Result4 and compare the results to mIR Result3.
The spectral profiles under the constraint of closure present higher peaks for pure component 1 (blue) at
wavenumbers around 1100 and 1250 cm-1. (Lookup Image J014)
You can also observe that under constraint of closure, the concentrations of the pure components always add
up to 1. (Lookup Image J015)
Task
Use the interactive Recalculate functionality to remove samples or variables with high residuals.
How to Do It
Click menu Window - mIR_Result1 to bring back your first MCR model on screen.
The Validation calculations of the PCA model that we built earlier indicated that Sample 1 was an outlier. We
can check this again in the MCR model by looking at the PCA fitting residuals. Click on the bottom-right
subview to highlight it, then use Plot - Residuals, choose sheet PCA Fitting and option Sample Residuals.
You may notice a high residual showing for Sample 1, compared to the other samples. Let us build a model
without this sample.
Use the marking tools to highlight sample 1 on one of the plots, for example the Sample Residuals,
PCA Fitting plot. (Lookup Image J016)
Click menu Task - Recalculate Without Marked to specify a new MCR calculation without sample 1.
(Lookup Image J017)
This brings you back to the MCR dialog, where Sample 1 is now included in the Keep Out Of Calculation
field. You may launch the calculations to get the new MCR results.
Note that similarly, you may want to keep out of the model non-targeted wavelength regions, or highly
overlapped wavelength regions.
Click Plot - Residuals and choose Variable Residuals. (Lookup Image J018)
Mark any unwanted variables on the plot using the marking tools, for example the variables around 1100-1140
cm-1 which present very high residuals (Lookup Image J019), then use Task - Recalculate Without
Marked to specify a new MCR calculation.
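The marking-and-recalculating workflow can be mimicked programmatically: compute the per-variable residual variance against the model, then drop the variables that stand out. A hedged NumPy sketch on synthetic data (the 3x-median threshold is an arbitrary choice, not The Unscrambler's rule):

```python
import numpy as np

# Build rank-2 structured data, then contaminate three variables with noise.
rng = np.random.default_rng(3)
T_true = rng.normal(size=(15, 2)) * 3.0
P_true = rng.normal(size=(2, 30))
X = T_true @ P_true + rng.normal(scale=0.2, size=(15, 30))
X[:, 10:13] += rng.normal(scale=2.0, size=(15, 3))  # noisy variables

# Fit a 2-component model and compute per-variable residual variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_pcs = 2
residual = Xc - (U[:, :n_pcs] * s[:n_pcs]) @ Vt[:n_pcs]
var_res = np.mean(residual**2, axis=0)

# "Mark" variables whose residual variance is far above the typical level.
keep = var_res < 3 * np.median(var_res)
print("variables marked for exclusion:", np.where(~keep)[0])
X_reduced = X[:, keep]  # recalculate the model on this reduced table
```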
Tutorial C - Illustrations
C001 The Light Absorbance Spectrum (line plot; vertical axis: Absorbance, log(1/T); samples 1-16)
C023 The Line Plot dialog with Source: Tutorial C and Matrix: ResYValTot
Tutorial D - Illustrations
D006 The Enam FRD Analysis of Effects results displayed with Significance Testing
Method: COSCIND
D009 The Statistics results plotted as Percentiles and Mean and SDev (Design
Samples)
D011 The Statistics results plotted as Percentiles and Mean and SDev (Design
samples and Center samples)
D012 The Response Surface dialog with the X-var sheet active
Tutorial E - Illustrations
E001 Data Table with category variable Iris
Tutorial F - Illustrations
F001 The Import ASCII dialog
F005 The Category Variable Wizard Enter Variable Name and Choosing Method
dialog
Tutorial G - Illustrations
G003 The Design Wizard - Define Mixture Variables dialog with three defined
variables
G007 The Information dialog displayed upon exiting the Design Wizard
G010 The Import Worksheet dialog - Selecting ranges for Data, Sample names and
Variable names
G014 The PLS2 Regression Progress dialog showing high residual variance and
several warnings
G021 The Response Surface dialog with the X-variables sheet active
G023 The Response Surface for Accept with the optimum coordinates and value
Tutorial H - Illustrations
H007 Set Editor dialog for an OV2 data table. Primary Variable Sets, Secondary
Variable sets and Sample sets can be defined
H009 Set Editor dialog for an OV2 data table. A Primary Variable Set was defined, now
the Secondary Variable Sets option is selected to define a new set
H017 Sample Outliers plots, Wood Severity_model 1, with sample 18 marked with a
circle
Tutorial I - Illustrations
I001 Tutor_i data table, size 39x59 (line plots of variables PROBE_01, PROBE_1B, PROBE_02,
PROBE_2B, PROBE_03, PROBE_3B, PROBE_04 and of BB_50, GR_50, OR_50, plus a sample
plot of the Blue, Green and Orange series)
I010 Estimated Spectra dialog, plotting estimated spectra for a 2-component solution
I020 Imported matrix after insertion of three empty rows to the top
I024 Editor with a column presenting the estimated Orange concentrations converted
to real scale.
Tutorial J - Illustrations
J009 MCR Overview for model mIR Result1 with default settings
J012 Estimated spectra with Sensitivity to pure components set at 150 (model mIR
Result2)