2
Classical univariate calibration model
• linear regression
• extrapolation allowed
• sensitivity directly definable
[Figure: measured value vs. concentration / %]
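The univariate case can be illustrated with a minimal sketch. The calibration standards below are hypothetical numbers, not data from the slides; the slope of the fitted line is the sensitivity.

```python
import numpy as np

# Hypothetical calibration standards (concentration in %, measured value).
concentration = np.array([0.05, 0.10, 0.20, 0.30, 0.40])
measured = np.array([0.12, 0.24, 0.48, 0.72, 0.96])

# Linear regression: measured = sensitivity * concentration + intercept.
sensitivity, intercept = np.polyfit(concentration, measured, deg=1)

def predict_concentration(value):
    """Invert the calibration line to obtain a concentration."""
    return (value - intercept) / sensitivity

unknown = predict_concentration(0.60)
```

Because the model is a single straight line, it can also be evaluated outside the calibrated range, which is what "extrapolation allowed" refers to.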
3
Multivariate calibration model
• evaluation of spectrum
• PLS = Partial Least Squares
4
OPUS QUANT2 (PLS) for advanced users
5
Principles and Properties of Factor
Analysis
6
Factor Analysis of Spectra
Factor analysis breaks apart the spectral data into the most
common spectral variations (factors, loadings, principal
components) and the corresponding scaling coefficients (scores)
Spectral data matrix (n × p) = Scores (n × d) × Loadings (d × p)
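As a rough illustration of this decomposition, the sketch below factorizes a synthetic data matrix with a singular value decomposition. The data and the dimensions n, p and d are assumptions chosen for the example, and the mean-centering step that PCA usually applies is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 6, 50, 2                       # spectra, data points, factors

# Synthetic spectra built from two hypothetical underlying variations.
true_loadings = rng.normal(size=(d, p))
true_scores = rng.normal(size=(n, d))
spectra = true_scores @ true_loadings    # n x p data matrix

# SVD yields orthonormal loadings (rows of vt) and the matching scores.
u, s, vt = np.linalg.svd(spectra, full_matrices=False)
scores = u[:, :d] * s[:d]                # n x d
loadings = vt[:d]                        # d x p

reconstructed = scores @ loadings
max_error = np.abs(spectra - reconstructed).max()
```

Since the synthetic data contain exactly two variations, two factors reproduce the matrix to numerical precision; real spectra need more factors or leave a residual.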
7
Factor Analysis of the Spectral Variance
(without Property Values)
8
Inverse Factor Analysis: Reconstruction
of a Spectrum using all Factors
[Figure: spectrum = 7.731 × factor 1 + (-0.699) × factor 2 + 3.67E-04 × factor 3 + (-1.15E-02) × factor 4 + (-2.04E-02) × factor 5]
9
Reconstruction using two Factors:
>99% of Information Content retained
[Figure: spectrum ≈ 7.731 × factor 1 + (-0.699) × factor 2]
In the software the spectra are not reconstructed. Each spectrum is
represented by just a few score values (data compression), which are
used in the modeling calculations.
10
Moving from factor analysis to PLS
11
PLS Loadings for components A and B
[Figure: PLS loadings 1 to 3 for component A and for component B]
12
Analysis of Spectra using PCA or PLS
models based on Scores and Loadings
[Figure: measured spectrum = 7.731 × factor 1 + (-0.699) × factor 2]
For the measured spectrum the scores are calculated according to the
factors (loadings) stored in the model.
The scores are used for the final evaluation in the PCA model
(identification) or PLS model (quantification).
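A minimal sketch of the score calculation for a measured spectrum follows. It assumes orthonormal loadings, as in PCA, so that the scores are plain projections; PLS implementations typically use additional weight vectors, which are not shown. The loadings and the four-point "spectrum" are hypothetical.

```python
import numpy as np

# Hypothetical stored model: two orthonormal loadings over four data points.
loadings = np.array([
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
])

def scores_for(spectrum, loadings):
    """Project a measured spectrum onto the stored loadings."""
    return loadings @ spectrum

# Build a test spectrum from known scores, then recover them.
measured = 7.731 * loadings[0] - 0.699 * loadings[1]
scores = scores_for(measured, loadings)
```

The recovered scores, not the full spectrum, are what the model evaluates afterwards.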
13
Experiment to show Capabilities of PLS
Reflectance spectra from glucose with admixtures of 1.0-1.9% talc
[Figure: absorbance spectra, scale 0.1 to 0.7]
14
Spectra after Vector Normalization
[Figure: absorbance spectra after vector normalization]
15
Not optimized PLS Model with a broad
Spectral Range used
R2 = 96.63, RMSECV = 0.05%, Rank 2
[Figure: NIR prediction / % vs. reference value / %, both axes 0.9 to 1.9]
Weighting of wavenumbers of the calibrated property.
PLS contains an automatic 'search' for relevant wavenumbers.
[Figure: weighting of wavenumbers, absorbance scale]
Both factors contain parts of the spectral variation caused by the talc content.
[Figure: Factor 1 and Factor 2, absorbance scale]
R2 = 99.68, RMSECV = 0.02%, Rank 2
[Figure: NIR prediction / % vs. reference value / %, both axes 0.9 to 1.9]
20
Regression Coefficients for the
optimized PLS Model
In the optimized model only the talc peak is considered.
[Figure: regression coefficients of the talc content, absorbance scale]
23
Data sets for model setup and method
validation
Calibration setup with test set validation
[Figure: splitting of the samples into calibration set, test set and validation sets (Val Set); arrow: robustness of model]
Validation with independent samples is the ONLY way to
• check the accuracy, reproducibility and robustness of PLS methods,
• select methods for routine use.
27
Principles of method development
28
Principles of method development
29
Distribution of samples
The concentration range of the calibration should extend beyond the expected analysis range if possible.
[Figure: prediction vs. reference value, marking the typical concentration range and a "rare sample" or outlier]
30
General parameters influencing the
modeling and the model accuracy
• Quality of instruments
e.g. Resolution, stability, signal/noise ratio, precision, robustness
31
Selection of calibration and test
samples
• Calibration and test set samples should be well distributed over the entire
property range
• As many samples as possible should be used for the test set, but important
samples must be in the calibration set. For big data sets, the samples are
split 50:50 between calibration set and test set.
32
Selection of spectral ranges for
calibration
• Avoid spectral noise, e.g. at the left and right borders of the spectra where the detector has low sensitivity or a cut-off.
• Avoid spectral ranges with total absorbance (absorbance > 2.0 AU).
• A quantitative evaluation is only possible up to 2 AU, counted from the baseline.
[Figure: example spectrum, absorbance (0 to 4) vs. wavenumber (10000 to 2000)]
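These rules can be expressed as a simple data-point mask. The sketch below is only an illustration: the window limits `low` and `high` are hypothetical, and the absorbance is assumed to be already baseline-corrected so that the 2 AU limit counts from the baseline.

```python
import numpy as np

def usable_mask(wavenumber, absorbance, low=4000.0, high=9000.0, max_au=2.0):
    """Flag data points inside the chosen window and below the AU limit."""
    in_window = (wavenumber >= low) & (wavenumber <= high)
    return in_window & (absorbance <= max_au)

wavenumber = np.array([10000.0, 8000.0, 6000.0, 5000.0, 3800.0])
absorbance = np.array([0.3, 0.8, 1.5, 2.6, 1.0])
mask = usable_mask(wavenumber, absorbance)
```

Only the points at 8000 and 6000 pass: the outer two lie in the noisy border regions and the point at 5000 exceeds 2 AU.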
33
Troubleshooting in case of poor
prediction
34
Troubleshooting in case of outliers
35
Troubleshooting in case of outliers
Solution:
• Revision of the reference analysis method (2nd reference technique, old
chemicals, operator?)
• Revision of accuracy, error limits and reproducibility of the reference
analysis?
• Repetition and/or multiple determination of the reference values for some
samples
36
NIR-Spectra of Water at various
temperatures
[Figure: absorbance units vs. wavenumber (cm-1), 7100 to 6700 cm-1, for 0 °C to 100 °C in 5 °C steps]
39
Difference Spectra of Water band on
increasing temperature
[Figure: absorbance units (-0.4 to 0.4) vs. wavenumber (cm-1), 7500 to 6000 cm-1, for 0 °C to 100 °C in 5 °C steps]
40
Toluene NIR-Spectra at various
temperatures
[Figure: absorbance units (-0.01 to 0.06) for 25 °C to 95 °C in 5 °C steps]
42
Error values for characterizing
calibration performance and validation
43
Error values for characterizing
calibration performance and validation
44
What our customers expect…
Customer YOU
Normal (Gaussian) Distribution
• +/- 2 standard deviations (+/- 2s): 95.5%
• +/- 3 standard deviations (+/- 3s): 99.7%
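The two percentages follow from the error function of the normal distribution. A short check, using only the Python standard library:

```python
import math

def coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

two_sigma = coverage(2)    # about 0.9545
three_sigma = coverage(3)  # about 0.9973
```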
R2 and its meaning: expresses the
relation of error bar and value range
R2 = 66.4%
R2 = 81.4%
R2 = 98.9%
R2 is the coefficient of determination; it is not the same as r2,
the squared correlation coefficient.
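The difference between the two quantities shows up as soon as predictions are biased. The sketch below assumes the common definitions R2 = 1 - SSE/SST and r2 = squared Pearson correlation, both in percent; the slides do not spell out the formulas, and the data are hypothetical.

```python
import numpy as np

def r2_determination(reference, predicted):
    """Coefficient of determination: 1 - SSE / SST, in percent."""
    sse = np.sum((reference - predicted) ** 2)
    sst = np.sum((reference - reference.mean()) ** 2)
    return 100.0 * (1.0 - sse / sst)

def r2_correlation(reference, predicted):
    """Squared Pearson correlation coefficient, in percent."""
    return 100.0 * np.corrcoef(reference, predicted)[0, 1] ** 2

reference = np.array([1.0, 1.2, 1.4, 1.6, 1.8])
biased = reference + 0.2          # perfectly correlated, constant bias

R2 = r2_determination(reference, biased)
r2 = r2_correlation(reference, biased)
```

Here r2 stays at 100% because the points lie on a straight line, while R2 drops to 50% because the error is large compared with the value range.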
R2: Calibration of Fat in Milk
50
Principle of Vector Normalization
[Figure: absorption spectra of acetylsalicylic acid and salicylic acid]
Normalization of spectra
[Figure: normalized absorption spectra of acetylsalicylic acid and salicylic acid]
[Figure: absorption band and its 1st derivative]
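A minimal sketch of vector normalization, assuming the usual definition (subtract the mean intensity, then divide by the Euclidean length of the centered spectrum). The four-point spectrum is hypothetical.

```python
import numpy as np

def vector_normalize(spectrum):
    """Mean-center a spectrum, then scale it to unit Euclidean length."""
    centered = spectrum - spectrum.mean()
    return centered / np.linalg.norm(centered)

spectrum = np.array([0.2, 0.4, 0.9, 0.5])
scaled_copy = 3.0 * spectrum + 0.1   # same shape, different offset and scale
a = vector_normalize(spectrum)
b = vector_normalize(scaled_copy)
```

Both inputs end up as the same normalized spectrum, which is the purpose of the step: differences in offset and overall intensity are removed before modeling.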
55
Spectra of Glucose after 1st Derivative
[Figure: 1st-derivative absorption spectra with 5 pt, 13 pt and 25 pt smoothing, two panels]
[Figure: 1st derivative (13 pt) and 2nd derivative (13 pt) absorption spectra, two panels]
60
Advantages and Disadvantages of
Derivatives
• Advantages of derivatives
• Contrast enhanced, more details visible
• Result does not depend on the spectral range used
• Disadvantages of derivatives
• Noise enhanced, smoothing step needed
• Result depends on the window size used
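The trade-off can be sketched with a simplified stand-in: a finite-difference derivative followed by a moving average. This is not the algorithm used in the software (which is not described on the slides); the band shape and the noise level are hypothetical.

```python
import numpy as np

def smoothed_first_derivative(spectrum, window):
    """First difference followed by a moving average of `window` points."""
    derivative = np.gradient(spectrum)
    kernel = np.ones(window) / window
    return np.convolve(derivative, kernel, mode="same")

x = np.linspace(0.0, 1.0, 201)
band = np.exp(-((x - 0.5) / 0.05) ** 2)       # hypothetical absorption band
rng = np.random.default_rng(1)
noisy = band + rng.normal(scale=0.01, size=x.size)

d5 = smoothed_first_derivative(noisy, 5)
d25 = smoothed_first_derivative(noisy, 25)
offset_free = smoothed_first_derivative(noisy + 0.3, 5)
```

A constant offset added to the spectrum leaves the derivative unchanged, while the residual noise in a flat region is clearly lower with the 25-point window than with the 5-point window, at the cost of a broadened band.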
61
Recommended smoothing Point
Settings for 1st Derivatives
• Resolution 8cm-1
• Quant2: 13 to 21pt, mainly 17pt
• Ident: 9 to 17pt, mainly 13pt
• Resolution 16cm-1
• Quant2: 9 to 17pt, mainly 13pt
• Ident: 9 to 17pt, mainly 9pt
62
Other Pre-processing Methods in
Quant2
63
Other Pre-processing Methods in
Quant2
• Min-Max Normalization
Only useful if you have a more or
less constant highest peak or you
are looking for peak ratios in the
selected spectral range.
Not really useful for NIR, quite risky
in most cases.
• Internal Standard
Used only if an internal standard is
used for scaling spectra
64
Other Pre-processing Methods in
Quant2
• Second Derivative
For elimination of offsets and skewed
baselines.
Common for dispersive systems to
increase the contrast of low-resolution
spectra.
Noise is strongly increased, which is why
1st derivative plus
Vector Normalization is the better choice.
65
Other Pre-processing Methods in
Quant2
66
OPUS QUANT2 (PLS) for advanced users
67
Load method with overview on spectra
and parameters
68
Component definition with units and
decimal point settings
69
Adding dummy components as category
variables
70
Adding dummy components as category
variables
71
Spectra table for spectra and
component values
72
Check spectra before loading them!
73
Missing values are handled as a blank
Even with missing values you can copy/paste tables from e.g. Excel to the spectra table.
74
Set sample number
75
New method based on mean spectra for
each sample (sample no.)
76
New method based on mean spectra for
each sample (sample no.)
• The new method can be further developed and new samples can be
added even with repeated measurements.
77
New method based on mean spectra for
each sample (sample no.)
Mean spectra
New method
78
New method based on mean spectra for
each sample (sample no.)
79
Component correlations
80
Component correlations
81
Calibration design
82
Calibration design
83
Calibration design
84
Dataset settings
85
Set data set
86
Spectra without reference values can be
set
For selected components, spectra can be excluded if the entry is blank or the component value is 0 or -1.
87
Automatic selection of test samples on
component values (Kennard-Stone)
88
Automatic selection of test samples on
component values (Kennard-Stone)
Samples with the lowest and highest property values are in the calibration set, the next inner ones in the test set.
89
Automatic selection of test samples on
component values (Kennard-Stone)
The next test sample is chosen with the maximum distance from the already selected ones in all dimensions (properties). Here it is found in the middle.
[Figure: marker "Next test sample"]
90
Automatic selection of test samples on
component values (Kennard-Stone)
10 % Test samples
The next test sample is chosen with the maximum distance from the already selected ones in all dimensions (properties) until the required percentage of test samples is reached.
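The selection steps described on these slides can be sketched as follows. This is a simplified illustration for a single property, not the software's implementation; the function name `select_test_samples` and the property values are hypothetical.

```python
import numpy as np

def select_test_samples(values, fraction):
    """Pick test samples spread over the value range (max-min distance)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    n_test = int(round(fraction * len(values)))
    # Lowest and highest samples stay in the calibration set;
    # the next inner ones seed the test set.
    selected = [int(order[1]), int(order[-2])][:n_test]
    candidates = [int(i) for i in order[2:-2]]
    while len(selected) < n_test and candidates:
        # Distance of each candidate to its nearest already selected sample.
        dist = [min(abs(values[c] - values[s]) for s in selected)
                for c in candidates]
        selected.append(candidates.pop(int(np.argmax(dist))))
    return selected

values = np.arange(21.0)          # hypothetical property values 0..20
test_idx = select_test_samples(values, 0.20)
```

With evenly spaced values, the first sample chosen after the two seeds is the one in the middle of the range, matching the behavior shown on the slide. For several properties the absolute difference would be replaced by a distance over all dimensions.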
91
Automatic selection of test samples on
component values (Kennard-Stone)
20 % Test samples
92
Automatic selection of test samples on
component values (Kennard-Stone)
50 % Test samples
93
Automatic selection of test samples
(Kennard-Stone) in scores space (PCA)
94
Quant2 OPUS 7: exclude redundant
samples
• The new algorithm looks for the k nearest neighbors (kNN) and removes
redundant samples that are very close to a given sample.
• This is the opposite of the Kennard-Stone algorithm, which is
used to find and select samples that cover the range of samples
well.
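The idea of removing near-duplicates can be illustrated with a greedy nearest-neighbor filter. This is a sketch under assumptions, not the algorithm implemented in the software: the function `drop_redundant`, the distance threshold and the score values are all hypothetical.

```python
import numpy as np

def drop_redundant(scores, min_distance):
    """Keep a sample only if no already kept sample lies closer than min_distance."""
    kept = []
    for i, point in enumerate(scores):
        if all(np.linalg.norm(point - scores[j]) >= min_distance for j in kept):
            kept.append(i)
    return kept

scores = np.array([[0.0, 0.0], [0.05, 0.0], [1.0, 0.0], [1.0, 0.02], [2.0, 1.0]])
kept = drop_redundant(scores, min_distance=0.1)
```

Samples 1 and 3 sit almost on top of samples 0 and 2 and are dropped, so the remaining set covers the same region with fewer spectra.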
RMSEP = 0.73
RMSEE = 0.54
RMSEP = 0.73
RMSEE = 0.81
Zoom in.
Quant2 OPUS 7: Set Color in PCA Score
Plot
Set color.
Quant2 OPUS 7: Set Color in PCA Score
Plot
Done.
Set dataset
110
Set color for plots on page Graph
111
Set color for plots on page Graph
112
Set color for plots on page Graph
113
OPUS QUANT2 (PLS) for advanced users
114
Parameter page for data pretreatment
and spectral regions
115
Data pretreatment in any order and in
any spectral ranges
116
Data pretreatment in any order and in
any spectral ranges
CAUTION!
Everything possible,
but maybe not useful!
117
Data pretreatment in any order and in
any spectral ranges
118
Data pretreatment in spectral regions
selected for modeling
119
Interactive selection of spectral regions
120
Display preprocessed spectra
121
Display preprocessed spectra but only
every x th sample
122
Statistics for repeated measurements
(replicates) on preprocessed spectra
123
Statistics for repeated measurements
(replicates) on preprocessed spectra
124
Statistics for repeated measurements
(replicates) on preprocessed spectra
125
Model calculation including validation
126
Model calculation including validation
127
Internal Validation
1) cross-validation
2) test-set-validation
128
(Full) Cross Validation
129
(Full) Cross Validation
130
(Full) Cross Validation
131
(Full) Cross Validation
This procedure is continued until all samples have been taken out, tested and put back into the calibration set.
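The take-out, test and put-back loop can be sketched as follows. To keep the example self-contained, a univariate linear fit stands in for the PLS model, and the data are hypothetical; RMSECV is assumed to be the root mean square of the left-out prediction errors.

```python
import numpy as np

def rmsecv_leave_one_out(x, y):
    """Root mean square error of leave-one-out cross validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i                    # take sample i out
        slope, intercept = np.polyfit(x[keep], y[keep], deg=1)
        errors.append(slope * x[i] + intercept - y[i])   # test it
    return float(np.sqrt(np.mean(np.square(errors))))

x = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60])   # hypothetical signal
y = np.array([1.0, 1.2, 1.4, 1.6, 1.8, 2.0])         # reference values / %
rmsecv = rmsecv_leave_one_out(x, y)
```

For this noise-free example the error is numerically zero; with real data every sample contributes one independent prediction error, so all samples serve for both calibration and validation.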
132
(Full) Cross Validation
Advantages of Cross
Validation:
133
(Full) Cross Validation
Disadvantages of
Cross Validation:
134
Test Set Validation
Samples from the Test Set need to be independent from the Calibration
Data Set
135
Test Set Validation
Problem: Only 50% of the samples are used for calibration set up.
136
Cross validation, (full) cross validation
137
OPUS QUANT2 (PLS) for advanced users
138
NIR predictions vs. true values
(reference) in the model validation
139
NIR predictions vs. true values
(reference) for the calibration
140
Statistics for the model validation
Residual Prediction Deviation (RPD)
RPD = SD/SECV or RPD = SD/SEP
SD = standard deviation of the true values (reference)
RPD > 3: acceptable model
141
Statistics for the model validation
Residual Prediction Deviation (RPD)
RPD = SD/SECV or RPD = SD/SEP
SD = standard deviation of the true values (reference)

RPD          Classification   Application
<1.0         very poor        not recommended
1.0 - 2.4    poor             not recommended
2.5 - 2.9    fair             rough screening
3.0 - 3.9    reasonable       screening
4.0 - 5.9    good             QC
6.0 - 7.9    very good        QA
8.0 - 10.0   excellent        any application
>10.0        superior         as good as reference
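The RPD calculation and the classification table can be sketched in a few lines. The reference values and the SECV below are hypothetical, and values that fall into the gaps of the table (e.g. 2.45) are assigned to the lower class, which is an assumption of this sketch.

```python
import statistics

# Upper bounds (exclusive) of the classes taken from the table above.
RPD_CLASSES = [
    (1.0, "very poor"), (2.5, "poor"), (3.0, "fair"), (4.0, "reasonable"),
    (6.0, "good"), (8.0, "very good"),
]

def rpd(sd, error):
    """RPD = SD / SECV or SD / SEP."""
    return sd / error

def classify_rpd(value):
    """Map an RPD value to the classification of the table."""
    for upper, label in RPD_CLASSES:
        if value < upper:
            return label
    return "excellent" if value <= 10.0 else "superior"

reference = [31.0, 33.0, 35.0, 37.0, 39.0]   # hypothetical true values
sd = statistics.stdev(reference)
value = rpd(sd, error=1.0)                   # hypothetical SECV of 1.0
label = classify_rpd(value)
```

With a standard deviation of about 3.16 and an SECV of 1.0, the model lands in the "reasonable" class, just above the RPD > 3 limit.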
142
Statistics for the model validation
143
Regression line, ideal case
Regression
line (blue)
144
Regression line, non ideal case
Regression
line (blue)
145
Statistics for the model validation
146
Differences vs. true values (reference)
The distribution of the deviations and especially the range between minimum and maximum deviation helps to check model performance.
147
Error vs. rank
Each factor contributes helpful information for lowering the error. After reaching a minimum, the error increases again (overfitting).
148
Mahalanobis distance (MD) and spectral
residuals
Only spectra in the upper right corner are potential outliers, but not spectra of samples with very low or high property values.
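A minimal sketch of a Mahalanobis distance in score space, assuming the textbook definition with the mean and covariance of the calibration scores. The exact definition and scaling used in the software may differ, and the score values are hypothetical.

```python
import numpy as np

def mahalanobis(score, calibration_scores):
    """Distance of one score vector from the calibration score cloud."""
    center = calibration_scores.mean(axis=0)
    cov = np.cov(calibration_scores, rowvar=False)
    diff = score - center
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

calibration_scores = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
inside = mahalanobis(np.array([1.0, 0.0]), calibration_scores)
outside = mahalanobis(np.array([3.0, 3.0]), calibration_scores)
```

A spectrum whose scores lie far from the calibration cloud gets a large distance and is a candidate outlier, particularly when its spectral residual is high as well.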
149
Quant2 OPUS 7: New Mahalanobis
Distance threshold
To check MD settings go to calibration!
In OPUS 7 the threshold is set based on the calibration set statistics.
Almost all calibration spectra will be below the threshold. This is logical because those samples belong to the calibration set.
Scores plot showing PLS scores
153
Statistics based on the predictions for
repeated measurements
154
Regression coefficients (b-vector)
The regression coefficients show the weighting of each data point (wavenumber or wavelength) in the model.
155
PLS loadings (factors)
156
All plots as values in the full report
157
Component Value Density
158
Detection of relevant samples for
calibration expansion by the prediction
159
Detection of relevant samples for
calibration expansion by the prediction
[Figure: component value density (0 to 60) and model plot of NIR prediction vs. true value (reference), both axes 31 to 45]
160
Statistics based on the predictions for
repeated measurements
161
OPUS QUANT2 (PLS) for advanced users
162
Optimization with NIR, A or B algorithm
163
Optimization with NIR, A or B algorithm
164
Direct transfer of settings to the
parameter page for the selected model
165
Basic settings with a broad maximum
test range
166
Pre-defined spectral ranges for NIR
optimization
167
Pre-defined spectral ranges for NIR
optimization
168
10 spectral ranges for A & B
optimization by splitting the test range
169
10 spectral ranges for A & B
optimization by splitting the test range
170
User defined spectral ranges for A & B
optimization
171
User defined spectral ranges for A & B
optimization
172
OPUS QUANT2 (PLS) for advanced users
173
Overview NIR spectral regions
O-H
C-H
N-H
174
User defined regions for A opt. of C-H
and N-H (w/o water and water vapour)
175
User defined regions for NIR
optimization of water (moisture)
176
Suggested spectral regions for user
defined optimization with Quant2 (PLS)
A optimization
for C-H and N-H
9000 - 8000 cm-1
8000 - 7450 cm-1
6900 - 6770 cm-1
6770 - 6400 cm-1
6400 - 6030 cm-1
6030 - 5500 cm-1
4950 - 4770 cm-1
4770 - 4600 cm-1
4600 - 4500 cm-1
4500 - 3850 cm-1
NIR optimization
for O-H
10550 - 9250 cm-1
7100 - 6800 cm-1
6800 - 6400 cm-1
6400 - 6030 cm-1
5300 - 4950 cm-1
Regions above 9000 cm-1 are normally not considered for reflection spectra obtained with an integrating sphere.
177
OPUS QUANT2 (PLS) for advanced users
178
Different models can be tested at once
with a list of spectra
179
Adding true values (reference) for
comparison with predictions
180
Copy/paste of true values (reference)
for comparison with predictions
181
Copy/paste of true values (reference)
for comparison with predictions
182
Predictions overview
183
Prediction vs. true value (reference)
with target and regression line (blue)
184
Easy comparison of different models
185
Difference vs. true value (reference)
with bias line (blue)
186
Quant2 Filelist OPUS 7: marking of MD
and calibration range outliers
Marking according to the indication in the table on page 'Analysis Results':
• MD/range OK
• MD not OK
• out of range
• MD and range not OK
Result statistics
190
74 PLS models for API in tablets:
calibration results
[Figure: RMSEP or RMSECV of calibration (0 to 10) for models 1 to 74]
191
74 PLS models for API in tablets:
calibration and validation results
[Figure: RMSEP or RMSECV of calibration and RMSEP of validation (0 to 10) for models 1 to 74]
192
Region with API information but bad
influence
193
Spectra of tablets and pure API
194
Maximized spectra of tablets and pure
API
195
Select regions related to API
Remove API spectrum before starting optimization!
196
Model robustness check by prediction of
independent samples across
instruments
• Sunflower samples were scanned on 3 Bruker Instruments
• Each sample was scanned twice, with re-filling
• Same cup filling was measured on all instruments
• Predictions were done with 5 models obtained during model
optimization process
• All models showed very similar calibration results but behave differently in
terms of
197
Model robustness check by prediction of
independent samples across
instruments
Protein (plot axis 23 to 38):
Model   RMSECV   SEP
1       1.0      1.3
2       0.99     1.7
3       1.1      1.7
4       1.1      1.7
5       1.2      2.5
202
Modeling with big spectra data sets
transferred from Foss and new Bruker
data
When Foss spectra are transferred, the number of available samples
is limited. Sometimes the reference values are not available or too old (e.g. for
moisture).
Nevertheless, as many samples as possible should be measured on the Bruker instrument.
Reference values are required for the calibration samples, but not for the
transfer samples.
For modeling and model selection it is helpful to scan samples several
times so that models can be checked and selected by repeatability.
203
Modeling with big spectra data sets
transferred from Foss and new Bruker
data
The modeling must be guided towards the characteristic of
Bruker spectra by a proper splitting of data sets:
204
Innovation with Integrity
© 2011 Bruker Corporation. All rights reserved.
www.bruker.com