default risk

Credit, May 2000
A MAJOR CHALLENGE in developing models that can effectively assess the credit risk of individual obligors is the limited availability of high-frequency objective information to use as model inputs. Most models estimate creditworthiness over a period of one year or more, which often implies the need for several years of historical financial data for each borrower¹. While reliable and timely financial data can usually be obtained for the largest corporate borrowers, they are difficult to obtain for smaller borrowers, and particularly difficult to obtain for companies in financial distress or default, which are key to the construction of accurate credit risk models. The scarcity of reliable data required for building credit risk models also stems from the highly infrequent nature of default events. In addition to the difficulties associated with developing models, the limited availability of data presents challenges in assessing the accuracy and reliability of credit risk models.

In its recent report on credit risk modelling, the Basle Committee on Banking Supervision highlighted the relatively informal nature of the credit model validation approaches at many financial institutions. In particular, the Committee emphasised data sufficiency and model sensitivity analysis as significant challenges to validation. The Committee has identified validation as a key issue in the use of quantitative default models, and concluded that validation will prove to be a key challenge for banking institutions in the foreseeable future².
This article describes several of the techniques that Moody's has found valuable for quantitative default model validation and benchmarking. More precisely, we focus on (a) robust segmentation of data for model validation and testing, and (b) several robust measures of model performance and inter-model comparison that we have found informative and currently use. These performance measures can be used to complement standard statistical measures.

We address the two fundamental issues that arise in validating and determining the accuracy of a credit risk model: what is measured, or the metrics by which model goodness can be defined; and how it is measured, or the framework that ensures that the observed performance can reasonably be expected to represent the behaviour of the model in practice.
Model accuracy
When used as classification tools, default risk models can err in one of two ways³. First, the model can indicate low risk when, in fact, the risk is high. This Type I error corresponds to the assignment of high credit quality to issuers who nevertheless default, or come close to defaulting, on their obligations. The cost to the investor can be the loss of principal and interest, or a loss in the market value of the obligation.

Second, the model can assign a low credit quality when, in fact, the quality is high. Potential losses resulting from this Type II error include the loss of return and origination fees when loans are either turned down or lost through non-competitive bidding. These accuracy and cost scenarios are described schematically in Figures 1 and 2. Unfortunately, minimising one type of error usually comes at the expense of increasing the other. The trade-off between these errors is a complex and important issue. It is often the case, for example, that a particular model will outperform another under one set of cost assumptions, but be disadvantaged under a different set of assumptions.

Since different institutions have different cost and pay-off structures, it is difficult to present a single cost function that is appropriate across all firms. For this reason, we use here cost functions related only to the information content of the models.
A validation framework
Performance statistics for credit risk models can be highly sensitive to the data sample used for validation. To avoid embedding unwanted sample dependency, quantitative models should be developed and validated using some type of out-of-sample⁴, out-of-universe and out-of-time testing approach on panel or cross-sectional data sets.

However, even this seemingly rigorous approach can generate false impressions about a model's reliability if done incorrectly. Hold-out testing can easily miss important model problems, particularly when processes vary over time, as credit risk does.

In the following section, we describe a validation framework that accounts for variations both across time and across the population of obligors⁵.
Validation methodologies for default risk models

The Basle Committee has identified credit model validation as one of the most challenging issues in quantitative credit model development. Jorge Sobehart, Sean Keenan and Roger Stein of Moody's Investors Service address issues of data sparseness and the sensitivity of models to changing economic conditions along the Basle guidelines.
A schematic of the framework is shown in Figure 3. The figure breaks up the model testing procedure along two dimensions: (a) time (along the horizontal axis); and (b) the population of obligors (along the vertical axis). The least restrictive validation procedure is represented by the upper-left quadrant, and the most stringent by the lower-right quadrant. The other two quadrants represent procedures that are more stringent with respect to one dimension than the other.

The upper-left quadrant describes validation data chosen completely at random from the full data set. This approach assumes that the properties of the data stay stable over time (a stationary process). Because the data are drawn at random, this approach validates the model across the population of obligors, preserving its original distribution.

The upper-right quadrant describes one of the most common testing procedures. In this case, data for building a model are chosen from any time period prior to a certain date, and validation data are selected from time periods only after that date. A model constructed with data from 1990 to 1995 and tested on data from 1996 through 1999 is a simple example of this out-of-time procedure.

Because model validation is performed with out-of-time samples, time dependence can be detected using different validation sub-samples. However, since the sample of obligors is drawn from the population at random, this approach also validates the model preserving its original distribution.
The lower-left quadrant represents the case in which the data are segmented into a model estimation set and a validation (out-of-sample) set containing no firms in common. If the population of the validation set is different from that of the model estimation set, the data set is out-of-universe. An example of out-of-universe validation would be a model that was trained on manufacturing firms but tested on other industry sectors. This approach validates the model homogeneously in time and will not identify time dependence in the data.

Finally, the most stringent procedure is shown in the lower-right quadrant. In addition to being segmented in time, the data are also segmented across the population of obligors. An example of this approach⁶ would be a model constructed with data for all rated manufacturing firms from 1980 to 1989 and tested on a sample of all retail firms rated Ba1 or lower for 1990 to 1999.
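The four quadrants can be expressed as simple data-segmentation rules. The sketch below is illustrative only, not the authors' procedure; the record fields (`year`, `sector`) and the choice of cutoff year and hold-out sectors are assumptions.

```python
import random

def split(data, across_time=False, across_universe=False,
          cutoff_year=1989, holdout_sectors=("retail",),
          holdout_fraction=0.3, seed=0):
    """Return (estimation_set, validation_set) for the chosen regime."""
    if across_time and across_universe:
        # Most stringent quadrant: train on one universe up to the cutoff,
        # validate on a different universe after it.
        est = [d for d in data
               if d["year"] <= cutoff_year and d["sector"] not in holdout_sectors]
        val = [d for d in data
               if d["year"] > cutoff_year and d["sector"] in holdout_sectors]
    elif across_time:
        # Out-of-time: split on the cutoff year only.
        est = [d for d in data if d["year"] <= cutoff_year]
        val = [d for d in data if d["year"] > cutoff_year]
    elif across_universe:
        # Out-of-universe: hold out whole sectors, ignoring time.
        est = [d for d in data if d["sector"] not in holdout_sectors]
        val = [d for d in data if d["sector"] in holdout_sectors]
    else:
        # Plain out-of-sample: a random hold-out drawn from the full set.
        shuffled = list(data)
        random.Random(seed).shuffle(shuffled)
        k = int(len(shuffled) * (1 - holdout_fraction))
        est, val = shuffled[:k], shuffled[k:]
    return est, val
```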
Because default events are infrequent and default model outputs for consecutive years are highly correlated, it is often impractical to create a model using one data set and then test it on a separate hold-out data set composed of completely independent cross-sectional data. While such out-of-sample and out-of-time tests would unquestionably be the best way to compare models' performance, default data are rarely available.
As a result, most institutions face the following dilemma:
• If too many defaulters are left out of the in-sample data set, estimation of the model parameters will be seriously impaired and over-fitting becomes likely.
• If too many defaulters are left out of the hold-out sample, it becomes exceedingly difficult to evaluate the true model performance due to severe reductions in statistical power.

In light of these problems, an effective approach is to rationalise the default experience of the sample at hand by combining out-of-time and out-of-sample tests.

Figure 1: Types of errors

                                       ACTUAL
                            Low credit quality    High credit quality
MODEL  Low credit quality   Correct prediction    Type II error
       High credit quality  Type I error          Correct prediction

Figure 2: Cost of errors

                                       ACTUAL
                            Low credit quality           High credit quality
MODEL  Low credit quality   Correct assessment           Opportunity costs and lost
                                                         potential profits; lost interest
                                                         income and origination fees;
                                                         premature selling at
                                                         disadvantageous prices
       High credit quality  Lost interest and            Correct assessment
                            principal through defaults;
                            recovery costs; loss in
                            market value

Source: Moody's Risk Management Services

Figure 3: Schematic of out-of-sample validation techniques

[Quadrant diagram. Horizontal axis: whether validation data are separated across time (no/yes); vertical axis: whether they are separated across the universe of obligors (no/yes). The quadrants correspond to out-of-sample testing (no/no), out-of-sample and out-of-time testing (yes/no), out-of-sample and out-of-universe testing (no/yes), and out-of-sample, out-of-time and out-of-universe testing (yes/yes).]

Source: Moody's Risk Management Services
The procedure we describe is often referred to in the trading model literature as "walk-forward" testing and works as follows⁷. Select a year, for example 1989. Then fit the model using all the data available on or before the selected year. Once the model form and parameters are established, generate the model outputs for all the firms available during the following year (in this example, 1990).

Note that the predicted model outputs for 1990 are out-of-time for firms existing in previous years, and out-of-sample for all the firms whose data become available after 1989. Now move the window up one year, using all of the data through 1990 to fit the model and 1991 to validate it. The process is repeated using data for every year.

Finally, collect all the out-of-sample and out-of-time model predictions in a validation result set that can then be used to analyse the performance of the model in more detail. Note that this approach simulates the process by which the model will actually be used in practice. Each year, the model can be refitted and used to predict the credit quality of all known obligors, one year hence. The process is outlined in the lower left of Figure 4.
For example, for Moody's Public Firm Default Risk Model, we selected 1989 as the first year for which to construct the validation result set. Following the above procedure, we constructed a validation result data set containing over 54,000 observations (firm years), obtained from a sample representing about 9,000 different firms, and including over 530 default events from Moody's extensive database.

Once a result set of this type has been produced, a variety of performance measures of interest can be calculated. It is important to note that the validation set is itself a sub-sample of the population and, therefore, may yield spurious model performance differences based only on data anomalies. Several resampling techniques are available to leverage the available data and reduce the dependency on the particular sample at hand⁸.
A typical resampling technique proceeds as follows⁹. From the result set, a sub-sample is selected at random. The selected performance measure (for example, the number of defaults correctly predicted) is calculated for this sub-sample and recorded. Another sub-sample is then drawn, and the process is repeated. This continues for many repetitions until a distribution of the performance measure is established. A schematic of this validation process is shown in Figure 4.

Resampling approaches provide two related benefits. First, they give an estimate of the variability around the actual reported model performance. In those cases in which the distribution of means converges to a known distribution, this variability can be used to determine whether differences in model performance are statistically significant using familiar statistical tests. In cases where the distributional properties are unknown, non-parametric permutation-type tests can be used instead.

Second, because of the low numbers of defaults, resampling approaches decrease the likelihood that individual default events (or non-defaults) will overly influence a particular model's chances of being ranked higher or lower than another model.
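The resampling loop described above can be sketched as a simple bootstrap; the `metric` callable and the replication count are placeholders, not the authors' exact scheme.

```python
import random

def bootstrap_metric(result_set, metric, n_reps=1000, seed=0):
    """Resample the validation result set with replacement and recompute
    `metric` on each sub-sample, returning the distribution of values."""
    rng = random.Random(seed)
    n = len(result_set)
    values = []
    for _ in range(n_reps):
        sample = [result_set[rng.randrange(n)] for _ in range(n)]
        values.append(metric(sample))
    return values
```

The spread of the returned values (for instance, the maximum absolute deviation from the full-sample statistic) then serves as an error bound on the reported performance figure.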
Model performance and benchmarking
We introduce four objective metrics for analysing information redundancy¹⁰, and for measuring and comparing the performance of credit risk models in predicting default events: cumulative accuracy profiles, accuracy ratios, conditional information entropy ratios, and mutual information entropy. These techniques are quite general and can be used to compare different types of models.

In order to demonstrate the applicability of the methodology described here, we compared six univariate and multivariate models of credit risk using Moody's proprietary databases, including our default database and our credit modelling database. We compared the following models:
1) a simple univariate model based on return on assets (ROA)
2) reduced Z′-score model¹¹ (1993)
3) Z′-score model (1993)
4) a hazard model¹² (1998)
5) a variant of the Merton model based on distance to default¹³,
Figure 4: Testing methodology, end-to-end

[Schematic. Upper portion: for each window, dark circles mark the training set of firms taken at t0; white circles mark the validation set of original firms from the training sample taken at t1, and the validation set of new firms not in the training sample taken at t1. Lower left: the walk-forward window advances one year at a time from 1989 to 1999, and the resulting error distributions are pooled for resampling.]

Source: Moody's Risk Management Services

Moody's fits a model using a sample of historical data on firms and tests the model using both data on those firms one year later, and data on new firms one year later (upper portion of exhibit). Dark circles represent model estimation data and white circles represent validation data. We do "walk-forward testing" (bottom left) by fitting the parameters of a model using data through a particular year, testing on data from the following year, and then inching the whole process forward one year. The results of the testing for each validation year are aggregated and then resampled (lower left) to calculate particular statistics of interest.
and
6) Moody's Public Firm model, a model based on ratings, market and financial information (2000).

These models represent a wide range of modelling approaches, listed in order of complexity.
Cumulative Accuracy Profiles (CAPs)
Cumulative Accuracy Profiles (CAPs) can be used to make visual, qualitative assessments of model performance. While similar tools exist under a variety of different names (lift curves, dubbed curves, receiver-operator curves, power curves, etc), Moody's use of the term CAP refers specifically to the case where the curve represents the cumulative probability of default over the entire population, as opposed to the non-defaulting population only.

To plot a Type I Cumulative Accuracy Profile, companies are first ordered by model score, from riskiest to safest. For a given fraction x% of the total number of companies, the CAP curve is constructed by calculating the percentage y(x) of the defaulters whose risk score is equal to or lower than the score at fraction x. Figure 5 shows an example of a CAP plot.

A good model concentrates the defaulters at the riskiest scores and, therefore, the percentage of all defaulters identified (the y axis in the figure) increases quickly as one moves up the sorted sample (along the x axis). If the model assigned risk scores randomly, we would expect to capture a proportional fraction of the defaulters with about x% of the observations, generating a straight line or "random CAP" (the dotted line in Figure 5). A perfect model would produce the "ideal CAP", a straight line capturing 100% of the defaults within a fraction of the population equal to the fraction of defaulters in the sample. Because the fraction of defaulters is usually small, the ideal CAP is very steep.
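A minimal sketch of the Type I CAP construction, under the assumption that a higher model score denotes higher risk:

```python
def cap_curve(scores, defaults):
    """Type I CAP sketch: order firms from riskiest to safest by model
    score and record, for each fraction x of the population, the fraction
    y(x) of all defaulters captured. Returns the list of (x, y) points."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_defaults = sum(defaults)
    points, captured = [(0.0, 0.0)], 0
    for rank, i in enumerate(order, start=1):
        captured += defaults[i]
        points.append((rank / len(scores), captured / total_defaults))
    return points
```

Plotting the returned points reproduces the curve in Figure 5; a random model traces the diagonal, a perfect one the steep ideal CAP.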
One of the most useful properties of CAPs is that they reveal information about the predictive accuracy of the model over its entire range of risk scores for a particular time horizon. Figure 6 shows the CAP curves for several models using the validation sample. Similar results are obtained for the in-sample tests¹⁴. Note that Moody's model appears to outperform all of the benchmark models consistently.
Accuracy ratios
It is often convenient to have a single measure that summarises the predictive accuracy of a model. To calculate one such summary statistic, we focus on the area that lies above the random CAP and below the model CAP. The more area there is below the model CAP and above the random CAP, the better the model is doing overall (see Figure 5).

The maximum area that can be enclosed above the random CAP is identified by the ideal CAP. Therefore, the ratio of the area between a model's CAP and the random CAP to the area between the ideal CAP and the random CAP summarises the predictive power over the entire range of possible risk values. We refer to this measure as the accuracy ratio (AR), a fraction between 0 and 1. Models with AR values close to 0 display little advantage over a random assignment of risk scores, while those with AR values near 1 display almost perfect predictive power. Mathematically, the AR value is defined as

AR = (2 ∫₀¹ y(x) dx − 1) / (1 − f) = (1 − 2 ∫₀¹ z(x) dx) / f

Here y(x) and z(x) are the Type I and Type II CAP curves for a population x of ordered risk scores, and f = D/(N + D) is the fraction of defaults, where D is the total number of defaulting obligors and N is the total number of non-defaulting obligors. Note that our definition of AR provides the same performance measure for Type I and Type II CAP curves.

In a loose sense, AR is similar to the Kolmogorov-Smirnov (KS) test designed to determine whether the model is better than a random assignment of credit quality. However, AR is a global measure of the discrepancy between the CAPs, while the KS test focuses only on the maximum discrepancy and can be misleading in cases where two models behave quite differently as they cover more of the data space from low-risk to high-risk model outputs. Also notice that, because the comparison of ARs is relative to a data set, our definition of the AR is not restricted to having completely independent samples as in the KS test¹⁵.

Most of the models we tested had ARs in the range of 50% to 75% for (out-of-sample and out-of-time) validation tests. The results we report here are the product of the resampling approach described in the previous section. Thus, in addition to the reported value, we are also able to estimate an error bound for the statistic through resampling. We found that the maximum absolute deviation of the AR is of the order of 0.02 for most models¹⁶.

Figure 5: Type I CAP curve

[The dark curved line shows the performance of the model being evaluated. It depicts the percentage of defaults captured by the model (vertical axis) against the model score (horizontal axis). The heavy dotted line represents the naïve case, equivalent to a random assignment of scores. The grey dashed line represents the case in which the model discriminates perfectly and all defaults are caught at the lowest model output. The grey region represents the performance differential between the naïve model and the model being evaluated.]

Source: Moody's Risk Management Services
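The AR integral can be approximated from a discrete CAP curve with the trapezoid rule. A sketch, assuming `points` is a Type I CAP given as (x, y) pairs from riskiest to safest (as produced by the CAP construction described earlier):

```python
def accuracy_ratio(points, f):
    """AR = (2 * area under the Type I CAP - 1) / (1 - f), where f is the
    fraction of defaulters: 0 for a random model, 1 for a perfect one."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoid rule
    return (2.0 * area - 1.0) / (1.0 - f)
```

A perfect model, whose CAP rises linearly to 1 within the defaulter fraction f, yields AR = 1; the diagonal random CAP yields AR = 0.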
Table 1 shows AR values for the tested models for in-sample¹⁷ and validation tests. To confirm the validity of the AR figures, we also checked whether a particular model differed significantly from the one ranked immediately above it by calculating KS statistics, using about 9,000 independent observations selected from the validation set. More precisely, KS tests showed that only the reduced Z′-score and ROA were not significantly different.
Conditional information entropy ratio
A different performance measure is based on the information about defaults contained in the distribution of model scores, or information entropy (IE). Intuitively, the information entropy measures the overall amount of "uncertainty" represented by a probability distribution. In the same way we reduced the CAP plot to a single AR statistic, we can reduce the information entropy measures to another useful summary statistic: the conditional information entropy ratio¹⁸ (CIER).

To calculate the CIER, we first calculate the information entropy H₀ = H₁(p) without attempting to control for any knowledge that we might have about credit quality. Here p is the aggregate default rate of the sample and H₁ is the information entropy defined in the Appendix¹⁹. This entropy reflects knowledge common to all models: the likelihood of the event given by the probability of default. We then calculate the information entropy H₁(R) after having taken into account the risk scores R = {R₁, …, R_N} of the selected model. The CIER is defined as²⁰

CIER(R) = (H₀ − H₁(R)) / H₀

If the model held no predictive power, the CIER would be 0: the model provides no additional information on the likelihood of default that is not already known. If it were perfectly predictive, the CIER would be 1: there would be no uncertainty about the default event and, therefore, perfect default prediction. Because CIER measures the reduction of uncertainty, a higher value indicates a better model. Table 2 shows the CIER results. CIER errors are of the order of 0.02 and are obtained with a resampling scheme similar to the one described for the AR statistic.
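The appendix with the formal entropy definitions is not reproduced in this article, so the following sketch makes assumptions: H₁ is taken as the binary (default/no-default) Shannon entropy, and model scores are discretised into bands, each summarised by firm and default counts.

```python
import math

def binary_entropy(p):
    """H1(p) = -p*log2(p) - (1-p)*log2(1-p), with H1(0) = H1(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def cier(buckets):
    """buckets: list of (n_firms, n_defaults) per score band.
    CIER = (H0 - H1(R)) / H0, with H1(R) the default entropy
    averaged over score bands."""
    total = sum(n for n, _ in buckets)
    defaults = sum(d for _, d in buckets)
    h0 = binary_entropy(defaults / total)
    h1 = sum((n / total) * binary_entropy(d / n) for n, d in buckets if n)
    return (h0 - h1) / h0
```

Bands that separate defaulters cleanly drive H₁(R) towards zero and the CIER towards 1; bands with the same default rate as the sample leave the CIER at 0.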
Figure 6: CAP curves for the tested models

[This composite figure shows the CAP curves for six models — random assignment, ROA, reduced Z′-score, Z′-score, hazard model, Merton model variant and Moody's model — all tested on the same data set, with the captured percentage of defaults on the vertical axis and the population fraction on the horizontal axis. The 45° dashed grey line represents the naïve model, equivalent to a random assignment of scores. Note that Moody's model performs better than the Merton model variant at discriminating defaults in the middle ranges of credits.]

Source: Moody's Risk Management Services

Table 1: Selected accuracy ratios

                        In-sample AR    Validation AR
ROA                     0.53            0.53
Reduced Z′-score        0.56            0.53
Z′-score                0.48            0.43
Hazard model            0.59            0.58
Merton model variant    0.67            0.67
Moody's model           0.76            0.73

Source: Moody's Risk Management Services

Table 2: Selected entropy ratios

                        In-sample CIER  Validation CIER
ROA                     0.06            0.06
Reduced Z′-score        0.10            0.09
Z′-score                0.07            0.06
Hazard model            0.11            0.11
Merton model variant    0.14            0.14
Moody's model           0.21            0.19

Source: Moody's Risk Management Services

Mutual information entropy

To this point we have been describing methods of comparing models on the assumption that the best-performing model would be adopted. However, it is not unreasonable to ask whether a combination of models might perform better than any individual one. Two models may both predict 10 out of 20 defaulters in a sample of 1,000 obligors. Unfortunately, this information does not provide guidance on which model to choose. If each model predicted a different set of 10 defaulters, then using both models would double the predictive accuracy of either model individually²¹. In practice, there is considerable overlap, or dependence, in what two models will predict for any given data sample.

To quantify the dependence between any two models A and B, we use a measure called the mutual information entropy (MIE): a measure of how much information can be predicted about model B given the output of model A. MIE is defined as

MIE(r, R) = (H₁(r) + H₁(R) − H₂(r, R)) / H₀

where r and R are the risk score sets of models A and B respectively, and H₂(r, R) is the joint entropy defined in the Appendix. Because the MIE is calculated with the joint conditional distribution of models A and B, this measure requires a large number of defaults to be accurate. When default data are not widely available, this requirement can be relaxed by including reliable degrees of credit quality, such as agency ratings, instead of defaults only.
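The entropies H₀, H₁ and H₂ are defined in the authors' appendix, which this article omits. As one plausible reading, the sketch below treats them as default-event entropies conditioned on discretised score bands of one or both models; the band discretisation and the helper names are assumptions for illustration.

```python
import math

def _h(groups):
    """Default-event entropy given a grouping of firms: weighted binary
    entropy of the default rate within each group (in bits)."""
    total = sum(n for n, _ in groups.values())
    h = 0.0
    for n, d in groups.values():
        p = d / n
        if 0.0 < p < 1.0:
            h -= (n / total) * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return h

def mie(bands_a, bands_b, defaults):
    """MIE(r, R) = (H1(r) + H1(R) - H2(r, R)) / H0, computed over
    discretised score bands for two models on the same firms."""
    def group(keys):
        g = {}
        for k, dflt in zip(keys, defaults):
            n, d = g.get(k, (0, 0))
            g[k] = (n + 1, d + dflt)
        return g
    h0 = _h({None: (len(defaults), sum(defaults))})
    return (_h(group(bands_a)) + _h(group(bands_b))
            - _h(group(list(zip(bands_a, bands_b))))) / h0
```

Under this reading, feeding the same bands in for both models reduces the expression to H₁(r)/H₀ = 1 − CIER(A), matching the complete-dependence limit noted in the text.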
If models A and B are independent, the mutual information entropy is zero, while if model B is completely dependent on model A then MIE = 1 − CIER(A). The additional uncertainty generated by model B can be estimated by comparing it with the uncertainty generated by model A alone. In this context, the statistic serves much the same function as a correlation coefficient in a classic regression sense. However, the MIE statistic is based on the information content of the models.

Table 3 shows the difference D = MIE(A, B) − MIE(A, A), where A is Moody's model and B is any of the other selected models. In this example, we have compared all the benchmark models to Moody's model to determine whether they contain redundant information.

Summary

The benefits of implementing and using quantitative risk models cannot be fully realised without an understanding of how accurately any given model represents the dynamics of credit risk. This makes reliable validation techniques crucial for both commercial and regulatory purposes.

In the course of our research into quantitative credit modelling, we have found that simple statistics²² (such as the number of defaults correctly predicted) are often inappropriate in the domain of credit models. As a result, we have developed several useful metrics that give a sense of the value added by a quantitative risk model.

The four measures presented here permit analysts to assess the amount of additional predictive information contained in one credit model versus another. In situations where a specific model contains no additional information relative to another, the less informative should be discarded in favour of the more informative. In the special case where both models contribute information to each other, users may wish to combine the two to garner additional insight.

Jorge Sobehart is vice president, senior analyst, risk management services at Moody's Investors Service. E-mail: sobeharj@moodys.com

Sean Keenan is vice president, senior analyst, risk management services at Moody's. Roger Stein is vice president, senior credit officer, and director of quantitative modelling analytics.

A full version of this paper, including a mathematical description of information entropy and a full list of references, is available from the authors. Contact +44 (0)20 7772 5454.
FOOTNOTES

1. See, for example, Herrity, Keenan, Sobehart, Carty and Falkenstein (1999).
2. Basel, op. cit., p. 50.
3. Accuracy may be only one of many measures of model quality. See Dhar and Stein (1997).
4. In-sample refers to observations used to build a model. Out-of-sample refers to observations that are not included in the in-sample set. Out-of-universe refers to observations whose distribution differs from the in-sample population. Out-of-time refers to observations that are not contemporary with the in-sample set.
5. This presentation follows closely that of Dhar and Stein (1998), Stein (1999), and Keenan and Sobehart (1999), with additional clarifications from Sobehart, Keenan and Stein (2000).
6. This case is particularly important when one type of error is more serious than another. To illustrate, an error of two notches for an Aa-rated credit is generally less costly than a similar error for a B-rated credit.
7. See Sobehart, Keenan and Stein (2000).
8. The bootstrap (e.g., Efron and Tibshirani (1993)), randomisation testing (e.g., Sprent (1998)), and cross-validation (ibid.) are all examples of resampling tests.
9. See, for example, Herrity, Keenan, Sobehart, Carty and Falkenstein (1999).
10. See Keenan and Sobehart (1999).
11. For the definition of the original Z-score and its various revisions (Z′), see Caouette, Altman and Narayanan (1998).
12. For simplicity we selected the model based on Zmijewski's variables described in Shumway (1998).
13. For this research, Moody's has adapted the Merton model (1974) in a fashion similar to that in which KMV has modified it to produce their public firm model. More specifically, we calculate a distance to default based on equity prices and firms' liabilities. See also Vasicek (1984) and McQuown (1993). For an exact definition of Moody's distance to default measure see Sobehart, Stein, Mikityanskaya and Li (2000).
14. Here in-sample refers to the data set used to build Moody's model.
15. In fact, AR based on panel data sets will provide aggregated information about the time correlation of the risk scores.
16. Due to the high levels of correlation in the resampling, the maximum absolute deviation gives a more robust estimate of an error range than a corrected standard error.
17. Here in-sample refers to the data set used to build Moody's model.
18. This is similar to measures such as gain ratios used in the information theory and time series analysis literature (see, for example, Prichard and Theiler (1995)). However, our definition measures explicitly the uncertainty in predicting defaults instead of the overall uncertainty in the distribution of model outputs.
19. For additional details see Keenan and Sobehart (1999).
20. CIER = 1 − IER, where IER is the information entropy ratio defined in Herrity, Keenan, Sobehart, Carty and Falkenstein (1999). Here we introduce CIER for consistency with the concept of conditional entropy in information theory and communication theory.
21. Of course, combining the models could also create ancillary trade-offs with respect to increased Type II error.
22. For an example of a more standard approach to validation see Caouette, Altman and Narayanan (1998).
Table 3: Difference of mutual information entropy

                        In-sample MIE  In-sample D  Validation MIE  Validation D
ROA                     0.96           0.17         0.97            0.16
Reduced Z′-score        0.93           0.14         0.96            0.15
Z′-score                0.95           0.16         0.98            0.17
Hazard model            0.91           0.12         0.92            0.11
Merton model variant    0.87           0.08         0.87            0.06
Moody's model           0.79           0            0.81            0

Source: Moody's Risk Management Services

The table shows the difference D = MIE(A, B) − MIE(A, A), where A is Moody's model and B is any of the other selected models. The additional uncertainty generated by a model can be estimated by comparing it with the uncertainty generated by Moody's model alone.