
Journal of Risk Model Validation (3–16)

Volume 1/Number 4, Winter 2007/08

Validation of banks' internal rating systems: a challenging task?


Stefan Blochwitz
Department for Banking and Financial Supervision, Deutsche Bundesbank, Wilhelm-Epstein-Straße 14, D-60431 Frankfurt am Main, Germany; email: stefan.blochwitz@bundesbank.de

The revised risk-based capital framework (Basel II) developed by the Basel Committee on Banking Supervision (BCBS) aims at further strengthening and stabilizing banking systems around the world. A key element of Basel II is the use of banks' internal rating systems for the calculation of the minimum regulatory capital charge. We suggest here a pragmatic approach to assessing the validity of such internal rating systems. This approach follows the BCBS principles, which were developed on the assumption that a holistic approach to validation has to be taken; consequently, our approach establishes a broad view on validation. Our goal is to embed into these principles our comprehensive approach, which integrates backtesting of rating systems into a top-down assessment of their appropriateness. For the purpose of this article the BCBS principles mark our starting point.

1 IRBA VALIDATION - WHERE ARE WE?

1.1 Basel II - the framework


The revised risk-based capital framework (Basel II) developed by the Basel Committee on Banking Supervision (BCBS)1 aims at further strengthening and stabilizing banking systems around the world. The validation of banks' internal rating systems, which can be used to provide the input for the calculation of the minimum regulatory capital charge under the internal ratings-based approach (IRBA) of Basel II, is a key challenge for banks and supervisors alike.
The views expressed herein are the author's own and do not necessarily reflect those of the Deutsche Bundesbank. The author is indebted to Stefan Hohl from the BIS for his profound ideas, his valued collaboration and his in-depth contribution to this paper.
1 The Basel Committee on Banking Supervision is a committee of banking supervisory authorities that was established by the central bank governors of the Group of Ten countries in 1975. It consists of senior representatives of bank supervisory authorities and central banks from Belgium, Canada, France, Germany, Italy, Japan, Luxembourg, the Netherlands, Spain, Sweden, Switzerland, the United Kingdom and the United States.


The early interpretation of the Basel validation requirements focused mainly on aspects related to the narrow scope of backtesting. However, as a result of an intensive discussion among banks and supervisors, the BCBS developed guidelines on validation and published them in BCBS newsletter No. 4 (BCBS (2005b)). The BCBS guidelines were developed on the assumption that a holistic approach to validation has to be taken; consequently, they establish a broad view on validation. Our goal is to embed into these principles our comprehensive approach, which integrates backtesting into a top-down assessment of the appropriateness of rating systems. For the purpose of this article the BCBS principles mark our starting point.

The broad view on validation established by BCBS newsletter No. 4 reinforces the importance of the minimum requirements of the IRBA, as well as highlighting the importance of the risk management process. In addition, it fits nicely with our long-established view on the use of backtesting in a top-down assessment. A well-functioning credit rating system should demonstrate that the risk categories or rating buckets differ in terms of their risk content. The quantification of risk parameters is based on a bank's own historical experience, backed by other public information and, to a certain extent, private information.

The objective of the validation of a rating system is to assess whether the system can, and ultimately does, fulfil its task of, firstly, accurately distinguishing risk in a relative order and, secondly, quantifying credit risk in an absolute measure. The common view describes the term validation as a combination of quantitative and qualitative methods which, applied together, should indicate whether a rating system measures credit risk appropriately and whether it is properly implemented in the bank's risk management process.
Basel II has as one of its key objectives the improvement of banks' risk management processes. The basic idea is to minimize the gap between internal risk measurement and management assessments on the one hand and regulatory requirements on the other. As such, the methodologies and systems used in a bank's day-to-day operations and risk management practices should serve as a foundation for regulatory capital requirements. One of the key components for supervisors is the so-called use-test requirement (see BCBS (2005a), paragraph 444). The use test demands that internal ratings and default and loss estimates must play an essential role in the credit approval, risk management, internal capital allocation and corporate governance functions of banks using the IRB approach. Clearly, its assessment will play a crucial role in determining compliance with statutory standards, including the validation of IRB systems. In some instances, differences between IRB components and other internal risk estimates are unavoidable, as they result


from differences between prudential requirements in Basel II and economically driven risk management practices.2 However, the calculation of minimum regulatory capital is legally binding and must be set by the regulator. As such, the bottom line is that IRB components that are produced and used only for regulatory purposes will not be good enough for supervisory approval of the IRBA. The final framework of Basel II clearly highlighted the importance of sound validation of IRB systems used for regulatory purposes, but stopped short of specifying more detailed guidance on the subject.

1.2 Basel II expectations on validation clarified


The BCBS newsletter No. 4, January 2005 (see BCBS (2005b)), informs about supervisory expectations3 in the area of validation in Basel II. One of the most important pieces of information provided was the relatively simple answer to the question of what aspects of validation will be looked at. Despite the importance of validation as a requirement of the IRB approach, the Basel II documents do not explicitly specify what constitutes validation. Consequently, the Subgroup reached the agreement that, in the context of rating systems, the term validation encompasses a range of processes and activities that contribute to assessing whether ratings adequately differentiate risk and, more importantly, whether estimates of risk components (such as PD, LGD or EAD) appropriately characterize and quantify the relevant risk dimension.

Starting from this definition, the AIGV developed six important principles constituting a broad framework for validation, given below (see BCBS (2005b)). The validation framework covers all aspects of validation, including the goal of validation (principle 1), the responsibility for validation (principle 2), expectations on validation techniques (principles 3, 4 and 5) and the control environment for validation (principle 6). Publishing these principles was a major step in clarifying the ongoing discussions between banks and their supervisors about validation.

The principles establish a broad view on validation. In the past, validation was quite often seen as restricted to aspects related to the narrow focus of backtesting. The established broad view on validation reinforces the importance of the minimum requirements of the IRBA, as well as highlighting the importance of the risk management process. The debate around IRBA was too
2 For example, the different regulatory and accounting requirements for downturn loss given default (LGD), see BCBS (2005c), could be mentioned in this context. Another example is related to the quantification process for the IRBA risk parameters (probability of default (PD), LGD, exposure at default (EAD)). Basel II focuses on long-term averages to reduce the volatility of minimum regulatory capital requirements, which may not be fully in line with bank practice, leading to a different quantification process for the sole purpose of meeting supervisory standards.
3 The Subgroup on Validation (AIGV) of the BCBS Accord Implementation Group (AIG) was established in 2004 with the objective of sharing and exchanging views related to the validation of IRB systems.


Panel 1 Six basic principles for validation.


Principle 1: Validation is fundamentally about assessing the predictive ability of a bank's risk estimates and the use of ratings in credit processes
The two-step process for rating systems requires banks, firstly, to discriminate adequately between borrowers (ie, to separate risks and their associated risk of loss) and, secondly, to calibrate risk (ie, to quantify the level of risk accurately). The IRB parameters must, as always with statistical estimates, be based on historical experience, which should form the basis for the forward-looking quality of the IRB parameters. IRB validation should encompass the processes for assigning those estimates, including the governance and control procedures in a bank.

Principle 2: The bank has primary responsibility for validation
The primary responsibility for validating IRB systems lies with the bank itself and does not rest with the supervisor. This should reflect banks' self-interest in, and need for, a rating system that reflects their business. Supervisors obviously must review the bank's validation processes and should also rely upon additional processes in order to obtain an adequate level of supervisory comfort.

Principle 3: Validation is an iterative process
Setting up and running an IRB system in real life is by design an iterative process. Validation, as an important part of this cycle, should therefore be an ongoing, iterative process based on an iterative dialogue between banks and their supervisors. This may result in a refinement of the validation tools used.

Principle 4: There is no single validation method
Many well-known validation tools, such as backtesting, benchmarking and replication, are a useful supplement to the overall goal of achieving a sound IRB system. However, there is unanimous agreement that no universal tool is available that could be used across portfolios and across markets.
Principle 5: Validation should encompass both quantitative and qualitative elements
Validation is not a technical or solely mathematical exercise. Validation must be considered and applied in a broad sense, with its individual components - data, documentation, internal use, the underlying rating models and all processes the rating system uses - being equally important.

Principle 6: Validation processes and outcomes should be subject to independent review
For IRB systems, there must be an independent review within the bank. This specifies neither the organization within the bank nor its relationship across departments, but the review team must be independent of the designers of the IRB system and of those who implement the validation process.

often restricted to sole consideration of risk quantification or risk measurement. We think that this more balanced perspective, including more deliberation of the qualitative aspects of the IRBA, better reflects the issues of establishing and validating rating systems, especially given the important limitations in data.


Principles 3 to 5 establish a comprehensive approach for validating rating systems. We consider three mutually supporting ways to validate the internal rating systems of banks (see Blochwitz and Hohl (2006)); the AIGV principles 3 to 5 can be interpreted along the same lines. With that, we establish a comprehensive approach for validating rating systems, which encompasses a range of processes and activities that contribute to the overall assessment and final judgement.

(i) Component-based validation: analyzes each of the three elements - data collection and compilation, quantitative procedure and human influence - for appropriateness and workability.
(ii) Result-based validation (also known as backtesting): analyzes the rating system's quantification of credit risk ex post.
(iii) Process-based validation: analyzes the rating system's interface with other processes in the bank and how the rating system is integrated into the bank's overall management structure.

In the following we focus solely on result-based validation, or backtesting, and its integration into the comprehensive validation approach.

2 BACKTESTING RECONSIDERED

2.1 The role of backtesting and its limitations


The application of statistical backtesting in the IRB approach, as introduced in paragraph 501 of the Basel II document, requires the comparison of the ex ante PD per rating category - and, in the special case of consumer loans, the expected loss (EL) per pool of exposures4 - with its ex post realization. A common and widespread approach for backtesting rating assignments and their quantifications is the application of the law of large numbers, inferring the PD from the observed default rates.5 It seems to us that almost all backtesting techniques for PD (or EL) rely on this statistical concept. However, a proper application requires that borrowers are grouped into grades exhibiting
4 In the special case of consumer loans, the estimation and validation of key parameters is dependent on the approach taken by a bank. A rating system conceptually similar to that used for wholesale borrowers leads to an analogous assessment for purposes of validation. In contrast, instead of rating each borrower separately, Basel II clusters loans into homogeneous portfolios during the segmentation process (see above). This segmentation process should include assessing borrower and transaction risk characteristics, such as product type, as well as identifying the different delinquency stages (30 days, 60 days, 90 days, etc). Subsequently, the risk assessment on a (sub)portfolio level could be based on its roll rates, ie, the share of transactions moving from one delinquency stage to the next.
5 Besides the fact that an application of the law of large numbers requires that defaults are uncorrelated, there is another subtle violation of the prerequisites for applying the law of large numbers: it is required that the defaults stem from the same distribution. This requirement cannot be fulfilled for different borrowers.


similar default risk characteristics.6 This is necessary even in the case of direct estimates of PD, when each borrower is assigned an individual PD.

Statistical tests introduce test statistics for the distribution of default rates under the assumption that the PD estimate of a rating grade k is given by PDk. With the help of the test statistics, a mathematical relationship between a critical value pmax and a predefined confidence level α can be derived. This allows us to calculate pmax such that the ex post observed default rate exceeds pmax with a probability of only 1 − α; in other words, we are confident at the level α that the observed default rate does not exceed pmax. If it does exceed pmax, the calibration of the rating grade appears to be inappropriate. As a consequence, a potentially proper calibration is rejected with probability 1 − α, although the observed default rate may be too high only by accident; this is the false alarm rate of the test. Thus, there is a trade-off between the chosen level of confidence and the false alarm rate of the test. A higher confidence level implies a lower false alarm rate, and such a test identifies only the really deteriorated rating grades; but the critical value pmax increases as well, and in the extreme such a test would accept any rating system, which eventually is meaningless. If, on the contrary, the chosen α is rather low, rating systems which may even be well calibrated will be rejected. The application of a meaningful statistical test requires a proper balance between the critical value and the false alarm rate.7 A discussion of a variety of statistical tests, and their advantages and disadvantages, can be found in Blochwitz et al (2006) and the references given therein.

The most obvious limitations in the application of backtesting are due to the scarcity of data, as discussed above. However, there are others as well.
Firstly, PDs are estimated for a time horizon of one year, leaving only one data point per annum available for the comparison. Secondly, measuring PD is heavily dependent on the definition of a credit default, which in many instances is subjective. Basel II retains this subjective element as the basis of the IRB approach, albeit with a forward-looking focus and a back-stop ratio of 90 days past due. This may be justified, not least by the fact that a significant number of defaulted borrowers seem to have a considerable influence on the timing of their own credit default. Thirdly, the impact of a bank's rating philosophy - often referred to as point-in-time (PIT) versus through-the-cycle (TTC) rating - on backtesting is largely not reflected in pure statistical tests. Fourthly, the impact of correlations of defaults has to be taken into account. There is no doubt that default correlations exist, and they are usually modeled in credit risk portfolio models.
6 We believe that validation of rating systems - more specifically, of the calibration of PD - seems almost impossible without the grouping of borrowers into grades exhibiting the same risk profile, as required in Basel II.
7 Just to give an example: if defaults are assumed to be uncorrelated, then for a rating grade with PD = 1% and 500 borrowers, for α = 99% the critical value pmax would be 2%; if, instead, pmax were set to a more reasonable value of, say, 1.2%, the false alarm rate of such a test would be 33%. These figures are even more disappointing if correlations are taken into account or α is set to higher confidence levels.


However, in reality they are not easy to measure, and values for correlations reflecting real economic conditions cannot be estimated in a simple and reliable way. Their value can and should always be debated for these reasons.

PIT ratings measure credit risk given the current state of a borrower in its current economic environment, whereas TTC ratings measure credit risk taking into account the (assumed) state of the borrower over a full economic cycle. PIT and TTC ratings mark the two ends of the spectrum of possible rating systems. In practice, neither pure TTC nor pure PIT systems will be found; rather, hybrid systems leaning more to either PIT or TTC are being used. Broadly (and taking a simplified view), agency ratings are assumed to be more TTC, whereas banks' internal systems are considered to be more PIT.

The rating philosophy has an important impact on backtesting. Again simplifying, for TTC systems borrowers' rating grades are stable over time, reflecting the long-term full-cycle assessment; however, the observed default rates for the individual grades are expected to vary over time in accordance with changes in the economic environment. This is in contrast to rating systems based on the PIT philosophy. By reacting more quickly to changing economic conditions, borrowers tend to migrate more often through the rating grades throughout the business cycle, whereas the PD for each grade is expected to be more stable over time; that is, the PD is less dependent on the current economic environment. The BCBS did not favor one system over the other, and both rating philosophies, PIT and TTC, are acceptable for the IRBA. However, we think it reasonable to view risk parameters as a forecast of future realizations within a one-year time horizon. This reasoning is also reflected in the first validation principle of the AIGV, where a forward-looking element is required to be included in the estimation of Basel II's risk parameters.
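The trade-off between the critical value pmax and the false alarm rate can be made concrete with a short calculation reproducing the setting of footnote 7. This is our own illustrative sketch, not code from the paper; the function names are ours, and the exact figures depend on whether the exact binomial distribution or its normal approximation is used.

```python
from math import comb

def tail_prob(n, pd, k):
    """P(D >= k) when the number of defaults D is Binomial(n, pd)."""
    return sum(comb(n, i) * pd**i * (1 - pd)**(n - i) for i in range(k, n + 1))

def critical_rate(n, pd, alpha):
    """Smallest default rate pmax exceeded with probability at most 1 - alpha."""
    k = next(k for k in range(n + 1) if tail_prob(n, pd, k) <= 1 - alpha)
    return k / n

# Footnote 7 setting: rating grade with PD = 1%, 500 borrowers, alpha = 99%.
p_max = critical_rate(500, 0.01, 0.99)   # 12 defaults, ie, 2.4%
# False alarm rate if the border is instead fixed at 1.2% (6 defaults):
false_alarm = tail_prob(500, 0.01, 6)    # roughly 38% with the exact binomial
```

The exact binomial puts pmax at 2.4% and the false alarm rate for a 1.2% border near 38%; the 2% and 33% quoted in footnote 7 follow from the normal approximation used later in the text. Either way, the order of magnitude of the dilemma is the same.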

2.2 A proposal for a statistical multi-period test


The Basel II framework (cf BCBS (2005a), paragraph 501) seems to suggest backtesting, based on a variety of statistical tests, as one of the key techniques in the validation process of internal rating systems. However, if this were the full approach for validating IRB systems, the complexity of validation would be reduced to an overly simplistic exercise and, in fact, it would raise the question of a possible misinterpretation of Basel II. Consequently, the role of backtesting has to be reconsidered. We deem that backtesting must be used as one of the supporting tools in a comprehensive validation approach. We favor the idea of using backtesting techniques as the first step in a top-down validation approach: the techniques help to identify possibly deteriorated rating grades or rating systems. The overall assessment should further take into account all components and processes discussed in the previous section. Decision-making on the functioning or non-functioning of a rating system based solely on statistical tests seems to be possible only in very rare cases. The challenging task is to identify the situations in which statistical tests are very likely

to fail. A possible way could be the use of a traffic light approach as discussed below.

Dating back to the approval of market risk models for regulatory purposes, the idea of using a traffic light approach for model validation is a considerable exploratory extension of statistical tests. For value-at-risk outliers produced by market risk models, a binomial test with green, yellow and red zones was implemented by the BCBS (cf BCBS (1996)). The outcome of the BCBS traffic light approach eventually leads to higher regulatory capital charges, depending on the divergence between the predicted value-at-risk and the observed losses, as signalled by the color scheme. Any traffic light approach for IRB systems is based on the idea of translating the deviation between the PD and the observed default rate of a rating grade into a color scheme: the more red the color, the greater the excess of the observed default rate over the estimated PD.

A real advantage of traffic light approaches is their underlying message, ie, they do not pretend to be more exact than they are in reality. The contrary can sometimes be found in the use of statistical tests, mainly because all relevant values can be calculated or set with arbitrary precision. This may tempt users to take the results more seriously than they ought to. Traffic light approaches avoid this unfavorable behavior through their built-in imprecision. We see a traffic light approach as the best way to support the suggested top-down approach.

Tasche (2003) picks up the idea of a traffic light approach with three colors - green, yellow and red - for the validation of default probabilities. The basic idea again is to introduce probability levels α_low = 0.950 and α_high = 0.999, defined via their respective border values d_green/yellow and d_yellow/red for the observed number of defaults of a rating grade.
The levels are chosen to ensure that, in the model used, the ex post observed number of defaults dk exceeds the level d_green/yellow with a probability of only 1 − α_low (and, similarly, d_yellow/red with probability 1 − α_high). We note that these exemplary levels are similar to those used for market risk (cf BCBS (1996)). As is common practice, defaults are modeled with the standard one-factor model, so the impact of correlated default events is captured through the chosen asset correlation. The analysis applies the Vasicek model (cf Crosbie and Bohn (2001)), with asset correlation ρ, independent standard normal random variables X and ξi, i = 1, ..., Nk, and a threshold value c.8 The following expression then simply counts the defaults among the Nk borrowers:

dk = Σ_{i=1}^{Nk} 1_(−∞, c]( √ρ·X + √(1 − ρ)·ξi )

8 Note that, for reasons of convenience, critical values c instead of default probabilities are used. The relationship between these values is given by p = Φ(c), with Φ the standard normal distribution function. By the same formula the critical values can be transformed into d_green/yellow and d_yellow/red.
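The one-factor default count dk described above can be simulated directly, and Tasche's borders d_green/yellow and d_yellow/red estimated from the simulated distribution. The sketch below is our own illustration with hypothetical parameters: it draws the systematic factor X, computes the conditional default probability given X, and counts defaults, which is equivalent to thresholding √ρ·X + √(1 − ρ)·ξi at c.

```python
import random
from statistics import NormalDist

N = NormalDist()

def default_count(n, pd, rho, rng):
    """One draw of d_k in the one-factor (Vasicek) model."""
    c = N.inv_cdf(pd)        # threshold c with pd = Phi(c)
    x = rng.gauss(0.0, 1.0)  # systematic factor X
    # conditional default probability of each borrower given X = x
    p_x = N.cdf((c - rho**0.5 * x) / (1 - rho)**0.5)
    return sum(rng.random() < p_x for _ in range(n))

def borders(n, pd, rho, trials=4000, seed=7):
    """Monte Carlo estimate of d_green/yellow (95%) and d_yellow/red (99.9%)."""
    rng = random.Random(seed)
    draws = sorted(default_count(n, pd, rho, rng) for _ in range(trials))
    return draws[int(0.95 * trials)], draws[int(0.999 * trials)]

# Hypothetical grade: PD = 1%, 500 borrowers, asset correlation 5%.
d_gy, d_yr = borders(n=500, pd=0.01, rho=0.05)
```

Re-running with a larger ρ pushes both borders upwards, which illustrates the drawback discussed next: the thresholds depend sensitively on an asset correlation that cannot be estimated reliably.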


Then, as a next step, the critical values ccrit are determined. For a level α they are given by:

ccrit(α) = min{ i : P(dk ≥ i) ≤ 1 − α }

When calculating the thresholds, the choice of asset correlation is of crucial importance, as it has a high influence on the results. The higher the asset correlation, the higher is the critical value, and thus the higher are the borders d_green/yellow and d_yellow/red. This is a drawback of the approach suggested by Tasche since, as mentioned above, reliable values for asset correlations are not available.

An alternative approach for implementing a traffic-light-based judgement, called the extended traffic light approach (ETLA), was proposed in Blochwitz et al (2005). The advantage of the ETLA is that it does not require an explicit specification of asset correlations, which most likely makes it more accessible to practitioners. To sketch the underlying rationale and methodology, let us start by considering the impact of including correlations in backtesting. On the one hand, when assuming defaults to be independent, their number is binomially distributed. Using simple mathematics as presented in Blochwitz et al (2005), we obtain a formula for the relationship between pmax and a given level of confidence α_bin:

pmax = PDk + Φ⁻¹(α_bin) · √( PDk(1 − PDk)/Nk )

On the other hand, modeling correlations as in the IRB framework with the one-factor model (cf Crosbie and Bohn (2001)), including the asset correlation ρ, yields the similar relationship (Koyluoglu and Hickman (1998) and Wilde (2001)):

pmax = Φ( ( √ρ·Φ⁻¹(α_asset) + Φ⁻¹(PDk) ) / √(1 − ρ) )

By comparing both results we conclude that if the critical values pmax for both test statistics, with and without correlations, are equal, then the following equation must be satisfied:

PDk + Φ⁻¹(α_bin) · √( PDk(1 − PDk)/Nk ) = Φ( ( √ρ·Φ⁻¹(α_asset) + Φ⁻¹(PDk) ) / √(1 − ρ) )
An analysis of this equation using, for example, the graphs presented in Blochwitz et al (2005) suggests that for parameters typical of a rating system for an average wholesale credit quality banking book,9 both formulae yield approximately the same threshold pmax. We can rephrase this finding by saying that for an average credit portfolio and a reasonable, not too high, level of confidence, default correlation does not necessarily need to be incorporated into a backtesting procedure. For this reason, the subsequent considerations are based on the normal approximation, ie, from now on we assume an asset correlation of zero.

Let us focus on measuring the distance between the observed default rate Dk and the forecast probability of default PDk for a rating grade k. We transform this distance into a specific color of a proposed color scheme. A natural way is to measure the distance in units of the standard deviation of the default rate, which under the assumption of no correlations is given by:

σ(PDk, Nk) = √( PDk(1 − PDk)/Nk )

For the ETLA we propose four colored zones. The rule for assignment is given by the following color scheme, which maps each possible observed default rate into one of the zones labeled green, yellow, orange and red:

Green:  Dk < PDk
Yellow: PDk ≤ Dk < PDk + Ky·σ(PDk, Nk)
Orange: PDk + Ky·σ(PDk, Nk) ≤ Dk < PDk + Ko·σ(PDk, Nk)
Red:    PDk + Ko·σ(PDk, Nk) ≤ Dk

Now the important boundary parameters Ky and Ko have to be set. Under the assumption of no correlation, the probability of observing green is 50%. Practical considerations lead to the conclusion that the respective probabilities of observing green, yellow, orange and red should decline, since the greater the deviation between Dk and PDk, the less frequently such a deviation will be observed for adequately calibrated rating grades. Additionally, Ko should not be chosen too large, as we intend to refrain from entering the tail of the distribution, which is influenced much more by asset correlations. A reasonable choice for the probability of observing red is 5%, which according to the proposed color scheme corresponds to Ko = 1.64. This choice is consistent, as discussed above, with neglecting any effects caused by default correlation. Drawing the remaining border such that the probability of observing yellow is 30% yields Ky = 0.84 and a probability of observing orange of 15%. Such an assignment roughly halves the probability of observing a zone from each zone to the next.
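As a sketch (our own illustration, not code from the paper), the zone assignment with Ky = 0.84 and Ko = 1.64 can be written in a few lines, together with a check of the implied zone probabilities under the normal approximation:

```python
from statistics import NormalDist

K_Y, K_O = 0.84, 1.64  # chosen so that P(yellow) ~ 30% and P(red) ~ 5%

def sigma(pd, n):
    """Standard deviation of the default rate, assuming no correlation."""
    return (pd * (1 - pd) / n) ** 0.5

def etla_zone(pd, n, observed_rate):
    """Map an observed default rate of a grade into the four-zone color scheme."""
    s = sigma(pd, n)
    if observed_rate < pd:
        return "green"
    if observed_rate < pd + K_Y * s:
        return "yellow"
    if observed_rate < pd + K_O * s:
        return "orange"
    return "red"

# Implied zone probabilities for an adequately calibrated grade:
Phi = NormalDist().cdf
p_red = 1 - Phi(K_O)            # about 0.05
p_orange = Phi(K_O) - Phi(K_Y)  # about 0.15
```

For a hypothetical grade with PD = 1% and 500 borrowers, σ is about 0.45 percentage points, so an observed rate of 1.2% lands in yellow and one of 1.8% in red.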
9 We set a moderate level of confidence α_bin = α_asset = 95%, and analyze rating grades k which are reasonably populated and reflect the average credit quality of banks, ie, with Nk ≈ 3,000 and PDk ≈ 3%, and we use values of asset correlations of less than 5%. This appears a reasonably conservative upper bound for the German market. Recent empirical studies have shown asset correlations of less than 1% for German corporates (cf Hamerle et al (2004)). For retail clients the asset correlation varies, depending on the rating, from below 0.5% up to 1.2% (cf Huschens and Stahl (2004)).
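The claim behind footnote 9, that for a well-populated grade of average credit quality and a moderate confidence level the two critical values nearly coincide, is easy to check numerically. This is our own sketch; the parameter values are taken from the footnote.

```python
from statistics import NormalDist

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

def p_max_binomial(pd, n, alpha):
    """Critical default rate under independence (normal approximation)."""
    return pd + Phi_inv(alpha) * (pd * (1 - pd) / n) ** 0.5

def p_max_one_factor(pd, rho, alpha):
    """Critical default rate in the one-factor model with asset correlation rho."""
    return Phi((rho**0.5 * Phi_inv(alpha) + Phi_inv(pd)) / (1 - rho) ** 0.5)

# alpha_bin = alpha_asset = 95%, PD = 3%, N = 3,000, rho = 1% (German corporates)
a = p_max_binomial(0.03, 3000, 0.95)
b = p_max_one_factor(0.03, 0.01, 0.95)
# the two thresholds differ by less than one percentage point
```

With a larger ρ or a higher confidence level the gap widens quickly, which is exactly why the text restricts the normal approximation to moderate confidence levels and low correlations.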


The color scheme is intended to be applied to a time series of default rates and PD forecasts for a certain rating grade k. The result is a quadruple defined by the numbers Lg, Ly, Lo and Lr, denoting the number of observed green, yellow, orange and red periods, respectively. For further analysis a labelling function is defined, which maps a unique number to each observable quadruple:

λ[Lg, Ly, Lo, Lr] = 1000·Lg + 100·Ly + 10·Lo + Lr

From a prudential point of view, for a time series of default rates and PDs of, say, four years length, the best calibrated rating grade would exhibit the quadruple labelled 4000, whereas the worst calibrated one would show the quadruple labelled 0004. It remains to decide on the ranking of all other quadruples according to their severity, ie, the degree of miscalibration of the rating grade. An intuitive choice for ranking the observable quadruples is the function:

ω(λ) := ω[Lg, Ly, Lo, Lr] = Pg·Lg + Py·Ly + Po·Lo + Pr·Lr

with Pg, Py, Po and Pr the probabilities associated with each colored zone (ie, 0.5, 0.3, 0.15 and 0.05, respectively). Subsequently, it is possible to assign a mixed color for longer observation periods10 and to rank these colors with the help of the weighting function. An example is given in Table 1 for an observation period of four years. A rating grade can then be validated simply by looking up the ranking of its observed quadruple over time. With the information provided by Table 1, we suggest that rating grades showing quadruples ranked up to number 17 should be analyzed more carefully, whereas the calibration of rating grades ranked from number 24 onwards can generally be accepted. For all remaining rating grades, and more importantly for the overall picture of the rating system, a case-by-case analysis should be made. Our experience in the application of ETLA, unfortunately, is limited due to lack of data.
It shows, however, that the approach as proposed works in practice. For example, we applied it to a bank's rating system where we observed the following behavior. The rating system consisted of 16 rating grades, and all rating grades but one exhibited a quadruple that was always ranked higher than rank number 25 according to Table 1. The outlier grade, incidentally a grade for borrowers with rather low credit quality, was ranked as number 14, indicating a larger deviation between PD and default rate over four subsequent years. To understand this deviation, a view on that bank's internal risk management processes is necessary. According to the bank's internal credit policy, borrowers assigned to rating grades below the identified outlier grade were handed over from the customer relationship department to the intensive treatment or workout department. However, such a transfer was unfavorable for the customer relationship department for
10 Here again a reasonable, but nevertheless intuitive, choice has to be made. We decided to assign to each quadruple the mixed color corresponding to the average physical wavelength of the observed colors.


TABLE 1 All realizations of the extended traffic light approach for a time series of four years.

Rank   ℓ      π        π cum    Zone
  1    0004   0.0000   0.0000   R
  2    0013   0.0001   0.0001   R
  3    0022   0.0003   0.0004   R
  4    0103   0.0002   0.0006   R
  5    0031   0.0007   0.0012   R
  6    0112   0.0014   0.0026   R
  7    0040   0.0005   0.0031   R
  8    1003   0.0003   0.0034   Y
  9    0121   0.0041   0.0074   O
 10    0202   0.0014   0.0088   O
 11    1012   0.0023   0.0110   Y
 12    0130   0.0040   0.0151   O
 13    0211   0.0081   0.0232   O
 14    1021   0.0068   0.0299   Y
 15    0220   0.0122   0.0421   Y
 16    1102   0.0045   0.0466   Y
 17    0301   0.0054   0.0520   Y
 18    1030   0.0068   0.0587   Y
 19    1111   0.0270   0.0857   Y
 20    0310   0.0162   0.1019   Y
 21    2002   0.0038   0.1057   G
 22    1120   0.0405   0.1462   G
 23    1201   0.0270   0.1732   G
 24    2011   0.0225   0.1957   G
 25    0400   0.0081   0.2038   Y
 26    1210   0.0810   0.2848   G
 27    2020   0.0338   0.3185   G
 28    2101   0.0450   0.3635   G
 29    1300   0.0540   0.4175   G
 30    2110   0.1350   0.5525   G
 31    3001   0.0250   0.5775   G
 32    2200   0.1350   0.7125   G
 33    3010   0.0750   0.7875   G
 34    3100   0.1500   0.9375   G
 35    4000   0.0625   1.0000   G

Note that ℓ is the quadruple's label, π is the probability of observing that quadruple and π cum is the probability of observing events of at least the same severity, ie, quadruples with a lower order function ω(ℓ). R, red; Y, yellow; O, orange; G, green.

various reasons. It thus appeared that staff of the customer relationship department, who were also involved in assigning the ratings, (mis)used the overruling features of the rating system to keep borrowers attached to them by assigning the identified outlier grade.

This example illustrates the top-down approach to validation in practice. Simply put, recalibrating a rating grade's PD should only be the very end of the process and the ultimate rather than the default action. It is more important to find out the reasons for deviations between PD and default rates. As evidenced in various instances, both wrongly structured or implemented processes around rating systems and the misuse of sample data have a severe negative impact on the calibration of PD. In such cases recalibration does not cure the disease but rather helps to cover the symptoms.

As discussed above, the underlying rating philosophy is crucial. Both statistical tests and traffic light approaches can only reasonably be used with PIT ratings, since these techniques take into account fluctuations of default rates caused by random processes as opposed to systematic shifts.
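The labelling function, the weighting function and the probability columns of Table 1 lend themselves to a compact implementation. The sketch below is ours, not part of the original study: the function names (label, omega, pi) are invented, and π is computed as the multinomial probability of observing a given quadruple in four periods, which is our reading of the table.

```python
from math import factorial, prod

# Zone probabilities for green, yellow, orange and red periods (from the text).
P = (0.5, 0.3, 0.15, 0.05)

def label(q):
    """Labelling function: (Lg, Ly, Lo, Lr) -> 1000*Lg + 100*Ly + 10*Lo + Lr."""
    lg, ly, lo, lr = q
    return f"{1000 * lg + 100 * ly + 10 * lo + lr:04d}"

def omega(q):
    """Weighting function: Pg*Lg + Py*Ly + Po*Lo + Pr*Lr (low values = severe)."""
    return sum(p * n for p, n in zip(P, q))

def pi(q):
    """Multinomial probability of observing quadruple q in sum(q) periods."""
    n = sum(q)
    coeff = factorial(n) // prod(factorial(k) for k in q)
    return coeff * prod(p ** k for p, k in zip(P, q))

# All 35 quadruples for a four-year observation period, ranked by severity.
quads = [(g, y, o, r)
         for g in range(5) for y in range(5) for o in range(5) for r in range(5)
         if g + y + o + r == 4]
ranked = sorted(quads, key=omega)

cum = 0.0
for rank, q in enumerate(ranked, start=1):
    cum += pi(q)
    print(f"{rank:2d}  {label(q)}  {pi(q):.4f}  {cum:.4f}")
```

Sorting by ω reproduces the severity ordering of Table 1 up to the ordering of tied quadruples (eg, 1003 and 0121 share the same ω value), and accumulating π down the ranking yields the π cum column.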

3 IRB VALIDATION SOME REMAINING ISSUES


The validation of banks' internal rating systems requires a considerable degree of sensitivity (see Neale (2001)), given the difficulties around the data situation

and the variety of IRB models being used in practice. We discussed the impact of limited available public and private information on backtesting, the central approach proposed by the BCBS. Instead of focusing solely on backtesting techniques, we proposed a top-down evaluation of the rating process within the ongoing cycle of setting up and monitoring a rating system.

The pragmatic backtesting approach for IRB systems presented here may be of practical use for banks and supervisors in assessing the ongoing performance of IRB models. It may become an important element of such a top-down evaluation: backtesting in that sense could serve as the first step in the top-down approach, followed by the application of rather heuristic methods. However, the underlying rating philosophy is crucial. Both statistical tests and traffic light approaches can only reasonably be used with PIT ratings, since these techniques take into account fluctuations of default rates caused by random processes as opposed to systematic shifts. A further refinement of the traffic light approach should aim to recognize the rating philosophy, and clearly further work is needed.
REFERENCES
Basel Committee on Banking Supervision (1996). Supervisory framework for the use of backtesting in conjunction with the internal models approach to market risk capital requirements. BIS, January. http://www.bis.org/publ/bcbs22.htm.

Basel Committee on Banking Supervision (2005a). International convergence of capital measurement and capital standards: a revised framework. BIS, updated November 2005. http://www.bis.org/publ/bcbs107.htm.

Basel Committee on Banking Supervision (2005b). Validation, Newsletter No. 4. www.bis.org/publ/bcbs_nl4.htm.

Basel Committee on Banking Supervision (2005c). Guidance on the estimation of loss given default (Paragraph 468 of the Framework Document). www.bis.org/publ/bcbs115.htm.

Blochwitz, S., Hohl, S., and Wehn, C. S. (2005). Reconsidering ratings. Wilmott Magazine May, 60–69.

Blochwitz, S., Martin, M. R., and Wehn, C. S. (2006). Statistical approaches to PD validation. In: The Basel II Risk Parameters: Estimation, Validation and Stress Testing, Engelmann, B., and Rauhmeier, R. (eds). Springer, Berlin, pp. 289–306.

Blochwitz, S., and Hohl, S. (2006). Validation of banks' internal rating systems: a supervisory perspective. In: The Basel Handbook, Ong, M. (ed). Risk Books, London, pp. 453–480.

Crosbie, P. J., and Bohn, J. R. (2001). Modelling default risk. KMV LLC. http://www.kmv.com/insight/index.html.

Hamerle, A., Liebig, T., and Scheule, H. (2004). Forecasting portfolio risk. Deutsche Bundesbank Discussion Paper 01/2004. www.bundesbank.de.

Huschens, S., and Stahl, G. (2004). A general framework for IRBA backtesting. Dresdner Beiträge zu quantitativen Verfahren 39/04. www.tu-dresden.de.

Koyluoglu, U., and Hickman, A. (1998). Reconcilable differences. Risk Magazine October, 56–62.

Neale, C. (2001). The truth and the proof. Risk Magazine March, 18–19.

Tasche, D. (2003). A traffic light approach to PD validation. Working Paper. http://www.gloriamundi.org/detailpopup.asp?ID=453057435.

Wilde, T. (2001). IRB approach explained. Risk Magazine May, 87–90.

