Intensity-Duration-Frequency Estimation using Generalized Pareto Distribution for Urban Area in a Tropical Region
M.D. Norlida1*, I. Abustan1, R. Abdullah1, A. S. Yahaya1, O. Sazali1, M.D. Mohd Nor2 and M.S. Lariyah2
1 School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong Tebal, Seberang Perai Selatan, Pulau Pinang, Malaysia
2 College of Engineering, Civil Engineering Department, Universiti Tenaga Nasional, Jalan Ikram-Uniten, 43000 Kajang, Selangor, Malaysia
* Corresponding author, e-mail norlidamd@water.gov.my, norlidamd@gmail.com
ABSTRACT
The Generalized Pareto Distribution (GPD) is used to derive the Intensity-Duration-Frequency (IDF) curve for an urban area in a tropical region using partial duration series (PDS). The Method of L-Moments (LMOM) is used to fit the distribution, while the Kolmogorov-Smirnov (K-S) test is used to assess goodness of fit. The procedure was repeated for eleven rainfall durations ranging from 5 minutes to 4320 minutes, using data extracted from five urban rainfall stations. For comparison, the three-parameter Log-Logistic, LL(3)P, and the Generalized Extreme Value (GEV) distributions were also fitted. The GPD continuous parameters (shape k, scale and location) were used to derive recurrence intervals for predicting rainfall intensities at rainfall stations with less than 10 years of data. The study found that the GPD is the most appropriate of the three distributions. In the majority of cases the GPD provided good fits to the PDS, while in the remaining cases it fell to third place in the ranking. Over the eleven rainfall durations, the GPD, GEV and LL(3)P had total ranking numbers of 89, 111 and 130 respectively, and were ranked first in 61.8%, 18.2% and 20% of cases respectively.
KEYWORDS
Generalized extreme value; generalized Pareto distribution; log-logistic; method of L-moments; maximum likelihood estimates
INTRODUCTION
The rainfall Intensity-Duration-Frequency (IDF) relationship is one of the most commonly used tools in water resource engineering, whether for planning, designing and operating water resource projects or for protecting various engineering projects (e.g. highways or dams) against floods. IDF curve estimation is used to estimate floods for different time intervals at specific return periods. Several statistical distributions have been applied to characterise the extreme behaviour of rainfall by applying a mathematical framework to ordinary rainfall data and discharge observations (Koutsoyiannis et al., 1998). Here, the GPD, GEV and LL(3)P are used to characterise the partial duration series of recorded rainfall at stations with less than 10 years of data. The GEV distribution was re-introduced and reviewed by Bertin and Clusel (2006) to provide a general framework for the frequency analysis of extreme hydrological and meteorological events.
GPD and GEV parameter estimation has been widely investigated to find the most accurate continuous parameters, for example on time series samples in Italy (Deidda and Puliga, 2009) and on the performance of several parameter estimators in the USA (Zea Bermudez and Kotz, 2010). When fitting the log-logistic distribution by generalized moments, i.e. Maximum Likelihood Estimates (MLE), probability weighted moments (PWM) and LMOM, the results were found to depend on the choice of moment (Ashkar and Mahdi, 2006): the accuracy of the parameters depends on the type of generalized moment chosen. Flood frequency estimation was reviewed by Cunnane (1989), and Ahmad (2008) used a rainfall threshold to separate convective from non-convective rainfall in the tropical region of Kuala Lumpur. Abustan et al. (2000) used urban rainfall and stream flow to find the relationships between the two hydrologic parameters and rainfall station altitudes in a highly urbanised area of Kuala Lumpur.
This paper explains and illustrates the continuous parameter estimation of the GPD, GEV and LL(3)P based on partial duration series (PDS) and peak over threshold (POT) data, with the Kolmogorov-Smirnov (K-S) test used to assess goodness of fit. The study found that the GPD is the most appropriate of the three distributions. The GPD parameters are used to derive an IDF curve for all rainfall stations in Kuala Lumpur.
12th International Conference on Urban Drainage, Porto Alegre/Brazil, 11-16 September 2011
METHODS
Probability Distribution
Generalized Pareto Distribution. The probability density function of the GPD has a shape parameter k ≠ 0, a scale parameter, and a threshold or location parameter. These are the continuous shape, scale and location parameters respectively (Zea Bermudez and Kotz, 2010).
Generalized Extreme Value (GEV) Distribution. The GEV distribution, which is widely recommended for flood frequency analysis, has a probability density function and a cumulative distribution function (Bertin and Clusel, 2006). The class of GEV distributions is very flexible, with a tail shape parameter k, a scale parameter and a location parameter.
Log-Logistic. The LL(3)P distribution has been used in hydrology for modelling rainfall and stream flow (Singh, 1995). The unbounded distributions have a range of . The three-parameter Log-Logistic distribution, LL(3)P, has shape, scale and location parameters.

Goodness of Fit Test
The K-S tests (Chakravarti et al., 1967 and Andres-Domenech et al., 2010) are used to decide whether a sample comes from a population with a specific distribution. The K-S test, run in EasyFit version 5.3 Professional, uses critical values to select the best distribution; the critical values must be calculated for each distribution. The smallest critical p-value indicated the best distribution, following the classical Glivenko-Cantelli Theorem (Topsoe, 1970).

Estimation Methods
Fitting a statistical distribution to extreme rainfall event data is necessary to establish good IDF curves. The Method of L-Moments (LMOM) and Maximum Likelihood Estimates (MLE) have been employed for this purpose.
Methods of L-Moments. L-moments are linear combinations of order statistics that are robust to outliers and virtually unbiased for small samples, making them suitable for rainfall and flood frequency analysis, including the identification of distributions and parameter estimation (Hosking and Wallis, 1993). Pearson (1991) and Yang et al. (2009) developed regional flood frequency analyses based on LMOM for New Zealand and China respectively. Vogel and Fennessey (1993) argued that L-moment diagrams should replace product moment diagrams.
Maximum Likelihood Estimates. The MLE equations, in which N is the sample size, involve matrix-form calculation (Singh et al., 1993). MLE determines the parameters that maximize the probability (likelihood) of the sample data. In statistical hydrology, MLE is considered more robust (Singh et al., 1993) and versatile, and yields estimators with good statistical properties. MLE methods apply to most models and to different types of data. In addition, they provide efficient methods for quantifying uncertainty through confidence bounds.

Figure 1. The location of Sg. Kerayong in Malaysia.
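The L-moment fitting step described above can be illustrated with a minimal Python sketch. It assumes the common parameterisation F(x) = 1 - (1 + k(x - mu)/sigma)^(-1/k), a known threshold mu, and the symbols sigma and mu for scale and location. These choices, the function names, the return-level formula for a PDS and the synthetic sample are illustrative assumptions, not the EasyFit procedure used in the study.

```python
import math
import random

def sample_l_moments(values):
    """First two sample L-moments (l1, l2) via probability weighted moments."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    # b1 weights the i-th smallest value (0-based i) by i / (n - 1)
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def fit_gpd_lmom(values, mu):
    """Estimate GPD shape k and scale sigma by L-moments, with threshold mu known.

    Assumed parameterisation: F(x) = 1 - (1 + k*(x - mu)/sigma)**(-1/k).
    Uses l1 - mu = sigma/(1 - k) and l2 = sigma/((1 - k)*(2 - k)), valid for k < 1.
    """
    l1, l2 = sample_l_moments(values)
    k = 2.0 - (l1 - mu) / l2
    sigma = (1.0 - k) * (l1 - mu)
    return k, sigma

def gpd_quantile(p, k, sigma, mu):
    """Inverse CDF of the GPD (exponential limit when k is effectively zero)."""
    if abs(k) < 1e-12:
        return mu - sigma * math.log(1.0 - p)
    return mu + sigma / k * ((1.0 - p) ** (-k) - 1.0)

def return_level(ari_years, events_per_year, k, sigma, mu):
    """Intensity exceeded on average once per ARI for a PDS with a given event rate.

    Requires events_per_year * ari_years > 1.
    """
    p = 1.0 - 1.0 / (events_per_year * ari_years)
    return gpd_quantile(p, k, sigma, mu)

# Synthetic (hypothetical) exceedances drawn from a GPD with known parameters
rng = random.Random(1)
true_k, true_sigma, threshold = 0.1, 10.0, 35.0
sample = [gpd_quantile(rng.random(), true_k, true_sigma, threshold)
          for _ in range(20000)]

k_hat, sigma_hat = fit_gpd_lmom(sample, threshold)
print("k:", round(k_hat, 3), "sigma:", round(sigma_hat, 3))
```

With a large synthetic sample the estimates should land close to the parameters used to generate it; with the short records discussed in this paper the sampling error would be much larger.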
Table 1. Station Inventory for Selected Urban Rainfall Stations.
No   Station Number   Station Name         Data Duration From   Data Duration To
1    3117070          JPS Ampang           30-Jun-70            18-Jan-09
2    3117101          Kg. Cheras Baru      27-Mar-98            20-Nov-08
3    3117102          Taman Miharja        31-Mar-98            15-Feb-09
4    3117104          Pandan Indah         02-Jan-06            01-Jul-06
5    3117130          Jam. Jalan Cheras    07-Dec-06            13-Jan-08
Partial Duration Series and Threshold Data
Studies of partial duration series (PDS), or peak over threshold (POT) series, using various distributions have been conducted by Diebolt et al. (2003), Madsen and Rosbjerg (2004), and Zea Bermudez and Kotz (2010). For the identified regions, the GPD is a suitable distribution for the heavier tail at the end of the series. In a longer duration record, however, the threshold is usually raised so that, on average, only three or four floods a year are included. Van Montfort and Witter (1986) examined PDS having, on average, 110 events per year. Rosbjerg and Madsen (2004) demonstrated that the PDS/GPD model is competitive with the Annual Maximum Series (AMS)/GEV model and highly efficient for regionalisation. Dahal and Hasegawa (2008) recommended a threshold relationship fitted to the lower boundary of the data group defined by landslide-triggering rainfall events, which is suitable for the Nepal Himalaya. Both visual and statistical methods were proposed for obtaining a precipitation threshold, derived from various monthly rainfall thresholds, in an arid environment (Lopez et al., 2008). In general, PDS ranging from small to large are considered in the present study. A threshold of 35 mm/hr, following Ahmad (2008), is used as a basis and varied according to the rainfall time interval. Data quality is improved by investigating neighbouring stations within a 2 km radius.
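The thresholding step can be sketched in a few lines of Python. The event list, the record length and the function names below are hypothetical, and the sketch applies only the 35 mm/hr base threshold; how the threshold is varied with the rainfall time interval is not specified here and is left out.

```python
def extract_pds(events, threshold_mm_per_hr):
    """Keep the (year, intensity) events whose intensity exceeds the threshold."""
    return [(year, intensity) for year, intensity in events
            if intensity > threshold_mm_per_hr]

def events_per_year(pds, record_years):
    """Average number of retained events per year of record."""
    return len(pds) / record_years

# Hypothetical independent storm peaks (year, intensity in mm/hr) for one duration
events = [
    (2006, 28.0), (2006, 41.5), (2006, 60.2),
    (2007, 35.0), (2007, 52.3), (2007, 38.9),
    (2008, 47.1), (2008, 30.4),
]

pds = extract_pds(events, 35.0)   # 35 mm/hr base threshold
rate = events_per_year(pds, 3)    # three years of (hypothetical) record
print(len(pds), "events retained,", round(rate, 2), "per year")
```

Raising the threshold lowers the event rate, which is how a longer record can be trimmed to a few events per year.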
Figure 2. The study catchment in a highly developed urban area of Kuala Lumpur and the distribution of the five rainfall stations within the Sg. Kerayong catchment.
[Figure 3. Nine panels plotting the continuous parameters of the GPD, GEV and LL(3)P (shape k, scale and location) against Durations (minutes), each with minimum and maximum bounds and fitted trend lines. The fitted trend lines are power, linear, exponential and logarithmic equations with R² values between 0.0261 and 0.945.]
Table 2. K-S Test Preferable First Ranking of GPD, GEV and LL(3)P.
Distribution   Total 1st Ranking (no. of times)   % Preference 1st Ranking
GPD            34                                 61.8%
GEV            10                                 18.2%
LL(3)P         11                                 20%
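A ranking of this kind can be reproduced in outline with a short Python sketch. It ranks candidate distributions by the smallest K-S statistic, which is an assumption about how the ranking is formed rather than a description of EasyFit's internals. The CDF parameterisations, the symbols (k, sigma, mu for the GPD and GEV; alpha, beta, gamma for the LL(3)P), the parameter values and the sample are all hypothetical.

```python
import math

def gpd_cdf(x, k, sigma, mu):
    """GPD CDF, assumed form 1 - (1 + k*z)**(-1/k) with z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    if abs(k) < 1e-12:
        return 1.0 - math.exp(-z)
    t = 1.0 + k * z
    if t <= 0.0:            # above the upper bound, which exists when k < 0
        return 1.0
    return 1.0 - t ** (-1.0 / k)

def gev_cdf(x, k, sigma, mu):
    """GEV CDF, assumed form exp(-(1 + k*z)**(-1/k))."""
    z = (x - mu) / sigma
    if abs(k) < 1e-12:
        return math.exp(-math.exp(-z))
    t = 1.0 + k * z
    if t <= 0.0:            # outside the support
        return 0.0 if k > 0 else 1.0
    return math.exp(-t ** (-1.0 / k))

def ll3_cdf(x, alpha, beta, gamma):
    """Three-parameter log-logistic CDF, assumed form 1/(1 + ((x-gamma)/beta)**(-alpha))."""
    if x <= gamma:
        return 0.0
    return 1.0 / (1.0 + ((x - gamma) / beta) ** (-alpha))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the model CDF."""
    x = sorted(sample)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def rank_by_ks(sample, models):
    """Return (name, statistic) pairs, best fit (smallest statistic) first."""
    scores = [(name, ks_statistic(sample, cdf)) for name, cdf in models.items()]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical sample placed at the GPD quantiles of the mid-point plotting positions
n = 100
sample = [35.0 + 10.0 / 0.1 * ((1.0 - (i - 0.5) / n) ** (-0.1) - 1.0)
          for i in range(1, n + 1)]

models = {
    "GPD": lambda x: gpd_cdf(x, 0.1, 10.0, 35.0),
    "GEV": lambda x: gev_cdf(x, 0.1, 10.0, 35.0),
    "LL(3)P": lambda x: ll3_cdf(x, 2.0, 10.0, 35.0),
}

ranking = rank_by_ks(sample, models)
for name, d in ranking:
    print(name, round(d, 4))
```

Repeating this for each station and duration and counting how often each distribution comes first would give a tally of the same form as Table 2.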
A P-P plot is approximately linear if the specified theoretical distribution is the correct model. The null hypothesis was accepted in the K-S tests. The percentage of first-ranking preference is 61.8% for the GPD, 18.2% for the GEV and 20% for the LL(3)P, as shown in Table 2. Furthermore, the GPD continuous parameters (shape k, scale and location) were used to derive rainfall intensity estimates for various recurrence intervals. The average recurrence interval (ARI) lines appear undulating and give improper IDF curves where small rainfall samples, of fewer than 20 samples in one population, are included in the analysis, as at stations 3117104 and 3117130. The study therefore proceeded with three rainfall stations: 3117070, 3117101 (Figure 4) and 3117102. The three stations provided three continuous parameter estimates and showed a common shape for the GPD, GEV and LL(3)P in Figure 3. Two further investigations were carried out to minimise the continuous parameter error and to increase accuracy. First, power equations for the scale and location parameters gave the highest accuracy, ranging between 83.6% and 94.5% on a linear-log scale. Second, the accuracy of the continuous shape parameters ranged from 2% to 64.4% on a linear-linear scale with a linear or polynomial equation. A general equation can be obtained by averaging the lower- and upper-bound equations of each continuous parameter.
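The power-equation fitting and the averaging of bounds can be sketched as follows. This is a minimal illustration under stated assumptions: the power law y = a·x^b is fitted by least squares on log-transformed values (so R² is also computed on the log scale), the "general equation" is read as the pointwise mean of the lower- and upper-bound curves, and the durations and coefficients are hypothetical rather than taken from the study.

```python
import math

def fit_power_law(durations, values):
    """Least-squares fit of y = a * x**b on log-log axes; returns (a, b, r_squared)."""
    lx = [math.log(x) for x in durations]
    ly = [math.log(y) for y in values]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    syy = sum((v - my) ** 2 for v in ly)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared

def average_of_bounds(lower, upper, duration):
    """Pointwise mean of the lower- and upper-bound power equations at one duration."""
    (a_lo, b_lo), (a_hi, b_hi) = lower, upper
    return 0.5 * (a_lo * duration ** b_lo + a_hi * duration ** b_hi)

# Hypothetical durations (minutes) and bound values for one scale parameter
durations = [5, 10, 15, 30, 60, 120, 360, 720, 1440, 2880, 4320]
upper_values = [13.0 * d ** 0.28 for d in durations]
lower_values = [9.8 * d ** 0.19 for d in durations]

a_hi, b_hi, r2_hi = fit_power_law(durations, upper_values)
a_lo, b_lo, r2_lo = fit_power_law(durations, lower_values)
general_60 = average_of_bounds((a_lo, b_lo), (a_hi, b_hi), 60)
print(round(a_hi, 3), round(b_hi, 3), round(general_60, 2))
```

Because the synthetic bound values lie exactly on power curves, the fit recovers the generating coefficients; real parameter estimates would scatter around the fitted line and give R² below one.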
[P-P plot: P(Empirical) against P(Model) for the Generalized Pareto, Generalized Extreme Value and Log-Logistic (3P) distributions. IDF curve: lines 1 to 6 denote the 1-in-2-year, 1-in-5-year, 1-in-10-year, 1-in-20-year, 1-in-50-year and 1-in-100-year events respectively.]
Figure 4. P-P Plot and IDF Curve for Station Kg. Cheras Baru, 3117101.
CONCLUSION
The GPD continuous parameters are used to derive rainfall IDF estimates for various ARI. The figures and tables presented illustrate the behaviour of the GPD, GEV and LL(3)P continuous parameters in the study area, and some conclusions can be derived from the results. From the analysis of five rainfall stations, the important findings are as follows:
The continuous scale parameters of the GPD, GEV and LL(3)P have a consistent pattern and are bounded by their minimum and maximum continuous parameter estimates, expressed as power equations.
The continuous shape parameters of the GPD, GEV and LL(3)P follow linear equations.
The location parameters of the GPD, GEV and LL(3)P diverge in power equations as rainfall increases.
The continuous parameter equations of the GPD, GEV and LL(3)P can estimate a general shape, scale and location parameter at various rainfall durations, and hence support the derivation of IDF curves.
Future studies should use different distributions with more recorded rainfall data, as longer records can minimise errors and difficulties and eventually lead to better rainfall IDF estimation for various ARI.
ACKNOWLEDGEMENTS
This paper is financially supported by Public Administrative Department Malaysia, Department of Irrigation and Drainage Malaysia and Universiti Sains Malaysia.
REFERENCES
Abustan I., Mohd Nor M.D. and Abdul Wahid N. (2000). SWMM modelling for a small catchment in Kuala Lumpur. Proceedings of Fresh Perspectives on Hydrology and Water Resources in Southeast Asia and the Pacific, Christchurch, New Zealand, 21-24 Nov 2000, pp. 166-172.
Ahmad N. (2008). Characterization of convective rain in Klang Valley, Malaysia. Master Thesis (Hydrology and Water Resources). Universiti Teknologi Malaysia.
Andres-Domenech I., Montanari A. and Marco J.B. (2010). Stochastic rainfall analysis for storm tank performance evaluation. Hydrol. Earth Syst. Sci., 14, 1221-1232.
Ashkar F. and Mahdi S. (2006). Fitting the log-logistic distribution by generalized moments. Journal of Hydrology, 328, 694-703.
Bertin E. and Clusel M. (2006). Generalized Extreme Value Statistics and sum of correlated variables. Journal of Physics A: Mathematical and General, 39(24), 7607.
Chakravarti, Laha and Roy (1967). Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons, pp. 392-394.
Cunnane C. (1989). Review of statistical models for flood frequency analysis. WMO Operational Hydrology Rep. no. 33, WMO no. 718, World Meteorological Organization.
Dahal R.K. and Hasegawa S. (2008). Representative rainfall thresholds for landslides in the Nepal Himalaya. Geomorphology, 100, 429-443.
Deidda R. and Puliga M. (2009). Performances of some parameter estimators of the generalized Pareto distribution over rounded-off samples. Physics and Chemistry of the Earth, 34, 626-634.
Diebolt J., El-Aroui M.A., Garrido M. and Girard S. (2003). Quasi-conjugate Bayes estimates for GPD parameters and application to heavy tails modeling. Rapport de recherche no 4803, 29 pages.
Hosking J.R.M. and Wallis J.R. (1993). Some statistics useful in regional frequency analysis. Water Resources Research, 29(2), 271-281.
Koutsoyiannis D., Kozonis D. and Manetas A. (1998). A mathematical framework for studying rainfall intensity-duration-frequency relationships. Journal of Hydrology, 206, 118-135.
Langbein W.B. (1949). Annual Floods and the partial-duration flood series. Transactions, American Geophysical Union, 30(6), 879-881.
Lopez B.C., Holmgren M., Sabate S. and Gracia C.A. (2008). Estimating annual rainfall threshold for establishment of tree species in water-limited ecosystems using tree-ring data. Journal of Arid Environments, 72, 602-611.
Madsen H. and Rosbjerg D. (1993). Application of the partial duration series approach in the analysis of extreme rainfalls. IAHS Publ. no. 213.
Pearson C.P. (1991). New Zealand regional flood frequency analysis using L-moments. The New Zealand Hydrological Society. J Hydrol, 30(2), 53-64.
Rosbjerg D. and Madsen H. (2004). Advanced approaches in PDS/POT modelling of extreme hydrological events. British Hydrology Society. Hydrology: Science & Practice for the 21st Century, 1, 217-220.
Singh V.P. and Guo H. (1995). Parameter estimation for 3 parameter generalized pareto distribution by the principle of maximum entropy (POME). Hydrological Sciences Journal - des Sciences Hydrologiques, 40(2).
Singh V.P., Guo H. and Yu F.X. (1993). Parameter estimation for 3 parameter log logistic distribution (LLD3) by POME. Stochastic Hydrol. Hydraul., 7, 163-177.
Topsoe F. (1970). On the Glivenko-Cantelli Theorem. Probability Theory and Related Fields, 14(3), 239-250.
Van Montfort M.A.J. and Witter J.V. (1986). The Generalized Pareto distribution applied to rainfall depths. Hydrological Sciences Journal, 31(2), 151-162.
Vogel R.M. and Fennessey N.M. (1993). L-moments should replace product moment diagrams. Water Resources Research, 29(6), 1745-1752.
Yang T., Xu C.Y., Shao Q.X. and Chen X. (2009). Regional Flood Frequency and Spatial Patterns analysis in the Pearl River Delta Region using L-Moments Approach. Stoch Environ Res Risk and Assess, 24(2), 165-182.
Zea Bermudez P.D. and Kotz S. (2010). Parameter estimation of the generalized Pareto distribution part 1. Journal of Statistical Planning and Inference, 140, 1353-1373.