
Available online at www.sciencedirect.


Procedia Computer Science 104 (2017) 3 – 11

ICTE in Regional Development, December 2016, Valmiera, Latvia

Modelling of Water Supply Costs

Edvins Karnitisᵃ, Girts Karnitisᵃ,*, Janis Zutersᵃ, Viktorija Bobinaiteᵇ
ᵃ University of Latvia, Raina blvd. 19, Riga, LV1586, Latvia
ᵇ Lithuanian Energy Institute, Breslaujos st. 3, Kaunas, LT-44403, Lithuania


Setting water supply tariffs is a labour-intensive regulatory procedure, and the current approach suffers from a number of
informational and procedural shortcomings. The aim of this research is to improve the methodology for determining the
substantiated costs of providing water services. To modernize the methodology, a working hypothesis was advanced: the specific
costs (€/m³) required to provide water services in a specific region are a multi-parameter function of key performance
indicators. A benchmark modelling procedure is preferred, which is based on factual cases (the declared indicators of water
utilities) and on synthesis of the general regularity. The model is developed using two independent modelling procedures. The
correlation of the synthesized model with the declared specific costs of Latvian water utilities is strong (0.88). The correlation
between the respective modelled values exceeds 0.95; hence, the trustworthiness of the results is high. The prospect is the
determination of price ceilings followed by operative tariff setting, which would significantly improve the methodology.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license.
Peer-review under responsibility of organizing committee of the scientific committee of the international conference; ICTE 2016.
Keywords: Water utilities; Benchmarking methodologies; Data mining; Artificial neural networks

1. Introduction

The provision of water services is typically a highly segmented function. Usually only one water utility operates in
any given territory (most countries have hundreds of utilities in total); consequently, all of them are local
monopolies. Therefore, tariff setting is usually the task of the National Regulatory Authority (NRA). Thus, according
to the Law [1], the Public Utilities Commission of Latvia (PUC) [2] regulates drinking water and sewerage services
(including tariff setting) [3]. The tariff setting methodology [4] prescribes that the water utility prepares and
submits to the PUC a tariff draft containing justified data on the volume of the service and the costs in previous
years, as well as a forecast for the next year; the data are broken down into a number of cost positions and should be
supported by documents. The process is similar in many countries; methodologies are based on the aggregation of a
large number of cost items [5]. The differences are in the details: some countries need to save the water resource [6],
while others have quality problems or have implemented the universal service principle through tariffs.

* Corresponding author. Tel.: +37167034488; fax: +37167225039.
E-mail address:

ISSN 1877-0509.

In the last decade, benchmarking has become widely regarded as a tool for motivating water utilities to raise
their productivity [7]. Both of the most popular benchmarking methods (metric and process benchmarking) compare
performance indicators (PIs) of utilities to find the more efficient companies and to share best practice [8].
Unfortunately, benchmarking is currently limited to separate data “…over time, across water utilities, and across
countries” [9], without reflections or conclusions on the impact of the benchmarking process on sector management and
development. The question remains unanswered: how can benchmarking be used to achieve a regulatory outcome,
e.g., evaluation of costs and tariff setting?

2. Shortcomings of the methodological approach

A more detailed analysis identifies a number of informational and procedural shortcomings in the current
approach to tariff setting.
Methodological principles based on careful evaluation of all cost items require an extremely detailed and
laborious individual assessment of each position of each tariff draft, since:

• Water utilities use different business models; e.g., a utility can maintain and repair the infrastructure itself and
employ its own legal and/or IT specialists, or it can outsource these functions [10]; comparative assessment is
therefore not possible
• National regulations on accounting and bookkeeping are quite general, and actual account layouts differ considerably,
especially for administrative and personnel costs, material accounting, etc.

The regulatory procedure becomes long and hard; in addition, it encourages long-term application of a tariff.
Applied tariffs frequently lag behind and become unjustified because of frequent changes in business scale
as well as in energy, material and service prices, wages, etc.
Another source of problems is the low quality, compatibility and reliability of input data (values of the PIs), since:

• There is a lack of regulations on the material, human and other resources needed for an efficient (i.e., economically
substantiated) water supply service
• The large number of utilities means potentially considerable diversity in how the PIs are understood
• The huge number of PIs in use is a heavy administrative burden for the utilities that must provide all of them: “It is
not only the small utilities that find it difficult to evaluate such large number of PIs, larger utilities fare no better. ….” [11]
• Many utilities are multi-sector companies that provide both regulated and non-regulated services; there is low
assurance of the absence of cross-subsidies, particularly of the subsidization of non-regulated services from
regulated ones
• Frequent stochastic changes in network length, consumption, water losses and other aspects make forecasts
inaccurate and unreliable

Moreover, a detailed audit of the various cost models and structures of utilities (even due diligence) is not a
regulatory function; the NRA should examine the validity of costs as a whole instead of examining every cost item
(including those with a negligible impact) and their composition. The aim of the current research is to improve the
methodology for determining the substantiated costs of providing water services, in order to enable NRAs
to increase their efficiency and to significantly reduce the administrative burden on utilities. This article presents the
results of the first stage (water supply) of an ongoing project.

3. Working hypothesis

The enumeration of currently existing deficiencies not only clearly demonstrates the need for a radical
enhancement of the methodological approach to the water tariff setting, but also outlines the main principles for
modernization of the algorithm according to the realities in the water industry.

The generalizations over the tariff draft (assessment of the total specific costs only) and over utilities
(comparison of specific costs) would be achievable by:

• Investigation of the potential dependency of the specific costs on the key PIs that are declared by the utilities
• Determination of the general correlations among the specific costs of utilities and synthesis of the corresponding
functional regularity
• Determination of reasonable/substantiated costs for each water utility using the synthesized general regularity

A practical aspect that facilitates synthesis of the general regularity is that water utilities operating in the same
geographical area face similar basic normative and business conditions, with comparable environmental and
socio-economic factors as well as business rules (e.g., a NUTS 2 level region, as in the case of Latvia).
Proceeding from this aspect, we formulated a working hypothesis: the total specific costs (C) required for the provision
of water services in a specific region are a multi-parameter function of the set of key PIs (Π), which
characterizes the scale and specific features of this business. These PIs then serve as the drivers for
determining the substantiated specific costs; the sought regularity is:

$$C = f(\Pi) \qquad (1)$$

A huge amount of unreliable input data naturally cannot form the necessary basis for proving the hypothesis and,
consequently, for setting justified tariffs. A well-known axiom of information processing states that the quality of
the output data is fully determined by the quality of the input data. To raise the latter and to achieve accurate
actual values of the PIs, the set of PIs used is limited to:

• Clearly and unambiguously defined PIs, to ensure that they are understood uniformly in all utilities
• Quantitatively measurable and controllable PIs (input quantity data), to ensure the reliability of their values
• Well-known and widely used PIs that exist in business accounting and are available to the NRA in the
annual reports
• A small number of key PIs that characterize the utility’s business and thus determine substantiated costs; this will
reduce the administrative burden on utilities and by default will raise the quality of the data

To achieve the aim of the current research, i.e., to develop a methodology for determining the functional regularity (1),
we should prove the working hypothesis and make it tenable for practical implementation (control of costs
and tariff setting). Since it is impossible to define the regularity theoretically, a benchmark modelling
procedure is preferred, which is based on factual cases (the declared information of water utilities).

4. Selection of input data

Careful selection of input data is a well-known critical precondition for successful modelling.
The long-term experience of water professionals suggested the first step in the selection of indicators: the scale of
the business of utility k (i.e., the amount of authorised consumption A(k)) and the size of its infrastructure (i.e., the
total length of the pipe network L(k)) provide a first, although very rough, notion of the specific costs C(k) of
utility k. Consequently, the initial composition of the input data set Π(k) is defined as:

$$\Pi(k) = \{A(k), L(k)\} \qquad (2)$$

A difference between the amounts of authorised consumption A(k) and produced water P(k) (so-called non-revenue
water) exists for a number of reasons, e.g., technological consumption, leaks, disruptions, and unmetered or
unauthorized connections. Non-revenue water incurs significant costs for its sourcing, treatment and partial
pumping. Therefore, the initial data set (2) is supplemented with an indicator of water use efficiency E(k), which
shows the share of produced water that is supplied to authorized consumers:

$$E(k) = A(k) / P(k) \qquad (3)$$

Many water utilities serve not only one city or town but also provide services in a number of smaller
neighboring settlements (e.g., villages, hamlets). Fragmentation of the total network L(k) into s(k) separate segments
adversely affects the substantiated costs (personnel, transport, management, etc.) of the utility. The network
concentration index H(k) takes into account the total number of isolated segments s(k) in the network L(k), as well
as the share of the length of each isolated segment L(m(k)) in the total length of the network:

$$H(k) = \sum_{m=1}^{s(k)} \left( L(m(k)) / L(k) \right)^2 \qquad (4)$$

The consumer connection is an exit point of the utility’s infrastructure and the boundary of its responsibility
for the service. The number of connections N(k) characterizes expenditures related to consumer services; it
also indicates the fragmentation of the network as well as the ratio between larger-diameter (transmission and
distribution) and smaller-diameter (consumer) pipes. The overall input data set thus becomes:

$$\Pi(k) = \{A(k), L(k), E(k), H(k), N(k)\} \qquad (5)$$

Several aspects that are significant in other countries are not relevant in Latvia. The only real water quality
problem is iron removal, but it is much the same across the whole territory; tap water is suitable for drinking without
further filtration in any settlement. Water shortage problems are negligible at least until 2040 [6]. Support for
vulnerable consumers is implemented through a special accommodation allowance at the municipal level. The set of
PIs (5), which in practice is the result of several iterations, was used in the project.

The accuracy of the input data (selected PIs) remains a challenge for the water utilities (currently only the volume of
authorised consumption is used to calculate the tariff), although these PIs are the primary operational data of everyday
business. It is clear that, in any case, some of the particular input data sets Π(k) will stay in the risk zone.

A significant advantage of modelling is the ability to pick the highest-quality and most reliable (good) data sets
Π(k) from the full data pack declared by all utilities and to develop the model on this basis; the obtained
general regularity will be applicable to the remaining (bad) utilities too. Moreover, theoretical research in data
analysis and modelling shows that it is much more reasonable to carry out the analysis and to develop the data
models by selecting the most reliable, good data sets instead of using the full data pack, thus minimizing information
To select good data sets, a detailed formal analysis of the declared data for 2013 and 2014 (which were used for the
creation of the model) and a comparative analysis of the declared data for 2012–2014 were carried out. Utilities of
low confidence (bad utilities) were considered to be those that have declared, e.g.:

• A volume of produced water that is equal in two years or very near to the volume of supplied water (E > 0.99)
• A very high average connection density (N/L > 100 connections per km; average in Latvia: 22.6) and/or a very
low average consumption per connection (< 6 m³/month; average in Latvia: 48.3)
• A decreased or significantly increased (> 30%) total length of the network in the current year in comparison with
the previous one, with an unchanged or even declining amount of authorised consumption
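
The screening rules listed above could be expressed as a simple filter. The following Python sketch is only an illustration: the `Declaration` record, its field names and the sample figures are hypothetical, and it assumes that declared volumes are annual, so that the monthly consumption per connection is obtained by dividing by 12.

```python
from dataclasses import dataclass


@dataclass
class Declaration:
    """Declared annual indicators of one utility (field names are illustrative)."""
    authorised_m3: float   # A(k), authorised consumption per year
    produced_m3: float     # P(k), produced water per year
    network_km: float      # L(k), total length of the pipe network
    connections: int       # N(k), number of connections


def low_confidence_reasons(current, previous):
    """Return the screening rules that the current declaration triggers."""
    reasons = []
    if current.produced_m3 == previous.produced_m3:
        reasons.append("produced volume identical in two years")
    if current.authorised_m3 / current.produced_m3 > 0.99:
        reasons.append("E > 0.99")
    if current.connections / current.network_km > 100:
        reasons.append("connection density > 100 per km")
    # Assumption: annual volume, hence the division by 12 months.
    if current.authorised_m3 / current.connections / 12 < 6:
        reasons.append("consumption per connection < 6 m3/month")
    length_ratio = current.network_km / previous.network_km
    consumption_not_growing = current.authorised_m3 <= previous.authorised_m3
    if consumption_not_growing and (length_ratio < 1 or length_ratio > 1.3):
        reasons.append("implausible change of network length")
    return reasons


# Hypothetical declarations: (authorised m3, produced m3, network km, connections).
previous = Declaration(500_000, 650_000, 100.0, 2_000)
plausible = Declaration(510_000, 660_000, 102.0, 2_050)
suspicious = Declaration(500_000, 502_000, 140.0, 2_000)
print(low_confidence_reasons(plausible, previous))
print(low_confidence_reasons(suspicious, previous))
```

Note that none of the rules looks at the declared costs, which matches the purely formal character of the screening.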

Practitioners of data analysis recommend including around 70% of all data sets in the good data; the
synthesized regularity will be applied to the remaining 30% too [13]. We supplemented this factor with another one:
the selected good data should cover the full ranges of declared data. Because of the unconvincing data quality, we
carried out modelling using the data of only 33 utilities in 2013 (50% of total) and 38 utilities in 2014 (60%).

5. Modelling procedure

For the development of the benchmark model, the water utility k is considered as a multiple-input single-output
converter with a variable internal state (transition regularity), which transforms the input data set (5) (the set of
n PIs, i.e., cost drivers; in our case n = 5) into the corresponding output indicator, the specific costs C(k):

$$C(k) = f(\Pi(k)) = f(A(k), L(k), E(k), H(k), N(k)) \qquad (6)$$

The converter is deterministic: its input data set unambiguously defines the internal state of the converter
and the output information (specific costs). Synthesis of the model then means creation of a monotonic (due to
the economic logic) multi-functional regularity that describes the hyper-surface in n-dimensional space which is
the locus of the values of the specific costs. The number of input data sets, and consequently of output data, is
finite (the number of utilities u). Nevertheless, in order to be able to evaluate new and/or modified
undertakings, the regularity should be continuous for any PI in the determined ranges of input data.

The internal structure and operation of the utility are irrelevant for performing this task; the converter is
considered as a black box (see Fig. 1) that can be characterized by its transition function (6).

Fig. 1. Functionality of the benchmark model.

There are u different transition functions f(1), f(2), …, f(k), …, f(u) for the u utilities; they form the factual basis
for the modelling, a set of practical cases that can be used to synthesize the general regularity from input/output
examples. This is an advantage over relying only on theoretical preconceptions. The modelling is thus an inductive
process: synthesis of the general regularity of the transition function C = f(Π) on the basis of u
particular cases C(k) = f(Π(k)) [14]; the sought equation is developed using a mathematical modelling procedure.

It can be predicted that a model that is completely adequate for all real utilities will not be achievable. The
leading motive for practical purposes is to create the equation with the best possible quality. As the quality criterion
for the created general regularity, we used the correlation of the values of the synthesized model (equation) with the
corresponding declared specific costs (correl (C(k); C)).
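
As an illustration of the quality criterion, the sketch below computes a correlation between declared and modelled specific costs. It assumes that correl denotes the Pearson correlation coefficient, which the text does not state explicitly, and the cost values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical specific costs in EUR per cubic metre for four utilities.
declared = [0.50, 0.80, 1.10, 0.70]   # C(k), declared by the utilities
modelled = [0.55, 0.75, 1.00, 0.80]   # C, produced by a candidate model
print(pearson(declared, modelled))
```

A correlation close to 1 means that the model reproduces the ranking and relative spread of the declared costs; it does not by itself guarantee small absolute errors.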

Two mutually independent modelling procedures have been used, which also allows the results to be cross-checked. Both
consist of several phases and activities and are carried out in an iterative manner to gradually approach the sought
general regularity.

One procedure is based on the nonlinear regression (NLR) process [15]. To create a model, equation (6) should be
generalized, because the impact of any cost driver (performance indicator) on specific costs varies with the value
of the PI. For this purpose, the specific values of each PI should be replaced by mathematical functions; of course,
the impact regularities of each PI will be different:

$$C = f(\Pi) = f(f_1(A), f_2(L), f_3(E), f_4(H), f_5(N)) \qquad (7)$$

The synthesis of the sought general regularity then reduces to the determination of all functions in equation
(7). Following the principle of Occam’s razor, the search for suitable functions f1…f5 was made among the
elementary functions that satisfy the monotonicity condition (exponential, logarithmic, power), to identify the
particular regularity that correlates best with all practical examples.

The NLR process in our case is an empirical movement (navigation) in the multi-dimensional search space formed by
the input data vectors (see Fig. 2), to find the optimal function and its optimal parameters for each cost
driver. The incremental and efficient bottom-up navigation was implemented as a gradual process with a definite
trend towards the target, the maximum achievable value of the quality criterion (a correl (C(k); C) value of 0.88 was achieved).
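
The search among elementary monotonic functions can be illustrated for a single cost driver. The Python sketch below is a simplification and not the authors' procedure: it fits each candidate family (power, exponential, logarithmic) by least squares after a linearizing transformation and keeps the family whose predictions correlate best with the data. The helper names and the synthetic data are assumptions made for the example.

```python
import math


def fit_line(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    return mean_v - slope * mean_u, slope


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def fit_candidates(xs, ys):
    """Fit each monotonic family; returns {name: (a, b, predict)}. Needs xs, ys > 0."""
    ln_x = [math.log(x) for x in xs]
    ln_y = [math.log(y) for y in ys]
    fits = {}
    c, b = fit_line(ln_x, ln_y)           # power: y = a * x**b
    a = math.exp(c)
    fits["power"] = (a, b, lambda x, a=a, b=b: a * x ** b)
    c, b = fit_line(xs, ln_y)             # exponential: y = a * exp(b * x)
    a = math.exp(c)
    fits["exponential"] = (a, b, lambda x, a=a, b=b: a * math.exp(b * x))
    a, b = fit_line(ln_x, ys)             # logarithmic: y = a + b * ln(x)
    fits["logarithmic"] = (a, b, lambda x, a=a, b=b: a + b * math.log(x))
    return fits


def best_family(xs, ys):
    """Pick the family whose predictions correlate best with the observed values."""
    fits = fit_candidates(xs, ys)
    scores = {name: pearson(ys, [f(x) for x in xs]) for name, (_, _, f) in fits.items()}
    name = max(scores, key=scores.get)
    return name, fits[name][:2], scores[name]


# Synthetic data generated from a known power law, y = 2 * x**-0.5.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** -0.5 for x in xs]
name, params, score = best_family(xs, ys)
print(name, params, score)
```

Fitting in log-transformed coordinates minimizes the error of the transformed values, which is only an approximation of a direct nonlinear fit; a full procedure would also have to combine the five driver functions of equation (7), which this sketch does not attempt.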

The other modelling procedure uses a type of artificial neural network, the multi-layer perceptron (MLP) with
the error back-propagation training algorithm [16]. We found that just three computing units (neurons) in the
hidden layer and one output unit (see Fig. 3) are enough to achieve the best accuracy [17]. Each neuron is
represented by a mathematical function whose weights (or parameters) are set automatically via a machine
learning process. Input data were normalized by an S-type function because of the very diverse scales of the data.
Fig. 2. Modelling using nonlinear regression process.

The final outcome (i.e., the sought transition function) is obtained in the form of a trained neural network; its
quality was carefully evaluated in a combined way:
• In addition to the correlation between modelled and declared specific costs, the accuracy of the modelling was
controlled by the stopping criterion: the training process continues until the modelling error falls below
some predetermined threshold ε [18]. The best possible correlation obtained was 0.96, with an ε value of 0.00001
• Models obtained with the maximum possible accuracy/correlation adapt too closely to the concrete data points
used for training and are thus invalid for modelling the overall process (so-called overfitting). Therefore, specific
monotonicity tests of the model were defined to obtain the maximum quality of the sought regularity while
preserving its monotonicity; e.g., the final value of the threshold ε was increased to 0.0001, and the obtained
correlation was similar to that in the NLR case: 0.88
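
The structure described above (five inputs, three hidden neurons, one output, back-propagation with an error threshold ε as the stopping criterion) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tanh hidden activation, the linear output unit, full-batch updates, the learning rate and the synthetic training data are all choices made for the example.

```python
import math
import random


def init_network(n_inputs, n_hidden, rng):
    """Small random weights; the last entry of each weight row is the bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Return the hidden activations and the (linear) network output for input x."""
    hidden, output = network
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in hidden]
    y = sum(w * v for w, v in zip(output, h)) + output[-1]
    return h, y


def train(network, samples, targets, epsilon, learning_rate=0.1, max_epochs=2000):
    """Full-batch back-propagation; stops once the mean squared error is below epsilon."""
    hidden, output = network
    history = []
    for _ in range(max_epochs):
        grad_hidden = [[0.0] * len(row) for row in hidden]
        grad_output = [0.0] * len(output)
        squared_error = 0.0
        for x, t in zip(samples, targets):
            h, y = forward(network, x)
            delta = y - t                                   # d(0.5 * (y - t)^2) / dy
            squared_error += delta ** 2
            for j, hj in enumerate(h):
                grad_output[j] += delta * hj
                back = delta * output[j] * (1.0 - hj ** 2)  # error passed through tanh
                for i, xi in enumerate(x):
                    grad_hidden[j][i] += back * xi
                grad_hidden[j][-1] += back
            grad_output[-1] += delta
        mse = squared_error / len(samples)
        history.append(mse)
        if mse < epsilon:                                   # stopping criterion
            break
        scale = learning_rate / len(samples)
        for j, row in enumerate(hidden):
            for i in range(len(row)):
                row[i] -= scale * grad_hidden[j][i]
        for j in range(len(output)):
            output[j] -= scale * grad_output[j]
    return history


rng = random.Random(0)
# Synthetic inputs, assumed already normalized to [0, 1], and a smooth monotonic target.
samples = [[rng.random() for _ in range(5)] for _ in range(30)]
targets = [0.3 + 0.1 * sum(x) for x in samples]
network = init_network(n_inputs=5, n_hidden=3, rng=rng)
history = train(network, samples, targets, epsilon=1e-4)
print(len(history), history[0], history[-1])
```

Raising `epsilon` stops training earlier and yields a smoother, less closely fitted surface, which is the lever the text describes for trading accuracy against overfitting.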

Fig. 3. Modelling using multi-layer perceptron.

6. Modelling results and inferences

Using both modelling procedures, benchmark models of Latvian water utilities were synthesized on the basis of
the declared data for 2013 and 2014.
The 2013 models for the 33 good utilities are shown in comparison with the declared costs in Fig. 4. The correlation
of the modelled costs C with the declared specific costs C(k) (Table 1) is very strong (> 0.85); the p-values (i.e., the
probability that the obtained correlations are accidental) are less than $10^{-8}$. More than 70% of the modelled
specific costs are in the standard segment formed by the values C(k) ± 10%. Modelling for the good data of 2014
(38 utilities) provides results of even better quality: a correlation value of 0.88 was achieved (owing to increased
input data quality).
The mathematical expressions of the models are different, but the surfaces practically coincide over the whole range of
input data; the correlations between the respective values of the models are extremely strong (> 0.95). The
difference between the modelled costs for a particular utility does not exceed 10%; the biggest differences are for
utilities whose specific costs are outside the standard segment or on its border. Hence, the trustworthiness of the
results is high.
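
The two checks used in this comparison, the share of modelled costs inside the standard segment C(k) ± 10% and the largest disagreement between the two models, can be sketched as follows. The cost values are invented, and measuring the gap between models relative to their mean is an assumption, since the text does not specify the reference value.

```python
def share_within_band(declared, modelled, tolerance=0.10):
    """Fraction of utilities whose modelled cost lies within declared * (1 +/- tolerance)."""
    inside = sum(1 for c_decl, c_mod in zip(declared, modelled)
                 if abs(c_mod - c_decl) <= tolerance * c_decl)
    return inside / len(declared)


def max_relative_gap(model_a, model_b):
    """Largest difference between two models for one utility, relative to their mean."""
    return max(abs(a - b) / ((a + b) / 2) for a, b in zip(model_a, model_b))


# Hypothetical specific costs in EUR per cubic metre for four utilities.
declared = [0.50, 0.80, 1.00, 0.60]
nlr = [0.52, 0.70, 1.05, 0.61]
mlp = [0.53, 0.72, 1.02, 0.60]
print(share_within_band(declared, nlr))
print(max_relative_gap(nlr, mlp))
```

In this invented example three of the four utilities fall inside the band, while the two models differ from each other by less than 3%, mirroring the pattern that the models agree with each other better than with the declared data.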

[Plot: declared costs, NLR model and MLP model for each utility (axis labelled U1 to U31), with the 0.9*C and 1.1*C bounds.]

Fig. 4. Modelled and declared specific costs of 2013 (33 good utilities).

Application of the synthesized regularities to the bad utilities (see Fig. 5) clearly identifies two clusters of bad
data, i.e., utilities that have declared unduly high or unduly low costs in relation to the scale of their business.
Recall that the bad data were separated according to the results of a purely formal data analysis, without any
connection to specific costs. The practically unchanged, excellent mutual coincidence of the models for all utilities
points to the existing information asymmetry and incorrect data as the most probable reason for the differences.

[Plot: declared costs and MLP model for each utility, grouped into good cases and bad cases.]

Fig. 5. Modelled (MLP model) and declared specific costs of 2014 (all 63 utilities).

According to the principles of benchmark modelling, the general regularity shows the mutual correlation between
the specific costs of good utilities and represents mean values of the declared specific costs. Since both artificially
reduced and exaggerated costs exist, it can be roughly assumed that the synthesized benchmark model presents
average/reasonable costs. In the next stages, it will be possible to use the modelled specific costs as a motivator to
increase the operational efficiency of utilities and to reduce their expenses.
Comparison of the two NLR models for 2013 and 2014 (see Fig. 6) shows a gradual general growth of production costs.
Increasing consumption means decreasing specific costs (e.g., U13), while a developed but currently untapped
network results in increasing costs (e.g., U1, U2, U22). A relatively small scale of business and a weak technological
base are the basic reasons why any serious accident has a significant impact on specific costs in comparison with the
previous year: disruptions in 2013 (e.g., U15, U17, U18, U23) or in 2014 (e.g., U1, U2, U7, U22). This factor clearly
shows the necessity of operative tariff changes.

[Plot: specific costs (€/m³) for each utility (axis labelled U1 to U22), NLR models 2013 and 2014.]

Fig. 6. NLR models 2013 and 2014.

A principal practical regulatory need is that the models coincide when the regularity of the previous year is applied
to the real PIs of the current year. Fig. 7 shows that there is a serious shift only for U1; the reason is the
different modelling ranges.
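
Since a shift can arise when current-year PIs fall outside the ranges on which the previous-year model was built, a simple range check before applying an older model may be useful. The sketch below is an illustrative assumption about how such a check could look; the dictionary keys and values are hypothetical.

```python
def training_ranges(pi_sets):
    """Per-indicator (min, max) over the PI sets that were used to build a model."""
    names = pi_sets[0].keys()
    return {name: (min(p[name] for p in pi_sets), max(p[name] for p in pi_sets))
            for name in names}


def out_of_range(pi_set, ranges):
    """Names of indicators whose value lies outside the modelled range."""
    return [name for name, (low, high) in ranges.items()
            if not low <= pi_set[name] <= high]


# Hypothetical PI sets (A in cubic metres per year, L in km) behind a 2013 model.
pi_2013 = [
    {"A": 200_000, "L": 40.0},
    {"A": 900_000, "L": 150.0},
    {"A": 450_000, "L": 95.0},
]
ranges = training_ranges(pi_2013)
print(out_of_range({"A": 500_000, "L": 60.0}, ranges))
print(out_of_range({"A": 1_200_000, "L": 60.0}, ranges))
```

A utility flagged by such a check would be evaluated by extrapolation rather than interpolation, so its modelled cost deserves less trust.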

[Plot: specific costs for 2014 (€/m³) for each utility (axis labelled U1 to U37), Model 2013 and Model 2014.]

Fig. 7. Modelled specific costs for 2014 (MLP model), using models 2013 and 2014.

7. Conclusion

The level of accuracy and credibility of the results and the coincidence of the obtained models with the factual cases
clearly demonstrate the correctness of the working hypothesis and the prospects of the research. There is a strong
basis for continuing the research to further enhance the compliance of the model and to increase its quality.
Reduction of the existing information asymmetry is the primary task. The results indicate much greater
problems with the input data than with the adequacy of the model (the mutual correlation of the independent
modelling results is much higher than their correlation with the declared costs of utilities). Studies related to the
potential incompleteness of the input data set should also be continued to determine whether all the substantial input
data (cost drivers) are included in the set of PIs. It is necessary to identify and quantify the individual features that
distort the regularity of the declared costs of utilities: some of them artificially reduce costs, e.g., municipal subsidies
or underinvestment, while others generate exaggerated costs, e.g., excessive capacity of some infrastructure objects.
Thus, the general goals (setting of substantiated tariffs, growing efficiency of utilities, reduction of the
administrative burden on business, and increased efficiency of the NRA) will be achieved. An analogous
methodology could also be developed for evaluating the costs of sewerage and district heating utilities.


References

1. On Regulators of Public Utilities. Available:; 2016.
2. Sabiedrisko pakalpojumu regulesanas komisija (in Latvian). Available:
3. Udenssaimniecibas pakalpojumu likums (in Latvian). Available:; 2016.
4. Udenssaimniecibas pakalpojumu tarifu aprekinasanas metodika (in Latvian). Available:; 2016.
5. Geriamojo vandens tiekimo ir nuoteku tvarkymo bei pavirsiniu nuoteku tvarkymo paslaugu kainu nustatymo metodika (in
Lithuanian). Available: (in Bulgarian). Available:; 2016.
6. Luo T, Young R, Reig P. Aqueduct projected water stress rankings. Available:
water-stress-country-rankings; 2016.
7. Berg SV. Water utility benchmarking; measurement, methodologies, performance incentives. London: IWA Publishing; 2010. 172.
8. Storto C. Benchmarking operational efficiency in the integrated water service provision; does contract type matter? Benchmarking.
Vol.21, 6; 2014. p. 917-943.
9. Berg S, Padowski JC. Overview of Water Utility Benchmarking Methodologies: From Indicators to Incentives. Available:; 2016.
10. Baranzini A, Faust A, Maradan D. Water supply: costs and performance of water utilities, evidence from Switzerland. Available:; 2016.
11. Shinde VR, Hirayama N, Mugita A, Itoh S. Revising the existing performance indicator system for small water supply utilities in
Japan. Urban Water. Vol.10, 6; 2013. p. 377-393.
12. Barzdins J, Barzdins G, Apsitis K, Sarkans U. Towards Efficient Inductive Synthesis of Expressions from Input/Output Examples.
Proceedings of the 4th International Workshop on Algorithmic Learning Theory. London: Springer-Verlag; 1993. p. 59-72.
13. Leek J. The Elements of Data Analytic Style. Victoria British Columbia: Leanpub; 2015. 94.
14. Angluin D. Inductive inference: theory and methods. Computing Surveys. Vol. 15, 3; 1983. p. 237-267.
15. Dean J. Big data, data mining and machine learning. Hoboken New Jersey: Wiley; 2014. 266.
16. Haykin S. Neural Networks and Learning Machines. New York: Pearson Education Inc.; 2010. 936.
17. Alpaydin E. Introduction to Machine Learning. Cambridge. The MIT Press; 2010. 584.
18. Mitchell TM. Machine Learning. Columbus OH: McGraw-Hill Education; 1997. 432.

Girts Karnitis, born in 1974, earned his doctoral degree in Computer Science from the
University of Latvia in 2004. He has published more than 20 scientific papers and has
participated in many software projects as designer and programmer. His main scientific
interests include business process modelling and database technologies, including NoSQL
databases and Big Data technologies. Contact him at