M. Amirijoo (Ericsson),
L. Jorguseski (TNO ICT),
T. Kürner (TU Braunschweig),
R. Litjens (TNO ICT),
M. Neuland (TU Braunschweig),
L. C. Schmelz (Nokia Siemens Networks),
U. Türke (Atesio)
Abstract
Outline
Introduction
Components in cell outage management
Operator policy
Measurement
Control parameters
Scenarios
Assessment criteria
Simulator
Future work
WWW.FP7-SOCRATES.EU
Introduction
One aspect that benefits from self-organization is cell outage management (COM),
which consists of:
– Cell outage detection
– Cell outage compensation
Reasons for outages, e.g.:
– hardware and software failures,
– external failures such as power supply or network connectivity
Outages
– may not be detected for hours or even days
– may require manual analysis and unplanned site visit
– SOCRATES COM concentrates on those cases where an eNodeB reset does
not help
The outage detection function must promptly inform the operator about the occurrence and
the cause of an outage
Automatic compensation actions are triggered to alleviate performance
degradation
There are multiple reasons for a cell outage, e.g., hardware and software failures (radio board
failure, channel processing implementation error, etc.), external failures such as power supply or
network connectivity failures, or even erroneous configuration. While some cell outage cases
are detected by Operations Support System (OSS) functions through performance counters
and/or alarms, some may not be detected for hours or even days. Such outages are often
detected only through long-term performance analysis and subscriber complaints. Currently,
discovering and identifying some errors involves manual analysis and may require unplanned
site visits, which makes cell outage detection a costly task. It is the task of the cell outage
detection function to promptly inform the operator about the occurrence and the cause of an
outage.
When a cell outage is detected, appropriate compensation methods are triggered to
alleviate the degraded performance due to the resulting coverage gap and loss in throughput
by appropriately adjusting radio parameters in surrounding sites. In general, human
involvement shall only be triggered when absolutely necessary, e.g., when manual repairs are
needed.
Components of Cell Outage Management
[Figure: components of cell outage management. Measurements feed the detection and coverage
map estimation functions; compensation adjusts the control parameters under the operator policy
(coverage, QoS). Supporting activities: simulation tools, scenarios, assessment criteria.]
Various measurements are gathered from the user equipments (UEs) and the base stations
(called eNodeBs in LTE). The measurements are then fed into the cell outage detection
function, which detects whether at the current time an outage has occurred and triggers the
cell outage compensation function to take appropriate actions.
The goal of cell outage compensation is to minimise the network performance degradation
when a cell is in outage. This is done by automatic adjustment of network parameters in order
to meet the operator’s performance requirements based on coverage and other quality
indicators, e.g., throughput. It is recognised that performance requirements may not be achieved
during an outage and, as such, the goal of the compensation function is to meet performance
requirements to the largest possible extent. Cell outage compensation algorithms may for
example alter the antenna tilt and azimuth, or the cell transmit power, in order to cover the
area that is in outage.
Altering the radio parameters of the neighbouring cells means that some of the UEs served by
neighbouring cells may be affected. For example, if the coverage of a neighbouring cell
increases then this implies that more UEs will be served and, consequently, the UE throughput
may decrease. This should be taken into account and an appropriate balance between
coverage and other quality indicators (e.g. throughput) should be achieved. This balance is
indicated by means of an operator policy that governs the actions taken by the cell outage
compensation function.
In order to monitor and evaluate the actions taken by the cell outage compensation algorithm,
there is a need to estimate the coverage in the vicinity of the outage area. This is
provided by the coverage map estimation function, which continuously monitors the network
coverage by means of measurements and possibly prediction data.
Other activities are also being pursued alongside this work, namely the development of
simulation tools, scenarios that capture different outage cases, and methods and criteria for
assessing the impact and performance of cell outage detection and compensation algorithms.
Measurements
The following example nodes may be used when collecting the measurements.
eNodeB1 denotes the eNodeB that encounters a cell/sector outage or a complete node outage
(marked by a flash).
UE1 is served by eNodeB1 (shown by a solid line). UE1 furthermore receives signals from
neighbouring eNodeBs 2 and 3 (shown by dotted lines) and reports measurements to eNodeB1.
UE2 is served by eNodeB2. Furthermore, UE2 receives signals from eNodeB1 (shown by a
dotted line) and reports measurements to its serving eNodeB2.
The Access Gateway (aGW), to which the affected eNodeB1 is connected via the S1 interface,
is shown by a dashed line.
Measurements - Details
The following measurements may be useful when detecting and compensating an outage. Which of
these measurements are actually needed by the cell outage management functions remains to be
investigated.
•Cell Capacity: zero capacity or sudden decrease of capacity may indicate an outage
•Radio Link Failure (RLF) counter: sudden increase may indicate a cell outage
•Internal handover failure rate: high failure rate of internal handovers (between sectors / cells of one
eNodeB) may indicate the outage of a sector / cell
•Incoming / outgoing handover failure rate: high inter-eNodeB handover failure rate towards dedicated
eNodeB may indicate outage of a sector of this eNodeB or the whole eNodeB
•Relative load indicator: message is exchanged between neighbouring eNodeBs via X2 interface to
transfer load and interference coordination information. The indication of sudden interference level
decrease reported through load indicator messages may indicate a cell outage at the sending eNodeB
•Resource status reporting: an eNodeB may request resource status from neighbouring eNodeBs via X2
interface. Different measurements can be requested, such as cell load or prioritised bit rates (e.g., for
speech), which are for further study (FFS) in standardisation. Suddenly changing values of measurements
may indicate cell outages, and the resource request could also be used to confirm potential outages that
have been indicated by other measurements
•Cause Information Element: multiple X2 messages include the (mandatory) information element (IE)
"Cause", e.g. Handover Request, Handover Preparation Failure, Handover Cancel, Error Indication, X2
Setup Failure, eNodeB Configuration Update Failure, and Resource Status Failure. Relevant cause
values for cell outage management are: “Cell not Available”, “Transport resource unavailable”,
“Hardware Failure”. This information exchanged via X2 interface can be used for cell outage detection
and cause analysis
•Interference measurements: sudden changes (reduction) in interference measurements taken by
eNodeB or UE may indicate outage of neighbour eNodeB; this is still a topic FFS
•Reference Signal Received Power (RSRP): sudden decrease in received power level of serving or
neighbour cells may indicate a cell outage. RSRP includes measurements of serving and neighbouring
cells but is reported only to the serving eNodeB
•Channel Quality Indicator (CQI): the CQI is a measurement about the radio channel quality of the
serving eNodeB which is taken by the UE, and reported only to the serving eNodeB. A sudden decrease
of the CQI level may indicate an upcoming cell outage of the serving eNodeB. On the other hand, a
suddenly increasing CQI may indicate the outage of a neighbouring eNodeB which may cause a
decrease in channel interference
•Failure reports: failure reports are generated by the UE after connection or handover failures and sent to
eNodeB / OAM system for cause analysis. These reports can also be used for cell outage detection or
cause analysis
•Historic measurements: the UE provides measurements such as RSRP, CQI and others to the serving
eNodeB during a connection. In case of a connection failure to the serving eNodeB (e.g. due to a cell
outage) the latest measurements are usually lost. These historic measurements could be provided to the
new serving eNodeB after cell re-selection for failure cause analysis (FFS in standardisation)
•Blocked / dropped calls: the number of blocked or dropped calls is counted by the eNodeB and by
default sent to the OAM system – a sudden increase may indicate cell outage
•Alarms from eNodeB / aGW: eNodeB and aGW send alarms to the OAM system in case of connection
failures, handover failures etc. These alarms could be used for cell outage detection and cause analysis
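As a minimal sketch of how the "sudden change" indications listed above could be turned into a trigger, the following Python compares the latest sample of a counter against its recent history. The function names, the 3-sigma threshold and the choice of the RLF counter and cell capacity as inputs are illustrative assumptions, not part of the SOCRATES design.

```python
from statistics import mean, pstdev


def deviation_score(history, latest):
    """Number of standard deviations by which `latest` departs from `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any change at all counts as an extreme deviation.
        if latest == mu:
            return 0.0
        return float("inf") if latest > mu else float("-inf")
    return (latest - mu) / sigma


def outage_suspected(rlf_history, rlf_latest, cap_history, cap_latest, k=3.0):
    """Flag a possible outage when the RLF counter jumps or capacity collapses.

    k is a hypothetical sensitivity threshold (in standard deviations).
    """
    rlf_jump = deviation_score(rlf_history, rlf_latest) > k
    capacity_drop = cap_latest == 0 or deviation_score(cap_history, cap_latest) < -k
    return rlf_jump or capacity_drop
```

A real detector would presumably combine several of the listed measurements and confirm a suspicion before triggering compensation; this sketch only shows the basic comparison of a new sample against a baseline.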
Control Parameters
The power allocated to physical channels dictates the cell size. On the one hand, by
increasing the physical channel power the coverage area of a cell can be increased (in order to
compensate for an outage). On the other hand, by lowering the cell power the cell area is reduced
and, as a consequence, the load and interference caused by the cell can also be reduced.
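To give a feel for the relation between channel power and cell size, the sketch below assumes a simple log-distance path-loss model, in which path loss grows by 10·n dB per decade of distance (n being the path-loss exponent). This model, the function names and the example values are assumptions made for illustration; the slides do not prescribe a propagation model, and uplink limits are ignored here.

```python
def range_scaling(delta_power_db, path_loss_exponent):
    """Factor by which the downlink cell radius changes when the channel
    power changes by delta_power_db, under a log-distance path-loss model."""
    return 10 ** (delta_power_db / (10.0 * path_loss_exponent))


def area_scaling(delta_power_db, path_loss_exponent):
    """Corresponding change in covered area (radius factor squared)."""
    return range_scaling(delta_power_db, path_loss_exponent) ** 2
```

Under this assumed model, a power increase of a few dB enlarges the radius only modestly when the path-loss exponent is high, which is one reason why tilt and azimuth are considered as control parameters alongside power.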
Modern antenna design allows the antenna pattern and the orientation of the main
lobe to be influenced by electrical means (e.g., remote electrical tilt, beamforming). This is the
basis for online control of the antenna characteristics and thus provides the option to modify the
characteristics according to the needs of self-organisation. In case of a cell outage, the
footprints of the surrounding cells change as one or more cells "disappear". Without intervention,
these changes in the footprints are coincidental. As part of cell outage management, they may be
controlled so as to optimise network coverage and load balancing among the remaining cells.
The consideration of Home-eNodeBs for outage compensation is for further study, because the
implementation details of Home-eNodeBs are not yet completely fixed. Two aspects are
currently foreseen. First, if there are sufficient Home-eNodeBs in the
outage area, these may take over some macro-layer traffic once a macro-level sector/eNodeB
has failed. The feasibility of this approach depends on whether it is possible to open
Home-eNodeBs for access by UEs. Second, Home-eNodeB users, if operating in the
same spectrum, cause UL interference to the macro layer. During an outage, which results in a
large distance between a UE and a functioning eNodeB, the coverage might be significantly
reduced by the UEs connected to Home-eNodeBs. A possible approach is to turn Home-eNodeBs
off whenever there is an outage, or at least to decrease the transmission power of UEs
served by Home-eNodeBs.
Scenarios
The developed cell outage management algorithms are assessed for different
scenarios, comprising
– Outage scenarios: type and extent of failure
– Evaluation scenarios: considered network, traffic and environment scenarios
– COM mechanism: detection (COD) vs compensation (COC)
– COC: X2 failure may affect HO performance and the
effectiveness of ICIC
The objective is to formulate a number of scenarios that are considered in the development
and assessment of cell outage management methods. These scenario descriptions comprise
three key elements: (i) the considered outage scenario, e.g. sleeping sites/cells, failing
sites/cells or failing transport network links; (ii) assumptions regarding network, traffic and
environment aspects for which useful deployment of cell outage management solutions is
anticipated; and (iii) whether cell outage detection and/or cell outage compensation methods
are to be developed.
A distinction is made between four categories of outage scenarios: (i) physical channel failures
causing sleeping sites/sectors; (ii) whole site/sector failure (where failure is reported to OSS);
(iii) transport network link failures; and (iv) other types of malfunctions, e.g. reduced
performance due to hardware aging.
In category (i), different subcases can be distinguished that offer different potential for (primarily)
cell outage detection solutions, depending on the actual physical channel that fails, e.g. the
synchronisation channel, reference signal, broadcast information, paging channel or user plane
transmission. The type of control/signalling channel that fails influences, e.g., whether
network/mobile-originating calls or handovers can be supported, which in turn affects the
possibilities for developing cell outage detection methods.
In category (iii), a distinction is made regarding the type of transport network link that fails. For
example, if all S1 links to an eNodeB fail, the site is entirely non-functional (as in category (ii)),
whereas if an X2 link fails, handovers (HOs) may incur additional delays as they will be
handled via S1 links, and inter-cell interference coordination (ICIC) algorithms can no longer
function, which affects capacity and/or coverage.
Scenarios
Evaluation scenarios
Impact of eNodeB density and load
More COC potential in a dense capacity-driven network layout
Impact of service type
More COC potential in an area with predominantly low-bandwidth service, e.g. VoIP
telephony
Impact of outage location
More COC potential if a cell/site
outage occurs at the inner part
of an LTE island
Also study impact of:
user mobility
spatial traffic distribution
propagation aspects
UE terminal class
The following evaluation scenarios have been formulated for the development of cell outage
compensation methods. Furthermore, the priority of the evaluation scenarios is given.
•Impact of eNodeB density and traffic load: in a sparse, coverage-driven network layout, little
potential exists for compensating outage-induced coverage/capacity loss. In a dense, capacity-
driven network layout, this potential is higher, particularly when traffic loads are low. Priority:
high.
•Impact of service type: the distinct elasticity and quality of service requirements of different
services affect the compensation potential. For instance, compensation actions may be able to
alleviate local outage effects only to the extent that low-bandwidth services can be handled. Priority: high.
•Impact of outage location: if cell outages occur at the edge of an ‘LTE island’ fewer
neighbours exist to help compensate. For outages in the core of such an ‘LTE island’, the
compensation potential is larger. Priority: medium.
•Impact of user mobility: if mobility is low (high), few (many) users spend a relatively long
(short) time in an outage area. The perceived outage impact depends on the delay-tolerance
and elasticity of the service. Priority: medium.
•Impact of spatial traffic distribution: if traffic is concentrated around sites, such traffic is
typically relatively far away from neighbouring sites and hence the compensation potential is
limited. If traffic is concentrated ‘in between’ sites, the potential is larger. Priority: medium.
•Impact of propagation aspects: the higher the path loss exponent (higher frequency bands,
more urbanised environments), the more difficult it may be to cover such users with relatively
remote compensation cells in case of an outage. Furthermore, a higher shadowing variation
(also primarily depending on the propagation environment) may allow more distant cells to
serve calls in an outage area. Priority: low.
•Impact of UE terminal class: the higher a UE’s maximum uplink transmit power (class-
dependent) the lower the need for outage compensation, since it may still be able to attach to a
more distant cell even without compensation measures. At the same time, the potential for cell
outage compensation is higher, since retuned surrounding cells may be more able to serve such a
user. Priority: low.
Assessment Criteria – Cell Outage Detection
A so-called failure duration interval is associated with each failure, which starts with the
occurrence of a failure and ends with the elimination of the failure (e.g., by repairing the error
involved). A true detection is a detection which is reported by the cell outage detection
mechanism during the failure duration interval. In contrast a false detection is reported outside
the failure duration interval.
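The definition of true and false detections can be illustrated with a short Python sketch. It assumes that failures are given as (start, end) time pairs and detections as time stamps; the function name and data layout are hypothetical.

```python
def classify_detections(failure_intervals, detection_times):
    """Split detections into true ones (reported inside some failure
    duration interval) and false ones (reported outside all intervals)."""
    true_detections, false_detections = [], []
    for t in detection_times:
        inside = any(start <= t <= end for start, end in failure_intervals)
        if inside:
            true_detections.append(t)
        else:
            false_detections.append(t)
    return true_detections, false_detections
```

For example, with failures during [10, 20] and [50, 60], detections at times 12 and 55 count as true detections, whereas detections at times 5 and 70 count as false detections.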
Processing overhead refers to the amount of processing needed to detect the outage. The
execution time of a particular algorithm typically depends on the size $n$ of the input data, e.g.,
the number of measurements, and can be asymptotically logarithmic ($\log n$), polynomial ($n^a$), or
even exponential ($a^n$). An analysis of asymptotic execution time gives insight into the
processing demand of an algorithm.
Assume that there are $n_L$ levels of severity, where each severity is characterized by multiple
factors, e.g., reduction in QoS, capacity, and coverage. Let $L_{\mathrm{detector}}$ denote the severity
indication level estimated by the detection algorithm and $L_{\mathrm{true}}$ be the true severity indication
level, i.e., the level classified if true network conditions are known. The level estimation
accuracy $A_L$ is a function of $L_{\mathrm{detector}} - L_{\mathrm{true}}$, e.g.,
$A_L = |L_{\mathrm{detector}} - L_{\mathrm{true}}|$. Non-linear variations of $A_L$
may also be considered, e.g., $A_L = (L_{\mathrm{detector}} - L_{\mathrm{true}})^2$.
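The two example forms of the level estimation accuracy can be written down directly. In the sketch below the function name and the `squared` switch are illustrative; severity levels are assumed to be integers.

```python
def level_accuracy(l_detector, l_true, squared=False):
    """Level estimation accuracy: absolute error by default,
    squared error for the non-linear variant."""
    diff = l_detector - l_true
    return diff * diff if squared else abs(diff)
```

With an estimated level of 3 and a true level of 1, the absolute variant gives 2 and the squared variant gives 4, so the squared form penalises large misclassifications more heavily.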
Assessment Criteria – Cell Outage Compensation
[Figure: network performance (coverage, throughput) versus time, showing Margin 1 and Margin 2,
the time instants Ti, TSS1 and TSS2, and the convergence time]
The convergence time is defined as the difference between the time at which the outage compensation
algorithm reaches a steady state in network performance and the time at which the outage is detected.
This is a measure of how fast the compensation algorithm reconfigures the network. Different
network performance metrics can be considered here. In the figure above, coverage and
throughput performance metrics are chosen for illustration.
The convergence time of coverage is obtained as TSS2 - Ti for one cell (a similar definition is
used for, e.g., throughput). The margins (Margin 1 and 2) presented in the figure set
the allowed deviation from the final state of performance. Margin 1 can be, e.g.,
±1%, meaning that coverage can oscillate between the steady state plus or minus 1%. TSS2 is
set to the earliest time after which coverage stays within Margin 1. The convergence time in the
area under consideration is obtained as the maximum of the convergence times of all the cells
involved in the cell outage compensation.
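The convergence-time definition can be sketched in Python as follows. The sketch assumes that each cell provides a sampled time series of one metric (e.g. coverage), that the steady-state value is taken to be the last sample, and that the margin is relative (0.01 for ±1%). These choices and all function names are assumptions made for illustration.

```python
def steady_state_time(times, values, margin):
    """Earliest sample time after which every value stays within
    +/- margin (relative) of the steady state (here: the last sample)."""
    steady = values[-1]
    tolerance = abs(steady) * margin
    t_ss = times[-1]
    # Walk backwards while the samples remain inside the margin band.
    for t, v in zip(reversed(times), reversed(values)):
        if abs(v - steady) > tolerance:
            break
        t_ss = t
    return t_ss


def convergence_time(times, values, t_detect, margin=0.01):
    """Steady-state time minus the time at which the outage was detected."""
    return steady_state_time(times, values, margin) - t_detect


def area_convergence_time(cells, t_detect, margin=0.01):
    """Maximum convergence time over all cells; `cells` is a list of
    (times, values) pairs, one per cell involved in the compensation."""
    return max(convergence_time(t, v, t_detect, margin) for t, v in cells)
```

Scanning backwards from the last sample ensures that an early, temporary visit to the margin band is not mistaken for convergence.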
Operator Policy
In a cell outage situation an operator may still aim for the ideal goal of
– achieving the best coverage possible,
– providing the highest accessibility, and
– delivering the best possible quality in the cell outage area and all surrounding
cells.
The optimisation goal itself may vary depending on the operator’s policy
– coverage orientation
– quality orientation
– capacity orientation
Every operator has its own policy for the network performance. For cell outage management
the operator wants to implement its policy into the network. For this, optimisation goals and a
cost function have to be defined.
In a cell outage situation an operator may still aim for the ideal goal of
•achieving the best coverage possible,
•providing the highest accessibility, and
•delivering the best possible quality in the cell outage area and all surrounding cells.
In most cases, all these goals cannot be fulfilled at the same time. As a consequence the
targets have to be weighted and/or ranked in order to provide quantitative input to an
optimisation procedure.
The optimisation goal itself may vary depending on the operator’s policy. For example, one
operator wants to maximise its income during the cell outage as well, whereas another operator
may also have long-term user satisfaction in mind. In the first case an operator may target
a high capacity at the expense of coverage in a large area, yielding a low perceived coverage, which is
often worse for the reputation than a higher blocking rate. Hence, the policy definition
should be modular and flexible enough to capture different operator strategies, e.g., a
coverage-oriented operator vs. a capacity-oriented operator.
Operator Policy
The cost function has to take all these different foci into account
When defining the cost function it has to be decided which cells are taken into account.
Three groups of cells are distinguished: the cell in outage, all cells that are
actively tuned by the COM algorithms, and all cells that are not actively tuned by the COM
algorithms but are influenced by the network changes at the other cells. This is shown in the upper
figure. The cell in the centre is the cell in outage. The first ring comprises all cells that are actively
tuned by the COM algorithms. The second (outer) ring comprises all cells that are not actively
tuned but are influenced by the network changes at the other cells.
Optimisation goals may vary for different cells. For example, in cells covering large areas
coverage should be kept high, whereas in high-capacity cells located in the same region the
focus will be more on accessibility and/or quality. As a consequence the optimisation goals
have to be defined on a per-cell basis. This concept is shown in the lower figure, in which the
different layouts of the cells represent different foci. In addition to the different optimisation
goals, cells may also differ in importance, so that different priorities are assigned to the
cells.
The cost function has to take all important effects, such as coverage, accessibility, and quality,
into account. Normalised values are used in the cost function because it is easier to weight the
normalised and dimensionless values of the different criteria against each other. The reference
value to which the criteria are normalised and the normalisation procedure have yet to be
defined.
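One possible shape of such a cost function is sketched below in Python. Because the normalisation procedure and reference values are still open, the sketch simply assumes that every criterion is already normalised to [0, 1], with 1 meaning the target is fully met. The weighted-sum form, the dictionary layout and all names are hypothetical, not the SOCRATES cost function.

```python
def cell_cost(kpis, weights):
    """Weighted shortfall of one cell. `kpis` and `weights` are dicts keyed
    by criterion name; KPI values are assumed normalised to [0, 1]."""
    total_weight = sum(weights.values())
    shortfall = sum(w * (1.0 - kpis[name]) for name, w in weights.items())
    return shortfall / total_weight


def network_cost(cells):
    """Priority-weighted average of the per-cell costs. Each cell is a dict
    with 'kpis', 'weights' (its optimisation focus) and 'priority'."""
    total_priority = sum(c["priority"] for c in cells)
    weighted = sum(c["priority"] * cell_cost(c["kpis"], c["weights"]) for c in cells)
    return weighted / total_priority
```

Per-cell weights express the differing foci (e.g. a coverage-oriented cell puts most weight on coverage), while the per-cell priority expresses the differing importance of cells.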
Accessibility and quality can be assessed by key performance indicators (KPIs) retrieved from
counters available in the network. The coverage can be estimated based on measurement
reports. For this, geographic coordinates of the mobile position, or at least an estimate of
these coordinates, have to be assigned to the obtained measurement reports. The main
challenge here is how to accurately estimate the position of the UE. It has to be investigated to
which degree coverage can be estimated based on measurement reports.
COC Observability study
The goal of the study is to define and evaluate specific KPIs for assessing the success of cell
outage compensation
Coverage:
– Describe the percentage of the area of the failed cell that is covered by modified other cells
after compensation; identification of coverage gaps
– For the simulator, the area can be subdivided into “pixels” by an overlay grid, and the expected
downlink and uplink coverage can be calculated; it is an open question how this can be achieved
in a real system without requiring drive tests, e.g. through measurements from UEs
Accessibility (GBR traffic, e.g. speech, VoIP, IPTV etc.)
– Calculation of the Session Set-up Success Rate (simple approach)
– Calculation of the Establishment Success Rate that includes the requested QoS Class
Identifier (QCI)
– If possible using measurements / counters (session blocking ratio) available from eNodeB;
potential to use counters from MME; it remains to be evaluated to what extent counters are
available with the required frequency in a real system
Quality / Throughput (non-GBR traffic, e.g. FTP, HTTP etc.)
– IP throughput (uplink / downlink) according to the requested QCI
– Direct measurement of IP traffic in eNodeB and UE, and comparison of the results with the
requested QCI, will be difficult to perform locally; approaches that use information
about buffer sizes at eNodeB and UE are currently under evaluation
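The pixel-based coverage KPI for the simulator can be sketched as follows. The sketch covers the downlink only and assumes that, for every pixel of the failed cell's area, the simulator supplies the RSRP values (in dBm) received from the surviving cells. The function name, the data layout and the threshold value in the example are hypothetical.

```python
def covered_fraction(pixels, rsrp_threshold_dbm):
    """Fraction of pixels of the failed cell's area in which at least one
    surviving cell is received at or above the RSRP threshold.

    `pixels` is a list with one entry per pixel; each entry is the list of
    RSRP values (dBm) from the surviving cells at that pixel.
    """
    covered = sum(
        1 for rsrps in pixels if rsrps and max(rsrps) >= rsrp_threshold_dbm
    )
    return covered / len(pixels)
```

For instance, with four pixels of which two see a surviving cell above an assumed threshold of -110 dBm, the KPI evaluates to 0.5; the uncovered pixels identify the remaining coverage gaps.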
Efficiency of cell outage management
Future Work
SOCRATES will continue the work on cell outage detection and compensation
Next steps involve modelling various aspects of LTE and implementing simulators for cell
outage compensation and coverage map estimation. The simulator implementation is planned
to be finished by July 2009.
A controllability study will be carried out in order to see to what degree an outage can be
compensated (by manual means) as well as to understand the relation between control
parameters and overall goal and assessment criteria of cell outage compensation. For
example, a study will be carried out to understand the impact of antenna tilt on coverage and
quality. The controllability study is planned to be finished by July 2009.
The observability study mostly addresses the degree to which
measurements, counters, etc. from the network can aid in
•detecting an outage,
•estimating the impact of compensation measures and the performance of the compensation
function, and
•estimating the coverage in the network.
The observability study is planned to be finished by July 2009.
An intermediate report will be finalised by October 2009. Algorithms for cell outage detection
and compensation will be developed using the knowledge obtained in the controllability and
observability studies. The algorithm development and evaluation is planned to be finished by the
beginning of 2010. SOCRATES will publish results in project deliverables, conferences, and journals.
Contact
WWW.FP7-SOCRATES.EU