
MME Migration issue

(aka - LTE Rehome issue)

Impact: LTE KPI Degradation


Northern California

June 2012

Copyright 2010 AT&T. All rights Reserved.

MME Migration issue


Investigation was triggered after a 190 basis point degradation in an LTE KPI (HO Success) was identified on 12 May.
A deep dive by the NCAL mRAN team identified that several active eNodeBs had their MME configuration modified during a recent MME migration.
The mRAN team worked with the OEM vendor and MPC to quickly resolve the issue and triage the impact.
A configuration update was made on 7 June, and an ATS ticket for RCA was submitted in parallel to modify the process and prevent recurrence.

Slide 2

DRAFT FOR REVIEW Copyright 2010 AT&T. All rights Reserved.

[Process stages: Identify, Analyze, Resolve, Escalate]

Identify: KPI impacts

Cluster: Zone 11D

KPI degraded severely on May 12, 2012

Chart 1: RRC Success & Accessibility stats degraded
  a. Refer to slide 19 for detailed chart

Chart 2: Mobility degradation
  a. HO Success impacted by 190 basis points
    i. Pink trend line: 99.0% to 97.1%
  b. Refer to slide 18 for detailed chart
  c. Affected KPI: Mobility (HO Success)
    i. HO Preparation degraded; HO Execution remained flat

Investigated KPIs

Checked the areas that degraded most, which pointed to cluster 11D.
Focused on the delta report
  a. 4 biggest movers
    i. CCL00411, CCL02128, CCL05717 and CCL04430
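The 190 basis point figure follows directly from the trend line values (99.0% down to 97.1%), and the "biggest movers" in a delta report are simply the sites with the largest drop. A minimal sketch of both calculations, assuming per-site KPI percentages are available as dictionaries; the site names and per-site values below are hypothetical, only the 99.0/97.1 pair comes from the slide:

```python
def basis_point_change(before_pct, after_pct):
    """Change between two percentages in basis points (1 bp = 0.01 percentage point)."""
    return round((after_pct - before_pct) * 100)


def biggest_movers(before, after, n=4):
    """Return the n sites with the largest KPI drop, worst first."""
    deltas = {
        site: basis_point_change(before[site], after[site])
        for site in before
        if site in after
    }
    return sorted(deltas.items(), key=lambda item: item[1])[:n]


# Cluster-level figures from the slide: pink trend line 99.0% -> 97.1%.
print(basis_point_change(99.0, 97.1))  # -190

# Hypothetical per-site HO Success values, for illustration only.
before = {"SITE_A": 99.2, "SITE_B": 98.9, "SITE_C": 99.0}
after = {"SITE_A": 99.1, "SITE_B": 90.4, "SITE_C": 95.5}
print(biggest_movers(before, after, n=2))
```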

Slide 3


Analyze Issue:
Troubleshooting - New Site Integrations, Zone 11D
Chart 3

Investigated common issue due to hot/live sites
  a. Found: Issues due to TermPoints & relations created by newly integrated sites (ANR issue).
  b. Action: Deleted relations, disabled TermPoints created by these newly integrated sites and blocked ANR.
  c. Conclusion: HO success improved, as shown by the green arrow in Chart 3, but still showed an issue.

Analysis showed that a bigger problem remained hidden

Slide 4


Analyze Issue:
Troubleshooting - MME Configuration

MME discrepancies found in four severely degraded sites.
  a. Fig 1 shows that one of these sites has only 6 MMEs defined; there should be 10 MMEs
    i. Verified all four had the same issue
  b. Fig 2 shows TermPointToMME discrepancies
    i. MME 7 & 8 were missing in all impacted sites
  c. A recent MME activity (LTE Rehome) occurred exactly when the KPI degraded
    i. May 12, 2012 - new MME pool added to these 4 sites incorrectly

Fig 1
Fig 2

Verified results using CTR traces
  a. See next slide for sample of traces
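The discrepancy check described above amounts to comparing the TermPointToMME ids defined on each eNB against the full MME pool. A minimal sketch, assuming the pool has 10 MMEs with ids 1 to 10 (the slide states only the count) and that the per-site definitions have already been exported into a dictionary; the site names and id sets are hypothetical:

```python
# Assumption: the pool of 10 MMEs is identified by TermPointToMME ids 1..10.
EXPECTED_MME_IDS = set(range(1, 11))


def missing_mmes(defined_ids, expected_ids=EXPECTED_MME_IDS):
    """Return the sorted list of expected MME ids that are not defined on an eNB."""
    return sorted(set(expected_ids) - set(defined_ids))


def audit(sites, expected_ids=EXPECTED_MME_IDS):
    """Map each eNB with at least one missing MME to its list of missing ids."""
    report = {}
    for enb, defined_ids in sites.items():
        missing = missing_mmes(defined_ids, expected_ids)
        if missing:
            report[enb] = missing
    return report


# Hypothetical export: one fully configured eNB and one with only 6 MMEs defined.
sites = {
    "ENB_OK": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "ENB_SHORT": {1, 2, 3, 4, 5, 9},
}
print(audit(sites))
```

Running the same comparison across every eNB after a pool change would flag the affected sites before the KPI trend does.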

Slide 5


Analyze Issue:
Troubleshooting - UE Issues (CTR Traces)

MME configuration issues confirmed by CTR traces

As the trace below shows, the UE is looking for MME 6
  a. <0x599> is the hexadecimal form of MME Proxy ID 1433 (decimal)
  b. Proxy ID 1433 corresponds to TermPointToMME id 6 from the previous slide
  c. MME 6 is not defined

As the two traces below show, the UE fails to hand into eNB CCL00411 and the connection is eventually released from the source cell
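The hex-to-decimal step can be checked mechanically. The sketch below converts the trace token and looks up the TermPointToMME id; the only mapping entry is the one stated on the slide (1433 to 6), and the helper names and the set of defined ids are hypothetical:

```python
# From the slide: proxy ID 1433 corresponds to TermPointToMME id 6.
PROXY_ID_TO_TERMPOINT = {1433: 6}


def proxy_id_from_hex(token):
    """Convert a trace token such as '<0x599>' to its decimal proxy ID."""
    return int(token.strip("<>"), 16)


def is_mme_defined(proxy_id, defined_termpoints):
    """True if the TermPointToMME for this proxy ID is defined on the eNB."""
    termpoint = PROXY_ID_TO_TERMPOINT.get(proxy_id)
    return termpoint is not None and termpoint in defined_termpoints


proxy_id = proxy_id_from_hex("<0x599>")
print(proxy_id)  # 1433

# Hypothetical set of TermPointToMME ids defined on the target eNB (id 6 absent).
print(is_mme_defined(proxy_id, {1, 2, 3, 4, 5, 9}))  # False
```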

Slide 6


Resolve

RF Support Team took the following actions to fix the issue
1. Instructed the MPC team to add the missing MMEs to the four affected sites
2. Ericsson was engaged to fix the discrepancies at the sites with MME discrepancies.

Slide 7


Resolve

RF Support Team took the following actions to fix the issue
After the MME fix, Mobility (HO Success) recovered, as shown in Chart 4.
Accessibility started showing signs of recovery, as shown in Chart 5.

Slide 8


Escalate the fix


RF Support Team took the following actions to fix the issue
1. MME discrepancies were escalated to the MPC team for immediate fix.
2. Opened tickets with MPC (NSN042712608.0 & NSN042712610.0) and ATS/MNRC (TT000033428296) for confirmation of findings.

Slide 9


Next Steps
The integration vendor handling eNB integrations must incorporate MME checks as well
  a. Verify that all active MMEs in the pool are defined

When a new MME is added to the pool, all impacted eNBs must be audited/updated
  a. MME rehome: the eNB has a change to the MMEs in the pool (addition or deletion)
  b. Verify that there are no S1 MME SCTP alarms on the eNBs in question

Ensure all MMEs are active:
  a. Recall the NCAL issue: identified 6 active & 4 inactive MMEs
  b. Run a ping trace from all MME SCTP associations at the eNB to the MME to ensure there is connectivity

Requested RCA to explain HO failures
  a. Events are triggered despite RelativeCapacity=0 for the new MMEs
  b. The issue is expected to be associated with the MME mismatch

Slide 10


Summary:
LTE KPI Degradation due to MME Migration
1. Cluster KPI degraded
  a. Deep dive showed a lot of activity in the market during the same period
    i. New site integration in Zone 11D.
    ii. LTE Rehome (new MMEs added to pool)
  b. Narrowed degradation to the 4 sites with most impact
  c. HO Prep stats pointed to MME mismatches.
2. Further investigation was performed on one of the most degraded cells.
  a. A total of 4 sites were identified with MME discrepancies (CCL00411, CCL02128, CCL05717 and CCL04430)
  b. CCL00411 is one of the sites for which Accessibility, RRC Success and HO Success degraded severely (refer to slides 4 and 5); this was identified and escalated by the RF Support Team.
  c. The CTR traces showed that HO Preparation failures occurred on all the sites that were trying to hand in to site CCL00411. These failures occurred because the TermPointToMME entries for two MMEs were not defined correctly.
3. Deep dive concluded that the MME LTE rehome was the root cause of the issue
  a. Opened tickets with MPC for the fix and with ATS for confirmation of findings.
  b. Ticket numbers with MPC are NSN042712608.0 & NSN042712610.0 (June 6, 2012). Ticket number with ATS/MNRC is TT000033428296.
4. Ericsson was engaged to fix the MME issues on CCL00411 and CCL05717.

Slide 11

Appendix

Slide 12


Troubleshoot details
eNodeB to MME configuration issue caused by MME Migration - Details

1) Observation:
A. After the audit we identified 3 sites in the network that had only 6 TermPointToMMEs defined, unlocked and enabled. The rest have 8 TermPointToMMEs defined, unlocked and enabled. 2 of the 3 sites below were launched and on air but were missing TermPointToMMEs. When these sites initially came on air they had only two MMEs defined. MPC has been adding new MMEs for capacity reasons.

B. The three sites below show the same scenario as described above:

eNB        #MME
CCL00411   6
CCL04430   6
CCL05717   6

Slide 13

MME Discrepancies

Slide 14


MME Relative Capacity snapshot

Relative Capacity for in-service MMEs should be set to 64 and Non-service ones to 0
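The rule above lends itself to an automated check against a configuration snapshot. A minimal sketch, assuming the snapshot has been exported as a list of records; the field names (`name`, `in_service`, `relative_capacity`) and the sample rows are hypothetical, only the target values 64 and 0 come from the slide:

```python
IN_SERVICE_CAPACITY = 64      # from the slide: in-service MMEs
OUT_OF_SERVICE_CAPACITY = 0   # from the slide: non-service MMEs


def capacity_violations(mmes):
    """Return the names of MMEs whose relative capacity does not match their service state."""
    violations = []
    for mme in mmes:
        expected = IN_SERVICE_CAPACITY if mme["in_service"] else OUT_OF_SERVICE_CAPACITY
        if mme["relative_capacity"] != expected:
            violations.append(mme["name"])
    return violations


# Hypothetical snapshot rows, for illustration only.
snapshot = [
    {"name": "MME_1", "in_service": True, "relative_capacity": 64},
    {"name": "MME_2", "in_service": False, "relative_capacity": 64},
    {"name": "MME_3", "in_service": False, "relative_capacity": 0},
    {"name": "MME_4", "in_service": True, "relative_capacity": 0},
]
print(capacity_violations(snapshot))  # ['MME_2', 'MME_4']
```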

Slide 15

HO Failures On CCL00411
Impact of the mismatch: HO degradation was observed on the sites around CCL00411, which was found to have an MME discrepancy. This is explained on the next slide.

Slide 16


HO Failures Explained

Impact of the mismatch: As shown on the previous slide, handover failures were seen in the preparation stage into the node that has fewer TermPointToMMEs defined. This in turn brought down the handover success rate, as the KPIs show for site CCL00411 on slide 16 and for Zone 11D on slides 18 and 19 (all three sites belong to this district). HO Prep % into the node with fewer TermPointToMMEs defined is degraded; CCL00411 has only 6 TermPointToMMEs defined.

Whenever a handover preparation failure is seen, the first few checks should be whether the license is applied and enabled, whether the number of allowed users has been exceeded (which will not happen in our network currently), whether all the TermPoints are unlocked and enabled, and whether there is an SCTP imbalance between the nodes. These are normally the reasons for handover preparation failures.
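The first-line checks for a handover preparation failure can be written down as an ordered triage routine. The sketch below is illustrative only: the `EnbStatus` and `TermPoint` structures, their field names and the state strings are assumptions, not an actual node interface, and the status values would have to come from whatever tooling is used to read the eNB.

```python
from dataclasses import dataclass, field


@dataclass
class TermPoint:
    admin_state: str  # assumed values: "UNLOCKED" or "LOCKED"
    oper_state: str   # assumed values: "ENABLED" or "DISABLED"


@dataclass
class EnbStatus:
    license_enabled: bool
    connected_users: int
    max_users: int
    termpoints: list = field(default_factory=list)
    sctp_balanced: bool = True


def ho_prep_triage(status):
    """Run the four checks in the order listed on the slide and return the findings."""
    findings = []
    if not status.license_enabled:
        findings.append("license not applied or not enabled")
    if status.connected_users > status.max_users:
        findings.append("allowed number of users exceeded")
    if any(tp.admin_state != "UNLOCKED" or tp.oper_state != "ENABLED"
           for tp in status.termpoints):
        findings.append("one or more TermPoints locked or disabled")
    if not status.sctp_balanced:
        findings.append("SCTP imbalance between nodes")
    return findings


# Hypothetical status with one locked TermPoint.
status = EnbStatus(
    license_enabled=True,
    connected_users=120,
    max_users=1000,
    termpoints=[TermPoint("UNLOCKED", "ENABLED"), TermPoint("LOCKED", "DISABLED")],
)
print(ho_prep_triage(status))
```

An empty result means none of the usual causes applies, which is the point at which the MME definitions themselves are worth comparing across nodes.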

Slide 17


KPIs show Improvement after MME Fix - HO Success

Slide 18


KPIs show Improvement after MME Fix- Accessibility and RRC success

Slide 19


CTR Traces Explained


The CTR trace printout shows the following error:

(Ft_RRC_CONN_SETUP) /vobs/erbs/node/lm/centralLmU/model/LmCentralLmU_mp750_EXE/src/Ueh NwIfMmeSelectionC.cpp:224 CHECK:!<0x599>! Couldn't select an MME for the


0x599 is 1433 in decimal, which corresponds to TermPointToMME id 6, which is currently missing. See the attached CTR trace file for details.
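When many trace lines need to be scanned, the failing proxy IDs can be pulled out with a regular expression. A minimal sketch, assuming the `CHECK:!<0x...>!` token always has the shape seen in the printout above; the sample line is copied from the slide, truncated exactly as it appears there, and the function name is hypothetical:

```python
import re

# Matches the hex token inside "CHECK:!<0x...>!" as seen in the printout.
CHECK_RE = re.compile(r"CHECK:!<(0x[0-9A-Fa-f]+)>!")


def failed_mme_proxy_ids(lines):
    """Return the decimal proxy IDs from lines reporting an MME selection failure."""
    proxy_ids = []
    for line in lines:
        if "Couldn't select an MME" not in line:
            continue
        match = CHECK_RE.search(line)
        if match:
            proxy_ids.append(int(match.group(1), 16))
    return proxy_ids


sample = [
    "(Ft_RRC_CONN_SETUP) /vobs/erbs/node/lm/centralLmU/model/LmCentralLmU_mp750_EXE/src/Ueh "
    "NwIfMmeSelectionC.cpp:224 CHECK:!<0x599>! Couldn't select an MME for the",
    "unrelated trace line",
]
print(failed_mme_proxy_ids(sample))  # [1433]
```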

Slide 20
