IN BSS
AXE810 & APG40: MAIN CHARACTERISTICS
[Diagram labels: SS7; BYB 501; BSC-TRC: 2 SS7 channels (Ater); BSC-MSC: 4 channels (A)]
Y&E
BSCs AXE810 IN VF SPAIN NETWORK
AXE810 TEST PERIOD
•TEST PERIOD:
• Third phase:
MAIN PROBLEMS FOUND
1. Telnet and FTP (plus TFTP) were used for remote admin access and software
upgrades to the APG40, together with PC-Anywhere.
• The Telnet and FTP services (between the OSS TMOS and the
APG40) were stopped, uninstalled and replaced by Secure
Shell (SSH) using the F-Secure software package.
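A minimal sketch of how the result of this change could be checked from a management host: Telnet and FTP should no longer answer and SSH should. The port numbers are the well-known defaults and are an assumption (the slides do not state which ports were configured); the function names and the host name in the usage note are hypothetical. TFTP runs over UDP and is not covered by this TCP probe.

```python
import socket

# Assumed well-known default TCP ports; not taken from the slides.
LEGACY_PORTS = {"ftp": 21, "telnet": 23}
SECURE_PORTS = {"ssh": 22}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(host):
    """Probe every port of interest on one host and return {service: is_open}."""
    ports = {**LEGACY_PORTS, **SECURE_PORTS}
    return {name: is_port_open(host, port) for name, port in ports.items()}


def audit(port_states):
    """List findings that contradict an SSH-only access policy."""
    findings = []
    for name in LEGACY_PORTS:
        if port_states.get(name):
            findings.append(f"{name} is still reachable")
    for name in SECURE_PORTS:
        if not port_states.get(name):
            findings.append(f"{name} is not reachable")
    return findings
```

For example, `audit(scan("apg40-node-a.example"))` (a placeholder host name) returns an empty list when only SSH answers.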
MAIN PROBLEMS FOUND: SECURITY
MAIN PROBLEMS FOUND: FIREWALL HW
[Diagram labels: VLAN BSC; unsecured side; secured side; FW; Ethernet links to APG40 Node A and APG40 Node B; BSC AXE810]
WHY ARE WE SO CONSCIOUS ABOUT SECURITY?
QUESTIONS
• How could an operator deal with a DoS attack on the HLR that blocks
billing?
• Why doesn’t Ericsson use a proprietary system (as in IOG20) or
MIPS RISC microprocessors and the NonStop-UX OS (as in APG30)?
AXE 810 - APG40 IN HLR
WHY APG40 IN HLR?
APG PROGRAM IN VF Sp
Usual Dates:
• Validation Test : 03/02/03 – 21/02/03 (3 weeks)
• Acceptance Test : 27/02/03 – 12/03/03 (2 weeks)
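The durations in parentheses can be cross-checked against the dates. This is a small sketch that assumes the dates are written DD/MM/YY (day first, year 2003); the function names are illustrative only.

```python
from datetime import datetime


def parse(text):
    """Parse a DD/MM/YY date (assumed ordering)."""
    return datetime.strptime(text, "%d/%m/%y").date()


def inclusive_days(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (parse(end) - parse(start)).days + 1


print(inclusive_days("03/02/03", "21/02/03"))  # validation test
print(inclusive_days("27/02/03", "12/03/03"))  # acceptance test
```

Under that assumption the validation test spans 19 calendar days and the acceptance test 14, consistent with the stated 3 and 2 weeks.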
PROBLEMS SUMMARY
VALIDATION TEST:
16 TROUBLE REPORTS:
3 with critical service impact
1 with OAM critical impact
1 with security impact
ACCEPTANCE TEST:
52 TROUBLE REPORTS:
17 with critical service impact
10 with OAM critical impact
1 with security impact
3 with statistic impact
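The itemized categories do not add up to the totals, so part of each total is not broken down on this slide. A short tally makes the remainder explicit; it assumes the listed categories do not overlap, which the slide does not state, and the dictionary keys are shorthand labels rather than Ericsson or Vodafone terms.

```python
# Figures copied from the slide; keys are shorthand labels (assumption).
REPORTS = {
    "validation": {"total": 16, "service": 3, "oam": 1, "security": 1},
    "acceptance": {"total": 52, "service": 17, "oam": 10, "security": 1, "statistics": 3},
}


def summarize(phase):
    """Return (itemized count, remainder not itemized on the slide)."""
    counts = REPORTS[phase]
    itemized = sum(value for key, value in counts.items() if key != "total")
    return itemized, counts["total"] - itemized


for phase in REPORTS:
    itemized, other = summarize(phase)
    print(f"{phase}: {itemized} itemized, {other} other")
```

This gives 5 itemized and 11 other reports for the validation test, and 31 itemized and 21 other for the acceptance test.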
MAIN PROBLEMS I
Critical problems in APG40 without solution.
– APG out of service for more than 24 hours. Unknown reason. Ericsson’s
solution was to initialize the APG.
– Data corruption on one shared disk of the APG. Unknown reason. Ericsson’s solution
was to initialize the APG.
o Ericsson’s answer: faulty APG in the test environment
Instability problems.
– APG in state “unknown”. Manual intervention was required. The APG entered this
state after a test or suddenly, for no apparent reason.
– Communication lost between APG and CP for an unknown reason.
– Instability in the CP (error interrupt) due to a problem in the APG.
– Loss of service on one of the sides after switching the APG off and on.
o Ericsson’s solutions were patches, manual restarts or replacing the APG
Statistics / maintenance problems
– Frequent loss of connection with TMOS
– Incorrect time in statistics data received in TMOS
– TMOS doesn't receive the statistics from the HLR even though the connection stays up.
MAIN PROBLEMS II
Documentation Problems:
– Recovery procedures for critical situations within the APG40 available in ALEX were
faulty or incomplete.
– Many alarms regarding the APG40 were not found in ALEX.
– Most OPIs referenced by APG alarms end with the phrase “Contact next level of
support” and give no other solution.
CONCERNS:
Today:
The procedure to update Windows NT (v4.0) with urgent packages is not
clearly defined by Ericsson.
The main software of the antivirus will not be updated until R11.
It is not possible to choose the kind of antivirus in the APG.
Ntbackup has no response and activity check.
Future:
APG in MSCs with billing blocks is open to DoS attacks.
A Windows-based product means instability.
Until R12 the operator needs a hardware firewall per APG.
NEXT STEPS
Resume the validation test for the APG in HLRs after 6 months
Backup
Problem description | Impact | Comments
There is no switch-over (active-passive) if we turn off the Active node. The Passive node changes state to "unknown". There is no control of the APG or the HLR. | Critical | Solved with AC-A4 to 7
There is no switch-over (active-passive) if the Ethernet cable for external communications fails on the Active node of the APG. In this situation there is only communication with the Passive node of the APG, so we cannot reach the CP. | Critical | Solved with AC-A4 to 7
Data lost after reload during the execution of the test: Command Log execution after reload fails due to incorrect definition of the Command Log file. | Critical | The Command Log file was badly defined.
Disk corruption in the test bed. Unknown reason. | Critical | While testing, we got disk corruption. We had to stop testing for almost a week while Ericsson specialists solved the problem.
APG out of service for more than 24 hours. Unknown reason. | Critical | While testing, we found the APG out of service. We had to stop testing again.
Vodafone APG documentation (ALEX) is not updated. | Documentation | OPI revisions (Vodafone doc. vs. type acceptance doc.): “AP, System Restore, Initiate” Rev. H vs. Rev. M; “AP, System Backup and Verify, Initiate” Rev. J vs. Rev. M; “AP, System Data Disk Restore” Rev. B vs. Rev. F; “AP, System Disaster Recovery” Rev. E vs. Rev. L; “APG40, Node Change” Rev. K vs. Rev. T; “Central Processor Store, Size Change” Rev. D vs. Rev. E; “Command Log, Activate” Rev. A vs. Rev. C; “Command Log, Initiate” Rev. A vs. Rev. B
Incorrect OPI “AP, SYSTEM DISASTER RECOVERY” (REV. L) | Documentation | Bad steps: 89, 102, 146, 195.
Incorrect OPI “AP, SYSTEM DATA DISK RESTORE” (REV. F) | Documentation | Bad steps: 81, 90
Incomplete OPI “AP FAULT” | Documentation | After turning off the APG, we get the alarm "AP Fault" with cause "Node is down". We follow the OPI and the result is "Contact next level of support". There is no "power check" step in the OPI.
APG “AM_LOG_EVENTLOG_TYPE” and CP “AP FAULT. GENERAL ERROR” alarm not specified in Ericsson documentation | Documentation | Alarm appears while doing tests in the test room. Ericsson support people don't know anything about it (cause, solution).
APG “fcc_save_to remove. EVENTLOG_ERROR_TYPE_INTERNAL_DESCRIPTION_FOR_MAINTENANCE_PURPOSES” and CP “AP FAULT. GENERAL ERROR” alarm not specified in Ericsson documentation | Documentation | Alarm appears while doing tests in the test room. Ericsson support people don't know anything about it (cause, solution).
“RDT_SERVICE. PROCESS DEATH” alarm not specified in Ericsson documentation | Documentation | Alarm appears while doing tests in the test room. Ericsson support people don't know anything about it (cause, solution).
Problems verifying tape backup execution if you execute it from the command line | Documentation | If you want to monitor the progress of a backup to tape, you can't start it from the command line. You must use the graphical tool instead.
An administrator profile is necessary to execute backups | Documentation |
In the "NETWORK SURVEILLANCE" functionality, the parameter “ACS_NSF_ROUTERREPONSE” doesn't work properly | Documentation | This parameter specifies the time that the APG should wait before switch-over in case of network connectivity problems. It does not work properly.
“AP REBOOT” alarm with wrong cause | Documentation | After an APG reboot caused by the Network Surveillance functionality, we get an "AP Reboot" alarm with cause "Command Initiate". This is not correct.
Problem description | Impact
Fault code 6 in SYBUP execution. It's impossible to do backups. | Critical
Data lost after reload during the execution of the test: Command Log execution after reload fails. | Critical
Communication lost between APG and CP without reason. Communication with the HLR is impossible in MML mode, only in CPT mode. | Critical
APG Node B goes down without reason. A local reset is necessary to recover it. | Critical
APG Node A goes down without reason. A local reset is necessary to recover it. | Critical
I-Module: wrong definition of APG files. Some of the files weren't included in the I-Module (Statistics, etc.). | Critical
Instability in the CP due to a problem in the APG. | Critical
One APG side goes to "Unknown" state after a restart, without reason. Manual work is needed to recover the APG. | Critical
Some instabilities in the APG without reason. | Critical
Loss of service on one of the APG sides after switching both sides of the APG off and on. | Critical
IPU (CP) boards with version ROJ 212 238/2 R1B have a design problem. Remove them and replace them with new IPU boards. | Critical
Wrong wiring of the synchronism connections. | Critical
Clocks out of range in the HLR. | Major
TMOS doesn't receive the statistics from the HLR. | Critical
Incorrect time in statistics data received in TMOS. | Critical
Wrong distribution in the disk L size. There isn't enough space in the HLR for 3 backups. | Major
The file HPSDFOAFILE is incorrectly defined. | Critical for Provisioning and O&M
EHIP stops without reason. It's impossible to open the Command Handling from TMOS. | Major
The "mml" command (used to access the CP), entered from the CHA interface, blocks the connection. | Major
The APG software module necessary to define the Alarms panel isn't included in the APG. | Critical
The FTP functionality is incorrectly implemented. The "FTP Area" isn't clearly defined. | Major
The HLR has lost the connection with TMOS because the "ossuser" password has expired. | Major
"AP FAULT" alarm with unknown cause. | Major
"AP SYSTEM ANALYSIS" alarm with unknown cause. ALEX doesn't indicate the cause or the recovery procedure. | Documentation
"AP DIAGNOSTIC FAULT" alarm with unknown cause. ALEX doesn't indicate the cause or the recovery procedure. | Documentation
Ericsson hasn't provided the command file to recover the APG after a disaster fault in HW and SW. | Documentation
SAACTIONS inconsistent. The system indicates that we must increase a SAE to a value lower than the present value. | Critical for O&M
The parameter TSMO-0 appears in the subscribers with NAM=0 (GPRS subscribers). | Critical for Provisioning
Problems with the HGPFI command. | Major
Incorrect definition of the procedure to update the antivirus. | Critical for O&M
Manual work is necessary to delete the Command Log files. | Major
AD-0 and AD-4 are routed to a file where all the printouts are written. There is a risk of filling the disks. | Major
Problems with the antivirus configuration. When we load an APG software correction, the antivirus and the update procedure lose their configuration. | Major
Problems verifying a tape backup execution if you execute it from the command line. | Critical for O&M
The OPI to solve the alarm "INTELLIGENT NETWORKS MANAGEMENT INTERFACE FILE FAULT" forces you to contact the next support level. | Critical for O&M
Billing alarms in the HLR. | Major
No Alarm Panel defined in the I-Module. | Major
Wrong execution of password change with the APG command "NET USER". It's mandatory to do it from the graphical interface. | Major