Core Server Redundancy 2.6 - Installation Manual PDF

NICE Recording (CyberTech)
Core Server Redundancy

INSTALLATION MANUAL
Version: 2.6
Date: 11 October 2012
Copyright 2012 by NICE systems Ltd All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without prior written consent of NICE systems Ltd. Disclaimer To the best of our knowledge, the information contained in this document is the most accurate available at the time of publication. Whilst every care is taken to ensure that the information in this document is correct, no liability can be accepted by NICE Systems Ltd. for loss, damage or injury caused by any errors in, or omissions from, the information given. Trademark Acknowledgements Microsoft, Windows, Windows Server, and Internet Explorer are trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a trademark of Sun Microsystems, Inc.
Contents
1 Introduction
2 Requirements checklist
  2.1 Replay to Handset and CSR
3 Installing redundancy components
  3.1 Step 1 Install Core Server Redundancy on Core Servers
  3.2 Step 2 Install CSR Components on Satellites and CTI Servers
  3.3 Step 3 Verify successful installation
4 Configuring Core Server Redundancy
  4.1 Defining core servers
    4.1.1 Specifying location for large bin log files
  4.2 Adding a core server resilience group
  4.3 Setting caching periods and timeouts
  4.4 Checking core server status
    4.4.1 Resilience host status
  4.5 Forcing failover
  4.6 Core server fail-back (using web interface)
  4.7 Core server fail-back (using replication utility)
    4.7.1 Restoring database manually (optional)
    4.7.2 Re-enable replication
5 Copying very large database files
  5.1 Copying from active core to standby core (not failed-over)
6 Upgrading a CSR system
  6.1 Upgrading CSR 2.5.2 or lower to CSR 2.6
  6.2 Upgrading CT6 feature pack with CSR
  6.3 Upgrading CT5 feature packs with CSR
  6.4 Upgrading Active CTI integration with CSR
Appendix A Core redundancy alarms
Appendix B New replication password
Appendix C Core Services not to be monitored
Appendix D Replication Troubleshooting
Appendix E Network requirements
Appendix F Resilience Add-On
Appendix G Changes replication utility
Appendix H CSR Alarm Profile
1 Introduction
For CyberTech recording systems R5 and R6 you can install an add-on package for active/standby core server redundancy. This is only available for core servers. A core server is a recorder server with no recording channels. Core Server Redundancy (CSR) uses MySQL replication to replicate the database of an active core server to a standby core server. It monitors the core server status and switches to the standby core server if the active core server fails.
Satellites and CTI servers automatically fail over to the standby core server. For CTI servers to support core failover, the CTI integration version must support active/standby core server redundancy. See the Requirements checklist chapter for version details.
There are 2 triggers for the active core to fail over:
- Losing the keep-alive message sent between cores. If the keep-alive is lost for a configurable time (default 5 minutes), the cores fail over. This happens in case of a power failure or a network issue.
- Alarm 3006 on the active core: service stopped unexpectedly. This alarm triggers failover after 5 minutes (default). It is defined in the alarm profile used for failover, named Core Server Resilience Errors.
This manual includes screenshots from the R6 recording system. Although skinning for R5 looks different, the core server redundancy pages and configuration are identical for both the R5 and R6.
2 Requirements checklist
Core Server Redundancy (CSR) requires:
- A fully functioning recorder system, comprising a core server with satellites (and CTI servers if needed), meeting either of the following requirements:
  o R5.4.2 or higher installed on core server and satellites. Core Server Redundancy is NOT compatible with Screen Recording.
  o R6.0.3 or higher installed on core server and satellites
- The core server MUST be core only:
  o No recording channels in the server
  o No CTI or CDR integration in the server
  o R5.4.2 or R6.0.3 or higher core server installation
- A license for Core Server Redundancy (article code C35919)
- For supported CTI Server versions, contact support
- A standby core server with identical recorder version, patch level and add-on packages. The installation path and log file path must be identical. You don't need to configure GUI settings on this core server, as the settings replicate from the first core server when core server redundancy is installed.
- Core and standby servers must have valid IP addresses
- All redundancy components linked to the core servers must be version 2.5.0.117 or higher. Older redundancy kits will work, but will not support failure combinations (such as Core and CTI Server failing). Other redundancy components might include:
  o N+1 Satellite resilience
  o CTI Server A/S resilience, Active Satellite resilience

BOTH core servers require:
- Older versions of Core Server Redundancy uninstalled
- CyberTech recording and licensing services disabled (CSR does NOT support recording on the core server)
- Time synchronized to an external time source
- At least 500 GB space in their MySQL data partitions
- Windows WMI service enabled and running
- If running Windows 2008 R2, Service Pack 1 installed
- Archiving to a central location. Both cores must be linked and have read/write access to the same archive location to replay archived calls. The archive location could be EMC, a drive share or NAS. (Delete access is required if you have retention calls that must be deleted from a drive share or NAS.)
- The following ports open in firewalls in addition to the standard recorder ports:
  o Port 4251: the core server link failover trigger. Must be open on all CTI Servers and Satellites.
  o Port 4252/3306: the core server agent keep-alive and replication ports. Must be open on each core server.
  o For standard recorder ports, see the OS Hardening manual
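The firewall requirements above can be spot-checked before installation. The following is a minimal Python sketch, not part of the CSR kit: the function names and the host address in the usage note are hypothetical, and the role labels are only a reading of the checklist. A successful TCP connection shows that something is listening and that no firewall blocks the port from the machine running the check. It does not prove that the CSR service itself is healthy.

```python
import socket

# Ports from the requirements checklist. The labels are descriptive only
# and are an interpretation of the checklist, not CSR configuration keys.
CSR_PORTS = {
    4251: "core server link failover trigger (satellites and CTI servers)",
    4252: "core server agent keep-alive (core servers)",
    3306: "MySQL replication (core servers)",
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_csr_ports(host: str, ports=CSR_PORTS) -> dict:
    """Map each port number to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, running check_csr_ports("192.0.2.10") from a satellite (the address is a placeholder) returns a dictionary keyed by port number. Run the check after the CSR components are installed, because a port with no listening service is reported as closed even when the firewall allows it.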
2.1 Replay to Handset and CSR

If you install Replay to Handset 2.5.1 in combination with Core Server Redundancy, check the INI_FILE_LOCATION value in the registry:
64 bit: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\CyberTech
32 bit: HKEY_LOCAL_MACHINE\SOFTWARE\CyberTech
Replay to Handset might have changed it to an incorrect location. The replication tool uses this value to locate the MySQL my.ini file. Replication setup will not complete properly if the my.ini file can't be found.
3 Installing redundancy components

To install core server redundancy, follow these steps:
3.1 Step 1 Install Core Server Redundancy on Core Servers

1. Make sure both core servers fulfil the requirements listed in the requirements checklist, see previous chapter.
2. Use the monitor tool to stop all CyberTech services. This must be done on both core servers, and preferably on all satellites and CTI servers.
3. On the 1st core server, run the CoreServerRedundancySetup installer. This system becomes the active (master) core server.
4. Select the Master Install option, and then click Next.
5. The software is ready to install. Click Install.
6. After the software is installed, the Replication Utility starts.
7. The Replication Utility verifies that the start-up state of the Recorder services is configured correctly. If not configured correctly, a pop-up shows which services must be changed. Click Yes to change the configuration automatically.
8. In the Replication Utility:
   - Enter service account user name and password
   - Enter the standby (slave) core server IP address
   - Create a password for the replication user. Remember this password; you need it for the replication setup on the standby core server.
9. The option Copy local DB to slave DB is checked by default. Be aware that copying a big database over the network might take several hours (Calculation in appendix E). If you prefer not to copy the database, uncheck this option. Instead, click Export DB to file and remember to move this file to the standby (slave) core server.
10. Click Start and wait until the operation completes. This might take several hours (Calculation in appendix E) if the database is big. When finished, close the utility. The installer continues after the utility is closed.
11. The Replication Utility checks if a Core Server Resilience group is configured in the web GUI. If no group is present, the Replication Utility automatically creates one, and adds the Core Servers to the group, provided there's a license for CSR. (You can configure this group later, see chapter 4, Configuring Core Server Redundancy.)
12. Run the WMI fix distributed as part of the CSR installation kit, in the folder Windows 2008 R2 SP1, Windows 2003 - WMI Fix, filename: csr_addwmidependency.reg
    This WMI fix makes CSR services dependent on the Microsoft WMI service. If the WMI fix is not installed, the CSR 2.6 services might not start properly after a restart of the core server, causing Core Server Redundancy not to work properly.
13. On the standby (slave) core server, run CoreServerRedundancySetup. Select Slave Install, and then click Next.
14. After the software is installed, the Replication Utility starts:
    - Enter service account user name and password. (Similar to the master service account)
    - Enter the active (master) core server IP address
    - Enter the password for the replication user as defined at the active (master) core server.
15. Click Start. It might take a few minutes to configure standby (slave) replication.
16. Wait until the Replication Utility is ready. If an error occurs, click Start again. This operation fails if the active (master) core server can't be reached.
17. When finished, close the Replication Utility.
18. Run the WMI fix distributed as part of the CSR installation kit, in the folder Windows 2008 R2 SP1, Windows 2003 - WMI Fix, filename: csr_addwmidependency.reg
19. Check all CyberTech services on the active core server. They should be running.
20. Check all CyberTech services on the standby core server. They should be stopped.
21. Log on to the standby core system and open the web GUI. This should redirect to the active core server.
3.2 Step 2 Install CSR Components on Satellites and CTI Servers

Servers connecting to core servers (such as Recording satellites, CTI servers, CDR servers) must know which core server is operational, and be redirected on failover / failback. If the active core server fails, the standby core becomes operational. This redirection is handled by the core server redundancy component installed on satellites and CTI servers.
1. Run CoreServerRedundancySetup.exe on all satellites, CTI servers, CDR servers. Click Next to proceed.
2. The components are now ready to install. Click Next.
3. After installing components, the connection setup utility starts.
4. Specify host address of active (master) and standby (slave) core servers and recorder account credentials. To test these credentials click Apply. To save, click Close.
5. If the resilience group for Core Server Redundancy has already been set up, the tool offers to add the host address to the list of Core Link Failover Clients as described in the configuration chapter. You only need to do this if this server isn't a satellite (satellites are always notified about core failover). If the server is a CDR server, add it.
6. Click Apply to test the settings, or Close to apply the settings and close the tool.
7. The tool installs a desktop shortcut.
3.3 Step 3 Verify successful installation

1. Log into the web GUI, look at the resilience hosts page (under system installation) and verify that the standby core server status is Replicating and the active status is Live. This means that the database is replicating properly from active to standby and the system is operating correctly.
2. On the alarms page, check if there are any unattended alarms for core server resilience (range 11050 to 11064). If there are configuration problems with CSR, alarm 11058 is raised and contains the reason why the configuration is incorrect.
3. Enable failover by making sure that failover is set to Immediate.
4. Test forced failover and failback. See following chapter.
4 Configuring Core Server Redundancy

When the core server fails over, users are redirected to the standby core, as long as the active core web server is accessible. (When using HTTPS security, redirection might fail due to certificate differences between core servers. Then users must manually access the standby core.) Screenshots in this chapter are from the R6 recording system. For R5.4.2 core servers, configuration parameters are similar although the GUI skin differs.
4.1 Defining core servers

Log on to the active core server web GUI. The menu system installation>core servers shows both core servers. You can edit the details of the second core server if necessary. The core servers are added automatically by the replication utility on initial setup.
The roles of the core servers are fixed. The core server with ID 1 has a fixed active role. The core server with ID 2 is the fixed standby.
4.1.1 Specifying location for large bin log files
To prevent a core server from running out of hard disk space if a system generates large bin log files, you can specify a location as follows:
1. In the CyberTech .ini files location, edit the log-bin line in my.ini to specify a location on the MySQL data partition:
   Default: log-bin=mysql-bin
   Example: log-bin=E:\\mysql-bin\\mysql-bin
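When several core servers need the same change, the edit to the log-bin line can be scripted. The following Python sketch is an illustration only: the function name set_log_bin is hypothetical, and it assumes my.ini already contains a log-bin line, as the default shown above suggests. It works on the file contents as text, so reading and writing the file (and taking a backup first) is left to the caller.

```python
import re

# Matches an existing "log-bin=..." line, with optional spaces around "=".
# [^\r\n]* keeps Windows line endings (\r\n) out of the match.
_LOG_BIN_LINE = re.compile(r"^[ \t]*log-bin[ \t]*=[^\r\n]*", re.MULTILINE)


def set_log_bin(my_ini_text: str, log_bin_value: str) -> str:
    """Return my.ini text with the first log-bin line set to log_bin_value.

    Raises ValueError if there is no log-bin line, because this sketch does
    not try to guess in which section a new line would belong.
    """
    if not _LOG_BIN_LINE.search(my_ini_text):
        raise ValueError("no log-bin line found in my.ini text")
    new_line = "log-bin=" + log_bin_value
    # A function replacement avoids backslash processing of the new value.
    return _LOG_BIN_LINE.sub(lambda match: new_line, my_ini_text, count=1)
```

Pass the value exactly as it should appear in the file, including the doubled backslashes used in the example above.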
4.2 Adding a core server resilience group

A resilience group is a list of servers that have an active or standby role. Core servers are automatically added to the CSR resilience group. If you need to add a core server manually, follow these steps:
1. Click system installation->resilience groups. Add a resilience group for Core Server Redundancy.
2. Enter a Group Name and select the Error Profile Core Server Resilience Errors, which is a default alarm profile available in the GUI. This error profile includes only alarm 3006 (service stopped unexpectedly).
3. Click on the resilience group to edit the configuration.
4. Check if the list Core link failover clients is complete. Items appear when you install CoreServerRedundancySetup on satellites and CTI servers, and include their IP addresses.
5. If you want to automatically export the database from the standby core server back to the active core server when failing back, make sure Auto-Export Database on Fail-Back is checked.
6. Only change caching periods and timeouts if you are familiar with how they work. See the following section.
4.3 Setting caching periods and timeouts

The default call record caching period on the satellites is half an hour (1800 s). Leave this value unless you are experienced with configuring the archiving interval / cache period. All recorded calls are kept on the linked satellites for this interval, even when uploaded to the active core server. If the active core server fails, the satellites link to the standby core server and send their audio buffer. This ensures that audio which was not yet archived is not lost if the core fails.
The Resilience Service Keep-alive Timeout is set to 5 minutes. Failover is therefore triggered if keep-alives are lost for 5 minutes, or if an alarm that is present in the alarm profile 'Core Server Resilience Errors' is raised and stays unattended for 5 minutes. The error profile contains one error: 3006, unexpected service stop.
The 5 minute timeout allows a core server restart. If operating system updates are installed on the active core server, the server might need a planned restart. This restart will not trigger a core server failover, as long as the core server is back within 5 minutes. The keep-alive interval can be extended to 10 minutes if required. If you prefer to fail over the core server after a core server restart, you can reduce the timer, for example to 30 seconds.
Be aware that core server outage does not affect recording. Satellite and CTI operations continue when the core server is stopped. Core server outage affects replay and archiving, as these functions are handled by the core server.
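The effect of the timeout on the two failover triggers can be illustrated with a small Python sketch. This is a model of the behaviour described above, not the resilience service's actual code. The function name, the parameter names, and the choice to treat an outage of exactly the timeout length as a failover are assumptions.

```python
from datetime import datetime, timedelta

# Default Resilience Service Keep-alive Timeout from this section.
DEFAULT_TIMEOUT = timedelta(minutes=5)


def failover_due(now, last_keepalive, unattended_alarm_since=None,
                 timeout=DEFAULT_TIMEOUT):
    """Return True if either failover trigger has been active for `timeout`.

    now                    -- current time
    last_keepalive         -- time the last keep-alive was received
    unattended_alarm_since -- time an alarm from the failover alarm profile
                              was raised and left unattended, or None
    """
    keepalive_lost = (now - last_keepalive) >= timeout
    alarm_expired = (unattended_alarm_since is not None
                     and (now - unattended_alarm_since) >= timeout)
    return keepalive_lost or alarm_expired
```

With the default timeout, a planned restart that interrupts keep-alives for 3 minutes does not cause failover, while the same restart with the timer reduced to 30 seconds does.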
4.4 Checking core server status

To check the status of active and standby core servers:
1. Click system installation > resilience hosts.
2. Status lights show if the heartbeat is up to date. The heartbeat is refreshed every 15 seconds. When you see a green light for both core servers, the system is ready for use. This can take up to a minute.
A red light indicates a problem with the configuration (for example, only one core server in a resilience group). Click for details Configuration Error Information.
3. The active core server status becomes Live, which means that the resilience sub-system is live. The standby core server status changes to Replicating, which means that the resilience sub-system has detected that the standby core server's database is replicating as expected.
4. Set the failover state to Immediate for both core servers.
5. Now core server redundancy with active and standby cores is operational. You are advised to test a forced failover and fail-back scenario to ensure all configurations are correct.
4.4.1 Resilience host status
Live: The core server is online. This is the normal status for the active core server. The standby core server gets this status the moment a successful failover occurs.
Replicating: Database replication is running ok. This is the normal status for the standby core server, as long as there is no failover. The active core server never gets this status.
Failed over: The active core server has failed over. The standby core server should now have the Live status. The standby core server never gets the status Failed over.
Inoperative: The standby core server is not reachable. Reasons can be:
- Network link loss
- Redundancy services not running on standby core
- Database stopped on standby core
All database actions on the active core server are still logged. The database might or might not be replicating: if stopped, replication is automatically restored when the issue causing replication to fail has been resolved.
Not replicating: The standby core server is reachable, but the database is not replicating. This might be caused by a database issue on the standby core server. Alarm 11060 is raised. Check the alarm appendix for details.
Recommissioning: Core server fail-back is in progress. Depending on the database size, this might take several hours (Calculation in appendix E). When fail-back has completed, the active core server gets status Live and the standby core server Replicating.
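The status combinations described above can be summarised in a short Python sketch, for example for a monitoring script that reads the two statuses from the resilience hosts page. The function name and the returned labels are hypothetical. Because this section does not state which server shows Recommissioning during fail-back, the sketch accepts it on either one.

```python
def classify_pair(active_status: str, standby_status: str) -> str:
    """Classify the status pair of the active and standby core servers."""
    if active_status == "Live" and standby_status == "Replicating":
        # Normal operation: failover is available.
        return "normal"
    if active_status == "Failed over" and standby_status == "Live":
        # The standby core has taken over; a fail-back is needed.
        return "failed over"
    if "Recommissioning" in (active_status, standby_status):
        return "fail-back in progress"
    # Inoperative, Not replicating, or any unexpected combination.
    return "needs attention"
```

Any pair that includes Inoperative or Not replicating falls into the last category, which is the cue to check the alarms page.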
4.5 Forcing failover

To force failover to the standby core server:
1. In system installation>resilience hosts, select the active core server.
2. Click Force Failover.
The user interface on the active core server stops responding, because the IIS service is shut down. You might see the following message:
3. If necessary, click one of the tabs to force the browser to load a new page. The browser should now redirect you to the standby core server.
4. Log in to the standby core server.
5. The system has now failed over. Satellites (and CTI servers etc) automatically connect to the standby core server. No new core server failover is possible now. The core server must first be recommissioned; only then is core server failover available again.
Note: If you do a DNS update, for example after failover, be sure to restart the resilience services on both cores.
4.6 Core server fail-back (using web interface)

Be aware that recommissioning can only be done if the active core server is up and running. After failover, you must first resolve the issue that caused the active core server to fail.
While failed over, further failover is not possible and the database is no longer replicating. You must manually fail back to restore the failover capability. Fail-back is not automatically triggered.
1. Click system installation>resilience groups.
2. Check option Auto-Export database on Fail-Back.
3. Click system installation > resilience hosts. You see the status of the active and standby core servers.
4. Click the failed-over core server, and click Recommission.
5. The user interface stops responding, since the web services are restarting on the standby core server. After a time you see the following message when you refresh the page:
6. Be aware that copying a huge database over the network might take several hours (Calculation in appendix E). A timer shows the fail-back time remaining.
7. Log into the GUI on the active server. It might take some time for the GUI to become active, because the database rollback must be completed first.
8. Click system installation > resilience hosts and verify the status of the active core server. If fail-back is complete, the status of the active core server is Live and the standby core server Replicating.
9. If the statuses are as described above, the operation mode of the active core server is restored, and the standby core server is available for failover again.
10. If for some reason the automatic fail-back is cancelled, for example by a core server restart, the fail-back should abort, and the standby core reverts to being the active core server. A second attempt to recommission can be made via the web interface on the standby core server.
4.7 Core server fail-back (using replication utility)

Be aware that recommissioning can only be done if the active core server is up. After failover, you must first resolve the issue causing the failure on the active core server. In failover status, no core server redundancy functionality is available. Fail-back needs to be performed to restore core server redundancy functionality. Fail-back always needs a manual activation; it is not automatically triggered.
1. If you don't want to copy the database over the network between the core servers, uncheck the option Auto-Export database on Fail-Back on system installation->resilience groups. Use the Replication utility for manual database export.
2. Click system installation > resilience hosts. You see the status of the active and standby core servers.
3. Click the Failed over core server, and click Recommission.
4. After about 30 seconds, the user interface stops responding, as the web services are now restarting on the standby core server. You see the message below when refreshing the page:
4.7.1 Restoring database manually (optional)
Optionally you can restore the database manually from the standby core to the active core server.
1. If the database needs to be restored manually, start the Replication utility on the standby core server. (Start menu>Programs>CyberTech>Core Server Redundancy)
2. Enter service account user name and password.
3. Enter the active (master) core server IP address.
4. Enter the password for the replication user as defined on the active core server.
5. Export the database to a file by clicking Export DB to file.
6. Make this database available on the active core.
7. Start the Replication utility on the active (master) core server.
8. Enter the service and replication accounts:
   - Enter service account user name and password.
   - UNCHECK Copy local DB to slave DB.
   - Enter the replication user password.
9. Click Import DB from file and select the database which was exported from the standby (slave) core server. You are advised to ensure that no calls are recorded during this process.
10. Be aware that importing a huge database might take several hours (see calculation in appendix E).
4.7.2 Re-enable replication
To enable replication again from the active (master) to the standby (slave) core server:
1. Start the Replication utility on the standby core server:
2. Enter service account user name and password.
3. Optionally select copy DB.
4. Enter the active (master) core server IP address.
5. Enter the password for the replication user.
6. If the database was not restored manually, select Copy local DB to master DB.
7. Be aware that copying a huge database over the network might take several hours (Calculation in appendix E).
8. Click the Start button to enable replication again. Wait for the action to be completed as shown in the status view. It could take a few minutes.
9. Wait until the replication utility is ready.
10. To complete manual fail-back, activate failover as follows: Log on to the active core server and click system installation > resilience hosts.
11. Select the standby core server and set the Failover option to Immediate.
12. Do the same for the active core server.
13. This completes the manual fail-back procedure. The statuses for the core servers become Live and Replicating. Failover is set to Immediate.
5 Copying very large database files

On systems that have very large databases, the normal method of importing and exporting the database takes a very long time. (This is because indexes are fully re-created on import.) There is an alternative way to copy the database: copy the actual binary data files.
Advantages of copying binary database files:
- It's significantly faster to export the data, since it's just a file copy operation. The speed is limited by the speed of the network, which is usually very fast on big sites.
Disadvantages of copying binary database files:
- The MySQL service must be off. Copying a live database would cause serious data loss.
- You must assume that binary data files are not portable between other versions of MySQL or other operating systems. Core servers must be identical in OS and MySQL version.
5.1 Copying from active core to standby core (not failed-over)

This procedure is for copying the database from an active (master) core server to a standby (slave) core server.
Assumptions:
- The active core server has NOT failed over
- You want to copy the database because replication stopped on the standby core server
- $ network shares (c$, d$, e$ etc) are available between core servers. This is required when copying the database
Steps and results:
1. Disable failover in the Web GUI.
2. Stop all the services using the monitor tool on both core servers.
3. Stop the resilience manager and resilience agent on both core servers.
4. Execute the following command on both core servers to stop the IIS web service:
   iisreset /STOP
5. Log on to the MySQL client on the standby core server with the service account, and execute the MySQL command 'stop slave'.
6. Stop the MySQL service on both active and standby core servers:
   net stop MySQL
7. Locate the MySQL data folder on both active and standby core servers. You can do this by locating the MySQL my.ini file. Its location is pointed to in the registry by the value
   HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\CyberTech\INI_FILE_LOCATION
   After finding my.ini, open it and look for the 'datadir' entry. This defines the data folder:
   datadir=C:/ProgramData/CyberTech/mySQL/Data/
   Find the PARENT of this folder, for example:
   c:\ProgramData\CyberTech\mySQL
   Make a note of the location!
8. On the standby core server: Create a sensibly-named backup folder in this parent folder. (Check there is enough disk space!) For example:
   c:\ProgramData\CyberTech\mySQL\Backup-2012-09-01
9. On the standby core server: Move the following folders into the backup folder. After that, the standby database is backed up.
   - Data
   - InnoDBData
   - InnoDBLog
10. On the standby core server: Open a command prompt and go to this folder.
11. Execute the following command. Note: Specify the path for the parent folder of the MySQL data folder. Make sure the command completes successfully.
   xcopy /e "\\{master core server IP}\c$\ProgramData\CyberTech\mySQL"
12. Edit my.ini on both active and standby core servers, look for the read_only setting, and make sure it says:
   read_only=1
   Note: If the read_only setting is not present in the ini file, add it to the end of the file.
13. Start MySQL on both core servers and double check that it starts correctly:
   net start MySQL
14. Start the replication tool on the standby core server.
15. Fill out the details, and make sure that "Copy local DB to master DB" is NOT checked.
16. Click Start. If everything is normal, replication setup should take a few seconds.
17. Click the Check Replication button to verify that it is working. It should say Slave Status: Slave is replicating, 0 seconds behind. If this message is not displayed, repeat the procedure from step 6.
18. Close the replication tool.
19. Start the resilience manager and resilience agent on the active and standby core servers using the service manager.
20. Run the iisreset command on both active and standby core servers.
21. Edit the my.ini file on the active core server and set the read_only flag to:
   read_only=0
22. Open a MySQL client connection to the active database using the service username and password and issue the following commands:
   SET GLOBAL READ_ONLY='OFF';
   SHOW GLOBAL VARIABLES LIKE 'read_only';
23. You should now see that the read-only state is 'OFF'.
24. Start the recorder services on the master using the monitor tool.
25. Open the recorder GUI in Internet Explorer.
26. Navigate to the System Installation > Resilience Hosts page.
27. The resilience status should be Live and Replicating.
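Steps 7 and 8 of this procedure (find the 'datadir' entry, take its parent folder, and name a dated backup folder inside it) can be sketched in Python as follows. The function names are hypothetical. The sketch works on the text of my.ini, which the caller reads from the location given by INI_FILE_LOCATION, and it only computes paths; it does not create or move anything.

```python
import re
from datetime import date
from pathlib import PureWindowsPath

# Captures the value of the datadir entry; tolerates spaces and \r\n endings.
_DATADIR_LINE = re.compile(
    r"^[ \t]*datadir[ \t]*=[ \t]*([^\r\n]+?)[ \t]*\r?$", re.MULTILINE
)


def mysql_parent_folder(my_ini_text: str) -> PureWindowsPath:
    """Return the parent of the folder named by the datadir entry."""
    match = _DATADIR_LINE.search(my_ini_text)
    if match is None:
        raise ValueError("no datadir entry found in my.ini text")
    # The value may be quoted; forward slashes are accepted as separators.
    data_folder = PureWindowsPath(match.group(1).strip('"'))
    return data_folder.parent


def backup_folder(parent: PureWindowsPath, day: date) -> PureWindowsPath:
    """Return a dated backup folder path such as ...\\Backup-2012-09-01."""
    return parent / ("Backup-" + day.isoformat())
```

For the datadir value shown in step 7, mysql_parent_folder returns C:\ProgramData\CyberTech\mySQL, and backup_folder with the date 1 September 2012 returns the example path from step 8.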
6 Upgrading a CSR system

Do NOT upgrade during business hours, as upgrading affects system operation. When upgrading a core server pair from CT6.0.3 to CT6.1, follow the steps in this chapter. Otherwise the standby core database can become corrupted and you'll need to rebuild the standby core system. When upgrading, first upgrade the CSR version to v2.6. Then upgrade other components such as Recorder and CTI Integrations.
6.1 Upgrading CSR 2.5.2 or lower to CSR 2.6

Follow these steps to upgrade the CSR installation.
1. Ensure both active and standby core servers are healthy and replication is running: In the GUI Redundancy Hosts page, the active state is Live and the standby state is Replicating.
2. Create a database backup on the active core using the database export in the replication utility.
3. Disable CSR failover in the web GUI: On the Resilience Hosts page, set failover for both cores to Disabled.
4. Install the new version of CSR on the active core server.
5. Wait until the installation is finished on the active core server before you proceed to the next step.
6. Install the new version of CSR on the standby core server.
7. On the standby core server, start the Replication Utility (make sure Copy DB to Master is deselected).
   - Click Start and wait for the procedure to complete.
   - Open the Monitor tool and verify that all regular CT services are stopped and set to start type Manual.
8. On each satellite and CTI server in turn, uninstall the existing version of CSR and install the new version. The CSR client configuration tool then runs. You might see a prompt asking whether you want to add the server to the list of hosts notified about failover by CSR; you don't need to do this for satellites or resilient CTI servers. Click Close to apply the settings.
9. Ensure the standby core database is read-only. Using a MySQL command prompt (with super privilege, e.g. the service account) on the standby core server, issue the command:
    SHOW GLOBAL VARIABLES LIKE 'read_only';
   The value should be ON. If it is OFF, issue the command:
    SET GLOBAL read_only=ON;
   Then check the value again to make sure it is now ON.
10. Install the latest resilience add-on on the active core (the standby core database must be read-only for this step, as described in the previous step). Resilience add-on 1.5 is released with CSR 2.6; a later version could be available, as the add-on is used for all redundancy kits.
11. If there are no more upgrades to be made, re-enable failover by setting failover to Immediate in the GUI. Verify that the status of the standby core server is still Replicating.
6.2 Upgrading CT6 feature pack with CSR

To upgrade the version of CT6.0.3 to CT6.1 or higher:
1. Ensure both active and standby core servers are healthy and replication is running: In the GUI Redundancy Hosts page, the active state is Live and the standby state is Replicating.
2. Create a database backup on the active core using the database export in the replication utility.
3. Disable CSR failover in the web GUI: On the Resilience Hosts page, set failover for both cores to Disabled.
4. Wait for 30 seconds to ensure the configuration change is processed.
5. Upgrade the active core server. This action upgrades the following:
   - Database (changes are also migrated to the standby database)
   - Services on the active core
   - Website
6. Wait until the upgrade is finished on the active core server before you proceed to the next step.
7. Ensure the standby core database is read-only. Using a MySQL command prompt (with super privilege, e.g. the service account) on the standby core server, issue the command:
    SHOW GLOBAL VARIABLES LIKE 'read_only';
   The value should be ON. If it is OFF, issue the command:
    SET GLOBAL read_only=ON;
   Then check the value again to make sure it is ON.
8. Upgrade the standby core server. Database changes are ignored, as they have already been replicated from the active core. This action upgrades:
   - Services on the standby core
   - Website
9. Enable CSR failover: Go to the Resilience Hosts page and set Failover for both cores to Immediate.
If the upgrade procedure failed but both the active and standby cores are updated to CT6.1 correctly, follow the procedure described in appendix D to restart replication.
Note: When you upgrade CT6, it starts CyberTech services on the standby core and makes them automatic. You must stop these services on the standby core and make them manual.
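The preconditions in steps 1 and 3 above can be expressed as a simple check that is run before the upgrade is started. This is a sketch only: the function name ready_to_upgrade is hypothetical, and it assumes that the three values have been read manually from the hosts page in the web GUI. CSR itself does not provide this function.

```python
def ready_to_upgrade(active_state: str, standby_state: str, failover_mode: str) -> list:
    """Return a list of problems; an empty list means the checks pass.

    The expected values (Live, Replicating, Disabled) are the ones named
    in the upgrade procedure. Comparison is case-insensitive because the
    manual spells the states with varying capitalisation.
    """
    problems = []
    if active_state.strip().lower() != "live":
        problems.append("active core is not Live")
    if standby_state.strip().lower() != "replicating":
        problems.append("standby core is not Replicating")
    if failover_mode.strip().lower() != "disabled":
        problems.append("failover is not Disabled")
    return problems
```

An empty result means the upgrade of the active core can proceed once the 30-second wait in step 4 has passed.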
6.3 Upgrading CT5 feature packs with CSR

To update a CT5.4 or higher CT5 feature pack, follow the procedure below. Before you start, ensure both active and standby core servers are healthy and replication is running: In the GUI Redundancy Hosts page, the active (master) state is Live and the standby (slave) state is Replicating.
1. Create a database backup on the active core using the database export in the replication utility.
2. Force failover of the active core by selecting it in the list in the resilience hosts page, and clicking Force Failover.
3. If possible, stop the Recorder Services on all satellites. Any calls that accidentally get transferred to the active database may be lost, so it is best to do this out of hours.
4. Using the Monitor tool, verify that all Recorder Services are stopped on the active core server.
5. On the active core server, using the service account, issue the MySQL command:
    SET GLOBAL read_only=OFF;
6. Upgrade CT5.4 on both active and standby core servers.
7. On the active core server, using the service account, issue the MySQL command:
    SET GLOBAL read_only=ON;
8. Open the GUI on the standby core server and recommission the active core server: select the active core server from the list in the resilience hosts page (in system installation) and click the Recommission button. The web GUI should show the blue status bar, with an estimate of how much longer fail-back will take. This may be a few minutes to a number of hours, depending on the size of the database. (The resilience manager log on the standby core server shows progress of the copy operation.)
9. Once the database has been copied, the user interface should redirect to the active core server. Recorder Services will be started on the active core server. Verify that the GUI reports that the standby core server is replicating. (This can be seen in the status field in the resilience hosts page under system installation.)
6.4 Upgrading Active CTI integration with CSR

For Active CTI Connectivity kits released after 11-08-2011 (Cisco 6.0.1 being the first), the upgrade procedure for the Core is identical to that for a CT6 feature pack. In this case, update the Active CTI kit in the Core instead of running a CT feature pack.
For Active CTI Connectivity kits released before 11-08-2011, the upgrade procedure for the Core is identical to that for a CT5 feature pack (even if the Core is based on CT6). In this case, update the Active CTI kit in the Core instead of running a CT feature pack.
Appendix A Core redundancy alarms

A Warning raises an SNMP trap and generates an email. In the Alarm Status page, warnings can be cleared without attending to them first.
An Error raises an SNMP trap and generates an email. In the Alarm Status page, errors must be attended to.
An Alarm raises an SNMP trap and generates an email. Additionally, an audio alarm (beep) is raised and Alarm relay card contacts are closed (if placed in the recorded system). In the Alarm Status page, alarms must be attended to.
A Message is information only. No additional SNMP traps or emails are sent.

11050 Message: Failover attempt started for core server <host> due to error <error>
Description: Core server failover triggered, failover in progress. Successful failover is reported via message 11051. Failed failover triggers alarm 11052.

11051 Message: Failover succeeded for core server <host>. Core server currently activated: <host>
Description: Core server failover successful. Failover completed.

11052 Alarm: Failover attempt failed for core server <host>. Tried to failover to core server <host>. Fail reason <reason>
Description: Core server failover request failed. The reason could be a database communication error.

11053 Message: Failover recommissioning started for core server <host>.
Description: Core server recommission button pressed in web GUI, fail-back procedure has started.

11054 Message: Failover recommissioning succeeded for core server <host>.
Description: Automatic core server fail-back completed. For manual fail-back, the database still needs to be restored and failover needs to be enabled again.

11055 Alarm: Failover recommissioning failed for core server <host>. Reason: <reason>
Description: Execution of the fail-back request via the web GUI failed. Recommissioning is cancelled; retry after the issue causing this failure has been resolved. The reason could be a database access problem, possibly caused by a network link failure between the core servers. Resolve the issue and then re-attempt recommissioning via the web GUI.

11056 Alarm: Resilience Agent Keep-Alive time-out occurred on <host>
Description: The keep-alive timer between the core servers timed out. A core server failover is triggered if the standby core server is online. It is also possible that the standby core database is not reachable; the standby core status then becomes Not Replicating.

11057 Message: The user initiated a forced failover on core server <host>
Description: Manual failover triggered via the web GUI.

11058 Error: The configuration is incorrect, failover will not proceed. Reason <reason>
Description: This error is triggered if core server redundancy is not configured correctly. The check verifies the existence of a resilience group for core server redundancy and checks whether two core server hosts are defined. The check runs every 30s, so an issue is not reported immediately.

11059 Error: A failover occurred but the standby core server <host> failed with error <error>
Description: The standby core server is not operational and the active core server has failed.

11060 Alarm: Database replication stopped unexpectedly on the standby Core Server <host>
Description: Database replication between core servers has stopped. The slave core server status in the web GUI is set to Not replicating. Reasons can be:
- Network link loss
- Redundancy services not running on slave core
- Database stopped on slave core
- Firewall issue blocking replication port 3306
No replication is possible between the active and standby core servers. All database actions on the active core server are still logged, and replication restarts automatically when the issue causing the failure is resolved.
If you are unable to resolve this issue, copy the replication log files located in <database drive>\mysql\data\ (the replication log file name is <hostname>.err). Send these log files to CyberTech support for analysis.

11061 Alarm: The following failover clients (client hosts), are unreachable from host (core host). These clients will not respond if the core fails over. Check the failover clients list in the resilience group configuration
Description: Failover clients such as satellites and CTI servers are not reachable. When core server failover is triggered, it is not possible to notify these clients, so they will not reconnect to the standby core server.

11062 Alarm: Failover attempt aborted for core server <host>. Reason: <reason>
Description: Failover has been aborted due to a replication error. To resolve, disable failover in the web GUI temporarily. Use the Replication utility to manually export the database from the active core server to the standby core server. Then enable failover again in the web GUI. If you still require failover, trigger it manually using the web GUI.

11063 Alarm: Failover has now been disabled. Reason: Replication is behind <seconds>s, (max allowable lag = 1800s)
Description: Failover has been disabled because replication is too far behind. This is to prevent data loss when failing back.

11064 Alarm: An address of a core failover client cannot be resolved. (Address = '<address>')
Description: An address in the field Core link failover clients, in the core server resilience group configuration, cannot be resolved. This can occur if you or the customer made changes to IP addresses or hostnames in CTI/satellite clients.
Action: Restart the resilience services. This forces CSR to resolve the host addresses again, which might resolve the issue. Ensure no host resolve issues still exist on the system. Use nslookup from the Core to any CTI/Satellite clients that fail to link.

11065 Alarm: Core Server: Resilience service stopped unexpectedly: %s
Where %s can be one of the following:
'Resilience Agent on Master Core'
'Resilience Manager on Master Core'
'Resilience Agent on Slave Core'
'Resilience Manager on Slave Core'
Description: Reasons can be:
- Network link loss
- Redundancy services not running on slave core
- Database stopped on slave core
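The lag limit quoted in the text of alarm 11063 can be illustrated with a short check. This is a sketch under stated assumptions, not the CSR implementation: the function name failover_permitted is hypothetical, and whether CSR treats a lag of exactly 1800 seconds as acceptable is not stated in this manual, so the comparison used here is an assumption.

```python
# Limit quoted in the text of alarm 11063 ("max allowable lag = 1800s").
MAX_ALLOWABLE_LAG_SECONDS = 1800


def failover_permitted(lag_seconds: int, max_lag: int = MAX_ALLOWABLE_LAG_SECONDS) -> bool:
    """Return True if replication lag is within the allowable limit.

    Assumption: a lag exactly equal to the limit is still acceptable.
    The manual does not say which side of the boundary CSR uses.
    """
    return lag_seconds <= max_lag
```

The lag value itself is the "seconds behind" figure reported by the Check Replication button in the replication utility.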
Appendix B New replication password

1. First disable failover in the web GUI, to ensure that alarms do not trigger an accidental core server failover.
2. Start the replication utility on the active (master) core server. (This is located in the start menu under programs; CyberTech; Core Server Redundancy.)
3. In the Replication Utility:
   - Enter the service account user name and password
   - UNCHECK the copy DB option!
   - Enter the slave core server IP address
   - Enter the NEW replication user password
6. Click Start to update the replication account on the active database.
7. Start the Replication Utility on the standby (slave) core server:
   - Enter the service account user name and password
   - UNCHECK the copy DB option
   - Enter the active (master) core server IP address
   - Enter the NEW password for the replication user as defined at the active core server
8. Click Start to complete the replication user password change. It might take a few minutes to update the slave password. Wait until the replication utility is ready.
13. Set failover for the active and standby core servers to Immediate to enable failover again.
14. Verify the replication status in the web interface: the status of the active core server should be Live and the status of the standby core server should be Replicating.
Appendix C Core Services not to be monitored

The resilience manager that is part of Core Server Redundancy is responsible for starting and stopping Recorder Services on the core servers. On failover, the resilience manager on the active core server (if it is still responsive) tries to stop the main CyberTech services. On the slave core server, the resilience manager starts the main CyberTech services. On fail-back (after recommissioning), the services stop on the slave and start on the active core. If services don't start or stop properly, the resilience manager waits one minute before giving up.
Three registry values are now used, all located in the registry key:
HKLM\Software\Cybertech\CoreServerResilience (on Windows 2008 see HKLM\Software\Wow6432Node\... )
ServiceNameRegExps: Regular expressions used to match services. If matched, the service is controlled by CSR.
ExcludedServices: Services that should not be controlled by CSR. This overrides the ServiceNameRegExps value.
AdditionalServices: Services controlled by CSR in addition to the names matched by the ServiceNameRegExps. This could be used for controlling other services not known to the CyberTech installation.
Sensible default service names are configured on installation. This is a list of the default values of the services to be controlled in CSR v2.6. Note that services are filtered based on Display Names as shown in the Windows service controller. You must make the same changes on both core servers to keep the lists identical.
ServiceNameRegExps ExcludedServices cybertech .* cybertech cti receiver cybertech licensing service cybertech recording service cybertech dsc service cybertech MaintenanceTool cybertech ParrotDSCAPIDemo cybertech ParrotLT cybertech Programmer CyberTech MAX Content manager CyberTech MAX System Manager CyberTech MAX User Manager
AdditionalServices
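The way the three registry values combine can be sketched as follows. This is an illustration only, not the CSR implementation: the function name controlled_by_csr is hypothetical, and the sketch assumes case-insensitive matching on the display name, that a regular expression must match the whole name, and that an entry in AdditionalServices takes precedence if a name appears in both lists. None of these details are specified in this manual. The example values in the usage note are illustrative and are not the shipped defaults.

```python
import re


def controlled_by_csr(display_name, name_regexps, excluded, additional):
    """Decide whether CSR would start/stop a service, by display name.

    Rules taken from the manual:
      - names matching ServiceNameRegExps are controlled,
      - ExcludedServices overrides ServiceNameRegExps,
      - AdditionalServices are controlled in addition to regex matches.
    Assumptions of this sketch: case-insensitive comparison, full-string
    regex match, AdditionalServices checked before ExcludedServices.
    """
    name = display_name.strip().lower()
    if name in {s.lower() for s in additional}:
        return True
    if name in {s.lower() for s in excluded}:
        return False
    return any(re.fullmatch(pattern, name, re.IGNORECASE) for pattern in name_regexps)
```

For example, with name_regexps set to ["cybertech .*"] and excluded set to ["cybertech MaintenanceTool"], the display name "CyberTech Recording Service" is controlled, while "CyberTech MaintenanceTool" is not.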
Appendix D Replication Troubleshooting

Replication Tool Error: Problem trying to update database with core server configuration
This indicates that the database is read-only. To resolve this, follow these steps:
1. Log on to the active (master) database and type the following command:
    SHOW GLOBAL VARIABLES LIKE 'read_only';
   This should show read_only=OFF.
2. If it is ON, enter the following command:
    SET GLOBAL read_only='OFF';
3. Make sure you can start replicating again by using the replication tool. Use the Check Replication button to check whether your database is replicating.
4. Go to the web GUI and check whether the state is Live -> Replicating. If this is not the case, do a final check on the active (master) database and standby (slave) database.
5. SHOW GLOBAL VARIABLES LIKE 'read_only';
   The active (master) database should be set to OFF. If not, change accordingly. The standby (slave) database should be set to ON. If not, change accordingly.

11060 Database Replication Stopped Unexpectedly
If alarm 11060 Database Replication Stopped Unexpectedly is raised and the status Not Replicating is shown in the GUI, MySQL replication has stopped. Replication must be restarted, otherwise failover is not possible.
In CSR 2.6 replication restarts immediately when replication fails and the alarm is cleared. If the alarm occurs repeatedly (even if it is cleared), it is worth contacting customer support, since this may indicate a more general problem with replication. If replication does not restart automatically, contact customer support.
Replication should not stop, but an error might occur during replication. (More information about MySQL replication can be found in the MySQL documentation.)
For CSR v2.6, to find out more about why replication stopped, run the replication utility on the standby core server, enter the service user name and password, and then click Check Replication for details of the error that caused replication to stop. Please copy and paste these details into any correspondence discussing replication failure with customer support.
For CSR versions lower than v2.6, you can try to recover from replication failure as follows:
- Restart replication from the MySQL command line
- Use the Replication utility to export the active database to the standby database and get replication running again

Restarting replication from the MySQL command line
6. Take a copy of the log in \mysql\data, called {hostname}.err. The end of this log contains the reason why replication stopped. It is very important to pass this information to the CyberTech support desk.
7. Start a mysql.exe session on the slave, using the service account credentials. From the command line, run (there is no space between -p and the password):
    mysql -u <user> -p<password> recorder
8. Issue the command:
    start slave;
9. Check the GUI. Replication may have started. If it fails to start after about 1 minute, continue with the next step.
10. Issue the command:
    SET GLOBAL sql_slave_skip_counter = 1;
11. Issue the command:
    start slave;
12. Check the GUI. Replication may have started. If this is not the case, repeat the procedure from step 5. Replication normally starts in about one or two attempts, but may take more (up to about 15 in more extreme circumstances).
If this fails to restart replication, re-export the database from the master core server. This procedure is described in the following steps.
1. Use the Replication Utility to export the active database to the standby database and get replication running.
2. Make sure all the CyberTech services are stopped on the standby and active cores.
3. Run the replication setup tool on the active core.
4. Specify credentials and make sure Copy local DB to slave DB is checked.
5. Click Start and the database will be transferred.
6. Close the application when finished.
7. On the slave, open the Setup Replication utility. There is a link on the desktop, and also in the start menu under program files, CyberTech; Core Server Redundancy.
8. Specify credentials and make sure that Copy local DB to master DB is NOT checked.
9. Click Start to trigger replication to be synchronised with the master database.
10. It might take a few minutes for replication to start. Wait until the replication utility is ready. The GUI shows that replication has started within a few minutes. If it fails to do so, try repeating the procedure from step 5, or contact support.
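The retry sequence from "Restarting replication from the MySQL command line" earlier in this appendix can be sketched as a loop. This is a minimal illustration, assuming two caller-supplied callables that are hypothetical and not part of CSR or of any MySQL library: run_sql(statement) sends one statement to the slave database, and is_replicating() waits about a minute and then reports whether the GUI or the replication utility shows replication running. The limit of 15 attempts reflects the upper figure mentioned in the procedure.

```python
def restart_replication(run_sql, is_replicating, max_attempts=15):
    """Try to restart replication on the standby (slave) database.

    run_sql and is_replicating are hypothetical callables supplied by
    the caller (see the text above); this sketch does not connect to
    MySQL itself.

    Returns the number of skip attempts that were needed (0 if a plain
    START SLAVE was enough), or None if replication did not restart.
    """
    # First try a plain restart without skipping anything.
    run_sql("START SLAVE;")
    if is_replicating():
        return 0
    # Then skip one replication event at a time and retry.
    for attempt in range(1, max_attempts + 1):
        run_sql("SET GLOBAL sql_slave_skip_counter = 1;")
        run_sql("START SLAVE;")
        if is_replicating():
            return attempt
    return None
```

A return value of None corresponds to the case where the database has to be re-exported from the master core server. Take a copy of the {hostname}.err log before starting, as described in the procedure.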
Appendix E Network requirements

Replication: We recommend you run replication in a gigabit (1 Gbit/s) network environment. Core Server Redundancy uses MySQL replication to sync the active and standby databases. All changes to the active database are also passed to the standby database.
Recommission: For the recommissioning procedure of core server redundancy, the complete database needs to be imported from the standby core server to the active core server. You can do this automatically over the network. Depending on the network links between the active and standby cores, automatic import might take some time.
Calculation at Gigabit network link: importing 1 million calls takes about 2.5 minutes.
When importing over a VPN network, the network bandwidth and latency can dramatically slow down the import procedure.
Calculation at 250 kbps, latency 75ms: importing 1 million calls takes about 90 minutes.
Therefore, for network links lower than 100 Megabit, we advise you to manually export the slave database using the replication tool, because import over the network will take too long in this case. Move the exported database to the master system and manually import it. This procedure is described in section 4.7.1.
Minimum bandwidth: We recommend a minimum bandwidth of 300kbit/s with latency 50ms. Depending on system size, bandwidth requirements increase:
- 1-500 channels: 300kbit/s, 50ms latency
- 500-2000 channels: 10Mbit/s, 10ms latency
- 2000-4000 channels: 100Mbit/s, 10ms latency
These figures must be matched against installed systems to set requirements based on the number of channels and the number of calls recorded per day.
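The bandwidth tiers and import figures above can be turned into a small lookup. This is a sketch for planning purposes only: the function names are hypothetical, the tier boundaries overlap in the manual (500 and 2000 appear in two tiers) so this sketch assigns boundary values to the lower tier, and the assumption that import time scales linearly with the number of calls is not stated in the manual.

```python
# (maximum channels, bandwidth, latency) rows taken from the list above.
LINK_TIERS = [
    (500, "300 kbit/s", "50 ms"),
    (2000, "10 Mbit/s", "10 ms"),
    (4000, "100 Mbit/s", "10 ms"),
]


def recommended_link(channels: int):
    """Return (bandwidth, latency) for a system size, or None if the
    size is above the largest tier listed in the manual.

    Assumption: boundary values (500, 2000) belong to the lower tier.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    for max_channels, bandwidth, latency in LINK_TIERS:
        if channels <= max_channels:
            return bandwidth, latency
    return None


def estimated_import_minutes(calls: int, minutes_per_million: float) -> float:
    """Scale a per-million-calls figure to a database size.

    Assumption: import time grows linearly with the number of calls.
    Use 2.5 for a gigabit link or 90 for the VPN example above.
    """
    return calls / 1_000_000 * minutes_per_million
```

For example, estimated_import_minutes(4_000_000, 2.5) gives 10.0 minutes for a four-million-call database on a gigabit link under the linear assumption.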
Appendix F Resilience Add-On

The resilience add-on for GUI and schema updates is a backwards compatible add-on that updates the CyberTech recorder GUI and database schema with any changes needed for resilience. This is a list of the versions that exist as of the release of CSR 2.6:

Version   Main reason for release                                         Deployed on
1.0.0     Initial release to be installed with CT 6.0.2.                  Core servers
1.1.0     CT 6.0.3 compatibility                                          Core servers
1.2.0     CT 6.1 compatibility. Added 2N trading satellite redundancy     Core servers
1.2.1     2N Trading satellite redundancy fix                             Core servers
1.3.0     Added CTI Redundancy 2.9 fixes for CSR                          Core servers and CTI servers
1.4       2N trading satellite fix                                        Core servers
1.5       Fixes for CSR                                                   Core servers
Appendix G Changes replication utility

The replication utility is the multi-purpose tool used for copying the database from master to slave or vice versa, and for starting MySQL replication. Here is a list of changes made for CSR v2.6:
- Automatically assign the existing CSR resilience group to the core servers, so no user configuration is necessary in the GUI.
- Check to see if Automatically export database is set in the CSR configuration. If this isn't set, the user is asked if this should be set. If the user chooses Yes, the setting is applied to the configuration.
- Check to see if the recorder service and license service are running or set to automatic start-up. If they are, the user is asked if they should be stopped and set to manual. If the user chooses Yes, these services are stopped and set to manual. If the user chooses No, the services continue to run.
- When copying the database from active to standby or standby to active, a status update is given in the text window every 30 seconds. (Unfortunately it is not possible to determine exactly how long the copy operation will take in MySQL, because MySQL does not return the size of the exported database.)
- Added a Check Replication button, which can be used to check the status of replication. If replication has stopped, the error information is shown in the text window.
Appendix H CSR Alarm Profile

The alarm profile Core Server Resilience Errors, as referenced in chapter 2, is a recorder error profile. This is a standard function in the CT recorder. It is used to select a group of alarms, normally used for email notification. Redundancy also relies on error profiles, but in this case to trigger failover on such alarms. The alarm included in the Core Server Resilience Errors profile is 3006, Service stopped unexpectedly. It is not advised to make any changes to this profile for CSR functionality. Check the CT Recording Solutions R6 - Installation Manual for more information on alarm profiles.
Version history
Date 26-08-2010 30-08-2010 Version 1.0 1.0.1 Remark First version of Core Server redundancy for CT5.4.2 and CT6.0, missing alarming list. Added alarms and appendix B/C. Completed the manual fail-back procedure. Updated the default timing values for the resilience group to allow core server restart without triggering failover. 31-08-2010 1.0.2 Updated core server alarming profile to include 3006. Added remark for Use hostname option in chapter 2.2 Updated screenshots Replication Utility more chapters. Updated screenshot Sat/CTI component chapter 2.2. Minor textual changes after review. 01-10-2010 13-10-2010 15-10-2010 09-11-2010 1.0.3 1.0.4 1.0.5 1.0.6 Added appendix D for Replication restart. Textual and style changes after review. Added appendix E for database import speed Added prerequisites remark version 2.5.0.117 for redundancy kits in chapter 1.1. Added remark on Core Server Redundancy not being compatible with Screen Recording, to chapter 1.1. Added remark on Core Server being core only, to chapter 1.1. Added GUI add-on install to R6.0.2 in chapter 2.0 Updated CT6 dependence to R6.0.2 Added remarks on slave replication utility taking few minutes to process any slave database changes. 03-05-2011 19-05-2011 26-05-2011 1.0.7 1.0.8 1.0.8 Added appendix F: Database blocked hosts. Add-on for 6.0.3 mentioned at start of chapter 2 Added Upgrade chapter Added failover trigger details to chapter 1. Added remark on shared archive location to chapter 1. Added reference to alarm profile in chapter 3.2 Change appendix E to Network requirements and bandwidth and latency considerations Changes to Replication Utility as in CSR 2.5.3. 9-08-2011 1.0.9 Ch.1 Minor edits Ch.2 Change style to show installation steps, shown in the order that they should be executed. Ch.3 More detail for 2.5.3 Ch.3.6 Added section on making DB read-only for 2.5.2 Ch.4 Added more detail for 2.5.3, Changed style to show installation steps in order that they should be executed. 
Added section for upgrading CSR 2.5.3, upgrading CT5, CT6 and Active CTI kits. Appendix A Added alarms 11063,64 Appendix C Reworded section with more explanation Appendix D Added detail for 2.5.3 & amended replication restart procedure in line with current thinking. Appendix F Added new appendix to show resilience add-on compatibility Appendix G Added new appendix to document changes to replication utility 8-05-2012 10-05-2012 22-5-2012 24-5-2012 29-5-2012 31-5-2012 5-6-2012 7-6-2012 2.5.3 2.5.3 2.6 2.6 2.6 2.6 2.6 2.6 Appendix E Added minimum bandwidth Updates and fixes for release Updates with input PB Structure and writing improvements Input GS Review PB Upgrade instructions standby core updated Review SL
26-7-2012 2-8-2012 13-9-2012 8-10-2012 11-10-2012
2.6 2.6 2.6 2.6 2.6
Windows 2008 R2 Service Pack 1, and desktop shortcut added Added install step: Wait until the installation is finished on the active core server before you proceed to the next step. Added WMI fix to installation steps Added binary database copy procedure Added action to hostnames alarm 11064 Added Replication troubleshooting for read only database issue
*Manual version number synchronised with release