341 High Availability

OpenEdge High Availability
Adam Backman, Grand Poobah, White Star Software
About the speaker

Head Winemaker, White Star Software
  One of the oldest and most respected consulting and training companies in the Progress OpenEdge sector
Lackey, DBAppraise
  Managed database services backed by experienced Progress OpenEdge professionals, not rookies off the bench
Read a book or two
Snappy dresser
Knows a bit about systems and OpenEdge
Agenda
Are you really 24X7?
Redundancy
Replication
Maintenance
Failing over
Conclusion
What is High Availability?
A real business need that requires full access to current data at any time of the day or night
Many sites are "kind of" 24X7, but only a small percentage of companies have real business requirements that necessitate access to the data 24 hours a day
Some applications have high availability needs, but only during given hours, which simplifies maintenance
The need is growing every day
Are You Really 24X7?
Business runs 24 hours a day
  3-shift manufacturing, utility, casino, website, ...
Business needs access 24 hours
  Work during the day, report and plan at night
  Weekend requirements
What is High Availability?
The ability to keep running your business
Continuous access, which allows for failures with zero impact to the users
Minimally invasive failure management, like using HACMP clustering with OpenEdge as a cluster service
Major failover, where the physical location of the application must be changed
Minimal recovery time in case of disaster
It is not disaster recovery
  DR is only used when HA fails
Before you begin
Understand your business
Understand the cost of downtime
Do not build a solution that costs more than what you are protecting
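The cost rule above comes down to simple arithmetic. The following is a minimal sketch, assuming three hypothetical inputs that do not come from the presentation: an hourly cost of downtime, the expected outage hours per year with and without the solution, and the yearly cost of the solution itself.

```python
def annual_downtime_cost(cost_per_hour: float, outage_hours: float) -> float:
    """Expected yearly loss from outages (all inputs are illustrative estimates)."""
    return cost_per_hour * outage_hours


def solution_is_justified(cost_per_hour: float,
                          hours_without: float,
                          hours_with: float,
                          annual_solution_cost: float) -> bool:
    """True when the downtime cost avoided per year exceeds the yearly solution cost."""
    avoided = annual_downtime_cost(cost_per_hour, hours_without - hours_with)
    return avoided > annual_solution_cost


if __name__ == "__main__":
    # Hypothetical figures: 5,000 per hour, 20 outage hours reduced to 2,
    # and a solution that costs 60,000 per year.
    print(solution_is_justified(5_000, 20, 2, 60_000))
```

The point of writing it down is to force the business owners, not only IT, to supply the numbers.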
People
Who owns the data?
  Be inclusive with invites; most will drop out
  This is not solely an IT decision
You are the keeper, not the owner, of the data
  You know what is technically possible
  You know the cost of the tech needed to build the solution
The goal is to eliminate surprises if/when a problem occurs
Planning
Budget: it is not free
Hardware: fault tolerant, redundancy, ...
Software: OpenEdge plus ALL the other stuff you have to run the operation
Knowledge: buy or rent
Time: schedule and outage time
Personnel constraints: who is on call and who is their backup
Causes of Downtime
Hardware
  Disks are the most vulnerable, as they are the only moving part unless you have SSDs
  Power: all the hardware requires power
Software
  OS bug
  OpenEdge (core or application) bug
Natural disaster
  Fire
  Flood
Sabotage
Human error
Basic Rules
Good hardware
  Trusted vendor
  Good support (local support if possible)
No Windows (OK, maybe 2008)
You need a good recovery plan
You will run with after-imaging enabled
Redundancy
  Hardware
  Software
  Personnel
Redundancy: Hardware
Power (UPS or UPS + generator)
Mirrored disks
Network: in machine and general network
Non-interleaved memory (some use FT memory)
Multiple CPUs
Support hardware (PCs, terminals, phones, ...)
Complete failover environment
Hardware
Why have a UPS and a generator?

  UPS has limited capacity
  Generators can run for a long time
  Have a reliable source of extra fuel
Hardware
Do not let standby systems sit idle
Use them for development or test
Keep copies of all support files
  .pf
  .ini
  .d
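Keeping copies of the support files is easy to script. The following is a minimal sketch, assuming the files live under one directory tree and that the copy goes to a separate backup directory outside that tree. The function name and both directory arguments are hypothetical; only the three extensions come from the slide.

```python
import shutil
from pathlib import Path

# Extensions named on the slide; extend the set for your own environment.
SUPPORT_SUFFIXES = {".pf", ".ini", ".d"}


def copy_support_files(source_dir: str, backup_dir: str) -> list:
    """Copy every support file under source_dir into backup_dir.

    Relative paths are preserved. Returns the relative paths that were copied.
    backup_dir should not be located inside source_dir.
    """
    src, dst = Path(source_dir), Path(backup_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORT_SUFFIXES:
            relative = path.relative_to(src)
            target = dst / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(relative.as_posix())
    return copied
```

Run from a scheduler, a script like this keeps the standby machine's copies from drifting away from production.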
Redundancy: Software
Host-based applications are the least fault tolerant
Web-based applications can provide a good environment, provided the AppServer calls are stateless
In the client/server model, remember that file servers need to be redundant as well
Redundancy: Software
NameServer on the broadcast and clustered
Don't use the NameServer
Cluster your AppServers so that if a single AppServer fails there is another to pick up the load
Redundancy: Staffing
Is the failover machine close?
Can it reliably be accessed remotely? (failure point)
Is it possible to call in additional resources?
  More hands
  Different skills
  Relief of tired staff
Is it necessary to support all functions or only core?
Replication of Data
Database data
  OpenEdge replication (synchronous)
  Log-based replication (asynchronous)
  Hardware-based replication (?)
Application and user files
  OS utility (fsync, rsync, ...)
  Hardware (remote mirroring)
  Third-party (PolyServe)
Replication: OpenEdge
Pros:
  Supported product
  Synchronous
  Fast (really fast)
Cons:
  Cost
  Yet another thing to support
  Additional resource usage
Replication: Log-based
Pros:
  Cheap (not free, but close)
  Easy to set up and maintain
Cons:
  No formal support
  Additional resource utilization
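Log-based replication is usually built by shipping filled after-image extents to the standby and rolling them forward there. The slide gives no commands, so the following is only a sketch of one shipping cycle. It assumes the `rfutil` qualifiers `aimage extent full`, `roll forward -a` and `aimage extent empty` behave as described in the OpenEdge database administration documentation; verify them against your release before use. The function name, the plain `cp` transport and all paths are hypothetical. The function only builds the command lines and does not run anything.

```python
def ai_shipping_commands(source_db: str, target_db: str,
                         full_extent: str, shipped_path: str) -> list:
    """Build the command lines for one after-image shipping cycle.

    full_extent is the AI extent reported as full on the source;
    shipped_path is where that extent lands on the standby (hypothetical).
    """
    return [
        # 1. Ask the source which AI extent is the oldest full one.
        ["rfutil", source_db, "-C", "aimage", "extent", "full"],
        # 2. Ship the extent to the standby (cp stands in for your transport).
        ["cp", full_extent, shipped_path],
        # 3. Apply the shipped extent to the standby database.
        ["rfutil", target_db, "-C", "roll", "forward", "-a", shipped_path],
        # 4. Mark the extent empty on the source so it can be reused.
        ["rfutil", source_db, "-C", "aimage", "extent", "empty", full_extent],
    ]


if __name__ == "__main__":
    for cmd in ai_shipping_commands("prod", "standby",
                                    "/ai/prod.a1", "/ship/prod.a1"):
        print(" ".join(cmd))
```

A real script should also confirm that step 3 succeeded before running step 4, since emptying an extent that was never applied leaves a gap in the standby.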
Hardware Replication
Pros:
  Easy setup
  Easy maintenance
Cons:
  Expensive
  Possibility of data corruption unless ALL writes are guaranteed
Maintenance
Script everything to eliminate human error
Scheduled maintenance
  Application changes
  Backups
  Index maintenance
  Adding space
Unscheduled maintenance
  Eliminate unscheduled maintenance by monitoring and trending
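Trending can be as simple as fitting a straight line through regular space-usage samples and projecting when the line reaches capacity. The following is a minimal sketch, assuming one sample per day in any consistent unit. The function name and the linear-growth assumption are illustrative and not from the presentation.

```python
from typing import List, Optional


def days_until_full(samples: List[float], capacity: float) -> Optional[float]:
    """Project how many days after the last sample usage reaches capacity.

    Fits a least-squares line through daily samples (day 0, 1, 2, ...).
    Returns None when there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope
    # Never report a negative number if capacity is already exceeded.
    return max(0.0, day_full - (n - 1))
```

With enough warning, adding space becomes a scheduled task rather than an emergency.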
Maintenance: Application
Schema
  Use fast schema add, then add the default value
  Still requires an outage for some changes due to table locks
Code changes
  If you are n-tier you can stop the AppServer to reduce the interruption
  Switch to a different propath and move clients over through natural attrition
Maintenance: Backups
Progress backup
  Reliable
  Online option
Split mirror backup
Replication backup
  Eliminate overhead on production db
  Must be a no-recover backup for log-based replication
Maintenance: Index
Index rebuild cannot be run against a replicated database
Use index compact online
proutil <dbname> -C idxcompact <table.index>
Notes:
  Watch for open transactions, as idxcompact will do a significant amount of logging
  Schedule outside of busy times to allow replication to keep up
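Because the deck recommends scripting all maintenance, the compact command above can be generated for a list of indexes rather than typed by hand. The following is a minimal sketch that uses only the command form shown on the slide. The function names, the dry-run default and the sample database and index names are hypothetical.

```python
import shlex
import subprocess


def idxcompact_command(dbname: str, table_index: str) -> list:
    """Build 'proutil <dbname> -C idxcompact <table.index>' as an argument list."""
    if "." not in table_index:
        raise ValueError(f"expected table.index, got {table_index!r}")
    return ["proutil", dbname, "-C", "idxcompact", table_index]


def compact_indexes(dbname: str, table_indexes: list, dry_run: bool = True) -> list:
    """Compact each index in turn; with dry_run=True only print the commands."""
    commands = [idxcompact_command(dbname, ti) for ti in table_indexes]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the run at the first failing index.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    compact_indexes("sports", ["customer.cust-num", "order.order-num"])
```

Running the indexes one at a time, outside busy hours, matches the note about giving replication time to keep up.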
Maintenance: Add Space (Online and offline approaches)
prostrct addonline to add space while you are running
Process
  Make sure your umask is correct
  Validate your add.st file
  prostrct addonline db add.st
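The three steps above can be wrapped in a small pre-flight check so the command is only produced when the preconditions hold. The following is a minimal sketch with two assumptions: the "correct" umask is passed in by the caller (the slide does not say what it should be, and the default of 0o022 is only a placeholder), and "validate" is reduced to checking that the structure file exists and is not empty, which is a stand-in for a real syntax check. The function names are hypothetical.

```python
import os
from pathlib import Path


def current_umask() -> int:
    """Read the process umask; os.umask can only read by setting, so restore it."""
    mask = os.umask(0)
    os.umask(mask)
    return mask


def addonline_command(dbname: str, st_file: str, expected_umask: int = 0o022) -> list:
    """Return the 'prostrct addonline' argument list after basic pre-flight checks."""
    st_path = Path(st_file)
    if not st_path.is_file() or st_path.stat().st_size == 0:
        raise ValueError(f"structure file missing or empty: {st_file}")
    mask = current_umask()
    if mask != expected_umask:
        raise RuntimeError(f"umask is {mask:03o}, expected {expected_umask:03o}")
    return ["prostrct", "addonline", dbname, str(st_path)]
```

The returned list can be handed to `subprocess.run` once the same script has been rehearsed in a test environment.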
prostrct is supported for both source and target databases, with the exception of prostrct unlock
Process
  Shut down source and target
  Make changes to source
  Make changes to target
  Start both databases
Maintenance
All maintenance should be scripted and tested in a test environment before proceeding with the production run
  Eliminate the human element (no typos)
  Know how long it will take
  Make sure maintenance does not cause a problem
  Apply and test schema changes thoroughly
Building a failover plan

Who
  Business and technical personnel
  Gets informed: email, conference call, call tree, ...
  Makes decisions
  Does the work
What
  What resources are affected?
Where
  Location of physical resources
  Location of personnel
  Location of replacement/replication target
Building a failover plan - continued
When
  Times of backups
  Times of data archiving
  Times of backup archiving
  Times of log archiving
Why
  What are we protecting ourselves from?
  Why did we choose not to deal with some event?
Risk Assessment
Things to consider
  Risk: natural disaster, human caused, hardware, ...
  Likelihood
  Impact to application environment
  Time to recover
It is OK to say "we considered that, and it was not high enough in likelihood in our eyes to create a solution"
Determine the dependency of each level
  Hardware requires power
  OpenEdge application requires PostalSoft
Solutions
Document redundancy where it exists
Document places where redundancy is missing or unknown (on purpose or by omission)
Ensure reasonable software update procedures are in place and documented
Verify security, division of responsibilities and software release policies per layer
Need to develop a risk assessment form
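A risk assessment form can start as a simple record holding the fields listed under "Things to consider". The following is a minimal sketch, assuming a 1 to 5 scale for likelihood and impact and a score of likelihood times impact. The scale, the scoring rule, the field names and the sample entries are all hypothetical and not from the presentation.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    risk: str                 # e.g. natural disaster, human caused, hardware
    likelihood: int           # 1 (rare) to 5 (frequent), hypothetical scale
    impact: int               # 1 (minor) to 5 (severe), hypothetical scale
    hours_to_recover: float   # estimated time to recover
    depends_on: str = ""      # dependency, e.g. "power" for hardware
    decision: str = ""        # solution chosen, or why none was built

    @property
    def score(self) -> int:
        """Simple ranking value: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def rank_risks(entries: list, threshold: int = 0) -> list:
    """Return entries scoring at or above threshold, highest score first."""
    kept = [e for e in entries if e.score >= threshold]
    return sorted(kept, key=lambda e: e.score, reverse=True)


if __name__ == "__main__":
    form = [
        RiskEntry("Disk failure", 4, 3, 2.0, depends_on="power"),
        RiskEntry("Flood", 1, 5, 48.0, decision="considered, likelihood too low"),
    ]
    for entry in rank_risks(form):
        print(entry.score, entry.risk)
```

Recording the decision field even for risks that are not addressed documents that the event was considered, as the slide suggests.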
Aspects of a failover plan

When
  When do we decide to move to the standby environment?
  Who makes the decision?
  Who does the work, along with a backup for who does the work
  Defined process
  Service level agreements with customers
  Milestones in the process
Why
  This is a tougher decision than you think
  Fix or flee: lost time vs. lost data
Documenting your plan
Your plan should be able to be executed by anyone
You cannot have too much detail
Automate as much of the process as possible to eliminate the human element
Document and automate both the failover and the failback
Test your plan
Switch over to your standby environment and run for a day or more
You don't want to cause an extended outage testing your plan
You will only find issues if you run at full load
Do this at least once a year
Follow your document and correct mistakes as you go
Keep documents and support files up-to-date
Keep your failover and failback documents up-to-date
Keep contact lists up-to-date
Keep all individual process documents up-to-date
Keep copies of your support files
  Scripts
  Application (.pf, .ini, .properties, ...)
Good password management
Keep everything accessible (online and hard copies)
Points to Remember
Build redundancy into all aspects of your operation
Look at the likelihood of a failure and its impact to the customer
Protect your entire application environment, both hardware and software
Build a total solution, but think about the cost/benefit of each component
Automate tasks to eliminate human error
Test your failover plan at least once a year
Questions?
Adam Backman adam@wss.com
Thank you for your time!
