different solutions
Robert Krzykawski
DB Team Coordinator, bwin games
Anders Karlsson
Principal Sales Engineer, MySQL
Agenda
Who are we?
HA Basics – Anders
How we did it; Success or failure – Robert
Summary
Questions?
Anders Karlsson
Sales Engineer with Sun / MySQL
for 5+ years
I have been in the RDBMS business for
20+ years
I have worked for many of the major vendors and
with most of the vendor products
I’ve been in roles as
> Sales Engineer
> Consultant
> Porting engineer
> Support engineer
> Etc.
Outside MySQL I build websites
(www.papablues.com), develop Open Source
software (MyQuery, ndbtop etc.), am a keen
photographer and drive sub-standard cars,
among other things.
Also: www.makezfsgpl.com ! Right now!
Robert Krzykawski
DB Team Coordinator @ bwin Games AB
Have worked with MySQL in every role, from
system admin, DBA and DBD to, now, a more
system-architectural role.
Been involved in building both small and big web
based solutions since 1998 using MySQL.
My roles throughout my professional life have
varied. System administrator, Technical Sales
support, DBA, DBD, Programmer, Application
architect and System architect.
Off work I try to automate things with
scripts and programs to offload myself when “at
work”.
I am also trying to find time to snowboard, play
some paintball and a recently introduced hobby is
our Maine Coon kittens.
Why do you need HA
Something can break. It usually will, eventually
You will need to maintain your
database eventually, without shutting
the whole system down
Adding HA to an existing running
system is difficult, much more so than
providing HA from the start
You want a good night's sleep!
You want failover to be automatic!
HA Concepts
Fault tolerant architectures
> These are hardware architectures with supporting software
that protect against even individual component failures
Single Point of Failure (SPOF)
> In any fault tolerant setup, you want to avoid a SPOF, as a
chain is no stronger than its weakest link
Fail over and Fail back
> Fail over is the process of switching from a failed
component to another component, dormant or also active.
Fail back is the process of failing back from the backup
component to the original one.
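The weakest-link point about a SPOF can be put in numbers. The sketch below is only an illustration: it assumes component failures are independent, and the availability figures are made up, not measurements from any system described here. Components that must all be up multiply their availabilities; a redundant pair fails only if both copies fail.

```python
from math import prod


def series_availability(parts):
    """Availability of a chain in which every part must be up."""
    return prod(parts)


def redundant_availability(part, copies):
    """Availability of `copies` independent identical parts when one is enough."""
    return 1 - (1 - part) ** copies


# Hypothetical figures: two 99.9% components and one 99% single point of failure.
chain = series_availability([0.999, 0.999, 0.99])

# The same chain with the weak component duplicated (fail over between the two).
fixed = series_availability([0.999, 0.999, redundant_availability(0.99, 2)])

print(f"with SPOF: {chain:.5f}  with redundant pair: {fixed:.5f}")
```

In this toy model, the chain with the SPOF is less available than its weakest part, and duplicating only that part recovers most of the loss.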
Some HA Components
Heartbeat
> Heartbeat is an HA component that checks that the services
being failed over are alive. Heartbeat can check
individual servers, software services, networking etc.
HA Monitor
> The HA Monitor has different names in different
frameworks. This is the component that allows configuration
of the services, ensures proper shutdown and startup and
allows manual control
Replication
> Replication is a common component that ensures that the
data content of managed data-rich components is in sync
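The heartbeat idea above can be sketched as a counter of consecutive missed probes. This is not the API of Linux HA Heartbeat or any other framework; the class name, the `max_missed` threshold and the probe results are all hypothetical, chosen only to show why a single lost probe should not trigger a failover.

```python
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats (illustrative, not a real HA API)."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed
        self.missed = 0

    def record(self, alive):
        """Record one probe result; return True when failover should start."""
        if alive:
            self.missed = 0  # any successful beat resets the counter
            return False
        self.missed += 1
        return self.missed >= self.max_missed


def watch(probe_results, max_missed=3):
    """Return the index of the probe that triggers failover, or None."""
    monitor = HeartbeatMonitor(max_missed)
    for index, alive in enumerate(probe_results):
        if monitor.record(alive):
            return index
    return None


# One lost beat is tolerated; three in a row trigger failover.
print(watch([True, False, True, False, False, False]))
```

A higher threshold reduces false failovers at the cost of slower detection, which is the trade-off a real heartbeat configuration has to settle.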
What should I require?
Don’t aim too high, aim for what is reasonable for
your needs
Aim to ensure that no important data is lost
> What is “important data”? You decide! Different data means
different “needs”!
Aim to ensure that the solution can be automated.
You will want this eventually anyway
Aim to ensure a solution that can easily be tested
and administered
Aim to ensure that the solution is performant and
scalable
HA with MySQL – In short
MySQL Replication
> Easy to use and set up. Low performance impact
> Asynchronous only. Failback can be difficult. Need
additional components
MySQL with DRBD / ZFS / AVS
> Easy to use. Low cost software only. Synchronous.
Good HA software integration.
> Certain performance impact. Limited data size and
transaction rates.
HA with MySQL – In short
MySQL with Shared storage
> Good performance. Eases hardware management.
Good integration with HA software.
> Costly. SAN itself is a SPOF.
MySQL Cluster
> Very good performance. Self contained. Very short
fail-over times. Software only solution.
> Needs several physical servers. Not optimized for
all MySQL applications.
bwin games ab
Our goal at bwin
We were faced with a requirement; establish a
highly available database platform.
We had some rules to follow from management.
> Interruptions due to hardware failure should not require
hands-on work.
> Downtime should be minimized during interruptions.
> Performance of DB platform should not decrease when
operating as usual
> Performance can decrease if a failure has occurred but
should not render the service unusable.
> Implementation should be done by the operations
department. Developers should not be involved.
What solutions did we consider?
Master/Master
Linux HA
HP Service Guard
Sun Cluster
Combination of the above
MySQL Cluster
[Diagram: Master and Slave, with HA Standby1 and HA Standby2, each attached to SAN01 and SAN02]
We do..
Use Linux HA 2.0. Needed for setup of “cluster”
Use SAN. Shared storage is easier and faster, but
expensive.
> DRBD can be used but saves the same data twice. It also
comes with a performance decrease.
Heartbeat on two bonds. Primary database
interconnect network, secondary on database
service network
We have LUNs presented to multiple hosts
Services have rules to be run on specific hosts
only.
We fence using RiLOE
> Have plans to fence on port level in FC switches.
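Because the LUNs are presented to multiple hosts, the order of operations during a takeover matters: the failed node has to be fenced before the standby starts the database on the same storage. A minimal sketch of that ordering follows. `fence_node` and `start_service` are hypothetical callables standing in for whatever the cluster software actually invokes (for example a power-off through the management board); they are not part of Linux HA or RiLOE.

```python
def take_over(fence_node, start_service, failed_node):
    """Start the service locally only after the failed node is confirmed fenced.

    fence_node(node) -> bool  hypothetical: True only if the node is confirmed off
    start_service()           hypothetical: mounts storage and starts the database
    """
    if not fence_node(failed_node):
        # Fencing failed or could not be confirmed: the old node may still be
        # writing to the shared LUN, so refuse to start a second writer.
        return "aborted"
    start_service()
    return "started"


# Usage with stand-in functions.
started = []
print(take_over(lambda node: True, lambda: started.append("db"), "db01"))
print(take_over(lambda node: False, lambda: started.append("db"), "db01"))
```

Refusing to start when fencing is unconfirmed trades availability for data safety, which is usually the right call when two hosts can reach the same LUN.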
What’s good and what’s bad..
Easy and fast implementation
Our config does not increase/decrease
performance.
InnoDB log size causes long recovery times.
Tests with a smaller log size have caused performance
penalties.
Our solution is not foolproof because of long
recovery times.
It causes interruption of service.
We can say it’s HA, but a true HA solution would
give us 100% uptime.
The 2nd setup is complicated. We should aim for
simple setups. More common
What can we do better?
Fine tune config for faster recovery/startup
Add better fencing
Monitor failover in case recovery takes long
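The last point, monitoring a failover whose recovery takes long, could look roughly like the sketch below. It assumes a hypothetical `is_ready` check (for example, "does the database accept connections yet?") and made-up thresholds; the clock, sleep and alert functions are parameters so the logic can be tried without waiting.

```python
import time


def wait_for_recovery(is_ready, warn_after, give_up_after, poll=1.0,
                      clock=time.monotonic, sleep=time.sleep, alert=print):
    """Poll until the service is ready; warn once if slow, give up if too slow.

    is_ready() -> bool is a hypothetical readiness check supplied by the caller.
    Returns True if the service came up, False if give_up_after seconds passed.
    """
    start = clock()
    warned = False
    while True:
        if is_ready():
            return True
        elapsed = clock() - start
        if elapsed >= give_up_after:
            alert(f"recovery not finished after {elapsed:.0f}s, escalating")
            return False
        if not warned and elapsed >= warn_after:
            alert(f"recovery is slow: {elapsed:.0f}s so far")
            warned = True
        sleep(poll)


# Usage with a simulated clock: the service becomes ready on the sixth check.
now = [0.0]
checks = [0]


def ready():
    checks[0] += 1
    return checks[0] > 5


print(wait_for_recovery(ready, warn_after=3, give_up_after=10,
                        clock=lambda: now[0],
                        sleep=lambda s: now.__setitem__(0, now[0] + s)))
```

The warning threshold would be set from observed recovery times, so that an unusually long InnoDB recovery pages someone instead of going unnoticed.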