
Load balancing howto: LVS-dr + ldirectord + heartbeat 2

Environment

SuSE Linux Enterprise Server 10 Service Pack 2
heartbeat-2.1.3-0.9
IPVS v1.2.1

Problem

A high capacity load balancing solution is needed to address current and future needs to provide highly available and scalable services.

Solution

Linux Virtual Server (LVS) provides the means of building a scalable and high-performing virtual cluster server. Heartbeat 2 can be used to further increase the availability of the virtual services.

Limitations

Iptables redirection to avoid ARP problems with direct routing load balancing is not covered. Heartbeat 2 SSH STONITH is used without quorumd or pingd, which gives only very limited "tiebreaker" capability.

Concepts

LVS hides real servers behind a virtual IP and load balances the incoming requests across all cluster nodes based on a scheduling algorithm. It implements transport-layer load balancing inside the Linux kernel, also called Layer-4 switching. There are three types of LVS load balancing:

Network Address Translation (NAT)
Incoming requests arrive at the virtual IP and are forwarded to the real servers by changing the destination IP address. The real servers send their responses to the load balancer, which in turn changes the destination IP address and forwards the responses back to the clients. Because all traffic passes through the load balancer, it usually becomes the bottleneck of the cluster.

IP Tunneling
LVS sends requests to the real servers through an IP tunnel (redirecting to a different IP address) and the real servers reply directly to the client using their own routing tables. Cluster members can be in different subnets.

Direct routing
Packets from end users are forwarded directly to the real server. The IP packet is not modified, as the real servers are configured to accept traffic for the shared cluster virtual IP address on a virtual non-ARP alias interface. The response from the real server is sent directly to the client. The real servers and the load balancer (LVS) have to be in the same physical network segment (layer 2).

As the load balancer is the only entry point for all incoming requests, it would present a single point of failure for the cluster. A backup load balancer is needed, as well as a monitoring program that can fail over the service along with the connection state. In this example, the Linux Director Daemon (ldirectord) is used to monitor and administer the real servers in the LVS cluster, and heartbeat 2 is used as the fail-over monitor for the load balancers (ldirectord). Note: linux-director or ldirector in this document refers to the load-balancing server.

Ldirectord monitors the health of the real servers by periodically requesting a known URL and checking that the response contains an expected string. If a service fails on a server, the server is taken out of the pool of real servers and is reinserted once it comes back online.

Goals

Image 1: Load balancer graph.
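For orientation only, the ipvsadm commands below sketch what such a direct-routing (gate) virtual service looks like at the kernel level for the topology in Image 1. In this howto the IPVS table is created and maintained by ldirectord, so these commands do not need to be run by hand:

# illustration only - ldirectord will manage these entries later on
ipvsadm -A -t 192.168.0.200:80 -s wlc                        # virtual service with wlc scheduler
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.110:80 -g -w 1   # -g = gatewaying (direct routing)
ipvsadm -a -t 192.168.0.200:80 -r 192.168.0.120:80 -g -w 1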

Load balancing behaviour

The linux-directors and real servers will each have one real interface with their own IP address and one virtual alias interface configured with the shared virtual IP (VIP) 192.168.0.200.

1. A client sends a request for a web page to 192.168.0.200.
2. Ldirectord checks the IP address and port number. If they match a virtual service, a real server is chosen from the cluster by a scheduling algorithm and the connection is added to the hash table that records connections.
3. The load balancer forwards the packet (the VIP is unchanged) to the chosen real server.
4. When the real server receives the forwarded packet, it finds that the packet is addressed to its loopback alias interface, processes the request and returns the result directly to the client.

High availability behaviour

1. Node level monitoring: If one of the nodes (ldirector1/ldirector2) running cluster resources stops sending out heartbeat signals, declare it dead, reboot the node and fail over all resources to a different node.
2. Service level monitoring: If the VIP or ldirectord service fails, try to restart the service; if that fails, reboot the node and fail over all resources to a different node.
3. Service "stickiness": If a dead or stand-by node becomes active again, keep the resources where they currently run and do not fail back.

Configuration: linux-director (load balancer)

It is recommended to disable SuSEfirewall2 for this configuration to avoid networking issues:

rcSuSEfirewall2 stop
chkconfig SuSEfirewall2_init off
chkconfig SuSEfirewall2_setup off

Required software

Install heartbeat and ldirectord by running:

zypper install heartbeat heartbeat-ldirectord perl-MailTools

IP forwarding

The linux-directors must be able to route traffic to the real servers. This is achieved by enabling IPv4 packet forwarding in the kernel. Edit /etc/sysctl.conf and add net.ipv4.ip_forward = 1:

# /etc/sysctl.conf
net.ipv4.ip_forward = 1

For the changes to take effect, run:

sysctl -p

Ldirectord

Create the file /etc/ha.d/ldirectord.cf and add:

# /etc/ha.d/ldirectord.cf
checktimeout=3
checkinterval=5
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=192.168.0.200:80
	fallback=127.0.0.1:80
	real=192.168.0.110:80 gate
	real=192.168.0.120:80 gate
	service=http
	request="test.html"
	receive="Still alive"
	scheduler=wlc
	protocol=tcp
	checktype=negotiate

IMPORTANT: The directives under "virtual=" have to start with a [TAB], not white space.

Explanation

virtual=192.168.0.200:80 - Defines a virtual service by IP address and port.
real=192.168.0.110:80 gate - Defines a real service by IP address and port. The second argument defines the forwarding method, which in this case (gate) translates to *direct routing*.
request="test.html" - Defines which file to request.
receive="Still alive" - Defines the expected response.

See "man ldirectord" for configuration directives not covered here.

What ldirectord does: Ldirectord connects to each real server once every 5 seconds (checkinterval) and requests 192.168.0.110:80/test.html (real/request). If it does not receive the expected string "Still alive" (receive) within 3 seconds of the last check (checktimeout), it removes the server from the available pool. The server is added again once the check succeeds.
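When debugging why a real server does not enter the pool, the negotiate check can be reproduced by hand from the linux-director. A rough equivalent using wget (any HTTP client will do) is:

# manual equivalent of the ldirectord negotiate check
wget -q -O - http://192.168.0.110/test.html | grep -q "Still alive" && echo "real server 1 OK"
wget -q -O - http://192.168.0.120/test.html | grep -q "Still alive" && echo "real server 2 OK"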

Because of the quiescent=yes setting, the real servers are not removed from the LVS table when a check fails. Instead, their weight is set to "0" so that no new connections are accepted. Already established connections persist until they time out.

Test

Start ldirectord and check the real server table:

/etc/init.d/ldirectord start
Starting ldirectord... success

ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.200:http wlc persistent 600
  -> 192.168.0.110:http           Route   0      0          0
  -> 192.168.0.120:http           Route   0      0          0

Note: The weight in this case is 0 because the real servers are not configured yet and ldirectord could not fetch the test.html page.

Disable ldirectord service

Make sure ldirectord is *not running* and won't start on boot. Only heartbeat 2 will be allowed to start and stop the service.

/etc/init.d/ldirectord stop

/sbin/chkconfig ldirectord off
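To double-check that the init script is indeed disabled, chkconfig can list its runlevel settings:

chkconfig --list ldirectord   # should show the service off in all runlevels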

Heartbeat 2

This heartbeat 2 setup is not a production setup, as no redundant heartbeat media are used. In a production setup at least two media are needed over which the heartbeat signals can be propagated. The SSH STONITH device should never be used in production! Always use hardware STONITH devices, such as power switches, DRAC, iLO etc.

Heartbeat 2 runs on the two linux-directors (load balancers) and handles bringing up the interface for the virtual address. This is the address to which end users should connect. It will also monitor the ldirectord daemon.

Main configuration file

Create the file /etc/ha.d/ha.cf and add:

# /etc/ha.d/ha.cf
crm on
udpport 694
bcast eth0
node ldirector1 ldirector2

Note: The order of the directives is significant.

Explanation

crm on - Use heartbeat version 2.
udpport 694 - The port heartbeat uses for its UDP intra-cluster communication.
node ldirector1 ldirector2 - The node names; output of uname -n.
bcast eth0 - Use device eth0 to broadcast the heartbeat.

Node authentication

The authkeys configuration file contains the information heartbeat uses to authenticate cluster members. Create /etc/ha.d/authkeys and add:

# /etc/ha.d/authkeys
auth 1
1 sha1 YourSecretKey

1 - key number associated with this line
sha1 - key signature method
YourSecretKey - shared secret key

This file must not be readable or writable by anyone other than root:

chmod 600 /etc/ha.d/authkeys

Name resolution

Add the node names to /etc/hosts on both linux-directors:

# /etc/hosts
192.168.0.10   ldirector1
192.168.0.20   ldirector2

The names used here should be the output of uname -n.

Time synchronization

Even though it is not required, time synchronization is very useful in every cluster environment where you want to compare log files from different nodes. The time server should be outside the cluster. See the Novell documentation on how to configure an NTP client through YaST2.

Propagate configuration to all nodes

To configure heartbeat 2 on the other nodes in the cluster, run:

/usr/lib/heartbeat/ha_propagate

on the heartbeat node you just configured, or simply copy over /etc/ha.d/ha.cf and /etc/ha.d/authkeys. To start the cluster, run:

/etc/init.d/heartbeat start

on both nodes. It can take more than a minute to elect the Designated Coordinator (DC) and synchronize the cluster when it starts for the first time. Check the cluster status with:

crm_mon -i 5

============
Last updated: Tue May 20 04:58:32 2008
Current DC: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2)
2 Nodes configured.
0 Resources configured.
============
Node: ldirector2 (f8f2ad4a-a05d-416a-92a9-66b759768fb9): online
Node: ldirector1 (5792135e-ed53-438b-8a71-85f0285464c2): online
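crm_mon -i 5 keeps refreshing every 5 seconds; press Ctrl+C to exit. Most heartbeat 2 builds also accept a one-shot mode, which is handy in scripts (check crm_mon --help if the option is not available in your version):

crm_mon -1   # print the cluster status once and exit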

Connect to hb_gui

On the linux-directors, set the password for the user hacluster:

/usr/bin/passwd hacluster

This is the user that can connect to the heartbeat cluster management console hb_gui. Connect to hb_gui by running:

/usr/bin/hb_gui &

on any of the linux-directors and specify the IP address, the username hacluster and the newly set password.

Create resource group "load_balancer"

A resource group places constraints on the resources to make their management easier. It enforces that the resources within the group run on the same node and start in a specific order, from top to bottom, and stop in the reverse order.

Resource: virtual IP

1. Create a group named "load_balancer", leave "colocation" and "ordered" as "true".
2. Add the resource "IPaddr2".
3. Set the parameter "ip" to 192.168.0.200. This is the virtual IP address that clients will connect to.
4. Add the parameter "lvs_support" with a value of "true".

Image 2: Resource group configuration - IP address.

5. Add an operation named "monitor" with interval "20", timeout "10", start delay "0" and on fail "restart".

Image 3: Resource group configuration - resource monitor.

These values are by default in milliseconds and can be read this way: check the VIP service every 20 milliseconds; if the monitor does not get a response within 10 milliseconds, try to restart the service on this node. If the restart fails, fence off/reboot the node and fail over the resource to an active node.

Note: If you wonder what the difference between IPaddr and IPaddr2 is: the first one uses the "ifconfig" command and the second the "ip" command to set up the interface. Moreover, IPaddr2 can be used as a cloned cluster IP.

Resource: ldirectord

1. Add a native resource named "ldirectord" with the class "ocf/heartbeat" that belongs to the group "load_balancer".
2. Add the parameter "configfile" with the value "/etc/ha.d/ldirectord.cf".
3. Add an operation named "monitor" with interval "20", timeout "10", start delay "0" and on fail "restart".

Image 4: Resource group configuration - ldirectord.

4. Start the resource group: highlight the "load_balancer" group and click the "Play" button on the top bar. The resource group should come up with a green light.

Image 5: Starting the resource group.

STONITH

"Shoot the other node in the head" (STONITH) is a fencing technique. If a node loses communication with the cluster, it will be fenced off from the cluster. As heartbeat cannot know for sure whether the node is really dead, it uses STONITH to turn that uncertain assumption into a solid fact by powering down or rebooting the errant node.

Generate SSH keys

On both nodes, execute as root:

ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):    # leave empty
Enter same passphrase again:                   # leave empty
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.

Generate the SSH keys without a passphrase (just hit Enter when prompted for one).

Distribute SSH keys

Distribute the SSH keys to both nodes using the following commands:

#on ldirector1

ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector2

#on ldirector2

ssh-copy-id -i /root/.ssh/id_rsa.pub ldirector1

After propagating the SSH keys, test whether you can execute commands without being prompted for a password. Here is an example from ldirector2:

ldirector2:~ # ssh -q -x -n -l root "ldirector1" "ls -l /"

total 22

drwxr-xr-x 2 root root 2920 May 19 22:47 bin

drwxr-xr-x 3 root root 624 May 19 23:03 boot

drwxr-xr-x 9 root root 6760 May 20 01:03 dev

drwxr-xr-x 79 root root 6616 May 20 05:03 etc

-- snip --

Activate the ATD daemon

The ATD daemon is used to execute the SSH STONITH reboot command. Activate it by running:

/etc/init.d/atd start

chkconfig atd on

STONITH clone resource

Clones are resources that can run simultaneously on multiple nodes.

1. Add a native resource with the resource_id "ssh_stonith" and the resource type "external/ssh".
2. Add the parameter "hostlist" with the value "ldirector1,ldirector2".
3. Check the "Clone" button and set the attributes "clone_max" to "2" and "clone_node_max" to "1". This makes sure that only one STONITH clone can run on a single node and a total of two STONITH resources can run in the whole cluster.

Image 6: Clone resource configuration - SSH STONITH.

Finalize configuration

1. Start the STONITH clone by pressing the "Play" button.
2. Highlight the "linux-ha" cluster entry, then under the "Configurations" tab check the "Stonith Enabled" box and set "Default Resource Stickiness" to "INFINITY".

Image 7: Completed configuration.

WARNING: If your STONITH device does not work properly, the resources might never fail over in the case of a failure, as the healthy node will try to STONITH the faulty node and will not take over the resources until it gets a confirmation of a successful STONITH. If you have issues with this, try disabling STONITH through the hb_gui.

Configuration check

As all resources should be running now, check on the linux-director that currently runs the load_balancer group whether ldirectord and the virtual IP address are running. In this example, ldirector2 is running the resources:

ldirector2:~ # ip add sh eth0
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000

link/ether 00:50:56:00:00:f0 brd ff:ff:ff:ff:ff:ff

inet 192.168.0.20/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.200/24 brd 192.168.0.255 scope global secondary eth0

ldirector2:~ # ps x | grep ldirector
 9918 ?        S      0:00 /usr/bin/perl -w /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start
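Heartbeat can also report the location itself. Assuming the resource group id "load_balancer" created above, the following query should work from either node:

crm_resource -W -r load_balancer
# expected to print something like: resource load_balancer is running on: ldirector2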

Configuration: real servers

Virtual interface

Edit /etc/sysconfig/network/ifcfg-lo and add:

# /etc/sysconfig/network/ifcfg-lo
IPADDR_0=192.168.0.200   # VIP
NETMASK_0=255.255.255.255
NETWORK_0=192.168.0.0
BROADCAST_0=192.168.0.255
LABEL_0='0'

Restart the network:

/etc/init.d/network restart

The new lo:0 virtual interface is now active:

ip add sh lo
1: lo: mtu 16436 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet 192.168.0.200/32 brd 192.168.0.255 scope global lo:0

Restrict ARP advertisements

Clients will send all HTTP requests to the VIP 192.168.0.200. Before they can connect to the IP, an ARP request is made to match a MAC address to the requested IP address. Since the linux-directors and the real servers all have an interface configured with the same virtual IP address, each one of them could reply to an ARP request for 192.168.0.200. This would break the load balancing for the cluster. To solve this problem, ARP replies for the virtual interfaces have to be disabled. Edit /etc/sysctl.conf and add:

# /etc/sysctl.conf
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

Load the changes with:

sysctl -p

Explanation

net.ipv4.conf.all.arp_ignore = 1 - Enable configuration of the arp_ignore option.
net.ipv4.conf.eth0.arp_ignore = 1 - Do not respond to ARP requests if the requested IP address is configured on the "lo" (loopback) device or any virtual eth0:X device.
net.ipv4.conf.all.arp_announce = 2 - Enable configuration of the arp_announce option.
net.ipv4.conf.eth0.arp_announce = 2 - As the source IP address of an ARP request is entered into the ARP cache on the destination, it has the effect of announcing this address. This is undesirable for lo or any other virtual interface on the real servers. With this setting, whenever the real server makes an ARP request, it tries to use the real IP as the source IP of the ARP request.
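To confirm on a real server that the values are active, query the same keys with sysctl:

sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.eth0.arp_ignore
sysctl net.ipv4.conf.all.arp_announce net.ipv4.conf.eth0.arp_announce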

Default gateway

The real servers need to be set up so that their default route points to the gateway router's address on the server network and not to an address on one of the linux-directors. In this example, 192.168.0.254 is the default gateway.

echo "default 192.168.0.254" > /etc/sysconfig/network/routes; rcnetwork restart;

# and check the routing table
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.0.254   0.0.0.0         UG    0      0        0 eth0
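The same check can be done with iproute2; the default route reported by "ip route show" must point to 192.168.0.254 and not to one of the linux-directors:

ip route show
# expect a line similar to: default via 192.168.0.254 dev eth0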

Web server

1. Install Apache2 by running:

zypper install apache2

2. Create a test.html page that ldirectord will periodically check to determine whether the service is available, and an index.html page that identifies the server:

echo "Still alive" > /srv/www/htdocs/test.html
echo "Real server 1" > /srv/www/htdocs/index.html

Note: The default SLES10 SP2 Apache2 DocumentRoot is used in this example.

Repeat the same on real-server2, but change index.html to "Real server 2" so it is visible which web server is serving the request.

Start the HTTP service:

/etc/init.d/apache2 start

Note: We only use a virtual HTTP service here. It is possible to configure ldirectord to check other services as well, such as Oracle listener, MySQL, SMTP, POP/IMAP, FTP, LDAP, NNTP and others.

Ldirectord test

After setting up and starting the Apache web server on both real servers, check on the linux-director that is currently running the load_balancer resource group whether both servers are available in the IPVS server pool:

ldirector2:~ # ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)

Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.200:80 wlc
  -> 192.168.0.110:80             Route   1      0          0
  -> 192.168.0.120:80             Route   1      0          0

Now we see both servers with a weight of 1. Connect with a browser to 192.168.0.200. A page served by real-server1 or real-server2 should come up.
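The test can also be scripted from any client machine. Assuming wget is installed, a short loop such as the one below should, over a number of requests, return pages from both real servers (Image 8 below shows the equivalent browser view):

# fetch the VIP a few times and see which real server answers
for i in 1 2 3 4 5; do wget -q -O - http://192.168.0.200/; done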

Image 8: Load balancing test.

Testing the cluster

heartbeat

Put one of the linux-directors on stand-by using the hb_gui.
Result: resource fail-over.

Connect to the active linux-director and kill the ldirectord process (killall ldirectord).
Result: resource restart.

Kill the heartbeat process (killall heartbeat).
Result: node reboot and resource fail-over.

Kill the network connection between ldirector1 and ldirector2.
Result: split-brain situation. Both nodes get quorum. The first one to send a successful STONITH takes over the resources. Usually the DC is the faster one to STONITH the other node. In the case of SSH STONITH this does not work that well, as a network connection is needed for the ssh command.

ldirectord

Connect to 192.168.0.200 a couple of times.
Result: index.html from real-server 1 or 2 is shown.

Kill the connection to real-server1 (wait 10 sec) and check connectivity again.
Result: index.html from real-server2 is shown.

Caveats

Ldirectord does not start

When trying to start ldirectord, the following error occurs:

/etc/init.d/ldirectord start
Starting ldirectord...
Can't locate Mail/Send.pm in @INC (@INC contains: /usr/lib/perl5/5.8.8/i586-linux-thread-multi /usr/lib/perl5/5.8.8

-- snip --

Install perl-MailTools:

zypper install perl-MailTools

or the CPAN Mail::Send module:

env FTP_PASSIVE=1 cpan -i Mail::Send

Failed start with error:

/etc/init.d/ldirectord start
Error [19251] reading file /etc/ha.d/ldirectord.cf at line X: Unknown command fallback=127.0.0.1:80

Make sure the directives in /etc/ha.d/ldirectord.cf under virtual=192.168.0.200:80 begin with a [TAB].

Ldirectord shows real servers with a weight of 0

Check whether you can connect to the real servers directly with a browser and that the web page tree matches the ldirectord request="test.html" and receive="Still alive" directives.

SSH STONITH fails to reboot a node

Make sure that the atd and ssh daemons are running. Test whether you can ssh to both nodes without a passphrase.

Alternative solutions

Keepalived

Keepalived provides strong and robust health checking for LVS clusters. It implements a framework of health checking on multiple layers for server failover, and a VRRPv2 stack to handle director failover.

Piranha

Piranha provides the ability to load-balance incoming IP network requests across a farm of servers. Its IP load balancing is based on the open source Linux Virtual Server (LVS) technology.

Conclusion

The combination of Heartbeat 2, ldirectord and LVS provides a robust framework of open source tools to build highly available clusters that can load balance work between two or more servers in order to assure optimal resource utilization, scalability and availability of services. The example shown here depicts a basic working setup that can be fine-tuned to meet more specific needs.
