
Exchange Server 2013

High Availability and Site Resilience


Scott Schnoll, Senior Content Developer, Microsoft Corporation
scott.schnoll@microsoft.com
New in Exchange Server 2013
Storage
Multiple databases per volume
Autoreseed
Self-recovery behaviors
Lagged copy innovations
High Availability
Managed Availability
Database failover changes
Best copy selection changes
DAG network innovations
Site Resilience
http://aka.ms/E15HATechEdAU
http://aka.ms/E15HATechEdNZ
http://aka.ms/E15HATechDaysNL
http://aka.ms/E15HATechEdNA
http://aka.ms/E15HATechEdEU
Agenda
DAG architecture
MSExchangeRepl
MSExchangeDAGMgmt
Cluster
Crimson Channel
Witness Server Placement
Dynamic Quorum
DAG Member Maintenance
DAG Replication Service
Introduced in Exchange 2007 RTM
Microsoft Exchange Replication service | MSExchangeRepl
MSExchangeRepl.exe
Runs on all Mailbox servers (not just DAG members)
Communicates with Active Directory and other DAG members
Includes 16 components
Active Directory lookup
Copy status lookup
Replay core manager
Seed manager
Autoreseed manager
Disk reclaimer manager
Replay RPC server wrapper
Remote data provider wrapper
VssWriter
Active Manager
Active Manager RPC server wrapper
Failure item manager
TPR API manager
Support API manager
Server locator manager
Health state tracker
DAG Management Service
Introduced in Exchange 2013 RTM CU2
Microsoft Exchange DAG Management service | MSExchangeDagMgmt
MSExchangeDagMgmt.exe
Runs on all Mailbox servers (not just DAG members)
Communicates with Active Directory and other DAG members
Includes 4 components
Active Directory lookup
Copy status lookup
Monitoring
Tracer instance
DAG Management Service
Created for two primary reasons:
so the Replication service can have more focused functionality
so Managed Availability actions can kill lower-priority activities
Writes events to same place as Replication service
Application event log (source of MSExchangeRepl)
HighAvailability crimson channel
As we refactor more, other functions will move to this service
AutoReseed
Disk Reclaimer
Dynamic replay lag playdown
Future AutoDAG copy layout and mobility features
Cluster Service
Introduced in NT Server Enterprise Edition (1997)
Cluster Service | ClusSvc
Clussvc.exe
Exchange DAGs use several Cluster components
Quorum
Membership and Node Management
Networks and Heartbeating
Cluster Registry
Cluster Service
Quorum is required to mount databases
Quorum is based on votes, not members
Votes can be taken away manually or dynamically
NodeWeight or Dynamic Quorum
Exchange manages quorum model, not quorum
Exchange management of quorum model based on nodes, not
votes
Removing votes requires manual configuration of quorum model
Exchange will make incorrect quorum model management
decisions if votes are manually removed at the cluster level
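The "votes, not members" point can be sketched as a small majority check. This is a minimal illustration, not the cluster's implementation: the function names, the node names EX1 to EX4, and the witness flags are all assumptions made for the example, and it models only a static (non-dynamic) majority.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(node_weights, up_nodes, witness_configured=False, witness_reachable=False):
    """Static majority check: count votes that are present, not members."""
    total = sum(node_weights.values()) + (1 if witness_configured else 0)
    present = sum(weight for node, weight in node_weights.items() if node in up_nodes)
    if witness_configured and witness_reachable:
        present += 1
    return present >= votes_needed(total)


# Hypothetical four-member DAG with a witness: 5 votes in total, 3 needed.
weights = {"EX1": 1, "EX2": 1, "EX3": 1, "EX4": 1}
print(has_quorum(weights, {"EX1", "EX2"}, witness_configured=True, witness_reachable=True))
print(has_quorum(weights, {"EX1", "EX2"}, witness_configured=True, witness_reachable=False))

# Manually removing EX4's vote: still 4 members, but only 4 votes (3 nodes + witness).
weights["EX4"] = 0
print(has_quorum(weights, {"EX3", "EX4"}, witness_configured=True, witness_reachable=True))
```

In the last call two members and the witness are up, yet only two votes are present against a requirement of three, which is the kind of mismatch between member count and vote count that the slide warns about.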
Cluster Registry
Active Manager stores information in the cluster
registry for DAG members
Registry changes are replicated immediately to all
DAG members
Stored information is used as part of BCSS
Cluster Registry
IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*
ActiveServer
Name of the server where the database is currently mounted or
is expected to be mounted when mount operations complete
LastMountedServer
The name of the server where the database was last
successfully mounted
LastMountedTime
The date and time stamp of the last time the database was
mounted
Cluster Registry
IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*
MountStatus
The current mount status for the database
Possible values are mounted / dismounted
IsAdminDismounted
Designates whether the current dismounted status of the database is the
result of administrator action
Possible values are True / False
IsAutomaticActionsAllowed
Designates whether the database can be automatically activated by AM
Possible values are True / False
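The sample value shown on these slides can be split into its fields with a few lines of Python. This is a sketch that assumes, from the sample alone, that `?` separates a key from its value and `*` terminates each pair; the function name `parse_db_state` is hypothetical and the format is not otherwise documented here.

```python
def parse_db_state(raw: str) -> dict:
    """Split a 'Key?Value*Key?Value*' string into a dictionary."""
    fields = {}
    for pair in raw.split("*"):
        if not pair:
            continue  # the sample ends with a trailing '*'
        key, _, value = pair.partition("?")
        fields[key] = value
    return fields


# Sample value copied from the slide.
raw = (
    "IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*"
    "LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*"
    "IsAdminDismounted?False*IsAutomaticActionsAllowed?True*"
)
state = parse_db_state(raw)
for key, value in state.items():
    print(f"{key}: {value}")
```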
Cluster Registry
Last Log
Entry for each database copy in the DAG (named by the database
GUID)
Stores the sequence number of the last generated log (in decimal)
Cluster Networking
Cluster provides network heartbeating for all
networks
Heartbeat tolerances are configurable
D cluster-1 SameSubnetDelay 1000 (0x3e8)
D cluster-1 CrossSubnetDelay 1000 (0x3e8)
D cluster-1 SameSubnetThreshold 5 (0x5)
D cluster-1 CrossSubnetThreshold 5 (0x5)
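As a rough worked example, the delay is the interval between heartbeats in milliseconds and the threshold is the number of consecutive missed heartbeats tolerated, so their product approximates how long a node can be silent before it is considered down. The sketch below uses the values listed above; the helper name `detection_window_ms` is an assumption for illustration.

```python
def detection_window_ms(delay_ms: int, threshold: int) -> int:
    """Approximate silence tolerated before a node is treated as down."""
    return delay_ms * threshold


# Values taken from the property listing on the slide.
settings = {
    "SameSubnetDelay": 1000,
    "SameSubnetThreshold": 5,
    "CrossSubnetDelay": 1000,
    "CrossSubnetThreshold": 5,
}

same_subnet = detection_window_ms(settings["SameSubnetDelay"], settings["SameSubnetThreshold"])
cross_subnet = detection_window_ms(settings["CrossSubnetDelay"], settings["CrossSubnetThreshold"])
print(f"Same subnet: {same_subnet} ms, cross subnet: {cross_subnet} ms")
```

With the listed values both windows come to 5000 ms, that is, about five seconds.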
Cluster Networking
Cluster Network Communications
UDP unicast on port 3343
Heartbeats between nodes are TCP
IPv6 is supported for cluster IP addresses
Windows network binding order is still important
MAPI network at top of binding order
Followed by replication networks
Followed by iSCSI networks
Crimson Channel
Applications and Services logs
Area of event log used by applications for logging and internal
communication
Store events from a single application or component rather than events that
might have system-wide impact
This is referred to as an application's crimson channel
Exchange 2013 has multiple channels
ActiveMonitoring
HighAvailability
MailboxDatabaseFailureItems
ManagedAvailability
PushNotifications
Troubleshooters
Witness Server
A server that participates in a failover cluster with an even
number of members
Is not a member of the cluster
Does not contain a full copy of quorum data
Represented by File Share Witness resource
Created when Node and File Share quorum model used
Uses IsAlive Check for availability
If witness server or share is not available, cluster core resources are failed
and moved to another node
If another node does not bring witness resource online, the resource
remains in a Failed state, with restart attempts every 60 minutes
If needed for quorum, but cannot be brought online, quorum will be lost
Witness Server
A lock is not actively maintained on the witness
When it becomes necessary to obtain an additional vote to
maintain quorum
An SMB file lock is placed on the witness.log file by one node
Node paxos information is incremented by locking node and the
updated paxos tag written to the witness.log file
When it is no longer needed to maintain quorum
The lock on the witness.log file is released
Windows Failover Clustering
Node that locks witness.log retains the vote
Nodes in contact with the locking node are in the majority
and maintain quorum
Nodes not in contact with the locking node are in the
minority and lose quorum
Nodes not owning cluster core resources wait 6 seconds
prior to attempting to lock the FSW (arbitrationDelay)
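The arbitration rules above can be sketched as a toy model. Everything here is an assumption for illustration: the `Witness` class, the integer sequence number standing in for the paxos tag, and the node names are hypothetical, and real arbitration uses an SMB lock on witness.log rather than an in-memory flag.

```python
class Witness:
    """Toy stand-in for the witness.log file and its lock."""

    def __init__(self, sequence: int = 20):
        self.sequence = sequence
        self.locked_by = None

    def try_lock(self, node: str, node_sequence: int) -> bool:
        """A challenge fails if a lock exists or the challenger is stale."""
        if self.locked_by is not None or node_sequence < self.sequence:
            return False
        self.locked_by = node
        self.sequence = node_sequence + 1  # locking node increments the tag
        return True

    def release(self, node: str) -> None:
        if self.locked_by == node:
            self.locked_by = None


# Case 1: the owner of the cluster core resources locks first.
witness = Witness(sequence=20)
owner_won = witness.try_lock("EX1", 20)        # owner locks, sequence becomes 21
challenger_won = witness.try_lock("EX3", 20)   # lock exists, challenge fails
witness.release("EX1")
print(owner_won, challenger_won, witness.sequence)

# Case 2: the owner is unavailable, so the challenger finds no lock.
witness2 = Witness(sequence=20)
takeover_won = witness2.try_lock("EX3", 20)
print(takeover_won, witness2.locked_by, witness2.sequence)
```

The six-second arbitration delay is not modeled; in the sketch it is represented only by the order in which the two nodes call `try_lock`.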
Windows Failover Clustering
[Diagram: witness arbitration when the node owning the cluster core resources locks first]
Starting state: Sequence #: 20 on the cluster core resources and on the witness
Cluster state change: the node owning the cluster core resources locks the FSW (witness.log) and updates the sequence number (Sequence #: 21)
Challenging node attempts the witness lock; a lock already exists and the sequence # is higher, so the challenge is not successful
All nodes available: the FSW lock is released, changes are replicated, and sequence numbers are in sync (Sequence #: 22)
Windows Failover Clustering
[Diagram: witness arbitration when the node owning the cluster core resources is unavailable]
Starting state: Sequence #: 20
Cluster state change: the node owning the cluster core resources is unavailable
Challenging node attempts the witness lock; no lock exists, so the lock is successful and the sequence number is updated (Sequence #: 21)
All nodes available: the FSW lock is released, changes are replicated, and sequence numbers are in sync (Sequence #: 22)
Witness Server Placement
Basic guidance for placement of witness server in Exchange
2010
We recommend that you use a Hub Transport server running on
Microsoft Exchange Server 2010 in the Active Directory site
containing the DAG. This allows the witness server and directory to
remain under the control of an Exchange administrator.
If your DAG is extended to multiple datacenters, we recommend
deploying the witness server in the datacenter that is considered to
be the primary datacenter.
Witness Server Placement
Exchange 2013 guidance more complicated due to
new options introduced by architectural changes
Options that were not recommended or possible in
previous versions of Exchange are now possible,
such as a third location (third physical datacenter or a
branch office)
Witness Server Placement
Ultimately, the placement of a DAG's witness server
depends on business requirements and the options
available to the organization
Deployment Scenario | Recommendations
Single DAG deployed in a single datacenter | Locate witness server in the same datacenter as DAG members
Single DAG deployed across two datacenters; no additional locations available | Locate witness server in primary datacenter
Multiple DAGs deployed in a single datacenter | Locate witness server in the same datacenter as DAG members. Additional options include: using the same witness server for multiple DAGs; using a DAG member to act as a witness server for a different DAG
Multiple DAGs deployed across two datacenters | Locate witness server in the same datacenter as DAG members. Additional options include: using the same witness server for multiple DAGs; using a DAG member to act as a witness server for a different DAG
Single or multiple DAGs deployed across more than two datacenters | Locate the witness server in the datacenter where you want the majority of quorum votes to exist
Witness Server Placement
A DAG's witness server can be deployed in a third
location for automatic site resilience
The third location must have network infrastructure
and connectivity that is isolated from network failures
that affect the two datacenters with Exchange
For all DAGs, the availability of the witness server
should be on the Exchange administrator's radar
Witness Server Placement
Windows Azure is not supported for use as a
Witness Server for Exchange DAGs
Azure does not support the required underlying
network configuration to enable an Azure file server
VM to act as a witness server
More info at http://aka.ms/DAGAzure
No IaaS or cloud providers are supported for
witness servers
Dynamic Quorum
Windows Server 2012+ Cluster feature
Enabled for all clusters by default
Cluster quorum majority is determined by the set of nodes
that are active members of the cluster at a given time
This is different from Windows Server 2008 R2, where
quorum majority is fixed, based on the cluster
configuration
Dynamic Quorum
Cluster dynamically manages vote assignment based on
state of node
When a node shuts down or crashes, it loses its vote
When a node rejoins the cluster, it regains its vote
Cluster can dynamically increase or decrease the
number of votes needed to maintain quorum and keep
running
Enables the cluster to maintain availability during
sequential node failures or shutdowns
Dynamic Quorum
It is now possible for a cluster to keep running on
the last surviving cluster node
If the cluster has quorum, number of votes needed
for quorum can be adjusted down to one node
This is called the Last Man Standing scenario
Dynamic Quorum
Dynamic quorum management does not allow the
cluster to sustain a simultaneous failure of a majority
of voting members
To continue running, the cluster must always have a
quorum majority at the time of a node shutdown or
failure
If you explicitly remove the vote of a node, the
cluster cannot dynamically add or remove that vote
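The difference between sequential and simultaneous failures can be illustrated with a toy model. It is a deliberately simplified sketch with loudly stated assumptions: votes are recounted after each batch of failures; when an even number of nodes remain, one survivor's vote is dropped so the counted total stays odd (the slides do not describe tie handling); the model is optimistic in that a non-voting node is assumed to fail first; and no witness is modeled. The function names are hypothetical.

```python
def majority(votes: int) -> int:
    return votes // 2 + 1


def survives(node_count: int, failure_batches: list, dynamic: bool = True) -> bool:
    """Return True if quorum is kept through every batch of failures.

    Each entry in failure_batches is the number of nodes lost at once.
    """
    up = node_count
    for batch in failure_batches:
        if dynamic:
            # Recount: only running nodes hold votes, kept to an odd total.
            votes = up if up % 2 == 1 else up - 1
            non_voters = up - votes
            lost_votes = max(0, batch - non_voters)  # optimistic assumption
            surviving_votes = votes - lost_votes
        else:
            # Fixed quorum: failed nodes still count toward the total.
            votes = node_count
            surviving_votes = up - batch
        up -= batch
        if up <= 0 or surviving_votes < majority(votes):
            return False
    return True


print(survives(5, [1, 1, 1, 1], dynamic=True))   # one node at a time, down to one
print(survives(5, [1, 1, 1, 1], dynamic=False))  # fixed majority of 5 is 3
print(survives(5, [3], dynamic=True))            # majority lost at once
```

Under these assumptions the five-node cluster reaches the last surviving node only when failures are sequential and votes are recounted; the simultaneous loss of three nodes loses quorum either way.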
Dynamic Quorum
Use Get-ClusterNode to verify DynamicWeight property of node
0 = no quorum vote
1 = quorum vote
Get-ClusterNode <Name> | ft name, *weight, state
Name DynamicWeight NodeWeight State
---- ------------- ---------- -----
EX1  1             1          Up
Verify vote assignment with Validate Cluster Quorum test
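One way to read the two weight columns together is sketched below. The helper `describe_vote` is hypothetical; it only encodes what the slides state, namely that a vote can be removed manually (NodeWeight) or dynamically (DynamicWeight), with 0 meaning no vote and 1 meaning a vote.

```python
def describe_vote(node_weight: int, dynamic_weight: int) -> str:
    """Interpret the NodeWeight / DynamicWeight pair for one node."""
    if node_weight == 0:
        return "no vote: removed manually (NodeWeight = 0)"
    if dynamic_weight == 0:
        return "no vote: removed dynamically by the cluster"
    return "has a quorum vote"


# EX1 from the sample output: DynamicWeight 1, NodeWeight 1.
print("EX1:", describe_vote(node_weight=1, dynamic_weight=1))
```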
Dynamic Quorum and DAGs
Dynamic quorum does work with DAGs
Exchange is not dynamic quorum-aware
Dynamic quorum does not change quorum
requirements for DAGs
All internal DAG testing is performed with dynamic
quorum enabled
Dynamic quorum is enabled in Office 365
Dynamic Quorum and DAGs
Cluster team guidance:
Selecting this option generally increases the availability of the
cluster. By default the option is enabled, and it is strongly
recommended to not disable this option. This option allows the
cluster to continue running in failure scenarios that are not
possible when this option is disabled.
Exchange team guidance:
Leave it enabled for majority of DAG members
Don't factor it into availability plans
The advantage is that, in some cases where 2008 R2 would have lost
quorum, 2012 can maintain quorum; this only applies to a few cases,
and should not be relied upon when planning a DAG
DAG Member Maintenance
Basic guidance for DAG member maintenance in
Exchange 2010
Run StartDagServerMaintenance.ps1 to put DAG member
in maintenance mode
Perform the maintenance (e.g., install the update rollup)
Run StopDagServerMaintenance.ps1 to take DAG
member out of maintenance mode and put it back into
production
Optionally rebalance the DAG by using
RedistributeActiveDatabases.ps1
DAG Member Maintenance
Exchange 2013 guidance more complicated
Go into Maintenance Mode
Set-ServerComponentState <Server> -Component HubTransport -State Draining -Requester Maintenance
Set-ServerComponentState <Server> -Component UMCallRouter -State Draining -Requester Maintenance
Restart-Service MSExchangeTransport
Redirect-Message -Server <Server> -Target <FQDNTarget>
Suspend-ClusterNode <Server>
Set-MailboxServer <Server> -DatabaseCopyActivationDisabledAndMoveNow $True
Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy Blocked
Set-ServerComponentState <Server> -Component ServerWideOffline -State Inactive -Requester Maintenance
Verify Maintenance Mode
Get-ServerComponentState <Server> | ft Component,State -Autosize
Get-MailboxServer <Server> | ft DatabaseCopy* -Autosize
Get-ClusterNode <Server> | fl
Get-Queue
DAG Member Maintenance
Exchange 2013 guidance more complicated
Go into Production Mode
Set-ServerComponentState <Server> -Component ServerWideOffline -State Active -Requester Maintenance
Set-ServerComponentState <Server> -Component UMCallRouter -State Active -Requester Maintenance
Resume-ClusterNode <Server>
Set-MailboxServer <Server> -DatabaseCopyActivationDisabledAndMoveNow $False
Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy Unrestricted
Set-ServerComponentState <Server> -Component HubTransport -State Active -Requester Maintenance
Restart-Service MSExchangeTransport
Verify Production Mode
Get-ServerComponentState <Server> | ft Component,State -Autosize
Get-MailboxServer <Server> | ft DatabaseCopy* -Autosize
Get-ClusterNode <Server> | fl
Get-Queue
Thank you!
Questions?
Scott Schnoll
scott.schnoll@microsoft.com
Twitter: @Schnoll
Blog: http://aka.ms/schnoll
