
Troubleshooting scenarios with SQL Server availability groups


David Barbarin

6.5 <= SQL Server <= 2016

mikedavem1@hotmail.com
david.barbarin@dbi-services.com

http://blog.developpez.com/mikedavem (French)
http://tinyurl.com/h5ufofp (English)

@mikedavem
Our Main Sponsors:
Say thank you to the volunteers:
• They spend their FREE time to give you this event.
• Because they are crazy.
• Because they want YOU to learn from the BEST IN THE WORLD.

Eduardo Piairo Diamantino Falcão Filipe Coelho

João Sarmento Nuno Rafael


Sponsor Sessions at 15:05

• Don't miss them, they might be distributing some awesome prizes!
Agenda

• Observation
• Availability groups toolbox
• Most common issues


Dealing with availability groups?

Observation
For some customers… but for others…
Top customer issues survey (~25 customers)
• Performance (SQL Server 2012 – SQL Server 2014)
• Timeout / quorum
• Active Directory / DNS
• Transaction log
• Unexpected / expected failovers
• AlwaysOn configuration
Availability groups toolbox

• Cluster
   • Windows event logs
   • Cluster log (UTC by default)
• SQL Server
   • XE sessions (system_health, AlwaysOn_health, SQLDIAG)
   • SQL Server error log
   • DMVs (sys.dm_hadr_*, sys.availability_*, sys.dm_os_*, sys.dm_io_* …)
   • Perfmon counters (SQLServer:Availability Replica / Database Replica / Databases, Processor:*, Network Interface:*, LogicalDisk:*)
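Because the cluster log is written in UTC while the SQL Server error log uses the server's local time, correlating the two usually starts with a timestamp conversion. The sketch below assumes a cluster log line layout of `<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL ...` (an assumption; check it against your own `Get-ClusterLog` output) and a fixed UTC offset; the function name and sample line are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# ASSUMPTION: cluster log lines look like "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL ...".
# Adjust this pattern if your cluster log output differs.
CLUSTER_LOG_TS = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s")


def cluster_ts_to_local(line: str, utc_offset_hours: float) -> Optional[datetime]:
    """Parse the UTC timestamp of a cluster log line and shift it to a fixed UTC offset."""
    match = CLUSTER_LOG_TS.search(line)
    if match is None:
        return None  # not a timestamped cluster log line
    utc_ts = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
    utc_ts = utc_ts.replace(tzinfo=timezone.utc)
    # A fixed offset ignores daylight saving changes: pass the offset valid at that date.
    return utc_ts.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Hypothetical cluster log line.
line = "00000ff8.00001a2c::2016/03/15-09:00:11.123 INFO  [RCM] example entry"
print(cluster_ts_to_local(line, 1))  # 2016-03-15 10:00:11.123000+01:00
```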
Troubleshooting – Quorum issues
• Quorum configuration that does not match the topology
• Quorum state not monitored by the IT / DBA team
• Connectivity issues with the cluster nodes / witness
• Unexpected behavior caused by an antivirus without the required exclusions
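A quorum that "does not match the topology" often shows up only when a whole site is lost. The sketch below applies the majority rule (strictly more than half of all votes must stay online) to a static vote layout: node votes per site plus one witness vote. It deliberately ignores dynamic quorum and dynamic witness, which adjust votes over time; site names and functions are hypothetical.

```python
from typing import Dict


def has_quorum(votes_online: int, total_votes: int) -> bool:
    # Majority rule: strictly more than half of all configured votes must be online.
    return votes_online > total_votes // 2


def quorum_after_site_loss(site_votes: Dict[str, int], witness_site: str, lost_site: str) -> bool:
    """Static model: node votes per site plus one witness vote located in witness_site."""
    total_votes = sum(site_votes.values()) + 1  # +1 for the witness
    online = sum(votes for site, votes in site_votes.items() if site != lost_site)
    if witness_site != lost_site:
        online += 1  # the witness survives the site failure
    return has_quorum(online, total_votes)


# Two datacenters with two voting nodes each.
print(quorum_after_site_loss({"DC1": 2, "DC2": 2}, witness_site="DC1", lost_site="DC1"))  # False
print(quorum_after_site_loss({"DC1": 2, "DC2": 2}, witness_site="DC3", lost_site="DC1"))  # True
```

Placing the witness in a third site is what lets the surviving datacenter keep a majority in this model.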
Troubleshooting – AD / DNS issues
• Active Directory permissions
• CNO (cluster name object) / VCO (virtual computer object)
• DNS
• Objects already existing in Active Directory

(Diagram: cluster admin, CNO, VCO)
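For the DNS part, a quick sanity check is to compare what the resolver returns for the listener name with the IPs you expect, for example one IP per subnet for a multi-subnet listener registered with RegisterAllProvidersIP. This is a minimal sketch using only the Python standard library; the listener name and IPs in the comment are hypothetical, and the result reflects the resolver of the machine running it (including its DNS cache).

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return every IPv4 address the OS resolver currently returns for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def missing_listener_ips(listener_name: str, expected_ips: Iterable[str]) -> Set[str]:
    """Expected listener IPs that DNS does not return (an empty set means all are registered)."""
    return set(expected_ips) - resolve_ipv4(listener_name)


# Hypothetical multi-subnet listener with one IP per subnet:
# missing_listener_ips("aglistener.contoso.local", ["10.0.1.50", "10.0.2.50"])
print(missing_listener_ips("127.0.0.1", ["127.0.0.1"]))  # set()
```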
Troubleshooting – Listener issues
• AD / DNS permissions
• IP conflict
• Port conflict
• Misconfigured listener

(Diagram: listener port, SQL Server port, endpoint port)
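A rough way to spot a port conflict before assigning a port (for instance the database mirroring endpoint port) is to try to bind it locally: if the bind fails, something already holds it. This sketch only covers the local machine, and a port reported as "in use" may simply be held by the local SQL Server instance itself, so confirm the owning process (e.g. with `netstat -ano`) before concluding. The function name and port value are illustrative assumptions.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Rough local check: True if binding host:port fails because something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True  # address already in use (or not bindable by this account)
    return False


# Hypothetical candidate endpoint port on a replica.
print("5022 in use" if port_in_use(5022) else "5022 free")
```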


Troubleshooting – Kerberos issues
• Misconfigured service accounts
• Misconfigured SPNs
• SPNs and read-only replicas
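SQL Server SPNs take the form `MSSQLSvc/<FQDN>:<port>`. Kerberos through the listener needs the listener's SPN, and because read-only routing sends clients directly to a replica, each routed replica needs its own SPN as well. The sketch below builds the expected list and compares it with the SPNs actually registered (which you might collect with `setspn -L <service account>`); all host names are hypothetical, and treating SPNs as case-insensitive is an assumption of this sketch.

```python
from typing import Iterable, List, Set, Tuple


def expected_spns(listener_fqdn: str, listener_port: int,
                  replicas: Iterable[Tuple[str, int]]) -> Set[str]:
    """SPNs in the MSSQLSvc/<fqdn>:<port> form for the listener and each replica."""
    spns = {f"MSSQLSvc/{listener_fqdn}:{listener_port}"}
    for fqdn, port in replicas:
        # Read-only routing sends clients straight to the replica, so its SPN matters too.
        spns.add(f"MSSQLSvc/{fqdn}:{port}")
    return spns


def missing_spns(expected: Set[str], registered: Iterable[str]) -> List[str]:
    """Expected SPNs absent from `registered` (compared case-insensitively)."""
    registered_lower = {spn.lower() for spn in registered}
    return sorted(spn for spn in expected if spn.lower() not in registered_lower)


# Hypothetical names; `registered` would come from the service account's SPN list.
wanted = expected_spns("aglistener.contoso.local", 1433,
                       [("sql1.contoso.local", 1433), ("sql2.contoso.local", 1433)])
print(missing_spns(wanted, ["MSSQLSvc/AGLISTENER.contoso.local:1433",
                            "MSSQLSvc/sql1.contoso.local:1433"]))
# ['MSSQLSvc/sql2.contoso.local:1433']
```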
Troubleshooting – TLog issues
• Transaction log backups don't occur
• No free space available
• Blocked redo thread
• Disconnected replica(s)
• Log truncation stuck in the AVAILABILITY_REPLICA state (log_reuse_wait_desc)
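Most of these causes surface through `log_reuse_wait_desc` in `sys.databases` combined with log usage (for example `used_log_space_in_percent` from `sys.dm_db_log_space_usage`). The sketch below assumes those values were already queried and only shows the triage logic: flag databases above a usage threshold and attach a hint. The `LogState` type, the hint texts, the 80% threshold and the database names are hypothetical choices for illustration.

```python
from typing import Iterable, List, NamedTuple


class LogState(NamedTuple):
    database: str
    log_reuse_wait_desc: str   # value read from sys.databases
    used_log_percent: float    # e.g. used_log_space_in_percent from sys.dm_db_log_space_usage


# Hints for the causes listed above; other wait values fall back to a generic message.
HINTS = {
    "LOG_BACKUP": "no recent transaction log backup: check the backup jobs",
    "AVAILABILITY_REPLICA": "a secondary has not received or redone the log yet: "
                            "check for a blocked redo thread or a disconnected replica",
}


def triage(states: Iterable[LogState], threshold: float = 80.0) -> List[str]:
    """One alert line per database whose log usage is at or above `threshold` percent."""
    alerts = []
    for state in states:
        if state.used_log_percent < threshold:
            continue
        hint = HINTS.get(state.log_reuse_wait_desc, "see the log_reuse_wait_desc documentation")
        alerts.append(f"{state.database}: {state.used_log_percent:.0f}% used, "
                      f"{state.log_reuse_wait_desc} -> {hint}")
    return alerts


for alert in triage([LogState("SalesDB", "AVAILABILITY_REPLICA", 91.0),
                     LogState("HRDB", "NOTHING", 12.0)]):
    print(alert)
```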
Troubleshooting – Secondaries

• Replication
   • Unhealthy states in sync / async mode
   • Asymmetric architecture
   • Join replica issues (error 35250)
   • Authentication and permissions on endpoint ports
   • Firewall / antivirus configuration
   • DNS lookup
• Read-only replicas
   • Misconfigured routes
   • Readable secondary option enabled but not used (/!\ storage impact: 14-byte row versioning overhead /!\)
   • Statistics and execution plan behaviors
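When a secondary is unhealthy or lagging, two rough numbers help: the time needed to redo the pending log (redo_queue_size / redo_rate, both exposed in `sys.dm_hadr_database_replica_states`) and, in asynchronous-commit mode, the commit lag between primary and secondary (difference of `last_commit_time`). The sketch assumes these values were already read from the DMV; the sample figures are made up.

```python
import math
from datetime import datetime


def estimated_recovery_seconds(redo_queue_size_kb: float, redo_rate_kb_per_s: float) -> float:
    """Time for the secondary to redo its queue: redo_queue_size / redo_rate."""
    if redo_rate_kb_per_s <= 0:
        return math.inf  # redo is not progressing (e.g. a blocked redo thread)
    return redo_queue_size_kb / redo_rate_kb_per_s


def estimated_data_loss_seconds(primary_last_commit: datetime,
                                secondary_last_commit: datetime) -> float:
    """Commit lag between primary and secondary (relevant in asynchronous-commit mode)."""
    return max(0.0, (primary_last_commit - secondary_last_commit).total_seconds())


# Hypothetical values read from sys.dm_hadr_database_replica_states on the primary.
print(estimated_recovery_seconds(51200, 1024))  # 50.0
print(estimated_data_loss_seconds(datetime(2016, 5, 10, 12, 0, 5),
                                  datetime(2016, 5, 10, 12, 0, 2)))  # 3.0
```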
Troubleshooting – Failover issues
• Two main categories of events
   • Unexpected failovers that occur (resource failures, server restarts, sqlservr.exe crashes, misconfigured failover policy …)
   • Expected failovers that do not occur (disk / quorum / registry issues)
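One policy setting worth checking when a failover does not happen is the cluster role's "Maximum failures in the specified period": once that many failures occurred within the window, the role is left failed instead of moving again. The commonly cited default is N-1 failures in 6 hours (N = number of nodes), but treat that as an assumption and check the role properties. The sketch below is a simplified sliding-window model of the rule, not the cluster service's exact algorithm; the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def failover_allowed(previous_failures: Iterable[datetime], now: datetime,
                     max_failures: int, period: timedelta) -> bool:
    """Simplified model: another failover is allowed only while fewer than
    `max_failures` failures happened within the last `period`."""
    recent = [t for t in previous_failures if now - t <= period]
    return len(recent) < max_failures


# Two-node cluster, assuming the N-1 = 1 failure in 6 hours default.
first = datetime(2016, 5, 10, 8, 0)
print(failover_allowed([], first, 1, timedelta(hours=6)))                            # True
print(failover_allowed([first], first + timedelta(hours=1), 1, timedelta(hours=6)))  # False
print(failover_allowed([first], first + timedelta(hours=7), 1, timedelta(hours=6)))  # True
```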
