When you work as a DBA, many people will approach you with a complaint like "The application is
taking ages to load the data on a page; could you please check whether something is going wrong
with the database server?" There might be a hundred other reasons for the slowness of the page. It
might be a problem with the application server, network issues, a really bad implementation, or a
problem with the database server caused by a huge report or job running at that moment. Whatever
the issue, the database gets the blame first. Then it is our responsibility to cross-check the server state.
Let us discuss how we can approach this issue. I use the following scripts to diagnose such issues.
The first script that I run against the server is given below:
SELECT
parent_node_id AS Node_Id,
COUNT(*) AS [No.of CPU In the NUMA],
SUM(COUNT(*)) OVER() AS [Total No. of CPU],
SUM(runnable_tasks_count ) AS [Runnable Task Count],
SUM(pending_disk_io_count) AS [Pending disk I/O count],
SUM(work_queue_count) AS [Work queue count]
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
The number of rows in the output equals the number of NUMA nodes (if the query returns
only one row, the server is not a NUMA server).
Node_Id: NUMA node ID. It can be mapped to the output of the later scripts.
No.of CPU In the NUMA: Total number of CPUs assigned to the specific NUMA node, that is,
the number of schedulers.
Total No. of CPU: Total number of CPUs available in the server. If you have set the affinity
mask, this is the total number of CPUs assigned to this instance.
Runnable Task Count: Number of workers, with tasks assigned to them, that are waiting
to be scheduled on the runnable queue. The value is never NULL. In short, it is the number of
requests in the runnable queue. To understand more about the runnable queue, read my earlier post.
Pending disk I/O count: Number of pending I/Os that are waiting to be completed. Each
scheduler has a list of pending I/Os that are checked for completion every time there is a
context switch. The count is incremented when a request is inserted and decremented when
the request is completed.
Work queue count: Number of tasks in the pending queue. These tasks are waiting for a
worker to pick them up.
I have scheduled this script to store the output of the query in a table every 10 minutes for two
days. That gives baseline data about what is normal in your environment. In my environment,
people start complaining once the Runnable Task Count of most of the nodes goes beyond 10
consistently. In a normal scenario, the Runnable Task Count is always below 10 on each node, and
I have never seen a value greater than 0 for Work queue count. This gives a picture of the current
state of the system. If the output of this step is normal, we are safe to an extent: the slow
response might be an issue beyond our control, or it might be caused by blocking, with the slow
response limited to a couple of screens (sessions) rather than the entire system.
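A minimal sketch of such a baseline capture is given below. The table name dbo.SchedulerBaseline and its column names are hypothetical, chosen only for illustration. The INSERT is the statement that would be scheduled every 10 minutes (for example as a SQL Server Agent job step), and the last query lists the captures that cross the thresholds mentioned above, which may well be different in your environment.

```sql
-- Hypothetical baseline table; the name and columns are assumptions for this sketch.
IF OBJECT_ID(N'dbo.SchedulerBaseline', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.SchedulerBaseline
    (
        capture_time    DATETIME NOT NULL DEFAULT GETDATE(),
        node_id         INT      NOT NULL,
        cpu_in_node     INT      NOT NULL,
        runnable_tasks  INT      NOT NULL,
        pending_disk_io INT      NOT NULL,
        work_queue      BIGINT   NOT NULL
    );
END
GO

-- Statement to schedule every 10 minutes: one row per NUMA node per capture.
INSERT INTO dbo.SchedulerBaseline
    (node_id, cpu_in_node, runnable_tasks, pending_disk_io, work_queue)
SELECT
    parent_node_id,
    COUNT(*),
    SUM(runnable_tasks_count),
    SUM(pending_disk_io_count),
    SUM(work_queue_count)
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id;
GO

-- Captures that exceed the thresholds described above (10 runnable tasks, any queued work).
SELECT capture_time, node_id, runnable_tasks, work_queue
FROM dbo.SchedulerBaseline
WHERE runnable_tasks > 10 OR work_queue > 0
ORDER BY capture_time, node_id;
```

Keeping one row per node per capture makes it easy to see whether the pressure is on a single NUMA node or across the whole server.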
In Part 1, we saw how to quickly check the runnable tasks and pending I/O on a SQL Server
instance. It is a very lightweight script, so it returns a result even if the server is under
pressure, and it gives the overall state of the server at that moment.
The next step (Step 2) in my way of diagnosing is to check the sessions that are waiting for any
resource. The script below will help us. This query requires a function as a prerequisite, which
helps us display the SQL Server Agent job name if the session was started by SQL Server Agent.
/***************************************************************************************
PREREQUISITE FUNCTION
***************************************************************************************/
USE MASTER
GO
CREATE FUNCTION ConvertStringToBinary ( @hexstring VARCHAR(100)
) RETURNS BINARY(34) AS
BEGIN
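The rest of the function body is not reproduced here. As a rough alternative sketch (not the function above), SQL Server 2008 and later can do a similar lookup without a helper function, because CONVERT with style 1 turns a '0x...' hex string into binary, which can then be cast to the UNIQUEIDENTIFIER stored in msdb.dbo.sysjobs.job_id. The program name value below is made up for illustration.

```sql
-- Made-up program_name value in the format SQL Server Agent uses for T-SQL job steps.
DECLARE @program_name NVARCHAR(128) =
    N'SQLAgent - TSQL JobStep (Job 0x0123456789ABCDEF0123456789ABCDEF : Step 1)';

-- Take the 34-character hex string ('0x' + 32 hex digits) that follows '(Job'.
DECLARE @hex VARCHAR(34) =
    LTRIM(RTRIM(SUBSTRING(@program_name, CHARINDEX('(Job', @program_name) + 4, 35)));

-- Style 1 parses the '0x' prefixed string as binary; the 16 bytes map to the job_id.
SELECT 'SQL AGENT JOB: ' + j.name AS program_name
FROM msdb.dbo.sysjobs AS j
WHERE j.job_id = CAST(CONVERT(VARBINARY(16), @hex, 1) AS UNIQUEIDENTIFIER);
```

With the made-up value no job will match, so the query returns no rows; with a real program_name taken from sys.dm_exec_sessions it should return the job name.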
If there is a session with a very long wait_duration_ms that is not blocked by any other session
and does not drop off the list in subsequent executions of the same query, I look into the
program name, host name, login name and the statement that is running, which gives me an idea
about the session. Based on all this information, I might decide to kill that session and look into
the implementation of that SQL batch. If the session is blocked, I look into the blocking
session using a different script, which I will share later. (Refer to this post.)
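The check described above can be sketched as follows for a single suspect session. The session id 57 is a made-up value; the KILL statement is left commented out because it rolls back the session's open work and should be a deliberate decision.

```sql
DECLARE @session_id INT = 57;  -- hypothetical session id taken from the Step 2 output

-- Who owns the session, what is it running, and is anything blocking it?
SELECT
    es.session_id,
    es.host_name,
    es.program_name,
    es.login_name,
    er.status,
    er.command,
    er.wait_type,
    er.wait_time,
    er.blocking_session_id,
    st.text AS batch_text
FROM sys.dm_exec_sessions AS es
LEFT JOIN sys.dm_exec_requests AS er ON er.session_id = es.session_id
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) AS st
WHERE es.session_id = @session_id;

-- KILL accepts only a literal session id, not a variable.
-- KILL 57;
```

The LEFT JOIN and OUTER APPLY keep the row even when the session has no active request, so an idle session still shows its host, program and login.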
The next step (Step 3) is to list all sessions that are currently running on the server. I use the
query below to do that.
/***************************************************************************************
STEP 3: List the session which are currently waiting/running
***************************************************************************************/
SELECT node.parent_node_id AS Node_id,
es.HOST_NAME,
es.login_name,
CASE WHEN es.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN
(SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobs WHERE
job_id=ADMIN.DBO.ConvertStringToBinary
(LTRIM(RTRIM((SUBSTRING(es.program_name,CHARINDEX('(job',es.program_name,0)+4,35)))))
)ELSE es.program_name END AS program_name ,
DB_NAME(er.database_id) AS DatabaseName,
er.session_id,
wt.blocking_session_id,
wt.wait_duration_ms,
wt.wait_type,
wt.NoThread ,
er.command,
er.status,
er.wait_resource,
er.open_transaction_count,
er.cpu_time,
er.total_elapsed_time AS ElapsedTime_ms,
er.percent_complete ,
er.reads,er.writes,er.logical_reads,
wlgrp.name AS ResoursePool ,
SUBSTRING (sqltxt.TEXT,(er.statement_start_offset/2) + 1,
((CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), sqltxt.TEXT)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS [Individual Query],
sqltxt.TEXT AS [Batch Query]
FROM SYS.DM_EXEC_REQUESTS er
INNER JOIN SYS.DM_EXEC_SESSIONS es ON es.session_id = er.session_id
INNER JOIN SYS.DM_RESOURCE_GOVERNOR_WORKLOAD_GROUPS wlgrp ON wlgrp.group_id = er.group_id
INNER JOIN (SELECT os.parent_node_id, task_address
            FROM SYS.DM_OS_SCHEDULERS OS
            INNER JOIN SYS.DM_OS_WORKERS OSW ON OS.scheduler_address = OSW.scheduler_address
            WHERE os.status = 'VISIBLE ONLINE'
            GROUP BY os.parent_node_id, task_address) node ON node.task_address = er.task_address
LEFT JOIN (SELECT session_id, SUM(wait_duration_ms) AS wait_duration_ms, wait_type,
                  blocking_session_id, COUNT(*) AS NoThread
           FROM SYS.DM_OS_WAITING_TASKS
           GROUP BY session_id, wait_type, blocking_session_id) wt ON wt.session_id = er.session_id
CROSS APPLY SYS.DM_EXEC_SQL_TEXT(er.sql_handle) AS sqltxt
WHERE er.sql_handle IS NOT NULL
  AND ISNULL(wt.wait_type, '') NOT IN ('WAITFOR', 'BROKER_RECEIVE_WAITFOR')
ORDER BY er.total_elapsed_time DESC
GO
The columns are the same as those discussed in Step 2. I usually analyse the sessions with the
highest total_elapsed_time and take appropriate action, such as killing the session and looking
into the implementation. In most scenarios (where the server was running perfectly but all of a
sudden came to a standstill), I am able to fix the issue by following these steps. In the next
part, let us discuss blocking sessions and sessions with an open transaction that are not active.