SQL Server允许并发操作,BLOCKING是指在某一操作没有完成之前,其他操作必须等待,以便于保证数据的完整性。BLOCKING的解决方法要查看BLOCKING的头是什么,为什么BLOCKING头上的语句执行的很慢。通常来讲只要我们能找到BLOCKING头上的语句,我们总能够想出各种各种的办法,来提升性能,缓解或解决BLOCKING的问题。
但是问题的关键是,我们不知道BLOCKING什么时候会发生。用户跟我们抱怨数据库性能很差,等我们连上数据库去查看的时候,那时候有可能BLOCKING可能就已经过去了。性能又变好了。或者由于问题的紧急性,我们直接重新启动服务器以恢复运营。但是问题并没有最终解决,我们不知道下次问题会在什么时候发生。
BLOCKING问题的后果比较严重。因为终端用户能直接体验到。他们提交一个订单的时候,无论如何提交不上去,通常几秒之内能完成的一个订单提交,甚至要等待十几分钟,才能提交完成。更有甚者,极严重的BLOCKING能导致SQL Server停止工作。如下面的SQL ERRORLOG所表示, 在短短的几分钟之内,SPID数据从158增长到694, 并马上导致SQL Server打了一个dump, 停止工作。我们很容易推断出问题的原因是由于BLOCKING导致的,但是我们无法得知BLOCKING HEADER是什么,我们必须要等下次问题重现时,辅之以工具,才能得知BLOCKING HEADER在做什么事情。如果信息抓取时机不对,我们可能要等问题发生好几次,才能抓到。这时候,客户和经理就会有抱怨了。因为我们的系统是生产系统,问题每发生一次,都会对客户带来损失。
2011-06-01 16:22:30.98 spid1931 Alert There are 158 Active database sessions which is too high.
2011-06-01 16:23:31.16 spid3248 Alert There are 342 Active database sessions which is too high.
2011-06-01 16:24:31.06 spid3884 Alert There are 517 Active database sessions which is too high.
2011-06-01 16:25:31.08 spid3688 Alert There are 694 Active database sessions which is too high.
2011-06-01 16:26:50.93 Server Using 'dbghelp.dll' version '4.0.5'
2011-06-01 16:26:50.97 Server **Dump thread - spid = 0, EC = 0x0000000000000000
2011-06-01 16:26:50.97 Server ***Stack Dump being sent to D:\MSSQL10.INSTANCE\MSSQL\LOG\SQLDump0004.txt
2011-06-01 16:26:50.97 Server * *******************************************************************************
2011-06-01 16:26:50.97 Server *
2011-06-01 16:26:50.97 Server * BEGIN STACK DUMP:
2011-06-01 16:26:50.97 Server * 06/01/11 16:26:50 spid 4124
2011-06-01 16:26:50.97 Server *
2011-06-01 16:26:50.97 Server * Deadlocked Schedulers
2011-06-01 16:26:50.97 Server *
2011-06-01 16:26:50.97 Server * *******************************************************************************
2011-06-01 16:26:50.97 Server * -------------------------------------------------------------------------------
2011-06-01 16:26:50.97 Server * Short Stack Dump
2011-06-01 16:26:51.01 Server Stack Signature for the dump is 0x0000000000000258
BLOCKING的信息抓取有很多种方法。这里罗列了几种。并且对每种分析它的优缺点。以便我们选择。在枚举方法之前,我们先简单演示一下BLOCKING.
我们首先创建一个测试表:
DROP TABLE [TESTTABLE]
GO
CREATE TABLE [dbo] . [TESTTABLE] (
[ID] [int] NULL,
[NAME] [nvarchar] ( 50 ) NULL
)
GO
INSERT INTO TESTTABLE VALUES ( 1 , 'aaaa' )
GO
然后打开一个查询窗口,执行下面的语句, 该语句修改一行数据,并等待3分钟,然后在结束transaction
BEGIN TRANSACTION
UPDATE TESTTABLE SET [NAME] = 'bbbb' WHERE [ID] = 1
WAITFOR DELAY '00:03:00'
COMMIT TRANSACTION
这时候,如果打开另外一个查询窗口,执行下面的语句,下面的语句就会被BLOCK住。
UPDATE TESTTABLE SET [NAME] = 'cccc' WHERE [ID] = 1
方法一, 抓取SQL Profiler
======================
SQL Profiler里面包含大量的信息。其中有一个事件在Errors and Warnings->Blocked Process Report专门用来获得blocking的情况。但是因为信息量比较大,而且我们并不能很好的估算在什么时候会产生blocking,另外在生产环境使用Profiler, 对性能可能会有影响,所以SQL Profiler并不是最合适的工具。我们在这里并不对它赘述。
方法二, 执行查询
================
如果我们检查问题的时候,blocking还存在,那么,我们可以直接可以运行几个查询,得知BLOCKING HEADER的信息
SELECT * FROM sys . sysprocesses where spid > 50
上述查询只是告诉我们,BLOCKING HEADER的头是SPID=53, 但是并没有告诉我们SPID=53在做什么事情。我们可以用下面的查询,得到SPID=53的信息
DBCC INPUTBUFFER ( 53 )
我们可以把上述的两个查询合并起来,用下面的查询:
SELECT SPID = p . spid ,
DBName = convert ( CHAR ( 20 ), d . name ),
ProgramName = program_name ,
LoginName = convert ( CHAR ( 20 ), l . name ),
HostName = convert ( CHAR ( 20 ), hostname ),
Status = p . status ,
BlockedBy = p . blocked ,
LoginTime = login_time ,
QUERY = CAST ( TEXT AS VARCHAR ( MAX ))
FROM MASTER . dbo . sysprocesses p
INNER JOIN MASTER . dbo . sysdatabases d
ON p . dbid = d . dbid
INNER JOIN MASTER . dbo . syslogins l
ON p . sid = l . sid
CROSS APPLY sys . dm_exec_sql_text ( sql_handle )
WHERE p . blocked = 0
AND EXISTS ( SELECT 1
FROM MASTER .. sysprocesses p1
WHERE p1 . blocked = p . spid )
这样,一次执行,就能告诉我们BLOCKING header的SPID信息,以及该SPID在做的语句。我们可以进一步研究该语句,以理解为什么该语句执行很慢。
用这个方法有一个缺点,就是使用的时候,要求BLOCKING是存在的。如果BLOCKING已经消失了,那么我们的方法就不管用了。
方法三,长期执行一个BLOCKING SCRIPT
==================================
因为我们通常无法知道BLOCKING什么时候会产生,所以通常的办法是我们长期运行一个BLOCKING SCRIPT, 这样,等下次发生的时候,我们就会有足够的信息。长期运行BLOCKING SCRIPT对性能基本上是没有影响的。因为我们每隔10秒钟抓取一次信息。缺点是,如果问题一个月才发生一次,那么,我们的BLOCKING日志信息会很大。所以这种方法适用于几天之内能重现问题。
运行方法如下:
如果要停止运行,我们按ctrl+c就可以了。
BLOCKING的信息存在log.out这个文件中
我们可以打开log.out这个文件, 会发现SPID 54被 SPID 53给Block住了。
而随后,我们可以看到SPID=53在做什么事情:
下面是BLOCKING SCRIPT的脚本, 我们可以把它存为blocking.sql
use master
go
while 1 = 1
begin
print 'Start time: ' + convert ( varchar ( 26 ), getdate (), 121 )
Print 'Running processes'
select spid , blocked , waittype , waittime , lastwaittype , waitresource , dbid , uid , cpu , physical_io , memusage , login_time , last_batch ,
open_tran , status , hostname , program_name , cmd , net_library , loginame
from sysprocesses
--where (kpid <> 0 ) or (spid < 51)
-- Change it if you only want to see the working processes
print '*********lockinfor***********'
select convert ( smallint , req_spid ) As spid ,
rsc_dbid As dbid ,
rsc_objid As ObjId ,
rsc_indid As IndId ,
substring ( v . name , 1 , 4 ) As Type ,
substring ( rsc_text , 1 , 16 ) as Resource ,
substring ( u . name , 1 , 8 ) As Mode ,
substring ( x . name , 1 , 5 ) As Status
from master . dbo . syslockinfo ,
master . dbo . spt_values v ,
master . dbo . spt_values x ,
master . dbo . spt_values u
where master . dbo . syslockinfo . rsc_type = v . number
and v . type = 'LR'
and master . dbo . syslockinfo . req_status = x . number
and x . type = 'LS'
and master . dbo . syslockinfo . req_mode + 1 = u . number
and u . type = 'L'
order by spid
print 'inputbuffer for running processes'
declare @spid varchar ( 6 )
declare ibuffer cursor fast_forward for
select cast ( spid as varchar ( 6 )) as spid from sysprocesses where spid > 50
open ibuffer
fetch next from ibuffer into @spid
while ( @@fetch_status != - 1 )
begin
print ''
print 'DBCC INPUTBUFFER FOR SPID ' + @spid
exec ( 'dbcc inputbuffer (' + @spid + ')' )
fetch next from ibuffer into @spid
end
deallocate ibuffer
waitfor delay '0:0:10'
End
这种方法的缺陷就是,log.out会比较巨大,会占用很大的空间,如果blocking一个月甚至更长时间才发生一次,那我们的这个方法就不太适宜。
方法四,我们用Agent Job来检查BLOCKING
=====================================
长期运行一个BLOCKING SCRIPT的缺点是我们每隔一段时间,去查询信息,但是大多数收集的信息是无用的。所以会导致日志文件巨大,对于一个生产系统来讲,磁盘空间满可不是个好事情,另外,有一些客户对于用命令行来长期运行TSQL脚本有所顾忌,所以我们做了一个改进。这次,我们只收集有用的信息。对于无用的信息我们不关注。这样能极大减少日志大小。
我们首先创建一个观察数据库,然后建立两张表格 Blocking_sysprocesses和Blocking_SQLText, 建立一个存储过程和一个Job, 该Job每隔一段时间去调用存储过程。只有发现有blocking的,我们才记录到表格Blocking_sysprocesses和Blocking_SQLText这两个表格中。如果跟blocking无关,我们就不对它进行记录。下面是TSQL语句:
CREATE DATABASE [MonitorBlocking]
GO
USE [MonitorBlocking]
GO
CREATE TABLE Blocking_sysprocesses (
[spid] smallint ,
[kpid] smallint ,
[blocked] smallint ,
[waitType] binary ( 2 ),
[waitTime] bigInt ,
[lastWaitType] nchar ( 32 ),
[waitResource] nchar ( 256 ),
[dbID] smallint ,
[uid] smallint ,
[cpu] int ,
[physical_IO] int ,
[memusage] int ,
[login_Time] datetime ,
[last_Batch] datetime ,
[open_Tran] smallint ,
[status] nchar ( 30 ),
[sid] binary ( 86 ),
[hostName] nchar ( 128 ),
[program_Name] nchar ( 128 ),
[hostProcess] nchar ( 10 ),
[cmd] nchar ( 16 ),
[nt_Domain] nchar ( 128 ),
[nt_UserName] nchar ( 128 ),
[net_Library] nchar ( 12 ),
[loginName] nchar ( 128 ),
[context_Info] binary ( 128 ),
[sqlHandle] binary ( 20 ),
[CapturedTimeStamp] datetime
)
GO
CREATE TABLE [dbo] . [Blocking_SqlText] (
[spid] [smallint] ,
[sql_text] [nvarchar] ( 2000 ),
[Capture_Timestamp] [datetime]
)
GO
CREATE PROCEDURE [dbo] . [checkBlocking]
AS
BEGIN
SET NOCOUNT ON ;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
declare @Duration int -- in milliseconds, 1000 = 1 sec
declare @now datetime
declare @Processes int
select @Duration = 100 -- in milliseconds, 1000 = 1 sec
select @Processes = 0
select @now = getdate ()
CREATE TABLE #Blocks_rg (
[spid] smallint ,
[kpid] smallint ,
[blocked] smallint ,
[waitType] binary ( 2 ),
[waitTime] bigInt ,
[lastWaitType] nchar ( 32 ),
[waitResource] nchar ( 256 ),
[dbID] smallint ,
[uid] smallint ,
[cpu] int ,
[physical_IO] int ,
[memusage] int ,
[login_Time] datetime ,
[last_Batch] datetime ,
[open_Tran] smallint ,
[status] nchar ( 30 ),
[sid] binary ( 86 ),
[hostName] nchar ( 128 ),
[program_Name] nchar ( 128 ),
[hostProcess] nchar ( 10 ),
[cmd] nchar ( 16 ),
[nt_Domain] nchar ( 128 ),
[nt_UserName] nchar ( 128 ),
[net_Library] nchar ( 12 ),
[loginName] nchar ( 128 ),
[context_Info] binary ( 128 ),
[sqlHandle] binary ( 20 ),
[CapturedTimeStamp] datetime
)
INSERT INTO #Blocks_rg
SELECT
[spid] ,
[kpid] ,
[blocked] ,
[waitType] ,
[waitTime] ,
[lastWaitType] ,
[waitResource] ,
[dbID] ,
[uid] ,
[cpu] ,
[physical_IO] ,
[memusage] ,
[login_Time] ,
[last_Batch] ,
[open_Tran] ,
[status] ,
[sid] ,
[hostName] ,
[program_name] ,
[hostProcess] ,
[cmd] ,
[nt_Domain] ,
[nt_UserName] ,
[net_Library] ,
[loginame] ,
[context_Info] ,
[sql_Handle] ,
@now as [Capture_Timestamp]
FROM master. . sysprocesses where blocked <> 0
AND waitTime > @Duration
SET @Processes = @@rowcount
INSERT into #Blocks_rg
SELECT
src . [spid] ,
src . [kpid] ,
src . [blocked] ,
src . [waitType] ,
src . [waitTime] ,
src . [lastWaitType] ,
src . [waitResource] ,
src . [dbID] ,
src . [uid] ,
src . [cpu] ,
src . [physical_IO] ,
src . [memusage] ,
src . [login_Time] ,
src . [last_Batch] ,
src . [open_Tran] ,
src . [status] ,
src . [sid] ,
src . [hostName] ,
src . [program_name] ,
src . [hostProcess] ,
src . [cmd] ,
src . [nt_Domain] ,
src . [nt_UserName] ,
src . [net_Library] ,
src . [loginame] ,
src . [context_Info] ,
src . [sql_Handle]
, @now as [Capture_Timestamp]
FROM master. . sysprocesses src inner join #Blocks_rg trgt
on trgt . blocked = src . [spid]
if @Processes > 0
BEGIN
INSERT [dbo] . [Blocking_sysprocesses]
SELECT * from #Blocks_rg
DECLARE @SQL_Handle binary ( 20 ), @SPID smallInt ;
DECLARE cur_handle CURSOR FOR SELECT sqlHandle , spid FROM #Blocks_rg ;
OPEN cur_Handle
FETCH NEXT FROM cur_handle INTO @SQL_Handle , @SPID
WHILE ( @@FETCH_STATUS = 0 )
BEGIN
INSERT [dbo] . [Blocking_SqlText]
SELECT @SPID , CONVERT ( nvarchar ( 4000 ), [text] ) , @now as [Capture_Timestamp] from :: fn_get_sql ( @SQL_Handle )
FETCH NEXT FROM cur_handle INTO @SQL_Handle , @SPID
END
CLOSE cur_Handle
DEALLOCATE cur_Handle
END
DROP table #Blocks_rg
END
GO
USE msdb ;
GO
EXEC dbo . sp_add_job
@job_name = N'MonitorBlocking' ;
GO
EXEC sp_add_jobstep
@job_name = N'MonitorBlocking' ,
@step_name = N'execute blocking script' ,
@subsystem = N'TSQL' ,
@command = N'exec checkBlocking' ,
@database_name = N'MonitorBlocking';
GO
EXEC sp_add_jobSchedule
@name = N'ScheduleBlockingCheck' ,
@job_name = N'MonitorBlocking' ,
@freq_type = 4 , -- daily
@freq_interval = 1 ,
@freq_subday_type = 4 ,
@freq_subday_interval = 1
EXEC sp_add_jobserver @job_name = N'MonitorBlocking' , @server_name = N'(local)'
当Blocking发生一段时间后,我们可以查询下面的两个表格,以得知当时问题发生时的blocking信息:
use MonitorBlocking
GO
SELECT * from Blocking_sqlText
SELECT * FROM Blocking_sysprocesses