SQLServerWiki

“The Only Thing That Is Constant Is Change”

Troubleshooting a PLE issue:

Posted by database-wiki on April 11, 2016

Problem Description:
===================
We observed that the SQL Server PLE (Page Life Expectancy) value had hit an all-time low in one of our environments.

Analysis:
========
Currently the server has 16 GB of memory, of which 12 GB has been allocated to the SQL Server instance.
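
To confirm the allocation, the instance-level memory cap can be read back with sp_configure. A minimal sketch (values are reported in MB, so the 12 GB cap above corresponds to 12288):

-- Expose the advanced options so that the memory settings are visible.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Current memory caps for the instance (values in MB; 12288 MB = 12 GB).
EXEC sp_configure 'min server memory (MB)';
EXEC sp_configure 'max server memory (MB)';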

This is not to say that the instance is under memory pressure. If that were the case, you would see the error below in the SQL Server error log, and that is a critical issue needing immediate attention.

“A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 1086400, committed (KB): 2160928, memory utilization: 50%.”

During the above scenario an activity called working set trimming happens, where Windows signals SQL Server to move pages from the buffer pool to the Windows page file. This happens as a result of memory contention faced by the entire server due to SQL Server, and working set trimming hugely degrades SQL Server performance.
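
One way to check whether the working set has actually been trimmed is to compare the physical memory in use by the SQL Server process with what it has committed. A minimal sketch using sys.dm_os_process_memory:

-- Physical vs. committed memory for the SQL Server process.
-- A large gap, or process_physical_memory_low = 1, suggests the working set has been trimmed.
SELECT physical_memory_in_use_kb,
virtual_address_space_committed_kb,
memory_utilization_percentage,
page_fault_count,
process_physical_memory_low,
process_virtual_memory_low
FROM sys.dm_os_process_memory;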

As far as the PLE issue is concerned, the SQL Server instance has started to suffocate due to the shortage of contiguous memory in the buffer pool for optimal memory management.

The expected (healthy) PLE value is generally calculated using the formula below.

PLE = (DataCacheSizeInGB / 4GB) * 300 seconds

So with 12 GB allocated to the instance, the value should ideally be (12 GB / 4 GB) * 300 = 900 seconds.
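
Rather than hard-coding the 12 GB, the same target can be derived from the current data cache size. A minimal sketch, assuming the Buffer Manager 'Database pages' counter (which reports the data cache size in 8 KB pages):

-- Expected PLE target = (data cache size in GB / 4) * 300 seconds.
SELECT cntr_value AS data_cache_pages,
cntr_value * 8 / 1048576.0 AS data_cache_gb,
(cntr_value * 8 / 1048576.0) / 4 * 300 AS expected_ple_seconds
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
AND [counter_name] = 'Database pages';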

The query below gives you the PLE value for the instance.

SELECT [object_name],
[counter_name],
[cntr_value]
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'

After monitoring the instance for some time, we saw this value fluctuate between as low as 5 and as high as 200.

This makes it evident that SQL Server is frequently loading and unloading pages from the buffer pool: all of the cached pages are in use, and then a query runs that reads a huge amount of different data from disk, displacing much of what is already in memory and causing a precipitous drop in PLE.
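
To see this churn over time rather than at a single point, PLE can be sampled in a loop. A minimal sketch (the 10-second interval and 30 samples are arbitrary choices):

-- Sample Page life expectancy every 10 seconds, 30 times, to catch the drops.
DECLARE @i INT = 0;
WHILE @i < 30
BEGIN
SELECT GETDATE() AS sample_time, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
AND [counter_name] = 'Page life expectancy';

WAITFOR DELAY '00:00:10';
SET @i = @i + 1;
END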

If you look at the DB1 pages in the buffer pool, they amount to close to 4 GB. At any given point, if SQL Server cannot find contiguous space to accommodate 4 GB worth of pages (every ETL cycle), it will flush the existing pages of the DB1 database file and load the other required pages.

Below is the query that gives you the data page size in MB for all the databases.

DECLARE @total_buffer INT;
SELECT @total_buffer = cntr_value
FROM sys.dm_os_performance_counters
WHERE RTRIM([object_name]) LIKE '%Buffer Manager'
AND counter_name = 'Total Pages';

;WITH BufCount AS
(
SELECT
database_id, db_buffer_pages = COUNT_BIG(*)
FROM sys.dm_os_buffer_descriptors
WHERE database_id BETWEEN 5 AND 32766
GROUP BY database_id
)
SELECT
[Database_Name] = CASE [database_id] WHEN 32767
THEN 'MSSQL System Resource DB'
ELSE DB_NAME([database_id]) END,
[Database_ID],
db_buffer_pages as [Buffer Count (8KB Pages)],
[Buffer Size (MB)] = db_buffer_pages / 128,
[Buffer Size (%)] = CONVERT(DECIMAL(6,3),
db_buffer_pages * 100.0 / @total_buffer)
FROM BufCount
ORDER BY [Buffer Size (MB)] DESC;

=> The query below shows which objects hold the largest chunk of pages in the buffer pool.

begin tran t1

select cached_MB,dd.name,s.name,s.index_id,s.type,s.type_desc,s.fill_factor from (
SELECT count(1)/128 AS cached_MB
, name
, index_id
FROM sys.dm_os_buffer_descriptors AS bd with (NOLOCK)
INNER JOIN
(
SELECT name = OBJECT_SCHEMA_NAME(object_id) + '.' + object_name(object_id)
--name = 'dbo.' + cast(object_id as varchar(100))
, index_id
, allocation_unit_id
FROM sys.allocation_units AS au with (NOLOCK)
INNER JOIN sys.partitions AS p with (NOLOCK)
ON au.container_id = p.hobt_id
AND (au.type = 1 OR au.type = 3)
UNION ALL
SELECT name = OBJECT_SCHEMA_NAME(object_id) + '.' + object_name(object_id)
--name = 'dbo.' + cast(object_id as varchar(100))
, index_id
, allocation_unit_id
FROM sys.allocation_units AS au with (NOLOCK)
INNER JOIN sys.partitions AS p with (NOLOCK)
ON au.container_id = p.partition_id
AND au.type = 2
) AS obj
ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = db_id()
GROUP BY name, index_id
HAVING Count(*) > 128
) as dd
inner join sys.indexes s on dd.index_id=s.index_id and object_id(dd.name)=s.[object_id]
order by cached_MB desc

rollback tran t1

 

=> Pages from the clustered index being loaded into the buffer pool means there is a query doing an index scan on the object.

=> The query below is used to get the execution plans that reference the index on the table with a large number of pages in the buffer pool.

DECLARE @IndexName SYSNAME = '[PK_DB1]';
DECLARE @DatabaseName SYSNAME;

SELECT @DatabaseName = '[' + DB_NAME() + ']';

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS sql_text,
n.query('.'),
cp.plan_handle,
i.value('(@PhysicalOp)[1]', 'VARCHAR(128)') AS PhysicalOp,
i.value('(./IndexScan/@Lookup)[1]', 'VARCHAR(128)') AS IsLookup,
i.value('(./IndexScan/Object/@Database)[1]', 'VARCHAR(128)') AS DatabaseName,
i.value('(./IndexScan/Object/@Schema)[1]', 'VARCHAR(128)') AS SchemaName,
i.value('(./IndexScan/Object/@Table)[1]', 'VARCHAR(128)') AS TableName,
i.value('(./IndexScan/Object/@Index)[1]', 'VARCHAR(128)') as IndexName,
i.query('.'),
STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
FROM i.nodes('./OutputList/ColumnReference') AS t(cg)
FOR XML PATH('')),1,2,'') AS output_columns,
STUFF((SELECT DISTINCT ', ' + cg.value('(@Column)[1]', 'VARCHAR(128)')
FROM i.nodes('./IndexScan/SeekPredicates/SeekPredicateNew//ColumnReference') AS t(cg)
FOR XML PATH('')),1,2,'') AS seek_columns,
RIGHT(i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)'), len(i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)')) - charindex('.', i.value('(./IndexScan/Predicate/ScalarOperator/@ScalarString)[1]', 'VARCHAR(4000)'))) as Predicate,
cp.usecounts,
query_plan
FROM ( SELECT plan_handle, query_plan
FROM ( SELECT DISTINCT plan_handle
FROM sys.dm_exec_query_stats WITH(NOLOCK)) AS qs
OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) tp
) as tab (plan_handle, query_plan)
INNER JOIN sys.dm_exec_cached_plans AS cp
ON tab.plan_handle = cp.plan_handle
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/*') AS q(n)
CROSS APPLY n.nodes('.//RelOp[IndexScan/Object[@Index=sql:variable("@IndexName") and @Database=sql:variable("@DatabaseName")]]') as s(i)
--WHERE i.value('(./IndexScan/@Lookup)[1]', 'VARCHAR(128)') = 1
OPTION(RECOMPILE, MAXDOP 1);

=> Query to find the buffer pool usage by objects:

IF OBJECT_ID('TempDB..#BufferSummary') IS NOT NULL BEGIN
DROP TABLE #BufferSummary
END

IF OBJECT_ID('TempDB..#BufferPool') IS NOT NULL BEGIN
DROP TABLE #BufferPool
END

CREATE TABLE #BufferPool
(
Cached_MB Int
, Database_Name SysName
, Schema_Name SysName NULL
, Object_Name SysName NULL
, Index_ID Int NULL
, Index_Name SysName NULL
, Used_MB Int NULL
, Used_InRow_MB Int NULL
, Row_Count BigInt NULL
)

SELECT Pages = COUNT(1)
, allocation_unit_id
, database_id
INTO #BufferSummary
FROM sys.dm_os_buffer_descriptors
GROUP BY allocation_unit_id, database_id

DECLARE @DateAdded SmallDateTime
SELECT @DateAdded = GETDATE()

DECLARE @SQL NVarChar(4000)
SELECT @SQL = ' USE [?]
INSERT INTO #BufferPool (
Cached_MB
, Database_Name
, Schema_Name
, Object_Name
, Index_ID
, Index_Name
, Used_MB
, Used_InRow_MB
, Row_Count
)
SELECT sum(bd.Pages)/128
, DB_Name(bd.database_id)
, Schema_Name(o.schema_id)
, o.name
, p.index_id
, ix.Name
, i.Used_MB
, i.Used_InRow_MB
, i.Row_Count
FROM #BufferSummary AS bd
LEFT JOIN sys.allocation_units au ON bd.allocation_unit_id = au.allocation_unit_id
LEFT JOIN sys.partitions p ON (au.container_id = p.hobt_id AND au.type in (1,3)) OR (au.container_id = p.partition_id and au.type = 2)
LEFT JOIN (
SELECT PS.object_id
, PS.index_id
, Used_MB = SUM(PS.used_page_count) / 128
, Used_InRow_MB = SUM(PS.in_row_used_page_count) / 128
, Used_LOB_MB = SUM(PS.lob_used_page_count) / 128
, Reserved_MB = SUM(PS.reserved_page_count) / 128
, Row_Count = SUM(row_count)
FROM sys.dm_db_partition_stats PS
GROUP BY PS.object_id
, PS.index_id
) i ON p.object_id = i.object_id AND p.index_id = i.index_id
LEFT JOIN sys.indexes ix ON i.object_id = ix.object_id AND i.index_id = ix.index_id
LEFT JOIN sys.objects o ON p.object_id = o.object_id
WHERE database_id = db_id()
GROUP BY bd.database_id
, o.schema_id
, o.name
, p.index_id
, ix.Name
, i.Used_MB
, i.Used_InRow_MB
, i.Row_Count
HAVING SUM(bd.pages) > 128
ORDER BY 1 DESC;'

EXEC sp_MSforeachdb @SQL

SELECT Cached_MB
, Pct_of_Cache = CAST(Cached_MB * 100.0 / SUM(Cached_MB) OVER () as Dec(20,3))
, Pct_Index_in_Cache = CAST(Cached_MB * 100.0 / CASE Used_MB WHEN 0 THEN 0.001 ELSE Used_MB END as DEC(20,3))
, Database_Name
, Schema_Name
, Object_Name
, Index_ID
, Index_Name
, Used_MB
, Used_InRow_MB
, Row_Count
FROM #BufferPool
ORDER BY Cached_MB DESC

=> Once you get the execution plan, check whether the plan is optimal, i.e. whether there are any missing indexes leading to large scans; a sketch for surfacing missing-index suggestions follows below.
=> In our case, creating the missing indexes increased the PLE value to an optimal level.
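
As referenced above, the missing-index DMVs are one way to surface index suggestions for the current database. A minimal sketch (treat the output as candidates to review against the cached plans, not as indexes to create blindly):

-- Missing-index suggestions recorded by the optimizer for this database,
-- ordered by a rough estimate of their potential benefit.
SELECT TOP (20)
mid.statement AS table_name,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns,
migs.user_seeks,
migs.avg_total_user_cost,
migs.avg_user_impact,
migs.user_seeks * migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) AS estimated_improvement
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
ON mig.index_group_handle = migs.group_handle
WHERE mid.database_id = DB_ID()
ORDER BY estimated_improvement DESC;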
