Azure SQL Database tips
This wiki page describes get-sqldb-tips.sql, a T-SQL script that produces tips to improve database design, health, and performance in Azure SQL Database. Tips are based on well-established best practices for the MSSQL database engine, with focus on Azure SQL Database specifics.
The script works on all Azure SQL Database service tiers, including Basic, Standard, Premium, General Purpose, Business Critical, and Hyperscale. Both single databases and databases in elastic pools are supported.
The script checks over 50 conditions to produce tips. New tips are added based on our customer engagements and community feedback.
After reading this wiki page, see FAQ for answers to common questions. For changes in each release, see Change log.
Download the latest release of the script. Unzip, and open the get-sqldb-tips.sql file in a query window in SQL Server Management Studio (SSMS), Azure Data Studio (ADS), Query editor in Azure portal, or any MSSQL client tool of your choice. Connect the query window to your target database and execute the script. If your database uses compatibility level 100, use the get-sqldb-tips-compat-level-100-only.sql file instead.
The script usually executes in about a minute or less. It may take longer on resource-constrained service objectives such as Basic, S0-S2, or when CPU utilization is high.
The result set has the following columns:
- tip_id: Unique numeric identifier for a tip, included in tip descriptions on this page.
- description: Short description of a tip.
- confidence_percent: Our degree of confidence that following the tip is worthwhile and will result in a net benefit. Tips with lower confidence may require additional analysis and testing in the context of a specific application and workload. In the right circumstances, even tips with relatively low confidence can produce substantial benefits. This confidence value applies to the tip definition and does not change from one execution of the script to the next.
- additional_info_url: Pasting this URL into a browser window opens a detailed tip description on this wiki page. Descriptions include recommended actions, sample commands and queries, and an explanation of the data in the details column. There are also links to additional resources (documentation, blogs, etc.) that provide background information on the topic.
- details: Additional data to help you act on a tip, for example the list of top queries, or the list of tables/indexes that are data compression candidates. In SSMS and ADS, clicking on this column displays the data in a new window for easier reading. These details can change from one execution of the script to the next as database state changes.
To return tips in a single JSON document instead of the default tabular format, set the @JSONOutput variable to 1. This is useful if you want to send script results to others in a portable format, or if you are using Query editor in Azure portal and want to see the entire details column.
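As a minimal sketch, assuming the variable declaration near the top of the script looks like it does in recent releases (check your copy of the script for the exact line):
-- Near the top of get-sqldb-tips.sql, change the default value of the output format variable
DECLARE @JSONOutput bit = 1; -- 1 = return a single JSON document; 0 (default) = tabular output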
- For each tip in the result, make sure to read the description on this wiki page using the link in additional_info_url, especially if the meaning of data in the details column is not immediately clear.
- Execute the script while a representative workload is running. This could mean running the script on a production database, or on a representative copy of a production database. Without a representative workload running, or with a smaller (e.g. non-production) database, relevant tips may be missed, or less accurate tips may be generated. If workload type and intensity change over time, consider executing the script at different points in time to get all relevant tips.
- You can use the script as a troubleshooting tool, executing it when a performance or stability problem is occurring. This has a higher chance of producing specific tips to help you fix the problem.
- Some tips are based on system metadata accumulated since database engine startup. In those cases, the details column includes engine startup time. If the time passed since engine startup is short, consider rerunning the script when more time has passed, so that the tip is based on more complete and representative system metadata.
- The script can be executed on either the primary or a readable secondary replica of a database, including read scale-out replicas and geo-replicas. Some tips are only produced on the primary or only on a secondary replica. When the database has any readable secondary replicas, we recommend executing the script on the primary and on each secondary replica, because results may differ.
In this example, there are seven tips returned by the script. For most tips, there is additional information in the details column.
The security principal executing the script must be either:
- A server administrator using SQL authentication or Azure Active Directory (AAD) authentication;
- A server login with an associated database user in the target database. The server login must be a member of the ##MS_ServerStateReader## server role. Additionally, to evaluate certain tips, the VIEW DEFINITION permission on the database is required.
Here is a code sample that grants the necessary and sufficient access to execute the script and evaluate all tips to a login named tips:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER tips;
ALTER SERVER ROLE [##MS_DefinitionReader##] ADD MEMBER tips;
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
This code grants permissions equivalent to a union of VIEW SERVER STATE and VIEW DEFINITION, which is required to query the system metadata views used in the script. The ##MS_ServerStateReader## and ##MS_DefinitionReader## server roles do not grant any additional permissions; specifically, they do not grant permissions to modify database schema or to read data.
The above permissions are sufficient to execute the script on any service tier. This is the recommended way to run the script because it ensures that all potential tips are evaluated.
On single databases in the Standard (S2 and higher), General Purpose, Premium, Business Critical, and Hyperscale service tiers, for nearly all tips it is sufficient to hold only the VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, and VIEW DEFINITION permissions on the database. These can be granted as follows:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO tips;
Alternatively, using database-scoped authentication, i.e. without creating a server login in the master database:
-- In the user database where tips are to be produced
-- Using SQL authentication
CREATE USER tips WITH PASSWORD = 'strong-password-here';
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO tips;
-- Or, using Azure Active Directory (AAD) authentication
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO [[email protected]];
Note that for Basic, S0, and S1 single databases, and for any database in an elastic pool, VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, and VIEW DEFINITION are sufficient to produce only a few tips; grant the recommended permissions described in the previous section instead.
If tips were skipped because of insufficient permissions, they will be listed in the second result set that describes skipped tips.
By default, all tips are evaluated. If you prefer to skip some tips, you can easily edit tip metadata in the script to not run the corresponding queries.
To do that, search the script for the Define all tips comment to find the T-SQL INSERT statement where tips are defined. In the VALUES clause, change the first column value for the tips you want to exclude from 1 to 0. In the following example, tip 1310 (Last partitions...) will be evaluated, while tip 1320 (Top queries...) will be skipped:
(1, 1310, 'Last partitions are not empty', ...),
(0, 1320, 'Top queries should be investigated and tuned', ...),
If any tips were skipped because of user-specified exclusions, they will be listed in the second result set that describes skipped tips.
Tips are produced by analyzing database engine metadata in system views. For many tips, heuristics (based on configurable thresholds) are used, so the output is produced with a finite degree of confidence, as indicated in the confidence_percent column. For each tip, our goal was to make it useful for a significant portion of our customers, not necessarily for every customer, which would be impossible given the broad variety of designs and workloads across millions of databases in Azure SQL Database.
The script is lightweight and does not query any user data. To avoid impacting workloads, the script checks current CPU utilization, and aborts before executing any other queries if CPU utilization is very high. The script also aborts any query it is running if the query is blocked for more than a few seconds waiting on a lock, to avoid contributing to blocking chains. If any tips were skipped because lock timeout was exceeded, they will be listed in the second result set that describes skipped tips.
The database where you run the script must use compatibility level 110 or higher, which is the case for the vast majority of databases in Azure SQL Database. For databases using compatibility level 100, we provide a separate script, get-sqldb-tips-compat-level-100-only.sql. When using that script, data in the details column may not be sorted in any particular order.
To determine the compatibility level of your database, execute the following query in the context of the database (not in the master database):
SELECT name, compatibility_level
FROM sys.databases
WHERE database_id = DB_ID();
This section includes detailed descriptions for tips that may be produced by the script. Descriptions below are not in any particular order, and should be viewed as collateral to script output, not as a set of general recommendations.
Maximum degree of parallelism (MAXDOP) for primary and secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for both primary and secondary replicas
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = PRIMARY;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for the primary replica is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the primary replica
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the secondary replicas
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Database compatibility level is not current. Recently added database engine features and improvements may not be available when using older compatibility levels. Consider using the latest supported compatibility level, but note that changing the compatibility level may require functional and performance testing before and after the change, known as A/B testing.
-- Adjust to use the latest compatibility level
-- To find the latest compatibility level in Azure SQL Database, see the first link in tip description
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
Auto-create statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-create statistics.
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;
Auto-update statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-update statistics.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
Read Committed Snapshot Isolation (RCSI) is disabled. This may cause unnecessary lock blocking, such as read queries blocking write queries. RCSI is the preferred alternative to the NOLOCK hint, which should not be used in scenarios that require correct query results.
The sample query below enables RCSI for a database. Note that this command requires exclusive database access, and will terminate all connections to the database after waiting for 5 seconds to give open transactions an opportunity to complete.
-- Execute in the master database, and change to reference the actual database name
ALTER DATABASE [database-name] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK AFTER 5 SECONDS;
SET TRANSACTION ISOLATION LEVEL
The Dirty Secrets of NOLOCK, a free course by Kendra Little
Query Store is disabled. This complicates performance tuning and troubleshooting, and could disable certain Intelligent Query Processing features that can improve query performance and reduce resource consumption. Enable Query Store.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store operation mode is READ_ONLY. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption. The details column provides the reason for the read-only state. Change Query Store operation mode to READ_WRITE.
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store capture mode is NONE. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption.
Execute one of the commands below to change Query Store query capture mode to AUTO or ALL depending on requirements, or use a custom query capture mode that captures all queries relevant to your workload.
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
AUTO_SHRINK is enabled. An occasional shrinking of a database may be needed in response to a one-time data deletion event and/or to reduce excessive allocated but unused space in data files. However, shrinking should not be executed continuously by keeping auto-shrink enabled at all times, because it causes persistent and high resource utilization that will negatively impact workload performance. Auto-shrink should be disabled for the vast majority of databases.
ALTER DATABASE CURRENT SET AUTO_SHRINK OFF;
Considerations for AUTO_SHRINK
Recommendations and guidelines for setting the AUTO_SHRINK database option
Btree indexes have uniqueidentifier (GUID) leading columns. For larger tables, the details column contains a list of btree indexes with uniqueidentifier leading columns. Such indexes are subject to low page density and high fragmentation as data is modified. This leads to increased memory and disk space usage, and negatively impacts performance.
Avoid this pattern in physical database design, particularly for clustered btree indexes. Rebuild indexes periodically when page density is less than 70%, either manually or using a database maintenance solution.
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
Index Architecture and Design Guide
Optimize index maintenance to improve query performance and reduce resource consumption
Database maintenance solution by Ola Hallengren
FORCE_LAST_GOOD_PLAN auto-tuning option is not enabled. Query plan regressions will not be fixed automatically via plan forcing. To avoid performance issues due to unexpected query plan regressions, enable this auto-tuning option.
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
This tip may only appear on the primary replica.
Automatic tuning in Azure SQL Database
Enable automatic tuning in the Azure portal
Used data size within the database is close to maximum configured database size (MAXSIZE). To allow continued data growth, increase maximum database size proactively, or scale up to a service tier or service objective that supports higher maximum database size, or implement data compression, or delete unneeded data.
If used data size is approaching 4 TB, plan on reducing used data size by compressing or deleting data, or switch to the Hyperscale service tier.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Increase MAXSIZE to 4 TB
ALTER DATABASE [database_name] MODIFY (MAXSIZE = 4096 GB); -- adjust database name and MAXSIZE value according to requirements
Manage file space for databases in Azure SQL Database
What happens when database storage limit is reached
Storage size allocated for data files is close to maximum configured database size (MAXSIZE). If used size is not close to MAXSIZE and significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be executed on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Storage size allocated for data files is much larger than used data size. If significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be used on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Significant CPU throttling has recently occurred, as noted in the details column. If workload performance has been inadequate during that time, tune the query workload to consume less CPU, scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries ranked high by cpu time.
CPU throttling is common on lower and resource-constrained service objectives, e.g. Basic, S0, S1, etc.
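To see recent CPU and other resource utilization for the database directly, you can query the sys.dm_db_resource_stats view, which reports 15-second intervals covering roughly the last hour; a minimal sketch:
-- Recent resource utilization for this database, most recent intervals first
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;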
What happens when CPU resource limits are reached
Out of memory errors have recently occurred, as noted in the details column. Tune the query workload to consume less memory, scale up to a service objective with more memory, or both. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
What happens when memory resource limits are reached
Memory grant waits and/or timeouts have recently occurred. The data in the details column notes the number of recent requests for memory grants that had to wait (manifested as RESOURCE_SEMAPHORE waits), and the number of such requests that timed out while waiting. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
Tune query workload to use smaller memory grants, or reduce the number of queries running concurrently, or scale up to a service objective with more memory.
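To see how much time the workload has spent waiting on memory grants since database engine startup, a query along these lines can be used (a sketch based on the database-scoped sys.dm_db_wait_stats view):
-- Cumulative memory grant (RESOURCE_SEMAPHORE) waits since engine startup
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type LIKE 'RESOURCE_SEMAPHORE%';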
What happens when memory resource limits are reached
Nonclustered indexes with low reads found. The details column contains a list of non-unique nonclustered indexes where the number of index read operations is much lower than the number of index write (update) operations.
As data changes, indexes must be updated, which requires CPU and IO resources and slows down DML statements. The resource overhead of updating indexes that have few reads may outweigh their benefit, if any. This overhead is particularly noticeable when there are many indexes with low reads in the database, a phenomenon known as "over-indexed" databases.
If the data in details covers a sufficiently long period during which all critical workloads have run, drop these indexes. Then, monitor performance to validate that no significant negative impact has occurred. If a performance regression is unacceptable, recreate some or all of the dropped indexes.
When running query workloads on readable secondary replicas, use the sample query below to collect index read and write statistics from all replicas and add them up for each index. An index that has low reads on the primary replica may have high reads on readable replicas, and may be required for that reason.
-- Index read (seeks, scans, lookups) and write (updates) statistics for each nonclustered index not used for constraint enforcement
SELECT QUOTENAME(OBJECT_SCHEMA_NAME(o.object_id)) AS schema_name,
QUOTENAME(o.name) AS object_name,
QUOTENAME(i.name) AS index_name,
ius.user_seeks,
ius.user_scans,
ius.user_lookups,
ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
ON ius.object_id = i.object_id
AND
ius.index_id = i.index_id
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND
ius.object_id = o.object_id
WHERE ius.database_id = DB_ID()
AND
i.type_desc = 'NONCLUSTERED'
AND
i.is_primary_key = 0
AND
i.is_unique_constraint = 0
AND
i.is_unique = 0
AND
o.is_ms_shipped = 0;
Index Architecture and Design Guide
ROW or PAGE data compression opportunities found. The details column contains a list of partition ranges for each object and index where data compression may be beneficial, based on recent workload sampling and heuristics. The current and suggested new data compression types are shown. The list is sorted with the largest objects first, letting you focus on compressing them first to maximize space and memory savings. The impact of compressing smaller objects is proportionally smaller. However, consider future growth when deciding what to compress: it is easier to compress a table while it is small than once it has become large.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
In the following example of data in the details column, the suggested data compression is ROW for the first 5 partitions, and PAGE for the last 5 partitions, of the clustered index on the usertable table. The total size of all partitions in each range, and a breakdown of that size by storage type, are included. Note that row overflow and LOB storage is not compressible.
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 1-5, partition range total size (MB): 5,319.10 (in-row: 5,319.10, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: ROW
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 6-10, partition range total size (MB): 6,216.36 (in-row: 6,216.36, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: PAGE
ROW compression adds minimal CPU utilization while providing substantial storage and memory savings. For most workloads, ROW compression should be enabled by default for new tables and indexes.
PAGE compression adds higher CPU utilization, but also provides higher storage and memory savings.
-- Create an index with ROW compression
CREATE INDEX ix_table1_1 ON dbo.table1 (column1, column2) WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=ROW);
-- Rebuild an index and use PAGE compression
ALTER INDEX ix_table1_2 ON dbo.table1 REBUILD WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=PAGE);
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
This tip may only appear on the primary replica.
Data Compression: Strategy, Capacity Planning and Best Practices
There are recent occurrences of transaction log IO (aka log rate) approaching the limit of the service objective, as noted in the details column. To improve performance of bulk data modifications, including data loading, consider tuning the workload to reduce log IO, or scale up to a service objective with a higher log IO limit.
Additional solutions and workarounds include:
- Using data compression (ROW, PAGE, columnstore) to reduce effective log IO;
- Loading transient data to be further transformed/aggregated into the tempdb database (e.g. using global temporary tables), or into non-durable memory-optimized tables. If using global temporary tables (e.g. ##tablename), set the GLOBAL_TEMPORARY_TABLE_AUTO_DROP database-scoped configuration to OFF, as shown below, to retain global temporary tables until the database engine restarts, or until they are dropped explicitly.
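For example, the database-scoped configuration mentioned in the last item above can be changed as follows:
-- Retain global temporary tables until the database engine restarts, or until they are dropped explicitly
ALTER DATABASE SCOPED CONFIGURATION SET GLOBAL_TEMPORARY_TABLE_AUTO_DROP = OFF;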
This tip may only appear on the primary replica.
Transaction log rate governance
A significant part of plan cache is occupied by single-use plans, which have never been reused by query executions and therefore use memory unnecessarily.
To avoid caching single-use plans for a database and reduce memory consumption, enable the OPTIMIZE_FOR_AD_HOC_WORKLOADS database-scoped configuration for each database where single-use plans occupy a large amount of memory, as noted in the details column. Review application code to ensure that all parameter data type declarations match table column data types exactly, including the size, precision, and scale of the column, and are not derived from each value assigned to the parameter.
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
ALTER DATABASE SCOPED CONFIGURATION
Queries could benefit from indexes that do not currently exist. The details column includes data to help create new indexes, based on metadata collected since database engine startup.
For more accurate indexing recommendations based on longer-term historical data, and including CREATE INDEX and DROP INDEX statements, see performance recommendations for this database in the Azure portal, or use the Get-AzSqlDatabaseRecommendedAction PowerShell cmdlet.
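As a starting point for manual analysis, the missing index DMVs can also be queried directly; a sketch (the ranking expression shown here is an assumption, not part of the script):
-- Missing index suggestions accumulated since database engine startup
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
INNER JOIN sys.dm_db_missing_index_groups AS g
ON d.index_handle = g.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS s
ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;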
Index Architecture and Design Guide
Redo queue on a secondary readable replica is large. Queries running on this replica may experience significant data latency relative to the primary replica. A large redo queue may also cause longer failovers if the replica is used as the failover target.
Common causes of a large redo queue are CPU and/or IO resource contention on the secondary replica, or blocking of the redo thread due to long-running queries on the secondary replica and concurrent DDL operations on the primary replica.
Tune or reduce the workload on the secondary replica to use fewer resources, and/or increase the service objective. Avoid DDL operations such as schema changes, index rebuilds, or statistics updates on the primary replica while there are long-running queries executing on the secondary replica.
This tip is produced only on readable secondary replicas, including non-Hyperscale read scale-out replicas and geo-replicas.
Monitoring and troubleshooting read-only replicas
Data IOPS are close to the workload group limit. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Workload group IO governance impact is significant. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Data IOPS are close to the resource pool limit. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource pool defines resource governance limits for an elastic pool.
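To review recent resource utilization at the elastic pool level, you can query sys.elastic_pool_resource_stats in the master database on the logical server; a sketch (replace the pool name):
-- In the master database: recent elastic pool resource utilization, most recent intervals first
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.elastic_pool_resource_stats
WHERE elastic_pool_name = 'pool-name-here' -- replace with the actual elastic pool name
ORDER BY end_time DESC;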
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Resource pool IO governance impact is significant. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource pool defines resource governance limits for an elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Persistent Version Store (PVS) size is large. PVS size, used and allocated data size, maximum database size, and other relevant information is available in the details column.
PVS is a part of the user database. It is used to store row versions when Accelerated Database Recovery (ADR), Read-Committed Snapshot Isolation (RCSI), or Snapshot Isolation (SI) are enabled for the database. Large PVS size is usually caused by long-running active transactions. It may also occur due to a large number of aborted transactions that haven't yet been cleaned up by the PVS cleaner background thread. In extreme cases, a large PVS may cause the database to run out of space until PVS space is cleaned up.
Review application patterns and ad-hoc query patterns to avoid long-running transactions, particularly when implicit transactions are used. Make sure to explicitly roll back open transactions after query timeouts and other errors.
If there are any transactions that have been unintentionally left open for a long time and caused PVS to become large, either commit them, or roll them back. Alternatively, consider killing idle sessions with long-running transactions to allow PVS cleanup to occur.
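To check the current PVS size directly, a query like the following may help (a sketch based on the sys.dm_tran_persistent_version_store_stats view):
-- Current PVS size for this database
SELECT DB_NAME(database_id) AS database_name,
       persistent_version_store_size_kb / 1024. AS persistent_version_store_size_mb
FROM sys.dm_tran_persistent_version_store_stats
WHERE database_id = DB_ID();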
-- Transactions sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = DB_ID()
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242; -- replace 4242 with the actual session_id value
Manage file space for databases in Azure SQL Database
Paused resumable index maintenance operations found. The list of paused resumable index operations is included in the details column. Paused resumable index operations will be aborted automatically after the period of time specified by the PAUSED_RESUMABLE_INDEX_ABORT_DURATION_MINUTES database-scoped configuration.
Using one of the sample queries below, either resume these index operations, or, if you are not planning to resume them, abort them explicitly to avoid unnecessary performance impact on the DML operations changing these indexes between now and the time when the auto-abort occurs.
ALTER INDEX [index_name] ON [schema_name].[object_name] RESUME; -- change schema, object, and index names
ALTER INDEX [index_name] ON [schema_name].[object_name] ABORT; -- change schema, object, and index names
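To list paused resumable index operations and their progress directly, a query along these lines can be used (a sketch based on the sys.index_resumable_operations view):
-- Resumable index operations and their current state
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS object_name,
       name AS index_name,
       state_desc,
       percent_complete,
       start_time,
       last_pause_time
FROM sys.index_resumable_operations;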
This tip may only appear on the primary replica.
sys.index_resumable_operations
Optimize index maintenance to improve query performance and reduce resource consumption
A clustered columnstore index (CCI) can be created on one or more tables to improve query and data load performance and drastically reduce storage space and memory consumption by the table.
CCI indexes are most useful in analytical/reporting workloads where queries use large scans. CCI indexes are not recommended in OLTP workloads, or when queries predominantly use index seeks, lookups, and small range scans. They are also not recommended when rows in the table are frequently updated, or when less frequent but large updates occur.
The details column provides a list of CCI candidate tables. Index usage statistics for the recent query workload on each table are included, to help determine whether a CCI will benefit queries referencing the table.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
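To convert a candidate table, create a clustered columnstore index on it; a minimal sketch with hypothetical table and index names:
-- Create a clustered columnstore index on a candidate table
-- If the table already has a rowstore clustered index, either drop it first, or re-create it under its existing name WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX cci_table1 ON dbo.table1;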
This tip may only appear on the primary replica.
Columnstore indexes - Design guidance
Columnstore indexes - Query performance
Columnstore indexes - Data loading guidance
Some geo-replication links may be unhealthy. The details column describes geo-replication link health, including data propagation lag (reported on the primary geo-replica only) and replica state.
If a geo-replication link becomes unhealthy during high write load on the primary geo-replica, then the service objective of the secondary geo-replica may need to be increased to sustain data propagation rate and avoid geo-replica unavailability.
For a recently created geo-replica, the SEEDING replication state is expected. In all other cases, the replication state for a healthy geo-replica should be CATCH_UP.
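The link state can also be checked directly; a sketch using the view referenced below, executed on the primary geo-replica:
-- Geo-replication link state and data propagation lag, as seen from the primary geo-replica
SELECT partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;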
sys.dm_geo_replication_link_status
Last partitions are not empty for some partitioned tables. If the very last partition is not empty, splitting it to create a new partition for new data will be slow and resource-intensive, and will block queries accessing the table for the duration of the split operation.
If new partitions are periodically added to accommodate new data, make sure to keep a buffer of several empty partitions at the end of the partition list. While a single empty partition is sufficient, multiple empty partitions are preferred. They reduce the risk of data getting into the very last partition because of failure to split the last empty partition on time.
The details column provides the list of the last several partitions for each table if at least some of them are not empty. If the very last partition (i.e. the partition with the number equal to the total count of partitions) is still empty, act before any data is loaded into it: split it to create at least one more empty partition at the end of the partition list.
Otherwise, plan and prepare for a time-consuming process of splitting a non-empty partition, to avoid all new data accumulating in the last partition. This may require application downtime.
ALTER PARTITION FUNCTION [function_name] SPLIT RANGE ([new boundary value]);
Partitioned Tables and Indexes
Blog articles by Kendra Little:
How to Fix Lopsided Partitioned Tables
Sliding Window Table Partitioning: What to Decide Before You Automate
To help in performance investigations, top queries for a time interval are summarized in the details column. This tip is informational in nature, and is produced when any queries have executed in the time interval. On idle databases, top queries may include the queries executed by the get-sqldb-tips.sql script itself. On readable secondary replicas, this tip reports on queries that executed on the primary.
For each query, identified by its query_hash value, the associated query_id and plan_id values, the number of query executions by execution type (regular, aborted, exception), and weighted wait categories are included in details.
Additionally, each query is ranked by its consumption of the following resource types:
- cpu time
- duration
- execution count
- logical IO reads
- physical IO reads
- maximum query memory used
- log bytes used
- tempdb space used
- parallelism
Lower rank numbers mean higher ranking, e.g. the topmost CPU consuming query has cpu time rank 1.
In the following example, the query with hash 0x5F9043DBAAEBB1FF is the top consumer of tempdb space (tempdb used rank: 1), has two associated query_id values (66 and one more) and three associated plan_id values (36 and two more). It successfully executed 31615 times in the time interval. Most waits (~77.4%) were on CPU.
query hash: 0x5F9043DBAAEBB1FF, query_id: 66 (+1), plan_id: 36 (+2), executions: (regular: 31615, aborted: 0, exception: 0), CPU time rank: 4, duration rank: 32, executions rank: 50, logical IO reads rank: 6, physical IO reads rank: 5, max used memory rank: 79, log bytes used rank: 48, tempdb used rank: 1, parallelism rank: 50, weighted wait categories: CPU (0.774) | Latch (0.137) | Memory (0.057) | Unknown (0.028) | Preemptive (0.002) | Buffer IO (0.002)
Depending on the nature of the performance issue, focus on the queries ranked high for the relevant resource type. For example, if troubleshooting high tempdb space consumption, look at the tempdb used rank.
To change the number of top ranked queries included for each resource type, change the @QueryStoreTopQueryCount variable. By default, the top 2 queries are included for each resource type.
By default, the time interval used is the last hour. You can change the @QueryStoreIntervalMinutes variable from 60 to another value to look at shorter or longer intervals. To look at a time interval with specific start and end times, set the @QueryStoreCustomTimeStart and @QueryStoreCustomTimeEnd variables to the start and end time of the interval. Note that this tip may not appear if the specified interval is shorter than the configured Query Store resource statistics interval length (INTERVAL_LENGTH_MINUTES).
For further investigations, use Query Performance Insight in the Azure portal, Query Store reports in SSMS, or query the sys.query_store_* views directly.
The query below provides resource consumption details for a given query hash value, as well as the associated query_id and plan_id values and the SQL text of the query.
-- Return query/plan runtime statistics in the specified interval for a given query_hash value from the details column for further investigation
DECLARE @QueryHash binary(8) = 0x24C09AE7EDD849D4; -- replace query hash with the actual value from details
DECLARE @QueryStoreIntervalMinutes int = 60; -- if needed, adjust to look at shorter or longer intervals, or set the next two variables to start/end time
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';
DECLARE @QueryStoreTimeFrom datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
@QueryStoreCustomTimeStart
),
@QueryStoreTimeThru datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
SYSDATETIMEOFFSET(),
@QueryStoreCustomTimeEnd
);
SELECT q.query_id,
q.context_settings_id,
qt.query_sql_text,
OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
OBJECT_NAME(q.object_id) AS object_name,
p.plan_id,
TRY_CAST(p.query_plan AS xml) AS query_plan,
SUM(IIF(rs.execution_type_desc = 'Regular', rs.count_executions, 0)) AS count_regular_executions,
SUM(IIF(rs.execution_type_desc = 'Aborted', rs.count_executions, 0)) AS count_aborted_executions,
SUM(IIF(rs.execution_type_desc = 'Exception', rs.count_executions, 0)) AS count_exception_executions,
SUM(rs.avg_cpu_time * rs.count_executions) / SUM(rs.count_executions) AS avg_cpu_time,
SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) AS avg_duration,
SUM(rs.avg_logical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_logical_io_reads,
SUM(rs.avg_physical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_physical_io_reads,
SUM(rs.avg_query_max_used_memory * rs.count_executions) / SUM(rs.count_executions) AS avg_query_max_used_memory,
SUM(rs.avg_log_bytes_used * rs.count_executions) / SUM(rs.count_executions) AS avg_log_bytes_used,
SUM(rs.avg_tempdb_space_used * rs.count_executions) / SUM(rs.count_executions) AS avg_tempdb_space_used,
SUM(rs.avg_dop * rs.count_executions) / SUM(rs.count_executions) AS avg_dop
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE q.query_hash = @QueryHash
AND
rsi.start_time >= @QueryStoreTimeFrom
AND
rsi.start_time <= @QueryStoreTimeThru
GROUP BY q.query_hash,
q.query_id,
q.context_settings_id,
q.object_id,
qt.query_sql_text,
p.plan_id,
p.query_plan
ORDER BY query_id, plan_id
OPTION (RECOMPILE);
Query Performance Insight for Azure SQL Database
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Volume storage space allocated for tempdb data files is close to the maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
In the Premium and Business Critical service tiers, tempdb shares local SSD space with user databases. For large (1-4 TB) databases, large tempdb allocated space can prevent user database growth and cause out-of-space errors.
As an emergency measure, you can reset tempdb allocated space to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
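To check current tempdb data file space usage directly, a query like the following can be used (a sketch; it reads the tempdb file space usage view):
-- Allocated and unallocated space in tempdb data files, in MB
SELECT SUM(allocated_extent_page_count) * 8 / 1024. AS allocated_space_mb,
       SUM(unallocated_extent_page_count) * 8 / 1024. AS unallocated_space_mb
FROM tempdb.sys.dm_db_file_space_usage;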
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Storage space used in tempdb data files is close to the maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. Continuing growth of used space may cause tempdb out-of-space errors. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
As an emergency measure, you can reduce tempdb used size to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Volume storage space allocated for the tempdb log file is close to the maximum tempdb log size. The details column provides tempdb allocated, used, and maximum transaction log sizes. This may be caused by a long-running transaction modifying data in a temporary table, or using tempdb for internal storage.
As an emergency measure, you can reset tempdb allocated log size to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool. Alternatively, you can kill the sessions with long-running transactions in tempdb.
-- Transactions in tempdb sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = 2
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242; -- replace 4242 with the actual session_id value
Queries to monitor tempdb usage
There are recent occurrences of worker utilization approaching the workload group limit of the service objective, as noted in the details column. A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If the limit is reached, queries and new connection attempts will fail. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workload and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in a time interval, review top queries ranked high by parallelism and/or execution count.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
What happens when worker resource limit is reached
There are recent occurrences of worker utilization approaching the resource pool limit of the service objective, as noted in the details column. A resource pool defines resource governance limits for an elastic pool.
If the limit is reached, queries and new connection attempts will fail for some or all databases in the elastic pool. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workloads and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in an elastic pool in a time interval, review top queries for all databases in the pool starting with the most utilized, ranked high by parallelism and/or execution count. To find the most utilized databases in a pool, query sys.resource_stats.
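A sketch of such a query, executed in the master database on the logical server (the database names in the filter are hypothetical; list the databases in the pool of interest):
-- In the master database: recent per-database resource utilization (sys.resource_stats reports 5-minute intervals, retained for about 14 days)
SELECT database_name,
       start_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.resource_stats
WHERE database_name IN ('db1-in-pool', 'db2-in-pool') -- replace with the databases in the elastic pool
ORDER BY start_time DESC;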
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
What happens when worker resource limit is reached
sys.elastic_pool_resource_stats
Notable network connectivity events have recently occurred in this database, or other databases in the same elastic pool. This includes network errors, slow login attempts, and abnormally closed connections. Retry logic implemented by an application or a MSSQL network driver can prevent these events from impacting applications.
This tip is informational in nature, and is intended to facilitate network connectivity troubleshooting. The presence of network connectivity events does not necessarily indicate a problem. Some events, while reported by this tip, may be expected in Azure SQL Database. For example, idle sessions are killed after a period of inactivity by design.
The details column provides a query to view detailed connectivity event data obtained in the latest execution of the script.
To reduce the likelihood of workload-impacting network events, configure the logical server to use the Redirect connection policy.
High CPU utilization by the database engine instance was sustained for a significant period of time, as noted in the details column. If the database is in an elastic pool, this refers to cumulative CPU utilization by all databases in the same pool.
If workload performance has been inadequate during that time, tune query workload to consume less CPU, or scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries for the database, or for the most utilized databases in an elastic pool, ranked high by cpu time.
What happens when CPU resource limits are reached
Identify query performance issues
Application and database tuning guidance
Detectable database performance patterns
sys.elastic_pool_resource_stats
Some statistics may be out of date. This may cause poor query performance due to suboptimal query plans. The list of potentially out-of-date statistics on user objects and some system objects is included in the details column.
Update statistics by executing sp_updatestats, UPDATE STATISTICS, or a database maintenance solution, and consider establishing a periodic maintenance procedure to keep statistics up to date.
EXEC sys.sp_updatestats;
Database maintenance solution by Ola Hallengren
Many tables do not have any indexes. This tip appears when a substantial percentage of all tables in the database (excluding very small tables) do not have any indexes. This often causes poor query performance due to suboptimal data access. The list of such tables is included in the details column.
You can enable the CREATE INDEX auto-tuning option for the database to have the system create missing indexes automatically to improve query performance in recent workloads that used these tables.
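A sketch of enabling that option with T-SQL (it can also be enabled in the Azure portal, as described in the link below):
-- Enable automatic index creation for this database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON);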
If you prefer to create indexes manually, review the Index Architecture and Design Guide to learn how to create efficient indexes. If the Queries could benefit from indexes that do not currently exist tip also appears, you can derive new indexes from the data in the details column of that tip.
Enable automatic tuning in the Azure portal to monitor queries and improve workload performance
Significant lock blocking has recently occurred. This tip appears when at least one of the following conditions has recently occurred:
- The time spent by queries waiting on locks during a time interval exceeded a threshold
- The number of observed blocked tasks exceeded a threshold
The details
column includes a list of recent time intervals when one or both of these conditions were occurring. Each entry includes:
- Interval start time,
- Interval end time,
- Interval duration,
- Total time user queries spent waiting on locks during the interval,
- The minimum number of observed blocked tasks during the interval,
- The maximum number of observed blocked tasks during the interval,
- The total number of lock waits during the interval.
The number of blocked tasks is recorded by the database engine using point-in-time snapshots every 15-20 seconds. The actual number of blocked tasks between two snapshots may exceed the reported number. If the minimum number of blocked tasks is greater than zero for long periods of time, it usually indicates workload-impacting long-term blocking.
Short-lived lock blocking is expected in many MSSQL workloads. Locks are used to guarantee the ACID properties of transactions. However, unnecessary long-term blocking can reduce workload throughput, and can cause query timeouts and application errors. Common reasons for longer workload-impacting blocking include:
- Manual (ad-hoc) transactions not committed or rolled back for a long time. This is particularly common when implicit transactions are used.
- Not using the Read Committed Snapshot Isolation (RCSI) database option.
- Using unnecessarily large transactions, i.e. loading or modifying a large dataset in a single transaction instead of breaking it into smaller batches.
- Not handling query execution timeouts properly in application code. Application code that opens an explicit or implicit transaction and then receives a query timeout error should attempt to immediately roll back the transaction, or set the `XACT_ABORT` option to `ON` when opening new connections to the database (see the example after this list). Otherwise, a transaction could remain open for a long time until the connection is terminated or reused, holding locks and blocking other sessions.
- Poorly written queries or suboptimal query plans causing DML (`INSERT`, `UPDATE`, `DELETE`, `MERGE`) statements to execute for a long time while holding locks.
- Long-running DDL operations such as offline index rebuilds or schema changes executed concurrently with other query workloads.
- Unnecessarily mixing DML and DDL statements in the same transaction.
- Unnecessarily using `REPEATABLE READ` and `SERIALIZABLE` transaction isolation levels (e.g. setting them as default in application code), or locking hints such as `HOLDLOCK`, `PAGLOCK`, `READCOMMITTEDLOCK`, `REPEATABLEREAD`, `SERIALIZABLE`, `TABLOCK`, `TABLOCKX`, `UPDLOCK`, `XLOCK`.
- Unintended lock escalation. If necessary, lock escalation can be disabled per table at the expense of using more memory for locks.
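As a minimal illustration of the timeout-handling item above, the following statement enables `XACT_ABORT` for the current session, so that run-time errors and client query timeouts cause an open transaction to be rolled back automatically:
-- Enable XACT_ABORT so that run-time errors and client query timeouts roll back the open transaction
SET XACT_ABORT ON;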
You can use the sp_whoisactive
stored procedure to observe currently executing queries and see whether they are blocked waiting on locks. In Azure SQL Database, create sp_whoisactive
in the user database, not in the master
database.
For suitable new and existing workloads, evaluate In-Memory OLTP. This is a lock-free database engine technology not subject to lock blocking issues.
-- View currently executing queries and the locks they are holding or waiting for
EXEC sp_whoisactive @get_locks = 1;
-- Transactions sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = DB_ID()
ORDER BY transaction_duration_seconds DESC;
-- Kill the session if it has a transaction that is open for a long time and is blocking other queries
KILL 4242; -- replace 4242 with the actual session_id value
-- Disable lock escalation for a table
-- Only consider this if observing object-level IX or X locks that block other queries
ALTER TABLE [table-name] SET (LOCK_ESCALATION = DISABLE);
Understand and resolve Azure SQL Database blocking problems
Transaction Locking and Row Versioning Guide
The number of recent query optimizations is high. This tip appears when a significant portion of recent query requests required the query optimizer to create a new execution plan, instead of reusing an existing plan from plan cache. The list of time intervals with a high number of query plan optimizations is included in the details
column.
Query plan optimization consumes CPU and memory and delays query execution. If a plan for a query is already cached, reusing it usually avoids this additional resource utilization and execution delay.
At times, query plans need to be recompiled to ensure query correctness or improve performance by creating a more optimal plan. At steady state, such recompilations and optimizations are infrequent and are unlikely to produce this tip.
If many query optimizations occur continuously, it may be caused by one of the following:
- Using literal values instead of query parameters in queries, for example `WHERE name = 'thing'` instead of `WHERE name = @parameter`. The database engine may automatically parameterize some queries, i.e. replace literal values with parameters, to enable plan reuse. This is known as simple parameterization. You can switch from simple parameterization (default) to forced parameterization to make automatic parameterization more aggressive, though this may result in less optimal query plans. While automatic parameterization (either simple or forced) helps reduce optimizations, explicit parameterization is the preferred way to achieve that.
-- Enable forced parameterization for a database
ALTER DATABASE CURRENT SET PARAMETERIZATION FORCED;
-- Enable simple parameterization for a database
ALTER DATABASE CURRENT SET PARAMETERIZATION SIMPLE;
- Not declaring a parameter data type to fully and exactly match the data type of the corresponding column in the database. The database engine may not always reuse a plan when there is a mismatch in parameter data type, even if the SQL text of the query is the same. In that case, a new plan is created and cached. Review application code to ensure that all parameter data type declarations match table column data types exactly, including size, precision, and scale of the column, and are not derived from each value assigned to the parameter.
For example, using SqlClient in C#:
// declare a parameter for an nvarchar(10) column, specifying Size explicitly
SqlParameter nameParameter = new SqlParameter("@Name", SqlDbType.NVarChar);
nameParameter.Value = "thing";
nameParameter.Size = 10;
// declare a parameter for a decimal(8,4) column, specifying Precision and Scale explicitly
SqlParameter amountParameter = new SqlParameter("@Amount", SqlDbType.Decimal);
amountParameter.Value = 1234.5678m;
amountParameter.Precision = 8;
amountParameter.Scale = 4;
It is normal and expected to see a high number of compilations or recompilations, and a temporary increase in resource usage, for a short time in the following cases:
- After a database or an elastic pool has been restarted (failed over) and before plans for repeating queries in the workload have been cached,
- After plan cache has been flushed, for example because of memory pressure, or with an explicit command,
- When a new query workload starts running, for example after releasing a new version of the application.
The following query returns top queries compiled in a time interval, ordered by compilation CPU cost.
-- Queries compiled in a time interval with a potentially significant compilation CPU cost
DECLARE @QueryStoreIntervalMinutes int = 60; -- if needed, adjust to look at shorter or longer intervals, or set the next two variables to start/end time
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';
DECLARE @QueryStoreTimeFrom datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
@QueryStoreCustomTimeStart
),
@QueryStoreTimeThru datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
SYSDATETIMEOFFSET(),
@QueryStoreCustomTimeEnd
);
SELECT TOP (50) q.query_hash,
MIN(q.query_id) AS sample_query_id,
COUNT(DISTINCT(q.query_id)) AS count_distinct_query_id,
MIN(qt.query_sql_text) AS sample_query_text,
COUNT(DISTINCT(qt.query_text_id)) AS count_query_text_variants,
OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
OBJECT_NAME(q.object_id) AS object_name,
SUM(q.count_compiles) AS count_compiles,
SUM((q.avg_bind_cpu_time + q.avg_optimize_cpu_time) * q.count_compiles) AS total_compilation_cpu_time,
MIN(initial_compile_start_time) AS initial_compile_start_time,
MAX(last_compile_start_time) AS last_compile_start_time,
MAX(last_execution_time) AS last_execution_time
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
WHERE q.last_compile_start_time >= @QueryStoreTimeFrom
AND
q.last_compile_start_time <= @QueryStoreTimeThru
GROUP BY q.query_hash,
q.object_id
HAVING SUM(q.count_compiles) > 1
ORDER BY total_compilation_cpu_time DESC
OPTION (RECOMPILE);
Note that if Query Store query capture mode is not `ALL`, the query above may not return all queries that have contributed to a high number of optimizations.
-- Set query capture mode to ALL for troubleshooting purposes
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
-- Reset query capture mode to the default AUTO setting
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
Execution Plan Caching and Reuse
Plan Caching and Recompilation whitepaper
Factors that affect query plan changes
SQL Compilations/sec is not what you think it is - blog article by Fabiano Amorim
Row locks or page locks are disabled for some indexes. The list of indexes that have row or page locks disabled is provided in the details
column.
Disabling row or page locks can lead to concurrency problems (blocking and deadlocks), and cause unnecessary CPU and memory utilization. When both row and page locks are disabled, the use of the index in a query will require an object-level lock, blocking other queries from reading or modifying any data in the table.
An index cannot be reorganized when page locks are disabled.
With rare exceptions, both row and page locks should be enabled. The database engine will choose the appropriate lock type automatically to reduce blocking and resource utilization.
-- Enable both row and page locks for an index
ALTER INDEX [index-name-here] ON [schema-name-here].[table-name-here] SET (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
Disabling ROW and PAGE Level Locks in SQL Server - blog article by Klaus Aschenbrenner
Allocated local storage is close to maximum local storage. Local storage usage details are provided in the details
column.
This tip appears for databases and elastic pools in the Premium and Business Critical service tiers when their total local storage consumption by data files and transaction log files, including the tempdb
database, is approaching the maximum storage limit.
If local storage consumed by tempdb
is large, as an emergency measure, you can reset tempdb
allocated space to its initial small value by failing over the database or elastic pool using PowerShell or REST API. Use Invoke-AzSqlDatabaseFailover or database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database or all databases in an elastic pool.
If local storage is consumed by a large transaction log for a user database, you can shrink the transaction log file, or wait for the periodic automatic shrink to occur.
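To see how much space each file uses and to shrink the transaction log file of a user database, a sketch like the following can be used; the 1024 MB target size is an arbitrary example, and the log file is typically `file_id` 2, which should be verified in `sys.database_files`:
-- View allocated and used space per database file
SELECT file_id, type_desc, name,
       size * 8. / 1024 AS allocated_mb,
       FILEPROPERTY(name, 'SpaceUsed') * 8. / 1024 AS used_mb
FROM sys.database_files;
-- Shrink the transaction log file (file_id 2 in most databases) to a 1024 MB target
DBCC SHRINKFILE (2, 1024);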
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
Manage file space for databases in Azure SQL Database
What happens when database storage limit is reached
Column collation does not match database collation. The list of objects where column collation does not match database collation is provided in the details
column.
A column that does not have a collation explicitly specified inherits the database collation. Referencing columns with different collations in the same expression can result in collation conflict errors. As a workaround, the COLLATE clause can be used to explicitly cast one of the conflicting collations to the other; however, this may prevent the use of index seeks and result in worse performance than when there is no collation conflict.
Scenarios where individual columns need to use a collation different from the database collation are infrequent. Unless there is a clear need to customize column collation, omit the COLLATE clause when defining table columns.
-- An example of a collation conflict and a workaround
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;
-- returns SQL_Latin1_General_CP1_CI_AS
CREATE TABLE T1
(
String varchar(10) COLLATE Latin1_General_100_BIN2_UTF8 -- a collation different from the database collation is specified
);
CREATE TABLE T2
(
String varchar(10) -- default database collation is used
);
SELECT 1
FROM T1
INNER JOIN T2
ON T1.String = T2.String
;
/*
Fails with:
Msg 468, Level 16, State 9, Line 14
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.
*/
SELECT 1
FROM T1
INNER JOIN T2
ON T1.String COLLATE DATABASE_DEFAULT = T2.String -- work around by explicitly casting T1.String to database collation
;
Indexes with excessively large keys found. The list of indexes with excessively large keys is provided in the details
column.
The database engine limits the maximum length of data in an index key: 900 bytes for clustered indexes and 1,700 bytes for nonclustered indexes. If the index key contains variable length columns, an index can still be created even when the total of the maximum column lengths exceeds these limits; however, any attempt to insert or update a row that would cause the actual key size to exceed the limit will fail.
It is not recommended, and in most cases unnecessary, to create indexes with large keys. This is particularly relevant for the clustered index key, which is also stored in every nonclustered index on the same table, increasing storage size and storage IO. If many columns need to be present in a nonclustered index, add them with the `INCLUDE` clause instead of including them in the index key.
Index Architecture and Design Guide
Create indexes with included columns
Disabled indexes found. The list of disabled indexes is provided in the details
column.
Disabled indexes consume data and backup storage, but are not used or updated by the database engine. Any primary key or unique constraints dependent on a disabled index are also disabled. If the clustered index is disabled, the table becomes inaccessible.
To re-enable a disabled index, it must be rebuilt. This is similar to recreating the index. For large disabled indexes, index rebuild may consume significant time and resources.
If a disabled index is no longer required, it should be dropped.
-- Rebuild a disabled index
ALTER INDEX [disabled_index_name] ON [schema_name].[table_name] REBUILD;
-- Drop a disabled index if it exists
DROP INDEX IF EXISTS [disabled_index_name] ON [schema_name].[table_name];
Disable Indexes and Constraints
Indexes with low fill factor found. The list of indexes with low fill factor is provided in the details
column.
Fill factor is an index property that determines how much free space to leave on a B-tree index leaf page when adding rows to a page. By default, fill factor is set to 100, which means that pages are filled to capacity. This is the recommended value. Setting fill factor to 100 improves performance by storing more data on each page. This reduces storage IO and increases the amount of data cached in memory.
In certain specific scenarios, such as frequently modified indexes with the leading column containing non-sequential GUID values, lowering the fill factor to 70-90% helps reduce the frequency of performance-impacting page splits.
Unless there is a clear need to change the fill factor from the default of 100 (or 0, which is equivalent to 100 in this context), rebuild indexes that have low fill factor, setting fill factor to 100.
Over time, as rows on a page are updated and deleted, and as pages are split, the percentage of each page used by data (known as page density) may become substantially lower than the fill factor. If page density is low, follow the index maintenance strategy to determine if indexes should be reorganized or rebuilt.
-- Rebuild an index and set fill factor to 100
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
Optimize index maintenance to improve query performance and reduce resource consumption
Pages and Extents Architecture Guide
Non-unique clustered indexes found. The list of non-unique clustered indexes is provided in the details
column.
For rowstore tables, there are several desired properties to consider when designing the clustered index:
- Unique. If the clustered index is not defined as unique, the database engine will add a hidden "uniqueifier" integer column to the index to make it unique internally. This column adds 4 bytes to the clustered index key (and every nonclustered index key), increasing storage, memory, and IO consumption. When a suitable narrow and static column already exists in the table (e.g. an IDENTITY or sequence-based column), add it to the clustered index key and define the clustered index as unique to avoid this overhead.
- Narrow. The clustered index key is included in every nonclustered index key on the same table. Making the clustered key narrow reduces storage, memory, and IO consumption.
- Static. If the clustered index key is updated, every nonclustered index key must be updated as well. Additionally, updating the clustered index key may require the entire row to move to a different page. Choosing static columns for the clustered index key avoids the CPU and IO overhead required to update nonclustered indexes and move rows to other pages when rows are updated.
- Ever-increasing. If the clustered index key is ever-increasing, rows are always added to new pages at the end of the index. Otherwise, rows are added to pages elsewhere in the index, potentially causing performance-impacting page splits, increasing index fragmentation, and decreasing page density.
- Non-nullable and fixed length. If nullable or variable length columns are a part of the clustered index key, the database engine must add several bytes to each row to store the nullability and column length data. This adds storage, memory, and IO overhead.
-- An example of recreating the clustered index to make it unique
CREATE TABLE ClusteredIndexExample
(
c1 int NOT NULL,
c2 varchar(10) NOT NULL,
c3 int NOT NULL IDENTITY,
INDEX ixClusteredIndexExample1 CLUSTERED (c1) -- a non-unique clustered index
)
-- Add an IDENTITY column to the clustered index and make it unique
CREATE UNIQUE CLUSTERED INDEX ixClusteredIndexExample1 ON ClusteredIndexExample (c1, c3) WITH (DROP_EXISTING = ON, ONLINE = ON, RESUMABLE = ON);
Index Architecture and Design Guide
Most of the IDENTITY range is used. The list of columns that are close to running out of IDENTITY values is provided in the details
column.
Attempting to insert a row when the new IDENTITY value is outside of the data type range will cause an error. Either change the column data type to support a wider range (for example, from `int` to `bigint`), or use `DBCC CHECKIDENT` to reseed the identity value to start at the beginning of an unused range, e.g. with the minimum negative integer value.
Changing column data type in a large table may consume significant time and resources and block other queries. For larger tables, instead of changing the data type in place using `ALTER TABLE ... ALTER COLUMN`, create a new table with the desired schema, copy data to the new table, drop the original table, and rename the new table and its constraints and indexes to match the names of the original table.
-- An example of changing column data type from int to bigint in place
CREATE TABLE IdentityDataTypeExample
(
c1 int NOT NULL IDENTITY,
c2 varchar(10) NOT NULL
);
-- Change column data type to bigint while keeping the IDENTITY property
-- This works if there are no other objects (indexes, constraints) dependent on the column being changed
-- This can take a long time for larger tables
ALTER TABLE IdentityDataTypeExample
ALTER COLUMN c1 bigint NOT NULL WITH (ONLINE = ON);
-- An example of changing an IDENTITY column to use the range of negative integers
CREATE TABLE IdentityRangeExample
(
c1 int NOT NULL IDENTITY (1,1),
c2 varchar(10) NOT NULL
);
-- reseed IDENTITY to start with the smallest negative integer, while keeping the original increment of 1
DBCC CHECKIDENT (IdentityRangeExample, RESEED, -2147483648);
Most of the sequence range is used. The list of sequences that are close to reaching their maximum or minimum value is provided in the details
column.
When a sequence reaches its maximum or minimum value, several problems may occur:
- If the sequence is defined with `NO CYCLE` (default), attempts to get the next value from the sequence will fail with an error.
- If the sequence is defined with `CYCLE`, it may start generating previously generated numbers. If sequence values are used in a column with a primary key or unique constraint, a constraint violation error may occur.
- If the sequence is defined with `CYCLE`, generated numbers may switch from positive to negative, and vice versa. This may be unexpected by the application.
Depending on the situation, consider one of the following solutions:
- If a sequence has been defined with `MINVALUE` and `MAXVALUE` smaller than the full range of the data type, alter these parameters to support a wider range of values.
- Drop and recreate the sequence to use a larger data type, e.g. `bigint` instead of `int`. Note that this may also require changing data types of dependent columns.
- If a sequence has been defined with `NO CYCLE`, alter it to `CYCLE`. If the starting value of the sequence was in the middle of the range, this will use the remainder of the range that hasn't been used yet.
- If the sequence has been defined with the starting value in the middle of the range, alter the starting value to be in the beginning of the range.
Using the last two options may change the sign of the generated values.
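The following sketch shows how to review sequence range utilization and make some of the changes described above with `ALTER SEQUENCE`; the sequence name is a hypothetical placeholder:
-- View current value and range for all sequences in the database
SELECT SCHEMA_NAME(schema_id) AS schema_name,
       name,
       current_value,
       minimum_value,
       maximum_value,
       is_cycling
FROM sys.sequences;
-- Widen the allowed range of an int-based sequence to the full int range
ALTER SEQUENCE dbo.SequenceName MINVALUE -2147483648 MAXVALUE 2147483647;
-- Allow the sequence to cycle when it reaches the end of its range
ALTER SEQUENCE dbo.SequenceName CYCLE;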
Disabled or not trusted constraints found. The list of constraints is provided in the details
column.
A disabled foreign key or check constraint potentially serves as an indication of the database designer's intent, but does not play any other role. Data modifications that would violate the constraint are allowed.
A not trusted foreign key or check constraint does prevent data modifications that violate it. However, it does not guarantee that all existing data also satisfies the constraint. For that reason, the database engine does not take advantage of not trusted constraints. This prevents the use of not trusted constraints in query optimization, and may block operations such as partition switching.
If a disabled constraint is enabled, but existing data is not checked for conformance with the constraint, the constraint remains marked as not trusted. Checking existing data for conformance with the constraint as part of enabling the constraint is possible and recommended, but can take substantial time for larger tables.
-- Examples of modifying a check constraint. Foreign key constraints behave similarly.
-- Enable an existing constraint on a table. The constraint remains not trusted.
ALTER TABLE TableName CHECK CONSTRAINT ConstraintName;
-- Enable an existing constraint on a table and check existing data for conformance to mark the constraint as trusted.
ALTER TABLE TableName WITH CHECK CHECK CONSTRAINT ConstraintName;
-- Disable an existing constraint. This also marks it as not trusted until it is re-enabled and existing data is checked.
ALTER TABLE TableName NOCHECK CONSTRAINT ConstraintName;
Page compression is ineffective for some indexes. The list of indexes and partitions where some page compression attempts consumed CPU but did not result in any compression is provided in the details
column.
For a table or index with page compression enabled, the database engine attempts to compress any added or modified page. These attempts consume CPU. However, if the achieved compression ratio is less than 20%, the page is stored with row compression only. This tip outputs a list of partition ranges in each index where the ratio of successful page compression attempts to all page compression attempts is below a threshold.
To reduce unnecessary CPU utilization, consider changing page compression to row compression for such indexes.
-- Rebuild an index and use ROW compression
ALTER INDEX ix_table1_2 ON dbo.table1 REBUILD WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=ROW);
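Before rebuilding, the potential space savings of switching to ROW compression can be estimated with `sp_estimate_data_compression_savings`; the schema, table, index, and partition values below are placeholders:
-- Estimate space savings from ROW compression for all indexes of a table
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'table1',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';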
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
This tip may only appear on the primary replica.
Data Compression: Strategy, Capacity Planning and Best Practices
The Curious Case of… tracking page compression success rates, a blog article by Paul Randal