Azure SQL Database tips
This wiki page describes get-sqldb-tips.sql, a T-SQL script that produces tips to improve database design, health, and performance in Azure SQL Database. Tips are based on well-established best practices for the MSSQL database engine, with focus on Azure SQL Database specifics.
The script works on all Azure SQL Database service tiers, including Basic, Standard, Premium, General Purpose, Business Critical, and Hyperscale. Both single databases and databases in elastic pools are supported.
The script checks over 50 conditions to produce tips. New tips are added based on our customer engagements and community feedback.
After reading this wiki page, see FAQ for answers to common questions. For changes in each release, see Change log.
Download the latest release of the script. Unzip, and open the get-sqldb-tips.sql file in a query window in SQL Server Management Studio (SSMS), Azure Data Studio (ADS), Query editor in Azure portal, or any MSSQL client tool of your choice. Connect the query window to your target database and execute the script. If your database uses compatibility level 100, use the get-sqldb-tips-compat-level-100-only.sql file instead.
The script usually executes in about a minute or less. It may take longer on resource-constrained service objectives such as Basic, S0-S2, or when CPU utilization is high.
The result set has the following columns:
- tip_id: Unique numeric identifier for a tip, included in tip descriptions on this page.
- description: Short description of a tip.
- confidence_percent: Our degree of confidence that following the tip is worthwhile and will result in a net benefit. Tips with lower confidence may require additional analysis and testing in the context of a specific application and workload. In the right circumstances, even tips with relatively low confidence can produce substantial benefits. This confidence value applies to the tip definition and does not change from one execution of the script to the next.
- additional_info_url: Pasting this URL into a browser window opens a detailed tip description on this wiki page. Descriptions include recommended actions, sample commands and queries, and an explanation of the data in the details column. There are also links to additional resources (documentation, blogs, etc.) that provide background information on the topic.
- details: Additional data to help you act on a tip, for example the list of top queries, or the list of tables/indexes that are data compression candidates. In SSMS and ADS, clicking on this column displays the data in a new window for easier reading. These details can change from one execution of the script to the next as database state changes.
To return tips in a single JSON document instead of the default tabular format, set the @JSONOutput variable to 1. This is useful if you want to send script results to others in a portable format, or if you are using Query editor in Azure portal and want to see the entire details column.
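As a minimal sketch, assuming the variable declaration near the top of the script looks like it does in recent releases (check your copy of the script for the exact line):
-- Near the top of get-sqldb-tips.sql, change the default value of the output format variable
DECLARE @JSONOutput bit = 1; -- 1 = return a single JSON document; 0 (default) = tabular output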
- For each tip in the result, make sure to read the description on this wiki page using the link in additional_info_url, especially if the meaning of data in the details column is not immediately clear.
- Execute the script while a representative workload is running. This could mean running the script on a production database, or on a representative copy of a production database. Without a representative workload running, or with a smaller (e.g. non-production) database, relevant tips may be missed, or less accurate tips may be generated. If workload type and intensity change over time, consider executing the script at different points in time to get all relevant tips.
- You can use the script as a troubleshooting tool, executing it when a performance or stability problem is occurring. This has a higher chance of producing specific tips to help you fix the problem.
- Some tips are based on system metadata accumulated since database engine startup. In those cases, the details column includes engine startup time. If the time passed since engine startup is short, consider rerunning the script when more time has passed, so that the tip is based on more complete and representative system metadata.
- The script can be executed on either the primary or a readable secondary replica of a database, including read scale-out replicas and geo-replicas. Some tips are only produced on the primary or only on a secondary replica. When the database has any readable secondary replicas, we recommend executing the script on the primary and on each secondary replica, because results may differ.
In this example, there are seven tips returned by the script. For most tips, there is additional information in the details column.
The security principal executing the script must be either:
- A server administrator using SQL authentication or Azure Active Directory (AAD) authentication;
- A server login with an associated database user in the target database. The server login must be a member of the ##MS_ServerStateReader## server role. Additionally, to evaluate certain tips, the VIEW DEFINITION permission on the database is required.
Here is a code sample that grants the necessary and sufficient access to execute the script and evaluate all tips to a login named tips:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER tips;
ALTER SERVER ROLE [##MS_DefinitionReader##] ADD MEMBER tips;
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
This code grants permissions equivalent to a union of VIEW SERVER STATE and VIEW DEFINITION, which is required to query the system metadata views used in the script. The ##MS_ServerStateReader## and ##MS_DefinitionReader## server roles do not grant any additional permissions; specifically, they do not grant permissions to modify database schema or to read data.
The above permissions are sufficient to execute the script on any service tier. This is the recommended way to run the script because it ensures that all potential tips are evaluated.
On single databases in the Standard (S2 and higher), General Purpose, Premium, Business Critical, and Hyperscale service tiers, for nearly all tips it is sufficient to hold only the VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, and VIEW DEFINITION permissions on the database. These can be granted as follows:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO tips;
Alternatively, using database-scoped authentication, i.e. without creating a server login in the master database:
-- In the user database where tips are to be produced
-- Using SQL authentication
CREATE USER tips WITH PASSWORD = 'strong-password-here';
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO tips;
-- Or, using Azure Active Directory (AAD) authentication
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
GRANT VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, VIEW DEFINITION TO [[email protected]];
Note that for Basic, S0, and S1 single databases, and for any database in an elastic pool, VIEW DATABASE STATE, VIEW DATABASE PERFORMANCE STATE, and VIEW DEFINITION are sufficient to produce only a few tips; grant the recommended permissions described in the previous section instead.
If tips were skipped because of insufficient permissions, they will be listed in the second result set that describes skipped tips.
By default, all tips are evaluated. If you prefer to skip some tips, you can easily edit tip metadata in the script to not run the corresponding queries.
To do that, search the script for the Define all tips comment to find the T-SQL INSERT statement where tips are defined. In the VALUES clause, change the first column value for the tips you want to exclude from 1 to 0. In the following example, tip 1310 (Last partitions...) will be evaluated, while tip 1320 (Top queries...) will be skipped:
(1, 1310, 'Last partitions are not empty', ...),
(0, 1320, 'Top queries should be investigated and tuned', ...),
If any tips were skipped because of user-specified exclusions, they will be listed in the second result set that describes skipped tips.
Tips are produced by analyzing database engine metadata in system views. For many tips, heuristics (based on configurable thresholds) are used, so the output is produced with a finite degree of confidence, as indicated in the confidence_percent column. For each tip, our goal was to make it useful for a significant portion of our customers, not necessarily for every customer, which would be impossible given the broad variety of designs and workloads across millions of databases in Azure SQL Database.
The script is lightweight and does not query any user data. To avoid impacting workloads, the script checks current CPU utilization, and aborts before executing any other queries if CPU utilization is very high. The script also aborts any query it is running if the query is blocked for more than a few seconds waiting on a lock, to avoid contributing to blocking chains. If any tips were skipped because lock timeout was exceeded, they will be listed in the second result set that describes skipped tips.
The database where you run the script must use compatibility level 110 or higher, which is the case for the vast majority of databases in Azure SQL Database. For databases using compatibility level 100, we provide a separate script, get-sqldb-tips-compat-level-100-only.sql. When using that script, data in the details column may not be sorted in any particular order.
To determine the compatibility level of your database, execute the following query in the context of the database (not in the master database):
SELECT name, compatibility_level
FROM sys.databases
WHERE database_id = DB_ID();
This section includes detailed descriptions for tips that may be produced by the script. Descriptions below are not in any particular order, and should be viewed as collateral to script output, not as a set of general recommendations.
Maximum degree of parallelism (MAXDOP) for primary and secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for both primary and secondary replicas
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = PRIMARY;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for the primary replica is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the primary replica
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the secondary replicas
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Database compatibility level is not current. Recently added database engine features and improvements may not be available when using older compatibility levels. Consider using the latest supported compatibility level, but note that changing the compatibility level may require functional and performance testing before and after the change, known as A/B testing.
-- Adjust to use the latest compatibility level
-- To find the latest compatibility level in Azure SQL Database, see the first link in tip description
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
Auto-create statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-create statistics.
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;
Auto-update statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-update statistics.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
Read Committed Snapshot Isolation (RCSI) is disabled. This may cause unnecessary lock blocking, such as read queries blocking write queries. RCSI is the preferred alternative to the NOLOCK hint, which should not be used in scenarios that require correct query results.
The sample query below enables RCSI for a database. Note that this command requires exclusive database access, and will terminate all connections to the database after waiting for 5 seconds to give open transactions an opportunity to complete.
-- Execute in the master database, and change to reference the actual database name
ALTER DATABASE [database-name] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK AFTER 5 SECONDS;
SET TRANSACTION ISOLATION LEVEL
The Dirty Secrets of NOLOCK, a free course by Kendra Little
Query Store is disabled. This complicates performance tuning and troubleshooting, and could disable certain Intelligent Query Processing features that can improve query performance and reduce resource consumption. Enable Query Store.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store operation mode is READ_ONLY. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption. The details column provides the reason for the read-only state. Change Query Store operation mode to READ_WRITE.
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store capture mode is NONE. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption.
Execute one of the commands below to change Query Store query capture mode to AUTO or ALL depending on requirements, or use a custom query capture mode that captures all queries relevant to your workload.
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
AUTO_SHRINK is enabled. An occasional shrinking of a database may be needed in response to a one-time data deletion event and/or to reduce excessive allocated but unused space in data files. However, shrinking should not be executed continuously by keeping auto-shrink enabled at all times, because it causes persistent and high resource utilization that will negatively impact workload performance. Auto-shrink should be disabled for the vast majority of databases.
ALTER DATABASE CURRENT SET AUTO_SHRINK OFF;
Considerations for AUTO_SHRINK
Recommendations and guidelines for setting the AUTO_SHRINK database option
Btree indexes have uniqueidentifier (GUID) leading columns. For larger tables, the details column contains a list of btree indexes with uniqueidentifier leading columns. Such indexes are subject to low page density and high fragmentation as data is modified. This leads to increased memory and disk space usage, and negatively impacts performance.
Avoid this pattern in physical database design, particularly for clustered btree indexes. Rebuild indexes periodically when page density is less than 70%, either manually or using a database maintenance solution.
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
Index Architecture and Design Guide
Optimize index maintenance to improve query performance and reduce resource consumption
Database maintenance solution by Ola Hallengren
FORCE_LAST_GOOD_PLAN auto-tuning option is not enabled. Query plan regressions will not be fixed automatically via plan forcing. To avoid performance issues due to unexpected query plan regressions, enable this auto-tuning option.
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
This tip may only appear on the primary replica.
Automatic tuning in Azure SQL Database
Enable automatic tuning in the Azure portal
Used data size within the database is close to maximum configured database size (MAXSIZE). To allow continued data growth, increase maximum database size proactively, or scale up to a service tier or service objective that supports higher maximum database size, or implement data compression, or delete unneeded data.
If used data size is approaching 4 TB, plan on reducing used data size by compressing or deleting data, or switch to the Hyperscale service tier.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Increase MAXSIZE to 4 TB
ALTER DATABASE [database_name] MODIFY (MAXSIZE = 4096 GB); -- adjust database name and MAXSIZE value according to requirements
Manage file space for databases in Azure SQL Database
What happens when database storage limit is reached
Storage size allocated for data files is close to maximum configured database size (MAXSIZE). If used size is not close to MAXSIZE and significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be executed on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Storage size allocated for data files is much larger than used data size. If significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource intensive process that should not be used on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large readaheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Significant CPU throttling has recently occurred, as noted in the details column. If workload performance has been inadequate during that time, tune the query workload to consume less CPU, scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries ranked high by cpu time.
CPU throttling is common on lower and resource-constrained service objectives, e.g. Basic, S0, S1, etc.
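To see recent CPU and other resource utilization for the database directly, you can query the sys.dm_db_resource_stats view, which reports 15-second intervals covering roughly the last hour; a minimal sketch:
-- Recent resource utilization for this database, most recent intervals first
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;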
What happens when CPU resource limits are reached
Out of memory errors have recently occurred, as noted in the details column. Tune the query workload to consume less memory, scale up to a service objective with more memory, or both. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
What happens when memory resource limits are reached
Memory grant waits and/or timeouts have recently occurred. The data in the details column notes the number of recent requests for memory grants that had to wait (manifested as RESOURCE_SEMAPHORE waits), and the number of such requests that timed out while waiting. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
Tune query workload to use smaller memory grants, or reduce the number of queries running concurrently, or scale up to a service objective with more memory.
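To see how much time the workload has spent waiting on memory grants since database engine startup, a query along these lines can be used (a sketch based on the database-scoped sys.dm_db_wait_stats view):
-- Cumulative memory grant (RESOURCE_SEMAPHORE) waits since engine startup
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_db_wait_stats
WHERE wait_type LIKE 'RESOURCE_SEMAPHORE%';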
What happens when memory resource limits are reached
Nonclustered indexes with low reads found. The details column contains a list of non-unique nonclustered indexes where the number of index read operations is much lower than the number of index write (update) operations.
As data changes, indexes must be updated, which requires CPU and IO resources and slows down DML statements. The resource overhead of updating indexes that have few reads may outweigh their benefit, if any. This overhead is particularly noticeable when there are many indexes with low reads in the database, a phenomenon known as "over-indexed" databases.
If the data in details covers a sufficiently long period during which all critical workloads have run, drop these indexes. Then, monitor performance to validate that no significant negative impact has occurred. If a performance regression is unacceptable, recreate some or all of the dropped indexes.
When running query workloads on readable secondary replicas, use the sample query below to collect index read and write statistics from all replicas and add them up for each index. An index that has low reads on the primary replica may have high reads on readable replicas, and may be required for that reason.
-- Index read (seeks, scans, lookups) and write (updates) statistics for each nonclustered index not used for constraint enforcement
SELECT QUOTENAME(OBJECT_SCHEMA_NAME(o.object_id)) AS schema_name,
QUOTENAME(o.name) AS object_name,
QUOTENAME(i.name) AS index_name,
ius.user_seeks,
ius.user_scans,
ius.user_lookups,
ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
ON ius.object_id = i.object_id
AND
ius.index_id = i.index_id
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND
ius.object_id = o.object_id
WHERE ius.database_id = DB_ID()
AND
i.type_desc = 'NONCLUSTERED'
AND
i.is_primary_key = 0
AND
i.is_unique_constraint = 0
AND
i.is_unique = 0
AND
o.is_ms_shipped = 0;
Index Architecture and Design Guide
ROW or PAGE data compression opportunities found. The details column contains a list of partition ranges for each object and index where data compression may be beneficial, based on recent workload sampling and heuristics. The current and suggested new data compression types are shown. The list is sorted with the largest objects first, letting you focus on compressing them first to maximize space and memory savings. The impact of compressing smaller objects is proportionally smaller. However, consider future growth when deciding what to compress: it is easier to compress a table while it is small than once it has become large.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
In the following example of data in the details column, the suggested data compression is ROW for the first 5 partitions, and PAGE for the last 5 partitions, of the clustered index on the usertable table. The total size of all partitions in each range, and a breakdown of that size by storage type, are included. Note that row overflow and LOB storage is not compressible.
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 1-5, partition range total size (MB): 5,319.10 (in-row: 5,319.10, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: ROW
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 6-10, partition range total size (MB): 6,216.36 (in-row: 6,216.36, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: PAGE
ROW compression adds minimal CPU utilization while providing substantial storage and memory savings. For most workloads, ROW compression should be enabled by default for new tables and indexes.
PAGE compression adds higher CPU utilization, but also provides higher storage and memory savings.
-- Create an index with ROW compression
CREATE INDEX ix_table1_1 ON dbo.table1 (column1, column2) WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=ROW);
-- Rebuild an index and use PAGE compression
ALTER INDEX ix_table1_2 ON dbo.table1 REBUILD WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=PAGE);
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
This tip may only appear on the primary replica.
Data Compression: Strategy, Capacity Planning and Best Practices
There are recent occurrences of transaction log IO (aka log rate) approaching the limit of the service objective, as noted in the details column. To improve performance of bulk data modifications, including data loading, consider tuning the workload to reduce log IO, or scale up to a service objective with a higher log IO limit.
Additional solutions and workarounds include:
- Using data compression (ROW, PAGE, columnstore) to reduce effective log IO;
- Loading transient data to be further transformed/aggregated into the tempdb database (e.g. using global temporary tables), or into non-durable memory-optimized tables. If using global temporary tables (e.g. ##tablename), set the GLOBAL_TEMPORARY_TABLE_AUTO_DROP database-scoped configuration to OFF, as shown below, to retain global temporary tables until the database engine restarts, or until they are dropped explicitly.
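For example, the database-scoped configuration mentioned in the last item above can be changed as follows:
-- Retain global temporary tables until the database engine restarts, or until they are dropped explicitly
ALTER DATABASE SCOPED CONFIGURATION SET GLOBAL_TEMPORARY_TABLE_AUTO_DROP = OFF;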
This tip may only appear on the primary replica.
Transaction log rate governance
A significant part of plan cache is occupied by single-use plans, which have never been reused by query executions and therefore use memory unnecessarily.
To avoid caching single-use plans for a database and reduce memory consumption, enable the OPTIMIZE_FOR_AD_HOC_WORKLOADS database-scoped configuration for each database where single-use plans occupy a large amount of memory, as noted in the details column. Review application code to ensure that all parameter data type declarations match table column data types exactly, including the size, precision, and scale of the column, and are not derived from each value assigned to the parameter.
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
ALTER DATABASE SCOPED CONFIGURATION
Queries could benefit from indexes that do not currently exist. The details column includes data to help create new indexes, based on metadata collected since database engine startup.
For more accurate indexing recommendations based on longer-term historical data, and including CREATE INDEX and DROP INDEX statements, see performance recommendations for this database in the Azure portal, or use the Get-AzSqlDatabaseRecommendedAction PowerShell cmdlet.
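As a starting point for manual analysis, the missing index DMVs can also be queried directly; a sketch (the ranking expression shown here is an assumption, not part of the script):
-- Missing index suggestions accumulated since database engine startup
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
INNER JOIN sys.dm_db_missing_index_groups AS g
ON d.index_handle = g.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS s
ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;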
Index Architecture and Design Guide
Redo queue on a secondary readable replica is large. Queries running on this replica may experience significant data latency relative to the primary replica. A large redo queue may also cause longer failovers if the replica is used as the failover target.
Common causes of a large redo queue are CPU and/or IO resource contention on the secondary replica, or blocking of the redo thread due to long-running queries on the secondary replica and concurrent DDL operations on the primary replica.
Tune or reduce the workload on the secondary replica to use fewer resources, and/or increase the service objective. Avoid DDL operations such as schema changes, index rebuilds, or statistics updates on the primary replica while there are long-running queries executing on the secondary replica.
This tip is produced only on readable secondary replicas, including non-Hyperscale read scale-out replicas and geo-replicas.
Monitoring and troubleshooting read-only replicas
Data IOPS are close to the workload group limit. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Workload group IO governance impact is significant. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
Workload group defines resource governance limits for a single database, or for a database in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Data IOPS are close to the resource pool limit. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource pool defines resource governance limits for an elastic pool.
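To review recent resource utilization at the elastic pool level, you can query sys.elastic_pool_resource_stats in the master database on the logical server; a sketch (replace the pool name):
-- In the master database: recent elastic pool resource utilization, most recent intervals first
SELECT end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.elastic_pool_resource_stats
WHERE elastic_pool_name = 'pool-name-here' -- replace with the actual elastic pool name
ORDER BY end_time DESC;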
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Resource pool IO governance impact is significant. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
Resource pool defines resource governance limits for an elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Persistent Version Store (PVS) size is large. PVS size, used and allocated data size, maximum database size, and other relevant information is available in the details column.
PVS is a part of the user database. It is used to store row versions when Accelerated Database Recovery (ADR), Read-Committed Snapshot Isolation (RCSI), or Snapshot Isolation (SI) are enabled for the database. Large PVS size is usually caused by long-running active transactions. It may also occur due to a large number of aborted transactions that haven't yet been cleaned up by the PVS cleaner background thread. In extreme cases, a large PVS may cause the database to run out of space until PVS space is cleaned up.
Review application patterns and ad-hoc query patterns to avoid long-running transactions, particularly when implicit transactions are used. Make sure to explicitly roll back open transactions after query timeouts and other errors.
If there are any transactions that have been unintentionally left open for a long time and caused PVS to become large, either commit them, or roll them back. Alternatively, consider killing idle sessions with long-running transactions to allow PVS cleanup to occur.
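To check the current PVS size directly, a query like the following may help (a sketch based on the sys.dm_tran_persistent_version_store_stats view):
-- Current PVS size for this database
SELECT DB_NAME(database_id) AS database_name,
       persistent_version_store_size_kb / 1024. AS persistent_version_store_size_mb
FROM sys.dm_tran_persistent_version_store_stats
WHERE database_id = DB_ID();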
-- Transactions sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = DB_ID()
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242; -- replace 4242 with the actual session_id value
Manage file space for databases in Azure SQL Database
Paused resumable index maintenance operations found. The list of paused resumable index operations is included in the details column. Paused resumable index operations will be aborted automatically after the period of time specified by the PAUSED_RESUMABLE_INDEX_ABORT_DURATION_MINUTES database-scoped configuration.
Using one of the sample queries below, either resume these index operations, or, if you are not planning to resume them, abort them explicitly to avoid unnecessary performance impact on the DML operations changing these indexes between now and the time when the auto-abort occurs.
ALTER INDEX [index_name] ON [schema_name].[object_name] RESUME; -- change schema, object, and index names
ALTER INDEX [index_name] ON [schema_name].[object_name] ABORT; -- change schema, object, and index names
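To list paused resumable index operations and their progress directly, a query along these lines can be used (a sketch based on the sys.index_resumable_operations view):
-- Resumable index operations and their current state
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS object_name,
       name AS index_name,
       state_desc,
       percent_complete,
       start_time,
       last_pause_time
FROM sys.index_resumable_operations;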
This tip may only appear on the primary replica.
sys.index_resumable_operations
Optimize index maintenance to improve query performance and reduce resource consumption
A clustered columnstore index (CCI) can be created on one or more tables to improve query and data load performance and drastically reduce storage space and memory consumption by the table.
CCI indexes are most useful in analytical/reporting workloads where queries use large scans. CCI indexes are not recommended in OLTP workloads, or when queries predominantly use index seeks, lookups, and small range scans. They are also not recommended when rows in the table are frequently updated, or when less frequent but large updates occur.
The details column provides a list of CCI candidate tables. Index usage statistics for the recent query workload on each table are included, to help determine whether a CCI will benefit queries referencing the table.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
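To convert a candidate table, create a clustered columnstore index on it; a minimal sketch with hypothetical table and index names:
-- Create a clustered columnstore index on a candidate table
-- If the table already has a rowstore clustered index, either drop it first, or re-create it under its existing name WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX cci_table1 ON dbo.table1;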
This tip may only appear on the primary replica.
Columnstore indexes - Design guidance
Columnstore indexes - Query performance
Columnstore indexes - Data loading guidance
Some geo-replication links may be unhealthy. The details column describes geo-replication link health, including data propagation lag (reported on the primary geo-replica only) and replica state.
If a geo-replication link becomes unhealthy during high write load on the primary geo-replica, then the service objective of the secondary geo-replica may need to be increased to sustain data propagation rate and avoid geo-replica unavailability.
For a recently created geo-replica, the SEEDING replication state is expected. In all other cases, the replication state for a healthy geo-replica should be CATCH_UP.
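The link state can also be checked directly; a sketch using the view referenced below, executed on the primary geo-replica:
-- Geo-replication link state and data propagation lag, as seen from the primary geo-replica
SELECT partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;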
sys.dm_geo_replication_link_status
Last partitions are not empty for some partitioned tables. If the very last partition is not empty, splitting it to create a new partition for new data will be slow and resource-intensive, and will block queries accessing the table for the duration of the split operation.
If new partitions are periodically added to accommodate new data, make sure to keep a buffer of several empty partitions at the end of the partition list. While a single empty partition is sufficient, multiple empty partitions are preferred. They reduce the risk of data getting into the very last partition because of failure to split the last empty partition on time.
The details column provides the list of the last several partitions for each table if at least some of them are not empty. If the very last partition (i.e. the partition with the number equal to the total count of partitions) is still empty, act before any data is loaded into it: split it to create at least one more empty partition at the end of the partition list.
Otherwise, plan and prepare for a time-consuming process of splitting a non-empty partition, to avoid all new data accumulating in the last partition. This may require application downtime.
ALTER PARTITION FUNCTION [function_name] SPLIT RANGE ([new boundary value]);
Partitioned Tables and Indexes
Blog articles by Kendra Little:
How to Fix Lopsided Partitioned Tables
Sliding Window Table Partitioning: What to Decide Before You Automate
To help in performance investigations, top queries for a time interval are summarized in the details column. This tip is informational in nature, and is produced when any queries have executed in the time interval. On idle databases, top queries may include the queries executed by the get-sqldb-tips.sql script itself. On readable secondary replicas, this tip reports on queries that executed on the primary.
For each query, identified by its query_hash value, the associated query_id and plan_id values, the number of query executions by execution type (regular, aborted, exception), and weighted wait categories are included in details.
Additionally, each query is ranked by its consumption of the following resource types:
- cpu time
- duration
- execution count
- logical IO reads
- physical IO reads
- maximum query memory used
- log bytes used
- tempdb space used
- parallelism
Lower rank numbers mean higher ranking, e.g. the topmost CPU consuming query has cpu time rank 1.
In the following example, the query with hash 0x5F9043DBAAEBB1FF is the top consumer of tempdb space (tempdb used rank: 1), has two associated query_id values (66 and one more) and three associated plan_id values (36 and two more). It successfully executed 31615 times in the time interval. Most waits (~77.4%) were on CPU.
query hash: 0x5F9043DBAAEBB1FF, query_id: 66 (+1), plan_id: 36 (+2), executions: (regular: 31615, aborted: 0, exception: 0), CPU time rank: 4, duration rank: 32, executions rank: 50, logical IO reads rank: 6, physical IO reads rank: 5, max used memory rank: 79, log bytes used rank: 48, tempdb used rank: 1, parallelism rank: 50, weighted wait categories: CPU (0.774) | Latch (0.137) | Memory (0.057) | Unknown (0.028) | Preemptive (0.002) | Buffer IO (0.002)
Depending on the nature of the performance issue, focus on the queries ranked high for the relevant resource type. For example, if troubleshooting high tempdb space consumption, look at the tempdb used rank.
To change the number of top ranked queries included for each resource type, change the @QueryStoreTopQueryCount variable. By default, the top 2 queries are included for each resource type.
By default, the time interval used is the last hour. You can change the @QueryStoreIntervalMinutes variable from 60 to another value to look at shorter or longer intervals. To look at a time interval with specific start and end times, set the @QueryStoreCustomTimeStart and @QueryStoreCustomTimeEnd variables to the start and end time of the interval. Note that this tip may not appear if the specified interval is shorter than the configured Query Store resource statistics interval length (INTERVAL_LENGTH_MINUTES).
For further investigations, use Query Performance Insight in the Azure portal, Query Store reports in SSMS, or query the sys.query_store_* views directly.
The query below provides resource consumption details for a given query hash value, as well as the associated query_id and plan_id values and the SQL text of the query.
-- Return query/plan runtime statistics in the specified interval for a given query_hash value from the details column for further investigation
DECLARE @QueryHash binary(8) = 0x24C09AE7EDD849D4; -- replace query hash with the actual value from details
DECLARE @QueryStoreIntervalMinutes int = 60; -- if needed, adjust to look at shorter or longer intervals, or set the next two variables to start/end time
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';
DECLARE @QueryStoreTimeFrom datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
@QueryStoreCustomTimeStart
),
@QueryStoreTimeThru datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
SYSDATETIMEOFFSET(),
@QueryStoreCustomTimeEnd
);
SELECT q.query_id,
q.context_settings_id,
qt.query_sql_text,
OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
OBJECT_NAME(q.object_id) AS object_name,
p.plan_id,
TRY_CAST(p.query_plan AS xml) AS query_plan,
SUM(IIF(rs.execution_type_desc = 'Regular', rs.count_executions, 0)) AS count_regular_executions,
SUM(IIF(rs.execution_type_desc = 'Aborted', rs.count_executions, 0)) AS count_aborted_executions,
SUM(IIF(rs.execution_type_desc = 'Exception', rs.count_executions, 0)) AS count_exception_executions,
SUM(rs.avg_cpu_time * rs.count_executions) / SUM(rs.count_executions) AS avg_cpu_time,
SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) AS avg_duration,
SUM(rs.avg_logical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_logical_io_reads,
SUM(rs.avg_physical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_physical_io_reads,
SUM(rs.avg_query_max_used_memory * rs.count_executions) / SUM(rs.count_executions) AS avg_query_max_used_memory,
SUM(rs.avg_log_bytes_used * rs.count_executions) / SUM(rs.count_executions) AS avg_log_bytes_used,
SUM(rs.avg_tempdb_space_used * rs.count_executions) / SUM(rs.count_executions) AS avg_tempdb_space_used,
SUM(rs.avg_dop * rs.count_executions) / SUM(rs.count_executions) AS avg_dop
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE q.query_hash = @QueryHash
AND
rsi.start_time >= @QueryStoreTimeFrom
AND
rsi.start_time <= @QueryStoreTimeThru
GROUP BY q.query_hash,
q.query_id,
q.context_settings_id,
q.object_id,
qt.query_sql_text,
p.plan_id,
p.query_plan
ORDER BY query_id, plan_id
OPTION (RECOMPILE);
Query Performance Insight for Azure SQL Database
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Volume storage space allocated for tempdb data files is close to the maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
In the Premium and Business Critical service tiers, tempdb shares local SSD space with user databases. For large (1-4 TB) databases, large tempdb allocated space can prevent user database growth and cause out-of-space errors.
As an emergency measure, you can reset tempdb allocated space to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
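To check current tempdb data file space usage directly, a query like the following can be used (a sketch; it reads the tempdb file space usage view):
-- Allocated and unallocated space in tempdb data files, in MB
SELECT SUM(allocated_extent_page_count) * 8 / 1024. AS allocated_space_mb,
       SUM(unallocated_extent_page_count) * 8 / 1024. AS unallocated_space_mb
FROM tempdb.sys.dm_db_file_space_usage;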
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Storage space used in tempdb data files is close to the maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. Continuing growth of used space may cause tempdb out-of-space errors. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
As an emergency measure, you can reduce tempdb used size to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Volume storage space allocated for the tempdb log file is close to the maximum tempdb log size. The details column provides tempdb allocated, used, and maximum transaction log sizes. This may be caused by a long-running transaction modifying data in a temporary table, or using tempdb for internal storage.
As an emergency measure, you can reset tempdb allocated log size to its initial small value by failing over the database or elastic pool using PowerShell or the REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool. Alternatively, you can kill the sessions with long-running transactions in tempdb.
-- Transactions in tempdb sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = 2
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242; -- replace 4242 with the actual session_id value
Queries to monitor tempdb usage
There are recent occurrences of worker utilization approaching the workload group limit of the service objective, as noted in the details column. A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If the limit is reached, queries and new connection attempts will fail. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workload and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in a time interval, review top queries ranked high by parallelism and/or execution count.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
What happens when worker resource limit is reached
There are recent occurrences of worker utilization approaching the resource pool limit of the service objective, as noted in the details column. A resource pool defines resource governance limits for an elastic pool.
If the limit is reached, queries and new connection attempts will fail for some or all databases in the elastic pool. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workloads and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in an elastic pool in a time interval, review top queries for all databases in the pool starting with the most utilized, ranked high by parallelism and/or execution count. To find the most utilized databases in a pool, query sys.resource_stats.
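A sketch of such a query, executed in the master database on the logical server (the database names in the filter are hypothetical; list the databases in the pool of interest):
-- In the master database: recent per-database resource utilization (sys.resource_stats reports 5-minute intervals, retained for about 14 days)
SELECT database_name,
       start_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.resource_stats
WHERE database_name IN ('db1-in-pool', 'db2-in-pool') -- replace with the databases in the elastic pool
ORDER BY start_time DESC;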
Resource limits for elastic pools using the vCore purchasing model
Resources limits for elastic pools using the DTU purchasing model
What happens when worker resource limit is reached
sys.elastic_pool_resource_stats
Notable network connectivity events have recently occurred in this database, or other databases in the same elastic pool. This includes network errors, slow login attempts, and abnormally closed connections. Retry logic implemented by an application or a MSSQL network driver can prevent these events from impacting applications.
This tip is informational in nature, and is intended to facilitate network connectivity troubleshooting. The presence of network connectivity events does not necessarily indicate a problem. Some events, while reported by this tip, may be expected in Azure SQL Database. For example, idle sessions are killed after a period of inactivity by design.
The details column provides a query to view detailed connectivity event data obtained in the latest execution of the script.
To reduce the likelihood of workload-impacting network events, configure the logical server to use the Redirect connection policy.
High CPU utilization by the database engine instance was sustained for a significant period of time, as noted in the details column. If the database is in an elastic pool, this refers to cumulative CPU utilization by all databases in the same pool.
If workload performance has been inadequate during that time, tune query workload to consume less CPU, or scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries for the database, or for the most utilized databases in an elastic pool, ranked high by cpu time.
What happens when CPU resource limits are reached
Identify query performance issues
Application and database tuning guidance
Detectable database performance patterns
sys.elastic_pool_resource_stats
Some statistics may be out of date. This may cause poor query performance due to suboptimal query plans. The list of potentially out-of-date statistics on user objects and some system objects is included in the details column.
Update statistics by executing sp_updatestats, UPDATE STATISTICS, or a database maintenance solution, and consider establishing a periodic maintenance procedure to keep statistics up to date.
EXEC sys.sp_updatestats;
Database maintenance solution by Ola Hallengren
Many tables do not have any indexes. This tip appears when a substantial percentage of all tables in the database (excluding very small tables) do not have any indexes. This often causes poor query performance due to suboptimal data access. The list of such tables is included in the details column.
You can enable the CREATE INDEX auto-tuning option for the database to have the system create missing indexes automatically to improve query performance in recent workloads that used these tables.
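A sketch of enabling that option with T-SQL (it can also be enabled in the Azure portal, as described in the link below):
-- Enable automatic index creation for this database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON);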
If you prefer to create indexes manually, review the Index Architecture and Design Guide to learn how to create efficient indexes. If the Queries could benefit from indexes that do not currently exist tip also appears, you can derive new indexes from the data in the details column of that tip.
Enable automatic tuning in the Azure portal to monitor queries and improve workload performance
Significant lock blocking has recently occurred. This tip appears when at least one of the following conditions has recently occurred:
- The time spent by queries waiting on locks during a time interval exceeded a threshold
- The number of observed blocked tasks exceeded a threshold
The details
column includes a list of recent time intervals when one or both of these conditions were occurring. Each entry includes:
- Interval start time,
- Interval end time,
- Interval duration,
- Total time user queries spent waiting on locks during the interval,
- The minimum number of observed blocked tasks during the interval,
- The maximum number of observed blocked tasks during the interval,
- The total number of lock waits during the interval.
The number of blocked tasks is recorded by the database engine using point-in-time snapshots every 15-20 seconds. The actual number of blocked tasks between two snapshots may exceed the reported number. If the minimum number of blocked tasks is greater than zero for long periods of time, it usually indicates workload-impacting long-term blocking.
Short-lived lock blocking is expected in many MSSQL workloads. Locks are used to guarantee the ACID properties of transactions. However, unnecessary long-term blocking can reduce workload throughput, and can cause query timeouts and application errors. Common reasons for longer workload-impacting blocking include:
- Manual (ad-hoc) transactions not committed or rolled back for a long time. This is particularly common when implicit transactions are used.
- Not using the Read Committed Snapshot Isolation (RCSI) database option.
- Using unnecessarily large transactions, i.e. loading or modifying a large dataset in a single transaction instead of breaking it into smaller batches.
- Not handling query execution timeouts properly in application code. Application code that opens an explicit or implicit transaction and then receives a query timeout error should attempt to immediately roll back the transaction, or set the `XACT_ABORT` option to `ON` when opening new connections to the database (see the example after this list). Otherwise, a transaction could remain open for a long time until the connection is terminated or reused, holding locks and blocking other sessions.
- Poorly written queries or suboptimal query plans causing DML (`INSERT`, `UPDATE`, `DELETE`, `MERGE`) statements to execute for a long time while holding locks.
- Long-running DDL operations such as offline index rebuilds or schema changes executed concurrently with other query workloads.
- Unnecessarily mixing DML and DDL statements in the same transaction.
- Unnecessarily using `REPEATABLE READ` and `SERIALIZABLE` transaction isolation levels (e.g. setting them as default in application code), or locking hints such as `HOLDLOCK`, `PAGLOCK`, `READCOMMITTEDLOCK`, `REPEATABLEREAD`, `SERIALIZABLE`, `TABLOCK`, `TABLOCKX`, `UPDLOCK`, `XLOCK`.
- Unintended lock escalation. If necessary, lock escalation can be disabled per table at the expense of using more memory for locks.
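As a minimal illustration of the timeout-handling item above, the following statement enables `XACT_ABORT` for the current session, so that run-time errors and client query timeouts cause an open transaction to be rolled back automatically:
-- Enable XACT_ABORT so that run-time errors and client query timeouts roll back the open transaction
SET XACT_ABORT ON;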
You can use the sp_whoisactive
stored procedure to observe currently executing queries and see whether they are blocked waiting on locks. In Azure SQL Database, create sp_whoisactive
in the user database, not in the master
database.
For suitable new and existing workloads, evaluate In-Memory OLTP. This is a lock-free database engine technology not subject to lock blocking issues.
-- View currently executing queries and the locks they are holding or waiting for
EXEC sp_whoisactive @get_locks = 1;
-- Transactions sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = DB_ID()
ORDER BY transaction_duration_seconds DESC;
-- Kill the session if it has a transaction that is open for a long time and is blocking other queries
KILL 4242; -- replace 4242 with the actual session_id value
-- Disable lock escalation for a table
-- Only consider this if observing object-level IX or X locks that block other queries
ALTER TABLE [table-name] SET (LOCK_ESCALATION = DISABLE);
Understand and resolve Azure SQL Database blocking problems
Transaction Locking and Row Versioning Guide
The number of recent query optimizations is high. This tip appears when a significant portion of recent query requests required the query optimizer to create a new execution plan, instead of reusing an existing plan from plan cache. The list of time intervals with a high number of query plan optimizations is included in the details
column.
Query plan optimization consumes CPU and memory and delays query execution. If a plan for a query is already cached, reusing it usually avoids this additional resource utilization and execution delay.
At times, query plans need to be recompiled to ensure query correctness or improve performance by creating a more optimal plan. At steady state, such recompilations and optimizations are infrequent and are unlikely to produce this tip.
If many query optimizations occur continuously, it may be caused by one of the following:
- Using literal values instead of query parameters in queries, for example `WHERE name = 'thing'` instead of `WHERE name = @parameter`. The database engine may automatically parameterize some queries, i.e. replace literal values with parameters, to enable plan reuse. This is known as simple parameterization. You can switch from simple parameterization (default) to forced parameterization to make automatic parameterization more aggressive, though this may result in less optimal query plans. While automatic parameterization (either simple or forced) helps reduce optimizations, explicit parameterization is the preferred way to achieve that.
-- Enable forced parameterization for a database
ALTER DATABASE CURRENT SET PARAMETERIZATION FORCED;
-- Enable simple parameterization for a database
ALTER DATABASE CURRENT SET PARAMETERIZATION SIMPLE;
- Not declaring a parameter data type to fully and exactly match the data type of the corresponding column in the database. The database engine may not always reuse a plan when there is a mismatch in parameter data type, even if the SQL text of the query is the same. In that case, a new plan is created and cached. Review application code to ensure that all parameter data type declarations match table column data types exactly, including size, precision, and scale of the column, and are not derived from each value assigned to the parameter.
For example, using SqlClient in C#:
// declare a parameter for an nvarchar(10) column, specifying Size explicitly
SqlParameter nameParameter = new SqlParameter("@Name", SqlDbType.NVarChar);
nameParameter.Value = "thing";
nameParameter.Size = 10;
// declare a parameter for a decimal(8,4) column, specifying Precision and Scale explicitly
SqlParameter amountParameter = new SqlParameter("@Amount", SqlDbType.Decimal);
amountParameter.Value = 1234.5678m;
amountParameter.Precision = 8;
amountParameter.Scale = 4;
It is normal and expected to see a high number of compilations or recompilations, and a temporary increase in resource usage, for a short time in the following cases:
- After a database or an elastic pool has been restarted (failed over) and before plans for repeating queries in the workload have been cached,
- After plan cache has been flushed, for example because of memory pressure, or with an explicit command,
- When a new query workload starts running, for example after releasing a new version of the application.
The following query returns top queries compiled in a time interval, ordered by compilation CPU cost.
-- Queries compiled in a time interval with a potentially significant compilation CPU cost
DECLARE @QueryStoreIntervalMinutes int = 60; -- if needed, adjust to look at shorter or longer intervals, or set the next two variables to start/end time
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';
DECLARE @QueryStoreTimeFrom datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
@QueryStoreCustomTimeStart
),
@QueryStoreTimeThru datetimeoffset = IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
SYSDATETIMEOFFSET(),
@QueryStoreCustomTimeEnd
);
SELECT TOP (50) q.query_hash,
MIN(q.query_id) AS sample_query_id,
COUNT(DISTINCT(q.query_id)) AS count_distinct_query_id,
MIN(qt.query_sql_text) AS sample_query_text,
COUNT(DISTINCT(qt.query_text_id)) AS count_query_text_variants,
OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
OBJECT_NAME(q.object_id) AS object_name,
SUM(q.count_compiles) AS count_compiles,
SUM((q.avg_bind_cpu_time + q.avg_optimize_cpu_time) * q.count_compiles) AS total_compilation_cpu_time,
MIN(initial_compile_start_time) AS initial_compile_start_time,
MAX(last_compile_start_time) AS last_compile_start_time,
MAX(last_execution_time) AS last_execution_time
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
WHERE q.last_compile_start_time >= @QueryStoreTimeFrom
AND
q.last_compile_start_time <= @QueryStoreTimeThru
GROUP BY q.query_hash,
q.object_id
HAVING SUM(q.count_compiles) > 1
ORDER BY total_compilation_cpu_time DESC
OPTION (RECOMPILE);
Note that if Query Store query capture mode is not `ALL`, the query above may not return all queries that have contributed to a high number of optimizations.
-- Set query capture mode to ALL for troubleshooting purposes
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
-- Reset query capture mode to the default AUTO setting
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
Execution Plan Caching and Reuse
Plan Caching and Recompilation whitepaper
Factors that affect query plan changes
SQL Compilations/sec is not what you think it is - blog article by Fabiano Amorim
Row locks or page locks are disabled for some indexes. The list of indexes that have row or page locks disabled is provided in the details
column.
Disabling row or page locks can lead to concurrency problems (blocking and deadlocks), and cause unnecessary CPU and memory utilization. When both row and page locks are disabled, the use of the index in a query will require an object-level lock, blocking other queries from reading or modifying any data in the table.
An index cannot be reorganized when page locks are disabled.
With rare exceptions, both row and page locks should be enabled. The database engine will choose the appropriate lock type automatically to reduce blocking and resource utilization.
-- Enable both row and page locks for an index
ALTER INDEX [index-name-here] ON [schema-name-here].[table-name-here] SET (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
Disabling ROW and PAGE Level Locks in SQL Server - blog article by Klaus Aschenbrenner
Allocated local storage is close to maximum local storage. Local storage usage details are provided in the details
column.
This tip appears for databases and elastic pools in the Premium and Business Critical service tiers when their total local storage consumption by data files and transaction log files, including the tempdb
database, is approaching the maximum storage limit.
If local storage consumed by tempdb
is large, as an emergency measure, you can reset tempdb
allocated space to its initial small value by failing over the database or elastic pool using PowerShell or REST API. Use Invoke-AzSqlDatabaseFailover or database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database or all databases in an elastic pool.
If local storage is consumed by a large transaction log for a user database, you can shrink the transaction log file, or wait for the periodic automatic shrink to occur.
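To see how much space each file uses and to shrink the transaction log file of a user database, a sketch like the following can be used; the 1024 MB target size is an arbitrary example, and the log file is typically `file_id` 2, which should be verified in `sys.database_files`:
-- View allocated and used space per database file
SELECT file_id, type_desc, name,
       size * 8. / 1024 AS allocated_mb,
       FILEPROPERTY(name, 'SpaceUsed') * 8. / 1024 AS used_mb
FROM sys.database_files;
-- Shrink the transaction log file (file_id 2 in most databases) to a 1024 MB target
DBCC SHRINKFILE (2, 1024);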
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
Manage file space for databases in Azure SQL Database
What happens when database storage limit is reached
Column collation does not match database collation. The list of objects where column collation does not match database collation is provided in the details
column.
A column that does not have a collation explicitly specified inherits the database collation. Referencing columns with different collations in the same expression can result in collation conflict errors. As a workaround, the COLLATE clause can be used to explicitly cast one of the conflicting collations to the other; however, this may prevent the use of index seeks and result in worse performance than when there is no collation conflict.
Scenarios where individual columns need to use a collation different from the database collation are infrequent. Unless there is a clear need to customize column collation, omit the COLLATE clause when defining table columns.
-- An example of a collation conflict and a workaround
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;
-- returns SQL_Latin1_General_CP1_CI_AS
CREATE TABLE T1
(
String varchar(10) COLLATE Latin1_General_100_BIN2_UTF8 -- a collation different from the database collation is specified
);
CREATE TABLE T2
(
String varchar(10) -- default database collation is used
);
SELECT 1
FROM T1
INNER JOIN T2
ON T1.String = T2.String
;
/*
Fails with:
Msg 468, Level 16, State 9, Line 14
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.
*/
SELECT 1
FROM T1
INNER JOIN T2
ON T1.String COLLATE DATABASE_DEFAULT = T2.String -- work around by explicitly casting T1.String to database collation
;
Indexes with excessively large keys found. The list of indexes with excessively large keys is provided in the details
column.
The database engine limits the maximum length of data in an index key: 900 bytes for clustered indexes and 1,700 bytes for nonclustered indexes. If the index key contains variable length columns, an index can still be created even when the total of the maximum column lengths exceeds these limits; however, any attempt to insert or update a row that would cause the actual key size to exceed the limit will fail.
It is not recommended, and in most cases unnecessary, to create indexes with large keys. This is particularly relevant for the clustered index key, which is also stored in every nonclustered index on the same table, increasing storage size and storage IO. If many columns need to be present in a nonclustered index, add them with the `INCLUDE` clause instead of including them in the index key.
Index Architecture and Design Guide
Create indexes with included columns
Disabled indexes found. The list of disabled indexes is provided in the details
column.
Disabled indexes consume data and backup storage, but are not used or updated by the database engine. Any primary key or unique constraints dependent on a disabled index are also disabled. If the clustered index is disabled, the table becomes inaccessible.
To re-enable a disabled index, it must be rebuilt. This is similar to recreating the index. For large disabled indexes, index rebuild may consume significant time and resources.
If a disabled index is no longer required, it should be dropped.
-- Rebuild a disabled index
ALTER INDEX [disabled_index_name] ON [schema_name].[table_name] REBUILD;
-- Drop a disabled index if it exists
DROP INDEX IF EXISTS [disabled_index_name] ON [schema_name].[table_name];
Disable Indexes and Constraints
Indexes with low fill factor found. The list of indexes with low fill factor is provided in the details
column.
Fill factor is an index property that determines how much free space to leave on a B-tree index leaf page when adding rows to a page. By default, fill factor is set to 100, which means that pages are filled to capacity. This is the recommended value. Setting fill factor to 100 improves performance by storing more data on each page. This reduces storage IO and increases the amount of data cached in memory.
In certain specific scenarios, such as frequently modified indexes with the leading column containing non-sequential GUID values, lowering the fill factor to 70-90% helps reduce the frequency of performance-impacting page splits.
Unless there is a clear need to change the fill factor from the default of 100 (or 0, which is equivalent to 100 in this context), rebuild indexes that have low fill factor, setting fill factor to 100.
Over time, as rows on a page are updated and deleted, and as pages are split, the percentage of each page used by data (known as page density) may become substantially lower than the fill factor. If page density is low, follow the index maintenance strategy to determine if indexes should be reorganized or rebuilt.
-- Rebuild an index and set fill factor to 100
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
-- Determine average page density and fragmentation for an index.
-- Replace [object-id-here] and [index-id-here] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.avg_fragmentation_in_percent,
ips.page_count,
ips.alloc_unit_type_desc,
ips.ghost_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object-id-here], [index-id-here], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
Optimize index maintenance to improve query performance and reduce resource consumption
Pages and Extents Architecture Guide
Non-unique clustered indexes found. The list of non-unique clustered indexes is provided in the details
column.
For rowstore tables, there are several desired properties to consider when designing the clustered index:
- Unique. If the clustered index is not defined as unique, the database engine will add a hidden "uniqueifier" integer column to the index to make it unique internally. This column adds 4 bytes to the clustered index key (and every nonclustered index key), increasing storage, memory, and IO consumption. When a suitable narrow and static column already exists in the table (e.g. an IDENTITY or sequence-based column), add it to the clustered index key and define the clustered index as unique to avoid this overhead.
- Narrow. The clustered index key is included in every nonclustered index key on the same table. Making the clustered key narrow reduces storage, memory, and IO consumption.
- Static. If the clustered index key is updated, every nonclustered index key must be updated as well. Additionally, updating the clustered index key may require the entire row to move to a different page. Choosing static columns for the clustered index key avoids the CPU and IO overhead required to update nonclustered indexes and move rows to other pages when rows are updated.
- Ever-increasing. If the clustered index key is ever-increasing, rows are always added to new pages at the end of the index. Otherwise, rows are added to pages elsewhere in the index, potentially causing performance-impacting page splits, increasing index fragmentation, and decreasing page density.
- Non-nullable and fixed length. If nullable or variable length columns are a part of the clustered index key, the database engine must add several bytes to each row to store the nullability and column length data. This adds storage, memory, and IO overhead.
-- An example of recreating the clustered index to make it unique
CREATE TABLE ClusteredIndexExample
(
c1 int NOT NULL,
c2 varchar(10) NOT NULL,
c3 int NOT NULL IDENTITY,
INDEX ixClusteredIndexExample1 CLUSTERED (c1) -- a non-unique clustered index
)
-- Add an IDENTITY column to the clustered index and make it unique
CREATE UNIQUE CLUSTERED INDEX ixClusteredIndexExample1 ON ClusteredIndexExample (c1, c3) WITH (DROP_EXISTING = ON, ONLINE = ON, RESUMABLE = ON);
Index Architecture and Design Guide
Most of the IDENTITY range is used. The list of columns that are close to running out of IDENTITY values is provided in the details
column.
Attempting to insert a row when the new IDENTITY value is outside of the data type range will cause an error. Either change the column data type to support a wider range (for example, from `int` to `bigint`), or use `DBCC CHECKIDENT` to reseed the identity value to start at the beginning of an unused range, e.g. with the minimum negative integer value.
Changing column data type in a large table may consume significant time and resources and block other queries. For larger tables, instead of changing the data type in place using `ALTER TABLE ... ALTER COLUMN`, create a new table with the desired schema, copy data to the new table, drop the original table, and rename the new table and its constraints and indexes to match the names of the original table.
-- An example of changing column data type from int to bigint in place
CREATE TABLE IdentityDataTypeExample
(
c1 int NOT NULL IDENTITY,
c2 varchar(10) NOT NULL
);
-- Change column data type to bigint while keeping the IDENTITY property
-- This works if there are no other objects (indexes, constraints) dependent on the column being changed
-- This can take a long time for larger tables
ALTER TABLE IdentityDataTypeExample
ALTER COLUMN c1 bigint NOT NULL WITH (ONLINE = ON);
-- An example of changing an IDENTITY column to use the range of negative integers
CREATE TABLE IdentityRangeExample
(
c1 int NOT NULL IDENTITY (1,1),
c2 varchar(10) NOT NULL
);
-- reseed IDENTITY to start with the smallest negative integer, while keeping the original increment of 1
DBCC CHECKIDENT (IdentityRangeExample, RESEED, -2147483648);
Most of the sequence range is used. The list of sequences that are close to reaching their maximum or minimum value is provided in the details
column.
When a sequence reaches its maximum or minimum value, several problems may occur:
- If the sequence is defined with `NO CYCLE` (default), attempts to get the next value from the sequence will fail with an error.
- If the sequence is defined with `CYCLE`, it may start generating previously generated numbers. If sequence values are used in a column with a primary key or unique constraint, a constraint violation error may occur.
- If the sequence is defined with `CYCLE`, generated numbers may switch from positive to negative, and vice versa. This may be unexpected by the application.
Depending on the situation, consider one of the following solutions:
- If a sequence has been defined with `MINVALUE` and `MAXVALUE` smaller than the full range of the data type, alter these parameters to support a wider range of values.
- Drop and recreate the sequence to use a larger data type, e.g. `bigint` instead of `int`. Note that this may also require changing data types of dependent columns.
- If a sequence has been defined with `NO CYCLE`, alter it to `CYCLE`. If the starting value of the sequence was in the middle of the range, this will use the remainder of the range that hasn't been used yet.
- If the sequence has been defined with the starting value in the middle of the range, alter the starting value to be in the beginning of the range.
Using the last two options may change the sign of the generated values.
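The following sketch shows how to review sequence range utilization and make some of the changes described above with `ALTER SEQUENCE`; the sequence name is a hypothetical placeholder:
-- View current value and range for all sequences in the database
SELECT SCHEMA_NAME(schema_id) AS schema_name,
       name,
       current_value,
       minimum_value,
       maximum_value,
       is_cycling
FROM sys.sequences;
-- Widen the allowed range of an int-based sequence to the full int range
ALTER SEQUENCE dbo.SequenceName MINVALUE -2147483648 MAXVALUE 2147483647;
-- Allow the sequence to cycle when it reaches the end of its range
ALTER SEQUENCE dbo.SequenceName CYCLE;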
Disabled or not trusted constraints found. The list of constraints is provided in the details
column.
A disabled foreign key or check constraint potentially serves as an indication of the database designer's intent, but does not play any other role. Data modifications that would violate the constraint are allowed.
A not trusted foreign key or check constraint does prevent data modifications that violate it. However, it does not guarantee that all existing data also satisfies the constraint. For that reason, the database engine does not take advantage of not trusted constraints. This prevents the use of not trusted constraints in query optimization, and may block operations such as partition switching.
If a disabled constraint is enabled, but existing data is not checked for conformance with the constraint, the constraint remains marked as not trusted. Checking existing data for conformance with the constraint as part of enabling the constraint is possible and recommended, but can take substantial time for larger tables.
-- Examples of modifying a check constraint. Foreign key constraints behave similarly.
-- Enable an existing constraint on a table. The constraint remains not trusted.
ALTER TABLE TableName CHECK CONSTRAINT ConstraintName;
-- Enable an existing constraint on a table and check existing data for conformance to mark the constraint as trusted.
ALTER TABLE TableName WITH CHECK CHECK CONSTRAINT ConstraintName;
-- Disable an existing constraint. This also marks it as not trusted until it is re-enabled and existing data is checked.
ALTER TABLE TableName NOCHECK CONSTRAINT ConstraintName;
Page compression is ineffective for some indexes. The list of indexes and partitions where some page compression attempts consumed CPU but did not result in any compression is provided in the details
column.
For a table or index with page compression enabled, the database engine attempts to compress any added or modified page. These attempts consume CPU. However, if the achieved compression ratio is less than 20%, the page is stored with row compression only. This tip outputs a list of partition ranges in each index where the ratio of successful page compression attempts to all page compression attempts is below a threshold.
To reduce unnecessary CPU utilization, consider changing page compression to row compression for such indexes.
-- Rebuild an index and use ROW compression
ALTER INDEX ix_table1_2 ON dbo.table1 REBUILD WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=ROW);
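Before rebuilding, the potential space savings of switching to ROW compression can be estimated with `sp_estimate_data_compression_savings`; the schema, table, index, and partition values below are placeholders:
-- Estimate space savings from ROW compression for all indexes of a table
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'table1',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';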
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
This tip may only appear on the primary replica.
Data Compression: Strategy, Capacity Planning and Best Practices
The Curious Case of… tracking page compression success rates, a blog article by Paul Randal