Merry-go-round scanning?

In SQL Server Enterprise edition, there is a strategy the Database Engine follows when performing scans on tables!

It's also known as the "advanced scan" feature.

The advanced scan feature allows multiple tasks to share full table scans. If the execution plan of a Transact-SQL statement requires a scan of the data pages in a table and the Database Engine detects that the table is already being scanned for another execution plan, the Database Engine joins the second scan to the first, at the first scan's current location.

The Database Engine reads each page one time and passes the rows from each page to both execution plans.

This continues until the end of the table's data is reached.

At that point, the first execution plan has the complete results of a scan, but the second execution plan must still retrieve the data pages that were read before it joined the in-progress scan. The scan for the second execution plan then wraps back to the first data page of the table and scans forward to where it joined the first scan. Any number of scans can be combined like this. The Database Engine will keep looping through the data pages until it has completed all the scans. This mechanism is also called "merry-go-round scanning" and demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.

For example, assume that you have a table with 500,000 pages. User-1 executes a Transact-SQL statement that requires a scan of the table. When that scan has processed 100,000 pages, User-2 executes another Transact-SQL statement that scans the same table. The Database Engine schedules one set of read requests for pages after 100,001 and passes the rows from each page back to both scans. When the scan reaches the 200,000th page, User-3 executes another Transact-SQL statement that scans the same table. Starting with page 200,001, the Database Engine passes the rows from each page it reads back to all three scans. After it reads the 500,000th page, the scan for User-1 is complete, and the scans for User-2 and User-3 wrap back and start reading the pages starting with page 1. When the Database Engine gets to page 100,000, the scan for User-2 is complete. The scan for User-3 then continues alone until it reads page 200,000. At that point, all the scans have been completed.
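A minimal way to observe the ORDER BY caveat (dbo.BigTable and BigTableID are hypothetical names, and the scan-sharing behavior applies to Enterprise edition):

-- Session 1: start a long full scan of a large table
SELECT * FROM dbo.BigTable;

-- Session 2: run the same scan while session 1 is still in progress.
-- If the engine joins this scan to the one already running, its result
-- set starts mid-table and wraps around, so the rows arrive in a
-- different order than in session 1.
SELECT * FROM dbo.BigTable;

-- Only an explicit ORDER BY guarantees the output order:
SELECT * FROM dbo.BigTable ORDER BY BigTableID;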

Without this strategy:
"Each user would have to compete for buffer space and cause disk arm contention. The same pages would then be read once for each user, instead of being read one time and shared by multiple users, slowing down performance and taxing resources."

Should tempdb be sized as some percentage of the largest database on the SQL instance?

It doesn't work like that!

There is no arithmetic formula to calculate the tempdb size. Right?

Do we have anything at all?

Yes. We can still estimate it by measuring how much tempdb the following operations use when they store intermediate results (a short sketch follows the list):

1. Memory spills caused by hash or sort operations
2. Rebuilding indexes with the SORT_IN_TEMPDB option
3. DBCC CHECKDB on a large database
4. Using temp (#/##) tables
5. Multiple aggregations over huge data sets
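A quick sketch of items 2 and 3 (the database, table, and index names are hypothetical); DBCC CHECKDB even offers ESTIMATEONLY to report the tempdb space it expects to need:

-- 2. Rebuilding an index with SORT_IN_TEMPDB places the sort runs in tempdb
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (SORT_IN_TEMPDB = ON);

-- 3. Estimate the tempdb space DBCC CHECKDB would need,
-- without running the full check
DBCC CHECKDB ('LargeDatabase') WITH ESTIMATEONLY;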

Tempdb doesn't behave like user databases. Say a user database grows to 500 GB; it stays at that size even after the SQL instance restarts. Right?

But tempdb is recreated at startup with whatever size it was last set to, not the size it had grown to!
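One way to see this, using only the standard catalog views: compare the startup size recorded in sys.master_files against the current size (after any autogrowth) in tempdb.sys.database_files:

-- Size tempdb will be recreated with at the next restart
SELECT name, size * 8 / 1024 AS startup_size_mb
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

-- Size tempdb has grown to right now
SELECT name, size * 8 / 1024 AS current_size_mb
FROM tempdb.sys.database_files;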

To avoid memory spills (examples for two of these items follow the list):
- Omit the ORDER BY clause if you do not need the result set to be ordered.
- If ORDER BY is required, eliminate the column that participates in the multiple range scans from the ORDER BY clause.
- Using an index hint, force the optimizer to use a different access path on the table in question.
- Rewrite the query to produce a different query execution plan.
- Force serial execution of the query by adding the MAXDOP = 1 option to the end of the query or index operation.
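Hedged examples of the index-hint and MAXDOP items (the table, index, and column names are hypothetical):

-- Force a different access path with an index hint
SELECT CustomerID, OrderDate
FROM dbo.Orders WITH (INDEX (IX_Orders_OrderDate))
WHERE OrderDate >= '20240101';

-- Force serial execution with MAXDOP 1 at the end of the query
SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Orders
GROUP BY CustomerID
OPTION (MAXDOP 1);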

Is it always a good idea to use temporary tables?

No, it's not.

As far as I know, sometimes using a temp table improves performance and sometimes it doesn't!

Creating a temp table to hold intermediate results forces SQL Server to interrupt the data pipeline through a query and persist the results to disk.

Sometimes running just one query, rather than pre-aggregating or pre-sorting, can be far more efficient and lead to much lower run time and tempdb usage!
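For instance, here are two hypothetical ways to produce the same result (all object names are made up); only measuring both, for elapsed time and tempdb usage, tells you which one wins for your data:

-- (a) Pre-aggregate into a temp table, then join
SELECT CustomerID, SUM(Amount) AS Total
INTO #CustomerTotals
FROM dbo.Orders
GROUP BY CustomerID;

SELECT c.Name, t.Total
FROM dbo.Customers AS c
JOIN #CustomerTotals AS t ON t.CustomerID = c.CustomerID;

-- (b) A single query that lets the optimizer keep the pipeline in memory
SELECT c.Name, SUM(o.Amount) AS Total
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.Name;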

So, "Always not good using Temp table to have intermediate aggregations/processing before resulting it"

Compare the various methods, with and without a temp table/CTE/table variable, before implementing anything in a production environment!

So it's never a case of "always do this" or "never do that."

Importantly, when using temp tables, make sure to pull only the data that's really required, and create nonclustered indexes after the table is populated, so statistics are built on the actually loaded data (see the sketch below)!
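A minimal sketch of that pattern (the names and the filter are hypothetical):

-- Pull only the rows and columns that are really required
SELECT OrderID, CustomerID, Amount
INTO #Work
FROM dbo.Orders
WHERE OrderDate >= '20240101';

-- Index after loading, so the statistics reflect the actual data
CREATE NONCLUSTERED INDEX IX_Work_CustomerID ON #Work (CustomerID);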

Also, creating nonclustered indexes that aren't actually being used causes slowness both when loading data and when retrieving it!

So nonclustered indexes should be created only for access paths the query plans actually use (one way to check is shown below).
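To check which nonclustered indexes the workload actually touches, one option is the index-usage DMV (note that it only reflects activity since the last instance restart):

-- Nonclustered indexes in the current database and how often they
-- are read (seeks/scans/lookups) versus maintained (updates)
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
  ON s.object_id = i.object_id
 AND s.index_id = i.index_id
 AND s.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED';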

There is no point pulling millions of records that aren't needed at all, right? But it happens most of the time, and it can lead to enormous tempdb usage!