by http://webgeektutorials.blogspot.com

Thursday, March 1, 2012

What happens when SQL Server executes a stored procedure or query ?

SQL Server performs a couple of internal steps before executing a query or stored procedure. The steps that interest us here are compilation and execution.
When SQL Server receives a query for execution, its execution plan may already be present in memory (the procedure cache); if not, SQL Server will have to compile the query before executing it.
The compilation process is divided into four parts: parsing, normalization, compilation and optimization.

Parsing

During this stage, SQL Server checks the query for syntax errors and transforms it into a complier-ready structure that it will use later to optimize the query. It does not check for object names or column names.

Normalization
At this stage, SQL Server checks all references to objects in the query. This is where we typically get the “Object not found” message when an object referenced in the query is not found in the database. SQL Server also checks to see if a query makes sense. For example, we cannot execute a table or select from a stored procedure.
Bear in mind that while we can optimize select, insert, and update statements, there is no way to optimize if, while, and for operators.

Compilation
This is where we start building the execution plan for the query we passed to SQL Server. First, we create a sequence tree. The sequence tree is normalized, which includes adding implicit conversions if necessary. Also during this phase, if the query is referencing views, a view definition is placed in the query. If a statement is a DML statement, a special object is created called the query graph. The query graph is the object on which the optimizer works to generate an optimized plan for the query. This is the compiled plan that is stored in the procedure cache for reuse.
Optimization
SQL Server Optimizer is a cost-based optimizer, which means that it will come up with the cheapest execution plan available for each SQL statement. For each SQL statement to run, we need to use resources like CPU, memory, hard disk, etc. The cheapest plan is the one that will use the least amount of resources to get the desired output. For optimizing DML statements, SQL Server will test different indexes and join orders to get the best plan for executing the query. Your index definition helps optimizer by reducing/minimizing resource usage. If the index has a high selectivity then it is most suitable for optimization. Because a complex query will take into account all indexes and joins, there can be many paths to take to execute the query. In such cases, determining the best path for optimization can take a long time. The longer this process takes, the higher the cost that is involved.
So first, a trivial plan is generated. This plan assumes that cost-based optimization is costly; if there is only one path for execution possible, there is no point optimizing the query. For example, when placing a simple insert statement into a table, there is no way that your indexes or join orders can increase optimization, so the trivial plan is used.
For any particular query, SQL Server will use statistics to understand the distribution of data. Statistics are stored in the statblob column of the sysindexes table in each database. Join orders can also be optimized based on the number of rows that are fetched by each join condition.

To help us better understand this, let's take a look at an example that replicates the symptoms of our case.
Create table ParameterSniffing ( i int identity(1,1), Gender Char(1), lastname varchar(100))

Fill the table with data:
Declare @i int
Set @i = 0
While @i < 10000
Begin
            Insert into ParameterSniffing(Gender) values('M')

            Set @i = @i + 1
end

Fill only two rows for female:
Insert into ParameterSniffing(Gender) values('F')

Create indexes on the table:
Create clustered index clus_ind on ParameterSniffing(i)
Create index ind on ParameterSniffing(Gender)

Create a stored procedures that take one parameter and gives us output:
Create procedure pr_testParameterSniffing
@Gender char(1)
As
Select * from ParameterSniffing where Gender = @Gender

Before we pass the parameter 'M' (typical values), let's free the procedure cache.
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

Execute procedure with parameter male('M'):
Set statistics profile on
Go
Set statistics time on
Go
Set statistics io on
Go
                pr_testParameterSniffing 'M'

Set statistics profile off
Go
Set statistics time off
Go
Set statistics io off
Go

You will notice that the execution plan calls for a clustered index scan. This behavior is normal because 10,000 of our 10,002 rows are 'M'. It does not make sense to do a nonclustered index (defined) on gender , follow it up with as bookmark lookup and then generate the results. Instead, go for a clustered index scan or a table scan if no clustered index is present.

Now let's see what happens when we run this again with parameter 'F':
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

Execute procedure with parameter Female('F'):
Set statistics profile on
Go
Set statistics time on
Go
Set statistics io on
Go
            pr_testParameterSniffing 'F'

Set statistics profile off
Go
Set statistics time off
Go
Set statistics io off
Go

Notice that the execution plan goes for index seek -> bookmark lookup. This also makes sense as it is going to get on two rows and the cost of a bookmark is very low.
Now imagine that one of my users has come in and executed the 'F' parameter first. So, it went for index seek and then bookmark lookup. For the rest of entire day, we use the 'M' parameter. What would happen? Even though the clustered index scan is better for this parameter, we still go for index seek and bookmark lookup even though bookmark lookup is costly here. You can see the difference in execution time for the same query with a different set of parameters.

Before trying this for yourself, please do not free the procedure cache on this step:
Set statistics profile on
Go
Set statistics time on
Go
Set statistics io on
Go
            pr_testParameterSniffing 'M'

Set statistics profile off
Go
Set statistics time off
Go
Set statistics io off
Go

Why does this happen? Because when we ran the first procedure with 'F' parameter, it went for index seek (nonclustered) and then cached this execution plan in memory. Subsequent executions of the query will use same execution plan even though it obviously is costly.
Now that the problem problem is clear, let's find a solution for it.

Solution 1:

Execute same stored procedure (without clearing the procedure cache) with the recompile option:
Set statistics profile on
Go
Set statistics time on
Go
Set statistics io on
Go
            pr_testParameterSniffing 'm' with recompile

Set statistics profile off
Go
Set statistics time off
Go
Set statistics io off
Go
This will recompile the query and use the correct execution plan.

Solution 2:
During the creation of the procedure, use the "with recompile"option:
Create procedure pr_testParameterSniffing @Gender char(1) with recompile
As
Select * from ParameterSniffing where Gender = @Gender

This will let SQL Server know that every time we execute the query we need to recompile the stored procedure.
These first two solutions should only be used when the compilation time for the query is very short. Otherwise, we might impact performance again by recompiling the stored procedure and not taking advantage of the procedure cache.

Solution 3:

DBCC  freeproccache
(This will remove every thing from cache and recompile the stored procedure again.)
This is not a practical approach as we cannot afford to clean the procedure cache during business hours.

Solution 4: The smart way to do it:

In the stored procedure declare variables that are being sent as parameters and used by the where clause. Here is the example:
Create procedure pr_testParameterSniffing
@Gender char(1)
with recompile
As
declare @gender1 char(1)
set @Gender1 = @Gender
Select * from ParameterSniffing where Gender = @Gender

This will help create a good execution plan because SQL Server is good at guessing parameter values. Typically, we see such behavior when the distribution of data is not uniform in the table.
Declaring parameters should be done only in cases where we are seeing the behavior described above, as there are issues with declaring parameters as well.



Many thanks to Guru Charan.
Profile: Over five years experience as an IT professional. Working as an SQL DBA lead for Satyam Computers, a multinational software company based in India. Involved in remote management of database server across the globe. Certifications include MSCD, MDCBA, MCSE.

1 comment:

Anonymous said...

Helped me..... thanks alot

Post a Comment