Geeks With Blogs
Marcin Kasprzykowski`s blog

ABSTRACT

This article shows that adding a TOP clause to the SQL SELECT query may greatly increase its execution time. It also shows very effective solution to fix this problem.


INTRODUCTION

Lately I was conducting performance tests for a new project. I built sample database, filled it with data and began testing. The database schema is not important here. All you should know is that I had about 3M of records in main table [Articles] and about 11M of records in tables which I joined. I used this naive query at first:

SELECT

[Articles].Id

,CountryCategories.Name

,CityCategories.Name

FROM [Articles]

INNER JOIN CategoryCountry2Articles

ON [Articles].Id = CategoryCountry2Articles.IdArticle

INNER JOIN  CountryCategories

ON CountryCategories.Id = CategoryCountry2Articles.IdCountry

INNER JOIN CategoryCity2Articles

ON [Articles].Id = CategoryCity2Articles.IdArticle

INNER JOIN  CityCategories

ON CityCategories.Id = CategoryCity2Articles.IdCity

WHERE CountryCategories.Name = 'country1'

AND CityCategories.Name = 'city4'

 

This query took 00:04 (seconds) to execute. It returned far too much records, so I thought it would be a good idea to use TOP N clause to limit them. After all, limiting the number of records to be returned should decrease execution time. And certainly it should not increase it. Query optimizer however is odd enough to refute my assumptions. By adding a TOP clause to aforementioned query:

SELECT TOP(5)

[Articles].Id,

(…)

query execution time jumped to 01:14!  How is that possible?


EXPLANATION

Execution plans for both queries may help to explain it. For base query:

http://imageshack.us/a/img195/5361/sqltop1.png

 and for query with TOP clause:

http://imageshack.us/a/img521/9523/sqltop2.png

The main difference between those two execution plans is that in first plan optimizer uses Hash Matches, while in second it uses Nested Loops. Both symbols generally describe the same operation (join). However Hash Matches are efficient with large sets, while Nested Loops only with small sets. This is why the latter query performs so badly. It is the query optimizer that makes wrong decision.

But let’s assume that for some reason you do not want to abandon the idea  of using TOP clause here. There are two ways to induce query optimizer to take right decision.


SOLUTIONS

First solution is to use explicit hash joins in your query:

SELECT TOP 5

[Articles].Id

,CountryCategories.Name

,CityCategories.Name

FROM [Articles]

 INNER HASH JOIN CategoryCountry2Articles

ON [Articles].Id = CategoryCountry2Articles.IdArticle

 INNER HASH JOIN  CountryCategories

ON CountryCategories.Id = CategoryCountry2Articles.IdCountry

 INNER HASH JOIN CategoryCity2Articles

ON [Articles].Id = CategoryCity2Articles.IdArticle

 INNER HASH JOIN  CityCategories

ON CityCategories.Id = CategoryCity2Articles.IdCity

WHERE CountryCategories.Name = 'country1'

AND CityCategories.Name = 'city4'

 

Execution time: 00:07. Execution plan:

 http://imageshack.us/a/img812/2150/sqltop3.png

 

As you can see, this time Hash Matches are used.

 

Second and most effective solution is to use SQL variable:

DECLARE @topCount INT

SET @topCount = 5

SELECT TOP (@topCount)

(...)

 

Execution time: 00:00.2 (0,2 sec).

Quick look at the execution plan:

http://imageshack.us/a/img802/6742/sqltop4.png

 

It shows that query optimizer did not only choose to use Hash Matches, but it also shows high degree of parallelism in the query. This is why it is so incredibly efficient.

 

I am not able to explain why using variable pushes query optimizer towards those decisions. But in case you are struggling with the same problem as I did, try this solution. I hope it helps.

 

 

Posted on Wednesday, January 30, 2013 10:27 PM | Back to top


Comments on this post: Why SQL TOP may slow down your query and how to fix it

# re: Why SQL TOP may slow down your query and how to fix it
Requesting Gravatar...
Have you played to see how this might impact any ORM? Are you using EF or nhibernate? I believe they both use the top construct.

Very interesting analysis
Left by Nick on Jan 31, 2013 2:46 AM

# re: Why SQL TOP may slow down your query and how to fix it
Requesting Gravatar...
Hi Nick,

Thanks for feedback.

I tried using ADO.NET, LINQ2SQL and EF. While the latter two perform in a similar way (which is not surprising), ADO.NET was the fastest (with TOP (@N) version of the query).

I checked with Profiler that both EF and LINQ2SQL build my query using inner joins some of which where nested inside others. That was very interesting optimization.

Also I figured out that building my query inside LINQ2SQL was far more effective using .Join() methods than using .Select() methods.

Another thing worth mentioning is that although one ADO.NET query execution took a bit more than 0,2sec, I issued 1000 serial similar queries with different WHERE conditions (inside one sql connection). Average response time was 0,04sec.
Left by Martinez on Jan 31, 2013 3:50 AM

# re: Why SQL TOP may slow down your query and how to fix it
Requesting Gravatar...
Great stuff, thanks!
Left by Led on Jan 31, 2013 7:05 PM

# re: Why SQL TOP may slow down your query and how to fix it
Requesting Gravatar...
I was very puzzled by this issue. You just saved me a lot of time. Thanks :-)
Left by Torben Rohde on Oct 23, 2013 11:11 PM

# re: Why SQL TOP may slow down your query and how to fix it
Requesting Gravatar...
Very useful thanks!

Left by Chris Nevill on Jan 03, 2014 3:25 AM

Your comment:
 (will show your gravatar)


Copyright © Martinez | Powered by: GeeksWithBlogs.net