SQL Azure: Improving Performance Using a Shard Library

Shards (defined as a collection of databases acting as one) are starting to take serious traction in the database development world because they offer something that was unavailable previously: asynchronous data fetching. It is fair to say that developing multi-threaded applications can be difficult. Shard technologies remove many of the complexities required to fetch data from multiple databases asynchronously.

In this blog I will discuss a specific case in which performance was improved significantly using a Shard technology against SQL Azure databases (although this works just as well with SQL Server databases). I am using the shard library I published on Codeplex (http://EnzoSqlShard.codeplex.com).  In this open-source project I provide a sample application that can be used to test drive the shard library.

Before the shard

Before using the shard library I ran a statement that returns two records (actually, in this test I am using the shard library, but the shard has a single database in it). While it doesn't seem like a whole lot, each record contains a PDF file of a few megabytes each. So each record returned to the client forces many packets to be sent over the network. In fact, each record takes about 1.3 seconds to be returned. So the total time needed to return both records took me about 2.6 seconds.

With the shard

However when the shard was using two databases, each containing one record, the total execution time was 1.4 seconds. This is due to the fact that each record was being fetched asynchronously. The performance of my SQL statement almost doubled.  Keep in mind however that in this case my network bandwidth was not a bottleneck as far as I could tell. So while I was able to double my throughput by adding a database to the shard, scalability will be limited unless I upgrade my network cards and overall network pipe.


While shards come with their own set of complexities and challenges, such as a lack of strong referential integrity, they can be used to assist with performance problems typical of tables containing millions of records and/or when dealing with records that contain files or large binary columns.



Print | posted @ Sunday, June 27, 2010 9:00 PM

Comments on this entry:

Comments are closed.

Comments have been closed on this topic.