Solving Schema Separation Challenges

Introduction
To save on hosting costs and simplify maintenance, Software as a Service (SaaS) providers typically rely on schema separation to host multiple customers' records. This implementation relies on a specific SQL Server and SQL Azure feature called a schema object. A schema object behaves like a container, or a namespace in programmatic terms, allowing multiple tables (and other objects) to be stored with the same name in a single database.
However schema separation comes with its own set of challenges. The challenge in using schema separation is that few tools support it. For example, until recently, it was not possible to easily backup or restore a single schema.  Other challenges include security and account management. Let's explore further.
Basics of Schema Separation
Let's assume you have two customers, and you need to track historical information. So you need to create a table called tbl_history for each customer.  You could either create a database for each customer with the tbl_history table in each database (the linear shard pattern), or create a schema container for each customer in a single database, and create the tbl_history table in each schema container (the compressed shard pattern). For a description of sharding patterns review this white paper:  http://www.bluesyntax.net/files/EnzoFramework.pdf.
Here is an example of a schema-based storage for different customers. A database is created with 2 schema containers (cust1 and cust2). The tbl_history table is then added to each schema container. The following script assumes that the database has already been created.
CREATE SCHEMA cust1
GO
CREATE SCHEMA cust2
GO
 
CREATE TABLE cust1.tbl_history(id int identity(1,1) primary key, dateadded datetime default (getdate()), productid int, quantity int)
CREATE TABLE cust2.tbl_history(id int identity(1,1) primary key, dateadded datetime default (getdate()), productid int, quantity int)
 
INSERT INTO cust1.tbl_history(productid, quantity) values (1, 5)
INSERT INTO cust1.tbl_history(productid, quantity) values (2, 7)
INSERT INTO cust2.tbl_history(productid, quantity) values (107, 22)
 
SELECT * FROM cust1.tbl_history
SELECT * FROM cust2.tbl_history
At this point we have two tables with the same name, each in a different schema container, but within a single database. Here is more information about schema containers: http://msdn.microsoft.com/en-us/library/ms189462.aspx
Securing Access
An important feature of schema containers is their support for security. You could easily create a user (mapped to a login account) and grant that user SELECT rights to cust1.  The following statements should be executed against the master database in SQL Azure.
CREATE LOGIN logincust1 WITH PASSWORD = 'p@ssw0rd001'
CREATE LOGIN logincust2 WITH PASSWORD = 'p@ssw0rd002'
Then back in the user database, create two users.
CREATE USER user1 FOR LOGIN logincust1
CREATE USER user2 FOR LOGIN logincust2
 
Finally, you need to authorize each user to execute statements against the history table. We will authorize user1 to use all the objects in schema cust1, and user2 in cust2.
GRANT SELECT, EXECUTE ON SCHEMA::cust1 TO user1
GRANT SELECT, EXECUTE ON SCHEMA::cust2 TO user2
 
At this point, user1 can only select and execute stored procedures in the cust1 schema. user1 cannot access schema cust2.  Here is more information about the Create Login statement: http://msdn.microsoft.com/en-us/library/ee336268.aspx
Customer Accounts vs. Service Accounts
The previous section discusses the creation of logins and users that provide security to each schema container. It should be noted that each login/user account created in the user database should be treated as a service account, not an actual customer account. Indeed, if you create customer accounts in SQL Azure directly you could negatively impact connection pooling, and hence performance. You should authorize users first, using a specific application authentication component (such as ASP.NET Membership), then map the customer account to the service account to use.
You typically implement the mapping (from customer account to service account) using a separate customer account database. A column in your customer account table would store the database connection string, which would contain the service account to use. Here is an example:
CREATE TABLE custmapping(customerid int primary key, custconnection nvarchar(255) NOT NULL)
Note that in a production environment, you would likely encrypt the connection string. The above table does not implement encryption to illustrate a simple scenario.
You would then add a record for each customer:
INSERT INTO custmapping VALUES (1, 'server=....;UID=logincust1;PWD=p@ssw0rd001')
INSERT INTO custmapping VALUES (2, 'server=....;UID=logincust2;PWD=p@ssw0rd002')
 
When a customer logins with their account, your code would read the custmapping table to retrieve the connection string to use for the duration of the session.
Moving A Schema
If a customer grows significantly, or is abusing your SaaS application, you may be facing the need to move that customer's data to a different database so that the other customers (called tenants in SaaS terms) are not affected negatively by the increase in resources needed by that customer.
There are very few options available today to move a single schema container, and its associated objects, from one database to another. You could manually create the schema container and its objects first, then use BCP or SSIS to move the data. However this can be error prone and lengthy.
A tool recently released by Blue Syntax, called Enzo Backup for SQL Azure, provides the ability to backup a single schema. This tool will backup the associated users in addition to the related objects (tables, stored procedures and so forth). The restore process will recreate the schema in the chosen database server and all the objects (and data) in that schema. Here is more information about this tool: http://www.bluesyntax.net/backup.aspx
Multitenant Frameworks
Due to the level of complexity in building multitenant environments, certain companies are turning to specialized frameworks. Although these frameworks can require some learning curve, they provide certain capabilities that would be difficult to build, such as fan-out, caching, and other capabilities. Here are a few .NET frameworks:
·         The Enzo Sharding Library (open-source); also supports Data Federation: http://enzosqlshard.codeplex.com/releases/view/72791
·         The CloudNinja project (open-source): http://cloudninja.codeplex.com/
·         The Enzo Multitenant Framework: http://www.bluesyntax.net/scale.aspx
Schema Separation and Data Federation
Data Federation is an upcoming feature of SQL Azure that will provide a new mechanism to distribute data. In a way, Data Federation allows SaaS vendors to design a monolithic database, in which all customers are located, and distribute records of one or more tables across databases when the time comes. Data Federation is essentially a compressed shard, similarly to schema separation, with the added benefit of tooling support and easier repartitioning based on performance and storage needs. In addition, Data Federation can help you distribute data on almost any dimension; not just customer id.
Data Federation and Schema Separation are not competing solutions for SaaS vendors. Each have specific benefits and challenges. SaaS vendors will need to determine which one serves their needs best, or even use both.
Using both Schema Separation and Data Federation delivers even greater flexibility.  For example, if the history table of customer 2 becomes too large, you could leverage Data Federation to split that table, without affecting customer 1. A specific use of this technique could be to use Data Federation to split tables across databases every year for all customers (or only those that have a lot of records). This dual layering technique (using schema separation for customer records, and data federation for archiving) can deliver remarkable performance results and scalability.
Conclusion
Many vendors have successfully built a SaaS solution using schema separation. While tooling has been lacking, some vendors are adding schema separation support in their products. Understanding the options available at your finger tips, including the frameworks already developed, and how to combine sharding models can give you a significant advantage when building your SaaS solutions.
  

Print | posted @ Wednesday, October 5, 2011 3:03 PM

Comments on this entry:

Comments are closed.

Comments have been closed on this topic.