Understanding CLR Integration in SQL Server 2005
One of the biggest new features in SQL Server 2005 is the integration of the .NET Common Language Runtime (CLR) into the database server. It has certainly been one of the most discussed new features. Having direct access to the CLR and all the functionality it offers opens a range of new possibilities for developers. Perhaps the most compelling is that it is possible to write database objects in C#, Visual Basic.NET or any other .NET-language that the developer favors. Even with the new features that have been added to T-SQL - such as structured error handling - most developers are not very comfortable in writing procedural logic with it. Other related benefits are access to (most of) the rich API in the .NET Framework as well as the ability to use Visual Studio 2005 for working with database objects. In short, the main reason to have CLR integration is to increase developer productivity. As a side-effect there is also the possibility of increased performance in certain scenarios.
On the other hand the DBAs are wondering what this new feature is and why they should enable it in their database servers. Will they risk the stability of their servers if they do so? The answer is no, and the purpose of this article is to drill further down to understand why it is safe for DBAs to enable CLR integration. It does not describe how to create database objects in a .NET-language, please refer to Books Online or other articles for that.
h2. Some basic .NET concepts
Before digging into how the CLR integration works it is necessary to define a few terms regarding .NET and the CLR.
The *CLR* is at the heart of the .NET Framework, providing runtime services to applications written in managed code. Some of the services it provides are automatic memory management (garbage collection), code verifiability and code access security.
Managed code means that a developer writes code in a high-level language such as C#, using the *Base Class Library* (BCL) to get access to the features of the .NET Framework. The BCL is a rich set of classes, which are used as building blocks to build applications. The C# language compiler then compiles this code into an *assembly*. An assembly is the smallest unit of deployment in .NET. For the purposes of this article an assembly can be considered to be a DLL which contains the compiled code in MSIL (intermediate language) and a manifest that holds metadata about the assembly. At run time the CLR then compiles the MSIL into native, processor specific code and executes it. In this process the CLR inspects the metadata and the code to validate integrity, resolve references to other assemblies and verify that the code can be given the permissions it needs to execute.
h2. Hosting the CLR
Another feature of the CLR is the great amount of flexibility it supports. The CLR provides extensibility points that allow processes to host the CLR inside themselves and customize a lot of the behavior of the CLR. In fact, the CLR cannot be run ‘by itself’ (e.g. there is no operating system support to run a CLR application directly). Whenever an application written in managed code is run from the shell there is a small piece of unmanaged code that is executed to load the CLR into the process and then pass on execution to it.
To load and customize the CLR a process uses a collection of interfaces and functions called the CLR Hosting APIs. Through these interfaces a CLR host (the process that loaded the CLR) can customize a lot of the behavior of the CLR, including (but not limited to) the following:
* Assembly loading
* Failure handling
* Security policies including programming model enforcement
* Memory management and garbage collection
* Threading and synchronization
Most CLR hosts probably do not care about many of these and just use the default behavior. SQL Server however uses all of the hosting interfaces to customize every aspect of the CLR to make sure that the security, reliability and performance of the server is not compromised.
h3. Architecture of CLR integration in SQL Server
When a database object implemented in .NET is referenced, for instance a .NET stored procedure is executed, the CLR is loaded into the SQL Server process (if not already loaded of course) by a hosting layer. This hosting layer implements the hosting interfaces and coordinates the management of the CLR. The hosting layer is connected to the SQLOS (SQL Server Operating System).
SQLOS is a central component in SQL Server 2005, basically serving as an abstraction between SQL Server and the operating system and hardware that the server is running on. The CLR hosting layer uses SQLOS in the same way as other components in SQL Server do, to get cohesive access to system services such as scheduling, memory management, synchronization, resource management and more. Since other components in SQL Server also use it, this means that managed code executed in the CLR and the resources used by it are handled by the same component as any other piece of code in SQL Server. For instance, when the CLR needs to access memory it is the multi-page allocator (often known as the MemToLeave region) that responds to the memory request. And if there is memory pressure the memory manager can tell the CLR to do a garbage collection to free up some memory.
h2. Tight integration for safer and more effective handling
The tight integration between SQL Server and the CLR means that it is now possible to extend SQL Server with new functionality in a reliable and secure way, as opposed to using extended stored procedures (not to mention how much easier and more productive it is). A number of steps have been taken to ensure that the reliability, security and scalability of SQL Server are not compromised by executing managed code in it.
h3. Only local assemblies are loaded
SQL Server will never load an assembly directly off disk to reference a database object implemented in it. Instead the assembly must first be registered in SQL Server before it can be used. This is done using the @CREATE ASSEMBLY@ DDL statement, which reads the assembly bits of disk and stores it locally in the database where the statement is executed. As an effect of this, the registered assemblies and the code they contain are backed up along with the rest of the database. The original files can be removed from disk, as SQL Server will always load the assembly from the bits stored internally, as opposed to the way extended stored procedures are dynamically loaded from disk.
When the CLR requests an assembly to be loaded SQL Server uses the hosting APIs to handle this request by looking in the database to see if the assembly exists there. If it does not already exist the request fails. The exception to this is the assemblies containing the classes in the BCL. These are loaded directly from the Global Assembly Cache (GAC) in the same way as a normal CLR application does. However, SQL Server does not allow all assemblies of the BCL to be loaded. It has a pre-defined list of the assemblies that are accepted. Assemblies that are not accepted include those that are used for user interfacing (System.Windows.Forms.dll) and remoting (System.Runtime.Remoting.dll), and many others.
h3. Restrictions define what the code is allowed to do
To further restrict the code that is allowed to exist and execute inside SQL Server each assembly must be registered with a set of permissions. Three pre-defined sets are available to use; SAFE, EXTERNAL_ACCESS and UNSAFE. When an assembly is registered in SQL Server the user specifies which permission set the assembly should have. Before the assembly is accepted and stored SQL Server will have the CLR inspect and validate it according to the restrictions defined by the permission set. If the validation fails then the assembly cannot be registered and created in SQL Server.
For an assembly to be registered as SAFE the code must be verifiable (e.g. the CLR can enforce security policies), it cannot use any shared state and cannot include any unmanaged code or interop calls. Basically it is only allowed to do computations and read-and-write access to locally stored data. EXTERNAL_ACCESS has the same restrictions as SAFE except that it allows access to external resources. Assemblies that are registered as UNSAFE can do more or less anything at all, including access to unmanaged code.
When code in an assembly registered with SAFE or EXTERNAL_ACCESS is executed the CLR verifies that it does not break any security policies. A part of this is verifying that it does not access any type or member that has been annotated with certain HostProtectionAttributes (HPA). The .NET Framework supports annotating types and members with attributes that are not interpreted by the compiler but simply compiled into the metadata of an assembly. These attributes can then be inspected at a later time and have many uses.
In the case of host protection attributes the CLR makes sure that none of the types or members used by the code is annotated with an HPA that says a type or member is doing something that could be considered harmful to a CLR host. For instance, if a host protection attribute specifying UI (user interface action such as showing a message box) or ExternalProcessMgmt (creating new threads and processes) is found on any of the types or members used by the code the CLR will not allow it to execute.
A good recommendation is to try to separate unrelated code into different assemblies and register each assembly using the least amount of permissions necessary. For instance, if you have a procedure that needs to read from the file system (EXTERNAL_ACCESS) it should not be in the same assembly as a function that matches strings with regular expressions. Also note that external resources by default are accessed in the context of the SQL Server service account. It is possible in code to explicitly impersonate the security context of the caller, so this should be used when possible to tighten security.
Finally, it goes without saying that UNSAFE code should be avoided. If a developer wants to add something that requires UNSAFE access the natural reaction should be to question whether or not it is necessary to have that code in the database server. The possible exception to this is when you have extended stored procedures or code using the sp_OA* procedures. This code is a prime target for reimplementation in managed code. But remember that not all extended stored procedures will require UNSAFE access when implemented in .NET!
h3. SQL Server creates separate application domains
When an assembly is loaded it is loaded into an application domain. Application domains are similar to processes in that they are a unit of isolation for the code running in them. This isolation means that the application domain can be independently stopped without affecting other application domains, and that a problem in one does not affect others. Application domains are lighter than processes, since a single process can run multiple applications in separate application domains.
SQL Server takes responsibility of creating application domains when the CLR needs them. SQL Server will create one application domain per database and assembly owner. So, if an assembly owned by Jane and registered in the Doe database is loaded it will not be loaded in the same application domain as an assembly registered in the same Doe database but owned by John. By separating “unrelated” assemblies into different isolated application domains, and by not allowing cross-domain calls, SQL Server ensures that a failing assembly does not affect other users’ functionality.
h3. SQL Server controls scheduling of CLR threads
In addition to the memory management possibilities mentioned already, one of the main features for keeping the scalability of SQL Server while running managed code is how the execution is scheduled. A big problem with extended stored procedures is that they use preemptive scheduling. This is the normal way threads run in Windows and means that the thread runs for a specific time-slice given to it by the operating system before it is interrupted and the next item on the runnable list can execute. However, SQL Server has its own user mode scheduler, meaning it does not use a lot of threads running in preemptive mode to handle all the tasks that need to be executed in the system. Instead it uses cooperative scheduling, where the code voluntarily yields control of the execution as often as possible. The SQL Server code has these yields everywhere so everything works in harmony. However, when an extended stored procedure is executed SQL Server has no way of knowing if the code will yield execution often enough (it probably will not yield at all), so therefore a thread is exclusively used to execute the extended stored procedure. This thread is switched to preemptive scheduling, which requires an expensive context switch. For this and other reasons it is not really healthy for the scalability of SQL Server to run external code in extended stored procedures.
With managed code executed by the CLR things are different though. With the control SQL Server has over the CLR it is possible for SQL Server to force the code to yield execution. Whenever a thread that is executing a piece of managed code requests a resource from SQL Server, such as accessing data in the database or requesting a memory allocation, it will take the thread off execution and move it to the back of the runnable queue and let other threads run.
Even if a piece of managed code does not request any resources for a certain time SQL Server still has a way of letting other tasks execute. This is done by interrupting the thread executing the managed code and putting it back in the runnable list. If there is nothing else waiting to get executed then the thread will get picked up again and continue execution, but if there are other things to execute this handling will make sure a piece of managed code cannot hog a CPU without letting others in to use it.
Note that if you have managed code running in the UNSAFE permission set and using unmanaged code or otherwise accessing unmanaged resources, this code will be scheduled like extended stored procedures. As soon as a thread leaves the managed execution environment the thread will be switched to preemptive scheduling mode and context switched.
h2. Conclusion
This article has described some of the aspects of how CLR integration works under the hood and why it is a much better solution for extending SQL Server with external code than using extended stored procedures. There is a lot more to say about when and how to use CLR integration in the most effective ways, and of course when not to use it. I plan to delve further into those topics in later writings.
h3. Revisions
2006-03: First published in “SQL Server Standard”:http://www.sqlserverstandard.com/ magazine
2007-02-03: Same version published online

