No pipe in functions
I spent a couple of minutes with a confused look on my face today. I was writing a CLR function and wanted to output some information about the progress. @SqlContext.Pipe.Send(”some message”);@ should work, right? Nope, a NullReferenceException was all I got for my troubles. I figured out that it was the property Pipe that did not return a SqlPipe, instead it gave me null. But I could not understand why there was no pipe.
Although I am sure that most of my readers are smarter than me and would figure out what the problem was in an instant, I thought I’d just make a note of it here anyway. First I thought about the DataAccess and SystemDataAccess parameters of the SqlFunction attribute. It seemed strange that they should be involved, but I tried experimenting with them anyway. No luck. After a while of thinking some more I did what always works, went for a cup of coffee. The answer came to me just about the same time as the cup was brewed. Functions are not allowed to send results to the client, whether implemented in CLR or T-SQL. Using a PRINT statement in a T-SQL UDF results in the following error when trying to create it:
@Msg 443, Level 16, State 14, Procedure test, Line 3@
@Invalid use of side-effecting or time-dependent operator in ‘PRINT’ within a function.@
So, I ended up using the Visual Studio debugger for my purposes, just as I should have from the start of course. Debugging by outputting text strings is not the best way of debugging. I just had one problem with the debugger though. I was using Niels Berglund’s excellent SQLCLR project add-in for Visual Studio for creating, deploying and debugging the code. I am not sure if the Visual Studio CLR project type behaves the same way, or even if it was just me doing something wrong, but the debugging mode were always abruptly ended with a message that the deployment took too long and timed out. Has anyone else had this problem?
Two notes about impersonation in SQLCLR
While experimenting a bit with impersonation in SQLCLR I noted two things I thought would be good to mention here. Firstly, if you are creating a function you need to set the named parameter DataAccess or SystemDataAccess (I assume it works with both set as well) of the SqlFunction attribute to Read. Strange that it works with either one of them.
The other thing was that BOL is not 100% accurate in the description of what the property WindowsIdentity of SqlContext can return. In BOL it says that if you are running integrated security this returns the token for the caller of the code, but if SQL Server security is used it returns NULL. What is missing is that if a SQL Server login which is a member of the sysadmins fixed server role (such as sa) is the caller of the code then WindowsIdentity will return a token. There is no meaning in calling the Impersonate method on it though, as it is the token for the SQL Server service account. Since SQLCLR code by default executes as the SQL Server service account there is not a lot of meaning in impersonating it..
Extension methods and method overload resolution
Eric Gunnerson recently “posted an interesting piece of C# code”:http://blogs.msdn.com/ericgu/archive/2005/12/12/502890.aspx and asked what it does. As he explains in the “follow-up discussion post”:http://blogs.msdn.com/ericgu/archive/2005/12/13/503225.aspx, the reason why it does what it does is that we would not want the behaviour of our program to change due to a new method being added to a class we are subclassing. This is of course precisely the way it should be to avoid nasty problems. However, an interesting thought that struck me was that the extension methods feature in C# 3.0 and the method overload resolution rules they use can create precisely that problem.
Consider the following C# 3.0 code:
// Foo.cs
namespace thirdpartylib {
public class Foo {
private int x;
private int y;
public Foo(int x, int y) {
this.x = x;
this.y = y;
}
public int DoFoo() {
return this.x + this.y;
}
public int X { get { return this.x; } }
public int Y { get { return this.y; } }
}
}
// ExtendFoo.cs
using thirdpartylib;
namespace extensions {
public static class ExtendFoo {
public static int DoBar(this Foo theFoo, int z) {
return theFoo.X - theFoo.Y - z;
}
}
}
// Program.cs
using System;
using thirdpartylib;
using extensions;
namespace LINQConsoleApplication1 {
class Program {
static void Main(string[] args) {
Foo foo = new Foo(1, 2);
int i = foo.DoFoo();
int j = foo.DoBar(3);
Console.WriteLine("DoFoo: {0}", i);
Console.WriteLine("DoBar: {0}", j);
}
}
}
When this little program is executed it prints “DoFoo: 3″ and “DoBar: -4″ to the console. For those that have not seen C# 3.0, the interesting part is the static class ExtendFoo with it’s static method DoBar. The keyword @this@ before the first parameter is what is new in 3.0, and it is what lets us extend the Foo class to seemingly have a DoBar method that the program can call. By simply bringing them into scope (with the @using extensions;@ line) the compiler adds any static methods on static classes with the first parameter ‘decorated’ with the @this@ keyword to the applicable classes. Note though that this is only syntactic sugar, what is really happening (and which can easily be seen by reviewing the IL generated) is the same as if we would have written this:
int j = ExtendFoo.DoBar(foo, 3);
So, coming back to method overload resolution. For extension methods the rule is that the extension method is used if it does not clash with an instance method on the extended type. So, now consider what happens when this code is added to Foo:
public int DoBar(short z) {
return this.x * this.y * z;
}
Now the behaviour of our program has changed into printing "DoBar: 6" instead of "DoBar: -4". Considering the IL that is generated with extension methods this is not at all strange, but I anticipate one or two developers might get caught by this. Microsoft does include a warning about extension methods in the PDC preview bits (??"Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible."??), but this in particular feels kind of creepy.
Troubles with shared state and anonymous delegates in SQLCLR
Earlier today I set out to create “a SQLCLR function for splitting a comma-separated string including ranges”:/articles/2006/01/27/splitting-a-comma-separated-string-with-ranges-included/ (such as “2,5,7-9,13”) into a set of values. I encountered some problems that I think is valuable to document here. The code below shows what I initially wrote:
SqlFunction(
DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true,
IsPrecise = true,
FillRowMethodName = "FillRow")]
public static IEnumerable SplitWithRanges(SqlString input) {
// First expand ranges
Regex rangeExpander = new Regex(@"(?\d+)\-(?\d+)");
string expandedRanges = rangeExpander.Replace(
input.Value,
delegate(Match match) {
int startOfRange = Int32.Parse(
match.Groups["RangeStart"].Value);
int endOfRange = Int32.Parse(
match.Groups["RangeEnd"].Value);
string[] values = new string[endOfRange - startOfRange + 1];
for(int i = 0; i < values.Length; i++) {
values[i] = (startOfRange + i).ToString();
}
return String.Join(",", values);
}
);
// Now split the string containing the comma-separated values
string[] splittedValues = expandedRanges.Split(
new char[] { ',' },
StringSplitOptions.None);
return splittedValues;
}
public static void FillRow(Object obj, out SqlString tuple) {
tuple = new SqlString((String)obj);
}
However, when I tried to add this assembly to SQL Server I ran into problems. The CREATE ASSEMBLY statement failed with the following message:
@Msg 6212, Level 16, State 1, Line 8@
@CREATE ASSEMBLY failed because method ‘SplitWithRanges’ on type ‘Functions’ in safe assembly ‘Splitting’ is storing to a static field. Storing to a static field is not allowed in safe assemblies.@
I know that assemblies must be registered in UNSAFE to use shared state. The reason is that if an executing thread suffers a thread abort exception (for instance due to a stack overflow or out of memory exception being thrown by something it does) and there is any shared state in the application domain where it is executing, then the entire application domain is unloaded. It does not matter whether or not this specific thread was using the shared state, as a safety precaution the application domain is unloaded. Therefore SQL Server tries to restrict you from using shared state in SQLCLR.
But where does the C# code above use any shared state? A little reflecting with Lutz Roeder’s Reflector gave me the answer. The row with the anonymous delegate above is actually compiled into IL that matches C# code similar to this:
if (Functions.<>9__CachedAnonymousMethodDelegate1 == null) {
Functions.<>9__CachedAnonymousMethodDelegate1 = new MatchEvaluator(
Functions.b__0);
}
string text1 = regex1.Replace(
input.Value,
Functions.<>9__CachedAnonymousMethodDelegate1);
Functions is the name of the class containing my code, so what this does is use a static field for storing the delegate. The static field is defined as below:
[CompilerGenerated] private static MatchEvaluator <>9__CachedAnonymousMethodDelegate1;
So there we have the shared state. To get around this there are two solutions. I can either define a function to use as the delegate, like this:
private static String ExpandRanges(Match match) {
int startOfRange = Int32.Parse(
match.Groups["RangeStart"].Value);
int endOfRange = Int32.Parse(
match.Groups["RangeEnd"].Value);
string[] values = new string[endOfRange - startOfRange + 1];
for(int i = 0; i < values.Length; i++) {
values[i] = (startOfRange + i).ToString();
}
return String.Join(",", values);
}
Then I just change the row calling the delegate to this:
string expandedRanges = rangeExpander.Replace( input.Value, new MatchEvaluator(Functions.ExpandRanges));
The second “solution” is to use a variable local to the function SplitWithRanges inside the anonymous delegate. The reason this works is that the compiler now generates a private nested class with a function that is used as the delegate. So, if I for instance declare a variable @bool foo = true;@ outside the delegate and then simply do @foo = foo;@ in the code of the anonymous delegate then the compiler will actually create this:
[SqlFunction(
DataAccess=DataAccessKind.None,
SystemDataAccess=SystemDataAccessKind.None,
IsDeterministic=true,
IsPrecise=true,
FillRowMethodName="FillRow")]
public static IEnumerable SplitWithRanges(SqlString input) {
Functions.<>c__DisplayClass1 class1 = new Functions.<>c__DisplayClass1();
Regex regex1 = new Regex(
@"(?\d+)\-(?\d+)");
class1.foo = true;
string text1 = regex1.Replace(
input.Value,
new MatchEvaluator(class1.b__0));
char[] chArray1 = new char[] { ',' } ;
return text1.Split(chArray1, StringSplitOptions.None);
}
As I mentioned above, class1 is a private class defined (by the compiler) in my class Functions. So under the covers this solution is very similar to the previous one. It is just a question of where the function is placed. Although I love anonymous delegates, in this case I went with defining the function myself and using that for the delegate.
Splitting a comma separated string with ranges included
While browsing the SqlServerCentral.com forums earlier today I saw a question that interested me. The person asking it wanted to split a string containing a repeating group of values into a set of values. I remembered that Eric Gunnerson is running a series of regular expression puzzle questions and that “one of the recent ones”:http://blogs.msdn.com/ericgu/archive/2006/01/16/513644.aspx was about just this, just not with the SQL Server variant. So, here is what we want to be able to do:
SELECT * FROM dbo.SplitWithRangesExpanded( '1,3,7,12,15-20,23,27-29' );
The result should be this set { 1,3,7,12,15,16,17,18,19,20,23,27,28,29 }. The following code in C# can be used to define a streaming table-valued function in SQL Server.
using System;
using System.Collections;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
public partial class Functions {
[SqlFunction(DataAccess = DataAccessKind.None
, SystemDataAccess = SystemDataAccessKind.None
, IsDeterministic = true
, IsPrecise = true
, FillRowMethodName = "FillRow")]
public static IEnumerable SplitWithRanges(SqlString input) {
// First expand ranges
Regex rangeExpander = new Regex(
@"(?\d+)\-(?\d+)");
string expandedRanges = rangeExpander.Replace(
input.Value,
new MatchEvaluator(Functions.ExpandRanges));
// Now split the string containing the comma-separated values
string[] splittedValues = expandedRanges.Split(
new char[] { ',' },
StringSplitOptions.None);
return splittedValues;
}
private static String ExpandRanges(Match match) {
int startOfRange = Int32.Parse(
match.Groups["RangeStart"].Value);
int endOfRange = Int32.Parse(
match.Groups["RangeEnd"].Value);
string[] values = new string[endOfRange - startOfRange + 1];
for(int i = 0; i < values.Length; i++) {
values[i] = (startOfRange + i).ToString();
}
return String.Join(",", values);
}
public static void FillRow(Object obj, out SqlString tuple) {
tuple = new SqlString((String)obj);
}
}
Note that the original code I wrote used an anonymous delegate for the regular expression match evaluator, but due to problems with shared state I had to explicitly define the function to use for it. Read the “companion post regarding these technical issues”:/articles/2006/10/04/troubles-with-shared-state-and-anonymous-delegates-in-sqlclr/ for more info.
Index lookups, seeks and scans
In my latest article entitled “Dynamic Management Objects”:http://www.hedgate.net/pages/articles/dynamic-management-objects/ I describe a great new feature in SQL Server 2005. With Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs) we can get a lot of real-time information regarding what have been going on in the server since the latest restart. I was looking through some of these recently and while testing the DMV called sys.dm_db_index_usage_stats I noticed something that intrigued me. Among a lot of other interesting columns in the output there are three called user_seeks, user_scans and user_lookups. Now, I know what scans and seeks are, but how is a lookup defined? In particular, how does it differ from a seek? I had a hunch that I knew what they indicated, but to verify it I did some quick testing, which is described in the below script.
USE AdventureWorks
GO
– Empty result set (when server is just restarted of course)
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
SELECT * FROM Sales.SalesOrderHeader;
– IndexId 1: 0 user_seeks, 1 user_scans and 0 user_lookups
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 70875;
– IndexId 1: 1 user_seeks, 1 user_scans and 0 user_lookups
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID BETWEEN 70875 AND 70879;
– IndexId 1: 2 user_seeks, 1 user_scans and 0 user_lookups
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
SELECT * FROM Sales.SalesOrderHeader
WHERE CustomerID = 28220;
– IndexId 1: 2 user_seeks, 1 user_scans and 1 user_lookups
– IndexId 5: 1 user_seeks, 0 user_scans and 0 user_lookups
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
SELECT * FROM Sales.SalesOrderHeader
WHERE CustomerID IN (28220, 25925);
– IndexId 1: 2 user_seeks, 1 user_scans and 2 user_lookups
– IndexId 5: 2 user_seeks, 0 user_scans and 0 user_lookups
SELECT * FROM sys.dm_db_index_usage_stats
WHERE OBJECT_ID = OBJECT_ID(’Sales.SalesOrderHeader’);
One thing to note about sys.dm_db_index_usage_stats is that the numbers in all of these columns show the number of ‘operations’, not the number of pages or rows or anything. So, as this script shows, a user_lookup is the operation used in a clustered index (or heap) to fetch a full data row (or rows) using the key(s) returned from a seek operation of a non-clustered index.
More on magic numbers
I recently posted a recommendation about “avoiding magic numbers in stored procedures”:/articles/2006/01/12/stored-procedure-guidelines-avoid-magic-numbers (and other t-sql routines of course). Yesterday I updated that post with some important info that invalidates the recommendation from a performance view. Please read the updated post if you have not seen this. However, I still think the advice of avoiding magic numbers to make code more readable is important.
So what can we do to write readable code and still have good performance? Well, the simple answer is unfortunately that we cannot in this case. In SQL Server 2005 we an use the new query hint OPTIMIZE FOR to allow us to use a variable instead of a literal in our code, but I am not sure I would say the code below is more readable than simply using a literal:
DECLARE @StartOrderDate datetime
SET @StartOrderDate = ‘20040731′
SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
WHERE h.SalesOrderID = d.SalesOrderId
AND h.OrderDate >= @StartOrderDate
OPTION (OPTIMIZE FOR ( @StartOrderDate = ‘20040731′ ) )
This query will produce the same plan as if we just used the literal instead of a variable. If the literal represents a value that is not at all easily understood (unlike the date literal above) then maybe we could say that this code is more easily readable than the alternative, but for most cases it will just look silly.
However, when thinking about this I came up with another idea. What if we could use named (declared) literals in our code, just like constants in a language such as C#? We could declare a constant with a read-only literal, and then use that constant in the query. The effect would be the same as using the literal by itself. Some examples of how this could look is shown below:
DECLARE @SomeConstant int = 42
DECLARE !SomeConstant int = 42
CONSTANT @SomeConstant int = 42
READONLY @SomeConstant int = 42
So what do you think? Am I just too concerned with making stored procedures readable, or is this something you would like to see in a future version of SQL Server? How should it look and work?
Stored Procedure Guidelines: Avoid magic numbers
*UPDATE*: I have just read a post by “Ian Jose”:http://blogs.msdn.com/ianjo/default.aspx from the “Query Optimizer team”:http://blogs.msdn.com/sqlqueryprocessing/default.aspx that sort of invalidates this recommendation. He writes that you should “use literals, or at least unchanged parameters, for query inputs”:http://blogs.msdn.com/ianjo/archive/2005/11/10/491538.aspx. As his post shows, using variables as I write below can often have a negative impact on performance. I think this is a real shame, since using magic numbers in this way definitely makes the code less readable. Possibly the best way to go is to move the query to a separate procedure and pass the magic number from the ‘outer’ procedure as a parameter to this new procedure, like Ian describes in his post. That way we would avoid magic numbers and still get the best plan, although I am not sure the readability is helped by this added complexity. I will leave my post as it was originally written because I think it is still good advice to think about the readability of code. The part about watching out for non-SARGable where clauses is also still valid, which Ian hints about in “another post”:http://blogs.msdn.com/ianjo/archive/2005/11/10/491541.aspx.
Most programmers are aware of the pitfalls involved with using magic numbers in code (at least I hope so). For instance, instead of writing something like @if(foo == 42)@ you should create a constant and use @if(foo == MeaningOfLife)@, since not everyone knows that 42 equals the meaning of life (you might think that all computer geeks know this, but they actually do not). For some reason though the good practice of not using magic numbers seems to be forgotten when it comes to writing stored procedures, or just sql scripts in general.
In stored procedures you often see something like @WHERE somecolumn = 42@. There’s a magic number right there. In a situation where somecolumn should be compared to a parameter that was passed to the procedure it would instead be @WHERE somecolumn = @someparameter@. My advice is that the same style should be used even when the number is not a parameter but just a hard-coded value. So:
*Avoid using magic numbers in SQL statements. Declare them as variables and use the variable instead.*
– Bad code, do not use this style
SELECT …
WHERE somecolumn = 10
– Good code, use this style
DECLARE @somevariable SMALLINT
SET @somevariable = 10
SELECT …
WHERE somecolumn = @somevariable
Apart from making the code more readable, this advice can also be important for performance reasons. The literal 10 used in the example above is seen by SQL Server as an integer constant. This means that if the data type of somecolumn is not integer then an implicit cast needs to be done to be able to compare the values. Now, consider what happens when somecolumn has the data type smallint. The data type precedence rules for SQL Server state that a smallint is converted to an integer. So, the where clause in reality becomes @CAST(somecolumn AS INT) = 10@. Not only are we wasting CPU cycles casting the value of somecolumn for every row, even more worse is that this clause does not confirm to the SARG rules. These state that to be able to use an index seek for evaluating a where clause it needs to follow the formula @column operator expression@. With the CAST function the clause becomes @expression operator expression@, so a table (or index) scan is used.
This problem is of course easily avoided by casting the integer constant to a smallint (@somecolumn = CAST(10 AS SMALLINT)@), but then we would still have the magic number. And it is easy to forget to cast the literal and forget about the implicit cast. Avoiding magic numbers force us to think about what data type we actually want.
Review: Pragmatic Unit Testing in C# with NUnit
I read “Pragmatic Unit Testing in C# with NUnit”:http://www.pragmaticprogrammer.com/starter_kit/utc/index.html this morning. I love the fast-paced to-the-point writing style of “Andy Hunt”:http://www.toolshed.com/blog and “Dave Thomas”:http://blogs.pragprog.com/cgi-bin/pragdave.cgi (and the other authors in their publishing house), and this book was no different than other books in the Pragmatic-series. Most of the contents was stuff that I already knew, both practically (how to) and theoretically (why), but it was still a good read to get another person’s (or two in this case) view of these matters. I also learnt a couple of smaller things about NUnit that I did not know before, such as the [Suite] attribute that can be used to create test suites of test fixtures.
For anyone new to unit testing and TDD (and specifically in C# or at least the .NET world) I would recommend this book along with Kent Beck’s “Test-Driven Development: By Example”:http://www.amazon.com/gp/product/0321146530/. Even more so if you are just starting up a project where you will be using unit tests and have to get up to speed quickly. It starts with a good explanation of why you need unit tests, then proceeds to describe how to do this using NUnit and finally discuss how to write better unit tests and how to incorporate unit tests into a project. All of this in just 176 easy-to-read pages (I finished it in just over an hour, though I could read quite fast since most of it was not new to me).
Design Patterns in Ruby: Chain of Command
Chain of Responsibility:
bq. Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it.
p>. “GOF”:http://www.amazon.com/gp/product/0201633612/ p.223
In the book, the GoF mention how in Smalltalk it is possible to do a very different implementation using the @doesNotUnderstand@ mechanism. Ruby of course has a similar one in @method_missing@, so I thought that I would use this in an implementation example of Chain of Responsibility. There is also an alternative implementation beneath it that does not use @method_missing@.
# Generic implementation for Chain of Responsibility
module Chainable
def next_in_chain(link)
@next = link
end
def method_missing(method, *args, &block)
if @next == nil
puts “huh?”
return
end
@next.__send__(method, *args, &block)
end
end
class X
include Chainable
def initialize(link)
next_in_chain(link)
end
def do_x
puts “x”
end
end
class Y
include Chainable
def do_y
puts “y”
end
end
y1 = Y.new
x1 = X.new(y1)
x1.do_x
x1.do_y
x1.do_z
Now, I guess it might not be a good idea to define method_missing in a module (comment anyone?), so you could of course change it to let the classes handle that themselves. I also did this alternative implementation that does not use method_missing but is still fairly generic.
# Alternative implementation for Chain of Responsibility
module Chainable
def next_in_chain
nil
end
def handle_message(message, *args, &block)
if self.respond_to?(message)
self.__send__(message, *args, &block)
else
next_in_chain.handle_message(message, *args, &block) unless next_in_chain.nil?
end
end
end
class X
include Chainable
def initialize(link)
@next = link
end
def next_in_chain
@next
end
def do_x(x)
puts x
end
end
class Y
include Chainable
def do_y
yield ‘y’
end
end
y1 = Y.new
x1 = X.new(y1)
x1.handle_message(:do_x, ‘x’)
x1.handle_message(:do_y) {|y| puts y}
x1.handle_message(:do_z)

