To aggregate or not to aggregera

Posted by Chris on October 04, 2006

I do not think I have yet posted a message that is only meant for Swedish (speaking) readers. Anyway, this post is probably mostly interesting to Swedish-speaking people, but I will keep it in English anyway (except where not possible, as is natural from the following discussion).

A couple of months ago (well about half a year ago actually, don’t know why I haven’t thought about this again until now) I was preparing a presentation on SQL Server 2005 that I was to present at a breakfast seminar at Dotway. Since I was going to speak Swedish I wanted to try and include as little Swenglish as possible (very common in the Swedish IT community otherwise, though usually called ‘Svengelska’ which would be the Swedish ‘translation’ of the ‘English’ word Swenglish). So I was looking to translate as much of the terms I could. But one term where I completely stumbled was _aggregate functions_.

*What do you call Aggregate Functions in Swedish?*

Or more specifically, I was going to describe user-defined aggregate functions in SQL Server 2005, and wanted to avoid Swenglish. To make sure we have the same English definition here, what I mean by an aggregate function is a function that calculates a single value based on a set of values as input. I guess more or less as it is defined in “Princeton Wordnet”:http://wordnet.princeton.edu/perl/webwn?s=aggregate, but with a little more database context to it.

The first one that came to mind was the probably very Swenglish ‘*Aggregatfunktioner*’. To someone familiar with SQL Server it sounds very correct, but I guess that is precisely because they (and I) recognise the English word that is actually in there. And the only definition of the word ‘aggregat’ I have found in Swedish does not provide a lot of help here:

*aggregat* [agreg'a:t Uttal] aggregatet aggregat aggregaten subst.
grupp av sammanbyggda maskiner

A typical use of this is a word like ‘värmeaggregat’ (something like a heater). And that is nowhere near what I am looking for..

Next try was ‘*Aggregeringsfunktioner*’ and immediately after that came ‘*Aggregerande funktioner*’. These feel more Swedish, and the first one is actually used in the “IT word dictionary of PC World”:http://pcforalla.idg.se/tjanster/dataordboken/ (look up ‘aggregera’), as much as you want to use that as a source.. But they still do not sound quite correct (and fully Swedish), and they definitively do not feel right to say. So I actually went ahead and used the Swenglish ‘*aggregatfunktioner*’ in my presentation, which I think worked very well and everyone understood what I meant (in a typically Swenglish fashion).

Today when I somehow came to think about this again I contacted “Jesper Holmberg”:http://blogs.msdn.com/jesperh/ at Microsoft and asked him what he thought about it. Jesper works with the Swedish localization of Windows and other Microsoft products (and possibly more, sorry if I missed something) and has a very interesting blog as well. He was really quick at answering and his suggestion was ‘*mängdfunktioner*’, apparantly the recommended translation at Microsoft (remember other products, such as Excel, have aggregates as well). This is a great translation that really conveys the meaning like I wanted, but unfortunately it feels like noone ever use that word in Swedish. So I fear half of the audience (if I were to give the presentation again) would not realize what I was talking about, at least not immediately. But I do think that I will actually use ‘*mängdfunktioner*’ next time unless I get a better suggestion.

So, anyone with other ideas, please post a comment…

SQLUG - Michael Rys

Posted by Chris on October 04, 2006

So I was at the “Swedish SQL Server User Group”:http://www.sqlug.se/ meeting in Stockholm yesterday. It’s sad that there is never anyone but me from Skåne (the part of Sweden where I live) that come there, but of course it is an hour’s flight and the meeting ends at like 21.30 so it does get pretty late. It was great to chat with “Tobias Thernström”:http://www.rbam.se/?page=pages/tobias.htm and “Tibor Karaszi”:http://www.solidqualitylearning.com/blogs/Tibor/ (the founders of SQLUG) as well as “André Henriksson”:http://blogs.msdn.com/ahenrik/ and Maria Johansson from Microsoft Sweden.

As I mentioned in the “preview post”:http://www.hedgate.net/blog/2005/12/06/coming-user-group-meetings/ both presentations at the meeting was given by Dr. Michael Rys from Microsft and member of the ISO SQL committee and W3C XQuery committee. Michael knows a lot about XML and his presentation on XQuery was very interesting. Personally I do not see me storing XML in SQL Server even with the new support in SQL Server 2005. Michael mentioned scenarios where you want to store “semi-structured data” as one example where it would be a good idea to store it in an XML datatype column. But to me the word “semi-structured data” has as little meaning as “denormalization”. It does not have a specific meaning, and the general explanation “data that is not relational” is no good since all data is best managed in a relational management system in my view. Using XML and XQuery in stored procedures is a different case, where I have many thoughts with lots of further sidetracks, but that part of my brain is to incoherent right now to write anything about it. Look for more on this in the future.

The other presentation Michael gave was a general overview of SQL Server 2005, actually the same presentation that he gave on the SQL/VS2005 launch events lately. So there was not a lot of interesting news to me there, but again it was interesting to listen to Michael since he gave his personal view on different items. I asked him about the possibility of row-value constructors finally appearing in the next version of SQL Server. He agreed that it was definitely one of the important parts from the SQL standard missing in SQL Server and a top candidate for being implemented, but also recommended letting them know about it by sending an email to “sqlwish@microsoft.com”:mailto:sqlwish@microsoft.com. I guess another request for row-value constructors won’t hurt.. :)

And one last thing, regarding the next version of SQL Server. Although Bill Gates has already mentioned it in an interview and I have seen it in a couple of blogs (Microsoft and non-Microsoft), this was the first time I heard a Microsoft-employee use the codename “Katmai” for the next version of SQL Server in speech. I wonder if all of the attendees understood what he meant when he said “will be fixed in Katmai”. :)

All in all it was a great meeting and as always well worth the time and effort to go there. Now I am looking forward to the local SNUG-meeting this monday.

Driving a car or riding the bus

Posted by Chris on October 04, 2006

Here is an interesting question. You are at point A and want to get to point B. The problem is that you do not know exactly where point B is. So to help you find it you have a device that always shows you the current direction to point B. You now have two alternatives of getting to point B. You can either take your car and drive there letting the device show you the direction. Naturally you will not be able to just fix the steering wheel in the correct position and take a nap since there will be obstacles in the way, but you will be able to quickly adapt to any detours encountered. The second alternative is to get on a bus that is headed more-or-less in the correct direction. When it starts to deviate too much you can get off it and change to another bus, and if you keep doing that eventually you will get to point B (or close enough to walk).

Disregarding any environmental and economical factors, which alternative would you choose? Most of us, I think, would choose the car. Now compare this to software design. Which one of the two alternatives best resembles the way most people design software? Unfortunately the bus is a lot more common than the car. Requirements are ’set in stone’ initially even though they are not fully known (and they can’t be of course). The developers then try and implement some of the requirements and when they are done they leave it for QA (and/or the customer) to test and give them feedback on the direction. Sure, there are lots of processes and ideas on how to refine the bus trip to make it as smooth as possible, but the fact is you are on that bus and you are not driving. You can only get feedback (or at least react to it) when the bus stops, and as soon as you get on the next bus you are once again working in unknown land until the next stop.

The way we use the car instead of the bus trip in software design is that we increase the frequency of feedback. By testing first, continuously integrating, evolving the design and ‘requirements’ and always communicating with the customer (in fact putting the customer in the passenger seat of our car) we get feedback as often as possible which lets us adapt to the current direction we need to be moving in.

Dynamic Management Objects in SQL Server 2005

Posted by Chris on October 04, 2006

Performance tuning and troubleshooting in SQL Server has always been something of a black art. To be effective at it you need to know how to use a large set of tools, including Profiler, Perfmon, DBCC commands and stored procedures. Sometimes it can seem almost random which tool you should use for a specific issue. They will often affect performance themselves, so you might not always be able to use them. [...] SQL Server 2000 can be seen as a black box that can be quite difficult to penetrate. SQL Server 2005 changes all this by introducing the new Dynamic Management Objects.

I have written a short “article on Dynamic Management Objects (DMVs and DMFs) in SQL Server 2005″:/writings/dynamic-management-objects for the “Quest Pipelines newsletter”:http://www.quest-pipelines.com/newsletter-v6/newsletter_1205.htm.

Agile advice from P&P

Posted by Chris on October 04, 2006

I have been listening to an excellent webcast called “Lessons Learned from the Warroom”:http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032286000&EventCategory=5&culture=en-US&CountryCode=US. It features “Peter Provost”:http://www.peterprovost.org/, “Brian Button”:http://www.agileprogrammer.com/oneagilecoder, “Brad Wilson”:http://www.agileprogrammer.com/dotnetguy and “Darrell Snow”:http://blogs.msdn.com/darrellsnow (all working in the Patterns & Practices group at Microsoft) discussing what they have learned from working by agile methodologies. Among the projects they have worked on following agile principles are the “Enterprise Library”:http://www.gotdotnet.com/codegallery/codegallery.aspx?id=295a464a-6072-4e25-94e2-91be63527327 and the “Composite UI Application Block (CAB)”:http://www.gotdotnet.com/codegallery/codegallery.aspx?id=22f72167-af95-44ce-a6ca-f2eafbf2653c, so they have aquired quite a lot of experience.

If you have not listened to the webcast I definitely recommend you do so, it is filled with great advice. Below are some notes from it, posted here mostly so that I will remember them myself. But I have also added some comment where appropriate.

* Do not forget to do reflection when you are planning. As Peter Provost comments, this is simply following the process since the process tells us to reflect on the past iteration while we are planning the current iteration. But somehow this part is often forgotten. To add to that, you should of course not only reflect while planning the current iteration, reflection should be done as often as possible.
* Manager approval and sponsorship is very important to agile projects. For instance, a common problem is that people are used to being judged by their individual achievements. But agile advocates that the whole team and the results it produce (in particular) is what is important. Therefore it is a manager’s job to make sure that the importance of setting aside your ego and instead working for the team is communicated to everyone.
* One specific area is pair-programming. Developers (who have not tried it) can be reluctant to do it. A similar problem is pairs that never change, e.g. some or all of the pairs are always comprised of the same developers. Peter suggested a pairing chart to help with these issues. List all the names of the developers in a matrix and note down when two developers pair. The goal is to have all the columns in the chart filled. Someone added that a pairing session should not be more than a couple of hours long, spanning a “single coherent logical thought”. After that you should move on to another partner and problem.
* A particularly interesting part (to me) was remote pairing. The P&P team includes a number of consultants that are not present in the warroom at Microsoft at all times. Some of the work was done by people in South America. All the time though people where pairprogramming. They tried a number of different setups, including Skype, VNC and LiveMeeting, similar to what “me and Andrés”:/articles/2005/11/22/distributed-pair-programming/ tried. Although this seems to have worked very well for them, they where also quick to note that when they where actually pairing at the same physical desk theyfelt much more productive and focused.
* The final thing I want to mention was the discussion on how to get people (developers, management, customers etc) interested in working with agile principles. This was interesting since Andrés recently did a presentation with similar ideas called Guerilla Agile at Dotway’s latest competence weekend (actually I ended up presenting his material since he got sick the night before). The most obvious way is to simply show them how well it works. For instance, write your code using TDD even if you are the only one. It should soon be obvious that your code is better. :) When someone presents a design as a diagram, ask if you could state it as a test to document it. Pairprogram whenever you can, soon pairing will be the default instead of the other way round. And my favorite quote from Peter Provost: “If you do not have a warroom, steal one”. Book a conference room for weeks, days, hours or whatever it takes. Share the reservations between the team members, finally management will understand that you really do need a dedicated room.
This is just a small set of all the great advice from the webcast. I hope you are not satisfied with my notes and go listen to the whole thing ASAP.

Inverted wisdom

Posted by Chris on October 04, 2006

The agile community is full of words of wisdom. These often describe a lot more about the process than an article (or even books), at least if you understand the meaning behind them. The funny thing is that many of these are often kind of inverted wisdoms, negations if you wish. Here are some examples (in my own wording):

bq. If your user story does not fit on a card, get a smaller card!1

The reasoning for this is of course that you are not really writing user stories, you are writing something that is bigger (too big). The story should not include a lot of detail, that part is for the conversation [about the story]. By making sure that the stories are not too large you also make sure that you are not spending too much time gathering requirements and thinking too much about details.

bq. If a project is finished on time with all the specified requirements implemented, then chances are that it will not be considered a successful project after some time!2

If a project finishes all the specified requirements on time, then chances are you did not really ‘find’ all the requirements that the customer really wants to have. Project management is all about deciding which requirements that are to be implemented now, later or left behind. You should always have more stories than you have time for.

bq. If a project/team/company is completely dependent on one programmer, get rid of him!3

The longer you wait, the more dependent you become and the more of a bottleneck, or constraint, this programmer becomes. Note to ‘trigger-happy’ management: you do not really need to fire the programmer, there are “other ways to remove the constraint”:http://www.nayima.be/about/TheoryOfConstraints.html.

If you have similar wisdoms then please post them in the comments.

References:
fn1. This one I picked up in Mike Cohn’s book “User Stories Applied: For Agile Software Development”:http://www.amazon.com/gp/product/0321205685/, but he further attributes it to Tom Poppendieck.

fn2. I am not quite sure where I first heard this, but I know Ron Jeffries has written about “similar issues”:http://www.xprogramming.com/xpmag/jatmakingthedate.htm.

fn3. I am not sure if I have read this one specified in this way or just made it up myself, but in any case it is such a common idea that no one source can be attributed.

No pipe in functions

Posted by Chris on October 04, 2006

I spent a couple of minutes with a confused look on my face today. I was writing a CLR function and wanted to output some information about the progress. @SqlContext.Pipe.Send(”some message”);@ should work, right? Nope, a NullReferenceException was all I got for my troubles. I figured out that it was the property Pipe that did not return a SqlPipe, instead it gave me null. But I could not understand why there was no pipe.

Although I am sure that most of my readers are smarter than me and would figure out what the problem was in an instant, I thought I’d just make a note of it here anyway. First I thought about the DataAccess and SystemDataAccess parameters of the SqlFunction attribute. It seemed strange that they should be involved, but I tried experimenting with them anyway. No luck. After a while of thinking some more I did what always works, went for a cup of coffee. The answer came to me just about the same time as the cup was brewed. Functions are not allowed to send results to the client, whether implemented in CLR or T-SQL. Using a PRINT statement in a T-SQL UDF results in the following error when trying to create it:

@Msg 443, Level 16, State 14, Procedure test, Line 3@

@Invalid use of side-effecting or time-dependent operator in ‘PRINT’ within a function.@

So, I ended up using the Visual Studio debugger for my purposes, just as I should have from the start of course. Debugging by outputting text strings is not the best way of debugging. I just had one problem with the debugger though. I was using Niels Berglund’s excellent SQLCLR project add-in for Visual Studio for creating, deploying and debugging the code. I am not sure if the Visual Studio CLR project type behaves the same way, or even if it was just me doing something wrong, but the debugging mode were always abruptly ended with a message that the deployment took too long and timed out. Has anyone else had this problem?

Two notes about impersonation in SQLCLR

Posted by Chris on October 04, 2006

While experimenting a bit with impersonation in SQLCLR I noted two things I thought would be good to mention here. Firstly, if you are creating a function you need to set the named parameter DataAccess or SystemDataAccess (I assume it works with both set as well) of the SqlFunction attribute to Read. Strange that it works with either one of them.

The other thing was that BOL is not 100% accurate in the description of what the property WindowsIdentity of SqlContext can return. In BOL it says that if you are running integrated security this returns the token for the caller of the code, but if SQL Server security is used it returns NULL. What is missing is that if a SQL Server login which is a member of the sysadmins fixed server role (such as sa) is the caller of the code then WindowsIdentity will return a token. There is no meaning in calling the Impersonate method on it though, as it is the token for the SQL Server service account. Since SQLCLR code by default executes as the SQL Server service account there is not a lot of meaning in impersonating it..

Extension methods and method overload resolution

Posted by Chris on October 04, 2006

Eric Gunnerson recently “posted an interesting piece of C# code”:http://blogs.msdn.com/ericgu/archive/2005/12/12/502890.aspx and asked what it does. As he explains in the “follow-up discussion post”:http://blogs.msdn.com/ericgu/archive/2005/12/13/503225.aspx, the reason why it does what it does is that we would not want the behaviour of our program to change due to a new method being added to a class we are subclassing. This is of course precisely the way it should be to avoid nasty problems. However, an interesting thought that struck me was that the extension methods feature in C# 3.0 and the method overload resolution rules they use can create precisely that problem.

Consider the following C# 3.0 code:

// Foo.cs
namespace thirdpartylib {
  public class Foo {
    private int x;
    private int y;
    public Foo(int x, int y) {
      this.x = x;
      this.y = y;
    }

    public int DoFoo() {
      return this.x + this.y;
    }

    public int X { get { return this.x; } }
    public int Y { get { return this.y; } }
  }
}

// ExtendFoo.cs
using thirdpartylib;
namespace extensions {
  public static class ExtendFoo {
    public static int DoBar(this Foo theFoo, int z) {
      return theFoo.X - theFoo.Y - z;
    }
  }
}

// Program.cs
using System;
using thirdpartylib;
using extensions;
namespace LINQConsoleApplication1 {
  class Program {
    static void Main(string[] args) {
      Foo foo = new Foo(1, 2);

      int i = foo.DoFoo();
      int j = foo.DoBar(3);

      Console.WriteLine("DoFoo: {0}", i);
      Console.WriteLine("DoBar: {0}", j);
    }
  }
}

When this little program is executed it prints “DoFoo: 3″ and “DoBar: -4″ to the console. For those that have not seen C# 3.0, the interesting part is the static class ExtendFoo with it’s static method DoBar. The keyword @this@ before the first parameter is what is new in 3.0, and it is what lets us extend the Foo class to seemingly have a DoBar method that the program can call. By simply bringing them into scope (with the @using extensions;@ line) the compiler adds any static methods on static classes with the first parameter ‘decorated’ with the @this@ keyword to the applicable classes. Note though that this is only syntactic sugar, what is really happening (and which can easily be seen by reviewing the IL generated) is the same as if we would have written this:

int j = ExtendFoo.DoBar(foo, 3);

So, coming back to method overload resolution. For extension methods the rule is that the extension method is used if it does not clash with an instance method on the extended type. So, now consider what happens when this code is added to Foo:

    public int DoBar(short z) {
return this.x * this.y * z;
}

Now the behaviour of our program has changed into printing "DoBar: 6" instead of "DoBar: -4". Considering the IL that is generated with extension methods this is not at all strange, but I anticipate one or two developers might get caught by this. Microsoft does include a warning about extension methods in the PDC preview bits (??"Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible."??), but this in particular feels kind of creepy.

Troubles with shared state and anonymous delegates in SQLCLR

Posted by Chris on October 04, 2006

Earlier today I set out to create “a SQLCLR function for splitting a comma-separated string including ranges”:/articles/2006/01/27/splitting-a-comma-separated-string-with-ranges-included/ (such as “2,5,7-9,13”) into a set of values. I encountered some problems that I think is valuable to document here. The code below shows what I initially wrote:

SqlFunction(
  DataAccess = DataAccessKind.None,
  SystemDataAccess = SystemDataAccessKind.None,
  IsDeterministic = true,
  IsPrecise = true,
  FillRowMethodName = "FillRow")]
public static IEnumerable SplitWithRanges(SqlString input) {
  // First expand ranges
  Regex rangeExpander = new Regex(@"(?\d+)\-(?\d+)");

  string expandedRanges = rangeExpander.Replace(
    input.Value,
    delegate(Match match) {
      int startOfRange = Int32.Parse(
        match.Groups["RangeStart"].Value);
      int endOfRange = Int32.Parse(
        match.Groups["RangeEnd"].Value);
      string[] values = new string[endOfRange - startOfRange + 1];
      for(int i = 0; i < values.Length; i++) {
        values[i] = (startOfRange + i).ToString();
      }
      return String.Join(",", values);
    }
  );

  // Now split the string containing the comma-separated values
  string[] splittedValues = expandedRanges.Split(
    new char[] { ',' },
    StringSplitOptions.None);

  return splittedValues;
}

public static void FillRow(Object obj, out SqlString tuple) {
  tuple = new SqlString((String)obj);
}

However, when I tried to add this assembly to SQL Server I ran into problems. The CREATE ASSEMBLY statement failed with the following message:

@Msg 6212, Level 16, State 1, Line 8@

@CREATE ASSEMBLY failed because method ‘SplitWithRanges’ on type ‘Functions’ in safe assembly ‘Splitting’ is storing to a static field. Storing to a static field is not allowed in safe assemblies.@

I know that assemblies must be registered in UNSAFE to use shared state. The reason is that if an executing thread suffers a thread abort exception (for instance due to a stack overflow or out of memory exception being thrown by something it does) and there is any shared state in the application domain where it is executing, then the entire application domain is unloaded. It does not matter whether or not this specific thread was using the shared state, as a safety precaution the application domain is unloaded. Therefore SQL Server tries to restrict you from using shared state in SQLCLR.

But where does the C# code above use any shared state? A little reflecting with Lutz Roeder’s Reflector gave me the answer. The row with the anonymous delegate above is actually compiled into IL that matches C# code similar to this:

if (Functions.<>9__CachedAnonymousMethodDelegate1 == null) {
  Functions.<>9__CachedAnonymousMethodDelegate1 = new MatchEvaluator(
    Functions.b__0);
}
string text1 = regex1.Replace(
  input.Value,
  Functions.<>9__CachedAnonymousMethodDelegate1);

Functions is the name of the class containing my code, so what this does is use a static field for storing the delegate. The static field is defined as below:

[CompilerGenerated]
private static MatchEvaluator <>9__CachedAnonymousMethodDelegate1;

So there we have the shared state. To get around this there are two solutions. I can either define a function to use as the delegate, like this:

private static String ExpandRanges(Match match) {
  int startOfRange = Int32.Parse(
    match.Groups["RangeStart"].Value);
  int endOfRange = Int32.Parse(
    match.Groups["RangeEnd"].Value);
  string[] values = new string[endOfRange - startOfRange + 1];
  for(int i = 0; i < values.Length; i++) {
    values[i] = (startOfRange + i).ToString();
  }
  return String.Join(",", values);
}

Then I just change the row calling the delegate to this:

string expandedRanges = rangeExpander.Replace(
input.Value,
new MatchEvaluator(Functions.ExpandRanges));

The second “solution” is to use a variable local to the function SplitWithRanges inside the anonymous delegate. The reason this works is that the compiler now generates a private nested class with a function that is used as the delegate. So, if I for instance declare a variable @bool foo = true;@ outside the delegate and then simply do @foo = foo;@ in the code of the anonymous delegate then the compiler will actually create this:

[SqlFunction(
  DataAccess=DataAccessKind.None,
  SystemDataAccess=SystemDataAccessKind.None,
  IsDeterministic=true,
  IsPrecise=true,
  FillRowMethodName="FillRow")]
public static IEnumerable SplitWithRanges(SqlString input) {
  Functions.<>c__DisplayClass1 class1 = new Functions.<>c__DisplayClass1();
  Regex regex1 = new Regex(
    @"(?\d+)\-(?\d+)");
  class1.foo = true;
  string text1 = regex1.Replace(
    input.Value,
    new MatchEvaluator(class1.b__0));
  char[] chArray1 = new char[] { ',' } ;
  return text1.Split(chArray1, StringSplitOptions.None);
}

As I mentioned above, class1 is a private class defined (by the compiler) in my class Functions. So under the covers this solution is very similar to the previous one. It is just a question of where the function is placed. Although I love anonymous delegates, in this case I went with defining the function myself and using that for the delegate.