Sunday, December 14, 2014

Working with Age Groups in SQL

Following on from my previous post, SQL - Beyond the standard ORDER BY ASC / DESC, which introduced the CASE WHEN statement, this post tackles the problem of handling age groups in SQL and introduces a technique I call “query wrapping”.

When writing queries to generate reports, it is a common requirement to aggregate records into groups and a typical grouping is age. The following example outlines the process of aggregating a set of employee records into the following age groups:


Under 18
18 – 25
26 – 35
36 – 55
56+


For this post, the sample employee list I am using looks like this:
 


Before assigning an employee an age group, the employee’s age needs to be calculated. To do this the DATEDIFF T-SQL function can be used. MSDN defines the syntax as:
 

DATEDIFF(datepart, startdate, enddate)
 

To calculate an employee’s age, the number of years in between their date of birth and today’s date is required:
 

DATEDIFF(yy, DateOfBirth, GETDATE())

Note that "yy" specifies the difference should be returned in years, and GETDATE() returns today's date.
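One caveat worth knowing: DATEDIFF counts the number of year boundaries crossed between the two dates, not completed years, so an employee whose birthday falls later in the year will be reported one year too old. Where exact ages matter, a common T-SQL adjustment (a sketch using the same DateOfBirth column) subtracts one when the birthday has not yet occurred this year:

```sql
SELECT DATEDIFF(yy, DateOfBirth, GETDATE())
    -- DATEADD gives this year's anniversary of the birth date;
    -- if it is still in the future, knock one year off the count.
    - CASE
        WHEN DATEADD(yy, DATEDIFF(yy, DateOfBirth, GETDATE()), DateOfBirth) > GETDATE()
            THEN 1
        ELSE 0
      END AS Age
FROM Employees
```

For the coarse age-group buckets used in this post, the simple DATEDIFF form is usually close enough, but it is worth being aware of the difference.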
 

The DATEDIFF function can be used in conjunction with a CASE WHEN statement to create a query which will add an age group label to each employee and can in turn be grouped upon:
 

SELECT
    CASE
        WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) <= 17
            THEN 'Under 18'
        WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 18 AND 25
            THEN '18-25'
        WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 26 AND 35
            THEN '26-35'
        WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 36 AND 55
            THEN '36-55'
        WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) > 55
            THEN '56+'
    END AS YearGroup
FROM Employees

This query will generate the following result set:



This result set will be the basis for determining how many employees are in each age group. To create the age group aggregation, we need to group on the result set displayed above. How do you group on the result set of another query? You use that result set as the input to your GROUP BY query: a query's FROM data-set can be generated dynamically from another SQL query (known as a derived table) instead of referencing an actual table. Take a look at the query below to see this in action:
 


SELECT AgeGroups.YearGroup AS [Year Group]
    , COUNT(AgeGroups.YearGroup) AS [Group Count]
FROM
(
    SELECT
        CASE
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) <= 17
                THEN 'Under 18'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 18 AND 25
                THEN '18-25'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 26 AND 35
                THEN '26-35'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 36 AND 55
                THEN '36-55'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) > 55
                THEN '56+'
        END AS YearGroup
    FROM Employees
) AgeGroups
GROUP BY AgeGroups.YearGroup

The query previously written to add an age group label to each employee has been included as the FROM data-set for the GROUP BY query. I think of this as wrapping the GROUP BY statement around the result set of the inner age group query. This is an important technique to understand as it allows you to build up very powerful SQL queries by wrapping queries around one another. Data-sets being JOINed onto can also be created dynamically using a separate SQL statement, opening up an almost limitless number of querying options.
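As an aside, the same wrapping can be expressed with a common table expression (CTE), which some people find easier to read because the queries appear in the order they are evaluated. A sketch of the equivalent query:

```sql
WITH AgeGroups AS
(
    SELECT
        CASE
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) <= 17 THEN 'Under 18'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 18 AND 25 THEN '18-25'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 26 AND 35 THEN '26-35'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) BETWEEN 36 AND 55 THEN '36-55'
            WHEN DATEDIFF(yy, DateOfBirth, GETDATE()) > 55 THEN '56+'
        END AS YearGroup
    FROM Employees
)
SELECT YearGroup AS [Year Group]
    , COUNT(*) AS [Group Count]
FROM AgeGroups
GROUP BY YearGroup
```

Both forms produce the same result; the CTE simply names the inner data-set up front instead of nesting it inside the FROM clause.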
 

This is where thinking in sets helps when working in SQL. Think of the inner age group labelling query as generating your initial data-set, and the outer GROUP BY query filtering (or, in this case, aggregating) this result set down to something closer to what is required.
 

When executed, the updated query produces the required aggregated result set:
 


When writing an SQL query, the habit is usually to start at the top and work down, beginning with what you want as the end result. I find it easier to write the innermost queries first, where I am pulling in all the data I require in its most primitive form. I then start to wrap these queries in other queries which better shape the data into the end result I require, and look for ways to JOIN this data onto other data-sets as needed.
 

Hopefully this has illustrated the technique of “query wrapping” whilst demonstrating how to work with age groups in SQL and helped open a couple more SQL querying doors for you.

Wednesday, December 3, 2014

C# Recursion with a Lambda

This is the first in a series about exploring the code patterns that Miguel Castro introduces in his Pluralsight course titled "Building End-to-End Multi-Client Service Oriented Applications". I am creating these blog posts as a learning exercise where I can delve deeper into the pattern and solidify my understanding of it. Hopefully this can be of some help to other programmers who are also taking Miguel's course or anyone trying to improve their C# skills.

Traditional recursion

If you're like me, when it comes to writing a recursive algorithm, you generally follow the pattern of defining a function that contains all the code that needs to be executed repeatedly and then calling this from another function. It's simple and it works, but it does require that you create at least two functions. Let's take a look at an example where recursion is used to calculate the factorial of a given number:


public void TraditionalRecursion()
{
    Console.WriteLine(ComputeFactorial(5));
}

private int ComputeFactorial(int n)
{
    if (n == 0) 
    { 
        return 1; 
    }

    return (n * ComputeFactorial(n - 1));
}

The "work" is being done by the ComputeFactorial function, but as it needs to be able to call itself, it has to be created as a separate function. As with many recursive algorithms, the first function (TraditionalRecursion) doesn't contain any logic or functionality that requires it to be separated--it just has to act as the starting point for the algorithm.

Lambda recursion

Now, wouldn't it be nice if you could do this with just a single function, where you no longer need a second function just to start the algorithm? Imagine being able to define the recursive algorithm and start it, all within the same function. Well, it turns out that this is quite easily achievable through the use of a lambda expression:


public void LambdaRecursion()
{
    // The delegate must be declared and assigned in two steps: a lambda
    // cannot reference the variable it is being assigned to within its
    // own initializer (the compiler reports an unassigned local variable).
    Func<int, int> computeFactorial = null;

    computeFactorial = (n) =>
    {
        if (n == 0) 
        { 
            return 1; 
        }

        return n * computeFactorial(n - 1);
    };

    Console.WriteLine(computeFactorial(5));
}

The inline delegate, a Func in this example, contains exactly the same behavior as the ComputeFactorial function; the only difference is that it has been declared as a variable. An easy way to look at this is to imagine that ComputeFactorial has been wrapped up and is now contained within the computeFactorial variable (declaring functions as variables should be a very familiar concept to a JavaScript programmer). Once declared, all you have to do is call this variable to start the recursive algorithm--it really is as simple as that!

Friday, November 28, 2014

SPA Silos

I was recently listening to a fascinating .NET Rocks podcast which featured Miguel Castro discussing MVVM on the Web. Miguel talked about his experiences with Knockout.js and AngularJS as client-side MVVM frameworks. What I really enjoyed about this podcast was the insight he brought to the table on developing commercial SPAs (Single Page Applications) using a more hybrid approach. What am I saying here? I think Miguel used the phrase “pages with SPA pockets” but said Brian Noyes coined a much more accurate description--SPA Silos.

SPA Silos are where a single web application is actually made up of many SPAs. When we envisage creating a SPA, we normally picture a single HTML base page that initially loads all of the SPA code; from there the browser never requests a new page from the server, only swapping out views and making web API calls in the background as you navigate around the app. The SPA Silo approach breaks that single SPA into many mini SPAs, each responsible for its own area of functionality. Navigation within a mini SPA is handled by routing in the SPA framework. Moving to a different area of the application is handled by requesting a new page from the server, which bootstraps the SPA code for the mini SPA that provides the functionality for the area you are now navigating to.

Miguel gave a nice example which helped illustrate this practice very well:

Let’s say you are building an e-commerce management web application. The application would have areas such as Products, Customers, Orders, Admin etc. Instead of having a single SPA containing the functionality for all of these areas, you would actually have four mini SPAs, each loaded when navigating to the required area rather than upfront when first loading the application.
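To make this concrete, each mini SPA owns only its own routes. A sketch of what the Products silo's client-side routing might look like using AngularJS's ngRoute module (the module name and template paths here are hypothetical, not from Miguel's course):

```javascript
// Routing for the Products mini SPA only -- the Customers, Orders and
// Admin areas each have their own server-rendered page and their own,
// equally small, routing table.
angular.module('productsApp', ['ngRoute'])
    .config(function ($routeProvider) {
        $routeProvider
            .when('/', { templateUrl: 'products/list.html' })
            .when('/detail/:productId', { templateUrl: 'products/detail.html' })
            .otherwise({ redirectTo: '/' });
    });
```

Navigating between these routes stays within the Products page; navigating across to the Customers area is an ordinary full-page request that bootstraps the Customers mini SPA.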

What are the benefits of this hybrid SPA Silos approach, I hear you asking? One very big benefit immediately springs to mind--Separation of Concerns. I cannot stress the importance of this principle enough. Without the hybrid approach you would have one very big SPA with a potentially large and complex routing table as well as a very large code base. By breaking the application out into many smaller applications, you only need a routing table for the area each mini SPA is concerned with (products or customers, for example). Another benefit is that the code base for each area of the e-commerce management application is modularised, due to it being implemented as a self-contained SPA. This makes finding the required code easier, assisting with maintaining and enhancing the web application in the long run.

SPAs also force the browser to work in a way it was never originally designed to. Browsers work best when they can navigate from page to page and forget everything that happened on the previous page--they like to "wipe the slate clean". SPAs force the browser to stay on a single page and update the current page's content dynamically as the user navigates around the application. This behaviour can lead to memory leaks, as a browser could go for hours, days or even weeks without the page actually being refreshed. The hybrid model helps encourage page refreshing: each time the user moves from one area of the application to another, the application makes a page request and the browser breathes a sigh of relief. Of course, if the user spends all their time in a single area of the application the original problem is still present. However, with the hybrid approach the functional surface area a user can operate in without triggering a page refresh is greatly reduced, increasing the likelihood of a page refresh occurring on a regular basis.


On the downside there is the added complexity of handling a combination of client-side and server-side routing within the web application but I feel this is outweighed by the benefits of this approach.


Miguel will be presenting a session on this approach at NDC London in December, so I will be watching it as soon as it’s posted on their Vimeo channel. If you cannot wait until then, Miguel's Building End-to-End Multi-Client Service Oriented Applications - Angular Edition course on Pluralsight also provides an insight into this hybrid approach.


Quoting Miguel, "It’s the best of both worlds".

EDIT:
You can find Miguel's talk at NDC London here: AngularJS for ASP.NET MVC Developers by Miguel A Castro.

Wednesday, November 19, 2014

Think Twice Before Changing your APIs

How often do you make changes to a service that actually alter the API itself? I'm sure if I asked this question to an experienced developer the answer would follow something along the lines of "You would NEVER want to do that! If you change the API, you risk breaking any applications that are consuming that service." As this danger is fairly well known, how do you guard against it?

Imagine this scenario: there is a set of web services that are being consumed by multiple web applications. The API for one of these services has been changed so that a service method that was taking a string, is now taking an integer. Unfortunately, this little change caused the contract surrounding the service to also change, which meant that any applications consuming it would also need to be updated. It wasn't until another application started to throw exceptions that it became clear that one had been missed.

If this was a new area of development, a contributing factor to this oversight could have been that the developer making the change might not have known that the contract they were changing was already being consumed by multiple applications. The developer might have assumed that it was OK to change the contract because the code was "very young" and couldn't have been integrated with other applications in such a short time frame. If anything, this highlights why you can’t rely on a developer to catch this, as they can't be expected to know where each service method is being used.

The fix is simple, but could this have been avoided in the first place? A simple solution would be to never alter the API of a service, ever. The problem with this is that it still comes down to the developer to ensure it is being followed. Remember that in my example, the service method had only been created very recently, so it's conceivable that another developer would also assume they were dealing with the only place where the service method was being consumed. Documentation and improved communication between developers would help to identify an issue, but it would still be up to the developer to notice it.

An ideal solution would be automated so that a developer would not need to be aware of how an API is being consumed. If this could be created as a set of tests that run as part of a build process, any issues would be identified well before deployment to production. To achieve this, you would need the ability to stage the service in a test environment where you could fire off requests to verify that the API has not changed. Another possibility would be to compare the current WSDL file with a previous version to check for changes.
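As a sketch of the WSDL-comparison idea (the helper names are hypothetical, and a real build step would also need to fetch the current WSDL from the staged service), a test could normalise each contract and compare fingerprints, so that formatting-only changes don't fail the build but genuine contract changes do:

```python
import hashlib

def contract_fingerprint(wsdl_text):
    """Hash the WSDL with whitespace normalised, so that formatting-only
    edits do not register as contract changes."""
    normalised = " ".join(wsdl_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def contract_changed(current_wsdl, baseline_wsdl):
    """Return True if the service contract differs from the stored baseline."""
    return contract_fingerprint(current_wsdl) != contract_fingerprint(baseline_wsdl)
```

A build would store the baseline fingerprint alongside the code and fail whenever contract_changed reports True, forcing the developer to consciously update the baseline (and the consumers) rather than break them by accident.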

If you find yourself considering altering an API, be sure to take a moment to ask yourself why. Can you extend the current contract to ensure that the API will not be broken instead of changing it? In most cases, this should be the preferred course of action and making changes to the contract should only happen when there really is no other choice.

Sunday, November 2, 2014

Bootstrap 3's Grid System

I have been playing with Bootstrap 3's grid system and have found it to be a very clever general-purpose grid system for assisting developers with the layout of their sites. It also helps lead you towards a responsive design which adapts to the size of the screen it is being displayed on. To help cement these concepts, I quickly created a demo site which allowed me to view Bootstrap's responsive design features in action.

You can find that site here.
Adjust the width of the page and see how the grid's layout automatically adjusts with you. These adjustments go beyond shrinking the width of a column: you can specify different column sizes for specific viewports. This opens up lots of options for re-arranging your site to make it just as usable on a phone, tablet or laptop.

Bootstrap goes beyond just allowing you to apply different widths to a column within a grid, though. It also provides the ability to show and hide columns based on the width of a page, giving you even more functionality to empower your site's responsive design.

How does Bootstrap do this? It's all through the use of CSS media queries. Media queries allow specific CSS to be applied when certain conditions are met. In the case of the Bootstrap grid system, these conditions are based on the width of your page. Take a look at Bootstrap 3's docs to get a better understanding of how this works and of the grid system in general.
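As a simplified illustration (not Bootstrap's actual rules), this is the mobile-first shape of such a media query: a half-width column stacks to full width on narrow screens and only floats side by side from Bootstrap 3's "medium" breakpoint upwards:

```css
/* Mobile first: the column takes the full width by default... */
.col-md-6 {
    width: 100%;
}

/* ...and only floats into a half-width column once the viewport
   reaches Bootstrap 3's medium breakpoint (992px). */
@media (min-width: 992px) {
    .col-md-6 {
        float: left;
        width: 50%;
    }
}
```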

With CSS frameworks like Bootstrap, HTML5 Boilerplate and Pure, it's getting harder and harder to NOT create a responsive site.

Friday, October 24, 2014

Using dynamic and DynamicObject in C#

Up until I came across Simple.Data, I had only used the dynamic type when unit testing controllers that returned JsonResult objects. After playing with Simple.Data and taking a look at the source code, it became clear that there was a lot more to the dynamic type than I had realized; it hadn’t occurred to me that you could use the member’s name essentially as an additional argument (thanks Jon for making me see this!). To better understand all of this, I decided to create my own limited implementation that would allow me to execute stored procedures in the same fashion. VerySimpleData is heavily based on Simple.Data and I am not trying to take any credit for the creativity of this approach. I have tried to keep my implementation as lean as possible to highlight the use and function of the dynamic type.

VerySimpleData

First of all, you can find the repository for VerySimpleData here.

I’m sure you’re familiar with how the compiler will bark at you if you make a typo and end up calling a method or setting a property that doesn’t exist. That’s static typing at work, making sure that everything your code talks about actually exists. The dynamic type allows you to sidestep this: what would normally be resolved at compile-time is instead resolved dynamically at run-time. This means that you can now call methods or set properties that do not exist, as long as you’re using a dynamic type. Let's start with an example of how to use VerySimpleData to highlight this.

var database = Database.WithNamedConnectionString("connectionString");
var results = database.MyStoredProcedure(Param: "something", AnotherParam: "something-else");

In line one, the connection string is supplied and a dynamic object is returned which will act as the database context. In line two, I am calling a stored procedure called "MyStoredProcedure" and passing it two parameters called "Param" and "AnotherParam"; if I wanted to execute a stored procedure with a different name and/or parameters, I would only need to change the method's name and the names of the parameters that are being supplied:

var results = database.AnotherStoredProcedure(DifferentParam: "something-different");

Looking Under the Hood:

The entry point to VerySimpleData is through one of the static methods on the Database class:

public static class Database
{
    public static dynamic WithNamedConnectionString(string name)
    {
        var connectionStringSettings = ConfigurationManager.ConnectionStrings[name];

        if (connectionStringSettings == null)
        {
            throw new ArgumentException("No connection string found for: " + name);
        }

        return WithConnectionString(connectionStringSettings.ConnectionString);
    }

    public static dynamic WithConnectionString(string connectionString)
    {
        IDatabase database = new SqlServerDatabase(connectionString);
        var dynamicContext = new DatabaseContext(database);

        return dynamicContext;
    }
}

public class DatabaseContext : DynamicObject
{
    private readonly IDatabase Database;

    public DatabaseContext(IDatabase database)
    {
        Database = database;
    }

    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        if (binder.CallInfo.ArgumentCount != args.Length)
        {
            result = null;

            return false;
        }

        var procedureName = binder.Name;
        var parameters = binder.CallInfo.ArgumentNames.Zip(args, (s, i) => new { s, i })
            .ToDictionary(item => item.s, item => item.i);

        result = Database.ExecuteStoredProcedure(procedureName, parameters);

        return true;
    }
}

An instance of DatabaseContext is returned from the Database static class as a dynamic type when either WithNamedConnectionString() or WithConnectionString() is called. The key to this class is that it inherits from DynamicObject, which provides the ability to intercept any of the members that are invoked or set on the dynamic type by overriding the base implementation:


TryInvokeMember is invoked whenever a method is called on the dynamic object. By providing your own override of this, you can control what will happen. The example in DatabaseContext.cs is very simple. First there is a check that the binder's argument count matches the number of argument values that were supplied. From a quick inspection, the InvokeMemberBinder appears to contain metadata about the invoked member (e.g. the names of the arguments, the name of the member being invoked) whereas the values themselves are contained in the args array. It is important to note here that the “out” result parameter holds the object that will be returned when this member is invoked; the Boolean returned by TryInvokeMember indicates whether the operation was successful. Returning false causes a Microsoft.CSharp.RuntimeBinder.RuntimeBinderException to be thrown. This pattern is the same for all the Try[…] overrides exposed by DynamicObject.

My implementation assumes that the name of the invoked member (binder.Name) will match the stored procedure’s name and that the names of the arguments will match the procedure’s parameters. Calling the ExecuteStoredProcedure method then fires off the ADO.NET code that talks to the database, and it should be no surprise that this returns a dynamic object, as the result could be a single value, a single record, a set of records or nothing at all.

If there’s only one row and multiple columns, this will be returned as a VerySimpleDataRecord:

public class VerySimpleDataRecord : DynamicObject
{
    private readonly Dictionary<string, object> _data;

    public VerySimpleDataRecord()
        :this(new Dictionary<string, object>())
    { }

    public VerySimpleDataRecord(Dictionary<string, object> data)
    {
        _data = data;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        if (_data.ContainsKey(binder.Name))
        {
            result = _data[binder.Name];

            return true;
        }
        
        return base.TryGetMember(binder, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _data[binder.Name] = value;

        return true;
    }

    public int ColumnCount
    {
        get { return _data.Count; }
    }
}

This is constructed using a dictionary collection where the key-value pairs consist of the column name and the data held in that column. The TryGetMember override is invoked when performing a property get operation on the dynamic object; if there is a key in the dictionary collection that matches the property’s name, its value is returned.
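A quick (hypothetical) usage example shows both overrides in action:

```csharp
dynamic record = new VerySimpleDataRecord();

record.FirstName = "Alice";            // routed through TrySetMember
Console.WriteLine(record.FirstName);   // routed through TryGetMember: "Alice"
Console.WriteLine(record.ColumnCount); // a real property, so the runtime
                                       // binds to it directly: 1
```

Note that the runtime only falls back to the Try[…] overrides when no real member matches, which is why ColumnCount resolves normally.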

The final scenario is where there are multiple rows with one or more columns. In this case, a collection of data will be returned that is packaged in a way that makes it enumerable:

public class VerySimpleDataRecordSet : DynamicObject, IEnumerable
{
    private readonly IList<VerySimpleDataRecord> _data;

    public VerySimpleDataRecordSet()
        : this(new List<VerySimpleDataRecord>())
    { }

    public VerySimpleDataRecordSet(IList<VerySimpleDataRecord> data)
    {
        _data = data;
    }

    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        if (indexes.Length == 1 && indexes[0].GetType() == typeof(int))
        {
            result = _data[(int)indexes[0]];

            return true;
        }

        result = null;

        return false;
    }

    public int Count
    {
        get { return _data.Count; }
    }

    public IEnumerator GetEnumerator()
    {
        return _data.GetEnumerator();
    }
}

A VerySimpleDataRecordSet is initialized by supplying a collection of VerySimpleDataRecords. Like the dictionary collection in VerySimpleDataRecord, this is stored internally. VerySimpleDataRecordSet also inherits from DynamicObject, which provides the TryGetIndex override. When this is invoked, the supplied index value is first validated and, if it passes, the appropriate VerySimpleDataRecord is retrieved from the internal collection and returned. The ability to enumerate the VerySimpleDataRecordSet is achieved by implementing IEnumerable: the GetEnumerator method exposes the internal collection’s IEnumerator.

I have only touched a few of the method overrides that are available through DynamicObject mainly to do with getting and returning data. There are other operations to do with setting values and performing conversions that would be worth exploring.

Friday, September 12, 2014

SQL - Beyond the standard ORDER BY ASC / DESC

Recently at work a request came in from a user to update an existing report so it applied a specific order to the reported data. Previously the report ordered items alphabetically by Class and then Model; however, this was no longer suitable and a non-alphabetical ordering was required.
Initially when I read this requirement I thought “Oh heck, how am I going to do this as it’s no longer in alphabetical order?”. As is often the case, the solution was simpler than expected.
I have created a simplified representation of the data in use to illustrate the technique I used to accomplish the custom sort order below.




User Requirement:
Order products in the following Class order
R-Class
ZA-Class
T-Class
G-Class
J-Class

As discussed earlier, this is not in alphabetical order so the following SQL statement will not produce the required result:

SELECT Model, Class
FROM Products
ORDER BY Class, Model


As we need to apply a specific order, we can accomplish this by introducing an additional numeric field into the result set which is solely used for ordering the results. This additional field can be produced by using a CASE WHEN statement to determine the correct numeric value to ensure the correct ordering is applied.

SELECT Model
    , Class
    , CASE
        WHEN Class = 'R-Class' THEN 1
        WHEN Class = 'ZA-Class' THEN 2
        WHEN Class = 'T-Class' THEN 3
        WHEN Class = 'G-Class' THEN 4
        WHEN Class = 'J-Class' THEN 5
    END AS ProductClassOrdering
FROM Products
ORDER BY ProductClassOrdering, Model

The ProductClassOrdering field will contain a numeric value which can be used as part of the ORDER BY clause to ensure the user's custom ordering is applied.
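If you don't want the extra column to appear in the result set at all, the CASE expression can also be placed directly in the ORDER BY clause. A sketch of the equivalent query, this time using the simple CASE syntax:

```sql
SELECT Model, Class
FROM Products
ORDER BY CASE Class
        WHEN 'R-Class' THEN 1
        WHEN 'ZA-Class' THEN 2
        WHEN 'T-Class' THEN 3
        WHEN 'G-Class' THEN 4
        WHEN 'J-Class' THEN 5
    END, Model
```

The ordering expression is evaluated per row just as before; it simply never becomes part of the output.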




Boom! This produces the desired result, and it also introduces an interesting technique: creating a field dynamically as part of your result set. It can be very easy to believe a SELECT statement can only select static data from a table (I will admit I was guilty of this belief when I first learnt SQL) but this is not true. If you start viewing each selected value as a piece of data which can be generated by other means within the SQL language's constraints, you will find a whole new range of querying options available to you (sub-selects, logic to determine a value, additional filtering, to name a few).

However, a word of warning is needed. With these additional SQL powers comes great responsibility, which can be summarized in a single word: performance. Keep one eye on the performance of your query when introducing additional SQL statements into the SELECT portion, as you can inadvertently introduce performance bottlenecks.

Have a play and start flexing those SQL muscles a little more.