Friday, May 25, 2012

Hierarchy Cache

In the past I tried to avoid using hierarchies in SQL, but I couldn't find another simple data structure for a system requirement: all elements must be children of other elements with no depth limit.

I was able to change the depth limit to 100.

So I started out with:

CREATE TABLE [dbo].[Hierarchy](
    [ElementId] [int] IDENTITY(1,1) NOT NULL,
    [ParentId] [int] NULL,
    [ElementId] ASC

at first everything worked fine, I could traverse the tree with CTE without any problem, it was even fast for the current load, then came another system requirement, the tree size will be about 1 million records.

with cte as
 select ElementId, ParentId, 0 as Level
 from Hierarchy
 where ParentId is null
 union all
 select Hierarchy.ElementId, Hierarchy.ParentId, Level + 1
 from Hierarchy
  join cte on cte.ElementId = Hierarchy.ParentId

select * from cte

trying to traverse a tree of 1M records with CTE with only 15 levels took more than a production sql should take, even if the system's requirement should be live or near live.

We've been looking for solutions in every corner of the internet, graph databases (we'll still need to use these IDs in the SQL so passing the data back will be a design nightmare or we can move the portions or whole application, I guess these are other options), trying to find all sorts of algorithms (which will be error prune and might be a heavy toll on the development team), CLR functions and of curse scrapping the whole hierarchy idea (no can do, system requirements).

Eventually we came to the conclusion that keeping a cache will not be too costly on the memory, cpu and disk provided that the right indexes will be created and they will provide the best result and the fun part is, its very easy and fast to join the cache table on any other table we'll need. 

But we had to compromise somewhere, we compromised on node parent changes, some of them might cause the entire tree to rebuild.

I built a method to quickly update the tree given an elementid and if elementid was not provided, it will rebuild the whole tree.

I can smell just one little problem, which will be solved when updating records, need to detect circular references.

CREATE TABLE [dbo].[HierarchyCache](
 [OwnerId] [int] NOT NULL,
 [ElementId] [int] NOT NULL

And the SQL code updating the HierarchyCache, note that I'm using merge instead of dropping the whole table or writing individual insert/delete.

declare @ElementId int = null

--if no ElementId was supplied, it means we should go over all Hierarchy and rebuild the table
if (@ElementId is null)

    with parentHierarchy (ElementId, ParentId , TopElementId)
        select  ElementId, ParentId, ElementId as TopElementId
        from Hierarchy with (nolock)

        union all

        select  [Hierarchy].ElementId,[Hierarchy].ParentId , pu.TopElementId
        from Hierarchy with (nolock)
        inner join parentHierarchy as pu
            on pu.ParentId = [Hierarchy].ElementId

    --instead of updating the entire table, we're doing a merge, more efficient and easier on the disk
    merge into HierarchyCache as target
    using parentHierarchy
        on (parentHierarchy.TopElementId = target.OwnerId and parentHierarchy.ElementId = target.ElementId)
    when not matched by target then
        insert (ElementId, OwnerId) values (parentHierarchy.ElementId,parentHierarchy.TopElementId)
    when not matched by source then

    --a ElementId was supplied, update only its tree

    with parentHierarchy (ElementId, ParentId  , TopElementId)
        select  ElementId, ParentId , ElementId as TopElementId
        from Hierarchy with (nolock)
        where ElementId = @ElementId

        union all

        select  [Hierarchy].ElementId,(case when (Hierarchy.ParentId = Hierarchy.ElementId) then null else Hierarchy.ParentId end) as ParentId , pu.TopElementId
        from Hierarchy  with (nolock)
        inner join parentHierarchy as pu
            on pu.ParentId = [Hierarchy].ElementId


    --instead of updating the entire table, we're doing a merge, more efficient and easier on the disk
    merge into HierarchyCache as target
    using parentHierarchy
        on (parentHierarchy.TopElementId = target.OwnerId and parentHierarchy.ElementId = target.ElementId)
    when not matched by target then
        insert (ElementId, OwnerId) values (parentHierarchy.ElementId,parentHierarchy.TopElementId);



Many years ago when I was doing some IT work for the company I worked for, I needed routers, I had many PCs laying around so getting a few NICs and installing them was not a problem, I've tried many open source routers but they couldn't handle the IGMP packets properly and after a few seconds the PPTP VPN connections would drop.

Until I met MikroTik, its not a freeware, but it costs very little for what it can do ($45), I've been using it since to help low budget IT departments have professional grade routing.

Level number 0 (Demo mode) 1 (Free) 3 (WISP CPE) 4 (WISP) 5 (WISP) 6 (Controller)
Price no key registration required volume only $45 $95 $250
Upgradable To - no upgrades ROS v6.x ROS v6.x ROS v7.x ROS v7.x
Initial Config Support - - - 15 days 30 days 30 days
Wireless AP 24h trial - - yes yes yes
Wireless Client and Bridge 24h trial - yes yes yes yes
RIP, OSPF, BGP protocols 24h trial - yes(*) yes yes yes
EoIP tunnels 24h trial 1 unlimited unlimited unlimited unlimited
PPPoE tunnels 24h trial 1 200 200 500 unlimited
PPTP tunnels 24h trial 1 200 200 500 unlimited
L2TP tunnels 24h trial 1 200 200 500 unlimited
OVPN tunnels 24h trial 1 200 200 unlimited unlimited
VLAN interfaces 24h trial 1 unlimited unlimited unlimited unlimited
HotSpot active users 24h trial 1 1 200 500 unlimited
RADIUS client 24h trial - yes yes yes yes
Queues 24h trial 1 unlimited unlimited unlimited unlimited
Web proxy 24h trial - yes yes yes yes
User manager active sessions 24h trial 1 10 20 50 Unlimited
Number of KVM guests none 1 Unlimited Unlimited Unlimited Unlimited

They also have appliances, take a look at their website.

I do not get percentages, but I did went through one of their courses in Florida at 2005.

Thursday, May 24, 2012

Dictionary vs ConcurrentDictionary

Who hasn't seen in almost every best practices guide "cache everything!", but what they forget to tell you is that by that time, the caching you're using is one of the biggest bottlenecks of your program, you try to use as many dictionaries as you can (at least for a particular caching need) and avoid locking and blocking in any way possible.

When Microsoft decided its time for .NET to start using multi-threading heavily, they came out with Parallel Extensions, later on they added ConcurrentDictionary, but I've always wondered what is the price we're paying for the dictionary to work in a thread-safe environment.

I can't stress this enough, Do not use a regular Dictionary with multi-threading!
In some cases when the dictionary is updated by two (or more) threads it will corrupt its internal structures and it will throw exceptions. sometimes items will disappear, sometimes the whole dictionary will stop working.

In the past I've implemented a dictionary with lock keyword, spinlock, read/write locks and even a copying dictionary so whenever you had to make a change, it will copy itself, make the modification and replace the reference, I didn't care about losing information, I just wanted the quickest dictionary.

So how does the ConcurrentDictionary works? well, using ILSpy, we can take a look, I can't post the source code here so you'll have to do your own digging, but apparently its using Monitor.Enter for the writing portion, which is just an alternative of the lock keyword, for the reading portion it seems that its pretty close to the regular Dictionary.

I remembered reading about lock-free hashmaps (PDF) in the past but didn't have the time looking up the algorithm, understanding it and implementing it, luckily someone already did but I can't find the original C# implementation, here's what I've found now

I did some benchmarks and the results are pretty interesting.
The categories (X) are the number of concurrent threads attempting to write and read.
Note: the graphs do not work in IE for now.

You may find the benchmark project at:

I've taken the readwrite locked dictionary from Lorenz Cuno Klopfenstein, I've implemented a minimal spinlock dictionary just for the test, don't use it.

The ThreadSafeDictionary is the star of this benchmarks, it performs exceptionally well, so if your program relies heavily on dictionaries and threading, use it.

Tuesday, May 22, 2012

Showing dynamic graphs in a blog

Lets face it, those static graphs in blogs are sometimes completely useless, sometimes one bar is overpowering the rest, you can't really edit the data easily, and don't even mention changing the graph type on the fly, including a few graph types in the same page? huh! you must be joking, why would I overcrowd the page just for ease of use?

Here's what I came up with, its doesn't have that flashy select box with multiple graph types, but it should be pretty easy to add.

Update: The demo page is not working in IE, the host I'm using is serving javascript files as text/plain instead of text/javascript.

First, create graph data in google spreadsheet, make sure you publish the spreadsheet, otherwise you won't have the url needed (File -> publish to the web -> Start Publishing)

Then create a page in the blog with this code (or similar), make sure you have the correct urls, for the scripts and your spreadsheet.

 <script src="Scripts/jquery-1.4.1.min.js" type="text/javascript"></script>
 <script src="Scripts/google-spreadsheet.js" type="text/javascript"></script>
 <script src="Scripts/google-spreadsheet-parser.js" type="text/javascript"></script>
 <script src=""></script>

     <div id="readContainer"></div>
     <div id="writeContainer"></div>

    <script type="text/javascript">

        //container: div container
        //type: 'line' for example
        //title: 'Dictionary Benchmarks'
        //categories: array of x-graph
        //ytitle: y graph title
        function DrawChart(container, type, title, categories, ytitle, series)
            var chart = new Highcharts.Chart({
                chart: {
                    renderTo: container,
                    type: type,
                    marginRight: 130,
                    marginBottom: 25
                title: {
                    text: title,
                    x: -20 //center
                //                subtitle: {
                //                    text: 'Source:',
                //                    x: -20
                //                },
                xAxis: {
                    categories: categories
                yAxis: {
                    title: {
                        text: ytitle
                    plotLines: [{
                        value: 0,
                        width: 1,
                        color: '#808080'
                tooltip: {
                    formatter: function ()
                        return '<b>' + + '</b><br/>' +
     this.x + ': ' + this.y + '';
                legend: {
                    layout: 'vertical',
                    align: 'right',
                    verticalAlign: 'top',
                    x: -10,
                    y: 100,
                    borderWidth: 0
                series: series
            return chart;

        function ChartFromDictionaryData(container,sparser, rows, title, ytitle)

            //need to do read / write / only

            //get all dictionary types
            var serieslist = sparser.DistinctByColumn(rows, "Dictionary");
            delete serieslist["Dictionary"];

            //get all threads
            var threads = sparser.DistinctByColumn(rows, "Threads");
            delete threads["Threads"];

            //highcharts categories 
            var categories = [];
            for (var thread in threads)
            categories.sort(function (a, b) { return a - b; });

            //foreach dictionary, get all rows, order by threads, get timing

            var highchartseries = [];
            for (var series in serieslist)
                var newseries = {};
       = series;
       = [];
                var rowdata = sparser.FilterByColumn(rows, "Dictionary", series);
                for (var thread in threads)
                    var rowvalue = parseInt(sparser.GetValue(sparser.FilterByColumn(rowdata, "Threads", thread), "Time"));
            DrawChart(container, "line", title, categories, ytitle, highchartseries);


//replace with your published spreadsheet url
        var url = "";
        var googleSpreadsheet = new GoogleSpreadsheet();
        googleSpreadsheet.load(function (result)
                var sparser = new GoogleSpreadsheetParser(result);

                var readrows = sparser.FilterByColumn(sparser.FilterByColumn(sparser.GetAllRows(), "Read", "TRUE"), "Write", "FALSE");
                var writerows = sparser.FilterByColumn(sparser.FilterByColumn(sparser.GetAllRows(), "Read", "FALSE"), "Write", "TRUE");

                ChartFromDictionaryData('readContainer',sparser, readrows, "Dictionary Benchmarks - Read", "Time (ms) per 1 million operations");
                ChartFromDictionaryData('writeContainer',sparser, writerows, "Dictionary Benchmarks - Write", "Time (ms) per 1 million operations");

                //$('#results').html(JSON.stringify(result).replace(/,/g, ",\n"));
            } catch (e)

and voila, you have a dynamic chart.

Take a loot at highcharts, I have yet to see a better charting library, the graph selection is great:

I took the idea from Mike McKay, here, but I've modified his code and added some of mine, there's a test project at and the files are in feel free to use it.

Here's an example I did today:

And one last bit of information, this link might help you to understand what google spreadsheets can do for you - as a datasource.

Saturday, May 19, 2012

.NET Composition

.NET 4.0 Provides an easy way for runtime or aftermarket plugins and components, its called Composition.

Before composition we had to load each assembly, check its types and create new instances, I've provided a demo in class PreCompositionInfrastructure.

Composition provides that infrastructure and more, it allows you to specify if you want only one object or multiple, you can specify imports by attribute or by method, specify exports by type or contract, lazy initialization of imports by type or contact.

You just need to add a reference to System.ComponentModel.Composition, create a catalog with directory or assembly and when one is not enough, you can aggregate catalogs.

Then you can either get specific plugins with GetExportedValue or GetExportedValues, or you can let composition do it automatically with ComposeParts, but you'll have to use property attributes to specify which properties will be populated with which plugins and if its many or just one.

On the plugin side, you need to add a reference to System.ComponentModel.Composition and specify which class is an Export of which interface/contract, of curse you have to have access to the same interface from both the imports and the exports, so I recommend using another class library for that interface(s).

And we have to have some benchmarks, but its not really an issue since most applications load plugins at startup and never touch it again and its a lot better having an easy to program infrastructure rather than a quick one if its not affecting the overall performance of the application.

For 1000 iterations:
PluginInfrastructre - 1846ms
(More or less the same time, deviation because of unrelated changes)
PreCompositionInfrastructure - 294ms