Saturday, March 16, 2013

MemoryMappedFile


Ever since .NET 4.0 came out I've been wanting to play with MemoryMappedFile.

MemoryMappedFile is a way to map a file to memory address, so you're able to access it just like you would a byte array (for example).

At first I've attempted to create a simple document store like library, but MemoryMappedFile has its own quirks and working around them seemed a like a task for a library.

So feel free to use this FileStore library in your own projects, mind you, this library wasn't tested beyond simple sanity tests, no unit tests or integration tests have been done and in its current form it is in no way thread safe.

There are two main classes in the library:
1. MemoryMapper which is a wrapper around MemoryMappedFile, designed to allow file growing and multiple views/streams create/cleanup.
It has CreateView/CreateStream for creating accessors and Flush which is used to free unused accessors and flush the accessors changes to disk.

2. FileStoreManager - a document store-like object which stores data on one file and the data table with guid key on another file, benchmarks are pretty good for its scope, main bottlenecks are the binary formatter and the fact that there's no index so it has to scan through the data table file, I've added a dictionary for faster access, also there's no handling for modifying or deleting objects.


So here are a few conclusions from that project:

1. CreateViewAccessor and CreateViewStream are VERY slow, use blocks as large as possible over creating views/streams for each block.
2. There's a limit to how many Views and Streams you can create, since its mapping to a contiguous memory area, it might pose a problem for the calling program to allocate memory for other things or throw an exception itself ("Not enough storage is available to process this command."). this is more likely to happen in x86 because of the short address space of 32bit applications.
3. Disposing the accessors flushes their changes to disk, if you're doing many of these, it could slow things down.
4. Resizing the file requires disposing the MemoryMappedFile and its Streams/Views, doing it is an expensive operation so you should try to minimize file resizes.

You might want to take a look at RaptorDB by Mehdi Gholam.

You can find the project here:

https://github.com/drorgl/ForBlog/tree/master/MemoryMappedFileDemo

Saturday, February 9, 2013

NuGet Package for RTree

I've just created a NuGet Package for the RTree porting I've done a few years ago, I've seen people are still interested in that project so why not make their lives easier?


I've also updated the project at (Just SVN):


I've added simple tests and changed the log4net dependency to NuGet.

I've also seen that Aled Morris updated his java documentation at:


Its not exactly the same anymore as the version I've ported is from 2009.





Friday, February 1, 2013

Passing Multiple Values to SQL


This article started as a collection of ways to pass multiple values to SQL, some of the articles I've written in the past became benchmarks because I just can't resist the temptation of checking how much can one thing do over the other and what's the best uses for each case.

So allow me to show you some of my findings.

In the past (SQL 2005 era) when we wanted to pass multiple values, we probably used splits, XML, dynamic queries and whatnot.

SQL 2008 introduced user defined table types which allowed us to pass whole tables as a SQL parameter.

Starting SQL 2008 we got merge which could be used on user defined table types and XML for bulk inserts/updates, which, in my opinion, is a great possibility.

So lets see what we're going to cover.

1. Split

2. SQLBulkCopy

3. XML Parameters

4. Dynamic SQL

5. Pre-rendered SQL (which is just a fancy way of saying I'm going to convert SqlCommand to string so we can see if that's the way ADO.NET passes the values).

6. User Defined Table Types


So Lets Start:

1. Split

Lets say we have a column with multiple values in the same varchar field, but we need to join on these values to get a report our boss needs.

What do we do?

Well, here's a split function that Mladen Prajdić wrote:


CREATE FUNCTION dbo.Split(@data NVARCHAR(MAX), @delimiter NVARCHAR(5))
RETURNS @t TABLE (data NVARCHAR(max))
AS
BEGIN
    
    DECLARE @textXML XML;
    SELECT    @textXML = CAST('<d>' + REPLACE(@data, @delimiter, '</d><d>') + '</d>' AS XML);

    INSERT INTO @t(data)
    SELECT  T.split.value('.', 'nvarchar(max)') AS data
    FROM    @textXML.nodes('/d') T(split)
    
    RETURN
END



Performance is not so bad but also no great. so great, we have that report, but what if they decide to make that report available to clients?


Well, I've got a solution for that but I'm not sure you're gonna like it, it has something to do with triggers.

How then? well, create a cache table and triggers to update it, then you can index that cache table and get the job done.

2. SQLBulkCopy

One of the oldest ways of bulk inserts is with BULK INSERT , but its more of a DBA role than a developer, but ADO.NET does provide a way for bulk inserts, its called SQLBulkCopy.

We can set the batch size, timeouts, column mappings and some more settings.


using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default,null))
{
    bulkCopy.BatchSize = this.BulkOperationsCount;
    bulkCopy.DestinationTableName = "Entities";
    bulkCopy.WriteToServer(rows.ToArray());
}


3. XML Parameters

Starting SQL 2005, there is XML data type, which meant we could pass XML to stored procedures, store it in tables and query by it.

This made SQL server very robust as there is a possibility to pass multiple values without resorting to workarounds.


with upusers (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
as
(    
    SELECT 
        T.val.value('(@EntityId)','int') as EntityId,
        T.val.value('(@Name)','nvarchar(max)') as Name,
        T.val.value('(@Phone)','nvarchar(max)') as Phone,
        T.val.value('(@Email)','nvarchar(max)') as Email,
        T.val.value('(@Address)','nvarchar(max)') as Address,
        T.val.value('(@City)','nvarchar(max)') as City,
        T.val.value('(@Zip)','nvarchar(max)') as Zip,
        T.val.value('(@State)','nvarchar(max)') as State,
        T.val.value('(@Country)','nvarchar(max)') as Country,
        T.val.value('(@BirthDate)','nvarchar(max)') as BirthDate

    FROM @xmlval.nodes('/root/item') T(val)
)

merge Entities as target
using (select * from upusers) as source (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
on (target.EntityId = source.EntityId)
when matched then
    update 
        set Name = source.Name,
         Phone = source.Phone,
         Email = source.Email,
         Address = source.Address,
         City = source.City,
         Zip = source.Zip,
         State = source.State,
         Country = source.Country,
         BirthDate = source.Birthdate
when Not Matched Then
    insert (Name, Phone, Email, Address, City, Zip, State, Country, Birthdate)
    values (source.Name, source.Phone, source.Email, source.Address, source.City, source.Zip, source.State, source.Country, source.Birthdate)



4. Dynamic SQL

Dynamic was always an ugly love child of programmers, it made everything simpler from a program flow point of view, it made things hacky and ugly when it wasn't made properly and a hell to fix whenever there's a problem.

Dynamic SQL done wrong is the main cause of SQL injection, but like many good things Microsoft brought to us in SQL 2005 which sp_executesql was one of them, its possible to write dynamic SQL and avoid most of the injection problems.

It should be noted that writing proper insert queries is also affecting performance, which you can see in NoOptimizationsDynamicExecution.

So an unoptimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);  
...


while optimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..),(..,..,..),(..,..,..)...


5. Pre-render SQL

RenderSQL.cs is a class from the IQueryable to SQL project, I've converted it to SqlCommand to see if I can better understand how the query is converted from parameters to sp_executesql.

For reference, the benchmarks is executing the User Defined Table Type query, passed through RenderSQL.

6. User Defined Table Type

User Defined Table Types provides us with reasonable performance and IMHO, huge readability and program flow benefits. Instead of going through multiple layers, converting the data from X to Y and doing the same in SQL, you just pass a List of SqlDataRecord, SqlDataReader or a DataTable and on SQL side, you just use the parameter as a normal table variable.

so assuming we have this:


string testcommand = @"
select * from Entities
where Entities.EntityId in ({0})
";

using (SqlConnection connection = new SqlConnection(this.ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(string.Format(testcommand, string.Join(",", this.SelectRecords.Select(i => i.GetValue(0)).ToArray())),connection)) 
    {
        command.CommandType = CommandType.Text;

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


We just need to use this instead:

string testcommand = @"
select * from Entities
join @intt 
    on [@intt].Id = Entities.EntityId
";

using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["TestConnectionString"].ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(testcommand, connection))
    {
        command.CommandType = CommandType.Text;

        var param = command.CreateParameter();
        param.SqlDbType = SqlDbType.Structured;
        param.ParameterName = "@intt";
        param.TypeName = "udt_inttable";

        var dt = new DataTable();
        dt.Columns.Add("Id");
        dt.Rows.Add(1);
        dt.Rows.Add(2);
        dt.Rows.Add(3);
        ...

        param.Value = dt;

        command.Parameters.Add(param);

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


Simple, right?

After digging around (and looking at SQL Statistics, Compilations/Recompilations and this article), it seems that the query is not cached and according to this, its by design.

Project
https://github.com/drorgl/ForBlog/tree/master/MultiplParametersSQL
References:
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts

And Benchmarks, I've attempted to make exaggerated caching so it will affect the benchmarks as little as possible, in the real world, actually getting the data to SQL might be your bottleneck.
http://uhurumkate.blogspot.com/p/sql-bulk-insert-benchmarks.html

During the test I got this very nice error message:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a
very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.


Saturday, January 26, 2013

Using Addins in WPF

I was starting to program an idea I've had, you know what they say, Necessity is the mother of invention... 

Its about unifying all the notifications in my daily tasks, RSS, emails, website updates (that didn't bother implementing RSS), etc' into one place, under one application, with each one able to trigger its own workflow, execute applications, popup on screen, queue messages in the systray and more.

So, one of the first things to do when writing an application like that, is to make it pluggable, makes it easy to extend the functionality in a very easy manner.

When developing such an application, one needs a way to configure each plugin, eventually I've figured out a way to do it in WPF.

I'm using composition for simplicity, from performance point of view, you might want to use the alternative I offered here

ApplicationPlugins = _container.GetExportedValues<IWPFApplicationPlugin>();


The way I'm doing it is creating a container window which has cancel and ok buttons and a grid which will later contain the user control from the plugins.

<Window x:Class="WPFApplication.SettingsWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="SettingsWindow" mc:Ignorable="d" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        d:DesignHeight="386" d:DesignWidth="501" SizeToContent="WidthAndHeight" Closing="Window_Closing">
    <Grid Width="Auto" Height="Auto">
        <Button Content="Ok" Height="25" HorizontalAlignment="Right" Margin="0,0,12,12" Name="btnOk" VerticalAlignment="Bottom" Width="98" Click="btnOk_Click" />
        <Button Content="Cancel" Height="25" HorizontalAlignment="Left" Margin="12,0,0,12" Name="btnCancel" VerticalAlignment="Bottom" Width="98" Click="btnCancel_Click" />
        <Grid Margin="12,12,12,43" Name="UCContainer" Width="Auto" Height="Auto"  />
    </Grid>
</Window>


When a user wants to add another record from a specific plugin, I open up a new window, populate the UCContainer's children with the user control from the plugin and display it.


public bool? ShowDialog(IWPFApplicationPlugin plugin, string data)
{
    this.m_plugin = plugin;
    var uc = this.m_plugin.GetUserControl();
    this.m_plugin.Reset();
    if (data != null)
        this.m_plugin.SetData(data);
    this.UCContainer.Children.Add(uc);
    this.UCContainer.MinHeight = uc.MinHeight;
    this.UCContainer.MinWidth = uc.MinWidth;
    return this.ShowDialog();
}


After looking it for myself and seeing people struggling with setting the window's MinWidth/MinHeight based on the content, here's how I got it:


private void Window_Loaded(object sender, RoutedEventArgs e)
{
    Size margin = new Size();
    FrameworkElement contentElement = this.Content as FrameworkElement;
    if (contentElement != null)
    {
        margin.Width = this.Width - contentElement.ActualWidth;
        margin.Height = this.Height - contentElement.ActualHeight;
    }

    Rect size = VisualTreeHelper.GetDescendantBounds(this);

    this.MinHeight = size.Height + margin.Height;
    this.MinWidth = size.Width + margin.Width ;
}

You can find the project at:
https://github.com/drorgl/ForBlog/tree/master/WPFPlugins