Saturday, February 9, 2013

NuGet Package for RTree

I've just created a NuGet Package for the RTree porting I've done a few years ago, I've seen people are still interested in that project so why not make their lives easier?


I've also updated the project at (Just SVN):


I've added simple tests and changed the log4net dependency to NuGet.

I've also seen that Aled Morris updated his java documentation at:


Its not exactly the same anymore as the version I've ported is from 2009.





Friday, February 1, 2013

Passing Multiple Values to SQL


This article started as a collection of ways to pass multiple values to SQL, some of the articles I've written in the past became benchmarks because I just can't resist the temptation of checking how much can one thing do over the other and what's the best uses for each case.

So allow me to show you some of my findings.

In the past (SQL 2005 era) when we wanted to pass multiple values, we probably used splits, XML, dynamic queries and whatnot.

SQL 2008 introduced user defined table types which allowed us to pass whole tables as a SQL parameter.

Starting SQL 2008 we got merge which could be used on user defined table types and XML for bulk inserts/updates, which, in my opinion, is a great possibility.

So lets see what we're going to cover.

1. Split

2. SQLBulkCopy

3. XML Parameters

4. Dynamic SQL

5. Pre-rendered SQL (which is just a fancy way of saying I'm going to convert SqlCommand to string so we can see if that's the way ADO.NET passes the values).

6. User Defined Table Types


So Lets Start:

1. Split

Lets say we have a column with multiple values in the same varchar field, but we need to join on these values to get a report our boss needs.

What do we do?

Well, here's a split function that Mladen Prajdić wrote:


CREATE FUNCTION dbo.Split(@data NVARCHAR(MAX), @delimiter NVARCHAR(5))
RETURNS @t TABLE (data NVARCHAR(max))
AS
BEGIN
    
    DECLARE @textXML XML;
    SELECT    @textXML = CAST('<d>' + REPLACE(@data, @delimiter, '</d><d>') + '</d>' AS XML);

    INSERT INTO @t(data)
    SELECT  T.split.value('.', 'nvarchar(max)') AS data
    FROM    @textXML.nodes('/d') T(split)
    
    RETURN
END



Performance is not so bad but also no great. so great, we have that report, but what if they decide to make that report available to clients?


Well, I've got a solution for that but I'm not sure you're gonna like it, it has something to do with triggers.

How then? well, create a cache table and triggers to update it, then you can index that cache table and get the job done.

2. SQLBulkCopy

One of the oldest ways of bulk inserts is with BULK INSERT , but its more of a DBA role than a developer, but ADO.NET does provide a way for bulk inserts, its called SQLBulkCopy.

We can set the batch size, timeouts, column mappings and some more settings.


using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default,null))
{
    bulkCopy.BatchSize = this.BulkOperationsCount;
    bulkCopy.DestinationTableName = "Entities";
    bulkCopy.WriteToServer(rows.ToArray());
}


3. XML Parameters

Starting SQL 2005, there is XML data type, which meant we could pass XML to stored procedures, store it in tables and query by it.

This made SQL server very robust as there is a possibility to pass multiple values without resorting to workarounds.


with upusers (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
as
(    
    SELECT 
        T.val.value('(@EntityId)','int') as EntityId,
        T.val.value('(@Name)','nvarchar(max)') as Name,
        T.val.value('(@Phone)','nvarchar(max)') as Phone,
        T.val.value('(@Email)','nvarchar(max)') as Email,
        T.val.value('(@Address)','nvarchar(max)') as Address,
        T.val.value('(@City)','nvarchar(max)') as City,
        T.val.value('(@Zip)','nvarchar(max)') as Zip,
        T.val.value('(@State)','nvarchar(max)') as State,
        T.val.value('(@Country)','nvarchar(max)') as Country,
        T.val.value('(@BirthDate)','nvarchar(max)') as BirthDate

    FROM @xmlval.nodes('/root/item') T(val)
)

merge Entities as target
using (select * from upusers) as source (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
on (target.EntityId = source.EntityId)
when matched then
    update 
        set Name = source.Name,
         Phone = source.Phone,
         Email = source.Email,
         Address = source.Address,
         City = source.City,
         Zip = source.Zip,
         State = source.State,
         Country = source.Country,
         BirthDate = source.Birthdate
when Not Matched Then
    insert (Name, Phone, Email, Address, City, Zip, State, Country, Birthdate)
    values (source.Name, source.Phone, source.Email, source.Address, source.City, source.Zip, source.State, source.Country, source.Birthdate)



4. Dynamic SQL

Dynamic was always an ugly love child of programmers, it made everything simpler from a program flow point of view, it made things hacky and ugly when it wasn't made properly and a hell to fix whenever there's a problem.

Dynamic SQL done wrong is the main cause of SQL injection, but like many good things Microsoft brought to us in SQL 2005 which sp_executesql was one of them, its possible to write dynamic SQL and avoid most of the injection problems.

It should be noted that writing proper insert queries is also affecting performance, which you can see in NoOptimizationsDynamicExecution.

So an unoptimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);  
...


while optimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..),(..,..,..),(..,..,..)...


5. Pre-render SQL

RenderSQL.cs is a class from the IQueryable to SQL project, I've converted it to SqlCommand to see if I can better understand how the query is converted from parameters to sp_executesql.

For reference, the benchmarks is executing the User Defined Table Type query, passed through RenderSQL.

6. User Defined Table Type

User Defined Table Types provides us with reasonable performance and IMHO, huge readability and program flow benefits. Instead of going through multiple layers, converting the data from X to Y and doing the same in SQL, you just pass a List of SqlDataRecord, SqlDataReader or a DataTable and on SQL side, you just use the parameter as a normal table variable.

so assuming we have this:


string testcommand = @"
select * from Entities
where Entities.EntityId in ({0})
";

using (SqlConnection connection = new SqlConnection(this.ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(string.Format(testcommand, string.Join(",", this.SelectRecords.Select(i => i.GetValue(0)).ToArray())),connection)) 
    {
        command.CommandType = CommandType.Text;

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


We just need to use this instead:

string testcommand = @"
select * from Entities
join @intt 
    on [@intt].Id = Entities.EntityId
";

using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["TestConnectionString"].ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(testcommand, connection))
    {
        command.CommandType = CommandType.Text;

        var param = command.CreateParameter();
        param.SqlDbType = SqlDbType.Structured;
        param.ParameterName = "@intt";
        param.TypeName = "udt_inttable";

        var dt = new DataTable();
        dt.Columns.Add("Id");
        dt.Rows.Add(1);
        dt.Rows.Add(2);
        dt.Rows.Add(3);
        ...

        param.Value = dt;

        command.Parameters.Add(param);

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


Simple, right?

After digging around (and looking at SQL Statistics, Compilations/Recompilations and this article), it seems that the query is not cached and according to this, its by design.

Project
https://github.com/drorgl/ForBlog/tree/master/MultiplParametersSQL
References:
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts

And Benchmarks, I've attempted to make exaggerated caching so it will affect the benchmarks as little as possible, in the real world, actually getting the data to SQL might be your bottleneck.
http://uhurumkate.blogspot.com/p/sql-bulk-insert-benchmarks.html

During the test I got this very nice error message:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a
very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.