Sunday, December 5, 2010

SQL Backup without interrupting the backup set

If you're like me, you're probably not too happy to make changes to a production database without a proper backup. 


Most SQL Server setups I've seen either use replication/mirroring, or they take a full backup once in a while and differential backups relatively often. Backing up the database normally interrupts the chain from full to differential, leaving a hole in the middle where your differential backups are unusable.
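
To see why the chain matters, here is a sketch of a typical restore sequence (file names are placeholders): the last full backup is applied first, then the latest differential on top of it. A stray ordinary full backup taken in between would become the new differential base and break this sequence.

```sql
-- restore the most recent full backup, leaving the database ready for more
RESTORE DATABASE dbname FROM DISK = 'c:\folder\full.bak' WITH NORECOVERY;
-- apply the latest differential, which is based on that full backup
RESTORE DATABASE dbname FROM DISK = 'c:\folder\diff.bak' WITH RECOVERY;
```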


So what do we do?


Well, that's why "Copy Only" was made.


[Screenshot: the "Copy-only Backup" checkbox in the SSMS Back Up Database dialog]




And for those who like code:


BACKUP DATABASE dbname
TO DISK = 'c:\folder\dbname.bak'
WITH COPY_ONLY
GO



Thursday, November 11, 2010

OpenVPN

OpenVPN is a great alternative to PPTP and L2TP: you get strong encryption out of the box, and let's face it, no one is going to try to decrypt your stream. The most they'll try is a MITM (man-in-the-middle) attack, which is easily avoided if you enable some kind of certificate check.


Some interesting features/uses are:


1. You can authenticate against Active Directory.
2. You can script authentication against anything you want; check out auth-user-pass-verify.
3. You can tunnel through any port over TCP or UDP. TCP is not recommended, as it slows connections down significantly, but it's a great workaround for ISPs that throttle UDP.
4. Connections can be started automatically. If you need a poor man's server-to-server VPN, combine automatic startup (take a look at sc.exe, or the GUI's startup option C:\Program Files\OpenVPN\bin\openvpn-gui.exe --connect sitename.ovpn) with infinite-retry settings.
5. You can bridge the TUN/TAP connection with the physical LAN if you need bridging rather than routing.
6. Connection compression.
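
As a sketch, a minimal client-side config along these lines ties several of the points above together; the hostname and certificate file names are placeholders:

```
client
dev tun
proto udp
remote vpn.example.com 1194
# verify the server certificate is marked for server use (helps block MITM)
remote-cert-tls server
ca ca.crt
cert client.crt
key client.key
# keep retrying name resolution and connection forever
resolv-retry infinite
# LZO compression; must match the server-side setting
comp-lzo
```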

Monday, August 2, 2010

Multiple Keyword Search

Some time ago I worked on a huge data-mining project. One of the tasks was extracting keyword lists from multiple documents. My initial implementation used multiple IndexOf calls, but my boss demanded near real-time performance, and with the amount of data passing through the program, even a multi-core machine wasn't enough.


I started digging and found a little algorithm called Aho-Corasick, a nice algorithm that finds multiple keywords in a document in a single pass.


Immediately after, I found out that Tomas Petricek had already implemented it in C#, and its performance was better than the requirements, leaving room for expansion.


So thanks Tomas!



Taken from CodeProject







Thursday, June 24, 2010

SQL Identifying Foreign Keys

In the past I've had a few legacy systems to maintain and improve: some with good documentation, some with non-existent documentation, and some even my own, left alone for too long before I was able to document them.


This great article, written by Greg Robidoux, has been an invaluable help in determining relationships in SQL Server. Thank you, Greg!


I've removed a few fields and was left with this:


SELECT --PKTABLE_QUALIFIER = CONVERT(SYSNAME,DB_NAME()), 
       --PKTABLE_OWNER = CONVERT(SYSNAME,SCHEMA_NAME(O1.SCHEMA_ID)), 
       PKTABLE_NAME = CONVERT(SYSNAME,O1.NAME), 
       PKCOLUMN_NAME = CONVERT(SYSNAME,C1.NAME), 
       --FKTABLE_QUALIFIER = CONVERT(SYSNAME,DB_NAME()), 
       --FKTABLE_OWNER = CONVERT(SYSNAME,SCHEMA_NAME(O2.SCHEMA_ID)), 
       FKTABLE_NAME = CONVERT(SYSNAME,O2.NAME), 
       FKCOLUMN_NAME = CONVERT(SYSNAME,C2.NAME), 
       -- Force the column to be non-nullable (see SQL BU 325751) 
       --KEY_SEQ             = isnull(convert(smallint,k.constraint_column_id), sysconv(smallint,0)), 
       UPDATE_RULE = CONVERT(SMALLINT,CASE OBJECTPROPERTY(F.OBJECT_ID,'CnstIsUpdateCascade')  
                                        WHEN 1 THEN 0 
                                        ELSE 1 
                                      END), 
       DELETE_RULE = CONVERT(SMALLINT,CASE OBJECTPROPERTY(F.OBJECT_ID,'CnstIsDeleteCascade')  
                                        WHEN 1 THEN 0 
                                        ELSE 1 
                                      END), 
       FK_NAME = CONVERT(SYSNAME,OBJECT_NAME(F.OBJECT_ID)), 
       PK_NAME = CONVERT(SYSNAME,I.NAME), 
       DEFERRABILITY = CONVERT(SMALLINT,7)   -- SQL_NOT_DEFERRABLE 
FROM   SYS.ALL_OBJECTS O1, 
       SYS.ALL_OBJECTS O2, 
       SYS.ALL_COLUMNS C1, 
       SYS.ALL_COLUMNS C2, 
       SYS.FOREIGN_KEYS F 
       INNER JOIN SYS.FOREIGN_KEY_COLUMNS K 
         ON (K.CONSTRAINT_OBJECT_ID = F.OBJECT_ID) 
       INNER JOIN SYS.INDEXES I 
         ON (F.REFERENCED_OBJECT_ID = I.OBJECT_ID 
             AND F.KEY_INDEX_ID = I.INDEX_ID) 
WHERE  O1.OBJECT_ID = F.REFERENCED_OBJECT_ID 
       AND O2.OBJECT_ID = F.PARENT_OBJECT_ID 
       AND C1.OBJECT_ID = F.REFERENCED_OBJECT_ID 
       AND C2.OBJECT_ID = F.PARENT_OBJECT_ID 
       AND C1.COLUMN_ID = K.REFERENCED_COLUMN_ID
       AND C2.COLUMN_ID = K.PARENT_COLUMN_ID
order by PKTABLE_NAME, FKTABLE_NAME

Wednesday, June 9, 2010

SQL Query Hint With (NOLOCK)

SQL Server is transactional and atomic by default, which means that each change either completes fully or changes nothing.


This forces SQL Server to take locks to do its work. But what if we don't need the most up-to-date data? What if we don't care about records that are being modified right now, and we're willing to accept some errors?


An example?
Statistics. The data might be changing right now; we might not even care whether a given user logged in once today or not at all. Perhaps the UI refreshes every 5 minutes, and if a user doesn't show up on the screen now, they will show up later.


But what am I saying?


If every write to the statistics table takes a lock, but you need to read the current statistics without being delayed by every update, what should you do? Well, you can ask SQL Server to ignore the locks and give you whatever it has right now, consistent or not; you don't care, as long as it's close enough and fast.


For that, we have the NOLOCK query hint.


select ... from [statistics] with (nolock)
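
If you'd rather not sprinkle hints on every table, the READ UNCOMMITTED isolation level has the same effect for the whole session. A sketch, with hypothetical table and column names:

```sql
-- dirty reads for every statement in this session, like NOLOCK on each table
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT UserId, LastLoginTime
FROM [statistics];
```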



SQL Query Hint WITH (NOEXPAND)

If you've reached the point where you optimize your queries with indexed views, you probably know what you're doing, or at least you're willing to try to understand execution plans.


Indexed views are great (as long as you don't abuse them): they can hold an aggregate that is always ready to be used, you can index the columns of the view, and you get a huge overall performance boost.


create view dbo.vwView with schemabinding
as
  select ...
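
Note that the view only becomes an indexed view once you create a unique clustered index on it; a sketch, with a hypothetical column name:

```sql
-- the first index on a schemabound view must be unique and clustered;
-- after that, additional nonclustered indexes are allowed
CREATE UNIQUE CLUSTERED INDEX IX_vwView
ON dbo.vwView (KeyColumn);
```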



But if you're not on SQL Server Enterprise edition, the query optimizer needs to be told explicitly to use them. Otherwise it will treat an indexed view like any other view: expand it to the underlying query and use that instead, which defeats the purpose of having an indexed view.


So, what do we do? Add with (noexpand) to queries that might benefit from the indexed view's indexes. Note that you shouldn't use it blindly: look at the execution plans for both options and base your decision on that.


select ... from dbo.vwView with (noexpand)



Sunday, May 30, 2010

Copying Production Databases to Development Environment

Sometimes, when you need to debug a production bug that can't be replicated on a development machine, it's very useful to get the production database(s) onto the development server.

Our infrastructure was rather complex, since we used more than one database: one for each section of the software, with a common database for the shared data.

First, I needed a way of deleting a database with minimal resistance; taking the database offline before deleting it ensured, 99% of the time, that the deletion would go through.


ALTER DATABASE [DBName_Development]
SET OFFLINE
WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [DBName_Development]
SET ONLINE
GO
DROP DATABASE [DBName_Development]
GO


Then I needed to back up the production database without affecting the backup set, hence the COPY_ONLY:


BACKUP DATABASE DBName_Production
TO DISK = 'C:\TempBackup\DBName.bak'
WITH INIT,COPY_ONLY
GO


Then we need to restore it on the development server. If the paths are a bit different, the restore T-SQL needs some modifications, so this stored procedure was born:


-- =============================================
-- Author:        Dror Gluska
-- Create date: 2010-05-30
-- Description:    Restores databases with multiple files to specific destination
-- =============================================
CREATE PROCEDURE tuspTempRestoreDatabase
    @backupfilename nvarchar(255),
    @restorepath nvarchar(255),
    @dbname nvarchar(255)
AS
BEGIN

    Declare @LogicalName nvarchar(128)
    Declare @PhysicalName nvarchar(128)
    Declare @FType char(1)
    Declare @FileId int


    declare @dinfo table (LogicalName nvarchar(255),
        PhysicalName nvarchar(255),
        Type nvarchar(255),
        FileGroupName nvarchar(255),
        Size int,
        MaxSize float,
        FileId int,
        CreateLSN int,
        DropLSN int,
        UniqueId uniqueidentifier ,
        ReadOnlyLSN int,
        ReadWriteLSN int,
        BackupSizeInBytes int,
        SourceBlockSize int,
        FileGroupId int,
        LogGroupGUID uniqueidentifier ,
        DifferentialBaseLSN float,
        DifferentialBaseGUID uniqueidentifier ,
        IsReadOnly int,
        IsPresent int,
        TDEThumbprint int);

    declare @sql nvarchar(4000);
    set @sql = 'RESTORE FILELISTONLY from disk = ''' + @backupfilename + ''''
      
    insert into @dinfo 
    execute sp_executesql @sql

    set @sql = 'restore database ' + @dbname + ' from disk = ''' + @backupfilename + ''' with '

    Declare c1 cursor for
    select LogicalName,PhysicalName,Type,FileId
    from @dinfo
    order by Type

    open c1

    fetch next from c1
    into @LogicalName,@PhysicalName, @FType, @FileId

    while @@fetch_status = 0
    begin
        if(@FType = 'D')
        begin
            set @sql = @sql + char(9) + ' Move ''' + @LogicalName + ''' to ''' + @restorepath + '\' + @dbname + '_data_' + cast(@FileId as nvarchar(10)) + '.mdf''' + char(13)
        end
        else
        begin
            set @sql = @sql + char(9) + ' Move ''' + @LogicalName + ''' to ''' + @restorepath + '\' + @dbname + '_log_' + cast(@FileId as nvarchar(10)) + '.ldf''' + char(13)
        end

        fetch next from c1
        into @LogicalName,@PhysicalName, @FType, @FileId

        if (@@FETCH_STATUS = 0)
        begin
            set @sql = @sql + ',';
        end
    end
    close c1
    deallocate c1

    print @sql
    execute sp_executesql @sql
END
GO


It asks the backup file for its file list, then builds a specific RESTORE T-SQL statement that moves all of those files to the selected directory under the given database name. Here's an example:


exec tuspTempRestoreDatabase 'C:\TempBackup\DBName.bak', 'C:\Databases\DBName_Dev', 'DBName_Development'


As I stated before, this configuration had multiple databases, each holding its own section. The tricky part was that it used cross-database stored procedures, functions and views, so just restoring the databases with different names and locations wouldn't do: all the cross-referencing scripts would stop working.

The simplest solution I could find was to go over all the scripts and modify the database names.

The first component of this stage is getting all the stored scripts and their bodies:


-- =============================================
-- Author:      Dror Gluska
-- Create date: 2010-05-30
-- Description: Gets all Views/StoredProcedures and references outside the current database
-- =============================================
CREATE PROCEDURE tuspGetAllExecutables
AS
BEGIN
    SET NOCOUNT ON;

    declare @retval table (name nvarchar(max), text nvarchar(max), refs int);

    declare @tmpval nvarchar(max);

    declare @tmpname nvarchar(max);

    declare @ref table(
        ReferencingDBName nvarchar(255),
        ReferencingEntity nvarchar(255),
        ReferencedDBName nvarchar(255),
        ReferencedSchema nvarchar(255),
        ReferencedEntity nvarchar(255)
     );
    declare @refdata table (DBName nvarchar(255), Entity nvarchar(255), NoOfReferences int);
     
    insert into @ref
    select DB_NAME() AS ReferencingDBName, OBJECT_NAME(referencing_id) as ReferencingEntity, referenced_database_name, referenced_schema_name, referenced_entity_name
    FROM sys.sql_expression_dependencies




    insert into @refdata
    select     ReferencingDBNAme,
             ReferencingEntity,
             SUM(NoOfReferences) as NoOfReferences
    from
    (
        select *
        , (
            select COUNT(*)
            from @ref r2
            where r2.ReferencedEntity = [@ref].ReferencingEntity
            and r2.ReferencedDBName = [@ref].ReferencingDBName
          )
          as NoOfReferences

        from @ref
    ) as refs
    group by ReferencingDBNAme,   ReferencingEntity
    order by NoOfReferences



    declare xpcursor CURSOR for
    SELECT name
    FROM syscomments B, sysobjects A
    WHERE    A.[id]=B.[id]
        and xtype in ('TF','IF','P','V')
    group by name
    order by name

    open xpcursor
    fetch next from xpcursor into @tmpname

    while @@FETCH_STATUS = 0
    begin
        set @tmpval = '';

        select @tmpval = @tmpval + text
        FROM syscomments B, sysobjects A
        WHERE   A.[id]=B.[id]
            and A.name = @tmpname
        order by colid

        insert into @retval
        select @tmpname, @tmpval, (select top 1 NoOfReferences from @refdata where [@refdata].Entity = @tmpname)

        fetch next from xpcursor into @tmpname
    end

    close xpcursor
    deallocate xpcursor

    select * from @retval
    order by refs
 
END
GO


Then I needed a stored procedure that would give me all the indexes created on schemabound views (it works on tables too):


-- =============================================
-- Author:        thorv-918308
-- Create date: 2009-06-05
-- Description:    Script all indexes as CREATE INDEX statements
-- Copied from http://www.sqlservercentral.com/Forums/Topic401795-566-1.aspx#bm879833
-- =============================================
CREATE PROCEDURE tuspGetIndexOnTable
 @indexOnTblName nvarchar(255)
AS
BEGIN
    --declare @indexOnTblName nvarchar(255) = 'test'

    -- SET NOCOUNT ON added to prevent extra result sets from interfering with SELECT statements.
    SET NOCOUNT ON;

    --1. get all indexes from current db, place in temp table
    select
        tablename = object_name(i.id),
        tableid = i.id,
        indexid = i.indid,
        indexname = i.name,
        i.status,
        isunique = indexproperty (i.id,i.name,'isunique'),
        isclustered = indexproperty (i.id,i.name,'isclustered'),
        indexfillfactor = indexproperty (i.id,i.name,'indexfillfactor')
        into #tmp_indexes
    from sysindexes i
    where i.indid > 0 and i.indid < 255                                             --not certain about this
    and (i.status & 64) = 0                                                                 --existing indexes

    --add additional columns to store include and key column lists
    alter table #tmp_indexes add keycolumns varchar(4000), includes varchar(4000)

    --2. loop through tables, put include and index columns into variables
    declare @isql_key varchar(4000), @isql_incl varchar(4000), @tableid int, @indexid int

    declare index_cursor cursor for
    select tableid, indexid from #tmp_indexes

    open index_cursor
    fetch next from index_cursor into @tableid, @indexid

    while @@fetch_status <> -1
    begin

    select @isql_key = '', @isql_incl = ''

    select --i.name, sc.colid, sc.name, ic.index_id, ic.object_id, *
     --key column
     @isql_key = case ic.is_included_column
       when 0 then
         case ic.is_descending_key
           when 1 then @isql_key + coalesce(sc.name,'') + ' DESC, '
           else            @isql_key + coalesce(sc.name,'') + ' ASC, '
         end
       else @isql_key end,
                       
     --include column
     @isql_incl = case ic.is_included_column
       when 1 then
         case ic.is_descending_key
           when 1 then @isql_incl + coalesce(sc.name,'') + ', '
           else @isql_incl + coalesce(sc.name,'') + ', '
         end
       else @isql_incl end
    from sysindexes i
    INNER JOIN sys.index_columns AS ic ON (ic.column_id > 0 and (ic.key_ordinal > 0 or ic.partition_ordinal = 0 or ic.is_included_column != 0)) AND (ic.index_id=CAST(i.indid AS int) AND ic.object_id=i.id)
    INNER JOIN sys.columns AS sc ON sc.object_id = ic.object_id and sc.column_id = ic.column_id
       
    where i.indid > 0 and i.indid < 255
    and (i.status & 64) = 0
    and i.id = @tableid and i.indid = @indexid
    order by i.name, case ic.is_included_column when 1 then ic.index_column_id else ic.key_ordinal end

       
    if len(@isql_key) > 1   set @isql_key   = left(@isql_key,  len(@isql_key) -1)
    if len(@isql_incl) > 1  set @isql_incl  = left(@isql_incl, len(@isql_incl) -1)

    update #tmp_indexes
    set keycolumns = @isql_key,
     includes = @isql_incl
    where tableid = @tableid and indexid = @indexid

    fetch next from index_cursor into @tableid,@indexid
    end

    close index_cursor
    deallocate index_cursor

    --remove invalid indexes,ie ones without key columns
    delete from #tmp_indexes where keycolumns = ''

    --3. output the index creation scripts
    set nocount on

    --create index scripts (for backup)
    SELECT
        'CREATE '
        + CASE WHEN ISUNIQUE    = 1 THEN 'UNIQUE ' ELSE '' END
        + CASE WHEN ISCLUSTERED = 1 THEN 'CLUSTERED ' ELSE '' END
        + 'INDEX [' + INDEXNAME + ']'
        +' ON [' + TABLENAME + '] '
        + '(' + keycolumns + ')'
        + CASE
             WHEN INDEXFILLFACTOR = 0 AND ISCLUSTERED = 1 AND INCLUDES = '' THEN ''
             WHEN INDEXFILLFACTOR = 0 AND ISCLUSTERED = 0 AND INCLUDES = '' THEN ' WITH (ONLINE = ON)'
             WHEN INDEXFILLFACTOR <> 0 AND ISCLUSTERED = 0 AND INCLUDES = '' THEN ' WITH (ONLINE = ON, FILLFACTOR = ' + CONVERT(VARCHAR(10),INDEXFILLFACTOR) + ')'
             WHEN INDEXFILLFACTOR = 0 AND ISCLUSTERED = 0 AND INCLUDES <> '' THEN ' INCLUDE (' + INCLUDES + ') WITH (ONLINE = ON)'
             ELSE ' INCLUDE(' + INCLUDES + ') WITH (FILLFACTOR = ' + CONVERT(VARCHAR(10),INDEXFILLFACTOR) + ', ONLINE = ON)'
          END collate database_default as 'DDLSQL'
    FROM #tmp_indexes
    where left(tablename,3) not in ('sys', 'dt_')   --exclude system tables
    and tablename = @indexOnTblName
    order by tablename, indexid, indexname


    set nocount off

    drop table #tmp_indexes
END


And finally I needed something to connect those two components into a solution:


declare @codetext nvarchar(max);
declare @newcodetext nvarchar(max);
declare @idxcode nvarchar(max)
declare @tablename nvarchar(255);
declare @indextable table (DDLSQL nvarchar(max));

--get all executables
declare @executables table(name nvarchar(max), text nvarchar(max),refcount int);
insert into @executables
exec tuspGetAllExecutables

declare spcursor CURSOR for
select text,name
from @executables
order by refcount

open spcursor
fetch next from spcursor into @codetext, @tablename

--for each executable 
while @@FETCH_STATUS = 0
begin
    set @newcodetext = @codetext;

    --modify the database name
    set @newcodetext = REPLACE( @newcodetext,'DBName_Production.','DBName_Development.')
    --etc' etc' etc' 

    --check if we actually modified the contents of the script, otherwise, no need to alter it.
    if (@newcodetext != @codetext)
    begin
        --modify CREATE to ALTER
        set @newcodetext = REPLACE( @newcodetext,'CREATE FUNCTION','ALTER FUNCTION')
        set @newcodetext = REPLACE( @newcodetext,'CREATE PROCEDURE','ALTER PROCEDURE')
        set @newcodetext = REPLACE( @newcodetext,'CREATE VIEW','ALTER VIEW')

        --get all indexes for this script; if it's a schemabound view, this returns all the indexes on that view, otherwise nothing
        delete from @indextable;    --clear the results from the previous iteration
        insert into @indextable
        exec tuspGetIndexOnTable @tablename

        print @tablename
        print @newcodetext
        EXECUTE sp_executesql @newcodetext

        --recreate lost index for schemabinded views
        if (select COUNT(*) from @indextable) > 0
        begin

            declare vidxcursor cursor for select DDLSQL from @indextable
            open vidxcursor
            fetch next from vidxcursor into @idxcode

            while @@FETCH_STATUS = 0
            begin
                print @idxcode
                EXECUTE sp_executesql @idxcode
                fetch next from vidxcursor into @idxcode
            end
            close vidxcursor
            deallocate vidxcursor

        end
    end
    fetch next from spcursor into @codetext, @tablename
end

close spcursor
deallocate spcursor


In the end, I needed to delete user email addresses from the databases so that testing wouldn't cause the application to start emailing users, but you might have your own development practices.
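
A sketch of what that scrubbing might look like; the table and column names here are hypothetical:

```sql
-- replace real addresses with safe, per-user development addresses
UPDATE dbo.Users
SET Email = 'dev+' + CAST(UserId AS nvarchar(10)) + '@example.com';
```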

And finally:


print 'Done!'


Tuesday, May 4, 2010

SQL Grouping by Year, Week, Month, Day

You've been tasked with writing a query that can slice records by year, week, month or day. Now what?


Well, I have a good option for you. The following can be written as a stored function, a stored procedure, or just a plain SQL query. You pass y, m, d, w or a (Year, Month, Day, Week, All) and it groups accordingly.


You'd be surprised how fast it is.


declare @GroupByDate char(1) = 'a'; --y,m,d,w,a

select (case when @GroupByDate = 'y' then CONVERT(varchar(4), DATEPART(yyyy, Timestamp))
    when @GroupByDate = 'm' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(mm, Timestamp)),2)
    when @GroupByDate = 'd' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(mm, Timestamp)),2) + '/' + RIGHT('00'+CONVERT(varchar(4),DATEPART(dd,Timestamp)),2)
    when @GroupByDate = 'w' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(wk, Timestamp)),2)
    when @GroupByDate = 'a' then 'All'
    end) as DateGroup,
  sum(Pages) as TotalPages,
  COUNT(*) as NumberOfVisitors  
from Visitors
group by (case when @GroupByDate = 'y' then CONVERT(varchar(4), DATEPART(yyyy, Timestamp))
    when @GroupByDate = 'm' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(mm, Timestamp)),2)
    when @GroupByDate = 'd' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(mm, Timestamp)),2) + '/' + RIGHT('00'+CONVERT(varchar(4),DATEPART(dd,Timestamp)),2)
    when @GroupByDate = 'w' then CONVERT(varchar(4),DATEPART(yyyy, Timestamp )) + '/' + RIGHT('00'+ CONVERT(varchar(4),DATEPART(wk, Timestamp)),2)
    when @GroupByDate = 'a' then 'All'
    end)



If you need a sequence from start to end (to join against, for example, so you'll have a complete set), you can combine it with a date-sequence-generating function like this:



declare @fromdate datetime = '2008-01-01'
declare @todate datetime = '2011-01-01'
;
with Dates(MyDate)
AS
(
    Select @fromdate MyDate
    
    UNION ALL
    
    SELECT (MyDate+1) MyDate
    FROM Dates
    WHERE
        MyDate < @todate
)


SELECT MyDate FROM Dates 
OPTION(MAXRECURSION 0)
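
For example, to get one row per day even for days with no visits, you could LEFT JOIN the generated dates to the Visitors table defined further down. A sketch for the daily case:

```sql
declare @fromdate datetime = '2008-01-01';
declare @todate datetime = '2011-01-01';

with Dates(MyDate) as
(
    select @fromdate
    union all
    select MyDate + 1 from Dates where MyDate < @todate
)
select d.MyDate,
       isnull(sum(v.Pages), 0) as TotalPages,   -- 0 for days with no visits
       count(v.VisitId)        as NumberOfVisitors
from Dates d
left join Visitors v
  on v.Timestamp >= d.MyDate
 and v.Timestamp <  d.MyDate + 1
group by d.MyDate
order by d.MyDate
option (maxrecursion 0);
```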





This is how I created the data for this example:




CREATE TABLE [dbo].[Visitors](
    [VisitId] [int] IDENTITY(1,1) NOT NULL,
    [Timestamp] [datetime] NOT NULL,
    [Pages] [int] NOT NULL
)


The random date was taken from here and the numbers from here.


WITH Visits(RecordId) AS( 
    select (1) 
    UNION ALL
    SELECT RecordId+1 FROM Visits WHERE RecordId < 100 
)

insert into Visitors(Timestamp, Pages)
select 
 dateadd(month, -1 * abs(convert(varbinary, newid()) % (2 * 12)), getdate()) as timestamp
, ABS(CAST(NEWID() AS binary(6)) %10) + 1 as RandomNumber
from visits