SSIS: How to handle updates

Handling updates in SQL Server Integration Services isn’t straight forward. If you have developed an ETL (Extract Transform Load) process for a Data Warehouse, then you may have faced a situation where you needed to update existing records. This could be the case when you are implementing Type 1 SCDs. The situation isn’t particular for a Data Warehouse ETL; in routine SSIS tasks, there can be situations when you want to insert a record if it does not exist, or else update the existing record. The problem is that SQL Server Integration Services does not provide direct support for updating records based on some criteria. This post will have a brief look over various solutions:

Ole DB Command

The most basic solution could be to use an Ole DB Command transformation for every input row. For this, our dataflow may look something like.

The most annoying thing in this approach is that Ole DB Command does not support named parameters and we need to write our update queries using “?” like this:

Update Dim_myDimTable
 Set Property1 = ?,
 Property2 = ?
 ....
 Where BusinessKey = ?

This could get tedious to maintain if we have lot of columns in our table since we need to map the ? parameters in the exact order they appeared in the update query.

Also, this method uses a single update query for every matched record and so is not a good choice if the records to be updated are higher in number. Also, when using this approach, we need to make sure that we only update the rows that are changed (the sample dataflow above does not depict this).

Slowly Changing Dimension transformation

This is a nice built-in method that starts a wizard and finally generates a dataflow similar to the one we saw in the first approach if we choose options in the wizard that represent Type 1 SCD with changing attributes. So this option could be regarded as a more maintainable way of approach 1 since it generates all the related data flow components automatically. However, since this approach also updates the records one by one so the performance may degrade if we have a lot of rows to update. To use this option, we need to drag a Slowly Changing Dimension transformation to our data flow and follow the wizard accordingly.

Staging Update Table

This approach gives greater scalablity, maintainablity and flexibility. This option requires us to create a staging table in the database and fill that table with the rows that that already exist in the target table. After that, we can simply write an update query with a join condition to update all the records in one-go. Here’s the dataflow sketch for this option:

After the above dataflow, we will need to create an SQL Script control flow component to write our query and clears the staging update table. This can be done using an sql command like this:

--update dimension rows
 Update D
  Set
   Property1=U.Property1,
   Property2=U.Property2
   ...
  From Dim_myDimTable D
  Inner Join Temp_myDimUpdateTable U
   on D.BusinessKey = U.BusinessKey

  --empty update table
  Truncate table Temp_myDimUpdateTable

Of course, this solution is an overhead if the records to be updated are small in number since it requires an staging table and is a two-pass solution.

Merge Statement

SQL Server 2008 came up with a new Merge command that can automatically insert or update records in the destination table in a single statement. This option is more described in this MSDN BOL article: Using MERGE in Integration Services Packages

That’s all I learned when I wrote an ETL for a data warehouse. I hope that post helps others that are going to write SSIS packages as well. Please have your thoughts propagated using the comments section below.

Control.Invoke and BeginInvoke using lamba and anonymous delegates

Working constantly with LINQ and .NET 3.5 these days, I sometimes forget how to do things without lamda expressions. Today, I was working on a WinForms application and I needed to update the UI from a separate thread, so I wrote the following code:

    this.BeginInvoke( ()=>
        {
            //code to update UI
        });

This gave me the following error:

Cannot convert lambda expression to type 'System.Delegate' because it is not a delegate type

I, then tried the anonymous delegate syntax:

    this.BeginInvoke( delegate ()
        {
            //code to update UI
        });

Still incorrect, as it gave me this error:

Cannot convert anonymous method to type 'System.Delegate' because it is not a delegate type

A quick google revealed that Control.Invoke and Control.BeginInvoke take a System.Delegate as parameter that is not a delegate type and we need to cast our lambas or anonymous delegates to MethodInvoker or Action like this:

    this.BeginInvoke( (Action) (()=>
        {
            this.txtLongestWord.Text = this.longestWord;
        }));

The above code does not look good due to lot of brackets, so let’s use the anonymous delegate syntax:

    this.BeginInvoke( (Action) delegate ()
        {
            //code to update UI
        });

//or

    this.BeginInvoke( (MethodInvoker) delegate ()
        {
            //code to update UI
        });

T-SQL: Using cursor with Common Table Expressions

Cursors aren’t a bad choice for certain scenarios. Few days back, I was writing a stored procedure that was to be scheduled as a SQL agent job. I needed to send emails for each row of a result set and hence concluded that a Fast Forward cursor (Read Only, Forward only cursor) would be a nice choice. I created several CTEs (common table expressions) to reach my final result set and declared a cursor for my final select statement like this:


--declare some CTEs
;With CTE1 as
(
   --CTE1 definitiion
)
,CTE2 as
(
   --CTE2 definitiion
)

--declare a cursor for final select statement
Declare myCursor Cursor Fast_Forward For
   --select query
   Select ... From CTE2

But I was greeted with the following error:

Incorrect syntax near the keyword 'Declare'.

Ok, so my expected syntax is incorrect. After a quick google, I found that the declare cursor statement needs to be on the top of CTE declarations. So, here’s the correct syntax to define cursor with CTEs:


--declare a cursor above the CTE definitions
Declare myCursor Cursor Fast_Forward For

--declare CTEs
With CTE1 as
(
   --CTE1 definitiion
)
,CTE2 as
(
   --CTE2 definitiion
)

--select query as normal
Select ... From CTE2

--now open and use the cursor and don't forget to close and deallocate it in the end

Cool!! That was a nice learning.

Posted in T-SQL. Tags: , , . 1 Comment »

Winning TSQL Challenge 12

It’s a source of great pleasure for me that I am included in the winners for TSQL challenge 12.

Here’s my post describing my solution and here’s the analysis by Jacob Sebastian, the founder and president of TSQL Challenges.

Silverlight: Update service reference for a WCF service generating empty class

My Visual Studio sometimes goes angry. I had a Silverlight 2 – WCF – LINQToSQL application that I recently converted to Silverlight 3. I noticed that sometimes the Update Service Reference does not function properly and instead blanks out the generated reference.cs. It broke once again today and I decided to blog it. Here are the contents of Error window when such abnormal activity happens:

Custom tool error: Failed to generate code for the service reference '..'.  Please check other error and warning messages for details.

The warning tab may have many warnings, some of which are:

Custom tool warning: Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.

Custom tool warning: The type 'System.Collections.ObjectModel.ObservableCollection`1' could not be found.  Ensure that the assembly containing the type is referenced.  If the assembly is part of the current development project, ensure that the project has been built.

Custom tool warning: Cannot import wsdl:portType
Detail: An exception was thrown while running a WSDL import extension: System.ServiceModel.Description.DataContractSerializerMessageContractImporter
Error: Exception has been thrown by the target of an invocation.
XPath to Error Source: //wsdl:definitions[@targetNamespace='']/wsdl:portType[@name=..]

Custom tool warning: Cannot import wsdl:port
Detail: There was an error importing a wsdl:binding that the wsdl:port is dependent on.
XPath to wsdl:binding: //wsdl:definitions[@targetNamespace='..']/wsdl:binding[@name='..']

Custom tool warning: No endpoints compatible with Silverlight 3 were found. The generated client class will not be usable unless endpoint information is provided via the constructor.

Ok. Looks like something is wrong with the Observable Collection reference. This was further confirmed when I brought up the service configuration window (by right clicking the WCF service and selecting Configure Service Reference), and noticed that the ObservableCollection collection type has been replaced by { Custom } :

service-reference-incorrect

For comparison, here’s the normal screenshot for this window:

service-reference-correct

To get rid of this and regenerate service proxy properly, I had to remove the Reuse types in all referenced assemblies (or at least System assembly).

So, if such behavior ever happen to you, try to update the service reference after unchecking the reuse types checkbox. Once done, you can recheck the option and update your service reference again.

T-SQL: Using result of a dynamic SQL query in a variable or table

Although, not a recommended practice, but sometimes we have to write our queries using dynamic SQL. In such situations, it is generally needed to fetch the result (scalar or tabular) of dynamic SQL into the main (non-dynamic) query. This is not straight forward because dynamic SQL runs in its own scope and we cannot access the variables defined in main query. This post presents a few approaches to consume the result of a dynamic SQL query:

sp_ExecuteSql stored procedure

This is the most generic and powerful method of invoking dynamic SQL since it allows us to write a parameterized dynamic query with input/output parameters. Here’s a simple example of using sp_executesql to consume the result of a dynamic SQL query:


declare @today datetime
exec sp_executesql
    N'Select @internalVariable = GetDate()', --dynamic query
    N'@internalVariable DateTime output', --query parameters
    @internalVariable = @today output --parameter mapping
select @today

Table variables and Temporary tables

This method is used when we want to get a tabular result set from our dynamic query. Here’s an example to get the result of a dynamically created SQL query by using table variables:


declare @myTable table
(
    DatabaseName nvarchar(256),
    DatabaseID int,
    CreateDate datetime
)

insert into @myTable
    exec (N'select name, database_id, create_date from sys.databases') --dynamic query

select * from @myTable

Here’s the same example that uses a temporary table to fetch the result set of a dynamic SQL query:


create table #myTable
(
    DatabaseName nvarchar(256),
    DatabaseID int,
    CreateDate datetime
)

insert into #myTable
    exec (N'select name, database_id, create_date from sys.databases') --dynamic query

select * from #myTable
drop table #myTable

Since temporary tables have physical existence so we can refer to the temporary table inside our dynamic SQL query as well. Here’s an example illustrating this technique:


create table #myTable
(
    DatabaseName nvarchar(256),
    DatabaseID int,
    CreateDate datetime
)

--dynamic query
exec sp_executesql
    N'insert into #myTable
        select name, database_id, create_date from sys.databases'

select * from #myTable
drop table #myTable

Temporary tables or Table variables can also be used to fetch the result of a stored procedure. Notice that for saving this result, the columns of table variable/temporary table must match with the result of stored procedure. That is, we need to take “ALL” the columns. Here’s an example that grabs the result set from a stored procedure into a table variable.


declare @myTable table
(
    ServerName nvarchar(256),
    NetworkName nvarchar(256),
    Status nvarchar(4000),
    ID int,
    Collation nvarchar(256),
    ConnectTimeOut int,
    QueryTimeOut int
)

insert into @myTable
    exec sp_helpserver

select * from @myTable

Also, here’s an example to get result of a stored procedure using temporary table:


create table #myTable
(
    ServerName nvarchar(256),
    NetworkName nvarchar(256),
    Status nvarchar(4000),
    ID int,
    Collation nvarchar(256),
    ConnectTimeOut int,
    QueryTimeOut int
)

insert into #myTable
    exec sp_helpserver

select * from #myTable
drop table #myTable

Thats all from me. Let me know if you have any more solutions.

LINQ to SQL: Visual Studio designer failed to autogenerate .designer.cs data classes

I had a really strange observation today. I opened one of my LINQ to SQL dbml files, made some changes, and then saved it back to have my designer generated data classes updated. But instead of reflecting my changes in the .designer.cs class, Visual studio deleted that designer generated file. I tried several times but every time LINQ designer was deleting my autogenerated data classes. I googled and found an amazing answer by Marc Gravell at this stackoverflow question. The position of using statements was the source of problem!!! I had extended the LINQ generated partial classes and the first statement in that file was “using System”. I removed that and the VS designer was happy again.

So for anyone else that experiences the same problem, try moving your using statements after the namespace declaration. For example, if you have extended your data context or any other data class like this:

using System;
namespace MyNamespace
{
    partial class MyDataContext
    {
        ...
    }
    ...
}

Try rearranging the declarations like this: 

namespace MyNamespace
{
    using System;

    partial class MyDataContext
    {
        ...
    }
    ...
}

Hope this post helps someone else as well.

TSQL Challenge 12: Completing sequence by inserting missing rows

TSQL Challenge 12 was a relatively easier one. The participants were given month-wise score values and were asked to complete the sequence by creating entries for missing month.

Here’s the sample input:

YearMonth   Score
----------- -----------
200903      100
200803      95
200802      99
200801      100
200711      100

And here’s the desired output. Notice that the score of last month is replicated in each of the missing rows:

YearMonth   Score
----------- -----------
200908      100
200907      100
200906      100
200905      100
200904      100
200903      100
200902      95
200901      95
200812      95
200811      95
200810      95
200809      95
200808      95
200807      95
200806      95
200805      95
200804      95
200803      95
200802      99
200801      100
200712      100
200711      100

The script to generate sample data is provided below:

DECLARE @Scores TABLE
(
	YearMonth	INT,
	Score		INT
)

INSERT @Scores VALUES(200903, 100)
INSERT @Scores VALUES(200803, 95)
INSERT @Scores VALUES(200802, 99)
INSERT @Scores VALUES(200801 ,100)
INSERT @Scores VALUES(200711, 100)

Solution

I once blogged about creating a sequence of numbers/dates using a recursive CTE in this post. The same technique can be used here. However, since the YearMonth column in the sample data is integer, we have two choices:

  1. Convert it to DateTime and apply T-SQL DateTime functions
  2. Leave it as Integer and apply some intelligent arithmetic

I am providing both the solutions here. Note that the first solution will be slower due to overhead of casting and applying T-SQL scalar functions.

The first solution: Converting to DateTime

;with cte as
(
    select score, Cast(Cast(YearMonth as varchar)+'01' as datetime) as dateVal
        from @scores
    union all
    select score, dateadd(month, 1, dateval)
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select 1 from @scores s where s.YearMonth = cast( left(convert(varchar, dateadd(month, 1, cte.dateval), 112), 6) as int) )
            --stop at current month
            and dateadd(month, 1, cte.dateval) < getdate()
)

select left(convert(varchar, dateval, 112), 6) as yearmonth, score
from cte
order by dateval desc

Explanation:
Here I am simply converting the integer YearMonth column to a datetime dateval column by appending 01 to the end (so a 200901 becomes 20090101 that can easily be cast to a dateTime) and then finding subsequent dates by adding one month in each CTE iteration.

The second solution: Integer Arithmetic

;with cte as
(
    select YearMonth, Score
		from @Scores
    union all
    select YearMonth + YearMonth % 100 / 12 * 88 + 1 as YearMonth, Score
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select s.YearMonth from @Scores s where s.YearMonth = (cte.YearMonth + cte.YearMonth % 100 / 12 * 88 + 1) )
            --stop at current month
            and cte.YearMonth < month(getdate()) + year(getdate())*100
)

select *
from cte
order by YearMonth desc

Explanation:
The important point is to increment the value of YearMonth correctly. So 200811 should get incremented to 200812 but 200812 should get incremented to 200901. This isn’t difficult if we introduce a case statement like this:

YearMonth + (Case When YearMonth%100 < 12 Then 1 Else 89 End)

But I wanted to do this purely using arithmetic with no Case statements, so I came up with this formula:

(YearMonth % 100 / 12 * 88) + 1

Note that the factor (YearMonth % 100 / 12 * 88) will reduce to zero for all values from January to November, i.e. from 200801 to 200811.

I hope you enjoyed the solution.

Update:
It was pointed out in one of the comments by Rakesh that the solution could reach the default limit of recursion which is 100. In order to avoid this, we need to add option (maxrecursion 12,000) in the final select statement. Then, we can have 10,000 years missing between two adjacent entries. Thanks, Rakesh.

...
select *
from cte
order by YearMonth desc
option (maxrecursion 12,000)

Avoiding hard-coded strings while raising or handling PropertyChanged event

While developing WPF/Silverlight applications, and more specifically while following the Model-View-ViewModel (MVVM) pattern we will find ourselves implementing INotifyProprertyChanged most of the times. The default implementation of PropertyChanged event takes the property name as string in the PropertyChangedEventArgs which is not much robust. There are several ways to address the issue:

  • Use reflection to verify that the property actually exists, as demonstrated once by Josh Smith.
  • Use Injection using some Aspect Oriented Programming framework, like PostSharp
  • Use Expression Trees as described by Michael Sync and Davy Brion

Personally, I prefer using expression trees. So, instead of writing this:

public string MyProperty
{
    get { return this.myProperty; }
    set { this.myProperty = value; this.RaisePropertyChanged("MyProperty"); }
}

We can write:

public string MyProperty
{
    get { return this.myProperty; }
    set { this.myProperty = value; this.RaisePropertyChanged( MyObj => MyObj.MyProperty ); }
}

The same issue exists looking at the other side. When we subscribe to PropertyChanged event of an object, we get the property name again as a string. One way is to use the GetPropertyName( ExpressionTree ) extension method from the above implementation in our if and case statements. Also, Josh nicely addressed the issue in this post, thus allowing us to write:

MyClass myObject = new MyClass();
PropertyObserver<MyClass> observer = new PropertyObserver<MyClass>(myObject)
    .RegisterHandler(myObj => myObj.MyProperty1, myObj => { /* handle change in MyProperty1 */ })
    .RegisterHandler(myObj => myObj.MyProperty2, MyProperty2HandlerMethod  );

Notice that Josh used IWeakEventListener that isn’t available for Silverlight but luckily Pete O’ Hanlon provided us with a Silverlight version of Josh’s work here.

So, combining the great efforts of all these people, we are going to have a better MVVM experience.

SSMS: How to restore differential backups

There are two ways to restore a differential backup in SQL Server:

  • Directly use TSQL statements as described in this MSDN page
  • Use SQL Server Management Studio user interface as described here.

If you are using SQL Management Studio to restore differential backups, and you have restored full backups several time using SSMS, but this is your first time to restore a differential backup then you are likely to encounter the following error:

Restore failed for Server 'servername'. (Microsoft.SqlServer.SmoExtended)
ADDITIONAL INFORMATION:
System.Data.SqlClient.SqlError: The log or differential backup cannot be restored because no files are ready to rollforward. (Microsoft.SqlServer.Smo)

This is because you tried to restore a differential backup on an available/operational/functional/running database which is not allowed.

In order to restore a differential backup, you will first need to restore the last full backup with NO RECOVERY option. So, in SSMS you need to select the appropriate full-backup and choose Restore With NoRecovery option from the Options page as depicted in the following screenshot.

restore-full-backup-with-norecovery

Once restored, the database will be shown in the Object Explorer as Restoring.

database-in-restore

Notice that the database is non-available/non-functional at this time and is waiting for a differential backup to be applied. Now, restore the appropriate differential backup and choose Restore With Recovery from the Options page:

restore-differential-backup-with-recovery

That’s it. You have successfully restored a differential backup.

A Final Note:

Note that differential backups are cumulative and each differential backup contain changes since the last full backup, not the last differential backup. So if you have a full backup of 2009-01-01 and have differential backups for each day, and you want to restore your database to 2009-01-10, then you just need to restore the full backup (with no recovery) of 2009-01-01 followed by the differential backup of 2009-01-10. For more information, read the following MSDN articles from SQL Server books online: