TSQL Challenge 10: Horizontal and Vertical sorting of a result set

TSQL Challenge 10 was an interesting one. We needed to sort a result set horizontally as well as vertically. That is, given the following as an input:

input
We need to write a query that :

  1. Sort the values horizontally: Arrange the values from smallest to the largest. for example, the first row contains values “2”, “1” and “3”, it should be arranged as “1”, “2” and “3”.
  2. Sort the rows vertically: This is the regular sorting that we are familiar with.
  3. Remove duplicates: Duplicate rows should be removed from the final output.

And here’s the expected output:

output
I tackled the problem systematically in series of steps, each represented by a CTE:

  1. First, I assigned a rowID to each row of the input
  2. Then I merged the three columns into one using unpivot transformation
  3. Then I sorted that merged result grouped by rowID (horizontal sorting)
  4. Then I used pivot to split the values into three columns based on that sorting
  5. Finally I sorted the result (vertical sorting) and displayed distinct rows

Here are the CTE definitions along with the result at each step:

  • First, I assigned a rowID to each row of the input

;with tableWithRowID as
(
    --add an extra column for identification of a row
    select c1, c2, c3, ROW_NUMBER() over (order by @@identity) as rowID
    from @t
)

cte1

  • Then I merged the three columns into one using unpivot transformation

,combined as
(
    --combine column values into a single column.
    select rowID, value from tableWithRowID
        unpivot (value for cols in (c1,c2,c3) ) as unpvt
)

cte2

  • Then I sorted that merged result grouped by rowID (horizontal sorting). In this step, I added a new column colID such that the smallest value gets colID=1, the next one gets colID=2 and so on.

,sorted as
(
    --sort values in each row (grouped by rowID). This sorts the original data horizontally
    select rowID, value, ROW_NUMBER() over (partition by rowID order by value) as colID
    from combined
)

cte3

  • Then I used pivot to split the values into three columns based on that sorting

,splitted as
(
    --split into columns again (based on that sorting)
    select rowID, [1] as c1, [2] as c2, [3] as c3
    from sorted
		pivot ( min(value) for colID in ([1],[2],[3])) as pvt
)

cte4

  • Finally I sorted the result (vertical sorting) and displayed distinct rows

--sort vertically and select distinct
select distinct c1,c2,c3
from splitted
order by c1, c2, c3

output

That’s all, the nice feature about the solution is that its highly expandable; to configure it for greater no. of columns, we just need to modify the pivot/unpivot clauses. Here’s the complete source code:


--sample data
DECLARE @t TABLE (
  c1 CHAR(1),
  c2 CHAR(1),
  c3 CHAR(1)
)

insert into @t (c1, c2, c3) values ('2','1','3')
insert into @t (c1, c2, c3) values ('3','2','1')
insert into @t (c1, c2, c3) values ('Z','X','Y')
insert into @t (c1, c2, c3) values ('B','C','D')
insert into @t (c1, c2, c3) values ('Y','Z','X')
insert into @t (c1, c2, c3) values ('B','C','A')

--select * from @t

--solution
;with tableWithRowID as
(
    --add an extra column for identification of a row
    select c1, c2, c3, ROW_NUMBER() over (order by @@identity) as rowID
    from @t
)
,combined as
(
    --combine column values into a single column.
    select rowID, value from tableWithRowID
        unpivot (value for cols in (c1,c2,c3) ) as unpvt
)
,sorted as
(
    --sort values in each row (grouped by rowID). This sorts the original data horizontally
    select rowID, value, ROW_NUMBER() over (partition by rowID order by value) as colID
    from combined
)
,splitted as
(
    --split into columns again (based on that sorting)
    select rowID, [1] as c1, [2] as c2, [3] as c3
    from sorted
		pivot ( min(value) for colID in ([1],[2],[3])) as pvt
)

--sort vertically and select distinct
select distinct c1,c2,c3
from splitted
order by c1, c2, c3

TSQL Challenge 9: Getting longest chain of consecutive alike/duplicate rows

The BeyondRelational team is working hard to present us with cool TSQL challenges. For challenge 9, the contestants were required to find the first and last IDs for consecutive rows with same values of Send and Ack states. That is, given the following as an input,
Input
We were required to produce the following output.
Output
An important part of the challenge was to write a scalable query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:


;With Result
as
(
	Select
	(Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) from @tc9 C
			where
				C.id <= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MinID
	,
	(Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) from @tc9 C
			where
				C.id >= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MaxID
	,A.SendState, A.AckState
	From @tc9 A
)

Select distinct C.MinID, C.MaxID, C.SendState, C.AckState
from Result C
order by C.MinID

The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.
Luckily, T-SQL 2005 presents us with ranking funcitons that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.


;With RowNoPerStateCombination
As
(
    Select
        *,
        --assign a row number grouped per SendState/AckState combination
        ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID
    From @tc9
)

Select * from RowNoPerStateCombination
order by ID

The output is
RowIDs
Notice a great pattern: if we try to subtract the row id from the assigned row number, it is going to give us a unique result for every consecutive send/ack state pattern. This is depicted using the following screenshot from Excel.
Pattern
Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:


;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

And we get the required output.
Output
Finally, here’s my complete solution.


--populate sample data
DECLARE @tc9 TABLE(
    ID INT IDENTITY(1,1),
    CreationDate DATETIME,
    Content NVARCHAR(10),
    SendState BIT,
    AckState BIT
)

INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1

--SELECT * FROM @tc9

--solution
Declare @startTime datetime
Declare @endTime datetime

set @startTime = GetDate()-20.8
set @endTime = GetDate()

;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

T-SQL: Creating a hierarchical structure from relational data

Recently, while developing an SSIS package, I came across a scenario where I needed to create parent-child relationships between rows in a relational table. Let’s have a look at some sample data:

Source Data

Notice the values in the WBSCode column, 001 is the parent of all 001-XX rows, 001-02 is the parent of all 001-02-XX rows and so on. We need to create a relationship by populating the ParentID field so that the resulting structure can be traversed using a recursive CTE. Here’s a code snippet to create some sample data.


--create sample data
Declare @ProjectWBS TABLE
(
    ID int IDENTITY(1,1) NOT NULL Primary Key,
    ProjectCode varchar(20) NULL,
    WBSCode varchar(100) NULL,
    WBSDescription varchar(max) NULL,
    ParentID int NULL
)

INSERT @ProjectWBS (ProjectCode, WBSCode, WBSDescription, ParentID)
    Select 'Project1', '001', 'Phase 1', NULL
    union all Select 'Project1', '001-01', 'Project Mgmt', NULL
    union all Select 'Project1', '001-02', 'Requirements', NULL
    union all Select 'Project1', '001-02-01', 'Requirements Meeting', NULL
    union all Select 'Project1', '001-02-02', 'Requirements Write Up', NULL
    union all Select 'Project1', '001-03', 'Installation', NULL
    union all Select 'Project1', '001-03-01', 'Nav Install', NULL
    union all Select 'Project1', '001-03-02', 'WTE Install', NULL
    union all Select 'Project1', '001-03-03', 'Reporting Service', NULL
    union all Select 'Project1', '001-03-04', 'BI Install', NULL
    union all Select 'Project1', '001-04', 'Setup', NULL
    union all Select 'Project1', '001-04-01', 'Financials Setup', NULL
    union all Select 'Project1', '001-04-02', 'Projects Setup', NULL
    union all Select 'Project1', '001-04-03', 'Allocation Setup', NULL
    union all Select 'Project1', '001-04-04', 'WTE Setup', NULL
    union all Select 'Project1', '001-04-05', 'Users and Security', NULL
    union all Select 'Project1', '001-05', 'Training', NULL
    union all Select 'Project1', '001-06', 'Testing', NULL
    union all Select 'Project1', '001-06-01', 'Financial Testing', NULL
    union all Select 'Project1', '001-06-02', 'Project Testing', NULL
    union all Select 'Project1', '001-06-03', 'Project Allocation Testing', NULL
    union all Select 'Project1', '001-06-04', 'WTE Testing', NULL
    union all Select 'Project1', '001-06-05', 'Users and Security Testing', NULL
    union all Select 'Project1', '001-07', 'Onsite Go Live', NULL
    union all Select 'Project1', '001-08', 'Development', NULL
    union all Select 'Project1', '001-08-01', 'Overtime functionality', NULL
    union all Select 'Project1', '001-09', 'Data Migration', NULL
    union all Select 'Project1', '002', 'Phase 2', NULL
    union all Select 'Project1', '002-01', 'Monthly Support', NULL
    union all Select 'Project1', '002-01-02', 'Support Training', NULL
    union all Select 'Project1', '002-01-03', 'Support Account Mgmt', NULL

--Select * from @ProjectWBS

I started by creating the following function that returns no. of occurrences of a particular character in a given string:


CREATE FUNCTION fnc_CountChar(@string varchar(max), @char char(1) )
RETURNS INT
BEGIN
    RETURN (LEN(@string) - LEN(REPLACE(@string, @char, '')))
END

After this, I declared the following CTE:


--create hierarchical structure
;With WBSHierarchy
as
(
    Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription,
        0 as Level, null as ParentID, Cast(null as varchar(MAX)) as ParentWBSCode
    From @ProjectWBS Parent
    Where dbo.fnc_CountChar(Parent.WBSCode, '-') = 0 --no hiphen "-" in parents

    union All

    Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription,
        Parent.Level+1, Parent.ID as ParentID, Cast(Parent.WBSCode as varchar(max)) as ParentWBSCode
        From @ProjectWBS Child
        inner join WBSHierarchy Parent on Child.WBSCode Like Parent.WBSCode + '-%' -- childWBS = parentWBS-xx
            and dbo.fnc_CountChar(Child.WBSCode, '-') = Parent.Level + 1 --childs should have only one hiphen more than their parent
)

--Select * from WBSHierarchy order by WBSCode

Notice that in the above CTE definition, I considered all rows with single occurrence of “-” as the parent rows. Then, I used the fact that ChildWBS=ParentWBS-XX, and the no. of “-” occurrences in child WBS will be equal to one greater than its parent. Here’s the output of the above CTE.

Hierarchical Structure

Things were much simple after this, and I was able to write a single update query to populate the ParentID fields of all the rows in the original table as follows:


--update parentID
update @ProjectWBS
    set ParentID = H.ParentID
from @ProjectWBS W
inner join WBSHierarchy H on W.ID=H.ID

That’s all. Now the result can be traversed using the following recursive CTE:


--traverse the resulting structure using a recursive CTE
;With WBSHierarchy
As
(
    Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription, 0 as Level
        from @ProjectWBS Parent
        where Parent.ParentID is null
    union all
    Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription, Parent.Level+1 as Level
        from @ProjectWBS Child
        inner join WBSHierarchy Parent on Child.ParentID=Parent.ID
)

Select ProjectCode, WBSCode, space(Level*4)+WBSDescription as WBSDescription
from WBSHierarchy order by WBSCode

CTE Output

Let me know if you have any other solution to such scenario.

Victory in TSQL Challenge 7

I am really glad to announce that my solution to TSQL Challenge 7 was chosen among the winners. I briefly described my solution in this post. For result, have a look at http://beyondrelational.com/blogs/tc/archive/2009/06/17/tsql-challenge-7-winners.aspx, and get prepared for the upcoming challenges at http://beyondrelational.com/blogs/tc/default.aspx

C#: Left outer joins with LINQ

I always considered Left Outer Join in LINQ to be complex until today when I had to use it in my application. I googled and the first result gave a very nice explanation. The only difference between ordinary joins (inner joins) and left joins in LINQ is the use of “join into” and “DefaultIfEmpty()” expressions.

Consider this very simple query (Assuming a scenario that not all the TimesheetLines are associated with a Job)


Select TL.EntryDate, TL.Hours, J.JobName
From TimeSheetLines TL
Left Join Jobs J on TL.JobNo=J.JobNo

A LINQ query using inner join is


var lines =
    from tl in db.TimeSheetLines
    join j  in db.Jobs on tl.JobNo equals j.JobNo
    where tl.ResourceNo == resourceNo

    select new
    {
        EntryDate = tl.EntryDate,
        Hours = tl.Hours,
        Job = j.JobName
    };

And a LINQ query performing left join is


var lines =
    from tl in db.TimeSheetLines
    join j  in db.Jobs on tl.JobNo equals j.JobNo into tl_j
    where tl.ResourceNo == resourceNo

    from j in tl_j.DefaultIfEmpty()
    select new
    {
        EntryDate = tl.EntryDate,
        Hours = tl.Hours,
        Job = j.JobName
    };

Notice that the only difference is the use of “into” with the join statement followed by reselecting the result using “DefaultIfEmpty()” expression. And here’s the generated SQL from the above LINQ expression.


SELECT [t0].[EntryDate] as [EntryDate], [t0].[Hours] as [Hours], [t1].[JobName] AS [Job]
FROM [dbo].[TimeSheetLine] AS [t0]
LEFT OUTER JOIN [dbo].[Jobs] AS [t1] ON [t0].[JobNo] = [t1].[JobNo]
WHERE [t0].[ResourceNo] = @p0

Similar concept can be expanded for multiple left joins. Assuming that a TimeSheetLine will either have a JobNo or an IndirectCode, consider this SQL query:


Select TL.EntryDate, TL.Hours, J.JobName, I.IndirectName
From TimeSheetLines TL
Left Join Jobs J on TL.JobNo=J.JobNo
Left Join Indirects I on TL.IndirectCode=I.IndirectCode

The equivalent LINQ query is:


var lines =
    from tl in db.TimeSheetLines
    join j in db.Jobs      on tl.JobNo        equals j.JobNo            into tl_j
    join i in db.Indirects on tl.IndirectCode equals i.IndirectCode  into tl_i
    where tl.ResourceNo == resourceNo

    from j in tl_j.DefaultIfEmpty()
    from i in tl_i.DefaultIfEmpty()
    select new
    {
        EntryDate = tl.EntryDate,
        Hours = tl.Hours,
        Job = j.JobName,
        Indirect = i.IndirectName,
    };

And the generated SQL is:


SELECT [t0].[EntryDate] as [EntryDate], [t0].[Hours] as [Hours], [t1].[JobName] AS [Job], [t2].[IndirectName] As [Indirect]
LEFT OUTER JOIN [dbo].[Jobs] AS [t1] ON [t0].[JobNo] = [t1].[JobNo]
LEFT OUTER JOIN [dbo].[Indirects] AS [t2] ON [t0].[IndirectCode] = [t2].[IndirectCode]
WHERE [t0].[ResourceNo] = @p0

That’s all, left outer joins in LINQ are as easy as in T-SQL. Happy joining.

Posted in linq. Tags: , , , , . 1 Comment »

T-SQL: Using common table expressions (CTE) to generate sequences

One of the best enhancements in T-SQL with SQL Server 2005 was Common Table Expressions(CTEs). CTEs are very helpful in writing more readable and manageable queries. The good things don’t end here; self-referencing CTEs are a very powerful method of implementing recursion in SQL queries. In this post, I will present a few examples of generating sequences using CTEs.
The following statements create a number sequence from 1 to 1000.


--define start and end limits
Declare @start int, @end int
Select @start=1, @end=1000

;With NumberSequence( Number ) as
(
    Select @start as Number
        union all
    Select Number + 1
        from NumberSequence
        where Number < @end
)

--select result
Select * From NumberSequence Option (MaxRecursion 1000)

And the output is:
Number sequence

Notice that when selecting records from the CTE, we are setting a value for MaxRecursion Option. This one is important since the maximum recursion level for a CTE is set to 100 by default, and so any sequence with more than 100 rows will generate an error. We can turn the recursion limit off by using MaxRecursion 0, but this is not a good idea since it opens the possibility of infinite recursion in case of a small error in CTE.

Here’s another example that creates date sequence from 2009-03-01 to 2009-04-10.


--define start and end limits
Declare @todate datetime, @fromdate datetime
Select @fromdate='2009-03-01', @todate='2009-04-10' 

;With DateSequence( Date ) as
(
    Select @fromdate as Date
        union all
    Select dateadd(day, 1, Date)
    	from DateSequence
    	where Date < @todate
)

--select result
Select * from DateSequence option (MaxRecursion 1000)

This one generates an output similar to:
Date Sequence

Notice, that this example, that generates date sequences can also be used to create Time Dimension for a Data Warehouse. If the granularity of time dimension is not too fine, CTEs give a nice alternative to loops. Here’s an example:


--define limits
Declare @todate datetime, @fromdate datetime
set @fromdate = '2009-01-01'
set @todate = '2009-12-31'

;With DateSequence( [Date] ) as
(
	Select @fromdate as [Date]
		union all
	Select dateadd(day, 1, [Date])
		from DateSequence
		where Date < @todate
)

--select result
Select
	CONVERT(VARCHAR,[Date],112) as ID,
	[Date] as [Date],
	DATEPART(DAY,[Date]) as [Day],
	CASE
		 WHEN DATEPART(DAY,[Date]) IN (11,12,13) THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'th'
		 WHEN RIGHT(DATEPART(DAY,[Date]),1) = 1 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'st'
		 WHEN RIGHT(DATEPART(DAY,[Date]),1) = 2 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'nd'
		 WHEN RIGHT(DATEPART(DAY,[Date]),1) = 3 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'rd'
		 ELSE CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'th'
	END as [DaySuffix],
	CASE DATEPART(WEEKDAY, [Date])
		 WHEN 1 THEN 'Sunday'
		 WHEN 2 THEN 'Monday'
		 WHEN 3 THEN 'Tuesday'
		 WHEN 4 THEN 'Wednesday'
		 WHEN 5 THEN 'Thursday'
		 WHEN 6 THEN 'Friday'
		 WHEN 7 THEN 'Saturday'
	END as [DayOfWeek],
	DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
	DATEPART(WEEK,[Date]) as [WeekOfYear],
	DATEPART(WEEK,[Date]) + 1 - DATEPART(WEEK,CAST(DATEPART(MONTH,[Date]) AS VARCHAR) + '/1/' + CAST(DATEPART(YEAR,[Date]) AS VARCHAR)) as [WeekOfMonth],
	DATEPART(MONTH,[Date]) as [Month],
	DATENAME(MONTH,[Date]) as [MonthName],
	DATEPART(QUARTER,[Date]) as [Quarter],
	CASE DATEPART(QUARTER,[Date])
		WHEN 1 THEN 'First'
		WHEN 2 THEN 'Second'
		WHEN 3 THEN 'Third'
		WHEN 4 THEN 'Fourth'
	END as [QuarterName],
	DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)

And here’s the beautiful output of the above query to be used as a time dimension (click to enlarge the image):
Time Dimension

TSQL Challenge 8: Using recursive CTE for a hierarchical relationship

Last week, I submitted an entry for T-SQL challenge 8. This time, the contestants were asked to process a hierarchy using recursive CTEs without applying any filters inside the CTE. The big challenge was the condition that the CTE should not contain any filter for a specific manager.
Here is some code for populating test data.


--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
  SELECT 1, 'Jacob', NULL UNION ALL
  SELECT 2, 'Rui', NULL UNION ALL
  SELECT 3, 'Jacobson', NULL UNION ALL
  SELECT 4, 'Jess', 1 UNION ALL
  SELECT 5, 'Steve', 1 UNION ALL
  SELECT 6, 'Bob', 1 UNION ALL
  SELECT 7, 'Smith', 2 UNION ALL
  SELECT 8, 'Bobbey', 2 UNION ALL
  SELECT 9, 'Steffi', 3 UNION ALL
  SELECT 10, 'Bracha', 3 UNION ALL
  SELECT 11, 'John', 5 UNION ALL
  SELECT 12, 'Michael', 6 UNION ALL
  SELECT 13, 'Paul', 6 UNION ALL
  SELECT 14, 'Lana', 7 UNION ALL
  SELECT 15, 'Johnson', 7 UNION ALL
  SELECT 16, 'Mic', 8 UNION ALL
  SELECT 17, 'Stev', 8 UNION ALL
  SELECT 18, 'Paulson', 9 UNION ALL
  SELECT 19, 'Jessica', 10

I started by creating a recursive CTE for the whole table like this:


;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
  Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
    From @Employees E
    Where E.ReportsTo is null
  Union all
  Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
    from @Employees E
    inner join Hierarchy H on H.EmpID=E.ReportsTo
)

Select Space(Level*4) + H.EmpName
  from Hierarchy H
  order by H.FullyQualifiedName

Result of CTE

Notice that I am constructing a FullyQualifiedName value for each row. This value consists of full path from root to the current person. This approach helped me in filtering the records in the final select statement where I just needed to look for a '.' + ManagerName + '.' filter. Here is a select query that extracts all subordinates for a particular manager.


Select Space(Level*4) + H.EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

The only issue that remains is that when the query is run for a non-parent employee, we get extra spaces in the beginning. E.g. when we run the query for “Paul”, we get an output like:

Spacing issue

I solved it by inserted a sub-query in the space() expression like this:


Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

Instead of two subqueries, I tried to use a join but the performance using sub-queries was better so I submitted the subquery solution. Here’s the join version I am referring, it looks very nice and clean:


Select Space((H.Level-H2.Level)*4) + H.EmpName
  from Hierarchy H
  inner join Hierarchy H2 on CHARINDEX('.'+H2.EmpName+'.', H.FullyQualifiedName) > 0
  where H2.EmpName=@manager
  order by H.FullyQualifiedName

In the last, here is the complete solution:


--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
  SELECT 1, 'Jacob', NULL UNION ALL
  SELECT 2, 'Rui', NULL UNION ALL
  SELECT 3, 'Jacobson', NULL UNION ALL
  SELECT 4, 'Jess', 1 UNION ALL
  SELECT 5, 'Steve', 1 UNION ALL
  SELECT 6, 'Bob', 1 UNION ALL
  SELECT 7, 'Smith', 2 UNION ALL
  SELECT 8, 'Bobbey', 2 UNION ALL
  SELECT 9, 'Steffi', 3 UNION ALL
  SELECT 10, 'Bracha', 3 UNION ALL
  SELECT 11, 'John', 5 UNION ALL
  SELECT 12, 'Michael', 6 UNION ALL
  SELECT 13, 'Paul', 6 UNION ALL
  SELECT 14, 'Lana', 7 UNION ALL
  SELECT 15, 'Johnson', 7 UNION ALL
  SELECT 16, 'Mic', 8 UNION ALL
  SELECT 17, 'Stev', 8 UNION ALL
  SELECT 18, 'Paulson', 9 UNION ALL
  SELECT 19, 'Jessica', 10

--Solution starts here
DECLARE @manager VARCHAR(20)
SELECT @manager = 'Jacob'

--CTE
;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
  Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
    From @Employees E
    Where E.ReportsTo is null
  Union all
  Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
    from @Employees E
    inner join Hierarchy H on H.EmpID=E.ReportsTo
)

--Result
Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

TSQL Challenge 7: Listing the 5 biggest tables on the server

A few weeks ago, I stumbled upon this blog that presents cool T-SQL challenges. I submitted an entry for challenge 7 that asked to write the shortest script to list the 5 biggest tables on a server. Here’s my solution:


create table #temp
(
	[database] nvarchar(MAX),
	[table] nvarchar(MAX),
	[rows] int,
	[reserved_size] nvarchar(100),
	[data_size] nvarchar(100),
	[index_size] nvarchar(100),
	[unused_space] nvarchar(100)
)

declare @sql nvarchar(MAX)
set @sql=replace('if !~! not in (!master!,!model!,!msdb!,!tempdb!)
  exec [~].dbo.sp_msforeachtable
    "insert into #temp([table], [rows], [reserved_size], [data_size], [index_size], [unused_space])
      exec [~].dbo.sp_spaceused !?!"','!',char(39))

EXEC sp_MSForEachDB
	@command1=@sql,
	@command2="update #temp set [database]='~' where [database] is null",
	@replacechar='~'

select top(5) [database] as base, [table], [data_size] as size, [rows] as rows
from #temp
order by Cast(LEFT([data_size],len([data_size])-3) as int) desc

drop table #temp

So, I started by creating a temporary table with columns (database, table, rows, reserved_size, data_size, index_size, unused_space) for the output. I used the two undocumented stored procedures sp_MSforeachdb and sp_MSforeachtable to iterate through all the tables in all the databases and executed sp_spaceused as described in the following pseudo code:


foreach(database db in serverDatabases)
  if (db not in 'master', 'msdb', 'model', 'tempdb')
    foreach(table t in db.Tables)
    {
       insert into #temp (table, rows, reserved_size, data_size, index_size, unused_space)
         execute sp_spaceused for table 't'

       --at this point, our #temp table will be populated with data for each table
       --but the 'database' column will be 'null', so now replace it with the name of database
       update #temp
         set [database] = 'db' where [database] is null
    }

The most important part is that I am using an update operation for storing the database name in the temporary table. Thanks to Microsoft that we can give a set of 3 commands to the above mentioned undocumented stored procedures. Another hard part was to create a single t-sql statement that iterates for all tables inside a database and execute sp_spaceused. I did this by a complex combination of single quotes, double quotes and the replace function. In the last, I am just selecting the top(5) rows ordered by size.

To enter the contest, I reduced the script length by replacing all the variable/column names with a single length identifier. Here was my final submission:


create table #t(d nvarchar(MAX),t nvarchar(MAX),r int,x nvarchar(100),s nvarchar(100),y nvarchar(100),z nvarchar(100))
declare @s nvarchar(MAX)
set @s=replace('if !~! not in (!master!,!model!,!msdb!,!tempdb!) exec [~].dbo.sp_msforeachtable "insert into #t(t, r,x,s,y,z) exec [~].dbo.sp_spaceused !?!"','!',char(39))
EXEC sp_MSForEachDB @command1=@s, @command2="update #t set d='~' where d is null", @replacechar='~'
select top(5) d as base, t as [table], s as size, r as rows from #t order by Cast(LEFT(s,len(s)-3) as int) desc
drop table #t

Let’s wait and see the solution of other players.

Posted in T-SQL. 1 Comment »

T-SQL: sp_MSForEachDB and sp_MSForEachTable – Undocumented but very powerful stored procedures

I recently came across the following two undocumented stored procedures:

  • sp_MSforeachdb
  • sp_MSforeachtable

These stored procedures take a command (infact upto 3 commands) and run it for all the objects (databases on a server or tables inside a database, depending upon the stored procedure). Here are a few examples of the usage of these stored procedures (Note that these statements may take long duration depending upon the number of databases/tables):


--Display no. of objects per databases
exec sp_MSforeachdb 'select "?", count(*) as TotalObjects from [?].dbo.sysobjects'

--Display no. of rows per table
exec sp_MSforeachtable 'select "?" as TableName, count(*) as TotalRows from ?'

--Display total space used for each table
exec sp_MSforeachtable 'exec sp_spaceused [?]'

Beware, never ever underestimate the powers of these stored procedures. In a single line, you can:

  • Destroy all the databases on a server
  • Delete all the data inside all the tables
  • Drop all the tables
  • Even more dangerous things I can’t think at this time

In this small post, I am not going to demonstrate how to do the above mentioned tricks but make sure you are extremely cautious when using the above stored procedures.

C#: Executing batch T-SQL Scripts containing GO statements

At times, we developers need to run SQL scripts from our .NET applications, in, say Installer Applications. This is not always easy since large SQL scripts typically contain a GO statement to separate individual batches and our ADO.NET classes under System.Data namespace do not know how to handle it. The reason is that the “GO” statement is not a native T-SQL statement but rather used by SQL Server Management Studio to terminate the batches it is sending to the server. To tackle this situation, we have the following choices:

1. Split the script on “GO” command into smaller scripts and execute those individual scripts

A very primitive solution would be to split the script on “GO” text and run the individual sub-scripts in sequence. The problem is how to get a robust split mechanism. Generally, a line break before and after the GO works fine. Here’s how to do this:

//get the script
string scriptText = GetScript();

//split the script on "GO" commands
string[] splitter = new string[] { "\r\nGO\r\n" };
string[] commandTexts = scriptText.Split(splitter,
  StringSplitOptions.RemoveEmptyEntries);
foreach (string commandText in commandTexts)
{
  //execute commandText
}

The above code can produce unnecessary splitting in some situations, thus creating an incorrect SQL Command. A more better approach would be to use Regular Expressions. Again, the problem is how to create a pattern that is robust enough to tackle all sort of scripts. Lets look into the second solution now.

2. Use the Server class from SQL Server Management Objects (SMO)

For this, we need to add the following references in our project:

  • Microsoft.SqlServer.Smo
  • Microsoft.SqlServer.ConnectionInfo
  • Microsoft.SqlServer.Management.Sdk.Sfc

After that, we can simply execute the entire script (with all the “GO” statements) using the code below:

string connectionString, scriptText;
SqlConnection sqlConnection = new SqlConnection(connectionString);
ServerConnection svrConnection = new ServerConnection(sqlConnection);
Server server = new Server(svrConnection);
server.ConnectionContext.ExecuteNonQuery(scriptText);

This is a fairly generic and robust solution and the great thing is that it does not require any change in our original script. Hooray!!!