## TSQL Challenge 12: Completing sequence by inserting missing rows

TSQL Challenge 12 was a relatively easier one. The participants were given month-wise score values and were asked to complete the sequence by creating entries for missing month.

Here’s the sample input:

```YearMonth   Score
----------- -----------
200903      100
200803      95
200802      99
200801      100
200711      100
```

And here’s the desired output. Notice that the score of last month is replicated in each of the missing rows:

```YearMonth   Score
----------- -----------
200908      100
200907      100
200906      100
200905      100
200904      100
200903      100
200902      95
200901      95
200812      95
200811      95
200810      95
200809      95
200808      95
200807      95
200806      95
200805      95
200804      95
200803      95
200802      99
200801      100
200712      100
200711      100
```

The script to generate sample data is provided below:

```DECLARE @Scores TABLE
(
YearMonth	INT,
Score		INT
)

INSERT @Scores VALUES(200903, 100)
INSERT @Scores VALUES(200803, 95)
INSERT @Scores VALUES(200802, 99)
INSERT @Scores VALUES(200801 ,100)
INSERT @Scores VALUES(200711, 100)
```

Solution

I once blogged about creating a sequence of numbers/dates using a recursive CTE in this post. The same technique can be used here. However, since the `YearMonth` column in the sample data is integer, we have two choices:

1. Convert it to DateTime and apply T-SQL DateTime functions
2. Leave it as Integer and apply some intelligent arithmetic

I am providing both the solutions here. Note that the first solution will be slower due to overhead of casting and applying T-SQL scalar functions.

The first solution: Converting to DateTime

```;with cte as
(
select score, Cast(Cast(YearMonth as varchar)+'01' as datetime) as dateVal
from @scores
union all
select score, dateadd(month, 1, dateval)
from cte
where not exists
--the resultant YearMonth value should not lie in the original table
( select 1 from @scores s where s.YearMonth = cast( left(convert(varchar, dateadd(month, 1, cte.dateval), 112), 6) as int) )
--stop at current month
and dateadd(month, 1, cte.dateval) < getdate()
)

select left(convert(varchar, dateval, 112), 6) as yearmonth, score
from cte
order by dateval desc
```

Explanation:
Here I am simply converting the integer `YearMonth` column to a datetime `dateval` column by appending 01 to the end (so a `200901` becomes `20090101` that can easily be cast to a dateTime) and then finding subsequent dates by adding one month in each CTE iteration.

The second solution: Integer Arithmetic

```;with cte as
(
select YearMonth, Score
from @Scores
union all
select YearMonth + YearMonth % 100 / 12 * 88 + 1 as YearMonth, Score
from cte
where not exists
--the resultant YearMonth value should not lie in the original table
( select s.YearMonth from @Scores s where s.YearMonth = (cte.YearMonth + cte.YearMonth % 100 / 12 * 88 + 1) )
--stop at current month
and cte.YearMonth < month(getdate()) + year(getdate())*100
)

select *
from cte
order by YearMonth desc
```

Explanation:
The important point is to increment the value of `YearMonth` correctly. So `200811` should get incremented to `200812` but `200812` should get incremented to `200901`. This isn’t difficult if we introduce a case statement like this:

`YearMonth + (Case When YearMonth%100 < 12 Then 1 Else 89 End)`

But I wanted to do this purely using arithmetic with no `Case` statements, so I came up with this formula:

`(YearMonth % 100 / 12 * 88) + 1`

Note that the factor `(YearMonth % 100 / 12 * 88)` will reduce to zero for all values from January to November, i.e. from `200801` to `200811`.

I hope you enjoyed the solution.

Update:
It was pointed out in one of the comments by Rakesh that the solution could reach the default limit of recursion which is 100. In order to avoid this, we need to add `option (maxrecursion 12,000)` in the final select statement. Then, we can have 10,000 years missing between two adjacent entries. Thanks, Rakesh.

```...
select *
from cte
order by YearMonth desc
option (maxrecursion 12,000)
```

## TSQL Challenge 11: Calculating the lowest price of an item by applying discount coupons

TSQL Challenge 11 was a practical problem. Given a a list of products and a list of discount coupons, we needed to find the minimum price for all the products based on certain rules. Here are those rules:

• Maximum two coupons can be applied on the same product
• The discount price can not be less than 70% of the original price
• The total amount of the discount can not exceed 30\$

Also, note that coupons are applied in a cumulative way. So the second coupon is applied on the result of the original price + first coupon.

Sample Data:
Here is some sample products data:

```ID NAME    PRICE
-- ------- ---------
1  PROD 1  100,00
2  PROD 2  220,00
3  PROD 3  15,00
4  PROD 4  70,00
5  PROD 5  150,00
```

Here are the coupons to be applied

```ID NAME         VALUE  IS_PERCENT
-- -----------  ------ ----------
1  CP 1 : -15\$  15     0
2  CP 2 : -5\$   5      0
3  CP 3 : -10%  10     1
4  CP 4 : -12\$  12     0
```

And here’s the required output:

```ID NAME    PRICE    DISC_PRICE  TOT_DISC  RATE    COUPON_NAMES
2.-- ------  -------- ----------- --------- ------- -------------------------
3.1  PROD 1  100.00\$  73.00\$      27.00\$    27.00%  CP 4 : -12\$ + CP 1 : -15\$
4.2  PROD 2  220.00\$  193.00\$     27.00\$    12.27%  CP 4 : -12\$ + CP 1 : -15\$
5.3  PROD 3  15.00\$   13.50\$      1.50\$     10.00%  CP 3 : -10%
6.4  PROD 4  70.00\$   49.50\$      20.50\$    29.28%  CP 1 : -15\$ + CP 3 : -10%
7.5  PROD 5  150.00\$  120.00\$     30.00\$    20.00%  CP 3 : -10% + CP 1 : -15\$
```

Solution

Interesting enough… So lets attempt to find a solution. Read the rest of this entry »

## T-SQL: Creating a hierarchical structure from relational data

Recently, while developing an SSIS package, I came across a scenario where I needed to create parent-child relationships between rows in a relational table. Let’s have a look at some sample data:

Notice the values in the WBSCode column, 001 is the parent of all 001-XX rows, 001-02 is the parent of all 001-02-XX rows and so on. We need to create a relationship by populating the ParentID field so that the resulting structure can be traversed using a recursive CTE. Here’s a code snippet to create some sample data.

```--create sample data
Declare @ProjectWBS TABLE
(
ID int IDENTITY(1,1) NOT NULL Primary Key,
ProjectCode varchar(20) NULL,
WBSCode varchar(100) NULL,
WBSDescription varchar(max) NULL,
ParentID int NULL
)

INSERT @ProjectWBS (ProjectCode, WBSCode, WBSDescription, ParentID)
Select 'Project1', '001', 'Phase 1', NULL
union all Select 'Project1', '001-01', 'Project Mgmt', NULL
union all Select 'Project1', '001-02', 'Requirements', NULL
union all Select 'Project1', '001-02-01', 'Requirements Meeting', NULL
union all Select 'Project1', '001-02-02', 'Requirements Write Up', NULL
union all Select 'Project1', '001-03', 'Installation', NULL
union all Select 'Project1', '001-03-01', 'Nav Install', NULL
union all Select 'Project1', '001-03-02', 'WTE Install', NULL
union all Select 'Project1', '001-03-03', 'Reporting Service', NULL
union all Select 'Project1', '001-03-04', 'BI Install', NULL
union all Select 'Project1', '001-04', 'Setup', NULL
union all Select 'Project1', '001-04-01', 'Financials Setup', NULL
union all Select 'Project1', '001-04-02', 'Projects Setup', NULL
union all Select 'Project1', '001-04-03', 'Allocation Setup', NULL
union all Select 'Project1', '001-04-04', 'WTE Setup', NULL
union all Select 'Project1', '001-04-05', 'Users and Security', NULL
union all Select 'Project1', '001-05', 'Training', NULL
union all Select 'Project1', '001-06', 'Testing', NULL
union all Select 'Project1', '001-06-01', 'Financial Testing', NULL
union all Select 'Project1', '001-06-02', 'Project Testing', NULL
union all Select 'Project1', '001-06-03', 'Project Allocation Testing', NULL
union all Select 'Project1', '001-06-04', 'WTE Testing', NULL
union all Select 'Project1', '001-06-05', 'Users and Security Testing', NULL
union all Select 'Project1', '001-07', 'Onsite Go Live', NULL
union all Select 'Project1', '001-08', 'Development', NULL
union all Select 'Project1', '001-08-01', 'Overtime functionality', NULL
union all Select 'Project1', '001-09', 'Data Migration', NULL
union all Select 'Project1', '002', 'Phase 2', NULL
union all Select 'Project1', '002-01', 'Monthly Support', NULL
union all Select 'Project1', '002-01-02', 'Support Training', NULL
union all Select 'Project1', '002-01-03', 'Support Account Mgmt', NULL

--Select * from @ProjectWBS
```

I started by creating the following function that returns no. of occurrences of a particular character in a given string:

```CREATE FUNCTION fnc_CountChar(@string varchar(max), @char char(1) )
RETURNS INT
BEGIN
RETURN (LEN(@string) - LEN(REPLACE(@string, @char, '')))
END
```

After this, I declared the following CTE:

```--create hierarchical structure
;With WBSHierarchy
as
(
Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription,
0 as Level, null as ParentID, Cast(null as varchar(MAX)) as ParentWBSCode
From @ProjectWBS Parent
Where dbo.fnc_CountChar(Parent.WBSCode, '-') = 0 --no hiphen "-" in parents

union All

Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription,
Parent.Level+1, Parent.ID as ParentID, Cast(Parent.WBSCode as varchar(max)) as ParentWBSCode
From @ProjectWBS Child
inner join WBSHierarchy Parent on Child.WBSCode Like Parent.WBSCode + '-%' -- childWBS = parentWBS-xx
and dbo.fnc_CountChar(Child.WBSCode, '-') = Parent.Level + 1 --childs should have only one hiphen more than their parent
)

--Select * from WBSHierarchy order by WBSCode
```

Notice that in the above CTE definition, I considered all rows with single occurrence of “-” as the parent rows. Then, I used the fact that `ChildWBS=ParentWBS-XX`, and the no. of “-” occurrences in child WBS will be equal to one greater than its parent. Here’s the output of the above CTE.

Things were much simple after this, and I was able to write a single update query to populate the ParentID fields of all the rows in the original table as follows:

```--update parentID
update @ProjectWBS
set ParentID = H.ParentID
from @ProjectWBS W
inner join WBSHierarchy H on W.ID=H.ID
```

That’s all. Now the result can be traversed using the following recursive CTE:

```--traverse the resulting structure using a recursive CTE
;With WBSHierarchy
As
(
Select Parent.ID, Parent.ProjectCode, Parent.WBSCode, Parent.WBSDescription, 0 as Level
from @ProjectWBS Parent
where Parent.ParentID is null
union all
Select Child.ID, Child.ProjectCode, Child.WBSCode, Child.WBSDescription, Parent.Level+1 as Level
from @ProjectWBS Child
inner join WBSHierarchy Parent on Child.ParentID=Parent.ID
)

Select ProjectCode, WBSCode, space(Level*4)+WBSDescription as WBSDescription
from WBSHierarchy order by WBSCode
```

Update:
I got a very nice solution from Mitesh in the comments below. He used the fact that if we drop all the characters after the last “-” then we can get the parent WBS, i.e. 001-01-01 can be transformed into 001-01. Great find, Mitesh! Hence, we can do all the transformation without a recursive CTE or UDF like this:

```;WITH CTE
AS
(
SELECT
WbsCode,
SUBSTRING(WbsCode,1,LEN(WbsCode)-
CASE
WHEN CHARINDEX('-',REVERSE(WbsCode)) > 0
THEN CHARINDEX('-',REVERSE(WbsCode))
END
) AS EParentID ,
ParentID
FROM
@ProjectWBS
)

UPDATE
C
SET
PARENTID	= P.ID
FROM
@ProjectWBS	P
INNER JOIN
CTE	C
ON
C.EParentID	=	P.WBSCode
```

Let me know if you have any other solution to such scenario.

## T-SQL: Using common table expressions (CTE) to generate sequences

One of the best enhancements in T-SQL with SQL Server 2005 was Common Table Expressions(CTEs). CTEs are very helpful in writing more readable and manageable queries. The good things don’t end here; self-referencing CTEs are a very powerful method of implementing recursion in SQL queries. In this post, I will present a few examples of generating sequences using CTEs.
The following statements create a number sequence from 1 to 1000.

```--define start and end limits
Declare @start int, @end int
Select @start=1, @end=1000

;With NumberSequence( Number ) as
(
Select @start as Number
union all
Select Number + 1
from NumberSequence
where Number < @end
)

--select result
Select * From NumberSequence Option (MaxRecursion 1000)
```

And the output is:

Notice that when selecting records from the CTE, we are setting a value for MaxRecursion Option. This one is important since the maximum recursion level for a CTE is set to 100 by default, and so any sequence with more than 100 rows will generate an error. We can turn the recursion limit off by using MaxRecursion 0, but this is not a good idea since it opens the possibility of infinite recursion in case of a small error in CTE.

Here’s another example that creates date sequence from `2009-03-01` to `2009-04-10`.

```--define start and end limits
Declare @todate datetime, @fromdate datetime
Select @fromdate='2009-03-01', @todate='2009-04-10'

;With DateSequence( Date ) as
(
Select @fromdate as Date
union all
Select dateadd(day, 1, Date)
from DateSequence
where Date < @todate
)

--select result
Select * from DateSequence option (MaxRecursion 1000)
```

This one generates an output similar to:

Notice, that this example, that generates date sequences can also be used to create Time Dimension for a Data Warehouse. If the granularity of time dimension is not too fine, CTEs give a nice alternative to loops. Here’s an example:

```--define limits
Declare @todate datetime, @fromdate datetime
set @fromdate = '2009-01-01'
set @todate = '2009-12-31'

;With DateSequence( [Date] ) as
(
Select @fromdate as [Date]
union all
Select dateadd(day, 1, [Date])
from DateSequence
where Date < @todate
)

--select result
Select
CONVERT(VARCHAR,[Date],112) as ID,
[Date] as [Date],
DATEPART(DAY,[Date]) as [Day],
CASE
WHEN DATEPART(DAY,[Date]) = 1 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'st'
WHEN DATEPART(DAY,[Date]) = 2 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'nd'
WHEN DATEPART(DAY,[Date]) = 3 THEN CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'rd'
ELSE CAST(DATEPART(DAY,[Date]) AS VARCHAR) + 'th'
END as [DaySuffix],
DATENAME(dw, [Date]) as [DayOfWeek],
DATEPART(DAYOFYEAR,[Date]) as [DayOfYear],
DATEPART(WEEK,[Date]) as [WeekOfYear],
DATEPART(WEEK,[Date]) + 1 - DATEPART(WEEK,CAST(DATEPART(MONTH,[Date]) AS VARCHAR) + '/1/' + CAST(DATEPART(YEAR,[Date]) AS VARCHAR)) as [WeekOfMonth],
DATEPART(MONTH,[Date]) as [Month],
DATENAME(MONTH,[Date]) as [MonthName],
DATEPART(QUARTER,[Date]) as [Quarter],
CASE DATEPART(QUARTER,[Date])
WHEN 1 THEN 'First'
WHEN 2 THEN 'Second'
WHEN 3 THEN 'Third'
WHEN 4 THEN 'Fourth'
END as [QuarterName],
DATEPART(YEAR,[Date]) as [Year]
from DateSequence option (MaxRecursion 10000)
```

And here’s the beautiful output of the above query to be used as a time dimension (click to enlarge the image):

## TSQL Challenge 8: Using recursive CTE for a hierarchical relationship

Last week, I submitted an entry for T-SQL challenge 8. This time, the contestants were asked to process a hierarchy using recursive CTEs without applying any filters inside the CTE. The big challenge was the condition that the CTE should not contain any filter for a specific manager.
Here is some code for populating test data.

```--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
SELECT 1, 'Jacob', NULL UNION ALL
SELECT 2, 'Rui', NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess', 1 UNION ALL
SELECT 5, 'Steve', 1 UNION ALL
SELECT 6, 'Bob', 1 UNION ALL
SELECT 7, 'Smith', 2 UNION ALL
SELECT 8, 'Bobbey', 2 UNION ALL
SELECT 9, 'Steffi', 3 UNION ALL
SELECT 10, 'Bracha', 3 UNION ALL
SELECT 11, 'John', 5 UNION ALL
SELECT 12, 'Michael', 6 UNION ALL
SELECT 13, 'Paul', 6 UNION ALL
SELECT 14, 'Lana', 7 UNION ALL
SELECT 15, 'Johnson', 7 UNION ALL
SELECT 16, 'Mic', 8 UNION ALL
SELECT 17, 'Stev', 8 UNION ALL
SELECT 18, 'Paulson', 9 UNION ALL
SELECT 19, 'Jessica', 10
```

I started by creating a recursive CTE for the whole table like this:

```;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
From @Employees E
Where E.ReportsTo is null
Union all
Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
from @Employees E
inner join Hierarchy H on H.EmpID=E.ReportsTo
)

Select Space(Level*4) + H.EmpName
from Hierarchy H
order by H.FullyQualifiedName
```

Notice that I am constructing a `FullyQualifiedName` value for each row. This value consists of full path from root to the current person. This approach helped me in filtering the records in the final select statement where I just needed to look for a `'.' + ManagerName + '.'` filter. Here is a select query that extracts all subordinates for a particular manager.

```Select Space(Level*4) + H.EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
```

The only issue that remains is that when the query is run for a non-parent employee, we get extra spaces in the beginning. E.g. when we run the query for “Paul”, we get an output like:

I solved it by inserted a sub-query in the space() expression like this:

```Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
```

Instead of two subqueries, I tried to use a `join` but the performance using sub-queries was better so I submitted the subquery solution. Here’s the join version I am referring, it looks very nice and clean:

```Select Space((H.Level-H2.Level)*4) + H.EmpName
from Hierarchy H
inner join Hierarchy H2 on CHARINDEX('.'+H2.EmpName+'.', H.FullyQualifiedName) > 0
where H2.EmpName=@manager
order by H.FullyQualifiedName
```

In the last, here is the complete solution:

```--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
SELECT 1, 'Jacob', NULL UNION ALL
SELECT 2, 'Rui', NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess', 1 UNION ALL
SELECT 5, 'Steve', 1 UNION ALL
SELECT 6, 'Bob', 1 UNION ALL
SELECT 7, 'Smith', 2 UNION ALL
SELECT 8, 'Bobbey', 2 UNION ALL
SELECT 9, 'Steffi', 3 UNION ALL
SELECT 10, 'Bracha', 3 UNION ALL
SELECT 11, 'John', 5 UNION ALL
SELECT 12, 'Michael', 6 UNION ALL
SELECT 13, 'Paul', 6 UNION ALL
SELECT 14, 'Lana', 7 UNION ALL
SELECT 15, 'Johnson', 7 UNION ALL
SELECT 16, 'Mic', 8 UNION ALL
SELECT 17, 'Stev', 8 UNION ALL
SELECT 18, 'Paulson', 9 UNION ALL
SELECT 19, 'Jessica', 10

--Solution starts here
DECLARE @manager VARCHAR(20)
SELECT @manager = 'Jacob'

--CTE
;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
From @Employees E
Where E.ReportsTo is null
Union all
Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
from @Employees E
inner join Hierarchy H on H.EmpID=E.ReportsTo
)

--Result
Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
```