It’s a source of great pleasure for me that I am included in the winners for TSQL challenge 12.

Here’s my post describing my solution and here’s the analysis by Jacob Sebastian, the founder and president of TSQL Challenges.

It’s a source of great pleasure for me that I am included in the winners for TSQL challenge 12.

Here’s my post describing my solution and here’s the analysis by Jacob Sebastian, the founder and president of TSQL Challenges.

Advertisements

TSQL Challenge 12 was a relatively easier one. The participants were given month-wise score values and were asked to complete the sequence by creating entries for missing month.

Here’s the sample input:

YearMonth Score ----------- ----------- 200903 100 200803 95 200802 99 200801 100 200711 100

And here’s the desired output. Notice that the score of last month is replicated in each of the missing rows:

YearMonth Score ----------- ----------- 200908 100 200907 100 200906 100 200905 100 200904 100 200903 100 200902 95 200901 95 200812 95 200811 95 200810 95 200809 95 200808 95 200807 95 200806 95 200805 95 200804 95 200803 95 200802 99 200801 100 200712 100 200711 100

The script to generate sample data is provided below:

DECLARE @Scores TABLE ( YearMonth INT, Score INT ) INSERT @Scores VALUES(200903, 100) INSERT @Scores VALUES(200803, 95) INSERT @Scores VALUES(200802, 99) INSERT @Scores VALUES(200801 ,100) INSERT @Scores VALUES(200711, 100)

**Solution**

I once blogged about creating a sequence of numbers/dates using a recursive CTE in this post. The same technique can be used here. However, since the `YearMonth`

column in the sample data is integer, we have two choices:

- Convert it to DateTime and apply T-SQL DateTime functions
- Leave it as Integer and apply some intelligent arithmetic

I am providing both the solutions here. Note that the first solution will be slower due to overhead of casting and applying T-SQL scalar functions.

**The first solution: Converting to DateTime**

;with cte as ( select score, Cast(Cast(YearMonth as varchar)+'01' as datetime) as dateVal from @scores union all select score, dateadd(month, 1, dateval) from cte where not exists --the resultant YearMonth value should not lie in the original table ( select 1 from @scores s where s.YearMonth = cast( left(convert(varchar, dateadd(month, 1, cte.dateval), 112), 6) as int) ) --stop at current month and dateadd(month, 1, cte.dateval) < getdate() ) select left(convert(varchar, dateval, 112), 6) as yearmonth, score from cte order by dateval desc

**Explanation:**

Here I am simply converting the integer `YearMonth`

column to a datetime `dateval`

column by appending 01 to the end (so a `200901`

becomes `20090101`

that can easily be cast to a dateTime) and then finding subsequent dates by adding one month in each CTE iteration.

**The second solution: Integer Arithmetic**

;with cte as ( select YearMonth, Score from @Scores union all select YearMonth + YearMonth % 100 / 12 * 88 + 1 as YearMonth, Score from cte where not exists --the resultant YearMonth value should not lie in the original table ( select s.YearMonth from @Scores s where s.YearMonth = (cte.YearMonth + cte.YearMonth % 100 / 12 * 88 + 1) ) --stop at current month and cte.YearMonth < month(getdate()) + year(getdate())*100 ) select * from cte order by YearMonth desc

**Explanation:**

The important point is to increment the value of `YearMonth`

correctly. So `200811`

should get incremented to `200812`

but `200812`

should get incremented to `200901`

. This isn’t difficult if we introduce a case statement like this:

`YearMonth + (Case When YearMonth%100 < 12 Then 1 Else 89 End)`

But I wanted to do this purely using arithmetic with no `Case`

statements, so I came up with this formula:

`(YearMonth % 100 / 12 * 88) + 1`

Note that the factor `(YearMonth % 100 / 12 * 88)`

will reduce to zero for all values from January to November, i.e. from `200801`

to `200811`

.

I hope you enjoyed the solution.

**Update:**

It was pointed out in one of the comments by Rakesh that the solution could reach the default limit of recursion which is 100. In order to avoid this, we need to add `option (maxrecursion 12,000)`

in the final select statement. Then, we can have 10,000 years missing between two adjacent entries. Thanks, Rakesh.

... select * from cte order by YearMonth desc option (maxrecursion 12,000)

TSQL Challenge 11 was a practical problem. Given a a list of products and a list of discount coupons, we needed to find the minimum price for all the products based on certain rules. Here are those rules:

- Maximum two coupons can be applied on the same product
- The discount price can not be less than 70% of the original price
- The total amount of the discount can not exceed 30$

Also, note that coupons are applied in a cumulative way. So the second coupon is applied on the result of the original price + first coupon.

**Sample Data:**

Here is some sample products data:

ID NAME PRICE -- ------- --------- 1 PROD 1 100,00 2 PROD 2 220,00 3 PROD 3 15,00 4 PROD 4 70,00 5 PROD 5 150,00

Here are the coupons to be applied

ID NAME VALUE IS_PERCENT -- ----------- ------ ---------- 1 CP 1 : -15$ 15 0 2 CP 2 : -5$ 5 0 3 CP 3 : -10% 10 1 4 CP 4 : -12$ 12 0

And here’s the required output:

ID NAME PRICE DISC_PRICE TOT_DISC RATE COUPON_NAMES 2.-- ------ -------- ----------- --------- ------- ------------------------- 3.1 PROD 1 100.00$ 73.00$ 27.00$ 27.00% CP 4 : -12$ + CP 1 : -15$ 4.2 PROD 2 220.00$ 193.00$ 27.00$ 12.27% CP 4 : -12$ + CP 1 : -15$ 5.3 PROD 3 15.00$ 13.50$ 1.50$ 10.00% CP 3 : -10% 6.4 PROD 4 70.00$ 49.50$ 20.50$ 29.28% CP 1 : -15$ + CP 3 : -10% 7.5 PROD 5 150.00$ 120.00$ 30.00$ 20.00% CP 3 : -10% + CP 1 : -15$

**Solution**

Interesting enough… So lets attempt to find a solution. Read the rest of this entry »

TSQL Challenge 10 was an interesting one. We needed to sort a result set horizontally as well as vertically. That is, given the following as an input:

We need to write a query that :

- Sort the values horizontally: Arrange the values from smallest to the largest. for example, the first row contains values “2”, “1” and “3”, it should be arranged as “1”, “2” and “3”.
- Sort the rows vertically: This is the regular sorting that we are familiar with.
- Remove duplicates: Duplicate rows should be removed from the final output.

And here’s the expected output:

I tackled the problem systematically in series of steps, each represented by a CTE:

- First, I assigned a rowID to each row of the input
- Then I merged the three columns into one using unpivot transformation
- Then I sorted that merged result grouped by rowID (horizontal sorting)
- Then I used pivot to split the values into three columns based on that sorting
- Finally I sorted the result (vertical sorting) and displayed distinct rows

Here are the CTE definitions along with the result at each step: Read the rest of this entry »

The BeyondRelational team is working hard to present us with cool TSQL challenges. For challenge 9, the contestants were required to find the first and last IDs for consecutive rows with same values of Send and Ack states. That is, given the following as an input,

We were required to produce the following output.

An important part of the challenge was to write a **scalable** query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:

;With Result as ( Select (Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) from @tc9 C where C.id <= A.id and (C.SendState<>A.SendState or C.AckState<>A.AckState) ) as MinID , (Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) from @tc9 C where C.id >= A.id and (C.SendState<>A.SendState or C.AckState<>A.AckState) ) as MaxID ,A.SendState, A.AckState From @tc9 A ) Select distinct C.MinID, C.MaxID, C.SendState, C.AckState from Result C order by C.MinID

The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.

Luckily, T-SQL 2005 presents us with ranking funcitons that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.

;With RowNoPerStateCombination As ( Select *, --assign a row number grouped per SendState/AckState combination ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID From @tc9 ) Select * from RowNoPerStateCombination order by ID

The output is

Notice a great pattern here: if we try to subtract this generated RowID from the primary key(ID), it is going to give us a unique result for every consecutive send/ack state combination. This is depicted using the following screenshot from Excel.

Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:

;With MessageGroups as ( Select *, --assign a unique group number for each consequtive state combination ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID From @tc9 Where CreationDate between @startTime and @endTime ) Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState From MessageGroups Group by GroupID, SendState, AckState Order by MIN(ID)

And we get the required output.

Finally, here’s my complete solution.

--populate sample data DECLARE @tc9 TABLE( ID INT IDENTITY(1,1), CreationDate DATETIME, Content NVARCHAR(10), SendState BIT, AckState BIT ) INSERT INTO @tc9 (CreationDate,Content,SendState,AckState) SELECT GETDATE()-1.0,'Msg #1',0,0 UNION SELECT GETDATE()-0.9,'Msg #2',0,0 UNION SELECT GETDATE()-0.8,'Msg #3',1,1 UNION SELECT GETDATE()-0.7,'Msg #4',1,1 UNION SELECT GETDATE()-0.6,'Msg #5',1,1 UNION SELECT GETDATE()-0.5,'Msg #6',1,0 UNION SELECT GETDATE()-0.4,'Msg #7',1,0 UNION SELECT GETDATE()-0.3,'Msg #8',1,0 UNION SELECT GETDATE()-0.2,'Msg #9',1,0 UNION SELECT GETDATE()-0.1,'Msg #10',1,1 --SELECT * FROM @tc9 --solution Declare @startTime datetime Declare @endTime datetime set @startTime = GetDate()-20.8 set @endTime = GetDate() ;With MessageGroups as ( Select *, --assign a unique group number for each consequtive state combination ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID From @tc9 Where CreationDate between @startTime and @endTime ) Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState From MessageGroups Group by GroupID, SendState, AckState Order by MIN(ID)

Last week, I submitted an entry for T-SQL challenge 8. This time, the contestants were asked to process a hierarchy using recursive CTEs without applying any filters inside the CTE. The big challenge was the condition that the CTE should not contain any filter for a specific manager.

Here is some code for populating test data.

--Populate test data DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT) INSERT INTO @Employees(EmpID, EmpName, ReportsTo) SELECT 1, 'Jacob', NULL UNION ALL SELECT 2, 'Rui', NULL UNION ALL SELECT 3, 'Jacobson', NULL UNION ALL SELECT 4, 'Jess', 1 UNION ALL SELECT 5, 'Steve', 1 UNION ALL SELECT 6, 'Bob', 1 UNION ALL SELECT 7, 'Smith', 2 UNION ALL SELECT 8, 'Bobbey', 2 UNION ALL SELECT 9, 'Steffi', 3 UNION ALL SELECT 10, 'Bracha', 3 UNION ALL SELECT 11, 'John', 5 UNION ALL SELECT 12, 'Michael', 6 UNION ALL SELECT 13, 'Paul', 6 UNION ALL SELECT 14, 'Lana', 7 UNION ALL SELECT 15, 'Johnson', 7 UNION ALL SELECT 16, 'Mic', 8 UNION ALL SELECT 17, 'Stev', 8 UNION ALL SELECT 18, 'Paulson', 9 UNION ALL SELECT 19, 'Jessica', 10

I started by creating a recursive CTE for the whole table like this:

;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName) As ( Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX)) From @Employees E Where E.ReportsTo is null Union all Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.' from @Employees E inner join Hierarchy H on H.EmpID=E.ReportsTo ) Select Space(Level*4) + H.EmpName from Hierarchy H order by H.FullyQualifiedName

Notice that I am constructing a ** FullyQualifiedName** value for each row. This value consists of full path from root to the current person. This approach helped me in filtering the records in the final select statement where I just needed to look for a

`'.' + ManagerName + '.'`

Select Space(Level*4) + H.EmpName from Hierarchy H where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0 order by H.FullyQualifiedName

The only issue that remains is that when the query is run for a non-parent employee, we get extra spaces in the beginning. E.g. when we run the query for “Paul”, we get an output like:

I solved it by inserted a sub-query in the space() expression like this:

Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName from Hierarchy H where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0 order by H.FullyQualifiedName

Instead of two subqueries, I tried to use a `join`

but the performance using sub-queries was better so I submitted the subquery solution. Here’s the join version I am referring, it looks very nice and clean:

Select Space((H.Level-H2.Level)*4) + H.EmpName from Hierarchy H inner join Hierarchy H2 on CHARINDEX('.'+H2.EmpName+'.', H.FullyQualifiedName) > 0 where H2.EmpName=@manager order by H.FullyQualifiedName

In the last, here is the complete solution:

--Populate test data DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT) INSERT INTO @Employees(EmpID, EmpName, ReportsTo) SELECT 1, 'Jacob', NULL UNION ALL SELECT 2, 'Rui', NULL UNION ALL SELECT 3, 'Jacobson', NULL UNION ALL SELECT 4, 'Jess', 1 UNION ALL SELECT 5, 'Steve', 1 UNION ALL SELECT 6, 'Bob', 1 UNION ALL SELECT 7, 'Smith', 2 UNION ALL SELECT 8, 'Bobbey', 2 UNION ALL SELECT 9, 'Steffi', 3 UNION ALL SELECT 10, 'Bracha', 3 UNION ALL SELECT 11, 'John', 5 UNION ALL SELECT 12, 'Michael', 6 UNION ALL SELECT 13, 'Paul', 6 UNION ALL SELECT 14, 'Lana', 7 UNION ALL SELECT 15, 'Johnson', 7 UNION ALL SELECT 16, 'Mic', 8 UNION ALL SELECT 17, 'Stev', 8 UNION ALL SELECT 18, 'Paulson', 9 UNION ALL SELECT 19, 'Jessica', 10 --Solution starts here DECLARE @manager VARCHAR(20) SELECT @manager = 'Jacob' --CTE ;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName) As ( Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX)) From @Employees E Where E.ReportsTo is null Union all Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.' from @Employees E inner join Hierarchy H on H.EmpID=E.ReportsTo ) --Result Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName from Hierarchy H where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0 order by H.FullyQualifiedName

A few weeks ago, I stumbled upon this blog that presents cool T-SQL challenges. I submitted an entry for challenge 7 that asked to write the shortest script to list the 5 biggest tables on a server. Here’s my solution:

create table #temp ( [database] nvarchar(MAX), [table] nvarchar(MAX), [rows] int, [reserved_size] nvarchar(100), [data_size] nvarchar(100), [index_size] nvarchar(100), [unused_space] nvarchar(100) ) declare @sql nvarchar(MAX) set @sql=replace('if !~! not in (!master!,!model!,!msdb!,!tempdb!) exec [~].dbo.sp_msforeachtable "insert into #temp([table], [rows], [reserved_size], [data_size], [index_size], [unused_space]) exec [~].dbo.sp_spaceused !?!"','!',char(39)) EXEC sp_MSForEachDB @command1=@sql, @command2="update #temp set [database]='~' where [database] is null", @replacechar='~' select top(5) [database] as base, [table], [data_size] as size, [rows] as rows from #temp order by Cast(LEFT([data_size],len([data_size])-3) as int) desc drop table #temp

So, I started by creating a temporary table with columns (`database, table, rows, reserved_size, data_size, index_size, unused_space`

) for the output. I used the two undocumented stored procedures `sp_MSforeachdb`

and `sp_MSforeachtable`

to iterate through all the tables in all the databases and executed sp_spaceused as described in the following pseudo code:

foreach(database db in serverDatabases) if (db not in 'master', 'msdb', 'model', 'tempdb') foreach(table t in db.Tables) { insert into #temp (table, rows, reserved_size, data_size, index_size, unused_space) execute sp_spaceused for table 't' --at this point, our #temp table will be populated with data for each table --but the 'database' column will be 'null', so now replace it with the name of database update #temp set [database] = 'db' where [database] is null }

The most important part is that I am using an update operation for storing the database name in the temporary table. Thanks to Microsoft that we can give a set of 3 commands to the above mentioned undocumented stored procedures. Another hard part was to create a single t-sql statement that iterates for all tables inside a database and execute sp_spaceused. I did this by a complex combination of single quotes, double quotes and the replace function. In the last, I am just selecting the top(5) rows ordered by size.

To enter the contest, I reduced the script length by replacing all the variable/column names with a single length identifier. Here was my final submission:

create table #t(d nvarchar(MAX),t nvarchar(MAX),r int,x nvarchar(100),s nvarchar(100),y nvarchar(100),z nvarchar(100)) declare @s nvarchar(MAX) set @s=replace('if !~! not in (!master!,!model!,!msdb!,!tempdb!) exec [~].dbo.sp_msforeachtable "insert into #t(t, r,x,s,y,z) exec [~].dbo.sp_spaceused !?!"','!',char(39)) EXEC sp_MSForEachDB @command1=@s, @command2="update #t set d='~' where d is null", @replacechar='~' select top(5) d as base, t as [table], s as size, r as rows from #t order by Cast(LEFT(s,len(s)-3) as int) desc drop table #t

Let’s wait and see the solution of other players.