TSQL Challenge 12: Completing sequence by inserting missing rows

TSQL Challenge 12 was a relatively easier one. The participants were given month-wise score values and were asked to complete the sequence by creating entries for missing month.

Here’s the sample input:

YearMonth   Score
----------- -----------
200903      100
200803      95
200802      99
200801      100
200711      100

And here’s the desired output. Notice that the score of last month is replicated in each of the missing rows:

YearMonth   Score
----------- -----------
200908      100
200907      100
200906      100
200905      100
200904      100
200903      100
200902      95
200901      95
200812      95
200811      95
200810      95
200809      95
200808      95
200807      95
200806      95
200805      95
200804      95
200803      95
200802      99
200801      100
200712      100
200711      100

The script to generate sample data is provided below:

DECLARE @Scores TABLE
(
	YearMonth	INT,
	Score		INT
)

INSERT @Scores VALUES(200903, 100)
INSERT @Scores VALUES(200803, 95)
INSERT @Scores VALUES(200802, 99)
INSERT @Scores VALUES(200801 ,100)
INSERT @Scores VALUES(200711, 100)

Solution

I once blogged about creating a sequence of numbers/dates using a recursive CTE in this post. The same technique can be used here. However, since the YearMonth column in the sample data is integer, we have two choices:

  1. Convert it to DateTime and apply T-SQL DateTime functions
  2. Leave it as Integer and apply some intelligent arithmetic

I am providing both the solutions here. Note that the first solution will be slower due to overhead of casting and applying T-SQL scalar functions.

The first solution: Converting to DateTime

;with cte as
(
    select score, Cast(Cast(YearMonth as varchar)+'01' as datetime) as dateVal
        from @scores
    union all
    select score, dateadd(month, 1, dateval)
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select 1 from @scores s where s.YearMonth = cast( left(convert(varchar, dateadd(month, 1, cte.dateval), 112), 6) as int) )
            --stop at current month
            and dateadd(month, 1, cte.dateval) < getdate()
)

select left(convert(varchar, dateval, 112), 6) as yearmonth, score
from cte
order by dateval desc

Explanation:
Here I am simply converting the integer YearMonth column to a datetime dateval column by appending 01 to the end (so a 200901 becomes 20090101 that can easily be cast to a dateTime) and then finding subsequent dates by adding one month in each CTE iteration.

The second solution: Integer Arithmetic

;with cte as
(
    select YearMonth, Score
		from @Scores
    union all
    select YearMonth + YearMonth % 100 / 12 * 88 + 1 as YearMonth, Score
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select s.YearMonth from @Scores s where s.YearMonth = (cte.YearMonth + cte.YearMonth % 100 / 12 * 88 + 1) )
            --stop at current month
            and cte.YearMonth < month(getdate()) + year(getdate())*100
)

select *
from cte
order by YearMonth desc

Explanation:
The important point is to increment the value of YearMonth correctly. So 200811 should get incremented to 200812 but 200812 should get incremented to 200901. This isn’t difficult if we introduce a case statement like this:

YearMonth + (Case When YearMonth%100 < 12 Then 1 Else 89 End)

But I wanted to do this purely using arithmetic with no Case statements, so I came up with this formula:

(YearMonth % 100 / 12 * 88) + 1

Note that the factor (YearMonth % 100 / 12 * 88) will reduce to zero for all values from January to November, i.e. from 200801 to 200811.

I hope you enjoyed the solution.

Update:
It was pointed out in one of the comments by Rakesh that the solution could reach the default limit of recursion which is 100. In order to avoid this, we need to add option (maxrecursion 12,000) in the final select statement. Then, we can have 10,000 years missing between two adjacent entries. Thanks, Rakesh.

...
select *
from cte
order by YearMonth desc
option (maxrecursion 12,000)