TSQL Challenge 12: Completing sequence by inserting missing rows

TSQL Challenge 12 was a relatively easier one. The participants were given month-wise score values and were asked to complete the sequence by creating entries for missing month.

Here’s the sample input:

YearMonth   Score
----------- -----------
200903      100
200803      95
200802      99
200801      100
200711      100

And here’s the desired output. Notice that the score of last month is replicated in each of the missing rows:

YearMonth   Score
----------- -----------
200908      100
200907      100
200906      100
200905      100
200904      100
200903      100
200902      95
200901      95
200812      95
200811      95
200810      95
200809      95
200808      95
200807      95
200806      95
200805      95
200804      95
200803      95
200802      99
200801      100
200712      100
200711      100

The script to generate sample data is provided below:

DECLARE @Scores TABLE
(
	YearMonth	INT,
	Score		INT
)

INSERT @Scores VALUES(200903, 100)
INSERT @Scores VALUES(200803, 95)
INSERT @Scores VALUES(200802, 99)
INSERT @Scores VALUES(200801 ,100)
INSERT @Scores VALUES(200711, 100)

Solution

I once blogged about creating a sequence of numbers/dates using a recursive CTE in this post. The same technique can be used here. However, since the YearMonth column in the sample data is integer, we have two choices:

  1. Convert it to DateTime and apply T-SQL DateTime functions
  2. Leave it as Integer and apply some intelligent arithmetic

I am providing both the solutions here. Note that the first solution will be slower due to overhead of casting and applying T-SQL scalar functions.

The first solution: Converting to DateTime

;with cte as
(
    select score, Cast(Cast(YearMonth as varchar)+'01' as datetime) as dateVal
        from @scores
    union all
    select score, dateadd(month, 1, dateval)
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select 1 from @scores s where s.YearMonth = cast( left(convert(varchar, dateadd(month, 1, cte.dateval), 112), 6) as int) )
            --stop at current month
            and dateadd(month, 1, cte.dateval) < getdate()
)

select left(convert(varchar, dateval, 112), 6) as yearmonth, score
from cte
order by dateval desc

Explanation:
Here I am simply converting the integer YearMonth column to a datetime dateval column by appending 01 to the end (so a 200901 becomes 20090101 that can easily be cast to a dateTime) and then finding subsequent dates by adding one month in each CTE iteration.

The second solution: Integer Arithmetic

;with cte as
(
    select YearMonth, Score
		from @Scores
    union all
    select YearMonth + YearMonth % 100 / 12 * 88 + 1 as YearMonth, Score
        from cte
        where not exists
            --the resultant YearMonth value should not lie in the original table
            ( select s.YearMonth from @Scores s where s.YearMonth = (cte.YearMonth + cte.YearMonth % 100 / 12 * 88 + 1) )
            --stop at current month
            and cte.YearMonth < month(getdate()) + year(getdate())*100
)

select *
from cte
order by YearMonth desc

Explanation:
The important point is to increment the value of YearMonth correctly. So 200811 should get incremented to 200812 but 200812 should get incremented to 200901. This isn’t difficult if we introduce a case statement like this:

YearMonth + (Case When YearMonth%100 < 12 Then 1 Else 89 End)

But I wanted to do this purely using arithmetic with no Case statements, so I came up with this formula:

(YearMonth % 100 / 12 * 88) + 1

Note that the factor (YearMonth % 100 / 12 * 88) will reduce to zero for all values from January to November, i.e. from 200801 to 200811.

I hope you enjoyed the solution.

Update:
It was pointed out in one of the comments by Rakesh that the solution could reach the default limit of recursion which is 100. In order to avoid this, we need to add option (maxrecursion 12,000) in the final select statement. Then, we can have 10,000 years missing between two adjacent entries. Thanks, Rakesh.

...
select *
from cte
order by YearMonth desc
option (maxrecursion 12,000)
Advertisements

4 Responses to “TSQL Challenge 12: Completing sequence by inserting missing rows”

  1. rakesh Says:

    Your Solution is simple and Superb. At the same time, Your Solution won’t work if the mininum YearMonth value contains Year 1999 or less than that year ( Max Recursion will be reached )..Can you modify it.

  2. rakesh Says:

    I had a solution with me but may not be as good as yours..Please go through it..

    DECLARE @Scores TABLE
    (
    YearMonth INT,
    Score INT
    )

    INSERT @Scores VALUES(200903, 100)
    INSERT @Scores VALUES(200803, 95)
    INSERT @Scores VALUES(200802, 99)
    INSERT @Scores VALUES(200801 ,100)
    INSERT @Scores VALUES(190011, 50)

    DECLARE @current INT , @Least INT
    SELECT @current = CONVERT(VARCHAR(6), GETDATE(), 112)

    SELECT @Least = CONVERT(VARCHAR(6), MIN(YearMonth), 112) FROM @Scores

    WHILE ( @current >= @Least )
    BEGIN

    IF NOT EXISTS ( SELECT YearMonth FROM @Scores WHERE YearMonth = @Least )
    BEGIN
    INSERT INTO @Scores ( YearMonth )
    SELECT @Least

    UPDATE S
    SET S.Score = t1.Score
    FROM @Scores s
    INNER JOIN @Scores t1 ON T1.YearMonth = ( CASE WHEN RIGHT(CAST(S.YearMonth AS VARCHAR(6)),2) = ’01’ THEN CAST(CAST(LEFT(CAST(S.YearMonth AS VARCHAR(10)),4) AS INT)-1 AS VARCHAR(4))+’12’ ELSE S.YearMonth-1 END )
    WHERE S.Score IS NULL
    END

    SELECT @Least = @Least+1 WHERE CAST(RIGHT(CAST( @Least AS VARCHAR(10)),2) AS INT) < 13

    SELECT @Least = CAST(CAST(LEFT(CAST(@Least AS VARCHAR(10)),4) AS INT)+1 AS VARCHAR(4))+'01' WHERE CAST(RIGHT(CAST( @Least AS VARCHAR(10)),2) AS INT)=13
    END

    SELECT YearMonth,Score FROM @Scores ORDER BY YearMonth DESC

  3. Syed Mehroz Alam Says:

    Great observation regarding recursion limit, Rakesh. I have updated the post. Thanks for pointing out.

    Also, thanks for sharing your solution. But I am afraid TSQL Challenges team will not regard it as a “set based” solution due to while loop and update queries. Have a look at the guidelines in this post.

  4. Winning TSQL Challenge 12 « Mehroz’s Experiments Says:

    […] a source of great pleasure for me that I am included in the winners for TSQL challenge 12. Here’s my post describing my solution and here’s the analysis by Jacob Sebastian, the founder and […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: