TSQL Challenge 9: Getting longest chain of consecutive alike/duplicate rows

The BeyondRelational team is working hard to present us with cool TSQL challenges. For challenge 9, the contestants were required to find the first and last IDs for consecutive rows with same values of Send and Ack states. That is, given the following as an input,

Input

We were required to produce the following output.

Output

An important part of the challenge was to write a scalable query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:

;With Result
as
(
	Select
	(Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) from @tc9 C
			where
				C.id <= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MinID
	,
	(Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) from @tc9 C
			where
				C.id >= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MaxID
	,A.SendState, A.AckState
	From @tc9 A
)

Select distinct C.MinID, C.MaxID, C.SendState, C.AckState
from Result C
order by C.MinID

The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.

Luckily, T-SQL 2005 presents us with ranking funcitons that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.

;With RowNoPerStateCombination
As
(
    Select
        *,
        --assign a row number grouped per SendState/AckState combination
        ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID
    From @tc9
)

Select * from RowNoPerStateCombination
order by ID

The output is

RowIDs

Notice a great pattern here: if we try to subtract this generated RowID from the primary key(ID), it is going to give us a unique result for every consecutive send/ack state combination. This is depicted using the following screenshot from Excel.
Pattern

Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:

;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

And we get the required output.

Output

Finally, here’s my complete solution.

--populate sample data
DECLARE @tc9 TABLE(
    ID INT IDENTITY(1,1),
    CreationDate DATETIME,
    Content NVARCHAR(10),
    SendState BIT,
    AckState BIT
)

INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1

--SELECT * FROM @tc9

--solution
Declare @startTime datetime
Declare @endTime datetime

set @startTime = GetDate()-20.8
set @endTime = GetDate()

;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)
Advertisements

4 Responses to “TSQL Challenge 9: Getting longest chain of consecutive alike/duplicate rows”

  1. PKG Says:

    Hi,

    Like your solution here, but I seem to run into a problem with a similar situation:

    If you add one more row after the last row of data:

    SELECT GETDATE(),’Msg #11′,1,0

    This last row would get a RowID = 5, and 11 – 5 = 6, which is no longer a unique group ID.

    Please let me know if I’m missing something obvious, or if you know of a solution, as I’m pretty stuck on this at the moment!

    Thanks

    PKG

  2. Mayank Says:

    Hi
    I really appreciate this solution.

    Thanks,

  3. phen375 reviews Says:

    I think this is among the most vital info for me. And i am glad reading your article.
    But want to remark on some general things, The site style is ideal,
    the articles is really nice : D. Good job, cheers

  4. Nolalala Says:

    First commenter is right. The suggested solution is simply wrong!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: