The BeyondRelational team is working hard to present us with cool TSQL challenges. For challenge 9, the contestants were required to find the first and last IDs for consecutive rows with same values of Send and Ack states. That is, given the following as an input,
We were required to produce the following output.
An important part of the challenge was to write a scalable query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:
;With Result
as
(
Select
(Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) from @tc9 C
where
C.id <= A.id and
(C.SendState<>A.SendState or C.AckState<>A.AckState)
) as MinID
,
(Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) from @tc9 C
where
C.id >= A.id and
(C.SendState<>A.SendState or C.AckState<>A.AckState)
) as MaxID
,A.SendState, A.AckState
From @tc9 A
)
Select distinct C.MinID, C.MaxID, C.SendState, C.AckState
from Result C
order by C.MinID
The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.
Luckily, T-SQL 2005 presents us with ranking funcitons that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.
;With RowNoPerStateCombination
As
(
Select
*,
--assign a row number grouped per SendState/AckState combination
ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID
From @tc9
)
Select * from RowNoPerStateCombination
order by ID
The output is
Notice a great pattern here: if we try to subtract this generated RowID from the primary key(ID), it is going to give us a unique result for every consecutive send/ack state combination. This is depicted using the following screenshot from Excel.
Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:
;With MessageGroups
as
(
Select
*,
--assign a unique group number for each consequtive state combination
ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
From @tc9
Where CreationDate between @startTime and @endTime
)
Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)
And we get the required output.
Finally, here’s my complete solution.
--populate sample data
DECLARE @tc9 TABLE(
ID INT IDENTITY(1,1),
CreationDate DATETIME,
Content NVARCHAR(10),
SendState BIT,
AckState BIT
)
INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1
--SELECT * FROM @tc9
--solution
Declare @startTime datetime
Declare @endTime datetime
set @startTime = GetDate()-20.8
set @endTime = GetDate()
;With MessageGroups
as
(
Select
*,
--assign a unique group number for each consequtive state combination
ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
From @tc9
Where CreationDate between @startTime and @endTime
)
Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)