LINQ: How to get the latest/last record with a group by clause

Sometimes, we need to select a record that has the maximum or minimum value from a group. Some examples are:

  • We have a PersonOrders table and we want to find the last Order for each Person
  • We have a UserOperations table and we need to find theĀ first Operation for each User
  • We have a Persons table and we need to find the person with maximum salary for each Department

Such scenarios are beautifully handled by TSQL’s Ranking_functions over partition by clauses. Here’s a typical example that uses Row_Number() function to assign a rank to each row per Person.

;With PersonOrderWithRank 
as
(
	Select *, Rnk = ROW_NUMBER() over (partition by PersonID order by OrderDate desc)
	from PersonOrders
)

Select * 
from PersonOrderWithRank
where Rnk=1

In LINQ, similar result can be achieved by using the let keyword. Here’s an example:

from p in PersonOrders
//where conditions or joins with other tables to be included here
group p by p.PersonID into grp
let MaxOrderDatePerPerson = grp.Max ( g=>g.OrderDate )

from p in grp
where p.OrderDate == MaxOrderDatePerPerson
select p

Another more compact method would be to retrieve the first record in the grouping like this:

from p in PersonOrders
//where conditions or joins with other tables to be included here
group p by p.PersonID into grp
select grp.OrderByDescending(g=>g.OrderDate).First()

The above LINQ approaches can be used for both LINQ To SQL as well as Entity Framework. Although the SQL and the LINQ approaches I described above are not exactly identical (since I used Row_Number() and not Rank() or Dense_Rank() ) but the purpose of this post is to provide an starting point to write similar queries using LINQ.

TSQL Challenge 9: Getting longest chain of consecutive alike/duplicate rows

The BeyondRelational team is working hard to present us with cool TSQL challenges. For challenge 9, the contestants were required to find the first and last IDs for consecutive rows with same values of Send and Ack states. That is, given the following as an input,

Input

We were required to produce the following output.

Output

An important part of the challenge was to write a scalable query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:

;With Result
as
(
	Select
	(Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) from @tc9 C
			where
				C.id <= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MinID
	,
	(Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) from @tc9 C
			where
				C.id >= A.id and
				(C.SendState<>A.SendState or C.AckState<>A.AckState)
			) as MaxID
	,A.SendState, A.AckState
	From @tc9 A
)

Select distinct C.MinID, C.MaxID, C.SendState, C.AckState
from Result C
order by C.MinID

The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.

Luckily, T-SQL 2005 presents us with ranking funcitons that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.

;With RowNoPerStateCombination
As
(
    Select
        *,
        --assign a row number grouped per SendState/AckState combination
        ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID
    From @tc9
)

Select * from RowNoPerStateCombination
order by ID

The output is

RowIDs

Notice a great pattern here: if we try to subtract this generated RowID from the primary key(ID), it is going to give us a unique result for every consecutive send/ack state combination. This is depicted using the following screenshot from Excel.
Pattern

Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:

;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

And we get the required output.

Output

Finally, here’s my complete solution.

--populate sample data
DECLARE @tc9 TABLE(
    ID INT IDENTITY(1,1),
    CreationDate DATETIME,
    Content NVARCHAR(10),
    SendState BIT,
    AckState BIT
)

INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1

--SELECT * FROM @tc9

--solution
Declare @startTime datetime
Declare @endTime datetime

set @startTime = GetDate()-20.8
set @endTime = GetDate()

;With MessageGroups
as
(
    Select
        *,
        --assign a unique group number for each consequtive state combination
        ID - ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)