The powerful APPLY operator in Transact SQL is typically used to invoke a table-valued function for each row. Here’s an MSDN article that describes such usage. But a very interesting usage of the cross apply operator is to produce a much readable code in cascading calculations. I was working on an SSRS report a couple of days back and found it really helpful. Let’s have a look how.
Consider this simple data regarding employees, their regular hours, overtime hours, overtime code and regular rate:
With SampleData (Employee, RegHours, OTHours, OTCode, RegRate) as
(
Select 'Employee1', 100, 10, 'OT1', 10
union all
Select 'Employee2', 100, 10, 'OT2', 10
union all
Select 'Employee3', 100, 10, 'OT3', 10
)
Select *
from SampleData
/*
Output:
Employee RegHours OTHours OTCode RegRate
--------- ----------- ----------- ------ -----------
Employee1 100 10 OT1 10
Employee2 100 10 OT2 10
Employee3 100 10 OT3 10
*/
Suppose, we need to come up with the following result:
Employee RegHours OTHours OTCode RegRate OTMultiplier OTRate OTCost RegCost TotalHours TotalCost EffectiveRate --------- --------- -------- ------ ------- ------------ ------- ------- ------- ---------- --------- ------------- Employee1 100 10 OT1 10 1.2 12.0 120.0 1000 110 1120.0 10.18 Employee2 100 10 OT2 10 1.4 14.0 140.0 1000 110 1140.0 10.36 Employee3 100 10 OT3 10 1.5 15.0 150.0 1000 110 1150.0 10.45
Here’s how the above columns need to be calculated:
- OT Multiplier: To be calculated from OTType
- OT Rate: RegRate * OTMultiplier
- OT Cost: OTHours * OTCost
- Reg Cost: RegHours * RegCost
- Total Hours: RegHours + OTHours
- Total Cost: RegCost + OTCost
- Effective Rate: TotalCost / TotalHours
In the above calculations, the point to be noted is that almost every next column depends upon the result of a previous one. For example, Effective Rate depends upon Total Cost, that depends upon OT Cost, that depends upon OT Rate, that depends upon OT Multiplier, that depends upon the value in OT Type field.
The most straight forward attempt using a simple TSQL query would look something like:
Select *
,OTMultiplier =
Case
when OTCode='OT1' then 1.2
when OTCode='OT2' then 1.4
when OTCode='OT3' then 1.5
end
,OTRate =
Case
when OTCode='OT1' then 1.2 * RegRate
when OTCode='OT2' then 1.4 * RegRate
when OTCode='OT3' then 1.5 * RegRate
end
, OTCost =
Case
when OTCode='OT1' then 1.2 * RegRate * OTHours
when OTCode='OT2' then 1.4 * RegRate * OTHours
when OTCode='OT3' then 1.5 * RegRate * OTHours
end
, RegCost = RegHours * RegRate
, TotalHours = RegHours + OTHours
, TotalCost =
Case
when OTCode='OT1' then 1.2 * RegRate * OTHours + RegHours * RegRate
when OTCode='OT2' then 1.4 * RegRate * OTHours + RegHours * RegRate
when OTCode='OT3' then 1.5 * RegRate * OTHours + RegHours * RegRate
End
, EffectiveRate =
Case
when OTCode='OT1' then (1.2 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
when OTCode='OT2' then (1.4 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
when OTCode='OT3' then (1.5 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
End
From SampleData
The above query is fairly unreadable and difficult to maintain. It is because the effect of OT Type field is propagated to all the columns. What if a new OT Type is added, or any OT Multiplier logic is changed? We will then need to modify a lot of places. A much better code can be achieved if we use several CTEs or derived tables, but even more readable and maintainable code can be produced by using the cross apply operator. Here’s an example:
Select *
from SampleData
cross apply
(
select OTMultiplier =
Case
when OTCode='OT1' then 1.2
when OTCode='OT2' then 1.4
when OTCode='OT3' then 1.5
end
) as ca_otmultiplier
cross apply
(
select OTRate = RegRate * OTMultiplier
) as ca_otrate
cross apply
(
select OTCost = OTHours * OTRate
, RegCost = RegHours * RegRate
) as ca_otcost
cross apply
(
select TotalHours = OTHours + RegHours
, TotalCost = RegCost + OTCost
) as ca_totals
cross apply
(
select EffectiveRate = TotalCost / TotalHours
) as ca_effective
The great advantage is that the calculated result of every cross apply operator can be used into the next one that follows it. Another advantage as pointed by Brad is that such readability and maintainability comes without any extra cost. So the cross apply approach is a great choice if we have many calculated columns, each of which depends upon the result of previously calculated ones.
January 6, 2010 at 11:55 PM
T-SQL: Calculating cascading computed columns using CROSS APPLY operator « Mehroz’s Experiments…
Thank you for submitting this cool story – Trackback from DotNetShoutout…
January 7, 2010 at 1:07 AM
The APPLY operator is by far my favorite… especially in using it as you described in this post. I promote it all the time on the forums.
The REALLY great thing about it is that it makes the query more readable at NO COST WHATSOEVER. If you compare the query plans of your original long hard-to-read-and-maintain query with the CROSS APPLY query, they will be identical in every way.
–Brad
January 7, 2010 at 4:54 PM
That’s great, Brad. Thanks very much for sharing such useful info.
January 16, 2010 at 6:00 PM
[...] http://smehrozalam.wordpress.com/2010/01/06/t-sql-calculating-cascading-computed-columns-using-cross… Posted: 01-16-2010 4:00 PM by Jacob Sebastian | with no comments Filed under: TSQL Beyondrelational.com // // LoadAuthorInfo(); var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."
; document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-9085790-1"); pageTracker._trackPageview(); } catch(err) {} [...]
January 18, 2010 at 2:12 PM
Is Cross apply calculation increase query execution cost for huge data?
January 18, 2010 at 3:07 PM
Hi Vishal,
If you are using cross apply for calculations like this post describes, it should not impose any extra execution cost. You can verify this by looking at the query plan. To further investigate, you can see the query execution statistics by running the following statements before your query:
SET STATISTICS IO ON
SET STATISTICS TIME ON
Hope that helps.
March 5, 2011 at 2:23 AM
Hi there, I am trying to compute cascading date calculations (each date based on prior date, e.g., using dateadd()) and this method is choking on me. I replaced the dates with integers (as a test) and it worked as expected. Put dates back in, and it chokes. Is this a memory issue? Any suggestions how to get around this? Thanks!!
March 5, 2011 at 2:31 AM
Hi Mila…
If your CROSS APPLY is just introducing new columns as Mehroz demonstrated in this blog article, then you shouldn’t have a problem. But if your CROSS APPLY is doing a correlated subquery of some kind, then that can certainly impact performance. Can you give an example of what you are doing?
August 17, 2011 at 12:18 AM
This is a god-send! Thanks very much for this information! Works like a charm and I don’t have to write a bunch of crazy functions.
December 17, 2011 at 4:58 AM
I’m getting the following error after using a number of cross apply’s
Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.
Has anyone else come across this and got a work around?