T-SQL: Referring computed/calculated columns in the where and group by clause

I previously blogged about how we can write more manageable and readable TSQL queries using CROSS APPLY operator here. This small post is a follow-up and will describe another scenario. Suppose we have a computed column that is a result of some complex calculation and we want to refer it in a where clause (or a group by clause). This is not straight forward since we can’t use the computed column directly in the where clause like this:

select 
  column1
  ,column2 
  ,computedcolumn = 
    Case 
      when expression1 = true then column1 * ( column4 - column 5 )
      when expression2 = true then column2 * ( column5 - column 6 )
      when expression3 = true then column3 * ( column6 - column 7 )
      else column4 - column 5
    end
from SomeTable
where computedColumn < 100

The reason is that the Where part of the query is evaluated before the Select part. For this to work, we need to duplicate the whole computation in the where clause like this:

Where
  (
    Case 
      when expression1 = true then column1 * ( column4 - column 5 )
      when expression2 = true then column2 * ( column5 - column 6 )
      when expression3 = true then column3 * ( column6 - column 7 )
      else column4 - column 5
    end
  ) < 100

Luckily, the APPLY operator comes to our rescue again. We just need to move our calculation to a CROSS APPLY section and we can refer to the calculated column by its alias in the where (as well as group by) clause:

select 
  column1
  ,column2, 
  ,computedcolumn
from SomeTable
cross apply
  (
    Select 
      computedColumn = 
        Case 
          when expression1 = true then column1 * ( column4 - column 5 )
          when expression2 = true then column2 * ( column5 - column 6 )
          when expression3 = true then column3 * ( column6 - column 7 )
          else column4 - column 5
        end
  ) ComputatedColumn1Query
where computedColumn < 100

Great! We can also use a CTE or derived table to accomplish the same:

;With TableWithComputedColumnCalculation as
(
  select 
    column1
    ,column2, 
    ,computedcolumn = 
      Case 
        when expression1 = true then column1 * ( column4 - column 5 )
        when expression2 = true then column2 * ( column5 - column 6 )
        when expression3 = true then column3 * ( column6 - column 7 )
        else column4 - column 5
      end
  from SomeTable
)

select * 
from TableWithComputedColumnCalculation
where computedColumn < 100

Notice that all the approaches I mentioned here (CROSS APPLY, CTE, derived table) are going to have the same query plan (and hence performance) and it is just a matter of personal preference which one to use.

T-SQL: Calculating cascading computed columns using CROSS APPLY operator

The powerful APPLY operator in Transact SQL is typically used to invoke a table-valued function for each row. Here’s an MSDN article that describes such usage. But a very interesting usage of the cross apply operator is to produce a much readable code in cascading calculations. I was working on an SSRS report a couple of days back and found it really helpful. Let’s have a look how.

Consider this simple data regarding employees, their regular hours, overtime hours, overtime code and regular rate:

With SampleData (Employee, RegHours, OTHours, OTCode, RegRate) as
(
      Select 'Employee1', 100, 10, 'OT1', 10
      union all
      Select 'Employee2', 100, 10, 'OT2', 10
      union all
      Select 'Employee3', 100, 10, 'OT3', 10
)

Select *
from SampleData

/*
Output:

Employee  RegHours    OTHours     OTCode RegRate
--------- ----------- ----------- ------ -----------
Employee1 100         10          OT1    10
Employee2 100         10          OT2    10
Employee3 100         10          OT3    10
*/

Suppose, we need to come up with the following result:

Employee  RegHours  OTHours  OTCode RegRate OTMultiplier OTRate  OTCost  RegCost TotalHours TotalCost EffectiveRate
--------- --------- -------- ------ ------- ------------ ------- ------- ------- ---------- --------- -------------
Employee1 100       10       OT1    10      1.2          12.0    120.0   1000    110        1120.0    10.18
Employee2 100       10       OT2    10      1.4          14.0    140.0   1000    110        1140.0    10.36
Employee3 100       10       OT3    10      1.5          15.0    150.0   1000    110        1150.0    10.45

Here’s how the above columns need to be calculated:

  1. OT Multiplier: To be calculated from OTType
  2. OT Rate: RegRate * OTMultiplier
  3. OT Cost: OTHours * OTCost
  4. Reg Cost: RegHours * RegCost
  5. Total Hours: RegHours + OTHours
  6. Total Cost: RegCost + OTCost
  7. Effective Rate: TotalCost / TotalHours

In the above calculations, the point to be noted is that almost every next column depends upon the result of a previous one. For example, Effective Rate depends upon Total Cost, that depends upon OT Cost, that depends upon OT Rate, that depends upon OT Multiplier, that depends upon the value in OT Type field.

The most straight forward attempt using a simple TSQL query would look something like:

Select *
  ,OTMultiplier =
    Case
      when OTCode='OT1' then 1.2
      when OTCode='OT2' then 1.4
      when OTCode='OT3' then 1.5
    end
  ,OTRate =
    Case
      when OTCode='OT1' then 1.2 * RegRate
      when OTCode='OT2' then 1.4 * RegRate
      when OTCode='OT3' then 1.5 * RegRate
    end
  , OTCost =
    Case
      when OTCode='OT1' then 1.2 * RegRate * OTHours
      when OTCode='OT2' then 1.4 * RegRate * OTHours
      when OTCode='OT3' then 1.5 * RegRate * OTHours
    end
  , RegCost = RegHours * RegRate
  , TotalHours = RegHours + OTHours
  , TotalCost =
    Case
      when OTCode='OT1' then 1.2 * RegRate * OTHours + RegHours * RegRate
      when OTCode='OT2' then 1.4 * RegRate * OTHours + RegHours * RegRate
      when OTCode='OT3' then 1.5 * RegRate * OTHours + RegHours * RegRate
    End
  , EffectiveRate =
    Case
      when OTCode='OT1' then (1.2 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
      when OTCode='OT2' then (1.4 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
      when OTCode='OT3' then (1.5 * RegRate * OTHours + RegHours * RegRate) / (RegHours + OTHours)
    End

From SampleData

The above query is fairly unreadable and difficult to maintain. It is because the effect of OT Type field is propagated to all the columns. What if a new OT Type is added, or any OT Multiplier logic is changed? We will then need to modify a lot of places. A much better code can be achieved if we use several CTEs or derived tables, but even more readable and maintainable code can be produced by using the cross apply operator. Here’s an example:

Select *
from SampleData
cross apply
(
  select OTMultiplier =
    Case
      when OTCode='OT1' then 1.2
      when OTCode='OT2' then 1.4
      when OTCode='OT3' then 1.5
    end
) as ca_otmultiplier
cross apply
(
  select OTRate = RegRate * OTMultiplier
) as ca_otrate
cross apply
(
  select OTCost = OTHours * OTRate
    , RegCost = RegHours * RegRate
) as ca_otcost
cross apply
(
  select TotalHours = OTHours + RegHours
    , TotalCost = RegCost + OTCost
) as ca_totals
cross apply
(
  select EffectiveRate = TotalCost / TotalHours
) as ca_effective

The great advantage is that the calculated result of every cross apply operator can be used into the next one that follows it. Another advantage as pointed by Brad is that such readability and maintainability comes without any extra cost. So the cross apply approach is a great choice if we have many calculated columns, each of which depends upon the result of previously calculated ones.