Too much SQL in SQL

Another day on SQL battlefield brought this form of life:

SELECT a.AccNumber,
       NVL((SELECT SUM(dtInitAmount + ctInitAmount)
             FROM tMediumTable
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID NOT IN (6, 29, 40, 96)
              AND Account_ID = a.Account_ID),
           0) AS saldo,
       NVL((SELECT SUM(dtInitAmount + ctInitAmount)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND Account_ID = a.Account_ID),
           0) AS total_inv,
       NVL((SELECT COUNT(1)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND Account_ID = a.Account_ID),
           0) AS num_of_inv,
       NVL((SELECT AVG(dtInitAmount + ctInitAmount)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND Account_ID = a.Account_ID),
           0) AS average_inv,
       NVL((SELECT SUM(dtInitAmount + ctInitAmount)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND IssueDate >= SYSDATE - 180
              AND Account_ID = a.Account_ID),
           0) AS total_inv_180,
       NVL((SELECT COUNT(1)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND IssueDate >= SYSDATE - 180
              AND Account_ID = a.Account_ID),
           0) AS num_of_inv_180,
       NVL((SELECT AVG(dtInitAmount + ctInitAmount)
             FROM vHeavyView fd
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND (dtInitAmount + ctInitAmount) > 0
              AND IssueDate >= SYSDATE - 180
              AND Account_ID = a.Account_ID),
           0) AS average_inv_180,
       NVL((SELECT SUM(uAmount)
             FROM vHeavyView
            WHERE CancelledDoc_Id IS NULL
              AND CancellingDoc_Id IS NULL
              AND NVL(IsPending, 'N') = 'N'
              AND Type_ID IN (27, 132, 971)
              AND MinDueDate < SYSDATE
              AND (dtInitAmount + ctInitAmount) > 0
              AND Account_ID = a.Account_ID),
           0) AS total_open
  FROM vAnotherHeavyView a
 WHERE a.AccNumber IS NOT NULL

I’m glad you was patient enough to scroll the page down here 😉 As you can see it’s quite long query. There are a lot of nested queries on columns, most of them on heavy view which runs a lot of time. For every row of vAnotherHeavyView there was one scan on tMediumTable and 7 scans on vHeavyView! Explain plan was really terrible. Last time I run the query, it lasts for over 15 hours and didn’t finished. So it was a high time to make some optimization!

If you look carefully you’ll discover some patterns. All queries on vHeavyView have almost the same constraints and the only difference is on IssueDate and MinDueDate only. It would be nice to find a common piece and join the queries together. There are few simple aggregating functions used there: COUNT, SUM and AVG. If we only could scan the vHeavyView once and get functions results for constrained values only…

And here comes the magic! COUNT, SUM and AVG have such a nice feature to ignore NULL values. So if you pass i.e. 4, 12, NULL, 2, NULL values they will get into account only 4, 12 and 2. Having this we can use one more nice construction: CASE. Then we can use something like this:

CASE WHEN <a_constraint> THEN <a_value> ELSE NULL END

Finally we wrap this in aggregate function and using constraint for i.e. total_open column (the last one in query) we get:

AVG(CASE WHEN MinDueDate < SYSDATE THEN uAmount ELSE NULL END)

This way we can scan vHeavyView only once to get values for all columns! Relevant constraints will be applied using CASE for every row of the view. But to this we also need to move nested query out from column declaration and put it into FROM clause. We also have to GROUP it by key column that is used to make join with vAnotherHeavyView (Account_ID). Here is the final result:

SELECT a.AccNumber,
       fds.saldo,
       fdd.total_open,
       fdd.total_inv,
       fdd.num_of_inv,
       fdd.average_inv,
       fdd.total_inv_180,
       fdd.num_of_inv_180,
       fdd.average_inv_180
  FROM vAnotherHeavyView a,
       (SELECT Account_ID,
               AVG(dtInitAmount + ctInitAmount) average_inv,
               COUNT(*) num_of_inv,
               SUM(dtInitAmount + ctInitAmount) total_inv,
               AVG(CASE WHEN IssueDate >= SYSDATE - 180 THEN dtInitAmount + ctInitAmount ELSE NULL END) average_inv_180,
               COUNT(CASE WHEN IssueDate >= SYSDATE - 180 THEN 1 ELSE NULL END) num_of_inv_180,
               SUM(CASE WHEN IssueDate >= SYSDATE - 180 THEN dtInitAmount + ctInitAmount ELSE NULL END) total_inv_180,
               SUM(CASE WHEN MinDueDate < SYSDATE THEN uamount ELSE NULL END) total_open
          FROM vHeavyView
         WHERE CancelledDoc_Id IS NULL
           AND CancellingDoc_Id IS NULL
           AND NVL(IsPending, 'N') = 'N'
           AND Type_ID IN (27, 132, 971)
           AND (dtInitAmount + ctInitAmount) > 0
         GROUP BY Account_ID) fdd,
       (SELECT Account_ID,
               SUM(DTINITAMOUNT + CTINITAMOUNT) saldo
          FROM tMediumTable
         WHERE CancelledDoc_Id IS NULL
           AND CancellingDoc_Id IS NULL
           AND NVL(IsPending, 'N') = 'N'
           AND Type_ID NOT IN (6, 29, 40, 96)
         GROUP BY Account_ID) fds
  FROM vAnotherHeavyView a
 WHERE a.AccNumber IS NOT NULL
   AND fdd.Account_ID(+) = a.Account_ID
   AND fds.Account_ID(+) = a.Account_ID

As you can see I moved also tMediumTable as nested query with grouping to FROM clause to ensure one time scan also on this table. To achieve exactly the same result as source query I need to handle cases for subquery columns getting NULL values. That’s why I used outer join for FDD and FDS nested tables. Explain plan become beautifully simple (relatively) and whole query was much faster and resource saving.

So try to remember:

  • Number 1: Avoid to use nested query in column declaration
  • Number 2: Aggregate functions ignore NULL values
  • Number 3: You can use CASE in query functions to apply query constraints

Keep calm and wait for next battlefield report 😉