Another day on SQL battlefield brought this form of life:
SELECT a.AccNumber, NVL((SELECT SUM(dtInitAmount + ctInitAmount) FROM tMediumTable WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID NOT IN (6, 29, 40, 96) AND Account_ID = a.Account_ID), 0) AS saldo, NVL((SELECT SUM(dtInitAmount + ctInitAmount) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND Account_ID = a.Account_ID), 0) AS total_inv, NVL((SELECT COUNT(1) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND Account_ID = a.Account_ID), 0) AS num_of_inv, NVL((SELECT AVG(dtInitAmount + ctInitAmount) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND Account_ID = a.Account_ID), 0) AS average_inv, NVL((SELECT SUM(dtInitAmount + ctInitAmount) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND IssueDate >= SYSDATE - 180 AND Account_ID = a.Account_ID), 0) AS total_inv_180, NVL((SELECT COUNT(1) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND IssueDate >= SYSDATE - 180 AND Account_ID = a.Account_ID), 0) AS num_of_inv_180, NVL((SELECT AVG(dtInitAmount + ctInitAmount) FROM vHeavyView fd WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 AND IssueDate >= SYSDATE - 180 AND Account_ID = a.Account_ID), 0) AS average_inv_180, NVL((SELECT SUM(uAmount) FROM vHeavyView WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND MinDueDate < SYSDATE AND (dtInitAmount + ctInitAmount) > 0 AND Account_ID = a.Account_ID), 0) AS total_open FROM vAnotherHeavyView a WHERE a.AccNumber IS NOT NULL
I’m glad you was patient enough to scroll the page down here 😉 As you can see it’s quite long query. There are a lot of nested queries on columns, most of them on heavy view which runs a lot of time. For every row of vAnotherHeavyView there was one scan on tMediumTable and 7 scans on vHeavyView! Explain plan was really terrible. Last time I run the query, it lasts for over 15 hours and didn’t finished. So it was a high time to make some optimization!
If you look carefully you’ll discover some patterns. All queries on vHeavyView have almost the same constraints and the only difference is on IssueDate and MinDueDate only. It would be nice to find a common piece and join the queries together. There are few simple aggregating functions used there: COUNT, SUM and AVG. If we only could scan the vHeavyView once and get functions results for constrained values only…
And here comes the magic! COUNT, SUM and AVG have such a nice feature to ignore NULL values. So if you pass i.e. 4, 12, NULL, 2, NULL values they will get into account only 4, 12 and 2. Having this we can use one more nice construction: CASE. Then we can use something like this:
CASE WHEN <a_constraint> THEN <a_value> ELSE NULL END
Finally we wrap this in aggregate function and using constraint for i.e. total_open column (the last one in query) we get:
AVG(CASE WHEN MinDueDate < SYSDATE THEN uAmount ELSE NULL END)
This way we can scan vHeavyView only once to get values for all columns! Relevant constraints will be applied using CASE for every row of the view. But to this we also need to move nested query out from column declaration and put it into FROM clause. We also have to GROUP it by key column that is used to make join with vAnotherHeavyView (Account_ID). Here is the final result:
SELECT a.AccNumber, fds.saldo, fdd.total_open, fdd.total_inv, fdd.num_of_inv, fdd.average_inv, fdd.total_inv_180, fdd.num_of_inv_180, fdd.average_inv_180 FROM vAnotherHeavyView a, (SELECT Account_ID, AVG(dtInitAmount + ctInitAmount) average_inv, COUNT(*) num_of_inv, SUM(dtInitAmount + ctInitAmount) total_inv, AVG(CASE WHEN IssueDate >= SYSDATE - 180 THEN dtInitAmount + ctInitAmount ELSE NULL END) average_inv_180, COUNT(CASE WHEN IssueDate >= SYSDATE - 180 THEN 1 ELSE NULL END) num_of_inv_180, SUM(CASE WHEN IssueDate >= SYSDATE - 180 THEN dtInitAmount + ctInitAmount ELSE NULL END) total_inv_180, SUM(CASE WHEN MinDueDate < SYSDATE THEN uamount ELSE NULL END) total_open FROM vHeavyView WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID IN (27, 132, 971) AND (dtInitAmount + ctInitAmount) > 0 GROUP BY Account_ID) fdd, (SELECT Account_ID, SUM(DTINITAMOUNT + CTINITAMOUNT) saldo FROM tMediumTable WHERE CancelledDoc_Id IS NULL AND CancellingDoc_Id IS NULL AND NVL(IsPending, 'N') = 'N' AND Type_ID NOT IN (6, 29, 40, 96) GROUP BY Account_ID) fds FROM vAnotherHeavyView a WHERE a.AccNumber IS NOT NULL AND fdd.Account_ID(+) = a.Account_ID AND fds.Account_ID(+) = a.Account_ID
As you can see I moved also tMediumTable as nested query with grouping to FROM clause to ensure one time scan also on this table. To achieve exactly the same result as source query I need to handle cases for subquery columns getting NULL values. That’s why I used outer join for FDD and FDS nested tables. Explain plan become beautifully simple (relatively) and whole query was much faster and resource saving.
So try to remember:
- Number 1: Avoid to use nested query in column declaration
- Number 2: Aggregate functions ignore NULL values
- Number 3: You can use CASE in query functions to apply query constraints
Keep calm and wait for next battlefield report 😉