Logic-dependent Aggregation (contd)

The only reason that we require two scans of SALES in example 2b is to return all the non-Photo products and their amount_sold in one data set, using SUM(amount_sold); and to return another data set containing the Photo products, using SUM(amount_sold) * 1.1 to calculate the total amount_sold. The two sets are then UNIONed.

We can optimise this query by retrieving all of the amount_sold values in one scan of the SALES table and then using CASE to selectively apply a calculation to the results for the Photo products in the SELECT list, as shown in example 2c.

Example 2c

SELECT   p.prod_category, p.prod_subcategory,
  SUM(CASE p.prod_category
             WHEN 'Photo' THEN amount_sold *1.1
             ELSE amount_sold
       END) AS dollars
FROM     sales s, times t, products p
WHERE s.time_id = t.time_id
AND s.prod_id = p.prod_id
AND t.calendar_year = 2001
GROUP BY p.prod_category, p.prod_subcategory
ORDER BY 1, 2;

PROD_CATEGORY                  PROD_SUBCATEGORY                   DOLLARS
------------------------------ -------------------------- ---------------
Electronics                    Game Consoles                 1,205,027.35
                               Home Audio                    2,779,398.57
                               Y Box Accessories               161,004.00
                               Y Box Games                     559,421.03
******************************                            ---------------
sum                                                          4,704,850.95
Hardware                       Desktop PCs                   2,230,713.39
                               Portable PCs                  3,453,656.62
******************************                            ---------------
sum                                                          5,684,370.01
Peripherals and Accessories    Accessories                     663,034.82
                               CD-ROM                          669,134.90
                               Memory                        1,228,555.41
                               Modems/Fax                      874,702.07
                               Monitors                      3,191,525.93
                               Printer Supplies              1,232,754.58
******************************                            ---------------
sum                                                          7,859,707.71
Photo                          Camcorders                    3,100,982.48
                               Camera Batteries                833,389.59
                               Camera Media                    606,199.41
                               Cameras                       2,426,420.33
******************************                            ---------------
sum                                                          6,966,991.80
Software/Other                 Accessories                     521,342.80
                               Bulk Pack Diskettes              88,216.04
                               Documentation                   827,932.29
                               Operating Systems             1,020,370.87
                               Recordable CDs                  367,478.04
                               Recordable DVD Discs            728,564.36
******************************                            ---------------
sum                                                          3,553,904.40

Although the results are identical and the two statements are functionally equivalent, it is clear from the execution plan that this will require only one full scan of the sales table, which represents a useful improvement.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=448 Card=75 Bytes=4800)
   1    0   SORT (GROUP BY) (Cost=448 Card=75 Bytes=4800)
   2    1     HASH JOIN (Cost=430 Card=229837 Bytes=14709568)
   3    2       TABLE ACCESS (FULL) OF 'PRODUCTS' (TABLE) (Cost=3 Card=72 Bytes=2520)
   4    2       HASH JOIN (Cost=424 Card=229837 Bytes=6665273)
   5    4         TABLE ACCESS (FULL) OF 'TIMES' (TABLE) (Cost=15 Card=365 Bytes=4380)
   6    4         PARTITION RANGE (ITERATOR) (Cost=400 Card=918843 Bytes=15620331)
   7    6           TABLE ACCESS (FULL) OF 'SALES' (TABLE) (Cost=400 Card=918843 Bytes=15620331)
Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        587  consistent gets
          0  physical reads
          0  redo size
       1306  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         22  rows processed

Example 2c requires half the number of consistent gets and less than half the number of sorts that example 2b does. However, you'll probably notice that if you run this in a single user test environment that, because we're operating on fairly small volumes of data, the time to complete the requests and return the data is very similar - around 1 second in my tests. Which is why generating the execution plans and resource usage statistics is important. If you had many users running this report against larger data sets, the difference would become more noticeable.

So let's look at what we've changed. We'll leave the SELECT clause until last (because that is where the most significant changes are) and exclude the more straightforward parts of the statement first.

We still need to select FROM the same three tables and to GROUP BY product_category and product_subcategory, so no change in those two parts of the statement. We know we're interested in all products so let's eliminate the product_category check from the two different WHERE clauses in example 2b which leaves us with two identical WHERE clauses which facilitate the joins between the sales, times and product tables and limit the data to the calendar year 2001. Now that the two WHERE clauses are identical they return the same rows so we can reduce everything to one data set, with no need for the UNION any more. In fact, the query is starting to look like example 2a.

FROM  sales s, times t, products p
WHERE   s.time_id = t.time_id
AND     s.prod_id = p.prod_id
AND     t.calendar_year = 2001
GROUP BY p.prod_category, p.prod_subcategory
ORDER BY p.prod_category, p.prod_subcategory;

This leaves us with our new SELECT clause to look at. The first two grouping columns remain the same - product_category and product_subcategory from the products table. The third column specification contains some of the logic which we've moved from the WHERE clauses of the UNION. It uses the SUM() function to generate a total amount_sold for all the products in the product_category and product_subcategory but uses different values for amount_sold, depending on whether the category is <91>Photo' or not. So here is a high-level procedural view of how example 2c works.

  FOR EACH product subcategory (GROUP BY prod_category, prod_subcategory)
        Generate the total amount_sold for that product subcategory (SUM)
                IF the related product_category is 'Photo', THEN
                        Use amount_sold * 1.1
                ELSE (by default)
                        Use amount_sold
                END IF
Next

Improving SQL efficiency using CASE