Logic-driven Column Calculations

Imagine a situation where the HR Manager requests a report to examine the effect of giving everyone in the Sales department a 20% salary increase. The report needs to give the total salary bill for each department in the company. This entails calculating the total of the emp.sal column for all employees in each department, which is straightforward using GROUP BY and SUM as shown in Example 2a.

Example 2a

SELECT d.dname department, SUM(e.sal) salary_total FROM emp e, dept d WHERE d.deptno = e.deptno GROUP BY d.dname;
Returning a different value for employees in the Sales department adds complication. There are a few possible solutions. We could use two different copies of the emp table in the FROM clause, or we could use a UNION of two complementary data sets, Sales and non-Sales employees, as shown in Example 2b.

Example 2b

SELECT d.dname department, SUM(e.sal) * 1.2 salary_total FROM emp e, dept d WHERE d.deptno = e.deptno AND d.dname = 'SALES' GROUP BY d.dname UNION ALL SELECT d.dname department, SUM(e.sal) salary_total FROM emp e, dept d WHERE d.deptno = e.deptno AND d.dname != 'SALES' GROUP BY d.dname; DEPARTMENT SALARY_TOTAL -------------- ------------ ACCOUNTING 8750 (Second half of UNION) RESEARCH 10875 (Second half of UNION) SALES 11280 (First half of UNION)

Although this statement will produce the desired results it will perform two full table scans against the emp table to return the complementary data sets which are then UNIONed. (Note that we have used UNION ALL because we know that the two data sets are already complementary, so no SORT UNIQUE step is necessary to exclude the intersection between the sets.) The execution plan generated by Personal Oracle 8.0.3 for this query is as follows

Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE 1 0 UNION-ALL 2 1 SORT (GROUP BY) 3 2 NESTED LOOPS 4 3 TABLE ACCESS (FULL) OF 'EMP' 5 3 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT' 6 5 INDEX (UNIQUE SCAN) OF 'PK_DEPT' (UNIQUE) 7 1 SORT (GROUP BY) 8 7 NESTED LOOPS 9 8 TABLE ACCESS (FULL) OF 'EMP' 10 8 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT' 11 10 INDEX (UNIQUE SCAN) OF 'PK_DEPT' (UNIQUE)

The only reason that we require two scans of emp is to return all the non-Sales employees and their salaries in one data set, using SUM(emp.sal) to calculate the departmental salary bills; and to return another data set containing the employees in Sales, using SUM(emp.sal * 1.2) to calculate the Sales departments salary bill. The two sets are then UNIONed. We can optimise this query using DECODE by retrieving all of the employee salary information in one scan of the emp table and then including logic in the SELECT clause to selectively apply a calculation to the results for the Sales employees, as shown in example 2c.

Example 2c

SELECT d.dname department, SUM(DECODE(d.dname,'SALES', e.sal * 1.2, e.sal)) salary_total FROM emp e, dept d WHERE d.deptno = e.deptno GROUP BY d.dname DEPARTMENT SALARY_TOTAL ACCOUNTING 8750 RESEARCH 10875 SALES 11280

Although the results are identical and the two statements are functionally equivalent, it is clear from the execution plan that this will require only one full scan of the emp table, which would represent a significant performance improvement if emp was a large table.

Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE 1 0 SORT (GROUP BY) 2 1 NESTED LOOPS 3 2 TABLE ACCESS (FULL) OF 'EMP' 4 3 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT' 5 4 INDEX (UNIQUE SCAN) OF 'PK_DEPT' (UNIQUE)
So let's look at what we've changed. We'll leave the SELECT clause until last (because that is where the most significant changes are) and exclude the more straightforward parts of the statement first.

We still need to select FROM the same two tables and to GROUP BY the department name, so no change in those two parts of the statement. We know we're interested in all employees in the company so let's eliminate the department name check from the two different WHERE clauses in the first statement which leaves us with two identical WHERE clauses which facilitate the join between the emp and dept tables. Now that the two WHERE clauses are identical they return the same rows so we can reduce everything to one data set, with no need for the UNION any more. In fact, the query is starting to look like example 2a.

FROM emp e, dept d WHERE d.deptno = e.deptno GROUP BY d.dname;
This leaves us with our new SELECT clause to look at. The first column remains the same - the department name from the dept table. The second column specification contains some of the logic which we've moved from the WHERE clauses of the UNION. It uses the SUM() function to generate a total salary for all the employees in the department but uses different values for salary, depending on whether the department is SALES or not. So here is a high-level procedural view of how example 2c works.
FOR EACH department (GROUP BY d.dname) Generate the total salary for that department (SUM) FOR EACH employee (DECODE <85>) IF the related department (d.dname) is 'SALES', then Use salary (e.sal) * 1.2 ELSE (by default) Use the employee's current salary (e.sal)

Example 2c illustrates a couple of DECODE capabilities that we haven't used in earlier examples, which both result from DECODE's very open, generic model. First, the returned value (emp.sal or emp.sal*1.2) is not a translation of the value that we are testing against (dept.dname) <96> it isn't even in the same table. Second, DECODE will either return a field (emp.sal) or a calculation (emp.sal*1.2).

It is important to remember that all of the parameters to DECODE can be complex expressions of any type.

Next

The Power of Decode