Kde…

Analytic functions:

Analytic functions:
All my findings regarding analytic functions will go in this post. In other words I will keep on adding my finding to this post instead of creating new one all the time. I will also make sure that everytime when I update this post I bring up to the top of the list so that people notice the change.

Below is a small but sweet example that shows the difference between using three different functions.


DENSE_RANK()
RANK()
ROW_NUMBER()


create table t1 (
a number,
b varchar2(10)
);

insert into t1 values (5, ‘ABC’);
insert into t1 values (6, ‘ABC’);
insert into t1 values (5, ‘ABC’);
insert into t1 values (3, ‘ABC’);
insert into t1 values (5, ‘ABC’);
insert into t1 values (3, ‘ABC’);
insert into t1 values (6, ‘ABC’);

insert into t1 values (4, ‘XYZ’);
insert into t1 values (8, ‘XYZ’);
insert into t1 values (4, ‘XYZ’);
insert into t1 values (8, ‘XYZ’);
insert into t1 values (4, ‘XYZ’);

commit;

SQL> select * from t1;

A B
———- ———-
5 ABC
6 ABC
5 ABC
3 ABC
5 ABC
3 ABC
6 ABC
4 XYZ
8 XYZ
4 XYZ
8 XYZ

A B
———- ———-
4 XYZ

12 rows selected.

SQL> select dense_rank() over (partition by b order by a) “denserank”,
rank() over (partition by b order by a) “rank”,
row_number() over (partition by b order by a) “rownumber”,
a,b
from t1;

denserank rank rownumber A B
———- ———- ———- ———- ———-
1 1 1 3 ABC
1 1 2 3 ABC
2 3 3 5 ABC
2 3 4 5 ABC
2 3 5 5 ABC
3 6 6 6 ABC
3 6 7 6 ABC
1 1 1 4 XYZ
1 1 2 4 XYZ
1 1 3 4 XYZ
2 4 4 8 XYZ

denserank rank rownumber A B
———- ———- ———- ———- ———-
2 4 5 8 XYZ

12 rows selected.

SQL>
Hope this helps.
Kirtan

##### UPDATES BELOW #####

The example below uses MAX() function to show highest salary in each department right next to each employee record. It also shows what an employee’s salary is compared to the highest salary in the department.
Here the key is (PARTITION BY DEPTNO) part of the analytic function. That’s what tells oracle to break data up based on the partition details.

SQL> SELECT
ENAME,
SAL,
DEPTNO,
MAX(SAL) OVER (PARTITION BY DEPTNO) MGR_SAL_BY_DEPT,
sal/MAX(SAL) OVER (PARTITION BY DEPTNO) *100 EMP_percent_SAL
FROM EMP
ORDER BY DEPTNO;

ENAME SAL DEPTNO MGR_SAL_BY_DEPT EMP_PERCENT_SAL
———- ———- ———- ————— —————
CLARK 2450 10 5000 49
KING 5000 10 5000 100
MILLER 1300 10 5000 26
JONES 2975 20 3000 99.1666667
FORD 3000 20 3000 100
ADAMS 1100 20 3000 36.6666667
SMITH 800 20 3000 26.6666667
SCOTT 3000 20 3000 100
WARD 1250 30 2850 43.8596491
TURNER 1500 30 2850 52.6315789
ALLEN 1600 30 2850 56.1403509
JAMES 950 30 2850 33.3333333
BLAKE 2850 30 2850 100
MARTIN 1250 30 2850 43.8596491

14 rows selected.

SQL>

Now by adding one more MAX() function to the query we can have max salary across departments and a comparision of what an empolyee is making compared to the maximum salary across all departments.

SQL> SELECT
ENAME, SAL,DEPTNO,
MAX(SAL) OVER (PARTITION BY DEPTNO) max_dept_sal
,sal/MAX(SAL) OVER (PARTITION BY DEPTNO) *100 dept_percent
,MAX(SAL) OVER () max_comp_sal
,sal/MAX(SAL) OVER () *100 comp_percent
FROM EMP
ORDER BY DEPTNO;

ENAME SAL DEPTNO MAX_DEPT_SAL DEPT_PERCENT MAX_COMP_SAL COMP_PERCENT
———- ———- ———- ———— ———— ———— ————
CLARK 2450 10 5000 49 5000 49
KING 5000 10 5000 100 5000 100
MILLER 1300 10 5000 26 5000 26
JONES 2975 20 3000 99.1666667 5000 59.5
FORD 3000 20 3000 100 5000 60
ADAMS 1100 20 3000 36.6666667 5000 22
SMITH 800 20 3000 26.6666667 5000 16
SCOTT 3000 20 3000 100 5000 60
WARD 1250 30 2850 43.8596491 5000 25
TURNER 1500 30 2850 52.6315789 5000 30
ALLEN 1600 30 2850 56.1403509 5000 32

ENAME SAL DEPTNO MAX_DEPT_SAL DEPT_PERCENT MAX_COMP_SAL COMP_PERCENT
———- ———- ———- ———— ———— ———— ————
JAMES 950 30 2850 33.3333333 5000 19
BLAKE 2850 30 2850 100 5000 57
MARTIN 1250 30 2850 43.8596491 5000 25

14 rows selected.

SQL>

##############UPDATES ###################

example showing the use of LAG and LEAD functions. Here, LAG is used for reading rows before the current row and LEAD is used for reading rows after the current row. Example below shows the difference between salaries of row and its previous row and row and its next row. The key here would be the ORDER BY clause in the OVER() section of the analytic function.

SQL> SELECT
2 EMPNO,
3 ENAME,
4 JOB,
5 SAL,
6 LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_prev,
7 sal - LAG(sal, 1, 0) OVER (ORDER BY sal) AS LAG_sal_diff,
8 LEAD(sal, 1, 0) OVER (ORDER BY sal) AS sal_NEXT,
9 sal - LEAD(sal, 1, 0) OVER (ORDER BY sal) AS LEAD_sal_diff
10 FROM
11 EMP;

EMPNO ENAME JOB SAL SAL_PREV LAG_SAL_DIFF SAL_NEXT LEAD_SAL_DIFF
---------- ---------- --------- ---------- ---------- ------------ ---------- -------------
7369 SMITH CLERK 800 0 800 950 -150
7900 JAMES CLERK 950 800 150 1100 -150
7876 ADAMS CLERK 1100 950 150 1250 -150
7521 WARD SALESMAN 1250 1100 150 1250 0
7654 MARTIN SALESMAN 1250 1250 0 1300 -50
7934 MILLER CLERK 1300 1250 50 1500 -200
7844 TURNER SALESMAN 1500 1300 200 1600 -100
7499 ALLEN SALESMAN 1600 1500 100 2450 -850
7782 CLARK MANAGER 2450 1600 850 2850 -400
7698 BLAKE MANAGER 2850 2450 400 2975 -125
7566 JONES MANAGER 2975 2850 125 3000 -25

EMPNO ENAME JOB SAL SAL_PREV LAG_SAL_DIFF SAL_NEXT LEAD_SAL_DIFF
---------- ---------- --------- ---------- ---------- ------------ ---------- -------------
7788 SCOTT ANALYST 3000 2975 25 3000 0
7902 FORD ANALYST 3000 3000 0 5000 -2000
7839 KING PRESIDENT 5000 3000 2000 0 5000

14 rows selected.

SQL>

2 Responses to “Analytic functions:”

  1. April 26th, 2007 at 3:21 pm

    Balraj says:

    The examples are simple and easy to understand. I have a question on how the table partitioning will impact the way the analytical functions work. Results are not going to change but performance of this query .. will it enhance?

  2. April 26th, 2007 at 4:26 pm

    Kirtan Desai says:

    Balraj,

    Analytic Functions are performed on a resultant dataset. For example,


    SQL> SELECT
    ENAME,
    SAL,
    DEPTNO,
    MAX(SAL) OVER (PARTITION BY DEPTNO) MGR_SAL_BY_DEPT,
    sal/MAX(SAL) OVER (PARTITION BY DEPTNO) *100 EMP_percent_SAL
    FROM EMP
    WHERE DEPTNO=20
    ORDER BY DEPTNO;

    In above query Oracle would apply the analytic function on the dataset returned by the query after applying *where DEPTNO=20* condition.

    So the answer to your question is YES. Not because partitioning will help the anaytic function perform better directly but because right use of partitioning may improve the way your query filters the the data before returning the final dataset.

    Hope this helps.
    Kirtan

Leave a Reply