Advance Oracle SQL Functions (Part — 1)

Prithhviraaj Chattopadhyay
8 min readJun 6, 2023

--

The following article covers some functions in oracle SQL that have helped me to tackle certain complicated situations.

The topics covered in this article are as follows:
* Partition By
* Rank
* Lead and Lag
* Connect By and Order Siblings By
* NTILE

Partition By

This is a very important function that gets used in conjunction with several analytical functions in oracle SQL.

The first and most important aspect to understand about partition by is that it is a window function. I.E., it creates windows or partitions in the data and then allows us to perform operations inside those windows.

Let's look at an example:

SELECT col1, 
AVG(col2) OVER(PARTITION BY col1) AS Avgcol2,
MIN(col2) OVER(PARTITION BY col1)AS Mincol2,
SUM(col2) OVER(PARTITION BY col1) Totalcol2
FROM demo_table;

Here the windows are being decided by the value col1, and operations are being performed on the values of col2.

Now let us look at the differences between group by and partition by and the concepts will obtain some clarity.

-- say we have the following code block.
SELECT Customercity,
AVG(Orderamount) AS AvgOrderAmount,
MIN(OrderAmount) AS MinOrderAmount,
SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity;

-- now say we want to add CustomerName and OrderAmount to our output.
-- then we would also need to add the columns into the group by expression.

SELECT Customercity, CustomerName, OrderAmount
AVG(Orderamount) AS AvgOrderAmount,
MIN(OrderAmount) AS MinOrderAmount,
SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity, CustomerName, OrderAmount;

-- this works but unnecessarily increases levels of computation and affects
-- the performance of the query

So let us try to obtain the desired results using a different technique.

SELECT Customercity, 
CustomerName,
OrderAmount,
AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

-- Here windows will be created on the basis of Customercity, and the
-- computations are performed on OrderAmount.
-- the value of CustomerName and OrderAmount are also in accordance with
-- Customercity.
output of the code, we can see that the computations are performed on order amount for their corresponding customer city.

So, we can get the desired output without adversely affecting the performance of the script. But the Partition by clause still has much more to offer.

Row number with partition by

SELECT Customercity, 
CustomerName,
ROW_NUMBER() OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC) AS "Row Number",
OrderAmount,
COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,
AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];

-- here the row number function that is being used over the partitioned
-- database, allows us to number the rows in each partition.
o/p of above code. We are able to obtain the row numbers within each window.

Cumulative Total with Partition By

Suppose we want the cumulative total of the orders in a partition, the value should be the sum of the current row and the following row in the partition.

SELECT Customercity, 
CustomerName,
OrderAmount,
ROW_NUMBER() OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC) AS "Row Number",
CONVERT(VARCHAR(20), SUM(orderamount) OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1)
AS CumulativeTotal;

-- here convert is not necessary and is only used to get commas in the output.
output for cumulative average

A similar approach can be followed for Cumulative Average.

SELECT Customercity, 
CustomerName,
OrderAmount,
ROW_NUMBER() OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC) AS "Row Number",
CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1)
AS CumulativeAVG;

Rows Unbounded Preceding with the partition by clause

We can use this to calculate the average value of two rows if the row preceding the current row has a higher value than the current row.

SELECT Customercity, 
CustomerName,
OrderAmount,
ROW_NUMBER() OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC) AS "Row Number",
CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
ORDER BY OrderAmount DESC ROWS UNBOUNDED PRECEDING), 1)
AS CumulativeAvg
FROM [dbo].[Orders];
output for row unbounded preceding

Till now we have seen how to mitigate group by expressions with partition by expressions. How Partition by is better and we have also gove over its versatility.

Now what if we are in a situation where we need a group by in a query which is already using a partition by.

We can do it, but we have to be consistent regarding the grouping levels.
eg:

select product_id, company, 
sum(members) as No_of_Members,
sum(sum(members)) over(partition by company) as TotalMembership
From Product_Membership
Group by Product_ID, Company

References for this Section:
SQL PARTITION BY Clause overview (sqlshack.com)
sql — Cannot use group by and over(partition by) in the same query? — Stack Overflow

Rank

Now that we have covered partition by, rank should be covered with relative ease.

This function is pretty straightforward and self-explanatory. It calculates the rank of a value within a group of values.

-- general form
RANK()
OVER ([ query_partition_clause ] order_by_clause)

-- eg
SELECT department_id, last_name, salary, commission_pct,
RANK() OVER (PARTITION BY department_id
ORDER BY salary DESC, commission_pct) "Rank"
FROM employees WHERE department_id = 80;

-- gets the rank of values within their respective partitions
all values with department id 80 are ranked

If the ablove code is changed slightly to not have a filter condition on department id, then we would see the partition come into play.

References for this section:
Oracle RANK() Function By Practical Examples (oracletutorial.com)
RANK (oracle.com)

Lead and Lag

These are two very important analytic functions that help in various situations and usually manifest their uses in the creation of calculated fields.
The mastery of these functions would also be trivial now that we have an in-depth understanding of partition by.

Lag — gets the previous value of a specified column for a given row.
Lead — gets the following value of a specified column for a given row.

Now if we are specifying a Partition condition, we would get the previous and next values inside that window.

LAG

-- without partition
SELECT empno,
ename,
job,
sal,
LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_prev,
sal - LAG(sal, 1, 0) OVER (ORDER BY sal) AS sal_diff
FROM emp;

--with partition
SELECT deptno,
empno,
ename,
job,
sal,
LAG(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal) AS sal_prev
FROM emp;

-- N.B.
-- the parameters passed to the lag function are
-- 1. the column name
-- 2. the offset value (1 in this case) - tells the function to fech which
-- previous value
-- 3. the default - if previous value does not exist, what should the
-- function substitute it with
output for lag without partition
output for lag with partition

Lead

-- without partition
SELECT empno,
ename,
job,
sal,
LEAD(sal, 1, 0) OVER (ORDER BY sal) AS sal_next,
LEAD(sal, 1, 0) OVER (ORDER BY sal) - sal AS sal_diff
FROM emp;

-- with partition

SELECT deptno,
empno,
ename,
job,
sal,
LEAD(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal) AS sal_next
FROM emp;

-- N.B.
-- the parameters passed to the lead function are
-- 1. the column name
-- 2. the offset value (1 in this case) - tells the function to fech which
-- next value
-- 3. the default - if next value does not exist, what should the
-- function substitute it with
output for lead without partition
output for lead with partition

Connect By

The main objective of this clause is to create a hierarchical structure. The Prior keywork is the key feature necessary for the working of this clause.
At least one expression should have the prior operator in order to signify the parent.

-- Sample query on the employees table:
SELECT employee_id, last_name, manager_id, LEVEL
FROM employees
CONNECT BY PRIOR employee_id = manager_id;
-- shows employee_id, last_name, manager_id and level in the tree for
-- the employee hierarchy.

Start with is an optional key word that can be used to denote the starting point of the hierarchy.

SELECT EMPLOYEE_ID,
FIRST_NAME || ' ' || LAST_NAME EMPLOYEE_NAME,
MANAGER_ID,
PRIOR FIRST_NAME || ' ' || PRIOR LAST_NAME MANAGER_NAME,
LEVEL
FROM EMPLOYEES
START WITH EMPLOYEE_ID = 100
CONNECT BY PRIOR EMPLOYEE_ID = MANAGER_ID
FETCH FIRST 5 ROWS ONLY;

Connect by Path

The following example returns the path of employee names from an employee to all employees of that employee (and their employees):

SELECT LPAD(' ', 2*level-1)|| SYS_CONNECT_BY_PATH(last_name, '/') "Path"
FROM employees
CONNECT BY PRIOR employee_id = manager_id;

Connect by Root

CONNECT_BY_ROOT is a unary operator that is valid only in hierarchical queries. Oracle returns the column value using data from the root row. This operator extends the functionality of the CONNECT BY [PRIOR] condition of hierarchical queries.

Restriction on CONNECT_BY_ROOT: You cannot specify this operator in the START WITH condition or the CONNECT BY condition.

SELECT last_name "Employee", CONNECT_BY_ROOT last_name "Manager",
LEVEL-1 "Pathlen", SYS_CONNECT_BY_PATH(last_name, '/') "Path"
FROM employees
WHERE LEVEL > 1 and department_id = 110
CONNECT BY PRIOR employee_id = manager_id;

Connect by isleaf

The CONNECT_BY_ISLEAF pseudocolumn returns 1 if the current row is a leaf of the tree defined by the CONNECT BY condition. Otherwise it returns 0. This information indicates whether a given row can be further expanded to show more of the hierarchy.

SELECT employee_id, last_name, manager_id, connect_by_isleaf "IsLeaf"
FROM employees
START WITH last_name = 'King'
CONNECT BY PRIOR employee_id = manager_id;

Now that we have a decent enough idea of connect by, the following topic is going to seem almost like an easter egg of this concept.

Order Siblings By

This clause is only valid in a hierarchical structure.
The siblings keyword specifies an order that first sorts the parent rows and then sorts the child rows for each parent in each level of the hierarchy.

SELECT empid, name, mgrid, LEVEL
FROM employee
START WITH name = 'Goyal'
CONNECT BY PRIOR empid = mgrid
ORDER SIBLINGS BY name;
output for order siblings by

The following diagram would help us visualize it better

Now referring the diagram, we can easily see that the code orders the data in each level.

References for this section:
How the start with CONNECT BY clause in Oracle works (oradev.com)
How to master the CONNECT BY clause in Oracle to analyze hierarchical data | by Harinder | Quick Code | Medium
ORDER SIBLINGS BY Clause — IBM Documentation

NTILE

This function divides an ordered data into a number of buckets based on some given condition and assigns the bucket number to each row.

The number of rows in the buckets can differ by at most 1. The remainder values (the remainder of number of rows divided by buckets) are distributed one for each bucket, starting with bucket 1.

SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees
WHERE department_id = 100
ORDER BY last_name, salary, quartile;
output of the n-tile function

References for this section:
NTILE (oracle.com)

Thus this article comes to a close.
These functions have helped me in a number of situations, and I hope this article would help you to get an understanding of their functionality.

Cheers.

--

--

Prithhviraaj Chattopadhyay

Data Analyst | write usually about engineering and mathematics.