Advanced uses of the CASE statement in SQL — Part 1: SUM(CASE WHEN)

Michael Taverner
5 min readApr 23, 2022

--

Image author: Matthew Henry

A few use cases of the CASE statement that take it’s powers to a whole new level.

Somewhere along your SQL journey you’ll encounter the CASE statement. If you’re learning about this magical piece of syntax for the first time from an article or a MOOC, you’ll find that it’s amazing for creating your own buckets or categories of data in a column, but there are a few use cases of the CASE statement that seem to be seldom mentioned that I feel take it’s powers to a whole new level.

I want to explain in some depth how and why you would use some of these methods, so I’ve split this into multiple articles and will focus on one use case at a time.

Thought process

To start with, the most important thing when you’re figuring out new ways to use the CASE statement, is to think about what this statement returns as the output — it may give you a column full of strings, integers, floats, timestamps etc. Once you consider this, you’ll realise you can then use the CASE statement inside of other functions, and many other things, just like any other column.

The data I’m using for these examples is from the Kaggle E-Commerce Data set — go check it out. I made a few small transformations to illustrate some use cases that I come across every day in my career and loaded it into a local PostgreSQL database.

Dataset preview

SUM(CASE WHEN…)

This is one of the most common use cases I come across on a daily basis, and I find it very useful to “pivot” data from multiple rows into one.

See the example image above? There are two rows for each stock code within each order, with the Amount value in one row representing the Unit Price, and another showing the tax amount.

We’ve been asked to calculate, per stock code, the tax amount as a percentage of the unit price, in order to confirm that customers are being charged the correct tax amount. The best way to do this would be to get the Unit Price and Tax into one row per stock code, and divide Tax by Unit price to get the percentage. Let’s start by constructing the CASE statements.

Code
Output

We now have two new colums, one for Unit Price and one for Tax, with the respective value in each. But see that we still have two rows per item? And see those nulls that we’re left with? That’s where SUM comes in.

Code
Output

Now that we’ve pivoted the Unit price and Tax amounts up into one row per item, we can calculate the Tax percentage. You can do this by simply duplicating the CASE statements, and using the `/` operator between the second two, but in the real world where your queries are already going to be much longer, you’ll find your code will quickly become VERY difficult to read. A better option is to wrap the query above into a subquery, or even better still, a CTE, and use the output of that.

Code

Note: because PostgreSQL will throw an error when trying to divide by zero, I’ve wrapped the denominator in a NULLIF function so the query will still run — it will simply return a null if the denominator was a zero.

Output

You’ll probably want to round that final calculation off for readability, but that’s it! We have now created a newly calculated tax_pct column, based on data from two separate rows.

If you’re still reading, let’s take a look at one more example:

We’ve been asked to take a look at the 3 top selling stock codes in the UK, and see how these products are selling in the other 3 markets. Let’s do that by grouping on the stock code and description columns, then adding 4 columns with the SUM of sales in each market. We’ll then ORDER BY the UK sales amount, in descending order.

Code

Because we want to split the sales per market into individual columns and only look at Unit prices, our CASE statement specifies the country and the “type” as Unit Price, then returns the Amount to the SUM function.

Output

You can see the 3 best sellers in the UK, and also see how those are selling in the US, Canadian and Australian markets (spoiler alert: not great!).

In summary, SUM(CASE WHEN…) is just one very powerful way that CASE statements can be used to make your life as an analyst easier and help you provide actionable insights to your business. Next up, we’ll take a look at conditional WHERE clauses using CASE.

Until then!

--

--

Michael Taverner
Michael Taverner

Written by Michael Taverner

I'm an Australian Fraud Prevention Data Analyst, live sound engineer and data science enthusiast.

No responses yet