As regular readers of my blog would know, I enjoy helping people on forums who need help with DAX, Power Query or Power BI. Recently I have answered quite a few questions for people that could benefit from a DAX “Banding” solution and hence I have decided to write this post as a permanent link to the topic. I first learnt about the DAX banding technique from Alberto Ferrari at http://sqlbi.com
What is Banding?
Banding is a technique that allows you to “group” or “stratify” your data in a table into “bands” (it is sometimes called Cohort analysis). So let’s say you you have a table of customers like this (shown below). In my sample database I have 18,000 customers and I know the age of each one.
This provides an opportunity to analyse the sales data based on age so as to understand if people of different ages purchase differently. Below is a pivot table that illustrates what is possible with the data. But the problem is that there is far too much detail to be useful.
I don’t really want to target 35 year old customers differently to 36 year old customers. Ideally I would like to group (or stratify) the customers into “bands” so that I can treat a group of similar customers the same way. So I might want to group my customers like this.
Once I have grouped customers by age, I can then create a pivot table like the one shown below.
Notice how much easier it is to find insights into the data when there is less detail. I can immediately see from the above pivot table that as people get older, they tend to be more interested in mountain bikes and touring bikes and less interested in road bikes (of course this is a fake database, so the insights are not necessarily realistic).
You will need a Calculated Column for Banding
I am a strong advocate for using Measures (not calculated columns) when ever possible. You can read more about that in my Knowledge Base here. But in this instance, a Measure is not going to work. Measures can only be used in the values section of a Pivot Table – you can’t use them in the “filtering” areas like Slicers, Rows, Columns, and Filters. In the pivot table above, the bands are used to slice the pivot table on rows, and hence a measure wont work.
I Could Write an If Statement
So the objective is to create a calculated column in the customers table that groups people based on age into a stratified band. Any Excel user would know that you could write an IF statement for that (DAX has very similar syntax for the IF statement in regular Excel). The DAX syntax for a single IF statement is as follows
= IF(Customers[Age] <= 18, "18 and Under", "Over 18")
The DAX Calculated Column above will return one of two possible values depending on the age of the customer. The complexity comes in when you want to have multiple conditions. As I mentioned earlier in this post, there are 6 possible groups that I have in mind (shown here again).
To do this with IF statements, I would need to nest multiple IF statements inside each other as follows.
= IF( Customers[Age] <= 18, "18 and Under", IF( Customers[Age] > 18 && Customers[Age] <= 30, ">18 and <=30", IF( Customers[Age] > 30 && Customers[Age] <= 40, ">30 and <=40", IF( Customers[Age] > 40 && Customers[Age] <= 50, ">40 and <=50", IF(Customers[Age] > 50 && Customers[Age] <= 60, ">50 and <=60", ">60" ) ) ) ) )
Do you see the problem? This Calculated Column works, but it is very hard to write, read and modify. And more importantly, there is a better way – use the banding technique.
Process for Creating Banding
The better way involves the following process.
- Create a table in Excel that contains the group names and the lower/upper age limits
- Load the table to Power Pivot
- Do not connect the table to any other tables – it is a disconnected table
- Write a DAX Calculated Column that compares the age of each customer against the lower/upper limits in your disconnected table and returns the single row from the banding table that matches for each customer.
Here is the detail of how to do it.
Create a Table in Excel
Here is what a table in Excel looks like.
The key features to note are
- There is and ID column – this will be used to sort the Band column later
- A “Band” column – this is the label that describes each group. Note the groups are mutually exclusive and collectively exhaustive (MECE).
- There is a “from” and “to” column that set the lower and upper bounds of each age group. Note that the upper age for one group will match the lower age for the next group. I have used DAX to ensure no overlap in the Calculated Column.
Write a Calculated Column
The objective of the calculated column is to filter the disconnected Banding table so that 1 and only 1 row is “visible” or “unfiltered” for each customer. Here is the formula
= CALCULATE( VALUES(AgeBands[Band]), FILTER(AgeBands, Customers[Age] > AgeBands[From] && Customers[Age] <= AgeBands[To] ) )
The role of the FILTER portion of this formula is to filter the AgeBands table so only a single row is unfiltered. What the FILTER does is apply 2 filter rules to ensure the customer is older than the “from” column and “less than or equal” to the “to” column. It is this less than or equal to portion that ensures no cross over of individuals into multiple groups (Mutually Exclusive).
The VALUES function has a unique capability. VALUES returns a single column table. If that table also has a single row, then VALUES converts the table to a scalar value and allows that value to used as a result in a calculated column or indeed in a pivot table cell. Given the formula guarantees that there is 1 and only 1 row left unfiltered in this banding table, then VALUES will kick in, extract the label for the group from the AgeBands[Band] column and place that value in the calculated column.
Sorting the New Calculated Column
There is one more problem to solve – the names of the new groups do not sort properly in the Pivot Table. By default, columns of data will sort in Alphanumeric order. So the labels will sort like this:
To solve this problem, I created a second calculated column that brings in the Sort ID into the customer table. It is simple to do this – just copy the formula in the first calculated column and then update the VALUES portion with the “ID” column as follows.
= CALCULATE( VALUES(AgeBands[ID]), FILTER(AgeBands, Customers[Age] > AgeBands[From] && Customers[Age] <=AgeBands[To]) )
Once this second calculated column has been created, it is possible to tell Power Pivot to use the new column of Sort ID values as the sort order for the Age Band column (shown below).
Compression of the Calculated Columns
As I mentioned earlier, I am not a big fan of calculated columns – except where they are truly needed. There are a few guidelines about when it is OK to use a calculated column – you can read about them all in the link I provided to my Knowledge Base earlier. The key points to note here however are.
- These calculated columns are on the lookup table (Customers). Calculated columns on lookup tables are generally OK.
- The calculated columns have a low cardinality (ie a low number of unique values). This is also generally OK, even on a large data table if necessary.
Now to Use the New Calculated Column
Now that the Customers table has this new “Age Band” column, it is possible to use this column in visualisations. It can be used on rows, columns, filters, and slicers. Here is a rows example:
And a slicer example:
Learn to Write DAX
The banding topic is just one of the many techniques I cover in my books “Supercharge Power BI” and “Supercharge Excel“. If you are a Power BI user or an Excel user and are learning to use DAX, my books give a comprehensive coverage of all the topics you need to start your journey as a Power Pivot ninja.