*Level: Intermediate*

As regular readers of my blog would know, I enjoy helping people on forums who need help with DAX, Power Query or Power BI. Recently I have answered quite a few questions for people that could benefit from a DAX “Banding” solution and hence I have decided to write this post as a permanent link to the topic. I first learnt about the DAX banding technique from Alberto Ferrari at http://sqlbi.com

# What is Banding?

Banding is a technique that allows you to “group” or “stratify” your data in a table into “bands” (it is sometimes called Cohort analysis). So let’s say you you have a table of customers like this (shown below). In my sample database I have 18,000 customers and I know the age of each one.

This provides an opportunity to analyse the sales data based on age so as **to understand if people of different ages purchase differently**. Below is a pivot table that illustrates what is possible with the data. **But the problem is that there is far too much detail to be useful.**

I don’t really want to target 35 year old customers differently to 36 year old customers. Ideally I would like to group (or stratify) the customers into “bands” so that I can treat a group of similar customers the same way. So I might want to group my customers like this.

Once I have grouped customers by age, I can then create a pivot table like the one shown below.

Notice how much easier it is to find insights into the data when there is less detail. I can immediately see from the above pivot table that as people get older, they tend to be more interested in mountain bikes and touring bikes and less interested in road bikes (of course this is a fake database, so the insights are not necessarily realistic).

# You will need a Calculated Column for Banding

I am a strong advocate for using Measures (not calculated columns) when ever possible. You can read more about that in my Knowledge Base here. But in this instance, a Measure is not going to work. **Measures can only be used in the values section of a Pivot Table** **– you can’t use them in the “filtering” areas like Slicers, Rows, Columns, and Filters.** In the pivot table above, the bands are used to slice the pivot table on rows, and hence a measure wont work.

# I Could Write an If Statement

So **the objective is to create a calculated column in the customers table** that groups people based on age into a stratified band. Any Excel user would know that you could write an IF statement for that (DAX has very similar syntax for the IF statement in regular Excel). The DAX syntax for a single IF statement is as follows

The DAX Calculated Column above will return one of two possible values depending on the age of the customer. **The complexity comes in when you want to have multiple conditions.** As I mentioned earlier in this post, there are 6 possible groups that I have in mind (shown here again).

To do this with IF statements, I would need to nest multiple IF statements inside each other as follows.

Do you see the problem? This Calculated Column works, but it is very hard to write, read and modify. And more importantly, there is a better way – use the banding technique.

# Process for Creating Banding

The better way involves the following process.

- Create a table in Excel that contains the group names and the lower/upper age limits
- Load the table to Power Pivot
- Do not connect the table to any other tables – it is a disconnected table
- Write a DAX Calculated Column that compares the age of each customer against the lower/upper limits in your disconnected table and returns the single row from the banding table that matches for each customer.

Here is the detail of how to do it.

## Create a Table in Excel

Here is what a table in Excel looks like.

The key features to note are

- There is and ID column – this will be used to sort the Band column later
- A “Band” column – this is the label that describes each group. Note the groups are mutually exclusive and collectively exhaustive (MECE).
- There is a “from” and “to” column that set the lower and upper bounds of each age group. Note that the upper age for one group will match the lower age for the next group. I have used DAX to ensure no overlap in the Calculated Column.

## Write a Calculated Column

The objective of the calculated column is to filter the disconnected Banding table so that 1 and only 1 row is “visible” or “unfiltered” for each customer. Here is the formula

The role of **the FILTER portion of this formula** is to filter the AgeBands table so **only a single row is unfiltered**. What the FILTER does is apply 2 filter rules to ensure the customer is older than the “from” column and “less than or equal” to the “to” column. **It is this less than or equal to portion that ensures no cross over of individuals into multiple groups** (Mutually Exclusive).

The VALUES function has a unique capability. VALUES returns a single column table. **If that table also has a single row, then VALUES converts the table to a scalar value and allows that value to used as a result in a calculated column or indeed in a pivot table cell**. Given the formula guarantees that there is 1 and only 1 row left unfiltered in this banding table, then VALUES will kick in, extract the label for the group from the AgeBands[Band] column and place that value in the calculated column.

## Sorting the New Calculated Column

There is one more problem to solve – the names of the new groups do not sort properly in the Pivot Table. By default, columns of data will sort in Alphanumeric order. So the labels will sort like this:

To solve this problem, I created a second calculated column that brings in the Sort ID into the customer table. It is simple to do this – just copy the formula in the first calculated column and then update the VALUES portion with the “ID” column as follows.

Once this second calculated column has been created, it is possible to tell Power Pivot to use the new column of Sort ID values as the sort order for the Age Band column (shown below).

# Compression of the Calculated Columns

As I mentioned earlier, I am not a big fan of calculated columns – except where they are truly needed. There are a few guidelines about when it is OK to use a calculated column – you can read about them all in the link I provided to my Knowledge Base earlier. The key points to note here however are.

- These calculated columns are on the lookup table (Customers). Calculated columns on lookup tables are generally OK.
- The calculated columns have a low cardinality (ie a low number of unique values). This is also generally OK, even on a large data table if necessary.

# Now to Use the New Calculated Column

Now that the Customers table has this new “Age Band” column, it is possible to use this column in visualisations. It can be used on rows, columns, filters, and slicers. Here is a rows example:

And a slicer example:

# Learn to Write DAX

The banding topic is just one of the many techniques I cover in my book “Learn to Write DAX“. If you are an Excel user and are learning to use DAX, my book gives a comprehensive coverage of all the topics you need to start your journey as a Power Pivot ninja.

Thank you for this very clear and concise post.

Very good explanation. Shows how easy it is to set-up banding. Such posts help me a lot to promote usage of Power Pivot within my colleagues.

Hi Matt, great post.

Use PowerQuery to create the same column, in M language of course, is a praticable solution in your opinion?

Hey Francesco, yes I definitely thing PQL (as I call the M language) is a very good candidate for this pattern as well. Just like Excel, there are many ways you can solve this problem. The only way I know how to do this in PQL is with a nested IF statement. That is not to say it can’t be done another way though – I just can’t think of it.

You can mirror Matt’s DAX-approach in PQ with a function like this:

(Age) =>

let

Source = Table.Buffer(AgeBands),

Age = Table.AddColumn(Source, “Age”, each Age),

Check = Table.AddColumn(Age, “Band”, each if [Age]>[From] and [Age]<=[To] then "in" else "out"),

Filter = Table.SelectRows(Check, each ([Band] = "in"))

in

Filter

Added as a column to the Customers table it will return the matching row from the age band. You can then expand [Band] and [ID] at once. But it will be slower than the DAX-approach.

A faster method would be this:

InnerJoin = Table.NestedJoin(Customers,{"Age"},Table.ExpandListColumn(Table.AddColumn(AgeBands, "Ages", each {[From]..[To]-1}), "Ages"),{"Ages"},"NewColumn",JoinKind.Inner)

It goes the other way around and instead of iterating through the age bands it creates all ages within the age-band-table and joins it with an inner join. This acts effectively as a filter.

Thanks Hans and Ivan. I am glad you liked the post.

Thanks, it so happens I enjoyed a workshop given by Alberto Ferrari in Amsterdam last night, and he showed and explained the logic of this solution, great experience to see him presenting!

I can’t get this formula to work

=

CALCULATE (

VALUES ( AgeBands[ID] ),

FILTER (

AgeBands,

Customers[Age] > AgeBands[From]

&& Customers[Age] <= AgeBands[To]

)

)

Any tips?

Excel 2016 Windows 10

Hey Duncan. Power Pivot is a system of tables, columns, relationships and formulas. Everything depends on everything else. So this is the formula, but that is just one piece of the puzzle.

Are you writing a calculated column?

If so, which table?

If a table, how is the table connected to other tables?

What error messages are you getting? These can be useful in debugging the issue.

Hi Matt,

Very interesting article.

I am new to BI & Power Pivot and currently reading Rob Collie’s book “Power Pivot and PowerBI”. In that book, on page 128 similar case is explained where SWITCH(TRUE() …. has been used to create new calculated column with price band. Are these two solutions equal in results or maybe one of them has some limitations which I do not see as a beginner?

Thanks

Sylwester

I don’t think one is necessarily better or worse than the other. They are both calculated during data refresh so have no performance impact at runtime (same for all calc columns). I think SWITCH/TRUE is probably easier for beginners to grasp, however the technique I have shown helps users broaden their understanding of DAX and Power Pivot while also allowing a table based maintenance approach. Thanks for sharing the other approach.

Thank you for the post, makes sense.

Very interesting and well explained article !! Thanks

I’m new to Power BI and am wondering if this approach should be used if the dataset I’m looking at is, say line items in an invoice, and yet the band should depend on a roll up (or sum of all line items) in that table. More details:

– I have a table with invoice line item data (inv #, inv date, line item #, amount, salesperson)

– My table defining my bands looks at the total amount in a month and depending on that amount, returns a commission % which is paid to the designated salesperson

– I want to sum amounts by date / month then use that summed number to assign a band value (e.g. if amount in jan 2016 > 1 and 100 and <=200, then 12% commission)

What is the best practice in this case? Should I roll up the data by month/year/salesperson/sum(amount) and then assign the band to use in a calculation (amount * commission % = commission for that person and month)?

Any guidance is appreciated

The banding I describe here is designed to “write” a number in a column. In your case you could use a similar approach to calculate a commission payment but instead of “writing” the number in a column you would instead use the value in your commission calc. In short, set up a pivot, add the salesperson to the rows and write a measure that calculates their performance. Once you have a measure that works, you can use my banding concept to fetch the commission rate and multiply it out. If you need more help, please create a sample workbook and do what you can, then post and ask a question at http://powerpivotforum.com.au and I will help you out.

Hi Matt,

Thank you for writing this post. I have a question that is related to this post. However, it goes beyond the scope of this post. I hope that you could kindly help me answer it. (as it is one of the issue im dealing with atm)

So after i’ve created the calculated column for age band, the age band become fixed number. For this year, the results would be alright. But if i look back at data last year for example, the results would be wrong because for example, a 18 years old this year would be included in calculating sales for last year too even tho last year he was only 17 years old.

Yes, this is a standard behaviour of Calculated Columns and I highlighted this in the post. The Calculated column is updated when the report is refreshed. if you use =TODAY() instead of a hard coded date, the calculated column will update with the latest age when you refresh the data. I didn’t do this in my article because i wanted people to be able to copy what I have done and check they got the same result. If I used TODAY(), the results would vary