Banding in DAX

Level: Intermediate

As regular readers of my blog would know, I enjoy helping people on forums who need help with DAX, Power Query or Power BI.  Recently I have answered quite a few questions for people that could benefit from a DAX “Banding” solution and hence I have decided to write this post as a permanent link to the topic.  I first learnt about the DAX banding technique from Alberto Ferrari at http://sqlbi.com

What is Banding?

Banding is a technique that allows you to “group” or “stratify” your data in a table into “bands” (it is sometimes called Cohort analysis).  So let’s say you you have a table of customers like this (shown below).  In my sample database I have 18,000 customers and I know the age of each one.

image

This provides an opportunity to analyse the sales data based on age so as to understand if people of different ages purchase differently.  Below is a pivot table that illustrates what is possible with the data.  But the problem is that there is far too much detail to be useful.

image

I don’t really want to target 35 year old customers differently to 36 year old customers.  Ideally I would like to group (or stratify) the customers into “bands” so that I can treat a group of similar customers the same way.  So I might want to group my customers like this.

image

Once I have grouped customers by age, I can then create a pivot table like the one shown below.

image

Notice how much easier it is to find insights into the data when there is less detail.  I can immediately see from the above pivot table that as people get older, they tend to be more interested in mountain bikes and touring bikes and less interested in road bikes (of course this is a fake database, so the insights are not necessarily realistic).

You will need a Calculated Column for Banding

I am a strong advocate for using Measures (not calculated columns) when ever possible.  You can read more about that in my Knowledge Base here.  But in this instance, a Measure is not going to work.  Measures can only be used in the values section of a Pivot Table – you can’t use them in the “filtering” areas like Slicers, Rows, Columns, and Filters.  In the pivot table above, the bands are used to slice the pivot table on rows, and hence a measure wont work.

I Could Write an If Statement

So the objective is to create a calculated column in the customers table that groups people based on age into a stratified band.  Any Excel user would know that you could write an IF statement for that (DAX has very similar syntax for the IF statement in regular Excel).  The DAX syntax for a single IF statement is as follows

The DAX Calculated Column above will return one of two possible values depending on the age of the customer.  The complexity comes in when you want to have multiple conditions.  As I mentioned earlier in this post, there are 6 possible groups that I have in mind (shown here again).

image

To do this with IF statements, I would need to nest multiple IF statements inside each other as follows.

Do you see the problem?  This Calculated Column works, but it is very hard to write, read and modify.  And more importantly, there is a better way – use the banding technique.

Process for Creating Banding

The better way involves the following process.

  1. Create a table in Excel that contains the group names and the lower/upper age limits
  2. Load the table to Power Pivot
  3. Do not connect the table to any other tables – it is a disconnected table
  4. Write a DAX Calculated Column that compares the age of each customer against the lower/upper limits in your disconnected table and returns the single row from the banding table that matches for each customer.

Here is the detail of how to do it.

Create a Table in Excel

Here is what a table in Excel looks like.

image

The key features to note are

  1. There is and ID column – this will be used to sort the Band column later
  2. A “Band” column – this is the label that describes each group.  Note the groups are mutually exclusive and collectively exhaustive (MECE).
  3. There is a “from” and “to” column that set the lower and upper bounds of each age group.  Note that the upper age for one group will match the lower age for the next group.  I have used DAX to ensure no overlap in the Calculated Column.

Write a Calculated Column

The objective of the calculated column is to filter the disconnected Banding table so that 1 and only 1 row is “visible” or “unfiltered” for each customer.  Here is the formula

The role of the FILTER portion of this formula is to filter the AgeBands table so only a single row is unfiltered. What the FILTER does is apply 2 filter rules to ensure the customer is older than the “from” column and “less than or equal” to the “to” column.  It is this less than or equal to portion that ensures no cross over of individuals into multiple groups (Mutually Exclusive).

The VALUES function has a unique capability.  VALUES returns a single column table.  If that table also has a single row, then VALUES converts the table to a scalar value and allows that value to used as a result in a calculated column or indeed in a pivot table cell.  Given the formula guarantees that there is 1 and only 1 row left unfiltered in this banding table, then VALUES will kick in, extract the label for the group from the AgeBands[Band] column and place that value in the calculated column.

Sorting the New Calculated Column

There is one more problem to solve – the names of the new groups do not sort properly in the Pivot Table.  By default, columns of data will sort in Alphanumeric order.  So the labels will sort like this:

image

To solve this problem, I created a second calculated column that brings in the Sort ID into the customer table.  It is simple to do this – just copy the formula in the first calculated column and then update the VALUES portion with the “ID” column as follows.

Once this second calculated column has been created, it is possible to tell Power Pivot to use the new column of Sort ID values as the sort order for the Age Band column (shown below).

image

Compression of the Calculated Columns

As I mentioned earlier, I am not a big fan of calculated columns – except where they are truly needed.  There are a few guidelines about when it is OK to use a calculated column – you can read about them all in the link I provided to my Knowledge Base earlier.  The key points to note here however are.

  1. These calculated columns are on the lookup table (Customers).  Calculated columns on lookup tables are generally OK.
  2. The calculated columns have a low cardinality (ie a low number of unique values).  This is also generally OK, even on a large data table if necessary.

Now to Use the New Calculated Column

Now that the Customers table has this new “Age Band” column, it is possible to use this column in visualisations.  It can be used on rows, columns, filters, and slicers.  Here is a rows example:

image

And a slicer example:

image

Learn to Write DAX

L2WD banner ad

The banding topic is just one of the many techniques I cover in my book “Learn to Write DAX“.  If you are an Excel user and are learning to use DAX, my book gives a comprehensive coverage of all the topics you need to start your journey as a Power Pivot ninja.

Comments

  1. Hi Matt, great post.

    Use PowerQuery to create the same column, in M language of course, is a praticable solution in your opinion?

    • Hey Francesco, yes I definitely thing PQL (as I call the M language) is a very good candidate for this pattern as well. Just like Excel, there are many ways you can solve this problem. The only way I know how to do this in PQL is with a nested IF statement. That is not to say it can’t be done another way though – I just can’t think of it.

      • You can mirror Matt’s DAX-approach in PQ with a function like this:

        (Age) =>
        let
        Source = Table.Buffer(AgeBands),
        Age = Table.AddColumn(Source, “Age”, each Age),
        Check = Table.AddColumn(Age, “Band”, each if [Age]>[From] and [Age]<=[To] then "in" else "out"),
        Filter = Table.SelectRows(Check, each ([Band] = "in"))
        in
        Filter

        Added as a column to the Customers table it will return the matching row from the age band. You can then expand [Band] and [ID] at once. But it will be slower than the DAX-approach.

        A faster method would be this:

        InnerJoin = Table.NestedJoin(Customers,{"Age"},Table.ExpandListColumn(Table.AddColumn(AgeBands, "Ages", each {[From]..[To]-1}), "Ages"),{"Ages"},"NewColumn",JoinKind.Inner)

        It goes the other way around and instead of iterating through the age bands it creates all ages within the age-band-table and joins it with an inner join. This acts effectively as a filter.

  2. Thanks, it so happens I enjoyed a workshop given by Alberto Ferrari in Amsterdam last night, and he showed and explained the logic of this solution, great experience to see him presenting!

  3. I can’t get this formula to work

    =
    CALCULATE (
    VALUES ( AgeBands[ID] ),
    FILTER (
    AgeBands,
    Customers[Age] > AgeBands[From]
    && Customers[Age] <= AgeBands[To]
    )
    )

    Any tips?

    Excel 2016 Windows 10

    • Hey Duncan. Power Pivot is a system of tables, columns, relationships and formulas. Everything depends on everything else. So this is the formula, but that is just one piece of the puzzle.

      Are you writing a calculated column?
      If so, which table?
      If a table, how is the table connected to other tables?
      What error messages are you getting? These can be useful in debugging the issue.

  4. Hi Matt,

    Very interesting article.
    I am new to BI & Power Pivot and currently reading Rob Collie’s book “Power Pivot and PowerBI”. In that book, on page 128 similar case is explained where SWITCH(TRUE() …. has been used to create new calculated column with price band. Are these two solutions equal in results or maybe one of them has some limitations which I do not see as a beginner?

    Thanks
    Sylwester

    • I don’t think one is necessarily better or worse than the other. They are both calculated during data refresh so have no performance impact at runtime (same for all calc columns). I think SWITCH/TRUE is probably easier for beginners to grasp, however the technique I have shown helps users broaden their understanding of DAX and Power Pivot while also allowing a table based maintenance approach. Thanks for sharing the other approach.

  5. I’m new to Power BI and am wondering if this approach should be used if the dataset I’m looking at is, say line items in an invoice, and yet the band should depend on a roll up (or sum of all line items) in that table. More details:

    – I have a table with invoice line item data (inv #, inv date, line item #, amount, salesperson)
    – My table defining my bands looks at the total amount in a month and depending on that amount, returns a commission % which is paid to the designated salesperson
    – I want to sum amounts by date / month then use that summed number to assign a band value (e.g. if amount in jan 2016 > 1 and 100 and <=200, then 12% commission)

    What is the best practice in this case? Should I roll up the data by month/year/salesperson/sum(amount) and then assign the band to use in a calculation (amount * commission % = commission for that person and month)?

    Any guidance is appreciated

    • The banding I describe here is designed to “write” a number in a column. In your case you could use a similar approach to calculate a commission payment but instead of “writing” the number in a column you would instead use the value in your commission calc. In short, set up a pivot, add the salesperson to the rows and write a measure that calculates their performance. Once you have a measure that works, you can use my banding concept to fetch the commission rate and multiply it out. If you need more help, please create a sample workbook and do what you can, then post and ask a question at http://powerpivotforum.com.au and I will help you out.

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x