Archive for Advanced

Cumulative Running Total Based on Highest Value

Level: Advanced

I’ve helped a couple of people out on various forums on this topic over the last week, and that is normally a good sign that I need a blog post on the topic. The scenario is to create a Pareto cumulative running total based on the top products, customers or whatever.  This is a bit tricker than a simple YTD running total, as the “order” of the best to worst products (or customers or whatever) is not materialised in a table, and nor is total sales.  This means all the calculations need to be done on the fly and that can make it tricky.

Cumulative Running Total from Best Products to Worst

My example today uses Adventure works.  I am looking at a sorted list of the best to worst product model names based on Total Sales and want to calculate a running cumulative total using DAX. The output will ultimately look like this.

end state

Spoiler Alert – Test Your DAX Skills First

The final solution to this problem looks relatively straight forward and doesn’t use overly complex DAX.  Anyone with Intermediate DAX skills would be more than capable of understanding how it works.  The reason I rated this topic as Advanced is because it is quite difficult to work out “How” to do it (I thought so anyway).

If you want to test out your DAX skills, then give it a go yourself before you read my solution.  I promise you it will look easy when you see the answer, but once you see the answer you will never know if you could have worked it out yourself.  Your objective is to build the table above using DAX.  Here is a blank Adventure Works Workbook you can use.  Post your experience in the comments below if you would like any hints or would like to share what you learnt.

How I Did It

The first thing I did was to create a total sales measure.  Then I sorted the pivot table so that I could see the best selling products at the top.

image

It is good practice to set up a pivot table and work through the problem one step at time like this.  Don’t try to solve world peace with a single formula – break the problem into manageable individual steps.  Do yourself a favour and take this approach for all your DAX problems!

The next thing I did was to write a simple Rank formula as follows.

Once this Rank was added to my Pivot Table, I toggled my sort order to make sure the RANKX formula works regardless of the order of the data in the Pivot – indeed it does.  This is not essential of course, but it is good peace of mind.

sort

At this stage, I didn’t really know how to write the formula, so I wrote this following a familiar “lifetime to date” pattern.  Lifetime to date is a bit like “Year to Date” but the starting point is the first date ever rather than the first date in the year.

In this measure, I am using [Ranked Models] instead of a date field.

image

The above measure simply gives me the total sales for the top 2 ranked products.  It would of course work for any other arbitrary number I selected.

And it is easy to check if this is working.  Highlight a few concentric cells (1 below), check the value in the information bar (2 below) and compare it against the result (3 below).

image

Now all that is required is to replace the number “2” with a dynamic calculation that tells me what the rank of the current product in the row of the Pivot Table is.  Well this turned out to be a lot harder than I first thought.  I tried a few things for 10 mins or so, but couldn’t get anything to work.

Time to Think

I headed off to meet a client and that gave me some thinking time.  It is amazing the impact that “thinking time” has on these things.  I am a strong believer that you must get your hands on the keyboard and start to write some DAX if you want to learn.  But I have also learnt that if you get stuck, you simply need to walk away and think it through.  It is amazing how many times I wake up in the morning (or the middle of the night – unfortunately) with an answer to a problem I have been noodling.   If you get stuck on a problem, walk away and think it through.

Top N to the Rescue

The issue I was facing was that I couldn’t think of a way to replace that hard coded “2” in my earlier attempt with the actual rank of that product for the row in the Pivot Table.  Clearly [Ranked Models] <= [Ranked Models (as in Pivot Table)] wasn’t going to work because there is no east way (that I can think of) to access that value.  I guess I could have used VALUES() to extract the Model Name and somehow recalculated the RANKX result again using that, but I figured there had to be an easier way.  The real issue is that I didn’t have a table of ranked products anywhere other than in the pivot – the table or ranked products is not materialised in the data model, only in the visualisation.  What I really needed was a virtual table that contained all the products ranked up until the Nth product.  Wow, that sounds like TopN.

TopN is a function that returns a table of items ranked on some value – in my case [Total Sales].

Virtual Tables are Awesome

Before I move on to my solution, let me talk about a very important concept that most beginners (and many more experienced) DAX users do not realise.  All virtual tables created in DAX retain a one to many relationship to the table from where they were created.  Read that sentence again until you have it clear in your mind.

The implication of this is that you can use a temporary table (1 below) that you create with a DAX formula such as VALUES, FILTER, TOPN etc and pass that table into a CALCULATE.  The temporary table retains a relationship to the rest of the data model (2 below) and hence when Context Transition occurs, the entire data model is filtered based on this temporary table.  If you use the Collie Layout Methodology (like I teach in my book), you will know that filters only ever automatically flow “down hill”, from the 1 side of the relationship to the many side of the relationship.  The table shown as 3 below is a virtual table that you can’t “see” but it behaves as if it where there in your data model.  During Context Transition, Table 3 filters Table 4, and Table 4 filters Table 5.

image

The image above is just an illustration of what happens behind the scenes – you can’t see the virtual table or the relationship at all – but it helps to visualise it in your mind. I cover this topic of virtual tables and many other fundamental conceptual topics in my book “Learn to Write DAX”.

A Working Dynamic Cumulative Total

To complete my solution to this problem, I first wrote the TopN formula I was planning to use to make sure it was working as I expected.  Because TopN returns a table, I couldn’t place the TopN formula directly in a Pivot Table – I had to wrap it inside COUNTROWS() as follows:

In my Pivot Table, it looked like this (shown below).  The difference is that the [Ranked Models] measure simply gives me a scalar value telling me “which rank” the product model is.  The [Top N Products] measure is first building a table that contains the top N products, and then counts how many there are – giving the same result of course.

image

Once I could see that this TopN formula was returning a table that was growing based on the Rank, I could then use the inner part of this formula as a table parameter inside a CALCULATE as follows:

The last thing I did was wrote a Pareto % of total column and placed it in my Pivot Table

Bingo!

image.png

If you know a better or different way of solving this problem. I always like to learn, so please post a comment below.  Of course there are all the issues about how RANKX handles ties vs TOPN etc, but I am sure once you have this pattern in hand, you can work out those issues yourself.

Reader Solutions Update 24/8/16

Make sure you take a look at the reader posted solutions in the comments below.  We can all learn from how others solve problems – I definitely have learnt some things.  The solution I want to call out came from Jess (Oxenskiold)

cum total

This is such a simple solution.   Let me explain here how it works.  The IF statement ensures that only Products with Sales get included in the calculation.  The CALCULATE takes 2 inputs (in this example).  The [Total Sales] on line 5 and the FILTER on lines 6-10.  The FILTER portion always gets executed first.  FILTER is an iterator.  In this case it will iterated of a table of all product models (line 7).  Line 8 is where all the work is done.  The entire Filter portion of the formula (lines 6-10) operates in the current row context from the pivot table.  So the first half of line 8 will show the SUM of whatever is in the visualisation (one of the product models).  The other half of line 8 [Total Sales] is deceiving.  the formula for [Total Sales] is simply  SUM(Sales[SalesAmount]), so it looks like row 8 says SUM(Sales[SalesAmount]) <= SUM(Sales[SalesAmount]).  The reason it works is that [Total Sales] is ACTUALLY equivalent to CALCULATE(SUM(Sales[SalesAmount])).  Every measure has an implicit CALCULATE wrapped around it that you can’t see.  Because of this CALCULATE, context transition occurs and the data model is filtered for the current product model iterated by FILTER.

The key learnings for me after seeing this solution are

  1. You don’t always get the best (or simplest) solution the first time
  2. Sometimes there is another solution that is looking at you in the face.  In this case [Total Sales] is a perfect proxy for product rank.

i really like Owen’s solutions too. They are very similar to the one from Jess in that they use total sales as a proxy for rank – something that I completely overlooked.

Many to Many Relationships in DAX Explained

Level: Advanced (but explained in detail so everyone can understand)

There is a lot to learn in DAX if you want to be a ninja.  A couple of the more complex and important things to learn and understand are filter propagation and context transition.  It is not so much that you need to be a rocket scientist to understand these concepts, it is more that they are not naturally intuitive.  You simply have to learn how filter propagation and context transition work.  What’s more, if you have an Excel background, there are some fundamental differences between the way Power Pivot works vs regular Excel, and you have to learn these things too.  All these things are learnable – but indeed you will need to learn. You need an understanding of filter propagation and context transition to understand how to solve the Many to Many problem below – but don’t worry – I will explain it in detail in this post.

The Many to Many Problem – Bill of Materials

The problem I am going to cover today is the DAX Many to Many problem.  All relationships between tables in DAX are of the type 1 to Many* – there is no support for Many to Many relationships in DAX (*Note: in Power BI there is also a 1 to 1 relationship type).

Now for a lot of people this won’t make any difference as there is no need for a Many to Many relationship.  But sometimes there is a DAX data model that has 2 or more tables that have multiple records in each table, and these tables have a logical business relationship to each other.   An example will make it easier to understand.

In the example below, each product exists in the Product table (orange) only once, but each product can be sold many times, and hence it can appear in the Sales table (blue) many times.  This is a standard 1 to many relationship that DAX was built to handle really well.

image

Some of the products in the Sales table are actually “multi product” products.  eg Product ABC is a box of fruit that contains 3 different products (2 x Apples, 3 x Bananas and 1 x Cantaloupe – 6 items in total) .  If you want to see the individual products that were sold, you would need to create a many to many relationship between the Sales table and the Bill of Materials table (green).   The product ID in the sales table has many instances of each product ID (1 for each sale), and the Bill of Materials table also has many instances of the Product ID (1 for each sub product). This is allowed in a traditional relational database tool like Access or SQL Server but it is not allowed in Power Pivot.

Using ONLY 1 to Many Relationships to Solve the Problem

Given that you simply must use 1 to Many relationships in DAX*, the only workable setup of the relationships between the tables is shown below.  There is a 1 to many relationship from the Products table to the Sales table.  There is also a 1 to many relationship from the Products table to the Bill of Materials (BOM) table.  But there is no relationship between the Calendar table and the BOM table because the BOM table doesn’t record the date of the sale; it only records the quantity of each sub product.

image

Now for Some Simple Measures

I have created 3 measures here.

Total Sales Qty=SUM(Sales[Qty])
Total BOM Qty=SUM(BOM[Qty])
Total BOM Sales = SUMX(Products,[Total Sales Qty] * [Total BOM Qty])

When I put these 3 measures in a pivot table with Calendar Date in Rows, this is what I get (shown below).  See the problem?  The measure [Total BOM Qty] is giving the wrong answer – I get the same value for every row in the Pivot Table.  What is going on here?

image

To understand the problem in DAX, you need to have a very clear understanding of automatic filter propagation between relationships.

The Many to Many Problem Explained

Below is the Power Pivot data model again.  I always layout my tables using the Collie Layout Methodology – that is, the lookup tables are at placed at the top and the data tables are placed at the bottom.  This has no effect on the behaviour of the model but it makes it much easier to visualise how filter propagation works.  Lookup tables always have 1 and only 1 row for every object in the table, and there must be a unique identifying key (date in the case of the Calendar table and Product ID in the case of the Products table).  Data tables also must have the same key (date and Product ID – otherwise you can’t join the tables) but data tables are allowed to have as many duplicates as needed (many sales are made on the same date, and the same product is sold many times).

image

Filter propagation automatically flows from the 1 side of the relationship to the many side of the relationship but it does not automatically flow in the other direction.  When you use the Collie Layout Methodology like I have here, we say that filters always automatically flow downhill – they can’t automatically flow uphill. So in the image above,

  • Any filter on the Calendar table will automatically flow through the relationship to the Sales table (shown as 1 to 2 in the image above).
  • Any filter on the Products table will automatically flow through the relationship to the Sales table (3 to 4) and it will also flow to the BOM table (5 to 6).
  • But very importantly, filters will not automatically flow from the Sales table to the Products table (7 to 8), nor uphill through (9 to 10) nor (11 to 12).  The implication is that when you set up a pivot table like the one shown earlier, the Calendar table will filter the Sales table, and hence the Total Sales Qty will be correct.  But there is no automatic way for the Calendar table to filter the BOM table because filters don’t automatically flow up hill. Hence the BOM table is completely unfiltered by the Calendar table.  The result of Total BOM Qty will therefore always be the quantity of the entire BOM table – completely unfiltered by the Calendar table (it is of course filtered by the Product table).

This is an Easy Problem to Solve in Power BI Desktop

In Power BI Desktop this is an easy problem to solve – I will explain how now before going back and solving it for Excel 2010/2013.  There is a feature in Power BI Desktop called Bi-Directional Cross Filtering that will allow you to change the natural “down hill only” filtering of the relationship between tables so that it flows in both directions. As you can see in the data model below (from Power BI Desktop), I have swapped the filtering direction of relationship 2 (Products to Sales) to be bi-directional (these arrows indicate the direction of filter propagation in Power Bi Desktop – which is very helpful indeed.  We can thank Rob Collie for lobbying for this UI improvement, and Microsoft for listening).

image

When you make this change, the Products table will be automatically filtered based on the entries in the Sales table – reverse “up hill” filtering.  Once the Products table is filtered by the Sales table (due to the bi-directional cross filtering behaviour), then the new filter on the Products table will automatically propagate downhill through relationship 3 shown above.  As a result of the end to end flow of cross filtering:

  1. The Calendar table filters the Sales table,
  2. The Sales table filters the Products table,
  3. The Products table filters the BOM table

The net result is that the Calendar table is now filtering the BOM table even though there is no direct relationship between these 2 tables.

When I create a Matrix in Power BI Desktop (a matrix is similar to a Pivot Table in Excel), I get the correct answers as shown below).

image

But there is another problem. Note that Total Sales x Total BOM Qty doesn’t automatically equal Total BOM Sales at this level (eg on 8th Jan). I need to bring the BOM ID column into my Matrix so I can see exactly which BOM items sold each day.  When I do this I get a similar problem to before.  See in the Matrix below that the BOM Qty is correct for each BOM ID, but the Sales Quantity is the same for each of the BOM IDs in the Matrix – this is not correct.

image

This is an almost identical problem as the first one.  Let’s look at the data model again (below).  The Matrix above has the BOM ID column on Rows in the matrix.  This column comes from the BOM Table (shown as 1 below) and because it is on rows in the matrix, it is normal to expect that this will filter the measure [Total Sales Qty].  But remember filters automatically flow down hill, not uphill.  So the BOM ID column is not filtering the Products table (2) and hence the Products table is not filtering the Sales table based on the BOM ID.  The net result, you get the same Total Sales Quantity regardless of the BOM ID because the BOM ID is not filtering the Sales Table.  The simple answer to this problem (in Power BI Desktop) is to change the cross filtering behaviour of the relationship (4) from single to Bi-Directional – just like before.

image

Once you make this change, you will get a fully working report that makes sense to anyone reading it.

image

How to use DAX to force the Calendar Table to filter the BOM Table

OK, now that you understand how to solve this with Bi-Directional cross filtering, hopefully you will realise what needs to be done to solve the problem in Excel 2010/2013.  Here is the data model again (shown below).  I need to force the Sales Table to filter the Products table (shown as 1 below) and I also need to force the BOM table also to filter the Products table (shown as 2 below).  If I can force the 2 Data tables to filter the common Products table, then the Products table will do its job and pass those filters through to the other data tables automatically, hence solving the problem.  Stated another way, I want the Sales table to filter the Products table, then the products table will filter the BOM table automatically.  I also want to force the BOM table to filter the Products table as well – then the products table will automatically filter the Sales table.  I am trying to get the 2 data tables to filter the common lookup table so the common lookup table will then pass the filters on to the other table

image

If I were to write these formulas using “Pseudo DAX”, the 2 formulas would read like this:

Total Sales Qty=
     CALCULATE(
         SUM(Sales[Qty]),
         'Filter the Products table based on the rows in the BOM table first'
     )

Total BOM Qty=
     CALCULATE(
         SUM(BOM[Qty]),
        'Filter the Products table based on the rows in the Sales table first after applying filters from the Calendar table'
     )

So now all I need to do is find a suitable filter to replace the “Filter the Products table…  ” portion of each formula and I will achieve the outcome.  There are many ways to do this, but first I am going to show you a method using the FILTER function, and then I will show (and explain) another method using Black Magic from The Italians.

Total Sales Qty

Let’s start with this formula.

Total Sales Qty=
     CALCULATE(
        sum(Sales[Qty]),
        'Filter the Products table based on the rows in the BOM table first'
     )

How can I write a filter statement to put inside CALCULATE that will filter the Products table based on the values in the BOM?  Let me show the formula and then explain what it does.

=FILTER(
   Products,
   CALCULATE(COUNTROWS(BOM)) > 0
 )

The FILTER function is an iterator and hence it has a Row Context.  The above FILTER formula iterates over the Products table and returns a filtered table of all rows in the Products table that pass the given test.  At each iteration (ie each product) in the Products table, the CALCULATE function forces context transition (turns the row context into a filter context) and hence the BOM table is filtered for the current row in the Products table iteration.

Then the FILTER formula asks the question “Now that it’s clear that we are only talking about this one single product for this single step of the iteration process and we have filtered the BOM table to reflect this, are there currently any rows visible in the BOM table?”.  If the answer is yes, then FILTER keeps that product, if the answer is no, then FILTER discards the product.  FILTER then goes to the second product in the Products table, then CALCULATE again forces context transition for this second iteration and the BOM table is filtered so that only rows of this specific second product are visible in the BOM table, and then FILTER completes the COUNTROWS check again.  This process goes on for every product in the Product table (all those in the current filter context anyway) and then FILTER is left with a new Filtered Table of Products that contains only products that also exist in the BOM table in the current filter context.

What if you leave out the CALCULATE?

It is worth pointing out here that the following filter formula will not work.

The problem with this second formula is that there is no CALCULATE wrapped around COUNTROWS(BOM).  FILTER is an iterator and has a Row Context.  But a Row Context does not automatically create a Filter Context.  So when the FILTER function steps through its iteration process and gets to the first Product, there is no Filter Context and hence the BOM table is not filtered by the new iteration process.  COUNTROWS(BOM) will therefore be the total number of rows in the original table in the original filter context, every product will therefore always pass the test (or always fail – depending on the initial filter context) and there will be no change to the new Filtered Products table.  The net result is the new Filtered Products table is actually identical to the original Products table – no change in filtering at all.  The formula simply doesn’t work.

Bringing the Correct Formulas Together

So putting the correct filter formula inside the CALCULATE from earlier, I end up with this formula.

Total BOM Qty

Now I can just apply the same pattern to the other formula, switching out the table names.

And here is the working Pivot Table, same as in Power BI Desktop earlier.  The BOM table is filtering the Sales table, and the Sales table is filtering the BOM table – Many to Many using DAX formulas in action!

image

Now for the Italian Black Magic

There is another way you can write these formulas that is simpler to write and easy to read – unfortunately it is difficult to understand how it works – Marco Russo calls it Black Magic.  Here are the 2 formulas.

When you compare these Black Magic formulas against the Pseudo DAX formula I wrote earlier, you will see that I am using the BOM table as the filter expression in the first formula, and the Sales table as the filter expression in the second formula.  This doesn’t make any sense on first sight.  If filters always propagate from the one side of the relationship to the many side of the relationship, how can these formulas possibly work?  This can be explained with “Expanded Tables”.

Expanded Tables

Power Pivot is built on top of some more traditional database technologies and hence what happens inside Power Pivot can be converted (or thought of) in more traditional database patterns and structures behind the scenes.  In SQL terms, the relationships between the Sales table (shown as 1 below), the Calendar table (2) and the Products table (3) are:

Sales Left Outer Join Calendar ( 1 to 2)

Sales Left Outer Join Products ( 1 to 3)

image

If I had these tables in a relational database, I could materialise the sales table into an Expanded Table that contains all the original Sales columns, plus the Calendar table columns and the Product table columns.  To do tihs I could write the following SQL Query:

Select * 

from Sales 

Left Join Calendar on Sales.Date = Calendar.Date 
Left Join Products on Sales.[Product ID] = Products.[Product ID]

The above query will return the following Expanded Table with columns of data coming from 3 different tables.

image

Technically speaking when I place the Sales table as a filter argument inside the following formula…

I am actually placing the Expanded Table – the Sales table plus all the relevant records from the other tables on the one side of the relationships.

The Expanded table will still be filtered by the current filter context.  So if there is a filter on the Calendar table (say for 3rd Jan), then the Calendar table will filter the Sales table AND the Expanded Sales Table.  If I re-run the SQL code with a filter on Calendar[Date] = ‘3 Jan 2016’ I get this new Expanded Table.  The Calendar table is filtering the Sales table, and the Sales table is filtering the Products table.

expanded

So when Sales table is used as the filter portion of the CALCULATE function, you can only “see” the Sales table, but it is actually the entire Expanded Sales table (including the Calendar and Products tables and any filters from all of these 3 tables) that is doing the filtering, not just the single Sales table.  Filters from all 3 tables are therefore effectively filtering the BOM table and that is why it works.

Here are copies of my workbook files if  you want to take a look.

Where to learn more

I have learnt most of my advanced DAX knowledge from The Italians (Marco Russo and Alberto Ferrari).  There is an excellent video available here where Alberto explains Many to Many relationships.  I also recommend the book “The Definitive Guide to DAX” for anyone that wants to develop a deep mastery of the DAX language.

For people that are earlier in the learning stages of DAX, you really must master filter propagation and context transition before you can move forward.  If you haven’t mastered these techniques, I recommend you invest some time going back over the basics and make sure you have a solid understanding of the pre-requisites.  The fastest, cheapest and most effective way you can do this is to read my book “Learn to Write DAX”.
L2WD banner ad