I’ve helped a couple of people out on various forums on this topic over the last week, and that is normally a good sign that I need a blog post on the topic. The scenario is to create a Pareto cumulative running total based on the top products, customers or whatever. This is a bit tricker than a simple YTD running total, as the “order” of the best to worst products (or customers or whatever) is not materialised in a table, and nor is total sales. This means all the calculations need to be done on the fly and that can make it tricky.
Cumulative Running Total from Best Products to Worst
My example today uses Adventure works. I am looking at a sorted list of the best to worst product model names based on Total Sales and want to calculate a running cumulative total using DAX. The output will ultimately look like this.
Spoiler Alert – Test Your DAX Skills First
The final solution to this problem looks relatively straight forward and doesn’t use overly complex DAX. Anyone with Intermediate DAX skills would be more than capable of understanding how it works. The reason I rated this topic as Advanced is because it is quite difficult to work out “How” to do it (I thought so anyway).
If you want to test out your DAX skills, then give it a go yourself before you read my solution. I promise you it will look easy when you see the answer, but once you see the answer you will never know if you could have worked it out yourself. Your objective is to build the table above using DAX. Here is a blank Adventure Works Workbook you can use. Post your experience in the comments below if you would like any hints or would like to share what you learnt.
How I Did It
The first thing I did was to create a total sales measure. Then I sorted the pivot table so that I could see the best selling products at the top.
It is good practice to set up a pivot table and work through the problem one step at time like this. Don’t try to solve world peace with a single formula – break the problem into manageable individual steps. Do yourself a favour and take this approach for all your DAX problems!
The next thing I did was to write a simple Rank formula as follows.
= RANKX(ALL(Products[ModelName]),[Total Sales])
Once this Rank was added to my Pivot Table, I toggled my sort order to make sure the RANKX formula works regardless of the order of the data in the Pivot – indeed it does. This is not essential of course, but it is good peace of mind.
At this stage, I didn’t really know how to write the formula, so I wrote this following a familiar “lifetime to date” pattern. Lifetime to date is a bit like “Year to Date” but the starting point is the first date ever rather than the first date in the year.
In this measure, I am using [Ranked Models] instead of a date field.
The above measure simply gives me the total sales for the top 2 ranked products. It would of course work for any other arbitrary number I selected.
And it is easy to check if this is working. Highlight a few concentric cells (1 below), check the value in the information bar (2 below) and compare it against the result (3 below).
Now all that is required is to replace the number “2” with a dynamic calculation that tells me what the rank of the current product in the row of the Pivot Table is. Well this turned out to be a lot harder than I first thought. I tried a few things for 10 mins or so, but couldn’t get anything to work.
Time to Think
I headed off to meet a client and that gave me some thinking time. It is amazing the impact that “thinking time” has on these things. I am a strong believer that you must get your hands on the keyboard and start to write some DAX if you want to learn. But I have also learnt that if you get stuck, you simply need to walk away and think it through. It is amazing how many times I wake up in the morning (or the middle of the night – unfortunately) with an answer to a problem I have been noodling. If you get stuck on a problem, walk away and think it through.
Top N to the Rescue
The issue I was facing was that I couldn’t think of a way to replace that hard coded “2” in my earlier attempt with the actual rank of that product for the row in the Pivot Table. Clearly [Ranked Models] <= [Ranked Models (as in Pivot Table)] wasn’t going to work because there is no east way (that I can think of) to access that value. I guess I could have used VALUES() to extract the Model Name and somehow recalculated the RANKX result again using that, but I figured there had to be an easier way. The real issue is that I didn’t have a table of ranked products anywhere other than in the pivot – the table or ranked products is not materialised in the data model, only in the visualisation. What I really needed was a virtual table that contained all the products ranked up until the Nth product. Wow, that sounds like TopN.
TopN is a function that returns a table of items ranked on some value – in my case [Total Sales].
Virtual Tables are Awesome
Before I move on to my solution, let me talk about a very important concept that most beginners (and many more experienced) DAX users do not realise. All virtual tables created in DAX retain a one to many relationship to the table from where they were created. Read that sentence again until you have it clear in your mind.
The implication of this is that you can use a temporary table (1 below) that you create with a DAX formula such as VALUES, FILTER, TOPN etc and pass that table into a CALCULATE. The temporary table retains a relationship to the rest of the data model (2 below) and hence when Context Transition occurs, the entire data model is filtered based on this temporary table. If you use the Collie Layout Methodology (like I teach in my book), you will know that filters only ever automatically flow “down hill”, from the 1 side of the relationship to the many side of the relationship. The table shown as 3 below is a virtual table that you can’t “see” but it behaves as if it where there in your data model. During Context Transition, Table 3 filters Table 4, and Table 4 filters Table 5.
The image above is just an illustration of what happens behind the scenes – you can’t see the virtual table or the relationship at all – but it helps to visualise it in your mind. I cover this topic of virtual tables and many other fundamental conceptual topics in my book “Supercharge Excel”.
A Working Dynamic Cumulative Total
To complete my solution to this problem, I first wrote the TopN formula I was planning to use to make sure it was working as I expected. Because TopN returns a table, I couldn’t place the TopN formula directly in a Pivot Table – I had to wrap it inside COUNTROWS() as follows:
Top N Products Count = COUNTROWS( TOPN([Ranked Models], ALL(Products[ModelName]), [Total Sales]) )
In my Pivot Table, it looked like this (shown below). The difference is that the [Ranked Models] measure simply gives me a scalar value telling me “which rank” the product model is. The [Top N Products] measure is first building a table that contains the top N products, and then counts how many there are – giving the same result of course.
Once I could see that this TopN formula was returning a table that was growing based on the Rank, I could then use the inner part of this formula as a table parameter inside a CALCULATE as follows:
Cumulative Total based on Rank = CALCULATE([Total Sales], TOPN([Ranked Models],ALL(Products[ModelName]),[Total Sales]) )
The last thing I did was wrote a Pareto % of total column and placed it in my Pivot Table
% of Products Running Total = DIVIDE([Cumulative Total based on Rank],CALCULATE([Total Sales],ALL(Products)))
If you know a better or different way of solving this problem. I always like to learn, so please post a comment below. Of course there are all the issues about how RANKX handles ties vs TOPN etc, but I am sure once you have this pattern in hand, you can work out those issues yourself.
Reader Solutions Update 24/8/16
Make sure you take a look at the reader posted solutions in the comments below. We can all learn from how others solve problems – I definitely have learnt some things. The solution I want to call out came from Jess (Oxenskiold)
This is such a simple solution. Let me explain here how it works. The IF statement ensures that only Products with Sales get included in the calculation. The CALCULATE takes 2 inputs (in this example). The [Total Sales] on line 5 and the FILTER on lines 6-10. The FILTER portion always gets executed first. FILTER is an iterator. In this case it will iterated of a table of all product models (line 7). Line 8 is where all the work is done. The entire Filter portion of the formula (lines 6-10) operates in the current row context from the pivot table. So the first half of line 8 will show the SUM of whatever is in the visualisation (one of the product models). The other half of line 8 [Total Sales] is deceiving. the formula for [Total Sales] is simply SUM(Sales[SalesAmount]), so it looks like row 8 says SUM(Sales[SalesAmount]) <= SUM(Sales[SalesAmount]). The reason it works is that [Total Sales] is ACTUALLY equivalent to CALCULATE(SUM(Sales[SalesAmount])). Every measure has an implicit CALCULATE wrapped around it that you can’t see. Because of this CALCULATE, context transition occurs and the data model is filtered for the current product model iterated by FILTER.
The key learnings for me after seeing this solution are
- You don’t always get the best (or simplest) solution the first time
- Sometimes there is another solution that is looking at you in the face. In this case [Total Sales] is a perfect proxy for product rank.
i really like Owen’s solutions too. They are very similar to the one from Jess in that they use total sales as a proxy for rank – something that I completely overlooked.