(Level: Advanced, but explained in detail so everyone can understand)
There is a lot to learn in DAX if you want to be a ninja. A couple of the more complex and important things to learn and understand are filter propagation and context transition. It is not so much that you need to be a rocket scientist to understand these concepts, it is more that they are not naturally intuitive. You simply have to learn how filter propagation and context transition work. What’s more, if you have an Excel background, there are some fundamental differences between the way Power Pivot works vs regular Excel, and you have to learn these things too. All these things are learnable – but indeed you will need to learn. You need an understanding of filter propagation and context transition to understand how to solve the Many to Many problem below – but don’t worry – I will explain it in detail in this post.
The Many to Many Problem – Bill of Materials
The problem I am going to cover today is the DAX Many to Many problem. All relationships between tables in DAX are of the type 1 to Many* – there is no support for Many to Many relationships in DAX (*Note: in Power BI there is also a 1 to 1 relationship type).
Now for a lot of people this won’t make any difference as there is no need for a Many to Many relationship. But sometimes there is a DAX data model that has 2 or more tables that have multiple records in each table, and these tables have a logical business relationship to each other. An example will make it easier to understand.
In the example below, each product exists in the Product table (orange) only once, but each product can be sold many times, and hence it can appear in the Sales table (blue) many times. This is a standard 1 to many relationship that DAX was built to handle really well.
Some of the products in the Sales table are actually “multi product” products. eg Product ABC is a box of fruit that contains 3 different products (2 x Apples, 3 x Bananas and 1 x Cantaloupe – 6 items in total) . If you want to see the individual products that were sold, you would need to create a many to many relationship between the Sales table and the Bill of Materials table (green). The product ID in the sales table has many instances of each product ID (1 for each sale), and the Bill of Materials table also has many instances of the Product ID (1 for each sub product). This is allowed in a traditional relational database tool like Access or SQL Server but it is not allowed in Power Pivot.
Using ONLY 1 to Many Relationships to Solve the Problem
Given that you simply must use 1 to Many relationships in DAX*, the only workable setup of the relationships between the tables is shown below. There is a 1 to many relationship from the Products table to the Sales table. There is also a 1 to many relationship from the Products table to the Bill of Materials (BOM) table. But there is no relationship between the Calendar table and the BOM table because the BOM table doesn’t record the date of the sale; it only records the quantity of each sub product.
Now for Some Simple Measures
I have created 3 measures here.
Total Sales Qty = SUM(Sales[Qty]) Total BOM Qty = SUM(BOM[Qty]) Total BOM Sales = SUMX(Products,[Total Sales Qty] * [Total BOM Qty])
When I put these 3 measures in a pivot table with Calendar Date in Rows, this is what I get (shown below). See the problem? The measure [Total BOM Qty] is giving the wrong answer – I get the same value for every row in the Pivot Table. What is going on here?
To understand the problem in DAX, you need to have a very clear understanding of automatic filter propagation between relationships.
The Many to Many Problem Explained
Below is the Power Pivot data model again. I always layout my tables using the Collie Layout Methodology – that is, the lookup tables are at placed at the top and the data tables are placed at the bottom. This has no effect on the behaviour of the model but it makes it much easier to visualise how filter propagation works. Lookup tables always have 1 and only 1 row for every object in the table, and there must be a unique identifying key (date in the case of the Calendar table and Product ID in the case of the Products table). Data tables also must have the same key (date and Product ID – otherwise you can’t join the tables) but data tables are allowed to have as many duplicates as needed (many sales are made on the same date, and the same product is sold many times).
Filter propagation automatically flows from the 1 side of the relationship to the many side of the relationship but it does not automatically flow in the other direction. When you use the Collie Layout Methodology like I have here, we say that filters always automatically flow downhill – they can’t automatically flow uphill. So in the image above,
- Any filter on the Calendar table will automatically flow through the relationship to the Sales table (shown as 1 to 2 in the image above).
- Any filter on the Products table will automatically flow through the relationship to the Sales table (3 to 4) and it will also flow to the BOM table (5 to 6).
- But very importantly, filters will not automatically flow from the Sales table to the Products table (7 to 8), nor uphill through (9 to 10) nor (11 to 12). The implication is that when you set up a pivot table like the one shown earlier, the Calendar table will filter the Sales table, and hence the Total Sales Qty will be correct. But there is no automatic way for the Calendar table to filter the BOM table because filters don’t automatically flow up hill. Hence the BOM table is completely unfiltered by the Calendar table. The result of Total BOM Qty will therefore always be the quantity of the entire BOM table – completely unfiltered by the Calendar table (it is of course filtered by the Product table).
This is an Easy Problem to Solve in Power BI Desktop
In Power BI Desktop this is an easy problem to solve – I will explain how now before going back and solving it for Excel 2010/2013. There is a feature in Power BI Desktop called Bi-Directional Cross Filtering that will allow you to change the natural “down hill only” filtering of the relationship between tables so that it flows in both directions. As you can see in the data model below (from Power BI Desktop), I have swapped the filtering direction of relationship 2 (Products to Sales) to be bi-directional (these arrows indicate the direction of filter propagation in Power Bi Desktop – which is very helpful indeed. We can thank Rob Collie for lobbying for this UI improvement, and Microsoft for listening).
When you make this change, the Products table will be automatically filtered based on the entries in the Sales table – reverse “up hill” filtering. Once the Products table is filtered by the Sales table (due to the bi-directional cross filtering behaviour), then the new filter on the Products table will automatically propagate downhill through relationship 3 shown above. As a result of the end to end flow of cross filtering:
- The Calendar table filters the Sales table,
- The Sales table filters the Products table,
- The Products table filters the BOM table
The net result is that the Calendar table is now filtering the BOM table even though there is no direct relationship between these 2 tables.
When I create a Matrix in Power BI Desktop (a matrix is similar to a Pivot Table in Excel), I get the correct answers as shown below).
But there is another problem. Note that Total Sales x Total BOM Qty doesn’t automatically equal Total BOM Sales at this level (eg on 8th Jan). I need to bring the BOM ID column into my Matrix so I can see exactly which BOM items sold each day. When I do this I get a similar problem to before. See in the Matrix below that the BOM Qty is correct for each BOM ID, but the Sales Quantity is the same for each of the BOM IDs in the Matrix – this is not correct.
This is an almost identical problem as the first one. Let’s look at the data model again (below). The Matrix above has the BOM ID column on Rows in the matrix. This column comes from the BOM Table (shown as 1 below) and because it is on rows in the matrix, it is normal to expect that this will filter the measure [Total Sales Qty]. But remember filters automatically flow down hill, not uphill. So the BOM ID column is not filtering the Products table (2) and hence the Products table is not filtering the Sales table based on the BOM ID. The net result, you get the same Total Sales Quantity regardless of the BOM ID because the BOM ID is not filtering the Sales Table. The simple answer to this problem (in Power BI Desktop) is to change the cross filtering behaviour of the relationship (4) from single to Bi-Directional – just like before.
Once you make this change, you will get a fully working report that makes sense to anyone reading it.
How to use DAX to force the Calendar Table to filter the BOM Table
OK, now that you understand how to solve this with Bi-Directional cross filtering, hopefully you will realise what needs to be done to solve the problem in Excel 2010/2013. Here is the data model again (shown below). I need to force the Sales Table to filter the Products table (shown as 1 below) and I also need to force the BOM table also to filter the Products table (shown as 2 below). If I can force the 2 Data tables to filter the common Products table, then the Products table will do its job and pass those filters through to the other data tables automatically, hence solving the problem. Stated another way, I want the Sales table to filter the Products table, then the products table will filter the BOM table automatically. I also want to force the BOM table to filter the Products table as well – then the products table will automatically filter the Sales table. I am trying to get the 2 data tables to filter the common lookup table so the common lookup table will then pass the filters on to the other table
If I were to write these formulas using “Pseudo DAX”, the 2 formulas would read like this:
Total Sales Qty = CALCULATE( SUM(Sales[Qty]), 'Filter the Products table based on the rows in the BOM table first' ) Total BOM Qty = CALCULATE( SUM(BOM[Qty]), 'Filter the Products table based on the rows in the Sales table first after applying filters from the Calendar table' )
So now all I need to do is find a suitable filter to replace the “Filter the Products table… ” portion of each formula and I will achieve the outcome. There are many ways to do this, but first I am going to show you a method using the FILTER function, and then I will show (and explain) another method using Black Magic from The Italians.
Total Sales Qty
Let’s start with this formula.
Total Sales Qty = CALCULATE( SUM(Sales[Qty]), 'Filter the Products table based on the rows in the BOM table first' )
How can I write a filter statement to put inside CALCULATE that will filter the Products table based on the values in the BOM? Let me show the formula and then explain what it does.
=FILTER( Products, CALCULATE(COUNTROWS(BOM)) > 0 )
The FILTER function is an iterator and hence it has a Row Context. The above FILTER formula iterates over the Products table and returns a filtered table of all rows in the Products table that pass the given test. At each iteration (ie each product) in the Products table, the CALCULATE function forces context transition (turns the row context into a filter context) and hence the BOM table is filtered for the current row in the Products table iteration.
Then the FILTER formula asks the question “Now that it’s clear that we are only talking about this one single product for this single step of the iteration process and we have filtered the BOM table to reflect this, are there currently any rows visible in the BOM table?”. If the answer is yes, then FILTER keeps that product, if the answer is no, then FILTER discards the product. FILTER then goes to the second product in the Products table, then CALCULATE again forces context transition for this second iteration and the BOM table is filtered so that only rows of this specific second product are visible in the BOM table, and then FILTER completes the COUNTROWS check again. This process goes on for every product in the Product table (all those in the current filter context anyway) and then FILTER is left with a new Filtered Table of Products that contains only products that also exist in the BOM table in the current filter context.
What if you leave out the CALCULATE?
It is worth pointing out here that the following filter formula will not work.
= FILTER(Products, COUNTROWS(BOM) > 0)
The problem with this second formula is that there is no CALCULATE wrapped around COUNTROWS(BOM). FILTER is an iterator and has a Row Context. But a Row Context does not automatically create a Filter Context. So when the FILTER function steps through its iteration process and gets to the first Product, there is no Filter Context and hence the BOM table is not filtered by the new iteration process. COUNTROWS(BOM) will therefore be the total number of rows in the original table in the original filter context, every product will therefore always pass the test (or always fail – depending on the initial filter context) and there will be no change to the new Filtered Products table. The net result is the new Filtered Products table is actually identical to the original Products table – no change in filtering at all. The formula simply doesn’t work.
Bringing the Correct Formulas Together
So putting the correct filter formula inside the CALCULATE from earlier, I end up with this formula.
Total Sales Qty = CALCULATE( SUM(Sales[Qty]), FILTER(Products, CALCULATE(COUNTROWS(BOM)) > 0) )
Total BOM Qty
Now I can just apply the same pattern to the other formula, switching out the table names.
Total BOM Qty = CALCULATE( SUM(BOM[Qty]), FILTER(Products, CALCULATE(COUNTROWS(Sales)) > 0) )
And here is the working Pivot Table, same as in Power BI Desktop earlier. The BOM table is filtering the Sales table, and the Sales table is filtering the BOM table – Many to Many using DAX formulas in action!
Now for the Italian Black Magic
There is another way you can write these formulas that is simpler to write and easy to read – unfortunately it is difficult to understand how it works – Marco Russo calls it Black Magic. Here are the 2 formulas.
Total Sales Qty = CALCULATE(SUM(Sales[Qty]), BOM)
Total BOM Qty = CALCULATE(SUM(BOM[Qty]), Sales)
When you compare these Black Magic formulas against the Pseudo DAX formula I wrote earlier, you will see that I am using the BOM table as the filter expression in the first formula, and the Sales table as the filter expression in the second formula. This doesn’t make any sense on first sight. If filters always propagate from the one side of the relationship to the many side of the relationship, how can these formulas possibly work? This can be explained with “Expanded Tables”.
Power Pivot is built on top of some more traditional database technologies and hence what happens inside Power Pivot can be converted (or thought of) in more traditional database patterns and structures behind the scenes. In SQL terms, the relationships between the Sales table (shown as 1 below), the Calendar table (2) and the Products table (3) are:
Sales Left Outer Join Calendar ( 1 to 2)
Sales Left Outer Join Products ( 1 to 3)
If I had these tables in a relational database, I could materialise the sales table into an Expanded Table that contains all the original Sales columns, plus the Calendar table columns and the Product table columns. To do tihs I could write the following SQL Query:
Select * from Sales Left Join Calendar on Sales.Date = Calendar.Date Left Join Products on Sales.[Product ID] = Products.[Product ID]
The above query will return the following Expanded Table with columns of data coming from 3 different tables.
Technically speaking when I place the Sales table as a filter argument inside the following formula…
Total BOM Qty = CALCULATE(SUM(BOM[Qty]), Sales)
…I am actually placing the Expanded Table – the Sales table plus all the relevant records from the other tables on the one side of the relationships.
The Expanded table will still be filtered by the current filter context. So if there is a filter on the Calendar table (say for 3rd Jan), then the Calendar table will filter the Sales table AND the Expanded Sales Table. If I re-run the SQL code with a filter on Calendar[Date] = ‘3 Jan 2016’ I get this new Expanded Table. The Calendar table is filtering the Sales table, and the Sales table is filtering the Products table.
So when Sales table is used as the filter portion of the CALCULATE function, you can only “see” the Sales table, but it is actually the entire Expanded Sales table (including the Calendar and Products tables and any filters from all of these 3 tables) that is doing the filtering, not just the single Sales table. Filters from all 3 tables are therefore effectively filtering the BOM table and that is why it works.
Here are copies of my workbook files if you want to take a look.
Where to learn more
I have learnt most of my advanced DAX knowledge from The Italians (Marco Russo and Alberto Ferrari). There is an excellent video available here where Alberto explains Many to Many relationships. I also recommend the book “The Definitive Guide to DAX” for anyone that wants to develop a deep mastery of the DAX language.
For people that are earlier in the learning stages of DAX, you really must master filter propagation and context transition before you can move forward. If you haven’t mastered these techniques, I recommend you invest some time going back over the basics and make sure you have a solid understanding of the pre-requisites. The fastest, cheapest and most effective way you can do this is to read my book “Supercharge Power BI”.