In my last article, I introduced the basics of the DAX FILTER() function. Hopefully that introductory explanation to FILTER() helped you get your head around what FILTER() is doing under the hood. With that knowledge in place, let me help you take your understanding to the next level.
Let’s look back at my example from last article
Total Sales to Customers Earning More Than $50k =
SUM ( Sales[ExtendedAmount] ),
FILTER ( Customers, Customers[YearlyIncome] > 50000 )
As I explained last time, the FILTER() portion of the formula above is executed one row at a time through the table specified in the first parameter. Technically, we say that the FILTER() function has a row context, and it iterates through the rows in the specified table, one row at a time to complete its task. For this reason, we say that FILTER() is an iterator. There are many other iterating functions in DAX, including SUMX(), MINX() and all the other “X functions”. All iterators have a row context. The easiest way to spot them is they all have a table as the first parameter in the function. A calculated column is also an iterator, but of course there is no table as the first parameter – because IT IS a table.
Now there are certain things that are not outwardly obvious with what I have covered so far.
Even though FILTER() logically executes one row at a time, in reality there are optimisations in the engine that CAN make it more optimised than the simplistic “row by row” iterator concept suggests (not always – you can write bad DAX). Even though there can be optimisations under the hood, you should still follow some basic rules when using FILTER().
- Generally speaking you should not filter a data (fact) table if you can filter a Lookup (Dimension) table instead. In my example, we are filtering a dimension table, so it is not too bad, but as I will show you below, there is actually a better way to write this formula.
- You should always filter a column in preference to a table where possible. The exception is if you are filtering a table with a primary key and need to iterate at the same level of granularity as the primary key. A simple example is a lookup (dimension) table. Generally speaking, if you are iterating every row in a dimension table, it doesn’t matter if you iterate the whole table or the primary key of the table – the efficiency is effectively the same.
Filter Columns in Preference to Tables
Now this is interesting. The syntax for FILTER() requires that the first parameter be a table, not a column. If I place a column in the first parameter, I get an error.
Illegal measure = CALCULATE ( SUM ( Sales[ExtendedAmount] ), FILTER ( Customer[Customer Key], Customers[YearlyIncome] > 50000 ) )
The formula above is illegal because it has a column as the first parameter in FILTER(). This is not allowed. So how can you filter a column? The trick is to turn the column into a virtual table first. The two common ways of doing this are:
Technically you could use any table function that achieves the same outcome (SUMMARIZE, DISTINCT, ALLSELECTED etc)
Let’s look at the two common approaches above. Which should you use? Well, ALL() and VALUES() do different things.
- The ALL() function will contain all the unique values in the column after removing all the filters affecting the visual.
- The VALUES() function contains all the unique values in the column but RETAINS the filters affecting the visual.
Use the one that does what you need.
Both of these virtual tables will retain lineage to the data model. I talked about lineage in the last article, and I wrote about lineage in depth in this older article. I find the easiest way to think of lineage is to imagine a virtual table that is temporarily living inside the data model and this temp table contains a virtual relationship back to the table from where it was born (hence the term lineage).
Let’s look at one possible modified version of the formula.
Different, but not great formula = CALCULATE ( SUM ( Sales[ExtendedAmount] ), FILTER ( ALL ( Customers[CustomerKey] ), CALCULATE ( SUM ( Customers[YearlyIncome] ) ) > 50000 ) )
The new formula follows the rule “don’t filter a table if you can filter a column”. But in this case the column and the table have the same cardinality, so there is little benefit there. Also, the new formula requires a second CALCULATE() and SUM() inside the FILTER() function. This is required because the column Customers[YearlyIncome] is no longer in the same table that FILTER() is iterating. The FILTER() function is iterating a virtual, single column table that contains all customer keys in the customer table. The column Customers[YearlyIncome] doesn’t exist in this virtual table, it exists in the Customers table, so you must wrap the column in an aggregation function, SUM() in this case. Further, as the FILTER() function iterates in a row context through the virtual table, the virtual relationship does not filter the connected tables UNLESS you specifically tell the formula to do so. Technically, to make the filter propagate from the new virtual table created by ALL(Customers[CustomerKey]), we need to convert the row context into an equivalent filter context via context transition. Context transition is triggered by the inner CALCULATE() inside the FILTER() function in this case.
Ok, this formula works, but it is probably no more efficient than the first one that referred to the Customers table (maybe worse due to all those context transitions). It is iterating the same number of rows, plus we now have a context transition required for each step. I generally would not write this formula in preference to the first.
A Much Better Formula
Ok, so what then? Ideally we need to find a column with a lower level of granularity that we can iterate through (less often), that will give us the same result. The prime candidate in this case is the Customers[YearlyIncome] Column as shown below.
Much better formula = CALCULATE ( SUM ( Sales[ExtendedAmount] ), FILTER ( ALL ( Customers[Yearly Income] ), Customers[Yearly Income] > 50000 ) )
This new formula is much better and ticks all the boxes.
- It iterates over the smallest granularity table possible to accurately complete the test – in this case the table is a single column table containing all the unique possible income bands for all customers. Because we are testing for income, we can deal with all the customers with the same income at once – this is FAR superior to dealing with each customer individually.
- This time the Customers[YearlyIncome] column is in the same table that we are iterating, so no need for another CALCULATE() to force context transition at each step of the journey.
- The outer CALCULATE() performs the final context transition between the virtual FILTER Table and the rest of the model at the end of the process.
Now, Back to Simple Filters
OK, now back to the beginning. Last week I told you that there were 2 types of filter parameters inside CALCULATE(). Here is a refresher.
There are 2 ways you can add filter parameters inside CALCULATE():
- As a simple filter:
- Table[Column] compared to some value, eg
- As a table filter:
- A table function is added as the filter. The table function is evaluated first, and the results in that table are used as a filter. eg
Up until now, I have only been talking about the second type of filter parameter – the table filter. So let me come back to the simple filter. The simple filter is just a trick – it is syntax sugar!
Simple Filter Version (syntax sugar) = CALCULATE ( SUM ( Sales[ExtendedAmount] ), Customers[Yearly Income] > 50000 )
When you write the above formula, what ACTUALLY gets executed by CALCULATE() is the following:
Actual Formula Executed Under the Hood = CALCULATE ( SUM ( Sales[ExtendedAmount] ), FILTER ( ALL ( Customers[Yearly Income] ), Customers[Yearly Income] > 50000 ) )
So why did Microsoft do this? Because they wanted to make it simple to do regular tasks that would be otherwise hard to do. Imagine if you had to learn about FILTER() and ALL() to efficiently write this formula? The barrier to learning for beginners would be too high, hence the concept of syntax sugar was born to help you do complex tasks easily.
Ok, so if you like the way I teach and explain concepts, you can learn from me in a structured way. You may like to consider reading my book, participating in one of my online training courses, or attending a live Australian training course, or have me come into your organisation for a custom live in house training course.