The FILTER Function in DAX Part 2 - Excelerator BI

The FILTER Function in DAX Part 2

Level: Intermediate

In my last article, I introduced the basics of the DAX FILTER() function. Hopefully that introductory explanation to FILTER() helped you get your head around what FILTER() is doing under the hood. With that knowledge in place, let me help you take your understanding to the next level.

Let’s look back at my example from last article

Total Sales to Customers Earning More Than $50k =
 CALCULATE (
     SUM ( Sales[ExtendedAmount] ),
     FILTER ( Customers, Customers[YearlyIncome] > 50000 )
 )

As I explained last time, the FILTER() portion of the formula above is executed one row at a time through the table specified in the first parameter. Technically, we say that the FILTER() function has a row context, and it iterates through the rows in the specified table, one row at a time to complete its task. For this reason, we say that FILTER() is an iterator. There are many other iterating functions in DAX, including SUMX(), MINX() and all the other “X functions”.  All iterators have a row context.  The easiest way to spot them is they all have a table as the first parameter in the function.  A calculated column is also an iterator, but of course there is no table as the first parameter – because IT IS a table.

Now there are certain things that are not outwardly obvious with what I have covered so far.

Even though FILTER() logically executes one row at a time, in reality there are optimisations in the engine that CAN make it more optimised than the simplistic “row by row” iterator concept suggests (not always – you can write bad DAX). Even though there can be optimisations under the hood, you should still follow some basic rules when using FILTER().

  1. Generally speaking you should not filter a data (fact) table if you can filter a Lookup (Dimension) table instead. In my example, we are filtering a dimension table, so it is not too bad, but as I will show you below, there is actually a better way to write this formula.
  2. You should always filter a column in preference to a table where possible. The exception is if you are filtering a table with a primary key and need to iterate at the same level of granularity as the primary key. A simple example is a lookup (dimension) table. Generally speaking, if you are iterating every row in a dimension table, it doesn’t matter if you iterate the whole table or the primary key of the table – the efficiency is effectively the same.

Filter Columns in Preference to Tables

Now this is interesting. The syntax for FILTER() requires that the first parameter be a table, not a column.  If I place a column in the first parameter, I get an error.

Illegal measure =
CALCULATE (
    SUM ( Sales[ExtendedAmount] ),
    FILTER ( Customer[Customer Key], Customers[YearlyIncome] > 50000 )
)

The formula above is illegal because it has a column as the first parameter in FILTER(). This is not allowed. So how can you filter a column? The trick is to turn the column into a virtual table first. The two common ways of doing this are:

  • ALL(Table[Column])
  • VALUES(Table[Column])

Technically you could use any table function that achieves the same outcome (SUMMARIZE, DISTINCT, ALLSELECTED etc)

Let’s look at the two common approaches above.  Which should you use?  Well, ALL() and VALUES() do different things.

  • The ALL() function will contain all the unique values in the column after removing all the filters affecting the visual.
  • The VALUES() function contains all the unique values in the column but RETAINS the filters affecting the visual.

Use the one that does what you need.

Both of these virtual tables will retain lineage to the data model. I talked about lineage in the last article, and I wrote about lineage in depth in this older article. I find the easiest way to think of lineage is to imagine a virtual table that is temporarily living inside the data model and this temp table contains a virtual relationship back to the table from where it was born (hence the term lineage).

Let’s look at one possible modified version of the formula.

Different, but not great formula =
CALCULATE (
    SUM ( Sales[ExtendedAmount] ),
    FILTER (
        ALL ( Customers[CustomerKey] ),
        CALCULATE ( SUM ( Customers[YearlyIncome] ) ) > 50000
    )
)

The new formula follows the rule “don’t filter a table if you can filter a column”. But in this case the column and the table have the same cardinality, so there is little benefit there. Also, the new formula requires a second CALCULATE() and SUM() inside the FILTER() function. This is required because the column Customers[YearlyIncome] is no longer in the same table that FILTER() is iterating. The FILTER() function is iterating a virtual, single column table that contains all customer keys in the customer table. The column Customers[YearlyIncome] doesn’t exist in this virtual table, it exists in the Customers table, so you must wrap the column in an aggregation function, SUM() in this case. Further, as the FILTER() function iterates in a row context through the virtual table, the virtual relationship does not filter the connected tables UNLESS you specifically tell the formula to do so. Technically, to make the filter propagate from the new virtual table created by ALL(Customers[CustomerKey]), we need to convert the row context into an equivalent filter context via context transition. Context transition is triggered by the inner CALCULATE() inside the FILTER() function in this case.

Ok, this formula works, but it is probably no more efficient than the first one that referred to the Customers table (maybe worse due to all those context transitions). It is iterating the same number of rows, plus we now have a context transition required for each step. I generally would not write this formula in preference to the first.

A Much Better Formula

Ok, so what then? Ideally we need to find a column with a lower level of granularity that we can iterate through (less often), that will give us the same result. The prime candidate in this case is the Customers[YearlyIncome] Column as shown below.

Much better formula =
CALCULATE (
    SUM ( Sales[ExtendedAmount] ),
    FILTER ( ALL ( Customers[Yearly Income] ), Customers[Yearly Income] > 50000 )
)

This new formula is much better and ticks all the boxes.

  • It iterates over the smallest granularity table possible to accurately complete the test – in this case the table is a single column table containing all the unique possible income bands for all customers.  Because we are testing for income, we can deal with all the customers with the same income at once – this is FAR superior to dealing with each customer individually.
  • This time the Customers[YearlyIncome] column is in the same table that we are iterating, so no need for another CALCULATE() to force context transition at each step of the journey.
  • The outer CALCULATE() performs the final context transition between the virtual FILTER Table and the rest of the model at the end of the process.

Now, Back to Simple Filters

OK, now back to the beginning. Last week I told you that there were 2 types of filter parameters inside CALCULATE(). Here is a refresher.

There are 2 ways you can add filter parameters inside CALCULATE():

  • As a simple filter:
    • Table[Column] compared to some value, eg
    • Customers[YearlyIncome]>50000
    • Product[Category]=”Accessories”
  • As a table filter:
    • A table function is added as the filter. The table function is evaluated first, and the results in that table are used as a filter. eg
    • FILTER(Customers,Customers[YearlyIncome]>50000)
    • ALL(Products)

Up until now, I have only been talking about the second type of filter parameter – the table filter. So let me come back to the simple filter. The simple filter is just a trick – it is syntax sugar!

Simple Filter Version (syntax sugar) =
CALCULATE ( SUM ( Sales[ExtendedAmount] ), 
             Customers[Yearly Income] > 50000 )

When you write the above formula, what ACTUALLY gets executed by CALCULATE() is the following:

Actual Formula Executed Under the Hood =
CALCULATE (
    SUM ( Sales[ExtendedAmount] ),
    FILTER ( ALL ( Customers[Yearly Income] ),
               Customers[Yearly Income] > 50000 )
)

So why did Microsoft do this? Because they wanted to make it simple to do regular tasks that would be otherwise hard to do. Imagine if you had to learn about FILTER() and ALL() to efficiently write this formula? The barrier to learning for beginners would be too high, hence the concept of syntax sugar was born to help you do complex tasks easily.

What Next?

Ok, so if you like the way I teach and explain concepts, you can learn from me in a structured way. You may like to consider reading my book, participating in one of my online training courses, or attending a live Australian training course, or have me come into your organisation for a custom live in house training course.

10 thoughts on “The FILTER Function in DAX Part 2”

  1. Hi Matt, I have come to your page hunting for filtering sales by multiple dimension tables. Please consider problem of filtering Sales table by dimension table colors and dimension table sizes. How to filter products with color=red OR size=big? Would you be so kind to direct me anywhere or give a hint? How to achieve this using FILTER and KEEPFILTERS? Greetings Przemyslaw Remin

    1. I believe this is what you need

      =
      CALCULATE (
      [your measure],
      DISTINCT (
      UNION (
      FILTER ( ALL ( product[colour] ), product[colour] = “red” ),
      FILTER ( ALL ( product[size] ), product[Size] = “big” )
      )
      )
      )

  2. Thanks Matt. I have used Filter quite a bit, but I see now I really never understood it completely. Many times I would get it to work without complete understanding. The concept of the virtual table has brought much clarification.

  3. Fernando Camargo

    Great article, I didn’t know that I could make a less expensive formula using ALL to filter by a column instead of the whole virtual table. BTW I did read your book Learn to write DAX, I highly recommend it.

  4. I was struggling to learn DAX function, I was looking for good resource where I can learn and utilize the same in my day to day working life. Their are many paid and free resources are available in internet.I found your study material which is really helpful to learn easy and connect to my work. Your examples practical base.

    Thanks Matt for your great help.

  5. Hi Matt, love the way you’ve explained this!

    When talking about the Much Better Formula you say “in this case the table is a single column table containing all the unique possible income bands for all customers”. This might seem strange to some readers as the definition of ALL clearly states you get all the rows in the table/column with no filters.

    So here i assume ALL(Customers[Yearly Income]) implies a unique list of income bands since the compressed column in VertiPaq internally contains the unique values.

    Am i following your logic correctly here?

    1. Hey Bhavik. You are exactly right. It is the fact that the Vertipaq engine stores data in a compressed way in columns, not rows, that makes this the preferred way to filter the data. We are helping the engine filter the data the best way possible. A good way to visualise this is to create a new table in Power BI using the New Table button. Here you CAN materialise the table so you can “see” it. Table = ALL(Customers[YearlyIncome]) will display a distinct list of all the incomes. The other way to “see” it is to go to the data view and click the drop down filter at the Customers[YearlyIncome] column – it is the same list.

      1. Awesome Explanation!! Thank you so much Matt. This article helped me to understand how the engine behaves when using “All” within a filter() which was a question inside my head for ages.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top