Sometimes, just when you think you are getting the hang of the DAX language, something happens that completely baffles you. Such events can cause you to doubt your own learning and progress. But let me assure you – this happens to EVERYONE at sometime or another. In this article I am going to show you one such baffling situation, and then explain what is really happening.
I am using a simple Adventure Works data model for this demo.
The final objective is to write the following measure
Total Northern Hemisphere Sales in 2003
In other words, I want a measure that will give me sales for all the countries excluding Australia. (BTW, did you know that there is just 5.7% of the global population that has summer in Dec – Feb each year (ie summer in the southern hemisphere)? – read my article about that here).
I have set up a table that contains the Territory[Country] on Rows and I have added 2 measures.
Total Sales = SUM(Sales[ExtendedAmount])
Total Sales in 2003 = SUM([Total Sales],'Calendar'[Year]=2003)
All that is required now is to remove Australia in the measure to deliver the final required measure. Here is the first attempt at such a measure
Total Northern Hemisphere Sales in 2003 = CALCULATE([Total Sales], 'Calendar'[Year]=2003, Territory[Country]<>"Australia")
When this measure is added to the above pivot table, this is what happens.
The measure indeed is returning Total Northern Hemisphere Sales in 2003, however the problem is that it ALWAYS gives the TOTAL sales regardless of any other filter coming from one of the Countries. Plus, what’s worse, Australia pops up again when the whole purpose of the measure was to exclude Australia from the calculation. This may not be what you expect and sometimes not what you want. Arguably it would be more useful if this measure returned $6.8M at the Grand Total line, but still returned the sales by country when sliced by each country (and excluded Australia in the table).
To understand what is happening here, I need to cover the topic of “syntax sugar”. Syntax sugar is a term that the Microsoft developers use to describe the way they simplify the syntax of DAX. They do this to make it easier for people to get started with the DAX language without having to be an expert. This is indeed helpful, but by definition it hides more complex principles from the DAX author, and these can come back to confuse you at some time in the future.
Total Sales from Bikes
Let me take a slight diversion from the DAX formula above. Say you wanted to write a measure “Total Sales from Bikes”. Thanks to syntax sugar, this is very easy to do.
Total Sales from Bikes = CALCULATE([Total Sales],Product[Category]="Bikes")
But this simple syntax is just syntax sugar for a more complex formula. Under the hood, this is the actual formula that gets executed.
Total Sales from Bikes = CALCULATE ( [Total Sales], FILTER ( ALL ( Product[Category] ), Product[Category] = "Bikes" ) )
The second formula is harder to learn than the first one. Of course you CAN learn it, but it is just not the first thing the average person wants to learn when the objective is to do something as basic as returning Total Sales from Bikes. There is a lot going on with this second formula including a FILTER function and also the ALL function – not something you would want to learn in the first couple of hours of your DAX journey – hence the purpose of the syntax sugar concept.
The Full Formula From Above
Back to the original formula Total Northern Hemisphere Sales in 2003. You now know that the full actual formula is as follows.
There are 2 FILTER functions ‘hidden’ from the simple syntax that are visible in the full formula above. Each of these formulas first removes the filters applied to the columns (Calendar[Year] and Territory[Country]) and then each FILTER re-applies a new filter on that same column. So this explains the strange behaviour shown at the start of this article. Line 5 in the formula above removes the filters coming from Territory[Country] (which is in the rows section of the table below), and then reapplies a new filter on the Territory[Country] table so that it returns all countries other than Australia, returning $6.8M for every row in the table.
As you can see above, this can be confusing because the rows in table appear to give the wrong answer. In this case, the rows in the table above are “fighting” with the CALCULATE function for the right to apply a filter on the Country[Territory] column. When a filter is applied to a column from a visual and then another filter is applied on the same column in a CALCULATE, the CALCULATE always wins. So the visual in the row highlighted above says “put a filter on Australia” but the CALCULATE inside the measure [Total Northern Hemisphere Sales in 2003] says “put a filter on all countries except Australia”. They are “arguing” over the same column and CALCULATE wins.
Why have the ALL function at all?
The next question that may come to mind is “why have the ALL function at all”? Why doesn’t the syntax sugar just return the following?
If I add the above formula to the table, this is what happens.
This time the measure returns something that is more intuitive for the average user ( I think you will agree). So why didn’t the developers use this formula as the syntax sugar instead of the version including ALL? Well I can only speculate (others may know more than me), but one thing I know for sure is that where ever possible, it is advisable to operate on a single column in preference to working on an entire table. This is particularly true in formulas that are iterators (such as FILTER and the X Functions). The Vertipaq engine is designed to store data in Columns and as a consequence it operates most efficiently when it acts on single columns. In the case of the Territory table in this example, the number of rows in the Territory table and the number of distinct values in the Country column are not that different (as can be seen below). So in this case the performance difference would not be significant, however the principle of operating on columns rather than tables is still well advised.
An Alternative Formula using VALUES
There is an alternative formula that uses columns and still gives the more intuitive result. In fact there are a few options that could work and this is just one.
The formula above uses VALUES instead of ALL inside the FILTER function. This is more efficient than referring to the entire table and has the added benefit of respecting the filter applied by the rows in the visual.
So why didn’t the developers use VALUES underneath the syntax sugar of a simple CALCULATE instead of ALL? Once again, I don’t know the answer to that – maybe others know more than me. Maybe the VALUES function didn’t exist when they first came up with the simple syntax for CALCULATE, maybe there is some other technical reason that ALL is superior. But it doesn’t really matter once you understand more about how the DAX language works, because you can then use your skills, knowledge and experience to solve almost any problem you come across.