Simple Filters and Syntax Sugar in DAX functions

Name: BI Training
Address: AU
Telephone: @
Price range: @

By Matt Allington / April 9, 2019 December 24, 2021

Level: Intermediate

There is a concept in the DAX language called “Syntax Sugar”. Simply put, the developers have created simplified versions of more complex formulas to make it easier for people to learn and use the DAX functions. Here are a couple of examples.

Total Sales Syntax Sugar = SUM(Sales[ExtendedAmount])

Total Sales Full Syntax = SUMX(Sales, Sales[ExtendedAmount])

Total Sales Prior Year Syntax Sugar =
      CALCULATE(
          [Total Sales],
          SAMEPERIODLASTYEAR('Calendar'[Date]
      )

Total Sales Prior Year Full Syntax =
      CALCULATE(
          [Total Sales],
          DATEADD('Calendar'[Date], -1, YEAR)
      )

Simple Syntax in CALCULATE DAX function

Another example of syntax sugar is the simple syntax inside CALCULATE. There are 2 different ways you can apply a filter inside CALCULATE.

= CALCULATE(<expression>, Table[Column] compared to a scalar value, ...)

= CALCULATE(<expression>, TableFunction(),. ..)

Consider the following example using the simple syntax.

Total Sales of Bikes Simple =
     CALCULATE(
         [Total Sales],
         Product[Category] = "Bikes"
     )

The above formula is syntax sugar for the following full DAX function formula.

Total Sales of Bikes Full Version =
     CALCULATE(
         [Total Sales],
         FILTER(
              ALL(Product[Category]),
              Product[Category] = "Bikes"
         )
     )

As you can see, the simple version of the DAX formula is clearly easier for a beginner to learn and understand. Under the hood, the simple version is converted to the full version prior to execution.

But Why use ALL(Table[Column])?

One thing a curious mind may be interested in is “why does the full version use ALL(Table[Column]) and not ALL(Table)”? Before I answer that question, let me first talk about compression. The Vertipaq engine (used in Power BI and Power Pivot) is a column store database. Traditional databases (eg SQL Server) use row store technology – each row is stored one full row at a time. A column store database stores the data one column at a time. Some of the consequences (benefits) of a column store database are that it is very efficient to filter, iterate and operate over entire columns of data in a table.

Compression Explained

There are a few different types of compression used in a the Vertipaq database, and I am only going to discuss one of them here. Let’s assume you have a data table with 6 columns and 100 million rows, and one of the columns is “Quantity”. The objective is to sum the quantity and come up with a total. A row store database (without an index) would need to retrieve all 100 million rows of data, including those columns not needed in the calculation, to come up with the answer. A column store database can directly access the single column to get the answer. What’s more, the columns can be compressed by the Vertipaq engine prior to being stored and accessed. One compression method is called run length encoding (RLE).

In the image below, the original column of data contains various values (in this case, the numbers 1 to 5). The way RLE works is that it first sorts the column, then creates a compressed version of the column, similar to that shown below (illustrative).

As you can see above, the compressed version of the column is small compared to the original. In my example the original column/table has only 18 rows, but if the original table had 100 million rows with values from 1 to 5, it would still compress to exactly the same compressed version as shown above. The calculation to add up the columns in my sample above would therefore look something like this (pseudo code).

Value 1 exists 3 times (starting row 4 minus 1). Multiply 1 * 3
Value 2 exists 3 times (starting row 7 minus 4). Multiply 2 * 3
Value 3 exists 4 times (starting row 11 minus 7). Multiply 3 * 4
Value 4 exists 3 times (starting row 14 minus 11). Multiply 4 * 3
Value 5 exists 5 times (starting row 19 minus 14). Multiply 5 * 5
Add up all the totals

As you can no doubt understand, the above process is identical for an 18 row column and a 100 million row column. What’s more, the Vertipaq engine is mutli-threaded. If you have 4 cores on your machine, the calculations can be shared amongst the cores and completed in parallel before finally being combined to return the final result.

Uniqueness is Your Enemy

It follows that the more unique values in a column, the lower the compression. There also comes a tipping point where it is better just to store the uncompressed column than it is to try to compress the data. This is why it is better to remove un-needed precision and uniqueness from a column before loading. Examples include rounding to 2 decimal places rather than keeping unrounded decimal numbers, separating date and time into 2 columns and rounding the time to minutes rather than keeping the seconds (if seconds are not needed) etc.

Back to ALL (Table[Column])

Now, back to the syntax sugar. The reason it is more efficient to filter, iterate and operate over a column than a table is because the engine can iterate over the compressed version of the column. the ALL() DAX function returns a table that contains all the distinct values in a column. This is exactly what the “Value” column in the compressed table in the image above contains (except the blank row at the bottom). So that is the reason the syntax sugar is written the way it is – it is to leverage the efficiencies of the Vertipaq engine.

Final Advice for DAX functions

In the wise words of Alberto Farrari, “Never filter a table if you can filter a column instead”. Now you know why.

About The Author

Matt Allington

Matt Allington is the Data Professional you want to be trained by. Matt does a phenomenal job of breaking concepts down into easily digestible chunks. He is also the principal consultant at Excelerator BI Pty Ltd. There are 3 main areas where he can help you save months and even years of self-learning: Kickstart Power BI in your organisation, training and consulting. He also brings his 35-year career expertise in business and data analytics directly to you with his high quality Power BI training courses and consulting. Check out his Public Training and begin your Power BI Ninja journey!

17 thoughts on “Simple Filters and Syntax Sugar in DAX functions”

retro bowl
February 2, 2023 at 3:46 pm

Wonderfully written piece. How did you figure out what the DAX “sugar” functions really called?

Reply
1. Matt Allington
  February 3, 2023 at 10:36 am
  
  A combination of learning from others, speaking to Jeffrey Wang at Microsoft, and then once I had an understanding of how it works, you can actually guess.
  
  Reply
Mishkafofer
May 17, 2020 at 3:35 am

I suspect that Syntax Sugar uses a boolean logic to retain efficiency due to the issue that FILTER is considered generally inefficient. After reading Microsoft article I think it explains the following error that you get when you try to use Syntax sugar with a measure
(For example: CALCULATE([Total Orders Measure], Big_Table[SomeColumn] > [Avg Price Measure])

The error that you will get: “A function ‘Calculate’ has been used on a True/False expression that is used as a table expression. This is not allowed”.

https://docs.microsoft.com/en-us/power-bi/guidance/dax-avoid-avoid-filter-as-filter-argument

Reply
1. Matt Allington
  May 18, 2020 at 3:05 pm
  
  No, that’s not it. FILTER is not generally considered inefficient, as long as you use it efficiently. The syntax sugar in this case ensures that FILTER is executed efficiently under the hood.
  
  Reply
Randy Minder
August 11, 2019 at 6:00 am

Hi Matt – Nice article. How were you able to determine the actual underlying code the DAX “sugar” functions are calling? This could be extremely educational and helpful in certain circumstances.

Reply
1. Matt Allington
  August 11, 2019 at 8:43 am
  
  Well i know some people that know this stuff – Marco Russo, Jeffrey Wang. But if you use DAX studio, you can see the queries generated by Power BI. You can write the different measures and see that the same query gets executed. However it is not as simple as just referring to a list of syntax sugar.
  
  Reply
  1. Randy Minder
    August 11, 2019 at 11:00 am
    
    Thank you Matt. DAX Studio will not show what I want. For example, if I call TOTALYTD in my code, DAX Studio shows that I called TOTALYTD but it does not show the code TOTALYTD is actually executing. This is what I’d love to see.
    
    Reply
Matt Allington
April 15, 2019 at 6:34 am

The only reason I can think of that you would need one column is if you want to create a relationship using that column. That implies you will also have a date/time calendar table. This is not a common requirement. And yes, don’t split the columns and then add a calculated column later. This is worse for 2 reasons. Firstly you now have 3 columns instead of 2, and second calculated columns are generally less well compressed (not always, but at least there is the possibility that they are less well compressed).

Reply
Uriel Ilan
April 15, 2019 at 3:13 am

ThX Matt
This article made me think one step forward, regarding optimizing one’s Data:
In the article you gave an example that it will help separate Date from time, in to 2 columns, so there will be less unique members in the column on the expense of having 2 columns.
I understand this well but then I think:
If I must have one dimension (or column) of both (as in your example) of date and time, will it payoff (from the performance side only) to separate them, as you mentioned in to 2 columns, and then with DAX add a third calculated column of them joined together? or will all the advantage be loosed?

Reply
Ed Hansberry
April 15, 2019 at 12:22 am

Excellent article. Understanding what is really happening behind the scenes helps me understand DAX better.

In your first set of examples, isn’t there still some syntax sugar to be teased out:
Total Sales Prior Year Full Syntax =
CALCULATE([Total Sales],DATEADD(Calendar[Date],-1,YEAR))

Isn’t *that* syntax sugar for this:
Total Sales Prior Year Full Syntax =
Total Sales Prior Year Full Syntax =
CALCULATE(
[Total Sales],
DATEADD(CALCULATETABLE(VALUES(‘Date'[Date])),-1,YEAR)
)

Reply
1. Matt Allington
  April 15, 2019 at 6:39 am
  
  Possibly, I don’t know for sure. I can’t see what CALCULATETABLE does in your example, as VALUES already respects the filter context. Then of course the second and third parameters are doing something under the hood too. I guess at the nth degree, all code is just syntax sugar for machine code.
  
  Reply
2. Oxenskiold
  May 12, 2019 at 4:42 pm
  
  You’re right Ed.
  The ‘Date'[Date] part of DATEADD is syntax sugar for CALCULATETABLE(VALUES(‘Date'[Date])). This is important to realize because DATEADD(Calendar[Date],-1,YEAR) behaves differently from ,DATEADD(VALUES(Calendar[Date]),-1,YEAR) when evaluated under a row context.
  
  “Understanding what is really happening behind the scenes helps me understand DAX better.”.
  I couldn’t have put it better.
  
  If you’re a stickler for details you could argue that the real syntax sugar is CALCULATETABLE(DISTINCT(‘Date'[Date])); using DISTINCT instead of VALUES. This because the time intelligence functions generally ignore the blank row (Unknown member). (How would you shift/move the blank row/unknown member?).
  
  Best regards Jes.
  
  Reply
Iretiola Fola-bolumole
April 10, 2019 at 12:27 am

Great explanation on the column filter. The picture made it easy to understand

Reply
Nazim Hussain
April 9, 2019 at 6:45 pm

Matt, I am currently facing a problem of connection my pbix file to an sql adventureworks2012 installed on
my pc. I have tried a lot of combinations and permutations but to no avail. Can u please guide me in this.

Thanks in advance

Reply
1. Matt Allington
  April 13, 2019 at 10:56 am
  
  I think it best if you post a question at community.powerbi.com
  
  Reply
rob maessen
April 9, 2019 at 3:58 pm

Again a great article, Matt. However one typo in the pseudo code: “Multiply 5 * 5” ?

Reply
Jeff Weir
April 9, 2019 at 7:52 am

Beautiful article, Matt. Run Length Encoding is analogous to ‘Range Slicing’ in Excel. I wrote something on Range Slicing some time back at Stack Overflow that might be of interest by way of comparison: https://stackoverflow.com/a/32488911/2507160

Reply