Using Variables in DAX

Level: Intermediate

The variable feature in DAX has been around for more than a year and I have simply never got around to writing an article about it – I am changing that today.  Let me start off with some version information and syntax and then I will show you are real life example where I have used variables to improve my formula.

Versions

Variables in DAX is a relatively new feature and is available in

  • Power BI Desktop
  • Excel 2016
  • SSAS Tabular 2016

Variables are not available in Excel 2013 or Excel 2010.

Syntax

I normally refer to variables as “the VAR syntax”.  In short there are 2 new keywords that are used together to make your formulas more readable.  The keywords are VAR and RETURN.  This is how you would write a typical formula.

myFormula = VAR = VariableName1 = <some valid DAX formula>
            VAR = VariableName2 = <some other valid DAX formula>
            RETURN <another valid DAX formula that can use VariableName1 
                       and VariableName2 as part of the expression>

Notes on the Syntax

It is possible to set scalar values as variables but it is also possible to assign tables as variables. See the following valid examples for assigning a variable within a formula.  Note that the following are just examples of the VAR line.  You must always use VAR and RETURN to write a valid complete DAX formula.

Scalar Value Example

VAR myScalarValue = SUM(Sales[Extended Amount])

Table Example

VAR myTable = FILTER(Customer,Customer[Post Code] = 50210)

The above variable myTable returns a virtual table that can be used anywhere that a table is used in a DAX formula, such as a filter inside a CALCULATE function.

Other Important Points

  • Each of the variables are evaluated once prior to the commencement of the RETURN portion of the formula.
  • Variables can refer to other variables.
  • The variables are executed in the initial filter and row contexts.
  • The results of the variable calculations are cached for reuse inside the formula.
  • Once the variable has been assigned a value, that value cannot change during the execution of the RETURN portion of the formula. From that perspective, the variables act more like constants than regular variables in a traditional programming language context.

Benefits

There are many benefits of using the VAR syntax, particularly when you also use in line comments.

  • The formulas can be easier to write. It may seem to take more time to write the formula at first, but you may just end up saving that time by getting to a working version of the formula faster, with less rework.
  • The formulas can be easier to read.  A reader can understand the logic of the formula step by step and hence it can be easier to work out what the formula is designed to do.  This is particularly true when the author uses comments, new lines and spacing as well.
  • The formulas can be easier to maintain when changes are needed – see my example below.
  • Formulas that use repetitive sub strings of DAX can be written once and then used many times.  This is good for formula writing but also for performance (although DAX already had some good caching features prior to this syntax).
  • Given the variables are executed in the initial filter context, this syntax can be an easier alternative to using the EARLIER function (when you have nested row contexts inside a complex formula).

Real Life Example

I was with a client last week building a sales force incentive tracking system.  It is very common for sales management to want to measure performance across a range of different business drivers in order to focus the sales force on what is important.  As the saying goes, “what gets measured gets done”.  Power BI, Power Pivot and Power Query are excellent tools for this type of reporting.  Reasons include:

  • Often the data sources are disparate and need to be brought together into a single report.
  • The time horizon for the tracking is normally short and means that the cost and effort of a full blown IT BI solution often makes it hard to justify.
  • The effort to manually produce such a report on a regular cycle (say monthly) can be high and hence the work tends to get done less regularly (say quarterly or even half yearly).  Irregular updates of such tracking tools defeats the purpose of the entire exercise.

In short my client has an annual “team of the year” award that is based on a number of different criteria across many different data sources including sales performance, execution effectiveness, new product launch effectiveness etc – pretty standard sales stuff.  Teams are awarded a number of points in each area reflecting based on performance;  the points available are relative to the importance of each business driver. Teams are awarded points across all areas based on performance and then the points can be added up to come up with an overall score and hence the winning team.

Assigning Points for Sales Growth

I am going to share the formula I wrote for assigning points for just one of the areas – sales growth. The business rules for assigning points were as follows:

  • If you decline on last year, the team gets no points.
  • For each 0.1% growth between 0 and 1% over last year, the team receives 100 points.
  • For each 0.1% growth above 1% over last year, the team receives 200 points.

First a Design Decision

The first thing I had to decide was how to maintain these business rules.  I could think of 3 options.

  1. Hard code the business rules into the formula itself.
  2. Create a parameter table that contained all of the key numbers (eg how many points to assign, upper and lower trigger points to get those points etc).
  3. Write the formula using the VAR syntax.

I pretty quickly came to a conclusion that option 3 was the best in this case.  The effort to load a parameter table and then extract the values so they could be used is relatively high compared to using the VAR syntax.  Given the use of the business rules were limited to this one formula, I felt the VAR syntax was the best option.

Final Formula

Here is the final formula I ended up writing, but as often is the case, I think the process of writing the formula is much more interesting and useful than the final formula.

Writing the Formula

The Variables

Writing formulas often take some iterations and rework – it is seldom a case of just typing in the full correct formula from start to finish.  To start with I wrote variables for each of the business rules I was doing to need to apply (lines 2 to 6 above).  But it definitely wasn’t as simple as just typing each line one after the other.  In fact I ended up using Excel to build the formula first to break the problem into its component pieces.  Once I had the Excel version working, it was much easier to write the DAX.

The RETURN Portion

Note that the actual DAX function commences from line 7.  On line 9 you can see the first reference to one of the variables as an input into the final formula (GrowthRounded in this case).  As it turned out, I changed the logic of the way the points were assigned.  It seemed easiest to assign 100 points to all sales above 0 and then assign a further 100 points for all sales above 1% (as opposed to assigning 100 point between 0 and 1% and then assigning 200 points for sales above 1%).  But of course all of this is clearly documented in the formula so it is clear what was done and why.

Hopefully you can see from this example that the VAR syntax makes it easy to write formulas that can be read, maintained and understood by others.

How to Avoid the EARLIER Function

As I mentioned up front, The EARLIER function can be a bit confusing when you are learning DAX.  EARLIER is used to refer to a value in a previous row context – let me explain.  Consider the following 2 calculated columns in the customer table of Adventure Works. (note I am not saying you should write columns like this, I am just using them to demonstrate the use of EARLIER).

Calculated Column 1

Cust Sales = [Total Sales]  // this adds the total lifetime sales for each customer to the customer table.

Calculated Column 2

image

Note the use of EARLIER in line 4 above.  This formula is a calculated column and all calculated columns have a row context.  The FILTER function also has a row context so there are actually 2 nested row contexts in this formula.

What line 4 is actually saying is this.

“Filter the customer table and only keep customers that have sales greater than the sales for this one customer I am currently considering for this row of the calculated column”.  The EARLIER function is required to be able to “escape” the inner row context of the FILTER function and refer back to the outer row context created by the calculated column.

The formula can be considered like this.

Start Calculated Column (row context 1 - outer)

   For each row in row context 1, do the following….

     Start FILTER function (row context 2 - inner)

        For each row in row context 2, do the following….

            Check if this customer has more sales than me

        Next row for row context 2 (ie go to the next customer)

     End FILTER function

   Next row for row context 1

End Calculated Column

So you can think of a row context as an iteration through each row of the table to complete the same task for every row.  And you can nest as many row contexts as needed within other row contexts.

As I mentioned earlier in this article, it is possible to use the VAR syntax as an alternative to the EARLIER function because the VAR statements are evaluated in the initial filter and row contexts.  So here is the alternative syntax for the Rank Customer calculated column.

image

This new formula returns the same result but I think you will agree it is easier to read and understand.  The reason it works is because the VAR statement on line 2 is evaluated before the second row context is created by the FILTER function.

Share?

Comments

  1. Great one – very detailed as usual !
    I love variables, the fact they can store either scalar values or tables is so useful… hopefully we can get global variables and not just local to the measure.

    • Hi Tristan, I’m curious as to in which use cases a global variable will be of benefit.

      a. The measure cache is already a set of global variables!
      b. What filter context will be applied to a global variable?

      I’ve heard/seen/read your statement before but am still a bit confused about it. Is it a language feature (DAX) or a client tool feature (like in Power BI desktop) that is requested?

      Best regards Jes

  2. What I REALLY like about the above is it has a VBA feel about it, declaring variables, good commentary, clear logic etc. I can relate to that very well and have shared this article further

  3. Hi Matt
    You mentioned the benefits of using variables, but still I am not sure I see a real significant advantage in using variables over using EARLIER functions (calling simple measures in a complicated one). One sure advantage of calling EARLIER functions is that it is supported by older versions (Excel 2013 for one).
    Can you (or any one reading my comment) point out the advantages?

    • Yes I covered the limitations of which versions support the new syntax above. Of course if you version doesn’t support it, then it is not a benefit. In my experience when teaching new Excel users, they find it easier to understand the VAR syntax. I believe this is because they don’t have to conceptualise nested row contexts. They don’t even need to really understand that there is a nested row context. They can simply define the variable they want to compare with first, then execute the rest of the function. If you don’t see a benefit in doing it this way because you already understand earlier, then there is probably no benefit to you.

  4. Tried to replicate your example with the AdventureWorks DB and I found the circular dependency issue “Customers[Cust Sales], Customers[CUSTOMER LIFETIME RANK], Customers[Cust Sales]”

  5. Variables IMO is the single biggest improvement to DAX since inception. Functions are now much cleaner and therefore easier to read and maintain. I can’t imagine going back to life without them. I don’t know if this has happened to you, but with variables I find myself using fewer nested “portable measures” as I don’t always recall where all they are used and therefore how even a subtle change will impact calculations. Instead because I have variables more and more in complicated measures I encapsulate all function steps in a single measure. A refactoring trade off to be sure, but just the direction I’m gravitating right now.

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x