I was helping someone on Power Pivot Forum last week with a tricky calculated column question. As an aside, you may know that I always advise beginners to keep away from calculated columns. You mostly don’t need calculated columns because in most circumstances a measure is a better option to use (read more about that here). On this occasion however the required formula is likely to be complex, the run-time performance of such a formula as a measure could be slow, plus the number of possible unique values from the DAX Formula is small. When these conditions are present, a calculated column can be a good solution.
My Process is More Important Than My Solution
I want to say before starting that this blog article is actually about the process I went through to get a working solution. I think the process is much more interesting and much more useful than the final formula. In addition, when you look at the final formula you might think that you don’t even understand what the formula is doing let alone know how to write it. But here is the killer point – when I look at the final formula I also think the same thing. This is why the process of writing such a formula is so important.
Here is the final formula
If you want to learn how to write such a formula then read on.
The Problem Explained
There is a table of data (see sample below). The table contains information about changes to membership status for members of a club. At any point in time, a member has
- A status (active or inactive)
- A type (full, partial, limited)
Both “status” and “type” are indicated by a code. The codes for status and type can and do change over time. When these change, a change record is created that shows the old_value and the new_value. One row in the table is equal to a single change.
Key points are:
- The change record (row) can be for a change to the status or a change to the type as indicated in the Change_Type column.
- There is a Member_ID column that shows which member the change row belongs to.
- The date of the change is the start_date
The objective is to know what the “current status” of the type code is for each member at the time the new change record was added. Stated differently, at the time this change record was created, was this customer a full, partial or limited member?
The problem therefore is that I need to find the latest “type code” set prior to the date of the current transaction. An example will make it clearer.
Here is a filtered copy of the table for member_id 3100. The transactions are sorted in start_date order (which is the date of the transaction). This table contains my final working calculated column formula “Current Type”
- The first 2 transactions (rows) occurred on 28/1/1998 (start_date). At this time there was no pre-existing entry for “Type” and as such the calculated column “Current Type” returns blank (see 1 above).
- The transaction in row 3 occurred on 22/3/2013 (start_date). At this time the “Type” for this customer was “12” because this was the last code set as new_value prior to 22/3/2013. In this case the “Type” was set on 28/1/1998 as shown in 2 above. As a result, my calculated column returns 12 for record 3 (see 2 above).
- Record 3 actually changed the “Type” value from 12 to 14, so the next records (4 through 7) should return the value 14. And they do as you can see in 3 above.
A Different Design?
Before I share with you how I created this calculated column, it is worth pointing out that this probably would be easier to solve if there were 2 separate tables, one containing the “Type” transactions and another table containing the “Status” transactions. Putting that aside, this is a great challenge as is, and working on challenging DAX problems will help you become better and stronger – a true DAX Ninja.
How I Solved The Problem – Step by Step
I had a clear view in my mind what I needed to do before I started. When solving any DAX problem, you need to think “filter first, evaluate second”. In other words, filter the table I am given so it only contains what I need, then grab the values I am after.
With this in mind, for each record, I needed to:
- Check which member the single record belonged to
- Check the date of the change of this record
- Filter the entire table for:
- this one member
- also for all records that were change_type = “type”
- also for all records prior to the current record date
- Then once I had the above filtered table, I had to find the last record by start_date and then grab the number from the new_value column
Step 1. Get Started With Anything
Frankly I just wanted to write something that would get me started. So I started with something simple. I asked myself, “how many rows does each customer have in this table?”
I wrote this calculated column.
Line 4 is the key. Line 4 says…
“go and put a filter on the Changes table where the member_id is equal to the member_id of this row”. The EARLIER function is needed because a calculated column has a row context and so does the FILTER function.
The VAR Syntax
If you have Excel 2016 or Power BI Desktop then you can use the VAR syntax. The VAR syntax is actually easier to write and easier to read I think. Here is the same formula with the VAR syntax.
I hope you agree that this VAR version is easier to read and understand. First you set a value to the variable in line 2. Then you can use this variable inside your formulas without having to worry about inner and outer row context.
OK, I am up and running and have solved the first problem. I have written a formula that correctly detects the current member_id and filters the table for that member (as you can see below in the WIP column).
Of course I needed to use COUNTROWS as it is not possible to put a table into a column. Instead I used COUNTROWS as a trick so I could “see” how big the table is for each member. I regularly use COUNTROWS when I am building complex formulas as it is one good way to “see” the virtual table I am building. Another important point here is the test data. When I first wrote this test formula I actually had 7 rows in the sample table for each customer. This made it very difficult to check if the formula was working. So before doing anything else I went back and changed my test data so that each member had a different number of rows. Then I could “see” that the formula was working. Good test data is essential to effectively write formulas.
Step 2. How Many Rows of “Type” for this Member?
Working through my high level mental plan, I decided to try to filter the table to count how many rows there were for each customer where the record was change_type = “type”. Here is my new formula.
As you can see I have switched back to the original syntax (without VAR). I simply added a new filter on change_type = “type”. It was easy to check if this formula was working by looking at the results.
Step 3. Extract Something More Useful
So far I have counted how many rows there are in these test tables. So now that I had the correct table to use as my filter, I decided to try to extract the last start_date from this table. This was as easy as changing line 3 as you can see below.
And the new formula now correctly extracts the last start_date of this table. Remember the table contains all the change_type = “type” for this member_id, so this start_date is the date of the last “type” transaction found.
Step 4. Turn The 2 Filters into a Single Table
The above formula has 2 filter functions. I really need it to be a single table so I can use it. There are a few ways to do this, but I decided to change it into a table as follows. Note I am still using COUNTROWS so I can validate it is still working.
Rows 3 to 7 now are a “table” that I can use as I move on. Instead I could have used a single FILTER function with multiple filters using &&.
Step 5. Grab the Last Date Before This Transaction Date
This table contains all type transactions. But I don’t want all type transactions, I am looking for the last transaction that occurred before this current transaction. In other words I need to filter this table to remove any type transactions that occurred on or after the date of this transaction. I decided to switch back to the VAR syntax here to make it easier to write this formula.
Note above rows 7 – 11 are the table I produced in step 4. I wrapped this inside another FILTER to remove all records on or after the date of the current record, only keeping records that occurred before the date of the current change record. Rows 6 through 13 therefore returns a table that contains all ‘type’ records that occurred before this record. I then use the CALCULATE statement to return the date of the last type transaction. As you will see in the resulting table below, the WIP column now returns blanks for any record that doesn’t have a ‘type’ record for an earlier period in time.
Step 6. Extract the Value from the Previous “Type” Record
Now I know the date of the correct ‘type’ record, I can proceed to extract the value of that record from the new_value column. Here is my formula.
Note how I have taken the result of the formula in step 5 and turned that into another VAR (rows 3- 14). Then I have written a new formula (rows 16 – 21 above) to extract the value I need after filtering for the member_id (row 18), the change_type (row 19) and the correct date of the transaction (row 20).
One Last Complication
I did have one last problem that I came across. The OP at http://powerpivotforum.com.au was using Excel 2010, so I couldn’t provide the above formula (Excel 2010 doesn’t support the VAR syntax). So I had to remove the VAR syntax. First I removed the first VAR (line 2 from step 6 above) as follows. Note how I had to use EARLIER in line 11 to manage the row context.
Finally I removed the second VAR from step 6 as follows. I grabbed lines 3 -13 from the formula above, and replaced dateToUse above in line 19 with this formula.
After making this change, note that this time in lines 14 and 17 I had to tweak the formula to correctly manage the nested row context, but this time I had to refer to EARLIEST instead of EARLIER. This is because there are 3 nested sets of row context. The first is the calculated column, the second is the FILTER on row 6 and the third is the FILTER on row 11.
A Final Word
The key point to this post is that the final formula above is hard to read and understand. You may think that this formula is written starting at the top and working to the bottom, but this is far from reality. Complex DAX formulas rarely (if ever) are written top down. The trick to writing complex DAX formulas is to break the problem into pieces and solve each piece of the puzzle one step at a time.
Do You have a Better Solution?
I know there are some talented DAX Ninjas out there that read my blog (Jess, Owen to name 2). I always love learning so if you have a better formula/solution or interesting alternative approach than this I would love to see it. Please post in the comments.