I got my haircut today (pretty spiffy one too, even if I do say so myself). While I was chatting I asked my hair dresser “on average, how often should I get my hair cut”? She told me (for men) around 4-6 weeks. Then I got thinking (as I do), I wonder if I could data-mine my credit card data using Power BI and find out how often I actually get my own hair cut. It turns out I was able to do this, and this article explains the hardest part of that task – find the number of days between two transaction dates using DAX.
Loading Credit Card Data
I have covered how to load your bank data using Power Query before so I am not going to cover that in any depth in this article. I first updated my bank credit card extracts so I had a full set of data and then loaded them up into Power BI. After loading the data into a single table, I had the following data structure.
As you can see, it is a simple list of transactions with a date, description (vendor), and the amount of the transaction. With this in mind, my idea was to first filter the description to that of my hairdresser and then find out the number of days between transactions. Note, I changed the true name of my hairdresser in the data extracts
Check the Data are Complete
The first thing I did was use Power BI to check that the data were complete. It is not necessary to write any DAX to do this. I just created a couple of visuals that showed me a COUNT of description by date.
Visual 1 is simply the date column with a count of description. Power BI automatically creates a hidden date table that allows me to view the data by year and month. I actually deleted the Quarter and Day levels in this visual and then drilled down to Year\Month.
Visual 2 in the original report above is the same data but not using the date hierarchy, but using the actual dates of the transactions (posting date actually). I also created a couple of cards (3 and 4 above) and used these to find the first date and last date in the selected period. That allowed me to click on each month and check that the first and last date in that month seemed to be complete. Simply using this audit report allowed me to find some missing data in my csv extracts before I started the exercise.
Days Between Two Transactions
OK, next job was to filter for my hairdresser and look at the transactions in a table. You can see a slicer on the left below and the table of data on the right. Mmmm, seems my hairdresser put up the price 3% in April this year!
The task at hand is to find out the number of days (or weeks) between each of these transactions. This is easy in Excel because a spreadsheet has a cell reference, and you can copy formulas and maintain the relative relationship between the cells. This is not possible in Power BI, Power Pivot or any database for that matter. Databases do not store data using absolute location references – they simply store the data in the most efficient way it can and the order is not reference-able (unless you have a row ID, and that is not a good idea in Power BI).
Days Since Previous Transaction
As is my style, rather than me give you the answer, I want to show you how I solved the problem. It is always best to set up a table (or matrix) and solve each part of the puzzle one step at a time. EVERYONE does this in Excel automatically, but it is not so intuitive in Power BI. You should ALWAYS solve problems this way in Power BI.
Extract the Correct Date of THIS Transaction
The first task is to be able to access the date of the current transaction. For this formula, I decided to use the VAR syntax. So my first WIP measure was as follows:
WIP = VAR ThisTranDate = SELECTEDVALUE('Visa Extracts'[Date]) Return ThisTranDate
When I placed this measure in my table, this is what I got – confirmation that the measure is correctly identifying the current transaction date.
Date of Previous Transaction
There are 2 benefits of creating a table and placing your WIP measure in that table. Firstly you get to see if the measure is giving you the answer you expect, and secondly being able to “see” the problem in front of you helps you think through the next steps. As I said earlier, it is not possible to use a cell reference in Power BI. The only way to solve problems like this in DAX is to first filter the data and then act on the data that has not been filtered (Filter First, Evaluate Second). If you look at the table above, you will “see” that the way to solve this problem is to filter out all records that occur on or after the current transaction date and then work out the latest date.
Note: This works because I know I don’t have more than 1 transaction for my hairdresser on the same day. If there were more than 1 transaction on the same day, you would need some way of determining the order of transactions within the same day, such as a time stamp or an ordinal ID of some type.
So my job to find the previous transaction is simply to use FILTER to find the last record AFTER filtering away the current and all future transaction dates. I modified my WIP formula as show below (there are other valid formulas that will work too). Note how the FILTER function first removes all the transactions on or after the current transaction, then returns the last date left in the list, which just happens to be the date of the previous transaction.
Days Between Transactions
Now that I had the current and previous dates, I was able to find the days between each transactions. Here is my formula:
As you can see below, this formula gives me the weeks between each transaction, but it doesn’t give me an average overall.
Average Days Between Transactions
This is where experience comes in. I knew immediately that the way to find out the average over all the selected transactions was to write an AVEARGEX function that iterates over the dates in the Table visual to create the answer. In other words, I needed to write a measure that would “simulate” the role of the table above and work through each transaction one at a time, then work out the average overall. I talk about this “simulate” concept in depth in my article here. Because of the way filter context works (the filter coming from the table visual), it turns out that this new AVERAGEX measure will work equally well for each row in the table as well as the total in the table.
As a result, I then removed my first measure and was left with the following:
Any analysis is only as good as the data you have loaded. I would actually be surprised if I went 13 weeks without a haircut and think that I may have paid for my October 2017 haircut with cash (rare, but possible). As it turns out, if I assume there is a missing haircut between July and October, then my average is 6.33 weeks. I calculated that using this formula.
Avg Weeks Between Transactions = SUMX(VALUES('Visa Extracts'[Date]), [Weeks Between Transactions]) / ( DISTINCTCOUNT('Visa Extracts'[Date]) + 1)
Disturbing Open Questions
This article creates a whole lot of disturbing questions that are out of scope for this article, such as “Does Matt really spend more $30 for a haircut”, and “What happened in early June 2017 that triggered a hair cut after 4 weeks”?