There was a great new feature that snuck in unannounced to the July release of Power BI Desktop. It was only a tweet by my colleague Imke Feldmann from http://www.thebiccountant.com/ that I was even aware it was there.
Include in Report Refresh
When you look in the Queries pane in Power BI Desktop, you will see your list of queries in the Queries pane on the left hand side. Simply right click on any of the Queries to see the new feature (shown as 2 below).
You can see the new check mark “Include in Report Refresh” which obviously can be toggled on or off (it is on by default). This is a step towards partitioning, which is available in SSAS Tabular.
So What Can I Use This For?
As soon as I saw this, I thought of an immediate use case that will add a lot of value for me (maybe you too). I have a client that I complete reporting for on a weekly cycle (I only process the data once per week). I import daily extracts of data into a data warehouse (once each week) and then provide weekly reporting and analysis tools. I also have some business users that need daily reporting on a subset of the data – but I only process the data on site once per week.
The Old Process Was to Load a History Table
The process I have used so far to solve this problem is as follows:
- Prepare a “history table” (illustrated as 1 below) that has sales going back to the start of the year (up until a point in time). This history data is an Excel table in a workbook and I refresh then store this on the client file server.
- I then have a daily sales report (2 below) that refreshes off the daily file extracts (3 below) that get placed on the server each night. The client just opens the workbook (2 below) and clicks refresh – it then cycles through the new files and imports the data needed.
- The Workbook (2 below) also processes the history file (1) and appends the historical data to the new data to create the daily sales report needed.
The Benefit of this Approach.
The main benefit of this approach is that you don’t need to load data from hundreds of individual files. Power Query is very capable to load data directly from files, however this approach can be slow if you have lots of files and the files are very large. In my case, the files have around 50,000 rows each, and there is one file per day. To load data from 200 files like this would likely take 30 mins or more.
There Are a Few Problems With This Approach.
- The longer I leave it to re-create the history table, the slower the daily sales query becomes, as it has to cycle and load more and more daily files to extract the few product lines needed for the report.
- There is a bit of effort to re-run the history table every couple of weeks. I then also need to reset the daily queries to only bring in data that is after the data in the history table. I have parameter tables set up to do this, but there are still a few manual steps.
- I then of course need to copy the additional history file to the server each time I create an update.
Enter the New “Include in Report Refresh” Feature
Given the above, I immediately saw the opportunity to replace my entire history table process with a new history table directly inside the main Power BI Desktop file. The new pattern that I will use is as follows.
- Add a new history table directly into the workbook.
- When I refresh the workbook against my data warehouse, I will switch on the “Include in Report Refresh” menu, refresh, then turn it off again.
- I will modify the daily refresh Query to extract the last sales date from the history file, and to start the daily refresh from that date automatically. Given I know the history table won’t change (because refresh is disabled), there is no chance that the queries will execute in the wrong order and muck everything up.
- Send the workbook to the client.
The changes above will make this process easier and faster to maintain, and also faster on average to refresh every day.
And this Feature Could be Even Better
One extra feature I would like to see from Microsoft is to allow this setting to be configured with a parameter using the Power Query Language (PQL). An example would be if an error was detected (using the Try command) when the Query failed to connect to my Data Warehouse, then turn off the report refresh for this table. Now that would be something! Please vote for that idea here.