Load the Latest Version of a File with Power Query - Excelerator BI

Load the Latest Version of a File with Power Query

There are quite a few common business scenarios where you get data in the form of a file on a regular basis from some source system and you need to process these files into your Power BI reports.  There are 3 common patterns that come to mind.

  1. Each day/month/year you get a new file containing incremental data.  All the data in all the files need to be combined and loaded.
  2. Each day/month/year you get a new file and it replaces the old file.
  3. You have a workbook with multiple sheets within a single workbook that need to be combined.  This could be in addition to pattern 1 or 2 above.

I have previously shared an article explaining pattern 1 (how to combine all files in a folder with Power Query). Over the next few weeks I will share a couple of simple techniques for the other 2 common scenarios starting today with pattern 2 above.

Load the Latest Version of a File

This pattern is common if your new file contains a superset of all the data.  It could be a transactional file that grows in length each time or it could be a dimension/lookup table (such as Customers) that can change slowly over time, and you always want to see the latest version.  My advice to all my Power Query students is “zero touch the file”.  In other words, your objective should always be to have the absolute minimum amount of interaction with the source files possible and push all the work into Power Query.  This will minimise the amount of work/rework you have to do in the future.  Thinking about the use case here “load the latest version of a file”, the question becomes “how can I make this zero touch”?  There are a few issues to consider including naming/renaming of the file and also archiving old copies of the file.  This doesn’t sound like zero touch to me.

Power Query Online Training

Load from Folder, not File

My solution to this problem is to load the file from a folder and not directly from the file.  The pattern is to

  1. Have the system extract the file to include a unique name eg myFile-YYYYMMDD.xlsx
  2. Place the file into a standard folder
  3. Sort the folder descending based on creation date
  4. Select the first file from the list of files

Using the above approach you will always load just the latest version of the file while keeping an archive history of all the files in the same folder – zero touch.

Here is a video demo of how to do it.  Unfortunately I was away from my home office when recording this video and there is quite some background noise – sorry about that.

Want to Learn Power Query in a Structured Way?

My Power Query Online training course consists of 7.5 hours of online video training material that will teach you everything you need to know about using this fabulous tool for both Power BI and Excel. You can watch some free sample videos and find out more about registration following the link.

And, here is the link to the Power BI Report Themes blog article that I mentioned in the video – https://exceleratorbi.com.au/changing-defaults-in-power-bi/.

24 thoughts on “Load the Latest Version of a File with Power Query”

  1. Hi Matt,

    What if the most recent excel file has multiple tabs (worksheets). How do you apply {0} to all of them to ensure the most recent version of each worksheet in the most recent excel file is uploaded without duplication of effort?

  2. Hi Matt,

    Thank you for this very simple approach to loading the latest file from a folder. I understood what needed to be done, but I was stuck at the point where it grabbed the most recent filename. Your video very simply showed how to get the content from the first row apart from the file name. Perfect! Thank you!

    Bobby

  3. Hey Matt – thanks for the post.

    I was wondering if you have any solution here that covers ‘incremental load’ such that the data refresh will pull the latest file in the folder but only update the existing information without overwriting in. So you can delete the source information in the external system for a record between 1 -> 2 -> 3 etc imports – but its never deleted upon refreshes, you build upon the first file but never overwrite deleted recordsE.g:
    File 1: File 2: — On refreshing, record for row ‘b’ will still exist & a,c,d,e will update as normal.
    a a
    b c
    c d
    e

    1. Incremental refresh is only supported by Premium. If you are loading lots of files, you could consider creating a single history csv file and then append the new files. The history will still load, but it can be faster, depending on the scenario.

  4. Melascaglia Robert

    Hi Matt

    Your information is very valuable so thanks for that. I like to ask based upon loading files automatically from a folder into PQ.

    What if I have five-files that are also related tables in Power Pivot (PP). Lets say they will be updated each month and I like to add the updated versions to the same folder.

    How does PQ handle the process so my PP Data Model remains intact?

    Thanks from Robert,

  5. Hi Matt

    I was getting frustrated at selecting the most recent file, then Power Query naming it. I didn’t realise I could remove the absolute filename etc. Thanks for your help

  6. I concern the files in csv format. Would you please advise the pro and cons between xlsx and csv. As our ERP system just supports in csv only.

  7. Kamil Kaleciński

    My way around is to store dynamic source path string in a named range in excel worksheet – using Excel formulas you can easily build desired string eg. “=C:\Folder\Extract From “&Text(Eomonth(Today(),0),”yyyy-mm-dd”)&”.xlsx” . This named range can be loaded to PQ as a separate text query, then referenced while specyfing source path for another query. Worked well in my situation, where team decided a long time ago to store each month`s data in separate folder (hopefully with logic date-based naming convention).
    I guess the same path-building puzzles could be constructed in Query from scratch when using Power BI.

  8. Pingback: Getting The Latest File With Power Query – Curated SQL

  9. Hi Matt-
    Any strategies for the second scenario you mentioned above (“Each day/month/year you get a new file and it replaces the old file.”) but where you want to save the historical data, not replace it with the newest file? In other words somehow create a historical record file that keeps the old data and then adds the new data each time the PowerQuery file is refreshed (which could then be scheduled automatically using the endlessly useful PowerUpdate). The example is that we have a “Backlog” table that is updated once a day from our ERP that reports each order that is currently sitting in our factory yet to be shipped. But it is always a live file. If we want to know “how much was sitting in backlog at the same time last year”, it’s a challenge to reconstruct the backlog (doable using other data in our ERP perhaps, but wondering if I could create this sort of a “historical backlog” record)

    I can think of a strategy that would date-stamp the data with the time the refresh is run, but haven’t worked it all out yet. Wondering if you’ve already cooked something up.

    1. Sorry for the slow reply on this one. To me this sounds like scenario snapshots. ideally you get your source system to add the date of extract into the file name. That way you do a combine and merge all data (instead of loading the latest) and use the file date as the “scenario date” as opposed to the transaction date. Does that make sense? If your system can’t do that, you could look at Microsoft Flow (not sure if it will work or not) to add a date to the file name. I have heard people using Flow to monitor an email inbox and download attachments -I have never done it though.

  10. Hi Matt,
    I used PQ for getting the latest file in a folder too. I used a slightly different approach. I applied Date/Time filter on the Date Modified column, filter “is latest “ which return the latest file in the folder.
    It works too.
    Regards,
    MF

  11. Vaughan Boekestein

    Hi Matt – great article thanks. I have tried to set this up but the weekly download I get is extracted from our ERP system and is created with some blank and then a report name and some lines for the parameters used.
    I’ve not figured out how to deal with those unwanted blank and report name rows? Can you tell power query to only read from row x?

    1. Home\Remove Rows\Remove Top Rows\

      Formal training has a cost of course, but not knowing how to do stuff also comes at a cost, the cost of lost productivity, lost opportunity etc. You should consider purchasing my Power Query online course and you will be an expert in no time at all.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top