I have written a series of articles where I explained different methods of combining data from multiple files into a single workbook/table using Power Query. In the first article, I explained how to create a custom function when you want to reshape the data in the source files. In the second article, I covered combining data from multiple CSV files in a folder. In the third article of the series, I have shown how to combine data from multiple Excel workbooks using a different approach.
Over time, Microsoft has deployed many improvements in using Power Query including the ability to automatically combine the contents of multiple workbooks automatically. With the combine feature you can
- combine data in multiple worksheets into a single table
- combine a single sheet from multiple workbooks into a single table
Today I want to show you a way you can do both at the same time, i.e. how to combine data that is in multiple worksheets and also in multiple Excel workbooks into a single table using Power Query. All the workbooks need to be in the same folder and the data in all the worksheets and workbooks that need to be combined should be in the same format. However, it is possible that the workbooks have
- different number of worksheets
- worksheets that are not necessary for the processing that can be excluded inside the query
The Method
The method I show has two parts.
- Create 2 functions
-
- a function to extract the valid worksheets from one workbook
- a function to reshape data in a single worksheet as needed
-
- Then I extract the data using these 2 functions from all the Excel workbooks in the folder
YouTube Video
You can view how each of these steps is executed using the Power Query UI with the occasional tweaking of the ‘M’ language code that is produced in the following video. I have provided a high level summary of the steps below too.
Function to extract all valid worksheets from a workbook
The steps to be used are
- Get data from one Excel file (this is going to be the sample file for the function).
- Use the “Transform data” option to load a list of sheets into the Power Query Editor.
- Filter the table to retain only the valid sheets you want to transform.
- Next create two parameters – file path and file name. You will use these to create a parameterised query.
- Edit the source step so the step uses the path and file name parameters.
- Turn the query into a function.
- Call the function fnAllSheetNames.
This function extracts all the valid sheets from a workbook given the file path and file name.
Function to reshape data in a single worksheet as needed
The steps to be used are
- Use the same sample file as above. So create a duplicate of the same query.
- Extract the first worksheet from the workbook.
- Create a new parameter and call it SheetName.
- Edit the query and make it generic to use the parameter.
- Now the query works with 3 parameters – file path, file name and sheet name.
- Turn the query into a function.
- Call the function fnProcessSheets.
Extract data using the 2 functions from all the Excel workbooks in the folder
The steps to be used are
- Get data from Folder.
- Select Transform Data and load the list of all workbooks into Power Query.
- Keep 2 columns – the one with file names and the other with file path names (remove the others).
- Add a new column using Invoke Custom Function.
- Use the function fnAllSheetNames and pass the 2 column names as the parameters.
- Extract the sheet names by expanding the “Item”.
This gives all the valid sheets from all the files in the folder. Next, we need to extract data from all these worksheets.
- Add column using Invoke Custom Function.
- Use the function fnProcessSheets and pass the 3 parameters.
You have all the data from all the valid worksheets from all the workbooks in the folder.
Want to Learn Power Query?
If you want a comprehensive lesson on how to use Power Query, checkout my online training course for details and preview videos.
Hi! Matt, Thanks , this is great, I have on Sharepoint folder the files, so all working very well in PBI desktop, so, when pusblish this file, not refresh the dataset, dinamyc querys, I tried woth RelativePath (Chris Webb blog), but continue with no refresh. please can you help me
Not sure if you’re monitoring this, but now my query is getting an error:
“data could not be retrieved from the database”
I have 64 Excel files with a total of 750 worksheets. This I can see in Power Query, but I get the above error when trying to load to table.
All files are on the local drive.
Any ideas what I should be looking for?
are you using Power BI or PQ for Excel?
Are you using 32 bit or 64 bit?
How much memory to you have on your device?
How big are the source files?
All these things can have an impact
How often does the source data change? If some of them don’t change (only new ones), loading them in batches and then creating a history file containing the old, transformed data, and save this to a CSV. That way the old data can be loaded direct from CSV and appended to the new data
Testing on various computers
are you using Power BI or PQ for Excel? – Excel PQ
Are you using 32 bit or 64 bit? 64bit Windows, but using mixture of 32b and 64b Excel
How much memory to you have on your device? 16G (with 32b Excel) 8G and 32G (Excel 64b)
How big are the source files? Largest source file could be around 3-6MB
Since I was running on various systems, I noticed that they all failed at the same record count leading me to believe it was source a file issue rather that a memory issue since the computer with the best resources would have made it further through the process.
So in researching this, I found that in one of the sheet’s transformation it produced an empty table, with further steps producing a step error. Once I accounted for this type of scenario I was able to get all the systems to successfully run the queries.
After retooling all the queries, the runtime to produce them have been significant. On the computer with the best resources, the runtime was cut down by more than half.
I’m so glad I found this post and was able to rethink my solution! I just need to run them in parallel for a bit to make sure I haven’t missed any logic/steps.
With the current solution, we have already been running the files in batches to keep memory use down. We’ll continue to do this with this new process but at least it should take less time to get through the batches now! 🙂
Thanks again!
Great stuff. Finding where queries fail (and why) is one of the toughest things to fix.
Oh wow! This is great, though I wish I had stumbled on this “recipe” when it came out when I was designing my query.
This is exactly the process I’m doing for our company timesheets. Now to see how to retrofit this process into my solution.
Thank you Matt!
Hi Matt, awesome video very well explained. Thank you for sharing with us.
I followed the instructions from A to Z, however when I am trying the Close & apply, I am getting the following error:
“Failed to save modifications to the server. Error returned: ‘Expression in partition ‘n/a’ in table ‘n/a’ references an unknown entity.”
I’ve already check for any errors with the function kept errors and got rid of those, but still not working, any clue what is wrong?
Managed to fix my problem never mind.
I’ve got the same problem. How did you manage to solve it?
Hi Eric, Can you please help, as how did you managed to solve the error as I am also getting the following error:
“Failed to save modifications to the server. Error returned: ‘Expression in partition ‘n/a’ in table ‘n/a’ references an unknown entity.” Would appreciate your help in this regards. Thanks
conernant ce sujet peut on avoir les ficiers excel avec
Hi Matt,
Thanks for another great tutorial! When combining the sheets, each sheet contains the same number of columns and the column names are the same across the board, correct?
Technically, no, but it depends what you do. If you unpivot the columns then change the data types, it will work. If you change data types of named columns and those columns are missing, then it will error.
This type of practical example is extremely helpful. I can take this info and directly apply it to my workflow. Very clear instruction and the overview of the steps provide the context to think about the larger issues and general applications. I’ll be looking into your formal training opportunities. Thank you
Thanks Dave. I’m glad you liked it.