I have blogged about these concepts before but thought there was value in creating a couple of clear blog posts covering step by step instructions on 2 good ways to combine multiple Excel Workbooks with Power Query. Both of these approaches require that the Excel files are identical in structure (same column names, same number of columns). The number of rows is not important.
Method 1: Create a custom function – good when you want to reshape the data in the source files.
Combining CSV: I will cover combining CSV files prior to covering Method 2 for Excel.
Method 2: Use the Excel.Workbook Power Query function (covered in a couple of weeks) – good when the data is already in the final shape.
Method 1: Create a Custom Function – High Level Approach
Here are the high level steps to complete this task
- Load one of the Excel workbooks in Power Query and complete the steps in Power Query to reshape the data the way you want it.
- Convert the Query into a custom function and save
- Create a new query that loads all the names of files to be combined, and call the custom function to do the work for you.
1. Load and process one of the workbooks
For this demo, I have 3 identical workbooks that look like this. Each workbook has a name indicating the year (shown as #1 below).
First I create a new blank workbook and connect to one of the files.
Once you select the file you need, select the correct sheet to import and then click EDIT. Note that this process assumes the sheet names in all workbooks are the same.
Complete the transformations you need. In this demo, I am simply unpivoting the month columns to get a better table structure. When you use this approach yourself, simply complete all the transformation steps you need for your data at this point.
2. Convert the Query into a Custom Function
This query is currently a one off query that only works on one specific file. The next step is to convert this to a function so that it an be used to process any file of the same shape.
Rename the query to something that indicates that the query is a function (or will be once the following steps are completed) and also something easy to remember. I tend to call mine fnCombine. Note that Power Query is case sensitive, so you will need to replicate all capitalisation when you call the function later on.
Switch to the Advanced Editor. Click #1 View, #2 Advanced Editor
Most of the code you will see is irrelevant for this exercise. The only thing that matters is the hard coded path and file name for the file processed in the previous steps. The objective now is to turn this hard coded file pathfilename into a parameter. Once this is done, it will be possible to pass any pathfilename to the new function and have the new function process the new file (or multiple files in this case).
The syntax to create a function is pretty simple.
1. Add a line of code before the let statement that accepts parameters into the new function as shown in 1 below. You should ready this line of code as saying “this function accepts 2 parameters (myPath and myFile) and then passes the values from these parameters to the query steps in the function.
The only other change is to replace the hard coded pathfilename in the original query with the parameter values. You will need to delete everything from and including the double inverted quotes (which is the hard coded pathfilename) with the 2 new parameters. Note there is an ampersand (concatenate) between the 2 parameters and there are no double inverted quotes.
Click Done, then File -> Close&Load.
Once you have done this, you will see the query in the Query Pane on the right shows that the query is a Function (shown by the fx symbol next to the query name shown in 1 below).
Intermission – test the function to see how it works
This is not required, but it is useful to test the function now to see how it works.
Right click on the function name in the Query Pane, and select “invoke”. The function then asks for 2 parameters. I tested mine by going to Windows Explorer, then do a cut and paste of the folder path and also the file name. Note that you must have a final slash at the end of the file path name (this will not come across with a standard cut and paste from Windows Explorer).
Once I clicked OK, the new function successfully processed my file “2014.xlsx” even though the query was originally written for the file called “2013.xlsx”
Finally click File-Close and Discard – i.e. don’t save this query.
3. Create a new Query that combines all the files
Now that you have seen how the new function works, the final step is to create a new query that will create a list of all the file path and file names you want to combined. Then pass the pathfilenames to the function one at a time for processing.
First create a new query that provides a list of all the file paths and file names. Click From File (#1) then From Folder (#2). Then navigate to the folder that contains all your files to combine and click OK.
The query will provide you quite a few columns of data, but the only ones needed are #1 Folder Path and #2 File Name.
Select these 2 columns and then click “remove other columns”.
You will then have exactly the information you need to pass to the function written above (#1 and #2 below).
Now add a new custom column and call the function from this column. Click #3 Add Column then #4 Add Custom Column. The formula for the custom column is shown below. You simply type the name of the function and pass the 2 parameters to the function. Note there is a comma between the 2 column names and not an ampersand. Note the format is identical to the first line of code added to the custom function way back in step 2.
After you click OK, Power Query shows the new column and indicates that the column contains a table of data. ie for each row of the query, the new column will call the function fnCombine and return a table of values.
When you click on the expand button (shown above), you will be given a choice of which columns from the new table are required. In this demo, I have selected all columns and deselected the last option “Use original column name as prefix”.
After clicking OK, the query looks like the following.
The last step in this demo is to extract the YEAR from the file name. To do this, select the name column (#1), then click the home tab (#2) and then Split by Column #3).
It is possible to split by delimiter or number of characters. For this demo I selected “number of characters” and then selected 4 characters, once as far left as possible).
After this step, you will have the following
Just a bit of clean up left to do. Delete the columns that are not needed and rename the first column to be called Year. Then click Close and Load and you are done.
And there you have it – a new table that contains all the contents of the files in the folder path, plus the files have been transformed (in this case unpivoted).
This approach is very powerful and can be used in all instances. Once you have done it a couple of times, you will be able to do it automatically without referencing this guide. The main benefit of this approach is that is allows you to complete transformations on the data before the data is combined.
Here are the sample files I used if you want to try it out yourself.
I will blog about method 2 that uses the inbuilt Power Query functions to combine Excel Workbooks that are already in the correct shape into a single table in a couple of weeks. However I will first blog about combining CSV files, as this is an important precursor to the Excel method.
If you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/