Combine Excel Workbooks with Power Query – Method 1

I have blogged about these concepts before but thought there was value in creating a couple of clear blog posts covering step by step instructions on 2 good ways to combine multiple Excel Workbooks with Power Query.  Both of these approaches require that the Excel files are identical in structure (same column names, same number of columns).  The number of rows is not important.

Method 1: Create a custom function – good when you want to reshape the data in the source files

Combining CSV: I will cover combining CSV files prior to covering Method 2 for Excel

Method 2: Use the Excel.Workbook Power Query function (covered in a couple of weeks) – good when the data is already in the final shape.

 

Method 1: Create a Custom Function – High Level Approach

Here are the high level steps to complete this task

  1. Load one of the Excel workbooks in Power Query and complete the steps in Power Query to reshape the data the way you want it.
  2. Convert the Query into a custom function and save
  3. Create a new query that loads all the names of files to be combined, and call the custom function to do the work for you.

1. Load and process one of the workbooks

For this demo, I have 3 identical workbooks that look like this. Each workbook has a name indicating the year (shown as #1 below).

image

First I create a new blank workbook and connect to one of the files.

image

Once you select the file you need, select the correct sheet to import and then click EDIT.  Note that this process assumes the sheet names in all workbooks are the same.

image

Complete the transformations you need.  In this demo, I am simply unpivoting the month columns to get a better table structure.  When you use this approach yourself, simply complete all the transformation steps you need for your data at this point.

image

2. Convert the Query into a Custom Function

This query is currently a one off query that only works on one specific file.  The next step is to convert this to a function so that it an be used to process any file of the same shape.

Rename the query to something that indicates that the query is a function (or will be once the following steps are completed) and also something easy to remember.  I tend to call mine fnCombine.  Note that Power Query is case sensitive, so you will need to replicate all capitalisation when you call the function later on.

image

Switch to the Advanced Editor.  Click #1 View, #2 Advanced Editor

image

Most of the code you will see is irrelevant for this exercise.  The only thing that matters is the hard coded path and file name for the file processed in the previous steps.  The objective now is to turn this hard coded file pathfilename into a parameter.  Once this is done, it will be possible to pass any pathfilename to the new function and have the new function process the new file (or multiple files in this case).

image

The syntax to create a function is pretty simple.

1. Add a line of code before the let statement that accepts parameters into the new function as shown in 1 below.  You should ready this line of code as saying “this function accepts 2 parameters (myPath and myFile) and then passes the values from these parameters to the query steps in the function.

image

The only other change is to replace the hard coded pathfilename in the original query with the parameter values.  You will need to delete everything from and including the double inverted quotes (which is the hard coded pathfilename) with the 2 new parameters.  Note there is an ampersand (concatenate) between the 2 parameters and there are no double inverted quotes.

Click Done, then FileClose and Load.

Once you have done this, you will see the query in the Query Pane on the right shows that the query is a Function (shown by the Fx symbol next to the query name shown in 1 below).

image

Intermission – test the function to see how it works

This is not required, but it is useful to test the function now to see how it works.

Right click on the function name in the Query Pane, and select “invoke”.  The function then asks for 2 parameters.  I tested mine by going to Windows Explorer, then do a cut and paste of the folder path and also the file name.  Note that you must have a final slash at the end of the file path name (this will not come across with a standard cut and paste from Windows Explorer).

image

Once I clicked OK, the new function successfully processed my file “2014.xlsx” even though the query was originally written for the file called “2013.xlsx”

image

Finally click FileDiscard and Close – ie don’t save this query.

3. Create a new Query that combines all the files

Now that you have seen how the new function works, the final step is to create a new query that will create a list of all the file path and file names you want to combined. Then pass the pathfilenames to the function one at a time for processing.

First create a new query that provides a list of all the file paths and file names.  Click From File (#1) then From Folder (#2).  Then navigate to the folder that contains all your files to combine and click OK.

image

The query will provide you will quite a few columns of data, but the only ones needed are #1 Folder Path and #2 File Name.

image

Select these 2 columns and then click “remove other columns”.

You will then have exactly the information you need to pass to the function written above (#1 and #2 below).

image

Now add a new custom column and call the function from this column. Click #3 Add Column then #4 Add Custom Column.  The formula for the custom column is shown below.  You simply type the name of the function and pass the 2 parameters to the function.  Note there is a comma between the 2 column names and not an ampersand.  Note the format is identical to the first line of code added to the custom function way back in step 2.

image

After you click OK, Power Query shows the new column and indicates that the column contains a table of data.  ie for each row of the query, the new column will call the function fnCombine and return a table of values.

image

When you click on the expand button (shown above), you will be given a choice of which columns from the new table are required.  In this demo, I have selected all columns and deselected the last option “Use original column name as prefix”.

image

After clicking OK, the query looks like the following.

image

The last step in this demo is to extract the YEAR from the file name.  To do this, select the name column (#1), then click the home tab (#2) and then Split by Column #3).

image

It is possible to split by delimiter or number of characters. For this demo I selected “number of characters” and then selected 4 characters, once as far left as possible).

image

After this step, you will have the following

image

Just a bit of clean up left to do.  Delete the columns that are not needed and rename the first column to be called Year.  Then click Close and Load and you are done.

image

And there you have it – a new table that contains all the contents of the files in the folder path, plus the files have been transformed (in this case unpivotted).

Wrap Up

This approach is very powerful and can be used in all instanced.  Once you have done it a couple of times, you will be able to do it automatically without referencing this guide.  The main benefit of this approach is that is allows you to complete transformations on the data before the data is combined.

Here are the sample files I used if you want to try it out yourself.

I will blog about method 2 that uses the inbuilt Power Query functions to combine Excel Workbooks that are already in the correct shape into a single table in a couple of weeks.  However I will first blog about combining CSV files, as this is an important precursor to the Excel method.

 

if you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/

Share?

Comments

  1. Excellent post!! Very detailed and easy to follow.

    What if you want specific cells from a workbook?

    For example, let’s say I want cells B10, E6, A1:A10, and F14 from the first sheet of all the workbooks in a folder. How could I accomplish that?

    Thanks,
    Orlando

    • Power Query only works on tables. You could try importing e entire sheet and the use power query to extract the data you need. I think this would be hard. They way i have done something similar in the past is to add a hidden sheet in every workbook, and set up a table in the hidden sheet that points to all the cells of data you need. Of course this takes some set up but may be viable.

  2. I showed this process off to someone and the were “…its a work of art!”
    In converting to a function in a Folder case I tried using just (a little intuition and help from “M is for (Data) Monkey” )
    (myPath) >=
    which works, but the user must know to add an ending backslash after pasting the path. Not hard for most of “us” but is there a method to allow browsing to the desired folder?

    Are there other recommended sources besides the “Microsoft Power Query Preview for Excel Formula Library Specification” and the afore mention “M is for (Data) Monkey” to help in the advanced editor for Power Query?

    • Glad you and your colleges like it. Regarding the backslash, I am not really sure what you mean by the user needing to add a backslash. In my worked through demo, you use Power Query to browse the folder and Power Query returns the Path name (including the final backslash) and the file name. These 2 columns are passed to the function to do the work. So I am not sure what the use case is where the user needs to add the backslash.

  3. This is great, I was looking for such example for really long time. One question though – is it possible to only do incremental update somehow with this method? Say I will have quite large files added daily to the folder – I would only like to import new file that is not yet in target table.
    Would it be possible also?

    • Incremental update is not well supported. You can create a “history” table and set that so it doesn’t refresh (Power BI only), then append it to the new data table. It is a hack, not a true incremental update. There is some talk that incremental update is coming

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x