Level: Intermediate
Sometime late in 2016 Microsoft deployed a new “Combine” feature in both Power Query for Excel (Get and Transform) and also in Power BI Desktop (Get Data). The new capability makes it easier to combine multiple copies of similar workbooks into a single table without hand coding M Functions, but unfortunately it is now a lot harder to work out what is happening let alone fix it when something goes wrong. Today I am going to explain how it all works and cover what you should and should not do.
Demo Data
The demo I am going to use is to combine multiple Excel Workbooks from a folder into a single table. Here are the Excel files in my folder.
This is what one of those Excel files looks like.
All files are the same “shape” in that they have a header row along the top and product number down the first column. What I plan to do with this data is combine all the files into a single table that is unpivoted so it looks like this.
Import the Data from Folder
To start the process, I select Get Data from Folder as shown below.
When given the option, I select the Combine button (combine and edit).
Select the Sample File
I am immediately presented with the first file in the folder to use as the sample file (shown as 1 below). I can choose a different file from that dropdown list if I want, but given they are all the same, I just left it as the default.
I then selected “Sheet1” as shown in 2 above and clicked OK. Note how 2 above says “Sample File Parameter1”. More on that below.
Wow, Look at All Those Queries!
Now the good news is that somehow the combine button has done a reasonable job at combining the files (see 1 below). Actually I wanted to unpivot the month columns, but I will come back to that later.
The bad news is all of the activity over on the left hand side. What the…? There are 5 additional “Queries” on the left (numbered 2 – 6) that do all sorts of things. Let me tell you what each of these are and then come back and explain how to use/interpret these things.
2. This is a parameter that can be used to change the sample file
3. This is the link to the sample file that was selected originally (I selected the first file in the folder and this is the link to that file).
4. This is the “by example query” – the most important query to know about.
5. This is an auto generated function that goes with 4 above.
6. This is the final output query (it is the query that is displayed in 1 above).
Let me explain each one of these queries so it is clear. These files work in pairs.
The First Pair: 2 Parameter & 3 Sample File
The easiest way to explain this pair is to make a copy of the sample file. The first thing I did was rename the sample file to be “Sample File 1”, then I right clicked (1 below), selected duplicate (2 below) and then created a copy, then renamed it Sample File 2.
I then changed the line of code for Sample File 2 so it points to =Source{1}[Content] as shown below (sample file 1 points to =Source{0}[Content] )
Now one thing to point out here. When I copied this query, Power Query automatically added a new step to the new query that wasn’t in the original query (as shown below). You have to be very careful with Power Query because sometimes it does this to “help you”, but in reality in this case the extra step is not required – I simply deleted it.
I now have 2 different sample files (Sample File 1 and Sample File 2) that point to 2 different files in my main folder.
Now when I select the Parameter shown in 1 below, I have a choice to swap which file it points to (shown in 2 below). This might be useful if one of the sample files was slightly different for some reason (still compatible but with some minor differences).
The Parameter and Sample File work together to determine which file will be used as the “by example” query covered below. Normal you don’t have to touch these unless you need to change your sample file for some reason in the future.
The key points are:
- The parameter and sample file pointer work together to tell Power Query which file to use as the sample file
- You can copy or repoint the sample file pointer to another file if you like.
- If you have multiple sample file pointers, you can use the parameter to swap between them.
The Second Pair: 4 By Example Query and 5 The Function
These next 2 items are the most important and they work together as a pair too.
4 is a special query that I call the “By Example Query”. You can edit this query how ever you like using the user interface. 5 is the function that goes with this “by example query”. To demonstrate how this works, let me show you the advanced editor for the function before and after I make changes.
When I select the function and then go into the “Advanced Editor”, I get this warning.
I push ahead anyway as I am not going to touch anything – I will simple exit without making any changes. This is what the function looks like before I touch the “by example query”.
I cancel out without making changes, then I proceed to make changes to the “by example query”. I selected the by example query (1 below), then selected the Product Column (2 below), right click and “unpivot other columns” shown as 3 below.
Now when I go back into the function for a second time, the function has changed as shown below.
Note the extra line of code for the “unpivot other columns” step.
As you can see, the way this works is you can edit the by example query using the UI, and the function is automatically updated for you. As long as you don’t touch the function directly, any changes you make in the by example query will be automatically reflected in the function without the need for you to write any M code at all.
Now There is An Error
OK, after making the change above, now there is a new problem. Note below that there is now a new error that pops up in the last query. When I changed the query above, the error below was created.
Errors can occur in dependent queries when you make a change to an upstream query, and this is exactly what has happened here. If I select the query with the error as shown in 1 below, you will note that the query in error refers to the function (2 below) and we know the function references the by example query (shown as 3 below).
It therefore follows that any changes you make to the by example query will effect the final query. In this case it is the change I made to the by example query that has caused an error. But there is nothing to worry about. It is just a case of selecting the query with the error and working out what has gone wrong. When I select the query in error as shown in 1 below, note how the last step (2 below) is changing the data types of all of the columns as indicated. But most of these columns no longer exist because of the unpivot action I took earlier. By simply deleting this “Changed Type” step below, the problem was solved.
To finish off my transformation, I turned the file name into the Year by selecting Transform\Extract\First 4 characters and reapplied the data typing (not shown).
The key points are:
- The “by example query” and the function work together as a pair.
- Any changes you make to the “by example query” will automatically be reflected into the function
- Any changes here may cause an error in the final query, but these are easy to fix. Just go to the final query and work out what has changed to cause the error.
Rename Your Queries if Necessary
If you have a workbook with lots of “combine ” transformations, it is a good idea to rename the query steps to something more meaningful to you. Power Query gives all the additional queries generic names and this can be confusing when you end up with a lot of combine queries. Take a look at my queries below after I have renamed them to something more meaningful. Try to give the queries unique names that clearly identify each set of data. If you only have 1 set of combine functions this wont really add much value. But if you have 10 of them in a single workbook, it will make it A LOT easier to manage.
Want to be a Power Query Ninja?
I have an extensive online video training course that covers this technique and a lot more. I always teach you how to use the UI to complete the tasks but also explain how it works under the hood. This way you get to learn the easiest way but also build depth of knowledge. You are read up about my Power Query training (and watch some free videos) here https://exceleratorbi.com.au/power-query-online-training/
Can you help me with this error? I am extracting pdf’s from one folder using power query. I’m getting the error below about the sample file, which is no longer available, as the contents of that folder keep changing as new documents are received and old ones are filed away or deleted. Is there a way to make your sample file dynamic?
An error occurred in the ‘Sample File’ query. Expression.Error: The key didn’t match any rows in the table.
Details:
Key=
Name=4-1234567.pdf
Table=[Table]
It’s saying that no items are found in one of the query steps. Often this can occur when the file combine is built with a file that contains something, but then the files loaded in the file combine step don’t have those same somethings. PDF files are the worst culprit
RE: Power Query automatically adding a new step (“imported Excel”) after changing source for SampleFile 2 – While stepping through your wonderfully clear example and applying it to my files, I discovered that (at least in Excel 2013) it didn’t add that step if I changed that line from inside the advanced editor, only when I changed it in the formula bar.
I think I’m going to be spending several of my lunch breaks in the near future exploring your website… 😉
Thx. Matt for the great explanation of this capability. One question though: Is there a best practice for deciding which transformation to define on the sample file (function) and which ones should be put in the final query. Maybe there are even situations where you do not have a choice?
I don’t know of any best practice. I have noticed that you can sometimes apply data type formatting on the sample file only to find that you have to do the data type formatting again on the final query. So it seems that data type formatting is best left until the final query. I guess that any columns you are going to remove should be removed from the sample query for efficiency. ie why import them all if you are just going to remove them. Having said that, Power Query is very efficient and it is possible that it will never bother to import them if it realises they are going to be removed anyway. I can’t be sure if it is that smart or not. But I guess it makes sense to push everything to the source query except the data conversions.
Thank you Matt for all your wonderful articles on Power BI.
Many thanks for such a clear explanation of a complex and obscure feature. Maintaining functions has been made easier with the new feature, but explaining how it works was much harder, until now. In my bookmarks.
Great post.
This new “Combine Binary experience” can be a bit confusing at first.
But once you get it, it turns out to be super powerful !