This is the second of a series of articles where I explain different methods of combining data from multiple files into a single workbook/table. In the first article I described how to use a function to combine data from identical Excel Workbooks into a single file using Power Query. Next week I will cover a second method of doing this with Excel Workbooks. But before I cover the second method, I thought it would be worthwhile to talk about how to do this with CSV files. CSV files are relatively simple to combine provided they are in the same format and are generally tabular in layout. If you need to do significant transformation of the CSV files (e.g. unpivoting columns and removing rows of data that are unused) then you may be better off to use the function approach covered last time.
Overview of the approach
This is probably the easiest way to combine multiple files, however it only works if you have CSV (or other text format) files. Here is an outline of the steps followed by instructions.
- Place all the CSV files in a folder
- Import the files in the folder using Power Query
- Expand the contents of all of the files
- Remove any unwanted duplicate headers from files 2+.
1. Place all the CSV files in a folder
The CSV files I am using for this demo consist of 3 columns with a single header row as shown below.
I have 3 of these files for different periods all in the same folder
2. Import the contents of the folder using Power Query
I then created a new blank workbook and created a new Power Query to import the contents of this folder.
Select the correct folder from the dialog, then click OK.
3. Expand the contents of all of the files
There are a lot of columns that you can see in the Power Query results window after completing the previous step (mainly metadata about each file). The only one needed for this combine process is the first column called [Content] marked as 1 below. You can get rid of all the other columns by clicking on the first column, and then selecting Remove Columns (2), Remove Other Columns (3).
After removing all the other columns, you will need to click on the button at the top of the [Content] column as shown below (called either combine or expand depending on your view).
After you have clicked this button, Power Query jumps into action and performs a number of steps automatically to combined the data (4 steps n my case) . You can see these in the Applied Steps window to the right.
You should always check the “changed type” step to make sure that Power Query’s guess of the data type for each column is correct. To do this, make sure you have the formula bar turned on (steps 1 and 2 below). Select the “changed type” step from the Applied Steps and then check the formula bar (3). You can see that Power Query has correctly identified my 3 columns as being of type Date, Integer and Integer. You don’t need to really understand the Power Query formula language to be able to confirm these things as it is pretty intuitive.
If any of the formats are incorrect, simply click on the column(s) and change the format to the correct one.
4. Remove any unwanted headers
The last step is to remove any unwanted headers. As you can see in the Applied Steps above, the second to last step is “Promoted Headers”. This step took the first row from the first CSV file and turned this row into the column names for the new table. But the problem is that each CSV file has an exact copy of this header row, and these rows exist in the data.
To remove the extra headers, click on any of the column filter buttons (shown as 1 below). In my example, there are more than 1,000 rows of data and hence you can’t see the header in the list as indicated by the warning in 2 below. To solve this problem, click on the “load more” link (shown as 3).
When I did this for this demo, I got an error shown as 1 below. The issue is that the header row that needs to be removed consists of the word “Date” but the word “Date” cannot be formatted as type “Date” (this data format change was executed automatically by Power Query in the previous step). You could delete the “Changed Type” step if you want, but the easiest thing to do is to is simply select the second last step (shown as 2 below) before completing this filter step.
So to make it work, first select the second last step (shown as 1 below), then click the filter on one of the columns (2 below). You will then see the header row called “Date” in the list of values. Deselect this header row (shown as 3) and then click OK.
Now you can click FileClose and Load and the process is complete.
Doing the same with Excel Workbooks
Excel is slightly different and needs some manual intervention, so that is a topic for next week.
if you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/