This is the second of a series of articles where I explain different methods of combining data from multiple files into a single workbook/table. In the first article I described how to use a function to combine data from identical Excel Workbooks into a single file using Power Query. Next week I will cover a second method of doing this with Excel Workbooks. But before I cover the second method, I thought it would be worthwhile to talk about how to do this with CSV files. CSV files are relatively simple to combine provided they are in the same format and are generally tabular in layout. If you need to do significant transformation of the CSV files (e.g. unpivoting columns and removing rows of data that are unused) then you may be better off to use the function approach covered last time.
Overview of the approach
This is probably the easiest way to combine multiple files, however it only works if you have CSV (or other text format) files. Here is an outline of the steps followed by instructions.
- Place all the CSV files in a folder
- Import the files in the folder using Power Query
- Expand the contents of all of the files
- Remove any unwanted duplicate headers from files 2+.
1. Place all the CSV files in a folder
The CSV files I am using for this demo consist of 3 columns with a single header row as shown below.
I have 3 of these files for different periods all in the same folder
2. Import the contents of the folder using Power Query
I then created a new blank workbook and created a new Power Query to import the contents of this folder.
Select the correct folder from the dialog, then click OK.
3. Expand the contents of all of the files
There are a lot of columns that you can see in the Power Query results window after completing the previous step (mainly metadata about each file). The only one needed for this combine process is the first column called [Content] marked as 1 below. You can get rid of all the other columns by clicking on the first column, and then selecting Remove Columns (2), Remove Other Columns (3).
After removing all the other columns, you will need to click on the button at the top of the [Content] column as shown below (called either combine or expand depending on your view).
After you have clicked this button, Power Query jumps into action and performs a number of steps automatically to combined the data (4 steps n my case) . You can see these in the Applied Steps window to the right.
You should always check the “changed type” step to make sure that Power Query’s guess of the data type for each column is correct. To do this, make sure you have the formula bar turned on (steps 1 and 2 below). Select the “changed type” step from the Applied Steps and then check the formula bar (3). You can see that Power Query has correctly identified my 3 columns as being of type Date, Integer and Integer. You don’t need to really understand the Power Query formula language to be able to confirm these things as it is pretty intuitive.
If any of the formats are incorrect, simply click on the column(s) and change the format to the correct one.
4. Remove any unwanted headers
The last step is to remove any unwanted headers. As you can see in the Applied Steps above, the second to last step is “Promoted Headers”. This step took the first row from the first CSV file and turned this row into the column names for the new table. But the problem is that each CSV file has an exact copy of this header row, and these rows exist in the data.
To remove the extra headers, click on any of the column filter buttons (shown as 1 below). In my example, there are more than 1,000 rows of data and hence you can’t see the header in the list as indicated by the warning in 2 below. To solve this problem, click on the “load more” link (shown as 3).
When I did this for this demo, I got an error shown as 1 below. The issue is that the header row that needs to be removed consists of the word “Date” but the word “Date” cannot be formatted as type “Date” (this data format change was executed automatically by Power Query in the previous step). You could delete the “Changed Type” step if you want, but the easiest thing to do is to is simply select the second last step (shown as 2 below) before completing this filter step.
So to make it work, first select the second last step (shown as 1 below), then click the filter on one of the columns (2 below). You will then see the header row called “Date” in the list of values. Deselect this header row (shown as 3) and then click OK.
Now you can click FileClose and Load and the process is complete.
Doing the same with Excel Workbooks
Excel is slightly different and needs some manual intervention, so that is a topic for next week.
if you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/
Hi Matt
Thank you for this detailed post
Is there anyway to just load all the cvs files in one folder without combining all of them
Do you mean “connect to a folder and load each CSV file as a new table”? There is no way that I know of to do that – you need to do them one at a time as each file will be its own query.
Hi Matt,
Thank you for your amazing article. It was very nice.
I have a doubt about multiple .CSV files in different locations. I want to combine those .CSV files in the power BI tool.
Can you please help me to resolve the issue.
Thank you very much.
If the csvs are in multiple locations, you will need to create a list of all the files in all the locations first. You can do this by selecting “from folder” for each location, but don’t combine the files, just load the folder names and file names. Then append all these together into one table before combining from there.
Hi! It’s really a powerful tool, however while getting Csvs Files it is not giving Prompt msg before combining the files. It is just combing all the files without enabling any option to edit. Please help me out with this as I need to handle large data in Csv file format.
There should be a drop down button giving you the choice. Regardless you can simply re-edit the query even after the data is loaded. How to re-edit depends if you are using Excel or Power BI
Based on your description, it sounds like the function technique from my last blog https://exceleratorbi.com.au/combine-excel-workbooks-power-query-method-1/ will work for you. Just transform the file (unpivot the columns) before creating the function. That way any column variations will be handled. Make sure you “unpivot other columns” otherwise you will get errors with new month column names.
Post a sample workbook at http://powerpivotforum.com.au if you still have problems.
Hi
Thanks for sharing such a wonderful article. I have found this really helpful to put into practice use but I am stuck with one issue.
I have yearly Pivot style reports in which left side I have customer, region and product information and in the data I have months as headers e.g 2015 report has month Jul-14 to Jun-15 and 2014 Jul-13 to JUN-14. When I am creating a query from folder its only taking headers from first report and ignoring other report headers which is an issue as I only get months from the report whichever is on top. Any ideas If I can get relevant months when unpivot report.