I had read an interesting request on the Whirlpool forum over the weekend. Whirlpool user Alicia2 wanted to extract Word document metadata into a Spreadsheet. That seemed like an interesting problem and something I could use some time. So I decided to help her out.
I built this Spreadsheet that you put into a folder along with the Word documents (DOCX format). The Spreadsheet then uses Power Query to suck in the file names in the folder (manual refresh required). Finally you click the Extract Metadata button, and the Spreadsheet does the rest.
Here is the extract word metadata Spreadsheet.
And here is a video that shows you how it works.
I found this Post that explains how to consolidate files on your computer with Power Query (kudos to Bacon Bits). That is a great tip, however I wanted to consolidate multiple Excel files in SharePoint using PowerQuery – not files on my computer. I used the principles in the link above to come up with the following procedure.
EDIT 21/11/14: Today Chris Webb posted this blog which covers an easier way to combine files on your PC.
Create a new folder in your sharepoint site. Add all the files you want to consolidate into this folder. Make sure they are all identical in file format, because we are going to consolidate them into a single file using a repeatable process – they must be identical in layout. Note: do not add any other files to this folder.
Create a SharePoint Consolidation File
Note that it is laid out in a way that is easy for the user to edit the numbers (ie Crosstab format). This is good for data entry, but it is not good for PivotTable reports. The next tip will help you fix that. You will want to create multiple copies of this data input file – one for each data source (in my case it is each person that manages a budget).
Import the Data from SharePoint into Excel Using Power Query
Open a new Excel Workbook
Go to Power Query
Select From Other SourcesFrom SharePoint List.
Paste the URL of your main SharePoint Site. Note well: You must use the main URL, not the URL of your Shared Documents or any other sub page.
Click OK
Go to the Workbook on the right hand side and right click on the SharePoint URL. You don’t have to wait for it to refresh. Righ click/edit.
Make sure you can see the formula bar at the top. If you can’t, click View and then select the Formula Bar Checkbox.
In the formula bar, change Sharepoint.Tables to read Sharepoint.Files . Note this is case sensitive.
Find one of the files in your SharePoint Folder. Click on the word “binary” for that file.
It should return something like Name = Sheet1, Data = Table. Click on Table
Edit the data in the table as you normally would with Power Query.
Once you have the data the way you want it in Power Query, turn the query into a function. First rename the Query to something like fGetMyFilesSharePoint. Then go to the Advanced Editor. In the Ribbon, click ViewAdvanced Editor. Add the code as shown below
Save the Power Query
Create a new query to import all the files located in your SharePoint folder
Once you have the files you need, create a custom column that uses your function produced in the first step above. Click OK
Expand the new custom column. Click as indicated. Make sure “Expand Column” is selected and then click OK.
You should end up with your consolidated table of data something like this.
If you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/power-query-online-training/
I’m a little confused by the second “Add the following code” box. Where is this code? I’m guessing it’s auto-generated, but I don’t see it.
I appreciate this is an old thread, but this is the first time I’ve needed to extract data from multiple files.
I was ok up to that point, but now I’m confuddled!
Yes, this is quite an old post. It still works, but you can do it with UI support now.
Here are some other posts to look at
https://exceleratorbi.com.au/power-query-combine-multiple-files-in-folder-another-method/
https://exceleratorbi.com.au/understanding-power-query-combine/
Makes sense. Definitely something I can configure with VBA. Another question though: Do you know what query settings would optimize this? For example, “Fast Data Load” in the function, but not the query table, or vice versa.
Looking for ways to optimize 🙂 Otherwise this solution is fantastic
Power Query optimisation is an enigma to me, and not something I could give good advice about. The 2 experts I know are Imke Feldmann and Chris Webb.
Just recently I started getting an error at a random file, and also at random, certain applied steps say “Unexpected error: Evaluation was canceled.”
I’ve tried rebuilding and adjusting every setting that I can think of, but this continues to persist (and it’s an intermittent error – sometimes all will load fine, and other times it will say download failed). I’ve checked that I am connected to the network, so that’s not an issue.
Do you have any thoughts?
error at a random file from which I want to pull data*
Also, not sure if this helps clarify, the row will show “Error” in every column…
Is it a consistent error, or inconsistent? If consistent, then you have to find the file that causes the issue. I normally do a binary search. Remove 50% of the files and refresh. If it works, switch to the other 50% and refresh to confirm it fails. Repeat until you find the file.
If inconsistent, try moving the files onto a local HDD to see if it is a connection issue. Try using a different tool (power BI vs Excel to see if it is a product bug. These are my only thoughts
It’s inconsistent….the worst kind of error.
I am trying to run the same process (unique functions) on 2 folders then combine the queries with an existing table in a workbook, and load to worksheet.
I think the problem is that the second folder is a subfolder within the first…does that sound plausible?
Or, could it be caused by data transformations that happen outside of the function (i.e., with the consolidated list in the query editor)…?
As you can probably tell, this is driving me crazy hah 🙂
I can’t see why a subfolder would cause an issue, unless you didn’t mean for the subfolder to be included. I find the way Power Query manages subfolders to be a bit strange, and there is no easy UI option to exclude subfolders.
Hi Matt,
Do you think this issue could be caused by different data types in matching columns before appending the tables?
Thanks for the continued support,
-Mike
It depends. If you have a function created by the file combine wizard, then definitely a different data type is a likely candidate. Go to the sample query and look for any “changed type” steps. Generally these are not needed in the sample query, and you can safely delete them.
I made sure to align the data types before appending and now getting a “can’t access content…filename/_api content/” error on one of the files. The problem is, it’s a different file causing the error each time…
Could it be that the tabs I’m pulling data from are VeryHidden?
If the file causing the problem changes each time, I don’t see how it can be related to the file.
So, I was able to fix the refresh by removing errors after adding the custom column, but now each time, only 1 file doesn’t pull in…and it changes each time. I can’t seem to find a pattern in the files that are causing the issue.
There are a couple other files (e.g., PPT) in the same folder…could these be interfering? I’ve filtered them out with the Extension column in the query editor prior to inserting the custom column.
Or could it be that I don’t apply the same filters in the function? i.e., I simply selected the file in the function without drilling down to the folder with the files…
If you have filtered out files using file type, then that is not the issue. No, it is not the function. The function uses a specific file and is executed first, and once. The function will work or not (all files can be processed or none). If I were providing commercial support for this problem, I would do the binary search I mentioned earlier.
There is no easy way to do this. There is no incremental refresh and hence no way to store snapshots. You would have to permanently store the old data and also the new data, and then compare the results.
I’m getting lost in the syntax update. When you say “Replace File Name with Variable”. What is the “Variable” in the example you use here? Would it be “CC Owner”?
This article is 5 years old now. There is probably an easier way to do this (I’m guessing). I would have to look.
I haven’t been able to find any easier solutions, though I did get this to work. Do you know if there is any way to highlight/flag cells in the query that have changed when the query is refreshed?
Short, sweet, and saved the day. Thanks Matt!
I could not see any of my excel files and found this tutorial.
I get as far as changing SharePoint.Tables to SharePoint.Files but when I do I get an Access to resource forbidden message – which would suggest an authority issue!
I also tried changing it to Sharepoint.Contents as @Shawn suggested but then I get We couldn’t authenticate with the credentials provided – similar issue to the above.
any ideas
Certainly sounds like an authentication issue. These can be very challenging to debug. I suggest you check the data source settings in Power Query (where this is located depends on what tool you are using). Clear the permissions and see if that helps.
This is a wonderful, flexible solution. Thank you for sharing these instructions!
will be checking tomorrow. If it works, it will just be amazing.
This 100% solved my problem. Thank you!
I am getting lost between createfunction2.png and createfunction1.png. Are you creating 2 separate queries?
After following createfunction2.png ; it asks me to invoke function? when I close and load without invoking, I get “Connection Only” under the function in the workbook query. If I invoke it asks me to enter parameter “FileName”, leaving blank and clicking okay gives me an error “Expression.Error: The Key did not match any rows in the table.”
mmm, it was a long time ago when I wrote this, and I don’t have the sample file I used. I think the createfunction1.png was an earlier version of the same file – not sure why I used that. Yes you should get a “connection only”, but you should not invoke the function from the Power Query UI. You need to move to the next step to create a new query to your SharePoint folder, and then add a custom column – that is where the function should be invoked, and it will take the file name from the column in your new query.
I am getting the same “key did not match any rows in the table” when trying to add the custom column. Do you know how to resolve this?
Sorry, no idea without seeing it. You could try this http://community.powerbi.com/t5/Integrations-with-Files-and/Expression-Error-The-key-did-not-match-any-rows-in-the-table/m-p/11536#M1965
Pingback: Power Query Performance Improvements - Test Drive - Excelerator BI
Hi,
the solution above would save me many hours manual work right, but I don’t get it to work 🙁 The problem is, as I see it that the formula I created is not loaded into the sheet. in the workbook queries tab stands “Load is disabled” below the formula query. Do you have any idea how I can enable the load?
Thanks in advance
Frank
Matt,
I’m trying to “Find one of the files in your SharePoint Folder”. I have many files on the SharePoint site, and they are buried many folders deep. Any idea why my folder isn’t showing up? Thanks!
Lisa
And to add to that…I know they’re not showing up because the preview window can’t show every one of my many SharePoint files in the preview. How do I get around this? Thanks!
Lisa
Hi Lisa
Did you try using the filter button as suggested here?
https://exceleratorbi.com.au/wp-content/uploads/2014/02/create-new-query.png
Yes, I only have one Excel file in my PowerQuery preview, so I’m only getting one filter result even though I have hundreds (if not more) of Excel files on my SharePoint site. Any other ideas?
Thanks!
Lisa
Sorry, hard to say without looking at it. Are you saying there are hundreds of files on your SharePoint site and only 1 shows up, or are you saying there are hundreds of files and only 1 XLSX shows up? If the latter, try clicking on the Extension filter after changing the M code to SharePoint.Files(…)
Power Query only brings in a sub set of records, so if you have LOTs of files, that could be the issue
Lisa,
Try using “Sharepoint.Contents” instead of “Sharepoint.Files”. This should give you a list of all of your “folders in SP, rather than all of your files, and you can drill in from there. It may help you narrow down the query enough so that you can see the files you are looking for.
I had heard this Patrick, but never tried it myself. In my use case here, I am using Power Query on the desktop client and not on SharePoint, so it is not really an issue. I agree with that this is a big limitation. Have you given feedback? There is an option in the Ribbon under “Power QuerySend Feedback”.
PowerQuery is very interesting BUT it’s major problem is total lack of refresh if you want to publish it on sharepoint