Level: Advanced Power Query
I am in the process of finalising some action items after a great Power BI, Power Pivot and Power Query training course in Melbourne. During the training we looked at a scenario for one of the attendees where they needed to combine data from multiple Excel files into 1 big table. I have blogged about this before in 3 articles starting here.
In this specific scenario we looked at in the training, there is a weekly extract of data from another BI system (sound familiar?). Each week someone must use “Export to Excel” to grab the current week’s data and dump it into a file (a typical action for a BI system that isn’t user friendly). The file format of the export looks something like this (indicative).
The key points about this data structure are.
- There is a date in a cell in the file header.
- The details are also in the same file, but there is no date column in the detail section.
The requirement is to end up with a table like this.
The rest of this article describes how to do it.
Step 1. Connect to the file using Power Query
Step 2. Filter the Table so the Date Row is at the Top
To do this, I simply did a “Remove Top Rows” to remove the top 2 rows.
Step 3. Killer Trick to Extract the Actual Date
The next step is a killer trick. The idea is to extract the date from the first column in the top row into a scalar value so it can be used later. To execute this step, you need to use a hand written PQL language formula.
Click on the Fx button to insert a new custom step
If you can’t see the Fx button, it means you don’t have the formula bar turned on. Turn it on like this
After clicking Fx, a new custom step will be added to the query pane (1 below). The step (by default) assumes that you want to use the previous step as a starting point (2 below).
To extract the date as a scalar value (2 below), you need to use the function Record.Field as shown in 1 below.
Note how I have wrapped the original step #”Removed Top Rows” inside the function Record.Field. The {0} portion of the formula says “take the first row from the previous step, and the “Column1” part says take the data in the column called “Column1”. The intersection between the first row and this column is a single cell, and hence this formula is accessing the actual value (scalar value) in this one single cell in the workbook. You can see it is a scalar value (2 above) because it doesn’t show a table, but a single value.
Strictly speaking I didn’t need to force the date into the first row – I just did this because it was cleaner. I could have just as easily reference the row/column without removing the top rows with this formula
= Record.Field(#”Removed Top Rows”{2},”Column1″)
The number inside the curly braces represents the row number, with a base 0 index. So 0 is the first row, 1 is the second row etc.
Step 3. Second Killer Trick – Get the Data from an Earlier Step
The next (little known) trick is that I can actually go back and get the data from an earlier step. Power Query by default always uses the previous step as the starting point for the next step – a logical flow. But it doesn’t have to be that way. In this case the previous step is simply the scalar value = date, and that is not what I want. I need the step previous to that which is “Removed Top Rows”.
Click on the Fx button to create another new custom step.
Now change the formula in the formula bar (1 above) so it points to the step “Removed Top Rows” instead of pointing to the step “Custom1”. Like this
So now I am back were I was before extracting the Scalar Date value.
Step 4. Proceed to shape the data the way you want it
I then executed some steps to cleanse and shape the table the way I want the data. I ended up with the following.
Step 5. Add the Date as a Column
The final step is to bring the date from step 2 into a column in the table. This is quite easy now.
Add a Custom Column, and refer to the step “Custom1” that returned the scalar value
And I end up with this
Exactly the type of data structure needed to load into Power Pivot. I simply reordered the columns to put the date column first, and then it was done.
Final Comments
If you want to (or need to), you can then turn the above query into a function. Read more about that here. If you turn it into a function, you can then use it to process all files in a folder and combine them into a single data table to load to Power Pivot.
If you want to look at the workbook, you can download it here => Extract scalar date
Thanks for the detailed information with screen shots. I have a similar problem. I need to have the invoice number on the first column, but this number is in a row that’s third from the bottom. How can I do that?
Full instructions are in the blog. In your case, you have to drill down on a different cell, but everything else is the same.
Matt, this is a life saver…awesome tip. Thank you so much for posting this article.
Want to share my appreciation. I literally am first day on using power query and would say I am a beginner excel user. Your guide is so well written that I had no issues following along and understanding what I needed to do and why. This guide has saved me hours upon hours of future work.
Thank you so much!
Very, very helpful. Easy to follow, excellent!
Thank you!
Hello,
I would like to thank you for this interesting blog. It was very helpful. I’m working on a similar case but my query is a combination of 30 files that has in the top a particular date. So i need to report in a column all the dates for the lines of each file in the same query.
I don’t know when you will see this, but I am presenting to the UK user group in 35 mins from now. I have this exact use case (close enough) in the session. https://community.powerbi.com/t5/Power-BI-UK-User-Group/Learning-M-Using-the-UI-With-Matt-Allington/ev-p/2339211
Thank you for your response. Unfortunately, i didn’t see your response before. Is there any recorded session link?
It will be shared. https://community.powerbi.com/t5/Power-BI-UK-User-Group/gh-p/pbi_UK_usergroup
Thank you – really helpful. I’ve been looking for this for a while.
Hi Matt,
Great trick, I tried copying this query over into other imported worksheets in Powerbi but it keeps referencing back to my original “date”. How can I customise the workflow so that the query can copied to another sheet and it finds the cell on the active sheet not the previous one?
It’s impossible to say without looking at the code. If you have not done so already, turn on the formula bar and look at the code. Even if you can’t write M, you can normally work it out by looking at the code.
thanks Matt, very useful. was able to apply it to my data very easily. worked first time as well! which is always a nice surprise for me.
Great Tip!!! Thanks for the help
Hi Matt thanks for this great tutorial. I have multiple files in a folder with this problem. Can I use this technique with many excel files in a folder?
As long as they are all the same, you can use this technique along with file combine. When you do the file combine, PQ will get it wrong. First edit the transfer sample file query and get that right. The. go back and edit the combined query
Fantastic! Took me a minute to understand exactly how to do the multiple files, but once I got it, magic!! Thanks!
it is magic (again 🙂
Above idea is great but facing one issue. While applying these changes to a sample & then consolidating it’s returning an error stating “Column is not founnd in the table”, help me in reslving the same
This article helped me enormously! I found out that in 2020 you can right click any cell value and choose “drill down” to do step 2 and 3 and killer trick 1 in a single click. Instead of “Custom” the query step will have the name of the column
Great article, very well explained. More tidbits than I expected.
Very Informative and will helped me in one of my project I am working on.
I have more then one Row with similar requirement, so do I keep repeating 1st and 2nd Killer Trick or there is a better way ?
Lets say I have Four entries to transfer into columns how to approach that scenario?
I guess it depends on the layout. You should be able to refer to any cell from a previous table by writing the following code =StepName{RowNumberWithZeroReference}[ColumnName]
What if the row in which the data you want to extract is dynamic? It may appear in a different row within the same column from CSV to CSV file?
Thanks,
Scott
Well, I guess it depends. You need to write code that works under any circumstances, of course. If the column is fixed and the row can vary, is it possible to remove blank rows so that the row needed becomes the first row, then drill down? You just need a way to get the row in question into a constant row position so you can extract it.
I repeated the steps above three times and converted three scalars into three columns. You can do that four times. or as many times as you needed.
Hi Matt,
Great Trick,
this is going to help me in lot of places
Hello Matt!
What a great post, I’ve been struggling with this for a couple of days and this is so detailed that helped me a bunch.
Thanks for taking the time!
Barbara
Awesome Trick! I used it to add column of my daily store report. However, in case it seems to be NOT dynamic whatever I choose in CUSTOM1 as my store name stays the same for other stores. Any help appreciated.
This process selects the value in a single cell and adds it to every row in a column. It is not dynamic. You could post a sample sheet and question at http://powerpivotforum.com.au
Matt, thanks so much for posting. I am just now reading it, and it is helpful to know how to perform these steps directly from the function bar without going into the editor. Great post!
Hi, many thanks for this, it’s exactly my situation. Additionally, I would like to merge the data from a number of these files so that I get a single list including the month column. I guess I need to open each data source, manipulate it to get the month in the correct column, and then import into the master column? Is there a straightforward way to do this. Many thanks,
well not necessarily. If the cell is always in the same location in all sheets (ie the sheets are the same shape) then you can use the approach I describe here to transform a single sheet, convert it to a function and then go from there. https://exceleratorbi.com.au/combine-excel-workbooks-power-query-method-1/
many thanks. I got that to work. Strangely, I had to reduce the row references in the function by 1 in the transformation stage when identifying the date cell and removing top rows. Don’t know why. Much appreciated.
You can directly get the date while adding a custom column
let
Source = Excel.Workbook(File.Contents(“C:\Users\SAM\Desktop\demo.xlsx”), null, true),
Data = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
mRemTopRows = Table.Skip(Data,3),
mPromoteHeaders = Table.PromoteHeaders(mRemTopRows),
mUnPivotCols = Table.UnpivotOtherColumns(mPromoteHeaders, {“data”}, “Attribute”, “Value”),
mAddCustCol = Table.AddColumn(mUnPivotCols, “Date”, each Data{2}[Column1]),
mReorderCols = Table.ReorderColumns(mAddCustCol,{“Date”, “data”, “Attribute”, “Value”}),
mChgTyp = Table.TransformColumnTypes(mReorderCols,{{“Date”, type date}, {“data”, type number}, {“Value”, type number}})
in
mChgTyp
Thanks everyone for this very useful post and comments. All excellent. My personal choice is to go step by step, and Matt you’re bloody good in that.
Thanks for posting this topic, Matt! Very insightful.
Thanks to Jean-Pierre’s shortcut drill-down method as well!
Hi Matt
I played around a few days ago with a similar issue. If you have the entry you want as the first entry in a column you can use a custom column with the following formula
=Source{0}[Column2]
Source zero is the first entry in the column.
You then delete the top row and the custom column retains the value. The just finish the row 1 as header and rename the column.
Its a common requirement as many Excel + csv files have report headers at the top containing data you want to capture.
Regards
Neale
HI Neale!
Interesting approach, really.
But if we have to work with relative big dataset, for example, of N rows, you will perfom drilling to Source{0}[Column2] all N times, and I think it could cause a performance leak. I think that assigning Source{0}[Column2] to variable could be faster then
Cheers, Max
Hi Matt,
for obtaining the value of the cell, you have a easiest way, just right click on the cell who contains the date and then select drilldown resulting in a new step, then give it a name and you can use it as a variable
Jean-Pierre Girardot
Thanks Jean-Pierre for sharing. Yes I didn’t know this and it was shared by Matthew above too. I like this way you have both shown better, but I still think there is value in incrementally learning the PQL language. It is difficult to learn in a single sitting, so learning little snippets over time is a great way to learn. Still like your way better though 🙂
Hello Matt,
Nice trick (as always)!
As far as I tied to a very ugly-shaped reporting system, I am always looking for such transformations. My cases a little bit more complicated, so I “invented” such tricks several times, for example here: http://excel-inside.pro/blog/2015/11/12/using-the-header-of-the-report-as-the-data-for-table-columns-in-power-query/ and here: http://excel-inside.pro/blog/2015/11/09/transfer-values-to-other-columns-in-power-query/
May be you can find it interesting
Cool. Just shows that almost anything is possible with Power Query. You just have to chop the elephant into pieces.
There’s another way to do this that leaves your date in a reusable query and doesn’t interfere with the flow of your main query and uses next-to-no manually written code: If the date is in the same place every time, then do your initial Source and Navigation steps, then just right click on the date value and select “Drill Down”. That will create a query that returns the result of just that date. You can then add it back into your primary query by doing Add Column = Date, and you’re done.
If it’s not in the same place every time, then do whatever processing you need to get it to a static location in the query (remove other columns, keep top/bottom rows, etc.) and do the Drill Down then.
Resulting code below, I named the date query just “Date”.
let
Source = Excel.Workbook(File.Contents(“C:\Users\sfo-matthewr\Documents\Process Analysis\Misc Reporting\demo.xlsx”), null, true),
Sheet1_Sheet = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
Column1 = Sheet1_Sheet{2}[Column1]
in
Column1
let
Source = Excel.Workbook(File.Contents(“C:\Users\sfo-matthewr\Documents\Process Analysis\Misc Reporting\demo.xlsx”), null, true),
Sheet1_Sheet = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
#”Changed Type” = Table.TransformColumnTypes(Sheet1_Sheet,{{“Column1”, type any}, {“Column2”, type any}, {“Column3”, type any}, {“Column4″, type any}}),
#”Removed Top Rows” = Table.Skip(#”Changed Type”,2),
Custom2 = #”Removed Top Rows”,
#”Removed Top Rows1″ = Table.Skip(Custom2,1),
#”Promoted Headers” = Table.PromoteHeaders(#”Removed Top Rows1″),
#”Unpivoted Columns” = Table.UnpivotOtherColumns(#”Promoted Headers”, {“data”}, “Attribute”, “Value”),
#”Added Custom” = Table.AddColumn(#”Unpivoted Columns”, “Custom”, each Date),
#”Reordered Columns” = Table.ReorderColumns(#”Added Custom”,{“Custom”, “data”, “Attribute”, “Value”}),
#”Renamed Columns” = Table.RenameColumns(#”Reordered Columns”,{{“Custom”, “Date”}}),
#”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns”,{{“Date”, type date}})
in
#”Changed Type1″
Hey Matthew, really nice trick with the Drill Down step – I didn’t know you can do that.
I actually just did that for the first time the other day for a related project, I love when I’m working on something I see here or on one of the other blogs (especially since usually I’m the one learning new tricks from you or Rob or Ken!).