Convert a Cell Value into a Column with Power Query

Level: Advanced Power Query

I am in the process of finalising some action items after a great Power BI, Power Pivot and Power Query training course in Melbourne.  During the training we looked at a scenario for one of the attendees where they needed to combine data from multiple Excel files into 1 big table.  I have blogged about this before in 3 articles starting here.

In this specific scenario we looked at in the training, there is a weekly extract of data from another BI system (sound familiar?).  Each week someone must use “Export to Excel” to grab the current week’s data and dump it into a file (a typical action for a BI system that isn’t user friendly).   The file format of the export looks something like this (indicative).

image

The key points about this data structure are.

  1. There is a date in a cell in the file header.
  2. The details are also in the same file, but there is no date column in the detail section.

The requirement is to end up with a table like this.

image

The rest of this article describes how to do it.

Step 1.  Connect to the file using Power Query

image

Step 2.  Filter the Table so the Date Row is at the Top

To do this, I simply did a “Remove Top Rows” to remove the top 2 rows.

image

Step 3.  Killer Trick to Extract the Actual Date

The next step is a killer trick.  The idea is to extract the date from the first column in the top row into a scalar value so it can be used later.  To execute this step, you need to use a hand written PQL language formula.

Click on the Fx button to insert a new custom step

image

If you can’t see the Fx button, it means you don’t have the formula bar turned on.  Turn it on like this

image

After clicking Fx, a new custom step will be added to the query pane (1 below).  The step (by default) assumes that you want to use the previous step as a starting point (2 below).

image

To extract the date as a scalar value (2 below), you need to use the function Record.Field as shown in 1 below.

image

Note how I have wrapped the original step #”Removed Top Rows” inside the function Record.Field.  The {0} portion of the formula says “take the first row from the previous step, and the “Column1” part says take the data in the column called “Column1”.  The intersection between the first row and this column is a single cell, and hence this formula is accessing the actual value (scalar value) in this one single cell in the workbook.  You can see it is a scalar value (2 above) because it doesn’t show a table, but a single value.

Strictly speaking I didn’t need to force the date into the first row – I just did this because it was cleaner.  I could have just as easily reference the row/column without removing the top rows with this formula

= Record.Field(#”Removed Top Rows”{2},”Column1″)

The number inside the curly braces represents the row number, with a base 0 index.  So 0 is the first row, 1 is the second row etc.

Step 3. Second Killer Trick – Get the Data from an Earlier Step

The next (little known) trick is that I can actually go back and get the data from an earlier step.  Power Query by default always uses the previous step as the starting point for the next step – a logical flow.  But it doesn’t have to be that way.  In this case the previous step is simply the scalar value = date, and that is not what I want. I need the step previous to that which is “Removed Top Rows”. 

Click on the Fx button to create another new custom step.

image

Now change the formula in the formula bar (1 above) so it points to the step “Removed Top Rows” instead of pointing to the step “Custom1”.  Like this

image

So now I am back were I was before extracting the Scalar Date value.

Step 4.  Proceed to shape the data the way you want it

I then executed some steps to cleanse and shape the table the way I want the data.  I ended up with the following.

image

Step 5.  Add the Date as a Column

The final step is to bring the date from step 2 into a column in the table.  This is quite easy now.

Add a Custom Column, and refer to the step “Custom1” that returned the scalar value

image

 

And I end up with this

image

Exactly the type of data structure needed to load into Power Pivot.  I simply reordered the columns to put the date column first, and then it was done.

Final Comments

If you want to (or need to), you can then turn the above query into a function.  Read more about that here.  If you turn it into a function, you can then use it to process all files in a folder and combine them into a single data table to load to Power Pivot.

If you want to look at the workbook, you can download it here => Extract scalar date

Share?

Comments

  1. There’s another way to do this that leaves your date in a reusable query and doesn’t interfere with the flow of your main query and uses next-to-no manually written code: If the date is in the same place every time, then do your initial Source and Navigation steps, then just right click on the date value and select “Drill Down”. That will create a query that returns the result of just that date. You can then add it back into your primary query by doing Add Column = Date, and you’re done.

    If it’s not in the same place every time, then do whatever processing you need to get it to a static location in the query (remove other columns, keep top/bottom rows, etc.) and do the Drill Down then.

    Resulting code below, I named the date query just “Date”.

    let
    Source = Excel.Workbook(File.Contents(“C:\Users\sfo-matthewr\Documents\Process Analysis\Misc Reporting\demo.xlsx”), null, true),
    Sheet1_Sheet = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
    Column1 = Sheet1_Sheet{2}[Column1]
    in
    Column1

    let
    Source = Excel.Workbook(File.Contents(“C:\Users\sfo-matthewr\Documents\Process Analysis\Misc Reporting\demo.xlsx”), null, true),
    Sheet1_Sheet = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
    #”Changed Type” = Table.TransformColumnTypes(Sheet1_Sheet,{{“Column1”, type any}, {“Column2”, type any}, {“Column3”, type any}, {“Column4″, type any}}),
    #”Removed Top Rows” = Table.Skip(#”Changed Type”,2),
    Custom2 = #”Removed Top Rows”,
    #”Removed Top Rows1″ = Table.Skip(Custom2,1),
    #”Promoted Headers” = Table.PromoteHeaders(#”Removed Top Rows1″),
    #”Unpivoted Columns” = Table.UnpivotOtherColumns(#”Promoted Headers”, {“data”}, “Attribute”, “Value”),
    #”Added Custom” = Table.AddColumn(#”Unpivoted Columns”, “Custom”, each Date),
    #”Reordered Columns” = Table.ReorderColumns(#”Added Custom”,{“Custom”, “data”, “Attribute”, “Value”}),
    #”Renamed Columns” = Table.RenameColumns(#”Reordered Columns”,{{“Custom”, “Date”}}),
    #”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns”,{{“Date”, type date}})
    in
    #”Changed Type1″

      • I actually just did that for the first time the other day for a related project, I love when I’m working on something I see here or on one of the other blogs (especially since usually I’m the one learning new tricks from you or Rob or Ken!).

  2. Hello Matt,
    Nice trick (as always)!
    As far as I tied to a very ugly-shaped reporting system, I am always looking for such transformations. My cases a little bit more complicated, so I “invented” such tricks several times, for example here: http://excel-inside.pro/blog/2015/11/12/using-the-header-of-the-report-as-the-data-for-table-columns-in-power-query/ and here: http://excel-inside.pro/blog/2015/11/09/transfer-values-to-other-columns-in-power-query/
    May be you can find it interesting

  3. Hi Matt,

    for obtaining the value of the cell, you have a easiest way, just right click on the cell who contains the date and then select drilldown resulting in a new step, then give it a name and you can use it as a variable

    Jean-Pierre Girardot

    • Thanks Jean-Pierre for sharing. Yes I didn’t know this and it was shared by Matthew above too. I like this way you have both shown better, but I still think there is value in incrementally learning the PQL language. It is difficult to learn in a single sitting, so learning little snippets over time is a great way to learn. Still like your way better though 🙂

  4. Hi Matt

    I played around a few days ago with a similar issue. If you have the entry you want as the first entry in a column you can use a custom column with the following formula

    =Source{0}[Column2]

    Source zero is the first entry in the column.

    You then delete the top row and the custom column retains the value. The just finish the row 1 as header and rename the column.

    Its a common requirement as many Excel + csv files have report headers at the top containing data you want to capture.

    Regards

    Neale

    • HI Neale!
      Interesting approach, really.
      But if we have to work with relative big dataset, for example, of N rows, you will perfom drilling to Source{0}[Column2] all N times, and I think it could cause a performance leak. I think that assigning Source{0}[Column2] to variable could be faster then

      Cheers, Max

  5. Thanks for posting this topic, Matt! Very insightful.

    Thanks to Jean-Pierre’s shortcut drill-down method as well!

  6. Thanks everyone for this very useful post and comments. All excellent. My personal choice is to go step by step, and Matt you’re bloody good in that.

  7. You can directly get the date while adding a custom column
    let
    Source = Excel.Workbook(File.Contents(“C:\Users\SAM\Desktop\demo.xlsx”), null, true),
    Data = Source{[Item=”Sheet1″,Kind=”Sheet”]}[Data],
    mRemTopRows = Table.Skip(Data,3),
    mPromoteHeaders = Table.PromoteHeaders(mRemTopRows),
    mUnPivotCols = Table.UnpivotOtherColumns(mPromoteHeaders, {“data”}, “Attribute”, “Value”),
    mAddCustCol = Table.AddColumn(mUnPivotCols, “Date”, each Data{2}[Column1]),
    mReorderCols = Table.ReorderColumns(mAddCustCol,{“Date”, “data”, “Attribute”, “Value”}),
    mChgTyp = Table.TransformColumnTypes(mReorderCols,{{“Date”, type date}, {“data”, type number}, {“Value”, type number}})
    in
    mChgTyp

  8. Hi, many thanks for this, it’s exactly my situation. Additionally, I would like to merge the data from a number of these files so that I get a single list including the month column. I guess I need to open each data source, manipulate it to get the month in the correct column, and then import into the master column? Is there a straightforward way to do this. Many thanks,

  9. Matt, thanks so much for posting. I am just now reading it, and it is helpful to know how to perform these steps directly from the function bar without going into the editor. Great post!

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x