Pivoting Text Data Using Power Query

It is very common to need to transform data from one “shape” to another “shape” before it can be used inside Power BI for analysis (although many beginners don’t realise this).   One such example is shown below, where the data in the table on the left hand side needs to be transformed into the table on the right hand side.  As you can see on the left, column A contains the attribute and column B contains the value of the attribute.  Every 4 lines of data is 1 record.  This specific problem is very common problem when your only source of data is from an extract (eg csv) from some other system, particularly older systems where you can’t change the format of the data extract.

The trick to solving this problem is to pivot the text data into a tabular shape.  I learnt how to solve this problem from my friend Imke Feldmann at The BIcountant.

You Cannot Pivot Text Data Directly

This is how NOT to do it.  If you click on column A above and select “Pivot Column” from Power Query, you will get the results shown below.

This approach will not solve this problem because Power Query doesn’t know how to uniquely identify each record set, and that is a hint on how to solve this problem.

How to Pivot Text Data

The trick to solve this problem is to first stamp each set of 4 rows as a unique record.

In the example above, each record is spread over 4 consecutive rows in the table, so we first need to stamp a unique record number ID to each set of 4 rows that makes up a single record.  Once we have this unique identifier for each row, we can use the identifier to pivot the data to produce the desired results.

Steps for Pivoting Text Data

I completed the demo below using Power Query for Excel, but of course it also applies to Power Query for Power BI.  You can download the sample workbook here.

Step 1:

I loaded the table into Power Query, selecting Data tab (see 1 below) and then clicking on From Table/Range (see 2 below).

Step 2:

In the Power Query window, I went to the Add Column tab (see 1 below) and clicked on Index Column (see 2 below). A new numeric column is added (see 3 below).

Step 3:

Next, I converted this numeric column so that it identifies (stamps) each record uniquely. For this I went to Transform tab (see 1 below), Standard (see 2 below) and clicked Integer-Divide (see 3 below).

I divided by 4 (see 1 below) as that is the number of rows in each record.

 

There is now a unique integer in the new column that identifies each record (see 0, 1, 2, 3, 4, 5 below) in the table.

Step 4: This is the final step.

I selected the first column (see 1 below), went to Transform tab (see 2 below) and then selected Pivot Column (see 3 below).

In the Pivot Column dialog box (see 1 below) I selected the Values Column to be Column2 (see 2 below) and then selected Advanced Options (see 3 below) and set to Don’t Aggregate (see 4

below)

.

This pivoted the data as shown below.

I do not need the index column anymore and I deleted it. I then fixed the data formats, and that’s it.

 

Once I load this table back into Excel beside the original table, it looks as in the first image that we started with in this article.

What Next?

If you want a comprehensive lesson on how to use Power Query, checkout my training course here http://xbi.com.au/pqt

You can learn similar tricks and tips to this on Power Query with illustrative examples. All the examples demoed in the course are also available to download in the course material so that you can practice along as you view the videos for an in-depth understanding of the concepts.

Share?

Comments

  1. Hi Matt.. cool trick. Probably, you already know this, but if you have a single column of only stacked data and no field identifiers, then you can still un-stack as follows:
    1. Load single column data to PQ
    2. Add Column – Index
    3. On Index Column – Add Column Integer-Divide by whatever number or fields per record
    4. On Index Column – Transform w/Modulo specifying whatever number of fields per record
    5. Select both Index-Modulo column and Index Integer-Divide column and Pivot on original Data column; Don’t Aggregate
    You get the same un-stacked transformed data as your example, but with 0, 1, 2 etc. as the column/field names. Rename columns as appropriate and remove un-needed Integer-Division column. Close and load to existing or new worksheet.
    I learned this from John MacDougall at howtoexcel.org and his YouTube channel. As I said, I’m sure you already know this method.. just sharing it here for any readers who might have a data stack with no field identifiers. Thanks for your insights tips.. always useful and helpful!

  2. Nice, I just spent an hour today doing that manually. I just have to remember the trick exists next time, and there will be a next time.

    • Doh. I normally would have released this one day earlier, but I got distracted. One thing I like about blogging and blogs is that you can always refer back later. The downside for you is you have to remember which blog it was. 😀

  3. Nice technique. Curious, how would you handle a situation where the fields could vary. For example, record 0 has five fields, record 1 has only 4, record 3 has six? I could see this situation easily occurring

    • 1. Add Index Column
      2. Add Conditional Column where if [Column1] = “Employee” then result is [Index] otherwise null
      3. Fill Down on the new column
      4. Pivot as before

      This works as long as the “Employee” row is always the first of each batch

  4. @Matt – Here is a another way

    let
    Source = Excel.CurrentWorkbook(){[Name=”D”]}[Content],
    mUnStack = Table.PromoteHeaders(Table.Combine({#table(4,{List.FirstN(Source[Column1],4)}),#table(4,List.Split(Source[Column2],4))})),
    mChgTyp = Table.TransformColumnTypes(mUnStack,
    {{“Employee”, type text},
    {“Employee Number”, type text},
    {“Start Date”, type date},
    {“Salary”, type number}})
    in
    mChgTyp

  5. Hi Matt,
    Only for fun with PQ i decided to share my one-step solution using new PQ function (Table.Split).
    I would like to stress… only for fun ( and learning “M” for someone who wants to learn :-)) )
    let
    Source = Table.FromRecords(List.Transform(Table.Split(Excel.CurrentWorkbook(){[Name=”Table2″]}[Content], 4), each Record.FromTable(_)))
    in
    Source

    BTW i’m not sure but maybe i’m a person who showed Imke this trick… maybe.

    • Any trick from you Bill is appreciated :-). I know this can be done using coding (as you have demonstrated). Personally I like learning a bit of M code, but I don’t think I will ever be a coder. I like to share ways to leverage the UI to get the job done as I think that is more accessible to most users. But if you can write the code, then that is a whole new world. Thanks for sharing.

Leave a Reply