Cross Join with Power Query

Power Query is still very new, so there is a lot to learn.  I like to mix my learning up by doing some formal learning (such as reading a book like M is for Data Monkey) as well as “on the job” learning.  What I mean by “on the job” learning is when I go out of my way to use the software I am trying to learn in small, manageable but meaningful new ways.  You need to take the opportunity to solve real world problems in your real job if you are to ever build your skills.

I had a situation this week where I need to create a budget table in Power Pivot.  I had a choice of either loading a weekly budget as a single record per store and using DAX to calculate the Year To Date budgets, or load a set of records (one for each week) in a budget table.  I chose the latter in this case as it gave me a chance to try out some new skills in Power Query.  I have recreated the scenario below.

Weekly Budgets by Store

I have a spreadsheet that contains the budget per store per week.  The budget is the same for each week for each store for the next 15 weeks.  The budget table looks like this – easy to enter the data but not optimal for Power Pivot.  In my real life example, there were many more columns for Shrinkage, GP and some other metrics across the page.

image

Long narrow tables are normally better for Power Pivot, so I used Power Query to reshape this table so it looks like the one below – using a simple un-pivot columns transformation from Power Query.

image

Once this table was created in Power Query (import from spreadsheet then transformed), I named the Query “Budget” and set it to “only create connection”.

image

This makes it available to use in other queries without actually storing the data anywhere in Excel.  Each time you ‘materialise’ a query by adding it as a table in Excel or Power Pivot, it takes up space.  Only do this if you need the materialised view of the table.

Week ID Numbers

I needed to set the budget each week for the coming 15 weeks.  I was using a weekly calendar in this example (not a daily calendar) so I had week IDs as shown below in the format YYWW.  Note the non contiguous jump at the end of the calendar year into 2016.

image

Creating the List of Week IDs in Power Query

Here was my first opportunity to learn a new skill.  I knew it was possible to create a native list in Power Query rather than import the list from Excel ( I read this somewhere – probably something from Chris Webb) but wasn’t quite sure how.  I Googled and tried List.Generate() but that was wrong. Then a flashback – all I needed to do was use the following syntax using braces.

={1..5}

The above generates a list of numbers from 1 to 5.  With this in mind I did the following:

  • I created a new query by executing the following menu steps:
    • Power Query
    • From Other Sources
    • Blank Query
  • Renamed the Query “Weeks”
  • I wrote a line of code like this in the formula bar = {1545..1552,1601..1607} and it gave me exactly what I needed.  I didn’t know that the comma syntax would work for creating non contiguous ranges when I wrote the formula, but it did work.  After you get some practice and experience with a new language like this, you start to see the patterns in the syntax and you get free kicks like this one.  (Note:  If you can’t see the formula bar, turn it on from the View menu)

image

  • Note that at this point it is “just” a list, not a table.  To convert the list to a table, I clicked TransformTo Table, then gave the new table a column name.

image

  • I saved the query as “only create connection” as well so it didn’t materialise the table, just created the instructions on how to create the table in a new Query.

At this point I was very happy and impressed with my new skills, but there were still problems ahead.  The next step was to Cross Join the tables.

Cross Join the Two Tables

The weeks table above has 15 rows and the Budget Table has 10 rows.  When I say “cross join”, what I mean is I need to duplicate the entire Budget table for every week, adding the week column into the budget table.  So I need something like shown below, but for every possible combination.  ie 15 x 10 = 150 rows in my new table.

cross join

I did some quick Googling on Cross Join Power Query and found some references to Full Outer Joins, but this is not what I needed.  Then I had an idea – perhaps I could write a dummy function to simulate the Cross Join process.  I tested this and it worked well.

Edit 31/10/15

After posting, I got some good suggestions from Imke Feldman; it seems the function is not needed at all – makes sense actually but I hadn’t realised it.  Chris Webb posted a more performant approach too (which is important if you have large tables), so here is the updated approach.

Add a Custom Column to create a Cross Join

  1. Open the Weeks table and add a new custom column (shown below).  The formula for the new column is simply the name of the Budget Query.
    new add column
  2. After you click OK, you get a new column that “contains” the entire Budget table as an object in each row of the Weeks table (as shown below).
    image
  3. All I needed to do then is expand the new column by clicking the expand button (shown above) to create all the possible combinations (shown below)

image

I set the data types of the numeric columns and loaded the table directly into my data model so I could use it in Power Pivot.

You can also read Chris Webb’s suggestion in the comments below)

if you want a comprehensive lesson on how to use Power Query, checkout my training course here http://xbi.com.au/pqt

 

Share?

Comments

  1. Hi Matt,
    very cool trick with the comma in the list! Isn’t Power Query really rewarding once you start playing around with it?

    Re the function for the crossjoin: As it has no parameters, you could actually skip it and simply write the name of the query into the formula section of your new column (=Budget). (In your case you would need to “unfunction” it back to a simple table)

    (…I’m not a programmer at all, but from what I know about it, the simplicity of some concepts in Power Query must come quite as a surprise 🙂 )

    • Oh, cool. Thanks Imke, that is just the sort of “better way” I was thinking of/hoping for. I didn’t even think that was possible. I will test it out and update the post

  2. Matt, Imke,

    There are some potential performance problems with these approaches that I cover here:
    http://blog.crossjoin.co.uk/2014/06/02/join-conditions-in-power-query-part-1/
    http://blog.crossjoin.co.uk/2014/06/04/join-conditions-in-power-query-part-2-events-in-progress-performance-and-query-folding/

    The best approach to doing cross joins I have found is to create calculated columns on both tables that return the value 1, then do a merge to join these tables together.

    Chris

  3. Hi Matt,

    I think there is a simpler way to do that. The only you have to do is adding the column with your list of weeks to the first table and expand its. That is all 🙂

    Regards

      • But Matt, i’m not telling about adding the column containing first table in each row.
        I’m telling about adding the column to the first table (column containing your list in each row).
        I mean something like this
        =Table.AddColumn( YourLastStepInFirstTable, “Week”, each {1542..1562,1601..1607})
        and then expand this column.
        Now we have the same table as you have. Only the order of records is different.

        Regards

  4. I just wanted to say thank you for your blog!! Really helpful for those of us still learning the tricks of Power Query. I’ll take you up on your book recommendation too 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x