How Query Folding Works

Level: Beginners

Power Query is a product that allows business users to extract data from any source, cleanse and transform that data, and then load the modified data to a new location, typically into an Excel spreadsheet or a Power BI data model.  Power Query is very easy to learn, and most tasks can be completed by using the buttons on the menus.

When the data source to be loaded is a file or files eg a csv(s), Power Query reads the data from the file(s) and processes the transformation tasks directly inside Power Query.  This is fine for relatively simple tasks (eg remove or rename columns) and relatively small files (say less than 1 million rows).  But bad things can happen when you try to load tens of millions of rows using Power Query from a csv file and then try to preform ‘heavy’ tasks, such as sorting, over the data to be loaded.  To sort 10 million rows from a csv, Power Query would first need to load every record into memory and then complete the sort task from there.  While it may be possible to do this, it certainly isn’t the best approach.

Persistent Storage

Power BI and Power Pivot for Excel do not have their own persistent storage of the source data.  These products are a database, but they are a reporting database, not a storage database.  You will need a place to persistently store your source data outside of Power BI and Power Pivot for Excel if you want to be able to refresh your data in the future.

You can of course keep your source files (csv, xlsx) as the persistent storage if you have relatively small amounts of data, but if you are going to process a lot of data you really should consider loading the source data into a relational database. This could be something like Microsoft Access but better still, you could use something like SQL Server if you have it available. Even if you don’t have access to SQL Server, you could consider installing SQL Server Express on your laptop (it is free).  Believe me SQL Server is not that hard to learn (with YouTube) and it is well worth the effort to better manage your persistent storage of data.

What is Query Folding

Inside Power Query, the term query folding refers to the the process of taking complex tasks generated by the Power Query UI (such as sorting 10 million rows of data) and pushing those tasks back to the database for execution.  I like to think of Power Query “writing down” some SQL code on a piece of paper, folding the paper in half and passing it over to the database as a set of instructions for execution.

Power Query will automatically execute query folding under certain conditions including:

  1. The source must be a database that can accept a folding request.  Most relational databases are included.  It can also be an OData source, SSAS and Google Analytics (thanks Miguel and Chris).
  2. If you start your query with SQL code you wrote yourself, no further steps inside Power Query will be folded.
  3. Once you perform a step that cannot be folded, query folding stops for the rest of the query.

How do I Know if Query Folding is Active?

When you use Power Query, the steps you produce are listed in the right side “applied steps” panel. You can check if a query step has generated query folding (or not) by checking for the existence of Native Query code inside that step.  To do this, right-click on one of the query steps and then select “view Native Query”. If you are able to do this, ie if view Native Query is enabled, it means Power Query has successfully performed query folding.

Do the Heavy Lifting First

Your objective when using Power Query over a relational database should be to do all the heavy lifting first.  You basically want to make sure query folding is enabled for as long as possible so the harder tasks are performed more efficiently by the database and not by Power Query itself.  The best way to check is to check if the last step in your query uses query folding.   If the last step uses query folding, then every step uses query folding.  However if you find that that last step is not using query folding, then you should work you way back up the list of applied steps to find out where query folding stopped.  If possible, change the order of execution of your steps so that all the foldable tasks are completed first before you complete any step that cannot be folded (e.g. Split Column).

I have created a brief video to show how Query Folding works and how to check where folding stops (there is no sound in this video, so you can look at it now 🙂 ).

Power Up Your Skills

You can learn this stuff from online articles, but why not excelerate your learning (see what I did there?) by doing some formal training.  I have an excellent online Power Query training course that will teach you query folding and so much more with over 7.5 hours of structured video instruction using real world examples.  You can checkout my training course here => Power Query for Excel and Power BI – Online Training

Share?

Comments

  1. hola Mat siempre es genial tus publicaciones, seria genial que los cursos estuvieran disponibles en español. cuidate mucho, exitos en todo. 🙂

  2. Matt- great tutorial. In this example, it is straightforward. Any way you can do a blog (or video) of query folding when doing a left outer join? I am having trouble understanding this:

    You connect to Table 1 (75MB) in a database and do some transformations and close and load as a connection.
    Then you connect to Table 2 (100MB) in the same database and do some transformations and close and load as a connection.
    So now we have two queries in power query. When I do a left outer join (Table 1 is the left) it takes a long time. Obviously query folding is not taking place. What steps using the Power Query interface can I do in this scenario to do the join with query folding so it wont take 30 minutes?
    That seems like a much harder query folding task than simply sorting rows. Any help on this would be super helpful. Thank you.

  3. Hi Matt,
    Nice article, but there are a few inaccuracies. First, there are a few other sources apart from relational databases and OData that support query folding, for example SSAS and Google Analytics. Second, your point about no manual M coding allowed is wrong – there’s no difference between M code you write manually and M code generated by the UI, some things can be folded and some things can’t but it depends on what you do in the code, no how you write the code. You’re right though that using your own SQL query prevents any further folding.

    Chris

  4. I love query folding. A comment, and a plea for help.
    Comment: If you change the data type as a separate step, it will most likely break query folding, but if you include the data type change in the step by manually editing the formula in the formula bar, it will change the data type and preserve query folding. The below will preserve query folding.

    #”Added Commission” = Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05),
    #”Changed Type” = Table.TransformColumnTypes(#”Added Commission”,{{“Commission”, type number}}),

    That preserves query folding because the data type was already a number and it is just a visual in PQ to change ABC/123 to 1.2 in the data type header.

    This will break query folding:
    #”Added Commission” = Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05),
    #”Changed Type” = Table.TransformColumnTypes(#”Added Commission”,{{“Commission”, Int64.Type}}),
    It had to change a decimal number to an integer. It won’t fold that, so folding breaks, but the below will preserve query folding:

    = Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05, Int64.Type)

    Just insert “, Int64.Type” in the formula bar before the last paren manually, and it works. Note that editing the formula might destroy that, and you’ll have to put it back.

    As for the plea for help, I would love it if anyone had any ideas how to convert a YYYYMMDD integer into a valid date and not break folding. I’ve tried dozens of methods, and it always breaks. Converting to a date is no issue, but folding gets killed.

    • Can you get someone to create a view in SQL that has the date column you need? I’m not even sure if SQL can do that task efficiently – I’m not knowledgeable enough on TSQL to know.

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x