How Query Folding Works

Name: BI Training
Address: AU
Telephone: @
Price range: @

By Matt Allington / September 4, 2018 December 31, 2019

Level: Beginners

Power Query is a product that allows business users to extract data from any source, cleanse and transform that data, and then load the modified data to a new location, typically into an Excel spreadsheet or a Power BI data model. Power Query is very easy to learn, and most tasks can be completed by using the buttons on the menus.

When the data source to be loaded is a file or files eg a csv(s), Power Query reads the data from the file(s) and processes the transformation tasks directly inside Power Query. This is fine for relatively simple tasks (eg remove or rename columns) and relatively small files (say less than 1 million rows). But bad things can happen when you try to load tens of millions of rows using Power Query from a csv file and then try to preform ‘heavy’ tasks, such as sorting, over the data to be loaded. To sort 10 million rows from a csv, Power Query would first need to load every record into memory and then complete the sort task from there. While it may be possible to do this, it certainly isn’t the best approach.

Persistent Storage

Power BI and Power Pivot for Excel do not have their own persistent storage of the source data. These products are a database, but they are a reporting database, not a storage database. You will need a place to persistently store your source data outside of Power BI and Power Pivot for Excel if you want to be able to refresh your data in the future.

You can of course keep your source files (csv, xlsx) as the persistent storage if you have relatively small amounts of data, but if you are going to process a lot of data you really should consider loading the source data into a relational database. This could be something like Microsoft Access but better still, you could use something like SQL Server if you have it available. Even if you don’t have access to SQL Server, you could consider installing SQL Server Express on your laptop (it is free). Believe me SQL Server is not that hard to learn (with YouTube) and it is well worth the effort to better manage your persistent storage of data.

What is Query Folding

Inside Power Query, the term query folding refers to the the process of taking complex tasks generated by the Power Query UI (such as sorting 10 million rows of data) and pushing those tasks back to the database for execution. I like to think of Power Query “writing down” some SQL code on a piece of paper, folding the paper in half and passing it over to the database as a set of instructions for execution.

Power Query will automatically execute query folding under certain conditions including:

The source must be a database that can accept a folding request. Most relational databases are included. It can also be an OData source, SSAS and Google Analytics (thanks Miguel and Chris).
If you start your query with SQL code you wrote yourself, no further steps inside Power Query will be folded.
Once you perform a step that cannot be folded, query folding stops for the rest of the query.

How do I Know if Query Folding is Active?

When you use Power Query, the steps you produce are listed in the right side “applied steps” panel. You can check if a query step has generated query folding (or not) by checking for the existence of Native Query code inside that step. To do this, right-click on one of the query steps and then select “view Native Query”. If you are able to do this, ie if view Native Query is enabled, it means Power Query has successfully performed query folding.

Do the Heavy Lifting First

Your objective when using Power Query over a relational database should be to do all the heavy lifting first. You basically want to make sure query folding is enabled for as long as possible so the harder tasks are performed more efficiently by the database and not by Power Query itself. The best way to check is to check if the last step in your query uses query folding. If the last step uses query folding, then every step uses query folding. However if you find that that last step is not using query folding, then you should work you way back up the list of applied steps to find out where query folding stopped. If possible, change the order of execution of your steps so that all the foldable tasks are completed first before you complete any step that cannot be folded (e.g. Split Column).

I have created a brief video to show how Query Folding works and how to check where folding stops (there is no sound in this video, so you can look at it now 🙂 ).

Power Up Your Skills

You can learn this stuff from online articles, but why not excelerate your learning (see what I did there?) by doing some formal training. I have an excellent online Power Query training course that will teach you query folding and so much more with over 7.5 hours of structured video instruction using real world examples. You can checkout my training course here => Power Query for Excel and Power BI – Online Training

25 thoughts on “How Query Folding Works”

Qrcodes
September 18, 2025 at 4:42 am

Really love this

Log in to Reply
здк binance алдым-ау коды
July 16, 2025 at 11:54 pm

Your article helped me a lot, is there any more related content? Thanks!

Log in to Reply
Adolfo Socorro
May 22, 2020 at 1:59 am

Hi Matt, how come query folding for SQL Server does not work if one enters a SQL statement when connecting to the database? Does that mean one always needs to start with a connection to a table or a view? Thanks!

Log in to Reply
1. Matt Allington
  May 22, 2020 at 7:38 am
  
  I don’t know for sure “why”, but I can guess. Query folding is essentially PowerQuery automatically writing a SQL statement for you. I guess this is relatively easy when starting from blank, but difficult when starting from any of an infinite number of pre-written queries written by you (that may or may not be well written). So yes, you need to start with a simple connection.
  
  Log in to Reply
Joshua Harvey
April 14, 2020 at 7:55 am

Matt, great article, thank you!

Does Query Folding attempt to write changes back to the database?

Log in to Reply
1. Matt Allington
  April 14, 2020 at 8:00 am
  
  Query folding cannot do that, but technically you can embed SQL code that will write to SQL Server inside a power query query.
  
  Log in to Reply
Matt Allington
June 17, 2019 at 3:17 pm

In short, it depends. My best tip is to filter asap in the query. Right click the applied step to see if query folding is active, or use a profiling tool on the db to see exactly what happens. There is no other way to be sure. Sorry.

Log in to Reply
Matt Allington
January 19, 2019 at 6:09 am

You are close in both statements. This is how I would describe it.

Power Query will load what ever data is needed in order to produce a sample table showing the top 1000 rows of data after applying all the applied steps. The top 1000 rows are then cached in memory and disk and used to build out the new query steps in the editor. If a query step then requires the query to load more data (such as a sort descending step) then it will repeat the process before storing the new 1000 rows in cache memory and disk.

The advantage of this is it doesn’t have to execute all the applied steps on the entire table during editing. Imagine if you had a 100m row table and you did a Merge Column step in the query. With the current approach, the step is recorded and the preview shows the results on the top 1000 rows – it doesn’t have to execute the full 100m rows until you click Close and Apply.

Hope that helps

Log in to Reply
1. Elizabeth
  June 17, 2019 at 3:08 pm
  
  Hi Matt,
  I too am trying to get my head around the query editor. I would really appreciate some clarification.
  
  I have taken the question a step further.
  
  While I am in Query Editor, I filter the query for records after a certain date. For example Sales transactions after 1st April 2019. Will the query editor still bring in to Power BI all records including Pre and post 1st April 2019. However, will only display in report view(after I click on close/apply) records dated after 1st April 2019. As a consequence, no matter how filters are placed, Query Editor will pull out of the API all records in that table. For example will put out 1,000,000 rows to store in Query Editor but will only display in report view of Power BI, 80,000 row. (sales records after 1st April).
  Thank you in advance
  
  Log in to Reply
Lazaros Viastikopoulos
December 16, 2018 at 11:17 pm

Brilliant read! Thank you for this Matt.
Did I understand correctly through?
I have a query (data set) in Power Query which the source data comes from 5 CSV’s (over 5M records) appended together through a folder connector. Then, I return two more Queries (data sets) from a SQL DB Source using a SQL query, conduct a LEFT OUTER MERGE from the CSV data set as it requires some fields from the two SQL DB Sources, the performance will be slow as Power Query must load every record into Memory first and then complete the Left Out Merge.
However, if I load the CSV’s into a SQL DB Table, therefore I will no longer be loading CSV’s into Power Query, but instead loading from a SQL DB Source, performance should increase as Power Query will now be pushing back those tasks (Multiple MERGES) to the database for execution.
Thanks,
Laz

Log in to Reply
1. Matt Allington
  December 17, 2018 at 7:48 am
  
  Yes Laz, that’s correct. It is easy to test if you have access to SQL. The final performance from SQL will depend on normal SQL issues, such as the existence of suitable indexes and other performance tuning features
  
  Log in to Reply
  1. Lazaros Viastikopoulos
    January 19, 2019 at 5:10 am
    
    Hey Matt,
    
    I am diving much deeper into Power BI and trying to understand performance much more, which reminded me of this article you shared. If you can shed any light on this, it would be very much appreciated.
    
    This was my initial assumption:
    
    “Power Query only shows a preview of data and DOES NOT load data into memory. Data is only loaded into memory once we click on ‘Close and Apply’ and Power Query loads data into Power Pivot.”
    
    When I read the above to myself, it seems incorrect, as surely the transformations must take place before loading anything into Power Pivot (Model). So, this now leads me to believe this:
    
    When applying transformation to data in Power Query, the data is first loaded into memory and then the transformation is applied. After we click the ‘Close and Apply’ button, the Data is once again re-loaded into the model. This is why the loading prompt appears.
    
    Thanks in advance,
    
    Laz
    
    Log in to Reply
Shobi
September 30, 2018 at 1:34 am

Thanks for sharing, Keep up the good work!

Log in to Reply
edh
September 6, 2018 at 9:00 am

I love query folding. A comment, and a plea for help.
Comment: If you change the data type as a separate step, it will most likely break query folding, but if you include the data type change in the step by manually editing the formula in the formula bar, it will change the data type and preserve query folding. The below will preserve query folding.

#”Added Commission” = Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05),
#”Changed Type” = Table.TransformColumnTypes(#”Added Commission”,{{“Commission”, type number}}),

That preserves query folding because the data type was already a number and it is just a visual in PQ to change ABC/123 to 1.2 in the data type header.

This will break query folding:
#”Added Commission” = Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05),
#”Changed Type” = Table.TransformColumnTypes(#”Added Commission”,{{“Commission”, Int64.Type}}),
It had to change a decimal number to an integer. It won’t fold that, so folding breaks, but the below will preserve query folding:

= Table.AddColumn(dbo_oeordlin_sql, “Commission”, each [unit_price] * .05, Int64.Type)

Just insert “, Int64.Type” in the formula bar before the last paren manually, and it works. Note that editing the formula might destroy that, and you’ll have to put it back.

As for the plea for help, I would love it if anyone had any ideas how to convert a YYYYMMDD integer into a valid date and not break folding. I’ve tried dozens of methods, and it always breaks. Converting to a date is no issue, but folding gets killed.

Log in to Reply
1. Matt Allington
  September 7, 2018 at 6:29 am
  
  Can you get someone to create a view in SQL that has the date column you need? I’m not even sure if SQL can do that task efficiently – I’m not knowledgeable enough on TSQL to know.
  
  Log in to Reply
2. Eric Glover
  September 26, 2019 at 9:38 pm
  
  Has anyone found a solution to the YYYYMMDD date issue mentioned above by edh? In my case, the YYYYMMDD field comes across from the source in String format. So, anytime I transform the column it breaks query folding.
  
  Log in to Reply
Matt Allington
September 5, 2018 at 5:26 pm

Thanks for the corrections Chris

Log in to Reply
Chris Webb
September 5, 2018 at 4:51 pm

Hi Matt,
Nice article, but there are a few inaccuracies. First, there are a few other sources apart from relational databases and OData that support query folding, for example SSAS and Google Analytics. Second, your point about no manual M coding allowed is wrong – there’s no difference between M code you write manually and M code generated by the UI, some things can be folded and some things can’t but it depends on what you do in the code, no how you write the code. You’re right though that using your own SQL query prevents any further folding.

Chris

Log in to Reply
Miguel Escobar
September 4, 2018 at 11:16 pm

Hey!
OData also supports Query folding 🙂

Log in to Reply
Jessica Jolly
September 4, 2018 at 9:16 pm

Thanks for this. Query Folding has been on my mind lately, and this helped me understand it. I think it should really be called Query Outsourcing! 🙂

Log in to Reply
1. Matt Allington
  September 5, 2018 at 6:23 am
  
  I like that name!
  
  Log in to Reply
John P
September 4, 2018 at 12:57 pm

Matt- great tutorial. In this example, it is straightforward. Any way you can do a blog (or video) of query folding when doing a left outer join? I am having trouble understanding this:

You connect to Table 1 (75MB) in a database and do some transformations and close and load as a connection.
Then you connect to Table 2 (100MB) in the same database and do some transformations and close and load as a connection.
So now we have two queries in power query. When I do a left outer join (Table 1 is the left) it takes a long time. Obviously query folding is not taking place. What steps using the Power Query interface can I do in this scenario to do the join with query folding so it wont take 30 minutes?
That seems like a much harder query folding task than simply sorting rows. Any help on this would be super helpful. Thank you.

Log in to Reply
1. Matt Allington
  September 5, 2018 at 5:45 pm
  
  Done John P. Check my YouTube Channel for the new video
  
  Log in to Reply
  1. John P
    October 1, 2018 at 5:41 pm
    
    Matt,
    Thanks for putting up a follow up video. Both videos have were really enlightening to me. Doing the transformations after the left outer join was the missing piece for me. Appreciate you taking the time to educate all of us. Thanks again.
    
    Log in to Reply
osiel
September 4, 2018 at 12:28 pm

hola Mat siempre es genial tus publicaciones, seria genial que los cursos estuvieran disponibles en español. cuidate mucho, exitos en todo. 🙂

Log in to Reply