Power Query is a product that allows business users to extract data from any source, cleanse and transform that data, and then load the modified data to a new location, typically into an Excel spreadsheet or a Power BI data model. Power Query is very easy to learn, and most tasks can be completed by using the buttons on the menus.
When the data source to be loaded is a file or files eg a csv(s), Power Query reads the data from the file(s) and processes the transformation tasks directly inside Power Query. This is fine for relatively simple tasks (eg remove or rename columns) and relatively small files (say less than 1 million rows). But bad things can happen when you try to load tens of millions of rows using Power Query from a csv file and then try to preform ‘heavy’ tasks, such as sorting, over the data to be loaded. To sort 10 million rows from a csv, Power Query would first need to load every record into memory and then complete the sort task from there. While it may be possible to do this, it certainly isn’t the best approach.
Power BI and Power Pivot for Excel do not have their own persistent storage of the source data. These products are a database, but they are a reporting database, not a storage database. You will need a place to persistently store your source data outside of Power BI and Power Pivot for Excel if you want to be able to refresh your data in the future.
You can of course keep your source files (csv, xlsx) as the persistent storage if you have relatively small amounts of data, but if you are going to process a lot of data you really should consider loading the source data into a relational database. This could be something like Microsoft Access but better still, you could use something like SQL Server if you have it available. Even if you don’t have access to SQL Server, you could consider installing SQL Server Express on your laptop (it is free). Believe me SQL Server is not that hard to learn (with YouTube) and it is well worth the effort to better manage your persistent storage of data.
What is Query Folding
Inside Power Query, the term query folding refers to the the process of taking complex tasks generated by the Power Query UI (such as sorting 10 million rows of data) and pushing those tasks back to the database for execution. I like to think of Power Query “writing down” some SQL code on a piece of paper, folding the paper in half and passing it over to the database as a set of instructions for execution.
Power Query will automatically execute query folding under certain conditions including:
- The source must be a database that can accept a folding request. Most relational databases are included. It can also be an OData source, SSAS and Google Analytics (thanks Miguel and Chris).
- If you start your query with SQL code you wrote yourself, no further steps inside Power Query will be folded.
- Once you perform a step that cannot be folded, query folding stops for the rest of the query.
How do I Know if Query Folding is Active?
When you use Power Query, the steps you produce are listed in the right side “applied steps” panel. You can check if a query step has generated query folding (or not) by checking for the existence of Native Query code inside that step. To do this, right-click on one of the query steps and then select “view Native Query”. If you are able to do this, ie if view Native Query is enabled, it means Power Query has successfully performed query folding.
Do the Heavy Lifting First
Your objective when using Power Query over a relational database should be to do all the heavy lifting first. You basically want to make sure query folding is enabled for as long as possible so the harder tasks are performed more efficiently by the database and not by Power Query itself. The best way to check is to check if the last step in your query uses query folding. If the last step uses query folding, then every step uses query folding. However if you find that that last step is not using query folding, then you should work you way back up the list of applied steps to find out where query folding stopped. If possible, change the order of execution of your steps so that all the foldable tasks are completed first before you complete any step that cannot be folded (e.g. Split Column).
I have created a brief video to show how Query Folding works and how to check where folding stops (there is no sound in this video, so you can look at it now 🙂 ).
Power Up Your Skills
You can learn this stuff from online articles, but why not excelerate your learning (see what I did there?) by doing some formal training. I have an excellent online Power Query training course that will teach you query folding and so much more with over 7.5 hours of structured video instruction using real world examples. You can checkout my training course here => Power Query for Excel and Power BI – Online Training