Moving Queries from Power BI Desktop to Dataflows

A few weeks ago I wrote an article introducing dataflows in Power BI and discussed why you should care about them. There are a few questions that have been spinning around in my head since then, specifically:

  1. What are the implications for the Power BI Golden Dataset as a result of the new dataflows?
  2. Should I extract my queries from my Power BI workbooks and create dataflows instead?

I will come back to question 1 another time. Today I am going to explain how and why you may want to move your queries to dataflows.

Why Would You Want to Move Queries to Dataflows?

Power Query is a user-friendly ETL (Extract, Transform and Load) tool. Traditionally, ETL has been done using more complicated tools (such as SQL Server Integration Services – SSIS) and the resulting data is stored in a data mart or data warehouse for consumption by anyone that needs a standard view of the data. Power BI Desktop can consume tables directly from a data warehouse and simply load the table into Power BI – dead easy. But Power Query is also a powerful ETL tool in its own right, and it can be used to transform and reshape the source data directly inside Power BI Desktop (and then PowerBI.com). This is very useful if:

  1. You don't have a data warehouse, and/or
  2. You need some variation of what is in your data warehouse, and/or
  3. You have other data sources that are not in a data warehouse but are still important to you.

Taking this approach (manipulate in Power Query) is perfectly fine if you have a single workbook, but what if you have 10 similar workbooks all needing the same transformation? Worse still, what if you are one of many people in a company all doing the same thing with multiple workbooks? It is possible that you will end up with hundreds of Power Query queries embedded in workbooks that are either identical or, possibly worse, slightly different. One of the things that dataflows allow you to do is remove the Power Query code from the Power BI workbook and make it available for consumption by multiple workbooks from the same single source.

How to Move your Queries to Dataflows

I set out to move the queries from a sample workbook into dataflows. First I created a new workspace to store my dataflows (I called it myData). I noticed that when you have an otherwise blank workspace (no workbooks loaded), you will see the welcome screen as shown below. It wasn't clear to me how to create a dataflow from here, and it is not intuitive: all you have to do is click "skip" to be able to create a dataflow.

[Screenshot: the welcome screen for an empty workspace, with the "skip" link]

From there I clicked "Dataflows" in the menu at the top of the page, and then the Create button.

[Screenshot: the Dataflows menu and the Create button]

I copied the queries from my current PBIX file one at a time. To do this, I just went into the query editor in Power BI Desktop, right-clicked on a query (1 below) and selected Copy (2).

[Screenshot: right-clicking a query (1) and selecting Copy (2) in the query editor]

I then went back to PowerBI.com and added a new blank query data source inside the dataflow.

[Screenshot: adding a new blank query data source inside the dataflow]

I then pasted the code from the clipboard directly over the starting Power Query code in the dataflow, as shown below.

[Screenshot: the copied query code pasted over the starting code in the dataflow]

In my case I am using a very simple query, so it would probably have been easier to create the new queries directly inside the dataflow, connecting straight to my SQL database, rather than using copy and paste. If you had a more complex query, then copy and paste is definitely the right way to go.
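For reference, the kind of query I was copying across is little more than a connection and a table selection. Here is a minimal sketch of what such a query looks like in M – the server, database and table names are placeholders, not my actual source:

    let
        // connect to the SQL Server database (placeholder names)
        Source = Sql.Database("MyServer", "AdventureWorksDW"),
        // navigate to the one table this query loads
        Sales = Source{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
    in
        Sales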

When I pasted the code, I was prompted to select a gateway. I already had an enterprise gateway configured on my PC/server, so I selected it from the list. Note: dataflows will not work with a personal gateway. If you have a personal gateway, you may like to consider uninstalling it and installing the enterprise version instead. Don't be scared – it is not hard to do, and it is not too different from the personal gateway.

[Screenshot: selecting the enterprise gateway from the list]

I then had to enter my credentials for the PC/Server to make sure the dataflow could access the gateway.

From there it was just a matter of repeating the process for all the other tables in my PBIX file. In my case I was just connecting to my SQL Server DB, so it was faster to create a brand new entity for each of my SQL tables directly as a new SQL database source, and that is what I did.

Create a new SQL connection

To do this, I simply went to New Data Source and selected SQL Server.

I then entered the details for the database AND the credentials for the gateway.

I then followed the normal Power Query steps to complete the import for the other tables.

New Dataflows Are Ready

At this point, I had my 5 tables configured inside the dataflow as shown below. Technically, each of these "tables" is called an "entity" inside dataflows. This is a clue that dataflows can do a lot more than just serve up a table to a user (like a data warehouse does).

[Screenshot: the five entities configured in the dataflow]

Refresh the Tables

I then saved the dataflow (calling it AdventureWorks) and ran the refresh when prompted. Just a reminder for those of you not in the USA: you will need to change the Locale of your connections so that things such as dates are treated correctly. The default is English (US) and I need English (AUS).

[Screenshot: changing the Locale of the connection]

This change only has to be (should be) done once for each data source, but there is currently a bug that is preventing this change from being saved. I know MS is working on this. Given this isn't currently working correctly, I actually had to change the date columns to data type "text", otherwise the queries always failed when they found a date like 28/1/2004. Once the data was loaded as text into Power BI Desktop, I was able to change the data type back to date again, picking up the correct Locale information from my PC. Hardly ideal, but it is a workaround until this bug is fixed.
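To illustrate the workaround, here is a minimal, self-contained M sketch – the column name and sample value are made up for the example. The middle step stands in for what the dataflow loads (the date kept as text), and the final step is the conversion done later in Power BI Desktop, passing the culture explicitly:

    let
        // stand-in for the real source: one day-first date stored as text
        Source = #table({"OrderDate"}, {{"28/1/2004"}}),
        // in the dataflow: leave the column as text so the refresh succeeds
        DatesAsText = Table.TransformColumnTypes(Source, {{"OrderDate", type text}}),
        // in Power BI Desktop: convert back to date using the en-AU culture,
        // so 28/1/2004 is read as 28 January rather than failing
        DatesFixed = Table.TransformColumnTypes(DatesAsText, {{"OrderDate", type date}}, "en-AU")
    in
        DatesFixed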

Re-point the Tables in the PBIX

There are a few ways you can do this, but the easiest way is to get under the hood into the code in Power Query.  The process is to:

  • Create a new connection to one of the entities in the new dataflow from Power BI Desktop (Get Data\Power BI Dataflows). Repeat for all other tables.
  • Then go into Power Query (Edit Queries), select the Advanced Editor for the first new query, and copy the M code (it will look broadly like the sketch after this list).
  • Then go into the query for the original matching table, go into the Advanced Editor and replace the current code with the new code.
  • Repeat the cut and paste process for the other tables.
  • Delete the duplicate queries (the new ones, not the original ones).
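For reference, the code generated by the dataflows connector looks broadly like the sketch below – the GUIDs and the entity name here are placeholders, and your own generated query will contain the real IDs for your workspace and dataflow:

    let
        // the dataflows connector, as generated by Get Data\Power BI Dataflows
        Source = PowerBI.Dataflows(null),
        // navigate down to the workspace, then the dataflow, then the entity
        Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
        Dataflow = Workspace{[dataflowId = "11111111-1111-1111-1111-111111111111"]}[Data],
        Entity = Dataflow{[entity = "FactInternetSales"]}[Data]
    in
        Entity

It is this whole block that gets pasted over the code in the original query, which is why the calculated columns and measures attached to that original table keep working.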

The advantages of using this approach include:

  • Any calculated columns you have in the table will keep working
  • There is no need to move any measures from the tables.

Note:  You cannot simply delete the original tables and rename the new tables in Power Query and expect it to work.  You must cut and paste the new code into the original table that you want to keep.

Publish the Workbook to PowerBI.com

One REALLY cool feature of dataflows is that I can now publish my PBIX file back to PowerBI.com, but it doesn’t need to be published into the same workspace that contains my dataflows.  For this demo, I published my updated version of my workbook to “My Workspace”, not to the myData workspace I created earlier.

Refresh Dataflows and Workbooks

The last thing I did was to set up scheduled refreshes for both the dataflow and the workbook. Of course I needed the scheduled refresh for the dataflow to run and complete first, followed by a refresh of the workbook.

Dataflows Refresh

I switched back to the dataflows workspace and selected schedule refresh.

[Screenshot: the scheduled refresh option for the dataflow]

I set the schedule to refresh each day at 8:00 am.

[Screenshot: the dataflow refresh scheduled for 8:00 am daily]

I then switched back to My Workspace and did the same thing with the workbook dataset.

[Screenshot: the scheduled refresh settings for the workbook dataset]

Before moving on, I had to set up my credentials.

[Screenshot: the data source credentials settings]

I clicked "edit credentials" and the following window popped up. At first I was confused by this, as I was trying to edit the data in the first 2 text boxes, but that is not what is needed. Just click "sign in" and then log into your account.

[Screenshot: the edit credentials dialogue with the "sign in" button]

After logging in, I was able to schedule the dataset refresh. I set it to 8:30 am, giving the dataflow enough time to refresh first.

[Screenshot: the dataset refresh scheduled for 8:30 am]

Benefits

The benefits of doing it this way include:

  1. You have one master query for each table.
  2. You can load these tables into multiple workbooks reusing a common source.
  3. If you need to edit the query, you edit it in one place and it will flow through to every workbook.
  4. Other people can use the same queries (as long as they are given access to the workspace containing the dataflows).
  5. There is one version of the truth.
  6. Dataflows can use traditional data warehouses as the source, but can also use Excel spreadsheets and CSV files. In this sense, dataflows can extend the concept of an enterprise data warehouse to a broader set of data, including data not managed by IT.
  7. There are performance benefits against the gateway, as the gateway is only used once per query instead of potentially many times.

What do you think?

I would love to hear what you think about setting up your queries this way.  Do you agree it is a good approach?  What other benefits do you see?


Comments

  1. Yes, with the caveat that it is just CSV files, so there is no way to implement any kind of security?

  2. Great, Matt,

    I will apply this, especially for new customers. Looking forward to your next (?) blog discussing the Golden Dataset and Dataflows. Feels straightforward but there is probably a catch 😉

  3. It is great that dataflows can extend traditional data warehouses with the speed and scalability of the cloud. It would be even better if it were possible to connect to them in Direct Query mode. Other features that could be included, such as a visual flow of the transformations (like Azure Data Factory), access to the backend Azure Data Lake files, conversion of CSV files to Parquet files, and source control integration, would really create a perfect and complete experience with the Power BI platform.

  4. Matt
    Let's say you are pulling data from SQL Server and there is some security implemented, so the manager of Store A can't see the data from Store B. Now you export all the data using a dataflow, and all the data is available for both stores. Because it is a CSV file, you can't implement any restriction; if a user has access to those entities, he will see everything. Just worth noting, that's all.

  5. Matt
    My comment is not at the practical level, but comes from trying to look at the "Bigger Picture": Power Query started as an amazingly easy ETL tool in Excel and is growing into another "MS Monster".
    Now I don't say I know how to do it better, but I have a bad feeling it could have developed along a better path, and that once again (like so many times in the past) Microsoft will lose the mass of users and stay with the IT pros.
    Would love to hear your opinion.

    • Uriel. My view: there is a big difference between building software that is user friendly for business users and then extending it for more technical purposes, and building technical software for IT and then trying to make it easy for business users to use. As long as the ease of use is not removed, I am not that fussed if it is extended for IT purposes. Actually I think it is a good thing if business and IT use the same tools. I learned DAX in Power Pivot for Excel, then I did a 6 month SSAS Tabular project (same core engine). I was fully productive on day 1 and didn't have to relearn anything. My blog post 2 weeks ago was the first time I had ever used dataflows, and I didn't need any new training. JMO

  6. Good to highlight this as an architectural pattern – moving the transformation one step closer to the data source.

  7. Great article as usual, concise and easy to follow. Looking forward to the next one regarding the golden dataset. With the PBI roadmap it looks like dataflows can accomplish this (when they get close to parity with the full PQ desktop capabilities, plus direct query with dataflows), and also the PBI cloud dataset in the future (when multiple datasets can be used in the same model/report). Exciting times for BI!

  8. As a Pro user you have a precious 10 GB. Storing both dataflows and the dataset, you are using that space twice.

  9. Mim. Yes, there is no RLS on CSV as you know. But once the data is loaded into Power BI you have the same RLS capabilities as any other import mode model. Just don't give the end users access to the dataflows; just give them access to the report.

  10. Matt, how do you add a data source in a dataflow? It’s not obvious to me how to do that in PowerBI.com

  11. Great post Matt!
    What about performance benefits?
    Have you tested it? Can I expect an M query to refresh faster in comparison to the same one in Power BI Desktop?

    • I haven't tested the performance benefits. My guess is that the sync from the source through the gateway into the dataflow will take the same time. Then there is another sync from the dataflow down to the workbook, and this should be super fast. If there are multiple workspaces and workbooks on PowerBI.com, then the gateway is only used once instead of multiple times, so yes you are right – this would be a great benefit.

  12. I can certainly see benefits of moving queries to Dataflows. Let’s hope MS provide a more efficient way to do this sometime soon. It is rather painful moving them by copying and pasting one at a time when you have lots of queries.

  13. Hi Matt,
    PBI Dataflows is in "preview" now…do you have any idea when it may be generally available?
    These dataflow tables are being stored in "Azure Data Lake Store"…I don't know what that is about and how it works. Any idea?

    • Sorry, I have no idea how long it will be in preview. Other preview items seem to stay in preview for 3-12 months. Also, preview doesn't mean things won't improve before GA; some things will keep improving until finally it is in GA. As for the data lake question, see point 4 in my original article. https://exceleratorbi.com.au/power-bi-dataflows-and-why-you-should-care/

      So no, I don't really know a lot about it. It is a big data storage technology that allows storage of files and unstructured data for rapid processing and retrieval using big data principles. I don't know more than that.
