Microsoft recently released a new feature in Power BI called “Dataflows”. You may have seen it pop up in a recent release of Power BI Desktop under Home\Get Data\Power BI Dataflows (beta) as shown below.
It is also visible when you log into PowerBI.com and go to an App Workspace as shown in 1 below. Note it is not visible in “My Workspace” as shown in 2 below. App Workspaces are a pro feature and it therefore follows that you must have a pro account to see dataflows.
What are Dataflows?
I’m glad you asked, because I have been scratching my head asking the same thing. I have been confused to date because there are lots of cogs in this machine. Based on my reading and research, here is my simplified explanation.
Dataflows are:
- An online service provided by Microsoft as part of Power BI (software as a service, or SaaS).
- In effect dataflows are an online data collection and storage tool.
- Collection: It uses Power Query to connect to the data at the source and transform that data as needed.
- You will need to be able to access the data either through a cloud service (such as Dynamics 365) or to your PC/Network via a gateway.
- You can also use Power Query to write queries from scratch, such as my Power BI calendar table.
- Storage: Dataflows then stores that data in a table in the cloud so it can be used directly inside PowerBI.com, but more importantly (from my view) directly from Power BI Desktop.
- Collection: It uses Power Query to connect to the data at the source and transform that data as needed.
- Dataflows leverage the Power Query skills you have learnt (or are learning) using other tools (like Power BI Desktop, Power Query for Excel) allowing you to reuse those same skills in this online tool.
- Tables that are created as a result of the dataflow are stored in an Azure Data Lake.
- If you don’t know what that is, don’t worry – I don’t understand it either. The point is it doesn’t matter because it is all done automatically for you by the tool.
- Dataflows include the concept of the common data service (CDS) or common data model directly in the tool and you don’t have to know what it is, nor care.
- If you don’t know what that is, don’t worry – it doesn’t matter now/yet.
- This will become very important in the future as it will make the process of getting data out of complex databases (such as MS Dynamics 365) much easier in the future.
Dataflows are (and will be) much more than this, but for now I think this is enough detail for the average business user.
Uses For Dataflows
The immediate use that comes to mind for me is the ability to standardise access to source data. Up until now, when I create a new Power BI Desktop file, I have always connected back to the original data source and loaded the data I needed for the new PBIX file. Often the data needed to be transformed, so I performed some transformation steps in Power Query before loading. Then, later on, when I needed to create a new workbook, I had to either copy the Power Query steps or re-write the steps again from scratch. Each person in the same organisation with the same need for the same data had to do this too hence multiplying the rework. You can imagine that there is the potential to have many versions of the same thing. With dataflows all that changes. One person can create a single dataflow and make it available via an App Workspace. Anyone with access to that App Workspace can then directly access the table of data directly from that dataflow (demo to follow).
Other benefits include:
- Effectively creating an online centralised data mart/data warehouse for storage of data in a format that is better suited to reporting and analytics than the source data.
- As new dataflow connectors become available, it will be easier than ever to connect to important data such as “customers” from services such as MS Dynamics 365. I have tried to do this myself in the past and found it far to complex for the average business user. This is all about to change.
- Dataflows have incremental refresh (currently just for premium) meaning that large data refreshes that currently can take a very long time to complete can be configured to refresh just the changes.
Dataflows Video Demo
I have produced a video demonstrating how to create a dataflow. See the 2 notes given below.
Note 1: In the video I state that I can’t work out how to create a dataflow without first loading a PBIX file. It can be done (see image below), but it is not intuitive (Microsoft agrees and will change it shortly).
Note 2: There is a bug at the moment (Microsoft agrees) where the changes I made to the Locale for the date column didn’t stick. This will also be fixed in the near future.
Steps to Create Power BI Dataflows
I have also documented the steps to create a dataflow to get you started below.
- Login to PowerBI.com
- Create an App Workspace
- If no data is loaded in the workspace, click “skip” in the bottom right of the screen (as shown above)
- Click on the name of the app workspace on the left hand side (shown as 1 below, then Create (2), Dataflow (3).
- Click Add new entity (1 below) and proceed from there.
What Benefits do you see?
I would be really interested to hear from you about how you might use dataflows in your business. Please post your comments below.
Do You Like the Way I Explain things? Why not Hire Me?
I share lots of stuff for free on my blog and at various speaking events around the world – I like sharing and I enjoy blogging. If you find the way I explain things makes it easy for you to understand, then why not hire me to help you learn faster with less rework. I have lots of clients all around the world that I help ‘on demand’ – you could be one of them. Just contact me via my website for more details if you are interested.
Awesome Explanation !!! Cheers
I am getting an error regarding Data Source Credentials:
Your data source can’t be refreshed because the credentials are invalid.
Please update your credentials and try again.
And then, below that message ” (admin has granted access, credentials are not required)
This makes no sense to me. Any suggestions? Thank you Matt.
What is the data source?
It might be worth mentioning that Dataflows require an Enterprise Gateway to work. (that is, the other kind – a Personal Gateway, won’t show up). I think a lot of people who only ever used a gateway for Power BI refreshes, may just have a personal one installed.
Thanks for sharing that Mark, I wasn’t aware of that. I will update the post.
Hey Patricio, not wanting to reinvent the wheel here. Adam has a demo for Dynamics here https://www.youtube.com/watch?v=veuxofp0ZIg
If you can access the DW, then sure, just do that. But sometimes not all the data is in the DW. You CAN skip a DW all together, but personally I don’t think that is a good idea if you already have a DW set up and operating. But if you don’t have one set up, this can be a great step forward. Today I don’t see this as a 1 for 1 replacement for a good DW, but I can see a time in the future when there are less and less differences. And yes, a DW is great when managed by IT for standards. And Dataflows can also be managed by those same data experts for the benefit of the organisation. It is common for there to be lots of grey areas when new features come out. I say, if you don’t see the value today, then don’t use it. If you do see the value, then use it. And value can and will change over time, so keep monitoring.
Nice article.
I am still trying to grasp the use case of data flows considering a fully loaded datawarehouse (DWH) is available. Why don’t I just connect to my DWH?
As far as I can reason for now the DWH can be skipped altogether using a data flow. Will the Data Flow rest aside a DWH or remove the need for a DWH completely? The ETL possibilities are (for now) far less then a fully equipped ETL tool. How do you, or other readers, consider this?
Data Flows are likely to be flexible for the business user. But how about management of all data flows and ETL processes which need to be inline with company policies/definitions.
Hi Richard. Do you mean Power BI Report Server? No, I don’t have access to look at it there, sorry
Hey Ben. This is similar to the earlier question/comment about the golden dataset. An Analysis Services cube is a data model in its own right. It has tables, relationships, measures etc. You can’t access a single table in another model. a dataflow is effectively a single prepared table that can be used inside a brand new data model as many times in the future as you like. Because it is a dataflow, everyone accessing it will have the same version of the truth. dataflows are a closer match to SSIS for a datawarehouse than they do to SSAS.
I agree, that would be great Patricio. I will do it as soon as I work it out 🙂
I managed to access my company SQL Server in Data Flow. I was using the On premises gateway what was not necessary, that’s why I was getting a IP restriction. This is amazing. Now I can access the database anywhere in the world and do some ETL, Power Query preparing the data and even connect in Power BI Desktop. For the record, I can’t access company’s SQL Sever on Power BI Desktop without VPN or IP being authorized. But with Data Flow I can, and once I create a data flow, I can access it from Power Bi Desktop without the IP restriction.
Could you show a demo with Dynamics 365. As we are moving to the cloud version we are scratching our heads about how to get the same level of access to the tables, as we did on our On-premises server.
One thing that found strange if not straightforward outrageous is that Dynamics doesn’t have the same (developer version) as Salesforce, where you can immediately play around at no cost.
Matt, very good explanation and demo. Clear and simple, direct to the point.
Your feelings are the some of ours, and thankfully you translated it for us.
Since I’m only able to access my SQL server from my company IP, I used a Google Spreadsheet as data source, importing as a TXT/CSV and it worked like a charm, following your steps.
This feature will be a game changer for organization and pre mash up operations.
Thanks again for the article.
I think that data flows are great especially because of their ability to connect with other Azure tools. Besides that, I can’t really understand the difference between data flows and a model built with Analysis Services. Can you help me figure it out?
A nice summary Matt – thanks for sharing. Especially the “why” part which I think most Microsoft demos of this have not explained well. AFAIK Incremental is only available in a Power BI Premium workspace?
Dataflows will become the future of the ETL from Microsoft in next one year with the intention of weekly updates release cycle , visual workflow design toolset and auto orchestration of data, as promised by Amir in the whitepaper. The other innvoation coming soon is that Azure Data Lake will allow to store the data in Parquet files ( In memory storage format currently used by Apache Spark Platform).
The future of Microsoft BI is bright and exciting!
Power BI is the new gold standard of the BI Market.
Have you had them look at the on site version? We are looking at that since my Boss has big issues with Cloud due to us being involved with anonymous public data.
Dataflows, paginated reports…man, this stuff is coming thick and fast. If only my IT dept would switch on the cloud already.
Nice,
I hope what connection type can to Power BI Dataflows storage, Import or Direct Query or both.
also I hope how much the number of concurrent connection, for share the storage with many users.
also I hope how fast of Power BI Dataflows import data by Power BI Desktop, for import millions row data.
and more for using Power BI Dataflows instead of traditional Data Warehouse.
@Matt
You don’t have to load a Power BI desktop file to initiate the Data Flow Tab
Once you create the Work space – click on skip that appears at the bottom right corner of the “Welcome to Data flows…” section