This is a further (fourth) update/edit to this article Power BI Golden Dataset I first wrote in Apr-17. I then updated it again in Jan-18, Jun-18 and again now in May-20 (Jason did a full review this time around). When I update my articles, I change the original post with the latest and best information available at the time – that way only the current/latest version is available online preventing people stumbling on old advice.
The original idea of a Golden Dataset was to
- Load a PBIX into a OneDrive or SharePoint folder
- Import the Golden Dataset into one or more workspaces (each workspace will have an identical duplicate copy of the dataset).
- Rely on the capabilities of PowerBI.com to automatically refresh the dataset in all workspaces everytime the source (step 1 above) is changed.
This is no longer the most effective way of achieving a Golden Dataset. This article explains the more modern approach, which is to use shared datasets instead. Further, the concept of the Golden Dataset now has very wide reaching impacts across concepts that include shared datasets, shared workspaces, certified datasets and lineage views (to name a few). This article has been broadened to cover all of these areas. This is a very long article now, so strap yourself in. Here is an overview of what is covered.
- Statement of the problem to solve.
- Solution overview and setup
- The definition of a golden dataset.
- Publishing a golden dataset options.
- Using datasets across workspaces configuration.
- The golden workspace.
- The golden dataset.
- Maintaining the golden dataset.
- Creating thin Power BI reports, including local measures.
- Publishing thin workbooks.
- Impact analysis and lineage views.
- Creating online only reports.
In the future, this will also include deployment pipelines (Microsoft recently announced this feature, however it is only available for premium users, and thus this information is still valid for those not in a premium space).
The Problem – Too Many Data Models
Power BI has so many great features it seems almost sacrilegious to raise problems that can surface when unleashing this fabulous self service BI tool to the user base. Having said that, one of the problems that can occur in the world of self service BI is the proliferation of slightly different versions of essentially the same data model. The problem is made worse by the fact that you can publish the same data model to multiple workspaces at PowerBI.com, essentially multiplying the problem. Note: data model and dataset are essentially the same thing – I refer to data model when using Power BI Desktop, but when it is published to PowerBI.com it is officially called a dataset.
If you think about it, self service BI (using your own data models) has the potential to be the antithesis of enterprise BI. Instead of having one single data model that is used for all reporting (enterprise BI), you can build individual bespoke data models each time you have a new problem. This is both powerful and problematic. If you are not careful, before too long you can have an unwieldy mess of uncontrollable and unmanageable data models that are essentially the same at the core but with different reports sitting over the top. The concept of a “Golden Dataset” (you heard the term here first in April 2017) is a elegant solution to this problem, allowing use of a single master source (dataset) across multiple reports in Power BI Desktop (and Power BI Service). Let me explain further below.
Added note: Microsoft has already announced the introduction of composite models (the ability for a user to create a new PBIX file that connects to a golden dataset and then allow the user to extend the model further in their own PBIX file (eg by adding local tables) – this will literally be another game changer, and will finally mean that self service BI and enterprise BI can both live closely together in a world approaching utopia.
Solution Overview and Setup
What is the Golden Dataset?
The Golden Dataset is a master Power BI Data Model that is loaded to PowerBI.com with or without a report attached. (Actually currently you must have at least a single page in your report, even if it is blank, but you can delete it once loaded to PowerBI.com if you like). Once you have your Golden Dataset loaded to PowerBI.com, you can build as many new ‘thin’ reports as you need in Power BI Desktop by connecting Power BI Desktop directly to PowerBI.com with Get Data -> Power BI Service.
A Thin Workbook is one that doesn’t contain a data model itself, but links into a central data model (in this case stored on PowerBI.com).
The advantages of taking this approach are:
- You only have one dataset/data model to maintain. This will use less space, use less resources to keep the data up to date, and also eliminate the proliferation of slightly different versions of the data model.
- You can maintain and improve the single data model in the Golden Dataset master PBIX and make the changes available to all users and dependent reports.
- You (and everyone else with access rights) can simply build new ‘thin’ reports connected to the Golden Dataset as needed.
- Anyone building a new report in Desktop using the Golden Dataset will be able to add their own “local” measures – local to their thin workbook.
- With the implementation of a suitable business process, the best local measures created by users can be retrospectively migrated into the Golden Dataset for the benefit of all.
- You get the benefits of an enterprise solution (kinda like using an SSAS Tabular data model) without having to invest in all that infrastructure.
Let’s explore how it works.
Publishing Your Golden Dataset Options
There are actually a few ways you can work with Golden Datasets in Power BI. The first two options below have been discussed previously the last time this article was updated. Option 3, using a Shared Workspace will be discussed in detail below.
Your options are:
- Save your Golden Dataset to OneDrive for Business and import to one or more Workspaces from there.
- Save your Golden Dataset to SharePoint Online and import to one or more Workspaces from there.
- Publish your Golden Dataset to a Power BI Workspace (Golden Workspace), create a new thin workbook pointing to the Golden Dataset, and then publish the thin workbook to a different workspace(s). Power BI manages the link back to the Golden Dataset for you automatically.
Options 1 and 2 above effectively provide the ability to impact multiple workspaces and reports while maintaining only one dataset, however in order to use the dataset across multiple workspaces, the dataset needs to be loaded multiple times into each workspace where you need the data. This also means that each copy of the dataset needs to have its own refresh schedule configured adding duplicate refresh effort to the systems. The advantage of using option 3 is that you can have multiple workspaces and reports/apps linking back to one common source of data, without the need to duplicate to models. This goes a long way to addressing the restriction that currently exists where only one app can be produced per workspace. The concept of working with a Golden Dataset published into a Golden Workspace is illustrated in the image below.
Using Datasets Across Workspaces Configuration
In order to be able to use a common dataset across multiple workspaces, your service tenant needs to be configured to allow it. In the Admin Portal, select Tenant Settings, and scroll down to Workspace Settings. Here you will find an option to Use datasets across workspaces. This can be enabled for either the entire organisation, or specific security groups.
It is also recommended that if you are intending to work with Golden Datasets, you should have a system in place for certifying the datasets so that your report builders know that this is an official data source. A certified dataset is simply a label that confirms this dataset is the real deal. This feature can also be configured in the Admin Portal, Tenant Settings. Now that your workspaces are configured to allow datasets to be used across workspaces, continue to scroll down until you get to Certification. Click to expand, allowing you to enable this setting. Once enabled you have two fields requiring attention. The first field asks you to specify URL for documentation page. This is where you link your dataset certification process protocol, so when a report developer clicks the “Learn More” link in the Endorsement settings of their dataset, this documentation will guide them in the process for requesting certification.
The second field requiring attention is the “Apply to:” field. Here you will allocate the appropriate security group(s) responsible for dataset certification (In this example, I am in the Golden Data Admin group). This field can be applied to the entire organisation, but that seems counter productive considering the purpose of certifying a dataset is to ensure report developers they are working with a controlled and approved dataset. You should therefore only give access to a sub set of users who are allowed to certify datasets.
Now that we have an appropriately configured tenant, we can proceed with setting up the Golden Workspace.
The Golden Workspace
With an appropriately configured tenant, the first step to establishing your Golden Dataset is to create a Golden Workspace for it to reside in. For demo purposes, I am going to create an Adventure Works Golden Data workspace in my PowerBI.com account. This is no different to creating an ordinary workspace as far as creation is concerned, but I would recommend you establish appropriate protocols so that only “Golden Data Admin” security group users create and maintain these workspaces.
In this example, I am using a Golden Dataset logo to visually illustrate that this is a Golden Workspace, with an appropriate workspace name and description. We are now ready to publish our Golden Dataset.
The Golden Dataset
In Power BI Desktop, I have built the master data model for my colleague report developers to use in building their reports. This PBIX file only contains my model (including necessary calculated columns, tables and measures). The report page is blank. I can simply publish this workbook into my new “Adventure Works Golden Data” workspace as I normally would.
Now that I have my Golden Datset in my Golden Workspace, I need to make a few minor adjustments before it is ready for production use. First, I need to consider whether or not to remove the blank report from this workspace (#1 below), as I only want to maintain my Golden Dataset(s) in here (more on this later). Then I need to promote and certify the dataset (#2 below) as being ready for production (I will release an article on certifying a dataset soon, for those of you who are unsure how to do this).
The Golden Dataset is now ready for production.
Making Changes to The Golden Dataset
The real benefits of this approach are realised when you need to make changes to the Golden Dataset. To demonstrate how this works, when I first published the dataset to the Golden Workspace it had 5 tables. You can see there is now an additional table in the model being published to replace the existing dataset in the Golden Workspace.
At this point it is important that I draw your attention to the fact that if you update a Golden Dataset in a Golden Workspace, it does not reset the certification level of the dataset. Depending on the data policies in your organisation this may or may not cause complications. If you do not have the authority to re-certify the updated dataset, you should have protocols in place to get the updated model certified before your publish it to the Golden Workspace and overwrite the approved dataset.
Note: The re-certification process is likely to be addressed in the future once deployment pipelines are made available.
With your updated dataset loaded into your Golden Workspace, you will need to delete the blank report again. This is of course optional, providing everyone who uses the Golden Workspace is aware that it is not intended for building reports in. Another option to consider, is to load a single image into the report page that illustrates the model design (this will make more sense below).
Creating a Thin Power BI Desktop Report
OK. We now have a Golden Dataset in our Golden Workspace, its time to create a new thin report using Power BI Desktop. To do this, simply open a new blank PBIX file in Power BI Desktop, and opt to Get Data when the splash screen welcomes you. Navigate to the Power Platform option and select “Power BI datasets” as the data source (shown below), then click Connect.
If you are not logged in when you do this, you will be prompted to log in before proceeding. Once logged in, you will be presented with a list of datasets that you can access from your various workspaces. This is one reason why having official datasets promoted and certified is recommended, as these will appear at the top of the list, with certified datasets appearing before promoted datasets. Simply select your Golden Dataset and click Create.
After building my report, I saved the Power BI Desktop file with the name “Thin Sales Report”. The new thin workbook does not have a data view or relationship view, only the report view as shown below. Also the thin workbook doesn’t have a copy of the data, so it will only work when you have direct access to PowerBI.com (keep that in mind if you want access when you are out of Internet coverage).
As mentioned earlier, one of the benefits of this approach is that a report builder can connect to the Golden Dataset but still create local measures when needed. Here is my thin Power BI Desktop workbook below (#1 below). I can right click on a table in the Fields List and create a new Measure (#2 below), but notice you can’t create calculated columns or new tables (they are not in the drop down list and are greyed out in the modelling tab), only measures are available (#3 below).
Any new measures you create in a thin workbook are stored as part of that thin workbook and are not visible in the Golden Dataset. If, down the track the Golden Dataset is updated with a measure using the same name as the one you created locally, your local measure will show an error in your thin workbook and the new measure will appear, working in place of your local version.
You will also notice that any ribbon buttons relating to the dataset in your thin workbook are also locked. Among other things, this means you cannot currently add additional data to the model your thin workbook is built on, however there are moves afoot to enable the building of composite models in thin workbooks as shown in the image below.
Below is a screen shot of new composite models coming soon (not available as of this update in April 2020)
In this image, we can see a workbook that is connected to 3 different datasets (indicated by the colours, Red, Green and Blue). This is going to be huge! To find out more, watch Christian Wade’s Business Applications Summit presentation on Modern Enterprise BI (The Business Analyst view of the Enterprise Semantic Models begins at ~14mins).
In the meantime, the only advice I can offer in this regard is to add an image of the dataset model into the Page1 report of your Golden Dataset. Then you simply need to connect to PowerBI.com and look at the report attached to the Golden Dataset to understand the underlying data model.
Publishing the Thin Workbook and Workspace Lineage
Once you’re ready to publish your thin workbook to PowerBI.com you will normally publish this to another workspace (it can be the same one, but normally it will be different). In this case, I have published my thin workbook into a new workspace called “Adventure Works Sales”. I have also published a regular workbook into a workspace called “Adventure Works Territory Sales” so that we can compare the difference between the two.
Note: I now have both a desktop PBIX file AND a new PowerBI.com thin workbook each containing a report pointing to the same Golden Dataset in the Golden Workspace.
After publishing I can see the new Thin Sales Report (Adventure Works Sales) and Sales Report (Adventure Works Territory Sales) in their respective workspaces.
This is the Thin Workbook
This is the Regular (Golden) Workbook
In comparison between the two, there is an obvious absence of a dataset in the Adventure Works Sales workspace, and no I didn’t just photoshop it away. To understand this, let’s explore the new Lineage View offered in the new experience workspaces.
To view your workspace contents in Lineage View, simply expand the View tab (1) and select Lineage (2).
Let’s first look at the lineage of a standard workspace.
In the Adventure Works Territory Sales workspace, the lineage view reveals that the Sales Report (page 1 with the image of the data model) is in the workbook containing the Adventure Works dataset (golden) which is connected to the 5 Tables in Excel data source via a Gateway.
In contrast, the lineage of the Thin Adventure Works Sales workspace containing the thin report, reveals a General Manager dashboard that stems from the Thin Sales Report, which is connected to the Adventure Works Golden Data in another workspace. Looking at the image below we can see two significant flags relating to the Adventure Works Golden Data. The icon for this dataset contains a link in the bottom right hand corner (1), letting you know this is a shared dataset. The green icon to the right of the dataset name reveals that this is a certified dataset (2).
To explore the source of your Golden Dataset, simply click on the dataset name or on the hyperlink provided (see above). This will redirect you to the lineage view of the shared Golden Workspace.
The lineage of the Adventure Works Golden Data workspace reveals that there is an Adventure Works Golden Data report (containing only our data model image mentioned earlier) stemming from the Adventure Works Golden Data dataset, which is connected to the 5 Tables in Excel data source via a Gateway. In this view there is also a small icon immediately to the left of the “Show Lineage” icon that reveals the dataset’s impact across workspaces (1).
Understanding Impact Analysis
Note: Before proceeding in this next section, I created a second workspace and uploaded another thin workbook connected to the Golden Dataset, so that I could illustrate the Golden Dataset’s impact across multiple workspaces more effectively.
To open the impact analysis report, I clicked on the link (shown as 1 in the image above). The Impact Analysis report is an important feature to allow you see what will be effected when you make changes to your Golden Dataset.
Exploring the Impact Analysis pop-out shows you how many workspaces, reports, and dashboards will be affected by your change, and provides easy navigation to those workspaces where the affected reports and dashboards are located so that you can investigate further if required.
The Impact Summary (1) shows you how many workspaces, reports and dashboards and the number of views there are on those potentially affected items. This helps you determine the overall impact of the change for downstream objects. Thus, it is probably more important to investigate the effect of a change to a dataset that has 500 views than on a dataset that has 10 views.
The Notify Contacts link (2) opens a dialog where you can create and send a message about any dataset changes you intend to make to all the contacts for all impacted workspaces, including workspaces you don’t have access to.
Finally, the Usage Breakdown (3) shows you, for each workspace, the total number of distinct viewers and views for the potentially impacted reports and dashboards it contains, and allows you to further break that down for each report and dashboard.
Power BI Online Only Reports
Of course you can use the Golden Dataset to create a PowerBI.com service only report (as you can with any dataset), however to maintain the effect of the shared Golden Workspace, we need to do this in a slightly different way.
When working in Adventure Works Territory Sales, a regular workspace, we simply need to click on the Dataset in the navigation pane (1) and we can build our new report (2) and save it into the reports area of the workspace. After the report has been saved (top right hand corner of PowerBI.com) you will see the new report in the Reports list in the navigation pane (3).
If required, you can download a desktop version of this report by clicking Download the .pbix file as shown below. Which will download both a copy of the report and a copy of the underlying dataset associated with that report. But what about when we’re working in thin workspaces?
To illistrate, I have used our training profiles below.
Looking at the Adventure Works Sales thin workspace, we don’t have any datasets available to build reports on, and our collaborators who have appropriate access to build reports in the Adventure Works Sales shared workspace cannot currently make any adjustments to the Thin Sales Report.
The solution to this is to ensure your collaborators are given build access rights on the Golden Dataset.
Even if you wish to keep your collaborators access to the Golden Dataset limited to viewer status, with build enabled, they will be able to edit and modify reports in other workspaces that use the Golden Dataset.
The first step to enable this is for you as the administrator of the Golden Dataset, to click on the elipsis next to the dataset name, and select Manage Permissions.
You can now assign build rights to those users you wish to give editing rights to in the Golden Workspace. Simply click the elipsis next to the user’s permission level, and select Add Build.
Now that I have given my collaborator build access to my Golden Dataset, how will this impact their access in the Golden Dataset Workspace?
As a viewer of the Golden Dataset, the content available to your collaborators doesn’t change in the Golden Workspace. Here, they will still only have access to view that “Page1” report that was published alongside the Golden Dataset.
When my collaborator views the Adventure Works Sales workspace however, they are now able to modify the existing Thin Sales Report or Save a Copy and start building a new online only version.
With build authority granted to my collaborator they are also now able to connect to the Golden Dataset in Power BI Desktop, and build a new thin report that they can publish up to the shared Adventure Works Sales shared workspace, but the Golden Workspace is protected from them accidentally publishing a new workbook into it.
Some of you may have noticed that the Download the .pbix file is once again available to my collaborator, however rest assured that attempting to download a PBIX copy of the report is not enabled.
As the administrator of both the Golden Workspace and the shared workspace however, I am able to download the updated Thin Sales Report.pbix file once changes have been applied, should I have the need, however I am not able to download any new online generated reports added to the shared workspace created by either myself or a collaborator using the Golden Dataset (Golden Data Report below).
Switch Reports to a Different Golden Dataset
Now for some really good news. It is possible to repoint an existing thin Power BI Desktop workbook to a different dataset if you so required. Now of course you wouldn’t want to repoint a Sales report to a Finance dataset, but if you take this Golden Dataset approach and later change your mind which workspace to point to, you can easily repoint the thin workbook to a new dataset.
“But all the Data buttons on my ribbon are locked out!” I hear you cry. That is only half true.
While previously you may have taken an approach of repointing the data source via right clicking a table in the fields pane and editing the data acquisition query, this is not available to us in a thin workbook.
Instead, we need to look to the “locked out” ribbon to achieve this.
It is true that most of the Data buttons are locked out in the ribbon, however there is one that still has limited functionality in a thin workbook: Transform data. While the top Transform Data button is greyed out and unavailable, the bottom Transform data button (1) can still be interacted with, and offers only a solution to adjusting Data source settings (2).
Clicking on the Data source settings returns the Power BI datasets picker to view, where you can select a new remote dataset. Note that you cannot change the data connection type however, so you are still limited to another source of data present on PowerBI.com.
Summary of Key Points
- You can upload a PBIX file to the service and use it as a Golden Dataset (one dataset to rule them all).
- You can also set up a Golden Workspace to hold your Golden Dataset(s) that you can then link many reports and workspaces to the Golden Dataset.
- You can create as many new reports as you want remotely connected to this Golden Dataset.
- Connect a new blank Power BI Desktop file effectively creating new thin workbooks.
- Indirectly create new reports online in the Power BI Service.
- If you create a thin workbook connected to the Golden Dataset and create a report, you can publish it to a different workspace and it will work.
- Anyone with build access to the Golden Dataset and a local copy of the desktop generated thin report (PBIX file) can work with the file on their desktop and successfully interact with it on their Power BI Desktop. If you later remove their build rights to the dataset, then the local thin workbook (PBIX file) will stop working because it can no longer access the Golden Dataset – very cool!
- Online generated thin workbooks can’t be downloaded.
- You can modify and update your Golden Dataset and republish at any time, and the changes instantly affect any thin workbooks that you have created connected to the Golden Dataset, and you have the Impact Analysis tool to help you manage this –sweet again.
- Any user with access rights can build local measures to a thin Power BI Desktop workbook without impacting the Golden Dataset.
- With a suitable business process, end users can advise the owner (admin) of the Golden Dataset of any useful local measures that have been created. The administrator can then migrate these measures to the Golden Dataset for the benefit of everyone, a task made simpler when you copy the measures between two PBIX files with Tabular Editor.
- If needed, you can repoint any thin workbook to a different copy of the Golden Dataset. This can be good if you started your journey without using a Golden Dataset but now want to backtrack and sort this out.
Let me know what you think of sharing Power BI Golden Datasets via Golden Workspaces in the comments below. I also would love to hear any other ideas you have on how to get the most from the Golden Dataset.