Archive for Power BI Desktop

Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a Query Language, and in my second article I went deeper into the specifics of how to use some of the more useful DAX query functions.  In this final article I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

Create Summary Tables

As you may have seen in my previous articles, you can use DAX Queries to create summary tables like the one below.

This may be useful if you want to pre-calculate this summary table and have it as part of your data model (more on that covered in the “Materialisation” section below).  If you want to create a summary table for any reason you can do it using DAX Queries.
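
As a sketch (assuming the Adventure Works table names used throughout this series and an existing [Total Sales] measure), a query that builds this kind of summary table could look like this:

EVALUATE
SUMMARIZE (
    Sales,
    Products[Category],
    Customers[Gender],
    "Total Sales", [Total Sales]
)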

Create Tables to Use As Filters Inside CALCULATE

Another major use of queries is as a table input inside a CALCULATE function.  Because the virtual tables retain lineage to the data model, they behave as if they were physically part of the data model.  The following measure returns the total sales for all unique combinations of product category and customer gender where the margin for those unique combinations is greater than 45%.
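
A sketch of such a measure (assuming [Total Sales] and [Margin %] measures already exist, and using Adventure Works column names):

Total Sales High Margin Combos =
CALCULATE (
    [Total Sales],
    FILTER (
        CROSSJOIN ( ALL ( Products[Category] ), ALL ( Customers[Gender] ) ),  // every combination of category and gender
        [Margin %] > 0.45  // keep only the combinations with margin above 45%
    )
)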

This measure doesn't have any real value, however it does demonstrate the concept well.

Create Tables to Use in Iterators

You can also create a table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.
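
A sketch of such a measure (again assuming a [Margin %] measure already exists):

Average Margin % by Combo =
AVERAGEX (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),  // the actual combinations that exist in Sales
    [Margin %]  // evaluated for each combination via context transition
)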

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is they allow you to “Inspect” tables that are used inside DAX Measures and Calculated Columns.  Excel users come from a world where everything they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX Measures or Calculated Columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise and also hard to debug the measure particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly you can insert the table portion into the measure.

Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel Workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet (shown as 1 below) and then select Data\Existing Connections\Tables, then select any table (I have selected Products shown as 5 below) and then click Open.  *At this point you can select any table as it is going to change shortly anyway.

image_thumb[38]

You will be given an option to insert a Pivot Table however you also have an option to insert a Table as shown below.  Go ahead and click OK to insert this table.

image_thumb[40]

Now that you have a table inserted into a spreadsheet from the data model, it is possible to change what is returned in this table.  To change what is returned in the table, right click anywhere inside the table (shown as 1 below) and then select Table\Edit DAX.

image_thumb[42]

Now you have the opportunity to change this table from being a “table” to being a DAX Query as shown below.

image_thumb[43]

I have now taken a DAX Query I wrote earlier and pasted this into the expression box (shown below) to return a summary table into the spreadsheet.

image_thumb[44]

You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel although I believe it can be done with VBA. That may be a topic for another day.

Materialise Tables Into Power BI

You can also materialise DAX Query Tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only thing different to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises the table (shown as 5) from the formula (shown as 4).
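
A New Table formula might look something like this sketch (note there is no EVALUATE; I am assuming the same Adventure Works names and [Total Sales] measure as before):

My Summary Table =
SUMMARIZE (
    Sales,
    Products[Category],
    Customers[Gender],
    "Total Sales", [Total Sales]
)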

image_thumb[45]

Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot is completing its task of calculating a formula under the hood.  To demonstrate the point I need to introduce a new DAX Query function called ROW.

You can see the syntax for ROW below from the DAX Studio Intellisense.

image_thumb

ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.
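
The query looks like this (using the [Total Sales] measure):

EVALUATE
ROW ( "My Result", [Total Sales] )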

image_thumb5

My ROW function (shown as 1 above) returns a single row table with 1 column (called “my result”), and the row has the value 29.3m (shown as 2 above), which is the total for the entire database.  Note the Server Timings button shown in 3 above.  If you click this button, DAX Studio will keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new tab will appear in DAX Studio (shown as 1 below).

image_thumb4

After executing the query, the server timings tab will show the total time taken to execute the query in milliseconds (shown as 2), how much time each engine inside Power Pivot (the Formula Engine and the Storage Engine, shown in 3) spent completing the task, and whether the Storage Engine Cache was used (shown in 4).

SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

Storage Engine | Formula Engine
Really Fast | Really Smart
Can retrieve records without decompressing (in the right conditions) | Iterates in a row context and hence it can be slow
Multi threaded | Single threaded
Cached | Not cached

I will cover the Storage Engine and Formula Engine in another post sometime.

CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  When I wrote the following query in DAX Studio, it took forever to execute and I ended up cancelling it (remember there are more than 7 million rows in this table).

image_thumb[46]

The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.
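
Assuming the CROSSJOIN from earlier was over the Products and Customers tables, the query would look something like this:

EVALUATE
ROW ( "Row Count", COUNTROWS ( CROSSJOIN ( Products, Customers ) ) )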

image_thumb[47]

The above query executed in 400 milliseconds on my PC confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

An Introduction to DAX as a Query Language

Level: Intermediate

I have written quite a few articles that contain embedded references to DAX Queries and DAX Table functions in the past.  Today I am writing an article that introduces the DAX query language from the beginning so you can learn how to use it and how it will help you to become a Power BI superstar.

What is a Query Language?

A query language is a programming language that allows you to send a request to a database and then receive data back in the form of a table.  If you only have a few hundred rows of data then this is not overly useful.  If on the other hand you have millions of rows of data then it becomes very useful.

The most important difference between DAX when used in Measures and Calculated Columns and DAX as a query language is that the query language always returns a table.  The resulting table will consist of one or more columns, zero or more rows, and data inside zero or more of the cells in the table.

Why Do I Need a Query Language?

When you create a data model in Power BI or Power Pivot for Excel, you are essentially loading tabular data into a dedicated reporting database for the express purpose of analysis and/or reporting.  Once it is loaded, you (the report author) then typically spend hours of effort building out the data model so it can be used to add value to your business/job.  Finally, once this is all done, the most common way to use your new data model is to build reports and pivot tables that present data to a user in a way that is easy to understand.

Below is an example of an interactive Power BI report that runs off a Power BI data model.

And here is a Pivot Table that runs off an Excel Power Pivot data model.

Both of these visualisations are summarisations of the underlying data, and in both cases you can’t actually see any of the underlying data itself, just the summary.  The visualisation tools above are great and have enormous utility, however they are generally not the best tools if you want to see the underlying data itself or if you want to extract very large sets of data for whatever reason.

Once you have invested all this effort in building your data model, you are going to want to use it to the fullest extent. Power Pivot and Power BI can handle many millions of rows of data (unlike traditional Excel) and so the “old way” of simply looking at the underlying table is not really practical in this brave new world.

Show Me the Underlying Data!

 

If you take the pivot table I showed earlier, most Excel users would be aware that they can double click on any value cell (say 31.6% representing Mountain Bike sales to people aged between 30 and 40 years of age).   When you double click on a traditional pivot table in this way, something magical happens – a copy of the underlying data is extracted from the source and it magically appears in a new sheet in Excel.  Once you are done looking at the copy of the data, you can delete the new sheet and you are back where you started – life is good.

When you double click on a traditional pivot table in this way, you are effectively sending a query to the underlying database, and the query response is a table of data.

But Does it Work With Power Pivot?

In short, yes and no depending on what you are looking for.  The experience is different if you have an Excel Power Pivot data model or if you are connected to an external SSAS data source. In my experience the results are inconsistent at best and at worst it may not give you what you expect.  But the good news is that with DAX as a query language you can easily write any query you want and extract exactly the subset of data you need.  You no longer have to rely on double clicking your pivot table to interrogate your data.

Time To Learn DAX as a Query Language

OK, enough of the chatter, time to get serious about DAX as a query language.  You have invested time and effort in building your data model in Power BI or Power Pivot and now you rightly want to extract a subset of your data so you can take a look at it.  Let’s get started.

First Select the Right Tool

There are 3 general approaches you can take to extract a table of data from your fabulous data model.

  • Use a dedicated tool like DAX Studio or SQL Server Management Studio.  I wrote an article describing how to install and use DAX Studio here.
  • Write a “New Table” in Power BI Desktop.  I wrote an article about how to use this feature here.
  • Use an Excel “table hack” I learnt from Marco Russo to extract a table directly into Excel.  I will cover this hack later in this article series.

I am going to use DAX Studio in this blog article but you can use Power BI “New Table” if you want to follow along and you don’t have (or want) DAX Studio. In fact if you want to learn this stuff then you really should follow along using either approach.  In my experience you can’t learn new concepts simply by reading.  Please note however that using Power BI and Excel as a query tool should be considered as a temporary extraction of the data.  Once you have looked at the data you want, you would normally delete the table so you don’t end up storing redundant data (you can always extract it again if you need it).

Let’s Start With the Evaluate Statement

All DAX queries must start with the EVALUATE statement – with one exception.  When you use the Power BI “New Table” button to write a query, you simply omit this EVALUATE statement.  This is no doubt another example of “Syntax Sugar” where the developers shield the user from some of the more confusing syntax.  If you are following along and testing my queries below in Power BI, then simply omit the EVALUATE statement when you write the queries.  In addition, if you ever want to move your DAX Studio query to a table in Power BI, you can just remove the EVALUATE statement at that time.

Connecting DAX Studio to the Database

When I launch DAX Studio I am prompted to connect to my database. In this case I have an instance of Power BI Desktop with an Adventure Works data model running on my PC and DAX Studio gives me an option to connect to that.  You can also connect to Excel Power Pivot data models and Tabular SSAS servers.

image

If you want a more comprehensive overview of the DAX Studio UI, then read my other post I mentioned earlier.  Otherwise let’s get into it.

Return an Existing Table

The easiest DAX query you can write is to simply return a complete copy of an existing table.
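
Assuming you want a copy of the Customers table, the query is simply:

EVALUATE Customers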

When I type this query into DAX Studio (shown as 1 below) and press F5 to run it, the entire table is returned (shown as 2) – a total of 18,484 rows of data (shown as 3).

image

This in itself is not overly useful as chances are you have this table already in some source system.

List of Unique Values

In DAX there are many functions that return tables instead of scalar values.  I covered this topic in depth last week so you can read up on that here if you like.

Let me now look at the ALL function.

When I execute this table function as a query, I can actually “see” the results of the query.  In this case the query returns a single column table consisting of 4 rows – all the unique product categories.
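
The query is simply this (assuming the category column lives in the Products table):

EVALUATE ALL ( Products[Category] )

The VALUES and DISTINCT versions below are identical except for the function name.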

Next let’s look at VALUES.

In this instance, the VALUES table function returns exactly the same result.

image

Now let’s look at DISTINCT.

And same again here.

image

All three of these table functions return the same table in this instance.  In reality they all do different things in different filtering scenarios.  This is one of the great things about DAX Queries – you can “inspect” what is happening to these table functions in different scenarios.  More on that later.

A Two Column Table

The ALL function can take 1 or more columns as an input (or a table).  In the example below I have used ALL to materialise a table of all of the combinations of Product Category and Product Sub Category.  There are 37 unique combinations in all.
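
A sketch of the query (your column names may differ slightly):

EVALUATE ALL ( Products[Category], Products[SubCategory] )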

image

A Filtered Table

Most people that know DAX will know of the FILTER function.  The FILTER function returns a filtered copy of a table, e.g. see the example below.
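
A sketch of such a query (assuming an existing [Total Sales] measure):

EVALUATE
FILTER ( Products, [Total Sales] > 200000 )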

This is a much more interesting query because it returns a table containing all the products that have sold more than $200,000 of sales over all time.  There are 49 such products.

image

Altering Filter Context

When you write a DAX measure or Calculated Column, there is only 1 way you can change the filter context and that is using the CALCULATE function.  CALCULATE returns a scalar value after applying filters.  CALCULATETABLE is the sibling of CALCULATE.  As you can probably guess, CALCULATETABLE returns a table after applying filters.

This next query returns exactly the same result as the last query.
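
A sketch of the CALCULATETABLE version of the same query:

EVALUATE
CALCULATETABLE ( Products, FILTER ( Products, [Total Sales] > 200000 ) )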

image

The main difference between using CALCULATETABLE and just using FILTER is that CALCULATETABLE can return a different table to the one being filtered by the FILTER function itself.
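
For example, consider this sketch, where CALCULATETABLE returns the Sales table rather than the Products table that is used inside FILTER:

EVALUATE
CALCULATETABLE ( Sales, FILTER ( Products, [Total Sales] > 200000 ) )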

The above query uses the same FILTER function from before, but instead of returning a table containing the products that sold more than $200,000, it returns all of the actual sales transactions that contributed to those products qualifying (see results below).

image

Combining Table Functions

Now that you see how some of the standard table functions work in a query, let’s look at combining them together into a single query.

This query first applies a filter on all products that have sold more than $50,000 (line 4 below), and then returns a list of unique product subcategories that contain these products.
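
A sketch of such a query (assuming the subcategory column lives in the Products table):

EVALUATE
CALCULATETABLE (
    VALUES ( Products[SubCategory] ),
    FILTER ( Products, [Total Sales] > 50000 )
)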

image

One interesting thing in the query results above is that “helmets” are in the list (the only non-bike subcategory). To see exactly which products sold, it is simply a matter of writing another query as follows.
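
Something like this sketch, swapping in a hypothetical product name column:

EVALUATE
CALCULATETABLE (
    VALUES ( Products[ProductName] ),
    FILTER ( Products, [Total Sales] > 50000 )
)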

image

Note how I was able to reuse the original query structure to return the list of product names with just a minor change to the query.

More to Come Next Week

This is a big topic – too big for a single blog article.  This week I introduced the topic of DAX queries, I showed how you can write queries to return tables and how you can combine table functions together to interrogate data.  Next week I will introduce some new special query functions that extend the capabilities even further.

Relationships in Power BI and Power Pivot

Level: Beginners

Power Pivot is a database that has been designed from the ground up to be optimised for reporting and analysis in Power BI, Power Pivot for Excel and SSAS Tabular.  The technology is essentially the same across all of these products so I will generically refer to Power Pivot in this article.

Power Pivot uses an in memory columnar database (Vertipaq) as the foundation technology – the same technology for all versions (Power Pivot, Power  BI and SSAS Tabular).  The Vertipaq engine is what makes Power Pivot both super fast and highly compressed.  A Power Pivot database is not the same as a relational database (RDB) and it does not support all the relationship types that relational databases support.  This can be confusing for people that are new to Power Pivot, particularly if they have at least a basic understanding of how databases (such as MS Access) work.  I explain what you need to know to get started with relationships in Power Pivot here.

Virtual Relationships

This article is specifically about physical relationships, however there are ways to create virtual relationships using DAX.  Examples include using LOOKUPVALUE, FILTER, CROSSFILTER and other techniques.  I am not covering these types of relationships in this article.

Relationship Cardinality Types

There are 3 different possible physical relationship cardinality types in database design but not all are supported by Power Pivot.

Relationship Cardinality Type | Power BI Support | Power Pivot for Excel Support
One to Many | Yes | Yes
One to One | Yes | No
Many to Many | No | No

One to Many Relationships

image

The one to many relationship is the foundation of Power Pivot.  In the example above (from Adventure Works in Power BI Desktop), the Customers table is on the 1 side of the relationship and the Sales table is on the many side of the relationship. These tables are joined using a common field/column called “CustomerKey”.  Customer Key (aka customer number) is a code that uniquely identifies each customer.  There can be no duplicates of the customer key in the customer table.  Conversely the customer can purchase as many times as needed and hence the customer key can appear in the Sales table as many times as necessary.  This is where the name “one to many” comes from – the customer key occurs once and only once in the Customers table but can appear many times in the Sales table.

Tables on the one side of the relationship are called Dimension tables (I call them Lookup tables) and the tables on the many side of the relationship are called Fact tables (I call them Data tables).

The entire Power Pivot Vertipaq engine is optimised to work with this (one to many) type of relationship.

One to One Relationships

The One to One relationship is only supported in Power BI and the newest version of SSAS Tabular.  In my view this relationship type has limited value and in most cases it is better to combine these tables into a single flat table prior to loading to Power BI.  Consider the model below.

image

The first relationship (shown as 1) is a 1 to many relationship between the Customer table (Lookup table) and the Sales table (Data table).  The Customer Socio Economic Data table is joined to the Customer table via a 1 to 1 relationship (shown as 2 above).  If there is a benefit (to the user of reports) of splitting this Socio Economic data into a separate table then of course you should do so.  If there is no benefit, I recommend you combine all the data from Customer Socio Economic Data table into the Customer table using Power Query on load.

Every relationship has a “cost” in that it will have some effect on performance.  The performance impact may not be noticeable for simple models but may become an issue with very complex models.

If you only remember 1 thing from this article, then please let it be this:  Don’t automatically accept the table structure coming from your source data.  You are now a data modeller and you need to make decisions on the best way to load your data.  Your source system  is probably not optimised for reporting (unless it is a reporting datamart) so please don’t assume that what you have got is what you need.

Many to Many Relationships

The many to many relationship type is not supported in Power Pivot.  This is a deliberate design decision that has been made as a trade off to ensure optimum database performance.  If you have data that is logically related with a many to many cardinality, there are modelling techniques you can use to solve the problem in Power Pivot that are covered in my many to many pattern article here.

UI Differences

There are a few differences in the Power BI Desktop/Excel 2016 UI (relationship view) and that of Excel 2010/2013.

Excel 2010/2013

The early UI has an arrow pointing to the lookup table (the one side of the relationship) and a dot on the many side.  This is unfortunate as the arrow is pointing in the opposite direction of the filter propagation.  This only exists in Excel 2010/2013 (and the early version of SSAS Tabular).

Power BI/Excel 2016

The UI has been significantly improved with Power BI Desktop and Excel 2016.  As you can see below, the 1 to many relationship is now clearly shown, and there is also a new arrow showing the automatic filter propagation direction.

image

One Active Relationship

It is possible to have more than 1 relationship between tables in Power Pivot, but only 1 can be active at a time.  An example of when you may want multiple relationships is if you have a Sales[Order Date] and a Sales[Ship Date] in your data table.

In this scenario (shown above in Excel 2013) you may want to join both Sales Date columns to your Calendar table so you can use time intelligence in your data model on both Order Date and Ship Date.

The active relationship is shown as a solid line (above) and the inactive relationship is shown as a dashed line (in this case it is highlighted in blue above).  The active relationship is used by default in all DAX measures however you can over-ride this default and use the inactive relationship (when needed) by using the USERELATIONSHIP() function.  Full details on this are covered in my article here.
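
As a sketch (your actual table and column names may differ), a measure that uses the inactive relationship could look like this:

Total Sales by Ship Date =
CALCULATE (
    [Total Sales],
    USERELATIONSHIP ( Sales[Ship Date], Calendar[Date] )  // use the inactive relationship for this calculation only
)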

Cross Filtering Behaviour

Power Pivot is optimised to work with one to many relationships and to automatically propagate filters (filter context) from the one side to the many side.  In all versions of Power Pivot for Excel, this is the ONLY type of filter propagation that is available.

Power BI supports bi-directional cross filtering behaviour (shown right below) as well as single direction (shown left below).

image

In fact bi-directional filtering is the default behaviour for Power BI Desktop.  There are many people (including me) that think this is a bad idea, as bi-directional cross filtering comes at a cost – there is an overhead of constantly cross filtering the lookup table based on the contents of the data table at times when it is actually not required.  Sure, if you have a simple model and you need this behaviour and you don’t know how to handle it with the many to many pattern, then turn it on.  But surely this should not be turned on by default.  In addition, if you have more than 1 data table, bi-directional cross filtering can cause circular references, causing further confusion to unsuspecting users.

I think Microsoft is trying to make Power BI more user friendly for the “lay user” however in this case I think Microsoft has made a mistake.  Do yourself a favour and turn off bi-directional cross filtering unless you explicitly need it.  To change it just double click on the arrow and set the cross filter direction to single.

Update:  Last week (17th Feb 2017) I noted that new models I built were single directional by default – it seems Microsoft has listened and changed the default behaviour.

image

Wrap Up

Hopefully this article has helped you understand more about how Power Pivot works.  Let me know in the comments below if there is anything I have missed.

Download Report from Power BI Service to Desktop

I love to celebrate successes with the Microsoft Power BI team, but I am also not shy in shouting out when I am disappointed.  Back on 25th Nov 2016 I blogged about the then new “Save PBIX from Power BI Service”.  At the time I was disappointed in the way it worked because it didn’t really add any value that I could see.  But that now seems to have all changed.  Today I was delivering a training course and we looked at the “Download Report (Preview)” feature.  I am very wary about telling training participants definitively how PowerBI.com works because I often get caught out.  The way it worked last week is different to this week.  So I said cautiously “This feature wasn’t very good last time I looked – but let’s see if it works now”.

image

I was very excited to discover that download report now works as I would expect.

Here is the Scenario that now Works

Create a desktop PBIX and upload it to the service, then make changes to a report (or add a new one), then download the changes back to the desktop.  Changes that you make in the Power BI Service in the cloud can be downloaded (and hence backed up) to a PBIX file on your desktop.  You are no longer locked into a scenario where you must author all reports in Power BI Desktop to maintain a backup.  This new feature allows you to share your reports with others using a group workspace, co-author a report together, and then download a PBIX version with all the new reports back to the desktop.

To test this out, I uploaded a PBIX file to PowerBI.com and then added a new report tab (shown below). I added a new custom visual (shown below) to the PowerBI.com service to give it a good work out – I wondered if this would be downloaded.

image

I then downloaded the PBIX to my desktop using download report and opened it in Power BI Desktop.  I was pleased and excited to find the desktop version was identical to the one in the cloud, including the Word Cloud custom visual.

image

It is worth pointing out a few things that still don’t work:

  • The Dashboards are not included in the PBIX.
  • You must upload a PBIX with a current version of Power BI Desktop for this feature to work. So if your PBIX file hasn’t been updated in the cloud for a while then now is the time to refresh it.
  • It doesn’t work for cloud created datasets, only with PBIX files first created on the desktop.

Good job Microsoft.  I hope to see these other features added soon.

Use Power Query to Compare Database Records

I was helping a user on community.powerbi.com this week. The user had a list of service desk tickets for multiple customers and the requirement was to identify when a customer opened a second new ticket within 7 days of closing a previous ticket.  This is not an uncommon scenario so I thought it would make a good blog article.

image

One of the challenges that new users have when using Power BI is to decide if they should use Power Pivot (DAX modelling) or Power Query (PQL shaping) to solve each problem.  And if the answer is DAX, then they also need to decide if it should be a measure or a calculated column.  There is no single correct answer to these questions and it always depends on the circumstances.  I generally recommend you shape your data properly prior to loading it into Power Pivot; that way the DAX formulas are easier to write and the reporting database will be more performant.  I looked at the options for this problem and decided to use Power Query to shape the data before loading.  Power Pivot and DAX are not really designed to do ordinal row comparisons, so this task is better handled with Power Query.

For this particular problem I thought I would produce a video that demonstrates one way in which this problem can be solved.  Here it is.

My Online Power Query Video Training

One of my new year’s resolutions was to batten down and create an online Power Query training course.  This has been on my to-do list for almost 2 years now and I am well under way.  The video above is a sample of my video course.  I start from the beginning and walk the viewer through the foundation skills they need to know to be a Power Query guru.  Towards the end of the course I will have a stack of videos like the one above where you can broaden your awareness of how you can use your new skills in real world situations.  If you would like to sign up and be notified once the course is finished (no obligation), then please enter your contact details in the form below.

DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are that you may not even be clear what time intelligence is, and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year”, etc. If you want to do any time based comparison using DAX, Power Pivot and/or Power BI, then this article explaining time intelligence is the right one for you.

Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

  • Same period prior month, quarter, year etc.
  • Same period next month, quarter, year etc.
  • Same period year to date compared with prior year, next year etc.
  • Rolling 30 days, 60 days, 90 days, 12 months etc.
  • etc. – there are many many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

image

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered more or less by the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

  • Territory[Country] = “Australia” – this comes from the Rows in the Pivot
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer $2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

  • Territory[Country] = “Australia” – this comes from the bar chart Axis
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

The Time Intelligence “Problem”

Let’s assume you want to compare total sales on a particular year vs prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

image

But doing it this way causes many problems, including:

  • There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
  • If you want to calculate the absolute change or % change year on year, you will need to hard code some formulas in cells next to the pivot table, and hence they can’t be reused in other visuals later.
  • If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy. Imagine a calendar table with a filter applied Calendar[Year] = 2003.  If you imagine what the filtered data model would look like “Under the hood”, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

  1. first “detect” the current filter context to determine what the “current” selected period is
  2. then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
  3. then perform a time shift to find a different period in time (as specified in the formula).
  4. Finally reapply filters on the data model for the time shifted period prior to calculating the final value.

Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – Custom and Inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover Inbuilt time intelligence in depth here because they are relatively easy to use.  See link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD(Calendar[Date],[Total Sales])
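
The custom version follows the pattern I break down line by line later in this article.  As a sketch (assuming a Calendar table with Year and Date columns), it looks like this:

Total Sales Year to Date (custom) =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( Calendar ),  // remove the calendar filters coming from the visual
        Calendar[Year] = MAX ( Calendar[Year] )  // same year as the current filter context
            && Calendar[Date] <= MAX ( Calendar[Date] )  // up to and including the last date in the current filter context
    )
)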

Both of these formulas return the exact same result, and in fact both use the same approach to calculating the result under the hood.  The only difference is the inbuilt version is much easier to write because you (the DAX author) are protected from the full syntax.

The end result (regardless which formula is used) looks like this in a Pivot Table.

image

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

  1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case July 2003
  2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
  3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
  4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

image

Line 2 (CALCULATE) is the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) are where the filtering occurs.  There are a few confusing things here.

  • Line 5 refers to ALL(Calendar) instead of simply Calendar.
  • Lines 6 and 7 seem to be evaluating against themselves – very confusing.
  • Line 7 starts with a double ampersand &&

Let me explain each line.

Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of CALCULATE always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, this means that line 5 is already filtered by the pivot table and hence the Calendar table only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL function therefore is to remove all filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now they have all been removed, it is time to add back the date filters that are needed.

Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

  • Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
  • Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), then it means “go and apply a new filter on this column in the data model”.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is 2003.  Line 6 therefore says “Go and apply a new filter on Calendar[Year] to be equal to 2003”.

Note that you can use any aggregation function in these formulas that does the job.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer, simply think about the need and then find the best function to use.

Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

image

The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few downsides however.  The first is that you don’t get to learn these very important concepts, and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  The second is that the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here exceleratorbi.com.au/power-pivot-calendar-tables/

Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot that you are using daily data by loading all data in any given week or month on the same date (eg last day of the month) and use inbuilt time intelligence however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the  path of writing custom time intelligence, you should read my advice about adding an ID column into a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/

Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Dates]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Dates],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (shown in bold), but they also use CALCULATE.  Now that you have an understanding of how the time shift works, it should be much easier to understand what is happening in these formulas.  Both of the functions shown in bold produce a table of dates that has been time shifted back by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the Calendar table and applies the time shifted dates as the new filter context before doing the calculation.  Once you have [Sales Prior Year] it is easy to write:

Change vs Prior Year = [Total Sales] - [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14, as well as every other function across the language, all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t already done so, why not sign up for my weekly newsletters at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

Download the DAX Reference Guide Using the Form Below

Data Modelling with OneDrive Source Files

Last week I shared how it was possible to easily create an online survey using Excel for O365.  Once such a survey is created, the data can be imported into PowerBI.com and used to create a simple report.  The biggest issue with directly connecting to OneDrive like I showed last week is that you can’t do any serious data modelling.  You can’t write any DAX measures or join other tables into the data model etc.  Not being able to build a proper data model with my data set last week created the following issues (I am sure there are more).

  1. I couldn’t easily create a sort order column for my “how long have you been reading” question (hence why I needed to prepend a letter A, B, C etc to get the answers to sort).
  2. Without data modelling, there was no easy way for me to cleanse the data, eg some people said they came from “United States” and others said “US”, but that didn’t work in my map (it had to be consistent).

image

  3. I couldn’t write proper measures such as COUNTROWS(Survey) to find out how many surveys there were.  Instead I had to rely on “Count of Name”.  If someone left the name field blank, it would not be counted.
  4. I couldn’t change the column names to be something more suitable for reporting – I was stuck with the actual survey questions in the visualisations.

All of these issues (and more) are solvable if you can simply build a regular data model, but this is not possible when you directly connect PowerBI.com to a flat table of data in the cloud.

This article explains how you can create a data model with a cloud based service (like OneDrive) and also how I then went about improving the reporting from last week given I could create a proper data model.

Connecting Power BI Desktop to OneDrive

The alternative to connecting PowerBI.com to the OneDrive file is to use Power BI Desktop instead.

  • First build a data model using Power BI Desktop connecting to the cloud copy of the OneDrive file.
  • Publish the Power BI Desktop file to the service in the cloud (after you have done your data modelling of course)
  • Configure Power BI cloud service to directly authenticate to OneDrive in the cloud so that it can directly access the OneDrive data without further configuration or gateways.  Power BI automatically refreshes the data for you approximately once every hour or so.

Let me demonstrate how to do this using the same survey from last week.  If you haven’t already completed the survey you can do so by clicking here and then see your survey response included in the reporting below.  You can read about how to create an Excel survey like this from the blog article last week.

Copy the URL to your OneDrive File in the Cloud

There are a few ways to do this but the way I did it was to copy the URL of my OneDrive XLSX file and then use that to import the data “From Web”.

Get the URL

The tricky bit is finding the correct web URL of your file on OneDrive.  The way to do this is to open the Excel file in Excel on your PC and extract the URL from there.  You can either open the file from OneDrive in the cloud or the copy on your PC (I used the one on my PC).

Once you have the file open in Excel, click on the File menu, then Info (1 below), click on the OneDrive Link (2 below) and then click “Copy link to Clipboard” (3 below).  This placed the link into the clipboard.

image

With the URL still in the clipboard, I created a new Power BI Desktop report using Get data, Web as shown below.

image

I then pasted the URL from the clipboard and manually deleted everything from the question mark to the right (highlighted below).

image

So I was left with the following

image

I then got the following complicated set of options for authentication. I took the last option to sign in to my Organizational Account, and signed in with my Office 365 credentials.

image

image

After authenticating, I was able to import the table of data into Power BI Desktop and Load as normal.

image

Load to Power BI and Configure Credentials

I will continue the process of building out my data model further down the page, but first let me complete the process to publish and configure the online credentials process.

First I saved the query (close and apply), saved the file and published it to my PowerBI.com account.  Once the dataset was in the cloud, I needed to configure my credentials for the dataset.  I clicked on Settings (1 and 2), Datasets (3) and then selected my dataset (4 below).

image

Because this is an online data source, I was able to edit the credentials as shown below.

image

After clicking Edit Credentials, I selected OAuth2 and signed in.

image

And I then got this confirmation.

image

Improving the Reporting Using Data Modelling

Back to the data modelling in Power BI Desktop.  The first thing I did was turn off data load for the table I had previously loaded.  This was quite easy. Right click on the table, then uncheck “Enable Load”.  This keeps the connection to the table but doesn’t load it in the current form.

image

The rest of this article explains how I changed the data to make it more useable in my final report.

Cleanse the Country Data

The next thing I did was create a data cleansing table for the list of countries. Ideally the survey tool would provide a list of valid countries to select from, however this is not available in the simple online Excel survey tool I am using.  Instead I created a substitution table where I can manually create substitutes for data that I wanted to change.

I created a new query with reference to the first.  Right click on the first query (1 below) and then select Reference (2 below).

image

I then deleted all columns other than the country column, removed duplicates and sorted the column.  Finally I set it to disable load.  It is now possible to see all the data points entered and identify the dirty data that was included (shown below).

image

I then created a manual mapping table to make it easier to control the data cleansing process.  To do this, I simply created a new manual table in Power BI Desktop as shown below (and also turned off data load).

image

Sorry to the nondescript North American if you happen to be Canadian, Mexican or of some other nationality.  I decided to map this to the USA.

Now I had the mapping table, the next step was to replace the original country data with the cleansed data. I created a new query referencing Table1, then joined on the country columns to the CountryCleanse table.

image

Using a Left Outer Join as shown below

image

I then expanded the new column to extract the cleansed version of the data.

image

This gave me a new column that contains the substitutes in the case there was a match in the cleansing table.

image

I then created a new custom column that uses this cleansed column if it exists, otherwise the original column.

image

image

I then deleted the other 2 columns and renamed my new column so that I had a cleansed copy of the Country.

Cleanse and Sort “How Long” Answers

In my survey question, I asked how long had the participant been reading my blog.  When the answers were placed in Power BI they didn’t sort based on length of time (they sorted alphabetically).  To fix this (using the original approach I blogged about last week) I was forced to add a prefix to force the answers to sort logically.  Using Data Modelling, it is possible to solve this problem properly.  For variety (in this blog), I decided to use a different mapping table solution. I first created a mapping table with the original survey choices, a new set of labels and a sort column as follows.

image

Then I loaded this mapping table as a lookup table to my data model and joined it on the Original column as the key.

image

Then to avoid confusion in the final report, I hid the original columns in both tables.  Right click on the column (1 below) and then select Hide in Report View (2 below).

image

Of course I sorted the Tenure column

image

Adding “All of the Above” into the other 3 choices

The last problem I wanted to solve was that many people chose “All of the above” when asked about their favourite topics.  When this was put in a visual (shown as 1 below) it was not possible to see the overall most popular item.

image

There are many ways to handle this problem.  The measure I wrote was as follows:

Count Most Useful =
COUNTROWS ( Data )
    + CALCULATE (
        COUNTROWS ( data ),
        Data[Which topics do you find most useful?] = "All of the above"
    )

The measure simply adds the count for each item and also adds the “All of the above” count as well (shown as 3 above), and I then simply filtered out the “All of the above” answer from the visual (shown as 4 above).

My report was starting to look good (see below) but I still wasn’t happy.

image

Ideally I should be able to show “what is most useful” and also “what do you want to see more of” in a single bar chart.  But the problem was that there were 2 different measures, and the “bars” were coming from 2 different columns in the data source.  This could not be solved without thinking differently about the problem.

I created a new table that has the 3 responses and loaded it to the data model.  I didn’t connect this table to any other tables in the data model.

image

I placed this new column of data as the Axis of my visual and placed the 2 measures as Values.  This then gave me a chart that looked like the one I needed, but of course there was no filtering, because the new “Types” table is not connected to anything in the data model.

This is important:  The Axis is part of the filter context, the Types table is being filtered by the visual, but this filtering is not being used in the measures I wrote.

image

So I changed the measures so they would “harvest” the value on the Axis to filter the results.

Count Most Useful =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( data ),
            Data[Which topics do you find most useful?] = "All of the above"
        )
)
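The companion measure for the "what do you want to see more of" question is not shown above, but it follows exactly the same pattern.  Here is a sketch only; the column name Data[What do you want to see more of?] is an assumption, so substitute the actual column name from the survey data.

Count See More Of =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[What do you want to see more of?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( Data ),
            Data[What do you want to see more of?] = "All of the above"
        )
)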

The final visual looks like this, which is much easier to read and understand.

image

And here is the final report

64 Bit Power BI Desktop with 32 Bit Office? Can do!

As many of you would already know, 32 bit applications do not always play well with their bigger siblings, 64 bit applications.

There are lots of "under the hood" files and applications that are designed to be either 32 bit or 64 bit – not both.  In the case of Microsoft Office, you must choose either 32 bit or 64 bit across the entire Office suite.  You can't, for example, have 64 bit Excel and 32 bit Outlook, PowerPoint etc. (read more about it here).  This is unfortunate because:

  1. The ONLY Office application that can benefit from 64 bit is Excel running Power Pivot.
  2. Many (most) IT departments do not install 64 bit Office (for the reason mentioned above – there are often many conflicts with 32 bit add-ins and applications).

So chances are you have 32 bit Office applications installed…
image

…but you want to have 64 bit Excel because you want to be a Power Pivot Ninja.

So what to do?  Well, the best thing is to try to get your system upgraded to 64 bit Office.  If that is not possible, then there is a workaround that I explain below.

Note: You must have 64 bit Windows and Admin Rights to proceed.  If you don’t have Admin Rights, your IT department may be happy to install 64 bit Power BI Desktop for you as an alternative to Office 64 bit.  Also note that this worked for me, but I can’t guarantee it will work for you.  Use caution and I recommend you create a system restore point before proceeding.

Power BI Desktop 64 Bit

It is possible to install Power BI Desktop 64 bit on a computer that has 32 bit Office installed.  There are a few things you may need to do to make it work properly (depending on your configuration), and I explain those below.  The high level process is:

  1. Install Power BI Desktop 64 bit
  2. Install the 64 bit data providers you need (such as the Access Database Engine) so you can import the data you need. Power Pivot relies on these data providers to connect to the relevant data sources.

Once you have 64 bit Power BI Desktop installed and running, you can then also optionally use Power BI Desktop as a local 64 bit server and direct connect to it from Excel 32 bit by following the instructions and template provided in my blog article here.

Download and Install Power BI Desktop

Installing Power BI Desktop is easy.  You can download it from the direct download link here or from the Power BI site here.  Whichever place you go, just make sure you download the 64 bit version.

image

After downloading the file, just go ahead and install it in the normal way.

Try to Import Data in Power BI Desktop

After installing, run Power BI Desktop and try to import some data (say from an Access file like this one).  When you try to import data from Access, you will most likely get an error message like this.
image

But don’t panic, just follow the instructions below.

Download the Data Providers you need

If you have 32 bit Office on your computer, then any data providers you have on your PC will also be 32 bit.  These will not work with 64 bit Power BI Desktop and you will need to install the 64 bit versions as well as retaining your 32 bit versions.

Standard Install Process

  1. Download the data provider you need (the 64 bit version of AccessDatabaseEngine.exe in this case, here)
     image
  2. Try to install the file.  If all goes well, you are done and you can start using Power BI Desktop.

However, in some PC configurations, installing 64 bit data providers when 32 bit data providers are already installed can be harder than you would think, because the installer may explicitly block you from installing them side by side.  If this happens, you can force the install of the 64 bit data providers by following the steps below:

Install in Passive Mode

  1. Copy the install file to a location on your PC that is easy to access from the command line.  I put mine in the C:\ root folder.
  2. Click the Windows button, type "run" and press Enter.
  3. Type c:\AccessDatabaseEngine.exe /passive into the Run box and click "OK".  Of course you should use the correct path to your file.
  4. The files will install in the background while suppressing warnings and errors.
  5. It's a good idea to reboot your computer after the install.

You may need to repeat this process for any other data providers that you need.  The easiest thing to do is just start using Power BI Desktop until you find you have a problem, then find and install the new 64 bit data providers you need to solve each problem you come across.

Time to Test

Once you have installed the new data providers, it is time to test it out.  Open Power BI Desktop 64 bit and try to connect to data coming from an Access Database.  You can download the Adventure Works Access database here if you need a sample to test.

Use Power BI Desktop 64 bit as a Server on your PC

Take a look at my blog post from a couple of weeks ago.  If you follow the instructions in that blog post, you will be able to open Power BI Desktop 64 bit, open a large PBIX workbook, and then use Excel to connect directly to Power BI Desktop as a server.  You can also migrate your existing Excel workbooks that are too big for 32 bit Excel into Power BI Desktop (File\Import\Excel Workbook Contents) and then start to use Power BI Desktop as a server (you will have to recreate your visualisations however).

Save PBIX from Power BI Service Under Delivers

This blog post was going to be about how excited I was to hear that there is now an option to download a PBIX file from the Power BI Service to your PC.  Unfortunately all that glitters is not gold.  This new option in the Power BI Service promises a lot but delivers little – let me explain.

save pbix updated

The Authoring Pattern

The general pattern I recommend for Power BI is:

  1. Author your workbook in Power BI Desktop
  2. Publish the workbook to the Power BI Service and share from there.

I have always recommended that you do not make any changes to your reports or datasets on the Power BI Service because:

  1. There is no way to backup the changes you make
  2. There is no way to download the changes back to your Power BI Desktop file.

I was hoping that this new feature solved this problem, but it doesn’t.

The Problems With This Feature

The first problem is that any PBIX files that I previously uploaded to the service cannot be downloaded (maybe because you need to upload from the latest version of Power BI Desktop – I'm not sure).  This problem seems fixable by simply uploading a new copy of the PBIX and overwriting the current version.

The second issue is that you cannot download a report created from scratch on an existing dataset in the Power BI Service.  If the report was created online, the download option is greyed out (shown below).

image

This limitation therefore means the new feature doesn't solve the 2 issues that existed in the first place.  It seems to me that the only real value of this new feature is that it allows other users to take a local copy of the PBIX file (which is good), and also allows you to recover the PBIX if you misplaced (or lost) the original.

I would like to hope that this is just the first step of a multi-step process towards making all reports authored in the Power BI Service downloadable.  It remains to be seen if this will be the case.

Edit:  A colleague of mine, Gilbert Quevauvilliers, tells me that if you do download a PBIX file, you must have the latest version of Power BI Desktop to open it.  It is a good idea to keep Power BI Desktop up to date at all times.

Excel Workbook Connection to Local Power BI Desktop

Today I am sharing an easy way to use Power BI Desktop as an SSAS Tabular server on your local PC.  I find increasingly that I am building data models in Power BI Desktop and yet I still want to use Excel to analyse the data some of the time.  If I load the PBIX workbook to the Power BI Service, it is easy to use "Analyze in Excel" to query the data model.  But there is currently no standard supported method to do the same thing using Power BI Desktop as a server (see my suggested idea to Microsoft at the bottom of this page for more about that).  Until then, we need to use a "hack" to complete this task.

I have previously shared a way to:

  • Open DAX Studio on your PC
  • Use DAX Studio to extract the Port Number of a running instance of Power BI Desktop
  • Manually enter this information into Excel to connect to Power BI Desktop and use it as a local SSAS Server instance
  • Query the data model that exists in the open Power BI Desktop file with a Pivot Table in Excel.

The problem with the above approach is that once the Power BI Desktop file is closed, the connection string becomes invalid.  Next time you want to use Power BI Desktop as a server, you must go through the entire process again, plus you need to alter the connection string that was previously created – hardly a scalable solution.

An Excel VBA Workbook Template to Solve This Problem

Local Host Workbook

Today I am sharing a tool that I built recently that completes the entire process automatically in Excel using VBA and Power Query.

Disclaimer:  I am providing this tool free of charge, without support, and do not warrant that it will work for you.  This is not supported by Microsoft and could stop working anytime.  If it works for you, then great – if not then it is unfortunate :-(.  Maybe you can deconstruct what I have done and make it work for you.  I built this using Excel 2016 and I have tested it on 2010 and 2013 and it works (Please make sure you have the latest version of Power Query installed).

You create a new blank Excel Workbook using my template (from personal templates), then click the “Refresh SSAS Connection” button.  The tool will then:

  • Detects an instance of Power BI Desktop running on your PC (there must be 1 and only 1)
  • Finds the port number and database name of the running instance
  • Modifies the existing connection string in the Excel workbook to repoint it to the currently running instance of Power BI Desktop.
  • If the workbook already contains pivot tables that were previously built using the same data source, they will simply reconnect and work as you expect.
  • If this is the first time you have used the template, you will have a blank Pivot Table connected to the Power BI Desktop data model.
  • Note: you should only have 1 Excel workbook and 1 Power BI Desktop file open at any one time (as they all use the same connection string).

I am not planning on covering in detail how the tool works.  I am making it available to use and modify as you see fit.  If you want to look into the VBA code and the Power Query to see what I did, then of course this is fine by me.  If you want to improve it and give me back the improvements under the same "share and share alike" rules, then please do so, as long as you leave my name and comments in the file untouched.

How to use the Template

You can download the template here.  I have put it in a zip file so that I can keep the same link but change the name of the actual file over time (to manage version numbers).

The file is a macro enabled Excel Template workbook.  You need to place it in your Personal Templates location.  To find this location, in Excel go to File\Options, then navigate to save (1 below) then check the location (shown in 2).  If you don’t have a current template location, you can set your own.  Note Excel 2010 has a different file location (Google it).

image

To create a new workbook that uses this template, in Excel go to File\New, select the personal templates option (shown as 2 below) and then select the template.

image

I would love to hear what you think and find out if this is a useful tool for you.

Let’s Ask Microsoft to Make this a Standard Feature

I was talking to Avi Singh and he suggested that I request this as a standard feature.  I have set up an idea that you can find and vote for here.  If we get enough votes for this feature, Microsoft may build out this capability into the core product(s) so it is easier for everyone and fully maintained for future releases.

 

Find Duplicate Files on Your PC with Power BI

Level: Beginners

If you want to learn new skills using a new tool, then you simply must practice.  One great way to practice is to weave the new tool into your daily problem solving.  If you have something meaningful to do with the new tool, then you are much more likely to be motivated to practice.  And the new tool I am talking about, of course, is Power BI.

Last week I showed how easy it is to use Power BI to help you track down large files saved in Dropbox so you could manage the overall space usage.  As a result of that article, Graham Whiteman posted a comment suggesting it would be a good next step to find duplicate files.  I think that is a great idea, so I decided to test it out on my PC.  Read on to see how I did it, and how you can do it too.

Create a Query to Fetch All PC files

I started a new Power BI Desktop file, then connected to my PC documents folder.

image

image

I immediately selected Edit query as shown in 1 below.

image

The only time you would immediately select Load (2 above) is if the data you are importing is already in the correct shape for Power BI.

The only columns I needed were the file name, date modified, attributes and path (shown below).  I multi-selected the columns I wanted to keep, then right clicked and selected "Remove Other Columns".

image

The next step was to extract the file size from the attributes list. To do this, I expanded the list of attributes (1 below), deselected all the columns and then reselected the file size (3 below).

image

Then I renamed the query (1 below) and changed it so it didn't load to Power BI, by right clicking on the query and un-checking the Enable Load option.

image

This created a query that linked to the PC and kept the columns of data needed, but didn't load anything to Power BI yet.

Create a New Query that Accesses the File List

The next step was to create a new query that references the File List.  I right clicked on the first query (1 below) and then selected Reference (2 below).  Note how the File List query is shown in Italics indicating that it won’t load to Power BI.

image

The next step was to merge this data with itself by going to the Home Ribbon and selecting Merge Queries.

image

In the Merge Queries dialog, I joined the new query File List (2) with the original query File List, so the data was joined to itself on 3 columns (the file name, date modified and file size) but not the folder path, as shown below.

join file list

The above steps added a new column to the query.  I then expanded the new column as shown below making sure to keep the original column name prefix.

image

Find The Duplicate Files

The second query now looked like this.  As you can see in the image below, the query returned all the files (name column) along with the folder paths from the query “File List” shown as 1, and a second column containing the folder paths from the query “File List (2)” shown as 2 below.

image

The next step was to get rid of all rows in this query where the 2 folder paths were identical.  Doing this is easy with a custom column.  I added a custom column (steps 1 and 2), and wrote a formula to return TRUE if the 2 folder paths were identical.

image

I then filtered out everything that returned a TRUE in the new column using the filter button as shown below.

remove matches

I then deleted this custom column as it was no longer needed.  I just right clicked and selected remove.

Format the Number Columns

It is very important in Power BI to set the number formats before loading the data.  Any numeric column that has a data type “Any” should be changed to a suitable numeric format (as shown below).

image

I did this, renamed the query to "Duplicates" and then selected "Close & Apply" to get the data into Power BI.

Time to Write Some DAX

Now the data is loaded, of course I could just drag one or more of the columns onto the Power BI canvas.  But remember, half the reason for doing this is to build some new skills.  So instead of dragging the Size column and creating an implicit measure, I wrote some DAX – it isn't hard to get started with such simple formulas.  Here's how to do it.

Select the Size column, go to the Modelling Ribbon and select New Measure.

image

The formula I wrote is as follows

File Size MB = SUM ( Duplicates[Size] ) / ( 1024 * 1024 )

image

Note a few things that I was able to do by writing this measure myself:

  1. I converted the units of the result from bytes to megabytes by dividing by (1024 x 1024).
  2. I gave the measure a more meaningful name “File Size MB”
  3. I was able to set the formatting to comma separated with 1 decimal place

And of course I practiced my DAX.

And the Results

I then simply added the file size measure, File Name, Folder Path and Second Folder Path to a table in Power BI, as shown below.

image

I then discovered I had around 9 GB of duplicate files on my PC.  I sorted the table by File Size descending and discovered that I had multiple identical copies of contoso.pbix.  It looks above like there are 6 copies of contoso.pbix, but this is deceiving – every copy of a file will find a match with every other copy, so 3 copies produce 3 x 2 = 6 rows in the table.  If you look at the Folder Path column, there are only 3 unique folder paths, hence 3 files.

The next thing I did was add a Tree Map as shown, with the file name in the Group section and File Size MB in the Values section.

image

To find out accurately how many copies of each file there were, I had to write some more DAX.  This formula is a bit more involved (intermediate DAX).

2016-10-31_120903

Let me explain this formula starting from the inside out.  There are 4 functions in this DAX formula and I describe their roles below; a sketch of the full formula follows the list.

  1. SUMX is an iterator.  It iterates over a table specified as the first parameter (VALUES in this case).  You can read more about SUMX here.
  2. The VALUES function returns a table of unique file names (in this case it is unique values in the column Duplicates[Name]).  So SUMX above will iterate over each file name in the name column.
  3. SUMX is iterating over a Virtual Table (VALUES).  The CALCULATE is required to force context transition.
  4. Then for each file name in the table (in 2 above), DISTINCTCOUNT will count how many unique folder names there are.
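The original formula was shown above as a screenshot.  Based on the description in the list, a sketch of the File Count measure would look something like the following – treat it as a reconstruction rather than the exact original (the Folder Path column name comes from the table visual earlier in the post).

File Count =
SUMX (
    VALUES ( Duplicates[Name] ),                              // one row per unique file name
    CALCULATE ( DISTINCTCOUNT ( Duplicates[Folder Path] ) )   // context transition, then count the unique folders holding that file
)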

I then added the new File Count measure to the Colour Saturation section of the Tree Map (1 below).  This does 2 things.  Firstly, it shows files with a high folder count in a darker colour, and secondly it adds the file count to the tooltip (visible when you hover the mouse over the visual).

image

And Now Some Fun

I’ve been looking for an excuse to do this for some time.  I want to find the fattest fish in my pond (aka most space taken by file name).  I went to visuals.powerbi.com and downloaded the Enlighten Aquarium custom visual.

app.powerbi.com/visuals/show/Aquarium1442671919391

I then imported the custom visual into Power BI Desktop

image

Then I copied my Tree Map visual (Ctrl-C, Ctrl-V), selected the copy and changed the visualisation to the Aquarium.  This visual shows the largest individual files regardless of location or how many copies exist.  I am not saying this is the best way to visualise data, but surely it is one of the most creative.

fish

Here is my final workbook canvas

image

For the purists out there, I wrote a new file size formula as follows.

final

 

The original formula I wrote double counts the file size when there are multiple duplicates.  The formula above is almost identical to the File Count measure I explained earlier.  The only real difference is the inclusion of MAX(Duplicates[Size]).  This is a "trick" to handle the fact that for each file name there will be multiple records in the data model.  Each copy of a file has exactly the same file size, so by selecting MAX I simply get access to that file size.  I could have used another aggregator (eg MIN or AVERAGE) and got the same outcome.
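The revised formula itself appeared as an image in the original post, so here is a plausible reconstruction based on the description above – a sketch only, and the exact shape of the original may differ.

File Size MB Fixed =
SUMX (
    VALUES ( Duplicates[Name] ),      // one row per unique file name
    CALCULATE (
        DISTINCTCOUNT ( Duplicates[Folder Path] )   // number of copies of that file
            * MAX ( Duplicates[Size] )              // times the file size, avoiding the row inflation caused by the self-join
    )
) / ( 1024 * 1024 )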

I haven’t shared the actual workbook here. The whole idea is for you do try this yourself so you get 3 benefits; more disk space, some practice with Power BI Desktop and have some fun.

Use Power Query to Manage Dropbox Space

Level: Beginners

I got this dreaded Dropbox email recently as shown below.

image

I needed to clear out some of the files I had loaded into Dropbox so I didn't have to upgrade my account.  It occurred to me that I could make this process a lot easier by using Power BI to quickly show me where my big files were located in Dropbox.  This post explains how I did it.  What I ended up with is a report like the one below that allowed me to drill down on the large sub folders to easily find my big files.

dropbox size

Note, there is a great tool called WinDirStat that you can download here that does this too – I use WinDirStat all the time. But I never want to miss an opportunity to do something with Power BI.

Process to Build the “File Space Usage” Tool

First I created a new Power BI report and connected it to my Dropbox folder.

image

You can of course use the same process on any other folder on your computer, or even the entire Hard Disk if you want.

I then imported the columns I thought would be of use, and loaded them into the data model.

image

I figured the Hidden flag and Date Accessed might be useful at some stage, so I brought those in too.

I then wrote some measures that I thought would be useful.

image

I encourage you to write your own measures rather than use the implicit measures created when you drag a column of values onto the report.  By writing your own measures, you learn how to write DAX, and that will help you become a Power BI ninja.
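The measures I actually wrote are shown in the image above.  Purely as an illustration of the style (the table name Dropbox and the Size column below are assumptions, not taken from the screenshot), measures of this kind are all that is needed:

Size MB = SUM ( Dropbox[Size] ) / ( 1024 * 1024 )    // total size of the files in the current filter context, in megabytes
File Count = COUNTROWS ( Dropbox )                   // number of files in the current filter context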

The last thing I did was to create a report that made it easy to see where my big files were located and find out what they were.

image

I have loaded a short 4 minute video that shows how quick and easy it is to do this from scratch.

What Obtuse uses have you found for Power BI?

I would love to hear from others about how they are using Power BI in ways that are not immediately obvious.

Who Needs Power Pivot, Power Query and Power BI Anyway?

Level: Beginners

One of the great challenges Microsoft has faced with its “new” suite of Self Service BI tools (particularly Power Pivot) is that most people that could benefit from the holy trinity (Power Pivot, Power Query and Power BI) don’t even know these tools exist, let alone how the tools can help them succeed in their jobs.  The situation is definitely getting better as Power BI starts to get a presence in the market place, however I still talk to people who have heard of Power BI, but have no idea what Power Pivot or Power Query are, and what’s more they don’t know why they should care.  I personally believe a big part of the awareness problem is that Microsoft insists on shipping Microsoft Excel with the Power Pivot plugin disabled.  There is no reference to Power Pivot when you excitedly receive your brand spanking new version of Excel 2016 – what a marketing opportunity gone begging!

image

I have been an Excel nerd for 30 years.  There is nothing I used to enjoy more than installing a new version of Excel and clicking through every menu item to find something shiny and new that would make my life easier.  By not turning on the Power Pivot menu by default, Microsoft is missing one of the best silent selling opportunities it has for this fabulous addition to Excel.

Given there is no “pull through” on the menus, many people don’t know what these products are or why they should care.  I am often asked by people “who can benefit from these tools?”.  This post sets out to explain who can benefit and why.  Note when I say “who can benefit”, I am not talking about “consumers of reports” here, I am talking about “authors of reports”.  It is clear that people that consume quality reports and analysis will benefit, whatever the tool.  This article is focused on the benefits to those people that are responsible for building the reports and analysis that others will end up consuming.

Power BI

Who can benefit from Power BI is probably the easiest to understand.  The product is well marketed and has a clear role to play.   Power BI is a complete self service BI tool.  It is designed to bring business intelligence capabilities to the masses instead of the elite (e.g. instead of just highly skilled IT MDX programmers).  Rob Collie wrote a good article last week about the democratisation of BI tools.  Power BI will add value to people who:

  1. Have problems sharing reports with others because the file size is too large.
  2. Need to share data with people on the go, that maybe only have a Tablet or a Mobile phone.
  3. Have large data sets that can’t be managed in traditional Excel.
  4. Are spending too much time each week/month manually maintaining reports with new source data and/or new visualisation requests.
  5. Can’t get the (timely) support they need from their IT department using traditional Enterprise BI tools.

Power BI is great because it puts capabilities across the end to end BI stack into the hands of end users (authors), including:

  1. Extraction of data from the source (using the Power Query engine)
  2. Transformation of that data into a new shape that is optimum for BI reporting and analytics (Power Query again).
  3. Data modelling capabilities, so you can turn the raw data into business meaningful insights (using the Power Pivot engine).
  4. A reporting and analytics front end allowing you to build reports to visualise your data (Power BI Desktop and Power BI Service).
  5. A fully integrated cloud based delivery framework so you can easily share with anyone over the internet (Power BI Service).
  6. A full set of Mobile applications across the major operating systems (Power BI Mobile).

Notice how steps 1 and 2 use Power Query, and step 3 uses Power Pivot.  So if you want to learn about Power BI, you really need to learn about Power Pivot and Power Query too.

Power Pivot

Conversely, Power Pivot is the hardest to understand – i.e. it is the hardest for individuals (potential authors) to understand "what does Power Pivot do for me and why do I need it?".  I have had people enquire about Power BI training courses who were not interested in Power Pivot or DAX*.  But the truth is, if you want to be able to write your own reports in Power BI, you really need to learn at least some basic Power Pivot skills.

Power Pivot is a data modelling tool.  It is the underlying reporting engine that enables Power BI and Modern Excel to deliver those modern funky reports that can help you succeed in business.  The Power Pivot engine allows you (the report author) to take your business knowledge and configure the reporting tools so that Power BI and Excel pivot tables can be used to find and report on insights in your data.

Most business users have never heard of the term “Data Modelling” before, and the reason for this is quite simple – it has always been the IT department that has been responsible for data modelling.  Power Pivot brings the power of data modelling and puts it in the hands of competent business/Excel users.    An example of data modelling will make it easier to understand.

Example of Data Modelling

Consider a scenario where you download sales data from your company transaction system and it looks something like this in a spreadsheet.

image

You can see that the sell price and the cost price information exists in the table above.  But there is nothing about the $ Margin per product and nothing about the % Margin, let alone insights like Sales Year to Date, Top 3 selling products, Fastest growing product etc.  In a traditional Excel world you would simply write formulas in your spreadsheet(s) to enhance this raw data and extract the additional insights.  The trouble with this approach is that all these formulas are only ever "one off".  You write them for this report, and then when you create another report later, you have to write the formulas again.  Power Pivot handles this problem in a different way.  Power Pivot is a data modelling tool, and instead of writing formulas in your final reports, you write them "under the hood directly in the Power Pivot engine" as part of the data model.  You build the "rules" that describe how to calculate $ Margin and % Margin once and only once, directly in the Power Pivot engine.  Once you have created the rules for each insight (e.g. $ Margin, % Margin etc), it is forever available for you (and others) to use over and over again in any report, chart, Pivot Table or any other visualisation you can think of.  Never again do you have to write/copy a new formula every time you create a new report.
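To make the idea concrete, here is a minimal sketch of what such "rules" (measures) could look like in DAX.  The Sales table and the Sell Price and Cost Price column names are assumptions for illustration only, not taken from the screenshot above.

Total Sales = SUM ( Sales[Sell Price] )
Total Cost = SUM ( Sales[Cost Price] )
Total Margin $ = [Total Sales] - [Total Cost]
Total Margin % = DIVIDE ( [Total Margin $], [Total Sales] )

Once defined in the data model, any of these measures can be dropped into any pivot table, chart or Power BI visual without rewriting the logic.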

*What About DAX?

Data Analysis Expressions (DAX) is the formula language of Power Pivot.  DAX is very similar to the Excel formula language yet there are also a lot of differences that you will have to learn from scratch.  Simply put, if you want to learn Power Pivot (and/or Power BI), you will need to learn at least some DAX as well as lots of other things about how the Power Pivot engine works.

Who needs Power Pivot and why?

People that need this tool are typically Microsoft Excel users/report authors that analyse or report on data, particularly when the following conditions apply:

    1. There is a lot of data – enough to stretch the capacity of traditional Excel (file size, re-calculation speed etc).
    2. If you use Pivot Tables a lot to analyse your data.
    3. If you are writing a lot of VLOOKUP (or INDEX/MATCH) formulas to join data from different tables so you can analyse in a Pivot Table.
    4. If you have had to create bespoke reports in traditional Excel, but then have to spend hours rebuilding the report when asked for a different view of the data (e.g. you have a report that shows sales by half year, and then you are asked to produce the same report by month – and have to start again almost from scratch).
    5. Anyone that wants to start using Power BI for anything more than the most simple aggregation reports.  The “out of the box” capabilities of data modelling in Power BI are limited, and you will quickly realise that you need to learn some new skills (Power Pivot/DAX) to be able to leverage the strengths of Power BI.

Power Query

Power Query is a user friendly ETL (Extract, Transform, Load) tool.  Unfortunately Microsoft muddied the waters by renaming Power Query “Get and Transform” in Excel 2016, and “Get Data” in Power BI Desktop.  That aside, Power Query is used to:

  1. Extract:  Connect directly to the data source and ingest the data (into Power BI or Excel) so you can work with it.  It retains a connection to the source so when the source data is updated you can automatically “refresh” the data without having to go through the import process again from scratch. It is amazing.
  2. Transform:  You can clean and reshape the data so that by the time you are ready to use it, it already is in the format you need instead of the format you are given.  Again, you do this only once, and then you can automatically refresh when the data is updated later.
  3. Load the data directly to the place where you want to use it.  The end location for the data will typically be inside Power Pivot or Power BI, but it is also an invaluable tool for use with traditional Excel.

Don’t think of Power Query as simply a tool for self service BI reporting.  I have used Power Query to automatically audit information (eg XLSX files, csv extracts from other systems etc) and produce exception reports directly in Excel.  Once you understand what Power Query can do, you will start to realise all the ways you could use it to make your life easier.

Who needs Power Query and why?

People that need this tool typically are trying to solve the following problems.

  1. They regularly receive file extracts from someone (or some system) and need to manually manipulate this data before they can use it.
  2. They want to use Power BI/Power Pivot but can’t get the IT department to shape the data they need prior to loading it into Power BI/Power Pivot.
  3. People that are doing repetitive tasks such as matching invoices on a remittance advice against a bank statement (or similar) – Power Query eats this kind of work.
  4. Anyone that is given data in the wrong shape (example below).  And yes I quickly used Power Query to un-pivot the data in the blue table so it looked like the green table.  It is worth getting and using Power Query just to get this un-pivot feature alone!
    image

Wrap Up

Hopefully this overview has helped demystify how Power Pivot, Power Query and Power BI can help you in your job.  If you would like to learn more about Power Pivot, you can read my book “Learn to Write DAX“.  If you want to learn more about Power Query, I recommend Ken and Miguel’s book “M is for Data Monkey“.  If you live in Australia, you can attend one of my live training courses where I cover what you need to know about Power Pivot, Power Query and Power BI.

Query Dependency View in Power BI Desktop

I have been on holidays taking a much needed break from work, so it has been very quiet on my blog over the last few weeks.  But I am back now!  I woke up this morning to a very pleasant surprise – Microsoft has delivered the long awaited Query Dependency View in Power BI Desktop.

“What is this and why are you so excited Matt” I hear you ask.

What is Dependency View

Simply stated, it is a “table view” of all of your queries showing the linkages between the tables/queries.  Now if you are simply loading data directly into the data model with no transformation, then this really doesn’t add any value.  However if you are doing a lot of transformation using staging queries etc, then this is a must have tool.

How to Access Dependency View

Download and install the latest version of Power BI Desktop.  Open a workbook that has data loaded, and go into the edit query mode.  Do this via Home, Edit Queries, Edit Queries as shown below.

image

Then click on View, Query Dependencies.

image

What it Looks Like in Action

The best way I can demonstrate why I think the Query Dependencies view is required is to show you the Dependencies View from a workbook that I developed for a client last year.  As you could probably imagine, this workbook was very difficult to manage without a view like this.  What I ended up doing was to develop my own dependencies view using Microsoft Visio.  This took quite some time and effort, and thankfully is no longer required.

image

It is worth pointing out that this workbook was not a reporting tool using Power Pivot/Power BI, but it was an auditing tool.  It was used to import data from many spreadsheets, compare this data against master data, and identify data errors and exceptions.  This tool was built using lots of interim queries with lots of cross dependencies (as you can see).

The Query Dependencies view is still not available in Power Query for Excel; however, it is easy to import your workbook into Power BI Desktop to immediately leverage the new feature.

Improvements Still Needed

So this is version 1 of this view, and it is simply great.  There are some things that still need improvement in my view, including:

  • There is no maximise button for the window.
  • You currently can’t reposition the tables on the screen manually
  • It could benefit from some user controlled colouring capabilities to highlight different types of tables, such as those that are loaded into the data model etc.
  • Deliver it in Power Query for Excel of course.

But having said that, it is still great now so get to it and start using it.

Top 10 Tips for Getting Started with Power BI

Level: Beginners

top 10 tips

I really love Power BI, and I have learnt so much over the last 12 months that sometimes it is easy to forget the challenges I had in getting started.

Today I am sharing my top 10 tips on how to get started with Power BI.

Build Your Reports in Power BI Desktop, Not the Service

One thing that can be confusing is that there are 2 ways you can build reports – Power BI Desktop and Power BI Service.  My first tip is that you should always use Power BI Desktop.  Reasons include:

  • There are more configuration capabilities in Power BI Desktop
  • You can upload changes made in Power BI Desktop to the Service, but you can’t download changes made in Power BI Service to the desktop.
  • There is currently no way to back up your changes in Power BI Service, but you can back up Power BI Desktop yourself.

Don’t Try to Replicate What You Do in Traditional Excel

Power BI is not the same as Excel, and hence you should not start out trying to simply replicate what you currently do in Excel.  The closest thing in Excel to Power BI is the Pivot Table (probably Pivot Charts actually).  The key similarity here is that a Pivot Table is an aggregation and summarisation visualisation tool – this is exactly what Power BI is too.  Yet there is no Pivot Table object currently available in Power BI.  The closest thing to a Pivot Table in Power BI is the Matrix, yet if you start using the Matrix with the view of replicating a Pivot Table, then you are bound to be disappointed.  Hence you should explore the other options (to your great delight).  See the next tip.

Explore the Visualisations to Understand Capabilities

Instead of replicating what you do in Excel, I recommend that you explore the various visualisations to see what each of them does.  There are lots of new visualisations that don’t exist in Excel, and some of these are very effective in helping you communicate the insights in data.  A great place to start is the Tree Map or a Column Chart.  You can get these visualisations to behave just like a Pivot Table in Excel.  Just place a number of summary levels into the visual (see the Axis section in the image below), then you can drill through the visual just like a Pivot Table.  It is a great experience.

drill

You should also check out the Custom Visuals Gallery for some of the great free tools that have been shared with the community by industry leading experts.

Understand the Purpose of Your Report, Then Find a Way to Build It

A common mistake is to start out thinking that you should simply replicate what you have previously built in Excel.  As covered above, this approach will limit your thinking and make the journey so much harder.  Instead I recommend you start out by thinking about the purpose of your report.  You need to be clear about "what" you are trying to communicate, "how" the user will consume the information, and what business decisions will be made with the data.  If you can write down a couple of sentences about the purpose of your report, that is even better.  Once you are clear on the purpose, then set about finding the best visualisations and techniques to deliver on that purpose.

Assume Your Current Data Shape is Not Ideal for Power BI

Many Excel professionals make a fundamental mistake by simply loading data from their current source systems into Power BI as is.  The most common issue I find when helping people with their DAX problems is they don’t have a good data structure.  There is a big difference between transactional business systems and BI reporting tools.  If you want your reporting tools to be easy to use, easy to enhance and to perform well, then you really need to understand the data structure needed by BI Reporting tools.  I wrote a separate article about data shape here.

Assume Calculated Columns are Not What You Need

Self-taught Power Pivot/Power BI users from an Excel background tend to make the mistake of writing far too many calculated columns.  I have hardly ever seen a calculated column from a self-taught user where it was "the correct decision" – in almost all cases the calculated column should have been written as a measure.  I cover this topic in a lot more detail here.  I generally recommend (to Excel users) that you assume your formulas should be written as measures unless you know for sure why a calculated column is needed.  A small illustration of the difference follows.
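As a tiny illustration only (the Sales table and column names below are assumptions, not from a specific example in this post): the first formula is a calculated column, computed row by row and stored in the table, while the second is a measure, evaluated on the fly for whatever filter context the visual provides.

// Calculated column, created in the Sales table – the result is stored for every row
Margin $ = Sales[Sell Price] - Sales[Cost Price]

// Measure – nothing is stored; it is calculated at query time for the current filter context
Total Margin $ = SUMX ( Sales, Sales[Sell Price] - Sales[Cost Price] )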

Learn How Cross Filtering Works, and How to Turn it Off

Cross filtering is one of the great features of Power BI, and you should learn how it works so you can use it to your advantage.  However in my experience the default cross filtering behaviour is not always the best experience for the user.  In a recent release, Microsoft deployed a feature that allows you to turn off (or change) the cross filtering behaviour between visualisations.

Once you select a visualisation (1 below), the “format” menu (2 below) appears at the top of the page.  From there you can select “Edit Interactions” (3 below) to change the default cross filtering behaviour.  It is easy to work out how to do it, so just give it a go and learn by doing.

image

Load Content to be Shared into a Group Workspace

If you intend to share data with others, make sure you load the content into a Group Workspace right from the start. I have found it is much easier to keep control of data that you intend to share if you do this.  Of course at times you may be working on something in your own Workspace and want to share it, and that is fine to do.  However if you are starting out with the intention of sharing data, then the Group Workspace is the way to go.  I cover the topic of workspaces and sharing in some depth here.

Learn to Write DAX

At some time early in your journey you will discover that there is a limit to what you can do using the automated summarisation techniques that come with Power BI (and Power Pivot).  Data Analysis eXpressions (DAX) is the language of Power Pivot used in Power BI.  DAX has a lot of similarities with Excel formulas however there is a lot to learn and it is not always intuitive.  If you want to turbo charge your Power BI journey, you really do need to do some formal learning.  I recommend you read my book “Learn to Write DAX” as a cheap and easy way to make rapid progress.   Many Excel professionals tell me that they wish they started with my book first, and it would have saved them a world of pain.  There are many other great books available too.  I keep a curated list of the best books here.

Learn to Use Power Query

Power Query (get data) is much easier to learn than DAX.  The user interface is very intuitive and using the menu items in the UI will normally give you excellent transformations with very little knowledge or previous experience.  You should explore what you can do, and go in with the assumption that “there has to be a way to do this”.  Once you get the basics down, you should consider learning some more advanced techniques from books like M is for Data Monkey.

Conditionally Refresh Queries in Power BI

There was a great new feature that snuck unannounced into the July release of Power BI Desktop.  It was only because of a tweet by my colleague Imke Feldmann from www.thebiccountant.com/ that I was even aware it was there.

Include in Report Refresh

When you open the Query Editor in Power BI Desktop, you will see your list of queries in the Queries pane on the left hand side.  Simply right click on any of the queries to see the new feature (shown as 2 below).

image

You can see the new check mark “Include in Report Refresh” which obviously can be toggled on or off (it is on by default). This is a step towards partitioning, which is available in SSAS Tabular.

So What Can I Use This For?

As soon as I saw this, I thought of an immediate use case that will add a lot of value for me (maybe you too).  I have a client that I complete reporting for on a weekly cycle (I only process the data once per week).  I import daily extracts of data into a data warehouse (once each week) and then provide weekly reporting and analysis tools.  I also have some business users that need daily reporting on a subset of the data – but I only process the data on site once per week.

The Old Process Was to Load a History Table

The process I have used so far to solve this problem is as follows:

  1. Prepare a "history table" (illustrated as 1 below) that has sales going back to the start of the year (up until a point in time).  This history data is an Excel table in a workbook; I refresh it and then store it on the client file server.
  2. I then have a daily sales report (2 below) that refreshes off the daily file extracts (3 below) that get placed on the server each night.  The client just opens the workbook (2 below) and clicks refresh – it then cycles through the new files and imports the data needed.
  3. The Workbook (2 below) also processes the history file (1) and appends the historical data to the new data to create the daily sales report needed.

image

The Benefit of this Approach.

The main benefit of this approach is that you don't need to load data from hundreds of individual files every time.  Power Query is very capable of loading data directly from files, however this approach can be slow if you have lots of files and the files are very large.  In my case, the files have around 50,000 rows each, and there is one file per day.  To load data from 200 files like this would likely take 30 minutes or more.

There Are a Few Problems With This Approach.

  1. The longer I leave it to re-create the history table, the slower the daily sales query becomes, as it has to cycle and load more and more daily files to extract the few product lines needed for the report.
  2. There is a bit of effort to re-run the history table every couple of weeks.  I then also need to reset the daily queries to only bring in data that is after the data in the history table.  I have parameter tables set up to do this, but there are still a few manual steps.
  3. I then of course need to copy the additional history file to the server each time I create an update.

Enter the New “Include in Report Refresh” Feature

Given the above, I immediately saw the opportunity to replace my entire history table process with a new history table directly inside the main Power BI Desktop file. The new pattern that I will use is as follows.

  1. Add a new history table directly into the workbook.
  2. When I refresh the workbook against my data warehouse, I will switch on the “Include in Report Refresh” menu, refresh, then turn it off again.
  3. I will modify the daily refresh Query to extract the last sales date from the history file, and to start the daily refresh from that date automatically. Given I know the history table won’t change (because refresh is disabled), there is no chance that the queries will execute in the wrong order and muck everything up.
  4. Send the workbook to the client.

The changes above will make this process easier and faster to maintain, and also faster on average to refresh every day.

And this Feature Could be Even Better

One extra feature I would like to see from Microsoft is to allow this setting to be controlled with a parameter from within the Power Query formula language (M).  For example, if an error was detected (using try) because the query failed to connect to my data warehouse, the report refresh for that table could be turned off automatically.  Now that would be something!  Please vote for that idea here.

Measures on Rows – Here is How I did it

Level: Intermediate

You may or may not be aware that it is not possible to put Measures on rows in a Matrix in Power BI. But I came up with a trick that makes it possible, so read on to find out how.

Measures Can Only be Placed on Columns

First the problem. The only way that you can use the Power BI Matrix visualisation (at this writing) is to place the measures on the columns as shown below.  You can then take any column of data from your data model (typically from a Dimension/Lookup table) and place that on rows in the Matrix.  So you end up with this.

image

This limitation does not exist in an Excel Pivot Table.  As you can see below, it is possible to flip values between columns on rows from the Pivot Table Fields List.

measures on rows

Old tricks for New Purposes

When I was thinking through this problem, my first intuition was to use a SWITCH statement somehow.  SWITCH statements can be used to create a Switch measure (like the ones I used in this blog post here back in 2014), and I figured this could be part of the solution.  And then I remembered another trick I learnt from Rob Collie – using a disconnected table to feed a measure – and I knew I had a solution.

So in summary, I wrote a Switch measure that will morph into different measures when filtered, and then used a disconnected table to pass a filter to that measure.

Write a Switch Measure

The Switch measure is pretty easy to write as follows:
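The original formula was shown as a screenshot, so the version below is a sketch of its general shape rather than the exact original.  The measure names for IDs 1 and 2 are assumptions; Total Margin (ID 3) and Total Invoices (ID 4) are referenced later in the post, and [Selected Measure] stands for the harvester measure described further down (you could equally inline MAX ( MeasureTable[ID] )).

Measures for Matrix =
SWITCH (
    [Selected Measure],    // the harvester measure – returns an integer ID
    1, [Total Sales],      // assumption
    2, [Total Cost],       // assumption
    3, [Total Margin],
    4, [Total Invoices]
)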

The way this measure works is that it takes an input in the form of an integer, and then depending on that number, it will return the value of the appropriate measure as the result.

Then Create a Helper Table

The next step is the secret sauce to this trick.  I created a disconnected table with the sole purpose of passing the correct filter values (the integers) to the Switch measure.

I used the “Enter Data” feature in Power BI to create a list of the measures I want to put on rows in the Matrix.

image

Note that the ID column is also the sort order of the measures, and the ID column also matches with the Switch measure numbering scheme.  Once the table is created, I added it to the data model as a disconnected table – no need to join it to any other tables in the data model.

I then set the sort order of the Measure column as follows:

image

Write a Harvester Measure

The next step is to write a harvester measure to extract the selected filtered value of the table.  This measure is simply as follows:
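The harvester formula was also shown as a screenshot; based on the description that follows, it is simply the MAX of the ID column (the measure name here is an assumption).

Selected Measure = MAX ( MeasureTable[ID] )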


The measure above will return the largest integer (ID) in the current filter context.  If there is a filter (eg a slicer) on the Measure column and the user has selected "Total Margin" in the slicer, then there is only a single row visible in the filter context, and hence the MAX of the visible rows is 3.  I would of course get the same result if I used MIN, AVERAGE, SUM or indeed even VALUES in my harvester measure.

Filter context is one of the hardest things for new users to understand properly.  I explain filter context in detail (and everything else you need to know to be great at using Power Pivot and Power BI) in my book “Learn to Write DAX”.

You can see the harvester measure in action below.  I have placed the column from the MeasureTable into a slicer and the harvester measure into a Matrix.  When I click on the slicer, the harvester measure updates to show the ID of the selected filter.

slicer

Add the Measure Column to the Matrix

There is more than 1 way to filter a table of course.  In the image above I am using a slicer, but I can also use the rows on the Matrix to provide filter context.  When I place the Measure column from the MeasureTable onto rows in the Matrix, you can see below that the harvester measure is correctly returning the value of the ID column to the visualisation.

image

Pulling it All Together

In order to pull this all together, I replaced the harvester measure with the Switch measure I wrote earlier, and put the Categories column from the Products table onto columns in the Matrix.

image

But there are now 2 problems.  The first problem is that Total Invoices is shown in currency format.  One limitation of this trick is that all the measures must have the same number formatting.  If you would like to see an optional "Alternative Format" parameter for the SWITCH measure, please vote for that idea here.

The second issue is that the total row is actually showing the total invoices and not the total of all the measures.  This makes sense of course because the grand total row is not filtered.  As a result, the MAX of ID is 4 and hence the SWITCH measure is returning the value of the measure that maps to ID 4 (Total Invoices).

To fix these 2 problems, I removed Total Invoices from my table and rewrote the SWITCH statement to correctly handle the grand total row.
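The rewritten formula is not reproduced in the text, so here is a sketch only of one common way to handle the grand total row: test HASONEVALUE and return a blank (or a deliberately chosen measure) when more than one ID is in the filter context.  Whether this matches the exact fix used in the post is an assumption.

Measures for Matrix =
IF (
    HASONEVALUE ( MeasureTable[ID] ),
    SWITCH (
        MAX ( MeasureTable[ID] ),
        1, [Total Sales],    // assumption
        2, [Total Cost],     // assumption
        3, [Total Margin]
    ),
    BLANK ()    // assumption: show nothing on the grand total row
)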

And there you have it, Measures on Rows in a Power BI Matrix.

image

Top Tips for Sharing Content Using Power BI

I have learnt a lot about sharing Power BI content over the last year since Power BI hit the mainstream.  A lot of my learning has come from trial and error, and there are lots of things I wish I knew before I started.  Today I am sharing my top tips about how you can share, and when you should use each of the various methods.

It is not easy to categorise the sharing approaches into logical buckets and there is quite a bit of overlap.  Hopefully my explanation below will make it easier to understand the options.

PBIX Workbooks

The most basic way to share your Power BI content is to simply share your PBIX workbook (created by Power BI Desktop) using traditional methods.  There are a few reasons why you might want to do this.

Reasons you might want to do it:

  • Everyone can download the software and use it for free (Windows users – sorry Mac boys and girls)
  • It has lots of new visualisations you can't get in Excel
  • You don't need your IT department to sign you up to the Power BI Service to get started (although you do need Admin Rights to install)
  • It is no harder to share a PBIX workbook than a Power Pivot Excel workbook.

Things that might turn you off:

  • Workbooks can get very large – hundreds of MB, so they can be hard to share if they are that big.
  • There are no Dashboard capabilities until you publish to the Power BI Service.
  • You can't stop sharing – once you send the workbook, the other party has it (not so with the Service).

So while you can do it this way, sharing via the Power BI Service is probably a better way to go.  The Power BI Service was built for the explicit purpose of sharing, so why not use it – it's free to get started.

My Workspace

Once you sign up to the Power BI Service, the default location where you will publish your workbooks is the “My Workspace” area.

image

  • The workspace area is a place for you to store your Datasets, Reports and Dashboards.
  • You are the only person that can see the items in your own "My Workspace" (unless you explicitly share material)
  • Items that other people share with you will also be visible in your own “My Workspace”, but with a “shared with me” icon.

Sharing from My Workspace (use caution)

There are 3 ways you can share from My Workspace.  These methods of sharing are also available in other areas of Power BI (covered further down the page).  While you “can” share this way from My Workspace, be sure to read up on the downsides of sharing this way and consider the other options further down the page.

Here are the 3 options to share from My Workspace.

Share a Dashboard from My Workspace

Click on the ellipsis next to the dashboard to bring up the sharing menu.  You can share to individual email addresses or to Active Directory Groups.  It is possible to securely share a dashboard this way with people internal to your organisation (email domain) as well as with people outside of your organisation.

image

Publish to Web from My Workspace

Sharing this way allows you to make the material publicly visible over the web.  You can get an embedding code to publish the material via a web page (see this example in a recent blog) or you can get a link that you can send to anyone. But be careful!  Even if you send the link to a single person, the material is available to anyone that has the link (or can find it, hack it etc) – no password required.

image

Create a Content Pack from My Workspace

A Content Pack is a bundle of Datasets, Reports and Dashboards that can be distributed to defined users in your organisation.  When you create the Content Pack, you can specify individual users or groups of users via Active Directory.

image

If someone wants to access data shared with them as a Content Pack, they simply click "Get Data" and select the content pack from "My Organisation".

image

The benefit of sharing via Content Packs is that you can bundle relevant material and share with only those that need it.  They can choose to access it only when they want it.

There are some downsides of sharing from My Workspace

Here are the pros and cons of sharing this way from My Workspace.  Remember it is possible to share using these methods from a Group Workspace too – more on that below.

Pros:

  • It is free for the first 2 options, but you need a Pro License to create a content pack (option 3).
  • It is easy to share very large workbooks with other users as long as they have an account too.
  • You can share with Mac users (via browser)
  • You can share with Mobile users (there are apps and browser solutions)
  • You can stop sharing the content at any time by going back into the share options in the service and removing the sharing approval.

Cons:

  • It is difficult to easily spot the dashboards you have shared from the My Workspace Dashboard view.
  • My Workspace can get very crowded with ad hoc reports and dashboards for your personal use, and these can then get in the way of shared material.
  • If you have a role to share material within an organisation (eg Business Analyst sharing material), then what will happen when you go on leave?  No one else can access your workspace.
  • There are better ways of doing it by using Group Workspaces.

Group Workspaces

If you are serious about sharing material, then you really should start using Group Workspaces. Group Workspaces are a Pro Licence feature, but they are almost a must for any organisation serious about sharing content via Power BI.

The benefits of using Group Workspaces vs My Workspace include:

  • More than one person can publish, edit and manage all content within a Group Workspace.
  • One or more people can administer user access to the Group Workspace.
  • If the original author is not available (ie they are on holidays or they leave the company) then someone else can take ownership of the material without starting again from scratch.
  • You can have as many Group Workspaces as you need, and these can be based on any group categorisation that works for you and your organisation.
  • When you have your shared material in a Group Workspace, there is no confusion as to what is your personal material and what material you have loaded for the purpose of sharing (see below how you can categorise the Group Workspaces).

image

Once you have decided to use a Group Workspace, you have the same sharing options as covered earlier with all the same benefits.

  • Share a Dashboard
  • Create and Share a Content Pack
  • Public Sharing – Publish to Web

Sharing Group Workspaces

The final method of sharing is to share the Group Workspace itself.  You can of course share the Group Workspace for the purpose of "administration", but you can also share it for the purpose of group collaboration and/or content sharing.  You can set each individual's access to be either an administrator or a member.  Administrators always have full privileges, while members can be configured by an administrator to be either all "editors" or all "viewers", depending on the group requirements.  One limitation is that you can only share a Group Workspace with individual users via their email addresses, not with an Active Directory group.  If you think that is just wrong (as I do), then please vote here to have this changed.

Hopefully this article has helped you gain a deeper understanding of the sharing options available with Power BI.  Microsoft has announced it is reviewing a new idea called  Private Publish to web, so keep an eye out for that, and vote for it if you like the idea.

Sharing Directly In Excel

As pointed out by Anthony (in the comments below), once you have access to shared data via Power BI, you will have access to another KILLER feature – direct access to the data from within Excel.  There was a new update on this today, and I blogged about it here: exceleratorbi.com.au/direct-connect-excel-power-bi-service/

Conditional Columns in Power BI Desktop

Microsoft recently announced a new conditional column feature in Power Query (Power BI Desktop version) that warrants a blog post.  This feature makes the process of creating banded columns much easier for the average user.  I blogged about banded columns using DAX here, and now there is this second, potentially easier approach.  In the demo below, I will add a new column of “age bands” just like I did with the DAX example above.

This column is then part of the data model and can be used in Power BI Desktop to slice and dice the visualisations.

The New Conditional Column Menu Item

I am using Adventure works and importing the Customer table (shown as 1 below) for this demo.  Note on the Add Column menu (2) there is a new menu item “Conditional Column” (3).

image

Add an Age Column

The first thing I need to do is create an Age column.  To do this, I first created a "Year of Birth" column by selecting the Birth Date column and then selecting Date\Year\Year.  This adds a new column containing the year of birth while keeping the original Birth Date column.

image

I then edited the code created by Power Query directly in the Formula Bar.

image

The original Code created by Power Query was this:

= Table.AddColumn(_Customers, "Custom", each Date.Year([BirthDate]), type number)

I simply changed the code directly in the Formula Bar to be like this:

= Table.AddColumn(_Customers, "Age", each 2016 - Date.Year([BirthDate]), type number)

Once you get the hang of the way Power Query formulas are constructed, you can easily make small changes like this to streamline the process.  Of course this is a hard coded approximate age, and it will not be 100% correct after the person has a birthday, but it is good enough for the purpose of this demo.
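If you would rather not hard code the year, a small tweak (just a sketch of one option, not part of the original demo) is to ask Power Query for the current date with DateTime.LocalNow() and take the year from that:

= Table.AddColumn(_Customers, "Age", each Date.Year(DateTime.LocalNow()) - Date.Year([BirthDate]), type number)

This is still only an approximate age because it ignores whether the birthday has already passed this year, but it will at least stay current every time the query refreshes.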

Add the Conditional Column

Then I clicked the new Conditional Column menu button (shown earlier) and the following dialogue box appeared.

image

It is just a matter of using the drop down menu items to build the logic for the banded column using standard Boolean If/Then logic.

image

The main thing to note is that you need to build the logic so that it incrementally checks each condition and exits the logical test once it finds the right band.  In my example above, I started with the oldest band, and each test is a "Greater Than".  I could have also started with the youngest band and applied a "Less Than" test.  But it would not work if I started with the youngest band and did a "Greater Than" test, or if I started with the oldest band and did a "Less Than" test.
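To make the ordering point concrete, below is a sketch of the kind of M code the Conditional Column dialogue generates behind the scenes (the band names, cut-off values and the step name #"Added Age" are my own illustration, not the exact ones from the screenshots):

= Table.AddColumn(#"Added Age", "Age Band", each
    if [Age] > 60 then "Over 60"
    else if [Age] > 40 then "41 to 60"
    else if [Age] > 20 then "21 to 40"
    else "20 and under")

Because the tests run in order from the oldest band down, a 45 year old fails the first test, passes the second and never reaches the later ones.  If I had started with the youngest band and used "Greater Than", everyone over 20 would have landed in the first band.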

The Final Result

Note how the new banded column now appears (shown as 1 below) and also note the code in the Formula Bar (shown as 2 below).

image

It has always been possible to write this code (shown in the Formula Bar, 2 above) using the "Add Custom Column" feature.  What Microsoft has done here is create a user interface that helps the user write this custom column without having to learn the underlying Power Query formula language (M).  In fact this is what Power Query is all about – it is simply a user-interface-driven tool that sits on top of a powerful formula language, and the user interface does most of the heavy lifting for the user.

The product keeps getting better – keep up the good work Microsoft.

Power BI Personal Gateway Explained

One of the many excellent sessions I attended this week at the PASS Business Analytics Conference in San Jose was titled "Get Latest Insights by connecting your data using Power BI Content Packs and PBI Gateways".  The title was interesting, but the content presented by Dimah Zaidalkilani and Theresa Palmer-Boroski (both Program Managers on the Power BI team at Microsoft) was truly excellent.  I am going to share here what I learnt about the Power BI Personal Gateway, to help you understand if you need it, and if so how to get it installed and running.

There are actually 2 gateways that Microsoft currently has available for Power BI, the other being the Enterprise Gateway (for companies and the like).  I will not be covering the Enterprise Gateway in this post.  You would generally look to use the Enterprise Gateway if you refresh data from inside a corporate network and need to connect through the firewall.

What is Power BI Personal Gateway?

As the name suggests, this gateway is for "personal" use. To use it you simply install it on your own laptop, desktop computer or even a server if you want.  "Personal" does not mean you can't use it for business; it is just a term to distinguish it from the other gateway.  What the Personal Gateway does is create a "gateway", or connection, between your computer (the one with the software installed) and the Power BI Service in the cloud.  It is kind of like a dedicated VPN designed just to allow Power BI to talk to the data on your computer.

Why do I need it?

Well you probably don’t “need” it, but you certainly might “want” it.  Once you have installed and configured the Personal Gateway on your computer, you are then able to refresh your Power BI datasets directly in the cloud.  You have the option to either set up an automated schedule (up to 8 times per day) or you can trigger the refresh manually on demand.

When you refresh a Power BI Desktop file on your PC, Power BI Desktop connects to all the data sources it needs to refresh the report. Data sources can include things such as SQL Server, local Excel files, CSV files, or whatever else you have set as the data sources for your reports.  The Power BI Personal Gateway simply allows PowerBI.com to access those same data sources on your PC transparently over the Internet.

The Regular Refresh Process

If you are anything like me, your standard refresh process for Power BI workbooks will go like this.

  1. Open the Power BI Workbook on your PC via Power BI Desktop
  2. Click on the refresh button. image
  3. Wait for 1 to 10 minutes for the workbook to refresh while doing something else.  30 minutes later you realise the refresh finished 20 minutes ago.
  4. Save and then "Publish" the workbook to PowerBI.com via the menu shown below. image
  5. You then get 1 or more prompts about where to load the workbook and also "do you want to replace the existing version?" (seriously – what else would I want to do?). image
  6. Repeat process steps 1 through 5 over and over again for each Workbook you have developed.

The Power BI Personal Gateway Process

Option 1:

  1. Configure a daily or weekly refresh once and then do nothing but sit back and relax.

Option 2:

  1. Log on to the Power BI Service
  2. Click on the ellipsis next to each dataset and click "refresh" for each workbook you need to refresh.  There is no waiting, and the refreshes can be triggered in parallel (whether you should do this will depend on your computer and Internet connection).

Things You Should Know

There are a couple of things you should be aware of.

  1. Gateways are part of the paid subscription to PowerBI.com.  If you currently don't have a paid subscription then you can't use it.  Everyone has to decide for themselves if it is worth what it costs.  Keep in mind there are other benefits of the paid subscription, not least of which are "content packs".  I will come back and talk about that another time.  Also keep in mind that Microsoft is currently not invoicing personal users for the professional version unless they are part of some broader corporate licencing agreement.  In my experience you will be given a 60 day free trial, and after 60 days you will be given a further 60 days free.  I guess this will end one day, but I haven't seen the end yet. If it still isn't right for you, consider purchasing Power Update as a local refresh tool.
  2. Your computer needs to be turned on and connected to the Internet at the time of the refresh for it to work.  Servers are by design always on, so this is not a problem for them, but your laptop or personal computer may not be on all the time, so keep this in mind.

How to Install the Power BI Personal Gateway

Installation couldn’t be easier.  Go to powerbi.microsoft.com/en-us/downloads/ and find the Personal Gateway download.  There is a link to the downloads page on the PowerBI.com  home page before you log in.

image

Launch the software, follow the steps in the install wizard, and you are done.

Set your Data Source Credentials

The only thing that wasn’t obvious to me when I started using the Personal Gateway was that you need to manually set the data source credentials for each dataset you want to refresh through the gateway.  The first time I realised this needed to be done was when I got the following error message.

image

Clicking through the error took me to the settings section.  You can get to this section at any time from the regular "cog" settings menu.

image

For each data source in your datasets, you will have to “edit credentials” and then tell Power BI what the connection authentication method will be.

image

Here is an example of the dialog from one of my workbooks that has a SQL connection.

image

The dialog for regular PC files is very similar.

image

Once it is done, you should see a confirmation message something like this for each connection.

image

Time to Refresh the Datasets

In PowerBI.com, click on the ellipsis next to one of your datasets, and then you can either set up your Scheduled Refresh settings (2 below) or manually refresh the report on demand (3 below).

image

If you select Schedule Refresh above, you will be taken to the settings panel (also clickable from the cog in the top right of the page).  From there you can set up the refresh schedule for all of your loaded datasets.  You may choose to stagger the refresh times across the data sets so that they don’t all hit your computer at once.  You can have up to 8 refresh events each day per dataset.

image

I hope you have found this article interesting.  I would love to hear from readers as to why they have decided to use (or not use) the Personal Gateway, so please share your story in the comments section below.

If you are looking for more detailed information, you can read about it at the official Power BI site.