
Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a query language, and in my second article I went deeper into the specifics of how to use some of the more useful DAX query functions.  In this final article I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

Create Summary Tables

As you may have seen in my previous articles, you can use DAX Queries to create summary tables.

This may be useful if you want to pre-calculate a summary table and have it as part of your data model (more on that in the “Materialisation” sections below).  If you want to create a summary table for any reason, you can do it using DAX Queries.

Create Tables to Use As Filters Inside CALCULATE

Another major use of DAX Queries is as a table input inside a CALCULATE function.  Because virtual tables retain lineage to the data model, they behave as if they were physically part of the data model.  The following measure returns the total sales for all unique combinations of product category and customer gender where the margin for those combinations is greater than 45%.
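A sketch of what that measure might look like (a reconstruction, since the original appeared only as a screenshot; [Total Sales] and [Margin %] are assumed to be existing measures in the model):

Sales of High Margin Combinations =
CALCULATE (
    [Total Sales],
    FILTER (
        SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
        [Margin %] > 0.45    // context transition evaluates the measure for each combination
    )
)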

This measure doesn’t have any real value, however it does demonstrate the concept well.

Create Tables to Use in Iterators

You can also create a table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.
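Something along these lines (again a reconstruction; [Margin %] is an assumed measure):

Average Margin % of Combinations =
AVERAGEX (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %]    // evaluated once per combination via context transition
)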

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is that they allow you to “inspect” tables that are used inside DAX measures and calculated columns.  Excel users come from a world where everything they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX measures or calculated columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is that you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise, and it is also hard to debug the measure, particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly, you can insert the table portion into the measure.
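For example, if your measure used the SUMMARIZE/FILTER combination from earlier, you could materialise just the table portion in DAX Studio first (a sketch, using the same assumed names):

EVALUATE
FILTER (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %] > 0.45
)

Once the rows returned look right, wrap the table back inside the CALCULATE.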

Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet, select Data\Existing Connections\Tables, select any table (I selected Products) and then click Open.  *At this point you can select any table, as it is going to change shortly anyway.


You will be given an option to insert a Pivot Table, however you also have an option to insert a Table.  Select Table and click OK.


Now that you have a table inserted into the spreadsheet from the data model, it is possible to change what is returned in this table.  Right click anywhere inside the table and then select Table\Edit DAX.


You now have the opportunity to change this table from being a “table” to being a DAX Query.


I then took a DAX Query I wrote earlier, pasted it into the expression box and returned a summary table into the spreadsheet.
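The query was along these lines (a reconstruction; the exact columns may have differed):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)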


You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel, although I believe it can be done with VBA.  That may be a topic for another day.

Materialise Tables Into Power BI

You can also materialise DAX Query tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only difference to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises a summary table from such a formula.
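A sketch of what the New Table expression might look like (note there is no EVALUATE; [Total Sales] is an assumed measure):

My Summary Table =
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)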


Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot is completing its task of calculating a formula under the hood.  To demonstrate the point I need to introduce a new DAX Query function called ROW.

Here is the syntax for ROW, as shown by the DAX Studio Intellisense.
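It is, approximately:

ROW ( "name", expression [, "name2", expression2, … ] )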


ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.
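The query (reconstructed) was:

EVALUATE
ROW ( "My Result", [Total Sales] )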


My ROW function returns a single row table with 1 column (called “My Result”), and the row has the value 29.3m, which is the total for the entire database.  Note also the Server Timings button in DAX Studio.  If you click this button, it enables the capability for DAX Studio to keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new Server Timings tab will appear in DAX Studio.


After executing the query, the Server Timings tab shows the total time taken to execute the query in milliseconds, how much time each engine in Power Pivot (the Formula Engine and the Storage Engine) spent completing the tasks, and whether the Storage Engine cache was used.

SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

  • Storage Engine: really fast; can retrieve records without decompressing them (in the right conditions); multi threaded; cached.
  • Formula Engine: really smart; iterates in a row context and hence can be slow; single threaded; not cached.

I will cover the Storage Engine and Formula Engine in another post sometime.

CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  When I wrote the following query in DAX Studio, it took forever to execute and I ended up cancelling it (remember there are more than 7 million rows in this table).
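The query was simply a CROSSJOIN of the two physical tables, something like this:

EVALUATE
CROSSJOIN ( Customers, Products )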


The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.
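A sketch of that query:

EVALUATE
ROW ( "Row Count", COUNTROWS ( CROSSJOIN ( Customers, Products ) ) )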


The above query executed in 400 milliseconds on my PC confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

DAX as a Query Language Part 2

In my last article I introduced DAX as a Query Language and explained why learning to query your data model can be useful.  I covered the basics about using a tool like DAX Studio to query a database and return a table of data as a result.  In this article I am going to go deeper and introduce the more common and useful query functions including CROSSJOIN, SUMMARIZE and ADDCOLUMNS (there are more functions that I don’t cover in this article).

Here is the link to the PBIX data model again if you want to follow along and do the examples yourself.  I am using DAX Studio again to connect to Power BI Desktop running on my local machine.  If you take the time to complete these examples you will significantly increase your learning and knowledge retention.

Lineage

Before I move on, here is an interesting and important fact about Power Pivot (when compared to traditional database tools): the concept of lineage (pronounced LIN-E-AGE).  When a new virtual table is created in a query or a formula in Power Pivot, the new table includes an automatic one to many relationship to the table from which it was created.  Consider the following simple query from last time.
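Reconstructed, the query was something like:

EVALUATE
VALUES ( Products[Category] )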


You can see that this query produces a single column table of all of the unique product categories.  This new virtual table retains lineage to the data model.  In this case the new virtual table comes from the Products table and hence it has lineage to the Products table.  You can “imagine” it as a new table with a 1 to many relationship to the Products table.


Note this is just a visual simulation of what happens.  The virtual table is not materialised and you cannot actually see it in the relationship view.  But the virtual table does exist (virtually), and the lineage to the Products table also exists – it is just that you can’t actually see it.  I recommend that you learn to “imagine” this happening in the data model in your mind, as it will help you understand how the new virtual table interacts with the rest of the data model, especially as it relates to context transition.

All virtual tables have lineage to the tables they came from.

CROSSJOIN

CROSSJOIN is a function that creates a new table from 2 or more source tables.  For this example I am going to join some virtual tables.  The first virtual table is VALUES(Products[Category]), which of course returns a list of all the unique product categories.


The second virtual table is VALUES(Customers[Gender]), a list of all the possible customer genders.


Next I am going to use CROSSJOIN to create a new table containing all the unique combinations of both tables.
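The query looks something like this:

EVALUATE
CROSSJOIN (
    VALUES ( Products[Category] ),
    VALUES ( Customers[Gender] )
)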


The resulting table has 4 rows x 2 rows, giving a total of 8 rows covering all of the unique combinations.  Continuing the concept of “imagining” the way these new virtual tables are related in the data model, you can picture the CROSSJOIN table sitting above the Products and Customers tables, with lineage back to both.


Remember this is just a simulation of what it looks like.  These virtual tables are not materialised and you cannot see them in the data model.  But you can “imagine” them this way, and they behave in exactly the same way as they would if they were physical tables.

M x N Can Mean a Large Table

You need to be careful with CROSSJOIN because, by definition, the resulting table will be m x n rows long, where m is the number of rows in table 1 and n is the number of rows in table 2.  If I were to CROSSJOIN the Customers table (18,484 rows) with the Products table (397 rows), I would end up with more than 7 million rows.  Creating such a large table in memory is not in itself a problem for Power Pivot, but it can definitely be a problem if you try to materialise the table.  More on that next week.

SUMMARIZE

SUMMARIZE is by far my favourite DAX Query function.  SUMMARIZE can do similar things to CROSSJOIN, however CROSSJOIN can join tables that do not have relationships, whereas SUMMARIZE can only join tables that are related via a many to 1 relationship.

SUMMARIZE first takes a table and then one or more columns (that can be reached via a many to 1 relationship) that you want to include in the new summarised table.

SUMMARIZE(<table>, table[column], table2[column],….)

Here is an example.
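Reconstructed, it was something like:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Customers[Gender] )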


The query results are similar to the CROSSJOIN query from before, but there is one important difference: SUMMARIZE only returns rows that actually exist in the data itself (this query returns only 6 rows, compared with 8 rows in the CROSSJOIN example).

Consider the relevant tables in the data model: the Sales table sits on the many side of many to 1 relationships to both the Products table and the Customers table.


Look again at the SUMMARIZE query written earlier.

This query starts with the Sales table and then adds the Products[Category] column from the Products table and the Customers[Gender] column from the Customers table.  The 2 columns specified inside the SUMMARIZE formula come from tables on the 1 side of the many to 1 relationships – this is allowed.

The following is not allowed and will not work.

EVALUATE
SUMMARIZE(Products,Sales[CustomerKey])

It doesn’t work because the column Sales[CustomerKey] cannot be reached from the Products table via a many to 1 relationship.

It is also possible to write a SUMMARIZE statement over any single table.  In the example below, the SUMMARIZE statement returns a list of all possible combinations of product category and colour.
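A sketch of that query:

EVALUATE
SUMMARIZE ( Products, Products[Category], Products[Color] )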


You could also achieve the same result with the ALL function (which would be an easier solution if you are just using a single table).
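For example (assuming the same column names):

EVALUATE
ALL ( Products[Category], Products[Color] )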

Adding Summary Sales to the Summary Table

So far the SUMMARIZE queries above are just lists of valid combinations.  It is time to do something more interesting and add the [Total Sales] to these summary tables.  Before moving on please note that the following formulas are not best practice – there is a better way which I will cover later.

Consider the following formula.
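This is a reconstruction of the formula (the column-adding variant of SUMMARIZE; [Total Sales] is an assumed measure):

EVALUATE
SUMMARIZE (
    Products,
    Products[Category],
    Products[Color],
    "Total Sales", [Total Sales]
)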

Note specifically that the table parameter in this formula is Products.  Also note that this formula returns blank rows in the results.


This SUMMARIZE statement correctly summarises all the combinations of Products[Category] and Products[Color] in the Products table, and then, for those combinations where there are sales, the sales are shown next to the unique combination.  But in some cases the unique combination doesn’t actually have any sales, hence the blank rows.

Using Sales as the Table Parameter

If I change the formula above and swap out the Products table with the Sales table, the blank rows are no longer returned.
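The same query, with Sales swapped in as the table parameter (reconstructed):

EVALUATE
SUMMARIZE (
    Sales,
    Products[Category],
    Products[Color],
    "Total Sales", [Total Sales]
)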


SUMMARIZE will always find the unique combinations that actually exist in the selected data.  Because this new formula starts from the Sales table, only combinations of Product[Category] and Product[Color] where there are actual sales are returned.

Context Transition or No Context Transition?

Those of you that are familiar with the concept of context transition may be thinking that context transition is occurring here.  That is a valid thing to assume but this is not what is happening here.  Consider the following formula.
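Reconstructed, with the native SUM swapped in for the measure:

EVALUATE
SUMMARIZE (
    Sales,
    Products[Category],
    Products[Color],
    "Total Sales", SUM ( Sales[ExtendedAmount] )
)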


Note how I have swapped out the measure [Total Sales] for SUM(Sales[ExtendedAmount]).  With this new formula there is no CALCULATE forcing context transition, yet despite this the table still returns the same result.  This implies that SUMMARIZE does not operate in a row context.  In fact SUMMARIZE is a Vertipaq Storage Engine operation.  The part that produces the valid combinations of columns is very efficient, however the calculation of the total sales figures is very inefficient.  For this reason it is better to use ADDCOLUMNS to add the sales totals (see below).

ADDCOLUMNS

ADDCOLUMNS does exactly what it suggests – it adds new columns to a table in a query.  The general syntax is as follows:

ADDCOLUMNS(<table>,”Column Name”, <formula or measure>,….)

To demonstrate how this works, let me start with the plain SUMMARIZE formula from earlier.

The SUMMARIZE function returns a 2 column table with 15 rows – all the possible combinations that contain sales values.  This table can be used as the table parameter in the ADDCOLUMNS formula as follows.
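A sketch of the combined query:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales]
)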


You can see above that this new table returns the Total Sales for each of the 15 possible combinations.

And it is possible to add as many new columns as you need to the summary table.  See below.
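For example (here [Total Cost] and [Total Margin] are assumed measures):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales],
    "Total Cost", [Total Cost],
    "Total Margin", [Total Margin]
)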


The Important Differences between ADDCOLUMNS and SUMMARIZE

Now you have seen that it is possible to add columns (such as summary total sales) to a table using SUMMARIZE and also with ADDCOLUMNS.  But there are some important differences between these 2 approaches.

ADDCOLUMNS has a Row Context

Unlike SUMMARIZE, as I showed earlier in this article, ADDCOLUMNS does have a row context.  Consider the following query.
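A reconstruction (because there is no CALCULATE, no context transition occurs, so every row shows the grand total rather than the total for that combination):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", SUM ( Sales[ExtendedAmount] )
)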


When I swap out the measure [Total Sales] with SUM(Sales[ExtendedAmount]) the results are wrong.  This shows that ADDCOLUMNS operates in a row context.

Efficiency

When given the choice, you should use ADDCOLUMNS in preference to SUMMARIZE to add these additional columns of data.  ADDCOLUMNS is a lot more efficient in the way it adds the values to the SUMMARIZE table: SUMMARIZE uses a Vertipaq Storage Engine operation to produce the base table, and then ADDCOLUMNS leverages lineage and context transition to add the value columns.  This approach leverages the special capabilities of Power Pivot to do the job in the most efficient way.  For more detailed coverage of this topic you should read this article from The Italians.

Other DAX Query Functions

I realise I have not covered all DAX Query functions in this series of articles.  There are others, some of which are only available in the newer versions of Power Pivot (eg Power BI Desktop, Excel 2016).  If you are interested to find out more you can do some research online.  I will be covering 1 final function next week – the ROW function.

Uses for DAX Queries

Next week I will share my final article in this series where I explain a few ways that you can use DAX Queries in the real world.  Be sure to check back next week, or better still sign up to my weekly newsletter to be notified when there are new articles.

An Introduction to DAX as a Query Language

Level: Intermediate

I have written quite a few articles that contain embedded references to DAX Queries and DAX Table functions in the past.  Today I am writing an article that introduces the DAX query language from the beginning so you can learn how to use it and how it will help you to become a Power BI superstar.

What is a Query Language?

A query language is a programming language that allows you to send a request to a database and then receive data back in the form of a table.  If you only have a few hundred rows of data then this is not overly useful.  If on the other hand you have millions of rows of data then it becomes very useful.

The most important difference between DAX used in measures and calculated columns and DAX as a query language is that the query language always returns a table.  The resulting table will consist of one or more columns, zero or more rows, and data in zero or more of the cells.

Why Do I Need a Query Language?

When you create a data model in Power BI or Power Pivot for Excel, you are essentially loading tabular data into a dedicated reporting database for the express purpose of analysis and/or reporting.  Once it is loaded, you (the report author) typically spend hours of effort building out the data model so it can be used to add value to your business/job.  Finally, once this is all done, the most common way to use your new data model is to build reports and pivot tables that present data to a user in a way that is easy to understand.

One example would be an interactive Power BI report that runs off a Power BI data model; another would be a pivot table that runs off an Excel Power Pivot data model.

Both of these visualisations are summarisations of the underlying data, and in both cases you can’t actually see any of the underlying data itself, just the summary.  Visualisation tools like these are great and have enormous utility, however they are generally not the best tools if you want to see the underlying data itself, or if you want to extract very large sets of data for whatever reason.

Once you have invested all this effort in building your data model, you are going to want to use it to the fullest extent. Power Pivot and Power BI can handle many millions of rows of data (unlike traditional Excel) and so the “old way” of simply looking at the underlying table is not really practical in this brave new world.

Show Me the Underlying Data!

 

Take the pivot table I mentioned earlier: most Excel users would be aware that they can double click on any value cell (say 31.6%, representing Mountain Bike sales to people aged between 30 and 40 years of age).  When you double click on a traditional pivot table in this way, something magical happens – a copy of the underlying data is extracted from the source and it magically appears in a new sheet in Excel.  Once you are done looking at the copy of the data, you can delete the new sheet and you are back where you started – life is good.

When you double click on a traditional pivot table in this way, you are effectively sending a query to the underlying database, and the query response is a table of data.

But Does it Work With Power Pivot?

In short, yes and no, depending on what you are looking for.  The experience is different depending on whether you have an Excel Power Pivot data model or are connected to an external SSAS data source.  In my experience the results are inconsistent at best, and at worst it may not give you what you expect.  But the good news is that with DAX as a query language you can easily write any query you want and extract exactly the subset of data you need.  You no longer have to rely on double clicking your pivot table to interrogate your data.

Time To Learn DAX as a Query Language

OK, enough of the chatter, time to get serious about DAX as a query language.  You have invested time and effort in building your data model in Power BI or Power Pivot and now you rightly want to extract a subset of your data so you can take a look at it.  Let’s get started.

First Select the Right Tool

There are 3 general approaches you can take to extract a table of data from your fabulous data model.

  • Use a dedicated tool like DAX Studio or SQL Server Management Studio.  I wrote an article describing how to install and use DAX Studio here.
  • Write a “New Table” in Power BI Desktop.  I wrote an article about how to use this feature here.
  • Use an Excel “table hack” I learnt from Marco Russo to extract a table directly into Excel.  I will cover this hack later in this article series.

I am going to use DAX Studio in this blog article but you can use Power BI “New Table” if you want to follow along and you don’t have (or want) DAX Studio. In fact if you want to learn this stuff then you really should follow along using either approach.  In my experience you can’t learn new concepts simply by reading.  Please note however that using Power BI and Excel as a query tool should be considered as a temporary extraction of the data.  Once you have looked at the data you want, you would normally delete the table so you don’t end up storing redundant data (you can always extract it again if you need it).

Let’s Start With the Evaluate Statement

All DAX queries must start with the EVALUATE statement.  There is one exception, however: when you use the Power BI “New Table” button to write a query, you simply omit the EVALUATE statement.  This is no doubt another example of “syntax sugar”, where the developers shield the user from some of the more confusing syntax.  If you are following along and testing my queries in Power BI, simply omit the EVALUATE statement when you write the queries.  In addition, if you ever want to move your DAX Studio query to a table in Power BI, you can just remove the EVALUATE statement at that time.

Connecting DAX Studio to the Database

When I launch DAX Studio I am prompted to connect to my database. In this case I have an instance of Power BI Desktop with an Adventure Works data model running on my PC and DAX Studio gives me an option to connect to that.  You can also connect to Excel Power Pivot data models and Tabular SSAS servers.


If you want a more comprehensive overview of the DAX Studio UI, then read my other post I mentioned earlier.  Otherwise let’s get into it.

Return an Existing Table

The easiest DAX query you can write is to simply return a complete copy of an existing table.
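For example, to return a copy of the entire Customers table:

EVALUATE
Customers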

When I run this query in DAX Studio and press F5, the entire table is returned – a total of 18,484 rows of data.


This in itself is not overly useful as chances are you have this table already in some source system.

List of Unique Values

In DAX there are many functions that return tables instead of scalar values.  I covered this topic in depth last week so you can read up on that here if you like.

Let me now look at the ALL function.
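As a query it looks like this (reconstructed):

EVALUATE
ALL ( Products[Category] )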

When I execute this table function as a query, I can actually “see” the results of the query.  In this case the query returns a single column table consisting of 4 rows – all the unique product categories.

Next let’s look at VALUES.
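The equivalent query:

EVALUATE
VALUES ( Products[Category] )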

In this instance, the VALUES table function returns exactly the same result.


Now let’s look at DISTINCT.
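And with DISTINCT:

EVALUATE
DISTINCT ( Products[Category] )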

And same again here.


All three of these table functions return the same table in this instance, though in reality they all do different things in different filtering scenarios.  This is one of the great things about DAX Queries – you can “inspect” what is happening with these table functions in different scenarios.  More on that later.

A Two Column Table

The ALL function can take 1 or more columns as an input (or a table).  In the example below I have used ALL to materialise a table of all of the combinations of Product Category and Product Sub Category.  There are 37 unique combinations in all.
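A sketch (the subcategory column name is an assumption):

EVALUATE
ALL ( Products[Category], Products[SubCategory] )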


A Filtered Table

Most people that know DAX will know of the FILTER function.  The FILTER function returns a filtered copy of a table, e.g. see the example below.
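Reconstructed ([Total Sales] is an assumed measure):

EVALUATE
FILTER ( Products, [Total Sales] > 200000 )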

This is a much more interesting query because it returns a table containing all the products that have sold more than $200,000 of sales over all time.  There are 49 such products.


Altering Filter Context

When you write a DAX measure or Calculated Column, there is only 1 way you can change the filter context and that is using the CALCULATE function.  CALCULATE returns a scalar value after applying filters.  CALCULATETABLE is the sibling of CALCULATE.  As you can probably guess, CALCULATETABLE returns a table after applying filters.

This next query returns exactly the same result as the last query.
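A sketch of the CALCULATETABLE version:

EVALUATE
CALCULATETABLE (
    Products,
    FILTER ( Products, [Total Sales] > 200000 )
)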


The main difference between using CALCULATETABLE and just using FILTER is that CALCULATETABLE can return a different table to the one used inside the FILTER function itself.
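For example (a reconstruction):

EVALUATE
CALCULATETABLE (
    Sales,
    FILTER ( Products, [Total Sales] > 200000 )
)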

The above query uses the same FILTER function as before, but instead of returning a table containing the products that sold more than $200,000, it returns all of the actual sales transactions that contributed to those products qualifying.


Combining Table Functions

Now that you see how some of the standard table functions work in a query, let’s look at combining them together into a single query.

This query first applies a filter over all products that have sold more than $50,000, and then returns a list of the unique product subcategories that contain these products.
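A sketch of the combined query (column names are assumptions):

EVALUATE
CALCULATETABLE (
    VALUES ( Products[SubCategory] ),
    FILTER ( Products, [Total Sales] > 50000 )
)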


One interesting thing in the query results is that “helmets” are in the list (the only non-bike subcategory).  To see exactly which products those were, it is simply a matter of writing another query as follows.
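Swapping the subcategory column for the product name column (again an assumed column name):

EVALUATE
CALCULATETABLE (
    VALUES ( Products[ProductName] ),
    FILTER ( Products, [Total Sales] > 50000 )
)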


Note how I was able to reuse the original query structure to return the list of product names with just a minor change to the query.

More to Come Next Week

This is a big topic – too big for a single blog article.  This week I introduced the topic of DAX queries, I showed how you can write queries to return tables and how you can combine table functions together to interrogate data.  Next week I will introduce some new special query functions that extend the capabilities even further.

Relationships in Power BI and Power Pivot

Level: Beginners

Power Pivot is a database that has been designed from the ground up to be optimised for reporting and analysis in Power BI, Power Pivot for Excel and SSAS Tabular.  The technology is essentially the same across all of these products so I will generically refer to Power Pivot in this article.

Power Pivot uses an in memory columnar database (Vertipaq) as the foundation technology – the same technology for all versions (Power Pivot, Power BI and SSAS Tabular).  The Vertipaq engine is what makes Power Pivot both super fast and highly compressed.  A Power Pivot database is not the same as a relational database (RDB) and it does not support all the relationship types that relational databases support.  This can be confusing for people that are new to Power Pivot, particularly if they have at least a basic understanding of how databases (such as MS Access) work.  I explain what you need to know to get started with relationships in Power Pivot here.

Virtual Relationships

This article is specifically about physical relationships, however there are ways to create virtual relationships using DAX.  Examples include using LOOKUPVALUE, FILTER, CROSSFILTER and other techniques.  I am not covering these types of relationships in this article.

Relationship Cardinality Types

There are 3 different possible physical relationship cardinality types in database design but not all are supported by Power Pivot.

  • One to Many – Power BI: Yes; Power Pivot for Excel: Yes
  • One to One – Power BI: Yes; Power Pivot for Excel: No
  • Many to Many – Power BI: No; Power Pivot for Excel: No

One to Many Relationships


The one to many relationship is the foundation of Power Pivot.  In the Adventure Works example (in Power BI Desktop), the Customers table is on the 1 side of the relationship and the Sales table is on the many side.  These tables are joined using a common field/column called “CustomerKey”.  The customer key (aka customer number) is a code that uniquely identifies each customer.  There can be no duplicates of the customer key in the Customers table.  Conversely, a customer can purchase as many times as needed, and hence the customer key can appear in the Sales table as many times as necessary.  This is where the name “one to many” comes from – the customer key occurs once and only once in the Customers table but can appear many times in the Sales table.

Tables on the one side of the relationship are called Dimension tables (I call them Lookup tables) and the tables on the many side of the relationship are called Fact tables (I call them Data tables).

The entire Power Pivot Vertipaq engine is optimised to work with this (one to many) type of relationship.

One to One Relationships

The one to one relationship is only supported in Power BI and the newest version of SSAS Tabular.  In my view this relationship type has limited value, and in most cases it is better to combine these tables into a single flat table prior to loading into Power BI.  Consider the following model.


The first relationship is a 1 to many relationship between the Customer table (a lookup table) and the Sales table (a data table).  The Customer Socio Economic Data table is joined to the Customer table via a 1 to 1 relationship.  If there is a benefit (to the user of reports) in splitting this socio economic data into a separate table, then of course you should do so.  If there is no benefit, I recommend you combine all the data from the Customer Socio Economic Data table into the Customer table using Power Query on load.

Every relationship has a “cost” in that it will have some effect on performance.  The performance impact may not be noticeable for simple models but may become an issue with very complex models.

If you only remember 1 thing from this article, then please let it be this:  Don’t automatically accept the table structure coming from your source data.  You are now a data modeller and you need to make decisions on the best way to load your data.  Your source system  is probably not optimised for reporting (unless it is a reporting datamart) so please don’t assume that what you have got is what you need.

Many to Many Relationships

The many to many relationship type is not supported in Power Pivot.  This is a deliberate design decision that has been made as a trade off to ensure optimum database performance.  If you have data that is logically related with a many to many cardinality, there are modelling techniques you can use to solve the problem in Power Pivot that are covered in my many to many pattern article here.

UI Differences

There are a few differences between the relationship view UI in Power BI Desktop/Excel 2016 and that in Excel 2010/2013.

Excel 2010/2013

The early UI has an arrow pointing to the lookup table (the one side of the relationship) and a dot on the many side.  This is unfortunate as the arrow is pointing in the opposite direction of the filter propagation.  This only exists in Excel 2010/2013 (and the early version of SSAS Tabular).

Power BI/Excel 2016

The UI has been significantly improved in Power BI Desktop and Excel 2016.  The 1 to many relationship is now clearly shown, and there is also a new arrow showing the automatic filter propagation direction.


One Active Relationship

It is possible to have more than 1 relationship between tables in Power Pivot, but only 1 can be active at a time.  An example of when you may want multiple relationships is if you have a Sales[Order Date] and a Sales[Ship Date] in your data table.

In this scenario you may want to join both date columns to your Calendar table so you can use time intelligence in your data model on both Order Date and Ship Date.

In the relationship view, the active relationship is shown as a solid line and the inactive relationship is shown as a dashed line.  The active relationship is used by default in all DAX measures, however you can override this default and use the inactive relationship (when needed) via the USERELATIONSHIP() function.  Full details on this are covered in my article here.
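A sketch of such a measure (assuming [Total Sales] exists and the inactive relationship joins Sales[Ship Date] to the Calendar table):

Total Sales by Ship Date =
CALCULATE (
    [Total Sales],
    USERELATIONSHIP ( Sales[Ship Date], 'Calendar'[Date] )
)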

Cross Filtering Behaviour

Power Pivot is optimised to work with one to many relationships and to automatically propagate filters (filter context) from the one side to the many side.  In all versions of Power Pivot for Excel, this is the ONLY type of filter propagation that is available.

Power BI supports bi-directional cross filtering behaviour as well as single direction.


In fact bi-directional filtering is the default behaviour for Power BI Desktop.  There are many people (including me) that think this is a bad idea, as bi-directional cross filtering comes at a cost – there is an overhead of constantly cross filtering the lookup table based on the contents of the data table, at times when it is actually not required.  Sure, if you have a simple model, you need this behaviour and you don’t know how to handle it with the many to many pattern, then turn it on.  But surely this should not be turned on by default.  In addition, if you have more than 1 data table, bi-directional cross filtering can cause circular references, causing further confusion to unsuspecting users.

I think Microsoft is trying to make Power BI more user friendly for the “lay user” however in this case I think Microsoft has made a mistake.  Do yourself a favour and turn off bi-directional cross filtering unless you explicitly need it.  To change it just double click on the arrow and set the cross filter direction to single.

Update: last week (17th Feb 2017) I noted that new models I built were single directional by default – it seems Microsoft has listened and changed the default behaviour.


Wrap Up

Hopefully this article has helped you understand more about how Power Pivot works.  Let me know in the comments below if there is anything I have missed.

SQL Saturday Melbourne Feb 2017 Materials


I am currently in Melbourne for PASS SQL Saturday 2017.  SQL Saturday is an annual one day conference event that occurs in many major cities around the world.  It is focussed on helping SQL Server professionals learn more about their profession and also network with other like minded people.  Many readers of my website are not SQL Server professionals – most likely they are Excel professionals.  But that doesn’t mean there is nothing for you at SQL Saturday.  I encourage you to look for an event in your location, read the session materials and decide if there is value for you – particularly if you are into Power Query and Power Pivot.  You can learn more at www.sqlsaturday.com/

SQL Saturday will be held in Sydney next Saturday 18th Feb 2017 and I would love to see you there.

My Presentation: Disconnected Tables in Power Pivot

I spoke at SQL Saturday about disconnected tables in Power Pivot.  Most people know that you create relationships between tables in Power Pivot, but did you know that you can load tables that are not joined (disconnected tables) and that they can still add value?

My Slide Deck

Here are the slides that I used in my presentation today for those that would like to download them.  I realise that they really just support the live demonstration, but they should have meaning for those that were there.  I have also linked to some of my other articles on 3 of the slides if you would like to read more.

SQLSat_MattAllington_DisconnectedTablesInDAX

Use Power Query to Compare Database Records

I was helping a user on community.powerbi.com this week. The user had a list of service desk tickets for multiple customers and the requirement was to identify when a customer opened a second new ticket within 7 days of closing a previous ticket.  This is not an uncommon scenario so I thought it would make a good blog article.


One of the challenges that new users face when using Power BI is deciding whether they should use Power Pivot (DAX modelling) or Power Query (PQL shaping) to solve each problem.  And if the answer is DAX, they also need to decide if it should be a measure or a calculated column.  There is no single correct answer to these questions – it always depends on the circumstances.  I generally recommend shaping your data properly prior to loading it into Power Pivot; that way the DAX formulas are easier to write and the reporting database will be more performant.  I looked at the options for this problem and decided to use Power Query to shape the data before loading.  Power Pivot and DAX are not really designed to do ordinal row comparisons, so this task is better handled with Power Query.

For this particular problem I thought I would produce a video that demonstrates one way in which this problem can be solved.  Here it is.

My Online Power Query Video Training

One of my new year’s resolutions was to batten down and create an online Power Query training course.  This has been on my to-do list for almost 2 years now and I am well under way.  The video above is a sample from my video course.  I start from the beginning and walk the viewer through the foundation skills they need to know to be a Power Query guru.  Towards the end of the course I will have a stack of videos like the one above where you can broaden your awareness of how you can use your new skills in real world situations.  If you would like to sign up and be notified once the course is finished (no obligation), then please enter your contact details in the form below.

Power BI On Premise – Not in the Cloud!

Microsoft has recently released a technical preview of Power BI that can be installed on premise inside a company firewall and the Power BI reports can be securely published over the Intranet (not Internet) via SSRS.  Today I am sharing my experiences setting up the technical preview for the first time.  I installed the software on my PC however if you are going to use it for your company, you need to install it on a server.  Time to get one of your friendly IT folk to help you when you are ready to set this up and demonstrate what can be done to your business leaders.

Guy in a Cube is Your Friend

Adam Saxton shares a truck load of information at his Guy in a Cube YouTube Channel.  I watched Adam’s video on how to install this technical preview and followed his instructions.  I won’t repeat all the details here as he has done a great job of that already, however I wanted to share my experience and observations.

My Install Experience

For the record, I installed the 64 bit version on my local PC.  I already had SQL Server 2012 installed as well as SSAS Tabular 2012.

1. I downloaded the Jan 2017 technical preview of Power BI Desktop (SSRS edition) PBIDesktopRS_x64.msi and SQL Server Reporting Services SQLServerReportingServices.exe (both available here).  1 minute to download both files.


2. I installed and ran Power BI Desktop installer for SSRS first.  After installing, I ran the new software.  It looks, loads and behaves just like Power BI Desktop except for the reference to SSRS.  There was no conflict with my existing copy of Power BI Desktop, so now I have both versions installed side by side.  I assume the special SSRS features will find their way into the main version of Power BI Desktop in time.


3. I then installed the SSRS Technical Preview software (I note it has a short life – 180 days, which is to be expected).  1 minute to install.


4. I was then prompted to configure SSRS.  I already have a SQL Server instance (2012) running on my PC, so I used that – I didn’t need to download the SQL Server Evaluation Edition from the link shown above.

5. I then clicked the configuration button and gave it the name of my localhost server, at which point I got a message I wasn’t sure how to act on.


6. I then had to refer back to Adam’s video as I wasn’t sure what to do next.  Adam’s instruction was to navigate to a URL in a browser (MachineName\Reports).  I found the exact URL for my SSRS server in the configuration tool.


Note you don’t need the :80 port number as port 80 is the default.  So for me the URL was simply

desktop-1sckel7/Reports

But when I went to that address in Google Chrome, it didn’t work.


On a hunch I decided to reboot my PC, then it worked just fine.


7. I noticed in the SSRS configuration manager that there is an option to register Power BI Integration.  This seems to be related to users being able to pin tiles from an On Premise Power BI report onto a Cloud based Dashboard, although I am not 100% clear on this.  I didn’t touch anything here and it all worked fine.


8.  This preview only works with a live connection to SSAS (Tabular or Multi Dimensional).  I have SSAS Tabular running on my PC but I didn’t have a Tabular Data Model I could use, so I decided to restore a Power Pivot model from Excel to my server.  I launched SSMS to connect to Analysis Services Tabular (2012 in my case).


I right clicked on Databases and then selected Restore from Power Pivot.


I followed the instructions to restore the file.  At first I couldn’t restore the file due to a file access permission error. I have struggled with this forever (I am not a SQL Server Pro) but I quickly found this article that told me how to give SQL Server permission to access the folder that contained my Power Pivot workbook.

After restoring I couldn’t see the database.  I had to right click on Databases and then select Refresh.


9. Using my new version of Power BI Desktop, I created a new workbook and connected it to my SSAS server using connect live.


I created a quick demo report and saved it.  I then navigated to the SSRS Reports page in my browser and uploaded the new report.


Bingo – On Premise Power BI

And bingo – there it is, working through a browser like a charm.


Observations

  • The first report was slow to render.  The second and subsequent reports were fast.  This could be related to the SSAS server on my PC not having a warm cache.
  • Now when I double click on a PBIX file, the SSRS version of Power BI Desktop launches by default.  I tried to manually change this back but couldn’t get it to change.  I decided to reinstall the latest production version of Power BI Desktop (repair install) and that fixed it.  The SSRS version still worked but was now not the default.
  • The URL for the report page is in the format MACHINENAME/Reports/PowerBI/FileNameWithoutThePBIXextension.  My file name was “Power BI On Premise”, including spaces, which meant the URL was actually desktop-1sckel7/Reports/powerbi/Power%20BI%20On%20Premise.  I hate all those %20s that indicate a space, so I renamed the file using underscore characters instead of spaces, and that gave me a much neater URL.


Summary

All in all I have to say this is pretty exciting.  I have lots of current and potential clients that I have spoken to that are not willing to proceed with a cloud based solution.  A production ready on-premise solution seems very close now – can’t wait.

Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is very easy to forget how I incrementally learnt them along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In all cases I have explained why it is a best practice, to help with understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

Naming Conventions

Naming your Columns and Measures

  • Always write a Column in the format TableName[Column Name]
  • Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write measures and columns both including the table name as follows.

  • TableName[Column Name]
  • TableName[Measure Name]

These 2 examples above do not cause a problem because their names make it easy to know what they are.  But if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula.  Take the following example:

  • TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.

Give Tables a Single Noun Name

  • Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimension modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.

Also PowerBI.com has a natural language query tool that allows you to ask questions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

Using Spaces in Names

  • Don’t use spaces in table names
  • Do use spaces in column names
  • Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [square brackets] anyway, and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces.

Don’t Overly Abbreviate Business Terms

  • Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches between the words in the question and the data model to help answer the question.  If you abbreviate your measure names to, for example, [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your names.

Measures or Calculated Fields

  • Measures is a better name than Calculated Fields

The term measures has been a Microsoft term for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to be called “calculated fields”.  This was a bad decision and thanks to lobbying from many people Excel 2016 reverted to using the term measures (as does Power BI).  I always now use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

Loading and Shaping Data

Push Shaping as Close to the Source as Possible

  • Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copy or re-write).  The data shaping tools are designed to allow you to do what ever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible that you can shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc).  But just because you can do these things in Power Pivot, doesn’t mean you should.   For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

Use A Calendar Table

  • If you want to do any sort of time calculations, get a Calendar table

It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.

A Star Schema is Optimal

  • Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success.  Two key things you should know.

  • Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
  • There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

You Should Prefer Long and Narrow Tables

  • Short wide tables are generally bad for Power Pivot but long narrow tables are great.


There are 2 main reasons why loading data this way is a good idea.

  • Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
  • Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as a month name column) than it is to write many different measures to add up each column separately.

Only Load the Data You Need

  • Load all the data you need, and nothing you don’t need.

If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be.  In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later.  This is no longer the case – it is very easy to change your data load query to add in a column you are missing.  So bring in all of what you need and nothing you don’t.  If you need something else later, then go and get it later.  It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them.  Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).

Don’t use Linked Tables

It is possible to add your data to a table in Excel and then use a linked table to load it into Power Pivot.  You simply select the data, go to the Power Pivot menu and click Add to Data Model.

The trouble with doing this is that you end up with 2 copies of the data in your workbook: the Excel table is an uncompressed copy, and you also have a compressed copy inside Power Pivot.  For a table with many thousands of rows of data this matters, because uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that, you are better off keeping the data in a separate workbook and importing the table directly into Power Pivot without storing it in Excel at all.

Modelling

Avoid Bi-Directional Relationships

  • Avoid using the default bi-directional relationships in Power BI unless you need them.

image

Microsoft introduced bi-directional filter propagation in Power BI (it is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works, but it comes at a cost:

  • The performance can be negatively affected
  • If there is more than 1 data table, circular relationships can be created (just like circular references between cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

Measures are Better than Calculated Columns

  • Wherever possible, you should write measures in preference to calculated columns.

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal training in database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour, Excel folk, and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.
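As a quick sketch of the difference (the column names are assumptions):

// Calculated column – evaluated for every row of the Sales table and stored in the workbook:
Margin $ Column = Sales[Extended Amount] - Sales[Total Cost]

// Measure – stored as a formula only, and evaluated on demand in the filter context of the visual:
Total Margin $ = SUM ( Sales[Extended Amount] ) - SUM ( Sales[Total Cost] )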

Store Measures in the Table Where the Data Comes from

  • You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure is coming from the [Extended Amount] column in the Sales table.  You should therefore store the measure in the Sales table.  If you place it in (say) the Calendar table instead, Excel will likely give you a warning similar to the one shown below.

image

This issue doesn’t occur in Power BI.

Break Measures into Interim Parts

  • Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used, and they make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid, however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that, it will give you the % change in sales vs last year.
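Here is what that single “mega measure” looks like, reconstructed from the four interim measures below – note how the same SUMX expression appears three times:

% Chg vs LY (single measure) =
DIVIDE (
    SUMX ( Sales, Sales[Qty] * Sales[Unit Price] )
        - CALCULATE (
            SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
            SAMEPERIODLASTYEAR ( Calendar[Date] )
        ),
    CALCULATE (
        SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
        SAMEPERIODLASTYEAR ( Calendar[Date] )
    )
)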

 

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] - [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

Don’t Break Calculated Columns into Interim Parts

  • It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand; they take up little space and make it easier to write your DAX.  As with measures, it is easier to write calculated columns using interim calculated columns.  The problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and in memory, hence making it less efficient.  By all means write interim columns if you need to in order to create a calculated column (notwithstanding the earlier comments about columns vs measures), however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column.  This is an Excel concept I learnt from John Walkenbach.
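Here is a sketch of the idea (the column names are made up):

// Interim columns – easy to write, but every one of them is stored in the workbook:
Line Sales = Sales[Qty] * Sales[Unit Price]
Line Cost = Sales[Qty] * RELATED ( Products[Unit Cost] )
Line Margin = Sales[Line Sales] - Sales[Line Cost]

// “Mega DAX formula” – the same result in a single stored column:
Line Margin = ( Sales[Qty] * Sales[Unit Price] ) - ( Sales[Qty] * RELATED ( Products[Unit Cost] ) )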

Other Advice

You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit Office and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here, and also read up about my workarounds, including installing 64 bit Power BI Desktop alongside 32 bit Office and then using Power BI Desktop as a local server for Excel.

DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are that you may not even be clear what time intelligence is and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year” etc. If you want to do any time based comparison using DAX, Power Pivot and or Power BI, then this article explaining time intelligence is the right article for you.

Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

  • Same period prior month, quarter, year etc.
  • Same period next month, quarter, year etc.
  • Same period year to date compared with prior year, next year etc.
  • Rolling 30 days, 60 days, 90 days, 12 months etc.
  • etc. – there are many, many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

image

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered more or less by the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

  • Territory[Country] = “Australia” – this comes from the Rows in the Pivot
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer $2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

  • Territory[Country] = “Australia” – this comes from the bar chart Axis
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

The Time Intelligence “Problem”

Let’s assume you want to compare total sales on a particular year vs prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

image

But doing it this way causes many problems, including:

  • There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
  • If you want to calculate the absolute change or % change year on year, you will need to hard code some formulas in the cells next to the pivot table, and hence they can’t be reused in other visuals later.
  • If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy. Imagine a calendar table with a filter applied Calendar[Year] = 2003.  If you imagine what the filtered data model would look like “Under the hood”, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

  1. First “detect” the current filter context to determine what the “current” selected period is.
  2. Then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
  3. Then perform a time shift to find a different period in time (as specified in the formula).
  4. Finally, reapply filters on the data model for the time shifted period prior to calculating the final value.

Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – Custom and Inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover Inbuilt time intelligence in depth here because they are relatively easy to use.  See link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD(Calendar[Date],[Total Sales])
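And here is the custom equivalent (reconstructed to match the line-by-line explanation further down the page):

Total Sales Year to Date (custom) =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( Calendar ),
        Calendar[Year] = MAX ( Calendar[Year] )
            && Calendar[Date] <= MAX ( Calendar[Date] )
    )
)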

Both of these formulas return the exact same result, and in fact both use the same approach to calculate the result under the hood.  The only difference is that the inbuilt version is much easier to write because you (the DAX author) are protected from the full syntax.

The end result (regardless which formula is used) looks like this in a Pivot Table.

image

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

  1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case July 2003
  2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
  3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
  4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

1   Total Sales Year to Date (custom) =
2   CALCULATE (
3       [Total Sales],
4       FILTER (
5           ALL ( Calendar ),
6           Calendar[Year] = MAX ( Calendar[Year] )
7               && Calendar[Date] <= MAX ( Calendar[Date] )
8       )
9   )

Line 2 (CALCULATE) is the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) are where the filtering occurs.  There are a few confusing things here.

  • Line 5 refers to ALL(Calendar) instead of simply Calendar.
  • Lines 6 and 7 seem to be evaluating against themselves – very confusing.
  • Line 7 starts with a double ampersand &&

Let me explain each line.

Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of CALCULATE always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, line 5 is already filtered by the pivot table, and hence the Calendar table only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL function therefore is to remove all filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now that they have all been removed, it is time to add back the date filters that are needed.

Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

  • Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
  • Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), it means “go and apply a new filter on this column in the data model”.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is = 2003.  Line 6 therefore says “go and apply a new filter on Calendar[Year] to be equal to 2003”.

Note that you can use any aggregation function in these formulas that does the job.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer, simply think about the need and then find the best function to use.
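For example, this variation of the FILTER portion (a sketch) returns the same result in this pivot table, because each cell has a single year and a single latest date in its filter context:

FILTER (
    ALL ( Calendar ),
    Calendar[Year] = VALUES ( Calendar[Year] )
        && Calendar[Date] <= LASTDATE ( Calendar[Date] )
)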

Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

image

The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few downsides however.  The first is that you don’t get to learn these very important concepts, and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  The second is that the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here: exceleratorbi.com.au/power-pivot-calendar-tables/

Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot that you are using daily data by loading all data in any given week or month on the same date (eg last day of the month) and use inbuilt time intelligence however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the  path of writing custom time intelligence, you should read my advice about adding an ID column into a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/

Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Date],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (SAMEPERIODLASTYEAR and DATESADD), and they both use CALCULATE.  Now that you understand how the time shift works, it should be much easier to understand what is happening in these formulas.  The time intelligence functions each produce a table of dates that has been time shifted back by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the Calendar table, applies the time shifted dates as the new filter, and then does the calculation.  Once you have [Sales Prior Year] it is easy to write:

Change vs Prior Year = [Total Sales] – [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14, as well as every other function across the language, all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t already done so, why not sign up for my weekly newsletter at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

Download the DAX Reference Guide Using the Form Below

Extract Tabular Data From Power BI Service to Excel

Someone asked me a question yesterday about exporting data from the Power BI Service into Excel.  There are a few options to do this however they all have their problems (these problems are not covered in great detail in this post).

  • Power BI has an inbuilt export data feature (there is an export limit of 30k rows for a free account and 150k rows for a Pro account.  Plus the connection cannot be refreshed).
  • Analyze in Excel (can only view the data in Excel via a pivot table, not a table of records).
  • Download the Power BI Desktop file to your PC and use DAX Studio (you have to download the entire model just to get the records you want, besides it may not even work at all).

I started to think that I might be able to connect directly to the Power BI Service using DAX Studio and then use the techniques I cover here to extract the records.  I tried this but couldn’t get it to work (please let me know if you find a way to do this).  I started searching and found this old article I wrote in 2015 that uses a technique I learnt from Marco Russo.  I have used a similar approach to create a direct query of a Power BI Data Model and extract the records directly into a Table in Excel.

Edit: 11 Jan 2017 – see the bottom of this post to see how Chris Webb recommends to do this task – it is much easier :-).  But read on for some interesting learning.

Create a Link to PowerBI.com

The first thing I did was to select my dataset from the Power BI Service, clicked the Ellipsis (1 below) and then Analyze in Excel (2 below).

image

This downloads an ODC file to the downloads folder.  Mine was called Adventure Works.odc

image

Note: there have been many updates to the Analyze in Excel connectors since its launch last year.  If you haven’t updated the Analyze in Excel connectors for a while you would be well advised to do that first before attempting the rest of the procedure that follows.  This can be done by logging into Power BI, clicking Downloads (shown as 1 below) and then downloading and installing the connector software (2 below).

image

Edit the ODC File to Make it a Query

I then edited the ODC file using Notepad.  It looks like this originally

image

It is a bit hard to read the file as is, so I cut and pasted the contents into an HTML editor to improve the readability (as shown below), then pasted the formatted text back into Notepad.

image

As you can see above, the default command is “Cube” and the default text is “Model”.  The trick is to change the command to be “Query” and replace the text to be any valid DAX Query.

I changed lines 19 and 20 below as follows:

image

<odc:CommandType>Query</odc:CommandType>
<odc:CommandText>Evaluate Sales</odc:CommandText>

The Command Text I used above is one of the simplest DAX queries you can write.  All queries must start with the EVALUATE statement followed by any valid DAX code that returns a table of data.  The simplest way to do that is to just specify an existing table (Sales in this case).  You can use any table name in your data model.

I then saved and closed the ODC file.

Open the ODC in Excel

To open the ODC file in Excel, I then simply double clicked on the file.  This opened Excel and gave me the following security warning

image

I clicked enable, and there appeared the entire Sales table from my Power BI data model directly in a table in Excel.

image

Now that the connection has been established, the file can be saved and refreshed as needed in the future.  To refresh, simply open the file, right click on the table and click “refresh”.

Change the Query

So far this is pretty boring albeit maybe useful for some people.  Much more interesting is to be able to extract a subset of the data to Excel.  To do this you need to learn a bit of the DAX Query Language.  I am not going to go into the DAX Query Language in detail in this post, but below I show how to edit the current query and change it to a new query – in this case a query that will return the Total Sales by Invoice Number.

To edit the existing query, I simply right clicked anywhere in the table (1 below), then selected Table (2 below), then Edit Query (3 below).

image

This brings up a dialog that can be edited as needed.

image

For my demo, I replaced the command text with my new DAX Query as follows

image
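The query was along the following lines (the invoice number column name is an assumption – use the actual column name from your own data model):

EVALUATE
ADDCOLUMNS (
    VALUES ( Sales[Invoice Number] ),
    "Total Sales", [Total Sales]
)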

After clicking OK, this query then returned a table of all invoice numbers and the total value of sales for those orders.  Interestingly the number formatting for Total Sales did not flow through to Excel as I expected it to.

image

Chris Webb’s Trick

Edit 11 Jan 2017. Chris Webb tweeted an alternate approach that is faster to execute. Simply create any pivot table using Analyze in Excel, drill through to the underlying table and then edit the query as before.

Further Learning

I may come back and blog about DAX as a Query language at another time.  Until then if you would like to learn what is possible, I recommend this video from Alberto Ferrari (although it is a bit advanced unless you have some solid understanding of tables and databases).

www.sqlbi.com/tv/querying-in-dax-tee2014/

Cleansing Data with Power Query

Today I am combining a few techniques to show how to build a robust cleansing approach for your data using Power Query.  This article will demonstrate the following Power Query techniques:

  • Joining tables (Merge Query)
  • Self Referencing tables
  • Adding Custom Columns

Sample Data

I am using some sample data as shown below.  The Country data is the data I want to cleanse so there are consistent country values in the Country column.

image

I first load this data into Power Query and set “Close and Load To” so that it only creates a connection.  I then have a Power Query connection to the source data table that can be reused multiple times.

image

Create a Table of All Entered Countries

The next step is to produce a list of the unique entries in the source data.  To do this, I referenced the above table by right clicking (1 below), then selecting Reference (2 below).

image

I then removed the Name column (1 below) and then removed the duplicates in the remaining Country Column (2 below).

image

I then sorted the column before loading the data back to Excel as shown below.

image

Create the Substitutes with a Self Referencing Table

I first covered this trick of a self referencing table in this blog article.  Continuing on with my test data above, I have manually added a new column to the green table above and entered the substitutions as needed.  Where there is no substitute needed I have simply left the substitute blank.

image

As it currently stands above, if the source table is sorted and then the green table is refreshed, the substitute column will fall out of alignment with the source table – that is bad.  Now for the self referencing trick.  I loaded this green table back into Power Query as a new query, setting it to only create a connection as I did for the original set of data.  I named this new query “Substitutes”.

I then edited the original query that created the original green table and joined it to the substitute table (aka self referencing).  While editing the Entered Names table, I selected Merge Queries (1 below), selected the Substitute table (2), then joined on the Country Columns (3, 4) with a left outer join (5).

image
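If you look in the formula bar after this step, the M code the UI writes looks something like this (the query and step names will vary depending on your workbook):

= Table.NestedJoin(EnteredCountries, {"Country"}, Substitutes, {"Country"}, "Substitutes", JoinKind.LeftOuter)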

I then expanded the new column to extract the substitutes.

image

 

This now gives me a self referencing table that will refresh while keeping the countries and substitutes correctly aligned regardless of what happens to the source table.  This is the key point about a self referencing table (covered in my previous article).

image

Update the New Data

Now when new data is added to the source table, it will appear in the substitute table after refresh as follows.

image

It would be much better if it were clear which entries had previously been checked and confirmed as correct, making it more obvious which entries are new.  So I decided to enter the correct country name for every entry as follows.

image

It is now much clearer that there are 3 new entries that need to be maintained, and I then updated the new data below.

image

Use the Substitute Table

Now that I have this substitute table working for me, all I need to do is create a new copy of my source table that uses the substitute country rather than the original country entered.  I create a new query referencing my original source data (right click, reference).

Merge the queries as before, joining on the original country column.

image

Expand the table to retrieve the substitute column.

image

Then delete the original country column and rename the substitute column to be called “Country”

image

As you can see below, the substitute table is correctly remapping the data as needed.

image

When the new data is added to the source table, the substitute table will have blank entries as shown below.

image

As a nice touch, I went back into the final query and added a new custom column as follows
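The custom column formula was along these lines – it flags any row that does not yet have a substitute (the flag text is arbitrary):

= if [Substitute] = null then "** UPDATE SUBSTITUTE TABLE **" else [Substitute]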

I then used this new column as the country column.  Now when I refresh the data, it is clear that I need to do some maintenance on the substitute table.

Data Modelling with OneDrive Source Files

Last week I shared how it was possible to easily create an online survey using Excel for O365.  Once such a survey is created, the data can be imported into PowerBI.com and used to create a simple report.  The biggest issue with directly connecting to OneDrive like I showed last week is that you can’t do any serious data modelling.  You can’t write any DAX measures or join other tables into the data model etc.  Not being able to build a proper data model with my data set last week created the following issues (I am sure there are more).

  1. I couldn’t easily create a sort order column for my “how long have you been reading” question (hence why I needed to prepend a letter A, B, C etc to get the answers to sort).
  2. Without data modelling, there was no easy way for me to cleanse the data, eg some people said they came from “United States” and others said “US”, but that didn’t work in my map (it had to be consistent).

image

  3. I couldn’t write proper measures such as COUNTROWS(Survey) to find out how many surveys there were.  Instead I had to rely on “Count of Name”.  If someone left the name field blank, it would not be counted.
  4. I couldn’t change the column names to be something more suitable for reporting – I was stuck with the actual survey questions in the visualisations.

All of these issues (and more) are solvable if you can simply build a regular data model, but this is not possible when you directly connect PowerBI.com to a flat table of data in the cloud.

This article explains how you can create a data model with a cloud based service (like OneDrive) and also how I then went about improving the reporting from last week given I could create a proper data model.

Connecting Power BI Desktop to OneDrive

The alternative to connecting PowerBI.com to the OneDrive file is to use Power BI Desktop instead.

  • First build a data model using Power BI Desktop connecting to the cloud copy of the OneDrive file.
  • Publish the Power BI Desktop file to the service in the cloud (after you have done your data modelling of course)
  • Configure the Power BI cloud service to authenticate to OneDrive so that it can directly access the OneDrive data without further configuration or gateways.  Power BI automatically refreshes the data for you approximately once every hour.

Let me demonstrate how to do this using the same survey from last week.  If you haven’t already completed the survey, you can do so by clicking here and then see your survey response included in the reporting below.  You can read about how to create an Excel survey like this in the blog article from last week.

Copy the URL to your OneDrive File in the Cloud

There are a few ways to do this but the way I did it was to copy the URL of my OneDrive XLSX file and then use that to import the data “From Web”.

Get the URL

The tricky bit is finding the correct web URL of your file on OneDrive.  The way to do this is to open the Excel file in Excel on your PC and extract the URL from there.  You can either open the file from OneDrive in the cloud or the copy on your PC (I used the one on my PC).

Once you have the file open in Excel, click on the File menu, then Info (1 below), click on the OneDrive Link (2 below) and then click “Copy link to Clipboard” (3 below).  This placed the link into the clipboard.

image

With the URL still in the clipboard, I created a new Power BI Desktop report using Get data, Web as shown below.

image

I then pasted the URL from the clipboard and manually deleted everything from the question mark to the right (highlighted below).

image

So I was left with the following

image

I then got the following complicated set of options for authentication.  I took the last option to sign in with my Organizational Account, and signed in with my Office 365 credentials.

image

image

After authenticating, I was able to import the table of data into Power BI Desktop and Load as normal.

image

Load to Power BI and Configure Credentials

I will continue the process of building out my data model further down the page, but first let me complete the process to publish and configure the online credentials process.

First I saved the query (close and apply), saved the workbook and published it to my PowerBI.com account.  Once the dataset was in the cloud, I needed to configure my credentials for the dataset.    I clicked on Settings (1 and 2), Datasets (3) and then selected my dataset (4 below)

image

Because this is an online data source, I was able to edit the credentials as shown below.

image

After clicking Edit Credentials, I selected OAuth2 and signed in.

image

And I then got this confirmation.

image

Improving the Reporting Using Data Modelling

Back to the data modelling in Power BI Desktop.  The first thing I did was turn off data load for the table I had previously loaded.  This was quite easy. Right click on the table, then uncheck “Enable Load”.  This keeps the connection to the table but doesn’t load it in the current form.

image

The rest of this article explains how I changed the data to make it more useable in my final report.

Cleanse the Country Data

The next thing I did was create a data cleansing table for the list of countries. Ideally the survey tool would provide a list of all valid countries to select from a list, however this is not available using the simple online Excel survey tool I am using.  Instead I created a substitution table where I can manually create substitutes for data that I wanted to change.

I created a new query with reference to the first.  Right click on the first query (1 below) and then select Reference (2 below).

image

I then deleted all columns other than the country column, removed duplicates and sorted the column.  Finally I set it to disable load.  It is now possible to see all the data points entered and identify the dirty data that was included (shown below).

image

I then created a manual mapping table to make it easier to control the data cleansing process.  To do this, I simply created a new manual table in Power BI Desktop as shown below (and also turned off the data load).

image

Sorry to the nondescript North American if you happen to be Canadian, Mexican or of some other nationality.  I decided to map this to the USA.

Now I had the mapping table, the next step was to replace the original country data with the cleansed data. I created a new query referencing Table1, then joined on the country columns to the CountryCleanse table.

image

Using a Left Outer Join as shown below

image

I then expanded the new column to extract the cleansed version of the data.

image

This gave me a new column that contains the substitutes in the case there was a match in the cleansing table.

image

I then created a new custom column that uses this cleansed column if it exists, otherwise the original column.

image

image
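The logic in the custom column dialog was along these lines (the column names are assumptions based on the steps above):

= if [Cleansed] = null then [Country] else [Cleansed]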

I then deleted the other 2 columns and renamed my new column so that I had a cleansed copy of the Country.

Cleanse and Sort “How Long” Answers

In my survey question, I asked how long had the participant been reading my blog.  When the answers were placed in Power BI they didn’t sort based on length of time (they sorted alphabetically).  To fix this (using the original approach I blogged about last week) I was forced to add a prefix to force the answers to sort logically.  Using Data Modelling, it is possible to solve this problem properly.  For variety (in this blog), I decided to use a different mapping table solution. I first created a mapping table with the original survey choices, a new set of labels and a sort column as follows.

image

Then I loaded this mapping table as a lookup table to my data model and joined it on the Original column as the key.

image

Then to avoid confusion in the final report, I hid the original columns in both tables.  Right click on the column (1 below) and then select Hide in Report View (2 below).

image

Of course I sorted the Tenure column using the sort column from the mapping table.

image

Adding “All of the Above” into the other 3 choices

The last problem I wanted to solve was that many people chose “All of the above” when asked about their favourite topics.  When this was put in a visual (shown as 1 below) it is not possible to see the overall most popular item.

image

There are many ways to handle this problem.  The measure I wrote was as follows:

Count Most Useful =
COUNTROWS ( Data )
    + CALCULATE (
        COUNTROWS ( data ),
        Data[Which topics do you find most useful?] = "All of the above"
    )

The measure simply adds the count for each item plus the count of “All of the above” (shown as 3 above), and I then simply filtered the “All of the above” answer out of the visual (shown as 4 above).

My report was starting to look good (see below) but I still wasn’t happy.

image

Ideally I should be able to show “what is most useful” and also “what do you want to see more of” in a single bar chart.  But the problem was that there were 2 different measures, and the “bars” were coming from 2 different columns in the data source.  This could not be solved without thinking differently about the problem.

I created a new table that has the 3 responses and loaded it to the data model.  I didn’t connect this table to any other tables in the data model.

image

I placed this new column of data on the Axis of my visual and placed the 2 measures as Values.  This gave me a chart that looked like the one I needed, but of course there was no filtering, because the new “Types” table is not connected to anything in the data model.

This is important: the Axis is part of the filter context; the Types table is being filtered by the visual, but this filtering was not being used by the measures I wrote.

image

So I changed the measures so they would “harvest” the value on the Axis to filter the results.

Count Most Useful =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( data ),
            Data[Which topics do you find most useful?] = "All of the above"
        )
)

The final visual looks like this, which is much easier to read and understand.

image

And here is the final report

Easy Online Surveys with Power BI Reporting

I think today’s article will be of interest to my readers even though it strays a little from my normal pure Power BI, Power Pivot and Power Query content.  I will show you how to quickly and easily create an online survey that you can distribute to anyone that has an Internet connection, and then use Power BI to report on the results from your survey.

Office 365 Excel Surveys

You may not be aware that O365 has an inbuilt Excel survey tool that is very easy to use.  And who wouldn’t want to use Excel to solve a problem if they could?  You can find the survey tool by going to OneDrive online.  I am using OneDrive for business for this demo, but it also works for OneDrive Personal.

Here is the Survey – Please Participate

For demonstration purposes, I have created a survey to ask my readers what they would like to see more of on my website and to see which countries the readers live in.  Edit 7 Feb 2016.  The survey is now closed, but you can still read on.

image

Now that you have completed the survey, read on to see how you can do the same yourself, and to see the latest results via an embedded Power BI report at the bottom of the page.  Note that it can take up to 1 hour for your specific survey response to be available in the report (ie it is near real time – not real time).

How I Created the Survey

First Open OneDrive Online

Frankly I get confused with all the logins for Microsoft sites. The easiest way I know how to login to OneDrive Online is to Open Windows Explorer, right click on the OneDrive folder and then select View Online.

image

Create Your Excel Survey

Creating an Excel survey couldn’t be easier.  Just click New\Excel Survey as shown below.

image

I immediately got some survey header information to fill in as shown below.

image

There are no user manuals required to work out how to do this.  I just updated the information and moved on to the next section.  It is easy to enter the questions you have, and there are a number of intuitive response types that you can choose to meet your needs.

image

Survey Results Database

After you have completed your survey, you can see the underlying Excel spreadsheet that stores your survey results.  It is possible to add new columns to the right of the results table.  Additional columns added manually can be used to store information you want to add without it being part of the survey.

image

Connecting the Data to PowerBI.com

There are a few ways you can create a Power BI report.  If you want a fully featured data model, including the ability to import other tables of data and write your own DAX, then you need to import this new data source into Power BI Desktop as you would with any other report.  For this demo today, I have used the “Get Data” feature directly inside the Power BI Service to connect to the Excel workbook on OneDrive.  To do this, I first logged into PowerBI.com and selected Get Data from Files as shown below.

image

Then I selected OneDrive – Business as shown below.

image

Then I selected my survey file and clicked Connect

image

I was then prompted to either import the data or connect.  The connect option works just like Excel Online – not what I wanted.  So I selected Import.  When you use import in this way, Power BI will check your OneDrive account about once each hour to check for new data and will automatically update your report.

image

Power BI then created a new Dataset for me (1 below).  You can see the table of data shown in 2 below.  It also created a Dashboard, however this was not much use so I simply deleted the dashboard.

image

Here is the Final Power BI Report

Once I had the data connected, I simply created a few simple visualisations using the capabilities of Power BI Service.  You can see the final interactive report below.

Be sure to read next week’s post where I cover how you can use Power BI Desktop to model your online data. exceleratorbi.com.au/data-modelling-onedrive-source-files/

How to Document DAX Measures in Excel

I often get asked if there is an easy way to create documentation for DAX measures when using Power Pivot for Excel.  I am not a big fan of documentation for the sake of it, but I do see value in having “some” appropriate level of documentation.  I think a good balance of the right amount of documentation vs the effort to write and maintain the documentation is to use the Power Pivot UI itself.  I explain how I do it below.

Writing DAX Measures From Within Excel

I always teach students at my Power BI training courses to write their measures from within Excel via the “new measures” dialog box (and specifically not to write them from within the Power Pivot window).  The process is always as shown below.

  1. Set up a pivot table that has some relevance to the measure you are writing, then click in the pivot table (shown as 1 below)
  2. Select Power Pivot Menu \ Measures* \ New Measure* as shown below (*called Calculated Field in Excel 2013)

image

The reasons I teach students to do it this way are simple.

  1. You get a dialog box/wizard that shows you every step that is required to complete the task.  Just complete each step one at a time and you can’t go wrong.
  2. You get to check the measure for errors before completing it (shown as 2 below).
  3. Once you add the measure to your pivot table (shown as 3), you get to see immediately if it is working as you expected.

image

image

And there is a fourth reason to use the wizard that is less obvious.  The Description box (shown as 4 above) is a great place where you can make notes about your DAX formulas (aka documentation).    In Power BI Desktop you can add comments into your DAX expressions, but this is not possible in Excel, so the description box is the best option you have.

Create Your Documentation

If you diligently add comments into the Description box above, you will then be able to extract this information into a documentation page using the Power Pivot Utilities Toolbar.  I have previously reviewed this great little tool here, so you can go back and read about it if you missed it.

After installing both tools, you will then see the PP Utilities Toolbar in Excel (shown as 1 below)

image

Click on List Measures (shown as 2 above).  After a few seconds, you will see a new page in your Workbook called “Measures” like shown below.  Note the “Description” column on the right hand side. This column is sourced directly from the information entered into the Description box shown further up the page.

image

What’s more, if you select the measures on your page (Shown as 1 below) and then click Format DAX Expressions (2 below), the measures will be nicely formatted so they are easy to read.

image

This page is then your documentation and you can refer to it at any time you need to see the big picture.

Always Up to Date

One important feature of this type of documentation is that it is very easy to maintain.  If you make changes to any of the measures or comments in your data model, you can simply re-run the extract process so you have an up-to-date list of all the measures.  You can also do the same for Calculated Columns and also relationships.

What About Excel 2010?

Unfortunately the Power Pivot Utilities Addin only works for Excel 2013+.  But all is not lost.  It is possible to extract the measures from your data model manually using DAX Studio.  You can read my blog post about how to do that (and other cool things) with DAX Studio here.

64 Bit Power BI Desktop with 32 Bit Office? Can do!

As many of you would already know, 32 bit applications do not always play well with their bigger siblings, 64 bit applications.

There are lots of “under the hood” files and applications that are always designed to be either 32 bit or 64 bit – not both.  In the case of Microsoft Office, you must choose either 32 bit or 64 bit across the entire Office Suite.  You can’t for example have 64 bit Excel and 32 bit Outlook, PowerPoint etc (read more about it here).  This is unfortunate because:

  1. The ONLY Office application that can benefit from 64 bit is Excel running Power Pivot.
  2. Many (most) IT departments do not install Office 64 bit (for the same reason I mention above – there are often many conflicts with Addins and applications).

So chances are you have 32 bit Office applications installed…
image

…but you want to have 64 bit Excel because you want to be a Power Pivot Ninja.

So what to do? Well the best thing is to try to get your system upgraded to 64 bit Office.  If that is not possible, then there is a work around that I explain below.

Note: You must have 64 bit Windows and Admin Rights to proceed.  If you don’t have Admin Rights, your IT department may be happy to install 64 bit Power BI Desktop for you as an alternative to Office 64 bit.  Also note that this worked for me, but I can’t guarantee it will work for you.  Use caution and I recommend you create a system restore point before proceeding.

Power BI Desktop 64 Bit

It is possible to install Power BI Desktop 64 bit on a computer that has 32 bit Office installed.  There are a few things you may need to do however to make it work properly (depending on your config), but I explain those things below.  The high level process is:

  1. Install Power BI Desktop 64 bit
  2. Install the 64 bit data providers you need (such as the Access Database Engine) so you can import the data you need. Power Pivot relies on these data providers to connect to the relevant data sources.

Once you have 64 bit Power BI Desktop installed and running, you can then also optionally use Power BI Desktop as a local 64 bit server and direct connect to it from Excel 32 bit by following the instructions and template provided in my blog article here.

Download and Install Power BI Desktop

Installing Power BI Desktop is easy.  You can download it from the direct download here or from the Power BI site here.  Whichever place you go, just make sure you download the 64 bit version.

image

After downloading the file, just go ahead and install it in the normal way.

Try to Import Data in Power BI Desktop

After installing, run Power BI Desktop and try to import some data (say from an Access file like this one).  When you try to import data from Access, you will most likely get an error message like this.
image

But don’t panic, just follow the instructions below.

Download the Data Providers you need

If you have 32 bit Office on your computer, then any data providers you have on your PC will also be 32 bit.  These will not work with 64 bit Power BI Desktop and you will need to install the 64 bit versions as well as retaining your 32 bit versions.

Standard Install Process

  1. Download the data provider you need (AccessDatabaseEngine.exe 64 bit version in this case here)image
  2. Try to install the file.  If all goes well, you are done and you can start using Power BI Desktop.

However in some PC configurations, installing 64 bit data providers when there are also 32 bit data providers installed can be harder than you would think, because Microsoft can explicitly try to prevent you from doing this.   If this happens, you can force the install of the 64 bit data providers by following the steps below:

Install in Passive Mode

  1. Copy the install file to a location on your PC that is easy to access from the command line.  I put mine in C: root folder
  2. Click the Windows button, type “run” and press Enter
  3. Type c:\AccessDatabaseEngine.exe /passive into the command line box and click “OK”.  Of course you should use the correct path to your file.
  4. The files will install successfully in the background while suppressing warnings and errors.
  5. It’s a good idea to reboot your computer after the install.

You may need to repeat this process for any other data providers that you need.  The easiest thing to do is just start using Power BI Desktop until you find you have a problem, then find and install the new 64 bit data providers you need to solve each problem you come across.

Time to Test

Once you have installed the new data providers, it is time to test it out.  Open Power BI Desktop 64 bit and try to connect to data coming from an Access Database.  You can download the Adventure Works Access database here if you need a sample to test.

Use Power BI Desktop 64 bit as a Server on your PC

Take a look at my blog post from a couple of weeks ago.  If you follow the instructions in that blog post, you will be able to open Power BI Desktop 64 bit, open a large PBIX workbook, and then use Excel to connect directly to Power BI Desktop as a server.  You can also migrate your existing Excel workbooks that are too big for 32 bit Excel to Power BI Desktop (File\Import\Excel Workbook Contents) and then start to use Power BI Desktop as a server (you will have to recreate your visualisations however).

Import Tabular Data from PDF using Power Query

Today I am sharing a process I developed that allows you to import tabular data from a PDF document into Excel (or Power BI) using Power Query.  I didn’t want to purchase software to do this task so I started experimenting on how I could do it with the tools I already have, and I am sharing my solution here today.

pdf-to-pq

Note:  This will only work for tabular data in a PDF – exactly the same as downloading from a web page, the data must be in tabular form.

I also demo some neat Power Query tricks further down in this post, so make sure you read through to the end.  These tricks are not hard, and you may be surprised at what you can do.

Process Overview

Here are the steps I use to grab the table from the PDF.

  • Open the PDF in Microsoft Word.
  • Save the file as a single file web page.
  • Import the single file web page into Power Query as HTML.
  • Go through the normal cleansing and transformation process you would normally do (plus of course the cool tricks I use below).

Worked Through Example

I did a quick Google search to find a PDF containing something I could use in my demo.  I found this PDF that contains a list of US States with their abbreviations.

www.siue.edu/postal/pdf/Abbreviation-List.pdf    Note the data I want from this file is in tabular format.

image

I saved the PDF to a known location on my PC.

Convert the PDF to a Web Page File

I opened Microsoft Word and then I opened the PDF file from within Word.  I got this warning message below.  I just clicked “don’t show this message again” and clicked OK.

image

I then selected File\Save As and saved the file as a Single File Web Page (shown below).

2016-11-18_121653

 

Import into Power Query

There are a couple of ways you can do the next step. I found the following to be the easiest.

Create a new query that connects to a text file (example shown below – Excel 2016, but it is similar in all other versions and Power BI)

image

In the dialog, change the file filter so it can find all file types (as shown below), then browse to the location of your file.

image

Then select the MHT file created earlier and click “Import”.

image

Power Query will not correctly identify the file type, so right click on the file (shown below) and then select HTML.

image

Power Query now correctly identifies the table in the document.  I expanded the one identified as “3DTableGrid”.

image

In your own PDF documents, you may need to use some trial and error to find the right table.

I then clicked on the Table I wanted (shown as 2 above).

Cleansing the Data

The data is starting to look good at this stage.  But note there is a bit of “noise” in the file (see below). Power Query is a great tool for fixing that.

image

First I got rid of the first row (Remove Top Rows, 1).

Next I copied the text <= /span> by right clicking on one of the cells containing this text, and selecting “copy”.  Then I selected all 4 columns and did a Transform\Replace Values and replaced <= /span> with nothing.

I didn’t need the bottom 11 rows, so I did Remove Rows\Remove Bottom Rows\11

image

Now for Some Power Query Tricks

Power Query is a formula language. If you analyse each of the steps in the Applied Steps window, you will notice that each step is a formula, and the formula takes the output of the previous formula as the input step to the next formula.  The UI assumes each step will take the previous step as an input to the new step, but it doesn’t have to be that way.  You can override that if you want – I show you how below.
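As a generic illustration (this is not the query from this article – the file path and column name are made up), every query is just a chain of named steps inside a let expression:

let
    Source = Csv.Document(File.Contents("C:\Data\Sample.csv")),
    // the UI makes each new step take the previous step as its input...
    Promoted = Table.PromoteHeaders(Source),
    // ...but a step can reference any earlier step by name, eg Source
    Cleaned = Table.SelectRows(Promoted, each [State] <> null)
in
    Cleaned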

Turn on the formula bar before proceeding.

formula

At this point, I have my data in 4 columns, but I really want all the data in just 2 columns.

image

This is easy to fix with some simple Power Query tricks, with very little PQL coding at all.

First I renamed this step to be called All4Columns.  I like to rename significant steps in my Applied Steps window to make it easier to find the ones I need later.

image

Then I removed the last 2 columns using the UI: I selected the columns to remove, right clicked, and chose Remove Columns.

image

Before moving on I renamed the 2 columns to be called “State” and “Abbreviation”, and renamed the step to be called First2Columns using the same renaming approach as before.

image

Now the trick. At any point in your query, you can add a custom query step by clicking the button shown in 1 below.

image

Power Query assumes you want to add a new step that starts from the previous step. So Power Query automatically adds the previous step as the starting point (as shown below).

image

But you don’t have to keep the previous step.  In this case I want to refer to the step “All4Columns” instead.  So I just typed in the name of the step I wanted (as shown below) to replace what Power Query added for me.

= All4Columns

Now that I had all 4 columns back, I went through the following steps.

  • removed the first 2 columns
  • renamed the columns to be “State” and “Abbreviation” as before
  • renamed the step to be called Second2Columns

This then left me with 2 non-sequential steps (First2Columns, Second2Columns) that each contained half of the data.

Append The 2 Tables into a Single Table

I didn’t know how to write the code to do this, so I just used the UI again.  I selected “Append Queries” and appended the query to itself.  That forced the UI to write the code for me as shown below.

image

Not only did the UI write the code for me, but I also learnt a single PQL function – Table.Combine() – that I can probably remember, and hence I may be able to write the same step by hand next time around (this is how you can learn too).  Then all I needed to do was change the code so that instead of appending the query to itself, it appended First2Columns to Second2Columns.  You don't need to be a rocket scientist to work out how to do this :-).   It looks like this after I made the change.

= Table.Combine({First2Columns, Second2Columns})
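
To tie it all together, here is a condensed, self-contained sketch of the whole pattern as it might appear in the Advanced Editor.  The sample rows and the generic column names are made up for illustration; the step names match the ones used above.

let
    // Stand-in for the cleansed 4-column table built in the steps above
    All4Columns = Table.FromRows(
        {{"Alabama", "AL", "Montana", "MT"}, {"Alaska", "AK", "Nebraska", "NE"}},
        {"Column1", "Column2", "Column3", "Column4"}
    ),
    // Keep the first 2 columns and rename them
    First2Columns = Table.RenameColumns(
        Table.RemoveColumns(All4Columns, {"Column3", "Column4"}),
        {{"Column1", "State"}, {"Column2", "Abbreviation"}}
    ),
    // Start again from All4Columns (not from the previous step) and keep the last 2 columns
    Second2Columns = Table.RenameColumns(
        Table.RemoveColumns(All4Columns, {"Column1", "Column2"}),
        {{"Column3", "State"}, {"Column4", "Abbreviation"}}
    ),
    // Stack the two halves into a single 2-column table
    Appended = Table.Combine({First2Columns, Second2Columns})
in
    Appended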

Wrap Up

I hope you found this article useful and that you have learnt a few things:

  1. How to import data from a PDF
  2. How to refer to non-sequential steps in a query
  3. How to use the UI in Power Query to help you write more advanced queries
  4. How to use the Formula Bar to start to build your knowledge of the Power Query Formula Language

If you are into Power Pivot and you haven’t already done so, you may like to download my paper “Ten Things I Wish I Knew When I Started With Power Pivot” from the link below.

Edit: 24th Nov 2016
A friend of mine, Dave Marriott, was using my approach above but had a stack of PDFs that he needed to convert. He decided to write some Word VBA code that automates the task of converting the PDFs to MHT files. Dave is a professional VBA programmer based in Adelaide, Australia; if you would like to hire him to help you be more efficient in your business, you can contact him on LinkedIn.

Copy this VBA code into Microsoft Word.  Why not add it to your Normal Template so it is always available?  If you haven't done this before, you can follow my instructions on how to do this in Excel (it is exactly the same process in Word).  Run the code to launch a dialog.  You can multi-select as many PDFs as you like, and the code will convert them all for you.

Sub ConvertToMHT()
    ' Converts each selected PDF to a Single File Web Page (.mht) using Word

    Dim f_dialog As FileDialog
    Dim FileChosen As Integer
    Dim i As Integer

    ' Set up a file picker dialog that allows multiple files to be selected
    Set f_dialog = Application.FileDialog(msoFileDialogFilePicker)
    f_dialog.InitialView = msoFileDialogViewList
    f_dialog.AllowMultiSelect = True

    FileChosen = f_dialog.Show
    If FileChosen = -1 Then ' -1 means the user clicked Open (not Cancel)
        For i = 1 To f_dialog.SelectedItems.Count
            ' Open the PDF, save it with the same name but an .mht extension, then close it
            Documents.Open FileName:=f_dialog.SelectedItems(i)
            ActiveDocument.SaveAs2 FileName:=Mid(f_dialog.SelectedItems(i), 1, Len(f_dialog.SelectedItems(i)) - 4) & ".mht", FileFormat:=wdFormatWebArchive
            ActiveWindow.Close
        Next i
    End If

End Sub

Excel Workbook Connection to Local Power BI Desktop

Today I am sharing an easy way to use Power BI Desktop as an SSAS tabular server on your local PC.  I find increasingly that I am building data models in Power BI Desktop, and yet I still want to use Excel to analyse the data some of the time.  If I load the PBIX workbook to the Power BI Service, it is easy to use "Analyze in Excel" to query the data model.  But there is currently no standard supported method to do the same thing using Power BI Desktop as a server (see my suggested idea to Microsoft at the bottom of this page for more about that).  Until then, we need to use a "hack" to complete this task.

I have previously shared a way to:

  • Open DAX Studio on your PC
  • Use DAX Studio to extract the Port Number of a running instance of Power BI Desktop
  • Manually enter this information into Excel to connect to Power BI Desktop and use it as a local SSAS Server instance
  • Query the data model that exists in the open Power BI Desktop file with a Pivot Table in Excel.

The problem with the above approach is that once the Power BI Desktop file is closed, the connection string becomes invalid.  Next time you want to use Power BI Desktop as a server you must go through the entire process again plus you need to alter the connection string that was previously created – hardly a scalable solution.
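
For context, the connection string in question is a standard Analysis Services (MSOLAP) connection string along the following lines.  The port number and catalog name here are made up – Power BI Desktop generates new ones every time it starts, which is exactly why the saved connection breaks.

Provider=MSOLAP;Data Source=localhost:56278;Initial Catalog=<database GUID>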

An Excel VBA Workbook Template to Solve This Problem

Local Host Workbook

Today I am sharing a tool that I built recently that completes the entire process automatically in Excel using VBA and Power Query.

Disclaimer:  I am providing this tool free of charge, without support, and I do not warrant that it will work for you.  This is not supported by Microsoft and could stop working at any time.  If it works for you, then great – if not, then it is unfortunate :-(.  Maybe you can deconstruct what I have done and make it work for you.  I built this using Excel 2016, and I have tested it on 2010 and 2013 and it works (please make sure you have the latest version of Power Query installed).

You create a new blank Excel Workbook using my template (from personal templates), then click the "Refresh SSAS Connection" button.  The tool will then:

  • detect the instance of Power BI Desktop running on your PC (there must be 1 and only 1)
  • find the port number and database name of the running instance
  • modify the existing connection string in the Excel Workbook to repoint it to the currently running instance of Power BI Desktop.

If the workbook already contains pivot tables that were previously built using the same data source, they will simply reconnect and work as you expect.  If this is the first time you have used the template, you will get a blank Pivot Table connected to the Power BI Desktop data model.  Note that you should only have 1 Excel Workbook and 1 Power BI Desktop file open at any one time (as they all use the same connection string).

I am not planning on covering in detail how the tool works.  I am making it available to use and modify as you see fit.  If you want to look into the VBA code and the Power Query to see what I did, then of course this is fine by me.  If you want to improve it and give me back the improvements under the same "share and share alike" rules, then please do so, as long as you leave my name and comments in the file untouched.

How to use the Template

You can download the template here.  I have put it in a zip file so that I can keep the same link but change the name of the actual file over time (to manage version numbers).

The file is a macro enabled Excel Template workbook.  You need to place it in your Personal Templates location.  To find this location in Excel, go to File\Options, then navigate to Save (1 below) and check the location (shown as 2).  If you don't have a current template location, you can set your own.  Note that Excel 2010 has a different file location (Google it).

image

To create a new workbook that uses this template, in Excel go to File\New, select the personal templates option (shown as 2 below) and then select the template.

image

I would love to hear what you think and find out if this is a useful tool for you.

Let’s Ask Microsoft to Make this a Standard Feature

I was talking to Avi Singh and he suggested that I request this as a standard feature.  I have set up an idea that you can find and vote for here.  If we get enough votes for this feature, Microsoft may build out this capability into the core product(s) so it is easier for everyone and fully maintained for future releases.

 

Power Pivot Crashes? – An Update

Today Avi Singh and I met with Howie Dickerman from Microsoft to discuss the Power Pivot for Excel Add-In.  You can read the background to this meeting in my blog post from last month.  Howie is the Program Manager assigned to Power Pivot for Excel and has a wealth of experience with Excel and other products.  I shared all the comments that people added to my blog post from 19th October.  The great news is that Howie is already crystal clear about the problems.  We had a great discussion about the plans that Microsoft has for improving stability and performance of the Add-In.  And there definitely is a plan.  Like any software project, it is impossible to say when we will see something, but I am just excited that there is a plan.

How You Can Help

Microsoft already has a clear understanding of the areas that need some work.  What Microsoft needs more than anything is a set of structured steps that will reproduce an issue – any issue you have.  From experience this is very hard to pin down.  If you know how to reproduce an issue, please send me the steps to reproduce it (you can add to the comments below or send me a message via my home page).

Note:  I realise lots of people have had a bad experience at times.  I am explicitly looking for "reproducible steps" here.  This means that when you follow the steps, they will always (or often – eg 1 in 4 times) cause the issue.  If it is not reproducible, then the information cannot help for this next phase.

When to Create a Lookup Table in Power Pivot

Level: Beginners

Today I explain when it is important to create a lookup table and when it is fine to use the native columns in a data table.  I have rated this as a beginner topic because it is a fundamental skill to learn on your journey to becoming a Power Pivot and Power BI Ninja.  However, there is so much to learn, and you can only absorb so much content before your head begins to explode.  So you may find yourself learning this topic for the first time now even though you are an intermediate user, or you may come back and read this again a second time before it all sinks in.  This topic is therefore good for everyone – as a review, a confirmation of what you know, or a new learning opportunity.

It is also worth pointing out that this topic does not mean the difference between "it works" and "it doesn't work".  You can be successful using Power Pivot and Power BI without understanding this topic well.  But it can make a big difference to the usability of your model and certainly to performance (in some cases).  And not least of all, if you want to get better and advance your skills, you simply must understand these principles well.

Topics Covered in This Post

In this post I cover the following scenarios to help explain some of the nuances and benefits of using lookup tables (or not):

  • Simple data models
  • Inbuilt time intelligence functions
  • To Simplify Your DAX formulas
  • Pointless dimension tables
  • Multiple data tables
  • Simple business communication

First a Definition of the 2 Table Types

Before I move on, it is worth restating the definition of the 2 different types of tables so it is clear in your mind.

There are 2 different types of tables in a Power Pivot data model.  There are dimension tables (I call them lookup tables) and there are fact tables (I call them data tables).  These two table types are very different and understanding the difference is fundamental to understanding and using Power Pivot and Power BI properly.

Lookup Tables (Dimension Tables)

Lookup tables contain information typically about 1 business concept/object, eg Customers, Products, Time etc.

image

Lookup tables always have the following features:

  • There is always a column of data that uniquely identifies each row of data in the table.  In database terms this is the primary key.  In the customer table it would be the customer number, in the products table it is the product code, in the calendar table it is the date, and so on for any lookup table.
  • There can be no duplicates in this key column – every row must be unique
  • There can be 1 or more additional columns in each lookup table that provide more information about that primary key column.  Eg in the calendar lookup table, the date is the primary key. Other columns such as Day Name, Month Name, Year, Day of Week number etc all provide detail about the date key.  I often refer to these additional columns as providing metadata about the primary key.

Data Tables

Data tables contain (typically) transactional information.  It could be sales data, budget data, GL data from a financial system, call centre call data, or any other data about activities of some type.

image

Data tables have the following features:

  • There is no requirement that a data table must have a primary key.
  • There needs to be 1 column that can be connected to relevant lookup tables (assuming you want to connect these tables).  In database terms, this column is called the foreign key.  It should be obvious that if you want to join your data table to your product table, you must have a column that defines the product in your data table.
  • Duplicates are allowed in data table columns.  eg many sales occur on each day, many sales occur for each product.

Joining the Tables

You cannot join data tables to other data tables.  Data tables are always joined to lookup tables using the Lookup Table primary key column as the join column.

I am using the Adventure Works database in the image below.  I have placed the lookup tables in my data model at the top of the window and the data table below.  This is the Collie Layout Methodology; it gives the user a visual clue that the tables you have to “look up” to see are the Lookup Tables – get it?  It is not required, just an easy way to remember.  Technically this data layout is called a star schema.

image

When To Use Lookup Tables

Back to the point of this post.  When should you use lookup tables, and when is it OK to simply use the column in your data table?  Well, the fact is that you do not "need" to use lookup tables at all to make your data model work – it is perfectly fine to simply load a data table and go from there.  Having said that, read on to find out the nuances of some common scenarios.

Simple data models

Let me start by covering when it is OK not to use any lookup tables.  If your data meets the following scenario, it is fine to ignore lookup tables:

  • Your data set is small.  What defines "small" depends, of course, but certainly 10k – 250k rows of data is small (tiny actually). It is less important to store your data efficiently when your data set is small, and hence a flat single data table can be OK.
  • Your data doesn’t contain too many columns (this is a topic in its own right and outside of the scope of this post).  If you have lots of columns (eg >15) then even 250k rows of data can become non-performant.  But if you have a few columns, then maybe you don’t need lookup tables.
  • Your data table already contains all the columns you need for your analysis.
  • You don’t need to use any inbuilt time intelligence calculations (more on that below).
  • You only need to write simple DAX formulas (such as SUM formulas).

If this is your scenario, then you don’t need lookup tables at all.  However if you always operate in this simple scenario then you will not advance your DAX knowledge to the next level, so keep that in mind.  This scenario with no lookup tables is quite rare and is probably limited to Excel users that already have their data in a single flat table.

Inbuilt Time Intelligence Functions

It is mandatory to have a calendar lookup table if you want to use the inbuilt time intelligence features of Power Pivot.  Examples include calculating sales last year, sales year to date etc. I cover calendar lookup tables in depth in this article here.  It is possible to do manual time intelligence calculations (not inbuilt, but manual) using a single data table, but I don’t recommend it.  If time is an important part of your data, then I strongly recommend you get a calendar lookup table and go from there.
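
To illustrate, here are the kinds of inbuilt time intelligence formulas I am referring to.  This is a sketch only – it assumes you have a [Total Sales] measure and a Calendar lookup table containing a Date column.

Total Sales YTD = TOTALYTD([Total Sales], 'Calendar'[Date])
Total Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Calendar'[Date]))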

To Simplify Your DAX Formulas

While it is possible to use a simple data model with a single data table and no lookup tables (as mentioned above), once you have more complex analysis needs (eg other than just using SUM) you will need to write some DAX formulas to create the business insights.  In my experience it can be easier to write DAX formulas when you have lookup tables, particularly when you need to change the filter context in your data model.  “How” to do this is also out of scope for this topic, but be aware that this is another reason to move on from a single data table.

Pointless Dimension Tables

There is no point creating a dimension/lookup table without a reason and certainly not if there are only 2 columns in the lookup table.  This will make more sense with an example.

Take the Adventure Works data model as shown below.  It contains a lookup table (Products) with 2 columns, one for Product Key and one for Product Name.  The lookup table is joined to the data table using the primary key (of course).

image

The data in this lookup table would look something like shown below – only 2 columns (but a lot more rows of data than shown in this image of course).

image

People who have some understanding of traditional relational databases will be aware that such databases store the data in the first column (ProductKey) a lot more efficiently than the data in the second column (Product Name).  It is therefore common to think it is more efficient to store the product key in the data table and put the product name in the lookup table only.  But this is not the case in Power Pivot and Power BI.  Power Pivot uses the xVelocity storage engine (a columnar store) to compress the data, so there is no (or little) space saving benefit in storing the product key in the data table instead of the product name.  In addition, every relationship in your data model comes at a cost/overhead.  A 2-column lookup table like this adds a relationship without adding any real value – this is why such a lookup table is called a junk dimension. If the only reason you have this lookup table is to join the single column "Product Name" to the data table via the product key, then it would be just as good to load the product name into the data table and drop the product key altogether.

If your data already has the product name (and not the product key) in the data table and you are thinking of creating this lookup table, then you may want to think again.  That being said, there are other reasons why you may want to keep the primary key and lookup table, including:

  • if you have more than 2 columns in your lookup table
  • if your product names are not unique (often the product key is managed more rigorously than the name)
  • if your data table already has the product key in the table and it is easier to do it this way

to name a few.

If your data table contains a column that is not related to any other column as part of a logical object, and you are wondering if you should build a lookup table for it – the answer is no, don't do it.  Just use the column in your data table.

Multiple data tables

As you probably know already, it is possible to have multiple data tables in your data model. In fact this is one of the many great things about Power Pivot.

image

However, as I mentioned at the start of this post, it is not possible to join one data table to another data table in Power Pivot. If you want to use multiple data tables in your data model, you must join them through common lookup tables.  It therefore follows that if you have multiple data tables in your data model, you should load lookup tables even if they are junk dimensions.

Simple Business Communication

There is a clear benefit to business users if you logically group your columns of data into lookup tables that make business sense.  It can be hard for users if they need to hunt through the data table for information about a customer, or a product etc.  It is a much better end user experience if there are logical groupings of columns of relevant data in a table called Customer, Product etc.  Given the end users may be writing their own reports using the field list to find the relevant data, grouping the data into lookup tables can really improve the end user experience.

Flatten Your Lookup Tables

Further to the above point, you should also consider flattening your data into a single lookup table if you can.  The following data model is feasible and will work, but it is hardly user friendly.  In the image below there is a product lookup table, then a sub-category lookup table joined to the product table, and a product category lookup table joined to the sub-category table.

image

But if you deploy this model, it is likely to be confusing to the end user.  Remember they will be building reports using the field list, and they will need to go searching across different tables to find the columns of data they use.  It is much better to flatten the data from the additional lookup tables into a single product lookup table if you can.  Also flattening these tables will remove extra relationships and this will make the model more efficient (every relationship comes at a cost).  And finally your DAX will be easier to read and write.

Wrap Up

Hopefully this article has helped clarify when and where to use lookup tables.  Let me know your thoughts and/or any questions that remain unresolved in the comments below.

Find Duplicate Files on Your PC with Power BI

Level: Beginners

If you want to learn new skills with a new tool, then you simply must practice.  One great way to practice is to weave the new tool into your daily problem solving.  If you have something meaningful to do with the new tool, then you are much more likely to be motivated to practice.  And the new tool I am talking about, of course, is Power BI.

Last week I showed how easy it is to use Power BI to help you track down large files saved in Dropbox so you could manage the overall space usage.  As a result of that article, Graham Whiteman posted a comment suggesting it would be a good next step to find duplicate files.  I think that is a great idea, so I decided to test it out on my PC.  Read on to see how I did it, and how you can do it too.

Create a Query to Fetch All PC files

I started a new Power BI Desktop file, then connected to my PC documents folder.

image

image

I immediately selected Edit query as shown in 1 below.

image

The only time you would immediately select Load (2 above) is if the data you are importing is already in the correct shape for Power BI.

The only columns I needed were the file name, date modified, attributes and path (shown below).  I multi-selected the columns I wanted to keep, then right clicked and selected "Remove Other Columns".

image

The next step was to extract the file size from the attributes list. To do this, I expanded the list of attributes (1 below), deselected all the columns and then reselected the file size (3 below).

image

Then I renamed the query (1 below) and changed it so it didn't load to Power BI, by right clicking on the query and un-checking the Enable Load option.

image

This created a query that links to the PC and keeps the columns of data needed, but doesn't load anything to Power BI yet.
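
For those who like to see the code, the query at this point looks something like the sketch below in the Advanced Editor.  The folder path is made up, and the step names Power Query generates for you may differ.

let
    // Connect to the folder and list every file beneath it
    Source = Folder.Files("C:\Users\You\Documents"),
    // Keep only the columns needed for this analysis
    KeptColumns = Table.SelectColumns(Source, {"Name", "Date modified", "Attributes", "Folder Path"}),
    // Pull the file size out of the Attributes record
    ExpandedSize = Table.ExpandRecordColumn(KeptColumns, "Attributes", {"Size"}, {"Size"})
in
    ExpandedSize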

Create a New Query that Accesses the File List

The next step was to create a new query that references the File List.  I right clicked on the first query (1 below) and then selected Reference (2 below).  Note how the File List query is shown in Italics indicating that it won’t load to Power BI.

image

The next step was to merge this data with itself by going to the Home Ribbon and selecting Merge Queries.

image

In the Merge Queries dialog, I joined the new query File List (2) with the original query File List, so the data was joined to itself on 3 columns (the file name, modified date and file size) but not the folder path, as shown below.

join file list

The above steps added a new column to the query.  I then expanded the new column as shown below, making sure to keep the original column name as a prefix.

image
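
Behind the UI, the merge and expand steps generate code roughly like the two formula bar lines below.  This assumes the query and column names used above; your generated step names may differ.

= Table.NestedJoin(Source, {"Name", "Date modified", "Size"}, #"File List", {"Name", "Date modified", "Size"}, "File List", JoinKind.Inner)
= Table.ExpandTableColumn(#"Merged Queries", "File List", {"Folder Path"}, {"File List.Folder Path"})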

Find The Duplicate Files

The second query now looked like this.  As you can see in the image below, the query returned all the files (name column) along with the folder paths from the query “File List” shown as 1, and a second column containing the folder paths from the query “File List (2)” shown as 2 below.

image

The next step was to get rid of all rows in this query where the 2 folder paths are identical.  Doing this is easy with a custom column.  I added a custom column (steps 1 and 2), and wrote a formula to return TRUE if the 2 folder paths were identical.

image
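
The formula itself is just a comparison that returns TRUE or FALSE – something like the line below, assuming the expanded column was named File List.Folder Path.

= [Folder Path] = [File List.Folder Path]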

I then filtered out everything that returned a TRUE in the new column using the filter button as shown below.

remove matches

I then deleted this custom column as it was no longer needed.  I just right clicked and selected remove.

Format the Number Columns

It is very important in Power BI to set the correct data types before loading the data.  Any numeric column that has the data type "Any" should be changed to a suitable numeric type (as shown below).

image

I did this, renamed the query to be called "Duplicates", and then selected "Close & Apply" to load the data into Power BI.

Time to Write Some DAX

Now the data is loaded, I could of course just drag one or more of the columns onto the Power BI canvas.  But remember, half the reason for doing this is to build some new skills.  So instead of dragging the Size column and creating an implicit measure, I wrote some DAX – it isn't hard to get started with such simple formulas.  Here's how to do it.

Select the Size column, go to the Modelling Ribbon and select New Measure.

image

The formula I wrote is as follows:

File Size MB = SUM(Duplicates[Size]) / (1024 * 1024)

image

Note a few things that I was able to do by writing this measure myself:

  1. I converted the units of the result from bytes to megabytes by dividing by (1024 x 1024).
  2. I gave the measure a more meaningful name “File Size MB”
  3. I was able to set the formatting to comma separated with 1 decimal place

And of course I practiced my DAX.

And the Results

I then simply added the file size, File Name, Folder Path and Second Folder Path to a table in Power BI, as shown below.

image

I then discovered I had around 9 GB of duplicate files on my PC.  I sorted the table by File Size descending and discovered that I had multiple identical copies of contoso.pbix.  It looks above like there are 6 copies of contoso.pbix, but this is deceiving: every copy of a file will find a match with every other copy, so 3 copies produce 3 x 2 = 6 rows.  If you look at the Folder Path column, there are only 3 unique folder paths, hence 3 files.

The next thing I did was add a Tree Map as shown, with the file name in the Group section and File Size MB in the Values section.

image

To find out accurately how many copies of each file there were, I had to write some more DAX.  This formula is a bit more involved (intermediate DAX).

image
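
In text form, the formula is along the following lines.  I have reconstructed it from the explanation below, so check the column names against your own model.

File Count =
SUMX(
    VALUES(Duplicates[Name]),
    CALCULATE(DISTINCTCOUNT(Duplicates[Folder Path]))
)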

Let me explain this formula starting from the inside out.  There are 4 functions in this DAX formula and I describe their role below.

  1. SUMX is an iterator.  It iterates over a table specified as the first parameter (VALUES in this case).  You can read more about SUMX here.
  2. The VALUES function returns a table of unique file names (in this case it is unique values in the column Duplicates[Name]).  So SUMX above will iterate over each file name in the name column.
  3. SUMX is iterating over a Virtual Table (VALUES).  The CALCULATE is required to force context transition.
  4. Then for each file name in the table (in 2 above), DISTINCTCOUNT will count how many unique folder names there are.

I then added the new File Count measure to the Colour Saturation section of the Tree Map (1 below).  This does 2 things.  Firstly it shows the high folder count files as being a darker colour, and secondly it adds the file count to the tool tips (visible when you hover the mouse over the visual).

image

And Now Some Fun

I’ve been looking for an excuse to do this for some time.  I want to find the fattest fish in my pond (aka most space taken by file name).  I went to visuals.powerbi.com and downloaded the Enlighten Aquarium custom visual.

app.powerbi.com/visuals/show/Aquarium1442671919391

I then imported the custom visual into Power BI Desktop

image

I then copied my Tree Map visual (Ctrl-c, Ctrl-v), selected the copy and changed the visualisation to the Aquarium.  This visual shows the largest individual files regardless of location or how many copies exist.  I am not saying this is the best way to visualise data, but surely it is one of the most creative.

fish

Here is my final workbook canvas.

image

For the purists out there, I wrote a new file size formula as follows.

image
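
In text form, the improved measure looks something like this (the measure name is my own; the logic follows the description below):

File Size MB Unique =
SUMX(
    VALUES(Duplicates[Name]),
    CALCULATE(MAX(Duplicates[Size]))
) / (1024 * 1024)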

 

The original formula I wrote double counts the file size when there are multiple duplicates.  The above formula is almost identical to the File Count measure I explained above.  The only real difference is the inclusion of MAX(Duplicates[Size]).  This is a "trick" to handle the fact that for each file name there will be multiple records in the data model.  Each copy of a file has exactly the same file size, so by selecting MAX I simply get access to that single file size.  I could have used MIN or AVERAGE and got the same outcome (but not SUM, as that would double count again).

I haven’t shared the actual workbook here. The whole idea is for you to try this yourself so you get 3 benefits: more disk space, some practice with Power BI Desktop, and some fun.