Archive for PowerPivot

Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a Query Language, and in my second article I went deeper into the specifics of how to use some of the more useful DAX query functions.  In this final article I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

Create Summary Tables

As you may have seen in my previous articles, you can use DAX Queries to create summary tables like the one below.
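A query along the following lines will produce such a summary table – a sketch that assumes the Adventure Works tables and the [Total Sales] measure used throughout this series:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)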

This may be useful if you want to pre-calculate this summary table and have it as part of your data model (more on that in the materialisation sections below).  If you want to create a summary table for any reason, you can do it using DAX Queries.

Create Tables to Use As Filters Inside CALCULATE

Another major use of queries is as a table input inside a CALCULATE function.  Because these virtual tables retain lineage to the data model, they behave as if they were physically part of the data model.  The following measure returns the total sales for all unique combinations of product category and customer gender where the margin for those combinations is greater than 45%.
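Here is a sketch of such a measure (it assumes [Total Sales] and [Margin %] measures already exist in the model):

Sales for High Margin Combinations =
CALCULATE (
    [Total Sales],
    FILTER (
        SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
        [Margin %] > 0.45
    )
)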

This measure doesn’t have any real business value, however it does demonstrate the concept well.

Create Tables to Use in Iterators

You can also create a table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.
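A sketch, again assuming a [Margin %] measure exists in the model:

Straight Line Avg Margin % =
AVERAGEX (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %]
)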

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is they allow you to “inspect” tables that are used inside DAX Measures and Calculated Columns.  Excel users come from a world where everything they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX Measures or Calculated Columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise and also hard to debug the measure particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly you can insert the table portion into the measure.
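For example, to debug the high margin measure sketched earlier, you could first materialise just its table portion in DAX Studio:

EVALUATE
FILTER (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %] > 0.45
)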

Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel Workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet (shown as 1 below) and then select Data\Existing Connections\Tables, then select any table (I have selected Products shown as 5 below) and then click Open.  *At this point you can select any table as it is going to change shortly anyway.

[Screenshot: Data\Existing Connections\Tables dialog, with a blank area of the spreadsheet selected (1) and the Products table selected (5)]

You will be given an option to insert a Pivot Table however you also have an option to insert a Table as shown below.  Go ahead and click OK to insert this table.

[Screenshot: Import Data dialog with the Table option selected]

Now that you have a table inserted into a spreadsheet from the data model, it is possible to change what is returned in this table.  To change what is returned in the table, right click anywhere inside the table (shown as 1 below) and then select Table\Edit DAX.

[Screenshot: right-click menu inside the table (1) showing Table\Edit DAX]

Now you have the opportunity to change this table from being a “table” to being a DAX Query as shown below.

[Screenshot: Edit DAX dialog with the Command Type changed from Table to DAX]

I have now taken a DAX Query I wrote earlier and pasted this into the expression box (shown below) to return a summary table into the spreadsheet.
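In my case it was the summary query from earlier (the same sketch shown at the start of this article):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)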

[Screenshot: Edit DAX dialog with the DAX Query pasted into the expression box]

You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel although I believe it can be done with VBA. That may be a topic for another day.

Materialise Tables Into Power BI

You can also materialise DAX Query Tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only difference to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises the table (shown as 5) from the formula (shown as 4).
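As a guide, the earlier summary query would be entered into a New Table like this (note: no EVALUATE):

My Summary Table =
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)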

[Screenshot: Power BI Desktop New Table, with the formula (4) and the resulting table (5)]

Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot calculates a formula under the hood.  To demonstrate the point I need to introduce a new DAX Query function called ROW.

You can see the syntax for ROW below from the DAX Studio Intellisense.

ROW ( "name", <expression> [, "name", <expression>]… )

ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.
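EVALUATE
ROW ( "My Result", [Total Sales] )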

[Screenshot: DAX Studio showing the ROW query (1), the query result (2) and the Server Timings button (3)]

My ROW function (shown as 1 above) returns a single row table with 1 column (called “My Result”), and the row has the value 29.3m (shown as 2 above) which is the total for the entire database.  Note the Server Timings button (shown as 3 above).  If you click this button, DAX Studio will keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new tab will appear in DAX Studio (shown as 1 below).

[Screenshot: DAX Studio with the new Server Timings tab (1)]

After executing the query, the Server Timings tab will show the total time taken to execute the query in milliseconds (shown as 2), how much time each Power Pivot engine (Formula Engine and Storage Engine, shown in 3) spent completing the task, and whether the Storage Engine cache was used (shown in 4).

SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

Storage Engine:

  • Really fast
  • Can retrieve records without decompressing them (in the right conditions)
  • Multi threaded
  • Cached

Formula Engine:

  • Really smart
  • Iterates in a row context and hence can be slow
  • Single threaded
  • Not cached

I will cover the Storage Engine and Formula Engine in another post sometime.

CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  When I wrote the following query in DAX Studio, it took forever to execute and I ended up cancelling the query (remember there are more than 7 million rows in this table).
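The query was simply a CROSSJOIN of the two physical tables, along these lines:

EVALUATE
CROSSJOIN ( Customers, Products )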


The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.
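Here is a sketch of that query:

EVALUATE
ROW ( "Row Count", COUNTROWS ( CROSSJOIN ( Customers, Products ) ) )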


The above query executed in 400 milliseconds on my PC, confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

DAX as a Query Language Part 2

In my last article I introduced DAX as a Query Language and explained why learning to query your data model can be useful.  I covered the basics about using a tool like DAX Studio to query a database and return a table of data as a result.  In this article I am going to go deeper and introduce the more common and useful query functions including CROSSJOIN, SUMMARIZE and ADDCOLUMNS (there are more functions that I don’t cover in this article).

Here is the link to the PBIX data model again if you want to follow along and do the examples yourself.  I am using DAX Studio again to connect to Power BI Desktop running on my local machine.  If you take the time to complete these examples you will significantly increase your learning and knowledge retention.

Lineage

Before I move on, an interesting and important fact about Power Pivot (when compared to traditional database tools) is the concept of lineage (pronounced LIN-E-AGE).  When a new virtual table is created in a query or a formula in Power Pivot, the new table will include an automatic one to many relationship to the table from where it was created.  Consider the following simple table from last time.
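Here is the query again:

EVALUATE
VALUES ( Products[Category] )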


You can see above that this query produces a single column table of all of the unique product categories.  This new virtual table retains lineage to the data model.  In this case the new virtual table comes from the Products table and hence the new virtual table has lineage to the Products table.  You can “imagine” this as shown below with the new table having a 1 to many relationship to the Products table.

[Diagram: the virtual table of unique categories with a 1-to-many relationship to the Products table]

Note the image above is just a visual simulation of what happens.  The virtual table is not materialised and you cannot actually see it in the relationship view. But the virtual table does exist (virtually) and the lineage to the Products table also exists – it is just that you can’t actually see it.  I recommend that you learn to “imagine” this happening in the data model in your mind as it will help you understand how the new virtual table interacts with the rest of the data model especially as it relates to context transition.

All virtual tables have lineage to the tables they came from.

CROSSJOIN

CROSSJOIN is a function that can create a new table from 2 or more source tables.  For this example I am going to join some virtual tables.  The first virtual table is VALUES(Products[Category]), which of course returns a list of all the unique product categories.

[Query results: the 4 unique product categories]

The second virtual table is a list of all the possible customer genders, which can be returned with a query like this:
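EVALUATE
VALUES ( Customers[Gender] )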

[Query results: the 2 unique gender values]

Next I am going to use CROSSJOIN to create a new table containing all the unique combinations of both tables.
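The query looks something like this:

EVALUATE
CROSSJOIN ( VALUES ( Products[Category] ), VALUES ( Customers[Gender] ) )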

[Query results: 8 rows, one for each category and gender combination]

The two source tables have 4 rows and 2 rows respectively, so the table above contains 8 rows (4 x 2) covering all of the unique combinations.  Continuing with the concept of “imagining” the way these new virtual tables are related in the data model, it would look something like this:

[Diagram: the virtual tables with 1-to-many relationships back to the Products and Customers tables]

Remember this is just a simulation of what it looks like.  These tables at the top are not materialised and you cannot see them in the data model.  But you can “imagine” them as looking like this and they behave in exactly the same way as they would if they were physical tables.

M x N Can Mean a Large Table

You need to be careful with CROSSJOIN as by definition the resulting table will be m x n rows long, where m is the number of rows in table 1 and n is the number of rows in table 2.  If I were to CROSSJOIN the Customers table (18,484 rows) with the Products table (397 rows) I would end up with more than 7 million rows.  Creating such a large table in memory is not in itself a problem for Power Pivot, but it can definitely be a problem if you try to materialise the table.  More on that next week.

SUMMARIZE

SUMMARIZE is by far my favourite DAX Query function.  SUMMARIZE can do similar things to CROSSJOIN, however CROSSJOIN can join tables that do not have relationships, whereas SUMMARIZE can only join tables that are related via a many to 1 relationship.

SUMMARIZE first takes a table and then one or more columns (that can be reached via a many to 1 relationship) that you want to include in the new summarised table.

SUMMARIZE(<table>, table[column], table2[column],….)

Here is an example.
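EVALUATE
SUMMARIZE ( Sales, Products[Category], Customers[Gender] )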

[Query results: the 6 combinations that actually exist in the data]

The query results above are similar to the CROSSJOIN query from before, but there is one important difference: SUMMARIZE will only return rows that actually exist in the data itself (note there are only 6 rows above compared with 8 rows in the CROSSJOIN example).

Consider the relevant tables from the data model below.

[Data model: the Sales table with many-to-1 relationships to the Products and Customers tables]

Here is the SUMMARIZE formula written earlier.
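EVALUATE
SUMMARIZE ( Sales, Products[Category], Customers[Gender] )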

This query starts with the Sales table and then adds the Products[Category] column from the Products table and the Customers[Gender] column from the Customers table.  The 2 columns specified inside the SUMMARIZE formula come from tables on the 1 side of the many to 1 relationships – this is allowed.

The following is not allowed and will not work.

EVALUATE
SUMMARIZE(Products,Sales[CustomerKey])

It doesn’t work because the column Sales[CustomerKey] cannot be reached from the Products table via a many to 1 relationship.

It is also possible to write a SUMMARIZE statement over any single table.  In the example below, the SUMMARIZE statement returns a list of all possible combinations of product category and colour.
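EVALUATE
SUMMARIZE ( Products, Products[Category], Products[Color] )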

[Query results: the unique combinations of product category and colour]

You could also achieve the same result with the ALL function (which would be an easier solution if you are just using a single table):
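EVALUATE
ALL ( Products[Category], Products[Color] )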

Adding Summary Sales to the Summary Table

So far the SUMMARIZE queries above are just lists of valid combinations.  It is time to do something more interesting and add the [Total Sales] to these summary tables.  Before moving on please note that the following formulas are not best practice – there is a better way which I will cover later.

Consider the following formula.
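Based on the description below, it would be something like this:

EVALUATE
SUMMARIZE ( Products, Products[Category], Products[Color], "Total Sales", [Total Sales] )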

Note specifically that the table parameter in this formula is “Products”.  Also note that this formula returns blank rows (shown below).

[Query results: the summary table, including rows with blank Total Sales]

This SUMMARIZE statement correctly summarises all the combinations of Products[Category] and Products[Color] in the Products table, and for those combinations where there are sales, the sales are shown next to the combination.  But in some cases the unique combination doesn’t actually have any sales, hence the blank rows.

Using Sales as the Table Parameter

If I change the formula above and swap out the Products table with the Sales table, then the blank rows are no longer visible (see below).
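Something like this:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Products[Color], "Total Sales", [Total Sales] )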

[Query results: the summary table with no blank rows]

SUMMARIZE will always find the unique combinations that actually exist in the selected data.  Because this new formula starts from the Sales table, only combinations of Product[Category] and Product[Color] where there are actual sales are returned.

Context Transition or No Context Transition?

Those of you that are familiar with the concept of context transition may be thinking that context transition is occurring here.  That is a valid thing to assume but this is not what is happening here.  Consider the following formula.
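Here is a sketch of that formula (per the note below, the measure has been swapped out for a raw SUM):

EVALUATE
SUMMARIZE ( Sales, Products[Category], Products[Color], "Total Sales", SUM ( Sales[ExtendedAmount] ) )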

[Query results: identical to the previous table]

Note how I have swapped out the measure [Total Sales] with SUM(Sales[ExtendedAmount]).  With this new formula there is no CALCULATE forcing context transition, yet despite this the table still returns the same result.  This implies that SUMMARIZE does not operate in a row context.  In fact SUMMARIZE is a Vertipaq Storage Engine operation.  The part that produces the valid combinations of columns is very efficient, however the calculation of the total sales figures is very inefficient.  For this reason it is better to use ADDCOLUMNS to add the sales totals (see below).

ADDCOLUMNS

ADDCOLUMNS does exactly what it suggests – it adds new columns to a table in a query.  The general syntax is as follows:

ADDCOLUMNS(<table>,”Column Name”, <formula or measure>,….)

To demonstrate how this works, let me start with a formula from earlier that produces the following table.
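That formula is the Sales-based SUMMARIZE from above, without the sales column:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Products[Color] )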

The SUMMARIZE function returns a 2 column table with 15 rows – all the possible combinations that contain sales values.  This table can be used as the table parameter in the ADDCOLUMNS formula as follows.
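EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales]
)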

[Query results: the 15 combinations with a Total Sales column added]

You can see above that this new table returns the Total Sales for each of the 15 possible combinations.

And it is possible to add as many new columns as you need to the summary table.  See below.
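For example ([Total Margin] here is an assumed measure, for illustration):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales],
    "Total Margin", [Total Margin]
)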

[Query results: the summary table with multiple added columns]

The Important Differences between ADDCOLUMNS and SUMMARIZE

Now you have seen that it is possible to add columns (such as summary total sales) to a table using SUMMARIZE and also with ADDCOLUMNS.  But there are some important differences between these 2 approaches.

ADDCOLUMNS has a Row Context

Unlike SUMMARIZE (as I showed earlier in this article), ADDCOLUMNS does have a row context.  Consider the following query.
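EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", SUM ( Sales[ExtendedAmount] )
)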

[Query results: the same grand total repeated on every row]

When I swap out the measure [Total Sales] with SUM(Sales[ExtendedAmount]), the results are wrong – the grand total is repeated on every row because there is no CALCULATE to force context transition.  This shows that ADDCOLUMNS operates in a row context.

Efficiency

When given the choice, you should use ADDCOLUMNS rather than SUMMARIZE to add these additional columns of data.  ADDCOLUMNS is a lot more efficient in the way it adds the values to the SUMMARIZE table.  SUMMARIZE uses a Vertipaq Storage Engine operation to produce the base table, and then ADDCOLUMNS leverages lineage and context transition to add the value columns – this approach leverages the special capabilities of Power Pivot to do the job in the most efficient way.  For a more detailed coverage of this topic you should read this article from The Italians.

Other DAX Query Functions

I realise I have not covered all DAX Query functions in this series of articles.  There are others, some of which are only available in the newer versions of Power Pivot (eg Power BI Desktop, Excel 2016).  If you are interested to find out more you can do some research online.  I will be covering 1 final function next week – the ROW function.

Uses for DAX Queries

Next week I will share my final article in this series where I explain a few ways that you can use DAX Queries in the real world.  Be sure to check back next week, or better still sign up to my weekly newsletter to be notified when there are new articles.

Power BI Skills – Thinking of Data in Tables

Level: Beginners

One very important concept that Excel users often struggle with when starting on their Power BI and Power Pivot journey is that they need to learn to think about data in tables rather than thinking about data in cells.  In the spreadsheet world every cell can be a different formula or value, but in the Power BI and Power Pivot worlds, it doesn’t work this way.

The Excel Way

The Excel way is very simple.  You load data into the spreadsheet – each cell in the spreadsheet contains one data element (one scalar value).  From there you can write a formula in a single cell (see cell E2 below).

[Screenshot: a standard Excel range with a formula in cell E2]

If you want the same formula to appear many times in your spreadsheet, you copy the formula down the page.  The key point here is that every cell in the above example is either a scalar value or an individual formula that returns a scalar value.  That is how traditional Excel works, and that is why we Excel folk are comfortable thinking about data in this way.

But What About Excel Tables I Hear You Say?

Yes, good point.  Back in Excel 2007, Excel was given a table object as part of its suite of tools.  Many Excel users don’t know about these and hence don’t use them.  Excel tables solve quite a few issues compared with Excel cell ranges, including:

  • allowing you to write a single formula, and have that formula apply to the entire column in the table
  • extending the formulas in the table when there are new rows added to the table
  • extending a reference (such as a pivot table source reference) so that it will point to a table (that can change size and shape) rather than pointing to a static range of cells.

Below I convert the same Excel range from the first image into an Excel table.  Note how I now only need to add a single formula to create “Sales Inc Tax”.  The single formula now applies to the entire table.

[Screenshot: the same data converted to an Excel table, with a single column formula for “Sales Inc Tax”]

When you have an Excel table like the one above, you can then start to refer to the entire table or columns in other formulas in your spreadsheet.  See in the image below how the formula in cell G3 refers to a single column in the table.  As the table grows, the formula doesn’t need to be changed – it just works.

[Screenshot: a formula in cell G3 referring to a single table column]

This use of tables in Excel as shown above is very similar to how it works in Power Pivot and Power BI.  You no longer think about clusters of cells and instead have to consider the entire table or column.

Step 1: Thinking of Data in Tables in Power Pivot

The key things you need to know as you learn to think of data in tables in Power Pivot are:

  1. Measures are written to operate over the entire data model.  You can (and normally do) get a different answer to the same formula depending on what filtering is currently applied to the columns, tables and visuals in your data model/report.
  2. You can’t directly reference rows in a table. The only way to reference specific row(s) is to first “filter” the table so that the rows you want to access are “visible” (and all the rows you don’t want are “not visible”).  After the filter is applied, then the measure or calculated column (mentioned in point 1 above) will work for the filtered copy of the table (it is a little more complicated than that for calculated columns but that is beyond the scope of this article).
  3. You need to learn to “imagine” what the table looks like with filters applied.  The reason you need to learn this skill is because many of the tables you will use in DAX formulas are ‘virtual tables’, that is the table is never materialised in a way that you can see it.  You need to learn to “imagine” what the table looks like in its filtered state.  I wrote a blog post at powerpivotpro.com that explains how you can use Power BI to physically materialise these virtual tables into temporary “test tables” to help you get started on your journey of being able to visualise what is happening.  Once you get the hang of it you will only need to materialise a table when you can’t work out why your formula isn’t working or if there is a specific question that is too hard to answer without looking (like question 5 that you will see later in this post).

Here is some imagination practice for you

Imagine you have a calendar table containing every date from 1 Jan 2010 to 31 Dec 2016 as well as columns for [Day Name], [Month Name] and [Year].  It would look something like this.

[Calendar table with Date, Day Name, Month Name and Year columns]

This table of course would have more than 2,500 rows.  Picture this table in your mind (ie stop looking at the image above).  Now imagine what would happen if you applied a filter on the Year column to be equal to 2012 and another filter on the Month Name column to be Feb.  Once you have applied these filters in your mind, answer the following questions by referring to your imaginary table.

  1. What would the table look like now?
  2. How many rows are there visible in this table with these 2 filters applied?

Step 2: Thinking of Data in Columns

As well as mastering tables, you need to master columns.  Here are some additional questions for your imaginary table – this time the questions are about columns.

  1. How many unique values now appear in the Month Name column?
  2. How many unique values now appear in the Day Name column?
  3. What is the minimum date that exists in the Date column?
  4. What is the maximum date that exists in the Date column?
  5. What is the Day Name for the last date in this filtered table?

The answers are at the bottom of the page.  But do yourself a favour and answer each question yourself without cheating – this will help you understand how skilled you are at thinking of data in tables and columns in your mind’s eye.  In fact I don’t expect you to be able to answer number 5 without doing some research.

The reason you need to think about columns of data is twofold.

  • Many of your formulas will operate over single or multiple columns.
  • Power Pivot is a columnar database and it is optimised for working with columns.

As a general rule, it is more efficient to apply filters to individual columns one at a time rather than apply filters to multiple columns at the same time.  Consider the following two measures.
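They would be something like the following (a sketch assuming the Calendar table above and a [Total Sales] measure):

Sales Feb 2012 v1 =
CALCULATE (
    [Total Sales],
    FILTER ( Calendar, Calendar[Year] = 2012 && Calendar[Month Name] = "Feb" )
)

Sales Feb 2012 v2 =
CALCULATE (
    [Total Sales],
    FILTER ( VALUES ( Calendar[Year] ), Calendar[Year] = 2012 ),
    FILTER ( VALUES ( Calendar[Month Name] ), Calendar[Month Name] = "Feb" )
)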

The second formula is much more efficient because there are 2 separate filters being applied to 2 separate columns and they are applied one at a time (in a logical AND).  This is much more efficient than asking Power Pivot to consider both columns at the same time.

Note the FILTER functions in these two measures all return filtered copies of the Calendar table (or its columns).  You can’t see the filtered copy of the table, and that can make it hard to understand what is happening.  But if you learn to imagine what the tables look like in their filtered state you will be well on the way to becoming a DAX super star.

Step 3: Thinking of the Entire Data Model

The final thing you need to learn is that the entire data model operates as an enclosed ecosystem.  In the data model shown below, there are 4 lookup tables at the top of the image and 2 data tables at the bottom (from Adventure Works).

All 6 of these tables operate as a single system.  Filters applied to the lookup tables propagate downhill from the top (one side of the relationship) to the bottom (many side of the relationship).  Filters do not automatically propagate from the bottom to the top.  So once you learn to think about how filtering will affect a single table, you then need to take a further step to imagine how the filters will propagate to all other tables in the data model.

Answers to the Quizzes Above

Tables

  1. What would the table look like now?  It would still have 4 columns but now it has only 29 rows.
  2. How many rows are there visible in this table with the filters applied?  29.

Columns

  1. How many unique values appear in the Month Name column?  1 – Feb.
  2. How many unique values appear in the Day Name column? 7, the days from Sun to Sat
  3. What is the minimum date that exists in the Date column?   1 Feb 2012
  4. What is the maximum date that exists in the Date column?  29 Feb 2012
  5. What is the Day Name for the last date in this filtered table?  It is Wednesday, but this is a very hard question and too difficult to answer without either materialising the table or doing some research to check the day of week for the 29th Feb 2012. 

Relationships in Power BI and Power Pivot

Level: Beginners

Power Pivot is a database that has been designed from the ground up to be optimised for reporting and analysis in Power BI, Power Pivot for Excel and SSAS Tabular.  The technology is essentially the same across all of these products so I will generically refer to Power Pivot in this article.

Power Pivot uses an in memory columnar database (Vertipaq) as the foundation technology – the same technology for all versions (Power Pivot, Power BI and SSAS Tabular).  The Vertipaq engine is what makes Power Pivot both super fast and highly compressed.  A Power Pivot database is not the same as a relational database (RDB) and it does not support all the relationship types that relational databases support.  This can be confusing for people who are new to Power Pivot, particularly if they have at least a basic understanding of how databases (such as MS Access) work.  I explain what you need to know to get started with relationships in Power Pivot here.

Virtual Relationships

This article is specifically about physical relationships, however there are ways to create virtual relationships using DAX.  Examples include using LOOKUPVALUE, FILTER, CROSSFILTER and other techniques.  I am not covering these types of relationships in this article.

Relationship Cardinality Types

There are 3 different possible physical relationship cardinality types in database design but not all are supported by Power Pivot.

Relationship Cardinality Type      Power BI Support      Power Pivot for Excel Support
One to Many                        Yes                   Yes
One to One                         Yes                   No
Many to Many                       No                    No

One to Many Relationships

[Relationship view: the Customers table (1 side) joined to the Sales table (many side) on CustomerKey]

The one to many relationship is the foundation of Power Pivot.  In the example above (from Adventure Works in Power BI Desktop), the Customers table is on the 1 side of the relationship and the Sales table is on the many side of the relationship. These tables are joined using a common field/column called “CustomerKey”.  Customer Key (aka customer number) is a code that uniquely identifies each customer.  There can be no duplicates of the customer key in the customer table.  Conversely the customer can purchase as many times as needed and hence the customer key can appear in the Sales table as many times as necessary.  This is where the name “one to many” comes from – the customer key occurs once and only once in the Customers table but can appear many times in the Sales table.

Tables on the one side of the relationship are called Dimension tables (I call them Lookup tables) and the tables on the many side of the relationship are called Fact tables (I call them Data tables).

The entire Power Pivot Vertipaq engine is optimised to work with this (one to many) type of relationship.

One to One Relationships

The One to One relationship is only supported in Power BI and the newest version of SSAS Tabular.  In my view this relationship type has limited value and in most cases it is better to combine these tables into a single flat table prior to loading to Power BI.  Consider the model below.

[Data model: Customer to Sales (1, one to many) and Customer Socio Economic Data to Customer (2, one to one)]

The first relationship (shown as 1) is a 1 to many relationship between the Customer table (Lookup table) and the Sales table (Data table).  The Customer Socio Economic Data table is joined to the Customer table via a 1 to 1 relationship (shown as 2 above).  If there is a benefit (to the users of reports) in splitting this Socio Economic data into a separate table then of course you should do so.  If there is no benefit, I recommend you combine all the data from the Customer Socio Economic Data table into the Customer table using Power Query on load.

Every relationship has a “cost” in that it will have some effect on performance.  The performance impact may not be noticeable for simple models but may become an issue with very complex models.

If you only remember 1 thing from this article, then please let it be this: don’t automatically accept the table structure coming from your source data.  You are now a data modeller and you need to make decisions on the best way to load your data.  Your source system is probably not optimised for reporting (unless it is a reporting datamart), so please don’t assume that what you have got is what you need.

Many to Many Relationships

The many to many relationship type is not supported in Power Pivot.  This is a deliberate design decision that has been made as a trade off to ensure optimum database performance.  If you have data that is logically related with a many to many cardinality, there are modelling techniques you can use to solve the problem in Power Pivot that are covered in my many to many pattern article here.

UI Differences

There are a few differences in the Power BI Desktop/Excel 2016 UI (relationship view) and that of Excel 2010/2013.

Excel 2010/2013

The early UI has an arrow pointing to the lookup table (the one side of the relationship) and a dot on the many side.  This is unfortunate as the arrow is pointing in the opposite direction of the filter propagation.  This only exists in Excel 2010/2013 (and the early version of SSAS Tabular).

Power BI/Excel 2016

The UI has been significantly improved with Power BI Desktop and Excel 2016.  As you can see below, the 1 to many relationship is now clearly shown, and there is also a new arrow showing the automatic filter propagation direction.

[Power BI relationship view: the 1 and * ends of the relationship, plus an arrow showing the filter propagation direction]

One Active Relationship

It is possible to have more than 1 relationship between tables in Power Pivot, but only 1 can be active at a time.  An example of when you may want multiple relationships is if you have a Sales[Order Date] and a Sales[Ship Date] in your data table.

In this scenario (shown above in Excel 2013) you may want to join both Sales Date columns to your Calendar table so you can use time intelligence in your data model on both Order Date and Ship Date.

The active relationship is shown as a solid line (above) and the inactive relationship is shown as a dashed line (in this case it is highlighted in blue above).  The active relationship is used by default in all DAX measures, however you can override this default and use the inactive relationship (when needed) by using the USERELATIONSHIP() function.  Full details on this are covered in my article here.
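For example, a measure that uses the inactive Ship Date relationship might look like this (assuming a [Total Sales] measure):

Total Sales by Ship Date =
CALCULATE ( [Total Sales], USERELATIONSHIP ( Sales[Ship Date], Calendar[Date] ) )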

Cross Filtering Behaviour

Power Pivot is optimised to work with one to many relationships and to automatically propagate filters (filter context) from the one side to the many side.  In all versions of Power Pivot for Excel, this is the ONLY type of filter propagation that is available.

Power BI supports bi-directional cross filtering behaviour (shown right below) as well as single direction (shown left below).

[Relationship diagrams: single direction cross filtering (left) and bi-directional cross filtering (right)]

In fact bi-directional filtering is the default behaviour for Power BI Desktop.  There are many people (including me) that think this is a bad idea, as bi-directional cross filtering comes at a cost – there is an overhead of constantly cross filtering the lookup table based on the contents of the data table at times when it is actually not required.  Sure, if you have a simple model, you need this behaviour and you don’t know how to handle it with the many to many pattern, then turn it on.  But surely this should not be turned on by default.  In addition, if you have more than 1 data table, bi-directional cross filtering can cause circular references, causing further confusion to unsuspecting users.

I think Microsoft is trying to make Power BI more user friendly for the “lay user” however in this case I think Microsoft has made a mistake.  Do yourself a favour and turn off bi-directional cross filtering unless you explicitly need it.  To change it just double click on the arrow and set the cross filter direction to single.

Update: last week (17th Feb 2017) I noticed that new models I built were single directional by default – it seems Microsoft has listened and changed the default behaviour.

[Edit relationship dialog with the cross filter direction set to Single]

Wrap Up

Hopefully this article has helped you understand more about how Power Pivot works.  Let me know in the comments below if there is anything I have missed.

Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is very easy to forget how I incrementally learnt these things along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In all cases I have outlined why it is a best practice to help the understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

Naming Conventions

Naming your Columns and Measures

  • Always write a Column in the format TableName[Column Name]
  • Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write measures and columns both including the table name as follows.

  • TableName[Column Name]
  • TableName[Measure Name]

These 2 examples above do not cause a problem because their names make it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula.  Take the following example:

  • TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.

Give Tables a Single Noun Name

  • Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimension modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.

Also PowerBI.com has a natural language query tool that allows you to ask questions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

Using Spaces in Names

  • Don’t use spaces in table names
  • Do use spaces in column names
  • Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [Square Brackets] anyway, and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces.

Don’t Overly Abbreviate Business Terms

  • Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches between the words in the question and the data model to help answer the question.  If you abbreviate your measure names to, for example, [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your names.

Measures or Calculated Fields

  • Measures is a better name than Calculated Fields

The term measures has been a Microsoft term for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to be called “calculated fields”.  This was a bad decision and thanks to lobbying from many people Excel 2016 reverted to using the term measures (as does Power BI).  I always now use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

Loading and Shaping Data

Push Shaping as Close to the Source as Possible

  • Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copy or re-write).  The data shaping tools are designed to allow you to do what ever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible that you can shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc).  But just because you can do these things in Power Pivot, doesn’t mean you should.   For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

Use A Calendar Table

  • If you want to do any sort of time calculations, get a Calendar table

It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.

A Star Schema is Optimal

  • Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success.  Two key things you should know.

  • Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
  • There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

You Should Prefer Long and Narrow Tables

  • Short wide tables are generally bad for Power Pivot but long narrow tables are great.

[Example: a short wide table vs the same data as a long narrow table]

There are 2 main reasons why loading data this way is a good idea.

  • Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
  • Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately.

Only Load the Data You Need

  • Load all the data you need, and nothing you don’t need.

If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be.  In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later.  This is no longer the case – it is very easy to change your data load query to add in a column you are missing.  So bring in all of what you need and nothing you don’t.  If you need something else later, then go and get it later.  It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them.  Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).

Don’t use Linked Tables

It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot.  You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).

The trouble with doing this is that you end up with 2 copies of the data in your workbook. The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot.  In the example (shown as 3 above) there are many thousands of rows of data.  The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that you are better to have 1 workbook for your data and then import the table directly into Power Pivot without storing it in Excel at all.

Modelling

Avoid Bi-Directional Relationships

  • Avoid using the default bi-directional relationships in Power BI unless you need them.

[Power BI relationship with bi-directional cross filtering enabled]

Microsoft introduced bi-directional filter propagation in Power BI (this is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works.  But it comes at a cost in that:

  • The performance can be negatively affected
  • If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

Measures are Better than Calculated Columns

  • You should write measures in preference to calculated columns wherever possible

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal learning about database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.

Store Measures in the Table Where the Data Comes from

  • You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure is coming from the [Extended Amount] column in the Sales table.  You should therefore store the measure in the Sales table.  If you place it in (say) the Calendar table, Excel will likely give you a warning similar to the one shown below.

[Screenshot: Excel warning message]

This issue doesn’t occur in Power BI.

Break Measures into Interim Parts

  • Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used and make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that it will give you the % change in sales vs last year.
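It probably looked something like this single measure version of the 4 measures shown below:

% Chg vs LY (single formula) =
DIVIDE (
    SUMX ( Sales, Sales[Qty] * Sales[Unit Price] )
        - CALCULATE ( SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ), SAMEPERIODLASTYEAR ( Calendar[Date] ) ),
    CALCULATE ( SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ), SAMEPERIODLASTYEAR ( Calendar[Date] ) )
)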

 

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] - [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

Don’t Break Calculated Columns into Interim Parts

  • It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand, they take up little space and they make it easier to write your DAX.  As with measures, it is easier to write calculated columns using interim calculated columns.  However the problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and in memory, hence making it less efficient.  By all means write interim columns if you need to in order to create a calculated column (notwithstanding the earlier comments of columns vs measures), however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column.  This is an Excel concept I learnt from John Walkenbach.

Other Advice

You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here and also read up about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.

DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are you may not even be clear what time intelligence is, and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year” etc.  If you want to do any time based comparison using DAX, Power Pivot and/or Power BI, then this article explaining time intelligence is the right article for you.

Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

  • Same period prior month, quarter, year etc.
  • Same period next month, quarter, year etc.
  • Same period year to date compared with prior year, next year etc.
  • Rolling 30 days, 60 days, 90 days, 12 months etc.
  • etc. – there are many, many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

[Left: an Excel pivot table with a highlighted value cell (1); right: the equivalent Power BI visual]

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered more or less by the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

  • Territory[Country] = “Australia” – this comes from the Rows in the Pivot
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer $2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

  • Territory[Country] = “Australia” – this comes from the bar chart Axis
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

The Time Intelligence “Problem”

Let’s assume you want to compare total sales on a particular year vs prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

[Pivot table with years on columns]

But doing it this way causes many problems, including:

  • There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
  • If you want to calculate the absolute change or % change year on year you will need to hard code some formulas in the cells next to the spreadsheet and hence they can’t be reused in other visuals later.
  • If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy.  Imagine a calendar table with the filter Calendar[Year] = 2003 applied.  If you imagine what the filtered data model would look like under the hood, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied, preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

  1. First “detect” the current filter context to determine what the “current” selected period is.
  2. Then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
  3. Then perform a time shift to find a different period in time (as specified in the formula).
  4. Finally reapply filters on the data model for the time shifted period prior to calculating the final value.

Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – custom and inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover inbuilt time intelligence in depth here because these functions are relatively easy to use.  See the link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD(Calendar[Date],[Total Sales])
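The custom equivalent appeared as an image in the original post.  Reconstructed from the line-by-line walkthrough further down the page (so treat it as a sketch rather than the exact original), it looks like this, with line numbers added as comments because the explanation below refers to them:

Total Sales Year to Date (custom) =                  // line 1
CALCULATE (                                          // line 2
    [Total Sales],                                   // line 3
    FILTER (                                         // line 4
        ALL ( Calendar ),                            // line 5
        Calendar[Year] = MAX ( Calendar[Year] )      // line 6
            && Calendar[Date] <= MAX ( Calendar[Date] )  // line 7
    )                                                // line 8
)                                                    // line 9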

Both of these formulas return the exact same result, and in fact both use the same approach to calculating the result under the hood.  The only difference is the inbuilt version is much easier to write because you (the DAX author) are protected from the full syntax.

The end result (regardless which formula is used) looks like this in a Pivot Table.

image

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

  1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case it is July 2003.
  2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
  3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
  4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

image

Line 2 (CALCULATE) is the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) are where the filtering occurs.  There are a few confusing things here.

  • Line 5 refers to ALL(Calendar) instead of simply Calendar.
  • Lines 6 and 7 seem to be evaluating against themselves – very confusing.
  • Line 7 starts with a double ampersand &&

Let me explain each line.

Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of CALCULATE always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, this means that line 5 is already filtered by the pivot table and hence the Calendar table only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL function therefore is to remove these filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now the filters have all been removed, it is time to add back the date filters that are needed.

Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

  • Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
  • Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), it means “go and apply a new filter on this column in the data model”.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is 2003.  Line 6 therefore says “Go and apply a new filter on Calendar[Year] to be equal to 2003”.

Note that you can use any aggregation function that does the job in these formulas.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer; simply think about the need and then find the best function to use.
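For example, swapping VALUES into line 6 and LASTDATE into line 7 gives the same result in this pivot table.  This is a sketch only – note that VALUES works as a scalar comparison here only while a single year is in the filter context:

Total Sales Year to Date (variant) =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( Calendar ),
        Calendar[Year] = VALUES ( Calendar[Year] )
            && Calendar[Date] <= LASTDATE ( Calendar[Date] )
    )
)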

Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

image

The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few downsides however.  The first is that you don’t get to learn these very important concepts, and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  The second is that the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here exceleratorbi.com.au/power-pivot-calendar-tables/

Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot into thinking you are using daily data by loading all the data for any given week or month against a single date (eg the last day of the month) and then use inbuilt time intelligence, however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the path of writing custom time intelligence, you should read my advice about adding an ID column to a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/
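To give you a taste of that technique, here is a sketch of a prior year measure over monthly grain data.  It assumes the calendar table has a sequential MonthID column (1 for the first month in the table, 2 for the next, and so on – an assumed column, not part of the standard Adventure Works data) and that a single month is selected:

Sales Prior Year (monthly grain) =
CALCULATE (
    [Total Sales],
    FILTER (
        ALL ( Calendar ),
        Calendar[MonthID] = MAX ( Calendar[MonthID] ) - 12
    )
)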

Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Date],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (shown in bold), but they also use CALCULATE.  Now that you understand how CALCULATE applies a time shifted filter, it should be much easier to understand what is happening in these formulas.  The time intelligence functions in bold each produce a table of dates that has been time shifted back by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the Calendar table and applies the time shifted dates as the new filter before doing the calculation.  Once you have [Sales Prior Year] it is easy to write:

Change vs Prior Year = [Total Sales] – [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

 A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14, as well as every other function across the language, all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t already done so, why not sign up for my weekly newsletter at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

Download the DAX Reference Guide Using the Form Below

Data Modelling with OneDrive Source Files

Last week I shared how it was possible to easily create an online survey using Excel for O365.  Once such a survey is created, the data can be imported into PowerBI.com and used to create a simple report.  The biggest issue with directly connecting to OneDrive like I showed last week is that you can’t do any serious data modelling.  You can’t write any DAX measures or join other tables into the data model etc.  Not being able to build a proper data model with my data set last week created the following issues (I am sure there are more).

  1. I couldn’t easily create a sort order column for my “how long have you been reading” question (hence why I needed to prepend a letter A, B, C etc to get the answers to sort).
  2. Without data modelling, there was no easy way for me to cleanse the data.  eg Some people said they came from “United States” and others said US, but that didn’t work in my map (it had to be consistent).

image

  3. I couldn’t write proper measures such as COUNTROWS(Survey) to find out how many surveys there were.  Instead I had to rely on “Count of Name”.  If someone left the name field blank, it would not be counted.
  4. I couldn’t change the column names to be something more suitable for reporting – I was stuck with the actual survey questions in the visualisations.

All of these issues (and more) are solvable if you can simply build a regular data model, but this is not possible when you directly connect PowerBI.com to a flat table of data in the cloud.

This article explains how you can create a data model with a cloud based service (like OneDrive) and also how I then went about improving the reporting from last week given I could create a proper data model.

Connecting Power BI Desktop to OneDrive

The alternative to connecting PowerBI.com to the OneDrive file is to use Power BI Desktop instead.

  • First build a data model using Power BI Desktop connecting to the cloud copy of the OneDrive file.
  • Publish the Power BI Desktop file to the service in the cloud (after you have done your data modelling of course)
  • Configure the Power BI cloud service to authenticate to OneDrive so that it can directly access the OneDrive data without further configuration or gateways.  Power BI automatically refreshes the data for you approximately once every hour.

Let me demonstrate how to do this using the same survey from last week.  If you haven’t already completed the survey you can do so by clicking here and then see your survey response included in the reporting below.  You can read about how to create an Excel survey like this in the blog article from last week.

Copy the URL to your OneDrive File in the Cloud

There are a few ways to do this but the way I did it was to copy the URL of my OneDrive XLSX file and then use that to import the data “From Web”.

Get the URL

The tricky bit is finding the correct web URL of your file on OneDrive.  The way to do this is to open the Excel file in Excel on your PC and extract the URL from there.  You can either open the file from OneDrive in the cloud or the copy on your PC (I used the one on my PC).

Once you have the file open in Excel, click on the File menu, then Info (1 below), click on the OneDrive Link (2 below) and then click “Copy link to Clipboard” (3 below).  This placed the link into the clipboard.

image

With the URL still in the clipboard, I created a new Power BI Desktop report using Get data, Web as shown below.

image

I then pasted the URL from the clipboard and manually deleted everything from the question mark to the right (highlighted below).

image

So I was left with the following

image

I then got the following complicated set of options for authentication.  I took the last option to sign in to my Organizational Account, and signed in with my Office 365 credentials.

image

image

After authenticating, I was able to import the table of data into Power BI Desktop and Load as normal.

image

Load to Power BI and Configure Credentials

I will continue the process of building out my data model further down the page, but first let me complete the process to publish and configure the online credentials process.

First I saved the query (close and apply), saved the file and published it to my PowerBI.com account.  Once the dataset was in the cloud, I needed to configure my credentials for the dataset.  I clicked on Settings (1 and 2), Datasets (3) and then selected my dataset (4 below).

image

Because this is an online data source, I was able to edit the credentials as shown below.

image

After clicking Edit Credentials, I selected OAuth2 and signed in.

image

And I then got this confirmation.

image

Improving the Reporting Using Data Modelling

Back to the data modelling in Power BI Desktop.  The first thing I did was turn off data load for the table I had previously loaded.  This was quite easy. Right click on the table, then uncheck “Enable Load”.  This keeps the connection to the table but doesn’t load it in the current form.

image

The rest of this article explains how I changed the data to make it more useable in my final report.

Cleanse the Country Data

The next thing I did was create a data cleansing table for the list of countries.  Ideally the survey tool would force users to select a country from a list of valid values, however this is not available using the simple online Excel survey tool I am using.  Instead I created a substitution table where I could manually create substitutes for data that I wanted to change.

I created a new query with reference to the first.  Right click on the first query (1 below) and then select Reference (2 below).

image

I then deleted all columns other than the country column, removed duplicates and sorted the column.  Finally I set it to disable load.  It is now possible to see all the data points entered and identify the dirty data that was included (shown below).

image

I then created a manual mapping table to make it easier to control the data cleansing process.  To do this, I simply created a new manual table in Power BI Desktop as shown below (and also turned off data load).

image

Sorry to the nondescript North American if you happen to be Canadian, Mexican or of some other nationality.  I decided to map this to the USA.

Now I had the mapping table, the next step was to replace the original country data with the cleansed data. I created a new query referencing Table1, then joined on the country columns to the CountryCleanse table.

image

Using a Left Outer Join as shown below

image

I then expanded the new column to extract the cleansed version of the data.

image

This gave me a new column containing the substitute value wherever there was a match in the cleansing table.

image

I then created a new custom column that uses this cleansed column if it exists, otherwise the original column.

image

image

I then deleted the other 2 columns and renamed my new column so that I had a cleansed copy of the Country.

Cleanse and Sort “How Long” Answers

In my survey question, I asked how long had the participant been reading my blog.  When the answers were placed in Power BI they didn’t sort based on length of time (they sorted alphabetically).  To fix this (using the original approach I blogged about last week) I was forced to add a prefix to force the answers to sort logically.  Using Data Modelling, it is possible to solve this problem properly.  For variety (in this blog), I decided to use a different mapping table solution. I first created a mapping table with the original survey choices, a new set of labels and a sort column as follows.

image

Then I loaded this mapping table as a lookup table to my data model and joined it on the Original column as the key.

image

Then to avoid confusion in the final report, I hid the original columns in both tables.  Right click on the column (1 below) and then select Hide in Report View (2 below).

image

Of course I sorted the Tenure column by the sort column.

image

Adding “All of the Above” into the other 3 choices

The last problem I wanted to solve was that many people chose “All of the above” when asked about their favourite topics.  When this was put in a visual (shown as 1 below), it was not possible to see the overall most popular item.

image

There are many ways to handle this problem.  The measure I wrote was as follows:

Count Most Useful =
COUNTROWS ( Data )
    + CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = "All of the above"
    )

The measure simply adds the count for each item to the count for “All of the above” (shown as 3 above), and I then filtered out the “All of the above” answer from the visual (shown as 4 above).

My report was starting to look good (see below) but I still wasn’t happy.

image

Ideally I should be able to show “what is most useful” and also “what do you want to see more of” in a single bar chart.  But the problem was that there were 2 different measures, and the “bars” were coming from 2 different columns in the data source.  This could not be solved without thinking differently about the problem.

I created a new table that has the 3 responses and loaded it to the data model.  I didn’t connect this table to any other tables in the data model.

image

I placed this new column of data on the Axis of my visual and placed the 2 measures as Values.  This gave me a chart that looked like the one I needed, but of course there was no filtering, because the new “Types” table is not connected to anything in the data model.

This is important: the Axis is part of the filter context; the Types table is being filtered by the visual, but this filtering was not being used in the measures I wrote.

image

So I changed the measures so they would “harvest” the value on the Axis to filter the results.

Count Most Useful =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( Data ),
            Data[Which topics do you find most useful?] = "All of the above"
        )
)
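The second measure follows exactly the same harvesting pattern against the other survey question column.  Something like the sketch below – the column name is a stand-in for the actual survey question:

Count More Of =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[What would you like to see more of?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( Data ),
            Data[What would you like to see more of?] = "All of the above"
        )
)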

The final visual looks like this, which is much easier to read and understand.

image

And here is the final report

Easy Online Surveys with Power BI Reporting

I think today’s article will be of interest to my readers even though it is a little off track from my normal pure Power BI, Power Pivot and Power Query content.  I will show you how to quickly and easily create an online survey that you can distribute to anyone that has an Internet connection, and then use Power BI to report on the results from your survey.

Office 365 Excel Surveys

You may not be aware that O365 has an inbuilt Excel survey tool that is very easy to use.  And who wouldn’t want to use Excel to solve a problem if they could?  You can find the survey tool by going to OneDrive online.  I am using OneDrive for business for this demo, but it also works for OneDrive Personal.

Here is the Survey – Please Participate

For demonstration purposes, I have created a survey to ask my readers what they would like to see more of on my website and to see which countries the readers live in.  Edit 7 Feb 2016.  The survey is now closed, but you can still read on.

image

Now that you have completed the survey, read on to see how you can do the same yourself and to see the latest results via an embedded Power BI report at the bottom of the page.  Note that it can take up to 1 hour for your specific survey response to be available in the report (ie it is near real time – not real time).

How I Created the Survey

First Open OneDrive Online

Frankly I get confused with all the logins for Microsoft sites. The easiest way I know how to login to OneDrive Online is to Open Windows Explorer, right click on the OneDrive folder and then select View Online.

image

Create Your Excel Survey

Creating an Excel survey couldn’t be easier.  Just click New\Excel Survey as shown below.

image

I immediately got some survey header information to fill in as shown below.

image

There are no user manuals required to work out how to do this.  I just updated the information and moved on to the next section.  It is easy to enter the questions you have, and there are a number of intuitive response types that you can choose to meet your needs.

image

Survey Results Database

After you have completed your survey, you can see the underlying Excel spreadsheet that stores your survey results.  It is possible to add new columns to the right of the results table.  Additional columns added manually can be used to store information you want to add without it being part of the survey.

image

Connecting the Data to PowerBI.com

There are a few ways you can create a Power BI report.  If you want a fully featured data model including the ability to import other tables of data and write your own DAX, then you need to import this new data source into Power BI Desktop as you would with any other report.  For this demo today, I have used the “Get Data” feature directly inside the Power BI Service to connect to the Excel workbook on OneDrive.  To do this, I first logged into PowerBI.com and selected Get Data from Files as shown below.

image

Then I selected OneDrive – Business as shown below.

image

Then I selected my survey file and clicked Connect

image

I was then prompted to either import the data or connect.  The connect option works just like Excel Online – not what I wanted, so I selected Import.  When you use import in this way, Power BI will look at your OneDrive account about once each hour to check for new data and will automatically update your report.

image

Power BI then created a new Dataset for me (1 below).  You can see the table of data shown in 2 below.  It also created a Dashboard, however this was not much use so I simply deleted the dashboard.

image

Here is the Final Power BI Report

Once I had the data connected, I simply created a few simple visualisations using the capabilities of Power BI Service.  You can see the final interactive report below.

Be sure to read next week’s post where I cover how you can use Power BI Desktop to model your online data. exceleratorbi.com.au/data-modelling-onedrive-source-files/

Excel Workbook Connection to Local Power BI Desktop

Today I am sharing an easy way to use Power BI Desktop as a SSAS tabular server on your local PC.  I find increasingly that I am building data models in Power BI Desktop and yet I still want to use Excel to analyse the data some of the time.  If I load the PBIX workbook to the Power BI Service, it is easy to use “Analyze in Excel” to query the data model.  But there is currently no standard supported method to do the same thing using Power BI Desktop as a server (see my suggested idea to Microsoft at the bottom of this page for more about that).  Until then, we need to use a “hack” to complete this task.

I have previously shared a way to:

  • Open DAX Studio on your PC
  • Use DAX Studio to extract the Port Number of a running instance of Power BI Desktop
  • Manually enter this information into Excel to connect to Power BI Desktop and use it as a local SSAS Server instance
  • Query the data model that exists in the open Power BI Desktop file with a Pivot Table in Excel.

The problem with the above approach is that once the Power BI Desktop file is closed, the connection string becomes invalid.  Next time you want to use Power BI Desktop as a server you must go through the entire process again plus you need to alter the connection string that was previously created – hardly a scalable solution.

An Excel VBA Workbook Template to Solve This Problem

Local Host Workbook

Today I am sharing a tool that I built recently that completes the entire process automatically in Excel using VBA and Power Query.

Disclaimer:  I am providing this tool free of charge, without support, and do not warrant that it will work for you.  This is not supported by Microsoft and could stop working anytime.  If it works for you, then great – if not then it is unfortunate :-(.  Maybe you can deconstruct what I have done and make it work for you.  I built this using Excel 2016 and I have tested it on 2010 and 2013 and it works (Please make sure you have the latest version of Power Query installed).

You create a new blank Excel Workbook using my template (from personal templates), then click the “Refresh SSAS Connection” button.  The tool will then:

  • Detect the instance of Power BI Desktop running on your PC (there must be 1 and only 1).
  • Find the port number and database name of the running instance.
  • Modify the existing connection string in the Excel workbook to repoint it to the current running instance of Power BI Desktop.
  • If the workbook already contains pivot tables that were previously built using the same data source, they will simply reconnect and work as you expect.
  • If this is the first time you have used the template, you will have a blank Pivot Table connected to the Power BI Desktop data model.
  • You should only have 1 Excel workbook and 1 Power BI Desktop file open at any one time (as they all use the same connection string).

I am not planning on covering in detail how the tool works.  I am making it available to use and modify as you see fit.  If you want to look into the VBA code and the Power Query to see what I did, then of course this is fine by me.  If you want to improve it and give me back the improvements under the same “share and share alike” rules, then please do so, as long as you leave my name and comments in the file untouched.

How to use the Template

You can download the template here.  I have put it in a zip file so that I can keep the same link but change the name of the actual file over time (to manage version numbers).

The file is a macro enabled Excel Template workbook.  You need to place it in your Personal Templates location.  To find this location, in Excel go to File\Options, then navigate to save (1 below) then check the location (shown in 2).  If you don’t have a current template location, you can set your own.  Note Excel 2010 has a different file location (Google it).

image

To create a new workbook that uses this template, in Excel go to File\New, select the personal templates option (shown as 2 below) and then select the template.

image

I would love to hear what you think and find out if this is a useful tool for you.

Let’s Ask Microsoft to Make this a Standard Feature

I was talking to Avi Singh and he suggested that I request this as a standard feature.  I have set up an idea that you can find and vote for here.  If we get enough votes for this feature, Microsoft may build out this capability into the core product(s) so it is easier for everyone and fully maintained for future releases.

 

Power Pivot Crashes? – An Update

Today Avi Singh and I met with Howie Dickerman from Microsoft to discuss the Power Pivot for Excel Add-In.  You can read the background to this meeting in my blog post from last month.  Howie is the Program Manager assigned to Power Pivot for Excel and has a wealth of experience with Excel and other products.  I shared all the comments that people added to my blog post from 19th October.  The great news is Howie is already crystal clear about the problems.  We had a great discussion about the plans that Microsoft has for improving stability and performance for the Add-In.  And there definitely is a plan.  Like any software project it is impossible to say when we will see something, but I am just excited that there is a plan.

How You Can Help

Microsoft already has a clear understanding of the areas that need some work.  What Microsoft needs more than anything is a set of structured steps that will reproduce an issue – any issue you have.  From experience this is very hard to pin down.  If you know how to reproduce an issue, please send me the steps to reproduce it (you can add to the comments below or send me a message via my home page).

Note:  I realise lots of people have had a bad experience at times.  I am explicitly looking for “reproducible steps” here.  This means that when you follow these steps it will always (or often – eg 1 in 4) cause the issue.  If it is not reproducible then the information cannot help for this next phase.

When to Create a Lookup Table in Power Pivot

Level: Beginners

Today I explain when it is important to create a lookup table and when it is fine to use native columns in a data table.  I have rated this topic as a beginner topic as it is a fundamental skill to learn on your journey to become a Power Pivot and Power BI Ninja.  However there is so much to learn that you can only absorb so much content before your head begins to explode, so you may find yourself learning this topic for the first time now even though you are an intermediate user.  Or you may come back and read this again a second time before it all sinks in.  This topic is therefore good for everyone, either as a review, a confirmation of what you know, or as a new learning opportunity.

It is also worth pointing out that this topic does not mean the difference between “it works” and “it doesn’t work”.  You can be successful using Power Pivot and Power BI without understanding this topic well.  But it can make a big difference to the usability of your model and certainly to performance (in some cases).  And not least of all, if you want to get better and advance your skills, you simply must understand these principles well.

Topics Covered in This Post

In this post I cover the following scenarios to help explain some of the nuances and benefits of using lookup tables (or not):

  • Simple data models
  • Inbuilt time intelligence functions
  • To Simplify Your DAX formulas
  • Pointless dimension tables
  • Multiple data tables
  • Simple business communication

First a Definition of the 2 Table Types

Before I move on, it is worth restating the definition of the 2 different types of tables so it is clear in your mind.

There are 2 different types of tables in a Power Pivot data model.  There are dimension tables (I call them lookup tables) and there are fact tables (I call them data tables).  These two table types are very different and understanding the difference is fundamental to understanding and using Power Pivot and Power BI properly.

Lookup Tables (Dimension Tables)

Lookup tables contain information typically about 1 business concept/object, eg Customers, Products, Time etc.

image

Lookup tables always have the following features:

  • There is always a column of data that uniquely identifies each row of data in the table.  In database terms this is the primary key.  In the customer table it would be the customer number, in the products table it is the product code, in the calendar table it is the date, and so on for any lookup table.
  • There can be no duplicates in this key column – every row must be unique
  • There can be 1 or more additional columns in each lookup table that provide more information about that primary key column.  Eg in the calendar lookup table, the date is the primary key. Other columns such as Day Name, Month Name, Year, Day of Week number etc all provide detail about the date key.  I often refer to these additional columns as providing metadata about the primary key.

Data Tables

Data tables contain (typically) transactional information.  It could be sales data, budget data, GL data from a financial system, call centre call data, or any other data about activities of some type.

image

Data tables have the following features:

  • There is no requirement that a data table must have a primary key.
  • There needs to be 1 column that can be connected to relevant lookup tables (assuming you want to connect these tables).  In database terms, this column is called the foreign key.  It should be obvious that if you want to join your data table to your product table, you must have a column that defines the product in your data table.
  • Duplicates are allowed in data table columns.  eg many sales occur on each day, many sales occur for each product.

Joining the Tables

You cannot join data tables to other data tables.  Data tables are always joined to lookup tables using the Lookup Table primary key column as the join column.

I am using the Adventure Works database in the image below.  I have placed the lookup tables in my data model at the top of the window and the data table below.  This is the Collie Layout Methodology; it gives the user a visual clue that the tables you have to “look up” to see are the Lookup Tables – get it?  It is not required, just an easy way to remember.  Technically this data layout is called a star schema.

image

When To Use Lookup Tables

Back to the point of this post.  When should you use lookup tables and when is it OK to simply use the column in your data table?  Well the fact is that you do not “need” to use lookup tables at all to make your data model work – it is perfectly fine to simply load a data table and go from there.  Having said that, read on to find out the nuances of some common scenarios.

Simple data models

Let me start by covering when it is ok to not use any lookup tables.  If your data meets the following scenario it is fine to ignore lookup tables:

  • Your data set is small.  What defines small depends of course.  Certainly 10k – 250k rows of data is small (tiny actually). It is less important to efficiently store your data when your data set is small and hence a flat single data table can be OK.
  • Your data doesn’t contain too many columns (this is a topic in its own right and outside of the scope of this post).  If you have lots of columns (eg >15) then even 250k rows of data can become non-performant.  But if you have a few columns, then maybe you don’t need lookup tables.
  • Your data table already contains all the columns you need for your analysis.
  • You don’t need to use any inbuilt time intelligence calculations (more on that below).
  • You only need to write simple DAX formulas (such as SUM formulas).

If this is your scenario, then you don’t need lookup tables at all.  However if you always operate in this simple scenario then you will not advance your DAX knowledge to the next level, so keep that in mind.  This scenario with no lookup tables is quite rare and is probably limited to Excel users that already have their data in a single flat table.

Inbuilt Time Intelligence Functions

It is mandatory to have a calendar lookup table if you want to use the inbuilt time intelligence features of Power Pivot.  Examples include calculating sales last year, sales year to date etc. I cover calendar lookup tables in depth in this article here.  It is possible to do manual time intelligence calculations (not inbuilt, but manual) using a single data table, but I don’t recommend it.  If time is an important part of your data, then I strongly recommend you get a calendar lookup table and go from there.

To Simplify Your DAX Formulas

While it is possible to use a simple data model with a single data table and no lookup tables (as mentioned above), once you have more complex analysis needs (eg other than just using SUM) you will need to write some DAX formulas to create the business insights.  In my experience it can be easier to write DAX formulas when you have lookup tables, particularly when you need to change the filter context in your data model.  “How” to do this is also out of scope for this topic, but be aware that this is another reason to move on from a single data table.

Pointless Dimension Tables

There is no point creating a dimension/lookup table without a reason and certainly not if there are only 2 columns in the lookup table.  This will make more sense with an example.

Take the Adventure Works data model as shown below.  It contains a lookup table (Products) with 2 columns, one for Product Key and one for Product Name.  The lookup table is joined to the data table using the primary key (of course).

image

The data in this lookup table would look something like shown below – only 2 columns (but a lot more rows of data than shown in this image of course).

image

People that have some understanding of traditional relational databases will be aware that relational databases store the data in the first column (ProductKey) a lot more efficiently than the data in the second column (Product Name).  It is therefore common to think that it is more efficient to store the product key in the data table and put the product name in the lookup table only.  But this is not the case in Power Pivot and Power BI.  Power Pivot uses the xVelocity storage engine (a columnar store) to compress the data.  There is no (or little) space saving benefit in storing the product key in the data table instead of storing the product name.  In addition, every relationship in your data model comes at a cost/overhead.  This is why this lookup table is called a junk dimension.  If the only reason you have this lookup table is to join the single column “Product Name” to the data table via the product key, then it would be just as good to load the product name into the data table and drop the product key altogether.

If your data already has the product name (and not the product key) in the data table, and you are thinking of creating this lookup table, then you may want to think again.  That being said, there are other reasons why you may want to keep the primary key and lookup table, including:

  • if you have more than 2 columns in your lookup table.
  • if your product names are not unique (often the product key is managed more rigorously than the name)
  • if your data table already has the product key in the table and it is easier to do it this way.

to name a few.

If your data table contains a column that is not related to any other column as a logical object, and you are wondering if you should build a lookup table for it – the answer is no, don’t do it.  Just use the column in your data table.

Multiple data tables

As you probably know already, it is possible to have multiple data tables in your data model. In fact this is one of the many great things about Power Pivot.

image

However as I mentioned at the start of this post, it is not possible to join one data table to another data table in Power Pivot.  If you want to use multiple data tables in your data model, you must join them through common lookup tables.  It therefore follows that if you have multiple data tables in your data model, you should load lookup tables even if they are junk dimensions.

Simple Business Communication

There is a clear benefit to business users if you logically group your columns of data into lookup tables that make business sense.  It can be hard for users if they need to hunt through the data table for information about a customer, or a product etc.  It is a much better end user experience if there are logical groupings of columns of relevant data in a table called Customer, Product etc.  Given the end users may be writing their own reports using the field list to find the relevant data, grouping the data into lookup tables can really improve the end user experience.

Flatten Your Lookup Tables

Further to the above point, you should also consider flattening out your data into a single lookup table if you can.  The following data model is feasible and will work, but it is hardly user friendly.  In the image below there is a product lookup table, and then a sub category lookup table of the product table, and a product category lookup table of the sub category table.

image

But if you deploy this model, it is likely to be confusing to the end user.  Remember they will be building reports using the field list, and they will need to go searching across different tables to find the columns of data they use.  It is much better to flatten the data from the additional lookup tables into a single product lookup table if you can.  Also flattening these tables will remove extra relationships and this will make the model more efficient (every relationship comes at a cost).  And finally your DAX will be easier to read and write.

Wrap Up

Hopefully this article has helped clarify when and where to use lookup tables.  Let me know your thoughts and/or any questions that remain unresolved in the comments below.

Does the Power Pivot Excel Add-In Crash for You?

I was chatting to Avi Singh the other day and we were comparing notes about the live Power Pivot training classes we both offer in Australia (me) and the USA (Avi).  One thing we both have in common is that the Microsoft Excel Power Pivot Add-In CRASHES a lot during training!  When I say “crash”, I mean the Add-In stops working in some ungraceful way – Excel itself normally doesn’t stop working, just the Add-In.  I always warn my students that the Add-In will crash, and I always say “If Power Pivot wasn’t so great, you wouldn’t put up with it – but it seriously IS that great and hence I have learnt to live with it”.  But there were a lot more Add-In crashes in my training last week than I have come to expect, and I would like to try to do something about it.

Before moving on, I don’t want anyone to get me wrong here, I still love Power Pivot (really – I LOVE IT!).  It is still the best thing to happen to Excel since Excel itself.  But the instability of Power Pivot for Excel during the data modelling process is really bad.  In this post today I explain the symptoms of the Add-In crash and the recovery process.  But more importantly I am seeking feedback from the community to share their own experiences with the Add-in crashing so I have some basis for a discussion with Microsoft when I am in Redmond later this year.  So please post your comments and experiences at the bottom of the page.

It is Worse in Excel 2016

I taught my Power Pivot class using Excel 2016 for the first time last week (I normally teach in Excel 2013 or Excel 2010).  Frankly I was expecting that the Add-In would be a lot more stable given it is an updated version, and given Microsoft ships bug fixes via O365 on a regular basis.  But in fact I found the exact opposite – The Excel 2016 O365 version of Power Pivot for Excel is the most unstable I have ever used.  For the first time ever, the Add-In crashed and at the same time it actually corrupted my data model and made the entire workbook un-usable.  This has never happened to me before using Excel 2010 or Excel 2013 despite experiencing literally hundreds of Add-In crashes over the years.

Symptoms that the Add-In has Crashed

There are a few symptoms that the Power Pivot Add-In has crashed.  The easiest to spot is when you get some sort of “Unhandled Exception” error message (something like this one below).

image

If you see an error message like this, then it is time to restart the Add-in.  Read about how to recover from an Add-In crash further down the page.

But sometimes the symptoms that the Add-In has crashed are harder to spot.  An example is when you write a measure directly in the Excel measure editing box (like shown below), the measure is written correctly (shown as 1 below), yet when you click “Check Formula” (shown as 2), it says there is an error in the formula (shown as 3).

image

This is really confusing for someone that is trying to learn to write DAX.  It is hard enough getting your head around all the new formulas and new syntax, but when Excel is telling you there is an error in your formula AND THERE ISN’T, then wow!  I have a few years of Power Pivot experience, and in that time I have learnt to spot these subtle clues that the Power Pivot Add-In has crashed, but for a new user it is a horrible experience.

Remedies for When the Power Pivot Add-In Crashes

The first thing I always do when teaching a Power Pivot class is to show the students how to recover from a Power Pivot Add-In crash.  This is a mandatory skill for anyone learning Power Pivot.  The standard fix is to turn the Add-In off, then on again. But in my experience there are actually 3 levels of remediation depending on the seriousness of the Power Pivot Add-In Crash.  In my training course yesterday, I came up with a naming convention for the 3 levels of fix.

DEFCON 3

This is the mildest response to the Add-In crash and it is always what I try first.  You first need to open the COM Add-Ins dialog.  The easiest way to do this is to turn on the Developer Ribbon (turned off by default) and use the COM Add-Ins button.

image

Once in the dialog, you need to de-select the Power Pivot Add-In (shown as 1), then click OK (shown as 2).

image

Then you need to go back into the COM Add-Ins again, reselect the Add-In and click OK.  More often than not, this will successfully restart the Add-In and you can get back to work.

DEFCON 2

Sometimes the simple stop/restart fix doesn’t solve the problem.  Then it is time to move to DEFCON 2.  The process is basically the same except you must close the Excel application before restarting the Add-In.

  1. Disable the Power Pivot Add-In
  2. Shut down Excel
  3. Restart Excel
  4. Re-enable the Power Pivot Add-In.

This second approach will fix many of the more serious Add-In crashes.  But sometimes (and actually quite often when using Excel 2016), even this second level response doesn’t work.

DEFCON 1

Time then to move to DEFCON 1 and throw everything you have at the problem.  The process is still basically the same except you add in a PC reboot to completely reset everything.

    1. Disable the Power Pivot Add-In
    2. Shut down Excel
    3. Reboot your PC
    4. Restart Excel
    5. Re-enable the Power Pivot Add-In.

This last approach has always recovered the Power Pivot Add-In crash for me.  Unfortunately however I experienced my first ever workbook corruption using Excel 2016.  I got the Power Pivot Add-In working again with a DEFCON 1 response, but the workbook could not be saved.

Things that Cause the Add-In to Crash

Given I have observed so many Add-In crashes over the years, I have an understanding of the types of things that cause the Add-In to crash.  The key things that come to mind are:

  • If you make an error in a DAX formula but don’t click “Check Formula” before saving the measure.  The more often you do this, the more likely the Add-In will crash.  It seems the internal error handling is not that robust.
  • When you create Linked Tables and add them to the data model, there are often issues that seem to be related to the data model not refreshing properly.  Often you simply can’t see the new table in the Pivot Table Field List.  I can normally solve this problem by shutting down Excel and restarting.
  • If you try to rename a Linked Table after you have added it to the data model, this can cause the Add-In to crash.
  • Unfortunately now it seems that if you are using Excel 2016, that alone is enough to be at risk.

Power BI Desktop

Edit: 22 Oct 2016

I just finished a 2 day in house training course with a client.  For the first time I did the entire training in Power BI desktop.  It was much more stable than Excel has been.  There were a couple of application crashes but recovery was pretty good.

Please Share Your Experiences so I Can Share with Microsoft

As I mentioned at the top of the post, I want to be able to give some “feedback” to Microsoft about this problem.  You can help me by sharing your experiences in the comments section below.   The more real world feedback I get, the more likely I will be able to get some attention to the issue. That way I will be able to demonstrate that this is not just one grumpy Australian that is complaining about an isolated problem, but a widespread issue that needs some attention.

I will let you know how I go 🙂

Who Needs Power Pivot, Power Query and Power BI Anyway?

Level: Beginners

One of the great challenges Microsoft has faced with its “new” suite of Self Service BI tools (particularly Power Pivot) is that most people that could benefit from the holy trinity (Power Pivot, Power Query and Power BI) don’t even know these tools exist, let alone how the tools can help them succeed in their jobs.  The situation is definitely getting better as Power BI starts to get a presence in the market place, however I still talk to people who have heard of Power BI, but have no idea what Power Pivot or Power Query are, and what’s more they don’t know why they should care.  I personally believe a big part of the awareness problem is that Microsoft insists on shipping Microsoft Excel with the Power Pivot plugin disabled.  There is no reference to Power Pivot when you excitedly receive your brand spanking new version of Excel 2016 – what a marketing opportunity gone begging!

image

I have been an Excel nerd for 30 years.  There is nothing I used to enjoy more than installing a new version of Excel, and clicking through every menu item to find something shiny and new that would make my life easier.  By not turning on the Power Pivot menu by default, Microsoft is missing one of the best silent selling opportunities it has for this fabulous addition to Excel.

Given there is no “pull through” on the menus, many people don’t know what these products are or why they should care.  I am often asked by people “who can benefit from these tools?”.  This post sets out to explain who can benefit and why.  Note when I say “who can benefit”, I am not talking about “consumers of reports” here, I am talking about “authors of reports”.  It is clear that people that consume quality reports and analysis will benefit, whatever the tool.  This article is focused on the benefits to those people that are responsible for building the reports and analysis that others will end up consuming.

Power BI

Who can benefit from Power BI is probably the easiest to understand.  The product is well marketed and has a clear role to play.   Power BI is a complete self service BI tool.  It is designed to bring business intelligence capabilities to the masses instead of the elite (e.g. instead of just highly skilled IT MDX programmers).  Rob Collie wrote a good article last week about the democratisation of BI tools.  Power BI will add value to people who:

  1. Have problems sharing reports with others because the file size is too large.
  2. Need to share data with people on the go, that maybe only have a Tablet or a Mobile phone.
  3. Have large data sets that can’t be managed in traditional Excel.
  4. Are spending too much time each week/month manually maintaining reports with new source data and/or new visualisation requests.
  5. Can’t get the (timely) support they need from their IT department using traditional Enterprise BI tools.

Power BI is great because it puts capabilities across the end to end BI stack into the hands of end users (authors), including:

  1. Extraction of data from the source (using the Power Query engine)
  2. Transformation of that data into a new shape that is optimum for BI reporting and analytics (Power Query again).
  3. Data modelling capabilities, so you can turn the raw data into business meaningful insights (using the Power Pivot engine).
  4. A reporting and analytics front end allowing you to build reports to visualise your data (Power BI Desktop and Power BI Service).
  5. A fully integrated cloud based delivery framework so you can easily share with anyone over the internet (Power BI Service).
  6. A full set of Mobile applications across the major operating systems (Power BI Mobile).

Notice how steps 1 and 2 use Power Query, and step 3 uses Power Pivot.  So if you want to learn about Power BI, you really need to learn about Power Pivot and Power Query too.

Power Pivot

Conversely, Power Pivot is the hardest to understand – ie it is the hardest for individuals (potential authors) to understand “what does Power Pivot do for me and why do I need it?”.  I have had people enquire about Power BI training courses who have not been interested in Power Pivot or DAX*.  But the truth is, if you want to be able to write your own reports in Power BI, you really need to learn at least some basic Power Pivot skills.

Power Pivot is a data modelling tool.  It is the underlying reporting engine that enables Power BI and Modern Excel to deliver those modern funky reports that can help you succeed in business.  The Power Pivot engine allows you (the report author) to take your business knowledge and configure the reporting tools so that Power BI and Excel Pivot Tables can be used to find and report on insights in your data.

Most business users have never heard of the term “data modelling” before, and the reason for this is quite simple – it has always been the IT department that has been responsible for data modelling.  Power Pivot takes the power of data modelling and puts it in the hands of competent business/Excel users.  An example of data modelling will make it easier to understand.

Example of Data Modelling

Consider a scenario where you download sales data from your company transaction system and it looks something like this in a spreadsheet.

image

You can see the sell price and the cost price information exists in the table above.  But there is nothing about the $ Margin per product and nothing about the % Margin, let alone insights like Sales Year to Date, Top 3 selling products, Fastest growing product etc.  In a traditional Excel world you would simply write formulas in your spreadsheet(s) to enhance this raw data and extract the additional insights.  The trouble with this approach is that all these formulas are only ever “one off”.  You write them for this report, and then when you create another report later, you have to write the formulas again.  Power Pivot handles this problem in a different way.  Power Pivot is a data modelling tool, and instead of writing formulas in your final reports, you write them “under the hood” directly in the Power Pivot engine as part of the data model.  You build the “rules” that describe how to calculate $ Margin and % Margin ONCE and only once directly in the Power Pivot engine.  Once you have created the rules for each insight (eg $ Margin, % Margin etc), it is forever available for you (and others) to use over and over again in any report, chart, Pivot Table or any other visualisation you can think of.  Never again do you have to write/copy a new formula every time you create a new report.
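As a sketch of what those “rules” look like in DAX (the table and column names below are assumptions based on the sample data above, not the exact model), the margin insights might be written once as:

Total Sales = SUMX ( Sales, Sales[Qty] * Sales[Sell Price] )

Total Margin $ = SUMX ( Sales, Sales[Qty] * ( Sales[Sell Price] - Sales[Cost Price] ) )

Total Margin % = DIVIDE ( [Total Margin $], [Total Sales] )

Every report, chart and Pivot Table built from the model can then reuse these measures without rewriting a single formula.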

*What About DAX?

Data Analysis Expressions (DAX) is the formula language of Power Pivot.  DAX is very similar to the Excel formula language yet there are also a lot of differences that you will have to learn from scratch.  Simply put, if you want to learn Power Pivot (and/or Power BI), you will need to learn at least some DAX as well as lots of other things about how the Power Pivot engine works.

Who needs Power Pivot and why?

People that need this tool are typically Microsoft Excel users/report authors that analyse or report on data, particularly when the following conditions apply:

    1. You have lots of data that stretches the capacity of traditional Excel (file size, re-calculation speed etc).
    2. If you use Pivot Tables a lot to analyse your data.
    3. If you are writing a lot of VLOOKUP (or INDEX/MATCH) formulas to join data from different tables so you can analyse in a Pivot Table.
    4. If you have had to create bespoke reports in traditional Excel, but then have to spend hours rebuilding the report when asked for a different view of the data (e.g. you have a report that shows sales by half year, and then you are asked to produce the same report by month – and have to start again almost from scratch).
    5. Anyone that wants to start using Power BI for anything more than the most simple aggregation reports.  The “out of the box” capabilities of data modelling in Power BI are limited, and you will quickly realise that you need to learn some new skills (Power Pivot/DAX) to be able to leverage the strengths of Power BI.

Power Query

Power Query is a user friendly ETL (Extract, Transform, Load) tool.  Unfortunately Microsoft muddied the waters by renaming Power Query “Get and Transform” in Excel 2016, and “Get Data” in Power BI Desktop.  That aside, Power Query is used to:

  1. Extract:  Connect directly to the data source and ingest the data (into Power BI or Excel) so you can work with it.  It retains a connection to the source so when the source data is updated you can automatically “refresh” the data without having to go through the import process again from scratch. It is amazing.
  2. Transform:  You can clean and reshape the data so that by the time you are ready to use it, it already is in the format you need instead of the format you are given.  Again, you do this only once, and then you can automatically refresh when the data is updated later.
  3. Load the data directly to the place where you want to use it.  The end location for the data will typically be inside Power Pivot or Power BI, but it is also an invaluable tool for use with traditional Excel.

Don’t think of Power Query as simply a tool for self service BI reporting.  I have used Power Query to automatically audit information (eg XLSX files, csv extracts from other systems etc) and produce exception reports directly in Excel.  Once you understand what Power Query can do, you will start to realise all the ways you could use it to make your life easier.

Who needs Power Query and why?

People that need this tool typically are trying to solve the following problems.

  1. They regularly receive file extracts from someone (or some system) and need to manually manipulate this data before they can use it.
  2. They want to use Power BI/Power Pivot but can’t get the IT department to shape the data they need prior to loading it into Power BI/Power Pivot.
  3. People that are doing repetitive tasks such as matching invoices on a remittance advice against a bank statement (or similar) – Power Query eats this kind of work.
  4. Anyone that is given data in the wrong shape (example below).  And yes I quickly used Power Query to un-pivot the data in the blue table so it looked like the green table.  It is worth getting and using Power Query just to get this un-pivot feature alone!
    image

Wrap Up

Hopefully this overview has helped demystify how Power Pivot, Power Query and Power BI can help you in your job.  If you would like to learn more about Power Pivot, you can read my book “Learn to Write DAX“.  If you want to learn more about Power Query, I recommend Ken and Miguel’s book “M is for Data Monkey“.  If you live in Australia, you can attend one of my live training courses where I cover what you need to know about Power Pivot, Power Query and Power BI.

Shaping vs Modelling in Power BI

Level: Beginners

Power Pivot, Power Query and Power BI are 3 products that are closely related to each other and were all built for the same purpose – enabling Self Service Business Intelligence.  I first learnt to use Power Pivot for Excel, then Power Query for Excel, and finally Power BI.  But there is a new wave of users that have never heard of Power Pivot and Power Query, and their first exposure to the underlying technologies is with Power BI.  This can be confusing and hence I explain what you need to know below.

Power BI Can be Confusing to First Timers

I realised recently that people who have never used Power Pivot and Power Query could easily be confused by the different options to do things in Power BI.  There are lots of reasons to be confused, including:

  • There is no reference to the term “Power Query” inside the Power BI tools.  Microsoft stopped using this name in favour of terms like “Get Data”, and “Edit Queries”.  I think this was a bad decision and it has fuelled the confusion. When I tell people to do something in Power Query, they therefore don’t know what I am talking about.
  • There is no reference to the term “Power Pivot” inside Power BI either.  The most relevant proxy for Power Pivot inside Power BI is the “Modelling” menu. Use of the term Power Pivot therefore doesn’t make any sense to “Power BI only users” either.

Shaping and Modelling Your Data

The implication of the above is that there is no clear way to delineate (with language/labels) between the process of loading the data and the process of modelling the data.  I like the term “Shaping” as a replacement for Power Query/Get Data and “Modelling” as a replacement for Power Pivot.

Definition of Shaping

Shaping is the process of extracting your data from the source, and transforming the data so that you can load it into Power BI (often referred to as ETL).  The reason I like the word Shaping is that it clearly describes what is normally needed as the first step in any BI project.  The shape of your data refers to

  • What data you extract
  • How your data is loaded into one or more tables, and which tables you import
  • What names you give to the columns and tables
  • Whether your tables are wide and short (many columns), or long and narrow (fewer columns)

For example, data tables (also called fact tables) are normally best shaped into long, narrow tables (lots of rows, and a minimum number of columns), whereas lookup (dimension) tables can be as wide as needed for the task.

image

All tables should only be loaded with data that you actually need.  One big difference between Power Pivot/Power Query and Excel is that it is easy to go back and get the data you missed later.

Definition of Modelling

Modelling is the process of building the relationships between your tables and converting the business requirements into measures and calculated columns using the DAX language.  The measures contain

  • The business logic that transforms your raw data to a usable, value adding calculation
  • The formatting required for the numbers (currency, percentage)
  • A good business name for the measure that accurately describes the measure

Prior to Self Service Business Intelligence, data modelling was always done by expert IT professionals using a complex language like MDX.  The complexity of such languages put them out of reach of most users, as they were simply too hard to learn casually.

2 Ways to Add A Column – Wait What?

A perfect example of the type of confusion Power BI first timers face is that there are 2 ways to add columns to your data.  You can add a column during the shaping process, and you can add a column during the modelling process.

image

So which should you use?  Well it depends on what you are trying to do.  My general advice on adding columns to your tables is to make the change as close to the source as possible. So in order of preference:

  1. Add the column in your source data (eg your Data Warehouse) with the help of your IT friends if necessary
  2. Add the column during the shaping process
  3. Add the column during the modelling process

If you add the column at the source, then you have access to it for your current BI needs but also for future BI needs.  Adding the column at the source provides the most reuse, and if you need to make a change down the track, it will be the least maintenance.  But sometimes you can’t do that for various reasons, or you can’t do it quickly.

If you add the column during shaping, it will be loaded and compressed with all other columns during data load.  This normally means better overall table compression and hence a good outcome for the overall model.

The most common reason you would add a column during the modelling process is if you need to leverage other parts of the model to create the data.  An example would be if you wanted to use a measure (eg Total Sales) in your data model to create the new column in the model (eg Banded groups of Customers based on High, Med, Low annual sales).  But don’t fall into the trap of using too many calculated columns instead of measures. I rant on about that here.  Use them when they are needed, otherwise avoid them.
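As a hypothetical sketch of that scenario (the thresholds and names here are invented for illustration), a calculated column on the Customers table could leverage the [Total Sales] measure – the measure's implicit CALCULATE converts the row context into a filter context, so each customer row gets their own sales total:

Customer Band =
    SWITCH (
        TRUE (),
        [Total Sales] >= 100000, "High",    -- hypothetical threshold
        [Total Sales] >= 10000, "Med",      -- hypothetical threshold
        "Low"
    )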

Wrap Up

Hopefully this explanation of the role of shaping and modelling in Power BI has helped clarify some things for you.  Don’t forget to check out the other articles in my knowledge base here.

Measures on Rows – Here is How I did it

Level: Intermediate

You may or may not be aware that it is not possible to put Measures on rows in a Matrix in Power BI. But I came up with a trick that makes it possible, so read on to find out how.

Measures Can Only be Placed on Columns

First the problem. The only way that you can use the Power BI Matrix visualisation (at the time of writing) is to place the measures on the columns as shown below.  You can then take any column of data from your data model (typically from a Dimension/Lookup table) and place that on rows in the Matrix.  So you end up with this.

image

This limitation does not exist in an Excel Pivot Table.  As you can see below, it is possible to flip values between columns on rows from the Pivot Table Fields List.

measures on rows

Old tricks for New Purposes

When I was thinking through this problem, my first intuition was to use a Switch statement somehow.  Switch statements can be used to create a Switch Measure (like the ones I used in this blog post here back in 2014) and I figured this could be part of the solution.  And then I remembered another trick I learnt from Rob Collie using disconnected tables to feed a measure – I knew I had a solution.

So in summary, I wrote a Switch measure that will morph into different measures when filtered, and then used a disconnected table to pass a filter to that measure.

Write a Switch Measure

The Switch measure is pretty easy to write as follows:
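In sketch form it looks something like this, where [Selected Measure] is the harvester measure defined further below.  (The post later confirms that ID 3 maps to [Total Margin] and ID 4 to [Total Invoices]; the measure names for IDs 1 and 2 are assumptions.)

Measures on Rows =
    SWITCH (
        [Selected Measure],
        1, [Total Sales],      -- assumed name
        2, [Total Cost],       -- assumed name
        3, [Total Margin],
        4, [Total Invoices]
    )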

The way this measure works is that it takes an input in the form of an integer, and then depending on that number, it will return the value of the appropriate measure as the result.

Then Create a Helper Table

The next step is the secret sauce to this trick.  I created a disconnected table with the sole purpose of passing the correct filter values (the integers) to the Switch measure.

I used the “Enter Data” feature in Power BI to create a list of the measures I want to put on rows in the Matrix.

image

Note that the ID column is also the sort order of the measures, and the ID column also matches with the Switch measure numbering scheme.  Once the table is created, I added it to the data model as a disconnected table – no need to join it to any other tables in the data model.

I then set the sort order of the Measure column as follows:

image

Write a Harvester Measure

The next step is to write a harvester measure to extract the selected filtered value of the table.  This measure is simply as follows:
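In sketch form (the measure name is an assumption; the MAX pattern is confirmed by the explanation below):

Selected Measure = MAX ( MeasureTable[ID] )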


The measure above will return the largest integer in the current filter context.  If there is a filter (eg a slicer) on the Measure column and the user has selected “Total Margin” in the slicer, then there is only a single row visible in the filter context, and hence the MAX of the visible rows is 3.  I would of course get the same result if I used MIN, AVERAGE, SUM or indeed even VALUES in my harvester measure.

Filter context is one of the hardest things for new users to understand properly.  I explain filter context in detail (and everything else you need to know to be great at using Power Pivot and Power BI) in my book “Learn to Write DAX”.

You can see the harvester measure in action below.  I have placed the column from the MeasureTable into a slicer and the harvester measure into a Matrix.  When I click on the slicer, the harvester measure updates to show the ID of the selected filter.

slicer

Add the Measure Column to the Matrix

There is more than 1 way to filter a table of course.  In the image above I am using a slicer, but I can also use the rows on the Matrix to provide filter context.  When I place the Measure column from the MeasureTable onto rows in the Matrix, you can see below that the harvester measure is correctly returning the value of the ID column to the visualisation.

image

Pulling it All Together

In order to pull this all together, I replaced the harvester measure with the Switch measure I wrote earlier, and put the Categories column from the Products table onto columns in the Matrix.

image

But there are now 2 problems.  The first problem is that the Total Invoices are shown in currency format.  One limitation of this trick is that all the measures must have the same number formatting.  If you would like to see an optional “Alternative Format” for the SWITCH measure, please vote for that idea here.

The second issue is that the total row is actually showing the total invoices and not the total of all the measures.  This makes sense of course because the grand total row is not filtered.  As a result, the MAX of ID is 4 and hence the SWITCH measure is returning the value of the measure that maps to ID 4 (Total Invoices).

To fix these 2 problems, I removed Total Invoices from my table and rewrote the SWITCH statement to correctly handle the grand total row.
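One way to handle the grand total row (a sketch only – the exact rewrite may differ) is to return blank when more than one row of the MeasureTable is in the filter context:

Measures on Rows =
    IF (
        HASONEVALUE ( MeasureTable[ID] ),
        SWITCH (
            MAX ( MeasureTable[ID] ),
            1, [Total Sales],      -- assumed name
            2, [Total Cost],       -- assumed name
            3, [Total Margin]
        )
    )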

And there you have it, Measures on Rows in a Power BI Matrix.

image

LASTNONBLANK Explained

Level: Intermediate

Last week at my Sydney training course, one of the students asked me a question about LASTNONBLANK.  This reminded me what a deceptively tricky function LASTNONBLANK is.  It sounds like an easy DAX formula to understand, right?  It just finds the last non blank value in a column – easy, right?  Well it is a little bit trickier than that, and I am going to explain it all in this post today.

LASTNONBLANK has a sibling FIRSTNONBLANK that operates in exactly the same way but in reverse.  Given the behaviour is the same, I won’t cover FIRSTNONBLANK at all, as you can work it out after reading about LASTNONBLANK.

Syntax

The syntax of LASTNONBLANK is as follows.

LASTNONBLANK(Table[Column],<expression>)

It takes 2 parameters: 1) a column and 2) an expression.  In fact you can also pass a single-column table as the first parameter.  This could be useful for example if you wanted to create your own table on the fly using a function that returns a table, and then use that as the first parameter.

LASTNONBLANK is an Iterator

It is not immediately obvious, but LASTNONBLANK is actually an iterator.  It iterates over the column specified in the first parameter and finds (unsurprisingly) the last value in that column that is not blank.  (Technically the engine carries out this task in reverse natural sort order for efficiency).

But what is that pesky second parameter?

The first thing people normally find confusing about LASTNONBLANK is that pesky second parameter.  If all you want to do is find the last non blank value in a column, why do you need this second parameter?  Well in short, you don’t “need” it in that use case (except that it is mandatory).  But having this parameter makes the formula much more powerful, so it is good to have it as an option. It is designed to allow you to write a DAX expression such as a measure to test for some condition or value.  Ideally (in my view) Microsoft should have made this an optional parameter, but that is not what we got.  But there is an easy workaround that allows you to make this parameter optional – just use the value 1 as the second parameter, like this.
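Using the generic syntax from above, the pattern is simply:

= LASTNONBLANK ( Table[Column], 1 )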

LASTNONBLANK iterates through the Table[Column] and then checks to see if the second parameter has a value.  The number 1 always has a value of course, so placing 1 as the second parameter has the same effect as just ignoring this parameter.  Instead the formula will just return the last non blank value it finds in the column.

LASTNONBLANK actually returns a TABLE

Another thing that is not immediately obvious is that LASTNONBLANK actually returns a TABLE.  It is a special single column, single row table that by definition can only have 1 possible value (because it has only 1 row and 1 column).  One feature of this single row, single column table is that you can use it as either a scalar value or a table in DAX.  More on that later too.

Test Data

My test data for this blog post is monthly bank account balances for 2 fake bank accounts.  Measures that calculate account balances in DAX are often described as “semi-additive measures”, because you can’t just add up the values from each month to get the current balance – you need to find the latest balance to work out how much you have.  This type of data is a prime candidate to use the LASTNONBLANK formula, because LASTNONBLANK does exactly what we need to work with semi-additive measures.

I have set up the following data with some interesting features to demonstrate the behaviour of LASTNONBLANK.

image

It is easiest to see what I have done with this test data in the pivot table below.  I have written the following measure to demonstrate the point.
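A minimal sketch of such a test measure (the table name Data is an assumption throughout this post):

Sum of Balance = SUM ( Data[Balance] )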

Note the above measure is a test measure only to help with this blog – it doesn’t really make any sense as is, but it is useful for describing what is happening.  When the above measure is placed in a pivot table, it looks like this (shown below).

image

Note a couple of things about this data.

  1. Kathy’s account doesn’t have a result for May
  2. Matt’s account doesn’t have a result for July
  3. The maximum value for Kathy’s account is in June (the previous month’s data load)
  4. The maximum value for Matt’s account is back in May.

I have set the data up this way to demonstrate the behaviour of LASTNONBLANK.  I often talk about how important it is to set up test data that will flush out issues when you write your formulas.  This is a good example of that, and it will make more sense as you read on.

Last value in a Column

Now the objective is to use LASTNONBLANK to find the last value in a column.

Consider the following formula.
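A sketch of the formula (again assuming the data table is called Data):

Last Non Blank Date = LASTNONBLANK ( Data[Date], 1 )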

This formula finds the last date in the data table (note it is not the last date in the Calendar table, but the data table).

The following pivot table has the Month name from the calendar table and the account names on pivot table columns.

image

Note that the formula correctly indicates that May data is missing for Kathy, and July data is missing for Matt.  Also note that it correctly gives the last date in the Grand Total of the pivot table.  (You would also get the same result as above if you used LASTDATE instead of LASTNONBLANK).  I have only used LASTNONBLANK here to demonstrate the behaviour.

LASTNONBLANK operates over a sorted column

The [Last Non Blank Date] measure above hides some complexities about LASTNONBLANK.  You might expect that LASTNONBLANK finds the last value in the column, but that is not how it works.  It actually finds the last value in a sorted column using the natural sort behaviour of the column data type.  In the case of a date column (like the one shown above), everything is sweet – the natural sort order is also the order we normally load the data (chronological order).  But in the case of the balance column, the natural sort order of a numeric column is numerical sort order, not the order the data is loaded (as you can see in the following pivot).

I have written another test measure as follows
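Something along these lines (same naming assumptions as before):

Last Non Blank Balance = LASTNONBLANK ( Data[Balance], 1 )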

When placed in the pivot table, you get the following behaviour

image

In the above example, you may expect the Grand Total row to return the values 2,125 for Kathy and 1,557 for Matt as these are the last values you loaded, but that is not how it works.  When looking at the individual rows in the pivot table, it all works fine – it correctly finds the last balance for each month.  This measure works correctly on the rows in the pivot table because the rows provide “initial filter context” prior to the measure being evaluated.  So there is only ever 1 row in the data table at the time the measure is evaluated, and that is why it works in this case.  But in the Grand Total row in the pivot table, there is no initial filter context applied to the date, hence all values in the column are iterated over by LASTNONBLANK.  The iteration operates over a sorted version of the balance column (the natural sort order of the column, which is numeric in this case).  Once the balance column is sorted, then 2,200 will be the last value in the column for Kathy, and 1,806 will be the last value in the column for Matt, and that is the result that is returned.

LASTNONBLANK as a TABLE

In this next test measure, I have used LASTNONBLANK as a filter input into a CALCULATE function
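A sketch of what such a measure could look like.  The column iterated here is an assumption – the behaviour described below (a blank for Matt because the calendar runs into July) suggests the iteration is over the Calendar date column:

Latest Balance with 1 =
    CALCULATE (
        [Sum of Balance],
        LASTNONBLANK ( Calendar[Date], 1 )    -- column is an assumption
    )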

You should remember that CALCULATE can use a simple filter such as Table[Column] = “some value” or it can take an advanced filter where you must pass a TABLE as the filter.  The measure above therefore suggests that LASTNONBLANK must be a table, not a scalar value.  This theory can be tested by firing up DAX Studio and executing the LASTNONBLANK portion of the formula as a query.  DAX Studio ONLY ever returns tables, it can’t return scalar values, so if the query works, it confirms that LASTNONBLANK returns a table. Note: you can read more about using DAX Studio as a tool to help you learn DAX at this blog post here.

image

You can see above that LASTNONBLANK returns a single column, single row table that contains a single value.  It is a special table because it can also be used as a scalar value, in the same way that you can use VALUES() as a scalar value if there is only a single row returned.  In the case of the measure [Latest Balance with 1] above, I am using LASTNONBLANK as a table parameter filter inside CALCULATE.

But how does this table act as a filter?

One feature of temporary tables in DAX is they retain a link to the data model, and filter propagation will work between the temporary table and the rest of the data model.  You should visualise this in your mind like the image below.  Imagine a new table that has spawned into the data model and has a relationship to the table where it came from (in this case it is the data table).  The filter propagation flows from the 1 side of the relationship (temporary table imagined as shown in 1 below) to the many side of the relationship (table 2 below).

image

So you can see above that the LASTNONBLANK function produces a table that then filters the data model based on the single value it returns in the table.

When I put this measure in a pivot table, I get this result

image

Note that the measure is correctly returning the last value for the account “Kathy” but it is returning a blank for account “Matt”.  Technically this is correct because the calendar has dates in July and there is no entry for Matt for July, so the last value is blank.

Enter the second parameter

Now it is time to change the second parameter with the following measure.
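Sketched under the same naming assumptions as before (the use of [Sum of Balance] as the second parameter is confirmed by the explanation below):

Latest Balance =
    CALCULATE (
        [Sum of Balance],
        LASTNONBLANK ( Calendar[Date], [Sum of Balance] )
    )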

When this last measure is added to a pivot table, it works as desired, correctly returning the last non blank value from the data table based on the chronological order of the data.

image

This new LASTNONBLANK measure returns the last value it finds in the column for which the measure [Sum of Balance] also returns a non-blank result.

It is worth pointing out here that the ONLY REASON this formula works is because the measure [Sum of Balance] has an implicit CALCULATE wrapped around it – you can’t see it, but it is there.  LASTNONBLANK is an iterator, and like all iterators, it has a row context but doesn’t have a filter context.  Therefore the following formula will not work
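That is, something like this (same assumed names) will not work, because the naked SUM is not affected by the row context of the iteration:

LASTNONBLANK ( Calendar[Date], SUM ( Data[Balance] ) )    -- no context transition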

But this next measure does work (this next formula is the equivalent of the one that uses [Sum of Balance], because [Sum of Balance] has an implicit CALCULATE).
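Sketched with the same assumed names:

LASTNONBLANK ( Calendar[Date], CALCULATE ( SUM ( Data[Balance] ) ) )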

I cover evaluation context and context transition in detail in my book, Learn to Write DAX.

Further reading/references

Here are some good links that helped me learn how LASTNONBLANK works, and how it can be used/leveraged if you are interested in doing some more reading on this topic.

www.sqlbi.com/articles/semi-additive-measures-in-dax/

www.sqlbi.com/articles/alternative-use-of-firstnonblank-and-lastnonblank/

www.powerpivotpro.com/2012/06/top-selling-product-using-firstnonblank/

Sydney Modern Excel and Power BI User Group is Growing

In October last year, Iman Eftekhari and I set up the Official Microsoft Power BI User Group (PUG) here in Sydney Australia.  Since then we have gone from strength to strength and as of the time of this post the user group consists of a total of 330 members.  If you live in Sydney and would like to join the User Group, then we would love to have you on board.

Objectives of the User Group

The primary objective of this group is to share and learn about data analytics & reporting using Modern Excel and the Microsoft Power BI suite of tools, including

  • Power BI in the Cloud (what is Power BI?)
  • Power BI Desktop
  • Power Query for Excel
  • Power Pivot for Excel

Our Meetup sessions typically include a quick update about what is new in Power BI, tips & tricks, followed by one major presentation.  Major presentations may be an introduction to one of the above tools, a demonstration or tips/advice/best practice or a special guest speaker. Our meetings include lots of opportunities to ask questions, share your own experiences, or ask for help and meet like minded data experts who are going through the same challenges as you. Beginners are welcome as are people that are well into the Microsoft suite of Power Tools.

Our April Meetup this week is already Sold Out

I would love to invite readers along to our Meetup this week but we are already at 100% capacity with 90 users planning to attend.  This week we have a great session from Gin Jia (Agile BI) who will be sharing some of the techniques he has used to leverage the power of Power BI.  In recent months we have had presentations from Marco Russo, Ken Puls, Bill Jelen (aka Mr Excel) as well as official presentations from Microsoft.

Sign up now to be part of our Community

Now is a good time to sign up to be part of the Sydney Modern Excel and Power BI User Group.  There are 2 places to register.

image

When you get to the above link, click on the “join group” button (shown as 1 above) which will take you to our Meetup site for registration.  All of our Meetup events will be communicated at this site including meeting registration and confirmations so you should sign up there.  You should also join the Microsoft Power BI Community and claim your PUG badge (shown as 2 above) indicating your membership to the Sydney PUG.  This affiliates all of your Power BI Community activity with the Sydney PUG, and that helps us get official recognition from Microsoft which in turn helps us get access to exclusive resources to make the PUG more valuable to members.

If you don’t live in Sydney, there are lots of other Power BI User Groups around the world that you can join.  Just go to the link provided above and find the nearest group to you.  And if you find there is not an Official PUG in your area, then why not set one up.

Banding in DAX

Level: Intermediate

As regular readers of my blog would know, I enjoy helping people on forums who need help with DAX, Power Query or Power BI.  Recently I have answered quite a few questions for people that could benefit from a DAX “Banding” solution and hence I have decided to write this post as a permanent link to the topic.  I first learnt about the DAX banding technique from Alberto Ferrari at sqlbi.com.

What is Banding?

Banding is a technique that allows you to “group” or “stratify” your data in a table into “bands” (it is sometimes called Cohort analysis).  So let’s say you have a table of customers like this (shown below).  In my sample database I have 18,000 customers and I know the age of each one.

image

This provides an opportunity to analyse the sales data based on age so as to understand if people of different ages purchase differently.  Below is a pivot table that illustrates what is possible with the data.  But the problem is that there is far too much detail to be useful.

image

I don’t really want to target 35 year old customers differently to 36 year old customers.  Ideally I would like to group (or stratify) the customers into “bands” so that I can treat a group of similar customers the same way.  So I might want to group my customers like this.

image

Once I have grouped customers by age, I can then create a pivot table like the one shown below.

image

Notice how much easier it is to find insights into the data when there is less detail.  I can immediately see from the above pivot table that as people get older, they tend to be more interested in mountain bikes and touring bikes and less interested in road bikes (of course this is a fake database, so the insights are not necessarily realistic).

You will need a Calculated Column for Banding

I am a strong advocate for using Measures (not calculated columns) whenever possible.  You can read more about that in my Knowledge Base here.  But in this instance, a Measure is not going to work.  Measures can only be used in the values section of a Pivot Table – you can’t use them in the “filtering” areas like Slicers, Rows, Columns, and Filters.  In the pivot table above, the bands are used to slice the pivot table on rows, and hence a measure won’t work.

I Could Write an If Statement

So the objective is to create a calculated column in the customers table that groups people based on age into a stratified band.  Any Excel user would know that you could write an IF statement for that (DAX has very similar syntax to the IF statement in regular Excel).  The DAX syntax for a single IF statement is as follows.
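A minimal sketch (the band names and cut-off are hypothetical):

Age Band = IF ( Customers[Age] >= 30, "30 or Over", "Under 30" )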

The DAX Calculated Column above will return one of two possible values depending on the age of the customer.  The complexity comes in when you want to have multiple conditions.  As I mentioned earlier in this post, there are 6 possible groups that I have in mind (shown here again).

image

To do this with IF statements, I would need to nest multiple IF statements inside each other as follows.
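For example (the exact band boundaries here are assumptions – there are 6 groups in total):

Age Group =
    IF ( Customers[Age] <= 20, "20 or Under",
        IF ( Customers[Age] <= 30, "21 to 30",
            IF ( Customers[Age] <= 40, "31 to 40",
                IF ( Customers[Age] <= 50, "41 to 50",
                    IF ( Customers[Age] <= 60, "51 to 60", "Over 60" )
                )
            )
        )
    )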

Do you see the problem?  This Calculated Column works, but it is very hard to write, read and modify.  And more importantly, there is a better way – use the banding technique.

Process for Creating Banding

The better way involves the following process.

  1. Create a table in Excel that contains the group names and the lower/upper age limits
  2. Load the table to Power Pivot
  3. Do not connect the table to any other tables – it is a disconnected table
  4. Write a DAX Calculated Column that compares the age of each customer against the lower/upper limits in your disconnected table and returns the single row from the banding table that matches for each customer.

Here is the detail of how to do it.

Create a Table in Excel

Here is what a table in Excel looks like.

image

The key features to note are

  1. There is an ID column – this will be used to sort the Band column later
  2. A “Band” column – this is the label that describes each group.  Note the groups are mutually exclusive and collectively exhaustive (MECE).
  3. There is a “from” and “to” column that set the lower and upper bounds of each age group.  Note that the upper age for one group will match the lower age for the next group.  I have used DAX to ensure no overlap in the Calculated Column.

Write a Calculated Column

The objective of the calculated column is to filter the disconnected Banding table so that 1 and only 1 row is “visible” or “unfiltered” for each customer.  Here is the formula
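Sketched here (the AgeBands “From” and “To” column names are assumptions based on the description that follows):

Age Band =
    CALCULATE (
        VALUES ( AgeBands[Band] ),
        FILTER (
            AgeBands,
            Customers[Age] > AgeBands[From]
                && Customers[Age] <= AgeBands[To]
        )
    )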

The role of the FILTER portion of this formula is to filter the AgeBands table so only a single row is unfiltered. What the FILTER does is apply 2 filter rules to ensure the customer is older than the “from” column and “less than or equal” to the “to” column.  It is this less than or equal to portion that ensures no cross over of individuals into multiple groups (Mutually Exclusive).

The VALUES function has a unique capability.  VALUES returns a single column table.  If that table also has a single row, then VALUES converts the table to a scalar value and allows that value to be used as a result in a calculated column or indeed in a pivot table cell.  Given the formula guarantees that there is 1 and only 1 row left unfiltered in this banding table, then VALUES will kick in, extract the label for the group from the AgeBands[Band] column and place that value in the calculated column.

Sorting the New Calculated Column

There is one more problem to solve – the names of the new groups do not sort properly in the Pivot Table.  By default, columns of data will sort in Alphanumeric order.  So the labels will sort like this:

image

To solve this problem, I created a second calculated column that brings in the Sort ID into the customer table.  It is simple to do this – just copy the formula in the first calculated column and then update the VALUES portion with the “ID” column as follows.
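Sketched under the same assumptions as the first column:

Age Band Sort =
    CALCULATE (
        VALUES ( AgeBands[ID] ),
        FILTER (
            AgeBands,
            Customers[Age] > AgeBands[From]
                && Customers[Age] <= AgeBands[To]
        )
    )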

Once this second calculated column has been created, it is possible to tell Power Pivot to use the new column of Sort ID values as the sort order for the Age Band column (shown below).

image

Compression of the Calculated Columns

As I mentioned earlier, I am not a big fan of calculated columns – except where they are truly needed.  There are a few guidelines about when it is OK to use a calculated column – you can read about them all in the link I provided to my Knowledge Base earlier.  The key points to note here, however, are:

  1. These calculated columns are on the lookup table (Customers).  Calculated columns on lookup tables are generally OK.
  2. The calculated columns have a low cardinality (ie a low number of unique values).  This is also generally OK, even on a large data table if necessary.

Now to Use the New Calculated Column

Now that the Customers table has this new “Age Band” column, it is possible to use this column in visualisations.  It can be used on rows, columns, filters, and slicers.  Here is a rows example:

image

And a slicer example:

image

Learn to Write DAX


The banding topic is just one of the many techniques I cover in my book “Learn to Write DAX“.  If you are an Excel user and are learning to use DAX, my book gives a comprehensive coverage of all the topics you need to start your journey as a Power Pivot ninja.

What is Power BI

This is the third in my “what is” series, having previously blogged about “What is Power Pivot” and “What is Power Query”.  This topic is the hardest of the three to give a straight answer on, however it is also the most important to understand (as I will explain below).

Why Power BI is so Important

Microsoft Power BI is the single most important thing to happen in the data analytics market since the introduction of Microsoft Excel 30 years ago.  The reasons it is so important include:

  1. It is a Microsoft product, and it is a strategic priority for the company.
  2. It is built on the heritage of two of the most mature and best BI products available (SQL Server Analysis Services and Microsoft Excel)
  3. Despite the heritage from SSAS and Excel, Power BI is actually being built/rebuilt using the latest technologies including HTML 5.0, cloud computing, column store databases and modern smart phone mobile apps.
  4. Microsoft is listening to the community to determine what it builds and delivers next.  Let me state that again – Microsoft is listening and delivering!
  5. Microsoft has opened up the visualisation layer to open source contributions via the custom visuals gallery, so there is (and will continue to be) lots of community added value.  Read about that in my blog here.

Gartner has recognised the trend towards self service business intelligence and also Microsoft’s leading position in this space.  You can look at the 2016 report on this topic here if you are interested.

Why it is Hard to Explain What Power BI is

Microsoft is known for its great products, however it doesn’t always have a great record when it comes to naming products.  The naming of Power BI products has been confusing to say the least.  I won’t spend time going back through all the confusing history but instead I will provide my spin (below) on the least confusing way to consider Power BI products today.  I first covered this framework in my blog about the best training for Business Analysts.

So, What is Power BI?

Power BI is a suite of Power BI branded products plus an extended family of umbrella products that collectively change the way people use and analyse data.  All Power BI products (regardless of which group they are in) have all the attributes of a robust enterprise strength BI solution, but in addition they all have best in class self service BI capabilities.  What this means to the average business analyst is that you don’t need to run to IT every time you need something done, hence avoiding the normal time, cost, and money conflicts that so often prevent success.

If you want to learn to be a Power BI Ninja, you will need to learn skills in all of the following areas:

  • Extract and Load your data ready for analysis – you use Power Query for this.
  • Enhance the data you have loaded (using Power Query) with relationships and calculations (measures) that can be used to show and find business insights – you use Power Pivot for this.
  • Create reports that bring out the insights from the data, and allow users to easily interact with the data – you use Power BI Desktop for this
  • Distribute the reports  and Dashboards to those that need it – you use Power BI Service and Power BI Mobile for this.

Power BI Branded Products

There are 3 Power BI branded products.

Power BI Service

This is the cloud solution that makes sharing live business analytics a snap.  Need to share a 250MB report – no problem!  It is possible to share small or large reports and dashboards internally within your organisation, externally with third parties, and also share non sensitive information publicly over the web (as I have done with this demo).

The Power BI Service has a completely new visualisation front end built from scratch. The capabilities are still relatively immature, however the pace of improvement is stunning, and the addition of the Open Source visualisation tools discussed earlier makes it a sure bet.

The service can store your data online, and/or connect to your data in the cloud via Azure (or similar), and/or it can connect to your in house data via a Power BI Gateway.

Power BI Desktop

This is the free Power BI authoring tool that allows you to quickly and easily analyse your data and find insights directly on your PC. The general process is to connect to your data, reshape the data if needed, load the data into the tool, then build striking visualisations that help you analyse your business.  You then can save the workbook (just like you do in Excel), distribute the file for free to other people (eg email, place on a network drive) or better still – publish the file directly to the Power BI Service with just a couple of clicks.  Once the analysis is loaded to the Power BI Service, it is easy to share with others.

You should author your Power BI reports in Desktop rather than the Service, as you can back up your Desktop files but you can’t back up your Service reports.

Power BI Mobile

This is the free mobile tool that allows you to consume any of the reports that are available to you in the Power BI Service.  There are versions for all the major tablets and smart phones.  This software is also fairly new and still being enhanced.  It is not perfect, but it is good enough to get started and it will only get better from here.

Power BI Umbrella Products

There is a suite of underlying “products” and technologies that support the Power BI branded products.  These umbrella products covered below are the supporting technologies sitting behind Microsoft’s Self Service BI Strategy (and also Enterprise BI in some cases).

Power Pivot

Power Pivot is a data modelling tool.  It allows you to take tables of data, create relationships between those tables so they work together (without VLOOKUP!), and then enhance the data model so that it includes specific calculations that turn your raw data into useful business insights.  This work has historically been done by the IT department, however it can now be done by competent business users using the language of Power Pivot – DAX (Data Analysis Expressions).

Power Pivot is available in Excel 2010+, SSAS 2012+, and Power BI Desktop/Service.  It is the same technology across all of these products.  If you learn it once, you can use the same skills across all of these products.  You can read more about Power Pivot here.

Power Query

Power Query is a data acquisition and transformation tool.  It helps business users grab data from anywhere it resides, then cleanse, reshape and prepare the data ready for the analytics work to come – no IT guru needed any more. Power Query is a user interface driven tool that makes it easy for anyone to manage and load data, however it is supported by a powerful programming language that makes it super powerful.  You can read more about Power Query here.

Power Query is available in Excel 2010+ and Power BI Desktop/Service.  It  (annoyingly) has been renamed “Get & Transform” in Excel 2016 and “Get Data” in Power BI Desktop.  What were they thinking?


What is the right way to learn Power BI?

There are lots of different learning styles, but here are the best resources I know of for learning these tools.

Live Training

Reading Books

I keep an up to date recommended reading list in my knowledge base here.  As well as information about books, I keep a list of great websites to learn from down the bottom of the page.

Online Learning

There are lots of online resources to learn. Some of the best I have found are:

Many to Many Relationships in DAX Explained

Level: Advanced (but explained in detail so everyone can understand)

There is a lot to learn in DAX if you want to be a ninja.  A couple of the more complex and important things to learn and understand are filter propagation and context transition.  It is not so much that you need to be a rocket scientist to understand these concepts, it is more that they are not naturally intuitive.  You simply have to learn how filter propagation and context transition work.  What’s more, if you have an Excel background, there are some fundamental differences between the way Power Pivot works vs regular Excel, and you have to learn these things too.  All these things are learnable – but indeed you will need to learn. You need an understanding of filter propagation and context transition to understand how to solve the Many to Many problem below – but don’t worry – I will explain it in detail in this post.

The Many to Many Problem – Bill of Materials

The problem I am going to cover today is the DAX Many to Many problem.  All relationships between tables in DAX are of the type 1 to Many* – there is no support for Many to Many relationships in DAX (*Note: in Power BI there is also a 1 to 1 relationship type).

Now for a lot of people this won’t make any difference as there is no need for a Many to Many relationship.  But sometimes there is a DAX data model that has 2 or more tables that have multiple records in each table, and these tables have a logical business relationship to each other.   An example will make it easier to understand.

In the example below, each product exists in the Product table (orange) only once, but each product can be sold many times, and hence it can appear in the Sales table (blue) many times.  This is a standard 1 to many relationship that DAX was built to handle really well.

image

Some of the products in the Sales table are actually “multi product” products.  eg Product ABC is a box of fruit that contains 3 different products (2 x Apples, 3 x Bananas and 1 x Cantaloupe – 6 items in total) .  If you want to see the individual products that were sold, you would need to create a many to many relationship between the Sales table and the Bill of Materials table (green).   The product ID in the sales table has many instances of each product ID (1 for each sale), and the Bill of Materials table also has many instances of the Product ID (1 for each sub product). This is allowed in a traditional relational database tool like Access or SQL Server but it is not allowed in Power Pivot.

Using ONLY 1 to Many Relationships to Solve the Problem

Given that you simply must use 1 to Many relationships in DAX*, the only workable setup of the relationships between the tables is shown below.  There is a 1 to many relationship from the Products table to the Sales table.  There is also a 1 to many relationship from the Products table to the Bill of Materials (BOM) table.  But there is no relationship between the Calendar table and the BOM table because the BOM table doesn’t record the date of the sale; it only records the quantity of each sub product.

image

Now for Some Simple Measures

I have created 3 measures here.

Total Sales Qty=SUM(Sales[Qty])
Total BOM Qty=SUM(BOM[Qty])
Total BOM Sales = SUMX(Products,[Total Sales Qty] * [Total BOM Qty])

When I put these 3 measures in a pivot table with Calendar Date in Rows, this is what I get (shown below).  See the problem?  The measure [Total BOM Qty] is giving the wrong answer – I get the same value for every row in the Pivot Table.  What is going on here?

image

To understand the problem in DAX, you need to have a very clear understanding of automatic filter propagation between relationships.

The Many to Many Problem Explained

Below is the Power Pivot data model again.  I always lay out my tables using the Collie Layout Methodology – that is, the lookup tables are placed at the top and the data tables are placed at the bottom.  This has no effect on the behaviour of the model but it makes it much easier to visualise how filter propagation works.  Lookup tables always have 1 and only 1 row for every object in the table, and there must be a unique identifying key (date in the case of the Calendar table and Product ID in the case of the Products table).  Data tables also must have the same key (date and Product ID – otherwise you can’t join the tables) but data tables are allowed to have as many duplicates as needed (many sales are made on the same date, and the same product is sold many times).

image

Filter propagation automatically flows from the 1 side of the relationship to the many side of the relationship but it does not automatically flow in the other direction.  When you use the Collie Layout Methodology like I have here, we say that filters always automatically flow downhill – they can’t automatically flow uphill. So in the image above,

  • Any filter on the Calendar table will automatically flow through the relationship to the Sales table (shown as 1 to 2 in the image above).
  • Any filter on the Products table will automatically flow through the relationship to the Sales table (3 to 4) and it will also flow to the BOM table (5 to 6).
  • But very importantly, filters will not automatically flow from the Sales table to the Products table (7 to 8), nor uphill through (9 to 10) nor (11 to 12).  The implication is that when you set up a pivot table like the one shown earlier, the Calendar table will filter the Sales table, and hence the Total Sales Qty will be correct.  But there is no automatic way for the Calendar table to filter the BOM table because filters don’t automatically flow up hill. Hence the BOM table is completely unfiltered by the Calendar table.  The result of Total BOM Qty will therefore always be the quantity of the entire BOM table – completely unfiltered by the Calendar table (it is of course filtered by the Product table).

This is an Easy Problem to Solve in Power BI Desktop

In Power BI Desktop this is an easy problem to solve – I will explain how now before going back and solving it for Excel 2010/2013.  There is a feature in Power BI Desktop called Bi-Directional Cross Filtering that will allow you to change the natural “down hill only” filtering of the relationship between tables so that it flows in both directions. As you can see in the data model below (from Power BI Desktop), I have swapped the filtering direction of relationship 2 (Products to Sales) to be bi-directional (these arrows indicate the direction of filter propagation in Power BI Desktop – which is very helpful indeed.  We can thank Rob Collie for lobbying for this UI improvement, and Microsoft for listening).

image

When you make this change, the Products table will be automatically filtered based on the entries in the Sales table – reverse “up hill” filtering.  Once the Products table is filtered by the Sales table (due to the bi-directional cross filtering behaviour), then the new filter on the Products table will automatically propagate downhill through relationship 3 shown above.  As a result of the end to end flow of cross filtering:

  1. The Calendar table filters the Sales table,
  2. The Sales table filters the Products table,
  3. The Products table filters the BOM table

The net result is that the Calendar table is now filtering the BOM table even though there is no direct relationship between these 2 tables.

When I create a Matrix in Power BI Desktop (a matrix is similar to a Pivot Table in Excel), I get the correct answers, as shown below.

image

But there is another problem. Note that Total Sales x Total BOM Qty doesn’t automatically equal Total BOM Sales at this level (eg on 8th Jan). I need to bring the BOM ID column into my Matrix so I can see exactly which BOM items sold each day.  When I do this I get a similar problem to before.  See in the Matrix below that the BOM Qty is correct for each BOM ID, but the Sales Quantity is the same for each of the BOM IDs in the Matrix – this is not correct.

image

This is an almost identical problem to the first one.  Let’s look at the data model again (below).  The Matrix above has the BOM ID column on Rows in the matrix.  This column comes from the BOM Table (shown as 1 below) and because it is on rows in the matrix, it is normal to expect that this will filter the measure [Total Sales Qty].  But remember filters automatically flow down hill, not uphill.  So the BOM ID column is not filtering the Products table (2) and hence the Products table is not filtering the Sales table based on the BOM ID.  The net result is that you get the same Total Sales Quantity regardless of the BOM ID, because the BOM ID is not filtering the Sales table.  The simple answer to this problem (in Power BI Desktop) is to change the cross filtering behaviour of the relationship (4) from single to Bi-Directional – just like before.

image

Once you make this change, you will get a fully working report that makes sense to anyone reading it.

image

How to use DAX to force the Calendar Table to filter the BOM Table

OK, now that you understand how to solve this with Bi-Directional cross filtering, hopefully you will realise what needs to be done to solve the problem in Excel 2010/2013.  Here is the data model again (shown below).  I need to force the Sales table to filter the Products table (shown as 1 below) and I also need to force the BOM table to filter the Products table (shown as 2 below).  If I can force the 2 data tables to filter the common Products table, then the Products table will do its job and pass those filters through to the other data tables automatically, hence solving the problem.  Stated another way, I want the Sales table to filter the Products table, and then the Products table will filter the BOM table automatically.  I also want to force the BOM table to filter the Products table – then the Products table will automatically filter the Sales table.  In short, I am trying to get the 2 data tables to filter the common lookup table so the common lookup table will then pass the filters on to the other data table.

image

If I were to write these formulas using “Pseudo DAX”, the 2 formulas would read like this:

Total Sales Qty=
     CALCULATE(
         SUM(Sales[Qty]),
         'Filter the Products table based on the rows in the BOM table first'
     )

Total BOM Qty=
     CALCULATE(
         SUM(BOM[Qty]),
        'Filter the Products table based on the rows in the Sales table first after applying filters from the Calendar table'
     )

So now all I need to do is find a suitable filter to replace the “Filter the Products table…  ” portion of each formula and I will achieve the outcome.  There are many ways to do this, but first I am going to show you a method using the FILTER function, and then I will show (and explain) another method using Black Magic from The Italians.

Total Sales Qty

Let’s start with this formula.

Total Sales Qty=
     CALCULATE(
        sum(Sales[Qty]),
        'Filter the Products table based on the rows in the BOM table first'
     )

How can I write a filter statement to put inside CALCULATE that will filter the Products table based on the values in the BOM?  Let me show the formula and then explain what it does.

=FILTER(
   Products,
   CALCULATE(COUNTROWS(BOM)) > 0
 )

The FILTER function is an iterator and hence it has a Row Context.  The above FILTER formula iterates over the Products table and returns a filtered table of all rows in the Products table that pass the given test.  At each iteration (ie each product) in the Products table, the CALCULATE function forces context transition (turns the row context into a filter context) and hence the BOM table is filtered for the current row in the Products table iteration.

Then the FILTER formula asks the question “Now that it’s clear that we are only talking about this one single product for this single step of the iteration process and we have filtered the BOM table to reflect this, are there currently any rows visible in the BOM table?”.  If the answer is yes, then FILTER keeps that product, if the answer is no, then FILTER discards the product.  FILTER then goes to the second product in the Products table, then CALCULATE again forces context transition for this second iteration and the BOM table is filtered so that only rows of this specific second product are visible in the BOM table, and then FILTER completes the COUNTROWS check again.  This process goes on for every product in the Product table (all those in the current filter context anyway) and then FILTER is left with a new Filtered Table of Products that contains only products that also exist in the BOM table in the current filter context.

What if you leave out the CALCULATE?

It is worth pointing out here that the following filter formula will not work.
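It is the same FILTER as above, but without the CALCULATE:

FILTER (
   Products,
   COUNTROWS ( BOM ) > 0
 )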

The problem with this second formula is that there is no CALCULATE wrapped around COUNTROWS(BOM).  FILTER is an iterator and has a Row Context.  But a Row Context does not automatically create a Filter Context.  So when the FILTER function steps through its iteration process and gets to the first Product, there is no Filter Context and hence the BOM table is not filtered by the new iteration process.  COUNTROWS(BOM) will therefore be the total number of rows in the original table in the original filter context, every product will therefore always pass the test (or always fail – depending on the initial filter context) and there will be no change to the new Filtered Products table.  The net result is the new Filtered Products table is actually identical to the original Products table – no change in filtering at all.  The formula simply doesn’t work.

Bringing the Correct Formulas Together

So putting the correct filter formula inside the CALCULATE from earlier, I end up with this formula.
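Total Sales Qty=
     CALCULATE(
         SUM(Sales[Qty]),
         FILTER(
             Products,
             CALCULATE(COUNTROWS(BOM)) > 0
         )
     )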

Total BOM Qty

Now I can just apply the same pattern to the other formula, switching out the table names.
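Total BOM Qty=
     CALCULATE(
         SUM(BOM[Qty]),
         FILTER(
             Products,
             CALCULATE(COUNTROWS(Sales)) > 0
         )
     )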

And here is the working Pivot Table, same as in Power BI Desktop earlier.  The BOM table is filtering the Sales table, and the Sales table is filtering the BOM table – Many to Many using DAX formulas in action!

[Screenshot: the working Pivot Table with the Total Sales Qty and Total BOM Qty measures]

Now for the Italian Black Magic

There is another way you can write these formulas that is simpler to write and easier to read.  Unfortunately it is also more difficult to understand how it works; Marco Russo calls it Black Magic.  Here are the 2 formulas.
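Total Sales Qty=
     CALCULATE(
         SUM(Sales[Qty]),
         BOM
     )

Total BOM Qty=
     CALCULATE(
         SUM(BOM[Qty]),
         Sales
     )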

When you compare these Black Magic formulas with the Pseudo DAX formulas I wrote earlier, you will see that I am using the BOM table as the filter expression in the first formula, and the Sales table as the filter expression in the second formula.  This doesn’t make any sense at first sight.  If filters always propagate from the one side of a relationship to the many side, how can these formulas possibly work?  This behaviour can be explained with “Expanded Tables”.

Expanded Tables

Power Pivot is built on top of some more traditional database technologies, and hence what happens inside Power Pivot can be thought of in terms of more traditional database patterns and structures behind the scenes.  In SQL terms, the relationships between the Sales table (shown as 1 below), the Calendar table (2) and the Products table (3) are:

Sales Left Outer Join Calendar (1 to 2)

Sales Left Outer Join Products (1 to 3)

[Screenshot: the data model, showing the Sales table (1), the Calendar table (2) and the Products table (3)]

If I had these tables in a relational database, I could materialise the Sales table into an Expanded Table that contains all the original Sales columns, plus the Calendar table columns and the Products table columns.  To do this I could write the following SQL query:

SELECT *
FROM Sales
LEFT JOIN Calendar ON Sales.Date = Calendar.Date
LEFT JOIN Products ON Sales.[Product ID] = Products.[Product ID]

The above query will return the following Expanded Table with columns of data coming from 3 different tables.

[Screenshot: the Expanded Table, with columns from the Sales, Calendar and Products tables]

Technically speaking, when I place the Sales table as a filter argument inside the following formula…
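Total BOM Qty=
     CALCULATE(
         SUM(BOM[Qty]),
         Sales
     )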

…I am actually placing the Expanded Table: the Sales table plus all the related columns from the tables on the one side of the relationships.

The Expanded Table will still be filtered by the current filter context.  So if there is a filter on the Calendar table (say for 3rd Jan), then the Calendar table will filter the Sales table AND the Expanded Sales Table.  If I re-run the SQL code with a filter on Calendar[Date] = ‘3 Jan 2016’, I get the new Expanded Table shown below.  The Calendar table is filtering the Sales table, and the Sales table is filtering the Products table.
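The re-run query would look something like this (the exact date literal syntax depends on your database):

SELECT *
FROM Sales
LEFT JOIN Calendar ON Sales.Date = Calendar.Date
LEFT JOIN Products ON Sales.[Product ID] = Products.[Product ID]
WHERE Calendar.Date = '3 Jan 2016'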

[Screenshot: the Expanded Table filtered to 3 Jan 2016]

So when the Sales table is used as the filter portion of the CALCULATE function, you can only “see” the Sales table, but it is actually the entire Expanded Sales Table (including the Calendar and Products columns, and any filters on all 3 of these tables) that is doing the filtering, not just the single Sales table.  Filters from all 3 tables are therefore effectively filtering the BOM table, and that is why it works.

Here are copies of my workbook files if you want to take a look.

Where to learn more

I have learnt most of my advanced DAX knowledge from The Italians (Marco Russo and Alberto Ferrari).  There is an excellent video available here where Alberto explains Many to Many relationships.  I also recommend the book “The Definitive Guide to DAX” for anyone who wants to develop a deep mastery of the DAX language.

If you are at an earlier stage of your DAX journey, you really must master filter propagation and context transition before you can move forward.  If you haven’t mastered these concepts yet, I recommend you invest some time going back over the basics and make sure you have a solid understanding of the prerequisites.  The fastest, cheapest and most effective way to do this is to read my book “Learn to Write DAX”.