# Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a Query Language and then in my second article I went in deeper to the specifics on how to use some of the more useful DAX query functions.  In this final article today I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

# Create Summary Tables

As you have may have seen in my previous articles, you can use DAX Queries to create summary tables like the one below.

This may be useful if you want to pre-calculate this summary table and have it as part of your data model (more on that covered in the “Materialisation” section below).  If you want to create a summary table for any reason you can do it using DAX Queries.

## Create Tables to Use As Filters Inside  CALCULATE

Another major use of queries is as a table input inside a CALCULATE function.  Because the virtual tables retain lineage to the data model they behave as if they were physically part of the data model.   The following measure returns the total sales for all unique combinations of product category and customer gender where the margin to those unique combinations is greater than 45%.

This measures doesn’t have any real value however it does demonstrate the concept well.

## Create Tables to Use in Iterators

You can also create table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

## To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is they allow you to “Inspect” tables that are used inside DAX Measures and Calculated Columns.  Excel users come from a world where everything thing they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX Measures or Calculated Columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise and also hard to debug the measure particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly you can insert the table portion into the measure.

## Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel Workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet (shown as 1 below) and then select Data\Existing Connections\Tables, then select any table (I have selected Products shown as 5 below) and then click Open.  *At this point you can select any table as it is going to change shortly anyway.

You will be given an option to insert a Pivot Table however you also have an option to insert a Table as shown below.  Go ahead and click OK to insert this table.

Now that you have a table inserted into a spreadsheet from the data model, it is possible to change what is returned in this table.  To change what is returned in the table, right click anywhere inside the table (shown as 1 below) and then select Table\Edit DAX.

Now you have the opportunity to change this table from being a “table” to being a DAX Query as shown below.

I have now taken a DAX Query I wrote earlier and pasted this into the expression box (shown below) to return a summary table into the spreadsheet.

You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel although I believe it can be done with VBA. That may be a topic for another day.

## Materialise Tables Into Power BI

You can also materialise DAX Query Tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only thing different to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises the table (shown as 5) from the (formula shown as 4).

# Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot is completing its task of calculating a formula under the hood.  To demonstrate the point I need to introduce an new DAX Query function called ROW.

You can see the syntax for ROW below from the DAX Studio Intellisense.

ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.

My ROW function (shown as 1 above) returns a single row table with 1 column (called “my result”) and the row has the value 29.3m (shown as 2 above) which is the total for the entire database.  Note the Server Timings button shown in 3 above.  If you click this button it will enable the capability for DAX Studio to keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new tab will appear in DAX Studio (shown as 1 below).

After executing the query, the server timings tab will show the total time taken to execute the query in milliseconds (shown as 2), how much time each engine in Power Pivot completed the tasks (Formula Engine and Storage Engine shown in 3), and if the Storage Engine Cache was used (shown in 4).

# SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

Storage Engine Formula Engine
Really Fast Really Smart
Can retrieve records without decompressing in the right conditions Iterates in a row context and hence it can be slow
Cached Not cached

I will cover the Storage Engine and Formula Engine in another post sometime.

## CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  If I write the following query in DAX Studio, it takes forever to execute and I ended up canceling the query (remember there are more than 7 million rows in this table).

The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.

The above query executed in 400 milliseconds on my PC confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

# An Introduction to DAX as a Query Language

Level: Intermediate

I have written quite a few articles that contain embedded references to DAX Queries and DAX Table functions in the past.  Today I am writing an article that introduces the DAX query language from the beginning so you can learn how to use it and how it will help you to become a Power BI superstar.

# What is a Query Language?

A query language is a programming language that allows you to send a request to a database and then receive data back in the form of a table.  If you only have a few hundred rows of data then this is not overly useful.  If on the other hand you have millions of rows of data then it becomes very useful.

The most important difference between DAX when used in Measures and Calculated Columns and DAX as a query language is that the query language always returns a table.  The resulting table will consist of one or more columns, none or more rows, and data inside none or more of the cells in the table.

# Why Do I Need a Query Language?

When you create a data model in Power BI or Power Pivot for Excel, you are essentially loading tabular data into a dedicated reporting database for the expressed purpose of analysis and/or reporting.  Once it is loaded, you (the report author) then typically spend hours of effort building out the data model so it can be used to add value to your business/job.  Finally once this is all done, the most common way to use your new data model is to build reports and pivot tables that present data to a user in a way that is easy to understand.

Below is an example of an interactive Power BI report that runs off a Power BI data model.

And here is a Pivot Table that runs off an Excel Power Pivot data model.

Both of these above visualisations are summarisations of the underlying data and in both cases you can’t actually see any of the the underlying data itself, just the summary.  The visualisation tools above are great and have enormous utility however they are generally not the best tools if you want to see the underlying data itself or if you want to extract very large sets of data for whatever reason.

Once you have invested all this effort in building your data model, you are going to want to use it to the fullest extent. Power Pivot and Power BI can handle many millions of rows of data (unlike traditional Excel) and so the “old way” of simply looking at the underlying table is not really practical in this brave new world.

# Show Me the Underlying Data!

If you take the pivot table I showed earlier, most Excel users would be aware that they can double click on any value cell (say 31.6% representing Mountain Bike sales to people aged between 30 and 40 years of age).   When you double click on a traditional pivot table in this way, something magical happens – a copy of the underlying data is extracted from the source and it magically appears in a new sheet in Excel.  Once you are done looking at the copy of the data, you can delete the new sheet and you are back where you started – life is good.

When you double click on on a traditional pivot table in this way, you are effectively sending a query to the underlying database and the query response is a table of data.

# But Does it Work With Power Pivot?

In short, yes and no depending on what you are looking for.  The experience is different if you have an Excel Power Pivot data model or if you are connected to an external SSAS data source. In my experience the results are inconsistent at best and at worst it may not give you what you expect.  But the good news is that with DAX as a query language you can easily write any query you want and extract exactly the subset of data you need.  You no longer have to rely on double clicking your pivot table to interrogate your data.

# Time To Learn DAX as a Query Language

OK, enough of the chatter, time to get serious about DAX as a query language.  You have invested time and effort in building your data model in Power BI or Power Pivot and now you rightly want to extract a subset of your data so you can take a look at it.  Let’s get started.

# First Select the Right Tool

There are 3 general approaches you can take to extract a table of data from your fabulous data model.

• Use a dedicated tool like DAX Studio or SQL Server Management Studio.  I wrote an article describing how to install and use DAX Studio here.
• Write a “New Table” in Power BI Desktop.  I wrote an article about how to use this feature here.
• Use an Excel “table hack” I learnt from Marco Russo to extract a table directly into Excel.  I will cover this hack later in this article series.

I am going to use DAX Studio in this blog article but you can use Power BI “New Table” if you want to follow along and you don’t have (or want) DAX Studio. In fact if you want to learn this stuff then you really should follow along using either approach.  In my experience you can’t learn new concepts simply by reading.  Please note however that using Power BI and Excel as a query tool should be considered as a temporary extraction of the data.  Once you have looked at the data you want, you would normally delete the table so you don’t end up storing redundant data (you can always extract it again if you need it).

All DAX queries must start with the EVALUATE statement. There is an exception however.  When you use the Power BI “New Table” button to write a query, you simply omit this EVALUATE statement.  This is no doubt another example of “Syntax Sugar” where the developers shield the user from some of the more confusing syntax.  If you are following along and testing my queries below in Power BI, then simply omit the EVALUATE statement when you write the queries.  In addition if you ever want to move your DAX Studio query to a table in Power BI, then you can just remove the EVALUATE statement at that time.

# Connecting DAX Studio to the Database

When I launch DAX Studio I am prompted to connect to my database. In this case I have an instance of Power BI Desktop with an Adventure Works data model running on my PC and DAX Studio gives me an option to connect to that.  You can also connect to Excel Power Pivot data models and Tabular SSAS servers.

If you want a more comprehensive overview of the DAX Studio UI, then read my other post I mentioned earlier.  Otherwise let’s get into it.

# Return an Existing Table

The easiest DAX query you can write is to simply return a complete copy of an existing table.

When I run this query in DAX Studio (shown as 1 below) and then press F5 to run, the entire table is returned (shown as 2).  A total of 18,484 rows of data (shown as 3).

This in itself is not overly useful as chances are you have this table already in some source system.

# List of Unique Values

In DAX there are many functions that return tables instead of scalar values.  I covered this topic in depth last week so you can read up on that here if you like.

Let me now look at the ALL function.

When I execute this table function as a query, I can actually “see” the results of the query.  In this case the query returns a single column table consisting of 4 rows – all the unique product categories.

Next let’s look at VALUES.

In this instance, the VALUES table function returns exactly the same result.

Now let’s look at DISTINCT.

And same again here.

All three of these table functions return the same table in this instance.  In reality they all do different things in different filtering scenarios.  This is one of the great things about DAX Queries is that you can “inspect” what is happening to these table functions in different scenarios.  More on that later.

# A Two Column Table

The ALL function can take 1 or more columns as an input (or a table).  In the example below I have used ALL to materialise a table of all of the combinations of Product Category and Product Sub Category.  There are 37 unique combinations in all.

# A Filtered Table

Most people that know DAX will know of the FILTER function.  The FILTER function returns a filtered copy of a table.  eg see the example below

This is a much more interesting query because it returns a table containing all the products that have sold more than \$200,000 of sales over all time.  There are 49 such products.

# Altering Filter Context

When you write a DAX measure or Calculated Column, there is only 1 way you can change the filter context and that is using the CALCULATE function.  CALCULATE returns a scalar value after applying filters.  CALCULATETABLE is the sibling of CALCULATE.  As you can probably guess, CALCULATETABLE returns a table after applying filters.

This next query returns exactly the same result as the last query.

the main difference with using CALCULATETABLE compared to just using FILTER is that CALCULATETABLE can return a different table to the FILTER function itself.

The above query uses the same FILTER function from before, but instead of returning a table containing the products that sold more than \$200,000, it returns all of the actual sales transactions that contributed to those products qualifying (see results below).

# Combining Table Functions

Now that you see how some of the standard table functions work in a query, let’s look at combining them together into a single query.

This query first applies a filter on all products that have sold more than \$50,000 (line 4 below), and then returns a list of unique product subcategories that contain these products.

One interesting thing in the query results above is that “helmets” are in the list (the only non-bike subcategory). To see what the exact products where that sold, it is simply a matter of writing another query as follows.

Note how I was able to reuse the the original query structure to return the list of product names with just a minor change to the query.

# More to Come Next Week

This is a big topic – too big for a single blog article.  This week I introduced the topic of DAX queries, I showed how you can write queries to return tables and how you can combine table functions together to interrogate data.  Next week I will introduce some new special query functions that extend the capabilities even further.

# Power BI Skills – Thinking of Data in Tables

Level: Beginners

One very important concept that Excel users often struggle with when starting on their Power BI and Power Pivot journey is that they need to learn to think about data in tables rather than thinking about data in cells.  In the spreadsheet world every cell can be a different formula or value, but in the Power BI and Power Pivot worlds, it doesn’t work this way.

# The Excel Way

The Excel way is very simple.  You load data into the spreadsheet – each cell in the spreadsheet contains one data element (one scalar value).  From there you can write a formula in a single cell (see cell E2 below).

If you want the same formula to appear many times in your spreadsheet, you copy the formula down the page. The key point here is that every cell in the above example is either a scalar value or an individual formula that returns a scalar value. That is how traditional Excel works and that is why us Excel folk are comfortable thinking about data in this way.

## But What About Excel Tables I Hear You Say?

Yes, good point.  Sometime in the past Excel was given a table object as part of it suite of tools.  Many Excel users don’t know about these and hence don’t use them.  Excel tables solve quite a few issues compared with Excel cell ranges including:

• allowing you to write a single formula, and have that formula apply to the entire column in the table
• extending the formulas in the table when there are new rows added to the table
• extending a reference (such as a pivot table source reference) so that it will point to a table (that can change size and shape) rather than pointing to a static range of cells.

Below I convert the same Excel range from the first image into an Excel table.  Note how I now only need to add a single formula to create “Sales Inc Tax”.  The single formula now applies to the entire table.

When you have an Excel table like the one above, you can then start to refer to the entire table or columns in other formulas in your spreadsheet.  See in the image below how the formula in cell G3 refers to a single column in the table.  As the table grows, the formula doesn’t need to be changed – it just works.

This use of tables in Excel as shown above is very similar to how it works in Power Pivot and Power BI.  You no longer think about clusters of cells and instead have to consider the entire table or column.

# Step 1: Thinking of Data in Tables in Power Pivot

The key things you need to know as you learn to think of data in tables in Power Pivot are:

1. Measures are written to operate over the entire data model.  You can (and normally do) get a different answer to the same formula depending on what filtering is currently applied to the columns, tables and visuals in your data model/report.
2. You can’t directly reference rows in a table. The only way to reference specific row(s) is to first “filter” the table so that the rows you want to access are “visible” (and all the rows you don’t want are “not visible”).  After the filter is applied, then the measure or calculated column (mentioned in point 1 above) will work for the filtered copy of the table (it is a little more complicated than that for calculated columns but that is beyond the scope of this article).
3. You need to learn to “imagine” what the table looks like with filters applied.  The reason you need to learn this skill is because many of the tables you will use in DAX formulas are ‘virtual tables’, that is the table is never materialised in a way that you can see it.  You need to learn to “imagine” what the table looks like in its filtered state.  I wrote a blog post at powerpivotpro.com that explains how you can use Power BI to physically materialise these virtual tables into temporary “test tables” to help you get started on your journey of being able to visualise what is happening.  Once you get the hang of it you will only need to materialise a table when you can’t work out why your formula isn’t working or if there is a specific question that is too hard to answer without looking (like question 5 that you will see later in this post).

## Here is some imagination practice for you

Imagine you have a calendar table containing every date from 1 Jan 2010 to 31 Dec 2016 as well as columns for [Day Name], [Month Name] and [Year].  It would look something like this.

This table of course would have more than 2,500 rows.  Picture this table in your mind (eg stop looking at the image above).  Now imagine what would happen if you applied a filter on the Year column to be equal to 2012 and another filter on the Month Name column to be Feb.  Once you have applied these filters in your mind, then answer the following questions by referring to your imaginary table in your mind.

1. What would the table look like now?
2. How many rows are there visible in this table with these 2 filters applied?

# Step 2: Thinking of Data in Columns

As well as mastering tables, you need to master columns.  Here are some additional questions for your imaginary table – this time the questions are about columns.

1. How many unique values now appear in the Month Name column?
2. How many unique values now appear in the Day Name column?
3. What is the minimum date that exists in the Date column?
4. What is the maximum date that exists in the Date column?
5. What is the Day Name for the last date in this filtered table?

The answers are are at the bottom of the page.  But do yourself a favour and answer each question yourself without cheating – this will help you understand how skilled you are with thinking of data in tables and columns using your mind’s eye.  In fact I don’t expect you to be able to answer number 5 without doing some research.

The reason you need to think about columns of data are two fold.

• Many of your formulas will operate over single or multiple columns.
• Power Pivot is a columnar database and it is optimised for working with columns.

As a general rule, it is more efficient to apply filters to individual columns one at a time rather than apply filters to multiple columns at the same time.  Consider the following two measures.

The second formula is much more efficient because there are 2 separate filters being applied to 2 separate columns and they are applied one at a time (in a logical AND).  This is much more efficient than asking Power Pivot to consider both columns at the same time.

Note the FILTER functions in these two measures above all return a filtered copy of the calendar table.  You can’t see the filter copy of the table and that can make it hard to understand what is happening.  But if you learn to imagine what the tables look like in their filtered state you will be well on the way to becoming a DAX super star.

# Step 3: Thinking of the Entire Data Model

The final thing you need to learn is that the entire data model operates as an enclosed ecosystem.  In the data model shown below, there are 4 lookup tables at the top of the image and 2 data tables at the bottom (from Adventure Works).

All 6 of these tables operate as a single system.  Filters applied to the lookup tables propagate downhill from the top (one side of the relationship) to the bottom (many side of the relationship).  Filters do not automatically propagate from the bottom to the top.  So once you learn to think about how filtering will affect a single table, you then need to take a further step to imagine how the filters will propagate to all other tables in the data model.

# Answers to the Quizzes Above

### Tables

1. What would the table look like now?  It would still have 4 columns but now it has only 29 rows.
2. How many rows are there visible in this table with the filters applied?  29.

### Columns

1. How many unique values appear in the Month Name column?  1 – Feb.
2. How many unique values appear in the Day Name column? 7, the days from Sun to Sat
3. What is the minimum date that exists in the Date column?   1 Feb 2012
4. What is the maximum date that exists in the Date column?  29 Feb 2012
5. What is the Day Name for the last date in this filtered table?  It is Wednesday, but this is a very hard question and too difficult to answer without either materialising the table or doing some research to check the day of week for the 29th Feb 2012.

# SQL Saturday Melbourne Feb 2017 Materials

I am currently in Melbourne for PASS SQL Saturday 2017.  SQL Saturday is an annual one day conference event that occurs in many major cities around the world.  It is focussed on helping SQL Server professionals learn more about their profession and also network with other like minded people.  I have a lot of people that are readers of my website material that are not SQL Server professionals – most likely they are Excel professionals.  But that doesn’t mean there is nothing for you at SQL Saturday.  I encourage you to look for an event in your location, read the session materials and decide if there is value for you – particularly if you are into Power Query and Power Pivot.  You can learn more at www.sqlsaturday.com/

SQL Saturday will be held in Sydney next Saturday 18th Feb 2017 and I would love to see you there.

# My Presentation: Disconnected Tables in Power Pivot

I spoke at SQL Saturday about disconnected tables in Power Pivot.  Most people know that you create relationships between tables in Power Pivot but did you know that you can load tables that are not joined (disconnected tables) and they can still add value.

# My Slide Deck

Here are the slides that I used in my presentation today for those that would like to download them.  I realise that they really just support the live demonstration, but they should have meaning for those that were there.  I have also linked to some of my other articles on 3 of the slides if you would like to read more.

SQLSat_MattAllington_DisconnectedTablesInDAX

# Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is very easy to forget how I incrementally learnt these things along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In all cases I have outlined why it is a best practice to help the understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

# Naming Conventions

## Naming your Columns and Measures

• Always write a Column in the format TableName[Column Name]
• Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write measures and columns both including the table name as follows.

• TableName[Column Name]
• TableName[Measure Name]

These 2 examples above do not cause a problem because their names makes it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula.  Take the following example

• TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.

## Give Tables a Single Noun Name

• Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimension modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.

Also PowerBI.com has a natural langauge query tool that allows you to ask quetions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

## Using Spaces in Names

• Don’t use spaces in table names
• Do use spaces in column names
• Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [Square Brackets] anyway and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces

## Don’t Overly Abbreviate Business Terms

• Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches in words in the question against the data model to help answer the question.  If you abbreviate your measure names to for example [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your name.s

## Measures or Calculated Fields

• Measures is a better name than Calculated Fields

The term measures has been a Microsoft term for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to be called “calculated fields”.  This was a bad decision and thanks to lobbying from many people Excel 2016 reverted to using the term measures (as does Power BI).  I always now use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

## Push Shaping as Close to the Source as Possible

• Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copy or re-write).  The data shaping tools are designed to allow you to do what ever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

## Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible that you can shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc).  But just because you can do these things in Power Pivot, doesn’t mean you should.   For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

## Use A Calendar Table

• If you want to any sort of time calculations, get a Calendar table

It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.

## A Star Schema is Optimal

• Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success.  Two key things you should know.

• Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
• There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

## You Should Prefer Long and Narrow Tables

• Short wide tables are generally bad for Power Pivot but long narrow tables are great.

There are 2 main reasons why loading data this way is a good idea.

• Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
• Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately.

## Only Load the Data You Need

• Load all the data you need, and nothing you don’t need.

It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot.  You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).

The trouble with doing this is that you end up with 2 copies of the data in your workbook. The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot.  In the example (shown as 3 above) there are many thousands of rows of data.  The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that you are better to have 1 workbook for your data and then import the table directly into Power Pivot without storing it in Excel at all.

# Modelling

## Avoid Bi-Directional Relationships

• Avoid using the default bi-directional relationships in Power BI unless you need them.

Microsoft introduce bi-directional filter propagation in Power BI (this is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works.  But it comes at a cost in that:

• The performance can be negatively affected
• If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

## Measures are Better than Calculated Columns

• Wherever possible you should write Measures in Preference to Calculated Columns Where Possible

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal learning about database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.

## Store Measures in the Table Where the Data Comes from

• You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure is coming from the [Extended Amount] column in the sales table.  You should therefore store the measure in the sales table.   If you place it in (say) the Calendar table, Excel will likely give you a warning similar to shown below.

This issue doesn’t occur in Power BI.

## Break Measures into Interim Parts

• Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used and make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that it will give you the % change in sales vs last year.

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] – [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

## Don’t Break Calculated Columns into Interim Parts

• It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand, they take up little space and make it easier to write your DAX.  As with measures, I t is easier to write calculated columns using interim calculated columns, however the problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and memory and hence makes it less efficient.   By all means write interim columns if you need to in order to create a calculate column (not withstanding the earlier comments of columns vs measures) however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column. This is an Excel concept I learnt from John Walkenbach.

## You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

## Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here and also read up about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.

# DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are that you may not even be clear what time intelligence is and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year” etc. If you want to do any time based comparison using DAX, Power Pivot and or Power BI, then this article explaining time intelligence is the right article for you.

# Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

• Same period prior month, quarter, year etc.
• Same period next month, quarter, year etc.
• Same period year to date compared with prior year, next year etc.
• Rolling 30 days, 60 days, 90 days, 12 months etc.
• etc. – there are many many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

# Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered more or less by the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

• Territory[Country] = “Australia” – this comes from the Rows in the Pivot
• Calendar[Year] = “2003” – this comes from the Year in the slicer
• Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer \$2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

• Territory[Country] = “Australia” – this comes from the bar chart Axis
• Calendar[Year] = “2003” – this comes from the Year in the slicer
• Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

# The Time Intelligence “Problem”

Let’s assume you want to compare total sales on a particular year vs prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

But doing it this way causes many problems, including:

• There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
• If you want to calculate the absolute change or % change year on year you will need to hard code some formulas in the cells next to the spreadsheet and hence they can’t be reused in other visuals later.
• If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

## Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy. Imagine a calendar table with a filter applied Calendar[Year] = 2003.  If you imagine what the filtered data model would look like “Under the hood”, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

## How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

1. first “detect” the current filter context to determine what the “current” selected period is
2. then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
3. then perform a time shift to find a different period in time (as specified in the formula).
4. Finally reapply filters on the data model for the time shifted period prior to calculating the final value.

# Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – Custom and Inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover Inbuilt time intelligence in depth here because they are relatively easy to use.  See link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD(Calendar[Date],[Total Sales])

Both of these formulas return the exact same result, and in fact both use the same approach to calculating the result under the hood.  The only difference is the inbuilt version is much easy to write because you (the DAX author) are protected from the full syntax.

The end result (regardless which formula is used) looks like this in a Pivot Table.

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case July 2003
2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

# How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

Line 2 (CALCULATE) is the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) is where the filtering occurs.  There are a few confusing things here.

• Line 5 refers to ALL(Calendar) instead of simply Calendar.
• Lines 6 and 7 seem to be evaluating against themselves – very confusing.
• Line 7 starts with a double ampersand &&

Let me explain each line.

## Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of calculate always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, this means that the line 5 is already filtered by the pivot table and hence the Calendar only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL Function therefore is to remove all filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now they have all be removed, it is time to add back that date filters that are needed.

## Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

• Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
• Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), then it means “go and apply a new filter on this column in the data model.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is = 2003.  Line 6 therefore says “Go and apply a new filter on Calendar[Year] to be equal to 2003.

Note that you can use any aggregation function in these formulas that does the job.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer, simply think about the need and then find the best function to use.

## Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

# The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

# The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few down sides however.  Firstly is that you don’t get to learn these very important concepts and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  Secondly the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here exceleratorbi.com.au/power-pivot-calendar-tables/

# Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot that you are using daily data by loading all data in any given week or month on the same date (eg last day of the month) and use inbuilt time intelligence however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the  path of writing custom time intelligence, you should read my advice about adding an ID column into a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/

# Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Dates]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Dates],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (shown in bold), but they also use CALCULATE.  Now you have an understanding that CALCULATE is performing a time shift, it should be much easier to understand what is happening in these formulas.  Both of these formulas in bold produce a table of dates that have been time shifted by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the calendar table and then moves back in time by 1 year before reapplying the filter context and then doing the calculation.  One you have [Sales Prior Year] it is easy to write:

Change vs Prior Year = [Total Sales] – [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

# A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14 as well as every other function across the language all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t ready done so, why not sign up for my weekly newsletters at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

# Find Duplicate Files on Your PC with Power BI

Level: Beginners

If you want to learn new skills using a new tool, then you simply must practice.  One great way to practice is to weave the new tool into you daily problem solving.  If you have something meaningful to do with the new tool, then you are much more likely to be motivated to practice.  And the new tool I am talking about of course is Power BI.

Last week I showed how easy it is to use Power BI to help you track down large files saved in Dropbox so you could manage the overall space usage.  As a result of that article, Graham Whiteman posted a comment suggesting it would be a good next step to find duplicate files.  I think that is a great idea, so I decided to test it out on my PC.  Read on to see how I did it, and how you can do it too.

# Create a Query to Fetch All PC files

I started a new Power BI Desktop file, then connected to my PC documents folder

I immediately selected Edit query as shown in 1 below.

The only time you would immediately select Load (2 above) is if the data you are imported is already in the correct shape for Power BI.

The only columns I need are the file name, date modified, attributes and path (shown below).  I Multi selected the columns I wanted to keep, then I right clicked and select “remove other columns”.

The next step was to extract the file size from the attributes list. To do this, I expanded the list of attributes (1 below), deselected all the columns and then reselected the file size (3 below).

Then I renamed the query (1 below) and changed the query so it didn’t load to Power BI by right clicking on the query and un-checking the enable load option.

This created a query that links to the PC, keeps the columns of data needed but didn’t load anything to Power BI yet.

# Create a New Query that Accesses the File List

The next step was to create a new query that references the File List.  I right clicked on the first query (1 below) and then selected Reference (2 below).  Note how the File List query is shown in Italics indicating that it won’t load to Power BI.

The next step was to merge this data with the itself by going to the Home Ribbon and selecting Merge Queries.

.

In the Merge Queries dialog, I joined the list of files File List (2) with the original query File List so that it was joined to itself on 3 columns (the File Name, Modify Date and File Size) but not the File Path as shown below.

The above steps added a new column to the query.  I then expanded the new column as shown below making sure to keep the original column name prefix.

# Find The Duplicate Files

The second query now looked like this.  As you can see in the image below, the query returned all the files (name column) along with the folder paths from the query “File List” shown as 1, and a second column containing the folder paths from the query “File List (2)” shown as 2 below.

The next step was to get rid of all rows in this query where the 2 folder paths are identical.  Doing this is easy with a custom column.  I added a custom column (steps 1 and 2), and wrote a formula to return TRUE if the 2 folder paths were identical.

I then filtered out everything that returned a TRUE in the new column using the filter button as shown below.

I then deleted this custom column as it was no longer needed.  I just right clicked and selected remove.

## Format the Number Columns

It is very important in Power BI to set the number formats before loading the data.  Any numeric column that has a data type “Any” should be changed to a suitable numeric format (as shown below).

I did this, renamed the query to be called “Duplicates” and then selected  “Close and Load” to get the data into Power BI.

# Time to Write Some DAX

Now the data is loaded, you of course I could just drag the one or more of the columns to the Power BI canvas.  But remember half the reason of doing this is to get some new skills.  So instead of dragging the Size column and creating an implicit measure, I wrote some DAX – it isn’t hard to get started with such simple formulas.  Here’s how to do it.

Select the Size column, go to the Modelling Ribbon and select New Measure.

The formula I wrote is as follows

File Size MB = sum(Duplicates[Size])/(1024 * 1024)

Note a few things that I was able to do by writing this measure myself

1. I converted the units of the result from bytes to megabytes by dividing by (1024 x 1024).
2. I gave the measure a more meaningful name “File Size MB”
3. I was able to set the formatting to comma separated with 1 decimal place

And of course I practiced my DAX.

# And the Results

I simply then added the file size, File Name, Folder Path and Second Folder Path to a table in Power BI like shown below.

I then discovered I had around 9 GB of duplicate files on my PC.  I sorted the table by File Size descending and discovered that I had multiple identical copies of a contoso.pbix.  It looks above like there are 6 copies of contoso.pbix but this is deceiving. Every copy of a file will find a match with every other copy.  If  you note in the Folder Path column, there are only 3 unique folder paths, hence 3 files.

The next thing I did was add a Tree Map as shown, with the file name in the Group section and File Size MB in the Values section.

To find out accurately how many copies of each file there were, I had to write some more DAX.  This formula is a bit more involved (intermediate DAX).

Let me explain this formula starting from the inside out.  There are 4 functions in this DAX formula and I describe their role below.

1. SUMX is an iterator.  It iterates over a table specified as the first parameter (VALUES in this case).  You can read more about SUMX here.
2. The VALUES function returns a table of unique file names (in this case it is unique values in the column Duplicates[Name]).  So SUMX above will iterate over each file name in the name column.
3. SUMX is iterating over a Virtual Table (VALUES).  The CALCULATE is required to force context transition.
4. Then for each file name in the table (in 2 above), DISTINCTCOUNT will count how many unique folder names there are.

I then added the new File Count measure to the Colour Saturation section of the Tree Map (1 below).  This does 2 things.  Firstly it shows the high folder count files as being a darker colour, and secondly it adds the file count to the tool tips (visible when you hover the mouse over the visual).

# And Now Some Fun

I’ve been looking for an excuse to do this for some time.  I want to find the fattest fish in my pond (aka most space taken by file name).  I went to visuals.powerbi.com and downloaded the Enlighten Aquarium custom visual.

app.powerbi.com/visuals/show/Aquarium1442671919391

I then imported the custom visual into Power BI Desktop

The I copied my Tree Map visual (Ctrl-c, Ctrl-v), selected the copy and changed the visualisation to be the Aquarium.  This visual is showing the largest individual files regardless of location or how many copies.  I am not saying this is the best way to visualise data, but surely it is one of the most creative.

Here is my final workbook canvas

For the purists out there, I wrote a new file size formula as follows.

The original formula I wrote double counts the file size when there are multiple duplicates.  The above formula is almost identical to the File Count I explained above.  The only difference really is the inclusion of MAX(Duplicates[Size]).  This is a “trick” to handle the fact that for each file name there will be multiple records in the data model.  Each file will have the exact same file size, so by selecting MAX I simply get to access the file size.  I could have used any other aggregator (eg Min, Avg, Sum) and got the same outcome.

I haven’t shared the actual workbook here. The whole idea is for you do try this yourself so you get 3 benefits; more disk space, some practice with Power BI Desktop and have some fun.

# Use Power Query to Manage Dropbox Space

Level: Beginners

I got this dreaded Dropbox email recently as shown below.

I needed to clear out some of the files I have loaded in Dropbox so I didn’t have to upgrade my account.  It occurred to me that I could make this process a lot easier by using Power BI to quickly show me where my big files were located in Dropbox.  This post today explains how I did it. What I ended up with is a report like this that allowed me drill down on the large sub folders to easily find my big files.

Note, there is a great tool called WinDirStat that you can download here that does this too – I use WinDirStat all the time. But I never want to miss an opportunity to do something with Power BI.

# Process to Build the “File Space Usage” Tool

First I created a new Power BI report and connected it to my Dropbox folder.

You can of course use the same process on any other folder on your computer, or even the entire Hard Disk if you want.

I then imported the columns I thought would be of use, and loaded them into the data model.

I figured the Hidden flag and Date Accessed might be useful at some stage, so I brought those in too.

I then wrote some measures that I thought would be useful.

I encourage you to write your own measures rather than use the implicit measures created when you drag a column of values to the report.  By writing your own measures, you “Learn” how to write DAX and that will help you become a Power BI ninja.

The last thing I did was to create a report that made it easy to see where my big files were located and find out what they were.

I have loaded a short 4 minute video that shows how quick and easy it is to do this from scratch.

# What Obtuse uses have you found for Power BI?

I would love to hear from others about how they are using Power BI in ways that are not immediately obvious.

# Who Needs Power Pivot, Power Query and Power BI Anyway?

Level: Beginners

One of the great challenges Microsoft has faced with its “new” suite of Self Service BI tools (particularly Power Pivot) is that most people that could benefit from the holy trinity (Power Pivot, Power Query and Power BI) don’t even know these tools exist, let alone how the tools can help them succeed in their jobs.  The situation is definitely getting better as Power BI starts to get a presence in the market place, however I still talk to people who have heard of Power BI, but have no idea what Power Pivot or Power Query are, and what’s more they don’t know why they should care.  I personally believe a big part of the awareness problem is that Microsoft insists on shipping Microsoft Excel with the Power Pivot plugin disabled.  There is no reference to Power Pivot when you excitedly receive your brand spanking new version of Excel 2016 – what a marketing opportunity gone begging!

I have been an Excel nerd for 30 years.  There is nothing I used to enjoy more than installing a new version of Excel, and clicking through every menu item to find something shiny and new that would make my life easier.  By not turning on the Power Pivot menu by default, Microsoft is missing one of the best silent selling opportunities is has for this fabulous addition to Excel.

Given there is no “pull through” on the menus, many people don’t know what these products are or why they should care.  I am often asked by people “who can benefit from these tools?”.  This post sets out to explain who can benefit and why.  Note when I say “who can benefit”, I am not talking about “consumers of reports” here, I am talking about “authors of reports”.  It is clear that people that consume quality reports and analysis will benefit, whatever the tool.  This article is focused on the benefits to those people that are responsible for building the reports and analysis that others will end up consuming.

# Power BI

Who can benefit from Power BI is probably the easiest to understand.  The product is well marketed and has a clear role to play.   Power BI is a complete self service BI tool.  It is designed to bring business intelligence capabilities to the masses instead of the elite (e.g. instead of just highly skilled IT MDX programmers).  Rob Collie wrote a good article last week about the democratisation of BI tools.  Power BI will add value to people who:

1. Have problems sharing reports with others because the file size is too large.
2. Need to share data with people on the go, that maybe only have a Tablet or a Mobile phone.
3. Have large data sets that can’t be managed in traditional Excel.
4. Are spending too much time each week/month manually maintaining reports with new source data and/or new visualisation requests.
5. Can’t get the (timely) support they need from their IT department using traditional Enterprise BI tools.

Power BI is great because it puts capabilities across the end to end BI stack into the hands of end users (authors), including:

1. Extraction of data from the source (using the Power Query engine)
2. Transformation of that data into a new shape that is optimum for BI reporting and analytics (Power Query again).
3. Data modelling capabilities, so you can turn the raw data into business meaningful insights (using the Power Pivot engine).
4. A reporting and analytics front end allowing you to build reports to visualise your data (Power BI Desktop and Power BI Service).
5. A fully integrated cloud based delivery framework so you can easily share with anyone over the internet (Power BI Service).
6. A full set of Mobile applications across the major operating systems (Power BI Mobile).

Notice how steps 1 and 2 use Power Query, and step 3 uses Power Pivot.  So if you want to learn about Power BI, you really need to learn about Power Pivot and Power Query too.

# Power Pivot

Conversely, Power Pivot is the hardest to understand – I.e. it is the hardest for individuals (potential authors) to understand “what does Power Pivot do for me and why do I need it?”.  I have had people enquire about Power BI training courses that have not been interested in Power Pivot or DAX*. But the truth is, if you want be able to write your own reports in Power BI, you really need to learn at least some basic Power Pivot skills.

Power Pivot is a data modelling tool.  It is the underlying reporting engine that enables Power BI and Modern Excel to delivery those modern funky reports that can help you succeed in business.  The Power Pivot engine allows you (the report author) to take your business knowledge and to configure the reporting tools so that Power BI and Excel Pivot tables can be used to find and report on insights in your data.

Most business users have never heard of the term “Data Modelling” before, and the reason for this is quite simple – it has always been the IT department that has been responsible for data modelling.  Power Pivot brings the power of data modelling and puts it in the hands of competent business/Excel users.    An example of data modelling will make it easier to understand.

## Example of Data Modelling

You can see the sell price and the cost price information exists in the table above.  But there is nothing about the \$ Margin per product and nothing about the % Margin, let alone insights like Sales Year to Date, Top 3 selling products, Fastest growing product etc.  In a traditional Excel world you would simply write formulas in your spreadsheet(s) to enhance this raw data and extract the additional insights.  The trouble with this approach is that all these formulas are only ever “one off”.  You write them for this report, and then when you create another report later, you have to write the formulas again.  Power Pivot handles this problem in a different way.  Power Pivot is a data modelling tool, and instead of writing formulas in your final reports, you write them “under the hood directly in the Power Pivot engine” as part of the data modelYou build the “rules” that describe how to calculate \$ Margin and % Margin ONCE and only once directly in the Power Pivot engine.  Once you have created the rules for each insight (e.g. \$ Margin, % Margin etc), it is forever available for you (and others) to use over and over again in any report, chart, Pivot Table or any other visualisation you can think of.  Never again to you have to write/copy a new formula every time you create a new report.

Data Analysis Expressions (DAX) is the formula language of Power Pivot.  DAX is very similar to the Excel formula language yet there are also a lot of differences that you will have to learn from scratch.  Simply put, if you want to learn Power Pivot (and/or Power BI), you will need to learn at least some DAX as well as lots of other things about how the Power Pivot engine works.

## Who needs Power Pivot and why?

People that need this tool are typically Microsoft Excel users/report authors that analyse or report on data, particularly when the following conditions apply:

1. There are lots of data that stretch the capacity of traditional Excel (file size, re-calculation speed etc).
2. If you use Pivot Tables a lot to analyse your data.
3. If you are writing a lot of VLOOKUP (or INDEX/MATCH) formulas to join data from different tables so you can analyse in a Pivot Table.
4. If you have had to create bespoke reports in traditional Excel, but then have to spend hours rebuilding the report when asked for a different view of the data (e.g. you have a report that shows sales by half year, and then you are asked to produce the same report by month – and have to start again almost from scratch).
5. Anyone that wants to start using Power BI for anything more than the most simple aggregation reports.  The “out of the box” capabilities of data modelling in Power BI are limited, and you will quickly realise that you need to learn some new skills (Power Pivot/DAX) to be able to leverage the strengths of Power BI.

# Power Query

Power Query is a user friendly ETL (Extract, Transform, Load) tool.  Unfortunately Microsoft muddied the waters by renaming Power Query “Get and Transform” in Excel 2016, and “Get Data” in Power BI Desktop.  That aside, Power Query is used to:

1. Extract:  Connect directly to the data source and ingest the data (into Power BI or Excel) so you can work with it.  It retains a connection to the source so when the source data is updated you can automatically “refresh” the data without having to go through the import process again from scratch. It is amazing.
2. Transform:  You can clean and reshape the data so that by the time you are ready to use it, it already is in the format you need instead of the format you are given.  Again, you do this only once, and then you can automatically refresh when the data is updated later.
3. Load the data directly to the place where you want to use it.  The end location for the data will typically be inside Power Pivot or Power BI, but it is also an invaluable tool for use with traditional Excel.

Don’t think of Power Query as simply a tool for self service BI reporting.  I have used Power Query to automatically audit information (eg XLSX files, csv extracts from other systems etc) and produce exception reports directly in Excel.  Once you understand what Power Query can do, you will start to realise all the ways you could use it to make your life easier.

## Who needs Power Query and why?

People that need this tool typically are trying to solve the following problems.

1. They regularly receive file extracts from someone (or some system) and need to manually manipulate this data before they can use it.
2. They want to use Power BI/Power Pivot but can’t get the IT department to shape the data they need prior to loading it into Power BI/Power Pivot.
3. People that are doing repetitive tasks such as matching invoices on a remittance advice against a bank statement (or similar) – Power Query eats this kind of work.
4. Anyone that is given data in the wrong shape (example below).  And yes I quickly used Power Query to un-pivot the data in the blue table so it looked like the green table.  It is worth getting and using Power Query just to get this un-pivot feature alone!

# Measures on Rows – Here is How I did it

Level: Intermediate

You may or may not be aware that it is not possible to put Measures on rows in a Matrix in Power BI. But I came up with a trick that makes it possible, so read on to find out how.

# Measures Can Only be Placed on Columns

First the problem. The only way that you can use the Power BI Matrix visualisation (at this writing) is to place the measures on the columns as shown below.  You can then take any column of data from your data model (typically from a Dimension/Lookup table) and place that on rows in the Matrix.  So you end up with this.

This limitation does not exist in an Excel Pivot Table.  As you can see below, it is possible to flip values between columns on rows from the Pivot Table Fields List.

# Old tricks for New Purposes

When I was thinking through this problem, my first intuition was to use a Switch statement somehow.  Switch statements can be used to create a Switch Measure (like the ones I used in this blog post here back in 2014) and I figured this could be part of the solution.  And then I remembered another trick I learnt from Rob Collie using disconnected tables to feed a measure – I new I had a solution.

So in summary, I wrote a Switch measure that will morph into different measures when filtered, and then used a disconnected table to pass a filter to that measure.

## Write a Switch Measure

The Switch measure is pretty easy to write as follows:

The way this measure works is that it takes an input in the form of an integer, and then depending on that number, it will return the value of the appropriate measure as the result.

## Then Create a Helper Table

The next step is the secret sauce to this trick.  I created a disconnected table with the sole purpose of passing the correct filter values (the integers) to the Switch measure.

I used the “Enter Data” feature in Power BI to create a list of the measures I want to put on rows in the Matrix.

Note that the ID column is also the sort order of the measures, and the ID column also matches with the Switch measure numbering scheme.  Once the table is created, I added it to the data model as a disconnected table – no need to join it to any other tables in the data model.

I then set the sort order of the Measure column as follows:

## Write a Harvester Measure

The next step is to write a harvester measure to extract the selected filtered value of the table.  This measure is simply as follows:

The measure above will return the largest integer in the current filter context.  If there is a filter (eg a slicer) on the Measure column and the user as selected “Total Margin” in the slicer, then there is only a single row visible in the filter context, and hence the MAX of the visible rows is 3.  I would of course get the same result if I used MIN, AVERAGE, SUM or indeed even VALUES in my harvester measure.

Filter context is one of the hardest things for new users to understand properly.  I explain filter context in detail (and everything else you need to know to be great at using Power Pivot and Power BI) in my book “Learn to Write DAX”.

You can see the harvester measure in action below.  I have placed the column from the MeasureTable into a slicer and the harvester measure into a Matrix.  When I click on the slicer, the harvester measure updates to show the ID of the selected filter.

## Add the Measure Column to the Matrix

There is more than 1 way to filter a table of course.  In the image above I am using a slicer, but I can also use the rows on the Matrix to provide filter context.  When I place the Measure column from the MeasureTable onto rows in the Matrix, you can see below that the harvester measure is correctly returning the value of the ID column to the visualisation.

## Pulling it All Together

In order to pull this all together, I replaced the harvester measure with the Switch measure I wrote earlier, and put the Categories column from the Products table onto columns in the Matrix.

But there are now 2 problems.  The first problem is that the Total Invoices are shown in currency format.  One limitation of this trick is that all the measures must have the same number formatting.  If you would like to see an optional “Alternative Format” for the SWITCH measure, please vote for that idea here.

The second issue is that the total row is actually showing the total invoices and not the total of all the measures.  This makes sense of course because the grand total row is not filtered.  As a result, the MAX of ID is 4 and hence the SWITCH measure is returning the value of the measure that maps to ID 4 (Total Invoices).

To fix these 2 problems, I removed Total Invoices from my table and rewrote the SWITCH statement to correctly handle the grand total row.

And there you have it, Measures on Rows in a Power BI Matrix.

# LASTNONBLANK Explained

Level: Intermediate

Last week at my Sydney training course, one of the students asked me a question about LASTNONBLANK.  This reminded me what a trickily deceptive function LASTNONBLANK is.  It sounds like an easy DAX formula to understand, right?  It just finds the last non blank value in a column – easy right?  Well it is a little bit trickier than that, and I am going to explain it all in this post today.

LASTNONBLANK has a sibling FIRSTNONBLANK that operates in exactly the same way but in reverse.  Given the behaviour is the same, I won’t cover FIRSTNONBLANK at all, as you can work it out after reading about LASTNONBLANK.

# Syntax

The syntax of LASTNONBLANK is as follows.

LASTNONBLANK(Table[Column],<expression>)

It takes 2 parameters, 1) a column and 2) an expression.  In fact you can also pass a single column table as the first parameter in as well.  This could be useful for example if you wanted to create your own table on the fly using a function that returns a table, and then use that as the first parameter.

## LASTNONBLANK is an Iterator

It is not immediately obvious, but LASTNONBLANK is actually an iterator,  It iterates over the column specified in the first parameter and finds (unsurprisingly) the last value in that column that is not blank.  (Technically the engine carries out this task in reverse natural sort order for efficiency).

## But what is that pesky second parameter?

The first thing people normally find confusing about LASTNONBLANK is that pesky second parameter.  If all you want to do is find the last non blank value in a column, why do you need this second parameter?  Well in short, you don’t “need” it in that use case (except that it is mandatory).  But having this parameter makes the formula much more powerful, so it is good to have it as an option. It is designed to allow you to write a DAX expression such as a measure to test for some condition or value.  Ideally (in my view) Microsoft should have made this an optional parameter, but that is not what we got.  But there is an easy work around that allows you to make this parameter optional  – just use the value 1 as the second parameter, like this.

LASTNONBLANK iterates through the Table[Column] and then checks to see if the second parameter has a value.  The number 1 always has a value of course, so placing 1 as the second parameter has the same effect as just ignoring this parameter.  Instead the formula will just return the last non blank value it finds in the column.

## LASTNONBLANK actually returns a TABLE

Another thing that is not immediately obvious is that LASTNONBLANK actually returns a TABLE.  It is a special single column, single row table that by definition can only have 1 possible value (because it has only 1 row and 1 column).  One feature of this single row, single column table is that you can use it as both a scalar value or a Table in DAX.  More on that later too.

## Test Data

My test data for this blog post is monthly bank account balances for 2 fake bank accounts.  Measures that calculate account balances in DAX are often described as “semi-additive measures”, because you can’t just add up the values from each month to get the current balance – you need to find the latest balance to work out how much you have.  This type of data is a prime candidate to use the LASTNONBLANK formula, because LASTNONBLANK does exactly what we need to work with semi-additive measures.

I have set up the following data with some interesting features to demonstrate the behaviour of LASTNONBLANK.

It is easiest to see what I have done with this test data in the pivot table below.  I have written the following measure to demonstrate the point.

Note the above measure is a test measure only to help with this blog – it doesn’t really make any sense as is, but it is useful for describing what is happening.  When the above measure is placed in a pivot table, it looks like this (shown below).

1. Kathy’s account doesn’t have a result for May
2. Matt’s account doesn’t have a result for July
3. The maximum value for Kathy’s account is in June (the previous month’s data load)
4. The maximum value for Matt’s account is back in May.

I have set the data up this way to demonstrate the behaviour of LASTNONBLANK.  I often talk about how important it is to set up test data that will flush out issues when you write your formulas.  This is a good example of that, and it will make more sense as you read on.

## Last value in a Column

Now the objective is to use LASTNONBLANK to find the last value in a column.

Consider the following formula.

This formula finds the last date in the data table (note it is not the last date in the Calendar table, but the data table).

The following pivot table has the Month name from the calendar table and the account names on pivot table columns.

Note that the formula correctly indicates that May data is missing for Kathy, and July data is missing for Matt.  Also note that it correctly gives the last date in the Grand Total of the pivot table.  (You would also get the same result as above if you used LASTDATE instead of LASTNONBLANK).  I have only used LASTNONBLANK here to demonstrate the behaviour.

## LASTNONBLANK operates over a sorted column

The [Last Non Blank Date] measure above hides some complexities about LASTNONBLANK.  You might expect that LASTNONBLANK finds the last value in the column, but that is not how it works.  It actually finds the last value in a sorted column using the natural sort behaviour of the column data type.  In the case of a date column (like shown above), then everything is sweet – the natural sort order is also the order we normally load the data (chronological order).  But in the case of the balance column, the natural sort order of a numeric column is numerical sort order, not the order the data is loaded (as you can see in the following pivot)

I have written another test measure as follows

When placed in the pivot table, you get the following behaviour

In the above example, you may expect the Grand Total row to return the values 2,125 for Kathy and 1,557 for Matt as these are the last values you loaded, but that is not how it works.  When looking at the individual rows in the pivot table, it all works fine – it correctly finds the last balance for each month.  This measure works correctly on the rows in the pivot table because the rows provide “initial filter context” prior to the measure being evaluated.  So there is only ever 1 row in the data table at the time the measure is evaluated, and that is why it works in this case.  But in the Grand Total row in the pivot table, there is no initial filter context applied to the date, hence all values in the column are iterated over by LASTNONBLANK.  The iteration operates over a sorted version of the balance column (the natural sort order of the column, which is numeric in this case).  Once the balance column is sorted, then 2,200 will be the last value in the column for Kathy, and 1,806 will be the last value in the column for Matt, and that is the result that is returned.

## LASTNONBLANK as a TABLE

In this next test measure, I have used LASTNONBLANK as a filter input into a CALCULATE function

You should remember that CALCULATE can use a simple filter such as Table[Column] = “some value” or it can take an advanced filter where you must pass a TABLE as the filter.  The measure above therefore suggests that LASTNONBLANK must be a table, not a scalar value.  This theory can be tested by firing up DAX Studio and executing the LASTNONBLANK portion of the formula as a query.  DAX Studio ONLY ever returns tables, it can’t return scalar values, so if the query works, it confirms that LASTNONBLANK returns a table. Note: you can read more about using DAX Studio as a tool to help you learn DAX at this blog post here.

You can see above that LASTNONBLANK returns a single column, single row table that contains a single value.  It is a special table because it can also be used as a scalar value, in the same way that you can use VALUES() as a scalar value if there is only a single row returned.  In the case of the measure [Latest Balance with 1] above, I am using LASTNONBLANK as a table parameter filter inside CALCULATE.

## But how does this table act as a filter?

One feature of temporary tables in DAX is they retain a link to the data model, and filter propagation will work between the temporary table and the rest of the data model.  You should visualise this in your mind like the image below.  Imagine a new table that has spawned into the data model and has a relationship to the table where it came from (in this case it is the data table).  The filter propagation flows from the 1 side of the relationship (temporary table imagined as shown in 1 below) to the many side of the relationship (table 2 below).

So you can see above that the LASTNONBLANK function produces a table that then filters the data model based on the single value it returns in the table.

When I put this measure in a pivot table, I get this result

Note that the measure is correctly returning the last value for the account “Kathy” but it is returning a blank for account “Matt”.  Technically this is correct because the calendar has dates in July and there is no entry for Matt for July, so the last value is blank.

## Enter the second parameter

Now it is time to change the second parameter with the following measure.

When this last measure is added to a pivot table, it works as desired, correctly returning the last non blank value from the data table based on the chronological order of the data.

This new LASTNONBLANK function measure returns the last non blank value it finds in the column provided it also returns a non-blank result for the measure [Sum of Balance].

It is worth pointing out here that the ONLY REASON this formula works is because the measure [Sum of Balance] has an implicit CALCULATE wrapped around it – you can’t see it, but it is there.  LASTNONBLANK is an iterator, and like all iterators, it has a row context but doesn’t have a filter context.  Therefore the following formula will not work

But this next measure does work (this next formula is the equivalent of the one that uses [Sum of Balance] because [Sum of Balance] has an implicit CALCULATE

I cover evaluation context and context transition in detail in my book, Learn to Write DAX.