Archive for DAX Studio

Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a Query Language and then in my second article I went in deeper to the specifics on how to use some of the more useful DAX query functions.  In this final article today I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

Create Summary Tables

As you have may have seen in my previous articles, you can use DAX Queries to create summary tables like the one below.

This may be useful if you want to pre-calculate this summary table and have it as part of your data model (more on that covered in the “Materialisation” section below).  If you want to create a summary table for any reason you can do it using DAX Queries.

Create Tables to Use As Filters Inside  CALCULATE

Another major use of queries is as a table input inside a CALCULATE function.  Because the virtual tables retain lineage to the data model they behave as if they were physically part of the data model.   The following measure returns the total sales for all unique combinations of product category and customer gender where the margin to those unique combinations is greater than 45%.

This measures doesn’t have any real value however it does demonstrate the concept well.

Create Tables to Use in Iterators

You can also create table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is they allow you to “Inspect” tables that are used inside DAX Measures and Calculated Columns.  Excel users come from a world where everything thing they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX Measures or Calculated Columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise and also hard to debug the measure particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly you can insert the table portion into the measure.

Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel Workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet (shown as 1 below) and then select Data\Existing Connections\Tables, then select any table (I have selected Products shown as 5 below) and then click Open.  *At this point you can select any table as it is going to change shortly anyway.

image_thumb[38]

You will be given an option to insert a Pivot Table however you also have an option to insert a Table as shown below.  Go ahead and click OK to insert this table.

image_thumb[40]

Now that you have a table inserted into a spreadsheet from the data model, it is possible to change what is returned in this table.  To change what is returned in the table, right click anywhere inside the table (shown as 1 below) and then select Table\Edit DAX.

image_thumb[42]

Now you have the opportunity to change this table from being a “table” to being a DAX Query as shown below.

image_thumb[43]

I have now taken a DAX Query I wrote earlier and pasted this into the expression box (shown below) to return a summary table into the spreadsheet.

image_thumb[44]

You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel although I believe it can be done with VBA. That may be a topic for another day.

Materialise Tables Into Power BI

You can also materialise DAX Query Tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only thing different to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises the table (shown as 5) from the (formula shown as 4).

image_thumb[45]

Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot is completing its task of calculating a formula under the hood.  To demonstrate the point I need to introduce an new DAX Query function called ROW.

You can see the syntax for ROW below from the DAX Studio Intellisense.

image_thumb

ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.

image_thumb5

My ROW function (shown as 1 above) returns a single row table with 1 column (called “my result”) and the row has the value 29.3m (shown as 2 above) which is the total for the entire database.  Note the Server Timings button shown in 3 above.  If you click this button it will enable the capability for DAX Studio to keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new tab will appear in DAX Studio (shown as 1 below).

image_thumb4

After executing the query, the server timings tab will show the total time taken to execute the query in milliseconds (shown as 2), how much time each engine in Power Pivot completed the tasks (Formula Engine and Storage Engine shown in 3), and if the Storage Engine Cache was used (shown in 4).

SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

Storage Engine Formula Engine
Really Fast Really Smart
Can retrieve records without decompressing in the right conditions Iterates in a row context and hence it can be slow
Multi threaded Single threaded
Cached Not cached

I will cover the Storage Engine and Formula Engine in another post sometime.

CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  If I write the following query in DAX Studio, it takes forever to execute and I ended up canceling the query (remember there are more than 7 million rows in this table).

image_thumb[46]

The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.

image_thumb[47]

The above query executed in 400 milliseconds on my PC confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

DAX as a Query Language Part 2

In my last article I introduced DAX as a Query Language and explained why learning to query your data model can be useful.  I covered the basics about using a tool like DAX Studio to query a database and return a table of data as a result.  In this article I am going to go deeper and introduce the more common and useful query functions including CROSSJOIN, SUMMARIZE and ADDCOLUMNS (there are more functions that I don’t cover in this article).

Here is the link to the PBIX data model again if you want to follow along and do the examples yourself.  I am using DAX Studio again to connect to Power BI Desktop running on my location machine.  If you take the time to complete these examples you will significantly increase your learning and knowledge retention.

Lineage

Before I move on, an interesting and important fact about Power Pivot (when compared to traditional database tools) is the concept of lineage (pronounced LIN-E-AGE).  When a new virtual table is created in a query or a formula in Power Pivot, the new table will include an automatic one to many relationship to the table from where it was created.  Consider the following simple table from last time.

image

You can see above that this query produces a single column table of all of the unique product categories.  This new virtual table retains lineage to the data model.  In this case the new virtual table comes from the Products table and hence the new virtual table has lineage to the Products table.  You can “imagine” this as shown below with the new table having a 1 to many relationship to the Products table.

.image

Note the image above is just a visual simulation of what happens.  The virtual table is not materialised and you cannot actually see it in the relationship view. But the virtual table does exist (virtually) and the lineage to the Products table also exists – it is just that you can’t actually see it.  I recommend that you learn to “imagine” this happening in the data model in your mind as it will help you understand how the new virtual table interacts with the rest of the data model especially as it relates to context transition.

All virtual tables have lineage to the tables from where they came from.

CROSSJOIN

CROSSJOIN is a function that can create a new table from 2 or more source tables. For this example I am going to join some virtual tables. The first virtual table is VALUES(Product[Category]) which of course returns a list of all the unique product categories.

image

The second virtual table is a list of all possible Customer genders

image

Next I am going to use CROSSJOIN to create a new table containing all the unique combinations of both tables.

image

In the table above there are 4 rows x 2 rows giving a total of 8 rows of all of the unique values.  Continuing on the concept of “imagining” the way these new virtual tables are related in the data model, it would look something like this:

image

Remember this is just a simulation of what it looks like.  These tables at the top are not materialised and you cannot see them in the data model.  But you can “imagine” them as looking like this and they behave in exactly the same way as they would if they were physical tables.

M x N Can Mean a Large Table

You need to be careful with CROSSJOIN as by definition the resulting table will be m x n rows long where m is the number of rows in table 1 and n is the number of rows in table 2.  If I were to CROSSJOIN the customer table (18,484 rows) with the Products table (397 rows) I would end up with more than 7 million rows.  This in itself is not a problem for Power Pivot to create such a large table in memory, but it can definitely be a problem if you try to materialise the table.  More on that next week.

SUMMARIZE

SUMMARIZE is by far my favourite DAX Query function.  SUMMARIZE can do similar things to CROSSJOIN however CROSSJOIN can join tables that do not have relationships whereas SUMMARIZE can only join tables that are related with a many to 1 relationship.

SUMMARIZE first takes a table and then one or more columns (that can be reached via a many to 1 relationship) that you want to include in the new summarised table.

SUMMARIZE(<table>, table[column], table2[column],….)

Here is an example.

image

The query results above are similar to the CROSSJOIN query from before but there is one important difference.    SUMMARIZE will only return rows that actually exist in the data itself (Note there are only 6 rows above compared with 8 rows in the CROSSJOIN example).

Consider the relevant tables from the data model below.

image

Here is the SUMMARIZE formula written earlier.

This query starts with the Sales table and then adds the Products[Category] column from the Products table and the Customers[Gender] column from the Customers table.  The 2 columns specified inside the SUMMARIZE formula come from tables on the 1 side of the many to 1 relationships – this is allowed.

The following is not allowed and will not work.

EVALUATE
SUMMARIZE(Products,Sales[CustomerKey])

It doesn’t work because the column Sales[CustomerKey] cannot be reached from the Products table via a many to 1 relationship.

It is also possible to write a SUMMARIZE statement over any single table.  In the example below, the SUMMARIZE statement returns a list of all possible combinations of product category and colour.

image

You could also achieve the same result with the ALL function (which would be an easier solution if you are just using a single table)

Adding Summary Sales to the Summary Table

So far the SUMMARIZE queries above are just lists of valid combinations.  It is time to do something more interesting and add the [Total Sales] to these summary tables.  Before moving on please note that the following formulas are not best practice – there is a better way which I will cover later.

Consider the following formula

Note specifically that the table parameter in this formula is “Products”.  Also note below that this formula returns blank rows (shown below).

image

This summarize statement correctly summarises all the combinations of Product[Category] and Product[Color] in the products table and then for those products where there are sales, those sales are shown next to the unique combination.  But in some cases the unique combination doesn’t actually have any sales, hence the blank rows.

Using Sales as the Table Parameter

If I change the formula above and swap out the Products table with the Sales table, then the blank rows are no longer visible (see below).

image

SUMMARIZE will always find the unique combinations that actually exist in the selected data.  Because this new formula starts from the Sales table, only combinations of Product[Category] and Product[Color] where there are actual sales are returned.

Context Transition or No Context Transition?

Those of you that are familiar with the concept of context transition may be thinking that context transition is occurring here.  That is a valid thing to assume but this is not what is happening here.  Consider the following formula.

image

Note how I have swapped out the measure [Total Sales] with SUM(Sales[ExtendedAmount]).  With this new formula above there is no CALCULATE forcing context transition yet despite this the table still returns the same result.  This implies that SUMMARIZE does not operate in a Row Context.  In fact SUMMARIZE is a Vertipaq Storage Engine operation.  The part that produces the valid combinations of columns is very efficient however the calculation of the total sales figures are very inefficient.  For this reason it is better to use ADDCOLUMNS to add the sales totals (see below).

ADDCOLUMNS

ADDCOLUMNS does exactly what it suggests – it adds new columns to a table in a query.  The general syntax is as follows:

ADDCOLUMNS(<table>,”Column Name”, <formula or measure>,….)

To demonstrate how this works, let me start with a formula from earlier that produces the following table.

The SUMMARIZE function returns a 2 column table with 15 rows – all the possible combinations that contains sales values.  This table can be used as the table parameter in the ADDCOLUMNS formula as follows.

image

You can see above that this new table returns the Total Sales for each of the 15 possible combinations.

And it is possible to add as many new columns as you need to the summary table.  See below.

image

The Important Differences between ADDCOLUMNS and SUMMARIZE

Now you have seen that it is possible to add columns (such as summary total sales) to a table using SUMMARIZE and also with ADDCOLUMNS.  But there are some important differences between these 2 approaches.

ADDCOLUMNS has a Row Context

Unlike I showed with SUMMARIZE earlier in this article, ADDCOLUMNS does have a row context.  Consider the following query.

image

When I swap out the measure [Total Sales] with SUM(Sales[ExtendedAmount]) the results are wrong.  This shows that ADDCOLUMNS operates in a row context.

Efficiency

When given the choice, you should choose to use ADDCOLUMNS in favour of SUMMARIZE to add these additional columns of data.  ADDCOLUMNS is a lot more efficient in the way it adds the values to the SUMMARIZE table. SUMMARIZE uses a Vertipaq Storage Engine operation to produce the base table and then ADDCOLUMNS leverages lineage and context transition to add the value columns – this approach leverages the special capabilities of Power Pivot to do the job in the most efficient way.  For a more detailed coverage of this topic you should read this article from The Italians.

Other DAX Query Functions

I realise I have not covered all DAX Query functions in this series of articles.  There are others, some of which are only available in the newer versions of Power Pivot (eg Power BI Desktop, Excel 2016).  If you are interested to find out more you can do some research online.  I will be covering 1 final function next week – the ROW function.

Uses for DAX Queries

Next week I will share my final article in this series where I explain a few ways that you can use DAX Queries in the real world.  Be sure to check back next week, or better still sign up to my weekly newsletter to be notified when there are new articles.

An Introduction to DAX as a Query Language

Level: Intermediate

I have written quite a few articles that contain embedded references to DAX Queries and DAX Table functions in the past.  Today I am writing an article that introduces the DAX query language from the beginning so you can learn how to use it and how it will help you to become a Power BI superstar.

What is a Query Language?

A query language is a programming language that allows you to send a request to a database and then receive data back in the form of a table.  If you only have a few hundred rows of data then this is not overly useful.  If on the other hand you have millions of rows of data then it becomes very useful.

The most important difference between DAX when used in Measures and Calculated Columns and DAX as a query language is that the query language always returns a table.  The resulting table will consist of one or more columns, none or more rows, and data inside none or more of the cells in the table.

Why Do I Need a Query Language?

When you create a data model in Power BI or Power Pivot for Excel, you are essentially loading tabular data into a dedicated reporting database for the expressed purpose of analysis and/or reporting.  Once it is loaded, you (the report author) then typically spend hours of effort building out the data model so it can be used to add value to your business/job.  Finally once this is all done, the most common way to use your new data model is to build reports and pivot tables that present data to a user in a way that is easy to understand.

Below is an example of an interactive Power BI report that runs off a Power BI data model.

And here is a Pivot Table that runs off an Excel Power Pivot data model.

Both of these above visualisations are summarisations of the underlying data and in both cases you can’t actually see any of the the underlying data itself, just the summary.  The visualisation tools above are great and have enormous utility however they are generally not the best tools if you want to see the underlying data itself or if you want to extract very large sets of data for whatever reason.

Once you have invested all this effort in building your data model, you are going to want to use it to the fullest extent. Power Pivot and Power BI can handle many millions of rows of data (unlike traditional Excel) and so the “old way” of simply looking at the underlying table is not really practical in this brave new world.

Show Me the Underlying Data!

 

If you take the pivot table I showed earlier, most Excel users would be aware that they can double click on any value cell (say 31.6% representing Mountain Bike sales to people aged between 30 and 40 years of age).   When you double click on a traditional pivot table in this way, something magical happens – a copy of the underlying data is extracted from the source and it magically appears in a new sheet in Excel.  Once you are done looking at the copy of the data, you can delete the new sheet and you are back where you started – life is good.

When you double click on on a traditional pivot table in this way, you are effectively sending a query to the underlying database and the query response is a table of data.

But Does it Work With Power Pivot?

In short, yes and no depending on what you are looking for.  The experience is different if you have an Excel Power Pivot data model or if you are connected to an external SSAS data source. In my experience the results are inconsistent at best and at worst it may not give you what you expect.  But the good news is that with DAX as a query language you can easily write any query you want and extract exactly the subset of data you need.  You no longer have to rely on double clicking your pivot table to interrogate your data.

Time To Learn DAX as a Query Language

OK, enough of the chatter, time to get serious about DAX as a query language.  You have invested time and effort in building your data model in Power BI or Power Pivot and now you rightly want to extract a subset of your data so you can take a look at it.  Let’s get started.

First Select the Right Tool

There are 3 general approaches you can take to extract a table of data from your fabulous data model.

  • Use a dedicated tool like DAX Studio or SQL Server Management Studio.  I wrote an article describing how to install and use DAX Studio here.
  • Write a “New Table” in Power BI Desktop.  I wrote an article about how to use this feature here.
  • Use an Excel “table hack” I learnt from Marco Russo to extract a table directly into Excel.  I will cover this hack later in this article series.

I am going to use DAX Studio in this blog article but you can use Power BI “New Table” if you want to follow along and you don’t have (or want) DAX Studio. In fact if you want to learn this stuff then you really should follow along using either approach.  In my experience you can’t learn new concepts simply by reading.  Please note however that using Power BI and Excel as a query tool should be considered as a temporary extraction of the data.  Once you have looked at the data you want, you would normally delete the table so you don’t end up storing redundant data (you can always extract it again if you need it).

Let’s Start With the Evaluate Statement

All DAX queries must start with the EVALUATE statement. There is an exception however.  When you use the Power BI “New Table” button to write a query, you simply omit this EVALUATE statement.  This is no doubt another example of “Syntax Sugar” where the developers shield the user from some of the more confusing syntax.  If you are following along and testing my queries below in Power BI, then simply omit the EVALUATE statement when you write the queries.  In addition if you ever want to move your DAX Studio query to a table in Power BI, then you can just remove the EVALUATE statement at that time.

Connecting DAX Studio to the Database

When I launch DAX Studio I am prompted to connect to my database. In this case I have an instance of Power BI Desktop with an Adventure Works data model running on my PC and DAX Studio gives me an option to connect to that.  You can also connect to Excel Power Pivot data models and Tabular SSAS servers.

image

If you want a more comprehensive overview of the DAX Studio UI, then read my other post I mentioned earlier.  Otherwise let’s get into it.

Return an Existing Table

The easiest DAX query you can write is to simply return a complete copy of an existing table.

When I run this query in DAX Studio (shown as 1 below) and then press F5 to run, the entire table is returned (shown as 2).  A total of 18,484 rows of data (shown as 3).

image

This in itself is not overly useful as chances are you have this table already in some source system.

List of Unique Values

In DAX there are many functions that return tables instead of scalar values.  I covered this topic in depth last week so you can read up on that here if you like.

Let me now look at the ALL function.

When I execute this table function as a query, I can actually “see” the results of the query.  In this case the query returns a single column table consisting of 4 rows – all the unique product categories.

Next let’s look at VALUES.

In this instance, the VALUES table function returns exactly the same result.

image

Now let’s look at DISTINCT.

And same again here.

image

All three of these table functions return the same table in this instance.  In reality they all do different things in different filtering scenarios.  This is one of the great things about DAX Queries is that you can “inspect” what is happening to these table functions in different scenarios.  More on that later.

A Two Column Table

The ALL function can take 1 or more columns as an input (or a table).  In the example below I have used ALL to materialise a table of all of the combinations of Product Category and Product Sub Category.  There are 37 unique combinations in all.

image

A Filtered Table

Most people that know DAX will know of the FILTER function.  The FILTER function returns a filtered copy of a table.  eg see the example below

This is a much more interesting query because it returns a table containing all the products that have sold more than $200,000 of sales over all time.  There are 49 such products.

image

Altering Filter Context

When you write a DAX measure or Calculated Column, there is only 1 way you can change the filter context and that is using the CALCULATE function.  CALCULATE returns a scalar value after applying filters.  CALCULATETABLE is the sibling of CALCULATE.  As you can probably guess, CALCULATETABLE returns a table after applying filters.

This next query returns exactly the same result as the last query.

image

the main difference with using CALCULATETABLE compared to just using FILTER is that CALCULATETABLE can return a different table to the FILTER function itself.

The above query uses the same FILTER function from before, but instead of returning a table containing the products that sold more than $200,000, it returns all of the actual sales transactions that contributed to those products qualifying (see results below).

image

Combining Table Functions

Now that you see how some of the standard table functions work in a query, let’s look at combining them together into a single query.

This query first applies a filter on all products that have sold more than $50,000 (line 4 below), and then returns a list of unique product subcategories that contain these products.

image

One interesting thing in the query results above is that “helmets” are in the list (the only non-bike subcategory). To see what the exact products where that sold, it is simply a matter of writing another query as follows.

image

Note how I was able to reuse the the original query structure to return the list of product names with just a minor change to the query.

More to Come Next Week

This is a big topic – too big for a single blog article.  This week I introduced the topic of DAX queries, I showed how you can write queries to return tables and how you can combine table functions together to interrogate data.  Next week I will introduce some new special query functions that extend the capabilities even further.

Extract Tabular Data From Power BI Service to Excel

Someone asked me a question yesterday about exporting data from the Power BI Service into Excel.  There are a few options to do this however they all have their problems (these problems are not covered in great detail in this post).

  • Power BI has an inbuilt export data feature (there is an export limit of 30k rows for a free account and 150k rows for a Pro account.  Plus the connection cannot be refreshed).
  • Analyze in Excel (can only view the data in Excel via a pivot table, not a table of records).
  • Download the Power BI Desktop file to your PC and use DAX Studio (you have to download the entire model just to get the records you want, besides it may not even work at all).

I started to think that I may be able to connect directly to Power BI Service using DAX Studio and then use the techniques I cover here to extract the records.  I tried this but couldn’t get it to work (please let me know if you find a way to do this).  I started searching and found this old article I wrote in 2015 that uses a technique I learnt from Marco Russo.  l have used a similar approach to create a direct query of a Power BI Data Model and extract the records directly into a Table in Excel.

Edit: 11 Jan 2017 – see the bottom of this post to see how Chris Webb recommends to do this task – it is much easier :-).  But read on for some interesting learning.

Create a Link to PowerBI.com

The first thing I did was to select my dataset from the Power BI Service, clicked the Ellipsis (1 below) and then Analyze in Excel (2 below).

image

This downloads an ODC file to the downloads folder.  Mine was called Adventure Works.odc

image

Note: there have been many updates to the Analyze in Excel connectors since its launch last year.  If you haven’t updated the Analyze in Excel connectors for a while you would be well advised to do that first before attempting the rest of the procedure that follows.  This can be done by logging into Power BI, clicking Downloads (shown as 1 below) and then downloading and installing the connector software (2 below).

image

Edit the ODC File to Make it a Query

I then edited the ODC file using Notepad.  It looks like this originally

image

It is a bit hard to read the file as is, so I cut and paste the contents into an HTML editor to improve the readability, (as shown below) then pasted the formatted text back into Notepad.

image

As you can see above, the default command is “Cube” and the default text is “Model”.  The trick is to change the command to be “Query” and replace the text to be any valid DAX Query.

I changed lines 19 and 20 below as follows:

image

<odc:CommandType>Query</odc:CommandType>
<odc:CommandText>Evaluate Sales</odc:CommandText>

The Command Text I used above is one of the most simple DAX Queries you can write.  All queries must start with the EVALUATE statement followed by any valid DAX code that returns a table of data.  The simplest way to do that is just specify an existing table (Sales in this case). If you are doing this, you can use any table name in your data model.

I then saved and closed the ODC file.

Open the ODC in Excel

To open the ODC file in Excel, I then simply double clicked on the file.  This opened Excel and gave me the following security warning

image

I clicked enable, and there appeared the entire Sales table from my Power BI data model directly in a table in Excel.

image

Now that the connection has been established, the file can be saved and refreshed as needed in the future.  To refresh, simply open the file, right click on the table and click “refresh”.

Change the Query

So far this is pretty boring albeit maybe useful for some people.  Much more interesting is to be able to extract a subset of the data to Excel.  To do this you need to learn a bit of the DAX Query Language.  I am not going to go into the DAX Query Language in detail in this post, but below I show how to edit the current query and change it to a new query – in this case a query that will return the Total Sales by Invoice Number.

To edit the existing query, I simply right clicked anywhere in the table (1 below), then selected Table (2 below), then Edit Query (3 below).

image

This brings up a dialog that can be edited as needed.

image

For my demo, I replaced the command text with my new DAX Query as follows

image

After clicking OK, this query then returned a table of all invoice numbers and the total value of sales for those orders.  Interestingly the number formatting for Total Sales did not flow through to Excel as I expected it to.

image

Chris Webb’s Trick

Edit 11 Jan 2017. Chris Webb tweeted an alternate approach that is faster to execute. Simply create any pivot table using Analyze in Excel, drill through to the underlying table and then edit the query as before.

Further Learning

I may come back and blog about DAX as a Query language at another time.  Until then if you would like to learn what is possible, I recommend this video from Alberto Ferrari (although it is a bit advanced unless you have some solid understanding of tables and databases).

www.sqlbi.com/tv/querying-in-dax-tee2014/

How to Document DAX Measures in Excel

I often get asked if there is an easy way to create documentation for DAX measures when using Power Pivot for Excel.  I am not a big fan of documentation for the sake of it, but I do see value in having “some” appropriate level of documentation.  I think a good balance of the right amount of documentation vs the effort to write and maintain the documentation is to use the Power Pivot UI itself.  I explain how I do it below.

Writing DAX Measures From Within Excel

I always teach students at my Power BI training courses to write their measures from within Excel via the “new measures” dialog box (and specifically not to write them from within the Power Pivot window).  The process is always as shown below.

  1. Set up a pivot table that has some relevance to the measure you are writing, then click in the pivot table (shown as 1 below)
  2. Select Power Pivot Menu \ Measures* \ New Measure* as shown below (*called Calculated Field in Excel 2013)

image

The reasons I teach students to do it this way are simple.

  1. You get a dialog box/wizard that shows you every step that is required to complete the task.  Just complete each step one at a time and you can’t go wrong.
  2. You get to check the measure for errors before completing it (shown as 2 below).
  3. Once you add the measure to your pivot table (shown as 3), you get to see immediately if it is working as you expected.

image

image

And there is a fourth reason to use the wizard that is less obvious.  The Description box (shown as 4 above) is a great place where you can make notes about your DAX formulas (aka documentation).    In Power BI Desktop you can add comments into your DAX expressions, but this is not possible in Excel, so the description box is the best option you have.

Create Your Documentation

If you diligently add comments into the Description box above, you will then be able to extract this information into a documentation page using the Power Pivot Utilities Toolbar.  I have previously reviewed this great little tool here, so you can go back and read about it if you missed it.

After installing both tools, you will then see the PP Utilities Toolbar in Excel (shown as 1 below)

image

Click on List Measures (shown as 2 above).  After a few seconds, you will see a new page in your Workbook called “Measures” like shown below.  Note the “Description” column on the right hand side. This column is sourced directly from the information entered into the Description box shown further up the page.

image

What’s more, if you select the measures on your page (Shown as 1 below) and then click Format DAX Expressions (2 below), the measures will be nicely formatted so they are easy to read.

image

This page is then your documentation and you can refer to it at any time you need to see the big picture.

Always Up to Date

One important feature of this type of documentation is that it is very easy to maintain.  If you make changes to any of the measures or comments in your data model, you can simply re-run the extract process so you have an up-to-date list of all the measures.  You can also do the same for Calculated Columns and also relationships.

What About Excel 2010?

Unfortunately the Power Pivot Utilities Addin only works for Excel 2013+  But all is not lost.  It is possible to extract the Measures from your Data Model manually using DAX Studio.  You can read my blog post about how to do that (and other cool things) with DAX Studio here

Excel Workbook Connection to Local Power BI Desktop

Today I am sharing an easy way to use Power BI Desktop as a SSAS tabular server on your local PC.  I find increasingly that I am building data models in Power BI Desktop and yet I still want to use Excel to analyse the data some of the time.  If I load the PBIX workbook to the Power BI Service, it is easy to use “Analyze in Excel” to query the data model.  But there is currently no standard supported method to do the same thing using Power BI Desktop as a server (see my suggested idea to Microsoft at the bottom of this page for more about that).  Until then, we need to use a “hack” to complete this task.

I have previously shared a way to:

  • Open DAX Studio on your PC
  • Use DAX Studio to extract the Port Number of a running instance of Power BI Desktop
  • Manually enter this information into Excel to connect to Power BI Desktop and use it as a local SSAS Server instance
  • Query the data model that exists in the open Power BI Desktop file with a Pivot Table in Excel.

The problem with the above approach is that once the Power BI Desktop file is closed, the connection string becomes invalid.  Next time you want to use Power BI Desktop as a server you must go through the entire process again plus you need to alter the connection string that was previously created – hardly a scalable solution.

An Excel VBA Workbook Template to Solve This Problem

Local Host Workbook

Today I am sharing a tool that I built recently that completes the entire process automatically in Excel using VBA and Power Query.

Disclaimer:  I am providing this tool free of charge, without support, and do not warrant that it will work for you.  This is not supported by Microsoft and could stop working anytime.  If it works for you, then great – if not then it is unfortunate :-(.  Maybe you can deconstruct what I have done and make it work for you.  I built this using Excel 2016 and I have tested it on 2010 and 2013 and it works (Please make sure you have the latest version of Power Query installed).

You create a new blank Excel Workbook using my template (from personal templates), then click the “Refresh SSAS Connection” button.  The tool will then:

  • Detect an instance of Power BI Desktop running on your PC (there must be 1 and only 1)
  • It finds the port number and database name of the running instance
  • it modifies the existing connection string in the Excel Workbook to repoint it to the current running instance of Power BI Desktop.
  • If the workbook already contains pivot tables that were previously built using the same data source, they will simply reconnect and work as you expect.
  • If this is the first time you have used the template, you will have a blank Pivot Table connected to the Power BI Desktop data model.
  • You should only have 1 Excel Workbook and 1 Power BI Desktop file open at any one time (as they all use the same connection string).

I am not planning on covering in detail how the tool works.  I am making it available to use and modify as you see fit.  If you want to look into the VBA code and the Power Query to see what I did then of course this is fine by me.  If you want to improve it and give me back the improvements under the same “share and share alike” rules, then please do so as long as you leave my name comments in the file untouched.

How to use the Template

You can download the template here.  I have put it in a zip file so that I can keep the same link but change the name of the actual file over time (to manage version numbers).

The file is a macro enabled Excel Template workbook.  You need to place it in your Personal Templates location.  To find this location, in Excel go to File\Options, then navigate to save (1 below) then check the location (shown in 2).  If you don’t have a current template location, you can set your own.  Note Excel 2010 has a different file location (Google it).

image

To create a new workbook that uses this template, in Excel go to File\New, select the personal templates option (shown as 2 below) and then select the template.

image

I would love to hear what you think and find out if this is a useful tool for you.

Let’s Ask Microsoft to Make this a Standard Feature

I was talking to Avi Singh and he suggested that I request this as a standard feature.  I have set up an idea that you can find and vote for here.  If we get enough votes for this feature, Microsoft may build out this capability into the core product(s) so it is easier for everyone and fully maintained for future releases.

 

LASTNONBLANK Explained

Level: Intermediate

Last week at my Sydney training course, one of the students asked me a question about LASTNONBLANK.  This reminded me what a trickily deceptive function LASTNONBLANK is.  It sounds like an easy DAX formula to understand, right?  It just finds the last non blank value in a column – easy right?  Well it is a little bit trickier than that, and I am going to explain it all in this post today.

LASTNONBLANK has a sibling FIRSTNONBLANK that operates in exactly the same way but in reverse.  Given the behaviour is the same, I won’t cover FIRSTNONBLANK at all, as you can work it out after reading about LASTNONBLANK.

Syntax

The syntax of LASTNONBLANK is as follows.

LASTNONBLANK(Table[Column],<expression>)

It takes 2 parameters, 1) a column and 2) an expression.  In fact you can also pass a single column table as the first parameter in as well.  This could be useful for example if you wanted to create your own table on the fly using a function that returns a table, and then use that as the first parameter.

LASTNONBLANK is an Iterator

It is not immediately obvious, but LASTNONBLANK is actually an iterator,  It iterates over the column specified in the first parameter and finds (unsurprisingly) the last value in that column that is not blank.  (Technically the engine carries out this task in reverse natural sort order for efficiency).

But what is that pesky second parameter?

The first thing people normally find confusing about LASTNONBLANK is that pesky second parameter.  If all you want to do is find the last non blank value in a column, why do you need this second parameter?  Well in short, you don’t “need” it in that use case (except that it is mandatory).  But having this parameter makes the formula much more powerful, so it is good to have it as an option. It is designed to allow you to write a DAX expression such as a measure to test for some condition or value.  Ideally (in my view) Microsoft should have made this an optional parameter, but that is not what we got.  But there is an easy work around that allows you to make this parameter optional  – just use the value 1 as the second parameter, like this.

LASTNONBLANK iterates through the Table[Column] and then checks to see if the second parameter has a value.  The number 1 always has a value of course, so placing 1 as the second parameter has the same effect as just ignoring this parameter.  Instead the formula will just return the last non blank value it finds in the column.

LASTNONBLANK actually returns a TABLE

Another thing that is not immediately obvious is that LASTNONBLANK actually returns a TABLE.  It is a special single column, single row table that by definition can only have 1 possible value (because it has only 1 row and 1 column).  One feature of this single row, single column table is that you can use it as both a scalar value or a Table in DAX.  More on that later too.

Test Data

My test data for this blog post is monthly bank account balances for 2 fake bank accounts.  Measures that calculate account balances in DAX are often described as “semi-additive measures”, because you can’t just add up the values from each month to get the current balance – you need to find the latest balance to work out how much you have.  This type of data is a prime candidate to use the LASTNONBLANK formula, because LASTNONBLANK does exactly what we need to work with semi-additive measures.

I have set up the following data with some interesting features to demonstrate the behaviour of LASTNONBLANK.

image

It is easiest to see what I have done with this test data in the pivot table below.  I have written the following measure to demonstrate the point.

Note the above measure is a test measure only to help with this blog – it doesn’t really make any sense as is, but it is useful for describing what is happening.  When the above measure is placed in a pivot table, it looks like this (shown below).

image

Note a couple of things about this data.

  1. Kathy’s account doesn’t have a result for May
  2. Matt’s account doesn’t have a result for July
  3. The maximum value for Kathy’s account is in June (the previous month’s data load)
  4. The maximum value for Matt’s account is back in May.

I have set the data up this way to demonstrate the behaviour of LASTNONBLANK.  I often talk about how important it is to set up test data that will flush out issues when you write your formulas.  This is a good example of that, and it will make more sense as you read on.

Last value in a Column

Now the objective is to use LASTNONBLANK to find the last value in a column.

Consider the following formula.

This formula finds the last date in the data table (note it is not the last date in the Calendar table, but the data table).

The following pivot table has the Month name from the calendar table and the account names on pivot table columns.

image

Note that the formula correctly indicates that May data is missing for Kathy, and July data is missing for Matt.  Also note that it correctly gives the last date in the Grand Total of the pivot table.  (You would also get the same result as above if you used LASTDATE instead of LASTNONBLANK).  I have only used LASTNONBLANK here to demonstrate the behaviour.

LASTNONBLANK operates over a sorted column

The [Last Non Blank Date] measure above hides some complexities about LASTNONBLANK.  You might expect that LASTNONBLANK finds the last value in the column, but that is not how it works.  It actually finds the last value in a sorted column using the natural sort behaviour of the column data type.  In the case of a date column (like shown above), then everything is sweet – the natural sort order is also the order we normally load the data (chronological order).  But in the case of the balance column, the natural sort order of a numeric column is numerical sort order, not the order the data is loaded (as you can see in the following pivot)

I have written another test measure as follows

When placed in the pivot table, you get the following behaviour

image

In the above example, you may expect the Grand Total row to return the values 2,125 for Kathy and 1,557 for Matt as these are the last values you loaded, but that is not how it works.  When looking at the individual rows in the pivot table, it all works fine – it correctly finds the last balance for each month.  This measure works correctly on the rows in the pivot table because the rows provide “initial filter context” prior to the measure being evaluated.  So there is only ever 1 row in the data table at the time the measure is evaluated, and that is why it works in this case.  But in the Grand Total row in the pivot table, there is no initial filter context applied to the date, hence all values in the column are iterated over by LASTNONBLANK.  The iteration operates over a sorted version of the balance column (the natural sort order of the column, which is numeric in this case).  Once the balance column is sorted, then 2,200 will be the last value in the column for Kathy, and 1,806 will be the last value in the column for Matt, and that is the result that is returned.

LASTNONBLANK as a TABLE

In this next test measure, I have used LASTNONBLANK as a filter input into a CALCULATE function

You should remember that CALCULATE can use a simple filter such as Table[Column] = “some value” or it can take an advanced filter where you must pass a TABLE as the filter.  The measure above therefore suggests that LASTNONBLANK must be a table, not a scalar value.  This theory can be tested by firing up DAX Studio and executing the LASTNONBLANK portion of the formula as a query.  DAX Studio ONLY ever returns tables, it can’t return scalar values, so if the query works, it confirms that LASTNONBLANK returns a table. Note: you can read more about using DAX Studio as a tool to help you learn DAX at this blog post here.

image

You can see above that LASTNONBLANK returns a single column, single row table that contains a single value.  It is a special table because it can also be used as a scalar value, in the same way that you can use VALUES() as a scalar value if there is only a single row returned.  In the case of the measure [Latest Balance with 1] above, I am using LASTNONBLANK as a table parameter filter inside CALCULATE.

But how does this table act as a filter?

One feature of temporary tables in DAX is they retain a link to the data model, and filter propagation will work between the temporary table and the rest of the data model.  You should visualise this in your mind like the image below.  Imagine a new table that has spawned into the data model and has a relationship to the table where it came from (in this case it is the data table).  The filter propagation flows from the 1 side of the relationship (temporary table imagined as shown in 1 below) to the many side of the relationship (table 2 below).

image

So you can see above that the LASTNONBLANK function produces a table that then filters the data model based on the single value it returns in the table.

When I put this measure in a pivot table, I get this result

image

Note that the measure is correctly returning the last value for the account “Kathy” but it is returning a blank for account “Matt”.  Technically this is correct because the calendar has dates in July and there is no entry for Matt for July, so the last value is blank.

Enter the second parameter

Now it is time to change the second parameter with the following measure.

When this last measure is added to a pivot table, it works as desired, correctly returning the last non blank value from the data table based on the chronological order of the data.

image

This new LASTNONBLANK function measure returns the last non blank value it finds in the column provided it also returns a non-blank result for the measure [Sum of Balance].

It is worth pointing out here that the ONLY REASON this formula works is because the measure [Sum of Balance] has an implicit CALCULATE wrapped around it – you can’t see it, but it is there.  LASTNONBLANK is an iterator, and like all iterators, it has a row context but doesn’t have a filter context.  Therefore the following formula will not work

But this next measure does work (this next formula is the equivalent of the one that uses [Sum of Balance] because [Sum of Balance] has an implicit CALCULATE

I cover evaluation context and context transition in detail in my book, Learn to Write DAX.

Further reading/references

Here are some good links that helped me learn how LASTNONBLANK works, and how it can be used/leveraged if you are interested in doing some more reading on this topic.

www.sqlbi.com/articles/semi-additive-measures-in-dax/

www.sqlbi.com/articles/alternative-use-of-firstnonblank-and-lastnonblank/

www.powerpivotpro.com/2012/06/top-selling-product-using-firstnonblank/

Quadruple Nested SUMX or CROSSJOIN

Level: Intermediate

I had an interesting problem with a client last week where I built a data model to calculate rebates payable to customers based on their sales turnover.  There were a few challenges that I am going to cover below, including:

  1. There needs to be a user friendly way to capture the rebate data, but it then needs to be transformed so it can be used in the model.
  2. There is a nested SUMX problem – quadruple nesting in this case.  I solve this with a CROSSJOIN (and then SUMMARIZE) that I will explain below.
  3. There is also a quadruple nested IF(HASONEVALUE()) problem that I solve with an innovative formula that I will also cover below.

The Business Scenario

The scenario is a wholesale business that sells products to customers.  Depending on the contract with the customer, a rebate may be paid on parts of the sale.  The objective is to create a tool that will correctly calculate the rebate payable at all aggregation levels in a pivot table. Read More

Getting Started with DAX Studio

Level: Beginners

DAX Studio is a fabulous free tool that allows you to directly query your Power Pivot/Power BI data models.  It is not immediately obvious (particularly to beginners) exactly how using DAX Studio can add value.  I often refer people on various forums to use DAX Studio but then lack a suitable reference to refer them to so they can get started.  This blog post today is a simple reference to help anyone that uses Power Pivot get started with DAX Studio.

But What Can I Do With DAX Studio?

I’m glad you asked – lots of things – much more than I will cover here today.  But here are a few things that you can do that are really useful. Read More

A Fabulous new Excel Add-In for Power Pivot

I’m really exited to share with you a new Excel Add-in that was authored by Bertrand d’Arbonneau and has recently been made available via SQLBI.com The Add-in called Power Pivot Utilities combines a number of existing tools into a single new tool bar in Excel as shown below (Excel 2013 and 2016 only). Everything could already be done using various bespoke existing tools and procedures, but there is huge value by bringing them all together into a simple to use UI like this. This new Add-In goes straight into my “must have tools” for Power Pivot in Excel.

image

There are currently 8 icons on the tool bar (edit: plus there are some hidden gems in the context menus). Let me explain what each item does. Read More

Extract Calculated Fields from an Excel Workbook

I was working with a client last week and the unthinkable happened.  The client made a simple editing mistake inside the Calculated Field editing dialogue box and it corrupted the Power Pivot workbook. We could still edit the corrupt workbook, but the Pivot Tables stopped working.  Oh No!!  Now what?  He had written half a dozen new Measures (Calculated Fields) and all that effort now seemed to be lost.

Luckily we had incremental backups of the workbook we were developing.  Well actually it wasn’t luck – it was good forward planning.  Even though Power Pivot is an excellent tool, anyone that uses it will know it is a bit flaky and it can cause problems when editing.  As a precaution I always keep incremental backups of my workbooks when doing data modelling work.  I have optimised this process using an incremental backup macro I created.  You can read about that here and even download the VBA code to use yourself.

OK, but we still had a problem.  We had a backed up copy of the workbook, but this backed up copy didn’t contain the last 30 mins work – about half a dozen DAX formulae.  Power Pivot doesn’t allow you to export the Measures (Calculated Fields) and then reuse them.  But there is a solution using DAX Studio.

What is DAX Studio?

image DAX Studio is a great tool that allows you to do many things with your Excel Power Pivot data models.  It will allow you to use DAX as a query language to interrogate your data model and many other things.  One of those “other things” it allows you to do is get a list of all of your DAX Measures in a new sheet in your workbook.

You can read more about DAX Studio and download it here.  Note I installed version 2.2.1 today and it didn’t work for me.  I am currently using version 2.1.1 and that works fine.

Step by Step Instructions on how to Extract Calculated Fields

Read More

DAX Studio 2.0 released today

Great news for Power Pivot for Excel users – DAX Studio has had an update over night and version 2.0 is now available. For those of you that don’t know DAX studio, it is an ADDIN that you install for Excel 2010 or 2013. It connects to the Power Pivot Data Model and allows you to write DAX code, query the data model and also query the DMV (which means you can get a list of all the Measures and Measure formulae in your data model). DAX Studio 2.0 is worth downloading for that reason alone.

How to Install

You do not need to uninstall your old version.  Just download the new version from here, close Excel and run the install.  Install DAX studio into the same location as your original version – I just used the defaults and all worked fine.  Once you start up Excel, you will find the same Icon on the Addin tab will work as before, but launch the new version.

If this is the first version you have installed, you first of all need to enable the Addin.  Do the following in Excel

  • Goto FileOptionsAddins
  • Go to the bottom and find “Manage Excel Addins”
  • Change the drop down box from Excel Addins and select COM Addins instead
  • Click Go.

Once you have done this, there will be an Icon on the Addin Ribbon in Excel.

addin

What’s New and Cool

User Interface

The most obvious thing is a new user interface.  It is very nice and clean.  It now supports Hierarchies and KPIs in your data model, which was not the case before.

Read More