Archive for Intermediate

Uses for DAX as a Query Language

This is the final article in my series about DAX as a Query Language.  In my first article I introduced DAX as a Query Language, and in my second article I went deeper into the specifics of how to use some of the more useful DAX query functions.  In this final article I am going to cover a few of the many uses for DAX Queries and also introduce one last DAX function – the ROW function.

Create Summary Tables

As you may have seen in my previous articles, you can use DAX Queries to create summary tables like the one below.
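
Something like the following query (a sketch that assumes the Sales, Products and Customers table names and the [Total Sales] measure used throughout this series) produces such a summary table:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)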

This may be useful if you want to pre-calculate this summary table and have it as part of your data model (more on that in the “Materialisation” section below).  If you want to create a summary table for any reason, you can do it using DAX Queries.

Create Tables to Use As Filters Inside CALCULATE

Another major use of queries is as a table input inside a CALCULATE function.  Because virtual tables retain lineage to the data model, they behave as if they were physically part of the data model.  The following measure returns the total sales for all unique combinations of product category and customer gender where the margin for those unique combinations is greater than 45%.
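
Here is a sketch of such a measure, assuming a [Margin %] measure exists in the data model:

Sales of High Margin Combos =
CALCULATE (
    [Total Sales],
    FILTER (
        SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
        [Margin %] > 0.45
    )
)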

This measure doesn’t have any real value, however it does demonstrate the concept well.

Create Tables to Use in Iterators

You can also create a table to use as the table parameter in an iterator like SUMX or AVERAGEX.  The following measure returns the straight line average Margin % across all the actual combinations of product category and customer gender.
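
A sketch of the measure (again assuming a [Margin %] measure):

Straight Line Avg Margin % =
AVERAGEX (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %]
)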

Here is an article I wrote some time ago that explains how to use DAX Query tables instead of a more complex nested SUMX approach.  This article will be much more useful with your new understanding of DAX Queries.

To Debug DAX Formulas (the Table Bit)

One thing I really like about DAX Queries is they allow you to “inspect” tables that are used inside DAX Measures and Calculated Columns.  Excel users come from a world where everything they do – every formula and every number – is materialised in front of their eyes in the spreadsheet.  When you begin to write more complex DAX Measures or Calculated Columns, the formulas will often be in the following format:

Your Measure = CALCULATE([Some Base Measure], <A complex table function used to filter the data model>)

The issue of course is that you can’t actually “see” the result of “the complex table function used to filter the data model”.  If you can’t see it, it can be hard to conceptualise, and also hard to debug the measure, particularly if it gives an answer different to your expectations.  This is where the concept of materialising tables comes in.  A great way to move forward when you are stuck is to write a DAX Query to return just the table portion of your measure first.  Once you have that working correctly, you can insert the table portion into the measure.
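
For example, to debug the high margin measure from earlier, you could materialise just its table parameter first (a sketch):

EVALUATE
FILTER (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    [Margin %] > 0.45
)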

Materialise Tables Into Excel

It is possible to materialise a DAX Query into Excel by using a trick that I learnt from The Italians.  For this next example I have an Excel Workbook that contains an Adventure Works Power Pivot data model.  The first thing to do is insert a Table* (not a Pivot Table) connected to the data model.  To do this, go to a blank area of the spreadsheet (shown as 1 below) and then select Data\Existing Connections\Tables, then select any table (I have selected Products shown as 5 below) and then click Open.  *At this point you can select any table as it is going to change shortly anyway.

[screenshot: the Data\Existing Connections\Tables dialog]

You will be given an option to insert a Pivot Table however you also have an option to insert a Table as shown below.  Go ahead and click OK to insert this table.

[screenshot: the Import Data dialog with the Table option selected]

Now that you have a table inserted into a spreadsheet from the data model, it is possible to change what is returned in this table.  To change what is returned in the table, right click anywhere inside the table (shown as 1 below) and then select Table\Edit DAX.

[screenshot: the right click menu showing Table\Edit DAX]

Now you have the opportunity to change this table from being a “table” to being a DAX Query as shown below.

[screenshot: the Edit DAX dialog]

I have now taken a DAX Query I wrote earlier and pasted this into the expression box (shown below) to return a summary table into the spreadsheet.

[screenshot: the Edit DAX dialog with Command Type set to DAX and the summary query pasted into the expression box]

You can manually refresh this table by right clicking in the table and clicking “Refresh”.  Unfortunately there is no easy way to pass filters to this type of table in Excel although I believe it can be done with VBA. That may be a topic for another day.

Materialise Tables Into Power BI

You can also materialise DAX Query Tables into Power BI using the “New Table” function.  This can be very useful to help you debug the formulas you are writing (as I covered in this article).  The only thing different to note here is that you need to omit the EVALUATE statement inside Power BI New Tables.  The example below materialises the table (shown as 5) from the formula (shown as 4).

[screenshot: Power BI New Table showing the formula (4) and the materialised table (5)]
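
For example, the summary query from earlier would be written as a New Table something like this (note there is no EVALUATE):

My Summary Table =
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Customers[Gender] ),
    "Total Sales", [Total Sales]
)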

Check the Performance of your Measures using ROW

DAX Studio has a really cool feature that allows you to understand how Power Pivot is completing its task of calculating a formula under the hood.  To demonstrate the point I need to introduce a new DAX Query function called ROW.

You can see the syntax for ROW below from the DAX Studio Intellisense.

[screenshot: the ROW syntax from the DAX Studio Intellisense]

ROW returns a single row table.  Because a DAX Query can ONLY return a table, it follows that if you want to see the result of a measure inside DAX Studio then you must first convert the scalar result of the measure into a table – this is exactly what ROW does.

Below is a simple example where I return the value of the [Total Sales] measure as a new table.
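
Something like this sketch:

EVALUATE
ROW ( "My Result", [Total Sales] )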

[screenshot: the ROW query (1) returning 29.3m (2), with the Server Timings button (3)]

My ROW function (shown as 1 above) returns a single row table with 1 column (called “My Result”), and the row has the value 29.3m (shown as 2 above), which is the total for the entire database.  Note the Server Timings button shown as 3 above.  If you click this button, DAX Studio will keep track of how long your query takes to execute and also how the query is being executed.

With the server timings enabled, a new tab will appear in DAX Studio (shown as 1 below).

[screenshot: the Server Timings tab (1) in DAX Studio]

After executing the query, the server timings tab will show the total time taken to execute the query in milliseconds (shown as 2), how much time each engine in Power Pivot (Formula Engine and Storage Engine, shown in 3) spent completing the task, and whether the Storage Engine Cache was used (shown in 4).

SE or FE?

This is a huge topic in its own right and way beyond the scope of this article. What I will say here is that in general you want to leverage the Storage Engine in preference to the Formula Engine.  Take a look at some of the key points below.

Storage Engine:
  • Really fast
  • Can retrieve records without decompressing them (in the right conditions)
  • Multi threaded
  • Cached

Formula Engine:
  • Really smart
  • Iterates in a row context and hence it can be slow
  • Single threaded
  • Not cached

I will cover the Storage Engine and Formula Engine in another post sometime.

CROSSJOIN Revisited using ROW

Earlier in this series I warned that CROSSJOIN can return very large tables, but this is only really an issue if you try to materialise the tables.  When I wrote the following query in DAX Studio, it took forever to execute and I ended up cancelling it (remember there are more than 7 million rows in this table).
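
The query was essentially this (a sketch using the Customers and Products table names from this series):

EVALUATE
CROSSJOIN ( Customers, Products )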


The issue is not the creation of the above table in Power Pivot – the issue is the materialisation of this table in DAX Studio.  Consider the following query that uses ROW to return a single row table containing the count of the rows in the CROSSJOIN query.
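
Something like this sketch:

EVALUATE
ROW ( "Row Count", COUNTROWS ( CROSSJOIN ( Customers, Products ) ) )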


The above query executed in 400 milliseconds on my PC confirming that Power Pivot can create these large tables really quickly.  If you want to materialise the results, well that is a completely different thing.

DAX as a Query Language Part 2

In my last article I introduced DAX as a Query Language and explained why learning to query your data model can be useful.  I covered the basics about using a tool like DAX Studio to query a database and return a table of data as a result.  In this article I am going to go deeper and introduce the more common and useful query functions including CROSSJOIN, SUMMARIZE and ADDCOLUMNS (there are more functions that I don’t cover in this article).

Here is the link to the PBIX data model again if you want to follow along and do the examples yourself.  I am using DAX Studio again to connect to Power BI Desktop running on my local machine.  If you take the time to complete these examples you will significantly increase your learning and knowledge retention.

Lineage

Before I move on, an interesting and important fact about Power Pivot (when compared to traditional database tools) is the concept of lineage (pronounced LIN-E-AGE).  When a new virtual table is created in a query or a formula in Power Pivot, the new table will include an automatic one to many relationship to the table from which it was created.  Consider the following simple table from last time.

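The query was something like this (a sketch):

EVALUATE
VALUES ( Products[Category] )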

You can see above that this query produces a single column table of all of the unique product categories.  This new virtual table retains lineage to the data model.  In this case the new virtual table comes from the Products table and hence the new virtual table has lineage to the Products table.  You can “imagine” this as shown below with the new table having a 1 to many relationship to the Products table.

[diagram: the virtual table with a 1 to many relationship to the Products table]

Note the image above is just a visual simulation of what happens.  The virtual table is not materialised and you cannot actually see it in the relationship view. But the virtual table does exist (virtually) and the lineage to the Products table also exists – it is just that you can’t actually see it.  I recommend that you learn to “imagine” this happening in the data model in your mind as it will help you understand how the new virtual table interacts with the rest of the data model especially as it relates to context transition.

All virtual tables have lineage to the tables they came from.

CROSSJOIN

CROSSJOIN is a function that can create a new table from 2 or more source tables. For this example I am going to join some virtual tables. The first virtual table is VALUES(Products[Category]), which of course returns a list of all the unique product categories.


The second virtual table is a list of all the possible customer genders – VALUES(Customers[Gender]).

image

Next I am going to use CROSSJOIN to create a new table containing all the unique combinations of both tables.
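
The query looks something like this (a sketch):

EVALUATE
CROSSJOIN (
    VALUES ( Products[Category] ),
    VALUES ( Customers[Gender] )
)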

[query results: the 8 rows of the CROSSJOIN]

In the table above there are 4 rows x 2 rows, giving a total of 8 rows of all of the unique combinations.  Continuing the concept of “imagining” the way these new virtual tables are related in the data model, it would look something like this:

[diagram: the two virtual tables joined to the Products and Customers tables]

Remember this is just a simulation of what it looks like.  These tables at the top are not materialised and you cannot see them in the data model.  But you can “imagine” them as looking like this and they behave in exactly the same way as they would if they were physical tables.

M x N Can Mean a Large Table

You need to be careful with CROSSJOIN as by definition the resulting table will be m x n rows long, where m is the number of rows in table 1 and n is the number of rows in table 2.  If I were to CROSSJOIN the Customers table (18,484 rows) with the Products table (397 rows) I would end up with more than 7 million rows.  Creating such a large table in memory is not in itself a problem for Power Pivot, but it can definitely be a problem if you try to materialise the table.  More on that next week.

SUMMARIZE

SUMMARIZE is by far my favourite DAX Query function.  SUMMARIZE can do similar things to CROSSJOIN, however CROSSJOIN can join tables that do not have relationships, whereas SUMMARIZE can only join tables that are related with a many to 1 relationship.

SUMMARIZE first takes a table and then one or more columns (that can be reached via a many to 1 relationship) that you want to include in the new summarised table.

SUMMARIZE(<table>, table[column], table2[column], …)

Here is an example.
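
Something like this sketch:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Customers[Gender] )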

[query results: the 6 rows returned by SUMMARIZE]

The query results above are similar to the CROSSJOIN query from before, but there is one important difference: SUMMARIZE will only return rows that actually exist in the data itself (note there are only 6 rows above compared with 8 rows in the CROSSJOIN example).

Consider the relevant tables from the data model below.

[diagram: the Sales table with its many to 1 relationships to the Products and Customers tables]

Here is the SUMMARIZE formula written earlier.
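
It was the same query as the sketch above:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Customers[Gender] )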

This query starts with the Sales table and then adds the Products[Category] column from the Products table and the Customers[Gender] column from the Customers table.  The 2 columns specified inside the SUMMARIZE formula come from tables on the 1 side of the many to 1 relationships – this is allowed.

The following is not allowed and will not work.

EVALUATE
SUMMARIZE(Products,Sales[CustomerKey])

It doesn’t work because the column Sales[CustomerKey] cannot be reached from the Products table via a many to 1 relationship.

It is also possible to write a SUMMARIZE statement over any single table.  In the example below, the SUMMARIZE statement returns a list of all possible combinations of product category and colour.
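
A sketch of the query:

EVALUATE
SUMMARIZE ( Products, Products[Category], Products[Color] )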


You could also achieve the same result with the ALL function (which would be an easier solution if you are just using a single table).
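
Something like this sketch should return the same list:

EVALUATE
ALL ( Products[Category], Products[Color] )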

Adding Summary Sales to the Summary Table

So far the SUMMARIZE queries above are just lists of valid combinations.  It is time to do something more interesting and add [Total Sales] to these summary tables.  Before moving on, please note that the following formulas are not best practice – there is a better way which I will cover later.

Consider the following formula.
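
The formula was something like this sketch:

EVALUATE
SUMMARIZE (
    Products,
    Products[Category],
    Products[Color],
    "Total Sales", [Total Sales]
)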

Note specifically that the table parameter in this formula is Products.  Also note that this formula returns blank rows (shown below).

[query results: the summary table including blank rows]

This SUMMARIZE statement correctly summarises all the combinations of Products[Category] and Products[Color] in the Products table, and for those combinations where there are sales, the sales are shown next to the unique combination.  But in some cases the unique combination doesn’t actually have any sales, hence the blank rows.

Using Sales as the Table Parameter

If I change the formula above and swap out the Products table with the Sales table, then the blank rows are no longer visible (see below).
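
The revised query, as a sketch:

EVALUATE
SUMMARIZE (
    Sales,
    Products[Category],
    Products[Color],
    "Total Sales", [Total Sales]
)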

[query results: the summary table with no blank rows]

SUMMARIZE will always find the unique combinations that actually exist in the selected data.  Because this new formula starts from the Sales table, only combinations of Product[Category] and Product[Color] where there are actual sales are returned.

Context Transition or No Context Transition?

Those of you that are familiar with the concept of context transition may be thinking that context transition is occurring here.  That is a valid thing to assume, but it is not what is happening here.  Consider the following formula.
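
A sketch of the formula:

EVALUATE
SUMMARIZE (
    Sales,
    Products[Category],
    Products[Color],
    "Total Sales", SUM ( Sales[ExtendedAmount] )
)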


Note how I have swapped out the measure [Total Sales] with SUM(Sales[ExtendedAmount]).  With this new formula there is no CALCULATE forcing context transition, yet despite this the table still returns the same result.  This implies that SUMMARIZE does not operate in a row context.  In fact SUMMARIZE is a Vertipaq Storage Engine operation.  The part that produces the valid combinations of columns is very efficient, however the calculation of the total sales figures is very inefficient.  For this reason it is better to use ADDCOLUMNS to add the sales totals (see below).

ADDCOLUMNS

ADDCOLUMNS does exactly what it suggests – it adds new columns to a table in a query.  The general syntax is as follows:

ADDCOLUMNS(<table>, "Column Name", <formula or measure>, …)

To demonstrate how this works, let me start with a formula from earlier that produces the following table.
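
That formula was something like this sketch:

EVALUATE
SUMMARIZE ( Sales, Products[Category], Products[Color] )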

The SUMMARIZE function returns a 2 column table with 15 rows – all the possible combinations that contain sales values.  This table can be used as the table parameter in the ADDCOLUMNS formula as follows.
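
Something like this sketch:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales]
)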

[query results: the 15 combinations with Total Sales added]

You can see above that this new table returns the Total Sales for each of the 15 possible combinations.

And it is possible to add as many new columns as you need to the summary table.  See below.
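
For example (a sketch assuming a [Margin %] measure also exists in the model):

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", [Total Sales],
    "Margin %", [Margin %]
)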

[query results: the summary table with several added columns]

The Important Differences between ADDCOLUMNS and SUMMARIZE

Now you have seen that it is possible to add columns (such as summary total sales) to a table using SUMMARIZE and also with ADDCOLUMNS.  But there are some important differences between these 2 approaches.

ADDCOLUMNS has a Row Context

Unlike SUMMARIZE, as I showed earlier in this article, ADDCOLUMNS does have a row context.  Consider the following query.
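
A sketch of the query:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Sales, Products[Category], Products[Color] ),
    "Total Sales", SUM ( Sales[ExtendedAmount] )
)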

[query results: the incorrect totals returned when using SUM]

When I swap out the measure [Total Sales] with SUM(Sales[ExtendedAmount]), the results are wrong.  This shows that ADDCOLUMNS operates in a row context.

Efficiency

When given the choice, you should use ADDCOLUMNS in favour of SUMMARIZE to add these additional columns of data.  ADDCOLUMNS is a lot more efficient in the way it adds the values to the SUMMARIZE table.  SUMMARIZE uses a Vertipaq Storage Engine operation to produce the base table, and then ADDCOLUMNS leverages lineage and context transition to add the value columns – this approach leverages the special capabilities of Power Pivot to do the job in the most efficient way.  For a more detailed coverage of this topic you should read this article from The Italians.

Other DAX Query Functions

I realise I have not covered all DAX Query functions in this series of articles.  There are others, some of which are only available in the newer versions of Power Pivot (eg Power BI Desktop, Excel 2016).  If you are interested to find out more you can do some research online.  I will be covering 1 final function next week – the ROW function.

Uses for DAX Queries

Next week I will share my final article in this series where I explain a few ways that you can use DAX Queries in the real world.  Be sure to check back next week, or better still sign up to my weekly newsletter to be notified when there are new articles.

An Introduction to DAX as a Query Language

Level: Intermediate

I have written quite a few articles that contain embedded references to DAX Queries and DAX Table functions in the past.  Today I am writing an article that introduces the DAX query language from the beginning so you can learn how to use it and how it will help you to become a Power BI superstar.

What is a Query Language?

A query language is a programming language that allows you to send a request to a database and then receive data back in the form of a table.  If you only have a few hundred rows of data then this is not overly useful.  If on the other hand you have millions of rows of data then it becomes very useful.

The most important difference between DAX used in Measures and Calculated Columns and DAX as a query language is that the query language always returns a table.  The resulting table will consist of one or more columns, zero or more rows, and data in zero or more of the cells in the table.

Why Do I Need a Query Language?

When you create a data model in Power BI or Power Pivot for Excel, you are essentially loading tabular data into a dedicated reporting database for the express purpose of analysis and/or reporting.  Once it is loaded, you (the report author) then typically spend hours of effort building out the data model so it can be used to add value to your business/job.  Finally, once this is all done, the most common way to use your new data model is to build reports and pivot tables that present data to a user in a way that is easy to understand.

Below is an example of an interactive Power BI report that runs off a Power BI data model.

And here is a Pivot Table that runs off an Excel Power Pivot data model.

Both of the above visualisations are summarisations of the underlying data, and in both cases you can’t actually see any of the underlying data itself, just the summary.  The visualisation tools above are great and have enormous utility, however they are generally not the best tools if you want to see the underlying data itself, or if you want to extract very large sets of data for whatever reason.

Once you have invested all this effort in building your data model, you are going to want to use it to the fullest extent. Power Pivot and Power BI can handle many millions of rows of data (unlike traditional Excel) and so the “old way” of simply looking at the underlying table is not really practical in this brave new world.

Show Me the Underlying Data!

 

If you take the pivot table I showed earlier, most Excel users would be aware that they can double click on any value cell (say 31.6% representing Mountain Bike sales to people aged between 30 and 40 years of age).   When you double click on a traditional pivot table in this way, something magical happens – a copy of the underlying data is extracted from the source and it magically appears in a new sheet in Excel.  Once you are done looking at the copy of the data, you can delete the new sheet and you are back where you started – life is good.

When you double click on a traditional pivot table in this way, you are effectively sending a query to the underlying database, and the query response is a table of data.

But Does it Work With Power Pivot?

In short, yes and no depending on what you are looking for.  The experience is different if you have an Excel Power Pivot data model or if you are connected to an external SSAS data source. In my experience the results are inconsistent at best and at worst it may not give you what you expect.  But the good news is that with DAX as a query language you can easily write any query you want and extract exactly the subset of data you need.  You no longer have to rely on double clicking your pivot table to interrogate your data.

Time To Learn DAX as a Query Language

OK, enough of the chatter, time to get serious about DAX as a query language.  You have invested time and effort in building your data model in Power BI or Power Pivot and now you rightly want to extract a subset of your data so you can take a look at it.  Let’s get started.

First Select the Right Tool

There are 3 general approaches you can take to extract a table of data from your fabulous data model.

  • Use a dedicated tool like DAX Studio or SQL Server Management Studio.  I wrote an article describing how to install and use DAX Studio here.
  • Write a “New Table” in Power BI Desktop.  I wrote an article about how to use this feature here.
  • Use an Excel “table hack” I learnt from Marco Russo to extract a table directly into Excel.  I will cover this hack later in this article series.

I am going to use DAX Studio in this blog article but you can use Power BI “New Table” if you want to follow along and you don’t have (or want) DAX Studio. In fact if you want to learn this stuff then you really should follow along using either approach.  In my experience you can’t learn new concepts simply by reading.  Please note however that using Power BI and Excel as a query tool should be considered as a temporary extraction of the data.  Once you have looked at the data you want, you would normally delete the table so you don’t end up storing redundant data (you can always extract it again if you need it).

Let’s Start With the Evaluate Statement

All DAX queries must start with the EVALUATE statement.  There is one exception however: when you use the Power BI “New Table” button to write a query, you simply omit the EVALUATE statement.  This is no doubt another example of “syntax sugar” where the developers shield the user from some of the more confusing syntax.  If you are following along and testing my queries below in Power BI, simply omit the EVALUATE statement when you write the queries.  In addition, if you ever want to move your DAX Studio query to a table in Power BI, you can just remove the EVALUATE statement at that time.

Connecting DAX Studio to the Database

When I launch DAX Studio I am prompted to connect to my database. In this case I have an instance of Power BI Desktop with an Adventure Works data model running on my PC and DAX Studio gives me an option to connect to that.  You can also connect to Excel Power Pivot data models and Tabular SSAS servers.

[screenshot: the DAX Studio connection dialog]

If you want a more comprehensive overview of the DAX Studio UI, then read my other post I mentioned earlier.  Otherwise let’s get into it.

Return an Existing Table

The easiest DAX query you can write is to simply return a complete copy of an existing table.
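
For example (a sketch using the Customers table from the sample data model):

EVALUATE
Customers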

When I write this query in DAX Studio (shown as 1 below) and then press F5 to run it, the entire table is returned (shown as 2) – a total of 18,484 rows of data (shown as 3).

[screenshot: the query (1), the returned table (2) and the row count (3) in DAX Studio]

This in itself is not overly useful as chances are you have this table already in some source system.

List of Unique Values

In DAX there are many functions that return tables instead of scalar values.  I covered this topic in depth last week so you can read up on that here if you like.

Let me now look at the ALL function.
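
In DAX Studio the query is simply this (a sketch using the Products table name):

EVALUATE
ALL ( Products[Category] )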

When I execute this table function as a query, I can actually “see” the results of the query.  In this case the query returns a single column table consisting of 4 rows – all the unique product categories.

Next let’s look at VALUES.
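
The query:

EVALUATE
VALUES ( Products[Category] )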

In this instance, the VALUES table function returns exactly the same result.


Now let’s look at DISTINCT.
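
The query:

EVALUATE
DISTINCT ( Products[Category] )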

And same again here.


All three of these table functions return the same table in this instance.  In reality they all do different things in different filtering scenarios.  One of the great things about DAX Queries is that you can “inspect” what is happening to these table functions in different scenarios.  More on that later.

A Two Column Table

The ALL function can take 1 or more columns as an input (or a table).  In the example below I have used ALL to materialise a table of all of the combinations of Product Category and Product Sub Category.  There are 37 unique combinations in all.
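
A sketch (assuming the subcategory column is called Products[SubCategory]):

EVALUATE
ALL ( Products[Category], Products[SubCategory] )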


A Filtered Table

Most people that know DAX will know of the FILTER function.  The FILTER function returns a filtered copy of a table – see the example below.
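
Here is a sketch of such a query, assuming the [Total Sales] measure:

EVALUATE
FILTER ( Products, [Total Sales] > 200000 )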

This is a much more interesting query because it returns a table containing all the products that have sold more than $200,000 of sales over all time.  There are 49 such products.


Altering Filter Context

When you write a DAX measure or Calculated Column, there is only 1 way you can change the filter context and that is using the CALCULATE function.  CALCULATE returns a scalar value after applying filters.  CALCULATETABLE is the sibling of CALCULATE.  As you can probably guess, CALCULATETABLE returns a table after applying filters.

This next query returns exactly the same result as the last query.
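
A sketch:

EVALUATE
CALCULATETABLE (
    Products,
    FILTER ( Products, [Total Sales] > 200000 )
)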


The main difference between using CALCULATETABLE and just using FILTER is that CALCULATETABLE can return a different table to the FILTER function itself.
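
A sketch of the query:

EVALUATE
CALCULATETABLE (
    Sales,
    FILTER ( Products, [Total Sales] > 200000 )
)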

The above query uses the same FILTER function from before, but instead of returning a table containing the products that sold more than $200,000, it returns all of the actual sales transactions that contributed to those products qualifying (see results below).

[query results: the sales transactions for the qualifying products]

Combining Table Functions

Now that you see how some of the standard table functions work in a query, let’s look at combining them together into a single query.

This query first applies a filter on all products that have sold more than $50,000 (line 4 below), and then returns a list of unique product subcategories that contain these products.
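
A sketch of the query (note the FILTER on line 4):

EVALUATE
CALCULATETABLE (
    VALUES ( Products[SubCategory] ),
    FILTER ( Products, [Total Sales] > 50000 )
)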

[query results: the list of subcategories, including Helmets]

One interesting thing in the query results above is that “helmets” are in the list (the only non-bike subcategory). To see exactly which products those were, it is simply a matter of writing another query as follows.
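
A sketch, assuming the product name column is Products[ProductName]:

EVALUATE
CALCULATETABLE (
    VALUES ( Products[ProductName] ),
    FILTER ( Products, [Total Sales] > 50000 )
)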

[query results: the list of qualifying product names]

Note how I was able to reuse the original query structure to return the list of product names with just a minor change to the query.

More to Come Next Week

This is a big topic – too big for a single blog article.  This week I introduced the topic of DAX queries, I showed how you can write queries to return tables and how you can combine table functions together to interrogate data.  Next week I will introduce some new special query functions that extend the capabilities even further.

SQL Saturday Melbourne Feb 2017 Materials


I am currently in Melbourne for PASS SQL Saturday 2017.  SQL Saturday is an annual one day conference event that occurs in many major cities around the world.  It is focussed on helping SQL Server professionals learn more about their profession and also network with other like-minded people.  Many of the readers of my website material are not SQL Server professionals – most likely they are Excel professionals.  But that doesn’t mean there is nothing for you at SQL Saturday.  I encourage you to look for an event in your location, read the session materials and decide if there is value for you – particularly if you are into Power Query and Power Pivot.  You can learn more at www.sqlsaturday.com/

SQL Saturday will be held in Sydney next Saturday 18th Feb 2017 and I would love to see you there.

My Presentation: Disconnected Tables in Power Pivot

I spoke at SQL Saturday about disconnected tables in Power Pivot.  Most people know that you create relationships between tables in Power Pivot, but did you know that you can load tables that are not joined (disconnected tables) and they can still add value?

My Slide Deck

Here are the slides that I used in my presentation today for those that would like to download them.  I realise that they really just support the live demonstration, but they should have meaning for those that were there.  I have also linked to some of my other articles on 3 of the slides if you would like to read more.

SQLSat_MattAllington_DisconnectedTablesInDAX

Use Power Query to Compare Database Records

I was helping a user on community.powerbi.com this week. The user had a list of service desk tickets for multiple customers and the requirement was to identify when a customer opened a second new ticket within 7 days of closing a previous ticket.  This is not an uncommon scenario so I thought it would make a good blog article.


One of the challenges that new users have when using Power BI is to decide if they should use Power Pivot (DAX modelling) or Power Query (PQL shaping) to solve each problem.  And if the answer is DAX, then they also need to decide if it should be a measure or a calculated column.  There is no single correct answer to these questions and it always depends on the circumstances.  I generally recommend shaping your data properly prior to loading it into Power Pivot; that way the DAX formulas are easier to write and the reporting database will be more performant.  I looked at the options for this problem and decided to use Power Query to shape the data before loading.  Power Pivot and DAX are not really designed to do ordinal row comparisons, so this task is better handled with Power Query.

For this particular problem I thought I would produce a video that demonstrates one way in which this problem can be solved.  Here it is.

My Online Power Query Video Training

One of my new year’s resolutions was to batten down and create an online Power Query training course.  This has been on my to-do list for almost 2 years now and I am well under way.  The video above is a sample of my video course.  I start from the beginning and walk the viewer through the foundation skills they need to know to be a Power Query guru.  Towards the end of the course I will have a stack of videos like the one above where you can broaden your awareness of how you can use your new skills in real world situations.  If you would like to sign up and be notified once the course is finished (no obligation), then please enter your contact details in the form below.

Extract Tabular Data From Power BI Service to Excel

Someone asked me a question yesterday about exporting data from the Power BI Service into Excel.  There are a few options to do this however they all have their problems (these problems are not covered in great detail in this post).

  • Power BI has an inbuilt export data feature (there is an export limit of 30k rows for a free account and 150k rows for a Pro account, plus the connection cannot be refreshed).
  • Analyze in Excel (can only view the data in Excel via a pivot table, not a table of records).
  • Download the Power BI Desktop file to your PC and use DAX Studio (you have to download the entire model just to get the records you want, besides it may not even work at all).

I started to think that I may be able to connect directly to the Power BI Service using DAX Studio and then use the techniques I cover here to extract the records.  I tried this but couldn’t get it to work (please let me know if you find a way to do this).  I started searching and found this old article I wrote in 2015 that uses a technique I learnt from Marco Russo.  I have used a similar approach to create a direct query of a Power BI data model and extract the records directly into a table in Excel.

Edit: 11 Jan 2017 – see the bottom of this post to see how Chris Webb recommends to do this task – it is much easier :-).  But read on for some interesting learning.

Create a Link to PowerBI.com

The first thing I did was to select my dataset from the Power BI Service, clicked the Ellipsis (1 below) and then Analyze in Excel (2 below).

[screenshot: the dataset Ellipsis (1) and Analyze in Excel (2)]

This downloads an ODC file to the downloads folder.  Mine was called Adventure Works.odc


Note: there have been many updates to the Analyze in Excel connectors since its launch last year.  If you haven’t updated the Analyze in Excel connectors for a while you would be well advised to do that first before attempting the rest of the procedure that follows.  This can be done by logging into Power BI, clicking Downloads (shown as 1 below) and then downloading and installing the connector software (2 below).

[screenshot: the Power BI Downloads menu (1) and the connector download (2)]

Edit the ODC File to Make it a Query

I then edited the ODC file using Notepad.  It looks like this originally

[screenshot: the original contents of the ODC file]

It is a bit hard to read the file as is, so I cut and pasted the contents into an HTML editor to improve the readability (as shown below), then pasted the formatted text back into Notepad.

[screenshot: the ODC file contents after formatting]

As you can see above, the default command is “Cube” and the default text is “Model”.  The trick is to change the command to “Query” and replace the text with any valid DAX Query.

I changed lines 19 and 20 below as follows:


<odc:CommandType>Query</odc:CommandType>
<odc:CommandText>Evaluate Sales</odc:CommandText>

The Command Text I used above is one of the simplest DAX Queries you can write.  All queries must start with the EVALUATE statement followed by any valid DAX code that returns a table of data.  The simplest way to do that is to just specify an existing table (Sales in this case).  If you are doing this, you can use any table name in your data model.

I then saved and closed the ODC file.

Open the ODC in Excel

To open the ODC file in Excel, I then simply double clicked on the file.  This opened Excel and gave me the following security warning

[screenshot: the Excel security warning]

I clicked enable, and there appeared the entire Sales table from my Power BI data model directly in a table in Excel.


Now that the connection has been established, the file can be saved and refreshed as needed in the future.  To refresh, simply open the file, right click on the table and click “refresh”.

Change the Query

So far this is pretty boring albeit maybe useful for some people.  Much more interesting is to be able to extract a subset of the data to Excel.  To do this you need to learn a bit of the DAX Query Language.  I am not going to go into the DAX Query Language in detail in this post, but below I show how to edit the current query and change it to a new query – in this case a query that will return the Total Sales by Invoice Number.

To edit the existing query, I simply right clicked anywhere in the table (1 below), then selected Table (2 below), then Edit Query (3 below).

[screenshot: right click inside the table (1), then Table (2), then Edit Query (3)]

This brings up a dialog that can be edited as needed.

[screenshot: the Edit Query dialog]

For my demo, I replaced the command text with my new DAX Query as follows
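
It was something like this sketch (the invoice number column name here is illustrative – use whatever the column is called in your own data model):

EVALUATE
ADDCOLUMNS (
    VALUES ( Sales[InvoiceNumber] ),
    "Total Sales", [Total Sales]
)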


After clicking OK, this query then returned a table of all invoice numbers and the total value of sales for those orders.  Interestingly the number formatting for Total Sales did not flow through to Excel as I expected it to.

[screenshot: the invoice numbers and unformatted Total Sales in Excel]

Chris Webb’s Trick

Edit 11 Jan 2017. Chris Webb tweeted an alternate approach that is faster to execute. Simply create any pivot table using Analyze in Excel, drill through to the underlying table and then edit the query as before.

Further Learning

I may come back and blog about DAX as a Query language at another time.  Until then if you would like to learn what is possible, I recommend this video from Alberto Ferrari (although it is a bit advanced unless you have some solid understanding of tables and databases).

www.sqlbi.com/tv/querying-in-dax-tee2014/

Data Modelling with OneDrive Source Files

Last week I shared how it was possible to easily create an online survey using Excel for O365.  Once such a survey is created, the data can be imported into PowerBI.com and used to create a simple report.  The biggest issue with directly connecting to OneDrive like I showed last week is that you can’t do any serious data modelling.  You can’t write any DAX measures or join other tables into the data model etc.  Not being able to build a proper data model with my data set last week created the following issues (I am sure there are more).

  1. I couldn’t easily create a sort order column for my “how long have you been reading” question (hence why I needed to prepend a letter A, B, C etc to get the answers to sort).
  2. Without data modelling, there was no easy way for me to cleanse the data. eg Some people said they came from “United States” and others said “US”, but that didn’t work in my map (it had to be consistent).
  3. I couldn’t write proper measures such as COUNTROWS(Survey) to find out how many surveys there were.  Instead I had to rely on “Count of Name”.  If someone left the name field blank, it would not be counted.
  4. I couldn’t change the column names to be something more suitable for reporting – I was stuck with the actual survey questions in the visualisations.

All of these issues (and more) are solvable if you can simply build a regular data model, but this is not possible when you directly connect PowerBI.com to a flat table of data in the cloud.

This article explains how you can create a data model with a cloud based service (like OneDrive) and also how I then went about improving the reporting from last week given I could create a proper data model.

Connecting Power BI Desktop to OneDrive

The alternative to connecting PowerBI.com to the OneDrive file is to use Power BI Desktop instead.

  • First build a data model using Power BI Desktop connecting to the cloud copy of the OneDrive file.
  • Publish the Power BI Desktop file to the service in the cloud (after you have done your data modelling of course)
  • Configure the Power BI cloud service to authenticate directly to OneDrive so that it can access the OneDrive data without further configuration or gateways.  Power BI automatically refreshes the data for you approximately once every hour.

Let me demonstrate how to do this using the same survey from last week.  If you haven’t already completed the survey you can do so by clicking here and then see your survey response included in the reporting below.  You can read about how to create an Excel survey like this from the blog article last week.

Copy the URL to your OneDrive File in the Cloud

There are a few ways to do this but the way I did it was to copy the URL of my OneDrive XLSX file and then use that to import the data “From Web”.

Get the URL

The tricky bit is finding the correct web URL of your file on OneDrive.  The way to do this is to open the Excel file in Excel on your PC and extract the URL from there.  You can either open the file from OneDrive in the cloud or the copy on your PC (I used the one on my PC).

Once you have the file open in Excel, click on the File menu, then Info (1 below), click on the OneDrive Link (2 below) and then click “Copy link to Clipboard” (3 below).  This placed the link into the clipboard.

[screenshot: File\Info (1), the OneDrive link (2) and Copy link to Clipboard (3)]

With the URL still in the clipboard, I created a new Power BI Desktop report using Get data, Web as shown below.

[screenshot: Get Data\Web in Power BI Desktop]

I then pasted the URL from the clipboard and manually deleted everything from the question mark to the right (highlighted below).

[screenshot: the pasted URL with everything from the question mark highlighted]

So I was left with the following

[screenshot: the trimmed URL]

I then got the following complicated set of options for Authentication. I took the last option to sign into my Organizational Account, and signed in with my Office 365 credentials

[screenshots: the authentication options and the Organizational Account sign-in]

After authenticating, I was able to import the table of data into Power BI Desktop and Load as normal.


Load to Power BI and Configure Credentials

I will continue the process of building out my data model further down the page, but first let me complete the process to publish and configure the online credentials process.

First I saved the query (close and apply), saved the workbook and published it to my PowerBI.com account.  Once the dataset was in the cloud, I needed to configure my credentials for the dataset.    I clicked on Settings (1 and 2), Datasets (3) and then selected my dataset (4 below)

[screenshot: Settings (1, 2), Datasets (3) and the selected dataset (4)]

Because this is an online data source, I was able to edit the credentials as shown below.

[screenshot: the Edit Credentials option for the data source]

After clicking Edit Credentials, I selected OAuth2 and signed in.


And I then got this confirmation.

[screenshot: the credentials updated confirmation]

Improving the Reporting Using Data Modelling

Back to the data modelling in Power BI Desktop.  The first thing I did was turn off data load for the table I had previously loaded.  This was quite easy. Right click on the table, then uncheck “Enable Load”.  This keeps the connection to the table but doesn’t load it in the current form.

[screenshot: right click the table and uncheck Enable Load]

The rest of this article explains how I changed the data to make it more useable in my final report.

Cleanse the Country Data

The next thing I did was create a data cleansing table for the list of countries.  Ideally the survey tool would provide a list of all valid countries to select from, however this is not available using the simple online Excel survey tool I am using.  Instead I created a substitution table where I could manually create substitutes for the data that I wanted to change.

I created a new query with reference to the first.  Right click on the first query (1 below) and then select Reference (2 below).

[screenshot: right click the first query (1) and select Reference (2)]

I then deleted all columns other than the country column, removed duplicates and sorted the column.  Finally I set it to disable load.  It is now possible to see all the data points entered and identify the dirty data that was included (shown below).

[query results: the list of country values, including the dirty data]

I then created a manual mapping table to make it easier to control the data cleansing process.  To do this, I simply created a new manual table in Power BI Desktop as shown below (and also turned off data load).

[screenshot: the manual country substitution table]

Sorry to the nondescript North American if you happen to be Canadian, Mexican or of some other nationality.  I decided to map this to the USA.

Now that I had the mapping table, the next step was to replace the original country data with the cleansed data.  I created a new query referencing Table1, then joined on the country columns to the CountryCleanse table.

[screenshot: the Merge dialog joining the country columns]

Using a Left Outer Join as shown below

[screenshot: the Left Outer Join option]

I then expanded the new column to extract the cleansed version of the data.

[screenshot: expanding the new column]

This gave me a new column that contains the substitutes where there was a match in the cleansing table.


I then created a new custom column that uses this cleansed column if it exists, otherwise the original column.

[screenshots: the custom column formula and the resulting column]

I then deleted the other 2 columns and renamed my new column so that I had a cleansed copy of the Country.

Cleanse and Sort “How Long” Answers

In my survey question, I asked how long had the participant been reading my blog.  When the answers were placed in Power BI they didn’t sort based on length of time (they sorted alphabetically).  To fix this (using the original approach I blogged about last week) I was forced to add a prefix to force the answers to sort logically.  Using Data Modelling, it is possible to solve this problem properly.  For variety (in this blog), I decided to use a different mapping table solution. I first created a mapping table with the original survey choices, a new set of labels and a sort column as follows.

[screenshot: the mapping table with the original answers, new labels and sort order]

Then I loaded this mapping table as a lookup table to my data model and joined it on the Original column as the key.

[screenshot: the relationship joined on the Original column]

Then to avoid confusion in the final report, I hid the original columns in both tables.  Right click on the column (1 below) and then select Hide in Report View (2 below).

[screenshot: right click the column (1) and select Hide in Report View (2)]

Of course I also sorted the Tenure column using the sort column.


Adding “All of the Above” into the other 3 choices

The last problem I wanted to solve was that many people chose “All of the above” when asked about their favourite topics.  When this was put in a visual (shown as 1 below) it is not possible to see the overall most popular item.

[screenshot: the favourite topics visuals, with callouts 1–4 referenced in the text]

There are many ways to handle this problem.  The measure I wrote was as follows:

Count Most Useful =
COUNTROWS ( Data )
    + CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = "All of the above"
    )

The measure simply adds the count for each item plus the count of “All of the above” (shown as 3 above), and I then simply filtered out the “All of the above” answer from the visual (shown as 4 above).

My report was starting to look good (see below) but I still wasn’t happy.

[screenshot: the report so far]

Ideally I should be able to show “what is most useful” and also “what do you want to see more of” in a single bar chart.  But the problem was that there were 2 different measures, and the “bars” were coming from 2 different columns in the data source.  This could not be solved without thinking differently about the problem.

I created a new table that has the 3 responses and loaded it to the data model.  I didn’t connect this table to any other tables in the data model.

[screenshot: the new Types table]

I placed this new column of data on the Axis of my visual and placed the 2 measures as Values.  This gave me a chart that looked like the one I needed, but of course there was no filtering, because the new “Types” table is not connected to anything in the data model.

This is important:  The Axis is part of the filter context, the Types table is being filtered by the visual, but this filtering is not being used in the measures I wrote.

[screenshot: the chart with both measures, before filtering]

So I changed the measures so they would “harvest” the value on the Axis to filter the results.

Count Most Useful =
IF (
    HASONEVALUE ( Types[Product] ),
    CALCULATE (
        COUNTROWS ( Data ),
        Data[Which topics do you find most useful?] = VALUES ( Types[Product] )
    )
        + CALCULATE (
            COUNTROWS ( Data ),
            Data[Which topics do you find most useful?] = "All of the above"
        )
)

The final visual looks like this, which is much easier to read and understand.

[screenshot: the final visual]

And here is the final report

Import Tabular Data from PDF using Power Query

Today I am sharing a process I developed that allows you to import tabular data from a PDF document into Excel (or Power BI) using Power Query.  I didn’t want to purchase software to do this task so I started experimenting on how I could do it with the tools I already have, and I am sharing my solution here today.


Note: this will only work for tabular data in a PDF – exactly the same as downloading from a web page, the data must be in tabular form.

I also demo some neat Power Query tricks further down in this post, so make sure you read through to the end.  These tricks are not hard, and you may be surprised at what you can do.

Process Overview

Here are the steps I use to grab the table from the PDF.

  • Open the PDF in Microsoft Word.
  • Save the file as a single file web page.
  • Import the single file web page into Power Query as HTML.
  • Go through the cleansing and transformation process you would normally do (plus of course the cool tricks I use below).

Worked Through Example

I did a quick Google search to find a PDF from the web containing something I could use in my demo.  I found this PDF that contains a list of US states with their abbreviations.

www.siue.edu/postal/pdf/Abbreviation-List.pdf    Note the data I want from this file is in tabular format.


I saved the PDF to a known location on my PC.

Convert the PDF to a Web Page File

I opened Microsoft Word and then I opened the PDF file from within Word.  I got this warning message below.  I just clicked “don’t show this message again” and clicked OK.

[screenshot: the Word conversion warning]

I then selected File\Save As and saved the file as a Single File Web Page (shown below).

[screenshot: Save As – Single File Web Page]

 

Import into Power Query

There are a couple of ways you can do the next step. I found the following to be the easiest.

Create a new query that connects to a text file (example shown below – Excel 2016, but it is similar in all other versions and Power BI)

[screenshot: creating a new query from a text file]

In the dialog, I changed the file filter so it could find all file types (as shown below), then browsed until I found my file.

[screenshot: the open dialog with the file filter set to All Files]

I then selected the MHT file created earlier and clicked “Import”.


Power Query will not correctly identify the file type, so right click on the file (shown below) and then select HTML.

[screenshot: changing the file type to HTML]

Power Query now correctly identifies the table in the document.  I expanded the one identified as “3DTableGrid”.

[screenshot: the tables identified in the document (1) and the table I wanted (2)]

In your own PDF documents, you may need to use some trial and error to find the right table.

I then clicked on the Table I wanted (shown as 2 above).

Cleansing the Data

The data is starting to look good at this stage.  But note there is a bit of “noise” in the file (see below). Power Query is a great tool for fixing that.

[screenshot: the imported data showing the noise]

First I got rid of the first row (Remove Top Rows, 1).

Next I copied the text <= /span> by right clicking on one of the cells containing this text, and selecting “copy”.  Then I selected all 4 columns and did a Transform\Replace Values and replaced <= /span> with nothing.

I didn’t need the bottom 11 rows, so I did Remove Rows\Remove Bottom Rows\11


Now for Some Power Query Tricks

Power Query is a formula language.  If you analyse each of the steps in the Applied Steps window, you will notice that each step is a formula, and each formula takes the output of the previous formula as its input.  The UI assumes each new step will take the previous step as its input, but it doesn’t have to be that way.  You can override that if you want – I show you how below.

Turn on the formula bar before proceeding.

[screenshot: enabling the Formula Bar]

At this point, I have my data in 4 columns, but I really want all the data in just 2 columns.


This is easy to fix with some simple Power Query tricks, with very little PQL coding at all.

First I renamed this step to be called All4Columns.  I like to rename significant steps in my Applied Steps window to make it easier to find the ones I need later.


Then I removed the last 2 columns using the UI.  Select the columns to remove, right click and then remove columns.


Before moving on I renamed the 2 columns to be called “State” and “Abbreviation”, and renamed the step to be called First2Columns using the same renaming approach as before.


Now the trick. At any point in your query, you can add a custom query step by clicking the button shown in 1 below.

[screenshot: the button (1) that adds a custom query step]

Power Query assumes you want to add a new step that starts from the previous step. So Power Query automatically adds the previous step as the starting point (as shown below).

[screenshot: the new step automatically referencing the previous step]

But you don’t have to keep the previous step.  In this case I want to refer to the step “All4Columns” instead.  So I just typed in the name of the step I wanted (as shown below) to replace what Power Query added for me.

= All4Columns

Now that I had all 4 columns back, I went through the following steps.

  • removed the first 2 columns
  • renamed the columns to be “State” and “Abbreviation” as before
  • renamed the step to be called Second2Columns

This then left me with 2 non-sequential steps (First2Columns, Second2Columns) that each contained half of the data.

Append The 2 Tables into a Single Table

I didn’t know how to write the code to do this, so I just used the UI again.  I selected “Append Queries” and appended the query to itself.  That forced the UI to write the code for me as shown below.

[screenshot: the Table.Combine formula generated by the UI]

Not only did the UI write the code for me, but I learnt a single PQL function that I think I can probably remember – Table.Combine() – and hence I may be able to do the same step by hand next time around (this is how you can learn too).  Then all I needed to do was change the code above so that instead of appending the query to itself, it would append First2Columns to Second2Columns.  You don’t need to be a rocket scientist to work out how to do this :-).  It looks like this after I made the change.

= Table.Combine({First2Columns, Second2Columns})

Wrap Up

I hope you found this article useful, and you have learnt a few things.

  1. How to import data from a PDF
  2. How to refer to non-sequential steps in a query
  3. How to use the UI in Power Query to help you write more advanced queries.
  4. How to refer to the Formula Bar to start to build your knowledge of the Power Query Formula Language.

If you are into Power Pivot and you haven’t already done so, you may like to download my paper “Ten Things I Wish I Knew When I Started With Power Pivot” from the link below.

Edit: 24th Nov 2016
A friend of mine, Dave Marriott, was using my approach above but had a stack of PDFs that he needed to access. He decided to write some Word VBA code that automates the task of converting the PDFs to MHT files. Dave is a professional VBA programmer based in Adelaide, Australia, if you would like to hire him to help you be more efficient in your business. You can contact him on LinkedIn

Copy this VBA code into Microsoft Word.  Why not add it to your Normal template so it is always available?  If you haven’t done this before, you can follow my instructions on how to do this in Excel (it is exactly the same process in Word).  Run the code to launch a dialog. You can multi-select as many PDFs as you like and the code will convert them all for you.

Sub ConvertToMHT()

Dim f_dialog As FileDialog
Dim FileChosen As Integer
Dim i As Integer

' Let the user multi-select one or more PDF files
Set f_dialog = Application.FileDialog(msoFileDialogFilePicker)
f_dialog.InitialView = msoFileDialogViewList
f_dialog.AllowMultiSelect = True

FileChosen = f_dialog.Show
If FileChosen = -1 Then ' -1 means the user clicked Open rather than Cancel
     For i = 1 To f_dialog.SelectedItems.Count
          ' Open each PDF in Word, save it alongside the original as .mht, then close it
          Documents.Open FileName:=f_dialog.SelectedItems(i)
          ActiveDocument.SaveAs2 FileName:=Mid(f_dialog.SelectedItems(i), 1, Len(f_dialog.SelectedItems(i)) - 4) & ".mht", FileFormat:=wdFormatWebArchive
          ActiveWindow.Close
     Next i
End If

End Sub

Power Query to Combine Web Pages

Level: Intermediate

There was an interesting question this week on powerpivotforum.com.au asking if there was a smarter way to use Power Query over multiple identical web pages to scrape the data in a single query.  I have been meaning to blog about this for a while, so it is a great opportunity for a mini-Friday post.

The process of typing all the steps on how to do this is a lot longer than simply recording a video.  So instead of a long post, I have embedded a video below from my YouTube channel.

First the Problem

Take a look at this web page  www.calendar-australia.com/holidays/2016/

It contains all the public holidays for the calendar year 2016.  But there are 2 similar web pages with the same structure but different results:

www.calendar-australia.com/holidays/2017/

www.calendar-australia.com/holidays/2018/

How to do it in Power Query

I’m sure I first learnt this technique watching Miguel Llopis from Microsoft, probably at the PASS SQL BA Conference in 2015.  The technique can be reused in many different scenarios like this one. The trick is virtually identical to the technique I showed in this post about combining multiple Excel workbooks into a single query. The solution works equally well in Power Query for Excel as well as in Power BI (Get and Transform). The steps are:

  1. Build a query for one of the sets of data
  2. Turn the query into a function using the simple parameter step
  3. Modify the query to use the parameter instead of the hard coded set of data
  4. Create a table with a column of the parameters you want to use (in this case the data must be text, not numbers).
  5. Add a new column that takes the text from the first column and processes the function for each row.
  6. Expand the column into a single table.
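Pulling these steps together, the finished query looks something like this in M (a sketch only – the table index inside Web.Page and the expanded column names depend on the actual page structure):

let
    // Steps 1-3: the original query turned into a function with a year parameter
    GetHolidays = (year as text) as table =>
        let
            Source = Web.Page(Web.Contents("http://www.calendar-australia.com/holidays/" & year & "/")),
            // the index of the table that holds the holidays depends on the page
            Data = Source{0}[Data]
        in
            Data,
    // Step 4: a table with one text parameter per page
    Years = #table(type table [Year = text], {{"2016"}, {"2017"}, {"2018"}}),
    // Step 5: invoke the function for each row
    Added = Table.AddColumn(Years, "Holidays", each GetHolidays([Year])),
    // Step 6: expand the results into a single table
    Combined = Table.ExpandTableColumn(Added, "Holidays", Table.ColumnNames(GetHolidays("2016")))
in
    Combined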

Here is a video showing how easy it is to do.

Seems there is a special birthday this weekend.  Should be fun!

Measures on Rows – Here is How I did it

Level: Intermediate

You may or may not be aware that it is not possible to put Measures on rows in a Matrix in Power BI. But I came up with a trick that makes it possible, so read on to find out how.

Measures Can Only be Placed on Columns

First the problem. The only way you can use the Power BI Matrix visualisation (at the time of writing) is to place the measures on columns, as shown below.  You can then take any column of data from your data model (typically from a Dimension/Lookup table) and place that on rows in the Matrix.  So you end up with this.

image

This limitation does not exist in an Excel Pivot Table.  As you can see below, it is possible to flip the Values between columns and rows from the PivotTable Fields list.

measures on rows

Old tricks for New Purposes

When I was thinking through this problem, my first intuition was to use a SWITCH statement somehow.  Switch statements can be used to create a Switch Measure (like the ones I used in this blog post here back in 2014), and I figured this could be part of the solution.  Then I remembered another trick I learnt from Rob Collie – using disconnected tables to feed a measure – and I knew I had a solution.

So in summary, I wrote a Switch measure that will morph into different measures when filtered, and then used a disconnected table to pass a filter to that measure.

Write a Switch Measure

The Switch measure is pretty easy to write as follows:
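It might look something like this (a sketch: [Selected Measure] is the harvester measure created below, and the base measures for IDs 1 and 2 are placeholder names – the post only confirms Total Margin and Total Invoices):

Measure Value =
SWITCH (
    [Selected Measure],
    1, [Total Sales], -- placeholder base measure
    2, [Total Cost], -- placeholder base measure
    3, [Total Margin],
    4, [Total Invoices]
)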

The way this measure works is that it takes an input in the form of an integer, and then depending on that number, it will return the value of the appropriate measure as the result.

Then Create a Helper Table

The next step is the secret sauce to this trick.  I created a disconnected table with the sole purpose of passing the correct filter values (the integers) to the Switch measure.

I used the “Enter Data” feature in Power BI to create a list of the measures I want to put on rows in the Matrix.

image

Note that the ID column is also the sort order of the measures, and the ID column also matches with the Switch measure numbering scheme.  Once the table is created, I added it to the data model as a disconnected table – no need to join it to any other tables in the data model.

I then set the sort order of the Measure column as follows:

image

Write a Harvester Measure

The next step is to write a harvester measure to extract the selected filtered value of the table.  This measure is simply as follows:
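A sketch of that measure (the measure name here is my own; the ID column comes from the MeasureTable created above):

Selected Measure = MAX ( MeasureTable[ID] )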


The measure above will return the largest integer in the current filter context.  If there is a filter (eg a slicer) on the Measure column and the user has selected “Total Margin” in the slicer, then there is only a single row visible in the filter context, and hence the MAX of the visible rows is 3.  I would of course get the same result if I used MIN, AVERAGE, SUM or indeed even VALUES in my harvester measure.

Filter context is one of the hardest things for new users to understand properly.  I explain filter context in detail (and everything else you need to know to be great at using Power Pivot and Power BI) in my book “Learn to Write DAX”.

You can see the harvester measure in action below.  I have placed the column from the MeasureTable into a slicer and the harvester measure into a Matrix.  When I click on the slicer, the harvester measure updates to show the ID of the selected filter.

slicer

Add the Measure Column to the Matrix

There is more than 1 way to filter a table of course.  In the image above I am using a slicer, but I can also use the rows on the Matrix to provide filter context.  When I place the Measure column from the MeasureTable onto rows in the Matrix, you can see below that the harvester measure is correctly returning the value of the ID column to the visualisation.

image

Pulling it All Together

In order to pull this all together, I replaced the harvester measure with the Switch measure I wrote earlier, and put the Categories column from the Products table onto columns in the Matrix.

image

But there are now 2 problems.  The first problem is that the Total Invoices are shown in currency format.  One limitation of this trick is that all the measures must have the same number formatting.  If you would like to see an optional “Alternative Format” for the SWITCH measure, please vote for that idea here.

The second issue is that the total row is actually showing the total invoices and not the total of all the measures.  This makes sense of course because the grand total row is not filtered.  As a result, the MAX of ID is 4 and hence the SWITCH measure is returning the value of the measure that maps to ID 4 (Total Invoices).

To fix these 2 problems, I removed Total Invoices from my table and rewrote the SWITCH statement to correctly handle the grand total row.
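One common way to do this is to blank out the grand total with HASONEVALUE – a sketch, again with placeholder names for the first two base measures:

Measure Value =
IF (
    HASONEVALUE ( MeasureTable[Measure] ),
    SWITCH (
        MAX ( MeasureTable[ID] ),
        1, [Total Sales], -- placeholder base measure
        2, [Total Cost], -- placeholder base measure
        3, [Total Margin]
    )
)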

And there you have it, Measures on Rows in a Power BI Matrix.

image

Power Query Over a Command Screen Output File

Level: Intermediate Power Query.

I spent much of last week helping a client configure Power BI in preparation for go-live.  One of the important things to do when designing a Power BI solution is to make sure you have a good design for your user security access.  Today I am going to share the approach I used to track user security configuration with Active Directory Groups.

This post is more about Power Query than it is about Active Directory, so even if you don’t have this specific scenario, it is still worth reading this blog post to see how I dealt with some tricky issues in cleaning data that was extracted into a text file using Power Query.

Using Active Directory Groups

I don’t want to go into a lot of detail here, but in short, we set up dashboards and reports targeted at different groups of users.  Rather than assigning users access directly to the dashboards in Power BI (by adding their email addresses one by one), a much better solution is to create an Active Directory Group, assign the users to that group using the normal business process, and then assign the AD group to the dashboard in Power BI.

image

Note that you must assign an email address to the Active Directory group to be able to do this.

The benefit of this approach is that you can have a single corporate process for assigning user access, and this is normally Active Directory (for medium to large companies).

The Problem – How to Keep Track of Which Users Have Access

If you are an Active Directory Administrator, then I am sure you have all sorts of tools to work out who has what access.  But let me tell you – I don’t have those tools (or don’t know about them anyway), and I quickly found out it was a pain to try to check user access when you don’t have the right tools.

Command Line Tricks to Check Users in an AD Group

A quick question around the office, and Lester told me how to run a command line request to find out which users are in a group.  To do this:

  • Open a new command prompt on your PC (that is connected to the company network of course).
  • type the following command
net group /domain "GROUP NAME IN INVERTED COMMAS"

The above command line string will send a list of all users assigned to the group to the screen, as shown below.

image

Checking One Group at a Time is a Pain too

This was great, but soon I realised I wanted the results for ALL the groups at once so I could look at the data in Excel.  I didn’t want to go to the command line each time, so I developed the following process to create a simple-to-use Excel file.  This is a summary of the steps I took to achieve this outcome.

  1. Created a batch file that ran the command line code for all the groups I needed to interrogate (one after the other), and then sent the results of each of these commands to a single text file.
  2. Imported and cleansed the data from the file into Excel using Power Query
  3. Set up some slicers on top of my final Excel table to make it easy to access the information I needed.

This is what I ended up with

ad groups

Below I step through the details on how to complete each step to achieve this final report.

Step 1.  Move the Manual Code into a Batch File

This is easy, even if you haven’t done it before

  • Open notepad.exe and paste the first line of code that I showed above.
  • I appended some more code at the end of the line to tell the command line to send the results to a file (instead of sending the results to the screen).
    Note I added a single > symbol pointing to the name of the output file in the first line of code below.
  • Then I duplicated the first line of code multiple times (cut and paste), once for each group I needed – then changed the group name of course.  Also note that the > symbol is a >> for lines 2+ in the batch file (> means send the data to a new file, >> means append the data to an existing file).  I had many more groups in real life.
net group /domain "FINCUBE GROUP" > c:\users\allingtm\Documents\list.txt
net group /domain "FINCUBE ALL" >> c:\users\allingtm\Documents\list.txt
net group /domain "FINCUBE ADHOC" >> c:\users\allingtm\Documents\list.txt

Now save the text file with a *.bat extension in an easy to find location (note, first change the file type filter in Notepad to be *.* instead of *.txt).  From the command line, execute the batch file by typing the name of the file (including the full path and excluding the bat extension).  The command line will then chug away and produce the output file with all the Active Directory information needed.  This is what the final file output looks like in a txt file (sample of test data only of course).  In my real scenario there were many more names and many more groups which is of course exponentially harder to manage manually than this simple test data.

image

Step 2. Import into Excel Using Power Query

I am not going to go through every step I used in Power Query one at a time, as this is an intermediate post.  I have however attached the workbook at the bottom of the page – download it and take a look if you want more details about each step in the transformation.  I am just going to call out some of the more interesting things I had to do to get the final output.

The Default File Output is not Very Excel Friendly

The first thing to note is that this output is great for a command line screen, but not very good for Excel.  So you have to deal with these issues in Power Query the best you can.  It is easy to deal with blank lines and dotted lines etc with a simple filter.  Just deselect anything that is not of value for the task at hand as shown below.

image

All of the lines I have deselected (shown in the image above) are repetitive lines that are created for each of the AD Group commands in my batch file.

Once I had done this, I had a single column of text data as shown below.  Note the names of the groups appear first, followed by one or more rows of user names.  This pattern is then repeated down the page.  Each row of names contains 1 or more user names – tricky!!

image

I then added a new custom column in Power Query by hand-writing an IF formula.  An IF formula in Power Query Language (PQL) has the format

if something1 = something2 then something3 else something4

Note PQL is case sensitive for the formulas and the text it is looking at.

See my formula below along with the output in a new custom column.

image
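My formula was along these lines (a sketch – the source column name is a placeholder, and remember the comparison text must match the case of the data exactly):

if Text.StartsWith([Column1], "Group name") then [Column1] else null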

Basically I am checking the first column to see if the row is a Group Name or a list of User Names.  If it is a Group Name, then return that Group Name; if it is a User Name, then return null.  Returning null is important – it makes it easy to fill the Group Name down the list, as shown below.

image

Now that every row has a group name as part of the row record, it is OK to go ahead and delete every row where the first column starts with the text “Group name”.  I did this with a filter rather than manually selecting each row – that way it is future proof if I add new group names down the track.

image

Remember everything is case sensitive, so typing “group name” would fail here as my data uses “Group name”.

Split the User Names into Columns

This is what I have now (after some reorg).  Time to split the users into columns.

image

This looked easy, but there was a surprise waiting for me. When I split on “space” repeatedly, I discovered the number of spaces between each name varied by line – anywhere from around 17 spaces up to about 21 spaces in total.  This stumped me – how was I going to iterate over this file and remove a seemingly random number of spaces to parse out the names?  I gave up and went to lunch.

During a nice kebab with my family, I had a brainwave – I wondered if the file might be a fixed-width file format, adding more or fewer spaces depending on the number of characters in each user name.  I then remembered an article (by Chris Webb I think) about the new monospaced font feature in Power Query – just what I needed.  As soon as I got home, I found the setting for the monospaced font and turned it on – BAM!  My hunch was correct.

image

It wasn’t until later that I realised that I should have known this from looking at the original text file – oh well.

I then selected one of the cells of user names (shown as 1 below) and then selected the results pane (shown as 2 below).  Once the cursor is in the results pane, I used my arrow keys to move the cursor left and right to count how many characters made up each “column” of user names (25 as it turned out).

image

So I applied a “split column by number of characters” step as shown below.

image
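In M, the generated step looks something like this (a sketch – the column name, the 25-character width and the number of output columns all depend on your file):

Table.SplitColumn(#"Previous Step", "Users", Splitter.SplitTextByRepeatedLengths(25), {"User.1", "User.2", "User.3"})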

I then un-pivoted the resulting user name columns, trimmed out the trailing spaces, removed the columns I didn’t need and did a general clean up.  This is what I then sent to a table in Excel.

image

The last thing I did was add a slicer for each of the columns.  This gave me a very usable tool where I could select either a group or a user name, and see the relationship between the two in the resulting filtered table.  See below.

ad groups

And given I had set this up using a batch file and a Power Query report, the process is infinitely repeatable with very little effort.  20 seconds to re-run the batch file, 40 seconds to refresh the Power Query workbook, and in as little as 1 minute I can have a refreshed set of data.

You can download the sample text extract, the sample batch file and my Power Query workbook here if you would like to look in more detail at the process.

LASTNONBLANK Explained

Level: Intermediate

Last week at my Sydney training course, one of the students asked me a question about LASTNONBLANK.  This reminded me what a deceptively tricky function LASTNONBLANK is.  It sounds like an easy DAX formula to understand, right?  It just finds the last non-blank value in a column – easy, right?  Well, it is a little bit trickier than that, and I am going to explain it all in this post today.

LASTNONBLANK has a sibling FIRSTNONBLANK that operates in exactly the same way but in reverse.  Given the behaviour is the same, I won’t cover FIRSTNONBLANK at all, as you can work it out after reading about LASTNONBLANK.

Syntax

The syntax of LASTNONBLANK is as follows.

LASTNONBLANK(Table[Column],<expression>)

It takes 2 parameters: 1) a column and 2) an expression.  In fact you can also pass a single-column table as the first parameter.  This could be useful, for example, if you wanted to create your own table on the fly using a function that returns a table, and then use that as the first parameter.

LASTNONBLANK is an Iterator

It is not immediately obvious, but LASTNONBLANK is actually an iterator.  It iterates over the column specified in the first parameter and finds (unsurprisingly) the last value in that column that is not blank.  (Technically the engine carries out this task in reverse natural sort order for efficiency.)

But what is that pesky second parameter?

The first thing people normally find confusing about LASTNONBLANK is that pesky second parameter.  If all you want to do is find the last non-blank value in a column, why do you need this second parameter?  Well in short, you don’t “need” it in that use case (except that it is mandatory).  But having this parameter makes the formula much more powerful, so it is good to have it as an option. It is designed to allow you to write a DAX expression such as a measure to test for some condition or value.  Ideally (in my view) Microsoft should have made this an optional parameter, but that is not what we got.  There is an easy workaround that makes the parameter behave as if it were optional – just use the value 1 as the second parameter, like this.
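For example (a sketch – Data[Date] is a placeholder for your own table and column):

LASTNONBLANK ( Data[Date], 1 )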

LASTNONBLANK iterates through the Table[Column] and then checks to see if the second parameter has a value.  The number 1 always has a value of course, so placing 1 as the second parameter has the same effect as just ignoring this parameter.  Instead the formula will just return the last non blank value it finds in the column.

LASTNONBLANK actually returns a TABLE

Another thing that is not immediately obvious is that LASTNONBLANK actually returns a TABLE.  It is a special single-column, single-row table, which by definition can contain only 1 value.  One feature of this single-row, single-column table is that you can use it as both a scalar value and a table in DAX.  More on that later too.

Test Data

My test data for this blog post is monthly bank account balances for 2 fake bank accounts.  Measures that calculate account balances in DAX are often described as “semi-additive measures”, because you can’t just add up the values from each month to get the current balance – you need to find the latest balance to work out how much you have.  This type of data is a prime candidate to use the LASTNONBLANK formula, because LASTNONBLANK does exactly what we need to work with semi-additive measures.

I have set up the following data with some interesting features to demonstrate the behaviour of LASTNONBLANK.

image

It is easiest to see what I have done with this test data in the pivot table below.  I have written the following measure to demonstrate the point.
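The test measure is a simple sum of the balances (a sketch; the table and column names are assumptions):

Sum of Balance = SUM ( Data[Balance] )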

Note the above measure is a test measure only to help with this blog – it doesn’t really make any sense as is, but it is useful for describing what is happening.  When the above measure is placed in a pivot table, it looks like this (shown below).

image

Note a couple of things about this data.

  1. Kathy’s account doesn’t have a result for May
  2. Matt’s account doesn’t have a result for July
  3. The maximum value for Kathy’s account is in June (the previous month’s data load)
  4. The maximum value for Matt’s account is back in May.

I have set the data up this way to demonstrate the behaviour of LASTNONBLANK.  I often talk about how important it is to set up test data that will flush out issues when you write your formulas.  This is a good example of that, and it will make more sense as you read on.

Last value in a Column

Now the objective is to use LASTNONBLANK to find the last value in a column.

Consider the following formula.
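A sketch of the measure (assuming the date column in the data table is Data[Date]):

Last Non Blank Date = LASTNONBLANK ( Data[Date], 1 )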

This formula finds the last date in the data table (note it is not the last date in the Calendar table, but the data table).

The following pivot table has the Month name from the calendar table and the account names on pivot table columns.

image

Note that the formula correctly indicates that May data is missing for Kathy, and July data is missing for Matt.  Also note that it correctly gives the last date in the Grand Total of the pivot table.  (You would also get the same result as above if you used LASTDATE instead of LASTNONBLANK).  I have only used LASTNONBLANK here to demonstrate the behaviour.

LASTNONBLANK operates over a sorted column

The [Last Non Blank Date] measure above hides some complexities about LASTNONBLANK.  You might expect that LASTNONBLANK finds the last value in the column, but that is not how it works.  It actually finds the last value in a sorted column using the natural sort behaviour of the column data type.  In the case of a date column (like shown above), then everything is sweet – the natural sort order is also the order we normally load the data (chronological order).  But in the case of the balance column, the natural sort order of a numeric column is numerical sort order, not the order the data is loaded (as you can see in the following pivot)

I have written another test measure as follows
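This one operates over the numeric balance column instead (a sketch, with assumed names):

Last Balance = LASTNONBLANK ( Data[Balance], 1 )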

When placed in the pivot table, you get the following behaviour

image

In the above example, you may expect the Grand Total row to return the values 2,125 for Kathy and 1,557 for Matt as these are the last values you loaded, but that is not how it works.  When looking at the individual rows in the pivot table, it all works fine – it correctly finds the last balance for each month.  This measure works correctly on the rows in the pivot table because the rows provide “initial filter context” prior to the measure being evaluated.  So there is only ever 1 row in the data table at the time the measure is evaluated, and that is why it works in this case.  But in the Grand Total row in the pivot table, there is no initial filter context applied to the date, hence all values in the column are iterated over by LASTNONBLANK.  The iteration operates over a sorted version of the balance column (the natural sort order of the column, which is numeric in this case).  Once the balance column is sorted, then 2,200 will be the last value in the column for Kathy, and 1,806 will be the last value in the column for Matt, and that is the result that is returned.

LASTNONBLANK as a TABLE

In this next test measure, I have used LASTNONBLANK as a filter input into a CALCULATE function
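A sketch of the measure (named [Latest Balance with 1] below); iterating the Calendar date column is my assumption based on the explanation that follows:

Latest Balance with 1 =
CALCULATE (
    [Sum of Balance],
    LASTNONBLANK ( 'Calendar'[Date], 1 )
)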

You should remember that CALCULATE can take a simple filter such as Table[Column] = “some value”, or an advanced filter where you must pass a TABLE as the filter.  The measure above therefore suggests that LASTNONBLANK must return a table, not a scalar value.  This theory can be tested by firing up DAX Studio and executing the LASTNONBLANK portion of the formula as a query.  DAX Studio ONLY ever returns tables – it can’t return scalar values – so if the query works, it confirms that LASTNONBLANK returns a table. Note: you can read more about using DAX Studio as a tool to help you learn DAX at this blog post here.

image

You can see above that LASTNONBLANK returns a single column, single row table that contains a single value.  It is a special table because it can also be used as a scalar value, in the same way that you can use VALUES() as a scalar value if there is only a single row returned.  In the case of the measure [Latest Balance with 1] above, I am using LASTNONBLANK as a table parameter filter inside CALCULATE.

But how does this table act as a filter?

One feature of temporary tables in DAX is they retain a link to the data model, and filter propagation will work between the temporary table and the rest of the data model.  You should visualise this in your mind like the image below.  Imagine a new table that has spawned into the data model and has a relationship to the table where it came from (in this case it is the data table).  The filter propagation flows from the 1 side of the relationship (temporary table imagined as shown in 1 below) to the many side of the relationship (table 2 below).

image

So you can see above that the LASTNONBLANK function produces a table that then filters the data model based on the single value it returns in the table.

When I put this measure in a pivot table, I get this result

image

Note that the measure is correctly returning the last value for the account “Kathy” but it is returning a blank for account “Matt”.  Technically this is correct because the calendar has dates in July and there is no entry for Matt for July, so the last value is blank.

Enter the second parameter

Now it is time to change the second parameter with the following measure.
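A sketch of the updated measure – the second parameter is now the [Sum of Balance] measure:

Latest Balance =
CALCULATE (
    [Sum of Balance],
    LASTNONBLANK ( 'Calendar'[Date], [Sum of Balance] )
)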

When this last measure is added to a pivot table, it works as desired, correctly returning the last non-blank value from the data table based on the chronological order of the data.

image

This new LASTNONBLANK measure returns the last non-blank value it finds in the column, provided the measure [Sum of Balance] also returns a non-blank result for that date.

It is worth pointing out here that the ONLY REASON this formula works is because the measure [Sum of Balance] has an implicit CALCULATE wrapped around it – you can’t see it, but it is there.  LASTNONBLANK is an iterator, and like all iterators it creates a row context, not a filter context, so the expression in the second parameter needs a CALCULATE to force context transition.  Therefore the following formula will not work.
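Something like this (a sketch – the raw SUM in the second parameter has no CALCULATE around it, so no context transition occurs during the iteration):

Latest Balance Broken =
CALCULATE (
    [Sum of Balance],
    LASTNONBLANK ( 'Calendar'[Date], SUM ( Data[Balance] ) )
)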

But this next measure does work (it is the equivalent of the formula that uses [Sum of Balance], because [Sum of Balance] has an implicit CALCULATE).
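A sketch of the working equivalent, with an explicit CALCULATE added around the SUM:

Latest Balance Works =
CALCULATE (
    [Sum of Balance],
    LASTNONBLANK ( 'Calendar'[Date], CALCULATE ( SUM ( Data[Balance] ) ) )
)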

I cover evaluation context and context transition in detail in my book, Learn to Write DAX.

Further reading/references

Here are some good links that helped me learn how LASTNONBLANK works, and how it can be used/leveraged if you are interested in doing some more reading on this topic.

www.sqlbi.com/articles/semi-additive-measures-in-dax/

www.sqlbi.com/articles/alternative-use-of-firstnonblank-and-lastnonblank/

www.powerpivotpro.com/2012/06/top-selling-product-using-firstnonblank/

A Double CALCULATE Solves a SUMX Problem

Level: Intermediate

I helped a member at powerpivotforum.com.au with a problem last week that ended with an interesting solution.  The explanation of how it worked was a bit complicated and worthy of sharing, hence this is the topic of today’s post.

Count the Working Days Between Two Dates

The requirement was to count the working days between the arrival date (say of a request) and the approval date of that request.

I set up a simple data model as follows

Data Table

The data table has a unique identifier (ID), a received date and an approved date.

image

Calendar Table

I created a simple calendar table that has just 2 columns; a Dates column and a flag to indicate if each date is a working day.  I used 0s and 1s in this second column so that it would be easy to add up this column to calculate the total working days.

image

Table Structure

I then created a single join between the calendar date and the received date.  Technically I could have created the join on the Date Approved column instead; however it is possible (even likely) that some requests will be “open” and hence will not have an approved date at a given point in time.  For this reason I have used the Date Received column to make the join.

image

Calculated Column or Measure?

There are many ways to write a formula that will work.  Of course it is possible to write a calculated column in the data table that contains the total business days for each request.  This would not be a complete disaster in this case as the cardinality of this column is likely to be relatively low.  However this is not the best approach for a number of other reasons, the main one being that “Measures” are the turbo charged super tools of Power Pivot (read all the reasons here).  You should invest time to learn how to write Measures, even (especially) when it is hard. In doing so you will build your skills and depth of understanding of how to use and leverage Power Pivot.

If you laze about in calculated columns world, you will never develop your skills enough to be really good at DAX.

Part Solution:  A Simple SUM Measure

The first part of the Measure I wrote to solve this problem is as follows:

image
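A sketch of that measure (table and column names are assumed), laid out so the line references in the explanation below line up:

Working Days =                                                   -- line 1
CALCULATE (                                                      -- line 2
    SUM ( 'Calendar'[Is Weekday] ),                              -- line 3
    FILTER (                                                     -- line 4
        ALL ( 'Calendar'[Date] ),                                -- line 5
        'Calendar'[Date] >= MAX ( Data[Date Received] )          -- line 6
            && 'Calendar'[Date] <= MAX ( Data[Date Approved] )   -- line 7
    )                                                            -- line 8
)                                                                -- line 9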

When reading a CALCULATE formula, you always start with the second parameter – in this case the FILTER function (lines 4 through 8).  FILTER is an iterator, and in this case it is iterating over the Calendar table AFTER any existing filters on the Calendar table have first been removed (line 5).

FILTER works through each row in the Calendar table and checks whether the date of each row is greater than or equal to the maximum Date Received (in the current filter context) and also less than or equal to the maximum Date Approved (also in the current filter context).  The FILTER function keeps the rows in the Calendar table that pass this test, and then CALCULATE (line 2) applies this filter prior to adding up the week days (line 3).  Actually, this FILTER function is not very efficient – a better solution is provided below.

It is easier to understand how the formula works with an example, so read on.
L2WD banner ad

Worked Through Example in a Pivot Table

Look at Request ID 2 in the pivot table below (the second row in the pivot). Pivot tables naturally provide initial filter context to the data model – that’s what they do; that’s why they are so great. 

image

When looking at Request ID 2 in the above pivot table, the data model has an initial filter context so that only a single row in the Data table is “visible”.  It is possible to simulate this initial filter context into the Data table by physically applying a filter to the source table (like shown below ID = 2).  You can do this in the Power Pivot window, or in this case I have used the linked table in Excel to simulate the filter as shown below.

image

When you simulate the initial filter context like this, it is much easier to understand how the formula works.  Referring back to the formula (lines 6 and 7), you need to ask yourself “What is the MAX value of Date Received in the current filter context?”.  The answer is clearly 4th Jan and the maximum of Date Approved in the current filter context is 9th Jan.  Once you know the answer to the MAX parts of this FILTER function (lines 6 & 7), it is then a lot easier to understand what the FILTER function is actually doing.  Remember line 5 of the formula first removes all filters on the Calendar[Date] column (in this example there actually are no filters on the calendar table as there are no columns from Calendar in the pivot).  Then FILTER checks each row of the Calendar table to see if the date in each row is >= 4th Jan and also <= 9th Jan.  FILTER keeps all rows in the Calendar table that pass this test.  Note I could have used MAX or MIN or even VALUES to “harvest” the value in the current filter context.  In some cases you can even use SUM or AVERAGE – it really depends on your data and what you are trying to do.  In short, the Aggregation functions and VALUES when used this way will “harvest” the value in the current filter context – very useful.

It is possible to simulate the results of this FILTER function on the Calendar table the same way as I showed with the Data table above.  In the image below, note how I have used the standard Excel table filter feature to simulate the FILTER function in DAX.  After applying the filters in Excel, there are 6 rows left in the table, and 5 of those rows have an “IS Weekday” value of 1.

image

So when SUM in the formula kicks in, it will return the value 5 – exactly the value shown in the pivot table above.

Note how much easier it is to understand and learn how these functions work when you simulate what is happening in the actual tables? This is a great way to learn to “think in tables”.

The First Problem with this Partial Formula

You may have noticed that there is a problem with the partial formula shown above.  Referring back to the pivot table (shown here again), see how the Grand Total row is blank?

image

What is really needed is for the Grand Total row to be the sum of the values shown in the rows of the pivot table, ie the Grand Total should be 10.  To understand why the Grand Total is blank, you have to go through the same process I walked through above.  In short, the initial filter context of the Grand Total in the pivot table is “completely un-filtered”.  So MAX of Date Received is 15th Jan and MAX of Date Approved is 11th Jan.  If you simulate this by applying these filters to the Excel Calendar table (like before), there are no rows in the Calendar table that pass the test.  The Calendar table is therefore completely filtered of all rows – none is visible – and hence the SUM formula returns BLANK (note it is BLANK, not 0).

A More Complete SUMX Solution (Still Not Optimal)

Whenever you get this “Grand Totals don’t add up” problem, you should think of SUMX as the solution. I like to think of SUMX as a way to “simulate the natural filtering behaviour provided by the rows of the pivot table”.  Each row of the pivot table provides initial filter context for one record in the Data table, and hence the formula works at row level.  But no such Data table filtering is provided by the Grand Total row, and hence the formula doesn’t work at Grand Total level.  I cover this SUM vs SUMX behaviour in more depth in a blog post here.

Here is a formula that you may think should work, but it actually doesn’t work at all.

image
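A sketch of that (non-working) formula, laid out to match the line references below:

Working Days SUMX =                                              -- line 1
SUMX (                                                           -- line 2
    Data,                                                        -- line 3
    CALCULATE (                                                  -- line 4
        SUM ( 'Calendar'[Is Weekday] ),                          -- line 5
        FILTER (                                                 -- line 6
            ALL ( 'Calendar'[Date] ),                            -- line 7
            'Calendar'[Date] >= MAX ( Data[Date Received] )      -- line 8
                && 'Calendar'[Date] <= MAX ( Data[Date Approved] ) ) -- line 9
    )                                                            -- line 10
)                                                                -- line 11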

In the above formula, SUMX iterates over the Data table (line 3).  For each row in the Data table, the formula lines 4 through 10 are executed.  Intermediate DAX users can be forgiven for thinking that Context Transition occurs because of the CALCULATE function on line 4 – but in this case it does not happen.   Remember earlier in the post I explained that the second parameter of CALCULATE is the first part of the formula to be executed.  So in the above formula, the FILTER portion (lines 6 through 9) is executed in the initial filter context without any context transition. The CALCULATE isn’t executed until after the FILTER portion is complete.  That is why CALCULATE in this case does not force Context Transition and that is why this formula doesn’t work.

Here is a formula that does work (Still Not Optimal)

image
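A sketch of the corrected formula, again with approximate line references:

Working Days SUMX 2 =                                            -- line 1
SUMX (                                                           -- line 2
    Data,                                                        -- line 3
    CALCULATE (                                                  -- line 4 (outer)
        CALCULATE (                                              -- line 5 (inner)
            SUM ( 'Calendar'[Is Weekday] ),                      -- line 6
            FILTER (                                             -- line 7
                ALL ( 'Calendar'[Date] ),                        -- line 8
                'Calendar'[Date] >= MAX ( Data[Date Received] )  -- line 9
                    && 'Calendar'[Date] <= MAX ( Data[Date Approved] ) -- line 10
            )                                                    -- line 11
        )                                                        -- line 12
    )                                                            -- line 13
)                                                                -- line 14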

Note in this new correct formula the addition of a second CALCULATE on line 4.  This outer CALCULATE only has a single parameter – the inner CALCULATE spanning lines 5 through 12.  Because the inner CALCULATE is the first parameter of the outer CALCULATE, the outer CALCULATE is executed first (if the inner CALCULATE were instead the second parameter of the outer CALCULATE, the inner CALCULATE would be executed first – confusing, hey!).  The outer CALCULATE forces context transition onto the Data table for every iteration step in the SUMX function before the inner CALCULATE is executed.  This context transition in effect “simulates” the row by row behaviour of the original pivot table, regardless of whether there is an ID in the filter context or not.  This is why it works at the Grand Total level too.

works

The Optimal SUMX Formula

I mentioned above that the previous formulas are not optimal. The issue is that the Vertipaq engine in Power Pivot is a Columnar Data Store.  That means data is compressed and stored in Columns – not Tables.  The most efficient way to access the compressed data is one column at a time.  In the formulas above, the FILTER function is operating over 2 separate columns in the same table – this is not optimal – in fact it can be very inefficient on large tables.  In order for FILTER to do the comparison on 2 separate columns in the same table, Power Pivot needs to decompress the data – this is slow and inefficient.

So a better formula is this.

best
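A sketch of the optimal version – note lines 7 and 8 are now 2 independent, single-column FILTER functions:

Working Days Optimal =                                           -- line 1
SUMX (                                                           -- line 2
    Data,                                                        -- line 3
    CALCULATE (                                                  -- line 4
        CALCULATE (                                              -- line 5
            SUM ( 'Calendar'[Is Weekday] ),                      -- line 6
            FILTER ( ALL ( 'Calendar'[Date] ), 'Calendar'[Date] >= MAX ( Data[Date Received] ) ), -- line 7
            FILTER ( ALL ( 'Calendar'[Date] ), 'Calendar'[Date] <= MAX ( Data[Date Approved] ) )  -- line 8
        )                                                        -- line 9
    )                                                            -- line 10
)                                                                -- line 11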

In this final formula, lines 7 and 8 are 2 independent FILTER functions each operating over a single column in the Data table. With this structure, the Vertipaq engine can complete each operation (independently) really fast, and without decompressing the data.

This final formula is fine over a relatively small data table, but would still not be efficient over a very large data table.  What is “large” depends on your computer.

You can download the sample workbook here.

I will cover a more detailed description of context transition in a future post.

If you like the way I explained this problem and you are a starting out on your formal DAX learning, you may like to consider reading my book “Learn to Write DAX” for a complete and structured way to build your DAX skills.  If you already have a solid understanding of DAX and want to go to the next level, then you should consider the websites, books and training courses from sqlbi.com   I have learnt most of my advanced understanding of DAX from Marco and Alberto using these methods.

Power BI Personal Gateway Explained

One of the many excellent sessions I attended this week at the PASS Business Analytics Conference in San Jose was a session titled “Get Latest Insights by connecting your data using Power BI Content Packs and PBI Gateways”.  The title was interesting, but the content presented by Dimah Zaidalkilani and Theresa Palmer-Boroski (both Program Managers on the Power BI team at Microsoft) was truly excellent.  I am going to share here what I learnt about the Power BI Personal Gateway, to help you understand if you need it, and if so how to get it installed and running.

There are actually 2 gateways that Microsoft currently has available for Power BI, the other being the Enterprise Gateway (for companies and the like).  I will not be covering the Enterprise Gateway in this post.  You would generally look to use the Enterprise Gateway if you refresh data that sits inside a corporate network and need to connect through the firewall.

What is Power BI Personal Gateway?

As the name suggests, this gateway is for “personal” use.  To use it you simply install it on your own laptop, desktop computer or even a server if you want.  “Personal” does not mean you can’t use it for business; it is just a term to distinguish it from the other gateway.  What the Personal Gateway does is create a “gateway” or connection between your computer (that has the software installed) and the Power BI Service in the cloud.  It is kind of like a dedicated VPN designed just to allow Power BI to talk to the data on your computer.

Why do I need it?

Well you probably don’t “need” it, but you certainly might “want” it.  Once you have installed and configured the Personal Gateway on your computer, you are then able to refresh your Power BI datasets directly in the cloud.  You have the option to either set up an automated schedule (up to 8 times per day) or you can trigger the refresh manually on demand.

When you refresh a Power BI Desktop file on your PC, Power BI Desktop connects to all the data sources it needs to refresh the report. Data sources can include things such as SQL Server, local Excel files, CSV files, or what ever you have set as your data sources for your reports.  The Power BI Personal Gateway simply allows PowerBI.com to access those same data sources that are visible on your PC transparently over the Internet.

The Regular Refresh Process

If you are anything like me, your standard refresh process for Power BI workbooks will go like this.

  1. Open the Power BI workbook on your PC via Power BI Desktop
  2. Click on the refresh button
  3. Wait 1 to 10 minutes for the workbook to refresh while doing something else.  30 minutes later you realise the refresh finished 20 minutes ago.
  4. Save and then “Publish” the workbook to PowerBI.com via the menu.
  5. You then get 1 or more prompts about where to load the workbook and also “do you want to replace the existing version?” (seriously – what else would I want to do?).
  6. Repeat steps 1 through 5 over and over again for each workbook you have developed.

The Power BI Personal Gateway Process

Option 1:

  1. Configure a daily or weekly refresh once and then do nothing but sit back and relax.

Option 2:

  1. Log on to the Power BI Service
  2. Click on the ellipsis next to each dataset and click “refresh” for each workbook you need to refresh.  There is no waiting, and the refreshes can be triggered in parallel (whether you should or not will depend on your computer and Internet connection).

Things You Should Know

There are a couple of things you should be aware of.

  1. Gateways are part of the paid subscription to PowerBI.com.  If you currently don’t have a paid subscription then you can’t use them.  Everyone has to decide for themselves if it is worth what it costs.  Keep in mind there are other benefits of the paid subscription, not least of which are “content packs”.  I will come back and talk about that another time.  Also keep in mind that Microsoft is currently not invoicing personal users for the professional version unless they are part of some broader corporate licensing agreement.  In my experience you will be given a 60 day free trial, and after 60 days you will be given a further 60 days free.  I guess this will end one day, but I haven’t seen the end yet. If it still isn’t right for you, consider purchasing Power Update as a local refresh tool.
  2. Your computer needs to be turned on and connected to the Internet at the time of the refresh if you want it to work.  Computer servers are “by design” always on and hence it is not a problem.  But your laptop or personal computer may not be on all the time and hence you need to keep this in mind.

How to Install the Power BI Personal Gateway

Installation couldn’t be easier.  Go to powerbi.microsoft.com/en-us/downloads/ and find the Personal Gateway download.  There is a link to the downloads page on the PowerBI.com  home page before you log in.

image

Launch the software, follow the steps in the install wizard, and you are done.

Set your Data Source Credentials

The only thing that wasn’t obvious to me when I started using the Personal Gateway was that you need to manually set the data source credentials for each dataset you want to refresh through the gateway.  The first time I realised this needed to be done was when I got the following error message.

image

Clicking through the error took me to the settings section.  You can get to this section at any time from the regular “cog” settings menu.

image

For each data source in your datasets, you will have to “edit credentials” and then tell Power BI what the connection authentication method will be.

image

Here is an example of the dialog from one of my workbooks that has a SQL connection.

image

The dialog for regular PC files is very similar

image

Once it is done, you should see a confirmation message something like this for each connection.

image

Time to Refresh the Datasets

In PowerBI.com, click on the ellipsis next to one of your datasets, and then you can either set up your Scheduled Refresh settings (2 below) or manually refresh the report on demand (3 below).

image

If you select Schedule Refresh above, you will be taken to the settings panel (also clickable from the cog in the top right of the page).  From there you can set up the refresh schedule for all of your loaded datasets.  You may choose to stagger the refresh times across the data sets so that they don’t all hit your computer at once.  You can have up to 8 refresh events each day per dataset.

image

I hope you have found this article interesting.  I would love to hear from readers as to why they have decided to use or not use the Personal Gateway, so please share your story in the comments section below.

If you are looking for more detailed information, you can read about it at the official Power BI site.

Banding in DAX

Level: Intermediate

As regular readers of my blog would know, I enjoy helping people on forums who need help with DAX, Power Query or Power BI.  Recently I have answered quite a few questions for people that could benefit from a DAX “Banding” solution and hence I have decided to write this post as a permanent link to the topic.  I first learnt about the DAX banding technique from Alberto Ferrari at sqlbi.com

What is Banding?

Banding is a technique that allows you to “group” or “stratify” the data in a table into “bands” (it is sometimes called cohort analysis).  So let’s say you have a table of customers like this (shown below).  In my sample database I have 18,000 customers and I know the age of each one.

image

This provides an opportunity to analyse the sales data based on age so as to understand if people of different ages purchase differently.  Below is a pivot table that illustrates what is possible with the data.  But the problem is that there is far too much detail to be useful.

image

I don’t really want to target 35 year old customers differently to 36 year old customers.  Ideally I would like to group (or stratify) the customers into “bands” so that I can treat a group of similar customers the same way.  So I might want to group my customers like this.

image

Once I have grouped customers by age, I can then create a pivot table like the one shown below.

image

Notice how much easier it is to find insights into the data when there is less detail.  I can immediately see from the above pivot table that as people get older, they tend to be more interested in mountain bikes and touring bikes and less interested in road bikes (of course this is a fake database, so the insights are not necessarily realistic).

You will need a Calculated Column for Banding

I am a strong advocate for using measures (not calculated columns) whenever possible.  You can read more about that in my Knowledge Base here.  But in this instance, a measure is not going to work.  Measures can only be used in the values section of a pivot table – you can’t use them in the “filtering” areas like Slicers, Rows, Columns, and Filters.  In the pivot table above, the bands are used to slice the pivot table on rows, and hence a measure won’t work.

I Could Write an If Statement

So the objective is to create a calculated column in the Customers table that groups people into a stratified band based on age.  Any Excel user would know that you could write an IF statement for that (the IF syntax in DAX is very similar to regular Excel).  The DAX syntax for a single IF statement is as follows.
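Something like this (a sketch – the threshold and labels are placeholders):

Age Group = IF ( Customers[Age] <= 30, "30 or Under", "Over 30" )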

The DAX Calculated Column above will return one of two possible values depending on the age of the customer.  The complexity comes in when you want to have multiple conditions.  As I mentioned earlier in this post, there are 6 possible groups that I have in mind (shown here again).

image

To do this with IF statements, I would need to nest multiple IF statements inside each other as follows.
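A sketch of the nested version for 6 groups (boundaries and labels are placeholders):

Age Group =
IF ( Customers[Age] <= 20, "0 to 20",
    IF ( Customers[Age] <= 30, "21 to 30",
        IF ( Customers[Age] <= 40, "31 to 40",
            IF ( Customers[Age] <= 50, "41 to 50",
                IF ( Customers[Age] <= 60, "51 to 60", "61 plus" )
            )
        )
    )
)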

Do you see the problem?  This Calculated Column works, but it is very hard to write, read and modify.  And more importantly, there is a better way – use the banding technique.

Process for Creating Banding

The better way involves the following process.

  1. Create a table in Excel that contains the group names and the lower/upper age limits
  2. Load the table to Power Pivot
  3. Do not connect the table to any other tables – it is a disconnected table
  4. Write a DAX Calculated Column that compares the age of each customer against the lower/upper limits in your disconnected table and returns the single row from the banding table that matches for each customer.

Here is the detail of how to do it.

Create a Table in Excel

Here is what a table in Excel looks like.

image

The key features to note are

  1. There is an ID column – this will be used to sort the Band column later
  2. There is a “Band” column – this is the label that describes each group.  Note the groups are mutually exclusive and collectively exhaustive (MECE).
  3. There are “from” and “to” columns that set the lower and upper bounds of each age group.  Note that the upper age for one group matches the lower age for the next group.  The DAX in the calculated column ensures there is no overlap.

Write a Calculated Column

The objective of the calculated column is to filter the disconnected Banding table so that 1 and only 1 row is “visible” or “unfiltered” for each customer.  Here is the formula
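A sketch of the calculated column (the AgeBands table and Band column are named in the post; I have assumed the lower/upper limit columns are called From and To):

Age Band =
CALCULATE (
    VALUES ( AgeBands[Band] ),
    FILTER (
        AgeBands,
        Customers[Age] > AgeBands[From]
            && Customers[Age] <= AgeBands[To]
    )
)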

The role of the FILTER portion of this formula is to filter the AgeBands table so only a single row is left unfiltered. The FILTER applies 2 rules to ensure the customer is older than the “from” value and less than or equal to the “to” value.  It is this “less than or equal to” portion that ensures no individual crosses over into multiple groups (mutually exclusive).

The VALUES function has a unique capability.  VALUES returns a single-column table.  If that table also has a single row, then VALUES converts the table to a scalar value and allows that value to be used as a result in a calculated column, or indeed in a pivot table cell.  Given the formula guarantees that 1 and only 1 row is left unfiltered in the banding table, VALUES will kick in, extract the label for the group from the AgeBands[Band] column, and place that value in the calculated column.

Sorting the New Calculated Column

There is one more problem to solve – the names of the new groups do not sort properly in the Pivot Table.  By default, columns of data will sort in Alphanumeric order.  So the labels will sort like this:

image

To solve this problem, I created a second calculated column that brings the sort ID into the Customers table.  It is simple to do this – just copy the formula from the first calculated column and then update the VALUES portion to use the “ID” column, as follows.
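A sketch of the second column – identical except that VALUES now returns the ID:

Age Band Sort =
CALCULATE (
    VALUES ( AgeBands[ID] ),
    FILTER (
        AgeBands,
        Customers[Age] > AgeBands[From]
            && Customers[Age] <= AgeBands[To]
    )
)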

Once this second calculated column has been created, it is possible to tell Power Pivot to use the new column of Sort ID values as the sort order for the Age Band column (shown below).

image

Compression of the Calculated Columns

As I mentioned earlier, I am not a big fan of calculated columns – except where they are truly needed.  There are a few guidelines about when it is OK to use a calculated column – you can read about them all in the link I provided to my Knowledge Base earlier.  The key points to note here are:

  1. These calculated columns are on the lookup table (Customers).  Calculated columns on lookup tables are generally OK.
  2. The calculated columns have a low cardinality (ie a low number of unique values).  This is also generally OK, even on a large data table if necessary.

Now to Use the New Calculated Column

Now that the Customers table has this new “Age Band” column, it is possible to use this column in visualisations.  It can be used on rows, columns, filters, and slicers.  Here is a rows example:

image

And a slicer example:

image

Learn to Write DAX


The banding topic is just one of the many techniques I cover in my book “Learn to Write DAX“.  If you are an Excel user and are learning to use DAX, my book gives a comprehensive coverage of all the topics you need to start your journey as a Power Pivot ninja.