Archive for Beginners

Relationships in Power BI and Power Pivot

Level: Beginners

Power Pivot is a database engine that has been designed from the ground up to be optimised for reporting and analysis.  It sits underneath Power BI, Power Pivot for Excel and SSAS Tabular, and the technology is essentially the same across all of these products, so I will generically refer to Power Pivot in this article.

Power Pivot uses an in-memory columnar database (Vertipaq) as the foundation technology – the same technology for all versions (Power Pivot, Power BI and SSAS Tabular).  The Vertipaq engine is what makes Power Pivot both super fast and highly compressed.  A Power Pivot database is not the same as a relational database (RDB) and it does not support all the relationship types that relational databases support.  This can be confusing for people who are new to Power Pivot, particularly if they have at least a basic understanding of how relational databases (such as MS Access) work.  I explain below what you need to know to get started with relationships in Power Pivot.

Virtual Relationships

This article is specifically about physical relationships, however there are ways to create virtual relationships using DAX.  Examples include using LOOKUPVALUE, FILTER, CROSSFILTER and other techniques.  I am not covering these types of relationships in this article.

Relationship Cardinality Types

There are 3 different possible physical relationship cardinality types in database design but not all are supported by Power Pivot.

Relationship Cardinality Type | Power BI Support | Power Pivot for Excel Support
One to Many                   | Yes              | Yes
One to One                    | Yes              | No
Many to Many                  | No               | No

One to Many Relationships

image

The one to many relationship is the foundation of Power Pivot.  In the example above (from Adventure Works in Power BI Desktop), the Customers table is on the 1 side of the relationship and the Sales table is on the many side of the relationship. These tables are joined using a common field/column called “CustomerKey”.  Customer Key (aka customer number) is a code that uniquely identifies each customer.  There can be no duplicates of the customer key in the customer table.  Conversely the customer can purchase as many times as needed and hence the customer key can appear in the Sales table as many times as necessary.  This is where the name “one to many” comes from – the customer key occurs once and only once in the Customers table but can appear many times in the Sales table.

Tables on the one side of the relationship are called Dimension tables (I call them Lookup tables) and the tables on the many side of the relationship are called Fact tables (I call them Data tables).

The entire Power Pivot Vertipaq engine is optimised to work with this (one to many) type of relationship.

One to One Relationships

The One to One relationship is only supported in Power BI and the newest version of SSAS Tabular.  In my view this relationship type has limited value and in most cases it is better to combine these tables into a single flat table prior to loading to Power BI.  Consider the model below.

image

The first relationship (shown as 1) is a 1 to many relationship between the Customer table (Lookup table) and the Sales table (Data table).  The Customer Socio Economic Data table is joined to the Customer table via a 1 to 1 relationship (shown as 2 above).  If there is a benefit (to the user of reports) of splitting this Socio Economic data into a separate table then of course you should do so.  If there is no benefit, I recommend you combine all the data from Customer Socio Economic Data table into the Customer table using Power Query on load.

Every relationship has a “cost” in that it will have some effect on performance.  The performance impact may not be noticeable in simple models but may become an issue with very complex models.

If you only remember 1 thing from this article, then please let it be this:  Don’t automatically accept the table structure coming from your source data.  You are now a data modeller and you need to make decisions on the best way to load your data.  Your source system  is probably not optimised for reporting (unless it is a reporting datamart) so please don’t assume that what you have got is what you need.

Many to Many Relationships

The many to many relationship type is not supported in Power Pivot.  This is a deliberate design decision that has been made as a trade off to ensure optimum database performance.  If you have data that is logically related with a many to many cardinality, there are modelling techniques you can use to solve the problem in Power Pivot that are covered in my many to many pattern article here.

UI Differences

There are a few differences between the relationship view in Power BI Desktop/Excel 2016 and that of Excel 2010/2013.

Excel 2010/2013

The early UI has an arrow pointing to the lookup table (the one side of the relationship) and a dot on the many side.  This is unfortunate as the arrow is pointing in the opposite direction of the filter propagation.  This only exists in Excel 2010/2013 (and the early version of SSAS Tabular).

Power BI/Excel 2016

The UI has been significantly improved with Power BI Desktop and Excel 2016.  As you can see below, the 1 to many relationship is now clearly shown, and there is also a new arrow showing the automatic filter propagation direction.

image

One Active Relationship

It is possible to have more than 1 relationship between tables in Power Pivot, but only 1 can be active at a time.  An example of when you may want multiple relationships is if you have a Sales[Order Date] and a Sales[Ship Date] in your data table.

In this scenario (shown above in Excel 2013) you may want to join both date columns to your Calendar table so you can use time intelligence in your data model on both Order Date and Ship Date.

The active relationship is shown as a solid line (above) and the inactive relationship is shown as a dashed line (in this case it is highlighted in blue above).  The active relationship is used by default in all DAX measures, however you can override this default and use the inactive relationship (when needed) with the USERELATIONSHIP() function.  Full details on this are covered in my article here.
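For example (a sketch only, assuming a [Total Sales] measure and that Sales[Ship Date] has an inactive relationship to Calendar[Date] as described above), a measure that uses the inactive relationship could look like this:

Total Sales by Ship Date =
CALCULATE (
    [Total Sales],    -- evaluate the existing measure
    USERELATIONSHIP ( Sales[Ship Date], Calendar[Date] )    -- but activate the Ship Date relationship for this calculation only
)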

Cross Filtering Behaviour

Power Pivot is optimised to work with one to many relationships and to automatically propagate filters (filter context) from the one side to the many side.  In all versions of Power Pivot for Excel, this is the ONLY type of filter propagation that is available.

Power BI supports bi-directional cross filtering behaviour (shown right below) as well as single direction (shown left below).

image

In fact bi-directional filtering is the default behaviour for Power BI Desktop.  There are many people (including me) that think this is a bad idea, as bi-directional cross filtering comes at a cost – there is an overhead of constantly cross filtering the lookup table based on the contents of the data table at times when it is actually not required.  Sure, if you have a simple model, you need this behaviour and you don’t know how to handle it with the many to many pattern, then turn it on.  But surely this should not be turned on by default.  In addition, if you have more than 1 data table, bi-directional cross filtering can create circular references, causing further confusion to unsuspecting users.

I think Microsoft is trying to make Power BI more user friendly for the “lay user” however in this case I think Microsoft has made a mistake.  Do yourself a favour and turn off bi-directional cross filtering unless you explicitly need it.  To change it just double click on the arrow and set the cross filter direction to single.

Update: Last week (17th Feb 2017) I noted that new models I built were single directional by default – it seems Microsoft has listened and changed the default behaviour.

image

Wrap Up

Hopefully this article has helped you understand more about how Power Pivot works.  Let me know in the comments below if there is anything I have missed.

Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is very easy to forget how I incrementally learnt them along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In all cases I have outlined why it is a best practice to help the understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

Naming Conventions

Naming your Columns and Measures

  • Always write a Column in the format TableName[Column Name]
  • Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write measures and columns both including the table name as follows.

  • TableName[Column Name]
  • TableName[Measure Name]

These 2 examples above do not cause a problem because their names make it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula.  Take the following example:

  • TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.
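Here is a short sketch of the convention in action (the table and column names are hypothetical).  The column reference always carries its table name; the measure references never do, so you can tell at a glance which is which:

Total Sales = SUM ( Sales[Extended Amount] )    -- Sales[Extended Amount] is clearly a column
Sales % of All Products = DIVIDE ( [Total Sales], CALCULATE ( [Total Sales], ALL ( Products ) ) )    -- [Total Sales] is clearly a measure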

Give Tables a Single Noun Name

  • Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimension modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.

Also PowerBI.com has a natural language query tool that allows you to ask questions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

Using Spaces in Names

  • Don’t use spaces in table names
  • Do use spaces in column names
  • Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [Square Brackets] anyway, and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces.
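To illustrate (hypothetical table names): a space in a table name forces single quotes into every reference, whereas columns and measures are wrapped in square brackets either way, so spaces there cost nothing.

Total Sales = SUM ( Sales[Extended Amount] )    -- no space in the table name, no quotes needed
Total Sales Alt = SUM ( 'Sales History'[Extended Amount] )    -- the space forces single quotes every time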

Don’t Overly Abbreviate Business Terms

  • Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches between words in the question and the data model to help answer the question.  If you abbreviate your measure names to, for example, [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over-abbreviate your names.

Measures or Calculated Fields

  • Measures is a better name than Calculated Fields

The term “measure” has been a Microsoft term for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 a decision was somehow taken to rename “measures” as “calculated fields”.  This was a bad decision and, thanks to lobbying from many people, Excel 2016 reverted to using the term measures (as does Power BI).  I now always use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

Loading and Shaping Data

Push Shaping as Close to the Source as Possible

  • Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data, however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copied or re-written).  The data shaping tools are designed to allow you to do whatever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible to shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc).  But just because you can do these things in Power Pivot doesn’t mean you should.  For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone who once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

Use A Calendar Table

  • If you want to do any sort of time calculations, get a Calendar table

It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.
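If your source data doesn’t include a calendar table, one quick way to get started in Power BI (or any version that supports calculated tables) is a DAX calculated table.  This is just a minimal sketch – the calendar articles linked above cover more complete approaches.

Calendar =
ADDCOLUMNS (
    CALENDARAUTO (),    -- generates a contiguous Date column spanning the dates in your model
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month Name", FORMAT ( [Date], "mmm" )
)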

A Star Schema is Optimal

  • Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success.  Two key things you should know.

  • Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
  • There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

You Should Prefer Long and Narrow Tables

  • Short wide tables are generally bad for Power Pivot but long narrow tables are great.

image

There are 2 main reasons why loading data this way is a good idea.

  • Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
  • Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately, as sketched below.
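Here is that sketch (hypothetical column names).  With the long narrow layout you write one measure and filter it by the Month column; with the short wide layout you end up writing a separate measure for every month column.

Total Amount = SUM ( Data[Amount] )    -- long narrow table: one measure, filter by Data[Month] as needed

Jan Amount = SUM ( Data[Jan] )    -- short wide table: one measure per month column
Feb Amount = SUM ( Data[Feb] )    -- ...and so on for every other month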

Only Load the Data You Need

  • Load all the data you need, and nothing you don’t need.

If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be.  In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later.  This is no longer the case – it is very easy to change your data load query to add in a column you are missing.  So bring in all of what you need and nothing you don’t.  If you need something else later, then go and get it later.  It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them.  Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).

Don’t use Linked Tables

It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot.  You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).

The trouble with doing this is that you end up with 2 copies of the data in your workbook. The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot.  In the example (shown as 3 above) there are many thousands of rows of data.  The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that you are better to have 1 workbook for your data and then import the table directly into Power Pivot without storing it in Excel at all.

Modelling

Avoid Bi-Directional Relationships

  • Avoid using the default bi-directional relationships in Power BI unless you need them.

image

Microsoft introduced bi-directional filter propagation in Power BI (this is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works.  But it comes at a cost in that:

  • The performance can be negatively affected
  • If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

Measures are Better than Calculated Columns

  • Wherever possible, you should write measures in preference to calculated columns

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal learning about database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.
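As a sketch (hypothetical columns): a calculated column is computed and stored for every row of the table at refresh time, while a measure is evaluated on demand in the filter context of the visual and stores nothing.

Margin = Sales[Unit Price] - Sales[Unit Cost]    -- calculated column: stored on every row of the Sales table

Total Margin = SUMX ( Sales, Sales[Qty] * ( Sales[Unit Price] - Sales[Unit Cost] ) )    -- measure: nothing stored, calculated when used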

Store Measures in the Table Where the Data Comes from

  • You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure is coming from the [Extended Amount] column in the Sales table.  You should therefore store the measure in the Sales table.  If you place it in (say) the Calendar table, Excel will likely give you a warning similar to the one shown below.

image

This issue doesn’t occur in Power BI.

Break Measures into Interim Parts

  • Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used and make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that it will give you the % change in sales vs last year.
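The original screenshot of that single measure is not reproduced here, but based on the interim measures below it would look something like this sketch – note how the same SUMX expression is repeated three times:

% Chg vs LY (all in one) =
DIVIDE (
    SUMX ( Sales, Sales[Qty] * Sales[Unit Price] )
        - CALCULATE (
            SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
            SAMEPERIODLASTYEAR ( Calendar[Date] )
        ),
    CALCULATE (
        SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
        SAMEPERIODLASTYEAR ( Calendar[Date] )
    )
)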

 

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] - [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

Don’t Break Calculated Columns into Interim Parts

  • It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand, they take up little space and they make it easier to write your DAX.  As with measures, it is easier to write calculated columns using interim calculated columns, however the problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and in memory, and hence makes it less efficient.  By all means write interim columns if you need to in order to create a calculated column (notwithstanding the earlier comments about columns vs measures), however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column.  This is an Excel concept I learnt from John Walkenbach.
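A short sketch with hypothetical columns: write the interim columns if you need them to work out the logic, then fold everything into one column and delete the interim ones.

-- interim columns used only while developing the logic (delete these when done):
-- Margin = Sales[Unit Price] - Sales[Unit Cost]
-- Margin % = DIVIDE ( Sales[Margin], Sales[Unit Price] )

Margin % = DIVIDE ( Sales[Unit Price] - Sales[Unit Cost], Sales[Unit Price] )    -- the final single “Mega DAX formula” column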

Other Advice

You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here and also read up about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.

DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are that you may not even be clear what time intelligence is and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year” etc. If you want to do any time based comparison using DAX, Power Pivot and or Power BI, then this article explaining time intelligence is the right article for you.

Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

  • Same period prior month, quarter, year etc.
  • Same period next month, quarter, year etc.
  • Same period year to date compared with prior year, next year etc.
  • Rolling 30 days, 60 days, 90 days, 12 months etc.
  • etc. – there are many many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

image

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered more or less by the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

  • Territory[Country] = “Australia” – this comes from the Rows in the Pivot
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer $2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

  • Territory[Country] = “Australia” – this comes from the bar chart Axis
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

The Time Intelligence “Problem”

Let’s assume you want to compare total sales on a particular year vs prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

image

But doing it this way causes many problems, including:

  • There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
  • If you want to calculate the absolute change or % change year on year you will need to hard code some formulas in the cells next to the spreadsheet and hence they can’t be reused in other visuals later.
  • If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy. Imagine a calendar table with a filter applied Calendar[Year] = 2003.  If you imagine what the filtered data model would look like “Under the hood”, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

  1. First “detect” the current filter context to determine what the “current” selected period is.
  2. Then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
  3. Then perform a time shift to find a different period in time (as specified in the formula).
  4. Finally, reapply filters on the data model for the time shifted period prior to calculating the final value.

Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – Custom and Inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover Inbuilt time intelligence in depth here because they are relatively easy to use.  See link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD(Calendar[Date],[Total Sales])
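The second (custom) example appeared as an image in the original post.  Reconstructed from the line by line explanation further down the page, it looks like this (the line numbers are shown as comments because the explanation below refers to them):

Total Sales Year to Date (custom) =                        -- line 1
CALCULATE (                                                -- line 2
    [Total Sales],                                         -- line 3
    FILTER (                                               -- line 4
        ALL ( Calendar ),                                  -- line 5
        Calendar[Year] = MAX ( Calendar[Year] )            -- line 6
            && Calendar[Date] <= MAX ( Calendar[Date] )    -- line 7
    )                                                      -- line 8
)                                                          -- line 9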

Both of these formulas return the exact same result, and in fact both use the same approach to calculating the result under the hood.  The only difference is that the inbuilt version is much easier to write because you (the DAX author) are protected from the full syntax.

The end result (regardless which formula is used) looks like this in a Pivot Table.

image

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

  1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case July 2003
  2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
  3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
  4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

image

Line 2 (CALCULATE) is the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) are where the filtering occurs.  There are a few confusing things here.

  • Line 5 refers to ALL(Calendar) instead of simply Calendar.
  • Lines 6 and 7 seem to be evaluating against themselves – very confusing.
  • Line 7 starts with a double ampersand &&

Let me explain each line.

Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of CALCULATE always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, this means that line 5 is already filtered by the pivot table and hence the Calendar table only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL function therefore is to remove all filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now that they have all been removed, it is time to add back the date filters that are needed.

Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

  • Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
  • Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), then it means “go and apply a new filter on this column in the data model”.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is = 2003.  Line 6 therefore says “Go and apply a new filter on Calendar[Year] to be equal to 2003”.

Note that you can use any aggregation function in these formulas that does the job.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer, simply think about the need and then find the best function to use.
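For example, these fragments (not complete measures) are alternative ways to write lines 6 and 7 of the formula above – as noted, VALUES only works here because a single year is selected in the filter context:

Calendar[Year] = VALUES ( Calendar[Year] )    -- alternative line 6
    && Calendar[Date] <= LASTDATE ( Calendar[Date] )    -- alternative line 7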

Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

image

The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few downsides however.  The first is that you don’t get to learn these very important concepts and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  The second is that the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here exceleratorbi.com.au/power-pivot-calendar-tables/

Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot into thinking you are using daily data by loading all data in any given week or month on the same date (eg the last day of the month) and use inbuilt time intelligence, however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the path of writing custom time intelligence, you should read my advice about adding an ID column into a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/

Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Date],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (SAMEPERIODLASTYEAR and DATESADD respectively), but they also use CALCULATE.  Now that you understand how CALCULATE applies the time shift, it should be much easier to understand what is happening in these formulas.  Both of these time intelligence functions produce a table of dates that has been time shifted back by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the Calendar table, and reapplies the filter for the time shifted period before doing the calculation.  Once you have [Sales Prior Year] it is easy to write:

Change vs Prior Year = [Total Sales] - [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

 A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14 as well as every other function across the language, all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t already done so, why not sign up for my weekly newsletter at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

Download the DAX Reference Guide Using the Form Below

Easy Online Surveys with Power BI Reporting

I think today’s article will be of interest to my readers even though it is a little astray from my normally pure Power BI, Power Pivot and Power Query content. I will show you how to quickly and easily create an On-Line Survey that you can distribute to anyone that has an Internet connection, and then use Power BI to report on the results from your survey.

Office 365 Excel Surveys

You may not be aware that O365 has an inbuilt Excel survey tool that is very easy to use.  And who wouldn’t want to use Excel to solve a problem if they could?  You can find the survey tool by going to OneDrive online.  I am using OneDrive for business for this demo, but it also works for OneDrive Personal.

Here is Survey – Please Participate

For demonstration purposes, I have created a survey to ask my readers what they would like to see more of on my website and to see which countries the readers live in.  Edit 7 Feb 2016.  The survey is now closed, but you can still read on.

image

Now that you have completed the survey, read on to see how you can do the same yourself and to see the latest results via an embedded Power BI report at the bottom of the page.  Note that it can take up to 1 hour for your specific survey response to be available in the report (ie it is near real time – not real time).

How I Created the Survey

First Open OneDrive Online

Frankly I get confused with all the logins for Microsoft sites. The easiest way I know how to login to OneDrive Online is to Open Windows Explorer, right click on the OneDrive folder and then select View Online.

image

Create Your Excel Survey

Creating an Excel survey couldn’t be easier.  Just click New\Excel Survey as shown below.

image

I immediately got some survey header information to fill in as shown below.

image

There are no user manuals required to work out how to do this.  I just updated the information and moved on to the next section.  It is easy to enter the questions you have, and there are a number of intuitive response types that you can choose to meet your needs.

image

Survey Results Database

After you have completed your survey, you can see the underlying Excel spreadsheet that stores your survey results.  It is possible to add new columns to the right of the results table.  Additional columns added manually can be used to store information you want to add without it being part of the survey.

image

Connecting the Data to PowerBI.com

There are a few ways you can create a Power BI report.  If you want a fully featured data model, including the ability to import other tables of data and write your own DAX, then you need to import this new data source into Power BI Desktop as you would with any other report.  For this demo today, I have used the “Get Data” feature directly inside the Power BI Service to connect to the Excel workbook on OneDrive.  To do this, I first logged into PowerBI.com and selected get data from files as shown below.

image

Then I selected OneDrive – Business as shown below.

image

Then I selected my survey file and clicked Connect

image

I was then prompted to either import the data or connect.  The connect option works just like Excel Online – not what I wanted.  So I selected Import.  When you use import in this way, Power BI will check your OneDrive account about once each hour to check for new data and will automatically update your report.

image

Power BI then created a new Dataset for me (1 below).  You can see the table of data shown in 2 below.  It also created a Dashboard, however this was not much use so I simply deleted the dashboard.

image

Here is the Final Power BI Report

Once I had the data connected, I simply created a few simple visualisations using the capabilities of Power BI Service.  You can see the final interactive report below.

Be sure to read next week’s post where I cover how you can use Power BI Desktop to model your online data. http://exceleratorbi.com.au/data-modelling-onedrive-source-files/

How to Document DAX Measures in Excel

I often get asked if there is an easy way to create documentation for DAX measures when using Power Pivot for Excel.  I am not a big fan of documentation for the sake of it, but I do see value in having “some” appropriate level of documentation.  I think a good balance of the right amount of documentation vs the effort to write and maintain the documentation is to use the Power Pivot UI itself.  I explain how I do it below.

Writing DAX Measures From Within Excel

I always teach students at my Power BI training courses to write their measures from within Excel via the “new measures” dialog box (and specifically not to write them from within the Power Pivot window).  The process is always as shown below.

  1. Set up a pivot table that has some relevance to the measure you are writing, then click in the pivot table (shown as 1 below)
  2. Select Power Pivot Menu \ Measures* \ New Measure* as shown below (*called Calculated Field in Excel 2013)

image

The reasons I teach students to do it this way are simple.

  1. You get a dialog box/wizard that shows you every step that is required to complete the task.  Just complete each step one at a time and you can’t go wrong.
  2. You get to check the measure for errors before completing it (shown as 2 below).
  3. Once you add the measure to your pivot table (shown as 3), you get to see immediately if it is working as you expected.

image

image

And there is a fourth reason to use the wizard that is less obvious.  The Description box (shown as 4 above) is a great place where you can make notes about your DAX formulas (aka documentation).    In Power BI Desktop you can add comments into your DAX expressions, but this is not possible in Excel, so the description box is the best option you have.
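For what it is worth, this is what a comment looks like in Power BI Desktop (DAX accepts both // and -- for single line comments); the measure and column names here are just for illustration.  In Excel, the Description box plays this role instead.

Total Sales =
-- Sum of the extended sales amount (add your business notes here)
SUM ( Sales[Extended Amount] )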

Create Your Documentation

If you diligently add comments into the Description box above, you will then be able to extract this information into a documentation page using the Power Pivot Utilities Toolbar.  I have previously reviewed this great little tool here, so you can go back and read about it if you missed it.

After installing both tools, you will then see the PP Utilities Toolbar in Excel (shown as 1 below)

image

Click on List Measures (shown as 2 above).  After a few seconds, you will see a new page in your Workbook called “Measures” like shown below.  Note the “Description” column on the right hand side. This column is sourced directly from the information entered into the Description box shown further up the page.

image

What’s more, if you select the measures on your page (Shown as 1 below) and then click Format DAX Expressions (2 below), the measures will be nicely formatted so they are easy to read.

image

This page is then your documentation and you can refer to it at any time you need to see the big picture.

Always Up to Date

One important feature of this type of documentation is that it is very easy to maintain.  If you make changes to any of the measures or comments in your data model, you can simply re-run the extract process so you have an up-to-date list of all the measures.  You can also do the same for Calculated Columns and also relationships.

What About Excel 2010?

Unfortunately the Power Pivot Utilities Addin only works for Excel 2013+.  But all is not lost.  It is possible to extract the measures from your data model manually using DAX Studio.  You can read my blog post about how to do that (and other cool things) with DAX Studio here.

Power Pivot Crashes? – An Update

Today Avi Singh and I met with Howie Dickerman from Microsoft to discuss the Power Pivot for Excel Add-In.  You can read the background to this meeting at my blog post last month.  Howie is the Program Manager assigned to Power Pivot for Excel and has a wealth of experience with Excel and other products.  I shared all the comments that people added to my blog post from 19th October.  The great news is Howie is already crystal clear about the problems.  We had a great discussion about the plans that Microsoft has for improving stability and performance for the Addin.  And there definitely is a plan.  Like any software project it is impossible to say when we will see something, but I am just excited that there is a plan.

How You Can Help

Microsoft already has a clear understanding of the areas that need some work.  What Microsoft needs more than anything is a set of structured steps that will reproduce an issue – any issue you have.  From experience this is very hard to pin down.  If you know how to reproduce an issue, please send me the steps to reproduce it (you can add to the comments below or send me a message via my home page).

Note:  I realise lots of people have had a bad experience at times.  I am explicitly looking for “reproducible steps” here.  This means that when you follow these steps it will always (or often – eg 1 in 4) cause the issue.  If it is not reproducible then the information cannot help for this next phase.

When to Create a Lookup Table in Power Pivot

Level: Beginners

Today I explain when it is important to create a lookup table and when it is fine to use native columns in a data table.  I have rated this topic as a beginner topic as it is a fundamental skill to learn on your journey to become a Power Pivot and Power BI Ninja.  However there is so much to learn and you can only absorb so much content before your head begins to explode.  Because there is so much to learn, you may find yourself learning this topic for the first time now even though you are an intermediate user. Or you may come back and read this again for a second time before it all sinks in.  This topic is therefore good for everyone, either as a review, a confirmation of what you know, or as a new learning opportunity.

It is also worth pointing out that this topic does not mean the difference between “it works” and “it doesn’t work”.  You can be successful using Power Pivot and Power BI without understanding this topic well.  But it can make a big difference to the usability of your model and certainly to performance (in some cases).  And not least of all, if you want to get better and advance your skills, you simply must understand these principles well.

Topics Covered in This Post

In this post I cover the following scenarios to help explain some of the nuances and benefits of using lookup tables (or not):

  • Simple data models
  • Inbuilt time intelligence functions
  • To Simplify Your DAX formulas
  • Pointless dimension tables
  • Multiple data tables
  • Simple business communication

First a Definition of the 2 Table Types

Before I move on, it is worth restating the definition of the 2 different types of tables so it is clear in your mind.

There are 2 different types of tables in a Power Pivot data model.  There are dimension tables (I call them lookup tables) and there are fact tables (I call them data tables).  These two table types are very different and understanding the difference is fundamental to understanding and using Power Pivot and Power BI properly.

Lookup Tables (Dimension Tables)

Lookup tables contain information typically about 1 business concept/object, eg Customers, Products, Time etc.

image

Lookup tables always have the following features:

  • There is always a column of data that uniquely identifies each row of data in the table.  In database terms this is the primary key.  In the customer table it would be the customer number, in the products table it is the product code, in the calendar table it is the date, and so on for any lookup table.
  • There can be no duplicates in this key column – every row must be unique
  • There can be 1 or more additional columns in each lookup table that provide more information about that primary key column.  Eg in the calendar lookup table, the date is the primary key. Other columns such as Day Name, Month Name, Year, Day of Week number etc all provide detail about the date key.  I often refer to these additional columns as providing metadata about the primary key.

Data Tables

Data tables contain (typically) transactional information.  It could be sales data, budget data, GL data from a financial system, call centre call data, or any other data about activities of some type.

image

Data tables have the following features:

  • There is no requirement that a data table must have a primary key.
  • There needs to be 1 column that can be connected to relevant lookup tables (assuming you want to connect these tables).  In database terms, this column is called the foreign key.  It should be obvious that if you want to join your data table to your product table, you must have a column that defines the product in your data table.
  • Duplicates are allowed in data table columns.  eg many sales occur on each day, many sales occur for each product.

Joining the Tables

You cannot join data tables to other data tables.  Data tables are always joined to lookup tables using the Lookup Table primary key column as the join column.

I am using the Adventure Works database in the image below.  I have placed the lookup tables in my data model at the top of the window and the data table below.  This is the Collie Layout Methodology; it gives the user a visual clue that the tables you have to “look up” to see are the Lookup Tables – get it?  It is not required, just an easy way to remember.  Technically this data layout is called a star schema.

image

When To Use Lookup Tables

Back to the point of this post.  When should you use lookup tables and when is it OK to simply use the column in your data table?  Well the fact is that you do not “need” to use lookup tables at all to make your data model work – it is perfectly fine to simply load a data table and go from there.  Having said that, read on to find out the nuances of some common scenarios.

Simple data models

Let me start by covering when it is ok to not use any lookup tables.  If your data meets the following scenario it is fine to ignore lookup tables:

  • Your data set is small.  What defines small depends of course.  Certainly 10k – 250k rows of data is small (tiny actually). It is less important to efficiently store your data when your data set is small and hence a flat single data table can be OK.
  • Your data doesn’t contain too many columns (this is a topic in its own right and outside of the scope of this post).  If you have lots of columns (eg >15) then even 250k rows of data can become non-performant.  But if you have a few columns, then maybe you don’t need lookup tables.
  • Your data table already contains all the columns you need for your analysis.
  • You don’t need to use any inbuilt time intelligence calculations (more on that below).
  • You only need to write simple DAX formulas (such as SUM formulas).

If this is your scenario, then you don’t need lookup tables at all.  However if you always operate in this simple scenario then you will not advance your DAX knowledge to the next level, so keep that in mind.  This scenario with no lookup tables is quite rare and is probably limited to Excel users that already have their data in a single flat table.

Inbuilt Time Intelligence Functions

It is mandatory to have a calendar lookup table if you want to use the inbuilt time intelligence features of Power Pivot.  Examples include calculating sales last year, sales year to date etc. I cover calendar lookup tables in depth in this article here.  It is possible to do manual time intelligence calculations (not inbuilt, but manual) using a single data table, but I don’t recommend it.  If time is an important part of your data, then I strongly recommend you get a calendar lookup table and go from there.

To Simplify Your DAX Formulas

While it is possible to use a simple data model with a single data table and no lookup tables (as mentioned above), once you have more complex analysis needs (eg other than just using SUM) you will need to write some DAX formulas to create the business insights.  In my experience it can be easier to write DAX formulas when you have lookup tables, particularly when you need to change the filter context in your data model.  “How” to do this is also out of scope for this topic, but be aware that this is another reason to move on from a single data table.
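As a tiny sketch of what that looks like (hypothetical names, assuming a Products lookup table joined to the Sales data table): once the lookup table exists, your measures can change the filter context through it.

Total Sales Bikes = CALCULATE ( [Total Sales], Products[Category] = "Bikes" )
Total Sales All Categories = CALCULATE ( [Total Sales], ALL ( Products ) )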

Pointless Dimension Tables

There is no point creating a dimension/lookup table without a good reason, and there is rarely a reason if the lookup table only contains 2 columns.  This will make more sense with an example.

Take the Adventure Works data model as shown below.  It contains a lookup table (Products) with 2 columns, one for Product Key and one for Product Name.  The lookup table is joined to the data table using the primary key (of course).

image

The data in this lookup table would look something like shown below – only 2 columns (but a lot more rows of data than shown in this image of course).

image

People that have some understanding of traditional relational databases will be aware that relational databases store the data in the first column (ProductKey) a lot more efficiently than the data in the second column (Product Name).  It is therefore common to think that it is more efficient to store the product key in the data table and put the product name in the lookup table only.  But this is not the case in Power Pivot and Power BI.  Power Pivot uses the xVelocity storage engine (a columnar store) to compress the data, so there is no (or little) space saving benefit in storing the product code in the data table instead of the product name.  In addition, every relationship in your data model comes at a cost/overhead.  If the only reason you have this lookup table is to join the single column “Product Name” to the data table via the product key, then it would be just as good to load the product name into the data table and drop the product key altogether – a lookup table like this is sometimes dismissed as a junk dimension.

If your data already has the product name (and not the product key) in the data table, and you are thinking of creating this lookup table, then you may want to think again.  That being said, there are other reasons why you may want to keep the primary key and lookup table including

  • if you have more than 2 columns in your lookup table.
  • if your product names are not unique (often the product key is managed more rigorously than the name)
  • if your data table already has the product key in the table and it is easier to do it this way.

to name a few.

If your data table contains a column that is not related to any other column as a logical object and you are wondering if you should build a lookup table for it – the answer is no – don’t do it.  Just use the column in your data table.

Multiple data tables

As you probably know already, it is possible to have multiple data tables in your data model. In fact this is one of the many great things about Power Pivot.

image

However as I mentioned at the start of this post, it is not possible to join one data table to another data table in Power Pivot.  If you want to use multiple data tables in your data model, you must join them through common lookup tables.  It therefore follows that if you have multiple data tables in your data model, you should load lookup tables even if they are junk dimensions.

Simple Business Communication

There is a clear benefit to business users if you logically group your columns of data into lookup tables that make business sense.  It can be hard for users if they need to hunt through the data table for information about a customer, or a product etc.  It is a much better end user experience if there are logical groupings of columns of relevant data in a table called Customer, Product etc.  Given the end users may be writing their own reports using the field list to find the relevant data, grouping the data into lookup tables can really improve the end user experience.

Flatten Your Lookup Tables

Further to the above point, you should also consider flattening out your data into a single lookup table if you can.  The following data model is feasible and will work, but it is hardly user friendly.  In the image below there is a product lookup table, and then a sub category lookup table of the product table, and a product category lookup table of the sub category table.

image

But if you deploy this model, it is likely to be confusing to the end user.  Remember they will be building reports using the field list, and they will need to go searching across different tables to find the columns of data they use.  It is much better to flatten the data from the additional lookup tables into a single product lookup table if you can.  Also flattening these tables will remove extra relationships and this will make the model more efficient (every relationship comes at a cost).  And finally your DAX will be easier to read and write.

Wrap Up

Hopefully this article has helped clarify when and where to use lookup tables.  Let me know your thoughts and/or any questions that remain unresolved in the comments below.

Find Duplicate Files on Your PC with Power BI

Level: Beginners

If you want to learn new skills using a new tool, then you simply must practice.  One great way to practice is to weave the new tool into your daily problem solving.  If you have something meaningful to do with the new tool, then you are much more likely to be motivated to practice.  And the new tool I am talking about of course is Power BI.

Last week I showed how easy it is to use Power BI to help you track down large files saved in Dropbox so you could manage the overall space usage.  As a result of that article, Graham Whiteman posted a comment suggesting it would be a good next step to find duplicate files.  I think that is a great idea, so I decided to test it out on my PC.  Read on to see how I did it, and how you can do it too.

Create a Query to Fetch All PC files

I started a new Power BI Desktop file, then connected to my PC documents folder

image

image

I immediately selected Edit query as shown in 1 below.

image

The only time you would immediately select Load (2 above) is if the data you are importing is already in the correct shape for Power BI.

The only columns I need are the file name, date modified, attributes and path (shown below).  I multi-selected the columns I wanted to keep, then I right clicked and selected “remove other columns”.

image

The next step was to extract the file size from the attributes list. To do this, I expanded the list of attributes (1 below), deselected all the columns and then reselected the file size (3 below).

image

Then I renamed the query (1 below) and changed the query so it didn’t load to Power BI by right clicking on the query and un-checking the enable load option.

image

This created a query that links to the PC and keeps only the columns of data needed, but it didn’t load anything to Power BI yet.

Create a New Query that Accesses the File List

The next step was to create a new query that references the File List.  I right clicked on the first query (1 below) and then selected Reference (2 below).  Note how the File List query is shown in Italics indicating that it won’t load to Power BI.

image

The next step was to merge this data with itself by going to the Home Ribbon and selecting Merge Queries.

image

In the Merge Queries dialog, I joined the query File List (2) back to the original File List query so that it was joined to itself on 3 columns (the File Name, Date Modified and File Size) but not the Folder Path, as shown below.

join file list

The above steps added a new column to the query.  I then expanded the new column as shown below making sure to keep the original column name prefix.

image

Find The Duplicate Files

The second query now looked like this.  As you can see in the image below, the query returned all the files (name column) along with the folder paths from the query “File List” shown as 1, and a second column containing the folder paths from the query “File List (2)” shown as 2 below.

image

The next step was to get rid of all rows in this query where the 2 folder paths are identical.  Doing this is easy with a custom column.  I added a custom column (steps 1 and 2), and wrote a formula to return TRUE if the 2 folder paths were identical.

image

I then filtered out everything that returned a TRUE in the new column using the filter button as shown below.

remove matches

I then deleted this custom column as it was no longer needed.  I just right clicked and selected remove.

Format the Number Columns

It is very important in Power BI to set the data types before loading the data.  Any numeric column that still has the data type “Any” should be changed to a suitable numeric type (as shown below).

image

I did this, renamed the query to “Duplicates” and then selected “Close & Apply” to load the data into Power BI.

Time to Write Some DAX

Now the data is loaded, I could of course just drag one or more of the columns onto the Power BI canvas.  But remember, half the reason for doing this is to build some new skills.  So instead of dragging the Size column and creating an implicit measure, I wrote some DAX – it isn’t hard to get started with such simple formulas.  Here’s how to do it.

Select the Size column, go to the Modelling Ribbon and select New Measure.

image

The formula I wrote is as follows

File Size MB = sum(Duplicates[Size])/(1024 * 1024)

image

Note a few things that I was able to do by writing this measure myself

  1. I converted the units of the result from bytes to megabytes by dividing by (1024 x 1024).
  2. I gave the measure a more meaningful name “File Size MB”
  3. I was able to set the formatting to comma separated with 1 decimal place

And of course I practiced my DAX.

And the Results

I then simply added the file size, File Name, Folder Path and Second Folder Path to a table in Power BI as shown below.

image

I then discovered I had around 9 GB of duplicate files on my PC.  I sorted the table by File Size descending and discovered that I had multiple identical copies of contoso.pbix.  It looks above like there are 6 copies of contoso.pbix, but this is deceiving – every copy of a file finds a match with every other copy, so 3 copies each match the other 2, giving 6 rows in the table.  If you look at the Folder Path column, there are only 3 unique folder paths, hence 3 files.

The next thing I did was add a Tree Map as shown, with the file name in the Group section and File Size MB in the Values section.

image

To find out accurately how many copies of each file there were, I had to write some more DAX.  This formula is a bit more involved (intermediate DAX).

2016-10-31_120903

Let me explain this formula starting from the inside out.  There are 4 functions in this DAX formula and I describe their role below.

  1. SUMX is an iterator.  It iterates over a table specified as the first parameter (VALUES in this case).  You can read more about SUMX here.
  2. The VALUES function returns a table of unique file names (in this case it is unique values in the column Duplicates[Name]).  So SUMX above will iterate over each file name in the name column.
  3. SUMX is iterating over a Virtual Table (VALUES).  The CALCULATE is required to force context transition.
  4. Then for each file name in the table (in 2 above), DISTINCTCOUNT will count how many unique folder names there are.
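
Putting those pieces together, the measure was along the following lines.  Treat this as a reconstruction – the original was shown as an image, so the column names Duplicates[Name] and Duplicates[Folder Path] are assumptions based on the query described above:

File Count = SUMX ( VALUES ( Duplicates[Name] ), CALCULATE ( DISTINCTCOUNT ( Duplicates[Folder Path] ) ) )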

I then added the new File Count measure to the Colour Saturation section of the Tree Map (1 below).  This does 2 things.  Firstly it shows the high folder count files as being a darker colour, and secondly it adds the file count to the tool tips (visible when you hover the mouse over the visual).

image

And Now Some Fun

I’ve been looking for an excuse to do this for some time.  I want to find the fattest fish in my pond (aka most space taken by file name).  I went to visuals.powerbi.com and downloaded the Enlighten Aquarium custom visual.

app.powerbi.com/visuals/show/Aquarium1442671919391

I then imported the custom visual into Power BI Desktop

image

Then I copied my Tree Map visual (Ctrl-C, Ctrl-V), selected the copy and changed the visualisation to the Aquarium.  This visual is showing the largest individual files regardless of location or how many copies exist.  I am not saying this is the best way to visualise data, but surely it is one of the most creative.

fish

Here is my final workbook canvas

image

For the purists out there, I wrote a new file size formula as follows.

final


The original formula I wrote double counts the file size when there are multiple duplicates.  The formula above is almost identical to the File Count measure I explained earlier.  The only real difference is the inclusion of MAX(Duplicates[Size]).  This is a “trick” to handle the fact that for each file name there will be multiple records in the data model.  Each copy of a file has the exact same file size, so by selecting MAX I simply get access to that file size.  I could have used Min or Average instead and got the same outcome (but not Sum, as that would add the identical size once for every matching row and double count again).
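
Again the original formula was shown as an image, but based on that description it was something like the sketch below (the measure name and column names are assumptions):

File Size MB Unique = SUMX ( VALUES ( Duplicates[Name] ), CALCULATE ( MAX ( Duplicates[Size] ) ) ) / ( 1024 * 1024 )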

I haven’t shared the actual workbook here.  The whole idea is for you to try this yourself so you get 3 benefits: more disk space, some practice with Power BI Desktop, and some fun.

Use Power Query to Manage Dropbox Space

Level: Beginners

I got this dreaded Dropbox email recently as shown below.

image

I needed to clear out some of the files I had loaded in Dropbox so I didn’t have to upgrade my account.  It occurred to me that I could make this process a lot easier by using Power BI to quickly show me where my big files were located in Dropbox.  This post explains how I did it.  What I ended up with is a report like the one below that allowed me to drill down on the large sub folders to easily find my big files.

dropbox size

Note, there is a great tool called WinDirStat that you can download here that does this too – I use WinDirStat all the time. But I never want to miss an opportunity to do something with Power BI.

Process to Build the “File Space Usage” Tool

First I created a new Power BI report and connected it to my Dropbox folder.

image

You can of course use the same process on any other folder on your computer, or even the entire Hard Disk if you want.

I then imported the columns I thought would be of use, and loaded them into the data model.

image

I figured the Hidden flag and Date Accessed might be useful at some stage, so I brought those in too.

I then wrote some measures that I thought would be useful.

image
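
The measures were along these lines – a sketch only, as the table and column names below (a query called Dropbox with a Size column) are assumptions and yours may differ:

Total Size MB = SUM ( Dropbox[Size] ) / ( 1024 * 1024 )

File Count = COUNTROWS ( Dropbox )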

I encourage you to write your own measures rather than use the implicit measures created when you drag a column of values to the report.  By writing your own measures, you “Learn” how to write DAX and that will help you become a Power BI ninja.

The last thing I did was to create a report that made it easy to see where my big files were located and find out what they were.

image

I have loaded a short 4 minute video that shows how quick and easy it is to do this from scratch.

What unusual uses have you found for Power BI?

I would love to hear from others about how they are using Power BI in ways that are not immediately obvious.

Does the Power Pivot Excel Add-In Crash for You?

I was chatting to Avi Singh the other day and we were comparing notes about the live Power Pivot training classes we both offer in Australia (me) and the USA (Avi).  One thing we both have in common is that the Microsoft Excel Power Pivot Add-In CRASHES a lot during training!  When I say “crash”, I mean the Add-In stops working in some ungraceful way – Excel itself normally doesn’t stop working, just the Add-In.  I always warn my students that the Add-In will crash, and I always say “If Power Pivot wasn’t so great, you wouldn’t put up with it – but it seriously IS that great and hence I have learnt to live with it”.  But there were a lot more Add-In crashes in my training last week than I have come to expect, and I would like to try to do something about it.

Before moving on, I don’t want anyone to get me wrong here, I still love Power Pivot (really – I LOVE IT!).  It is still the best thing to happen to Excel since Excel itself.  But the instability of Power Pivot for Excel during the data modelling process is really bad.  In this post today I explain the symptoms of the Add-In crash and the recovery process.  But more importantly I am seeking feedback from the community to share their own experiences with the Add-in crashing so I have some basis for a discussion with Microsoft when I am in Redmond later this year.  So please post your comments and experiences at the bottom of the page.

It is Worse in Excel 2016

I taught my Power Pivot class using Excel 2016 for the first time last week (I normally teach in Excel 2013 or Excel 2010).  Frankly I was expecting that the Add-In would be a lot more stable given it is an updated version, and given Microsoft ships bug fixes via O365 on a regular basis.  But in fact I found the exact opposite – The Excel 2016 O365 version of Power Pivot for Excel is the most unstable I have ever used.  For the first time ever, the Add-In crashed and at the same time it actually corrupted my data model and made the entire workbook un-usable.  This has never happened to me before using Excel 2010 or Excel 2013 despite experiencing literally hundreds of Add-In crashes over the years.

Symptoms that the Add-In has Crashed

There are a few symptoms that the Power Pivot Add-In has crashed.  The easiest to spot is when you get some sort of “Unhandled Exception” error message (something like this one below).

image

If you see an error message like this, then it is time to restart the Add-in.  Read about how to recover from an Add-In crash further down the page.

But sometimes the symptoms that the Add-In has crashed are harder to spot.  An example is when you write a measure directly in the Excel measure editing box (as shown below): the measure is written correctly (shown as 1 below), yet when you click “Check Formula” (shown as 2), it says there is an error in the formula (shown as 3).

image

This is really confusing for someone that is trying to learn to write DAX.  It is hard enough getting your head around all the new formulas and new syntax, but when Excel is telling you there is an error in your formula AND THERE ISN’T, then wow!  I have a few years of Power Pivot experience, and in that time I have learnt to spot these subtle clues that the Power Pivot Add-In has crashed, but for a new user it is a horrible experience.

Remedies for When the Power Pivot Add-In Crashes

The first thing I always do when teaching a Power Pivot class is to show the students how to recover from a Power Pivot Add-In crash.  This is a mandatory skill for anyone learning Power Pivot.  The standard fix is to turn the Add-In off, then on again. But in my experience there are actually 3 levels of remediation depending on the seriousness of the Power Pivot Add-In Crash.  In my training course yesterday, I came up with a naming convention for the 3 levels of fix.

DEFCON 3

This is the most mild response to the Add-In crash and it is always what I try first.  You first need to open the COM Add-Ins dialog.  The easiest way to do this is to turn on the Developer Ribbon (turned off by default) and use the COM Add-Ins button.

image

Once in the dialog, you need to de-select the Power Pivot Add-In (shown as 1), then click OK (shown as 2).

image

Then you need to go back into the COM Add-Ins again, reselect the Add-In and click OK.  More often than not, this will successfully restart the Add-In and you can get back to work.

DEFCON 2

Sometimes the simple stop/restart fix doesn’t solve the problem.  Then it is time to move to DEFCON 2.  The process is basically the same except you must close the Excel application before restarting the Add-In.

  1. Disable the Power Pivot Add-In
  2. Shut down Excel
  3. Restart Excel
  4. Re-enable the Power Pivot Add-In.

This second approach will fix many of the more serious Add-In crashes.  But sometimes (and actually quite often when using Excel 2016), even this second level response doesn’t work.

DEFCON 1

Time then to move to DEFCON 1 and throw everything you have at the problem.  The process is still basically the same except you add in a PC reboot to completely reset everything.

    1. Disable the Power Pivot Add-In
    2. Shut down Excel
    3. Reboot your PC
    4. Restart Excel
    5. Re-enable the Power Pivot Add-In.

This last approach has always recovered the Power Pivot Add-In crash for me.  Unfortunately however I experienced my first ever workbook corruption using Excel 2016.  I got the Power Pivot Add-In working again with a DEFCON 1 response, but the workbook could not be saved.

Things that Cause the Add-In to Crash

Given I have observed so many Add-In crashes over the years, I have an understanding of the types of things that cause the Add-In to crash.  The key things that come to mind are:

  • If you make an error in a DAX formula but don’t click “Check Formula” before saving the measure.  The more often you do this, the more likely the Add-In will crash.  It seems the internal error handling is not that robust.
  • When you create Linked Tables and add them to the data model, there are often issues that seem to be related to the data model not refreshing properly.  Often you simply can’t see the new table in the Pivot Table Field List.  I can normally solve this problem by shutting down Excel and restarting.
  • If you try to rename a Linked Table after you have added it to the data model, this can cause the Add-In to crash.
  • Unfortunately now it seems that if you are using Excel 2016, that alone is enough to be at risk.

Power BI Desktop

Edit: 22 Oct 2016

I just finished a 2 day in house training course with a client.  For the first time I did the entire training in Power BI desktop.  It was much more stable than Excel has been.  There were a couple of application crashes but recovery was pretty good.

Please Share Your Experiences so I Can Share with Microsoft

As I mentioned at the top of the post, I want to be able to give some “feedback” to Microsoft about this problem.  You can help me by sharing your experiences in the comments section below.   The more real world feedback I get, the more likely I will be able to get some attention to the issue. That way I will be able to demonstrate that this is not just one grumpy Australian that is complaining about an isolated problem, but a widespread issue that needs some attention.

I will let you know how I go 🙂

Who Needs Power Pivot, Power Query and Power BI Anyway?

Level: Beginners

One of the great challenges Microsoft has faced with its “new” suite of Self Service BI tools (particularly Power Pivot) is that most people that could benefit from the holy trinity (Power Pivot, Power Query and Power BI) don’t even know these tools exist, let alone how the tools can help them succeed in their jobs.  The situation is definitely getting better as Power BI starts to get a presence in the market place, however I still talk to people who have heard of Power BI, but have no idea what Power Pivot or Power Query are, and what’s more they don’t know why they should care.  I personally believe a big part of the awareness problem is that Microsoft insists on shipping Microsoft Excel with the Power Pivot plugin disabled.  There is no reference to Power Pivot when you excitedly receive your brand spanking new version of Excel 2016 – what a marketing opportunity gone begging!

image

I have been an Excel nerd for 30 years.  There is nothing I used to enjoy more than installing a new version of Excel and clicking through every menu item to find something shiny and new that would make my life easier.  By not turning on the Power Pivot menu by default, Microsoft is missing one of the best silent selling opportunities it has for this fabulous addition to Excel.

Given there is no “pull through” on the menus, many people don’t know what these products are or why they should care.  I am often asked by people “who can benefit from these tools?”.  This post sets out to explain who can benefit and why.  Note when I say “who can benefit”, I am not talking about “consumers of reports” here, I am talking about “authors of reports”.  It is clear that people that consume quality reports and analysis will benefit, whatever the tool.  This article is focused on the benefits to those people that are responsible for building the reports and analysis that others will end up consuming.

Power BI

Who can benefit from Power BI is probably the easiest to understand.  The product is well marketed and has a clear role to play.   Power BI is a complete self service BI tool.  It is designed to bring business intelligence capabilities to the masses instead of the elite (e.g. instead of just highly skilled IT MDX programmers).  Rob Collie wrote a good article last week about the democratisation of BI tools.  Power BI will add value to people who:

  1. Have problems sharing reports with others because the file size is too large.
  2. Need to share data with people on the go, that maybe only have a Tablet or a Mobile phone.
  3. Have large data sets that can’t be managed in traditional Excel.
  4. Are spending too much time each week/month manually maintaining reports with new source data and/or new visualisation requests.
  5. Can’t get the (timely) support they need from their IT department using traditional Enterprise BI tools.

Power BI is great because it puts capabilities across the end to end BI stack into the hands of end users (authors), including:

  1. Extraction of data from the source (using the Power Query engine)
  2. Transformation of that data into a new shape that is optimum for BI reporting and analytics (Power Query again).
  3. Data modelling capabilities, so you can turn the raw data into business meaningful insights (using the Power Pivot engine).
  4. A reporting and analytics front end allowing you to build reports to visualise your data (Power BI Desktop and Power BI Service).
  5. A fully integrated cloud based delivery framework so you can easily share with anyone over the internet (Power BI Service).
  6. A full set of Mobile applications across the major operating systems (Power BI Mobile).

Notice how steps 1 and 2 use Power Query, and step 3 uses Power Pivot.  So if you want to learn about Power BI, you really need to learn about Power Pivot and Power Query too.

Power Pivot

Conversely, Power Pivot is the hardest to understand – i.e. it is the hardest for individuals (potential authors) to understand “what does Power Pivot do for me and why do I need it?”.  I have had people enquire about Power BI training courses who were not interested in Power Pivot or DAX*.  But the truth is, if you want to be able to write your own reports in Power BI, you really need to learn at least some basic Power Pivot skills.

Power Pivot is a data modelling tool.  It is the underlying reporting engine that enables Power BI and Modern Excel to deliver those modern funky reports that can help you succeed in business.  The Power Pivot engine allows you (the report author) to take your business knowledge and configure the reporting tools so that Power BI and Excel Pivot Tables can be used to find and report on insights in your data.

Most business users have never heard of the term “Data Modelling” before, and the reason for this is quite simple – it has always been the IT department that has been responsible for data modelling.  Power Pivot brings the power of data modelling and puts it in the hands of competent business/Excel users.    An example of data modelling will make it easier to understand.

Example of Data Modelling

Consider a scenario where you download sales data from your company transaction system and it looks something like this in a spreadsheet.

image

You can see the sell price and the cost price information exists in the table above.  But there is nothing about the $ Margin per product and nothing about the % Margin, let alone insights like Sales Year to Date, Top 3 selling products, Fastest growing product etc.  In a traditional Excel world you would simply write formulas in your spreadsheet(s) to enhance this raw data and extract the additional insights.  The trouble with this approach is that all these formulas are only ever “one off”.  You write them for this report, and then when you create another report later, you have to write the formulas again.  Power Pivot handles this problem in a different way.  Power Pivot is a data modelling tool, and instead of writing formulas in your final reports, you write them “under the hood directly in the Power Pivot engine” as part of the data model.  You build the “rules” that describe how to calculate $ Margin and % Margin ONCE and only once directly in the Power Pivot engine.  Once you have created the rules for each insight (e.g. $ Margin, % Margin etc), it is forever available for you (and others) to use over and over again in any report, chart, Pivot Table or any other visualisation you can think of.  Never again do you have to write/copy a new formula every time you create a new report.
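
As a sketch of what those reusable “rules” look like in practice, the margin calculations could be written once as the measures below.  The column names are assumptions based on the screenshot, and quantity is ignored to keep the example simple:

$ Margin = SUMX ( Sales, Sales[Sell Price] - Sales[Cost Price] )

% Margin = DIVIDE ( [$ Margin], SUMX ( Sales, Sales[Sell Price] ) )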

*What About DAX?

Data Analysis Expressions (DAX) is the formula language of Power Pivot.  DAX is very similar to the Excel formula language yet there are also a lot of differences that you will have to learn from scratch.  Simply put, if you want to learn Power Pivot (and/or Power BI), you will need to learn at least some DAX as well as lots of other things about how the Power Pivot engine works.

Who needs Power Pivot and why?

People that need this tool are typically Microsoft Excel users/report authors that analyse or report on data, particularly when the following conditions apply:

    1. There is lots of data – enough to stretch the capacity of traditional Excel (file size, re-calculation speed etc).
    2. If you use Pivot Tables a lot to analyse your data.
    3. If you are writing a lot of VLOOKUP (or INDEX/MATCH) formulas to join data from different tables so you can analyse it in a Pivot Table (see the sketch just after this list).
    4. If you have had to create bespoke reports in traditional Excel, but then have to spend hours rebuilding the report when asked for a different view of the data (e.g. you have a report that shows sales by half year, and then you are asked to produce the same report by month – and have to start again almost from scratch).
    5. Anyone that wants to start using Power BI for anything more than the most simple aggregation reports.  The “out of the box” capabilities of data modelling in Power BI are limited, and you will quickly realise that you need to learn some new skills (Power Pivot/DAX) to be able to leverage the strengths of Power BI.
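
On point 3 above, once the tables are joined with a relationship, a single calculated column like the sketch below replaces a whole column of VLOOKUPs (it assumes a Products lookup table with a Category column) – and often you don’t even need the column, because you can use Products[Category] directly in your Pivot Table:

Product Category = RELATED ( Products[Category] )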

Power Query

Power Query is a user friendly ETL (Extract, Transform, Load) tool.  Unfortunately Microsoft muddied the waters by renaming Power Query “Get and Transform” in Excel 2016, and “Get Data” in Power BI Desktop.  That aside, Power Query is used to:

  1. Extract:  Connect directly to the data source and ingest the data (into Power BI or Excel) so you can work with it.  It retains a connection to the source so when the source data is updated you can automatically “refresh” the data without having to go through the import process again from scratch. It is amazing.
  2. Transform:  You can clean and reshape the data so that by the time you are ready to use it, it already is in the format you need instead of the format you are given.  Again, you do this only once, and then you can automatically refresh when the data is updated later.
  3. Load the data directly to the place where you want to use it.  The end location for the data will typically be inside Power Pivot or Power BI, but it is also an invaluable tool for use with traditional Excel.

Don’t think of Power Query as simply a tool for self service BI reporting.  I have used Power Query to automatically audit information (eg XLSX files, csv extracts from other systems etc) and produce exception reports directly in Excel.  Once you understand what Power Query can do, you will start to realise all the ways you could use it to make your life easier.

Who needs Power Query and why?

People that need this tool typically are trying to solve the following problems.

  1. They regularly receive file extracts from someone (or some system) and need to manually manipulate this data before they can use it.
  2. They want to use Power BI/Power Pivot but can’t get the IT department to shape the data they need prior to loading it into Power BI/Power Pivot.
  3. People that are doing repetitive tasks such as matching invoices on a remittance advice against a bank statement (or similar) – Power Query eats this kind of work.
  4. Anyone that is given data in the wrong shape (example below).  And yes I quickly used Power Query to un-pivot the data in the blue table so it looked like the green table.  It is worth getting and using Power Query just to get this un-pivot feature alone!
    image

Wrap Up

Hopefully this overview has helped demystify how Power Pivot, Power Query and Power BI can help you in your job.  If you would like to learn more about Power Pivot, you can read my book “Learn to Write DAX“.  If you want to learn more about Power Query, I recommend Ken and Miguel’s book “M is for Data Monkey“.  If you live in Australia, you can attend one of my live training courses where I cover what you need to know about Power Pivot, Power Query and Power BI.

Shaping vs Modelling in Power BI

Level: Beginners

Power Pivot, Power Query and Power BI are 3 products that are closely related to each other and were all built for the same purpose – enabling Self Service Business Intelligence.  I first learnt to use Power Pivot for Excel, then Power Query for Excel, and finally Power BI.  But there is a new wave of users that have never heard of Power Pivot and Power Query, and their first exposure to the underlying technologies is with Power BI.  This can be confusing and hence I explain what you need to know below.

Power BI Can be Confusing to First Timers

I realised recently that people who have never used Power Pivot and Power Query could easily be confused by the different options to do things in Power BI.  There are lots of reasons to be confused, including:

  • There is no reference to the term “Power Query” inside the Power BI tools.  Microsoft stopped using this name in favour of terms like “Get Data”, and “Edit Queries”.  I think this was a bad decision and it has fuelled the confusion. When I tell people to do something in Power Query, they therefore don’t know what I am talking about.
  • There is no reference to the term “Power Pivot” inside Power BI either.  The most relevant proxy for Power Pivot inside Power BI is the “Modelling” menu. Use of the term Power Pivot therefore doesn’t make any sense to “Power BI only users” either.

Shaping and Modelling Your Data

The implication of the above is that there is no clear way to delineate (with language/labels) between the process of loading the data and the process of modelling the data.  I like the term “Shaping” as a replacement for Power Query/Get Data and “Modelling” as a replacement for Power Pivot.

Definition of Shaping

Shaping is the process of extracting your data from the source, and transforming the data so that you can load it into Power BI (often referred to as ETL).  The reason I like the word Shaping is that it clearly describes what is normally needed as the first step in any BI project.  The shape of your data refers to

  • What data you extract
  • How your data is loaded into one or more tables, and which tables you import
  • What names you give to the columns and tables
  • Whether your tables are wide and short (many columns), or long and narrow (fewer columns)

For example, data tables (also called fact tables) are normally best shaped into long, narrow tables (lots of rows and a minimum number of columns), whereas lookup (dimension) tables can be as wide as needed for the task.

image

All tables should only be loaded with data that you actually need.  One big difference between Power Pivot/Power Query and Excel is that it is easy to go back and get the data you missed later.

Definition of Modelling

Modelling is the process of building the relationships between your tables and converting the business requirements into measures and calculated columns using the DAX language.  The measures contain

  • The business logic that transforms your raw data to a usable, value adding calculation
  • The formatting required for the numbers (currency, percentage)
  • A good business name for the measure that accurately describes the measure

Prior to Self Service Business Intelligence, data modelling was always done by expert IT professionals using a complex language like MDX.  The complexity of such languages put it out of reach of users, as it was simply too hard to learn casually.

2 Ways to Add A Column – Wait What?

A perfect example of the type of confusion Power BI first timers face is that there are 2 ways to add columns to your data.  You can add a column during the shaping process, and you can add a column during the modelling process.

image

So which should you use?  Well it depends on what you are trying to do.  My general advice on adding columns to your tables is to make the change as close to the source as possible. So in order of preference:

  1. Add the column in your source data (eg your Data Warehouse) with the help of your IT friends if necessary
  2. Add the column during the shaping process
  3. Add the column during the modelling process

If you add the column at the source, then you have access to it for your current BI needs but also for future BI needs.  Adding the column at the source provides the most reuse, and if you need to make a change down the track, it will be the least maintenance.  But sometimes you can’t do that for various reasons, or you can’t do it quickly.

If you add the column during shaping, it will be loaded and compressed with all other columns during data load.  This normally means better overall table compression and hence a good outcome for the overall model.

The most common reason you would add a column during the modelling process is if you need to leverage other parts of the model to create the data.  An example would be if you wanted to use a measure (eg Total Sales) in your data model to create the new column in the model (eg Banded groups of Customers based on High, Med, Low annual sales).  But don’t fall into the trap of using too many calculated columns instead of measures. I rant on about that here.  Use them when they are needed, otherwise avoid them.
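
To make that banding example concrete, here is a sketch of such a calculated column on the Customers table.  The [Total Sales] measure and the thresholds are assumptions for illustration only:

Customer Band =
SWITCH (
    TRUE (),
    [Total Sales] >= 100000, "High",
    [Total Sales] >= 20000, "Med",
    "Low"
)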

Wrap Up

Hopefully this explanation of the role of shaping and modelling in Power BI has helped clarify some things for you.  Don’t forget to check out the other articles in my knowledge base here.

Top 10 Tips for Getting Started with Power BI

Level: Beginners

I really love Power BI, and I have learnt so much over the last 12 months that sometimes it is easy to forget the challenges I had in getting started.

Today I am sharing my top 10 tips on how to get started with Power BI.

Build Your Reports in Power BI Desktop, Not the Service

One thing that can be confusing is that there are 2 ways you can build reports – Power BI Desktop and Power BI Service.  My first tip is that you should always use Power BI Desktop.  Reasons include:

  • There are more configuration capabilities in Power BI Desktop
  • You can upload changes made in Power BI Desktop to the Service, but you can’t download changes made in Power BI Service to the desktop.
  • There is currently no way to back up your changes in Power BI Service, but you can back up Power BI Desktop yourself.

Don’t Try to Replicate What You Do in Traditional Excel

Power BI is not the same as Excel, and hence you should not start out trying to simply replicate what you currently do in Excel.  The closest thing in Excel to Power BI is the Pivot Table (probably Pivot Charts actually).  The key similarity here is that a Pivot Table is an aggregation and summarisation visualisation tool – this is exactly what Power BI is too.  Yet there is no Pivot Table object currently available in Power BI.  The closest thing to a Pivot Table in Power BI is the Matrix, yet if you start using the Matrix with the view of replicating a Pivot Table, then you are bound to be disappointed.  Hence you should explore the other options (to your great delight).  See the next tip.

Explore the Visualisations to Understand Capabilities

Instead of replicating what you do in Excel, I recommend that you explore the various visualisations to see what each of them does.  There are lots of new visualisations that don’t exist in Excel, and some of these are very effective in helping you communicate the insights in data.  A great place to start is the Tree Map or a Column Chart.  You can get these visualisations to behave just like a Pivot Table in Excel.  Just place a number of summary levels into the visual (see the Axis section in the image below), then you can drill through the visual just like a Pivot Table.  It is a great experience.

drill

You should also check out the Custom Visuals Gallery for some of the great free tools that have been shared with the community by industry leading experts.

Understand the Purpose of Your Report, Then Find a Way to Build It

A common mistake is to start out thinking that you should simply replicate what you have previously built in Excel.  As covered above, this approach will limit your thinking and make the journey so much harder.  Instead I recommend you start out by thinking about the purpose of your report.  You need to be clear about “what” you are trying to communicate, “how” the user will consume the information, and what business decisions will be made with the data.  If you can write down a couple of sentences about the purpose of your report, that is even better.  Once you are clear on the purpose, then set about finding the visualisations and techniques that best deliver on that purpose.

Assume Your Current Data Shape is Not Ideal for Power BI

Many Excel professionals make a fundamental mistake by simply loading data from their current source systems into Power BI as is.  The most common issue I find when helping people with their DAX problems is they don’t have a good data structure.  There is a big difference between transactional business systems and BI reporting tools.  If you want your reporting tools to be easy to use, easy to enhance and to perform well, then you really need to understand the data structure needed by BI Reporting tools.  I wrote a separate article about data shape here.

Assume Calculated Columns are Not What You Need

Self taught Power Pivot/Power BI users from an Excel background tend to make the mistake of writing far too many calculated columns.  I have hardly ever seen a calculated column from a self taught user where it was “the correct decision” – in almost all cases the calculated column should have been written as a measure.  I cover this topic in a lot more detail here.  I generally recommend (to Excel users) that you should assume your formulas should be written as measures unless you know for sure why it needs to be a calculated column.
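
As a quick illustration of the difference (the table and column names are assumptions), the first formula below is a calculated column that is computed row by row and stored in the model, while the second is a measure that produces the same business result only when it is needed in a visual – and in most cases the measure is the better choice:

Line Total = Sales[Qty] * Sales[Unit Price]        (a calculated column on the Sales table)

Total Sales = SUMX ( Sales, Sales[Qty] * Sales[Unit Price] )        (a measure)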

Learn How Cross Filtering Works, and How to Turn it Off

Cross filtering is one of the great features of Power BI, and you should learn how it works so you can use it to your advantage.  However in my experience the default cross filtering behaviour is not always the best experience for the user.  In a recent release, Microsoft deployed a feature that allows you to turn off (or change) the cross filtering behaviour between visualisations.

Once you select a visualisation (1 below), the “format” menu (2 below) appears at the top of the page.  From there you can select “Edit Interactions” (3 below) to change the default cross filtering behaviour.  It is easy to work out how to do it, so just give it a go and learn by doing.

image

Load Content to be Shared into a Group Workspace

If you intend to share data with others, make sure you load the content into a Group Workspace right from the start. I have found it is much easier to keep control of data that you intend to share if you do this.  Of course at times you may be working on something in your own Workspace and want to share it, and that is fine to do.  However if you are starting out with the intention of sharing data, then the Group Workspace is the way to go.  I cover the topic of workspaces and sharing in some depth here.

Learn to Write DAX

At some time early in your journey you will discover that there is a limit to what you can do using the automated summarisation techniques that come with Power BI (and Power Pivot).  Data Analysis eXpressions (DAX) is the language of Power Pivot used in Power BI.  DAX has a lot of similarities with Excel formulas however there is a lot to learn and it is not always intuitive.  If you want to turbo charge your Power BI journey, you really do need to do some formal learning.  I recommend you read my book “Learn to Write DAX” as a cheap and easy way to make rapid progress.   Many Excel professionals tell me that they wish they started with my book first, and it would have saved them a world of pain.  There are many other great books available too.  I keep a curated list of the best books here.

Learn to Use Power Query

Power Query (get data) is much easier to learn than DAX.  The user interface is very intuitive and using the menu items in the UI will normally give you excellent transformations with very little knowledge or previous experience.  You should explore what you can do, and go in with the assumption that “there has to be a way to do this”.  Once you get the basics down, you should consider learning some more advanced techniques from books like M is for Data Monkey.

Direct Connect from Excel to Power BI Service

Today Microsoft announced a great new feature that allows you to direct connect FROM Excel TO Power BI, and not the other way around.  This simple change really streamlines the integration experience between Excel and the Power BI Service, and makes Power BI even more like your own personal SSAS server.

Power BI direct connect

There are 2 immediate use cases that come to mind

Personal SSAS Server

It is now easier than ever to effectively have your own free SSAS Server.  Just create a free Power BI account, load up your Power BI workbook, and then access your data models directly from within Excel.

Corporate Sharing

If you are using the paid Pro version of Power BI, the ability to use Power BI Service as a “bonus” SSAS server is also now easier than ever.  Just open Excel and connect directly to the data models that have been shared with you from your organisation via the Excel Power BI menu.

Installing and Getting Started

Download the new version of the Excel Dashboard Publisher Addin here

After installing, you will notice a new Menu Bar in Excel.  Click on this menu and then connect to your Power BI Service account.

image

Once you are signed in, you will be presented with a prompt asking how you want to connect.  In my case I was offered just a single workspace as shown below.

image

What is interesting is that I actually have 2 workspaces in my service, but I can only see one of them.   So something seems a bit amiss here, but I am sure that will be sorted out shortly.

image

A second interesting thing is that you will be presented with options to connect to either a report or a dataset.

image

In my testing, the results were the same.  Maybe something new is coming in the future here!

And yet a third interesting thing is that I was then asked to Authenticate again.  It would be nice if these Azure Authentication dialog boxes were not modal and supported Lastpass.  This would make the login process so much easier.  Even better – Excel could pass the stored credentials to the service so I don’t have to authenticate twice.

image

The good news is this is a one off additional login process, so the next time you connect from Excel you won’t have to log in again.  The exception is if you have multiple Power BI accounts (eg if you are a consultant like me with many clients on different domains) – in that case you will have to continuously log in and out, and not be able to benefit from Lastpass 🙁

After logging in, you will then see a new blank Pivot Table connected directly to the Power BI Service

image

A Final Word

The original Pin to Power BI Dashboard/Pin Manager is still there (see 1 below), giving you great flexibility in sharing ranges in your workbooks in the Power BI Service.

And last but not least, look how POWERful my Excel Ribbon has become (2 below) – I now have 5 POWERful Addins.  Sweeeet.

image

Top Tips for Sharing Content Using Power BI

I have learnt a lot about sharing Power BI content over the last year since Power BI hit the mainstream.  A lot of my learning has come from trial and error, and there are lots of things I wish I knew before I started.  Today I am sharing my top tips about how you can share, and when you should use each of the various methods.

It is not easy to categorise the sharing approaches into logical buckets and there is quite a bit of overlap.  Hopefully my explanation below will make it easier to understand the options.

PBIX Workbooks

The most basic way to share your Power BI content is to simply share your PBIX workbook (created by Power BI Desktop) using traditional methods.  There are a few reasons why you might want to do this.

Reasons you might want to do it:

  • Everyone can download the software and use it for free (Windows users – sorry Mac boys and girls)
  • It has lots of new visualisations you can’t get in Excel
  • You don’t need your IT department to sign you up to the Power BI Service to get started (although you do need Admin Rights to install)
  • It is no harder to share a PBIX workbook than a Power Pivot Excel workbook.

Things that might turn you off:

  • Workbooks can get very large – hundreds of Mb, so they can be hard to share if they are that big.
  • There are no Dashboard capabilities until you publish to Power BI Service.
  • You can’t stop sharing – once you send the workbook, the other party has it (not so with the Service).

So while you can do it this way, sharing via the Power BI Service is probably a better way to go.  Power BI Service was built for the explicit purpose of sharing, so why not use it – it’s free to get started.

My Workspace

Once you sign up to the Power BI Service, the default location where you will publish your workbooks is the “My Workspace” area.

image

  • The workspace area is a place for you to store your Datasets, Reports and Dashboards.
  • You are the only person that can see the items in your own “My Workspace” (unless you explicitly share material)
  • Items that other people share with you will also be visible in your own “My Workspace”, but with a “shared with me” icon.

Sharing from My Workspace (use caution)

There are 3 ways you can share from My Workspace.  These methods of sharing are also available in other areas of Power BI (covered further down the page).  While you “can” share this way from My Workspace, be sure to read up on the downsides of sharing this way and consider the other options further down the page.

Here are the 3 options to share from My Workspace.

Share a Dashboard from My Workspace

Click on the ellipsis next to the dashboard to bring up the sharing menu.  You can share to individual email addresses or to Active Directory Groups.  It is possible to securely share a dashboard this way with people internal to your organisation (email domain) as well as with people outside of your organisation.

image

Publish to Web from My Workspace

Sharing this way allows you to make the material publicly visible over the web.  You can get an embedding code to publish the material via a web page (see this example in a recent blog) or you can get a link that you can send to anyone. But be careful!  Even if you send the link to a single person, the material is available to anyone that has the link (or can find it, hack it etc) – no password required.

image

Create a Content Pack from My Workspace

A Content Pack is a bundle of Datasets, Reports and Dashboards that can be distributed to defined users in your organisation.  When you create the Content Pack, you can specify individual users or groups of users via Active Directory.

image

If someone wants to access this data shared with them as a Content Pack, they simply click “Get Data” and select the content pack from “My Organisation”

image

The benefit of sharing via Content Packs is that you can bundle relevant material and share with only those that need it.  They can choose to access it only when they want it.

There are some downsides of sharing from My Workspace

Here are the pros and cons of sharing this way from My Workspace.  Remember it is possible to share using these methods from a Group Workspace too – more on that below.

Pros:

  • It is free for the first 2 options, but you need a Pro License to create a content pack (option 3).
  • It is easy to share very large workbooks with other users as long as they have an account too.
  • You can share with Mac users (via browser)
  • You can share with Mobile users (there are apps and browser solutions)
  • You can stop sharing the content at any time by going back into the share options in the service and removing the sharing approval.

Cons:

  • It is difficult to easily spot the dashboards you have shared from the My Workspace Dashboard view.
  • My Workspace can get very crowded with ad hoc reports and dashboards for your personal use, and these can then get in the way of shared material.
  • If you have a role to share material within an organisation (eg Business Analyst sharing material), then what will happen when you go on leave?  No one else can access your workspace.
  • There are better ways of doing it by using Group Workspaces.

Group Workspaces

If you are serious about sharing material, then you really should start using Group Workspaces. Group Workspaces are a Pro Licence feature, but it is almost a must for any organisation serious about sharing content via Power BI.

The benefits of using Group Workspaces vs My Workspace include:

  • More than one person can publish, edit and manage all content within a Group Workspace.
  • One or more people can administer user access to the Group Workspace.
  • If the original author is not available (ie they are on holidays or they leave the company) then someone else can take ownership of the material without starting again from scratch.
  • You can have as many Group Workspaces as you need, and these can be based on any group categorisation that works for you and your organisation.
  • When you have your shared material in a Group Workspace, there is no confusion as to what is your personal material and what material you have loaded for the purpose of sharing (see below how you can categorise the Group Workspaces).

image

Once you have decided to use a Group Workspace, you have the same sharing options as covered earlier with all the same benefits.

  • Share a Dashboard
  • Create and Share a Content Pack
  • Public Sharing – Publish to Web

Sharing Group Workspaces

The final method of sharing is to share the Group Workspace itself.  You can of course share the Group Workspace for the purpose of “administration”, but you can also share with the purpose of Group Collaboration and/or Content Sharing.  You have the ability to set access for individuals to be either an administrator or a member.  Admin always has full privileges. Members can be configured by an administrator to either be all “editors” or all “viewers” depending on the group requirements.  One limitation is that you can only share a Group Workspace with individual email address users, not an Active Directory group.  If you think that is just wrong (as I do), then please vote here to have this changed.

Hopefully this article has helped you gain a deeper understanding of the sharing options available with Power BI.  Microsoft has announced it is reviewing a new idea called  Private Publish to web, so keep an eye out for that, and vote for it if you like the idea.

Sharing Directly In Excel

As pointed out by Anthony (in the comments below), once you have access to shared data via Power BI, you will have access to another KILLER feature – direct access to the data from within Excel.  There was a new update on this today, and I blogged about it here. exceleratorbi.com.au/direct-connect-excel-power-bi-service/

Conditional Columns in Power BI Desktop

Microsoft recently announced a new conditional column feature in Power Query (Power BI Desktop version) that warrants a blog post.  This feature makes the process of creating banded columns much easier for the average user.  I blogged about banded columns using DAX here, and now there is this second, potentially easier approach.  In the demo below, I will add a new column of “age bands” just like I did with the DAX example above.

This column is then part of the data model and can be used in Power BI Desktop to slice and dice the visualisations.

The New Conditional Column Menu Item

I am using Adventure works and importing the Customer table (shown as 1 below) for this demo.  Note on the Add Column menu (2) there is a new menu item “Conditional Column” (3).

image

Add an Age Column

The first thing I need to do is create an Age column.  To do this, I first created a “Year of Birth” column by selecting the Birth Date column and then selecting Date\Year\Year.  This will add a new column containing the year of birth (keeping the original Birth Date column).

image

I then edited the code created by Power Query directly in the Formula Bar.

image

The original code created by Power Query was this:

= Table.AddColumn(_Customers, "Custom", each Date.Year([BirthDate]), type number)

I simply changed the code directly in the Formula Bar to be like this:

= Table.AddColumn(_Customers, "Age", each 2016 - Date.Year([BirthDate]), type number)

Once you get the hang of the way Power Query formulas are constructed, you can easily make small changes like this to streamline the process.  Of course this is a hard coded approximate age; because the year 2016 is fixed in the formula, it will not stay correct as people have more birthdays, but it is good enough for the purpose of this demo.
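
If you would rather not hard code the year, you could calculate it from the current date instead.  Here is a minimal sketch of that idea, reusing the _Customers step name from the query above (Date.Year accepts the value returned by DateTime.LocalNow()):

= Table.AddColumn(_Customers, "Age", each Date.Year(DateTime.LocalNow()) - Date.Year([BirthDate]), type number)

This way the approximate age stays current each time the query refreshes rather than being frozen at 2016.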

Add the Conditional Column

Then I clicked the new Conditional Column menu button (shown earlier) and the following dialogue box appeared.

image

It is just a matter of using the drop down menu items to build the logic for the banded column using standard If/Then logic.

image

The main thing to note is that you need to build the logic in a way that it incrementally checks each band and then exits the logical test once it finds the right band.  In my example above, I started with the oldest band, and each test is a “Greater Than”.  I could have also started with the youngest band and applied a “Less Than” test.  But it would not work if I started with the youngest band and did a “Greater Than” test, or if I started with the oldest band and did a “Less Than” test.
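
To make that ordering point concrete, below is a sketch of the kind of code the Conditional Column dialogue generates when you start with the oldest band and use “Greater Than or Equal To” style tests.  The band cut-offs and the previous step name (#"Added Age") are illustrative placeholders only; your own step names and bands will be different.

= Table.AddColumn(#"Added Age", "Age Band", each if [Age] >= 60 then "60 plus" else if [Age] >= 40 then "40 to 59" else if [Age] >= 20 then "20 to 39" else "Under 20")

Because the tests are evaluated from the top down, a 65 year old is captured by the first test and never reaches the later ones, which is exactly why the tests must work from the oldest band to the youngest when using “Greater Than”.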

The Final Result

Note how the new banded column now appears (shown as 1 below) and also note the code in the Formula Bar (shown as 2 below).

image

It has always been possible to write this code (shown in the formula bar 2 above) using the “Add Custom Column” feature.  What Microsoft has done here is create a user interface that helps the user write this custom column without having to learn the PQL language.  In fact this is what Power Query is all about – it is simply a user interface driven tool that sits on top of a powerful formula language, and the user interface does most of the heavy lifting for the user.

The product keeps getting better – keep up the good work Microsoft.

What is Power BI

This is the third in my “what is” series, having previously blogged about “What is Power Pivot” and “What is Power Query”.  This topic is the hardest of the three to give a straight answer to; however, it is also the most important to understand (as I will explain below).

Why Power BI is so Important

Microsoft Power BI is the single most important thing to happen in the data analytics market since the introduction of Microsoft Excel 30 years ago.  The reasons it is so important include:

  1. It is a Microsoft product, and it is a strategic priority for the company.
  2. It is built on the heritage of two of the most mature and best BI products available (SQL Server Analysis Services and Microsoft Excel).
  3. Despite the heritage from SSAS and Excel, Power BI is actually being built/rebuilt using the latest technologies including HTML5, cloud computing, column store databases and modern smart phone mobile apps.
  4. Microsoft is listening to the community to determine what it builds and delivers next.  Let me state that again – Microsoft is listening and delivering!
  5. Microsoft has opened up the visualisation layer to open source contributions via the custom visuals gallery, so there is (and will continue to be) lots of community added value.  Read about that in my blog here.

Gartner has recognised the trend towards self service business intelligence and also Microsoft’s leading position in this space.  You can look at the 2016 report on this topic here if you are interested.

Why it is Hard to Explain What Power BI is

Microsoft is known for its great products, however it doesn’t always have a great record when it comes to naming products.  The naming of Power BI products has been confusing to say the least.  I won’t spend time going back through all the confusing history, but instead I will provide my spin (below) on the least confusing way to think about Power BI products today.  I first covered this framework in my blog about the best training for Business Analysts.

So, What is Power BI?

Power BI is a suite of Power BI branded products plus an extended family of umbrella products that collectively change the way people use and analyse data.  All Power BI products (regardless of which group they are in) have the attributes of a robust, enterprise strength BI solution, but in addition they all have best in class self service BI capabilities.  What this means to the average business analyst is that you don’t need to run to IT every time you need something done, avoiding the time and cost conflicts that so often prevent success.

If you want to learn to be a Power BI Ninja, you will need to learn skills in all of the following areas:

  • Extract and Load your data ready for analysis – you use Power Query for this.
  • Enhance the data you have loaded (using Power Query) with relationships and calculations (measures) that can be used to show and find business insights – you use Power Pivot for this.
  • Create reports that bring out the insights from the data, and allow users to easily interact with the data – you use Power BI Desktop for this.
  • Distribute the reports and dashboards to those who need them – you use Power BI Service and Power BI Mobile for this.

Power BI Branded Products

There are 3 Power BI branded products.

Power BI Service

This is the cloud solution that makes sharing live business analytics a snap.  Need to share a 250MB report – no problem!  It is possible to share small or large reports and dashboards internally within your organisation, externally with third parties, and also share non sensitive information publicly over the web (as I have done with this demo).

The Power BI Service has a completely new visualisation front end built from scratch.  The capabilities are still relatively immature; however, the pace of improvement is stunning, and the addition of the open source visualisation tools discussed earlier makes it a sure bet.

The service can store your data online, and/or connect to your data in the cloud via Azure (or similar), and/or it can connect to your in house data via a Power BI Gateway.

Power BI Desktop

This is the free Power BI authoring tool that allows you to quickly and easily analyse your data and find insights directly on your PC. The general process is to connect to your data, reshape the data if needed, load the data into the tool, then build striking visualisations that help you analyse your business.  You then can save the workbook (just like you do in Excel), distribute the file for free to other people (eg email, place on a network drive) or better still – publish the file directly to the Power BI Service with just a couple of clicks.  Once the analysis is loaded to the Power BI Service, it is easy to share with others.

You should author your Power BI reports in Power BI Desktop rather than in the Service, as you can back up your Desktop files but you can’t back up reports authored in the Service.

Power BI Mobile

This is the free mobile tool that allows you to consume any of the reports that are available to you in the Power BI Service.  There are versions for all the major tablets and smart phones.  This software is also fairly new and still being enhanced.  It is not perfect, but it is good enough to get started and it will only get better from here.

Power BI Umbrella Products

There is a suite of underlying “products” and technologies that support the Power BI branded products.  These umbrella products covered below are the supporting technologies sitting behind Microsoft’s Self Service BI Strategy (and also Enterprise BI in some cases).

Power Pivot

Power Pivot is a data modelling tool.  It allows you to take tables of data, create relationships between those tables so they work together (without VLOOKUP!), and then enhance the data model so that it includes specific calculations that turn your raw data into useful business insights.  This work has historically been done by the IT department, however it can now be done by competent business users using the language of Power Pivot – DAX (Data Analysis Expressions).

Power Pivot is available in Excel 2010+, SSAS 2012+, and Power BI Desktop/Service.  It is the same technology across all of these products.  If you learn it once, you can use the same skills across all of these products.  You can read more about Power Pivot here.

Power Query

Power Query is a data acquisition and transformation tool.  It helps business users grab data from anywhere it resides, then cleanse, reshape and prepare the data ready for the analytics work to come – no IT guru needed any more.  Power Query is a user interface driven tool that makes it easy for anyone to manage and load data, and it is supported by a powerful programming language for anything the user interface can’t do.  You can read more about Power Query here.

Power Query is available in Excel 2010+ and Power BI Desktop/Service.  It  (annoyingly) has been renamed “Get & Transform” in Excel 2016 and “Get Data” in Power BI Desktop.  What were they thinking?


What is the right way to learn Power BI?

There are lots of different learning styles, but here are the best resources I know of for learning these tools.

Live Training

Reading Books

I keep an up to date recommended reading list in my knowledge base here.  As well as information about books, I keep a list of great websites to learn from down the bottom of the page.

Online Learning

There are lots of online resources to learn. Some of the best I have found are:

Power BI Analyze in Excel – What You Need to Know

Microsoft announced another awesome Power BI feature recently – Analyze in Excel.  This feature was requested by Avi Singh from PowerPivotPro.com and was heavily supported as a requirement by the community. You can read more about how you can influence future releases at the bottom of this post.  But let’s get straight into this great new feature.

Analyze in Excel is available for the free service offering as well as the paid service.  Note however that if you want to upload a workbook and share it with someone else in your organisation so they too can Analyze in Excel, you will need the paid service.  But don’t despair – Power BI Pro is cheap and well worth the money.

Power BI Service

The Power BI Service is a modern version of SQL Server Analysis Services fully hosted in the Cloud by Microsoft.  What that means to you is you can have a powerful server to host your Power BI workbooks and distribute your reports without having to spend hundreds of thousands of dollars getting it all set up.  You can get started today for free with just a few clicks. powerbi.microsoft.com

Once you have your Power BI workbook loaded in the Cloud, you can even share a fully interactive version of the workbook publicly using the Publish to Web feature (demo’d below).  Note you should only do this with non-confidential data as this approach will make your data available publicly over the Internet.

What is Analyze in Excel?

Once you have your data loaded up in Power BI, the world somehow just seems better.  Sometimes however you just want to get into the data and analyse it using good old fashioned Excel Pivot Tables.  Now you can do that with the new Analyze in Excel feature.  To do this, you need to do the following:

  • Log in to the Power BI Service with your account
  • Navigate to the report in question in the left hand panel of the Power BI Service.
  • Click on the ellipsis next to the report name (shown as 1 below)
  • Click Analyze in Excel (2 below).

image

Updated Data Connector

The first time you do this, you will be prompted to download some updated software that is required for this feature to work.  Make sure you select the correct version for your version of Excel.

  • 32 bit version of Excel = install the x86 version (shown as 1 below)
  • 64 bit version of Excel = install the x64 version (shown as 2 below).

image

Once you have installed this, you can click the “Don’t show this again” box (shown as 3 above) to prevent this dialogue appearing each time you go through this process.

One thing to be aware of: after you click “Don’t show this again”, you can get to the download box again by clicking the downloads button (shown as 1 below) and then selecting “Analyze in Excel updates” (2 below).  However when you do this, it currently ONLY DOWNLOADS the 32 bit version regardless of whether you have 64 bit Excel or not.  I have logged a bug with Microsoft and will remove this comment from this blog once the problem is fixed.

image

Building a Pivot Table

After you click the “Analyze in Excel” button, a small ODC file will be downloaded to your PC.  Keep an eye out for the download in your browser – mine is shown below.

image

When I click on the above ODC file, I get a warning message as shown below.  Just click Enable.

image

After you click “Enable”, you will get a new blank Pivot Table connected to the Power BI Service as shown below.

image

Now you have a new blank workbook connected to your Power BI data model in the cloud.  You can build out a Pivot Table to analyse your data like any other Pivot Table, with a few minor differences.

If you take a close look at the Pivot Table Fields list on the right, you will see that there are now 2 types of field groups.  The first type (shown as 1 below) contains the measures stored in your tables in Power BI.  These “measure tables” are indicated by the Sigma symbol.  The second type (shown as 2 below) contains the actual tables (and columns) from the data model.  You select the Values for your Pivot Table from the measure tables (1) and the Rows/Columns/Filters/Slicers from the actual tables (2).

image

Now you just build out your Pivot Table as normal as shown below.

analyze in excel

Thin Workbooks

When you save this file, you will notice something really special.  This is a “Thin Workbook”; in my example it is just 24 KB (shown below).

image

The data does not live in the workbook – it is only in the cloud.  The only data that is in the workbook is the data that is visualised inside the Pivot Table(s).  But there is a cost to this approach – you must have a live connection to the Internet to be able to interact with the data in the Pivot Table.  Personally I think this is a small price to pay for all of the benefits, however if you want to interact with your workbook in a location that doesn’t have an Internet connection, then this is something you will need to be aware of.

Also note that now the data is in the cloud there will be a small amount of latency (delay) when clicking on the Pivot Table compared to a local workbook on your PC.  This will be most noticeable for small workbooks that are almost instantaneous on your PC.  These small fast workbooks will move from being “instantaneous” on your PC to being “sub second” in the cloud – which is not that bad in my view.  If you have very large workbooks on your PC (say 300MB +), or if you have 32 bit Excel on your PC, then you may actually notice an overall improvement with these thin workbooks.  This is because the processing of the cube is now pushed to the Power BI Service which has lots of powerful processors to complete the task.

Distributing Thin Workbooks

Now that you have this “Thin” Excel workbook, you can distribute the Thin Workbook to other users and they can also interact with the data live directly in the Excel workbook connected to the cloud.  You don’t need to distribute the ODC file, just the updated and saved Thin Workbook.  There are some pre-requisites for this to work, including:

  1. You will need to send them the Thin Workbook once the connection has been established (of course).
  2. They will need their own Power BI Service Account.
  3. They will need access to the shared data via the Power BI Service, so you will need to share the Power BI report with them too from within Power BI.
  4. They will need to install the updated “Analyze in Excel” update described earlier.

I have tested distributing this Thin Workbook to a Mac user with Excel 2011 and unfortunately it didn’t work. I haven’t tested it with Excel 2016 for Mac but would like to hear from anyone who does.


Better Than Export to Excel

Rob Collie has a favourite joke.  Qn. What is the 3rd most common button in all BI tools?  Ans. Export to Excel (3rd after 1. OK and 2. Cancel).  But Analyze in Excel is so much more than “Export to Excel”.  The reason of course is that you don’t have to take a copy of your data to work with it.  You create a live link to the data source, and hence when the data is updated in the future, your “Analyze in Excel” workbook will also update.

Cube Formulas Work Too

If you like using Cube Formulas, the good news is they will work too.  When you type a cube formula (as shown below), you will be given a choice of data connections (1 being the Power BI Service and 2 being the data model in the current workbook).

image

So that got me wondering… Can I have 2 data models for a workbook, 1 in the Cloud and a second in the workbook?  The answer is yes.  I have 2 cube formulas below; the first is coming from the data model in the current workbook and the second is coming from the Power BI Service connection in the same workbook.

image

Common Errors

New Versions of Connector

The connector is being updated all the time.  If you see this error (or any other connection error for that matter):

The connection failed because user credentials are needed and Sign-In UI is not allowed

The first thing you should do is make sure you have the latest version of the connector installed.  You can find the latest update online at the Power BI Service as shown below.

Forbidden Error

If you have multiple Power BI accounts, you may come across the following error “The HTTP server returned the following error: Forbidden.”  In fact some users that don’t have multiple accounts have also experienced this error.

image

The issue is caused by “swapping” between accounts, with the login process potentially trying to log into the wrong account.  There is a manual workaround to fix this problem.

  1. Navigate to the ODC file you downloaded.
  2. Right click the ODC file and edit it in Notepad.
  3. Find the section that starts with <odc:ConnectionString>
  4. Add the following text immediately after this string
    User ID = name@youremailaddress.com;
  5. Save the file, and then double click to open it again.

You should now be taken to the correct login screen to give you access to the data. I had this very issue myself when I first tried to use this feature.  I logged a thread at community.powerbi.com and got immediate help from the Microsoft Power BI support team to help me resolve the issue.  If you are not a member of the Power BI community, you are missing out – sign up and get involved.

Hopefully a more elegant solution will be delivered in the future – I’m sure it will.

Initialization of the data source failed

You may  see this error, particularly the first time you try to do this.

“Initialization of the data source failed. Check the database server or contact your database administrator. Make sure the external database is available, and then try the operation again. If you see this message again, create a new data source to connect to the database.”

This is normally related to you not having the correct data connector installed for your version of Excel.  Make sure you have installed the correct (64 bit or 32 bit) data connector as covered earlier in this post.

How to Influence Future Power BI Updates

Power BI is rapidly improving each week as the crack Microsoft Power BI team focuses on delivering the new features that its customers most want.  If you want to influence what features get built out next, then do what Avi did and create an idea at ideas.powerbi.com.  Or you can simply look at other people’s ideas and vote for the ones that you want most.

If you want to help me with one of my passions, you can vote for this idea here.

ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/12464280-individual-custom-links-for-pinned-live-pages

What I think is really needed is a way to build a proper “web page like” menu structure so you can build out Power BI with a “web browsing experience”.  The current approach of pinning tiles is OK in limited scenarios, but is not really “complete” enough to deliver a comprehensive reporting tool.  I think we need a proper menu/navigation editing solution like in SharePoint where the curator can build out a proper menu structure to deliver the reports to users in a custom and meaningful way.

What is Power Query

I am still astounded by the number of people I meet that have never heard of Power Query and Power Pivot.  The good news is there are lots of people out there who are in for a pleasant surprise.  I think Power BI will solve the “lack of awareness of Power Query” problem over the coming months and years.  Power BI is much more visible, as can be seen in this Google Trends chart that I first created in Dec 2014.

I have written a blog titled “What is Power Pivot” before and I refer people to that page all the time – this helps me quickly give people a detailed summary of what they need to know.  Today I am writing about Power Query for the same reason – to have something that I can point people towards when they are just starting out.

So What is Power Query (aka Get and Transform)?

Power Query is an Extract, Transform and Load tool (ETL) that helps Excel and Power BI users connect to data, transform the shape of the data as required and then load the data for further use, typically into Excel, Power BI, CSV etc.  ETL tools are not new – there are many enterprise strength tools in the marketplace that are used by IT professionals.  What makes Power Query such a ground-breaking tool is it is the first tool built specifically for business users.

Who can benefit from Power Query?

If one or more of the following scenarios applies to you, then Power Query is a great tool to make your life better.

  • You spend hours and hours of repetitive work combining data from multiple files (CSV, Excel) into a single Excel workbook before you can start the real work (see the sketch after this list).
  • You have data in a file but it is the wrong shape for the task at hand.  You need to spend time “reshaping” the data before you can use it.
  • You write lots of VLOOKUP formulas over multiple tables of data to “join” the data together.
  • You manually have to “cleanse” your data every month removing data and errors that you already know about but you can’t permanently solve back at the source.
  • You need to get data from web pages into Excel so you can analyse the data. You cut and paste the data and spend hours reformatting it to remove the noise from the data.
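
To give a flavour of the first scenario above, here is a minimal Power Query (M) sketch that combines every CSV file in a folder into a single table.  The folder path is a made-up example, and it assumes all the files share the same column headers:

let
    // List every file in the folder
    Source = Folder.Files("C:\Data\MonthlyFiles"),
    // Parse each file's content into a table and promote the first row to headers
    Parsed = Table.AddColumn(Source, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all of the individual tables into one combined table
    Combined = Table.Combine(Parsed[Data])
in
    Combined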

What Can Power Query Do?

Power Query can be used for the following (and probably many more things too).

  • A data loading tool
  • A data cleansing tool (remove unwanted columns, rows, characters etc)
  • A data reshaping tool including a very powerful “unpivot data” function (example shown below)

unpivot

  • A tool to combine data from multiple worksheets or workbooks into a single file/table

combine

  • A tool to join data from different sources into 1 coherent data set or table.
  • An auditing tool – compare lists of data and find the similarities and differences.
  • A website data scraping tool – easily load tabular data from a web page into Excel or Power BI.
  • A data creation tool (eg it can create a calendar table for Power Pivot).  Read my blog posts about how to create a regular Calendar table here, or a 445 Calendar table here.  A simplified sketch follows this list.
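
As a small taste of that last point, here is a deliberately simplified sketch of a calendar table built entirely in Power Query.  The start date and day count are hard coded for brevity; the posts linked above cover more robust approaches:

let
    StartDate = #date(2016, 1, 1),
    DayCount = 366,
    // One list item per day
    DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    // Turn the list into a single column table and set the data type
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    // Add whatever calendar columns your model needs
    AddYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month Name", each Date.MonthName([Date]), type text)
in
    AddMonth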

What Makes Power Query So Great?

Power Query is great for many reasons including:

  • It has a simple User Interface that any Excel user can learn.  Note the simple menu driven features in the image below.
    PQUI[3]
  • The user interface is built on top of a very powerful formula language that has immense capabilities well beyond what most users will ever need.  If you do ever need something more complex than the UI can support, the language is there (as shown above) and can be used to build a working solution without switching tools.
    image
  • Power Query records repeatable “Process Steps” that can be reused over and over without destroying the underlying data.  You can go back through the Process Steps and see the data before the transformation took place.  You can even insert new transformation steps earlier in the process flow if needed.
    Process Steps
  • Repeatable process steps mean that you can build the steps for one set of data and then “reapply” those same process steps to a different set of data (provided both sets of data are fundamentally the same “shape”).  This is great for updating weekly or monthly data (for example); there is a sketch of this after the list.
    reusable process
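
Here is a sketch of what that reuse looks like in practice.  The query below loads January’s file; when February’s file arrives in the same shape, the only thing you change is the file path in the Source step (or better still, turn the path into a parameter) and every later step is reapplied automatically.  The path and the "Sales Amount" column are placeholders for this example:

let
    // Point this at the new file each period; nothing else needs to change
    Source = Csv.Document(File.Contents("C:\Data\Sales_2016_01.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    Typed = Table.TransformColumnTypes(Promoted, {{"Sales Amount", type number}}),
    // The same cleansing rules are reapplied to whichever file is loaded
    Filtered = Table.SelectRows(Typed, each [Sales Amount] > 0)
in
    Filtered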

Versions of Power Query

Excel 2010/2013

Power Query was first released as a free Addin for Excel 2010 and Excel 2013.  After you install the Addin, you get a new menu on the Excel Ribbon shown below.

image

You can find the download details for Excel 2010/2013 on my downloads page here.

Excel 2016

In Excel 2016 the product was fully integrated into Excel and renamed “Get and Transform” – unfortunately.  To find it in Excel 2016, you need to go to the Data tab (#1 below) and then you will see the Get and Transform group (#2 below).

image

Power BI Desktop

Power Query is also fully integrated into Power BI Desktop where it is called “Get Data” – unfortunately.  You can find it by navigating to the Home Tab (#1 below) and then clicking Get Data (#2 below).  Power Query (Get Data) is the only way to load data in Power BI Desktop, so if you have used this tool to build a new workbook then you have already used Power Query (Get Data).

image

Why Microsoft couldn’t stick with the name “Power Query” is anyone’s guess.  Maybe they thought users were “scared off” by the “Power” term – who knows.

The Power Query Formula Language (PQL)

The official name of the underlying language of Power Query is the “Power Query Formula Language”.  This language was developed from another language called “M”.  Both of these names are “unfortunate”, as the former is a mouthful and the latter is very difficult to search for in Google.  For these reasons, I have come up with my own name (which I hope will catch on): Power Query Language – PQL (pronounced Peequel – as in SQL Seequel).  PQL is a much better name, easy to search for on Google, and a good fit with its cousin SQL.

Where to learn more

I often blog about how to get value from Power Query.  You can subscribe to my blog via email at the top of this page.  You can also explore the previous Power Query blog posts that I have written here.

There is a great book by Ken Puls and Miguel Escobar – I wrote a book review here.  I am also a big fan of Ken’s blog that you can find here.

Multiple Data Tables in Power Pivot

Level: Beginners

All of the learning examples in my book Learn to Write DAX are built on an Adventure Works database that has a single data table.  But Power Pivot is a very capable tool for building reports that combine multiple data tables.  Although it is very capable, the logic of how to combine multiple tables in Power Pivot is different from what most Excel users will be familiar with.  This post covers the correct way to do it.

The Classic Power Pivot Data Model

As you can see in the image below, the classic Power Pivot data model consists of 2 different types of tables – lookup tables (shown as #1 below) and data table(s) (shown as #2).

image

The difference between these 2 types of tables is very important.