Archive for Excel

SQL Saturday Melbourne Feb 2017 Materials

image

I am currently in Melbourne for PASS SQL Saturday 2017.  SQL Saturday is an annual one day conference event that occurs in many major cities around the world.  It is focussed on helping SQL Server professionals learn more about their profession and also network with other like-minded people.  Many of the readers of my website material are not SQL Server professionals – most likely they are Excel professionals.  But that doesn’t mean there is nothing for you at SQL Saturday.  I encourage you to look for an event in your location, read the session materials and decide if there is value for you – particularly if you are into Power Query and Power Pivot.  You can learn more at www.sqlsaturday.com/

SQL Saturday will be held in Sydney next Saturday 18th Feb 2017 and I would love to see you there.

My Presentation: Disconnected Tables in Power Pivot

I spoke at SQL Saturday about disconnected tables in Power Pivot.  Most people know that you create relationships between tables in Power Pivot, but did you know that you can load tables that are not joined (disconnected tables) and that they can still add value?

My Slide Deck

Here are the slides that I used in my presentation today for those that would like to download them.  I realise that they really just support the live demonstration, but they should have meaning for those that were there.  I have also linked to some of my other articles on 3 of the slides if you would like to read more.

SQLSat_MattAllington_DisconnectedTablesInDAX

Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is very easy to forget how I incrementally learnt these things along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In all cases I have outlined why it is a best practice to help the understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

Naming Conventions

Naming your Columns and Measures

  • Always write a Column in the format TableName[Column Name]
  • Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write measures and columns both including the table name as follows.

  • TableName[Column Name]
  • TableName[Measure Name]

These 2 examples above do not cause a problem because their names make it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula.  Take the following example:

  • TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.
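As a quick illustration using measures that appear later in this article, the convention makes the difference obvious at a glance:

Total Sales = SUM(Sales[Extended Amount])

Chg vs LY = [Total Sales] - [Total Sales LY]

Reading these formulas, Sales[Extended Amount] is clearly a column (the table name is in front), while [Total Sales] and [Total Sales LY] are clearly measures (no table name).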

Give Tables a Single Noun Name

  • Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimension modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.

Also PowerBI.com has a natural language query tool that allows you to ask questions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

Using Spaces in Names

  • Don’t use spaces in table names
  • Do use spaces in column names
  • Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [Square Brackets] anyway and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces

Don’t Overly Abbreviate Business Terms

  • Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches in words in the question against the data model to help answer the question.  If you abbreviate your measure names to, for example, [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your names.

Measures or Calculated Fields

  • Measures is a better name than Calculated Fields

The term measures has been a Microsoft term for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to be called “calculated fields”.  This was a bad decision and thanks to lobbying from many people Excel 2016 reverted to using the term measures (as does Power BI).  I always now use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

Loading and Shaping Data

Push Shaping as Close to the Source as Possible

  • Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copy or re-write).  The data shaping tools are designed to allow you to do whatever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible to shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc).  But just because you can do these things in Power Pivot, doesn’t mean you should.  For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

Use A Calendar Table

  • If you want to do any sort of time calculations, get a Calendar table

It is possible to analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.
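If you are working in Power BI Desktop and just need something quick to get started, one option (a minimal sketch only – a properly built calendar from Power Query or your source is generally better) is a calculated table:

Calendar = ADDCOLUMNS(CALENDARAUTO(), "Year", YEAR([Date]), "Month Name", FORMAT([Date], "MMM"))

CALENDARAUTO generates a contiguous Date column spanning the dates found in your model, and ADDCOLUMNS adds the Year and Month Name columns you will typically want to filter on.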

A Star Schema is Optimal

  • Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success.  Two key things you should know.

  • Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
  • There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

You Should Prefer Long and Narrow Tables

  • Short wide tables are generally bad for Power Pivot but long narrow tables are great.

image

There are 2 main reasons why loading data this way is a good idea.

  • Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
  • Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately (see the sketch below).
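As a minimal sketch (the wide table and its column names SalesWide[Jan], SalesWide[Feb] etc are made up for illustration), with a long narrow table you write one measure and let the attribute column do the filtering:

Total Sales = SUM(Sales[Extended Amount])

With a short wide table containing one column per month, you would instead need a separate measure for every column:

Jan Sales = SUM(SalesWide[Jan])

Feb Sales = SUM(SalesWide[Feb])

and so on for each remaining month – 12 measures where one would have done.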

Only Load the Data You Need

  • Load all the data you need, and nothing you don’t need.

If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be.  In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later.  This is no longer the case – it is very easy to change your data load query to add in a column you are missing.  So bring in all of what you need and nothing you don’t.  If you need something else later, then go and get it later.  It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them.  Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).

Don’t use Linked Tables

It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot.  You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).

The trouble with doing this is that you end up with 2 copies of the data in your workbook. The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot.  In the example (shown as 3 above) there are many thousands of rows of data.  The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that you are better to have 1 workbook for your data and then import the table directly into Power Pivot without storing it in Excel at all.

Modelling

Avoid Bi-Directional Relationships

  • Avoid using the default bi-directional relationships in Power BI unless you need them.

image

Microsoft introduced bi-directional filter propagation in Power BI (this is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works.  But it comes at a cost in that:

  • The performance can be negatively affected
  • If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

Measures are Better than Calculated Columns

  • Wherever possible, you should write measures in preference to calculated columns

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal learning about database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.
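As a minimal sketch of the difference, reusing the Sales[Qty] and Sales[Unit Price] columns that appear later in this article (the column name Sales[Line Total] is made up):

Sales[Line Total] = Sales[Qty] * Sales[Unit Price]

is a calculated column – it is computed and stored for every row of the Sales table when the data is refreshed, whereas

Total Sales = SUMX(Sales, Sales[Qty] * Sales[Unit Price])

is a measure – it is evaluated on demand in the filter context of the visual and takes up no space in the data model.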

Store Measures in the Table Where the Data Comes from

  • You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure is coming from the [Extended Amount] column in the sales table.  You should therefore store the measure in the sales table.   If you place it in (say) the Calendar table, Excel will likely give you a warning similar to shown below.

image

This issue doesn’t occur in Power BI.

Break Measures into Interim Parts

  • Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used and make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that it will give you the % change in sales vs last year.
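A single-formula version of that measure, consistent with the four interim measures shown below, would look something like this:

% Chg vs LY =
DIVIDE(
    SUMX(Sales, Sales[Qty] * Sales[Unit Price])
        - CALCULATE(SUMX(Sales, Sales[Qty] * Sales[Unit Price]), SAMEPERIODLASTYEAR(Calendar[Date])),
    CALCULATE(SUMX(Sales, Sales[Qty] * Sales[Unit Price]), SAMEPERIODLASTYEAR(Calendar[Date]))
)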

 

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] - [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

Don’t Break Calculated Columns into Interim Parts

  • It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand, they take up little space and make it easier to write your DAX.  As with measures, it is easier to write calculated columns using interim calculated columns, however the problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and in memory and hence makes it less efficient.  By all means write interim columns if you need to in order to create a calculated column (notwithstanding the earlier comments about columns vs measures), however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column.  This is an Excel concept I learnt from John Walkenbach.
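As a sketch of what this looks like in practice, reusing the Sales[Qty] and Sales[Unit Price] columns from above (the column names Sales[Line Total] and Sales[Price Band] are made up for illustration), the interim approach stores two columns:

Sales[Line Total] = Sales[Qty] * Sales[Unit Price]

Sales[Price Band] = IF(Sales[Line Total] > 1000, "High", "Low")

whereas the consolidated “Mega DAX formula” stores only one:

Sales[Price Band] = IF(Sales[Qty] * Sales[Unit Price] > 1000, "High", "Low")

Once the consolidated version works, the interim [Line Total] column can be deleted.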

Other Advice

You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here and also read up about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.

DAX Time Intelligence Explained

Level: Beginners

I help a lot of people on forums who ask questions about time intelligence for DAX.  If you are just starting out then the chances are that you may not even be clear what time intelligence is and hence sometimes you don’t even know what to ask.  Often the question is something like “I want to show this year and last year on a chart”, or “total year to date this year compared with last year” etc. If you want to do any time based comparison using DAX, Power Pivot and or Power BI, then this article explaining time intelligence is the right article for you.

Definition of Time Intelligence

Time intelligence is the collective name for a set of patterns (DAX in this case) that can be used to solve time comparison problems.  Examples include comparing:

  • Same period prior month, quarter, year etc.
  • Same period next month, quarter, year etc.
  • Same period year to date compared with prior year, next year etc.
  • Rolling 30 days, 60 days, 90 days, 12 months etc.
  • etc. – there are many many more

Time intelligence is used when you want to “time shift” any period with another period of time for comparison purposes or to simply display a different period than the selection.

Understanding Filter Context

Before you can understand why time intelligence needs a special approach in DAX, you first need to have a clear understanding of Filter Context.

Pivot Tables and Power BI visuals both slice data so that you can “drill” and “filter” to see a sub-set of data.  Take the image shown below.  On the left is a pivot table and on the right is a Power BI visual.

image

Starting with the Excel Pivot table on the left, every value cell in the pivot table has been filtered by some combination of the Rows, Columns, Filters and Slicers that make up the Pivot Table.  The cell highlighted as 1 (above left) has the following filters applied.

  • Territory[Country] = “Australia” – this comes from the Rows in the Pivot
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the filter.

After these filters are applied, the calculation for the cell is evaluated and the answer $2,947,789 is returned to the cell.  Every value cell in the pivot table is evaluated in exactly the same way – including the Grand Total row in the pivot table.  In the case of the grand total row, Product[Category] and Calendar[Year] have the same filters, but there is no filter on Territory[Country].
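One way to picture this is to imagine writing those filters into CALCULATE yourself.  This is just a conceptual sketch – the pivot table applies the filters for you, you never write this measure – and the measure name and the column data types are assumptions:

Sales Bikes Australia 2003 = CALCULATE([Total Sales], Territory[Country] = "Australia", Calendar[Year] = 2003, Product[Category] = "Bikes")

The three column filters are applied first and [Total Sales] is evaluated second, which is exactly the behaviour described above.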

On the right hand side in the image above is a Power BI visual.  Filtering in Power BI visuals fundamentally works the same way as a pivot table however there are more places for cross filtering to come from.  In the image above, the same filtering is applied as in the Pivot table but in the Power BI example the filters are applied in a different way.

  • Territory[Country] = “Australia” – this comes from the bar chart Axis
  • Calendar[Year] = “2003” – this comes from the Year in the slicer
  • Product[Category] = “Bikes” – this comes from the Category in the tree map visual.

When filter context is passed from a visual to the underlying data model, all the relevant tables are filtered before the calculation is completed.  Filter first, evaluate second is a fundamental principle for all DAX formulas.

The Time Intelligence “Problem”

Let’s assume you want to compare total sales for a particular year vs the prior year.  One way to do this (in Excel) is to put the years onto Columns in a pivot table as shown below (a similar approach can be used in Power BI).

image

But doing it this way causes many problems, including:

  • There are years in the pivot table that you don’t want (eg 2001, 2004).  You will need to somehow manually hide or filter the ones you don’t need.
  • If you want to calculate the absolute change or % change year on year you will need to hard code some formulas in the cells next to the spreadsheet and hence they can’t be reused in other visuals later.
  • If you want to look at a different year you will potentially have to go through the process of doing the filtering and formulas again from scratch.

A better way to solve this problem is to select the current period (using a slicer or filter of some sort) and have a universal formula that returns the result relative to the selection.  So if you select 2003, the formula will automatically select 2002 for you.  If you select 2002, it will automatically select 2001 (and so on).

Filtering is Now Your Enemy

If you want to write a relative time intelligence formula, then the natural filtering behaviour of a visual becomes your enemy. Imagine a calendar table with a filter applied Calendar[Year] = 2003.  If you imagine what the filtered data model would look like “Under the hood”, it should be clear that the filtered calendar table will show rows starting on 1 Jan 2003 and ending on 31 Dec 2003 – it will contain 365 unique days for the single year 2003.  The filter has already been applied to the data model so only days in 2003 are available for the calculation – how then can the data model possibly access sales for the year 2002?  There needs to be a way to go back in time and fetch a different period (in this case the period is 1 year earlier in time than the selected period), yet the 2003 filter has already been applied preventing this from occurring naturally.  This is the reason why there needs to be a special set of time intelligence functions – to solve this natural filtering “problem”.

How Time Intelligence Functions Work

Time intelligence functions are specifically designed to solve the filtering issue described above.  All time intelligence functions execute the following 4 steps:

  1. first “detect” the current filter context to determine what the “current” selected period is
  2. then remove the calendar filtering from the underlying tables so that data for “all time” is once again available.
  3. then perform a time shift to find a different period in time (as specified in the formula).
  4. Finally reapply filters on the data model for the time shifted period prior to calculating the final value.

Custom vs. Inbuilt Time Intelligence

There are 2 types of time intelligence functions – Custom and Inbuilt.  Inbuilt time intelligence functions are easier to write because they have been designed to protect the user (ie you) from the complexity.  I am not going to cover Inbuilt time intelligence in depth here because they are relatively easy to use.  See link at the bottom of the page if you would like to download the DAX reference guide I produced (including all the time intelligence functions).

The reason inbuilt time intelligence functions are easier to learn is they actually are what is known as “Syntax Sugar”.  Microsoft has created these special functions to make them easier to write and easier to understand.  You follow the simple syntax and Power Pivot does the rest.  But under the hood the inbuilt time intelligence functions are converted to the full syntax prior to evaluation.  Consider the following two examples (just to illustrate the potential complexity in the custom version).

Total Sales Year to Date (inbuilt) = TOTALSYTD([Total Sales],Calendar[Date])
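For comparison, the custom version (reconstructed here to match the line-by-line explanation further down the page) looks like this:

Total Sales Year to Date (custom) =
CALCULATE(
    [Total Sales],
    FILTER(
        ALL(Calendar),
        Calendar[Year] = MAX(Calendar[Year])
        && Calendar[Date] <= MAX(Calendar[Date])
    )
)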

Both of these formulas return the exact same result, and in fact both use the same approach to calculating the result under the hood.  The only difference is the inbuilt version is much easier to write because you (the DAX author) are protected from the full syntax.

The end result (regardless of which formula is used) looks like this in a Pivot Table.

image

Looking at the highlighted cells, even though cell 1 above is filtered to show only sales for the month of July 2003, the Time Intelligence function (cell 2 above) is able to display sales for the period Jan – Jul 2003.  It does this because the formula performs the following 4 steps.

  1. It first “detects” the current filter context to determine what the “current” selected period is.  In this case July 2003
  2. It then removes the calendar filtering from the underlying tables so that all data is once again available.
  3. It then performs a time shift to find a different period in time – in this case it holds the end date in the current filter context the same (31 July 2003) but shifts the start date back to 1 Jan 2003.
  4. Finally it reapplies filters on the data model for the time shifted period prior to calculating the final value.

How to Read a Custom Time Intelligence Formula

The custom time intelligence formulas can be daunting when you first see them – but actually they are easy to understand as long as you clearly understand the role of each part of the formula.  Below is the same formula again (from above) along with line numbers to make it easier for me to refer to.

image
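For reference, here is that custom measure again with line numbers added so the references below line up:

1  Total Sales Year to Date (custom) =
2  CALCULATE(
3      [Total Sales],
4      FILTER(
5          ALL(Calendar),
6          Calendar[Year] = MAX(Calendar[Year])
7          && Calendar[Date] <= MAX(Calendar[Date])
8      )
9  )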

Line 2 (CALCULATE) contains the only function that can change the natural filtering behaviour coming from a visual – that’s what it does.   CALCULATE always filters first, evaluates second (as mentioned above).  So lines 5 – 8 are executed first (filters applied) and then the formula on line 3 is evaluated last.

Lines 4 – 8 (FILTER) are where the filtering occurs.  There are a few confusing things here.

  • Line 5 refers to ALL(Calendar) instead of simply Calendar.
  • Lines 6 and 7 seem to be evaluating against themselves – very confusing.
  • Line 7 starts with a double ampersand &&

Let me explain each line.

Line 5 ALL(Calendar)

The key thing to understand here is that the filter portion of CALCULATE always operates in the current filter context.  That means that the Calendar table in line 5 has already been filtered by the visual (eg the Pivot Table).  Looking back at the pivot table image above, this means that line 5 is already filtered by the pivot table and hence the Calendar table only has dates from 1 July 2003 until 31 July 2003 available – all other dates have been filtered away.  The purpose of the ALL function therefore is to remove all filters from the current filter context.  If it didn’t remove these filters, it would not be possible to access dates outside of the month of July 2003.

Now that these filters have all been removed, it is time to add back the date filters that are needed.

Line 6 MAX( )

Line 6 reads “where Calendar[Year] = MAX(Calendar[Year])” and hence it seems to be referring to itself. The way to read and understand line 6 (and 7) is as follows:

  • Whenever you see an aggregation function (in this case MAX) around a column, it means “go and read the value from the current filter context”.
  • Whenever you see a “naked” reference to a column (ie there is no aggregation function around the column), then it means “go and apply a new filter on this column in the data model”.

Taking these 2 rules, it should be clear that MAX(Calendar[Year]) in the current filter context is = 2003.  Line 6 therefore says “Go and apply a new filter on Calendar[Year] to be equal to 2003”.

Note that you can use any aggregation function in these formulas that does the job.  In line 6, you could use SUM, MIN, MAX, AVERAGE or any other similar aggregation function and still get the same result.  You could also use VALUES or DISTINCT in the case of line 6, and LASTDATE in the case of line 7.  There is no right or wrong answer, simply think about the need and then find the best function to use.
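For example, lines 6 and 7 could equally be written as follows (a small variation only – note the VALUES version works while a single year is in the filter context):

Calendar[Year] = VALUES(Calendar[Year])
&& Calendar[Date] <= LASTDATE(Calendar[Date])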

Line 7 && and MAX( )

Line 7 reads “and Calendar[Date] <= MAX(Calendar[Date])”.  The double ampersand && is the inline syntax for a logical AND.  Using this knowledge and the same logic as with line 6, the way to read line 7 is as follows:

“AND also go and apply another new filter this time on Calendar[Date] to be less than or equal to 31 July 2003”.

With both of these filters applied in a logical AND, the Calendar table will contain all of the dates from 1 Jan 2003 until 31 July 2003.

The Result

Once the time intelligence function has been written, it can be added to a visual as shown below (Power BI Desktop).  Note how the YTD formula seems to “defy” the filter context coming from the visualisation due to the custom time intelligence function that has been written and explained.

image

The Trouble with Syntax Sugar

Syntax sugar is great as it makes otherwise hard formulas very easy to write with a limited understanding of filter context, row context, filter propagation, context transition etc.  There are a few downsides however.  The first is that you don’t get to learn these very important concepts and hence you are delayed in becoming a true Power Pivot and Power BI Ninja.  The second is that the inbuilt time intelligence functions only work in certain circumstances where you have a day level Gregorian calendar.  Read more about that here exceleratorbi.com.au/power-pivot-calendar-tables/

Granularity

I personally hardly ever use the inbuilt time intelligence functions unless my data is at a day level of granularity (which is rare), and I find the custom functions become easy to write with practice.  Custom time intelligence functions become much more important when your data is not at a day level of granularity.  Most of the work I do for clients is either weekly or monthly data.  If you are in this situation you could “trick” Power Pivot that you are using daily data by loading all data in any given week or month on the same date (eg last day of the month) and use inbuilt time intelligence however this is hardly best practice.  A much better approach I think is to write custom time intelligence functions as outlined in this article.  If you are going down the  path of writing custom time intelligence, you should read my advice about adding an ID column into a calendar table to make custom time intelligence functions easier to write.  exceleratorbi.com.au/power-pivot-calendar-tables/

Sales vs Prior Year

Time for a different example.  Now that I have covered how a custom time intelligence function works, let me show you a couple of inbuilt time intelligence measures that solve the original problem (Sales vs Prior Year).

Sales Prior Year 1 = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Sales Prior Year 2 = CALCULATE([Total Sales],DATESADD(Calendar[Date],-1,YEAR))

Both of the above formulas use inbuilt time intelligence functions (shown in bold), but they also use CALCULATE.  Now that you understand how the time shift works, it should be much easier to understand what is happening in these formulas.  The functions shown in bold produce a table of dates that has been time shifted back by 1 year.  CALCULATE then takes this new table of dates, removes the current filter context from the Calendar table and reapplies the time shifted dates as the new filter before doing the calculation.  Once you have a [Sales Prior Year] measure it is easy to write:

Change vs Prior Year = [Total Sales] - [Sales Prior Year]

% Change vs Prior Year = DIVIDE([Change vs Prior Year] , [Sales Prior Year])

So where can you find a list of all the inbuilt time intelligence functions?

 A Free DAX Reference Guide

One of my students at a recent training class asked me if there was a list of all DAX Functions – kind of like a cheat sheet.  I wasn’t able to find such a thing so I produced exactly that and I am making it available free to anyone that would like a copy here.

This reference guide covers all of the inbuilt time intelligence functions on page 14 as well as every other function across the language, all nicely laid out to make them easy to find.  You can download this reference guide below.  If you haven’t already done so, why not sign up for my weekly newsletters at the same time so you are kept up to date with my latest tips and tricks about Power Pivot, Power Query and Power BI.

Download the DAX Reference Guide Using the Form Below

Extract Tabular Data From Power BI Service to Excel

Someone asked me a question yesterday about exporting data from the Power BI Service into Excel.  There are a few options to do this however they all have their problems (these problems are not covered in great detail in this post).

  • Power BI has an inbuilt export data feature (there is an export limit of 30k rows for a free account and 150k rows for a Pro account.  Plus the connection cannot be refreshed).
  • Analyze in Excel (can only view the data in Excel via a pivot table, not a table of records).
  • Download the Power BI Desktop file to your PC and use DAX Studio (you have to download the entire model just to get the records you want, besides it may not even work at all).

I started to think that I may be able to connect directly to Power BI Service using DAX Studio and then use the techniques I cover here to extract the records.  I tried this but couldn’t get it to work (please let me know if you find a way to do this).  I started searching and found this old article I wrote in 2015 that uses a technique I learnt from Marco Russo.  I have used a similar approach to create a direct query of a Power BI Data Model and extract the records directly into a Table in Excel.

Edit: 11 Jan 2017 – see the bottom of this post to see how Chris Webb recommends to do this task – it is much easier :-).  But read on for some interesting learning.

Create a Link to PowerBI.com

The first thing I did was to select my dataset from the Power BI Service, clicked the Ellipsis (1 below) and then Analyze in Excel (2 below).

image

This downloads an ODC file to the downloads folder.  Mine was called Adventure Works.odc

image

Note: there have been many updates to the Analyze in Excel connectors since its launch last year.  If you haven’t updated the Analyze in Excel connectors for a while you would be well advised to do that first before attempting the rest of the procedure that follows.  This can be done by logging into Power BI, clicking Downloads (shown as 1 below) and then downloading and installing the connector software (2 below).

image

Edit the ODC File to Make it a Query

I then edited the ODC file using Notepad.  It looks like this originally

image

It is a bit hard to read the file as is, so I cut and pasted the contents into an HTML editor to improve the readability (as shown below), then pasted the formatted text back into Notepad.

image

As you can see above, the default command is “Cube” and the default text is “Model”.  The trick is to change the command to be “Query” and replace the text to be any valid DAX Query.

I changed lines 19 and 20 below as follows:

image

<odc:CommandType>Query</odc:CommandType>
<odc:CommandText>Evaluate Sales</odc:CommandText>

The Command Text I used above is one of the simplest DAX queries you can write.  All queries must start with the EVALUATE statement followed by any valid DAX code that returns a table of data.  The simplest way to do that is to just specify an existing table (Sales in this case).  If you are doing this, you can use any table name in your data model.

I then saved and closed the ODC file.

Open the ODC in Excel

To open the ODC file in Excel, I then simply double clicked on the file.  This opened Excel and gave me the following security warning

image

I clicked enable, and there appeared the entire Sales table from my Power BI data model directly in a table in Excel.

image

Now that the connection has been established, the file can be saved and refreshed as needed in the future.  To refresh, simply open the file, right click on the table and click “refresh”.

Change the Query

So far this is pretty boring albeit maybe useful for some people.  Much more interesting is to be able to extract a subset of the data to Excel.  To do this you need to learn a bit of the DAX Query Language.  I am not going to go into the DAX Query Language in detail in this post, but below I show how to edit the current query and change it to a new query – in this case a query that will return the Total Sales by Invoice Number.

To edit the existing query, I simply right clicked anywhere in the table (1 below), then selected Table (2 below), then Edit Query (3 below).

image

This brings up a dialog that can be edited as needed.

image

For my demo, I replaced the command text with my new DAX Query as follows

image
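The query in the screenshot isn’t reproduced as text here, but a query along the following lines produces that result.  The column name Sales[Invoice Number] is an assumption – substitute whatever your invoice/order number column is actually called – and it assumes the model contains a [Total Sales] measure:

EVALUATE
ADDCOLUMNS(
    VALUES(Sales[Invoice Number]),
    "Total Sales", [Total Sales]
)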

After clicking OK, this query then returned a table of all invoice numbers and the total value of sales for those orders.  Interestingly the number formatting for Total Sales did not flow through to Excel as I expected it to.

image

Chris Webb’s Trick

Edit 11 Jan 2017. Chris Webb tweeted an alternate approach that is faster to execute. Simply create any pivot table using Analyze in Excel, drill through to the underlying table and then edit the query as before.

Further Learning

I may come back and blog about DAX as a Query language at another time.  Until then if you would like to learn what is possible, I recommend this video from Alberto Ferrari (although it is a bit advanced unless you have some solid understanding of tables and databases).

www.sqlbi.com/tv/querying-in-dax-tee2014/

Cleansing Data with Power Query

Today I am combining a few techniques to show how to build a robust cleansing approach for your data using Power Query.  This article will demonstrate the following Power Query techniques

  • Joining tables (Merge Query)
  • Self Referencing tables
  • Adding Custom Columns

Sample Data

I am using some sample data as shown below.  The Country data is the data I want to cleanse so there are consistent country values in the Country column.

image

I first load this data into Power Query and set “Close and Load to” so that it only creates a connection.  This gives me a Power Query connection to the source data table that can be reused multiple times.

image

Create a Table of All Entered Countries

The next step is to produce a list of the unique entries in the source data.  To do this, I referenced the above table by right clicking (1 below), then Reference (2)

image

I then removed the Name column (1 below) and then removed the duplicates in the remaining Country Column (2 below).

image

I then sorted the column before loading the data back to Excel as shown below.

image

Create the Substitutes with a Self Referencing Table

I first covered this trick of a self referencing table in this blog article.  Continuing on with my test data above, I have manually added a new column to the green table above and entered the substitutions as needed.  Where there is no substitute needed I have simply left the substitute blank.

image

As it currently stands above, if the source table is sorted and then the green table is refreshed, the substitute column will fall out of alignment with the source table – that is bad.  Now for the self referencing trick.  I loaded this green table back into Power Query as a new query, setting it to only create a connection as I did for the original set of data.  I named this new query “Substitutes”.

I then edited the original query that created the original green table and joined it to the substitute table (aka self referencing).  While editing the Entered Names table, I selected Merge Queries (1 below), selected the Substitute table (2), then joined on the Country Columns (3, 4) with a left outer join (5).

image

I then expanded the new column to extract the substitutes.

image

 

This now gives me a self referencing table that will refresh while keeping the countries and substitutes correctly aligned regardless of what happens to the source table.  This is the key point about a self referencing table (covered in my previous article).

image

Update the New Data

Now when new data is added to the source table, it will appear in the substitute table after refresh as follows.

image

It would be much better to make it clear which entries are correct and have previously been checked, so that it is more obvious which entries are new.  So I decided to enter the correct country name for each entry as follows.

image

It is now much clearer that there are 3 new entries that need to be maintained, and I then updated the new data below.

image

Use the Substitute Table

Now that I have this substitute table working for me, all I need to do is create a new copy of my source table that uses the substitute country rather than the original country entered.  I create a new query referencing my original source data (right click, reference).

Merge the queries as before, joining on the original country column.

image

Expand the table to retrieve the substitute column.

image

Then delete the original country column and rename the substitute column to be called “Country”

image

As you can see below, the substitute table is correctly remapping the data as needed.

image

When the new data is added to the source table, the substitute table will have blank entries as shown below.

image

As a nice touch, I went back into the final query and added a new custom column that returns the substitute country where one has been entered and flags the row when the substitute is still blank, and then used this new column as the country column.  Now when I refresh the data, it is clear that I need to do some maintenance on the substitute table.

Easy Online Surveys with Power BI Reporting

I think today’s article will be of interest to my readers even though it is a little astray from my normally pure Power BI, Power Pivot and Power Query content. I will show you how to quickly and easily create an On-Line Survey that you can distribute to anyone that has an Internet connection, and then use Power BI to report on the results from your survey.

Office 365 Excel Surveys

You may not be aware that O365 has an inbuilt Excel survey tool that is very easy to use.  And who wouldn’t want to use Excel to solve a problem if they could?  You can find the survey tool by going to OneDrive online.  I am using OneDrive for business for this demo, but it also works for OneDrive Personal.

Here is Survey – Please Participate

For demonstration purposes, I have created a survey to ask my readers what they would like to see more of on my website and to see which countries the readers live in.  Edit 7 Feb 2016.  The survey is now closed, but you can still read on.

image

Now that you have completed the survey, read on to see how you can do the same yourself and to see the latest results via an embedded Power BI report at the bottom of the page.  Note that it can take up to 1 hour for your specific survey response to be available in the report (ie it is near real time – not real time).

How I Created the Survey

First Open OneDrive Online

Frankly I get confused with all the logins for Microsoft sites. The easiest way I know how to login to OneDrive Online is to Open Windows Explorer, right click on the OneDrive folder and then select View Online.

image

Create Your Excel Survey

Creating an Excel survey couldn’t be easier.  Just click New\Excel Survey as shown below.

image

I immediately got some survey header information to fill in as shown below.

image

There are no user manuals required to work out how to do this.  I just updated the information and moved on to the next section.  It is easy to enter the questions you have, and there are a number of intuitive response types that you can choose to meet your needs.

image

Survey Results Database

After you have completed your survey, you can see the underlying Excel spreadsheet that stores your survey results.  It is possible to add new columns to the right of the results table.  Additional columns added manually can be used to store information you want to add without it being part of the survey.

image

Connecting the Data to PowerBI.com

There are a few ways you can create a Power BI Report.  If you want a fully featured data model including the ability to import other tables of data and write your own DAX, then you need to import this new data source into Power BI Desktop as you would with any other report.  For this demo today, I have used the “Get Data” feature from directly inside Power BI Service to connect to the Excel workbook on OneDrive.  To do this, I first logged into PowerBI.com and selected get data from files as shown below.

image

Then I selected OneDrive – Business as shown below.

image

Then I selected my survey file and clicked Connect

image

I was then prompted to either import the data or connect.  The connect option works just like Excel Online – not what I wanted.  So I selected Import.  When you use import in this way, Power BI will check your OneDrive account about once each hour for new data and will automatically update your report.

image

Power BI then created a new Dataset for me (1 below).  You can see the table of data shown in 2 below.  It also created a Dashboard, however this was not much use so I simply deleted the dashboard.

image

Here is the Final Power BI Report

Once I had the data connected, I simply created a few simple visualisations using the capabilities of Power BI Service.  You can see the final interactive report below.

Be sure to read next week’s post where I cover how you can use Power BI Desktop to model your online data. exceleratorbi.com.au/data-modelling-onedrive-source-files/

How to Document DAX Measures in Excel

I often get asked if there is an easy way to create documentation for DAX measures when using Power Pivot for Excel.  I am not a big fan of documentation for the sake of it, but I do see value in having “some” appropriate level of documentation.  I think a good balance of the right amount of documentation vs the effort to write and maintain the documentation is to use the Power Pivot UI itself.  I explain how I do it below.

Writing DAX Measures From Within Excel

I always teach students at my Power BI training courses to write their measures from within Excel via the “new measures” dialog box (and specifically not to write them from within the Power Pivot window).  The process is always as shown below.

  1. Set up a pivot table that has some relevance to the measure you are writing, then click in the pivot table (shown as 1 below)
  2. Select Power Pivot Menu \ Measures* \ New Measure* as shown below (*called Calculated Field in Excel 2013)

image

The reasons I teach students to do it this way are simple.

  1. You get a dialog box/wizard that shows you every step that is required to complete the task.  Just complete each step one at a time and you can’t go wrong.
  2. You get to check the measure for errors before completing it (shown as 2 below).
  3. Once you add the measure to your pivot table (shown as 3), you get to see immediately if it is working as you expected.

image

image

And there is a fourth reason to use the wizard that is less obvious.  The Description box (shown as 4 above) is a great place where you can make notes about your DAX formulas (aka documentation).    In Power BI Desktop you can add comments into your DAX expressions, but this is not possible in Excel, so the description box is the best option you have.

Create Your Documentation

If you diligently add comments into the Description box above, you will then be able to extract this information into a documentation page using the Power Pivot Utilities Toolbar.  I have previously reviewed this great little tool here, so you can go back and read about it if you missed it.

After installing both tools, you will then see the PP Utilities Toolbar in Excel (shown as 1 below)

image

Click on List Measures (shown as 2 above).  After a few seconds, you will see a new page in your Workbook called “Measures” like shown below.  Note the “Description” column on the right hand side. This column is sourced directly from the information entered into the Description box shown further up the page.

image

What’s more, if you select the measures on your page (Shown as 1 below) and then click Format DAX Expressions (2 below), the measures will be nicely formatted so they are easy to read.

image

This page is then your documentation and you can refer to it at any time you need to see the big picture.

Always Up to Date

One important feature of this type of documentation is that it is very easy to maintain.  If you make changes to any of the measures or comments in your data model, you can simply re-run the extract process so you have an up-to-date list of all the measures.  You can also do the same for Calculated Columns and also relationships.

What About Excel 2010?

Unfortunately the Power Pivot Utilities Addin only works for Excel 2013+.  But all is not lost.  It is possible to extract the Measures from your Data Model manually using DAX Studio.  You can read my blog post about how to do that (and other cool things) with DAX Studio here.

Excel Workbook Connection to Local Power BI Desktop

Today I am sharing an easy way to use Power BI Desktop as a SSAS tabular server on your local PC.  I find increasingly that I am building data models in Power BI Desktop and yet I still want to use Excel to analyse the data some of the time.  If I load the PBIX workbook to the Power BI Service, it is easy to use “Analyze in Excel” to query the data model.  But there is currently no standard supported method to do the same thing using Power BI Desktop as a server (see my suggested idea to Microsoft at the bottom of this page for more about that).  Until then, we need to use a “hack” to complete this task.

I have previously shared a way to:

  • Open DAX Studio on your PC
  • Use DAX Studio to extract the Port Number of a running instance of Power BI Desktop
  • Manually enter this information into Excel to connect to Power BI Desktop and use it as a local SSAS Server instance
  • Query the data model that exists in the open Power BI Desktop file with a Pivot Table in Excel.

The problem with the above approach is that once the Power BI Desktop file is closed, the connection string becomes invalid.  Next time you want to use Power BI Desktop as a server you must go through the entire process again plus you need to alter the connection string that was previously created – hardly a scalable solution.

An Excel VBA Workbook Template to Solve This Problem

Local Host Workbook

Today I am sharing a tool that I built recently that completes the entire process automatically in Excel using VBA and Power Query.

Disclaimer:  I am providing this tool free of charge, without support, and do not warrant that it will work for you.  This is not supported by Microsoft and could stop working anytime.  If it works for you, then great – if not then it is unfortunate :-(.  Maybe you can deconstruct what I have done and make it work for you.  I built this using Excel 2016 and I have tested it on 2010 and 2013 and it works (Please make sure you have the latest version of Power Query installed).

You create a new blank Excel Workbook using my template (from personal templates), then click the “Refresh SSAS Connection” button.  The tool will then:

  • Detect an instance of Power BI Desktop running on your PC (there must be 1 and only 1)
  • It finds the port number and database name of the running instance
  • It modifies the existing connection string in the Excel Workbook to repoint it to the current running instance of Power BI Desktop.
  • If the workbook already contains pivot tables that were previously built using the same data source, they will simply reconnect and work as you expect.
  • If this is the first time you have used the template, you will have a blank Pivot Table connected to the Power BI Desktop data model.
  • You should only have 1 Excel Workbook and 1 Power BI Desktop file open at any one time (as they all use the same connection string).

I am not planning on covering in detail how the tool works.  I am making it available to use and modify as you see fit.  If you want to look into the VBA code and the Power Query to see what I did then of course this is fine by me.  If you want to improve it and give me back the improvements under the same “share and share alike” rules, then please do so as long as you leave my name and comments in the file untouched.

How to use the Template

You can download the template here.  I have put it in a zip file so that I can keep the same link but change the name of the actual file over time (to manage version numbers).

The file is a macro enabled Excel Template workbook.  You need to place it in your Personal Templates location.  To find this location, in Excel go to File\Options, then navigate to save (1 below) then check the location (shown in 2).  If you don’t have a current template location, you can set your own.  Note Excel 2010 has a different file location (Google it).

image

To create a new workbook that uses this template, in Excel go to File\New, select the personal templates option (shown as 2 below) and then select the template.

image

I would love to hear what you think and find out if this is a useful tool for you.

Let’s Ask Microsoft to Make this a Standard Feature

I was talking to Avi Singh and he suggested that I request this as a standard feature.  I have set up an idea that you can find and vote for here.  If we get enough votes for this feature, Microsoft may build out this capability into the core product(s) so it is easier for everyone and fully maintained for future releases.

 

Power Pivot Crashes? – An Update

Today Avi Singh and I met with Howie Dickerman from Microsoft to discuss the Power Pivot for Excel Add-In.  You can read the background to this meeting at my blog post last month.  Howie is the Program Manager assigned to Power Pivot for Excel and has a wealth of experience with Excel and other products.  I shared all the comments that people added to my blog post from 19th October.  The great news is Howie is already crystal clear about the problems.  We had a great discussion about the plans that Microsoft has for improving stability and performance for the Addin.  And there definitely is a plan.  Like any software project it is impossible to say when we will see something, but I am just excited that there is a plan.

How You Can Help

Microsoft already has a clear understanding of the areas that need some work.  What Microsoft needs more than anything is a set of structured steps that will reproduce an issue – any issue you have.  From experience this is very hard to pin down.  If you know how to reproduce an issue, please send me the steps to reproduce it (you can add to the comments below or send me a message via my home page).

Note:  I realise lots of people have had a bad experience at times.  I am explicitly looking for “reproducible steps” here.  This means that when you follow these steps it will always (or often – eg 1 in 4) cause the issue.  If it is not reproducible then the information cannot help for this next phase.

Does the Power Pivot Excel Add-In Crash for You?

I was chatting to Avi Singh the other day and we were comparing notes about the live Power Pivot training classes we both offer in Australia (me) and the USA (Avi).  One thing we both have in common is that the Microsoft Excel Power Pivot Add-In CRASHES a lot during training!  When I say “crash”, I mean it stops working in some ungraceful way – Excel itself normally doesn’t stop working, just the Add-In.  I always warn my students that the Add-In will crash, and I always say “If Power Pivot wasn’t so great, you wouldn’t put up with it – but it seriously IS that great and hence I have learnt to live with it”.  But there were a lot more Add-In crashes in my training last week than I have come to expect, and I would like to try to do something about it.

Before moving on, I don’t want anyone to get me wrong here, I still love Power Pivot (really – I LOVE IT!).  It is still the best thing to happen to Excel since Excel itself.  But the instability of Power Pivot for Excel during the data modelling process is really bad.  In this post today I explain the symptoms of the Add-In crash and the recovery process.  But more importantly I am seeking feedback from the community to share their own experiences with the Add-in crashing so I have some basis for a discussion with Microsoft when I am in Redmond later this year.  So please post your comments and experiences at the bottom of the page.

It is Worse in Excel 2016

I taught my Power Pivot class using Excel 2016 for the first time last week (I normally teach in Excel 2013 or Excel 2010).  Frankly I was expecting the Add-In to be a lot more stable given it is an updated version and Microsoft ships bug fixes via O365 on a regular basis.  In fact I found the exact opposite – the Excel 2016 O365 version of Power Pivot for Excel is the most unstable I have ever used.  For the first time ever, the Add-In crashed and at the same time it actually corrupted my data model, making the entire workbook unusable.  This has never happened to me before in Excel 2010 or Excel 2013, despite my experiencing literally hundreds of Add-In crashes over the years.

Symptoms that the Add-In has Crashed

There are a few symptoms that the Power Pivot Add-In has crashed.  The easiest to spot is when you get some sort of “Unhandled Exception” error message (something like this one below).

image

If you see an error message like this, then it is time to restart the Add-in.  Read about how to recover from an Add-In crash further down the page.

But sometimes the symptoms that the Add-In has crashed are harder to spot.  An example: you write a measure directly in the Excel measure editing box (as shown below) and the measure is written correctly (shown as 1 below), yet when you click “Check Formula” (shown as 2), it reports an error in the formula (shown as 3).

image

This is really confusing for someone who is trying to learn to write DAX.  It is hard enough getting your head around all the new formulas and new syntax, but when Excel is telling you there is an error in your formula AND THERE ISN’T, then wow!  I have a few years of Power Pivot experience, and in that time I have learnt to spot these subtle clues that the Power Pivot Add-In has crashed, but for a new user it is a horrible experience.

Remedies for When the Power Pivot Add-In Crashes

The first thing I always do when teaching a Power Pivot class is show the students how to recover from a Power Pivot Add-In crash.  This is a mandatory skill for anyone learning Power Pivot.  The standard fix is to turn the Add-In off, then on again, but in my experience there are actually 3 levels of remediation depending on the seriousness of the Add-In crash.  In my training course yesterday, I came up with a naming convention for the 3 levels of fix.

DEFCON 3

This is the mildest response to an Add-In crash and it is always the one I try first.  You first need to open the COM Add-Ins dialog.  The easiest way to do this is to turn on the Developer Ribbon (turned off by default) and use the COM Add-Ins button.

image

Once in the dialog, you need to de-select the Power Pivot Add-In (shown as 1), then click OK (shown as 2).

image

Then you need to go back into the COM Add-Ins again, reselect the Add-In and click OK.  More often than not, this will successfully restart the Add-In and you can get back to work.

DEFCON 2

Sometimes the simple stop/restart fix doesn’t solve the problem.  Then it is time to move to DEFCON 2.  The process is basically the same except you must close the Excel application before restarting the Add-In.

  1. Disable the Power Pivot Add-In
  2. Shut down Excel
  3. Restart Excel
  4. Re-enable the Power Pivot Add-In.

This second approach will fix many of the more serious Add-In crashes.  But sometimes (and actually quite often when using Excel 2016), even this second level response doesn’t work.

DEFCON 1

Time then to move to DEFCON 1 and throw everything you have at the problem.  The process is still basically the same except you add in a PC reboot to completely reset everything.

    1. Disable the Power Pivot Add-In
    2. Shut down Excel
    3. Reboot your PC
    4. Restart Excel
    5. Re-enable the Power Pivot Add-In.

This last approach has always recovered the Power Pivot Add-In for me.  Unfortunately, however, I experienced my first ever workbook corruption using Excel 2016.  I got the Power Pivot Add-In working again with a DEFCON 1 response, but the workbook could not be saved.

Things that Cause the Add-In to Crash

Given I have observed so many Add-In crashes over the years, I have an understanding of the types of things that cause the Add-In to crash.  The key things that come to mind are:

  • Making an error in a DAX formula and saving the measure without clicking “Check Formula” first.  The more often you do this, the more likely the Add-In is to crash – the internal error handling doesn’t seem to be that robust.
  • When you create Linked Tables and add them to the data model, there are often issues that seem to be related to the data model not refreshing properly.  Often you simply can’t see the new table in the Pivot Table Field List.  I can normally solve this problem by shutting down Excel and restarting.
  • If you try to rename a Linked Table after you have added it to the data model, this can cause the Add-In to crash.
  • Unfortunately now it seems that if you are using Excel 2016, that alone is enough to be at risk.

Power BI Desktop

Edit: 22 Oct 2016

I just finished a 2-day in-house training course with a client.  For the first time I did the entire training in Power BI Desktop.  It was much more stable than Excel has been.  There were a couple of application crashes, but recovery was pretty good.

Please Share Your Experiences so I Can Share with Microsoft

As I mentioned at the top of the post, I want to be able to give some feedback to Microsoft about this problem.  You can help me by sharing your experiences in the comments section below.  The more real world feedback I get, the better I can demonstrate that this is not just one grumpy Australian complaining about an isolated problem, but a widespread issue that needs attention.

I will let you know how I go 🙂

Who Needs Power Pivot, Power Query and Power BI Anyway?

Level: Beginners

One of the great challenges Microsoft has faced with its “new” suite of Self Service BI tools (particularly Power Pivot) is that most people that could benefit from the holy trinity (Power Pivot, Power Query and Power BI) don’t even know these tools exist, let alone how the tools can help them succeed in their jobs.  The situation is definitely getting better as Power BI starts to get a presence in the market place, however I still talk to people who have heard of Power BI, but have no idea what Power Pivot or Power Query are, and what’s more they don’t know why they should care.  I personally believe a big part of the awareness problem is that Microsoft insists on shipping Microsoft Excel with the Power Pivot plugin disabled.  There is no reference to Power Pivot when you excitedly receive your brand spanking new version of Excel 2016 – what a marketing opportunity gone begging!

image

I have been an Excel nerd for 30 years.  There is nothing I used to enjoy more than installing a new version of Excel and clicking through every menu item to find something shiny and new that would make my life easier.  By not turning on the Power Pivot menu by default, Microsoft is missing one of the best silent selling opportunities it has for this fabulous addition to Excel.

Given there is no “pull through” on the menus, many people don’t know what these products are or why they should care.  I am often asked by people “who can benefit from these tools?”.  This post sets out to explain who can benefit and why.  Note when I say “who can benefit”, I am not talking about “consumers of reports” here, I am talking about “authors of reports”.  It is clear that people that consume quality reports and analysis will benefit, whatever the tool.  This article is focused on the benefits to those people that are responsible for building the reports and analysis that others will end up consuming.

Power BI

Who can benefit from Power BI is probably the easiest to understand.  The product is well marketed and has a clear role to play.   Power BI is a complete self service BI tool.  It is designed to bring business intelligence capabilities to the masses instead of the elite (e.g. instead of just highly skilled IT MDX programmers).  Rob Collie wrote a good article last week about the democratisation of BI tools.  Power BI will add value to people who:

  1. Have problems sharing reports with others because the file size is too large.
  2. Need to share data with people on the go, that maybe only have a Tablet or a Mobile phone.
  3. Have large data sets that can’t be managed in traditional Excel.
  4. Are spending too much time each week/month manually maintaining reports with new source data and/or new visualisation requests.
  5. Can’t get the (timely) support they need from their IT department using traditional Enterprise BI tools.

Power BI is great because it puts capabilities across the end to end BI stack into the hands of end users (authors), including:

  1. Extraction of data from the source (using the Power Query engine)
  2. Transformation of that data into a new shape that is optimum for BI reporting and analytics (Power Query again).
  3. Data modelling capabilities, so you can turn the raw data into business meaningful insights (using the Power Pivot engine).
  4. A reporting and analytics front end allowing you to build reports to visualise your data (Power BI Desktop and Power BI Service).
  5. A fully integrated cloud based delivery framework so you can easily share with anyone over the internet (Power BI Service).
  6. A full set of Mobile applications across the major operating systems (Power BI Mobile).

Notice how steps 1 and 2 use Power Query, and step 3 uses Power Pivot.  So if you want to learn about Power BI, you really need to learn about Power Pivot and Power Query too.

Power Pivot

Conversely, Power Pivot is the hardest to understand – i.e. it is the hardest for individuals (potential authors) to understand “what does Power Pivot do for me and why do I need it?”.  I have had people enquire about Power BI training courses who have not been interested in Power Pivot or DAX*.  But the truth is, if you want to be able to write your own reports in Power BI, you really need to learn at least some basic Power Pivot skills.

Power Pivot is a data modelling tool.  It is the underlying reporting engine that enables Power BI and Modern Excel to deliver those modern funky reports that can help you succeed in business.  The Power Pivot engine allows you (the report author) to take your business knowledge and configure the reporting tools so that Power BI and Excel Pivot Tables can be used to find and report on insights in your data.

Most business users have never heard of the term “Data Modelling” before, and the reason for this is quite simple – it has always been the IT department that has been responsible for data modelling.  Power Pivot brings the power of data modelling and puts it in the hands of competent business/Excel users.    An example of data modelling will make it easier to understand.

Example of Data Modelling

Consider a scenario where you download sales data from your company transaction system and it looks something like this in a spreadsheet.

image

You can see that the sell price and the cost price information exists in the table above.  But there is nothing about the $ Margin per product and nothing about the % Margin, let alone insights like Sales Year to Date, Top 3 selling products, Fastest growing product etc.  In a traditional Excel world you would simply write formulas in your spreadsheet(s) to enhance this raw data and extract the additional insights.  The trouble with this approach is that all these formulas are only ever “one off”.  You write them for this report, and then when you create another report later, you have to write the formulas again.  Power Pivot handles this problem in a different way.  Power Pivot is a data modelling tool, and instead of writing formulas in your final reports, you write them “under the hood, directly in the Power Pivot engine” as part of the data model.  You build the “rules” that describe how to calculate $ Margin and % Margin ONCE and only once, directly in the Power Pivot engine.  Once you have created the rules for each insight (e.g. $ Margin, % Margin etc), it is forever available for you (and others) to use over and over again in any report, chart, Pivot Table or any other visualisation you can think of.  Never again do you have to write/copy a new formula every time you create a new report.
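To make that concrete, here is a minimal sketch of what those “rules” might look like as DAX measures, assuming the data above is loaded into a table called Sales with columns [Sell Price] and [Cost Price] (your real table and column names will differ):

    Total Sell Price = SUM ( Sales[Sell Price] )
    Total Cost Price = SUM ( Sales[Cost Price] )
    Total $ Margin = [Total Sell Price] - [Total Cost Price]
    Total % Margin = DIVIDE ( [Total $ Margin], [Total Sell Price] )

    // Insights such as "Sales Year to Date" build on the same reusable measures.
    // This assumes the model also contains a Calendar table marked as a date table.
    Sales YTD = TOTALYTD ( [Total Sell Price], 'Calendar'[Date] )

Each measure is defined once in the data model and can then be reused by every Pivot Table, chart or Power BI visual that needs it.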

*What About DAX?

Data Analysis Expressions (DAX) is the formula language of Power Pivot.  DAX is very similar to the Excel formula language yet there are also a lot of differences that you will have to learn from scratch.  Simply put, if you want to learn Power Pivot (and/or Power BI), you will need to learn at least some DAX as well as lots of other things about how the Power Pivot engine works.
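For a tiny taste of how familiar (and how different) DAX can feel, here is a hedged sketch – the table and column names below are made up for illustration:

    // Familiar: SUM works much like Excel's SUM, except it operates on a whole table column
    Total Quantity = SUM ( Sales[Qty] )

    // Different: CALCULATE has no Excel equivalent – it evaluates a measure under modified filters
    Total Quantity 2016 = CALCULATE ( [Total Quantity], 'Calendar'[Year] = 2016 )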

Who needs Power Pivot and why?

People who need this tool are typically Microsoft Excel users/report authors who analyse or report on data, particularly when the following conditions apply:

    1. You have lots of data that stretches the capacity of traditional Excel (file size, re-calculation speed etc).
    2. You use Pivot Tables a lot to analyse your data.
    3. You are writing a lot of VLOOKUP (or INDEX/MATCH) formulas to join data from different tables so you can analyse them in a Pivot Table (in Power Pivot a relationship replaces these lookups – see the sketch after this list).
    4. You have had to create bespoke reports in traditional Excel, but then have to spend hours rebuilding the report when asked for a different view of the data (e.g. you have a report that shows sales by half year, and then you are asked to produce the same report by month – and have to start again almost from scratch).
    5. You want to start using Power BI for anything more than the most simple aggregation reports.  The “out of the box” capabilities of data modelling in Power BI are limited, and you will quickly realise that you need to learn some new skills (Power Pivot/DAX) to be able to leverage the strengths of Power BI.
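On point 3 above, here is a hedged sketch of how the VLOOKUP workload disappears once the tables are related in Power Pivot (assuming a Sales table related to a Products table – the column names are made up):

    // Option 1: a calculated column on the Sales table that replaces a VLOOKUP to Products
    Category = RELATED ( Products[Category] )

    // Option 2 (usually better): no lookup column at all.  Write one measure and let the
    // relationship do the joining when you put Products[Category] on the rows of a Pivot Table.
    Total Sales = SUM ( Sales[Sales Amount] )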

Power Query

Power Query is a user friendly ETL (Extract, Transform, Load) tool.  Unfortunately Microsoft muddied the waters by renaming Power Query “Get and Transform” in Excel 2016, and “Get Data” in Power BI Desktop.  That aside, Power Query is used to:

  1. Extract:  Connect directly to the data source and ingest the data (into Power BI or Excel) so you can work with it.  It retains a connection to the source so when the source data is updated you can automatically “refresh” the data without having to go through the import process again from scratch. It is amazing.
  2. Transform:  You can clean and reshape the data so that by the time you are ready to use it, it is already in the format you need instead of the format you are given.  Again, you do this only once, and then you can automatically refresh when the data is updated later.
  3. Load:  Deliver the data directly to the place where you want to use it.  The end location will typically be inside Power Pivot or Power BI, but Power Query is also an invaluable tool for use with traditional Excel (a minimal sketch of such a query follows this list).
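Under the hood each query is written in the Power Query formula language (M), although you rarely need to touch it because the user interface writes it for you.  Purely as a hypothetical sketch (the file path and column names are invented), a simple Extract/Transform/Load query might look like this:

    let
        // Extract: connect to the source – a csv extract in this hypothetical example
        Source = Csv.Document(File.Contents("C:\Data\Sales Extract.csv"), [Delimiter = ",", Encoding = 65001]),
        // Transform: promote headers, set data types, remove rows you don't need
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
        Typed = Table.TransformColumnTypes(Promoted, {{"Date", type date}, {"Amount", type number}}),
        Filtered = Table.SelectRows(Typed, each [Amount] <> 0)
    in
        // Load: the result of the final step is what lands in Excel or the data model
        Filtered

When a new extract replaces the csv file, clicking Refresh simply re-runs these steps.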

Don’t think of Power Query as simply a tool for self service BI reporting.  I have used Power Query to automatically audit information (eg XLSX files, csv extracts from other systems etc) and produce exception reports directly in Excel.  Once you understand what Power Query can do, you will start to realise all the ways you could use it to make your life easier.

Who needs Power Query and why?

People who need this tool are typically trying to solve the following problems:

  1. They regularly receive file extracts from someone (or some system) and need to manually manipulate this data before they can use it.
  2. They want to use Power BI/Power Pivot but can’t get the IT department to shape the data they need prior to loading it into Power BI/Power Pivot.
  3. People that are doing repetitive tasks such as matching invoices on a remittance advice against a bank statement (or similar) – Power Query eats this kind of work.
  4. Anyone that is given data in the wrong shape (example below – a sketch of the underlying query follows this list).  And yes, I quickly used Power Query to un-pivot the data in the blue table so it looked like the green table.  It is worth using Power Query for this un-pivot feature alone!
    image
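For the curious, the whole un-pivot is a single step in the Power Query formula language (M).  This is only a sketch – the table name “BlueTable” and the “Product” column are assumptions standing in for the blue table above:

    let
        Source = Excel.CurrentWorkbook(){[Name = "BlueTable"]}[Content],
        // Keep the Product column as-is and rotate every other (month) column
        // into attribute/value pairs – i.e. from the blue shape to the green shape
        Unpivoted = Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Value")
    in
        Unpivoted

In the user interface this is just a right-click on the Product column followed by “Unpivot Other Columns”.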

Wrap Up

Hopefully this overview has helped demystify how Power Pivot, Power Query and Power BI can help you in your job.  If you would like to learn more about Power Pivot, you can read my book “Learn to Write DAX“.  If you want to learn more about Power Query, I recommend Ken and Miguel’s book “M is for Data Monkey“.  If you live in Australia, you can attend one of my live training courses where I cover what you need to know about Power Pivot, Power Query and Power BI.

Power Query to Combine Web Pages

Level: Intermediate

There was an interesting question this week on powerpivotforum.com.au asking if there was a smarter way to use Power Query over multiple identical web pages to scrape the data in a single query.  I have been meaning to blog about this for a while, so it is a great opportunity for a mini-Friday post.

Typing out all the steps for how to do this would take a lot longer than simply recording a video, so instead of a long post I have embedded a video below from my YouTube channel.

First the Problem

Take a look at this web page  www.calendar-australia.com/holidays/2016/

It contains all the public holidays for the calendar year 2016.  But there are 2 similar web pages with the same structure and different data:

www.calendar-australia.com/holidays/2017/

www.calendar-australia.com/holidays/2018/

How to do it in Power Query

I’m sure I first learnt this technique watching Miguel Llopis from Microsoft, probably at the PASS SQL BA Conference in 2015.  The technique can be reused in many different scenarios like this one.  The trick is virtually identical to the technique I showed in this post about combining multiple Excel workbooks into a single query.  The solution works equally well in Power Query for Excel as well as in Power BI Desktop (Get Data).  The steps are listed below, and a sketch of the finished query follows the list.

  1. Build a query for one of the sets of data
  2. Turn the query into a function using the simple parameter step
  3. Modify the query to use the parameter instead of the hard coded set of data
  4. Create a table with a column of the parameters you want to use (in this case the data must be text, not numbers).
  5. Add a new column that takes the text from the first column and processes the function for each row.
  6. Expand the column into a single table.

Here is a video showing how easy it is to do.

Seems there is a special birthday this weekend.  Should be fun!

Direct Connect from Excel to Power BI Service

Today Microsoft announced a great new feature that allows you to direct connect FROM Excel TO Power BI, and not the other way around.  This simple change really streamlines the integration experience between Excel and the Power BI Service, and makes Power BI even more like your own personal SSAS server.

Power BI direct connect

There are 2 immediate use cases that come to mind

Personal SSAS Server

It is now easier than ever to effectively have your own free SSAS Server.  Just create a free Power BI account, load up your Power BI workbook, and then access your data models directly from within Excel.

Corporate Sharing

If you are using the paid Pro version of Power BI, the ability to use the Power BI Service as a “bonus” SSAS server is also now easier than ever.  Just open Excel and, from the Excel Power BI menu, connect directly to the data models that have been shared with you from your organisation.

Installing and Getting Started

Download the new version of the Excel Dashboard Publisher Add-In here.

After installing, you will notice a new Menu Bar in Excel.  Click on this menu and then connect to your Power BI Service account.

image

Once you are signed in, you will be presented with a prompt asking how you want to connect.  In my case I was offered just a single workspace as shown below.

image

What is interesting is that I actually have 2 workspaces in my service, but I can only see one of them.   So something seems a bit amiss here, but I am sure that will be sorted out shortly.

image

A second interesting thing is that you will be presented with options to connect to either a report or a dataset.

image

In my testing, the results were the same.  Maybe something new is coming in the future here!

And yet a third interesting thing is that I was then asked to authenticate again.  It would be nice if these Azure authentication dialog boxes were not modal and supported LastPass – that would make the login process so much easier.  Even better, Excel could pass the stored credentials to the service so I don’t have to authenticate twice.

image

The good news is that this is a one-off additional login, so the next time you connect from Excel you won’t have to log in again.  The exception is if you have multiple Power BI accounts (e.g. if you are a consultant like me, with many clients on different domains) – in that case you will have to continuously log in and out, and you won’t be able to benefit from LastPass 🙁

After logging in, you will then see a new blank Pivot Table connected directly to the Power BI Service

image

A Final Word

The original Pin to Power BI Dashboard/Pin Manager is still there (see 1 below), giving you great flexibility in sharing ranges in your workbooks in the Power BI Service.

And last but not least, look how POWERful my Excel Ribbon has become (2 below) – I now have 5 POWERful Addins.  Sweeeet.

image