Best Practices for Power Pivot, Power Query and Power BI

Level: Beginners

There are many best practices for Power BI, Power Pivot and Power Query.  I know these things so intuitively now that it is easy to forget how I learnt them incrementally along the journey.  Most of these things are not “right” vs “wrong” – they are more often simply “better”.  I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone.  In each case I have explained why it is a best practice, to aid understanding.  You would be well placed to adopt these best practices as part of your DAX journey.

Naming Conventions

Naming your Columns and Measures

  • Always write a Column in the format TableName[Column Name]
  • Always write a Measure in the format [Measure Name]

This is the foundation of all the DAX formulas you will write.  Both columns and measures use the same square bracket syntax.  It is technically possible to write both measures and columns including the table name, as follows.

  • TableName[Column Name]
  • TableName[Measure Name]

These 2 examples above do not cause a problem because their names make it easy to know what they are, but if you always place the table name at the front of every measure as well as every column, then it will be impossible to tell them apart by reading the formula.  Take the following example:

  • TableName[Total Sales]

Is the above a column or a measure?  It is impossible to tell unless you are using the best practice naming convention.

Measures and columns are very different in the DAX language.  It is essential that you can tell at a glance which is which.  In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.
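A quick illustration of why the convention matters (the table, column and measure names here are hypothetical):

```dax
// A column reference: always qualified with its table name
Sales[Unit Price]

// A measure reference: square brackets only, never the table name
[Total Sales]

// With the convention applied, a formula is self-describing:
// Sales[Qty] must be a column, [Total Sales] must be a measure.
Avg Price = DIVIDE ( [Total Sales], SUM ( Sales[Qty] ) )
```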

Give Tables a Single Noun Name

  • Don’t just accept the table name from your source system.  Preferably give the table a single word noun description/name.

Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile.  This is a common practice in IT and is related to the Kimball dimensional modelling methodology.  The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before.  Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table.  In addition, earlier versions of Power Pivot do not have fully featured IntelliSense – you are required to refer to columns by typing the table name from the beginning.  If every table starts with either fct or dim, you have just added 3 additional characters you have to type in every formula before IntelliSense can help you.

Also PowerBI.com has a natural language query tool that allows you to ask questions of your data.  If your table names are ‘words’, then you are helping the engine find what you are looking for.

Using Spaces in Names

  • Don’t use spaces in table names
  • Do use spaces in column names
  • Do use spaces in measure names

If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula.  This makes the code longer, harder to read and “untidy” (IMO anyway).  It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).

Columns and measures always need to be wrapped in [Square Brackets] anyway, and hence adding spaces does not make the code any more complex.  Columns and measures are easier to read if they have spaces.
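To illustrate (hypothetical names): a space in a table name drags single quotes into every formula, while spaces in column and measure names cost nothing because the brackets are required anyway.

```dax
// Table name with a space – single quotes required every time
Total Sales = SUM ( 'Sales Transactions'[Extended Amount] )

// Single-noun table name – tidier, and the column keeps its spaces
Total Sales = SUM ( Sales[Extended Amount] )
```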

Don’t Overly Abbreviate Business Terms

  • Give your tables, columns and measures descriptive business names without overly short abbreviations.

Firstly you should use language and abbreviations that are commonly used in your organisation.  So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD].  However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ].  You will simply have people calling you asking what it means.

Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.

eg.  What were the total sales for bikes last year

This natural language algorithm looks for matches between the words in the question and the data model to help answer the question.  If you abbreviate your measure names to, for example, [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work.  You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your names.

Measures or Calculated Fields

  • Measures is a better name than Calculated Fields

The term “measures” has been Microsoft’s name for a BI formula for many years.  In the first release of Power Pivot in Excel 2010, Microsoft adopted this term.  Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to “calculated fields”.  This was a bad decision and, thanks to lobbying from many people, Excel 2016 reverted to using the term measures (as does Power BI).  I now always use the term measures and never refer to calculated fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.

Loading and Shaping Data

Push Shaping as Close to the Source as Possible

  • Wherever possible, you should do your data shaping as close as possible to the data source.

There are many ways that you can shape your data in the Microsoft BI stack.  Power Query is a great tool to reshape your data, however you can also use Power Pivot (calculated columns, filters on load), and Power BI also includes calculated tables.  And you can always write SQL code and paste that into the tools to extract the data that way.  The main problem with these approaches is you are effectively hard coding a solution for a single data set.  If you want to build another data set in the future, the work needs to be done again (either copied or re-written).  The data shaping tools are designed to allow you to do whatever you need without having to rely on a third party – use these tools if you need to.  However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.

Shape in Power Query, Model in Power Pivot

Power Query and Power Pivot were built to do 2 completely different tasks.  Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting.  It is possible to shape your data in Power Pivot (eg you can write calculated columns, and you can add calculated tables in the newer versions).  But just because you can do these things in Power Pivot doesn’t mean you should.  For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).

Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.

Use A Calendar Table

  • If you want to do any sort of time calculations, get a Calendar table

It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables.  A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions.  I have an article on time intelligence here and another on Calendar tables here.  Bottom line – get a Calendar table.
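As a simple sketch of what a Calendar table unlocks (assuming a Calendar table with a Date column, related to a Sales data table – these names are assumptions):

```dax
Total Sales = SUM ( Sales[Extended Amount] )

// Time intelligence becomes a one-liner once the Calendar table exists
Total Sales YTD = TOTALYTD ( [Total Sales], Calendar[Date] )
```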

A Star Schema is Optimal

  • Power Pivot is optimised to use a Star Schema table structure

I have an in-depth article about star schemas here that you can read if need be.  I am not saying this is the only layout that will work, or that other designs will always be slow.  I am saying that if you start out thinking about a star schema and aim to build that design, you will be well under way to success.  There are two key things you should know:

  • Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
  • There is no need to create a lookup/dimension table just for the sake of it.  If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema.  If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.

You Should Prefer Long and Narrow Tables

  • Short wide tables are generally bad for Power Pivot but long narrow tables are great.

(image: a short wide table compared with the equivalent long narrow table, shown in green, with month name as an attribute column)

There are 2 main reasons why loading data this way is a good idea.

  • Power Pivot is a column store database.  It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed.  Simplistically speaking, long narrow tables compress better than short wide tables.
  • Power Pivot is designed to quickly and easily filter your data.  It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately.
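To illustrate the second point (hypothetical names): with a long narrow table you write one measure over the single value column and then filter by the attribute column, instead of writing one measure per month column.

```dax
// One measure covers every month...
Total Amount = SUM ( Data[Value] )

// ...and a filter produces any single month on demand,
// instead of separate [Jan Amount], [Feb Amount], ... measures.
Total Amount Jan = CALCULATE ( [Total Amount], Data[Month Name] = "Jan" )
```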

Only Load the Data You Need

  • Load all the data you need, and nothing you don’t need.

If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be.  In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later.  This is no longer the case – it is very easy to change your data load query to add in a column you are missing.  So bring in all of what you need and nothing you don’t.  If you need something else later, then go and get it later.  It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them.  Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).

Don’t use Linked Tables

It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot.  You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).

The trouble with doing this is that you end up with 2 copies of the data in your workbook.  The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot.  In the example (shown as 3 above) there are many thousands of rows of data.  The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data.  If you have a small table with a couple of columns and 10 or 20 rows then it is fine.  However if you have anything more than that, you are better off having 1 workbook for your data and then importing the table directly into Power Pivot without storing it in Excel at all.

If you want a comprehensive lesson on how to use Power Query, checkout my training course here http://xbi.com.au/pqt

Modelling

Avoid Bi-Directional Relationships

  • Avoid using the default bi-directional relationships in Power BI unless you need them.

(image: bi-directional relationships in a Power BI data model)

Microsoft introduced bi-directional filter propagation in Power BI (it is currently not available in Excel).  This is a good feature for beginners and those that don’t really understand how the plumbing works, but it comes at a cost:

  • The performance can be negatively affected
  • If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)

I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.

Measures are Better than Calculated Columns

  • Wherever possible, you should write measures in preference to calculated columns.

I have been a strong proponent of this for many years.  It mainly applies to Excel users that don’t have any formal learning about database design.  It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user).  I am not going to cover this in depth again now as I have already covered it here.  Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.

For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them.  The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.
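A sketch of that exception (hypothetical names): a calculated column that buckets each row, so the result can be placed on a slicer – something a measure cannot do.

```dax
// Calculated column in the Sales table; usable on a slicer or filter
Price Band = IF ( Sales[Unit Price] >= 100, "Premium", "Standard" )
```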

Store Measures in the Table Where the Data Comes from

  • You have a choice where to store your measures, so place them in the table where the data comes from.

Take for example a measure like this.

Total Sales = SUM(Sales[Extended Amount])

The data for this measure comes from the [Extended Amount] column in the Sales table.  You should therefore store the measure in the Sales table.  If you place it in (say) the Calendar table, Excel will likely give you a warning similar to the one shown below.

(image: the Excel warning message)

This issue doesn’t occur in Power BI.

Break Measures into Interim Parts

  • Break your DAX problems into manageable pieces and solve each piece one at a time.

You wouldn’t use a single cell in a spreadsheet to build a financial model.  The cells are there to be used, and they make it easier to build a solution that meets your needs.  You should think the same way about measures.  The following formula is valid, however it is hard to read, write and debug.  It also repeats the same line of code multiple times throughout the measure.  Having said that, it will give you the % change in sales vs last year.
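The screenshot of that single formula is not reproduced here, but expanded from its parts it would look something like the following sketch, with the same sub-expressions repeated several times:

```dax
% Chg vs LY =
DIVIDE (
    SUMX ( Sales, Sales[Qty] * Sales[Unit Price] )
        - CALCULATE (
            SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
            SAMEPERIODLASTYEAR ( Calendar[Date] )
        ),
    CALCULATE (
        SUMX ( Sales, Sales[Qty] * Sales[Unit Price] ),
        SAMEPERIODLASTYEAR ( Calendar[Date] )
    )
)
```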

 

It is much easier to write interim measures and then join the pieces together to solve your problem.  Plus you get each interim measure available for reuse elsewhere in your model.  I am sure you will agree the following set of 4 measures are much easier to understand.

Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])

Total Sales LY  = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))

Chg vs LY = [Total Sales] - [Total Sales LY]

% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )

Don’t Break Calculated Columns into Interim Parts

  • It is good to have interim measures but it is bad to keep interim columns.

Interim measures are calculated on the fly on demand, they take up little space and they make it easier to write your DAX.  As with measures, it is easier to write calculated columns using interim calculated columns.  The problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and in memory, and hence makes it less efficient.  By all means write interim columns if you need to in order to create a calculated column (notwithstanding the earlier comments about columns vs measures), however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column and delete the interim columns.  This is an Excel concept I learnt from John Walkenbach.
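A hypothetical sketch of the idea: work out the logic with interim columns first, then fold it all into one “Mega DAX formula” and delete the interim columns.

```dax
// Interim columns used while developing (delete once finished):
Margin = Sales[Extended Amount] - Sales[Total Cost]
Margin Band = IF ( Sales[Margin] > 0, "Profit", "Loss" )

// Final single column with the interim logic folded in:
Margin Band = IF ( Sales[Extended Amount] - Sales[Total Cost] > 0, "Profit", "Loss" )
```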

Other Advice

You Can’t Start to Learn DAX by Reading Alone

I say this up front in my book “Learn to Write DAX”.  If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”.  The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.

Use 64 bit If You Can

This one causes a world of pain for many people.  Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing.  Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office.  You can read my article on this topic here and also read up about my workarounds, including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.


Comments

  1. Great one, thanks Matt !

    Actually, I saw many data models from PowerPivotPro and other “Power BI companies” where measures are stored in a disconnected table simply dedicated to measures.
    What do you think about this approach ?

    • I think this is a fine approach if you want to do it this way. I recommend Excel 2010 users always do this if they are loading via Power Query (because it is so unstable). I also like that a single table will show up as a measure table (sigma symbol) after you hide all columns. The downside (which is why I don’t generally recommend it) is that you get false positive warnings “relationships may be needed”.

      • Two more concerns with disconnected tables: (1) q&a will not work properly and (2) drill through will not work properly. Obviously these depend on the tech you’re using (eg ssas vs pp).

        Measures in a separate table stops the automatic drillthrough behavior in Excel PivotTables (that you might use with the Analyze In Excel feature of Power BI).

        Measures in a separate table are wrongly interpreted by Q&A in Power BI service, because they no longer belong to a table related to other tables in the data model.

        I’m not sure, but also the Quick Insight might suffer by this inconsistency in the data model.

        Workarounds have been discussed, but ultimately it was decided to adhere to storing measures in their “native” table. However, given enough time the above items may become irrelevant; take the new drillthrough features in SSAS vNext as an example.

          • The big downside of using the native tables is that it is a real pain if you ever want to, or have to, rename or rebuild that native table… which is something you might encounter a time or two when first constructing your model. If you don’t move your measures before doing so, you lose them, and moving measures between tables is a real pain. I’m hoping that they adopt the PowerBI desktop model of having measures completely separate from tables across all three products (Excel, SSAS, and PowerBI) in the next round of upgrades.

          • I agree Eric – it would be a good change and it would simplify a lot of things. When connecting to SSAS from a Pivot Table, the measures already get copied into multiple copies of the main table anyway. Also if you use cube formulas the measures are “technically” in a “Measures” table anyway (even when you are using “Calculated Fields” in Excel 2013).

        • Simon, that is an interesting consideration around the Drill Down functionality. I do really prefer having the table disconnected for the various reasons mentioned in other posts here, especially as it makes it easier to discover measures in one place.

          I’ve just tried out a workaround and it seems to work. I just added an Index Column to my fact table and then added a calculated column to my measure table with a simple =1 in it. I could then link my Measure Table to my fact table using those 2 columns.

          My drill down functionality was then restored. Not sure of the consequences, will need to test it out.

          Great post btw Matt. I know how hard it is to make time to post stuff like this. Thanks.

  2. Thanks for the excellent compilation of all the best practices Matt!

    This blog brought out the most efficient and effective ways of working with Power Pivot, Power Query and Power BI. It is certainly useful not only for the beginners but everyone in the community.

  3. Great stuff.

    I never did approve of adding dim and fct prefixes to table names. Doing so tended to only make names and formulas longer without any measurable benefit to development or understanding the data model. Thoughtfully naming tables overcomes many issues.

    Thank you for validating my opinion.

  4. Hi Matt,

    Great post and agree with all of your points in the few years I’ve been working with the Power Stack. One follow up question for you though. What are the biggest causes for performance hits that you’ve found with bi-directional cross-filtering? I’ve used in several models and find it to be very useful when needed, but want to understand the downside as well as the upside.

    Thanks!

    • I personally haven’t experienced performance hits (I hardly use them) but I understand that there is a technical overhead. And there is the circular reference issue of course. I also have a view that the more Microsoft tries to make the tool easier for people to use without actually understanding how it works, the deeper people get in before they realise they are in trouble. In some ways that keeps me in a job, but I guess I have a view that some things should be more “pure”. Cross filtering behaviour is a foundational feature of the tool and I think it should be well understood; I think this default setting makes it less understood by beginners.

      There is no doubt bi-directional cross filtering is useful, and if you need it then you should use it (as I mentioned in the post).

    • “What are the biggest causes for performance hits that you’ve found with bi-directional cross-filtering”

      In my opinion the single biggest cause of performance hits with bi-directional cross-filtering is the inability of people new to DAX to comprehend that it is there in the first place. It is on by default, and the consequences of having it activated by default have to be understood by newcomers who barely understand uni-directional cross-filtering.

      Jeffrey Wang, one of the architects behind the DAX engine, writes in this blog post
      https://pbidax.wordpress.com/2016/05/22/simple-filter-in-dax-measures/

      ” that means DAX engine needs to scan the ‘Internet Sales’ fact table just to calculate the Filter function itself. Few people would want to scan a fact table only to set a filter on a dimension table”

      That actually nails it down in just a few words and I think that is also what Matt is expressing.

  5. Great post.

    But I’d not recommend anyone to use underscores or PascalCasing in table names.
    The user experience of your model is much more important than the tidiness of your DAX calculations.
    Furthermore, the natural language Q&A experience will suffer too: nobody writes “Show me revenue by product_category in 2016”.

    • I don’t get your point, based on my recommendations. Q&A is more interested in measure names than table names. I don’t recommend you call your table product_category; I recommend you call it products (single noun). Your column should be called category. Then “show me revenue by product category” will work just fine.

  6. @Matt – I think measures should be defined on a dedicated table called M.
    That way, if for some reason you need to delete a table in the model, you can do so without worrying about re-writing your measures.

    • As I mentioned in my earlier comments to Tristan, this is a valid approach however it does have some downsides hence I don’t think it can be called a best practice. No problems in doing it if you want of course.

  7. When we go from short and wide to long and thin, doesn’t that cause your file size to increase? I understand that I can use a lookup table for any repeated info, what if my columns are say

    Customer
    Country
    Production facility
    Loads of monthly columns with numerical data

    If the first three columns are independent, I can’t use any of them as a relationship key, can I?

    • Long and thin tables compress much more efficiently than short and wide ones. It has to do with the way data compresses in a column database. Based on your description, a better shape is

      Customer
      Country
      Production Facility
      Month
      Value

      Unpivot all the monthly columns into the month and value columns.

      I don’t know what you mean by your last question. The customer column can be used to join to a customer lookup table, the country column can join to a country lookup table etc.

      • Thanks for your prompt reply!

        If my source data has 1 million rows with 3 dimension columns and 12 month columns then that is 15 million data points. If I unpivot per your suggestion I then have 12 million rows x 5 columns = 60 million data points.

        Do I understand correctly that this won’t actually increase the file size, due to compression? I would have sworn that I have done this before and the file size has increased, perhaps not 1 for 1 but certainly significantly larger. Maybe I am mistaken.

        Ignore my last question. I was only making the point that if my other columns were say Product, Product Category and Product Sub-Category then I could have split two of them out into a look up table.

        PS I really appreciated your article, this was just the one point I was unsure on.

        • In the words of the famous Marco Russo, “It depends”. Everyone’s data is different and you need to test it. I can tell you that, generally speaking, a long thin table compresses better than a short wide table. But it depends on the frequency of the foreign keys in the source, the cardinality of the values in the columns being unpivoted, the similarities in the values in the columns being unpivoted, etc. If it were me, I would create a copy of the workbook, delete everything other than the fact table, then save. Then take another copy and unpivot – save again, then compare.
