There are many best practices for Power BI, Power Pivot and Power Query. I know these things so intuitively now that it is very easy to forget how I incrementally learnt these things along the journey. Most of these things are not “right” vs “wrong” – they are more often simply “better”. I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone. In all cases I have outlined why it is a best practice to help the understanding. You would be well placed to adopt these best practices as part of your DAX journey.
Naming your Columns and Measures
- Always write a Column in the format TableName[Column Name]
- Always write a Measure in the format [Measure Name]
This is the foundation of all the DAX formulas you will write. Both columns and measures use the same square bracket syntax. It is technically possible to write measures and columns both including the table name as follows.
- TableName[Column Name]
- TableName[Measure Name]
These 2 examples above do not cause a problem because their names makes it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula. Take the following example
- TableName[Total Sales]
Is the above a column or a measure? It is impossible to tell unless you are using the best practice naming convention.
Measures and columns are very different in the DAX language. It is essential that you can tell at a glance which is which. In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.
Give Tables a Single Noun Name
- Don’t just accept the table name from your source system. Preferably give the table a single word noun description/name.
Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile. This is a common practice in IT and is related to the Kimball dimension modelling methodology. The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before. Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table. In addition earlier versions of Power Pivot do not have fully featured intellisense – you are required to refer to columns by starting to type the table name from the beginning. If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.
Also PowerBI.com has a natural langauge query tool that allows you to ask quetions of your data. If your table names are ‘words’, then you are helping the engine find what you are looking for.
Using Spaces in Names
- Don’t use spaces in table names
- Do use spaces in column names
- Do use spaces in measure names
If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula. This makes the code longer, harder to read and “untidy” (IMO anyway). It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).
Columns and measures always need to be wrapped in [Square Brackets] anyway and hence adding spaces does not make the code any more complex. Columns and measures are easier to read if they have spaces
Don’t Overly Abbreviate Business Terms
- Give your tables, columns and measures descriptive business names without overly short abbreviations.
Firstly you should use language and abbreviations that are commonly used in your organisation. So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names eg [Total Sales YTD]. However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ]. You will simply have people calling you asking what it means.
Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.
eg. What were the total sales for bikes last year
This natural language algorithm looks for matches in words in the question against the data model to help answer the question. If you abbreviate your measure names to for example [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work. You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your name.s
Measures or Calculated Fields
- Measures is a better name than Calculated Fields
The term measures has been a Microsoft term for a BI formula for many years. In the first release of Power Pivot in Excel 2010, Microsoft adopted this term. Unfortunately in Excel 2013 somehow a decision was taken to rename “measures” to be called “calculated fields”. This was a bad decision and thanks to lobbying from many people Excel 2016 reverted to using the term measures (as does Power BI). I always now use the term measures and never refer to Calculated Fields unless I am explaining to Excel 2013 users why they are stuck with a bad name.
Loading and Shaping Data
Push Shaping as Close to the Source as Possible
- Wherever possible, you should do your data shaping as close as possible to the data source.
There are many ways that you can shape your data in the Microsoft BI stack. Power Query is a great tool to reshape your data however you can also use Power Pivot (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables. And you can always write SQL code and paste that into the tools to extract the data that way. The main problem with these approaches is you are effectively hard coding a solution for a single data set. If you want to build another data set in the future, the work needs to be done again (either copy or re-write). The data shaping tools are designed to allow you to do what ever you need without having to rely on a third party – use these tools if you need to. However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.
Shape in Power Query, Model in Power Pivot
Power Query and Power Pivot were built to do 2 completely different tasks. Power Query is built for cleansing and shaping while Power Pivot is built for modelling and reporting. It is possible that you can shape your data in Power Pivot (eg you can write calculated columns, you can add calculated tables (in the newer versions) etc). But just because you can do these things in Power Pivot, doesn’t mean you should. For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).
Best practice is that you should use Power Query to shape your data before/during load, and then use Power Pivot for measures and reporting. I have deeper coverage on this topic here.
Use A Calendar Table
- If you want to any sort of time calculations, get a Calendar table
It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables. A Calendar table is a special type of lookup/dimension table because it can be used to perform time intelligence functions. I have an article on time intelligence here and another on Calendar tables here. Bottom line – get a Calendar table.
A Star Schema is Optimal
- Power Pivot is optimised to use a Star Schema table structure
I have an in-depth article about star schemas here that you can read if need be. I am not saying this is the only layout that will work, or that other designs will always be slow. I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success. Two key things you should know.
- Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
- There is no need to create a lookup/dimension table just for the sake of it. If your sales table has customer name and you don’t care about anything else about the customer (eg city, state etc), then there is no need to create a lookup table just for the sake of creating a star schema. If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.
You Should Prefer Long and Narrow Tables
- Short wide tables are generally bad for Power Pivot but long narrow tables are great.
There are 2 main reasons why loading data this way is a good idea.
- Power Pivot is a column store database. It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed. Simplistically speaking, long narrow tables compress better than short wide tables.
- Power Pivot is designed to quickly and easily filter your data. It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately.
Only Load the Data You Need
- Load all the data you need, and nothing you don’t need.
If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be. In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later. This is no longer the case – it is very easy to change your data load query to add in a column you are missing. So bring in all of what you need and nothing you don’t. If you need something else later, then go and get it later. It is even advisable to use a tool like PP Utilities to show you which columns are not in use so you can delete them. Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).
Don’t use Linked Tables
It is possible to add your data to a table in Excel and then use a Linked Table to load it into Power Pivot. You simply select the data, go to the Power Pivot Menu (1 below) and click Add to Data Model (2 below).
The trouble with doing this is that you end up with 2 copies of the data in your workbook. The Excel table is an uncompressed copy and then you also have a compressed copy inside Power Pivot. In the example (shown as 3 above) there are many thousands of rows of data. The uncompressed data can take up 6 – 10 times more space than the equivalent compressed data. If you have a small table with a couple of columns and 10 or 20 rows then it is fine. However if you have anything more than that you are better to have 1 workbook for your data and then import the table directly into Power Pivot without storing it in Excel at all.
If you want a comprehensive lesson on how to use Power Query, checkout my training course here http://xbi.com.au/pqt
Avoid Bi-Directional Relationships
- Avoid using the default bi-directional relationships in Power BI unless you need them.
Microsoft introduce bi-directional filter propagation in Power BI (this is currently not available in Excel). This is a good feature for beginners and those that don’t really understand how the plumbing works. But it comes at a cost in that:
- The performance can be negatively affected
- If there is more than 1 data table, there can be circular relationships created (just like in cells in Excel)
I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.
Measures are Better than Calculated Columns
- Wherever possible you should write Measures in Preference to Calculated Columns Where Possible
I have been a strong proponent of this for many years. It mainly applies to Excel users that don’t have any formal learning about database design. It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user). I am not going to cover this in depth again now as I have already covered it here. Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.
For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them. The main exception is if you need to use the formula in a slicer to filter your data – then you will need a column.
Store Measures in the Table Where the Data Comes from
- You have a choice where to store your measures, so place them in the table where the data comes from.
Take for example a measure like this.
Total Sales = SUM(Sales[Extended Amount])
The data for this measure is coming from the [Extended Amount] column in the sales table. You should therefore store the measure in the sales table. If you place it in (say) the Calendar table, Excel will likely give you a warning similar to shown below.
This issue doesn’t occur in Power BI.
Break Measures into Interim Parts
- Break your DAX problems into manageable pieces and solve each piece one at a time.
You wouldn’t use a single cell in a spreadsheet to build a financial model. The cells are there to be used and make it easier to build a solution that meets your needs. You should think the same way about measures. The following formula is valid however it is hard to read, write and debug. It also repeats the same line of code multiple times throughout the measure. Having said that it will give you the % change in sales vs last year.
It is much easier to write interim measures and then join the pieces together to solve your problem. Plus you get each interim measure available for reuse elsewhere in your model. I am sure you will agree the following set of 4 measures are much easier to understand.
Total Sales = SUMX(Sales,Sales[Qty] * Sales[Unit Price])
Total Sales LY = CALCULATE([Total Sales],SAMEPERIODLASTYEAR(Calendar[Date]))
Chg vs LY = [Total Sales] – [Total Sales LY]
% Chg vs LY = DIVIDE ( [Chg vs LY], [Total Sales LY] )
Don’t Break Calculated Columns into Interim Parts
- It is good to have interim measures but it is bad to keep interim columns.
Interim measures are calculated on the fly on demand, they take up little space and make it easier to write your DAX. As with measures, I t is easier to write calculated columns using interim calculated columns, however the problem is that every column is pre-calculated and stored on disk, and each additional column makes the data model take up more space on disk and memory and hence makes it less efficient. By all means write interim columns if you need to in order to create a calculate column (not withstanding the earlier comments of columns vs measures) however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single column. This is an Excel concept I learnt from John Walkenbach.
You Can’t Start to Learn DAX by Reading Alone
I say this up front in my book “Learn to Write DAX”. If you think you are going to learn a new skill like Power Pivot, Power Query or Power BI by reading a book and not getting your hands in the system, let me tell you “you can’t”. The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.
Use 64 bit If You Can
This one causes a world of pain for many people. Power Pivot is the ONLY MS Office product that can benefit from 64 bit, but unfortunately it is all or nothing. Most organisations have deployed 32 bit and will be very reluctant to give you 64 bit Office. You can read my article on this topic here and also read up about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.