Edit: Sep 2019
There are many best practices for Power BI. I know these things so intuitively now that it is very easy to forget how I incrementally learnt these things along the journey. Most of these things are not “right” vs “wrong” – they are more often simply “better”. I thought there would be value in producing a page that clearly outlines the important best practices as a learning reference for everyone. In all cases I have outlined why it is a best practice to help the understanding. You would be well placed to adopt these best practices as part of your DAX journey.
Naming your Columns and Measures
- Always write a Column in the format TableName[Column Name]
- Always write a Measure in the format [Measure Name]
This is the foundation of all the DAX formulas you will write. Both columns and measures use the same square bracket syntax. It is technically possible to write measures and columns both including the table name as follows.
- TableName[Column Name]
- TableName[Measure Name]
These 2 examples above do not cause a problem because their names make it easy to know what they are, but if you always place the table name at the front of every measure and every column, then it will be impossible to tell them apart by reading the formula. Take the following example
- TableName[Total Sales]
Is the above a column or a measure? It is impossible to tell unless you are using the best practice naming convention.
Measures and columns are very different in the DAX language. It is essential that you can tell at a glance which is which. In addition if you hard code a measure with the table name and then you later move the measure to another table, any formulas referring to this measure will stop working.
Give Tables a Single Noun Name
- Don’t just accept the table name from your source system. Preferably give the table a single word noun description/name.
Many BI data sources will have long table names like fctSalesTransactionsHistory or dimCustomerMasterFile. This is a common practice in IT and is related to the Kimball dimension modelling methodology. The problem is that with Self Service BI these table names, column names and measure names are more visible to business users than ever before. Given many business users are going to be reading these table names, it is much easier for them to comprehend the “Sales” table rather than the “fctSalesTransactionsHistory” table. If every table starts with either fct or dim, you have just added 3 additional characters you have to type for each formula before Intellisense can help you.
Also Power BI has a natural language query tool that allows you to ask questions of your data. If your table names are ‘words’, then you are helping the engine find what you are looking for.
Using Spaces in Names
- Don’t use spaces in table names.
- Do use spaces in column names.
- Do use spaces in measure names.
If you use spaces in table names you will be forced to add single quotes around the table name each time you refer to it in a formula. This makes the code longer, harder to read and “untidy” (IMO anyway). It is better to use underscore_characters or CamelCase instead of spaces (or better still use a single noun name as mentioned above).
Columns and measures always need to be wrapped in [Square Brackets] anyway and hence adding spaces does not make the code any more complex. Columns and measures are easier to read if they have spaces.
Don’t Overly Abbreviate Business Terms
- Give your tables, columns and measures descriptive business names without overly short abbreviations.
Firstly you should use the language and abbreviations that are commonly used in your organisation. So if “Year to Date” is commonly abbreviated to YTD, then for sure you can use this abbreviation in your measure names, e.g. [Total Sales YTD]. However if you develop a new measure called [Total Sales Last Rolling Quarter] and this is not a common concept across the organisation, then you are just making it hard for yourself if you call your measure [Ttl Sales LRQ]. You will simply have people calling you asking what it means.
Secondly Power BI has a feature called Q&A that allows a user to ask a natural language question about data.
e.g. What were the total sales for bikes last year?
This natural language algorithm looks for matches in words in the question against the data model to help answer the question. If you abbreviate your measure names to for example [TtlSales] instead of [Total Sales], you are making it hard for Q&A to do its work. You can help Q&A using synonyms, but do yourself a favour and don’t over abbreviate your names.
Loading and Shaping Data
Push Shaping as Close to the Source as Possible
- Wherever possible, you should do your data shaping as close as possible to the data source.
There are many ways that you can shape your data in Power BI. Power Query is a great tool to reshape your data however you can also use DAX (Calculated Columns, Filters on load) and Power BI also includes Calculated Tables. And you can always write SQL code and paste that into the tools to extract the data that way. The main problem with these approaches is you are effectively hard coding a solution for a single data set. If you want to build another data set in the future, the work needs to be done again (either copy or re-write). The data shaping tools are designed to allow you to do whatever you need without having to rely on a third party – use these tools if you need to. However if you have a common need for data in a particular shape and you can get support (from IT or otherwise) to shape your data at the source so you can easily get what you need, then there is definitely value in doing that.
Shape with ‘M’ in Power Query, Model with DAX in Power BI
Power Query (‘M’) and DAX were built to do 2 completely different tasks. Power Query is built for cleansing and shaping while DAX is built for modelling and reporting. It is possible that you can shape your data with DAX (e.g. you can write calculated columns, you can add calculated tables, etc.). But just because you can do these things with DAX, doesn’t mean you should. For example it is possible to write letters to people using Excel, but Word is a much better tool for this task (I knew someone that once did that!).
Best practice is that you should use Power Query to shape your data before/during load, and then use DAX for measures and reporting. I have deeper coverage on this topic here.
Use A Calendar Table
- If you want to any sort of time calculations, get a Calendar table
It is possible that you can analyse your data in a single flat table without using any lookup/dimension tables. A Calendar table is a special type of lookup/dimension table because it can be used in the time intelligence functions. I have an article on time intelligence here and another on Calendar tables here. Bottom line – get a Calendar table.
A Star Schema is Optimal
- Power BI is optimised to use a Star Schema table structure
I have an in-depth article about star schemas here that you can read if need be. I am not saying this is the only layout that will work, or that other designs will always be slow. I am saying that if you start out thinking about a star schema and aim to build that design you will be well under way to success. Two key things you should know.
- Don’t just bring in what is in your source transactional database – that would likely put you into a world of pain.
- There is no need to create a lookup/dimension table just for the sake of it. If your sales table has customer name and you don’t care about anything else about the customer (e.g. city, state etc.), then there is no need to create a lookup table just for the sake of creating a star schema. If you have 2 or more columns relating to the same object in your data table, then it is time to consider a lookup table.
You Should Prefer Long and Narrow Tables
- Short wide tables are generally bad for Power BI but long narrow tables are great.
There are 2 main reasons why loading data this way is a good idea.
- Power BI has a column store database. It uses advanced compression techniques to store the data efficiently so it takes up less space and so it is fast to access the data when needed. Simplistically speaking, long narrow tables compress better than short wide tables.
- Power BI is designed to quickly and easily filter your data. It is much easier/better to write one formula to add up a single column and then filter on an attribute column (such as month name in the green table above) than it is to write many different measures to add up each column separately.
Only Load the Data You Need
- Load all the data you need, and nothing you don’t need.
If you have data (particularly in extra columns) you don’t need loaded, then don’t load it. Loading data you don’t need will make your workbooks bigger and slower than they need to be. In the old world of Excel we all used to ask IT to “give me everything” because it was too hard to go back and add the missing columns of data later. This is no longer the case – it is very easy to change your data load query to add in a column you are missing. So bring in all of what you need and nothing you don’t. If you need something else later, then go and get it later. Focus mainly on your large data tables – the lookup/dimension tables tend to be smaller and hence are generally less of an issue (not always).
If you want a comprehensive lesson on how to use Power Query, checkout my Power Query Only Training course here.
Avoid Bi-Directional Relationships
- Avoid using the default bi-directional relationships in Power BI unless you need them.
Microsoft introduced bi-directional filter propagation in Power BI. This is a good feature for beginners and those that don’t really understand how the plumbing works. But it comes at a cost in that:
- The performance can be negatively affected
- If there is more than 1 data table, there can be circular relationships created
I recommend you turn your bi-directional relationships to single direction (double click to change) and only turn them back on if you really need them.
Measures are Better than Calculated Columns
- Wherever possible you should write Measures in Preference to Calculated Columns.
I have been a strong proponent of this for many years. It mainly applies to Excel users that don’t have any formal learning about database design. It is very easy for an Excel user to think they should write a calculated column (because they know how to do that) and not a measure (because that is a bit foreign to an Excel user). I am not going to cover this in depth again now as I have already covered it here. Just do yourself a favour Excel folk and start with the assumption that a measure is what you should write unless you know why a calculated column is a better fit.
For the record there are times when a calculated column is the best option, but 99.9% of all use cases coming from new Excel users won’t need them. The main exception is if you need to use the formula in a slicer to filter your data – then you will need a calculated column.
Store Measures in the Table Where the Data Comes from
- You have a choice where to store your measures, so place them in the table where the data comes from.
Take for example a measure like this.
Total Sales = SUM(Sales[Extended Amount])
The data for this measure is coming from the [Extended Amount] column in the Sales table. You should therefore store the measure in the Sales table. This makes it easier to manage the measures in Power BI.
Break Measures into Interim Parts
- Break your DAX problems into manageable pieces and solve each piece one at a time.
You wouldn’t use a single cell in a spreadsheet to build a financial model. The cells are there to be used and make it easier to build a solution that meets your needs. You should think the same way about measures. The following formula is valid however it is hard to read, write and debug. It also repeats the same line of code multiple times throughout the measure. Having said that it will give you the % change in sales vs last year.
%Chg vs LY = DIVIDE( SUMX(Sales, Sales[Qty] * Sales[Unit Price]) - CALCULATE( SUMX(Sales, Sales[Qty] * Sales[Unit Price]), SAMEPERIODLASTYEAR('Calendar'[Date]) ), CALCULATE( SUMX(Sales, Sales[Qty] * Sales[Unit Price]), SAMEPERIODLASTYEAR('Calendar'[Date]) ) )
It is much easier to write interim measures and then join the pieces together to solve your problem. Plus you get each interim measure available for reuse elsewhere in your model. I am sure you will agree the following set of 4 measures are much easier to understand.
Total Sales = SUMX(Sales, Sales[Qty] * Sales[Unit Price])
Total Sales LY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Calendar'[Date]))
Chg vs LY = [Total Sales] – [Total Sales LY]
%Chg vs LY = DIVIDE([Chg vs LY], [Total Sales LY])
Don’t Break Calculated Columns into Interim Parts
- It is good to have interim measures but it is bad to keep interim columns.
Interim measures are calculated on the fly on demand, they take up little space and make it easier to write your DAX. As with measures, it is easier to write calculated columns using interim calculated columns, however the problem is that every calculated column is pre-calculated and stored on disk, and each additional calculated column makes the data model take up more space on disk and memory and hence makes it less efficient. By all means write interim calculated columns if you need to in order to create a calculated column (notwithstanding the earlier comments of columns vs measures) however once you have worked out the correct syntax, concatenate all the code into a single “Mega DAX formula” in a single calculated column. This is an Excel concept I learnt from John Walkenbach.
You Can’t Start to Learn DAX by Reading Alone
I say this up front in my book “Supercharge Power BI”. If you think you are going to learn a new skill like data modelling with DAX in Power BI by reading a book and not getting your hands on the system, let me tell you “you can’t”. The exception is if you are a professional SQL Server database user and have a solid background in reporting and analytics database technology, then I am sure you can learn this way. For the rest of us Excel folk, there is no substitute for practicing what you read – so do yourself a favour.
I offer various Power BI trainings to suit your requirements. Click on the image below to find the details of the courses.
Use 64 bit If You Can
This one causes a world of pain for many people. Power BI is the ONLY MS Office product that can benefit from 64 bit. Most organisations have deployed 32 bit Office and will be very reluctant to give you 64 bit Office. You can read about my work arounds including installing 64 bit Power BI Desktop with 32 bit Office, and then also using Power BI Desktop as a local server for Excel.