It has been 16 months since I wrote my blog post 5 common mistakes of self taught DAX students at PowerPivotPro.com. I train a lot of people at my training courses and also help many more on various forums. The number 1 mistake I see in self taught DAX students with an Excel background is the use and over use of Calculated Columns.
Attracted Like a Magnet
As I mentioned in the above post, Excel users are attracted to Calculated Columns like a magnet.
Maybe a better analogy would be “Attracted to a flame like a moth” with equally dangerous side effects.
I think this is because Excel users feel very comfortable working in the table structure that exists in the Power Pivot window. In addition, self taught Power Pivot users normally don’t know any better.
Take a look at the Power Pivot window below (Adventure Works Sales Table). When an untrained DAX user who is used to working in Excel looks at this table (which looks a lot like a spreadsheet), the “natural’ thing to do is to think “I’ve got Sales (1 below), I’ve got Cost (2), but I need Margin $ (Sales minus Cost). So then they jump in and write a simple calculate column for Margin (3 below).
But this is 100% the wrong thing to do – please please don’t do this!
Reasons Calculated Columns can be a Problem.
I have been very critical of Calculated Columns in the past to the extent that I have suggested they are evil (as implied in the graphic below). This is not strictly true, but they are certainly dangerous in the hands of the uninitiated which is why I warn new users to steer clear unless they are 100% sure it is the right approach.
Here is a list of reasons that Calculated Columns can be bad.
- The results of your formula are calculated and stored in the table for every single row in the table. This takes up space on disk and also uses space in memory, and this will make your report less performant. If your table is 100 rows then it probably doesn’t matter. If you table is 100 million rows, then it can be a big problem.
- The compression on Calculated Columns may not be as good as compression on imported columns. This is because Calculated Columns are compressed last. Power Pivot therefore doesn’t give Calculated Columns the same prioritisation consideration it gives imported columns.
- The Calculated Columns are recalculated every time the workbook is refreshed against the data source. This will slow down the refresh process (increasingly so for every additional calculated columns).
- You can unwittingly create circular reference conflicts in your tables (particularly data tables) for reasons that are very difficult for learners to understand (and beyond the scope of this blog to explain).
Benefits of using Measures Instead of Calculated Columns
The most practical reason you shouldn’t write Calculated Columns is that you simply don’t need to. Power Pivot was built for Measures. If Calculated Columns are Evil, then Measures are Angelic. Here are the benefits of using Measures instead of Calculated Columns.
- Your workbooks will be smaller and faster. You are not “materialising” the results of every possible calculation in your table and storing it in memory and on disk (like with a Calculated Column). Instead Measures are calculated on the fly in memory on demand, and Measure calculations are normally lightening fast. The only results that are materialised are the results that need to be displayed inside your Pivot Table.
- Pivot Tables naturally filter your data model before calculations are completed. This means that Measures only need to be calculated over the filtered subset of data, further reducing the number of calculations required.
General Rules of When Calculated Columns are OK
I still believe Calculated Columns are evil in the wrong hands, however as I have built depth of knowledge I have learnt more about the specific conditions when it is OK to use Calculated Columns. Here are my general rules for when it is OK to use Calculated Columns. For Excel users that are still learning; if you don’t know which to use, then I recommend that you assume you should use a Measure at least until you are sure in your mind why you must use a Calculated Column. With that in mind, it is generally OK to use Calculated Columns in the following circumstances.
- When you need to use the results to filter your data model. You can’t use a Measure on Rows, Columns, Filters, or Slicers in your Pivot Table. If you want/need to do this, then you need a Column and not a Measure. It is preferable to get the column added in your data source rather than use a Calculated Column. The benefits of loading it from the source include:
- that you can re-use the source column in other workbooks without re-writing the calculated column
- there is the potential for improved compression on import
- the refresh of your workbook will be faster.
But for times when that is not possible to import from the source, then a Calculated Column is your friend.
- When you are enhancing a Lookup (Dimension) table. Lookup tables are normally smaller (less rows) and hence the negative impacts of Calculated Columns are not as far reaching.
- When the results of the Calculated Column have a low cardinality. Uniqueness of values in a column is the enemy of compression in Power Pivot. So if your Calculated Column returns 2 possible unique values (eg Yes or No) and this column helps you with writing a complex Measure, then there are no problems with the Calculated Column. Calculated Columns with a low cardinality compress really well.
- When you have a very complicated formula that is very taxing on your report to the point where the poor run time performance of a measure calculation is so bad that it is preferable to “pre-calculate” the result and store it in a column. It is unlikely that a self taught Excel user will be writing such a complex formula in the early days of learning Power Pivot, but it is worth noting that this is a valid use case anyway.
Where to learn more
- Subscribe to Blogs and read all you can. You can subscribe to my blog at the top of this page. I also recommend that you subscribe to powerpivotpro.com for the latest information about Power Pivot.
- Read a book. My book Learn to Write DAX covers Calculated Columns as well as all the other topics that you need to learn to be good at Power Pivot and Power BI. My book focuses on giving you the theory as well as hands on practice. I keep a curated list of other great Power Pivot and Power Query books in my Knowledge Base.
- Do some live training. Students from my live courses walk away with depth of knowledge that is difficult to replicate in other learning environments. If you really want to gain deep understanding, then consider a live training course.
February 29, 2016 7:00 am