Calculated Columns vs Measures in DAX

Level: Beginners

It has been 16 months since I wrote my blog post 5 common mistakes of self taught DAX students at PowerPivotPro.com.  I train a lot of people at my training courses and also help many more on various forums.  The number 1 mistake I see in self taught DAX students with an Excel background is the use and over use of Calculated Columns instead of Measures.

Attracted Like a Magnet

As I mentioned in the above post, Excel users are attracted to Calculated Columns like a magnet.

magnet

Maybe a better analogy would be “Attracted to a flame like a moth” with sometimes equally dangerous side effects.

I think this is because Excel users feel very comfortable working in the table structure that exists in the Power Pivot window.  In addition, self taught Power Pivot users normally don’t know any better.

Take a look at the Power Pivot window below (Adventure Works Sales Table).  When an untrained DAX user who is used to working in Excel looks at this table (which looks a lot like a spreadsheet), the “natural’ thing to do is to think “I’ve got Sales (1 below), I’ve got Cost (2), but I need Margin $ (Sales minus Cost).  So then they jump in and write a simple calculate column for Margin (3 below).

image

But this is 100% the wrong thing to do – please please don’t do this!  Use Measures instead.

Reasons Calculated Columns can be a Problem.

I have been very critical of Calculated Columns in the past to the extent that I have suggested they are evil (as implied in the graphic below).  This is not strictly true, but they certainly can be bad in the hands of the uninitiated which is why I warn new users to steer clear unless they are clear why it is the right approach.

see-no-evil

Here is a list of reasons that Calculated Columns can be bad.

  1. The results of your formula are calculated and stored in the table for every single row in the table.  This takes up space on disk and also uses space in memory, and this will make your report less performant.  If your table is 100 rows then it probably doesn’t matter.  If you table is 100 million rows, then it can be a big problem.
  2. The compression on Calculated Columns may not be as good as compression on imported columns.  This is because Calculated Columns are compressed last.  Power Pivot therefore doesn’t give Calculated Columns the same prioritisation consideration it gives imported columns.
  3. The Calculated Columns are recalculated every time the workbook is refreshed against the data source.  This will slow down the refresh process (increasingly so for every additional calculated columns).
  4. You can unwittingly create circular reference conflicts in your tables (particularly data tables) for reasons that are very difficult for learners to understand (and beyond the scope of this blog to explain).

Benefits of using Measures Instead of Calculated Columns

The most practical reason you shouldn’t write Calculated Columns is that you simply don’t need to.  Power Pivot was built for Measures.  If Calculated Columns are Evil, then Measures are Angelic.  Here are the benefits of using Measures instead of Calculated Columns.

  1. Your workbooks will be smaller and faster.  You are not “materialising” the results of every possible calculation in your table and storing it in memory and on disk (like with a Calculated Column).  Instead Measures are calculated on the fly in memory on demand, and Measure calculations are normally lightening fast.  The only results that are materialised are the results that need to be displayed inside your Pivot Table.
  2. Pivot Tables naturally filter your data model before calculations are completed.  This means that Measures only need to be calculated over the filtered subset of data, further reducing the number of calculations required.

Take the Calculated Column mentioned earlier.  You can get the exact same outcome by writing the following measures.  Note these are measures, not calculate columns.

Total Sales = SUM(Sales[SalesAmount])

Total Cost = SUM(Sales[TotalProductCost])

Total Margin = [Total Sales] - [Total Cost]

L2WD banner ad

General Rules of When Calculated Columns are OK

I still believe Calculated Columns are evil in the wrong hands, however as I have built depth of knowledge I have learnt more about the specific conditions when it is OK to use Calculated Columns.  Here are my general rules for when it is OK to use Calculated Columns.  For Excel users that are still learning; if you don’t know which to use, then I recommend that you assume you should use a Measure at least until you are sure in your mind why you must use a Calculated Column.  With that in mind, it is generally OK to use Calculated Columns in the following circumstances.

  1. When you need to use the results to filter your data model.  You can’t use a Measure on Rows, Columns, Filters, or Slicers in your Pivot Table.  If you want/need to do this, then you need a Column and not a Measure.  It is preferable to get the column added in your data source rather than use a Calculated Column.  The benefits of loading it from the source include:
    • that you can re-use the source column in other workbooks without re-writing the calculated column
    • there is the potential for improved compression on import
    • the refresh of your workbook will be faster.

    But for times when that is not possible to import from the source, then a Calculated Column is your friend.

  2. When you are enhancing a Lookup (Dimension) table.  Lookup tables are normally smaller (less rows) and hence the negative impacts of Calculated Columns are not as far reaching.
  3. When the results of the Calculated Column have a low cardinalityUniqueness of values in a column is the enemy of compression in Power Pivot.  So if your Calculated Column returns 2 possible unique values (eg Yes or No) and this column helps you with writing a complex Measure, then there are no problems with the Calculated Column.  Calculated Columns with a low cardinality compress really well.
  4. When you have a very complicated formula that is very taxing on your report to the point where the poor run time performance of a measure calculation is so bad that it is preferable to “pre-calculate” the result and store it in a column.  It is unlikely that a self taught Excel user will be writing such a complex formula in the early days of learning Power Pivot, but it is worth noting that this is a valid use case anyway.

Where to learn more

  1. Subscribe to Blogs and read all you can.  You can subscribe to my blog at the top of this page.  I also recommend that you subscribe to http://powerpivotpro.com for the latest information about Power Pivot.
  2. Read a book.  My book Learn to Write DAX covers Calculated Columns as well as all the other topics that you need to learn to be good at Power Pivot and Power BI.  My book focuses on giving you the theory as well as hands on practice. I keep a curated list of other great Power Pivot and Power Query books in my Knowledge Base.
  3. Do some live training.  Students from my live courses walk away with depth of knowledge that is difficult to replicate in other learning environments.  If you really want to gain deep understanding, then consider a live training course.
Share?

Comments

  1. This is a very helpful post, I have some re work to do now, Thank you for sharing your knowledge.

  2. Great guidelines, thanks Matt. i am interested to hear similar besr practice tips on striking a balance between creating calculated columns in DAX versus in the ETL phase with say, power query. Thats another area of overlap with no obvious answer.

  3. My Friends, good post as always! Help me out on this one
    “…The compression on Calculated Columns may not be as good as compression on imported columns…”
    In this article
    http://www.powerpivotpro.com/2015/12/compression-with-power-pivot/
    …you indicated that Columns seem to be compressed independent of each other. If that’s the case, then it should not matter if I add a calculated column in Power Pivot or bring it in via Power Query. Should it?
    I’ll try to run a test, but if you already have, let me know.

    • Hey Avi. This is still mysterious to me. I attended “Mastering DAX” and “Optimising DAX” with Marco Russo recently. That confirmed that there IS a dependency between columns with compression, however I can only go on my personal experience and testing. In my experience, there is little incremental cost of having 1 extra column, certainly not “orders of magnitude” extra cost. I have also tested imported columns vs calculated columns and found little difference. The only issue (as I understand) is that Calculate Columns are not considered along with other columns to determine the optimum compression order. So it is at least possible that this will make the workbook larger – possible but not guaranteed.

  4. Hmmm…I agree. Column compression being seemingly independent did blow my mind. I think Scott tried to disprove that but drew up blank, so I didn’t even bother to try. As I’ve heard the Italians often say – “It depends”. So it’s best to test things out with your own data I suppose.
    As for doing Calc Column in PP versus PQ…I have settled into the habit of ALWAYS using PQ to bring in data. So that’s moot for me as well.

  5. Thanks Matt – it is useful and insightful as always! I might possibly be the only Excel user who “forgets” to use columns and tries to write measures instead ( with varying results!) 🙂

  6. Thanks, Matt, very passionate post! 🙂
    It is really interesting, when to use PQ to make columns and when PP. I think that in cases you mentioned for use columns in PP (like filtering etc.), it is better to use PQ-generated columns, right? Because we at least do not recalculate them later

    • I think both Power Query and Power Pivot calculated columns are both fine for the types of columns I referred to in the post. It really depends on the scenario. If you have loaded your data from SQL Server directly into Excel 2013 (ie not through Power Query), then I would definitely use Power Pivot for the “appropriate” use cases. If my table was already loaded via Power Query, I would definitely use Power Query (you have to anyway). If the table is already loaded directly into Power Pivot, I would only switch to Power Query if there were a very good reason.

        • If I were in Power BI Desktop, I would probably put it into Power Query just for simplicity sake – so all data preparation is in the same place. The only exception would be if there were some complex calc that uses context transition (and hence the rest of the logic built in the data model) to return the required value. But we are touching on the edges here – I have a saying – if you have all the information and you still can’t decide, then it probably doesn’t matter etiher way.

  7. Sorry to be a pest. This post well explained why Calculated Columns (I assume from different tables) are “Bad” … but it did not discuss the correct solution!! I did apologize at the beginning. I come from the world of access where calculated columns are in queries and need to be re-run every time the query is run.
    Any hint (basic sample) would be appreciated. Please remember that I’m extremely new to all this and I got both books but not sure what to look for. B-)

    • Calculate columns are bad when you can use Measures instead. It is as simple as that. It does cover the correct solution – use Measures instead. I have added an example as per your request.

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x