When to Create a Lookup Table in Power Pivot

Level: Beginners

Today I explain when it is important to create a lookup table and when it is fine to use native columns in a data table.  I have rated this topic as a beginner topic as it is a fundamental skill to learn on your journey to become a Power Pivot and Power BI Ninja.  However there is so much to learn and you can only absorb so much content before your head begins to explode.  Because there is so much to learn, you may find yourself learning this topic for the first time now even though you are an intermediate user. Or you may come back and read this again for a second time before it all sinks in.  This topic is therefore good for everyone, either as a review, a confirmation of what you know, or as a new learning opportunity.

It is also worth pointing out that his topic does not mean the difference between “it works” and “It doesn’t work”.  You can be successful using Power Pivot and Power BI without understanding this topic well.  But it can make a big difference to usability of your model and certainly performance (in some cases).  And not least of all, if you want to get better and advance you skills, you simply must understand these principles well.

Topics Covered in This Post

In this post I cover the following scenarios to help explain some of the nuances and benefits of using lookup tables (or not):

  • Simple data models
  • Inbuilt time intelligence functions
  • To Simplify Your DAX formulas
  • Pointless dimension tables
  • Multiple data tables
  • Simple business communication

First a Definition of the 2 Table Types

Before I move on, it is worth restating the definition of the 2 different types of tables so it is clear in your mind.

There are 2 different types of tables in a Power Pivot data model.  There are dimension tables (I call them lookup tables) and there are fact tables (I call them data tables).  These two table types are very different and understanding the difference is fundamental to understanding and using Power Pivot and Power BI properly.

Lookup Tables (Dimension Tables)

Lookup tables contain information typically about 1 business concept/object, eg Customers, Products, Time etc.

image

Lookup tables always have the following features:

  • There is always a column of data that uniquely identifies each row of data in the table.  In database terms this is the primary key.  In the customer table it would be the customer number, in the products table it is the product code, in the calendar table it is the date, and so on for any lookup table.
  • There can be no duplicates in this key column – every row must be unique
  • There can be 1 or more additional columns in each lookup table that provide more information about that primary key column.  Eg in the calendar lookup table, the date is the primary key. Other columns such as Day Name, Month Name, Year, Day of Week number etc all provide detail about the date key.  I often refer to these additional columns as providing metadata about the primary key.

Data Tables

Data tables contain (typically) transactional information.  It could be sales data, budget data, GL data from a financial system, call centre call data, or any other data about activities of some type.

image

Data tables have the following features:

  • There is no requirement that a data table must have a primary key.
  • There needs to be 1 column that can be connected to relevant lookup tables (assuming you want to connect these tables).  In database terms, this column is called the foreign key.  It should be obvious that if you want to join your data table to your product table, you must have a column that defines the product in your data table.
  • Duplicates are allowed in data table columns.  eg many sales occur on each day, many sales occur for each product.

Joining the Tables

You cannot join data tables to other data tables.  Data tables are always joined to lookup tables using the Lookup Table primary key column as the join column.

I am using the Adventure Works database in the image below.  I have placed the lookup tables in my data model at the top of the window and the data table below.  This is the Collie Layout Methodology; it gives the user a visual clue that the tables you have to “look up” to see are the Lookup Tables – get it?  It is not required, just an easy way to remember.  Technically this data layout is called a star schema.

image

When To Use Lookup Tables

Back to the point of this post.  When should you use lookup tables and when is it OK to simply use the column in your data table?  Well the fact is that you do not “need” to use lookup tables at all to make your data model work – it is perfectly fine to simply load a data table and go from there.  Having said that, read on to find out the nuances of the some common scenarios.

Simple data models

Let me start by covering when it is ok to not use any lookup tables.  If your data meets the following scenario it is fine to ignore lookup tables:

  • Your data set is small.  What defines small depends of course.  Certainly 10k – 250k rows of data is small (tiny actually). It is less important to efficiently store your data when your data set is small and hence a flat single data table can be OK.
  • Your data doesn’t contain too many columns (this is a topic in its own right and outside of the scope of this post).  If you have lots of columns (eg >15) then even 250k rows of data can become non-performant.  But if you have a few columns, then maybe you don’t need lookup tables.
  • Your data table already contains all the columns you need for your analysis.
  • You don’t need to use any inbuilt time intelligence calculations (more on that below).
  • You only need to write simple DAX formulas (such as SUM formulas).

If this is your scenario, then you don’t need lookup tables at all.  However if you always operate in this simple scenario then you will not advance your DAX knowledge to the next level, so keep that in mind.  This scenario with no lookup tables is quite rare and is probably limited to Excel users that already have their data in a single flat table.

Inbuilt Time Intelligence Functions

It is mandatory to have a calendar lookup table if you want to use the inbuilt time intelligence features of Power Pivot.  Examples include calculating sales last year, sales year to date etc. I cover calendar lookup tables in depth in this article here.  It is possible to do manual time intelligence calculations (not inbuilt, but manual) using a single data table, but I don’t recommend it.  If time is an important part of your data, then I strongly recommend you get a calendar lookup table and go from there.

To Simplify Your DAX Formulas

While it is possible to use a simple data model with a single data table and no lookup tables (as mentioned above), once you have more complex analysis needs (eg other than just using SUM) you will need to write some DAX formulas to create the business insights.  In my experience it can be easier to write DAX formulas when you have lookup tables, particularly when you need to change the filter context in your data model.  “How” to do this is also out of scope for this topic, but be aware that this is another reason to move on from a single data table.

Pointless Dimension Tables

There is no point creating a dimension/lookup table without a reason and certainly not if there are only 2 columns in the lookup table.  This will make more sense with an example.

Take the Adventure Works data model as shown below.  It contains a lookup table (Products) with 2 columns, one for Product Key and one for Product Name.  The lookup table is joined to the data table using the primary key (of course).

image

The data in this lookup table would look something like shown below – only 2 columns (but a lot more rows of data than shown in this image of course).

image

People that have some understanding of traditional relational databases will be aware that relational databases will store the data in the first column (ProductKey) a lot more efficiently than the data in the second column (Product Name).  It is therefore common to think that it is more efficient to store the product key in the data table and put the product name in the lookup table only.  But this is not the case in Power Pivot and Power BI.  Power Pivot uses the xVelocity storage engine (columnar store) to compress the data.  There is no (or little) space saving benefit in storing the product code in the data table instead of storing the product name.  In addition every relationship in your data model comes at a cost/overhead.  This is why this lookup table is called a junk dimension. If the only reason you have this lookup table is to join the single column “Product Name” to the data table via the product key, then it would be just as good to load the product name into the data table and drop the product key all together.

If your data already has the product name (and not the product key) in the data table, and you are thinking of creating this lookup table, then you may want to think again.  That being said, there are other reasons why you may want to keep the primary key and lookup table including

  • if you have more than 2 columns in your lookup table.
  • if your product names are not unique (often the product key is managed more rigorously than the name)
  • if your data table already has the product key in the table and it is easier to do it this way.

to name a few.

If you data table contains a column that is not related to any other column as a logical object and you are wondering if you should build a lookup table for it – the answer is no – don’t do it.  Just use the column in your data table.

Multiple data tables

As you probably know already, it is possible to have multiple data tables in your data model. In fact this is one of the many great things about Power Pivot.

image

However as I mentioned at the start of this post, it is not possible to join one data table to another data table in Power Pivot. If you want to use multiple data tables in your data model, you must join them through common lookup tables.  It therefore follows that if you have multiple data tables in your data table, you should load lookup tables even if they are junk dimensions.

Simple Business Communication

There is a clear benefit to business users if you logically group your columns of data into lookup tables that make business sense.  It can be hard for users if they need to hunt through the data table for information about a customer, or a product etc.  It is a much better end user experience if there are logical groupings of columns of relevant data in a table called Customer, Product etc.  Given the end users may be writing their own reports using the field list to find the relevant data, grouping the data into lookup tables can really improve the end user experience.

Flatten Your Lookup Tables

Further to the above point, you should also consider flattening out your data into a single lookup table if you can.  The following data model is feasible and will work, but it is hardly user friendly.  In the image below there is a product lookup table, and then a sub category lookup table of the product table, and a product category lookup table of the sub category table.

image

But if you deploy this model, it is likely to be confusing to the end user.  Remember they will be building reports using the field list, and they will need to go searching across different tables to find the columns of data they use.  It is much better to flatten the data from the additional lookup tables into a single product lookup table if you can.  Also flattening these tables will remove extra relationships and this will make the model more efficient (every relationship comes at a cost).  And finally your DAX will be easier to read and write.

Wrap Up

Hopefully this article has helped clarify when and where to use lookup tables.  Let me know your thoughts and/or any questions that remain unresolved in the comments below.

Share?

Comments

  1. Thanks Matt for this great blog.

    My opinion is unless you are working on a homework problem or a very small data set, you will likely need to work with fact table and dimension tables all the times in real world. My company is not a data-driven company yet but we have multiple and multiple data tables and sets (dimension and fact).

    Have a great day.

    • I agree Andrew, particularly if you are doing enterprise wide work. But some users coming from an Excel world may already have data in a flat table. Also it may make sense to create some lookup tables but leave other data as a single column in the data table.

  2. Hi Matt,

    Good article – particularly the explanation of why a single-attribute look-up table is sub-optimal.

    However, your use of the term ‘junk dimension’ is a bit iffy. The concept comes from classic data warehouse design (Ralph Kimball specifically), and refers to a dimension that contains a collection of unrelated and non-hierarchic attributes. These are bundled together precisely to avoid single-attribute dimensions proliferating in a star schema model, gunking up the user experience. So it’s at best a related concept.

  3. Thanks Matt for another great post.

    As you mentioned a fundamental requirement of lookup table is a column of unique values. If there are two or more blank rows in that column then it would be impossible establish connection to the data tables. It is important to beware of the existence of duplicates and blanks in the primary key column.

  4. Good Day!

    What If I need one of the column in the fact table as a slicer? and a sorted slicer? Should I make [Sort By Column] field in the fact table? What about performance?
    What do you advice?

    Tnank you very much

    • Good question. By default, if you need a sort column, then it becomes a 2+ Column lookup table and hence worth considering. Definitely if you have a large data table (millions of rows) and the slicer column is also large (say 20+ unique values) then I would prefer to create a lookup table. If he data is less than this it is really personal preference. I think I would create the lookup table anyway.

  5. Hi ,
    Using Adventure Works database and an example in your book, Version 1 of the following example works. Is there any way to use the same example on the data table without creating a lookup table? I have tried Version 2 of the formula but clearly that does not work.
    Many thanks
    Aditi
    Version 1 Count of Products that have some sales:=CALCULATE(countrows(Products),filter(Products,Sales[Total Sales Amount]>8000))

    Version 2 Count of Products that have some sales:=CALCULATE(countrows(Sales),FILTER(SALES,[Total Sales Amount]>8000))

    • If you want version 2 to return the same result as version 1, but without using a lookup table, you could write this.

      Version 3 Count of Products that have some sales:
      =CALCULATE(distinctcount(Sales[ProductKey]),FILTER(values(SALES[ProductKey]),[Total Sales Amount]>8000))

      • Many thanks for your reply. i have a little doubt how are the tables Values(Sales[ProductKey] and Sales linked. Just checking if I have understood the logic right. So, here Values(Sales[ProductKey]) is working as the lookup table. Filter creates a row context in Row 1 of Values(Sales[ProductKey]) table and then the invisible calculate function wrapped around the calculated field [Total Sales Amount] propagates the filter from the table values(Sales[ProductKey]) to Sales Table ?

        • Aditi, that is an excellent and perfect explanation! You are truly on the way to becoming great at DAX. You have correctly identified all of the complicated concepts in this formula.

Leave a Reply to Matt Allington Cancel reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x