Today I explain when it is important to create a lookup table and when it is fine to use native columns in a data table. I have rated this topic as a beginner topic as it is a fundamental skill to learn on your journey to become a Power Pivot and Power BI Ninja. However there is so much to learn and you can only absorb so much content before your head begins to explode. Because there is so much to learn, you may find yourself learning this topic for the first time now even though you are an intermediate user. Or you may come back and read this again for a second time before it all sinks in. This topic is therefore good for everyone, either as a review, a confirmation of what you know, or as a new learning opportunity.
It is also worth pointing out that his topic does not mean the difference between “it works” and “It doesn’t work”. You can be successful using Power Pivot and Power BI without understanding this topic well. But it can make a big difference to usability of your model and certainly performance (in some cases). And not least of all, if you want to get better and advance you skills, you simply must understand these principles well.
Topics Covered in This Post
In this post I cover the following scenarios to help explain some of the nuances and benefits of using lookup tables (or not):
- Simple data models
- Inbuilt time intelligence functions
- To Simplify Your DAX formulas
- Pointless dimension tables
- Multiple data tables
- Simple business communication
First a Definition of the 2 Table Types
Before I move on, it is worth restating the definition of the 2 different types of tables so it is clear in your mind.
There are 2 different types of tables in a Power Pivot data model. There are dimension tables (I call them lookup tables) and there are fact tables (I call them data tables). These two table types are very different and understanding the difference is fundamental to understanding and using Power Pivot and Power BI properly.
Lookup Tables (Dimension Tables)
Lookup tables contain information typically about 1 business concept/object, eg Customers, Products, Time etc.
Lookup tables always have the following features:
- There is always a column of data that uniquely identifies each row of data in the table. In database terms this is the primary key. In the customer table it would be the customer number, in the products table it is the product code, in the calendar table it is the date, and so on for any lookup table.
- There can be no duplicates in this key column – every row must be unique
- There can be 1 or more additional columns in each lookup table that provide more information about that primary key column. Eg in the calendar lookup table, the date is the primary key. Other columns such as Day Name, Month Name, Year, Day of Week number etc all provide detail about the date key. I often refer to these additional columns as providing metadata about the primary key.
Data tables contain (typically) transactional information. It could be sales data, budget data, GL data from a financial system, call centre call data, or any other data about activities of some type.
Data tables have the following features:
- There is no requirement that a data table must have a primary key.
- There needs to be 1 column that can be connected to relevant lookup tables (assuming you want to connect these tables). In database terms, this column is called the foreign key. It should be obvious that if you want to join your data table to your product table, you must have a column that defines the product in your data table.
- Duplicates are allowed in data table columns. eg many sales occur on each day, many sales occur for each product.
Joining the Tables
You cannot join data tables to other data tables. Data tables are always joined to lookup tables using the Lookup Table primary key column as the join column.
I am using the Adventure Works database in the image below. I have placed the lookup tables in my data model at the top of the window and the data table below. This is the Collie Layout Methodology; it gives the user a visual clue that the tables you have to “look up” to see are the Lookup Tables – get it? It is not required, just an easy way to remember. Technically this data layout is called a star schema.
When To Use Lookup Tables
Back to the point of this post. When should you use lookup tables and when is it OK to simply use the column in your data table? Well the fact is that you do not “need” to use lookup tables at all to make your data model work – it is perfectly fine to simply load a data table and go from there. Having said that, read on to find out the nuances of the some common scenarios.
Simple data models
Let me start by covering when it is ok to not use any lookup tables. If your data meets the following scenario it is fine to ignore lookup tables:
- Your data set is small. What defines small depends of course. Certainly 10k – 250k rows of data is small (tiny actually). It is less important to efficiently store your data when your data set is small and hence a flat single data table can be OK.
- Your data doesn’t contain too many columns (this is a topic in its own right and outside of the scope of this post). If you have lots of columns (eg >15) then even 250k rows of data can become non-performant. But if you have a few columns, then maybe you don’t need lookup tables.
- Your data table already contains all the columns you need for your analysis.
- You don’t need to use any inbuilt time intelligence calculations (more on that below).
- You only need to write simple DAX formulas (such as SUM formulas).
If this is your scenario, then you don’t need lookup tables at all. However if you always operate in this simple scenario then you will not advance your DAX knowledge to the next level, so keep that in mind. This scenario with no lookup tables is quite rare and is probably limited to Excel users that already have their data in a single flat table.
Inbuilt Time Intelligence Functions
It is mandatory to have a calendar lookup table if you want to use the inbuilt time intelligence features of Power Pivot. Examples include calculating sales last year, sales year to date etc. I cover calendar lookup tables in depth in this article here. It is possible to do manual time intelligence calculations (not inbuilt, but manual) using a single data table, but I don’t recommend it. If time is an important part of your data, then I strongly recommend you get a calendar lookup table and go from there.
To Simplify Your DAX Formulas
While it is possible to use a simple data model with a single data table and no lookup tables (as mentioned above), once you have more complex analysis needs (eg other than just using SUM) you will need to write some DAX formulas to create the business insights. In my experience it can be easier to write DAX formulas when you have lookup tables, particularly when you need to change the filter context in your data model. “How” to do this is also out of scope for this topic, but be aware that this is another reason to move on from a single data table.
Pointless Dimension Tables
There is no point creating a dimension/lookup table without a reason and certainly not if there are only 2 columns in the lookup table. This will make more sense with an example.
Take the Adventure Works data model as shown below. It contains a lookup table (Products) with 2 columns, one for Product Key and one for Product Name. The lookup table is joined to the data table using the primary key (of course).
The data in this lookup table would look something like shown below – only 2 columns (but a lot more rows of data than shown in this image of course).
People that have some understanding of traditional relational databases will be aware that relational databases will store the data in the first column (ProductKey) a lot more efficiently than the data in the second column (Product Name). It is therefore common to think that it is more efficient to store the product key in the data table and put the product name in the lookup table only. But this is not the case in Power Pivot and Power BI. Power Pivot uses the xVelocity storage engine (columnar store) to compress the data. There is no (or little) space saving benefit in storing the product code in the data table instead of storing the product name. In addition every relationship in your data model comes at a cost/overhead. This is why this lookup table is called a junk dimension. If the only reason you have this lookup table is to join the single column “Product Name” to the data table via the product key, then it would be just as good to load the product name into the data table and drop the product key all together.
If your data already has the product name (and not the product key) in the data table, and you are thinking of creating this lookup table, then you may want to think again. That being said, there are other reasons why you may want to keep the primary key and lookup table including
- if you have more than 2 columns in your lookup table.
- if your product names are not unique (often the product key is managed more rigorously than the name)
- if your data table already has the product key in the table and it is easier to do it this way.
to name a few.
If you data table contains a column that is not related to any other column as a logical object and you are wondering if you should build a lookup table for it – the answer is no – don’t do it. Just use the column in your data table.
Multiple data tables
As you probably know already, it is possible to have multiple data tables in your data model. In fact this is one of the many great things about Power Pivot.
However as I mentioned at the start of this post, it is not possible to join one data table to another data table in Power Pivot. If you want to use multiple data tables in your data model, you must join them through common lookup tables. It therefore follows that if you have multiple data tables in your data table, you should load lookup tables even if they are junk dimensions.
Simple Business Communication
There is a clear benefit to business users if you logically group your columns of data into lookup tables that make business sense. It can be hard for users if they need to hunt through the data table for information about a customer, or a product etc. It is a much better end user experience if there are logical groupings of columns of relevant data in a table called Customer, Product etc. Given the end users may be writing their own reports using the field list to find the relevant data, grouping the data into lookup tables can really improve the end user experience.
Flatten Your Lookup Tables
Further to the above point, you should also consider flattening out your data into a single lookup table if you can. The following data model is feasible and will work, but it is hardly user friendly. In the image below there is a product lookup table, and then a sub category lookup table of the product table, and a product category lookup table of the sub category table.
But if you deploy this model, it is likely to be confusing to the end user. Remember they will be building reports using the field list, and they will need to go searching across different tables to find the columns of data they use. It is much better to flatten the data from the additional lookup tables into a single product lookup table if you can. Also flattening these tables will remove extra relationships and this will make the model more efficient (every relationship comes at a cost). And finally your DAX will be easier to read and write.
Hopefully this article has helped clarify when and where to use lookup tables. Let me know your thoughts and/or any questions that remain unresolved in the comments below.