The Optimal Shape for Power Pivot Data

One of the topics in my book “Learn to Write DAX”  talks about the optimal shape of your data tables.  It is such an important topic that I have decide to share some modified extracts from the book here.

When I say “shape” of your data, I am talking about things like

  • how many tables you import into Power Pivot,
  • how many columns are in each table,
  • which columns are in each of the tables
  • should you unpivot your data so column headings become values in rows in your table.

One reason data shape is important is because the shape of data in transactional systems (or relational databases) is seldom the ideal shape for Power Pivot. When the IT department executes an Enterprise BI project, one of the important first steps is to shape the data so it is optimal for reporting. This step is normally completely transparent to the end user (ie you) and hence the end user is shielded from the need to do this. But you need to understand data shaping if you want to have efficient and effective Power Pivot data models. Just copying what you have got in your source data is unlikely to be optimal.

The Star Schema is the optimal shape

The generally accepted approach to bringing data into Power Pivot is to bring in your data in as a “Start Schema” format. This is a technical term coming from the Kimball methodology (also known as dimensional modelling) which describes the logical way data should be structured for optimal reporting performance. You can see why it is called a Star Schema by looking at the following image – note the star shape.

image

However I use and teach Excel users to follow the graphical layout approach developed and recommended by Rob Collie – let’s call it the Collie Methodology! That is to place the lookup tables at the top of the screen and the data tables at the bottom (see the image below).

image

If you compare the 2 images above, you will see they have exactly the same logical relationship (links) between the tables, it is just that they have a different visual layout.

Lookup tables and Data tables.

In the professional BI world, the Lookup tables would be referred to as Dimension tables and the Data tables would be referred to as Fact tables. For Excel users I suggest you think of them as Lookup tables and Data tables as it is a closer map to our old world of VLOOKUP.

Lookup tables

You should have 1 Lookup table for each “object” that you need for reporting purposes.  In the sample data above from Adventure Works, these objects are “customers”, “products”, “territories” and “time” (ie Calendar). A key feature of a Lookup table is it contains 1 and only 1 row for every individual item in the table and as many columns as you need to describe the object.  Eg, there is only 1 row for each unique Customer in the Customer table. The Customer table has lots of columns describing each customer such as customer number, customer name, customer address etc, but there is only 1 row for each customer – each one is unique based on the customer number – no duplicates allowed.

Data Tables

The Data table in the Adventure Works data is the Sales table (it is possible to have many data tables in your data model but there is only one in this example). This Data table contains lots of rows (60,000+ in this case) and contains all of the transactional records of sales that occurred over several years. Importantly the data table contains one column “key” that matches to each of the “keys” in each Lookup table needed for reporting. So in this sample data, there is a date, customer number, product number, and territory key so that the data table can be logically joined to the Lookup tables.

The ideal shape of Data tables is to have very few columns but to have as many rows as needed to bring in all the data records. These Data tables normally have lots of rows (sometimes in the 10s or even 100s of millions).
L2WD banner ad

Why you should layout your Star Schema using the Collie Methodology

There are a few reasons you should visually layout your data using the Collie Methodology.

  1. Firstly it gives you a visual clue to which ones are the “lookup” tables.  They are the ones above that you have to “lookup” to see them.
  2. Also note that in the old world of traditional Excel, you would have probably written VLOOKUP formulae to bring the data from these Lookup tables into your single data table prior to creating a pivot table.
  3. Finally, filters only propagate in 1 direction in Power Pivot (in Excel version 2010/2013) and that is from the 1 side of the relationship to the many side of the relationship.  The Lookup tables are on the 1 side, so that means the filters always flow “down hill”.  They do not/can not flow “up hill” automatically (but note this can be done with advanced DAX).  So this layout gives you a visual clue which direction the filters will flow.

Naming Conventions

It is good practice to use simple English names for your tables and columns.  Avoid using abbreviations and underscores wherever possible.

example:  Instead of a table called Pricing_History_Sum_Tbl, simply call it Pricing.  It is much easier to read and write your DAX if you keep it simple.

Other best practices include:

  • Use English business names
  • No spaces in Table Names (use CamelCase if necessary)
  • Spaces are fine in Column Names, because you always refer to these in [square brackets], so it doesn’t matter.
  • keep the names of your tables and columns short and to the point.

Measures on the other hand should always use Spaces not CamelCaseAsTheyAreHarderToReadThatWay.  Also avoid abbreviations unless they are common in your business setting.  Some examples of good measure names are

[Total Sales]

[Total Sales YTD]

[Total Sales All Products]

A final word on Shaping

While the optimum shape is a Star Schema as show above, that doesn’t mean this is the only way it will work. It will also work with a Snowflake Schema too (where there are secondary lookup tables joined to the primary lookup tables), but the point is that these extra relationships required to make the “snowflake” can come at a cost of worse performance and possibly also confusion to the user (if they access the field list for Pivot Tables).

There is no 1 correct way to shape your data, but what you need to remember is

  1. There are 2 types of tables
    1. Data tables that contain your actual data that you want to analyse.
    2. Lookup tables that contain metadata about the objects you are going to analyse, like the name, address, suburb of each customer.
  2. The optimal way to shape your data is a Star Schema as shown above, but this is not the only way it will work.
August 14, 2015 7:59 am

3 Comments

  • Hi Matt,
    excellent post, very much looking forward your DAX-book.

    Although modelling this style, I’ve often seen people dragging the date-key from the data table instead of from the date-(lookup-) table which leads to wrong results in some cases. So hiding these columns from the client tools makes sense in my eyes.

    Meanwhile I pushed this a little further: Putting my measures into dedicates measure tables, I’m now able to hide my complete data tables from client tools, which is also a good check if the model is complete.

    1. I agree with you on the date thing. If I share my workbooks with power users (ie they use the PivotTable Field List) then I also hide the other date columns (and anything else that can get them into trouble). But as a rule I try to distribute reports where the users do not need to interact with the PivotTable Field List.

      I have also used the Measures table in the past and find this quite good. It is interesting that when writing Cube Formulas, the Intellisense automatically adds the Measures table regardless. To me this looks like another of those things that Microsoft just needs to polish up a bit.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

*

Bring your data to life - Let's discuss how Matt can help!