Power Pivot, Power Query and Power BI are 3 products that are closely related to each other and were all built for the same purpose – enabling Self Service Business Intelligence. I first learnt to use Power Pivot for Excel, then Power Query for Excel, and finally Power BI. But there is a new wave of users that have never heard of Power Pivot and Power Query, and their first exposure to the underlying technologies is with Power BI. This can be confusing and hence I explain what you need to know below.
Power BI Can be Confusing to First Timers
I realised recently that people who have never used Power Pivot and Power Query could easily be confused by the different options to do things in Power BI. There are lots of reasons to be confused, including:
- There is no reference to the term “Power Query” inside the Power BI tools. Microsoft stopped using this name in favour of terms like “Get Data”, and “Edit Queries”. I think this was a bad decision and it has fuelled the confusion. When I tell people to do something in Power Query, they therefore don’t know what I am talking about.
- There is no reference to the term “Power Pivot” inside Power BI either. The most relevant proxy for Power Pivot inside Power BI is the “Modelling” menu. Use of the term Power Pivot therefore doesn’t make any sense to “Power BI only users” either.
Shaping and Modelling Your Data
The implications of the above is that there is no clear way to delineate (with language/labels) between the process of loading the data and the process of modelling the data. I like the term “Shaping” as a replacement for Power Query/Get Data and “Modelling” as a replacement for Power Pivot.
Definition of Shaping
Shaping is the process of extracting your data from the source, and transforming the data so that you can load it into Power BI (often referred to as ETL). The reason I like the word Shaping is that it clearly describes what is normally needed as the first step in any BI project. The shape of your data refers to
- What data you extract
- How your data is loaded into one or more tables, and which tables you import
- What names you give to the columns and tables
- Whether your tables are wide and short (many columns), or long and narrow (fewer columns)
For example data tables (also called fact tables) are normally best shaped into long, narrow tables (lots of rows, and a minimum number of columns), whereas lookup (dimension) tables can be as wide as needed to for the task.
All tables should only be loaded with data that you actually need. One big difference between Power Pivot/Power Query and Excel, is it is easy to go back and get the data you missed later.
Definition of Modelling
Modelling is the process of building the relationships between your tables and converting the business requirements into measures and calculated columns using the DAX language. The measures contain
- The business logic that transforms your raw data to a usable, value adding calculation
- The formatting required for the numbers (currency, percentage)
- A good business name for the measure that accurately describes the measure
Prior to Self Service Business Intelligence, data modelling was always done by expert IT professionals using a complex language like MDX. The complexity of such languages put it out of reach of users, as it was simply too hard to learn casually.
2 Ways to Add A Column – Wait What?
A perfect example of the type of confusion Power BI first timers face is that there are 2 ways to add columns to your data. You can add a column during the shaping process, and you can add a column during the modelling process.
So which should you use? Well it depends on what you are trying to do. My general advice on adding columns to your tables is to make the change as close to the source as possible. So in order of preference:
- Add the column in your source data (eg your Data Warehouse) with the help of your IT friends if necessary
- Add the column during the shaping process
- Add the column during the modelling process
If you add the column at the source, then you have access to it for your current BI needs but also for future BI needs. Adding the column at the source provides the most reuse, and if you need to make a change down the track, it will be the least maintenance. But sometimes you can’t do that for various reasons, or you can’t do it quickly.
If you add the column during shaping, it will be loaded and compressed with all other columns during data load. This normally means better overall table compression and hence a good outcome for the overall model.
The most common reason you would add a column during the modelling process is if you need to leverage other parts of the model to create the data. An example would be if you wanted to use a measure (eg Total Sales) in your data model to create the new column in the model (eg Banded groups of Customers based on High, Med, Low annual sales). But don’t fall into the trap of using too many calculated columns instead of measures. I rant on about that here. Use them when they are needed, otherwise avoid them.
Hopefully this explanation of the role of shaping and modelling in Power BI has helped clarify some things for you. Don’t forget to check out the other articles in my knowledge base here.
September 06, 2016 7:14 am