Pareto Analysis is a statistical technique that applies the Pareto Principle to data. This is more commonly known as the 80:20 Rule. The Pareto Principle is based on the presumption that a relatively small number of inputs (20%) have most impact on the results/output (80%). The 80:20 rule can be applied to a wide variety of data in most businesses. Examples include:
- Which 20% of products make up 80% of sales
- Which 20% of customers make up 80% of profit.
- just to name a few
Although it is called the 80:20 Rule, the results normally do not come out exactly in the ratio of 80:20. Anything close to this ratio is considered normal. Armed with the results, you can focus business efforts on the 20% of the things that impact 80% of the results.
The Pareto Principle can be illustrated on a Pareto chart. In Power BI you can use a combo line and column chart. The column in the chart represents the individual inputs, while the line chart represents the cumulative percentage of the inputs as they build towards 80%.
In this article, I will use the AdventureWorks database (as usual) to show that just 12% of the products (in this case) contribute to 80%of the total margin for the business.
Measures Used for A Pareto Chart
To start with I have created 3 measures in the Sales table.
Total Cost = SUM(Sales[TotalProductCost]) Total Sales = SUM(Sales[ExtendedAmount]) Total Margin = [Total Sales] – [Total Cost]
Next I needed to write the measure that would calculate the cumulative % of sales ranked from the best product to the worst. This is a tricky formula, but I explain how it works below.
To understand how this formula works, I will need to refer to the results table shown below showing the product codes, Total Margin in $ for each product, and the Cumulative % of margin. Note that the table is sorted from highest margin $ product to the lowest margin $ in descending order.
The first row in the table below is product 312. It has a margin of approximately $474,151 which makes up 4% of the total margin for all products. Each row in the table below is calculated completely independently of each other for both the Total Margin and Cumulative % measures.
Now, let me explain how a single row is calculated (the row highlighted in red below – ProductKey 363).
The highlighted row in the table below is filtered for ProductKey = 363. This is what the row section of a table does – it filters the data before the measure is calculated.
Line 2 in the measure above is a variable. Variables are defined first as part of a measure before any other execution is completed. So the variable thisProductMargin is assigned a value of $437,021 in this case.
Line 4 in the measure above is pretty straight forward. It uses CALCULATE to remove the filter on ProductKey = 363 so it can calculate the total margin for all products. The ALL function removes all filters on products.
The secret sauce of the formula is line 10.
- Filter is an iterator and hence it has a row context (it can repeatedly evaluate a formula, one row at a time, as it steps through the specified table). In this case it is going to step through the products table.
- But there is a filter on ProductKey = 363, so FILTER needs to step through ALL(Products) instead of Products. ALL removes the filter applied by the visual before FILTER starts it’s iteration process. If I didn’t use ALL here, then there would be only 1 row in the table. ProductKey = 363 due to the filter coming from the visual.
- Then FILTER steps through the unfiltered copy of the products table and checks each product, one at a time, to see which individual products have a margin greater than the current product. The current product is ProductKey = 363.
- If FILTER comes to a product that has a margin greater than Product 363, that product is kept – in other words that product forms part of the cumulative build of all products that make up the cumulative total.
- FILTER continues to check every single product against this criteria.
- At the end of the filter process, only products with a total margin greater than or equal to the product being compared (product 363 in this case) are kept.
- Then the total margin for all those products is calculated.
- Finally this total is divided by the total for all products.
The process then repeats for every single row in the visual, over and over again.
Creating a Pareto Chart in Power BI
As I mentioned above we can use the line and clustered column chart visual to display the Pareto chart. To do that:
- Add a Line and clustered column chart visual to the Report canvas.
- Put ProductKey from the Products table on Shared axis.
- Put Cumulative % from the Products table on Line values.
- Place the [Total Margin] measure on Column values.
- Sort the chart based on Total Margin, Sort Descending.
This will sort the columns in the chart in descending order as you can see above.
I then went ahead and wrote a measure to count the number of products that make up the 80% of margin. That measure is as follows.
Count of Products Making 80% of Margin = CALCULATE ( DISTINCTCOUNT ( Sales[ProductKey] ), FILTER ( Products, [Cumulative %] <= 0.8 ) )
Here is my Pareto Analysis workbook if you would like to take a look.