Last week I held a free Power BI webinar for people starting out with Power BI who are wondering why they should invest time in learning about DAX. I had more than 700 people register and was completely overwhelmed by the response. There were a few questions that I didn’t get to on the call and I wanted to circle back and answer them here. I have taken the open questions, reworded them slightly to appeal to a broader audience and answered them below. Hopefully there will be a few questions that you have wondered about too, and the answers will be of interest to you.
Is Data Acquisition the same as ETL?
When I describe the end to end process of converting raw data into value adding reporting assets, I talk about there being 4 logical phases. The phases are
Data acquisition is the phase performed by Power Query, and yes, this step is often referred to as ETL (Extract, Transform and Load).
Why Write Formulas in Calculate Columns and Not Power Query?
(And a sub/related question – when would a column in power query be better?)
In the webinar I wrote the following Calculated Column as a “test” column.
Margin Column = Sales[ExtendedAmount] – Sales[Cost]
This was for demo purposes only – I would never write such a column in my Power BI workbooks. The question of when and how to add columns is huge. As a general rule, I suggest you follow these options (in order of priority).
- Aim to use a measure instead of a column where practical. Read my article about measures vs columns here for more about the reasons why.
- Use Power Query to add required columns – this way all the transformations on the table are kept in one place
- Use a Calculated Column.
But these are just general guidelines and, as always, there are exceptions to the rule. Whether to use Power Query to add a column or use a calculated column really depends on the circumstances. As mentioned above, I generally prefer to add any genuinely needed columns in Power Query as that keeps all the transformations in one place. One exception is when I want to use a measure as part of the calculation to create the column.
E.g. if I wanted a column to specify whether a customer is “large” or “small”. I would want/need such a column if I wanted to add a slicer to a page to filter on large or small customers. To create the data, I might write the following formulas as a measure and then a calculated column.
Total Sales Measure = SUM(Sales[ExtendedAmount])
Customer Size Column in Customer Table = IF([Total Sales] >= 2000, "Large", "Small")
This is a “good” column because
- It has only 2 possible values (low cardinality)
- It is stored in the customers table, and generally extra columns in a lookup table are just fine.
- I want to use a slicer, so a measure won’t do the job.
But the column uses the [Total Sales] measure to work out which customers are large and small, and for this reason I prefer to write this as a calculated column than do it in Power Query. If I do it in Power Query, I would have to re-aggregate the total sales by customer in Power Query too – something that would be an unnecessary overhead.
Why you write SUM(), why not write SUMX()?
Another good question. DAX was developed with Excel users’ needs front of mind. The DAX language can actually be quite complex and so where possible the developers have used “syntax sugar” to make the language easier to learn for Excel users. You can read more about syntax sugar here.
Syntax sugar creates and easier to understand version of a harder formula. For example, the following 2 formulas are identical in their execution.
Total Sales SUM = SUM(Sales[ExtendedAmount])
Total Sales SUMX = SUMX(Sales, Sales[ExtendedAmount])
Now imagine you are just starting out with the DAX language and you want to add up total sales. Which one of the formulas above would you feel most comfortable writing? I am sure you agree the first one is a lot more intuitive to Excel users. There are lots of questions surrounding the second formula, such as “why am I specifying the table name twice”? Indeed, why? It is a good question, but beyond the scope for beginners. The decision whether to use SUM versus SUMX is covered in more detail here. It is really up to you. I learnt to use SUM() first, so I just default to using SUM(). But there is no reason why you can’t substitute out the SUMX() version if you prefer.
Is DIVIDE() the equivalent of IFERROR in Excel?
In short, yes. When dividing 2 numbers in Excel, you need to plan for potential errors in the case that the denominator is equal to zero. There are lots of ways to do this including using IFERROR(). In DAX there is a DIVIDE() function. DIVIDE() is the “safe divide” function that traps for the divide by zero error automatically for you. This function should be added to Excel
When should you use bidirectional cross filtering with relationships?
This is another very good question with a very long answer. The short answer is “don’t turn on bidirectional cross filtering on your relationships unless you have to”. All table filtering events come at a cost. When Power BI filters a table, it “costs” processing power, time and effort. Power BI is designed to do this – that’s what makes it great. But don’t filter things you don’t need to. That then poses the bigger question “when do you need to use bidirectional cross filtering”? Well, there are very few use cases remaining now that Power BI supports many to many physical relationships (another big topic). The obvious exception is when you want to filter a lookup table based on a column in a data table.
Here is an example. Assume you have the following setup.
Now what if I wanted a slicer on Tickets[Ticket Number] and I want to see in a table who from the support team has been assigned, and what department are they in. I have set that up below.
The table on the right has data from the SupportTeam table, but the slicer on the left is on the Tickets table. In this case, the tickets table doesn’t filter the SupportTeam table. Using a bidirectional cross filter here would fix this.
On average how much time will a person need to invest to learn DAX well?
It depends on the individual and the learning approach you take. As many readers of my blog already know, I offer live Power BI training and online Power BI Training. I have been training people for many years and I think I am pretty good at it (my students tell me so). If you learn from me, I think you can lay the foundations you need with about 25 hours focused work. Thereafter, you need to get practice (3-6 months doing stuff), ideally then followed by some more refining of your knowledge through some more structured training. After you have experience, many of the concepts become more “teachable” than they are when you are brand new to DAX. After that, it is a lifelong learning journey (I am still learning, and this is all I do).
If you teach yourself then you ‘can’ succeed, but I recommend you get some good structured learning framework to follow (my book for example). I see lots of self-taught people that have not used a good learning framework, and they literally have burnt many weeks/months of effort because they simply didn’t understand what they needed to know. My best advice is at least read a book. Read the reviews of my book on Amazon if you want recommendations from others about what they thought of my book, such as the ones below.
When should I filter in SQL Server vs in Power BI?
Another good, broad question that is not easy to answer in a few sentences. Here are a couple of principles.
- Store the data in tables in SQL
- Create views in SQL of the tables and summarise the data you need for Power BI.
- Flatten out snowflake schemas into star schemas
- Remove columns you don’t need
- If the table has “lots of rows”, I don’t see any issue. If you need the rows, then load them. What you should be more concerned about is “lots of columns”. Columns are your enemy, particularly in large data tables.
- Load all the data you need, and nothing you don’t need.
Can you suggest best way to arrange date / month
I don’t think there is one best way. I suggest you read my article about calendar tables here. https://exceleratorbi.com.au/power-pivot-calendar-tables/
Is a good practice to join multiple tables, sales, gross profit, expenses etc into a single table?
It depends. The tables should contain data of like structure, shape and context. I don’t see any reason why you shouldn’t add these 3 items into the same table. It might look something like this
It is when the overall shape, structure and context of the data is different that you should think twice. E.g. in the above example, if Expense doesn’t have an Item code but instead has an “account code”, then that would be a clue that they don’t belong in the same table. Also, if you had Stock on Hand, then this data is very different to the above, so I would not combine those.