Today Microsoft released the October 2018 update of Power BI Desktop. One of the fabulous new features is the ability to get a visual profile of your data in Power Query BEFORE you load the data. This feature is typical of the development effort Microsoft is investing in its products: Microsoft is trying to make your life easier by giving you the tools you need to do the job, without you having to build everything yourself from scratch.
What is Data Profiling?
Data profiling is a process that data scientists use when wrangling data. In short, it is a bit like taking a step back and looking at the overall shape and structure of the data, kinda like looking at the forest rather than the trees. Let me show you how it is done.
Turn on the Data Profiling Preview Feature
Data Profiling is (as of October 2018) a preview feature. Turn it on in the options settings, under File > Options and settings > Options > Preview features.
You will have to restart Power BI for the changes to come into effect.
Load and Profile Some Data
I am loading data from Adventure Works. I have 5 tables that make up the data model. When I load the data using Power BI, I select Edit so that it launches Power Query.
As a side note, it is very pleasing indeed to see Microsoft increasingly referring to the brand “Power Query” in the UI and conversations around the traps. Personally I believe that Power Query can be as strong a brand name as R or Python, if only Microsoft will allow the brand for the technology to have a life of its own even though it lives inside of products like Power BI and Excel.
The first data profiling feature is very subtle. Note the green bar in the image below. If a column contains errors or empty values, the corresponding share of this bar is shown in a different colour: red for errors, grey for empty, green for OK.
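To make the red, grey and green split concrete, here is a minimal Python sketch of the same idea. It is not how Power Query computes the bar; it is just an illustration. The `CellError` class and the `column_quality` function are hypothetical names I made up to stand in for a cell that failed to convert and for the calculation itself.

```python
from collections import Counter


class CellError:
    """Hypothetical stand-in for a cell that failed to convert (an 'Error' cell)."""


def column_quality(values):
    """Return the percentage of valid, error and empty cells in one column."""
    total = len(values)
    counts = Counter()
    for value in values:
        if isinstance(value, CellError):
            counts["error"] += 1   # would be the red part of the bar
        elif value is None or value == "":
            counts["empty"] += 1   # would be the grey part of the bar
        else:
            counts["valid"] += 1   # would be the green part of the bar
    return {
        key: (100.0 * counts[key] / total if total else 0.0)
        for key in ("valid", "error", "empty")
    }


# Ten sample cells: seven good values, two empty cells and one error.
sample = [10, 20, None, CellError(), 30, "", 40, 50, 60, 70]
quality = column_quality(sample)
print(quality)
```

With this sample column, seven of the ten cells are valid, two are empty and one is an error, so the bar would be mostly green with a little grey and a sliver of red.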
Next, I went to the View menu (1) and turned on Column Distribution (2) and Column Quality (3) as shown below. Then something wonderful happens: look at the data profiling information shown in (4) below.
The data profiling tools look at the first 1,000 rows of the preview data and show you the big picture of what the data “looks” like.
Currently the profiling tool only works on the top 1,000 rows of data. It also takes some time to prepare the profile of the columns (as could be expected). However, the benefits of getting this stuff right before moving on far outweigh the slower load times (IMO). I would love to see an option to profile the entire set of data for one or more columns. I am sure this will come.
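Here is a rough Python sketch of why the 1,000 row limit matters. Again, this is only an illustration and not Power Query's implementation: `column_distribution` and `PREVIEW_ROWS` are names I invented, and the sample column is made-up data. I am assuming "distinct" means the number of different values and "unique" means the values that appear exactly once.

```python
from collections import Counter
from itertools import islice

PREVIEW_ROWS = 1000  # assumed preview size, matching the limit described above


def column_distribution(values, limit=PREVIEW_ROWS):
    """Profile one column, looking only at the first `limit` rows.

    Pass limit=None to profile every row.
    """
    preview = list(islice(values, limit))
    counts = Counter(preview)
    return {
        "rows_profiled": len(preview),
        "distinct": len(counts),                                 # different values seen
        "unique": sum(1 for n in counts.values() if n == 1),     # values seen exactly once
    }


# Made-up column: "Green" only shows up after the first 1,000 rows.
colours = ["Red"] * 600 + ["Blue"] * 399 + ["Silver"] + ["Green"] * 4000

profile = column_distribution(colours)             # first 1,000 rows only
full = column_distribution(colours, limit=None)    # the entire column

print(profile)
print(full)
```

In this example the preview profile never sees "Green" at all, while the full profile does. That is exactly the kind of surprise an option to profile the entire data set would catch.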
What do you think of this new feature? I love it.