I was reading a blog by David Eldersveld last week where he was talking about his entry in the Microsoft Data Insights Summit Data Analytics Challenge. I originally was too busy to enter however I was inspired by his article and thought I should prepare an entry albeit with only a few days to go. I went to the rules of the competition here and decided to enter the complex data preparation category, partly because David said it did his head in, and partly because I have been spending so much time recently developing my Power Query Online training that I figured I would be ripe for this one. I managed to find a few hours to build out an entry (although not as many as I would have liked or was needed). But this was a great lesson for me that I can share now with everyone – never under estimate the amount of effort to convert bad data into good data. My experience was actually pretty consistent with all but the “most ready” projects I have done (with the best quality data), that is that 80+% of the effort is in the data preparation.
The Source Data
I am not going to spend a lot of time talking about the source data in this article as I will do that next week in some detail, but to be sure it was UGLY. There were nested rows, different spellings for the same thing, subtotals nested with detailed data etc.
More on that in my next post.
A Video Overview of the Report
One of the requirements of the competition was to produce a 3 minute video about the report. I first created a longer video (about 6 mins) that discussed all the tabs I produced which I posted on my YouTube Channel – you can watch this longer version of the video below. I had to trim that pretty brutally to fit into the official requirement of a 3 minute video.
The Final Report
I have posted an interactive version of the final report using Power BI Public Sharing). Total effort to produce this report was about 7 hours, and most of that was with the data cleansing and loading (and reloading and further reloading when I found issues).
Here is the link to launch fill screen
Most of the work for my submission was using Power Query. My next post I will provide a copy of the workbook I produced and a video explaining how I set about the more challenging data cleansing and transformation processes using Power Query.