My MDIS Data Analytics Challenge Entry

I was reading a blog by David Eldersveld last week where he was talking about his entry in the Microsoft Data Insights Summit Data Analytics Challenge.  I originally was too busy to enter however I was inspired by his article and thought I should prepare an entry albeit with only a few days to go.  I went to the rules of the competition here and decided to enter the complex data preparation category, partly because David said it did his head in, and partly because I have been spending so much time recently developing my Power Query Online training that I figured I would be ripe for this one.  I managed to find a few hours to build out an entry (although not as many as I would have liked or was needed). But this was a great lesson for me that I can share now with everyone – never under estimate the amount of effort to convert bad data into good data.  My experience was actually pretty consistent with all but the “most ready” projects I have done (with the best quality data), that is that 80+% of the effort is in the data preparation.

The Source Data

I am not going to spend a lot of time talking about the source data in this article as I will do that next week in some detail, but to be sure it was UGLY.  There were nested rows, different spellings for the same thing, subtotals nested with detailed data etc.

image

More on that in my next post.

A Video Overview of the Report

One of the requirements of the competition was to produce a 3 minute video about the report. I first created a longer video (about 6 mins) that discussed all the tabs I produced which I posted on my YouTube Channel – you can watch this longer version of the video below.  I had to trim that pretty brutally to fit into the official requirement of a 3 minute video.

The Final Report

I have posted an interactive version of the final report using Power BI Public Sharing).  Total effort to produce this report was about 7 hours, and most of that was with the data cleansing and loading (and reloading and further reloading when I found issues).

Here is the link to launch fill screen

Power Query

Most of the work for my submission was using Power Query.  My next post I will provide a copy of the workbook I produced and a video explaining how I set about the more challenging data cleansing and transformation processes using Power Query.

Share?

Comments

  1. Matt, it is an Excellent contribution. If I could vote, definitely I’ll vote for your solution! I know what this dataset is, and how hard it was to shape all that mess into such a nice report. May be there a little flaws, but with that source that is nothing to note, and you very accurately avoid most of pitfalls. Great job!

    I made at least two big mistakes with this challenge, first of all I buried too deep in data and tried to mashup all of it with all granularities etc. Fighting with a source I’ve totally forgot about report itself, missing deadline. That led to a really weird data model and then to a slow, raw and heavy report.

    • Thanks Maxim. I had to cut my losses on the last data source as I was running out of time. When I started I was thinking I should communicate “how I did it” and that was the judging criteria. But I learnt through my uni days to read and re-read the question through out to make sure you were answering the question. I think I also made a mistake as I didn’t spend enough time demonstrating the insights. Instead I demonstrated the tools to “find the insights”. I am not sure how that will be judged, but we will see. It was fun anyway and gives me a demo dataset that is more interesting than Adventure Works that I can use for other purposes.

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x