Keeping Track of Changes on a Website with Power BI

Level: Intermediate

A couple of weeks ago I posted a list of my ideas that I have created at ideas.powerbi.com and asked you to vote for them if you agree with the concept (please go back and take a look if you missed it).  I thought it would be interesting to keep track of the change in the number of votes for each of my ideas.  But then I remembered that there is an issue doing this.  Both Power Pivot and Power BI are reporting database tools, they are not data storage tools (like Access, SQL Server etc).  These reporting tools are not designed for permanent storage of your data.  Each time you refresh your dataset in Power BI, the previously loaded data is deleted and the updated data is then loaded.  The implication is there is no way to build a report trending the results of subsequent refresh events over time, particulary when you are grabbing the data from the web.

Solution: Create Some Permanent Storage

The way I decided to solve this problem was as follows:

  1. Use Power Query for Excel to extract the current number of votes from the web page and store this in an Excel table in an Excel Workbook.
  2. Automate the refresh of this Excel workbook using Power Update and have Power Update create a time stamped copy of the file that contains the data for that day only, saved on disk.
  3. Run the refresh every day (automatically using Power Update).
  4. Combine and load the daily versions of the data from the various Excel Workbooks into Power BI for reporting and analysis.

I was pretty busy until after the Microsoft Business Applications Summit wrapped up a few weeks ago, so I was 1 day late getting my first extract of the numbers.  So the starting numbers in this post are the results 1 day after I released the original blog article.

Here are the steps I used to build my reporting tool.  Each is explained in detail in the rest of this article.

  • Extract the list of URLs from my blog using Ivan’s method explained here.
  • Extract the current votes, also using Ivan’s method.
  • Schedule the refresh in Power Update

Extract List of URLs

I connected from Power Query in Excel to my blog web page, then went back and changed “Open File As” to be Text file, just as Ivan explained a few weeks ago.

image

The next step was to find the URLs to my ideas.  First I used the menu to Filter Rows\Contains\ with “<a href” as the search term.  This was wayyyy too broad as it caught all the other hyperlinks on the page.  But a quick glace at the list of hyperlinks I quickly realised that searching for Contains “ideas.powerbi.com/” would do the job.

image

The reason I decided to extract these from URLs from my original blog post was 2 fold.  Firstly It meant I didn’t need to go an create another list of URLs.  The second reason is that if/when I update the webpage, the links will auto update as well.

Extract the Current Votes

The next thing I did was connect to one of my ideas using a new source in Power Query.  I went to the web page and noted how many votes there were on the website.  As you can see below, there were 869 votes at this time for my SWITCH formatting idea.

image

Then, using Ivan’s tips from a few weeks ago, I converted my new query to read the webpage as text, and then filtered on Filter Rows\Contains “869”.  Bingo, I found the line of code that contained this number.

image

But I can’t use this search term in my final query because (hopefully) this number will change over time.  But I noticed the <strong> tags so I changed the search string to be “<strong>”.  After I made this change, lucky for me there was only 1 row returned.  I then used Columns from Example to extract the 869 into a new votes column before removing the original column containing the html tags.

Create a Function

Now I had a way of extracting the vote count for a single idea, I needed to change this query into a function, so I could reuse this code for each idea I wanted to track.  Creating a function manually using M code is really simple.

I went into the Advanced Editor, added the line of code shown as 1 below.  This line of code simple says “create a function that takes a single parameter as the input and pass it to the following code”.

image

I then created a duplicate of the original line of code (shown as 2 above), pasted the duplicate line and replaced the hard coded URL with the parameter “idea” (3 above).  Note also that I have kept and commented out the original source line of code so it is there if I ever need to “undo” the creation of the function.

Once this change was done, the query became a function that I could reuse inside my first query – the one that has the list of all the URLs.

image

I went back to the Ideas query, selected Add Column (1 below), Invoke Custom Function (2 below) and selected the Votes function.  This then returned me a new column shown as 3 below.  From there I could expand and extract the current votes for every idea.

image

I then returned my final query to an Excel table and saved.

image

Schedule Daily Refresh with Power Update

Power Update is a very useful scheduling tool that allows you to set periodic updates of your Excel and Power BI Workbooks on a local PC (or server) and have them automatically updated from there without your intervention.  You can check out Power Update here and even download a free trial version if you like.  I went through the process of scheduling the daily refresh of the Excel workbook.  When doing this, there is an option to add the date and time stamp of the refresh into file name and save as a copy, as you can see below.

image

This has the effect of giving me a new file every day that contains all of the historical daily votes at the time the refresh was executed.

image

After the daily run, a new file for each day is created with the date and time of the refresh appended into the file name.

image

Build a Report

Now I had the extracts and permanent storage of the daily results, it was just a matter of building a Power BI report over the top.  Here are the results.

Configure the Gateway and Schedule Refresh

The last thing I needed to do was configure the gateway so it could access the Excel files and automatically refresh every day.  This is pretty easy to do and I covered this in an earlier blog.  Actually the install process has changed since I last blogged about this.  There is only 1 install file for both Personal and Enterprise Gateways.  The idea is you install the enterprise gateway if you have multiple people using it, and the personal gateway if you are only connecting to your personal PC.  But there is nothing to fear from either gateway – just give it a go.  Once installed, you have to add a New Data Source and configure it for your files.  Then you are good to go.

Share?

Comments

  1. Spooky. Swear to god I was just refreshing a PBI dashboard that I built to scrape data from the Excel Uservoice site, and I was saying to my colleague “It would be real cool to record each month’s votes total, so I could do a Bumps Chart showing how the rankings change week to week”, and then in comes your email.

    Are you eavesdropping on my workplace, Matt? 😉

Leave a Reply

Your email address will not be published. Required fields are marked *

Ten Things I Wish I Knew When I Started With Power Pivot.   Learn  Without The Pain!

x