Extract Multiple Pages from Web

Every now and then when I have a Power BI project of interest to me, I like to create a video of the end to end process of building a new report. This allows me to share some “warts and all” real-world examples of how to go about building a Power BI report. It gives me a chance to show some concepts (such as creating functions and extracting multiple pages from websites) but also to show that these things are seldom smooth and error free. In the video I am sharing today, I built a new Covid-19 report showing case rates in Australia. I wanted to achieve a report that allowed me to

  1. Extract data from an online data source
  2. Have it combine data from multiple web pages into a single report
  3. Automate the Power BI Reporting process to refresh the report each day.

Watch the video to see how I achieved 1 and 2 above.

Can’t Schedule Web.BrowserContents to Refresh in Power BI

When it came time to set up the scheduled refresh in Power BI (3 above), I struck a problem. I was not able to schedule auto refresh due to the error “can not refresh due to hand authored queries”, or something similar to that. I asked a few friends what the issue could be, and Tristan Malherbe (also an MVP) helped me out. He directed me to this blog article by Chris Webb.

The issue was with the first step (auto generated by Power Query).

=Web.BrowserContents(“https://covidlive.com.au/report/daily-cases/” & StateParameter)

It turns out that this query step cannot be automatically refreshed – which is very annoying given it is the default for “get data, from Web”. After reading Chris’s blog above, I changed the first step to be as follows

= Web.Contents(“https://covidlive.com.au/report/daily-cases/”, [RelativePath=StateParameter])

With this change in place, it all worked perfectly with Scheduled Refresh.

Future Videos

Over the next few weeks I will share a couple of follow up videos where I continue to develop this report to help me visualise what is happening across and between the states, so keep an eye out for those.

4 thoughts on “Extract Multiple Pages from Web”

  1. It appears there might be a slight misunderstanding in the request, as “Extract Multiple Pages from Web” isn’t a standard, universally recognized software feature or a direct technical term in the way “enable developer ribbon” is. It’s more of a high-level goal.

  2. Extracting multiple pages from the web” (often referred to as web scraping or web crawling) is a common and powerful technique for collecting large volumes of data from websites for various purposes, such as:

    Market Research: Gathering product prices, competitor information, or industry trends.
    Content Aggregation: Collecting articles, news, or blog posts from various sources.
    Lead Generation: Extracting contact information from directories.
    Academic Research: Collecting data for linguistic analysis, social science studies, etc.
    Data Analysis: Building datasets for machine learning models or business intelligence

Leave a Comment

Your email address will not be published. Required fields are marked *