Level: Intermediate
There was an interesting question this week on http://powerpivotforum.com.au asking if there was a smarter way to user Power Query over multiple identical web pages to scrape the data in a single query. I have been meaning to blog about this for a while, so it is a great opportunity for a mini-Friday post.
The process of typing all the steps on how to do this is a lot longer than simply recording a video. So instead of a long post, I have embedded a video below from my YouTube channel.
First the Problem
Take a look at this web page http://www.calendar-australia.com/holidays/2016/
It contains all the public holidays for the calendar year 2016. But there are 2 similar web pages with the same structure but different results
http://www.calendar-australia.com/holidays/2017/
http://www.calendar-australia.com/holidays/2018/
How to do it in Power Query
I’m sure I first learnt this technique watching Miguel Llopis from Microsoft, probably at PASS SQL BA Conference in 2015. The technique can be reused in many different scenarios like this one. The trick is virtually identical to the technique I showed in this post about combining multiple Excel workbooks into a single query. The solution works equally well in Power Query for Excel as well as in Power BI (Get and Transform). The steps are
- Build a query for one of the sets of data
- Turn the query into a function using the simple parameter step
- Modify the query to use the parameter instead of the hard coded set of data
- Create a table with a column of the parameters you want to use (in this case the data must be text, not numbers).
- Add a new column that takes the text from the first column and processes the function for each row.
- Expand the column into a single table.
Here is a video showing how easy it is to do.
Seems there is a special birthday this weekend. Should be fun!
If you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/
Hi Matt. Thanks for sharing . I was looking something like that so many … weeks. Greets from Argentina
Great Post!!!
can you please upload the excel file?
how did you extract the year from the website? as it not mention but only the day and month?
thanks.
Sorry, It seems I didn’t save the file. The year is passed in the function at the 45 second mark in the video
Hi Nir,
Here is a link to my file that I used https://1drv.ms/x/s!Ah47YP4hylGXgoN0fo2h8fvi7UcNow
Hi Matt,
Thanks for sharing. This use case is way too popular to not have a good solution. But the interesting thing for me is the statement: “The solution works equally well in Power Query for Excel as well as in Power BI (Get and Transform)”. I predominantly use Excel PQ/PP, 365 proplus with 2016 apps. Should I be concerned once I port to PBID? Many thanks.
No. You should not be concerned. Generally Power BI is more advanced in capabilities than Power Query. You should expect migration to be painless and features to be better.
Great! But what about when the data appears over multiple pages? For example: https://secure.tiktok.biz/results/list/balmainfunrun/2015/10KM has got 534 records, but only 100 show per page and the URL is the same for each page. Does anyone know if there is there a faster way of pulling out all the records in one query, instead of having to create a separate query for each page?
You don’t need one query per page. You need 1 query to turn into a function, and a second query to pass the page details to the function. With these 2 queries, you can extract any number of pages into a single table.
Really cool! Also is a great confidence builder for creating custom functions!
That’s funny I was looking for a solution like this this morning. I will try your suggestion.
Thanks
Patrick
Very helpful post, I had some VBA code that I was using to get Store details from Websites, Now I have the store details directly in PQ. Thank you for the post.