Using Power Query for Extracting Non-Tabular Data from Web Pages

Matt Here:  I first met Ivan via Twitter when he was living in Russia, and then in person when he moved to Australia to live in Sydney with his family. Ivan is a talented Power BI professional and I have learnt many things from him over the last few years.  Today Ivan is writing a guest blog and he will share with you how he used Power Query to help him with his move to Australia. If you like what you read, be sure to subscribe to Ivan’s blog (link at the bottom of the page).  I will hand over to Ivan.

Introduction

Ivan: I recently moved to Australia with my family. While searching for a place to live I had to look for a good pair of suburb and school.  I found interesting data at https://www.myschool.edu.au/. However, there is no option to compare schools. It just gives some statistics about Australian schools, one page at a time. The information presented on a typical school profile page (e.g. Burwood Public School) is as shown below.

It means that in worst case, we can manually copy / paste data to Excel and perform own analysis. Of course, this can be done only for limited number of schools. Even for twenty schools it would take a bit of time.  However, nowadays, when we have such tools as Power Query (Get & Transform) in Excel, we can collect data from web sites efficiently, and easily “refreshable”.  Everything that I describe and share in this post is created for non-commercial educational purpose only.

Getting Data of One School

Let’s take Burwood school as an example. Copy URL from browser

https://www.myschool.edu.au/SchoolProfile/Index/104736/BurwoodPublicSchool/41369.

And query data from this URL in Excel.

Note: I usually do all queries in Excel. If needed, I copy/paste them to Power BI.

Power Query automatically detects all tables available on web page, and presents options so we can choose the one we are interested in. When we need more than one table we need to create another query.

For this post, let’s say we are interested in parameter “Language background other than English”. But in this case, Power Query detects only one table on the school’s page, not the data I want!

The data I want is just a value on a chart and is not in a table (#1 below) and hence cannot be used for comparison purpose.

So the required data is not in tabular format. You can only retrieve tables from Web pages using Get Data from Web.  This means we have to find a different way to get this data from a web page.  Fortunately, there is a way to do this and I am sharing it with you in this post.

The Method

In Google Chrome, we can Inspect elements of web page by right-clicking and then selecting Inspect (#1 below).

In the HTML code below, we can see that “97%” (#1 below) is a text between tags “<tspan>” (#2 below) that in turn belongs to parent tag “<text>” (#3 below), and all of this is part of “<div id=”nonEnglishSpeakingStudents”…” (#4 below). Probably, this is what we need.

This may look scary, but these tags will help us to find the data we want in the HTML text of the web page. So we need to first get that text.  Going back to Power Query, “From Web”, right-click on the URL (#1 below) and then click on Edit (#2 below).

Power Query automatically wraps content received from web into Web.Page function as shown below.

But what we need is just the text. So we have to change the setting “Open file as” to “Text File” in the Source step.  Click on the cog to edit the source step.

When we use “Text File”, Power Query loads the web page html rows in one column.

We can filter the column in order to find the line that contains the text “nonEnglishSpeakingStudents”.

The step above leaves 1 single row in the table.  All other rows have been removed.  Selecting the single row that remains after filtering, we can see the required “97%” as part of the kept text.

Now, we can easily extract this text by using standard function from Power Query “Text Between Delimiters”.

All we need to do is specify Start and End delimiters, as show below. Very easy, no need to program own functions, just use the UI to help.

This adds a new column with the value we are looking for.

Getting Data for Another School

In theory, if we change the URL with another school ID, it should work the same way, because HTML text will have similar structure on the other school’s profile page.
Click on “Advanced Editor” and locate the URL with school ID in the ‘M’ language code.

To apply our query to another school’s profile page we need to change only the URL.
Let’s take, for example, Concord Public School. I changed the query and now the query returned 69 (as shown below), which is same as on the school’s page – it works!

Convert URL to Parameter

Now it works, I will convert the URL to a parameter instead of it coded as a string. It is better to create a parameter to make it more flexible.  Home\Manage Parameters\New Parameter

Fill necessary fields (Parameter Name = URL, Text, Any Value)

I use another school here (Meadowbank Public School https://www.myschool.edu.au/school/41257) to check that solution works regardless of the school selected.

Now that I have URL as a parameter, I need to change the Power Query code to make it point to the parameter instead of hard coded string.

To do this, go to settings of step Source of the query. And select Parameter in drop-down list.

Creating a Function from Query

The reason why we started this task is to compare schools. So we need links for all the school profile pages.

The most effective way to do this in Power Query is to create function from existing query. In Power Query Editor, right-click on the query and select “Create Function”.

Provide a name to your function.

Function replicates M-code used in the initial query but adds an option to call itself with a parameter URL. Note that the original query is still there, plus new function.

Get URLs of Multiple Schools

Current version of website allows to search schools, for example, by suburb name or postal code.

While we search, in the browser we can see URL that contains search parameters –
https://www.myschool.edu.au/school-search?FormPosted=True&SchoolSearchQuery=ryde&SchoolSector=G&SchoolType=P&State=NSW

It uses following parameters:
SchoolSearchQuery=ryde – e.g. suburbs around Ryde
SchoolSector=G – Government
SchoolType=P – Primary
State=NSW
So, if needed, such string can be generated with a formula in Power Query.
In the same way as described earlier for ‘Getting data of one school’, we can query this search results web page and extract useful information from the text.

For example, we can get all the information displayed on the search results page along with the schools’ URLs that we require as parameters as shown below.

Having the URL of each school, we can extend this table with information from each school profile page, e.g. with “Non English Speaking Students %” as shown below by calling the function as an added column.

That’s it. The required information to compare schools is at hand now.

You can apply the same technique to any website where data is published in non-tabular format.

Practice Exercises

Now that you know how to query non-tabular data from web, try to get currency exchange rates from the following pages:

https://www.xe.com/currencytables/?from=AUD&date=2018-06-19

http://www.floatrates.com/daily/AUD.xml

https://www.exchange-rates.org/converter/AUD/EUR/1

https://www.x-rates.com/table/?from=AUD&amount=1

Simple Way to Get Data About Australian Schools

Though the intention of this post is to demonstrate what is possible with Power Query, in case you became inquisitive to know more about Australian schools, there is a simpler option to retrieve schools’ data.
In the footer of https://www.myschool.edu.au website you can find a link to http://www.acara.edu.au/contact-us/acara-data-access where all the data is available in Excel format.

About Author

Ivan Bondarenko (aka Ivan Bond) is a specialist in Excel-based and Power BI reporting, VBA developer, author of open-source SAP Business Objects Automation Tool and Excel-based solution for scheduling update of Excel files (aka Power Refresh).

If you are keen to read about Power Query techniques from time to time, subscribe to Ivan’s blog or Twitter @_Ivan_Bond.

Sample File

The Sample file for this post can be downloaded here.

126 thoughts on “Using Power Query for Extracting Non-Tabular Data from Web Pages”

  1. Simply desire to say your article is as amazing. The clarity in your post is just excellent and i could assume you are an expert on this subject.
    Fine with your permission allow me to grab your feed to keep
    up to date with forthcoming post. Thanks a million and please keep up the gratifying work.

  2. Can I simply say what a comfort to uncover a person that really understands what they are discussing
    online. You certainly understand how to bring an issue to light and make it important.

    A lot more people have to check this out and understand this side of the story.
    I was surprised that you are not more popular
    given that you surely have the gift.

  3. Great weblog here! Additionally your website so much
    up very fast! What web host are you using? Can I get your affiliate hyperlink to your host?

    I desire my web site loaded up as quickly as yours lol

  4. Pretty element of content. I just stumbled upon your weblog and in accession capital to assert that I acquire
    in fact loved account your weblog posts. Any way I will
    be subscribing on your augment or even I success you access consistently fast.

  5. Nice blog! Is your theme custom made or did you download it
    from somewhere? A design like yours with a few simple tweeks would really make my blog jump out.
    Please let me know where you got your design.
    Thanks a lot

  6. It’s actually a cool and useful piece of info. I am satisfied that you simply shared this
    useful info with us. Please keep us informed like this.
    Thank you for sharing.

  7. Whats up this is kinda of off topic but I was wanting to know if blogs
    use WYSIWYG editors or if you have to manually code with HTML.

    I’m starting a blog soon but have no coding knowledge so I wanted to get guidance from
    someone with experience. Any help would be greatly
    appreciated!

  8. Hey there! Do you know if they make any plugins to safeguard
    against hackers? I’m kinda paranoid about losing everything
    I’ve worked hard on. Any tips?

  9. Hi, Neat post. There is an issue along with your web
    site in internet explorer, would check this? IE still is the marketplace chief and
    a large component to folks will miss your magnificent writing due to this problem.

  10. After I initially left a comment I appear to have
    clicked the -Notify me when new comments are added- checkbox and from now on whenever a comment is added I receive 4 emails with
    the same comment. There has to be a way you can remove me from
    that service? Thanks!

  11. Having read this I thought it was rather enlightening. I appreciate you finding the
    time and effort to put this content together. I once again find myself personally spending a significant amount of time both reading and posting comments.
    But so what, it was still worthwhile!

  12. Hello there! This post couldn’t be written much better!
    Going through this article reminds me of my previous roommate!
    He always kept talking about this. I most certainly will send this
    article to him. Pretty sure he’s going to have a good read.
    I appreciate you for sharing!

  13. Do you mind if I quote a couple of your articles as long as I provide credit and sources back to your blog?
    My website is in the exact same area of interest as yours and my visitors would definitely
    benefit from some of the information you provide here.
    Please let me know if this ok with you. Thanks!

  14. Hello very cool website!! Man .. Excellent ..
    Wonderful .. I will bookmark your website and take the feeds also?
    I am satisfied to find numerous helpful information right here in the publish,
    we want work out more strategies in this regard,
    thank you for sharing. . . . . .

  15. We stumbled over here from a different website and thought I should check things out.

    I like what I see so i am just following you.
    Look forward to exploring your web page again.

  16. Excellent way of explaining, and fastidious paragraph to obtain data concerning my presentation subject, which i am going to convey in institution of higher education.

  17. Hey! I could have sworn I’ve been to this site before but after reading through some of the post I realized it’s new to
    me. Anyhow, I’m definitely happy I found it and I’ll be book-marking and
    checking back frequently!

  18. Hi I am so excited I found your blog, I really found you
    by error, while I was searching on Askjeeve for something
    else, Anyways I am here now and would just like to say kudos for a tremendous
    post and a all round thrilling blog (I also love the theme/design), I don’t have
    time to read it all at the moment but I have book-marked it and also
    added in your RSS feeds, so when I have time I will
    be back to read a lot more, Please do keep up the fantastic
    work.

  19. Howdy! This blog post couldn’t be written any better!
    Going through this post reminds me of my previous roommate!
    He constantly kept preaching about this. I’ll send this
    article to him. Pretty sure he will have a great read. Thank you for sharing!

  20. Today, I went to the beach front with my children. I found a sea shell and
    gave it to my 4 year old daughter and said “You can hear the ocean if you put this to your ear.” She put the shell
    to her ear and screamed. There was a hermit crab
    inside and it pinched her ear. She never wants to go back!
    LoL I know this is entirely off topic but I had to tell someone!

  21. Great beat ! I would like to apprentice while you amend your website, how can i subscribe for a blog site?
    The account helped me a acceptable deal. I had been a little bit acquainted
    of this your broadcast offered bright clear concept

  22. I’m not sure where you are getting your info,
    but great topic. I needs to spend some time learning more or
    understanding more. Thanks for excellent info I was looking for this information for my mission.

  23. Good day! I could have sworn I’ve been to this site before
    but after browsing through some of the articles I realized it’s new to me.
    Nonetheless, I’m definitely delighted I discovered it and I’ll be
    bookmarking it and checking back often!

  24. Hello, Neat post. There is an issue with your site in web
    explorer, might test this? IE still is the marketplace
    leader and a large element of other people will leave out your fantastic writing because of this
    problem.

  25. We absolutely love your blog and find almost all of your post’s to be precisely what I’m looking for.

    Do you offer guest writers to write content for you?
    I wouldn’t mind writing a post or elaborating on a lot of the subjects you write regarding
    here. Again, awesome website!

  26. An outstanding share! I’ve just forwarded this onto a coworker who has
    been conducting a little homework on this. And he actually bought
    me breakfast because I found it for him…
    lol. So allow me to reword this…. Thanks
    for the meal!! But yeah, thanks for spending some time to discuss
    this topic here on your website.

  27. Amazing blog! Do you have any tips and hints for aspiring writers?

    I’m planning to start my own website soon but I’m a little lost on everything.
    Would you propose starting with a free platform like WordPress or go
    for a paid option? There are so many options out there that I’m
    completely confused .. Any ideas? Appreciate it!

  28. What’s Taking place i am new to this, I stumbled upon this I’ve discovered It absolutely useful and it
    has aided me out loads. I am hoping to give a contribution & help different users like its aided me.
    Great job.

  29. I blog frequently and I seriously thank you for
    your information. This great article has really peaked
    my interest. I will take a note of your blog and keep
    checking for new information about once a week. I opted in for your RSS feed too.

  30. Someone necessarily help to make critically articles I might
    state. That is the first time I frequented your web page and so far?
    I amazed with the research you made to make this particular post extraordinary.
    Fantastic job!

  31. Heya i am for the primary time here. I found this board and I
    find It really useful & it helped me out much.
    I hope to give something back and aid others like you helped
    me.

  32. I like the helpful information you provide in your articles.
    I will bookmark your weblog and check again here regularly.
    I’m quite certain I will learn many new stuff right here!

    Best of luck for the next!

  33. Definitely believe that which you said. Your favorite justification seemed
    to be on the web the easiest thing to be aware of. I
    say to you, I certainly get annoyed even as people think about
    concerns that they plainly do not recognize about.
    You controlled to hit the nail upon the top and defined out the entire thing with no
    need side effect , other folks could take a signal. Will likely be again to get more.
    Thank you

  34. Hi, I do believe this is an excellent website. I stumbledupon it 😉 I am going to revisit once again since I saved as a favorite it. Money and freedom is the best way to change, may you be rich and continue to guide others.

  35. Hi there outstanding website! Does running a blog like this take a large amount of work?
    I’ve no expertise in coding but I was hoping to start my own blog soon. Anyway, should you have any suggestions
    or tips for new blog owners please share. I know this
    is off topic nevertheless I just had to ask.

    Cheers!

  36. I don’t even know how I finished up right here, but I believed this submit was once great.
    I do not know who you might be however definitely you’re going to a famous blogger when you
    are not already. Cheers!

  37. Heya i’m for the first time here. I came across this board and I to find It truly useful & it helped me out much.
    I am hoping to offer one thing again and help others such as
    you helped me.

  38. Really enjoyed thios write-up! I actually came across
    a company called Ideal Glass, and I have to say,
    they are truly impressive in the world of windowws and doors.
    They are experts in modern triple glazed glass solutions, as well as
    uPVC bifold doors, annd even internal screen syystems for both domestic
    and trade needs.

    I was impressed by how they combine craftsmanship
    with a focus on energy efficiency. It’s awesome to know that companies likee this one are
    raising the bar. This post, along with what I’ve seen from
    them, really makes me want to research some changes to my own property.
    Looking forward to reading more content like this—thanks again foor sharing!

    Also visit my site: Clean Pro

  39. I’m not sure exactly why but this web site is loading very slow for me. Is anyone else having this issue or is it a issue on my end? I’ll check back later on and see if the problem still exists.

  40. I loved as much as you will receive carried out right
    here. The sketch is attractive, your authored material stylish.
    nonetheless, you command get bought an nervousness over that you wish be delivering the following.

    unwell unquestionably come more formerly again since exactly the same nearly a lot often inside case you shield this hike.

  41. Greetings from Colorado! I’m bored to tears at work so I decided to browse your site on my iphone during lunch break.
    I really like the info you present here and can’t wait to
    take a look when I get home. I’m surprised at how fast your blog loaded on my mobile ..
    I’m not even using WIFI, just 3G .. Anyways, very
    good site!

  42. Today, I went to the beach with my kids. I found a sea shell and
    gave it to my 4 year old daughter and said “You can hear the ocean if you put this to your ear.” She placed the shell to her ear and screamed.
    There was a hermit crab inside and it pinched her ear.
    She never wants to go back! LoL I know this is completely off topic but I had to tell someone!

  43. Hey I know this is off topic but I was wondering if you knew of any widgets I could add to my blog that
    automatically tweet my newest twitter updates.
    I’ve been looking for a plug-in like this for quite some time and was hoping maybe you would have some
    experience with something like this. Please let me
    know if you run into anything. I truly enjoy reading
    your blog and I look forward to your new updates.

  44. I’ve been browsing online more than 3 hours nowadays, but I never discovered any attention-grabbing article like yours.
    It is beautiful worth enough for me. Personally, if all webmasters and bloggers made
    good content material as you probably did, the web will be a lot more helpful than ever before.

  45. Heya excellent blog! Does running a blog such as this take a massive
    amount work? I have absolutely no knowledge of programming but I had been hoping to start my own blog in the near future.
    Anyhow, should you have any ideas or techniques for new blog owners please
    share. I know this is off topic but I simply needed to
    ask. Cheers!

  46. My partner and I stumbled over here coming from a different web address and thought
    I may as well check things out. I like what I see so now i’m following you.

    Look forward to looking at your web page yet again.

  47. I am sure this piece of writing has touched all the internet visitors, its really really pleasant piece of writing on building up new webpage.

  48. It’s amazing to go to see this web site and reading the views of all mates on the topic of
    this post, while I am also eager of getting familiarity.

  49. Cool blog! Is your theme custom made or did you download it from somewhere?
    A theme like yours with a few simple tweeks would really make my blog jump out.
    Please let me know where you got your design. Bless you

  50. You made some good points there. I checked on the net for more information about the issue and found most people will go along
    with your views on this web site.

  51. When I initially commented I clicked the “Notify me when new comments are added” checkbox and now each time a comment is
    added I get several e-mails with the same comment.
    Is there any way you can remove me from that service?
    Appreciate it!

  52. Hmm is anyone else experiencing problems with
    the pictures on this blog loading? I’m trying to determine if its a problem on my end or if
    it’s the blog. Any feed-back would be greatly appreciated.

  53. Thanks for the marvelous posting! I quite enjoyed reading it,
    you’re a great author. I will be sure to bookmark your blog and definitely will come back later on. I want to encourage you
    to ultimately continue your great work, have a nice evening!

  54. My brother suggested I might like this blog. He was entirely right. This post truly made my day. You cann’t imagine just how much time I had spent for this information! Thanks!

  55. Hello there, I do believe your web site could possibly be having browser compatibility problems. When I take a look at your blog in Safari, it looks fine but when opening in Internet Explorer, it has some overlapping issues. I just wanted to give you a quick heads up! Aside from that, fantastic website!

  56. Hello There. I found your blog using msn. This is an extremely well written article. I will be sure to bookmark it and come back to read more of your useful information. Thanks for the post. I’ll definitely return.

  57. After looking at a handful of the articles on your website, I seriously like your technique of blogging. I saved as a favorite it to my bookmark website list and will be checking back in the near future. Take a look at my website too and let me know how you feel.

  58. Greetings from Colorado! I’m bored to death at work so I decided to browse your site on my iphone during lunch break. I really like the knowledge you present here and can’t wait to take a look when I get home. I’m amazed at how quick your blog loaded on my phone .. I’m not even using WIFI, just 3G .. Anyways, great blog!

  59. Greetings from Carolina! I’m bored to death at work so I decided to check out your site on my iphone during lunch break. I really like the info you provide here and can’t wait to take a look when I get home. I’m shocked at how fast your blog loaded on my phone .. I’m not even using WIFI, just 3G .. Anyways, amazing blog!

  60. Greetings from Idaho! I’m bored to death at work so I decided to check out your blog on my iphone during lunch break. I love the information you present here and can’t wait to take a look when I get home. I’m shocked at how fast your blog loaded on my phone .. I’m not even using WIFI, just 3G .. Anyways, fantastic blog!

  61. I don’t even know how I ended up here, but I thought this post was good. I do not know who you are but definitely you’re going to a famous blogger if you are not already 😉 Cheers!

    1. Jason Cockington

      Power Query is only available to Mac Office 365 users at present.
      If you don’t have access to Power Query in your version of Excel for Mac, then your best bet is to use Data Flows.
      Data Flows allows you to use Power Query online, and then you can access your transformed data by connecting to the Data Flow, or by downloading its results.
      You can find out more about Data Flows here:
      https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-create

  62. Hi Ivan,thank you for this demonstration.I applied the same procedures to extract data from not tabular web page,for the first original page it works, however when I change URL to extract data in another web with the same format as the first one,it displays nothing.How can I fix this?

  63. Hi, great information in this post. One question I have, when trying to reproduce, I get a first page on the URL query to accept terms and conditions. When browsing thru a browser I just check the box and get to the page you mention in the article. But in Power Query, how do I got thru that instance of interaction to get to the page of the information to extract?
    Thanks in advance,
    Gabriel

  64. I’m not finding the Paramater field when doing this step below

    “To do this, go to settings of step Source of the query. And select Parameter in drop-down list.”

  65. What a great article / technique. This is something that I wanted to do for quite a while and wasn’t able to figure it out so I gave up. Now I can go back and pull information from lots of web pages. THANKS!

  66. I do not have the option to “Open file as” There are no options in my source step. Running Microsoft Office 365 ProPlus, Excel Version 1803 (Build 9126.2227)

    Is there a solution to add this feature?

    Thank you.

    Robert

    “But what we need is just the text. So we have to change the setting “Open file as” to “Text File” in the Source step. Click on the cog to edit the source step.

  67. Wow! Well articulated and extremely useful. I’ve stumbled into similar scenarios but you’ve made wonderfully explicit your process – thank you so much!

Leave a Comment

Your email address will not be published. Required fields are marked *