How to scrape a webpage using Power BI?
Whether you’re a data analyst or just curious about the world of information, you’ve probably faced a situation where you wanted to go beyond the analysis presented by some website/blog/Twitter profile on any subject, whatever it may be.
After going through some of these situations (mainly involving sports) and learning how to do it, I wanted to write a little about the possibilities of doing web scrapping on a website. I know and also use Python and R, and both will be topics of texts in the future, but today the idea today is not to use codes.
PS: All the following steps work on Microsoft Excel as well. ;)
Before starting, we need to understand what is Web scrapping. According to the (excellent) Towards Data Science page:
“Web scraping is a method used to get great amounts of data from websites and then data can be used for any kind of data manipulation and operation on it.
For this technique, we use web browsers. You usually do not have the built-in option to get that data you want. That is why we use Web Scraping to automate the process of getting that data and not having to do it manually. Web Scraping is the technique of automating this process so that instead of manually copying the data from websites.”
Now, let’s do it.
In this example, I’ll use as a source of data from one of my favorite websites to gather soccer data: fbref. They have other ways to share and export the data, but that is not the point, the main goal here is to teach how to do it in Power BI.
First of all (of course), open Power BI, then find “Get data” — inside “Get data” find “Web”:
In this example, I want to compare teams during the season 2022/2023 of the Premier League (what a year for Arsenal fans). So, I used this link and pasted it on URL box, and “Ok”:
Power BI connects with the webpage and present’s all the data captured during the reading:
In this, I want to use in my analysis, all the data available from squads on regular season table:
Power BI desktop usually recognizes the name of the tables and makes our life easier, since I searched by name of the section and I found the table I want:
Now, we can select the table we want and go to the data transformation process (this is not mandatory, you can load your data and start to work, however, as a good practice I was try to check data formats and quality before loading it to our model in order to avoid unnecessary waste of time):
And voila:
We now have all the data, ready for the magic! I hope you enjoyed it!