3. Scraping Fantasy Football Scout's Opta Data using BeautifulSoup in Python

Before we can explore the data, we need it in a format that is easy to work with. Below is a brief guide to writing a simple script that fetches the data discussed in the previous post.

Note: You will need a membership on www.fantasyfootballscout.co.uk to access the Opta data. This post does not reproduce any of that data.

Pulling data from the Fantasy Football Scout website is less straightforward than it is from many other websites, for two reasons. First, you need to be logged into your membership account. Second, every table on the website has filters that must be selected on the page and cannot be set through the URL, so building the raw dataset takes some extra work.

We will need to import a couple of modules first.
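
The exact import list is not shown here, so treat the following as a likely minimal set rather than the original one. It assumes the third-party packages requests and beautifulsoup4 are installed.

```python
import csv  # standard library: writes the scraped rows to CSV files

import requests  # third-party: keeps a logged-in HTTP session
from bs4 import BeautifulSoup  # third-party: parses the returned HTML
```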

Next, we need to create a session with our member login ID and password. (I removed mine from the code before pasting it here.)
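
As a rough sketch only, a login with requests could look like the code below. It assumes the site accepts a plain form POST. The login URL, the form field names and the environment variable names are placeholders, not the site's real ones, so check the login form in your browser's developer tools before relying on any of them.

```python
import os

import requests

# ASSUMPTION: placeholder login endpoint, not confirmed for this site.
LOGIN_URL = "https://members.fantasyfootballscout.co.uk/"


def build_login_payload(username, password):
    """Build the form data sent on login.

    ASSUMPTION: the field names "username" and "password" are hypothetical;
    use the names found in the site's actual login form.
    """
    return {"username": username, "password": password}


def create_session(username, password, login_url=LOGIN_URL):
    """Return a requests.Session that has posted the login form."""
    session = requests.Session()
    response = session.post(login_url, data=build_login_payload(username, password))
    response.raise_for_status()  # stop early if the server rejects the request
    return session


# Usage (hypothetical environment variable names keep credentials out of the code):
# session = create_session(os.environ["FFS_USERNAME"], os.environ["FFS_PASSWORD"])
```

Reading the credentials from environment variables means they never end up in the script or in anything pasted from it.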

I created two tables in the members area that hold the data I want to pull. You don't need to create your own tables if the data you need is already available in one of the predefined tables.

https://members.fantasyfootballscout.co.uk/my-stats-tables/view/42352/ is the table that has attacking and defensive statistics of teams

https://members.fantasyfootballscout.co.uk/my-stats-tables/view/42350/ is the table that has only attacking and game time statistics of players

"copyColumnList" stores the list of columns that we are interested in and "heading" is the heading of the CSV file

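To make the two names concrete, here is a small sketch with made-up values. It assumes that copyColumnList holds zero-based column positions and that heading lists one CSV column name per selected position. The values and the pick_columns helper are hypothetical.

```python
# Hypothetical values for illustration only; the real lists depend on
# which columns your own stats table contains.
copyColumnList = [0, 1, 4]  # ASSUMPTION: zero-based positions of the wanted columns
heading = ["Name", "Team", "Shots"]  # one CSV column name per selected position


def pick_columns(cells, columns):
    """Keep only the cells at the given positions, in the given order."""
    return [cells[i] for i in columns]


# Each selected column needs exactly one name in the CSV heading.
assert len(copyColumnList) == len(heading)
```
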
The function below uses BeautifulSoup to parse the response returned for a URL. Since the data is in the form of a table, we need to go through each cell of every row and strip the tabs, spaces and newlines on either end before we get to the raw data.

The boolean parameter team indicates whether the table is team-based or player-based, since the two types of tables have slightly different formats.
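
A minimal sketch of such a function follows. It is not the original code. The function names are mine, and the way the team flag is handled is a guess: the sketch assumes player rows carry one extra leading cell that team rows do not. Adjust that to whatever difference your own tables show.

```python
from bs4 import BeautifulSoup


def parse_table(html, copyColumnList, team=False):
    """Extract the selected columns from every data row of an HTML table."""
    soup = BeautifulSoup(html, "html.parser")
    # ASSUMPTION (hypothetical): player rows start with one extra cell
    # that team rows lack, so it is skipped for player tables.
    offset = 0 if team else 1
    rows = []
    for tr in soup.find_all("tr"):
        # strip() removes the tabs, spaces and newlines around each value
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        cells = cells[offset:]
        # Skip header rows (only <th> cells) and rows that are too short
        if not cells or len(cells) <= max(copyColumnList):
            continue
        rows.append([cells[i] for i in copyColumnList])
    return rows


def parse_url(session, url, copyColumnList, team=False):
    """Fetch a table page with a logged-in session and parse it."""
    response = session.get(url)
    response.raise_for_status()
    return parse_table(response.text, copyColumnList, team=team)


# Made-up HTML, only to show the whitespace stripping:
sample = "<table><tr><th>Team</th><th>Shots</th></tr><tr><td>\n\t Example FC \n</td><td> 12 </td></tr></table>"
print(parse_table(sample, [0, 1], team=True))
```

Keeping the parsing separate from the fetching makes it possible to test the function on a saved copy of the page without logging in each time.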

I pulled the player and team data for the 2017-18 through 2019-20 seasons, which should set us up nicely to build a dataset that allows for easy experimentation.
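
Writing each season's rows to disk could look something like the sketch below. The helper names and the file naming scheme are hypothetical, and selecting the season filter on the site is not covered here.

```python
import csv
import io

SEASONS = ["2017-18", "2018-19", "2019-20"]  # the seasons mentioned above


def rows_to_csv_text(heading, rows):
    """Render the heading and data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(heading)
    writer.writerows(rows)
    return buffer.getvalue()


def save_season(kind, season, heading, rows):
    """Write one CSV file per table type and season.

    ASSUMPTION: the naming scheme (e.g. players_2017-18.csv) is hypothetical.
    """
    path = f"{kind}_{season}.csv"
    # newline="" stops Python from altering the line endings csv produces
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv_text(heading, rows))
    return path
```

One file per season keeps the raw pulls separate, so a single season can be pulled again without touching the others.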

Next steps: Scraping match information