Getting data from FPL and Understat to do analysis

The new season of the English Premier League is about to start and with that will start the Fantasy Premier League. A lot of websites and youtube channels will help you out in finding the right players for your team which you can copy every single week or you can just go to Reddit or Discord, and see the meltdown of taking Gabriel Jesus only for Saka to score a hattrick or you can do your analysis and overengineer everything and end up losing to a player who will not change the team for the whole season. All up to you!

If you have decided to make your life more miserable then follow the 3 steps below:

Step 1: Getting FPL data using FPL’s API

There is so much data by which one can sort players and select players in any given price range. Mostly we will check who is the most selected by% which will give you players like Gabriel Jesus, Salah, Haaland, etc, and then try to be clever and take Vardy in our team. We surely can’t make this awesome game that simple, we will do “data analysis” by getting every possible FPL data.

Python is the tool that will help us get all the data.

#Import Libraries else nothing will work
import requests
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

#this is the url for the api, copy as it is
url = "https://fantasy.premierleague.com/api/bootstrap-static/"

#Request packagae to make GET request from the API endpoint (basically we are asking requests library to make a request to API to get us the data.)

r = requests.get(url)

#Transform request variable to a JSON object. (That data needs to be transformed into a JSON object else we won't be able to understand anything).

json = r.json()
json.keys()

#Build a Dataframe from one of the keys that JSON object has.
elements_df = pd.DataFrame(json['elements'])

#Checking the columns
elements_df.columns

The above code basically takes the URL and requests to get a JSON object which is used to make an elements_df. The elements_df.columns code will give us the column names in the JSON object.

Index(['chance_of_playing_next_round', 'chance_of_playing_this_round', 'code',
       'cost_change_event', 'cost_change_event_fall', 'cost_change_start',
       'cost_change_start_fall', 'dreamteam_count', 'element_type', 'ep_next',
       'ep_this', 'event_points', 'first_name', 'form', 'id', 'in_dreamteam',
       'news', 'news_added', 'now_cost', 'photo', 'points_per_game',
       'second_name', 'selected_by_percent', 'special', 'squad_number',
       'status', 'team', 'team_code', 'total_points', 'transfers_in',
       'transfers_in_event', 'transfers_out', 'transfers_out_event',
       'value_form', 'value_season', 'web_name', 'minutes', 'goals_scored',
       'assists', 'clean_sheets', 'goals_conceded', 'own_goals',
       'penalties_saved', 'penalties_missed', 'yellow_cards', 'red_cards',
       'saves', 'bonus', 'bps', 'influence', 'creativity', 'threat',
       'ict_index', 'influence_rank', 'influence_rank_type', 'creativity_rank',
       'creativity_rank_type', 'threat_rank', 'threat_rank_type',
       'ict_index_rank', 'ict_index_rank_type',
       'corners_and_indirect_freekicks_order',
       'corners_and_indirect_freekicks_text', 'direct_freekicks_order',
       'direct_freekicks_text', 'penalties_order', 'penalties_text'],
      dtype='object')

Now, from this, we will take the relevant column/data points which we think can help in analysis later.

#Getting the required metrics for analysism (Can't use all the columns, just select the ones you need)

red_elements_df= elements_df[['team','id','first_name','second_name',
'web_name','element_type','cost_change_start','now_cost','selected_by_percent', 'transfers_in', 'transfers_out', 'total_points', 'bonus', 'minutes', 'goals_scored', 'assists', 'clean_sheets', 'status', 'form']]

The element_type column is basically the player’s role. We will map them as Forward, Midfielder, Defender and Goalkeeper. We will also map the team columns to team names.

#Map Element type to actual position of the players else remembring what 4 was will make you lose your temper.

red_elements_df['element_type']=red_elements_df['element_type'].map({4:'Forward', 3:'Midfielder', 2:'Defender', 1:'Goalkeeper'})

#Map team names same reason as above

red_elements_df['team']=red_elements_df['team'].map({1:'Arsenal',2:'Aston Villa',3:'Bournemouth',4:'Brentford',5:'Brighton',6:'Chelsea',7:'Crystal Palace',8:'Everton',9:'Fulham',10:'Leicester City',11:'Leeds United',12:'Liverpool',13:'Manchester City',14:'Manchester Utd',15:'Newcastle Utd',16:'Nottingham Forest',17:'Southampton',18:'Tottenham',19:'West Ham',20:'Wolves'})

Now, some calculations for our columns.

#Convert value to float else no calculaions!
red_elements_df['selected_by_percent'] = red_elements_df.selected_by_percent.astype(float)
red_elements_df['cost_change_start'] = red_elements_df['cost_change_start']/10
red_elements_df['now_cost'] = red_elements_df['now_cost']/10


#rename columns to make it look good 
red_elements_df.columns = ['Team','Player ID','First Name','Second Name','Web Name','Position','Start Price','Current Price','Selected By','Transfers In', 'Transfers Out', 'Total Points', 'Bonus', 'Minutes', 'Goals Scored', 'Assists', 'Clean Sheets', 'Status', 'Form']

We are now ready with data and can export it as an excel file.

#Convert to Excel because that is universal truth.

red_elements_df.to_excel("First_Draft.xlsx", index=False)

Step 2: Get the Understat ID and FPL ID of the players matched.

For this, I used a CSV file from (https://github.com/vaastav/Fantasy-Premier-League).


url = "https://github.com/vaastav/Fantasy-Premier-League/blob/master/data/2022-23/id_dict.csv?raw=true"
df_understat = pd.read_csv(url)

#Clean the column names (brother has put a space before FPL ID column which caused me to lose an hour of my life)

df_understat.columns = df_understat.columns.str.replace(' ', '')

#Get the column names
df_understat.columns

Index(['Understat_ID', 'FPL_ID', 'Understat_Name', 'FPL_Name'], dtype='object')

Now, read the FPL file and merge the two data frames using FPL_ID and Player_ID

# Merge df_fpl and df_understat using the player_id column and _FPL_id column
df_fpl_understat = pd.merge(df_fpl, df_understat, left_on = "Player ID", right_on = "FPL_ID")

#Get the column names
df_fpl_understat.columns

#keep only Understat_ID 
df_fpl_understat = df_fpl_understat[['Team', 'Player ID', 'First Name', 'Second Name', 'Web Name',
    'Position', 'Start Price', 'Current Price', 'Selected By',
    'Transfers In', 'Transfers Out', 'Form', 'Total Points', 'Bonus', 'Minutes',
    'Goals Scored', 'Assists', 'Clean Sheets', 'Status', 'Understat_ID']]

#convert Understat_ID to int
df_fpl_understat['Understat_ID'] = df_fpl_understat['Understat_ID'].astype(int)

Step 3: Get Understat data using understatapi library

Now, we have a dataframe having all the FPL data and Understat ID of the players matched with FPL ID. Time to get some data from understat like xG, xA, npxG, xGChain etc that will help us in doing analysis like which player is over or underperforming and who is involved more in a match.

For this, we need to install, understatapi and then call the function UnderstatClient

from understatapi import UnderstatClient
understat = UnderstatClient()

#Change Season to 2022 when the season starts on 5th August
league_player_data = understat.league(league="EPL").get_player_data(season="2021")

# convert to pandas dataframe
league_player_data = pd.DataFrame(league_player_data)
league_player_data.columns
#convert columns to int
league_player_data['id'] = league_player_data['id'].astype(int)
league_player_data['xG'] = league_player_data['xG'].astype(float)
league_player_data['xA'] = league_player_data['xA'].astype(float)
league_player_data['npxG'] = league_player_data['npxG'].astype(float)
league_player_data['xGChain'] = league_player_data['xGChain'].astype(float)
league_player_data['xGBuildup'] = league_player_data['xGBuildup'].astype(float)
league_player_data['shots'] = league_player_data['shots'].astype(int)
league_player_data['key_passes'] = league_player_data['key_passes'].astype(int)

Now, this dataframe also has Understat ID and our previous merged dataframe also has one. So, we can merge both again to get the final excel sheet.

#merge with df_merged
final_df= pd.merge(df_fpl_understat, league_player_data, left_on = "Understat_ID", right_on = "id")

final_df = final_df[['Player ID', 'Understat_ID','Team', 'First Name', 'Second Name', 'Web Name','Position', 'Start Price', 'Current Price', 'Selected By',
'Transfers In', 'Transfers Out', 'Form', 'Total Points', 'Bonus', 'games','Minutes', 'Goals Scored', 'xG', 'Assists', 'xA', 'npxG', 'xGChain', 'xGBuildup', 'Clean Sheets', 'Status']]

#all numeric number to round to 2 decimal places else 11.9070970970970 won't look good.

final_df = final_df.round(2)

#rename columns
final_df.columns = ['Player ID', 'Understat_ID','Team', 'First Name', 'Second Name', 'Web Name',
    'Position', 'Start Price', 'Current Price', 'Selected By', 'Transfers In', 'Transfers Out', 'Form','Total Points', 'Bonus', 'Matches', 'Minutes', 'Goals Scored', 'xG', 'Assists', 'xA', 'npxG','xGChain', 'xGBuildup', 'Clean Sheets', 'Status']

#sort by Selected By and Total Points or whatever you want.
final_df = final_df.sort_values(by=['Selected By', 'Total Points'], ascending=False)

#Export as Excel File again because that is the Universal Law!!

import datetime
final_df.to_excel('final_df '+str(datetime.datetime.today().strftime("%d-%m-%Y"))+'.xlsx', index=False)

The final excel sheet should look like this:

or Step: Get on my Github profile and download the files.

https://github.com/probablyvivek/Fantasy

We will make some vizzes in the next post.

3 thoughts on “Getting data from FPL and Understat to do analysis

  1. regnårusk's avatar

    Great tutorial, looking forward for the viz! Could you do something similar for fotmob?

    Like

    1. stateastic's avatar

      There are some scrapers that can get you data from FotMob. All you need to do is pip install it and call the functions. I will try to make a tutorial on it.

      Like

Leave a comment

Design a site like this with WordPress.com
Get started
search previous next tag category expand menu location phone mail time cart zoom edit close