Extracting and Displaying Football Scores and Standings from Sky News: A Code Implementation Guide
This blog post walks through a code implementation that extracts football scores and standings from Sky News, so that football fans can retrieve and display up-to-date information efficiently. It gives step-by-step instructions for web scraping: selecting tools, parsing HTML content, and transforming the data. The practical code examples are meant to help readers build their own data-driven solutions and sharpen their coding skills while following real-time football updates. The goal is to bridge the gap between sports enthusiasts and coding enthusiasts with a clear and accessible solution to a common problem.
The selected tech stack comprises:
- Python
- AWS (Amazon Web Services)
- Keboola: A data integration and manipulation platform that simplifies ETL processes, enabling smooth data transformations and preparation.
EXTRACTION:
Extraction was done in Python. After considerable difficulty finding a football website that returned a response status of 200, I settled on Football News | Sky Sports.
EXTRACTION — Team standing code
This code aims to extract football league standings data from various URLs using web scraping. It utilizes popular libraries such as `requests` for sending HTTP requests and `BeautifulSoup` for parsing HTML content. The URLs correspond to different football leagues, and the code iterates through each URL to scrape the table containing the standings data.
The process involves:
1. Defining a list of URLs along with their sources (league names).
2. Iterating through each URL to send a request and parse the HTML content.
3. Extracting table headers and creating an empty DataFrame with those headers.
4. Iterating through table rows, extracting the data from each row, and adding it to the DataFrame.
5. Adding a column to the DataFrame indicating the source (league) of the data.
6. Appending each league’s DataFrame to a list.
7. Concatenating all league DataFrames into a single DataFrame.
8. Removing the first two columns from the final DataFrame to exclude unwanted information.
9. Printing the final DataFrame displaying the merged football league standings data.
The result is a consolidated DataFrame containing football league standings data from multiple sources (leagues), which can be further analysed and visualized for insights into team performance across different leagues.
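The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for the post: the two URLs in `LEAGUE_PAGES`, the function names, and the `Source` column name are assumptions, and the sketch simply takes the first `<table>` on each page. Which columns step 8 removes depends on the page layout, so `drop_leading` is left as a parameter.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed (url, league) pairs; the exact pages scraped are not listed in the post.
LEAGUE_PAGES = [
    ("https://www.skysports.com/premier-league-table", "Premier League"),
    ("https://www.skysports.com/ligue-1-table", "Ligue 1"),
]


def parse_standings(html, source):
    """Steps 3-5: read the first table in `html` and tag each row with its league."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError(f"No table found for {source}")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == len(headers):  # skips the header row and malformed rows
            rows.append(cells)
    df = pd.DataFrame(rows, columns=headers)
    df["Source"] = source
    return df


def merge_standings(frames, drop_leading=2):
    """Steps 7-8: concatenate the league frames and drop the leading columns."""
    merged = pd.concat(frames, ignore_index=True)
    return merged.iloc[:, drop_leading:]


def fetch_standings(pages=LEAGUE_PAGES, drop_leading=2):
    """Steps 1-2 and 6: request each page and collect one DataFrame per league."""
    frames = []
    for url, source in pages:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # fail loudly on anything other than a 2xx status
        frames.append(parse_standings(response.text, source))
    return merge_standings(frames, drop_leading)
```

Calling `print(fetch_standings())` would cover step 9. Keeping the parsing separate from the HTTP request makes it possible to test `parse_standings` against saved HTML without hitting the site.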
EXTRACTION — Team scores code
The process involves:
- Sending an HTTP request to the specified URL and parsing the HTML content using BeautifulSoup.
- Extracting home team names, scores, and away team names from the HTML using the appropriate CSS classes.
- Cleaning and formatting the extracted data to create meaningful strings for home team names, scores, and away team names.
- Creating a DataFrame to store the extracted data.
The result is a DataFrame containing data about football match scores, which is then converted to CSV format and stored in an S3 bucket. This CSV data can be used for further analysis, reporting, or sharing with others.
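A rough sketch of the scores extraction described above is shown here. The three CSS class names are assumptions about the page markup and should be checked against the live HTML before use; the function names and the column labels are also illustrative.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Assumed class names; inspect the live page to confirm them.
HOME_CLASS = "matches__participant--side1"
SCORE_CLASS = "matches__teamscores"
AWAY_CLASS = "matches__participant--side2"


def clean(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def parse_scores(html):
    """Build a DataFrame of home team, score and away team from a results page."""
    soup = BeautifulSoup(html, "html.parser")
    homes = [clean(tag.get_text(" ")) for tag in soup.find_all(class_=HOME_CLASS)]
    aways = [clean(tag.get_text(" ")) for tag in soup.find_all(class_=AWAY_CLASS)]
    # The two goal counts arrive separated by whitespace; join them as "2-1".
    scores = ["-".join(tag.get_text(" ").split()) for tag in soup.find_all(class_=SCORE_CLASS)]
    # zip stops at the shortest list, so a missing element cannot misalign columns silently
    # beyond the end of the shortest list.
    records = list(zip(homes, scores, aways))
    return pd.DataFrame(records, columns=["Home Team", "Score", "Away Team"])
```

The HTML passed to `parse_scores` would come from `requests.get(url).text` for the results page in question.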
LOADING:
The next step in the project is to load the data into cloud storage. This video gives a step-by-step walkthrough of creating an AWS S3 bucket:
How to create an AWS S3 bucket (YouTube)
Once the bucket is created, run the following code to upload the data into it. My bucket is named “footballstanding”.
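A minimal sketch of such an upload, assuming `boto3` is installed and AWS credentials are already configured on the machine. The object key `standings.csv` and the function name are hypothetical; only the bucket name comes from this post. The S3 client is passed in as an argument so the function can be exercised without touching AWS.

```python
import pandas as pd

BUCKET_NAME = "footballstanding"  # bucket name used in this post


def upload_dataframe(df, s3_client, key="standings.csv", bucket=BUCKET_NAME):
    """Serialise `df` to CSV in memory and store it as one object in the bucket."""
    csv_text = df.to_csv(index=False)  # returns a string when no path is given
    s3_client.put_object(
        Bucket=bucket,
        Key=key,  # hypothetical object name
        Body=csv_text.encode("utf-8"),
    )
    return csv_text


# Typical usage (requires boto3 and valid credentials):
#   import boto3
#   upload_dataframe(final_df, boto3.client("s3"))
```

Writing the CSV to memory avoids creating a temporary file on disk; for large tables, writing to a file and using the client's `upload_file` method may be a better fit.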
TRANSFORMATION:
The next step is to streamline, automate and transform the data.
Keboola seems to provide a comprehensive solution for data integration, transformation, and automation, which benefits the project. It will help us streamline the ETL process and focus on deriving insights from the data rather than getting caught up in complex technicalities.
It is important to note that data transformation may be constructive (adding, copying, and replicating data), destructive (deleting fields and records), aesthetic (standardizing salutations or street names), or structural (renaming, moving, and combining columns in a database).
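To make the four kinds of transformation concrete, here is a small pandas sketch with one example of each. It is not the transformation that was run in Keboola; the column names (`Team`, `Pl`, `F`, `A`, `Pts`, `Last 6`) are assumed for illustration.

```python
import pandas as pd


def demo_transformations(df):
    """Apply one example of each transformation type to a standings table."""
    out = df.copy()
    # Constructive: derive a new goal-difference column from goals for and against.
    out["GD"] = pd.to_numeric(out["F"]) - pd.to_numeric(out["A"])
    # Destructive: delete a field that is not needed downstream.
    out = out.drop(columns=["Last 6"])
    # Aesthetic: standardise the spelling and spacing of team names.
    out["Team"] = out["Team"].str.strip().str.title()
    # Structural: rename columns to clearer labels.
    out = out.rename(columns={"Pl": "Played", "Pts": "Points"})
    return out
```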
Below is a snapshot of the simple transformation done on the data in Keboola.
The entire ETL process in Keboola has three steps: connect to the source, transform, and connect to the destination.
The flow ran successfully in about 2 min 23 sec.
The two resulting tables hold the extracted football league standings and match scores. Here are some questions that can be answered from this data:
Football League Standings:
1. Which team is currently leading the league in each competition?
2. Which teams are at the bottom of the league standings in each competition?
3. Are there any significant differences in team performance across different leagues (Ligue 1, Premier League, etc.)?
4. How many points does each team have, and how does it compare to other teams?
5. Can we identify any trends or patterns in team performance over time?
6. Which league has the highest average goal difference (GD)?
7. Are there any teams that consistently perform well across multiple seasons?
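As a sketch of how questions 1 and 6 might be answered with pandas, assuming the standings table has `Team`, `Pts`, `GD` and `Source` columns (the column and function names are assumptions) and that the scraped values are still strings:

```python
import pandas as pd


def league_leaders(standings):
    """Return the team with the most points in each league."""
    df = standings.assign(Pts=pd.to_numeric(standings["Pts"]))
    top_rows = df.groupby("Source")["Pts"].idxmax()  # row label of each league's leader
    return df.loc[top_rows, ["Source", "Team", "Pts"]].reset_index(drop=True)


def average_goal_difference(standings):
    """Return the mean goal difference per league, highest first."""
    gd = pd.to_numeric(standings["GD"])
    return gd.groupby(standings["Source"]).mean().sort_values(ascending=False)
```

Questions about trends over time or across seasons would need several snapshots of the table, which a single scrape does not provide.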
Football Match Scores:
1. What were the scores of the most recent football matches?
2. Were there any high-scoring matches (e.g., 4–3, 3–2)?
3. Are there any notable upsets or surprises in recent match outcomes?
4. Which teams have been consistently winning or losing matches?
5. What is the distribution of goals scored in recent matches?
6. Can we identify any trends in home team vs. away team performance?
7. Are there any patterns in match scores based on different leagues?
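Several of these questions reduce to splitting the score into two numbers. The sketch below assumes a `Score` column holding strings such as `4-3` for every row (a row without a played score would need to be filtered out first); the column and function names are illustrative.

```python
import pandas as pd


def add_goal_columns(scores):
    """Split the score string into home, away and total goals."""
    # Two digit groups separated by any non-digit text (hyphen, en dash, spaces).
    goals = scores["Score"].str.extract(r"(\d+)\D+(\d+)").astype(int)
    out = scores.copy()
    out["Home Goals"] = goals[0]
    out["Away Goals"] = goals[1]
    out["Total Goals"] = goals[0] + goals[1]
    return out


def high_scoring(scores, minimum_goals=5):
    """Return the matches with at least `minimum_goals` goals in total."""
    df = add_goal_columns(scores)
    return df[df["Total Goals"] >= minimum_goals]


def outcome_counts(scores):
    """Count home wins, draws and away wins."""
    df = add_goal_columns(scores)
    diff = df["Home Goals"] - df["Away Goals"]
    outcome = diff.apply(
        lambda d: "home win" if d > 0 else "away win" if d < 0 else "draw"
    )
    return outcome.value_counts()
```

The distribution of goals (question 5) follows from `add_goal_columns(scores)["Total Goals"].value_counts()`.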
Cross-Data Analysis:
1. Can we correlate team standings with recent match outcomes? For example, do teams higher in the standings tend to win more matches?
2. Is there a relationship between a team’s position in the standings and the goal difference in matches they play?
3. Can we analyse whether there is a correlation between a team’s position in the standings and the number of goals they score or concede in matches?
4. Can we identify any trends in team performance before and after certain periods of the season?
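A first pass at question 1 could join the two tables on team name and check how often the side with more points won. This is only a sketch under several assumptions: the column names match those used earlier, team names are spelled identically in both tables, and each team appears once in the standings.

```python
import pandas as pd


def favourite_win_rate(standings, scores):
    """Share of decided matches won by the team with more league points."""
    points = pd.to_numeric(standings.set_index("Team")["Pts"])
    goals = scores["Score"].str.extract(r"(\d+)\D+(\d+)").astype(int)
    gaps = pd.DataFrame({
        # Teams missing from the standings map to NaN and are dropped below.
        "pts_gap": scores["Home Team"].map(points) - scores["Away Team"].map(points),
        "goal_gap": goals[0] - goals[1],
    }).dropna()
    # Ignore draws and matches between teams level on points.
    decided = gaps[(gaps["pts_gap"] != 0) & (gaps["goal_gap"] != 0)]
    if decided.empty:
        return float("nan")
    favourite_won = (decided["pts_gap"] > 0) == (decided["goal_gap"] > 0)
    return favourite_won.mean()
```

Because the standings already include the results of these matches, the figure describes association rather than prediction.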
These questions can help you gain insights into team performance, league dynamics, and trends within the football data. Analysing the data from both tables together can provide a holistic view of the football landscape and help in making informed decisions or predictions related to team strategies, fan engagement, and more.