Webscraping the stats.nba.com page

Jeffrey Ekeanyanwu

Webscraping in Python with Selenium and Beautiful Soup

Why I chose the topic:

With the COVID-19 pandemic in a perpetual state of flux, I decided to take a look at the NBA, one of the first sports leagues to take the virus seriously. I am curious whether the pandemic, rule changes, and generally tougher playing environments translated into any change in overall performance. Also, I love sports and hope to do in-depth sports analysis in the future!

The page I scraped

stats.nba.com

This is the main page for NBA stats going as far back as the 1970s. Other sites host this historical data as well, and some even offer it as a .csv download, but nba.com updates the fastest. Plus, I wanted a bit of a challenge!

The code I used:

This was my loop over four NBA seasons: one season before the pandemic, one spanning its onset, and two during it. I had to use a try/except block to get rid of the cookie-acceptance pop-up, which sometimes appeared when I opened the browser for the first time or after I refreshed too many times.
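A minimal sketch of such a season loop follows. The target URL, its `Season`/`SeasonType` query parameters, and the cookie button ID `onetrust-accept-btn-handler` are assumptions for illustration, not values taken from the original scraper; inspect the live page to find the real ones. Selenium is imported inside the cookie helper so the pure string helpers can run without it.

```python
# Sketch only: BASE_URL, the query parameters, and COOKIE_BUTTON_ID are
# assumptions, not taken from the original scraper.
BASE_URL = "https://www.nba.com/stats/teams/advanced"  # hypothetical target page
COOKIE_BUTTON_ID = "onetrust-accept-btn-handler"  # hypothetical "Accept" button ID


def season_label(start_year: int) -> str:
    """Format a season as NBA stats pages do, e.g. 2019 -> '2019-20'."""
    return f"{start_year}-{str(start_year + 1)[-2:]}"


def season_url(start_year: int) -> str:
    """Build the (hypothetical) regular-season URL for one season."""
    return f"{BASE_URL}?Season={season_label(start_year)}&SeasonType=Regular%20Season"


def accept_cookies_if_present(driver) -> bool:
    """Click the cookie pop-up's accept button if it is there; report whether we did."""
    # Imported lazily so the helpers above work even without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    try:
        driver.find_element(By.ID, COOKIE_BUTTON_ID).click()
        return True
    except NoSuchElementException:
        # The pop-up only appears sometimes, so a missing button is not an error.
        return False


def scrape_seasons(driver, start_years):
    """Visit each season's page and return {season label: rendered page HTML}."""
    pages = {}
    for year in start_years:
        driver.get(season_url(year))
        accept_cookies_if_present(driver)
        pages[season_label(year)] = driver.page_source
    return pages
```

Here `start_years` holds the first calendar year of each season to scrape, and `driver` is any Selenium WebDriver instance. Catching `NoSuchElementException` is what makes the pop-up optional: the loop carries on whether or not the banner shows up.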

These were all the modules I had to import. Pay special attention to expected_conditions, which I found really interesting. Combined with WebDriverWait, it lets you wait until a certain element (in my case, the stats table) appears on the page before running the rest of the search methods.
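Here is a hedged sketch of the wait-then-parse pattern: `WebDriverWait` plus `expected_conditions.presence_of_element_located` blocks until a table exists, and Beautiful Soup then turns the rendered HTML into rows. The generic `table` CSS selector and the `<thead>`/`<tbody>` layout are assumptions; the real stats table may need a more specific selector.

```python
from bs4 import BeautifulSoup


def wait_for_stats_table(driver, timeout: float = 10.0) -> str:
    """Wait until a <table> is in the DOM, then return the rendered page HTML."""
    # Lazy imports keep parse_stats_table usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() re-checks the condition (every 0.5 s by default) and raises
    # TimeoutException if no table shows up within `timeout` seconds.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "table"))
    )
    return driver.page_source


def parse_stats_table(html: str) -> list[dict[str, str]]:
    """Convert the first <table> into a list of row dicts keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    head, body = table.find("thead"), table.find("tbody")
    if head is None or body is None:
        return []  # this sketch assumes a <thead>/<tbody> layout
    headers = [th.get_text(strip=True) for th in head.find_all("th")]
    rows = []
    for tr in body.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(headers, cells)))
    return rows
```

Waiting explicitly matters because the stats table is rendered by JavaScript after the initial page load. Reading `page_source` too early can return HTML with no table in it at all.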

Before and after!

Data visualizations and insights

The charts at the bottom plot each team's average age against its net rating. Net rating is a fancy term for how good a team is at scoring baskets without being scored on: points scored minus points allowed per 100 possessions. A higher net rating means a better chance of winning the championship. In general, older teams had higher net ratings. This may be because the league's stars have been around for quite a few years by now, which pushes their teams' average age up.

Something interesting I noticed in the data over the four seasons is that players are getting younger. Before the pandemic, the average player was about 26 and a half years old. As of this season, the average player is about 25 and two-thirds years old.
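The charts show the age/net-rating relationship visually. One way to put a number on "older teams had higher net ratings" is a Pearson correlation between team average age and net rating, sketched below. This is a swapped-in technique, not necessarily what the charts were built from. The league-wide average age here is a plain unweighted mean over players, which may not match how stats.nba.com computes its own age figures.

```python
from statistics import mean


def net_rating(off_rating: float, def_rating: float) -> float:
    """Points scored minus points allowed, each per 100 possessions."""
    return off_rating - def_rating


def pearson_r(xs, ys) -> float:
    """Pearson correlation: > 0 means the two series tend to rise together."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def league_average_age(player_ages) -> float:
    """Unweighted mean age across all players in one season."""
    return mean(player_ages)
```

Calling `pearson_r(team_ages, team_net_ratings)` on one season's scraped columns gives a value between -1 and 1. A positive value matches "older teams had higher net ratings", though it says nothing about why.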

Let's wrap up!

Next Steps?

I only did a deep dive on a few key stats and players. I wonder whether this information would be sufficient to build a model that predicts how a player will perform in future games.

Difficulties?

As with learning any new program or software, the hardest part for me was figuring out which commands did what. As an ordinary human user, I had underestimated everything I do to access a webpage and retrieve information. It is quite amazing once you have to make every step explicit.

Challenges?

As the GIF so eloquently explains, pop-ups and cookie notifications were a bit of a nuisance. Because the cookie pop-up didn't always show up, I handled it with a few try/except blocks in Python. I suspect NBA.com does not want to make scraping very easy, since it can sell some of this data to fantasy sports leagues.

THANKS!