Below are third-party websites and resources, put together by others in the sports and data science world, that may help with your projects. The Mines Sports Analytics Club did not create any of these resources. Be sure to cite their developers when posting or presenting research, as they have generously published these tools for public use.
For a centralized library of web scrapers across many sports, see the directory of resources that Meyappan put together on GitHub.
Football
- nflscrapR
- An R package that scrapes statistics from NFL.com. This widely used package was created by Maksim Horowitz, Ron Yurko, and Sam Ventura while they were students at Carnegie Mellon University.
- Basic nflscrapR Tutorial – Created by Ben Baldwin, writer for The Athletic
- next-gen-scrapy
- Analyzes NFL passing location data for events such as completions, incompletions, interceptions, and touchdowns from the 2017 season to the present. Written in Python, it scrapes the NFL’s Next Gen Stats website for passing data. Developed by Sarah Mallepalle and company from the Carnegie Mellon Sports Analytics Club.
- NCAAF_Scraper
- A Python web scraper that uses BeautifulSoup to scrape college football statistics from NCAA.com.
- Pro Football Reference
- Pro Football Reference (College)
Basketball
- nbastatR
- An R web scraper that covers a handful of websites and data sources for the NBA.
- nbaTools
- An R web scraper that gathers information from NBA.com.
- ncaahoopR (College)
- An R package that scrapes college basketball statistics from ESPN.com. It was created by Luke Benz while he was an undergraduate at Yale.
- Pro Basketball Reference
- Pro Basketball Reference (College)
- YouTube Tutorial Video: Python Web Scrape NBA Players of the Week Data Into a CSV File | Python Web Scraping
Baseball
- MLB Statcast
- baseballR
- An R package developed by Bill Petti that scrapes websites such as FanGraphs.com and BaseballReference.com.
- mlb-scraper
- A Python package that uses BeautifulSoup to scrape player or team statistics from the Baseball Reference website.
- SABR: Society for American Baseball Research
- Sean Lahman’s Baseball Database
- Retrosheet
- Pro Baseball Reference
- Article: Scraping and Analyzing Baseball Data in R
Soccer
- fcscrapR
- An R package that scrapes soccer statistics from ESPN.com. It was created by Ron Yurko while he was a student at Carnegie Mellon University. It can scrape statistics for the World Cup, the UEFA Champions League, the UEFA Europa League, the English Premier League, and La Liga.
- StatsBomb
- Freely available data: locations of players on the pitch at any moment, defensive pressure statistics, and passing data.
- Transfermarkt
- Pro Soccer Reference
- Article: Scraping Major League Soccer Statistics
- Tutorial in the Python language.
Hockey
- Hockey-Scraper
- A Python web scraper that gathers information from the ESPN website for all NHL games since the 2007-08 season and all NWHL games since the 2015-16 season. Created by Harry Shomer.
- Pro Hockey Reference
Player Contract Statistics (All Sports)