Data

Data Sources

Retrosheet

The data presented on this website were compiled from Retrosheet, which records individual MLB game information from 1871 to 2021. Game information for all of those seasons was combined into a single analytic dataset by extracting the relevant game characteristics and statistics. The online repository provides complete information for each game, covering both team statistics and individual player-level statistics. More than 125 characteristics of each MLB game are available for analysis. Primary variables of interest include:

  • Date
  • Ballpark
  • Attendance
  • Umpires
  • Team Statistics (runs, etc.)
  • Pitcher Statistics (number used, earned runs, walks, wild pitches, winning pitcher, etc.)
  • Batting Statistics (at-bats, hits, doubles, triples, home runs, runs, etc.)
  • Defensive Statistics (putouts, assists, errors, etc.)
  • Starting players and lineups
  • Game type (regular season vs. postseason)

Please visit https://www.retrosheet.org/gamelogs/index.html for full-year datasets of MLB game information.
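
As a rough illustration of how a season file from that page might be read, the sketch below parses game log rows with Python's standard csv module. It is not the code used to build the dataset on this website. It assumes the game log files are comma-delimited with no header row, and that the date, team codes, scores, ballpark ID, and attendance sit at the positions listed in ASSUMED_FIELDS; those positions, the helper name parse_gamelog, and the two sample rows (synthetic, with made-up team and park codes) are all illustrative and should be checked against Retrosheet's own field guide before use.

```python
import csv
import io

# ASSUMPTION: zero-based column positions in a game log row. The files are
# assumed to have no header row, so columns are addressed by position.
# Verify these against Retrosheet's field guide before relying on them.
ASSUMED_FIELDS = {
    "date": 0,
    "visiting_team": 3,
    "home_team": 6,
    "visiting_score": 9,
    "home_score": 10,
    "park_id": 16,
    "attendance": 17,
}


def parse_gamelog(text):
    """Parse comma-delimited game log text into a list of dicts (hypothetical helper)."""
    games = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        game = {name: row[i] for name, i in ASSUMED_FIELDS.items()}
        game["visiting_score"] = int(game["visiting_score"])
        game["home_score"] = int(game["home_score"])
        # Attendance may be blank; keep it as None rather than guessing.
        game["attendance"] = int(game["attendance"]) if game["attendance"] else None
        games.append(game)
    return games


# Synthetic rows for illustration only (not real games, teams, or parks).
SAMPLE = (
    '"20210401","0","Thu","AAA","NL",1,"BBB","NL",1,3,2,54,"D","","","","PARK01",10000,180\n'
    '"20210402","0","Fri","BBB","NL",2,"AAA","NL",2,1,5,51,"N","","","","PARK02",,175\n'
)

games = parse_gamelog(SAMPLE)
for g in games:
    print(g["date"], g["visiting_team"], g["visiting_score"],
          g["home_team"], g["home_score"], g["attendance"])
```

For a downloaded season file, the same helper could be applied to the file's contents, for example parse_gamelog(open(path, newline="").read()), where path points to wherever the file was saved. Reading columns by position keeps the sketch free of third-party dependencies, at the cost of having to confirm each position by hand.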