Baseball has always been a numbers game. Batting averages, earned run averages, and RBIs have been part of the sport for over a century. But since 2015, MLB Statcast and machine learning have pushed that tradition much further. Statcast now generates up to seven terabytes of data per game, and ML models turn that raw data into everything from pitch classifications to automated strike zone calls.
Statcast is MLB’s tracking system, now powered by Hawk-Eye Innovations cameras installed in all 30 ballparks and running on Google Cloud. It tracks the speed, spin, and trajectory of every pitch. It measures exit velocity and launch angle on every batted ball. It captures the sprint speed of every baserunner and the route efficiency of every outfielder.
The interesting part isn’t the data volume. It’s what the machine learning models do with it.
Machine Learning for Pitch Classification
Every pitcher throws multiple pitch types: fastball, slider, curveball, changeup, cutter, and variations of each. Statcast uses neural networks trained on each individual pitcher’s delivery to automatically classify every pitch thrown in every game.
The models learn each pitcher’s repertoire by analyzing spin rate, spin axis, velocity, and movement. A pitch that breaks 14 inches horizontally with 2,400 RPM of spin gets classified differently than one that drops 6 inches vertically at 2,800 RPM. This classification happens in near-real time and feeds directly into broadcasts, the MLB app, and the public-facing Baseball Savant website.
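To make the idea concrete, here is a minimal sketch of per-pitcher classification. It is not Statcast's model (which the article describes as neural networks); it swaps in a much simpler nearest-centroid rule. The repertoire values, the feature scales, and the name `classify_pitch` are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical repertoire for one pitcher: the typical
# (velocity mph, spin rpm, horizontal break in, vertical drop in)
# of each pitch type. Values are made up for illustration.
REPERTOIRE = {
    "four_seam": (96.0, 2300.0, 6.0, 12.0),
    "slider": (86.0, 2400.0, 14.0, 36.0),
    "curveball": (80.0, 2800.0, 8.0, 55.0),
    "changeup": (87.0, 1700.0, 13.0, 30.0),
}

# Rough per-feature scales so that rpm does not dominate mph or inches.
SCALES = (5.0, 300.0, 5.0, 10.0)


def classify_pitch(features, repertoire=REPERTOIRE, scales=SCALES):
    """Return the pitch type whose centroid is nearest to `features`."""

    def distance(centroid):
        # Scaled Euclidean distance between the pitch and a centroid.
        return sqrt(
            sum(((f - c) / s) ** 2 for f, c, s in zip(features, centroid, scales))
        )

    return min(repertoire, key=lambda name: distance(repertoire[name]))
```

Calling `classify_pitch((85.5, 2400.0, 14.0, 35.0))` picks the slider centroid from this made-up repertoire. A production model would learn these boundaries from each pitcher's tracked pitches rather than from hand-entered averages.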
Classifying pitches used to be a manual job. Now machine learning handles it automatically for every pitch of a 162-game season across 30 teams.
What Teams See
All 30 clubs get access to the same Statcast data through APIs. The difference is what they build on top of it.
Teams train proprietary machine learning models on Statcast data to evaluate players, plan strategy, and manage workloads. A hitting coach can see that a batter’s exit velocity drops when facing pitches above 95 mph with high spin, and adjust the training plan accordingly. A front office can evaluate minor league prospects using metrics that didn’t exist ten years ago.
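The hitting-coach example amounts to splitting a batter's batted balls by pitch characteristics and comparing the averages. The sketch below shows that split in plain Python. The field names (`pitch_mph`, `spin_rpm`, `exit_velocity_mph`), the spin threshold, and the sample rows are hypothetical and do not reflect the actual Statcast API schema.

```python
from statistics import mean


def exit_velocity_split(batted_balls, velo_threshold=95.0, spin_threshold=2400.0):
    """Return (mean exit velocity vs. hard high-spin pitches, mean vs. the rest)."""
    tough, rest = [], []
    for ball in batted_balls:
        is_tough = (
            ball["pitch_mph"] > velo_threshold
            and ball["spin_rpm"] >= spin_threshold
        )
        (tough if is_tough else rest).append(ball["exit_velocity_mph"])
    # None marks an empty bucket instead of inventing an average.
    return (mean(tough) if tough else None, mean(rest) if rest else None)


# Made-up batted balls, for illustration only.
SAMPLE = [
    {"pitch_mph": 97.1, "spin_rpm": 2500, "exit_velocity_mph": 84.0},
    {"pitch_mph": 96.4, "spin_rpm": 2450, "exit_velocity_mph": 86.0},
    {"pitch_mph": 92.0, "spin_rpm": 2200, "exit_velocity_mph": 95.0},
    {"pitch_mph": 88.5, "spin_rpm": 2600, "exit_velocity_mph": 97.0},
]
```

On the sample rows, `exit_velocity_split(SAMPLE)` gives a lower average against the hard, high-spin pitches than against everything else. That gap is the kind of signal a coach would follow up on, ideally with far more than four batted balls.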
Speed matters too. The average time between pitches is 23 seconds. Teams that can process and surface insights faster than their opponents within that window have a real edge. Some organizations have built streaming data pipelines to run machine learning models pitch-by-pitch in real time, using platforms like Databricks to handle the volume.
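Pitch-by-pitch processing means updating a small piece of state as each pitch arrives, rather than recomputing everything after the game. The sketch below uses plain Python as a stand-in for a streaming platform. It keeps a rolling mean of a pitcher's velocity and flags a sustained drop, one possible workload signal. The window size, the threshold, and all names are assumptions.

```python
from collections import deque


class RollingVelocity:
    """Tracks the mean velocity of the most recent `window` pitches."""

    def __init__(self, window=10):
        self.recent = deque(maxlen=window)

    def update(self, velocity_mph):
        """Record one pitch and return the current rolling mean."""
        self.recent.append(velocity_mph)
        return sum(self.recent) / len(self.recent)


def velocity_drop_alerts(pitches, baseline_mph, max_drop_mph=1.5, window=10):
    """Yield (pitch_number, rolling_mean) when a full window sits too far below baseline."""
    tracker = RollingVelocity(window)
    for number, velocity in enumerate(pitches, start=1):
        rolling = tracker.update(velocity)
        window_full = len(tracker.recent) == window
        if window_full and baseline_mph - rolling > max_drop_mph:
            yield number, rolling
```

Because each update touches only a fixed-size window, the work per pitch stays constant, which is what makes it feasible to finish well inside the gap between pitches.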
How Statcast Machine Learning Powers Broadcasts
Statcast data also powers what viewers see during broadcasts.
The strike zone box overlaid on your TV screen, the projected home run distance, the catch probability percentage that appears as an outfielder sprints toward a fly ball: all Statcast. In 2024, ESPN launched a Statcast AI alternate broadcast for Sunday Night Baseball that layered real-time analytics, win probabilities, and pitcher-vs-batter matchup histories directly into the stream, powered by Google Cloud’s AI models.
The MLB App uses machine learning to personalize content for each fan: curated highlights, relevant player stats, and game analyses tailored to their favorite teams and players. Baseball Savant makes Statcast data publicly available in a way no other major sport matches, with interactive visualizations like 3D pitch movement charts that anyone can explore.
That level of public access is unusual. In the NBA, tracking data feeds into team analytics and broadcast augmentation, but the raw data isn’t nearly as open. Baseball fans have always been stats people, so the audience was already there.
The Automated Strike Zone
The most visible machine learning application in baseball right now is the Automated Ball-Strike System, or ABS. During 2025 spring training, MLB tested a challenge system: teams could challenge human umpire ball-and-strike calls, with the automated system making the final ruling.
The technology is essentially Statcast applied to officiating. The same Hawk-Eye cameras that track pitch movement also determine whether a pitch crossed the strike zone. Early data showed that about 52% of challenges resulted in overturned calls, meaning even experienced umpires were wrong on roughly half of the disputed pitches. A separate analysis found that human umpires get approximately 6 to 11% of all ball-strike calls wrong over the course of a full game.
The system added an average of only 17 seconds per challenge. Most fans seem fine with it, and MLB is expected to expand it beyond spring training.
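Once the cameras have located the pitch, the ruling itself is a geometry check. The sketch below is a simplified illustration, not MLB's specification. It assumes a two-dimensional zone, takes each batter's zone top and bottom as inputs, and treats a pitch as a strike if any part of the ball touches the zone. The 17-inch plate width is standard; the ball radius is an approximation, and the function names are hypothetical.

```python
PLATE_WIDTH_IN = 17.0  # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45  # approximate baseball radius (assumption)


def is_strike(px_in, pz_in, zone_bottom_in, zone_top_in):
    """px_in: horizontal offset from the plate's center line, in inches.
    pz_in: height of the ball's center above the ground, in inches."""
    # Assumed rule: the ball only needs to clip the edge of the zone.
    half_width = PLATE_WIDTH_IN / 2 + BALL_RADIUS_IN
    in_width = abs(px_in) <= half_width
    in_height = (
        zone_bottom_in - BALL_RADIUS_IN <= pz_in <= zone_top_in + BALL_RADIUS_IN
    )
    return in_width and in_height


def resolve_challenge(umpire_call, px_in, pz_in, zone_bottom_in, zone_top_in):
    """Return (final_call, overturned) for a challenged pitch."""
    tracked = "strike" if is_strike(px_in, pz_in, zone_bottom_in, zone_top_in) else "ball"
    return tracked, tracked != umpire_call
```

Under these assumptions, a pitch called a ball whose center crosses 9.5 inches off the plate's center line would be overturned, since the edge of the ball still clips the zone. Pitches this close to the boundary are the ones most likely to be challenged.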
Same Data, Different Applications
Like the NBA’s tracking system, Statcast demonstrates a pattern common in real-world machine learning deployments: one data collection infrastructure serves multiple purposes. The same cameras and sensors that help teams build proprietary scouting models also power fan-facing broadcast graphics, drive the MLB App’s personalization, enable the automated strike zone, and give the public access to analytics through Baseball Savant.
One data pipeline feeding many different machine learning models for many different audiences. That’s how most production systems work outside of sports too.
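The one-pipeline, many-consumers pattern can be sketched as a simple fan-out: each tracked pitch is published once, and every downstream application receives its own copy. This is a toy illustration, not a description of MLB's architecture. `PitchFeed`, the consumer functions, and the event fields are all hypothetical.

```python
class PitchFeed:
    """Delivers each published pitch event to every subscribed consumer."""

    def __init__(self):
        self._consumers = []

    def subscribe(self, consumer):
        """Register a callable that accepts one pitch event (a dict)."""
        self._consumers.append(consumer)

    def publish(self, pitch):
        """Hand the same event to every consumer, in subscription order."""
        for consumer in self._consumers:
            consumer(pitch)


def build_demo():
    """Wire one feed to two stand-in consumers and return all three."""
    broadcast_graphics, scouting_rows = [], []
    feed = PitchFeed()
    # A broadcast consumer formats the event for on-screen display.
    feed.subscribe(
        lambda p: broadcast_graphics.append(f"{p['pitch_type']} {p['mph']:.1f} mph")
    )
    # A scouting consumer stores the raw values for later modeling.
    feed.subscribe(
        lambda p: scouting_rows.append((p["pitcher"], p["pitch_type"], p["mph"]))
    )
    return feed, broadcast_graphics, scouting_rows
```

The collection side never needs to know who is listening. Adding a new application is one more `subscribe` call, which is why a single tracking system can keep picking up new uses.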
For more on MLB’s AI partnership with Google Cloud, see the Google Cloud blog post on how Statcast data is being used to surface insights faster.
