What Is Cross Validation in Machine Learning


Cross validation in machine learning is not just a technique for testing a model. It is a way of asking a harder question than a single test can answer.

A model tested once on one slice of data might look accurate because that slice happened to suit it. Cross validation tests the model repeatedly on different slices and asks whether performance holds across all of them. A model that does is more likely to perform well on data it has never seen. A model that does not is telling you something important before it ever reaches production.

Why a Single Test Is Not Enough

When you split a dataset into training and test sets once, evaluation depends entirely on that particular split. The training set is what the model learns from. The test set is what you use to measure how well it learned. That split is made once, and the result is a single number representing model performance.

The problem is that a single number drawn from a single split is sensitive to which examples ended up where. If the test set happened to contain examples that were easier to predict, performance looks better than it should. If it contained examples that were harder, performance looks worse. Either way, you are measuring the model and the split together, not the model alone.

This is not a hypothetical concern. A company building a model to screen job applicants split its dataset once, trained the model, and evaluated it on the held-out test set. Performance looked strong. When the model was applied to new applicants over the following quarter, it performed noticeably worse. A later review found that the original test set had, by chance, underrepresented applicants from certain roles that the model struggled to evaluate. The single split had produced a misleadingly optimistic result.

Cross validation reduces this risk by repeating the evaluation process across multiple splits and averaging the results. No single split determines the outcome.

How Cross Validation Works

The intuition behind cross validation is straightforward. If a model has genuinely learned something, it should perform reasonably well regardless of which examples it was trained on and which it was tested on. Cross validation checks this by changing the split repeatedly and seeing whether performance holds.

The most common approach is called k-fold cross validation. The dataset is divided into k equal parts, called folds. The model is trained on k-1 of those folds and tested on the remaining one. That process repeats k times, each time holding out a different fold as the test set. At the end, you have k performance measurements rather than one. Those measurements are averaged to produce a more stable estimate of how the model will perform on new data.

If k is set to five, the model is trained and tested five times, each time on a different 80/20 split of the data. The first fold is held out while the model trains on the other four. Then the second fold is held out while the model trains on the remaining four. And so on until every fold has served as the test set exactly once. No example is ever in the training set and the test set at the same time.
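The five-fold procedure above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available; the dataset is synthetic and the model choice (logistic regression) is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=500, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    # Train on four folds, test on the held-out fifth.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(scores)           # five measurements, one per fold
print(np.mean(scores))  # the averaged, more stable estimate
```

Each example appears in exactly one test fold, so no example is ever scored by a model that trained on it.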

If performance is consistent across all five folds, that consistency is meaningful. If performance varies significantly from fold to fold, that variation is also meaningful. It suggests the model is sensitive to which examples it sees during training, which is a signal that the model may be overfitting or that the dataset contains subgroups the model handles differently.

The choice of k involves a tradeoff. Larger values of k mean each training set is larger and closer to the full dataset, so the estimate is less likely to understate how the final model will perform, but the model must be trained more times, which costs more computation. Smaller values of k are faster, but each model trains on a smaller share of the data, so the estimate can be overly pessimistic and more dependent on how the data happened to be divided. Five and ten are the most common choices, and either works well for most problems.
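The tradeoff can be seen directly by evaluating the same model with different k. A sketch, again assuming scikit-learn and synthetic data; `cross_val_score` is a shortcut for the fold loop shown earlier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    # Mean is the performance estimate; the spread across folds
    # shows how sensitive the model is to the split.
    print(f"k={k}: mean={scores.mean():.3f} +/- {scores.std():.3f}")
```

With k=10 the loop trains twice as many models as with k=5, which is exactly the computational cost the paragraph above describes.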

Cross Validation, Overfitting, and Regularization

Cross validation does not prevent overfitting. It detects it.

When a model overfits, it has learned patterns specific to the training data that do not hold on new data. A single test set might not surface this clearly, particularly if the test set shares some of those specific patterns with the training set. Cross validation makes overfitting harder to miss because the model is evaluated on multiple different subsets. A model that has learned genuine patterns will perform consistently across folds. A model that has learned patterns specific to the training data will perform well on some folds and poorly on others.

Regularization addresses overfitting by constraining what the model is allowed to learn during training. Cross validation measures whether that constraint is working. The two techniques are often used together: regularization reduces overfitting, and cross validation provides the evidence that it has done so. Neither replaces the other. A model trained with regularization still needs to be evaluated, and cross validation is a more reliable way to do that evaluation than a single split.
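The pairing of the two techniques can be sketched concretely: a regularized model evaluated with cross validation at different regularization strengths. This assumes scikit-learn; ridge regression (L2 regularization) and the alpha values are illustrative choices, not a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data with more features than is comfortable
# for 200 samples, a setting where overfitting is a real risk.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0,
                       random_state=0)

for alpha in (0.1, 10.0):
    # Regularization constrains the model; cross validation
    # measures whether that constraint is helping.
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    print(f"alpha={alpha}: mean score = {scores.mean():.3f}")
```

The regularization happens inside `Ridge`; the evidence about whether it worked comes from the cross-validated scores, not from training performance.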

When to Use Cross Validation and When Not To

Cross validation is most useful when datasets are limited and when a reliable estimate of performance is needed before deployment. It is standard practice during model selection, where you are choosing between different model types, and during hyperparameter tuning, where you are adjusting the settings that control how a model learns. In both cases, decisions are being made based on performance estimates, and those estimates need to be trustworthy.
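Hyperparameter tuning with cross validation is common enough that libraries bundle the two. A sketch using scikit-learn's `GridSearchCV` (assumed available) on synthetic data; the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Every candidate value of C is scored with 5-fold cross validation,
# so the winning setting is chosen on a trustworthy estimate rather
# than a single lucky split.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)
```

Here cross validation is not the final evaluation step; it is the decision mechanism inside the tuning loop.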

An e-commerce company used cross validation to choose among several candidate models for predicting which customers were likely to abandon their carts. Each model was evaluated across ten folds. Two models that had looked similar on a single test split showed meaningfully different consistency across folds. The model with more consistent performance across folds was selected, and it held up better when deployed.

Cross validation is less useful when datasets are very large. With millions of examples, a single well-constructed split is usually sufficient and far less computationally expensive. It is also less appropriate when data has a time dimension. If you are predicting future events, training on data that includes the future and testing on the past produces misleading results. In those cases, a time-based split, where the model is always trained on earlier data and tested on later data, is more appropriate than standard k-fold cross validation.
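A time-based split can be sketched with scikit-learn's `TimeSeriesSplit` (assumed available). Each split trains only on rows that come before the test rows, which is what standard k-fold does not guarantee; the row order standing in for time here is just illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten observations, assumed to be ordered by time.
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede test indices, so the model
    # never sees the future while learning.
    print("train:", train_idx, "test:", test_idx)
```

Unlike k-fold, the folds here are not interchangeable: later splits train on more history, mirroring how the model would actually be used.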

Where Cross Validation Fits

Cross validation sits at the point where a model stops being built and starts being evaluated. It does not change what the model learns. It changes how confidently you can say the model has learned something real.

Used alongside regularization and careful data collection, cross validation in machine learning closes the gap between a model that performs well on your data and one that performs well on data it has never seen.

This post is part of a series on why machine learning models fail in production and how to diagnose them. For more information:

What Is Overfitting in Machine Learning
What Regularization Does and Why Your Model Needs It
Bias vs. Variance: Why Your ML Model Can’t Have It All
