Why do we need a validation set and test set? What is the difference between them?

1 Answer


When training a model, we divide the available data into three separate sets:

The training dataset is used for fitting the model's parameters. However, the accuracy achieved on the training set is not a reliable predictor of how accurate the model will be on new samples, because the model has already seen those examples.

The validation dataset is used to measure how well the model does on examples that weren't part of the training dataset. The metrics computed on the validation data can be used to tune the hyperparameters of the model. However, every time we evaluate the model on the validation data and make decisions based on those scores, we leak information from the validation data into our model. The more evaluations, the more information is leaked. We can therefore end up overfitting to the validation data, and once again the validation score won't be a reliable predictor of the model's behaviour in the real world.

The test dataset is used to measure how well the model does on previously unseen examples. It should be used only once, after the hyperparameters have been tuned using the validation set.
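The three-way split described above can be sketched as follows. This is a minimal illustration using shuffled indices; the 60/20/20 ratio and the dataset size are illustrative choices, not prescriptions from the answer.

```python
import numpy as np

# Hypothetical dataset of 100 samples; the indices stand in for real data.
rng = np.random.default_rng(0)
indices = rng.permutation(100)

# 60/20/20 split into train / validation / test (ratio is illustrative).
train_idx = indices[:60]
val_idx = indices[60:80]
test_idx = indices[80:]

# The three sets are disjoint, so no example used for fitting or
# hyperparameter tuning ever appears in the test set.
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```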

So if we omit the test set and use only a validation set, the validation score won't be a good estimate of the model's generalization performance.
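The hyperparameter-tuning role of the validation set can be sketched with a toy example: fitting polynomials of several degrees to noisy data and keeping the degree with the lowest validation error. The data, degrees, and split sizes here are all hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

# Toy data: y = x^2 plus noise (illustrative, not from the answer above).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = x**2 + rng.normal(0, 0.05, 60)

# Simple train/validation split (no test set in this tuning sketch).
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

# Treat the polynomial degree as the hyperparameter to tune:
# fit on the training set, score on the validation set.
best_degree, best_err = None, float("inf")
for degree in (1, 2, 5, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if err < best_err:
        best_degree, best_err = degree, err

print(best_degree, best_err)
```

Each pass through this loop uses the validation score to make a decision, which is exactly the information leakage the answer warns about; after enough such decisions, only a held-out test set gives an honest estimate.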
