What do you understand by the bias-variance trade-off?

When we work with a supervised machine learning algorithm, the model learns from the training data. It tries to estimate, as well as possible, the mapping function between the input variable (X) and the output variable (Y). This estimate of the target function produces a prediction error, which can be divided into bias error, variance error, and an irreducible error caused by noise in the data. Only the bias and variance errors depend on the model, and they can be explained as follows:

  • Bias Error: Bias is the prediction error introduced into the model by oversimplifying the problem, for example by using a model that is too simple to capture the true relationship. It is the difference between the model's average prediction and the actual output. Bias can be high or low:
    • High Bias: If the predicted values are far from the actual values, the model has high bias. Because of high bias, an algorithm may miss the relevant relationships between the input features and the target output, which is called underfitting.
    • Low Bias: If the predicted values are close to the actual values, the model has low bias.
  • Variance Error: Variance is the error caused by the model's sensitivity to small fluctuations in the training dataset. A model with high variance performs well on the training dataset but poorly on the test dataset. High variance causes overfitting, which means the model learns the noise in the training data along with the underlying pattern.
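The two error terms can be stated precisely for squared-error loss. Assuming the data are generated as y = f(x) + ε, where the noise ε has mean zero and variance σ², and taking expectations over random training sets and noise, the expected prediction error of a fitted model f̂ at a point x splits into three parts. This is the standard decomposition for squared loss only; other loss functions do not split this cleanly.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The σ² term comes from noise in the data and cannot be removed by any model; only the squared bias and the variance depend on the choice of model.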

Bias-variance trade-off:

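Ideally, a machine learning model should have both low bias and low variance, but the two usually move in opposite directions:

  • Changes that increase bias (for example, using a simpler model) tend to decrease variance.
  • Changes that increase variance (for example, using a more complex model) tend to decrease bias.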
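The opposing movement of bias and variance can be observed in a small simulation. The sketch below is an illustrative assumption, not a method from the source: it uses a hypothetical true function sin(2πx), Gaussian noise with standard deviation 0.2, 25 equally spaced training inputs, and polynomial models of increasing degree fitted with NumPy's `Polynomial.fit`. For each degree it refits the model on 200 fresh noisy training sets, then estimates the squared bias (how far the average prediction is from the true function) and the variance (how much predictions scatter across training sets). The helper names `true_f` and `bias_variance` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_f(x):
    """Hypothetical 'true' function the models try to learn."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_train=25, noise=0.2, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)  # fixed inputs; only the noise changes
    x_grid = np.linspace(0.0, 1.0, 50)        # points where the errors are measured
    preds = np.empty((n_trials, x_grid.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model
        y_train = true_f(x_train) + rng.normal(0.0, noise, n_train)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_grid)
    mean_pred = preds.mean(axis=0)
    # Squared bias: distance of the average prediction from the true function
    bias_sq = float(np.mean((mean_pred - true_f(x_grid)) ** 2))
    # Variance: spread of predictions across training sets, averaged over the grid
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (0, 1, 3, 5, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2={b:.4f}  variance={v:.4f}  bias^2+variance={b + v:.4f}")

best_degree = min(results, key=lambda d: sum(results[d]))
print("lowest bias^2 + variance at degree", best_degree)
```

Under these assumed settings, low degrees should show large squared bias (underfitting), higher degrees larger variance (overfitting), and the smallest sum at an intermediate degree; the exact numbers depend on the seed and noise level.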

Hence, finding the balance between bias and variance that minimizes the total prediction error is called the bias-variance trade-off. It is often illustrated with the bull's-eye diagram given below, which shows four cases of bias and variance:

[Figure: bull's-eye diagram of the four combinations of bias and variance]
  • If there is low bias and low variance, the predicted output is mostly close to the desired output.
  • If there is low bias and high variance, the predictions are close to the actual output on average but widely scattered, so the model is not consistent.
  • If there is high bias and low variance, the model is consistent, but the predicted results are far away from the actual output.
  • If there is high bias and high variance, the model is inconsistent and its predictions are also far from the actual values. This is the worst case of bias and variance.
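The four cases of the bull's-eye diagram can be mimicked numerically by treating each prediction as a shot at a target. In this hypothetical sketch the offsets, spreads, and number of shots are arbitrary choices, not values from the source. Squared bias is measured as the squared distance from the average shot to the centre, and variance as the average squared distance of the shots from their own average; the mean squared distance to the centre then equals their sum.

```python
import numpy as np

rng = np.random.default_rng(7)
target = np.zeros(2)  # centre of the bull's-eye (the actual value)

# Hypothetical cases: (offset of the average shot from the centre, spread of the shots)
cases = {
    "low bias, low variance": (0.0, 0.1),
    "low bias, high variance": (0.0, 1.0),
    "high bias, low variance": (2.0, 0.1),
    "high bias, high variance": (2.0, 1.0),
}

summary = {}
for name, (offset, spread) in cases.items():
    # 500 predictions ("shots"), shifted along the x-axis by the bias offset
    shots = target + np.array([offset, 0.0]) + rng.normal(0.0, spread, size=(500, 2))
    mean_shot = shots.mean(axis=0)
    # Squared bias: how far the average shot lands from the centre
    bias_sq = float(np.sum((mean_shot - target) ** 2))
    # Variance: how scattered the shots are around their own average
    variance = float(np.mean(np.sum((shots - mean_shot) ** 2, axis=1)))
    # Mean squared error of the shots with respect to the centre
    mse = float(np.mean(np.sum((shots - target) ** 2, axis=1)))
    summary[name] = (bias_sq, variance, mse)
    print(f"{name:26s} bias^2={bias_sq:.3f}  variance={variance:.3f}  mse={mse:.3f}")
```

The first case should give the smallest mean squared error and the last case the largest, matching the ordering described in the list.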