What is the Vanishing Gradient Problem in Artificial Neural Networks, and how can it be fixed?

1 Answer


The vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation. In these methods, each weight of the network receives an update proportional to the partial derivative of the error function with respect to that weight at every training iteration. When the gradient becomes vanishingly small, the update is effectively zero and the weight stops changing.
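
The update rule itself is simple, which is why a tiny gradient stalls learning. A minimal Python sketch (the learning rate, weight, and gradient values here are illustrative assumptions):

    # Gradient-descent update: the weight moves against the partial derivative
    # of the error with respect to that weight.
    learning_rate = 0.1
    weight = 0.5
    grad = 1e-7  # a vanishingly small gradient for this weight

    weight -= learning_rate * grad  # the step is ~1e-8, so the weight barely moves
    print(weight)                   # effectively still 0.5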

When the network has many hidden layers, the gradients reaching the earlier layers become very small, because the chain rule multiplies together the derivatives of every layer in between. As a result, the earlier layers learn very slowly, and in the worst case the network stops learning altogether. The problem arises precisely because the gradient shrinks dramatically as it is propagated backward through a deep stack of layers.
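
To see the effect numerically, here is a small sketch (a chain of plain sigmoid units with these starting values is an assumption for illustration): backpropagating through each sigmoid multiplies the gradient by its local derivative, which is at most 0.25, so the product shrinks roughly geometrically with depth.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x, grad = 0.5, 1.0
    for depth in range(1, 21):
        a = sigmoid(x)
        grad *= a * (1.0 - a)   # chain rule: multiply by the local sigmoid derivative (<= 0.25)
        x = a                   # feed the activation forward to the next layer
        if depth in (1, 5, 10, 20):
            print(f"gradient after passing back through {depth} layers: {grad:.2e}")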

Some ways to fix it are:

  1. Use skip/residual connections (see the sketch after this list).
  2. Use ReLU or Leaky ReLU instead of the sigmoid and tanh activation functions.
  3. Use architectures built to carry gradients across many time steps, such as GRUs and LSTMs.
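
As a hedged illustration of fixes 1 and 2, here is a minimal PyTorch sketch (the class name, layer sizes, and input shape are assumptions, not a fixed recipe): a block that uses ReLU instead of sigmoid and adds a skip connection, so the gradient has an identity path around the nonlinearities.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.fc1 = nn.Linear(dim, dim)
            self.fc2 = nn.Linear(dim, dim)

        def forward(self, x):
            out = torch.relu(self.fc1(x))  # ReLU passes the gradient through unchanged for positive inputs
            out = self.fc2(out)
            return torch.relu(out + x)     # skip connection: gradients also flow directly through x

    block = ResidualBlock()
    y = block(torch.randn(8, 64))
    print(y.shape)  # torch.Size([8, 64])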
