What may happen if you set the momentum hyperparameter too close to 1 (e.g., 0.99999) when using an SGD optimizer?
When using an SGD optimizer, if you set the momentum hyperparameter too close to 1 (e.g., 0.99999), the algorithm may oscillate around the optimum instead of settling on it. With momentum this high, the accumulated velocity decays very slowly, so the optimizer keeps moving in its previous direction even after that direction no longer points toward the optimum, and it overshoots repeatedly. This can …
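The effect can be seen on a toy problem. The following is a minimal sketch, not a reference implementation: it assumes a one-dimensional objective f(x) = x**2, the classical momentum update (velocity = momentum * velocity - lr * gradient), and an arbitrary learning rate of 0.1. The function names `sgd_momentum_1d` and `tail_amplitude` are hypothetical helpers introduced only for this illustration.

```python
def sgd_momentum_1d(momentum, lr=0.1, steps=500, x0=1.0):
    """Minimize f(x) = x**2 with gradient descent plus classical momentum.

    Returns every x visited, starting with x0. The objective, learning
    rate, and step count are illustrative assumptions.
    """
    x, v = x0, 0.0
    history = [x]
    for _ in range(steps):
        grad = 2.0 * x                  # gradient of f(x) = x**2
        v = momentum * v - lr * grad    # old velocity, scaled by momentum, plus the new step
        x = x + v
        history.append(x)
    return history


def tail_amplitude(history, window=100):
    """Largest distance from the optimum (x = 0) over the last `window` steps."""
    return max(abs(x) for x in history[-window:])


if __name__ == "__main__":
    for momentum in (0.9, 0.99999):
        amplitude = tail_amplitude(sgd_momentum_1d(momentum))
        print(f"momentum={momentum}: tail amplitude = {amplitude:.3e}")
```

Under these assumptions, the run with momentum 0.9 should end very close to the optimum, while the run with momentum 0.99999 should still be swinging back and forth across it with an amplitude comparable to its starting distance, because almost none of the velocity is lost between steps.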