A Comparative Study of Optimization Algorithms in Deep Learning: SGD, Adam, and Beyond
DOI: https://doi.org/10.15680/IJCTECE.2025.0804003

Keywords: Deep Learning, Optimization Algorithms, Stochastic Gradient Descent (SGD), Adam, RMSprop, Adagrad, L-BFGS, Convergence Rate, Neural Networks, Gradient Descent

Abstract
Optimization algorithms play a critical role in training deep learning models, as they influence the convergence rate, accuracy, and stability of the learning process. Among the most popular optimization algorithms are Stochastic Gradient Descent (SGD) and its adaptive counterparts, such as Adam. While SGD has been widely used for years, Adam has gained significant popularity due to its adaptive learning rates and its ability to handle sparse gradients. However, the effectiveness of these algorithms varies with the problem domain, the dataset, and the architecture of the neural network. This paper conducts a comparative study of popular optimization algorithms used in deep learning, focusing primarily on SGD, Adam, and other emerging optimization techniques. We investigate the characteristics, advantages, and disadvantages of these algorithms, with particular attention to their convergence rates, robustness, and computational efficiency. The study also considers modern variants, such as RMSprop, Adagrad, and L-BFGS, which aim to improve upon the basic techniques by addressing issues such as vanishing gradients, overfitting, and slow convergence. Through a series of experiments on standard benchmark datasets, we analyze the performance of these optimizers on different deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The results highlight the conditions under which each algorithm excels and yield practical recommendations for selecting an optimizer based on specific problem requirements. These findings offer valuable insights for deep learning practitioners, providing a detailed comparison of the strengths and weaknesses of popular optimization algorithms, and guide future research on enhancing optimization techniques for deep learning models.
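For reference, the update rules at the center of this comparison can be written compactly. With parameters \theta_t, stochastic gradient g_t = \nabla_\theta L(\theta_t), learning rate \eta, decay rates \beta_1, \beta_2, and stability constant \epsilon, vanilla SGD and Adam (as defined in reference 1) update the parameters as

\theta_{t+1} = \theta_t - \eta \, g_t \qquad \text{(SGD)}

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \qquad \text{(Adam)}

The per-parameter scaling by \sqrt{\hat{v}_t} is what gives Adam its adaptive learning rate and its robustness to sparse gradients, at the cost of extra state (m_t, v_t) for every parameter.

As an illustrative sketch only (not the authors' experimental code), the following Python snippet shows how such an optimizer comparison can be set up in PyTorch; the tiny CNN, the synthetic data standing in for a benchmark dataset, and the learning rates are assumptions chosen for brevity.

import torch
import torch.nn as nn

def make_model():
    # Minimal CNN classifier for 28x28 single-channel inputs.
    return nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(8 * 28 * 28, 10),
    )

# Optimizer factories under comparison; learning rates are illustrative defaults.
optimizers = {
    "SGD":     lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "Adam":    lambda p: torch.optim.Adam(p, lr=1e-3),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "Adagrad": lambda p: torch.optim.Adagrad(p, lr=1e-2),
}

torch.manual_seed(0)
x = torch.randn(256, 1, 28, 28)       # synthetic stand-in for a benchmark dataset
y = torch.randint(0, 10, (256,))
loss_fn = nn.CrossEntropyLoss()

for name, make_opt in optimizers.items():
    model = make_model()
    opt = make_opt(model.parameters())
    for step in range(50):            # short run; real experiments train for full epochs
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"{name}: final training loss = {loss.item():.4f}")

In a full study each optimizer's learning rate would be tuned separately on a validation set, since conclusions about convergence speed are sensitive to this choice.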
References
1. Kingma, D. P., & Ba, J. "Adam: A Method for Stochastic Optimization". International Conference on Learning Representations (ICLR).
2. Duchi, J., Hazan, E., & Singer, Y. "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization". Journal of Machine Learning Research, 12, 2121–2159.
3. Ruder, S. "An Overview of Gradient Descent Optimization Algorithms". arXiv:1609.04747.
4. Schaul, T., Zhang, S., & LeCun, Y. "No More Pesky Learning Rates". International Conference on Machine Learning (ICML).
5. Vemulapalli, G., & Pulivarthy, P. "Integrating Green Infrastructure With AI-Driven Dynamic Workload Optimization: Focus on Network and Chip Design". In Integrating Blue-Green Infrastructure Into Urban Development, IGI Global, USA, pp. 397–422, 2025.
6. Fletcher, R. "Practical Methods of Optimization". Wiley-Interscience.