Volume 21, Issue 1 (6-2024)   JSDP 2024, 21(1): 89-100




Etesam M, Sadeghi-Lotfabadi A, Ghiasi-Shirazi K. Designing L-BFGS inspired automatic optimizer network. JSDP 2024; 21 (1) : 7
URL: http://jsdp.rcisp.ac.ir/article-1-1142-en.html
Ferdowsi University of Mashhad
Abstract:
Features learned by machines are now in common use, and their quality is excellent in comparison with hand-designed features. Yet while many machine learning models have been developed to extract features automatically, optimization algorithms are still designed by hand. In this paper, we propose a method that casts the design of an optimization algorithm as a machine learning problem. This is the branch of machine learning known as meta-learning, or learning to learn.
Gradient-based optimization algorithms (e.g., gradient descent and BFGS) receive the gradient vector at each step and, using the information from previous points and gradients, estimate the update vector at the current point. The inputs and outputs of these algorithms are vectors whose dimension matches that of the optimization problem. These algorithms are built solely from vector addition, scalar multiplication, and inner-product operations. We can therefore say that they operate in a Hilbert space whose dimension is determined by the optimization problem. In this paper, we propose a novel method for learning to optimize over a Hilbert space of unknown dimensionality.
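As a concrete illustration (a minimal NumPy sketch, not code from this paper), the classical L-BFGS two-loop recursion below touches its iterates only through vector addition, scalar multiplication, and inner products; it never reads individual coordinates, so the same procedure applies unchanged in a space of any dimension.

    # Minimal sketch of the standard L-BFGS two-loop recursion.
    # It uses only inner products, vector addition, and scalar multiplication.
    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        """Return the search direction -H*grad.

        s_list: past iterate differences  s_i = x_{i+1} - x_i
        y_list: past gradient differences y_i = g_{i+1} - g_i
        (both ordered oldest to newest)
        """
        q = grad.copy()
        rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: newest to oldest.
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
            alpha = rho * np.dot(s, q)   # inner product only
            q -= alpha * y               # vector addition / scaling only
            alphas.append(alpha)
        # Initial Hessian approximation: gamma * identity.
        if s_list:
            gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
            q *= gamma
        # Second loop: oldest to newest.
        for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
            beta = rho * np.dot(y, q)
            q += (alpha - beta) * s
        return -q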
We introduce a new neural network module named Hilbert LSTM (HLSTM), which is based on a novel LSTM cell whose learning process is independent of the input data dimension. This independence results from restricting the network to operations on a Hilbert space, which prohibits it from working directly with the individual entries of a vector. To achieve this goal, we use a linear coefficients layer that linearly combines the input vectors using coefficients computed from their inner products. Training the network on the inner products between vectors leads to learning an optimization algorithm that is independent of the data dimension. Our experiments show that the proposed optimizer achieves better results than hand-designed algorithms.
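The sketch below is a hypothetical rendering of the linear-coefficients idea, not the published HLSTM implementation; the module name, layer sizes, and history length are illustrative assumptions. The layer sees its inputs only through their Gram matrix of inner products, so its parameter count depends on the number of history vectors, never on the problem dimension.

    # Hypothetical sketch of a dimension-independent linear coefficients layer.
    import torch
    import torch.nn as nn

    class LinearCoefficientsLayer(nn.Module):
        def __init__(self, num_vectors: int, hidden: int = 32):
            super().__init__()
            # Maps the flattened Gram matrix (num_vectors x num_vectors
            # inner products) to one mixing coefficient per input vector.
            self.mlp = nn.Sequential(
                nn.Linear(num_vectors * num_vectors, hidden),
                nn.Tanh(),
                nn.Linear(hidden, num_vectors),
            )

        def forward(self, vectors: torch.Tensor) -> torch.Tensor:
            # vectors: (num_vectors, dim), e.g. the current gradient plus
            # a few past gradients or update directions.
            gram = vectors @ vectors.T        # inner products only
            coeffs = self.mlp(gram.flatten()) # (num_vectors,)
            return coeffs @ vectors           # combination in the same space

    # Usage: the same layer works for a problem of any dimension.
    layer = LinearCoefficientsLayer(num_vectors=3)
    g = torch.randn(3, 10_000)
    update = layer(g)                          # shape: (10_000,)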
Article number: 7
Full-Text [PDF 852 kb]
Type of Study: Fundamental | Subject: Paper
Received: 2020/05/10 | Accepted: 2024/02/17 | Published: 2024/08/03 | ePublished: 2024/08/03

References
1. M. Andrychowicz et al., "Learning to learn by gradient descent by gradient descent," in Advances in neural information processing systems, 2016, pp. 3981-3989.
2. Y. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)," in Doklady AN USSR, 1983, vol. 269, pp. 543-547.
3. M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in IEEE international conference on neural networks, 1993: IEEE, pp. 586-591.
4. J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of machine learning research, vol. 12, no. Jul, pp. 2121-2159, 2011.
5. T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural networks for machine learning, vol. 4, no. 2, pp. 26-31, 2012.
6. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
7. J. Martens and R. Grosse, "Optimizing neural networks with kronecker-factored approximate curvature," in International conference on machine learning, 2015, pp. 2408-2417.
8. D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006. [DOI:10.1109/TIT.2006.871582]
9. D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, 1997. [DOI:10.1109/4235.585893]
10. J. Nocedal and S. J. Wright, "Numerical Optimization," Springer Science+Business Media, LLC, 2006.
11. S. Thrun and L. Pratt, "Learning to learn: Introduction and overview," in Learning to learn: Springer, 1998, pp. 3-17. [DOI:10.1007/978-1-4615-5529-2_1]
12. J. Schmidhuber, "Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook," Technische Universität München, 1987.
13. X. Chen et al., "Symbolic discovery of optimization algorithms," Advances in Neural Information Processing Systems, vol. 36, 2024.
14. T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 9, pp. 5149-5169, 2021. [DOI:10.1109/TPAMI.2021.3079209]
15. E. Gärtner, L. Metz, M. Andriluka, C. D. Freeman, and C. Sminchisescu, "Transformer-based learned optimization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11970-11979. [DOI:10.1109/CVPR52729.2023.01152]
16. L. S. Metz, N. Maheswaranathan, C. D. Freeman, B. Poole, and J. N. Sohl-Dickstein, "Training neural networks using learned optimizers," ed: Google Patents, 2022.
17. E. D. Cubuk, L. S. Metz, S. S. Schoenholz, and A. A. Merchant, "Optimization using learned neural network optimizers," ed: Google Patents, 2022.
18. B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, "Building machines that learn and think like people," Behavioral and brain sciences, vol. 40, 2017. [DOI:10.1017/S0140525X16001837]
19. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, "Meta-learning with memory-augmented neural networks," in International conference on machine learning, 2016, pp. 1842-1850.
20. J. Schmidhuber, "A neural network that embeds its own meta-levels," in IEEE International Conference on Neural Networks, 1993: IEEE, pp. 407-412. [DOI:10.1109/ICNN.1993.298591]
21. J. Schmidhuber, "Learning to control fast-weight memories: An alternative to dynamic recurrent networks," Neural Computation, vol. 4, no. 1, pp. 131-139, 1992. [DOI:10.1162/neco.1992.4.1.131]
22. J. Schmidhuber, J. Zhao, and M. Wiering, "Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement," Machine Learning, vol. 28, no. 1, pp. 105-130, 1997. [DOI:10.1023/A:1007383707642]
23. C. Daniel, J. Taylor, and S. Nowozin, "Learning step size controllers for robust neural network training," in Thirtieth AAAI Conference on Artificial Intelligence, 2016. [DOI:10.1609/aaai.v30i1.10187]
24. T. P. Runarsson and M. T. Jonsson, "Evolution and design of distributed learning rules," in Proceedings of the First IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks, 2000, pp. 59-63. [DOI:10.1109/ECNN.2000.886220]
25. Y. Bengio, S. Bengio, and J. Cloutier, "Learning a synaptic learning rule," Université de Montréal, Département d'informatique et de recherche opérationnelle, 1990. [DOI:10.1109/IJCNN.1991.155621]
26. N. E. Cotter and P. R. Conwell, "Fixed-weight networks can learn," in 1990 IJCNN International Joint Conference on Neural Networks, 1990: IEEE, pp. 553-559. [DOI:10.1109/IJCNN.1990.137898]
27. A. S. Younger, P. R. Conwell, and N. E. Cotter, "Fixed-weight on-line learning," IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 272-283, 1999. [DOI:10.1109/72.750553]
28. S. Hochreiter, A. S. Younger, and P. R. Conwell, "Learning to learn using gradient descent," in International Conference on Artificial Neural Networks, 2001: Springer, pp. 87-94. [DOI:10.1007/3-540-44668-0_13]
29. A. S. Younger, S. Hochreiter, and P. R. Conwell, "Meta-learning with backpropagation," in IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No. 01CH37222), 2001, vol. 3: IEEE. [DOI:10.1109/IJCNN.2001.938471]
30. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997. [DOI:10.1162/neco.1997.9.8.1735]



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
