1. M. Andrychowicz et al., "Learning to learn by gradient descent by gradient descent," in Advances in neural information processing systems, 2016, pp. 3981-3989.
2. Y. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)," in Doklady AN USSR, 1983, vol. 269, pp. 543-547.
3. M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in IEEE international conference on neural networks, 1993: IEEE, pp. 586-591.
4. J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of machine learning research, vol. 12, no. Jul, pp. 2121-2159, 2011.
5. T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural networks for machine learning, vol. 4, no. 2, pp. 26-31, 2012.
6. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
7. J. Martens and R. Grosse, "Optimizing neural networks with Kronecker-factored approximate curvature," in International conference on machine learning, 2015, pp. 2408-2417.
8. D. L. Donoho, "Compressed sensing," IEEE Transactions on information theory, vol. 52, no. 4, pp. 1289-1306, 2006. DOI: 10.1109/TIT.2006.871582
9. D. H. Wolpert and W. G. Macready, "No free lunch theorems for optimization," IEEE transactions on evolutionary computation, vol. 1, no. 1, pp. 67-82, 1997. DOI: 10.1109/4235.585893
10. J. Nocedal and S. J. Wright, Numerical Optimization. Springer Science+Business Media, LLC, 2006.
11. S. Thrun and L. Pratt, "Learning to learn: Introduction and overview," in Learning to learn: Springer, 1998, pp. 3-17. DOI: 10.1007/978-1-4615-5529-2_1
12. J. Schmidhuber, "Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook," Technische Universität München, 1987.
13. X. Chen et al., "Symbolic discovery of optimization algorithms," Advances in Neural Information Processing Systems, vol. 36, 2024.
14. T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 9, pp. 5149-5169, 2021. DOI: 10.1109/TPAMI.2021.3079209
15. E. Gärtner, L. Metz, M. Andriluka, C. D. Freeman, and C. Sminchisescu, "Transformer-based learned optimization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 11970-11979. DOI: 10.1109/CVPR52729.2023.01152
16. L. S. Metz, N. Maheswaranathan, C. D. Freeman, B. Poole, and J. N. Sohl-Dickstein, "Training neural networks using learned optimizers," ed: Google Patents, 2022.
17. E. D. Cubuk, L. S. Metz, S. S. Schoenholz, and A. A. Merchant, "Optimization using learned neural network optimizers," ed: Google Patents, 2022.
18. B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, "Building machines that learn and think like people," Behavioral and brain sciences, vol. 40, 2017. DOI: 10.1017/S0140525X16001837
19. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, "Meta-learning with memory-augmented neural networks," in International conference on machine learning, 2016, pp. 1842-1850.
20. J. Schmidhuber, "A neural network that embeds its own meta-levels," in IEEE International Conference on Neural Networks, 1993: IEEE, pp. 407-412. DOI: 10.1109/ICNN.1993.298591
21. J. Schmidhuber, "Learning to control fast-weight memories: An alternative to dynamic recurrent networks," Neural Computation, vol. 4, no. 1, pp. 131-139, 1992. DOI: 10.1162/neco.1992.4.1.131
22. J. Schmidhuber, J. Zhao, and M. Wiering, "Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement," Machine Learning, vol. 28, no. 1, pp. 105-130, 1997. DOI: 10.1023/A:1007383707642
23. C. Daniel, J. Taylor, and S. Nowozin, "Learning step size controllers for robust neural network training," in Thirtieth AAAI Conference on Artificial Intelligence, 2016. DOI: 10.1609/aaai.v30i1.10187
24. T. P. Runarsson and M. T. Jonsson, "Evolution and design of distributed learning rules," in 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks, 2000: IEEE, pp. 59-63. DOI: 10.1109/ECNN.2000.886220
25. Y. Bengio, S. Bengio, and J. Cloutier, "Learning a synaptic learning rule," Université de Montréal, Département d'informatique et de recherche opérationnelle, 1990. DOI: 10.1109/IJCNN.1991.155621
26. N. E. Cotter and P. R. Conwell, "Fixed-weight networks can learn," in 1990 IJCNN International Joint Conference on Neural Networks, 1990: IEEE, pp. 553-559. DOI: 10.1109/IJCNN.1990.137898
27. A. S. Younger, P. R. Conwell, and N. E. Cotter, "Fixed-weight on-line learning," IEEE Transactions on Neural Networks, vol. 10, no. 2, pp. 272-283, 1999. DOI: 10.1109/72.750553
28. S. Hochreiter, A. S. Younger, and P. R. Conwell, "Learning to learn using gradient descent," in International Conference on Artificial Neural Networks, 2001: Springer, pp. 87-94. DOI: 10.1007/3-540-44668-0_13
29. A. S. Younger, S. Hochreiter, and P. R. Conwell, "Meta-learning with backpropagation," in IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No. 01CH37222), 2001, vol. 3: IEEE. DOI: 10.1109/IJCNN.2001.938471
30. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997. DOI: 10.1162/neco.1997.9.8.1735