Volume 22, Issue 1 (5-2025)                   JSDP 2025, 22(1): 25-38


Rastari S, Mohajjel kafshdooz M, Shamsi M. Resource-Aware Neural Architecture Search for Multicore Embedded Real-Time Systems. JSDP 2025; 22 (1) :25-38
URL: http://jsdp.rcisp.ac.ir/article-1-1418-en.html
Assistant Professor, Faculty of Electrical and Computer Engineering, Qom University of Technology, Qom, Iran
Abstract:
Designing neural networks by hand is a slow process based on trial and error. As the number of network parameters or layers increases, manual design becomes very expensive and the final result may be suboptimal. Neural architecture search algorithms are used to solve this problem. Recently, these algorithms have achieved high accuracy on datasets such as CIFAR-10, ImageNet, and Penn Treebank. They can search a wide space of architectures with different characteristics, such as network depth, width, connection pattern, and operations, in order to discover architectures with good accuracy. However, a traditional challenge of these algorithms is their high search time (approximately tens of thousands of GPU hours), which recent research has reduced to tens of hours. Another common challenge is their focus on improving network accuracy, while other criteria such as network speed and consumed resources are not taken into account. As a result, these methods cannot be used directly to find an optimal architecture for embedded systems, which have limited resources such as processing power, memory, and energy. Search methods are therefore needed that are aware of these limitations. Research has been done in this area in recent years, but existing methods do not focus specifically on coarse-grained multicore architectures that do not have a GPU.

In this article, we present a method for the automatic design of networks suited to running on multicore processors. In this gradient-descent-based method, a SuperNet with parallel paths of computational blocks is created, where the number of parallel paths is equal to or less than the number of cores. A set of decision variables selects the appropriate operation in each block of a path. In addition to the operations performed in each block, decisions are also made about synchronization points, so that intermediate results of the parallel paths can be shared to improve the network's accuracy. The decision variables (block type and synchronization points) are then trained jointly with the main network weights to select a suitable subnetwork. Because the approach is based on gradient descent, training is performed only twice to obtain the final network structure, so its execution time is much lower than that of methods based on evolutionary search and reinforcement learning. In addition, taking the constraints of the target system into account, such as the number of cores and memory consumption, can lead to a more suitable architecture than other methods. Experiments conducted on the CIFAR-10 dataset demonstrate that the proposed method achieves satisfactory accuracy with very little search time.
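To make the idea concrete, the following is a minimal sketch (not the authors' code) of a differentiable SuperNet of the kind described: each parallel path (one per core or fewer) is a chain of blocks whose operation is chosen by a softmax-weighted decision variable, and a learnable gate per stage decides how strongly the paths synchronize by mixing in their averaged intermediate results. The class names, hyperparameters, and candidate operations are illustrative assumptions, not taken from the paper, and the sketch uses PyTorch for brevity.

# Minimal differentiable-NAS sketch, assuming softmax-weighted block selection and
# sigmoid-gated synchronization between parallel paths. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_ops(channels):
    # Candidate operations for a block; the decision variable picks among them.
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1),               # 3x3 conv
        nn.Conv2d(channels, channels, 5, padding=2),               # 5x5 conv
        nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                      nn.Conv2d(channels, channels, 1)),           # depthwise separable conv
        nn.Identity(),                                             # skip connection
    ])

class MixedBlock(nn.Module):
    """One block of a path: output is the softmax-weighted mix of candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = make_ops(channels)
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))      # block-type decision variable

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

class SuperNet(nn.Module):
    """Parallel paths (at most the number of cores) with learnable synchronization gates."""
    def __init__(self, num_paths=4, num_stages=3, channels=16, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.paths = nn.ModuleList([
            nn.ModuleList([MixedBlock(channels) for _ in range(num_stages)])
            for _ in range(num_paths)
        ])
        # One gate per stage: how much each path mixes in the average of all paths.
        self.sync = nn.Parameter(torch.zeros(num_stages))          # synchronization decision variables
        self.head = nn.Linear(channels * num_paths, num_classes)

    def forward(self, x):
        x = self.stem(x)
        states = [x for _ in self.paths]
        for stage in range(len(self.paths[0])):
            states = [path[stage](s) for path, s in zip(self.paths, states)]
            g = torch.sigmoid(self.sync[stage])                    # 0 = independent paths, 1 = fully synced
            avg = torch.stack(states).mean(dim=0)
            states = [(1 - g) * s + g * avg for s in states]
        feats = torch.cat([s.mean(dim=(2, 3)) for s in states], dim=1)
        return self.head(feats)

if __name__ == "__main__":
    # Joint gradient-descent training of network weights and decision variables;
    # in practice the two groups may use separate optimizers and data splits.
    net = SuperNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.05, momentum=0.9)
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))   # CIFAR-10-shaped dummy batch
    loss = F.cross_entropy(net(x), y)
    loss.backward()
    opt.step()
    # After search, each block keeps argmax(alpha) and each stage keeps round(sigmoid(sync)).

After the search, discretizing the decision variables (keeping the highest-weight operation per block and rounding each synchronization gate) yields the final subnetwork that is retrained and deployed, which corresponds to the "train twice" procedure described in the abstract.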
Full-Text [PDF 1200 kb]
Type of Study: Research | Subject: Paper
Received: 2024/01/23 | Accepted: 2025/03/08 | Published: 2025/06/21 | ePublished: 2025/06/21


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
