Volume 22, Issue 1 (Spring 2025), Pages 113-141



Haji-Esmaeili M M, Montazer G. A Critical Survey on Content-Based & Semantic Image Retrieval. JSDP 2025; 22 (1): 113-141
URL: http://jsdp.rcisp.ac.ir/article-1-1432-fa.html


Professor, Department of Information Technology Engineering, Tarbiat Modares University, Tehran, Iran
Abstract   (206 Views)
The number, variety, and complexity of image content in the digital world are growing rapidly, making the need to design and implement image search and retrieval systems increasingly pressing. We now face web-scale image data for which conventional approaches based on manually assigned, human-generated metadata cannot keep up with the sheer volume and diversity. Without fast and accurate methods for understanding and retrieving it, the enormous volume of data produced on the web will sink into permanent digital archives, never to be found again. In recent years, considerable effort has gone into retrieving such images, especially in content-based image retrieval (CBIR) and semantic image retrieval (SIR). Content-based and semantic image retrieval systems can search for and retrieve images based on their intrinsic content and high-level human semantics, rather than on whatever metadata may have been recorded alongside them. This article provides a comprehensive review of recent advances in content-based image retrieval and, taking a critical stance, discusses the strengths and weaknesses of each major research direction in the field. It presents an overall framework for the retrieval process and surveys progress in areas including image preprocessing, feature extraction and embedding (Feature Embedding), machine learning, the field's prominent datasets, similarity matching, and performance evaluation. Finally, it outlines original research directions, open challenges, and recommendations for advancing future work in this area.
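As a concrete illustration of the feature-embedding and similarity-matching steps the abstract describes, the sketch below retrieves images by cosine similarity between embedding vectors. This is a minimal, hypothetical example, not the paper's method: the 4-dimensional vectors stand in for embeddings a real system would obtain from a learned feature extractor (e.g., a CNN or CLIP-style encoder), and the function names are illustrative.

```python
import numpy as np

def cosine_similarity(q, db):
    """Cosine similarity between a query vector and each database vector."""
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return db @ q

def retrieve(query_vec, db_vecs, k=3):
    """Return indices of the k database images most similar to the query."""
    sims = cosine_similarity(query_vec, db_vecs)
    return np.argsort(-sims)[:k]

# Toy database of 5 hypothetical 4-D image embeddings.
db = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(retrieve(query, db))  # → [0 1 4]
```

Real systems replace the brute-force scan with an approximate nearest-neighbor index, but the ranking principle — embed once, compare by a vector similarity measure — is the same.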
 
Full Text [PDF 2405 kb]   (104 Downloads)
Article Type: Research | Subject: Image Processing
Received: 1403/4/23 | Accepted: 1403/12/25 | Published: 1404/3/31 | Electronic Publication: 1404/3/31 (Solar Hijri calendar)




Republishing
This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
