Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes

Mohammadi Kashani, Mahya; Amiri, Seyed Hamid

doi:10.52547/jsdp.18.4.49

Signal and Data Processing (JSDP), Volume 18, Issue 4 (12-1400), pages 49-68



Mohammadi Kashani M, Amiri S H. Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes. JSDP 2022; 18 (4) : 4
URL: http://jsdp.rcisp.ac.ir/article-1-1046-fa.html



Mahya Mohammadi Kashani, Seyed Hamid Amiri^*

Shahid Rajaee Teacher Training University

Abstract:

With the rapid growth in the number of images, fast indexing and search in large databases has become essential. An effective approach is to assign one or more labels to each image to describe its content. Although automatic annotation methods are effective, one of their main challenges is scalability as the image database grows. To address this challenge, this paper first derives suitable prototypes from visual descriptors of the images, which are extracted with deep learning networks. Semantic labels are then propagated from the training images to the prototypes through a label propagation procedure on a graph. The result is a set of labeled prototypes from which any test image can be annotated. For annotation, an approach based on adaptive thresholding is proposed. The proposed method reduces the training dataset to 22.6 percent of its original size, which speeds up annotation by a factor of at least 4.2. Annotation performance on several datasets, measured by precision, recall, and F1, is also maintained at an acceptable level.
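The pipeline outlined in the abstract can be illustrated with a small sketch. The abstract does not say how prototypes are selected, how the graph is built, or what form the adaptive threshold takes, so everything concrete here is an assumption made for illustration: plain k-means centroids stand in for the prototypes, a single similarity-weighted averaging step over nearest neighbors stands in for iterative graph label propagation, and the threshold keeps labels scoring at least `ratio` times the best score. The function names, toy features, and vocabulary are hypothetical and are not taken from the paper.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(features, k, iters=20, seed=0):
    """Plain k-means; centroids act as prototypes (an assumed stand-in)."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: dist2(f, centers[c]))
            groups[nearest].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def soft_labels(query, points, point_labels, n_neighbors, sigma=1.0):
    """One propagation step: similarity-weighted average of neighbor labels."""
    ranked = sorted(range(len(points)), key=lambda i: dist2(query, points[i]))
    nearest = ranked[:n_neighbors]
    weights = [math.exp(-dist2(query, points[i]) / (2 * sigma ** 2))
               for i in nearest]
    total = sum(weights) or 1.0  # guard against all weights underflowing to 0
    n_labels = len(point_labels[0])
    return [sum(w * point_labels[i][j] for w, i in zip(weights, nearest)) / total
            for j in range(n_labels)]

def annotate(query, prototypes, proto_labels, vocab, n_neighbors=1, ratio=0.5):
    """Assumed adaptive threshold: keep labels scoring >= ratio * best score."""
    scores = soft_labels(query, prototypes, proto_labels, n_neighbors)
    cutoff = ratio * max(scores)
    return [word for word, s in zip(vocab, scores) if s > 0 and s >= cutoff]

# Toy data: 2-D vectors stand in for deep visual descriptors.
vocab = ["sky", "sea", "tree"]
train_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
           [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]]
train_y = [[1, 1, 0], [1, 1, 0], [1, 0, 0],
           [0, 0, 1], [1, 0, 1], [0, 0, 1]]

# Summarize six training samples into two labeled prototypes.
prototypes = kmeans(train_x, k=2)
proto_labels = [soft_labels(p, train_x, train_y, n_neighbors=3)
                for p in prototypes]

# Annotate test images against the prototypes only.
tags_a = annotate([0.1, 0.1], prototypes, proto_labels, vocab)
tags_b = annotate([5.0, 5.0], prototypes, proto_labels, vocab)
print(tags_a, tags_b)
```

In this sketch a test image is compared with two prototypes instead of six training samples, which is the source of the speed-up the abstract describes; the actual reduction ratio and speed-up depend on the prototype selection method used in the paper.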

Article number: 4

Keywords: dataset summarization, image annotation, search-based method, scalability


Type of study: Applied | Subject: Image processing articles
Received: 1398/4/23 | Accepted: 1399/5/28 | Published: 1401/1/1 | Published online: 1401/1/1



Reuse information
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
