Unsupervised Semantic Segmentation of RGB-D Images Using Combination of Conditional Random Field with Graph Cuts

Mirkamali, Seyed Saeed

Signal and Data Processing (JSDP): a scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Advanced Technologies.


Volume 22, Issue 4 (12-1404), pages 39-52


Mirkamali S. Unsupervised Semantic Segmentation of RGB-D Images Using Combination of Conditional Random Field with Graph Cuts. JSDP 2026; 22 (4) : 3
URL: http://jsdp.rcisp.ac.ir/article-1-1448-fa.html

Mirkamali Seyed Saeed. Unsupervised Semantic Segmentation of RGB-D Images Using Combination of Conditional Random Field with Graph Cuts. Signal and Data Processing. 1404 (Solar Hijri); 22 (4): 39-52



Seyed Saeed Mirkamali (corresponding author)

Assistant Professor, Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran

Abstract

The goal of semantic segmentation is to assign an appropriate label to the set of pixels that make up each object in an image, based on the object's appearance and semantic characteristics. It is one of the most challenging tasks in image processing and machine vision and has received considerable attention from the machine vision community in recent years. This paper presents a layer-by-layer method for the semantic segmentation of RGB-D images. The proposed algorithm integrates appearance features and depth information in an unsupervised conditional random field (CRF) model. It uses graph cuts to optimize the labeling, which divides a scene into coherent, meaningful layers. The method is evaluated quantitatively and qualitatively on two different datasets, each with its own distinctive characteristics. Its results are compared with those of eight other supervised and unsupervised semantic segmentation methods. The analysis shows that the unsupervised CRF can be as accurate as supervised methods and, in many cases, can even outperform the other segmentation methods.
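The abstract describes the optimization only at a high level. What follows is a minimal sketch of the general idea, not the author's implementation. In it, one layer-extraction step is posed as a binary CRF over (r, g, b, depth) features and minimized exactly with a single s-t minimum cut. The step is then repeated on the remaining pixels to peel layers from front to back.

Everything specific here is an assumption made for illustration:
- the feature layout, with depth as the last component;
- unary costs taken as squared distances to two cluster means, found by an unsupervised 2-means seeded at the nearest and farthest pixels;
- contrast-sensitive Potts pairwise weights on a 4-neighborhood;
- the hypothetical functions `front_layer` and `layer_by_layer`;
- the parameters `lam` and `sigma`.

Max-flow is computed with Edmonds-Karp so that the example needs only the Python standard library.

```python
# Hypothetical sketch (not the paper's method): unsupervised binary CRF on
# (r, g, b, depth) features, minimized exactly with one s-t min cut and
# applied repeatedly to peel layers from front to back.
import math
from collections import deque


def min_cut_source_side(n, edges, s, t):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum s-t cut."""
    cap = [dict() for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual arc
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # nodes still reachable from s in the residual graph
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(feats, iters=10):
    """Unsupervised 2-means seeded with the nearest and farthest pixel (depth is the last feature)."""
    mu = [list(min(feats, key=lambda f: f[-1])), list(max(feats, key=lambda f: f[-1]))]
    for _ in range(iters):
        groups = [[], []]
        for f in feats:
            groups[0 if sq_dist(f, mu[0]) <= sq_dist(f, mu[1]) else 1].append(f)
        for k in (0, 1):
            if groups[k]:
                mu[k] = [sum(col) / len(groups[k]) for col in zip(*groups[k])]
    if mu[0][-1] > mu[1][-1]:
        mu.reverse()  # keep mu[0] as the nearer (front) cluster
    return mu


def front_layer(feats, coords, lam=2.0, sigma=0.5):
    """Label each pixel front (True) or rest (False) by minimizing a binary CRF energy."""
    mu = two_means(feats)
    n = len(feats)
    s, t = n, n + 1
    edges = []
    for p, f in enumerate(feats):
        edges.append((s, p, sq_dist(f, mu[1])))  # cut when p is "rest": unary D_p(rest)
        edges.append((p, t, sq_dist(f, mu[0])))  # cut when p is "front": unary D_p(front)
    index = {c: i for i, c in enumerate(coords)}
    for p, (x, y) in enumerate(coords):
        for q in (index.get((x + 1, y)), index.get((x, y + 1))):  # 4-neighborhood
            if q is not None:
                # Contrast-sensitive Potts weight: cheap to cut between dissimilar pixels.
                w = lam * math.exp(-sq_dist(feats[p], feats[q]) / (2 * sigma ** 2))
                edges.append((p, q, w))
                edges.append((q, p, w))
    source_side = min_cut_source_side(n + 2, edges, s, t)
    return [p in source_side for p in range(n)]


def layer_by_layer(image, num_layers, **crf):
    """image[y][x] = (r, g, b, depth); returns a label map with layer 0 nearest."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    remaining = [(x, y) for y in range(h) for x in range(w)]
    for layer in range(num_layers - 1):
        if not remaining:
            break
        is_front = front_layer([image[y][x] for x, y in remaining], remaining, **crf)
        for (x, y), front in zip(remaining, is_front):
            if front:
                labels[y][x] = layer
        remaining = [c for c, front in zip(remaining, is_front) if not front]
    for x, y in remaining:
        labels[y][x] = num_layers - 1  # whatever is left forms the last layer
    return labels


if __name__ == "__main__":
    R, G, B = (1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 2.5), (0.0, 0.0, 1.0, 3.0)
    for row in layer_by_layer([[R, R, G, G, B, B], [R, R, G, G, B, B]], 3):
        print(row)  # prints [0, 0, 1, 1, 2, 2] for both rows
```

Because the pairwise weights are non-negative, each binary energy is submodular, so a single cut returns its global minimum. Multi-label CRFs are more commonly optimized with move-making algorithms such as alpha-expansion, which solve a sequence of binary cuts of this kind. A practical version would also replace Edmonds-Karp with a faster max-flow solver such as Boykov-Kolmogorov.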

Article number: 3

Keywords: layer-by-layer segmentation, RGB-D image, conditional random field (CRF), graph cut


Article type: Research | Subject: Image processing
Received: 1403/9/7 | Accepted: 1404/4/30 | Published: 1404/12/29 | Published online: 1404/12/29 (Solar Hijri dates)



Republication
This article may be republished under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.