[1] M. Mohseni and M. Seriani, "Pedestrian Detection in Infrared Image Sequences Using SVM and Histogram Classifiers," JSDP, vol. 6, no. 1, pp. 79-90, 2009.
[2] S. Shafeipour Yourdeshahi, H. Seyedarabi, and A. Aghagolzadeh, "Video based Face Recognition Using Orthogonal Locality Preserving Projection," JSDP, vol. 13, no. 2, pp. 139-149, Sep. 2016.
[3] Z. Xu, Y. Yang, and A. G. Hauptmann, "A discriminative CNN video representation for event detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1798-1807. doi: 10.1109/CVPR.2015.7298789.
[4] L. Wang, C. Gao, J. Liu, and D. Meng, "A novel learning-based frame pooling method for event detection," Signal Processing, vol. 140, pp. 45-52, 2017. doi: 10.1016/j.sigpro.2017.05.005.
[5] S. Kwak, B. Han, and J. H. Han, "Scenario-based video event recognition by constraint flow," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3345-3352. doi: 10.1109/CVPR.2011.5995435.
[6] Y. Cong, J. Yuan, and J. Luo, "Towards scalable summarization of consumer videos via sparse dictionary selection," IEEE Transactions on Multimedia, vol. 14, no. 1, pp. 66-75, 2012. doi: 10.1109/TMM.2011.2166951.
[7] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2005, vol. 1, pp. 886-893.
[8] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94.
[9] K. E. Van De Sande, T. Gevers, and C. G. Snoek, "Evaluating color descriptors for object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1582-1596, 2010. doi: 10.1109/TPAMI.2009.154.
[10] I. Laptev, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2, pp. 107-123, 2005. doi: 10.1007/s11263-005-1838-7.
[11] M. Chen and A. Hauptmann, "MoSIFT: Recognizing human actions in surveillance videos," Carnegie Mellon University, Technical Report CMU-CS-09-161, 2009.
[12] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Action recognition by dense trajectories," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2011, pp. 3169-3176. doi: 10.1109/CVPR.2011.5995407.
[13] H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Dec. 2013, pp. 3551-3558. doi: 10.1109/ICCV.2013.441.
[14] D. Oneata, J. Verbeek, and C. Schmid, "Action and event recognition with Fisher vectors on a compact feature set," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Dec. 2013, pp. 1817-1824. doi: 10.1109/ICCV.2013.228.
[15] F. Metze, S. Rawat, and Y. Wang, "Improved audio features for large-scale multimedia event detection," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul. 2014, pp. 1-6. doi: 10.1109/ICME.2014.6890234.
[16] M.-L. Shyu, Z. Xie, M. Chen, and S.-C. Chen, "Video semantic event/concept detection using a subspace-based multimedia data mining framework," IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 252-259, 2008. doi: 10.1109/TMM.2007.911830.
[17] V. S. Tseng, J.-H. Su, J.-H. Huang, and C.-J. Chen, "Integrated mining of visual features, speech features, and frequent patterns for semantic video annotation," IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 260-267, 2008. doi: 10.1109/TMM.2007.911832.
[18] X. Peng and C. Schmid, "Encoding feature maps of CNNs for action recognition," presented at the CVPR, THUMOS Challenge Workshop, 2015.
[19] R. Baradaran and E. Golpar-Raboki, "Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context," JSDP, vol. 16, no. 3, pp. 79-88, 2019. doi: 10.29252/jsdp.16.3.88.
[20] F. Sherafati and J. Tahmoresnezhad, "Image Classification via Sparse Representation and Subspace Alignment," JSDP, vol. 17, no. 2, pp. 47-58, 2020. doi: 10.29252/jsdp.17.2.58.
[21] R. Aly et al., "The AXES submissions at TrecVid 2013," 2013.
[22] Y. Shi, Y. Tian, Y. Wang, and T. Huang, "Sequential deep trajectory descriptor for action recognition with three-stream CNN," IEEE Transactions on Multimedia, vol. 19, no. 7, pp. 1510-1520, 2017. doi: 10.1109/TMM.2017.2666540.
[23] P. Scovanner, S. Ali, and M. Shah, "A 3-dimensional SIFT descriptor and its application to action recognition," in Proceedings of the 15th ACM International Conference on Multimedia, Sep. 2007, pp. 357-360. doi: 10.1145/1291233.1291311.
[24] A. Klaser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the 19th British Machine Vision Conference (BMVC), 2008, pp. 99.1-99.10. doi: 10.5244/C.22.99.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998. doi: 10.1109/5.726791.
[26] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2009, pp. 248-255. doi: 10.1109/CVPR.2009.5206848.
[27] M. Momeny, M. A. Sarram, A. Latif, and R. Sheikhpour, "A Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images," JSDP, vol. 17, no. 4, pp. 139-154, 2021. doi: 10.29252/jsdp.17.4.139.
[28] D. Oneata, J. Verbeek, and C. Schmid, "The LEAR submission at Thumos 2014," 2014.
[29] S. Zha, F. Luisier, W. Andrews, N. Srivastava, and R. Salakhutdinov, "Exploiting image-trained CNN architectures for unconstrained video classification," in Proceedings of the 26th British Machine Vision Conference (BMVC), 2015, pp. 60.1-60.13. doi: 10.5244/C.29.60.
[30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, Jan. 2012, pp. 1097-1105.
[31] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei, "ILSVRC-2012," 2012. [Online]. Available: http://www.image-net.org/challenges/LSVRC.
[32] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2014, pp. 818-833. doi: 10.1007/978-3-319-10590-1_53.
[33] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML), 2010, pp. 807-814.
[34] M. Soltanian and S. Ghaemmaghami, "Hierarchical Concept Score Post-processing and Concept-wise Normalization in CNN based Video Event Recognition," IEEE Transactions on Multimedia, vol. 21, no. 1, pp. 157-172, 2019. doi: 10.1109/TMM.2018.2844101.
[35] Y. Han, X. Wei, X. Cao, Y. Yang, and X. Zhou, "Augmenting image descriptions using structured prediction output," IEEE Transactions on Multimedia, vol. 16, no. 6, pp. 1665-1676, 2014. doi: 10.1109/TMM.2014.2321530.
[36] H. Jégou, M. Douze, C. Schmid, and P. Pérez, "Aggregating local descriptors into a compact image representation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3304-3311. doi: 10.1109/CVPR.2010.5540039.
[37] Z. Zhao, Y. Song, and F. Su, "Specific video identification via joint learning of latent semantic concept, scene and temporal structure," Neurocomputing, vol. 208, pp. 378-386, 2016. doi: 10.1016/j.neucom.2016.06.002.
[38] F. Perronnin and C. Dance, "Fisher kernels on visual vocabularies for image categorization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1-8. doi: 10.1109/CVPR.2007.383266.
[39] F. Markatopoulou et al., "ITI-CERTH participation to TRECVID 2013," in TRECVID 2013 Workshop, 2013, pp. 12-17.
[40] C. Sun and R. Nevatia, "Large-scale web video event classification by use of Fisher vectors," in Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp. 15-22. doi: 10.1109/WACV.2013.6474994.
[41] R. Arandjelovic and A. Zisserman, "All about VLAD," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1578-1585. doi: 10.1109/CVPR.2013.207.
[42] J. Delhumeau, P.-H. Gosselin, H. Jégou, and P. Pérez, "Revisiting the VLAD image representation," in Proceedings of the 21st ACM International Conference on Multimedia, Oct. 2013, pp. 653-656. doi: 10.1145/2502081.2502171.
[43] G. Tolias, Y. Avrithis, and H. Jegou, "To Aggregate or Not to Aggregate: Selective Match Kernels for Image Search," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Dec. 2013, pp. 1401-1408. doi: 10.1109/ICCV.2013.177.
[44] Y.-L. Boureau, J. Ponce, and Y. LeCun, "A theoretical analysis of feature pooling in visual recognition," in Proceedings of the 27th International Conference on Machine Learning (ICML), Jun. 2010, pp. 111-118.
[45] T. De Campos, G. Csurka, and F. Perronnin, "Images as sets of locally weighted features," Computer Vision and Image Understanding, vol. 116, no. 1, pp. 68-85, 2012. doi: 10.1016/j.cviu.2011.07.011.
[46] N. Murray and F. Perronnin, "Generalized Max Pooling," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2014, pp. 2473-2480. doi: 10.1109/CVPR.2014.317.
[47] T. Ge, Q. Ke, and J. Sun, "Sparse-Coded Features for Image Retrieval," in Proceedings of the British Machine Vision Conference (BMVC), 2013, pp. 1-11. doi: 10.5244/C.27.132.
[48] H. Jégou and O. Chum, "Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening," in Proceedings of the European Conference on Computer Vision (ECCV), 2012, pp. 774-787. doi: 10.1007/978-3-642-33709-3_55.
[49] M. K. Reddy, S. Arora, and R. V. Babu, "Spatio-temporal feature based VLAD for efficient video retrieval," in Proceedings of the 4th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, 2013, pp. 1-4. doi: 10.1109/NCVPRIPG.2013.6776268.
[50] M. Jain, H. Jegou, and P. Bouthemy, "Better Exploiting Motion for Better Action Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2013, pp. 2555-2562. doi: 10.1109/CVPR.2013.330.
[51] M. Douze, H. Jégou, C. Schmid, and P. Pérez, "Compact video description for copy detection with precise temporal alignment," in Proceedings of the European Conference on Computer Vision (ECCV), 2010, pp. 522-535. doi: 10.1007/978-3-642-15549-9_38.
[52] A. Abbas, N. Deligiannis, and Y. Andreopoulos, "Vectors of locally aggregated centers for compact video representation," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), 2015, pp. 1-6. doi: 10.1109/ICME.2015.7177501.
[53] J. Revaud, M. Douze, C. Schmid, and H. Jegou, "Event Retrieval in Large Video Collections with Circulant Temporal Encoding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2013, pp. 2459-2466. doi: 10.1109/CVPR.2013.318.
[54] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.1, 2014. [Online]. Available: http://cvxr.com/cvx.
[55] Y.-G. Jiang, G. Ye, S.-F. Chang, D. Ellis, and A. C. Loui, "Consumer video understanding: A benchmark database and an evaluation of human and machine performance," in Proceedings of the 1st ACM International Conference on Multimedia Retrieval, 2011, pp. 29.1-29.8. doi: 10.1145/1991996.1992025.
[56] "Pretrained CNNs - MatConvNet," 2017. [Online]. Available: http://www.vlfeat.org/matconvnet/pretrained/ (accessed Jun. 12, 2017).
[57] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 27.1-27.27, 2011. doi: 10.1145/1961189.1961199.
[58] "Matlab VideoUtils," SourceForge, 2015. [Online]. Available: https://sourceforge.net/projects/videoutils/ (accessed May 29, 2016).
[59] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proceedings of the 23rd Annual ACM Conference on Multimedia, Oct. 2015, pp. 689-692. doi: 10.1145/2733373.2807412.
[60] A. Vedaldi and B. Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," in Proceedings of the 18th ACM International Conference on Multimedia, 2010, pp. 1469-1472. doi: 10.1145/1873951.1874249.
[61] Y.-G. Jiang, Q. Dai, T. Mei, Y. Rui, and S.-F. Chang, "Super Fast Event Recognition in Internet Videos," IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1174-1186, 2015. doi: 10.1109/TMM.2015.2436813.
[62] P. Napoletano, "Visual descriptors for content-based retrieval of remote-sensing images," International Journal of Remote Sensing, vol. 39, no. 5, pp. 1343-1376, 2018. doi: 10.1080/01431161.2017.1399472.
[63] C. Goutte and E. Gaussier, "A probabilistic interpretation of precision, recall and F-score, with implication for evaluation," in Proceedings of the European Conference on Information Retrieval (ECIR), 2005, pp. 345-359. doi: 10.1007/978-3-540-31865-1_25.
[64] F. Perronnin, J. Sánchez, and T. Mensink, "Improving the Fisher kernel for large-scale image classification," in Proceedings of the European Conference on Computer Vision (ECCV), Sep. 2010, pp. 143-156. doi: 10.1007/978-3-642-15561-1_11.
[65] Z. Xu, Y. Yang, I. Tsang, N. Sebe, and A. G. Hauptmann, "Feature weighting via optimal thresholding for video analysis," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 3440-3447. doi: 10.1109/ICCV.2013.427.
[66] A. J. Ma and P. C. Yuen, "Reduced analytic dependency modeling: Robust fusion for visual recognition," International Journal of Computer Vision, vol. 109, no. 3, pp. 233-251, 2014. doi: 10.1007/s11263-014-0723-7.
[67] G. Ye, D. Liu, I.-H. Jhuo, and S.-F. Chang, "Robust late fusion with rank minimization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3021-3028.
[68] I.-H. Jhuo et al., "Discovering joint audio-visual codewords for video event detection," Machine Vision and Applications, vol. 25, no. 1, pp. 33-47, 2014. doi: 10.1007/s00138-013-0567-0.
[69] D. Liu, K.-T. Lai, G. Ye, M.-S. Chen, and S.-F. Chang, "Sample-specific late fusion for visual category recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 803-810. doi: 10.1109/CVPR.2013.109.
[70] Z. Wu, Y.-G. Jiang, J. Wang, J. Pu, and X. Xue, "Exploring inter-feature and inter-class relationships with deep neural networks for video classification," in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 167-176. doi: 10.1145/2647868.2654931.