Volume 19, Issue 4 (3-2023)                   JSDP 2023, 19(4): 33-44





Pejhan E, Ghasemzadeh M. Improvement of generative adversarial networks for automatic text-to-image generation. JSDP 2023; 19 (4) : 3
URL: http://jsdp.rcisp.ac.ir/article-1-1170-en.html
Yazd University
Abstract:
This research applies deep learning and image processing to the automatic generation of images from text. Previous studies have used a single sentence to produce an image. This work presents a memory-based hierarchical model that uses three different descriptions, each given as a sentence, to produce and refine the image. The proposed scheme focuses on using more information to produce high-resolution images with generative adversarial networks. Implementing programs in this field requires massive processing resources; therefore, the proposed method was implemented and tested on a cluster with 25 GPUs using the hardware platform of the University of Copenhagen. The experiments were performed on the CUB-200 and ids-ade datasets. The experimental results show that the proposed model can produce higher-quality images than the two baseline models, StackGAN and AttnGAN.
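The abstract does not give architectural details, so the following is only an illustrative sketch of the general idea of reading from a small memory built from three sentence descriptions while an image representation is refined stage by stage. It is not the authors' implementation: the function names (`read_memory`, `refine`), the dot-product attention, the 50/50 update rule, and the toy three-dimensional embeddings are all assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query: Vector, memory: List[Vector]) -> Vector:
    """Attention-weighted average of the memory slots.

    Assumption: one slot per sentence description, scored by dot product.
    """
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]


def refine(image_state: Vector, memory: List[Vector], stages: int = 3) -> Vector:
    """Hypothetical coarse-to-fine loop: read the memory, then update the state."""
    state = list(image_state)
    for _ in range(stages):
        context = read_memory(state, memory)
        # Assumed update rule: blend the current state with the text context.
        state = [0.5 * s + 0.5 * c for s, c in zip(state, context)]
    return state


# Toy embeddings standing in for three encoded sentence descriptions.
descriptions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
initial_state = [0.0, 0.0, 0.0]
final_state = refine(initial_state, descriptions)
```

In a real text-to-image GAN, the state would be a feature map produced by a generator stage and the update would be learned; the sketch only shows how several descriptions could contribute jointly, rather than a single sentence.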
Article number: 3
Type of Study: Research | Subject: Paper
Received: 2020/08/21 | Accepted: 2021/05/24 | Published: 2023/03/20 | ePublished: 2023/03/20

References
[1] M. M. Haji-Esmaeili and G. Montazer, "Automatic Coloring of Grayscale Images Using Generative Adversarial Networks," Journal of Signal and Data Processing (JSDP), vol. 16, no. 1, pp. 57-74, 2019. [DOI:10.29252/jsdp.16.1.57]
[2] T. Baltrusaitis, C. Ahuja, and L. P. Morency, "Multimodal machine learning: A survey and taxonomy," in IEEE Transactions on Pattern Analysis, 2017.
[3] A. Dash, J. C. B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal, "Tac-gan-text conditioned auxiliary classifier generative adversarial network," arXiv preprint arXiv:1703.06412, 2017.
[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in neural information processing systems, 2014.
[5] C. Gulcehre, S. Chandar, K. Cho, and Y. Bengio, "Dynamic neural turing machine with continuous and discrete addressing schemes," Neural computation, vol. 30, no. 4, pp. 857-884, 2018. [DOI:10.1162/neco_a_01060]
[6] N. Ilinykh, S. Zarrieß, and D. Schlangen, "Tell Me More: A Dataset of Visual Scene Description Sequences," in Proceedings of the 12th International Conference on Natural Language Generation, 2019. [DOI:10.18653/v1/W19-8621]
[7] K. J. Joseph, A. Pal, S. Rajanala, and V. N. Balasubramanian, "C4synth: Cross-caption cycle-consistent text-to-image synthesis," in IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. [DOI:10.1109/WACV.2019.00044]
[8] W. Li, P. Zhang, L. Zhang, Q. Huang, X. He, S. Lyu, and J. Gao, "Object-driven text-to-image synthesis via adversarial training," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2019. [DOI:10.1109/CVPR.2019.01245]
[9] A. Miller, A. Fisch, J. Dodge, A. H. Karimi, A. Bordes, and J. Weston, "Key-value memory networks for directly reading documents," in Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2016. [DOI:10.18653/v1/D16-1147]
[10] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text to image synthesis," arXiv preprint arXiv:1605.05396, 2016.
[11] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training gans," in Advances in neural information processing systems (NIPS), 2016.
[12] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. of the IEEE conf. on computer vision and pattern recognition, 2016. [DOI:10.1109/CVPR.2016.308]
[13] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, "The caltech-ucsd birds-200-2011 dataset," 2011.
[14] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, "Attngan: Fine-grained text to image generation with attentional generative adversarial networks," in Proc. of the IEEE conf. on computer vision and pattern recognition, 2018. [DOI:10.1109/CVPR.2018.00143]
[15] X. Yan, J. Yang, K. Sohn, and H. Lee, "Attribute2image: Conditional image generation from visual attributes," in European Conf. on Computer Vision, 2016. [DOI:10.1007/978-3-319-46493-0_47]
[16] G. Yin, B. Liu, L. Sheng, N. Yu, X. Wang, and J. Shao, "Semantics disentangling for text-to-image generation," in Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. [DOI:10.1109/CVPR.2019.00243]
[17] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, "Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks," in Proc. of the IEEE int. conference on computer vision, 2017. [DOI:10.1109/ICCV.2017.629]
[18] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas, "Stackgan++: Realistic image synthesis with stacked generative adversarial networks," in IEEE transactions on pattern analysis and machine intelligence, 2017. [DOI:10.1109/ICCV.2017.629]
[19] Z. Zhang, Y. Xie, and L. Yang, "Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2018. [DOI:10.1109/CVPR.2018.00649]
[20] P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, "Attention-based bidirectional long short-term memory networks for relation classification," in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2016. [DOI:10.18653/v1/P16-2034]
[21] M. Zhu, P. Pan, W. Chen, and Y. Yang, "dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2019. [DOI:10.1109/CVPR.2019.00595]
[22] X. Zhu, A. B. Goldberg, M. Eldawy, C. R. Dyer, and B. Strock, "A text-to-picture synthesis system for augmenting communication," in proceeding of Association for the Advanced



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing