Improvement of generative adversarial networks for automatic text-to-image generation

Pejhan, Elham; Ghasemzadeh, Mohammad

doi:10.61186/jsdp.19.4.33

Volume 19, Issue 4 (3-2023) JSDP 2023, 19(4): 33-44

Pejhan E, Ghasemzadeh M. Improvement of generative adversarial networks for automatic text-to-image generation. JSDP 2023; 19 (4) : 3
URL: http://jsdp.rcisp.ac.ir/article-1-1170-en.html

Elham Pejhan, Mohammad Ghasemzadeh*

Yazd University

Abstract:

This research applies deep learning and image processing to the automatic generation of images from text. Previous studies have used a single sentence to produce an image. This work presents a memory-based hierarchical model that uses three different descriptions, each given as a sentence, to generate and refine the image. The proposed scheme focuses on using more information to produce high-resolution images with generative adversarial networks. Implementing programs in this field requires massive processing resources. Therefore, the proposed method was implemented and tested on a cluster with 25 GPUs using the hardware platform of the University of Copenhagen. The experiments were performed on the CUB-200 and ids-ade datasets. The experimental results show that the proposed model can produce higher-quality images than the two baseline models, StackGAN and AttnGAN.
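As a rough illustration of how several descriptions might be combined into one conditioning signal, the sketch below performs a soft attention read over a small memory of three sentence embeddings. It is not the authors' architecture: the function `read_memory`, the dot-product scoring, the query vector, and the toy embeddings are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_memory(query, sentence_embeddings):
    """Soft attention read over a memory of sentence embeddings.

    Hypothetical helper: scores each stored sentence against the query,
    then returns the attention weights and the weighted sum of the
    embeddings (the fused conditioning vector).
    """
    weights = softmax([dot(query, s) for s in sentence_embeddings])
    dim = len(sentence_embeddings[0])
    fused = [
        sum(w * s[i] for w, s in zip(weights, sentence_embeddings))
        for i in range(dim)
    ]
    return weights, fused


# Toy stand-ins for the embeddings of three descriptions of one image.
descriptions = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Toy query, e.g. a feature of the image region currently being refined.
query = [2.0, 0.0, 0.0]

weights, fused = read_memory(query, descriptions)
print(weights, fused)
```

In this toy setup the first description matches the query best, so it receives the largest weight; a real model would learn both the embeddings and the scoring function.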

Article number: 3

Keywords: Generative Adversarial Network, Deep Learning, Hierarchical Model, Natural Language Processing

Type of Study: Research | Subject: Paper
Received: 2020/08/21 | Accepted: 2021/05/24 | Published: 2023/03/20 | ePublished: 2023/03/20

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
