Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation
DOI:
https://doi.org/10.4108/eetiot.5336
Keywords:
Text-to-Image Generation, Generative Adversarial Networks (GANs), Multimodal Models, Natural Language Processing, Computer Vision, Ethical AI, Interpretability
Abstract
Text-to-image generation, a captivating intersection of natural language processing and computer vision, has undergone a remarkable evolution in recent years. This research paper provides a comprehensive review of the state-of-the-art in text-to-image generation techniques, highlighting key advancements and emerging trends. We begin by surveying the foundational models, with a focus on Generative Adversarial Networks (GANs) and their pivotal role in generating realistic and diverse images from textual descriptions. We delve into the intricacies of training data, model architectures, and evaluation metrics, offering insights into the challenges and opportunities in this field. Furthermore, this paper explores the synergistic relationship between natural language processing and computer vision, showcasing multimodal models like DALL-E and CLIP. These models not only generate images from text but also understand the contextual relationships between textual descriptions and images, opening avenues for content recommendation, search engines, and visual storytelling. The paper discusses applications spanning art, design, e-commerce, healthcare, and education, where text-to-image generation has made significant inroads. We highlight the potential of this technology in automating content creation, aiding in diagnostics, and transforming the fashion and e-commerce industries. However, the journey of text-to-image generation is not without its challenges. We address ethical considerations, emphasizing responsible AI and the mitigation of biases in generated content. We also explore interpretability and model transparency, critical for ensuring trust and accountability.
License
Copyright (c) 2024 EAI Endorsed Transactions on Internet of Things
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.