TY - JOUR
T1 - Text-to-image generation with enhanced GANs
T2 - Bridging semantic gaps using RNN and CNN
AU - Ramzan, Sadia
AU - Ramzan, Hafiz Arslan
AU - Kalsum, Tehmina
AU - Adnan, Mohammed
AU - Chaudhary, Muhammad Akmal
AU - Ali, Muhammad Moazzam
N1 - Publisher Copyright:
© 2026 Ramzan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2026/1
Y1 - 2026/1
AB - Text-to-image generation is the process of generating images from a given text description. Producing consistently realistic images that satisfy the given conditions remains a highly challenging task. We address this problem by proposing a neural network-based model that generates good-quality images from text descriptions. The model uses a Generative Adversarial Network (GAN) for image generation, together with a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN). The RNN creates word embeddings from textual sentences, and the CNN extracts important features from images. The generator produces images from text, and each generated image is passed to the discriminator along with matched text, mismatched text, and real images from the dataset. The experiments are performed on the Oxford-102 flowers dataset. We also modified this dataset to create a new version, Oxford-102 flowers (beta), with 15 text descriptions for each image. The model is trained on both datasets to generate images at 64 x 64, 128 x 128, and 256 x 256 resolution. Generator and discriminator losses are calculated during training of the model. Inception Score and peak signal-to-noise ratio (PSNR) are the performance metrics used for model evaluation. Our model achieves an Inception Score of 4.15 on the Oxford-102 flowers dataset at 64 x 64 resolution, 3.87 at 256 x 256 resolution, and 3.97 on Oxford-102 flowers (beta) at 128 x 128 resolution. PSNR values are 28.25 dB and 30.12 dB on the original and annotated datasets, respectively. The experiments indicate that our method outperforms existing models in terms of Inception Score and PSNR.
UR - https://www.scopus.com/pages/publications/105028206520
U2 - 10.1371/journal.pone.0340413
DO - 10.1371/journal.pone.0340413
M3 - Article
C2 - 41563993
AN - SCOPUS:105028206520
SN - 1932-6203
VL - 21
JO - PLoS ONE
JF - PLoS ONE
IS - 1 January
M1 - e0340413
ER -