
ENHANCING MEDICAL VISION-LANGUAGE MODELS WITH RICH TEXTUAL DESCRIPTIONS AND MULTIPLE ALIGNMENTS FOR CHEST X-RAY DIAGNOSIS

  • Youssef Ibrahim
  • Anabia Sohail
  • Sajid Javed
  • Hasan AlMarzouqi
  • Mohamed Deriche
  • Naoufel Werghi
  • Computer Science Department
  • Khalifa University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Vision-language models (VLMs) combine natural language understanding with visual data interpretation, a capability crucial in diverse applications such as medical imaging. However, training VLMs on limited data, especially in radiology, remains challenging. We propose a strategy to improve dual-encoder performance under data constraints. To align visual and textual embeddings effectively through contrastive learning, we used GPT-4 to generate a bag of rich textual descriptions that augment information merged from reputable medical resources and the pre-trained BiomedCLIP model. These descriptions provide in-depth information on each disease's visual appearance, major causes, and major symptoms, enhancing the model's contextual understanding and classification accuracy. Unlike previous methods that rely on a single alignment, our multiple-alignment strategy associates multiple images with multiple textual descriptions per disease class, capping the number of descriptors per class to maintain computational efficiency. By adapting the vision encoder for chest X-ray classification, our approach achieves competitive accuracy with fewer training pairs, highlighting its potential for data-limited domains.
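The abstract describes the multiple-alignment objective only at a high level. As a hedged illustration, the Python sketch below shows one common way such an objective could be written: a symmetric, multi-positive (SupCon-style) InfoNCE loss in which each image treats every descriptor of its class as a positive, with descriptors capped per class. All function names, the temperature of 0.07, the default cap of 5, and the mean-similarity classification rule are assumptions for illustration, not the authors' published implementation; real embeddings would come from encoders such as BiomedCLIP rather than the toy 2-D vectors used here.

```python
# Hypothetical sketch of a multi-positive image-text alignment loss; not the paper's code.
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cap_descriptors(text_embs, text_labels, max_per_class):
    """Keep at most `max_per_class` descriptor embeddings per class (first come, first kept)."""
    counts = defaultdict(int)
    kept_embs, kept_labels = [], []
    for emb, label in zip(text_embs, text_labels):
        if counts[label] < max_per_class:
            counts[label] += 1
            kept_embs.append(emb)
            kept_labels.append(label)
    return kept_embs, kept_labels


def _multi_positive_nce(anchors, anchor_labels, candidates, candidate_labels, temperature):
    """Average over anchors of the mean negative log-softmax assigned to same-class candidates."""
    total, counted = 0.0, 0
    for a, a_label in zip(anchors, anchor_labels):
        logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        positives = [z for z, c_label in zip(logits, candidate_labels) if c_label == a_label]
        if not positives:
            continue  # no same-class partner in this batch; skip this anchor
        total += -sum(z - log_denom for z in positives) / len(positives)
        counted += 1
    return total / max(counted, 1)


def multi_alignment_loss(image_embs, image_labels, text_embs, text_labels,
                         temperature=0.07, max_desc_per_class=5):
    """Symmetric loss: each image aligns with all (capped) descriptors of its class,
    and each descriptor aligns with all images of its class."""
    text_embs, text_labels = cap_descriptors(text_embs, text_labels, max_desc_per_class)
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    i2t = _multi_positive_nce(imgs, image_labels, txts, text_labels, temperature)
    t2i = _multi_positive_nce(txts, text_labels, imgs, image_labels, temperature)
    return 0.5 * (i2t + t2i)


def classify(image_emb, text_embs, text_labels):
    """Predict the class whose descriptors have the highest mean cosine similarity."""
    img = l2_normalize(image_emb)
    scores = defaultdict(list)
    for emb, label in zip(text_embs, text_labels):
        scores[label].append(sum(x * y for x, y in zip(img, l2_normalize(emb))))
    return max(scores, key=lambda k: sum(scores[k]) / len(scores[k]))


# Toy usage with made-up 2-D embeddings (illustrative only).
images = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
image_labels = ["pneumonia", "pneumonia", "effusion"]
texts = [[1.0, 0.0], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9]]
text_labels = ["pneumonia", "pneumonia", "pneumonia", "effusion"]
print(multi_alignment_loss(images, image_labels, texts, text_labels, max_desc_per_class=2))
print(classify([0.95, 0.05], texts, text_labels))  # → pneumonia
```

Treating every same-class descriptor as a positive, rather than a single caption per image, is what distinguishes this from a standard one-to-one CLIP loss; the per-class cap keeps the number of text candidates, and therefore the cost of each softmax, bounded.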

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 2079-2084
Number of pages: 6
ISBN (Electronic): 9798331523794
State: Published - 2025
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sep 2025 to 17 Sep 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 32nd IEEE International Conference on Image Processing, ICIP 2025
Country/Territory: United States
City: Anchorage
Period: 14/09/25 to 17/09/25

Keywords

  • Chest X-ray
  • Contrastive Learning
  • Multiple Alignment
  • Rich Textual Descriptions
  • Vision-Language Model
