Vision and Multimodal Foundation Models in Medical Imaging: A Comprehensive Review of Architectures, Clinical Trends, and Future Directions

Keywords

Foundation Models
Medical Imaging
Multimodal Learning
Medicine
Promptable Segmentation
Precision
Vision Transformers

Abstract

Foundation models (FMs) are revolutionizing medical imaging by shifting the field from task-specific algorithms to large-scale, generalizable systems that learn from a broad range of multimodal data. Recent advances, including transformer-based visual encoders, promptable segmentation architectures, vision–language models, and parameter-efficient fine-tuning, have improved performance in segmentation, detection, classification, and report generation across modalities such as MRI, CT, ultrasound, X-ray, endoscopy, and digital pathology. Domain-specific FMs (including prostate MRI, brain MRI, retinal, ultrasound, and pathology models) have proved highly label-efficient and achieve performance competitive with or better than mainstream deep learning models, particularly under low-annotation conditions. Research trends emphasize large-scale pretraining, multimodal integration, cross-task generalization, data-efficient learning, and the development of universal feature encoders. At the same time, extensive benchmarking and external validation reveal performance variability, motivating the continued development of standardized evaluation protocols. Clinical adoption remains limited by challenges in interpretability, bias, workflow integration, computational requirements, and regulatory uncertainty. Emerging directions such as personalizable AI, continual learning, federated model adaptation, and imaging–genomics integration position FMs as key to the future of precision medicine. This article consolidates architectural developments, pioneering foundation models, clinical evaluation, and translational advances, outlining the current state and future directions of foundation models in medical imaging.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2025 Iraqi Journal of Intelligent Computing and Informatics (IJICI)
