Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification

Sánchez Quijada, Marina

Empreu aquest identificador per citar o enllaçar aquest ítem: http://hdl.handle.net/10609/150706

Títol:	Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification
Autoria:	Sánchez Quijada, Marina
Tutor:	Alférez Baquero, Edwin Santiago
Resum:	In recent years, large vision and language models (LVLMs) have gained a lot of attention due to their accessibility and impressive performance in various language and vision tasks. Consequently, their applications in the medical imaging field are being studied, showing already great potential in clinical settings. However, very few studies have been carried out to evaluate the potential of LVLMs for disease diagnosis, especially for microscopy images. In this work, we explore for the first time the capabilities of three of the most advanced LVLMs (GPT-4, Claude3, and LLaVa) in the analysis and classification of peripheral blood cells. To perform this exploration, we build multiple prompts based on different prompting techniques, including few-shot learning and chain of thought (CoT), to study and improve the performance of these LVLMs for blood cell image analysis. We also explore the functionality of the assistant and the system roles in model behaviour and performance. Moreover, we perform a comprehensive comparison of their accuracy rates and create a web application for white blood cell classification. Our experiments conclude that the best-performing method and LVLM combination is GPT-4o when using a two-shot learning strategy with the addition of the assistant role. When testing this approach on 100 images of leukocytes, we attained an accuracy rate of 78%. Although this performance is not reliable enough and LVLMs should not be used as diagnostic tools, we believe that due to the rapid advancement of large language-vision models, LVLMs could become a great asset in the analysis of pathology images, working as an assistant for quick blood cell description and classification.
Paraules clau:	Machine learning Medical imaging analysis Large visual-language models
Tipus de document:	info:eu-repo/semantics/masterThesis
Data de publicació:	jun-2024
Llicència de publicació:	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Apareix a les col·leccions:	Trabajos finales de carrera, trabajos de investigación, etc.