Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification

Sánchez Quijada, Marina

Por favor, use este identificador para citar o enlazar este ítem: http://hdl.handle.net/10609/150706

Título :	Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification
Autoría:	Sánchez Quijada, Marina
Tutor:	Alférez Baquero, Edwin Santiago
Resumen :	In recent years, large vision and language models (LVLMs) have gained a lot of attention due to their accessibility and impressive performance in various language and vision tasks. Consequently, their applications in the medical imaging field are being studied, showing already great potential in clinical settings. However, very few studies have been carried out to evaluate the potential of LVLMs for disease diagnosis, especially for microscopy images. In this work, we explore for the first time the capabilities of three of the most advanced LVLMs (GPT-4, Claude3, and LLaVa) in the analysis and classification of peripheral blood cells. To perform this exploration, we build multiple prompts based on different prompting techniques, including few-shot learning and chain of thought (CoT), to study and improve the performance of these LVLMs for blood cell image analysis. We also explore the functionality of the assistant and the system roles in model behaviour and performance. Moreover, we perform a comprehensive comparison of their accuracy rates and create a web application for white blood cell classification. Our experiments conclude that the best-performing method and LVLM combination is GPT-4o when using a two-shot learning strategy with the addition of the assistant role. When testing this approach on 100 images of leukocytes, we attained an accuracy rate of 78%. Although this performance is not reliable enough and LVLMs should not be used as diagnostic tools, we believe that due to the rapid advancement of large language-vision models, LVLMs could become a great asset in the analysis of pathology images, working as an assistant for quick blood cell description and classification.
Palabras clave :	Machine learning Medical imaging analysis Large visual-language models
Tipo de documento:	info:eu-repo/semantics/masterThesis
Fecha de publicación :	jun-2024
Licencia de publicación:	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
Aparece en las colecciones:	Trabajos finales de carrera, trabajos de investigación, etc.

Ficheros en este ítem:

Fichero	Descripción	Tamaño	Formato
TFM_Memoria_MarinaSanchez.pdf		2,2 MB	Adobe PDF	Visualizar/Abrir

Mostrar el registro completo del ítem

Comparte:

Impacto:

Google Scholar

Microsoft Academic

Exporta:

Consulta las estadísticas