Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification

Sánchez Quijada, Marina

Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/150706

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sánchez Quijada, Marina	-
dc.date.accessioned	2024-07-10T17:36:46Z	-
dc.date.available	2024-07-10T17:36:46Z	-
dc.date.issued	2024-06	-
dc.identifier.uri	http://hdl.handle.net/10609/150706	-
dc.description.abstract	In recent years, large vision-language models (LVLMs) have attracted considerable attention due to their accessibility and impressive performance in a variety of language and vision tasks. Consequently, their applications in medical imaging are being studied and are already showing great potential in clinical settings. However, very few studies have evaluated the potential of LVLMs for disease diagnosis, especially on microscopy images. In this work, we explore for the first time the capabilities of three of the most advanced LVLMs (GPT-4, Claude 3, and LLaVA) in the analysis and classification of peripheral blood cells. To this end, we build multiple prompts based on different prompting techniques, including few-shot learning and chain of thought (CoT), to study and improve the performance of these LVLMs in blood cell image analysis. We also explore how the assistant and system roles affect model behaviour and performance. Moreover, we perform a comprehensive comparison of the models' accuracy rates and create a web application for white blood cell classification. Our experiments show that the best-performing combination of method and LVLM is GPT-4o with a two-shot learning strategy and the addition of the assistant role. When testing this approach on 100 images of leukocytes, we attained an accuracy of 78%. Although this performance is not yet reliable enough and LVLMs should not be used as diagnostic tools, we believe that, given the rapid advancement of large vision-language models, LVLMs could become a great asset in the analysis of pathology images, working as an assistant for quick blood cell description and classification.	en
dc.format.mimetype	application/pdf	ca
dc.language.iso	eng	ca
dc.publisher	Universitat Oberta de Catalunya (UOC)	ca
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	-
dc.subject	Machine learning	en
dc.subject	Medical imaging analysis	en
dc.subject	Large visual-language models	en
dc.title	Exploring large vision-language models with prompt engineering for peripheral blood cell image analysis and classification	ca
dc.type	info:eu-repo/semantics/masterThesis	ca
dc.contributor.tutor	Alférez Baquero, Edwin Santiago	-
Appears in Collections:	Trabajos finales de carrera, trabajos de investigación, etc.

Files in This Item:

File	Description	Size	Format
TFM_Memoria_MarinaSanchez.pdf		2,2 MB	Adobe PDF
