Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/151212
Title: Engineering data-sharing practices for a fair and trustworthy AI
Author: Giner Miguelez, Joan  
Director: Cabot, Jordi  
Gómez Llana, Abel
Abstract: Machine learning (ML) technology may discriminate against specific social groups. For example, recent research has revealed that ML applications are more likely to fail at identifying women than men in hospitals. Recent research has identified the data used to train these models as one of the causes of these issues. The research community has proposed guidelines to detect the dimensions that can generate these discriminatory behaviors. However, these proposals lack a set structure, which restricts their computation and the creation of engineering approaches built upon them. This thesis presents a domain-specific language to document data for ML. This language has served as a basis for creating the responsible AI extension of Croissant, a standard adopted by major search engines, such as Google Dataset Search. Moreover, this thesis studies the use of large language models (LLMs) to automatically create data documentation and the readiness of scientific data for use in ML.
Keywords: data-sharing practices
machine learning
trustworthy AI
fairness
data documentation
Document type: info:eu-repo/semantics/doctoralThesis
Issue Date: 15-Jul-2024
Publication license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/  
Appears in Collections: Tesis doctorals

Files in This Item:
File: Thesis_manuscript_acks.pdf
Description: Giner-Miguelez_dissertation
Size: 12,09 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License.