Please use this identifier to cite or link to this item:
http://hdl.handle.net/10609/151212
Title: | Engineering data-sharing practices for a fair and trustworthy AI |
Author: | Giner Miguelez, Joan |
Director: | Cabot, Jordi; Gómez Llana, Abel |
Abstract: | Machine learning (ML) technology may discriminate against specific social groups. For example, recent research has revealed that ML applications are more likely to fail to identify women than men in hospitals. Recent research has identified the data used to train these models as one of the causes of these issues. The research community has proposed guidelines to detect the dimensions that can generate these discriminatory behaviors. However, these proposals lack a set structure, which restricts their computation and the creation of engineering approaches built upon them. This thesis presents a domain-specific language to document data for ML. This language has served as a basis for creating the responsible AI extension of Croissant, a standard adopted by major search engines, such as Google Dataset Search. Moreover, this thesis studies the use of large language models (LLMs) to automatically create data documentation, as well as the readiness of scientific data for use in ML. |
Keywords: | data-sharing practices; machine learning; trustworthy AI; fairness; data documentation |
Document type: | info:eu-repo/semantics/doctoralThesis |
Issue Date: | 15-Jul-2024 |
Publication license: | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
Appears in Collections: | Tesis doctorals |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Thesis_manuscript_acks.pdf | Giner-Miguelez_dissertation | 12,09 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License