Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/151102
Title: Croissant: a metadata format for ML-Ready Datasets
Author: Akhtar, Mubashara  
Benjelloun, Omar  
Conforti, Costanza  
Gijsbers, Pieter  
Giner Miguelez, Joan  
Jain, Nitisha  
Kuchnik, Michael  
Lhoest, Quentin  
Marcenac, Pierre  
Maskey, Manil  
Mattson, Peter  
Oala, Luis  
Ruyssen, Pierre  
Shinde, Rajat  
Simperl, Elena  
Thomas, Geoff  
Tykhonov, Vyacheslav  
Vanschoren, Joaquin  
van der Velde, Jos  
Vogler, Steffen  
Wu, Carole-Jean  
Citation: Akhtar, M. [Mubashara], Benjelloun, O. [Omar], Conforti, C. [Costanza], Gijsbers, P. [Pieter], Giner-Miguelez, J. [Joan], Jain, N. [Nitisha], ... & Wu, C.J. [Carole-Jean]. (2024). Croissant: a metadata format for ML-Ready Datasets. Proceedings of the 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 (pp. 1-6). New York, NY: Association for Computing Machinery. doi: 10.1145/3650203.3663326
Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
Keywords: ML datasets
discoverability
reproducibility
responsible AI
DOI: https://doi.org/10.1145/3650203.3663326
Document type: info:eu-repo/semantics/conferenceObject
Version: info:eu-repo/semantics/publishedVersion
Issue Date: 9-Jun-2024
Publication license: http://creativecommons.org/licenses/by/3.0/es/  
Linked data: https://dl.acm.org/doi/10.1145/3650203.3663326#core-collateral-metrics
Appears in Collections: Conferències
Conference lectures

Files in This Item:
File: Akhtar_DEEM_Croissant.pdf; Size: 1.44 MB; Format: Adobe PDF

This item is licensed under a Creative Commons License.