Please use this identifier to cite or link to this item:
http://hdl.handle.net/10609/151102
Title: | Croissant: a metadata format for ML-Ready Datasets |
Author: | Akhtar, Mubashara; Benjelloun, Omar; Conforti, Costanza; Gijsbers, Pieter; Giner Miguelez, Joan; Jain, Nitisha; Kuchnik, Michael; Lhoest, Quentin; Marcenac, Pierre; Maskey, Manil; Mattson, Peter; Oala, Luis; Ruyssen, Pierre; Shinde, Rajat; Simperl, Elena; Thomas, Goeff; Tykhonov, Vyacheslav; Vanschoren, Joaquin; van der Velde, Jos; Vogler, Steffen; Wu, Carole-Jean |
Citation: | Akhtar, M. [Mubashara], Benjelloun, O. [Omar], Conforti, C. [Costanza], Gijsbers, P. [Pieter], Giner-Miguelez, J. [Joan], Jain, N. [Nitisha], ... & Wu, C.-J. [Carole-Jean]. (2024). Croissant: a metadata format for ML-Ready Datasets. Proceedings of the 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 (pp. 1-6). New York, NY: Association for Computing Machinery. doi: 10.1145/3650203.3663326 |
Abstract: | Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks. |
Keywords: | ML datasets; discoverability; reproducibility; responsible AI |
DOI: | https://doi.org/10.1145/3650203.3663326 |
Document type: | info:eu-repo/semantics/conferenceObject |
Version: | info:eu-repo/semantics/publishedVersion |
Issue Date: | 9-Jun-2024 |
Publication license: | http://creativecommons.org/licenses/by/3.0/es/ |
Linked data: | https://dl.acm.org/doi/10.1145/3650203.3663326#core-collateral-metrics |
Appears in Collections: | Conference lectures |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Akhtar_DEEM_Croissant.pdf | | 1.44 MB | Adobe PDF

This item is licensed under a Creative Commons License