Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/150467
Title: HFCommunity: an extraction process and relational database to analyze Hugging Face Hub data
Authors: Ait Fonollà, Adem; Cánovas Izquierdo, Javier Luis; Cabot, Jordi
Citation: Ait, A. [Adem], Cánovas Izquierdo, J.L. [Javier Luis], & Cabot, J. [Jordi]. (2024). HFCommunity: an extraction process and relational database to analyze Hugging Face Hub data. Science of Computer Programming, 234, 103079. doi: 10.1016/j.scico.2024.103079
Abstract: Social coding platforms such as GitHub or GitLab have become the de facto standard for developing Open-Source Software (OSS) projects. With the emergence of Machine Learning (ML), platforms specifically designed for hosting and developing ML-based projects have appeared, with Hugging Face Hub (HFH) being one of the most popular. HFH aims to share datasets, pre-trained ML models and the applications built with them. With over 400K repositories, and growing fast, HFH is becoming a promising source of empirical data on all aspects of ML project development. However, apart from the API provided by the platform, there are no easy-to-use solutions to collect the data, nor prepackaged datasets to explore the different facets of HFH. We present HFCommunity, an extraction process for HFH data and a relational database to facilitate an empirical analysis of the growing number of ML projects.
Keywords: mining software repositories; data analysis; Hugging Face
DOI: https://doi.org/10.1016/j.scico.2024.103079
Document type: Article
Version: Published version
Issue Date: May 2024
Publication license: http://creativecommons.org/licenses/by/3.0/es/  
Appears in Collections: Articles científics; Articles

Files in This Item: Ait_scp_HFCommunity.pdf (499.43 kB, Adobe PDF)

This item is licensed under a Creative Commons License.