Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/149208
Title: El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya
Other Titles: The parallel corpus of the Official Journal of the Catalan Government
Author: Oliver, Antoni  
Citation: Oliver González, A. [Antoni]. (2023). El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya. Linguamática, 14(2), 75-81. doi: 10.21814/lm.14.2.380
Abstract: In this paper, the process of compilation of the new version of the Catalan–Spanish parallel corpus of the Official Journal of the Catalan Government (DOGC) is presented. The processes of downloading, conversion to text, segmentation and automatic alignment are described. All the programs that have been developed to perform these processes are distributed under a free license and the compiled corpus can be freely downloaded. Furthermore, the process of training and evaluation of two neural machine translation systems, Catalan–Spanish and Spanish–Catalan, using this corpus is presented.
Keywords: parallel corpus; neural machine translation
DOI: https://doi.org/10.21814/lm.14.2.380
Document type: info:eu-repo/semantics/article
Version: info:eu-repo/semantics/publishedVersion
Issue Date: 31-Dec-2022
Publication license: https://creativecommons.org/licenses/by/4.0/  
Appears in Collections: Articles
Articles científics

Files in This Item:
Oliver_l_corpus.pdf (388,64 kB, Adobe PDF)

This item is licensed under a Creative Commons License.