Please use this identifier to cite or link to this item:
Title: El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya
Author: Oliver, Antoni  
Citation: Oliver González, A. [Antoni]. (2023). El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya. Linguamática, 14(2), 75-81. doi: 10.21814/lm.14.2.380
Abstract: This paper presents the compilation of the new version of the Catalan–Spanish parallel corpus of the Official Journal of the Catalan Government (DOGC). It describes the downloading, conversion to text, segmentation and automatic alignment processes. All the programs developed to perform these processes are distributed under a free license, and the compiled corpus can be freely downloaded. The paper also presents the training and evaluation of two neural machine translation systems, Catalan–Spanish and Spanish–Catalan, using this corpus.
Keywords: parallel corpus
neural machine translation
Type: info:eu-repo/semantics/article
Issue Date: 31-Dec-2022
Publication license:  
Appears in Collections: Articles
Articles científics

Files in This Item:
File                 Description  Size       Format
Oliver_l_corpus.pdf               388,64 kB  Adobe PDF

This item is licensed under a Creative Commons License.