Please use this identifier to cite or link to this item:
http://hdl.handle.net/10609/149208
Title: | El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya |
Other Titles: | The parallel corpus of the Official Journal of the Catalan Government |
Author: | Oliver, Antoni |
Citation: | Oliver González, A. [Antoni]. (2023). El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya. Linguamática, 14(2), 75-81. doi: 10.21814/lm.14.2.380 |
Abstract: | In this paper, the process of compilation of the new version of the Catalan–Spanish parallel corpus of the Official Journal of the Catalan Government (DOGC) is presented. The processes of downloading, conversion to text, segmentation and automatic alignment are described. All the programs that have been developed to perform these processes are distributed under a free license and the compiled corpus can be freely downloaded. Furthermore, the process of training and evaluation of two neural machine translation systems, Catalan–Spanish and Spanish–Catalan, using this corpus is presented. |
Keywords: | parallel corpus; neural machine translation |
DOI: | https://doi.org/10.21814/lm.14.2.380 |
Document type: | info:eu-repo/semantics/article |
Version: | info:eu-repo/semantics/publishedVersion |
Issue Date: | 31-Dec-2022 |
Publication license: | https://creativecommons.org/licenses/by/4.0/ |
Appears in Collections: | Articles; Articles científics |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Oliver_l_corpus.pdf | | 388,64 kB | Adobe PDF
This item is licensed under a Creative Commons License