Title: El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya
Author: Oliver González, Antoni
Keywords: parallel corpus; translation memory; terminology extraction; statistical machine translation; natural language processing
Issue Date: 2017
Publisher: Zeitschrift für Katalanistik/Revista d'Estudis Catalans
Citation: Oliver González, A. (2017). "El corpus paral·lel del Diari Oficial de la Generalitat de Catalunya", Zeitschrift für Katalanistik/Revista d'Estudis Catalans, 30, pp. 269-291. ISSN 0932-2221
Abstract: This paper presents the compilation of a parallel corpus from the Diari Oficial de la Generalitat de Catalunya (DOGC), the official journal of the Catalan Government. It describes the download process and the tools and procedures used for processing and linguistic analysis. The result is a large parallel corpus, freely available in several formats and with several levels of annotation. The corpus is a valuable resource for a range of applications. Three are described as examples: as a translation memory in a computer-assisted translation tool, for terminology extraction and querying, and for training statistical machine translation systems.
Language: Catalan
ISSN: 0932-2221
Appears in Collections: Articles

Files in This Item:
File: 16_Oliver.pdf
Description: Article Zeitschrift für Katalanistik
Size: 546.07 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.