Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/150334
Title: Improving term candidate validation using ranking metrics
Author: Vàzquez, Mercè  
Oliver, Antoni  
Citation: Vàzquez, M. [Mercè] & Oliver, A. [Antoni]. (2012). Improving Term Candidate Validation Using Ranking Metrics. Proceedings of the 3rd World Conference on Information Technology (WCIT-2012), 14-16 November 2012, Barcelona, Spain
Abstract: At times it is difficult to automatically identify the most representative terms in a specialized corpus and to validate them as correct, because words and terms are so similar. In order to identify the most representative terms in a corpus in a way that can be easily adapted to any language or terminology extraction tool, we explore the combination of token slot extraction and ranking metrics to select term candidates with a high likelihood of being terminological units. This paper presents the results obtained with four statistical measures. We observe high term detection in English corpora (a precision of 76.92% and a recall of 79.09%) and Spanish corpora (a precision of 60% and a recall of 70.48%) using token slot detection together with four ranking metrics: Dice, True Mutual Information, T-score and Log-likelihood. In conclusion, token slot detection extracts terminological patterns in term candidates to reduce lists of candidates, and ranking metrics improve results and reduce the number of candidates to be evaluated manually. We will evaluate the algorithm’s performance in other domains and for other user profiles and needs.
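The four ranking metrics named in the abstract are commonly defined over a 2x2 contingency table of bigram counts. The following is a minimal sketch using those textbook definitions; it is not taken from the paper, and the paper's exact formulation may differ. The function name `association_scores`, its parameters, and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def association_scores(n11, n1p, np1, npp):
    """Score a two-word term candidate with four association measures.

    Assumed inputs (hypothetical names, not from the paper):
      n11 - frequency of the bigram (w1, w2)
      n1p - frequency of w1 as the first word of any bigram
      np1 - frequency of w2 as the second word of any bigram
      npp - total number of bigrams in the corpus
    """
    # Observed cells of the 2x2 contingency table.
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    # Row and column totals matching each observed cell.
    rows = [n1p, n1p, npp - n1p, npp - n1p]
    cols = [np1, npp - np1, np1, npp - np1]
    # Expected counts under the assumption that w1 and w2 are independent.
    expected = [r * c / npp for r, c in zip(rows, cols)]
    # Cells with a zero observed count contribute nothing to the sums.
    cells = [(o, e) for o, e in zip(observed, expected) if o > 0]

    return {
        "dice": 2 * n11 / (n1p + np1),
        "t_score": (n11 - expected[0]) / math.sqrt(n11),
        "true_mi": sum((o / npp) * math.log2(o / e) for o, e in cells),
        "log_likelihood": 2 * sum(o * math.log(o / e) for o, e in cells),
    }


# Usage with made-up counts: a strongly associated pair and a pair that
# co-occurs exactly as often as chance predicts.
strong = association_scores(n11=30, n1p=40, np1=50, npp=1000)
chance = association_scores(n11=10, n1p=100, np1=100, npp=1000)
print(strong["dice"] > chance["dice"])
```

Sorting candidates by any of these scores in descending order gives a ranked list, so that manual validation can start with the candidates most likely to be terms.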
Keywords: term candidate validation
token slot detection
term extraction
ranking metrics
Document type: info:eu-repo/semantics/conferenceObject
Issue Date: 2-Jan-2013
Publication license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/  
Appears in Collections: Conferencias

Files in This Item:
File: Vazquez-Oliver_.Improving-Term-Candidate-Validation-Using-Ranking-Metrics.pdf
Description: Improving Term Candidate Validation Using Ranking Metrics
Size: 801,46 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.