Please use this identifier to cite or link to this item:

http://hdl.handle.net/10609/77147
Title: Semi-automatic generation of a corpus of Wikipedia articles on science and technology
Author: Minguillón Alfonso, Julià  
Lerga Felip, Maura
Aibar Puentes, Eduard  
Lladós Masllorens, Josep  
Meseguer Artola, Antoni  
Others: Universitat Oberta de Catalunya (UOC)
Keywords: Wikipedia
science and technology
Infomap
community detection
Unesco taxonomy
corpus
Issue Date: Sep-2017
Publisher: El Profesional de la Información
Citation: Minguillón, J., Lerga Felip, M., Aibar, E., Lladós-Masllorens, J. & Meseguer-Artola, A. (2017). Semi-automatic generation of a corpus of Wikipedia articles on science and technology. El Profesional de la Información, 26(5), 995-1005. doi: 10.3145/epi.2017.sep.20
Also see: http://www.elprofesionaldelainformacion.com/contenidos/2017/sep/20.html
Abstract: Despite the huge amount of scientific and technological content available on the World Wide Web, most of it is closed behind paywalls, as with academic journals, or almost invisible, as with institutional repositories. Wikipedia can act as a chain-transfer agent, providing people with an accessible, organized structure containing both understandable content and links to original sources. In Wikipedia, categories are collaboratively created and thus become a folksonomy rather than a true taxonomy. Consequently, categories are not a reliable tool to identify topics' organization. In this paper we describe a semi-automatic method, based on random walks, for determining a subset of pages containing scientific and technological content in the Spanish Wikipedia. Using the Unesco taxonomy, we determined the underlying graph structure of our corpus and detected clusters of pages strongly linked, establishing relationships between knowledge domains. Finally, we present the distribution of Wikipedia articles according to the Unesco taxonomy and the resulting map of scientific and technological content.
Language: English
URI: http://hdl.handle.net/10609/77147
ISSN: 1699-2407
Appears in Collections: Articles

Files in This Item:
File: minguillon_semi_automatic.pdf
Size: 2.62 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License.