Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/93054
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sánchez Ruenes, David | -
dc.contributor.author | Batet, Montserrat | -
dc.contributor.other | Universitat Rovira i Virgili (URV) | -
dc.contributor.other | Universitat Oberta de Catalunya. Internet Interdisciplinary Institute (IN3) | -
dc.date.accessioned | 2019-04-11T07:53:58Z | -
dc.date.available | 2019-04-11T07:53:58Z | -
dc.date.issued | 2016-11 | -
dc.identifier.citation | Sánchez, D. & Batet, M. (2017). Toward sensitive document release with privacy guarantees. Engineering Applications of Artificial Intelligence, 59, 23-34. doi: 10.1016/j.engappai.2016.12.013 | -
dc.identifier.issn | 0952-1976 | -
dc.identifier.uri | http://hdl.handle.net/10609/93054 | -
dc.description.abstract | Privacy has become a serious concern for modern Information Societies. The sensitive nature of much of the data that are exchanged daily or released to untrusted parties requires that responsible organizations undertake appropriate privacy protection measures. Nowadays, many of these data are texts (e.g., emails, messages posted on social media, healthcare outcomes) that, because of their unstructured and semantic nature, constitute a challenge for automatic data protection methods. In fact, textual documents are usually protected manually, in a process known as document redaction or sanitization. To do so, human experts identify sensitive terms (i.e., terms that may reveal identities and/or confidential information) and protect them accordingly (e.g., via removal or, preferably, generalization). To relieve experts from this burdensome task, in a previous work we introduced the theoretical basis of C-sanitization, an inherently semantic privacy model that provides the basis for the development of automatic document redaction/sanitization algorithms and offers clear, a priori privacy guarantees on data protection. Despite its potential benefits, C-sanitization still presents some limitations when applied in practice (mainly regarding flexibility, efficiency and accuracy). In this paper, we propose a new, more flexible model, named (C, g(C))-sanitization, which enables an intuitive configuration of the trade-off between the desired level of protection (i.e., controlled information disclosure) and the preservation of the utility of the protected data (i.e., amount of semantics to be preserved). Moreover, we present a set of technical solutions and algorithms that provide an efficient and scalable implementation of the model and improve its practical accuracy, as we illustrate through empirical experiments. | en
dc.language.iso | eng | -
dc.publisher | Engineering Applications of Artificial Intelligence | -
dc.relation.ispartof | Engineering Applications of Artificial Intelligence, 2017, 59 | -
dc.relation.uri | https://doi.org/10.1016/j.engappai.2016.12.013 | -
dc.rights | CC BY-NC-ND | -
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | -
dc.subject | document redaction | en
dc.subject | sanitization | en
dc.subject | semantics | en
dc.subject | ontologies | en
dc.subject | privacy | en
dc.subject | redacción de documentos | es
dc.subject | desinfección | es
dc.subject | semántica | es
dc.subject | ontologías | es
dc.subject | privacidad | es
dc.subject | redacció de documents | ca
dc.subject | higienització | ca
dc.subject | semàntica | ca
dc.subject | ontologies | ca
dc.subject | privacitat | ca
dc.subject.lcsh | Data protection | en
dc.title | Toward sensitive document release with privacy guarantees | -
dc.type | info:eu-repo/semantics/article | -
dc.subject.lemac | Protecció de dades | ca
dc.subject.lcshes | Protección de datos | es
dc.rights.accessRights | info:eu-repo/semantics/openAccess | -
dc.identifier.doi | 10.1016/j.engappai.2016.12.013 | -
dc.gir.id | AR/0000005425 | -
dc.type.version | info:eu-repo/semantics/submittedVersion | -
Appears in Collections: Articles científics
Articles

Files in This Item:
File | Description | Size | Format
towardsensitive.pdf | Preprint | 454,17 kB | Adobe PDF

Items in repository are protected by copyright, with all rights reserved, unless otherwise indicated.