Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/109843
Full metadata record
DC Field | Value | Language
dc.contributor.author | Melenchón, Javier | -
dc.contributor.author | Martínez Marroquín, Elisa | -
dc.contributor.author | Torre Frade, Fernando de la | -
dc.contributor.author | Montero Morales, José Antonio | -
dc.contributor.other | Universitat Oberta de Catalunya. eLearning Innovation Center | -
dc.contributor.other | Universitat Ramon Llull | -
dc.contributor.other | Carnegie Mellon University | -
dc.date.accessioned | 2020-02-18T08:24:04Z | -
dc.date.available | 2020-02-18T08:24:04Z | -
dc.date.issued | 2009-01 | -
dc.identifier.citation | Melenchón Maldonado, J., Martínez Marroquín, E., De la Torre Frade, F. & Montero, J. (2009). Emphatic Visual Speech Synthesis. IEEE Transactions on Audio, Speech and Language Processing, 17(3), 459-468. doi: 10.1109/TASL.2008.2010213 | es
dc.identifier.issn | 1558-7916 | -
dc.identifier.uri | http://hdl.handle.net/10609/109843 | -
dc.description.abstract | The synthesis of talking heads has been a flourishing research area over the last few years. Since human beings have an uncanny ability to read people's faces, most related applications (e.g., advertising, video-teleconferencing) require absolutely realistic photometric and behavioral synthesis of faces. This paper proposes a person-specific facial synthesis framework that allows high realism and includes a novel way to control visual emphasis (e.g., level of exaggeration of visible articulatory movements of the vocal tract). There are three main contributions: a geodesic interpolation with visual unit selection, a parameterization of visual emphasis, and the design of minimum size corpora. Perceptual tests with human subjects reveal high realism properties, achieving similar perceptual scores as real samples. Furthermore, the visual emphasis level and two communication styles show a statistical interaction relationship. | en
dc.language.iso | eng | -
dc.publisher | IEEE Transactions on Audio, Speech and Language Processing | -
dc.relation.ispartof | IEEE Transactions on Audio, Speech and Language Processing, 2009, 17(3) | -
dc.relation.uri | https://doi.org/10.1109/TASL.2008.2010213 | -
dc.subject | audiovisual speech synthesis | en
dc.subject | emphatic visual-speech | en
dc.subject | talking head | en
dc.subject | síntesi audiovisual de la veu | ca
dc.subject | síntesis audiovisual de la voz | es
dc.subject | discurs visual emfàtic | ca
dc.subject | discurso visual enfático | es
dc.subject | tertulià | ca
dc.subject | tertuliano | es
dc.subject.lcsh | Speech processing systems | en
dc.title | Emphatic visual speech synthesis | -
dc.type | info:eu-repo/semantics/article | -
dc.subject.lemac | Processament de la parla | ca
dc.subject.lcshes | Procesamiento del habla | es
dc.rights.accessRights | info:eu-repo/semantics/closedAccess | -
dc.identifier.doi | 10.1109/TASL.2008.2010213 | -
dc.gir.id | AR/0000004340 | -
dc.relation.projectID | info:eu-repo/grantAgreement/TEC2006-08043/TCM | -
Appears in collections: Articles científics

Files in this item:
There are no files associated with this item.

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.