Tècniques de Deep Learning per reconeixement i classificació d'àudio

Garriga Muñoz, Jordi

Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/150950

Full metadata record

DC Field	Value	Language
dc.contributor.author	Garriga Muñoz, Jordi	-
dc.date.accessioned	2024-07-17T22:42:06Z	-
dc.date.available	2024-07-17T22:42:06Z	-
dc.date.issued	2024-06-28	-
dc.identifier.uri	http://hdl.handle.net/10609/150950	-
dc.description.abstract	El món de la Intel·ligència Artificial cada cop avança més ràpid i cada dia veiem novetats en tots els seus vessants, especialment en el processament i generació d’imatges i vídeo. Les actuals tècniques de deep learning com les Convolutional Neural Networks o els models transformers han permès grans avenços mai pensats abans. En aquest treball es pretén introduir el lector en l’ús d’aquestes tècniques adaptades al processament del so i àudio. Per fer-ho, s’ha triat un camp concret com és el del reconeixement i classificació de sons. El treball consta de dues parts: d’una banda, s’introdueixen els conceptes teòrics bàsics sobre aquesta disciplina; característiques del so i mètodes de conversió analògica-digital, processament, i tractament posterior. També s’expliquen les solucions disponibles actualment al mercat i els diferents estudis i recerca que diversos investigadors duen a terme en aquest camp. De l’altra, es vol mostrar un cas pràctic concret de l’ús del deep learning per la classificació de so. Mitjançant la creació d’un model convenientment entrenat a partir d’un dataset amb gran quantitat de referències sonores, aquest ha de ser capaç d’identificar i classificar amb el major nivell de precisió possible fragments sonors del mateix tipus. Per fer-ho, s’utilitzarà una combinació de diverses tècniques, englobades dins d’un concepte teòric conegut com CLAP (Contrastive Language-Audio Pretraining), que fa servir CNNs per processar els fragments sonors del conjunt d’entrenament juntament amb etiquetes de text que descriuen el so que conté el fragment.	ca
dc.description.abstract	Artificial Intelligence is advancing ever faster, and every day brings innovations across all its branches, especially in the processing and generation of images and video. Current deep learning techniques such as Convolutional Neural Networks (CNNs) and transformer models have enabled advances that were previously unimaginable. This work aims to introduce the reader to the use of these techniques as adapted to sound and audio processing. To do so, it focuses on one specific field: sound recognition and classification. The work is divided into two parts. The first introduces the basic theoretical concepts of the discipline: the characteristics of sound, analog-to-digital conversion methods, processing, and subsequent treatment. It also reviews the solutions currently available on the market and the studies and research being carried out in this field. The second presents a practical case of deep learning applied to sound classification. A model is trained on a dataset containing a large number of sound references, so that it can identify and classify sound fragments of the same type as accurately as possible. To do this, a combination of several techniques is used, grouped under a theoretical concept known as CLAP (Contrastive Language-Audio Pretraining), which uses CNNs to process the sound fragments of the training set together with text labels describing the sound each fragment contains.	en
dc.format.mimetype	application/pdf	ca
dc.language.iso	cat	ca
dc.publisher	Universitat Oberta de Catalunya (UOC)	ca
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	-
dc.subject	deep learning	ca
dc.subject	classificació d'àudio	ca
dc.subject	CNN	ca
dc.title	Tècniques de Deep Learning per reconeixement i classificació d'àudio	ca
dc.type	info:eu-repo/semantics/bachelorThesis	ca
dc.contributor.director	Moyà Alcover, Gabriel	-
dc.contributor.tutor	Sanchez Castaño, Friman	-
Appears in collections:	Bachelor thesis, research projects, etc.

Files in this item:

File	Description	Size	Format
jgarrigamoonTFG0624.pdf	Thesis report in PDF	2.17 MB	Adobe PDF
TFG_Presentació_JordiGarriga.mp4	Video of the thesis presentation	541.67 MB	MP4
