Análisis de la Encuesta de Salud Nacional y Examen de Nutrición de Estados Unidos (NHANES) usando machine learning

Crespo Estévez, María José

Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/99127

Full metadata record

DC Field	Value	Language
dc.contributor.author	Crespo Estévez, María José	-
dc.date.accessioned	2019-07-13T17:11:49Z	-
dc.date.available	2019-07-13T17:11:49Z	-
dc.date.issued	2019-06	-
dc.identifier.uri	http://hdl.handle.net/10609/99127	-
dc.description.abstract	En este trabajo se usará el conjunto de datos de kaggle National Health and Nutrition Examination Survey. La finalidad será diseñar e implementar diferentes modelos no supervisados para identificar patrones, descubrir cómo tienden los datos a agruparse y si existen comorbilidades entre las enfermedades. También diseñaremos modelos predictivos para detectar si un paciente sufre hipertensión. En los modelos de clustering, escogemos el parámetro n_neighbors con el método del codo y los parámetros de los modelos predictivos con el RandomizedSearchCV y después con GridSearchCV. Se implementa un modelo de clustering con k-Means para el conjunto total de los datos y otro para las enfermedades del archivo medications. En el primero se concluye que la edad y las variables relacionadas con la salud dental son las más importantes para la determinación de los clústeres; en el segundo se obtienen unas posibles comorbilidades para las enfermedades. Para los modelos predictivos se usan los algoritmos: Support Vector Classification, Gradient Boosting Classifier, AdaBoost Classifier, Random Forest Classifier, Naive Bayes, Logistic Regression y k-NN de la librería sklearn. El mejor modelo se obtiene con AdaBoost y una exactitud de 76.33, aunque el Naive Bayes ofrece un buen resultado del TPR de 62.69 al obtenerse la menor cantidad de falsos negativos entre todos los modelos.	es
dc.description.abstract	This work uses the Kaggle dataset National Health and Nutrition Examination Survey. The aim is to design and implement several unsupervised models to identify patterns, to discover how the data tend to cluster, and to determine whether there are comorbidities among the diseases. We also design predictive models to detect whether a patient suffers from hypertension. In the clustering models, we choose the parameter n_neighbors with the elbow method, and we tune the parameters of the predictive models first with RandomizedSearchCV and then with GridSearchCV. One k-Means clustering model is implemented for the full dataset and another for the diseases in the medications file. From the first, we conclude that age and the variables related to dental health are the most important in determining the clusters; from the second, we obtain possible comorbidities among the diseases. The predictive models use the following algorithms from the sklearn library: Support Vector Classification, Gradient Boosting Classifier, AdaBoost Classifier, Random Forest Classifier, Naive Bayes, Logistic Regression and k-NN. The best model is obtained with AdaBoost, with an accuracy of 76.33, although Naive Bayes gives a good TPR of 62.69, as it produces the fewest false negatives of all the models.	en
dc.description.abstract	En aquest treball es farà servir el conjunt de dades de kaggle National Health and Nutrition Examination Survey. La finalitat serà dissenyar i implementar diferents models no supervisats per identificar patrons, descobrir com tendeixen les dades a agrupar-se i si hi ha comorbiditats entre les malalties. També dissenyarem models predictius per detectar si un pacient pateix hipertensió. En els models de clustering, escollim el paràmetre n_neighbors amb el mètode del colze i els paràmetres dels models predictius amb el RandomizedSearchCV i després amb GridSearchCV. S'implementa un model de clustering amb k-Means per al conjunt total de les dades i un altre per a les malalties de l'arxiu medications. En el primer es conclou que l'edat i les variables relacionades amb la salut dental són les més importants per a la determinació dels clústers; en el segon s'obtenen unes possibles comorbiditats per a les malalties. Per als models predictius s'usen els algoritmes: Support Vector Classification, Gradient Boosting Classifier, AdaBoost Classifier, Random Forest Classifier, Naive Bayes, Logistic Regression i k-NN de la llibreria sklearn. El millor model s'obté amb AdaBoost i una exactitud de 76.33, tot i que el Naive Bayes ofereix un bon resultat del TPR de 62.69 en obtenir-se el mínim de falsos negatius entre tots els models.	ca
dc.format.mimetype	application/pdf	-
dc.language.iso	spa	-
dc.publisher	Universitat Oberta de Catalunya (UOC)	-
dc.rights	CC BY-NC-ND	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	-
dc.subject	medicina	es
dc.subject	NHANES	es
dc.subject	aprendizaje automático	es
dc.subject	medicine	en
dc.subject	NHANES	en
dc.subject	machine learning	en
dc.subject	medicina	ca
dc.subject	NHANES	ca
dc.subject	aprenentatge automàtic	ca
dc.subject.lcsh	Machine learning -- TFM	en
dc.title	Análisis de la Encuesta de Salud Nacional y Examen de Nutrición de Estados Unidos (NHANES) usando machine learning	-
dc.type	info:eu-repo/semantics/masterThesis	-
dc.audience.educationlevel	Estudis de Màster	ca
dc.audience.educationlevel	Estudios de Máster	es
dc.audience.educationlevel	Master's degrees	en
dc.subject.lemac	Aprenentatge automàtic -- TFM	ca
dc.subject.lcshes	Aprendizaje automático -- TFM	es
dc.contributor.director	Casas-Roma, Jordi	-
dc.contributor.tutor	Subirats, Laia	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
Appears in collections:	Bachelor thesis, research projects, etc.
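
The abstract compares models on two metrics: accuracy and TPR (true positive rate, the share of actual positives that the model detects). The following is a minimal Python sketch of how both are derived from confusion-matrix counts, and of why the model with the fewest false negatives need not be the most accurate one. The function names, the labels model_a and model_b, and all counts are hypothetical illustrations; they are not taken from the thesis.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def true_positive_rate(tp: int, fn: int) -> float:
    """Fraction of actual positives that are detected (also called recall)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive samples")
    return tp / positives


# Hypothetical counts for two classifiers evaluated on the same 400 patients,
# 100 of whom actually have the condition. These are NOT the thesis results.
model_a = {"tp": 50, "fn": 50, "tn": 280, "fp": 20}  # fewer false alarms
model_b = {"tp": 70, "fn": 30, "tn": 230, "fp": 70}  # fewer false negatives

for name, counts in (("model_a", model_a), ("model_b", model_b)):
    acc = accuracy(**counts)
    tpr = true_positive_rate(counts["tp"], counts["fn"])
    print(f"{name}: accuracy={acc:.4f}, TPR={tpr:.4f}")
```

In this made-up example model_a has the higher accuracy while model_b has the higher TPR, which mirrors the kind of trade-off the abstract describes between AdaBoost and Naive Bayes.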

Files in this item:

File	Description	Size	Format
marcreestTFM0619presentacion.pptx	Presentation (pptx)	1.57 MB	Microsoft Powerpoint XML
marcreestTFM0619memoria.pdf	Master's thesis (TFM) report	1.06 MB	Adobe PDF
marcreestTFM0619presentación.pdf	Master's thesis (TFM) presentation	1.97 MB	Adobe PDF