Integrative learning for heterogeneous blockwise missing omics data

Baena Miret, Sergi

Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/145466

Title:	Integrative learning for heterogeneous blockwise missing omics data
Author:	Baena Miret, Sergi
Tutor:	Reverter, Ferran; Vegas Lozano, Esteban
Abstract:	In many settings the information one can gather is incomplete, because not all data sources are available for every observation (a situation known as block-wise missing data). The question that arises is how to implement an integrative process for block-wise missing data, based on a Lasso-type approach, that can then be applied to real omics data. In this thesis we solve a regression optimization problem consisting of a unified feature learning model for heterogeneous block-wise missing (or even complete) data that performs feature-level and source-level analysis simultaneously. The novelty of this thesis is that, although the formulation and the theoretical optimization of the problem can be found, we were unable to find a code implementation anywhere, so until we succeeded in implementing it ourselves it was impossible to give a reasonable evaluation of the model. To evaluate the model (that is, to study its effectiveness and performance) we use synthetic data generated by a linear regression model and real data drawn from a new collaborative research project called the Human Early-Life Exposome (HELIX). All in all, in this manuscript we study a bi-level feature learning model motivated by the exposome data and implement code that handles both complete and block-wise missing data. Specifically, we introduce a unified feature learning model for complete data, which contains several classical convex models and is easily extended to the more challenging case of block-wise missing data. In the end, we succeed in presenting a regression optimization model that, given complete or block-wise missing data, extracts information from it in order to make predictions for similarly structured data. In particular, we observe great results for the simulated data and quite good results for the exposome data.
Keywords:	optimization regression model; machine learning; omics data
Document type:	info:eu-repo/semantics/masterThesis
Issue Date:	2-Jun-2022
Publication license:	http://creativecommons.org/licenses/by/3.0/es/
Appears in Collections:	Final degree projects, research papers, etc.