Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/91506
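Title: Análisis y optimización de un autoencoder variacional semisupervisado para diseño molecular condicionado (Analysis and optimization of a semisupervised variational autoencoder for conditional molecular design)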
Author: Colmenarejo Sánchez, Gonzalo
Tutor: Vegas Lozano, Esteban
Others: Sánchez-Pla, Alex  
Abstract: The recent semisupervised variational autoencoder (SSVAE) of Kang & Cho (2018, J. Chem. Inf. Model., Article ASAP, DOI: 10.1021/acs.jcim.8b00263) has been analyzed and optimized as a deep learning inverse QSAR model for conditional molecular design. The aim is to characterize the output of the model (correctness, diversity and novelty, property distributions) as a function of several factors: size and diversity of the training set, size of the output set, type of molecules (drug-like vs natural products) and conditioning properties (MWt, logP and QED vs TPSA, MR and LASA). TensorFlow has been used for the simulations, and RDKit and chemfp as cheminformatics libraries. In its unconditioned mode, the model generates sets of molecules with high diversity (measured as number of unique molecules, number of clusters, and number of frameworks) and relatively low novelty (measured as percentage of molecules with no analogs, and percentage of new frameworks), while in the conditioned mode the diversity decreases and the novelty increases. Correctness is slightly higher in the conditioned mode, but is very high (>90%) in both modes. Diversity increases with the size of the output set (at a lower rate in conditioned mode) and does not depend on the size of the training set. Novelty decreases with the size of the training set and increases with the size of the output set. Both diversity and novelty increase with the intensive diversity of the training set. Moreover, the SSVAE has been modified to generate natural products and conditioned analogs (via multiobjective conditioning), thus extending the original molecular design capabilities of the model.
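As a rough illustration of three of the measures named in the abstract (number of unique molecules, number of frameworks, percentage of new frameworks), the sketch below counts them with plain Python sets. It is not the thesis code: the function name `diversity_and_novelty`, the input layout (pairs of SMILES string and framework string), and the toy data are assumptions made for illustration. The framework strings are assumed to be precomputed with a cheminformatics toolkit, and the SMILES are assumed to be canonical so that string equality means molecule equality. Cluster counts and the "no analogs" percentage need fingerprint similarity and are left out.

```python
def diversity_and_novelty(generated, training):
    """Count simple diversity/novelty measures for a generated set.

    Both arguments are lists of (smiles, framework) pairs. The framework
    strings are assumed to be precomputed elsewhere (hypothetical input).
    """
    unique_molecules = {smiles for smiles, _ in generated}
    generated_frameworks = {fw for _, fw in generated}
    training_frameworks = {fw for _, fw in training}

    # Frameworks that occur in the output but never in the training set.
    new_frameworks = generated_frameworks - training_frameworks
    if generated_frameworks:
        pct_new = 100.0 * len(new_frameworks) / len(generated_frameworks)
    else:
        pct_new = 0.0

    return {
        "unique_molecules": len(unique_molecules),
        "frameworks": len(generated_frameworks),
        "pct_new_frameworks": pct_new,
    }


# Toy data (made up for illustration, not taken from the thesis).
training = [("Cc1ccccc1", "c1ccccc1"), ("CCO", "")]
generated = [
    ("Oc1ccccc1", "c1ccccc1"),
    ("Oc1ccccc1", "c1ccccc1"),   # duplicate molecule
    ("CC1CCNCC1", "C1CCNCC1"),   # framework absent from training
]
result = diversity_and_novelty(generated, training)
print(result)
```

With the toy data, two of the three generated entries are distinct molecules, and one of the two generated frameworks does not occur in the training set.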
Keywords: deep learning; molecular design; variational autoencoder
Document type: info:eu-repo/semantics/masterThesis
Issue Date: 31-Dec-2018
Publication license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/  
Appears in Collections: Trabajos finales de carrera, trabajos de investigación, etc. (Final degree projects, research projects, etc.)

Files in This Item:
File                                   Description                          Size        Format
PresentaTFMGonzaloColmenarejoUOC.mp4   Master's thesis (TFM) presentation   216.65 MB   MP4
gcolmenarejoTFM1218memoria.pdf         Master's thesis (TFM) report         2.23 MB     Adobe PDF