Please use this identifier to cite or link to this item: http://hdl.handle.net/10609/149868
Title: Application of machine learning methods to predict phytoplankton blooms and determine microbial biomarkers using marine microbiomes
Author: Fernandez-Gonzalez, Nuria  
Tutor: Rebrij, Romina  
Others: Ventura, Carles  
Abstract: Understanding the relationship between bacterioplankton and coastal phytoplankton blooms is key to understanding the functioning of coastal ecosystems, which are the most productive areas for fisheries. With that knowledge, we could predict, and perhaps mitigate, the effects of global change or contamination events on these productive ecosystems. However, these microbial communities are governed by very complex relationships. In addition, the data used to study bacterioplankton diversity (Amplicon Sequence Variants of the 16S rRNA gene) are high-dimensional, sparse, and noisy. In this project, Random Forest classifiers trained on diversity data were used to predict coastal phytoplankton blooms and to search for their biomarkers. After the data from two oceanographic campaigns were merged, samples were classified as bloom or normal according to their total chlorophyll concentration. The resulting dataset was high-dimensional (166 instances, 7593 features) and imbalanced (31 bloom instances, 135 normal instances). Two alternative strategies were used to reduce dimensionality: biological features with relative abundances below 0.01 were removed, or features were grouped into clusters at the genus level. Random Forest models were trained and tuned with a grid search over the number of features included in the individual trees. The process was repeated with one hundred different splits of the data into training and test sets to ensure that the results were representative. Good performance (kappa, sensitivity, and specificity > 0.8) was achieved only after the synthetic minority oversampling technique (SMOTE) was used to balance the number of instances in the two categories. Using those models, the most important features, ranked by their effect on the predictive error rate, were selected as biomarkers.
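Three of the steps summarised in the abstract (the relative-abundance filter, SMOTE oversampling, and the kappa/sensitivity/specificity evaluation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: the function names `filter_low_abundance`, `smote` and `binary_metrics` are hypothetical; using the *mean* relative abundance across samples for the 0.01 cut-off is an assumption, since the abstract does not say how abundance was aggregated; and the SMOTE shown is the basic interpolation variant. Random Forest training and the grid search are not sketched, because the abstract does not name the software used.

```python
import math
import random


def filter_low_abundance(counts, threshold=0.01):
    """Convert per-sample counts to relative abundances and keep only features
    whose mean relative abundance across samples is at least `threshold`.
    Assumption: mean abundance is used; the abstract does not specify this."""
    rel = []
    for row in counts:
        total = sum(row)
        rel.append([c / total if total else 0.0 for c in row])
    keep = [j for j in range(len(rel[0]))
            if sum(r[j] for r in rel) / len(rel) >= threshold]
    return [[r[j] for j in keep] for r in rel], keep


def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority neighbours (Euclidean).
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


def binary_metrics(y_true, y_pred, positive="bloom"):
    """Return sensitivity, specificity and Cohen's kappa, treating `positive`
    as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    n = len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    p_obs = (tp + tn) / n
    # Agreement expected by chance, from the confusion-matrix marginals.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return sensitivity, specificity, kappa


if __name__ == "__main__":
    truth = ["bloom", "bloom", "normal", "normal", "normal", "bloom"]
    preds = ["bloom", "normal", "normal", "normal", "bloom", "bloom"]
    sens, spec, kappa = binary_metrics(truth, preds)
    print(round(sens, 3), round(spec, 3), round(kappa, 3))  # 0.667 0.667 0.333
```

With the class sizes reported in the abstract (31 bloom, 135 normal), generating 135 - 31 = 104 synthetic bloom samples would balance the full dataset. In a repeated train/test design, a common choice (assumed here, not stated in the abstract) is to oversample only the training part of each split, so that synthetic samples never reach the test set.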
Keywords: coastal blooms
biomarkers
random forest
Document type: info:eu-repo/semantics/masterThesis
Issue Date: 20-Jun-2023
Publication license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/  
Appears in Collections: Trabajos finales de carrera, trabajos de investigación, etc. (Final degree projects, research work, etc.)

Files in This Item:
File: nuriafergonzalezFMDP1323report.pdf
Description: Report of FMDP
Size: 3.04 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License.