G M Randazzo - Application of Deep Learning to Metabolomics
12 June 2019
Manno, Galleria 1, 2nd floor, room G1-201 @12:00
Untargeted steroid identification is an important analytical challenge because of the chemical similarity of the molecules. New experimental technologies such as two-dimensional gas chromatography (GCxGC) coupled with high-resolution time-of-flight mass spectrometry (HRMS-TOF) have been shown to offer superior separation power, especially for discriminating isomeric compounds. Unfortunately, few molecules are generally annotated, which limits our understanding of steroid metabolism in its full complexity. To overcome this limitation, in silico retention time prediction is an attractive option.
In this work, several machine learning and deep learning algorithms were used to develop retention time prediction models for GCxGC. Starting from a three-dimensional molecular representation, convolutional neural networks (CNNs) achieved better prediction performance than classical machine learning models based on handcrafted molecular descriptors. Moreover, the CNNs were shown to recognize chiral information, which addresses an important issue in steroid identification without the need for manual feature engineering.
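To illustrate why a three-dimensional input can carry chiral information that some handcrafted descriptors discard, here is a minimal sketch using only the Python standard library. It is not the representation or model used in this work: the toy molecule, the grid size and resolution, and the helper names `pair_descriptor`, `voxelize` and `mirror` are all assumptions made for the example. It compares a molecule and its mirror image under a distance-based descriptor and under a simple voxel grid of the kind a 3D CNN could take as input.

```python
from itertools import combinations
from math import dist

# Hypothetical toy molecule (not from the study): a central carbon with four
# different substituents at tetrahedral positions, which makes it chiral.
molecule = [
    ("C", 0.0, 0.0, 0.0),
    ("F", 1.0, 1.0, 1.0),
    ("Cl", 1.0, -1.0, -1.0),
    ("Br", -1.0, 1.0, -1.0),
    ("H", -1.0, -1.0, 1.0),
]


def mirror(atoms):
    """Reflect the molecule through the plane x = 0 (gives the other enantiomer)."""
    return [(element, -x, y, z) for element, x, y, z in atoms]


def pair_descriptor(atoms):
    """Sorted list of (element pair, interatomic distance).

    A reflection preserves every distance, so this descriptor cannot
    tell a molecule from its mirror image.
    """
    return sorted(
        (tuple(sorted((a[0], b[0]))), round(dist(a[1:], b[1:]), 6))
        for a, b in combinations(atoms, 2)
    )


def voxelize(atoms, size=8, resolution=0.5, origin=-2.0):
    """Place each atom on a sparse 3D grid: {(i, j, k): element}.

    The grid parameters are arbitrary choices for this illustration.
    """
    grid = {}
    for element, x, y, z in atoms:
        index = tuple(int((c - origin) / resolution) for c in (x, y, z))
        if all(0 <= i < size for i in index):
            grid[index] = element
    return grid


mirrored = mirror(molecule)
same_descriptor = pair_descriptor(molecule) == pair_descriptor(mirrored)
same_grid = voxelize(molecule) == voxelize(mirrored)
print("distance descriptor identical:", same_descriptor)
print("voxel grid identical:", same_grid)
```

In this sketch the distance descriptor is the same for both enantiomers, while the voxel grids differ. Note that a voxel grid also changes when the molecule is simply rotated, so a real pipeline would need a consistent alignment or rotation augmentation; how that was handled in this work is not stated here.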
The final prediction model was applied to a real clinical case study. Combined with the MS information, the retention time predictions allowed the untargeted annotation of 12 steroids in the urine of newborns.