Title
Metabolomics data processing and analysis approaches
Author
Abstract
The field of metabolomics studies the end products of cellular processes, referred to as metabolites. These metabolites are small molecules that perform a vast number of functions in living organisms. Experiments are performed to study these metabolites and their behaviour in specific environments. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry coupled to liquid chromatography (LC-MS) are the most commonly used techniques. Both produce enormous amounts of data, LC-MS experiments especially. This dissertation focusses on the data processing and analysis that follow a metabolomics experiment. In the first part of this dissertation, a novel tool, speaq 2.0, was developed for the pre-processing of NMR spectra. Analogous to the methods used for the pre-processing of LC-MS data, wavelets are used to perform peak-picking on the spectra. Peak-picking results in a substantial decrease in data size without a large loss of information. This is in contrast to the common binning or bucketing pre-processing method for NMR. The software contains additional pre-processing steps, such as grouping and peak filling, to convert the spectra into a data matrix ready for statistical analysis or machine learning. A main advantage of the speaq software is its broad applicability. Firstly, it can augment existing metabolomics workflows by providing a better pre-processing approach for data reduction; secondly, the method is applicable to other (i.e. non-metabolomics) experiments that produce one-dimensional spectra. The software is freely distributed in the form of the speaq R package.
The second part of the dissertation focusses on a specific type of metabolomics experiment called dynamic metabolomics. In these experiments, data from the same sample are collected multiple times over the course of the experiment. Compared to standard metabolomics experiments, this increases both the data size and the complexity of the data analysis, as the data are now longitudinal. Data analysis for dynamic or longitudinal metabolomics is not commonplace. In this thesis, a novel method was developed that couples an expert label collection procedure, called tinderesting, to machine learning. This allows the fast screening of large datasets in search of signals relevant to the specific experimental setup. To validate this method, it was compared to the state of the art on both simulated and experimental data. We show that our method performs on par with the state of the art on the simulated data but has specific benefits and performance gains when analysing experimental data. Finally, the process of combining experimental and simulated data to train a machine learning model is used to find data features with specific patterns in dynamic metabolomics data. The initial results indicate that this method can be used to find specific signals in complex data.
Language
Dutch
Publication
Antwerpen, 2019
Volume/pages
148 p.
Note
Laukens, Kris [Supervisor]
Covaci, Adrian [Supervisor]
Full text (open access)