Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data

Bibliographic details
Title: Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data
Authors: Sepman, Helen; Malm, Louise; Peets, Pilleriin, 1992; MacLeod, Matthew, 1973; Martin, Jonathan W.; Breitholtz, Magnus; Kruve, Anneli
Source: Analytical Chemistry. 95(33):12329-12338
Description: Nontarget analysis by liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to the detection of previously unseen chemicals, and assessing the risk posed by these newly detected chemicals is an important challenge. Assessing the exposure and toxicity of chemicals detected with nontarget HRMS depends heavily on knowing the structure of the chemical. However, the majority of features detected in nontarget screening remain unidentified, and risk assessment with conventional tools is therefore hampered. Here, we developed MS2Quant, a machine learning model that enables prediction of concentration from the fragmentation (MS2) spectra of detected but unidentified chemicals. MS2Quant is an xgbTree algorithm-based regression model developed using ionization efficiency data for 1191 unique chemicals, spanning 8 orders of magnitude. The ionization efficiency values are predicted from structural fingerprints that can be computed from the SMILES notation of identified chemicals or from the MS2 spectra of unidentified chemicals using the SIRIUS+CSI:FingerID software. The root mean square errors of the training and test sets were 0.55 (3.5x) and 0.80 (6.3x) log-units, respectively. In comparison, ionization efficiency prediction approaches that depend on assigning an unequivocal structure typically yield errors from 2x to 6x. The MS2Quant quantification model was validated on a set of 39 environmental pollutants and resulted in a mean prediction error of 7.4x, a geometric mean of 4.5x, and a median of 4.0x. For comparison, a model based on PaDEL descriptors that depends on unequivocal structural assignment was developed using the same dataset. The latter approach yielded a comparable mean prediction error of 9.5x, a geometric mean of 5.6x, and a median of 5.2x on the validation set chemicals when the top structural assignment was used as input.
This confirms that MS2Quant enables the extraction of exposure information for unidentified chemicals which, although detected, have thus far been disregarded due to a lack of accurate tools for quantification. The MS2Quant model is available as an R package on GitHub for improving discovery and monitoring of potentially hazardous environmental pollutants with nontarget screening.
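Note on the error figures: the abstract reports errors both in log-units and as fold errors (for example, 0.55 log-units alongside 3.5x). The following minimal Python sketch shows how such figures can relate, assuming base-10 logarithms and assuming that the fold error of a single prediction is the larger of predicted/measured and measured/predicted. Both assumptions, and all function names, are illustrative and are not taken from the MS2Quant package.

```python
from statistics import geometric_mean, mean, median


def log_rmse_to_fold(rmse_log10):
    """Convert an error in log10 units to a fold error (assumes base 10)."""
    return 10 ** rmse_log10


def fold_error(predicted, measured):
    """Fold error of one prediction: always >= 1, symmetric in over/underestimation.

    Assumed definition: max(predicted/measured, measured/predicted).
    Both concentrations must be positive.
    """
    ratio = predicted / measured
    return max(ratio, 1 / ratio)


def summarize(fold_errors):
    """Mean, geometric mean, and median of a list of fold errors."""
    return {
        "mean": mean(fold_errors),
        "geometric_mean": geometric_mean(fold_errors),
        "median": median(fold_errors),
    }


if __name__ == "__main__":
    # RMSE values quoted in the abstract, converted to fold errors.
    print(round(log_rmse_to_fold(0.55), 1))
    print(round(log_rmse_to_fold(0.80), 1))
    # Hypothetical fold errors, for illustration only (not data from the paper).
    print(summarize([2.0, 4.0, 8.0]))
```

Because a few large fold errors pull the arithmetic mean upward, a mean that exceeds the geometric mean and median, as in the validation figures above, is what one would expect from a right-skewed error distribution.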
File description: print
Access URL: https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-220853
https://doi.org/10.1021/acs.analchem.3c01744
Database: SwePub
Description
ISSN: 0003-2700
1520-6882
DOI:10.1021/acs.analchem.3c01744