Academic Journal

Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking.

Bibliographic Details
Title: Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking.
Authors: Cain CN; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA., Ochoa GS; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA., Synovec RE; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA. Electronic address: synovec@chem.washington.edu.
Source: Journal of chromatography. A [J Chromatogr A] 2023 Apr 12; Vol. 1694, pp. 463920. Date of Electronic Publication: 2023 Mar 11.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Elsevier Country of Publication: Netherlands NLM ID: 9318488 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1873-3778 (Electronic) Linking ISSN: 00219673 NLM ISO Abbreviation: J Chromatogr A Subsets: MEDLINE
Imprint Name(s): Original Publication: Amsterdam ; New York : Elsevier, 1993-
MeSH Terms: Algorithms* ; Least-Squares Analysis ; Gas Chromatography-Mass Spectrometry/methods
Abstract: Chemometric methods like partial least squares (PLS) regression are valuable for correlating sample-based differences hidden in comprehensive two-dimensional gas chromatography (GC × GC) data to independently measured physicochemical properties. Herein, this work establishes the first implementation of tile-based variance ranking as a selective data reduction methodology to improve PLS modeling performance of 58 diverse aerospace fuels. Tile-based variance ranking discovered a total of 521 analytes with a square of the relative standard deviation (RSD²) in signal ranging from 0.07 to 22.84. The goodness-of-fit of the models was determined by their normalized root-mean-square error of cross-validation (NRMSECV) and normalized root-mean-square error of prediction (NRMSEP). PLS models developed for viscosity, hydrogen content, and heat of combustion using all 521 features discovered by tile-based variance ranking had a respective NRMSECV (NRMSEP) equal to 10.5 % (10.2 %), 8.3 % (7.6 %), and 13.1 % (13.5 %). In contrast, use of a single-grid binning scheme, a common data reduction strategy for PLS analysis, resulted in less accurate models for viscosity (NRMSECV = 14.2 %; NRMSEP = 14.3 %), hydrogen content (NRMSECV = 12.1 %; NRMSEP = 11.0 %), and heat of combustion (NRMSECV = 14.4 %; NRMSEP = 13.6 %). Further, the features discovered by tile-based variance ranking can be optimized for each PLS model with RReliefF analysis, a machine learning algorithm. RReliefF feature optimization selected 48, 125, and 172 analytes out of the original 521 discovered by tile-based variance ranking to model viscosity, hydrogen content, and heat of combustion, respectively. The RReliefF optimized features developed highly accurate property-composition models for viscosity (NRMSECV = 7.9 %; NRMSEP = 5.8 %), hydrogen content (NRMSECV = 7.0 %; NRMSEP = 4.9 %), and heat of combustion (NRMSECV = 7.9 %; NRMSEP = 8.4 %).
This work also demonstrates that processing the chromatograms with a tile-based approach allows the analyst to directly identify the analytes of importance in a PLS model. Coupling tile-based feature selection with PLS analysis allows for deeper understanding in any property-composition study.
Competing Interests: Declaration of Competing Interest None.
(Copyright © 2023 Elsevier B.V. All rights reserved.)
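Illustrative note: the two figures of merit quoted in the abstract above, RSD² and the normalized root-mean-square error, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the RMSE is normalized (the range of the reference values is assumed here) or whether a sample or population standard deviation is used (sample is assumed). The function names and feature labels are hypothetical, and the tile-based processing of the chromatograms is not reproduced.

```python
import math
from statistics import mean, stdev


def rsd_squared(signals):
    """Square of the relative standard deviation of one feature's signal
    across samples. Assumption: sample standard deviation (n - 1)."""
    m = mean(signals)
    if m == 0:
        raise ValueError("mean signal is zero; RSD is undefined")
    return (stdev(signals) / m) ** 2


def rank_by_rsd_squared(features):
    """Return (name, RSD^2) pairs sorted from most to least variable.
    `features` maps a hypothetical feature name to its per-sample signals."""
    scored = [(name, rsd_squared(values)) for name, values in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def nrmse_percent(reference, predicted):
    """RMSE as a percentage of the reference range.
    Assumption: range normalization; the abstract does not specify it."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    squared_errors = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    rmse = math.sqrt(mean(squared_errors))
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference values have zero range")
    return 100.0 * rmse / span


# Hypothetical signals for two features measured in three samples.
ranking = rank_by_rsd_squared({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [10.0, 10.5, 9.5],
})
```

Applied to held-out predictions the second function would play the role of NRMSEP, and applied to cross-validated predictions the role of NRMSECV, subject to the normalization assumption stated above.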
Contributed Indexing: Keywords: Comprehensive two-dimensional gas chromatography; Feature selection; Fuel analysis; Partial least squares regression; Tile-based variance ranking
Entry Date(s): Date Created: 20230318 Date Completed: 20230331 Latest Revision: 20230331
Update Code: 20231215
DOI: 10.1016/j.chroma.2023.463920
PMID: 36933463
Database: MEDLINE
Description
ISSN: 1873-3778
DOI:10.1016/j.chroma.2023.463920