Academic Journal
Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking.
Title: | Enhancing partial least squares modeling of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry data by tile-based variance ranking. |
---|---|
Authors: | Cain CN; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA., Ochoa GS; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA., Synovec RE; Department of Chemistry, University of Washington, Box 351700, Seattle, WA, 98195, USA. Electronic address: synovec@chem.washington.edu. |
Source: | Journal of chromatography. A [J Chromatogr A] 2023 Apr 12; Vol. 1694, pp. 463920. Date of Electronic Publication: 2023 Mar 11. |
Publication Type: | Journal Article |
Language: | English |
Journal Info: | Publisher: Elsevier Country of Publication: Netherlands NLM ID: 9318488 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1873-3778 (Electronic) Linking ISSN: 00219673 NLM ISO Abbreviation: J Chromatogr A Subsets: MEDLINE |
Imprint Name(s): | Original Publication: Amsterdam ; New York : Elsevier, 1993- |
MeSH Terms: | Algorithms* ; Least-Squares Analysis ; Gas Chromatography-Mass Spectrometry/methods |
Abstract: | Chemometric methods like partial least squares (PLS) regression are valuable for correlating sample-based differences hidden in comprehensive two-dimensional gas chromatography (GC × GC) data to independently measured physicochemical properties. This work establishes the first implementation of tile-based variance ranking as a selective data reduction methodology to improve PLS modeling performance of 58 diverse aerospace fuels. Tile-based variance ranking discovered a total of 521 analytes with a square of the relative standard deviation (RSD²) in signal between 0.07 and 22.84. The goodness-of-fit of the models was determined by their normalized root-mean-square error of cross-validation (NRMSECV) and normalized root-mean-square error of prediction (NRMSEP). PLS models developed for viscosity, hydrogen content, and heat of combustion using all 521 features discovered by tile-based variance ranking had a respective NRMSECV (NRMSEP) equal to 10.5 % (10.2 %), 8.3 % (7.6 %), and 13.1 % (13.5 %). In contrast, use of a single-grid binning scheme, a common data reduction strategy for PLS analysis, resulted in less accurate models for viscosity (NRMSECV = 14.2 %; NRMSEP = 14.3 %), hydrogen content (NRMSECV = 12.1 %; NRMSEP = 11.0 %), and heat of combustion (NRMSECV = 14.4 %; NRMSEP = 13.6 %). Further, the features discovered by tile-based variance ranking can be optimized for each PLS model with RReliefF analysis, a machine learning algorithm. RReliefF feature optimization selected 48, 125, and 172 analytes out of the original 521 discovered by tile-based variance ranking to model viscosity, hydrogen content, and heat of combustion, respectively. The RReliefF optimized features developed highly accurate property-composition models for viscosity (NRMSECV = 7.9 %; NRMSEP = 5.8 %), hydrogen content (NRMSECV = 7.0 %; NRMSEP = 4.9 %), and heat of combustion (NRMSECV = 7.9 %; NRMSEP = 8.4 %). This work also demonstrates that processing the chromatograms with a tile-based approach allows the analyst to directly identify the analytes of importance in a PLS model. Coupling tile-based feature selection with PLS analysis allows for deeper understanding in any property-composition study. Competing Interests: Declaration of Competing Interest: None. (Copyright © 2023 Elsevier B.V. All rights reserved.) |
Contributed Indexing: | Keywords: Comprehensive two-dimensional gas chromatography; Feature selection; Fuel analysis; Partial least squares regression; Tile-based variance ranking |
Entry Date(s): | Date Created: 20230318 Date Completed: 20230331 Latest Revision: 20230331 |
Update Code: | 20231215 |
DOI: | 10.1016/j.chroma.2023.463920 |
PMID: | 36933463 |
Database: | MEDLINE |
ISSN: | 1873-3778 |
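
Note on the reported metrics: the abstract quotes a squared relative standard deviation (RSD²) per analyte and normalized root-mean-square errors (NRMSECV, NRMSEP), but the record does not state how the errors were normalized. The Python sketch below is only an illustration of how such quantities are commonly computed. It assumes the sample standard deviation for RSD² and normalization by the range of the measured property values; both choices, and the function names `rsd_squared` and `nrmse`, are hypothetical and not taken from the article. It does not implement tile-based variance ranking, PLS, or RReliefF.

```python
from math import sqrt
from statistics import mean, stdev


def rsd_squared(signals):
    """Squared relative standard deviation of one feature across samples.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m = mean(signals)
    if m == 0:
        raise ValueError("mean signal is zero; RSD is undefined")
    return (stdev(signals) / m) ** 2


def nrmse(measured, predicted):
    """Root-mean-square error divided by the range of the measured values.

    Assumption: range normalization; the article may normalize differently.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(measured, predicted)]
    rmse = sqrt(mean(squared_errors))
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured values have zero range")
    return rmse / span


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    print(rsd_squared([10.0, 12.0, 8.0, 10.0]))
    print(nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Under these assumptions, NRMSECV would be `nrmse` applied to predictions for held-out cross-validation samples, and NRMSEP the same function applied to an independent prediction set; multiply by 100 to express the result as a percentage.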