Comparison of missing data handling methods for variant pathogenicity predictors

Bibliographic details
Title: Comparison of missing data handling methods for variant pathogenicity predictors
Authors: Mikko Ilmari Särkkä, Sami Myöhänen, Kaloyan Marinov, Inka Saarinen, Leo Lahti, Vittorio Fortino, Jussi Paananen
Publication details: Cold Spring Harbor Laboratory, 2022.
Publication year: 2022
Description: Background: Modern clinical genetic tests utilize next-generation sequencing (NGS) approaches to comprehensively analyze genetic variants from patients. Out of these millions of variants, clinically relevant variants that match the patient's phenotype need to be identified accurately and quickly enough to support clinical action. As manual evaluation of variants cannot meet the speed and volume requirements of clinical genetic testing, automated solutions are needed. Various machine learning (ML), artificial intelligence (AI), and in silico variant pathogenicity predictors have been developed to solve this challenge. These solutions rely on the comprehensiveness of the available data and struggle with the sparse nature of genetic variant data. Therefore, careful treatment of missing data is necessary, and the selected methods may have a large impact on accuracy, reliability, speed, and associated computational costs. Results: We present an open-source framework called AMISS that can be used to evaluate the performance of different methods for handling missing genetic variant data in the context of variant pathogenicity prediction. Using AMISS, we evaluated 14 methods for handling missing values. The performance of these methods varied substantially in terms of precision, computational costs, and other attributes. Overall, simpler imputation methods, and specifically mean imputation, performed best. Conclusions: Selection of the missing data handling method is crucial for AI/ML-based classification of genetic variants. We show that utilizing sophisticated imputation methods is not worth the cost when used in the context of genetic variant pathogenicity classification.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::aac9baef7c3c796ba8843e0610f026f2
https://doi.org/10.1101/2022.06.17.496578
Accession number: edsair.doi...........aac9baef7c3c796ba8843e0610f026f2
Database: OpenAIRE