دورية أكاديمية

A phylogenetic approach for weighting genetic sequences

التفاصيل البيبلوغرافية
العنوان: A phylogenetic approach for weighting genetic sequences
المؤلفون: Nicola De Maio, Alexander V. Alekseyenko, William J. Coleman-Smith, Fabio Pardi, Marc A. Suchard, Asif U. Tamuri, Jakub Truszkowski, Nick Goldman
المصدر: BMC Bioinformatics, Vol 22, Iss 1, Pp 1-24 (2021)
بيانات النشر: BMC, 2021.
سنة النشر: 2021
المجموعة: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
مصطلحات موضوعية: Phylogenetics, Sequence weights, Alignment, Protein profile, Conservation scores, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
الوصف: Abstract Background Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented. Results We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column—important, for example, in protein family profiling. We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. Conclusions Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.
نوع الوثيقة: article
وصف الملف: electronic resource
اللغة: English
تدمد: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-021-04183-8
URL الوصول: https://doaj.org/article/800e81744aa5473185abd18a8a0160c6
رقم الأكسشن: edsdoj.800e81744aa5473185abd18a8a0160c6
قاعدة البيانات: Directory of Open Access Journals
الوصف
تدمد:14712105
DOI:10.1186/s12859-021-04183-8