Academic Journal

dgfr: an R package to assess sequence diversity of gene families

Bibliographic Details
Title: dgfr: an R package to assess sequence diversity of gene families
Authors: Laila Viana Almeida, João Luís Reis-Cunha, Daniella C. Bartholomeu
Source: BMC Bioinformatics, Vol 25, Iss 1, Pp 1-6 (2024)
Publisher Information: BMC, 2024.
Publication Year: 2024
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
Subject Terms: Sequence diversity, Gene families, Clustering, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Description: Abstract. Background: Gene families are groups of homologous genes that often have similar biological functions. These families are formed by gene duplication events throughout evolution, resulting in multiple copies of an ancestral gene. Over time, these copies can acquire mutations and structural variations, resulting in members that may vary in size, motif ordering and sequence. Multigene families have been described in a broad range of organisms, from single-celled bacteria to complex multicellular organisms, and have been linked to an array of phenomena, such as host–pathogen interactions, immune evasion and embryonic development. Despite the importance of gene families, few approaches have been developed for estimating and graphically visualizing their diversity patterns and expression profiles in genome-wide studies. Results: Here, we introduce an R package named dgfr, which estimates and enables the visualization of sequence divergence within gene families, as well as the visualization of secondary data such as gene expression. The package takes as input a multi-fasta file containing the coding sequences (CDS) or amino acid sequences from a multigene family, performs a pairwise alignment among all sequences, and estimates their distances, which are then subjected to dimension reduction, optimal cluster determination, and gene assignment to each cluster. The result is a dataset that allows for the visualization of sequence divergence and expression within the gene family, as well as an approximation of the number of clusters present in the family. Conclusions: dgfr provides a way to estimate and study the diversity of gene families, as well as visualize the dispersion and secondary profile of the sequences. The dgfr package is available at https://github.com/lailaviana/dgfr under the GPL-3 license.
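Illustration: The workflow summarized in the abstract (pairwise comparison, distance estimation, choosing a cluster count, assigning genes to clusters) can be illustrated with a minimal Python sketch. This is not the dgfr implementation or its API: dgfr is an R package, and every name below (`parse_fasta`, `distance_matrix`, `k_medoids`, `silhouette`, `cluster_family`, the toy sequences) is hypothetical. As stated assumptions, the sketch substitutes `difflib.SequenceMatcher` similarity for a true pairwise alignment, uses k-medoids with a mean silhouette score as one possible way to pick the number of clusters, and omits the dimension reduction step.

```python
from difflib import SequenceMatcher
from itertools import combinations


def parse_fasta(text):
    """Parse multi-FASTA text into an ordered {id: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def distance_matrix(seqs):
    """Pairwise distance = 1 - similarity ratio (a stand-in for alignment)."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        ratio = SequenceMatcher(None, seqs[i], seqs[j], autojunk=False).ratio()
        d[i][j] = d[j][i] = 1.0 - ratio
    return d


def k_medoids(d, k, iters=50):
    """Simple k-medoids on a distance matrix with farthest-first seeding."""
    n = len(d)
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster ends up empty.
            new.append(min(members, key=lambda m: sum(d[m][j] for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]


def silhouette(d, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    n, clusters = len(d), set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def cluster_family(fasta_text, max_k=4):
    """Return ({gene_id: cluster}, chosen_k); assumes at least 3 sequences."""
    records = parse_fasta(fasta_text)
    ids = list(records)
    d = distance_matrix([records[i] for i in ids])
    candidates = range(2, min(max_k, len(ids) - 1) + 1)
    best_k = max(candidates, key=lambda k: silhouette(d, k_medoids(d, k)))
    return dict(zip(ids, k_medoids(d, best_k))), best_k


# Toy gene family (made-up sequences): two visibly distinct subgroups.
FASTA = """>g1
ATGAAAAAAAAAAAAAAAAATAA
>g2
ATGAAAAAAAAATAAAAAAATAA
>g3
ATGAAAAAAAAAAAAAACAATAA
>g4
ATGCGCGCGCGCGCGCGCGCTAG
>g5
ATGCGCGCGCGCGGGCGCGCTAG
>g6
ATGCGCGCGCGCGCGCGCCCTAG
"""

clusters, best_k = cluster_family(FASTA)
print(best_k, clusters)
```

On real data, the similarity ratio would be replaced by distances derived from proper pairwise alignments, and the distance matrix would typically be projected to two dimensions before plotting, as the abstract describes.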
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-024-05826-2
Access URL: https://doaj.org/article/2943ce5ea26646df9cd39ecf55935048
Accession Number: edsdoj.2943ce5ea26646df9cd39ecf55935048
Database: Directory of Open Access Journals