A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector

Bibliographic details
Title: A novel numerical representation for proteins: Three-dimensional Chaos Game Representation and its Extended Natural Vector
Authors: Shaojun Pei, Stephen S.-T. Yau, Zeju Sun, Rong Lucy He
Source: Computational and Structural Biotechnology Journal, Vol 18, Iss, Pp 1904-1913 (2020)
Publication year: 2020
Subject terms: Three-dimensional CGR, Computer science, lcsh:Biotechnology, Extended Natural Vector, Biophysics, Unit square, Biochemistry, Chaos Game Representation, Image (mathematics), 03 medical and health sciences, Dodecahedron, 0302 clinical medicine, Structural Biology, lcsh:TP248.13-248.65, Genetics, Representation (mathematics), ComputingMethodologies_COMPUTERGRAPHICS, 030304 developmental biology, Quantitative Biology::Biomolecules, 0303 health sciences, Series (mathematics), Euclidean space, Order (ring theory), Computer Science Applications, Protein classification, Distribution (mathematics), 030220 oncology & carcinogenesis, Algorithm, Research Article, Biotechnology
Description: Graphical abstract
Highlights
• This method is a novel numerical representation for proteins.
• It includes a three-dimensional Chaos Game Representation and an Extended Natural Vector.
• The new method performs well on protein classification and phylogenetic analysis.
• The method can reflect differences in protein structure.
Chaos Game Representation (CGR) was first proposed as an image representation method for DNA and has since been extended to other biological macromolecules. In CGR images of DNA, sequences are converted into a series of points in the unit square; by comparison, existing CGR images of proteins are less elegant geometrically, and the meaning of the distribution of points in them is less obvious. In this study, by naturally distributing the twenty amino acids on the vertices of a regular dodecahedron, we introduce a novel three-dimensional image representation of protein sequences using the CGR method. We also associate each CGR image with a vector in high-dimensional Euclidean space, called the extended natural vector (ENV), in order to analyze the information contained in the CGR images. Based on the results of protein classification and phylogenetic analysis, our method could serve as a precise method for discovering biological relationships between proteins.
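The abstract describes the method only in outline. The following minimal sketch illustrates the general idea under several assumptions that are not taken from the paper: the standard coordinates of a regular dodecahedron centred at the origin, a hypothetical alphabetical assignment of the twenty amino acids to its vertices, the classic CGR midpoint rule (move half of the way toward the current residue's vertex), and a simplified natural-vector-style feature vector (per amino acid: count, mean CGR coordinates, and normalized second central moments) standing in for the ENV, whose exact definition is not given here. The helper names `cgr_3d` and `natural_vector_sketch` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_vertices() -> List[Point]:
    """Return the 20 vertices of a regular dodecahedron centred at the origin."""
    verts: List[Point] = []
    # 8 vertices of the inscribed cube (±1, ±1, ±1)
    for x in (-1.0, 1.0):
        for y in (-1.0, 1.0):
            for z in (-1.0, 1.0):
                verts.append((x, y, z))
    # 12 vertices from cyclic permutations of (0, ±1/phi, ±phi)
    for a in (-1 / PHI, 1 / PHI):
        for b in (-PHI, PHI):
            verts.append((0.0, a, b))
            verts.append((a, b, 0.0))
            verts.append((b, 0.0, a))
    return verts


# Hypothetical placement: amino acids in alphabetical one-letter order.
# The paper's "natural" placement on the dodecahedron may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VERTEX: Dict[str, Point] = dict(zip(AMINO_ACIDS, dodecahedron_vertices()))


def cgr_3d(seq: str, ratio: float = 0.5) -> List[Point]:
    """Chaos-game iteration: start at the centre and, for each residue, move
    `ratio` of the way toward that residue's vertex (0.5 assumed, as in DNA CGR)."""
    p: Point = (0.0, 0.0, 0.0)
    points: List[Point] = []
    for aa in seq.upper():
        v = VERTEX[aa]  # raises KeyError for non-standard residues
        p = (p[0] + ratio * (v[0] - p[0]),
             p[1] + ratio * (v[1] - p[1]),
             p[2] + ratio * (v[2] - p[2]))
        points.append(p)
    return points


def natural_vector_sketch(seq: str, points: List[Point]) -> List[float]:
    """Simplified stand-in for the ENV: for each amino acid, its count, the mean
    of its CGR points (3 values) and the second central moment of each coordinate
    divided by (count * sequence length). Returns 20 * 7 = 140 values."""
    features: List[float] = []
    n_total = len(seq)
    for aa in AMINO_ACIDS:
        pts = [pt for res, pt in zip(seq.upper(), points) if res == aa]
        n = len(pts)
        features.append(float(n))
        if n == 0:
            features.extend([0.0] * 6)  # absent residue: zero means and moments
            continue
        means = [sum(pt[d] for pt in pts) / n for d in range(3)]
        features.extend(means)
        for d in range(3):
            m2 = sum((pt[d] - means[d]) ** 2 for pt in pts) / (n * n_total)
            features.append(m2)
    return features


if __name__ == "__main__":
    a = "MKTAYIAKQRQISFVKSHFSRQ"
    b = "MKTAYIAKQRQISFVKSHFSRE"
    va = natural_vector_sketch(a, cgr_3d(a))
    vb = natural_vector_sketch(b, cgr_3d(b))
    # Euclidean distance between feature vectors as a dissimilarity measure
    print(math.dist(va, vb))
```

Comparing sequences by the Euclidean distance between their vectors is one natural way to use such a representation for classification or for building a distance matrix for phylogenetic analysis; the paper's own distance and moment orders may differ from this sketch.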
ISSN: 2001-0370
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e11cbf7c24008ba0f7fc3c74ba0eac05
https://pubmed.ncbi.nlm.nih.gov/32774785
Rights: OPEN
Accession number: edsair.doi.dedup.....e11cbf7c24008ba0f7fc3c74ba0eac05
Database: OpenAIRE