Chemistry-informed Machine Learning Explains Calcium-binding Proteins Fuzzy Shape for Communicating Changes in the Atomic States of Calcium Ions

Bibliographic Details
Title: Chemistry-informed Machine Learning Explains Calcium-binding Proteins Fuzzy Shape for Communicating Changes in the Atomic States of Calcium Ions
Authors: Zhang, Pengzhi; Nde, Jules; Eliaz, Yossi; Jennings, Nathaniel; Cieplak, Piotr; Cheung, Margaret S.
Publication Year: 2024
Collection: Quantitative Biology
Subject Terms: Quantitative Biology - Quantitative Methods
Description: The fuzziness of proteins is a feature for communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, which are associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium-binding proteins and their versatile pool of partners, depending on the circumstances they transmit. However, it is unclear whether the limited experimental data available can be used to train models that accurately predict the charges of calcium-binding protein variants. Here, we developed a chemistry-informed machine-learning algorithm that implements a game-theoretic approach to explain the output of a machine-learning model without requiring an excessively large database for high-performance prediction of atomic charges. We used ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium-binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in a protein. With our designs, we provided a framework of explainable machine-learning models that annotates the atomic charges of calcium ions in calcium-binding proteins with domain knowledge, in response to chemical changes in an environment, based on the limited size of scientific data in a genome space.
Comment: submitted
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.17017
Accession Number: edsarx.2407.17017
Database: arXiv