TransMHCII: a novel MHC-II binding prediction model built using a protein language model and an image classifier

Bibliographic Details
Title: TransMHCII: a novel MHC-II binding prediction model built using a protein language model and an image classifier
Authors: Xin Yu, Christopher Negron, Lili Huang, Geertruida Veldman
Source: Antibody Therapeutics.
Publisher Information: Oxford University Press (OUP), 2023.
Publication Year: 2023
Subject Terms: Immunology, Immunology and Allergy
Description:
Background: The emergence of deep learning models such as AlphaFold2 has revolutionized the structure prediction of proteins. Nevertheless, much remains unexplored, especially how structure models can be used to predict biological properties. Herein, we present a method that uses features extracted from protein language models (PLMs) to predict the major histocompatibility complex class II (MHC-II) binding affinity of peptides. Specifically, we evaluated a novel transfer learning approach in which the backbone of our model was swapped for architectures designed for image classification tasks.
Methods: Features extracted from several PLMs (ESM1b, ProtXLNet, or ProtT5-XL-UniRef) were passed into image models (EfficientNet v2b0, EfficientNet v2m, or ViT-16) originally designed to classify pictures of ordinary objects. Eleven chimeric models were built to classify peptides into four bins according to their binding affinity for one of 56 selected human MHC-II alleles. The models were trained on a dataset of 111,564 sequences with an 85%–15% train-validation split and evaluated on an independent test dataset of 21,424 sequences.
Results: The optimal pairing of PLM and image classifier resulted in the final model, TransMHCII, with accuracies of 0.966 and 0.997 on the training and validation sets, respectively. On the test dataset, TransMHCII outperformed NetMHCIIpan 3.2 and NetMHCIIpan 4.0-BA on ROC AUC, balanced accuracy, and Jaccard scores.
Conclusion: We demonstrated that it is possible to develop robust MHC-II binding models from PLM embeddings and existing image classifiers. Our model facilitates antibody immunogenicity risk assessment, and its architectural innovation may inspire the development of other deep learning models for biological property prediction.
Statement of Significance: To our knowledge, TransMHCII is the first multi-class classification model for MHC-II binding prediction. It is also the first study to use PLM embeddings to predict peptide/MHC-II binding. In addition, it is the first attempt to integrate PLMs with image classifiers for biological property prediction. The approach is highly modular, so the architecture can be optimized with only a few component swaps.
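The abstract compares models on a four-bin classification task using balanced accuracy and Jaccard scores. As a rough illustration only, the sketch below computes both metrics from true and predicted bin labels. The bin codes 0 to 3 are hypothetical (the record does not give the affinity thresholds), and macro averaging for the Jaccard score is an assumption, since the record does not state which averaging the authors used. ROC AUC is omitted because it needs per-class scores rather than hard labels.

```python
"""Hypothetical sketch: balanced accuracy and macro-averaged Jaccard score
for a four-bin classifier. Bin codes 0-3 are illustrative assumptions."""


def per_class_counts(y_true, y_pred, label):
    """Return (TP, FP, FN) for one class, treating it one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, TP / (TP + FN), over classes present in y_true."""
    _check(y_true, y_pred)
    recalls = []
    for label in sorted(set(y_true)):
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))  # tp + fn > 0: label occurs in y_true
    return sum(recalls) / len(recalls)


def macro_jaccard(y_true, y_pred):
    """Unweighted mean over classes of TP / (TP + FP + FN)."""
    _check(y_true, y_pred)
    scores = []
    # Use every class seen in either list, so no denominator is zero.
    for label in sorted(set(y_true) | set(y_pred)):
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        scores.append(tp / (tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for eight peptides, two per hypothetical affinity bin.
    y_true = [0, 0, 1, 1, 2, 2, 3, 3]
    y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
    print(balanced_accuracy(y_true, y_pred))        # → 0.75
    print(round(macro_jaccard(y_true, y_pred), 4))  # → 0.5833
```

Balanced accuracy averages recall per class, so a model cannot score well simply by favoring the most common bin. These definitions should agree with scikit-learn's `balanced_accuracy_score` and `jaccard_score(..., average="macro")` on the same labels.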
ISSN: 2516-4236
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::f5bc114bc29de29b44cf849c8118d34b
https://doi.org/10.1093/abt/tbad011
Rights: OPEN
Accession Number: edsair.doi...........f5bc114bc29de29b44cf849c8118d34b
Database: OpenAIRE