Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design

Bibliographic Details
Title: Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design
Authors: Klarner, Leo, Rudner, Tim G. J., Morris, Garrett M., Deane, Charlotte M., Teh, Yee Whye
Publication Year: 2024
Collection: Computer Science, Quantitative Biology, Statistics
Subject Terms: Quantitative Biology - Biomolecules, Computer Science - Machine Learning, Statistics - Machine Learning
Description: Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge -- with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.
Comment: Published in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.11942
Accession Number: edsarx.2407.11942
Database: arXiv