Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

Bibliographic Details
Title: Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models
Authors: Rajendran, Goutham; Buchholz, Simon; Aragam, Bryon; Schölkopf, Bernhard; Ravikumar, Pradeep
Publication Year: 2024
Collection: Computer Science; Mathematics; Statistics
Subject Terms: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Mathematics - Statistics Theory; Statistics - Machine Learning
Description: There are two broad approaches to building intelligent machine learning systems. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other is to build highly performant foundation models and then invest effort into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.
Comment: 36 pages
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2402.09236
Accession Number: edsarx.2402.09236
Database: arXiv