On the Origins of Linear Representations in Large Language Models

Bibliographic Details
Title: On the Origins of Linear Representations in Large Language Models
Authors: Jiang, Yibo; Rajendran, Goutham; Ravikumar, Pradeep; Aragam, Bryon; Veitch, Victor
Publication Year: 2024
Collection: Computer Science; Statistics
Subject Terms: Computer Science - Computation and Language; Computer Science - Machine Learning; Statistics - Machine Learning
Description: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of next token prediction. We use this formalism to show that the next token prediction objective (softmax with cross-entropy) and the implicit bias of gradient descent together promote the linear representation of concepts. Experiments show that linear representations emerge when learning from data matching the latent variable model, confirming that this simple structure already suffices to yield linear representations. We additionally confirm some predictions of the theory using the LLaMA-2 large language model, giving evidence that the simplified model yields generalizable insights.
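The central claim in the description can be illustrated with a small simulation. The sketch below is not from the paper: it assumes a toy latent variable model with k binary concepts, trains a softmax cross-entropy next-token predictor with plain gradient descent, and then checks whether embedding differences corresponding to flipping a single concept share a common direction, which is the "linear representation" prediction. All names, hyperparameters, and the data-generating rule are illustrative assumptions.

```python
# Toy sketch (not from the paper): simulate data from a simple latent
# variable model and check whether concept differences become linear
# directions after softmax cross-entropy training with gradient descent.
import itertools
import torch
import torch.nn.functional as F

torch.manual_seed(0)

k = 4                      # number of binary latent concepts (assumed)
d = 16                     # embedding dimension (assumed)
n_ctx = 2 ** k             # one context token per concept vector
n_out = 2 ** k             # one output token per concept vector

# Enumerate all binary concept vectors; context token i carries pattern C[i].
C = torch.tensor(list(itertools.product([0, 1], repeat=k)), dtype=torch.float)

def sample_batch(batch_size):
    # Assumed data-generating rule: the next token keeps the context's
    # concepts, except one uniformly chosen concept is resampled at random.
    ctx = torch.randint(0, n_ctx, (batch_size,))
    concepts = C[ctx].clone()
    flip = torch.randint(0, k, (batch_size,))
    concepts[torch.arange(batch_size), flip] = torch.randint(0, 2, (batch_size,)).float()
    powers = 2 ** torch.arange(k - 1, -1, -1, dtype=torch.float)
    nxt = (concepts @ powers).long()   # map concept vector back to a token id
    return ctx, nxt

ctx_emb = torch.nn.Embedding(n_ctx, d)   # context representations
out_emb = torch.nn.Embedding(n_out, d)   # output (unembedding) vectors
opt = torch.optim.SGD(list(ctx_emb.parameters()) + list(out_emb.parameters()), lr=0.5)

for step in range(3000):
    ctx, nxt = sample_batch(256)
    logits = ctx_emb(ctx) @ out_emb.weight.T     # softmax next-token model
    loss = F.cross_entropy(logits, nxt)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Linearity check: for each concept, collect differences between context
# embeddings that differ only in that concept; if the representation is
# linear, these differences should all point in roughly the same direction.
with torch.no_grad():
    E = ctx_emb.weight
    for i in range(k):
        diffs = []
        for a in range(n_ctx):
            b = a ^ (1 << (k - 1 - i))           # flip concept i only
            if C[a, i] == 0:
                diffs.append(E[b] - E[a])
        diffs = torch.stack(diffs)
        cos = F.cosine_similarity(diffs.unsqueeze(0), diffs.unsqueeze(1), dim=-1)
        print(f"concept {i}: mean pairwise cosine of difference vectors = {cos.mean():.3f}")
```

Under these assumptions the reported cosine similarities should approach 1 after training, mirroring the paper's finding that data generated by such a latent variable model already suffices to produce linear concept representations.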
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2403.03867
Accession Number: edsarx.2403.03867
Database: arXiv