On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

التفاصيل البيبلوغرافية
العنوان:	On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
المؤلفون:	Joshi, Nirmit, Misiakiewicz, Theodor, Srebro, Nathan
سنة النشر:	2024
المجموعة:	Computer Science
مصطلحات موضوعية:	Computer Science - Machine Learning, Computer Science - Data Structures and Algorithms
الوصف:	The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries $(\mathsf{CSQ})$--potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling. Comment: 43 pages, 1 table, 1 figure
نوع الوثيقة:	Working Paper
URL الوصول:	http://arxiv.org/abs/2407.05622
رقم الأكسشن:	edsarx.2407.05622
قاعدة البيانات:	arXiv

الوصف
الوصف غير متاح.