Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

التفاصيل البيبلوغرافية
العنوان:	Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models
المؤلفون:	Zhang, Sarah J., Florin, Samuel, Lee, Ariel N., Niknafs, Eamon, Marginean, Andrei, Wang, Annie, Tyser, Keith, Chin, Zad, Hicke, Yann, Singh, Nikhil, Udell, Madeleine, Kim, Yoon, Buonassisi, Tonio, Solar-Lezama, Armando, Drori, Iddo
سنة النشر:	2023
المجموعة:	Computer Science
مصطلحات موضوعية:	Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
الوصف:	We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education. Comment: Did not receive permission to release the data or model fine-tuned on the data
نوع الوثيقة:	Working Paper
URL الوصول:	http://arxiv.org/abs/2306.08997
رقم الأكسشن:	edsarx.2306.08997
قاعدة البيانات:	arXiv

الوصف
الوصف غير متاح.