MUX-PLMs: Data Multiplexing for High-throughput Language Models

Bibliographic Details
Title: MUX-PLMs: Data Multiplexing for High-throughput Language Models
Authors: Murahari, Vishvak; Deshpande, Ameet; Jimenez, Carlos E.; Shafran, Izhak; Wang, Mingqiu; Cao, Yuan; Narasimhan, Karthik
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Computation and Language
Description: The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes, coupled with hardware shortages, has limited affordable access and created a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing offer a promising solution, with a many-fold increase in throughput achieved by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high-throughput pre-trained language models (PLMs) trained with data multiplexing, that can be fine-tuned on any downstream task to yield high throughput and high performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, and enable high-performance, high-throughput MUX-PLMs that are competitive with vanilla PLMs while achieving a 2x/5x inference speedup with only a 1-4% drop on a broad suite of tasks.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2302.12441
Accession Number: edsarx.2302.12441
Database: arXiv
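
The abstract above describes an architecture in which a multiplexing module entangles several inputs into one representation, a shared PLM processes that representation in a single forward pass, and a demultiplexing module recovers per-input outputs. The following is a minimal, illustrative PyTorch sketch of that flow, not the authors' implementation: the fixed random keying, the linear demultiplexing heads, and the toy Transformer encoder are assumptions chosen for brevity, whereas MUX-PLMs use learned multiplexing and demultiplexing modules inside a pre-trained Transformer.

import torch
import torch.nn as nn

class Multiplexer(nn.Module):
    """Entangles N input representations into a single shared representation."""
    def __init__(self, num_instances: int, hidden: int):
        super().__init__()
        # Fixed random key per multiplexed position (assumption: simple
        # Hadamard-style keying; MUX-PLMs learn richer multiplexing modules).
        self.register_buffer("keys", torch.randn(num_instances, hidden))

    def forward(self, x):                         # x: (N, batch, seq, hidden)
        keyed = x * self.keys[:, None, None, :]   # tag each instance with its key
        return keyed.mean(dim=0)                  # (batch, seq, hidden)

class Demultiplexer(nn.Module):
    """Disentangles the shared output back into N per-instance outputs."""
    def __init__(self, num_instances: int, hidden: int):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_instances)])

    def forward(self, h):                         # h: (batch, seq, hidden)
        return torch.stack([head(h) for head in self.heads])  # (N, batch, seq, hidden)

# Toy sizes: 5-way multiplexing, batch of 8, sequence length 16, hidden size 64.
N, B, T, H = 5, 8, 16, 64
mux, demux = Multiplexer(N, H), Demultiplexer(N, H)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=H, nhead=4, batch_first=True), num_layers=2
)

inputs = torch.randn(N, B, T, H)   # embeddings of N separate inputs
shared = encoder(mux(inputs))      # a single forward pass serves all N inputs
outputs = demux(shared)            # per-instance representations
print(outputs.shape)               # torch.Size([5, 8, 16, 64])

The throughput gain comes from the single shared encoder call: compute scales with one forward pass rather than N, while the demultiplexer's task-specific heads (here, plain linear layers as a stand-in) recover an output for each of the N original inputs.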