sLLM: Accelerating LLM Inference using Semantic Load Balancing with Shared Memory Data Structures

Bibliographic Details
Title: sLLM: Accelerating LLM Inference using Semantic Load Balancing with Shared Memory Data Structures
Authors: Lin, Jieyu; Zhang, Sai Qian; Leon-Garcia, Alberto
Source: 2024 25th International Symposium on Quality Electronic Design (ISQED), pp. 1-6, Apr. 2024
Relation: 2024 25th International Symposium on Quality Electronic Design (ISQED)
Database: IEEE Xplore Digital Library
Description
ISBN: 9798350309270
9798350309263
ISSN: 19483295
DOI: 10.1109/ISQED60706.2024.10528703