Modeling and Controlling Many-Core HPC Processors: an Alternative to PID and Moving Average Algorithms

Bibliographic Details
Title: Modeling and Controlling Many-Core HPC Processors: an Alternative to PID and Moving Average Algorithms
Authors: Bambini, Giovanni, Ottaviano, Alessandro, Conficoni, Christian, Tilli, Andrea, Benini, Luca, Bartolini, Andrea
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Systems and Control, Computer Science - Performance
Description: The race towards performance increase and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, combined with the slowdown of technology-driven power reduction, means that power and thermal management become increasingly relevant. Unfortunately, existing research lacks a detailed analysis and modeling of thermal, power, and electrical coupling effects, and of how they must be jointly considered to perform dynamic control of complex and heterogeneous Multi-Processor Systems on Chip (MPSoCs). To close this gap, in this work we first provide a detailed thermal and power model targeting a modern High Performance Computing (HPC) MPSoC. We consider real-world coupling effects such as actuators' non-idealities and the exponential relation between the dissipated power, the temperature state, and the voltage level in a single processing element. We analyze how these factors affect the behavior of the control algorithm and the challenges they pose. Based on this analysis, we propose a thermal capping strategy inspired by Fuzzy control theory to replace the state-of-the-art PID controller, as well as an iterative root-finding method to optimally choose the shared voltage value among cores grouped in the same voltage domain. We evaluate the proposed controller with model-in-the-loop and hardware-in-the-loop co-simulations. Compared with state-of-the-art methods, we show an improvement of up to 5x in the maximum exceeded temperature, while providing on average 3.56% faster application execution across all evaluation scenarios.
Comment: Paper in Review
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.18030
Accession Number: edsarx.2405.18030
Database: arXiv