Dynamic scheduling using CPU oversubscription in the ALICE Grid.

التفاصيل البيبلوغرافية
العنوان: Dynamic scheduling using CPU oversubscription in the ALICE Grid.
المؤلفون: Bertran Ferrer, Marta, Grigoras, Costin, Badia, Rosa M.
المصدر: EPJ Web of Conferences; 5/6/2024, Vol. 295, p1-8, 8p
مصطلحات موضوعية: MOTHERBOARDS, COMPUTER input-output equipment, ROCKET payloads, METHODOLOGY, MONTE Carlo method
مستخلص: The ALICE Grid is designed to perform a realtime comprehensive monitoring of both jobs and execution nodes in order to maintain a continuous and consistent status of the Grid infrastructure. An extensive database of historical data is available and is periodically analyzed to tune the workflows and data management to optimal performance levels. This data, when evaluated in real time, has the power to trigger decisions for efficient resource management of the currently running payloads, for example to enable the execution of a higher volume of work per unit of time. In this article, we consider scenarios in which, through constant interaction with the monitoring agents, a dynamic adaptation of the running workflows is performed. The target resources are memory and CPU with the objective of using them in their entirety and ensuring optimal utilization fairness between executing jobs. Grid resources are heterogeneous and of different generations, which means that some of them have better hardware characteristics than the minimum required to execute ALICE jobs. Our middleware, JAliEn, works on the basis of having at least 2 GB of RAM allocated per core (allowing up to 8 GB of virtual memory when including swap). Many of the worker nodes have higher memory per core ratios than these basic limits and in terms of available memory they therefore have free resources to accommodate extra jobs. The running jobs may have different behaviors and unequal resource usages depending on their nature. For example, analysis tasks are I/O bound while Monte-Carlo tasks are CPU intensive. Running additional jobs with complementary resource usage patterns on a worker node has a great potential to increase its total efficiency. This paper presents the methodology to exploit the different resource usage profiles by oversubscribing the worker nodes with extra jobs taking into account their CPU resource usage levels and memory capacity. [ABSTRACT FROM AUTHOR]
Copyright of EPJ Web of Conferences is the property of EDP Sciences and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
قاعدة البيانات: Complementary Index
الوصف
تدمد:21016275
DOI:10.1051/epjconf/202429504020