Regression Training using Model Parallelism in a Distributed Cloud

Bibliographic Details
Title: Regression Training using Model Parallelism in a Distributed Cloud
Authors: Roberto Morabito, Mohammed Elmusrati, Miika Komu, Miljenko Opsenica, Joel Reijonen
Source: DASC/PiCom/DataCom/CyberSciTech
Publisher Information: IEEE, 2019.
Publication Year: 2019
Subject Terms: business.industry, Computer science, Distributed computing, Big data, Cloud computing, Reuse, computer.software_genre, Scheduling (computing), Software portability, Intelligent agent, Scalability, business, computer, Edge computing
Description: Machine learning requires a significant amount of computational resources and is usually executed in high-capacity centralized cloud infrastructures (e.g., data centers). In such infrastructures, resources are shared in a scalable manner through the instantiation and orchestration of multiple virtualized services. Emerging trends in machine learning are the distribution and parallelization of model training, which allow model training tasks to be executed in multiple distributed computational domains with the aim of reducing the overall training time. A possible drawback of decentralizing machine learning is that latency issues may degrade performance when the training computation is geographically distributed to nodes that are far from each other. One way to reduce latency is to use edge computing infrastructure, i.e., to distribute computation near the origin of the request. As edge resources can be scarce, it is important to orchestrate model training in a parallelized manner. To this end, in order to ease the use of parallelization in both centralized and distributed scenarios, we propose and implement a concept that we refer to as an Intelligent Agent (IA). An IA is responsible for instantiating and scheduling machine learning tasks (e.g., model training) and for deriving inferences. In our solution, model training is distributed to multiple IAs in parallel. Each IA is packaged into a Linux container in order to take advantage of container portability across heterogeneous deployments and to reuse existing container orchestration tools. We validate our proposal by deploying and instantiating multiple IAs across a distributed cloud environment, where each IA is allocated a fixed amount of computational resources. Keywords: Intelligent agent, Model parallelism, Regression training, Intelligent cloud
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::cd2503b5330bb60e689d88b316a91e8a
https://doi.org/10.1109/dasc/picom/cbdcom/cyberscitech.2019.00139
Rights: OPEN
Accession Number: edsair.doi...........cd2503b5330bb60e689d88b316a91e8a
Database: OpenAIRE