A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs

Bibliographic Details
Title: A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
Authors: Didem Unat, Mohamed Wahib, Doğa Dikbayır, Mehmet E. Belviranli, Fareed Qararyah
Publication Information: arXiv, 2020.
Publication Year: 2020
Subject Terms: FOS: Computer and information sciences, Computer Networks and Communications, Computer science, 010103 numerical & computational mathematics, Parallel computing, 01 natural sciences, Bottleneck, Theoretical Computer Science, Artificial Intelligence, Memory training, 0101 mathematics, Throughput (business), Scaling, business.industry, Deep learning, Graph partition, Computer Graphics and Computer-Aided Design, 010101 applied mathematics, Computer Science - Distributed, Parallel, and Cluster Computing, Hardware and Architecture, Deep neural networks, Artificial intelligence, Distributed, Parallel, and Cluster Computing (cs.DC), Graph operations, business, Software
Description: Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs that are represented as computational graphs. ParDNN decides a placement of a DNN’s underlying computational graph operations across multiple devices so that the devices’ memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN. It requires no modification at either the model level or the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving superlinear scaling for both the batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
DOI: 10.48550/arxiv.2008.08636
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::778ef7614d496d746ef57b7a558baba1
Rights: OPEN
Accession Number: edsair.doi.dedup.....778ef7614d496d746ef57b7a558baba1
Database: OpenAIRE