Towards End-to-end SDC Detection for HPC Applications Equipped with Lossy Compression

Bibliographic Details
Title: Towards End-to-end SDC Detection for HPC Applications Equipped with Lossy Compression
Authors: Zizhong Chen, Franck Cappello, Xin Liang, Kai Zhao, Sheng Di, Sihuan Li
Source: CLUSTER
Publication Information: IEEE, 2020.
Publication Year: 2020
Subject Terms: Lossless compression, 020203 distributed computing, End-to-end principle, Computer engineering, Computer science, Distortion, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, Data_CODINGANDINFORMATIONTHEORY, 02 engineering and technology, Lossy compression, Volume (compression)
Description: Data reduction techniques are widely demanded and used by large-scale high-performance computing (HPC) applications because of the vast volumes of data they produce and store for post-analysis. Because lossless compressors achieve very limited compression ratios, error-bounded lossy compression has become an indispensable part of many HPC applications: it can significantly reduce scientific data volume with user-acceptable data distortion. Since large-scale HPC applications equipped with lossy compression always deal with vast volumes of data, soft errors, or silent data corruptions (SDCs), are non-negligible. Although SDC detection techniques have been studied for years, no study has targeted HPC applications with lossy compression, leaving a significant gap between these applications and confidence in their execution results. To fill this gap, this paper proposes a couple of SDC detection strategies for scientific simulations with lossy compression. Experimental results on 4 widely used scientific simulation datasets show that promising detection ability can still be obtained with two popular lossy compressors. Our parallel experiments with up to 1,024 cores confirm that the time overhead can be limited to within 7.9%.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::a54ba0c734d4924c771bf51307d77069
https://doi.org/10.1109/cluster49012.2020.00043
Rights: CLOSED
Accession Number: edsair.doi...........a54ba0c734d4924c771bf51307d77069
Database: OpenAIRE