SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification

Bibliographic Details
Title: SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
Authors: Cao, Bin; Liu, Yang; Zheng, Zinan; Tan, Ruifeng; Li, Jia; Zhang, Tong-yi
Publication Year: 2024
Collection: Condensed Matter
Subject Terms: Condensed Matter - Materials Science
Description: Spectroscopic data, particularly diffraction data, contain detailed crystal and microstructure information and are thus crucial for materials discovery. Powder X-ray diffraction (XRD) patterns are highly effective in identifying crystals. Although machine learning (ML) has significantly advanced the analysis of powder XRD patterns, progress is hindered by a lack of training data. To address this, we introduce SimXRD, the largest open-source simulated XRD pattern dataset to date, to accelerate the development of crystallographic informatics. SimXRD comprises 4,065,346 simulated powder X-ray diffraction patterns, representing 119,569 distinct crystal structures under 33 simulated conditions that mimic real-world variations. We find that crystal symmetry inherently follows a long-tailed distribution, and we evaluate 21 sequence learning models on SimXRD. The results indicate that existing neural networks struggle with low-frequency crystal classifications. The present work highlights the academic significance and the engineering novelty of simulated XRD patterns in this interdisciplinary field.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.15469
Accession Number: edsarx.2406.15469
Database: arXiv