NLP for Knowledge Discovery and Information Extraction from Energetics Corpora

Bibliographic Details
Title: NLP for Knowledge Discovery and Information Extraction from Energetics Corpora
Authors: VanGessel, Francis G.; Perry, Efrem; Mohan, Salil; Barham, Oliver M.; Cavolowsky, Mark
Publication Year: 2024
Collection: Computer Science; Condensed Matter
Subject Terms: Computer Science - Computation and Language, Condensed Matter - Materials Science
Description: We present a demonstration of the utility of NLP for aiding research into energetic materials and associated systems. The NLP method enables machine understanding of textual data, offering an automated route to knowledge discovery and information extraction from energetics text. We apply three established unsupervised NLP models (Latent Dirichlet Allocation, Word2Vec, and the Transformer) to a large curated dataset of energetics-related scientific articles. We demonstrate that each NLP algorithm is capable of identifying energetic topics and concepts, generating a language model which aligns with Subject Matter Expert knowledge. Furthermore, we present a document classification pipeline for energetics text. Our classification pipeline achieves 59-76% accuracy depending on the NLP model used, with the highest performing Transformer model rivaling inter-annotator agreement metrics. The NLP approaches studied in this work can identify concepts germane to energetics and therefore hold promise as a tool for accelerating energetics research efforts and energetics material development.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2402.06964
Accession Number: edsarx.2402.06964
Database: arXiv