Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation

Bibliographic details
Title: Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation
Authors: Hoelscher-Obermaier, Jason; Stevinson, Edward; Stauber, Valentin; Zhelev, Ivaylo; Botev, Victor; Wu, Ronin; Minton, Jeremy
Publication year: 2022
Collection: Computer Science
Subject terms: Computer Science - Computation and Language
Description: The most interesting words in scientific texts will often be novel or rare. This makes it challenging for scientific word embedding models to determine high-quality embedding vectors for useful terms that are infrequent or newly emerging. We demonstrate how latent semantic imputation (LSI) can address this problem by imputing embeddings for domain-specific words from up-to-date knowledge graphs while otherwise preserving the original word embedding model. We use the MeSH knowledge graph to impute embedding vectors for biomedical terminology without retraining and evaluate the resulting embedding model on a domain-specific word-pair similarity task. We show that LSI can produce reliable embedding vectors for rare and out-of-vocabulary (OOV) terms in the biomedical domain.
Comment: Accepted for the Workshop on Information Extraction from Scientific Publications at AACL-IJCNLP 2022
Document type: Working Paper
Access URL: http://arxiv.org/abs/2210.15358
Accession number: edsarx.2210.15358
Database: arXiv