Report
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Title: | wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech |
---|---|
Authors: | Le-Duc, Khai; Dang, Quy-Anh; Pham, Tan-Hanh; Hy, Truong-Son |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs focus only on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning of knowledge graphs from speech data. Our pipeline is straightforward: (1) constructing a KG based on transcribed spoken utterances and a named entity database, (2) converting the KG into embedding vectors, and (3) training graph neural networks (GNNs) for node classification and link prediction tasks. Through extensive experiments conducted in inductive and transductive learning contexts using state-of-the-art GNN models, we provide baseline results and error analysis for node classification and link prediction tasks on human transcripts and automatic speech recognition (ASR) transcripts, including evaluations using both encoder-based and decoder-based node embeddings, as well as monolingual and multilingual acoustic pre-trained models. All related code, data, and models are published online. Comment: Preprint, 32 pages |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2408.04174 |
Accession Number: | edsarx.2408.04174 |
Database: | arXiv |
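As an illustration of step (1) of the pipeline described in the abstract, the sketch below builds a small graph from transcribed utterances and a named entity list. It is only one plausible reading and not the authors' implementation: the function name `build_kg`, the node id scheme (`u0`, `e0`), the bipartite utterance-entity layout, and the case-insensitive substring matching are all assumptions made for this example.

```python
def build_kg(utterances, entities):
    """Build a toy utterance-entity graph (hypothetical helper).

    Returns:
        nodes: dict mapping node id -> {"type": ..., "text": ...}
        edges: sorted list of (utterance_id, entity_id) pairs
    """
    nodes = {}
    edges = set()

    # One node per transcribed utterance and one per named entity.
    for i, text in enumerate(utterances):
        nodes[f"u{i}"] = {"type": "utterance", "text": text}
    for j, name in enumerate(entities):
        nodes[f"e{j}"] = {"type": "entity", "text": name}

    # Assumed linking rule: connect an utterance to every entity
    # whose name occurs in it (case-insensitive substring match).
    for i, text in enumerate(utterances):
        lowered = text.lower()
        for j, name in enumerate(entities):
            if name.lower() in lowered:
                edges.add((f"u{i}", f"e{j}"))

    return nodes, sorted(edges)


# Made-up example data, not taken from the paper's dataset.
utterances = [
    "the patient was given aspirin for the headache",
    "aspirin can irritate the stomach",
]
entities = ["aspirin", "headache", "stomach"]

nodes, edges = build_kg(utterances, entities)
for source, target in edges:
    print(source, "->", nodes[target]["text"])
```

In such a graph, the node `type` field could serve as the label for a node classification task, and held-out utterance-entity pairs as targets for link prediction. Steps (2) and (3), embedding the nodes and training a GNN, are not shown here.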