wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

Bibliographic Details
Title: wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech
Authors: Le-Duc, Khai; Dang, Quy-Anh; Pham, Tan-Hanh; Hy, Truong-Son
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs focus only on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning of knowledge graphs from speech data. Our pipeline is straightforward: (1) constructing a KG based on transcribed spoken utterances and a named entity database, (2) converting the KG into embedding vectors, and (3) training graph neural networks (GNNs) for node classification and link prediction tasks. Through extensive experiments conducted in inductive and transductive learning contexts using state-of-the-art GNN models, we provide baseline results and error analysis for node classification and link prediction tasks on human transcripts and automatic speech recognition (ASR) transcripts, including evaluations using both encoder-based and decoder-based node embeddings, as well as monolingual and multilingual acoustic pre-trained models. All related code, data, and models are published online.
Comment: Preprint, 32 pages
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2408.04174
Accession Number: edsarx.2408.04174
Database: arXiv
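
The three pipeline steps named in the description can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The toy utterances and entity list, the hashed bag-of-words embedding, and the single untrained mean-aggregation round (standing in for a trained GNN layer) are all assumptions made for illustration, as are the function names `build_graph`, `embed`, `mean_aggregate`, and `link_score`.

```python
import hashlib
import math
from collections import defaultdict


def build_graph(utterances, entities):
    """Step 1 (toy): link each utterance node to every entity it mentions."""
    edges = set()
    for i, text in enumerate(utterances):
        lowered = text.lower()
        for ent in entities:
            if ent.lower() in lowered:
                edges.add((f"utt:{i}", f"ent:{ent}"))
    return edges


def embed(text, dim=16):
    """Step 2 (toy stand-in for a text encoder): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def mean_aggregate(features, edges):
    """Step 3 (toy stand-in for a GNN layer): one untrained message-passing
    round in which each node averages its own vector with its neighbours'."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in sorted(neighbours[node])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def link_score(hidden, u, v):
    """Dot-product score for a candidate edge (higher means more plausible)."""
    return sum(a * b for a, b in zip(hidden[u], hidden[v]))


# Hypothetical toy data, not taken from the paper.
utterances = ["Marie Curie was born in Warsaw", "Warsaw is the capital of Poland"]
entities = ["Marie Curie", "Warsaw", "Poland"]

edges = build_graph(utterances, entities)
features = {f"utt:{i}": embed(t) for i, t in enumerate(utterances)}
features.update({f"ent:{e}": embed(e) for e in entities})
hidden = mean_aggregate(features, edges)

print(sorted(edges))
print(link_score(hidden, "utt:0", "ent:Warsaw"))
```

In a real setting, the transcripts would come from human annotators or an ASR system, the embeddings from a pre-trained encoder or decoder model, and the aggregation step would be replaced by a trained GNN optimised for node classification or link prediction.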