A Small and Fast BERT for Chinese Medical Punctuation Restoration

Bibliographic Details
Title: A Small and Fast BERT for Chinese Medical Punctuation Restoration
Authors: Ling, Tongtao, Lai, Yutao, Chen, Lei, Huang, Shilei, Liu, Yi
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: In clinical dictation, utterances transcribed by automatic speech recognition (ASR) without explicit punctuation marks may lead to misunderstanding of the dictated reports. To produce precise and understandable clinical reports with ASR, automatic punctuation restoration is required. With practical deployment in mind, we propose a fast and lightweight pre-trained model for Chinese medical punctuation restoration based on the 'pre-training and fine-tuning' paradigm. In this work, we distill pre-trained models by incorporating supervised contrastive learning and a novel auxiliary pre-training task (Punctuation Mark Prediction) to make the model well suited for punctuation restoration. Our experiments on various distilled models show that our model achieves 95% of the performance of the state-of-the-art Chinese RoBERTa with only 10% of its model size.
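The description names supervised contrastive learning but does not say how it is applied. As a rough illustration only, here is a minimal NumPy sketch of the generic supervised contrastive loss of Khosla et al. (2020). It assumes token embeddings are grouped into classes by their punctuation label; the function name `supcon_loss`, the temperature default, and the use of punctuation labels as contrastive classes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (sketch, not the paper's code).

    features: (n, d) array of embeddings, e.g. one per token (assumption).
    labels:   length-n sequence of class ids, e.g. punctuation labels (assumption).
    """
    # L2-normalize so dot products are cosine similarities
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself from the denominator
    sim_masked = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-sum-exp over all non-self samples
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives: other samples sharing the anchor's label
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # skip anchors with no positive in the batch
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

The loss is lower when embeddings that share a label lie close together, which is the property a contrastive objective would exploit to separate tokens followed by different punctuation marks. Anchors with no same-label partner in the batch are skipped.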
Comment: 5 pages, 2 figures, Accepted by INTERSPEECH 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.12568
Accession Number: edsarx.2308.12568
Database: arXiv