Report
A Small and Fast BERT for Chinese Medical Punctuation Restoration
Title: | A Small and Fast BERT for Chinese Medical Punctuation Restoration |
---|---|
Authors: | Ling, Tongtao, Lai, Yutao, Chen, Lei, Huang, Shilei, Liu, Yi |
Publication Year: | 2023 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computation and Language |
Description: | In clinical dictation, utterances after automatic speech recognition (ASR) without explicit punctuation marks may lead to the misunderstanding of dictated reports. To give a precise and understandable clinical report with ASR, automatic punctuation restoration is required. Considering a practical scenario, we propose a fast and light pre-trained model for Chinese medical punctuation restoration based on the 'pretraining and fine-tuning' paradigm. In this work, we distill pre-trained models by incorporating supervised contrastive learning and a novel auxiliary pre-training task (Punctuation Mark Prediction) to make them well-suited for punctuation restoration. Our experiments on various distilled models reveal that our model can achieve 95% of the performance at 10% of the model size relative to the state-of-the-art Chinese RoBERTa. Comment: 5 pages, 2 figures, Accepted by INTERSPEECH 2024 |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2308.12568 |
Accession Number: | edsarx.2308.12568 |
Database: | arXiv |