تقرير
SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation
العنوان: | SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation |
---|---|
المؤلفون: | Lv, Changze, Li, Tianlong, Xu, Jianhan, Gu, Chenxi, Ling, Zixuan, Zhang, Cenyuan, Zheng, Xiaoqing, Huang, Xuanjing |
سنة النشر: | 2023 |
المجموعة: | Computer Science |
مصطلحات موضوعية: | Computer Science - Computation and Language |
الوصف: | Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently-proposed spiking Transformer (i.e., Spikformer) to make it possible to process language tasks and propose a two-stage knowledge distillation method for training it, which combines pre-training by distilling knowledge from BERT with a large collection of unlabelled texts and fine-tuning with task-specific instances via knowledge distillation again from the BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve comparable results to BERTs on text classification tasks for both English and Chinese with much less energy consumption. Our code is available at https://github.com/Lvchangze/SpikeBERT. |
نوع الوثيقة: | Working Paper |
URL الوصول: | http://arxiv.org/abs/2308.15122 |
رقم الأكسشن: | edsarx.2308.15122 |
قاعدة البيانات: | arXiv |
كن أول من يترك تعليقا!