DCT-Former: Efficient Self-Attention with Discrete Cosine Transform

Bibliographic Details
Title: DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Authors: Carmelo Scribano, Giorgia Franchini, Marco Prato, Marko Bertogna
Source: Journal of Scientific Computing, 94
Publication Information: Springer Science and Business Media LLC, 2023.
Publication Year: 2023
Subject Terms: Signal Processing (eess.SP), FOS: Computer and information sciences, Computer Science - Machine Learning, Numerical Analysis, Applied Mathematics, General Engineering, Machine Learning (cs.LG), Deep learning, Discrete cosine transform, Frequencies domain, Natural language processing, Self-attention, Transformers, Theoretical Computer Science, Computational Mathematics, Computational Theory and Mathematics, FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Signal Processing, Software
Description: Since their introduction, Transformer architectures have emerged as the dominant architectures for natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, whose memory consumption and number of operations both grow as $O(n^2)$, where $n$ is the input sequence length, thus limiting applications that require modeling very long sequences. Several approaches have been proposed in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive set of experiments shows that our method uses less memory for the same performance while also drastically reducing inference time, which makes it particularly suitable in real-time contexts on embedded platforms. Moreover, we believe that the results of our research might serve as a starting point for a broader family of deep neural models with a reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
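The description only outlines the idea at a high level. As a minimal sketch of one plausible reading, the snippet below compresses the keys and values along the sequence axis with a truncated DCT-II (keeping only the first m coefficients, analogous to JPEG coefficient truncation), so the attention score matrix shrinks from n×n to n×m. This is an illustrative assumption, not the authors' released implementation; the function name `dct_attention` and the choice to apply the transform after the K/V projections are hypothetical.

```python
import numpy as np
from scipy.fft import dct  # SciPy's DCT-II along a chosen axis

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dct_attention(Q, K, V, m):
    """Sketch of DCT-compressed attention (assumed reading of the abstract):
    keep the first m DCT coefficients of K and V along the sequence axis,
    so scores are (n, m) instead of (n, n)."""
    n, d = K.shape
    K_c = dct(K, type=2, axis=0, norm='ortho')[:m]   # (m, d) compressed keys
    V_c = dct(V, type=2, axis=0, norm='ortho')[:m]   # (m, d) compressed values
    scores = Q @ K_c.T / np.sqrt(d)                  # (n, m) score matrix
    return softmax(scores, axis=-1) @ V_c            # (n, d) output

# Toy usage: sequence length 512, head dimension 64, keep 64 coefficients.
rng = np.random.default_rng(0)
Q = rng.standard_normal((512, 64))
K = rng.standard_normal((512, 64))
V = rng.standard_normal((512, 64))
out = dct_attention(Q, K, V, m=64)
print(out.shape)  # (512, 64)
```

With this reading, memory and compute for the score matrix scale as O(n·m) rather than O(n^2), which is consistent with the memory and inference-time savings claimed in the description when m ≪ n.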
ISSN: 1573-7691
0885-7474
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ceee7f1b9d7b86dee5bc556e2795475e
https://doi.org/10.1007/s10915-023-02125-5
Rights: OPEN
Accession Number: edsair.doi.dedup.....ceee7f1b9d7b86dee5bc556e2795475e
Database: OpenAIRE