DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Bibliographic Details
Title: DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Authors: DeepSeek-AI; Zhu, Qihao; Guo, Daya; Shao, Zhihong; Yang, Dejian; Wang, Peiyi; Xu, Runxin; Wu, Y.; Li, Yukun; Gao, Huazuo; Ma, Shirong; Zeng, Wangding; Bi, Xiao; Gu, Zihui; Xu, Hanwei; Dai, Damai; Dong, Kai; Zhang, Liyue; Piao, Yishi; Gou, Zhibin; Xie, Zhenda; Hao, Zhewen; Wang, Bingxuan; Song, Junxiao; Chen, Deli; Xie, Xin; Guan, Kang; You, Yuxiang; Liu, Aixin; Du, Qiushi; Gao, Wenjun; Lu, Xuan; Chen, Qinyu; Wang, Yaohui; Deng, Chengqi; Li, Jiashi; Zhao, Chenggang; Ruan, Chong; Luo, Fuli; Liang, Wenfeng
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Description: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as in reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K. In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.11931
Accession Number: edsarx.2406.11931
Database: arXiv