Report
65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics
Title: | 65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics |
---|---|
Authors: | Lin, Zhongjin; Shastri, Bhavin J.; Yu, Shangxuan; Song, Jingxiang; Zhu, Yuntao; Safarnejadian, Arman; Cai, Wangning; Lin, Yanmei; Ke, Wei; Hammood, Mustafa; Wang, Tianye; Xu, Mengyue; Zheng, Zibo; Al-Qadasi, Mohammed; Esmaeeli, Omid; Rahim, Mohamed; Pakulski, Grzegorz; Schmid, Jens; Barrios, Pedro; Jiang, Weihong; Morison, Hugh; Mitchell, Matthew; Qiang, Xiaogang; Guan, Xun; Jaeger, Nicolas A. F.; Rusch, Leslie A.; Shekhar, Sudip; Shi, Wei; Yu, Siyuan; Cai, Xinlun; Chrostowski, Lukas |
Publication Year: | 2023 |
Collection: | Computer Science; Physics (Other) |
Subject Terms: | Physics - Optics, Computer Science - Emerging Technologies, Physics - Applied Physics, 78A05 |
Description: | Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by providing low-latency, high-bandwidth, and energy-efficient computation. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, can implement an entire layer of a neural network. It can execute 65 billion operations per second (GOPS) per neuron, including simultaneous weight updates, a hitherto unachieved speed. Unlike conventional photonic processors, whose static weights are set during training, our processor supports fast "hardware-in-the-loop" training and can dynamically adjust the number of inputs (fan-in) and outputs (fan-out) within a layer, which enhances its versatility. It can perform large-scale dot-product operations with vector dimensions up to 131,072. Furthermore, it successfully classifies (supervised learning) and clusters (unsupervised learning) 112 × 112-pixel images after "hardware-in-the-loop" training. To handle "hardware-in-the-loop" training for clustering AI tasks, we provide a solution, based on our processor, for multiplications involving two negative numbers. Comment: 19 pages, 6 figures |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2311.16896 |
Accession Number: | edsarx.2311.16896 |
Database: | arXiv |
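
The description above mentions time-multiplexed inputs, charge-integrated outputs, and dot products over vectors of up to 131,072 elements. The following is a minimal numerical sketch of that idea, not the authors' implementation: one product is accumulated per time slot, the way an integrator sums charge. The function names and the offset-encoding trick for signed values are assumptions chosen for illustration; the record does not say how the paper handles products of two negative numbers.

```python
import random


def time_multiplexed_dot(inputs, weights):
    """Accumulate one input*weight product per time slot.

    Models a charge integrator: each slot adds its product to a running
    total, and the total after the last slot is the dot product.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    charge = 0.0
    for x, w in zip(inputs, weights):
        charge += x * w
    return charge


def signed_dot_via_offsets(inputs, weights, offset=1.0):
    """Signed dot product using only non-negative encoded values.

    Hypothetical offset encoding (an assumption, not the paper's method):
    values in [-offset, offset] are shifted into [0, 2*offset], and the
    cross terms are subtracted afterwards using
        sum(x*w) = sum(xs*ws) - c*sum(xs) - c*sum(ws) + n*c**2
    where xs = x + c and ws = w + c.
    """
    xs = [x + offset for x in inputs]
    ws = [w + offset for w in weights]
    ones = [1.0] * len(xs)
    raw = time_multiplexed_dot(xs, ws)
    sum_xs = time_multiplexed_dot(xs, ones)  # also a non-negative dot product
    sum_ws = time_multiplexed_dot(ws, ones)
    return raw - offset * sum_xs - offset * sum_ws + len(xs) * offset**2


if __name__ == "__main__":
    rng = random.Random(0)
    n = 131_072  # largest vector dimension quoted in the description
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    direct = sum(a * b for a, b in zip(x, w))
    print(abs(signed_dot_via_offsets(x, w) - direct) < 1e-6)
```

In this sketch the fan-in of a neuron is just the number of time slots, so changing the vector length requires no structural change to the code, which mirrors the adjustable fan-in described in the record.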