An Efficient Convolutional Neural Network Computation using AVX-512 Instructions

Hiroki Kataoka, Kohei Yamashita, Koji Nakano, Yasuaki Ito, Akihiko Kasagi, Tsuguchika Tabaru


Convolutional Neural Networks (CNNs) have recently become widely used for image processing. Because their computational cost is high, the computation needs to be accelerated. In this paper, we propose an efficient implementation using Intel AVX-512 instructions on multicore CPUs. AVX-512 instructions support 512-bit vector operations, in which 16 32-bit floating-point operations can be executed simultaneously. To reduce the amount of computation, our implementation uses a fused filter that combines a convolutional layer with the pooling layer that follows it. As a result, we achieve a speed-up factor of 1.62 over an existing library implementation using the Intel Math Kernel Library for Deep Neural Networks.
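
The fused-filter idea can be illustrated with a minimal scalar sketch. It is not the AVX-512 implementation described above, and it rests on assumptions not stated in the abstract: the pooling is non-overlapping average pooling, no nonlinearity sits between the convolution and the pooling, and there is a single input and output channel. Under those assumptions, a k x k convolution followed by p x p average pooling is itself a linear operation, so it equals one (k+p-1) x (k+p-1) convolution applied with stride p. The function names `conv2d`, `avg_pool`, and `fuse_filter` are hypothetical and chosen only for this illustration.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def avg_pool(fmap, p=2):
    """Non-overlapping p x p average pooling (window p, stride p)."""
    out_h, out_w = len(fmap) // p, len(fmap[0]) // p
    return [[sum(fmap[i * p + a][j * p + b]
                 for a in range(p) for b in range(p)) / (p * p)
             for j in range(out_w)]
            for i in range(out_h)]


def fuse_filter(kernel, p=2):
    """Fold p x p average pooling into the kernel (assumed: no activation between).

    Each original weight is spread over the p x p pooling offsets and scaled
    by 1 / p^2, giving a (k + p - 1) x (k + p - 1) filter.
    """
    kh, kw = len(kernel), len(kernel[0])
    fused = [[0.0] * (kw + p - 1) for _ in range(kh + p - 1)]
    for a in range(kh):
        for b in range(kw):
            for da in range(p):
                for db in range(p):
                    fused[a + da][b + db] += kernel[a][b] / (p * p)
    return fused


# Illustrative data only: a 6 x 6 image and a 3 x 3 kernel.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]

two_pass = avg_pool(conv2d(image, kernel), p=2)            # convolution, then pooling
one_pass = conv2d(image, fuse_filter(kernel, 2), stride=2)  # single fused pass

for row_a, row_b in zip(two_pass, one_pass):
    for x, y in zip(row_a, row_b):
        assert abs(x - y) < 1e-9
```

With k = 3 and p = 2, each pooled output takes p^2 * k^2 = 36 multiplications in the two-pass form but (k + p - 1)^2 = 16 with the fused filter, which suggests where a reduction in computation could come from. How the paper maps this onto 512-bit vector registers is not covered by this sketch.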


Deep Learning; Neural Networks; Convolution; Average Pooling



