Developing a generator for CUDA C files to reduce synchronization overhead (preliminary version)
Abstract
In most GPU implementations of parallel algorithms, the host
launches several separate CUDA kernels one after another and
uses the kernel boundaries as barrier synchronization points.
However, each CUDA kernel launch incurs a large overhead, and
barrier synchronization lowers the utilization of computing
resources. The Single Kernel Soft Synchronization (SKSS)
technique launches only one CUDA kernel and assigns tasks to
CUDA blocks dynamically. SKSS can fully exploit GPU computing
resources, and many studies have reported performance
improvements achieved with it. Unfortunately, writing efficient
SKSS-based GPU implementations requires extensive knowledge of
and experience in CUDA programming. To address this issue, we
present a tool that converts CUDA C files written without SKSS
into SKSS-enabled CUDA C files.
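
As a rough illustration of the SKSS idea described above, the following sketch computes an inclusive prefix sum in a single kernel launch. It is not the output of the tool presented in the paper. All names (skss_prefix_sum, nextTile, tilePrefix, tileReady, TILE) are hypothetical, and the chained-scan scheme is an assumption chosen for brevity. Each CUDA block obtains its next task (a tile of the input) from a global counter with atomicAdd, and waits on a per-tile flag instead of a kernel boundary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define TILE 256  // elements per task; also the block size (illustrative choice)

// Hypothetical SKSS-style kernel: one launch, tasks handed out dynamically.
__global__ void skss_prefix_sum(const int *in, int *out, int n, int numTiles,
                                unsigned int *nextTile,    // global task counter
                                volatile int *tilePrefix,  // sum of all elements up to the end of tile t
                                volatile int *tileReady)   // 1 once tilePrefix[t] is valid
{
    __shared__ unsigned int tile;
    __shared__ int buf[TILE];
    __shared__ int base;

    while (true) {
        // Dynamic task assignment: one thread takes the next unprocessed tile.
        if (threadIdx.x == 0) tile = atomicAdd(nextTile, 1u);
        __syncthreads();
        unsigned int t = tile;
        if (t >= (unsigned int)numTiles) return;  // same t for the whole block

        int i = (int)(t * TILE + threadIdx.x);
        buf[threadIdx.x] = (i < n) ? in[i] : 0;
        __syncthreads();

        // Inclusive scan of the tile in shared memory (Hillis-Steele).
        for (int off = 1; off < TILE; off <<= 1) {
            int v = (threadIdx.x >= off) ? buf[threadIdx.x - off] : 0;
            __syncthreads();
            buf[threadIdx.x] += v;
            __syncthreads();
        }

        // Soft synchronization: wait only for the preceding tile's result.
        // Tile t-1 was taken earlier by a block that is already running,
        // so this spin loop cannot wait on a block that was never scheduled.
        if (threadIdx.x == 0) {
            int b = 0;
            if (t > 0) {
                while (tileReady[t - 1] == 0) { }
                __threadfence();
                b = tilePrefix[t - 1];
            }
            base = b;
            tilePrefix[t] = b + buf[TILE - 1];
            __threadfence();          // publish the prefix before the flag
            tileReady[t] = 1;
        }
        __syncthreads();
        if (i < n) out[i] = buf[threadIdx.x] + base;
        __syncthreads();              // shared data is reused by the next task
    }
}

int main() {
    const int n = 100000;
    const int numTiles = (n + TILE - 1) / TILE;
    int *h_in = (int *)malloc(n * sizeof(int));
    int *h_out = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) h_in[i] = i % 7;

    int *d_in, *d_out, *d_prefix, *d_ready;
    unsigned int *d_next;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMalloc(&d_prefix, numTiles * sizeof(int));
    cudaMalloc(&d_ready, numTiles * sizeof(int));
    cudaMalloc(&d_next, sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_ready, 0, numTiles * sizeof(int));
    cudaMemset(d_next, 0, sizeof(unsigned int));

    // A single kernel launch; 64 blocks share all numTiles tasks.
    skss_prefix_sum<<<64, TILE>>>(d_in, d_out, n, numTiles, d_next, d_prefix, d_ready);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Compare against a sequential prefix sum on the host.
    int sum = 0, ok = 1;
    for (int i = 0; i < n; ++i) { sum += h_in[i]; if (h_out[i] != sum) { ok = 0; break; } }
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_prefix); cudaFree(d_ready); cudaFree(d_next);
    free(h_in); free(h_out);
    return ok ? 0 : 1;
}
```

A conventional multi-kernel version of the same computation would typically use separate launches for the per-tile scan, the scan of tile totals, and the final offset addition, with each kernel boundary acting as a barrier. In this sketch, a block that finishes a tile immediately takes the next one, and waits only for the one tile it depends on.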
