Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing

Hao Li; Di Yu; Anand Kumar; Yi-Cheng Tu

doi:10.1109/BigData.2014.7004245

Proc IEEE Int Conf Big Data. 2014 Oct;2014:301-310. doi: 10.1109/BigData.2014.7004245.

Abstract

A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rates of such systems demand substantial computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and to enable resource allocation among such operations. Understanding how the performance of an operation depends on its resource consumption is thus a prerequisite for the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.

Keywords: CUDA; CUDA stream; DBMS; GPGPU; GPU; push-based systems.

Grants and funding

R01 GM086707/GM/NIGMS NIH HHS/United States