Abstract

General-purpose GPU architectures provide several types of on-chip memory: registers, software-managed cache, and hardware-managed cache. These on-chip memory resources are powerful yet difficult to use well. Each type of on-chip memory has its advantages and disadvantages, making it suitable for different kinds of data. Further, on-chip memory contention at different levels affects the hardware concurrency that can be achieved on a GPU. Unlike CPU architectures, where on-chip memory allocation is performed under a fixed resource bound, the GPU's on-chip memory resource bound is variable because of its relationship with the adjustable hardware concurrency. In this paper, we examine the data values that are analyzable at compile time for placement in registers, software-managed cache, and hardware-managed cache. We propose a unified data placement strategy that is applicable to every type of on-chip memory, yet flexible enough to maximize synergy among the different types of on-chip memory.