PH.D DEFENCE - PUBLIC SEMINAR

Software-hardware codesign for scalable energy-efficient CGRAs

Speaker
Mr Dissanayake Mudiyanselage Dhananjaya Wijerathne
Advisor
Dr Tulika Mitra, Provost'S Chair Professor, School of Computing


05 Jun 2023 Monday, 10:00 AM to 11:30 AM

MR3, COM2-02-26

Abstract:

Coarse-Grained Reconfigurable Array (CGRA) is a software-defined hardware accelerator that offers a good blend of performance, energy efficiency, and flexibility (programmability). Therefore, CGRA is suitable for systems with tight power budgets and require frequent reprogramming, such as the IoT, mobile, and data center devices. The CGRA architecture consists of an array of simple processing elements and runs a statically generated schedule exposing the scheduling complexity to the compiler. Therefore, CGRA performance heavily depends on a high-quality compiler to map the application kernels on the architecture. State-of-the-art compilers fail to generate high-quality mappings within an acceptable compilation time, especially with the complex kernels and the increased CGRA size. CGRA execution falls short of its potential performance and energy efficiency due to low-quality mappings.

This thesis introduces software-hardware codesign techniques to improve CGRA’s performance scalability and energy efficiency, primarily focusing on high-quality compilation. First, bringing the data into CGRA processing elements at high throughput is crucial to the execution performance. Existing CGRAs are unable to fully exploit the available memory bandwidth in multi-bank on-chip memories due to overheads in memory access-related computations. We present CASCADE, a novel decoupled access-execute CGRA design with architecture and compiler support to leverage multi-bank memory bandwidth fully. CGRA can process data faster now that it has faster data access. However, the quality of the computation mapping on CGRA processing elements determines data processing speed. Therefore, the application kernel mapping on CGRA processing elements is the focus of our following works, HiMap and Panorama. We specifically focus on the most challenging application mapping scenario with bigger kernels (kernels with high parallelism) and larger CGRAs. HiMap specializes in mapping highly parallel kernels with regular data dependencies, whereas Panorama specializes in mapping highly parallel kernels with irregular data dependencies. Both of these works employ a hierarchical mapping technique, in which the high-level mapping stage places the kernel dataflow graph on the architecture to simplify the detailed low-level mapping. Compared to state-of-the-art techniques, our solutions achieve superior performance and energy efficiency while significantly reducing compilation time. The approaches we suggest would serve as a foundation for the future development of scalable energy-efficient CGRA systems. Finally, we propose a comprehensive CGRA design framework to assist in design space exploration and fuel future innovations by creating functionally accurate CGRA designs that can be seamlessly integrated with real systems.