PH.D DEFENCE - PUBLIC SEMINAR

LOW-POWER COMPILER CONTROLLED PROGRAMMABLE ACCELERATORS FOR NEXT GENERATION WEARABLES

Speaker
MR DISSANAYAKA MUDIYANSELAGE EMIL MANUPA KARUNARATNE
Advisors
Dr Tulika Mitra, Provost's Chair Professor, School of Computing
Dr Peh Li Shiuan, Provost's Chair Professor, School of Computing


14 Apr 2020 Tuesday, 03:00 PM to 04:30 PM

Executive Classroom, COM2-04-02

Abstract:
In recent years, the wearable device market has begun to expand rapidly, spanning a wide variety of usage scenarios: from extremely small ear-wear to larger head-mounted AR/VR devices. The main requirement of such devices is superior computational ability under a stringent energy budget. Accelerator-rich SoCs (Systems-on-Chip) have been the mainstream solution for achieving power-efficient computation in wearable devices. However, it is not feasible to design an accelerator from scratch for each wearable application, due to the prohibitively high non-recurring engineering (NRE) cost and exacting time-to-market constraints. Coarse-Grained Reconfigurable Arrays (CGRAs) are a promising alternative to fixed-function accelerators, providing a good balance between flexibility, performance, and power. To this end, this dissertation focuses on expanding the research frontiers of CGRAs by adopting hardware/software co-designed approaches.

In the first study, the interconnect of state-of-the-art CGRA architectures is identified as a limitation in handling the irregular data dependencies that are common among wearable applications. This study introduces a novel CGRA architecture, HyCUBE, with a reconfigurable interconnect providing single-cycle communication between distant processing elements (PEs), resulting in a new formulation of the application mapping problem that leads to the design of an efficient compiler.

In the second study, wearable applications are noted for their complex control-divergence structures (e.g., nested if-else statements, switch statements, etc.). Moreover, static scheduling on CGRAs requires independent resource reservations for mutually exclusive dataflows along control-divergent paths. Such reservations are not only wasteful but also limit performance by increasing the schedule length. This study introduces a novel architecture, 4D-CGRA, that encourages mutually exclusive dataflows to map to the same set of resources while executing the appropriate dataflow at runtime based on branch outcomes.

Being able to handle control divergence and irregular dependencies allows efficient execution of the most critical innermost loops of wearable applications on CGRAs. However, modern signal processing, machine learning, and encryption applications involve more than a single hot innermost loop. The wearable application domain therefore presents the challenge of how to use the limited configuration memory efficiently to accelerate deeply nested loop structures. The third study introduces DNestMap, a partitioning and mapping tool for CGRAs that judiciously extracts the most beneficial code segments of multiple deeply nested loops and statically caches them together in the configuration memory through spatio-temporal partitioning.

Lastly, the limited availability of compiler and simulation tools for CGRAs hinders design space exploration of state-of-the-art architectural features such as those used in HyCUBE and 4D-CGRA. Additionally, modern CGRA architectures offer various performance-optimizing features such as complex interconnects and multi-banked memories. The last chapter presents Morpher, an integrated compilation and simulation framework for CGRAs that fills this gap with a flexible approach, enabling easy specification of complex architectural features and automated modelling of these features in an efficient compiler and simulator.

The goal of this dissertation is to attain superior energy efficiency by adopting hardware/software co-designed CGRAs as the universal accelerator for next-generation wearable devices.