Low-Power Many-Core Architectures for the Next Generation Wearables

Mr Tan Cheng
Dr Mitra, Tulika, Professor, School of Computing
Dr Peh Li Shiuan, Provost's Chair Professor, School of Computing

Friday, 18 Jan 2019, 10:30 AM to 12:00 PM

MR1, COM1-03-19


Wearable devices now rely on power-efficient processors inside the SoC (System-on-Chip) to meet the increasing computational demands of wearable applications, such as the need for real-time response. Many-core architectures improve performance by exploiting the thread-level parallelism available in many wearable applications, but they suffer from high power consumption, a critical limitation in the wearable domain, where power budgets are stringent. Application-specific (ASIC) accelerators can also improve power-efficiency substantially. However, designing an accelerator from scratch for each wearable application is neither practical nor feasible because of the prohibitively high non-recurring engineering (NRE) cost and tight time-to-market constraints. In this dissertation, we focus on novel customizable many-core architecture designs that enable high performance/watt for next-generation wearables.

In the first stage, we combine customizable processor cores with a configurable network in a message-passing architecture to deliver highly competitive performance/watt. We propose LOCUS, a LOw-power, highly CUStomizable many-core architecture that can be deployed universally in wearable devices across diverse application scenarios. LOCUS aggressively customizes the cores and the network at run time in a synergistic, integrated fashion. Frequently occurring instruction sequences in applications are discovered automatically and trigger custom instructions that jointly accelerate computation and communication. These custom instructions configure the processor cores and the network at run time to tailor LOCUS to each specific application, improving performance while lowering energy. For data transfers, LOCUS uses a lightweight message-passing substrate instead of the hardware cache coherence prevalent in shared-memory many-cores, which is costly in both area and power. Finally, because the workload is distributed unevenly among the cores and the messages that message passing injects into the network-on-chip (NoC) are short, we propose a power management mechanism for both computation and communication in LOCUS to further improve performance/watt.
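The abstract does not specify how LOCUS discovers frequently occurring instruction sequences. As a minimal sketch, assuming the simplest possible approach of counting fixed-length opcode windows (n-grams) in a dynamic instruction trace, candidate sequences for custom instructions could be ranked as follows. The function `frequent_sequences`, the toy trace, and the thresholds are hypothetical illustrations, not part of LOCUS.

```python
from collections import Counter
from typing import Sequence


def frequent_sequences(trace: Sequence[str], length: int,
                       min_count: int) -> list[tuple[tuple[str, ...], int]]:
    """Count every contiguous opcode window of `length` in `trace` and return
    the windows seen at least `min_count` times, most frequent first."""
    windows = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    hits = [(seq, n) for seq, n in windows.items() if n >= min_count]
    # Highest count first; ties broken by opcode order so output is deterministic.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits


# Hypothetical dynamic trace of a filter loop: the ld/mul/add/st body repeats.
trace = ["ld", "mul", "add", "st", "ld", "mul", "add", "st", "br",
         "ld", "mul", "add", "st", "br"]

for seq, n in frequent_sequences(trace, length=3, min_count=3):
    print(seq, n)
```

This prints `('ld', 'mul', 'add') 3` and `('mul', 'add', 'st') 3`. Practical instruction-set extension identification usually works on dataflow graphs and enforces constraints such as register-file port limits and convexity; this window count ignores those constraints and only shows the frequency-based ranking idea.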

Adding accelerators can significantly improve the power-performance of SoCs. Configurable fabrics amortize the NRE cost through reuse while improving the power-efficiency of computation. However, adding a complete, complex accelerator to every core of a many-core architecture is too expensive in both power and area. We therefore propose Stitch, a many-core architecture in which tiny, heterogeneous, configurable, and fusible instruction-set extension (ISE) accelerators, called polymorphic patches, are enmeshed with the cores. Each patch handles only very simple ISEs, but multiple polymorphic patches can be stitched together across the chip into large virtual accelerators for complex ISEs, using an ultra-lightweight, compiler-scheduled network-on-chip that has no buffers or control logic.
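The abstract states only that the stitching network is compiler-scheduled and has no buffers or control logic. One consequence is that link contention presumably has to be resolved before run time rather than by arbitration in the routers. The sketch below is a hypothetical illustration of that idea, not Stitch's actual compiler: it assumes a 2D mesh of patches, dimension-ordered (XY) routing, one cycle per hop, and a greedy scheduler that delays each transfer's injection cycle until every link on its route is free in the cycle it is needed. `xy_route`, `schedule`, and all coordinates are hypothetical.

```python
Coord = tuple[int, int]     # (x, y) position of a patch on a 2D mesh
Link = tuple[Coord, Coord]  # directed link between adjacent patches


def xy_route(src: Coord, dst: Coord) -> list[Link]:
    """Dimension-ordered route: move along X first, then along Y."""
    links: list[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def schedule(transfers: list[tuple[Coord, Coord, int]]) -> list[int]:
    """Give each (src, dst, earliest_cycle) transfer an injection cycle so that
    no directed link carries two transfers in the same cycle. In this model a
    bufferless transfer cannot wait mid-route, so any delay is applied at
    injection. Transfers are handled greedily in list order."""
    busy: set[tuple[Link, int]] = set()
    starts: list[int] = []
    for src, dst, earliest in transfers:
        route = xy_route(src, dst)
        start = earliest
        # Hop k occupies route[k] during cycle start + k.
        while any((link, start + k) in busy for k, link in enumerate(route)):
            start += 1
        busy.update((link, start + k) for k, link in enumerate(route))
        starts.append(start)
    return starts


# Hypothetical virtual accelerator built from four patches on a 3x2 mesh.
transfers = [
    ((0, 0), (2, 0), 0),  # patch A -> patch C
    ((1, 0), (2, 1), 1),  # patch B -> patch D, shares link (1,0)->(2,0)
    ((2, 0), (2, 1), 3),  # patch C -> patch D, shares link (2,0)->(2,1)
]
print(schedule(transfers))
```

Running the example prints `[0, 2, 4]`: the second transfer waits one cycle for the shared link (1,0)->(2,0), and the third waits one cycle for (2,0)->(2,1). Greedy in-order scheduling is the simplest choice; a compiler could also reorder transfers to reduce total delay.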

Fusing patches to accelerate compute-intensive kernels delivers high throughput for streaming applications. However, statically allocating pairs of patches to particular kernels for the entire run time limits both overall performance and patch utilization. We therefore propose DIFFUSION (DynamIc FUSIble heterOgeNeous accelerators enmeshed with a many-core architecture), which dynamically fuses multiple patches to accelerate compute-intensive kernels in wearable applications. DIFFUSION maximizes overall patch utilization by fusing each patch with different partner patches as different kernels require acceleration over the course of the application's execution.
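To make the utilization argument concrete, here is a toy model under stated assumptions: the application runs as a sequence of phases with one kernel active per phase, each kernel needs a fixed number of patches of each type, any patch of the right type can serve any kernel, and re-fusing patches between phases is free. Under static allocation every kernel keeps dedicated patches; under dynamic fusion, patches are shared across phases. The kernel names, patch types, counts, and functions are hypothetical, and this is not DIFFUSION's actual allocator.

```python
from collections import Counter


def pool_size(demands: dict[str, Counter], dynamic: bool) -> Counter:
    """Number of patches of each type the chip must provide.

    Static: every kernel owns dedicated patches for the whole run, so
    per-type needs add up across kernels.
    Dynamic: patches are re-fused between phases, so only the largest
    per-type need of any single kernel matters.
    """
    pool: Counter = Counter()
    for need in demands.values():
        for ptype, n in need.items():
            pool[ptype] = max(pool[ptype], n) if dynamic else pool[ptype] + n
    return pool


def utilization(demands: dict[str, Counter], phases: list[str],
                dynamic: bool) -> float:
    """Fraction of patch-phases in which a patch is doing useful work."""
    pool = pool_size(demands, dynamic)
    busy = sum(sum(demands[kernel].values()) for kernel in phases)
    return busy / (sum(pool.values()) * len(phases))


# Hypothetical kernels and the patch types each one fuses together.
demands = {
    "fir": Counter({"mac": 2}),
    "fft": Counter({"mac": 1, "shuffle": 1}),
    "norm": Counter({"shuffle": 1, "sqrt": 1}),
}
phases = ["fir", "fft", "norm", "fir"]

for dynamic in (False, True):
    label = "dynamic" if dynamic else "static"
    print(f"{label}: {sum(pool_size(demands, dynamic).values())} patches, "
          f"utilization {utilization(demands, phases, dynamic):.3f}")
```

With these numbers, static allocation needs 6 patches that are busy one third of the time, while dynamic fusion needs only 4 patches at 50% utilization. Equivalently, in this model a fixed patch budget leaves more patches available to the active kernel when fusion is dynamic.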

The goal of this dissertation is to embrace parallelism through many-core architectures while deploying lightweight communication and aggressive customization at multiple levels of the architecture, delivering highly competitive performance/watt for wearables within strict power budgets.