A Pan-cancer Analysis of Alternative Promoters Using RNA-Seq Data

Mr Deniz Demircioglu
Dr Sung Wing Kin, Ken, Professor, School of Computing

  29 Jul 2019 Monday, 02:00 PM to 03:30 PM

 SR8, COM1-02-08


Most human genes generate multiple transcript isoforms. The differential expression of these isoforms can help specify cell types, developmental stages and disease phenotypes. Hence, the transcription of isoforms is heavily regulated by complex transcriptional regulatory mechanisms such as alternative splicing, alternative promoters and alternative polyadenylation. The wide availability of RNA-Seq data allows the examination of alternative splicing; however, the study of alternative promoters mostly relies on specialized experimental methods such as ChIP-Seq and CAGE, which have limited availability. Furthermore, computational methods to study the promoter landscape are rare and rely on computationally costly differential exon usage analysis, which limits their scalability. Hence, the large-scale, comprehensive analysis of the promoter landscape is still an open question.

In this study, we developed a new computational method to infer promoter activity from widely available RNA-Seq data. With the wide adoption of RNA-Seq experiments by international consortia such as ENCODE, GTEx, TCGA and PCAWG, our method enables the analysis of alternative promoters across tissues, cell lines and multiple cancer types. In total, we analyzed 18,468 RNA-Seq samples across 42 cancer types, performing the largest promoter activity analysis to date. The large sample sizes provide higher statistical power, allowing the identification of systematic changes in promoter activity. For this purpose, we developed a statistical framework to identify alternative promoters that display context-dependent regulation. We compared each tissue against other tissues to identify tissue-specific alternative promoters, and cancer samples against normal samples to identify cancer-associated alternative promoters. Furthermore, we examined the transcriptional diversity, possible mechanisms, and functional and prognostic consequences of alternative promoter usage.

In summary, we developed a computational method that leverages RNA-Seq data to infer promoter activities, enabling large-scale analysis of transcriptional regulation at the promoter level. Using our promoter activity estimates and alternative promoter identification framework, we identified promoters that are deregulated across tissues, cancer types, and patients, creating the largest catalogue of alternative promoters. Our study suggests that promoter activity can be robustly estimated using RNA-Seq data and that the dynamic landscape of active promoters shapes the tissue and cancer transcriptome.