PH.D. DEFENCE - PUBLIC SEMINAR

A Pan-cancer Analysis of Alternative Promoters Using RNA-Seq Data

Speaker
Mr Demircioglu Deniz
Advisor
Dr Ken Sung Wing Kin, Professor, School of Computing


Friday, 31 Jan 2020, 04:00 PM to 05:30 PM

MR1, COM1-03-19

Abstract:

Most human genes generate multiple transcript isoforms. The differential expression of these isoforms can help specify cell types, developmental stages and disease phenotypes. Hence, isoform transcription is heavily regulated by complex mechanisms such as alternative splicing, alternative promoters and alternative polyadenylation. The wide availability of RNA-Seq data allows the examination of alternative splicing; however, the study of alternative promoters mostly relies on specialized experimental methods such as ChIP-Seq and CAGE, which have limited availability. Furthermore, computational methods to study the promoter landscape are rare and rely on computationally costly differential exon usage analysis, which limits their scalability. Hence, a large-scale, comprehensive analysis of the promoter landscape is currently not feasible, leaving the role of promoters in disease and development an open question.

In this thesis, to address this question, we developed a new computational method to infer promoter activity from widely available RNA-Seq data. With the wide adoption of RNA-Seq experiments by international consortium efforts such as ENCODE, GTEx, TCGA and PCAWG, thousands of RNA-Seq samples have become publicly available across tissues, cell lines and multiple cancer types. Using our method and these publicly available data, we performed a large-scale analysis of alternative promoters, processing a total of 18,468 RNA-Seq samples across 42 cancer types; this is the largest promoter activity analysis to date. The large sample sizes increase statistical power, allowing the identification of systematic changes in promoter activity. For this purpose, we developed a statistical framework to identify alternative promoters that display context-dependent regulation. We compared each tissue against the other tissues, and cancer samples against normal samples, to identify tissue-specific and cancer-associated alternative promoters, respectively. Furthermore, we examined the transcriptional diversity, possible mechanisms, and functional and prognostic consequences of alternative promoter usage.

In summary, we developed a computational method that leverages RNA-Seq data to infer promoter activity, enabling large-scale analysis of transcriptional regulation at the promoter level, together with a statistical framework to identify alternative promoters. We compared our method to alternative approaches and demonstrated its accuracy and efficiency. Using our promoter activity estimates and alternative promoter identification framework, we identified promoters that are deregulated across tissues, cancer types and patients, creating the largest catalogue of alternative promoters to date. This thesis suggests that promoter activity can be robustly and efficiently estimated from RNA-Seq data, and that the dynamic landscape of active promoters shapes tissue and cancer transcriptomes.