PH.D DEFENCE - PUBLIC SEMINAR

Computational Methods for Reconstruction and Enhancement of Genome-Scale Metabolic Networks

Speaker
Mr Nguyen Nam Ninh
Advisor
Dr Leong Hon Wai, Associate Professor, School of Computing


07 Mar 2017 Tuesday, 02:00 PM to 03:00 PM

Executive Classroom, COM2-04-02

Abstract:

Metabolic networks have many useful applications in systems biology and bioengineering. Such a network of metabolic reactions can be reconstructed from a genome. However, the reconstruction process is time-consuming and relies heavily on expert labour, yet it still produces incomplete networks with many gaps. This thesis aims to develop a computational pipeline that minimizes the time and manual effort needed to reconstruct high-quality networks. Three problems are addressed: gap filling, enzyme annotation, and network assembly.

The first problem is to adequately fill the gaps that remain after a network has been reconstructed. Previous methods used enzyme families to find gap candidates, and they failed for poorly characterized families. Our indirect approach instead relies on any relevant homolog, so that no potential candidate is missed. Multiple function predictors were retrofitted and integrated into an ensemble method, MeGaFiller, which can putatively fill 35% of the gaps in several networks that previous methods failed to fill.

The second problem is to annotate enzymes with high accuracy and coverage, thus minimizing the reconstruction errors passed on to later steps. We developed a novel bottom-up method, called EnzDP, based on protein functional domain composition and calibrated HMM profiles. EnzDP covers more than 4000 substrate-specific enzymes. It achieved 94% accuracy in 5-fold cross-validation and outperformed many alternative methods.

The third problem is to quickly build a connected network with the minimum number of gaps from a given set of annotated enzymes. We showed that this problem is NP-hard and designed an approximation algorithm, called NetA. NetA predicts reference pathways and then optimizes them by deleting non-evidence reactions while maintaining network connectivity. It was used to re-assemble a network for Aspergillus oryzae, yielding 742 unique reactions in 119 pathways, with enzyme genes identified for 72.4% of the reactions.
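The idea of deleting non-evidence reactions while keeping the network connected can be illustrated with a small greedy sketch. This is not the NetA algorithm itself, whose details are not given in this abstract. It assumes a simplified model in which each reaction is an undirected link between two metabolites, and the names `is_connected`, `prune_non_evidence`, and the toy reactions R1 to R4 are hypothetical.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if all nodes lie in a single connected component."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:  # depth-first traversal from an arbitrary node
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return nodes <= seen


def prune_non_evidence(reactions, evidence):
    """Greedily drop reactions without evidence if connectivity survives.

    reactions: dict mapping reaction name -> (metabolite, metabolite)
    evidence:  set of reaction names backed by an annotated enzyme
    """
    metabolites = {m for pair in reactions.values() for m in pair}
    kept = dict(reactions)
    for name in sorted(reactions):  # fixed order keeps the result deterministic
        if name in evidence:
            continue  # never delete a reaction that has evidence
        trial = {k: v for k, v in kept.items() if k != name}
        if is_connected(metabolites, trial.values()):
            kept = trial  # deletion is safe, so accept it
    return kept


# Toy network (hypothetical): R3 is redundant, R4 is the only link to D.
reactions = {
    "R1": ("A", "B"),
    "R2": ("B", "C"),
    "R3": ("A", "C"),
    "R4": ("C", "D"),
}
evidence = {"R1", "R2"}
pruned = prune_non_evidence(reactions, evidence)
print(sorted(pruned))
```

In this toy case R3 is removed because A, B, and C stay linked through R1 and R2, while R4 is retained despite lacking evidence because deleting it would isolate D. A greedy pass like this gives no optimality guarantee, which is consistent with the problem being NP-hard.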

Together, these methods form an automated pipeline: EnzDP annotates the enzymes, NetA builds a network from them, and MeGaFiller fills the remaining gaps.
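The order of the three stages can be sketched as a simple function chain. This is only an illustration under stated assumptions: the abstract does not describe the tools' actual interfaces, so `run_pipeline` and the `toy_*` stand-ins below are hypothetical placeholders and do not call EnzDP, NetA, or MeGaFiller.

```python
def run_pipeline(proteins, annotate, assemble, fill_gaps):
    """Chain the three stages in the order given in the abstract."""
    enzymes = annotate(proteins)   # stage 1: enzyme annotation (the role of EnzDP)
    network = assemble(enzymes)    # stage 2: network assembly (the role of NetA)
    return fill_gaps(network)      # stage 3: gap filling (the role of MeGaFiller)


# Hypothetical stand-ins that only mimic the data flow between stages.
def toy_annotate(proteins):
    return {protein: "enzyme" for protein in proteins}


def toy_assemble(enzymes):
    return {"reactions": sorted(enzymes), "gaps": ["gap1"]}


def toy_fill_gaps(network):
    return {"reactions": network["reactions"] + network["gaps"], "gaps": []}


result = run_pipeline(["protA", "protB"], toy_annotate, toy_assemble, toy_fill_gaps)
print(result)
```

Passing the stages in as callables keeps the sketch independent of how each tool is actually invoked.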