Using Biological Networks and Gene-Expression Profiles for the Analysis of Diseases
COM1 Level 3
MR1, COM1-03-19
Abstract:
The wealth of microarray data available today allows us to perform two important tasks: (1) inferring the biological explanations or causes behind diseases, and (2) using these explanations to diagnose future patients and predict their outcomes. Both tasks are challenging, and results are often not reproducible when different batches of data are analyzed. The problem is aggravated by small sample sizes, because many laboratories are constrained by budget, biology, or other factors, which makes it hard to draw reasonable and consistent biological conclusions.
By using databases of biological pathways, which capture a wealth of information about how genes depend on one another in performing specific functions, we formulate algorithms that produce meaningful and consistent biological explanations as plausible causes of diseases. We derive statistically significant "subnetworks", which are smaller connected components within biological pathways, because the cause of a disease may be linked to a small subset of the genes in a pathway. Combined with a unique scoring methodology, this lets us compute a test statistic that is stable even when sample sizes are small and that is consistently detected across independent batches of data, even from different microarray platforms. Our method attains a high subnetwork-level agreement of about 58% using only 2 samples; for the contemporary methods GSEA and ORA, this figure falls to 27% and 13%, respectively. The subnetwork-level agreement achieved by our method also continues to improve as the sample size grows, reaching about 93%. Our predicted subnetworks are supported by existing biological literature and give biologists further insight into the mechanisms behind the diseases studied.
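The abstract does not spell out how subnetworks are extracted or how agreement between batches is computed. As a minimal sketch, assuming the genes of interest in a pathway have already been selected by some upstream test (not shown), subnetworks can be taken as the connected components of the pathway graph induced on those genes. Agreement is assumed here to be the fraction of subnetworks reported for one batch that reappear exactly in the other. The names `induced_subnetworks`, `subnetwork_agreement`, and `min_size` are hypothetical and are not the authors' method.

```python
from collections import defaultdict, deque

# Hypothetical sketch: gene selection and the agreement definition are assumptions.


def induced_subnetworks(edges, selected_genes, min_size=5):
    """Return connected components (as frozensets) of the pathway graph
    restricted to `selected_genes`, keeping those with >= min_size genes."""
    selected = set(selected_genes)
    adjacency = defaultdict(set)
    for u, v in edges:
        # Keep only edges whose endpoints are both selected (induced subgraph).
        if u in selected and v in selected:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = []
    for start in adjacency:  # selected genes with no selected neighbour are skipped
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbour in adjacency[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            components.append(frozenset(component))
    return components


def subnetwork_agreement(batch_a, batch_b):
    """Assumed metric: fraction of batch-A subnetworks also reported for batch B."""
    a, b = set(batch_a), set(batch_b)
    if not a:
        return 0.0
    return len(a & b) / len(a)


edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "F")]
subs = induced_subnetworks(edges, {"A", "B", "C", "D", "E"}, min_size=2)
print(subs)  # components {A, B, C} and {D, E}; F is not selected
print(subnetwork_agreement(subs, [frozenset({"A", "B", "C"})]))
```

Exact set matching is the strictest choice. A looser variant could count two subnetworks as agreeing when their gene overlap exceeds some threshold.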
This work is important because the subnetworks, being consistent across independent datasets, also serve as informative and relevant features, which allows us to build better predictive algorithms for inferring patient outcomes. We also present a subnetwork-feature scoring function that not only predicts the outcome of future samples measured on independent microarray platforms but also handles small training sets. This enables researchers to identify the mechanisms behind a disease and use them directly as tools for diagnosis and prognosis.
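The subnetwork-feature scoring function itself is not described here, so the sketch below is a hypothetical stand-in, assuming each sample is a `{gene: expression}` dictionary. Expression values are converted to within-sample ranks, which makes samples from platforms with different intensity scales comparable, and each subnetwork feature is the mean rank of its genes. A nearest-centroid classifier, swapped in purely for illustration, then predicts the outcome label. All function names are illustrative, not the authors'.

```python
from statistics import mean

# Hypothetical sketch: rank-based features and nearest-centroid are assumptions.


def rank_normalise(sample):
    """Map {gene: expression} to {gene: rank scaled to [0, 1]}; ties broken by gene name."""
    ordered = sorted(sample, key=lambda g: (sample[g], g))
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {gene: i / (n - 1) for i, gene in enumerate(ordered)}


def subnetwork_features(sample, subnetworks):
    """One feature per subnetwork: mean normalised rank of its genes measured in `sample`."""
    ranks = rank_normalise(sample)
    features = []
    for subnetwork in subnetworks:
        present = [ranks[g] for g in subnetwork if g in ranks]
        # Use a neutral mid-rank value if none of the subnetwork's genes was measured.
        features.append(mean(present) if present else 0.5)
    return features


def nearest_centroid_predict(train_features, train_labels, test_features):
    """Return the label whose class-mean feature vector is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_labels):
        rows = [f for f, l in zip(train_features, train_labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(centroids, key=lambda label: distance(centroids[label], test_features))


subs = [frozenset({"A", "B"}), frozenset({"C", "D"})]
train = [
    {"A": 10, "B": 9, "C": 1, "D": 2},
    {"A": 8, "B": 7, "C": 2, "D": 1},
    {"A": 1, "B": 2, "C": 9, "D": 10},
    {"A": 2, "B": 1, "C": 8, "D": 9},
]
labels = ["poor", "poor", "good", "good"]
train_feats = [subnetwork_features(s, subs) for s in train]
# A test sample on a different intensity scale, as if from another platform.
test_sample = {"A": 300, "B": 250, "C": 5, "D": 7}
print(nearest_centroid_predict(train_feats, labels, subnetwork_features(test_sample, subs)))
```

Because the features depend only on the ordering of genes within a sample, rescaling a sample's values leaves its features unchanged, which is one simple way to compare samples measured on different platforms.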