A Novel Approach to Revealing Positive and Negative Co-Regulated Genes

Yu-Hai Zhao, Guo-Ren Wang, Ying Yin, and Guang-Yu Xu   

  1. Department of Computer Science and Engineering, Northeastern University, Shenyang 110004, China
  • Received: 2006-05-01 Revised: 2006-12-19 Online: 2007-03-10 Published: 2007-03-10

Biologists have identified a real and growing need to find co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed to find only positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: (1) it finds both positively and negatively co-regulated genes in a single pass, (2) it removes the restriction that co-regulated genes be related by a magnitude transformation, and (3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measure, gCode, and a user-specified regulation threshold $\delta$, respectively. No previous work accomplishes this task. Moreover, the minimum description length (MDL) technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, and two algorithms combining efficient pruning and optimization strategies are developed to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that (1) the algorithms are able to find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and (2) the algorithms are effective and efficient, and outperform the existing approaches.

