A Novel Approach to Revealing Positive and Negative Co-Regulated Genes
-
Abstract
As biologists have observed, there is a real and emerging need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed only for finding positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positive and negative co-regulated genes simultaneously, 2) it is not restricted to a particular magnitude transformation relationship among co-regulated genes, and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measurement, gCode, and a user-specified regulation threshold \delta, respectively. No previous work accomplishes this task. Moreover, the MDL technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, along with two algorithms that combine efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithms are able to find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform the existing approaches.