›› 2010, Vol. 25 ›› Issue (1): 42-52.

• Special Issue on Computational Challenges from Modern Molecular Biology • Previous Articles     Next Articles

Some Algorithmic Challenges in Genome-Wide Ortholog Assignment

Tao Jiang (姜涛), Fellow, ACM   

  1. Department of Computer Science and Engineering, University of California, Riverside, CA 92521, U.S.A.
    School of Information Science and Technology, Tsinghua University, Beijing 100084, China
  • Received:2009-09-01 Revised:2009-10-21 Online:2010-01-05 Published:2010-01-05
  • About author:
    Tao Jiang received the B.S. degree in computer science and technology from the University of Science and Technology of China, Hefei, in July 1984 and Ph.D. degree in computer science from University of Minnesota in Nov. 1988. He was a faculty member at McMaster University, Hamilton, Ontario, Canada during Jan. 1989~July 2001 and is now professor of computer science and engineering at University of California --- Riverside (UCR). He is also a member of the UCR Institute for Integrative Genome Biology, a member of the Center for Plant Cell Biology, a principal scientist at Shanghai Center for Bioinformation Technology, and Changjiang Visiting Professor at Tsinghua University. Tao Jiang's recent research interest includes combinatorial algorithms, computational molecular biology, bioinformatics, and computational aspects of information retrieval. He is a fellow of ACM and of AAAS. More information about his work can be found at http://www1.cs.ucr.edu/~jiang.
  • Supported by:

    This work is supported by the NSF of USA under Grant No. IIS-0711129.

Genome-scale assignment of orthologous genes is a fundamental and challenging problem in computational biology and has a wide range of applications in comparative genomics, functional genomics, and systems biology. Many methods based on sequence similarity, phylogenetic analysis, chromosomal syntenic information, and genome rearrangement have been proposed in recent years for ortholog assignment. Although these methods produce results that largely agree with each other, their results may still contain significant differences. In this article, we consider the recently proposed parsimony approach for assigning orthologs between closely related genomes based on genome rearrangement, which essentially attempts to transform one genome into another by the smallest number of genome rearrangement events including reversal, translocation, fusion, and fission, as well as gene duplication events. We will highlight some of the challenging algorithmic problems that arise in the approach including (i) minimum common substring partition, (ii) signed reversal distance with duplicates, and (iii) signed transposition distance with duplicates. The most recent progress towards the solution of these problems will be reviewed and some open questions will be posed. We will also discuss some possible extensions of the approach to the simultaneous comparison of multiple genomes.

[1] Fitch W M. Distinguishing homologous from analogous proteins. Syst. Zool., 1970, 19(2): 99-113.
[2] Koonin E V. Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet., 2005, 39: 309-338.
[3] Remm M, Storm C, Sonnhammer E. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 2001, 314(5): 1041-1052.
[4] Sankoff D. Genome rearrangement with gene families. Bioinformatics, 1999, 15(11): 909-917.
[5] Tatusov R L, Galperin M Y, Natale D A, Koonin E V. The COG database: A tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res., 2000, 28(1): 33- 36.
[6] Tatusov R L, Koonin E V, Lipman D J. A genomic perspective on protein families. Science, 1997, 278: 631-637.
[7] Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[8] Chen X, Zheng J, Fu Z, Nan P, Zhong Y, Lonardi S, Jiang T. Computing the assignment of orthologous genes via genome rearrangement. In Proc. the 3rd Asia Pacific Bioinformatics Conf. (APBC 2005), Singapore, Jan. 17-21, 2005, pp.363-378.
[9] Chen X, Zheng J, Fu Z, Nan P, Zhong Y, Lonardi S, Jiang T. The assignment of orthologous genes via genome rearrangement. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005, 2(4): 302-315.
[10] Fu Z, Chen X, Vacic V, Nan P, Zhong Y, Jiang T. A parsimony approach to genome-wide ortholog assignment. In Proc. the 10th Annual International Conference on Research in Computational Molecular Biology (RECOMB), Venice, Italy, April 2-5, 2006, pp.578-594.
[11] Fu Z, Chen X, Vacic V, Nan P, Zhong Y, Jiang T. MSOAR: A high-throughput ortholog assignment system based on genome rearrangement. Journal of Computational Biology, 2007, 14(9): 1160-1175.
[12] Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J. Cross-referencing eukaryotic genomes: TIGR orthologous gene alignments (TOGA). Genome Res., 2002, 12(3): 493-502.
[13] Li L, Stoeckert C, Roos D. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res., 2003, 13(9): 2178-2189.
[14] Yuan Y P, Eulenstein O, Vingron M, Bork P. Towards detection of orthologues in sequence databases. Bioinformatics, 1998, 14(3): 285-289.
[15] Storm C, Sonnhammer E. Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics, 2002, 18(1): 92-99.
[16] Cannon S B, Young N D. OrthoParaMap: Distinguishing orthologs from paralogs by integrating comparative genome data and gene phylogenies. BMC Bioinformatics, 2003, 4(1): 35.
[17] Zheng X H, Lu F, Wang Z, Zhong F, Hoover J, Mural R. Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics, 2005, 21(6): 703-710.
[18] Kuzniar A, van Ham R, Pongor S, Leunissen J. The quest for orthologs: Finding the corresponding gene across genomes. Trends in Genetics, 2008, 24(11): 539-550.
[19] El-Mabrouk N. Reconstructing an ancestral genome using minimum segments duplications and reversals. Journal of Computer and System Sciences, 2002, 65(3): 442-464.
[20] Marron M, Swenson K, Moret B. Genomic distances under deletions and insertions. Theoretical Computer Science, 2004, 325(3): 347-360.
[21] Swenson K, Marron M, Earnest-DeYoung J. Moret B. Approximating the true evolutionary distance between two genomes. In Proc. the 7th SIA Workshop on Algorithm Engineering & Experiments, Vancouver, Canada, Jan. 22, 2005, pp.121-125.
[22] Swenson K, Pattengale N, Moret B. A framework for orthology assignment from gene rearrangement data. In Proc. the 3rd RECOMB Workshop on Comparative Genomics (RECOMB-CG2005), Dublin, Ireland, Sept. 18-20, 2005, LNCS 3678, Springer, pp.153-166.
[23] Hannenhalli S, Pevzner P. Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals. J. ACM, 1999, 46(1): 1-27; extended abstract in Proc. ACM STOC, Las Vegas, USA, May 23-June 1, 1995, pp.178-189.
[24] Shi G, Zhang L, Jiang T. MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement. In Proc. the 8th LSS Computational Systems Bioinformatics Conference, Stanford, USA, August 10- 12, 2009, pp.12-24.
[25] Bairoch A, Apweiler R, Wu C H, Barker W C, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin M J, Natale D A, O’Donovan C, Redaschi N, Yeh L S. The Universal Protein Resource (UniProt). Nucleic Acids Res., 2005, 33(Database Issue): D154-D159.
[26] http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/hcop.pl.
[27] ftp://ftp.pantherdb.org/sequence classifications/.
[28] http://www.jax.org.
[29] M Ozery-Flato, Ron Shamir. Two notes on genome rearragnements. Journal of Bioinformatics and Computational Biology, 2003, 1(1): 71-94.
[30] Tesler G. Efficient algorithms for multichromosomal genome rearrangements. Journal of Computer and System Sciences, 2002, 65(3): 587-609.
[31] Hannenhalli S, Pevzner P A. Transforming men into mice (polynomial algorithm for genomic distance problem). In Proc. IEEE 36th Ann. Symp. Foundations of Comp. Sci. Milwaukee, USA, Oct. 23-25, 1995, pp.581-592.
[32] Christie D, Irving R. Sorting strings by reversals and by transpositions. SIAM J. Discrete Math., 2001, 14(2): 193-206.
[33] Kaplan H, Shamir R, Tarjan R. Faster and simpler algorithm for sorting signed permutations by reversals. In Proc. the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, USA, Jan. 5-7, 1997, pp.344-351.
[34] Bader D, Moret B, Yan M. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. Journal of Computational Biology, 2001, 8(5): 483-491.
[35] Radcliffe A, Scott A, Wilmer E. Reversals and transpositions over finite alphabets. SIAM J. Discrete Math., 2005, 19(1): 224-244.
[36] Caprara A. Sorting by reversals is difficult. In Proc. the First Annual International Conference on Computational Molecular Biology, Santa Fe, USA, Jan. 20-23, 1997, pp.75-83.
[37] Caprara A. Sorting permutations by reversals and Eulerian cycle decompositions. SIAM J. Discrete Math., 1999, 12(1): 91-110.
[38] Bafna V, Pevzner P. Genome rearrangements and sorting by reversals. SIAM J. Comput., 1996, 25(2): 272-289; extended abstract appeared in Proc. IEEE FOCS 1993, Palo Alto, USA, Nov. 3-5, 1993, pp.148-157.
[39] Kececioglu J, Sankoff D. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica, 1995, 13(1/2): 180-210.
[40] Lin G, Jiang T. A further improved approximation algorithm for breakpoint graph decomposition. Journal of Combinatorial Optimization, 2004, 8(2): 183-194.
[41] Gu Q, Peng S, Sudborough H. A 2-approximation algorithm for genome rearrangements by reversals and transpositions. Theoret. Comput. Sci., 1999, 210(2): 327-339.
[42] Hartman T, Sharon R. A 1.5-approximation algorithm for sorting by transpositions and transreversals. Journal of Computer and System Sciences, 2005, 70(3): 300-320.
[43] Bafna V, Pevzner P. Sorting by transpositions. SIAM J. Discrete Math., 1998, 11(2): 224-240.
[44] Goldstein A, Kolman P, Zheng J. Minimum common string partition problem: Hardness and approximations. In Proc. the 15th International Symposium on Algorithms and Computation, Hong Kong, China, Dec. 20-22, 2004, pp.484-495.
[45] Chrobak M, Kolman P, Sgall J. The greedy algorithm for the minimum common string partition problem. In Proc. the 7th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Cambridge, USA, Aug. 22-24, 2004, LNCS 3122, Springer, pp.84-95.
[46] Kolman P. Approximating aeversal distance for strings with bounded number of duplicates. In Proc. the 30th International Symposium on Mathematical Foundations of Computer Science, Gdansk, Poland, Aug. 29-Sept. 2, 2005, pp.580-590.
[47] Halldorsson M M. Approximating discrete collections via local improvements. In Proc. the Sixth Annual ACM-SIAM Symp. Discrete Algorithms, San Francisco, USA, Jan. 22-24, 1995, pp.160-169.
[48] Kolman P, Walen T. Reversal distance for strings with duplicates: Linear time approximation using hitting set. In Proc. the 4th Workshop on Approximation and Online Algorithms, Zurich, Switzerland, Sept. 14-15, 2006, pp.279-289.
[49] Bourque G, Pevzner P. Genome-scale evolution: Reconstructing gene orders in the ancestral species. Genome Research, 2002, 12(1): 26-36.
[50] Sankoff D, Blanchette M.Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology, 1998, 5(3): 555-570.
[51] Fu Z, Jiang T. Clustering of main orthologs for multiple genomes. Journal of Bioinformatics and Computational Biology, 2008, 6(3): 573-584.
[52] Wang L, Jiang T, Lawler E. Approximation algorithms for tree alignment with a given phylogeny. Algorithmica, 1996, 16(3): 302-315.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] Min Yinghua; Yashwant K. Malaiya; Jin Boping;. Aliasing Errors in Parallel Signature Analyzers[J]. , 1990, 5(1): 24 -40 .
[2] Ruo-Chen Liu, Li-Cheng Jiao, and Hai-Feng Du. Clonal Strategy Algorithm Based on the Immune Memory[J]. , 2005, 20(5): 728 -734 .
[3] En-Jian Bai and Xiao-Juan Liu. Some Notes on Prime-Square Sequences[J]. , 2007, 22(3): 481 -486 .
[4] Xiao-Yong Fang, Zhi-Gang Luo, and Zheng-Hua Wang. Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis[J]. , 2008, 23(4 ): 582 -589 .
[5] KangWoo Lee. Guiding Attention by Cooperative Cues[J]. , 2008, 23(5 ): 874 -884 .
[6] Yu-Ling Hsueh, Roger Zimmermann, Member, ACM, Senior Member, IEEE, and Wei-Shinn Ku, Member, ACM, IEEE. Efficient Location Updates for Continuous Queries over Moving Objects[J]. , 2010, 25(3): 415 -430 .
[7] Ming-Wei Zhang (张明卫), Member, CCF, Bin Zhang (张斌), Senior Member, CCF, Ying Liu (刘莹), Jun Na (那俊) and Zhi-Liang Zhu (朱志良), Senior Member, CCF. Web Service Composition Based on QoS Rules[J]. , 2010, 25(6): 1143 -1156 .
[8] Wei-Feng Pan (潘伟丰), Member, CCF, Bing Li (李兵), Senior Member, CCF, Yu-Tao Ma (马于涛), Member, CCF, ACM, Ye-Yi Qin (覃叶宜) and Xiao-Yan Zhou (周晓燕). Measuring Structural Quality of Object-Oriented Softwares via Bug Propagation Analysis on Weighted Software Networks[J]. , 2010, 25(6): 1202 -1213 .
[9] Mahshid Rahnamay-Naeini, and Masoud Sabaei. A Combinational Perspective in Stimulating Cooperation in Mobile Ad Hoc Networks[J]. , 2011, 26(2): 256 -268 .
[10] Wei Jiang (姜伟), Tian Wu (吴甜), Song-Lin Hu (虎嵩林), Senior Member, CCF, and Zhi-Yong Liu (刘志勇), Senior Member, CCF. QoS-Aware Automatic Service Composition: A Graph View[J]. , 2011, 26(5): 837 -853 .

ISSN 1000-9000(Print)

         1860-4749(Online)
CN 11-2296/TP

Home
Editorial Board
Author Guidelines
Subscription
Journal of Computer Science and Technology
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
Tel.:86-10-62610746
E-mail: jcst@ict.ac.cn
 
  Copyright ©2015 JCST, All Rights Reserved