Journal of Computer Science and Technology  2010, 25(1) 10-25 DOI:     ISSN: 1000-9000 CN: CN 11-2296/TP

本期目录 | 过刊浏览 | 高级检索                                                            [打印本页]   [关闭]
扩展功能
本文信息
Supporting info
PDF(661KB)
服务与反馈
把本文推荐给朋友
加入我的书架
加入引用管理器
引用本文
Email Alert
文章反馈
浏览反馈信息
本文关键词相关文章
chromosome rearrangement
comparative genomics
gene clusters
phylogenomics
本文作者相关文章
David Sankoff
Chunfang Zheng
Adriana Munoz
Zhenyu Yang
Zaky Adam
Robert Warren
Vicky Choi
Qian Zhu
PubMed
Article by David Sankoff
Article by Chunfang Zheng
Article by Adriana Munoz
Article by Zhenyu Yang
Article by Zaky Adam
Article by Robert Warren
Article by Vicky Choi
Article by Qian Zhu
中文题目: 基因顺序的进化过程重建中的问题
中文导读

根据染色体重排可以重建物种进化历史。本文是一篇对于相关研究的综述。染色体重排是指基因从所在的染色体的正常位上易位至染色体的另一个位置上,或移动到另一条染色体上。染色体重排对物种分化来说是一个稳定、长存的影响因素,亲缘关系较近的物种之间染色体重排较少。染色体重排会导致基因在染色体上顺序的变化,因此可以根据基因顺序的相似程度来重建进化树。

在比较基因组时,基因序列通常以符号排列(Signed permutations)表示。常见的染色体重排有:倒位、转座、相互易位、染色体分裂和融合。倒位会打破两个相邻点(两个断点),转座会形成三个断点,易位会形成二个断点,而分裂会打破一个相邻点。从一个基因组变到另一个基因组所需的五个基本染色体重排操作的次数称为基因组距离。除此之外,还有DCJ距离,被定为DCJ操作的次数。一次DCJ操作会带来一个相邻点与另一个相邻点或端粒的一次重组。倒位、相互易位、染色体分裂和融合都可以通过一次DCJ操作来实现,而转座需要两次DCJ操作。DCJ距离可以通过定义一个二部图,并寻找其所含的环和奇数长路径的个数来计算。

系统发生树是一种常用的物种进化关系表示图。构建系统发生树的一个基因问题就是推断祖先基因组。通过现有物种染色体上的基因顺序可以推断它们祖先的基因顺序。常用的假设是祖先基因组应该离所有子代基因组的距离和最小。

许多物种的基因组在进化过程中发生过全基因组复制,而基因组二分(genome halving)就是要根据当前基因组推断复制前的基因组。基因组二分常会遇到多重解的情况,这时需要以另一个单拷贝基因组为模板来从多个解中挑选出一个最好的,这个过程被称为有指导的二分(guided halving)。除了二倍,基因组二分算法还可扩展到三倍、四倍甚至更多倍复制的情况。

以上问题的求解一般要求预知全基因组基因的顺序,但很多物种还目前还没有全基因组序列,只有一些测序的片断(contig)。这些片断中含有部分基因的局部顺序关系。由这些片断可以构建一个有向无环图,但有向无环图的比较却很难。另一个简单的策略就是直接把每个片断当作一个完整的染色体。

由于同源基因鉴定的错误,分析之前还需从片断库中选出一个兼容的片断集以最大地包含原有数据中的信息—MSR问题。这个问题可以通过转换成兼容图(compatible graph)中的最大权完全子图问题,或者矛盾图(conflict graph)中的最大权独立集问题。有时由于两个物种的基因顺序相差很大,还需要把一些相近的基因聚在一起考虑,文章回顾了基因聚类的一些问题。

Issues in the Reconstruction of Gene Order Evolution

David Sankoff1, Chunfang Zheng2,3 (郑春芳), Adriana Muñoz4, Zhenyu Yang1 (杨振宇), Zaky Adam4,5, Robert Warren4, Vicky Choi6 (蔡维仁), and Qian Zhu7,8 (祝谦)

1Department of Mathematics and Statistics, University of Ottawa, Ottawa, K1N 6N5, Canada
2Department of Biology, University of Ottawa, Ottawa, K1N 6N5, Canada
3D'e}partement d'informatique et de recherche op'e}rationnelle, Universit'e} de Montr'e}al, H3C 3J7, Canada
4School of Information Technology and Engineering, University of Ottawa, Ottawa, K1N 6N5, Canada
5Biology Department, The Pennsylvania State University, University Park, PA 16802, U.S.A.
6Department of Computer Science, Virginia Polytechnic Institute, Falls Church, VA 22043, U.S.A.
7Department of Biochemistry, University of Ottawa, Ottawa, K1N 6N5, Canada
8Department of Computer Science, Princeton University, Princeton, NJ 08544, U.S.A.

Abstract:

As genomes evolve over hundreds of millions years, the chromosomes become rearranged, with segments of some chromosomes inverted, while other chromosomes reciprocally exchange chunks from their ends. These rearrangements lead to the scrambling of the elements of one genome with respect to another descended from a common ancestor. Multidisciplinary work undertakes to mathematically model these processes and to develop statistical analyses and mathematical algorithms to understand the scrambling in the chromosomes of two or more related genomes. A major focus is the reconstruction of the gene order of the ancestral genomes.

Keywords: chromosome rearrangement    comparative genomics    gene clusters    phylogenomics  
收稿日期 2009-09-01 修回日期 2009-12-14 出版日期  
DOI:
基金项目:

This work was supported in part by grants and fellowships from the Natural Science and Engineering Council of Canada (NSERC).

作者简介:
David Sankoff holds the Canada Research Chair in Mathematical Genomics in the Mathematics and Statistics Department at the University of Ottawa, and is cross-appointed to the Biology Department and the School of Information Technology and Engineering. His research interest is comparative genomics, particularly probabi-lity models, statistics and algorithms for genome rearrangements.
Chunfang Zheng studied biology at Beijing Sports University and computer science at the University of Ottawa, where she then obtained her Master's and Ph.D. degrees in biology. She is a postdoctoral fellow with Nadia El-Mabrouk at the Universit;é|de Montr;éal. She has worked on the comparison of partially ordered and noisy genomes and the incorporation of whole genome duplication descendants into gene order phylogeny.
Adriana Muñoz holds a Master's degree from the University of Alberta and is currently a computer science Ph.D. candidate interested in the comparison of incompletely assembled genomes.
Zhenyu Yang holds a Master's degree in computer science from Liaoning University in China and is currently a mathematics Ph.D. candidate interested in gene clusters in comparative genomics.
Zaky Adam holds a Master's degree from the University of Western Ontario and completed his computer science Ph.D. at the University of Ottawa on gene order phylogeny. He is currently a postdoctoral fellow with Kateryna Makova at Penn State University.
Robert Warren holds a Master's degree from the University of British Columbia and is currently a computer science Ph.D. candidate interested in algorithms for genome halving, genome aliquoting and related problems.
Vicky Choi did her undergraduate studies at the Chinese University of Hong Kong and her masters at the Hong Kong University of Science and Technology. She was awarded her Ph.D. degree in computer science from Rutgers University. She has been assistant professor at Virginia Tech since 2004, except for a sabbatical leave working at a quantum computing enterprise. Her main research areas are the design, implementation, and analysis of algorithms and quantum computing.
Qian Zhu did his undergraduate studies in the biochemistry program at the University of Ottawa. He is currently a graduate student in computer science at Princeton University.

参考文献:

[1] Venter J C, Adams M D, Myers E W, Li PW, Mural R J, Sutton G G et al. The sequence of the human genome. Science, 2001, 291(5507): 1304-1351.
[2] Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature, 2002, 420(6915): 520-562.
[3] Sankoff D, Leduc G, Antoine N, Paquin B, Lang B F, Cedergren R. Gene order comparisons for phylogenetic inference: Evolution of the mitochondrial genome. Proc. the National Academy of Sciences USA, 1992, 89(14): 6575-6579.
[4] Sankoff D. Edit distance for genome comparison based on nonlocal operations. In Proc. the Third Annual Symposium on Combinatorial Pattern Matching (CPM1992), Tucson, USA, April 29-May 1, 1992, pp.121-135.
[5] Cosner M E, Jansen R K, Moret B M E, Raubeson L A, Wang L-S, Warnow T, Wyman S. An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae. Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, Sankoff D, Nadeau J (eds.), Dordrecht: Kluwer Academic Publishers, 2000, pp.99-121.
[6] Ajana Y, Lefebvre J-F, Tillier E R M, El-Mabrouk N. Exploring the set of all minimal sequences of reversals — An application to test the replication-directed reversal hypothesis. In Proc. the Second International Workshop on Algorithms in Bioinformatics (WABI 2002), Rome, Italy, Sept. 17-21, 2002, pp.300-315.
[7] Hannenhalli S, Pevzner P A. Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals. Journal of the ACM, 1999, 46(1): 1-27.
[8] Caprara A. Sorting permutations by reversals and Eulerian cycle decompositions. SIAM Journal on Discrete Mathematics, 1999, 12(1): 91-110.
[9] Watterson G A, Ewens W J, Hall T E, Morgan A. The chromosome inversion problem. Journal of Theoretical Biology, 1982, 99(1): 1-7.
[10] Sankoff D. Mechanisms of genome evolution: Models and inference. Bulletin of the International Statistical Institute, 1989, 47(3): 461-475.
[11] Sturtevant A H, Novitski E. The homologies of the chromosome elements in the genus Drosophila. Genetics, 1941, 26(5): 517-541.
[12] Hannenhalli S, Pevzner P A. Transforming men into mice (polynomial algorithm for genomic distance problem). In Proc. the Thirty-Sixth Annual Symposium on Foundations of Computer Science (FOCS 1995), Milwaukee, USA, Oct. 23- 25, 1995, pp.581-592.
[13] Tesler G. Efficient algorithms for multichromosomal genome rearrangements. Journal of Computer and System Sciences, 2002, 65(3): 587-609.
[14] Sankoff D, Blanchette M. Multiple genome rearrangement and breakpoint phylogeny. Journal of Computational Biology, 1998, 5(3): 555-570.
[15] El-Mabrouk N, Bryant D, Sankoff D. Reconstructing the pre-doubling genome. In Proc. the Third Annual International Conference on Computational Molecular Biology (RECOMB), Lyon, France, April 11-14, 1999, pp.154-163.
[16] El-Mabrouk N, Sankoff D. The reconstruction of doubled genomes. SIAM Journal on Computing, 2003, 32(2): 754- 792.
[17] Pevzner P, Tesler G. Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proceedings of the National Academy of Sciences USA, 2003, 100(13): 7672-7677.
[18] Kent W J, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution’s cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences USA, 2003, 100(20): 11484- 11489.
[19] Kent W J, Sugnet C W, Furey T S, Roskin K M, Pringle T H, Zahler A M, Haussler A D. The Human genome browser at UCSC. Genome Research, 2002, 12(6): 996-1006.
[20] Mazowita M, Haque L, Sankoff D. Stability of rearrangement measures in the comparison of genome sequences. Journal of Computational Biology, 2006, 13(2): 554-566.
[21] Sinha A U, Meller J. Sensitivity analysis for reversal distance and breakpoint reuse in genome rearrangements. Pacific Symposium on Biocomputing, 2008, 13: 37-38.
[22] Jiang T. Some algorithmic challenges in genome-wide ortholog assignment. J. Comput. Sci. & Technol., 2010, 25(1): 42-52.
[23] Tannier E, Zheng C, Sankoff D. Multichromosomal median and halving problems under different genomic distances. BMC Bioinformatics, 2009, 10: 120.
[24] Fertin G, Labarre A, Rusu I, Tannier E, Vialette S. Combinatorics of Genome Rearrangements. Cambridge, Massachusetts: The MIT Press, 2009.
[25] Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics, 2005, 21(16): 3340-3346,
[26] Bergeron A, Mixtacki J, Stoye J. A unifying view of genome rearrangements. In Proc. the Sixth International Workshop on Algorithms in Bioinformatics (WABI 2000), Zurich, Switzerland, Sept. 11-13, 2006, pp.163-173.
[27] Dalevi D, Eriksen N. Expected gene-order distances and model selection in bacteria. Bioinformatics, 2008, 24(11): 1332-1338.
[28] Eriksen N, Hultman A. Estimating the expected reversal distance after a fixed number of reversals. Advances in Applied Mathematics, 2004, 32(3): 439-453.
[29] Wang L-S, Warnow T. Distance-Based Genome Rearrangement Phylogeny. J. Mol. Wvol., 2006, 63(4): 473-483.
[30] Mu˜noz A, Sankoff D. Rearrangement phylogeny of genomes in contig form. In Proc. the Fifth International Symposium on Bioinformatics Research and Applications (ISBRA2009), Fort Lauderdale, USA, May 13-16, 2009, pp.160-172.
[31] Adam Z, Turmel M, Lemieux C, Sankoff D. Common intervals and symmetric difference in a model-free phylogenomics, with an application to streptophyte evolution. Journal of Computational Biology, 2007, 14(4): 436-445.
[32] Zhu Q, Adam Z, Choi V, Sankoff D. Generalized gene adjacencies, graph bandwidth, and clusters in yeast evolution. Transactions on Computational Biology and Bioinformatics, 2009, 6(2): 213-220.
[33] Tannier E. Yeast ancestral genome reconstructions: The possibilities of computational methods. In Proc. the 7th Ann. RECOMB Satellite Workshop on Comparative Genomics (RECOMB CG2009), Budapest, Hungary, Sept. 27- 29, 2009, pp.1-12.
[34] Sankoff D, Blanchette M. The median problem for breakpoints in comparative genomics. In Proc. the Third Annual International Conference on Computing and Combinatorics (COCOON1997), Shanghai, China, Aug. 20-22, 1997, pp.251-263.
[35] Bader D, Moret B M. GRAPPA runs in record time. HPCwire. November 23, 2000, 9(47).
[36] Siepel A C. Exact algorithms for the reversal median problem
[Master’s Thesis]. University of New Mexico, 2001.
[37] Caprara A. On the practical solution of the reversal median problem. In Proc. the First International Workshop on Algorithms in Bioinformatics (WABI 2001), Aarhus, Denmark, Aug. 28-31, 2001, pp.238-251.
[38] Bourque G, Pevzner P A. Genome-scale evolution: Reconstructing gene orders in the ancestral species. Genome Research, 2002, 12(1): 26-36.
[39] Xu A W. A fast and exact algorithm for the median of three problem — A graph decomposition approach. In Proc. Sixth Annual RECOMB Satellite Workshop Comparative Genomics (RECOMB CG2008), Paris, France, Oct. 13-15, 2008, pp.184-197.
[40] Xu A W, Sankoff D. Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem. In Proc. the Eighth International Workshop on Algorithms in Bioinformatics (WABI 2008), Karlsruhe, Germany, Sept. 15- 17, 2008, pp.25-37.
[41] Adam Z, Sankoff D. A statistically fair comparison of ancestral genome reconstructions, based on breakpoint and rearrangement distances. In Proc. the Seventh Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB CG2009), Budapest, Hungary, Sept. 16-18, 2009, pp.193-204.
[42] Adam Z, Sankoff D. The ABCs of MGR with DCJ. Evolutionary Bioinformatics Online, 2008, 4: 69-74.
[43] Warren R, Sankoff D. Genome halving with double cut and join. Journal of Bioinformatics and Computational Biology, 2009, 7(2): 357-371.
[44] Mixtacki J. Genome halving under DCJ revisited. In Proc. the Fourteenth Annual Conference on Computing and Combinatorics (COCOON), Dalian, China, June 27-29, 2008, pp.276-286.
[45] Blin G, Chauve C, Fertin G, Rizzi R, Vialette S. Comparing genomes with duplications: A computational complexity point of view. Transactions on Computational Biology and Bioinformatics, 2007, 4(4): 523-534.
[46] Zheng C, Zhu Q, Sankoff D. Genome halving with an outgroup. Evolutionary Bioinformatics, 2006, 2(13): 319-326.
[47] Sankoff D, Zheng C, Zhu Q. Polyploids, genome halving and phylogeny. Bioinformatics, 2007, 23(13): i433-i439.
[48] Zheng C, Zhu Q, Adam Z, Sankoff D. Guided genome halving: Hardness, heuristics and the history of the Hemiascomycetes. Bioinformatics, 2008, 24(13): i96-i104.
[49] Sankoff D, Zheng C, Wall P K, dePamphilis C, Leebens-Mack J, Albert V A. Towards improved reconstruction of ancestral gene order in angiosperm phylogeny. Journal of Computational Biology, 2009, 16(10): 1353-1367.
[50] Warren R, Sankoff D. Genome aliquoting with double cut and join. BMC Bioinformatics, 2009, 10: S2.
[51] Zheng C, Lenert A, Sankoff D. Reversal distance for partially ordered genomes. Bioinformatics, 2005, 21(Suppl. 1): i502- i508.
[52] Zheng C, Sankoff D. Genome rearrangements with partially ordered chromosomes. Journal of Combinatorial Optimization, 2006, 11(2): 133-144.
[53] Blin G, Blais E, Hermelin D, Guillon P, Blanchette M, El- Mabrouk N. Gene maps linearization using genomic rearrangement distances. Journal of Computational Biology, 2007, 14(4): 394-407.
[54] Chen X, Cui Y. An approximation algorithm for the minimum breakpoint linearization problem. Transactions on Computational Biology and Bioinformatics, 2009, 6(3): 401-409.
[55] Gaul E, Blanchette M. Ordering partially assembled genomes using gene arrangements. In Proc. the Fourth Annual Workshop on Comparative Genomics (RECOMB CG2006), Montreal, Canada, Sept. 24-26, 2006, pp.113-128.
[56] Bhutkar A, Russo S, Smith T F, Gelbart W M. Techniques for multi-genome synteny analysis to overcome assembly limitations. Genome Informatics, 2006, 17(2): 152-161.
[57] Zheng C, Zhu Q, Sankoff D. Removing noise and ambiguities from comparative maps in rearrangement analysis. Transactions on Computational Biology and Bioinformatics, 2007, 4(4): 515-522.
[58] Choi V, Zheng C, Zhu Q, Sankoff D. Algorithms for the extraction of synteny blocks from comparative maps. In Proc. the Seventh International Workshop on Algorithms in Bioinformatics (WABI 2007), Philadelphia, USA, Sept. 8-9, 2007, pp.277-288.
[59] Ostergard P R J. A new algorithm for the maximum-weight clique problem. Nordic Journal of Computing, 2001, 8(4): 424-436.
[60] Kumlander D. A new exact algorithm for the maximumweight clique problem based on a heuristic vertex-coloring and a backtrack search. In The Fourth European Congress of Mathematics, Stockholm, Sweden, June 27-July 2, 2004, MS. and Poster.
[61] Bulteau L, Fertin G, Rusu I. Maximal strip recovery problem with gaps: Hardness and approximation algorithms. In Proc. the 20th Int. Symp. Algorithms and Computation (ISAAC 2009), Hawaii, USA, Dec. 16-18, 2009, pp.710-719.
[62] Chen Z, Fu B, Jiang M, Zhu B. On recovering syntenic blocks from comparative maps. In Proc. the Second Annual Int. Conf. Combinatorial Optimization and Applications (COCOA2008). St. John’s, Canada, Aug. 21-24, 2008, pp.319-327.
[63] Jiang M. Inapproximability of maximal strip recovery. In Proc. the 20th Int. Symp. Algorithms and Computation (ISAAC 2009), Hawaii, USA, Dec. 16-18, 2009, pp.616-625.
[64] Wang L, Zhu B. On the tractability of maximal strip recovery. In Proc. the Sixth Annual Conf. Theory and Applications of Models of Computation (TAMC2009), Changsha, China, May 18-22, 2009, pp.400-409.
[65] Hoberman R, Durand D. The incompatible desiderata of gene cluster properties. In Proc. the Fifth Annual Workshop on Comparative Genomics (RECOMB CG2005), Dublin, Ireland, Sept. 18-20, 2005, pp.73-87.
[66] Sankoff D, Haque L. Power boosts for cluster tests. In Proc. the Fifth Annual Workshop on Comparative Genomics (RECOMB CG), Dublin, Ireland, Sept. 18-20, 2005. pp.121- 130.
[67] Xu X, Sankoff D. Tests for gene clusters satisfying the generalized adjacency criterion. In Proc. the Third Brazilian Symposium on Bioinformatics, Advances in Bioinformatics and Computational Biology (BSB 2008), Santo Andre, Brazil, Aug. 28-30, 2008, pp.152-160.
[68] Yang Z, Sankoff D. Natural parameter values for generalized gene adjacency. In Proc. the Seventh Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB CG), San Diego, USA, Sept. 16-18, 2009, pp.13-23.
[69] Hoberman R, Sankoff D, Durand D. The statistical analysis of spatially clustered genes under the maximum gap criterion. Journal of Computational Biology, 2005, 12(8): 1083-1102.
[70] Erd¨os P, R′enyi A. On random graphs. Publicationes Mathematicae, 1959, 6: 290-297.
[71] Erd¨os P, R′enyi A. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 1960, 5: 17-61.
[72] Erd¨os P, R′enyi A. On the strength of connectedness of a random graph. Acta Mathematica Scientia Hungary, 1961, 12: 261-267.
[73] D’Souza R, Achlioptas D, Spencer J. Explosive percolation in random networks. Science, 2009, 323(5920): 1453-1455.

文章评论

Copyright 2008 by Journal of Computer Science and Technology