Journal of Computer Science and Technology  2010, 25(1) 26-34 DOI:     ISSN: 1000-9000 CN: CN 11-2296/TP

Chinese title: Exploring the Mysteries of Genome Methylation
Chinese-language summary

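DNA methylation is a chemical modification of the bases in a genome and belongs to the class of epigenetic modifications. This modification, which occurs mainly at CpG islands in the genome, has several biological functions, such as regulating gene expression, silencing genes, and keeping the genome stable against the external environment. Current research shows that DNA methylation is associated with many diseases and that abnormal methylation can lead to tumor formation. Given the great biological significance of methylation, the Human Epigenome Consortium formally announced in 2003 that it would fund and carry out the Human Epigenome Project (HEP), whose goal is to identify, catalog, and interpret the genome-wide DNA methylation patterns of all genes in the major human tissues. In 2009, the first human epigenome map was completed, including RNA transcription information, histone modification information, and other content.
As sequencing technology continues to advance, more and more methylation data are being produced. How to obtain methylation data, and how bioinformaticians should analyze such large volumes of data, have become pressing questions for the field.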
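At present there are several methods for obtaining genome-wide methylation data. Most of them contrast the properties of methylated and unmethylated DNA in order to identify methylation sites. For example, the methylation-sensitive restriction enzyme method exploits the fact that the enzyme does not cut methylated regions: the DNA is digested into fragments of different sizes, which are then analyzed. Immunoprecipitation combines antibodies that bind methylated regions with sequencing and related technologies to identify methylated regions. The most commonly used approach today is bisulfite treatment, which deaminates the unmethylated cytosines in DNA to uracil while leaving methylated cytosines unchanged; PCR amplification of the fragments of interest then converts every uracil to thymine. Finally, the PCR products are sequenced and compared with the untreated sequence to determine whether each CpG site is methylated. This is currently the only method that resolves methylation at individual CpG sites; the other two methods can only identify methylated regions.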
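The bisulfite logic described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the authors' pipeline: the read is taken from the top strand, is already aligned to the reference without gaps, and conversion and sequencing are error-free. The function names `bisulfite_convert` and `call_cpg_methylation` and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence expected after bisulfite treatment and PCR.

    Unmethylated C is read as T (C -> U -> T); methylated C stays C.
    Assumes complete conversion and a top-strand read (illustrative only).
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_cpg_methylation(reference, read):
    """Compare a converted read with the untreated reference at each CpG.

    Returns a dict mapping the 0-based position of each CpG cytosine
    to "methylated", "unmethylated", or "ambiguous".
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = "methylated"      # C survived treatment
            elif read[i] == "T":
                calls[i] = "unmethylated"    # C was converted
            else:
                calls[i] = "ambiguous"       # mismatch unrelated to conversion
    return calls


reference = "ACGTTCGACCGA"           # CpG cytosines at positions 1, 5, 9
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)
print(call_cpg_methylation(reference, read))
```

In this example only the cytosine at position 1 is protected, so it is the only CpG called methylated; the non-CpG cytosine at position 8 is converted as well but is not reported, because the sketch only inspects CpG sites.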
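How to analyze methylation data obtained by bisulfite sequencing is the next problem bioinformaticians must solve. First, the RMAPBS algorithm developed by Michael Q. Zhang and colleagues is applied to analyze the fragments obtained from bisulfite treatment. Because sequencing errors, incomplete bisulfite conversion, and similar effects introduce substantial noise, sophisticated statistical models are then needed to determine methylation states accurately. To understand the biological meaning of methylation further, one must analyze which factors cause methylation to differ between regions, which requires accurate computational methods for identifying regions that are methylated differently in two datasets. In addition, samples are heterogeneous; tumor tissue, for example, is composed of different cell types, and this makes methylation analysis harder still. Suitable computational methods for analyzing heterogeneous samples composed of different cell types provide an effective means of identifying methylation accurately. Integrating methylation, as epigenetic information, into the construction of regulatory networks can deepen our understanding of the complex molecular mechanisms of regulation in organisms. Finally, much like the study of species phylogenies, the study of cell developmental trees has attracted the attention of many computational biologists. The role methylation plays in cell developmental trees has become a powerful tool for studying stem cell differentiation and models of clonal evolution in tumors. Together, these computational analyses of methylation data provide an important route to revealing the true biological significance of methylation.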
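Two of the analysis steps mentioned above, estimating the methylation state at a site from noisy read counts and comparing a site between two datasets, can be sketched with textbook statistics. This is a minimal illustration and not the statistical models the article refers to: it uses a Wilson score interval and a two-sided Fisher exact test as stand-ins, ignores incomplete conversion and sequencing error, and all function names and counts are hypothetical.

```python
import math


def methylation_level(c_reads, t_reads, z=1.96):
    """Estimate the methylated fraction at one CpG site.

    c_reads: reads showing C (methylated); t_reads: reads showing T.
    Returns (point estimate, lower bound, upper bound) using a
    Wilson score interval (z=1.96 gives roughly 95% coverage).
    """
    n = c_reads + t_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    p = c_reads / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two samples; columns are methylated and unmethylated
    read counts at the same site.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x methylated reads in sample 1.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: sample A has 8 C / 2 T reads, sample B has 1 C / 9 T.
level, low, high = methylation_level(8, 2)
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(f"sample A level {level:.2f} (interval {low:.2f} to {high:.2f})")
print(f"differential methylation p-value {p_value:.4f}")
```

With only ten reads the interval for sample A is wide, which is one reason low-coverage sites are hard to call; a per-site test like this would also need multiple-testing correction before being applied genome-wide.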

Challenges in Understanding Genome-Wide DNA Methylation

Michael Q. Zhang1,2 (张奇伟) and Andrew D. Smith3, Member, ACM

1Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, U.S.A.
2Bioinformatics Division, TNLIST and MOE Key Lab of Bioinformatics, Tsinghua University, Beijing 100084, China
3Department of Biological Sciences, University of Southern California, Los Angeles, California, U.S.A.

Abstract:

DNA methylation is a chemical modification of the bases in genomes. This modification, most frequently found at CpG dinucleotides in eukaryotes, has been identified as having multiple critical functions in a broad and diverse range of animal and plant species, yet it mysteriously appears to be lacking in several other well-studied species. DNA methylation has well-known and important roles in genome stability and defense, and changes in its pattern correlate strongly with gene regulation. Much evidence has linked abnormal DNA methylation to human diseases. Most prominently, aberrant DNA methylation is a common feature of cancer genomes. Elucidating the precise functions of DNA methylation therefore has great biomedical significance. Here we provide an update on large-scale experimental technologies for detecting DNA methylation on a genomic scale. We also discuss new prospects and challenges that computational biologists will face when analyzing DNA methylation data.

Keywords: DNA methylation; epigenome; computational epigenomics
Received: 2009-11-19  Revised: 2009-11-24  Published:
DOI:
Funding:

This work is supported by NIH under Grant Nos. ES017166 and HG001696.

About the authors:
Michael Q. Zhang obtained the B.S. degree in mechanical engineering from the University of Science and Technology of China in 1981 and the Ph.D. degree in physics from Rutgers University in 1987. He studied statistical mechanics and integrable systems as a postdoctoral fellow at the Courant Institute of Mathematical Sciences, NYU, for three years and then moved to Cold Spring Harbor Laboratory, where he has worked for twenty years. He is now a professor at the Watson School of Biological Sciences at Cold Spring Harbor Laboratory in New York. He has been an adjunct professor at Stony Brook University since 1997 and a guest professor at Tsinghua University in Beijing, China since 2003. He has been associated with the editorial boards of Nucleic Acids Research, Bioinformatics, BMC Journals, etc. and has served as chairman/section chair or program committee member for CSHL Meetings, ISMB, RECOMB, APBC, etc. Dr. Zhang is one of the pioneers in human genome research and has made important contributions to computational genomics and epigenomics.
Andrew D. Smith received the B.A. degree in psychology and the B.C.S. degree (Bachelor of Computer Science) in 2000 and the Ph.D. degree in computer science in 2004 from the University of New Brunswick. Dr. Smith studied computational biology and genomics at Cold Spring Harbor Laboratory until 2008, at which time he moved to the University of Southern California, where he is currently an assistant professor of biological sciences.


Copyright 2008 by Journal of Computer Science and Technology