Journal of Computer Science and Technology  2010, 25(1) 95-106 DOI:     ISSN: 1000-9000 CN: CN 11-2296/TP

Chinese title (translated): Can a Protein Structure Be Determined Quickly?
Extended summary (translated from the Chinese)

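Given a protein of interest, can we determine its three-dimensional structure accurately and quickly, say within a week? The author argues that this is achievable if existing experimental techniques are complemented by newly developed computational techniques. Each of the currently available options, whether X-ray crystallography, NMR spectroscopy, or computational structure prediction, has its shortcomings. This article re-examines these techniques from a computer scientist's perspective and identifies the distinctive contributions computer scientists can make toward this ambitious goal.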
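The importance of protein structure determination is self-evident: it is a key step in understanding the path from gene to function. The U.S. Department of Energy (DOE) and the National Institutes of Health (NIH) have launched functional genomics projects that plan to determine the structures of a set of representative proteins experimentally (for example, by NMR and X-ray crystallography) and to rely on computationally predicted structures for the remaining proteins. This plan has a potential problem: a biologist studying the function of a particular protein is often not satisfied with a computationally predicted structure, yet high-accuracy experimental structure determination typically takes about half a year. A "quick" structure determination technology, one that combines experimental and computational techniques to determine the structure of a protein of interest within a week, would therefore be of great practical value.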
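To a computer scientist, the protein structure determination problem may appear to involve too many technical details, but that is precisely the point we wish to emphasize: rather than solving a cleanly formalized problem, such as predicting structure from sequence alone, it is better to fully exploit the constraints that domain knowledge provides. From this viewpoint, we re-examine the existing structure determination techniques below and list the open problems in each.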
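1. Protein structure prediction
Computational methods for protein structure prediction fall roughly into two classes: template-based methods, such as FB5-HMM, PROSPECT, ROSETTA, RAPTOR, and MUFOLD, and fragment-based methods, such as ROSETTA and FALCON. ROSETTA predicts candidate local structures for each fragment of length 9 and then assembles them into a global structure, whereas FALCON uses these local structures to train a position-specific hidden Markov model (HMM) and then samples global structures from that HMM. TASSER extracts local structural fragments of varying lengths from threading results and assembles them.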
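The idea of sampling a global structure from a position-specific model can be illustrated with a toy sketch. The code below is not FALCON's model: it uses a plain position-specific Markov chain over two hypothetical local states (helix-like and strand-like backbone angles), and every name and probability in it is an assumption made for illustration.

```python
import random

# Hypothetical local states, given as (phi, psi) backbone angles in degrees.
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)


def sample_path(start_probs, transitions, rng):
    """Sample one state index per position from a position-specific Markov chain.

    start_probs: probabilities over the states at position 0.
    transitions: transitions[i][a][b] is the probability of state b at
        position i + 1 given state a at position i.
    """
    states = list(range(len(start_probs)))
    path = [rng.choices(states, weights=start_probs)[0]]
    for matrix in transitions:
        row = matrix[path[-1]]  # transition row for the current state
        path.append(rng.choices(states, weights=row)[0])
    return path


def sample_conformation(start_probs, transitions, angles, seed=0):
    """Map a sampled state path to a list of (phi, psi) angle pairs."""
    rng = random.Random(seed)
    return [angles[s] for s in sample_path(start_probs, transitions, rng)]


# Illustrative numbers only: the same "sticky" transition matrix at every position.
transitions = [[[0.9, 0.1], [0.2, 0.8]]] * 4
print(sample_conformation([0.5, 0.5], transitions, [HELIX, STRAND], seed=1))
```

Repeated sampling with different seeds yields an ensemble of candidate backbones; a real method would score such candidates with an energy function and retrain the model on the better ones.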
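Although protein structure prediction has made remarkable progress in recent years, structures labeled as "predicted" are still regarded with suspicion: can prediction methods consistently produce high-accuracy structures? How should a confidence score be assigned to a prediction?
A careful analysis of FALCON's performance shows that it can converge to a high-accuracy structure after multiple iterations. The bottleneck for further improving accuracy lies not in the prediction method but in designing a more accurate energy function. From a computational point of view, as the number of known structures keeps growing, designing an accurate energy function with statistical techniques is feasible.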

2. Protein structure determination based on NMR

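Determining a protein structure by NMR usually takes a long time:
Step 1: protein sample preparation, which takes about 5 days;
Step 2: NMR experiments to generate the spectra; each spectrum takes about 1 to 2 days, and multiple spectra can be acquired in parallel;
Step 3: spectral analysis to compute chemical shifts, estimate inter-residue distances, and finally calculate the structure. Spectral analysis is currently still done manually or semi-automatically, so this step takes about 20 to 270 days.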
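A quick back-of-the-envelope calculation shows how strongly the analysis step dominates the timeline. The sketch below uses only the durations quoted in the three steps and assumes, for simplicity, that all spectra are acquired fully in parallel, so acquisition is counted once.

```python
# Durations in days as (low, high), taken from the three steps above.
SAMPLE_PREP = (5, 5)
SPECTRUM_ACQUISITION = (1, 2)   # assumption: all spectra acquired in parallel
SPECTRAL_ANALYSIS = (20, 270)


def total_days(*steps):
    """Sum the low and high estimates across steps."""
    return (sum(lo for lo, _ in steps), sum(hi for _, hi in steps))


def share(step, total):
    """Fraction of the total taken by one step, for the low and high cases."""
    return (step[0] / total[0], step[1] / total[1])


total = total_days(SAMPLE_PREP, SPECTRUM_ACQUISITION, SPECTRAL_ANALYSIS)
analysis_share = share(SPECTRAL_ANALYSIS, total)
print(total)  # (26, 277)
print(tuple(round(s, 2) for s in analysis_share))
```

Even in the best case the analysis step accounts for more than three quarters of the total, which is why a one-week target depends on automating it.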
Therefore, reaching the goal of "quick, on-demand protein structure determination" with NMR requires addressing the following challenges:
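1) Accurate peak-picking algorithms: in an NMR experiment, coupled nuclei give rise to signals that are descriptively called NMR peaks. After Fourier transformation, the coordinates of a peak give the chemical shifts of the corresponding nuclei. NMR laboratories today still largely pick peaks manually or semi-manually.
2) Error-tolerant peak assignment algorithms: because peak picking inevitably contains errors, the step that assigns peaks to residues requires an error-tolerant algorithm.
3) Error-tolerant structure calculation algorithms: because of peak-picking errors, the inter-residue distances estimated from NOE spectra may also be wrong, so the final structure calculation algorithm must be error-tolerant as well.
4) Structure-assisted peak assignment algorithms: in some applications, structural information is available to assist peak assignment. For example, protein design usually starts from a protein of known structure and modifies certain residues; in structure determination, the low-accuracy structure obtained in the previous round of calculation can also serve as a reference. We call such proteins of known structure reference proteins. In addition, the peak assignments of the reference protein are sometimes available. How to use this information effectively to improve assignment accuracy is worth studying.
5) Chemical shift prediction: how to predict the chemical shifts of residues from a protein structure is also worth studying. Commonly used programs, such as SHIFTX (SHIFTY) and SPARTA, have a prediction error of about 2-3 ppm for N, so their accuracy still needs improvement.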
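To make the peak-picking challenge concrete, the following is a minimal sketch of the most naive baseline: report every grid point of a 2D spectrum that exceeds a threshold and is a strict local maximum. It is not any published method, and the function name, the threshold, and the toy intensity grid are assumptions made for illustration.

```python
def pick_peaks(spectrum, threshold):
    """Return (row, col, intensity) for grid points that exceed `threshold`
    and are strictly greater than all of their (up to 8) neighbours."""
    rows, cols = len(spectrum), len(spectrum[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = spectrum[r][c]
            if value <= threshold:
                continue
            neighbours = [
                spectrum[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy intensity grid with two well-separated peaks.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 5.0, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.9],
    [0.0, 0.1, 0.4, 3.0],
]
print(pick_peaks(toy, threshold=1.0))  # [(1, 1, 5.0), (3, 3, 3.0)]
```

On real spectra, noise, overlapping peaks, and artifacts make such a threshold rule produce both false and missed peaks, which is why the downstream assignment and structure calculation steps need to be error-tolerant.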
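In addition, neither X-ray crystallography nor NMR can handle insoluble proteins such as membrane proteins; for these, we can only look forward to the maturation of new techniques.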

Can We Determine a Protein Structure Quickly?

Ming Li (李明), Fellow, ACM, IEEE, Royal Society of Canada

D.R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1 Canada
Dingsheng Technologies, Beijing 100085, China

Abstract:

Can we determine a high resolution protein structure quickly, say, in a week? I will show that this is possible with current technologies together with the new computational tools discussed in this article. We have three potential paths to explore:

  • X-ray crystallography. While this method has produced the most protein structures in the PDB (Protein Data Bank), the nasty trial-and-error crystallization step remains a prohibitive obstacle.
  • NMR (Nuclear Magnetic Resonance) spectroscopy. While the NMR experiments are relatively easy to do, the interpretation of the NMR data for structure calculation takes several months on average.
  • In silico protein structure prediction. Can we actually predict high resolution structures consistently? If the predicted models remain labeled as "predicted" and still need to be verified experimentally by wet lab methods, then this method can at best serve only as a screening tool.

I investigate the question of "quick protein structure determination" from a computer scientist's point of view and actually answer the more relevant question: "what can a computer scientist effectively contribute to this goal?"

    Keywords: automated NMR (Nuclear Magnetic Resonance); protein structure determination; algorithms
    Received: 2009-10-13; Revised: 2009-11-16; Published:
    DOI:
    Funding:

    This work was partially supported by the National High Tech Research and Development 863 Program under Grant No. 2008AA02Z313 from China's Ministry of Science and Technology, Canada's NSERC under Grant No. OGP0046506, Canada Research Chair Program, an NSERC Collaborative Grant, and Ontario's Premier's Discovery Award.

    About the author:
    Ming Li is a Canada Research Chair in Bioinformatics and a University Professor at the University of Waterloo. He is a fellow of the Royal Society of Canada, ACM, and IEEE. He received the E.W.R. Steacie Fellowship Award in 1996 and the Killam Fellowship in 2001. Together with Paul Vitanyi, he pioneered the applications of Kolmogorov complexity and co-authored the book "An Introduction to Kolmogorov Complexity and Its Applications". His recent research interests include protein structure determination and Internet search engines.



    Copyright 2008 by Journal of Computer Science and Technology