Journal of Computer Science and Technology, 2010, 25(1): 107-123. ISSN: 1000-9000, CN: 11-2296/TP

Chinese title: Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics
Reader's guide

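Mass spectrometry has greatly changed the way proteomics research is done, shifting it from a “one protein at a time, labor-intensive” mode to a “computation-intensive, high-throughput” mode. In mass spectrometry-based proteomics, computation plays an irreplaceable role. At the same time, the scale and complexity of mass spectrometry data pose great challenges for data analysis. This article tries to present these challenges, and recent progress on them, as comprehensively as possible.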
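---Database search methods for protein identification
C1: Accurately predicting the theoretical spectrum of a peptide;
C2: Scoring functions that measure the similarity between a peptide and a spectrum;
C3: Assessing peptide identification results, i.e., reliably estimating the false positive rate;
C4: “One-hit wonders”: how to identify a protein reliably when only one of its peptides is matched by a spectrum;
C5: Assessing protein identification results;
C6: Combining MS1 spectra and MS/MS spectra for protein identification;
C7: Homologous proteins often share identical peptides; in this situation, how to infer the correct proteins from the peptide matches;
C8: Spectrum quality assessment, to remove noisy spectra;
C9: Compared with fixed PTMs, variable PTMs cause considerable trouble for database search, because an amino acid may or may not carry the modification. Variable modifications on multiple amino acids make the search space grow exponentially, so efficient search algorithms are needed;
C10: Current mass spectrometers produce spectra at high throughput, so algorithms and their software implementations must be able to handle large volumes of data;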
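C1 and C2 can be made concrete with the simplest possible model: a theoretical spectrum containing only singly charged b and y ions, and a score that counts how many of them are matched by an observed peak (a shared-peak count). This is a minimal sketch under those assumptions; practical scoring functions also model ion intensities, other ion types, neutral losses, and multiple charge states. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # mass of H2O (Da)


def theoretical_by_ions(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix = 0.0
    b_ions, y_ions = [], []
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        b_ions.append(prefix + PROTON)                  # N-terminal fragment
        y_ions.append(total - prefix + WATER + PROTON)  # complementary C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(peptide, observed_mz, tol=0.5):
    """Number of theoretical b/y ions matched by an observed peak within tol (Da)."""
    b_ions, y_ions = theoretical_by_ions(peptide)
    return sum(
        1 for ion in b_ions + y_ions
        if any(abs(ion - mz) <= tol for mz in observed_mz)
    )
```

For example, `shared_peak_count("GA", [58.03, 90.05])` returns 2, because the b1 ion of GA is at about 58.029 and its y1 ion at about 90.055.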

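For C3, a widely used approach is target-decoy search: spectra are searched against the real (target) sequences together with decoy sequences that cannot be correct, such as reversed proteins, and the number of decoy hits above a score threshold serves as an estimate of the number of false target hits. The sketch below assumes full sequence reversal for decoys and uses the simple estimator decoys/targets; other decoy designs and estimators exist. All names are hypothetical.

```python
def reverse_decoy(protein):
    """Decoy protein made by reversing the target sequence."""
    return protein[::-1]


def estimated_fdr(psms, threshold):
    """psms: list of (score, is_decoy) pairs; estimated FDR among hits scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest observed score threshold whose estimated FDR does not exceed max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if estimated_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None
```

Because the estimated FDR is not monotone in the threshold, production tools usually report q-values instead; this loop simply returns the lowest observed threshold that meets the target.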
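---De novo methods for peptide sequencing
C11: To improve the performance of de novo methods, the following problems deserve attention: 1) fast combinatorial search algorithms: in theory, the number of peptides whose total mass equals a given value is exponential, so fast search algorithms are needed; 2) accurate scoring functions suited to such combinatorial search; 3) effective handling of imperfect spectra, e.g., missing peaks and peaks produced by unusual fragmentation patterns; 4) joint inference from multiple spectra produced by different mass spectrometers.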
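The first two points of C11, the exponential number of candidate sequences and the need for a search that is nevertheless fast, can be illustrated with dynamic programming over integer (nominal) masses. This is a simplified sketch, not any particular published algorithm: it assumes the spectrum has already been converted to a set of nominal prefix residue masses, scores a candidate only by how many of its prefix masses appear in that set, and ignores y ions, noise, and mass errors. Names are hypothetical.

```python
# Nominal (integer) residue masses; I/L (113) and K/Q (128) cannot be told apart,
# so only one letter is kept for each of those masses.
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(total_mass):
    """Number of residue sequences whose nominal masses sum to total_mass."""
    count = [0] * (total_mass + 1)
    count[0] = 1
    for m in range(1, total_mass + 1):
        count[m] = sum(count[m - w] for w in NOMINAL_MASS.values() if w <= m)
    return count[total_mass]


def de_novo(prefix_peaks, total_mass):
    """Sequence of nominal mass total_mass whose prefix masses hit the most peaks.

    prefix_peaks: set of observed nominal prefix residue masses
    (e.g., b-ion m/z minus a proton, rounded to an integer).
    """
    best = [None] * (total_mass + 1)  # best score of any prefix with this mass
    back = [None] * (total_mass + 1)  # (previous mass, residue) on the best path
    best[0] = 0
    for m in range(1, total_mass + 1):
        for aa, w in NOMINAL_MASS.items():
            if w <= m and best[m - w] is not None:
                score = best[m - w] + (1 if m in prefix_peaks else 0)
                if best[m] is None or score > best[m]:
                    best[m], back[m] = score, (m - w, aa)
    if best[total_mass] is None:
        return None
    residues, m = [], total_mass
    while m > 0:
        m, aa = back[m]
        residues.append(aa)
    return "".join(reversed(residues))
```

`count_sequences` grows roughly exponentially with the mass, yet `de_novo` runs in time proportional to the mass times the alphabet size, because it keeps only the best-scoring prefix for each mass.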

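---Using homologous sequence databases to improve peptide/protein identification
C12: Using homologous sequence databases to improve peptide/protein identification. Even when the database does not contain the target protein, it can still substantially improve identification if it contains sequences homologous to the target. Doing so requires modifying the classical statistical models of sequence alignment, for example with homology alignment algorithms that take de novo sequencing errors into account.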
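A minimal sketch of error-tolerant matching for C12: a de novo sequence tag is aligned semi-globally against a database protein (the whole tag must align, but it may start and end anywhere in the protein), with I and L treated as identical because they have the same mass. The unit-cost edit model is an assumption chosen for brevity; methods designed for this problem use richer, mass-aware error models (for instance, accounting for residue pairs of equal total mass), which this sketch does not handle. Names are hypothetical.

```python
def normalize(seq):
    """Upper-case and map I to L, which have identical masses."""
    return seq.upper().replace("I", "L")


def best_tag_match(tag, protein):
    """Fewest edits needed to align the whole tag to some substring of protein.

    Returns (edit_distance, end), where end is the 1-based position in protein
    at which the best-matching substring ends.
    """
    t, p = normalize(tag), normalize(protein)
    prev = [0] * (len(p) + 1)  # row 0: the tag may start anywhere, free of charge
    for i in range(1, len(t) + 1):
        cur = [i] + [0] * len(p)  # aligning i tag residues to nothing costs i
        for j in range(1, len(p) + 1):
            cost = 0 if t[i - 1] == p[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # tag residue absent from protein
                         cur[j - 1] + 1)      # extra protein residue
        prev = cur
    best = min(prev)  # the tag may end anywhere in the protein
    return best, prev.index(best)
```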

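---Complete protein sequencing
C13: Complete sequencing of a target protein with MS/MS. Pioneering work includes 1) manual assembly: the sequences of the individual digestion fragments are first determined by de novo sequencing and then assembled by hand into the complete sequence; 2) automated preprocessing: an intermediate quantity called the prefix residue mass spectrum is first computed from the spectrum of each digestion fragment, and these prefix residue mass spectra are then assembled into one overall spectrum. How to perform fully automated complete sequencing remains a problem worth attention.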
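Strategy 1 in C13 (assembling the de novo sequences of overlapping digestion fragments) can be automated in its simplest form by greedy suffix-prefix merging of peptide strings, as in the hypothetical sketch below. It assumes error-free de novo sequences and exact overlaps of at least `min_len` residues; the prefix residue mass spectrum approach (strategy 2) works on spectra rather than strings and is not shown.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap (min_len >= 1)."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # drop peptides fully contained in another peptide
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficiently long overlap left
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```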

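---PTM identification
C14: Identifying PTMs by combining MS/MS with sequence information. Existing work predicts from the sequence which amino acids carry which PTMs; combining such predictions with spectral information may make accurate PTM identification possible;
C15: Sequence-spectrum similarity measures suited to PTMs. PTMs affect spectra in diverse ways: some only change the mass of an amino acid and thus simply shift peaks in the spectrum, while others strongly affect how the peptide fragments and thus change the spectrum substantially. Similarity measures between sequences and spectra therefore need to take PTMs into account;
C16: Automatic discovery of unknown PTMs. Enumerating all possible PTMs for every amino acid makes the search space grow exponentially, so methods that detect unknown PTMs automatically are valuable. Existing work includes 1) finding PTMs by spectrum-sequence alignment; 2) finding PTMs by spectrum-spectrum alignment.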
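One simple form of unrestricted ("blind") modification search, relevant to C9, C15 and C16, assumes a candidate peptide is already known and that it carries exactly one modification of unknown mass. The mass shift is then the difference between the observed precursor mass and the unmodified peptide mass, and the modification site can be chosen as the position where adding that shift explains the most b/y peaks. The sketch below is hypothetical and makes exactly these assumptions; it does not handle multiple modifications, and a modification that changes fragmentation (as C15 notes) would defeat its peak-counting score.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(residue_masses):
    """Singly charged b and y ions for a list of (possibly modified) residue masses."""
    total = sum(residue_masses)
    ions, prefix = [], 0.0
    for m in residue_masses[:-1]:
        prefix += m
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # y ion
    return ions


def localize_mass_shift(peptide, precursor_mass, observed_mz, tol=0.5):
    """Place one unknown mass shift on the residue that best explains the peaks.

    precursor_mass is the observed neutral (uncharged) peptide mass.
    Returns (mass_shift, best_position, matched_ion_count).
    """
    unmodified = [RESIDUE_MASS[aa] for aa in peptide]
    shift = precursor_mass - (sum(unmodified) + WATER)
    best_pos, best_score = None, -1
    for pos in range(len(peptide)):
        masses = list(unmodified)
        masses[pos] += shift  # hypothesis: the modification sits on this residue
        score = sum(1 for ion in by_ions(masses)
                    if any(abs(ion - mz) <= tol for mz in observed_mz))
        if score > best_score:
            best_pos, best_score = pos, score
    return shift, best_pos, best_score
```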

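---Glycan structure determination
C17: Determining glycan structures from mass spectrometry data. Unlike other PTMs, glycosylation not only changes the mass of an amino acid but also forms diverse structures. A glycan is generally a tree formed by multiple monosaccharide residues; the glycosidic bonds linking the residues break in the mass spectrometer, producing ions that are mixed in with the ions formed by the peptide. Because glycans are trees rather than linear chains like peptides, identification is more complex. Existing work includes 1) releasing the glycans enzymatically and then determining their structures by mass spectrometry; 2) determining glycan structures directly from the spectra of glycoproteins. These methods build the overall glycan structure step by step using dynamic programming or heuristic algorithms. Existing work has shown that the glycan structure determination problem is NP-hard, so effective practical algorithms need to be developed.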
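To make the tree structure in C17 concrete, the hypothetical sketch below represents a glycan as a nested `(monosaccharide, children)` tuple and lists, for every glycosidic bond, the residue masses of the two pieces produced by cleaving that single bond. It uses monoisotopic residue masses of common monosaccharide classes and deliberately ignores the ion-type-specific terms (water, charge carriers, reducing-end or peptide attachment) that real fragment ions carry. Sequencing is the much harder inverse problem of finding a tree consistent with such observed fragments.

```python
# Monoisotopic residue masses (Da) of common monosaccharide classes.
MONOSACCHARIDE_MASS = {
    "Hex": 162.05282,     # e.g., mannose, galactose, glucose
    "HexNAc": 203.07937,  # e.g., GlcNAc, GalNAc
    "dHex": 146.05791,    # e.g., fucose
    "NeuAc": 291.09542,   # sialic acid
}


def subtree_mass(node):
    """Summed residue mass of a (name, children) glycan subtree."""
    name, children = node
    return MONOSACCHARIDE_MASS[name] + sum(subtree_mass(c) for c in children)


def single_cleavage_fragments(root):
    """(detached_subtree_mass, remaining_mass) for every glycosidic bond."""
    total = subtree_mass(root)
    fragments = []

    def visit(node):
        for child in node[1]:
            detached = subtree_mass(child)  # everything on the far side of this bond
            fragments.append((detached, total - detached))
            visit(child)

    visit(root)
    return fragments
```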

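---Spectral library search
C18: Building high-quality spectral libraries. As mass spectra annotated with peptide sequences accumulate rapidly, directly comparing a query spectrum with annotated spectra has become an effective way to identify spectra. The first priority on this route is quality control of the spectrum annotations;
C19: Fast search for similar spectra in a spectral library. As libraries grow, how to compare spectra quickly is worth studying;
C20: Fast alignment and discovery of modified and unmodified spectra of the same peptide. For a given peptide, discovering through alignment its products carrying different PTMs is clearly useful;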
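C18 and C19 describe identification by comparing a query spectrum with annotated library spectra. A common similarity measure for this is the cosine (normalized dot product) of binned spectra. The sketch below assumes 1 Da bins and square-root intensity scaling, both illustrative choices, and uses a linear scan over the library, which is exactly the step C19 says must be made faster for large libraries (for example, by first restricting candidates to those with a matching precursor mass). Names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Map (mz, intensity) peaks to {bin index: summed sqrt(intensity)}."""
    vector = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        vector[index] = vector.get(index, 0.0) + math.sqrt(intensity)
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search_library(query_peaks, library, bin_width=1.0):
    """Score a query against every library spectrum; best match first."""
    query = bin_spectrum(query_peaks, bin_width)
    hits = [(cosine(query, bin_spectrum(peaks, bin_width)), name)
            for name, peaks in library.items()]
    return sorted(hits, reverse=True)
```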

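---Protein quantification
C21: Retention time alignment. Classical peptide/protein quantification relies on isotope labeling of proteins. Label-free methods have recently received increasing attention; their advantages are that they need no isotope-labeled sample preparation, avoid errors introduced during such preparation, and, more importantly, allow comparisons across different proteins.
A given peptide has roughly the same retention time wherever it occurs, but small variations remain between runs, so retention times must first be aligned, for example by using the retention times observed across multiple samples.
C22: Peptide feature detection. Peptides common to multiple samples are called peptide features. Finding peptide features in the retention-time data of the spectra is called peptide feature mapping. Once the mapping is established and the intensity of each peptide feature is computed, the features can be used to compute abundance ratios between samples.
C23: Peptide feature matching:
C24: Computing protein abundance from peptide abundance. In principle, protein abundance can be inferred from peptide abundance, but this step must overcome two difficulties: 1) shared peptides: if two proteins contain a common peptide, how should that peptide's abundance be divided between them? 2) removing errors in the computed peptide abundances.
C25: PTM quantification. Determining how much of a protein or peptide carries a given PTM is a very meaningful problem. Retention time information can help with PTM quantification.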
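The alignment step in C21 can be sketched as fitting a mapping from one run's retention times onto a reference run's, using peptides identified in both. The sketch below assumes the drift is linear and fits it by ordinary least squares; in practice drift is often nonlinear, and piecewise or smoothed (e.g., LOESS) warping functions are common. Names are hypothetical.

```python
def fit_linear_rt_map(reference, run):
    """Least-squares linear map from a run's retention times to a reference run's.

    reference, run: dicts mapping peptide sequence -> retention time.
    Returns a function that converts a retention time in `run` to the reference scale.
    """
    shared = sorted(set(reference) & set(run))
    if len(shared) < 2:
        raise ValueError("at least two shared peptides are needed")
    xs = [run[p] for p in shared]
    ys = [reference[p] for p in shared]
    n = len(shared)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("shared peptides must have distinct retention times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def to_reference(rt):
        return slope * rt + intercept

    return to_reference
```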

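For difficulty 1) in C24, one simple heuristic (an assumption here, not the only approach) divides each shared peptide's intensity among its proteins in proportion to the intensity those proteins receive from their unique peptides, splitting evenly when none of them has unique evidence. The function name is hypothetical.

```python
def protein_abundances(peptide_intensity, peptide_proteins):
    """peptide_intensity: peptide -> intensity; peptide_proteins: peptide -> list of proteins."""
    # evidence from peptides that map to exactly one protein
    unique = {}
    for pep, proteins in peptide_proteins.items():
        if len(proteins) == 1:
            unique[proteins[0]] = unique.get(proteins[0], 0.0) + peptide_intensity[pep]
    totals = dict(unique)
    # apportion shared peptides in proportion to unique evidence
    for pep, proteins in peptide_proteins.items():
        if len(proteins) > 1:
            weight_sum = sum(unique.get(p, 0.0) for p in proteins)
            for p in proteins:
                if weight_sum > 0:
                    share = unique.get(p, 0.0) / weight_sum
                else:
                    share = 1.0 / len(proteins)  # no unique evidence: split evenly
                totals[p] = totals.get(p, 0.0) + share * peptide_intensity[pep]
    return totals
```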
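---Sequencing non-standard proteins
C26: Sequencing peptides that contain disulfide bonds. Common proteins are usually linear, but proteins containing multiple cysteines can form disulfide bonds, which change the spectra in characteristic ways;
C27: Sequencing non-ribosomal peptides (NRPs). Non-ribosomal peptides are often cyclic or branched, which also changes the spectra in characteristic ways;
C28: Identifying peptides from mixture spectra. If several peptides have the same (or nearly the same) precursor mass, their MS/MS spectra are mixed together; how to identify all of the peptide sequences from such a mixture spectrum is worth studying;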
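Why cyclic NRPs (C27) change the spectra can be seen with the simplified theoretical spectrum used in textbook treatments of cyclic peptide sequencing: integer residue masses, with every contiguous sub-peptide treated as a fragment. A cyclic peptide yields fragments that wrap around the ring, which the linear peptide with the same sequence cannot produce. This is an idealized sketch (no ion-type offsets, no charge states); names are hypothetical.

```python
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def linear_spectrum(peptide):
    """Masses of all contiguous sub-peptides of a linear peptide, plus 0 and the full mass."""
    masses = [NOMINAL_MASS[aa] for aa in peptide]
    n = len(masses)
    spectrum = [0, sum(masses)]
    for length in range(1, n):
        for start in range(n - length + 1):
            spectrum.append(sum(masses[start:start + length]))
    return sorted(spectrum)


def cyclic_spectrum(peptide):
    """Same, but sub-peptides may wrap around the ring of a cyclic peptide."""
    masses = [NOMINAL_MASS[aa] for aa in peptide]
    n = len(masses)
    doubled = masses + masses  # lets a window run past the end and wrap around
    spectrum = [0, sum(masses)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    return sorted(spectrum)
```

For NQEL, the cyclic spectrum has 14 masses and the linear one only 11; the extra masses (227, 355, 356) come from fragments such as LN that span the ring closure.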

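---Top-down protein identification
C29: Top-down protein identification. Traditional protein identification follows a “digest first, then measure the peptides” strategy. With newer mass spectrometers, the digestion step can be skipped: the intact protein is ionized with many charges and its spectrum is acquired directly. New scoring functions are therefore needed to accommodate the completely different fragmentation patterns.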

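---Peptide detectability
C30: Accurately predicting whether a peptide will produce an MS/MS spectrum. Studies show that even among peptides from the same protein, some produce observable MS/MS spectra more readily than others. The reasons can be roughly grouped as follows: 1) missed cleavages during enzymatic digestion; 2) PTMs on the peptide, so that the expected MS/MS spectrum is not observed; 3) loss of the peptide during the LC step; 4) insufficient ionization, so that the peptide's signal is too weak to be detected at the MS1 stage; 5) abnormal fragmentation that produces an abnormal MS/MS spectrum. Predicting how likely a peptide is to produce an observable MS/MS spectrum helps protein quantification.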
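Reason 1) in C30, missed cleavages, is commonly accommodated in database search by generating peptides with a bounded number of missed cleavages. Below is a minimal in-silico trypsin digestion sketch, assuming the common simplified rule (cleave after K or R, but not before P); real enzymes deviate from this rule, and the function name is hypothetical.

```python
def tryptic_digest(protein, missed_cleavages=0, min_length=1):
    """Peptides from cleaving after K/R (not before P), allowing missed cleavages."""
    sites = [i + 1 for i in range(len(protein) - 1)
             if protein[i] in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # a peptide spanning k internal cleavage sites has k missed cleavages
        for k in range(missed_cleavages + 1):
            j = i + 1 + k
            if j >= len(bounds):
                break
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```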

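---Peptide identification with multiple spectra
C31: Peptide/protein identification and sequencing with multistage mass spectrometry. Incomplete fragmentation and other causes often produce low-quality spectra; multistage mass spectrometry is one feasible solution to this problem.
C32: Peptide/protein identification and sequencing with multiple types of spectra. Spectra of different types, for example CID and ETD spectra, which follow different fragmentation patterns, can also be combined for identification and sequencing.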

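---Mass spectrometry data compression
C33: Compression and management of mass spectrometry data files. As mass spectrometry data accumulate rapidly, compressing them to reduce disk space requirements is of practical value;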
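One simple idea behind compressing mass spectrometry data (C33) is that sorted m/z values, once quantized to a fixed precision, differ by small integers that a general-purpose compressor handles well. The sketch below assumes a precision of 1e-4 (so it is lossy beyond that precision), stores only an m/z array, and uses Python's standard `struct` and `zlib` modules; it is an illustration, not a description of any particular tool, and the names are hypothetical.

```python
import struct
import zlib


def compress_mz(mz_values, precision=1e-4):
    """Quantize sorted m/z values, delta-encode them, and deflate the result."""
    ints = [round(mz / precision) for mz in sorted(mz_values)]
    # first value stored as-is, then (small, non-negative) differences between neighbours
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])] if ints else []
    raw = struct.pack(f"<{len(deltas)}q", *deltas)  # little-endian signed 64-bit integers
    return zlib.compress(raw, 9)


def decompress_mz(blob, precision=1e-4):
    """Inverse of compress_mz, up to the quantization precision."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    values, current = [], 0
    for d in deltas:
        current += d
        values.append(current * precision)
    return values
```

The values come back sorted; a real format would store intensities alongside in the same order.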

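---Retention time prediction
C34: Accurately predicting the retention times of MS/MS spectra. Each MS/MS spectrum is associated with a retention time, i.e., the time at which the corresponding peptide elutes from the LC column. In principle, retention times are reproducible and predictable. Accurate retention time prediction can effectively validate peptide identifications and improve identification performance.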
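A classical baseline for C34 is an additive composition model: each amino acid contributes a fixed retention coefficient, and the predicted retention time is the sum of the coefficients of the peptide's residues plus a constant. The sketch below fits such coefficients by least squares with NumPy's `lstsq`; the model is deliberately simple (it ignores residue order, length effects, and modifications), and the names are hypothetical.

```python
import numpy as np


def composition_matrix(peptides, alphabet):
    """One row per peptide: residue counts for each letter, plus an intercept column."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    X = np.zeros((len(peptides), len(alphabet) + 1))
    for row, peptide in enumerate(peptides):
        for aa in peptide:
            X[row, index[aa]] += 1
        X[row, -1] = 1.0  # constant term
    return X


def fit_retention_coefficients(peptides, rts, alphabet):
    """Least-squares retention coefficients (last entry is the constant)."""
    X = composition_matrix(peptides, alphabet)
    coef, *_ = np.linalg.lstsq(X, np.asarray(rts, dtype=float), rcond=None)
    return coef


def predict_rt(peptide, coef, alphabet):
    """Sum of the fitted coefficients of the peptide's residues, plus the constant."""
    return float((composition_matrix([peptide], alphabet) @ coef)[0])
```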

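---Spectrum preprocessing
C35: Noise peak removal and deconvolution algorithms. Spectra often contain noise peaks, and identifying and removing them can effectively improve the performance of subsequent steps. The difficulty in identifying noise peaks lies in handling overlapping peaks. ESI readily produces multiply charged ions, so the spectrum formed by multiply charged ions must first be converted into the spectrum that singly charged ions would produce before the subsequent identification steps. Existing methods that rely on isotope peaks have to handle the case of overlapping peaks.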
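Two basic operations behind the deconvolution step in C35 are inferring a charge state from the spacing of an isotope envelope (adjacent isotope peaks are about 1.00336/z apart on the m/z axis, from the 13C-12C mass difference) and converting a multiply charged m/z value to the equivalent singly charged one. The sketch assumes a single, clean envelope with no overlapping peaks, which is precisely the assumption that overlapping peaks break; names are hypothetical.

```python
PROTON = 1.007276            # mass of a proton (Da)
ISOTOPE_SPACING = 1.003355   # 13C - 12C mass difference (Da)


def infer_charge(isotope_mzs, max_charge=6, tol=0.01):
    """Guess the charge from the mean spacing of consecutive isotope peaks."""
    if len(isotope_mzs) < 2:
        return None
    spacings = [b - a for a, b in zip(isotope_mzs, isotope_mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    for z in range(1, max_charge + 1):
        if abs(mean_spacing - ISOTOPE_SPACING / z) <= tol:
            return z
    return None


def to_singly_charged(mz, z):
    """m/z of the [M+H]+ ion for an ion observed at mz with charge z."""
    neutral_mass = (mz - PROTON) * z
    return neutral_mass + PROTON
```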

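---Biomarker discovery
C36: Discovering biomarkers directly from mass spectrometry data. Analyzing protein identification results can help discover new biomarkers, and recent studies show that biomarkers can also be discovered directly from mass spectrometry data.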

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

Bin Ma (马斌)

Cheriton School of Computer Science, University of Waterloo, Canada
Dingsheng Technologies, Beijing 100085, China

Abstract:

Mass spectrometry is an analytical technique for determining the composition of a sample. It has recently become a primary tool in proteomics research for protein identification, protein quantification, and the characterization of post-translational modifications. Both the size and the complexity of the data produced by this technique impose great computational challenges on data analysis. This article reviews some of these challenges and serves as an entry point for those who want to study the area in general.

Keywords: mass spectrometry; proteomics; bioinformatics
Received 2009-09-09; revised 2009-11-21.
Funding:

This work is supported by the National High-Tech Research and Development 863 Program of China under Grant No. 2008AA02Z313, NSERC RGPIN under Grant No. 238748-2006, and a start up grant at University of Waterloo.

About the author:
Bin Ma is an associate professor and University Research Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. He received his Ph.D. degree from Beijing University in 1999. From 2000 to 2008 he worked at the University of Western Ontario as an assistant professor, associate professor, and Canada Research Chair. He received the Ontario PREA Award in 2003 and the Ontario Premier's Catalyst Award for Best Young Innovator in 2009.



Copyright 2008 by Journal of Computer Science and Technology