Journal of Computer Science and Technology, 2021, Vol. 36, Issue (2): 248-260. DOI: 10.1007/s11390-021-0856-4

Special Issue: Emerging Areas

• Special Section on AI and Big Data Analytics in Biology and Medicine •

Effective Identification and Annotation of Fungal Genomes

Jian Liu 1,*, Member, CCF, Jia-Liang Sun 1, and Yong-Zhuang Liu 2

  • 1 College of Computer Science, Nankai University, Tianjin 300350, China;
    2 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2020-08-01 Revised: 2021-02-23 Online: 2021-03-05 Published: 2021-04-01
  • Contact: Jian Liu
  • About author: Jian Liu received his M.S. and Ph.D. degrees in computer application technology from Northeastern University, Shenyang, in 2009 and 2014, respectively. He is a professor in the College of Computer Science, Nankai University, Tianjin. His current research interests include massive biological database management, multi-omics data analysis, and bioinformatics. He has published over 30 papers in international journals, conferences, and edited books in these areas since 2010.
  • Supported by:
    The work was supported by the National Key Research and Development Program of China under Grant Nos. 2018YFC1603800, 2018YFC1603802, 2020YFA0908700, and 2020YFA0908702, and the National Natural Science Foundation of China under Grant No. 61872115.

In the past few decades, the dangers of mycosis have caused widespread concern. With the development of sequencing technology, the effective analysis of fungal sequencing data has become a research hotspot. Although the volume of fungal sequencing data is growing steadily, there is still a lack of adequate approaches for the identification and functional annotation of fungal chromosomal genomes. To address this challenge, this paper first examines approaches to the identification and annotation of fungal genomes based on short and long reads produced by multiple sequencing platforms such as Illumina and PacBio. It then develops an automated bioinformatics pipeline called PFGI for the identification and annotation task. An experimental evaluation on real-world data from the ENA (European Nucleotide Archive) shows that PFGI provides a user-friendly way to perform fungal identification and annotation from sequencing data, and that its results are accurate to the species level (97% sequence identity).
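The abstract quotes a 97% sequence identity threshold for species-level identification but does not say how PFGI computes identity. As a rough illustration only, the sketch below assumes identity is measured as matching columns divided by alignment length over a pairwise alignment, with gap columns counted as mismatches. The function names and the constant are hypothetical and are not taken from PFGI.

```python
# Hypothetical sketch: percent identity over a pairwise alignment and a
# species-level cutoff. Names and the identity definition are assumptions,
# not part of PFGI.

SPECIES_LEVEL_IDENTITY = 97.0  # threshold quoted in the abstract


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Return matches / aligned columns * 100 for two gapped, aligned strings.

    Columns where both sequences have a gap ('-') are ignored; a gap
    against a base counts as a mismatch.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" and r == "-":
            continue  # uninformative column
        columns += 1
        if q == r:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def is_species_level_match(aligned_query: str, aligned_ref: str,
                           threshold: float = SPECIES_LEVEL_IDENTITY) -> bool:
    """True when the alignment meets or exceeds the identity threshold."""
    return percent_identity(aligned_query, aligned_ref) >= threshold
```

In practice an aligner such as BLAST+ or Minimap2 (both cited by the paper) would report identity directly; the sketch only makes the threshold comparison concrete.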

Key words: fungal genome; fungal identification; bioinformatics pipeline


ISSN 1000-9000(Print)

CN 11-2296/TP
