Methods Article

Front. Plant Sci., 31 January 2012 | doi: 10.3389/fpls.2012.00005

TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes

Philippe Leroy1*, Nicolas Guilhot1, Hiroaki Sakai2, Aurélien Bernard1,3, Frédéric Choulet1, Sébastien Theil1, Sébastien Reboux4, Naoki Amano2,5, Timothée Flutre4, Céline Pelegrin1, Hajime Ohyanagi6,7, Michael Seidel8, Franck Giacomoni9, Mathieu Reichstadt10, Michael Alaux4, Emmanuelle Gicquello1, Fabrice Legeai11, Lorenzo Cerutti12, Hisataka Numa2, Tsuyoshi Tanaka2, Klaus Mayer8, Takeshi Itoh2, Hadi Quesneville4 and Catherine Feuillet1*
  • 1 UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, Institut National de la Recherche Agronomique-Université Blaise Pascal, Clermont-Ferrand, France
  • 2 National Institute of Agrobiological Sciences, Tsukuba, Ibaraki, Japan
  • 3 ISEM UMR5554, Institut des Sciences de l’Evolution de Montpellier, Montpellier, France
  • 4 UR 1164, Unité de Recherche en Génomique Informatique, Institut National de la Recherche Agronomique, Versailles, France
  • 5 Center for iPS Cell Research and Application, Kyoto University, Sakyo-ku Kyoto, Japan
  • 6 Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Ibaraki, Japan
  • 7 Plant Genetics Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan
  • 8 Institute of Bioinformatics and System Biology/MIPS, Helmholtz Center Munich, Neuherberg, Germany
  • 9 UMR1019, Unité de Recherche en Nutrition Humaine, Institut National de la Recherche Agronomique, Saint-Genès-Champanelle, France
  • 10 UR1213, Unité de Recherche sur les Herbivores, Institut National de la Recherche Agronomique, Saint-Genès-Champanelle, France
  • 11 UMR 1099, Biologie des Organismes et des Populations appliquée à la Protection des Plantes, Institut National de la Recherche Agronomique, Le Rheu, France
  • 12 Swiss Institute of Bioinformatics, Geneva, Switzerland

In support of the international effort to obtain a reference sequence of the bread wheat genome, and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated annotation tool, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can annotate a 1-Gb sequence in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot predicted more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, of which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene predictions to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that have not been adapted to wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.

Keywords: cluster, gene models, pipeline, plant genome, structural and functional annotation, transposable elements, wheat

Citation: Leroy P, Guilhot N, Sakai H, Bernard A, Choulet F, Theil S, Reboux S, Amano N, Flutre T, Pelegrin C, Ohyanagi H, Seidel M, Giacomoni F, Reichstadt M, Alaux M, Gicquello E, Legeai F, Cerutti L, Numa H, Tanaka T, Mayer K, Itoh T, Quesneville H and Feuillet C (2012) TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes. Front. Plant Sci. 3:5. doi: 10.3389/fpls.2012.00005

Received: 02 October 2011; Accepted: 04 January 2012; Published online: 31 January 2012.

Edited by:

Takuji Sasaki, National Institute of Agrobiological Sciences, Japan

Reviewed by:

Xiangfeng Wang, University of Arizona, USA
Kentaro Yano, Meiji University, Japan

Copyright: © 2012 Leroy, Guilhot, Sakai, Bernard, Choulet, Theil, Reboux, Amano, Flutre, Pelegrin, Ohyanagi, Seidel, Giacomoni, Reichstadt, Alaux, Gicquello, Legeai, Cerutti, Numa, Tanaka, Mayer, Itoh, Quesneville and Feuillet. This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.

*Correspondence: Philippe Leroy and Catherine Feuillet, UMR 1095, Genetics, Diversity and Ecophysiology of Cereals, Institut National de la Recherche Agronomique-Université Blaise Pascal, 234 Avenue du Brézet, Domaine de Crouel, F-63000 Clermont-Ferrand, France. e-mail: leroy@clermont.inra.fr; catherine.feuillet@clermont.inra.fr
