Ghent University Academic Bibliography


Exploring the plant transcriptome through phylogenetic profiling

Klaas Vandepoele UGent and Yves Van de Peer UGent (2005) PLANT PHYSIOLOGY. 137(1). p.31-42
abstract
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
type
journalArticle (original)
publication status
published
keyword
ARABIDOPSIS-THALIANA, TIGR GENE INDEXES, RICE GENOME, CHLAMYDOMONAS-REINHARDTII, DRAFT SEQUENCE, WHOLE GENOME, EVOLUTION, PROTEINS, DATABASE, FAMILY
journal title
PLANT PHYSIOLOGY
Plant Physiol.
volume
137
issue
1
pages
31 - 42
Web of Science type
Article
Web of Science id
000226613100003
JCR category
PLANT SCIENCES
JCR impact factor
6.114 (2005)
JCR rank
7/143 (2005)
JCR quartile
1 (2005)
ISSN
0032-0889
DOI
10.1104/pp.104.054700
language
English
UGent publication?
yes
classification
A1
copyright statement
I have transferred the copyright for this publication to the publisher
id
331385
handle
http://hdl.handle.net/1854/LU-331385
date created
2006-04-14 13:58:00
date last changed
2016-12-19 15:44:10
@article{331385,
  abstract     = {Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60\% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.},
  author       = {Vandepoele, Klaas and Van de Peer, Yves},
  issn         = {0032-0889},
  journal      = {PLANT PHYSIOLOGY},
  keyword      = {ARABIDOPSIS-THALIANA,TIGR GENE INDEXES,RICE GENOME,CHLAMYDOMONAS-REINHARDTII,DRAFT SEQUENCE,WHOLE GENOME,EVOLUTION,PROTEINS,DATABASE,FAMILY},
  language     = {eng},
  number       = {1},
  pages        = {31--42},
  title        = {Exploring the plant transcriptome through phylogenetic profiling},
  url          = {http://dx.doi.org/10.1104/pp.104.054700},
  volume       = {137},
  year         = {2005},
}

Chicago
Vandepoele, Klaas, and Yves Van de Peer. 2005. “Exploring the Plant Transcriptome Through Phylogenetic Profiling.” Plant Physiology 137 (1): 31–42.
APA
Vandepoele, K., & Van de Peer, Y. (2005). Exploring the plant transcriptome through phylogenetic profiling. Plant Physiology, 137(1), 31–42. https://doi.org/10.1104/pp.104.054700
Vancouver
Vandepoele K, Van de Peer Y. Exploring the plant transcriptome through phylogenetic profiling. Plant Physiol. 2005;137(1):31–42.
MLA
Vandepoele, Klaas, and Yves Van de Peer. “Exploring the Plant Transcriptome Through Phylogenetic Profiling.” Plant Physiology 137.1 (2005): 31–42. Print.