Ghent University Academic Bibliography

Micropeptides, the next best thing after micro-RNA?: combining in silico prediction and ribosome profiling in a genome-wide search for novel micropeptides

Jeroen Crappé, Geert Trooskens UGent, Eisuke Hayakawa, Walter Luyten, Geert Baggerman, Wim Van Criekinge UGent and Gerben Menschaert UGent (2013) Mass Spectrometry and Allied Topics, 61st ASMS conference, Abstracts.
abstract
Introduction: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult, as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in search of this type of small ORF.

Methods: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level using UCSC multiple species alignments, and ranked using a Support Vector Machine (SVM) learning model. Finally, the ranked sORFs are overlapped with ribosome profiling data to confirm sORF translation. All candidates are visually inspected using an in-house developed genome browser.

Preliminary Data: The genome-wide search for sORFs with sORFfinder resulted in the prediction of 2,414,589 single-exon sORFs with high coding potential, out of a total pool of 40,704,347 sORFs. To assess their peptide-coding potential, all sORFs were analyzed using a UCSC multi-species alignment of 8 vertebrate species. For each sORF a number of basic peptide conservation characteristics were deduced and gathered. We used an SVM approach to classify the sORFs into a coding and a non-coding group based on all aforementioned characteristics. After training the SVM on four fifths of the data and testing it on the remainder, we correctly classified up to 93% of the test cases, with a false positive rate not exceeding 4%. Even with very stringent parameters, this genome-wide in silico prediction approach yields hundreds, even thousands, of possibly interesting sequences. Therefore we reanalyzed ribosome profiling data obtained from a mouse embryonic stem cell (mESC) sample, uniquely mapping the reads to sORFs located in intergenic or ncRNA regions. Retaining only those sORFs that overlap with ribosome profiles at their start position in the harringtonine-treated sample data and that have a sequence coverage of at least 75% relative to the untreated sample data led to a set of 221 intergenic sORFs and 489 sORFs located in ncRNA regions. Restricting the set to lincRNA sORFs, as the data point to expression in these regions, further reduces it to 33 sORFs. All sORFs are made accessible through the in-house developed H2G2 genome browser. Next to the sORF information, static visualization tracks are added depicting genomic annotation from Ensembl, phastCons conservation scores and other relevant information. Experimental ribosome profiling data are incorporated using individual tracks for every analysis on the different samples (with or without harringtonine treatment).
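The genome-wide scan described in the abstract can be illustrated with a minimal sketch. This is not sORFfinder and not the authors' pipeline: it only looks for ATG-to-stop reading frames on the forward strand of a single sequence. The function name `find_sorfs`, the lower bound `min_codons=10`, and the toy sequence are assumptions made for illustration; only the upper bound of 100 AAs comes from the abstract.

```python
# Illustrative sketch only: a naive forward-strand scan for small ORFs.
# Assumptions (not from the abstract): the function name, the lower
# length bound of 10 codons, and the use of the first ATG after a stop.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100, min_codons=10):
    """Return (start, end, frame) tuples for ORFs shorter than max_codons.

    start and end are 0-based and end-exclusive; the span includes the
    stop codon. Only the three forward-strand frames are scanned.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # coding codons, stop excluded
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


# Toy example: a 12-codon ORF embedded in frame 2 of a short sequence.
toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_sorfs(toy))
```

A real scan would also cover the reverse strand and whole chromosomes, and the candidates would then go through the conservation scoring and SVM ranking that the abstract describes.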
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-4197314
author: Jeroen Crappé, Geert Trooskens, Eisuke Hayakawa, Walter Luyten, Geert Baggerman, Wim Van Criekinge and Gerben Menschaert
year: 2013
type: conference
publication status: published
in: Mass Spectrometry and Allied Topics, 61st ASMS conference, Abstracts
conference name: 61st ASMS conference on Mass Spectrometry and Allied Topics (ASMS 2013)
conference location: Minneapolis, MN, USA
conference start: 2013-06-09
conference end: 2013-06-13
language: English
UGent publication: yes
classification: C3
additional info: uploaded document is poster version
copyright statement: I have retained and own the full copyright for this publication
id: 4197314
handle: http://hdl.handle.net/1854/LU-4197314
date created: 2013-12-06 15:35:33
date last changed: 2016-12-19 15:37:44
@inproceedings{4197314,
  abstract     = {Introduction: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult, as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs {\textless} 100 AAs) have been discovered in different organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in search of this type of small ORF.
Methods: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level using UCSC multiple species alignments, and ranked using a Support Vector Machine (SVM) learning model. Finally, the ranked sORFs are overlapped with ribosome profiling data to confirm sORF translation. All candidates are visually inspected using an in-house developed genome browser.
Preliminary Data: The genome-wide search for sORFs with sORFfinder resulted in the prediction of 2,414,589 single-exon sORFs with high coding potential, out of a total pool of 40,704,347 sORFs. To assess their peptide-coding potential, all sORFs were analyzed using a UCSC multi-species alignment of 8 vertebrate species. For each sORF a number of basic peptide conservation characteristics were deduced and gathered. We used an SVM approach to classify the sORFs into a coding and a non-coding group based on all aforementioned characteristics. After training the SVM on four fifths of the data and testing it on the remainder, we correctly classified up to 93\% of the test cases, with a false positive rate not exceeding 4\%. Even with very stringent parameters, this genome-wide in silico prediction approach yields hundreds, even thousands, of possibly interesting sequences. Therefore we reanalyzed ribosome profiling data obtained from a mouse embryonic stem cell (mESC) sample, uniquely mapping the reads to sORFs located in intergenic or ncRNA regions. Retaining only those sORFs that overlap with ribosome profiles at their start position in the harringtonine-treated sample data and that have a sequence coverage of at least 75\% relative to the untreated sample data led to a set of 221 intergenic sORFs and 489 sORFs located in ncRNA regions. Restricting the set to lincRNA sORFs, as the data point to expression in these regions, further reduces it to 33 sORFs. All sORFs are made accessible through the in-house developed H2G2 genome browser. Next to the sORF information, static visualization tracks are added depicting genomic annotation from Ensembl, phastCons conservation scores and other relevant information. Experimental ribosome profiling data are incorporated using individual tracks for every analysis on the different samples (with or without harringtonine treatment).},
  author       = {Crapp{\'e}, Jeroen and Trooskens, Geert and Hayakawa, Eisuke and Luyten, Walter and Baggerman, Geert and Van Criekinge, Wim and Menschaert, Gerben},
  booktitle    = {Mass Spectrometry and Allied Topics, 61st ASMS conference, Abstracts},
  language     = {eng},
  location     = {Minneapolis, MN, USA},
  title        = {Micropeptides, the next best thing after micro-RNA?: combining in silico prediction and ribosome profiling in a genome-wide search for novel micropeptides},
  year         = {2013},
}

Chicago
Crappé, Jeroen, Geert Trooskens, Eisuke Hayakawa, Walter Luyten, Geert Baggerman, Wim Van Criekinge, and Gerben Menschaert. 2013. “Micropeptides, the Next Best Thing After micro-RNA?: Combining in Silico Prediction and Ribosome Profiling in a Genome-wide Search for Novel Micropeptides.” In Mass Spectrometry and Allied Topics, 61st ASMS Conference, Abstracts.
APA
Crappé, J., Trooskens, G., Hayakawa, E., Luyten, W., Baggerman, G., Van Criekinge, W., & Menschaert, G. (2013). Micropeptides, the next best thing after micro-RNA?: combining in silico prediction and ribosome profiling in a genome-wide search for novel micropeptides. Mass Spectrometry and Allied Topics, 61st ASMS conference, Abstracts. Presented at the 61st ASMS conference on Mass Spectrometry and Allied Topics (ASMS 2013).
Vancouver
1. Crappé J, Trooskens G, Hayakawa E, Luyten W, Baggerman G, Van Criekinge W, et al. Micropeptides, the next best thing after micro-RNA?: combining in silico prediction and ribosome profiling in a genome-wide search for novel micropeptides. Mass Spectrometry and Allied Topics, 61st ASMS conference, Abstracts. 2013.
MLA
Crappé, Jeroen, Geert Trooskens, Eisuke Hayakawa, et al. “Micropeptides, the Next Best Thing After micro-RNA?: Combining in Silico Prediction and Ribosome Profiling in a Genome-wide Search for Novel Micropeptides.” Mass Spectrometry and Allied Topics, 61st ASMS Conference, Abstracts. 2013. Print.