Novel transformer networks for improved sequence labeling in genomics

Jim Clauwaert (UGent) and Willem Waegeman (UGent)
Abstract
Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification, owing to its pivotal role in, and association with, crucial physiological functions of most proteins. Identifying glycosylation sites in a polypeptide chain is difficult because of multiple impediments, and analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method that determines such sites precisely and saves researchers time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, that integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative and absolute position-based features. In the self-consistency test on the benchmark dataset, the model predicts O-linked glycosylation at serine sites with an accuracy of 98.8 percent. The overall accuracy achieved through 10-fold cross-validation, combining the positive and negative results, is 97.2 percent. The overall accuracy achieved through the jackknife test, aggregating all prediction results, is 96.195 percent. The proposed predictor can thus help predict O-linked glycosylated serine sites efficiently and accurately. Overall, the results show that the accuracy of iGlycoS-PseAAC is higher than that of existing tools.
Keywords
Proteins, Predictive models, Amino acids, Tools, Bioinformatics, Benchmark testing, Prediction algorithms, Serine, Glycosylation, GalNAc, O-linked, PseAAC, LYSINE SUCCINYLATION SITES, SEQUENCE-BASED PREDICTOR, CRITICAL SPHERICAL-SHELL, SUBCELLULAR-LOCALIZATION, ENSEMBLE CLASSIFIER, RECOMBINATION SPOTS, GRAPHICAL RULES, ENZYME-KINETICS, GENERAL-FORM, K-TUPLE, Genomics, deep learning, transformer networks, sequence labeling, BINDING SITES, DNA-SEQUENCES, DATABASE, IDENTIFICATION, RECOGNITION, PREDICTION, PROMOTERS, ALGORITHM, SEARCH, SPACER

Downloads

  • published.pdf: full text (Published version) | open access | PDF | 599.61 KB

Citation

MLA
Clauwaert, Jim, and Willem Waegeman. “Novel Transformer Networks for Improved Sequence Labeling in Genomics.” IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, vol. 19, no. 1, 2022, pp. 97–106, doi:10.1109/tcbb.2020.3035021.
APA
Clauwaert, J., & Waegeman, W. (2022). Novel transformer networks for improved sequence labeling in genomics. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 19(1), 97–106. https://doi.org/10.1109/tcbb.2020.3035021
Chicago author-date
Clauwaert, Jim, and Willem Waegeman. 2022. “Novel Transformer Networks for Improved Sequence Labeling in Genomics.” IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 19 (1): 97–106. https://doi.org/10.1109/tcbb.2020.3035021.
Chicago author-date (all authors)
Clauwaert, Jim, and Willem Waegeman. 2022. “Novel Transformer Networks for Improved Sequence Labeling in Genomics.” IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 19 (1): 97–106. doi:10.1109/tcbb.2020.3035021.
Vancouver
1. Clauwaert J, Waegeman W. Novel transformer networks for improved sequence labeling in genomics. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. 2022;19(1):97–106.
IEEE
[1] J. Clauwaert and W. Waegeman, “Novel transformer networks for improved sequence labeling in genomics,” IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, vol. 19, no. 1, pp. 97–106, 2022.
BibTeX
@article{8721761,
  abstract     = {{Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification, owing to its pivotal role in, and association with, crucial physiological functions of most proteins. Identifying glycosylation sites in a polypeptide chain is difficult because of multiple impediments, and analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method that determines such sites precisely and saves researchers time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, that integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative and absolute position-based features. In the self-consistency test on the benchmark dataset, the model predicts O-linked glycosylation at serine sites with an accuracy of 98.8 percent. The overall accuracy achieved through 10-fold cross-validation, combining the positive and negative results, is 97.2 percent. The overall accuracy achieved through the jackknife test, aggregating all prediction results, is 96.195 percent. The proposed predictor can thus help predict O-linked glycosylated serine sites efficiently and accurately. Overall, the results show that the accuracy of iGlycoS-PseAAC is higher than that of existing tools.}},
  author       = {{Clauwaert, Jim and Waegeman, Willem}},
  issn         = {{1545-5963}},
  journal      = {{IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS}},
  keywords     = {{Proteins,Predictive models,Amino acids,Tools,Bioinformatics,Benchmark testing,Prediction algorithms,Serine,Glycosylation,GalNAc,O-linked,PseAAC,LYSINE SUCCINYLATION SITES,SEQUENCE-BASED PREDICTOR,CRITICAL SPHERICAL-SHELL,SUBCELLULAR-LOCALIZATION,ENSEMBLE CLASSIFIER,RECOMBINATION SPOTS,GRAPHICAL RULES,ENZYME-KINETICS,GENERAL-FORM,K-TUPLE,Genomics,deep learning,transformer networks,sequence labeling,BINDING SITES,DNA-SEQUENCES,DATABASE,IDENTIFICATION,RECOGNITION,PREDICTION,PROMOTERS,ALGORITHM,SEARCH,SPACER}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{97--106}},
  title        = {{Novel transformer networks for improved sequence labeling in genomics}},
  url          = {{http://dx.doi.org/10.1109/tcbb.2020.3035021}},
  volume       = {{19}},
  year         = {{2022}},
}
