Assessing the reliability of point mutation as data augmentation for deep learning with genomic data

Hyunjung Lee, Utku Özbulak (UGent), Homin Park (UGent), Stephen Depuydt (UGent), Wesley De Neve (UGent) and Joris Vankerschaver (UGent)
Abstract
Background: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data.

Results: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection.

Conclusion: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
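The abstract describes augmenting coding sequences with point mutations and distinguishes silent, missense and nonsense mutations. The Python sketch below illustrates that idea under stated assumptions: it uses the standard genetic code (NCBI translation table 1), assumes the in-frame coding region is known, mutates at most one base per codon, and samples uniformly among the single-base substitutions whose class is allowed. The function names, the extra `stop-loss` label and the sampling strategy are hypothetical choices for illustration, not the authors' implementation.

```python
import itertools
import random

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each position cycling through T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon: str, mutant: str) -> str:
    """Label a single-base codon change as silent, missense, nonsense or stop-loss."""
    if len(codon) != 3 or len(mutant) != 3:
        raise ValueError("codons must have length 3")
    if sum(a != b for a, b in zip(codon, mutant)) != 1:
        raise ValueError("codons must differ at exactly one position")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"      # same amino acid (or stop -> stop)
    if before == "*":
        return "stop-loss"   # hypothetical extra label: stop codon becomes a sense codon
    if after == "*":
        return "nonsense"    # sense codon becomes a premature stop
    return "missense"        # different amino acid


def point_mutation_augment(seq, cds_start, cds_end, allowed, n_mutations, rng):
    """Return a copy of `seq` with up to `n_mutations` point mutations, at most one
    per codon, restricted to the in-frame region [cds_start, cds_end) and to the
    mutation classes in `allowed`. Hypothetical helper, not the authors' code."""
    seq = seq.upper()
    if not 0 <= cds_start <= cds_end <= len(seq):
        raise ValueError("coding region lies outside the sequence")
    if (cds_end - cds_start) % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")

    codon_starts = list(range(cds_start, cds_end, 3))
    rng.shuffle(codon_starts)  # visit codons in random order
    out = list(seq)
    applied = 0
    for start in codon_starts:
        if applied == n_mutations:
            break
        codon = seq[start:start + 3]
        if codon not in CODON_TABLE:  # skip codons containing N or other symbols
            continue
        # All single-base substitutions in this codon whose class is allowed.
        candidates = [
            (pos, base)
            for pos in range(3)
            for base in BASES
            if base != codon[pos]
            and classify_point_mutation(codon, codon[:pos] + base + codon[pos + 1:]) in allowed
        ]
        if not candidates:
            continue
        pos, base = rng.choice(candidates)
        out[start + pos] = base
        applied += 1
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    toy = "GCCATGGCTGAAAAATAA"  # hypothetical toy sequence; coding region is [3, 18)
    print(point_mutation_augment(toy, 3, 18, {"silent", "missense"}, 2, rng))
```

Restricting `allowed` to `{"silent", "missense"}` mirrors the mutation types the abstract reports as helpful, and bases outside the coding region are never touched. How many mutations to apply per sequence, and how to locate the coding region in practice, are left open in this sketch.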
Keywords
Translation initiation, Splicing, Point mutations, Deep learning, Data augmentation

Downloads

  • Lee et al-2024-BMC Bioinformatics.pdf: full text (published version), open access, PDF, 3.00 MB

Citation

MLA
Lee, Hyunjung, et al. “Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data.” BMC Bioinformatics, vol. 25, no. 1, 2024, doi:10.1186/s12859-024-05787-6.
APA
Lee, H., Özbulak, U., Park, H., Depuydt, S., De Neve, W., & Vankerschaver, J. (2024). Assessing the reliability of point mutation as data augmentation for deep learning with genomic data. BMC Bioinformatics, 25(1), 170. https://doi.org/10.1186/s12859-024-05787-6
Chicago author-date
Lee, Hyunjung, Utku Özbulak, Homin Park, Stephen Depuydt, Wesley De Neve, and Joris Vankerschaver. 2024. “Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data.” BMC Bioinformatics 25 (1): 170. https://doi.org/10.1186/s12859-024-05787-6.
Chicago author-date (all authors)
Lee, Hyunjung, Utku Özbulak, Homin Park, Stephen Depuydt, Wesley De Neve, and Joris Vankerschaver. 2024. “Assessing the Reliability of Point Mutation as Data Augmentation for Deep Learning with Genomic Data.” BMC Bioinformatics 25 (1): 170. doi:10.1186/s12859-024-05787-6.
Vancouver
1. Lee H, Özbulak U, Park H, Depuydt S, De Neve W, Vankerschaver J. Assessing the reliability of point mutation as data augmentation for deep learning with genomic data. BMC Bioinformatics. 2024;25(1):170.
IEEE
[1] H. Lee, U. Özbulak, H. Park, S. Depuydt, W. De Neve, and J. Vankerschaver, “Assessing the reliability of point mutation as data augmentation for deep learning with genomic data,” BMC Bioinformatics, vol. 25, no. 1, Art. no. 170, 2024.
@article{01HWW1743VJ0Z63MTTAVY0PJQV,
  abstract     = {{Background: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data.

Results: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection.

Conclusion: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.}},
  articleno    = {{170}},
  author       = {{Lee, Hyunjung and Özbulak, Utku and Park, Homin and Depuydt, Stephen and De Neve, Wesley and Vankerschaver, Joris}},
  issn         = {{1471-2105}},
  journal      = {{BMC Bioinformatics}},
  keywords     = {{Translation initiation,Splicing,Point mutations,Deep learning,Data augmentation}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{19}},
  title        = {{Assessing the reliability of point mutation as data augmentation for deep learning with genomic data}},
  url          = {{https://doi.org/10.1186/s12859-024-05787-6}},
  volume       = {{25}},
  year         = {{2024}},
}