
Epitope Odyssey : data-driven tools to extend the known immunopeptide universe

Arthur Declercq (UGent)
(2025)
Promoter
(UGent) and (UGent)
Abstract
DNA, RNA and proteins form the holy molecular trinity of biology, creating life as we know it. A complex flow of information, regulated at many different steps, ultimately enables linear DNA to give rise to complex proteins, each dedicated to a specific task. Proteins have therefore become an important subject of research: a better understanding of these proteins, both in health and in disease, can provide new insights into how our body functions, why things sometimes go wrong and how modern medicine can prevent or combat this. Owing to this potential, the importance of proteomics has skyrocketed, giving rise to many subfields that each study a smaller part of the whole and lead to specific treatments and insights.
One such subfield is immunopeptidomics, which has shown great potential in identifying key components, i.e. epitopes, presented on the cell surface bound to MHC proteins. These epitopes enable our immune system to scan cells for pathogens, infections and mutations. By identifying these epitopes, we can discover novel antigens that can be used to defeat viruses through vaccination, or to help our body defeat cancer through cancer vaccines or T-cell receptor technologies. It is therefore essential that we accurately identify as many of these epitopes as possible. However, epitope discovery is currently still hampered by the lack of dedicated bioinformatics software tailored to these immunopeptides.
I therefore dedicated my PhD to the development of new algorithms that further push the identification of such epitopes. First, I developed new MS²PIP peak intensity prediction models that more accurately predict how these immunopeptides fragment in the mass spectrometer. Second, I integrated these peak intensity prediction models, together with retention time prediction models, into a package called MS²Rescore. The orthogonal information from these prediction models creates a unique feature set for rescoring peptide-spectrum matches (PSMs), allowing MS²Rescore to significantly increase the number of immunopeptides identified in a single experiment. Third, to make MS²Rescore widely available and easy to use, I substantially refactored it. This new, more modular package, together with a graphical user interface and a Python API, ensures that MS²Rescore can easily be extended with new predictors, integrated into new pipelines and used by the wider community. The refactoring also enabled me to create a new version of MS²Rescore, called TIMS²Rescore: as my fourth objective, I fully adapted MS²Rescore to DDA-PASEF data, made popular by the emergence of timsTOF instruments. By integrating newly created peak intensity prediction models, ion mobility prediction models and a streamlined data throughput, I showed that TIMS²Rescore not only boosts immunopeptide identification rates but also improves plasma proteomics and metaproteomics data analysis. Lastly, to move beyond mass spectrometry-based identification of potential epitopes, I set out to create a new epitope predictor, MHC-3PO. By combining cutting-edge deep learning techniques such as large protein language models, graph neural networks and Siamese contrastive learning, I created an epitope predictor that generalises to unseen peptide-MHC (pMHC) pairs and can predict epitopes carrying post-translational modifications (PTMs), a field that is still largely understudied. Through MHC-3PO and its explainability, I aim to further uncover this modified epitope universe.
In conclusion, this research has introduced innovative bioinformatics tools that push the boundaries of epitope discovery through mass spectrometry-based immunopeptidomics, further enhancing our understanding of immune responses. These tools show great promise for advancing immunotherapy applications, including cancer vaccines and T-cell receptor technologies.
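The rescoring idea above can be illustrated with a minimal sketch. Assuming each PSM already carries predicted fragment intensities (of the kind MS²PIP provides) and a predicted retention time, the Python below derives two prediction-based features: the Pearson correlation between log-transformed observed and predicted intensities, and the absolute error between the observed retention time and a linearly calibrated prediction. The `PSM` class, the function names, the feature names and the toy data are all hypothetical; this is not MS²Rescore's API, and the features MS²Rescore actually computes are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal PSM record; field names are illustrative only."""
    observed_intensities: list[float]   # matched fragment-ion intensities
    predicted_intensities: list[float]  # same ions, as predicted by a model
    observed_rt: float                  # observed retention time
    predicted_rt: float                 # predicted retention time, uncalibrated


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def fit_rt_calibration(psms: list[PSM]) -> tuple[float, float]:
    """Least-squares line (slope, intercept) mapping predicted onto observed RT.

    In practice the calibration would be fitted on high-confidence PSMs only.
    """
    xs = [p.predicted_rt for p in psms]
    ys = [p.observed_rt for p in psms]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def psm_features(psm: PSM, slope: float, intercept: float) -> dict[str, float]:
    """Two prediction-based rescoring features for one PSM (hypothetical names)."""
    # Compare intensities on a log scale so a few dominant peaks do not
    # swamp the correlation.
    obs = [math.log2(i + 1.0) for i in psm.observed_intensities]
    pred = [math.log2(i + 1.0) for i in psm.predicted_intensities]
    calibrated_rt = slope * psm.predicted_rt + intercept
    return {
        "intensity_pearson": pearson(obs, pred),
        "abs_rt_error": abs(psm.observed_rt - calibrated_rt),
    }


# Toy data: the second PSM's intensity pattern disagrees with its prediction.
psms = [
    PSM([100, 50, 10], [90, 60, 5], observed_rt=20.0, predicted_rt=10.0),
    PSM([5, 80, 30], [70, 10, 40], observed_rt=41.0, predicted_rt=20.0),
    PSM([20, 20, 60], [25, 15, 70], observed_rt=60.0, predicted_rt=30.0),
]
slope, intercept = fit_rt_calibration(psms)
for psm in psms:
    print(psm_features(psm, slope, intercept))
```

In a real workflow, features like these would be appended to the search engine's own scores and handed to a semi-supervised rescoring engine such as Percolator or mokapot, which learns to separate target from decoy PSMs. Because the predictions are independent of the search engine's scoring function, they add information the search engine did not already use.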
DNA, RNA and proteins form the holy trinity of biology, the foundations of life as we know it. Information encoded in our DNA is converted into proteins through complex steps, and these proteins in turn carry out increasingly complex functions in our body. Proteins are therefore central to a great deal of research, because insight into how they work in health and in disease helps us understand how the body functions, why it sometimes fails, and how modern medicine can prevent or resolve this. The rise of proteomics, the large-scale study of proteins, has accordingly led to numerous specialised subdomains, each studying part of our proteome, with many applications in therapy and diagnostics. One of these subdomains is immunopeptidomics, a field that studies short protein fragments (epitopes) that are presented on the cell surface by MHC proteins. This presentation is crucial for the functioning of our immune system. By identifying such epitopes, we can discover new antigens that can then be used in vaccines against viruses, or in cancer therapies such as personalised cancer vaccines or T-cell receptor treatments. It is therefore essential to accurately identify as many of these epitopes as possible using mass spectrometry. Unfortunately, there is currently still a lack of bioinformatics software specifically designed to identify these immune-related peptides.
I therefore dedicated my PhD to developing algorithms that improve the identification of epitopes. First, I developed new MS²PIP models to predict how these epitopes fragment in the mass spectrometer. I then combined these fragmentation predictions with retention time predictions in a software tool called MS²Rescore. By using all of these predictions to rescore peptide-spectrum matches, MS²Rescore can substantially increase the number of immunopeptides identified in a single analysis. This is an important step forward in overcoming the problems that standard search algorithms consistently have with these epitopes. As the third part of my PhD, I completely rewrote MS²Rescore to make it more accessible. This new version includes a user-friendly interface and a Python API, making it easy to integrate new algorithms into MS²Rescore and to incorporate MS²Rescore itself into other software. This also made it easier to develop a new variant: TIMS²Rescore, a version optimised for DDA-PASEF data from timsTOF instruments. By integrating new MS²PIP models for fragmentation prediction and models for collisional cross section (CCS), and by optimising the software for timsTOF output, TIMS²Rescore proves useful not only for epitope identification but also for identifying proteins in blood or originating from various microbiomes. Finally, now that these epitopes are easier to identify, I developed a new AI algorithm that can predict them, called MHC-3PO. This method combines the latest deep learning techniques, including large protein language models, graph-based neural networks and contrastive Siamese networks. As a result, MHC-3PO not only makes predictions more readily for unseen MHC molecules, but can also take post-translational modifications on epitopes into account, something that has not been possible until now. Through MHC-3PO, I hope to gain more insight into the world of modified epitopes.
In summary, this research introduces a series of innovative bioinformatics algorithms that help to identify epitopes better with immunopeptidomics. These tools can therefore help us gain more insight into how the immune system works and give advanced immunotherapies, such as cancer vaccines and T-cell-based treatments, a push in the right direction.
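The abstract names Siamese contrastive learning as one ingredient of MHC-3PO but does not describe its loss or architecture. As a generic illustration only, the Python sketch below implements the classic margin-based contrastive loss on top of a toy shared ("Siamese") linear encoder: positive pairs are pulled together, negative pairs are pushed at least a margin apart. The input vectors, the weight matrix `W`, the margin of 1.0 and all function names are hypothetical, and the peptide and MHC inputs are assumed to be already featurised into vectors of equal length (for example by a protein language model). This is not MHC-3PO's model.

```python
import math


def encode(features: list[float], weights: list[list[float]]) -> list[float]:
    """Toy shared encoder: one linear projection applied to any input.

    Using the *same* weights for both members of a pair is what makes the
    setup "Siamese". The input vectors here are hand-written placeholders.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(za: list[float], zb: list[float], label: int,
                     margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of embeddings.

    label == 1: positive pair (e.g. a presented peptide-MHC pair); the loss
                is d**2, which pulls the embeddings together.
    label == 0: negative pair; the loss is max(0, margin - d)**2, which pushes
                the embeddings at least `margin` apart.
    """
    d = euclidean(za, zb)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 3-feature inputs projected to 2-d by the shared weights.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
z_pep = encode([0.2, 0.1, 5.0], W)       # peptide
z_mhc_pos = encode([0.2, 0.1, -3.0], W)  # MHC it is labelled to bind
z_mhc_neg = encode([1.0, 0.1, 0.0], W)   # MHC it is labelled not to bind

print(contrastive_loss(z_pep, z_mhc_pos, label=1))  # identical embeddings
print(contrastive_loss(z_pep, z_mhc_neg, label=0))  # about 0.8 apart, inside the margin
```

During training, the shared weights would be updated to minimise the summed loss over many labelled pairs, so that binding pairs end up close in embedding space and non-binding pairs end up at least `margin` apart; a negative pair already farther apart than the margin contributes no loss.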
Keywords
Machine learning, bioinformatics, mass spectrometry, immunopeptidomics

Downloads

  • (...).pdf
    • full text (Published version) | UGent only (changes to open access on 2030-07-12) | PDF | 51.78 MB

Citation

Please use this url to cite or link to this publication:

MLA
Declercq, Arthur. Epitope Odyssey : Data-Driven Tools to Extend the Known Immunopeptide Universe. Ghent University. Faculty of Medicine and Health Sciences, 2025.
APA
Declercq, A. (2025). Epitope Odyssey : data-driven tools to extend the known immunopeptide universe. Ghent University. Faculty of Medicine and Health Sciences, Ghent, Belgium.
Chicago author-date
Declercq, Arthur. 2025. “Epitope Odyssey : Data-Driven Tools to Extend the Known Immunopeptide Universe.” Ghent, Belgium: Ghent University. Faculty of Medicine and Health Sciences.
Vancouver
1. Declercq A. Epitope Odyssey : data-driven tools to extend the known immunopeptide universe. [Ghent, Belgium]: Ghent University. Faculty of Medicine and Health Sciences; 2025.
IEEE
[1] A. Declercq, “Epitope Odyssey : data-driven tools to extend the known immunopeptide universe,” Ghent University. Faculty of Medicine and Health Sciences, Ghent, Belgium, 2025.
@phdthesis{01K6AN7CENRMG1M3SAKBXZN4A4,
  author       = {{Declercq, Arthur}},
  keywords     = {{Machine learning,bioinformatics,mass spectrometry,immunopeptidomics}},
  language     = {{eng}},
  pages        = {{I, 241}},
  publisher    = {{Ghent University. Faculty of Medicine and Health Sciences}},
  school       = {{Ghent University}},
  title        = {{Epitope Odyssey : data-driven tools to extend the known immunopeptide universe}},
  year         = {{2025}},
}