An automated end-to-end pipeline for fine-grained video annotation using deep neural networks

Abstract
The searchability of video content is often limited to the descriptions authors and/or annotators care to provide. The level of description can range from absolutely nothing to fine-grained annotations at the level of frames. Based on these annotations, certain parts of the video content are more searchable than others. Within the context of the STEAMER project, we developed an innovative end-to-end system that attempts to tackle the problem of unsupervised retrieval of news video content, leveraging multiple information streams and deep neural networks. In particular, we extracted keyphrases and named entities from transcripts, subsequently refining these keyphrases and named entities based on their visual appearance in the news video content. Moreover, to allow for fine-grained frame-level annotations, we temporally located high-confidence keyphrases in the news video content. To that end, we had to tackle challenges such as the automatic construction of training sets and the automatic assessment of keyphrase imageability. In this paper, we discuss the main components of our end-to-end system, capable of transforming textual and visual information into fine-grained video annotations.
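The abstract sketches a concrete two-stage pipeline (textual candidates first, visual refinement and temporal localization second), so a minimal illustration may help make the steps tangible. The Python sketch below is not the authors' code: spaCy is assumed for the transcript-side keyphrase and named-entity extraction, the frame-level deep neural network is left as a pluggable score_fn, and the 0.8 confidence threshold is likewise an assumption made for the example.

# Illustrative sketch only -- not the authors' implementation. It mirrors
# the pipeline the abstract describes: extract keyphrase and named-entity
# candidates from a transcript, then keep only those that a frame-level
# visual classifier confirms, recording where in the video they occur.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model choice

def extract_candidates(transcript):
    """Named entities and noun phrases serve as annotation candidates."""
    doc = nlp(transcript)
    entities = {ent.text for ent in doc.ents}
    noun_phrases = {chunk.text for chunk in doc.noun_chunks}
    return entities | noun_phrases

def annotate(transcript, frames, score_fn, threshold=0.8):
    """Refine candidates visually and localize them temporally.

    score_fn(phrase, frame) stands in for a deep neural network that
    scores how strongly a phrase is depicted in a frame; the paper
    trains such classifiers on automatically constructed training sets.
    """
    annotations = {}
    for phrase in extract_candidates(transcript):
        # Frame indices where the classifier is confident yield the
        # fine-grained, frame-level annotations described above.
        hits = [i for i, frame in enumerate(frames)
                if score_fn(phrase, frame) >= threshold]
        if hits:
            annotations[phrase] = hits
    return annotations

Keeping the visual scorer behind a function boundary mirrors the refinement step the abstract describes: only candidates the classifier confirms with high confidence survive, together with the frame indices that localize them.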
Keywords
fine-grained video annotation, deep neural networks, video retrieval

Citation

MLA
Vandersmissen, Baptist, et al. “An Automated End-to-End Pipeline for Fine-Grained Video Annotation Using Deep Neural Networks.” ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ACM, 2016, pp. 409–12, doi:10.1145/2911996.2912028.
APA
Vandersmissen, B., Sterckx, L., Demeester, T., Jalalvand, A., De Neve, W., & Van de Walle, R. (2016). An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 409–412. https://doi.org/10.1145/2911996.2912028
Chicago author-date
Vandersmissen, Baptist, Lucas Sterckx, Thomas Demeester, Azarakhsh Jalalvand, Wesley De Neve, and Rik Van de Walle. 2016. “An Automated End-to-End Pipeline for Fine-Grained Video Annotation Using Deep Neural Networks.” In ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 409–12. ACM. https://doi.org/10.1145/2911996.2912028.
Chicago author-date (all authors)
Vandersmissen, Baptist, Lucas Sterckx, Thomas Demeester, Azarakhsh Jalalvand, Wesley De Neve, and Rik Van de Walle. 2016. “An Automated End-to-End Pipeline for Fine-Grained Video Annotation Using Deep Neural Networks.” In ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 409–412. ACM. doi:10.1145/2911996.2912028.
Vancouver
1. Vandersmissen B, Sterckx L, Demeester T, Jalalvand A, De Neve W, Van de Walle R. An automated end-to-end pipeline for fine-grained video annotation using deep neural networks. In: ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL. ACM; 2016. p. 409–12.
IEEE
[1] B. Vandersmissen, L. Sterckx, T. Demeester, A. Jalalvand, W. De Neve, and R. Van de Walle, “An automated end-to-end pipeline for fine-grained video annotation using deep neural networks,” in ICMR’16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, New York, USA, 2016, pp. 409–412.
@inproceedings{8048654,
  abstract     = {{The searchability of video content is often limited to the descriptions authors and/or annotators care to provide. The level of description can range from absolutely nothing to fine-grained annotations at the level of frames. Based on these annotations, certain parts of the video content are more searchable than others.

Within the context of the STEAMER project, we developed an innovative end-to-end system that attempts to tackle the problem of unsupervised retrieval of news video content, leveraging multiple information streams and deep neural networks. In particular, we extracted keyphrases and named entities from transcripts, subsequently refining these keyphrases and named entities based on their visual appearance in the news video content. Moreover, to allow for fine-grained frame-level annotations, we temporally located high-confidence keyphrases in the news video content. To that end, we had to tackle challenges such as the automatic construction of training sets and the automatic assessment of keyphrase imageability.

In this paper, we discuss the main components of our end-to-end system, capable of transforming textual and visual information into fine-grained video annotations.}},
  author       = {{Vandersmissen, Baptist and Sterckx, Lucas and Demeester, Thomas and Jalalvand, Azarakhsh and De Neve, Wesley and Van de Walle, Rik}},
  booktitle    = {{ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL}},
  isbn         = {{9781450343596}},
  keywords     = {{fine-grained video annotation,deep neural networks,video retrieval}},
  language     = {{eng}},
  location     = {{New York, USA}},
  pages        = {{409--412}},
  publisher    = {{ACM}},
  title        = {{An automated end-to-end pipeline for fine-grained video annotation using deep neural networks}},
  url          = {{http://doi.org/10.1145/2911996.2912028}},
  year         = {{2016}},
}
