SENTiVENT : enabling supervised information extraction of company-specific events in economic and financial news

Abstract
We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme is developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE) so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance is adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.
Keywords
Event extraction, Economic events, Financial information extraction, Annotation scheme, English corpus, Event detection, INVESTOR SENTIMENT, MARKET PREDICTION, VOLATILITY, IMPACT

Downloads

  • published.pdf: full text (Published version) | open access | PDF | 2.53 MB

Citation

Please use this url to cite or link to this publication:

MLA
Jacobs, Gilles, and Veronique Hoste. “SENTiVENT : Enabling Supervised Information Extraction of Company-Specific Events in Economic and Financial News.” LANGUAGE RESOURCES AND EVALUATION, vol. 56, no. 1, 2022, pp. 225–57, doi:10.1007/s10579-021-09562-4.
APA
Jacobs, G., & Hoste, V. (2022). SENTiVENT : enabling supervised information extraction of company-specific events in economic and financial news. LANGUAGE RESOURCES AND EVALUATION, 56(1), 225–257. https://doi.org/10.1007/s10579-021-09562-4
Chicago author-date
Jacobs, Gilles, and Veronique Hoste. 2022. “SENTiVENT : Enabling Supervised Information Extraction of Company-Specific Events in Economic and Financial News.” LANGUAGE RESOURCES AND EVALUATION 56 (1): 225–57. https://doi.org/10.1007/s10579-021-09562-4.
Chicago author-date (all authors)
Jacobs, Gilles, and Veronique Hoste. 2022. “SENTiVENT : Enabling Supervised Information Extraction of Company-Specific Events in Economic and Financial News.” LANGUAGE RESOURCES AND EVALUATION 56 (1): 225–257. doi:10.1007/s10579-021-09562-4.
Vancouver
1. Jacobs G, Hoste V. SENTiVENT : enabling supervised information extraction of company-specific events in economic and financial news. LANGUAGE RESOURCES AND EVALUATION. 2022;56(1):225–57.
IEEE
[1] G. Jacobs and V. Hoste, “SENTiVENT : enabling supervised information extraction of company-specific events in economic and financial news,” LANGUAGE RESOURCES AND EVALUATION, vol. 56, no. 1, pp. 225–257, 2022.
BibTeX
@article{8720119,
  abstract     = {{We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme is developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE) so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance is adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.}},
  author       = {{Jacobs, Gilles and Hoste, Veronique}},
  issn         = {{1574-020X}},
  journal      = {{LANGUAGE RESOURCES AND EVALUATION}},
  keywords     = {{Event extraction,Economic events,Financial information extraction,Annotation scheme,English corpus,Event detection,INVESTOR SENTIMENT,MARKET PREDICTION,VOLATILITY,IMPACT}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{225--257}},
  title        = {{SENTiVENT : enabling supervised information extraction of company-specific events in economic and financial news}},
  url          = {{http://doi.org/10.1007/s10579-021-09562-4}},
  volume       = {{56}},
  year         = {{2022}},
}
