We consider the problem of automatic extraction and aggregation of event mentions in news.

We adopt the definition of events from the ECB+ corpus: an event is something that happens or a situation that occurs. Our goal in this project is to automatically extract, from each document, mentions of events together with their participants (who is involved), times (when the event happens), and locations (where it happens). Because the same event is often mentioned in multiple documents, we also want to group event mentions across documents so that all mentions referring to the same underlying event belong to the same cluster.
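To make the target output concrete, the following is a minimal sketch of how extracted mentions and their cross-document clusters might be represented. The class name `EventMention`, its fields, the helper `group_by_cluster`, and the example sentences are all hypothetical illustrations; they are not the ECB+ annotation format or the data structures of our system.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventMention:
    """One extracted event mention (all field names are hypothetical)."""
    doc_id: str                       # document the mention was found in
    trigger: str                      # word or phrase that denotes the event
    participants: List[str] = field(default_factory=list)  # who is involved
    times: List[str] = field(default_factory=list)         # when it happens
    locations: List[str] = field(default_factory=list)     # where it happens


def group_by_cluster(
    mentions: List[EventMention], cluster_ids: List[int]
) -> Dict[int, List[EventMention]]:
    """Group mentions by the cluster id assigned to each of them."""
    if len(mentions) != len(cluster_ids):
        raise ValueError("need exactly one cluster id per mention")
    clusters = defaultdict(list)
    for mention, cluster_id in zip(mentions, cluster_ids):
        clusters[cluster_id].append(mention)
    return dict(clusters)


# Made-up mentions from two documents; the first two describe the same event.
mentions = [
    EventMention("doc1", "earthquake", ["residents"], ["Monday"], ["Sumatra"]),
    EventMention("doc2", "quake", ["villagers"], ["Monday"], ["Indonesia"]),
    EventMention("doc2", "evacuated", ["villagers"], [], ["Indonesia"]),
]
clusters = group_by_cluster(mentions, [0, 0, 1])
```

Here the cluster ids are supplied by hand; in the full task they would be produced by the coreference resolution step.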

There are two subproblems:

(1) event mention extraction: We adapt our opinion extraction system for event mention extraction by using event-specific features. The system is trained on the gold-standard annotations of the ECB+ corpus (only a small number of sentences are annotated per document) and can predict mentions of events, participants, times, and locations in unseen documents. The results can be viewed here.

(2) event coreference resolution: We propose a novel Bayesian model to solve the problem of within- and cross-document event coreference resolution. It is capable of performing generative clustering of event mentions while accounting for contextual similarities between event mentions. It is shown to significantly outperform the traditional supervised agglomerative clustering approach and the unsupervised nonparametric clustering approach. (demo coming soon!)
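For readers unfamiliar with the agglomerative clustering baseline mentioned above, the sketch below shows the general idea on toy data. It is not our Bayesian model, and it is not the supervised baseline either: the function names, the fixed Jaccard similarity over context words (a stand-in for a learned pairwise similarity), the average-link merging rule, and the threshold value are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two bags of context words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def agglomerative_cluster(contexts, threshold=0.3):
    """Greedy average-link clustering of mentions by context similarity.

    contexts[i] is the list of context words for mention i. The two most
    similar clusters are merged repeatedly until no pair reaches the
    (hypothetical) similarity threshold.
    """
    clusters = [[i] for i in range(len(contexts))]

    def link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        sims = [jaccard(contexts[i], contexts[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (x < y by construction).
        (x, y), best = max(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [sorted(c) for c in clusters]


# Made-up context words for three event mentions.
contexts = [
    ["earthquake", "struck", "sumatra", "monday"],
    ["quake", "struck", "sumatra", "monday"],
    ["villagers", "evacuated", "coast"],
]
result = agglomerative_cluster(contexts)
```

With these toy contexts the first two mentions share three of five distinct words and are merged, while the third stays in its own cluster.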