Modeling Multilingual Unrestricted Coreference in OntoNotes

CoNLL-2012, to be held jointly with EMNLP in conjunction with ACL (Jeju, Korea, 12-14 July 2012), will continue the tradition of including a shared task for natural language learning systems. The 2012 shared task will target the modeling of coreference resolution for multiple languages. The importance of coreference resolution for the entity/event detection task, namely identifying all mentions of entities and events in text and clustering them into equivalence classes, has been well recognized in the natural language processing community. Automatic identification of coreferring entities and events in text has been an uphill battle for several decades, partly because it can require world knowledge that is not well defined and partly owing to the lack of substantial annotated data.

The OntoNotes project -- a collaborative effort between BBN Technologies, University of Colorado, University of Southern California (ISI), University of Pennsylvania and Brandeis University -- has created a large-scale, accurate multilingual corpus for general anaphoric coreference that covers entities and events not limited to noun phrases or a limited set of entity types. The Linguistic Data Consortium (LDC) has agreed to make it freely available to the research community. The coreference layer in OntoNotes constitutes one part of a multi-layer, integrated annotation of shallow semantic structure in text with high inter-annotator agreement. In addition to coreference, this data is also tagged with syntactic trees, high-coverage verb and some noun propositions, partial verb and noun word senses, and a rich set of named entity types.

Modeling multilingual unrestricted coreference in the OntoNotes data is the shared task for CoNLL-2012. This is an extension of the CoNLL-2011 shared task and involves automatic anaphoric mention detection and coreference resolution across three languages -- English, Chinese and Arabic -- using the OntoNotes v5.0 corpus, given predicted information on the syntax, proposition, word sense and named entity layers. The training data will contain both gold standard and predicted annotations, but only predicted annotations will be provided with the test material. The English and Chinese portions each comprise roughly one million words from newswire, magazine articles, broadcast news, broadcast conversations, web data and conversational speech. The English corpus also contains a further 200k words from an English translation of the New Testament. The Arabic portion is smaller, comprising 300k words of newswire articles.

The evaluation will follow CoNLL-2011's strategy. The score for each language will be determined by computing the unweighted average across the MUC, BCUBED, and CEAF metrics. The introduction of two new languages in the shared task offers a unique opportunity to carry out research in new contexts of coreference resolution and derive more general findings that go beyond the monolingual (English) setting. Given the multilingual focus of this shared task, the winner will be determined by aggregating the scores across all languages. Although participants are not required to work with all three languages, they are strongly encouraged to work with at least two, one of which may be English. Systems will be penalized with a null score for any language that is left out. In addition, the review process of the shared task will favorably consider papers reporting experiments in a multilingual setting.
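
The scoring scheme described above can be illustrated with a minimal sketch. It is not the official scorer: the function names, the dictionary layout and the example numbers are hypothetical, and the announcement does not say how per-language scores are aggregated, so a plain mean over the three languages is assumed here.

```python
# Illustrative sketch only; not the official CoNLL-2012 scorer.
LANGUAGES = ("english", "chinese", "arabic")
METRICS = ("muc", "bcubed", "ceaf")


def language_score(metric_scores):
    """Unweighted average of the MUC, BCUBED and CEAF scores for one language."""
    return sum(metric_scores[m] for m in METRICS) / len(METRICS)


def overall_score(results):
    """Aggregate per-language scores across all three languages.

    Languages missing from `results` receive a null (zero) score.
    ASSUMPTION: aggregation is a plain mean over the three languages.
    """
    per_language = {
        lang: language_score(results[lang]) if lang in results else 0.0
        for lang in LANGUAGES
    }
    overall = sum(per_language.values()) / len(LANGUAGES)
    return overall, per_language


# Hypothetical system that submitted English and Chinese output only.
example = {
    "english": {"muc": 60.0, "bcubed": 70.0, "ceaf": 50.0},
    "chinese": {"muc": 60.0, "bcubed": 60.0, "ceaf": 60.0},
}
overall, per_language = overall_score(example)
```

In this hypothetical case the omitted Arabic track contributes zero, which shows how leaving a language out lowers the aggregate even when the submitted languages score well.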


Sameer Pradhan (Chair), Raytheon BBN Technologies, Cambridge, MA
Alessandro Moschitti, University of Trento, Italy
Nianwen Xue, Brandeis University, Waltham, MA

Advisory Committee

Mitchell Marcus, University of Pennsylvania, Philadelphia, PA
Martha Palmer, University of Colorado, Boulder, CO
Lance Ramshaw, Raytheon BBN Technologies, Cambridge, MA
Ralph Weischedel, Raytheon BBN Technologies, Cambridge, MA