OntoNotes DB Tool

This is the first time in CoNLL that the data has many layers of annotation and significant supporting metadata, so it is available in multiple formats. The first is the format that OntoNotes has released: a separate file for each document and layer combination, organized hierarchically. The second mimics the traditional CoNLL-style column format. We do not have any tools for researchers to manipulate the CoNLL-style format, but for the standard OntoNotes format we have developed a Python API that makes the data easier to read and manipulate, and we are making it available as part of the CoNLL support software.
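Since no tool is provided for the CoNLL-style column format, here is a minimal sketch of how such a file could be read by hand. It is not part of the OntoNotes DB Tool. It assumes one token per line with whitespace-separated columns, a blank line between sentences, and lines starting with "#" marking document boundaries; the function name read_conll_sentences and the sample rows are invented for illustration, and the actual column layout of the released files may differ.

```python
def read_conll_sentences(lines):
    """Yield sentences, each a list of token rows (lists of column strings).

    Assumptions (not confirmed by the task description): columns are
    separated by whitespace, sentences by blank lines, and lines that
    start with '#' are boundary markers rather than tokens.
    """
    sentence = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            # A blank line or a boundary marker closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(line.split())
    # Emit a trailing sentence that was not followed by a blank line.
    if sentence:
        yield sentence


# Invented sample rows, used only to exercise the reader.
sample = """#begin document
doc1 0 0 John NNP (0)
doc1 0 1 slept VBD -

doc1 0 0 He PRP (0)
#end document
""".splitlines()

sentences = list(read_conll_sentences(sample))
words = [[row[3] for row in sent] for sent in sentences]
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.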

Documentation

The design of this API was discussed in the following article:

There are also several examples of its use in the following tutorial, presented at HLT/NAACL in 2009:

The Python API can be downloaded below. Its documentation can also be downloaded or browsed as HTML.

This is still a work in progress, and we would welcome your feedback or comments.

OntoNotes DB Tool v0.999b r6778 (Beta)

Scorer

We will follow the scoring strategy used in the "SemEval Task: Coreference Resolution in Multiple Languages". There is an ongoing debate in the community about how the performance of a coreference system should be evaluated. The MUC metric was the standard for several years. Since then, three more metrics have been proposed: B-CUBED, CEAF, and most recently BLANC. We will score the output of the CoNLL coreference systems using all of these metrics. One of them (not yet finalized) will be used to determine the winner.
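To make one of these metrics concrete, here is a minimal sketch of B-CUBED in its basic form. It is not the official scorer. It assumes the key and the response cluster exactly the same set of mentions, and the function name b_cubed is invented for illustration; the official scorer may treat mentions that appear on only one side differently. For each mention, precision is the share of its response cluster that also belongs to its key cluster, recall is the share of its key cluster that also belongs to its response cluster, and both are averaged over all mentions.

```python
def b_cubed(key, response):
    """Return (precision, recall, f1) for two clusterings of the same mentions.

    key and response are iterables of sets of hashable mention ids.
    Assumption: both partition the identical set of mentions.
    """
    key_of = {m: frozenset(c) for c in key for m in c}
    resp_of = {m: frozenset(c) for c in response for m in c}
    if set(key_of) != set(resp_of):
        raise ValueError("this sketch assumes identical mention sets")
    if not key_of:
        raise ValueError("no mentions to score")

    n = len(key_of)
    # Per-mention overlap between its key cluster and its response cluster.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in key_of) / n
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in key_of) / n
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: the response moves mention "c" into the wrong entity.
key = [{"a", "b", "c"}, {"d", "e"}]
response = [{"a", "b"}, {"c", "d", "e"}]
p, r, f = b_cubed(key, response)
```

In this example both precision and recall work out to 11/15, since the single misplaced mention is penalized once on each side.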