Word Sense Disambiguation Evaluation Sets


One of the research areas that the MuchMore project focused on is word sense disambiguation, an important enabling task in concept-based, cross-lingual information access. Unfortunately, test sets for evaluating sense disambiguation are scarce, particularly for languages other than English and even more so for specific domains such as medicine. Because MuchMore focuses on both English and German in the medical domain, the project developed its own evaluation sets to test different disambiguation methods. The sets consist of disambiguated instances (with GermaNet and MeSH annotations) selected from the German MuchMore Springer corpus.


GermaNet corpus in MuchMore format (see deliverable D4.1)

Please note: in this version of the corpus, only the instances in the evaluation set are annotated with GermaNet senses.

GermaNet Evaluation Set


MeSH corpus in MuchMore format

Please note: in this version of the corpus, all instances are annotated with MeSH concepts.

MeSH Evaluation Set


