Work on WP5 includes developing a manual annotation tool
for lexical semantic tagging and using this tool to build
a lexical sample corpus, drawn from the German Springer
corpus, for the evaluation of sense disambiguation.
This lexical sample corpus consists of 100 occurrences of
each of 30 ambiguous words, with every occurrence manually
annotated with the appropriate sense from EuroWordNet
(specifically, GermaNet).
The annotation tool is currently being adapted to also
allow manual annotation with UMLS concepts (specifically,
MeSH2001).
In addition, tools were developed and several experiments
were conducted on domain-specific sense selection, as
described in the paper "Ranking and Selecting Synsets by
Domain Relevance". The methods described there could be
used to pre-select the senses relevant to a domain and
thereby reduce the effort required for the disambiguation
step itself.
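The exact relevance measure from that paper is not reproduced here. As a minimal sketch of how a domain relevance score could drive sense pre-selection, the code below assumes a swapped-in, hypothetical score: a word's relevance is the smoothed ratio of its relative frequency in a domain corpus to its relative frequency in a general corpus, and a synset's relevance is the mean over its lemmas. The `Synset` type, the sense IDs, the threshold, and the toy counts are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    sense_id: str            # hypothetical sense identifier
    lemmas: tuple[str, ...]  # member words of the synset


def term_relevance(term, domain_counts, general_counts, smoothing=1.0):
    """Hypothetical score: additively smoothed ratio of a term's relative
    frequency in the domain corpus to that in a general reference corpus.
    Values above 1 mean the term is over-represented in the domain."""
    vocab = len(set(domain_counts) | set(general_counts))
    d_total = sum(domain_counts.values()) + smoothing * vocab
    g_total = sum(general_counts.values()) + smoothing * vocab
    p_domain = (domain_counts.get(term, 0) + smoothing) / d_total
    p_general = (general_counts.get(term, 0) + smoothing) / g_total
    return p_domain / p_general


def rank_synsets(synsets, domain_counts, general_counts):
    """Score each synset by the mean relevance of its lemmas, highest first."""
    scored = [
        (s, sum(term_relevance(l, domain_counts, general_counts)
                for l in s.lemmas) / len(s.lemmas))
        for s in synsets
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def preselect(synsets, domain_counts, general_counts, threshold=1.0):
    """Keep only the senses whose domain relevance exceeds the threshold."""
    return [s.sense_id
            for s, score in rank_synsets(synsets, domain_counts, general_counts)
            if score > threshold]


# Toy counts (invented): a medical domain corpus vs. a general corpus.
domain = Counter({"zelle": 50, "körperzelle": 10, "patient": 40})
general = Counter({"zelle": 20, "körperzelle": 1, "gefängniszelle": 5, "patient": 10})
senses = [
    Synset("zelle_prison", ("zelle", "gefängniszelle")),
    Synset("zelle_biology", ("zelle", "körperzelle")),
]
selected = preselect(senses, domain, general)
```

With these toy counts only the biological sense of "Zelle" scores above 1 and survives pre-selection, so an annotator or disambiguation system would only need to consider that sense for medical text. The threshold trades recall of rare domain senses against the size of the remaining sense inventory.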
At the same time, clustering and visualization methods were
developed for the empirical investigation of sense ambiguity.
Such an investigation can reveal: 1. which clusters of
occurrences in the document collection correspond to which
senses defined in the lexical resource; 2. which sense
distinctions made in the lexical resource are redundant in
practice; and 3. whether a cluster corresponds to a word
sense that is not yet covered by the lexical resource.
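The three kinds of findings listed above can be illustrated with a small sketch that compares a clustering of occurrences against sense annotations. This is not the clustering or visualization method developed here; it assumes that occurrences have already been clustered by some method, that a sample of them carries sense labels, and that annotators can mark occurrences no listed sense fits with a hypothetical `NO_SENSE` label. Cluster IDs, sense labels, and the purity threshold are invented.

```python
from collections import Counter, defaultdict

UNCOVERED = "NO_SENSE"  # hypothetical label: no sense in the resource fits


def analyse_clusters(assignments, min_purity=0.5):
    """assignments: iterable of (cluster_id, sense_label) pairs, one per
    annotated occurrence. Returns (mapping, redundant, uncovered)."""
    by_cluster = defaultdict(Counter)  # cluster -> counts of sense labels
    by_sense = defaultdict(Counter)    # sense -> counts of clusters
    for cluster, sense in assignments:
        by_cluster[cluster][sense] += 1
        by_sense[sense][cluster] += 1

    # 1. Map each sufficiently pure cluster to its majority sense.
    mapping, uncovered = {}, []
    for cluster, senses in by_cluster.items():
        sense, count = senses.most_common(1)[0]
        if count / sum(senses.values()) < min_purity:
            continue
        if sense == UNCOVERED:
            uncovered.append(cluster)  # 3. candidate sense missing from the resource
        else:
            mapping[cluster] = sense

    # 2. Senses whose occurrences mostly fall into the same cluster are
    #    candidates for distinctions that are redundant in practice.
    dominant = defaultdict(list)
    for sense, clusters in by_sense.items():
        if sense != UNCOVERED:
            dominant[clusters.most_common(1)[0][0]].append(sense)
    redundant = [sorted(group) for group in dominant.values() if len(group) > 1]

    return mapping, redundant, uncovered


# Toy data (invented): three clusters over annotated occurrences of one word.
assignments = ([("c1", "s1")] * 8 + [("c2", "s2")] * 5 + [("c2", "s3")] * 4
               + [("c3", UNCOVERED)] * 6 + [("c3", "s1")])
mapping, redundant, uncovered = analyse_clusters(assignments)
```

On the toy data, cluster `c1` maps to sense `s1`, senses `s2` and `s3` share cluster `c2` and are flagged as a possibly redundant distinction, and cluster `c3` is flagged as a candidate for a sense the resource does not yet cover. Majority vote and purity are used only because they are the simplest cluster-to-label comparison; any such flag would still need manual inspection.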