
WP5: Sense Disambiguation

 

Work on WP5 includes the development of a manual annotation tool for lexical semantic tagging and the use of this tool to construct a lexical sample corpus, based on the German Springer corpus, for the evaluation of sense disambiguation. The lexical sample consists of 100 occurrences of each of 30 ambiguous words drawn from the German Springer corpus, each manually annotated with the appropriate sense in EuroWordNet (i.e. GermaNet). The annotation tool is currently being adapted to also allow manual annotation with UMLS (i.e. MeSH2001) concepts.

In addition, tools were developed and several experiments conducted for domain-specific sense selection, as described in a paper on Ranking and Selecting Synsets by Domain Relevance. The methods described in that paper could be used to pre-select relevant senses and thereby reduce the effort required for disambiguation proper.
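As an illustration of pre-selecting senses by domain relevance, the following is a minimal sketch only. It assumes that a synset's relevance can be approximated by the relative frequency of its member terms in a domain corpus; this scoring, the function names `rank_synsets` and `select_synsets`, the threshold, and the example data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_synsets(synsets, domain_tokens):
    """Rank synsets by the summed relative frequency of their terms.

    synsets: dict mapping a synset id to a list of member terms.
    domain_tokens: list of tokens from a domain corpus.
    The scoring is an assumed stand-in, not the paper's measure.
    """
    counts = Counter(token.lower() for token in domain_tokens)
    total = sum(counts.values()) or 1  # avoid division by zero
    scored = []
    for synset_id, terms in synsets.items():
        score = sum(counts[term.lower()] for term in terms) / total
        scored.append((synset_id, score))
    # Highest score first; ties broken alphabetically by synset id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def select_synsets(ranked, threshold):
    """Keep only synsets whose score reaches the (assumed) threshold."""
    return [synset_id for synset_id, score in ranked if score >= threshold]


# Hypothetical example: two senses of German "Zelle" in a medical text.
synsets = {
    "zelle.biology": ["Zelle", "Blutzelle"],
    "zelle.room": ["Zelle", "Klosterzelle"],
}
domain_tokens = ["Die", "Blutzelle", "teilt", "sich", "Zelle", "Zelle"]

ranked = rank_synsets(synsets, domain_tokens)
selected = select_synsets(ranked, threshold=0.4)
print(ranked)
print(selected)
```

In this toy example the ambiguous word itself counts towards both senses, so only the additional synset members separate them. A real setting would need a much larger domain corpus and a carefully chosen threshold.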

At the same time, clustering and visualization methods were developed for the empirical investigation of sense ambiguity. Such an investigation can reveal: 1. which clusters of occurrences in the document collection correspond to which senses as defined in the lexical resource; 2. which sense distinctions made in the lexical resource are redundant in practice; and 3. whether a cluster corresponds to a word sense that is not yet covered by the lexical resource.
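The three kinds of findings listed above can be sketched as a comparison between cluster assignments and manually annotated senses. This is a minimal sketch under stated assumptions: each occurrence already has a cluster id, occurrences that fit no sense in the lexical resource are labelled `None`, and a cluster is mapped to its majority sense. The function name `compare_clusters_to_senses` and the example data are hypothetical, and the clustering step itself is not shown.

```python
from collections import Counter, defaultdict


def compare_clusters_to_senses(cluster_ids, sense_labels):
    """Compare occurrence clusters with annotated senses.

    cluster_ids: one cluster id per occurrence.
    sense_labels: one sense label per occurrence, or None when no
        sense in the lexical resource fits (an assumed convention).
    Returns (mapping, redundant, uncovered).
    """
    by_cluster = defaultdict(Counter)
    for cluster, sense in zip(cluster_ids, sense_labels):
        by_cluster[cluster][sense] += 1

    mapping = {}    # 1. cluster -> majority sense
    uncovered = []  # 3. clusters dominated by occurrences with no sense
    for cluster in sorted(by_cluster):
        majority_sense, _ = by_cluster[cluster].most_common(1)[0]
        if majority_sense is None:
            uncovered.append(cluster)
        else:
            mapping[cluster] = majority_sense

    # 2. senses that never dominate a cluster are candidates for
    #    distinctions that are redundant in practice.
    annotated = {sense for sense in sense_labels if sense is not None}
    redundant = sorted(annotated - set(mapping.values()))
    return mapping, redundant, uncovered


# Hypothetical example with three clusters and three listed senses.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
sense_labels = ["a", "a", "b", "c", "c", "c", None, None]

mapping, redundant, uncovered = compare_clusters_to_senses(
    cluster_ids, sense_labels
)
print(mapping)    # {0: 'a', 1: 'c'}
print(redundant)  # ['b']
print(uncovered)  # [2]
```

Majority voting is the simplest possible mapping. It flags candidates for inspection rather than deciding anything, which fits the exploratory purpose described above.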

 

 
Last modified: December 2001