Disambiguation
Methods and Evaluation: Four methods were developed
within the MuchMore project:
1. The bilingual method takes advantage of having a translated
corpus, because knowing the translation of an ambiguous word
can be enough to determine its sense.
2. The dictionary-based method uses relations between terms,
as deduced from UMLS, to determine which sense is being used
in a particular instance.
3. The domain-specific method uses the fact that certain meanings
of general terms are more significant than others in specific
domains (for example, in the medical domain, "operation" is far
more likely to refer to a surgical operation than a military
operation).
4. The instance-based learning method uses a machine-learning
technique that we applied to unsupervised training in word-sense
disambiguation.
Evaluation of these methods showed that
high-precision, broad-coverage disambiguation of medical
documents can be achieved without the costly annotation
of many training examples. The best results for precision
ranged from 74% (English) to 79% (German), achieved by the
UMLS related terms method on the UMLS evaluation corpus,
and from 77% to 99%, achieved by the Domain Specific Sense method
on the GermaNet evaluation corpus (although with low coverage).
The best results for coverage ranged from 67%, achieved by
the Instance-Based Learning method on the GermaNet evaluation
corpus, to 83% (English) and 87% (German), achieved by the
UMLS related terms method on the whole Springer corpus.
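The figures above depend on how precision and coverage are defined, which this summary does not spell out. The following is a minimal sketch, assuming the common convention that coverage is the share of ambiguous instances for which a method proposes a sense, and precision is the share of those proposals that agree with the gold annotation. The function name `precision_and_coverage` and the toy data are hypothetical and are not taken from the MuchMore tools.

```python
from typing import Optional, Sequence, Tuple


def precision_and_coverage(
    predicted: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float]:
    """Return (precision, coverage) for one disambiguation run.

    predicted[i] is the sense a method chose for instance i, or None
    if the method abstained. gold[i] is the annotated sense.
    Assumed definitions (not taken from the project reports):
      coverage  = attempted instances / all instances
      precision = correct instances / attempted instances
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # Keep only the instances where the method committed to a sense.
    attempted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in attempted if p == g)
    coverage = len(attempted) / len(gold) if gold else 0.0
    precision = correct / len(attempted) if attempted else 0.0
    return precision, coverage


# Toy data: five instances of "operation", two left undecided.
gold = ["surgical", "surgical", "military", "surgical", "surgical"]
predicted = ["surgical", None, "surgical", "surgical", None]
precision, coverage = precision_and_coverage(predicted, gold)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
# prints: precision=0.67 coverage=0.60
```

Under these definitions, a method that abstains on uncertain instances can reach high precision at low coverage, which is the pattern reported for the Domain Specific Sense method.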
Semantic Tagging System: Development of
an integrated semantic tagging and disambiguation system
as part of the MuchMore tools for linguistic and semantic
annotation. The system integrates two DFKI methods (domain-specific
sense; instance-based learning) for EuroWordNet
disambiguation and two CSLI methods (bilingual; collocation)
for UMLS disambiguation.
Sense Discovery: Development of tools
for clustering and cross-lingual visualisation of word distribution
to analyse which clusters carry meaningful information in
specific domains and may be interpreted as domain-specific
word senses.
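The idea of grouping occurrences of a word so that each group can be read as a candidate sense can be illustrated with a small sketch. This summary does not say which clustering algorithm or features the MuchMore tools used, so everything below is an assumption: bag-of-words context vectors, cosine similarity, and a greedy single-pass clustering chosen only for brevity. The names `cosine` and `cluster_contexts`, the threshold value, and the toy contexts are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(
    contexts: List[List[str]], threshold: float = 0.3
) -> List[List[int]]:
    """Greedy single-pass clustering of context word lists.

    Each context joins the first cluster whose accumulated word counts
    are similar enough; otherwise it starts a new cluster. Returns the
    clusters as lists of context indices.
    """
    centroids: List[Counter] = []
    clusters: List[List[int]] = []
    for i, words in enumerate(contexts):
        vec = Counter(words)
        for centroid, members in zip(centroids, clusters):
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # add this context's counts to the cluster
                members.append(i)
                break
        else:
            centroids.append(vec)
            clusters.append([i])
    return clusters


# Toy contexts around the ambiguous word "operation".
contexts = [
    ["patient", "surgery", "anaesthesia"],
    ["troops", "army", "border"],
    ["surgery", "patient", "recovery"],
    ["army", "troops", "command"],
]
clusters = cluster_contexts(contexts)
print(clusters)
# prints: [[0, 2], [1, 3]]
```

In this toy run the two resulting groups correspond to a surgical and a military reading. Whether a cluster found in real data is a meaningful domain-specific sense still has to be judged by inspection.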