|
Work on relation
extraction comprises three strands: development of methods for filtering
automatically annotated relations; extraction of new
instances for known relations; and extraction of novel
relations.
Relation Filtering and Extraction of New Relation
Instances: During this period, methods for filtering
UMLS-based relations and for extracting new relation instances
were developed. To evaluate these methods in information
retrieval, the Springer corpus was annotated in several
versions, combining UMLS-based relation annotation with
filtering on the one hand and extraction of new instances
on the other. Retrieval was tested using queries processed
with equivalent settings. In addition, for evaluation
purposes, the queries were manually annotated by medical
experts with a smaller set of 15 semantic relations and
used in retrieval experiments. Results showed that relation
filtering had either no effect or a negative effect on
monolingual document retrieval, confirming the previously
observed tendency that query expansion techniques generally
yield better results than query specification. Using the
document collection enriched with newly extracted relation
instances resulted in a slightly improved recall and average
precision. Finally, when the manually annotated queries were
run against all versions of the annotated corpus, the best
results were obtained on the version containing only new
relation instances, indicating that the relations we extract
correspond more closely to those perceived by experts than
those provided by UMLS do.
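
For reference, the two retrieval measures mentioned above can be computed per query as in the following minimal sketch. It assumes the standard definitions of recall and (non-interpolated) average precision; the function names and document identifiers are illustrative and are not taken from the evaluation setup actually used.

```python
def recall(ranked, relevant):
    """Fraction of the relevant documents that appear in the ranked list."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document occurs.

    Assumes every document id occurs at most once in `ranked`.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


# Invented example: one query, four retrieved documents, three relevant ones.
ranked_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d5"}
query_recall = recall(ranked_docs, relevant_docs)
query_ap = average_precision(ranked_docs, relevant_docs)
```

Averaging such per-query values over all queries gives a single figure per corpus version, which is one common way to compare the annotation variants.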
Extraction of Novel Relations: In addition
to the evaluation work described above, further experiments
were undertaken to learn relations on the basis of concept
co-occurrences and context features. A controlled data set
consisting of 50 pairs of concept classes was constructed,
where the pairs were known to represent either relation
A (treats) or B (location of). The experiments were aimed
primarily at establishing the most reliable linguistic context
features that may serve as attributes in relation classification.
Among the scenarios tested were: all tokens, nouns, verbs,
prepositions, neighboring concepts (CUIs), neighboring semantic
types (TUIs), and combinations of the above. Results indicate
that the combination of nouns and verbs is best suited to
describe the context features of a relation; however, a larger
data set will have to be constructed to enable a systematic
evaluation of the learning algorithms and settings.
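
The feature scenarios listed above can be illustrated with a minimal sketch that selects context tokens by part of speech. It assumes the context between two concept mentions is already tokenized and tagged with Penn Treebank-style tags; the scenario names, the tag prefixes, and the example sentence fragment are hypothetical and do not reproduce the actual experimental setup.

```python
from collections import Counter

# Hypothetical scenario names mapped to POS-tag prefixes (Penn Treebank style).
SCENARIOS = {
    "all_tokens": None,            # None means: keep every token
    "nouns": ("NN",),
    "verbs": ("VB",),
    "prepositions": ("IN",),
    "nouns_verbs": ("NN", "VB"),
}


def context_features(tagged_context, scenario):
    """Return a bag of lower-cased tokens selected by the given scenario."""
    prefixes = SCENARIOS[scenario]
    return Counter(
        token.lower()
        for token, tag in tagged_context
        if prefixes is None or tag.startswith(prefixes)
    )


# Invented context between two concept mentions, e.g. "<drug> ... <disease>".
example = [("is", "VBZ"), ("used", "VBN"), ("for", "IN"),
           ("the", "DT"), ("treatment", "NN"), ("of", "IN")]

nouns_verbs = context_features(example, "nouns_verbs")
prepositions = context_features(example, "prepositions")
```

Each resulting bag of tokens could then serve as the attribute vector for one concept pair in a relation classifier; neighboring CUIs or TUIs could be added as further features in the same way.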
|
|