
WP7.2: Relation Extraction: State of the Art

 


Work on relation extraction consists of three tasks: development of methods for filtering automatically annotated relations; extraction of new instances of known relations; and extraction of novel relations.

Relation Filtering and Extraction of New Relation Instances: During this period, methods were developed for filtering UMLS-based relations and for extracting new relation instances. To evaluate these methods in Information Retrieval, the Springer corpus was annotated in several versions, combining UMLS-based relation annotation with filtering on the one hand and with extraction of new instances on the other. Retrieval was tested using queries processed with equivalent settings. In addition, for evaluation purposes medical experts manually annotated the queries with a smaller set of 15 semantic relations, and these queries were used in retrieval experiments. Results showed that relation filtering had either no effect or a negative effect on monolingual document retrieval, confirming the previously observed tendency that query expansion techniques generally yield better results than query specification. Using the document collection enriched with newly extracted relation instances slightly improved recall and average precision. Finally, the manually annotated queries, run against all versions of the annotated corpus, worked best on the version containing only new relation instances, indicating that the relations we extract correspond more closely to those perceived by experts than the relations provided by UMLS do.
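
The two measures reported above, recall and average precision, can be illustrated with a minimal sketch. It assumes the common uninterpolated definition of average precision over a ranked result list; the exact variant used in the experiments is not stated here. The function names and the document identifiers are hypothetical and serve only as an example.

```python
def recall(ranked, relevant):
    """Fraction of the relevant documents that appear in the ranked list."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Uninterpolated average precision (an assumed variant).

    Precision is taken at the rank of each relevant document retrieved,
    and the sum is divided by the total number of relevant documents.
    Assumes the ranked list contains no duplicate documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked result list and relevance judgements for one query.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}

r = recall(ranked, relevant)
ap = average_precision(ranked, relevant)
```

Comparing these values across the differently annotated corpus versions, averaged over all queries, is one way the effect of filtering or of new relation instances could be measured.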

Extraction of Novel Relations: In addition to the evaluation work described above, further experiments were undertaken to learn relations on the basis of concept co-occurrences and context features. A controlled data set consisting of 50 pairs of concept classes was constructed, where each pair was known to represent either relation A (treats) or relation B (location of). The experiments were aimed primarily at establishing the most reliable linguistic context features that may serve as attributes in relation classification. Among the scenarios tested were: all tokens, nouns, verbs, prepositions, neighboring concepts (CUIs), neighboring semantic types (TUIs), and combinations of the above. Results indicate that the combination of nouns and verbs is best suited to describe the context features of a relation. However, a larger data set will have to be constructed to enable a systematic evaluation of the learning algorithms and settings.
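
The part-of-speech based scenarios can be pictured as different filters over the tokens surrounding a concept pair. The following is a minimal sketch under stated assumptions: the coarse tags (NOUN, VERB, PREP), the scenario names, and the example sentence are hypothetical, and the CUI and TUI scenarios are left out because they would require concept annotation. It is not the feature extractor used in the experiments.

```python
from collections import Counter

# Hypothetical scenario table: each scenario maps to the set of POS tags
# it keeps. None means that every token is kept.
SCENARIOS = {
    "all": None,
    "nouns": {"NOUN"},
    "verbs": {"VERB"},
    "prepositions": {"PREP"},
    "nouns+verbs": {"NOUN", "VERB"},
}


def context_features(tagged_tokens, scenario):
    """Count lower-cased context tokens whose POS tag the scenario allows."""
    allowed = SCENARIOS[scenario]
    return Counter(
        token.lower()
        for token, pos in tagged_tokens
        if allowed is None or pos in allowed
    )


# Hypothetical tagged context for a pair of co-occurring concepts.
example = [
    ("Aspirin", "NOUN"),
    ("relieves", "VERB"),
    ("pain", "NOUN"),
    ("in", "PREP"),
    ("patients", "NOUN"),
]

features = context_features(example, "nouns+verbs")
```

The resulting counts could serve as attribute vectors for a relation classifier, with one vector per concept pair and scenario.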


 
 
Last modified: July 2003