
WP7: Term and Relation Extraction

 

In WP7.1 (Multilingual Terminology Extraction), each partner involved in this work package worked on adapting its bilingual term extraction tools to project-specific requirements. This includes adaptation to the German/English language pair, integration of a bilingual thesaurus, sentence alignment of the Springer corpus, the development of domain-specific morphological resources, and the development of new tools for bilingual terminology extraction from comparable corpora. In addition, WP7.1 covers the assessment of thesauri in different languages; some of the tools developed for bilingual terminology extraction will be adapted to this end.

Developments in WP7.2 (Relation Extraction) include a quantitative and qualitative evaluation of the distribution and usefulness of UMLS-based semantic relations in the Springer corpus. The evaluation showed that most of these relations were not very useful. Further work will therefore concentrate on extracting novel relations through co-occurrence analysis of concepts within abstracts and individual sentences. The extraction and tagging of semantic relations will also be improved through the use of grammatical relations and sense disambiguation.
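The co-occurrence analysis mentioned above can be illustrated with a minimal sketch. It assumes each sentence (or abstract) has already been reduced to a list of concept identifiers. The identifiers, the function name `cooccurrence_counts` and the toy data are hypothetical and are not taken from the project's tools.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(units):
    """Count how often each unordered pair of concepts shares a unit.

    `units` is an iterable of concept lists, one per sentence or abstract.
    """
    counts = Counter()
    for concepts in units:
        # De-duplicate within a unit and sort so (a, b) and (b, a) coincide.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


# Hypothetical concept identifiers, one list per sentence.
sentences = [
    ["C_fracture", "C_femur", "C_surgery"],
    ["C_fracture", "C_femur"],
    ["C_surgery", "C_anesthesia"],
]

counts = cooccurrence_counts(sentences)
# Pairs sorted by frequency; frequent pairs are candidates for a novel relation.
ranked = counts.most_common()
```

Running the same function over abstracts instead of sentences only changes the unit of analysis. Raw counts favour frequent concepts, so in practice an association measure would probably be applied on top of them.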

With respect to grammatical relation tagging, an evaluation corpus was developed that covers around 600 sentences from the German part of the Springer corpus. The sentences are manually annotated with grammatical relations such as subject, object, indirect object and various PP-argument roles.
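One common way to use such a manually annotated corpus is to compare a tagger's output against it and report precision and recall. The sketch below assumes that each relation is represented as a (sentence id, relation, head, dependent) tuple. This representation, the function name `score_relations` and the example tuples are illustrative assumptions, not the project's actual annotation format.

```python
def score_relations(gold, predicted):
    """Return (precision, recall) of predicted relations against gold ones."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: (sentence id, relation, head, dependent).
gold = [
    (1, "subject", "zeigt", "Studie"),
    (1, "object", "zeigt", "Ergebnis"),
    (2, "subject", "erhielt", "Patient"),
    (2, "object", "erhielt", "Therapie"),
]
predicted = [
    (1, "subject", "zeigt", "Studie"),
    (2, "subject", "erhielt", "Therapie"),  # wrong dependent
]

precision, recall = score_relations(gold, predicted)
```

Exact tuple matching is strict: a relation with the right label but the wrong dependent counts as an error on both measures.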

 

 

 
last modified, december 2001