Annual Report 2001


Much of the work in this period went into compiling several reports that would further define the scope and purpose of the project. The consortium first produced a State of the Art report on cross-lingual information retrieval (CLIR) in general and on concept-based methods in the medical domain in particular. Based on this report, it formulated the User Requirements for a concept-based medical CLIR system and defined a Performance Testing Plan for evaluating such a system.

In parallel, relevant Medical Corpora were identified, collected and prepared for further processing. To make annotated data easy to exchange, an XML-based Annotation Format was defined. Using this format, Corpus Annotation was started with tools for shallow processing (i.e., PoS tagging, morphological analysis and phrase recognition, or chunking) and for semantic annotation (based on UMLS and EuroWordNet). An experimental prototype was also set up that provides access to semantically annotated abstracts from scientific medical journals.

To support a meaningful Performance Evaluation comparing different CLIR methods, and combinations of them, in the medical domain, the project began developing a Test Collection of medical documents with corresponding relevance assessments.

Research and development work also started on Bilingual Term Extraction, Sense Disambiguation, and Relation Extraction.

MuchMore Project Developments: July 2000 - October 2001

