A large part of the work in this period consisted of compiling
several reports to further define the scope and purpose of the
project. The consortium therefore first produced
a State of the Art report on cross-lingual information
retrieval (CLIR) in general and on concept-based methods in the
medical domain in particular. On the basis of this report, User
Requirements for a concept-based, medical CLIR system were
formulated, and a Performance Testing Plan was defined for
evaluating such a system.
Parallel to these developments, relevant Medical Corpora
were identified, collected and prepared for further processing.
To facilitate an easy exchange of annotated data, an XML-based
Annotation Format was defined, on the basis of which Corpus
Annotation was initiated with tools for shallow processing
(PoS tagging, morphological analysis, and phrase recognition,
i.e. chunking) and semantic annotation (based on UMLS and EuroWordNet).
An experimental prototype was set up that gives access to semantically
annotated scientific medical journal abstracts.
In order to conduct a relevant Performance Evaluation,
comparing different CLIR methods and combinations of such methods
in the medical domain, the project initiated the development of
a Test Collection of medical documents with corresponding
Also, research and development work was started in the areas
of Bilingual Term Extraction, Sense Disambiguation,
and Relation Extraction.
MuchMore Project Developments: July 2000 - October 2001