Work within WP3 has been completed. Deliverable D3.2
(Performance Testing Plan) has been submitted to the
Commission. It outlines the methodology and the tools that
will be used to evaluate the effectiveness of the prototypes
that are being built. Both large, so-called "TREC-style"
tests for near-final or final prototypes and simpler tests
for intermediate prototypes will be employed. It is planned
to use well-established and proven measures for effectiveness,
such as precision and recall, as well as known-item searches
and overlap measures. These measures are well understood
today, thanks to extensive prior research, and using them
keeps our results comparable with similar evaluations.
Deliverable D3.1 (Test Collection) originally included only
the publicly available OHSUMED test collection, for which
the corresponding set of queries
was translated from English into German. However, to
evaluate on a truly bilingual corpus (one in which both
queries and documents are available in two languages), the
test collection has been extended with a MUCHMORE-specific
bilingual test collection based on a parallel corpus
of scientific medical journal abstracts obtained through
the Springer Link web site. Remaining work on this corpus
(pooling, relevance assessments) will be carried out in the
context of work packages WP8 (Cross-Lingual Information
Access) and WP9 (Performance Evaluation).
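
As an illustration of the effectiveness measures named in the testing plan, the following minimal Python sketch computes set-based precision and recall, a known-item score, and an overlap score for a single query. It is not taken from D3.2: the function names and document identifiers are hypothetical, the known-item score is assumed to be the reciprocal rank of the target document, and overlap is assumed to be the Jaccard coefficient of two result sets. The plan itself may define these measures differently.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def reciprocal_rank(ranking, known_item):
    """Known-item search score (assumption: 1 / rank of the target)."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == known_item:
            return 1.0 / rank
    return 0.0  # target document was not retrieved


def overlap(run_a, run_b):
    """Overlap of two result sets (assumption: Jaccard coefficient)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical document identifiers, for illustration only.
    retrieved = ["d1", "d2", "d3", "d4"]
    relevant = ["d2", "d4", "d7"]
    print(precision_recall(retrieved, relevant))
    print(reciprocal_rank(retrieved, "d2"))
    print(overlap(["d1", "d2"], ["d2", "d3"]))
```

In a cross-lingual setting, the overlap function could for instance compare the result set of an English query with that of its German translation, while precision and recall require the relevance assessments that are still to be produced for the parallel corpus.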