During this period, CMU refined the hierarchical classification
system HkNN (using a nearest-neighbor scheme) designed for
MuchMore as an “interlingual” approach to translingual
retrieval via a universal concept taxonomy – the
Medical Subject Headings (MeSH). We implemented an alternative
hierarchical classifier, HRocchio, for a comparative study.
We developed algorithms for crawling the Web for parallel
text and for automated extraction of comparable collections
from concurrent documents (broadcast news stories from the
same time period, for example). These algorithms will be
used to automatically extract training data from the Web
for our corpus-based translingual retrieval methods (Pseudo-Relevance
Feedback, Chi-squared based thesaurus, etc.).
CSLI built a bilingual search engine from the parallel
Springer corpus and worked on devising and building a “bridging”
method, which numerically translates queries from one document
collection to another. This can be used in conjunction with
the bilingual search engine to search an English document
collection using German queries.
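
The report does not describe how the “bridging” method works
internally, so the following is only an illustrative sketch of one
way a query could be translated numerically through a parallel
corpus. It assumes that the German query is matched against the
German side of aligned document pairs and that the English sides of
the best matches are combined into a weighted English term vector.
The toy corpus, the function names (bridge_query, search), and the
cosine weighting are all hypothetical and are not taken from the
CSLI implementation.

```python
import math
from collections import Counter

# Hypothetical toy parallel corpus: aligned (German, English) pairs.
PARALLEL = [
    ("herz infarkt therapie", "heart attack therapy"),
    ("lunge krebs diagnose", "lung cancer diagnosis"),
    ("herz chirurgie patient", "heart surgery patient"),
]


def tf(text):
    """Term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bridge_query(german_query, parallel, top_k=2):
    """Map a German query to a weighted English term vector.

    Assumption: the English sides of the top_k best-matching
    German documents are summed, each weighted by its match score.
    """
    q = tf(german_query)
    scored = sorted(
        ((cosine(q, tf(de)), en) for de, en in parallel),
        key=lambda pair: pair[0],
        reverse=True,
    )
    english = Counter()
    for score, en in scored[:top_k]:
        if score > 0:
            for term, count in tf(en).items():
                english[term] += score * count
    return english


def search(english_vector, collection):
    """Rank English documents by similarity to the bridged query."""
    return sorted(
        collection,
        key=lambda doc: cosine(english_vector, tf(doc)),
        reverse=True,
    )
```

In this sketch, the query "herz infarkt" never passes through a
dictionary: it is turned into English term weights purely from
document-level alignments, and the resulting vector can then be run
against a separate English collection with search().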