
WP9: Performance Evaluation

 

During this period, CMU refined HkNN, the hierarchical nearest-neighbor classification system designed for MuchMore as an “interlingual” approach to translingual retrieval via a universal concept taxonomy, the Medical Subject Headings (MeSH). We implemented an alternative hierarchical classifier, HRocchio, for a comparative study. We also developed algorithms for crawling the Web for parallel text and for automatically extracting comparable collections from concurrent documents (for example, broadcast news stories from the same time period). These algorithms will be used to extract training data from the Web automatically for our corpus-based translingual retrieval methods (Pseudo-Relevance Feedback, chi-squared-based thesaurus, etc.).

CSLI built a bilingual search engine from the parallel Springer corpus and worked on designing and building a “bridging” method, which numerically translates queries from one document collection to another. Used in conjunction with the bilingual search engine, this method makes it possible to search an English document collection using German queries.
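
The idea of translating a query numerically through a parallel corpus can be illustrated with a small sketch. This is not CSLI's implementation, whose details are not given here. It assumes one plausible reading: the German query is matched against the German side of a parallel corpus, and the English sides of the best matches are combined into a weighted English term vector that is then used to rank the English collection. The function names (`bridge_query`, `search`), the raw term-frequency weighting, the `top_k` parameter, and the toy texts are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bridge_query(query_de, parallel_pairs, top_k=2):
    """Map a German query to a weighted English term vector.

    parallel_pairs is a list of (german_text, english_text) tuples.
    Assumption: the English sides of the top_k best-matching German
    texts are summed, each weighted by its similarity to the query.
    """
    q = tf_vector(query_de)
    scored = sorted(
        ((cosine(q, tf_vector(de)), en) for de, en in parallel_pairs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    bridged = Counter()
    for score, en in scored[:top_k]:
        if score <= 0:
            continue  # ignore parallel texts that share no term with the query
        for term, tf in tf_vector(en).items():
            bridged[term] += score * tf
    return bridged


def search(bridged, english_docs):
    """Return indices of english_docs, best match for the bridged query first."""
    return sorted(
        range(len(english_docs)),
        key=lambda i: cosine(bridged, tf_vector(english_docs[i])),
        reverse=True,
    )


# Toy data, invented for illustration only.
pairs = [
    ("herz infarkt behandlung", "heart attack treatment"),
    ("lunge entzündung diagnose", "lung inflammation diagnosis"),
]
docs = ["diagnosis of lung disease", "treatment after heart attack"]

bridged = bridge_query("herz infarkt", pairs)
ranking = search(bridged, docs)
```

In this sketch no dictionary or machine translation is involved: the only link between the two languages is the document alignment in the parallel corpus, which is what makes the translation "numerical".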

 

 



 
last modified, july 2003