
WP7.1: Multilingual Term Extraction: State of the Art

 


CMU implemented an initial version of a corpus-driven decompounder for German. It relies on cognates between the German half of a parallel corpus and the English half (or dictionary translations of the English words). In early tests, the system achieved 22% recall (measured over all German compounds) at greater than 99% precision. As a by-product of decompounding, it generates a bilingual term lexicon that maps German compound words to English phrases.
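The following is a minimal Python sketch of cognate-based decompounding, not CMU's implementation. It assumes that each German compound comes with the English words of its aligned sentence, that a crude cognate test (a shared prefix of at least four letters) is good enough, that a small hypothetical English-to-German seed dictionary (`EN_TO_DE`) stands in for the dictionary translations, and that a hypothetical list of German linking elements (`LINKERS`) may appear between parts. Every split point is tried recursively, and a split is accepted only if every part aligns with some English word. Requiring full alignment is one way to trade recall for high precision.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical seed dictionary: English word -> German translations.
EN_TO_DE: Dict[str, List[str]] = {
    "cell": ["zelle"],
    "risk": ["risiko"],
}

# Hypothetical German linking elements (Fugenelemente); "" means no linker.
LINKERS = ("", "s", "es", "n", "en")

Part = Tuple[str, str]  # (German part, aligned English word)


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_part(part: str, english: Sequence[str],
               en_to_de: Dict[str, List[str]], min_len: int) -> Optional[str]:
    """Return the English word that `part` aligns with, or None.

    Crude cognate test (an assumption): `part` shares a prefix of at least
    `min_len` letters with the English word or one of its German translations.
    """
    if len(part) < min_len:
        return None
    for word in english:
        for candidate in [word] + en_to_de.get(word, []):
            if common_prefix_len(part, candidate) >= min_len:
                return word
    return None


def split_compound(word: str, english: Sequence[str],
                   en_to_de: Dict[str, List[str]] = EN_TO_DE,
                   min_len: int = 4) -> Optional[List[Part]]:
    """Split `word` into two or more parts that each align with an English word."""
    word = word.lower()
    english = [w.lower() for w in english]

    def search(rest: str, allow_whole: bool) -> Optional[List[Part]]:
        if allow_whole:  # the remainder may itself be the final part
            hit = match_part(rest, english, en_to_de, min_len)
            if hit is not None:
                return [(rest, hit)]
        for i in range(min_len, len(rest) - min_len + 1):
            head = rest[:i]
            hit = match_part(head, english, en_to_de, min_len)
            if hit is None:
                continue
            tail = rest[i:]
            for linker in LINKERS:  # optionally skip a linking element
                if tail.startswith(linker):
                    sub = search(tail[len(linker):], allow_whole=True)
                    if sub is not None:
                        return [(head, hit)] + sub
        return None

    # At the top level the word must split at least once.
    return search(word, allow_whole=False)


def lexicon_entry(word: str, english: Sequence[str]) -> Optional[Tuple[str, str]]:
    """By-product of decompounding: German compound -> English phrase."""
    parts = split_compound(word, english)
    if parts is None:
        return None
    return word.lower(), " ".join(en for _, en in parts)
```

For example, `split_compound("Infektionsrisiko", ["infection", "risk"])` splits off the linker "s" and aligns "infektion" with "infection" by prefix and "risiko" with "risk" through the seed dictionary. A non-compound such as "Patient" returns `None`.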


EIT built similarity thesauri for German-English and English-German from the Springer corpus, using both single words and phrases (based on ICD10 data).
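Below is a minimal Python sketch of how a cross-language similarity thesaurus can be computed. It is not EIT's implementation. It assumes the common construction in which each term is represented as a vector over the aligned document pairs of a parallel corpus, and term similarity is the cosine between those vectors. The weighting (raw frequency, L2-normalised), the "de:"/"en:" term prefixes, and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Dict[int, float]  # aligned-document id -> weight


def build_term_vectors(
        pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> Dict[str, Vector]:
    """Index every term by the aligned document pairs it occurs in.

    German and English terms share one feature space (the pair ids), which is
    what makes cross-language similarity computable. Weights are raw term
    frequencies, L2-normalised per term (a simplification).
    """
    raw: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for doc_id, (german, english) in enumerate(pairs):
        for tok in german:
            raw["de:" + tok.lower()][doc_id] += 1.0
        for tok in english:
            raw["en:" + tok.lower()][doc_id] += 1.0
    vectors: Dict[str, Vector] = {}
    for term, vec in raw.items():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[term] = {d: w / norm for d, w in vec.items()}
    return vectors


def similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the sparser vector
    return sum(w * b.get(d, 0.0) for d, w in a.items())


def translations(term: str, vectors: Dict[str, Vector], target: str = "en",
                 k: int = 3) -> List[Tuple[str, float]]:
    """Top-k target-language terms most similar to `term` (score > 0 only)."""
    query = vectors.get(term, {})
    scored = [(other, similarity(query, vec))
              for other, vec in vectors.items()
              if other.startswith(target + ":")]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [(t, s) for t, s in scored[:k] if s > 0]
```

The same vectors serve both directions: `translations("de:zelle", vecs)` looks up English candidates, and `translations("en:cell", vecs, target="de")` looks up German ones. Phrases could be handled the same way if they are passed in as pre-joined tokens. How the phrases were derived from ICD10 data is not described here, so the sketch leaves that step out.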
 
 
Last modified: July 2003