Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServer™ at CLEF 2003

Stephen Tomlinson. Draft version in Carol Peters, editor, Working Notes for the CLEF 2003 Workshop, 21-22 August, Trondheim, Norway.

Abstract

(Draft) Hummingbird participated in the monolingual information retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2003. The task: for natural language queries in 9 European languages (German, French, Italian, Spanish, Dutch, Finnish, Swedish, Russian and English), find all the relevant documents, with high precision, in the CLEF 2003 document sets. For each language, SearchServer scored above the median average precision on more topics than it scored below it. In a comparison of experimental SearchServer lexical stemmers with Porter's algorithmic stemmers, the biggest differences were for the languages in which compound words are frequent (German, Dutch, Finnish and Swedish). SearchServer scored significantly higher in average precision for German and Finnish, apparently because it can split compound words and so match terms that occur as parts of compounds in these languages. Most of the differences for the other languages appeared to arise because SearchServer's lexical stemmers perform inflectional stemming, while the algorithmic stemmers often additionally perform derivational stemming; these differences did not pass a significance test.
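To illustrate why compound splitting can help retrieval in languages such as German, here is a minimal sketch of lexicon-based decompounding. It is not SearchServer's method, which the abstract does not describe; the function names (split_compound, index_terms), the tiny lexicon, and the greedy longest-prefix strategy are all assumptions made for illustration, and linking elements such as the German "s" are ignored.

```python
# Toy lexicon of known word forms (hypothetical; a real lexical stemmer
# would use a large dictionary with inflection information).
LEXICON = {"fussball", "welt", "meisterschaft", "fuss", "ball"}


def split_compound(word, lexicon, min_part=3):
    """Split `word` into lexicon entries, or return [word] if no split exists.

    Tries the longest known prefix first and backtracks if the remainder
    cannot be covered by lexicon entries.
    """
    word = word.lower()

    def helper(rest):
        if not rest:
            return []  # everything consumed: successful split
        for end in range(len(rest), min_part - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no lexicon entry covers this remainder

    parts = helper(word)
    return parts if parts else [word]


def index_terms(word, lexicon):
    """Return the terms to index: the full word plus any compound parts."""
    whole = word.lower()
    parts = split_compound(word, lexicon)
    return [whole] + [p for p in parts if p != whole]


print(split_compound("Fussballweltmeisterschaft", LEXICON))
print(index_terms("Fussballweltmeisterschaft", LEXICON))
```

With the compound's parts indexed alongside the full word, a query for "welt" could match a document containing only "Fussballweltmeisterschaft". A purely algorithmic suffix stripper has no lexicon and would normally leave that compound as one term.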

Last Updated: 2003 Aug 30

Comments are welcome at comments@stephent.com.

Copyright © 2003 Stephen Tomlinson http://www.stephent.com/ir/papers/clef03.html