Danish and Greek Web Search Experiments with Hummingbird SearchServer™ at CLEF 2005

Stephen Tomlinson. To appear in Proceedings of CLEF 2005.

Abstract

Hummingbird participated in the WebCLEF mixed monolingual retrieval task of the Cross-Language Evaluation Forum (CLEF) 2005. In this task, the system was given 547 known-item queries in 11 languages (134 Spanish, 121 English, 59 Dutch, 59 Portuguese, 57 German, 35 Hungarian, 30 Danish, 30 Russian, 16 Greek, 5 Icelandic and 1 French). The goal was to find the desired page in the 82 GB EuroGOV collection (3.4 million pages crawled from government sites of 27 European domains). Our experiments found that stopword processing was more important than anticipated, perhaps because words that are common in one language tend to be overweighted by inverse document frequency in a mixed-language collection. Extra weight on the document title helped significantly, and extra weight on less deep URLs significantly helped home page queries. Stemming had a neutral impact on average, but it made a substantial difference for some individual queries. We analyze several Danish and Greek queries in detail.
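
The stopword observation can be illustrated with a minimal sketch. It assumes the textbook formula idf = log(N / df), which is not necessarily the weighting SearchServer uses, and all document counts are hypothetical toy values rather than figures from the paper. A function word such as Danish "og" occurs in most Danish pages, so its idf is close to zero in a Danish-only collection, but it looks rare once those pages are a small fraction of a mixed-language collection.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Textbook inverse document frequency, log(N / df).

    This is an illustrative formula, not the one used by SearchServer.
    """
    return math.log(num_docs / doc_freq)


# Hypothetical toy counts (not taken from the paper).
danish_docs = 1_000    # pages written in Danish
other_docs = 99_000    # pages in all other languages
df_og = 900            # pages containing "og", all assumed to be Danish

# The same word, scored against two different collections.
idf_monolingual = idf(danish_docs, df_og)
idf_mixed = idf(danish_docs + other_docs, df_og)

print(f"Danish-only collection: idf = {idf_monolingual:.2f}")
print(f"Mixed collection:       idf = {idf_mixed:.2f}")
```

Under these assumptions the word receives a much larger weight in the mixed collection, which is one way an unremoved stopword could come to dominate the ranking for a query.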

Last Updated: 2005 Nov 22

Comments are welcome at comments@stephent.com.

Copyright © 2005 Stephen Tomlinson
http://www.stephent.com/ir/papers/wc05.html