Experiments in Named Page Finding and Arabic Retrieval with Hummingbird SearchServer™ at TREC 2002

Stephen Tomlinson. In E. M. Voorhees and Lori P. Buckland, editors, Proceedings of the Eleventh Text REtrieval Conference (TREC 2002). Gaithersburg, Maryland, November 2002. NIST Special Publication 500-251.

Abstract

Hummingbird participated in the named page finding task of the TREC 2002 Web Track (find the named page in 18GB from the .GOV domain) and the monolingual Arabic topic relevance task of the TREC 2002 Cross-Language Track (find all relevant documents in 869MB of Arabic news data). In the named page finding task, SearchServer returned the named page in the first 10 rows for more than 80% of the 150 queries. Searching the full document content produced mean reciprocal rank (MRR) scores more than 20 points higher than searching only particular HTML properties (such as the Title), but enhancing a content search with a little extra weight for HTML properties increased MRR by a further 6 points (with a standard error of just 2 points). Treating queries as phrases was not found to help significantly on average, but document length normalization increased MRR by more than 20 points. For Arabic topic relevance, light algorithmic stemming increased mean average precision (MAP) by 5 points, use of Arabic stop words increased MAP by 1 point, and query expansion from blind feedback increased MAP by 3 points.
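For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how MRR, success in the first 10 rows, and average precision are conventionally computed. It is not SearchServer code or the official TREC evaluation tooling; the function names, the dictionary layout, and the toy document IDs are all hypothetical. It also assumes each ranked list contains no duplicate document IDs, and that a "point" in the abstract means one hundredth of a score on the 0 to 1 scale.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], target: str) -> float:
    """Return 1/rank of the single named page, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, List[str]],
                         targets: Dict[str, str]) -> float:
    """Average the reciprocal rank over every query that has a known target."""
    total = sum(reciprocal_rank(runs.get(q, []), t) for q, t in targets.items())
    return total / len(targets)


def success_at_k(runs: Dict[str, List[str]],
                 targets: Dict[str, str], k: int = 10) -> float:
    """Fraction of queries whose named page appears in the first k rows."""
    hits = sum(1 for q, t in targets.items() if t in runs.get(q, [])[:k])
    return hits / len(targets)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision values taken at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Toy data (hypothetical): three named-page queries and their ranked results.
runs = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"], "q3": ["m", "n"]}
targets = {"q1": "a", "q2": "z", "q3": "k"}

mrr = mean_reciprocal_rank(runs, targets)      # (1 + 1/3 + 0) / 3
top10 = success_at_k(runs, targets, k=10)      # 2 of 3 queries succeed
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
```

MAP is the mean of `average_precision` over all topics. Named page finding has one correct answer per query, which is why reciprocal rank is the natural measure there, while the Arabic task has many relevant documents per topic and is scored with MAP.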

Last Updated: 2003 Aug 30

Comments are welcome at comments@stephent.com.

Copyright © 2003 Stephen Tomlinson http://www.stephent.com/ir/papers/trec2002.html