Early Precision
Measures:


Implications from the
Downside of Blind Feedback


S. Tomlinson, Hummingbird (SIGIR 2006, Aug 7/06)


The Unreported
Downside of
Blind Feedback:


Detrimental to the
First Relevant Item.


Observed at
CLEF,
NTCIR,
TREC
. . . and now RIA:


           MRR     GS10
umass     -0.02   -0.05*
city      -0.07*  -0.04*
albany    -0.02   -0.04*
sabir     -0.01   -0.03*
cmu       -0.06*  -0.02*
waterloo  -0.04   -0.02
clarit    +0.03   -0.01

Mean impact of blind feedback on 7 RIA systems over 150 topics
of TRECs 6-8 (* indicates statistical significance at 5% level)
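
Each entry in the table is a mean of per-topic differences (score with blind feedback minus score without). The poster does not say which significance test produced the asterisks, so the sketch below pairs the mean difference with a sign-flip randomization test as one plausible choice. The function names and the toy scores are hypothetical, not taken from the RIA data.

```python
import random


def mean_impact(base_scores, feedback_scores):
    """Mean per-topic change: feedback score minus base score."""
    if len(base_scores) != len(feedback_scores):
        raise ValueError("score lists must cover the same topics")
    diffs = [f - b for b, f in zip(base_scores, feedback_scores)]
    return sum(diffs) / len(diffs)


def sign_flip_p_value(base_scores, feedback_scores, trials=10000, seed=0):
    """Two-sided paired randomization test (an assumed choice of test)."""
    diffs = [f - b for b, f in zip(base_scores, feedback_scores)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / len(diffs) >= observed - 1e-12:
            hits += 1
    return hits / trials


# Hypothetical per-topic GS10 scores for four topics (not data from the poster).
base = [1.00, 0.93, 0.50, 0.86]
feedback = [0.93, 0.86, 0.46, 0.86]
print(round(mean_impact(base, feedback), 3))
print(sign_flip_p_value(base, feedback))
```

With 150 topics per system, as in the tables here, the same functions apply unchanged; only the input lists grow.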



Which Has
More Influence
on
Precision@10 ?


The
First Relevant Item?

Or
Secondary Recall?


Precision@10
Sides with
the
Recall Measures:


           P10     P20
umass     +0.03*  +0.03*
clarit    +0.02*  +0.03*
albany    +0.02   +0.02*
city      +0.02   +0.02*
waterloo  +0.02   +0.02*
cmu       +0.01   +0.03*
sabir     +0.00   +0.03*

Mean impact of blind feedback on 7 RIA systems over 150 topics
of TRECs 6-8 (* indicates statistical significance at 5% level)
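
To see how Precision@10 and GS10 can move in opposite directions on a single topic, here is a minimal sketch with two made-up ranked lists. The relevance lists and helper names are illustrative assumptions; GS10 uses the formula given on this poster and is assumed to be 0 when no relevant item is retrieved.

```python
def precision_at(k, rels):
    """Fraction of the top k results that are relevant."""
    return sum(rels[:k]) / k


def first_relevant_rank(rels):
    """1-based rank of the first relevant item, or None if there is none."""
    for rank, is_relevant in enumerate(rels, start=1):
        if is_relevant:
            return rank
    return None


def gs10(rank):
    """GS10 = 1.08^(1-r); 0.0 for no relevant item (an assumption)."""
    return 0.0 if rank is None else 1.08 ** (1 - rank)


# Hypothetical top-10 lists for one topic (True = relevant).
base = [True] + [False] * 9                                # relevant at rank 1
feedback = [False] * 3 + [True, True, True] + [False] * 4  # relevant at ranks 4-6

for name, rels in (("base", base), ("feedback", feedback)):
    p10 = precision_at(10, rels)
    g = gs10(first_relevant_rank(rels))
    print(f"{name:9s} P10={p10:.2f}  GS10={g:.2f}")
```

In this toy case the feedback run finds more relevant items in the top 10 (higher P10) while pushing the first relevant item from rank 1 to rank 4 (lower GS10).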



Geometric MAP
is Not Robust.

GMAP Favors
Blind Feedback:


           MAP     GMAP
cmu       +0.02*  +0.02*
umass     +0.03*  +0.01*
waterloo  +0.02*  +0.01*
city      +0.03*  +0.01
sabir     +0.02*  +0.00
albany    +0.03*  -0.00
clarit    +0.03*  -0.01

Mean impact of blind feedback on 7 RIA systems over 150 topics
of TRECs 6-8 (* indicates statistical significance at 5% level)
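
For reference, MAP is the arithmetic mean of per-topic average precision (AP) and GMAP is the geometric mean. The sketch below shows the two side by side. Flooring AP at a small epsilon before taking logs is a common convention and is an assumption here, as are the function names and the toy AP values.

```python
import math


def mean_ap(ap_scores):
    """Arithmetic mean of per-topic average precision."""
    return sum(ap_scores) / len(ap_scores)


def geometric_map(ap_scores, eps=1e-5):
    """Geometric mean of per-topic AP, floored at eps to avoid log(0)."""
    logs = [math.log(max(ap, eps)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))


# Hypothetical per-topic AP scores for two topics (not data from the poster).
base = [0.02, 0.50]
gain_on_weak_topic = [0.04, 0.50]    # +0.02 on the low-scoring topic
gain_on_strong_topic = [0.02, 0.52]  # +0.02 on the high-scoring topic

for name, run in (("base", base),
                  ("weak topic +0.02", gain_on_weak_topic),
                  ("strong topic +0.02", gain_on_strong_topic)):
    print(f"{name:20s} MAP={mean_ap(run):.3f}  GMAP={geometric_map(run):.3f}")
```

Both modified runs have the same MAP, but GMAP rewards the gain on the low-scoring topic far more, since it responds to relative rather than absolute changes in AP.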


'First Relevant'
Measures are
Robust
(especially GS10).



Generalized
Success@10:

GS10 = 1.08^(1-r)

(where r is the rank of
the first relevant item).


Estimates
Reading Saved
(instead of
Precision).


'First Relevant' Measures Compared:

  r    GS10    S10    RR (1/r)
  1    1.00     1     1.00
  2    0.93     1     0.50
  3    0.86     1     0.33
  4    0.79     1     0.25
 ...   ...      1     ...
 10    0.50     1     0.10
 11    0.46     0     0.09
 ...   ...      0     ...
 50    0.02     0     0.02
 ...   ...      0     ...
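
The rows above can be reproduced from the three definitions. This is a minimal sketch; the function names are illustrative, and scoring a topic with no relevant item retrieved as 0 is an assumption not stated on the poster.

```python
def gs10(rank):
    """Generalized Success@10: 1.08^(1-r) for first-relevant rank r."""
    return 0.0 if rank is None else 1.08 ** (1 - rank)


def s10(rank):
    """Success@10: 1 if the first relevant item is in the top 10, else 0."""
    return 1 if rank is not None and rank <= 10 else 0


def rr(rank):
    """Reciprocal rank: 1/r for first-relevant rank r."""
    return 0.0 if rank is None else 1.0 / rank


print("  r   GS10  S10    RR")
for r in (1, 2, 3, 4, 10, 11, 50):
    print(f"{r:3d}   {gs10(r):.2f}   {s10(r)}   {rr(r):.2f}")
```

Note how RR drops by half between ranks 1 and 2, while GS10 falls by about 7% per rank and reaches one half near rank 10.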


An Intuitive
Interpretation
of
Generalized
Success@10:


Estimates Success@10 (S10):

e.g. if GS10 = 0.8
then S10 ≈ 40/50
(success on roughly 40 of 50 topics)

(because, per topic, the S10 score is the GS10 score rounded to 0 or 1)


           S10     GS10
umass     -0.08*  -0.05*
city      -0.05*  -0.04*
albany     0.00   -0.04*
sabir     -0.05*  -0.03*
cmu       -0.01   -0.02*
waterloo   0.00   -0.02
clarit    -0.03   -0.01

Mean impact of blind feedback on 7 RIA systems over 150 topics
of TRECs 6-8 (* indicates statistical significance at 5% level)
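
The rounding relationship between S10 and GS10 can be checked directly: GS10 is at least 0.5 exactly when the first relevant item is in the top 10. The sketch below verifies this over a range of ranks and then compares the two means on a made-up set of topics. The ranks and names are hypothetical, and GS10 = 0 for a topic with no relevant item retrieved is an assumption.

```python
def gs10(rank):
    """GS10 = 1.08^(1-r); 0.0 when no relevant item is retrieved (assumed)."""
    return 0.0 if rank is None else 1.08 ** (1 - rank)


def s10(rank):
    """1 if the first relevant item is at rank 10 or better, else 0."""
    return 1 if rank is not None and rank <= 10 else 0


# Per topic, S10 is GS10 rounded to the nearest integer.
assert all(round(gs10(r)) == s10(r) for r in range(1, 1001))

# Hypothetical first-relevant ranks for ten topics (None = nothing relevant found).
ranks = [1, 1, 2, 3, 5, 8, 12, 40, None, 1]
mean_gs10 = sum(gs10(r) for r in ranks) / len(ranks)
mean_s10 = sum(s10(r) for r in ranks) / len(ranks)
print(f"mean GS10 = {mean_gs10:.2f}, mean S10 = {mean_s10:.2f}")
```

The two means need not match exactly, since rounding errors only cancel on average across topics; that is why GS10 is described as an estimate of S10.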



Conclusion:


If Seeking
Just One Item

(e.g. to Answer
a Question):


Prefer GS10
to Precision@10.





What appears above is an HTML-formatted version of the poster displayed at the SIGIR 2006 conference in Seattle on August 7, 2006.

A corresponding 2-page paper (PDF) accompanies this poster.


Last Updated: 2006 Aug 22

Comments are welcome at stephent@magma.ca.

Copyright © 2006 Stephen Tomlinson
http://www.stephent.com/ir/papers/sigir2006poster.html