[nlpatumd] At what point do shared data sets cause more harm than good? A case study from IR...

Ted Pedersen Mon, 19 Apr 2010 11:34:28 -0700

I ran into an interesting blog post and related paper not long ago
that touch on an issue that comes up in the NLP community from time
to time: at what point do standardized test collections
start to cause more harm than good? In general we've come to value
shared test sets, because they allow us to compare to other methods
using the same data. However, this can go badly wrong, and it appears
that this might be the case with the problem of ad-hoc retrieval,
which is one of the classic problems from Information Retrieval. The
authors argue that there really has been no progress on this problem
in more than a decade, despite a large number of papers reporting
exciting new results. The problem, of course, isn't really the data;
it's how the data is being used.


http://blog.codalism.com/?p=1029

And the related paper...

Improvements That Don't Add Up: Ad-Hoc Retrieval Results Since 1998
http://ww2.cs.mu.oz.au/~wew/papers/amwz09_cikm.pdf

Offered as food for thought.

Enjoy,
Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
