Evolving local and global weighting schemes in information retrieval |
| |
Authors: | Ronan Cummins, Colm O’Riordan |
| |
Institution: | (1) Department of Information Technology, National University of Ireland, Galway, Ireland |
| |
Abstract: | This paper describes a method that uses Genetic Programming to automatically determine term-weighting schemes for the vector
space model. Based on a set of queries and their human-determined relevant documents, weighting schemes are evolved that
achieve a high average precision. In Information Retrieval (IR) systems, useful information for term-weighting schemes is
available from the query, the individual documents, and the collection as a whole.
We evolve term weighting schemes in both local (within-document) and global (collection-wide) domains which interact with
each other correctly to achieve a high average precision. These weighting schemes are tested on well-known test collections
and are compared to the traditional tf-idf weighting scheme and to the BM25 weighting scheme using standard IR performance metrics.
Furthermore, we show that the global weighting schemes evolved on small collections also increase average precision on larger
TREC data. These global weighting schemes are shown to adhere to Luhn’s resolving power as both high and low frequency terms
are assigned low weights. However, the local weightings evolved on small collections do not perform as well on large collections.
We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on
large collections. |
| |
Keywords: | Genetic Programming; Information Retrieval; Term-Weighting Schemes |
This article is indexed in SpringerLink and other databases. |
|
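To make the abstract's distinction between local and global weights concrete, the following is a minimal sketch and not the authors' evolved schemes. It assumes the traditional tf-idf baseline the abstract mentions: raw term frequency as the local (within-document) weight and log(N/df) as the global (collection-wide) weight. It also shows average precision, the measure the abstract says the evolved schemes aim to maximise. All function names (`raw_tf`, `idf`, `weight_documents`, `rank`, `average_precision`) are hypothetical, and the inner-product scoring with unit query weights is a simplifying assumption.

```python
import math
from collections import Counter


def raw_tf(term_count, doc_length):
    # Assumed local (within-document) weight: raw term frequency.
    return float(term_count)


def idf(doc_freq, n_docs):
    # Assumed global (collection-wide) weight: classic idf, log(N / df).
    return math.log(n_docs / doc_freq)


def weight_documents(docs, local_fn=raw_tf, global_fn=idf):
    """Return one {term: weight} dict per document, weight = local * global.

    local_fn and global_fn are pluggable, which is the slot an evolved
    weighting function would occupy in this sketch.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    global_w = {t: global_fn(df, n_docs) for t, df in doc_freq.items()}
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append(
            {t: local_fn(c, len(doc)) * global_w[t] for t, c in counts.items()}
        )
    return weighted


def rank(query, weighted_docs):
    # Score each document by summing the weights of the query terms it contains,
    # then return document indices from highest to lowest score.
    scores = [sum(w.get(t, 0.0) for t in query) for w in weighted_docs]
    return sorted(range(len(weighted_docs)), key=lambda i: -scores[i])


def average_precision(ranking, relevant):
    # Mean of the precision values at each rank where a relevant document occurs.
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Toy collection, invented purely for illustration.
    docs = [
        ["genetic", "programming", "retrieval"],
        ["retrieval", "retrieval", "model"],
        ["genetic", "model", "model"],
    ]
    ranking = rank(["retrieval"], weight_documents(docs))
    print(ranking, average_precision(ranking, {1}))
```

Swapping `local_fn` or `global_fn` for a different function changes the weighting scheme without touching the ranking or evaluation code, which mirrors how a fitness function based on average precision could compare candidate schemes.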