Information Filtering in TREC-9 and TDT-3: A Comparative Analysis |
| |
Authors: | Thomas Galen Ault, Yiming Yang |
| |
Institution: | Language Technologies Institute, Carnegie Mellon University, USA |
| |
Abstract: | Much work on automated information filtering has been done in the TREC and TDT domains, but differences in corpora, the nature of TREC topics vs. TDT events, the constraints imposed on training and testing, and the choices of performance measures confound any meaningful comparison between these domains. We attempt to bridge the gap between them by evaluating the performance of the k-nearest-neighbor (kNN) classification system on the corpus and categories from one domain using the constraints of the other. To maximize comparability and understand the effect of the evaluation metrics specific to each domain, we optimize the performance of kNN separately for the F1, T9P (preferred metric for TREC-9) and Ctrk (official metric for TDT-3) metrics. A thorough comparison of our within-domain and cross-domain results demonstrates that the corpus used for TREC-9 is more challenging for an information filtering system than the TDT-3 corpus, and strongly suggests that the TDT-3 event tracking task itself is more difficult than the TREC batch filtering task. We also show that optimizing performance in TREC-9 and TDT-3 tends to result in systems with different performance characteristics, confounding any meaningful comparison between the two domains, and that T9P and Ctrk both have properties that make them undesirable as general information filtering metrics. |
| |
Keywords: | information filtering, TREC, TDT, topic tracking |
This article is indexed in SpringerLink and other databases. |
|