Processing truncated terms in document retrieval systems期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Processing truncated terms in document retrieval systems

Authors:	Paul Bratley Yaacov Choueka

Affiliation:	Département d''informatique et de recherche opérationnelle, Université de Montréal, C.P. 6128, Succursale “A”, Montreal P.Q., Canada H3C 3J7

Abstract:	In a typical inverted-file full-text document retrieval system, the user submits queries consisting of strings of characters combined by various operators. The strings are looked up in a text-dictionary which lists, for each string, all the places in the database at which it occurs. It is desirable to allow the user to include in his query truncated terms such as X¹, ¹X, ¹X¹, or X¹Y, where X and X are specified strings and ¹ is a variable-length-don't-care character, that is, ¹ represents an arbitrary, possibly empty, string. Processing these terms involves finding the set of all words in the dictionary that match these patterns. How to do this efficiently is a long-standing open problem in this domain.In this paper we present a uniform and efficient approach for processing all such query terms. The approach, based on a “permuted dictionary” and a corresponding set of access routines, requires essentially one disk access to obtain from the dictionary all the strings represented by a truncated term, with negligible computing time. It is thus well suited for on-line applications. Implementation is simple, and storage overhead is low: it can be made almost negligible by using some specially adapted compression techniques described in the paper.The basic approach is easily adaptable for slight variants, such as fixed (or bounded) length don't-care characters, or more complex pattern matching templates.

Keywords:
本文献已被 ScienceDirect 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏