1.

Performance of query processing implementations in ranking-based text retrieval systems using inverted indices

B. Barla Cambazoglu, Cevdet Aykanat 《Information processing & management》2006

Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval. In this work, 11 alternative implementation techniques for these calculations are presented under four different categories, and their asymptotic time and space complexities are investigated. To our knowledge, six of these techniques have not been discussed in any previous publication. Furthermore, analytical experiments are carried out on a 30 GB document collection to evaluate the practical performance of the different implementations in terms of query processing time and space consumption. Advantages and disadvantages of each technique are illustrated under different querying scenarios, and several experiments that investigate the scalability of the implementations are presented.
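To make the two costly steps named in this abstract concrete, here is a minimal Python sketch of term-at-a-time similarity calculation over an inverted index, followed by heap-based top-k ranking. It is not one of the paper's 11 techniques; the tf-idf weighting, the postings layout, and the names `build_index` and `rank` are assumptions chosen for illustration.

```python
import heapq
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index


def rank(index, num_docs, query, k):
    """Accumulate tf * idf per document, then select the k best scores."""
    accumulators = defaultdict(float)  # doc_id -> partial similarity score
    for term in set(query.lower().split()):
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            accumulators[doc_id] += tf * idf
    # A heap-based selection avoids fully sorting all accumulators.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


docs = ["apple banana apple", "banana cherry", "cherry cherry date"]
index = build_index(docs)
top = rank(index, len(docs), "apple cherry", k=2)
```

The choice of accumulator structure (hash table here) and of the selection method (heap here) is exactly the kind of implementation decision whose time and space trade-offs the abstract says the paper compares.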

2.

User k-anonymity for privacy preserving data mining of query logs

Guillermo Navarro-Arribas, Vicenç Torra, Arnau Erola, Jordi Castellà-Roca 《Information processing & management》2012

The anonymization of query logs is an important process that needs to be performed prior to the publication of such sensitive data. This ensures the anonymity of the users in the logs, a problem that has already been found in released logs from well-known companies. This paper presents the anonymization of query logs using microaggregation. Our proposal ensures the k-anonymity of the users in the query log, while preserving its utility. We provide the evaluation of our proposal in real query logs, showing the privacy and utility achieved, as well as providing estimations for the use of such data in data mining processes based on clustering.
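The general idea of microaggregation can be sketched in a few lines of Python: records are partitioned into groups of at least k members and each record is replaced by its group's centroid, so every published value is shared by at least k users. The sketch below is a simple univariate, fixed-size variant on numeric values, not the authors' method for query logs; the function name `microaggregate` and the grouping rule are assumptions.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group; every group has >= k members."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    # Sort record positions by value so that similar records share a group.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        # If fewer than k records would remain, merge them into this group.
        if len(order) - end < k:
            end = len(order)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


masked = microaggregate([1, 2, 3, 10, 11, 12, 13], k=3)
```

Grouping similar records before averaging is what limits the loss of utility: the closer the members of a group, the less each value is distorted.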

3.

Efficient probabilistic XML query processing using an extended labeling scheme and a lightweight index

Jung-Hee Yun, Chin-Wan Chung 《Information processing & management》2012

Recently, there has been growing interest in the data model and query processing for probabilistic XML data. There are many potential applications of probabilistic data, and the XML data model is suitable for naturally representing hierarchical information and data uncertainty at different levels. However, the previously proposed probabilistic XML data models and query processing techniques separate finding data matches from evaluating the probabilities of results. Therefore, they must repeatedly access the data and need the full data of the paths given in queries to calculate the probabilities of results.

4.

Developing a grade query system using ASP technology (Total citations: 1; self-citations: 0; citations by others: 1)

Li Yanli 《中国科技信息》 (China Science and Technology Information) 2008,(20)

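This paper presents a practical solution for a student grade query system based on the B/S (browser/server) architecture, with ASP as the key technology. The design and development of the system make student grade queries available over the network and speed up the modernization of management.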

5.

Improving query precision using semantic expansion

Ahmed Abdelali, Jim Cowie, Hamdy S. Soliman 《Information processing & management》2007

Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most of the existing QE techniques suffer from retrieval performance degradation due to an imprecise choice of the query's additive terms in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantic relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our “controlled” query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over existing mechanisms in the field.

6.

A general model of query processing in information retrieval systems

Duncan A. Buell 《Information processing & management》1981,17(5):249-262

Most current document retrieval systems require that user queries be specified in the form of Boolean expressions. Although Boolean queries work, they have flaws. Some of the attempts to overcome these flaws have involved “partial-match” retrieval or the use of fuzzy-subset theory. Recently, some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms. The various query-processing methods are discussed and compared.

7.

A query term re-weighting approach using document similarity

《Information processing & management》2016,52(3):478-489

Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initial retrieved set of documents to be relevant. Then they use these documents to extract more relevant terms for the query or just re-weight the user's original query. In this paper, we propose a straightforward, yet effective use of the pseudo-relevance feedback method in detecting more informative query terms and re-weighting them. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a closer context to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context could help in identifying and re-weighting informative query terms. Our experimental results on standard English and Persian test collections show that our method improves retrieval performance, in terms of the MAP criterion, by up to 7% over traditional query term re-weighting methods.
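The idea of weighting the top documents by their mutual similarity and then re-weighting the query terms can be sketched as follows. This is only an illustration under stated assumptions, not the authors' formula: each feedback document is weighted by its mean cosine similarity to the other feedback documents, and each query term's new weight is its similarity-weighted frequency, normalized to sum to 1. The names `cosine` and `reweight_query` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reweight_query(query_terms, top_docs):
    """Return normalized weights for lowercase query terms from feedback documents."""
    vectors = [Counter(doc.lower().split()) for doc in top_docs]
    # Assumption: a feedback document matters more if it resembles the others.
    doc_weights = []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, other) for j, other in enumerate(vectors) if j != i]
        doc_weights.append(sum(sims) / len(sims) if sims else 1.0)
    raw = {
        term: sum(w * vec[term] for w, vec in zip(doc_weights, vectors))
        for term in query_terms
    }
    total = sum(raw.values())
    if total == 0:
        # No evidence in the feedback set: fall back to uniform weights.
        return {term: 1.0 / len(query_terms) for term in query_terms}
    return {term: score / total for term, score in raw.items()}


weights = reweight_query(
    ["jaguar", "speed"],
    ["jaguar car engine", "jaguar car speed", "jaguar cat"],
)
```

In this toy run "jaguar" ends up with the larger weight because it occurs in every feedback document, while "speed" occurs in only one.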

8.

Ambiguous author query detection using crowdsourced digital library annotations

Xiaoling Sun, Jasleen Kaur, Lino Possamai, Filippo Menczer 《Information processing & management》2013

The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.

9.

A conceptual framework for fuzzy query processing—A step toward very intelligent database systems

Valiollah Tahani 《Information processing & management》1977,13(5):289-303

This paper is concerned with techniques for fuzzy query processing in a database system. By a fuzzy query we mean a query which uses imprecise or fuzzy predicates (e.g. AGE = “VERY YOUNG”, SALARY = “MORE OR LESS HIGH”, YEAR-OF-EMPLOYMENT = “RECENT”, SALARY ? 20,000, etc.). As a basis for fuzzy query processing, a fuzzy retrieval system based on the theory of fuzzy sets and linguistic variables is introduced. In our system model, the first step in processing fuzzy queries consists of assigning meaning to fuzzy terms (linguistic values), of a term-set, used for the formulation of a query. The meaning of a fuzzy term is defined as a fuzzy set in a universe of discourse which contains the numerical values of a domain of a relation in the system database. The fuzzy retrieval system developed is a high level model for the techniques which may be used in a database system. The feasibility of implementing such techniques in a real environment is studied. Specifically, within this context, techniques for processing simple fuzzy queries expressed in the relational query language SEQUEL are introduced.
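The first step described in this abstract, assigning a fuzzy set over a numeric domain to a fuzzy term, can be illustrated with a small Python sketch for a predicate such as AGE = “VERY YOUNG”. The membership function for YOUNG, its breakpoints (25 and 45), and the modeling of the hedge VERY as squaring (a common convention in fuzzy set theory) are all assumptions, not details taken from the paper.

```python
def young(age):
    """Hypothetical membership function for the fuzzy term YOUNG."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20  # linear decline between 25 and 45


def very(membership):
    """Linguistic hedge VERY, modeled here as concentration (squaring)."""
    return lambda x: membership(x) ** 2


def fuzzy_select(rows, attribute, membership, threshold=0.0):
    """Return (row, degree) pairs with degree above the threshold, best first."""
    scored = [(row, membership(row[attribute])) for row in rows]
    matches = [(row, degree) for row, degree in scored if degree > threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


employees = [
    {"name": "A", "age": 22},
    {"name": "B", "age": 35},
    {"name": "C", "age": 50},
]
result = fuzzy_select(employees, "age", very(young))
```

Unlike a Boolean predicate, the selection returns a degree of match per row, so the answer can be ranked or cut off at a chosen threshold.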

10.

Query reformulation using automatically generated query concepts from a document space

Youjin Chang, Iadh Ounis, Minkoo Kim 《Information processing & management》2006

We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user’s information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through a local pattern discovery and a global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or from the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.

11.

High-performance FAQ retrieval using an automatic clustering method of query logs

Harksoo Kim, Jungyun Seo 《Information processing & management》2006

To resolve some of the lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. At indexing time, the proposed system clusters the logs of users’ queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine-readable dictionary. At search time, the proposed system calculates the similarities between users’ queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms traditional information retrieval systems in FAQ retrieval.

12.

LABRADOR: Efficiently publishing relational databases on the web by using keyword-based query interfaces

Filipe Mesquita, Altigran S. da Silva, Edleno S. de Moura, Pável Calado, Alberto H.F. Laender 《Information processing & management》2007

A vast amount of valuable information, produced and consumed by people and institutions, is currently stored in relational databases. For many purposes, there is an ever increasing demand for having these databases published on the Web, so that users can query the data available in them. An important requirement for this to happen is that query interfaces must be as simple and intuitive as possible. In this paper we present LABRADOR, a system for efficiently publishing relational databases on the Web by using a simple text box query interface. The system operates by taking an unstructured keyword-based query posed by a user and automatically deriving an equivalent SQL query that fits the user’s information needs, as expressed by the original query. The SQL query is then sent to a DBMS and its results are processed by LABRADOR to create a relevance-based ranking of the answers. Experiments we present show that LABRADOR can automatically find the most suitable SQL query in more than 75% of the cases, and that the overhead introduced by the system in the overall query processing time is almost insignificant. Furthermore, the system operates in a non-intrusive way, since it requires no modifications to the target database schema.

13.

Fast and reliable droplet transport on single-plate electrowetting on dielectrics using nonfloating switching method

Jun Kwon Park, Seung Jun Lee, Kwan Hyoung Kang 《Biomicrofluidics》2010,4(2)

In droplet transport based on electrowetting on dielectrics, the parallel-plate configuration is more popular than the single-plate one because droplet transport becomes increasingly difficult without a cover plate. Despite its improved transport performance, the parallel-plate configuration often limits access to the peripheral components, which calls for removing the cover plate, that is, for the single-plate configuration. We investigated the fundamental features of droplet transport for the single-plate configuration. We compared the performance of several switching methods with respect to the maximum speed of successive transport without failure and suggested a nonfloating switching method, which is inherently free from the charge-residue problem and exerts a greater force on a droplet than conventional switching methods. A simple theory is provided to explain the different results for the switching methods.

14.

Data information processing of traffic digital twins in smart cities using edge intelligent federation learning

《Information processing & management》2023,60(2):103171

The present work analyzes the application of deep learning in the context of digital twins (DTs) to promote the development of smart cities. Based on the theoretical foundations of DTs and smart city construction, the five-dimensional DTs model is discussed to propose the conceptual framework of the DTs city. Then, edge computing technology is introduced to build an intelligent traffic perception system based on edge computing combined with DTs. Moreover, to improve the traffic scene recognition accuracy, the Single Shot MultiBox Detector (SSD) algorithm is optimized by the residual network, forming the SSD-ResNet50 algorithm, and the DarkNet-53 is also improved. Finally, experiments are conducted to verify the effects of the improved algorithms and the data enhancement method. The experimental results indicate that the SSD-ResNet50 and the improved DarkNet-53 algorithms show fast training speed, high recognition accuracy, and a favorable training effect. Compared with the original algorithms, the recognition time of the SSD-ResNet50 algorithm and the improved DarkNet-53 algorithm is reduced by 6.37 ms and 4.25 ms, respectively. The data enhancement method used in the present work is not only suitable for the algorithms reported here, but also has a good influence on other deep learning algorithms. Moreover, the SSD-ResNet50 and improved DarkNet-53 algorithms have significant advantages in research on traffic sign target recognition. The rigorous research with appropriate methods and comprehensive results can offer an effective reference for subsequent research on DTs cities.

15.

Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges

Vani K., Deepa Gupta 《Information processing & management》2018,54(3):408-432

The proposed work aims to explore and compare the potency of syntactic-semantic based linguistic structures in plagiarism detection using natural language processing techniques. The current work explores linguistic features, viz., part of speech tags, chunks and semantic roles, in detecting plagiarized fragments and utilizes a combined syntactic-semantic similarity metric, which extracts the semantic concepts from the WordNet lexical database. The linguistic information is utilized for effective pre-processing and for making semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels on the extracted features is analyzed and discussed. Further, unlike the existing systems, which were evaluated on some limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpus provided by the PAN competition from 2009 to 2014. The approach showed considerable improvement in comparison with the top-ranked systems of the respective years. The evaluation and analysis with various cases of plagiarism also reflected the supremacy of deeper linguistic features for identifying manually plagiarized data.