Found 15 similar documents (search time: 15 ms)
1.
Similarity calculations and document ranking form the computationally expensive parts of query processing in ranking-based text retrieval. In this work, 11 alternative implementation techniques for these calculations are presented under four different categories, and their asymptotic time and space complexities are investigated. To our knowledge, six of these techniques have not been discussed in any prior publication. Furthermore, analytical experiments are carried out on a 30 GB document collection to evaluate the practical performance of the different implementations in terms of query processing time and space consumption. Advantages and disadvantages of each technique are illustrated under different querying scenarios, and several experiments that investigate the scalability of the implementations are presented.
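As an illustration of the two steps this abstract names (similarity calculation and document ranking), here is a minimal sketch of one common approach: term-at-a-time scoring with per-document accumulators, followed by heap-based top-k selection. It is not one of the paper's 11 techniques as such; the function names, the TF-IDF weighting, and the toy documents are assumptions made for the example, and no document-length normalization is applied.

```python
import heapq
import math
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index


def rank(query, index, num_docs, k=3):
    """Score documents term-at-a-time, then pick the k best with a heap."""
    accumulators = defaultdict(float)  # doc_id -> partial similarity score
    for term in set(query.lower().split()):
        postings = index.get(term, [])
        if not postings:
            continue
        # Illustrative IDF variant (an assumption, not taken from the paper).
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings:
            accumulators[doc_id] += tf * idf
    # Ranking step: select the top-k accumulators without a full sort.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


# Hypothetical toy collection.
docs = ["fuzzy query processing", "query log anonymization", "droplet transport"]
index = build_index(docs)
top = rank("query processing", index, len(docs), k=2)
```

The accumulator dictionary is where the space cost of this style of scoring shows up, and the heap is what keeps the ranking step cheaper than sorting every scored document.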
2.
Guillermo Navarro-Arribas, Vicenç Torra, Arnau Erola, Jordi Castellà-Roca 《Information processing & management》2012
The anonymization of query logs is an important process that needs to be performed prior to the publication of such sensitive data. It ensures the anonymity of the users in the logs, a problem that has already been found in logs released by well-known companies. This paper presents the anonymization of query logs using microaggregation. Our proposal ensures the k-anonymity of the users in the query log while preserving its utility. We evaluate our proposal on real query logs, showing the privacy and utility achieved, and provide estimations for the use of such data in data mining processes based on clustering.
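To make the idea of microaggregation concrete, the following is a minimal sketch of the simplest univariate, fixed-size variant: values are sorted, partitioned into groups of at least k records, and each value is replaced by its group mean, so every published value is shared by at least k records. The paper applies microaggregation to users in query logs, which requires a distance between users; this numeric version and the name `microaggregate` are assumptions for illustration only.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k neighbors.

    If fewer than k values are supplied, they all end up in one group,
    and k-anonymity cannot be guaranteed for that input.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # Fold a trailing remainder smaller than k into the current group.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


# Hypothetical per-user statistic, e.g. number of queries issued.
anonymized = microaggregate([1, 2, 10, 11, 12], k=2)
```

Here the five values fall into one group of two and one group of three, so no published value identifies a single record.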
3.
Recently, there has been growing interest in data models and query processing for probabilistic XML data. There are many potential applications of probabilistic data, and the XML data model is well suited to representing hierarchical information and data uncertainty at different levels. However, previously proposed probabilistic XML data models and query processing techniques separate finding data matches from evaluating the probabilities of results. Therefore, they must repeatedly access the data and need the full data of the paths given in queries to calculate the probabilities of results.
4.
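Developing a Grade Query System Using ASP Technology  Total citations: 1, self-citations: 0, citations by others: 1
A sound solution is presented for a student grade query system based on the B/S (browser/server) architecture, with ASP as its key technology. The design and development of this system make student grade queries available over the network and accelerate the modernization of administration.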
5.
Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most existing QE techniques suffer from retrieval performance degradation due to an imprecise choice of the terms added to the query in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantic relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our “controlled” query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over existing mechanisms in the field.
6.
Duncan A. Buell 《Information processing & management》1981,17(5):249-262
Most current document retrieval systems require that user queries be specified in the form of Boolean expressions. Although Boolean queries work, they have flaws. Some of the attempts to overcome these flaws have involved “partial-match” retrieval or the use of fuzzy-subset theory. Recently, some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms. The various query-processing methods are discussed and compared.
7.
《Information processing & management》2016,52(3):478-489
Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initially retrieved set of documents to be relevant. They then use these documents to extract more relevant terms for the query or simply re-weight the user's original query. In this paper, we propose a straightforward yet effective use of the pseudo-relevance feedback method to detect the more informative query terms and re-weight them. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a closer context to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context could help in identifying and re-weighting informative query terms. Our experimental results on standard English and Persian test collections show that our method improves retrieval performance, in terms of the MAP criterion, by up to 7% over traditional query term re-weighting methods.
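The general idea of re-weighting query terms from pseudo-relevant documents can be sketched as follows. This is a minimal illustration, not the authors' formula: each top-ranked document contributes to a term's weight in proportion to its retrieval score and the term's relative frequency in that document. The function name, the scoring scheme, and the sample data are all assumptions.

```python
from collections import Counter


def reweight_query_terms(query_terms, top_docs, doc_scores):
    """Return normalized query-term weights derived from top-ranked documents.

    top_docs   -- texts of the initially retrieved (assumed relevant) documents
    doc_scores -- their retrieval scores, used here as per-document weights
    """
    uniform = {term: 1.0 / len(query_terms) for term in query_terms}
    total_score = sum(doc_scores)
    if total_score == 0:
        return uniform
    weights = {}
    for term in query_terms:
        weight = 0.0
        for text, score in zip(top_docs, doc_scores):
            tokens = text.lower().split()
            if not tokens:
                continue
            relative_tf = Counter(tokens)[term] / len(tokens)
            weight += (score / total_score) * relative_tf
        weights[term] = weight
    norm = sum(weights.values())
    if norm == 0:
        return uniform  # no query term occurs in the feedback documents
    return {term: w / norm for term, w in weights.items()}


# Hypothetical feedback set: two top documents with their retrieval scores.
weights = reweight_query_terms(
    ["persian", "retrieval"],
    ["persian retrieval persian", "retrieval systems"],
    [2.0, 1.0],
)
```

In this toy run, "persian" ends up with the larger weight because it dominates the higher-scored document, which mirrors the intuition that some top documents are closer to the information need than others.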
8.
The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.
9.
Valiollah Tahani 《Information processing & management》1977,13(5):289-303
This paper is concerned with techniques for fuzzy query processing in a database system. By a fuzzy query we mean a query which uses imprecise or fuzzy predicates (e.g. AGE = “VERY YOUNG”, SALARY = “MORE OR LESS HIGH”, YEAR-OF-EMPLOYMENT = “RECENT”, SALARY ? 20,000, etc.). As a basis for fuzzy query processing, a fuzzy retrieval system based on the theory of fuzzy sets and linguistic variables is introduced. In our system model, the first step in processing fuzzy queries consists of assigning meaning to the fuzzy terms (linguistic values) of a term-set used for the formulation of a query. The meaning of a fuzzy term is defined as a fuzzy set in a universe of discourse which contains the numerical values of a domain of a relation in the system database. The fuzzy retrieval system developed is a high-level model for the techniques which may be used in a database system. The feasibility of implementing such techniques in a real environment is studied. Specifically, within this context, techniques for processing simple fuzzy queries expressed in the relational query language SEQUEL are introduced.
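The first step described above, assigning meaning to a fuzzy term as a fuzzy set over a numeric domain, can be sketched as follows. This is a minimal illustration under stated assumptions: the membership function for "young" (full membership up to age 25, none from age 45) is invented for the example, and the hedges "very" and "more or less" use the conventional squaring and square-root operators from fuzzy set theory, which the abstract does not itself specify.

```python
import math


def young(age):
    """Hypothetical membership function for the fuzzy term YOUNG."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def very(membership):
    """Conventional concentration hedge: squares the membership degree."""
    return membership ** 2


def more_or_less(membership):
    """Conventional dilation hedge: square root of the membership degree."""
    return math.sqrt(membership)


def fuzzy_select(rows, predicate, threshold=0.5):
    """Return (row, degree) pairs meeting the threshold, best match first."""
    graded = [(row, predicate(row)) for row in rows]
    matches = [pair for pair in graded if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical relation; the query corresponds to AGE = "VERY YOUNG".
employees = [
    {"name": "A", "age": 22},
    {"name": "B", "age": 30},
    {"name": "C", "age": 40},
]
result = fuzzy_select(employees, lambda row: very(young(row["age"])))
```

Unlike a Boolean predicate, each returned row carries a degree of match, so the answer set can be ranked rather than merely filtered.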
10.
We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user’s information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through local pattern discovery and global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.
11.
To resolve some of the lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. At indexing time, the proposed system clusters the logs of users’ queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine-readable dictionary. At search time, the proposed system calculates the similarities between users’ queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms traditional information retrieval systems in FAQ retrieval.
12.
Filipe Mesquita, Altigran S. da Silva, Edleno S. de Moura, Pável Calado, Alberto H.F. Laender 《Information processing & management》2007
A vast amount of valuable information, produced and consumed by people and institutions, is currently stored in relational databases. For many purposes, there is an ever increasing demand for having these databases published on the Web, so that users can query the data available in them. An important requirement for this to happen is that query interfaces must be as simple and intuitive as possible. In this paper we present LABRADOR, a system for efficiently publishing relational databases on the Web by using a simple text box query interface. The system operates by taking an unstructured keyword-based query posed by a user and automatically deriving an equivalent SQL query that fits the user’s information needs, as expressed by the original query. The SQL query is then sent to a DBMS and its results are processed by LABRADOR to create a relevance-based ranking of the answers. Experiments we present show that LABRADOR can automatically find the most suitable SQL query in more than 75% of the cases, and that the overhead introduced by the system in the overall query processing time is almost insignificant. Furthermore, the system operates in a non-intrusive way, since it requires no modifications to the target database schema.
13.
In droplet transport based on electrowetting on dielectrics, the parallel-plate configuration is more popular than the single-plate one because droplet transport becomes increasingly difficult without a cover plate. In spite of the improved transport performance, the parallel-plate configuration often limits access to the peripheral components, which calls for removing the cover plate, i.e., the single-plate configuration. We investigated the fundamental features of droplet transport for the single-plate configuration. We compared the performance of several switching methods with respect to the maximum speed of successive transport without failure, and suggested a nonfloating switching method, which is inherently free from the charge-residue problem and exerts greater force on a droplet than conventional switching methods. A simple theory is provided to explain the different results for the switching methods.
14.
《Information processing & management》2023,60(2):103171
The present work analyzes the application of deep learning in the context of digital twins (DTs) to promote the development of smart cities. According to the theoretical basis of DTs and smart city construction, the five-dimensional DTs model is discussed to propose the conceptual framework of the DTs city. Then, edge computing technology is introduced to build an intelligent traffic perception system based on edge computing combined with DTs. Moreover, to improve the traffic scene recognition accuracy, the Single Shot MultiBox Detector (SSD) algorithm is optimized by the residual network, forming the SSD-ResNet50 algorithm, and DarkNet-53 is also improved. Finally, experiments are conducted to verify the effects of the improved algorithms and the data enhancement method. The experimental results indicate that the SSD-ResNet50 and the improved DarkNet-53 algorithms show fast training speed, high recognition accuracy, and favorable training effect. Compared with the original algorithms, the recognition time of the SSD-ResNet50 algorithm and the improved DarkNet-53 algorithm is reduced by 6.37 ms and 4.25 ms, respectively. The data enhancement method used in the present work is not only suitable for the algorithms reported here, but also has a good influence on other deep learning algorithms. Moreover, the SSD-ResNet50 and improved DarkNet-53 algorithms have significant applicable advantages in the research of traffic sign target recognition. The rigorous research with appropriate methods and comprehensive results can offer an effective reference for subsequent research on DTs cities.
15.
The proposed work aims to explore and compare the potency of syntactic-semantic based linguistic structures in plagiarism detection using natural language processing techniques. The current work explores linguistic features, viz., part-of-speech tags, chunks and semantic roles, in detecting plagiarized fragments and utilizes a combined syntactic-semantic similarity metric, which extracts the semantic concepts from the WordNet lexical database. The linguistic information is utilized for effective pre-processing and for making semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels upon the features extracted is analyzed and discussed. Further, unlike the existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpus provided by the PAN competition from 2009 to 2014. The approach showed considerable improvement in comparison with the top-ranked systems of the respective years. The evaluation and analysis with various cases of plagiarism also reflected the supremacy of deeper linguistic features for identifying manually plagiarized data.