Similar Documents
20 similar documents found
1.
This paper reports our experimental investigation into the use of more realistic concepts, as opposed to simple keywords, for document retrieval, and into reinforcement learning for improving document representations to help the retrieval of useful documents for relevant queries. The framework used for achieving this was based on the theory of Formal Concept Analysis (FCA) and Lattice Theory. Features or concepts of each document (and query), formulated according to FCA, are represented in a separate concept lattice and are weighted separately with respect to the individual documents they represent. The document retrieval process is viewed as a continuous conversation between queries and documents, during which documents are allowed to learn a set of significant concepts to help their retrieval. The learning strategy used was based on relevance feedback information that makes the similarity of relevant documents stronger and that of non-relevant documents weaker. Test results obtained on the Cranfield collection show a significant increase in average precision as the system learns from experience.
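The abstract gives no formulas. As a minimal sketch of the idea it describes — concepts weighted per document, strengthened or weakened through relevance feedback — with all function names, the additive update rule, and the learning rate assumed rather than taken from the paper:

```python
def similarity(query_concepts, doc_weights):
    """Score a document by the weights of the query concepts it carries."""
    return sum(doc_weights.get(c, 0.0) for c in query_concepts)

def feedback_update(doc_weights, query_concepts, relevant, lr=0.1):
    """Reinforcement step: strengthen concepts for relevant documents,
    weaken them otherwise (weights floored at zero). Hypothetical rule."""
    delta = lr if relevant else -lr
    for c in query_concepts:
        doc_weights[c] = max(doc_weights.get(c, 0.0) + delta, 0.0)
    return doc_weights
```

Over repeated queries, documents judged relevant accumulate weight on the concepts that retrieved them, which is the "learning from experience" effect the Cranfield results measure.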

2.
Machine reading comprehension (MRC) is a challenging task in the field of artificial intelligence. Most existing MRC works contain a semantic matching module, either explicitly or intrinsically, to determine whether a piece of context answers a question. However, there is scant work which systematically evaluates different paradigms using semantic matching in MRC. In this paper, we conduct a systematic empirical study on semantic matching. We formulate a two-stage framework which consists of a semantic matching model and a reading model, based on pre-trained language models. We compare and analyze the effectiveness and efficiency of using semantic matching modules with different setups on four types of MRC datasets. We verify that using semantic matching before a reading model improves both the effectiveness and efficiency of MRC. Compared with answering questions by extracting information from concise context, we observe that semantic matching yields more improvements for answering questions with noisy and adversarial context. Matching coarse-grained context to questions, e.g., paragraphs, is more effective than matching fine-grained context, e.g., sentences and spans. We also find that semantic matching is helpful for answering who/where/when/what/how/which questions, whereas it decreases the MRC performance on why questions. This may imply that semantic matching helps to answer a question whose necessary information can be retrieved from a single sentence. The above observations demonstrate the advantages and disadvantages of using semantic matching in different scenarios.
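The two-stage structure — a cheap matcher that filters context before an expensive reader — can be sketched independently of any particular model. Here the matcher is a hypothetical token-overlap heuristic standing in for the paper's pre-trained matching model, and `reader` is any callable:

```python
def match_then_read(question, passages, reader, threshold=0.5):
    """Stage 1: keep passages covering enough question tokens;
    Stage 2: run the (expensive) reader only on the survivors.
    The overlap matcher and threshold are illustrative assumptions."""
    q_tokens = set(question.lower().split())
    kept = [p for p in passages
            if q_tokens and len(q_tokens & set(p.lower().split())) / len(q_tokens) >= threshold]
    return [reader(question, p) for p in kept]
```

The efficiency gain the abstract reports comes from the reader never seeing passages the matcher discards.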

3.
Machine learning tools are increasingly infiltrating everyday work life with implications for workers. By looking at machine learning tools as part of a sociotechnical system, we explore how machine learning tools enforce oppression of workers. We theorize, normatively, that with reorganizing processes in place, oppressive characteristics could be converted to emancipatory characteristics. Drawing on Paulo Freire’s critical theory of emancipatory pedagogy, we outline similarities between the characteristics Freire saw in oppressive societies and the characteristics of currently designed partnerships between humans and machine learning tools. Freire’s theory offers a way forward in reorganizing humans and machine learning tools in the workplace. Rather than advocating human control or the decoupling of workers and machines, we follow Freire’s theory in proposing four processes for emancipatory organizing of human and machine learning partnership: 1) awakening of a critical consciousness, 2) enabling role freedom, 3) instituting incentives and sanctions for accountability, and 4) identifying alternative emancipatory futures. Theoretical and practical implications of this emancipatory organizing theory are drawn.

4.
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations.

Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, briefly considers other environment conditions and tasks and model training, and concludes with comparisons with other approaches and an overall assessment.

Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2.
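The abstract does not reproduce the weighting function, but the line of research it describes is the one that produced the Okapi BM25 formula. A self-contained sketch of BM25 scoring over tokenized documents (parameter values `k1=1.2`, `b=0.75` are conventional defaults, not taken from this paper):

```python
import math

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of one document for a tokenized query, given the corpus."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```

The saturation in the `tf` term and the length normalization controlled by `b` are exactly the "more aspects of retrieval data" the model development covers.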

5.
Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of terms used to sort feedback documents, thus possibly leading to a semantic gap between query representation and document representation. In this work, a PRF framework that combines relevance matching and semantic matching is proposed to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which exact-term information and the semantic similarity between query and document representations are calculated with bidirectional encoder representations from transformers (BERT); this mechanism reduces the textual semantic gap by using the semantic information and improves the quality of feedback documents. Then, our proposed PRF framework is constructed to process the results of the first round of retrieval by using probability-based PRF methods and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of mean average precision (MAP) and precision at position 10 (P@10), and they also highlight that combining relevance matching and semantic matching is more effective than using relevance matching or semantic matching alone for improving the quality of feedback documents.
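The reranking step that precedes feedback can be sketched as a simple linear interpolation of an exact-term score and a semantic score. This is a generic stand-in, not the paper's exact combination; the scores are assumed pre-normalized to [0, 1] and the weight `alpha` is hypothetical:

```python
def rerank(candidates, exact_scores, semantic_scores, alpha=0.6):
    """Rerank first-round candidates by interpolating exact-term and
    semantic similarity scores before selecting feedback documents."""
    def combined(d):
        return alpha * exact_scores.get(d, 0.0) + (1 - alpha) * semantic_scores.get(d, 0.0)
    return sorted(candidates, key=combined, reverse=True)
```

The top of the reranked list then supplies the pseudo-relevant documents that the probability-based or language-model-based PRF step expands the query from.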

6.
7.
We study several machine learning algorithms for cross-language patent retrieval and classification. In comparison with most other studies involving machine learning for cross-language information retrieval, which basically used learning techniques for monolingual sub-tasks, our learning algorithms exploit the bilingual training documents and learn a semantic representation from them. We study Japanese–English cross-language patent retrieval using Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel-defined feature spaces. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. We also investigate learning algorithms for cross-language document classification. The learning algorithms are based on KCCA and Support Vector Machines (SVM). In particular, we study two ways of combining KCCA and SVM and find that one particular combination, called SVM_2k, achieves better results than the other learning algorithms for both bilingual and monolingual test documents.
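The quantity (K)CCA maximizes is the correlation between projections of the two language views; in the one-dimensional linear case this objective reduces to the plain Pearson correlation, sketched here as an illustration of what the learned projections optimize (not an implementation of KCCA itself):

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences. For 1-D views this
    is the single canonical correlation; (K)CCA finds projections maximizing
    exactly this quantity in higher-dimensional / kernel feature spaces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

In the bilingual setting, each aligned Japanese–English document pair contributes one point to each view, and the learned directions make the projected pairs maximally correlated.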

8.
In this paper, we propose a multi-strategic matching and merging approach to find correspondences between ontologies based on the syntactic or semantic characteristics and constraints of the Topic Maps. Our multi-strategic matching approach consists of a linguistic module and a Topic Map constraints-based module. The linguistic module computes similarities between concepts using morphological analysis, string normalization and tokenization, and language-dependent heuristics. The Topic Map constraints-based module takes advantage of several Topic Maps-dependent techniques such as topic property-based matching, hierarchy-based matching, and association-based matching. This is a composite matching procedure and does not need to generate all cross-pairs of topics from the two ontologies, because unmatched pairs of topics can be removed by the characteristics and constraints of the Topic Maps. Merging between Topic Maps follows the matching operations. We set up the MERGE function to integrate two Topic Maps into a new Topic Map, which satisfies such merge requirements as entity preservation, property preservation, relation preservation, and conflict resolution. For our experiments, we used oriental philosophy ontologies, western philosophy ontologies, the Yahoo western philosophy dictionary, and the Wikipedia philosophy ontology as input ontologies. Our experiments show that the automatically generated matching results conform to the outputs generated manually by domain experts and can be of great benefit to the subsequent merging operations.
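The linguistic module's string normalization and tokenization step can be sketched as token-set Jaccard similarity over normalized labels. This is a simplified stand-in (the paper also uses morphological analysis and language-dependent heuristics, omitted here):

```python
import re

def normalize(label):
    """Lowercase, split camelCase, and tokenize on whitespace/underscore/hyphen."""
    label = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    tokens = re.split(r"[\s_\-]+", label.lower())
    return {t for t in tokens if t}

def name_similarity(a, b):
    """Jaccard similarity of normalized token sets, as a linguistic-module stand-in."""
    ta, tb = normalize(a), normalize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

Pairs scoring below a threshold can be pruned before the more expensive constraint-based checks, which is the point of the composite procedure.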

9.
Starting from an analysis of the conflicts inherent in supply-demand matching, this paper proposes an approach that applies the graph model for conflict resolution to supply-demand matching problems. First, a research framework for supply-demand matching is given from the graph-model perspective, different types of matching problems are analyzed, and the decision makers and their strategies are specified. Then, state reduction and preference expression under the different matching types are discussed, and stability analysis is performed. Finally, an application case compares the graph model with a group-decision matching algorithm, verifying the effectiveness of the graph model for solving supply-demand matching problems.
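The abstract does not specify which group-decision matching algorithm serves as its comparison baseline. A classic baseline for two-sided supply-demand matching is deferred acceptance (Gale–Shapley), sketched here with hypothetical supplier/demander names purely for illustration:

```python
def deferred_acceptance(supplier_prefs, demander_prefs):
    """Gale–Shapley deferred acceptance: suppliers propose in preference order,
    demanders hold the best offer seen so far. Returns a stable
    supplier -> demander matching (assumes equal-sized complete preference lists)."""
    free = list(supplier_prefs)
    next_idx = {s: 0 for s in supplier_prefs}
    engaged = {}  # demander -> supplier currently held
    rank = {d: {s: i for i, s in enumerate(prefs)}
            for d, prefs in demander_prefs.items()}
    while free:
        s = free.pop()
        d = supplier_prefs[s][next_idx[s]]
        next_idx[s] += 1
        if d not in engaged:
            engaged[d] = s
        elif rank[d][s] < rank[d][engaged[d]]:
            free.append(engaged[d])  # displace the weaker offer
            engaged[d] = s
        else:
            free.append(s)  # rejected; will propose to the next choice
    return {s: d for d, s in engaged.items()}
```

Stability here means no supplier-demander pair would both prefer each other over their assigned partners, which is the kind of equilibrium notion the graph model's stability analysis generalizes to conflict settings.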

10.
In recent times, the exploration of multimedia has created ever-increasing demand for intelligent video retrieval from repositories. This paper presents an efficient video retrieval framework that employs singular value decomposition and computationally low-complexity ordered dither block truncation coding to extract a simple, compact, and well-discriminative Color Co-occurrence Feature (CCF). In this context, the occurrence probability of a video frame pixel in its neighborhood is employed to formulate this specific and distinct feature. Moreover, we apply a new adaptive low-rank thresholding, based on energy concentricity, transposition, and replacement invariance characteristics, to formulate a unified fast shot boundary detection approach that addresses the prominent bottleneck of real-time cut and gradual transition detection and eventually contributes to effective keyframe extraction. The extracted keyframes are therefore distinct and discriminative representatives of the whole video content. For effective indexing and retrieval, it is imperative to formulate a similarity score evaluator for the encapsulated contextual video information with substantial temporal consistency, little computation, and little post-processing. We therefore introduce graph-based pattern matching for video retrieval, with the aim of sustaining temporal consistency and accuracy while limiting time overhead. Experimental results show that, on average, the proposed method provides 7.40% and 17.91% better retrieval accuracy and is 23.21% and 20.44% faster than recent state-of-the-art methods on the UCF11 and HMDB51 standard video datasets, respectively.
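The abstract does not give the adaptive low-rank thresholding rule. A common energy-based way to choose a rank from an SVD spectrum, offered only as an illustrative stand-in (the energy fraction is a hypothetical parameter), is:

```python
def low_rank_by_energy(singular_values, energy=0.9):
    """Smallest rank whose leading singular values capture the given fraction
    of total energy (sum of squared singular values)."""
    total = sum(s * s for s in singular_values)
    acc = 0.0
    for k, s in enumerate(sorted(singular_values, reverse=True), 1):
        acc += s * s
        if acc / total >= energy:
            return k
    return len(singular_values)
```

A sharp drop in the retained rank between adjacent frames signals a content change, which is how a low-rank criterion can feed a shot boundary detector.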

11.
The data fusion technique has been investigated by many researchers and has been used in implementing several information retrieval systems. However, the results from data fusion vary in different situations. Finding out under which conditions data fusion may lead to performance improvement is therefore an important issue. In this paper, we present an analysis of the behaviour of several well-known methods, such as CombSum and CombMNZ, for fusion of multiple information retrieval results. Based on this analysis, we predict the performance of the data fusion methods. Experiments are conducted with three groups of results submitted to TREC 6, TREC 2001, and TREC 2004. The experiments show that the prediction of data fusion performance is quite accurate, and that it can be used in situations very different from the training examples. Compared with previous work, our result is more accurate and better positioned for applications, since a varying number of component systems can be supported, whereas only two were used previously.
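CombSum and CombMNZ themselves are simple enough to state directly: CombSum adds the (normalized) scores a document receives across systems, and CombMNZ multiplies that sum by the number of systems that returned the document. A minimal sketch, assuming scores are already normalized to a common range:

```python
def comb_sum(score_lists):
    """CombSum: sum of scores across component systems, per document."""
    fused = {}
    for scores in score_lists:
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused

def comb_mnz(score_lists):
    """CombMNZ: CombSum times the number of systems returning the document."""
    fused = comb_sum(score_lists)
    counts = {}
    for scores in score_lists:
        for doc in scores:
            counts[doc] = counts.get(doc, 0) + 1
    return {doc: s * counts[doc] for doc, s in fused.items()}
```

The MNZ multiplier rewards documents retrieved by many systems, which is why CombMNZ tends to help when component systems agree on relevant documents.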

12.
In the present work we perform compressed pattern matching in binary Huffman encoded texts [Huffman, D. (1952). A method for the construction of minimum redundancy codes, Proc. of the IRE, 40, 1098–1101]. A modified Knuth–Morris–Pratt algorithm is used in order to overcome the problem of false matches, i.e., an occurrence of the encoded pattern in the encoded text that does not correspond to an occurrence of the pattern itself in the original text. We propose a bitwise KMP algorithm that can move one extra bit in the case of a mismatch, since the alphabet is binary. To avoid processing any bit of the encoded text more than once, a preprocessed table is used to determine how far to back up when a mismatch is detected, and it is defined so that we are always able to align the start of the encoded pattern with the start of a codeword in the encoded text. We combine our KMP algorithm with two practical Huffman decoding schemes which handle more than a single bit per machine operation: skeleton trees defined by Klein [Klein, S. T. (2000). Skeleton trees for efficient decoding of huffman encoded texts. Information Retrieval, 3, 7–23], and numerical comparisons between special canonical values and portions of a sliding window presented in Moffat and Turpin [Moffat, A., & Turpin, A. (1997). On the implementation of minimum redundancy prefix codes. IEEE Transactions on Communications, 45, 1200–1207]. Experiments show rapid search times of our algorithms compared to the “decompress then search” method; files can therefore be kept in their compressed form, saving memory space. When compression gain is important, these algorithms are better than cgrep [Ferragina, P., Tommasi, A., & Manzini, G. (2004). C Library to search over compressed texts, http://roquefort.di.unipi.it/~ferrax/CompressedSearch], which is only slightly faster than ours.
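For reference, the textbook KMP that the paper modifies looks as follows over a bit string. This is the plain algorithm, not the paper's bitwise variant with its codeword-aligned backup table:

```python
def kmp_table(pattern):
    """Classic KMP failure function: longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

def kmp_search(text, pattern):
    """All start positions of pattern in text, scanning each symbol once."""
    table = kmp_table(pattern)
    k = 0
    hits = []
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = table[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - len(pattern) + 1)
            k = table[k - 1]
    return hits
```

Note that over raw encoded bits this would report false matches at positions not aligned with codeword boundaries; the paper's modified backup table is precisely what rules those out.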

13.
An analysis of the quality and efficiency of initial job matching for college graduates   Cited by: 1 (self-citations: 0, external: 1)
袁红清 《科技与管理》 2010, 12(2): 137-140, 144
Through an analysis of college graduates' initial employment, and in particular of the factors affecting the quality and efficiency of job matching, the paper argues that defects in the talent market itself and excessive faith in the market's capability are the main causes of poor matching quality and efficiency. It proposes that, without changing the supply-demand structure of talent, cooperation and social embeddedness can effectively improve the quality and efficiency of college graduates' initial job matching.

14.
[Purpose/Significance] Identifying, from citation networks, the interdisciplinary knowledge that plays broker roles in cross-disciplinary communication helps both to understand the current state of interdisciplinary exchange and to lay a foundation for subsequent broker-based identification of related interdisciplinary knowledge and the promotion of interdisciplinary collaboration. [Method/Process] Based on the theory of broker role classification, a classification of broker roles in interdisciplinary knowledge exchange is proposed, and a model is constructed to identify each type of current interdisciplinary broker for a target discipline. [Result/Conclusion] Taking data from six influential, highly interdisciplinary library and information science journals as the sample, the study finds: current interdisciplinary input gatekeepers include semantic similarity, conditional random fields, national security, and think tanks; current interdisciplinary output agents include the KANO model, feature analysis, smart libraries, and research hotspots; and current interdisciplinary input-output liaisons include policy analysis, blockchain, information security, and online public opinion.
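The broker role classification the abstract builds on can be sketched as classifying two-step paths a→b→c in a directed citation network by the discipline labels of the three nodes (in the style of Gould–Fernandez broker roles). The role names below map only loosely onto the abstract's categories, and the node/discipline labels are hypothetical:

```python
from collections import defaultdict

def broker_roles(edges, group):
    """Classify middle nodes of two-step directed paths by group membership:
    'gatekeeper' imports outside knowledge into its own field, 'representative'
    exports its field's knowledge, 'itinerant' brokers within another field,
    'liaison' bridges two other fields."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    roles = defaultdict(set)
    for a in list(succ):
        for b in list(succ[a]):
            for c in succ.get(b, ()):
                if len({a, b, c}) < 3:
                    continue
                ga, gb, gc = group[a], group[b], group[c]
                if ga != gb and gb == gc:
                    roles["gatekeeper"].add(b)
                elif ga == gb and gb != gc:
                    roles["representative"].add(b)
                elif ga == gc:
                    roles["itinerant"].add(b)
                else:
                    roles["liaison"].add(b)
    return dict(roles)
```

In the paper's setting the "nodes" are knowledge elements (topics, methods) rather than authors, but the path-based role logic is the same.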

15.
16.
Current citation-based document retrieval systems generally offer only limited search facilities, such as author search. In order to facilitate more advanced search functions, we have developed a significantly improved system that employs two novel techniques: Context-based Cluster Analysis (CCA) and Context-based Ontology Generation frAmework (COGA). CCA aims to extract relevant information from clusters originally obtained from disparate clustering methods by building relationships between them. The built relationships are then represented as formal context using the Formal Concept Analysis (FCA) technique. COGA aims to generate ontology from the clusters relationship built by CCA. By combining these two techniques, we are able to perform ontology learning from a citation database using clustering results. We have implemented the improved system and have demonstrated its use for finding research domain expertise. We have also conducted performance evaluation on the system and the results are encouraging.
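The FCA machinery referenced here rests on two derivation operators over a formal context (objects × attributes); a formal concept is a pair fixed by applying both. A minimal sketch with hypothetical object/attribute names:

```python
def derive_attrs(context, objs):
    """FCA derivation: attributes shared by every object in objs."""
    attrs = None
    for o in objs:
        attrs = set(context[o]) if attrs is None else attrs & context[o]
    return attrs if attrs is not None else set()

def derive_objs(context, attrs):
    """FCA derivation: objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def closure(context, objs):
    """A formal concept is a pair (extent, intent) closed under both derivations."""
    attrs = derive_attrs(context, objs)
    return derive_objs(context, attrs), attrs
```

In CCA's case the "objects" would be clusters and the "attributes" the relationships built between them; the resulting concepts are what COGA turns into ontology nodes.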

17.
This paper outlines the basic working principles of computer color matching and its applications in practice, such as controlling the accuracy of color samples and providing quality-control color measurement reports. Using color-matching software to select good recipes makes it quick and convenient to deal with metamerism and effectively improves proofing efficiency. The paper also points out the shortcomings of computer color matching in use and ways to resolve them.
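Metamerism means two colors match under one illuminant but diverge under another. A common way measurement software quantifies the mismatch is a color-difference formula such as CIE76 ΔE*ab; the sketch below uses hypothetical CIELAB values and tolerance thresholds, not figures from the paper:

```python
import math

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two CIELAB values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def is_metameric(pair_d65, pair_a, match_tol=1.0, mismatch_tol=3.0):
    """Flag a standard/sample pair as metameric if they match under D65
    but diverge under illuminant A (tolerances are illustrative)."""
    return delta_e76(*pair_d65) <= match_tol and delta_e76(*pair_a) > mismatch_tol
```

Recipe-selection software can rank candidate recipes by their ΔE under several illuminants and reject the metameric ones, which is the workflow the abstract describes.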

18.
[Objective] To provide a reference for the formulation and use of electronic document exchange and storage standards for Chinese scientific journals. [Methods] Taking the Chinese Medical Association Publishing House as an example, this paper introduces the features and practical value of the CMA journal document exchange and storage standard (CMA JATS 0.1). [Results] In October 2014 the Publishing House released the initial version of the standard, CMA JATS 0.1, and successfully used it to guide upstream data-processing vendors and downstream platform developers, so that the downstream workflows of traditional publishing could run smoothly on the standard; more than 15 journals now process their current-issue data according to this standard and achieve rapid online publication. [Conclusions] CMA JATS 0.1 has effectively integrated the Publishing House's journal resources and advanced its digital publishing, but full implementation of the standard still requires close collaboration with the upstream and downstream companies of the digital publishing chain.
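CMA JATS 0.1 derives from the NISO JATS tagging scheme, in which article metadata lives under `front/article-meta`. The details of the CMA profile are not given in the abstract; as a generic sketch of consuming a JATS-style fragment (the sample XML is invented for illustration):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<article>
  <front><article-meta>
    <title-group><article-title>Example Title</article-title></title-group>
  </article-meta></front>
</article>"""

def article_title(jats_xml):
    """Pull the article title out of a JATS-style fragment."""
    node = ET.fromstring(jats_xml).find(".//article-title")
    return node.text if node is not None else None
```

Vendors on both ends of the chain exchanging documents in one such schema is what lets the downstream publishing workflow run without per-journal conversion, which is the integration benefit the abstract reports.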

19.
In view of the characteristics of credit information systems, this paper designs a credit information system organized into four layers: system application access, data exchange, data storage and application, and credit information publishing, together with a basic service platform. The design covers the system's construction goals and principles, the overall framework, the basic service platform, and application security.

20.
This paper presents a probabilistic information retrieval framework in which the retrieval problem is formally treated as a statistical decision problem. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. We discuss how this framework can unify existing retrieval models and accommodate systematic development of new retrieval models. As an example of using the framework to model non-traditional retrieval problems, we derive retrieval models for subtopic retrieval, which is concerned with retrieving documents to cover many different subtopics of a general query topic. These new models differ from traditional retrieval models in that they relax the traditional assumption of independent relevance of documents.
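A common concrete instantiation of this language-modeling framework scores documents by query likelihood under a smoothed document language model. A Dirichlet-smoothed sketch (the smoothing parameter `mu=2000` is a conventional default, not from this paper):

```python
import math

def query_likelihood(query, doc, collection, mu=2000):
    """Dirichlet-smoothed log query likelihood, log p(q|d), over token lists.
    `collection` is the flat token list of the whole corpus."""
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        tf = doc.count(term)
        cf = collection.count(term)
        p = (tf + mu * cf / coll_len) / (doc_len + mu)
        score += math.log(p)
    return score
```

Ranking by this score corresponds to risk minimization under a particular loss; the subtopic-retrieval models in the paper swap in losses that penalize redundant documents, breaking the independent-relevance assumption.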
