Similar literature
20 similar documents found.
1.
Vocabulary mining in information retrieval refers to the use of domain vocabulary to improve the user’s query. Queries posed to information retrieval systems are often not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.
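
To make the rough-set approximations mentioned in this abstract concrete, here is a minimal Python sketch of the classical (crisp) lower and upper approximations of a query. The toy vocabulary, the synonym classes, and the function names are assumptions made for illustration; they are not taken from the paper, and the fuzzy and variable-precision variants it studies are not covered.

```python
def lower_approximation(classes, target):
    """Union of the equivalence classes that lie entirely inside target."""
    result = set()
    for cls in classes:
        if cls <= target:
            result |= cls
    return result


def upper_approximation(classes, target):
    """Union of the equivalence classes that share at least one term with target."""
    result = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result


# Hypothetical synonym classes that partition a tiny vocabulary.
classes = [
    frozenset({"car", "automobile"}),
    frozenset({"bike", "bicycle"}),
    frozenset({"train"}),
]
query = {"car", "train"}

lower = lower_approximation(classes, query)  # terms certainly covered by the query
upper = upper_approximation(classes, query)  # terms possibly covered by the query
```

In this toy setting the upper approximation acts as a query expansion (it pulls in "automobile"), while the lower approximation keeps only the terms whose whole synonym class already appears in the query.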

2.
The application of natural language processing (NLP) to financial fields is advancing with an increase in the number of available financial documents. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) have been successful in NLP in recent years. These cutting-edge models have been adapted to the financial domain by applying financial corpora to existing pre-trained models and by pre-training with the financial corpora from scratch. In Japanese, by contrast, financial terminology cannot be applied from a general vocabulary without further processing. In this study, we construct language models suitable for the financial domain. Furthermore, we compare methods for adapting language models to the financial domain, such as pre-training methods and vocabulary adaptation. We confirm that the adaptation of a pre-training corpus and tokenizer vocabulary based on a corpus of financial text is effective in several downstream financial tasks. No significant difference is observed between pre-training with the financial corpus and continuous pre-training from the general language model with the financial corpus. We have released our source code and pre-trained models.

3.
This paper addresses the problem of how to rank retrieval systems without the need for human relevance judgments, which are very resource intensive to obtain. Using TREC 3, 6, 7 and 8 data, it is shown how the overlap structure between the search results of multiple systems can be used to infer relative performance differences. In particular, the overlap structures for random groupings of five systems are computed, so that each system is selected an equal number of times. It is shown that the average percentage of a system’s documents that are found only by it and by no other system is strongly and negatively correlated with its retrieval effectiveness, as measured by its mean average precision or precision at 1000. The presented method uses the degree of consensus or agreement a retrieval system can generate to infer its quality. This paper also addresses the question of how many documents in a ranked list need to be examined to be able to rank the systems. It is shown that the overlap structure of the top 50 documents can be used to rank the systems, often producing the best results. The presented method significantly improves upon previous attempts to rank retrieval systems without the need for human relevance judgments. This “structure of overlap” method can be of value to communities that need to identify the best experts or rank them, but do not have the resources to evaluate the experts’ recommendations, since it does not require knowledge about the domain being searched or the information being requested.
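
The central quantity in this abstract, the share of a system’s documents that no other system in the group retrieves, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the system names and document IDs are invented, a single group of three systems stands in for the paper’s random groupings of five, and ranking by ascending unique share is only one plausible reading of the described correlation.

```python
def unique_fraction(results):
    """For each system, the fraction of its documents retrieved by no other system."""
    fractions = {}
    for name, docs in results.items():
        # Collect everything retrieved by the other systems in the group.
        others = set()
        for other_name, other_docs in results.items():
            if other_name != name:
                others |= set(other_docs)
        own = set(docs)
        fractions[name] = len(own - others) / len(own) if own else 0.0
    return fractions


# Hypothetical top-ranked documents for three systems.
results = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d1", "d2", "d3", "d5"],
    "sysC": ["d1", "d6", "d7", "d8"],
}

fractions = unique_fraction(results)
# A lower unique share means more agreement with the other systems,
# so systems are ordered from most to least consensual.
ranking = sorted(fractions, key=fractions.get)
```

Here sysC retrieves mostly documents that nobody else finds, so it lands last in the consensus-based ordering.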

4.
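Vocabulary teaching occupies a very important place in English teaching, and students' command of vocabulary memorization methods constrains, to some extent, improvements in the quality of English teaching, so it deserves sufficient attention. This paper focuses on how to use the "associative memory method" in foreign language teaching to help students improve their ability to memorize words, in the hope of offering some useful insights for English vocabulary teaching.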

5.
Changes in the vocabulary of information science over a period of eleven years were studied in order to determine the effect of such change on index vocabularies. The language of the Annual Review of Information Science and Technology was studied; the vocabulary was found to be changing at a rate of about 4% per year, with old terms leaving the vocabulary at about the same rate as new ones enter it. No change of usage among synonyms was found, but trends in the discipline showed up in changes in emphasis of the vocabulary. In particular, hardware-oriented terms seem to be declining in importance; there is some evidence that management and cataloging are becoming more important. Conclusions are: thesauri and other indexing vocabularies must provide for change as expected, with deletion or other provision for terms which pass out of use as important as addition of new terms; change of usage among synonyms is not significant in the short term (eleven years); and vocabulary change indicates the direction of growth in a discipline.

6.
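An Empirical Study of the Psychology of Natural Language Understanding in Short Text Classification   Cited by: 1 (self-citations: 0; citations by others: 1)
Most current research on text classification concentrates on feature selection or classifier algorithms built on large-scale corpora. This paper instead works with few training samples, each of short length, and conducts an empirical study of short text classification from the perspective of a psychological principle of how the human brain understands natural language: "people always judge on the basis of the most familiar and most typical examples they know; only when that approach fails do they resort to the notion of frequency, and then only to a very simple frequency." Using the "familiarity principle" and the "typicality principle" from psychology as models, the paper builds a special lexicon and a lexicon of typical cases, improves the experimental procedure of traditional text classification, and sets out the advantages and limitations of the method.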

7.
In image retrieval, most systems lack user-centred evaluation since they are assessed by some chosen ground truth dataset. The results reported through precision and recall assessed against the ground truth are thought of as being an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images for enhancing retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy based on some chosen ground truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation is composed of two individual methods. First, the automatic indexing annotation results are assessed by human subjects. Second, the subjects are asked to annotate some chosen images, which form a test set whose annotations are used as ground truth; the system is then run on this test set and its annotation results are judged against that ground truth. Only one of these methods is reported for most systems on which user-centred evaluation is conducted. We believe that both methods need to be considered for full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, our proposed evaluation methodology is able to provide deeper understanding of the system’s performance.

8.
Collaborative information behavior is an essential aspect of organizational work; however, we have very limited understanding of this behavior. Most models of information behavior focus on the individual seeker of information. In this paper, we report the results from two empirical studies that investigate aspects of collaborative information behavior in organizational settings. From these studies, we found that collaborative information behavior differs from individual information behavior with respect to how individuals interact with each other, the complexity of the information need, and the role of information technology. There are specific triggers for transitioning from individual to collaborative information behavior, including lack of domain expertise. The information retrieval technologies used affect collaborative information behavior by acting as important supporting mechanisms. From these results and prior work, we develop a model of collaborative information behavior along the axes of participant behavior, situational elements, and contextual triggers. We also present characteristics of a collaborative information system, including search, chat, and sharing. We discuss implications for the design of collaborative information retrieval systems and directions for future work.

9.
Parallel computation approaches for flexible multibody dynamics simulations   Cited by: 1 (self-citations: 0; citations by others: 1)
Finite element based formulations for flexible multibody systems are becoming increasingly popular, and as the complexity of the configurations to be treated increases, so does the computational cost. It seems natural to investigate the applicability of parallel processing to this type of problem; domain decomposition techniques have been used extensively for this purpose. In this approach, the computational domain is divided into non-overlapping sub-domains, and the continuity of the displacement field across sub-domain boundaries is enforced via the Lagrange multiplier technique. In the finite element literature, this approach is presented as a mathematical algorithm that enables parallel processing. In this paper, the divided system is viewed as a flexible multibody system, and the sub-domains are connected by kinematic constraints. Consequently, all the techniques applicable to the enforcement of constraints in multibody systems become applicable to the present problem. In particular, it is shown that a combination of the localized Lagrange multiplier technique with the augmented Lagrange formulation leads to interesting solution strategies.

10.
王莹芳 《科教文汇》2014,(20):180-181
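Reading comprehension tests a combination of abilities. Secondary school students, who tend to have a small vocabulary and to struggle with difficult sentences and with the main idea of a passage, easily become fearful of it. Reading questions in fact follow patterns: students need to grasp the detailed connections between the answer options and the passage, its key sentences, and so on, and by building vocabulary and developing inference skills they can achieve high scores.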

11.
陈文玲 《科教文汇》2012,(35):118-120
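The core of random access instruction is that the same content should be studied repeatedly at different times, in different contexts, for different purposes, and from different perspectives, so as to gain a fuller knowledge and understanding of the same subject. From the perspective of random access theory, this paper discusses the value of extracurricular reading in two respects, enlarging English vocabulary and promoting the conversion of passive vocabulary into active vocabulary, and offers suggestions for cultivating the habit of extracurricular reading: change one's view of reading, choose topics of interest, pick material of suitable difficulty, and schedule reading time sensibly.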

12.
王海建 《科教文汇》2011,(32):149-150
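Computer English has become a specialized course in its own right and serves as a bridge for human-computer dialogue in computer applications. What most distinguishes computer English from other specialized varieties of English is how rapidly it changes. Its characteristics include: it is objective, rigorous, precise, and concise; it has many technical terms; abbreviations appear often; many new compound words are coined; prepositional phrases, participial phrases, and noun phrases are used frequently; long sentences, imperative sentences, and the passive voice are common; and equations and numbers make up a certain proportion of the text. To learn computer English well, therefore, one must first keep learning new computer technologies so as to understand the relevant content properly, and take care to understand and memorize the new technical vocabulary that keeps appearing. With an understanding of the technology combined with a command of the vocabulary, one can understand the material well, and by accumulating knowledge steadily one can keep improving one's computer English.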

13.
14.
Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. Examples include the keywords used to tag journal articles and the taxonomy of terms used to label image files. These existing vocabularies can be encoded into SKOS, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept.
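
The last step, going from a user-supplied string to a vocabulary concept, might look roughly like the Python sketch below. It is an assumption-laden illustration rather than the mechanism the paper proposes: the concept URIs under `example.org`, the labels, and the helper names are hypothetical, and the vocabulary is held in a plain dictionary (keyed by the SKOS notions of preferred and alternative labels) instead of being parsed from an RDF file.

```python
# Hypothetical concepts, each with one preferred label and some alternative labels.
concepts = {
    "http://example.org/vocab#spiralGalaxy": {
        "prefLabel": "spiral galaxy",
        "altLabels": ["spiral galaxies", "spiral nebula"],
    },
    "http://example.org/vocab#supernova": {
        "prefLabel": "supernova",
        "altLabels": ["supernovae"],
    },
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_concepts(query, concepts):
    """Return the URIs of concepts with a label that equals the normalized query."""
    q = normalize(query)
    matches = []
    for uri, labels in concepts.items():
        all_labels = [labels["prefLabel"]] + labels["altLabels"]
        if any(q == normalize(label) for label in all_labels):
            matches.append(uri)
    return matches


hits = find_concepts("  Spiral   Galaxies ", concepts)
misses = find_concepts("quasar", concepts)
```

Exact matching on normalized labels is the simplest possible design; a practical search would probably also need stemming or approximate matching, which this sketch deliberately leaves out.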

15.
Recently the Wigner distribution has been shown to be a potentially useful tool for analysing time-varying frequency-domain phenomena. In this paper, some of the salient features of the Wigner distribution are presented; properties of this important discrete distribution are derived, and an efficient digital implementation is presented. Effective Wigner throughput rates, in excess of those obtainable with an equivalent length FFT, are shown to be feasible. In particular, the Wigner distribution is studied in the context of enhancing speech analysis and recognition systems. It is suggested that this class of distribution is consistent with the mechanics of human speech and that, in experiments, it produces a very robust spectral signature. This enriched data space can be used to uncover some frequency domain attributes of human speech which may be lost using a discrete Fourier transform.
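
For readers unfamiliar with the discrete Wigner distribution, the following pure-Python sketch computes one common discrete form directly from its definition: for each time index, the product of samples at symmetric lags is Fourier transformed over the lag. This is a deliberately naive cubic-time illustration under assumed conventions (rectangular lag window limited by the signal edges, no smoothing); it is not the efficient implementation the abstract describes, and the test signal is invented.

```python
import cmath


def wigner_distribution(x):
    """Discrete Wigner distribution of a complex sequence x.

    Returns a list of rows, one per time index n; row[k] is the value in
    frequency bin k. Because the lag product spans two samples per lag step,
    bin k corresponds to a frequency of k / (2 * len(x)) cycles per sample.
    """
    n_samples = len(x)
    wvd = []
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the signal.
        max_lag = min(n, n_samples - 1 - n)
        row = []
        for k in range(n_samples):
            total = 0j
            for m in range(-max_lag, max_lag + 1):
                product = x[n + m] * x[n - m].conjugate()
                total += product * cmath.exp(-2j * cmath.pi * k * m / n_samples)
            # The lag product is conjugate-symmetric in m, so the sum is real.
            row.append(total.real)
        wvd.append(row)
    return wvd


# Hypothetical test signal: a complex tone at 0.125 cycles per sample.
N = 32
signal = [cmath.exp(2j * cmath.pi * 0.125 * n) for n in range(N)]
wvd = wigner_distribution(signal)
middle_row = wvd[N // 2]
peak_bin = middle_row.index(max(middle_row))
```

For this tone the energy at the middle time index concentrates in bin 8, which under the bin-to-frequency convention stated in the docstring corresponds to 8 / 64 = 0.125 cycles per sample.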

16.
Despite the recognized value that mobile BI (m-BI) brings to firms, our understanding of the use of m-BI and its determinants is limited. In this study, we suggest that m-BI system quality characteristics may be among the factors that influence m-BI use. Yet, in the information systems (IS) literature there is mixed support for the relationship between system quality and system use at the individual level. Given there is research suggesting that engaged users are an indication of the technology’s success, we believe that ‘engagement’ may be the key to understanding the relationship discrepancy between system quality and use. To address this gap, we conducted a quantitative study of key informants who use m-BI, to understand what the key m-BI capabilities are and which other success dimensions users perceive as important. The results indicate that m-BI system quality attributes affect m-BI use indirectly through engagement, with this finding contributing to understanding of the complexity of IS use in mobile technologies.

17.
One of the drawbacks of the controllability theory for nonlinear systems is that most existing controllability criteria are not algebraically verifiable, which makes them difficult to apply, especially if the system dimension is high. Thus, it is a significant task to seek algebraically verifiable controllability criteria for nonlinear systems. In this paper, we study controllability of discrete-time inhomogeneous bilinear systems. In the classical results on controllability of such systems, a necessary condition is that the linear part has to be controllable. However, we will show that this condition is in fact not necessary for controllability. Specifically, we first define the spectrum for discrete-time inhomogeneous bilinear systems and reveal that the spectrum is a fundamental property which is very useful in investigating the controllability problems. We then present controllability criteria for the systems with real spectrum, which are algebraically verifiable. Furthermore, we also provide algorithms for the controllable systems to compute the exact or approximated control inputs to achieve the transition between any given pair of states. The presented controllability criteria and algorithms work for the systems in any finite dimension and are easy to implement. More importantly, through our controllability criteria, we reveal that controllability of the linear part is not necessary for discrete-time inhomogeneous bilinear systems to be controllable. Examples are given to illustrate the presented algebraic controllability criteria.

18.
Information systems for un-regimented domains, such as museums and art and book collections, face representational and usability challenges that surpass the demands of traditional information systems for regimented domains. While the former require complex conceptual models supporting a set of dynamic and evolving qualitative properties of a small number of objects, the latter focus on the quantitative aspects of a possibly very large number of objects but with a relatively small and stable set of properties. In this paper we study the use of a non-monotonic knowledge-base system for the development of information systems for un-regimented domains. We discuss the ontological assumptions of the formalism, its structure and its inferential mechanisms through a simple example. Then we present an information system for a highly un-regimented domain in the digital humanities with promising results. The present study shows that the so-called extensible, flexible, dynamic or evolving information systems need the expressive power of non-monotonic knowledge-base systems, and that such phenomena should be addressed explicitly.

19.
What it means to give responsible and human-centric recommendations in the context of Artificial Intelligence (AI)-based insurance has not yet been fully explored. In this article, we therefore first provide an in-depth analysis and perform a systematic literature review on (i) the specifications and requirements for such systems from a regulation point of view, (ii) instructions on which data they can rely upon, (iii) which recommender techniques can be used for developing such an advisor, and (iv) off-the-shelf components for the trustworthy, responsible, and ethical behavior of this AI-empowered tool. Then, we present a novel approach, based on AI, to suggest insurance coverage for users and families, as well as instructions on how to design such a system. The solution, as proposed in our paper, will be transparent, trustworthy, and responsible to the final users and thus, hopefully, better accepted by customers. After describing a possible system design and architecture, we critically discuss the challenges and opportunities for the deployment of such systems in insurance companies.

20.
杨霓 《科教文汇》2011,(26):117-117,139
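This paper attempts a preliminary discussion of the lexical, grammatical, syntactic, and discourse features of business English. Business English belongs to English for Specific Purposes (ESP) and is a branch of English. It does not differ in essence from general English: it rests on the basic vocabulary, grammar, and sentence structures of English, yet it has features of its own. Only by fully grasping these features can one understand business English more deeply.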


Copyright©北京勤云科技发展有限公司  京ICP备09084417号