Similar Literature
Found 20 similar documents; search took 15 ms
1.
This paper presents methods for building information queries in SDI (selective dissemination of information) systems from users' publications. In most cases the users of an SDI system are scientists whose work is documented by publications resulting from their research. It was found that users' publications can serve as input data for building information queries. The compatibility between a user's information queries and his publications was examined by determining the similarity between the set of keywords indexed from the information query and the set of keywords indexed from the user's publications. Two methods of information query construction, determined by the logical operators AND, OR, NOT and by a set of weighted keywords, are described.
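
The abstract above compares the keyword set of a query with the keyword set of a user's publications but does not name the similarity measure. As a minimal sketch, assuming Jaccard similarity (an assumption, not the paper's stated measure) and made-up keyword sets:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical keyword sets, invented for illustration only.
query_keywords = {"information retrieval", "indexing", "sdi"}
publication_keywords = {"indexing", "sdi", "thesaurus"}

similarity = jaccard(query_keywords, publication_keywords)
print(similarity)  # 2 shared keywords out of 4 distinct ones
```

A score near 1 would suggest that the user's publications describe the same interests as the hand-built query; any cut-off for "compatible" would have to be chosen empirically.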

2.
3.
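A Method for Constructing Precise Search Queries Based on Word-Embedding Semantics   Total citations: 1 (self-citations: 0, citations by others: 1)
[Purpose/Significance] When searching scientific and technical literature databases, an incomplete set of keywords in the search query leads to low recall, while polysemous keywords may bring irrelevant documents into the results and lower precision. [Method/Process] To address these two problems, this paper proposes using word embeddings, a novel numerical representation of text: on the one hand, search keywords are expanded through semantic analysis to improve recall; on the other hand, semantic outliers are detected to improve precision. [Results/Conclusions] The method is applied to constructing search queries for literature on deep learning within the field of artificial intelligence, and the experimental results show that it can improve both recall and precision to a certain extent.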
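
The two steps described above (expanding keywords with semantically close terms, then flagging semantic outliers) can be sketched with cosine similarity over word vectors. This is only an illustration: the vectors, the thresholds, and the function names `expand` and `semantic_outliers` are assumptions, and the paper's actual procedure may differ.

```python
import math

# Toy 2-D vectors, invented for illustration; real embeddings would be
# trained on a corpus of scientific literature.
EMBEDDINGS = {
    "deep learning": (1.0, 0.0),
    "neural network": (0.9, 0.1),
    "convolutional network": (0.8, 0.2),
    "deep sea": (0.1, 1.0),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed, threshold=0.9):
    """Terms whose similarity to `seed` is at least `threshold`, most similar first."""
    scored = [
        (cosine(EMBEDDINGS[seed], vec), term)
        for term, vec in EMBEDDINGS.items()
        if term != seed
    ]
    return [term for score, term in sorted(scored, reverse=True) if score >= threshold]


def semantic_outliers(terms, threshold=0.4):
    """Terms whose mean similarity to the other terms falls below `threshold`."""
    outliers = []
    for term in terms:
        sims = [cosine(EMBEDDINGS[term], EMBEDDINGS[o]) for o in terms if o != term]
        if sims and sum(sims) / len(sims) < threshold:
            outliers.append(term)
    return outliers


print(expand("deep learning"))
print(semantic_outliers(["deep learning", "neural network", "deep sea"]))
```

In this toy vocabulary the expansion step adds the two network terms to the query (helping recall), while "deep sea" is flagged as an outlier that could be excluded (helping precision).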

4.
Although most of the queries submitted to search engines are composed of a few keywords and have a length that ranges from three to six words, more than 15% of the total volume of queries are verbose, introduce ambiguity and cause topic drift. We consider verbosity a property of queries distinct from length: a verbose query is not necessarily long (it may be succinct), and a short query may be verbose. This paper proposes a methodology to automatically detect verbose queries and conditionally modify queries. The methodology exploits state-of-the-art classification algorithms, combines concepts from a large linguistic database and uses a topic gisting algorithm we designed for verbose query modification purposes. Our experimental results have been obtained using the TREC Robust track collection, thirty topics classified by difficulty degree, four queries per topic classified by verbosity and length, and human assessment of query verbosity. Our results suggest that the methodology for query modification conditioned on query verbosity detection and topic gisting is significantly effective, and that query modification should be refined when topic difficulty and query verbosity are considered, since these two properties interact and query verbosity is not straightforwardly related to query length.

5.
This paper reports our experimental investigation into the use of more realistic concepts, as opposed to simple keywords, for document retrieval, and of reinforcement learning for improving document representations to help the retrieval of useful documents for relevant queries. The framework used for achieving this was based on the theory of Formal Concept Analysis (FCA) and Lattice Theory. Features or concepts of each document (and query), formulated according to FCA, are represented in a separate concept lattice and are weighted separately with respect to the individual documents they represent. The document retrieval process is viewed as a continuous conversation between queries and documents, during which documents are allowed to learn a set of significant concepts to help their retrieval. The learning strategy used was based on relevance feedback information that makes the similarity of relevant documents stronger and that of non-relevant documents weaker. Test results obtained on the Cranfield collection show a significant increase in average precision as the system learns from experience.

6.
In sponsored search, many advertisers have not achieved their expected performance, while the search engine also has considerable room to improve its revenue. Specifically, due to improper keyword bidding, many advertisers cannot survive the competitive ad auctions to get their desired ad impressions; meanwhile, a significant portion of search queries have no ads displayed in their search result pages, even if many of them have commercial value. We propose recommending a group of relevant yet less-competitive keywords to an advertiser. Hence, the advertiser can get the chance to win some (originally empty) ad slots and accumulate a number of impressions. At the same time, the revenue of the search engine can also be boosted since many empty ad slots are filled. Mathematically, we model the problem as a mixed integer programming problem, which maximizes the advertiser revenue and the relevance of the recommended keywords, while minimizing the keyword competitiveness, subject to the bid and budget constraints. By solving the problem, we can offer an optimal group of keywords and their optimal bid prices to an advertiser. Simulation results have shown the proposed method is highly effective in increasing ad impressions, expected clicks, advertiser revenue, and search engine revenue.

7.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.

8.
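Li Hailin, Lin Chunpei. 《科研管理》 (Science Research Management), 2022, 43(1): 176-183
Traditional approaches to analyzing the keywords of research outputs are strongly subjective and rarely take time into account. To address these problems, a keyword analysis method for research outputs based on time-series clustering is proposed. Statistical analysis is first used to verify that the order in which keywords appear reflects, to some extent, how important each keyword is in conveying the main idea, and keyword importance is then converted into time-series data. From the two perspectives of the value and the trend of importance, dynamic time warping is used to measure the similarity between keyword-importance time series, and affinity propagation is applied to the resulting similarity matrix to cluster the series, thereby completing the keyword analysis. A study of the keywords of papers published from 2008 to 2017 in a leading journal on science and research management shows that the new method not only clusters keywords by attention level and trend and adaptively finds central keywords to serve as representatives of each cluster, but also provides a theoretical method and decision support for the thematic analysis of research-output keywords.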
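
The similarity step above relies on dynamic time warping (DTW). The following is a minimal textbook sketch of the classic DTW distance with an absolute-difference local cost; the function name `dtw_distance` and the sample series are assumptions, and the paper's own variant (for example, how it separates value from trend) is not reproduced here.

```python
def dtw_distance(s, t):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # d[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical yearly importance scores for three keywords.
keyword_a = [1, 2, 3]
keyword_b = [1, 2, 2, 3]   # same shape as keyword_a, stretched in time
keyword_c = [2, 3, 4]      # same trend, shifted upward

print(dtw_distance(keyword_a, keyword_b))
print(dtw_distance(keyword_a, keyword_c))
```

Pairwise distances of this kind could be negated to form a similarity matrix and passed to a clustering routine such as affinity propagation, which the abstract names as the clustering step.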

9.
Assessing the similarity of scientific outputs based on an indicator has not been addressed much so far. The topic, however, may find several potential applications which can help enrich procedures of ranking, research monitoring, and scientific policy-making. The present study offers a new method to quantify such similarities based on keyword co-occurrence matrices. In the proposed method, first, the keyword co-occurrence networks are transformed into their associated newly defined fuzzy sets, named scientosemantic domains. Then, a fuzzy distance between the two domains is found based on an arbitrary indicator. In this paper, the three indicators of frequency, development and investment appeal are used. The proposed method is implemented for five types of concept comparison. For each type, concepts are represented by a canonical keyword with different field codes. Scientosemantic domains of concepts are sourced from bibliometric data obtained from appropriate queries on SCOPUS. The number of keywords used to define scientosemantic domains ranges from about 30 to 800. Since indicator-based comparison of scientosemantic domains is not dealt with in the literature, the obtained distances between concepts are verified by qualitative and expert evaluations. For all cases, frequency- and development-based distances are less than those for investment appeal; while crisp distances for the latter extend beyond 0.6, the former do not exceed 0.3. The greatest distances are observed for investment appeal in technology-related keywords.

10.
Most current document retrieval systems require that user queries be specified in the form of Boolean expressions. Although Boolean queries work, they have flaws. Some of the attempts to overcome these flaws have involved “partial-match” retrieval or the use of fuzzy-subset theory. Recently, some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms. The various query-processing methods are discussed and compared.

11.
Traditional approaches to information retrieval, based on automatically or manually constructed keywords, are inappropriate for certain desirable tasks in an intelligent information system. Obtaining simple answers to direct questions, a summary of an event sequence that could span multiple documents, and an update of recent developments in an ongoing event sequence are three examples of such tasks. In this paper, the SCISOR system is described. SCISOR illustrates the potential for increased recall and precision of stored information through the understanding in context of articles in its domain of corporate takeovers. A constrained form of marker passing is used to answer queries of the knowledge base posed in natural language. Among other desirable characteristics, this method of retrieval focuses search on likely candidates, and tolerates incomplete or incorrect input indices very well.

12.
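This article proposes a strategic-diagram method for analyzing the keywords of scientific papers. To eliminate the indexer effect, keywords are selected from among author keywords, machine-indexed keywords, and keywords extracted from titles and abstracts. The keywords are clustered into research-theme clusters, the centrality and density indicators of each cluster are computed, and a strategic diagram is drawn that divides the clusters into four categories, from which the current state of the problem domain is analyzed. The data are also split into several periods, each with its own strategic diagram; by computing similarity, origin, and influence indicators between theme clusters in adjacent periods, the evolution of research themes and their interrelationships can be understood. Experiments demonstrate the effectiveness of the strategic-diagram method.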
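
The abstract above does not give formulas for the two strategic-diagram indicators. A common reading in co-word analysis, assumed here, takes centrality as the total strength of links between a cluster and keywords outside it, and density as the mean strength of the links inside it. The function name and the link data below are hypothetical.

```python
def centrality_and_density(cluster, links):
    """Return (centrality, density) for one keyword cluster.

    cluster: set of keywords in the cluster.
    links:   dict mapping frozenset({kw1, kw2}) -> association strength.
    Assumed definitions: centrality is the sum of strengths of links with
    exactly one end in the cluster; density is the mean strength of links
    with both ends in the cluster (0.0 if there are none).
    """
    internal = []
    external = 0.0
    for pair, strength in links.items():
        inside = len(pair & cluster)
        if inside == 2:
            internal.append(strength)
        elif inside == 1:
            external += strength
    density = sum(internal) / len(internal) if internal else 0.0
    return external, density


# Hypothetical association strengths between four keywords.
links = {
    frozenset({"a", "b"}): 0.5,
    frozenset({"a", "c"}): 0.25,
    frozenset({"b", "d"}): 0.25,
    frozenset({"c", "d"}): 0.75,
}
print(centrality_and_density({"a", "b"}, links))
```

Plotting each cluster with centrality on one axis and density on the other, and splitting the plane at chosen cut-off values for the two axes, would give the four categories mentioned in the abstract.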

13.
Four advantages of storing and retrieving geometric figures and chromosome images through the use of shape-oriented similarity measures are presented. A complemented but not distributive lattice and a distributive but not complemented lattice are found. Answers to triangle-related fuzzy queries such as “retrieve the triangles which are very similar to isosceles triangles but not similar to a given triangle Δx”, and chromosome-related fuzzy queries such as “retrieve the chromosomes which are more or less similar to median chromosomes and very very similar to a given chromosome A” are presented and illustrated by examples. For shape-oriented storage of triangles, it is proposed to store the angles of each triangle in decreasing order of magnitude and logically order all the triangles according to the magnitude of the angles. For shape-oriented storage of chromosomes, it is proposed to logically order all the chromosomes individually and independently according to the angular sums of their exterior biangles and interior biangles. The results may have useful applications in information storage and retrieval, artificial intelligence and pattern recognition.

14.
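Hua Bolin. 《情报科学》 (Information Science), 2007, 25(8): 1176-1179, 1189
Abstract: Applied bibliometric analysis falls into four types. Topic-oriented and evaluation-oriented analyses are the mainstream, whereas prediction-oriented and resource-acquisition-oriented analyses are rare. With the aim of obtaining computable resources, this experiment selected all papers published from 1989 to 2005 in 17 core journals of library and information science (2004 edition) from the Chinese Science and Technology Journal Database (Chongqing VIP), and used VBA to perform a statistical analysis of the papers' keywords. The analysis covers quantity distribution, word-length patterns, growth trends, and the quantitative relationship between keywords and articles, and the keywords are classified by function.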

15.
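A Bibliometric Analysis of Keywords in Papers from Core Library and Information Science Journals (Part 1)   Total citations: 3 (self-citations: 0, citations by others: 3)
Hua Bolin. 《情报科学》 (Information Science), 2007, 25(5): 699-703
Applied bibliometric analysis falls into four types. Topic-oriented and evaluation-oriented analyses are the mainstream, whereas prediction-oriented and resource-acquisition-oriented analyses are rare. With the aim of obtaining computable resources, this experiment selected all papers published from 1989 to 2005 in 17 core journals of library and information science (2004 edition) from the Chinese Science and Technology Journal Database (Chongqing VIP), and used VBA to perform a statistical analysis of the papers' keywords. The analysis covers quantity distribution, word-length patterns, growth trends, and the quantitative relationship between keywords and articles, and the keywords are classified by function.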

16.
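Gao Jinsong, Huang Mei, Fu Jiawei. 《现代情报》 (Journal of Modern Information), 2021, 40(12): 130-139
[Purpose/Significance] Tracking how the research hotspots of a discipline change over time with a concise visualization is important for keeping abreast of the discipline's direction. Word-frequency analysis is one method for analyzing research hotspots, and many visualization tools based on it exist, but these tools are limited in their ability to present yearly hotspots clearly in a concise visual form. [Method/Process] This paper therefore proposes a visualization method in which the ratio of a field's yearly publication count to its total publication count measures the contribution of each year's hot keywords to the overall set of hot keywords. Threshold parameters are set and adjusted on the basis of the yearly contribution rate and the 80/20 rule (Pareto principle) to control how many high-frequency keywords are shown for each year, and the selected yearly high-frequency keywords are sorted by frequency and by year to visualize the research hotspots. [Results/Conclusions] An empirical study is carried out on the field of "linked data". The feasibility of the method is tested and evaluated by analyzing how well the high-frequency keywords it extracts match those of existing high-frequency-word threshold algorithms, and by comparing its visual presentation with CiteSpace co-occurrence maps.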

17.
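Yan Lili, Cheng Gang. 《现代情报》 (Journal of Modern Information), 2015, 35(8): 22-27
Using data from 2005 to 2014 across all databases in Web of Science as the statistical source, a bibliometric analysis is carried out to identify high-frequency keywords. Co-word analysis is applied: the bibliometric software Bibexcel generates a co-word matrix of the high-frequency keywords, Netdraw draws a visual keyword network, and SPSS is used for cluster analysis and multidimensional scaling. The study explores the intrinsic connections among the high-frequency keywords and analyzes the research status and development trends of the knowledge-intensive services field over the past decade, in order to provide a reference for subsequent research.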
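
The co-word matrix that the study obtains from Bibexcel records how often each pair of keywords appears on the same paper. A minimal sketch of that counting step is shown below; it is not Bibexcel's implementation, and the function name and the sample keyword lists are made up.

```python
from collections import Counter
from itertools import combinations


def coword_counts(papers):
    """Count, for each unordered keyword pair, the papers listing both keywords.

    papers: iterable of keyword lists, one list per paper.
    Returns a Counter keyed by alphabetically ordered (kw1, kw2) tuples.
    """
    counts = Counter()
    for keywords in papers:
        # set() drops repeated keywords within one paper; sorted() fixes pair order
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists for three papers.
papers = [
    ["knowledge", "service", "innovation"],
    ["knowledge", "service"],
    ["innovation", "service"],
]
counts = coword_counts(papers)
print(counts[("knowledge", "service")])
```

The pair counts can be laid out as a symmetric matrix and passed to network-drawing or clustering software, as the study does with Netdraw and SPSS.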

18.
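[Research purpose] To identify the distribution characteristics of research hotspots in journal papers in China's information science field and to explore the basic evolutionary trajectory of academic research hotspots in the field. [Research method] The keywords of papers from 21 Chinese information science journals were analyzed statistically for the period 2000 to 2020, using the Wanfang database. First, keywords ranked in the yearly top 30 were taken as the statistical objects for hotspot research, and from these 60 keywords were selected as the basic hot keywords of information science journals. Second, the 60 selected keywords were searched among the keywords of all disciplines in the Wanfang database (note: "all disciplines" here means all disciplines indexed by the Wanfang database) for 2000 to 2020. Finally, the discipline-specific characteristics and the global distribution characteristics of the information science hot keywords within all disciplines were compared and analyzed. [Research conclusions] The experimental results show that the statistical analysis reveals the evolutionary trajectory of hot keywords in China's information science journal papers across three stages over the past 20 years. Information science hot keywords lead or lag behind all-discipline keywords, reflecting a pattern in which information science and the other disciplines learn from and advance one another. Using this pattern, future hotspots in Chinese information science are forecast: 72 hot keywords that will persist or may emerge in China's information science journal papers are predicted.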

19.
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
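
The abstract above does not list the attributes the authors derived for each intent category. Purely as an illustration of rule-based classification into the three categories, the sketch below uses two invented cues (a domain-like suffix for navigational intent, a small list of action words for transactional intent); both cue lists are hypothetical and are not the paper's attributes.

```python
# Hypothetical cue lists, invented for this sketch.
NAVIGATIONAL_SUFFIXES = (".com", ".org", ".net", ".edu")
TRANSACTIONAL_TERMS = {"download", "buy", "purchase", "order"}


def classify_intent(query: str) -> str:
    """Assign a query to 'navigational', 'transactional', or 'informational'."""
    tokens = query.lower().split()
    # A token that looks like a site address suggests the user wants that site.
    if any(token.endswith(NAVIGATIONAL_SUFFIXES) for token in tokens):
        return "navigational"
    # An action word suggests the user wants to obtain or do something.
    if any(token in TRANSACTIONAL_TERMS for token in tokens):
        return "transactional"
    # Everything else falls into the default, most common category.
    return "informational"


for q in ["amazon.com", "download firefox", "history of rome"]:
    print(q, "->", classify_intent(q))
```

Deterministic rules like these force every query into a single category, which is the limitation the abstract raises when it notes that vague or multi-faceted queries call for probabilistic classification.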

20.
This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual representations is proposed. First, we mine the relationships between text and images and employ the mined relationships to construct visual queries from textual ones. Then, the retrieval results of textual and visual queries are combined. To evaluate the proposed approach, we conduct English monolingual and Chinese–English cross-language retrieval experiments. The selection of suitable textual query terms to construct visual queries is the major issue. Experimental results show that the proposed approach improves retrieval performance, and that the use of nouns is appropriate for generating visual queries.
