Found 20 similar documents (search time: 31 ms)
1.
2.
To improve the efficiency of medical literature retrieval and the usefulness of its output, and to quickly and objectively provide researchers with high-reliability, low-redundancy references ranked by relevance, this paper examines relevance-computation schemes based on the vector space model and proposes relevance-based clustering and relevance ranking of medical literature.
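Entries like this one rank documents by vector-space relevance to a query. A minimal sketch of cosine-similarity ranking over bag-of-words term-frequency vectors (the function names are illustrative, not taken from the cited paper):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two bag-of-words term-frequency vectors."""
    va, vb = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def rank_by_relevance(query, docs):
    """Return documents sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine_similarity(query, d), reverse=True)
```

Real systems would apply stop-word removal and term weighting before the similarity step; this sketch shows only the core ranking idea.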
3.
4.
Research and Implementation of a Document-Relevance Database Based on the Vector Space Model*  (Total citations: 1; self-citations: 0; by others: 1)
This paper discusses the concept of "relevance," surveys the state of research on document-relevance databases, proposes a vector space model based on a controlled vocabulary and feature-term extraction, and on that basis designs and builds a relevance database of Chinese biomedical-engineering literature together with its retrieval system.
5.
刘华 《现代图书情报技术》2007,2(3):43-45
This paper designs and implements a text-classification system based on the vector space model and naive Bayes, adopting a hierarchical multi-label classification strategy. Four subsystem modules are described in detail: word segmentation and statistics, final-classifier score computation, hierarchical subclass correction, and multi-class assignment. The micro-averages of the vector-space-model classifier are 89.7% on the first-level major classes and 77.8% on the hierarchical subclasses; the corresponding naive Bayes figures are 67.6% and 66.5%.
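The micro-averages quoted above pool true positives, false positives, and false negatives across all classes before computing the score, so large classes dominate the result. A sketch of the standard computation (the per-class count layout is an assumption, not the paper's data format):

```python
def micro_average_f1(per_class_counts):
    """Micro-averaged precision, recall, and F1: sum TP/FP/FN over all
    classes first, then compute the ratios once from the pooled counts."""
    tp = sum(c["tp"] for c in per_class_counts)
    fp = sum(c["fp"] for c in per_class_counts)
    fn = sum(c["fn"] for c in per_class_counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```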
6.
7.
Noting that information-service institutions currently offer document retrieval but no table-retrieval capability, this paper proposes a table-retrieval algorithm based on the vector space model and discusses table feature extraction, feature-term weight assignment, and the matching and ranking of retrieval results, providing a theoretical basis for future table-retrieval services.
8.
Implementation and Comparison of a Similarity Model and a Probabilistic Model for Text Classification*  (Total citations: 1; self-citations: 0; by others: 1)
刘华 《现代图书情报技术》2006,1(4):53-55
This paper designs and builds a text-classification system based on the vector space model and naive Bayes, introducing subclass-correction and multi-class-assignment algorithms to perform hierarchical multi-label classification. Comparing the two classifiers on a test set of about 30,000 documents (15 major classes, 244 subclasses), the vector-space-model classifier scores 25.2 percentage points higher on the major classes and 26.3 percentage points higher on the hierarchical subclasses.
9.
10.
[Purpose/Significance] To build an exploratory retrieval system based on linked data that exploits the knowledge network hidden in the data and offers users opportunities to discover new knowledge. [Method/Process] Using the DBpedia film dataset, an improved vector space model computes semantic similarity over the linked data, and interactive visualization techniques present the retrieval results. [Result/Conclusion] A task-based evaluation against IMDB shows that the linked-data exploratory retrieval system improves retrieval efficiency, enhances the user experience, and guides users toward information of interest.
11.
12.
In information retrieval, algebraic theory is one of the main tools for building retrieval models; algebraic models overcome the Boolean model's inability to perform partial matching and are therefore widely used. This paper analyzes the algebraic vector space model and extends it in two ways: minterm-based index terms that capture relations between terms, and singular value decomposition that captures the semantic structure of documents. Finally, the three models are compared.
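The SVD extension mentioned here is the core of latent semantic indexing: truncating the decomposition to rank k keeps the dominant "semantic" structure of the term-document matrix while discarding noise. A sketch using NumPy (assumed available; the function name is illustrative):

```python
import numpy as np

def lsi_truncate(term_doc, k):
    """Rank-k SVD truncation of a term-document matrix: keep only the k
    largest singular values, smoothing over exact word mismatches."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
```

Queries are then compared against documents in the reduced space rather than over raw term vectors.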
13.
Search effectiveness metrics are used to evaluate the quality of the answer lists returned by search services, usually based
on a set of relevance judgments. One plausible way of calculating an effectiveness score for a system run is to compute the
inner-product of the run’s relevance vector and a “utility” vector, where the ith element in the utility vector represents the relative benefit obtained by the user of the system if they encounter a relevant
document at depth i in the ranking. This paper uses such a framework to examine the user behavior patterns—and hence utility weightings—that
can be inferred from a web query log. We describe a process for extrapolating user observations from query log clickthroughs,
and employ this user model to measure the quality of effectiveness weighting distributions. Our results show that for measures
with static distributions (that is, utility weighting schemes for which the weight vector is independent of the relevance
vector), the geometric weighting model employed in the rank-biased precision effectiveness metric offers the closest fit to
the user observation model. In addition, using past TREC data to indicate likelihood of relevance, we also show that the distributions employed in the BPref and MRR metrics are the best fit out of the measures for which static distributions do not exist.
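The inner-product framing above can be made concrete with the geometric utility vector used by rank-biased precision, where the weight at depth i is (1-p)·p^(i-1) for a persistence parameter p. A minimal sketch:

```python
def rbp_score(relevance, p=0.8):
    """Rank-biased precision as the inner product of a 0/1 relevance
    vector with a geometric utility vector; p models the probability
    that the user continues from one rank to the next."""
    return sum(r * (1 - p) * p ** i for i, r in enumerate(relevance))
```

Because the utility weights form a geometric series summing to 1, the score of an all-relevant ranking approaches 1.0 as depth grows.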
14.
This paper describes a method, using Genetic Programming, to automatically determine term weighting schemes for the vector
space model. Based on a set of queries and their human determined relevant documents, weighting schemes are evolved which
achieve a high average precision. In Information Retrieval (IR) systems, useful information for term weighting schemes is
available from the query, individual documents and the collection as a whole.
We evolve term weighting schemes in both local (within-document) and global (collection-wide) domains which interact with
each other correctly to achieve a high average precision. These weighting schemes are tested on well-known test collections
and are compared to the traditional tf-idf weighting scheme and to the BM25 weighting scheme using standard IR performance metrics.
Furthermore, we show that the global weighting schemes evolved on small collections also increase average precision on larger
TREC data. These global weighting schemes are shown to adhere to Luhn’s resolving power as both high and low frequency terms
are assigned low weights. However, the local weightings evolved on small collections do not perform as well on large collections.
We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on
large collections.
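The tf-idf baseline this work evolves against combines exactly the local (within-document) and global (collection-wide) evidence the abstract describes: raw term frequency damped by how many documents contain the term. A minimal sketch:

```python
import math

def tf_idf(tf, df, n_docs):
    """Classic tf-idf weight: local term frequency (tf) multiplied by the
    global inverse document frequency log(N / df). A term appearing in
    every document carries zero weight; unseen terms score zero."""
    return tf * math.log(n_docs / df) if df else 0.0
```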
15.
16.
《Microprocessing and Microprogramming》1994,40(2-3):91-102
Conventional object-oriented design methodologies lead to a hierarchy of classes, but do not suggest which classes/objects should be loaded on which processors in a distributed system. We present the Decomposition Cost Evaluation Model (DCEM) as an approach to this problem. DCEM brings the mapping problem to a higher level of abstraction where the question is which classes, rather than which tasks, should be loaded on which processors. To support these decisions we define communication and computation cost functions for objects, classes, and hierarchies. We then introduce Confined Space Search Decomposition (CSSD), which enhances parallel operations of applications utilizing a tree topology for the processor interconnection scheme. To reduce the penalties of load imbalance, we include a distributed dynamic load balancing heuristic called Object Reincarnation (OR), in which no additional communication costs are incurred.
17.
We investigate the effect of feature weighting on document clustering, including a novel investigation of Okapi BM25 feature
weighting. Using eight document datasets and 17 well-established clustering algorithms we show that the benefit of tf-idf weighting over tf weighting is heavily dependent on both the dataset being clustered and the algorithm used. In addition, binary weighting
is shown to be consistently inferior to both tf-idf weighting and tf weighting. We investigate clustering using both BM25 term saturation in isolation and BM25 term saturation with idf, confirming that both are superior to their non-BM25 counterparts under several common clustering quality measures. Finally,
we investigate estimation of the k1 BM25 parameter when clustering. Our results indicate that typical values of k1 from other IR tasks are not appropriate for clustering; k1 needs to be higher.
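The k1 parameter discussed above controls BM25's term-frequency saturation. A sketch of the saturation component in isolation (document-length normalisation omitted for brevity):

```python
def bm25_tf(tf, k1=1.2):
    """BM25 term-frequency saturation: grows with tf but asymptotes at
    k1 + 1, so repeated occurrences earn diminishing credit. A larger
    k1 delays saturation, letting high-frequency terms keep scoring."""
    return tf * (k1 + 1) / (tf + k1)
```

Note that the component equals 1.0 at tf = 1 regardless of k1, which makes the effect of raising k1 visible only for repeated terms.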
18.
Automatic Analysis of Chinese Text Structure  (Total citations: 5; self-citations: 1; by others: 4)
This paper applies the vector space model to measure the topical relatedness between paragraphs of a text, enabling automatic analysis of its topics: the subtopics that make up the text's main topic are identified, and automatic summarization can start from these subtopics, opening a new path for automatic abstracting. Automatic analysis of text structure can also determine the structural type of a text, supplying useful information for full-text retrieval and other information-processing techniques.
19.
Text Clustering Based on a Combined IIG and LSI Feature Extraction Method  (Total citations: 8; self-citations: 0; by others: 8)
Using a feature-extraction method that combines improved information gain (IIG) feature selection with latent semantic indexing (LSI), this paper clusters texts automatically and effectively. From a corpus of 250 documents, text feature vectors were first built with the vector space model and the improved information-gain selection, then clustered with C-means, reaching a precision of 0.82, recall of 0.88, and F-measure of 0.83. Latent semantic indexing was then applied to the best feature-selection result, truncating the singular value decomposition; with K = 40 singular values retained, precision, recall, and F-measure reach 0.95, 0.57, and 0.78, substantially improving clustering precision while effectively reducing dimensionality.
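The information-gain selection step above scores each candidate feature term by the expected drop in class entropy from observing whether the term occurs. A sketch of plain information gain (the probability-vector interface is an assumption, and this is the textbook measure, not the paper's "improved" IIG variant):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(prior, cond_present, cond_absent, p_term):
    """IG(term) = H(C) - [P(t) * H(C|t) + P(not t) * H(C|not t)]:
    how much class uncertainty the term's presence/absence removes."""
    return entropy(prior) - (p_term * entropy(cond_present)
                             + (1 - p_term) * entropy(cond_absent))
```

Terms are ranked by this score and only the top-scoring ones are kept as features before the LSI step.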