Found 20 similar articles; search took 31 ms.
1.
2.
Information filtering is growing in importance as users are flooded with information; product brokering in e-commerce is a typical example. Systems that provide personalized product recommendations to their users (often called recommender systems) have attracted considerable interest in recent years. Collaborative filtering is one of the commonly used approaches, and it normally requires a user similarity measure to be defined. Researchers have proposed different similarity measures using different approaches, yet none is guaranteed to be optimal. In this paper, we propose using machine learning techniques to learn the optimal user similarity measure, as well as user rating styles, to enhance recommendation accuracy. Based on a criterion function measuring the overall prediction error, we derive several rating transformation functions for modeling rating styles, together with their learning algorithms. With this formulation and optimization framework, subjective components in user ratings are removed so that the transformed ratings can be compared. We evaluated the proposed methods on the EachMovie dataset and obtained a significant improvement in recommendation accuracy over the standard correlation-based algorithm.
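The abstract above compares against a "standard correlation-based algorithm". That baseline is commonly implemented as user-based collaborative filtering with Pearson correlation; the sketch below assumes that reading. The function names and the toy ratings are illustrative and are not taken from the paper, and the learned similarity measure and rating-style transformations it proposes are not shown.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    dev_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    dev_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    den = dev_a * dev_b
    return num / den if den else 0.0


def predict(ratings, user, item):
    """User's mean rating plus the similarity-weighted, mean-centered
    deviations of every other user who rated the item."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = pearson(ratings[user], other_ratings)
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        num += weight * (other_ratings[item] - other_mean)
        den += abs(weight)
    return user_mean + num / den if den else user_mean


# Hypothetical toy data: user -> {item: rating}
toy_ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
```

Calling `predict(toy_ratings, "alice", "m4")` estimates a rating for an item Alice has not rated. Mean-centering each neighbor's ratings is a simple fixed way of compensating for rating style, which is the aspect the paper proposes to learn instead.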
3.
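This paper examines matching-based retrieval of musical melody features. The retrieval process is decomposed into three steps (string matching, similarity computation, and relevance computation), each of which handles a different feature of the melodic contour, to produce the final retrieval results. The paper concludes with a model for matching-based retrieval of melody features.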
4.
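[Purpose/Significance] Living folk music cultural resources are music cultural resources that survive among the people, remain vibrant, and continue to develop. To record and preserve folk music culture and to show how living music exists today, the library of the China Conservatory of Music (中国音乐学院) launched a project in 2004 to actively collect and preserve living music resources. [Method/Process] Taking the China Conservatory of Music's folk music resource construction project as an example, the study applies methods from anthropology and ethnomusicology, together with a "going out and inviting in" approach, to examine folk music alongside the cultural practices tied to its survival, and collects first-hand material through audio recording, video recording, and other formats. [Result/Conclusion] Building living folk music cultural resources is a large-scale music documentation effort that plays an important role in special-collection development, artistic practice, and cultural transmission, and it will face greater challenges in the future.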
5.
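An item-based collaborative filtering algorithm based on combined weighted ratings   Total citations: 1, self-citations: 0, citations by others: 1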
马丽 《现代图书情报技术》2008,24(11):60-64
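To address the problem that the sparsity of user rating data severely degrades recommendation quality in item-based collaborative filtering, this paper proposes an item-based collaborative filtering algorithm based on combined weighted ratings. The algorithm uses the union of the items rated by users as the basis for computing user similarity, and proposes a combined weighted rating method to compute and fill in the unrated items in that union, thereby reducing data sparsity. Experimental results show that the algorithm effectively improves recommendation quality.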
6.
LETOR: A benchmark collection for research on learning to rank for information retrieval   Total citations: 2, self-citations: 0, citations by others: 2
LETOR is a benchmark collection for the research on learning to rank for information retrieval, released by Microsoft Research
Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research.
Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how
the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation.
We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and
discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition
to algorithm comparison. We hope that this paper can help people to gain deeper understanding of LETOR, and enable more interesting
research projects on learning to rank and related topics.
7.
Contextual factors greatly affect users’ preferences for music, so they can benefit music recommendation and music retrieval. However, acquiring and using contextual information remains challenging. This paper proposes a novel approach for context-aware music recommendation, which infers users’ preferences for music and then recommends music pieces that fit their real-time requirements. Specifically, the proposed approach first learns low-dimensional representations of music pieces from users’ music listening sequences using neural network models. Based on the learned representations, it then infers and models users’ general and contextual preferences for music from their historical listening records. Finally, music pieces that match the target user’s preferences are recommended. Extensive experiments are conducted on real-world datasets to compare the proposed method with other state-of-the-art recommendation methods. The results demonstrate that the proposed method significantly outperforms those baselines, especially on sparse data.
8.
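Research on melody-based music retrieval: representation and extraction of melodic features   Total citations: 6, self-citations: 1, citations by others: 5
Introduces the basic ways of representing melodic contour. By comparing common audio file formats, it summarizes the advantages of extracting melodies from MIDI files and a method for identifying the main melody channel. Finally, it analyzes the definition of a musical phrase and its importance in melody retrieval, and summarizes a model for representing and extracting melody features.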
9.
We introduce fast filtering methods for content-based music retrieval problems, where the music is modeled as sets of points
in the Euclidean plane, formed by the (on-set time, pitch) pairs. The filters exploit a precomputed index for the database,
and run in time dependent on the query length and intermediate output sizes of the filters, being almost independent of the
database size. With a quadratic size index, the filters are provably lossless for general point sets of this kind. In the
context of music, the search space can be narrowed down, which enables the use of a linear sized index for effective and efficient
lossless filtering. For the checking phase, which dominates the overall running time, we exploit previously designed algorithms
suitable for local checking. In our experiments on a music database, our best filter-based methods performed several orders
of magnitude faster than the previously designed solutions.
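To make the problem setting in the abstract above concrete, the sketch below shows a brute-force search for exact occurrences of a query point set in a database of (onset time, pitch) points under translation. It is not the paper's index-based filter: it scans the whole database, which is exactly the cost the filters are designed to avoid. The function name is hypothetical, and integer onsets are assumed so that set membership tests are exact.

```python
def find_translations(query, database):
    """Return every (time_shift, pitch_shift) that maps all query points
    onto points of the database. Brute force: one candidate per database point."""
    if not query:
        return []
    points = set(database)
    first_time, first_pitch = query[0]
    matches = []
    for time, pitch in sorted(points):
        # Candidate shift that aligns the first query point with this point.
        shift_t, shift_p = time - first_time, pitch - first_pitch
        if all((qt + shift_t, qp + shift_p) in points for qt, qp in query):
            matches.append((shift_t, shift_p))
    return matches


# Hypothetical toy data: (onset, MIDI pitch) pairs
toy_database = [(0, 60), (1, 62), (2, 64), (4, 65), (5, 67), (6, 69)]
toy_query = [(0, 60), (1, 62), (2, 64)]
```

With the toy data, `find_translations(toy_query, toy_database)` finds the motif at its original position and again shifted later in time and up in pitch. Each candidate costs one pass over the query, so the total work grows with the database size.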
10.
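A new image retrieval algorithm for digital libraries   Total citations: 1, self-citations: 0, citations by others: 1
Proposes an image retrieval algorithm, adapted to the characteristics of libraries, that combines visual features with high-level semantics, and builds a dynamic similarity measure through relevance feedback. Experimental results show that retrieval combining visual and semantic features achieves a higher retrieval rate than retrieval using visual features alone.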
11.
Margaret Olufunke Adeogun 《Public Services Quarterly》2016,12(1):1-21
The academic library continues to formulate strategies for providing and sustaining a creative learning environment for knowledge creation. But little has been said about its role in skills building through micro-employment that enables students to develop and integrate their academic, personal, and social skill sets. This study examines the role that Access Services plays in boosting the learning experiences of student employees in readiness for workplace integration. A survey assessing their experience was administered to 32 student employees at Access Services. The results indicate that student employees, through information delivery functions, gain personal transferable skills crucial for employment.
12.
Most current machine learning methods for building search engines are based on the assumption that there is a target evaluation
metric that evaluates the quality of the search engine with respect to an end user and the engine should be trained to optimize
for that metric. Treating the target evaluation metric as a given, many different approaches (e.g. LambdaRank, SoftRank, RankingSVM,
etc.) have been proposed to develop methods for optimizing for retrieval metrics. Target metrics used in optimization act
as bottlenecks that summarize the training data and it is known that some evaluation metrics are more informative than others.
In this paper, we consider the effect of the target evaluation metric on learning to rank. In particular, we question the
current assumption that retrieval systems should be designed to directly optimize for a metric that is assumed to evaluate
user satisfaction. We show that even if user satisfaction can be measured by a metric X, optimizing the engine on a training
set for a more informative metric Y may result in a better test performance according to X (as compared to optimizing the
engine directly for X on the training set). We analyze the situations as to when there is a significant difference in the
two cases in terms of the amount of available training data and the number of dimensions of the feature space.
13.
14.
Staša Milojević 《Journal of Informetrics》2013,7(4):767-773
There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common coauthorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, is typically two times less accurate, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a new name-based method that combines the features of first initial and all initials methods by implicitly taking into account the last name frequency and the size of the dataset. This hybrid method reduces the fraction of incorrectly identified authors by 10–30% over the first initial method.
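The two simple methods named in the abstract above can be illustrated by how they build a grouping key from an author name. This is a minimal sketch assuming records arrive as (record id, last name, given names) tuples; the function names and the toy records are hypothetical, and the paper's hybrid method is not implemented here.

```python
from collections import defaultdict


def first_initial_key(last, given):
    """Key used by the first-initial method: last name plus first initial."""
    return (last.lower(), given[:1].lower())


def all_initials_key(last, given):
    """Key used by the all-initials method: last name plus every initial."""
    parts = given.replace(".", " ").replace("-", " ").split()
    return (last.lower(), "".join(part[0].lower() for part in parts))


def group_records(records, key_fn):
    """Group record ids under the key produced by key_fn; each group is
    treated as a single author."""
    groups = defaultdict(list)
    for record_id, last, given in records:
        groups[key_fn(last, given)].append(record_id)
    return dict(groups)


# Hypothetical toy records: (record id, last name, given names)
toy_records = [
    (1, "Smith", "John A."),
    (2, "Smith", "J."),
    (3, "Smith", "Jane B."),
    (4, "Smith", "John"),
]
```

On the toy records, the first-initial key places all four records in one group, which risks merging distinct people, while the all-initials key separates "John A." from "John", which risks splitting one person. That trade-off is what the abstract's accuracy comparison measures.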
15.
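[Purpose/Significance] Research on the evolution of disciplinary topics helps capture a discipline's current state, research hotspots, research frontiers, and development trends; it is a foundation for scientific and technological innovation and an important research direction oriented toward it. [Method/Process] A semantic-classification method for analyzing disciplinary topic evolution is proposed. Keywords are divided into three classes (research problems, research methods, and research techniques), and a co-word network is built for each semantic class. Communities (topics) with semantic characteristics are then identified using the Fast Unfolding community detection algorithm. A similarity algorithm computes the similarity between topics in adjacent sub-periods to build a topic evolution map, which is used to analyze changes in the research problems, methods, and techniques of a field and to achieve deep, fine-grained topic evolution analysis. [Result/Conclusion] Processing and analysis of papers on big data research in China indexed in the CNKI database from 2012 to 2015 demonstrate the accuracy and effectiveness of the method.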
16.
《Government Information Quarterly》2014,31(4):534-544
This paper investigates how text analysis and classification techniques can be used to enhance e-government, typically law enforcement agencies' efficiency and effectiveness, by analyzing text reports automatically and providing timely supporting information to decision makers. With an increasing number of anonymous crime reports being filed and digitized, it is generally difficult for crime analysts to process and analyze crime reports efficiently. Complicating the problem, the information has not been filtered or guided by a detective-led interview, so it contains much irrelevant information. We are developing a decision support system (DSS) that combines natural language processing (NLP) techniques, similarity measures, and machine learning, i.e., a Naïve Bayes' classifier, to support crime analysis and determine which crime reports discuss the same crime and which discuss different crimes. We report on an algorithm essential to the DSS and its evaluations. Two studies with small and big datasets were conducted to compare the system with a human expert's performance. The first study includes 10 sets of crime reports discussing 2 to 5 crimes. The highest algorithm accuracy was found by using binary logistic regression (89%), while the Naïve Bayes' classifier was only slightly lower (87%). The expert achieved still better performance (96%) when given sufficient time. The second study includes two datasets with 40 and 60 crime reports discussing 16 different types of crimes for each dataset. The results show that our system achieved the highest classification accuracy (94.82%), while the crime analyst's classification accuracy (93.74%) is slightly lower.
17.
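Research on a document semantic classification model based on concept vector space   Total citations: 1, self-citations: 0, citations by others: 1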
18.
《Journal of Informetrics》2022,16(4):101343
In this study, MatrixSim, a new method for detecting the evolution paths of research topics based on matrix similarity, was proposed. In the analysis of research topic evolution with the help of co-word networks, in contrast to traditional methods of topic evolution path detection, such as cosine similarity and edge similarity, MatrixSim is based on the local community structure of topic communities in co-word networks and considers the similarity of research topics in both nodes and edges, that is, words and inter-word relations. Using the library and information science field as an example, two sets of experiments were designed for topic similarity detection and subject-specific research topic evolution analysis to evaluate and verify the performance of MatrixSim in detecting the evolution paths of research topics and its validity and feasibility in research topic evolution analysis. The results confirm that MatrixSim performs well in detecting the evolution paths of research topics. It can correlate important research topics, help describe the research development process in scientific fields, reveal the internal evolutionary features of research topics, and thus discover and track the research frontiers in scientific fields. This study provides significant methodological support for researchers conducting prospective research activities.
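The abstract above names cosine similarity as one of the traditional ways to detect topic evolution paths. The sketch below illustrates only that node-level baseline, not MatrixSim itself. It assumes each topic is a dictionary mapping words to weights; the 0.5 threshold, the function names, and the toy topics are illustrative assumptions.

```python
from math import sqrt


def cosine_similarity(topic_a, topic_b):
    """Cosine similarity between two topics given as {word: weight} dicts."""
    dot = sum(weight * topic_b.get(word, 0.0) for word, weight in topic_a.items())
    norm_a = sqrt(sum(weight * weight for weight in topic_a.values()))
    norm_b = sqrt(sum(weight * weight for weight in topic_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_topics(earlier, later, threshold=0.5):
    """Return (earlier_id, later_id, similarity) for every topic pair in
    adjacent periods whose similarity reaches the threshold."""
    links = []
    for earlier_id, earlier_topic in earlier.items():
        for later_id, later_topic in later.items():
            similarity = cosine_similarity(earlier_topic, later_topic)
            if similarity >= threshold:
                links.append((earlier_id, later_id, similarity))
    return links


# Hypothetical toy topics for two adjacent periods
period_1 = {"t1": {"big data": 3, "cloud": 4}}
period_2 = {"u1": {"big data": 3, "cloud": 4}, "u2": {"privacy": 2}}
```

Because this baseline compares words only, two topics with the same vocabulary but different inter-word relations look identical to it. That is the gap the abstract says MatrixSim addresses by also considering edges.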
19.
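A string similarity computation model based on multi-layer features   Total citations: 18, self-citations: 6, citations by others: 12
To address the shortcomings of traditional methods for computing string similarity, this paper proposes a method that takes the similarity unit (相似元) as the basic processing unit of a string and jointly considers multi-layer features of similarity units, including literal, semantic, and statistical-association features. It also corrects a problem in conventional methods in which sorting similarity units causes their positional information to be lost. Experimental results demonstrate the effectiveness of the algorithm, which also offers insight for computing similarity between sentences and between paragraphs.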
20.
Knowledge transfer for cross domain learning to rank   Total citations: 1, self-citations: 1, citations by others: 0
Depin Chen Yan Xiong Jun Yan Gui-Rong Xue Gang Wang Zheng Chen 《Information Retrieval》2010,13(3):236-253
Learning to rank has recently attracted increasing attention from both academia and industry in the areas of machine
learning and information retrieval. A number of algorithms have been proposed to rank documents according to the user-given
query using a human-labeled training dataset. A basic assumption behind general learning to rank algorithms is that the training
and test data are drawn from the same data distribution. However, this assumption does not always hold true in real world
applications. For example, it can be violated when the labeled training data become outdated or originally come from a
domain different from that of the test data. Such situations bring a new problem, which we define as cross domain learning
to rank. In this paper, we aim at improving the learning of a ranking model in target domain by leveraging knowledge from
the outdated or out-of-domain data (both are referred to as source domain data). We first give a formal definition of the
cross domain learning to rank problem. Following this, two novel methods are proposed to conduct knowledge transfer at feature
level and instance level, respectively. These two methods both utilize Ranking SVM as the basic learner. In the experiments,
we evaluate these two methods using data from benchmark datasets for document retrieval. The results show that the feature-level
transfer method performs better with steady improvements over baseline approaches across different datasets, while the instance-level
transfer method comes out with varying performance depending on the dataset used.