Similar documents
20 similar documents found (search time: 31 ms)
1.
Blog feed search aims to identify a blog feed of recurring interest to users on a given topic. A blog feed, the retrieval unit for blog feed search, comprises blog posts of diverse topics. This topical diversity of blog feeds often causes performance deterioration of blog feed search. To alleviate the problem, this paper proposes several approaches based on passage retrieval, widely regarded as effective for handling topical diversity at document level in ad-hoc retrieval. We define the global and local evidence for blog feed search, which correspond to the document-level and passage-level evidence for passage retrieval, respectively, and investigate their influence on blog feed search, in terms of both initial retrieval and pseudo-relevance feedback. For initial retrieval, we propose a retrieval framework to integrate global evidence with local evidence. For pseudo-relevance feedback, we gather feedback information from the local evidence of the top K ranked blog feeds to capture diverse and accurate information related to a given topic. Experimental results show that our approaches using local evidence consistently and significantly outperform traditional ones.
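The abstract does not say how global and local evidence are integrated, so the following is only a minimal sketch of one plausible reading. It assumes a linear interpolation with a hypothetical weight `lam`, takes the best-scoring single post as the local evidence, and uses a toy term-overlap scorer; none of these choices come from the paper.

```python
from collections import Counter


def overlap_score(query_terms, text):
    """Toy relevance score: fraction of tokens in `text` that are query terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def feed_score(query, posts, lam=0.5):
    """Interpolate feed-level (global) and best-post (local) evidence.

    `lam` is a hypothetical mixing weight, not a value from the paper.
    """
    query_terms = set(query.lower().split())
    # Global evidence: score the whole feed as one concatenated document.
    global_evidence = overlap_score(query_terms, " ".join(posts))
    # Local evidence: score each post separately and keep the best one.
    local_evidence = max((overlap_score(query_terms, p) for p in posts), default=0.0)
    return lam * global_evidence + (1.0 - lam) * local_evidence


feeds = {
    "focused": ["python tips", "python tricks"],
    "diverse": ["python tips", "my holiday in rome with many photos"],
}
ranking = sorted(feeds, key=lambda f: feed_score("python", feeds[f]), reverse=True)
```

In this toy data the topically diverse feed is penalized by its global score, while its local score stays as high as that of the focused feed, which is the effect the local evidence is meant to capture.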

2.
Credibility-inspired ranking for blog post retrieval   Cited by: 1 (self-citations: 0, citations by others: 1)
Credibility of information refers to its believability or the believability of its sources. We explore the impact of credibility-inspired indicators on the task of blog post retrieval, following the intuition that more credible blog posts are preferred by searchers. Based on a previously introduced credibility framework for blogs, we define several credibility indicators, and divide them into post-level (e.g., spelling, timeliness, document length) and blog-level (e.g., regularity, expertise, comments) indicators. The retrieval task at hand is precision-oriented, and we hypothesize that the use of credibility-inspired indicators will positively impact precision. We propose to use ideas from the credibility framework in a reranking approach to the blog post retrieval problem: We introduce two simple ways of reranking the top n of an initial run. The first approach, Credibility-inspired reranking, simply reranks the top n of a baseline based on the credibility-inspired score. The second approach, Combined reranking, multiplies the credibility-inspired score of the top n results by their retrieval score, and reranks based on this score. Results show that Credibility-inspired reranking leads to larger improvements over the baseline than Combined reranking, but both approaches are capable of improving over an already strong baseline. For Credibility-inspired reranking the best performance is achieved using a combination of all post-level indicators. Combined reranking works best using the post-level indicators combined with comments and pronouns. The blog-level indicators expertise, regularity, and coherence do not contribute positively to the performance, although analysis shows that they can be useful for certain topics. 
Additional analysis shows that a relatively small value of n (15–25) leads to the best results, and that posts that move up the ranking due to the integration of reranking based on credibility-inspired indicators do indeed appear to be more credible than the ones that go down.
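The two reranking schemes described in this abstract can be sketched as follows. This is an illustration only: the function name `rerank_top_n`, the data layout, and the toy scores are assumptions, and how the credibility-inspired score is computed from the individual indicators is not covered here.

```python
def rerank_top_n(run, credibility, n=20, combined=False):
    """Rerank the top n of `run`, a list of (doc_id, retrieval_score) sorted best first.

    credibility maps doc_id -> credibility-inspired score (assumed to lie in [0, 1]).
    combined=False: order the top n by credibility score alone.
    combined=True:  order the top n by credibility score * retrieval score.
    Results below rank n keep their original order.
    """
    head, tail = run[:n], run[n:]

    def key(item):
        doc_id, retrieval_score = item
        cred = credibility.get(doc_id, 0.0)
        return cred * retrieval_score if combined else cred

    # sorted() is stable, so tied documents keep their baseline order.
    return sorted(head, key=key, reverse=True) + tail


# Toy baseline run and credibility scores (hypothetical values).
baseline = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.6)]
cred = {"d1": 0.2, "d2": 0.9, "d3": 0.25, "d4": 1.0}

cred_only = [d for d, _ in rerank_top_n(baseline, cred, n=3)]
both = [d for d, _ in rerank_top_n(baseline, cred, n=3, combined=True)]
```

With n=3, `d4` stays in fourth place in both variants even though it has the highest credibility score, because only the top n of the baseline is reordered. The default n=20 is simply a value inside the 15–25 range the abstract reports as working best.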

3.
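Traditional mainstream media, as key actors in science communication, play an important role in reporting scientific events and popularizing scientific knowledge. To better understand the characteristics and effects of online science communication by traditional mainstream media on social media, this paper examines the textual features of mainstream media's science-related posts and their influence on communication effects. First, the study collected all of the more than 110,000 posts published on the Weibo platform throughout 2021 by nine official mainstream media outlets and used science-communication keywords to select more than 6,000 science-related posts. Topic modeling of the text dataset with LDA yielded 29 first-level topics and 7 second-level topics, giving the overall topic distribution of mainstream media science communication. The content of the individual topics shows that mainstream media both report scientific discoveries and technological innovation in a timely and sustained manner and produce and distribute knowledge-popularization content closely related to the public, such as social livelihood and health. Second, the paper performs a content analysis based on manual coding on a sampled dataset, obtaining the sentiment stance and cited source of each post in the sample. Finally, the paper studies the relationships between three textual features (topic, sentiment stance, and cited source) and three indicators of communication effect (reposts, likes, and comments). The results show that topic and sentiment stance significantly affect all three indicators, whereas cited source has no significant effect. Posts that popularize scientific knowledge about social livelihood and take a positive sentiment stance spread significantly better than other posts. The public's willingness to spread science-related posts varies with their textual features; content that is closely tied to everyday-life knowledge and more likely to evoke emotional resonance achieves better communication effects.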

4.
A metric analysis of blogs on library and information science (LIS) between November 2006 and June 2009 indexed on the Libworm search engine characterizes the community's behavior quantitatively. An analysis of 1108 personal and corporate blogs with a total of 275,103 posts is used to calculate survival rate, production (number of posts published), and visibility via such indicators as links received, Technorati authority, and Google's PageRank. Over the study period, there was a 52% decrease in the number of active blogs. Despite the drop in production over this period, the average number of posts per blog remained constant (14 per month). The most representative blogs in the discipline are identified. The emergence of such platforms as Facebook and Twitter seems to have meant that both personal and corporate blogs have lost some of their prominence.

5.
6.
Patent prior art search is a type of search in the patent domain where documents are searched for that describe the work previously carried out related to a patent application. The goal of this search is to check whether the idea in the patent application is novel. Vocabulary mismatch is one of the main problems of patent retrieval, which results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address the vocabulary mismatch. We propose a method for query model estimation which utilizes the citation links in a pseudo-relevance feedback set. We first build a topic-dependent citation graph, starting from the initially retrieved set of feedback documents and utilizing citation links of feedback documents to expand the set. We identify the important documents in the topic-dependent citation graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights. We then use these terms to expand our original query. We use the CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves the recall over a state-of-the-art baseline which uses the link-based structure of the citation graph but not the term distribution of the cited documents.
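The pipeline in this abstract (grow a citation graph from the feedback documents, score document importance, derive weighted expansion terms) can be illustrated with a small sketch. The abstract does not name the citation analysis measure or the term-weighting formula, so the code below makes loud assumptions: one-hop graph expansion, in-degree plus one as the importance measure, and importance-weighted relative term frequency as the term weight. The function `expand_query` and all data are hypothetical.

```python
from collections import Counter, defaultdict


def expand_query(query_terms, feedback_ids, cites, doc_terms, k=3):
    """Pick k expansion terms from a citation graph grown from feedback docs.

    cites:     doc_id -> list of doc_ids it cites (hypothetical toy structure).
    doc_terms: doc_id -> list of terms in that document.
    Document importance is in-degree + 1 within the graph, a simple stand-in
    for the citation analysis measure, which the abstract does not specify.
    """
    # One-hop expansion: feedback documents plus everything they cite.
    nodes = set(feedback_ids)
    for doc in feedback_ids:
        nodes.update(cites.get(doc, []))

    in_degree = Counter()
    for doc in nodes:
        for cited in cites.get(doc, []):
            if cited in nodes:
                in_degree[cited] += 1

    # Importance-weighted term distribution over the documents in the graph.
    weights = defaultdict(float)
    for doc in nodes:
        terms = doc_terms.get(doc, [])
        importance = in_degree[doc] + 1
        for term, count in Counter(terms).items():
            weights[term] += importance * count / len(terms)

    total = sum(weights.values())
    candidates = [(t, w / total) for t, w in weights.items() if t not in query_terms]
    # Highest weight first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return candidates[:k]


cites = {"p1": ["c1", "c2"], "p2": ["c1"]}
doc_terms = {
    "p1": ["engine", "valve"],
    "p2": ["engine", "piston"],
    "c1": ["motor", "valve"],
    "c2": ["gear"],
}
expansion = expand_query({"engine"}, ["p1", "p2"], cites, doc_terms)
expansion_terms = [t for t, _ in expansion]
```

In the toy data, "motor" occurs only in a cited document, so it can enter the query only through the citation graph; this is the kind of vocabulary mismatch the method targets.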

7.
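Information organization and use of blog information sources   Cited by: 6 (self-citations: 0, citations by others: 6)
Yan Lijun (颜丽君). 《图书情报工作》 (Library and Information Service), 2004, 48(11): 107-110
Analyzes the types and characteristics of blog information sources and the ways in which blogs self-organize, classify, and search information, with emphasis on how the characteristics of blog information organization can be used to improve blog information search.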

8.
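Academic blogs currently suffer from incomplete category schemes, inconsistent tag usage, knowledge organization detached from its semantic context, and insufficiently deep analysis of RSS content. Building on an analysis of the personalized character of blogs and the need for academic rigor and conformity to scholarly norms, the paper applies topic map technology to construct a knowledge organization model for academic blogs. Finally, the model is validated and visualized through a case analysis of several well-known library and information science blogs, providing strong support for knowledge organization in academic blogs.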

9.
A huge volume of news stories is reported by various news channels on a daily basis. Subscribing to all the stories and keeping track of the important ones day after day is very time-consuming. This paper proposes several approaches to identify important news stories. To this end, we take advantage of the blogosphere as an information source to evaluate the importance of news stories. Blogs reflect the diverse opinions of bloggers about news stories, and the attention that these stories receive can help estimate the importance of the stories. In this paper, we define the popularity of a news story in the blogosphere as the attention it attracts from users. We measure popularity of the stories in the blogosphere from two viewpoints: content and timeline. In terms of content, we suggest several approaches to estimate language models for a news story and blog posts, and we evaluate the importance of the story using these language models. Furthermore, we generate a temporal profile of a news story by exploring the timeline of blog posts related to the story, and evaluate its importance based on the temporal profile. We experimentally verify the effectiveness of the proposed approaches for identifying top news stories.

10.
Communication Monographs, 2012, 79(4): 511-534
The study reported here explored the social dimension of health-related blogs by examining blogging as a means to marshal social support and, as a result, achieve some of the health benefits associated with supportive communication. A total of 121 individuals who author a blog dedicated to their experience living with a specific health condition completed the study questionnaire. The number of blog posts made by respondents and proportion of posts with reader comments were positively associated with perceived social support from blog readers. The relationship between blog reader support and two outcomes related to well-being depended upon the support available in bloggers' strong-tie relationships with family and friends. Consistent with the social compensation (i.e., “poor get richer”) perspective, blog reader support was negatively associated with loneliness and positively associated with personal growth when support in strong-tie relationships was relatively lacking.

11.
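A comparative study of blog search engines and traditional search engines   Cited by: 8 (self-citations: 0, citations by others: 8)
Briefly introduces blogs and well-known blog search engines in China and abroad. Focusing on the differences between blog search engines and traditional search engines, the paper systematically analyzes and compares the two in terms of working principle, retrieval content, and retrieval method. Four representative topics from different areas are selected to evaluate the retrieval functions and retrieval performance of representative engines of each type. Finally, the paper identifies the respective strengths and weaknesses of the two kinds of search engine in resource value, retrieval method, personalized services, and other respects, in order to inform improvements to both.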

12.
The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or occasionally participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories as well as traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible but also a cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

13.
14.
This article reports on an investigation of blog technology's potential for encouraging interaction between students, and its consequences in terms of peer learning and peer support, on a module of an accredited library and information science (LIS) degree program. The findings consider the treatment of blogs in the domain of LIS with particular reference to educational settings. Content analysis revealed that blogs offer comparable and additional benefits to other projects designed to encourage reflective engagement with teaching material, such as learning journals. Most notable is the level of shared peer support evident in the online discussions between class members. The findings of this study are of particular interest to LIS educators who seek to develop their consideration of blogs in the classroom; blogs may be seen as learning tools in their own right and not simply an option for providing information online.

15.
Since its inception in 2013, one of the key contributions of the CLEF eHealth evaluation campaign has been the organization of an ad-hoc information retrieval (IR) benchmarking task. This IR task evaluates systems intended to support laypeople searching for and understanding health information. Each year the task provides registered participants with standard IR test collections consisting of a document collection and topic set. Participants then return retrieval results obtained by their IR systems for each query, which are assessed using a pooling procedure. In this article we focus on the CLEF eHealth 2013 and 2014 retrieval tasks, which saw topics created based on patients’ information needs associated with their medical discharge summaries. We give an overview of the task and datasets created, and the results obtained by participating teams over these two years. We then provide a detailed comparative analysis of the results, and conduct an evaluation of the datasets in the light of these results. This twofold study of the evaluation campaign teaches us about technical aspects of medical IR, such as the effectiveness of query expansion; the quality and characteristics of CLEF eHealth IR datasets, such as their reliability; and how to run an IR evaluation campaign in the medical domain.

16.
An information retrieval (IR) system can often fail to retrieve relevant documents due to the incomplete specification of information need in the user’s query. Pseudo-relevance feedback (PRF) aims to improve IR effectiveness by exploiting potentially relevant aspects of the information need present in the documents retrieved in an initial search. Standard PRF approaches utilize the information contained in these top ranked documents from the initial search with the assumption that documents as a whole are relevant to the information need. However, in practice, documents are often multi-topical where only a portion of the documents may be relevant to the query. In this situation, exploitation of the topical composition of the top ranked documents, estimated with statistical topic modeling based approaches, can potentially be a useful cue to improve PRF effectiveness. The key idea behind our PRF method is to use the term-topic and the document-topic distributions obtained from topic modeling over the set of top ranked documents to re-rank the initially retrieved documents. The objective is to improve the ranks of documents that are primarily composed of the relevant topics expressed in the information need of the query. Our RF model can further be improved by making use of non-parametric topic modeling, where the number of topics can grow according to the document contents, thus giving the RF model the capability to adjust the number of topics based on the content of the top ranked documents. We empirically validate our topic model based RF approach on two document collections of diverse length and topical composition characteristics: (1) ad-hoc retrieval using the TREC 6-8 and the TREC Robust ’04 dataset, and (2) tweet retrieval using the TREC Microblog ’11 dataset. 
Results indicate that our proposed approach increases MAP by up to 9% in comparison to the results obtained with an LDA based language model (for initial retrieval) coupled with the relevance model (for feedback). Moreover, the non-parametric version of our proposed approach is shown to be more effective than its parametric counterpart due to its advantage of adapting the number of topics, improving results by up to 5.6% of MAP compared to the parametric version.
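The key idea of this abstract, reranking the initially retrieved documents with term-topic and document-topic distributions, can be illustrated with a toy sketch. The scoring formula below is an assumption and not the authors' model: it estimates P(topic | query) from the term-topic distributions under a uniform topic prior and then scores each document by the overlap between that estimate and its own topic mixture. The distributions are hard-coded; in practice they would come from a topic model fitted on the top-ranked documents.

```python
import math


def topic_rerank(query_terms, doc_topic, term_topic, floor=1e-6):
    """Rerank documents by how much of their topic mass matches the query.

    doc_topic:  doc_id -> list of P(topic | doc).
    term_topic: list with one dict per topic, mapping term -> P(term | topic).
    floor:      small probability used for terms a topic has not seen.
    """
    # P(topic | query) up to normalisation, assuming a uniform topic prior.
    likelihoods = [
        math.prod(topic.get(t, floor) for t in query_terms) for topic in term_topic
    ]
    norm = sum(likelihoods)
    query_topic = [value / norm for value in likelihoods]

    # Document score: expected match between query topics and document topics.
    scores = {
        doc: sum(q * d for q, d in zip(query_topic, mix))
        for doc, mix in doc_topic.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-topic model over three top-ranked documents.
term_topic = [
    {"blog": 0.4, "feed": 0.3},
    {"football": 0.5, "goal": 0.3},
]
doc_topic = {"d1": [0.2, 0.8], "d2": [0.9, 0.1], "d3": [0.5, 0.5]}
reranked = topic_rerank(["blog", "feed"], doc_topic, term_topic)
```

Document `d2`, which is composed mostly of the topic that explains the query terms, moves to the top, while the mostly off-topic `d1` drops to the bottom.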

17.
In information retrieval research, models and systems traditionally assume that a single person is querying and reviewing the results. However, several empirical studies of professional practice identified collaboration during IR as an everyday work pattern, used to solve a shared information need and to benefit from the diverse expertise and experience of the team members. Moreover, most IR systems that are employed in professional work routines are designed for individual use, and prototype collaborative systems are too limited to support use in today's work practice. To bridge this gap, this paper develops and formalizes a decision-theoretic approach towards supporting a team of people that explicitly set out together to resolve a shared information need. We develop a formal cost model for collaborative IR that considers the trade-off between the estimated relevance of a document and the estimated document redundancy. From this cost model, we use a decision-theoretic approach to derive the notion of activity suggestions, that is, a formal optimum criterion that describes optimum collaboration strategies in IR as the solution of an integer linear program. Those collaboration strategies are suggested to team members with the aim of facilitating the collaborative performance of information retrieval tasks. We demonstrate the application of our model by means of search result division in two collaborative search tasks. In the conducted experiments, we study the effects of different domain knowledge and resulting relevance assessments of team members in four different conditions. The gathered results indicate that our approach can improve the retrieval effectiveness of teams in recall-oriented tasks.

18.
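Based on a web survey of 96 university library blog sites in China, this paper comprehensively analyzes the current state of blog-based services in Chinese university libraries along six dimensions: hosting unit, launch date, unit type, number of posts and visits, subject content, and hosting platform. It draws four conclusions: (1) blog services have reached mature use in only some Chinese university libraries, while in most libraries they are still at an early trial stage; (2) subject librarian blogs are currently the preferred form of blog service in Chinese university libraries; (3) the active performance of reader association blogs shows the broad prospects of university libraries in the new network environment; (4) hosting on free public platforms is the most common way for Chinese university libraries to set up blogs.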

19.
This article argues that although the act of mommy blogging may be empowering, the term itself reinforces women's hegemonic normative roles as nurturers, thrusting women who blog about their children into a form of digital domesticity in the blogosphere. Drawing on 29 blog posts women wrote debating the term mommy blogger and 649 comments posted on these blogs, the author uses Judith Butler's concept of performativity to rhetorically analyze the term, using a techno-feminist lens and cyber-ethnographic approach. The author asserts that the use of the term mommy blogger continues the culturally ingrained performance of motherhood women have learned since childhood, and, in so doing, holds women captive in this subjective norm that may not fit them. The use of mommy, versus mother, highlights the nurturing aspect of motherhood and conjures a prototype of the ideal mother, further marginalizing women by focusing on one attribute that does not apply to all women or even all mothers.

20.
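To advance subject knowledge services in libraries and give users a good environment for academic exchange and knowledge sharing, this article analyzes the shortcomings of current library subject knowledge services. After discussing in depth the feasibility of applying academic blogs to library subject knowledge services, it draws on the characteristics of academic blogs to construct an academic-blog-based model of library subject knowledge services and establishes an online community for these services, comprising an academic blog network community and a library subject knowledge service network. It also describes the concrete implementation of academic-blog-based library subject knowledge services, with the aim of addressing the shortcomings in library knowledge services and providing a reference for comprehensively improving libraries' capacity for professional, distinctive, and personalized knowledge services.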

