Similar Documents
20 similar documents found (search time: 359 ms)
1.
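Human society has entered the era of big data. With the rapid growth of the Internet, data of many kinds are being produced in enormous volumes, and search engines, the tools that help people retrieve information, have clear limitations: users from different fields and backgrounds often have different retrieval goals and needs, yet the results returned by general-purpose search engines include many pages the user does not care about. Web crawler systems emerged to address this problem. A search engine selectively filters useful information from the Internet, and the web crawler is one of its basic components. This paper implements a focused web crawler in Python that uses keyword matching to scan target websites and fetch the required data.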
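A minimal sketch of a keyword-matching focused crawler of the kind this entry describes. The abstract does not give the paper's matching rule, so the relevance test here (at least `min_score` keyword occurrences in the page text) is an assumption, and `fetch(url)` is a hypothetical caller-supplied function; an offline dictionary stands in for the web so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def keyword_score(text, keywords):
    """Count case-insensitive occurrences of the topic keywords in the text."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def focused_crawl(seed, fetch, keywords, max_pages=50, min_score=1):
    """Breadth-first crawl that stores and expands only on-topic pages.

    fetch(url) -> HTML string or None is hypothetical; a real crawler might
    wrap urllib.request.urlopen and respect robots.txt.
    """
    queue = deque([seed])
    seen = {seed}
    relevant = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        score = keyword_score(" ".join(parser.text_parts), keywords)
        if score < min_score:
            continue  # off-topic: neither stored nor expanded
        relevant[url] = score
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return relevant


# Offline toy "web" so the sketch runs without network access.
pages = {
    "http://example.com/": '<a href="/a">A</a> crawler news <a href="/b">B</a>',
    "http://example.com/a": 'A focused crawler uses keyword matching <a href="/c">C</a>',
    "http://example.com/b": 'Cooking recipes <a href="/d">D</a>',
    "http://example.com/c": "More about crawler design",
    "http://example.com/d": "A crawler page linked only from an off-topic page",
}
relevant_pages = focused_crawl("http://example.com/", pages.get, ["crawler"])
print(sorted(relevant_pages))
```

Page `/d` mentions the keyword but is never fetched, because the only page linking to it is off-topic; this pruning of the frontier is what distinguishes a focused crawler from a general one.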

2.
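With the continuous development of the Internet, search engines have become an indispensable retrieval tool for web users. Search engines currently face many problems, including recall and precision, functionality, spam (cheating), security, information freshness, and standardization. This paper discusses these problems.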

3.
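Wang Zhen, Liu Haiyan. Heilongjiang Science and Technology Information (《黑龙江科技信息》), 2011, (18): 106, 221
With the continuous development of the Internet, search engines have become an indispensable retrieval tool for web users. This paper introduces the concept of search engines, their evaluation metrics and retrieval mechanisms, their classification, a comparison of various search engines, and their role in online information retrieval.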

4.
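A Survey of Web Search Engine Technologies Based on Personalized Information Recommendation Services   (cited 3 times; self-citations: 0, by others: 3)
Li Shuqing, Cui Beiliang. Journal of Intelligence (《情报杂志》), 2007, 26(8): 98-101
The rapid growth of the modern Internet poses new challenges for web search engines. Improving the user's query experience, so that users can find what they need among massive online information resources, is becoming the main direction of contemporary search engine development, and search engines based on personalized information recommendation services are quickly attracting wide attention. After several years of continuous research, four main forms have emerged, relying respectively on query refinement, personalized page weighting, personalized meta-search engines, and personalized information gathering. Building on an overview of these forms, the paper points out directions for future improvement.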
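To illustrate one of the four forms, personalized page weighting, here is a minimal sketch that re-ranks results by blending the engine's base score with the cosine similarity between each result's text and a user interest profile. The linear blend, the weight `alpha`, and the profile format are assumptions for illustration, not a method taken from the survey.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def personalize(results, profile, alpha=0.5):
    """Re-rank (doc_id, base_score, text) results.

    Final score = (1 - alpha) * base_score + alpha * similarity to profile,
    where profile maps interest terms to weights (hypothetical format).
    """
    rescored = []
    for doc_id, base, text in results:
        doc_vec = Counter(text.lower().split())
        score = (1 - alpha) * base + alpha * cosine(profile, doc_vec)
        rescored.append((score, doc_id))
    rescored.sort(reverse=True)
    return [doc_id for _, doc_id in rescored]


results = [
    ("d1", 0.9, "jaguar car dealer prices"),
    ("d2", 0.8, "jaguar habitat rainforest animal"),
]
profile = {"animal": 1.0, "rainforest": 0.8}  # a user interested in wildlife
print(personalize(results, profile))  # → ['d2', 'd1']
```

With `alpha=0` the engine's original order is returned unchanged; raising `alpha` lets the profile override the base ranking for ambiguous queries such as "jaguar".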

5.
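Web Personalized Information Recommendation Technology in Search Engine Systems   (cited 1 time; self-citations: 0, by others: 1)
Web personalized recommendation is widely used on the modern Internet: it proactively delivers web content to users according to their individual needs. However, because modern search engines usually lack users' login information and page-visit path information, traditional web personalized recommendation services are not fully suitable for search engines. Users generate large numbers of keyword access sequences when they use a search engine, and these sequences carry rich personalized information. On this basis, the paper proposes a method that uses the keyword access sequences in search engine access logs to provide web personalized recommendation, and analyzes the related technical characteristics and implementation details.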
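One simple way to exploit keyword access sequences is to count, across sessions, which keyword a user queries immediately after another and recommend the most frequent successors. The sketch below uses such a first-order transition model; this is a swapped-in illustrative technique, not necessarily the paper's method, and it assumes the log has already been split into per-user sessions.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often keyword b directly follows keyword a within a session."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current != following:  # ignore repeated submissions
                transitions[current][following] += 1
    return transitions


def recommend(transitions, keyword, k=2):
    """Return up to k keywords most often queried right after `keyword`."""
    return [kw for kw, _ in transitions[keyword].most_common(k)]


# Hypothetical keyword sequences extracted from a search engine access log.
sessions = [
    ["python", "python crawler", "scrapy"],
    ["python", "python crawler", "requests"],
    ["python", "pandas"],
]
transitions = build_transitions(sessions)
print(recommend(transitions, "python"))  # → ['python crawler', 'pandas']
```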

6.
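Research on an Ontology-Based Intelligent Search Engine   (cited 7 times; self-citations: 0, by others: 7)
Li Baomin. Journal of Intelligence (《情报杂志》), 2006, 25(10): 60-62
Analyzes the current state and problems of online search engines, designs and implements a system model for an ontology-based intelligent search engine, and discusses the function of each module in the system architecture and the relationships between modules. Ontologies are used to give semantic structure to user queries and to the retrieved information, and the paper explores techniques for search engine intelligence and their application in the ontology-based intelligent search engine.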
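A common way an ontology makes queries semantic is by expanding each query term with its concept's synonyms and narrower concepts. The sketch below shows this with a tiny hand-written dictionary; the `ONTOLOGY` contents and structure are hypothetical, and a real system would use a formal ontology (for example, one expressed in OWL) rather than a Python dict.

```python
# Hypothetical toy ontology: each concept lists synonyms and narrower concepts.
ONTOLOGY = {
    "search engine": {
        "synonyms": ["retrieval system"],
        "narrower": ["vertical search engine", "meta search engine"],
    },
    "crawler": {"synonyms": ["spider"], "narrower": ["focused crawler"]},
}


def expand_query(terms, ontology, include_narrower=True):
    """Map each query term to itself plus its concept's synonyms and,
    optionally, narrower concepts, preserving order and removing duplicates."""
    expanded = []
    for term in terms:
        candidates = [term]
        concept = ontology.get(term)
        if concept:
            candidates += concept["synonyms"]
            if include_narrower:
                candidates += concept["narrower"]
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["crawler", "python"], ONTOLOGY))
# → ['crawler', 'spider', 'focused crawler', 'python']
```

Terms not found in the ontology (here "python") pass through unchanged.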

7.
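Search engines make it convenient for web users to retrieve massive amounts of information. This paper studies how to adjust search engine retrieval strategies, both by broadening the search scope to improve recall and by narrowing it to improve precision.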
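The trade-off between the two strategies can be made concrete with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below uses a hypothetical toy index to compare a broad OR query with a narrow AND query.

```python
def precision_recall(retrieved, relevant):
    """precision = |retrieved & relevant| / |retrieved|,
    recall = |retrieved & relevant| / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy index: term -> set of ids of documents containing it.
index = {"crawler": {1, 2, 3, 4}, "python": {2, 3, 5, 6}}
relevant = {2, 3, 4}  # documents the user actually wants

broad = index["crawler"] | index["python"]   # OR widens the scope
narrow = index["crawler"] & index["python"]  # AND narrows the scope
print(precision_recall(broad, relevant))   # → (0.5, 1.0)
print(precision_recall(narrow, relevant))  # precision 1.0, recall 2/3
```

Broadening finds every relevant document at the cost of noise; narrowing removes the noise but misses document 4.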

8.
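Both social and economic development now rely on ever-richer online information resources. Faced with a cluttered mass of information, users need to find what they want quickly and accurately; this need led to online retrieval tools and to a large number of search engines, and the search engine has become an indispensable tool for retrieving large amounts of online information. However, simple web-page search can no longer satisfy users. In recent years search engines have entered a new period of rapid development: to meet users' varied needs they must be improved further and positioned as the gateway to the Internet, and, driven by strong user demand, domestic search engine developers are offering increasingly diversified services. The document search engine is one important application that meets these diversified needs.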

9.
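A vertical search engine is a specialized search engine for a particular industry; it is a subdivision and extension of general search engines. A vertical search engine automatically collects information from the Internet and provides information services to users. Building a Chinese vertical search engine requires roughly the following technologies: information collection, web page information extraction, Chinese word segmentation, and indexing.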
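Chinese text has no spaces between words, so segmentation must precede indexing. A classic dictionary-based method is forward maximum matching: scan left to right and at each position take the longest dictionary word, falling back to a single character. The abstract does not say which segmentation method is meant; the sketch below uses forward maximum matching with a hypothetical toy vocabulary.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word; unknown characters become single-character tokens."""
    max_len = max([len(w) for w in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary; real systems load a large lexicon.
vocab = {"垂直", "搜索", "引擎", "搜索引擎", "中文", "分词", "技术"}
print(forward_max_match("中文垂直搜索引擎分词技术", vocab))
# → ['中文', '垂直', '搜索引擎', '分词', '技术']
```

Because the match is greedy, "搜索引擎" is kept whole instead of being split into "搜索" and "引擎"; the known weakness of this method is that greedy choices can be wrong for ambiguous strings.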

10.
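A search engine is a system that collects information on the Internet according to certain strategies and provides retrieval services to users. This paper introduces the architecture of search engines, the technologies they use, and methods for content extraction.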
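A typical search engine architecture has a crawler that collects pages, an indexer that builds an inverted index (term to documents), and a query processor that looks terms up in that index. This textbook breakdown is an assumption about what the paper covers. The sketch below shows the indexer and a Boolean AND query processor, with a deliberately simple English-only tokenizer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (English-only simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "A crawler collects pages from the web.",
    2: "The indexer builds an inverted index of pages.",
    3: "The query processor searches the inverted index.",
}
index = build_index(docs)
print(sorted(search(index, "inverted index")))  # → [2, 3]
```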

11.
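Chen Haizhu, Xu Hui. Journal of Modern Information (《现代情报》), 2009, 29(8): 32-36
A subject information gateway is an Internet service that supports querying of system resources. It is currently an effective means of organizing, developing, and using academic information on the web; in a sense it is an online academic library that helps users find high-quality information on the web. After analyzing the significance of establishing and standardizing an evaluation indicator system for subject information gateways, this paper proposes a preliminary indicator system covering resource selection policy, system functionality, maintenance and updating, and personalized services.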

12.
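Li Zupei. Popular Science & Technology (《大众科技》), 2012, (8): 44-47, 10
The article outlines information publishing on the Guangxi Agricultural Information Network. Using the data collection and statistics platform developed by CNZZ, a third-party provider of Chinese Internet statistics and analysis, it took May 2012 as a randomly chosen sample period and surveyed nine statistical indicators of visits to the site, including users' search keywords, search tools, access terminals, network access providers, visitor origin, visit depth, visit time slots, and referral paths, to characterize user access behavior. The conclusions are: first, users locate information on the site mainly through Baidu, Sogou, Google, and Tencent; second, mobile phone users account for about 1.95% of terminals and computer users for more than 98%, so mobile Internet use in rural areas is still in its infancy; third, users served by China Telecom, Dianxintong (电信通), China Netcom, China Mobile, and other carriers together account for more than 98%; fourth, local Guangxi Internet users are the main audience, making up nearly 60%; sixth, most visits are shallow, aimed at simply obtaining information; seventh, daily traffic is clearly higher on workdays than on holidays, with visits concentrated in working hours. Seven recommendations for optimizing government information websites are proposed.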

13.
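Cheng Quan, Liu Binbin. Information Science (《情报科学》), 2022, 39(2): 82-90
[Purpose/Significance] Based on the dual-route Elaboration Likelihood Model (ELM), this study explores the factors influencing users' cross-platform academic information search behavior in the Internet environment. It hypothesizes that, when searching for academic information across multiple platforms, users are influenced not only by information quality (the central route) and source credibility (the peripheral route), but also by the mediating effect of perceived usefulness and the moderating effects of attention control level and information self-efficacy. [Method/Process] An academic information search behavior model was built, and a questionnaire survey yielded 350 valid samples, which were analyzed with SPSS 21.0; mediation and moderation tests were conducted for perceived usefulness, attention control level, and self-efficacy in the search model. [Results/Conclusion] The effect of information quality or source credibility on perceived usefulness is negatively moderated by attention control level, and this moderation is more significant for the peripheral route; information self-efficacy strengthens the predictive effect of perceived usefulness on search behavior. [Innovation/Limitations] Taking the micro perspective of behavioral psychology, this study introduces attention control level into the dual-route model, offering a new research path for studying cross-platform academic information search behavior.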
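The abstract does not give the model equations. A conventional regression specification for a mediator with a first-stage moderator and a second-stage moderator is sketched below as an illustration only; the symbols are introduced here: $X$ is information quality or source credibility, $M$ perceived usefulness, $W$ attention control level, $V$ information self-efficacy, and $Y$ search behavior.

```latex
\begin{aligned}
M &= a_0 + a_1 X + a_2 W + a_3 (X \times W) + \varepsilon_M \\
Y &= b_0 + c' X + b_1 M + b_2 V + b_3 (M \times V) + \varepsilon_Y \\
\omega(W, V) &= (a_1 + a_3 W)(b_1 + b_3 V)
\end{aligned}
```

Under this specification, the reported findings would correspond to $a_3 < 0$ (attention control weakens the effect of $X$ on perceived usefulness) and $b_3 > 0$ (self-efficacy strengthens the effect of perceived usefulness on behavior), with $\omega(W, V)$ the conditional indirect effect of $X$ on $Y$.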

14.
We report our experience with a novel approach to interactive information seeking that is grounded in the idea of summarizing query results through automated document clustering. We went through a complete system development and evaluation cycle: designing the algorithms and interface for our prototype, implementing them, and testing them with human users. Our prototype acted as an intermediate layer between the user and a commercial Internet search engine (AltaVista), thus allowing searches of a significant portion of the World Wide Web. In our final evaluation, we processed data from 36 users and concluded that our prototype improved search performance over using the same search engine (AltaVista) directly. We also analyzed the effects of various demographic and task-related parameters.
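The abstract does not name the clustering algorithm. As a swapped-in illustration of summarizing a result list by clustering, the sketch below groups result snippets with single-pass (leader) clustering on Jaccard word overlap and labels each cluster with its most frequent words; the threshold of 0.3 is an arbitrary assumption.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a or b else 0.0


def cluster_results(snippets, threshold=0.3):
    """Single-pass clustering: put each snippet in the first cluster whose
    seed snippet is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (seed token set, member indices)
    for i, snippet in enumerate(snippets):
        tokens = set(snippet.lower().split())
        for seed, members in clusters:
            if jaccard(tokens, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((tokens, [i]))
    return [members for _, members in clusters]


def label(snippets, members, top=2):
    """Summarize a cluster by its most frequent words."""
    counts = Counter(w for i in members for w in snippets[i].lower().split())
    return [w for w, _ in counts.most_common(top)]


snippets = [
    "jaguar car price list",
    "jaguar car dealer",
    "jaguar cat rainforest habitat",
    "jaguar cat diet",
]
for members in cluster_results(snippets):
    print(members, label(snippets, members))
# prints [0, 1] with labels jaguar/car, then [2, 3] with labels jaguar/cat
```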

15.
A growing body of research is beginning to explore the information-seeking behavior of Web users. The vast majority of these studies have concentrated on textual information retrieval (IR). Little research has examined how people search for non-textual information on the Internet, and few large-scale studies have investigated visual information-seeking behavior with general-purpose Web search engines. This study examined visual information needs as expressed in users' Web image queries. The data set consisted of 1,025,908 sequential queries from 211,058 users of Excite, a major Internet search service. Twenty-eight terms were used to identify queries for both still and moving images, resulting in a subset of 33,149 image queries by 9855 users. We provide data on: (1) image queries: the number of queries and the number of search terms per user; (2) image search sessions: the number of queries per user and the modifications made to subsequent queries in a session; and (3) image terms: their rank/frequency distribution and the most highly used search terms. On average, there were 3.36 image queries per user, containing an average of 3.74 terms per query. Image queries contained a large number of unique terms. The most frequently occurring image-related terms appeared less than 10% of the time, with most terms occurring only once. We contrast this with earlier work by P.G.B. Enser (Journal of Documentation 51 (2) (1995) 126–170), who examined written queries for pictorial information in a non-digital environment. Implications for the development of models for visual information retrieval and for the design of Web search engines are discussed.
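The filtering and per-user statistics described in this entry can be sketched as follows. The study's 28 identifying terms are not listed in the abstract, so `IMAGE_TERMS` is a hypothetical subset, and the log format (ordered `(user_id, query)` pairs) is an assumption.

```python
from collections import Counter

# Hypothetical subset of image-indicating terms (the study's 28 terms are not given).
IMAGE_TERMS = {"picture", "pictures", "image", "images", "photo", "photos", "video", "movie"}


def image_queries(log):
    """Keep (user_id, terms) for queries containing at least one image term."""
    return [(user, query.lower().split()) for user, query in log
            if IMAGE_TERMS & set(query.lower().split())]


def image_query_stats(log):
    """Return (image queries per user, terms per image query)."""
    selected = image_queries(log)
    if not selected:
        return 0.0, 0.0
    users = {user for user, _ in selected}
    per_user = len(selected) / len(users)
    per_query = sum(len(terms) for _, terms in selected) / len(selected)
    return per_user, per_query


def rank_frequency(log):
    """Terms of image queries ordered by frequency (rank 1 first)."""
    return Counter(t for _, terms in image_queries(log) for t in terms).most_common()


log = [
    ("u1", "cat pictures"),
    ("u1", "funny cat pictures"),
    ("u2", "weather today"),
    ("u2", "eiffel tower night photo"),
]
print(image_query_stats(log))  # → (1.5, 3.0)
```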

16.
We report a naturalistic interactive information retrieval (IIR) study of 18 ordinary users aged 20–25 who carry out everyday-life information seeking (ELIS) on the Internet with respect to the three types of information needs identified by Ingwersen (1986): the verificative information need (VIN), the conscious topical information need (CIN), and the muddled topical information need (MIN). The searches took place in the users' private homes to make the searching as realistic as possible. Ingwersen (1996) associates a given search behaviour with each of the three types of information needs; these associations are analytically deduced but not yet empirically tested. The objective of the study is therefore to investigate whether empirical data do or do not conform to the predictions derived from the three types of information needs. The main conclusion is that the search behaviour characteristics analytically deduced by Ingwersen are positively corroborated for this group of test participants searching the Internet as part of ELIS.

17.
The dynamic nature and size of the Internet can make relevant information difficult to find. Most users express their information needs as short queries to search engines and often have to sift through the results manually, guided by the relevance ranking the engine assigns, which makes relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses Web structure together with summarisation techniques to better represent the knowledge in actual Web documents. We call the proposed technique the Semantic Virtual Document (SVD). We discuss how SVD can be combined with a suitable clustering algorithm to achieve automatic content-based categorization of similar Web documents. The auto-categorization facility, together with a tree-like graphical user interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. We also introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of the short queries users typically submit. We outline our experimental design for evaluating the effectiveness of SVD as a representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify, and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
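The general idea of cluster-biased query expansion can be illustrated without reproducing the SVD representation: pick the document cluster that best matches the short query, then append that cluster's most frequent new terms. The overlap score and term selection below are assumptions for illustration, not the paper's algorithm.

```python
from collections import Counter


def cluster_biased_expansion(query, clusters, n_terms=2):
    """Choose the cluster whose documents share the most terms with the query,
    then append that cluster's most frequent terms not already in the query."""
    q_terms = query.lower().split()
    q_set = set(q_terms)

    def overlap(docs):
        # total number of query terms matched across the cluster's documents
        return sum(len(q_set & set(d.lower().split())) for d in docs)

    best = max(clusters, key=overlap)
    counts = Counter(w for d in best for w in d.lower().split() if w not in q_set)
    return q_terms + [w for w, _ in counts.most_common(n_terms)]


# Hypothetical clusters of retrieved documents for the ambiguous term "java".
clusters = [
    ["java programming language tutorial", "java language syntax"],
    ["java island indonesia travel", "java island indonesia beaches"],
]
print(cluster_biased_expansion("java island", clusters))
```

The expanded query starts with `java island indonesia`, biasing later retrieval toward the geographic sense and away from the programming-language cluster.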

18.
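The Internet gives people massive amounts of information, but this has not made things entirely convenient for users; instead, the sheer volume leaves people unsure how to find what is useful to them. Search engines gave people a good tool, but whether users' ultimate information needs are met depends on whether they can use these tools to find useful information. Users are therefore the ultimate judges of web pages, and only web page evaluation grounded in users is truly valuable. Taking the user's perspective, this paper attempts to establish a method for evaluating web page relevance based on users' information needs, used to assess how relevant a page's content is to those needs.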

19.
Search task success rate is an important indicator of search engine performance. Unlike most previous approaches, which rely on search tasks labeled by users or third-party editors, this paper attempts to improve search task success evaluation by exploiting the unlabeled search tasks that exist in search logs together with a small number of labeled ones. Concretely, it proposes the Multi-view Active Semi-Supervised Search task Success Evaluation (MA4SE) approach, which exploits labeled and unlabeled data by combining the advantages of semi-supervised learning and active learning with a multi-view mechanism. In the semi-supervised learning part of MA4SE, we employ a multi-view semi-supervised learning approach that uses different parameter configurations to create disagreement between base classifiers, which are trained separately on the predefined action and time views. In the active learning part of MA4SE, each classifier obtained from semi-supervised learning is applied to the unlabeled search tasks, and the tasks to be manually annotated are selected based on both the degree of disagreement between base classifiers and a regional density measure. We evaluate the proposed approach on open datasets with two different definitions of search task success. The experimental results show that MA4SE outperforms the state-of-the-art semi-supervised search task success evaluation approach.
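The active-learning selection step, which combines classifier disagreement with regional density, can be sketched as below. The abstract does not give the formulas, so vote entropy for disagreement, inverse-distance average similarity for density, and their product as the ranking score are all assumptions; the data are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement among base classifiers: entropy of their predicted labels."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def density(i, features):
    """Average similarity of sample i to the other unlabeled samples,
    with similarity = 1 / (1 + Euclidean distance)."""
    others = [f for j, f in enumerate(features) if j != i]
    if not others:
        return 0.0
    return sum(1 / (1 + math.dist(features[i], f)) for f in others) / len(others)


def select_for_annotation(predictions, features, budget=1):
    """predictions[i]: labels from each base classifier for unlabeled task i.
    Rank tasks by disagreement x density and return the top `budget` indices."""
    scores = [vote_entropy(p) * density(i, features) for i, p in enumerate(predictions)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]


# Hypothetical 2-D task features and votes from three base classifiers.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.2, 0.0)]
predictions = [[1, 1, 1], [1, 0, 1], [1, 0, 1], [0, 0, 0]]
print(select_for_annotation(predictions, features))  # → [1]
```

Tasks 1 and 2 receive the same disagreement score, but task 1 lies in a dense region while task 2 is an outlier, so task 1 is sent for annotation first.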

20.
Mouse interaction data contain rich information about the interaction between users and Search Engine Result Pages (SERPs), which can be useful for evaluating search satisfaction. Existing studies use aggregated features or anchor elements to capture the spatial information in mouse interaction data, which may lose mouse cursor movement patterns that are valuable for estimating search satisfaction. In this paper, we combine regions with actions to extract sequences from mouse interaction data. Using regions to capture the spatial information preserves more details of the interaction process between users and SERPs. To model mouse interaction sequences for search satisfaction evaluation, we propose a novel LSTM unit called Region-Action LSTM (RALSTM), which captures the interactive relations between regions and actions without subjecting the network to higher training complexity. In addition, we propose a data augmentation strategy, Multi-Factor Perturbation (MFP), to increase the pattern variations in mouse interaction sequences. We evaluate the proposed approach on open datasets. The experimental results show that it achieves significant performance improvements over the state-of-the-art search satisfaction evaluation approach.
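The RALSTM unit itself is not reproduced here. The sketch below only illustrates the preprocessing idea of turning raw mouse events into a sequence of (region, action) tokens that a sequence model could consume. Partitioning the SERP into fixed-height horizontal bands is a hypothetical simplification; the paper's actual region definition is not given in the abstract.

```python
def to_region_action_sequence(events, region_height=200, merge_repeats=True):
    """events: (action, x, y) mouse events on a SERP, in time order.

    Each event becomes a (region, action) token, where a region is a horizontal
    band of `region_height` pixels (hypothetical partition). Consecutive
    duplicate tokens are optionally collapsed.
    """
    sequence = []
    for action, _x, y in events:
        token = (int(y // region_height), action)
        if merge_repeats and sequence and sequence[-1] == token:
            continue
        sequence.append(token)
    return sequence


events = [
    ("move", 100, 50), ("move", 120, 80), ("hover", 130, 90),
    ("move", 140, 260), ("click", 150, 270), ("scroll", 0, 900),
]
print(to_region_action_sequence(events))
# → [(0, 'move'), (0, 'hover'), (1, 'move'), (1, 'click'), (4, 'scroll')]
```

Unlike aggregated features (for example, total cursor distance), the token sequence keeps the order in which regions were visited and what the user did there.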

