19 similar documents found (search time: 964 ms)
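The method first computes the similarity between a web page and the topic from two angles, page content and page links, then normalizes the two scores to determine the final relevance of the page to the topic. Content-similarity analysis ensures that pages are relevant to the topic, while link analysis addresses the authority and coverage of the search. The algorithm also improves on PageRank by incorporating the probability of a page being visited into the computation. Experimental results show that the new algorithm achieves high search efficiency.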
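The normalize-then-combine step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the min-max normalization, the weight `alpha`, and the sample scores are all assumptions, and the abstract's PageRank modification is not reproduced because its details are not given.

```python
def min_max_normalize(scores):
    """Rescale a dict of page scores to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All pages score the same, so none stands out.
        return {page: 0.0 for page in scores}
    return {page: (value - low) / (high - low) for page, value in scores.items()}


def combined_relevance(content_scores, link_scores, alpha=0.5):
    """Weighted sum of normalized content and link scores (alpha is hypothetical)."""
    content = min_max_normalize(content_scores)
    link = min_max_normalize(link_scores)
    return {page: alpha * content[page] + (1 - alpha) * link[page] for page in content}


# Hypothetical raw scores on different scales for three pages.
content_scores = {"a": 0.2, "b": 0.8, "c": 0.5}
link_scores = {"a": 10.0, "b": 30.0, "c": 20.0}
relevance = combined_relevance(content_scores, link_scores)
best_page = max(relevance, key=relevance.get)
```

Normalizing first keeps the link score, which may live on a much larger scale, from dominating the content score.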

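Research and Implementation of a Vertical Search Engine System   Cited by: 2 (self-citations: 1, citations by others: 1)
This paper presents the architecture of a vertical search engine and then designs its three core modules: the topic thesaurus, the web robot, and Chinese word segmentation. The topic thesaurus module is a hierarchical system that places coarse-grained topic terms at the upper levels and fine-grained topic terms at the lower levels, addressing both the breadth and the precision of topic search. The web robot combines multithreading with a VSM-based topic relevance algorithm to crawl topic pages, and Chinese word segmentation uses the forward maximum matching algorithm. Experiments show that multithreading is the key to raising the robot's crawling speed. In addition, the search engine's accuracy reached 63%.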
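Forward maximum matching, the segmentation method named above, can be sketched as follows. The dictionary, the `max_word_len` limit, and the fallback to single characters for unknown text are assumptions made for illustration; the paper's actual lexicon and parameters are not given in the abstract.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary.
dictionary = {"垂直", "搜索", "搜索引擎", "引擎", "系统"}
tokens = forward_max_match("垂直搜索引擎系统", dictionary)
```

Because the longest match wins, "搜索引擎" is kept as one token instead of being split into "搜索" and "引擎".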

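To address the low topical relevance of the results returned by topic search engines, this paper proposes a search strategy that combines a genetic algorithm with the content-based vector space model. The vector space model determines the relevance of a web page to the topic, and the genetic algorithm is applied to relevance judgment to improve the precision and recall of topic information search. The functionality was implemented with Eclipse 3.3 on top of the Heritrix framework. Experimental results show that the proportion of on-topic pages fetched by the system with the improved strategy is about 30% higher than with the original system.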
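The vector space model part of this strategy can be illustrated with a cosine similarity over raw term counts. This is only a sketch under stated assumptions: the abstract does not specify the term weighting or how the genetic algorithm is applied, so neither is reproduced here, and the `threshold` value and the `is_on_topic` helper are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-count vectors."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(page_tokens, topic_tokens, threshold=0.3):
    """Hypothetical relevance test: keep pages whose similarity reaches the threshold."""
    return cosine_similarity(page_tokens, topic_tokens) >= threshold


topic = ["search", "engine"]
similarity = cosine_similarity(["search", "crawler"], topic)
```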

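Search strategies for web spiders have been a focus of specialized search engine research in recent years; an important current problem is how to let a search engine obtain the required resources quickly and accurately from a huge volume of web page data. This paper focuses on the search strategies and optimization measures of a search engine's Web Spider, proposes a simple spider design based on the breadth-first algorithm, and analyzes the optimization measures applied during the design.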
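A breadth-first spider visits pages in order of their link distance from the seed. The sketch below shows only that traversal order over an in-memory link graph; the graph, the `max_pages` limit, and the function name are assumptions, and real fetching, parsing, and politeness rules are left out.

```python
from collections import deque


def bfs_crawl_order(link_graph, seed, max_pages=100):
    """Return the order in which a breadth-first spider would visit pages."""
    visited = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for neighbor in link_graph.get(url, []):
            # Each page is queued once, the first time a link to it is seen.
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical link graph: page -> outgoing links.
link_graph = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["seed"],
}
order = bfs_crawl_order(link_graph, "seed")
```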

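This paper describes in detail the architecture of a vertical search engine for computer education resources, focusing on the crawling strategy of the topic crawler, the topic relevance algorithm, and the design of the topic thesaurus. Experimental results show that the maximum response time of Heritrix in the software system is 0.563 seconds and that both query precision and the precision of the topic relevance judgment algorithm exceed 60%, so the system can be applied to the Web.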

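Zhong Huixin (钟辉新). 《情报杂志》 (Journal of Intelligence), 2008, 27(1): 118-120
Applying machine learning to analyze and mine Web content according to the needs of specific industry application domains, and providing personalized services on that basis, is an important development trend for search engines. Starting from the general principles of search engines and the problems of general-purpose search engines, this paper builds a personalized information architecture based on a vertical search engine, and combines a model of users' common interests with the Hopfield Net Spider search strategy to deliver personalized information services.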

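With the continuous development of Internet technology, it is becoming increasingly difficult for users to collect and analyze web pages related to a specific topic. This paper proposes a topic-oriented WWW information classification system (WICS), which collects web pages efficiently, classifies them, and then presents the search results to the user. Based on an analysis of typical search engines, the paper introduces Web text mining and applies it in the system. The prototype uses techniques and algorithms including text preprocessing, indexing, inverted files, and vector space distance measures. Initial experiments show that classifying Web information with the prototype makes it much easier for users to obtain information and improves the relevance and precision of search results.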
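An inverted file, one of the techniques listed above, maps each term to the set of documents that contain it. The sketch below is a generic illustration, not the WICS implementation: whitespace tokenization, the sample documents, and both function names are assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical document collection.
documents = {
    1: "web text mining",
    2: "web page classification",
    3: "text classification system",
}
index = build_inverted_index(documents)
hits = search_all_terms(index, "web text")
```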

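Design and Implementation of a Vertical Search Engine System   Cited by: 1 (self-citations: 0, citations by others: 1)
In the face of increasingly specialized and personalized information retrieval needs, the problems of general-purpose search engines are fully exposed. Vertical search, as a major direction in search engine development, is receiving growing attention. After presenting the overall structure of a vertical search engine, this paper analyzes in detail the key technologies involved, including web page fetching, Chinese word segmentation, and text classification. The segmentation and classification algorithms were added to Nutch to implement a system prototype. Experiments show that the system's topic relevance exceeds 94%.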

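Research on an Intelligent Search Engine Based on Semantic Understanding   Cited by: 7 (self-citations: 0, citations by others: 7)
Cao Ertang (曹二堂), Liu Yulin (刘玉林). 《情报杂志》 (Journal of Intelligence), 2005, 24(6): 58-59, 63
A structural analysis of query phrases suggests that a query phrase usually consists of keywords and feature words. A feature word summarizes the content of a web page and indicates that the page contains a specific set of feature terms. Based on this idea, a feature library for Web page content was built, a method for semantically understanding query phrases on the basis of this library was studied, and an algorithm for relevance levels was proposed. Query tests on the feature words already in the library yielded a precision of 86.7%. Experiments show that the method largely achieves understanding of query phrases and markedly improves search engine precision.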

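An Analysis of RSS-Based Search Engine Technology and Its Development Trends   Cited by: 3 (self-citations: 0, citations by others: 3)
With the rapid growth of RSS resources, RSS-based search engines have emerged. An RSS search engine is used in the same way as Google or Baidu (百度): the user enters keywords to search for the desired content. The difference is that a traditional portal search engine searches the content of fetched web pages, whereas an RSS search engine retrieves RSS feeds, or web pages containing RSS feeds, directly. RSS search engines offer high accuracy, a dynamic aggregation mechanism, and fast, efficient search, making them an important new information retrieval tool for current network resources. This paper therefore gives a preliminary discussion of the state of research on RSS search engines in China and abroad, their technical characteristics, and their application mechanisms, and on that basis looks ahead to the future of RSS-based search engine technology.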

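With the rapid development of the Internet, the number of web pages has swelled and in recent years has grown exponentially. Search engines face increasingly severe challenges, and it is hard to find pages that meet user needs accurately and quickly among this mass of pages. Web page classification is one effective way to address the problem. Classification by topic and classification by genre are its two main branches, and both improve the retrieval efficiency of search engines. Web page genre classification means classifying pages by their form of presentation and their purpose. This paper introduces the definition of web page genre and the classification features commonly used in genre classification research, and describes several common feature selection methods, classification models, and classifier evaluation methods, giving researchers an overview of web page genre classification.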

Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to attack the page freshness problem by performing the search on the original pages residing on the Web, rather than on the previously fetched copies as done in the traditional search engines. SE4SEE also aims to obtain high download rates in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We conduct various experiments to illustrate performance results obtained on a grid infrastructure and justify the use of the search strategy employed in SE4SEE.

Drawing on the ideas of the Sense-Making approach, the ways in which people face and bridge gaps in Web searching are analyzed. The empirical study is based on videotaped Web searches conducted by seven participants. Altogether 11 gaps and 13 search tactics of various types were identified. The gaps faced by the searchers originated from three major factors: problematic content of information, insufficient search competence and problems caused by the search environment. Of individual gaps, no relevant material available, inaccessible content and confusion were most frequent. Of the search tactics used in gap-bridging, following links and activating the Back button were most popular. Gaps related to the problematic content of information led the informants to redirect the search to find Web pages that focus better on the search topic. If the movement was stopped by insufficient search competence, the searchers tended to return to material that was familiar from earlier use contexts in order to regain control of the search process. Alternatively, they tried to specify the search terms. In cases where the search was interrupted by technical problems or other factors originating from the search system, gap-bridging aimed at returning to familiar and technically reliable links. The Sense-Making theory provides relevant conceptual tools to approach the dynamic and discontinuous nature of Web searching in terms of gap-facing and gap-bridging. The concept of gap-facing enables a context-sensitive analysis of the ways in which Web search processes may be stopped. Gap-bridging indicates a general level motive to find alternative ways to continue searching.

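This paper analyzes query modification methods and user information behavior during query modification. Adopting the idea that web page fetching, retrieval, and browsing deserve equal weight, and taking into account both the behavioral characteristics of users during Web search and the available sources of vocabulary for query modification, it presents a new query modification solution for Web search.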

The performance and capabilities of Web search engines are an important and significant area of research. Millions of people worldwide use Web search engines every day. This paper reports the results of a major study examining the overlap among results retrieved by multiple Web search engines for a large set of more than 10,000 queries. Previous smaller studies have discussed a lack of overlap in results returned by Web search engines for the same queries. The goal of the current study was to conduct a large-scale study to measure the overlap of search results on the first result page (both non-sponsored and sponsored) across the four most popular Web search engines, at specific points in time using a large number of queries. The Web search engines included in the study were MSN Search, Google, Yahoo! and Ask Jeeves. Our study then compares these results with the first page results retrieved for the same queries by the metasearch engine Dogpile.com. Two sets of randomly selected user-entered queries, one set of 10,316 queries and the other of 12,570 queries, from Infospace’s Dogpile.com search engine (the first set was from Dogpile, the second was from across the Infospace Network of search properties) were submitted to the four single Web search engines. Findings show that the percent of total results unique to only one of the four Web search engines was 84.9%, shared by two of the three Web search engines was 11.4%, shared by three of the Web search engines was 2.6%, and shared by all four Web search engines was 1.1%. This small degree of overlap shows the significant difference in the way major Web search engines retrieve and rank results in response to given queries. Results point to the value of metasearch engines in Web retrieval to overcome the biases of individual search engines.
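The kind of overlap breakdown reported above (share of results returned by exactly one, two, three, or four engines) can be computed as in the following sketch. The engine labels, URLs, and the function name are hypothetical toy data, not the study's data or its exact matching method.

```python
from collections import Counter


def overlap_distribution(results_by_engine):
    """Fraction of unique URLs returned by exactly k engines, for each k."""
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        # A URL counts once per engine, even if that engine lists it twice.
        for url in set(urls):
            engines_per_url[url] += 1
    total = len(engines_per_url)
    by_count = Counter(engines_per_url.values())
    engine_count = len(results_by_engine)
    if total == 0:
        return {k: 0.0 for k in range(1, engine_count + 1)}
    return {k: by_count[k] / total for k in range(1, engine_count + 1)}


# Hypothetical first-page results for one query on four engines.
results_by_engine = {
    "engine_a": ["u1", "u2", "u3", "u7"],
    "engine_b": ["u1", "u4", "u8"],
    "engine_c": ["u1", "u2", "u5"],
    "engine_d": ["u1", "u6"],
}
distribution = overlap_distribution(results_by_engine)
```

In this toy data six of the eight unique URLs come from a single engine, so the distribution is dominated by the "exactly one engine" bucket, the same qualitative pattern the abstract reports.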

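In ASP.NET applications, Web pages are isolated from one another and cannot pass information directly, so exchanging data efficiently is a problem worth studying. There are many ways to pass values between Web pages, but most of them are costly in performance. Using the Microsoft Visual Studio 2010 development platform, this paper selects three efficient value-passing methods.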

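Research on Dynamic Relevance Feedback in Intelligent Web Retrieval   Cited by: 5 (self-citations: 1, citations by others: 5)
With the development of network technology, the network is gradually becoming a huge, widely distributed information resource center that combines complex text, images, sound, and other media. Because most information on the network lacks uniform structure and orderly organization, it is necessary to help users collect information effectively and to select and recommend the information they are interested in, making retrieval intelligent so as to reduce query time and improve retrieval precision. The dynamic relevance feedback technique introduced in this paper tracks user interests dynamically while the user browses and queries the Web. During Web retrieval, immediate user feedback is used to determine the user's interest pattern, enabling intelligent Web retrieval whose results satisfy the user's needs as far as possible. 1 Intelligent Web retrieval: dynamic…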

The Web and especially major Web search engines are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research show (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessarily apply results from studies of one particular Web search engine to another Web search engine. The widespread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.

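Based on the characteristics of information updates in enterprise portals and the requirements of enterprise portal information retrieval, this paper introduces incremental acquisition based on important Web pages into the spider's search strategy and uses multithreading to design a web spider for collecting enterprise portal information, improving the spider's search efficiency.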
