Similar Documents
 18 similar documents found (search time: 46 ms)
1.
SA-DBSCAN: A Self-Adaptive Density-Based Clustering Algorithm   Cited: 10 (self-citations: 0, other citations: 10)
DBSCAN is a classic density-based clustering algorithm that determines the number of clusters automatically and handles clusters of arbitrary shape effectively. However, DBSCAN requires two parameters, Eps and minPts, to be set manually, so the clustering process cannot proceed without human intervention. Building on DBSCAN, this paper proposes the SA-DBSCAN clustering algorithm, which determines the Eps and minPts parameters automatically by analyzing the statistical characteristics of the dataset, thereby removing manual intervention and making the clustering process fully automatic. Experiments show that SA-DBSCAN selects reasonable Eps and minPts values and produces clustering results of high accuracy.
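The two-parameter dependence described above can be sketched in a minimal way: a plain DBSCAN plus a k-distance heuristic for choosing Eps automatically. The percentile rule below is an illustrative assumption, not the paper's exact statistical analysis of the dataset.

```python
import math

def dbscan(points, eps, min_pts):
    """Classic DBSCAN: returns one label per point (-1 = noise)."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)
    return labels

def estimate_eps(points, k=4):
    """Heuristic in the spirit of SA-DBSCAN: pick Eps from the
    distribution of k-nearest-neighbour distances (assumed rule)."""
    kd = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(ds[k - 1])
    kd.sort()
    return kd[int(0.9 * len(kd))]   # 90th percentile of k-distances
```

On two well-separated blobs this recovers both clusters with no manually chosen Eps.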

2.
祁志伟  张永平 《大众科技》2010,(6):66-67,46
Network security is a major concern in today's information society, and intrusion detection is an effective means of defending against network attacks. Clustering algorithms are an important tool for building intrusion detection models. Among them, density-based clustering groups points by density rather than distance, overcoming the bias toward sphere-shaped clusters; genetic algorithms, which borrow techniques from biology, are used to search for optimal solutions. Combining genetic algorithms with density-based clustering yields an intrusion detection algorithm that judges anomalous network behavior more accurately and thus improves network security.

3.
A Data-Mining DBSCAN Algorithm and Its Application   Cited: 1 (self-citations: 0, other citations: 1)
Using the DBSCAN algorithm from data mining, a new approach to image segmentation is proposed. A sample database is built from the distribution of points in the digital image, and density-based clustering with DBSCAN then segments the image. The method finds the denser regions of image samples, summarizes the classes where samples are relatively concentrated, and can cluster in the presence of noise, completing the segmentation with strong robustness to noise.

4.
For the k-means clustering algorithm, different term-weighting schemes produce markedly different clustering results, and when the number of documents is large this difference can affect the accuracy and soundness of the clustering. Through experiments, texts are clustered under different weighting schemes, with entropy as the evaluation criterion, and the weighting schemes are studied and compared.
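Two of the weighting schemes such a comparison would cover can be sketched as follows: raw term frequency versus tf-idf. The point the abstract makes is visible even on toy documents: the cosine similarity between two documents changes depending on the weighting, because tf-idf discounts terms shared across the whole corpus.

```python
import math
from collections import Counter

def tf_weights(doc):
    """Raw term-frequency weights for one tokenized document."""
    counts = Counter(doc)
    n = len(doc)
    return {t: f / n for t, f in counts.items()}

def tfidf_weights(doc, corpus):
    """tf-idf weights: term frequency discounted by document frequency."""
    N = len(corpus)
    tf = tf_weights(doc)
    return {t: w * math.log(N / sum(t in d for d in corpus))
            for t, w in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0
```

Feeding these similarities into k-means (or any clustering) is what makes the choice of weighting scheme matter for the final partition.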

5.
By organically combining spatial coordinates with attribute features, three Manhattan-style spatial distances are defined, and an ACA-Cluster algorithm based on these distances is implemented in matlab. The method is applied to cluster and partition the ecological environment quality of Shandong Province. Experiments show that it captures well the spatial-clustering requirement that nearby locations with similar attributes be grouped together.
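One plausible form of such a combined distance is a weighted sum of a spatial Manhattan term and an attribute Manhattan term. The weighting below is an assumption for illustration; the paper defines three concrete variants.

```python
def mixed_manhattan(p, q, alpha=0.5):
    """Manhattan distance combining spatial coordinates and attribute
    features. Each point is ((x, y), attributes); alpha balances the
    spatial term against the attribute term (assumed form)."""
    (x1, y1), attrs1 = p
    (x2, y2), attrs2 = q
    spatial = abs(x1 - x2) + abs(y1 - y2)
    attrib = sum(abs(u - v) for u, v in zip(attrs1, attrs2))
    return alpha * spatial + (1 - alpha) * attrib
```

Any distance-based clustering algorithm can then be run on this metric so that both location and attribute similarity drive the partition.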

6.
Research on Patent Map Construction Based on the k-means Clustering Algorithm   Cited: 3 (self-citations: 1, other citations: 3)
邱洪华  余翔 《科研管理》2009,30(2):70-76
Building patent maps from patent documents is one of the most effective ways to monitor and understand the current state and trends of technological development, and patent-map research has therefore attracted wide attention in the intellectual property field in recent years. This paper analyzes the state of patent-map research at home and abroad, summarizes the functions of patent maps, and examines the shortcomings of current construction methods. Using both the structured and unstructured fields of patent documents, a semantic network is formed via the k-means clustering algorithm and a visual patent map is produced, from which the development path of the target technology field can be read clearly and intuitively.

7.
Data mining is an emerging, application-oriented discipline that draws on knowledge from many fields; it provides an effective way to extract useful knowledge from large volumes of information and to support decision making, and has broad application prospects. Clustering is an important data-mining technique for discovering data distributions and hidden patterns. This paper summarizes the main characteristics of most commonly used clustering algorithms and compares several classic ones.

8.
A method for automatic text summarization is studied that analyzes the text, its paragraphs, and its sentences layer by layer under a density-based clustering model. The clustering method is insensitive to noise, and the hierarchical analysis adapts well to long texts. In addition, linear and nonlinear weighting models are proposed for selecting the feature vectors.

9.
郭文娟 《科技风》2022,(4):63-65
Since the result of the traditional K-means algorithm depends on the initial number of clusters and the initial cluster centers, this paper proposes a K-means algorithm with optimized initial centers. The algorithm determines the number of clusters K by quantifying inter-sample distances and cluster compactness, and selects mutually distant data points as initial centers according to the distribution of the dataset, avoiding the random choices of the traditional K-means algorithm. Experiments on datasets from the UCI machine learning repository show that the improved algorithm achieves good clustering results and higher clustering accuracy.
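The "mutually distant initial centers" idea can be sketched deterministically: seed with the point farthest from the overall mean, then repeatedly add the point farthest from every center chosen so far. This is an illustrative farthest-point rule, not the paper's exact selection criterion, and the K-determination step is omitted.

```python
import math

def init_centers(points, k):
    """Choose k mutually distant points as initial centers (sketch)."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # next center: the point farthest from all chosen centers
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Lloyd iterations starting from the distant-point initialization."""
    dim = len(points[0])
    centers = init_centers(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        centers = [tuple(sum(q[d] for q in g) / len(g) for d in range(dim))
                   if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups
```

Because the seeds land in different dense regions, the iterations converge to the intended partition without random restarts.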

10.
A Clustering-Based Task Scheduling Algorithm for Cloud Computing   Cited: 2 (self-citations: 0, other citations: 2)
Task scheduling is a key problem in cloud computing. To address the load imbalance of the Min-Min algorithm, K-means clustering is introduced and a new cloud task scheduling algorithm combining K-means and Min-Min is proposed. The algorithm first preprocesses the tasks by clustering them by task length with K-means, then schedules them using the Min-Min mechanism. Simulation results show that the algorithm achieves better load balancing and system performance.
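The Min-Min half of that pipeline can be sketched as follows (the K-means pre-clustering of tasks by length is omitted here, and the task/machine model is an assumption): repeatedly pick the task with the smallest earliest completion time over all machines and commit it to the machine that achieves it.

```python
def min_min_schedule(tasks, machine_speeds):
    """Min-Min scheduling sketch.

    tasks: dict task_id -> task length
    machine_speeds: processing speed per machine
    Returns (assignment task_id -> machine index, makespan).
    """
    ready = [0.0] * len(machine_speeds)   # time each machine becomes free
    assignment = {}
    remaining = dict(tasks)
    while remaining:
        best = None                       # (finish_time, task_id, machine)
        for tid, length in remaining.items():
            for m, speed in enumerate(machine_speeds):
                finish = ready[m] + length / speed
                if best is None or finish < best[0]:
                    best = (finish, tid, m)
        finish, tid, m = best
        ready[m] = finish
        assignment[tid] = m
        del remaining[tid]
    return assignment, max(ready)
```

Running Min-Min separately on each length cluster, as the abstract describes, keeps short tasks from all piling onto the fastest machine.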

11.
With an increase in the number of data instances, data processing operations (e.g. clustering) require an increasing amount of computational resources, and it is often the case that for considerably large datasets such operations cannot be executed on a single workstation. This requires the use of a server computer for carrying out the operations. However, to ensure privacy of the shared data, a privacy-preserving data processing workflow involves applying an encoding transformation on the set of data points prior to applying the computation. This encoding should ideally cater to two objectives: first, it should be difficult to reconstruct the data; second, the results of the operation executed on the encoded space should be as close as possible to the results of the same operation executed on the original data. While standard encoding mechanisms, such as locality sensitive hashing, cater to the first objective, the second objective may not always be adequately satisfied. In this paper, we specifically focus on clustering as the data processing operation. We apply a deep metric learning approach to learn a parameterized encoding transformation function with the objective of maximizing the alignment of the clusters in the encoded space with those in the original data. We conduct experiments on four standard benchmark datasets: MNIST and Fashion-MNIST (each containing 70K grayscale images), CIFAR-10 (60K color images), and 20-Newsgroups (18K news articles). Our experiments demonstrate that the proposed method yields better clusters in comparison to approaches where the encoding process is agnostic of the clustering objective.

12.
In this paper, we propose a re-ranking algorithm using post-retrieval clustering for content-based image retrieval (CBIR). In conventional CBIR systems, it is often observed that images visually dissimilar to a query image are ranked high in retrieval results. To remedy this problem, we utilize the similarity relationship of the retrieved results via post-retrieval clustering. In the first step of our method, images are retrieved using visual features such as color histogram. Next, the retrieved images are analyzed using hierarchical agglomerative clustering methods (HACM) and the rank of the results is adjusted according to the distance of a cluster from a query. In addition, we analyze the effects of clustering methods, query-cluster similarity functions, and weighting factors in the proposed method. We conducted a number of experiments using several clustering methods and cluster parameters. Experimental results show that the proposed method achieves an improvement of retrieval effectiveness of over 10% on average in the average normalized modified retrieval rank (ANMRR) measure.
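The re-ranking step can be sketched with a single-linkage agglomerative pass over the retrieved feature vectors, followed by ordering documents primarily by the distance of their cluster centroid from the query. This is an illustrative instance of the idea, not the paper's specific HACM variants or weighting factors.

```python
import math

def agglomerate(vectors, n_groups):
    """Single-linkage agglomerative clustering down to n_groups clusters.
    Returns clusters as lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_groups:
        best = None                      # (distance, cluster_a, cluster_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

def rerank(query, vectors, n_groups=2):
    """Order retrieved items by (cluster-to-query distance, own distance)."""
    clusters = agglomerate(vectors, n_groups)
    dim = len(vectors[0])
    order = []
    for c in clusters:
        centroid = [sum(vectors[i][d] for i in c) / len(c) for d in range(dim)]
        cd = math.dist(query, centroid)
        for i in c:
            order.append((cd, math.dist(query, vectors[i]), i))
    return [i for _, _, i in sorted(order)]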

13.
This study presents a simple yet effective carrier frequency offset (CFO) estimation algorithm for orthogonal frequency division multiplexing (OFDM) systems. At the transmitter, the proposed algorithm uses null subcarriers to render the OFDM signal periodic in the time domain. At the receiver, these periodic time samples become CFO-bearing signals, which can be adopted to develop the maximum likelihood (ML) CFO estimation algorithm accordingly. In addition to providing reliable and efficient CFO estimation, the proposed algorithm has an adjustable acquisition region linearly proportional to the order of the null subcarrier insertion scheme.

14.
This paper presents a Takagi-Sugeno Fuzzy Model (TSFM) based on an optimal fuzzy partition, in which a novel clustering algorithm, the Modified Fuzzy C-Regression Model (MFCRM), is proposed. The objective function of the MFCRM algorithm considers both the geometrical structure of the input data and the linear functional relation between input and output data. MFCRM partitions the data space into fuzzy subspaces (rules). A new validation criterion is developed for detecting the right number of rules (subspaces) in a given data set. The obtained fuzzy partition is used to build the fuzzy structure and identify the premise parameters. Once the right number of rules and the premise parameters have been identified, the consequent parameters are identified by the orthogonal least squares (OLS) approach. The cluster validation index is tested on a synthetic data set. The effectiveness of the MFCRM-based TSFM is validated on benchmark examples such as the boiler-turbine system, the Mackey-Glass time series, and the Box-Jenkins model. The model performance is also validated on high-dimensional data such as the Auto-MPG and Boston Housing datasets.

15.
刘杰 《大众科技》2017,19(11):1-2,10
To study multi-target feature-based clustering and tracking under complex lighting, this paper analyzes vehicle video footage captured from dusk into night and designs a multi-feature tracking algorithm that combines lamp-group clustering tracking, lamp-shadow removal, and vehicle-body clustering tracking. Experimental results show that the multi-feature clustering tracking algorithm achieves good tracking performance in complex lighting environments.

16.
We present an efficient document clustering algorithm that uses a term frequency vector for each document instead of using a huge proximity matrix. The algorithm has the following features: (1) it requires a relatively small amount of memory and runs fast, (2) it produces a hierarchy in the form of a document classification tree and (3) the hierarchy obtained by the algorithm explicitly reveals a collection structure. We confirm these features and thus show the algorithm's feasibility through clustering experiments in which we use two collections of Japanese documents, the sizes of which are 83,099 and 14,701 documents. We also introduce an application of this algorithm to a document browser. This browser is used in our Japanese-to-English translation aid system. The browsing module of the system consists of a huge database of Japanese news articles and their English translations. The Japanese article collection is clustered into a hierarchy by our method. Since each node in the hierarchy corresponds to a topic in the collection, we can use the hierarchy to directly access articles by topic. A user can learn general translation knowledge of each topic by browsing the Japanese articles and their English translations. We also discuss techniques of presenting a large tree-formed hierarchy on a computer screen.
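The memory-saving idea of clustering on per-document term-frequency vectors, with no document-to-document proximity matrix, can be sketched with a single-pass "leader" scheme: each document either joins the first cluster whose leader it resembles or starts a new cluster. This is an illustrative sketch of the space/time trade-off, not the paper's tree-building algorithm.

```python
import math
from collections import Counter

def tf_vector(doc):
    """Term-frequency vector (sparse) for one whitespace-tokenized doc."""
    return Counter(doc.split())

def cos(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    return num / (math.sqrt(sum(v * v for v in a.values()))
                  * math.sqrt(sum(v * v for v in b.values())))

def leader_cluster(docs, threshold=0.3):
    """Single-pass clustering: O(clusters) comparisons per document,
    so no O(n^2) proximity matrix is ever materialized."""
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        v = tf_vector(doc)
        for k, lead in enumerate(leaders):
            if cos(v, lead) >= threshold:
                clusters[k].append(i)
                break
        else:                     # no leader is similar enough
            leaders.append(v)
            clusters.append([i])
    return clusters
```

Applied recursively within each cluster, a scheme like this yields a classification-tree hierarchy of topics.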

17.
An indicator system for evaluating the competitiveness of the electronic computer and office-equipment manufacturing industry is constructed, and a density-based clustering algorithm is used to evaluate competitiveness quantitatively. Conclusions are drawn that can inform government and enterprise decision making.

18.
The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information-retrieval tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows the user to structure the set of documents hierarchically (using a fuzzy hierarchical structure) and represents this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal analyzes the documents and computes their similarity not only on the basis of the syntactic similarity between words but also using a dictionary (Wordnet 1.7) and latent semantic analysis.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号