Similar Articles
1.
The demand to detect opinion spam, using opinion mining applications to prevent its damaging effects on e-commerce reputations, is rising in many business sectors worldwide. Existing spam detection techniques consider only one or two types of spam entity, such as reviews, reviewers, groups of reviewers, or products. Moreover, they use a limited number of features related to behaviour, content, and the relations between entities, which reduces detection accuracy. These techniques also mostly rely on synthetic datasets to evaluate their models and cannot readily be applied in real-world environments. We therefore propose a novel graph-based model, “Multi-iterative Graph-based opinion Spam Detection” (MGSD), in which all types of entities are considered simultaneously within a unified structure. With this approach, the model reveals both implicit (i.e., between entities of the same type) and explicit (i.e., between entities of different types) relationships. MGSD evaluates the ‘spamicity’ of entities more efficiently because it applies a novel multi-iterative algorithm that considers different sets of factors when updating the entities' spamicity scores. To improve detection accuracy, a larger number of existing weighted features, together with newly proposed features from different categories, were selected using a combination of feature fusion techniques and machine learning (ML) algorithms. Because it employs domain-independent features, MGSD can also be generalised to various kinds of opinionated documents. The results showed that our feature selection and feature fusion techniques markedly improved spam detection.
The findings show that MGSD improved the accuracy of state-of-the-art ML and graph-based techniques by around 5.6% and 4.8%, respectively, achieving an accuracy of 93% for spam detection on our synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset.
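The multi-iterative idea is that entity scores are repeatedly refined from the scores of related entities. It can be illustrated with a small Python sketch. The sketch assumes a hypothetical graph of reviewers, reviews and products. It also assumes per-review prior scores from some feature-based classifier and a simple averaging update with a hypothetical mixing weight `alpha`. It is not the MGSD update rule, whose factors the abstract does not specify.

```python
from collections import defaultdict

def propagate_spamicity(reviews, prior, alpha=0.5, iterations=10):
    """Iteratively refine spamicity scores on a reviewer-review-product graph.

    reviews: dict review_id -> (reviewer_id, product_id)
    prior:   dict review_id -> initial spamicity in [0, 1]
             (assumed to come from a feature-based classifier; hypothetical input)
    alpha:   weight kept on each review's own prior at every iteration
    """
    review_score = dict(prior)
    reviewer_score, product_score = {}, {}
    for _ in range(iterations):
        # Reviewer and product scores: mean spamicity of their reviews.
        by_reviewer, by_product = defaultdict(list), defaultdict(list)
        for rid, (user, prod) in reviews.items():
            by_reviewer[user].append(review_score[rid])
            by_product[prod].append(review_score[rid])
        reviewer_score = {u: sum(s) / len(s) for u, s in by_reviewer.items()}
        product_score = {p: sum(s) / len(s) for p, s in by_product.items()}
        # Review score: convex blend of its prior and its neighbours' scores.
        for rid, (user, prod) in reviews.items():
            neighbour = (reviewer_score[user] + product_score[prod]) / 2
            review_score[rid] = alpha * prior[rid] + (1 - alpha) * neighbour
    return review_score, reviewer_score, product_score
```

Every update is a convex combination, so all scores stay in [0, 1]. A reviewer whose reviews all look suspicious pulls each of those reviews' scores upward.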

2.
The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber warfare. Hence, network security analytics has become an important area of concern and has recently gained intensive attention among researchers, specifically in network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting network anomalies are not effective enough, particularly in real time. This inefficacy is mainly due to the massive volumes of data accumulated through connected devices. It is therefore crucial to develop a framework that effectively handles real-time big data processing and detects anomalies in networks. To this end, this paper addresses the problem of detecting anomalies in real time by surveying state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins by explaining the essential context and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed.

3.
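A probabilistic model of the distribution of object trajectories in video scenes is constructed: a mixture of one-sided generalized Gaussian models. Whether a trajectory is anomalous is determined by computing its information content. The method does not rely on prior knowledge of the scene, the model is built without supervision, and it can be updated in real time to adapt to time-varying environments. Experiments demonstrate the method's effectiveness and robustness and indicate its practical application value.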
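The anomaly score here is an information content, i.e. the negative log-likelihood of a trajectory under the fitted mixture. A minimal Python sketch follows. It assumes each trajectory is summarised by non-negative scalar features (for example, per-step displacement magnitudes) treated as independent. It also assumes the mixture weights, scales `alpha` and shapes `beta` have already been estimated. The one-sided generalized Gaussian density used is f(x) = β / (α Γ(1/β)) · exp(-(x/α)^β) for x ≥ 0. The `threshold` is a hypothetical parameter, and the online model update is not shown.

```python
import math

def one_sided_gg_pdf(x, alpha, beta):
    """One-sided generalized Gaussian density on x >= 0 (zero for x < 0)."""
    if x < 0:
        return 0.0
    return beta / (alpha * math.gamma(1.0 / beta)) * math.exp(-((x / alpha) ** beta))

def mixture_pdf(x, components):
    """components: list of (weight, alpha, beta); weights should sum to 1."""
    return sum(w * one_sided_gg_pdf(x, a, b) for w, a, b in components)

def trajectory_information(features, components, floor=1e-300):
    """Information content -sum(log p(x)) of a trajectory's features."""
    return -sum(math.log(max(mixture_pdf(x, components), floor)) for x in features)

def is_anomalous(features, components, threshold):
    """Flag trajectories that carry more information than the threshold."""
    return trajectory_information(features, components) > threshold
```

Rare feature values have low density and therefore high information. A trajectory with unusually long steps accumulates a large score.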

4.
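A method for detecting hidden malicious code based on dynamic control-flow path analysis is proposed. The method first selects sensitive paths related to malicious code in a targeted manner and dynamically records the control-flow paths of their execution. It then analyzes the collected data with an anomaly detection algorithm based on call-hierarchy tree matching, thereby detecting hidden malicious code in the system. Experimental results show that the method detects hidden malicious code effectively, with a high detection rate and a low false-alarm rate. It is suitable for detecting hidden malicious code within computer operating systems.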
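Call-hierarchy tree matching can be sketched in Python as follows. Each recorded call tree is reduced to its set of root-to-node call paths, and any path that never occurs in a baseline collected from a clean system is flagged. The nested-dict tree format, the function names and the clean-baseline assumption are illustrative only. The abstract does not specify the matching algorithm beyond this description.

```python
def call_paths(tree, prefix=()):
    """Flatten a call tree {function: {callee: {...}}} into root-to-node paths."""
    paths = set()
    for func, callees in tree.items():
        path = prefix + (func,)
        paths.add(path)
        paths |= call_paths(callees, path)
    return paths

def unmatched_paths(observed_tree, baseline_trees):
    """Return observed call paths that appear in none of the baseline trees."""
    known = set()
    for tree in baseline_trees:
        known |= call_paths(tree)
    return sorted(call_paths(observed_tree) - known)
```

Suppose a hypothetical hook inserts itself between `read_dir` and `fs_iterate`, for example to filter hidden files. The call paths through the hook then have no match in the baseline and are reported.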

5.
Keyword extraction aims to capture the main topics of a document and is an important step in natural language processing (NLP) applications. Various graph centrality measures have been proposed for automatic keyword extraction. However, there is no consensus yet on how these measures compare in this task. Here, we present the multi-centrality index (MCI) approach, which aims to find the optimal combination of word rankings according to the selection of centrality measures. We analyze nine centrality measures (Betweenness, Clustering Coefficient, Closeness, Degree, Eccentricity, Eigenvector, K-Core, PageRank, Structural Holes) for identifying keywords in co-occurrence word-graph representations of documents. We perform experiments on three document datasets and demonstrate that all individual centrality methods achieve similar statistical results. The proposed MCI approach, in contrast, significantly outperforms the individual centralities, three clustering algorithms, and previously reported results in the literature.
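A minimal Python sketch of the multi-centrality idea follows. It builds a co-occurrence word graph, computes several centralities, normalises each to [0, 1] and ranks words by a weighted sum. Only degree and closeness centrality are implemented here, using the standard library only. The weights are hypothetical fixed values, whereas MCI searches for the best combination over nine measures. Stop-word removal is also omitted.

```python
from collections import defaultdict, deque

def cooccurrence_graph(tokens, window=2):
    """Link distinct words that occur within `window` consecutive tokens."""
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        graph.setdefault(w, set())  # isolated words still become vertices
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    return graph

def degree_centrality(graph):
    n = len(graph)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(reachable nodes) / (sum of BFS distances) for each node."""
    scores = {}
    for source in graph:
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

def normalise(scores):
    top = max(scores.values(), default=0.0)
    return {k: v / top if top else 0.0 for k, v in scores.items()}

def multi_centrality_keywords(tokens, weights=None, k=3, window=2):
    weights = weights or {"degree": 0.5, "closeness": 0.5}  # hypothetical weights
    graph = cooccurrence_graph(tokens, window)
    measures = {"degree": normalise(degree_centrality(graph)),
                "closeness": normalise(closeness_centrality(graph))}
    combined = {w: sum(weights[m] * measures[m][w] for m in weights) for w in graph}
    return sorted(combined, key=lambda w: (-combined[w], w))[:k]
```

Adding another centrality only requires a new entry in `measures` and a weight for it. That is the kind of search space a combination method like MCI explores.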

6.
This paper presents a robust and comprehensive graph-based rank aggregation approach for combining the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme that is independent of how the isolated ranks are formulated. Our approach can combine arbitrary models defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model. Fusion graphs can merge multiple ranks and automatically express inter-relationships among retrieval results. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, encapsulates contextual information encoded from multiple ranks. This information can be used directly for ranking, without further computation or post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and large gains over the rankers being fused. This demonstrates the proposal's ability to represent queries through a unified graph-based model of rank fusions.
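A simplified Python sketch of rank fusion through graphs follows, under assumptions not taken from the paper. Each ranker contributes its top-k list for a query. Vertex weights are summed reciprocal ranks, and edges connect items that co-occur in the same top-k list, with a hypothetical 1/(i·j) weight. Two fusion graphs are compared by the total weight of their common subgraph, taking the minimum weight of each shared vertex and edge and normalising by the heavier graph. The paper's exact construction and its minimum-common-subgraph score may differ.

```python
from collections import defaultdict
from itertools import combinations

def fusion_graph(ranked_lists, k=5):
    """Merge several ranked lists into one weighted graph: (vertices, edges)."""
    vertices, edges = defaultdict(float), defaultdict(float)
    for ranking in ranked_lists:
        top = ranking[:k]
        for pos, item in enumerate(top, start=1):
            vertices[item] += 1.0 / pos
        for (i, a), (j, b) in combinations(enumerate(top, start=1), 2):
            edges[frozenset((a, b))] += 1.0 / (i * j)  # hypothetical edge weight
    return dict(vertices), dict(edges)

def graph_similarity(g1, g2):
    """Common-subgraph weight (min of shared weights) over the heavier graph's weight."""
    (v1, e1), (v2, e2) = g1, g2
    common = sum(min(v1[x], v2[x]) for x in v1.keys() & v2.keys())
    common += sum(min(e1[x], e2[x]) for x in e1.keys() & e2.keys())
    total = max(sum(v1.values()) + sum(e1.values()),
                sum(v2.values()) + sum(e2.values()))
    return common / total if total else 0.0
```

For retrieval, the query's fusion graph would be compared with the fusion graph of each collection item. Items are then ordered by `graph_similarity`, so no post-processing over the graphs is needed.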

7.
Most existing methods for large-scale, high-dimensional streaming anomaly detection suffer from extremely high time and space complexity. Moreover, these models are very sensitive to parameters, which severely limits their generalization ability and restricts them to very few specific application scenarios. This paper proposes a three-layer high-dimensional streaming anomaly detection model called the double locality-sensitive hashing Bloom filter (dLSHBF). We first build the first two layers, double locality-sensitive hashing (dLSH). We prove that dLSH reduces the hash code length of the data while ensuring that the projected data retain a favorable distance-preserving mapping property. Second, we use a Bloom filter to build the third layer of the dLSHBF model, which improves the efficiency of anomaly detection. Six large-scale, high-dimensional data stream datasets from different IIoT anomaly detection domains were selected for comparison experiments. First, extensive experiments show that the distance-preserving performance of the proposed dLSH algorithm is significantly better than that of existing LSH algorithms. Second, we verify that the dLSHBF model is more efficient than other advanced Bloom filter models (for example, Robust Bloom Filter, Fly Bloom Filter, Sandwich Learned Bloom Filter, and Adaptive Learned Bloom Filters). Compared with the state of the art, dLSHBF achieves an anomaly detection rate (DR) above 97% and a false alarm rate (FAR) below 2.2%. Its effectiveness and generalization ability surpass those of other existing streaming anomaly detection methods.
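The LSH-plus-Bloom-filter pipeline can be sketched in Python as follows:

- Normal training points are hashed to short binary codes with random-hyperplane (sign) LSH.
- The codes are inserted into a Bloom filter.
- At detection time, a point whose code is not in the filter is flagged.

This is a single-stage stand-in that assumes sign LSH and SHA-256-derived Bloom hash positions. It is not the paper's double LSH, whose construction the abstract does not give. The parameters `n_bits`, `m` and `k` are hypothetical.

```python
import hashlib
import random

class SignLSH:
    """Random-hyperplane LSH: each bit is the sign of a random projection."""
    def __init__(self, dim, n_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def code(self, x):
        return "".join("1" if sum(p * v for p, v in zip(plane, x)) >= 0 else "0"
                       for plane in self.planes)

class BloomFilter:
    """Bit array with k hash positions per key; no false negatives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

def build_detector(normal_points, n_bits=8, seed=0):
    lsh, bloom = SignLSH(len(normal_points[0]), n_bits, seed), BloomFilter()
    for x in normal_points:
        bloom.add(lsh.code(x))
    # A point is anomalous if its LSH code was never seen among normal data.
    return lambda x: lsh.code(x) not in bloom
```

Memory stays fixed at `m` bits however many points are inserted, which is what makes the design attractive for streams. The trade-off is a small false-positive rate in membership tests, which here means occasionally missing an anomaly.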

8.
This paper presents a novel query expansion method, combined with a graph-based algorithm for query-focused multi-document summarization, to address the limited information contained in the original query. Our approach uses both sentence-to-sentence and sentence-to-word relations to select query-biased informative words from the document set. It then uses these words as query expansions to improve the sentence ranking. Compared with previous query expansion approaches, our approach captures more relevant information with less noise. We performed experiments on data from the Document Understanding Conference (DUC) 2005 and DUC 2006. The evaluation results show that the proposed query expansion method significantly improves system performance and makes our system comparable to state-of-the-art systems.
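A minimal Python sketch of graph-based query expansion in the spirit described follows. Sentence relevance to the query is propagated over a sentence-similarity graph. Words are then scored through the sentence-to-word relation, and the top new words are appended to the query. The Jaccard word-overlap similarity, the damping value `d`, the iteration count and the word-scoring rule are assumptions, not the paper's formulas.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def expand_query(query, sentences, n_terms=2, d=0.85, iterations=20):
    q = set(query.lower().split())
    sents = [set(s.lower().split()) for s in sentences]
    prior = [jaccard(s, q) for s in sents]  # direct relevance to the query
    # Sentence-to-sentence relation: pairwise similarity, no self-loops.
    sim = [[jaccard(a, b) if i != j else 0.0 for j, b in enumerate(sents)]
           for i, a in enumerate(sents)]
    rel = prior[:]
    for _ in range(iterations):
        # Query-biased propagation over row-normalised similarities.
        rel = [(1 - d) * prior[i]
               + d * sum(sim[j][i] / (sum(sim[j]) or 1.0) * rel[j]
                         for j in range(len(sents)))
               for i in range(len(sents))]
    # Sentence-to-word relation: a word accumulates the relevance of its sentences.
    word_score = {}
    for s, r in zip(sents, rel):
        for w in s - q:
            word_score[w] = word_score.get(w, 0.0) + r
    ranked = sorted(word_score, key=lambda w: (-word_score[w], w))
    return query.split() + ranked[:n_terms]
```

Words from sentences unrelated to the query receive zero relevance, so they never enter the expansion. This is one simple way to keep noise low.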

9.
Privacy has raised considerable concern recently, especially with the information explosion and the numerous data mining techniques for exploring the information in large volumes of data. These data are often collected and stored across different institutions (banks, hospitals, etc.), a setting termed cross-silo. In this context, cross-silo federated learning has become a prominent way to tackle privacy issues: only model updates are transmitted from institutions to servers, without revealing the institutions' private information. In this paper, we propose a cross-silo federated XGBoost approach to the federated anomaly detection problem. This problem aims to identify abnormalities in extremely unbalanced datasets (e.g., credit card fraud detection) and can be considered a special classification problem. We design two privacy-preserving mechanisms tailored to federated XGBoost: anonymity-based data aggregation and local differential privacy. In the anonymity-based data aggregation scenario, we cluster data into groups and use cluster-level data features to train the model. In the local differential privacy scenario, we design a federated XGBoost framework that incorporates differential privacy into parameter transmission. Experimental results on two datasets show the effectiveness of the proposed schemes compared with existing methods.
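The differential-privacy variant can be illustrated in Python as follows. Each silo adds Laplace noise locally to its per-bin gradient and hessian sums before transmitting them. The server then aggregates the noisy histograms and evaluates the standard XGBoost split gain. The bin layout, the `sensitivity` value and the privacy budget `epsilon` are assumptions. The abstract does not say which quantities are perturbed or how sensitivity is bounded.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials of mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(grads, hess, bins, n_bins, epsilon, sensitivity=1.0, rng=None):
    """Silo side: per-bin gradient/hessian sums perturbed with Laplace noise."""
    if rng is None:
        rng = random.Random()
    G, H = [0.0] * n_bins, [0.0] * n_bins
    for g, h, b in zip(grads, hess, bins):
        G[b] += g
        H[b] += h
    scale = sensitivity / epsilon
    return [x + laplace(scale, rng) for x in G], [x + laplace(scale, rng) for x in H]

def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """XGBoost gain for splitting a node into left/right children."""
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma

def best_split(histograms, lam=1.0):
    """Server side: sum the silos' noisy histograms and scan bin boundaries."""
    n_bins = len(histograms[0][0])
    G = [sum(h[0][i] for h in histograms) for i in range(n_bins)]
    # Noise can push hessian sums below zero; clip them as a simple safeguard.
    H = [max(0.0, sum(h[1][i] for h in histograms)) for i in range(n_bins)]
    G_total, H_total = sum(G), sum(H)
    best_gain, best_bin = float("-inf"), None
    GL = HL = 0.0
    for i in range(n_bins - 1):
        GL += G[i]
        HL += H[i]
        gain = split_gain(GL, HL, G_total - GL, H_total - HL, lam)
        if gain > best_gain:
            best_gain, best_bin = gain, i
    return best_gain, best_bin  # best_bin: last bin index in the left child
```

Smaller `epsilon` means larger noise and stronger privacy, at the cost of noisier split decisions. Only histograms, not raw records, leave each silo.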

10.
Anomalous data are data points that deviate from the large majority of normal data, and they often have negative impacts on various systems. Current anomaly detection technology suffers from low detection accuracy, high false alarm rates, and a lack of labeled data. Anomaly detection is an effective means of finding anomalies in data, so it is of great practical importance and provides important support for the normal operation of various systems. In this paper, we propose an anomaly detection classification model that incorporates federated learning and mixed Gaussian variational autoencoder networks, namely MGVN. The MGVN model first constructs a variational autoencoder with a mixed Gaussian prior to extract features from the input data. It then builds a deep support vector network on top of this autoencoder to compress the feature space. MGVN finds the minimum hypersphere separating normal and abnormal data and measures the anomaly score as the Euclidean distance between the data features and the hypersphere center. Finally, federated learning is incorporated into MGVN (FL-MGVN) so that multiple participants can collaboratively train a global model without sharing private data. Experiments on benchmark datasets including NSL-KDD, MNIST, and Fashion-MNIST demonstrate that the proposed FL-MGVN achieves higher recognition performance and classification accuracy than other methods. The average AUC on MNIST and Fashion-MNIST reached 0.954 and 0.937, respectively.
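Two steps in this abstract are easy to make concrete in Python. The first is scoring a sample by its Euclidean distance to a hypersphere centre in feature space. The second is federated averaging of client parameters, weighted by local sample counts. The sketch assumes features have already been produced by an encoder (the mixed-Gaussian VAE is omitted) and uses the mean of normal features as the centre. FedAvg is an assumed aggregation rule, since the abstract does not name one.

```python
import math

def centre(features):
    """Hypersphere centre as the mean of (assumed normal) training features."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def anomaly_score(z, c):
    """Euclidean distance from a feature vector to the centre."""
    return math.dist(z, c)

def fedavg(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(client_params[0]))]
```

In a federated round, each client would update its parameters on local data. The server would then combine them with `fedavg`, so raw samples never leave the client.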

11.
Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises contamination-resilient continuous supervisory signals. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%–30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies.
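Turning discrete anomaly labels into continuous ones can be sketched in Python in mixup style. A labeled anomaly and an unlabeled sample (treated as normal) are mixed with coefficient λ, and the new sample is labeled with abnormality degree λ. This shows the general idea only. The paper's "mass interpolation" and its contamination-covering scheme may choose samples and coefficients differently. Drawing λ from a Beta distribution is a common mixup choice, used here as an assumption.

```python
import random

def interpolate_samples(anomalies, unlabeled, n_new, a=1.0, b=1.0, seed=0):
    """Create mixed samples with continuous abnormality labels in [0, 1].

    anomalies, unlabeled: lists of equal-length feature vectors.
    Returns (vector, label) pairs, where label is the weight on the anomaly.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x_a, x_u = rng.choice(anomalies), rng.choice(unlabeled)
        lam = rng.betavariate(a, b)  # hypothetical choice of mixing distribution
        x_new = [lam * p + (1 - lam) * q for p, q in zip(x_a, x_u)]
        out.append((x_new, lam))
    return out
```

The generated pairs could then train a regression-style anomaly scorer. The scorer sees graded targets rather than only 0/1 labels.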

12.
Learning latent representations of users and points of interest (POIs) is an important task in location-based social networks (LBSN). Such representations could greatly benefit multiple location-based services, such as POI recommendation and social link prediction. Many contextual factors, such as geographical influence, user social relationships, and temporal information, are available in LBSN and would be useful for this task. However, incorporating all these contextual factors into user and POI representation learning in LBSN remains challenging because of their heterogeneous nature. Although encouraging performance has been achieved in POI recommendation and social link prediction, most existing representation learning methods for LBSN incorporate only one or two of these contextual factors. In this paper, we propose a novel joint representation learning framework for users and POIs in LBSN, named UP2VEC. In UP2VEC, we present a heterogeneous LBSN graph that incorporates all the aforementioned factors. Specifically, the transition probabilities between nodes in the heterogeneous graph are derived by jointly considering these contextual factors. The latent representations of users and POIs are then learnt by matching the topological structure of the heterogeneous graph. To evaluate the effectiveness of UP2VEC, a series of experiments on POI recommendation and social link prediction were conducted with two real-world datasets (Foursquare and Gowalla). Experimental results demonstrate that UP2VEC significantly outperforms existing state-of-the-art alternatives. Further experiments show the superiority of UP2VEC in handling the cold-start problem in POI recommendation.
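One way to derive transition probabilities on a heterogeneous LBSN graph can be sketched in Python:

1. Weight each edge type by a mixing coefficient.
2. Normalise each node's outgoing weights.
3. Generate random walks whose co-occurrences could then feed a skip-gram model.

The edge types, the coefficients in `type_weights` and the walk parameters are hypothetical. The temporal factor is omitted. UP2VEC's actual derivation is not given in the abstract.

```python
import random
from collections import defaultdict

def transition_probs(edges, type_weights):
    """edges: (src, dst, edge_type, weight) tuples with positive weights.

    Returns {src: [(dst, probability), ...]} with probabilities summing to 1.
    """
    out = defaultdict(lambda: defaultdict(float))
    for src, dst, etype, w in edges:
        out[src][dst] += type_weights[etype] * w
    probs = {}
    for src, nbrs in out.items():
        total = sum(nbrs.values())
        probs[src] = [(dst, w / total) for dst, w in nbrs.items()]
    return probs

def random_walk(probs, start, length, rng):
    """Walk until `length` nodes are visited or a node has no outgoing edges."""
    walk = [start]
    while len(walk) < length and walk[-1] in probs:
        nodes, weights = zip(*probs[walk[-1]])
        walk.append(rng.choices(nodes, weights=weights)[0])
    return walk
```

Because users and POIs share one graph, walks mix social, check-in and geographic context. This lets the learned embeddings reflect all the factors at once.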

13.
14.
Recommender systems address information overload by retrieving the most relevant sources from the wide range of web services. They help users by predicting their interests in many domains such as e-government, social networks, e-commerce, and entertainment. Collaborative Filtering (CF) is the most promising technique used in recommender systems; it makes suggestions based on the preferences of like-minded users. Despite its widespread use in personalized recommendation, CF suffers from problems including cold start, data sparsity, and gray sheep, which ultimately degrade its effectiveness. Many recommendation methods have been proposed to overcome these problems. However, they fail to suggest top-n recommendations based on the order of users' priorities. In this research, we address the shortcomings of CF and current recommendation methods on ranked-preference datasets. We use a new graph-based structure to model users' priorities and capture the associations between users and items. Users' profiles are created from both their past and current interests, because interests can change over time. Our proposed algorithm keeps the active user's preferred items at the beginning of the recommendation list. These items therefore fall within the top-n recommendations, which increases user satisfaction. The experimental results demonstrate that our algorithm achieves significant improvements over CF and other recommendation methods in terms of recall, precision, F-measure, and MAP on two benchmark datasets, MovieLens and Superstore.
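A small Python sketch of a graph-based, priority-aware recommender follows. Each user's ranked preference lists become weighted user-item edges with weight 1/rank. A hypothetical `decay` factor down-weights older lists, reflecting the point that interests change over time. User similarity is the weighted overlap of item edges, and unseen items are scored by two-hop paths from the active user. The 1/rank weighting, the overlap similarity and the decay are assumptions, not the paper's algorithm.

```python
from collections import defaultdict

def build_profiles(history, decay=0.5):
    """history: {user: [ranked_list_oldest, ..., ranked_list_newest]}.

    Edge weight = decay**age / rank, so recent, high-priority items weigh most.
    """
    profiles = {}
    for user, lists in history.items():
        weights = defaultdict(float)
        for age, ranking in enumerate(reversed(lists)):
            for rank, item in enumerate(ranking, start=1):
                weights[item] += decay ** age / rank
        profiles[user] = dict(weights)
    return profiles

def similarity(p, q):
    """Weighted overlap of two users' item edges."""
    return sum(min(p[i], q[i]) for i in p.keys() & q.keys())

def recommend(profiles, active, n=3):
    """Score unseen items via two-hop paths: active user -> similar user -> item."""
    mine = profiles[active]
    scores = defaultdict(float)
    for other, theirs in profiles.items():
        sim = similarity(mine, theirs) if other != active else 0.0
        if sim <= 0:
            continue  # skip the active user and users with no shared items
        for item, w in theirs.items():
            if item not in mine:
                scores[item] += sim * w
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

Items ranked highly by similar users contribute the most. The ordering of the returned list therefore follows users' stated priorities rather than raw co-occurrence counts.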

15.
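An in-depth investigation of collaborative innovation among the "Double First-Class" universities of the Yangtze River Delta examines its basic characteristics and spatiotemporal evolution patterns. Such work is of great significance for advancing higher-quality integration of the Yangtze River Delta and for strengthening these universities' capacity as sources of innovation. The study uses granted-patent data of "Double First-Class" universities in the Yangtze River Delta from 2012 to 2020. It first analyzes the current state of, and trends in, total patent output and the number and propensity of collaborations. It then applies social network analysis based on data mining and visualization to four types of collaboration. This reveals differences and spatiotemporal evolution in the temporal, spatial, and actor distributions of university collaborative innovation. The study finds that the number of jointly granted patents has grown rapidly but their share remains low, with Shanghai and Anhui showing different trends. While overall network density remained stable, the number of nodes and the strength of ties increased significantly. Individual universities followed different evolution patterns over time, but high-level universities still occupy the center of the network. Analysis by collaboration type shows that applied collaboration (university-enterprise) is the main form of collaborative innovation in the Yangtze River Delta and is constrained by geographic distance. Basic collaboration (university-university, university-research institute, etc.), by contrast, is not. These conclusions complement and enrich existing findings and offer some practical reference value.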
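The network indicators mentioned (density, node scale, tie strength) can be computed from co-patent records as in the Python sketch below. Each jointly granted patent links every pair of its assignees, and link weights are summed into node strength. Single-assignee patents add no ties, since they are not collaborations. The record format and institution labels are hypothetical. Classifying ties by collaboration type (for example, university-enterprise) would require an extra assignee-type lookup.

```python
from collections import defaultdict
from itertools import combinations

def collaboration_network(patents):
    """patents: list of assignee lists; returns {(a, b): joint patent count}."""
    edges = defaultdict(int)
    for assignees in patents:
        for a, b in combinations(sorted(set(assignees)), 2):
            edges[(a, b)] += 1
    return dict(edges)

def network_stats(edges):
    """Return (node count, density, node strength) for an undirected network."""
    nodes = {n for pair in edges for n in pair}
    n = len(nodes)
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    strength = defaultdict(int)
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return n, density, dict(strength)
```

Computing these statistics per year is one way to reproduce the kind of finding reported: density stays flat while node count and strength grow.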

16.
Answering complex questions requires inferring and synthesizing information from multiple documents, which can be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization, the stochastic, graph-based random walk method for computing the relative importance of textual units (i.e., sentences) has proved very successful. However, this method typically measures sentence similarity with TF*IDF, whose major limitation is that it only retains word frequencies. It ignores sequence, syntactic, and semantic information. This paper examines the impact of syntactic and semantic information in the graph-based random walk method for answering complex questions. First, we apply tree kernel functions to measure the similarity between sentences in the random walk framework. We then extend this work to incorporate the Extended String Subsequence Kernel (ESSK) in a similar manner. Experimental results show that using kernels to include syntactic and semantic information is effective for this task.
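The graph-based random walk (LexRank-style) ranks sentences by the stationary distribution of a walk over a sentence-similarity graph. In the Python sketch below, the similarity function is a parameter, so a kernel such as a tree kernel or ESSK could be plugged in. The default word-bigram overlap is only a placeholder. The damping factor 0.85 is a conventional choice, not a value from the paper.

```python
def bigram_overlap(s1, s2):
    """Placeholder similarity: Jaccard overlap of word bigrams."""
    def bigrams(s):
        w = s.lower().split()
        return set(zip(w, w[1:]))
    a, b = bigrams(s1), bigrams(s2)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_sentences(sentences, sim=bigram_overlap, d=0.85, tol=1e-10, max_iter=200):
    """Stationary scores of a damped random walk over the similarity graph."""
    n = len(sentences)
    w = [[sim(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Row-normalise; a sentence with no similar neighbours jumps uniformly.
    rows = [[x / sum(r) for x in r] if sum(r) > 0 else [1.0 / n] * n for r in w]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - d) / n + d * sum(rows[j][i] * p[j] for j in range(n))
               for i in range(n)]
        if sum(abs(x - y) for x, y in zip(new, p)) < tol:
            return new
        p = new
    return p
```

Swapping `sim` for a syntactic or semantic kernel changes only the edge weights. The walk itself is unchanged, which is what makes kernels a drop-in extension.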

17.
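An instance of a substation falsely reporting a "reclosing operated" signal is analyzed. Targeted solutions are proposed based on the causes of the anomaly and the actual operating conditions of the substation.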

18.
Anomalous event recognition requires an instant response to reduce the loss of human life and property; however, existing automated systems show limited performance due to considerations related to the temporal domain of the videos and ignore the significant role of spatial information. Furthermore, although current surveillance systems can detect anomalous events, they require human intervention to recognise their nature and to select appropriate countermeasures, as there are no fully automatic surveillance techniques that can simultaneously detect and interpret anomalous events. Therefore, we present a framework called Vision Transformer Anomaly Recognition (ViT-ARN) that can detect and interpret anomalies in smart city surveillance videos. The framework consists of two stages: the first involves online anomaly detection, for which a customised, lightweight, one-class deep neural network is developed to detect anomalies in a surveillance environment, while in the second stage, the detected anomaly is further classified into the corresponding class. The size of our anomaly detection model is compressed using a filter pruning strategy based on a geometric median, with the aim of easy adaptability for resource-constrained devices. Anomaly classification is based on vision transformer features and is followed by a bottleneck attention mechanism to enhance the representation. The refined features are passed to a multi-reservoir echo state network for a detailed analysis of real-world anomalies such as vandalism and road accidents. A total of 858 and 1600 videos from two datasets are used to train the proposed model, and extensive experiments on the LAD-2000 and UCF-Crime datasets comprising 290 and 400 testing videos reveal that our framework can recognise anomalies more effectively, outperforming other state-of-the-art approaches with increases in accuracy of 10.14% and 3% on the LAD-2000 and UCF-Crime datasets, respectively.
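Filter pruning based on the geometric median (FPGM) removes the filters closest to the geometric median of a layer's filters, on the grounds that the remaining filters can most easily replace them. A common approximation, used in the Python sketch below, selects the filters with the smallest total Euclidean distance to all other filters. Filters are assumed to be flattened weight vectors given as plain lists. The pruning ratio is a hypothetical parameter, since the paper's layer choices and ratios are not stated.

```python
import math

def fpgm_prune_indices(filters, prune_ratio=0.25):
    """Indices of filters to prune: those nearest the layer's geometric median.

    filters: list of flattened filter weight vectors of equal length.
    """
    n_prune = int(len(filters) * prune_ratio)
    # Total distance to all other filters; smallest means most central/redundant.
    total_dist = [sum(math.dist(f, g) for g in filters) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: total_dist[i])
    return sorted(order[:n_prune])

def prune(filters, prune_ratio=0.25):
    """Return the filters that survive pruning, in their original order."""
    drop = set(fpgm_prune_indices(filters, prune_ratio))
    return [f for i, f in enumerate(filters) if i not in drop]
```

Unlike norm-based pruning, this criterion can remove filters with large weights if they are redundant with their neighbours. That is the rationale for using the geometric median.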

19.
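Spatiotemporal Pattern Dynamics of the Expansion of the Kuluke Oasis in the Yanqi Basin   Cited: 2 (self-citations: 0; by others: 2)
Zhang Jie (张杰), Pan Xiaoling (潘晓玲). Resources Science (资源科学), 2009, 31(8): 1369-1377
Oases are the foundation of human survival and development in the arid regions of western China, and the formation, development, and evolution of oases have long been central topics in arid-zone ecological and geographical research. This study takes as a case study the Kuluke Oasis, located at the northern foot of the Kuruktag Mountains and on the southern shore of Bosten Lake in the Yanqi Basin, Xinjiang. Using four periods of remote sensing imagery from 1973 to 2005, it fully reconstructs, at the patch and landscape levels, the spatiotemporal pattern dynamics of the oasis's early formation, development, and evolution. Spatial analysis shows that the oasis's development follows a rhythmic, cyclically repeating "dispersal, expansion, and coalescence" dynamic of landscape mosaic patches. The size of each patch is determined by the diffusion rate of sub-patches, edge expansion, and the contact and merging of patch boundaries. The spatiotemporal rhythmic changes of the patch mosaic pattern give rise to a hierarchical mosaic structure and dynamic pattern of aggregated patches. The results provide a basic theoretical foundation for future predictive simulation of the spatiotemporal dynamics of oasis formation, development, and evolution in arid regions. They can also inform management planning and environmental protection.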
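Patch-level analysis of a classified image starts by identifying patches, i.e. connected groups of oasis cells. The Python sketch below labels 4-connected patches in a binary grid and reports their sizes. Comparing patch counts and sizes across dates is one simple way to observe dispersal (more, smaller patches) and coalescence (fewer, larger patches). The grid encoding and the 4-connectivity rule are assumptions, since the abstract does not list the study's landscape metrics.

```python
from collections import deque

def patch_sizes(grid):
    """Sizes (largest first) of 4-connected patches of 1-cells in a binary raster."""
    rows, cols = len(grid), len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited oasis cell.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)
```

For example, four isolated cells that later grow and merge into one ring-shaped patch would show up as a drop from four patches to one, which is the coalescence phase of the cycle.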

20.
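Xu Xinliang (徐新良), Wang Liang (王靓), Cai Hongyan (蔡红艳). Resources Science (资源科学), 2016, 38(9): 1742-1752
This study uses daily global weather-station data released by the U.S. National Climatic Data Center. With Kriging interpolation, linear trend analysis, the cumulative anomaly curve method, the Mann-Kendall significance test, and multi-scale regional statistics, it systematically describes temperature and precipitation in the major countries of the "Silk Road Economic Belt" from 1980 to 2014, covering both trends and spatial distribution. The results show that over these 35 years the study area warmed markedly at a rate of 0.4 °C per decade, with significant warming over 30.1% of the area and significant cooling over 0.03%. Most countries entered a warmer-than-normal phase at the end of the 20th century. Precipitation mainly decreased, but the decrease is significant over only 0.19% of the area, scattered across Saudi Arabia and western China. South Asia entered a wetter-than-normal phase after 1991, while most other regions entered a drier-than-normal phase after 1999. These findings can provide a scientific basis and useful reference for the countries concerned in addressing climate change under the "Belt and Road" strategy.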
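Three of the listed methods are short enough to show directly in Python:

- The linear trend: an ordinary least-squares slope, rescaled to a per-decade rate such as 0.4 °C/10a.
- The Mann-Kendall statistic with its normal approximation.
- The cumulative anomaly curve, which is used to identify warmer or drier phases.

The sketch assumes an annual series without ties (the tie correction to the variance is omitted). It uses |Z| > 1.96 as the 5% significance threshold.

```python
import math

def linear_trend_per_decade(years, values):
    """OLS slope of values against years, expressed per 10 years."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    sxx = sum((x - mx) ** 2 for x in years)
    return 10 * sxy / sxx

def mann_kendall(values):
    """Return (S, Z) for the Mann-Kendall trend test (no tie correction)."""
    n = len(values)
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    return s, z

def cumulative_anomaly(values):
    """Running sum of departures from the series mean (ends at ~0)."""
    mean = sum(values) / len(values)
    out, total = [], 0.0
    for v in values:
        total += v - mean
        out.append(total)
    return out
```

A trend is called significant when |Z| > 1.96. On the cumulative anomaly curve, a turning point from decline to rise marks the start of a warmer (or wetter) phase.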
