
A Cleaning Algorithm for Approximately Duplicated Records in Customer Relationship Database
Cite this article: GUO Wen-long. A Cleaning Algorithm for Approximately Duplicated Records in Customer Relationship Database[J]. Journal of Hengshui Normal College (衡水师专学报), 2014, 0(1): 15-17
Author: GUO Wen-long (郭文龙)
Affiliation: College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou, Fujian 350108, China
Funding: Class A Science and Technology Project of the Fujian Provincial Department of Education (JA12335)
Abstract: A customer relationship database holds a large number of customer records, many of which are approximate duplicates of one another. Detecting, cleaning, and then merging approximately duplicated records improves storage utilization and speeds up record queries. Based on a study of customer records, the paper proposes a cleaning algorithm for approximately duplicated records in customer relationship databases. The algorithm first sorts the records and sets the attribute weights and a record-similarity threshold. It then computes the similarity between adjacent records to decide whether they are approximate duplicates. Finally, the detected approximately duplicated records are cleaned and merged.

Keywords: customer relationship; approximately duplicated records; cleaning; merging
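
The abstract gives only the outline of the algorithm (sort, weight the attributes, compare adjacent records against a threshold, then merge). The following is a minimal Python sketch of that outline, not the author's implementation. The field names, the weights, the threshold of 0.85, the use of `difflib.SequenceMatcher` as the field-level similarity measure, and the "fill empty fields" merge rule are all assumptions made for illustration; the abstract specifies none of them.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (summing to 1) and similarity threshold.
# The abstract does not state concrete values.
WEIGHTS = {"name": 0.5, "phone": 0.3, "address": 0.2}
THRESHOLD = 0.85


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1].

    SequenceMatcher is a stand-in; the paper may use a different measure.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of field similarities over the weighted attributes."""
    return sum(
        w * field_similarity(r1.get(f, ""), r2.get(f, ""))
        for f, w in weights.items()
    )


def merge(r1: dict, r2: dict) -> dict:
    """Assumed merge rule: keep r1's values, fill its empty fields from r2."""
    return {f: r1.get(f) or r2.get(f, "") for f in set(r1) | set(r2)}


def clean(records, sort_key="name", weights=WEIGHTS, threshold=THRESHOLD):
    """Sort records, then merge each record into its predecessor
    when their weighted similarity reaches the threshold."""
    ordered = sorted(records, key=lambda r: r.get(sort_key, "").strip().lower())
    cleaned = []
    for rec in ordered:
        if cleaned and record_similarity(cleaned[-1], rec, weights) >= threshold:
            cleaned[-1] = merge(cleaned[-1], rec)
        else:
            cleaned.append(dict(rec))
    return cleaned
```

Because only adjacent records are compared after sorting, the result depends on the choice of sort key: duplicates whose key values differ near the start of the string may not end up next to each other and would then be missed by this sketch.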
This article is indexed in VIP (维普) and other databases.