Article Abstract
Research on a multi-source heterogeneous data cleaning algorithm based on an optimized deep belief network
Submitted: 2021-07-14  Revised: 2021-07-14
DOI:
Keywords: multi-source heterogeneous data  Hadoop framework  Manhattan distance  deep belief network  attribute reduction
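Funding: Project No. 2020kcszyjxm035; project title: Exploration and Research on Approaches to Integrating Ideological and Political Education into the "Software Engineering and Project Management" Course; project category: Curriculum Ideological and Political Education Construction Research Project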
Author  Affiliation  Postal code
Cheng Dayong (程大勇)  Anhui Industrial Vocational and Technical College (安徽工业职业技术学院)  244000
Abstract:
      To address the multi-source, heterogeneous nature of massive industrial big data, this paper proposes a big data cleaning method based on an optimized deep belief network (DBN) under the Hadoop framework. Manhattan distance is used to describe the relationships among multi-source heterogeneous data in high-dimensional space. A DBN model is then constructed, the sample data are trained on using the restricted Boltzmann machine (RBM) structure in the hidden layers, and the neural network model is optimized using the joint probability defined by the energy function. This achieves attribute reduction and removes redundant, erroneous, and incomplete interfering data. Experimental results show that the proposed cleaning algorithm is more efficient and more stable, and that its precision, recall, and string-matching accuracy are all better than those of existing cleaning methods.
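The abstract names two building blocks without giving their formulas: Manhattan distance between records and the energy function and joint probability of an RBM. The sketch below illustrates only the textbook definitions of both in plain Python. It is not the paper's implementation; the function names, the toy parameters, and the brute-force partition function are assumptions made for illustration, and the paper's optimization and Hadoop steps are not shown.

```python
import itertools
import math


def manhattan(x, y):
    """L1 (Manhattan) distance between two equal-length numeric records."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(x, y))


def rbm_energy(v, h, W, a, b):
    """Energy of a binary RBM: E(v, h) = -a.v - b.h - v^T W h.

    v, h: visible and hidden states (0/1 values)
    W:    weights, W[i][j] links visible unit i to hidden unit j
    a, b: visible and hidden biases
    """
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)


def rbm_joint(v, h, W, a, b):
    """Joint probability p(v, h) = exp(-E(v, h)) / Z.

    Z is computed by enumerating every (v, h) state, which is only
    feasible for toy-sized models such as the one used here.
    """
    states_v = list(itertools.product([0, 1], repeat=len(a)))
    states_h = list(itertools.product([0, 1], repeat=len(b)))
    Z = sum(math.exp(-rbm_energy(sv, sh, W, a, b))
            for sv in states_v for sh in states_h)
    return math.exp(-rbm_energy(v, h, W, a, b)) / Z


# Toy (hypothetical) parameters: 2 visible units, 1 hidden unit.
W = [[0.5], [-0.2]]
a = [0.1, 0.2]
b = [0.3]

distance = manhattan([1, 2, 3], [4, 0, 3])
energy = rbm_energy([1, 0], [1], W, a, b)
probability = rbm_joint([1, 0], [1], W, a, b)
```

In a cleaning pipeline, a small Manhattan distance between two records could flag them as near-duplicates, while lower RBM energy corresponds to configurations the model considers more probable. How the paper combines the two is not stated in the abstract.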