Journal Articles
955 articles found
A New Algorithm for Mining Frequent Pattern (cited by 2)
1
Authors: 李力, 靳蕃. Journal of Southwest Jiaotong University (English Edition), 2002, No. 1, pp. 10-20 (11 pages).
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm, FP-growth, that can generate frequent patterns without candidate sets. Based on an analysis of FP-growth, this paper proposes the concept of an equivalent FP-tree and an improved algorithm, denoted FP-growth*, which is much faster and easy to implement. FP-growth* adopts a modified FP-tree and header table structure; in each recursive operation it generates only a header table and projects the tree onto the original FP-tree. The two algorithms produce the same frequent pattern set on the same transaction database, but performance studies show that the improved algorithm, FP-growth*, is at least twice as fast as FP-growth.
Keywords: data mining; algorithm; frequent pattern set; FP growth
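Entry 1 improves on Han's FP-growth. For orientation, the baseline it modifies can be sketched in Python as below. This is plain FP-growth, not the authors' FP-growth*, whose equivalent-FP-tree construction is not described in enough detail here to reproduce. The names `Node`, `build_tree`, and `fp_growth` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, a parent link, and a child map."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    support = {i: c for i, c in support.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by item name).
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, support


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    header, support = build_tree(transactions, min_support)
    patterns = {}
    for item, count in support.items():
        pattern = suffix | {item}
        patterns[pattern] = count
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional pattern base with the grown suffix.
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns


if __name__ == "__main__":
    db = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
    for itemset, sup in fp_growth([(t, 1) for t in db], 2).items():
        print(sorted(itemset), sup)
```

Each plain transaction is wrapped as `(items, 1)` so that the same function can recurse on weighted conditional pattern bases without a separate code path.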
Quantum Algorithm for Mining Frequent Patterns for Association Rule Mining (cited by 1)
2
Authors: Abdirahman Alasow, Marek Perkowski. Journal of Quantum Information Science [CAS], 2023, No. 1, pp. 1-23 (23 pages).
Maximum frequent pattern generation from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures between items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check the count against a minimum support threshold. The derived algorithm increases the rate of correct solutions because the search is confined to a subspace. Furthermore, our algorithm scales well and optimizes the number of qubits required by the design, which directly improves performance. The proposed design can accommodate more transactions and items while still performing well with a small number of qubits.
Keywords: data mining; association rule mining; frequent pattern; Apriori algorithm; quantum counter; quantum comparator; Grover's search algorithm
Mining φ-Frequent Itemset Using FP-Tree
3
Author: 李天瑞. Journal of Modern Transportation, 2001, No. 1, pp. 67-74 (8 pages).
The problem of association rule mining has gained considerable prominence in the data mining community as an important tool for knowledge discovery from large-scale databases, and there has been a spurt of research activity around it. However, traditional association rule mining often derives many rules that users are not interested in. This paper reports a generalization of association rule mining called φ association rule mining. It allows users to have different interests in different itemsets, as real applications require. It also helps derive interesting rules and substantially reduces the number of rules. An algorithm based on the FP-tree for mining φ frequent itemsets is presented. Experiments show that the proposed method is efficient and scalable over large databases.
Keywords: data processing; databases; φ association rule mining; φ frequent itemset; FP tree; data mining
Backward Support Computation Method for Positive and Negative Frequent Itemset Mining
4
Authors: Mrinmoy Biswas Akash, Indrani Mandal, Md. Selim Al Mamun. Journal of Data Analysis and Information Processing, 2023, No. 1, pp. 37-48 (12 pages).
Association rule mining is a major data mining field that leads to the discovery of associations and correlations among items in today's big data environment. Conventional association rule mining focuses mainly on positive itemsets generated from frequently occurring itemsets (PFIS). However, significant research has also focused on infrequent itemsets, using negative association rules to mine interesting negative frequent itemsets (NFIS) from transactions. In this work, we propose an efficient backward-calculating negative frequent itemset algorithm, EBC-NFIS, for computing backward supports; it extracts both positive and negative frequent itemsets simultaneously from a dataset. EBC-NFIS is based on the popular e-NFIS algorithm, which computes the supports of negative itemsets from the supports of positive itemsets. The proposed algorithm reuses previously computed supports held in memory to minimize computation time. In addition, positive and negative association rules (PNARs) are generated from the frequent itemsets discovered by EBC-NFIS. The efficiency of the proposed algorithm is verified by several experiments comparing its results with those of e-NFIS. The experimental results confirm that the proposed algorithm successfully discovers NFIS and PNARs and runs significantly faster than the conventional e-NFIS algorithm.
Keywords: data mining; positive frequent itemset; negative frequent itemset; association rule; backward support
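Entry 4 derives negative-itemset supports from positive ones. One standard way to do that is inclusion-exclusion: with a single negated item, sup(A and not b) = sup(A) - sup(A with b). The sketch below assumes that identity and uses a memoized support function to stand in for "previously computed supports from memory"; it is not the EBC-NFIS algorithm itself, and `make_support` and `negative_support` are hypothetical names.

```python
from functools import lru_cache
from itertools import combinations


def make_support(db):
    """Return a memoized function counting transactions that contain an itemset."""
    transactions = [frozenset(t) for t in db]

    @lru_cache(maxsize=None)  # supports already computed are reused, not recounted
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    return support


def negative_support(support, positive, negated):
    """Count transactions containing all of `positive` and none of `negated`.

    Inclusion-exclusion over subsets S of the negated items:
        sum over S of (-1)^|S| * sup(positive plus S)
    With one negated item b this is sup(A) - sup(A plus b).
    """
    positive = frozenset(positive)
    negated = sorted(set(negated))
    total = 0
    for k in range(len(negated) + 1):
        for subset in combinations(negated, k):
            total += (-1) ** k * support(positive | frozenset(subset))
    return total


if __name__ == "__main__":
    db = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}]
    sup = make_support(db)
    print(negative_support(sup, {"a"}, {"b"}))  # transactions with a but not b
```

The cost grows as 2^k in the number of negated items, which is why reusing cached positive supports matters.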
FPGA-Based Stream Processing for Frequent Itemset Mining with Incremental Multiple Hashes
5
Authors: Kasho Yamamoto, Masayuki Ikebe, Tetsuya Asai, Masato Motomura. Circuits and Systems, 2016, No. 10, pp. 3299-3309 (11 pages).
With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, which extracts useful knowledge from huge amounts of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and then improved its performance using an FPGA. FIsM algorithms are important, basic data-mining techniques used to discover association rules from transactional databases. We improved on a recently proposed approximate FIsM algorithm so that it would fit efficiently onto a hardware architecture. We then ran experiments on an FPGA. As a result, we achieved a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20-fold speed improvement compared to the CPU version.
Keywords: data mining; frequent itemset mining; FPGA; stream processing
A Fast Distributed Algorithm for Association Rule Mining Based on Binary Coding Mapping Relation
6
Authors: CHEN Geng, NI Wei-wei, ZHU Yu-quan, SUN Zhi-hui. Wuhan University Journal of Natural Sciences [EI, CAS], 2006, No. 1, pp. 27-30 (4 pages).
Association rule mining is an important issue in data mining. This paper proposes a binary-system-based method that efficiently generates candidate frequent itemsets and their support counts using only operations such as "and", "or", and "xor". Applying this idea to the existing distributed association rule mining algorithm FDM yields the improved algorithm BFDM. Theoretical analysis and experiments show that BFDM is effective and efficient.
Keywords: frequent itemsets; distributed association rule mining; relation of itemsets-binary data
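Entry 6 generates candidates and support counts with only "and", "or", and "xor". A minimal sketch of how those three operations can cover both steps, assuming each transaction is encoded as an integer bitmask over a fixed item order: AND tests containment, XOR detects two k-itemsets that differ by exactly one item, and OR forms their union. The encoding and function names are illustrative assumptions, not BFDM's published data layout.

```python
def encode(db, items):
    """Encode each transaction as an int whose bit k is set when items[k] occurs."""
    index = {item: k for k, item in enumerate(items)}
    return [sum(1 << index[i] for i in set(t)) for t in db]


def support_count(mask, encoded):
    """Count transactions containing every item of `mask` (bitwise AND test)."""
    return sum(1 for t in encoded if (t & mask) == mask)


def join(a, b):
    """Join two k-itemset masks sharing k-1 items into a (k+1)-itemset mask.

    XOR leaves exactly two bits set when the masks differ by one item each;
    OR then gives the union. Returns None when the masks cannot be joined.
    """
    if bin(a ^ b).count("1") == 2:
        return a | b
    return None


def decode(mask, items):
    """Turn a bitmask back into a set of item names."""
    return {item for k, item in enumerate(items) if (mask >> k) & 1}


if __name__ == "__main__":
    items = ["a", "b", "c"]
    enc = encode([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], items)
    ab, ac = 0b011, 0b101
    cand = join(ab, ac)
    print(decode(cand, items), support_count(cand, enc))
```

The join assumes both inputs have the same number of items, as they do within one level of a level-wise search.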
Fast FP-Growth for association rule mining (cited by 1)
7
Authors: 杨明, 杨萍, 吉根林, 孙志挥. Journal of Southeast University (English Edition) [EI, CAS], 2003, No. 4, pp. 320-323 (4 pages).
In this paper, we propose an efficient algorithm, called FFP-Growth (short for fast FP-Growth), to mine frequent itemsets. Like FP-Growth, FFP-Growth searches the FP-tree in bottom-up order, but it need not construct conditional pattern bases and sub-FP-trees, thus saving a substantial amount of time and space; moreover, the FP-tree it creates is much smaller than that created by TD-FP-Growth, which further improves efficiency. At the same time, FFP-Growth can easily be extended to reduce the search space, as TD-FP-Growth(M) and TD-FP-Growth(C) do. Experimental results show that the algorithm is effective and efficient.
Keywords: data mining; frequent itemsets; association rules; frequent pattern tree (FP-tree)
An Advanced Exergy Analysis Model Based on District Heating Schemes
8
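Authors: 柯一鸿, 林艺龙, 林周勇, 易秉恒, 王炯铭. Electronic Design Engineering (电子设计工程), 2026, No. 2, pp. 192-196 (5 pages).
To evaluate the performance and efficiency of combined heat and power (CHP) systems in district heating, an advanced exergy analysis model based on district heating schemes is constructed. Using the ordinary inter-class distance criterion and the weighted inter-class distance criterion, the conditional pattern bases of the frequent itemsets used for heating potential assessment are mined, and the construction and mining of the conditional fg-tree are treated as a recursive process. Taking the exergy efficiency index and the energy level balance coefficient as metrics, the performance of the CHP-based advanced exergy analysis model is analyzed, enabling thermal energy assessment of the CHP system. Experimental results show that under normal production conditions, the model yields an exergy efficiency of 0.65 and an energy level balance coefficient of 0.82, both better than those of the comparison methods, with higher assessment efficiency.
Keywords: big data frequent itemset mining algorithm; combined heat and power; exergy efficiency; district heating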
Talent Demand Analysis Based on the Apriori Algorithm
9
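Author: 郑翊. Shanxi Electronic Technology (山西电子技术), 2026, No. 1, pp. 98-100 (3 pages).
The rapid development of big data and artificial intelligence technologies has steadily widened the range of software engineering applications, which has strongly driven development in many fields and produced good results there. How to use big-data-era technologies to extract and analyze the talent skill model for software engineering majors is a current research focus in universities. To this end, a TF-IDF-based sampling method is used to extract job keywords from a text dataset, and the classic Apriori algorithm for simple association analysis is then used for skill analysis. The employment skill requirements of software engineering majors are analyzed from multiple perspectives and dimensions. The resulting job skill requirements of the software engineering talent market give an accurate picture of the specific requirements companies place on the talent they hire, providing scientific decision support for designing talent training programs that match enterprise needs.
Keywords: big data; software engineering; text mining technology; Apriori algorithm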
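Entry 9 applies the classic Apriori algorithm to job-posting keywords. A minimal level-wise sketch, assuming each posting has already been reduced to a set of skill keywords (the TF-IDF extraction step is not shown) and that `min_support` is an absolute count; the skill names in the usage example are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all itemsets with count >= min_support."""
    sets = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in sets if candidate <= t)

    items = {i for t in sets for i in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:
        result.update({c: count(c) for c in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_support}
    return result


if __name__ == "__main__":
    postings = [{"Java", "SQL"}, {"Java", "Spring"},
                {"Java", "SQL", "Spring"}, {"Python", "SQL"}]
    for itemset, n in sorted(apriori(postings, 2).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), n)
```

The prune step is what distinguishes Apriori from brute-force enumeration: in the usage data, {Java, SQL, Spring} is discarded without a database scan because {SQL, Spring} is not frequent.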
Research on Data Mining of Chemical Analysis and Testing Data in the Context of Big Data
10
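Author: 陈邦富. Adhesion (粘接), 2026, No. 3, pp. 795-798 (4 pages).
Chemical analysis and testing data are large in volume, generated quickly, diverse in type, and low in value density, and traditional statistical processing methods can no longer handle them. To address these challenges, a framework for mining chemical analysis and testing data in the context of big data is proposed and constructed. First, the overall framework is built through architecture design; next, the core algorithm of the framework's data mining layer, a BP neural network, is improved and optimized; finally, the effectiveness and practicality of the framework are verified through experiments and tests. The test results show that the improved BP neural network model achieves the lowest MSE in testing and has high prediction accuracy, generalization ability, and stability, so it can serve as the core algorithm of the framework. The framework can also discover latent patterns in massive chemical testing data and optimize chemical material formulations, providing strong support for the intelligent upgrading of the chemical industry.
Keywords: big data; data mining; chemical analysis testing data; BP neural network; particle swarm optimization algorithm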
An efficient and resilience linear prefix approach for mining maximal frequent itemset using clustering
11
Authors: M. Sinthuja, S. Pravinthraja, B K Dhanalakshmi, H L Gururaj, Vinayakumar Ravi, G Jyothish Lal. Journal of Safety Science and Resilience, 2025, No. 1, pp. 93-104 (12 pages).
The enormous volumes of data generated every day require new technologies capable of handling massive amounts of data efficiently. This is the case with association rules, an unsupervised data mining tool that extracts information in the form of IF-THEN patterns. Although various approaches have been presented for extracting frequent itemsets (the step prior to mining association rules) from extremely large databases, high computational cost and memory shortage remain key issues when processing enormous data. The objective of this research is to discover frequent itemsets by using clustering for preprocessing and adopting the linear prefix tree algorithm to mine maximal frequent itemsets. The performance of the proposed CL-LP-MAX-tree was evaluated against the existing FP-max algorithm. Experiments on three standard datasets provide evidence that the proposed CL-LP-MAX-tree algorithm outperforms FP-max in terms of runtime and memory consumption.
Keywords: clustering; data mining; frequent itemset mining; linear prefix tree; maximal frequent itemset mining
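Entry 11 targets maximal frequent itemsets: frequent itemsets with no frequent proper superset. The definition itself fits in a few lines; the sketch below filters an already-mined collection and says nothing about the clustering or linear-prefix-tree machinery of CL-LP-MAX-tree. `maximal_itemsets` is a hypothetical name.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    sets = {frozenset(s) for s in frequent}
    # s < t on frozensets means "s is a proper subset of t".
    return {s for s in sets if not any(s < t for t in sets)}


if __name__ == "__main__":
    frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(maximal_itemsets(frequent))
```

This quadratic filter is only for illustration; dedicated miners such as FP-max avoid materializing all frequent itemsets first.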
Parallel Incremental Frequent Itemset Mining for Large Data (cited by 5)
12
Authors: Yu-Geng Song, Hui-Min Cui, Xiao-Bing Feng. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2017, No. 2, pp. 368-385 (18 pages).
Frequent itemset mining (FIM) is a popular data mining problem adopted in many fields, such as commodity recommendation in the retail industry, log analysis in web search, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to handle incremental database updates. However, most of these incremental algorithms have low parallelism, which makes them inefficient on huge databases. This paper presents two parallel incremental FIM algorithms, IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and reuses them for incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncBuildingPFP preserves the CanTrees built in the original pass and then adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP achieves significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) in most cases of incremental input databases, and IncBuildingPFP achieves it in the remaining cases.
Keywords: incremental; parallel; FPGrowth; data mining; frequent itemset mining; MapReduce
Research and Application on Web Information Retrieval Based on Improved FP-Growth Algorithm (cited by 3)
13
Authors: JIAO Minghai, YAN Ping, JIANG Huiyan. Wuhan University Journal of Natural Sciences [CAS], 2006, No. 5, pp. 1065-1068 (4 pages).
A kind of singly linked list named the aggregative chain is introduced into the algorithm, improving the architecture of the FP-tree. The new FP-tree is a one-way tree in which each node keeps only the pointer to its parent. Route information of different nodes of the same item is compressed into aggregative chains, so that frequent patterns are produced from the aggregative chains without generating node links or conditional pattern bases. An example of Web keyword retrieval is given to analyze and verify the frequent pattern algorithm in this paper.
Keywords: data mining; chains; FP-growth algorithm; frequent pattern; aggregative; information retrieval
Classification of Power Load Curves Based on ISODATA (cited by 9)
14
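Authors: 李仲恒, 刘蓉晖. Journal of Shanghai University of Electric Power (上海电力学院学报) [CAS], 2019, No. 4, pp. 327-332 (6 pages).
The iterative self-organizing data analysis technique algorithm (ISODATA) is an unsupervised, dynamic clustering algorithm based on statistical pattern recognition. To address problems of existing algorithms, such as the difficulty of choosing the initial number of clusters and the tendency to fall into local optima, this paper introduces the principle and implementation steps of ISODATA and applies it to load classification. Cluster analysis of specific daily load curve samples in MATLAB shows good clustering results. ISODATA is compared experimentally with various traditional clustering methods in terms of clustering quality, the effect of the preset number of clusters on the results, and the effect of the choice of initial cluster centers. The comparison shows that the method is suitable for load classification research.
Keywords: ISODATA; clustering; daily load curve; curve recognition; big data; data mining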
Fast Discovering Frequent Patterns for Incremental XML Queries
15
Authors: PENG Dun-lu, QIU Yang. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 5, pp. 638-646 (9 pages).
It is nontrivial to maintain discovered frequent query patterns in a real XML-DBMS, because the transaction database of queries may be updated frequently, and such updates may not only invalidate some existing frequent query patterns but also generate new ones. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficiently maintaining discovered frequent query patterns and generating new frequent query patterns when new XML queries are added to the database. Experimental results from our implementation show that the proposed algorithms perform well. CLC number: TP 311. Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology. Biography: PENG Dun-lu (1974-), male, associate professor, Ph.D.; research directions: data mining, Web services and their applications, peer-to-peer computing.
Keywords: XML; frequent query pattern; incremental algorithm; data mining
Unstructured Big Data Threat Intelligence Parallel Mining Algorithm
16
Authors: Zhihua Li, Xinye Yu, Tao Wei, Junhao Qian. Big Data Mining and Analytics [EI, CSCD], 2024, No. 2, pp. 531-546 (16 pages).
To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Next, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. To address the low execution efficiency and poor scalability of the sequential deep forest algorithm, PDFMLC employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly improving its speedup ratio. Furthermore, PDFMLC incorporates label mutual information from the established dataset as input features, capturing latent label associations and significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that PDFMLC exhibits excellent node scalability and execution efficiency, and that PDFMLC-TIM effectively classifies the text of cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence. As a result, threat intelligence in the STIX 2.1 format is successfully produced.
Keywords: unstructured big data mining; parallel deep forest; multi-label classification algorithm; threat intelligence
A Frequent Itemset Mining Algorithm Based on Frequent Pattern Trees and Deep Learning (cited by 1)
17
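Authors: 李洋, 李华. Journal of Heilongjiang University of Technology (Comprehensive Edition) (黑龙江工业学院学报(综合版)), 2025, No. 1, pp. 94-98 (5 pages).
With the sharp growth of data volumes, mining valuable information from massive data has become especially important. Frequent itemset mining, a key area of data mining, aims to identify itemsets that occur frequently in a dataset; these itemsets can reveal intrinsic relationships in the data and provide a basis for subsequent advanced analysis. However, traditional frequent itemset mining algorithms face accuracy and efficiency challenges when processing large-scale datasets. To address these problems, this study proposes a novel frequent itemset mining algorithm based on frequent pattern trees and deep learning. The algorithm first uses a deep belief network to extract high-level features of the data, and then builds a frequent pattern tree from these features to mine frequent itemsets efficiently. Experimental results show that the algorithm performs well in both recall and precision, reaching a recall of 97.56% and a precision of 95.49%, demonstrating high accuracy and broad applicability in practice.
Keywords: frequent pattern tree; deep learning; frequent itemset; data mining; mining algorithm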
An Anomaly Mining Method for Power Supply Marketing Big Data Based on the Isolation Forest Algorithm (cited by 1)
18
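Authors: 于亮, 钟宏伟, 冯祎辰, 肖莞, 张硕. Microcomputer Applications (微型电脑应用), 2025, No. 2, pp. 51-54 (4 pages).
In current anomaly mining methods for power supply marketing big data, the anomalous-data detection step is relatively weak, which leads to a relatively low anomaly mining rate and a high false detection rate. To address this, an anomaly mining method for power supply marketing big data based on the isolation forest algorithm is proposed. An isolation forest model is built to examine the raw data, and the binary search tree (BST) method is borrowed to determine data distances within the model, yielding the anomalous data. The discrete wavelet transform is then used to extract features of the anomalous data. The traditional k-means algorithm is optimized and combined with these features to build data clusters, enabling mining and analysis of the anomalous data. Application tests show that the proposed method effectively raises the anomaly mining rate for power supply marketing big data and further lowers the false detection rate, offering a new direction for power supply marketing big data analysis.
Keywords: isolation forest algorithm; power supply marketing; big data platform; anomalous data mining; number of clusters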
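Entry 18 starts from an isolation forest and mentions borrowing the binary search tree method; in the standard isolation forest (Liu, Ting, and Zhou), the BST link is the normalizing constant c(n), the average path length of an unsuccessful BST search over n points. The sketch below shows only that standard detector in pure Python; the paper's wavelet features and optimized k-means stage are not reproduced, and all function names and defaults (100 trees, subsample size 256) are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands, plus c(size) for the unresolved points in its leaf."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score s = 2^(-E[h(x)] / c(m)) for each point; values near 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m))
    trees = [build_itree(rng.sample(data, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path(m)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in data]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + [(10.0, 10.0)]
    scores = isolation_scores(data)
    print(max(range(len(data)), key=scores.__getitem__))  # index of top-scoring point
```

Points far from the bulk are separated by few random splits, so their average path is short and their score approaches 1.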
An Intelligent Power Load Sensing System Based on Big Data Deep Mining Technology (cited by 1)
19
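Authors: 马国真, 王云佳, 夏静, 彭寒, 邵华. Electronic Design Engineering (电子设计工程), 2025, No. 14, pp. 171-175 (5 pages).
To improve the ability of power systems to meet power load demand, an intelligent power load sensing system based on big data deep mining technology is designed. The system aggregates and temporarily stores power load data through a data connection module. A data mining module analyzes the stored data to obtain the optimal clustering, and power load is forecast from the attribute data of the forecast period. In addition, a feature weighting algorithm removes redundant items from the power load risk features to construct an optimal risk feature subset, and a support vector machine is used to build a risk early-warning model. The forecast results and the optimal risk feature subset are fed into the early-warning model to issue load warnings. Experimental results show that the system forecasts power load accurately, that the Ka value of its power load risk warnings exceeds 0.9, and that it significantly improves the power system's ability to meet load demand.
Keywords: big data; deep mining; clustering algorithm; load forecasting; feature selection; risk early warning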
A Parallel Enumeration-Space Mining Algorithm for Closed High-Utility Itemsets
20
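Authors: 李成严, 孙安祺, 刘松霖. Journal of Harbin University of Science and Technology (哈尔滨理工大学学报) [PKU Core], 2025, No. 6, pp. 29-42 (14 pages).
To address redundant results and time overhead in high-utility itemset mining in big data environments, a parallel algorithm for mining closed high-utility itemsets, SpCHUIM (closed high utility itemsets mining on Spark), is proposed. Concepts defined for closed itemsets, such as the suffix set, are applied to high-utility itemset mining, which condenses the results and reduces memory usage; combined with the closure property of high-utility itemsets, this simplifies the weighted utility computation. A prefix partitioning strategy reduces intersection operations and lowers time cost. When constructing supersets of itemsets, the algorithm builds the enumeration space by depth-first search to guarantee the completeness of the generated itemsets. The parallel algorithm is implemented on the Spark framework to mine closed itemsets in big data environments, and mining experiments are conducted on datasets such as mushroom. Compared with other work in the literature, the algorithm's runtime efficiency improves by 50%. An ablation experiment on the prefix partitioning strategy on dense datasets shows that removing the strategy increases running time by 30%.
Keywords: high-utility itemset; big data; closed itemset; parallel computing; data mining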
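Entry 20 combines high utility with closedness. The quantities involved can be pinned down with the standard definitions: the utility u(X) sums quantity times unit profit of X's items over the transactions containing X; the transaction-weighted utility (TWU) sums whole-transaction utilities over those transactions and upper-bounds the utility of X and all its supersets (presumably the "weighted utility" the abstract refers to); and X is closed when the intersection of its covering transactions is X itself. This is a definitional sketch only: suffix sets, prefix partitioning, and the Spark parallelization of SpCHUIM are not shown, and transactions are assumed to be `{item: quantity}` dicts.

```python
def transaction_utility(t, profit):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(q * profit[i] for i, q in t.items())


def covering(itemset, db):
    """Transactions that contain every item of `itemset`."""
    return [t for t in db if all(i in t for i in itemset)]


def utility(itemset, db, profit):
    """u(X): over transactions containing X, sum the utilities of X's own items."""
    return sum(sum(t[i] * profit[i] for i in itemset) for t in covering(itemset, db))


def twu(itemset, db, profit):
    """Transaction-weighted utility: an upper bound on u(Y) for every Y containing X."""
    return sum(transaction_utility(t, profit) for t in covering(itemset, db))


def is_closed(itemset, db):
    """X is closed if no proper superset has the same support, i.e. the
    intersection of all transactions containing X equals X."""
    cover = covering(itemset, db)
    if not cover:
        return False
    common = set(cover[0])
    for t in cover[1:]:
        common &= set(t)
    return common == set(itemset)


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": 1}
    db = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"a": 1, "b": 1, "c": 1}]
    print(utility({"a", "b"}, db, profit), twu({"a", "b"}, db, profit))
```

TWU matters for pruning: if twu(X) falls below the utility threshold, no superset of X can be high-utility, so the whole branch of the enumeration space can be skipped.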