Currently, the top-rank-k approach has been widely applied to mine frequent patterns whose rank does not exceed k. In existing algorithms, although a level-wise search can fully mine the target patterns, it usually delays the generation of high-rank patterns, so the support threshold grows slowly and mining efficiency suffers. To address this problem, this paper proposes a greedy-strategy-based top-rank-k frequent pattern hybrid mining algorithm (GTK). In this algorithm, top-rank-k patterns are stored in a static doubly linked list called RSL, and the patterns are divided into short patterns and long patterns. Short patterns are generated by a rank-first search that always joins the two highest-ranked patterns in RSL that have not yet been joined. Starting from the short patterns that satisfy specific conditions, the long patterns are extracted through a level-wise search. To reduce redundancy, GTK improves the generation method of the subsume index and designs new pruning strategies for candidates. The algorithm also uses reasonable pruning strategies to reduce the amount of computation and improve computational speed. Real and synthetic datasets are used in experiments to evaluate the proposed algorithm. The experimental results show that GTK has clear advantages in both time efficiency and space efficiency.
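To make the notion of rank concrete, here is a brute-force sketch in Python. It assumes the definition common in the top-rank-k literature, where a pattern's rank is the number of distinct support values greater than or equal to its own support, so the top-rank-k patterns are exactly those whose support reaches the k-th largest distinct support. It enumerates every itemset and is not GTK's greedy hybrid search; the function names are hypothetical.

```python
from itertools import combinations


def support_counts(transactions):
    """Brute force: count the support of every itemset occurring in some transaction."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return counts


def top_rank_k(transactions, k):
    """Return {pattern: support} for all patterns whose rank is at most k.

    Assumed definition: rank(X) = number of distinct support values >= support(X).
    """
    counts = support_counts(transactions)
    distinct = sorted(set(counts.values()), reverse=True)
    if not distinct or k < 1:
        return {}
    # The k-th largest distinct support acts as the effective support threshold.
    threshold = distinct[min(k, len(distinct)) - 1]
    return {p: s for p, s in counts.items() if s >= threshold}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
print(top_rank_k(transactions, 1))  # {('a',): 3, ('b',): 3}
```

With k = 2 the threshold drops to the second distinct support value (2), which adds ('c',), ('a', 'b') and ('a', 'c'). The effective threshold therefore depends on which high-support patterns have already been found, which is why generating high-rank patterns early matters.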
Because a data warehouse changes frequently, incremental data can make previously mined knowledge invalid. To maintain the discovered knowledge and patterns dynamically, this study presents a novel algorithm, IPARUC, for updating global frequent patterns. IPARUC first introduces a rapid clustering method to divide the database into n parts, so that the data within each part are similar. Then, during insertion, the nodes in the tree are adjusted dynamically by "pruning and laying back" to keep them in descending frequency order, so that node sharing approaches the optimum. Finally, the local frequent itemsets mined from each local dataset are merged into global frequent itemsets. The experimental results are very encouraging and show that IPARUC is more effective and efficient than the two comparison methods. Furthermore, the algorithm has significant application potential, as shown by a prototype Web log analyzer for web usage mining that can help discover useful knowledge effectively and even support managers' decision making.
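The final merge step relies on a standard property of partitioned mining: with a relative support threshold, any globally frequent itemset must be locally frequent in at least one partition, so the union of the local results is a complete candidate set that needs only one global recount. The Python sketch below illustrates this, assuming plain list partitions and brute-force local mining; it does not reproduce IPARUC's clustering or its tree adjustment, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_ratio):
    """Brute force: itemsets (sorted tuples) that are frequent within one partition."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    need = min_ratio * len(partition)
    return {s for s, c in counts.items() if c >= need}


def merge_global(partitions, min_ratio):
    """Union the local results, then recount each candidate over all partitions."""
    candidates = set().union(*(local_frequent(p, min_ratio) for p in partitions))
    total = sum(len(p) for p in partitions)
    result = {}
    for cand in candidates:
        items = set(cand)
        count = sum(1 for p in partitions for t in p if items <= set(t))
        if count >= min_ratio * total:
            result[cand] = count
    return result


partitions = [[["a", "b"], ["a"]], [["b", "c"], ["c"], ["a", "c"]]]
print(sorted(merge_global(partitions, 0.5).items()))
```

Here {b} is locally frequent in the first partition (1 of 2 transactions) but occurs in only 2 of 5 transactions overall, so the global recount removes it and the result is [(('a',), 3), (('c',), 3)].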
Because mining the complete set of frequent patterns from a dense database can be impractical, an interesting alternative has been proposed recently: instead of mining the complete set of frequent patterns, the new model finds only the maximal frequent patterns, from which all frequent patterns can be generated. The FP-growth algorithm is one of the most efficient frequent pattern mining methods published so far. However, because the FP-tree and conditional FP-trees must be traversable in both directions, mining requires a great deal of memory. This paper proposes an efficient algorithm, Unid_FP-Max, for mining maximal frequent patterns based on a unidirectional FP-tree. Owing to the way the unidirectional FP-tree and conditional unidirectional FP-trees are generated, the algorithm reduces space consumption to the fullest extent. With two additional techniques, single-path pruning and header-table pruning, which eliminate many of the conditional unidirectional FP-trees otherwise generated recursively during mining, Unid_FP-Max further lowers time and space costs.
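The relationship between maximal and ordinary frequent patterns can be checked directly. This Python sketch (brute force, hypothetical function names, not Unid_FP-Max's unidirectional FP-tree) keeps the frequent itemsets that have no frequent proper superset, then regenerates every frequent itemset as a non-empty subset of some maximal one, which works because every subset of a frequent itemset is frequent. The supports themselves are not recoverable from the maximal patterns alone.

```python
from itertools import combinations


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    frequent = {frozenset(s) for s in frequent}
    return {s for s in frequent if not any(s < other for other in frequent)}


def expand(maximal):
    """Regenerate all frequent itemsets: every non-empty subset of a maximal one."""
    result = set()
    for m in maximal:
        items = sorted(m)
        for size in range(1, len(items) + 1):
            result.update(frozenset(c) for c in combinations(items, size))
    return result


frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]
maximal = maximal_itemsets(frequent)
print(sorted(sorted(m) for m in maximal))  # [['a', 'b'], ['a', 'c']]
```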
Generating maximum frequent patterns from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures among items hidden in a large database. By exploiting quantum computing, we propose an efficient quantum search algorithm design to discover the maximum frequent patterns. We modified Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check the count against a minimum support threshold. The derived algorithm increases the rate of correct solutions because the search is confined to a subspace. Furthermore, our algorithm scales well and significantly optimizes the number of qubits required by the design, which directly improves performance. The proposed design can accommodate more transactions and items while still performing well with a small number of qubits.
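The benefit of searching a subspace can be quantified with the standard Grover analysis: with M marked states among N, let sin θ = √(M/N); after k iterations the probability of measuring a marked state is sin²((2k+1)θ), and k = ⌊π/(4θ)⌋ is a standard choice of iteration count. The Python sketch below evaluates this textbook formula for a full space and a smaller, hypothetical subspace; it does not model the paper's oracle, quantum counter, or comparator circuits.

```python
import math


def grover_iterations(n_states, n_marked):
    """Standard choice k = floor(pi / (4 * theta)) with sin(theta) = sqrt(M / N)."""
    theta = math.asin(math.sqrt(n_marked / n_states))
    return math.floor(math.pi / (4 * theta))


def success_probability(n_states, n_marked, k):
    """Probability of measuring a marked state after k Grover iterations."""
    theta = math.asin(math.sqrt(n_marked / n_states))
    return math.sin((2 * k + 1) * theta) ** 2


# Hypothetical sizes: one marked state in a 1024-state space vs. a 64-state subspace.
for n in (1024, 64):
    k = grover_iterations(n, 1)
    print(n, k, round(success_probability(n, 1, k), 3))
```

For one marked state this gives 25 iterations over 1024 states versus 6 over 64, each with a success probability above 0.99, so shrinking the searched space directly cuts the number of oracle calls.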
It is nontrivial to maintain discovered frequent query patterns in a real XML-DBMS, because the transaction database of queries may be updated frequently, and such updates may not only invalidate some existing frequent query patterns but also generate new ones. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficiently maintaining discovered frequent query patterns and generating new frequent query patterns when new XML queries are added to the database. Experimental results from our implementation show that the proposed algorithms perform well. Key words: XML; frequent query pattern; incremental algorithm; data mining. CLC number: TP 311. Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology. Biography: PENG Dun-lu (1974-), male, associate professor, Ph.D.; research directions: data mining, Web services and their applications, peer-to-peer computing.
In this paper, we propose an enhanced associative classification method that integrates a dynamic property into the associative classification process. In the proposed method, we employ a support vector machine (SVM)-based method to refine the discovered emerging frequent patterns, which are used to extend the classification rules for class label prediction. The empirical study shows that our method can classify increasing resources efficiently and effectively.
Mining frequent patterns has been studied extensively in data mining. However, little work has addressed mining patterns when fresh data constantly flow into the database. In such dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods must repeatedly scan the entire database, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed for maintaining frequent patterns when incremental data arrive. Based on the frequent pattern tree (FP-tree) structure, IM makes the most of the results of the previous mining process and scans the original data at most once. Furthermore, IM can directly identify the differential set of frequent patterns, which may be more informative to users. Moreover, IM can handle changing thresholds as well as changing data, thus providing a full maintenance scheme. IM has been implemented, and the performance study shows that it outperforms three other incremental algorithms: FUP, DB-tree, and re-running frequent pattern growth (FP-growth). Keywords: data mining; association rule mining; frequent pattern mining; incremental mining. Supported by the National Basic Research 973 Program of China under Grant No. G1999032705. Xiu-Li Ma received the Ph.D. degree in computer science from Peking University in 2003. She is currently a postdoctoral researcher at the National Lab on Machine Perception of Peking University. Her main research interests include data warehousing, data mining, intelligent online analysis, and sensor networks. Yun-Hai Tong received the Ph.D. degree in computer software from Peking University in 2002.
He is currently an assistant professor at the School of Electronics Engineering and Computer Science of Peking University. His research interests include data warehousing, online analytical processing, and data mining. Shi-Wei Tang received the B.S. degree in mathematics from Peking University in 1964. He is now a professor and Ph.D. supervisor at the School of Electronics Engineering and Computer Science of Peking University. His research interests include DBMS, information integration, data warehousing, OLAP, data mining, and database technology in specific application fields. He is the vice chair of the Database Society of China Computer Federation. Dong-Qing Yang received the B.S. degree in mathematics from Peking University in 1969. She is now a professor and Ph.D. supervisor at the School of Electronics Engineering and Computer Science of Peking University. Her research interests include database design methodology, database system implementation techniques, data warehousing and data mining, and information integration and sharing in the Web environment. She is a member of the academic committee of the Database Society of China Computer Federation.
In this paper, we propose an efficient hybrid algorithm, WDHP, for mining frequent access patterns. WDHP adopts the techniques of DHP to optimize its performance, namely using a hash table to filter the candidate set and trimming the database. Whenever the database is trimmed to a size below a specified threshold, the algorithm loads it into main memory by constructing a tree and finds frequent patterns on the tree. Experiments show that WDHP outperforms both DHP and the main-memory-based algorithm WAP in execution efficiency.
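The hash-table filter that WDHP borrows from DHP can be sketched for candidate 2-itemsets: during the first pass every pair in each transaction is hashed into a bucket, and because a bucket's count is an upper bound on the support of every pair hashed to it, a pair whose bucket count falls below the minimum support cannot be frequent. This Python sketch uses a hypothetical toy hash function and omits database trimming and WDHP's in-memory tree.

```python
from collections import Counter
from itertools import combinations


def dhp_candidate_pairs(transactions, min_support, n_buckets=7):
    """DHP-style first pass: count items and hash every 2-itemset into a bucket.

    A pair stays a candidate only if both items are frequent AND its bucket
    count reaches min_support (the bucket count bounds the pair's support).
    """
    universe = sorted({i for t in transactions for i in t})
    index = {item: k for k, item in enumerate(universe)}

    def bucket(pair):
        a, b = pair
        # Hypothetical deterministic toy hash; any hash function keeps the filter safe.
        return (index[a] * len(universe) + index[b]) % n_buckets

    item_counts = Counter()
    buckets = [0] * n_buckets
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket(pair)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    return [p for p in combinations(frequent_items, 2) if buckets[bucket(p)] >= min_support]


transactions = [["a", "c", "d"], ["b", "c", "e"], ["a", "b", "c", "e"], ["b", "e"]]
print(dhp_candidate_pairs(transactions, 2))
```

Without the filter, the four frequent items would yield six candidate pairs; here the bucket counts prune ('a', 'b') and ('a', 'e'), leaving ('a', 'c'), ('b', 'c'), ('b', 'e') and ('c', 'e'), which are exactly the frequent pairs. In general some infrequent pairs can still pass when they share a bucket with frequent ones.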
Realizing fault knowledge association through efficient data mining algorithms is of great significance for improving the efficiency of railway production and operation. However, high-utility quantitative frequent pattern mining algorithms still suffer from poor time and memory performance and are not easy to scale up. To meet these needs, we propose a related-degree-based frequent pattern mining algorithm, named Related High Utility Quantitative Item set Mining (RHUQI-Miner), to enable effective mining of railway fault data. The algorithm constructs an item-related-degree structure for the fault data and provides a pruning optimization strategy to find frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. It then uses a fixed-pattern-length strategy to modify the utility information of items during mining, so that the algorithm can control the length of the output frequent patterns according to the actual data and further improve its performance and practicability. Experimental results on a real fault dataset show that RHUQI-Miner effectively reduces time and memory consumption during mining, thus providing data support for differentiated and precise maintenance strategies.
A new algorithm based on an FC-tree (frequent closed pattern tree) and max-FCIA (maximal frequent closed itemsets algorithm) is presented for mining frequent closed itemsets while addressing memory and time consumption problems. The algorithm maps the transaction database into a hash table, obtains the support of all frequent itemsets by operating on the hash table, and forms a lexicographic subset tree containing the frequent itemsets. Efficient pruning methods are then applied to the lexicographic subset tree to obtain an FC-tree containing all the minimum frequent closed itemsets. Finally, frequent closed itemsets are generated from the minimum frequent closed itemsets. The experimental results show that mapping the transaction database reduces time consumption and improves the program's efficiency. Furthermore, the effective pruning strategy restrains the number of candidates, which saves space. The results show that the algorithm is effective.
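The closedness test itself is simple to state: a frequent itemset is closed if no proper superset has the same support. Checking only against other frequent itemsets is sufficient, because any superset with the same support as a frequent itemset is itself frequent. The Python sketch below is a brute-force illustration with hypothetical names; it does not use the hash-table mapping, lexicographic subset tree, or FC-tree described above.

```python
from itertools import combinations


def frequent_supports(transactions, min_support):
    """Brute force: map every frequent itemset (as a frozenset) to its support."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in supports.items()
            if not any(s < other and c == oc for other, oc in supports.items())}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
closed = closed_itemsets(frequent_supports(transactions, 2))
print(sorted((sorted(s), c) for s, c in closed.items()))
```

Here {c} is dropped because its superset {a, c} has the same support (2), whereas {b} is kept because {a, b} has a lower support (2 versus 3).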
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied extensively in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is very costly. J. Han proposed a novel algorithm, FP-growth, that generates frequent patterns without candidate sets. Based on an analysis of FP-growth, this paper proposes the concept of an equivalent FP-tree and an improved algorithm, denoted FP-growth*, which is much faster and easy to implement. FP-growth* adopts a modified FP-tree and header-table structure, generates only a header table in each recursive operation, and projects the tree onto the original FP-tree. The two algorithms produce the same set of frequent patterns on the same transaction database, but a performance study shows that the improved algorithm, FP-growth*, is at least twice as fast as FP-growth.
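Both FP-growth and FP-growth* start from an FP-tree. The Python sketch below shows only the standard construction step: count item supports, drop infrequent items, sort each transaction's items by descending support, and insert them as a path so that common prefixes share nodes, with a header table linking all nodes of each item. Breaking ties by item name is an assumption, and neither the recursive mining phase nor FP-growth*'s modified header table is shown.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its path count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; returns (root, header table, frequent item supports)."""
    counts = Counter(item for t in transactions for item in set(t))
    freq = {item: c for item, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header = {}  # item -> list of nodes carrying that item (the node-links)
    for t in transactions:
        # Keep frequent items only, in descending support; ties broken by item name.
        path = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header, freq


transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
root, header, freq = build_fp_tree(transactions, 2)
print({item: sum(n.count for n in nodes) for item, nodes in sorted(header.items())})
```

Summing the counts along each item's node-links reproduces its support (a: 4, b: 3, c: 3, d: 3, e: 2), while shared prefixes let the 15 item occurrences be stored in 11 nodes.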
Reliability parameter selection is very important during equipment project design and demonstration. This paper addresses the problem of selecting reliability parameters and determining their number. To solve this problem, text mining is first used to extract features and reduce the feature sets from text data, and a frequent pattern tree (FPT) of the text data is constructed to infer frequent itemsets among the key factors with the frequent pattern growth (FPC) algorithm. Then, on the basis of a fuzzy Bayesian network (FBN) and the sample distribution, the paper fuzzifies the key attributes that form association relationships in the frequent itemsets, together with their main parameters, eliminates subjective influence factors, and obtains the conditional mutual information and the maximum-weight directed tree among all attribute variables. Furthermore, a hybrid model is established by inferring fuzzy prior probabilities and conditional probabilities and deriving a parameter learning method. Finally, an example shows that the model is credible and effective.
Intrusion detection plays a significant role in network security systems. A network security system detects malicious actions in the network and also ensures the availability, integrity, and confidentiality of data and information resources. Intrusion detection systems are prone to false positive alerts. If a large number of false positive alerts are created, it becomes difficult for the intrusion detection system to distinguish them from genuine attacks. Much research has been done in this area, but existing algorithms require more memory space and more time to execute the transactions of records. This paper proposes a novel framework for a network security Intrusion Detection System (IDS) using a Modified Frequent Pattern Tree (MFP-Tree) with the K-means algorithm. The accuracy rates of the Modified Frequent Pattern Tree (MFPT)-K-means method for the various classes are 94.89% for Normal traffic, 98.34% for DoS-based attacks, 96.73% for User to Root (U2R) attacks, 95.89% for Remote to Local (R2L) attacks, and 92.67% for Probe attacks, which is better than the existing K-Means and APRIORI algorithms.
Periodic pattern mining, which involves discovering frequently recurring patterns in a transaction sequence, has become a popular research subject in recent years. However, previous algorithms for periodic pattern mining have ignored the utility (profit, value) of patterns. Additionally, these algorithms only identify periodic patterns in a single sequence, whereas identifying high-utility patterns that are common to a set of sequences is more valuable. In several fields, identifying high-utility periodic frequent patterns in multiple sequences is important. In this study, an efficient algorithm called MHUPFPS is proposed to identify such patterns. To address existing problems, three new measures are defined: the utility, high support, and high-utility period sequence ratios. Further, a new upper bound, upSeqRa, and two new pruning properties are proposed. MHUPFPS uses a newly defined HUPFPS-list structure to significantly accelerate the reduction of the search space and improve the overall performance of the algorithm. The proposed algorithm is evaluated on several datasets, and the experimental results indicate that it is accurate and effective in filtering out many non-high-utility periodic frequent patterns.
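Periodicity is usually measured from the gaps between consecutive occurrences of a pattern. The Python sketch below assumes the common periodic-frequent definition, in which a pattern's maximum period is the largest gap between consecutive transaction ids containing it (counting the gap from the start of the sequence and the gap to its end), and a pattern qualifies if its support is at least min_sup and its maximum period is at most max_per. Utility and the multi-sequence ratios that MHUPFPS adds are not modeled, and the names are hypothetical.

```python
def occurrence_tids(sequence, pattern):
    """1-based ids of the transactions in `sequence` that contain every item of `pattern`."""
    pattern = set(pattern)
    return [tid for tid, t in enumerate(sequence, start=1) if pattern <= set(t)]


def max_period(tids, db_size):
    """Largest gap between consecutive occurrences, including the sequence start and end."""
    points = [0] + sorted(tids) + [db_size]
    return max(b - a for a, b in zip(points, points[1:]))


def is_periodic_frequent(sequence, pattern, min_sup, max_per):
    """Frequent enough (support >= min_sup) and never absent for longer than max_per."""
    tids = occurrence_tids(sequence, pattern)
    return len(tids) >= min_sup and max_period(tids, len(sequence)) <= max_per


sequence = [["a", "b"], ["c"], ["a"], ["a", "c"], ["b"], ["c"], ["a", "b"], ["c"]]
print(is_periodic_frequent(sequence, ["a"], 3, 3), is_periodic_frequent(sequence, ["b"], 3, 3))
```

Both a and b reach the support threshold of 3, but b is absent from transactions 2 to 4, giving a maximum period of 4, so only a qualifies as periodic-frequent.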
Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they require multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from recent data. A new refined weight definition is proposed to preserve the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
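To make the two ingredients concrete, here is a Python sketch of a count-based sliding window together with a simple weighted support. The weighting (average item weight times support) is an assumption for illustration, not SWFP-Miner's refined definition; it is not anti-monotone, since adding a heavier item can raise the weighted support, which is why downward closure needs special care. The pruning test instead multiplies support by the largest item weight, an upper bound that holds for every superset. All names are hypothetical.

```python
from collections import deque


class SlidingWindowSupport:
    """Track the support of watched itemsets over the most recent `width` transactions."""

    def __init__(self, width, itemsets):
        self.width = width
        self.window = deque()
        self.counts = {frozenset(s): 0 for s in itemsets}

    def add(self, transaction):
        t = frozenset(transaction)
        self.window.append(t)
        self._update(t, +1)
        if len(self.window) > self.width:
            # Expire the oldest transaction so the counts reflect only recent data.
            self._update(self.window.popleft(), -1)

    def _update(self, t, delta):
        for s in self.counts:
            if s <= t:
                self.counts[s] += delta


def weighted_support(itemset, support, weights):
    """Hypothetical weighting: average item weight times support."""
    return support * sum(weights[i] for i in itemset) / len(itemset)


def can_prune(support, max_weight, min_wsup):
    """Safe prune: max_weight is the largest weight of ANY item, so no superset can pass."""
    return support * max_weight < min_wsup


window = SlidingWindowSupport(3, [{"a"}, {"a", "b"}])
for t in [["a", "b"], ["a"], ["b"], ["a", "b"]]:
    window.add(t)
print(window.counts)
```

After the fourth transaction the first one expires, so {a} has support 2 and {a, b} has support 1 in the current window.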
Periodic frequent pattern discovery is a non-trivial task that discovers frequent patterns of user interest using a periodicity measure. Although conventional algorithms for periodic frequent pattern detection have numerous applications, there is still little research on detecting the periodic frequent patterns of individual metro passengers. The travel behavior of individual passengers has complex spatio-temporal characteristics in the metro network, which poses new challenges for discovering the periodic frequent patterns of individual metro passengers and for developing mining algorithms based on real-world smart card data. This study addresses these issues by proposing a novel metro passenger travel pattern called periodic frequent passenger traffic patterns with time granularities and station attributes (PFPTS). This pattern automatically captures features of the temporal dimension (morning and evening peak hours, week) and the spatial dimension (entering and leaving stations). A corresponding complete mining algorithm with the PFPTS-tree structure has been developed. To evaluate the performance of PFPTS-tree, several experiments are conducted on one year of real-world smart card data collected by an automatic fare collection system in a large metro network. The results show that PFPTS-tree is efficient and can discover numerous interesting periodic frequent patterns of metro passengers in the real-world dataset.
Periodic pattern mining is of great significance for understanding passenger travel behavior, but previous works mainly focused on trajectory data and the spot/point dimension. Besides, many uncertain factors (severe weather, traffic accidents, etc.) may interfere with discovering original and accurate periodic travel patterns. This paper proposes a novel type of travel pattern called the motif periodic frequent pattern (MPFP), which captures the periodicity of the network temporal motifs of individual metro passengers with higher-order spatio-temporal characteristics while considering uncertain disturbances. We also propose a new complete mining algorithm, MPFP-growth, to extract MPFPs from smart card data (SCD), and apply it to real long-time-span data from a large-scale metro system. Results show that frequent metro travelers usually have some typical MPFPs with a weekly temporal periodicity. The top 10 of all 4624 motif types account for about 95% of all motifs, and the top 5 types constitute about 90%; the MPFPs of the top 3 motif types account for nearly 80% of all periodic patterns, of which Mono-MPFP and 2-MPFP are the main ones. The relatively stable time range of an MPFP is three months, and the optimal threshold for the uncertain disturbance factor should be set at 5%. Additionally, several interesting typical MPFPs of individual metro commuters and their proportions are introduced to further understand the many variants of MPFP.
In this paper, we propose an efficient algorithm called FFP-Growth (short for fast FP-Growth) to mine frequent itemsets. Like FP-Growth, FFP-Growth searches the FP-tree in bottom-up order, but it does not need to construct conditional pattern bases and sub-FP-trees, thus saving a substantial amount of time and space; moreover, the FP-tree it creates is much smaller than the one created by TD-FP-Growth, which improves efficiency. At the same time, FFP-Growth can easily be extended to reduce the search space, as in TD-FP-Growth (M) and TD-FP-Growth (C). Experimental results show that the algorithm is effective and efficient.
In wind- and solar-dominated hybrid alternating current/direct current (AC/DC) power systems, the active power of high-voltage direct current (HVDC) systems is significantly limited by security and stability events caused by cascading failures. To identify critical lines in cascading failures, a rapid risk assessment method is proposed based on the gradient boosting decision tree (GBDT) and frequent pattern growth (FP-Growth) algorithms. First, security and stability events triggered by cascading failures are analyzed to explain the impact of cascading failures on the maximum DC power. Then, a cascading failure risk index is defined that focuses on the limitation of DC power. To handle the strongly nonlinear relationship between the maximum DC power and cascading failures, a GBDT with an update strategy is used to rapidly predict the maximum DC power under uncertain operating conditions. Finally, the FP-Growth algorithm is improved to mine frequent patterns in cascading failures. An importance index for each fault in a frequent pattern is defined by evaluating its impact on cascading failures, enabling the identification of critical lines. Simulation results for a modified Ningxia–Shandong hybrid AC/DC system in China demonstrate that the proposed method can rapidly assess the risk of cascading failures and effectively identify critical lines.
OBJECTIVE: To explore the correlation between tongue diagnostic information and gastroscopy results in patients with chronic gastritis. METHODS: The frequent pattern growth (FP-Growth) algorithm in SPSS Modeler was used to analyze the association rules between the image information of tongue parameters and the characteristics of the stomach and duodenum seen under gastroscopy. RESULTS: Ranked by confidence, cyanotic tongue, slippery fur, yellow fur, and spotted tongue were each associated with both gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The L value of tongue coating color in the range (30, 60), tooth-marked tongue, and the b value of tongue coating color in the range (5, 20) were each associated with gastric antrum mucosal erythema/macula. The a value of tongue body color in the range (0, 20) was related to both gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The a value of tongue coating color in the range (15, 35) was associated with gastric antrum mucosal erythema/macula. In total, there are 9 strong association rules. CONCLUSIONS: Cyanotic tongue, slippery fur, yellow fur, the CIE Lab values of the tongue coating, the a value of tongue body color, spotted tongue, and tooth-marked tongue are all related to gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The condition of the gastric mucosa could be predicted by examining the above related tongue image information.
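The strength of each mined rule is typically reported with support and confidence. The Python sketch below computes both for a rule X → Y over a list of records; the record labels are hypothetical and are not data from the study, and SPSS Modeler's own rule output is not reproduced.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a, both = set(antecedent), set(antecedent) | set(consequent)
    n = len(records)
    n_a = sum(1 for r in records if a <= set(r))
    n_both = sum(1 for r in records if both <= set(r))
    support = n_both / n if n else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical records: each is the set of tongue and gastroscopy findings for one patient.
records = [
    {"cyanotic_tongue", "antrum_erythema"},
    {"cyanotic_tongue", "antrum_erythema", "yellow_fur"},
    {"cyanotic_tongue"},
    {"yellow_fur", "antrum_erythema"},
]
print(rule_metrics(records, {"cyanotic_tongue"}, {"antrum_erythema"}))
```

For these four records the rule holds in 2 records (support 0.5) and in 2 of the 3 records with a cyanotic tongue (confidence 2/3); ranking rules "in order of confidence" sorts them by this second number.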
Funding for the GTK top-rank-k paper: This research was supported in part by the Hunan Province's Strategic and Emerging Industrial Projects under Grant 2018GK4035, in part by the Hunan Province's Changsha Zhuzhou Xiangtan National Independent Innovation Demonstration Zone projects under Grant 2017XK2058, in part by the National Natural Science Foundation of China under Grant 61602171, and in part by the Scientific Research Fund of Hunan Provincial Education Department under Grants 17C0960 and 18B037.
Funding for the IPARUC paper: Supported by the National Natural Science Foundation of China (60472099) and the Ningbo Natural Science Foundation (2006A610017).
Funding for the Unid_FP-Max paper: Supported by the National Natural Science Foundation of China (No. 60474022) and the Henan Innovation Project for University Prominent Research Talents (No. 2007KYCX018).
文摘Because mining complete set of frequent patterns from dense database could be impractical, an interesting alternative has been proposed recently. Instead of mining the complete set of frequent patterns, the new model only finds out the maximal frequent patterns, which can generate all frequent patterns. FP-growth algorithm is one of the most efficient frequent-pattern mining methods published so far. However, because FP-tree and conditional FP-trees must be two-way traversable, a great deal memory is needed in process of mining. This paper proposes an efficient algorithm Unid_FP-Max for mining maximal frequent patterns based on unidirectional FP-tree. Because of generation method of unidirectional FP-tree and conditional unidirectional FP-trees, the algorithm reduces the space consumption to the fullest extent. With the development of two techniques: single path pruning and header table pruning which can cut down many conditional unidirectional FP-trees generated recursively in mining process, Unid_FP-Max further lowers the expense of time and space.
Abstract: Generating maximum frequent patterns from a large database of transactions and items for association rule mining is an important research topic in data mining. Association rule mining aims to discover interesting correlations, frequent patterns, associations, or causal structures among items hidden in a large database. Exploiting quantum computing, we propose an efficient quantum search algorithm to discover the maximum frequent patterns. We modify Grover's search algorithm so that a subspace of arbitrary symmetric states is used instead of the whole search space. We present a novel quantum oracle design that employs a quantum counter to count the maximum frequent items and a quantum comparator to check them against a minimum support threshold. The derived algorithm increases the rate of correct solutions because the search is confined to a subspace. Furthermore, our algorithm significantly scales and optimizes the required number of qubits, which directly improves performance. The proposed design can accommodate more transactions and items while still performing well with a small number of qubits.
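For context, the standard Grover analysis shows why a smaller search space helps. With $N$ basis states in the searched space and $M$ marked solutions, $k$ Grover iterations succeed with probability $P_k$ below; restricting the search to a subspace of dimension $N' < N$ (our notation, not the paper's) replaces $N$ with $N'$, which raises $\theta$ and reduces the number of iterations needed.

$$
\sin\theta = \sqrt{\frac{M}{N}}, \qquad
P_k = \sin^2\!\big((2k+1)\theta\big), \qquad
k_{\mathrm{opt}} \approx \left\lfloor \frac{\pi}{4}\sqrt{\frac{N}{M}} \right\rfloor
$$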
Abstract: Maintaining discovered frequent query patterns in a real XML-DBMS is nontrivial, because the transaction database of queries may be updated frequently, and such updates may not only invalidate some existing frequent query patterns but also generate new ones. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficiently maintaining the discovered frequent query patterns and generating new frequent query patterns when new XML queries are added to the database. Experimental results from our implementation show that the proposed algorithms perform well. Key words: XML; frequent query pattern; incremental algorithm; data mining. CLC number: TP 311. Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology. Biography: PENG Dun-lu (1974-), male, associate professor, Ph.D.; research directions: data mining, Web services and their applications, peer-to-peer computing.
Funding: Supported by the National High Technology Research and Development Program of China (No. 2007AA01Z132), the National Natural Science Foundation of China (Nos. 60775035, 60933004, 60970088, 60903141), the National Basic Research Priorities Programme (No. 2007CB311004), and the National Science and Technology Support Plan (No. 2006BAC08B06).
Abstract: In this paper, we propose an enhanced associative classification method that integrates a dynamic property into the associative classification process. In the proposed method, we employ a support vector machine (SVM)-based method to refine the discovered emerging frequent patterns for classification rule extension and class label prediction. The empirical study shows that our method can classify a growing body of resources efficiently and effectively.
Abstract: Mining frequent patterns has been widely studied in data mining. However, little work has been done on mining patterns when the database receives a constant influx of fresh data. In such dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods must scan the entire database repeatedly, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed to maintain the frequent patterns when incremental data arrive. Based on the frequent pattern tree (FP-tree) structure, IM makes the most of the results of the previous mining process and requires at most one scan of the original data. Furthermore, IM can directly identify the differential set of frequent patterns, which may be more informative to users. Moreover, IM can handle changing thresholds as well as changing data, thus providing a full maintenance scheme. IM has been implemented, and the performance study shows that it outperforms three other incremental approaches: FUP, DB-tree, and re-running frequent pattern growth (FP-growth). Keywords: data mining; association rule mining; frequent pattern mining; incremental mining. Supported by the National Basic Research 973 Program of China under Grant No. G1999032705. Xiu-Li Ma received the Ph.D. degree in computer science from Peking University in 2003. She is currently a postdoctoral researcher at the National Lab on Machine Perception of Peking University. Her main research interests include data warehousing, data mining, intelligent online analysis, and sensor networks. Yun-Hai Tong received the Ph.D. degree in computer software from Peking University in 2002. He is currently an assistant professor at the School of Electronics Engineering and Computer Science of Peking University. His research interests include data warehousing, online analytical processing, and data mining. Shi-Wei Tang received the B.S. degree in mathematics from Peking University in 1964. He is now a professor and Ph.D. supervisor at the School of Electronics Engineering and Computer Science of Peking University. His research interests include DBMS, information integration, data warehousing, OLAP, data mining, and database technology in specific application fields. He is the vice chair of the Database Society of the China Computer Federation. Dong-Qing Yang received the B.S. degree in mathematics from Peking University in 1969. She is now a professor and Ph.D. supervisor at the School of Electronics Engineering and Computer Science of Peking University. Her research interests include database design methodology, database system implementation techniques, data warehousing and data mining, and information integration and sharing in the Web environment. She is a member of the academic committee of the Database Society of the China Computer Federation.
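One standard way to avoid rescanning the original data, used by FUP, relies on this observation: an itemset that was infrequent in the original database can become frequent only if it is frequent within the increment itself. The Python sketch below applies that rule with brute-force counting instead of an FP-tree, so it illustrates the pruning idea and the differential set that IM reports, not IM's tree-based procedure. It assumes `old_freq` holds every frequent itemset of the original data up to `max_len` items, with its count; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def itemsets_in(db, max_len):
    """All itemsets (sorted tuples, up to max_len items) occurring in db."""
    found = set()
    for t in db:
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            found.update(combinations(items, k))
    return found


def count_in(db, candidates):
    """Support count of each candidate itemset in db."""
    counts = Counter()
    for t in db:
        present = set(t)
        counts.update(c for c in candidates if present.issuperset(c))
    return counts


def incremental_update(old_freq, old_db, inc_db, min_ratio, max_len=3):
    """Return (new_freq, added, removed) after inc_db is appended to old_db."""
    total = len(old_db) + len(inc_db)
    inc_counts = count_in(inc_db, itemsets_in(inc_db, max_len))
    new_freq = {}
    # Previously frequent itemsets: only the increment has to be scanned.
    for s, c in old_freq.items():
        n = c + inc_counts.get(s, 0)
        if n >= min_ratio * total:
            new_freq[s] = n
    # A previously infrequent itemset can only become frequent if it is
    # frequent within the increment; only those need a scan of old_db.
    fresh = {s for s, c in inc_counts.items()
             if s not in old_freq and c >= min_ratio * len(inc_db)}
    old_counts = count_in(old_db, fresh)
    for s in fresh:
        n = old_counts[s] + inc_counts[s]
        if n >= min_ratio * total:
            new_freq[s] = n
    added = set(new_freq) - set(old_freq)      # differential set: newly frequent
    removed = set(old_freq) - set(new_freq)    # differential set: no longer frequent
    return new_freq, added, removed


old_db = [["a", "b"], ["a", "b"], ["c"]]
old_freq = {("a",): 2, ("b",): 2, ("a", "b"): 2}  # min_ratio 0.5 on old_db
new_freq, added, removed = incremental_update(old_freq, old_db, [["c"]] * 3, 0.5)
print(new_freq)  # {('c',): 4}
```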
Abstract: We propose WDHP, an efficient hybrid algorithm for mining frequent access patterns. WDHP adopts the techniques of DHP to optimize its performance, namely using a hash table to filter the candidate set and trimming the database. Whenever the trimmed database becomes smaller than a specified threshold, the algorithm loads it into main memory by constructing a tree and then finds the frequent patterns on that tree. Experiments show that WDHP outperforms both DHP and the main-memory-based algorithm WAP in execution efficiency.
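DHP's hash filter can be sketched as follows: while single items are counted, every 2-itemset of each transaction is hashed into a small bucket table, and a pair survives as a candidate only if both items are frequent and its bucket count reaches the threshold, because a bucket count is an upper bound on the support of any pair hashed to it. This is a minimal Python sketch assuming string items and a CRC32-based bucket function chosen only for determinism; it omits DHP's database trimming and WDHP's switch to an in-memory tree.

```python
import zlib
from collections import Counter
from itertools import combinations


def bucket(pair, num_buckets):
    """Deterministic bucket for a pair of string items (an illustrative choice)."""
    return zlib.crc32("|".join(pair).encode("utf-8")) % num_buckets


def dhp_candidate_pairs(db, min_count, num_buckets=7):
    item_counts = Counter()
    buckets = [0] * num_buckets
    # Single pass: count items and hash every 2-itemset of each transaction.
    for t in db:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            buckets[bucket(pair, num_buckets)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)
    # A bucket count upper-bounds the support of every pair hashed to it, so
    # pairs in light buckets can be discarded without losing frequent pairs.
    return [p for p in combinations(frequent_items, 2)
            if buckets[bucket(p, num_buckets)] >= min_count]


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
print(dhp_candidate_pairs(db, min_count=2))
```

Hash collisions can let an infrequent pair such as ("b", "c") through, but a truly frequent pair is never filtered out.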
Funding: Supported by the Research on Key Technologies and Typical Applications of Big Data in Railway Production and Operation (P2023S006) and the Fundamental Research Funds for the Central Universities (2022JBZY023).
Abstract: Realizing fault knowledge association through efficient data mining algorithms is of great significance for improving the efficiency of railway production and operation. However, high utility quantitative frequent pattern mining algorithms still suffer from poor time and memory performance and do not scale easily. To meet these needs, we propose a related-degree-based frequent pattern mining algorithm, named Related High Utility Quantitative Item set Mining (RHUQI-Miner), to enable effective mining of railway fault data. The algorithm constructs an item-related-degree structure for the fault data and provides a pruning optimization strategy that finds frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. It then uses a fixed-pattern-length strategy to modify the utility information of items during mining, so that the algorithm can control the length of the output frequent patterns according to the actual data and further improve its performance and practicability. Experimental results on a real fault dataset show that RHUQI-Miner effectively reduces time and memory consumption during mining, thus providing data support for differentiated and precise maintenance strategies.
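For readers new to utility mining, the two quantities below are the standard building blocks: the utility of an itemset (quantity times unit profit, summed over the transactions that contain it) and the transaction-weighted utility (TWU), an upper bound commonly used for pruning because utility alone is not anti-monotone. This Python sketch uses hypothetical data and does not model RHUQI-Miner's related-degree structure or fixed-pattern-length strategy.

```python
def itemset_utility(itemset, db, unit_profit):
    """Sum of quantity * unit profit of the itemset's items over transactions containing it."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def twu(itemset, db, unit_profit):
    """Transaction-weighted utility: total utility of every transaction containing the itemset."""
    return sum(
        sum(q * unit_profit[i] for i, q in t.items())
        for t in db
        if all(i in t for i in itemset)
    )


# Hypothetical quantitative transactions: item -> purchased quantity.
db = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"a": 1, "b": 2, "c": 1}]
unit_profit = {"a": 5, "b": 2, "c": 1}
print(itemset_utility({"a", "b"}, db, unit_profit), twu({"a", "b"}, db, unit_profit))  # 21 22
```

Because TWU never increases when items are added to an itemset, any itemset whose TWU falls below the utility threshold can be pruned together with all of its supersets.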
Funding: The National Natural Science Foundation of China (No. 60603047), the Natural Science Foundation of Liaoning Province, and the Liaoning Higher Education Research Foundation (No. 2008341).
Abstract: A new algorithm based on an FC-tree (frequent closed pattern tree) and max-FCIA (maximal frequent closed itemsets algorithm) is presented for mining frequent closed itemsets while reducing memory and time consumption. The algorithm maps the transaction database into a hash table, obtains the support of all frequent itemsets by operating on the hash table, and forms a lexicographic subset tree containing the frequent itemsets. Efficient pruning methods then process the lexicographic subset tree to obtain an FC-tree containing all the minimum frequent closed itemsets. Finally, the frequent closed itemsets are generated from the minimum frequent closed itemsets. The experimental results show that mapping the transaction database reduces time consumption and improves the efficiency of the program, and that the effective pruning strategy limits the number of candidates, which saves space. Overall, the results show that the algorithm is effective.
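One plausible reading of mapping the database into a hash table is a vertical index from each item to the IDs of the transactions containing it, so that an itemset's support is the size of an intersection. The Python sketch below makes that assumption and then keeps the closed itemsets, those with no proper superset of equal support; it enumerates itemsets by brute force and does not implement the FC-tree or max-FCIA.

```python
from collections import defaultdict
from itertools import combinations


def tidset_index(db):
    """Hash table from each item to the set of transaction IDs that contain it."""
    index = defaultdict(set)
    for tid, t in enumerate(db):
        for item in t:
            index[item].add(tid)
    return index


def support(itemset, index):
    """Support = number of transactions containing every item (tid-set intersection)."""
    return len(set.intersection(*(index[i] for i in itemset)))


def closed_frequent(db, min_count):
    index = tidset_index(db)
    items = sorted(index)
    frequent = {}
    for k in range(1, len(items) + 1):  # brute force; fine for tiny examples
        for combo in combinations(items, k):
            s = support(combo, index)
            if s >= min_count:
                frequent[frozenset(combo)] = s
    # Closed: no proper superset with the same support. A superset with equal
    # support is itself frequent, so checking frequent itemsets suffices.
    return {x: sx for x, sx in frequent.items()
            if not any(x < y and sx == sy for y, sy in frequent.items())}


db = [["a", "b"], ["a", "b"], ["a"]]
result = closed_frequent(db, min_count=1)
print(sorted((sorted(x), s) for x, s in result.items()))  # [(['a'], 3), (['a', 'b'], 2)]
```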
Funding: The Fund of the National Management Bureau of Traditional Chinese Medicine (No. 2000JP54).
Abstract: Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is very costly. Han J. proposed a novel algorithm, FP-growth, that generates frequent patterns without candidate sets. Based on an analysis of FP-growth, this paper proposes the concept of an equivalent FP-tree and an improved algorithm, denoted FP-growth*, which is much faster and easy to implement. FP-growth* adopts a modified FP-tree and header table structure, generates only a header table in each recursive operation, and projects the tree onto the original FP-tree. The two algorithms produce the same frequent pattern set on the same transaction database, but the performance study shows that the improved algorithm, FP-growth*, is at least twice as fast as FP-growth.
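FP-growth* modifies the FP-tree and header table, but its starting point is the ordinary FP-tree of FP-growth. The Python sketch below builds that standard structure: infrequent items are dropped, each transaction's remaining items are sorted by descending support, shared prefixes are merged, and the header table links every node of each item. Breaking ties by item name is an assumption made for determinism.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(db, min_count):
    """Return (root, header), where header maps each frequent item to its nodes."""
    counts = Counter(i for t in db for i in set(t))
    frequent = {i for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {i: [] for i in frequent}
    for t in db:
        # Keep frequent items in descending support order (ties broken by name).
        path = sorted((i for i in set(t) if i in frequent),
                      key=lambda i: (-counts[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


root, header = build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["d"]], min_count=2)
print({item: sum(n.count for n in nodes) for item, nodes in sorted(header.items())})
# {'a': 2, 'b': 3, 'c': 2}
```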
Funding: The Weapon Equipment Beforehand Research Foundation of China (No. 9140A19030314JB35275) and the Army Technology Element Foundation of China (No. A157167).
Abstract: Reliability parameter selection is very important during equipment project design and demonstration. This paper addresses the problem of selecting the reliability parameters and determining their number. To solve this problem, text mining is first used to extract features and reduce the feature sets from text data, and a frequent pattern tree (FPT) of the text data is constructed so that the frequent pattern growth (FP-growth) algorithm can derive the frequent itemsets among the key factors. Then, on the basis of a fuzzy Bayesian network (FBN) and the sample distribution, the paper fuzzifies the key attributes that form associations in the frequent itemsets, together with their main parameters, eliminates subjective influence factors, and obtains the conditional mutual information and the maximum weight directed tree among all attribute variables. Furthermore, the hybrid model is established by deriving the fuzzy prior probabilities and conditional probabilities and summarizing the parameter learning method. Finally, an example shows that the model is credible and effective.
Abstract: Intrusion detection plays a significant role in network security. A network security system detects malicious actions in the network and ensures the availability, integrity, and confidentiality of data and information resources. Intrusion detection systems can easily generate false positive alerts; if a large number of false positives are created, it becomes difficult for the intrusion detection system to distinguish them from genuine attacks. Many research works have addressed this problem, but the existing algorithms require more memory and more time to process the transaction records. This paper proposes a novel framework for a network security Intrusion Detection System (IDS) using a Modified Frequent Pattern Tree (MFP-Tree) combined with the K-means algorithm. The accuracy rates of the Modified Frequent Pattern Tree (MFPT)-K-means method in detecting the various classes are 94.89% for Normal traffic, 98.34% for DoS attacks, 96.73% for User to Root (U2R) attacks, 95.89% for Remote to Local (R2L) attacks, and 92.67% for Probe attacks, which compares favorably with the existing K-means and Apriori algorithms.
Abstract: Periodic pattern mining, which involves discovering frequently recurring patterns in a transaction sequence, has become a popular research subject in recent years. However, previous periodic pattern mining algorithms have ignored the utility (profit, value) of patterns. Additionally, these algorithms only identify periodic patterns in a single sequence, whereas identifying high-utility patterns common to a set of sequences is more valuable. In several fields, identifying high-utility periodic frequent patterns in multiple sequences is important. In this study, an efficient algorithm called MHUPFPS is proposed to identify such patterns. To address the existing problems, three new measures are defined: the utility, high support, and high-utility period sequence ratios. Further, a new upper bound, upSeqRa, and two new pruning properties are proposed. MHUPFPS uses a newly defined HUPFPS-list structure to significantly accelerate the reduction of the search space and improve the overall performance of the algorithm. The proposed algorithm is evaluated on several datasets, and the experimental results indicate that it is accurate and effective in filtering out non-high-utility periodic frequent patterns.
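The periodicity measure that periodic-frequent mining builds on can be computed directly for a single sequence: record the timestamps at which the pattern occurs and take the largest gap, including the gap from the start of the sequence to the first occurrence and from the last occurrence to the end. The Python sketch below follows that common definition; MHUPFPS's utility measures, multi-sequence ratios, and upSeqRa bound are not reproduced, and the function names are hypothetical.

```python
def occurrence_times(pattern, sequence):
    """Timestamps (1-based) of the transactions that contain every item of pattern."""
    return [ts for ts, t in enumerate(sequence, start=1) if pattern <= set(t)]


def max_period(pattern, sequence):
    """Largest gap between consecutive occurrences, counting the sequence start and end."""
    times = occurrence_times(pattern, sequence)
    if not times:
        return None
    bounds = [0] + times + [len(sequence)]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


def is_periodic_frequent(pattern, sequence, min_sup, max_per):
    """Frequent enough (support >= min_sup) and never absent for longer than max_per."""
    times = occurrence_times(pattern, sequence)
    if len(times) < max(min_sup, 1):
        return False
    return max_period(pattern, sequence) <= max_per


seq = [{"a", "b"}, {"c"}, {"a"}, {"a", "b"}, {"c"}]
print(max_period({"a", "b"}, seq), max_period({"a"}, seq))  # 3 2
```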
Abstract: Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they require multiple database scans. In this paper, we present SWFP-Miner, an efficient algorithm for mining weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns in the recent data. A new refined weight definition is proposed to preserve the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
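The sketch below illustrates the two ingredients the abstract combines: a count-based sliding window that forgets old transactions, and weighted support, taken here as the average item weight times support. Because that quantity is not downward closed, pruning uses a common safe bound, the largest item weight times support, rather than SWFP-Miner's refined weight definition, which the abstract does not give. Class and function names are hypothetical.

```python
from collections import deque


class TransactionWindow:
    """Keeps only the most recent `size` transactions of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # oldest entry is evicted automatically

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def support(self, pattern):
        return sum(1 for t in self.window if pattern <= t)


def weighted_support(pattern, window, weights):
    """Average item weight of the pattern times its support in the window."""
    avg_weight = sum(weights[i] for i in pattern) / len(pattern)
    return avg_weight * window.support(pattern)


def can_prune(pattern, window, weights, min_wsup):
    """Safe pruning: no superset can exceed max item weight * support(pattern)."""
    return max(weights.values()) * window.support(pattern) < min_wsup


weights = {"a": 0.8, "b": 0.4, "c": 1.0}
w = TransactionWindow(size=3)
for t in (["a", "b"], ["a"], ["b", "c"], ["a", "b"]):
    w.add(t)
print(w.support({"a"}), can_prune({"c"}, w, weights, 1.5))  # 2 True
```

The bound is safe because a superset's support cannot exceed the pattern's support and its average weight cannot exceed the largest item weight.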
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52102382), the Shanghai Science and Technology Committee (Grant No. 20DZ1203201), the Fundamental Research Funds for the Central Universities (2022-5-YB-04), and the Shanghai Shentong Metro Group Co., Ltd. (Grant Nos. JSKY21R005-1-WT-21064 and JS-KY21R005-2).
Abstract: Periodic frequent pattern discovery is the non-trivial task of discovering frequent patterns of interest to users by means of a periodicity measure. Although conventional periodic frequent pattern detection algorithms have numerous applications, little research has addressed periodic frequent pattern detection for individual passengers in the metro. The travel behavior of individual passengers has complex spatio-temporal characteristics in the metro network, which poses new challenges for discovering the periodic frequent patterns of individual metro passengers and for developing mining algorithms based on real-world smart card data. This study addresses these issues by proposing a novel metro passenger travel pattern called periodic frequent passenger traffic patterns with time granularities and station attributes (PFPTS). This pattern automatically captures features of the temporal dimension (morning and evening peak hours, week) and the spatial dimension (entering and leaving stations). A corresponding complete mining algorithm with a PFPTS-tree structure has been developed. To evaluate the performance of PFPTS-tree, several experiments are conducted on one year of real-world smart card data collected by the automatic fare collection system of a large metro network. The results show that PFPTS-tree is efficient and can discover numerous interesting periodic frequent patterns of metro passengers in the real-world dataset.
Funding: Supported by the National Natural Science Foundation of China (No. 52372332), the Fundamental Research Funds for the Central Universities of China (No. 2022-5-YB-04), and the Shanghai Shentong Metro Group Co., Ltd. (Nos. JSKY21R005-1-WT-21064 and JS-KY22R033-2).
Abstract: Periodic pattern mining is of great significance for understanding passenger travel behavior, but previous work has focused mainly on trajectory data and on the spot/point dimension. Besides, many uncertain factors (severe weather, traffic accidents, etc.) may interfere with discovering the original and accurate periodic travel patterns. This paper proposes a novel type of travel pattern called the motif periodic frequent pattern (MPFP), which captures the periodicity of the network temporal motifs of individual metro passengers with higher-order spatio-temporal characteristics while accounting for uncertain disturbances. We also propose a new complete mining algorithm, MPFP-growth, to extract MPFPs from smart card data (SCD), and apply it to real long-time-span experimental data from a large-scale metro system. The results show that frequent-travel metro passengers usually have some typical MPFPs with a weekly temporal periodicity. The top 10 of all 4624 motif types account for about 95% of all motifs and the top 5 for about 90%, and the MPFPs of the top 3 motif types account for nearly 80% of all periodic patterns, among which Mono-MPFP and 2-MPFP are the main ones. The relatively stable time range of an MPFP is three months, and the threshold for the optimal uncertain disturbance factor should be set at 5%. Additionally, several interesting typical MPFPs of individual metro commuting passengers and their proportions are introduced to further understand the many variants of MPFP.
Abstract: In this paper, we propose an efficient algorithm called FFP-Growth (short for fast FP-Growth) to mine frequent itemsets. Like FP-Growth, FFP-Growth searches the FP-tree in bottom-up order, but it does not need to construct conditional pattern bases and sub-FP-trees, thus saving a substantial amount of time and space; moreover, the FP-tree it creates is much smaller than that created by TD-FP-Growth, which further improves efficiency. At the same time, FFP-Growth can easily be extended to reduce the search space, as TD-FP-Growth (M) and TD-FP-Growth (C) do. Experimental results show that the algorithm is effective and efficient.
Funding: Supported by the National Key Research and Development Program of China "Key technologies for system stability and HVDC transmission of large-scale renewable energy generation base without conventional power support" (2022YFB2402700) and the project of the State Grid Corporation of China (52272222001J).
Abstract: In wind- and solar-dominant hybrid alternating current/direct current (AC/DC) power systems, the active power of the high-voltage direct current (HVDC) system is significantly limited by security and stability events caused by cascading failures. To identify critical lines in cascading failures, a rapid risk assessment method is proposed based on the gradient boosting decision tree (GBDT) and frequent pattern growth (FP-Growth) algorithms. First, the security and stability events triggered by cascading failures are analyzed to explain the impact of cascading failures on the maximum DC power. Then, a cascading failure risk index is defined that focuses on the limitation of DC power. To handle the strongly nonlinear relationship between the maximum DC power and cascading failures, a GBDT with an update strategy is used to rapidly predict the maximum DC power under uncertain operating conditions. Finally, the FP-Growth algorithm is improved to mine frequent patterns in cascading failures. An importance index is defined for each fault in a frequent pattern by evaluating its impact on cascading failures, enabling the identification of critical lines. Simulation results on a modified Ningxia–Shandong hybrid AC/DC system in China demonstrate that the proposed method can rapidly assess the risk of cascading failures and effectively identify critical lines.
Funding: Key Special Project of the National Key Research and Development Program of the Ministry of Science and Technology (No. 2017YFB1002300), Topic One: Multimodal Heterogeneous Efficient Acquisition of Traditional Chinese Medicine Big Data and Resource Library Construction (No. 2017YFB1002301), and Topic Three: Multi-Scale Cognition Methods and Treatment Analysis Model of Traditional Chinese Medicine Based on Deep Learning (No. 2017YFB1002303), from Big Data-Driven Traditional Chinese Medicine Intelligent Auxiliary Diagnostic Service System; Graduation Design of the "Cultivation Program" for Cross-cultivation of High-level Talents in Beijing Colleges and Universities in 2010 (Scientific Research): Research on the Clinical Diagnosis and Prediction System of Gastric Precancerous Lesions Based on Artificial Intelligence; the National Natural Science Foundation of China (No. 30701071); the Sixth Batch of Academic Experience Inheritance of Traditional Chinese Medicine Experts (2017); and the "3+3" Project of Beijing Traditional Chinese Medicine Inheritance (No. 2012-SZ-C-41).
Abstract: OBJECTIVE: To explore the correlation between tongue diagnostic information and gastroscopy results in patients with chronic gastritis. METHODS: Frequent pattern growth (FP-Growth) in SPSS Modeler was used to analyze the association rules between tongue image parameters and the characteristics of the stomach and duodenum seen under gastroscopy. RESULTS: Ranked by confidence, cyanotic tongue, slippery fur, yellow fur, and spotted tongue were, in that order, associated with both gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The L value of tongue coating color in the range (30, 60), tooth-marked tongue, and the b value of tongue coating color in the range (5, 20) were, in that order, associated with gastric antrum mucosal erythema/macula. The A value of tongue body color in the range (0, 20) was related to both gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The a value of tongue coating color in the range (15, 35) was associated with gastric antrum mucosal erythema/macula. In total, there were 9 strong association rules. CONCLUSIONS: Cyanotic tongue, slippery fur, yellow fur, the CIE Lab values of the tongue coating, the a value of tongue body color, spotted tongue, and tooth-marked tongue are all related to gastric antrum mucosal hyperemia or edema and gastric antrum mucosal erythema/macula. The condition of the gastric mucosa could therefore be predicted by examining the above tongue image information.
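The reported rules are ranked by confidence. The Python sketch below shows how the support and confidence of a rule such as "cyanotic tongue ⇒ antrum hyperemia" are computed from per-patient records; the feature labels and records are hypothetical toy data, not the study's.

```python
def rule_metrics(antecedent, consequent, records):
    """Support and confidence of the rule antecedent => consequent."""
    n_both = sum(1 for r in records if antecedent | consequent <= r)
    n_ante = sum(1 for r in records if antecedent <= r)
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical per-patient feature sets (tongue findings + gastroscopy findings).
records = [
    {"cyanotic_tongue", "antrum_hyperemia"},
    {"cyanotic_tongue", "antrum_hyperemia"},
    {"cyanotic_tongue"},
    {"antrum_hyperemia"},
]
print(rule_metrics({"cyanotic_tongue"}, {"antrum_hyperemia"}, records))
```

A "strong" rule is one whose support and confidence both clear user-chosen minimum thresholds.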