The rapid development of network technology and its evolution toward heterogeneous networks have increased the demand for automatic monitoring and management of heterogeneous wireless communication networks. This paper presents a multilevel pattern mining architecture to support automatic network management by discovering interesting patterns from telecom network monitoring data. This architecture leverages and combines existing frequent itemset discovery over data streams, association rule deduction, frequent sequential pattern mining, and frequent temporal pattern mining techniques while also making use of distributed processing platforms to achieve high-volume throughput.
Sparse large-scale multi-objective optimization problems (SLMOPs) are common in science and engineering. However, a large-scale problem implies a high-dimensional decision space, requiring algorithms to traverse a vast search space with limited computational resources. Furthermore, in the sparse setting, most variables in Pareto optimal solutions are zero, making it difficult for algorithms to identify non-zero variables efficiently. This paper is dedicated to addressing the challenges posed by SLMOPs. To start, we introduce innovative objective functions customized to mine maximum and minimum candidate sets. This substantial enhancement dramatically improves the efficacy of frequent pattern mining. In this way, selecting candidate sets is no longer based on the quantity of non-zero variables they contain but on a higher proportion of non-zero variables within specific dimensions. Additionally, we unveil a novel approach to association rule mining, which delves into the intricate relationships between non-zero variables. This novel methodology aids in identifying sparse distributions that can potentially expedite reductions in the objective function value. We extensively tested our algorithm across eight benchmark problems and four real-world SLMOPs. The results demonstrate that our approach achieves competitive solutions across various challenges.
The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach to discover several design patterns in a legacy system at a time. The attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During the decomposition process, graphs corresponding to the design patterns are decomposed into subgraphs, some of which are graphs corresponding to the elemental design patterns. The composition process tries to obtain a subgraph isomorphism of the matched graph once a subgraph isomorphism of each subgraph has been obtained. Due to the common structures between design patterns, the proposed approach can reduce the number of times entities and relations are matched. Compared with existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs. Key words: design pattern mining; attributed relational graph; subgraph isomorphism. CLC number: TP 311.5. Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019). Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
A recommender system is an approach used in e-commerce to give users a smoother experience. Sequential pattern mining is a data mining technique used to identify co-occurrence relationships by taking into account the order of transactions. This work presents the implementation of sequential pattern mining for recommender systems within the domain of e-commerce. It executes the systolic tree algorithm for mining frequent patterns to yield feasible rules for the recommender system. The objective of feature selection is to pick a feature subset having the least feature similarity as well as the highest relevancy with the target class. This mitigates the feature vector's dimensionality by eliminating redundant, irrelevant, or noisy data. This work presents a new hybrid recommender system based on optimized feature selection and the systolic tree. The features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), with feature selection performed using River Formation Dynamics (RFD) and the Particle Swarm Optimization (PSO) algorithm. The systolic tree is used for pattern mining, and the recommendations are given based on the mined patterns. The proposed methods were evaluated using the MovieLens dataset, and the experimental outcomes confirmed the efficiency of the techniques. With RFD feature selection, systolic tree frequent pattern mining, and collaborative filtering, a precision of 0.89 was achieved.
It is of great significance to improve the efficiency of railway production and operation by realizing fault knowledge association through an efficient data mining algorithm. However, high utility quantitative frequent pattern mining algorithms in the field of data mining still suffer from poor time and memory performance and do not scale easily. To address these needs, we propose a related degree-based frequent pattern mining algorithm, named Related High Utility Quantitative Item set Mining (RHUQI-Miner), to enable the effective mining of railway fault data. The algorithm constructs the item-related degree structure of fault data and gives a pruning optimization strategy to find frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. Subsequently, it uses a fixed pattern length strategy to modify the utility information of the items in the mining process so that the algorithm can control the length of the output frequent patterns according to the actual data situation, further improving its performance and practicability. The experimental results on a real fault dataset show that RHUQI-Miner can effectively reduce the time and memory consumption of the mining process, thus providing data support for differentiated and precise maintenance strategies.
Faster internet, IoT, and social media have reformed the conventional web into a collaborative web, resulting in enormous amounts of user-generated content. Several studies focus on such content; however, they mainly focus on textual data, thus undermining the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize the metadata of user-generated content. First, we scrape 2.1 million tweets from Twitter between Nov-2020 and Sep-2021 about 100 hashtag keywords and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time-series in three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time-series with three machine learning algorithms to mine temporal patterns with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that the metadata of user-generated content contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. The dataset and code are publicly available; the link is given in the dataset section.
Mineral resources in the Asian continent and its mining industry play a significant role in the economic growth and industrialization of both Asia and the world. The Asian continent boasts the most comprehensive range of minerals, with reserves of at least 38 of over 80 widely used minerals worldwide accounting for more than 30% of the global total reserves. The Asian continent experienced three main tectonic evolution and mineralization stages: the Precambrian, the Paleozoic, and the Mesozoic to Cenozoic. The abundant mineral resources in this continent can be divided into seven first-order metallogenic belts (metallogenic domains), 18 second-order metallogenic belts (metallogenic provinces), 61 third-order metallogenic belts (metallogenic zones), and nine main minerogenetic series. The Asian continent exhibits the most significant metallogenic specialization among all continents. Specifically, its granite belts manifest pronounced metallogenic specialization of tin, rare metals, and porphyry Cu-Au-Mo deposits. Its mafic-ultramafic rock belts and ophiolite belts display notable metallogenic specialization of lateritic nickel deposits and magmatic type chromite deposits, while its Mesozoic to Cenozoic basalt belts show remarkable metallogenic specialization of lateritic bauxite deposits. Consequently, many giant metallogenic belts were formed, including the Southeast Asian tin belt, the Qinghai-Xizang Plateau rare metal metallogenic belt, the Tethyan porphyry Cu-Au-Mo metallogenic belt, the circum-Pacific porphyry Cu-Au-Mo metallogenic belt, the Southeast Asian lateritic bauxite metallogenic belt, the Deccan Plateau lateritic bauxite metallogenic belt in India, the Southeast Asian lateritic nickel metallogenic belt, and the Tethyan magmatic type chromite metallogenic belt, all of which are significant metallogenic belts in the Asian continent. Future mineral exploration in Asia should focus primarily on the Precambrian mineralization of ancient cratons, the Paleozoic mineralization of the Central Asian-Mongolian orogenic belt, and the Mesozoic to Cenozoic mineralization of the Tethyan and circum-Pacific mobile belts. Asia's mining industry not only underpins its own economic growth but also propels global economic development and industrialization, contributing significantly to the world economy. Asia boasts the highest production value of minerals, the largest annual production of minerals, and the greatest trade value of mineral products among all the continents, having emerged as the trade center of global mineral products and the center of the mining industry economy. China is identified as one of the few countries that possess the most comprehensive range of minerals, and its mining industry has supported and driven the economic development and industrialization of Asia and even the world. Standing as the largest mineral producer worldwide, China ranked first in the world in the production of 28 mineral commodities in 2022. In addition, China exhibits the highest annual production value of minerals and the largest trade value of mineral products among all countries. Therefore, China's demand for global mineral products influences the global supply and demand patterns of minerals and the world economic situation.
The task of mining erasable patterns (EPs) is a data mining problem that can help factory managers come up with the best product plans for the future. This problem has been studied by many scientists in recent times, and many approaches for mining EPs have been proposed. Erasable closed patterns (ECPs) are an abbreviated representation of EPs and can be considered condensed representations of EPs without information loss. Current methods of mining ECPs identify huge numbers of such patterns, whereas intelligent systems only need a small number. A ranking process therefore needs to be applied prior to use, which causes a reduction in efficiency. To overcome this limitation, this study presents a robust method for mining top-rank-k ECPs in which the mining and ranking phases are combined into a single step. First, we propose a virtual-threshold-based pruning strategy to improve the mining speed. Based on this strategy and the dPidset structure, we then develop a fast algorithm for mining top-rank-k ECPs, which we call TRK-ECP. Finally, we carry out experiments to compare the runtime of our TRK-ECP algorithm with two algorithms modified from dVM and TEPUS (Top-rank-k Erasable Pattern mining Using the Subsume concept), which are state-of-the-art algorithms for mining top-rank-k EPs. The results for the running time confirm that TRK-ECP outperforms the other experimental approaches in terms of mining the top-rank-k ECPs.
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
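The pattern-growth idea named in the abstract above can be illustrated with a minimal PrefixSpan-style sketch. It is a simplification under stated assumptions, not the authors' implementation: each sequence element is a single item (no itemsets), support is an absolute count, and the function name `prefixspan` and the toy data are hypothetical.

```python
from collections import Counter


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for all frequent sequential patterns.

    Assumption: every sequence is a list of single items, and min_support
    is an absolute number of sequences.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs, i.e. the
        # suffixes that remain after matching `prefix` in each sequence.
        counts = Counter()
        for sid, start in projected:
            # Count each item at most once per sequence.
            counts.update(set(sequences[sid][start:]))
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = counts[item]
            # Grow the pattern: project on the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy database of four sequences.
toy_db = [list("abc"), list("acb"), list("ab"), list("bc")]
toy_patterns = prefixspan(toy_db, min_support=3)
```

Instead of generating candidate subsequences and testing them against the whole database, each recursive call only scans the suffixes that already contain the current prefix, which is the main difference from the generation-and-test class.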
Design patterns are often used in the development of object-oriented software. They offer reusable abstract information that is helpful in solving recurring design problems. Detecting design patterns is beneficial to the comprehension and maintenance of object-oriented software systems. Several pattern detection techniques based on static analysis often encounter problems when detecting design patterns that have identical structures. In this study, we attempt to detect software design patterns by using software metrics and classification-based techniques. Our study is conducted in two phases: creation of a metrics-oriented dataset and detection of software design patterns. The datasets are prepared by using software metrics for the learning of classifiers. Then, pattern detection is performed by using classification-based techniques. To evaluate the proposed method, experiments are conducted using three open source software programs, JHotDraw, QuickUML, and JUnit, and the results are analyzed.
Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data by a novel method adopting the concepts of clustering and sequential pattern mining. The algorithms used logically split the trajectory span area into clusters and then apply the k-means algorithm over these clusters until the squared error is minimized. The new method applies a threshold to obtain active clusters and arranges them in descending order based on the number of trajectories passing through. From these active clusters, inter-cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments conducted using real datasets shows that the proposed method is about five times better than the existing ones. A comparison is made with the results of other algorithms and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find the efficient threshold value for the optimum plot of frequent trajectories. The results are analyzed and found to be superior to the existing ones. With minor alterations, this approach may be of relevance in finding alternate paths in busy networks (congestion control), finding the frequent paths of migratory birds, finding the frequent path of balls in certain games, or even predicting the next level of pattern characteristics in time series data.
Disinformation, often known as fake news, is a major issue that has received a lot of attention lately. Many researchers have proposed effective means of detecting and addressing it. Current machine and deep learning based methodologies for classification/detection of fake news are content-based, network (propagation) based, or multimodal methods that combine both textual and visual information. We introduce here a framework, called FNACSPM, based on sequential pattern mining (SPM), for fake news analysis and classification. In this framework, six publicly available datasets, containing a diverse range of fake and real news, and their combination, are first transformed into a proper format. Then, algorithms for SPM are applied to the transformed datasets to extract frequent patterns (and rules) of words, phrases, or linguistic features. The obtained patterns capture distinctive characteristics associated with fake or real news content, providing valuable insights into the underlying structures and commonalities of misinformation. Subsequently, the discovered frequent patterns are used as features for fake news classification. This framework is evaluated with eight classifiers, and their performance is assessed with various metrics. Extensive experiments were performed, and the results obtained show that FNACSPM outperformed other state-of-the-art approaches for fake news classification, and that it expedites the classification task with high accuracy.
A holistic understanding of wind behaviour over space, time and height is essential for wind energy harvesting applications. This study presents a novel approach for mapping frequent wind profile patterns using multidimensional sequential pattern mining (MDSPM). This study is illustrated with a time series of 24 years of European Centre for Medium-Range Weather Forecasts European Reanalysis-Interim gridded (0.125°×0.125°) wind data for the Netherlands every 6 h and at six height levels. The wind data were first transformed into two spatio-temporal sequence databases (for speed and direction, respectively). Then, the Linear time Closed Itemset Miner Sequence algorithm was used to extract the multidimensional sequential patterns, which were then visualized using a 3D wind rose, a circular histogram and a geographical map. These patterns were further analysed to determine their wind shear coefficients and turbulence intensities as well as their spatial overlap with current areas with wind turbines. Our analysis identified four frequent wind profile patterns. One of them is highly suitable for harvesting wind energy at a height of 128 m, and 68.97% of the geographical area covered by this pattern already contains wind turbines. This study shows that the proposed approach is capable of efficiently extracting meaningful patterns from complex spatio-temporal datasets.
The discovery of the gradual moving object clusters pattern from trajectory streams allows movement behavior to be characterized in a real-time environment, which enables new applications and services. Since trajectory streams are rapidly evolving, continuously created, and cannot be stored indefinitely in memory, the existing approaches designed for static trajectory datasets are not suitable for discovering the gradual moving object clusters pattern from trajectory streams. This paper proposes a novel algorithm for discovering the gradual moving object clusters pattern from trajectory streams using sliding window models. By processing the trajectory data in the current window, the mining algorithm can capture the trend and evolution of the moving object clusters pattern. First, the density peaks clustering algorithm is exploited to identify the clusters of different snapshots. The stable relationship between relatively few moving objects is used to improve the clustering efficiency. Then, by intersecting clusters from different snapshots, the gradual moving object clusters pattern is updated. The relationship of clusters between adjacent snapshots and the gradual property are utilized to accelerate the updating process. Finally, experimental results on two real datasets demonstrate that our algorithm is effective and efficient.
A frequent trajectory pattern mining algorithm is proposed to learn object activities and classify trajectories in intelligent visual surveillance systems. The distribution patterns of the trajectories are generated by an Apriori-based frequent pattern mining algorithm, and the trajectories are classified by the frequent trajectory patterns generated. In addition, a fuzzy c-means (FCM) based learning algorithm and a mean shift based clustering procedure are used to construct the representation of trajectories. The algorithm can be further used to describe activities and identify anomalies. The experiments on two real scenes show that the algorithm is effective.
Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they need multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from the recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
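The sliding-window setting described in the abstract above can be sketched as follows. This is not SWFP-Miner: the abstract does not state its refined weight definition, so the sketch assumes a simple one (mean item weight times support within the window), and the class name `SlidingWindowCounter`, the `max_len` cap, and the weights in the usage lines are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Keep itemset counts for only the most recent `window_size` transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len      # assumption: cap on itemset length
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Process one arriving transaction in a single pass."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            # Evict the oldest transaction and undo its contribution.
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def weighted_support(self, itemset, weights):
        """Assumed definition: mean item weight times support in the window."""
        key = tuple(sorted(itemset))
        support = self.counts[key] / len(self.window)
        mean_weight = sum(weights[i] for i in key) / len(key)
        return mean_weight * support


# Hypothetical usage: a window of the three most recent transactions.
counter = SlidingWindowCounter(window_size=3)
for t in (["a", "b"], ["a", "c"], ["b", "c"], ["a", "b"]):
    counter.add(t)
item_weights = {"a": 0.5, "b": 1.0, "c": 0.25}
ws_ab = counter.weighted_support(("a", "b"), item_weights)
```

Each transaction is seen once when it arrives and once when it expires, so no rescans of the stream are needed. Note that this simple weight definition does not by itself guarantee downward closure (a superset can have a higher mean weight than its subset), which is the property the abstract's refined definition is meant to keep.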
Purpose: With the deepening integration of rail transit systems (encompassing urban rail, regional railways, trunk lines and medium-low capacity transportation), the four-network integration imposes higher demands on operation and maintenance systems regarding cross-modal coordination, full-element interconnectivity and dynamic responsiveness. Design/methodology/approach: This paper, based on policy directives and engineering practices, analyzes the operational maintenance characteristics of urban rail traction systems from perspectives including device interconnectivity and fault data mining. A non-intrusive high-frequency diagnostic device independent of vehicle control is proposed, informed by practical onboard operation experience. This innovation significantly enhances diagnostic accuracy for components requiring high sampling frequency, while integrating "Flash" storage with far greater capacity than conventional control chips. Findings: This article systematically introduces the key points and diagnostic methods for typical faults in urban rail traction systems. Through rational diagnostic algorithms combined with high-precision, high-storage diagnostic instrumentation, the overall safety and reliability of urban rail traction systems have been improved. The proposed non-intrusive high-frequency diagnostic solution has been validated across multiple rail lines. Originality/value: This paper introduces an innovative non-intrusive diagnostic device with a dual-channel design for multi-system compatibility and a high-speed acquisition architecture enabling 400 kHz sampling. Its originality stems from the independent, high-fidelity capture of microsecond-level transient faults such as IGBT shoot-through and pantograph arcing. Validated in operational environments, this approach provides a significant leap in diagnostic precision, directly enhancing traction system availability and operational safety by enabling precise fault localization and intelligent, adaptive protection strategies.
The volume of trajectory data has become tremendously large in recent years. How to effectively and efficiently maintain and compute such trajectory data has become a challenging task. In this paper, we propose a trajectory spatial and temporal compression framework, namely CLEAN. The key to spatial compression is to mine meaningful frequent trajectory patterns on the road network. By treating the mined patterns as dictionary items, long trajectories can be encoded by shorter paths, thus leading to a smaller space cost. An error-bounded temporal compression is carefully designed on top of the identified spatial patterns for a much lower space cost. Meanwhile, the patterns are also utilized to improve the performance of two trajectory applications, range query and clustering, without decompression overhead. Extensive experiments on real trajectory datasets validate that CLEAN significantly outperforms existing state-of-the-art approaches in terms of spatial-temporal compression and trajectory applications.
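The dictionary idea in the abstract above (mined frequent paths become dictionary items that stand in for longer runs of road segments) can be sketched as follows. This is an illustration under assumptions, not CLEAN itself: the greedy longest-match strategy, the function names `encode_trajectory` and `decode_trajectory`, and the segment and code labels are hypothetical, and the temporal compression step is not covered.

```python
def encode_trajectory(trajectory, dictionary):
    """Replace sub-paths found in `dictionary` by their codes.

    trajectory: list of road-segment ids.
    dictionary: {tuple of segment ids: code}; codes are assumed not to
    collide with segment ids. Matching is greedy, longest pattern first.
    """
    max_len = max((len(path) for path in dictionary), default=0)
    encoded, i = [], 0
    while i < len(trajectory):
        longest = min(max_len, len(trajectory) - i)
        for length in range(longest, 1, -1):
            code = dictionary.get(tuple(trajectory[i:i + length]))
            if code is not None:
                encoded.append(code)
                i += length
                break
        else:
            # No dictionary pattern starts here: keep the raw segment.
            encoded.append(trajectory[i])
            i += 1
    return encoded


def decode_trajectory(encoded, dictionary):
    """Expand codes back into the original segment sequence."""
    reverse = {code: path for path, code in dictionary.items()}
    decoded = []
    for symbol in encoded:
        decoded.extend(reverse.get(symbol, (symbol,)))
    return decoded


# Hypothetical mined patterns and one trajectory over road segments.
patterns = {("e1", "e2", "e3"): "P0", ("e5", "e6"): "P1"}
raw = ["e1", "e2", "e3", "e4", "e5", "e6"]
compressed = encode_trajectory(raw, patterns)
```

Because each code maps to exactly one path, the spatial encoding is lossless, and an application that only needs to know which frequent paths a trajectory uses can work on the encoded form directly.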
Despite advances in technological complexity and efforts, software repository maintenance requires reusing data to reduce effort and complexity. However, the increasing ambiguity, irrelevance, and bugs encountered while extracting similar data during software development generate a large amount of additional data from the data that reside in repositories. Thus, there is a need for a repository mining technique for relevant and bug-free data prediction. This paper proposes a fault prediction approach using a data mining technique to find good predictors for high-quality software. To predict errors in the mined data, the Apriori algorithm was used to discover association rules, with confidence fixed at more than 40% and support at at least 30%. A pruning strategy based on evaluation measures was adopted. Next, rules were extracted from three projects in different domains; the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values. To evaluate the proposed approach, we conducted an experimental study comparing the proposed rules with existing ones using four different industrial projects. The evaluation showed that the results of our proposal are promising. Practitioners and developers can utilize these rules for defect prediction during early software development.
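The thresholds quoted in the abstract above (support of at least 30%, confidence above 40%) can be made concrete with a small level-wise Apriori sketch. It is a generic illustration, not the authors' pipeline: the function names and the toy transactions (module attributes such as `high_loc`) are hypothetical, and the pruning by evaluation measures mentioned in the abstract is not reproduced.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while level:
        current = {}
        for candidate in level:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                current[candidate] = support
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets, then prune by downward closure.
        level = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                level.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) with confidence > min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence > min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical transactions: attributes observed for five software modules.
modules = [
    {"high_loc", "high_cc", "defect"},
    {"high_loc", "defect"},
    {"high_cc"},
    {"high_loc", "high_cc", "defect"},
    {"low_loc"},
]
frequent_sets = apriori(modules, min_support=0.30)
defect_rules = association_rules(frequent_sets, min_confidence=0.40)
```

In this toy data the rule `high_loc -> defect` holds in every module that has `high_loc`, so it passes both thresholds, while `low_loc` appears in only one of five modules and is dropped at the support stage.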
Funding (multilevel pattern mining architecture for network management): Funded by the Enterprise Ireland Innovation Partnership Programme with Ericsson under grant agreement IP/2011/0135 [6]; supported by the National Natural Science Foundation of China (No. 61373131, 61303039, 61232016, 61501247); the PAPD and CICAEET funds.
Funding (sparse large-scale multi-objective optimization): Supported by the Open Project of Xiangjiang Laboratory (22XJ02003); the University Fundamental Research Fund (23-ZZCX-JDZ-28, ZK21-07); the National Science Fund for Outstanding Young Scholars (62122093); the National Natural Science Foundation of China (72071205); the Hunan Graduate Research Innovation Project (CX20230074); the Hunan Natural Science Foundation Regional Joint Project (2023JJ50490); the Science and Technology Project for Young and Middle-aged Talents of Hunan (2023TJZ03); and the Science and Technology Innovation Program of Hunan Province (2023RC1002).
文摘Sparse large-scale multi-objective optimization problems(SLMOPs)are common in science and engineering.However,the large-scale problem represents the high dimensionality of the decision space,requiring algorithms to traverse vast expanse with limited computational resources.Furthermore,in the context of sparse,most variables in Pareto optimal solutions are zero,making it difficult for algorithms to identify non-zero variables efficiently.This paper is dedicated to addressing the challenges posed by SLMOPs.To start,we introduce innovative objective functions customized to mine maximum and minimum candidate sets.This substantial enhancement dramatically improves the efficacy of frequent pattern mining.In this way,selecting candidate sets is no longer based on the quantity of nonzero variables they contain but on a higher proportion of nonzero variables within specific dimensions.Additionally,we unveil a novel approach to association rule mining,which delves into the intricate relationships between non-zero variables.This novel methodology aids in identifying sparse distributions that can potentially expedite reductions in the objective function value.We extensively tested our algorithm across eight benchmark problems and four real-world SLMOPs.The results demonstrate that our approach achieves competitive solutions across various challenges.
Abstract: The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach to discover several design patterns in a legacy system at a time. The attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During the decomposition process, graphs corresponding to the design patterns are decomposed into subgraphs, some of which are graphs corresponding to the elemental design patterns. The composition process tries to get the subgraph isomorphism of the matched graph if the subgraph isomorphism of each subgraph is obtained. Due to the common structures between design patterns, the proposed approach can reduce the matching times of entities and relations. Compared with the existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs.
Key words: design pattern mining; attributed relational graph; subgraph isomorphism
CLC number: TP 311.5
Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019).
Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
Abstract: Recommender systems are used in e-commerce to give users a smoother experience. Sequential pattern mining is a data mining technique used to identify co-occurrence relationships while taking the order of transactions into account. This work presents an implementation of sequential pattern mining for recommender systems in the e-commerce domain. It uses the systolic tree algorithm to mine frequent patterns and yield feasible rules for the recommender system. The objective of feature selection is to pick a feature subset with the least feature similarity and the highest relevancy to the target class. This reduces the dimensionality of the feature vector by eliminating redundant, irrelevant, or noisy data. This work presents a new hybrid recommender system based on optimized feature selection and the systolic tree. The features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), and feature selection used River Formation Dynamics (RFD) and the Particle Swarm Optimization (PSO) algorithm. The systolic tree is used for pattern mining, and the recommendations are based on the mined patterns. The proposed methods were evaluated using the MovieLens dataset, and the experimental outcomes confirmed the efficiency of the techniques. RFD feature selection combined with systolic tree frequent pattern mining and collaborative filtering achieved a precision of 0.89.
Funding: Supported by the Research on Key Technologies and Typical Applications of Big Data in Railway Production and Operation (P2023S006) and the Fundamental Research Funds for the Central Universities (2022JBZY023).
Abstract: Associating fault knowledge through efficient data mining algorithms is of great significance for improving the efficiency of railway production and operation. However, high utility quantitative frequent pattern mining algorithms still suffer from poor time and memory performance and are not easy to scale up. To address these needs, we propose a related degree-based frequent pattern mining algorithm, named Related High Utility Quantitative Item set Mining (RHUQI-Miner), to enable the effective mining of railway fault data. The algorithm constructs the item-related degree structure of fault data and gives a pruning optimization strategy to find frequent patterns with higher related degrees, reducing redundant and invalid frequent patterns. Subsequently, it uses a fixed pattern length strategy to modify the utility information of items during mining, so that the algorithm can control the length of the output frequent patterns according to the actual data and further improve its performance and practicability. The experimental results on a real fault dataset show that RHUQI-Miner can effectively reduce the time and memory consumption of the mining process, thus providing data support for differentiated and precise maintenance strategies.
Funding: Supported by the National Natural Science Foundation of China (grant no. 61573328).
Abstract: Faster internet, IoT, and social media have transformed the conventional web into a collaborative web, resulting in enormous amounts of user-generated content. Several studies focus on such content; however, they mainly address textual data and overlook the importance of metadata. To fill this gap, we provide a temporal pattern mining framework to model and utilize the metadata of user-generated content. First, we scrape 2.1 million tweets about 100 hashtag keywords posted on Twitter between November 2020 and September 2021 and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time series in three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time series with three machine learning algorithms to mine temporal patterns with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that the metadata of user-generated content contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. The dataset and code are publicly available; the link is given in the dataset section.
Funding: Funded by a geological survey project of the China Geological Survey (DD20211404).
Abstract: The mineral resources and mining industry of the Asian continent play a significant role in the economic growth and industrialization of both Asia and the world. Asia has the most comprehensive range of minerals: of the more than 80 minerals widely used worldwide, its reserves of at least 38 account for more than 30% of the global total. The continent experienced three main stages of tectonic evolution and mineralization: the Precambrian, the Paleozoic, and the Mesozoic to Cenozoic. Its abundant mineral resources can be divided into seven first-order metallogenic belts (metallogenic domains), 18 second-order metallogenic belts (metallogenic provinces), 61 third-order metallogenic belts (metallogenic zones), and nine main minerogenetic series. Asia exhibits the most significant metallogenic specialization among all continents. Specifically, its granite belts show pronounced metallogenic specialization in tin, rare metals, and porphyry Cu-Au-Mo deposits. Its mafic-ultramafic rock belts and ophiolite belts show notable specialization in lateritic nickel deposits and magmatic-type chromite deposits, while its Mesozoic to Cenozoic basalt belts show remarkable specialization in lateritic bauxite deposits. Consequently, many giant metallogenic belts were formed, including the Southeast Asian tin belt, the Qinghai-Xizang Plateau rare metal metallogenic belt, the Tethyan porphyry Cu-Au-Mo metallogenic belt, the circum-Pacific porphyry Cu-Au-Mo metallogenic belt, the Southeast Asian lateritic bauxite metallogenic belt, the Deccan Plateau lateritic bauxite metallogenic belt in India, the Southeast Asian lateritic nickel metallogenic belt, and the Tethyan magmatic-type chromite metallogenic belt, all of which are significant metallogenic belts in Asia. Future mineral exploration in Asia should focus primarily on the Precambrian mineralization of ancient cratons, the Paleozoic mineralization of the Central Asian-Mongolian orogenic belt, and the Mesozoic to Cenozoic mineralization of the Tethyan and circum-Pacific mobile belts. Asia's mining industry not only underpins its own economic growth but also propels global economic development and industrialization, contributing significantly to the world economy. Asia has the highest production value of minerals, the largest annual production of minerals, and the greatest trade value of mineral products among all continents, and has emerged as the trade center for global mineral products and the center of the mining economy. China is one of the few countries that possess the most comprehensive range of minerals, and its mining industry has supported and driven the economic development and industrialization of Asia and even the world. As the largest mineral producer worldwide, China ranked first in the world in the production of 28 mineral commodities in 2022. China also has the highest annual production value of minerals and the largest trade value of mineral products among all countries. Therefore, China's demand for global mineral products influences the global supply and demand patterns of minerals and the world economic situation.
Abstract: Mining erasable patterns (EPs) is a data mining problem that can help factory managers come up with the best product plans for the future. The problem has been widely studied in recent years, and many approaches for mining EPs have been proposed. Erasable closed patterns (ECPs) are a condensed representation of EPs without information loss. Current methods of mining ECPs identify huge numbers of such patterns, whereas intelligent systems only need a small number. A ranking process therefore needs to be applied prior to use, which reduces efficiency. To overcome this limitation, this study presents a robust method for mining top-rank-k ECPs in which the mining and ranking phases are combined into a single step. First, we propose a virtual-threshold-based pruning strategy to improve the mining speed. Based on this strategy and the dPidset structure, we then develop a fast algorithm for mining top-rank-k ECPs, which we call TRK-ECP. Finally, we carry out experiments to compare the runtime of our TRK-ECP algorithm with two algorithms modified from dVM and TEPUS (Top-rank-k Erasable Pattern mining Using the Subsume concept), which are state-of-the-art algorithms for mining top-rank-k EPs. The runtime results confirm that TRK-ECP outperforms the other approaches tested in mining top-rank-k ECPs.
Abstract: Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
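The pattern-growth idea named in this abstract can be illustrated with a minimal sketch in the spirit of PrefixSpan. It is a simplification and not the authors' implementation: it assumes every sequence element is a single item (no itemset elements), keeps projected databases as (sequence index, offset) pairs, and the function name `prefixspan` and its parameters are chosen here for illustration only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items.

    Simplified pattern-growth sketch: each frequent prefix is extended
    only with items that are frequent in its projected database.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the
        # suffix of each sequence that follows the current prefix.
        counts = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                counts[item].add(seq_id)

        for item, seq_ids in sorted(counts.items()):
            if len(seq_ids) < min_support:
                continue  # infrequent extension: never grown further
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)

            # Project on the first occurrence of `item` in each suffix.
            new_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_id, pos + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
    for pattern, support in sorted(prefixspan(db, min_support=2).items()):
        print(pattern, support)
```

No candidate sequences are generated and tested against the whole database here; each recursive call scans only the suffixes that still contain the current prefix, which is the contrast with generation-and-test methods that the abstract draws.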
Abstract: Design patterns are often used in the development of object-oriented software. They offer reusable abstract information that is helpful in solving recurring design problems. Detecting design patterns is beneficial to the comprehension and maintenance of object-oriented software systems. Pattern detection techniques based on static analysis often encounter problems when detecting design patterns that have identical structures. In this study, we attempt to detect software design patterns by using software metrics and classification-based techniques. Our study is conducted in two phases: creation of a metrics-oriented dataset and detection of software design patterns. The datasets are prepared by using software metrics for the learning of classifiers. Then, pattern detection is performed by using classification-based techniques. To evaluate the proposed method, experiments are conducted using three open source software programs, JHotDraw, QuickUML, and JUnit, and the results are analyzed.
Funding: Research supported by the TATA Consultancy Services scholarship.
Abstract: Data mining is a powerful emerging technology that helps to extract hidden information from a huge volume of historical data. This paper is concerned with finding the frequent trajectories of moving objects in spatio-temporal data by a novel method that adopts the concepts of clustering and sequential pattern mining. The algorithms logically split the trajectory span area into clusters and then apply the k-means algorithm over these clusters until the squared error is minimized. The new method applies a threshold to obtain active clusters and arranges them in descending order of the number of trajectories passing through them. From these active clusters, inter-cluster patterns are found by a sequential pattern mining technique. The process is repeated until all the active clusters are linked. The clusters thus linked in sequence are the frequent trajectories. A set of experiments conducted using real datasets shows that the proposed method is about five times better than the existing ones. A comparison is made with the results of other algorithms and their variation is analyzed by statistical methods. Further, tests of significance are conducted with ANOVA to find the efficient threshold value for the optimum plot of frequent trajectories. The results are analyzed and found to be superior to the existing ones. This approach may be of relevance in finding alternate paths in busy networks (congestion control), finding the frequent paths of migratory birds, finding the frequent path of balls in certain games, or, with minor alterations, predicting the next level of pattern characteristics in time series data.
Abstract: Disinformation, often known as fake news, is a major issue that has received a lot of attention lately. Many researchers have proposed effective means of detecting and addressing it. Current machine learning and deep learning methodologies for the classification and detection of fake news are content-based, network (propagation) based, or multimodal methods that combine both textual and visual information. We introduce here a framework, called FNACSPM, based on sequential pattern mining (SPM), for fake news analysis and classification. In this framework, six publicly available datasets, containing a diverse range of fake and real news, and their combination, are first transformed into a proper format. Then, algorithms for SPM are applied to the transformed datasets to extract frequent patterns (and rules) of words, phrases, or linguistic features. The obtained patterns capture distinctive characteristics associated with fake or real news content, providing valuable insights into the underlying structures and commonalities of misinformation. Subsequently, the discovered frequent patterns are used as features for fake news classification. This framework is evaluated with eight classifiers, and their performance is assessed with various metrics. Extensive experiments were performed, and the results show that FNACSPM outperformed other state-of-the-art approaches for fake news classification and that it expedites the classification task with high accuracy.
Funding: This work was supported by the Malaysian Ministry of Education (SLAI) and Universiti Teknologi Malaysia (UTM).
Abstract: A holistic understanding of wind behaviour over space, time and height is essential for wind energy harvesting. This study presents a novel approach for mapping frequent wind profile patterns using multidimensional sequential pattern mining (MDSPM). The approach is illustrated with a 24-year time series of European Centre for Medium-Range Weather Forecasts European Reanalysis-Interim gridded (0.125°×0.125°) wind data for the Netherlands, sampled every 6 h at six height levels. The wind data were first transformed into two spatio-temporal sequence databases (for speed and direction, respectively). Then, the Linear time Closed Itemset Miner Sequence algorithm was used to extract the multidimensional sequential patterns, which were then visualized using a 3D wind rose, a circular histogram and a geographical map. These patterns were further analysed to determine their wind shear coefficients and turbulence intensities as well as their spatial overlap with areas that currently have wind turbines. Our analysis identified four frequent wind profile patterns. One of them is highly suitable for harvesting wind energy at a height of 128 m, and 68.97% of the geographical area covered by this pattern already contains wind turbines. This study shows that the proposed approach is capable of efficiently extracting meaningful patterns from complex spatio-temporal datasets.
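For reference, the two quantities named in this abstract are commonly defined as follows. The abstract does not say which formulation the authors used, so the power-law profile and all symbols here ($v_1$, $v_2$ for wind speeds at heights $h_1$, $h_2$; $\sigma_v$ and $\bar{v}$ for the standard deviation and mean of wind speed over an averaging period) are assumptions taken from standard wind-engineering usage.

```latex
% Wind shear coefficient from the power-law profile v_2 / v_1 = (h_2 / h_1)^{\alpha}
\alpha = \frac{\ln\left(v_2 / v_1\right)}{\ln\left(h_2 / h_1\right)}

% Turbulence intensity over an averaging period
TI = \frac{\sigma_v}{\bar{v}}
```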
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 41471371.
Abstract: Discovering gradual moving object clusters patterns from trajectory streams allows movement behavior to be characterized in real time, which enables new applications and services. Since trajectory streams evolve rapidly, are created continuously, and cannot be stored indefinitely in memory, existing approaches designed for static trajectory datasets are not suitable for discovering gradual moving object clusters patterns from trajectory streams. This paper proposes a novel algorithm for discovering gradual moving object clusters patterns from trajectory streams using sliding window models. By processing the trajectory data in the current window, the mining algorithm can capture the trend and evolution of the moving object clusters pattern. First, the density peaks clustering algorithm is used to identify the clusters of different snapshots. The stable relationship between relatively few moving objects is used to improve the clustering efficiency. Then, by intersecting clusters from different snapshots, the gradual moving object clusters pattern is updated. The relationship of clusters between adjacent snapshots and the gradual property are used to accelerate the updating process. Finally, experimental results on two real datasets demonstrate that our algorithm is effective and efficient.
Funding: National High-Tech Research and Development Plan of China (No. 2003AA1Z2130); Science and Technology Project of Zhejiang Province of China (No. 2005C1100102).
Abstract: A frequent trajectory pattern mining algorithm is proposed to learn object activities and classify trajectories in an intelligent visual surveillance system. The distribution patterns of the trajectories were generated by an Apriori-based frequent pattern mining algorithm, and the trajectories were classified by the frequent trajectory patterns generated. In addition, a fuzzy c-means (FCM) based learning algorithm and a mean shift based clustering procedure were used to construct the representation of trajectories. The algorithm can be further used to describe activities and identify anomalies. Experiments on two real scenes show that the algorithm is effective.
Abstract: Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they need multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from the recent data. A new refined weight definition is proposed to keep the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
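The interplay of a sliding window, item weights, and downward closure described in this abstract can be sketched as below. This is not SWFP-Miner: the abstract does not give the authors' refined weight definition or their two pruning strategies, so the sketch assumes a plain average-weight definition of weighted support and a single pruning bound based on the maximum item weight. The class `SlidingWindowMiner` and all of its method names are hypothetical.

```python
from collections import deque
from itertools import combinations


class SlidingWindowMiner:
    """Weighted frequent itemsets over the most recent transactions.

    Assumed definition: weighted support = (average item weight) * (number
    of window transactions containing the itemset).
    """

    def __init__(self, window_size, weights):
        # deque(maxlen=...) evicts the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.weights = weights

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def support(self, itemset):
        return sum(1 for t in self.window if itemset <= t)

    def weighted_support(self, itemset):
        avg_weight = sum(self.weights[i] for i in itemset) / len(itemset)
        return avg_weight * self.support(itemset)

    def mine(self, min_wsup):
        max_weight = max(self.weights.values())
        items = sorted({i for t in self.window for i in t})
        level = [frozenset([i]) for i in items]
        result = {}
        while level:
            survivors = []
            for itemset in level:
                # max_weight * support bounds the weighted support of this
                # itemset and of all its supersets, so pruning on it is safe.
                if max_weight * self.support(itemset) < min_wsup:
                    continue
                survivors.append(itemset)
                wsup = self.weighted_support(itemset)
                if wsup >= min_wsup:
                    result[itemset] = wsup
            # Join surviving k-itemsets into (k+1)-itemset candidates.
            candidates = {a | b for a, b in combinations(survivors, 2)
                          if len(a | b) == len(a) + 1}
            level = sorted(candidates, key=sorted)
        return result


if __name__ == "__main__":
    miner = SlidingWindowMiner(3, {"a": 1.0, "b": 0.5, "c": 0.2})
    for t in (["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]):
        miner.add(t)
    for itemset, wsup in miner.mine(min_wsup=1.0).items():
        print(sorted(itemset), round(wsup, 3))
```

In the usage example, {c} alone falls below the threshold while {a, c} exceeds it, so average-weight support is not downward closed. That is why the sketch prunes on an upper bound rather than on the weighted support itself, and it is the kind of problem the refined weight definition in the abstract is said to address.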
Funding: Supported by the Fund of China Academy of Railway Sciences Corporation Limited (2023YJ342).
Abstract:
Purpose: With the deepening integration of rail transit systems (urban rail, regional railways, trunk lines and medium-low capacity transportation), four-network integration imposes higher demands on operation and maintenance systems regarding cross-modal coordination, full-element interconnectivity and dynamic responsiveness.
Design/methodology/approach: Based on policy directives and engineering practice, this paper analyzes the operational maintenance characteristics of urban rail traction systems from perspectives including device interconnectivity and fault data mining. A non-intrusive high-frequency diagnostic device independent of vehicle control is proposed, informed by practical onboard operation experience. This innovation significantly enhances diagnostic accuracy for components requiring a high sampling frequency, while integrating "Flash" storage with far greater capacity than conventional control chips.
Findings: This article systematically introduces the key points and diagnostic methods for typical faults in urban rail traction systems. Through rational diagnostic algorithms combined with high-precision, high-storage diagnostic instrumentation, the overall safety and reliability of urban rail traction systems have been improved. The proposed non-intrusive high-frequency diagnostic solution has been validated across multiple rail lines.
Originality/value: This paper introduces an innovative non-intrusive diagnostic device with a dual-channel design for multi-system compatibility and a high-speed acquisition architecture enabling 400 kHz sampling. Its originality stems from the independent, high-fidelity capture of microsecond-level transient faults such as IGBT shoot-through and pantograph arcing. Validated in operational environments, this approach provides a significant leap in diagnostic precision, directly enhancing traction system availability and operational safety by enabling precise fault localization and intelligent, adaptive protection strategies.
Funding: National Natural Science Foundation of China (Grant Nos. 61772371 and 61972286).
Abstract: The volume of trajectory data has grown tremendously in recent years. Maintaining and computing on such trajectory data effectively and efficiently has become a challenging task. In this paper, we propose a trajectory spatial and temporal compression framework, namely CLEAN. The key to spatial compression is to mine meaningful frequent trajectory patterns on the road network. By treating the mined patterns as dictionary items, long trajectories can be encoded as shorter paths, leading to a smaller space cost. An error-bounded temporal compression is carefully designed on top of the identified spatial patterns for a much lower space cost. Meanwhile, the patterns are also utilized to improve the performance of two trajectory applications, range query and clustering, without decompression overhead. Extensive experiments on real trajectory datasets validate that CLEAN significantly outperforms existing state-of-the-art approaches in terms of spatial-temporal compression and trajectory applications.
Funding: This research was financially supported in part by the Ministry of Trade, Industry and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT) through the International Cooperative R&D program (Project No. P0016038), and in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2016-0-00312) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).
Abstract: Despite technological advances, software repository maintenance still requires reusing data to reduce effort and complexity. However, growing ambiguity, irrelevance, and bugs encountered while extracting similar data during software development add to the large amount of data that already resides in repositories. Thus, there is a need for a repository mining technique that predicts relevant, bug-free data. This paper proposes a fault prediction approach that uses a data mining technique to find good predictors for high-quality software. To predict errors in the mined data, the Apriori algorithm was used to discover association rules, with confidence fixed above 40% and support at 30% or more. A pruning strategy was adopted based on evaluation measures. Next, rules were extracted from three projects in different domains; the extracted rules were then combined to obtain the most popular rules based on the evaluation measure values. To evaluate the proposed approach, we conducted an experimental study comparing the proposed rules with existing ones using four different industrial projects. The evaluation showed that the results of our proposal are promising. Practitioners and developers can use these rules for defect prediction during early software development.
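The thresholds quoted in this abstract (support of at least 30%, confidence above 40%) can be made concrete with a minimal Apriori-style sketch. It is a generic textbook version and not the authors' pipeline: the item names (`high_complexity`, `low_coverage`, `buggy`) are invented placeholders, and the helper names `frequent_itemsets` and `association_rules` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori) search; returns {itemset: relative support}."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    level = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        keys = list(current)
        # Join step plus Apriori pruning: keep a (k+1)-candidate only if
        # every one of its k-subsets is itself frequent.
        level = {
            a | b
            for a, b in combinations(keys, 2)
            if len(a | b) == len(a) + 1
            and all(frozenset(c) in current
                    for c in combinations(a | b, len(a)))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) with confidence > threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence > min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


if __name__ == "__main__":
    # Placeholder module records, invented for illustration.
    modules = [
        ["high_complexity", "buggy"],
        ["high_complexity", "buggy"],
        ["high_complexity"],
        ["low_coverage", "buggy"],
        ["low_coverage"],
    ]
    freq = frequent_itemsets(modules, min_support=0.30)
    for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.40):
        print(sorted(lhs), "->", sorted(rhs), f"sup={sup:.2f} conf={conf:.2f}")
```

The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent and was recorded at an earlier level; the same downward closure property justifies the candidate pruning.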