Multidatabase systems are designed to achieve schema integration and data interoperation among distributed and heterogeneous database systems. However, data model heterogeneity and schema heterogeneity make this a challenging task. First, a multidatabase common data model based on XML, named the XML-based Integration Data Model (XIDM), is introduced; it is suitable for integrating different types of schemas. Then an approach to schema mapping based on XIDM in multidatabase systems is presented. The mappings include global mappings, which deal with horizontal and vertical partitioning between global schemas and export schemas, and local mappings, which handle the transformation between export schemas and local schemas. Finally, the illustration and implementation of schema mappings in a multidatabase prototype, the Panorama system, are discussed. The implementation results demonstrate that XIDM is an efficient model for managing multiple heterogeneous data sources and that the XIDM-based schema mapping approaches perform well when integrating relational and object-oriented database systems and other file systems.
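To make the global-mapping idea concrete, here is a minimal Python sketch of how a global relation could be rebuilt from partitioned export schemas, assuming rows are plain dicts and every vertical fragment carries the key column `id`. Horizontal fragments are combined by union and vertical fragments by an inner join on the key. The function names and sample data are hypothetical; this is not the XIDM or Panorama implementation.

```python
from typing import Dict, List

Row = Dict[str, object]


def union_horizontal(*fragments: List[Row]) -> List[Row]:
    """Rebuild a global relation whose rows are split across export schemas."""
    result: List[Row] = []
    for fragment in fragments:
        result.extend(fragment)
    return result


def join_vertical(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Rebuild rows whose columns are split across export schemas (inner join on key)."""
    index = {row[key]: row for row in right}
    merged: List[Row] = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical export schemas: employee rows split by site (horizontal partitioning) ...
site_a = [{"id": 1, "name": "Li"}, {"id": 2, "name": "Wang"}]
site_b = [{"id": 3, "name": "Zhang"}]
# ... and salary columns held by a different source (vertical partitioning).
salaries = [{"id": 1, "salary": 5000}, {"id": 3, "salary": 6200}]

global_names = union_horizontal(site_a, site_b)
global_view = join_vertical(global_names, salaries, key="id")
print(global_view)
```

Because the join is an inner join, employee 2 (who has no salary row) is dropped; an outer join would keep that row with a missing salary.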
Symbol portrayal is an important function of GIS. Sharing symbolic information across different GIS platforms is necessary for GIS applications and users. This paper discusses the necessity, feasibility and solution techniques for sharing a symbol library across different GIS platforms. The proposed roadmap is as follows: first, set up a general data model for the symbol library; then, design a standard exchange format; and finally, call on GIS vendors to provide tools that convert between their symbol libraries and the standard exchange format. This paper analyzes the general characteristics of GIS symbol libraries and gives a symbol library model and a draft XML schema for the symbol library exchange format.
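A vendor-neutral model plus an XML exchange format can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`SymbolLibrary`, `Symbol`, `Primitive`, `geometryType`) and the dataclass fields are hypothetical placeholders, not the draft schema from the paper; the point is that one platform exports to the neutral format and another imports it without loss.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Primitive:
    kind: str    # drawing primitive, e.g. "circle" (hypothetical vocabulary)
    color: str   # RGB hex string
    size: float


@dataclass
class Symbol:
    symbol_id: str
    name: str
    geometry_type: str  # "point", "line" or "polygon"
    primitives: List[Primitive] = field(default_factory=list)


def to_exchange_xml(symbols: List[Symbol]) -> str:
    """Export side: serialize the neutral symbol model to the (hypothetical) exchange format."""
    root = ET.Element("SymbolLibrary", version="0.1")
    for sym in symbols:
        s = ET.SubElement(root, "Symbol", id=sym.symbol_id, geometryType=sym.geometry_type)
        ET.SubElement(s, "Name").text = sym.name
        for prim in sym.primitives:
            ET.SubElement(s, "Primitive", kind=prim.kind, color=prim.color, size=str(prim.size))
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(text: str) -> List[Symbol]:
    """Import side: parse the exchange format back into the neutral model."""
    root = ET.fromstring(text)
    symbols = []
    for s in root.findall("Symbol"):
        prims = [
            Primitive(p.get("kind"), p.get("color"), float(p.get("size")))
            for p in s.findall("Primitive")
        ]
        symbols.append(Symbol(s.get("id"), s.findtext("Name"), s.get("geometryType"), prims))
    return symbols


library = [Symbol("P001", "Well", "point", [Primitive("circle", "#0000FF", 2.5)])]
xml_text = to_exchange_xml(library)
print(xml_text)
```

A round trip (`from_exchange_xml(to_exchange_xml(x)) == x`) is a useful acceptance test for any vendor's interchange tool.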
The cooling process of iron ore pellets in a circular cooler has a great impact on pellet quality and on the energy utilization of the whole system. However, the many variables and the lack of visibility into this gray system hinder efficient production. Thus, the cooling process of iron ore pellets was optimized using a mathematical model and data mining techniques. A mathematical model was established and validated against steady-state production data, and the results show that the calculated values agree very well with the measured values. Based on the proposed model, the effects of important process parameters on the gas and pellet temperature profiles within the circular cooler were analyzed to better understand the entire cooling process. Two data mining techniques, association rule induction and clustering, were also applied to the steady-state production data to obtain expert operating rules and optimized targets. Finally, an optimized control strategy for the circular cooler was proposed and an operation guidance system was developed. The system can visualize the thermal process at steady state and provide operation guidance to optimize the circular cooler.
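The clustering step can be illustrated with a plain k-means (Lloyd's algorithm) sketch in Python. The abstract does not say which clustering algorithm or variables were used, so the two features (normalized cooling-air flow and normalized pellet discharge temperature) and the six records below are invented for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; the first k points serve as initial centroids so results are deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each operating record goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


# Hypothetical steady-state records: (air flow, discharge temperature), both normalized to [0, 1].
records = [(0.2, 0.9), (0.8, 0.3), (0.25, 0.85), (0.75, 0.35), (0.22, 0.88), (0.82, 0.28)]
centroids, labels = kmeans(records, k=2)
# One way to derive an "optimized target": the cluster with the lowest discharge temperature.
target = min(centroids, key=lambda c: c[1])
print(labels, target)
```

In practice the features would be standardized first, and the number of clusters chosen from the data rather than fixed in advance.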
Existing data mining methods mostly focus on relational databases and structured data rather than on complex structured data such as Extensible Markup Language (XML) documents. By converting the XML document type definition (DTD) into relational semantics that record XML data relations, and by using an XML data mining language, the XML data mining system presents a strategy for mining information from XML.
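One common way to turn a DTD into relational structure is an inlining scheme: single-occurrence text children become columns of their parent's table, while repeated or nested children get their own table with a foreign key to the parent. The Python sketch below assumes that scheme and handles only simple `<!ELEMENT name (a, b*, c)>` declarations; it is an illustration, not the mapping used by the paper's system.

```python
import re
import sqlite3
from typing import Dict, List

DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>")


def parse_dtd(dtd: str) -> Dict[str, List[str]]:
    """Map each element name to its child particles, e.g. ['title', 'author*']."""
    return {name: [p.strip() for p in body.split(",")] for name, body in DECL.findall(dtd)}


def dtd_to_ddl(dtd: str) -> List[str]:
    """Inline single-occurrence #PCDATA children as columns; other children get their own table."""
    model = parse_dtd(dtd)
    is_leaf = {name: parts == ["#PCDATA"] for name, parts in model.items()}
    parent_of: Dict[str, str] = {}
    for name, parts in model.items():
        for part in parts:
            child = part.rstrip("*+?")
            if child in model and (part.endswith(("*", "+")) or not is_leaf[child]):
                parent_of[child] = name
    ddl = []
    for name, parts in model.items():
        if is_leaf[name] and name not in parent_of:
            continue  # inlined as a column of its parent's table
        cols = ["id INTEGER PRIMARY KEY"]
        if name in parent_of:
            cols.append(f"parent_id INTEGER REFERENCES {parent_of[name]}(id)")
        if is_leaf[name]:
            cols.append("content TEXT")
        for part in parts:
            child = part.rstrip("*+?")
            if child in model and is_leaf[child] and child not in parent_of:
                cols.append(f"{child} TEXT")
        ddl.append(f"CREATE TABLE {name} ({', '.join(cols)})")
    return ddl


dtd = """
<!ELEMENT book (title, author*, chapter*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter (heading, body)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
"""
ddl = dtd_to_ddl(dtd)
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(ddl)
```

Once XML is stored this way, standard relational mining tools can run over the tables.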
To find an effective way to improve the quality of school management, it is necessary to extract valuable information from students' original data and provide feedback for student management. First, several new and successful educational data mining models were analyzed and compared. These models outperform traditional models (such as the Knowledge Tracing Model) in efficiency, comprehensiveness, ease of use, stability and so on. Then, a neural network algorithm was applied to explore the feasibility of using educational data mining in student management, and the results show that it has sufficient predictive accuracy and reliability to be put into practice. Finally, the possibility and prospects of applying educational data mining in the teaching management system for university students were assessed.
OBJECTIVE: To help researchers select appropriate data mining models that provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. METHODS: Clinical issues addressed with data mining models were comprehensively summarized in terms of four significant elements of clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the factors relevant to the performance of data mining models, e.g. data type, samples, parameters, and variable labels. Combining these factors, TCM clinical data features were compared with regard to statistical characteristics and informatics properties. Data mining models were simultaneously compared in terms of their applicable conditions and suitable scopes. RESULTS: The main application problems were data types inconsistent with the data mining models used and small sample sizes, which led to inappropriate or even erroneous results. The features of the models, i.e. advantages, disadvantages, suitable data types, data mining tasks, and applicable TCM issues, were summarized and compared. CONCLUSION: By taking into account the specific features of different data mining models, clinicians can select suitable data mining models to resolve TCM problems.
This work reports on using the finite element method (FEM) and data mining (DM) techniques to develop models that can optimize the skin-pass rolling process based on its operating conditions. An FE model based on a real skin-pass process is built and validated. Based on this model, a group of FE models is simulated with different adjustment parameters and different sheet materials; both variables are chosen from preset ranges. From all the FE model simulations, a database is generated; it consists of the above-mentioned adjustment parameters, the sheet properties, and the process variables obtained from the simulations. Various types of data mining algorithms are used to develop predictive models for each process variable. The best predictive models can be used to predict variables that are hard to measure experimentally (internal stresses, internal strains, etc.), which are useful for the optimal design of the process or can be applied in real-time control systems of an in-plant skin-pass process.
The high-temperature dielectric properties of quartz fiber-reinforced silicon dioxide ceramic (SiO2/SiO2) composites were studied both theoretically and experimentally. A multi-scale theoretical model was developed based on the theory of dielectrics. Dielectric properties at higher temperatures (> 1200 ℃) were predicted by mining experimental data for the correlation coefficients in the model. The results show that the dielectric properties of SiO2/SiO2 calculated with the theoretical model agree with the experimentally measured values.
Data mining has become an important technique for exploring and extracting data in numerous research projects in different fields (technology, information technology, business, the environment, economics, etc.). For the analysis and visualisation of large amounts of time-series data extracted by data mining, free software such as R has emerged internationally as an inexpensive and efficient tool for exploiting and visualising time series. This has allowed the development of models that help extract the most relevant information from large volumes of data. In this regard, a script has been developed to implement ARIMA models, showing them to be useful and quick mechanisms for the extraction, analysis and visualisation of large data volumes, with the great advantage of being applicable in many branches of knowledge, including economics, demography, physics, mathematics and fisheries. ARIMA models therefore emerge as a data mining technique that offers reliable, robust and high-quality results to help validate and sustain the research carried out.
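The abstract's script is written in R, where `stats::arima` fits general ARIMA(p, d, q) models by maximum likelihood. As a language-neutral illustration of what the three letters mean, the Python sketch below fits the simplest non-trivial case, ARIMA(1,1,0) without a constant, by conditional least squares: difference once (the "I"), fit an AR(1) coefficient on the differences (the "AR"), then forecast and integrate back. The series is made up.

```python
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First difference: the 'I' (d = 1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x: Sequence[float]) -> float:
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept)."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(b * b for b in x[:-1])
    return num / den


def arima110_forecast(series: Sequence[float], steps: int) -> List[float]:
    """Fit ARIMA(1,1,0) and forecast `steps` values ahead on the original scale."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # AR(1) recursion on the differenced series
        level = level + last_diff    # undo the differencing (integrate)
        forecasts.append(level)
    return forecasts


series = [10.0, 12.0, 13.0, 13.5, 13.75]
print(arima110_forecast(series, steps=2))  # → [13.875, 13.9375]
```

Here the differences halve each step, so the fitted coefficient is exactly 0.5 and the forecasts keep approaching 14.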
For multi-mode radars operating in the modern electronic battlefield, different working states of a single radar are prone to being classified as multiple emitters when traditional classification methods are used to process intercepted signals, which degrades signal classification. A classification method based on spatial data mining is presented to address this challenge. Inspired by spatial data mining, the method uses a nuclear field to describe the distribution of pulse samples in feature space and uncovers hidden cluster information by analyzing distribution characteristics. In addition, a membership-degree criterion that quantifies the correlation among all classes is established, which ensures the classification accuracy of signal samples. Numerical experiments show that the presented method can effectively prevent different working states of a multi-mode emitter from being classified as several emitters and achieves higher classification accuracy.
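The abstract does not give the field function or the clustering rule, so the following Python sketch assumes a Gaussian-shaped, short-range potential of the kind used in data-field clustering: each pulse sample contributes `exp(-(d / sigma)**2)` to the potential at a point. Samples then hill-climb to the highest-potential neighbour within `radius`, and each local maximum becomes one emitter cluster. The hill-climbing rule, the membership formula, and the 2-D features are illustrative assumptions, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def potential(x: Point, samples: Sequence[Point], sigma: float) -> float:
    """Field potential at x: sum of Gaussian-shaped contributions of unit mass."""
    return sum(math.exp(-(math.dist(x, s) / sigma) ** 2) for s in samples)


def field_clusters(samples: Sequence[Point], sigma: float, radius: float) -> List[int]:
    """Label each sample with the index of the potential maximum it climbs to."""
    phi = [potential(s, samples, sigma) for s in samples]

    def uphill(i: int) -> int:
        neighbours = [j for j in range(len(samples)) if math.dist(samples[i], samples[j]) <= radius]
        # Ties are broken toward the lower index, so the climb always terminates.
        return max(neighbours, key=lambda j: (phi[j], -j))

    labels = []
    for i in range(len(samples)):
        cur = i
        while True:
            nxt = uphill(cur)
            if nxt == cur:
                break
            cur = nxt
        labels.append(cur)
    return labels


def membership(x: Point, centres: Sequence[Point], sigma: float) -> List[float]:
    """Normalized field strength of each centre at x, used here as a membership degree."""
    w = [math.exp(-(math.dist(x, c) / sigma) ** 2) for c in centres]
    total = sum(w)
    return [v / total for v in w]


# Hypothetical pulse features (e.g. normalized carrier frequency and pulse width).
pulses = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
labels = field_clusters(pulses, sigma=0.3, radius=0.2)
degrees = membership((0.05, 0.05), [pulses[0], pulses[3]], sigma=0.3)
print(labels, degrees)
```

Merging clusters whose membership degrees overlap strongly is one way such a criterion could stop one multi-mode radar from being split into several emitters.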
Supply Chain Finance (SCF) is important for improving the effectiveness of supply chain capital operations and reducing the overall management cost of a supply chain. In recent years, with the deep integration of supply chains with the Internet, Big Data, Artificial Intelligence, the Internet of Things, Blockchain, etc., the efficiency of supply chain financial services can be greatly improved by building more customized risk pricing models and conducting more rigorous investment decision-making processes. However, with the rapid development of new technologies, SCF data has grown massively, and new financial fraud behaviors or patterns are increasingly hidden among normal ones. Lacking the capability to handle large data volumes and mitigate financial fraud may lead to huge losses in supply chains. In this article, a distributed big data mining approach is proposed for financial fraud detection in a supply chain. It implements a distributed deep learning model, a Convolutional Neural Network (CNN), on the big data infrastructure of Apache Spark and Hadoop to process the large dataset in parallel and significantly reduce processing time. Trained and tested on the continually updated SCF dataset, the approach can intelligently and automatically classify massive numbers of data samples and discover fraudulent financing behaviors, thereby improving financial fraud detection with high precision and recall and reducing fraud losses in a supply chain.
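The core idea behind distributed training on Spark, synchronous data parallelism, can be shown without a cluster. In this Python sketch, each list in `partitions` stands in for a data partition held by one executor; every step computes per-partition gradients (the "map"), sums them (the "reduce"), and applies one shared update. A logistic-regression classifier replaces the paper's CNN to keep the example short, and the features and labels are invented; no Spark API is used or implied.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (features, label), label 1 = fraudulent


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def partition_gradient(w: Sequence[float], part: Sequence[Sample]) -> List[float]:
    """Sum of logistic-loss gradients over one partition (one executor's work)."""
    grad = [0.0] * len(w)
    for x, y in part:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for k, xk in enumerate(x):
            grad[k] += err * xk
    return grad


def train_data_parallel(partitions: Sequence[Sequence[Sample]], dim: int,
                        lr: float = 1.0, epochs: int = 1000) -> List[float]:
    """Synchronous data parallelism: map gradients per partition, reduce by summing, update once."""
    w = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(epochs):
        grads = [partition_gradient(w, p) for p in partitions]   # map step
        total = [sum(g[k] for g in grads) for k in range(dim)]   # reduce step
        w = [wk - lr * gk / n for wk, gk in zip(w, total)]       # shared update
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)


# Hypothetical features: [bias, normalized amount deviation, duplicate-invoice flag].
part_a = [([1.0, 0.1, 0.0], 0), ([1.0, 0.9, 1.0], 1), ([1.0, 0.2, 0.0], 0)]
part_b = [([1.0, 0.8, 1.0], 1), ([1.0, 0.15, 0.0], 0), ([1.0, 0.95, 1.0], 1)]
weights = train_data_parallel([part_a, part_b], dim=3)
print([predict(weights, x) for x, _ in part_a + part_b])
```

Because the summed gradient equals the full-batch gradient, the result is identical to single-machine training; the speed-up comes only from computing the partition gradients in parallel.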
In network-supported collaborative design, data processing plays a vital role. Much effort has been spent in this area, and many approaches have been proposed. Based on related work, this paper presents an Extensible Markup Language (XML)-based strategy for several important data processing problems in network-supported collaborative design, such as representing the Standard for the Exchange of Product model data (STEP) in XML for product information expression, and managing XML documents with a relational database. The paper explains in detail how to define the mapping between XML structures and relational database structures and how XML-QL queries can be translated into Structured Query Language (SQL) queries. Finally, the structure of an XML-based data processing system is presented.
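The query-translation idea can be sketched with a generic "edge table" storage of XML (one row per element, with a parent pointer) and an absolute path such as `/catalog/part/name`, which is roughly what a simple nested XML-QL pattern selects. Each path step becomes one self-join on the parent column. The table layout, the path-only query form, and the function names are assumptions for illustration; the paper's mapping and translation rules may differ.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count
from typing import List, Optional, Tuple


def load_edges(xml_text: str, conn: sqlite3.Connection) -> None:
    """Shred an XML document into an edge table: one row per element."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, content TEXT)")
    ids = count(1)

    def visit(elem: ET.Element, parent_id: Optional[int]) -> None:
        node_id = next(ids)
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)", (node_id, parent_id, elem.tag, text))
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


def path_to_sql(path: str) -> Tuple[str, List[str]]:
    """Translate /a/b/c into a chain of parent-child self-joins with bound tag parameters."""
    tags = [t for t in path.split("/") if t]
    joins = ["node n1"]
    conds = ["n1.parent IS NULL", "n1.tag = ?"]
    for i in range(2, len(tags) + 1):
        joins.append(f"JOIN node n{i} ON n{i}.parent = n{i - 1}.id")
        conds.append(f"n{i}.tag = ?")
    last = len(tags)
    sql = f"SELECT n{last}.content FROM {' '.join(joins)} WHERE {' AND '.join(conds)} ORDER BY n{last}.id"
    return sql, tags


doc = "<catalog><part><name>gear</name><mass>1.2</mass></part><part><name>shaft</name></part></catalog>"
conn = sqlite3.connect(":memory:")
load_edges(doc, conn)
sql, params = path_to_sql("/catalog/part/name")
names = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(names)  # → ['gear', 'shaft']
```

Binding tag names as parameters rather than splicing them into the SQL string keeps the generated query safe against odd element names.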
Using association rules from data mining, the connections between testing indices and strong and weak association rules were determined, and new derived rules were obtained through further reasoning. Association rules were used to analyze the correlation and check the consistency between indices. This study shows that judgments obtained from weak association rules or non-association rules are more accurate and more credible than those obtained from strong association rules. When the testing grades of the two indices in a weak association rule are inconsistent, the testing grades are more likely to be erroneous, and such mistakes are often caused by human factors. Clustering was used to analyze the reliability of a diagnosis or to perform health diagnosis directly. The analysis showed that the clustering results depend on the indices selected: the more significant the selected indices, the more pronounced the characteristics of the clustering results and the more credible the analysis or diagnosis. The indices and diagnosis analysis functions produced by this study provide a necessary theoretical foundation and new ideas for the development of health diagnosis technology for hydraulic metal structures.
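The strong/weak distinction rests on the two standard rule measures: support (the fraction of records containing both sides) and confidence (support of both sides divided by support of the antecedent). The Python sketch below enumerates pairwise rules between index grades and labels each rule strong when both measures meet the thresholds. The index names, grades, and threshold values are hypothetical; the paper's thresholds are not stated in the abstract.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

Item = Tuple[str, str]          # (index name, testing grade)
Record = Dict[str, str]         # one inspection record: index name -> grade


def pair_rules(records: List[Record], min_sup: float, min_conf: float):
    """Enumerate rules (index_a = g1) -> (index_b = g2) and label them strong or weak."""
    n = len(records)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for rec in records:
        items = sorted(rec.items())
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # ordered pairs: both rule directions
    rules = []
    for (lhs, rhs), cnt in pair_count.items():
        support = cnt / n
        confidence = cnt / item_count[lhs]
        label = "strong" if support >= min_sup and confidence >= min_conf else "weak"
        rules.append((lhs, rhs, support, confidence, label))
    return rules


# Hypothetical grades for two indices of a hydraulic metal structure.
records = [
    {"weld": "A", "corrosion": "A"},
    {"weld": "A", "corrosion": "A"},
    {"weld": "A", "corrosion": "B"},
    {"weld": "B", "corrosion": "B"},
]
rules = pair_rules(records, min_sup=0.4, min_conf=0.6)
for lhs, rhs, sup, conf, label in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} {label}")
```

A record whose grades violate a rule that usually holds is then a candidate for re-inspection, which is the kind of consistency check the abstract describes.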
With the rapid progress of network and storage technologies, a huge amount of electronic data, such as Web pages and XML documents, has become available on the Internet. In this paper, we study the data mining problem of discovering frequent ordered subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, Ordered Subtree Miner (OSTMiner), based on two-layer neural networks with the Hebb rule, that computes all ordered subtrees appearing in a collection of XML trees with a frequency above a user-specified threshold, using a special structure called the EM-tree. In this algorithm, the EM-tree serves as an extended merging tree that supplies schema information for efficient pruning and mining of frequent subtrees. Experimental results show that OSTMiner has good response time and scales well.
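All frequent-subtree miners share a support-counting step: a pattern's support is the number of trees in the collection that contain it. The Python sketch below shows that step for the smallest ordered patterns, parent-child label pairs. Larger candidates are typically grown from these by rightmost-path extension. This is a generic illustration, not OSTMiner's neural-network or EM-tree approach, and the sample documents are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent label, child label): a two-node subtree pattern


def edges(root: ET.Element) -> Set[Edge]:
    """All distinct parent-child label pairs occurring in one tree."""
    found: Set[Edge] = set()
    for parent in root.iter():
        for child in parent:
            found.add((parent.tag, child.tag))
    return found


def frequent_edges(docs: Iterable[str], min_support: int) -> Dict[Edge, int]:
    """Count how many trees contain each pattern and keep those at or above the threshold."""
    support: Counter = Counter()
    for doc in docs:
        support.update(edges(ET.fromstring(doc)))  # a set, so each tree counts once
    return {pattern: n for pattern, n in support.items() if n >= min_support}


docs = [
    "<book><title/><author/></book>",
    "<book><title/><year/></book>",
    "<article><title/><author/></article>",
]
print(frequent_edges(docs, min_support=2))  # → {('book', 'title'): 2}
```

Counting each tree at most once (document frequency) is what makes support anti-monotone, which in turn allows infrequent patterns to be pruned before they are extended.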
Statistics are more crucial than ever due to the availability of huge amounts of data from domains such as finance, medicine, science and engineering. Statistical data mining (SDM) is an interdisciplinary domain that examines large existing databases to discover patterns and relationships in the data. It differs from classical statistics in the size of the datasets and in the fact that the data were often not originally collected according to an experimental design but for other purposes. Thus, this paper introduces an effective statistical data mining model for intelligent rainfall prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL). In the presented SDMIRP-SMODL model, feature subset selection is performed by the SMO algorithm, which in turn reduces computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) is used. Finally, this study employs the pelican optimization algorithm (POA) as a hyperparameter optimizer. The SDMIRP-SMODL approach is evaluated experimentally on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative results show the superiority of the SDMIRP-SMODL model over existing techniques.
Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be used to mine such data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) model for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model mines educational data and derives decisions through feature selection and classification. The model uses the ISOFS technique to choose an optimal subset of features, and swallow swarm optimization (SSO) is combined with the SSAE model to perform classification. To demonstrate the improved outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results show that the ISOFS-OSSAE model achieves better classification performance than recent state-of-the-art approaches in terms of different performance measures.
To improve the surface accuracy of the work-piece and obtain potentially valuable information, a dynamic milling force prediction model based on data mining is proposed. Currently, dynamic milling force is obtained through finite element simulation or analytical calculation; however, a finite element model inevitably differs from the actual working conditions, and analytical calculation is cumbersome and complex. The proposed model combines regression analysis with a Radial Basis Function (RBF) neural network. Using data mining, the intrinsic relationships among milling force, cutting parameters, temperature, vibration and surface quality are analyzed in depth, and the influence of dynamic milling force changes under different conditions is extracted and summarized by cluster analysis and correlation analysis. The results show that the proposed dynamic milling force model predicts well, ensures production quality, reduces the occurrence of chatter, improves the surface accuracy of the work-piece, and provides a more accurate basis for selecting process parameters.
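The RBF part of such a model can be sketched in a few lines of Python. This version is an exact-interpolation RBF network: one Gaussian hidden unit per training sample, with output weights found by solving the linear system `Phi w = y`. The regression-analysis component from the abstract is omitted, the inputs (normalized spindle speed and feed per tooth) and force values are invented, and a practical model would use fewer centres (for example from clustering) plus least squares or regularization.

```python
import math
from typing import List, Sequence


def gaussian(r: float, width: float) -> float:
    return math.exp(-(r / width) ** 2)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class RBFNet:
    """Exact-interpolation RBF network with Gaussian hidden units."""

    def __init__(self, width: float):
        self.width = width

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[float]) -> "RBFNet":
        self.centres = [list(x) for x in xs]
        phi = [[gaussian(math.dist(x, c), self.width) for c in self.centres] for x in xs]
        self.weights = solve(phi, list(ys))  # output-layer weights
        return self

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * gaussian(math.dist(x, c), self.width)
                   for w, c in zip(self.weights, self.centres))


# Hypothetical cutting parameters -> normalized milling force.
xs = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5], [0.8, 0.7], [0.5, 0.9]]
ys = [0.30, 0.42, 0.55, 0.71, 0.80]
model = RBFNet(width=0.3).fit(xs, ys)
print(model.predict([0.5, 0.4]))
```

The width controls the trade-off: narrow units reproduce the training data but generalize poorly between samples, while wide units smooth more but make the linear system worse conditioned.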
Funding (GIS symbol library paper): Supported by the Spatial Information Engineering Key Laboratory Fund of the Chinese National Surveying and Mapping Bureau (No. 200722).
Funding (iron ore pellet cooling paper): Sponsored by the National Natural Science Foundation of China (51174253).
Funding (educational data mining paper): Sponsored by the Ability Enhancement Project of Teaching Staff in Harbin Institute of Technology (Grant No. 06).
Funding (TCM data mining paper): Supported by Research on Pattern Differentiation of AIDS Based on Graph Theory, National Natural Science Foundation of China (No. 81202858); Research on Intervention Evaluation of TCM Health Differentiation, National Key Technology Support Program (No. 2012BAI25B02); Research and Development in Digital Information System of Traditional Chinese Medicine, National 863 Program of China (No. 2012AA02A609); Acupuncture Efficacy of Gastrointestinal Dysfunction (No. ZZ05003) and Acupuncture-point Specialty Analysis Based on Image Processing Technology (No. ZZ03090), self-selected subjects of the China Academy of Chinese Medical Sciences; and Semantic Recognition of Tongue and Pulse Based on Image Content, Beijing Key Laboratory of Advanced Information Science and Network Technology (No. XDXX1306).
Funding (skin-pass rolling paper): Sponsored by the Spanish Ministry of Education and Science (DPI2007-61090) and the European Commission Research Programme of the Research Fund for Coal and Steel (RFS-PR-06035).
Funding (SiO2/SiO2 dielectrics paper): Supported by National Defense 973 (Grant No. 513180303), National Defense Basic Scientific Research (Grant No. A2220061080), and the National Defense Foundation (Grant No. 5142040205BQ0154).
Funding (multi-mode radar classification paper): Supported by the National Natural Science Foundation of China (61371172), the International S&T Cooperation Program of China (2015DFR10220), the Ocean Engineering Project of National Key Laboratory Foundation (1213), and the Fundamental Research Funds for the Central Universities (HEUCF1608).
Funding: This research work is supported by the Hunan Provincial Education Science 13th Five-Year Plan (Grant No. XJK016BXX001, Zhou, H., http://jyt.hunan.gov.cn/jyt/sjyt/jky/index.html) and the Social Science Foundation of Hunan Province (Grant No. 17YBA049, Zhou, H., https://sk.rednet.cn/channel/7862.html). The work is also supported by the Open Foundation for University Innovation Platform from Hunan Province, China (Grant No. 18K103, Sun, G., http://kxjsc.gov.hnedu.cn/).
Abstract: Supply Chain Finance (SCF) is important for improving the effectiveness of supply chain capital operations and reducing the overall management cost of a supply chain. In recent years, with the deep integration of supply chains with the Internet, Big Data, Artificial Intelligence, the Internet of Things, Blockchain, etc., the efficiency of supply chain financial services can be greatly improved by building more customized risk pricing models and conducting more rigorous investment decision-making processes. However, with the rapid development of new technologies, the volume of SCF data has increased massively, and new financial fraud behaviors or patterns are becoming more covertly scattered among normal ones. Lacking the capability to handle big data volumes and mitigate financial fraud may lead to huge losses in supply chains. In this article, a distributed big data mining approach is proposed for financial fraud detection in a supply chain. It implements a distributed deep learning model, a Convolutional Neural Network (CNN), on the big data infrastructure of Apache Spark and Hadoop to process the large dataset in parallel and significantly reduce processing time. By training and testing on the continually updated SCF dataset, the approach can intelligently and automatically classify massive numbers of data samples and discover fraudulent financing behaviors, thereby improving financial fraud detection with high precision and recall and reducing fraud losses in a supply chain.
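The speed-up in such systems typically comes from data parallelism: each worker computes a gradient on its own partition of the data, and the partial gradients are summed before a shared model update. The sketch below simulates that map/reduce pattern sequentially in plain Python. As a deliberate simplification, a logistic-regression classifier stands in for the paper's CNN, and no Spark or Hadoop API is used; the partitioning scheme, learning rate, epoch count and toy data are all hypothetical. Precision and recall are computed as for any binary fraud classifier.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition(data, n_workers):
    """Round-robin split of (features, label) pairs, one chunk per worker."""
    return [data[i::n_workers] for i in range(n_workers)]


def local_gradient(weights, part):
    """Summed logistic-loss gradient over one partition (the per-worker 'map' step)."""
    grad = [0.0] * len(weights)
    for x, y in part:
        xb = x + [1.0]  # append a constant feature for the bias term
        err = sigmoid(sum(w * v for w, v in zip(weights, xb))) - y
        for k, v in enumerate(xb):
            grad[k] += err * v
    return grad


def train(data, n_workers=4, lr=0.1, epochs=100):
    """Synchronous data-parallel gradient descent, simulated sequentially."""
    dim = len(data[0][0]) + 1
    weights = [0.0] * dim
    parts = partition(data, n_workers)
    for _ in range(epochs):
        grads = [local_gradient(weights, p) for p in parts]     # map
        total = [sum(g[k] for g in grads) for k in range(dim)]   # reduce
        weights = [w - lr * t / len(data) for w, t in zip(weights, total)]
    return weights


def predict(weights, x):
    score = sigmoid(sum(w * v for w, v in zip(weights, x + [1.0])))
    return 1 if score >= 0.5 else 0


def precision_recall(weights, data):
    """Precision and recall for the positive (fraud) class."""
    tp = fp = fn = 0
    for x, y in data:
        p = predict(weights, x)
        tp += int(p == 1 and y == 1)
        fp += int(p == 1 and y == 0)
        fn += int(p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because the loss gradient is a sum over samples, the summed partition gradients equal the full-batch gradient, so adding workers changes only where the work is done, not the update itself.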
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. AA420060).
Abstract: In network-supported collaborative design, data processing plays a vital role. Much effort has been devoted to this area, and many approaches have been proposed. Drawing on related work, this paper presents an Extensible Markup Language (XML) based strategy for several important data processing problems in network-supported collaborative design, such as representing the Standard for the Exchange of Product model data (STEP) in XML for product information expression, and managing XML documents with a relational database. The paper explains in detail how to define the mapping between XML structures and relational database structures and how XML-QL queries can be translated into Structured Query Language (SQL) queries. Finally, the structure of an XML-based data processing system is presented.
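One common way to store XML in a relational database is a generic "edge" (node-table) mapping, in which every element becomes a row that points to its parent; a path query then becomes a chain of SQL self-joins. The sketch below uses Python's standard `xml.etree.ElementTree` and `sqlite3` modules to illustrate that idea. It is an assumed, simplified mapping rather than the paper's scheme, and it translates only simple absolute paths such as `/product/part/name`, not full XML-QL. The table layout and function names are hypothetical.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_text):
    """Store every element as a row (id, parent, tag, text): a generic edge mapping."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    ids = itertools.count(1)

    def visit(elem, parent_id):
        node_id = next(ids)
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, elem.tag, text))
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    conn.commit()


def path_to_sql(path):
    """Translate an absolute path such as '/product/part/name' into SQL self-joins."""
    tags = [t for t in path.split("/") if t]
    if not tags:
        raise ValueError("path must name at least one element")
    last = len(tags) - 1
    tables = ", ".join(f"node n{i}" for i in range(len(tags)))
    conditions = ["n0.parent IS NULL"]  # the first step must match the root element
    conditions += [f"n{i}.parent = n{i - 1}.id" for i in range(1, len(tags))]
    conditions += [f"n{i}.tag = ?" for i in range(len(tags))]
    sql = (f"SELECT n{last}.text FROM {tables} WHERE "
           + " AND ".join(conditions) + f" ORDER BY n{last}.id")
    return sql, tags


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(conn, "<product><part><name>bolt</name></part><part><name>nut</name></part></product>")
    sql, params = path_to_sql("/product/part/name")
    print([row[0] for row in conn.execute(sql, params)])
```

Ordering by the node id returns results in document order, because ids are assigned during a depth-first traversal.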
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 50539010) and the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 200801019).
Abstract: Using association-rule data mining, the connections between testing indices and strong and weak association rules were determined, and new derived rules were obtained through further reasoning. Association rules were used to analyze the correlation and check the consistency between indices. This study shows that judgments obtained from weak association rules or non-association rules are more accurate and more credible than those obtained from strong association rules. When the testing grades of two indices in a weak association rule are inconsistent, the testing grades are more likely to be erroneous, and such mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis or to perform health diagnosis directly. The analysis showed that the clustering results depend on the indices selected: the more significant the selected indices, the more significant the characteristics of the clustering results, and the more credible the analysis or diagnosis. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of health diagnosis technology for hydraulic metal structures.
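Whether an association rule counts as strong is normally decided by its support and confidence. The Python sketch below computes both for a rule linking the testing grades of two indices. The thresholds `min_sup` and `min_conf`, the labelling of any observed rule below a threshold as "weak", and the index names and grades in the toy records are all assumptions for illustration; the paper's exact definitions may differ.

```python
def matches(record, conditions):
    """True if the record has every (index -> grade) pair in `conditions`."""
    return all(record.get(k) == v for k, v in conditions.items())


def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent."""
    with_a = [r for r in records if matches(r, antecedent)]
    with_both = [r for r in with_a if matches(r, consequent)]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def classify_rule(support, confidence, min_sup=0.3, min_conf=0.7):
    """Hypothetical labelling: strong if both thresholds are met, weak if merely observed."""
    if support >= min_sup and confidence >= min_conf:
        return "strong"
    return "weak" if support > 0 else "none"


if __name__ == "__main__":
    # Toy inspection records: testing grade of each index per structure (hypothetical).
    records = ([{"corrosion": "B", "deformation": "B"}] * 4
               + [{"corrosion": "B", "deformation": "C"}]
               + [{"corrosion": "A", "deformation": "A"}] * 5)
    s, c = rule_stats(records, {"corrosion": "B"}, {"deformation": "B"})
    print(s, c, classify_rule(s, c))
```

In the spirit of the abstract, a record that matches a weak rule (here corrosion grade B with deformation grade C) is the kind of inconsistent grade pair worth re-checking for testing errors.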
Funding: Supported by the Key Science-Technology Project of Heilongjiang Province (GA010401-3).
Abstract: With the rapid progress of network and storage technologies, a huge amount of electronic data, such as Web pages and XML, has become available on the Internet. In this paper, we study the data mining problem of discovering frequent ordered subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, Ordered Subtree Miner (OSTMiner), based on two-layer neural networks with the Hebb rule, which computes all ordered subtrees appearing in a collection of XML trees with a frequency above a user-specified threshold, using a special structure called the EM-tree. In this algorithm, the EM-tree serves as an extended merging tree that supplies schema information for efficient pruning and mining of frequent subtrees. Experimental results show that OSTMiner has good response time and scales well.
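To make the task concrete: the support of a pattern is the fraction of XML trees that contain it, and a pattern is frequent when its support reaches the user-specified threshold. The Python sketch below counts only the smallest ordered patterns (a parent with one child, or a parent with two children in left-to-right order) using the standard `xml.etree.ElementTree` module. It is a brute-force illustration of the problem definition, not OSTMiner: it uses no neural network, no EM-tree and no pruning, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def ordered_patterns(root):
    """Depth-one ordered patterns: (parent, (child,)) and (parent, (left, right))."""
    patterns = set()
    for parent in root.iter():
        children = [c.tag for c in parent]
        for i, left in enumerate(children):
            patterns.add((parent.tag, (left,)))
            for right in children[i + 1:]:
                patterns.add((parent.tag, (left, right)))  # sibling order is kept
    return patterns


def frequent_patterns(xml_docs, min_support):
    """Patterns contained in at least `min_support` (a fraction) of the documents."""
    counts = Counter()
    for doc in xml_docs:
        counts.update(ordered_patterns(ET.fromstring(doc)))  # each tree counts once
    n = len(xml_docs)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    docs = ["<a><b/><c/></a>", "<a><b/><c/><d/></a>", "<a><c/><b/></a>"]
    for pattern, support in sorted(frequent_patterns(docs, 0.6).items()):
        print(pattern, round(support, 2))
```

Because the trees are ordered, `a` with children `b` then `c` and `a` with children `c` then `b` are different patterns; in the toy data only the first is frequent.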
Funding: This research was partly supported by the Technology Development Program of MSS [No. S3033853] and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A4A1031509).
Abstract: Statistics are more crucial than ever owing to the availability of huge amounts of data from domains such as finance, medicine, science and engineering. Statistical data mining (SDM) is an interdisciplinary domain that examines large existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in the fact that the data may not have been gathered according to an experimental design but for other purposes. This paper introduces an effective Statistical Data Mining for Intelligent Rainfall Prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL) model. In the presented SDMIRP-SMODL model, feature subset selection is performed by the slime mould optimization (SMO) algorithm, which in turn reduces the computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) is exploited. Finally, the pelican optimization algorithm (POA) is used as a hyperparameter optimizer. The SDMIRP-SMODL approach is evaluated on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative results show the superiority of the SDMIRP-SMODL model over existing techniques.
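The class counts reported above imply a strong imbalance, which matters when reading any accuracy figure. The short Python calculation below uses only the two counts from the abstract. It shows that a trivial model predicting the negative class for every sample would already be about 92.7% accurate, and it computes inverse-frequency class weights, N / (2 N_c), which are one common way to rebalance training (assumed here; the source does not say how imbalance was handled).

```python
NEGATIVE = 23682  # negative-class samples reported for the rainfall dataset
POSITIVE = 1865   # positive-class samples reported for the rainfall dataset


def imbalance_summary(neg, pos):
    """Total size, majority-class baseline accuracy, positive rate and class weights."""
    total = neg + pos
    baseline_accuracy = neg / total  # accuracy of always predicting "negative"
    positive_rate = pos / total
    # Inverse-frequency weights: each class then contributes equally to a weighted loss.
    weights = {"negative": total / (2 * neg), "positive": total / (2 * pos)}
    return total, baseline_accuracy, positive_rate, weights


if __name__ == "__main__":
    total, acc, rate, w = imbalance_summary(NEGATIVE, POSITIVE)
    print(f"total={total} baseline accuracy={acc:.3f} positive rate={rate:.3f}")
    print(f"class weights={w}")
```

With such a skew, metrics such as recall on the positive class, F1 or the area under the precision-recall curve say more about a rainfall predictor than raw accuracy.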
Abstract: Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be used to mine such data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) model for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine educational data and derive decisions through feature selection and classification. To this end, the ISOFS technique is designed to choose an optimal subset of features, and swallow swarm optimization (SSO) is combined with the SSAE model to perform classification. To demonstrate the improved outcomes of the ISOFS-OSSAE model, a wide range of experiments were conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results show that the ISOFS-OSSAE model achieves better classification performance than recent state-of-the-art approaches in terms of different performance measures.
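A stacked sparse autoencoder (SSAE) is built from autoencoder layers whose hidden units are pushed to be mostly inactive. A standard way to do this, assumed here because the abstract does not give the loss, is a Kullback-Leibler penalty between a target activation ρ and each hidden unit's mean activation ρ̂_j, added to the reconstruction error with a weight β. The Python sketch below computes only that loss term; the function names and the default β are hypothetical, and the sailfish and swallow swarm optimizers are not shown.

```python
import math


def mean_activations(hidden_outputs):
    """Average activation of each hidden unit over a batch (rows = samples)."""
    n = len(hidden_outputs)
    return [sum(col) / n for col in zip(*hidden_outputs)]


def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units j of KL(rho || rho_hat_j) for Bernoulli variables."""
    penalty = 0.0
    for r in rho_hat:
        if not 0.0 < r < 1.0:
            raise ValueError("mean activations must lie strictly between 0 and 1")
        penalty += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return penalty


def sparse_ae_loss(reconstruction_error, rho, hidden_outputs, beta=3.0):
    """Reconstruction error plus beta times the sparsity penalty (beta is an assumed value)."""
    rho_hat = mean_activations(hidden_outputs)
    return reconstruction_error + beta * kl_sparsity_penalty(rho, rho_hat)


if __name__ == "__main__":
    batch = [[0.02, 0.60], [0.08, 0.40]]  # toy sigmoid outputs of two hidden units
    print(sparse_ae_loss(0.5, 0.05, batch))
```

The penalty is zero when a unit's mean activation equals ρ and grows as it drifts away, so the second toy unit (mean 0.5) dominates the extra loss.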
Funding: Supported by the Gansu Science and Technology Program (21YF5GA080).
Abstract: To improve the surface accuracy of the workpiece and obtain potentially valuable information, a dynamic milling force prediction model based on data mining is proposed. At present, dynamic milling force is obtained through finite element simulation or analytical calculation; however, a finite element model inevitably differs from the actual working conditions, and analytical calculation is cumbersome and complex. The proposed model combines regression analysis with a Radial Basis Function (RBF) neural network. Using data mining, the internal relationships between milling force, cutting parameters, temperature, vibration and surface quality are analyzed in depth, and the influence of dynamic milling force changes under different conditions is extracted and summarized through cluster analysis and correlation analysis. The results show that the proposed dynamic milling force model has good prediction performance: it ensures production quality, reduces the occurrence of chatter, improves the surface accuracy of the workpiece and provides a more accurate basis for selecting process parameters.
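An RBF network predicts an output as a weighted sum of radial basis functions centred at reference points. The Python sketch below shows the simplest variant: Gaussian basis functions centred on every training sample, with the output weights solved exactly from the linear system Φw = y. The inputs (normalised cutting parameters) and force values in the usage example are invented for illustration, the width is an assumed setting, and the regression-analysis stage that the paper combines with the RBF network is not reproduced.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gaussian(r, width):
    return math.exp(-(r / width) ** 2)


def fit_rbf(xs, ys, width):
    """Exact-interpolation RBF network with one Gaussian centre per training sample."""
    phi = [[gaussian(math.dist(a, b), width) for b in xs] for a in xs]
    weights = solve(phi, ys)

    def predict(x):
        return sum(w * gaussian(math.dist(x, c), width) for w, c in zip(weights, xs))

    return predict


if __name__ == "__main__":
    # Hypothetical inputs: (normalised spindle speed, normalised feed) -> milling force in N.
    xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    ys = [100.0, 150.0, 130.0, 200.0]
    model = fit_rbf(xs, ys, width=1.0)
    print(model((0.5, 0.5)))
```

Exact interpolation reproduces the training points but can overfit noisy measurements; practical RBF models usually use fewer centres (for example, chosen by clustering) and fit the weights by least squares or with regularisation.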