Visual data mining is an important approach among data mining techniques. Most visual data mining methods are based on computer graphics techniques, but few exploit image-processing techniques. This paper proposes an image-processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining. It post-processes the data mining result image and helps users discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easy to understand, and effective. It provides a new approach for visual data mining.
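The abstract does not spell out how RNAM works, so the sketch below is not RNAM itself. It only illustrates the plain neighborhood averaging (mean filter) that the method's name refers to, applied to a small grid standing in for a result image. The function name `neighborhood_average`, the `radius` parameter, and the sample grid are assumptions made for illustration.

```python
def neighborhood_average(image, radius=1):
    """Replace each pixel by the mean of its square neighborhood.

    Plain mean filter, NOT the RNAM method from the abstract.
    The neighborhood is clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            # Visit every pixel within `radius` of (r, c) that lies inside the image.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out


# Hypothetical 3x3 "result image" with a single isolated bright pixel.
result_image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
smoothed = neighborhood_average(result_image)
```

Averaging spreads an isolated value over its neighbors, which suppresses single-pixel noise and makes larger regions easier to see.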
Parameter values, which in practice change with the circumstances, weather, load level, and so on, strongly affect the result of state estimation. A new parameter estimation method based on data mining technology is proposed. A clustering method is used to classify the historical data in the supervisory control and data acquisition (SCADA) database into several types. Data processing techniques are applied to treat isolated points, missing data, and noisy ("yawp") data in the samples of each classified group. The measurement data belonging to each class are then substituted into a linear regression equation to obtain the regression coefficients and the actual parameters by the least squares method. A practical system demonstrates the high accuracy, reliability, and practicability of the proposed method.
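The regression step can be sketched as an ordinary least-squares line fit run separately on each cluster of samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the cluster labels, the sample values, and the helper name `fit_line` are hypothetical, and a single-input model y = a + b·x stands in for whatever regression equation the paper uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurement samples, already grouped by a clustering step.
clusters = {
    "light_load": ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]),
    "heavy_load": ([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 8.0, 11.0]),
}

# One regression per cluster, so each operating condition gets its own coefficients.
coefficients = {label: fit_line(xs, ys) for label, (xs, ys) in clusters.items()}
```

Fitting per cluster rather than over all data lets the estimated coefficients differ between operating conditions, which is the point of classifying the historical data first.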
Backscatter electron analysis from scanning electron microscopes (BSE-SEM) produces high-resolution image data of both rock samples and thin sections, showing detailed structural and geochemical (mineralogical) information. This allows an in-depth exploration of the rock microstructures and the coupled chemical characteristics in the BSE-SEM image to be made using image processing techniques. Although image processing is a powerful tool for revealing the more subtle data "hidden" in a picture, it is not a commonly employed method in geoscientific microstructural analysis. Here, we briefly introduce the general principles of image processing and further discuss its application in studying rock microstructures using BSE-SEM image data.
In order to rapidly and effectively meet the information demands of command decision-making, it is important to build, maintain, and mine the intelligence database. The type, structure, and maintenance of a military intelligence database are discussed. On this basis, a new data mining algorithm based on a relational intelligence database is presented according to the preference information and the time-limit requirement given by the commander. Furthermore, a simple numerical example is presented to show that the algorithm has better operability. Lastly, the problem of how to process the intelligence data mined from the intelligence database is discussed.
This paper characterizes volcanic rocks through the development and application of an empirical geomechanical system. Geotechnical information was collected from samples from several Atlantic Ocean islands, including the Madeira, Azores, and Canarias archipelagos. An empirical rock classification system, termed the volcanic rock system (VRS), is developed and presented in detail. Results obtained using the VRS are compared with those obtained using the traditional rock mass rating (RMR) system. Data mining (DM) techniques are applied to a database of volcanic rock geomechanical information from the islands. Different algorithms were developed, and consequently several approaches were followed for predicting rock mass classes using the VRS and RMR classification systems. Finally, some conclusions are drawn, with emphasis on the fact that better performance was achieved using attributes from the VRS.
A new method of establishing a rolling load distribution model for plate rolling was developed using online intelligent information-processing technology. The model combines a knowledge model and a mathematical model, taking knowledge discovery in databases (KDD) and data mining (DM) as the starting point. Online maintenance and optimization of the load model are realized. The effectiveness of the new method was verified by offline simulation and online application.
Objective: This study aimed to examine and propagate the medication experience and group formulas of traditional Chinese medicine (TCM) Master XIONG Jibo in diagnosing and treating arthralgia syndrome (AS) through data mining. Methods: Data on outpatient cases of Professor XIONG Jibo were collected from January 1, 2014 to December 31, 2018, along with cases recorded in A Real Famous Traditional Chinese Medicine Doctor: XIONG Jibo's Clinical Medical Record 1, which was published in December 2019. The five variables collected from the patients' data were TCM diagnostic information, TCM and western medicine diagnoses, syndrome, treatment, and prescription. A database was established for the collected data with Excel. Using the Python environment, a customized, modified natural language processing (NLP) model for the diagnosis and treatment of AS by Professor XIONG Jibo was established to preprocess the data and to analyze the word cloud. Frequency analysis, association rule analysis, cluster analysis, and visual analysis of AS cases were performed based on the Traditional Chinese Medicine Inheritance Computing Platform (V3.0) and RStudio (V4.0.3). Results: A total of 610 medical records of Professor XIONG Jibo were collected from the case database. A total of 103 medical records were included after applying the data screening criteria, which comprised 187 uses (45 kinds) of prescriptions and 1506 uses (125 kinds) of Chinese herbs. The main related meridians were the liver, spleen, and kidney meridians. The properties of the Chinese herbs used most were mainly warm, flat, and cold, while the flavors of the herbs were mainly bitter, pungent, and sweet. The main patterns of AS included damp heat, phlegm stasis, and neck arthralgia. The most commonly used herbs for AS were Chuanniuxi (Cyathulae Radix), Huangbo (Phellodendri Chinensis Cortex), Cangzhu (Atractylodis Rhizoma), Qinjiao (Gentianae Macrophyllae Radix), Gancao (Glycyrrhizae Radix et Rhizoma), Huangqi (Astragali Radix), and Chuanxiong (Chuanxiong Rhizoma). The most common effect of the herbs was "promoting blood circulation and removing blood stasis", followed by "supplementing deficiency (Qi supplementing, blood supplementing, and Yang supplementing)" and "dispelling wind and dampness". The data were analyzed with support ≥ 15% and confidence = 100%, and after de-duplication, five second-order association rules, 39 third-order association rules, 39 fourth-order association rules, and two fifth-order association rules were identified. The top-ranking association rules of each order were "Cangzhu (Atractylodis Rhizoma) → Huangbo (Phellodendri Chinensis Cortex)", "Cangzhu (Atractylodis Rhizoma) + Chuanniuxi (Cyathulae Radix) → Huangbo (Phellodendri Chinensis Cortex)", "Chuanniuxi (Cyathulae Radix) + Danggui (Angelicae Sinensis Radix) + Gancao (Glycyrrhizae Radix et Rhizoma) → Qinjiao (Gentianae Macrophyllae Radix)", and "Chuanniuxi (Cyathulae Radix) + Danggui (Angelicae Sinensis Radix) + Gancao (Glycyrrhizae Radix et Rhizoma) + Huangbo (Phellodendri Chinensis Cortex) → Qinjiao (Gentianae Macrophyllae Radix)", respectively. Five clusters were obtained using cluster analysis of the top 30 herbs. The herbs mainly acted by drying dampness, supplementing Qi, and promoting blood circulation. The main prescriptions for AS were Ermiao San (二妙散), Gegen Jianghuang San (葛根姜黄散), and Huangqi Chongteng Yin (黄芪虫藤饮). The herbs of the core prescription included Cangzhu (Atractylodis Rhizoma), Chuanniuxi (Cyathulae Radix), Gancao (Glycyrrhizae Radix et Rhizoma), Huangbo (Phellodendri Chinensis Cortex), Mugua (Chaenomelis Fructus), Qinjiao (Gentianae Macrophyllae Radix), Danggui (Angelicae Sinensis Radix), and Yiyiren (Coicis Semen). Conclusion: Clearing heat and dampness, relieving collaterals and pain, and invigorating Qi and blood are the therapies most commonly used by Professor XIONG Jibo for the treatment of AS. Additionally, a customized NLP model could improve the efficiency of data mining in TCM.
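The support and confidence thresholds quoted above follow the standard association rule definitions: support is the fraction of records containing an itemset, and the confidence of a rule A → B is support(A ∪ B) / support(A). The sketch below computes both for one rule. The four toy prescriptions are invented for illustration and are not data from the study, and the helper names `support` and `confidence` are assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Invented toy prescriptions (NOT the study's records), one set of herbs each.
prescriptions = [
    {"Cangzhu", "Huangbo", "Chuanniuxi"},
    {"Cangzhu", "Huangbo"},
    {"Gancao", "Qinjiao"},
    {"Huangbo", "Gancao"},
]

rule_support = support(prescriptions, {"Cangzhu", "Huangbo"})
rule_confidence = confidence(prescriptions, {"Cangzhu"}, {"Huangbo"})
```

In this toy data every prescription containing Cangzhu also contains Huangbo, so the rule has confidence 1.0, which is what a "confidence = 100%" filter selects.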
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. Medical practitioners may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristic (e.g., greedy search) and metaheuristic (e.g., evolutionary algorithm) techniques have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that negative association rule mining produces a large number of uninteresting rules, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets show that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
Big data cloud computing is a new computing mode that integrates distributed processing, parallel processing, network computing, virtualization technology, load balancing, and other network technologies. In a big data cloud computing system, computing resources can be distributed across a resource pool composed of a large number of computers, allowing users to connect to remote computer systems according to their own data and information needs.
This paper adopts data mining (DM) techniques and fuzzy system theory for robust time series forecasting. By introducing DM techniques, the fuzzy rule extraction algorithm is improved to be more robust to the noise and outliers in time series. Then, the constructed fuzzy inference system (FIS) is optimized with a partition refining strategy to balance the system's accuracy and complexity. The proposed algorithm is compared with the Wang-Mendel (WM) method, a benchmark method for building FIS, in a comprehensive analysis of robustness. In classical Mackey-Glass time series forecasting, the simulation results show that the proposed method is able to predict time series with random perturbation more accurately. For practical application, the proposed FIS is applied to predicting the time series of ship maneuvering motion. To obtain actual time series data records, a ship maneuvering motion trial was conducted on the Yukun ship of Dalian Maritime University in China. The time series forecasting results show that the FIS constructed with DM concepts can forecast ship maneuvering motion robustly and effectively.
Blast furnace data processing is prone to problems such as outliers. To overcome these problems and identify an improved method for processing blast furnace data, we conducted an in-depth study of blast furnace data. Based on data samples from selected iron and steel companies, data types were classified according to their different characteristics; then, appropriate methods were selected to process them in order to resolve the deficiencies and outliers of the original blast furnace data. Linear interpolation was used to fill in the divided continuation data, the K-nearest neighbor (KNN) algorithm was used to fill in correlation data with an internal law, and periodic statistical data were filled with the average. The error rate in the filling was low, and the fitting degree was over 85%. For the screening of outliers, corresponding indicator parameters were added according to the continuity, relevance, and periodicity of the different data. In addition, a variety of algorithms were used for processing. Analysis of the screening results shows that a large amount of useful information in the data was retained and ineffective outliers were eliminated. Standardized processing of blast furnace big data, as the basis of applied research on blast furnace big data, can serve as an important means to improve data quality and retain data value.
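Of the three filling strategies named in the abstract, linear interpolation is the simplest to show. The sketch below fills gaps (marked `None`) in a continuous series by interpolating between the nearest known neighbors. It is an illustration under stated assumptions, not the authors' code: the function name `fill_linear`, the `None` convention for missing readings, and the sample values are hypothetical, and gaps at the very start or end are left unfilled because there is no neighbor on one side.

```python
def fill_linear(values):
    """Fill interior None gaps by linear interpolation between known neighbors."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over consecutive pairs of known positions and fill what lies between.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical sensor series with two missing readings in the middle.
readings = [10.0, None, None, 16.0]
repaired = fill_linear(readings)
```

Interpolation suits slowly varying continuous measurements; the abstract assigns correlated and periodic data to KNN and averaging instead, which this sketch does not cover.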
Real-time perception of rock mass information is of great importance to efficient tunneling and hazard prevention in tunnel boring machines (TBMs). In this study, a TBM-rock mutual feedback perception method based on data mining (DM) is proposed, which takes 10 tunneling parameters related to surrounding rock conditions as input features. For implementation, first, the database of TBM tunneling parameters was established, in which 10,807 tunneling cycles from the Songhua River water conveyance tunnel were accommodated. Then, the spectral clustering (SC) algorithm based on graph theory was introduced to cluster the TBM tunneling data. According to the clustering results and the rock mass boreability index, the rock mass conditions were classified into four classes, and the reasonable distribution intervals of the main tunneling parameters corresponding to each class were presented. Meanwhile, based on the deep neural network (DNN), the real-time prediction model regarding different rock conditions was established. Finally, the rationality and adaptability of the proposed method were validated via analyzing the tunneling specific energy, feature importance, and training dataset size. The proposed TBM-rock mutual feedback perception method enables the automatic identification of rock mass conditions and the dynamic adjustment of tunneling parameters during TBM driving. Furthermore, in terms of prediction performance, the method can predict the rock mass conditions ahead of the tunnel face in real time more accurately than traditional machine learning prediction methods.
In this paper, we designed a customer-centered data warehouse system with five subjects (listing, bidding, transaction, accounts, and customer contact) based on the business process of online auction companies. For each subject, we analyzed its fact indexes and dimensions. Then, taking the transaction subject as an example, we analyzed the data warehouse model in detail and obtained the multi-dimensional analysis structure of the transaction subject. Finally, using data mining for customer segmentation, we divided customers into four types: impulse customers, prudent customers, potential customers, and ordinary customers. Using the results of multi-dimensional customer data analysis, online auction companies can do more targeted marketing and increase customer loyalty.
The performance and reliability of converting natural language into structured query language can be problematic when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so the question of why we must handle nuance has to be asked. This paper looks at an alternative solution for converting a natural language query into a Structured Query Language (SQL) statement capable of being used to search a relational database. The process uses the natural language concept of part of speech to identify words that can be used to identify database tables and table columns. Open NLP based grammar files, together with additional configuration files, assist in the translation from natural language to query language. Once the tables and columns that contain the pertinent data have been identified, the next step is to create the SQL statement.
With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, which extracts useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and then improved its performance using an FPGA. FIsM algorithms are important, basic data mining techniques used to discover association rules from transactional databases. We improved a recently proposed approximate FIsM algorithm so that it would fit onto a hardware architecture efficiently. We then ran experiments on an FPGA. As a result, we have been able to achieve a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.
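Based on the CRISP-DM (cross-industry standard process for data mining) model, a time series forecasting Web service was designed and implemented to predict the download demand for website resources. The paper focuses on the design ideas and the key implementation techniques involved in applying the CRISP-DM model to time series forecasting tasks. Test results show that the time series forecasting Web service achieves high prediction accuracy, is quick to deploy and easy to use, and has some demonstration and reference value for solving similar problems.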
Funding (RNAM visual data mining paper): Supported by the National Natural Science Foundation of China (60173051), the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of the Ministry of Education of China, and the Liaoning Province Higher Education Research Foundation (20040206).
Funding (SCADA parameter estimation paper): The National High Technology Research and Development (863) Program of China (No. 2006AA05Z214).
Funding (BSE-SEM image processing paper): Funded by the National Natural Science Foundation (No. 42261134535), the National Key Research and Development Program (No. 2023YFE0125000), the Frontiers Science Center for Deep-time Digital Earth (No. 2652023001), and the 111 Project of the Ministry of Science and Technology (No. BP0719021); supported by the Department of Geology, University of Vienna (No. FA536901).
Funding (TCM arthralgia syndrome data mining paper): Project of the State Administration of Traditional Chinese Medicine (GZY-YZS-2019-45); the Horizontal Project of Hunan Medical College (HYH-2021Y-KJ-6-33); Scientific Research Project of the Hunan Provincial Department of Education in 2021 (21C0223); Natural Science Foundation of Hunan Province in 2022 (1524).
Funding: the Fundamental Research Funds for the Central Universities, China (No. 01750307); the Doctoral Scientific Research Foundation of Liaoning Province, China (No. 201501188).
Abstract: This paper adopts data mining (DM) techniques and fuzzy system theory for robust time series forecasting. By introducing DM techniques, the fuzzy rule extraction algorithm is made more robust to noise and outliers in the time series. The constructed fuzzy inference system (FIS) is then optimized with a partition-refining strategy to balance the system's accuracy and complexity. The proposed algorithm is compared with the Wang-Mendel (WM) method, a benchmark method for building an FIS, in a comprehensive analysis of robustness. In the classical Mackey-Glass time series forecasting problem, the simulation results show that the proposed method predicts time series with random perturbation more accurately. As a practical application, the proposed FIS is applied to predicting the time series of ship maneuvering motion. To obtain actual time series data, a ship maneuvering motion trial was conducted on the Yukun ship of Dalian Maritime University in China. The forecasting results show that the FIS constructed with DM concepts can forecast ship maneuvering motion robustly and effectively.
Funding: This work is financially supported by the National Nature Science Foundation of China (No. 52004096); the Hebei Province High-End Iron and Steel Metallurgical Joint Research Fund Project, China (No. E2019209314); the Scientific Research Program Project of Hebei Education Department, China (No. QN2019200); and the Tangshan Science and Technology Planning Project, China (No. 19150241E).
Abstract: Blast furnace data processing is prone to problems such as outliers. To overcome these problems and identify an improved processing method, we conducted an in-depth study of blast furnace data. Based on data samples from selected iron and steel companies, data types were classified according to their characteristics, and appropriate methods were then selected to address the missing values and outliers in the original blast furnace data. Linear interpolation was used to fill in the divided continuous data, the K-nearest neighbor (KNN) algorithm was used to fill in correlated data with an internal law, and periodic statistical data were filled with the average. The error rate of the filling was low, and the fitting degree was over 85%. For outlier screening, corresponding indicator parameters were added according to the continuity, relevance, and periodicity of the different data, and a variety of algorithms were used for processing. Analysis of the screening results shows that a large amount of effective information in the data was retained and ineffective outliers were eliminated. Standardized processing of blast furnace big data, as the basis for applied research on such data, can serve as an important means of improving data quality and retaining data value.
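The three filling strategies named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of `None` for missing values, the Euclidean distance, and the default `k=2` are all assumptions made for illustration.

```python
import math


def linear_interpolate(values):
    """Fill None gaps that lie between two known points on a straight line."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading or trailing gaps stay None


def knn_fill(complete_rows, query_features, k=2):
    """Estimate a missing value as the mean over the k nearest complete rows.

    complete_rows: list of (features, value) pairs (hypothetical layout).
    """
    nearest = sorted(
        complete_rows, key=lambda row: math.dist(row[0], query_features)
    )[:k]
    return sum(value for _, value in nearest) / len(nearest)


def mean_fill(values):
    """Replace every None with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]
```

For example, `linear_interpolate([1.0, None, None, 4.0])` fills the gap with evenly spaced values, while `mean_fill` suits data that repeat around a stable level.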
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 41772309 and 51908431) and the Outstanding Youth Foundation of Hubei Province, China (Grant No. 2019CFA074).
Abstract: Real-time perception of rock mass information is of great importance to efficient tunneling and hazard prevention with tunnel boring machines (TBMs). In this study, a TBM-rock mutual feedback perception method based on data mining (DM) is proposed, which takes 10 tunneling parameters related to surrounding rock conditions as input features. First, a database of TBM tunneling parameters was established, containing 10,807 tunneling cycles from the Songhua River water conveyance tunnel. Then, the spectral clustering (SC) algorithm based on graph theory was introduced to cluster the TBM tunneling data. According to the clustering results and the rock mass boreability index, the rock mass conditions were classified into four classes, and reasonable distribution intervals of the main tunneling parameters for each class were presented. Meanwhile, a real-time prediction model for the different rock conditions was established based on a deep neural network (DNN). Finally, the rationality and adaptability of the proposed method were validated by analyzing the tunneling specific energy, feature importance, and training dataset size. The proposed method enables automatic identification of rock mass conditions and dynamic adjustment of tunneling parameters during TBM driving. Furthermore, it predicts the rock mass conditions ahead of the tunnel face in real time more accurately than traditional machine learning prediction methods.
Funding: Supported by the National Natural Science Foundation of China (70471037) and the 211 Project Foundation of Shanghai University (8011040506).
Abstract: In this paper, we design a customer-centered data warehouse system with five subjects (listing, bidding, transaction, accounts, and customer contact) based on the business process of online auction companies. For each subject, we analyze its fact indexes and dimensions. Taking the transaction subject as an example, we then analyze the data warehouse model in detail and derive the multi-dimensional analysis structure of that subject. Finally, using data mining for customer segmentation, we divide customers into four types: impulse customers, prudent customers, potential customers, and ordinary customers. Based on the results of multi-dimensional customer data analysis, online auction companies can carry out more targeted marketing and increase customer loyalty.
Abstract: The performance and reliability of converting natural language into structured query language can suffer when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so the question of why nuance must be handled at all has to be asked. This paper looks at an alternative solution for converting a Natural Language Query into a Structured Query Language (SQL) statement that can be used to search a relational database. The process uses the natural language concept of Part of Speech to identify words that can in turn identify database tables and table columns. Open NLP based grammar files, together with additional configuration files, assist in the translation from natural language to query language. Once the tables and columns that contain the pertinent data have been identified, the next step is to create the SQL statement.
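The idea of mapping part-of-speech-tagged nouns onto tables and columns can be sketched in a few lines of Python. This is a toy illustration, not the paper's system: the function `build_sql`, the pre-tagged input, the Penn Treebank-style noun tags (`NN`, `NNS`), and the exact-match lookup against a schema dictionary are all assumptions, and no Open NLP grammar or configuration files are involved.

```python
def build_sql(tagged_tokens, schema):
    """Build a SELECT statement from (word, pos_tag) pairs.

    tagged_tokens: output of some POS tagger, supplied by hand here.
    schema: hypothetical mapping of table name -> list of column names.
    """
    # Keep only nouns; tags starting with "NN" are assumed to mark nouns.
    nouns = [word.lower() for word, tag in tagged_tokens if tag.startswith("NN")]

    # The first table whose name appears among the nouns is the target.
    table = next((name for name in schema if name in nouns), None)
    if table is None:
        return None

    # Columns of that table mentioned as nouns become the select list.
    columns = [column for column in schema[table] if column in nouns]
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {table}"


tokens = [
    ("Show", "VB"), ("the", "DT"), ("name", "NN"), ("and", "CC"),
    ("city", "NN"), ("of", "IN"), ("all", "DT"), ("customers", "NNS"),
]
print(build_sql(tokens, {"customers": ["id", "name", "city"]}))
```

A real system would also need synonym handling, joins, and WHERE clauses, none of which this sketch attempts.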
Abstract: With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, which extracts useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processing applications continues to increase at a rate of 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on identifying a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and then improved its performance using an FPGA. FIsM algorithms are important, basic data mining techniques used to discover association rules from transactional databases. We improved a recently proposed approximate FIsM algorithm so that it would fit onto a hardware architecture efficiently, and then ran experiments on an FPGA. As a result, we achieved a speed 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement compared to the CPU version.
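For readers unfamiliar with frequent itemset mining, the following Python sketch shows the basic task: counting how often each itemset occurs and keeping those above a support threshold. It is an exact, brute-force illustration only. It is not the approximate streaming algorithm or the FPGA design described in the abstract, and the function name, the `max_size` limit, and the fractional `min_support` parameter are assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) whose support meets the threshold.

    min_support is a fraction of the number of transactions.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates inside a transaction
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


baskets = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
print(frequent_itemsets(baskets, min_support=0.5))
```

Enumerating every combination grows quickly with transaction length, which is one reason approximate and hardware-accelerated methods are of interest for streams.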
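Abstract: Based on the CRISP-DM (cross-industry standard process for data mining) model, a time series forecasting Web service was designed and implemented to forecast the download demand for website resources. The paper focuses on the design ideas and key implementation techniques involved in applying the CRISP-DM model to a time series forecasting task. Test results show that the Web service achieves high forecasting accuracy, deploys quickly, and is easy to use, and that it offers a useful demonstration and reference for solving similar problems.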