Data mining is applied to the prediction of gas disasters, drawing on the characteristics of coal mine gas disasters and on feature knowledge about them. Rough set theory is used to establish a data mining model for gas disaster prediction, and the relations among rough set attributes in the prediction model are discussed, with information entropy criteria used to compensate for the shortcomings of the rough set reduction method. Practical application confirms the effectiveness and practicality of data mining technology in the prediction of gas disasters.
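The abstract above does not state which entropy criterion was used. As a minimal sketch, assuming the common conditional-entropy measure of attribute significance, the following shows how an entropy score can rank condition attributes before reduction. The record fields (`gas`, `vent`, `risk`) and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """Return H(decision | cond_attrs) for a list of dict records."""
    groups = defaultdict(list)
    for row in rows:
        # Records with equal values on cond_attrs fall into one equivalence class.
        groups[tuple(row[a] for a in cond_attrs)].append(row[decision])
    total = len(rows)
    entropy = 0.0
    for labels in groups.values():
        p_group = len(labels) / total
        for count in Counter(labels).values():
            p = count / len(labels)
            entropy -= p_group * p * log2(p)
    return entropy


def significance(rows, base_attrs, attr, decision):
    """Entropy drop obtained by adding attr to base_attrs (assumed criterion)."""
    before = conditional_entropy(rows, base_attrs, decision)
    after = conditional_entropy(rows, base_attrs + [attr], decision)
    return before - after


# Hypothetical toy decision table, not data from the paper.
rows = [
    {"gas": "high", "vent": "poor", "risk": "yes"},
    {"gas": "high", "vent": "good", "risk": "yes"},
    {"gas": "low", "vent": "poor", "risk": "no"},
    {"gas": "low", "vent": "good", "risk": "no"},
]
print(significance(rows, [], "gas", "risk"))
print(significance(rows, [], "vent", "risk"))
```

In this toy table `gas` alone determines `risk`, so it receives the full entropy drop, while `vent` contributes nothing and would be a candidate for removal.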
With the gradual acceleration of informatization in colleges and universities, digital and smart campuses have become important means of managing campuses scientifically. They have been applied to teaching, scientific research, student management, and other fields, improving the quality and efficiency of management. This paper studies an intelligent educational administration management system based on data mining technology. First, it introduces the application process of data mining technology and builds such a system. It then optimizes the application of the Apriori algorithm in educational administration management through transaction compression and frequent sampling. Compared with the traditional Apriori algorithm, the optimized algorithm has a shorter execution time under the same minimum support.
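The abstract does not give the details of its optimization. The sketch below illustrates only the general idea of transaction compression in Apriori, under the usual assumption that a transaction containing no frequent k-itemset cannot contain a frequent (k+1)-itemset and can be dropped from later scans. The sampling step is not shown, and the function name and the absolute-count `min_support` parameter are illustrative choices.

```python
from itertools import combinations


def apriori_compressed(transactions, min_support):
    """Return {itemset: count} for frequent itemsets; min_support is an absolute count."""
    active = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in active:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Transaction compression: keep only transactions that are long enough
        # and contain at least one frequent k-itemset.
        active = [t for t in active
                  if len(t) > k and any(s <= t for s in frequent)]
        k += 1

        # Join frequent (k-1)-itemsets, then prune by downward closure.
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        counts = {c: sum(1 for t in active if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
    return result
```

Each pass scans a shrinking list of transactions, which is where the saving in execution time would come from on large record sets.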
In the present study, data mining and network pharmacology were utilized to explore the principles and mechanisms of traditional Chinese medicine (TCM) in treating acute appendicitis. The goal was to provide a scientific basis for clinical treatment and further research on this disease. First, we searched the National Patent Database for Chinese herbal compound prescriptions used to treat acute appendicitis. We then applied frequency analysis, character and taste meridian analysis, association rule analysis, and hierarchical cluster analysis to identify the patterns of TCM treatment for acute appendicitis, selecting key combinations of Chinese medicines. Next, we screened the main active components of these key TCMs based on quality markers. Using databases such as SwissTargetPrediction, SymMap, ETCM, and STRING, we analyzed the pharmacological mechanisms of these key TCMs in treating acute appendicitis. Key active components and targets were further verified through molecular docking. We identified a total of 129 patents involving 316 Chinese medicines, with 24 being frequently used. The results indicated that most Chinese herbs used for acute appendicitis were heat-clearing drugs, blood-activating and stasis-removing drugs, and purging drugs. The primary active ingredients of the Rhubarb-Cortex Moutan-Flos Lonicerae combination for treating acute appendicitis included Emodin, Paeonol, Physcion, Chlorogenic acid, Chrysophanol, Rhein acid, and Aloe-emodin. These ingredients targeted key proteins such as ALB, TP53, BCL2, STAT3, IL-6, and TNF, and were involved in cellular responses to lipopolysaccharides, cell composition, and various cytokine-mediated biological processes. They also interacted with signaling pathways such as AGE-RAGE, TNF, IL-17, and FoxO. Based on patent data, this study analyzed medication patterns in the treatment of acute appendicitis, discussed the possible mechanisms of key TCM combinations, and provided a scientific basis and new perspectives for the diagnosis and treatment of the disease.
Objective To explore the optimization and principles of acupoint selection and coordination in the treatment of adult abdominal obesity with acupuncture and moxibustion over the past decade using data mining. Methods Clinical studies of abdominal obesity treated with acupuncture and moxibustion, published from March 1, 2013 to March 31, 2023, were retrieved from China Biology Medicine disc (CBMdisc), China National Knowledge Infrastructure (CNKI), Wanfang, China Science and Technology Journal Database (VIP), PubMed, Embase, Google Scholar, Web of Science, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), PsycINFO, and Scopus. Using IBM SPSS Modeler 18.0 and other software, frequency analysis, association-rules analysis, and cluster analysis were conducted on interventions, traditional Chinese medicine (TCM) patterns, frequency of acupoint use, meridian attribution of acupoints, acupoint location, etc. Results A total of 55 articles were included, involving 102 prescriptions and 71 acupoints. The top 3 interventions were acupoint embedding, simple electroacupuncture, and simple filiform needling. Seventeen TCM patterns/syndromes were collected, dominated by spleen deficiency and damp blockage, spleen and kidney yang deficiency, and heat accumulation in the stomach and intestines. The acupoints used in clinical practice were mostly on the foot-yangming stomach meridian, the conception vessel, and the foot-taiyin spleen meridian, and were located in the abdominal region. The most frequently used acupoints were Tianshu (ST25), Zhongwan (CV12), Daheng (SP15), Zusanli (ST36), Huaroumen (ST24), and Daimai (GB26). The high-frequency specific points were crossing points and front-mu points, of which ST25 and CV12 were the most prominent. Association-rules analysis of the high-frequency acupoints yielded 20 groups of associated acupoints, in which the core acupoints were ST25, CV12, SP15, and ST36. Conclusion Over the past 10 years, abdominal obesity has been treated with acupoints of the foot-yangming stomach meridian, the conception vessel, and the foot-taiyin spleen meridian. Compared with the regimen for simple obesity, acupoints in the abdominal region, such as ST25, CV12, SP15, and ST36, are specially selected for abdominal obesity. Supplementary acupoints are selected based on syndrome differentiation to address both the disease manifestations and the root causes.
Objective: To explore the core acupuncture acupoints and pattern-adapted acupoint combination rules for autism spectrum disorder (ASD) complicated with sleep disorder using clinical data mining technology. Methods: A retrospective analysis was conducted on the diagnosis and treatment data of 104 children with ASD complicated with sleep disorder admitted to Xi'an Traditional Chinese Medicine (TCM) Encephalopathy Hospital from January 2022 to December 2024. Cross-pattern main acupoints were screened via frequency statistics, chi-square test, and factor analysis; pattern-specific auxiliary acupoints were extracted by combining multiple correspondence analysis, cluster analysis, and association rule mining. Results: Ten cross-pattern main acupoints (including Baihui, Sishenzhen, Language Area 1, Language Area 2, Neiguan, Shenmen, Yongquan, and Xuanzhong) were identified, and acupoint combination schemes were established for four major TCM patterns (Hyperactivity of Liver and Heart Fire, Deficiency of Kidney Essence, Deficiency of Both Heart and Spleen, and Hyperactivity of Liver with Spleen Deficiency). Conclusion: Acupuncture treatment should follow the principle of "regulating spirit and calming the brain as the root, and dredging collaterals based on pattern differentiation as the branch". The synergy between main and auxiliary acupoints can regulate the disease accurately, providing a basis for precise clinical treatment.
Introduction: Neurosurgical emergencies such as spontaneous intracerebral hemorrhage (ICH), traumatic brain injury (TBI), and acute brain herniation are among the most time-sensitive and high-stakes conditions in modern medicine. Clinical decisions often must be made within minutes, yet these decisions are traditionally guided by limited information, heuristic reasoning, and past experience. In this context, the rise of medical data mining and real-time analytics offers a transformative opportunity: to extract actionable intelligence from the flood of clinical, imaging, and physiological data already being collected, and to use this intelligence to guide care in real time [1–3] (Figure 1).
Previous weighted frequent pattern (WFP) mining algorithms are not suitable for data streams because they need multiple database scans. In this paper, we present an efficient algorithm, SWFP-Miner, to mine weighted frequent patterns over data streams. SWFP-Miner is based on a sliding window and can discover important frequent patterns from recent data. A new refined weight definition is proposed to preserve the downward closure property, and two pruning strategies are presented to prune weighted infrequent patterns. Experimental studies are performed to evaluate the effectiveness and efficiency of SWFP-Miner.
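The abstract does not spell out SWFP-Miner's refined weight definition or its pruning strategies, so the following is not that algorithm. It is a minimal sketch of the two ingredients the abstract names, a sliding window over the stream and a weighted support score, assuming the simple textbook definition "mean item weight times frequency in the window". The class name, the weights, and that definition are assumptions; note that this simple definition does not by itself guarantee downward closure, which is the gap the paper's refined definition is said to address.

```python
from collections import deque


class SlidingWindowCounter:
    """Keep the most recent transactions and score itemsets over them."""

    def __init__(self, window_size, weights):
        # deque(maxlen=...) silently discards the oldest transaction when full.
        self.window = deque(maxlen=window_size)
        self.weights = weights  # hypothetical per-item weights in (0, 1]

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def weighted_support(self, itemset):
        """Assumed definition: mean item weight times frequency in the window."""
        itemset = frozenset(itemset)
        frequency = sum(1 for t in self.window if itemset <= t)
        mean_weight = sum(self.weights[i] for i in itemset) / len(itemset)
        return mean_weight * frequency


counter = SlidingWindowCounter(3, {"a": 0.5, "b": 1.0, "c": 0.2})
for tx in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]:
    counter.add(tx)
print(counter.weighted_support({"a", "b"}))
```

Because the window holds only three transactions, the first one has already expired by the time the score is computed, so each arriving transaction is processed once without rescanning older data.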
At present, some resistible illegal operations aim to create false public opinion around emergent events on the Internet, seriously disrupting normal Internet order. However, the traditional approach to Internet public opinion pre-warning relies mainly on manual analysis, which is too inefficient for the analysis of massive volumes of Internet public opinion information. This paper therefore puts forward an Internet public opinion pre-warning mechanism for emergent events based on a multi-relational data clustering algorithm, discusses pre-warning in terms of the state and dissemination of Internet public opinion and the historical data, and automatically classifies Internet public opinion using the multi-relational data clustering algorithm. The results show that this method can be used effectively for Internet public opinion pre-warning on emergent events, with an accuracy rate as high as 95%.
Safe production in coal mines can be further improved by forecasting the quantity of gas emission from the real-time and historical data saved by the gas monitoring system. Making use of the advantages of data warehouse and data mining technology for processing large quantities of redundant data, a method of forecasting mine gas emission quantity based on FDM, and its application, were studied. The construction of a fuzzy similarity relation and clustering analysis were proposed, through which potential relationships within the gas emission data may be found. A pattern discovery model and a forecast model were presented, and a detailed approach to realizing the forecast was also proposed; it has been applied to forecast the gas emission quantity efficiently.
Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data of complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification task target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to look for the optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to set up disaggregated models of the well logging curves. The modeling and actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
Different acupuncture-moxibustion therapies can produce different clinical effects; that is, the effect is specific to the therapy, which is important for obtaining acupuncture-moxibustion efficacy. In this study, based on the results of data mining, the clinical application rules of fire needle, acupoint injection, catgut embedment, acupoint application, moxibustion therapy, and filiform needle acupuncture were summarized in terms of disease category, efficacy, and related prescriptions (such as medication and acupoint selection). The disease categories to which each acupuncture-moxibustion treatment method generally applies were further screened, so as to guide clinical application and achieve the best efficacy.
An intrusion detection (ID) model is proposed based on the fuzzy data mining method. A major difficulty of anomaly ID is that patterns of normal behavior change with time. In addition, an actual intrusion with a small deviation may match normal patterns, so the intrusion cannot be detected by the detection system. To solve this problem, a fuzzy data mining technique is used to extract patterns representing the normal behavior of a network. A set of fuzzy association rules mined from the network data serves as a model of "normal behaviors". To detect anomalous behaviors, fuzzy association rules are generated from new audit data, and their similarity to the rule sets mined from "normal" data is computed. If the similarity values are lower than a threshold value, an alarm is given. Furthermore, genetic algorithms are used to adjust the fuzzy membership functions and to select an appropriate set of features.
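The detection step described above (compare a newly mined rule set with the "normal" rule set and raise an alarm when similarity falls below a threshold) can be sketched as follows. The abstract does not define its similarity function, so the scoring rule here is an assumption: each rule present in both sets contributes one minus the mean absolute difference in support and confidence, and the total is divided by the number of distinct rules. Rule names and the threshold are hypothetical.

```python
def rule_set_similarity(normal_rules, new_rules):
    """Similarity in [0, 1] between two {rule: (support, confidence)} dicts."""
    all_rules = set(normal_rules) | set(new_rules)
    if not all_rules:
        return 1.0
    score = 0.0
    for rule in set(normal_rules) & set(new_rules):
        s1, c1 = normal_rules[rule]
        s2, c2 = new_rules[rule]
        # A rule found in only one set contributes nothing to the score.
        score += 1.0 - (abs(s1 - s2) + abs(c1 - c2)) / 2.0
    return score / len(all_rules)


def should_alarm(normal_rules, new_rules, threshold):
    """Raise an alarm when the new rules look too unlike the normal model."""
    return rule_set_similarity(normal_rules, new_rules) < threshold


# Hypothetical fuzzy rules written as (antecedent, consequent) labels.
normal = {
    ("SN=LOW", "FN=LOW"): (0.6, 0.9),
    ("PN=MEDIUM", "SN=LOW"): (0.4, 0.8),
}
audit = {("SN=LOW", "FN=LOW"): (0.2, 0.5)}
print(should_alarm(normal, audit, threshold=0.5))
```

A graded similarity score like this tolerates small drifts in normal behavior, while a large shift in the mined rules still trips the alarm.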
To improve the accuracy and completeness of mining data records from the web, the concepts of isomorphic page and directory page are introduced and three algorithms are proposed. An isomorphic web page is one of a set of web pages that have a uniform structure and differ only in their main information. A web page that contains many links to isomorphic web pages is called a directory page. Algorithm 1 finds directory pages in a website by analyzing the similarity of adjacent links. It first sorts the links and then counts the links in each directory. If the count is greater than a given threshold, it finds the similar sub-page links in the directory and returns the results. A function for judging whether a web page is isomorphic is also proposed. Algorithm 2 mines data records from an isomorphic page using a noise information filter. It is based on the fact that the noise information is the same in two isomorphic pages and only the main information differs. Algorithm 3 mines data records from an entire website using spider (crawler) technology. The experiment shows that the proposed algorithms mine data records more completely than existing algorithms. Mining data records from isomorphic pages is an efficient method.
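The two ideas described above can be sketched briefly. This is a simplified illustration under stated assumptions, not the paper's algorithms: "links in each directory" is taken to mean URLs sharing the same host and parent path, and "noise" is taken to mean text lines that appear in both of two isomorphic pages. The function names and the example URLs are hypothetical.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlparse


def candidate_directories(links, threshold):
    """Group sorted links by (host, parent path); keep groups larger than threshold."""
    groups = defaultdict(list)
    for url in sorted(links):
        parts = urlparse(url)
        groups[(parts.netloc, dirname(parts.path))].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > threshold}


def filter_noise(page_lines, sibling_lines):
    """Keep the lines of one page that do not also appear in an isomorphic sibling."""
    noise = set(sibling_lines)
    return [line for line in page_lines if line not in noise]


links = [
    "http://example.com/news/1.html",
    "http://example.com/news/2.html",
    "http://example.com/news/3.html",
    "http://example.com/about.html",
]
print(candidate_directories(links, threshold=2))
print(filter_noise(["Home", "Price: 10", "Contact"],
                   ["Home", "Price: 20", "Contact"]))
```

A real implementation would compare page structure rather than raw text lines, but the same principle applies: whatever two isomorphic pages share is treated as template noise, and what differs is the record.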
Funding (coal mine gas disaster prediction study): the National Natural Science Foundation of China (70572070); the Liaoning Province Talents Fund Projects (2005219005); the Technology Key Project of Liaoning Province (2006220019).
Funding (TCM treatment of acute appendicitis study): Henan Province Special Research Project of Traditional Chinese Medicine (Grant No. 2022ZY1090).
Funding (abdominal obesity acupoint selection study): Shanghai College Students Innovation and Entrepreneurship Training Program Project (202310268066); the 16th Batch of Science and Technology Innovation Projects of Shanghai University of Traditional Chinese Medicine (SHUTCM2023010); the 2024 Shanghai Oriental Talent Program Youth Project; the 2021 High-level Local University Innovation Team Project of Shanghai University of Traditional Chinese Medicine (No. 3 Shanghai Education Commission Personnel [2022]).
Funding (ASD with sleep disorder acupoint study): Song Hujie's Inheritance Studio of National Renowned Traditional Chinese Medicine Experts.
Funding (well logging lithology identification study): the National Science and Technology Major Project (No. 2011ZX05023-005-006).
Funding (acupuncture-moxibustion therapy application study): National Natural Science Foundation of China (81072883, 81173342, 81473773); Scientific Research Project of Hebei Education Department (Z 2014145); Planned Project of Young Talents in Colleges and Universities in Hebei Province (BJ 2014047).