This review presents a comprehensive and forward-looking analysis of how Large Language Models (LLMs) are transforming knowledge discovery in the rational design of advanced micro/nano electrocatalyst materials. Electrocatalysis is central to sustainable energy and environmental technologies, but traditional catalyst discovery is often hindered by high complexity, fragmented knowledge, and inefficiency. LLMs, particularly those based on Transformer architectures, offer unprecedented capabilities for extracting, synthesizing, and generating scientific knowledge from vast unstructured text corpora. This work provides the first structured synthesis of how LLMs have been applied across electrocatalysis tasks, including automated information extraction from the literature, text-based property prediction, hypothesis generation, synthesis planning, and knowledge graph construction. We comparatively analyze leading LLMs and domain-specific frameworks (e.g., CatBERTa, CataLM, CatGPT) in terms of methodology, application scope, performance metrics, and limitations. Through curated case studies across key electrocatalytic reactions (HER, OER, ORR, and CO₂RR), we highlight emerging trends such as the growing use of embedding-based prediction, retrieval-augmented generation, and fine-tuned scientific LLMs. The review also identifies persistent challenges, including data heterogeneity, hallucination risks, the lack of standard benchmarks, and limited multimodal integration. Importantly, we articulate future research directions, such as the development of multimodal and physics-informed MatSci-LLMs, enhanced interpretability tools, and the integration of LLMs with self-driving laboratories for autonomous discovery. By consolidating fragmented advances and outlining a unified research roadmap, this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation with large language model technologies.
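The review highlights retrieval-augmented generation (RAG). As a rough illustration only, the sketch below shows the retrieval step of a generic RAG pipeline: passages are ranked by cosine similarity between embedding vectors, and the top hits are placed in the prompt. The three-dimensional vectors and passage ids are hypothetical stand-ins for the output of a real embedding model; none of the frameworks named above is implied to work this way.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, corpus, k=2):
    """Return the ids of the k passages whose embeddings are most similar to the query."""
    scored = sorted(corpus.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical 3-dimensional "embeddings" of literature passages.
corpus = {
    "her_pt_paper": [0.9, 0.1, 0.0],
    "oer_iro2_paper": [0.1, 0.9, 0.1],
    "co2rr_cu_paper": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # hypothetical embedding of a question about HER catalysts
top = retrieve(query, corpus, k=2)
# The retrieved ids (in a real system, the passage texts) are prepended to the LLM prompt.
prompt = "Answer using these sources: " + ", ".join(top)
```

Grounding the generator in retrieved passages is what RAG relies on to reduce hallucination; the quality of the embedding model largely determines whether the right passages are found.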
Traditional Chinese medicine (TCM) is a treasure trove of ancient knowledge and holds a crucial position in the medical field. However, exploration of TCM's extensive information has been hindered by challenges related to data standardization, completeness, and accuracy, primarily because TCM resources are distributed across many decentralized sources. To address these issues, we developed a platform for TCM knowledge discovery (TCMKD, https://cbcb.cdutcm.edu.cn/TCMKD/). Seven types of data (syndromes, formulas, Chinese patent drugs (CPDs), Chinese medicinal materials (CMMs), ingredients, targets, and diseases) were manually proofread and consolidated within TCMKD. To strengthen the integration of TCM with modern medicine, TCMKD employs analytical methods such as TCM data mining, enrichment analysis, and network localization and separation. These tools help elucidate the molecular-level commonalities between TCM and contemporary scientific insights. In addition to these analytical capabilities, TCMKD embeds a quick question-and-answer (Q&A) system for querying the database efficiently, which improves the platform's interactivity. The platform also provides a TCM text annotation tool, offering a simple and efficient method for TCM text mining. Overall, TCMKD has the potential to become a pivotal repository for TCM and for investigating the pharmacological foundations of TCM treatments, and its flexible embedded tools and algorithms can also be applied to the study of other traditional medical systems beyond TCM.
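Enrichment analysis, one of the methods TCMKD lists, is commonly performed with a one-sided hypergeometric test. The sketch below illustrates that general technique under the assumption that it resembles what the platform does; it is not TCMKD's implementation. It asks whether a formula's targets overlap a pathway's genes more than chance would predict. All counts are hypothetical.

```python
from math import comb

def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric test: P(X >= k).

    N: background genes, K: genes annotated to the pathway,
    n: target genes of the herb or formula, k: observed overlap.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical example: 20 background genes, a 5-gene pathway,
# and a formula hitting 4 targets, 3 of which fall in the pathway.
p = enrichment_p_value(N=20, K=5, n=4, k=3)
print(f"p = {p:.4f}")  # p ≈ 0.032 (= 155 / 4845)
```

When many pathways are tested at once, the resulting p-values would normally also be corrected for multiple testing (for example with the Benjamini-Hochberg procedure).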
As a new data management paradigm, knowledge graphs can integrate multiple data sources and support quick responses, reasoning, and better predictions in drug discovery. Porcine reproductive and respiratory syndrome (PRRS), characterized by strong contagiousness and high morbidity and mortality, is a common infectious disease in the global swine industry that causes great economic losses. Traditional Chinese medicine (TCM) has the advantages of few adverse effects and relatively low cost, so it is considered a possible treatment for PRRS given the current lack of safe and effective approaches. Here, we constructed a knowledge graph containing common biomedical data from humans and Sus scrofa as well as information on thousands of TCMs. We then validated the effectiveness of the Sus scrofa knowledge graph with the t-SNE algorithm and selected the optimal model (TransR) from six typical models (TransE, TransR, DistMult, ComplEx, RESCAL, and RotatE) according to five indicators: MRR, MR, HITS@1, HITS@3, and HITS@10. Based on embedding vectors trained by the optimal model, anti-PRRSV TCMs were predicted along two paths, VHC-Herb and VHPC-Herb, and potential anti-PRRSV TCMs were identified by retrieving the HERB database according to the pharmacological properties corresponding to PRRS symptoms. Finally, the capacity of Dan Shen (Salvia miltiorrhiza Bunge) to resist PRRSV infection was validated in a cell experiment, in which the inhibition rate of PRRSV exceeded 90% at Dan Shen extract concentrations of 0.004, 0.008, 0.016, and 0.032 mg/mL. In summary, this is the first report of a Sus scrofa knowledge graph that includes TCM information; our study reflects the important application value of deep learning on graphs in the swine industry and provides accessible TCM resources for PRRS.
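The five indicators used to select TransR are standard link-prediction metrics computed from the rank of the correct entity for each test triple. The sketch below shows how MR, MRR, and HITS@k follow from such ranks, together with a TransR-style distance score. The ranks, vectors, and projection matrix are made-up values, and the filtering of known triples during ranking, which most evaluations apply, is omitted.

```python
import math

def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """MR, MRR and HITS@k from the 1-based ranks of the true entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank (higher is better)
    }
    for k in ks:
        metrics[f"HITS@{k}"] = sum(r <= k for r in ranks) / n  # share of ranks within top k
    return metrics

def transr_distance(h, r, t, M_r):
    """TransR-style distance ||M_r h + r - M_r t||_2; lower means a more plausible triple.

    TransR is usually written with the squared norm, which yields the same ranking.
    M_r is a list of rows projecting entity vectors into the relation space.
    """
    def project(e):
        return [sum(m * x for m, x in zip(row, e)) for row in M_r]
    h_r, t_r = project(h), project(t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r)))

ranks = [1, 3, 2, 10, 1]  # hypothetical ranks for five test triples
print(link_prediction_metrics(ranks))
# MR = 3.4, MRR ≈ 0.587, HITS@1 = 0.4, HITS@3 = 0.8, HITS@10 = 1.0
```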
Since Google introduced the concept of Knowledge Graphs (KGs) in 2012, KG construction technologies have evolved into a comprehensive methodological framework encompassing knowledge acquisition, extraction, representation, modeling, fusion, computation, and storage. Within this framework, knowledge extraction is the core component and directly determines KG quality. In military domains, traditional manual curation faces efficiency constraints due to data fragmentation, complex knowledge architectures, and confidentiality protocols. Meanwhile, crowdsourced ontology construction approaches from general domains do not transfer, and human-crafted ontologies generalize poorly. To address these challenges, this study proposes an Ontology-Aware LLM Methodology for Military Domain Knowledge Extraction (LLM-KE). The approach leverages the deep semantic comprehension of Large Language Models (LLMs) to simulate the cognitive processes of human experts in crowdsourced ontology construction, enabling automated extraction of knowledge from military texts. It improves both knowledge processing efficiency and KG completeness. Empirical analysis demonstrates that the method effectively resolves scalability and dynamic adaptation challenges in military KG construction, establishing a new technological pathway for advancing military intelligence development.
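The abstract does not specify how LLM-KE constrains the model's output. One plausible ontology-aware step, sketched here purely as an assumption, is to accept an LLM-proposed triple only if its relation exists in the ontology and its head and tail entities have the types required by the relation's domain and range. The schema, entity names, and types are hypothetical, and the LLM call that would produce the candidate triples is left out.

```python
# Hypothetical ontology: relation -> (domain class, range class).
ONTOLOGY = {
    "equipped_with": ("Unit", "Equipment"),
    "stationed_at": ("Unit", "Location"),
}
# Hypothetical entity typing, e.g. produced by an entity-linking step.
ENTITY_TYPES = {"Unit A": "Unit", "Radar R1": "Equipment", "Base B": "Location"}

def validate_triples(candidates):
    """Split candidate (head, relation, tail) triples into ontology-consistent and rejected ones."""
    accepted, rejected = [], []
    for head, relation, tail in candidates:
        schema = ONTOLOGY.get(relation)
        ok = (
            schema is not None
            and ENTITY_TYPES.get(head) == schema[0]
            and ENTITY_TYPES.get(tail) == schema[1]
        )
        (accepted if ok else rejected).append((head, relation, tail))
    return accepted, rejected

# Candidates as an LLM might return them for one sentence of text.
candidates = [
    ("Unit A", "equipped_with", "Radar R1"),  # consistent with the schema
    ("Radar R1", "stationed_at", "Base B"),   # wrong domain type
    ("Unit A", "commands", "Base B"),         # relation not in the ontology
]
accepted, rejected = validate_triples(candidates)
```

Rejected triples could be returned to the model together with the schema as feedback, or queued for expert review; the abstract does not say which, if either, LLM-KE does.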
The reliable operation of power grid secondary equipment is an important guarantee of power system safety and stability. However, various defects can arise in secondary equipment during long-term operation. Existing methods based on knowledge extraction and fusion cannot describe the complex relationships between defect phenomena and multi-layer causes, or the probabilistic influences among secondary equipment, which limits the timeliness and accuracy of defect identification. Therefore, a defect recognition method based on the fusion of a Bayesian network and a knowledge graph is proposed. The defect data of secondary equipment are transformed into a structured knowledge graph through knowledge extraction and fusion. The knowledge graph of power grid secondary equipment is then mapped to a Bayesian network framework, combined with historical defect data, and Noisy-OR nodes are introduced. The prior and conditional probabilities of the Bayesian network are assigned to build a model that reflects the probabilistic dependence between defect phenomena and potential causes in power grid secondary equipment. Defect identification is achieved through defect subgraph search based on the knowledge graph and defect inference based on the Bayesian network. Practical application cases demonstrate the method's effectiveness in identifying the causes of secondary equipment defects, improving identification accuracy and efficiency.
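A Noisy-OR node keeps the conditional probability table of a defect phenomenon small: each cause i independently triggers the defect with probability p_i, and a leak term covers unmodeled causes, so P(defect | active causes) = 1 - (1 - leak) × Π(1 - p_i). The sketch below pairs that rule with brute-force posterior inference over two causes. The cause names, priors, and probabilities are invented for illustration and are not values from the paper.

```python
from itertools import product

def noisy_or(active, p, leak=0.0):
    """P(defect observed | set of active causes) under the Noisy-OR model."""
    q = 1.0 - leak
    for cause in active:
        q *= 1.0 - p[cause]  # probability that this cause fails to trigger the defect
    return 1.0 - q

def posterior_given_defect(priors, p, leak=0.0):
    """P(cause present | defect observed), by enumerating every combination of causes."""
    causes = list(priors)
    joint = {c: 0.0 for c in causes}
    evidence = 0.0
    for states in product([True, False], repeat=len(causes)):
        prior = 1.0
        for c, on in zip(causes, states):
            prior *= priors[c] if on else 1.0 - priors[c]
        active = [c for c, on in zip(causes, states) if on]
        weight = prior * noisy_or(active, p, leak)  # P(states, defect)
        evidence += weight
        for c in active:
            joint[c] += weight
    return {c: joint[c] / evidence for c in causes}

p = {"power_module_fault": 0.6, "loose_wiring": 0.3}       # hypothetical link strengths
priors = {"power_module_fault": 0.1, "loose_wiring": 0.2}  # hypothetical prior rates
print(noisy_or(["power_module_fault", "loose_wiring"], p, leak=0.05))  # ≈ 0.734
print(posterior_given_defect(priors, p, leak=0.05))
```

Exhaustive enumeration grows as 2^n in the number of causes, so a real diagnosis model would use a Bayesian network inference engine; the sketch only makes the Noisy-OR arithmetic visible.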
With the widespread use of social media, the propagation of health-related rumors has become a significant public health threat. Existing methods for detecting health rumors rely predominantly on external knowledge or propagation structures; only a few recent approaches attempt causal inference, and these have not yet effectively integrated causal discovery with domain-specific knowledge graphs. In this study, we found that combining causal discovery with domain-specific knowledge graphs can effectively identify implicit pseudo-causal logic embedded in texts, which holds significant potential for health rumor detection. To this end, we propose CKDG, a dual-graph fusion framework based on causal logic and medical knowledge graphs. CKDG constructs a weighted causal graph to capture implicit causal relationships in the text and introduces a medical knowledge graph to verify semantic consistency, thereby enhancing its ability to identify misused professional terminology and pseudoscientific claims. In experiments on a dataset of 8430 health rumors, CKDG achieved an accuracy of 91.28% and an F1 score of 90.38%, improvements of 5.11% and 3.29%, respectively, over the best baseline. Our results indicate that integrating causal discovery with domain-specific knowledge graphs offers significant advantages for health rumor detection systems. The method not only improves detection performance but also enhances the transparency and credibility of model decisions by tracing causal chains and the sources of knowledge conflicts. We anticipate that this work will provide key technological support for developing trustworthy health-information filtering systems, thereby improving the reliability of public health information on social media.
To improve the performance of multiple classifier systems, a new feature-decision level fusion method based on knowledge discovery is proposed. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measure corresponding to each output of each base classifier is induced from the established decision tables (DTs) in the form of a mass function in Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and DST. Experiments on the classification of hyperspectral remote sensing images show that the proposed method improves classification performance compared with plurality voting (PV).
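In DST, mass functions from different sources are normally fused with Dempster's rule of combination: the masses of every pair of focal sets are multiplied, the product is assigned to their intersection, and the result is renormalized by 1 - K, where K is the mass that fell on empty intersections (the conflict). The sketch shows the rule on its own. The paper's generalized rough set model for deriving masses from decision tables is not reproduced, and the class labels and mass values are hypothetical.

```python
from collections import defaultdict

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements."""
    combined = defaultdict(float)
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            inter = a & b
            if inter:
                combined[inter] += mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict

THETA = frozenset({"A", "B", "C"})  # frame of discernment: three hypothetical classes
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.3, THETA: 0.1}  # base classifier 1
m2 = {frozenset({"B"}): 0.5, frozenset({"A"}): 0.3, THETA: 0.2}       # base classifier 2
fused, K = dempster_combine(m1, m2)
best = max((s for s in fused if len(s) == 1), key=fused.get)
# K = 0.3; fused mass on {A} = 0.42 / 0.7 = 0.6, so the decision is class A
```

Unlike plurality voting, which counts one vote per classifier, this rule lets each classifier express partial ignorance (mass on {A, B} or on the whole frame), which the combination then takes into account.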
To discover fault diagnosis knowledge in the maintenance records of flexible manufacturing system (FMS) equipment, an algorithm (process) is presented that consists of ① a preparatory phase, in which selected items in the maintenance records are decomposed into associated concepts and attributes, and ② a discovery and establishment process, in which possible relationships between the concepts and attributes are established and knowledge is formulated. Applying the method captured the rich diagnostic knowledge contained in maintenance records. An application of the method to a diagnosis system for FMS equipment showed that the approach is correct and effective.
A new algorithm for knowledge discovery based on statistical induction logic is proposed, and the validity of the method is verified by examples. The method is suitable for a wide range of knowledge discovery applications, such as studying causal relations, acquiring uncertain knowledge, and analyzing principal factors. Describing the state space with language fields makes the algorithm robust and adaptable and yields results that are easier to understand, being isomorphic to natural language in the topological space.
The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS), in Beijing from June 19 to June 22, 2016. The conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in data science and knowledge discovery. The event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The conference was organized by the Journal of Data and Information Science (JDIS) to bring the journal to the attention of an international and local audience.
The present article outlines progress in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, validated through an application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) that combines two machine learning techniques to generate sets of production rules that efficiently describe the raw observational data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system then learns by induction to build decision trees, from which production rules are automatically deduced.
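The final step of that pipeline, turning an induced decision tree into production rules, amounts to reading off every root-to-leaf path as an IF-THEN rule. The sketch below assumes a small hand-written tree over hypothetical oceanographic attributes and cluster labels; in the described system the tree would instead be induced from the conceptual clusters.

```python
def tree_to_rules(node, conditions=()):
    """Collect one (conditions, label) production rule per root-to-leaf path.

    A node is either a leaf label (str) or a tuple (attribute, {value: subtree}).
    """
    if isinstance(node, str):  # leaf: the concept/cluster label
        return [(conditions, node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, conditions + ((attribute, value),)))
    return rules

# Hypothetical tree over discretized ocean observations.
tree = ("depth", {
    "shallow": ("temperature", {"warm": "surface_water", "cold": "upwelling_water"}),
    "deep": "deep_water",
})

rules = tree_to_rules(tree)
for conds, label in rules:
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN cluster = {label}")
```

Because each leaf yields exactly one rule, the rule set is as compact as the tree; rule-pruning steps that merge or simplify conditions are not shown.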
In the current biomedical data movement, numerous efforts have been made to convert and normalize large amounts of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). As increasing amounts of semi-structured data enter the biomedical community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detecting related concepts among medical ontologies is an important goal of life science research. It is even more crucial to determine how different concepts are related within a single ontology or across multiple ontologies by analyzing predicates in different knowledge bases. However, in today's era of information explosion, it is extremely difficult for biomedical researchers to find existing or potential predicates that link cross-domain concepts without support from schema pattern analysis. A mechanism is therefore needed that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, closer topics and generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model: it performs predicate-oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply a novel unsupervised learning algorithm to partition large data sets into smaller, closer topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We implemented a prototype system named BmQGen and evaluated the proposed model on a colorectal surgical cohort from the Mayo Clinic.
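The abstract gives no details of BmQGen's similarity measure or its unsupervised algorithm, so the sketch below substitutes simple, openly different choices: each predicate is described by the set of classes it connects, predicates are compared with Jaccard similarity to form the similarity matrix, and topics are the connected components of the graph linking predicates whose similarity reaches a threshold. The predicate names and class sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)

def similarity_matrix(usage):
    """Pairwise Jaccard similarity of predicates, each described by the classes it connects."""
    names = list(usage)
    return names, [[jaccard(usage[x], usage[y]) for y in names] for x in names]

def partition(names, sim, threshold=0.5):
    """Topics = connected components of the graph joining predicates with sim >= threshold."""
    unvisited = set(range(len(names)))
    topics = []
    while unvisited:
        start = min(unvisited)
        unvisited.discard(start)
        stack, component = [start], set()
        while stack:
            i = stack.pop()
            component.add(names[i])
            for j in list(unvisited):
                if sim[i][j] >= threshold:
                    unvisited.discard(j)
                    stack.append(j)
        topics.append(component)
    return topics

usage = {  # hypothetical predicate -> classes it links
    "hasDiagnosis": {"Patient", "Disease"},
    "hasComplication": {"Patient", "Disease", "Surgery"},
    "performedOn": {"Surgery", "Patient"},
    "partOf": {"Anatomy", "Organ"},
    "locatedIn": {"Anatomy", "Organ", "Tumor"},
}
names, sim = similarity_matrix(usage)
topics = partition(names, sim)
```

Here the clinical predicates and the anatomical predicates fall into two separate topics; each topic could then seed query generation, for example one query pattern per predicate in the topic.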
Purpose: This paper explores a knowledge discovery method that visualizes and analyzes co-occurrence relations among three or more entities in collections of journal articles. Design/methodology/approach: Methods including model construction, system analysis, and experiments are used. The author improved Morris' crossmapping technique and developed a technique for directly describing, visualizing, and analyzing co-occurrence relations among three or more entities in collections of journal articles. Findings: The visualization tools and the knowledge discovery method can efficiently reveal multiple co-occurrence relations among three entities in collections of journal papers, revealing more in-depth information than analyzing co-occurrence relations between two entities. The method can therefore be used to map knowledge domains, as manifested through associations among entities, from multi-dimensional and comprehensive perspectives. Research limitations: At present, the technique can only analyze co-occurrence relations among up to three entities. Practical implications: This research expands the scope of co-occurrence analysis, and its results provide theoretical support for co-occurrence analysis. Originality/value: There has been no systematic study of co-occurrence relations among multiple entities in collections of journal articles. This research defines multiple co-occurrence and its research scope, develops a visualization analysis tool, and designs the analysis model of the knowledge discovery method.
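The counting that underlies a three-entity co-occurrence map can be sketched as follows: for each article, every combination of one author, one keyword, and one journal is counted once, and triples that recur across articles indicate strong multi-entity associations. The choice of entity types and the records are hypothetical, and the author's improved crossmapping visualization is not reproduced.

```python
from collections import Counter
from itertools import product

def triple_cooccurrence(records, fields=("authors", "keywords", "journals")):
    """Count how many articles contain each (author, keyword, journal) combination."""
    counts = Counter()
    for record in records:
        # Sets inside a record ensure each combination is counted once per article.
        counts.update(product(*(sorted(record[f]) for f in fields)))
    return counts

records = [  # hypothetical article metadata
    {"authors": {"Li", "Wang"}, "keywords": {"data mining"}, "journals": {"Journal X"}},
    {"authors": {"Li"}, "keywords": {"data mining", "visualization"}, "journals": {"Journal X"}},
    {"authors": {"Zhang"}, "keywords": {"visualization"}, "journals": {"Journal Y"}},
]
counts = triple_cooccurrence(records)
for triple, n in counts.most_common(2):
    print(triple, n)
```

The resulting counts form a sparse three-dimensional co-occurrence table, which is the kind of structure a crossmap-style visualization would then display.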
Knowledge discovery, an information technology increasingly adopted in biomedical science, has shown great promise in the field of Traditional Chinese Medicine (TCM). In this paper, we propose a multidimensional table that is well suited to organizing and analyzing data from ancient Chinese books on Materia Medica. We also demonstrate its capability to facilitate further mining work in TCM through two illustrative studies that discover meaningful patterns in a three-dimensional table of Shennong's Classic of Materia Medica. This work may provide an appropriate data model for developing knowledge discovery in TCM.
Singular rough sets (S-rough sets) take three forms: one-directional S-rough sets, dual of one-directional S-rough sets, and two-directional S-rough sets. Dynamic, hereditary, mnemonic, and hiding properties are the basic characteristics of S-rough sets. Using S-rough sets, the concepts of f-hiding knowledge, F-hiding knowledge, hiding degree, and hiding dependence degree are defined. Both a hiding theorem and a hiding dependence theorem for hiding knowledge are then proposed. Finally, an application of hiding knowledge is discussed.
This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of spatial knowledge expression systems and several concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, a SUIT records all information contained in the studied objects, but in practice, because of the complexity and variety of spatial relations, only the factors of interest are selected. To discover comprehensive knowledge from spatial databases, an efficient comprehensive knowledge discovery algorithm called the recycled algorithm (RAR) is proposed.
From an ecological viewpoint, this paper discusses urban spatial-temporal relationships. We treat regional towns and cities as a complex man-land system forming an urban eco-community. This complex man-land system comprises two elements, 'man' and 'land'. Here, 'man' means an organization with self-determined consciousness, and 'land' means the physical environment (niche) on which 'man' depends. The complex man-land system has three basic components: individual, population, and community. There are therefore six types of spatial relationship in the complex man-land system: individual, population, community, man-man, land-land, and man-land spatial relationships. Taking the Pearl (Zhujiang) River Delta as a case study, the authors found evidence of these urban spatial relationships in remote sensing data. First, concentration and diffusion in the spatial relationships among cities were found in remote sensing imagery: most cities are concentrated in the core area of the Pearl River Delta, but diffusion is also significant. Second, growth and succession behaviors in urban spatial relationships were found by comparing remote sensing images from different dates. Third, inheritance, break, and meeting-emergency behaviors were observed in the remote sensing data. Fourth, the authors found many cases of symbiosis and competition in the remote sensing data of the Pearl River Delta. Fifth, autoeciousness, stranglehold, and invasion behaviors in urban spatial relationships were discovered in the remote sensing data.
LP (Logic Programming) has been successfully applied to knowledge discovery in many fields. Execution of a logic program is based on the evaluation of first-order predicates. Usually the information involved in the predicates is local and homogeneous, so the evaluation process is relatively simple. However, the evaluation process becomes much more complicated when applied to KDD on the Internet, where the information involved in the predicates may be heterogeneous and distributed over many different sites. We therefore address the problem within a multi-agent system framework, so that the logic program can be written in a site-independent style and can easily handle heterogeneously represented information.
Structured and unstructured data mining have both associations and differences. How to unite them into a unified theoretical framework that guides research on knowledge discovery and data mining has become an urgent problem. Based on an analysis of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which associates structured data mining with complex-type data mining. UMKDSS can provide theoretical guidance for complex-type data mining. An application example of UMKDSS is given at the end.
Funding (TCMKD platform): supported by the Natural Science Foundation of Sichuan, China (Grant No. 2024ZDZX0019).
Funding (PRRS knowledge graph study): supported by the China Fundamental Research Funds for the Central Universities (Nos. 2662022XXYJ001, 2662022JC004, 2662023XXPY005).
Funding: Supported by the State Grid Southwest Branch Project “Research on Defect Diagnosis and Early Warning Technology of Relay Protection and Safety Automation Devices Based on Multi-Source Heterogeneous Defect Data”.
Abstract: The reliable operation of power grid secondary equipment is an important guarantee for the safety and stability of the power system. However, various defects can arise in secondary equipment during long-term operation. Existing methods based on knowledge extraction and fusion cannot describe the complex relationships between defect phenomena and their multi-layer causes, or the probabilistic influences within secondary equipment, which limits the timeliness and accuracy of defect identification. Therefore, a defect recognition method based on the fusion of a Bayesian network and a knowledge graph is proposed. The defect data of secondary equipment are transformed into a structured knowledge graph through knowledge extraction and fusion. The knowledge graph of power grid secondary equipment is mapped to a Bayesian network framework, historical defect data are incorporated, and Noisy-OR nodes are introduced. The prior and conditional probabilities of the Bayesian network are then reasonably assigned to build a model that reflects the probabilistic dependence between defect phenomena and their potential causes in power grid secondary equipment. Defect identification is achieved through defect subgraph search on the knowledge graph and defect inference on the Bayesian network. Practical application cases demonstrate the method's effectiveness in identifying the causes of secondary equipment defects, improving identification accuracy and efficiency.
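Noisy-OR nodes keep the number of conditional-probability parameters linear in the number of parent causes (one link probability per cause, plus an optional leak) instead of exponential. The sketch below assumes independent binary causes with hypothetical names, priors and link probabilities (none taken from the paper). It evaluates a leaky Noisy-OR gate and then infers each cause's posterior probability, given that the defect phenomenon was observed, by brute-force enumeration, which is only practical for a handful of causes.

```python
from itertools import product


def noisy_or(link_probs, active_causes, leak=0.0):
    """P(defect phenomenon | active causes) under a leaky Noisy-OR gate.

    Each active cause independently fails to trigger the phenomenon with
    probability 1 - link_probs[cause]; the leak term covers unmodelled causes.
    """
    p_not_triggered = 1.0 - leak
    for cause in active_causes:
        p_not_triggered *= 1.0 - link_probs[cause]
    return 1.0 - p_not_triggered


def cause_posteriors(priors, link_probs, leak=0.0):
    """P(cause present | phenomenon observed) for each cause, by enumeration."""
    names = list(priors)
    p_evidence = 0.0
    joint = dict.fromkeys(names, 0.0)
    for states in product((False, True), repeat=len(names)):
        # Prior probability of this configuration of independent causes.
        p_config = 1.0
        for name, present in zip(names, states):
            p_config *= priors[name] if present else 1.0 - priors[name]
        active = [n for n, present in zip(names, states) if present]
        p = p_config * noisy_or(link_probs, active, leak)
        p_evidence += p
        for n in active:
            joint[n] += p
    if p_evidence == 0.0:
        raise ValueError("the observed phenomenon has zero probability")
    return {n: joint[n] / p_evidence for n in names}


if __name__ == "__main__":
    # Hypothetical causes of a relay-protection alarm phenomenon.
    priors = {"power_module_fault": 0.05, "firmware_error": 0.10}
    links = {"power_module_fault": 0.9, "firmware_error": 0.4}
    print(cause_posteriors(priors, links, leak=0.01))
```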
Funding: Funded by the Hunan Provincial Natural Science Foundation of China (Grant No. 2025JJ70105) and the Hunan Provincial College Students' Innovation and Entrepreneurship Training Program (Project No. S202411342056). The article processing charge (APC) was funded by Project No. 2025JJ70105.
Abstract: With the widespread use of social media, the propagation of health-related rumors has become a significant public health threat. Existing methods for detecting health rumors predominantly rely on external knowledge or propagation structures, and only a few recent approaches attempt causal inference; however, these have not yet effectively integrated causal discovery with domain-specific knowledge graphs for detecting health rumors. In this study, we found that combining causal discovery with domain-specific knowledge graphs can effectively identify the implicit pseudo-causal logic embedded in texts, which holds significant potential for health rumor detection. To this end, we propose CKDG, a dual-graph fusion framework based on causal logic and medical knowledge graphs. CKDG constructs a weighted causal graph to capture the implicit causal relationships in the text and introduces a medical knowledge graph to verify semantic consistency, thereby enhancing its ability to identify the misuse of professional terminology and pseudoscientific claims. In experiments on a dataset of 8430 health rumors, CKDG achieved an accuracy of 91.28% and an F1 score of 90.38%, improvements of 5.11% and 3.29%, respectively, over the best baseline. Our results indicate that the integrated use of causal discovery and domain-specific knowledge graphs offers significant advantages for health rumor detection systems. This method not only improves detection performance but also enhances the transparency and credibility of model decisions by tracing causal chains and the sources of knowledge conflicts. We anticipate that this work will provide key technological support for the development of trustworthy health-information filtering systems, thereby improving the reliability of public health information on social media.
Abstract: To improve the performance of multiple classifier systems, a new feature-decision level fusion method based on knowledge discovery is proposed. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and DST. Experiments on the classification of hyperspectral remote sensing images show that the proposed method improves classification performance compared with plurality voting (PV).
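The fusion step above rests on Dempster's rule of combination. Below is a minimal sketch of that rule alone, assuming mass functions are given as dictionaries that map focal sets (frozensets of class labels) to masses. The class labels and mass values are hypothetical, and how the paper induces masses from decision tables and the generalized rough set model is not reproduced here.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Products of masses on intersecting focal sets are pooled; mass that
    falls on the empty set (the conflict K) is discarded and the remainder
    is renormalised by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    frame = frozenset({"water", "soil", "vegetation"})  # hypothetical classes
    # Hypothetical outputs of two base classifiers for one pixel.
    m1 = {frozenset({"water"}): 0.6, frame: 0.4}
    m2 = {frozenset({"water"}): 0.5, frozenset({"soil"}): 0.3, frame: 0.2}
    fused = dempster_combine(m1, m2)
    # Decide by the singleton class with the largest combined mass.
    best = max((s for s in fused if len(s) == 1), key=fused.get)
    print(set(best), round(fused[best], 4))
```

Here the conflict is 0.18, so the fused mass on "water" is 0.62 / 0.82, about 0.7561.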
Abstract: To discover fault-diagnosis knowledge in the maintenance records of flexible manufacturing system (FMS) equipment, an algorithm (process) is presented that consists of ① a preparatory phase, in which selected items in the maintenance records are decomposed into associated concepts and attributes, and ② a discovery and establishment phase, in which possible relationships between the concepts and attributes are established and knowledge is formulated. Applying the method captured the rich diagnostic knowledge contained in the maintenance records. An application of the method to a diagnosis system for FMS equipment showed that the approach is correct and effective.
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 69835001).
Abstract: A new knowledge discovery algorithm based on statistical induction logic is proposed, and its validity is verified by examples. The method is suitable for a wide range of knowledge discovery applications, such as studying causal relations, acquiring uncertain knowledge and analyzing principal factors. The language field description of the state space makes the algorithm robust and adaptable, with results that are easier to understand and are homotopic to natural language in the topological space.
Abstract: The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS), in Beijing from June 19 to June 22, 2016. The Conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in data science and knowledge discovery. The event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The Conference was organized by the Journal of Data and Information Science (JDIS) to bring the Journal to the attention of an international and local audience.
Abstract: This article outlines progress made in designing an intelligent information system for automatic management of, and knowledge discovery in, large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) that combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the raw observational data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system then learns by induction to build decision trees, from which the production rules are automatically deduced.
Abstract: In the current biomedical data movement, numerous efforts have been made to convert and normalize large amounts of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). As more semi-structured data enter the biomedical community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detecting related concepts among medical ontologies is an important goal of life science research. It is even more crucial to determine how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, in today's era of information explosion, it is extremely difficult for biomedical researchers to find existing or potential predicates that link cross-domain concepts without support from schema pattern analysis. Therefore, a mechanism is needed that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more cohesive topics and generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model, which performs predicate-oriented pattern analysis based on how closely predicates are related and generates a similarity matrix. Based on this similarity matrix, we apply a novel unsupervised learning algorithm to partition large data sets into smaller, more cohesive topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic.
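The predicate-oriented analysis above can be illustrated with a deliberately simplified sketch. Here each predicate is characterised by the set of concepts it links, pairwise Jaccard similarity fills the similarity matrix, and predicates are partitioned by single-link grouping above a threshold using union-find. This grouping rule is a stand-in chosen for brevity, not BmQGen's unsupervised algorithm, and all function names, triples and the threshold are hypothetical.

```python
from itertools import combinations


def predicate_signatures(triples):
    """Map each predicate to the set of subject and object concepts it links."""
    signatures = {}
    for subj, pred, obj in triples:
        signatures.setdefault(pred, set()).update((subj, obj))
    return signatures


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(signatures):
    preds = sorted(signatures)
    matrix = [[jaccard(signatures[p], signatures[q]) for q in preds] for p in preds]
    return preds, matrix


def partition(preds, matrix, threshold):
    """Group predicates linked by a chain of similarities >= threshold."""
    parent = list(range(len(preds)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(preds)), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, pred in enumerate(preds):
        groups.setdefault(find(i), []).append(pred)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical triples drawn from two loosely related topics.
    triples = [
        ("Colon", "hasProcedure", "Colectomy"),
        ("Colon", "hasFinding", "Polyp"),
        ("Colectomy", "hasFinding", "Leak"),
        ("Aspirin", "treats", "Pain"),
        ("Aspirin", "hasSideEffect", "Bleeding"),
    ]
    preds, sim = similarity_matrix(predicate_signatures(triples))
    print(partition(preds, sim, threshold=0.3))
```

With these triples and a threshold of 0.3, the surgical predicates and the drug predicates fall into two separate groups, each a candidate topic for query generation.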
Abstract: Purpose: This paper explores a method of knowledge discovery that visualizes and analyzes co-occurrence relations among three or more entities in collections of journal articles. Design/methodology/approach: A variety of methods, such as model construction, system analysis and experiments, are used. The author has improved Morris' crossmapping technique and developed a technique for directly describing, visualizing and analyzing co-occurrence relations among three or more entities in collections of journal articles. Findings: The visualization tools and the knowledge discovery method can efficiently reveal the multiple co-occurrence relations among three entities in collections of journal papers, and they reveal more, and more in-depth, information than analyzing co-occurrence relations between two entities. Therefore, this method can be used to map knowledge domains manifested in the associations among entities, from multi-dimensional perspectives and in an all-round way. Research limitations: At present, the technique can only be used to analyze co-occurrence relations among no more than three entities. Practical implications: This research has expanded the scope of co-occurrence analysis, and its results provide theoretical support for co-occurrence analysis. Originality/value: There has been no systematic study of co-occurrence relations among multiple entities in collections of journal articles. This research defines multiple co-occurrence and its research scope, develops a visualization analysis tool, and designs an analysis model for the knowledge discovery method.
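The basic counting step behind three-entity co-occurrence can be sketched as follows. This is an illustrative assumption, not the author's improved cross-mapping technique: each article is represented as a record with three entity fields (here hypothetical authors, keywords and cited journals), and every combination of one entity from each field is counted at most once per article. The field names and records are invented for the example.

```python
from collections import Counter
from itertools import product


def triple_cooccurrence(articles, fields=("authors", "keywords", "cited_journals")):
    """Count how many articles contain each (entity1, entity2, entity3) triple."""
    f1, f2, f3 = fields
    counts = Counter()
    for article in articles:
        # set() ensures a triple is counted at most once per article.
        for triple in product(set(article[f1]), set(article[f2]), set(article[f3])):
            counts[triple] += 1
    return counts


if __name__ == "__main__":
    articles = [  # hypothetical bibliographic records
        {"authors": ["Li"], "keywords": ["co-occurrence", "visualization"],
         "cited_journals": ["Scientometrics"]},
        {"authors": ["Li", "Wang"], "keywords": ["co-occurrence"],
         "cited_journals": ["Scientometrics", "JASIST"]},
    ]
    for triple, n in triple_cooccurrence(articles).most_common(3):
        print(triple, n)
```

The resulting counts could then serve as weights when the three entity types are laid out together in a visualization.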
Abstract: Knowledge discovery, an information technology increasingly adopted in biomedical science, has shown great promise in the field of Traditional Chinese Medicine (TCM). In this paper, we provide a kind of multidimensional table that is well suited to organizing and analyzing data from ancient Chinese books on Materia Medica. Moreover, we demonstrate its capability to facilitate further mining work in TCM through two illustrative studies that discover meaningful patterns in the three-dimensional table of Shennong's Classic of Materia Medica. This work may provide an appropriate data model for the development of knowledge discovery in TCM.
Funding: Supported by the National Natural Science Foundation of China (60364001, 70461001), the Hainan Provincial Natural Science Foundation of China (807054), and the Hainan Provincial Education Office Foundation (HJ 2008-56).
Abstract: Singular rough sets (S-rough sets) have three classes of forms: one-directional S-rough sets, dual of one-directional S-rough sets, and two-directional S-rough sets. Dynamic, hereditary, mnemonic, and hiding properties are the basic characteristics of S-rough sets. Using S-rough sets, the concepts of f-hiding knowledge, F-hiding knowledge, hiding degree, and hiding dependence degree are given. Then, both the hiding theorem and the hiding dependence theorem of hiding knowledge are proposed. Finally, an application of hiding knowledge is discussed.
Funding: China's National Surveying Technical Fund (No. 20007).
Abstract: This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of the spatial knowledge expression system and several concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, SUIT records all information contained in the studied objects, but in practice, because of the complexity and variety of spatial relations, only the factors of interest are selected. To find comprehensive knowledge in spatial databases, an efficient comprehensive knowledge discovery algorithm called the recycled algorithm (RAR) is suggested.
Funding: Under the auspices of the National Natural Science Foundation of China (No. 69896250-4).
Abstract: This paper discusses urban spatial-temporal relationships from an ecological viewpoint. We treat regional towns and cities as a complex man-land system forming an urban eco-community. This complex man-land system comprises two elements, 'man' and 'land'. Here, 'man' means an organization with self-determined consciousness, and 'land' means the physical environment (niche) on which 'man' depends. The complex man-land system has three basic components: the individual, the population and the community. Therefore, there are six types of spatial relationship in the complex man-land system: individual, population, community, man-man, land-land and man-land spatial relationships. Taking the Pearl (Zhujiang) River Delta as a case study, the authors found evidence of these urban spatial relationships in remote sensing data. Firstly, the concentration and diffusion of the cities' spatial relationships were found in the remote sensing imagery: most cities are concentrated in the core area of the Pearl River Delta, but diffusion is also significant. Secondly, the growth and succession behaviors of urban spatial relationships were found by comparing remote sensing images from different dates. Thirdly, inheritance, break, and meeting-emergency behaviors were observed in the remote sensing data. Fourthly, the authors found many cases of symbiosis and competition in the remote sensing data of the Pearl River Delta. Fifthly, autoeciousness, stranglehold and invasion behaviors of urban spatial relationships were discovered from the remote sensing data.
Abstract: LP (Logic Programming) has been successfully applied to knowledge discovery in many fields. The execution of a logic program is based on the evaluation of first-order predicates. Usually the information involved in the predicates is local and homogeneous, so the evaluation process is relatively simple. However, the evaluation process becomes much more complicated when applied to KDD on the Internet, where the information involved in the predicates may be heterogeneous and distributed over many different sites. Therefore, we attack the problem within a multi-agent system framework so that logic programs can be written in a site-independent style and can easily handle heterogeneously represented information.
Abstract: There are both associations and differences between structured and unstructured data mining. How to unite them into a unified theoretical framework that guides research on knowledge discovery and data mining has become an urgent problem. On the basis of an analysis of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which associates structured data mining with complex-type data mining. UMKDSS can provide theoretical guidance for complex-type data mining. Finally, an application example of UMKDSS is given.