Structural choice is a significant decision with an important influence on structural function, social economics, structural reliability, and construction cost. A Case-Based Reasoning (CBR) system whose retrieval part is built on a KDD subsystem is put forward to support this decision for large-scale engineering projects. A typical CBR system consists of four parts: case representation, case retrieval, evaluation, and adaptation. The case library is a set of parameterized, successful structures. For structural choice, the key point is that the system must detect the pattern classes hidden in the case library and classify the input parameters into those classes properly. This is done with a KDD data mining algorithm based on Self-Organizing Feature Maps (SOFM), which makes the whole system more adaptive, self-organizing, self-learning, and open.
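The retrieval idea in this abstract, detecting pattern classes in a case library and assigning a new input to one of them, can be illustrated with a very small self-organizing map. The following is only a sketch under stated assumptions: the one-dimensional map, the linear decay schedules, and the names `train_som` and `best_matching_unit` are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(cases, n_units=4, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map on equal-length numeric case vectors.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(cases[0])
    # Initialise each unit's weight vector from a randomly chosen case.
    weights = [list(rng.choice(cases)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks over time
        for x in rng.sample(cases, len(cases)):      # visit cases in random order
            bmu = best_matching_unit(weights, x)
            for j in range(n_units):
                # Gaussian neighbourhood on the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights


# Hypothetical case library: each case is a small vector of structural parameters.
library = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
som = train_som(library, n_units=3, epochs=50)
pattern_class = best_matching_unit(som, [5.1, 5.0])  # class index for a new input
```

In a retrieval step, the cases that share the new input's best-matching unit would be the candidates passed on to evaluation and adaptation.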
Thallium has been used in geochemical exploration as an indicator element in searching for hydrothermal gold deposits. However, Tl minerals and mineralization are rare in nature. Lorandite (TlAsS2), a relatively uncommon mineral, has been discovered predominantly in some Carlin gold deposits and, to a lesser extent, in Sb-Hg, U, and Pb-Zn-Ag deposits.
Objective: To explore candidate genes that play significant roles in the interconnection between abdominal aortic aneurysm (AAA) and type 2 diabetes mellitus (DM). Methods: We used the Biomedical Discovery Support System (BITOLA) to screen for candidate intermediate molecules (CIMs) of the type "Gene or Gene Product" that are related to AAA and DM. The datasets GSE13760, GSE7084, GSE57691, and GSE47472 were used to analyze the differentially expressed genes (DEGs) of AAA and DM compared with the healthy status. We used the online tool Venny 2.1, assisted by manual checking, to identify the DEGs that overlapped with the CIMs. The Human eFP Browser was applied to examine the tissue-specific expression levels of the detected genes in order to recognize genes strongly expressed in both human artery and pancreatic tissue. Results: The closed BITOLA system suggested 86 CIMs. Among all the DEGs of AAA and DM, 8 genes in GSE7084 (ISG20, ITGAX, DSTN, CCL5, CCR5, AGTR1, CD19, CD44) and 2 genes in GSE13760 (PSMD12, FAS) overlapped with the 86 CIMs. By manual checking and comparison with tissue-specific gene data from the Human eFP Browser, the gene PSMD12 (proteasome 26S subunit, non-ATPase 12) was recognized as strongly expressed in both the aorta and pancreatic tissue. Conclusion: Through text mining, we propose the hypothesis that PSMD12 might be involved in the interconnection between AAA and DM, which may provide a new clue for studies on novel therapeutic strategies for the two diseases.
The need for analysis of modern businesses is increasing rapidly as the supporting enterprise systems generate more and more data. This data can be extremely valuable to the organizations concerned because it allows constant monitoring, analysis, and improvement of the underlying processes, which reduces cost and improves quality. Process mining is a useful technique for analyzing enterprise systems using an event log that records their behaviour. This research focuses on process discovery and refinement using real-life event log data collected from a large multinational organization that deals with coatings and paints. By investigating and analyzing its order handling processes, this study aims to learn a model that supports insightful inspection of the processes and performance analysis. Furthermore, animation is performed to support better inspection, diagnostics, and compliance-related questions when specifying the system. The configuration of the system and conformance checking for further enhancement are also addressed. To achieve these objectives, this research uses process mining techniques, i.e., process discovery in the form of formal Petri net models with the help of process maps, and process refinement through conformance checking and enhancement. Initially, the executed process is reconstructed using process discovery techniques. Following the reconstruction, we perform a deep analysis of the underlying process to support process improvement and redesign. Finally, some recommendations are made to improve the enterprise management system processes.
A new structure of ESKD (expert system based on the knowledge discovery system KD (D&K)) is presented for the first time. It builds on KD (D&K), a synthesized knowledge discovery system based on a double-base (database and knowledge base) cooperating mechanism. With its new features, ESKD may form a new research direction and offers good prospects for handling the wealth of knowledge in the knowledge base. The general structural frame of ESKD and some of its sub-systems are described, with emphasis on the dynamic knowledge base built on the double-base cooperating mechanism. According to the results of a demonstrative experiment, the structure of ESKD is effective and feasible.
To overcome the problems of existing neighboring access point (AP) discovery methods in WLAN, for example that they cannot provide the accurate neighboring AP information needed for the Plug-and-Play (PnP) of multi-mode APs, three kinds of neighboring AP discovery and information exchange methods are proposed. Using these three methods, the passive discovery method, the active discovery method, and the station-assisted discovery method, a multi-mode AP can discover all neighboring APs and obtain the needed information. We further propose two complete process flows, which combine the three discovery methods in different ways to achieve different goals. One discovers the neighboring APs as fast as possible and is called the fast discovery process flow. The other discovers the neighboring APs with minimal interference to them and is called the minimal interference process flow. The validity and accuracy of the method are confirmed by simulation.
A method is presented for performing knowledge discovery on the dynamic data of a nonlinear system. In the proposed approach, a synchronized phasor measurement technique is used to acquire the dynamic data of the nonlinear system, and a hyper-rectangular type neural network (HRTNN) is then applied to extract crisp and fuzzy rules with which to estimate the system stability. The effectiveness of the proposed methodology is verified using the dynamic data of a typical real-world nonlinear system, namely an AEP-14 bus, and the extracted rules relate to the knowledge discovery of the stability levels of the nonlinear system. The discovered relationships among the dynamic data (i.e., the operating state), the extracted rules, and the system stability are confirmed by means of a two-stage confirmatory factor analysis.
Transformer models have emerged as pivotal tools within the realm of drug discovery, distinguished by their unique architectural features and exceptional performance in managing intricate data landscapes. Leveraging the innate capabilities of transformer architectures to comprehend intricate hierarchical dependencies inherent in sequential data, these models showcase remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models renders them indispensable assets for driving data-centric advancements in drug discovery, chemistry, and biology, furnishing a robust framework that expedites innovation and discovery within these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential, seamlessly combining biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research endeavors but also fosters synergistic collaborations and exchange of ideas among disparate fields. In our review, we elucidate the myriad applications of transformers in drug discovery, as well as chemistry and biology, spanning from protein design and protein engineering to molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single cell data. Finally, we conclude the survey by deliberating on promising trends in transformer models within the context of drug discovery and other sciences.
Recent developments in database technology have seen a wide variety of data being stored in huge collections. This variety makes analyzing a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of manageable size. The histogram has received significant attention as a summarization/representative object for large databases, but it suffers from computational and space complexity. In this paper, we propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computation- and storage-intensive than histograms. In addition, to carry out cluster analysis, we propose a distance measure for computing the distance between PLR lines. A case study based on real data from the online education system LMS is presented. It demonstrates that PLR is a powerful knowledge representative for very large databases.
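The transformation from a histogram to a PLR object can be sketched as follows. This is only an illustration under assumptions: the abstract does not say how segments are chosen or what is regressed, so the sketch splits the bins into equal-sized runs and fits an ordinary least-squares line to the bin counts in each run. The names `fit_line` and `histogram_to_plr` are hypothetical, and the paper's distance measure between PLR lines is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # A single point (or identical x values): fall back to a flat line.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def histogram_to_plr(centers, counts, n_segments):
    """Summarise a histogram as (x_start, x_end, slope, intercept) tuples.

    Assumption: bins are split into n_segments contiguous, equal-sized runs.
    """
    n = len(centers)
    segments = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = (s + 1) * n // n_segments
        if hi <= lo:
            continue  # more segments requested than bins available
        xs, ys = centers[lo:hi], counts[lo:hi]
        slope, intercept = fit_line(xs, ys)
        segments.append((xs[0], xs[-1], slope, intercept))
    return segments


# Hypothetical histogram: eight bin centers with a triangular count profile.
centers = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
counts = [0.0, 2.0, 4.0, 6.0, 6.0, 4.0, 2.0, 0.0]
plr = histogram_to_plr(centers, counts, n_segments=2)
```

Each segment stores four numbers regardless of how many bins it covers, which is where the storage saving would come from when a histogram has many bins per segment.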
BACKGROUND: Gastrointestinal (GI) malignancies, including gastric and colorectal cancers, remain one of the primary contributors to cancer-related illness and death globally. Despite the availability of conventional diagnostic tools, early detection and personalized treatment remain significant clinical challenges. Integrated multi-omics methods encompassing genomic, transcriptomic, proteomic, metabolomic, and microbiome profiles have emerged as powerful tools for advancing precision oncology, improving diagnostic accuracy, and informing therapeutic strategies. AIM: To investigate the application of multi-omics approaches in the early detection, risk stratification, treatment optimization, and biomarker discovery of GI malignancies. METHODS: The systematic review was conducted in accordance with the PRISMA 2020 guidelines. Five databases (PubMed, ScienceDirect, Scopus, ProQuest, and Web of Science) were searched for studies published in English from 2015 onwards. Eligible studies involved human subjects and focused on multi-omics integration in GI cancers, including biomarker identification, tumor microenvironment analysis, tumor heterogeneity, organoid modeling, and artificial intelligence (AI)-driven analytics. Data extraction covered study characteristics, omics modalities, and clinical applications, and study quality was evaluated with the Cochrane risk of bias 2.0 instrument. RESULTS: Of 17196 initially identified articles, 20 met the inclusion criteria. The findings highlight the superiority of multi-omics platforms over traditional biomarkers (e.g., carcinoembryonic antigen and carbohydrate antigen 19-9) in detecting early-stage GI cancers. Key applications include the identification of circulating tumor DNA, extracellular vesicles, and lipidomic and proteomic signatures, and the adoption of AI algorithms to enhance diagnostic precision. Multi-omics analysis has also revealed mechanisms of immune modulation, tumor microenvironment regulation, metastatic behavior, and drug resistance. Organoid models and microbiota profiling have contributed to personalized therapeutic strategies and immunotherapy optimization. CONCLUSION: Multi-omics approaches offer significant advancements in the early diagnosis, prognostic evaluation, and personalized treatment of GI malignancies. Their integration with AI analytics, organoid biobanking, and microbiota modulation provides a pathway for precision oncology research.
This review presents a comprehensive and forward-looking analysis of how Large Language Models (LLMs) are transforming knowledge discovery in the rational design of advanced micro/nano electrocatalyst materials. Electrocatalysis is central to sustainable energy and environmental technologies, but traditional catalyst discovery is often hindered by high complexity, fragmented knowledge, and inefficiencies. LLMs, particularly those based on Transformer architectures, offer unprecedented capabilities in extracting, synthesizing, and generating scientific knowledge from vast unstructured textual corpora. This work provides the first structured synthesis of how LLMs have been leveraged across various electrocatalysis tasks, including automated information extraction from literature, text-based property prediction, hypothesis generation, synthesis planning, and knowledge graph construction. We comparatively analyze leading LLMs and domain-specific frameworks (e.g., CatBERTa, CataLM, CatGPT) in terms of methodology, application scope, performance metrics, and limitations. Through curated case studies across key electrocatalytic reactions (HER, OER, ORR, and CO2RR), we highlight emerging trends such as the growing use of embedding-based prediction, retrieval-augmented generation, and fine-tuned scientific LLMs. The review also identifies persistent challenges, including data heterogeneity, hallucination risks, lack of standard benchmarks, and limited multimodal integration. Importantly, we articulate future research directions, such as the development of multimodal and physics-informed MatSci-LLMs, enhanced interpretability tools, and the integration of LLMs with self-driving laboratories for autonomous discovery. By consolidating fragmented advances and outlining a unified research roadmap, this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation through large language model technologies.
In the realm of drug discovery, recent advancements have paved the way for innovative approaches and methodologies. This comprehensive review encapsulates six distinct yet interrelated mini-reviews, each shedding light on novel strategies in drug development. (a) The resurgence of covalent drugs is highlighted, focusing on targeted covalent inhibitors (TCIs) and their role in enhancing selectivity and affinity. (b) The potential of the quantum mechanics-based computer-aided drug design (CADD) tool Cov_DOX is introduced for predicting protein-covalent ligand binding structures and affinities. (c) The scaffolding function of proteins is proposed as a new avenue for drug design, with a focus on modulating protein-protein interactions through small molecules and proteolysis targeting chimeras (PROTACs). (d) The concept of pro-PROTACs is explored as a promising strategy for cancer therapy, combining the principles of prodrugs and PROTACs to enhance specificity and reduce toxicity. (e) The design of prodrugs through carbon-carbon bond cleavage is discussed, offering a new perspective for the activation of drugs with limited modifiable functional groups. (f) The targeting of programmed cell death pathways in cancer therapies with small molecules is reviewed, emphasizing the induction of autophagy-dependent cell death, ferroptosis, and cuproptosis. These insights collectively contribute to a deeper understanding of the dynamic landscape of drug discovery.
Synthetic biology (SynBio) is an emerging field of study with great potential in designing, engineering, and constructing new microbial synthetic cells that do not pre-exist in nature, or in re-engineering existing cells to accomplish industrial purposes. Systems biology seeks to understand biology at multiple dimensions, beginning with the molecular and cellular level and progressing to the tissue and organismal level, and characterizes cells as complex information-processing systems. SynBio, on the other hand, goes further and strives to develop and create its systems from scratch. SynBio is now applied to develop novel therapeutic drugs for the prevention of human diseases, to scale up industrial processes, and to accomplish previously unfeasible industrial outcomes. This is made possible by significant breakthroughs in DNA sequencing and synthesis technology, as well as insights gained from synthetic chemistry and systems biology. SynBio technologies have allowed the introduction of improved and synthetic metabolic functionalities into microorganisms to enable the synthesis of a range of pharmacologically relevant compounds for pharmaceutical exploration. SynBio applications range from finding new ways to make industrial chemical synthesis processes more sustainable to the microbial synthesis of improved therapeutic modalities. Hence, this study highlights several innovations, promising potentials, and future directions afforded by SynBio that point to improved industrial microbial synthesis for pharmaceutical exploration.
Endometrial cancer is the most common gynecologic cancer diagnosed in the United States, and mortality is on the rise. Advanced and recurrent endometrial cancer represents a treatment challenge, as historically there have been limited therapeutic options for patients. In the last several years, multiple practice-changing clinical trials have led to significant improvements in the treatment landscape. This review will cover updates in the treatment and management of advanced and recurrent endometrial cancer with a focus on novel therapeutics, such as anti-PD-L1 and PD-1 inhibitors, poly ADP-ribose polymerase (PARP) inhibitors, antibody-drug conjugates, and hormonal therapy.
The learning algorithms of causal discovery mainly include score-based methods and genetic algorithms (GA). Score-based algorithms are prone to search space explosion. The classical GA is slow to converge and prone to falling into local optima. To address these issues, an improved GA with domain knowledge (IGADK) is proposed. Firstly, domain knowledge is incorporated into the causal learning process to construct a new fitness function. Secondly, a dynamic mutation operator is introduced into the algorithm to accelerate convergence. Finally, an experiment on simulation data compares the classical GA with IGADK using domain knowledge of varying accuracy. IGADK can greatly reduce the number of iterations, the population size, and the number of samples required for learning, which illustrates the efficiency and effectiveness of the proposed algorithm.
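The two ingredients named in this abstract, a knowledge-aware fitness function and a dynamic mutation operator, can be illustrated with a short sketch. The abstract gives no formulas, so everything below is a hypothetical stand-in rather than the IGADK definitions: `knowledge_fitness` adds a reward for each required edge present and a penalty for each forbidden edge present or required edge missing, and `mutation_rate` decays linearly over the generations.

```python
def mutation_rate(generation, max_generations, p_max=0.3, p_min=0.01):
    """Mutation probability that decays linearly from p_max to p_min.

    Assumption: a linear schedule; the paper's operator may differ.
    """
    frac = min(max(generation / max_generations, 0.0), 1.0)
    return p_max - (p_max - p_min) * frac


def knowledge_fitness(data_score, edges, required=(), forbidden=(), weight=1.0):
    """Data-fit score adjusted by agreement with prior knowledge about edges.

    edges, required and forbidden are collections of (cause, effect) pairs.
    Assumption: reward and penalty are both one unit of `weight` per edge.
    """
    edges = set(edges)
    required = set(required)
    forbidden = set(forbidden)
    satisfied = len(edges & required)
    violated = len(edges & forbidden) + len(required - edges)
    return data_score + weight * (satisfied - violated)


# Hypothetical candidate graph and domain knowledge.
candidate = {("A", "B"), ("C", "A")}
score = knowledge_fitness(-10.0, candidate,
                          required={("A", "B")}, forbidden={("C", "A")})
early = mutation_rate(0, 100)    # exploratory phase: high mutation probability
late = mutation_rate(100, 100)   # refinement phase: low mutation probability
```

A high early mutation rate encourages exploration of graph structures, while the low late rate lets the population settle, which is one plausible way a dynamic operator could speed up convergence.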
Structural optimization of lead compounds is a crucial step in drug discovery. One optimization strategy is to modify the molecular structure of a scaffold to improve both its biological activities and its absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. One deep molecular generative approach preserves the scaffold while generating drug-like molecules, thereby accelerating the molecular optimization process. Deep molecular diffusion generative models simulate a gradual process that creates novel, chemically feasible molecules from noise. However, existing models lack direct interatomic constraint features and struggle to capture long-range dependencies in macromolecules, which leads to challenges in modifying scaffold-based molecular structures and limits the stability and diversity of the generated molecules. To address these challenges, we propose a deep molecular diffusion generative model, the three-dimensional (3D) equivariant diffusion-driven molecular generation (3D-EDiffMG) model. The dual strong and weak atomic interaction force-based long-range dependency capturing equivariant encoder (dual-SWLEE) is introduced to encode both bonding and non-bonding information based on strong and weak atomic interactions. Additionally, a gate multilayer perceptron (gMLP) block with tiny attention is incorporated to explicitly model complex long-sequence feature interactions and long-range dependencies. The experimental results show that 3D-EDiffMG effectively generates unique, novel, stable, and diverse drug-like molecules, highlighting its potential for lead optimization and accelerating drug discovery.
Pesticides play a pivotal role in modern agriculture. However, the pesticide industry faces significant challenges closely linked to major global concerns such as pesticide resistance, environmental pollution, food safety, and crop yields. Developing safe, efficient, and environmentally friendly pesticides has become a key challenge for the industry. Recently, Qing Yang and colleagues unveiled the mode of action of a dual-functional protein, the ABCH transporter, which plays essential roles in lipid transport to construct the lipid barrier of insect cuticles and in pesticide detoxification within insects. Since ABCH transporters are critical for all insects but absent in mammals and plants, this elegant and exciting work provides a highly promising target for developing safe, low-resistance pesticides. Here, we highlight the groundbreaking discoveries made by Qing Yang's team in unraveling the intricate mechanisms of the ABCH transporter.
Crystal structure prediction (CSP) is a foundational computational technique for determining the atomic arrangements of crystalline materials, especially under high-pressure conditions. While CSP plays a critical role in materials science, traditional approaches often encounter significant challenges related to computational efficiency and scalability, particularly when applied to complex systems. Recent advances in machine learning (ML) have shown tremendous promise in addressing these limitations, enabling the rapid and accurate prediction of crystal structures across a wide range of chemical compositions and external conditions. This review provides a concise overview of recent progress in ML-assisted CSP methodologies, with a particular focus on machine learning potentials and generative models. By critically analyzing these advances, we highlight the transformative impact of ML in accelerating materials discovery, enhancing computational efficiency, and broadening the applicability of CSP. Additionally, we discuss emerging opportunities and challenges in this rapidly evolving field.
Semi-supervised new intent discovery is a significant research focus in natural language understanding. To address the limitations of current semi-supervised training data and the underutilization of implicit information, a Semi-supervised New Intent Discovery for Elastic Neighborhood Syntactic Elimination and Fusion model (SNID-ENSEF) is proposed. Syntactic elimination contrast learning leverages verb-dominant syntactic features, systematically replacing specific words to enhance data diversity. The radius of the positive sample neighborhood is elastically adjusted to eliminate invalid samples and improve training efficiency. A neighborhood sample fusion strategy, based on sample distribution patterns, dynamically adjusts neighborhood size and fuses sample vectors to reduce noise and improve implicit information utilization and discovery accuracy. Experimental results show that SNID-ENSEF achieves average improvements of 0.88%, 1.27%, and 1.30% in Normalized Mutual Information (NMI), Accuracy (ACC), and Adjusted Rand Index (ARI), respectively, outperforming the PTJN, DPN, MTP-CLNN, and DWG models on the Banking77, StackOverflow, and Clinc150 datasets. The code is available at https://github.com/qsdesz/SNID-ENSEF, accessed on 16 January 2025.
Funding: Supported by the National Science Foundation of China (grants No. 41372090 and 41573042), the National Special Research Programs for Non-Profit Trades (grant No. 201311136), and the Basic Scientific Research Operation Cost of State-Leveled Public Welfare Scientific Research Courtyard (grant No. K1203).
文摘Thallium has been used geochemical exploration of gold deposits. However, as an indicator element in searching for hydrothermal the T1 minerals and mineralization are rare in nature. Lorandite T1AsS2, a relatively uncommon mineral, has been dominantly discovered in some Carlin gold deposits, and minor Sb- Hg, U and Pb-Zn-Ag deposits.
文摘Objective To explore the candidate genes that play significant roles in the interconnection between abdominal aortic aneurysm(AAA)and type 2 diabetes mellitus(DM).Methods We used the Biomedical Discovery Support System(BITOLA)to screen out the candidate intermediate molecular(CIM)"Gene or Gene Product”that are related to AAA and DM.The dataset of GSE13760,GSE7084,GSE57691,GSE47472 were used to analyze the differentially expressed genes(DEGs)of AAA and DM compared to the healthy status.We used the online tool ofVenny 2.1 assisted by manual checking to identify the overlapped DEGs with the CIMs.The Human eFP Browser was applied to examine the tissue specific expression levels of the detected genes in order to recognize strong expressed genes in both human artery and pancreatic tissue.Results There were 86 CIMs suggested by the closed BITOLA system.Among all the DEGs of AAA and DM,8 genes in GSE7084(ISG20,ITGAX,DSTN,CCL5,CCR5,AGTR1,CD19,CD44)and 2 genes in GSE 13760(PSMD12,FAS)were found to be overlapped with the 86 CIMs.By manual checking and comparing with tissuespecific gene data through Human eFP Browser,the gene PSMD12(proteasome 26S subunit,non-ATPase 12)was recognized to be strongly expressed in both the aorta and pancreatic tissue.Conclusion We proposed a hypothesis through text mining that PSMD12 might be involved or potentially involved in the interconnection between AAA and DM,which may provide a new clue for studies on novel therapeutic strategies for the two diseases.
文摘The need for the analysis of modern businesses is rapidly increasing as the supporting enterprise systems generate more and more data.This data can be extremely valuable for executing organizations because the data allows constant monitoring,analyzing,and improving the underlying processes,which leads to the reduction of cost and the improvement of the quality.Process mining is a useful technique for analyzing enterprise systems by using an event log that contains behaviours.This research focuses on the process discovery and refinement using real-life event log data collected from a large multinational organization that deals with coatings and paints.By investigating and analyzing their order handling pro-cesses,this study aims at learning a model that gives insight inspection of the processes and performance analysis.Furthermore,the animation is also performed for the better inspection,diagnostics,and compliance-related questions to specify the system.The configuration of the system and the conformance checking for further enhancement is also addressed in this research.To achieve the objectives,this research uses process mining techniques,i.e.process discovery in the form of formal Petri nets models with the help of process maps,and process refinement through conformance checking and enhancement.Initially,the identified executed process is reconstructed by using the process discovery techniques.Following the reconstruction,we perform a deep analysis for the underlying process to ensure the process improvement and redesigning.Finally,some recommendations are made to improve the enterprise management system processes.
Abstract: A new structure of ESKD (expert system based on knowledge discovery system KD (D&K)) is first presented on the basis of KD (D&K), a synthesized knowledge discovery system based on a double-base (database and knowledge base) cooperating mechanism. With all its new features, ESKD may form a new research direction and offers a promising way of addressing the wealth of knowledge in the knowledge base. The general structural frame of ESKD and some of its sub-systems are described, with emphasis on the dynamic knowledge base built on the double-base cooperating mechanism. According to the result of a demonstrative experiment, the structure of ESKD is effective and feasible.
Funding: NTT-DoCoMo Beijing Communication Labs; the National High Technology Research and Development Program of China (No. 2006AA01Z276).
Abstract: To overcome the problems of existing neighboring access point (AP) discovery methods in WLAN, for example that they cannot provide the accurate neighboring AP information needed for the Plug-and-Play (PnP) of multi-mode APs, three kinds of neighboring AP discovery and information exchange methods are proposed. Using these three neighboring AP discovery methods (the passive discovery method, the active discovery method, and the station assistant discovery method), the multi-mode AP can discover all neighboring APs and obtain the needed information. We further propose two whole process flows, which combine the three discovery methods in different manners, to achieve different goals. One process flow is to discover the neighboring APs as fast as possible, called the fast discovery process flow. The other is to discover the neighboring APs with minimal interference to neighboring APs, called the minimal interference process flow. The validity and accuracy of the method are confirmed by simulation.
Abstract: A method is presented for performing knowledge discovery on the dynamic data of a nonlinear system. In the proposed approach, a synchronized phasor measurement technique is used to acquire the dynamic data of the nonlinear system, and a hyper-rectangular type neural network (HRTNN) is then applied to extract crisp and fuzzy rules with which to estimate the system stability. The effectiveness of the proposed methodology is verified using the dynamic data of a typical real-world nonlinear system, namely an AEP-14 bus, and the extracted rules relate to the knowledge discovery of the stability levels for the nonlinear system. The discovered relationships among the dynamic data (i.e., the operating state), the extracted rules, and the system stability are confirmed by means of a two-stage confirmatory factor analysis.
Funding: Supported in part by the National Institutes of Health (NIH), USA (Grant Nos.: R01GM126189, R01AI164266, and R35GM148196); the National Science Foundation, USA (Grant Nos.: DMS-2052983, DMS-1761320, and IIS-1900473); the National Aeronautics and Space Administration (NASA), USA (Grant No.: 80NSSC21M0023); Michigan State University (MSU) Foundation, USA; Bristol-Myers Squibb (Grant No.: 65109), USA; and Pfizer, USA. Also supported by the National Natural Science Foundation of China (Grant Nos.: 11971367, 12271416, and 11972266).
Abstract: Transformer models have emerged as pivotal tools within the realm of drug discovery, distinguished by their unique architectural features and exceptional performance in managing intricate data landscapes. Leveraging the innate capabilities of transformer architectures to comprehend intricate hierarchical dependencies inherent in sequential data, these models showcase remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models renders them indispensable assets for driving data-centric advancements in drug discovery, chemistry, and biology, furnishing a robust framework that expedites innovation and discovery within these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential, seamlessly combining biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research endeavors but also fosters synergistic collaborations and exchange of ideas among disparate fields. In our review, we elucidate the myriad applications of transformers in drug discovery, as well as chemistry and biology, spanning from protein design and protein engineering to molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single cell data. Finally, we conclude the survey by deliberating on promising trends in transformer models within the context of drug discovery and other sciences.
Abstract: Recent developments in database technology have seen a wide variety of data being stored in huge collections. This variety makes analyzing a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of manageable size. The histogram has received significant attention as a summarization (representative) object for large databases, but it suffers from computational and space complexity. In this paper, we propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computation- and storage-intensive than histograms. In addition, to carry out a cluster analysis, we propose a distance measure for computing the distance between PLR lines. A case study is presented based on real data from an online education system (LMS). This demonstrates that PLR is a powerful knowledge representative for very large databases.
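The idea of replacing a histogram with a PLR object and comparing two PLR objects by a distance can be sketched as follows. The abstract does not specify how the segments are fitted or how the distance is defined, so everything here is an assumption made for illustration: the histogram is first turned into a cumulative relative frequency curve, each of `n_segments` equal-width segments gets an ordinary least-squares line, and the distance is the mean absolute gap between the two fitted curves over the bin positions. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One segment: (first bin index, last bin index, slope, intercept)
Segment = Tuple[int, int, float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # a single point: fall back to a flat line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def histogram_to_plr(counts: Sequence[int], n_segments: int) -> List[Segment]:
    """Fit one line per equal-width segment of the cumulative histogram."""
    total = sum(counts)
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running / total)
    xs = list(range(len(counts)))
    size = len(counts) // n_segments
    segments = []
    for s in range(n_segments):
        lo = s * size
        hi = len(counts) if s == n_segments - 1 else lo + size
        slope, intercept = fit_line(xs[lo:hi], cumulative[lo:hi])
        segments.append((lo, hi - 1, slope, intercept))
    return segments


def evaluate(segments: List[Segment], x: int) -> float:
    """Value of the piecewise-linear curve at bin position x."""
    for _, last, slope, intercept in segments:
        if x <= last:
            return slope * x + intercept
    _, _, slope, intercept = segments[-1]
    return slope * x + intercept


def plr_distance(a: List[Segment], b: List[Segment], n_bins: int) -> float:
    """Mean absolute difference between two PLR curves over the bin positions."""
    return sum(abs(evaluate(a, x) - evaluate(b, x)) for x in range(n_bins)) / n_bins
```

A histogram with 8 bins and 2 segments is stored as 2 (slope, intercept) pairs plus segment bounds instead of 8 counts, which is the kind of storage saving the abstract argues for; whether this particular distance suits clustering would need to be checked against the paper's own definition.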
Abstract: BACKGROUND: Gastrointestinal (GI) malignancies, including gastric and colorectal cancers, remain one of the primary contributors to cancer-related illness and death globally. Despite the availability of conventional diagnostic tools, early detection and personalized treatment remain significant clinical challenges. Integrated multi-omics methods encompassing genomic, transcriptomic, proteomic, metabolomic, and microbiome profiles have emerged as powerful tools for advancing precision oncology, improving diagnostic accuracy, and informing therapeutic strategies. AIM: To investigate the application of multi-omics approaches in the early detection, risk stratification, treatment optimization, and biomarker discovery of GI malignancies. METHODS: The systematic review process was conducted in accordance with the PRISMA 2020 guidelines. Five databases (PubMed, ScienceDirect, Scopus, ProQuest, and Web of Science) were searched for studies published in English from 2015 onwards. Eligible studies involved human subjects and focused on multi-omics integration in GI cancers, including biomarker identification, tumor microenvironment analysis, tumor heterogeneity, organoid modeling, and artificial intelligence (AI)-driven analytics. Data extraction included study characteristics, omics modalities, and clinical applications, and study quality was evaluated with the Cochrane risk of bias 2.0 instrument. RESULTS: Of 17196 initially identified articles, 20 met the inclusion criteria. The findings highlight the superiority of multi-omics platforms over traditional biomarkers (e.g., carcinoembryonic antigen and carbohydrate antigen 19-9) in detecting early-stage GI cancers. Key applications include the identification of circulating tumor DNA, extracellular vesicles, lipidomic and proteomic signatures, and the adoption of AI algorithms to enhance diagnostic precision. Multi-omics analysis has also revealed the mechanisms of immune modulation, tumor microenvironment regulation, metastatic behavior, and drug resistance. Organoid models and microbiota profiling have contributed to personalized therapeutic strategies and immunotherapy optimization. CONCLUSION: Multi-omics approaches offer significant advancements in the early diagnosis, prognostic evaluation, and personalized treatment of GI malignancies. Their integration with AI analytics, organoid biobanking, and microbiota modulation provides a pathway for precision oncology research.
Abstract: This review presents a comprehensive and forward-looking analysis of how Large Language Models (LLMs) are transforming knowledge discovery in the rational design of advanced micro/nano electrocatalyst materials. Electrocatalysis is central to sustainable energy and environmental technologies, but traditional catalyst discovery is often hindered by high complexity, fragmented knowledge, and inefficiencies. LLMs, particularly those based on Transformer architectures, offer unprecedented capabilities in extracting, synthesizing, and generating scientific knowledge from vast unstructured textual corpora. This work provides the first structured synthesis of how LLMs have been leveraged across various electrocatalysis tasks, including automated information extraction from literature, text-based property prediction, hypothesis generation, synthesis planning, and knowledge graph construction. We comparatively analyze leading LLMs and domain-specific frameworks (e.g., CatBERTa, CataLM, CatGPT) in terms of methodology, application scope, performance metrics, and limitations. Through curated case studies across key electrocatalytic reactions (HER, OER, ORR, and CO2RR), we highlight emerging trends such as the growing use of embedding-based prediction, retrieval-augmented generation, and fine-tuned scientific LLMs. The review also identifies persistent challenges, including data heterogeneity, hallucination risks, lack of standard benchmarks, and limited multimodal integration. Importantly, we articulate future research directions, such as the development of multimodal and physics-informed MatSci-LLMs, enhanced interpretability tools, and the integration of LLMs with self-driving laboratories for autonomous discovery. By consolidating fragmented advances and outlining a unified research roadmap, this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation through large language model technologies.
Funding: Supported by grants from the National Natural Science Foundation of China (No. 82273770) and the Foundation for Innovative Research Groups of the National Natural Science Foundation of Sichuan Province (No. 24NSFTD0051).
Abstract: In the realm of drug discovery, recent advancements have paved the way for innovative approaches and methodologies. This comprehensive review encapsulates six distinct yet interrelated mini-reviews, each shedding light on novel strategies in drug development. (a) The resurgence of covalent drugs is highlighted, focusing on targeted covalent inhibitors (TCIs) and their role in enhancing selectivity and affinity. (b) The potential of the quantum mechanics-based computer-aided drug design (CADD) tool, Cov_DOX, is introduced for predicting protein-covalent ligand binding structures and affinities. (c) The scaffolding function of proteins is proposed as a new avenue for drug design, with a focus on modulating protein-protein interactions through small molecules and proteolysis targeting chimeras (PROTACs). (d) The concept of pro-PROTACs is explored as a promising strategy for cancer therapy, combining the principles of prodrugs and PROTACs to enhance specificity and reduce toxicity. (e) The design of prodrugs through carbon-carbon bond cleavage is discussed, offering a new perspective for the activation of drugs with limited modifiable functional groups. (f) The targeting of programmed cell death pathways in cancer therapies with small molecules is reviewed, emphasizing the induction of autophagy-dependent cell death, ferroptosis, and cuproptosis. These insights collectively contribute to a deeper understanding of the dynamic landscape of drug discovery.
Abstract: Synthetic biology (SynBio) is an emerging field of study with great potential in designing, engineering, and constructing new microbial synthetic cells that do not pre-exist in nature, or in re-engineering existing cells to accomplish industrial purposes. Systems biology seeks to understand biology at multiple dimensions, beginning with the molecular and cellular level and progressing to the tissue and organismal level, and characterizes cells as complex information-processing systems. SynBio, on the other hand, goes further and strives to design and create its own systems from scratch. SynBio is now applied to develop novel therapeutic drugs for the prevention of human diseases, scale up industrial processes, and accomplish previously unfeasible industrial outcomes. This is made possible through significant breakthroughs in DNA sequencing and synthesis technology, as well as insights gained from synthetic chemistry and systems biology. SynBio technologies have allowed for the introduction of improved and synthetic metabolic functionalities in microorganisms to enable the synthesis of a range of pharmacologically relevant compounds for pharmaceutical exploration. SynBio applications range from finding new ways to make industrial chemical synthesis processes more sustainable to the microbial synthesis of improved therapeutic modalities. Hence, this study underscores several innovations, promising potentials, and future directions afforded by SynBio that point toward improved industrial microbial synthesis for pharmaceutical exploration.
Abstract: Endometrial cancer is the most common gynecologic cancer diagnosed in the United States, and mortality is on the rise. Advanced and recurrent endometrial cancer represents a treatment challenge, as historically there have been limited therapeutic options for patients. In the last several years, multiple practice-changing clinical trials have led to significant improvements in the treatment landscape. This review will cover updates in the treatment and management of advanced and recurrent endometrial cancer with a focus on novel therapeutics, such as anti-PD-L1 and PD-1 inhibitors, poly ADP-ribose polymerase (PARP) inhibitors, antibody-drug conjugates, and hormonal therapy.
Funding: Supported by the National Social Science Fund of China (2022-SKJJ-B-084).
Abstract: The learning algorithms of causal discovery mainly include score-based methods and genetic algorithms (GA). Score-based algorithms are prone to search-space explosion. Classical GA is slow to converge and prone to falling into local optima. To address these issues, an improved GA with domain knowledge (IGADK) is proposed. Firstly, domain knowledge is incorporated into the learning process of causality to construct a new fitness function. Secondly, a dynamical mutation operator is introduced in the algorithm to accelerate the convergence rate. Finally, an experiment is conducted on simulation data, which compares the classical GA with IGADK using domain knowledge of varying accuracy. The IGADK can greatly reduce the number of iterations, populations, and samples required for learning, which illustrates the efficiency and effectiveness of the proposed algorithm.
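The two ingredients named in this abstract, a fitness function that includes domain knowledge and a mutation operator that changes during the run, can be illustrated with a minimal sketch. The abstract gives neither formula, so the forms below are assumptions chosen for illustration only and are not the IGADK definitions: domain knowledge is encoded as sets of required and forbidden directed edges, the fitness adds a weighted agreement count to a data-driven score supplied by the caller, and the mutation rate decays linearly from `p_max` to `p_min`. All names and default values are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # directed edge (cause, effect)


def knowledge_agreement(edges: Set[Edge],
                        required: Set[Edge],
                        forbidden: Set[Edge]) -> int:
    """Count required edges that are present minus forbidden edges that are present."""
    return len(edges & required) - len(edges & forbidden)


def fitness(data_score: float,
            edges: Set[Edge],
            required: Set[Edge],
            forbidden: Set[Edge],
            weight: float = 1.0) -> float:
    """Assumed fitness: data-driven score plus a weighted domain-knowledge term."""
    return data_score + weight * knowledge_agreement(edges, required, forbidden)


def mutation_rate(generation: int,
                  max_generations: int,
                  p_max: float = 0.2,
                  p_min: float = 0.01) -> float:
    """Assumed schedule: linear decay from p_max (exploration) to p_min (refinement)."""
    progress = min(max(generation / max_generations, 0.0), 1.0)
    return p_max - (p_max - p_min) * progress
```

For example, a candidate graph containing one required edge and no forbidden edge scores one unit (times `weight`) above its data score, so candidates that agree with the prior knowledge are preferred when the data alone cannot separate them.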
Funding: Supported by the National Key R&D Program of China (Grant No.: 2023YFF1205102), the National Natural Science Foundation of China (Grant Nos.: 82273856, 22077143, and 21977127), and the Science Foundation of Guangzhou, China (Grant No.: 2024A04J2172).
Abstract: Structural optimization of lead compounds is a crucial step in drug discovery. One optimization strategy is to modify the molecular structure of a scaffold to improve both its biological activities and its absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. One of the deep molecular generative model approaches preserves the scaffold while generating drug-like molecules, thereby accelerating the molecular optimization process. Deep molecular diffusion generative models simulate a gradual process that creates novel, chemically feasible molecules from noise. However, the existing models lack direct interatomic constraint features and struggle with capturing long-range dependencies in macromolecules, which leads to challenges in modifying scaffold-based molecular structures and limits the stability and diversity of the generated molecules. To address these challenges, we propose a deep molecular diffusion generative model, the three-dimensional (3D) equivariant diffusion-driven molecular generation (3D-EDiffMG) model. The dual strong and weak atomic interaction force-based long-range dependency capturing equivariant encoder (dual-SWLEE) is introduced to encode both the bonding and non-bonding information based on strong and weak atomic interactions. Additionally, a gate multilayer perceptron (gMLP) block with tiny attention is incorporated to explicitly model complex long-sequence feature interactions and long-range dependencies. The experimental results show that 3D-EDiffMG effectively generates unique, novel, stable, and diverse drug-like molecules, highlighting its potential for lead optimization and accelerating drug discovery.
Funding: National Natural Science Foundation of China (32471265).
Abstract: Pesticides play a pivotal role in modern agriculture. However, the pesticide industry faces significant challenges closely linked to major global concerns such as pesticide resistance, environmental pollution, food safety, and crop yields. Developing safe, efficient, and environmentally friendly pesticides has become a key challenge for the industry. Recently, Qing Yang and colleagues unveiled the mode of action of a dual-functional protein, the ABCH transporter, which plays essential roles in lipid transport to construct the lipid barrier of insect cuticles and in pesticide detoxification within insects. Since ABCH transporters are critical for all insects but absent in mammals and plants, this elegant and exciting work provides a highly promising target for developing safe, low-resistance pesticides. Here, we highlight the groundbreaking discoveries made by Qing Yang's team in unraveling the intricate mechanisms of the ABCH transporter.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFA1402304), the National Natural Science Foundation of China (Grant Nos. 12034009, 12374005, 52288102, 52090024, and T2225013), the Fundamental Research Funds for the Central Universities, and the Program for JLU Science and Technology Innovative Research Team.
Abstract: Crystal structure prediction (CSP) is a foundational computational technique for determining the atomic arrangements of crystalline materials, especially under high-pressure conditions. While CSP plays a critical role in materials science, traditional approaches often encounter significant challenges related to computational efficiency and scalability, particularly when applied to complex systems. Recent advances in machine learning (ML) have shown tremendous promise in addressing these limitations, enabling the rapid and accurate prediction of crystal structures across a wide range of chemical compositions and external conditions. This review provides a concise overview of recent progress in ML-assisted CSP methodologies, with a particular focus on machine learning potentials and generative models. By critically analyzing these advances, we highlight the transformative impact of ML in accelerating materials discovery, enhancing computational efficiency, and broadening the applicability of CSP. Additionally, we discuss emerging opportunities and challenges in this rapidly evolving field.
Funding: Supported by Research Projects of the Nature Science Foundation of Hebei Province (F2021402005).
Abstract: Semi-supervised new intent discovery is a significant research focus in natural language understanding. To address the limitations of current semi-supervised training data and the underutilization of implicit information, a Semi-supervised New Intent Discovery for Elastic Neighborhood Syntactic Elimination and Fusion model (SNID-ENSEF) is proposed. Syntactic elimination contrast learning leverages verb-dominant syntactic features, systematically replacing specific words to enhance data diversity. The radius of the positive sample neighborhood is elastically adjusted to eliminate invalid samples and improve training efficiency. A neighborhood sample fusion strategy, based on sample distribution patterns, dynamically adjusts neighborhood size and fuses sample vectors to reduce noise and improve implicit information utilization and discovery accuracy. Experimental results show that SNID-ENSEF achieves average improvements of 0.88%, 1.27%, and 1.30% in Normalized Mutual Information (NMI), Accuracy (ACC), and Adjusted Rand Index (ARI), respectively, outperforming the PTJN, DPN, MTP-CLNN, and DWG models on the Banking77, StackOverflow, and Clinc150 datasets. The code is available at https://github.com/qsdesz/SNID-ENSEF (accessed on 16 January 2025).
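For readers unfamiliar with the first of the three reported metrics, the sketch below computes Normalized Mutual Information between predicted cluster labels and reference intent labels. It is a plain-Python illustration and is not taken from the SNID-ENSEF repository; the abstract does not say which normalization the authors use, so the arithmetic mean of the two entropies is assumed here.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:  # both assignments are a single cluster
        return 1.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)
```

NMI ignores how clusters are named, so a clustering that matches the reference up to a relabeling scores 1, while a clustering that is statistically independent of the reference scores 0.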